[critical bug] [[MediaWiki:Withoutinterwiki-legend/fr]] : translation mixups: updates are going to the wrong page
Translations are being sent to the wrong page. When we try to set them correctly, the submission is accepted, but after refreshing the page we still see an unrelated message that does not belong to this resource; no change is applied, and no error is ever displayed.
And if we look deeper, we see that what was submitted has in fact changed another page, creating a mixup, which then affects multiple other pages.
This is very probably a problem of index corruption (page or version IDs) in the database (I think there are duplicate IDs in some tables where they should be unique), or the result of a privileged admin previously misusing a native SQL update.
The histories are also mixed up across pages.
A maintenance run should be performed to check the database integrity (notably all table indexes).
It is also likely a problem with the background job queue, which seems to be stalled (recent changes are not fed correctly, statistics are out of sync, page counts are wrong...), or its backlog of pending jobs is too large, or the server lacks storage.
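The kind of integrity check suggested above (looking for IDs that appear more than once in a column that should be unique) could be sketched roughly as follows. This is only an illustration under stated assumptions: it runs against a throwaway in-memory SQLite database, and the `demo_page` table, its rows, and the helper names `find_duplicate_ids` and `build_demo_db` are hypothetical. It is not the real translatewiki.net schema or an existing maintenance script.

```python
import sqlite3


def find_duplicate_ids(conn, table, column):
    """Return (value, count) pairs for values that occur more than once."""
    # Table and column names cannot be bound as SQL parameters,
    # so only plain identifiers are accepted here.
    for name in (table, column):
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    query = (
        f"SELECT {column}, COUNT(*) FROM {table} "
        f"GROUP BY {column} HAVING COUNT(*) > 1 ORDER BY {column}"
    )
    return conn.execute(query).fetchall()


def build_demo_db():
    """Create a hypothetical table with no unique constraint on page_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE demo_page (page_id INTEGER, page_title TEXT)")
    conn.executemany(
        "INSERT INTO demo_page VALUES (?, ?)",
        [
            (1, "Withoutinterwiki-legend/fr"),
            (2, "Some-other-message/fr"),
            (2, "Unrelated-message/fr"),  # deliberately duplicated ID
        ],
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    for value, count in find_duplicate_ids(conn, "demo_page", "page_id"):
        print(f"page_id {value} appears {count} times")
```

On a healthy database a primary key or unique index would normally reject such rows, so a non-empty result from a query of this kind would be one hint that an index is damaged.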
I just opened the page MediaWiki:Withoutinterwiki-legend/fr and it really shows the unrelated text, but clicking on "Edit" shows the text correctly.
This was a real bug (https://phabricator.wikimedia.org/T235188) and it is what the site notice at the top refers to. It has been affecting translatewiki.net since the beginning of the month. The bug was a caching bug on the server. It still affects some resources, which display an incorrect "fuzzy" state (yellow icon) that seems to recur. The issue was real and very strange. Its resolution is only partial, as some other glitches are still occurring, and it does not seem definitive because the impact is still not well understood. That's why the site notice at the top remains.
We are quite certain that these issues started appearing after our deployment last Wednesday (8th Oct, 2019). We should not be seeing this issue on twn any more, but if you do, please ping us on the Phab ticket or here.
Actually this started around September 24-26: many edits (or even reviews) made after that date were affected. And it continues even though an attempt was made to purge the caches.
See for example with a new incorrect edit saved based on the wrong unrelated message: https://translatewiki.net/w/i.php?title=Wikimedia:Wikipedia-library-c0f799-Renewal_confirmation/fr&action=history
As long as the new patch disabling the use of Memcached (by FuzzyBot and other background jobs, including bots exporting data or importing data from external sources) is not applied here, we will still see those bad data. The patch is still being tested by MediaWiki developers, who hope to find the actual origin of this bug. There seems to be a design bug in how multiple APIs work together across multiple services, multiple hosts, and very different security contexts.
The dates of 24-26 September do not match the 8 October deployment. Something occurred earlier that had an unexpected impact. For now the bug has not been clearly isolated, and there are probably multiple levels of unchecked assumptions between the various components. These components are maintained separately, and the sequencing of events between them may have changed: events that were not supposed to get out of sync or to occur in a different order may now do so, because this was not covered by any existing unit test.
However the bug seems related to a probable bug in PHP's internal memory management for "function closures" (a very tricky piece of code in PHP). The PHP engine actually in use may also have an effect (for example, Wikimedia is currently actively working on moving back to PHP 7 from HHVM, and the past switch from PHP/Zend to HHVM had already caused similar problems some years ago, because PHP is still not defined with very strict semantics). There are also strange bugs in Memcached itself (it still relies on unsafe assumptions about memory allocation made by C compilers, plus the effects of some mitigations deployed in processor firmware, which solve 90% of problems while adding a few % of their own).
Thankfully, Wikimedia is working on improving the unit tests and giving more power to linters to detect false assumptions. But there are now tons of compiler/linter warnings to check: some early fixes for them can cause parts of the software to be simplified, changing behavior that was previously just "unspecified" but worked as expected. Ideally MediaWiki should be tested by running it on very different architectures and OSes. This is not really the case, but it would help detect more unspecified behaviors and isolate more bugs, while enforcing better software quality through improved portability. However, MediaWiki is now highly developed in terms of pure performance on the existing and very costly Wikimedia architecture.
But bug T235188 is now proven to also affect Wikimedia wikis (including Commons, Meta-Wiki and Wikidata, but also some Wikipedias that use the Translate extension for some local project pages, for running some tools, or because they need to support pages with several orthographies when automatic transliterators are not reliable enough).