[critical bug] [[MediaWiki:Withoutinterwiki-legend/fr]] : translation mixups: updates are going to the wrong page

Fragment of a discussion from Support

Actually, this started around September 24-26: many edits (and even reviews) made after that date were affected. And it continues even though purging the caches was attempted.

See for example this new incorrect edit, saved against the wrong, unrelated message: https://translatewiki.net/w/i.php?title=Wikimedia:Wikipedia-library-c0f799-Renewal_confirmation/fr&action=history

As long as the new patch disabling the use of Memcached (by FuzzyBot and other background jobs, including bots that export data or import it from external sources) is not applied here, we will keep seeing this bad data. The patch is still being tested by MediaWiki developers, who hope to find the actual origin of this bug. There seems to be a design bug in how multiple APIs work together across multiple services, multiple hosts, and very different security contexts.

The dates of 24-26 September do not match the 8 October deployment. Something occurred earlier that had an unexpected impact. For now the bug has not been clearly isolated, and there are probably multiple levels of unchecked assumptions between the various components. These components are maintained separately, and the sequencing of events between them may have changed: events that were not supposed to get out of sync or occur in a different order may now do so, because this was not covered by any existing unit tests.

However, the bug seems related to a probable bug in PHP's internal memory management for "function closures" (a very tricky piece of code in PHP). The PHP engine actually in use may also have an effect: for example, Wikimedia is currently actively working on moving back to PHP7 from HHVM, and the past switch from PHP/Zend to HHVM already caused similar problems some years ago, because PHP is still not defined with very strict semantics. There are also strange bugs in Memcached itself (which still relies on unsafe assumptions about memory allocation made by C compilers, plus the effects of some mitigations deployed in processor firmware, which solve 90% of problems while adding a few % of their own).

Thankfully, Wikimedia is working on improving the unit tests and giving more power to linters to detect false assumptions. But there are now tons of compiler/linter warnings to check: some early fixes for them can cause parts of the software to be simplified, changing behavior that was previously just "unspecified" but worked as expected. Ideally, MediaWiki should be tested by running it on very different architectures and OSes. This is not really the case, but it would help detect more unspecified behaviors and isolate more bugs, while enforcing better software quality through improved portability. However, MediaWiki is now highly developed in terms of pure performance on the existing and very costly Wikimedia architecture.

But bug T235188 is now proven to also affect Wikimedia wikis (including Commons, Meta-Wiki and Wikidata, but also some Wikipedias that use the Translate extension for some local project pages, for running some tools, or because they need to support pages in several orthographies when automatic transliterators are not reliable enough).

Verdy p (talk) 22:21, 15 October 2019