Replacing the HTML entity by the actual Unicode character

https://gerrit.wikimedia.org/r/c/mediawiki/core/+/448160/3/includes/api/i18n/en.json#67 is the reason the escaping exists.

This is a bug in the terminology gadget which alters the source text.

Nike (talk)19:50, 16 February 2022

But that still leaves a separate bug in the Translate UI, which shows something different from what is expected (see the screenshot above): HTML escapes present in the source message are passed through as is, so any browser silently replaces them with the *equivalent* character when rendering them.

If the source message uses any HTML escape, that escape should itself be HTML-escaped when the source message is rendered, and likewise when the translation is rendered (just as when editing a page with the MediaWiki wikitext editor)! However, these escapes won't be displayed anywhere else (e.g. when the messages are transcluded in other wiki pages).
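To make the difference concrete, here is a minimal Python sketch. It is only an approximation: `html.unescape` stands in for the entity decoding a browser performs on text markup, the sample message is hypothetical, and none of this is the actual Translate code.

```python
import html

# Hypothetical source message: a numeric character reference is used
# so that the opening brace is not read as a placeholder.
source = "Use &#123;slot} here"

def browser_display(markup: str) -> str:
    """Approximate what a browser shows for text markup (entities decoded)."""
    return html.unescape(markup)

# Passed through as is: the escape silently becomes the character itself.
shown_raw = browser_display(source)

# Escaped before rendering: the translator sees the escape as written.
shown_escaped = browser_display(html.escape(source, quote=False))

print(shown_raw)      # Use {slot} here
print(shown_escaped)  # Use &#123;slot} here
```

A translator who copies what is displayed in the first case types a plain brace, which is exactly the trap described here.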

The Translate UI should at least HTML-escape the following basic characters for safety when rendering the source messages (and the translated ones when they are shown in the list of messages of the UI):

  • & as &amp;
  • < as &lt;
  • > as &gt;
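The three replacements listed above could be sketched as follows in Python. The function name `escape_basic` is hypothetical and this is not how Translate implements it; the only point is that the ampersand has to be handled first.

```python
import html

def escape_basic(text: str) -> str:
    """Escape only &, < and > for safe display as HTML text."""
    # The ampersand must be replaced first; otherwise the "&" introduced
    # by "&lt;" and "&gt;" would itself be escaped a second time.
    return (
        text.replace("&", "&amp;")
            .replace("<", "&lt;")
            .replace(">", "&gt;")
    )

sample = "<var>a</var> &lt; b"
print(escape_basic(sample))  # &lt;var&gt;a&lt;/var&gt; &amp;lt; b

# Python's standard library does the same when quote escaping is disabled.
assert escape_basic(sample) == html.escape(sample, quote=False)
```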

Some tests must also be added for how the content of the editable input box is prefilled, and how it is parsed and stored in the translated message page (this should follow the same rules as the MediaWiki editor), to make sure that the transform is safe and bijective/reversible (editing, saving as is, then re-editing should not alter the content).
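Such a round-trip test might look like the sketch below. The names `prefill` and `parse_submitted` are hypothetical stand-ins for whatever the Translate UI does to fill the input box and to read it back, and the sample strings are made up; the sketch only shows the property to check.

```python
import html

def prefill(stored: str) -> str:
    """Hypothetical: markup used to show the stored text in the edit box."""
    return html.escape(stored, quote=False)

def parse_submitted(markup: str) -> str:
    """Hypothetical: text recovered from the edit box after browser decoding."""
    return html.unescape(markup)

def survives_round_trip(stored: str) -> bool:
    """Editing and saving as is must give back exactly the stored text."""
    return parse_submitted(prefill(stored)) == stored

samples = [
    "plain text",
    "&#123;slot}",
    "a &amp; b",
    "<var>x</var> &lt; y",
    "&nbsp;&",
]
results = {s: survives_round_trip(s) for s in samples}
print(all(results.values()))  # True
```

Without the escaping step in `prefill`, a sample such as `&#123;slot}` would come back as `{slot}`, so saving as is would alter the content.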

Verdy p (talk)19:56, 16 February 2022
 

Thanks for the heads-up. This should be fixed now.

Jon Harald Søby (talk)20:34, 16 February 2022

I'm not the only one who was caught out: you can find other languages where translators simply typed exactly the code that was displayed in the Translate UI (and did not use the MediaWiki editor to create, modify or review these messages).

For example, in Norwegian Bokmål:

There are other messages using {slot} or similar in the MediaWiki API documentation to translate. A bug in a translated message at this level is not critical, however: it does not affect the MediaWiki parser, the API itself (as used by bots) or the API portal app (with query assistants). It only affects the translated doc pages (referenced by the API portal application), which may not render (and in fact do not). So the only thing that is affected is your new beta Terminology gadget, which shows things incorrectly. The gadget should not even be concerned with these "var/code" inclusions in the messages, as they are explicitly not translatable, so you should still fix this new gadget.

There are also other message groups using HTML escapes in their source messages. Until now this has not caused any issue for the target projects, or the project maintainers have tuned their own import/export tools to escape/unescape what was needed and tested them, so that translators do not have to worry about this very tricky technical detail, which most of them will always have difficulty understanding when they edit or review translations.

It is also possible that these messages were initially translated before the English source was modified (so before the very recent deployment of the beta version of the Terminology gadget, which most users still don't use, as it is still in the list of beta features, which should be disabled by default).

The rendering bug for messages with HTML escapes was still in effect, independently of this new gadget or of the recent change in the English source message (which caused these messages to become fuzzy recently, and then all to be proposed in the Translate review interface, which is used FIRST, before even editing an isolated message with the wikitext editor).

I suggest that the "/qqq" page contain a warning saying that characters represented by HTML escapes (starting with an ampersand and terminated by a semicolon) must remain as is (provided that the Translate UI shows these escapes properly), including the opening brace given as an example.

Verdy p (talk)20:38, 16 February 2022