Replacing the HTML entity by the actual Unicode character

Edited by author.
Last edit: 19:43, 16 February 2022

Hello,

Verdy p insists on replacing an escaped character by the actual Unicode character here: [1][2] while the /qqq clearly states that the content within var tags should not be translated (therefore not touched).

In most other languages ([3][4]), the HTML entity is kept.

What’s your opinion on this?

Thanks.

Thibaut (talk)18:38, 16 February 2022

I first sent you a message about it; did you even read it before posting your message here?

There's no HTML-encoding there in the Translate UI, and the "/qqq" doc explicitly says we must preserve them. And these 2 messages are not the only messages for the MediaWiki API interface that use opening braces (which are not HTML-encoded; and why encode just the opening brace and not the closing brace?).

Anyway, these messages have strange/unsupported message IDs (containing equal signs). This is most probably a bug in the ID generator used by MediaWiki developers when importing/exporting messages between TWN and the MediaWiki repository. The Translatewiki doc explicitly describes how to generate message IDs correctly (and equal signs should not be there, as they break various things when TWN generates its contents using the MediaWiki syntax).

Verdy p (talk)18:41, 16 February 2022

"which are not HTML-encoded, and why just then the opening brace and not the closing brace???"

If you want to write <var> for example, you only need to escape the first bracket (&lt;var>).
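
A minimal sketch of that point in Python, using the standard `html` module as a stand-in for a browser's entity decoding (this is not MediaWiki's or TWN's own code): escaping only the opening bracket is enough for the text to decode to a literal `<var>` rather than being read as a tag.

```python
import html

# Only the opening angle bracket is escaped; the closing one is left as is.
escaped = "&lt;var>"

# html.unescape decodes character references in text, much as an HTML parser would.
decoded = html.unescape(escaped)
print(decoded)  # <var>
```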

Thibaut (talk)18:52, 16 February 2022
Edited by author.
Last edit: 19:37, 16 February 2022

Herr Professor, this message is not parsed by MediaWiki as wikitext.

In HTML, there's never the need to HTML-encode braces, only &amp; or &lt;, and sometimes as well &gt; or &quot; in some HTML nodes (like attribute values). So don't explain to me what I know and that is not the topic.

Your revert comments just said "there must be some reason", so you clearly say that you don't have a suitable answer to explain this caveat: you don't know !

The original message in English as it appears in the Translate UI, along with instructions asking to not change the "var" contents; now it is clearly different in the translated French, and contradicts the doc.

TWN displays unencoded plain braces in its Translate/Review interface for the original English. Using the MediaWiki editor is not supported for many messages, in particular for messages that do not use the MediaWiki syntax: it produces incorrect displays in that case, and this is the case here.

And as I replied to you, it's not the "community" that can solve this problem, only the developers of the MediaWiki API or the developers of TWN. And they must clarify what is really expected here. Otherwise you will spend your time changing edits/reviews made by anyone else, in every language, still without knowing why you do that.

Note also that TWN does NOT display the last edit comment of the messages we are editing/reviewing. So once you revert it, the message has to be reviewed again, and anyone can later revert you again without being aware (or remembering) that there's some reason for that: there's also nothing about it in the "/qqq" doc. So your own action has no long-term effect. It's not productive to threaten other people who can't even know what was discussed somewhere else (and which is not even hinted at by the Translate UI with the data that it already maintains or generated itself). Any translator on this site can fall into this trap at any time; don't accuse someone of bad intent when it is not their fault and the reason is technical (and unexplained anywhere).

I did not insist, but YOU only insisted on using HTML encoding which is absent from the source English, and you are contradicting the existing "/qqq" doc by your own insistence. You cannot solve it without hearing something from MediaWiki API developers (in Phabricator).

Note: I have not attacked you, you were the first one to use the term "insist" here, while first hiding the polite message I had sent to you in your talk page. Instead of replying there, you created this thread here. Please don't mix the order of events.

Verdy p (talk)18:58, 16 February 2022

"YOU only insisted on using HTML encoding which is absent from the source English, and you are contradicting the existing "/qqq" doc by your own insistence."

There’s clearly an HTML entity in the original English message. Now please stop with the personal attacks and walls of text, and let the TWN admins respond on what we should do when we encounter HTML entities within var or code tags in original English messages (not only these two messages), thanks.

Thibaut (talk)19:12, 16 February 2022

https://gerrit.wikimedia.org/r/c/mediawiki/core/+/448160/3/includes/api/i18n/en.json#67 is the reason the escaping exists.

This is a bug in the terminology gadget which alters the source text.

Nike (talk)19:50, 16 February 2022

But then there also remains the additional bug in the Translate UI, showing things differently than what is expected (see the screenshot above): HTML escapes present in the source message are passed as is, and silently/automatically replaced by their *equivalent* in HTML when they are rendered by any browser.

If the source message uses any HTML escapes, they should themselves be HTML-escaped when the source message is rendered, and likewise in the translation (just like when editing a page with the MediaWiki wikitext editor)! However, these escapes won't be displayed anywhere else (e.g. when transcluding the messages in any other wiki pages).

The Translate UI should at least HTML-escape the following basic characters for safety when rendering the source messages (and the translated ones when they are shown in the list of messages of the UI):

  • & as &amp;
  • < as &lt;
  • > as &gt;
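
As a rough illustration of the three replacements above, here is a sketch in Python using the standard `html.escape` function. The function name `escape_for_display` and the sample message are hypothetical, and this is not the Translate extension's actual code (which is written in PHP and JavaScript).

```python
import html

def escape_for_display(source: str) -> str:
    """Escape &, < and > so that a browser shows the message exactly as stored."""
    # quote=False leaves " and ' untouched; only &, < and > are replaced.
    return html.escape(source, quote=False)

# Hypothetical source message containing a numeric character reference.
message = "<var>&#123;slot}</var>"
print(escape_for_display(message))
# &lt;var&gt;&amp;#123;slot}&lt;/var&gt;
```

Because the ampersand is escaped too, the browser shows the reference `&#123;` literally instead of silently replacing it with a brace.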

And some tests must also be added for how to prefill the content of the editable input box, and how to parse and store it in the translated message page (it should use the same rules as those used by the MediaWiki editor), to make sure that the transform is safe and bijective/reversible (editing, saving as is, then re-editing should not alter the content).
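
Such a round-trip test could look roughly like the following Python sketch. The helper names `to_edit_box` and `from_edit_box` and the sample messages are assumptions made for illustration; the real tests would have to exercise the Translate UI's own code.

```python
import html

def to_edit_box(stored: str) -> str:
    # Escape the stored message before embedding it in the HTML of the page.
    return html.escape(stored, quote=False)

def from_edit_box(rendered: str) -> str:
    # What the browser hands back after decoding the escaped markup.
    return html.unescape(rendered)

# Hypothetical stored messages, including one with a numeric character reference.
samples = ["<var>&#123;slot}</var>", "a & b", "&lt;var>", "plain {slot}"]

for stored in samples:
    # Editing and saving without changes must give back the identical text.
    assert from_edit_box(to_edit_box(stored)) == stored

# Without the escaping step the reference is decoded and the original is lost.
lossy = html.unescape("&#123;slot}")
print(lossy)  # {slot}
```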

Verdy p (talk)19:56, 16 February 2022

Thanks for the heads-up. This should be fixed now.

Jon Harald Søby (talk)20:34, 16 February 2022

I'm not the only one who was trapped: you can find other languages where translators just typed exactly the code that was displayed in the Translate UI (and did not use the MediaWiki editor to create/modify or review these messages).

For example in Bokmal Norwegian:

There are other messages using {slot} or similar in the MediaWiki API doc to translate. A bug in a translated message at this level however is not critical: it does not affect the MediaWiki parser, the API itself (as used by bots) or the API portal app (with query assistants). It only affects the translated doc pages (referenced by the API portal application), which may end up not rendering (and in fact do not). So the only other thing that is affected is your new Beta Terminology gadget, which shows things incorrectly: the gadget should not even be concerned by these inclusions of "var/code" in these messages, as they are explicitly not translatable. You should still fix this new gadget.

There are also other message groups using HTML escapes in their source messages. Until now this did not cause any issue for these target projects, or the project maintainers have tuned their own import/export tools to properly escape/unescape what was needed and tested it, so that actual translators would not have to worry about this very tricky technical detail, which most of them will always have difficulty understanding when they edit or review translations.

It is also possible that these messages were initially translated before the source in English was modified (so before the very recent deployment of the beta version of the Terminology gadget, which most users still don't use as it is still in the list of beta features, which should be disabled by default).

The rendering bug for messages with HTML escapes was still in effect, independently of this new gadget or of the recent change in the source English message (which caused these messages to become fuzzy recently, and then all to be proposed in the Translate review interface, which is used FIRST, before even editing an isolated message with the wikitext editor).

I suggest that the "/qqq" page contain a warning saying that characters represented by HTML escapes (starting with an ampersand and terminated by a semicolon) must remain as is (provided that the Translate UI properly shows these escapes), including the opening brace given as an example.
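
Beyond a warning in the doc, a check along these lines could flag translations that dropped an escape. This is only a sketch in Python: the function name `missing_entities`, the regular expression and the sample messages are all hypothetical, and nothing like this is claimed to exist in TWN today.

```python
import re

# Matches named, decimal and hexadecimal character references,
# e.g. &lt; &#123; &#x7B; (ampersand ... semicolon).
ENTITY_RE = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);")

def missing_entities(source: str, translation: str) -> list:
    """Return the escapes found in the source that the translation no longer has."""
    remaining = ENTITY_RE.findall(translation)
    missing = []
    for entity in ENTITY_RE.findall(source):
        if entity in remaining:
            remaining.remove(entity)  # each occurrence is matched only once
        else:
            missing.append(entity)
    return missing

# Hypothetical source and translation, for illustration only.
source = "Use <var>&#123;slot}</var> here."
translation = "Utilisez <var>{slot}</var> ici."
print(missing_entities(source, translation))  # ['&#123;']
```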

Verdy p (talk)20:38, 16 February 2022