Request for feedback about Apertium
Hi translators! We provide machine translation suggestions using Apertium here. It has a feature where, if the source language (i.e. English) is not supported by Apertium, it looks for existing translations into other languages, checks whether any of those languages is supported by Apertium, and uses that language instead of English to create a suggestion.
We have received multiple complaints about this feature and are considering removing it.
Before removing this feature, I want to know from you whether you find this alternate source language feature useful. Since most language pairs supported by Apertium do not include English, this would remove most translation suggestions from Apertium.
So let us know, should we remove it, keep it, or limit it to some languages only?
There is also a task in Phabricator about this: https://phabricator.wikimedia.org/T177434
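To make the question concrete, here is a minimal sketch of the alternate source language behaviour described above, including the "limit it to some languages only" option. It is not the actual implementation: the function name `pick_source`, the `SUPPORTED_PAIRS` set, the `allowed_fallbacks` parameter and the alphabetical order in which fallback languages are tried are all assumptions made for illustration.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical (source, target) pairs; NOT Apertium's real list of pairs.
SUPPORTED_PAIRS: Set[Tuple[str, str]] = {("sv", "nb"), ("uk", "ru"), ("be", "ru")}


def pick_source(
    target: str,
    source_lang: str,
    texts: Dict[str, str],
    supported: Set[Tuple[str, str]] = SUPPORTED_PAIRS,
    allowed_fallbacks: Optional[Set[str]] = None,
) -> Optional[Tuple[str, str]]:
    """Choose which existing text to hand to the machine translator.

    texts maps a language code to the message text in that language and
    includes the original source text. The original source language is
    tried first; other languages are tried in alphabetical order
    (an assumption, the real ordering is not documented here).
    If allowed_fallbacks is given, only those languages may be used
    as an alternate source.
    """
    fallbacks = sorted(lang for lang in texts if lang != source_lang)
    if allowed_fallbacks is not None:
        fallbacks = [lang for lang in fallbacks if lang in allowed_fallbacks]
    for lang in [source_lang] + fallbacks:
        if lang in texts and (lang, target) in supported:
            return lang, texts[lang]
    return None  # no usable pair, so no suggestion is shown


texts = {"en": "Save", "sv": "Spara", "fr": "Enregistrer"}
print(pick_source("nb", "en", texts))                           # uses the Swedish text
print(pick_source("nb", "en", texts, allowed_fallbacks={"da"}))  # fallback not allowed
```

In this sketch, removing the feature corresponds to passing an empty `allowed_fallbacks` set, and limiting it corresponds to passing only the languages that translators have found helpful.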
Personally I like the Apertium translations – the ones suggested to me are usually Swedish → Norwegian Bokmål. It is not perfect, but it usually gives good suggestions, and for short messages my translations often end up exactly (or almost exactly) like what Apertium suggests.
I have a positive view of it, overall. An English → Hindi pair doesn't exist, so a fallback with mixed Hindi and Urdu words is suggested to me, which is not perfect, but has still given me some good suggestions so far. Personally, it doesn't affect me since Google Translate is now supported and its suggestions are much more accurate than Apertium's.
Anyway, these are just suggestions. Yes, Google is often good, but no automatic translator knows the context of use (this can create confusion) or enforces a common terminology (so the same term may be translated differently from one message to another, or sometimes within the same message, depending on the structure of the sentence).
One bad thing about automatic translators with English as the source is that English is frequently ambiguous between nouns and verbs, and tends to drop many prepositions that would clarify the sense. In addition, punctuation is often forgotten in the source, and English overuses capitalization for many things that are not actually artwork titles or brand names. So we often get source messages that look much like the short news tickers on TV, with partial sentences and many unexplained abbreviations, sometimes in a very specific jargon that is particular to the source project and spoken by its programmers in their small teams, and it's not always evident what they mean.
So it's good if we can not just look at suggestions, but also look at how the messages were understood in other languages (where they would be explained). That's why we not only see suggestions from automatic translators: translators should also be able (if they can) to read translations submitted for languages other than English (this can resolve ambiguities).
But one thing should be clear: anything suggested in the right pane of the translate UI is just a suggestion, a hint. Suggestions may eventually speed up edits by avoiding some common typos in terms that don't need to be corrected, but we still need to review and correct them precisely. Just clicking a suggestion and validating it is not acceptable: we have to use the local search engine and, if the projects give some indication, their associated terminologies or guidelines in the Translating project portal or in the language portal.
And if there are difficulties, we need to place hints or links to relevant discussions in the "/qqq" doc page (including when a message has a pending support request, here or on another channel, or may have some unexpected consequence for some wiki), without forgetting that these difficulties may be specific to a particular target wiki or application instance. There are also sometimes conflicts with the deployed target versions: here we may have messages that are accurate only for the latest development version but not for the currently deployed versions, and we have no visibility on when they will be deployed, possibly differently across sites or repositories.
If possible, developers who change messages for a future version should create new message keys and keep the existing keys of messages for versions that are still active and supported. Unfortunately this is not always the case, and it's unpredictable when there's no clear documentation or warning (at least to signal to a target project, which has not been prepared for it, that some changes may no longer work as expected on its currently deployed version).
So if it were possible, this wiki should be able to track the branches/versions of a project for which any given message (or its /qqq documentation) is defined (the same message could belong to several branches). For now there's no such tracking: all versions are mixed in the same set, and we still don't have any lifecycle management for each project (like we have, for example, in Git for branches, including those just for tests or for specific targets). The existing message keys could be reused for that, except that it's not easy to indicate multiple concurrent branches in the same key. The other solution would be to indicate branches by the key used to identify existing "message groups" (which already allow creating distinct but intersecting subsets of messages).
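As a rough illustration of that second option, here is a minimal sketch in which each (project, branch) pair identifies its own message group and groups may share message keys. Everything in it is hypothetical: the `BranchTracker` class, its method names and the example project, branch and key names do not correspond to anything that exists on this wiki today.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class BranchTracker:
    """Tracks which message keys belong to which branch of a project.

    Each (project, branch) pair acts as the identifier of one message
    group; the groups are allowed to intersect, so one message key can
    belong to several branches at once.
    """

    def __init__(self) -> None:
        self._groups: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    def add(self, project: str, branch: str, key: str) -> None:
        """Record that a message key is defined in the given branch."""
        self._groups[(project, branch)].add(key)

    def branches_of(self, project: str, key: str) -> List[str]:
        """Return the sorted branches of a project that define the key."""
        return sorted(
            branch
            for (proj, branch), keys in self._groups.items()
            if proj == project and key in keys
        )

    def only_in(self, project: str, branch: str, other: str) -> Set[str]:
        """Return keys defined in one branch but missing from another."""
        mine = self._groups.get((project, branch), set())
        theirs = self._groups.get((project, other), set())
        return mine - theirs


tracker = BranchTracker()
tracker.add("demo", "stable", "save-button")
tracker.add("demo", "dev", "save-button")
tracker.add("demo", "dev", "stash-error")
print(tracker.branches_of("demo", "save-button"))  # branches sharing the key
print(tracker.only_in("demo", "dev", "stable"))    # keys not yet deployed
```

With something like this, a translator could see that a message exists only in a development branch and has not yet reached the deployed version.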
Finally, automatic translators are not equally good on all language pairs. One may be better than another, but there's also the local translation memory (for the target language, and for other related languages) that may help. In any case, all these useful suggestions have to be reviewed in the editor and not validated as-is after just clicking on one of them.
In my case, as an en-ja translator, Apertium and its zh-ja translation suggestions are very handy. Basically, the translation in zh gives me hints about what context the source language (e.g. en) holds. But I still need to be cautious, as some wording holds quite contrasting meanings or different contexts between ja and zh.
- Zh-hant and ja have shared a writing system, but not grammar, for over 1,400 years.
- Actually, it is zh-hant, i.e. the traditional Chinese writing system, or 漢字.
I can guess what a sentence written in simplified Chinese, or "zh", means, but I still sometimes have to look things up in digital dictionaries/thesauri, as the characters are not always visually identical to their ja cousins and require extra work to read. :)
Please don't apply DeepL, at least for en-ja and ja-en translation. With Google, I know its shortcomings and how to fix them, but DeepL outputs silky smooth sentences with contextual errors mixed in. That is very hard to proofread.
- The translator/proofreader circle on ja-Wikipedia has been suffering from direct DeepL output: so many errors annoy those cleaning up the mess. We are seeing more users with good intentions happily dropping in new articles that they fabricated with DeepL, too innocent to care about the baseline: it is a copyvio to paste its output directly into wikis when you use DeepL.
- In the en-ja language pair, it quite often turns an "x is y" context into "x ISN'T y", and that is very disturbing. As its output otherwise looks very smooth, such errors sometimes slip past the proofreaders' attention very easily.
It sometimes does not translate correctly, so you may sometimes want to say "refrain from using Apertium in this message group".
Repeating my experience at https://phabricator.wikimedia.org/T177434#3658542. As an en-es translator, I find Apertium is usually just bad, and its translations often make absolutely no sense. I'd love to see it removed for Spanish.
Just an example out of many. The proposed Spanish translation for MediaWiki:Parsoid-stash-rate-limit-error by Apertium is as follows:
- Stashing Falló porque límite de índice estuvo superado. Complacer probar otra vez más tarde.
That is just eye-wateringly horrible and makes no sense at all.
Hi, I translate EN -> FR and I no longer rely on Apertium; sometimes it makes me laugh. There are a lot of syntax errors, untranslated words, mistakes... I understand it is not easy to translate automatically, and people from Apertium explained to me that they use intermediate languages as steps to get the result. So I got rid of it and translate on the fly. Sometimes I use Google Translate, but that means that the EN form is not clear enough (e.g. where is the subject?) or appears ambiguous depending on my cultural background. Google Translate remains the best for me, very clever at solving the grammatical tricks FR has. I agree with @MarcoAurelio.
Sometimes, its translations can be pasted and corrected (usually when the MediaWiki message is shorter). Sometimes. But I fear that some users will unintentionally misuse the machine translation, pasting it in without correcting it.
I mostly remember it offering inconsistent translations into Afrikaans, which doesn't make any sense when translating to Dutch. It also mangled formatting and never showed up consistently. The existing Google suggestions are miles ahead of whatever Apertium is doing and almost always present. I wouldn't be sad to see it go, at least for Dutch.
I don’t think anyone should use machine translations as-is anyway (at least for longer messages). As for the question itself, I think that for Russian having translations from Ukrainian and Belarusian is helpful sometimes, since you can see how someone else interpreted a message if its text is not clear to you.