Language names not in Unicode CLDR

Thread summary:

CLOSED This issue is closed by Nemo_bis

Ticket 6763 on CLDR requesting help from our translators to translate language names
Language names added to the English source for CLDR.
We're now able to quickly create accounts for our users wishing to translate them in their language.
- Send an email to get access.
- We asked Language support teams, translators-l, mediawiki-i18n and m:Tech/News subscribers; for CLDR 26, 22/42 «users have submitted 11155 items» [1].
- 4200 changes coming from CLDR 25.
Wrap-up: https://lists.wikimedia.org/pipermail/mediawiki-i18n/2014-September/000916.html

Patch is up at gerrit:161912, tracked there from now on.

Hello! We're working on a template for language names at Catalan Wiktionary ([1]). We use {{#language:code|ca}} which is mw:Extension:CLDR (Version 4.0.0 (CLDR 22.1)). I've read FAQ#Language_names, but I can't find at Unicode the "Extremaduran" language (Portal:Ext), so I don't know if it would be correct to apply there the Catalan and Spanish translations (Extremaduran is English). Look at this:

{{#language:ext|ca}} => extremeny
{{#language:ext|es}} => Extremaduran
{{#language:ext|en}} => Extremaduran
{{#language:ext|ext}} => estremeñu

How can I make {{#language:ext|ca}} give "extremeny" (ref1, ref2), and {{#language:ext|es}} give "extremeño" (ref1)? Thanks!

Aleator (talk)‎

See CLDR or http://cldr.unicode.org/index/process. We cannot help you more with CLDR at the moment, other than possibly creating an account for you to input data there (and I think that requires an existing locale).

Siebrand‎

It does require an existing locale. See details at CLDR#Creating a new locale.

Purodha Blissenbach (talk)‎

Catalan already has a locale at CLDR.

I think that in order to add Extremaduran to the list of language names which can be localised at CLDR you need to file a ticket to make a change request at CLDR Change Requests. I don't think you need an account to do this. The process for requesting data to be added (to common data I think) is described at "CLDR process". You could try explaining why you want it added to the list of language names.

Lloffiwr (talk)‎

OK, thank you all for your interesting comments. In the end I will not ask Unicode for new locales (in the plural; Extremaduran was only one of several languages that have no translations). My mistaken assumption was that Unicode's CLDR supported the translation of the names of "all" languages on Earth. But "Unicode CLDR 22.0 contains data for 215 languages..." (http://cldr.unicode.org/index/downloads/cldr-22).

Just out of curiosity, I wonder where the two words "Extremaduran" and "estremeñu" come from. As far as I can guess, the CLDR extension feeds from Unicode's CLDR, but they are not there. Perhaps they come from Names.php. Anyway, thanks again.

Aleator (talk)‎

Yes, from Names.php
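
For readers following along, here is a minimal Python sketch of the kind of fallback that would produce the four outputs quoted at the top of this thread. The three tables and the lookup order are assumptions made for illustration only; this is not the actual code of the CLDR extension or of Names.php.

```python
# Hypothetical tables, filled only with the names quoted in this thread.
LOCALISED_NAMES = {("ext", "ca"): "extremeny"}  # stand-in for localised (CLDR-style) data
ENGLISH_NAMES = {"ext": "Extremaduran"}         # stand-in for an English fallback table
AUTONYMS = {"ext": "estremeñu"}                 # stand-in for native names

def language_name(code, in_language):
    """Return the name of language `code` as shown to readers of `in_language`."""
    # A language asked about in itself gets its native name.
    if code == in_language and code in AUTONYMS:
        return AUTONYMS[code]
    # Otherwise prefer a localised name when one exists.
    localised = LOCALISED_NAMES.get((code, in_language))
    if localised is not None:
        return localised
    # Fall back to English, then to the native name, then to the bare code.
    return ENGLISH_NAMES.get(code, AUTONYMS.get(code, code))

for target in ("ca", "es", "en", "ext"):
    print(target, "=>", language_name("ext", target))
```

Under these assumptions, a missing Spanish entry is what makes the "es" lookup fall through to the English name.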

Purodha Blissenbach (talk)‎

I think that the languages supported here but not in the group of language names which can be localised at CLDR, the "Locale Display Names - Languages", are:

aeb, akz, aln, als, aro, arq, ary, arz, ase, avk, azb, bar, bbc, bcc, bcl, bew, bfq, bjn, bpy, bqi, brh, bxr, cbk-zam (not a 3 letter code), cps, diq, dtp, egl, eml, esu, ext, fit, frc, frp, gag, gan, gbz, glk, gom, guc, gur, hak, hif, hsn, izh, jam, jut, kgp, khw, kiu, koi, kri, krj, lez, lfn, lij, liv, lmo, ltg, lzh, lzz, mhr, mrj, mwv, mzn, nan, njo, nov, pcd, pdc, pdt, pfl, pms, pnb, pnt, prg, qug, rgn, rif, rmy, rtm, rue, rug, saz, sdc, sei, sgs, sli, sly, stq, szl, tcy, tkr, tly, tru, tsd, ttt, vec, vep, vls, vmf, vro, wuu, xmf, yrl, zea.
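
A list like this can be derived as a plain set difference. The sketch below is only illustrative: the two sample lists are made up for the example, and in practice they would come from the translatewiki.net language list and from the CLDR "Locale Display Names - Languages" section.

```python
def missing_from_cldr(supported_codes, cldr_codes):
    """Return, sorted, the codes supported locally but absent from the CLDR list."""
    return sorted(set(supported_codes) - set(cldr_codes))

# Hypothetical sample inputs, not the real lists.
supported_here = ["ca", "es", "ext", "szl", "vec"]
localisable_at_cldr = ["ca", "es"]

print(missing_from_cldr(supported_here, localisable_at_cldr))
```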

I am willing to request the addition of these languages for localisation at CLDR. When submitting these to CLDR we could also:

- provide the English names used by MediaWiki for these languages where known
- ask the registered translators for their preferred English name, where missing
- check with registered translators whether the existing English name is still the preferred name.

If we think that any or all of the above are worth doing, then I am willing to do the work involved.

Lloffiwr (talk)‎

Lloffiwr, I think it's worth sending them this list. You can file a request in their trac and they'll tell you if that's enough (unlikely) or they want more (possible) or they're not interested.

Nemo (talk)‎

I've put this on my todo list.

Lloffiwr (talk)‎

I have received an e-mail from John Emmons of CLDR concerning ticket 6763 at CLDR, as follows:

"I am starting to prepare to do work on this ticket that you opened - requesting new language names be added to CLDR. This ticket was presented to the CLDR TC a few weeks back and the concept was generally approved by the committee, pending some confirmation that you or someone else at translatewiki will be able to provide us a reasonable amount of translated material for these new language names. As has already been pointed out by my colleagues, many of these will not fall into the "modern coverage" bucket that the "big players" such as Google and Apple will translate to. Without a plan to offer translated material ( either via bulk upload or via survey tool entry ), adding these additional languages would be a virtually pointless exercise on our part.

So, if you can offer a plan that will convince me that this is worth doing, I'm agreeable. But we need to act pretty quickly, as I would want to have this all in place to open CLDR 26 data entry on May 1."

He wrote again wanting a response by 18th April. Unfortunately, I have not had time to post this here until today. Are there any translators interested in putting language name translations onto CLDR? If so, please reply to this thread and mention the code of the language into which you normally translate.

If you would be willing to provide translations but not enter them on CLDR, please mention this and I will let CLDR know.

Lloffiwr (talk)‎

It looks as though these languages are going to be added, except for some codes which are not standard and the codes als, bcc, bcl, bxr, diq, mhr, pnb which are all macrolanguages. Am I right in thinking that adding these might cause a problem further down the line if the locales are migrated to actual language codes instead of macrolanguage codes?

When the new codes are live on CLDR I will put something on translatewiki.net news about this. Could we also provide some publicity on the central banner, to see if we can encourage translators to contribute to CLDR?

Lloffiwr (talk)‎

I'd say not to bother about those non-standard codes. Sure, we can use sitenotice once those language names are added to the English source for CLDR, but first we could send direct messages to Language support team members for languages with existing CLDR locales. In the meanwhile I'll email Amir, Santhosh and the CLDR survey tool admin to figure out account creation for our translators.

Nemo (talk)‎

Language names were added to English source yesterday! http://unicode.org/cldr/trac/changeset/10166 In May we'll translate them. :)

Nemo (talk)‎

I see some of the needed translations into Japanese (and a couple of other languages) were added at https://git.wikimedia.org/tree/mediawiki%2Fextensions%2Fcldr/HEAD/LocalNames . Can someone merge them?

whym‎

whym, yes, you can. :) Send me an email and I'll add you to the CLDR survey tool as soon as possible. Let me know if you want to add translations in all those languages or only Japanese.

Nemo (talk)‎

I have added a news item; the banner can wait till 8 May, if I remember.

The survey tool on CLDR will be open for contributions from 8 May to 19 June, for those keen to contribute as soon as possible. If you already have an account at CLDR you can log in here.

Lloffiwr (talk)‎

I have reviewed the list of aliases at CLDR. Apart from the macrolanguages als, bcc, bcl, bxr, diq, mhr, pnb and rmy, there are 3 codes on this list, which are used at translatewiki.net:

mo - Moldovan, deprecated in CLDR. CLDR uses ro_MD.
sh - Serbo-Croatian in translatewiki.net, Serbian (Latin) in CLDR. CLDR uses sr_Latn for Serbian (Latin).
tl - Tagalog in twn, Filipino in CLDR. CLDR uses fil for Filipino.

These 3 codes are already in CLDR so I assume there must be a way of mapping the CLDR code to the twn code.
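
One simple way such a mapping could look is a pair of lookup tables, sketched here in Python. Only the three pairs listed above are taken from this thread; the function names are hypothetical, and whether "sh" should really be treated as equivalent to sr_Latn is an open question rather than something this sketch settles.

```python
# Code pairs as listed above: translatewiki.net code -> CLDR code.
TWN_TO_CLDR = {"mo": "ro_MD", "sh": "sr_Latn", "tl": "fil"}
# Reverse table, built once so both directions stay in sync.
CLDR_TO_TWN = {cldr: twn for twn, cldr in TWN_TO_CLDR.items()}

def to_cldr(code):
    """Map a translatewiki.net code to its CLDR code; unmapped codes pass through."""
    return TWN_TO_CLDR.get(code, code)

def to_twn(code):
    """Map a CLDR code back to the translatewiki.net code; unmapped codes pass through."""
    return CLDR_TO_TWN.get(code, code)

print(to_cldr("tl"), to_twn("fil"), to_cldr("ca"))
```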

Lloffiwr (talk)‎

Thanks, that's useful. The other day I was stupidly wondering how could CLDR not have Tagalog as locale... I'm not sure about aliasing but surely one bug should be filed for each of those languages to be renamed to its proper language code, can you do that? At least tl sounds uncontroversial.

Nemo (talk)‎

I think that Siebrand is already aware of these, and will know better than I whether they should be changed.

Lloffiwr (talk)‎

What we commonly call "Tagalog" in Wikimedia is the "Filipino" (or Pilipino) language in standards. But the language code "tl" is ambiguous: it can be considered a macrolanguage encompassing traditional Tagalog and modern Filipino. Filipino has its CLDR data under its standard code as an individual language. Note that traditional Tagalog was not written with the Latin script, and was not so heavily creolized with lots of borrowings and major simplifications of the phonology. "tl" is not recommended, but as a macrolanguage it can be considered like "zh" for Chinese (even if most of the time that just means modern Mandarin, and most of the time in the simplified version of the Han script). "tl-Tglg", by contrast, only designates the traditional language (modern Filipino is almost never written in the traditional script, and that's probably why "tl" is not standardized as including Filipino). Wikimedia makes an exception to that view on its localized sites (but not in Wiktionary, which preferably uses more precise language codes).

Verdy p (talk)‎

The CLDR input period will reopen soon. For now it is beta-testing a new, faster and easier version of the site. Generally the input period takes about 1 month, followed by a vetting period of about 1 month (sometimes more if there is a lot of new input data, and performance problems). The next release follows within the next 2 weeks. We can expect a new release of CLDR at the beginning of summer. It could be good to request these 100 languages in the CLDR bug report to allow input there, even if Wikimedia has already started working on this language list (Wikimedia can also start discussing and fixing many of them, and these discussions can be considered during the CLDR vetting process).

Verdy p (talk)‎

Verdy: Bug 6763 on CLDR was opened by us, so there is no need to make a further report. Localised language names are taken from CLDR usually. This bug will only help those languages which have a locale at CLDR (currently 240 compared to 345 supported languages at translatewiki.net) - but that is a different issue. User Whym pointed out that we do have some local names data ourselves. It does mention on Translatewiki.net_languages#Language_names that it is possible to get language names added at Wikimedia if CLDR doesn't yet support the locale or the language name, but this won't help other projects. Entering data at CLDR is definitely preferable, where possible.

I hope to do a review of translatewiki.net supported languages again next winter to identify more new languages at translatewiki.net, which are not yet included on the CLDR language names on the survey tool. We can then raise a new bug at CLDR to get these added to the survey tool. If the localisation statistics at CLDR are good for the batch of new names just added, then our chances of getting another batch of language names added will be better. Basically, the more people who contribute localised language names at CLDR, the better!

Lloffiwr (talk)‎

When you posted this initial thread the CLDR input period was still not open. I replied at that time and it was still not open. This is no longer the case, as CLDR input has now started again, but it currently has some start-up problems with performance, so it is difficult to use: each input requires waiting for about 20-30 seconds after each click or submission. Otherwise the next clicks are handled asynchronously in the wrong order, or your clicks may be handled up to one minute later, when the screen content has finally changed, and the click goes to another element than the one actually clicked. For now, it's simply unusable for mass input. Additionally, there are frequent losses of sessions.

These problems in the CLDR Survey tool are not new (they have existed for many years), but version after version it gets worse, because the UI performs too many background requests and also because it constantly reflows the content, the whole page being one giant table.

The CLDR Survey tool has not been seriously designed, and it is unusable without a very solid PC: even with gigabytes of memory and an octo-core CPU, its JavaScript is dramatically slow, even at off-peak hours with fewer than a dozen users connected to it. It really lacks a good server (my own desktop PC is probably more powerful than the one used to host the CLDR survey).

So we cannot recommend that many people use it. It is simply more efficient to submit a bug report with the necessary data in XML format than to use the online tool (and in fact most of the content of CLDR has been submitted this way). The tool itself is only usable for vetting a few items.
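
As a rough illustration of what "data in XML format" could look like for language names, here is a minimal Python sketch that builds an LDML-style fragment with the standard library. The element names follow the usual LDML layout for locale display names; the helper name is hypothetical, and the exact form CLDR expects for a bulk submission is an assumption that should be checked with the CLDR TC before sending anything.

```python
import xml.etree.ElementTree as ET

def language_names_ldml(locale, names):
    """Build an LDML-style fragment holding language names for one locale.

    `locale` is the locale being translated into (e.g. "ca");
    `names` maps language codes to their names in that locale.
    """
    ldml = ET.Element("ldml")
    identity = ET.SubElement(ldml, "identity")
    ET.SubElement(identity, "language", {"type": locale})
    display = ET.SubElement(ldml, "localeDisplayNames")
    languages = ET.SubElement(display, "languages")
    for code in sorted(names):
        entry = ET.SubElement(languages, "language", {"type": code})
        entry.text = names[code]
    # encoding="unicode" makes tostring return a str rather than bytes.
    return ET.tostring(ldml, encoding="unicode")

# Example using the Catalan name quoted earlier in this thread.
xml_text = language_names_ldml("ca", {"ext": "extremeny"})
print(xml_text)
```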

Maybe the Wikimedia Foundation could provide a grant to the CLDR project if we want to use it as a source. But for now, Wikimedia projects would simply fare better by providing the data themselves, here on translatewiki.net, doing most of the work needed for creating and vetting data, even if we then submit it in XML format to the CLDR project, where it will be merged and vetted in a later version. The CLDR project is very slow to accept new data, too slow for us where we have more urgent needs (and my opinion is that our own community is larger and provides more data with more quality than the very small community on CLDR).

For now, the other "major" participants in CLDR (Microsoft, Google, Apple, IBM) did not contribute with the necessary resources they should have given to CLDR to make it effective. In fact they also have their own internal development processes and submit data very passively to CLDR, where cooperation is in fact very poor (really Google could have provided the needed technical infrastructure of servers, and a few of its web designers, but apparently it is not really interested in providing more locales than about 30-50 for which most of the needed data is already in CLDR and the rest can remain in English for Google... Microsoft, Apple and IBM are apparently on the same path and don't care much about our desire in Wikimedia to support more languages).

Even though I've been a member of the CLDR project and Unicode for years, I am still convinced that Translatewiki.net performs better than CLDR and provides more data, faster, in a more efficient way, with faster corrections of errors, faster delivery, and a much larger community of contributors and users.

However, the CLDR project is still good for the technical aspect of specifications. But for collecting the data itself I cannot recommend it: there are really a lot of errors in this data that were reported years ago and remain impossible to change year after year, because of too few votes and the near impossibility of involving more people in this Survey tool.

I don't say that CLDR is unneeded, but it should just be one (minor) source of data for Wikimedia projects (and for most other open projects, including Ubuntu and Launchpad). I propose to reverse the direction of interaction with CLDR for as long as the CLDR project does not scale better, with serious technical resources and more serious involvement of Google, Microsoft, Apple, IBM, Adobe, Oracle (and other large "official members" of the CLDR TC).

Verdy p (talk)‎

Our new users have made thousands of submissions already, so I'm glad to say the survey tool is fine at least for some people.

Nemo (talk)‎

Speaking of usability, on Firefox it was nearly unusable because of frequent freezes, but Chromium seemed to work well for me.

whym‎

The CLDR Survey tool uses overly long forms, and some browsers have severe problems handling long lists. Things are going better now that long lists have been split into sublists. But it's true that it is sometimes slow: the server sometimes delays its responses to the browser in a very strange way (sometimes several minutes after the change), and the display is not necessarily synchronized with the input that was done. I also don't like the fact that clicking on an item moves elements up and down on the page because of these delays: sometimes this causes vetting clicks to be sent to an undesired item.

So use the Survey tool with care: if you see some strange behavior, or if it suddenly starts being extremely slow, this is because too many pending requests have accumulated in the browser. Many of them are completed, but the completion event was not received, and these stale HTTP sessions increase the number of background threads and sessions up to the point that they may hang the browser (even Chrome or Chromium).

In Firefox the tool is clearly unusable (let's not speak about Opera...), but the tool also works in IE. The tool lacks some development and evolves slowly year after year; Google, IBM and Microsoft should offer more help to the few programmers maintaining it for the CLDR TC. The server side of this tool, however, is much more stable today and much faster than it was in past years.

Verdy p (talk)‎