Language names not in Unicode CLDR
I have received an e-mail from John Emmons of CLDR concerning ticket 6763 at CLDR, as follows:
"I am starting to prepare to do work on this ticket that you opened - requesting new language names be added to CLDR. This ticket was presented to the CLDR TC a few weeks back and the concept was generally approved by the committee, pending some confirmation that you or someone else at translatewiki will be able to provide us a reasonable amount of translated material for these new language names. As has already been pointed out by my colleagues, many of these will not fall into the "modern coverage" bucket that the "big players" such as Google and Apple will translate to. Without a plan to offer translated material ( either via bulk upload or via survey tool entry ), adding these additional languages would be a virtually pointless exercise on our part.
So, if you can offer a plan that will convince me that this is worth doing, I'm agreeable. But we need to act pretty quickly, as I would want to have this all in place to open CLDR 26 data entry on May 1."
He wrote again wanting a response by 18th April. Unfortunately, I have not had time to post this here until today. Are there any translators interested in putting language name translations onto CLDR? If so, please reply to this thread and mention the code of the language into which you normally translate.
If you would be willing to provide translations but not enter them on CLDR, please mention this and I will let CLDR know.
It looks as though these languages are going to be added, except for some non-standard codes and the codes als, bcc, bcl, bxr, diq, mhr and pnb, which are all macrolanguages. Am I right in thinking that adding these might cause a problem further down the line if the locales are migrated to actual language codes instead of macrolanguage codes?
When the new codes are live on CLDR I will put something on translatewiki.net news about this. Could we also provide some publicity on the central banner, to see if we can encourage translators to contribute to CLDR?
I'd say not to bother about those non-standard codes. Sure, we can use the sitenotice once those language names are added to the English source for CLDR, but first we could send direct messages to Language support team members for languages with existing CLDR locales. In the meantime I'll email Amir, Santhosh and the CLDR survey tool admin to figure out account creation for our translators.
Language names were added to English source yesterday! http://unicode.org/cldr/trac/changeset/10166 In May we'll translate them. :)
I see some of the needed translations into Japanese (and a couple of other languages) were added at https://git.wikimedia.org/tree/mediawiki%2Fextensions%2Fcldr/HEAD/LocalNames . Can someone merge them?
whym, yes, you can. :) Send me an email and I'll add you to the CLDR survey tool as soon as possible. Let me know if you want to add translations in all those languages or only Japanese.
I have reviewed the list of aliases at CLDR. Apart from the macrolanguages als, bcc, bcl, bxr, diq, mhr, pnb and rmy, there are 3 codes on this list that are used at translatewiki.net:
- mo - Moldovan, deprecated in CLDR. CLDR uses ro_MD.
- sh - Serbo-Croatian in translatewiki.net, Serbian (Latin) in CLDR. CLDR uses sr_Latn for Serbian (Latin).
- tl - Tagalog in twn, Filipino in CLDR. CLDR uses fil for Filipino.
These 3 codes are already in CLDR so I assume there must be a way of mapping the CLDR code to the twn code.
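To illustrate what such a mapping could look like, here is a minimal sketch in Python. It only covers the three pairs listed above; the names TWN_TO_CLDR, to_cldr and to_twn are made up for this example and are not taken from CLDR or from any existing translatewiki.net code.

```python
# Hypothetical alias table: translatewiki.net (twn) code -> CLDR identifier.
# Only the three pairs mentioned in this thread are included.
TWN_TO_CLDR = {
    "mo": "ro_MD",    # Moldovan, deprecated in CLDR
    "sh": "sr_Latn",  # Serbo-Croatian in twn, Serbian (Latin) in CLDR
    "tl": "fil",      # Tagalog in twn, Filipino in CLDR
}

# Reverse table, built once, for going from CLDR back to twn.
CLDR_TO_TWN = {cldr: twn for twn, cldr in TWN_TO_CLDR.items()}


def to_cldr(twn_code: str) -> str:
    """Return the CLDR identifier for a twn code, or the code unchanged."""
    return TWN_TO_CLDR.get(twn_code, twn_code)


def to_twn(cldr_code: str) -> str:
    """Return the twn code for a CLDR identifier, or the identifier unchanged."""
    return CLDR_TO_TWN.get(cldr_code, cldr_code)
```

Codes that are not in the table pass through unchanged, so the lookup can be applied to every locale without special-casing the ones that already match.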
Thanks, that's useful. The other day I was stupidly wondering how CLDR could not have Tagalog as a locale... I'm not sure about aliasing, but surely one bug should be filed for each of those languages to be renamed to its proper language code; can you do that? At least tl sounds uncontroversial.
I think that Siebrand is already aware of these, and will know better than I whether they should be changed.
What we commonly call "Tagalog" in Wikimedia is the "Filipino" (or Pilipino) language in standards. But the language code "tl" is ambiguous: it can be considered a macrolanguage encompassing traditional Tagalog and modern Filipino. Filipino has its CLDR data under its standard code as an individual language. Note that traditional Tagalog was not written in the Latin script, and was not as heavily creolized by extensive borrowing and significant simplifications of the phonology. "tl" is not recommended but, as a macrolanguage, can be treated like "zh" for Chinese (even though most of the time that just means modern Mandarin, usually in the simplified version of the Han script). "tl-Tglg", by contrast, designates only the traditional language (modern Filipino is almost never written in the traditional script, which is probably why "tl" is not standardized as including Filipino). Wikimedia makes an exception to that view on its localized sites (but not in Wiktionary, which prefers more precise language codes).