CLDR and pt
Pt is the same as pt-BR in CLDR, but in MediaWiki pt is pt-PT. Siebrand has proposed a solution to fix this discrepancy. This is still causing problems at least with the i18n tools extension which uses data from CLDR.
I'm going to be long, and ask for your patience here. There is a problem in the CLDR -> MediaWiki mapping. CLDR has the languages:
- pt
- pt-BR
- pt-PT
and its policy is that the main pt is the same as the variant with the largest population. So:
- "CLDR pt" = "CLDR pt-BR".
MediaWiki does not have pt-PT. Its languages are:
- pt
- pt-BR
to avoid a third L10n/i18n. But MediaWiki then maps "MediaWiki pt" to "CLDR pt" instead of the correct "CLDR pt-PT". So,
- "MediaWiki pt" = "CLDR pt" = "CLDR pt-BR".
For any purpose external to MediaWiki, this makes MediaWiki's languages:
- pt-BR
- pt-BR
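To make the mapping problem concrete, here is a minimal sketch of the lookup described above. It is not MediaWiki or CLDR code: the tables `NAIVE_MAP` and `CORRECTED_MAP` and the function `language_name` are hypothetical names, and the data is limited to the two names for Polish quoted in this thread.

```python
# Toy data: only the names quoted in this thread, not real CLDR files.
CLDR_LANGUAGE_NAMES = {
    "pt": {"pl": "polonês"},     # CLDR "pt" carries the Brazilian content
    "pt-BR": {"pl": "polonês"},
    "pt-PT": {"pl": "polaco"},
}

# Hypothetical mapping tables from MediaWiki code to CLDR locale.
NAIVE_MAP = {"pt": "pt", "pt-br": "pt-BR"}         # the behaviour reported above
CORRECTED_MAP = {"pt": "pt-PT", "pt-br": "pt-BR"}  # the mapping asked for


def language_name(mw_code, target, code_map):
    """Return the name of language `target` as shown to a user of `mw_code`."""
    cldr_locale = code_map[mw_code]
    return CLDR_LANGUAGE_NAMES[cldr_locale][target]


print(language_name("pt", "pl", NAIVE_MAP))      # Brazilian form
print(language_name("pt", "pl", CORRECTED_MAP))  # European form
```

With the naive table, a user whose language is "MediaWiki pt" gets the Brazilian name; only the corrected table yields the European one.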
Ensuing issues
This has vast implications, but some I've come across which seem related are:
- My language here is pt. Yet when I use {{#languagename:pl}} the extension gives me the result "polonês" (from "CLDR pt") instead of "polaco" (from "CLDR pt-PT"). In other words, the language names in the MediaWiki universe are incorrect for users with the language pt. Similarly, date formats, currency designations, etc. are presumably incorrect for such users as well. Please remember that we corrected the PLURAL thing the other day, so that's fine.
- FreeCol seems to ignore the existence of the "MediaWiki pt" language. So, although it has been translated here, it's not used at all in FreeCol. I am guessing that this is related, because the pt translations have been complete for some time and are not in FreeCol. Please let me know if they will be used, because there are many messages and I'd like to be sure before reviewing some issues in the current translation.
How can we get around all this? Anything I can do to help?
FreeCol has FreeColMessages_pt_BR.properties and FreeColMessages_pt_PT.properties, and we map "pt: pt_PT" and "pt-br: pt_BR". So there is no issue there, is there?
As for CLDR, I do not have an answer, nor a solution. I will ask Niklas to take a look at this.
Yep, you're right about FreeCol. Right now, it seems the issue is that they haven't uploaded the translations for quite some time. We have translations from the beginning of the year which are not in 0.8.4 (Oct release) yet. I'll follow this up with them, so no issue on this side.
As for CLDR, thanks for following this up. It's quite important.
By the way, the FreeCol issue is being followed up here: https://sourceforge.net/tracker/?func=detail&aid=2901948&group_id=43225&atid=435578 (don't know if the URL will work as login is required).
I guess the problem is that there were once pt, pt-PT and pt-BR translations. I disabled one of those some time ago, but there should be two variants in trunk at least (not sure about the ages old 0.8.x).
According to them, they weren't reloading already translated messages. So changes made here apparently were not committed. But they've assured me that version 0.9 of the software (release date unknown) will reload all messages. If somehow you guys could make sure that they sync to the appropriate language codes, that would be great. I'll review the translations soon.
FreeCol already has pt_PT and pt_BR in trunk and the codemap looks like this:
pt: pt_PT
pt-br: pt_BR
Can you test the trunk version and see that they are correct?
Sorry Nike, but how do I go about downloading the trunk version for Windows? I'm unable to compile it myself, so I would need a version I can install with a click. Currently they have 0.8.4 and 0.9.0 available in this way, but those are too old. I looked on SourceForge but couldn't figure out if such a thing even exists for the trunk version.
You can download pre-release from http://www.freecol.org/news/freecol-0.9.0-alpha2-released.html
Hummm... I could be wrong about this being the cause for missing languages in other packages. FUDForum only lists one theme, "default - portuguese", and presents the "pt" messages. So the "pt-BR" messages are unused. Can we get them to list both, perhaps adding a "default - portuguese (br)" and placing the "pt-BR" messages in it? Anything I can do to help?
FUDforum has 'pt' and 'pt-br'. What's the problem?
P.s. I see you are uploading screenshots of the FreeCol installation. Please report those issues in the FreeCol bug tracker.
Only language names are extracted from CLDR, everything else should be correct. We could introduce pt-PT locale, but I don't think we could change the meaning of pt without causing massive outcry.
Let me provide the terms for getting that done:
- organise a vote on pt.wp that the 'pt' code should use 'pt-br' as default. I would recommend that the vote run for at least a month to allow sufficient input from Portuguese Wikipedians
- in case of an outcome in favour of using the Brazilian variant, create a request in bugzilla:
The result of processing the bugzilla: request would be the following:
- all translations in /pt would be put in 'pt-pt'
- all source code repositories would be updated - lot of work, but we would do that.
- 'pt' would be made empty, and fall back to 'pt-br'
- all current 'pt' special page names, etc, that differ from those in 'pt-br' would be added to 'pt-br' as aliases for backward compatibility.
+Mention it in the release notes for everybody else other than Wikimedia projects.
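As a rough illustration of what the first and third steps of that outcome would mean for message lookup, here is a small sketch. It is not MediaWiki code: `FALLBACK`, `migrate` and `get_message` are hypothetical names, the fallbacks of 'pt-br' and 'pt-pt' to 'en' are assumptions, the message key and texts are placeholders, and the alias and repository steps are not modelled.

```python
# Assumed fallback table after the change; only pt -> pt-br comes from the
# proposal above, the other two entries are assumptions for this sketch.
FALLBACK = {"pt": "pt-br", "pt-br": "en", "pt-pt": "en"}


def migrate(messages):
    """Move everything from 'pt' into 'pt-pt' and leave 'pt' empty."""
    migrated = {code: dict(msgs) for code, msgs in messages.items()}
    migrated["pt-pt"] = migrated.get("pt", {})
    migrated["pt"] = {}
    return migrated


def get_message(messages, code, key):
    """Look up `key` for `code`, walking the fallback chain until found."""
    seen = set()
    while code is not None and code not in seen:
        seen.add(code)
        if key in messages.get(code, {}):
            return messages[code][key]
        code = FALLBACK.get(code)
    raise KeyError(key)


# Placeholder texts, not real translations.
before = {
    "pt": {"greeting": "PT text"},
    "pt-br": {"greeting": "BR text"},
    "en": {"greeting": "EN text"},
}
after = migrate(before)
print(get_message(after, "pt", "greeting"))     # now served from pt-br
print(get_message(after, "pt-pt", "greeting"))  # the former pt content
```

The point of the sketch is only that every existing 'pt' user would silently start seeing 'pt-br' text, which is why a vote and release notes are asked for.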
If I understood correctly Nike's input, then the "CLDR pt-PT" -> "MediaWiki pt" mapping is already done correctly in MediaWiki, so no issue there.
As for adopting the CLDR main language thingy in MediaWiki (i.e. having a separate "pt-PT", then making pt fall back to pt-BR) and Siebrand's terms for getting it done, I don't regard it as necessary. It's somewhat silly in itself, and as long as we're clear about what "pt" and "pt-br" mean in MediaWiki (and we are), we should be fine. Does anyone disagree?
This means that the only open issues here would be:
- a bug in extension Language Names (Version 1.7.1 (CLDR 1.7.1)), because it maps "CLDR pt" -> "MediaWiki pt" when, in fact, the mapping should be "CLDR pt-PT" -> "MediaWiki pt". If we all agree on this, I can follow it up (any special procedure for that?).
- any future extensions resorting to CLDR may commit the same mapping error - I'd suggest we deal with that on a case-by-case basis, as they turn up. Does anyone disagree?
Right. I think I may have misunderstood. So the CLDR conversion script needs to have code mapping. Niklas?
Will look into it when I have time. (Should be easy to do if anyone wants to hack the code though)
Just to ensure we're in sync, what you guys are talking about will resolve the Language Names extension issue, right? So, I won't need to follow it up independently.
Is it correct now?
Thanks for addressing this, Nike. Well... I'm unsure :-). We have:
- {{#languagename:pl}} now returns the correct "polaco", so that is certainly fixed.
- But, on the "Other languages" box, in the "Intro" pages of this wiki, it still says "polonês" (incorrect) instead of the correct "polaco". Could this be due to the cache, perhaps?
- On the weird side of things, though, in the side bar, under section "Recent changes", there was a change from "Traduções em português" (correct) to "Traduções em Portuguese" (incorrect). It seems to have reverted to the English language name... any idea why?
- The sidebar's "in other languages" box comes from MediaWiki.
- Don't know about the translations in ... thing. Maybe the pt-pt data in CLDR only has overrides for the pt translations?
Yes, it should be the case that the pt-PT variation only has overrides, pretty much like all other variations which CLDR does not consider main. So, I guess it needs to be retrieved in the same way as en-GB, de-AT, de-CH, etc. Is that possible?
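For what it's worth, retrieving a variation that only has overrides could look roughly like the sketch below. This is not the actual CLDR conversion script: `resolved_names`, `parent_locale` and `name_without_inheritance` are hypothetical names, the parent is assumed to be found by dropping the last subtag, and the data is a toy set built from the names quoted in this thread plus the English names.

```python
# Toy data for illustration only.
ENGLISH_NAMES = {"pl": "Polish", "pt": "Portuguese"}
CLDR_NAMES = {
    "pt": {"pl": "polonês", "pt": "português"},  # main locale, full data
    "pt-PT": {"pl": "polaco"},                   # variation, overrides only
}


def parent_locale(locale):
    """Assumed parent rule: drop the last subtag (pt-PT -> pt)."""
    return locale.rsplit("-", 1)[0] if "-" in locale else None


def resolved_names(locale):
    """Merge the parent's names first, then apply the variation's overrides."""
    chain = []
    while locale is not None:
        chain.append(locale)
        locale = parent_locale(locale)
    merged = {}
    for loc in reversed(chain):  # root-most locale first
        merged.update(CLDR_NAMES.get(loc, {}))
    return merged


def name_without_inheritance(locale, target):
    """What happens if only the variation's own file is read."""
    return CLDR_NAMES.get(locale, {}).get(target, ENGLISH_NAMES[target])


print(name_without_inheritance("pt-PT", "pt"))  # falls through to English
print(resolved_names("pt-PT"))                  # overrides on top of pt
```

If the variation's file is read on its own, any name it does not override drops to English, which would be consistent with the "Traduções em Portuguese" symptom; merging on top of the main pt data keeps "português" and still picks up "polaco".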
May I be incredibly pushy here and nudge you a tiny little itsy bitsy bit? Maybe some hack possible?
I'd go for fixing the code discrepancy. That is out of my abilities however.