Localisation of page
Having the local name in brackets is the best we can do at the moment.
Speaking of the CLDR data... I wonder if we should change our stance and allow language name translation in twn (only once!) and try to push all we have into CLDR as one blob (someone would need to negotiate how to do that). Forcing everyone to go to CLDR directly seems to be too much for translators, and it puts some languages at a disadvantage because they are not even available in CLDR yet.
I'll give you the current state of play for Welsh and Swahili, as working examples. For Welsh, somebody has entered quite a lot of proposed terms into the survey tool, and I have voted on that and added some further proposals. 8 votes are needed for a term to be published on CLDR. However, I have come to a temporary halt with it, whilst waiting for a response from the linguists at the Bedwyr Centre for Celtic languages on a number of issues: the use of diacritic marks, the use of letters not normally part of the Welsh alphabet, principles for converting language names for unfamiliar languages into Welsh, and the like. Going on previous experience, I am not expecting them to answer in a hurry, and so realistically, the number of language names in the Welsh database is not going to increase significantly for at least another 6 months. If we were to set up the localisation of language names here, then the issue of the diacritic marks would disappear, because I would decide to use them, ignoring potential problems for users with outdated browsers which cannot cope with the diacritic marks (CLDR advises that this should be considered when deciding whether to include diacritic marks). And I would probably choose to follow the principles of localisation in the Academy Welsh Language Dictionary (which emphasises phonetic pronunciation over similarity to accepted 'international' versions of a language name). I would still like to get the opinion of the linguists at the Bedwyr Centre, but if they couldn't deal with my queries quickly then I would just go ahead anyway, subject to contributions from other Welsh translatewiki.net and Wikipedia users.
On Swahili, I have proposed a few mainly African language names on the survey tool after consulting dictionaries and native speakers. Some of the proposed terms already on CLDR are way off base: I know some are wrong, but am not in a position to come up with a correct term myself. However, I still haven't managed to set up any communications with professional linguists, because of various communication difficulties. The dictionary sources available to me contain only the names of very well-known languages. There are as yet no generally accepted written forms for less prominent languages, and I (and, I suspect, my Tanzanian co-translator also) would not really like to propose these terms on translatewiki.net until we had managed to consult with a professional linguist, who could either do the work for us, or propose principles for us to work with.
These two languages have both been written languages for a long time. Some of the languages supported here have been written for a short time only, or they are non-state languages with little opportunity to develop academic or other specialist vocabulary. I don't know whether other translators have other factors to contribute from their own experience.
It would be very nice to be able to see the language name for Breton in Welsh (Llydaweg) and Asu in Swahili (Kipare), both of which are correct, are already proposed (by me) on CLDR, but are waiting for votes before being published. So I like your idea to create a database here, which CLDR might be persuaded to accept as a bulk set of proposals (or a vote where a term has already been proposed). But creating the database here would not be straightforward, as described above.
For the languages that are not yet available on CLDR, it would be really good to have a database here on translatewiki.net.
For all languages, it would be good to be able to create a database, if MediaWiki could use a localised CLDR term first or, where one didn't exist, choose the translatewiki.net or MediaWiki database term instead. I don't know whether that is possible. If it were, we wouldn't have to depend on the speed of development of CLDR.
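The lookup order suggested here could be sketched roughly as follows. This is only an illustration: the function name `resolve_language_name`, the dictionary layout and the sample entries are hypothetical and are not existing MediaWiki or translatewiki.net code.

```python
from typing import Optional

# Hypothetical sample data: locale -> {language code -> localised name}.
cldr_names = {"cy": {"en": "Saesneg", "fr": "Ffrangeg"}}
twn_names = {"cy": {"br": "Llydaweg"}}
mediawiki_names = {"cy": {}}


def resolve_language_name(code: str, locale: str) -> Optional[str]:
    """Return the first localised name found, preferring CLDR."""
    for source in (cldr_names, twn_names, mediawiki_names):
        name = source.get(locale, {}).get(code)
        if name:
            return name
    return None  # the caller may fall back to the autonym or the raw code
```

With this ordering, a term entered here is used only until CLDR publishes its own; after that the CLDR term wins automatically.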
It's imho pretty easy to populate a database of language names, since ISO 639 has names already. Depending on the part of the standard (currently 1, 2B, 2T, 3, 5, plus 6 in preparation), we have English, French, and one or several autonyms (the name of a language in the language itself). They are all available for more or less automatic bulk download. Since I've done that already several times for my own tools, I'd be available to do the same for twn as well. I know there are a few special cases with dropped codes and altered meanings, but the bulk is fairly easy. Btw., the Babel extension is doing something very similar already.
Have you considered encouraging the Welsh linguists to work with the CLDR directly? Quite a number of such organizations contribute data directly to the CLDR. Please don't create yet another divergent process.
This is a sore point. As well as asking the Bedwyr Language Centre, who tell me that they are the 'official' contributors to CLDR, to comment on the principles of localising foreign proper nouns to Welsh for CLDR, I have also asked them to contribute terms and vote on those already contributed by others. I have deliberately held off from voting myself where the term to be adopted is not obvious, to allow them to take the lead in localising. I expect, however, to be waiting a long time, if past experience is anything to go by, before they can do anything with this, since their time and resources are heavily committed on all sorts of things. I intended to convey my cautious approach to the question of standardisation in my post above and am sorry that that was not clearly stated.
At the same time, there is nothing controversial about the word 'Llydaweg', meaning 'Breton language', which has been around for hundreds of years, and no-one needs a language specialist to rubber-stamp it. It would be nice to be able to have 'Llydaweg' in a database created here, for use here until it is superseded by the CLDR term, whenever that may become available. Whether that is practical is a matter for the developers here to address.
As it stands 'Llydaweg' is the unopposed entry. It has the status of 'contributed', but it certainly doesn't need 8 votes to be 'in CLDR' as things stand. It will show up as "<language type="br" draft="contributed">Llydaweg</language>" and twn (and other CLDR consumers) can decide whether "contributed" is better than nothing.
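A consumer could make that decision with a simple filter on the `draft` attribute. The following is a minimal sketch, assuming the usual LDML draft levels (a missing attribute means approved); the trimmed sample document and the function name `language_names` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Draft levels from most to least trusted; a missing attribute means "approved".
DRAFT_RANK = {"approved": 0, "contributed": 1, "provisional": 2, "unconfirmed": 3}

# Trimmed, hypothetical sample wrapped around the entry quoted above.
SAMPLE = """<ldml>
  <localeDisplayNames>
    <languages>
      <language type="br" draft="contributed">Llydaweg</language>
    </languages>
  </localeDisplayNames>
</ldml>"""


def language_names(ldml_text: str, min_status: str = "contributed") -> dict:
    """Collect language names whose draft status is at least `min_status`."""
    limit = DRAFT_RANK[min_status]
    names = {}
    root = ET.fromstring(ldml_text)
    for el in root.iterfind("localeDisplayNames/languages/language"):
        status = el.get("draft", "approved")
        # Unknown status values are treated as less trusted than any known one.
        if DRAFT_RANK.get(status, len(DRAFT_RANK)) <= limit:
            names[el.get("type")] = el.text
    return names
```

Calling `language_names(SAMPLE)` keeps the Breton entry, while `language_names(SAMPLE, "approved")` drops it, which is the "better than nothing" choice left to each consumer.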
You said that you held off on voting; did you at least enter the terms? The time for submitting data has now passed in most cases. What you can (could) have done is to enter an option and then change your vote back to n/o (no opinion); that puts an entry in but doesn't yet vote for it.
Thanks for participating, and please watch for a mass e-mail soon about the vetting phase.
We should only do that if we also import what CLDR already has (and we should be able to maintain that properly with a conversion/maintenance script), and if we take both language names and country names, IMO. Possibly also currency names, currency signs, currency abbreviations, time zones. It should be a separate product/namespace.
New CLDR data is published roughly twice a year. We need to go with CLDR's work cycle.
Between the publication of new data and the moment their survey tool is opened again for new submissions, we can simply rely on the published data.
When the survey tool is open, we could use it to poll new suggested data, but
- it does not fit our data structure, since they do not alter or add authoritative data, but rather collect sets of new/additional suggestions per item. Which one would we anticipate to become final?
- I cannot recommend polling because of performance considerations on either side, and we would at least have to ask for permission, because we could degrade their (sometimes flaky) server performance.
When vetting is open, the situation is basically very similar.
After the vetting phase, until the new data is published, I think we cannot get the new data, but it may still be possible to use the survey tool read-only, while data should remain unaltered.
Conclusion: Unless we find a special arrangement with CLDR, we can likely treat their data as static and stable from publication until the survey tool is reopened, and maybe even until it is closed. If we are able to find an arrangement with them that allows us to bulk submit our new data as suggestions towards the end of the survey phase, we can then start our new collection phase right after the (last) submission.
Using a bot to supply data via the survey tool is likely possible. I cannot estimate the labour needed to make such a bot, but one should not be hard to find with a little research. Whether or not CLDR would accept that is unclear. Usually, accounts are given to individuals, allowing them access to two locales: und (undefined) for trials and tests, and the "real" one they work on. We would need a more universal account anyway. It may be easier to supply our data in a common exchange format such as CSV or XML (LDML), e.g. via e-mail or upload. I suggest finding someone on their staff and talking about this.
Nike's original proposal included us localising language names here first and then transferring the whole database to CLDR. I think it might be more realistic and effective if we were to tackle the transfer to CLDR for a few locales, or just one, at a time. That way, we can prioritise the languages which most need this leg up, and which also have enough translators to complete the localisation.
So, any volunteers to pick up the communication?
I did, by e-mail, but no reply yet. I may have to poke several potential partners and shall do that one by one. May take a week or two.
Sorry, it looks like I didn't reply properly. Anyway, I just saw your e-mail (I was away from my mail over the weekend, after working very hard on preparing for the ST vetting phase to open). Please do not poke potential partners! It would be better to just file a single bug, but I'm glad to forward the e-mail to the TC if you prefer that.
Last Monday I got an e-mail reply from Srl295, who asked for permission to forward the note on to the CLDR technical committee, which I granted.
A rough sketch for the translatewiki.net technical part as I currently see it:
- We could use locales provided in the Locale Data Markup Language (LDML), an XML format designed by CLDR, to extract a set of key => value pairs per locale, which can then be imported into its own namespace in translatewiki.net, possibly including limited amounts of message documentation.
- The inverse process could be used to generate files with suggestions of additions and updates to be provided to CLDR.
- Other types of export formats are possible of course as well.
- While this scheme could transport all possible locale items, we do not have to support them all.
- I can do the necessary coding.
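To make the first two points of the sketch more concrete, here is a rough illustration of the extraction and its inverse, limited to language names. The `language-<code>` key scheme and both function names are assumptions made for this example; nothing here is agreed with CLDR or implemented in translatewiki.net, and the file layout CLDR would accept for bulk suggestions is still unknown.

```python
import xml.etree.ElementTree as ET


def ldml_to_messages(ldml_text: str) -> dict:
    """Flatten <language> entries into hypothetical 'language-<code>' keys."""
    root = ET.fromstring(ldml_text)
    return {
        f"language-{el.get('type')}": el.text
        for el in root.iterfind("localeDisplayNames/languages/language")
    }


def messages_to_ldml(locale: str, messages: dict) -> str:
    """Inverse step: build an LDML fragment from key => value pairs."""
    root = ET.Element("ldml")
    identity = ET.SubElement(root, "identity")
    ET.SubElement(identity, "language", type=locale)
    display = ET.SubElement(root, "localeDisplayNames")
    languages = ET.SubElement(display, "languages")
    for key, value in sorted(messages.items()):
        # Strip the hypothetical 'language-' prefix to recover the code.
        code = key.split("-", 1)[1]
        el = ET.SubElement(languages, "language", type=code)
        el.text = value
    return ET.tostring(root, encoding="unicode")
```

A round trip such as `ldml_to_messages(messages_to_ldml("cy", {"language-br": "Llydaweg"}))` gives back the same dictionary, which is the property a conversion/maintenance script would need to preserve.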