User:Purodha/cldr and lib-c data

From translatewiki.net

We want to have cldr[1] and lib-c data in translatewiki.net. We want to:

  • use it,
  • make additions quickly and easily on-site,
  • contribute to the common libraries for everyone's use.

In order to do that, we need to have:

  • import modules
  • export modules

that have yet to be programmed.

Use the talk page for discussion and suggestions. Alter this page to answer open questions.

Timing and work considerations

Import is obviously the step to begin with. Export need not be ready before the data to be exported has been collected, and can even be postponed until someone is ready to take the data and use it.

A testing and development environment is nearly there, and is likely to be fully functional within days. Early commits can be made so that everyone can monitor progress and run their own tests if they like. Running the code at translatewiki.net would require staff action. Still, even broken code would be unlikely to break anything else by accident, since it is isolated from the rest.

Whether or not cldr and lib-c imports can be developed in one go is not clear at the moment. It depends on whether they are structurally equal or similar enough. Also, it has to be decided how to deal with non-identical values for identical keys, that is, when the two sources contradict one another.
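One way to make such contradictions visible is sketched below. It is only an illustration, assuming that both sources have already been reduced to flat key/value mappings with comparable keys, which is exactly what is still unclear. The function name `merge_sources` and the sample keys are hypothetical.

```python
def merge_sources(cldr, libc):
    """Merge two flat {key: value} mappings.

    Keys whose values agree, or that exist in only one source, go to
    `merged`. Keys with contradicting values go to `conflicts` as
    (cldr_value, libc_value) pairs and are left for a human to decide.
    """
    merged, conflicts = {}, {}
    for key in sorted(cldr.keys() | libc.keys()):
        in_cldr, in_libc = key in cldr, key in libc
        if in_cldr and in_libc and cldr[key] != libc[key]:
            conflicts[key] = (cldr[key], libc[key])
        else:
            merged[key] = cldr[key] if in_cldr else libc[key]
    return merged, conflicts


# Hypothetical sample data, not taken from either source.
merged, conflicts = merge_sources(
    {"day.mon": "Monday", "day.tue": "Tuesday"},
    {"day.tue": "Tue", "day.wed": "Wednesday"},
)
print(merged)
print(conflicts)
```

Keeping conflicting keys out of the merged result avoids silently preferring one source. Whether one source should win by default instead is a policy decision this sketch does not make.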

Import from cldr

Cldr offers locale data[1] for download from its website in LDML[1], an XML format that is easy to parse. It includes a hierarchy of items. Both the hierarchy levels and the items have names. That allows us to

  • generate local message keys by flattening the hierarchy, appending names to each other with a separator that cannot be part of a name.
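The flattening idea can be sketched as follows, using Python's standard `xml.etree.ElementTree`. This is a minimal sketch under assumptions: the separator `|`, the `tag[attr=value]` notation for a step, and the sample XML are all chosen for illustration and are not taken from cldr or translatewiki.net. Attributes are included in the step name because sibling elements often share a tag and differ only in an attribute.

```python
import xml.etree.ElementTree as ET

SEPARATOR = "|"  # assumption: a character that never occurs in a name


def step_name(element):
    """Name of one hierarchy step: the tag plus its attributes, if any."""
    attrs = ",".join(f"{k}={v}" for k, v in sorted(element.attrib.items()))
    return f"{element.tag}[{attrs}]" if attrs else element.tag


def flatten_ldml(xml_text, separator=SEPARATOR):
    """Return {flat_key: text} for every leaf element that carries text."""
    messages = {}

    def walk(element, path):
        name = step_name(element)
        if separator in name:
            # Enforce the rule that the separator cannot be part of a name.
            raise ValueError(f"separator {separator!r} occurs in {name!r}")
        path = path + [name]
        children = list(element)
        if children:
            for child in children:
                walk(child, path)
        elif element.text and element.text.strip():
            messages[separator.join(path)] = element.text.strip()

    walk(ET.fromstring(xml_text), [])
    return messages


# Simplified sample in the spirit of LDML, not a real cldr file.
SAMPLE = """<ldml>
  <localeDisplayNames>
    <languages>
      <language type="de">German</language>
      <language type="fr">French</language>
    </languages>
  </localeDisplayNames>
</ldml>"""

for key, value in flatten_ldml(SAMPLE).items():
    print(key, "=", value)
```

The explicit check matters because a convenient separator such as `/` does occur in some attribute values, for example in time zone identifiers, so the choice has to be verified against the actual data.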

There is roughly a 50% overlap between the languages that translatewiki.net has and the locales that cldr has. They are not named identically[2].

  • We need to decide what to do with locales that we currently do not support.
  • We need a mapping for the locales of cldr and languages of translatewiki.net.
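A mapping of this kind could start out as a small exception table, with every code not listed assumed to be the same on both sides. In the sketch below only the two pairs from footnote [2] come from this page. The function names, the fall-through rule, and the sample inputs are assumptions.

```python
# translatewiki.net code -> cldr locale; only the exceptions are listed.
TWN_TO_CLDR = {"pt": "pt-PT", "pt-BR": "pt"}
CLDR_TO_TWN = {cldr: twn for twn, cldr in TWN_TO_CLDR.items()}


def to_cldr(twn_code):
    """cldr locale for a translatewiki.net code (unchanged if not listed)."""
    return TWN_TO_CLDR.get(twn_code, twn_code)


def to_twn(cldr_locale):
    """translatewiki.net code for a cldr locale (unchanged if not listed)."""
    return CLDR_TO_TWN.get(cldr_locale, cldr_locale)


def split_locales(cldr_locales, twn_languages):
    """Split cldr locales into (supported, unsupported) on translatewiki.net."""
    supported, unsupported = [], []
    for locale in cldr_locales:
        target = supported if to_twn(locale) in twn_languages else unsupported
        target.append(locale)
    return supported, unsupported


# Hypothetical inputs; "xx" stands for a locale we do not support.
print(split_locales(["pt", "pt-PT", "xx"], {"pt", "pt-BR", "de"}))
```

The unsupported list is the input for the first decision above: what to do with locales that translatewiki.net does not currently support.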

There are occasional hints about proposed alternates in the cldr data.

  • Hints on alternates could go to either talk pages or editable parts of message documentation.

There is some explanatory documentation, hints, and format information available from cldr on various items. Some of it is in depth, some covers merely the data types. For some sorts of items, editors must know specific requirements in order to edit them correctly.

  • Can the information for editors be imported as a non-editable part of the message documentation?
    • Would it be legal? Likely yes, mostly because such texts do not qualify as copyrighted works, but it would be better to ask, or to check the license terms. Linking to them is not a good alternative, for reasons of practicability.

Is the XML tree structure identical for all languages? Likely not, since some information cannot be entered via cldr's survey tool[1] but influences the choices it offers.

  • How to deal with structures local to locales?
    • Those of lesser importance for translatewiki.net can be skipped in a first step if no other easy solution is found. There are likely none of considerable urgency.
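To find out how much structure is in fact local to single locales, the flattened keys of each locale could be compared. The following is a rough sketch, assuming each locale has already been turned into a collection of flat keys. The function name and the sample keys are hypothetical.

```python
def locale_specific_keys(keys_by_locale):
    """Return (common, extras).

    `common` is the set of keys present in every locale. `extras` maps each
    locale to the keys it has beyond that common set.
    """
    key_sets = {loc: set(keys) for loc, keys in keys_by_locale.items()}
    common = set.intersection(*key_sets.values()) if key_sets else set()
    extras = {loc: keys - common for loc, keys in key_sets.items()}
    return common, extras


# Hypothetical flat keys for two locales.
common, extras = locale_specific_keys({
    "de": ["months|1", "months|2"],
    "xx": ["months|1", "months|2", "eras|special"],
})
print(sorted(common))
print({loc: sorted(keys) for loc, keys in extras.items()})
```

Keys that show up only in `extras` are the candidates for skipping in a first step.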


Reimporting, that is importing a new cldr release once data already exists in translatewiki.net, is still to be assessed.
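One possible policy for reimporting, given here only as a starting point for that assessment, is a three-way comparison per key: the value from the previously imported release, the value from the new release, and the value currently on-site. The sketch assumes flat key/value mappings and that the previous release is still available. All names are hypothetical.

```python
def reimport(base, new, local):
    """Three-way comparison of flat {key: value} mappings.

    base  - values from the cldr release imported earlier
    new   - values from the new cldr release
    local - values currently held on-site
    Returns (result, needs_review).
    """
    result, needs_review = {}, []
    for key in sorted(base.keys() | new.keys() | local.keys()):
        b, n, l = base.get(key), new.get(key), local.get(key)
        if l == b:
            chosen = n          # untouched on-site: follow the new release
        elif n == b or n == l:
            chosen = l          # release unchanged, or both agree: keep on-site
        else:
            chosen = l          # both changed differently: keep, but flag it
            needs_review.append(key)
        if chosen is not None:  # None means the key was dropped
            result[key] = chosen
    return result, needs_review


# Hypothetical sample values.
result, needs_review = reimport(
    base={"a": "1", "b": "2", "c": "3"},
    new={"a": "1x", "b": "2", "c": "3n"},
    local={"a": "1", "b": "2l", "c": "3l"},
)
print(result)
print(needs_review)
```

In this sketch on-site edits are never overwritten automatically. Keys changed on both sides are kept as they are and listed for review.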

Import from lib-c

  to be added later

Exports

  to be added later

Editing considerations

  • Editing English data should be enabled for this group.
  • Editing is likely not to be called "translating" for this data.
  • It may be worth considering support for any language defined in ISO 639 for this data. Many things can simply be taken from various published sources[3] by anyone, so selectively restricting language support may be counterproductive.

Weblinks

Footnotes

[1]   See Weblinks above.
[2] For instance, translatewiki.net's pt is pt-PT for cldr, and translatewiki.net's pt-BR is pt at cldr.
[3] Such as international dialling directories, banks' lists of exchange rates, tourist guides, and many others.