User:Siebrand/An update on localisation in MediaWiki (2009)

=An update on localisation in MediaWiki (2009)=
This message was posted as an e-mail to mediawiki-l, mediawiki-i18n, wikitech-l, foundation-l, wikimediaindia-l and translators-l. This is the wiki page version.

On 31 December 2007 and 1 January 2009 I sent e-mails to these lists, to which this is a follow-up.

First things first, because not everyone reads e-mails completely:
 * MediaWiki localisation (that is the translation of English source messages to other languages) depends on you! If you speak a language other than English, care about your language in MediaWiki and Wikimedia and like translating, go to http://translatewiki.net, register a user and start contributing translations for MediaWiki and MediaWiki extensions. When your localisation is complete, keep coming back regularly to re-complete it and do quality control. Thank you in advance for all your contributions and effort.
 * The i18n and L10n area of MediaWiki requires continuous efforts. If this area of FOSS has your interest: we need your help. Please offer your development skills to further MediaWiki's i18n, L10n and translation capabilities.

All statistics are based on MediaWiki 1.16 alpha, SVN version r60527 (31 December 2009). Comparisons are to MediaWiki 1.14 alpha, SVN version r45277 (1 January 2009).

Introduction

 * Localisation or L10n - the process of adapting the software to be as familiar as possible to a specific locale (topic of this message)
 * Internationalisation or i18n - the process of ensuring that an application is capable of adapting to local requirements (out of scope of this message)

MediaWiki has a user interface definition for 362 languages (up from 348). Of those, at least 39 language codes are duplicates and/or serve a usability purpose[1]. Reporting on them, however, is not relevant. So MediaWiki in its current state supports 323 languages (up from 322). MediaWiki has 346 core language files (up from 326), of which 38 are redirects from the duplicates/usability group or just empty[2]. So MediaWiki has an active in-product localisation for 308 languages (up from 299).

The MediaWiki core product has several areas that can be localised:
 * regular messages that can and should be localised (2,369 - up 9% from 2,168)
 * optional messages that can be localised, which is mostly used for languages not using a Latin script (187 - up 8% from 173)
 * ignored messages that should not be localised (152 - up 2% from 149)
 * namespace names and namespace aliases (17 - no change)
 * magic words (142 - up 8% from 132)
 * special page names (88 - up 2% from 86)
 * other (directionality, date formats, separators, book store lists, link trail, and others)

Localisation of MediaWiki revolves around all of the above. Reporting is done on the regular messages only.
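
The growth percentages quoted in the list above can be reproduced with a few lines of Python. This is only a sketch to show how the figures are derived; the helper name `percent_change` is made up for this illustration, and rounding to whole percentages is an assumption based on how the numbers are presented.

```python
def percent_change(old, new):
    """Relative growth from old to new, rounded to a whole percentage."""
    return round((new - old) / old * 100)

# (count for r45277, count for r60527), as listed above
core_counts = {
    "regular messages": (2168, 2369),
    "optional messages": (173, 187),
    "ignored messages": (149, 152),
    "magic words": (132, 142),
    "special page names": (86, 88),
}

for name, (old, new) in core_counts.items():
    print(f"{name}: {new} (up {percent_change(old, new)}% from {old})")
```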

MediaWiki is more than just the core product. On mediawiki.org 1,500 extensions (up 25% from 1,200) have some kind of documentation. This analysis only takes the code currently present in svn.wikimedia.org/svnroot/mediawiki/trunk into account. The source code repository contains roughly 445 extensions (up 20% from 370). Most extensions in the MediaWiki Subversion repository now use the reference implementation for i18n. Currently 8,200 messages for MediaWiki extensions can be localised in a consistent way (up 37% from 6,000).

[1] als, be-x-old, ckb, crh, de-at, de-ch, de-formal, dk, en-gb, fiu-vro, gan, got, hif, kk, kk-cn, iu, kk-kz, kk-tr, ko-kp, ku, ku-arab, nb, ruq, simple, sr, tg, tp, tt, ug, zh, zh-classical, zh-cn, zh-sg, zh-hk, zh-min-nan, zh-mo, zh-my, zh-tw, zh-yue

[2] als, be-x-old, bh, ckb, ckb-latn, crh, de-at, dk, en-rtl, fiu-vro, gan, hif, hif-deva, ii, iu, kk, kk-cn, kk-kz, kk-tr, ko-kp, ks, ku, nb, pi, ruq, simple, st, tg, tp, tt, ug, zh-classical, zh-cn, zh-min-nan, zh-mo, zh-my, zh-sg, zh-yue
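
The language totals in the introduction follow from the two footnote lists by simple subtraction. The sketch below counts the codes in footnotes [1] and [2] and derives the number of supported languages and of active in-product localisations. The variable names are chosen for this illustration only.

```python
# Codes from footnote [1]: duplicates and/or codes that serve a usability purpose.
duplicate_codes = (
    "als be-x-old ckb crh de-at de-ch de-formal dk en-gb fiu-vro gan got hif "
    "kk kk-cn iu kk-kz kk-tr ko-kp ku ku-arab nb ruq simple sr tg tp tt ug zh "
    "zh-classical zh-cn zh-sg zh-hk zh-min-nan zh-mo zh-my zh-tw zh-yue"
).split()

# Codes from footnote [2]: core language files that are redirects or empty.
redirect_or_empty = (
    "als be-x-old bh ckb ckb-latn crh de-at dk en-rtl fiu-vro gan hif hif-deva "
    "ii iu kk kk-cn kk-kz kk-tr ko-kp ks ku nb pi ruq simple st tg tp tt ug "
    "zh-classical zh-cn zh-min-nan zh-mo zh-my zh-sg zh-yue"
).split()

ui_definitions = 362       # languages with a user interface definition
core_language_files = 346  # core language files

supported = ui_definitions - len(duplicate_codes)
active = core_language_files - len(redirect_or_empty)

print(len(duplicate_codes), "duplicate/usability codes ->", supported, "supported languages")
print(len(redirect_or_empty), "redirect/empty files ->", active, "active localisations")
```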

MediaWiki localisation in practice
MediaWiki localisation has moved further towards a centralised collaborative process at translatewiki.net in the past year. In 2008 some wikis were still translating in their own MediaWiki: namespace. The introduction of the LocalisationUpdate extension, especially on the Wikimedia Foundation wikis, has taken away the last advantage that local translation had over centralised translation: instant gratification. Translations that are committed to Subversion can be added to wikis without requiring software updates, as often as desired.

Few to no translations are submitted through the Bugzilla ticketing system or directly by SVN committers. Exceptions are the localisations of Hebrew, Cantonese, Simplified Chinese, Traditional Chinese, Classical Chinese and Persian, which are still actively maintained in SVN alongside the work of regular contributors in the centralised system.

The past, the present and the future
MediaWiki localisation has always been a volunteer effort, and I expect that it will remain so. 2009 brought a successful Google Summer of Code project, executed by Niklas Laxstrom, and the Wikimedia Foundation is supporting the localisation that takes place at translatewiki.net. Not only MediaWiki, but all Open Source projects that are supported there benefit from these developments. We want to keep using the Translate extension technology and expand on it, as well as nourish our translator base of nearly 2,000 translators by providing them with better tooling and more projects in 2010. Vereniging Wikimedia Nederland, the Dutch Wikimedia Chapter, has granted 2,000 Euro to Stichting Open Progress for the translatewiki.net Translation Rallies, which motivated translators to make more than 60,000 new translations for MediaWiki and its extensions in August and December 2009.

New opportunities lie in better support of Translation Memory technology and in more supported projects, to grow the community and allow the translators to spend their time as productively as possible, while still allowing all the socialising and collaboration features of MediaWiki. At the Google Summer of Code Mentor Summit there was interest from the KDE Documentation Project, the PHP Documentation Project, Pidgin, wxWidgets, and other projects. For translatewiki staff this was a confirmation that our approach works. The Translate extension, however, needs more development. If you want to work on an exciting extension that makes a difference in multi-language support for Open Source software and MediaWiki content pages that require structured translation, check out the Translate extension and help us make it better. Your help *is* needed and most welcome!

The Wikimedia Strategic Planning process that is currently taking place also allows for a broader perspective on the localisation of MediaWiki in a Wikimedia context. Support for several dozen MediaWiki extensions in the Wikia code repository is expected within the next few weeks. Wikimedia is, or soon will be, including a localisation score for language projects in its statistics, so that in a year we expect to be able to analyse whether localisation is a requirement for a rise in usage or a consequence of it.

MediaWiki localisation statistics
Daily statistics for MediaWiki and extension localisation have been available for the past two years. Over that period (arbitrary) milestones have been set for four collections of MediaWiki related messages. For the usability of MediaWiki in a particular language, the group 'core most used' is the most important. A language must reach the milestone for this first group for MediaWiki to have 'minimal support' for that language. Reaching further milestones indicates the maturity of a localisation:
 * core most used (469 messages): 98%
 * core (2,369 messages): 90%
 * Wikimedia extensions (2,700 messages): 90%
 * extensions (8,200 messages): 65%

Currently the following numbers of languages have passed the above milestones:
 * core most used: 147 (45.5% of supported languages - up 35% from 109 - goal of 130 passed)
 * core: 82 (25.4% of supported languages - up 21% from 68 - goal of 90 missed by 203 translations)
 * Wikimedia extensions: 44 (13.6% of supported languages - up 22% from 36 - goal of 50 missed by 1,500 translations)
 * extensions: 39 (12.1% of supported languages - up 86% from 21 - goal of 30 passed)
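
As a rough illustration of how the milestones work, the sketch below checks a set of translation counts against the four thresholds listed earlier. The function name and the example counts are hypothetical and are not real data for any language. The extension group sizes (2,700 and 8,200) are the rounded figures from this message, and treating a threshold as reached at exactly the stated percentage is an assumption.

```python
# Milestones from this message: group -> (messages in group, required share).
MILESTONES = {
    "core most used": (469, 0.98),
    "core": (2369, 0.90),
    "Wikimedia extensions": (2700, 0.90),
    "extensions": (8200, 0.65),
}

def passed_milestones(translated):
    """Return the names of the milestones reached, given translated counts per group."""
    passed = []
    for group, (total, threshold) in MILESTONES.items():
        if translated.get(group, 0) / total >= threshold:
            passed.append(group)
    return passed

# Hypothetical translation counts for a single language (illustration only).
example_language = {
    "core most used": 465,
    "core": 2100,
    "Wikimedia extensions": 2500,
    "extensions": 5000,
}

print(passed_milestones(example_language))
```

In this made-up case the language has 'minimal support' (465 of 469 is above 98%) and also passes the Wikimedia extensions milestone, but falls short on core and on all extensions.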

I think the changes in the past year are very satisfying. MediaWiki localisation has again improved enormously. Two of the four goals I set in last year's e-mail have not been reached (only one of four goals was reached for 2008). We nearly got there, though. Currently MediaWiki core contains 377,394 messages (up 24% from 303,863 at the end of 2008).

Conclusion
So... Is MediaWiki doing well on localisation? Just like the past two years, my personal opinion is that we do a proper job, but can still do a lot better. After all, MediaWiki is the engine that runs a top 5 site in the world committed to creating "a world in which every single human being can freely share in the sum of all knowledge." Observing that there are also an estimated hundred thousand MediaWiki installations out there, more than 250 Wikipedias that all use the Wikimedia Commons media repository, and that 147 languages out of 323 have a minimal localisation, there is a lot of room for improvement; more realistically: the work will never be done, but the least we can do is try to get there :).

Last year I mentioned languages from Africa performing way below average. I am sad to conclude that this has not changed considerably. In an overview with a weighted score for the localisation level of MediaWiki in a Wikimedia context, the largest African languages have the lowest score (52 out of 100). Large languages spoken on multiple continents and large languages from Europe are doing best (100 and 99 out of 100 respectively). Languages like Oriya, Zulu, Burmese and Urdu are the large languages with the worst localisation score. It is my personal aim to work towards an average L10n score of 83 for the 50 largest languages in the world by the end of September 2010.

We have all the tools to successfully localise MediaWiki into any of the 7,000 or so languages that have been classified in ISO 639-3. We only need one person per language to make an effort and make it happen. Reaching the first milestone (core most used) takes about six hours of work. Using translatewiki.net or the Gettext file, little to no technical knowledge is required. Knowledge of MediaWiki is a plus.

This was the pitch, basically the same as in 2007 and 2008, with even more experience and data. Goals for MediaWiki localisation by the end of 2010 are ambitious, but still realistic with the right effort:
 * core most used: 170 languages with 98% or more localised
 * core: 105 languages with 90% or more localised
 * Wikimedia extensions: 65 languages with 90% or more localised
 * extensions: 50 languages with 65% or more localised

I would like to wish everyone involved in any aspect of MediaWiki a wonderful 2010.

Cheers!

Siebrand Mazeland