User:Siebrand/An update on localisation in MediaWiki (2008)

Mail has been sent. Please do not make any changes. On 31 December 2007 I sent an e-mail to these lists, to which this is a follow-up[1].

First things first, because not everyone reads e-mails completely:
 * MediaWiki localisation (that is the translation of English source messages to other languages) depends on you! If you speak a language other than English, care about this and like translating, go to http://translatewiki.net, register a user and start contributing translations for MediaWiki messages and extension messages. When your localisation is complete, keep coming back regularly to re-complete it and do quality control. Thank you in advance for all your contributions and effort.
 * The i18n and L10n area of MediaWiki requires continuous efforts. If this area of FOSS has your interest, please do not hesitate and offer your development skills to further MediaWiki's i18n and L10n capabilities[5,6].

All statistics are based on MediaWiki 1.14 alpha, SVN version r45277 (1 January 2009). Comparisons are to MediaWiki 1.12 alpha, SVN version r29106 (31 December 2007).

Introduction
 * Localisation or l10n - the process of adapting the software to be as familiar as possible to a specific locale (in scope)
 * Internationalisation or i18n - the process of ensuring that an application is capable of adapting to local requirements (out of scope)

MediaWiki has a user interface (UI) definition for 348 languages (up from 319). Of those languages at least 26 language codes are duplicates and/or serve a purpose for usability[2]. Reporting on them, however, is not relevant. So MediaWiki in its current state supports 322 languages (up from 302). To be able to generate statistics on localisation, a MessagesXx.php file should be present in languages/messages. There currently are 326 such files (up from 262), of which 27 are redirects from the duplicates/usability group or just empty[3]. So MediaWiki has an active in-product localisation for 299 languages (up from 236).

The MediaWiki core product recognises several collections of localisable content (three of which are defined in messageTypes.inc):
 * 'normal' messages that can be localised (2168 - up 26% from 1726)
 * optional messages that can be localised, which usually only happens for languages not using a Latin script (173 - up 7% from 161)
 * ignored messages that should not be localised (149 - up 49% from 100)
 * namespace names and namespace aliases (17 - no change)
 * magic words (132 - up 10% from 120)
 * special page names (86 - up 13% from 76)
 * other (directionality, date formats, separators, book store lists, link trail, and others)

Localisation of MediaWiki revolves around all of the above. Reporting is done on the normal messages only.

MediaWiki is more than just the core product. On http://www.mediawiki.org/wiki/Category:All_extensions some 1200 extensions (up 60% from 750) have some kind of documentation. This analysis covers only the code currently present in svn.wikimedia.org/svnroot/mediawiki/trunk. The source code repository contains roughly 370 extensions (up 61% from 230). Of those 370 extensions, about 300 contain messages that can be visible in the UI in some use case (debugging excluded). Out of those 300, about 35 have an exotic implementation for localisation support, no localisation at all (just English text in the code), or are outdated, broken or obsolete. Compared to last year, when there were about 5 different 'standard' implementations of i18n in extensions, the situation has changed a lot. The vast majority of extensions now make use of $wgExtensionMessagesFiles and wfLoadExtensionMessages. Currently some 6,000 messages for extensions can be localised in a consistent way (up 200% from 2,000).

MediaWiki localisation in practice
Ways to do MediaWiki localisation have not changed a lot in the past year. Still, the changes that have taken place have a profound impact on the quality and volume of localisation for MediaWiki.
 * in local wikis: we have scavenged all Wikimedia wikis for existing translations, imported those in the base product and tried to recruit translators to audit and extend the centralised localisation. This project has been very successful, and aside from a few exceptions, local wikis now customise and no longer do base localisation.
 * through bugzilla/svn: A user of MediaWiki submits patches for core messages and/or extensions. These users are mostly part of a wiki community that is part of Wikimedia. Compared to last year, this group of localisation contributors has decreased. The maintainer for German, for example, started working in Betawiki, because he stated that he was no longer able to keep up with the workload. Languages that still get (very) frequent updates through Subversion are Danish, Hebrew, and Chinese (4 variants). The number of localisations maintained this way has dropped from more than 10 to about 6. Localisation updates submitted through Bugzilla are virtually non-existent.
 * through Betawiki: In the past year Betawiki has about doubled in size (translators, translations, supported products, traffic, etc.). 95% or more of the localisation volume for MediaWiki goes through this wiki, and a lot of development has been done on the Translate extension in the past year. Betawiki staff remains committed to MediaWiki i18n and L10n, and still has a strong belief in collaborative localisation.

The professional amateur approach
2008 was also the year in which MediaWiki localisation got outside stimuli. A grant given to Stichting Open Progress[8] by Hivos[9] enabled us to provide bounties for less-resourced languages[10], and enabled us to have an end-of-year translation rally[11]. Niklas Laxström[12] participated in the Finnish Summer Coding Project[13], which led to a more feature-rich and usable Translate extension, allowing translators to be more productive. We (Betawiki staff and Stichting Open Progress) intend to continue trying to get funding to improve language support for MediaWiki and FOSS in general. If you think you can help in any way to achieve this goal, please do not hesitate to contact any of us.

Multiple developers contribute i18n and L10n features. Most, if not all, features that are added to Subversion these days are audited for i18n or L10n omissions. These are usually corrected quickly after being discovered. Core messages and extension messages have been made more consistent. These are ongoing processes for which we need your help and awareness[5,7].

MediaWiki localisation statistics
Since the end of 2007, MediaWiki localisation has no longer focused only on a complete translation of core messages, but also on the messages used in the most common use cases. This resulted in a set of about 25% of the MediaWiki core messages that really have to be translated before a language is usable with MediaWiki. Because software like MediaWiki is ever changing, an update of this list will be released in the coming week[4].

Daily statistics for MediaWiki and extension localisation are created at http://translatewiki.net/wiki/Translating:Group_statistics. Last year, some (arbitrary) milestones were set for four collections of MediaWiki-related messages. For the usability of MediaWiki in a particular language, the group 'core most used' is the most important. A language must reach the milestone for this first group for MediaWiki to have 'minimal support' for that language. Reaching further milestones indicates the maturity of a localisation:
 * core most used (485 messages): 98%
 * core (2,168 messages): 90%
 * wikimedia extensions (1,067 messages): 90%
 * extensions (6,013 messages): 65%
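
As an illustration of how these milestones work, here is a minimal Python sketch that checks which milestones a language has passed. The message counts and percentages come from the list above; the function names, the input format and the "greater than or equal" comparison are my assumptions for the example, not the way the statistics page on translatewiki.net is actually implemented.

```python
# Milestones from the list above: group -> (number of messages, required percentage).
MILESTONES = {
    "core most used": (485, 98),
    "core": (2168, 90),
    "wikimedia extensions": (1067, 90),
    "extensions": (6013, 65),
}


def passed_milestones(translated):
    """Return the groups whose milestone has been reached.

    `translated` is a hypothetical input: a dict mapping a group name to the
    number of translated messages in that group. Missing groups count as 0.
    """
    passed = []
    for group, (total, required_pct) in MILESTONES.items():
        # Integer arithmetic avoids floating point surprises at the boundary.
        if translated.get(group, 0) * 100 >= required_pct * total:
            passed.append(group)
    return passed


def has_minimal_support(translated):
    """'Minimal support' means the 'core most used' milestone is reached."""
    return "core most used" in passed_milestones(translated)


# Made-up numbers for an imaginary language, purely for illustration.
example = {
    "core most used": 480,
    "core": 1500,
    "wikimedia extensions": 200,
    "extensions": 500,
}
print(passed_milestones(example))    # ['core most used']
print(has_minimal_support(example))  # True
```

Under these assumptions a language needs 476 of the 485 'core most used' messages to reach the first milestone, because 475 messages is just below 98%.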

Currently the following numbers of languages have passed the above milestones:
 * core most used: 109 (33.9% of supported languages - up 132% from 47)
 * core: 68 (21.1% of supported languages - up 39% from 49)
 * Wikimedia extensions: 36 (11.2% of supported languages - up 260% from 10)
 * extensions: 21 (6.5% of supported languages - up 200% from 7)
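
For those who want to check the arithmetic, the percentages above can be reproduced with a short Python sketch. The counts are taken from this e-mail (322 supported languages, and this year's and last year's number of languages per milestone); the helper names are made up for the example, and I am assuming ordinary rounding.

```python
SUPPORTED_LANGUAGES = 322  # supported languages, as stated in the introduction

# group -> (languages past the milestone now, languages past it a year ago)
PASSED = {
    "core most used": (109, 47),
    "core": (68, 49),
    "Wikimedia extensions": (36, 10),
    "extensions": (21, 7),
}


def share_pct(part, whole):
    """Share of `part` in `whole` as a percentage, rounded to one decimal."""
    return round(100 * part / whole, 1)


def growth_pct(new, old):
    """Relative growth from `old` to `new` as a whole percentage."""
    return round(100 * (new - old) / old)


for group, (now, last_year) in PASSED.items():
    print(
        f"{group}: {now} "
        f"({share_pct(now, SUPPORTED_LANGUAGES)}% of supported languages"
        f" - up {growth_pct(now, last_year)}% from {last_year})"
    )
```

The same growth formula gives the other 'up N% from M' figures in this e-mail, for example 2168 normal messages being up 26% from 1726.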

As you can see, the changes in the past year are gigantic. And these changes have been accomplished even though many messages have been removed from and added to all the message groups. Currently MediaWiki core contains 303,863 messages (up 77% from 171,261 at the end of 2007).

Conclusion
So... Is MediaWiki doing well on localisation? Just like last year, my personal opinion is that we do a proper job, but can still do a lot better. Observing that there are more than 250 Wikipedias that all use the Wikimedia Commons media repository, and that 109 languages have a minimal localisation, there is a lot of room for improvement. With the Wikimedia Foundation using Single User Login, MediaWiki must do better.

Last year I mentioned a few example cases of languages that did very well, and also a few languages that didn't do well. So what happened there? Well, Hindi got a boost, but sank away as messages were added without an active maintainer. Asturian and Extremaduran have no active maintainers. Bikol Central is not doing too badly, and Lower Sorbian and Galician are doing great. Languages from Asia have heavily improved their MediaWiki localisation last year. But where are the languages from Africa? In the past year we have seen steady contributions for Amharic and Afrikaans, some for Swahili, Wolof and Yoruba, but all in all, just not enough to provide (native) speakers with a user interface in their own language.

With the Wikimedia Foundation aiming to put MediaWiki to good use in developing countries and products like NGO-in-a-box that include MediaWiki, the potential of MediaWiki as a tool in creating and preserving knowledge in the languages of the world is huge. We have to tap into that potential and *you* (yes, I am glad you read this far and are now reading my appeal) can help. If you know people that are proficient in a language and like contributing to localisation, please point them in the right direction. If you know of organisations that can help localise MediaWiki: please approach them and ask them to help.

We have all the tools to successfully localise MediaWiki into any of the 7000 or so languages that have been classified in ISO 639-3. We only need one person per language to make an effort and make it happen. Reaching the first milestone (core most used) takes about six hours of work. Using Betawiki or the gettext file, little to no technical knowledge is required.

This was the pitch, basically the same as in 2007, but with more experience and data. Three of the four goals I set in last year's e-mail have not been reached. I did not take into account how rapidly MediaWiki would grow, or how quickly we could standardise i18n implementation for extensions. Goals for MediaWiki localisation by the end of 2009 remain largely the same as for 2008:
 * core most used: 130 languages with 98% or more localised
 * core: 90 languages with 90% or more localised
 * wikimedia extensions: 50 languages with 90% or more localised
 * extensions: 30 languages with 65% or more localised

I would like to wish everyone involved in any aspect of MediaWiki a wonderful 2009.

Cheers!

Siebrand Mazeland

[1] http://lists.wikimedia.org/pipermail/translators-l/2007-December/000571.html
[2] als,be-x-old,crh,de-formal,dk,en-gb,hif,kk,kk-cn,iu,kk-kz,kk-tr,ku,nb,ruq,simple,sr,tg,tp,tt,zh,zh-cn,zh-sg,zh-hk,zh-min-nan,zh-mo,zh-yue
[3] als,be-x-old,crh,dk,hif,ii,iu,kk-cn,kk-kz,kk-tr,ks,ku,lld,nb,ruq-grek,ruq,simple,tg,tp,tt,ydd,zh-cn,zh-min-nan,zh-mo,zh-my,zh-sg,zh-yue
[4] http://translatewiki.net/wiki/Most_often_used_messages_in_MediaWiki
[5] i18n Bugzilla issues: https://bugzilla.wikimedia.org/buglist.cgi?query_format=advanced&component=Internationalization&bug_status=UNCONFIRMED&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED
[6] Translate extension bugs and feature requests: http://translatewiki.net/wiki/User:Siebrand#Bugs
[7] http://translatewiki.net/wiki/Support
[8] http://www.openprogress.org/Stichting_Open_Progress
[9] http://www.hivos.nl/eng
[10] http://translatewiki.net/wiki/Translating:Language_project
[11] http://translatewiki.net/wiki/Project:News/Newsletter_2008-12-2
[12] http://nike.fixme.fi/blag/
[13] http://www.coss.fi