User:Siebrand/An update on localisation in MediaWiki (2009)

Revision as of 00:41, 31 December 2009 by Siebrand (talk | contribs) (draft)

This is a draft.

On 31 December 2007 and 1 January 2008 I sent an e-mail to these lists, to which this is a follow-up[1,2].

First things first, because not everyone reads e-mails completely:

  • MediaWiki localisation (that is the translation of English source messages to other languages) depends on you! If you speak a language other than English, care about your language in MediaWiki and Wikimedia and like translating, go to, register a user and start contributing translations for MediaWiki and MediaWiki extensions. When your localisation is complete, keep coming back regularly to re-complete it and do quality control. Thank you in advance for all your contributions and effort.
  • The i18n and L10n area of MediaWiki requires continuous effort. If this area of FOSS interests you, please do not hesitate to offer your development skills to further MediaWiki's i18n and L10n capabilities[3,4].

All statistics are based on MediaWiki 1.16 alpha, SVN version r60xxx (1 January 2010). Comparisons are to MediaWiki 1.14 alpha, SVN version r45277 (1 January 2009).


  • Localisation or l10n - the process of adapting the software to be as familiar as possible to a specific locale (in scope of this message)
  • Internationalisation or i18n - the process of ensuring that an application is capable of adapting to local requirements (out of scope of this message)

MediaWiki has a user interface (UI) definition for 362 languages (up from 348). Of those, at least 39 language codes are duplicates and/or serve a usability purpose[5]. Reporting on them, however, is not relevant. So MediaWiki in its current state supports 323 languages (up from 322). To be able to generate statistics on localisation, a MessagesXx.php file must be present in languages/messages. There are currently 346 such files (up from 326), of which 38 are redirects from the duplicates/usability group or just empty[6]. So MediaWiki has an active in-product localisation for 308 languages (up from 299).
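The counts in the paragraph above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the two totals (362 and 346) are taken from the text, the excluded codes are copied from footnotes [5] and [6], and the variable names are made up for illustration.

```python
# Language codes copied from footnote [5] (duplicates and usability codes).
FOOTNOTE_5 = """
als be-x-old ckb crh de-at de-ch de-formal dk en-gb fiu-vro gan got hif kk
kk-cn iu kk-kz kk-tr ko-kp ku ku-arab nb ruq simple sr tg tp tt ug zh
zh-classical zh-cn zh-sg zh-hk zh-min-nan zh-mo zh-my zh-tw zh-yue
""".split()

# Language codes copied from footnote [6] (redirect or empty message files).
FOOTNOTE_6 = """
als be-x-old bh ckb ckb-latn crh de-at dk en-rtl fiu-vro gan hif hif-deva ii
iu kk kk-cn kk-kz kk-tr ko-kp ks ku nb pi ruq simple st tg tp tt ug
zh-classical zh-cn zh-min-nan zh-mo zh-my zh-sg zh-yue
""".split()

UI_DEFINITIONS = 362   # languages with a UI definition (from the text)
MESSAGE_FILES = 346    # MessagesXx.php files (from the text)

# Supported languages: UI definitions minus duplicate/usability codes.
supported_languages = UI_DEFINITIONS - len(FOOTNOTE_5)

# Active localisations: message files minus redirects and empty files.
active_localisations = MESSAGE_FILES - len(FOOTNOTE_6)

print(supported_languages, active_localisations)  # 323 308
```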

The MediaWiki core product recognises several collections of localisable content (three of which are defined in

  • 'normal' messages that can be localised (2369 - up 9% from 2168)
  • optional messages that can be localised, which is mostly done for languages not using a Latin script (187 - up 8% from 173)
  • ignored messages that should not be localised (152 - up 2% from 149)
  • namespace names and namespace aliases (17 - no change)
  • magic words (142 - up 8% from 132)
  • special page names (88 - up 2% from 86)
  • other (directionality, date formats, separators, book store lists, link trail, and others)

Localisation of MediaWiki revolves around all of the above. Reporting is done on the normal messages only.
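For readers who want to check the year-on-year percentages in the list above, here is a minimal sketch. The helper name percent_change is hypothetical, and rounding to a whole percent is an assumption about how the figures were produced; the counts themselves come from the list.

```python
def percent_change(old: int, new: int) -> int:
    """Return the change from old to new as a whole-number percentage."""
    return round((new - old) / old * 100)


# (last year's count, this year's count), taken from the list above.
counts = {
    "normal messages": (2168, 2369),
    "optional messages": (173, 187),
    "ignored messages": (149, 152),
    "magic words": (132, 142),
    "special page names": (86, 88),
}

for name, (old, new) in counts.items():
    print(f"{name}: {new} - up {percent_change(old, new)}% from {old}")
```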

MediaWiki is more than just the core product. Some 1500 extensions (up 25% from 1200) have some kind of documentation. This analysis is scoped only to the code currently present in the MediaWiki Subversion repository, which contains give or take 445 extensions (up 25% from 370). Most extensions in the repository now use the reference implementation for i18n. Currently some 8,200 messages for MediaWiki extensions can be localised in a consistent way (up 37% from 6,000).

MediaWiki localisation in practice

MediaWiki localisation has moved further towards a centralised collaborative process in the past year. Where in 2008 some wikis were still translating in their own MediaWiki: namespace, the introduction of the LocalisationUpdate extension[7], especially in the Wikimedia Foundation wikis, has taken away the last reason to prefer local translation over centralised translation: instant gratification. Translations that are committed to Subversion can be added to wikis on a daily basis without requiring software updates.

Few to no translations are submitted through the Bugzilla ticketing system or directly by SVN committers. Exceptions are the localisations of Hebrew, Cantonese, Simplified Chinese, Traditional Chinese and Classical Chinese, which are still actively maintained in SVN, alongside regular contributions from the centralised system.


The professional amateur approach

2008 was also the year in which MediaWiki localisation got outside stimuli. A grant given to Stichting Open Progress[8] by Hivos[9] enabled us to provide bounties for less-resourced languages[10], and enabled us to have an end-of-year translation rally[11]. Niklas Laxström[12] participated in the Finnish Summer Coding Project[13], which led to a more feature-rich and usable Translate extension, allowing translators to be more productive. We (Betawiki staff and Stichting Open Progress) intend to continue trying to get funding to improve language support for MediaWiki and FOSS in general. If you think you can help in any way to achieve this goal, please do not hesitate to contact any of us.

Multiple developers contribute to i18n and L10n features. Most, if not all, features that are added to Subversion these days are audited for i18n or L10n omissions. These are usually corrected quickly after being discovered. Core messages and extension messages have been made more consistent. These are ongoing processes, for which we need your help and awareness[5,7].

MediaWiki localisation statistics

Since the end of 2007, MediaWiki localisation has no longer focused only on a complete translation of core messages, but also on the messages used in the most frequent use cases. This resulted in a set of about 25% of the MediaWiki core messages that really have to be translated before a language is usable with MediaWiki. Because software like MediaWiki is ever changing, an update of this list will be released in the coming week[4].

Statistics for MediaWiki and extension localisation are created daily. Last year, some (arbitrary) milestones were set for four collections of MediaWiki-related messages. For the usability of MediaWiki in a particular language, the group 'core most used' is the most important: a language must pass the milestone for this first group for MediaWiki to have 'minimal support' for that language. Reaching further milestones indicates the maturity of a localisation:

  • core most used (485 messages): 98%
  • core (2,168 messages): 90%
  • wikimedia extensions (1,067 messages): 90%
  • extensions (6,013 messages): 65%
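As an illustration of how the milestones in the list above could be applied to a single language, here is a minimal sketch. The group sizes and thresholds are the ones listed; the function name, the input format and the example counts are hypothetical and do not describe how the actual statistics are generated.

```python
# (group name, messages in group, required percentage), from the list above.
MILESTONES = [
    ("core most used", 485, 98),
    ("core", 2168, 90),
    ("wikimedia extensions", 1067, 90),
    ("extensions", 6013, 65),
]


def milestones_passed(translated: dict) -> list:
    """Return the groups whose milestone a language has reached.

    `translated` maps a group name to the number of translated messages.
    Integer arithmetic avoids rounding surprises close to a threshold.
    """
    passed = []
    for group, total, threshold in MILESTONES:
        done = translated.get(group, 0)
        if done * 100 >= threshold * total:
            passed.append(group)
    return passed


# Hypothetical language: nearly complete on the most used messages only.
example = {
    "core most used": 480,
    "core": 1900,
    "wikimedia extensions": 300,
    "extensions": 1000,
}

passed = milestones_passed(example)
print(passed)                       # ['core most used']
print("core most used" in passed)   # True: 'minimal support' is reached
```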

Currently the following numbers of languages have passed the above milestones:

  • core most used: 109 (33.9% of supported languages - up 132% from 47)
  • core: 68 (21.1% of supported languages - up 39% from 49)
  • Wikimedia extensions: 36 (11.2% of supported languages - up 260% from 10)
  • extensions: 21 (6.5% of supported languages - up 200% from 7)

As you can see, the changes in the past year are gigantic. And these changes have been accomplished even though many messages have been removed from and added to all the message groups. Currently MediaWiki core contains 303,863 messages (up 77% from 171,261 at the end of 2007).


So... Is MediaWiki doing well on localisation? Just like last year, my personal opinion is that we do a proper job, but can still do a lot better. Observing that there are more than 250 Wikipedias that all use the Wikimedia Commons media repository, and that 109 languages have a minimal localisation, there is a lot of room for improvement. With the Wikimedia Foundation using Single User Login, MediaWiki must do better.

Last year I mentioned a few example cases of languages that did very well, and also a few languages that didn't do well. So what happened there? Well, Hindi got a boost, but sank away as messages were added without an active maintainer. Asturian and Extremaduran have no active maintainers. Bikol Central is not doing too badly, and Lower Sorbian and Galician are doing great. Languages from Asia have heavily improved their MediaWiki localisation last year. But where are the languages from Africa? In the past year we have seen steady contributions for Amharic and Afrikaans, some for Swahili, Wolof and Yoruba, but all in all, just not enough to provide (native) speakers with a user interface in their own language.

With the Wikimedia Foundation aiming to put MediaWiki to good use in developing countries and products like NGO-in-a-box that include MediaWiki, the potential of MediaWiki as a tool in creating and preserving knowledge in the languages of the world is huge. We have to tap into that potential and *you* (yes, I am glad you read this far and are now reading my appeal) can help. If you know people that are proficient in a language and like contributing to localisation, please point them in the right direction. If you know of organisations that can help localise MediaWiki: please approach them and ask them to help.

We have all the tools to successfully localise MediaWiki into any of the 7000 or so languages that have been classified in ISO 639-3. We only need one person per language to make an effort and make it happen. Reaching the first milestone (core most used) takes about six hours of work. Using Betawiki or the gettext file, little to no technical knowledge is required.

This was the pitch, basically the same as in 2007, but with more experience and data. Three of the four goals I set in last year's e-mail have not been reached. I did not take into account how rapidly MediaWiki would grow, or how quickly we could standardise i18n implementation for extensions. Goals for MediaWiki localisation for the end of 2009 remain largely the same as for 2008:

  • core most used: 130 languages with 98% or more localised
  • core: 90 languages with 90% or more localised
  • wikimedia extensions: 50 languages with 90% or more localised
  • extensions: 30 languages with 65% or more localised

I would like to wish everyone involved in any aspect of MediaWiki a wonderful 2009.


Siebrand Mazeland

[1]
[2]
[3] i18n Bugzilla issues:
[4] Translate extension bugs and feature requests:
[5] als, be-x-old, ckb, crh, de-at, de-ch, de-formal, dk, en-gb, fiu-vro, gan, got, hif, kk, kk-cn, iu, kk-kz, kk-tr, ko-kp, ku, ku-arab, nb, ruq, simple, sr, tg, tp, tt, ug, zh, zh-classical, zh-cn, zh-sg, zh-hk, zh-min-nan, zh-mo, zh-my, zh-tw, zh-yue
[6] als, be-x-old, bh, ckb, ckb-latn, crh, de-at, dk, en-rtl, fiu-vro, gan, hif, hif-deva, ii, iu, kk, kk-cn, kk-kz, kk-tr, ko-kp, ks, ku, nb, pi, ruq, simple, st, tg, tp, tt, ug, zh-classical, zh-cn, zh-min-nan, zh-mo, zh-my, zh-sg, zh-yue
[7]

[4] [7] [8] [9] [10] [11] [12] [13]