User:Siebrand/An update on localisation in MediaWiki (2008)

Mail has been sent. Please do not make any changes.

On 31 December 2007 I sent an e-mail to these lists, to which this is a
follow-up[1].

First things first, because not everyone reads e-mails completely:
* MediaWiki localisation (that is, the translation of English source messages
  into other languages) depends on you! If you speak a language other than
  English,
  care about this and like translating, go to http://translatewiki.net, register
  a user and start contributing translations for MediaWiki messages and
  extension messages. When your localisation is complete, keep coming back
  regularly to re-complete it and do quality control. Thank you in advance for
  all your contributions and effort.
* The i18n and L10n area of MediaWiki requires continuous effort. If this area
  of FOSS interests you, please do not hesitate to offer your development
  skills to further MediaWiki's i18n and L10n capabilities[5,6].

All statistics are based on MediaWiki 1.14 alpha, SVN version r45277
(1 January 2009). Comparisons are to MediaWiki 1.12 alpha, SVN version r29106
(31 December 2007).

==Introduction==
* Localisation or L10n - the process of adapting the software to be as familiar
  as possible to a specific locale (in scope)
* Internationalisation or i18n - the process of ensuring that an application is
  capable of adapting to local requirements (out of scope)

MediaWiki has a user interface (UI) definition for 348 languages (up from 319).
Of those languages at least 26 language codes are duplicates and/or serve a
purpose for usability[2]. Reporting on them, however, is not relevant. So
MediaWiki in its current state supports 322 languages (up from 302). To be
able to generate statistics on localisation, a MessagesXx.php file should be
present in languages/messages. There currently are 326 such files (up from
262), of which 27 are redirects from the duplicates/usability group or just
empty[3]. So MediaWiki has an active in-product localisation for 299 languages
(up from 236).
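
The arithmetic behind these counts can be retraced with a minimal Python
sketch. The variable names are illustrative assumptions and do not come from
MediaWiki; the numbers are the ones quoted in the paragraph above.

```python
# Figures quoted in the paragraph above (MediaWiki 1.14 alpha, r45277).
ui_definitions = 348         # language codes with a UI definition
duplicate_or_usability = 26  # "at least 26" duplicate/usability codes [2]
message_files = 326          # MessagesXx.php files in languages/messages
redirect_or_empty = 27       # redirects from the duplicates group, or empty [3]

# Supported languages: UI definitions minus duplicate/usability codes.
supported_languages = ui_definitions - duplicate_or_usability

# Languages with an active in-product localisation.
localised_languages = message_files - redirect_or_empty

print(supported_languages)  # 322
print(localised_languages)  # 299
```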

The MediaWiki core product recognises several collections of localisable
content (three of which are defined in messageTypes.inc):
* 'normal' messages that can be localised (2168 - up 26% from 1726)
* optional messages that can be localised, which usually only happens for
  languages not using a Latin script (173 - up 7% from 161)
* ignored messages that should not be localised (149 - up 49% from 100)
* namespace names and namespace aliases (17 - no change)
* magic words (132 - up 10% from 120)
* special page names (86 - up 13% from 76)
* other (directionality, date formats, separators, book store lists, link
  trail, and others)

Localisation of MediaWiki revolves around all of the above. Reporting is done
on the normal messages only.

MediaWiki is more than just the core product. On
http://www.mediawiki.org/wiki/Category:All_extensions some 1200 extensions (up
60% from 750) have some kind of documentation. This analysis will scope only
to the code currently present in svn.wikimedia.org/svnroot/mediawiki/trunk.
The source code repository contains give or take 370 extensions (up 61% from
230). Of those 370 extensions, about 300 contain messages that can be visible
in the UI in some use case (debugging excluded). Out of those 300, about 35
have an exotic implementation for localisation support, no localisation at all
(just English text in the code), or are outdated, broken or obsolete.
Compared to last year, when there were about 5 different 'standard'
implementations of i18n in extensions, the situation has changed a lot. The
vast majority of extensions now make use of $wgExtensionMessagesFiles and
wfLoadExtensionMessages. Currently some 6,000 messages for extensions can be
localised in a consistent way (up 200% from 2,000).

==MediaWiki localisation in practice==
The ways to do MediaWiki localisation have not changed a lot in the past year.
Still, the changes that have taken place have a profound impact on the quality
and volume of localisation for MediaWiki.
* in local wikis: we have scavenged all Wikimedia wikis for existing
  translations, imported those in the base product and tried to recruit
  translators to audit and extend the centralised localisation. This project
  has been very successful, and aside from a few exceptions, local wikis now
  customise and no longer do base localisation.
* through bugzilla/svn: A user of MediaWiki submits patches for core messages
  and/or extensions. These users are mostly part of a wiki community that is
  part of Wikimedia. Compared to last year, this group of localisation
  contributors has decreased. The maintainer for German, for example, started
  working in Betawiki because, as he stated, he was no longer able to keep
  up with the workload. Languages that still get (very) frequent updates
  through subversion are Danish, Hebrew, and Chinese (4 variants). The number
  of localisations maintained this way has dropped from more than 10 to about
  6. Localisation updates submitted through Bugzilla are virtually
  non-existent.
* through Betawiki: In the past year Betawiki has about doubled in size
  (translators, translations, supported products, traffic, etc.). 95% or more
  of the localisation volume for MediaWiki goes through this wiki, and a lot
  of development has been done on the Translate extension in the past year.
  Betawiki staff remains committed to MediaWiki i18n and L10n, and still has a
  strong belief in collaborative localisation.

==The professional amateur approach==
2008 was also the year in which MediaWiki localisation got outside stimuli. A
grant given to Stichting Open Progress[8] by Hivos[9] enabled us to provide
bounties for less-resourced languages[10], and enabled us to hold an end-of-year
translation rally[11]. Niklas Laxström[12] participated in the Finnish Summer
Coding Project[13], which led to a more feature-rich and usable Translate
extension, allowing translators to be more productive. We (Betawiki staff and
Stichting Open Progress) intend to continue trying to get funding to improve
language support for MediaWiki and FOSS in general. If you think you can help
in any way to achieve this goal, please do not hesitate to contact any of us.

Multiple developers contribute to i18n and L10n features. Most, if not all,
features that are added to subversion these days are audited for i18n or
L10n omissions. These are usually corrected quickly after being discovered.
Core messages and extension messages have been made more consistent. These
are ongoing processes that need your help and awareness[5,7].

==MediaWiki localisation statistics==
Since the end of 2007, MediaWiki localisation has no longer focused only on a
complete translation of core messages, but also on the messages used in the
most frequent use cases. This resulted in a set of about 25% of the MediaWiki
core messages that really have to be translated before a language is usable
with MediaWiki. Because software like MediaWiki is ever changing, an update of
this list will be released in the coming week[4].

Daily statistics for MediaWiki and extension localisation are created at
http://translatewiki.net/wiki/Translating:Group_statistics. Last year, some
(arbitrary) milestones were set for four collections of MediaWiki-related
messages. For the usability of MediaWiki in a particular language, the first
group, 'core most used', is the most important: a language must reach the
milestone for that group for MediaWiki to have 'minimal support' for it.
Reaching further milestones indicates the maturity of a localisation:
* core most used (485 messages): 98%
* core (2,168 messages): 90%
* Wikimedia extensions (1,067 messages): 90%
* extensions (6,013 messages): 65%
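
As a rough illustration of how these milestones apply to a single language,
here is a minimal Python sketch. The function names and the example language
are hypothetical, and I am assuming that a milestone counts as reached when
the translated share is at least the listed percentage; the Translate
extension may compute or round this differently.

```python
# Milestones as listed above: group -> (message count, required share).
MILESTONES = {
    "core most used": (485, 0.98),
    "core": (2168, 0.90),
    "Wikimedia extensions": (1067, 0.90),
    "extensions": (6013, 0.65),
}


def milestones_reached(translated):
    """Return the groups whose milestone a language has reached.

    `translated` maps a group name to the number of translated messages;
    groups missing from it count as zero translations.
    """
    reached = []
    for group, (total, required) in MILESTONES.items():
        # Assumption: "reached" means share >= required percentage.
        if translated.get(group, 0) / total >= required:
            reached.append(group)
    return reached


def has_minimal_support(translated):
    """'Minimal support' requires the 'core most used' milestone."""
    return "core most used" in milestones_reached(translated)


# Hypothetical language; these counts are invented for illustration only.
example = {"core most used": 480, "core": 1900, "extensions": 1000}
print(milestones_reached(example))
print(has_minimal_support(example))
```

In this invented example only the first group passes its threshold (480 of
485 messages), so the language would have 'minimal support' and nothing more.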

Currently the following numbers of languages have passed the above milestones:
* core most used: 109 (33.9% of supported languages - up 132% from 47)
* core: 68 (21.1% of supported languages - up 39% from 49)
* Wikimedia extensions: 36 (11.2% of supported languages - up 260% from 10)
* extensions: 21 (6.5% of supported languages - up 200% from 7)
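
The percentages in this list can be checked with a short Python sketch. The
helper names are illustrative assumptions; the counts are the ones given
above, and 322 is the number of supported languages from the introduction.

```python
SUPPORTED_LANGUAGES = 322  # supported languages, from the introduction

# group -> (languages past the milestone now, a year earlier)
PASSED = {
    "core most used": (109, 47),
    "core": (68, 49),
    "Wikimedia extensions": (36, 10),
    "extensions": (21, 7),
}


def share_of_supported(count):
    """Percentage of supported languages, rounded to one decimal."""
    return round(100 * count / SUPPORTED_LANGUAGES, 1)


def growth(now, before):
    """Percentage increase from `before` to `now`, rounded to a whole percent."""
    return round(100 * (now - before) / before)


for group, (now, before) in PASSED.items():
    print(f"{group}: {now} ({share_of_supported(now)}% of supported languages"
          f" - up {growth(now, before)}% from {before})")
```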

As you can see, the changes in the past year are gigantic. And these changes
have been accomplished even though many messages have disappeared from and
have been added to all the message groups. Currently MediaWiki core
contains 303,863 messages (up 77% from 171,261 at the end of 2007).

==Conclusion==
So... Is MediaWiki doing well on localisation? Just like last year, my personal
opinion is that we do a proper job, but can still do a lot better. Observing
that there are more than 250 Wikipedias that all use the Wikimedia Commons
media repository, and that 109 languages have a minimal localisation, there is
a lot of room for improvement. With the Wikimedia Foundation using Single User
Login, MediaWiki must do better.

Last year I mentioned a few example cases of languages that did very well, and
also a few languages that didn't do well. So what happened there? Well, Hindi
got a boost, but sank away as messages were added without an active maintainer.
Asturian and Extremaduran have no active maintainers. Bikol Central is not
doing too badly, and Lower Sorbian and Galician are doing great. Languages from
Asia have heavily improved their MediaWiki localisation last year. But where
are the languages from Africa? In the past year we have seen steady
contributions for Amharic and Afrikaans, some for Swahili, Wolof and Yoruba,
but all in all, just not enough to provide (native) speakers with a user
interface in their own language.

With the Wikimedia Foundation aiming to put MediaWiki to good use in developing
countries and products like NGO-in-a-box that include MediaWiki, the potential
of MediaWiki as a tool in creating and preserving knowledge in the languages of
the world is huge. We have to tap into that potential and *you* (yes, I am glad
you read this far and are now reading my appeal) can help. If you know people
that are proficient in a language and like contributing to localisation, please
point them in the right direction. If you know of organisations that can help
localise MediaWiki: please approach them and ask them to help.

We have all the tools to successfully localise MediaWiki into any of the 7000
or so languages that have been classified in ISO 639-3. We only need one person
per language to make an effort and make it happen. Reaching the first
milestone (core most used) takes about six hours of work. Using Betawiki or the
gettext file, little to no technical knowledge is required.

This was the pitch, basically the same as in 2007, but with more experience and
data. Three of the four goals I set in last year's e-mail have not been
reached. I did not take into account how rapidly MediaWiki would grow, or how
quickly we could standardise i18n implementation for extensions. Goals for
MediaWiki localisation by the end of 2009 remain largely the same as for 2008:
* core most used: 130 languages with 98% or more localised
* core: 90 languages with 90% or more localised
* Wikimedia extensions: 50 languages with 90% or more localised
* extensions: 30 languages with 65% or more localised

I would like to wish everyone involved in any aspect of MediaWiki a wonderful 2009.

Cheers!

Siebrand Mazeland

[1] http://lists.wikimedia.org/pipermail/translators-l/2007-December/000571.html
[2] als,be-x-old,crh,de-formal,dk,en-gb,hif,kk,kk-cn,iu,kk-kz,kk-tr,ku,nb,ruq,
    simple,sr,tg,tp,tt,zh,zh-cn,zh-sg,zh-hk,zh-min-nan,zh-mo,zh-yue
[3] als,be-x-old,crh,dk,hif,ii,iu,kk-cn,kk-kz,kk-tr,ks,ku,lld,nb,ruq-grek,ruq,
    simple,tg,tp,tt,ydd,zh-cn,zh-min-nan,zh-mo,zh-my,zh-sg,zh-yue
[4] http://translatewiki.net/wiki/Most_often_used_messages_in_MediaWiki
[5] i18n Bugzilla issues: https://bugzilla.wikimedia.org/buglist.cgi?query_format=advanced&component=Internationalization&bug_status=UNCONFIRMED&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED
[6] Translate extension bugs and feature requests: http://translatewiki.net/wiki/User:Siebrand#Bugs
[7] http://translatewiki.net/wiki/Support
[8] http://www.openprogress.org/Stichting_Open_Progress
[9] http://www.hivos.nl/eng
[10] http://translatewiki.net/wiki/Translating:Language_project
[11] http://translatewiki.net/wiki/Project:News/Newsletter_2008-12-2
[12] http://nike.fixme.fi/blag/
[13] http://www.coss.fi