This page documents how the translatewiki.net volunteer staff handles localisation updates.
High level summary
Translatewiki.net deals with over a thousand version control repositories, mainly git but also some using Subversion. To cope with so many repositories, things have been highly automated. There are two main actions, import and export, that consist of multiple subtasks.
Import means that we synchronise any changes made to the repositories into translatewiki.net. We host read-only checkouts of all the repositories under
/resources/projects in the main translatewiki.net server. These checkouts use anonymous read access and a dedicated user account
betawiki, so that all our staff can update them without having to deal with access and permissions. Our scripts automatically use the correct user (to which our staff has sudo access). The import consists of three subtasks:
- Update the read-only repository checkouts
- Run processMessageChanges.php from the Translate extension to compare the state in the repository with the state in the wiki and list all changes
- The list of changes is processed manually or semi-automatically using Special:ManageMessageGroups
Export means that we push all new and updated translations made in translatewiki.net to the repositories. Each person doing exports has their own write repositories under
/resources/<username>/. In principle there is no reason why this could not be shared between different users like the read-only repositories are. The export consists of three subtasks:
- Update the write repository checkouts to the state of the read-only repository checkouts (so that we do not accidentally overwrite any changes we haven't yet processed)
- Export translations using export.php from the Translate extension
- Add commits and push those changes using dedicated translatewiki accounts on the different repository hosting services
In the simplest form one only needs to run the following scripts. We have two versions of each because Raymond is taking care of MediaWiki code hosted in Wikimedia Gerrit and pushing updates daily. Currently Nikerabbit is doing all the rest and pushing updates twice a week.
- autoimport does import for all supported projects (except MediaWiki in Wikimedia Gerrit)
- autoimport-mediawiki does import for all MediaWiki code (core, extensions, skins hosted in Wikimedia Gerrit)
- autoexport does export for all supported projects (except MediaWiki in Wikimedia Gerrit)
- autoexport-mediawiki does export for all MediaWiki code (core, extensions, skins hosted in Wikimedia Gerrit)
The autoimport* scripts are run automatically by cron several times a day. They announce any changes that need manual processing in the
#mediawiki-i18n IRC channel on the Freenode network. The scripts can also be run manually (the output still goes to IRC).
The autoexport* scripts will ask you to type the password for the private key that gives access to the dedicated translatewiki accounts. Remember to create a
REPONG_VARIANT file with the contents
export, so that the repositories are created in a form that lets you commit localisation updates.
The import scripts usually take only a few minutes to complete. The export scripts take around 30 to 60 minutes to complete.
Sometimes you don't want to deal with all repositories, for example while doing one-off exports or setting things up. The scripts above internally call the following scripts:
- repoupdate updates the repositories
- repoexport exports the translations
- repocommit creates commits and pushes them out
Each command takes a project as an argument, e.g.
mediawiki. The projects are listed in
repoconfig.yaml, which is part of the translatewiki code repository. Each of the commands walks up the directory tree until it finds a file named
repoconfig.yaml. All project directories will be placed under that directory. Each project can have one or more repositories, and each repository can have one or more message groups that are defined in group configuration (under
groups in the translatewiki code repository). The link between the two is the
group definition in
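The lookup of repoconfig.yaml described above can be illustrated with a minimal Python sketch. It is not the code the repo* commands use. The function names are hypothetical, and it assumes that a project's directory is simply named after the project.

```python
from pathlib import Path
from typing import Optional

CONFIG_NAME = "repoconfig.yaml"


def find_config_dir(start: Path) -> Optional[Path]:
    """Return the nearest directory at or above `start` that holds repoconfig.yaml."""
    start = start.resolve()
    # Check the starting directory first, then each parent up to the root.
    for candidate in (start, *start.parents):
        if (candidate / CONFIG_NAME).is_file():
            return candidate
    return None


def project_dir(start: Path, project: str) -> Path:
    """Return where a project's checkouts would be placed (assumed layout)."""
    base = find_config_dir(start)
    if base is None:
        raise FileNotFoundError(f"no {CONFIG_NAME} found at or above {start}")
    # Assumption: the project directory is named after the project itself.
    return base / project
```

Calling project_dir(Path.cwd(), "mediawiki") from anywhere below the configuration file would resolve to the same directory, which is why the commands can be run from a subdirectory.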
There is a fourth, special command:
repo. This command automatically uses the
betawiki user and the
/resources/projects read-only repositories using the
/home/betawiki/config/repoconfig.yaml configuration. It takes two arguments:
commit can be used too, but they don't make any sense!)
- project name as above
The repo* commands are only thin wrappers around the
repong.php script. This script does most of the actual work, although it uses
export.php from Translate and
clupdate-X-repo, where X is one of git, gerrit, github, svn or bzr; these support the different version control systems and authentication methods. Authentication should be separated from the version control in the future.
RepoNG has some nice features, such as doing things in parallel using multiple threads to speed things up. It also handles state synchronization between the read-only and write checkouts so that we do not accidentally overwrite changes we haven't processed yet.
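As an illustration of the general idea of handling several repositories in parallel, here is a small Python sketch using a thread pool. It does not show how repong.php is implemented. The function names and the fake update step are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_in_parallel(
    repositories: List[str],
    task: Callable[[str], str],
    max_workers: int = 4,
) -> Dict[str, str]:
    """Apply `task` to every repository using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps the input order, so results line up with the names.
        results = list(pool.map(task, repositories))
    return dict(zip(repositories, results))


def fake_update(repository: str) -> str:
    """Hypothetical stand-in for the real per-repository update step."""
    return f"updated {repository}"
```

Threads suit this kind of work because most of the time is spent waiting on network and version control operations rather than on computation.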
This command takes two arguments:
- command: One of
- project name as above
It also has two switches:
- -v makes it print out the commands it executes. Useful for debugging. By default the script is very quiet.
- --variant can be used to choose a variant from the config (currently only
export is supported). The default is taken from the
REPONG-VARIANT file that is created alongside the
repoconfig.yaml file. If neither is given, it falls back to the default variant used for read-only checkouts.
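The order of precedence for the variant (switch first, then the file, then the default) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the file name is spelled as in the list above, and "default" is a made-up label for the read-only variant.

```python
from pathlib import Path
from typing import Optional

VARIANT_FILE = "REPONG-VARIANT"  # file name as written in the list above
DEFAULT_VARIANT = "default"      # hypothetical label for the read-only variant


def resolve_variant(cli_variant: Optional[str], config_dir: Path) -> str:
    """Pick the variant: --variant switch, then the variant file, then the default."""
    if cli_variant:
        return cli_variant
    variant_file = config_dir / VARIANT_FILE
    if variant_file.is_file():
        content = variant_file.read_text(encoding="utf-8").strip()
        if content:
            return content
    return DEFAULT_VARIANT
```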
How to use Special:ManageMessageGroups
processMessageChanges.php directly, one gets a link to Special:ManageMessageGroups. On this page one does a sanity check of the changes before "accepting" them to translatewiki.net.
The page consists of diffs, where the external state (files in repositories) is in the first column and the wiki state is in the second column. Changes seen on this page usually fall into the following categories:
New messages in source language. There is usually nothing to check for these, and in fact
autoimport will accept all new messages for a message group if there are no other changes. If there is something that doesn't look translatable (empty messages, URLs with no translations, symbols) one should update message group configuration to list these messages either as
optional as appropriate.
Messages or translations deleted. Again, these can usually be safely accepted. If there is a large number of unexpected deletions, there might be a syntax error in the source file, which should be fixed before proceeding. We don't delete translations that go unused from the wiki.
Changed messages in source language. It is normal for the messages in the source language to change. In this case one should see whether the change is something that doesn't require fixes in translations (usually only spelling mistakes meet this criterion) and in that case choose the option to not mark translations as outdated.
Changes in translations. Our exports are not yet fully atomic. Changes in translations should be checked carefully, because the system might try to overwrite a very recent translation with the previous one. It might also be an external change, in which case one should use one's best judgement about which version to choose.
Renamed messages. Message renames can also happen, although they are discouraged. In this case you would see a message deleted and a new message with exactly the same content. Translations might or might not be renamed externally. Regardless, one should use a tool like Special:ReplaceText to rename the keys so that the full edit history is preserved. Tips: Copy the old and new key to Special:ReplaceText without the namespace and language code, but include the trailing
/. Regular expressions can be used to rename multiple similar messages at once, but then
/ might need to be written as
\/. Uncheck all namespaces, then check the namespace in question and its talk namespace. Uncheck replace in content and check replace in titles. On the confirmation page, check that all pages can be renamed; it might show that the source page cannot be renamed if you don't have sufficient permissions. Once you have done the renames, wait a bit for the JobQueue to process them, then re-run the script to regenerate the diffs and proceed as usual. After you accept all changes, you might get a page with a heading but no diffs. This is okay, and it will disappear the next time the diffs are regenerated.
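The categories above can be illustrated with a small Python sketch that compares two key-to-text mappings for a single language. It is not how processMessageChanges.php works. The function name and the dictionary representation are assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple


def classify_changes(external: Dict[str, str], wiki: Dict[str, str]) -> Dict[str, List]:
    """Sort differences between the repository state and the wiki state into categories."""
    added = sorted(k for k in external if k not in wiki)
    deleted = sorted(k for k in wiki if k not in external)
    changed = sorted(k for k in external if k in wiki and external[k] != wiki[k])

    # A deleted key paired with a new key of identical content looks like a rename.
    renames: List[Tuple[str, str]] = []
    used: Set[str] = set()
    for old in deleted:
        for new in added:
            if new not in used and wiki[old] == external[new]:
                renames.append((old, new))
                used.add(new)
                break

    return {
        "new": added,
        "deleted": deleted,
        "changed": changed,
        "possible_renames": renames,
    }
```

A possible rename is still reported under both "new" and "deleted", matching how it shows up on the page: the pairing is only a hint for the person reviewing the diffs.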
Sometimes the diffs can be messy, for example if people duplicate messages, or do renames and change content at the same time. These situations need special care to get them right (or sometimes it is just too difficult and we make translators re-translate using translation memory). In some cases there are changes to the source message that could be programmatically applied to the translation using Special:ReplaceText or similar. The issue with these tools is that they do not preserve the outdated status of translations, so one should make sure they are either automatically or manually marked outdated after the automatic replacements. For this reason it is not usually worth trying to do those changes programmatically.