Repository management

This page documents how the translatewiki.net volunteer staff handles localisation updates.

High level summary

Translatewiki.net deals with over a thousand version control repositories, mainly Git but also some Subversion. To cope with so many repositories, the workflow is highly automated. There are two main actions, import and export, each consisting of multiple subtasks.

Import means that we synchronise any changes made to the repositories into translatewiki.net. We host read-only checkouts of all the repositories under /resources/projects on the main translatewiki.net server. These checkouts use anonymous read access and a dedicated user account, betawiki, so that all our staff can update them without having to deal with access and permissions. Our scripts automatically use the correct user (to which our staff have sudo access). The import consists of three subtasks:

  1. Update the read-only repository checkouts
  2. Invoke processMessageChanges.php from the Translate extension to compare the state in the repository and the state in the wiki to list all changes
  3. Process the list of changes manually or semi-automatically using Special:ManageMessageGroups
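
Step 2 boils down to comparing two mappings from message key to text. The following is a minimal Python sketch of that idea only; it is not the code of processMessageChanges.php, and the function name list_changes and the sample keys are hypothetical.

```python
def list_changes(repo_state, wiki_state):
    """Compare two {message key: text} mappings and group the differences."""
    repo_keys = set(repo_state)
    wiki_keys = set(wiki_state)
    return {
        # Keys present in the repository but not yet in the wiki
        "added": sorted(repo_keys - wiki_keys),
        # Keys present in the wiki but gone from the repository
        "deleted": sorted(wiki_keys - repo_keys),
        # Keys present on both sides whose text differs
        "changed": sorted(
            key for key in repo_keys & wiki_keys
            if repo_state[key] != wiki_state[key]
        ),
    }


# Hypothetical sample data
repo = {"greeting": "Hello!", "farewell": "Goodbye", "title": "Home"}
wiki = {"greeting": "Hello", "farewell": "Goodbye", "subtitle": "Welcome"}
changes = list_changes(repo, wiki)
print(changes)
```

Each of the three groups corresponds to a kind of diff that later has to be reviewed in Special:ManageMessageGroups.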

Export means that we push all new and updated translations done in translatewiki.net to the repositories. Each person doing exports has their own write repositories under /resources/<username>/. In principle there is no reason why these could not be shared between different users, as the read-only repositories are. The export consists of three subtasks:

  1. Update the write repository checkouts to the state of the read-only repository checkouts (so that we do not accidentally overwrite any changes we haven't yet processed)
  2. Export translations using export.php from the Translate extension
  3. Add commits and push those changes using dedicated translatewiki accounts on the various repository hosting services

In the simplest form one only needs to run the following scripts. We have two versions of each because Raymond is taking care of MediaWiki code hosted in Wikimedia Gerrit and pushing updates daily. Currently Nikerabbit is doing all the rest and pushing updates twice a week.

  • autoimport does import for all supported projects (except MediaWiki in Wikimedia Gerrit)
  • autoimport-mediawiki does import for all MediaWiki code (core, extensions, skins hosted in Wikimedia Gerrit)
  • autoexport does export for all supported projects (except MediaWiki in Wikimedia Gerrit)
  • autoexport-mediawiki does export for all MediaWiki code (core, extensions, skins hosted in Wikimedia Gerrit)

The autoimport* scripts are run automatically by Cron multiple times a day. They announce any changes that need manual processing in the #mediawiki-i18n IRC channel on the Freenode network. Those scripts can also be run manually (the output still goes to IRC).

The autoexport* scripts can only be used if you are a member of the l10n-bot group. Remember to create a REPONG_VARIANT file with the contents export, so that the repositories are created in a way that allows you to commit localisation updates.

The import scripts usually take only a few minutes to complete. The export scripts take around 20 to 60 minutes to complete, and likely longer if you are doing exports for the first time.

Subscribe to Phab:T208190 to stay aware of known export failures and add new ones you notice.

Internal commands

Sometimes you don't want to deal with all repositories, for example while doing one-off exports or setting things up. The scripts above internally call the following scripts:

  • repoupdate updates the repositories
  • repoexport exports the translations
  • repocommit creates commits and pushes them out

Each command takes a project as an argument, e.g. freecol or mediawiki. The projects are listed in repoconfig.yaml, which is part of the translatewiki code repository. Each of the commands crawls up the directory tree until it finds a file named repoconfig.yaml. All project directories are placed under the directory that contains it. Each project can have one or more repositories, and each repository can have one or more message groups, which are defined in the group configuration (under groups in the translatewiki code repository). The link between the two is the group definition in repoconfig.yaml.
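
The upward search for repoconfig.yaml can be pictured with the short Python sketch below. It only illustrates the lookup described above and is not taken from the actual scripts; the function name find_config_dir and the temporary directory layout are hypothetical.

```python
from pathlib import Path
import tempfile


def find_config_dir(start, name="repoconfig.yaml"):
    """Walk from `start` towards the filesystem root and return the first
    directory that contains `name`, or None if no ancestor has it."""
    start = Path(start).resolve()
    for directory in (start, *start.parents):
        if (directory / name).is_file():
            return directory
    return None


# Demonstration in a throwaway directory tree
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp).resolve()
    (base / "repoconfig.yaml").write_text("# placeholder\n")
    nested = base / "freecol" / "data"
    nested.mkdir(parents=True)
    found = find_config_dir(nested)
    found_matches_base = found == base
    print(found_matches_base)
```

Because the search starts from the current directory, a command can be run from anywhere inside a checkout tree and still resolve the same base directory.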

There is a fourth, special command: repo. This command automatically uses the betawiki user and the /resources/projects read-only repositories, with the /home/betawiki/config/repoconfig.yaml configuration. It takes two arguments:

  • command: update (export and commit can be used too, but they don't make any sense!)
  • project name as above

RepoNG

The repo* commands above are only thin wrappers around the repong.php script. This script does most of the actual work, although it uses export.php from Translate and clupdate-X-repo, where X is one of git, gerrit, github, svn, bzr. These helper scripts support different version control systems and authentication methods. Authentication should be separated from the version control in the future.

RepoNG has some nice features, such as running tasks in parallel using multiple threads to speed things up. It also handles state synchronization between the read-only and write checkouts so that we do not accidentally overwrite changes we haven't processed yet.
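
As a rough illustration of why parallel execution helps when a project has many repositories, here is a minimal Python sketch using a thread pool. RepoNG itself is a PHP script, and how it implements parallelism is not described on this page; the function names and repository names below are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def update_repository(name):
    """Placeholder for the real per-repository work (e.g. a VCS update)."""
    return f"{name}: updated"


def update_all(names, max_workers=4):
    """Run update_repository for every name using a pool of worker threads.
    Results come back in the same order as `names`."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(update_repository, names))


results = update_all(["core", "extensions", "skins"])
print(results)
```

Repository updates are mostly network-bound, which is the kind of workload where running several at once typically saves the most time.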

This command takes two arguments:

  • command: One of update, export, commit
  • project name as above

It also has two switches:

  • -v makes it print out the commands it executes. Useful for debugging. By default the script is very quiet.
  • --variant can be used to choose a variant from the config (currently only export is supported). The default is taken from the REPONG-VARIANT file that is created alongside the repoconfig.yaml file. If neither is given, it falls back to the default variant, which is used for read-only checkouts.
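
The order in which the variant is resolved can be summarised with the Python sketch below. It is an illustration under stated assumptions, not the logic of repong.php: the function name resolve_variant is hypothetical, the file name is passed as a parameter because this page spells it both REPONG-VARIANT and REPONG_VARIANT, and the whitespace stripping and the treatment of an empty file are guesses.

```python
from pathlib import Path
import tempfile


def resolve_variant(cli_variant, config_dir, variant_file="REPONG-VARIANT"):
    """Pick the variant: command-line switch first, then the variant file
    next to repoconfig.yaml, then the fallback "default"."""
    if cli_variant:
        return cli_variant
    path = Path(config_dir) / variant_file
    if path.is_file():
        # Assumption: surrounding whitespace is ignored, empty file is skipped
        content = path.read_text().strip()
        if content:
            return content
    return "default"


with tempfile.TemporaryDirectory() as tmp:
    without_file = resolve_variant(None, tmp)
    switch_only = resolve_variant("export", tmp)
    (Path(tmp) / "REPONG-VARIANT").write_text("export\n")
    from_file = resolve_variant(None, tmp)
    print(without_file, switch_only, from_file)
```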

How to use Special:ManageMessageGroups

When running autoimport or autoimport-mediawiki or processMessageChanges.php directly, one gets a link to Special:ManageMessageGroups. On this page one does a sanity check of the changes before "accepting" them to translatewiki.net.

The page consists of diffs, where the external state (files in repositories) is in the first column and the wiki state is in the second column. Changes seen on this page usually fall into the following categories:

New messages in source language. There is usually nothing to check for these, and in fact autoimport will accept all new messages for a message group if there are no other changes. If there is something that doesn't look translatable (empty messages, URLs with no translations, symbols) one should update message group configuration to list these messages either as ignored or optional as appropriate.

Messages or translations deleted. Again, these can usually be safely accepted. If there is a large number of unexpected deletions, there might be a syntax error in the source file, which should be fixed before proceeding. We don't delete translations from the wiki when they become unused.

Changed messages in source language. It is normal for the messages in the source language to change. In this case one should check whether the change is something that doesn't require fixes in translations (usually only spelling corrections meet this criterion) and, if so, choose the option to not mark translations as outdated.

Changes in translations. Our exports are not yet fully atomic. Changes in translations should be checked carefully, because the system might try to overwrite a very recent translation with the previous one. It might also be an external change, in which case one should use one's best judgement to decide which version to keep.

Renamed messages. Message renames can also happen, although they are discouraged. In this case you would see a message deleted and a new message with exactly the same content. Translations might or might not be renamed externally. Regardless, one should use a tool like Special:ReplaceText to rename the keys to preserve full edit history.

Tips for Special:ReplaceText:

  1. Copy the old and new key to Special:ReplaceText without namespace and language code, but include the trailing /. A regular expression can be used to rename multiple similar messages at once, but then / might need to be written as \/.
  2. Uncheck all namespaces, then check the namespace in question and its talk namespace.
  3. Uncheck replace in content and check replace in titles.
  4. On the confirmation page, check that all pages can be renamed. It might show that the source page cannot be renamed if you don't have sufficient permissions.
  5. Once you have done the renames, wait a bit for the JobQueue to process them, then re-run the script to re-generate the diffs and proceed as usual.

After you accept all changes, you might get a page with a heading but no diffs. This is okay, and it will disappear the next time the diffs are re-generated.
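
To show how a regular expression can rename several similar keys at once, here is a small Python sketch. It uses Python's re module purely as an illustration; the pattern and replacement syntax accepted by Special:ReplaceText may differ, and the message keys and page titles are hypothetical.

```python
import re

# Hypothetical page titles without namespace: "<message key>/<language code>"
titles = [
    "myapp-oldname-title/de",
    "myapp-oldname-title/fi",
    "myapp-oldname-summary/de",
    "myapp-oldname-title-extra/de",
    "myapp-other-title/de",
]

# The trailing slash (written as \/) anchors the end of the key, so the
# longer key "myapp-oldname-title-extra" is left alone.
pattern = re.compile(r"^myapp-oldname-(\w+)\/")

# \1 re-inserts the captured part of the key into the new title
renamed = [pattern.sub(r"myapp-newname-\1/", title) for title in titles]
print(renamed)
```

Trying the pattern on a handful of titles like this before submitting the form makes it easier to spot keys that would be renamed unintentionally.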

Sometimes the diffs can be messy, for example if people duplicate messages, or do renames and change content at the same time. These situations need special care to get them right (or sometimes it is just too difficult and we have translators re-translate with the help of translation memory). In some cases there are changes to the source message that could be programmatically applied to the translations using Special:ReplaceText or similar. The issue with these tools is that they do not preserve the outdated status of translations, so one should make sure the translations are marked outdated, either automatically or manually, after the automatic replacements. For this reason it is usually not worth trying to make those changes programmatically.