This page documents, for translation admins, the tools and processes used to synchronize translations with source code repositories.
High-level summary
Translation admins synchronize translations made in translatewiki.net with over a thousand source code repositories. This process is largely automated. It can be divided into two parts, import and export, each of which has multiple steps.
Import means that we update messages in translatewiki.net based on changes made in the source code repositories. Checkouts of the source code repositories of all supported projects are in the directory /resources/projects on the server. These checkouts use anonymous read access and a dedicated user account, betawiki, so that translation admins and scheduled scripts can update the checkouts without having to deal with access and permissions. The main synchronization scripts take care of the permissions automatically.
Import consists of the following steps:
- Update the source code repository checkouts for import to the newest version.
- Run the script processMessageChanges.php from the Translate extension. The script compares the messages and translations in the repository to those in the wiki and creates a list of changes, categorized by the type of change made in the source code repositories. For example, renaming a message must be handled differently from changing a translation.
- Process the list of changes. Some changes are executed automatically. Translation admins process the remaining changes using Special:ManageMessageGroups.
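As an illustration of what such a list of changes can contain, here is a toy Python sketch that compares one language's messages in a repository with those in the wiki and sorts the differences into added, deleted, changed and renamed keys, treating an added key whose text exactly equals a deleted key's text as a rename. The function classify_changes and the sample keys are hypothetical; processMessageChanges.php is not implemented this way and handles many more cases.

```python
from typing import Dict, List, Tuple


def classify_changes(repo: Dict[str, str], wiki: Dict[str, str]) -> Dict[str, list]:
    """Toy comparison of one language's messages: repository (external) vs wiki."""
    added = {key: text for key, text in repo.items() if key not in wiki}
    deleted = {key: text for key, text in wiki.items() if key not in repo}
    changed = sorted(key for key in repo.keys() & wiki.keys() if repo[key] != wiki[key])

    # Pair an added key with a deleted key whose text is exactly the same: a likely rename.
    renamed: List[Tuple[str, str]] = []
    for new_key in sorted(added):
        for old_key in sorted(deleted):
            if deleted[old_key] == added[new_key]:
                renamed.append((old_key, new_key))
                del deleted[old_key]  # each deleted key can be matched only once
                break
    for _, new_key in renamed:
        del added[new_key]

    return {
        "added": sorted(added),
        "deleted": sorted(deleted),
        "changed": changed,
        "renamed": renamed,
    }


# Hypothetical message keys for one language.
wiki_messages = {"greeting": "Hello", "farewell": "Goodbye", "title": "Main page"}
repo_messages = {"greeting": "Hello!", "farewell-text": "Goodbye", "help": "Help"}
print(classify_changes(repo_messages, wiki_messages))
# {'added': ['help'], 'deleted': ['title'], 'changed': ['greeting'], 'renamed': [('farewell', 'farewell-text')]}
```

Exact-content matching is deliberately strict: a rename combined with a content change ends up as one deletion plus one addition, which is the case that needs manual matching.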
Export means that a translation admin pushes all new and updated translations in translatewiki.net to the source code repositories. Each translation admin has their own set of checkouts under /scratch/<username>/. In the future, we may use a shared set of checkouts for exports as well.
Export consists of the following steps:
- Update the source code repository checkouts for export to the same version as the source code repository checkouts for import, or to the latest version if repository state synchronization is disabled. See #Repository state synchronization.
- Export translations using the script export.php from the Translate extension.
- Create commits for the changes and push those commits to update the canonical version of the source code repositories.
The automated steps for import and export are captured by the following commands. Each command has two versions: one does the action for all MediaWiki-related source code repositories hosted in Wikimedia Gerrit, and the other does the action for all other source code repositories. These two sets of projects are managed by different translation admins and use different schedules.
autoimport does import for all supported projects (except MediaWiki in Wikimedia Gerrit)
autoimport-mediawiki does import for all MediaWiki code (core, extensions, skins hosted in Wikimedia Gerrit)
autoexport does export for all supported projects (except MediaWiki in Wikimedia Gerrit)
autoexport-mediawiki does export for all MediaWiki code (core, extensions, skins hosted in Wikimedia Gerrit)
The autoimport* scripts are run automatically by cron multiple times a day. They announce any changes that need manual processing in the #translatewiki IRC channel on Libera.Chat. These scripts can also be run manually, but the output still goes to IRC.
The autoexport* scripts can only be used if you are a member of the l10n-bot group. Remember to create a REPONG_VARIANT file with the contents export, so that the repositories are created with the export variant and you can commit localisation updates.
The import scripts usually take only a few minutes to complete. The export scripts take around 20 to 60 minutes, and likely longer if you are doing exports for the first time.
Subscribe to Phab:T208190 to stay aware of known export failures and add new ones you notice.
Repository state synchronization
Repository state synchronization prevents the export scripts from accidentally undoing changes that have been made to the source code repositories after the last import. It is enabled by default if state-directory is configured in #repoconfig.yaml.
State synchronization is applied during exports. When updating a repository for translation exports, the commands check which version the read-only checkouts are at and update the export repository to the same version. With git repositories, after the translation updates have been exported to the file system, a git rebase is applied to bring the changes on top of the latest upstream version. If the rebase fails, translation updates for that repository fail until the read-only checkouts are updated.
Here is an example of sequence where accidental overwrites can happen without state synchronization:
- We update read-only checkout of a project to latest version A.
- We process changes automatically or manually using Special:ManageMessageGroups.
- Upstream modifies English (source language) and message documentation files to create version B.
- We export and commit translations on top of A to create version C.
The upstream changes to message documentation in B are lost. Note that English files are usually not exported, because English is usually the source language, so upstream changes to them would not be overwritten in this scenario.
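The scenario above can be reproduced with a toy Python model. It assumes, for simplicity, that a checkout is a mapping of file names to message dictionaries, that an export without synchronization writes whole files from the wiki over the newest checkout, and that an export with synchronization diffs against the imported version A and replays that diff on top of B, roughly like a rebase. All names are hypothetical; this is not RepoNG's code.

```python
from copy import deepcopy
from typing import Dict

# file name -> {message key: text}; a simplified stand-in for files in a checkout
Snapshot = Dict[str, Dict[str, str]]


def export_without_sync(upstream_head: Snapshot, exported: Snapshot) -> Snapshot:
    """Overwrite whole files in the latest checkout with what the wiki exports."""
    result = deepcopy(upstream_head)
    for filename, messages in exported.items():
        result[filename] = dict(messages)
    return result


def export_with_sync(base: Snapshot, upstream_head: Snapshot, exported: Snapshot) -> Snapshot:
    """Diff the export against the imported base, then replay the diff on the head (rebase-like)."""
    result = deepcopy(upstream_head)
    for filename, messages in exported.items():
        base_file = base.get(filename, {})
        head_file = result.setdefault(filename, {})
        for key, text in messages.items():
            if base_file.get(key) == text:
                continue  # we did not change this message; keep whatever upstream has now
            if head_file.get(key) not in (base_file.get(key), text):
                raise RuntimeError(f"conflict in {filename}: {key}")  # like a failed rebase
            head_file[key] = text
    return result


# Version A was imported; upstream then edited message documentation to create B.
version_a = {"qqq.json": {"hello": "Greeting"}, "fi.json": {}}
version_b = {"qqq.json": {"hello": "Greeting shown on the main page"}, "fi.json": {}}
# The wiki exports qqq as it was imported (from A) plus a new Finnish translation.
wiki_export = {"qqq.json": {"hello": "Greeting"}, "fi.json": {"hello": "Hei"}}

print(export_without_sync(version_b, wiki_export)["qqq.json"])  # {'hello': 'Greeting'}
print(export_with_sync(version_a, version_b, wiki_export)["qqq.json"])  # {'hello': 'Greeting shown on the main page'}
```

The conflict branch mirrors the failure mode described earlier: if upstream and the wiki both changed the same message, the replay stops instead of silently picking one side.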
In the future, we would like even stronger synchronization. Here is an example where accidental overwrites can still happen:
- Upstream modifies all translation files, for example by updating a copyright date in a string and removing some unused strings (basically, any changes that prevent --safe-import from processing changes automatically).
- We update read-only checkout from version A to latest version B.
- We do not process changes using Special:MessageGroupChanges.
- We export and commit translations on top of B (but based on A) to create version C.
Upstream changes in version B are lost. For the next import we automatically update our read-only checkout to version C, so we never see or process changes in B.
Improving the synchronization is tracked on Phab:T182433.
Sometimes you don't want to deal with all repositories, for example while doing one-off exports or setting things up. Use the repo command when working with the read-only checkouts and the repomulti command when working with write checkouts. These commands set up the appropriate permissions for you.
Each of the commands documented in this section crawls up the directory tree until it finds the repoconfig.yaml file. This means you can run these commands in the project subdirectories as well.
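A minimal Python sketch of that lookup, assuming the nearest repoconfig.yaml in the current directory or one of its ancestors wins. find_repoconfig is a hypothetical helper, not part of RepoNG.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_repoconfig(start: Path, name: str = "repoconfig.yaml") -> Optional[Path]:
    """Return the nearest `name` in `start` or one of its parent directories, else None."""
    current = start.resolve()
    for directory in (current, *current.parents):
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None


# Usage: a config at the root is found from a nested project subdirectory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / "repoconfig.yaml").write_text("{}\n")
    subdir = root / "mediawiki" / "extensions" / "Example"
    subdir.mkdir(parents=True)
    print(find_repoconfig(subdir) == root / "repoconfig.yaml")  # True
```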
The repo command automatically uses the betawiki user and the /resources/projects read-only repositories, with the /home/betawiki/config/repoconfig.yaml configuration. It takes two arguments:
- command (…, commit can be used too, but they don't make any sense!)
- project name as above
You need to be in the betawiki shell user group to be able to use this command.
repomulti is a versatile command, which takes two arguments:
- project selector regular expression, defaults to all projects
Warning: Unlike all other commands, this command takes a regular expression for the group selector!
You need to be in the l10n-bot shell user group to be able to use this command.
For example, if you haven't done exports in a while, you can run repomulti update first, so that autoexport will be faster when you run it later. Or, if you need to do manual exports to specific projects, you can do:
repomulti update '^mw.*'  # matches mwgitlab, mwgithub and mwgerrit, but not mediawiki-extensions
repomulti export '^mw.*'
repomulti status '^mw.*'  # you can check what has changed, prints git/svn status information
repomulti commit '^mw.*'
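Because the selector is a regular expression, it can be worth checking what it matches before running an export. A small Python sketch, assuming the selector is applied as an unanchored search over project names (an assumption about repong.php, not confirmed here), so the leading ^ is what ties the match to the start of the name. The name example-mw is hypothetical.

```python
import re

# Project names mentioned in the example above.
projects = ["mwgitlab", "mwgithub", "mwgerrit", "mediawiki-extensions"]

# Assumption: the selector is applied with an unanchored regex search, like re.search.
selector = re.compile(r"^mw.*")
print([name for name in projects if selector.search(name)])
# ['mwgitlab', 'mwgithub', 'mwgerrit']

# Without ^ the same letters could match anywhere in a name.
print(bool(re.search(r"mw", "example-mw")))  # True
```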
Finally, for scripting purposes, we have three commands:
repoupdate updates a repository.
repoexport exports the translations.
repocommit creates commits and pushes them out.
Each command takes a project as the second argument, e.g. mediawiki. These commands do not set up permissions, so you need to do that manually using l10n-bot as well as git-ssh-wrapper as necessary.
The repo* commands are only thin wrappers around the repong.php script. This script does most of the actual work, although it uses export.php from Translate and the clupdate-X-repo scripts, where X is one of … bzr, which support different version control systems and authentication. Authentication should be separated from version control in the future.
RepoNG has some useful features, such as running operations in parallel on multiple threads to speed things up. It also handles state synchronization between the read-only and write checkouts so that we do not accidentally overwrite changes we haven't processed yet.
This command takes two arguments:
- command: One of
- project name as above
It also has two switches:
- -v makes it print the commands it executes, which is useful for debugging. By default the script is very quiet.
- --variant can be used to choose a variant from the config (currently only export is supported). The default is taken from the REPONG-VARIANT file that is created alongside the repoconfig.yaml file. If neither is given, it defaults to the default variant used for read-only checkouts.
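A Python sketch of the variant lookup order described above: an explicit --variant first, then the variant file next to repoconfig.yaml, otherwise the default read-only variant. The helper names are hypothetical, and the file name is an assumption: this section spells it REPONG-VARIANT while the autoexport notes spell it REPONG_VARIANT, so check which one your setup reads.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Assumed file name (this page also spells it REPONG_VARIANT).
VARIANT_FILE = "REPONG-VARIANT"


def write_variant_file(config_dir: Path, variant: str = "export") -> Path:
    """Create the variant file next to repoconfig.yaml (hypothetical helper)."""
    path = config_dir / VARIANT_FILE
    path.write_text(variant)
    return path


def resolve_variant(cli_variant: Optional[str], config_dir: Path) -> Optional[str]:
    """--variant wins, then the variant file; None stands for the default read-only variant."""
    if cli_variant:
        return cli_variant
    path = config_dir / VARIANT_FILE
    if path.is_file():
        return path.read_text().strip() or None
    return None


with tempfile.TemporaryDirectory() as tmp:
    config_dir = Path(tmp)
    print(resolve_variant(None, config_dir))  # None: no file yet
    write_variant_file(config_dir, "export")
    print(resolve_variant(None, config_dir))  # export
    print(resolve_variant("other", config_dir))  # other: the switch overrides the file
```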
repoconfig.yaml is the configuration file that acts as the list of managed repositories for RepoNG. It is a YAML file that contains one or more projects. The basic structure of a project is as follows:
project name:  # Project contains project properties
  group: example
  repos:
    checkout-path:  # Repos contain repo properties
      type: github
      url: https://github.com/translatewiki/example.git
The same config file can be used in multiple contexts using variants. The most common case is a shared unauthenticated read-only checkout of all repositories plus one or more authenticated checkouts for pushing translation updates. Any property can vary, but it's recommended to only vary scalar values. Variants are specified by appending | to the key, followed by the variant name. The examples below use this feature with export as the variant name.
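To illustrate how key|variant entries could be resolved, here is a Python sketch over plain dictionaries (as a YAML loader would produce). It assumes that a key|variant entry overrides the plain key only when that variant is active and is otherwise ignored; apply_variant and the SSH URL are hypothetical.

```python
from typing import Any, Optional


def apply_variant(node: Any, variant: Optional[str]) -> Any:
    """Recursively resolve 'key|variant' entries for one variant (None = read-only default)."""
    if not isinstance(node, dict):
        return node
    result = {}
    overrides = {}
    for key, value in node.items():
        base, sep, name = str(key).partition("|")
        if not sep:
            result[key] = apply_variant(value, variant)
        elif name == variant:
            overrides[base] = apply_variant(value, variant)
        # keys suffixed with other variants are dropped
    result.update(overrides)  # variant values win over plain keys
    return result


config = {
    "@meta": {"state-directory|export": "/resources/projects"},
    "project name": {
        "group": "example",
        "repos": {
            "checkout-path": {
                "type": "github",
                "url": "https://github.com/translatewiki/example.git",
                # Hypothetical authenticated URL for the export variant.
                "url|export": "git@github.com:translatewiki/example.git",
            }
        },
    },
}

read_only = apply_variant(config, None)
for_export = apply_variant(config, "export")
print(read_only["project name"]["repos"]["checkout-path"]["url"])
print(for_export["project name"]["repos"]["checkout-path"]["url"])
print(for_export["@meta"])  # {'state-directory': '/resources/projects'}
```

Note how state-directory only exists in the export variant here, which matches the idea that state synchronization is applied to the write checkouts.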
Global properties are given under a virtual project name, @meta. Mandatory keys are bolded.
|expand||Command line to Translate's |
|export||Command line to Translate's |
|Note that you can add |
|state-directory||Path to your read-only checkouts. See #Repository state synchronization.|
'@meta':
  export: php /srv/mediawiki/targets/production/extensions/Translate/scripts/export.php
  expand: php /srv/mediawiki/targets/production/extensions/Translate/scripts/expand-groupspec.php --exportable
  state-directory|export: /resources/projects
Project properties apply to all repositories under a project. Mandatory keys are bolded.
|group||Which message groups are connected to this project. This accepts a |
|repos||List of repositories connected to this project. This is a hash where the key is the filesystem path relative to the repoconfig.yaml where the repository is checked out, and the value is a hash of repository properties. Exception to this is key |
|export-threshold||This controls the |
Default value is 25.
|no-export-languages||This controls the |
|always-export-languages||These languages are always exported even if they do not pass export-threshold. Useful for language variants which are not expected to reach 100%, such as British English.
Default value is
|auto-merge||If set, after pushing updates to Wikimedia Gerrit it will give CR+2 to all of them.
Only applicable for repository type
To use patterns, the string must start with
mediawiki-extensions:
  always-export-languages: en-ca,en-gb,es-formal,de-formal,de-at,de-ch,hu-formal,nl-informal,zh-hk
  no-export-languages: test,aeb,ais,be-x-old,crh,dk,en,fiu-vro,gan,gom,hif,kbd,kk,kk-cn,iu,kk-kz,kk-tr,ko-kp,ku,ku-arab,no,ruq,simple,sr,tg,tp,tt,ug,zh,zh-classical,zh-cn,zh-sg,zh-min-nan,zh-mo,zh-my,zh-tw,zh-yue,bbc,ady
  export-threshold: 0
  group: ext-*
  auto-merge: ^mediawiki/extensions/.*
  repos:
    '@generator': php ../groups/MediaWiki/repong-generator.php extensions
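The three language options interact, so a sketch of one plausible selection rule may help. It assumes export-threshold is a minimum completion percentage compared with >=, that always-export-languages bypasses the threshold, and that no-export-languages takes precedence over both; the real rules in RepoNG and export.php may differ, and languages_to_export is a hypothetical name.

```python
from typing import Dict, List, Set


def parse_language_list(value: str) -> Set[str]:
    """Split a comma-separated language list like the ones in repoconfig.yaml."""
    return {code.strip() for code in value.split(",") if code.strip()}


def languages_to_export(completion: Dict[str, float], threshold: float = 25,
                        always: str = "", never: str = "") -> List[str]:
    """Hypothetical selection rule; completion maps language code -> percent translated."""
    always_set = parse_language_list(always)
    never_set = parse_language_list(never)
    selected = {lang for lang, percent in completion.items() if percent >= threshold}
    selected |= {lang for lang in completion if lang in always_set}  # bypass the threshold
    return sorted(selected - never_set)  # no-export wins over everything else


completion = {"fi": 80.0, "de": 30.0, "sv": 10.0, "en-gb": 5.0, "en": 100.0}
print(languages_to_export(completion, 25, always="en-ca,en-gb", never="test,en"))
# ['de', 'en-gb', 'fi']
```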
Repository properties only apply to one repository. Mandatory keys are bolded.
|type||Supported values are:
|url||URL of the repository. This usually varies with variant configuration.|
|branch||Which branch to use. Ignored for repository type |
Default value is
|push-branch||Which branch translation updates are pushed to. This will do a force push to create or update the remote branch. Only applicable for repository types |
|pull-branch||Same as |
|no-state-sync||Disables repository state synchronization. It is automatically disabled if |
|svn-add-options||Additional options for Subversion to be applied for newly added files.|
fudforum:
  group: out-fudforum
  repos:
    fudforum:
      type: svn
      url: svn://svn.code.sf.net/p/fudforum/code/trunk/install/forum_data/thm/default/i18n
      svn-add-options: config:auto-props:msg=svn:mime-type=text/plain;svn:eol-style=native
      url|export: svn+ssh://email@example.com/p/fudforum/code/trunk/install/forum_data/thm/default/i18n
How to process external message changes
processMessageChanges.php directly, one gets a link to Special:ManageMessageGroups. On this page one does a sanity check of the changes before "accepting" them to translatewiki.net.
The page consists of diffs, where the external state (files in repositories) is in the first column and the wiki state is in the second column. Changes seen on this page usually fall into the following categories:
- New messages in source language.
- There is usually nothing to check for these, and in fact autoimport will accept all new messages for a message group if there are no other changes. If there is something that doesn't look translatable (empty messages, URLs with no translations, symbols), one should update the message group configuration to list these messages either as
- Messages or translations deleted.
- Again, these can usually be safely accepted. If there is a large number of unexpected deletions, there might be a syntax error in the source file, which should be fixed before proceeding. We don't delete unused translations from the wiki.
- Changed messages in source language.
- It is normal for messages in the source language to change. In this case, check whether the change requires no fixes in translations (usually only spelling corrections meet this criterion), and if so, choose the option to not mark translations as outdated.
- Changes in translations.
- Our exports are not yet fully atomic. Changes in translations should be checked carefully, because the system might try to overwrite a very recent translation with the previous one. It might also be an external change, in which case use your best judgement to choose which version to keep.
- Renamed messages.
- Message renames can also happen, although they are discouraged. In this case you would see a message deleted and a new message with exactly the same content. Translations might or might not be renamed externally.
If message keys are renamed while the content stays exactly the same, Translatewiki.net will match the messages automatically and display them. The matching can be broken by selecting the Add as new menu option next to the matched renames.
Sometimes a message is renamed and its content changed at the same time. In such cases Translatewiki.net will not be able to match the messages automatically. If such messages are spotted, they can be matched manually using the Add as a rename menu option.
The dialog box that appears when renaming displays the list of messages that Translatewiki.net detects as missing for that group in this import. It also displays the similarity percentage between the message for which we are trying to find a rename and each possible rename. The actual renaming is performed once the page is submitted and happens in the background via the JobQueue.
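Translate's exact similarity measure is not documented here. Purely to illustrate ranking rename candidates by a similarity percentage, this Python sketch uses difflib.SequenceMatcher from the standard library; rank_rename_candidates and the sample messages are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def rank_rename_candidates(new_text: str, missing: Dict[str, str]) -> List[Tuple[str, int]]:
    """Return (old key, similarity %) pairs, most similar first."""
    scores = [
        (old_key, round(100 * SequenceMatcher(None, new_text, old_text).ratio()))
        for old_key, old_text in missing.items()
    ]
    # Highest score first; ties broken alphabetically so the order is stable.
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Hypothetical messages that disappeared in this import.
missing_messages = {"myext-save": "Save changes", "myext-cancel": "Cancel"}
for key, percent in rank_rename_candidates("Save your changes", missing_messages):
    print(f"{key}: {percent}%")
```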
Using Special:ReplaceText to rename messages: We can still rename messages using Special:ReplaceText. Copy the old and new key to Special:ReplaceText without namespace and language code, but include the trailing /. Regular expressions can be used to rename multiple similar messages at once, but then / might need to be written as \/. Uncheck all namespaces and check the namespace in question and its talk namespace. Uncheck replace in content and check replace in titles. On the confirmation page, check that all pages can be renamed; it might show that the source page cannot be renamed if you don't have sufficient permissions. Once you have done the renames, wait a bit for the JobQueue to process them, then re-run the script to regenerate the diffs and proceed as usual. After you accept all changes, you might get a page with a heading but no diffs; this is okay, and it will disappear the next time the diffs are regenerated.
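Before renaming with a regular expression, you can preview which titles it would touch. This Python sketch uses hypothetical page titles (key/languagecode, without namespace) and Python's re module, whose replacement syntax (\1) may differ from what Special:ReplaceText expects. It also shows why the trailing / matters: it stops the pattern from matching longer keys that merely start with the old key.

```python
import re
from typing import Dict, List


def preview_renames(titles: List[str], pattern: str, replacement: str) -> Dict[str, str]:
    """Map each matching title to its renamed form; non-matching titles are left out."""
    regex = re.compile(pattern)
    return {title: regex.sub(replacement, title, count=1) for title in titles if regex.search(title)}


# Hypothetical page titles in the form key/languagecode.
titles = [
    "Myext-old-title/fi",
    "Myext-old-title/qqq",
    "Myext-old-title-extra/fi",  # a different message whose key starts the same way
    "Myext-old-label/de",
]
for old, new in preview_renames(titles, r"Myext-old-(title|label)\/", r"Myext-new-\1/").items():
    print(f"{old} -> {new}")
```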
Sometimes the diffs can be messy, for example if people duplicate messages, or do renames and change content at the same time. These situations need special care to get them right (or sometimes it is just too difficult and we make translators re-translate using translation memory). In some cases there are changes to the source message that could be programmatically applied to the translation using Special:ReplaceText or similar. The issue with these tools is that they do not preserve the outdated status of translations, so one should make sure they are either automatically or manually marked outdated after the automatic replacements. For this reason it is not usually worth trying to do those changes programmatically.
Project maintainers are advised to inform translation admins about major message renames, so check with the translation admins whenever you are reviewing a large number of renames, as they may have further information about them.
Sometimes the cache can go out of sync and not show changes that should be there.
One case where this happens is when adding a prefix to a message that has already been imported. In this case you can delete /resources/caches/translatewiki.net/translate_groupcache-*groupid*/*languagecode*.cdb for the appropriate languages (usually qqq) and re-run the import.
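A small Python sketch for building the cache file paths from the template above before deleting them. It defaults to a dry run; the helper names and the group ID ext-example are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List

CACHE_ROOT = Path("/resources/caches/translatewiki.net")


def group_cache_files(group_id: str, languages: Iterable[str], root: Path = CACHE_ROOT) -> List[Path]:
    """Build translate_groupcache-<groupid>/<languagecode>.cdb paths."""
    return [root / f"translate_groupcache-{group_id}" / f"{code}.cdb" for code in languages]


def delete_cache_files(paths: Iterable[Path], dry_run: bool = True) -> None:
    """Print what would be deleted, or delete existing files when dry_run is False."""
    for path in paths:
        if dry_run:
            print(f"would delete {path}")
        elif path.exists():
            path.unlink()


delete_cache_files(group_cache_files("ext-example", ["qqq"]))
```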
Another case where this happens is when replacing a regular message group with an aggregate message group. This can be observed by a warning such as AggregateMessageGroup ext-scribunto cannot be primary owner of key scribunto-lua-error. In this case you can safely remove the whole cache directory for that group and run createmi to verify.
Sometimes, when debugging, it is useful to go to a very low level to figure out why access to some repository fails. Some examples:
# Check whether you can access ssh-agent
l10n-bot ssh-add -L
# Execute git command with verbose mode enabled for ssh:
GIT_SSH_COMMAND="git-ssh-wrapper -v" l10n-bot git fetch