Repository management
This page documents, for translation admins, the tools and processes used to synchronize translations with source code repositories.
High level summary
Translation admins synchronize translations made in translatewiki.net with over a thousand source code repositories. This process is automated to a large extent. It can be divided into two parts, import and export, each of which has multiple steps.
Import means that we update messages in translatewiki.net based on changes made in the source code repositories. Checkouts of the source code repositories of all supported projects are in the directory /resources/projects on the server. These checkouts use anonymous read access and a dedicated user account, l10n-bot, so that translation admins and scheduled scripts can update the checkouts without having to deal with access and permissions. The main synchronization scripts take care of the permissions automatically.
Import consists of the following steps:
- Update the source code repository checkouts for import to the newest version.
- Run the script processMessageChanges.php from the Translate extension. The script compares the messages and translations in the repositories to those in the wiki and creates a list of changes, classified by type. For example, a renamed message must be handled differently from a changed translation. If there are changes, the message group is locked to prevent imports and exports from happening in an inconsistent state.
- Process the list of changes. Some changes are imported automatically. Translation admins process the remaining changes using Special:ManageMessageGroups.
- A systemd job checks every 5 minutes whether changes have been synchronized properly and unlocks such message groups.
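The comparison step can be pictured as a diff between two maps from message key to text. The following is a minimal, hypothetical sketch (the function names are invented and this is not the Translate extension's implementation) that sorts differences into added, deleted and changed messages plus rename candidates. Here a rename candidate is a deleted key whose exact text reappears under a new key. The sketch also reports whether the group would need locking.

```python
def classify_changes(repo, wiki):
    """Compare {message key: text} maps from the repository and the wiki.

    Hypothetical sketch: returns the differences grouped by type.
    """
    added = sorted(k for k in repo if k not in wiki)
    deleted = sorted(k for k in wiki if k not in repo)
    changed = sorted(k for k in repo if k in wiki and repo[k] != wiki[k])

    # Rename candidate: a deleted key whose exact text reappears under an added key.
    renamed, used = [], set()
    for old in deleted:
        new = next((n for n in added if n not in used and repo[n] == wiki[old]), None)
        if new is not None:
            renamed.append((old, new))
            used.add(new)

    renamed_old = {old for old, _ in renamed}
    return {
        "added": [k for k in added if k not in used],
        "deleted": [k for k in deleted if k not in renamed_old],
        "changed": changed,
        "renamed": renamed,
    }


def needs_lock(changes):
    """A group with any pending change would be locked until it is processed."""
    return any(changes.values())


repo = {"greeting": "Hello", "farewell-new": "Goodbye", "title": "Main page"}
wiki = {"greeting": "Hello!", "farewell": "Goodbye", "old": "Unused"}
changes = classify_changes(repo, wiki)
print(changes)
print(needs_lock(changes))
```

In this sketch a rename is only proposed when the text is identical. That mirrors the automatic matching described under "How to process external message changes". Renames combined with content changes are left as a separate deletion and addition.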
Export means that all new, updated and deleted translations in translatewiki.net are sent to the source code repositories.
Export consists of the following steps:
- Update the source code repository checkouts for export to the same version as the source code repository checkouts for import, or to the latest version if repository state synchronization is disabled. See #Repository state synchronization.
- Export translations using the script export.php from the Translate extension.
- Create commits for the changes and push those commits to update the canonical version of the source code repositories.
- Do additional steps like creating pull/merge requests or voting in Gerrit.
The automated steps for import and export are captured by the following commands. Each command has two versions. One performs the action for all MediaWiki-related source code repositories hosted in Wikimedia Gerrit, and the other performs it for all other source code repositories. These two sets of projects are managed by different translation admins and use different schedules.
- autoimport runs the import for all supported projects (except MediaWiki in Wikimedia Gerrit).
- autoimport-mediawiki runs the import for all MediaWiki code (core, extensions and skins hosted in Wikimedia Gerrit).
- autoexport runs the export for all supported projects (except MediaWiki in Wikimedia Gerrit).
- autoexport-mediawiki runs the export for all MediaWiki code (core, extensions and skins hosted in Wikimedia Gerrit).
The autoimport* scripts are run automatically by systemd multiple times a day. They announce any changes that need manual processing in the #translatewiki IRC channel on Libera.Chat. The scripts can also be run manually, but the output still goes to IRC.
The autoexport* scripts can only be used if you are a member of the l10n-bot group.
The import scripts usually take only a few minutes to complete. The export scripts take around 20 to 60 minutes, and likely longer if you are doing exports for the first time.
Subscribe to Phab:T208190 to stay aware of known export failures and add new ones you notice.
Repository state synchronization
Repository state synchronization prevents the export scripts from accidentally undoing changes that have been made to the source code repositories after the last import. It is enabled by default if state-directory is configured in #repoconfig.yaml.
State synchronization is applied during exports. When a repository is updated for translation exports, the commands check which version the read-only checkout is at and update the export checkout to the same version. For git repositories, after the translation updates have been exported to the file system, a git rebase is applied to bring the changes on top of the latest upstream version. If the rebase fails, translation updates for that repository fail until the read-only checkouts are updated.
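The choice of base revision for an export can be sketched as below. This is a hypothetical illustration, not RepoNG's code. The config keys state-directory, no-state-sync and branch come from the repoconfig.yaml reference on this page, while the function names, the example URL and the revision labels are made up. Revisions are plain labels standing in for commit hashes.

```python
def state_sync_enabled(meta, repo, variant):
    """Hypothetical: is repository state synchronization active for this repo?"""
    # Requires state-directory in @meta, either plain or for this variant.
    if "state-directory" not in meta and f"state-directory|{variant}" not in meta:
        return False
    # Explicit opt-out per repository.
    if repo.get("no-state-sync"):
        return False
    # Varying the branch per variant disables state sync automatically.
    if f"branch|{variant}" in repo:
        return False
    return True


def export_base_revision(meta, repo, variant, readonly_rev, latest_rev):
    """Revision the export checkout is updated to before exporting."""
    if state_sync_enabled(meta, repo, variant):
        return readonly_rev  # same version as the read-only import checkout
    return latest_rev  # no state sync: latest upstream version


meta = {"state-directory|export": "/resources/projects"}
repo = {"type": "git", "url": "https://example.org/repo.git"}
print(export_base_revision(meta, repo, "export", "A", "B"))
```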
Here is an example sequence in which accidental overwrites can happen without state synchronization:
- We update the read-only checkout of a project to the latest version, A.
- We process changes automatically or manually using Special:ManageMessageGroups.
- Upstream modifies the English (source language) and message documentation files, creating version B.
- We export and commit translations on top of A, creating version C.
The upstream changes to message documentation in B are lost. Note that English files are usually not exported, because English is usually the source language. Upstream changes to them would therefore not be overwritten in this scenario.
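The following toy model shows why the rebase matters. It is a hypothetical sketch, not what RepoNG actually runs. A version is a dict from file name to content, and `export` overwrites the exported files with the wiki's A-based view of them. `rebase` replays the export commit onto the upstream version file by file and refuses, like a failed rebase, when both sides changed the same file. Real git rebases merge line by line, so this is a simplification.

```python
def export(base, wiki_files):
    """Write the wiki's view of every exported file on top of base."""
    return {**base, **wiki_files}


def rebase(base, ours, theirs):
    """Replay changes from base->ours on top of theirs, one file at a time."""
    result = dict(theirs)
    for name in set(base) | set(ours):
        if ours.get(name) == base.get(name):
            continue  # the export did not touch this file
        if theirs.get(name) not in (base.get(name), ours.get(name)):
            raise RuntimeError(f"conflict in {name}")  # export fails for this repo
        if name in ours:
            result[name] = ours[name]
        else:
            result.pop(name, None)
    return result


A = {"en.json": "en v1", "qqq.json": "docs v1", "fi.json": "fi v1"}
B = {**A, "qqq.json": "docs v2"}  # upstream edits message documentation
wiki = {"qqq.json": "docs v1", "fi.json": "fi v2"}  # wiki state, based on A

lost = export(B, wiki)  # no state sync: qqq.json reverts to "docs v1"
kept = rebase(A, export(A, wiki), B)  # state sync: export on A, then rebase onto B
print(lost["qqq.json"], kept["qqq.json"], kept["fi.json"])
```

With state synchronization, the new Finnish translation and the upstream documentation change both survive. Without it, the documentation change is silently reverted.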
Message group synchronization lock (Strong synchronization)
Here is an example where accidental overwrites could happen without the "strong synchronization" feature:
- Upstream modifies all translation files, for example by updating a copyright date in a string and removing some unused strings (basically, any change that prevents --safe-import from processing the changes automatically).
- We update the read-only checkout from version A to the latest version, B.
- We do not process the changes using Special:ManageMessageGroups.
- We export and commit translations on top of B (but based on A), creating version C.
Upstream changes in version B are lost. For the next import we automatically update our read-only checkout to version C, so we never see or process the changes in B.
Strong synchronization prevents imports and exports for message groups that are currently in synchronization or failed synchronization state.
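The locking rule can be summarised as a small state machine. This is a hypothetical Python sketch in which the state names and functions are invented for illustration. Only the rules come from this page: changes found during import lock the group, a periodic check unlocks it, and groups that are being synchronized or have failed synchronization are skipped by imports and exports.

```python
from enum import Enum


class SyncState(Enum):
    # Hypothetical names for the states described above.
    SYNCED = "synced"
    IN_SYNC = "in synchronization"
    FAILED = "failed synchronization"


BLOCKING_STATES = {SyncState.IN_SYNC, SyncState.FAILED}


def may_import_or_export(state):
    """Strong synchronization: block groups being synced or whose sync failed."""
    return state not in BLOCKING_STATES


def after_change_detection(has_changes):
    """Changes found by processMessageChanges.php lock the group."""
    return SyncState.IN_SYNC if has_changes else SyncState.SYNCED


def periodic_check(state, pending_changes):
    """Scheduled check: unlock a group once all its changes are processed.

    How groups enter the failed state is not modelled here.
    """
    if state is SyncState.IN_SYNC and pending_changes == 0:
        return SyncState.SYNCED
    return state


state = after_change_detection(True)
print(state, may_import_or_export(state))
```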
Helper commands
Sometimes you don't want to deal with all repositories, for example while doing one-off exports or setting things up. Use the repo command when working with the read-only checkouts, and the repomulti command when working with write checkouts. These commands set up the appropriate permissions for you.
Each of the commands documented in this section crawls up the directory tree until it finds the repoconfig.yaml file. This means you can also run these commands in project subdirectories.
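The directory lookup works roughly like the sketch below. It starts in the given directory, walks towards the filesystem root and stops at the first repoconfig.yaml. This is an illustrative Python approximation, not the wrapper scripts' actual code.

```python
from pathlib import Path
from typing import Optional


def find_repoconfig(start, name="repoconfig.yaml") -> Optional[Path]:
    """Return the first `name` found in start or any of its parent directories."""
    start = Path(start).resolve()
    for directory in (start, *start.parents):
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None  # reached the filesystem root without finding a config


print(find_repoconfig(Path.cwd()))
```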
The repo command automatically uses the l10n-bot user and the read-only repositories in /resources/projects, with the /resources/projects/repoconfig.yaml (symlink) configuration. It takes two arguments:
- command: update (export and commit can be used too, but they don't make any sense!)
- project name as above
You need to be in the l10n-bot shell user group to be able to use this command.
repomulti is a versatile command, which takes two arguments:
- command: status (default), update, export or commit
- project selector regular expression (defaults to all projects)
Warning: Unlike all other commands, this command takes a regular expression for the group selector!
You need to be in the l10n-bot shell user group to be able to use this command.
For example, if you haven't done exports in a while, you can run repomulti update first, so that a later autoexport run is faster. Or, if you need to do manual exports for specific projects, you can do:
sudo -su l10n-bot
repomulti update 'mw.*' # matches mwgitlab, mwgithub and mwgerrit, but not mediawiki-extensions
repomulti export 'mw.*'
repomulti status 'mw.*' # you can check what has changed, prints git/svn status information
repomulti commit 'mw.*'
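Because the selector is a regular expression rather than a shell glob, a pattern such as mw* means "m followed by any number of w characters". The hypothetical sketch below shows the difference. It assumes the expression has to match the whole project name, which may not be RepoNG's exact rule, and the project names are only examples.

```python
import re


def select_projects(projects, selector=None):
    """Return project names matching the selector regex (all if no selector).

    Assumption: the expression must match the whole name (re.fullmatch).
    """
    if selector is None:
        return list(projects)
    pattern = re.compile(selector)
    return [name for name in projects if pattern.fullmatch(name)]


projects = ["mwgitlab", "mwgithub", "mwgerrit", "mediawiki-extensions", "freecol"]
print(select_projects(projects, "mw.*"))  # regex: "mw" then anything
print(select_projects(projects, "mw*"))   # regex: "m" then zero or more "w"
```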
Finally, for scripting purposes, there are three commands:
- repoupdate updates repositories.
- repoexport exports the translations.
- repocommit creates commits and pushes them out.
Each command takes a project as the second argument, e.g. freecol or mediawiki. These commands do not set up permissions, so you need to do that manually using sudo -u l10n-bot, as well as the l10n-bot wrapper for ssh key access.
RepoNG
The repo* commands above are only thin wrappers around the repong.php script. This script does most of the actual work, although it uses export.php from Translate and the clupdate-X-repo scripts (where X is one of git, svn or bzr), which support different version control systems and authentication. Authentication should be separated from version control in the future.
RepoNG has some useful features, such as running operations in parallel using multiple threads to speed things up. It also handles state synchronization between the read-only and write checkouts, so that we do not accidentally overwrite changes we haven't processed yet.
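Threads are likely to help here because per-repository work is mostly waiting on network access and version control commands. The sketch below is a hypothetical illustration of that idea using Python's ThreadPoolExecutor. It is not how repong.php implements it, and collecting failures per repository is a design choice of this sketch.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel(action, repos, max_workers=4):
    """Run action(repo) for every repository using a thread pool.

    Failures are collected per repository so that one broken checkout does
    not stop the others.
    """
    repos = list(repos)

    def safe(repo):
        try:
            return "ok", action(repo)
        except Exception as exc:  # report the failure, don't abort the whole run
            return "failed", str(exc)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs them correctly.
        return dict(zip(repos, pool.map(safe, repos)))


def fake_update(path):
    # Stand-in for a real checkout update; "broken" simulates an auth failure.
    if path == "broken":
        raise RuntimeError("authentication failed")
    return "updated " + path


print(run_parallel(fake_update, ["freecol", "fudforum", "broken"]))
```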
This command takes two arguments:
- command: one of update, export or commit
- project name as above
It also has two switches:
- -v makes it print out the commands it executes, which is useful for debugging. By default the script is very quiet.
- --variant can be used to choose a variant from the config (currently only export is supported). The default is taken from the REPONG-VARIANT file that is created alongside the repoconfig.yaml file. If neither is given, it defaults to the default variant used for read-only checkouts.
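The order in which the variant is chosen can be sketched as follows. This is a hypothetical Python illustration. The assumption that REPONG-VARIANT holds just the variant name is not confirmed by this page, and None stands in for the default variant.

```python
from pathlib import Path
from typing import Optional


def resolve_variant(config_dir, cli_variant=None) -> Optional[str]:
    """Pick the variant: --variant first, then REPONG-VARIANT, then default.

    None stands for the default variant used by the read-only checkouts.
    """
    if cli_variant:
        return cli_variant
    marker = Path(config_dir) / "REPONG-VARIANT"  # next to repoconfig.yaml
    if marker.is_file():
        value = marker.read_text().strip()
        if value:
            return value
    return None


print(resolve_variant(Path.cwd(), cli_variant="export"))
```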
repoconfig.yaml
repoconfig.yaml is the configuration file that acts as a list of managed repositories for RepoNG. It is a YAML file that contains one or more projects. The basic structure of a project is as follows:
project name:
  # Project contains project properties
  group: example
  repos:
    checkout-path:
      # Repos contain repo properties
      type: github
      url: https://github.com/translatewiki/example.git
The same config file can be used in multiple contexts using variants. The most common case is a shared unauthenticated read-only checkout of all repositories plus one or more authenticated checkouts for pushing translation updates. Any property can vary, but it's recommended to vary only scalar values. Variants are specified by appending | to the key, followed by the variant name. The examples below use export as the variant name.
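To make the |variant syntax concrete, here is a hypothetical sketch of how a set of properties could be collapsed for one variant. Plain keys act as defaults, and key|variant overrides them when that variant is active. Overrides for other variants are ignored. The function is invented for illustration, and the sample keys are taken from the fudforum and @meta examples on this page.

```python
def resolve_properties(props, variant=None):
    """Collapse 'key|variant' entries for the chosen variant (None = default)."""
    # Plain keys are the defaults.
    resolved = {k: v for k, v in props.items() if "|" not in k}
    if variant is not None:
        for key, value in props.items():
            base, sep, name = key.partition("|")
            if sep and name == variant:
                resolved[base] = value  # variant-specific override
    return resolved


repo = {
    "type": "svn",
    "url": "svn://svn.code.sf.net/p/fudforum/code/trunk/install/forum_data/thm/default/i18n",
    "url|export": "svn+ssh://translatewiki@svn.code.sf.net/p/fudforum/code/trunk/install/forum_data/thm/default/i18n",
}
print(resolve_properties(repo)["url"])
print(resolve_properties(repo, "export")["url"])
```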
Global properties
Global properties are given under a virtual project name, @meta. Mandatory keys are bolded.
Property key | Description |
---|---|
expand | Command line to Translate's expand-groupspec.php in the MediaWiki installation. |
export | Command line to Translate's export.php. |
state-directory | Path to your read-only checkouts. See #Repository state synchronization. |
Note that you can add the --wiki parameter to expand and export to choose a wiki in a multi-wiki setup.
- Example
'@meta':
  export: php /srv/mediawiki/targets/production/extensions/Translate/scripts/export.php
  expand: php /srv/mediawiki/targets/production/extensions/Translate/scripts/expand-groupspec.php --exportable
  state-directory|export: /resources/projects
Project properties
Project properties apply to all repositories under a project. Mandatory keys are bolded.
Property key | Description |
---|---|
group | Which message groups are connected to this project. This accepts a GroupSpec: comma-separated values with support for the wildcards * and ?. |
repos | List of repositories connected to this project. This is a hash where the key is the file system path, relative to repoconfig.yaml, where the repository is checked out, and the value is a hash of repository properties. The exception is the key @generator, which takes a command line to a script that must output the repositories as a JSON string. If this method is used, no repositories can be specified for this project in the repoconfig.yaml file. |
export-threshold | This controls the --threshold parameter for export.php. Languages in which less than the given percentage of messages are translated are not exported. The threshold is checked independently for each message group, even if the project consists of many. Default value is 25. |
no-export-languages | This controls the --skip parameter for export.php. Accepts a comma-separated string of language codes. These languages are never exported. By default en is not exported. |
always-export-languages | These languages are always exported, even if they do not pass export-threshold. Useful for language variants which are not expected to reach 100%, such as British English. Default value is |
auto-merge | If set, after pushing updates to Wikimedia Gerrit it will give CR+2 to all of them. Only applicable for repository type |
- Example
mediawiki-extensions:
  always-export-languages: en-ca,en-gb,es-formal,de-formal,de-at,de-ch,hu-formal,nl-informal,zh-hk
  no-export-languages: test,aeb,ais,be-x-old,crh,dk,en,fiu-vro,gan,gom,hif,kbd,kk,kk-cn,iu,kk-kz,kk-tr,ko-kp,ku,ku-arab,no,ruq,simple,sr,tg,tp,tt,ug,zh,zh-classical,zh-cn,zh-sg,zh-min-nan,zh-mo,zh-my,zh-tw,zh-yue,bbc,ady
  export-threshold: 0
  group: ext-*
  auto-merge: ^mediawiki/extensions/.*
  repos:
    '@generator': php ../groups/MediaWiki/repong-generator.php extensions
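Two of the project properties above boil down to simple selection rules, approximated in the hypothetical Python sketch below. expand_groupspec uses fnmatch, which also accepts [...] character classes that the real GroupSpec expansion (expand-groupspec.php) may not support. languages_to_export applies export-threshold, no-export-languages and always-export-languages to the completion percentages of a single message group. Letting no-export-languages win over always-export-languages is an assumption. The empty default for always-export-languages is a placeholder, because the real default is not given here.

```python
from fnmatch import fnmatchcase


def expand_groupspec(spec, group_ids):
    """Expand a comma-separated GroupSpec with * and ? wildcards."""
    patterns = [p.strip() for p in spec.split(",") if p.strip()]
    return [g for g in group_ids if any(fnmatchcase(g, p) for p in patterns)]


def languages_to_export(completion, threshold=25, skip=("en",), always=()):
    """completion: {language code: percent translated} for ONE message group."""
    selected = []
    for lang, percent in completion.items():
        if lang in skip:
            continue  # assumption: no-export-languages wins over always-export
        if lang in always or percent >= threshold:
            selected.append(lang)
    return sorted(selected)


groups = ["ext-babel", "ext-cite", "core", "out-fudforum"]
print(expand_groupspec("ext-*", groups))
print(languages_to_export({"en": 100, "fi": 80, "de": 10, "en-gb": 5}, always=("en-gb",)))
```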
Repository properties
Repository properties only apply to one repository. Mandatory keys are bolded.
Property key | Description |
---|---|
type | Supported values are: |
url | URL of the repository. This usually varies with variant configuration. |
branch | Which branch to use. Ignored for repository type svn. This can vary with variant configuration, in which case state sync is automatically disabled. Commits will be added to the remote branch without rebasing and rewriting, which means they will get out of sync with the source branch over time. Do not vary this when using push-branch or pull-branch. Default value is |
push-branch | Which branch translation updates are pushed to. This does a force push to create or update the remote branch. Only applicable for repository types git, github and gitlab. |
pull-branch | Same as push-branch, but additionally creates or updates a pull request. Only applicable for repository types github and gitlab. |
no-state-sync | Disables repository state synchronization. It is automatically disabled if branch varies. See #Repository state synchronization. |
svn-add-options | Additional options for Subversion to be applied to newly added files. |
- Example
fudforum:
  group: out-fudforum
  repos:
    fudforum:
      type: svn
      url: svn://svn.code.sf.net/p/fudforum/code/trunk/install/forum_data/thm/default/i18n
      svn-add-options: config:auto-props:msg=svn:mime-type=text/plain;svn:eol-style=native
      url|export: svn+ssh://translatewiki@svn.code.sf.net/p/fudforum/code/trunk/install/forum_data/thm/default/i18n
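The branch-related repository properties interact as sketched below. This hypothetical function only restates the rules from the table: force pushes for push-branch and pull-branch, a pull request only for pull-branch, and the repository types each option supports. Giving pull-branch precedence when both are set, and returning None when branch is unset, are assumptions of the sketch.

```python
def push_plan(repo):
    """Decide how translation updates would be pushed for one repository."""
    repo_type = repo["type"]
    if "pull-branch" in repo:
        if repo_type not in ("github", "gitlab"):
            raise ValueError("pull-branch needs type github or gitlab")
        # Force push to the branch, then create or update a pull request.
        return {"branch": repo["pull-branch"], "force": True, "pull_request": True}
    if "push-branch" in repo:
        if repo_type not in ("git", "github", "gitlab"):
            raise ValueError("push-branch needs type git, github or gitlab")
        return {"branch": repo["push-branch"], "force": True, "pull_request": False}
    # Plain push to the configured branch (None: default branch, unknown here).
    return {"branch": repo.get("branch"), "force": False, "pull_request": False}


print(push_plan({"type": "github", "pull-branch": "l10n"}))
```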
How to process external message changes
When running autoimport or autoimport-mediawiki, or processMessageChanges.php directly, one gets a link to Special:ManageMessageGroups. On this page one does a sanity check of the changes before accepting them into translatewiki.net.
The page consists of diffs, where the external state (files in repositories) is in the first column and the wiki state is in the second column. Changes seen on this page usually fall into the following categories:
- New messages in the source language.
  - There is usually nothing to check for these, and in fact autoimport will accept all new messages for a message group if there are no other changes. If something doesn't look translatable (empty messages, URLs with no translations, symbols), one should update the message group configuration to list these messages as either ignored or optional, as appropriate.
- Messages or translations deleted.
  - Again, these can usually be safely accepted. If there is a large number of unexpected deletions, there might be a syntax error in the source file that should be fixed before proceeding. We don't delete translations that become unused from the wiki.
- Changed messages in the source language.
  - It is normal for messages in the source language to change. In this case one should check whether the change requires no fixes in translations (usually only spelling mistakes meet this criterion), and if so, choose the option to not mark translations as outdated.
- Changes in translations.
  - Our exports are not yet fully atomic. Changes in translations should be checked carefully, because the system might try to overwrite a very recent translation with the previous one. It might also be an external change, in which case one should use their best judgement about which version to choose.
- Renamed messages.
  - Message renames can also happen, although they are discouraged. In this case you would see a deleted message and a new message with exactly the same content. Translations might or might not be renamed externally.
If message keys are renamed while the content stays exactly the same, translatewiki.net matches the messages automatically and displays them as renames. The matching can be broken by selecting the Add as new menu option next to the matched renames.
Sometimes renames and content changes are done at the same time. In such cases translatewiki.net cannot match the messages automatically. If such messages are spotted, they can be matched manually using the Add as a rename menu option.
The dialog box that appears when renaming displays the list of messages that translatewiki.net detects as missing for that group, as of this import. It also displays the similarity percentage between the message for which we are trying to find a rename and each of the possible renames. The actual renaming is performed once the page is submitted and happens in the background via the JobQueue.
- Image: Renamed message with different content, displaying the menu next to the newly added message (1).
- Image: Dialog displaying the list of possible renames, including the similarity percentage (1). Once a message is chosen, press Select (2).
- Image: Breaking already matched renames using the Add as new option (1).
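The similarity percentage shown in the rename dialog can be approximated as below. This hypothetical sketch uses Python's difflib.SequenceMatcher, which is not necessarily the measure translatewiki.net uses, so the numbers will differ. It only illustrates ranking the missing messages against a newly added one. The message keys and texts are invented.

```python
from difflib import SequenceMatcher


def rename_candidates(new_text, missing):
    """Rank missing messages by similarity (in %) to a newly added message.

    missing: {old message key: old text}. Returns [(key, percent), ...],
    most similar first.
    """
    scored = [
        (key, round(100 * SequenceMatcher(None, new_text, old_text).ratio()))
        for key, old_text in missing.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


missing = {"example-old-save": "Save changes", "example-old-cancel": "Cancel"}
print(rename_candidates("Save all changes", missing))
```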
Using Special:ReplaceText to rename messages: messages can also be renamed using Special:ReplaceText. Copy the old and new key to Special:ReplaceText without the namespace and language code, but include the trailing /. Regular expressions can be used to rename multiple similar messages at once, but then / might need to be written as \/. Uncheck all namespaces and check the namespace in question and its talk namespace. Uncheck replace in content and check replace in titles. On the confirmation page, check that all pages can be renamed; it might show that the source page cannot be renamed if you don't have sufficient permissions. Once you have done the renames, wait a bit for the JobQueue to process them, then re-run the script to regenerate the diffs and proceed as usual. After you accept all changes, you might get a page with a heading but no diffs. This is okay, and it will disappear the next time the diffs are regenerated.
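Before submitting a regular expression to Special:ReplaceText, it can help to preview which titles it would rename. The hypothetical sketch below does this with Python's re module on a list of page titles. The titles are invented, and the wiki's regular expression engine may differ in details from Python's.

```python
import re


def preview_title_renames(titles, pattern, replacement):
    """List (old, new) title pairs that a regex rename would produce."""
    regex = re.compile(pattern)
    pairs = []
    for title in titles:
        new_title = regex.sub(replacement, title, count=1)
        if new_title != title:
            pairs.append((title, new_title))
    return pairs


titles = ["Example-old-save/fi", "Example-old-save/de", "Example-old-cancel/fi", "Example-other/fi"]
# \/ matches a literal slash, as it might need to be written in ReplaceText.
print(preview_title_renames(titles, r"^Example-old-(\w+)\/", r"Example-new-\1/"))
```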
Sometimes the diffs can be messy, for example if people duplicate messages, or do renames and change content at the same time. These situations need special care to get them right (or sometimes it is just too difficult and we make translators re-translate using translation memory). In some cases there are changes to the source message that could be programmatically applied to the translation using Special:ReplaceText or similar. The issue with these tools is that they do not preserve the outdated status of translations, so one should make sure they are either automatically or manually marked outdated after the automatic replacements. For this reason it is not usually worth trying to do those changes programmatically.
Project maintainers are advised to inform translation admins of major message renames. Check with the translation admins whenever you are reviewing a large number of renames, as they may have further information about them.
Troubleshooting
Sometimes the cache can go out of sync and does not show changes that should be there.
One case where this happens is when adding a prefix to a message that has already been imported. In this case you can delete /resources/caches/translatewiki.net/translate_groupcache-<groupid>/<languagecode>.cdb for the appropriate languages (usually en or qqq) and re-run the import.
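When cache files need removing for several languages, the paths can be built as in the hypothetical sketch below. It mirrors the path pattern above and defaults to a dry run, so nothing is deleted by accident. The function names are invented.

```python
from pathlib import Path

CACHE_ROOT = Path("/resources/caches/translatewiki.net")


def group_cache_files(group_id, languages=("en", "qqq"), root=CACHE_ROOT):
    """Paths of the per-language .cdb cache files for one message group."""
    group_dir = Path(root) / f"translate_groupcache-{group_id}"
    return [group_dir / f"{lang}.cdb" for lang in languages]


def remove_cache_files(paths, dry_run=True):
    """Return the files that exist; delete them only when dry_run is False."""
    existing = [p for p in paths if p.is_file()]
    if not dry_run:
        for path in existing:
            path.unlink()
    return existing


print(group_cache_files("ext-scribunto"))
```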
Another case where this happens is when replacing a regular message group with an aggregate message group. This can be observed as a warning such as AggregateMessageGroup ext-scribunto cannot be primary owner of key scribunto-lua-error. In this case you can safely remove the whole directory for that group and run createmi to verify.
Sometimes for debugging it is useful to go very low level to figure out why access to some repository fails. Some examples:
# Check whether you can access ssh-agent
l10n-bot ssh-add -L
# Execute git command with verbose mode enabled for ssh:
GIT_SSH_COMMAND="git-ssh-wrapper -v" l10n-bot git fetch