Messages customization statistics
I went to a few Wikimedia projects, in English and in other languages, and browsed Special:AllMessages.
There are several repeating phenomena:
- Messages are often modified, even though the wording is identical. These can probably be deleted in the project, because otherwise they won't be updated when they are updated in the software.
- Modified messages often differ very little from the original. For example, the original may have a period or a colon at the end and the modified message doesn't, but otherwise they are identical. This often happens with input box labels and error messages. Many of these should probably be deleted too.
- Sometimes whole groups of messages (not necessarily related to TranslateWiki groups) have the same word changed, because the project decided to change the wording. For example, in en.wp "content page" is replaced with "article", and a similar thing is done in he.wp and probably in many other projects. In fr.wp, the abuse filter is called "filtre anti-erreur", but in the source it's called "filtre antiabus". This causes a lot of messages to be modified even though only a small part of each actually differs, so maybe this could be parametrized. In some cases changing the message in the source may be better.
- In some languages, for example Ossetian (os), a lot of messages are properly translated in the project, but the translations are not kept in the source. If they are good enough for the functioning project, they should probably be imported to the source.
- In older, but medium-traffic projects, such as he.wikisource, many translations are outdated, because they were manually modified before the days of proper MediaWiki localization and Betawiki.
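The first two cases in the list above can be told apart mechanically. Here is a minimal sketch in Python, assuming the local override and the software default are already available as plain strings. The function name and the set of trailing characters treated as insignificant are hypothetical choices for illustration, not anything MediaWiki defines.

```python
def classify_override(default: str, local: str) -> str:
    """Classify a locally customized message against the software default."""
    if local == default:
        # Same wording: the local copy only blocks future software updates.
        return "identical"
    # Assumption: trailing whitespace, periods and colons carry no meaning.
    strip_chars = " \t\n.:"
    if local.rstrip(strip_chars) == default.rstrip(strip_chars):
        return "trailing-punctuation-only"
    return "different"
```

For instance, `classify_override("Username:", "Username")` falls into the punctuation-only bucket, which is the kind of override that probably deserves a second look before deletion.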
Can anyone create statistics of modified messages in all the 700+ Wikimedia projects? These are the curious things that I can think of now:
- How many messages are modified?
- How many are modified, but identical to the source?
- What is the most frequently modified message (I would bet on Common.css or aboutsite...)?
- If it's not too hard: how many messages are only slightly modified (let's say, differing from the source in no more than 15% of their characters)?
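To make the questions above concrete, here is a rough sketch in Python of how such counts could be computed once the data has been collected. It assumes the default texts and each project's overrides are already loaded into dictionaries (how to fetch them is left out), and all names are hypothetical. The "slightly modified" test uses `difflib.SequenceMatcher.ratio()` with a 0.85 threshold as a stand-in for "at most 15% different characters"; that is a similarity score, not an exact character count.

```python
from collections import Counter
from difflib import SequenceMatcher


def message_stats(defaults, overrides_by_wiki, threshold=0.85):
    """Count customized messages across projects.

    defaults:          {message_key: default_text}
    overrides_by_wiki: {wiki_name: {message_key: local_text}}
    """
    totals = Counter()
    per_key = Counter()
    for wiki, overrides in overrides_by_wiki.items():
        for key, local in overrides.items():
            totals["modified"] += 1
            per_key[key] += 1
            default = defaults.get(key)
            if default is None:
                # No software default to compare with (e.g. site CSS pages).
                totals["no_default"] += 1
            elif local == default:
                totals["identical"] += 1
            elif SequenceMatcher(None, default, local).ratio() >= threshold:
                totals["slightly_modified"] += 1
    # Most frequently customized key, as a one-element list of (key, count).
    return totals, per_key.most_common(1)
```

The threshold is a parameter so that other cut-offs can be tried; short messages in particular may need a different value, since a single changed character weighs more there.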
Thanks in advance.
Purodha Blissenbach
Messages are often modified, even though the wording is identical. These can probably be deleted in the project, because otherwise they won't be updated when they are updated in the software.
These may be modifications that admins made locally because they had been waiting too long for them to arrive in the software. Best make them aware, I'd suggest.
The modifications could have been made manually in the project, as suggested above, but I seem to remember we were told that it is a quirk of the 'Localisation Update' extension that it sometimes generates these duplicates. The advice when 'Localisation Update' was first introduced was to delete the duplicate messages in the project manually, because there wasn't a way to do it automatically back then, if I remember right.
I have now deleted 370+ translations on Wikimedia Commons where the message was identical to (or less useful than) the default translation. In nearly all of these cases, the translation message was created before 2008. We should create a list of all languages and check each one off once we have reviewed all of its custom messages on Commons and/or <lang>.Wikipedia.