Thread:Support/Messages customization statistics

I went to a few Wikimedia projects, in English and in other languages, and browsed Special:AllMessages.

I noticed several recurring patterns:
 * Messages are often customized locally even though the wording is identical to the default. These overrides can probably be deleted from the project, because as long as they exist, the project won't receive updates when the message is changed in the software.
 * Modified messages often differ very little from the original. For example, the original may end with a period or a colon and the modified message doesn't, but otherwise the two are identical. This often happens with input box labels and error messages. Many of these should probably be deleted too.
 * Sometimes a group of messages (not necessarily related to TranslateWiki groups) has the same word changed, because the project decided on different wording. For example, en.wp replaces "content page" with "article", he.wp does something similar, and probably many other projects do too. In fr.wp, the abuse filter is called "filtre anti-erreur", but in the source it's called "filtre antiabus". This causes many messages to be modified even though only a small part of each one actually changes, so maybe this could be parametrized. In some cases, changing the message in the source may be better.
 * In some languages, for example Ossetian (os), many messages are properly translated locally in the project, but the translations are not present in the source. If they are good enough for a functioning project, they should probably be imported into the source.
 * In older projects with medium traffic, such as he.wikisource, many translations are outdated because they were manually modified on the wiki before proper MediaWiki localization and Betawiki existed.
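The word-substitution pattern from the third point could perhaps be detected automatically. Below is a minimal sketch in Python, assuming the default and local text of each message have already been collected (for example from Special:AllMessages; fetching is not shown) into two dictionaries keyed by message name. It compares each pair word by word with `difflib.SequenceMatcher` and counts how many messages share the same replacement, such as "content page" → "article"; replacements shared by many messages are candidates for parametrization. The function names and sample messages are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def word_replacements(default: str, local: str) -> list[tuple[str, str]]:
    """Return (old, new) word runs that were replaced to turn `default` into `local`."""
    old_words, new_words = default.split(), local.split()
    matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    return [
        (" ".join(old_words[i1:i2]), " ".join(new_words[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag == "replace"  # pure insertions/deletions are ignored
    ]


def common_replacements(defaults: dict[str, str], local: dict[str, str],
                        min_count: int = 2) -> list[tuple[tuple[str, str], int]]:
    """Count how many overridden messages share each replacement."""
    counts = Counter()
    for name, text in local.items():
        if name in defaults:
            # A set, so a replacement is counted at most once per message.
            counts.update(set(word_replacements(defaults[name], text)))
    return [(pair, n) for pair, n in counts.most_common() if n >= min_count]


if __name__ == "__main__":
    # Made-up messages, not real MediaWiki defaults.
    defaults = {"sample-1": "Random content page",
                "sample-2": "This content page is protected."}
    local = {"sample-1": "Random article",
             "sample-2": "This article is protected."}
    print(common_replacements(defaults, local))
    # → [(('content page', 'article'), 2)]
```

Splitting on whitespace keeps punctuation attached to words, so "page." and "page" count as different words; a dropped trailing colon (the second point) shows up as a change to the last word rather than as a separate pattern.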

Could anyone compile statistics on modified messages across all 700+ Wikimedia projects? These are the questions I'm curious about so far:
 * How many messages are modified?
 * How many are modified, but identical to the source?
 * What is the most frequently modified message (I would bet on Common.css or aboutsite...)?
 * If it's not too hard: how many messages are only slightly modified (say, identical to the source except for at most 15% of the characters)?
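For the statistics themselves, here is a rough per-project sketch in Python, again assuming hypothetical dictionaries of default and local texts keyed by message name, collected beforehand. "Slightly modified" is read here as a Levenshtein edit distance of at most 15% of the longer text's length; that is only one possible interpretation of "15% different characters". Pages that exist only locally (no software default) are not counted as overrides. All function names are made up for illustration.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character edits."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def is_slightly_modified(default: str, local: str, threshold: float = 0.15) -> bool:
    """True if the texts differ, but by at most `threshold` of the longer text."""
    distance = edit_distance(default, local)
    return 0 < distance <= threshold * max(len(default), len(local))


def project_stats(defaults: dict[str, str], local: dict[str, str],
                  threshold: float = 0.15) -> dict[str, int]:
    """Count overrides in one project; pages with no software default are skipped."""
    overrides = [name for name in local if name in defaults]
    return {
        "modified": len(overrides),
        "identical": sum(local[n] == defaults[n] for n in overrides),
        "slightly_modified": sum(
            is_slightly_modified(defaults[n], local[n], threshold) for n in overrides),
    }


def most_modified(projects: dict[str, dict[str, str]], top: int = 10) -> list[tuple[str, int]]:
    """Rank message names by how many projects override them.

    Assumes each project's dict already contains only overrides.
    """
    counts = Counter()
    for overrides in projects.values():
        counts.update(overrides.keys())
    return counts.most_common(top)


if __name__ == "__main__":
    # Made-up sample data, not real MediaWiki defaults.
    defaults = {"a": "Search", "b": "Your username:", "c": "Log in"}
    local = {"a": "Search", "b": "Your username", "c": "Sign in to the wiki",
             "x": "Local-only page"}
    print(project_stats(defaults, local))
    # → {'modified': 3, 'identical': 1, 'slightly_modified': 1}
    print(most_modified({"en.wp": {"a": "", "b": ""}, "he.wp": {"a": ""}}))
    # → [('a', 2), ('b', 1)]
```

The quadratic edit distance is fine for interface messages, which are short, but something faster would be needed for very long pages such as Common.css.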

Thanks in advance.