Statistics Project


 * Note: this page is outdated.

This page served as a spec and discussion point for the Translation Statistics Project, a co-operation between translatewiki.net and Wikia.

The project expanded the array of tools that lets users and translators assess the progress of translating specific extensions, groups of extensions, etc.

Product specification
Most of Wikia's translations are now done at translatewiki.net. The problem is that we don't know what has been translated or how far translation has progressed.

We need to know:
 * Has group X been 100% translated into language xx?
 * e.g. Are Achievements ready for Italian?
 * What percentage of messages is complete in a given language? (excluding products we don't need translated, such as staff-only tools)
 * e.g. Is French 100% translated?
 * A table showing translation progress per language
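The first two questions reduce to simple arithmetic over per-group counters. The sketch below is illustrative only: the `GroupLangStats` class, the function names, and the idea of passing staff-only groups as an exclusion set are assumptions, not part of the spec. It weights the percentage by message count rather than averaging per-group percentages.

```python
from dataclasses import dataclass


@dataclass
class GroupLangStats:
    """Hypothetical counters for one (group, language) pair."""
    total: int       # messages in the group
    translated: int  # messages translated into the language


def is_complete(stats: GroupLangStats) -> bool:
    """Answers 'Has group X been 100% translated into language xx?'."""
    return stats.total > 0 and stats.translated >= stats.total


def language_percentage(stats_by_group: dict[str, GroupLangStats],
                        excluded: set[str]) -> float:
    """Share of messages translated in one language, ignoring excluded groups
    (e.g. staff-only tools). Weighted by message count, so a large group
    counts more than a small one."""
    kept = [s for g, s in stats_by_group.items() if g not in excluded]
    total = sum(s.total for s in kept)
    translated = sum(s.translated for s in kept)
    return 100.0 * translated / total if total else 0.0
```

For example, if French has Achievements fully translated and a staff-only group untranslated, excluding the staff group makes French report 100%.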

Groups
Existing group "Wikia extensions" should be separated into two subgroups "Wikia user facing extensions" and "Wikia staff facing extensions".

Special:LanguageStats

 * input: language xx
 * output: show all products and their percentage
 * options: suppress 100%
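A minimal sketch of the Special:LanguageStats logic, assuming the statistics are available as a hypothetical mapping `{(group, language): (translated, total)}`. The function name and the `suppress_complete` flag are illustrative and do not describe the actual special page implementation. Groups with no messages are skipped to avoid dividing by zero.

```python
def language_stats(counts, language, suppress_complete=False):
    """Return [(group, percentage)] for every group with messages in `language`.

    counts: {(group, language): (translated, total)}  -- hypothetical layout
    suppress_complete: drop groups that are already 100% translated
    """
    rows = []
    for (group, lang), (translated, total) in sorted(counts.items()):
        if lang != language or total == 0:
            continue  # other language, or empty group
        if suppress_complete and translated >= total:
            continue  # the "suppress 100%" option
        rows.append((group, round(100.0 * translated / total, 1)))
    return rows
```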

Special:TranslationStats

 * input: languages, projects
 * output: graph of number of edits or active translators over a given time frame
 * options: number of edits, active translators, given time frame
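The data behind such a graph is a per-day series. The sketch below is illustrative only. It assumes a hypothetical edit log of `(day, user, language, project)` tuples and defines "active translators" as the number of distinct users who edited on a given day. Plotting is left out.

```python
from collections import defaultdict
from datetime import date, timedelta


def translation_stats(edits, languages, projects, start: date, days: int,
                      measure: str = "edits"):
    """Return [(day, value)] for each day in [start, start + days).

    edits:   iterable of (day, user, language, project)  -- hypothetical log
    measure: "edits" (number of edits) or "translators" (distinct editors)
    """
    end = start + timedelta(days=days)
    edit_counts = defaultdict(int)
    translators = defaultdict(set)
    for day, user, language, project in edits:
        if not (start <= day < end):
            continue  # outside the requested time frame
        if language not in languages or project not in projects:
            continue  # filtered out by the language/project inputs
        edit_counts[day] += 1
        translators[day].add(user)

    series = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        if measure == "edits":
            series.append((day, edit_counts[day]))
        elif measure == "translators":
            series.append((day, len(translators[day])))
        else:
            raise ValueError(f"unknown measure: {measure!r}")
    return series
```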

Modify Special:LanguageStats

 * options: suppress 0%, specify group (e.g. Wikia user facing extensions) [instead of showing all groups/products]

Group statistics ("Special:GroupStats")

 * input: group x
 * output: show all languages and their percentage for a given group
 * options: suppress 0%, suppress 100%, show only 100%, specify language list
 * comment: clicking a language should open Special:LanguageStats for that language, restricted to this group
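The group view is the same data seen from the other axis. The sketch below applies the listed options in order and is illustrative only. It uses the same hypothetical `{(group, language): (translated, total)}` mapping, and its parameter names are assumptions.

```python
def group_stats(counts, group, suppress_empty=False, suppress_complete=False,
                only_complete=False, languages=None):
    """Return [(language, percentage)] for one group.

    counts:    {(group, language): (translated, total)}  -- hypothetical layout
    languages: optional collection restricting the output ("specify language list")
    """
    rows = []
    for (g, lang), (translated, total) in sorted(counts.items()):
        if g != group or total == 0:
            continue
        if languages is not None and lang not in languages:
            continue
        complete = translated >= total
        if suppress_empty and translated == 0:
            continue  # "suppress 0%"
        if suppress_complete and complete:
            continue  # "suppress 100%"
        if only_complete and not complete:
            continue  # "show only 100%"
        rows.append((lang, round(100.0 * translated / total, 1)))
    return rows
```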

Too many options for one page? Perhaps split it into two pages.

Note: this merges Special:ProductStats and Special:GroupStats. Because each product is a group in TWN terminology, this page will serve both purposes.

Modify Special:TranslationStats

 * input: languages, groups, projects
 * output: table of number of edits or active translators over a given time frame
 * options: number of edits, active translators, given time frame

Number of translations ("Special:TranslationCount")

 * input: group x (e.g. Wikia extensions)
 * output: number of un/translated messages
 * options: breakdown per language, breakdown per product
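A sketch of the counting, with the two breakdowns as alternative grouping keys. The mapping layout, the `products` argument listing the products in group x, and the `breakdown` values are all hypothetical. Languages with no entry for a product are not counted as untranslated in this sketch.

```python
from collections import Counter


def translation_count(counts, products, breakdown=None):
    """Count translated and untranslated messages for the products in group x.

    counts:    {(product, language): (translated, total)}  -- hypothetical layout
    products:  the products (subgroups) that make up group x
    breakdown: None for a single total, or "language" / "product"
    Returns {key: (translated, untranslated)}.
    """
    translated = Counter()
    untranslated = Counter()
    for (product, language), (done, total) in counts.items():
        if product not in products:
            continue
        if breakdown == "language":
            key = language
        elif breakdown == "product":
            key = product
        elif breakdown is None:
            key = "all"
        else:
            raise ValueError(f"unknown breakdown: {breakdown!r}")
        translated[key] += done
        untranslated[key] += total - done
    return {key: (translated[key], untranslated[key]) for key in sorted(translated)}
```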

Questions and feedback

 * How do we avoid problems when a message is identical in English and in another language? (Guideline: do not create the translation; problem: the language then never reaches 100%)

Proposed solution for Group Statistics issues by Nike
Requirements:
 * Must be able to query per language or per group
 * Must be able to query per language and group together
 * Must be fast
 * Should be as simple as possible

Database table
Name: groupstats

Unique key: gs_group, gs_language

Fields:
 * gs_group (text): message group id
 * gs_language (text): language code
 * gs_total (int): total number of messages in this group
 * gs_translated (int): number of translated messages in this language
 * gs_fuzzy (int): number of translations that are not up to date
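The table and the three required query shapes (per language, per group, both together) can be tried out with Python's built-in sqlite3 module. SQLite is only a stand-in here, and the function and index names are hypothetical. One point relevant to "must be fast": the unique key on (gs_group, gs_language) already serves per-group and per-(group, language) lookups, but per-language queries need a separate index on gs_language.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE groupstats (
            gs_group      TEXT    NOT NULL,  -- message group id
            gs_language   TEXT    NOT NULL,  -- language code
            gs_total      INTEGER NOT NULL,  -- total number of messages in this group
            gs_translated INTEGER NOT NULL,  -- translated messages in this language
            gs_fuzzy      INTEGER NOT NULL,  -- translations that are not up to date
            UNIQUE (gs_group, gs_language)
        );
        -- The unique index covers per-group and per-(group, language) lookups;
        -- per-language queries need their own index.
        CREATE INDEX groupstats_language ON groupstats (gs_language);
    """)


def stats_for_language(conn, language):
    """All groups for one language: [(group, translated, total)]."""
    return conn.execute(
        "SELECT gs_group, gs_translated, gs_total FROM groupstats "
        "WHERE gs_language = ? ORDER BY gs_group", (language,)).fetchall()


def stats_for_group(conn, group):
    """All languages for one group: [(language, translated, total)]."""
    return conn.execute(
        "SELECT gs_language, gs_translated, gs_total FROM groupstats "
        "WHERE gs_group = ? ORDER BY gs_language", (group,)).fetchall()


def stats_for(conn, group, language):
    """One row: (total, translated, fuzzy), or None if not cached."""
    return conn.execute(
        "SELECT gs_total, gs_translated, gs_fuzzy FROM groupstats "
        "WHERE gs_group = ? AND gs_language = ?", (group, language)).fetchone()
```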

Cache invalidation
Implement a helper function to trigger cache invalidation. Events and things to consider:
 * Translation is added or updated
 * Group definitions change (message changed, removed, or added)
 * We might need to invalidate multiple groups (namely all aggregate groups that contain the affected group)
 * If we have the message key, we can easily get the list of groups. Otherwise we need to loop through all aggregate groups, asking whether they contain this group.
 * At first, the only action could be removing the cache entry
 * Later we can try to be more intelligent:
 * When new messages are added, increase the total column in all cache entries
 * Might cause stale cache if those new messages already have translations, which might sometimes be the case
 * When existing messages are removed, it's best just to invalidate all caches
 * When new translation is made, increase the translated column for that language (also fuzzy if needed)
 * When existing translation is modified, check fuzzy status and update translated and fuzzy columns
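The sketch below illustrates the invalidation rules in memory. It uses a dict as a hypothetical stand-in for the groupstats table, and all function names are hypothetical. It makes one labelled assumption: "translated" counts every existing translation, fuzzy ones included, so "fuzzy" is a subset of it. Under that reading, a new fuzzy translation bumps both columns, and modifying an existing translation only changes "fuzzy". Entries that are not cached are left alone, on the assumption that they are recomputed on demand.

```python
# Hypothetical in-memory stand-in for the groupstats cache:
# {(group, language): {"total": int, "translated": int, "fuzzy": int}}
Cache = dict[tuple[str, str], dict[str, int]]


def affected_groups(group: str, aggregates: dict[str, set[str]]) -> set[str]:
    """The group itself plus every aggregate group that (transitively) contains it,
    found by looping through all aggregate groups as described above."""
    result = {group}
    changed = True
    while changed:
        changed = False
        for aggregate, children in aggregates.items():
            if aggregate not in result and children & result:
                result.add(aggregate)
                changed = True
    return result


def invalidate(cache: Cache, groups: set[str]) -> None:
    """Simplest strategy: drop all cached entries for these groups, in every language."""
    for key in [k for k in cache if k[0] in groups]:
        del cache[key]


def on_messages_added(cache: Cache, groups: set[str], count: int) -> None:
    """Increase totals. Goes stale if the new messages already have translations."""
    for (group, _language), entry in cache.items():
        if group in groups:
            entry["total"] += count


def on_messages_removed(cache: Cache, groups: set[str]) -> None:
    invalidate(cache, groups)


def on_translation_saved(cache: Cache, groups: set[str], language: str,
                         old_state, new_state) -> None:
    """old_state/new_state: None (no translation), "ok", or "fuzzy".
    Assumes "translated" includes fuzzy translations."""
    d_translated = (new_state is not None) - (old_state is not None)
    d_fuzzy = (new_state == "fuzzy") - (old_state == "fuzzy")
    for group in groups:
        entry = cache.get((group, language))
        if entry is not None:
            entry["translated"] += d_translated
            entry["fuzzy"] += d_fuzzy
```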