Statistics Project
This page served as a spec and discussion point for the Translation Statistics Project, a co-operation between translatewiki.net and Wikia.
The project expanded the array of tools that lets users and translators assess the progress of translating specific extensions, groups of extensions, etc.
Product specification
Most of Wikia's translations are now done at translatewiki.net. The problem is that we cannot tell what has been translated or how far along each translation is.
We need to know:
- Has group X been 100% translated in language xx? (e.g. are Achievements ready for Italian?)
- What percentage of messages is complete in a given language, excluding products we don't need translated, such as staff-only tools? (e.g. is French 100% translated?)
- A table showing progress across languages
Planning
Groups
The existing group "Wikia extensions" should be split into two subgroups: "Wikia user facing extensions" and "Wikia staff facing extensions".
Existing statistics
Special:LanguageStats
- input: language xx
- output: show all products and their percentage
- options: suppress 100%
Special:TranslationStats
- input: languages, projects
- output: graph of number of edits or active translators over a given time frame
- options: number of edits, active translators, given time frame
Wanted statistics
Modify Special:LanguageStats
- options: suppress 0%, specify a group (e.g. Wikia user facing extensions) instead of showing all groups/products
Group statistics ("Special:GroupStats")
- input: group x
- output: show all languages and their percentage for a given group
- options: suppress 0%, suppress 100%, show only 100%, specify language list
- comment: clicking a language should open Special:LanguageStats for this group
Too many options for one page? Perhaps split it into two pages.
Note: this is a merger of Special:ProductStats and Special:GroupStats. Because each product is a group in TWN terminology, this page will serve both purposes.
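The options listed for the group statistics page can be sketched as a row filter. This is only an illustration: the function name `groupStatsRows`, the option keys, and the input shape (language code mapped to `total` and `translated` counts) are assumptions, not part of the spec.

```php
<?php
// Hypothetical helper, not part of the Translate extension.
// $rows maps a language code to ['total' => int, 'translated' => int].
function groupStatsRows( array $rows, array $options = [] ): array {
    $suppressZero = $options['suppressZero'] ?? false;         // "suppress 0%"
    $suppressComplete = $options['suppressComplete'] ?? false; // "suppress 100%"
    $onlyComplete = $options['onlyComplete'] ?? false;         // "show only 100%"
    $languages = $options['languages'] ?? null;                // null means all languages

    $out = [];
    foreach ( $rows as $code => $row ) {
        if ( $languages !== null && !in_array( $code, $languages, true ) ) {
            continue;
        }
        $total = $row['total'];
        $translated = $row['translated'];
        $complete = $total > 0 && $translated >= $total;
        if ( $suppressZero && $translated === 0 ) {
            continue;
        }
        if ( $suppressComplete && $complete ) {
            continue;
        }
        if ( $onlyComplete && !$complete ) {
            continue;
        }
        // Percentage rounded to one decimal place; an empty group counts as 0%.
        $out[$code] = $total > 0 ? round( 100 * $translated / $total, 1 ) : 0.0;
    }
    return $out;
}

// Example with made-up numbers.
$sample = [
    'fr' => [ 'total' => 200, 'translated' => 200 ],
    'it' => [ 'total' => 200, 'translated' => 50 ],
    'de' => [ 'total' => 200, 'translated' => 0 ],
];
$visible = groupStatsRows( $sample, [ 'suppressZero' => true ] );
```

If the option list turns out to be too long for one page, the same filter could back two pages that each expose a subset of the options.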
Modify Special:TranslationStats
- input: languages, groups, projects
- output: table of number of edits or active translators over a given time frame
- options: number of edits, active translators, given time frame
Number of translations ("Special:TranslationCount")
- input: group x (e.g. Wikia extensions)
- output: number of un/translated messages
- options: breakdown per language, breakdown per product
Questions and feedback
- How do we avoid problems when a message reads the same in English and in another language? (The guidelines say not to create such a message; the problem is that the language then never reaches 100%.)
Proposed solution for Group Statistics issues by Nike
Requirements:
- Must be able to query per language or per group
- Must be able to query per language and group together
- Must be fast
- Should be as simple as possible
Implementation
Database table
Name: groupstats
Unique key: gs_group, gs_language
Fields:
- gs_group: text # message group id
- gs_language: text # language code
- gs_total: int # total number of messages in this group
- gs_translated: int # number of translated messages in this language
- gs_fuzzy: int # number of translations which are not up to date
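To show how the unique key on (gs_group, gs_language) supports the three query shapes in the requirements, here is a minimal in-memory stand-in for the table. The class `GroupStatsTable` and its method names are hypothetical; a real implementation would read and write the wiki database instead of a PHP array.

```php
<?php
// In-memory stand-in for the groupstats table (illustration only).
final class GroupStatsTable {
    /** @var array<string, array<string, array<string, int>>> rows keyed by group, then language */
    private array $rows = [];

    // (gs_group, gs_language) is unique, so a second write for the same pair replaces the first.
    public function put( string $group, string $language, int $total, int $translated, int $fuzzy ): void {
        $this->rows[$group][$language] = [
            'gs_total' => $total,
            'gs_translated' => $translated,
            'gs_fuzzy' => $fuzzy,
        ];
    }

    // Query per group: all languages for one group.
    public function byGroup( string $group ): array {
        return $this->rows[$group] ?? [];
    }

    // Query per language: all groups for one language.
    public function byLanguage( string $language ): array {
        $out = [];
        foreach ( $this->rows as $group => $languages ) {
            if ( isset( $languages[$language] ) ) {
                $out[$group] = $languages[$language];
            }
        }
        return $out;
    }

    // Query per language and group together; null means "not cached yet".
    public function get( string $group, string $language ): ?array {
        return $this->rows[$group][$language] ?? null;
    }
}
```

With a row from `get()`, the question "has group X been 100% translated in language xx?" reduces to comparing `gs_translated` with `gs_total`.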
PHP class
class MessageGroupStatistics {
    public static function forLanguage( $code ) {
        # Fetch from database
        # Go over non-aggregate message groups filling missing entries
        # Go over aggregate message groups filling missing entries
    }

    public static function forGroup( $group ) {
        # Fetch from database
        # Go over each language filling missing entries
    }

    // Used by the two functions above to fill missing entries
    public static function forItem( $group, $code ) {
        # Check again if already in db ( to avoid overload in big clusters )
        # Calculate if missing and store in the db
    }
}
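The fetch-then-fill flow outlined in the class above could look roughly like the following. This is a sketch under assumptions: the class `StatsFiller`, the array that stands in for the database, and the `$calculate` callback that stands in for the expensive counting step are all hypothetical.

```php
<?php
// Illustration of "fetch, then fill missing entries"; not the actual implementation.
final class StatsFiller {
    /** @var array<string, array<string, array>> stands in for the groupstats table */
    private array $cache = [];
    /** @var callable stands in for the expensive per-item calculation */
    private $calculate;
    /** Number of times the expensive calculation ran. */
    public int $calculations = 0;

    public function __construct( callable $calculate ) {
        $this->calculate = $calculate;
    }

    // Fetch what is stored for the group, then fill in any language that is missing.
    public function forGroup( string $group, array $languages ): array {
        $stats = $this->cache[$group] ?? [];
        foreach ( $languages as $code ) {
            if ( !isset( $stats[$code] ) ) {
                $stats[$code] = $this->forItem( $group, $code );
            }
        }
        return $stats;
    }

    // Check the store again before calculating, then store the result.
    public function forItem( string $group, string $code ): array {
        if ( isset( $this->cache[$group][$code] ) ) {
            return $this->cache[$group][$code];
        }
        $this->calculations++;
        $row = ( $this->calculate )( $group, $code );
        $this->cache[$group][$code] = $row;
        return $row;
    }
}
```

The second check inside `forItem` mirrors the "check again if already in db" comment: on a busy cluster another request may have stored the entry between the first fetch and the calculation.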
Cache invalidation
Implement a helper function that triggers cache invalidation when:
- A translation is added or updated
- Group definitions change (a message is changed, removed or added)
Things to consider:
- We might need to invalidate multiple groups (namely all aggregate groups that contain the affected group as a child)
- If we have the message key, we can easily get the list of groups. Otherwise we need to loop through all aggregate groups, asking whether they care about this group.
- At first, the only action can be removing the cache entry
- Later we can try to be more intelligent:
- When new messages are added, increase the total column in all cache entries. This might leave the cache stale if those new messages already have translations, which might sometimes be the case.
- When existing messages are removed, it's best just to invalidate all caches
- When new translation is made, increase the translated column for that language (also fuzzy if needed)
- When existing translation is modified, check fuzzy status and update translated and fuzzy columns
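The simplest strategy from the list above, removing the cache entry for a group and for every aggregate group that contains it, can be sketched as follows. The function names, the group ids, and the shape of `$aggregates` (aggregate group id mapped to a list of child ids) are assumptions. The sketch also assumes aggregate groups may be nested, which the text does not state.

```php
<?php
// Hypothetical sketch: find every group whose cached statistics depend on $group.
// $aggregates maps an aggregate group id to the list of its child group ids.
function groupsToInvalidate( string $group, array $aggregates ): array {
    $affected = [ $group ];
    // Repeat until a pass adds nothing, so aggregates of aggregates are covered too.
    do {
        $added = false;
        foreach ( $aggregates as $aggregate => $children ) {
            if ( in_array( $aggregate, $affected, true ) ) {
                continue;
            }
            if ( array_intersect( $children, $affected ) !== [] ) {
                $affected[] = $aggregate;
                $added = true;
            }
        }
    } while ( $added );
    return $affected;
}

// Remove the cache entries of all affected groups and return the remaining cache.
// $cache is keyed by group id and stands in for the stored statistics.
function invalidateGroupStats( array $cache, string $group, array $aggregates ): array {
    foreach ( groupsToInvalidate( $group, $aggregates ) as $id ) {
        unset( $cache[$id] );
    }
    return $cache;
}
```

The more intelligent variants, such as adjusting the total or translated columns in place, would replace the `unset` with an update of the stored counts.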