PLURAL localization

The current rules come from CLDR and are:

<pluralRules locales="lv prg">
    <pluralRule count="zero">n % 10 = 0 or n % 100 = 11..19 or v = 2 and f % 100 = 11..19 @integer 0, 10~20, 30, 40, 50, 60, 100, 1000, 10000, 100000, 1000000, … @decimal 0.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 100.0, 1000.0, 10000.0, 100000.0, 1000000.0, …</pluralRule>
    <pluralRule count="one">n % 10 = 1 and n % 100 != 11 or v = 2 and f % 10 = 1 and f % 100 != 11 or v != 2 and f % 10 = 1 @integer 1, 21, 31, 41, 51, 61, 71, 81, 101, 1001, … @decimal 0.1, 1.0, 1.1, 2.1, 3.1, 4.1, 5.1, 6.1, 7.1, 10.1, 100.1, 1000.1, …</pluralRule>
    <pluralRule count="other"> @integer 2~9, 22~29, 102, 1002, … @decimal 0.2~0.9, 1.2~1.9, 10.2, 100.2, 1000.2, …</pluralRule>
</pluralRules>

You would have to convince CLDR to change the rules. We only override CLDR rules in exceptional cases.
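
To make the rules above concrete, here is a minimal Python sketch of how they evaluate. It assumes the usual CLDR operand meanings (n is the absolute value, v the number of visible fraction digits, f those digits read as an integer) and that a ".." range only matches integer values. The function names are made up for illustration and are not part of any library.

```python
from decimal import Decimal


def operands(number: str):
    """Return (n, v, f) for a plain decimal string such as "10.0" or "21"."""
    text = number.lstrip("+-")
    _, _, fraction = text.partition(".")
    n = Decimal(text)                      # absolute value of the number
    v = len(fraction)                      # visible fraction digits, trailing zeros included
    f = int(fraction) if fraction else 0   # those digits read as an integer
    return n, v, f


def in_range(value: Decimal, low: int, high: int) -> bool:
    # Assumption: a CLDR range such as 11..19 only matches integer values.
    return value == value.to_integral_value() and low <= value <= high


def lv_plural_category(number: str) -> str:
    """Evaluate the "lv" rules quoted above for one number given as text."""
    n, v, f = operands(number)
    if n % 10 == 0 or in_range(n % 100, 11, 19) or (v == 2 and 11 <= f % 100 <= 19):
        return "zero"
    if ((n % 10 == 1 and n % 100 != 11)
            or (v == 2 and f % 10 == 1 and f % 100 != 11)
            or (v != 2 and f % 10 == 1)):
        return "one"
    return "other"
```

With these definitions, "0", "11" and "10.0" fall under "zero", "21" and "0.1" under "one", and "2" under "other", which agrees with the sample lists in the rules.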

Nike (talk) 00:44, 25 December 2014

The use of

{{PLURAL|form1|fallbackform}}

with positional (numbered) parameters should be deprecated in favor of the more explicit labelled forms:

{{PLURAL|one=form1|zero=form2|fallbackform}}

The mapping from labels (one, zero, other, ... as used in CLDR) to positional numbers (compatible with imports/exports in the legacy GNU Gettext format) can be made language-dependent, but the database should record translations with those labels instead of implicitly numbered positions. All we can assume is that the form "other" (used by CLDR) maps to the fallback form (the last form in the legacy syntax with positional parameters).

This way no bot action would be required, except for a first pass that converts all the existing translations to labeled forms (except for the last form, which should remain "other").

Adding a plural rule for the "zero" form will then have minimal impact. Once all the existing translations have had their forms labeled, the fallback order can be changed later by remapping the positional numbers assigned to each label.
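
As a rough illustration of that one-time conversion, here is a Python sketch. The per-language label order in POSITIONAL_LABELS is purely hypothetical (the real order would have to come from the existing plural implementation of each language), and the simple pattern does not handle templates nested inside a form.

```python
import re

# Hypothetical mapping: which label each legacy position stands for, per language.
POSITIONAL_LABELS = {"fr": ["one"], "lv": ["zero", "one"]}

# Matches the simplified {{PLURAL|...}} form used in this thread (no nesting).
PLURAL_RE = re.compile(r"\{\{PLURAL\|([^{}]*)\}\}")


def label_plural_forms(text: str, lang: str) -> str:
    """Label every positional form except the last one, which stays the fallback."""
    labels = POSITIONAL_LABELS[lang]

    def relabel(match):
        *head, fallback = match.group(1).split("|")
        out = []
        for position, form in enumerate(head):
            # Naive check: a form containing "=" is treated as already labeled.
            if "=" in form or position >= len(labels):
                out.append(form)
            else:
                out.append(f"{labels[position]}={form}")
        out.append(fallback)
        return "{{PLURAL|" + "|".join(out) + "}}"

    return PLURAL_RE.sub(relabel, text)
```

Under these assumptions, label_plural_forms("{{PLURAL|form1|fallbackform}}", "fr") gives "{{PLURAL|one=form1|fallbackform}}", and running it a second time changes nothing, so the pass can be repeated safely.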

This could also allow adding more labels, for example for gender, case, or for selectors of their object (reader, author, each cited person, group of persons, subject of the verb, object of the sentence, genitive possessor, ...).

A translation could also carry these labels as tags. For example, in chess games, the French translation of "tower" is "tour", tagged "feminine" for the French feminine and "one" for the French singular; its plural "tours" would be tagged "feminine" and "other". The French translation of "pawn" would be "pion", tagged "masculine" and "one", and its plural "pions" would be tagged "masculine" and "other". There is no need to add multiple variants: the source translation only requests the translation for "tower" and "pawn". But when such an item is used in a variable, the selectors are accessible from the variable name and the variants are selected automatically, so that:

{{#grammarswitch: ${variablename|?=gender articlecontract} ${count|?=plural}
| f plain    zero  = aucune       ${variablename} ${color|f one}
| f plain    one   = la           ${variablename} ${color|f one}
| f contract one   =            l’${variablename} ${color|f one}
| f          other = les ${count} ${variablename} ${color|f other}
| m plain    zero  = aucun        ${variablename} ${color|m one}
| m plain    one   = le           ${variablename} ${color|m one}
| m contract one   =            l’${variablename} ${color|m one}
| m          other
| #default         = les ${count} ${variablename} ${color|m other}
}}
// E.g. Given variablename="tower", count="2", color="white", we get: "les 2 tours blanches"

The selectors used above are generic (and look complex here), but shorter aliases for combinations can also exist, defined by a mapping similar to the mapping of form labels to numbers. Note that for the ${color} adjective (which would be replaced by the translated forms "blanc(he)(s)" or "noir(e)(s)", depending on the color value and on the gender and plural of the name used with it), we select the appropriate form of the adjective using a smaller set of labels (we could use the "zero" form, but its default is the same as "one" in French).

Selectors in the grammar switch are just an unordered space-separated list of symbolic tags (labels). A selector matches if all tags in the selector are present in the first parameter of the grammar switch. Some known synonyms (defined for the language) could be used such as "s" for "one", or "ms" instead of "m one".
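
The matching rule just described can be sketched in a few lines of Python. The synonym table and the variant list are illustrative only, and picking the first matching variant is an assumption of this sketch, since nothing above says how ties are resolved.

```python
# Hypothetical per-language synonyms: one short tag stands for several tags.
SYNONYMS = {"s": {"one"}, "ms": {"m", "one"}, "fs": {"f", "one"}}


def expand(tags: str) -> set:
    """Split a space-separated tag list and expand known synonyms."""
    result = set()
    for tag in tags.split():
        result |= SYNONYMS.get(tag, {tag})
    return result


def choose(active: str, variants):
    """Return the text of the first variant whose selector tags are all active.

    `variants` is a list of (selector, text) pairs; "#default" always matches.
    """
    active_tags = expand(active)
    for selector, text in variants:
        if selector == "#default" or expand(selector) <= active_tags:
            return text
    return None


# The article part of the French example above, reduced to plain strings.
ARTICLES = [
    ("f plain zero", "aucune"),
    ("f plain one", "la"),
    ("f contract one", "l’"),
    ("f other", "les"),
    ("m plain zero", "aucun"),
    ("m plain one", "le"),
    ("m contract one", "l’"),
    ("#default", "les"),
]
```

Because tags are compared as sets, choose("one f plain", ARTICLES) and choose("plain fs", ARTICLES) both give "la", whatever the order of the tags.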

Variables can be queried in two ways:

  • using a "?" parameter in the pipe to retrieve a list of specific tags matching some known pattern (e.g. "gender" matches the tags "m" or "f" if they are set in the translation of the variable, and returns them). Synonym tags could be matched here too: "?=gp" instead of "?=gender plural" would retrieve the union of the tags "m" or "f" for the gender and "one" or "other" for the plural form.
  • using a simple pipe followed by a space-separated list of tags, to select one of its known translations (internally stored like a grammar switch). Synonym tags could be matched here too: "|ms" instead of "|m one" would select the variant to return for the masculine singular form in French.
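
A small Python sketch of these two kinds of query, using the chess example. The pattern table, the lexicon layout, and the choice of the first stored variant as the base form are all assumptions made for illustration.

```python
# Hypothetical tag patterns: a pattern name stands for the tags it may return.
PATTERNS = {"gender": {"m", "f"}, "plural": {"zero", "one", "other"}}
PATTERNS["gp"] = PATTERNS["gender"] | PATTERNS["plural"]  # synonym of "gender plural"

# Hypothetical stored translations: each variant carries its own tags.
LEXICON_FR = {
    "tower": [({"f", "one"}, "tour"), ({"f", "other"}, "tours")],
    "pawn": [({"m", "one"}, "pion"), ({"m", "other"}, "pions")],
}


def query_tags(name: str, patterns: str) -> set:
    """Sketch of ${name|?=patterns}: tags of the translation matching the patterns."""
    wanted = set()
    for pattern in patterns.split():
        wanted |= PATTERNS[pattern]
    base_tags, _ = LEXICON_FR[name][0]   # assumption: the first variant is the base form
    return base_tags & wanted


def select_variant(name: str, tags: str) -> str:
    """Sketch of ${name|tags}: first variant carrying every requested tag."""
    requested = set(tags.split())
    for variant_tags, text in LEXICON_FR[name]:
        if requested <= variant_tags:
            return text
    return LEXICON_FR[name][0][1]        # assumption: fall back to the base form
```

Here query_tags("tower", "gender") returns {"f"}, and select_variant("tower", "other") returns "tours".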

Tags (labels) are just symbolic names (containing only digits, letters, or dashes/underscores, with significant case to simplify implementations). There's also no limitation on the number of tags that can be set or queried from a translation or combined in a selector of a "grammarswitch".

The "grammarswitch" syntax above is for MediaWiki, or for use in parts of a compound translation of the same variant; the internal representation in the database (for storing multiple variants of the same translation unit) will just create as many translation items as needed, indexed by the tag list sorted in alphabetical order.
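
For the storage side, a sketch of such an index could look like this in Python. It assumes "alphabetical" means plain code point order (which also keeps case significant) and uses an ordinary dict in place of the database.

```python
def variant_key(tags: str) -> str:
    """Canonical index of a variant: its tags, deduplicated and sorted."""
    return " ".join(sorted(set(tags.split())))


def store(unit: dict, tags: str, text: str) -> None:
    """Record one variant of a translation unit under its canonical key."""
    unit[variant_key(tags)] = text


def lookup(unit: dict, tags: str):
    """Find a variant whatever the order in which the tags were written."""
    return unit.get(variant_key(tags))
```

For example, after store(unit, "one f", "tour"), the call lookup(unit, "f one") finds "tour" because both tag lists reduce to the key "f one".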

We can also tag the resulting translation with its own appropriate tags. By default, the tags for the result of a "grammarswitch" are those in its selector (in the first parameter), and we don't need to change them. A change may be needed if the full translation contains more than one grammarswitch, but by default all tags in all grammarswitches are combined in a union, and literal text outside the switches doesn't alter the list of tags. To set another list of tags for the result, we could use a syntax such as "${|=new tags}" at the end of the variant (in wikisyntax). This system of defaults would allow useful inheritance of linguistic properties for complex composites. Tags would collide, however, if we wanted to remember, for example, two different genders for distinct parts, in which case new tags should be set for each part. As another example, a composite containing two singular names linked by a conjunction like "and"/"et" would not keep the "one" tag of the singular: it could replace it with the tag "other" using "${|one = other}", and force the resulting gender to masculine when the parts mix masculine and feminine using "${|m f = m}". Both can be set in one tag: "${|one = other|m f = m}", which means: if tag "one" is set, remove it and set tag "other"; if tags "m" and "f" are both set, remove them and add tag "m".
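
The retagging rules can be sketched in Python as follows. The parsing is deliberately simple, the rules are applied in the order written, and treating an empty left-hand side (as in "${|=new tags}") as "replace every tag" is an assumption of this sketch.

```python
def parse_rules(spec: str):
    """Parse "one = other|m f = m" into [(required_tags, new_tags), ...]."""
    rules = []
    for part in spec.split("|"):
        left, _, right = part.partition("=")
        rules.append((set(left.split()), set(right.split())))
    return rules


def retag(tags, spec: str) -> set:
    """Apply each rule in order to the combined tags of a composite."""
    result = set(tags)
    for required, replacement in parse_rules(spec):
        if not required:
            # Assumption: "=new tags" with nothing on the left replaces all tags.
            result = set(replacement)
        elif required <= result:
            # All left-hand tags are set: remove them and add the right-hand tags.
            result = (result - required) | replacement
    return result
```

For "tour et pion", the union of the tags of both names is {"f", "m", "one"}, and retag({"f", "m", "one"}, "one = other|m f = m") yields {"m", "other"}.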

Other examples are possible for selectors of case, person/object/subject, and so on. Many linguistic exceptions may also be taken into account (e.g. above: composite color names such as "bleu-vert" are invariable in French, even when used as epithet or attribute adjectives).

Verdy p (talk) 21:55, 3 April 2016