Difference between CLDR plural rules and MediaWiki plural rules
CLDR plural rules define different sets of decimal numbers according to the differences in grammar which occur in a sentence which includes those numbers - see Language Plural Rules on CLDR. These rules do not appear to cover the situation where a decimal number does not appear in a sentence.
In MediaWiki there are a handful of sentences where the PLURAL magic word is used but no number appears in the sentence. In these sentences there is a subject or object which can be either singular or plural, and PLURAL is used to produce grammatically correct sentences where verbs, pronouns, etc. differ according to the (unstated) number of items discussed. Some examples are:
- ("Hidden categories")
- ("This page has been protected from editing because it is transcluded in the following pages, which are protected with the "cascading" option turned on:")
- ("Warning: This page has been protected so that only users with specific rights can edit it because it is transcluded in the following cascade-protected pages:")
- ("The action you have requested is limited to users in one of the groups: $1.")
- ("The users $1 are now blocked.")
on article!") *
Potentially there are grammar changes triggered here which are different from the grammar rules triggered by changes in decimal numbers. For example:
- the form of a collective noun (referring to 'more than one') may be different to the form used with a specific number, for example in "the 11 following things" and "the following things" the form for "things" may differ.
- verb or pronoun forms used may be different for singular and collective plurals, but not different when the number is included in the sentence.
I know these cases should be rare, and it may be that in practice there are no languages where the CLDR rules do not cover all the uses of PLURAL in MediaWiki as well, but potentially there could be languages affected by this difference in how PLURAL is used on MediaWiki and how it is defined on CLDR. Does anyone know of any languages affected by this?
I think that this is not really an issue that necessarily needs to be raised at CLDR, although I am sure they would be interested to learn about how MediaWiki uses PLURAL in sentences that do not include a decimal number. However, what it may mean is that there may be rare cases where translatewiki.net will need to define more number sets for a language than there are in CLDR.
Last edit: 18:43, 29 January 2012
In other words, we want to know if there is a language where some number-included plural form is not a subset of a number-less plural form. If this is the case, then we cannot achieve number-less forms by just combining existing (number-included) plural forms in a suitable combination (in practice, by giving them the same translation).
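To make the subset question concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the function names and the three rules are hypothetical and are not taken from CLDR or from MediaWiki. It checks, over a sample of integers, whether every number-included form falls entirely inside one number-less form.

```python
from collections import defaultdict


def is_coarsening(numbered, numberless, samples=range(0, 201)):
    """Return True if each numbered form maps into exactly one numberless form.

    Only the integers in `samples` are checked, so this is evidence, not proof.
    """
    targets = defaultdict(set)
    for n in samples:
        targets[numbered(n)].add(numberless(n))
    return all(len(forms) == 1 for forms in targets.values())


# Hypothetical three-form rule for sentences that include the number.
def numbered_rule(n):
    if n == 1:
        return "one"
    if 2 <= n <= 4:
        return "few"
    return "other"


# Hypothetical number-less rule: only singular versus plural.
def numberless_rule(n):
    return "singular" if n == 1 else "plural"


# Hypothetical number-less rule that splits the numbered "few" form (2 vs 3-4).
def awkward_numberless_rule(n):
    if n == 1:
        return "singular"
    if n == 2:
        return "dual"
    return "plural"


print(is_coarsening(numbered_rule, numberless_rule))
print(is_coarsening(numbered_rule, awkward_numberless_rule))
```

When the check succeeds, a translator can get the number-less forms by giving several number-included forms the same translation. When it fails, as with the second hypothetical rule, the language would need an extra number set.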
I'm not aware of a language that lacks the wanted subset relation. Since such languages may exist, I think CLDR should be made aware of this, with the suggestion that they add a note to their explanatory page. Unless someone reports that (s)he has done so already, I shall do that.
Well, I think it would be a superset, but anyway. When going through the above examples again, I found that in Colognian, a sentence followed or preceded by a list (the items of which you can count) is to be treated and built exactly as if the number was included, even if the sentence itself is numberless.
The page on plural rule syntax at CLDR says: "There are two extra values that can be used with count attributes: 0 and 1. These are used for the explicit values, and may or may not be the same as the forms for "zero" and "one"." It seems that CLDR have got around the problem of defining additional categories for use in particular circumstances by introducing these 2 additional values. Would it be possible to write code for MediaWiki's PLURAL which does the same, enabling the use of '0' and '1' only when needed?
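As a rough illustration of what "explicit values" could mean in code, here is a small Python sketch. It is not MediaWiki's or CLDR's implementation: `select_form`, `russian_like_category` and the message texts are hypothetical names and data chosen for the example. The idea is simply that an explicit "0" or "1" entry, if the translator supplied one, wins over the ordinary category lookup.

```python
def russian_like_category(n):
    """Illustrative integer rule: 'one' for 1, 21, 31, ... but not 11."""
    if n % 10 == 1 and n % 100 != 11:
        return "one"
    return "other"


def select_form(n, forms, category_of):
    """Pick a form, preferring an explicit '0' or '1' entry when present."""
    key = str(n)
    if n in (0, 1) and key in forms:
        return forms[key]
    # Fall back to the normal category, then to the mandatory 'other' form.
    return forms.get(category_of(n), forms["other"])


# Hypothetical translations; the explicit "1" entry is optional.
forms = {
    "1": "The following page is protected:",
    "one": "The following $1 page is protected:",
    "other": "The following $1 pages are protected:",
}

for n in (0, 1, 11, 21):
    print(n, select_form(n, forms, russian_like_category))
```

With this layout the explicit entries are only written when a language needs them, so translators for other languages would see no extra forms.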
I wrote the previous comment before I had understood how MediaWiki uses more than one defined plural ruleset to handle numberless sentences (and potentially sentences with zero?). MediaWiki's solution appears to be elegant, with simpler syntax for translators for numberless sentences.
However, you also say in another thread that it is 'hard to unify MediaWiki rules with other systems'. Would it be easier to unify with other systems if, instead of making the second ruleset shorter than the normal ruleset, we made it longer, typically by adding an additional rule for 1 (or for 1 and 2, for Scottish Gaelic for example) and an additional rule for 0 where needed (Swahili would benefit from an additional rule for 0, for example)? Making the second ruleset longer is not as elegant as the current system. But does it help with compatibility with other systems?
Other systems only support one ruleset - that's the problem. And frankly I don't see any reason for using multiple rulesets for one language except for translators' convenience. As far as I know, the second ruleset is always a shortcut for something that can be done using the first ruleset in a slightly longer way. Making the shortcut harder makes it even less useful, and in that case we should rather just drop it.
A second ruleset is used for the languages using rules J and K. Combining both rulesets would entail increasing the number of forms for these languages by 1. As you say, this shouldn't be a great hardship, given that they only have 3 forms at present.
It sounds to me as if you would possibly like to see MediaWiki use one ruleset only for all languages, and that this ruleset should also be usable for sentences without numbers. If so, we could add a paragraph about alternative rulesets not being recommended to Plural#Alternative_ruleset.
I will try to find time to add to the discussion on CLDR, now that we have got examples of languages affected.