GRAMMAR for Georgian

I do not see the implementation in the MediaWiki core sources. To be actually usable in translations, given that these translations will be used on other wikis, it needs to be deployable.

As far as I can see, there is still no "includes/languages/LanguageKa.php", and "languages/messages/MessagesKa.php" is basic: it only implements translations for namespace names and magic keywords, plus a custom regexp for link trails, without registering a grammar extension.

It should be implemented like the grammar extension for Finnish, Irish Gaelic, Upper Sorbian, Kazakh (Cyrillic, Latin, or Arabic), Kölsch/Ripuarian, Latin, Ossetic, Slovenian, or Tuvan:

https://doc.wikimedia.org/mediawiki-core/master/php/LanguageFi_8php_source.html
https://doc.wikimedia.org/mediawiki-core/master/php/LanguageGa_8php_source.html (only specific words)
https://doc.wikimedia.org/mediawiki-core/master/php/LanguageHsb_8php_source.html
https://doc.wikimedia.org/mediawiki-core/master/php/LanguageKk__8php_source.html (detects the target script to use for additional variants per country)
- https://doc.wikimedia.org/mediawiki-core/master/php/LanguageKk__cyrl_8php_source.html (actual implementation for the 3 scripts)
https://doc.wikimedia.org/mediawiki-core/master/php/LanguageKsh_8php_source.html
https://doc.wikimedia.org/mediawiki-core/master/php/LanguageLa_8php_source.html
https://doc.wikimedia.org/mediawiki-core/master/php/LanguageOs_8php_source.html
https://doc.wikimedia.org/mediawiki-core/master/php/LanguageSl_8php_source.html
https://doc.wikimedia.org/mediawiki-core/master/php/LanguageTyv_8php_source.html

I don't know whether you implemented it just to provide case overrides for specific words in a small custom dictionary (just for Wikimedia project names, like in Irish?), or in a generic way (like in Kazakh, Latin, Slovenian or Tuvan), if there's a way to guess the inflection from the detected forms.

Note that the links above point to the current documentation of MediaWiki. They are probably not in sync with the current development branches, the last deployment branches, the last testing ("Canary" build) branch used on test wikis before the actual deployment in the master branch, or a specific subbranch (only for Wikimedia wikis, if your changes do not apply to generic MediaWiki).

If there's no specific per-language override for grammar in any branch, we just get a default fallback for {{GRAMMAR:case|word}} in Language.php which just returns the given word without changing it for any given case.

https://doc.wikimedia.org/mediawiki-core/master/php/Language_8php_source.html
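
To make the fallback behaviour described above concrete, here is a minimal Python sketch. It is not MediaWiki's PHP code: the names convert_grammar and GRAMMAR_FORMS are hypothetical, and the "xx" entry is placeholder data, not real Georgian (or any other) inflection. It assumes a table of explicit per-language overrides, and returns the word unchanged when nothing is registered for the language, case, or word.

```python
# Hypothetical override table: language -> case -> word -> inflected form.
# The single entry is placeholder data for illustration only.
GRAMMAR_FORMS = {
    "xx": {"genitive": {"Wiki": "Wiki-GEN"}},
}


def convert_grammar(lang: str, case: str, word: str) -> str:
    """Return a registered inflected form, or the word unchanged (default fallback)."""
    return GRAMMAR_FORMS.get(lang, {}).get(case, {}).get(word, word)


# A language with an override returns the inflected form.
print(convert_grammar("xx", "genitive", "Wiki"))
# A language without any override gets the word back untouched.
print(convert_grammar("ka", "genitive", "Wiki"))
```

With this shape, a message using {{GRAMMAR:case|word}} silently degrades to the plain word for any language that has no override, which is the situation described above for Georgian.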

Also, the recognized values for case given to {{GRAMMAR:case|word}} should be listed and documented, because a case name that is not recognized won't work and will just return the unmodified given word. Unfortunately these values seem to be non-generic: existing implementations use translated case names, and not generic abbreviations or English names for common cases (like "nominative", "ergative", "vocative", "accusative", "genitive", "dative", "ablative", "locative", "adverbial", "prepositional"...). In some languages, even the case names are variable and may have known aliases, and their own local list of fallbacks from one case form to another. As well, the value to give for the word parameter is generally in the "nominative" case, but I wonder if that is always true for all languages (notably for proper names, the default case form is probably "ergative"); if we give the word in the wrong case form, the derivation of inflections may not work as intended.
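
As a rough illustration of what documented case names with aliases and fallbacks could look like, here is a small Python sketch. Everything in it is an assumption made for the example: the names CASE_ALIASES, CASE_FALLBACKS and resolve_form are hypothetical, and the alias and fallback entries are invented placeholders, not the rules of any real language or of MediaWiki.

```python
# Hypothetical alias table: accepted spellings -> canonical case name.
CASE_ALIASES = {"gen": "genitive", "possessive": "genitive", "dat": "dative"}

# Hypothetical fallback chain: if a case has no form, try this case next.
CASE_FALLBACKS = {"dative": "genitive"}


def resolve_form(forms: dict, case: str, word: str) -> str:
    """Pick the form of `word` for `case`, following aliases and fallbacks.

    `forms` maps canonical case names to the inflected forms known for `word`.
    The unmodified word is returned when no form can be found.
    """
    current = CASE_ALIASES.get(case.lower(), case.lower())
    seen = set()  # guards against cycles in the fallback table
    while current is not None and current not in seen:
        if current in forms:
            return forms[current]
        seen.add(current)
        current = CASE_FALLBACKS.get(current)
    return word


known = {"genitive": "X-GEN"}           # placeholder inflected form
print(resolve_form(known, "GEN", "X"))       # alias resolves to genitive
print(resolve_form(known, "dat", "X"))       # dative falls back to genitive
print(resolve_form(known, "locative", "X"))  # unknown case: word unchanged
```

Publishing the alias and fallback tables alongside each language implementation would let translators see which case names are accepted instead of discovering it by trial and error.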

Something that I'd like to have in French is a derivation pseudo-case form for eliding the final vowel of leading articles ("le", "la"), prepositions or conjunctions ("de", "jusque", "lorsque", "puisque"), subject/object/reflexive pronouns ("je", "me", "te", "se", "que"...), or the negation particle ("ne"). There is a generic rule: elision is mandatory when the following word starts with a vowel (a, à, â, e, é, ê, i, o, ô, u, y). However, a lookup in a moderately large dictionary is needed for words starting with a leading "h" (or at least for a radical or common prefix like "haplo", "hémi", "hém(a/o)", "hept(a/o)", "hétér(i/o)", "hom(i/o)", "human(i/o)"..., which are generally unaspirated), because the elision of the leading article/pronoun/particle occurs only when the leading "h" is mute in the following word/radical/prefix, not when this "h" is "aspirated".

An aspirated "h" is generally noted in French dictionaries by a leading asterisk in the phonetic notation of lexemes. It occurs most frequently in nouns or substantive radicals like "haricot", and in proper names, gentilés and people names like "Hongrie", "Hongrois(e)(s)", "Hun(s)", "Han(s)"... but not "Henr(i/y)". It may be possible to list these aspirated words as well-known exceptions, but many "aspirated" words are in fact borrowed from other languages where the "h" is not mute as in French, but pronounced as a leading glottal or voiced consonant, like "Haydn", taken from German. In general, proper names starting with a capital "H" are aspirated by default, except for a few known exceptions, so no elision occurs for the common preposition "de" or the rarer article "le".

There is also the very common (and mandatory) contraction of "le" (only if it was not elided with the following word) or "les", after the prepositions "à" or "de", into "au(x)" or "du/des".

Another case similar to elision is the mandatory transformation of the demonstrative "ce" into "cet" (not an elision but the addition of an epenthetic consonant) in the same cases where the following word would require the elision of a preceding "le" or "de". The frequent epenthetic addition of the particles "-t-" or "-z'" (purely phonetic, without any semantic meaning, but orthographically mandatory) also occurs between a conjugated verb and a pronoun starting with a vowel ("il(s)", "elle(s)", "on", "en", "y"), when this conjugated verb ends with a vowel (even if that vowel is a mute "e").
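
The French elision and contraction rules described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not an existing MediaWiki feature: the function names are hypothetical, the vowel list is the short one given above, the aspirated-h list contains only a few of the examples mentioned, and any other word starting with "h" is treated as mute, which is a deliberate simplification of the dictionary lookup that would really be needed.

```python
VOWELS = set("aàâeéêioôuy")  # short list from the discussion, not exhaustive
# Tiny sample of aspirated-h words; a real list would be far larger.
ASPIRATED_H = {"haricot", "hongrie", "hongrois", "hun", "han", "haydn"}
ELIDABLE = {"le", "la", "de", "je", "me", "te", "se", "ne", "que",
            "jusque", "lorsque", "puisque"}
CONTRACTIONS = {("à", "le"): "au", ("à", "les"): "aux",
                ("de", "le"): "du", ("de", "les"): "des"}


def needs_elision(next_word: str) -> bool:
    """True if a preceding elidable word must lose its final vowel."""
    if not next_word:
        return False
    first = next_word[0].lower()
    if first in VOWELS:
        return True
    if first == "h":
        # Simplification: every h-word not listed as aspirated counts as mute.
        return next_word.lower() not in ASPIRATED_H
    return False


def join_with_elision(particle: str, next_word: str) -> str:
    """Join particle and word, eliding the particle's final vowel if required."""
    if particle.lower() in ELIDABLE and needs_elision(next_word):
        return particle[:-1] + "'" + next_word
    return particle + " " + next_word


def with_preposition(prep: str, article: str, noun: str) -> str:
    """Combine "à"/"de" with an article: elision takes priority over contraction."""
    if article in ("le", "la") and needs_elision(noun):
        return prep + " " + join_with_elision(article, noun)
    merged = CONTRACTIONS.get((prep, article))
    if merged:
        return merged + " " + noun
    return prep + " " + article + " " + noun


print(join_with_elision("le", "arbre"))         # l'arbre
print(join_with_elision("le", "haricot"))       # le haricot
print(with_preposition("de", "le", "haricot"))  # du haricot
print(with_preposition("à", "le", "homme"))     # à l'homme
```

Even this small sketch needs both neighbouring words as input, which a single-word {{GRAMMAR:case|word}} call cannot express directly.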

Similar elisions occur in Italian (even more frequently than in French). Such very frequent constructions currently have no support in MediaWiki (even though they are mandatory in the grammar), which requires adapting the translations in non-obvious ways.

For Celtic languages (Breton, Irish, Welsh...) the situation is more complex because there are mutations of the leading consonant(s) and possibly of the following vowel.

For Slavic languages (Russian, Ukrainian, Serbian, Slovenian...) another complexity exists with proper names, which have complex derivation (also taking into account the gender of the designated person). This leads to situations where translations made for these languages do not use normal sentences, but a quite "ugly" form with extra punctuation and separators in order to "isolate" these names in an "ergative" form. The result is a non-natural translation using non-verbal sentences that look more like an enumeration of terms.

So for now the support of "GRAMMAR" transforms is really very basic, not very well studied, and a lot more development is needed to correctly handle all linguistic needs. The existing support may therefore be unstable. The existing basic syntax {{GRAMMAR:case|word}} may be insufficient and may need more parameters, e.g. for mutations and how to glue two terms, or for the placement of particles in the sentence (such as in German conjugations, but even in English with adverbial particles). In most cases it is simply impossible to derive terms or sentences correctly. Where possible, translation units should not fragment sentences; this also avoids bad assumptions about the correct word order, frequently made by English-speaking developers and causing trouble when we attempt to translate these "patchworked" messages. The terms that are most likely to be used as "variables" embedded in sentences, and that are the most problematic, are proper names (including user names, which must also be properly isolated because of Bidi, when we cannot even transliterate them or when they mix several scripts).

Verdy p (talk)‎

See https://gerrit.wikimedia.org/r/c/mediawiki/core/+/787720

Amir E. Aharoni (talk)‎