GRAMMAR for Georgian

Hello! I want to enable the GRAMMAR magic word for the Georgian language. What should I do to make at least the minimum cases available (site name declension, etc.)?

Გიო ოქრო (talk)11:09, 9 April 2022

Excellent!

Just write the forms as a reply here, and I'll add them to the code.

Amir E. Aharoni (talk)07:26, 12 April 2022

I wrote the elementary declension here: Template:Grammar-ka. There are English meanings too. Is it easy to understand?

Გიო ოქრო (talk)08:13, 12 April 2022

OK, I'll do it. Give me a few days, it's a little bit complicated :)

Amir E. Aharoni (talk)08:06, 13 April 2022

The actual grammar for declensions and other word derivations in Georgian is quite complex. The form depends on the final letter of the word (whether it is a vowel or a consonant), but also on a long list of exceptions, and proper names are treated specially. There can also be a mutation or elision of the last vowel before the trailing consonant. And there are many grammatical cases (including the "vocative" case, contested by several well-known Georgian linguists, which applies to nouns or nominal groups that are not grammatically attached to a verb in a sentence or proposition and not attached to another noun as a genitive, dative or locative complement).
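To make the "final letter plus exceptions" idea concrete, here is a minimal Python sketch of a genitive rule. It is only an illustration under simplifying assumptions, not the MediaWiki implementation: the function name `genitive` and the table `GENITIVE_EXCEPTIONS` are hypothetical, the exception list contains a single sample word, and personal names (which often decline differently) are not handled at all.

```python
# Hypothetical exception table for irregular stems, e.g. syncope
# (the last stem vowel drops out before the case ending).
GENITIVE_EXCEPTIONS = {"წყალი": "წყლის"}


def genitive(word: str) -> str:
    """Return a simplified Georgian genitive of a nominative noun."""
    # 1. Explicit exceptions always win over the generic rules.
    if word in GENITIVE_EXCEPTIONS:
        return GENITIVE_EXCEPTIONS[word]
    # 2. Consonant stems: replace the nominative marker -ი with -ის.
    if word.endswith("ი"):
        return word[:-1] + "ის"
    # 3. Stems in -ა or -ე: drop the final vowel, then add -ის.
    if word.endswith(("ა", "ე")):
        return word[:-1] + "ის"
    # 4. Stems in -ო or -უ: keep the vowel and add -ს.
    if word.endswith(("ო", "უ")):
        return word + "ს"
    # 5. Assumption: a bare consonant ending (e.g. a foreign name
    #    without the nominative marker) simply takes -ის.
    return word + "ის"
```

For example, `genitive("ვიკიპედია")` gives "ვიკიპედიის" and `genitive("ოქრო")` gives "ოქროს". A real implementation would need a far longer exception list, which is exactly why testing against a large vocabulary matters.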

I looked in Wiktionary (in Georgian, and in English and French, which have extensive sets of templates to show tables of derived words, such as declensions, plurals and conjugations) and in Wikipedia, and so far this has never been developed.

So before making a change on translatewiki.net, you should first start developing a set of derivation tables in Wiktionary, and enumerate and test the possible cases, then discuss there how to integrate them with other wikis. (This took a long time to develop, for example, in French, German or Italian, and it was still messy to develop in English for the conjugation of irregular verbs. The rules needed for Slavic and Celtic languages can also be quite complicated: Georgian behaves much like Celtic languages, except that it does not seem to use mutations on the leading letters of roots.) This could take the form of a module rather than a template, but TWN still does not support Lua modules (with Scribunto), only regular MediaWiki templates.

However, it may support an extension for {{GRAMMAR:case|word}} through an extension hook written in PHP and used by MediaWiki. You will need to contact MediaWiki developers for this, but generally such development starts only after at least templates or modules have been set up (generally in Wiktionary, where they can be tested and discussed and where lists of exceptions can be reliably developed and tested, instead of making assumptions based on just a few words). So you should first contact users of the Georgian Wiktionary to see how they can help with this project.

Later, the MediaWiki PHP hook extension for grammatical cases could be initiated (see mw:Help:Magic words#Localization and mw:Manual:$wgGrammarForms on the MediaWiki wiki, and the Language::convertGrammar method of the Language class, part of the "mediawiki-core" API in PHP on the Wikimedia documentation site). For now nothing has been done there for Georgian:

You'll need to think about the effect of other modifications based on grammatical gender, on the semantic type of names such as inanimate/personal/honorific/conceptual (typical of various African languages), on the grammatical plural, or on the presence of contextual mutations and elisions, and make these part of the case selector parameter. The values of the case selector must also be rationalized. Generally they use common abbreviations; they don't need to support "plain names" in Georgian, except as common aliases that may optionally be recognized/standardized, but this is probably not worth the effort and could cause confusion about how to use the "GRAMMAR:" parser function. So the case values could remain plain abbreviations in ASCII/Basic Latin (like the "GRAMMAR:" function name itself). And to rationalize that, you need good linguistic references for the target language. Even native speakers do not know or think about all the cases that need handling, notably attested exceptions or derivation rules that are more complex than usual and hard to think of without extensive searches. That's why I suggest basing the development and tests in Wiktionary, which should collect as many terms as possible.

The base documentation for the linguistic conversion supported in MediaWiki is on the ILanguageConverter Interface Reference.

You'll also note that even well-known languages like English, French, German, Spanish, Portuguese, Italian, or Russian (fully documented in their own Wiktionary, and using sets of derivation templates or Lua modules) still do not even have a specialized class in MediaWiki for handling GRAMMAR. (English has very few things, not related to grammar: basically only handling fallbacks from other languages to English without producing errors, and minimal experimental support for a fictive "Piglatin English" variant, solely so that developers who know only English can test the support of "language variants" in other modules.) That's because maintaining a GRAMMAR parser hook is much more complicated than just using templates/modules (on each wiki), and setting up GRAMMAR parser hook support without maintenance could potentially break a lot of pages (if it does not properly handle long lists of exceptions). The same can be said about the development of transliterators and input methods (difficult and already quite large to maintain for Chinese or Serbo-Croatian without an extensive user base and a lot of discussion with a large enough team of supporting developers using reliable linguistic sources).

Verdy p (talk)10:53, 14 April 2022
 

Hi! I implemented ნათესაობითი (genitive). For example, now you can try changing MediaWiki:Aboutsite/ka to say {{GRAMMAR:ნათესაობითი|{{SITENAME}}}} შესახებ, and it should work.

If it works well, let me know, and I'll do the rest.

Amir E. Aharoni (talk)07:04, 30 May 2022

I do not see the implementation in the MediaWiki core sources. To be actually usable in translations, given that these translations will be used on other wikis, it needs to be deployable.

As far as I can see, there's still no "includes/languages/LanguageKa.php", and "languages/messages/MessagesKa.php" is basic: it just implements translations for namespace names, magic keywords, and a custom regexp for link trails, without registering a grammar extension.

It should be implemented like the grammar extension for Finnish, Irish Gaelic, Upper Sorbian, Kazakh (Cyrillic, Latin, or Arabic), Kölsch/Ripuarian, Latin, Ossetic, Slovenian, or Tuvan:

I don't know if you implemented it just to provide case overrides for specific words in a small custom dictionary (just for Wikimedia project names, like in Irish?), or in a generic way (like in Kazakh, Latin, Slovenian or Tuvan) if there's a way to guess it from the detected forms.


Note that the links above are those used for the current documentation of MediaWiki; they are probably not in sync with the current development branches, the latest deployment branches, the latest testing ("Canary" build) branch used on test wikis before the actual deployment in the master branch, or a specific subbranch (only for Wikimedia wikis, if your changes do not apply to generic MediaWiki).

If there's no specific per-language override for grammar in any branch, we just get a default fallback for {{GRAMMAR:case|word}} in Language.php which just returns the given word without changing it for any given case.
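The fallback behaviour described above can be sketched in a few lines of Python. This is not MediaWiki's PHP code, only an illustration of the dispatch logic; the names `GRAMMAR_HANDLERS` and `convert_grammar` are hypothetical.

```python
from typing import Callable, Dict

# A handler takes (word, case) and returns the inflected word.
GrammarHandler = Callable[[str, str], str]

# Hypothetical registry: language code -> per-language handler.
GRAMMAR_HANDLERS: Dict[str, GrammarHandler] = {}


def convert_grammar(lang: str, word: str, case: str) -> str:
    """Inflect `word` for `case`, or return it unchanged as a fallback."""
    handler = GRAMMAR_HANDLERS.get(lang)
    if handler is None:
        # No per-language override registered: the word is returned
        # as-is, whatever case was requested.
        return word
    return handler(word, case)
```

With an empty registry, `convert_grammar("ka", "Wikipedia", "genitive")` simply returns "Wikipedia", which is why a message using {{GRAMMAR:...}} silently shows the uninflected word on a wiki where the language-specific code is not deployed.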

Also, the recognized values for the case parameter of {{GRAMMAR:case|word}} should be listed and documented, because otherwise these grammatical case forms won't work: an unrecognized value will just return the given word unmodified. Unfortunately the values seem to be non-generic: existing implementations use translated case names, not generic abbreviations or English names for common cases like "nominative", "ergative", "vocative", "accusative", "genitive", "dative", "ablative", "locative", "adverbial", "prepositional"... In some languages, even the case names are variable and may have known aliases, and their own local list of fallbacks from one case form to another. Also, the value given for the word parameter is generally in the "nominative" case, but I wonder if that is always true for all languages (notably for proper names, the default case form is probably "ergative"); if we give the word in the wrong case form, the derivation of inflections may not work as intended.


Something that I'd like to have in French is a derivation pseudo-case form for elision of the final vowel of leading articles ("le", "la"), prepositions ("de", "jusque", "lorsque", "puisque"), subject/object/reflexive pronouns ("je", "me", "te", "se", "que"...), or the negation particle ("ne"). There's a generic rule: elision is mandatory when the following word starts with a vowel (a, à, â, e, é, ê, i, o, ô, u, y). But a lookup in a moderately large dictionary is needed for words starting with "h" (or at least for a radical or common prefix like "haplo", "hémi", "hém(a/o)", "hept(a/o)", "hétér(i/o)", "hom(i/o)", "human(i/o)"..., which are generally unaspirated), because the elision of the leading article/pronoun/particle occurs only when the leading "h" of the following word/radical/prefix is mute, not when it is "aspirated".

An aspirated "h" is generally noted in French dictionaries by a leading asterisk in the phonetic notation of lexemes. It occurs most frequently in nouns or substantive radicals like "haricot", or in proper names, gentilés and people names like "Hongrie", "Hongrois(e)(s)", "Hun(s)", "Han(s)"... but not "Henr(i/y)". It may be possible to list these aspirated words as "well-known" exceptions, but many "aspirated" words are in fact borrowed from other languages where the "h" is not mute as in French but pronounced as a leading glottal or voiced consonant, like "Haydn" taken from German. In general, proper names starting with a capital "H" are aspirated by default, except for a few known exceptions, so that no elision occurs for the common preposition "de" or the rare article "le".

There is also the very common (and mandatory) case of contraction of "le" (only if it was not elided with the following word) or "les", after the prepositions "à" or "de", into "au(x)" or "du/des".

Another case similar to elision is the mandatory transformation of the demonstrative "ce" into "cet" (not an elision but the appending of an epenthetic consonant) in the same cases where the following word would require the elision of a preceding "le" or "de". The frequent epenthetic addition of the particles "-t-" or "-z'" (purely phonetic, without any semantic meaning, but orthographically mandatory) also occurs between a conjugated verb and a pronoun starting with a vowel ("il(s)", "elle(s)", "on", "en", "y"), when this conjugated verb ends with a vowel (even if that vowel is a "mute e").
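As a rough illustration of the French elision rule, here is a minimal Python sketch. It is not an existing MediaWiki feature: the function `elide` and the three tables are hypothetical, the aspirated-h list contains only the sample words mentioned above, and any other word starting with "h" is assumed to have a mute "h" (a real implementation would need the dictionary lookup).

```python
# Words whose final vowel is dropped before a vowel or mute "h".
ELIDABLE = {"le", "la", "de", "je", "me", "te", "se", "que", "ne",
            "jusque", "lorsque", "puisque"}

# Initial vowels that trigger elision.
VOWELS = set("aàâeéêioôuy")

# Tiny sample of aspirated-h words (no elision); a real list is long.
ASPIRATED_H = {"haricot", "hongrie", "hongrois", "hun", "han", "haydn"}


def elide(particle: str, word: str) -> str:
    """Join `particle` and `word`, eliding the particle when required."""
    first = word[:1].lower()
    # Assumption: an initial "h" is mute unless the word is listed.
    mute_h = first == "h" and word.lower() not in ASPIRATED_H
    if particle.lower() in ELIDABLE and (first in VOWELS or mute_h):
        # Drop the final vowel and attach with an apostrophe.
        return particle[:-1] + "'" + word
    return particle + " " + word
```

For instance, `elide("le", "arbre")` gives "l'arbre" and `elide("de", "Henri")` gives "d'Henri", while `elide("le", "haricot")` and `elide("de", "Hongrie")` keep the full particle. The contractions ("au", "du", "des") and the "ce"/"cet" alternation would need further rules on top of this.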

Similar elisions occur in Italian (even more frequently than in French). Such very frequent constructs currently have no support in MediaWiki (even though they are mandatory in the grammar), so translations have to be adapted in non-obvious ways.

For Celtic languages (Breton, Irish, Welsh...) the situation is more complex because there are mutations of the leading consonant(s) and possibly of the following vowel.

For Slavic languages (Russian, Ukrainian, Serbian, Slovenian...) another complexity exists with proper names, which have complex derivations (also taking into account the gender of the designated person). This leads to situations where translations made for these languages do not use normal sentences, but a quite "ugly" form with extra punctuation and separators in order to "isolate" these names in an "ergative" form. The result is an unnatural translation using non-verbal sentences that look more like an enumeration of terms.

So for now the support for "GRAMMAR" transforms is really very basic and not very well studied, and a lot more development is needed to correctly handle all linguistic needs. The existing support may therefore be unstable. The existing basic syntax {{GRAMMAR:case|word}} may be insufficient and need more parameters, e.g. for mutations and for how to glue two terms together, or for the placement of particles in the sentence (as in German conjugations, but even in English with adverbial particles). In most cases it is simply impossible to derive terms or sentences correctly. Where possible, translation units should not fragment sentences (also to avoid bad assumptions about the correct word order, frequently made by English-speaking developers and causing trouble when we attempt to translate these "patchworked" messages). The terms most likely to be used as "variables" embedded in sentences, and the most problematic ones, are proper names (including user names, which must also be properly isolated because of Bidi, when we cannot even transliterate them or when they mix several scripts).

Verdy p (talk)08:22, 30 May 2022

I'll try it and write back if it works. Thanks for your interest.

Გიო ოქრო (talk)10:57, 2 June 2022
Გიო ოქრო (talk)09:31, 14 June 2022

Thanks for the confirmation. I'll do it for the other cases some time soon. You can probably also do it on MediaWiki:Aboutsite/ka. (And after you do it, you should delete this local message on the Georgian Wikipedia: w:ka:MediaWiki:Aboutsite.)

Amir E. Aharoni (talk)12:11, 14 June 2022