Fragment of a discussion from Support

That problem was also in, see Phabricator task. Now it has been rectified.

Იო ოქრო (talk) 09:31, 17 March 2021

Ok. Do I understand correctly that Გ is the uppercase letter of გ? It looks like my computer does not have an up-to-date font for Georgian, as I see Გ only as a placeholder character :-(

Raymond 14:52, 17 March 2021
Edited by author.
Last edit: 20:33, 17 March 2021

Actually, modern Georgian is unicameral. Georgian has historically used at least 3 major script variants (which in earlier versions of Unicode were merged into a single one and incorrectly encoded as if the script were bicameral with two variants, even though it was originally unicameral and used only one variant).

There is, however, some limited use where 2 script variants may be used simultaneously, but normally NOT in the same word (it's like the upright versus italic variants in Latin, which you normally don't mix to create case distinctions).

At some point in the past, there was an attempt to use two variants as if the script were bicameral, making one of the variants "capital" and the other "lowercase". This was an error.

Actually, you should NEVER convert lettercase in Georgian: letters should preserve their existing form in the 3 existing major variants. But outdated software still does this conversion to generate some "titlecasing", which is wrong for Georgian as it mixes the 3 variants (which are actually 3 distinct alphabets, not exactly and fully equivalent). So "Გ" is NOT the uppercase letter of "გ"; these are just semi-equivalent letters in two variants of the script. In modern Georgian use, you can perfectly well use only one variant for everything in your text.

But a second variant may be used to emphasize text (not to create case distinctions), just as you could use small caps in Latin within a paragraph of text, or an "all caps" style for monumental inscriptions and short extracts of text (but note that in Latin you lose some text distinctions; the same occurs in Georgian, because converting from the modern to the historic "capital" variant is also lossy for a few characters of the alphabet).

MediaWiki should therefore not remap any Georgian letter for titlecasing (i.e. the first character of page names), even if these characters are sorted at the same position at the primary collation level: these letters are still distinct, but the distinction is NOT a valid lettercase distinction.
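As an illustration of the point above, here is a minimal Python sketch (not MediaWiki's actual code, which is written in PHP). It contrasts a naive "uppercase the first character" rule with a variant that leaves Georgian initials untouched. The function names and the choice to test by Unicode block ranges are assumptions made for this example; it also assumes a Python whose Unicode database is version 11 or later (Python 3.7+), where "გ" (U+10D2) has an uppercase mapping to "Გ" (U+1C92).

```python
# Code point ranges of the three Georgian blocks in Unicode.
GEORGIAN_RANGES = (
    (0x10A0, 0x10FF),  # Georgian
    (0x1C90, 0x1CBF),  # Georgian Extended
    (0x2D00, 0x2D2F),  # Georgian Supplement
)


def is_georgian(ch: str) -> bool:
    """Return True if the single character ch lies in a Georgian block."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in GEORGIAN_RANGES)


def naive_ucfirst(title: str) -> str:
    """Uppercase the first character, whatever its script."""
    return title[:1].upper() + title[1:]


def ucfirst_keep_georgian(title: str) -> str:
    """Like naive_ucfirst, but never remap a Georgian initial."""
    if title and is_georgian(title[0]):
        return title
    return naive_ucfirst(title)


if __name__ == "__main__":
    name = "გიო ოქრო"
    print(naive_ucfirst(name))          # first letter remapped to U+1C92
    print(ucfirst_keep_georgian(name))  # title left exactly as typed
```

With the second function, a Latin title such as "raymond" is still capitalized, while a Georgian title keeps whatever initial the author typed.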

There is some informal use of the Georgian script as if it were a bicameral script (or even tricameral!), but this is informal. Such transforms are in general lossy, and invalid if they are automated. The best way to look at the Georgian script is as if it actually were 3 distinct scripts M, A and N; but another view considers it as having 2 major scripts (only A, or mostly M plus some A; mostly A plus some N):

  • [Geor] (Georgian): Mkhedruli + some (Aso)mtavruli, or just Mkhedruli (modern use only). MediaWiki case-maps M+A as if it were bicameral (this is deprecated: an M-only text is correct, an A-only text is not modern Georgian).
  • [Geok] (Khutsuri): Asomtavruli + some Nuskhuri (Nuskhuri is normally never used alone); this is an old historic use (A+N, or just A for monumental use only).

The question of whether Georgian is really bicameral is still debated (this depends on orthographic conventions that have never been definitively resolved): this is the same situation that occurred in the past for Medieval Latin or Medieval Greek, before bicameral lettercase distinctions were considered important, with string distinctions allowing the same word to mix the two cases under strict conditions (and then conversion under weak/lossy conditions).

There's a comparable situation in Japanese, which is written with 2 distinct kana scripts (Hiragana and Katakana) that are not really equivalent (one of them is more lossy than the other), and which occasionally uses a third script (Kanji sinograms): converting *automatically* between Hiragana and Katakana is invalid, even if some letter pairs collate at the same primary position (in a sorted list or for plain-text searches).

Raymond, what you did was clearly wrong: you removed the first letter "Გ", whereas the user asked you to change the first letter "Გ" from one Georgian alphabet variant to "გ" in another variant. Then the second letter was incorrectly capitalized (and here this is a bug of MediaWiki, which should NOT capitalize this "ი" letter into "Ი").

In fact, as long as MediaWiki considers Georgian a true bicameral script (based on how the script was initially encoded in Unicode, then extended later to add a third historic alphabet now considered part of a separate "Geok" script in ISO 15924), the two letters "Გ" and "გ" (which are "case-mapped" by MediaWiki using the deprecated Unicode casing rules) should have caused the request NOT to be honored.

You should not have touched this username at all, because MediaWiki still considers the two forms equivalent when ignoring case (and there's still no way to avoid the "promotion" of the initial "გ" to "Გ" in page titles).

Instead, the user could have modified their user page "User:Გიო ოქრო", using "{{DISPLAYTITLE:User:გიო ოქრო}}" as a valid way to override the incorrect forced capitalisation made by MediaWiki.

Verdy p (talk) 19:53, 17 March 2021

"Raymond, what you did was clearly wrong: you removed the first letter "Გ",..:"

Yes, this is what I understand now. To me, Გ looked like a kind of control character which should be removed.

@Გიო ოქრო: I can try to rename you back to "Გიო ოქრო". But I see no chance to rename you to "გიო ოქრო" because MediaWiki converts every first character of a page, even a username, to uppercase.

Raymond 20:10, 17 March 2021

And once again "Გ" is not really an "uppercase" letter; it is just converted from "გ" using the **old** uppercasing mapping (whose use is deprecated in Georgian, but has been kept in Mediawiki).

And no, "Გ" is not some control character; it is another letter, from the (Aso)mtavruli alphabet (considered to be an "uppercase" letter only if we accept that Georgian is bicameral), while "გ" is the Mkhedruli letter (considered to be "lowercase" only if we accept that Georgian is bicameral).
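Anyone whose font shows only a placeholder can check what the two characters are without rendering them. The sketch below is an illustration added for that purpose; it uses Python's standard `unicodedata` module and assumes a Unicode database of version 11 or later (Python 3.7+). The helper name `describe` is made up for this example. Note that the Unicode character name calls the second letter "Mtavruli".

```python
import unicodedata


def describe(ch: str) -> str:
    """Return code point, general category and Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.category(ch)} {unicodedata.name(ch)}"


for ch in ("გ", "Გ"):
    print(describe(ch))
# U+10D2 Ll GEORGIAN LETTER GAN
# U+1C92 Lu GEORGIAN MTAVRULI CAPITAL LETTER GAN
```

A control character would report the general category `Cc`; both characters here are letters (`Ll` and `Lu`).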

Using Georgian scripts as if they were bicameral is highly debated and in fact not even recommended (this is a matter of opinion and author choices to prefer the bicameral view, or the unicameral view).

For the modern Georgian language, the official script is monocameral, Mkhedruli ("lowercase" only), and converting Mkhedruli ("lowercase") to Asomtavruli ("uppercase") is invalid (and lossy for some letters!); the inverse conversion is also invalid (but usually not lossy in that direction), so page titles in Georgian should *not* have their first Mkhedruli letter "uppercased" to (Aso)mtavruli, but rather have their (Aso)mtavruli initial "lowercased" to Mkhedruli. This is not what MediaWiki does, because it does not use the Georgian casing rules and uses simpler casing rules (which have bugs in various languages, notably in Turkic languages for the behavior of dotted vs. dotless I or J, and also in Germanic, Central European and Greek languages for the initial/medial/final behavior of S or Sigma and its capitalisation forms or ligatures).
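The opposite normalization suggested above, turning a "capital" initial back into Mkhedruli, could be sketched as follows. This is only an illustration, not existing MediaWiki behavior. It assumes the "uppercase" letters in question are the ones Unicode encodes at U+1C90..U+1CBA and U+1CBD..U+1CBF (such as the "Გ" in this thread), which sit at a constant offset from their Mkhedruli counterparts at U+10D0..U+10FA and U+10FD..U+10FF. The function names are hypothetical.

```python
# Constant distance between a "capital" letter (U+1C90...) and its
# Mkhedruli counterpart (U+10D0...).
OFFSET = 0x1C90 - 0x10D0


def is_capital_georgian(ch: str) -> bool:
    """True for letters in U+1C90..U+1CBA or U+1CBD..U+1CBF."""
    cp = ord(ch)
    return 0x1C90 <= cp <= 0x1CBA or 0x1CBD <= cp <= 0x1CBF


def initial_to_mkhedruli(title: str) -> str:
    """Map a "capital" Georgian initial to Mkhedruli; leave the rest alone."""
    if title and is_capital_georgian(title[0]):
        return chr(ord(title[0]) - OFFSET) + title[1:]
    return title


if __name__ == "__main__":
    print(initial_to_mkhedruli("Გიო ოქრო"))  # initial becomes U+10D2
    print(initial_to_mkhedruli("Raymond"))    # non-Georgian titles unchanged
```

Only the first character is examined, so any deliberate use of the second variant later in a title is preserved.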

Verdy p (talk) 20:30, 17 March 2021

Sorry, I give up. Too much headache :-(

Raymond 21:07, 17 March 2021

@Raymond: Ok, rename me back to "Გიო ოქრო". Leave it like this until the problem is fixed.

გიო ოქრო (talk) 13:14, 18 March 2021

Done. Re-rename done. And please excuse my mistake, which was due to my ignorance of the Georgian script.

Raymond 07:55, 19 March 2021