Alphabet changing Nogai language

Oh yes, it can work like the Kazakh and Serbian Wikipedias, where the page's language menu also lets you switch the alphabet. For information, we will use the new Latin alphabet (not the 1928 version). File:Nogai latin alphabet.gif

TayfunEt. (talk)12:27, 25 April 2022
Edited by author.
Last edit: 08:10, 26 April 2022

At least Omniglot provides a good start for a transliterator. (https://www.omniglot.com/writing/nogai.htm)

It may eventually be reversible, with these simple exceptions using digraphs that would take precedence in the conversion back to Cyrillic. (Cyrillic is still in use in Russia, in Dagestan and Chechnya, while the Latin script is used in Turkey and Romania; it is based on the Turkic Latin alphabet in the ISO/IEC 8859-9 subset for Latin, which also fully supports both Turkish and Romanian.)

  • "ya / Ya / YA" = "я / Я" [ja],
  • "yo / Yo / YO" = "ё / Ё" [jo] (only occurring in Nogai loanwords borrowed from Russian),
  • "yu / Yu / YU" = "ю / Ю" [ju] (or [jy] only if initial of the word or after a vowel),
  • "y / Y / ’" = "ь / Ь" [ɯ] (only occurring in Nogai loanwords borrowed from Russian),
    The Latin apostrophe needs special handling so that it is associated with the Cyrillic soft sign letter "ь / Ь". The curly version U+2019 (right single quotation mark, i.e. ’) is preferable to the ASCII quote, even if the ASCII apostrophe is probably commonly used as a substitute. There is no curly quote in ISO/IEC 8859-9, but it is present at 0x92 in the well-supported code pages Windows-1254 (for the Latin-based alphabet in Turkish) and Windows-1250 (for the Latin-based alphabets of Central European languages, including Romanian); it is rarely present on the standard physical keyboards used in Turkey and Romania.
  • "j / J" = "й / Й" [j] (note: there's no need for a distinction between dotted and undotted "j/J" in the Latin script, so it is "soft-dotted" like in English or Italian)
  • "i / İ" (dotted) = "и / И" [i],
  • "ı / I" (undotted) = "ш / Ш" [ɯ],
    Romanian users may have problems distinguishing the dotted and undotted vowels in the Latin script: they may not have "ı" (undotted lowercase) or "İ" (dotted capital) on their physical Latin keyboard. Turkish users won't have this problem.
  • "ts / Ts / TS" = "ц / Ц" [ts] (only occurring in Nogai loanwords borrowed from Russian),
  • "şç / Şç / ŞÇ" = "щ / Щ" [ɕː] (only occurring in Nogai loanwords borrowed from Russian),
    Note: because of their keyboard, users in Romania writing Nogai in Latin script may type the comma-below diacritic, which is standard in Romanian, rather than the cedilla. The same confusion between the two diacritics is frequent in Romanian itself. Many old devices had no fonts with the comma below: among the legacy 8-bit charsets for Romanian, it was mapped as a precomposed character (for the base letters "c / C / s / S") only in ISO/IEC 8859-16 (Pan-EU Latin-10), and only after it had been encoded separately in Unicode, while the cedilla was used in ISO 8859-1 (Western European Latin-1), ISO 8859-2 (Central European Latin-2), ISO 8859-3 (Southern European Latin-3) and ISO 8859-9 (Turkic Latin-5). The ISO/IEC 8859-16 (Latin-10) charset was added but fell out of real use because support for Unicode was already preferred, including in Wikipedia, which already used UTF-8 at that time. Soon after that, ISO decided to stop adding and supporting new 8-bit charsets and to focus only on Unicode. Microsoft did not need this charset either, because it had long since mapped the cedilla below "c/C/s/S" in its Windows code pages based on the Latin variants of ISO/IEC 8859.
    Only Romanian users on modern, fully Unicode-compatible systems, with modern Unicode-based fonts that supply the common "WGL" subset of Latin, may have keyboard layouts featuring the comma below, and may type it by default rather than the cedilla, as if they were typing correct Romanian. Turkic-speaking users won't do that.
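
To make the "digraphs take precedence" idea concrete, here is a minimal Python sketch of a longest-match Latin to Cyrillic pass. It is only an illustration under assumptions: the names LATIN_TO_CYRILLIC, COMMA_TO_CEDILLA, normalize_latin and to_cyrillic are hypothetical, the table copies just a subset of the pairs listed above (several of the single-letter pairs are questioned later in this thread, so they are left out), and folding the Romanian comma-below "ș" to the cedilla "ş" before matching is one possible way to handle the keyboard issue, not an established rule.

```python
import unicodedata

# Hypothetical table: a subset of the pairs listed above, not a verified orthography.
LATIN_TO_CYRILLIC = {
    "ya": "я", "Ya": "Я", "YA": "Я",
    "yo": "ё", "Yo": "Ё", "YO": "Ё",
    "yu": "ю", "Yu": "Ю", "YU": "Ю",
    "ts": "ц", "Ts": "Ц", "TS": "Ц",
    "şç": "щ", "Şç": "Щ", "ŞÇ": "Щ",
    "i": "и", "İ": "И",
}

# Assumption: fold comma-below s (U+0219 / U+0218) to cedilla s (U+015F / U+015E).
COMMA_TO_CEDILLA = str.maketrans({"\u0219": "\u015f", "\u0218": "\u015e"})


def normalize_latin(text):
    """Compose combining marks (NFC), then fold comma-below s to cedilla s."""
    return unicodedata.normalize("NFC", text).translate(COMMA_TO_CEDILLA)


def to_cyrillic(text, table=LATIN_TO_CYRILLIC):
    """Longest-match replacement: digraphs are tried before single letters."""
    text = normalize_latin(text)
    max_len = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(text):
        for size in range(max_len, 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += len(chunk)
                break
        else:
            # No rule matched at this position: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)
```

Trying the longest key first is what keeps "ya" from being split into "y" + "a" once single-letter rules are added to the table; characters without a rule pass through untouched, so a partial table is still safe to experiment with.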

If that transliterator is enabled, then it could be installed by default and thus we would not even need to create separate translations for Nogai between these two scripts, and the wiki could remain unified with the same content, equally accessible from Russia and Turkey (or Romania).

Note also that MediaWiki supports a special syntax (of the form -{code1:text1; code2:text2}-) to mark specific transliteration rules in articles (it is used, for example, in Chinese to make exceptions to the converter between simplified and traditional Chinese).

Verdy p (talk)13:53, 25 April 2022

And what about Kazakh as the fallback language?

TayfunEt. (talk)04:58, 26 April 2022

Transliterating the content between Latin and Cyrillic would come first in my opinion, before trying a fallback to Kazakh in the appropriate script (Latin or Cyrillic, whichever has content).

Note that Kazakh Wikipedia also uses a transliterator, only from Cyrillic to Latin or to Arabic: all its article names (including proper names, except for brands like "Twitter" or "Los Angeles Times") and category names use only Cyrillic; some compatibility namespace names may have Latin aliases in English (but they are translated to Cyrillic), and some template names were borrowed from English Wikipedia or Commons without necessarily being renamed. So Kazakh does not need an extension or separate contents, including in translations made on this wiki... except that Wikimedia transliterators are not installed on translatewiki.net (which is not just translating for Wikimedia's MediaWiki-based wikis).

This means that Kazakh Wikipedia only uses and maintains the "kk-Cyrl" translation made here on translatewiki.net; "kk-Latn" and "kk-Arab" are used for the UI only. The UI does not use the content transliterator, but lets users choose their user language independently of page contents, which use the script variant selector for transliteration rather than the current user language.

Verdy p (talk)06:45, 26 April 2022

You are wrong about some of the letters:

j = ж

ı (undotted)= ы

y = й

ş = ш

Ь (the soft sign) is not found in the Nogai Latin alphabet. Please don't confuse it with the Latin alphabet from 1928; the Latin alphabet which is used today is different. It is the same as the Crimean Tatar Latin alphabet; the difference is the letter Ä ä, which in Nogai Cyrillic is Аь аь. Thank you for your help!
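
For anyone sketching the converter, the five pairs above could be written down roughly like this in Python. This is only an illustration: the names CORRECTED, turkic_lower, turkic_upper and corrected_to_cyrillic are hypothetical, the table holds only the letters mentioned in this post (not the full alphabet), and the case helpers are needed because Python's built-in str.lower() and str.upper() follow the default Unicode rules, which pair "I" with "i" instead of with "ı".

```python
# Hypothetical table with only the pairs given in this post (lowercase forms).
CORRECTED = {"j": "ж", "ı": "ы", "y": "й", "ş": "ш", "ä": "аь"}


def turkic_lower(text):
    """Lowercase with the Turkic pairing: İ -> i and I -> ı."""
    return text.replace("İ", "i").replace("I", "ı").lower()


def turkic_upper(text):
    """Uppercase with the Turkic pairing: i -> İ and ı -> I."""
    return text.replace("i", "İ").replace("ı", "I").upper()


def corrected_to_cyrillic(text):
    """Map the letters above to Cyrillic; everything else is left unchanged."""
    out = []
    for ch in text:
        low = turkic_lower(ch)
        cyr = CORRECTED.get(low)
        if cyr is None:
            out.append(ch)                # no rule for this character
        elif ch != low:
            out.append(cyr.capitalize())  # uppercase input, e.g. Ä -> Аь
        else:
            out.append(cyr)
    return "".join(out)
```

With this table the soft sign only ever appears as the second half of "аь", which matches the point that the Latin alphabet has no separate letter for it.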

TayfunEt. (talk)12:34, 26 April 2022