Translating to the same language but different script

Hello! Tagalog is currently pretty well translated; however, it is translated in the Latin script (the writing system I am using right now). I would like to ask whether it can also be translated into a different script, the Baybayin script. I first thought of adding it as a new language, but Baybayin is not in ISO 639-3. However, I have seen a wiki that has translated a couple of pages into Baybayin here with the code tl-b, but it isn't part of this wiki's languages. So I am requesting this so that I can officially translate pages into Baybayin. It's just like the Tagalog translations, but written in Baybayin.

Yivan000 (talk)06:30, 8 February 2021

"tl-b" is not a valid BCP 47 code. Baybayin is also not a script but the native name of the alphabet, encoded in Unicode/ISO 10646 with the script named... "Tagalog", whose standard ISO 15924 code is "Tglg". See ISO 15924.

For BCP 47 (part of the standards used on the web for W3C protocols such as HTML, CSS and SVG, and in many programming languages and internationalization libraries), the script variant you want would be encoded [tl-Tglg]. Note that letter case in codes does not matter in BCP 47 (and the hyphen "-" is likewise not treated differently from an underscore "_").
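To illustrate the remark about letter case, here is a minimal Python sketch. The helper names `canonical_case` and `same_tag` are hypothetical, and the handling is simplified (no singleton or extension rules). Treating "_" like "-" mirrors the remark above and what many libraries tolerate; it is an assumption here, as the strict BCP 47 syntax only uses the hyphen.

```python
def canonical_case(tag):
    """Return a tag in the conventional casing: language in lowercase,
    4-letter script in title case, 2-letter region in uppercase."""
    # Assumption: accept "_" as a separator, as many implementations do.
    parts = tag.replace("_", "-").split("-")
    out = []
    for index, part in enumerate(parts):
        if index == 0:
            out.append(part.lower())      # language subtag
        elif len(part) == 4 and part.isalpha():
            out.append(part.title())      # script subtag, e.g. Tglg
        elif len(part) == 2 and part.isalpha():
            out.append(part.upper())      # region subtag, e.g. PH
        else:
            out.append(part.lower())      # anything else
    return "-".join(out)


def same_tag(a, b):
    """Compare two tags while ignoring letter case and separator style."""
    return canonical_case(a) == canonical_case(b)
```

With this sketch, `same_tag("tl-Tglg", "TL_TGLG")` is true, while "tl-Tglg" and "ilo-Tglg" remain distinct.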

Note that the term "Tagalog" refers to several related languages in the Philippines: the modern Tagalog is named "Filipino" and is written only with the Latin script. It should use the ISO 639-2 or ISO 639-3 code "fil", but "tl" has been used from the older standard ISO 639-1 (which used 2-letter codes and grouped various languages, not always mutually intelligible). So "tl" is used for its most common variant, the modern Filipino language made official in the Philippines, even if abroad the language spoken by the community is still named "Tagalog". Even abroad, most readers and writers (including in the United States) use the Latin script, with the vocabulary of Filipino. There are also some creolized variants mixed with English or with other languages like Spanish, Malay, Indonesian and some Chinese languages, with important borrowings now added in Filipino, but also in other languages of the Philippines like Ilocano (standard code [ilo]).

The historic Tagalog language, as it was written with the Baybayin alphabet, remains in use by a small minority. Your request is valid, but for [tl-Tglg]. See Portal:Tl and Portal:Ilo, which give the codes of their variants (disabled for now) in the Tagalog script (Baybayin).

For historic reasons, non-BCP 47 language codes are still present on this wiki, as legacy codes still in use by Wikimedia wikis only. But they are limited in scope to existing Wikimedia wikis. Wikimedia has for years been running a long process to accept only codes conforming to BCP 47; translation into these languages is still done on this wiki using valid BCP 47 codes, with very few exceptions. Migrating these legacy invalid codes is a very complex process which takes years, and it has complicated the management of language codes and internationalization on these Wikimedia wikis. We should no longer add any invalid BCP 47 code, and there is a strong policy about this even in Wikimedia (a "language committee" has been working on this for years in Wikimedia, and all remaining legacy codes are in a "bug list"; we should avoid increasing this bug list, given the extensive work required to migrate them later). This policy is visible as well in Wikimedia Incubator for requests for new languages. And there's NO code using a 1-letter extension for variants (and I think there should never be one).

For extensions to ISO 639, BCP 47 is the valid model (even if ISO 639-1 has been kept, it remains historic and its 2-letter codes are kept for compatibility with the most widely used languages; ISO 639-2 was a mess that was fixed later in ISO 639-3; but BCP 47 preserves compatibility by defining aliases and reclassifying the mess left in ISO 639-1/2). BCP 47 allowed extensions for 3-letter code variants after a base language, but their list has been closed since the adoption of ISO 639-3. It allowed 2-letter or 3-digit extensions for region codes (also deprecated, but left for compatibility: ISO 639-3 is now preferred in BCP 47, with 2-letter codes only for the most important language or variant in a group, e.g. with Chinese, which is actually a macrolanguage whose major variant is Mandarin, "cmn" in ISO 639-3, but mapped as the default variant for "zh" coming from ISO 639-1).

BCP 47 also includes some "grandfathered" codes using 1-letter prefixes; they are no longer recommended and are remapped to ISO 639-3.

So today the recommendation in BCP 47 is

  • a 3-letter base code from ISO 639-3, or 2-letter code from ISO 639-1/2 only for the most widely used language in a group
  • an optional 2-letter or 3-digit region extension code (deprecated, only useful after legacy 2-letter language codes; the preferred way is now to use ISO 639-3)
  • an optional 4-letter extension code from ISO 15924 for script

and nothing else, except codes for private use only, which start with an "x-" prefix!
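As a rough illustration of the list above, the following Python sketch checks a code against that simplified shape. It is not a full BCP 47 parser: the name `parse_tag` and the pattern are assumptions made for this example, the full grammar in RFC 5646 also has variants and extensions, it places the script subtag before the region subtag (which the pattern follows), and no subtag is checked against the registries.

```python
import re

# Simplified shape only: language, optional script, optional region,
# optional private-use part introduced by "x".
TAG_RE = re.compile(
    r"^(?P<language>[a-z]{2,3})"               # ISO 639 code, 2 or 3 letters
    r"(?:-(?P<script>[a-z]{4}))?"              # ISO 15924 script, 4 letters
    r"(?:-(?P<region>[a-z]{2}|[0-9]{3}))?"     # region, 2 letters or 3 digits
    r"(?:-x(?:-[a-z0-9]{1,8})+)?$",            # private-use subtags
    re.IGNORECASE,
)


def parse_tag(tag):
    """Return the recognized subtags as a dict, or None if the tag
    does not fit the simplified shape."""
    match = TAG_RE.match(tag)
    if match is None:
        return None
    return {name: value for name, value in match.groupdict().items()
            if value is not None}
```

Under these assumptions, `parse_tag("tl-Tglg")` and `parse_tag("ilo-Tglg")` are accepted, while `parse_tag("tl-b")` returns `None` because a lone 1-letter subtag fits none of the slots.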

BCP 47 also includes other reserved extensions like "u-*" for Unicode locales. The prefix "b-" is reserved and must remain reserved.

It is very interesting to see the history of these language codifications.

Verdy p (talk)14:12, 8 February 2021

Yeah, I forgot to mention that the language code can be discussed, so tl-Tglg can be used. So, can we pursue that?

Yivan000 (talk)03:25, 9 February 2021

All is OK with [tl-Tglg], don't worry: if you have supporters and want to work with them to start developing that project, continue with it. But there are interesting questions:

  • Does this language need separate wikis, or is it possible to develop and use a transliterator between Latin and Tagalog/Baybayin scripts?
  • Have you collected a set of usable and free fonts for showing that script?
  • Have you tested that script in a test wiki such as Wikimedia Incubator? (This test wiki, which could be a Wikipedia in the Incubator, a Wikisource section/category in the existing Tagalog Wikisource, or the Tagalog Wiktionary, can also be used to show the interest of participating contributors. The test wiki can be any existing wiki, as long as its local community is willing to accept text written in that script locally, or wants to develop a transliterator, which may finally be deployed.)
  • Can an automated transliterator be used reliably for the MediaWiki UI? (If so, we don't need these translations: MediaWiki could provide them automatically, just as was done for variants of Chinese, but with the active involvement of qualified developers and many linguistic experts available to fix the quirks, and the development of a huge dictionary of rules. This development was not complex, however, for the transliterator used in Serbian between Latin and Cyrillic, even if there were separate developments for Croatian or Serbian due to differences/preferences in vocabulary and some grammatical forms.)
  • Are there projects outside MediaWiki-based wikis? E.g. wikis using other engines, forums or communication tools, or databases, where a transliterator is insufficient: this would require the addition even if it is not for the Wikimedia projects supported here (e.g. you may want to translate the FreeCol game, which works on Linux or Windows). There are tons of free/open-source projects that could be hosted here to translate their ".po" files, even if these translations will finally be posted in another repository like GitHub (where translation of these files offers no facility at all).

Note: I have no opposition to your project. But you have to realize why you need it and what the best alternatives are. And as it is for the Tagalog/Filipino language, the needs should be discussed with an active Tagalog community to determine precisely what is needed. The need to start translating separately must come only if this is the only solution, and a test should first be run on a site to demonstrate what is already possible and what requires a separate translation: adding a separate translation will have a huge maintenance cost and will require an active community, so you have to evaluate what the simplest alternative is, with the lowest effort, that can be maintained for a long time. If an automated solution can work, it will be more beneficial to this language variant, as it will immediately gain active support from existing Tagalog users and from existing and future content in Tagalog, made instantly available in the other script, so that more Tagalog/Filipino/Ilocano-speaking users will be able to use or learn and revive the non-Latin script.

So your request about [tl-Tglg] or [ilo-Tglg] is valid. I support it only as a matter of opinion, but the support needed will be from active Tagalog/Filipino/Ilocano speakers, and you'll need to get support from some developers. It is important to show and develop an online testbed for that script.

Verdy p (talk)15:55, 9 February 2021
  • IMO, this doesn't need separate wikis. Your suggestion may be used instead, or possibly other methods not yet discussed.
  • Yep, "Noto Sans Tagalog" is all configured and I can type and read Baybayin without Unicode boxes.
  • I haven't tested it yet, however, I have plans to make a Baybayin version of Noli Me Tangere in Tagalog Wikibooks as I am an active contributor on that book.
  • IMO, yes, we can develop that as well. I see minimal problems as it is just character/string substitution from Latin to Baybayin (ex. ka -> ᜃ; ngi -> ᜅᜒ; po -> ᜉᜓ;).
  • As far as I know, no, there are none. Sad.

Yeah, that's the problem, there isn't an active Tagalog community these days. I think it is wiser to develop/program an automatic script to transliterate Tagalog Latin translations into Baybayin, and I could help with that.
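The character/string substitution mentioned above (ka -> ᜃ; ngi -> ᜅᜒ; po -> ᜉᜓ) can be sketched in Python as follows. This is only a rough illustration under stated assumptions, not a tested transliterator: the name `to_baybayin` is hypothetical, "r" is folded into the letter DA, e/i and o/u share one sign each, every "ng" is read as the single letter NGA, and a consonant without a following vowel receives the virama (U+1714).

```python
# Code points are from the Unicode "Tagalog" block (U+1700..U+171F).
CONSONANTS = {
    "k": "\u1703", "g": "\u1704", "ng": "\u1705", "t": "\u1706",
    "d": "\u1707", "r": "\u1707",  # assumption: r is written with DA
    "n": "\u1708", "p": "\u1709", "b": "\u170A", "m": "\u170B",
    "y": "\u170C", "l": "\u170E", "w": "\u170F", "s": "\u1710",
    "h": "\u1711",
}
# Independent vowels A, I, U (e folds into I, o folds into U).
INDEPENDENT_VOWELS = {"a": "\u1700", "e": "\u1701", "i": "\u1701",
                      "o": "\u1702", "u": "\u1702"}
# Vowel signs after a consonant; the inherent vowel "a" needs no sign.
VOWEL_SIGNS = {"a": "", "e": "\u1712", "i": "\u1712",
               "o": "\u1713", "u": "\u1713"}
VIRAMA = "\u1714"  # marks a consonant with no vowel


def to_baybayin(text):
    """Transliterate Latin-script text to Baybayin, left to right."""
    text = text.lower()
    out = []
    i = 0
    while i < len(text):
        if text.startswith("ng", i):
            consonant, i = "ng", i + 2
        elif text[i] in CONSONANTS:
            consonant, i = text[i], i + 1
        else:
            consonant = None
        if consonant is not None:
            out.append(CONSONANTS[consonant])
            if i < len(text) and text[i] in VOWEL_SIGNS:
                out.append(VOWEL_SIGNS[text[i]])
                i += 1
            else:
                out.append(VIRAMA)
        elif text[i] in INDEPENDENT_VOWELS:
            out.append(INDEPENDENT_VOWELS[text[i]])
            i += 1
        else:
            out.append(text[i])  # spaces, digits, punctuation: unchanged
            i += 1
    return "".join(out)
```

Real text would need more rules than this, for example for letters such as c, f, j, q, v, x and z in borrowed names, and for "n" followed by "g" across a syllable boundary.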


Yivan000 (talk)04:59, 10 February 2021

You probably know that the Baybayin script is more "defective" than the modern Latin script, which has more distinctions, even if some Latin letters are allophones in some, but not all, Philippine languages, such as "ra" vs. "da". These are allophones only in Tagalog, but not in other languages that use or used Baybayin, and today even Tagalog needs the distinction: the historic allophony of "ra" and "da" is less effective today, when so many people have been exposed to English, or historically also to Spanish and Dutch, and still today to other Chinese and Malaysian languages, and also need and use many borrowed proper names.

From what I see, the Baybayin alphabet (unfortunately encoded in Unicode and ISO 15924 under the normative but too restrictive name "Tagalog") is simple to support with a transliterator from Latin. But care will be needed for the da/ra distinction: even if it is allophonic in the Tagalog language for some people, in fact it shows a clear contrast today, the allophony being only marked by the contextual mutation depending on surrounding vowels. In the Baybayin alphabet, allophones may or may not share the same glyph, and in my opinion they should have been encoded separately, using a variant code where needed if we want to make or drop the visual distinction of "ra", with the letter "da" always encoded without the distinction. A Unicode font renderer could easily make the proper choice of glyph for "ra" automatically if the language is known: in the Tagalog/Filipino language the distinction would be removed, so "ra" would use the same glyph as "da". This would also facilitate the bijective conversion with the Latin script in transliterators, so the Philippine languages could immediately and automatically get a rendering in the Baybayin script without needing any development of separate content.

Also, there are only 3 vowels A/I/U in Baybayin, whereas Philippine languages in the Latin script also use E/O as historic allophones, which are also more clearly distinguished today. The same solution would ease porting, using variant selectors for each of the 2 Baybayin vowels I/U to preserve the distinction already present in the Latin script.
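To make the loss of distinctions concrete, here is a small Python sketch. The names `baybayin_fold` and `collisions` are hypothetical; the sketch only folds the Latin letters that would share a Baybayin letter under the allophony described above (r with d, e with i, o with u) and groups the words that become indistinguishable.

```python
# Latin letters that collapse onto one Baybayin letter or vowel sign.
FOLD = str.maketrans({"r": "d", "e": "i", "o": "u"})


def baybayin_fold(word):
    """Reduce a Latin-script word to the distinctions Baybayin keeps."""
    return word.lower().translate(FOLD)


def collisions(words):
    """Group the words that can no longer be told apart after folding."""
    groups = {}
    for word in words:
        groups.setdefault(baybayin_fold(word), []).append(word)
    return {key: group for key, group in groups.items() if len(group) > 1}
```

For example, `collisions(["dito", "rito", "misa", "mesa"])` groups "dito" with "rito" and "misa" with "mesa", so a reverse transliterator could not recover the Latin spelling without extra information such as the variant selectors suggested above.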

Note that Unicode has left one empty cell in the Unicode block: it was not for the Tagalog language itself, but for other Philippine languages using the Baybayin alphabet.

Note finally that Baybayin has several graphical traditions (at least 8 are documented for different languages): only one tradition, for the Tagalog language, was studied in Unicode. The presence of an unallocated slot in the encoded script is a clear indication that this space was reserved, pending further studies, so the Baybayin script is still not completely encoded and was only encoded for one form of the Tagalog language (hence its unfortunate name given in the Unicode/ISO 10646 encoded block, which is normative and cannot be changed, just like the code "Tglg"). The English and French names shown in ISO 15924, however, should clearly be changed to Baybayin; they are not under a stability rule.

I think you should discuss these issues with Unicode and send requests, so that Baybayin can be used more easily and transliterators with Latin can work reliably, while still preserving the distinctions needed for languages other than just one of the historic forms of Tagalog (before the recent creation of its modern official Filipino variant).

Verdy p (talk)16:52, 10 February 2021

Sad. I knew this would flop. Well, at least I know now the reasons why. Thanks, anyway!

PS: I just found this MediaWiki main page in Baybayin, if you're interested in bringing it up again.

Yivan000 (talk)01:36, 12 February 2021