Salentinian & centro-meridional Calabrian

Fragment of a discussion from Support

Unless some standards body or community has defined an encoding for those dialects and registered a documented code for them, you cannot encode dialects of what is still considered the same language. BCP 47 allows completing a language code (from ISO 639) with extra fields: a script code (4 letters from ISO 15924), a region code (2 letters from ISO 3166-1 or 3 digits from UN M.49), an "extlang" field (specific to BCP 47, but kept mostly for compatibility and now frozen: it was used for extending ISO 639 when that standard was too fuzzy, and it has been replaced by ISO 639-3), and a variant code for dialects (5 to 8 letters or digits, or 4 characters starting with a digit; there is no supporting standard for now, all that exists is some codes registered in the IANA database for some dialects).
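To make the field layout concrete, here is a minimal sketch in Python that splits a tag into those fields. It is a simplification of the RFC 5646 grammar, not a full implementation: it ignores extensions and grandfathered tags, it only accepts 2- or 3-letter language codes, and it checks the shape of each field only, not whether the code is actually registered. The names `LANGTAG` and `parse_tag` are made up for this illustration.

```python
import re

# Simplified shape of a BCP 47 language tag (assumption: no extensions,
# no grandfathered tags, language limited to 2 or 3 letters).
LANGTAG = re.compile(
    r"""
    (?P<language>[a-z]{2,3})                                 # ISO 639 code
    (?P<extlang>(?:-[a-z]{3}){0,3})                          # frozen extlang field
    (?:-(?P<script>[a-z]{4}))?                               # ISO 15924 script
    (?:-(?P<region>[a-z]{2}|[0-9]{3}))?                      # ISO 3166-1 or UN M.49
    (?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)   # dialect variants
    (?:-x(?P<private>(?:-[a-z0-9]{1,8})+))?                  # private use
    """,
    re.VERBOSE | re.IGNORECASE | re.ASCII,
)


def parse_tag(tag):
    """Return the fields of a tag as a dict, or None if the shape is wrong."""
    m = LANGTAG.fullmatch(tag)
    if m is None:
        return None

    def split(part):
        # Turn "-aaa-bbb" (or None) into ["aaa", "bbb"] (or []).
        return [p for p in (part or "").split("-") if p]

    return {
        "language": m.group("language").lower(),
        "extlang": split(m.group("extlang")),
        "script": m.group("script"),
        "region": m.group("region"),
        "variants": split(m.group("variants")),
        "private": split(m.group("private")),
    }
```

With this sketch, "sl-rozaj" (a dialect variant registered at IANA) comes out with "rozaj" in the variants field, and "zh-Hant-TW" with a script and a region. A well-shaped result only means the tag is well-formed: validity still depends on each code being present in the IANA registry.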

Wikimedia has registered for itself a few nonstandard codes (that are not part of BCP 47 or ISO 639) for some "languages" which are in fact not languages in ISO 639 but dialects. One example is "roa-tara" for Tarantino, which is a dialect of Neapolitan (encoded [nap] in ISO 639 and BCP 47); BCP 47 does not define any way to infer such a relation. Those codes exist in Wikimedia only because there was an agreement in Wikimedia communities to support them like this, even though this violated the BCP 47 format requirements. In fact that code was added BEFORE BCP 47 was revised with the publication of ISO 639-3, and today it would certainly not be encoded this way in Wikimedia: it uses the 3-letter code "roa", defined in ISO 639-2 as an "exclusive" collective code for the "Romance" language subfamily, instead of the 3-letter code "nap" of Neapolitan, and complements it with a non-registered field which in fact violates BCP 47, because "tara" is read as if it were a script code.
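The problem with "tara" can be shown with a small sketch that classifies a subtag purely by its length and content, which is how BCP 47 decides what a subtag following the language code means. The function name `subtag_shape` is hypothetical, and the check says nothing about registration, only about which slot a subtag would fall into.

```python
def subtag_shape(subtag):
    """Guess which BCP 47 field a subtag after the language code would fill."""
    s = subtag.lower()
    if not (s.isascii() and s.isalnum()):
        return "invalid"
    if len(s) == 1:
        return "singleton"   # e.g. "x" introduces private use
    if len(s) == 2 and s.isalpha():
        return "region"      # ISO 3166-1 shape
    if len(s) == 3 and s.isdigit():
        return "region"      # UN M.49 shape
    if len(s) == 3 and s.isalpha():
        return "extlang"
    if len(s) == 4 and s.isalpha():
        return "script"      # ISO 15924 shape
    if len(s) == 4 and s[0].isdigit():
        return "variant"
    if 5 <= len(s) <= 8:
        return "variant"
    return "invalid"
```

Here `subtag_shape("tara")` gives "script", so a parser can only take "roa-tara" as language "roa" written in a script "Tara" that does not exist in ISO 15924. A dialect code would need the variant shape, that is at least 5 characters, and a registration.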

The best code you may use, without any formal agreement by a standards body, is a "private-use" extension after the base language code, such as "scn-x-salen" or "scn-x-cmcal" (with the "x" prefix for private use, after which there are no constraints other than using one or more subfields of 1 to 8 letters or digits each, separated by a single hyphen).
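As a rough sketch of that rule, the following Python helper builds such a tag and rejects subfields that break the 1-to-8 letters-or-digits constraint. The helper name `private_use_tag` is invented for this example, and it does not check that the base language code itself exists in ISO 639.

```python
import re

# A private-use subfield: 1 to 8 ASCII letters or digits.
PRIVATE_SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")


def private_use_tag(language, *subtags):
    """Build "<language>-x-<subtag>[-<subtag>...]" in lowercase."""
    if not subtags:
        raise ValueError("at least one private-use subtag is required")
    for s in subtags:
        if not PRIVATE_SUBTAG.fullmatch(s):
            raise ValueError(f"invalid private-use subtag: {s!r}")
    return "-".join([language.lower(), "x", *(s.lower() for s in subtags)])
```

For example, `private_use_tag("scn", "salen")` builds "scn-x-salen", while a longer name such as "salentinian" is refused because it exceeds 8 characters and would have to be shortened or split into several subfields.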

But you'll notice that this is "private use", and this still requires a private agreement: to be eligible for use here on this wiki, you'd first need to publish this private agreement and say who supports it. Without that, there's no way to use such codes in this wiki, which is not even targeting only Wikimedia projects: here we need such a formal and public agreement (after discussing it and establishing a consensus). And this wiki is not the proper place to get this consensus, so you need to find a sponsoring community or standards body listing such a code (and documenting it for public use, without licensing restrictions on access to the basic documentation).

If you look into "Glottolog", you'll find it uses its own private codes for dialects, for local use; if you look at other linguistic lists, they use their own private codes as well (different from those used in Glottolog): there's no agreed standard even between them... And for now we have never seen any draft for a possible future standard list of dialects. The last attempt to make one in a new part of ISO 639 was abandoned: we are too far from being comprehensive enough, and in fact there are a LOT of disagreements even between linguists about what "dialects" are, and how they are separated in regional space, or in time/epoch, or in social usage. Even the same speaker may have several "registers" of speech, possibly non-neutral and politically oriented, depending on whom they are talking with and how their speech will be recorded and reused.

As well, BCP 47 codes using region codes (such as "fr-ca" or "en-gb") are a poor fit for dialects: country codes are not very relevant and are politically oriented, with boundaries that do not correctly represent the effective regional separation of language variants. Even "British English" has its own variants, such as Oxford English, or Bootlings; there are many vernacular variants for most languages (notably the larger ones, spoken by more than a few tens of thousands of people). This is unavoidable and very variable over time, because people are moving and dying, and opinions and usages are diverging or converging in different directions, with clusters being constantly split or merged and following their own genetic history, whose classification becomes more and more fuzzy over time.

Verdy p (talk) 22:13, 9 January 2022