GRAMMAR

Yes that's what I thought: gender and number are not always given.

On the other hand, I think it is possible to create something modelled on "Gender" (a new magic word?) which tells you whether the last letter of a word is a consonant or a vowel, so that you can act accordingly.

Ghybu (talk)21:00, 22 December 2022
  • The magic keyword "GENDER:" is already taken but for a limited usage: its's first parameter is a username on the local wiki (and by default it is the user reading the page, whose human gender may be unknown or volutarily chosen by that user to be neutral). It is completely unrelated to the grammatical gender (which may even follow other rules, e.g. for non-humans, unanimated concepts, or for reasons of style/politeness/respect or irrespect, or that sometimes changes depending on plurals or uncountability).
  • We have a keyword "GRAMMAR:" whose first parameter is already a "tag", but it does not accept a list of tags (some required, others optional, to alter the rendered variant).

But we have NO way to return BOTH the text of the translated variant, AND classification tags on output; only one input tag is accepted.

  • The magic keyword "PLURAL:" has no input tags, it only takes a numerical value on input.

Returning tags only on output would be inefficient: it requires either returning all possible variants, each one carrying its own tags for external selection, or returning some functional object able to generate the selection without having to enumerate all possible variants. For common cases, like derived terms in the conjugation of verbs, or grammatical cases, and more complex cases like German verbs with their "detachable" particles, or pronominal verbs, we need input tags. Output tags should be reserved for the generated sets of variants; then we would need some orchestrator to reduce those sets of variants and generate the calls of the functions that will make the selections. But for translators this is a complex task: we need to let them simply add variants and complement each one with one or more tags that they can easily pick from a set of known tags; and these tags must make sense to human translators (so tag names must follow some known and documented convention). That's not something easy to design with a single basic "magic keyword" in MediaWiki, except for very limited use cases.


IMHO, using tags on input rather than on output will be more powerful and will offer a simpler interface. Making all tags inputs requires a "functional" syntax, but with possibly unordered parameters (the input would just be some unordered set of tags). Adding variants for a given message to translate then only requires each added variant to have a distinct set of tags, including the empty set, which would be the default translation (that default should always be usable in isolation, e.g. as an item in a bulleted list or in an index, but not necessarily as a title or section header, which adds some specific usage context requiring its own generative tag, for example for its capitalization).

If we accept that "tags" cannot contain any space in their names (only alphanumeric characters or hyphens), then such a solution is implementable in MediaWiki using the existing "GRAMMAR:" magic keyword as:

{{GRAMMAR: |tag1 tag2 = variant1 |tag1 tag3 = variant2 | ...}}

and it would be important to remember that the order of those tags is not significant (so "tag1 tag2" and "tag2 tag1" are equivalent). Optionally some wildcard could be allowed, but it would complicate the task for translators. Tags should be well known, documented and not invented freely by translators, so that each one can be validated.
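As a purely hypothetical instance (the tag names and the French adjective are only illustrative, not an existing convention), this could give:

{{GRAMMAR: |feminine = nouvelle |plural = nouveaux |feminine plural = nouvelles}}

where "feminine plural" denotes the unordered set of the two tags "feminine" and "plural", and the default variant (empty tag set), here "nouveau", would be the base translation of the message itself.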

But this new syntax (only to be used in the middle of a translated message) might as well use another separate keyword like "SELECT:", "VARIANTS:" and so on.

For the case where the variants make up the whole content of the message, we don't necessarily need any syntax to be exposed to translators: they could just have a "+" button to add variants and an extra field where they can add suitable tags (the validator would only have to check that the tag sets are distinct for each added variant of the message, and would store them in some "canonical" order). The UI could also offer facilities to easily pick "known" tags from a repository appropriate for each language, so that translators have less difficulty selecting and using them. That syntax would be hidden.

Any translated message (even if it has a single variant listed, including the single default variant) could have one or more tags added to it: those tags would then provide the needed semantics that also allows generators (like conjugators for verbs) to work with them. Internally, each stored variant would be used as a pure functional object, capable of using the tags given on input and the defined text of the variant to generate the plain text output. Basic i18n libraries or applications would ignore the input tags and work only with the defined text (but would lose the semantics of that text if they use it to generate other texts).
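As a minimal sketch of that idea (not an existing API; the tag names and the French forms are only illustrative), the selection could work roughly like this:

def canonical(tags):
    # Store tag sets in a canonical form, so that "tag1 tag2" and "tag2 tag1" are equal.
    return frozenset(t.lower() for t in tags)

def select_variant(variants, input_tags):
    # variants: dict mapping a canonical tag set to the text of one variant.
    # Pick the variant whose tag set is the largest subset of the input tags;
    # the empty tag set acts as the default translation.
    wanted = canonical(input_tags)
    best_tags, best_text = frozenset(), variants.get(frozenset(), "")
    for tags, text in variants.items():
        if tags <= wanted and len(tags) > len(best_tags):
            best_tags, best_text = tags, text
    return best_text

# Hypothetical data: a French adjective with gender/number variants.
variants = {
    canonical([]): "nouveau",
    canonical(["feminine"]): "nouvelle",
    canonical(["plural"]): "nouveaux",
    canonical(["feminine", "plural"]): "nouvelles",
}
print(select_variant(variants, ["plural", "feminine"]))  # prints "nouvelles"

A real implementation would of course also validate the tags against the documented repository for each language, as described above.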

The main difference is that stored translations would no longer be opaque texts.

Verdy p (talk)22:48, 22 December 2022
 

@Ghybu: Yes, that is possible; I have done so for the genitive GRAMMAR in Norwegian: when words are put in the genitive in Norwegian, we just add an "s", unless the word ends in s, x or z, in which case we add an apostrophe instead. It should be easy to do something similar for Kurdish, where the output depends on whether the final letter is a vowel or not.
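Roughly, the Norwegian rule described above amounts to the following (a sketch only; in MediaWiki such rules live in the language support code, and the vowel set used for the Kurdish test here is merely a guess to be confirmed by Kurdish speakers):

def norwegian_genitive(word):
    # Append "s", unless the word already ends in s, x or z,
    # in which case append an apostrophe instead.
    return word + "'" if word and word[-1].lower() in "sxz" else word + "s"

def ends_in_vowel_ku(word):
    # A Kurdish (Kurmancî, Latin script) rule could branch on whether the
    # final letter is a vowel; the exact vowel inventory and the endings to
    # apply are for Kurdish speakers to define.
    return bool(word) and word[-1].lower() in set("aeiouêîû")

print(norwegian_genitive("Wikipedia"))  # Wikipedias
print(norwegian_genitive("Paros"))      # Paros'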

Jon Harald Søby (talk)12:02, 23 December 2022

There are some complexities: the content you test for a final "s" may be formatted (so at the end of the text you may find HTML tags, or some image, or some existing apostrophe-quotes from the MediaWiki syntax for bold/italic styles). The last displayed character in the content may also be some punctuation, or it may be hidden because that content was generated or formatted by a template transclusion or by a function call in a module (and that template or function may not provide the metadata for the grammatical or lexical semantics of what it returns with the same call: the template or function would have to make that change itself, otherwise the parsing will be complex and may be faulty).

Here again we fall back on the assumption that a single (wiki)text result is sufficient. But then where do we store and return the metadata needed to do further processing correctly? This could be in the same (wiki)text, but that requires defining an encoding syntax for it (it could be some hidden tags that get stripped at the end of processing, or some JSON or XML syntax, or a magic keyword, along with some escaping mechanism for safer encapsulation... as long as further processes can handle it).
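For instance, a JSON-based envelope (the field names here are invented purely for illustration) could carry both parts:

import json

# One possible envelope: the rendered text plus the classification tags
# describing it, so that a later processing step can read the metadata
# instead of having to re-parse the visible text.
variant = {"text": "rendered variant text", "tags": ["feminine", "singular"]}
encoded = json.dumps(variant, ensure_ascii=False)
print(json.loads(encoded)["tags"])  # ['feminine', 'singular']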

Verdy p (talk)13:21, 23 December 2022

@User:Verdy p Are you saying the Grammar template shouldn't be used? We could make do with something like MediaWiki:Aboutsite/fi.

@User:Jon Harald Søby Is there any know-how for creating something similar to what Finnish uses?

Balyozxane (talk)16:25, 25 December 2022

The Grammar documentation page is not about any "template". It's a parser function of MediaWiki (note the required colon after the keyword "GRAMMAR", before all other pipe-separated parameters, instead of the pipe that would be used if this were a template named "GRAMMAR").

In some cases we could also use a template syntax on MediaWiki, as a possible wrapper (or workaround) around the parser function. Some non-MediaWiki projects may also use their own syntax, or a syntax similar to these two in MediaWiki. It does not matter; I have not said that either syntax "shouldn't" be used.

But the usable syntax must be clear, because the keyword used is significant (it may or may not be case-sensitive: the names of parser functions are not case-sensitive and may be translated with some known aliases, but in translatable messages we should always use the untranslated name, which should still be usable on non-English wikis). However, for MediaWiki template names there are no English aliases unless they are defined as redirect pages (and to avoid the deletion of those redirects, which may not be used in wiki pages but only in translatable messages, they should be documented as aliases/synonyms with an internal comment on the redirect page, or a categorization saying that they are needed for translatable MediaWiki messages).

The other problem then is the syntax of the parameters, their names if they are named, and their order if it is significant; then comes the syntax and semantics of their values (what is permitted, and how values are interpreted if there are restrictions).

Using the parser function syntax may add more restrictions/limitations than using the template syntax, which could also apply some transforms or add supplementary features, including support for aliased values, fallbacks and so on, whilst being written either entirely as a template, or using MediaWiki parser functions, or invoking a function in a Lua module via the Scribunto parser function, or a mix of all that.

The syntax used in the translatable message does not mean that the project using that message must itself be using MediaWiki, but the syntax finally used in the translatable message should be minimalist (with minimal technical tricks, which also simplifies the work performed by the local message validator used in translatewiki.net), so that any implementation can be easily plugged into the project via its own i18n library, or easily converted to the syntax that will finally be used in the runtime of the deployed application/project (such a conversion would occur when exporting messages from translatewiki.net to the project's repository via its own import tool). And to help translators, the project page on translatewiki.net should link to the documentation for the syntax supporting such a "grammar" extension.

Note that if the project uses a template-like syntax, it may also be possible to provide a syntax helper in translatewiki.net, as long as the template name used is specific enough and related to the project using it. That syntax helper could effectively render a link on translatewiki.net to that project's documentation page. And IMHO, if the syntax is for a non-MediaWiki-based project, the chosen name should include some project-specific prefix (instead of a bare "GRAMMAR" name, which should be reserved for MediaWiki-based projects), for example if it's a project in Python, JavaScript, or C/C++: the MediaWiki syntax for transcluding templates with parameters is flexible enough to be easily supported by many other non-MediaWiki implementations, whereas the parser function syntax is less flexible and adds further complexities for message validation in translatewiki.net (see for example the complexities caused by "PLURAL" syntaxes).

Verdy p (talk)18:07, 25 December 2022
 

For common nouns in Kurmancî Kurdish, you need to know the gender of the words (masculine or feminine) for the declension. However, other than usernames, gender is not specified.

So if we create it, it will only be useful for usernames.

But I think it is necessary to create a model that tells us whether the last letter of a word is a vowel or a consonant to solve this kind of problem.

Then by combining the "Gender" template and knowing the last letter of the username, we can even decline it...

And I think it's possible to do it, you just have to find the time to make the request on "Phabricator" :)

Ghybu (talk)18:29, 25 December 2022

In some translated messages, it may be possible to find a formulation that does not depend on the grammatical gender (or sometimes the plural) of the entity, and does not require any conjugation. For example, instead of translating "$1 sent a message to $2", you would translate it as if it were "message sent (sender: $1; recipient(s): $2)". Such a trick is used in Russian, for example, but it often requires adding some punctuation and some reordering, so that variables can be used in isolated form. However, the result may not always be easier to understand than just keeping a single form assuming a default gender, a default plural form, a default grammatical case or a default article.

Handling the case of contractions and internal mutations can be very tricky with a basic transformation rule, which would ignore a lot of exceptions and common usages.

The same is true when you attempt to derive a language name into an adjectival form, because the language name is sometimes plural and may already include the term translating "language" that you'd want to use with the translated adjectival form of the language name: this cannot be done if the translation is just a single text and does not itself contain any metadata for correctly reducing its set of possible derivations and finally selecting the best one at the end of text generation.

If multiple choices still remain, each one of them should have at least one distinctive piece of metadata, or they should be sorted by order of occurrence in the set, or better, distinguished by some given or computable "score" metadata; one such computable score could be to prefer the shortest formulation, but an actual metadata score may contain a tag indicating more common modern usage, or tags for politeness or level of formality.
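A sketch of that tie-breaking (the "score" key is a hypothetical piece of metadata, not an existing field):

def pick(remaining):
    # Prefer the highest explicit score; break remaining ties with the shortest text.
    return min(remaining, key=lambda v: (-v.get("score", 0), len(v["text"])))

print(pick([{"text": "a rather long formulation"},
            {"text": "short form", "score": 2}])["text"])  # short form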

If the text output is in MediaWiki or HTML or some other rich text format, it may even be possible not to make any predetermined choice, by using a rendering based on dynamic features (provided by HTML or JavaScript or similar) which could take user preferences into account, so that the actual displayed text is derived on the client side. There are some examples in Wikipedia, where the generated text contains some microtagging of semantic features, allowing such client-side modifications of the visible contents; this occurs for example with dates or with some accessibility features. However, the server or application using those "rich messages" must be prepared to deliver the accessibility tool, or should document and standardize the microtagging system it uses. And such a thing is not just needed for translation; it could be useful as well in any monolingual content (even in English only). So this is not just a problem of "internationalisation" (i18n) or "regionalisation", but more generally a problem of "localisation" (l10n). You have a good example with the spelling of the quoted words in the last sentence (is it an "s" or a "z" in English?).

Verdy p (talk)21:34, 25 December 2022

Personally, I don't see a real problem. A person using the site (the origin of the messages) will know how to use these templates correctly. If there is an error, the translator will come back here to correct it... And if he's nice :) he will complete the documentation ("Information about message") to help others so that they don't make the same error!

Ghybu (talk)06:19, 26 December 2022

Yes, but a template-based solution will not work on all wikis if these are generic MediaWiki messages (e.g. to be deployed to Miraheze, or elsewhere). The documentation may however be linked and shared on MediaWiki-wiki. Such a thing cannot be decided in isolation.

Verdy p (talk)13:13, 26 December 2022

It's already decided. There is a Grammar parser function and we want to use it only for SITENAME and usernames. Why are you complicating this even more? "message sent (sender: $1; recipient(s): $2)" is not always possible to implement, and we need declensions for SITENAME. For example, we have to omit SITENAME on MediaWiki:Searchsuggest-search/ku-latn or use "SITENAME: Lê bigere", which is not ideal at all. We encounter SITENAME in translations often enough to need something that already exists. Can we please stop discussing issues in Localization and start doing something about our problem?

Balyozxane (talk)19:33, 26 December 2022