GRAMMAR

There are some complexities: the content you test for a final "s" may be formatted (so at the end of it you may find HTML tags around the text, or an image, or existing apostrophe-quotes for the MediaWiki syntax of bold/italic styles). The last displayed character in the content may also be some punctuation, or it may be hidden because that content was generated or formatted with a template transclusion or a function call in a module (and that template or function may also not provide the metadata for the grammatical or lexical semantics of what it returns with the same template call: the template or function would have to make that change itself, otherwise the parsing will be complex and may be faulty).
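To illustrate why this test is harder than it looks, here is a minimal Python sketch (not MediaWiki code). The function name `last_visible_char` and the regular expressions are assumptions made for this example; it only handles HTML-like tags, apostrophe runs and trailing punctuation, and it gives up on unresolved template calls. Images and links are not handled.

```python
import re
from typing import Optional

TAG_RE = re.compile(r"<[^>]+>")   # HTML/XML-like tags
QUOTE_RE = re.compile(r"'{2,}")   # wiki bold/italic apostrophe runs

def last_visible_char(wikitext: str) -> Optional[str]:
    """Return the last letter or digit a reader would see, or None if unknown."""
    text = TAG_RE.sub("", wikitext)
    text = QUOTE_RE.sub("", text).rstrip()
    if text.endswith("}}"):
        # Unexpanded template or parser function: the ending is hidden.
        return None
    for ch in reversed(text):
        if ch.isalnum():          # skip trailing punctuation
            return ch
    return None
```

With these assumptions, `last_visible_char("<b>Wikis</b>.")` yields `"s"`, while `last_visible_char("{{SITENAME}}")` yields `None`, which is exactly the case where metadata from the template itself would be needed.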

Here again we run into the assumption that a single (wiki)text result is sufficient. But then where do we store and return the metadata needed to do further processing correctly? This could be in the same (wiki)text; however, this requires defining an encoding syntax for it (it could be some hidden tags that get stripped at the end of processing, or some JSON or XML syntax, or a magic keyword, along with some escaping mechanisms for safer encapsulation... as long as further processes can handle it).
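As a rough sketch of one such encoding (an assumption for illustration, not an existing MediaWiki mechanism), metadata could travel as JSON inside a hidden comment appended to the text and be stripped at the end of processing. The `<!--meta:...-->` marker and the `attach`/`detach` names are hypothetical.

```python
import json
from typing import Tuple

PREFIX, SUFFIX = "<!--meta:", "-->"   # hypothetical hidden marker

def attach(text: str, meta: dict) -> str:
    """Append metadata to the text as an escaped JSON payload."""
    payload = json.dumps(meta, ensure_ascii=False)
    # Escape angle brackets so the payload can never contain the markers.
    payload = payload.replace("<", "\\u003c").replace(">", "\\u003e")
    return f"{text}{PREFIX}{payload}{SUFFIX}"

def detach(annotated: str) -> Tuple[str, dict]:
    """Split annotated text into (visible text, metadata)."""
    start = annotated.rfind(PREFIX)
    if start == -1 or not annotated.endswith(SUFFIX):
        return annotated, {}
    payload = annotated[start + len(PREFIX):-len(SUFFIX)]
    return annotated[:start], json.loads(payload)
```

A later processing step would call `detach()` to read, for example, a gender or a final-letter class, and only the visible text would reach the page.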

Verdy p (talk) 13:21, 23 December 2022

@User:Verdy p Are you saying template Grammar shouldn't be used? We could make do with something like MediaWiki:Aboutsite/fi.

@User:Jon Harald Søby Is there any know-how for creating something similar to what Finnish uses?

Balyozxane (talk) 16:25, 25 December 2022

The Grammar documentation page is not about any "template". It's a parser function of MediaWiki (note the required presence of the colon after the keyword "GRAMMAR", before all other pipe-separated parameters, instead of the pipe if this was a template named "GRAMMAR").
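The difference between the two syntaxes comes down to the first separator after the keyword. A minimal Python sketch of that distinction follows; the function `classify_call` and its regular expression are assumptions for this example, and it ignores nested calls, parameterless templates and namespace prefixes such as "Template:".

```python
import re
from typing import List, Optional, Tuple

# {{NAME:arg|arg}} is a parser function call, {{NAME|arg|arg}} a template call.
CALL_RE = re.compile(r"^\{\{\s*([^|:{}]+?)\s*([:|])(.*)\}\}$", re.DOTALL)

def classify_call(wikitext: str) -> Optional[Tuple[str, str, List[str]]]:
    """Return (kind, name, parameters) for a simple, non-nested call."""
    m = CALL_RE.match(wikitext.strip())
    if not m:
        return None
    name, separator, rest = m.groups()
    kind = "parser function" if separator == ":" else "template"
    return kind, name, rest.split("|")
```

For instance, `classify_call("{{GRAMMAR:genitive|Wikipedia}}")` reports a parser function, whereas `classify_call("{{GRAMMAR|genitive|Wikipedia}}")` reports a template named "GRAMMAR".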

In some cases we could also use a template syntax on MediaWiki, as a possible wrapper (or workaround) for using the parser function. Some non-MediaWiki projects may also use their own syntax, or use a syntax similar to these two in MediaWiki. It does not matter; I have not said that either syntax "shouldn't" be used.

But the usable syntax must be clear, because the keyword used is significant. It may or may not be case-sensitive: the names of parser functions are not case-sensitive and may be translated with some known aliases, but in translatable messages we should always use the untranslated name, which should still be usable on non-English wikis. For MediaWiki template names, however, there are no English aliases unless they are defined as redirect pages (and to avoid the deletion of these redirects, which may not be used in wiki pages but only in translatable messages, they should be documented as aliases/synonyms with an internal comment on the redirect page or a categorization saying that they are needed for translatable MediaWiki messages).

The other problem then is the syntax of parameters: the parameter names if they are named, and their order if it is significant. Then come the syntax and semantics of their values (what is permitted, and how values are interpreted if there are restrictions).

Using the parser function syntax may add more restrictions/limitations than using the template syntax, which could also apply some transforms or add supplementary features (including support for aliased values, fallbacks and so on), whilst being written either entirely as a template, or using MediaWiki parser functions, or invoking a function in a Lua module supported by the Scribunto parser function, or a mix of all of these.

The syntax used in the translatable message does not mean that the project using that message must absolutely be using MediaWiki, but that syntax should be minimalist (with minimal technical tricks, which will also simplify the work performed by the local message validator used in translatewiki.net) so that any implementation may be easily plugged into the project via its own i18n library, or easily converted to the syntax that will finally be used at runtime by the deployed application/project (such conversion would occur when exporting messages from translatewiki.net to the project's repository via its own import tool). And to help translators, the project page in translatewiki.net should have a link to the documentation for the syntax supported for such a "grammar" extension.

Note that if the project uses a template-like syntax, it may also be possible to provide a syntax helper in translatewiki.net, as long as the template name used is specific enough and related to the project using it. That syntax helper may effectively render a link on translatewiki.net to the documentation page for that project. And IMHO, if the syntax used is for a non-MediaWiki-based project (for example, a project in Python, JavaScript, or C/C++), the name chosen should include some project-specific prefix, instead of using a bare "GRAMMAR" name, which should be reserved for MediaWiki-based projects. The MediaWiki syntax for transcluding templates with parameters is flexible enough to be easily supported by many non-MediaWiki implementations; the syntax with parser functions is less flexible and adds further complexities for message validation in translatewiki.net (see for example the complexities caused by "PLURAL" syntaxes).
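The export-time conversion mentioned above could look roughly like the following Python sketch. Everything here is hypothetical: the `myproject-grammar` template name, the `to_project_syntax` function, and the assumption that messages contain only simple, non-nested GRAMMAR calls.

```python
import re

# Matches simple calls such as {{GRAMMAR:genitive|$1}} (no nested braces).
GRAMMAR_RE = re.compile(
    r"\{\{\s*GRAMMAR\s*:\s*([^|{}]+)\|([^{}]*)\}\}", re.IGNORECASE
)

def to_project_syntax(message: str, prefix: str = "myproject") -> str:
    """Rewrite parser-function calls as prefixed template-like calls."""
    def convert(match: re.Match) -> str:
        case = match.group(1).strip()
        word = match.group(2)
        return "{{" + prefix + "-grammar|" + case + "|" + word + "}}"
    return GRAMMAR_RE.sub(convert, message)
```

An import tool built along these lines would turn `Search {{GRAMMAR:genitive|$1}}` into `Search {{myproject-grammar|genitive|$1}}` before handing the message to the project's own i18n library.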

Verdy p (talk) 18:07, 25 December 2022
 

For common nouns in Kurmancî Kurdish, you need to know the gender of the words (masculine or feminine) for the declension. However, other than usernames, gender is not specified.

So if we create it, it will only be useful for usernames.

But I think it is necessary to create a model that tells us whether the last letter of a word is a vowel or a consonant to solve this kind of problem.

Then by combining the "Gender" template and knowing the last letter of the username, we can even decline it...
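A rough Python sketch of that idea, under stated assumptions: the vowel set below is assumed to be the vowel letters of Latin-script Kurmancî, the gender values mirror what a GENDER-style lookup might return, and the suffix table is supplied by the caller (the example deliberately contains no real declension endings).

```python
from typing import Dict, Tuple

# Assumption: vowel letters of the Latin-script Kurmancî alphabet.
KURMANCI_VOWELS = set("aeêiîouû")

def ending_class(name: str) -> str:
    """Classify the last character as 'vowel', 'consonant' or 'unknown'."""
    last = name.strip()[-1:].lower()
    if not last.isalpha():
        return "unknown"          # empty name, digit or punctuation
    return "vowel" if last in KURMANCI_VOWELS else "consonant"

def decline(name: str, gender: str,
            suffixes: Dict[Tuple[str, str], str]) -> str:
    """Append the suffix chosen by (gender, ending class), if any."""
    return name + suffixes.get((gender, ending_class(name)), "")
```

The `suffixes` mapping would have to be filled in by someone who knows the declension rules, with one entry per combination such as `("female", "vowel")`; names that fall outside the table are returned unchanged.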

And I think it's possible to do it, you just have to find the time to make the request on "Phabricator" :)

Ghybu (talk) 18:29, 25 December 2022

In some translated messages, it may be possible to find a formulation that does not depend on the grammatical gender (or sometimes plural) of the entity, and does not require any conjugation. For example, instead of translating "$1 sent a message to $2", you would translate it as if it was "message sent (sender: $1; recipient(s): $2)". Such a trick is used in Russian, for example, but often requires adding some punctuation and some reordering, so that variables can be used in isolated form. However, the result may not always be as easy to understand as just keeping a single form assuming a default gender, a default plural form, a default grammatical case or a default article.

Handling contractions and internal mutations can be very tricky with a basic transformation rule, which would ignore a lot of exceptions and common usages.

The same is true when you attempt to derive an adjectival form from a language name, because the language name is sometimes plural and may already include the term translating "language" that you would want to use with the translated adjectival form of the language name. This cannot be done if the translation is just a single text that does not itself contain any metadata for correctly reducing its set of possible derivations and finally selecting the best one at the end of the text generation.

If multiple choices still remain, each one of these choices should have at least one distinctive piece of metadata, or they should be sorted by order of occurrence in the set, or better, distinguished by some given or computable "score" metadata. One such computable score could be to use the shortest formulation, but an actual metadata score may contain some tag indicating a more common modern usage, or tags for politeness or level of formality.
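The selection rule described here can be sketched in Python as follows. The `Candidate` class, the tag names and the tie-breaking order (preferred tags first, then an explicit score, then the shortest text, then order of occurrence) are assumptions chosen for the example, not an existing API.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Sequence

@dataclass
class Candidate:
    text: str
    tags: FrozenSet[str] = frozenset()   # e.g. "modern", "formal", "polite"
    score: Optional[float] = None        # explicit score, if any was given

def best(candidates: Sequence[Candidate],
         preferred_tags: Iterable[str] = ()) -> Candidate:
    """Pick one candidate using tags, score, length, then original order."""
    wanted = set(preferred_tags)

    def key(item):
        index, cand = item
        tag_hits = len(wanted & cand.tags)
        explicit = cand.score if cand.score is not None else 0.0
        # min() prefers more tag hits, higher score, shorter text, earlier item.
        return (-tag_hits, -explicit, len(cand.text), index)

    return min(enumerate(candidates), key=key)[1]
```

With no preference given, the shortest formulation wins; passing `preferred_tags={"formal"}` lets a longer but formal variant take priority.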

If the text output is in MediaWiki or HTML or some other rich text format, it may even be possible not to make any predetermined choice, by using a rendering form based on dynamic features (provided by HTML or JavaScript or similar), which could take into account user preferences, so that the actual display text may be derived on the client side (there are some examples in Wikipedia, where the generated text contains some microtagging semantic features, allowing such client-side modifications of the visible contents; this occurs for example with dates or with some accessibility features). However, the server or application using those "rich messages" must be prepared to deliver the accessibility tool, or should document and standardize the microtagging system it uses. Such a thing is not just needed for translation; it could be useful as well in any monolingual content (even in English only). So this is not just a problem of "internationalisation" (i18n) or "regionalisation" (r13n) but more generally a problem of "localisation" (l10n). You have a good example with the spelling of the quoted words in the last sentence (is it an "s" or a "z" in English?).

Verdy p (talk) 21:34, 25 December 2022

Personally, I don't see a real problem. A person using the site (the origin of the messages) will know how to use these templates correctly. If there is an error, the translator will come back here to correct it... And if he's nice :) he will complete the documentation ("Information about message") to help others so that they don't make the same error!

Ghybu (talk) 06:19, 26 December 2022

Yes, but a template-based solution will not work on all wikis if these are generic MediaWiki messages (e.g. to be deployed to Miraheze, or elsewhere). The doc may however be linked and shared on the MediaWiki wiki. Such a thing cannot be decided in isolation.

Verdy p (talk) 13:13, 26 December 2022

It's already decided. There is a Grammar parser function and we want to utilize it only for SITENAME and usernames. Why are you complicating this even more? "message sent (sender: $1; recipient(s): $2)" is not always possible to implement and we need declensions for SITENAME. For example, we have to omit SITENAME on MediaWiki:Searchsuggest-search/ku-latn or use "SITENAME: Lê bigere", which is not ideal at all. We encounter SITENAME in translations often enough to require utilizing something that already exists. Can we stop discussing issues in Localization please and start doing something about our problem?

Balyozxane (talk) 19:33, 26 December 2022