Enable [[Project:Terminology gadget|Terminology gadget]] by default?

I would add though, related to the concerns above - it would be useful if it were possible to maintain a list or table for a given term, rather than a single string, to account for complexities in how the word is supposed to inflect, or differences such as the inclusion of diacritics where in the context of a sentence it could be ambiguous. For example, in English there is "page" and "pages," in Punjabi this is صفحہ and صفحے in the direct case, but in the oblique case (followed by a postposition), we have to use صفحے and صفحیاں for singular and plural respectively.

Bgo eiu (talk) 18:33, 2 October 2022

It is in fact more complicated than that in various languages, which also require special agglutination (no prefixes, suffixes, infixes), mutations or reductions (e.g. transforming "de le" into "du" in French, though those terms would not be in the glossary), contextual elision (frequent in French or Italian for pronouns, articles, and particles), and deagglutination (think of German verbs whose prefix can be detached and moved to the end of the sentence, as in "(Something) angehen" in the infinitive, which becomes "Ich gehe (something) an." in the indicative). In French and English there is also variation in whether some words or prefixes should be joined, attached with a hyphen, or written separately. There are also well-known words with several accepted spellings (depending on whether an orthographic reform was applied or not).

Now consider the case of conjugating verbs: there are too many forms, and listing them exhaustively requires a lot of data. See for example the conjugation tables in French Wiktionary, which are still not exhaustive, because the feminine or plural variations of participles that depend on other words are not shown. Complex rules also exist for Slavic languages. Verbs are usually listed by their infinitive, but this is not the case for Latin, where they are traditionally listed by their first person present indicative, or by the first two persons and the infinitive. There are also many defective verbs for which only some persons, tenses or moods exist, so this is not a general rule.

If we start enumerating all forms, there is no well-defined order. In my opinion we should just list the main lemma entry found in dictionaries, e.g. in Wiktionary, which could be linked by a lexical Wikidata item that would list all its other forms. Attaching links to other languages will remain very difficult, though, as it is hard to bind lemmas across languages, whereas it is simpler to bind specific forms to their main lemma entry within the same language (but there are complex exceptions there as well).

Verdy p (talk) 19:39, 2 October 2022

Attaching lexemes is a good idea; I had not thought of this. At least for now, we could even just link these in the notes section.

What I had in mind, though, was that including a limited selection of forms rather than a full conjugation table would make that grammatical information easier to parse. The nature of translating for software interfaces necessarily limits the contexts in which different words are used, so particular forms are used more than others. For example, a typical Punjabi verb has over 100 possible forms. However, for some we may only need imperatives and for others we may only need imperfect participles. There are also some considerations for underlying meaning, for example, for the verb 'to put' I am only using the politest imperative form پایو rather than پاؤ which sounds forceful or پا which is just rude. So I would indicate that this particular form is the pertinent one, along with four perfect participle conjugations and a conjunctive participle form. For other verbs though, like 'to search,' I would make it more polite if the interface is telling the user to search as کھوجیو but normal politeness for the user telling the software to search as کھوجو. (Even though the software is not a person, being rude or informal with the software would not be read well.)

Bgo eiu (talk) 19:39, 3 October 2022
 

I would say that the gadget expects users to know how to inflect a given word in their own language by themselves. But in any case, both fields in the gadget can use normal wikitext, including templates, so if you make a template that can show inflections of words there, feel free to go ahead and use it. Modules are not available on Translatewiki at the moment, and even if they were, I don't know if there's any way for them to access Wikidata lexemes the same way modules in Wikimedia projects can, so using lexeme data could be tricky. That would be beyond the purview of the gadget, though.

Jon Harald Søby (talk) 13:43, 14 October 2022

I think just linking forms in the notes is sufficiently helpful that not much else is needed now that I've thought about it.

It is not really a matter of knowing how to inflect words but rather a reminder of which inflections make sense given the context. This is not always obvious from the English source strings, and a number of inflections are exclusively colloquial or exclusively written, or specific to a dialect. Potohari/Mirpuri Punjabi dialects for example allow for future tense while "standard" Punjabi does not, and in the Doabi dialect I am familiar with verbs can have negative conjugations which aren't used in other dialects. So there are various reasons why linking additional context might be helpful, especially for "low resource" languages which don't have as much precedent for software translations.

Bgo eiu (talk) 15:09, 14 October 2022

Agreed, English can be confusing there, especially since the present tense, infinitive and imperative are almost always identical in English. Those ambiguities should be noted in the message documentation, though, since they vary by message (and not by word form), and knowing which one is intended is useful for many/most languages.

Jon Harald Søby (talk) 00:27, 15 October 2022