Structured glossary

In the FAQ there's a question: "How can we ensure consistent localised terminology?"

It's a very important question, but the answer is unfortunately a bit weak. It gives meta:Translation teams/fr/English-French Wikimedia Glossary as an example of an English-French glossary. That page has useful information for French translators, but technically it is just a plain page. It could be more structured - it could suggest translations according to the meaning of the word in context. Currently the Translate extension only suggests translations from automatic translation memory; it is often useful, but it is just guesswork.

I am not familiar with any Free Software translation platform that has such a feature, but I am familiar with one that is not free: Facebook. Its translation interface has many, many bugs and problems, but it has this nice feature: the writer of the English source message can mark specific words as referring to specific terms, and the translator will see a list of these words, to which they need to pay special attention and which they must translate consistently. You know, words such as "friend", "like", "comment" etc.
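To make the idea concrete, here is a minimal sketch of what a translator-facing term list could look like. It is not how Facebook or the Translate extension works: the glossary entries, the function name `marked_terms`, and the French translations are invented for illustration, and a plain whole-word match stands in for the hand-marking by the source writer described above.

```python
import re

# Hypothetical glossary: English term -> agreed French translation.
# The entries are illustrative only.
GLOSSARY = {
    "friend": "ami",
    "like": "j'aime",
    "comment": "commentaire",
}


def marked_terms(message, glossary):
    """Return (term, translation) pairs for glossary terms found in message."""
    found = []
    for term, translation in glossary.items():
        # Whole-word, case-insensitive match, so "like" does not match "likely".
        if re.search(r"\b" + re.escape(term) + r"\b", message, re.IGNORECASE):
            found.append((term, translation))
    return found


print(marked_terms("Comment on what your friend posted", GLOSSARY))
```

A real implementation would need the source writer's explicit marks rather than string matching, since matching alone cannot tell "like" the button from "like" the preposition.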

Needless to say, I wish something like this were here in TWN, but of course I understand that it's not trivial. Is anyone familiar with any other Free translation platforms that do have something like this?

Amir E. Aharoni 09:57, 16 April 2011

You can supply Google Translate with your own glossary when asking for translations. Though not really sufficient in general, imho, it fits nicely with the way translations are handled here at translatewiki.net, since we can have glossaries local to projects, extensions, or even message groups, if we decide so. I have on my agenda to dig into supplying glossaries (pages in TWN) to Google Translate, but since Google Translate does not do Colognian/Ripuarian, which I would want it for, it's low on my priority list.

There is a pretty versatile open source translation tool, OmegaT, that has glossary functionality allowing one to select from a list of likely terms. I've only briefly tested it, but could not get along with the glossary functionality.

Purodha Blissenbach 10:53, 16 April 2011
 

As far as I know, no other free translation platforms have glossary/terminology features either.

I'm part of a group that aims to build a term bank (including software and practices) for all fields of science in Finland. They are currently exploring different solutions and starting prototype projects with a wiki platform. What they come up with might also be useful to us.

Another direction, not necessarily orthogonal to that one, could be to investigate SemanticGlossary from the Semantic MediaWiki project.

I'm also programming an ontology edit interface plugin for MediaWiki, but that is unlikely to provide anything useful to us.

What is clear is that the need is there and that there is a lot of activity around, but that we don't have the resources to build anything totally new on our own. If an existing solution comes up that we can use and adapt, then there is hope of getting terminology stuff into twn.

Nike 11:03, 16 April 2011
Edited by author.
Last edit: 20:28, 16 April 2011

My as yet very superficial view of Extension:SemanticGlossary is that it adds markup to MediaWiki that you can use to annotate terms and abbreviations such that:

  • hovering the mouse cursor over them may show a textual annotation or explanation,
  • terms and their annotations can be viewed as (sortable?) lists via a special page,
  • Semantic Mediawiki Web and queries can be used on them,
  • it does not look able to deal well with ambiguous terms (such as "blow" being both several verbs and several nouns; think of wind, boxing, fuses, misfortune, doors and windows, noses, glass, eggs, saxes and trumpets, bubbles, and many more).

While the latter can likely be overcome somehow with disambiguation techniques, I doubt it would currently be acceptable for

  • most translators to annotate their translations with it,
  • developers to add similar annotations in either messages or message documentation,
  • minor: twn would need to strip annotations from messages on export.
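The export step in the last point could be sketched roughly as follows. The `<term sense="...">` tag is a purely hypothetical annotation syntax invented for this example; it is not SemanticGlossary's actual markup, and `strip_annotations` is not an existing twn function.

```python
import re

# Hypothetical annotation syntax, invented for this sketch only:
#   Open the <term sense="menu">File</term> menu
TERM_TAG = re.compile(r"<term\b[^>]*>(.*?)</term>", re.DOTALL)


def strip_annotations(message):
    """Remove the hypothetical <term> wrappers, keeping the wrapped text."""
    return TERM_TAG.sub(r"\1", message)


print(strip_annotations('Open the <term sense="menu">File</term> menu'))
```

Other message syntax, such as `{{PLURAL:$1|file|files}}`, passes through untouched because the pattern only matches the invented tag.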

SemanticGlossary can of course be used to develop a glossary or thesaurus of terms, up to a complete (monolingual) dictionary. Another project doing something similar is Omegawiki, only that Omegawiki has a whole lot of additional goals, and is multilingual. At the moment, I see several drawbacks, none of them prohibitive though:

  • I doubt that Omegawiki's database would be very helpful to us at the moment, as it lacks many of the needed terms, but we could find ways to supply them.
  • Omegawiki proper is geared towards all sorts of terminology, and we would have to find a way to mark very specialized terminologies, such as that of a specific extension of a specific piece of software. Not a software problem, but one of labour and proper coordination and preparation.
  • We do not have an interface in Extension:Translate for querying Omegawiki, even less so for updating Omegawiki, and whether or not the latter would at the moment (already) be accepted by Omegawiki needs to be found out.
  • Omegawiki at the moment has no way to deal with "ad hoc" translation relations. It needs at least one verbal description of a concept (term) in one language to be able to start relating translations to it. While this may well turn out to be beneficial to us, it is most often not easily done. And it should be done well! As experience shows, one can easily spend half an hour, or even two hours, figuring out a good and comprehensible definition / description / explanation of a term. A good translator is not necessarily a good definition writer.
  • Supplying definitions would likely be a separate step in twn's workflow.
  • Translating them, though not necessary from a strictly technical and egoistic perspective, would mean additional labour and require additional translator skills, apart from the ability to translate interface messages.

A somewhat similar project, inspired by Omegawiki, is Ambaradan. It can work with and without definitions / descriptions / explanations. Unlike Omegawiki, it can be configured to use only local storage, with or without data exchange with the rest of the world. Otherwise, I cannot really say much about it at the moment, since all my knowledge is more than 2 years old and theoretical.

Google Translate glossaries are pretty simplistic. They can be multilingual, which may come in handy for us. They can handle ambiguities in a "many to one" fashion (i.e. quasi-synonyms) but not vice versa in one glossary. If I understand that right, they are supplied with each translation request, so it would not be feasible to have very large ones. They appear great for supplying limited nonstandard terminology for the current translation context.
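The "many to one, but not vice versa" restriction can be illustrated with a small sketch. It does not call Google Translate or reproduce its glossary format; `build_glossary` and the English-German pairs are assumptions made up for the example.

```python
def build_glossary(pairs):
    """Build a source -> target mapping from (source, target) pairs.

    Several source terms may share one target (quasi-synonyms), but a
    source term with two different targets is rejected.
    """
    glossary = {}
    for source, target in pairs:
        if glossary.get(source, target) != target:
            raise ValueError(f"ambiguous source term: {source!r}")
        glossary[source] = target
    return glossary


# Many to one is fine: both English words map to the same German term.
print(build_glossary([("picture", "Bild"), ("image", "Bild")]))

# One to many is not: "file" as data versus "file" as a tool.
try:
    build_glossary([("file", "Datei"), ("file", "Feile")])
except ValueError as error:
    print(error)
```

Keeping separate small glossaries per project, extension, or message group is one way to stay within this restriction, since each context usually needs only one of the competing senses.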

Purodha Blissenbach 15:20, 16 April 2011

I actually wrote my Bachelor's thesis about using Omegawiki to help interface translation. The result, put briefly, was that it is not worth the hassle.

Nike 17:03, 16 April 2011

That is a pity. Assuming that you wrote in Finnish, which I do not even remotely read, it is likely useless to ask for a copy?

Without research, and putting the question of writing definitions aside (possibly following the Ambaradan approach, briefly: if you do not have definitions, number them and wait for texts to be written later, possibly forever) but doing crude disambiguations (such as "File <menu item>" versus "File <name or container of data>") where required, I believe that a multilingual collection of terms can speed up translation processes. At least it could be used as an additional kind of translation memory. Since I was involved in a machine translation project both as a learner and a programmer when I was a student, my belief has "some" practical background :-) Thus I would sincerely be interested to understand what led you to the conclusion that Omegawiki would not be helpful enough.
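A crude version of such a collection, with numbered senses and optional definitions, might look like the sketch below. The `Sense` class, the `TERMS` table, and `suggest` are all hypothetical names, and the data is made up; it only illustrates the "number them and fill in definitions later" idea and is not Ambaradan's or Omegawiki's data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Sense:
    hint: str                          # crude disambiguation, e.g. "menu item"
    definition: Optional[str] = None   # may stay empty, possibly forever
    translations: Dict[str, str] = field(default_factory=dict)  # language -> term


# Illustrative data only; senses are numbered by their position in the list.
TERMS: Dict[str, List[Sense]] = {
    "File": [
        Sense("menu item", translations={"de": "Datei"}),
        Sense("name or container of data", translations={"de": "Datei"}),
    ],
}


def suggest(term: str, lang: str) -> List[Tuple[int, str, str]]:
    """List (sense number, hint, translation) for senses translated into lang."""
    return [
        (number, sense.hint, sense.translations[lang])
        for number, sense in enumerate(TERMS.get(term, []), start=1)
        if lang in sense.translations
    ]


print(suggest("File", "de"))
```

Used this way, the collection behaves like an extra translation memory keyed by term and sense rather than by whole message.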

I agree, it would not help unless properly coordinated and prepared. This would have to be an extra step in the workflow, which would ideally take place before translations begin, would require access to programmers' / developers' / designers' knowledge, and would as a byproduct provide more and better message documentation.

Would you recommend feeding TWN's glossary data, if we collected any, to Omegawiki? Provided someone wanted to do that, of course.

Purodha Blissenbach 20:26, 16 April 2011

Basically it would be a lot of effort for very little gain:

  1. OmegaWiki currently contains almost none of the DefinedMeanings we need.
  2. Even when it does, they have definitions in fewer than ten languages on average, usually including those languages which need it least.
  3. OmegaWiki lacks pretty much all high level term management functions (splitting, merging, changing terms)
  4. Proper integration would be a lot of work. Message annotation is slow and difficult and OmegaWiki is not helping us in this process.
  5. If we basically need to build our own terminology from scratch, why make it complicated and use OmegaWiki, which is not integrated in twn in any way?
Nike 08:25, 17 April 2011

Agreed with all you write.

I've been looking into several projects and programs doing what we do here, i.e. the "translation" part of localizing software. Only OmegaT works with glossaries, but I do not know about any free glossaries related to the kind of terminology we need, so that needs research.

In the longer term, we must look into building our own glossaries. The good thing is that they can be very specific. I doubt :-) that references to usage contexts such as "Mediawiki extension 'Socialprofile'" would be accepted in more general wordbook projects. They are, however, accepted in Ambaradan, and maybe in Omegawiki, should we choose to automatically share collected data.

Let's see what I find, since I will be doing some glossary-related tests and research anyway in the future. I shall report.

Purodha Blissenbach 18:35, 21 April 2011