|Thread title||Replies||Last modified|
|Request for comments/Help for starting project terminologies||23||13:52, 11 March 2022|
|Collection Komputeko||2||16:07, 7 March 2014|
|Need improvement in Search||3||22:24, 13 July 2012|
|group-specific doc||6||22:01, 13 July 2012|
|Structured glossary||7||18:29, 13 July 2012|
|Using Semantic MediaWiki for identical messages collection||2||13:33, 12 July 2012|
Last edit: 13:52, 11 March 2022
I've noticed some languages have started to build their own wordlists. That is good and fine, but how it is done is not in the spirit of translatewiki.net. Like message documentation it should be a collaborative work where anybody can enjoy the results.
For this reason I'd like to start a project whose aim is to build a monolingual wordlist/terminology in English for (some of) our projects. This kind of terminology would greatly help translators in choosing correct translations. It is also much easier to translate a premade terminology than to build one from scratch, since more than half of the work is already done (collecting the terms and defining them, maybe even adding some relationship trees).
We don't currently have software to work with this kind of data, but I think we can start with simple word lists. Having actual data here is a good driving force for creating applications that use it.
I think this is a great idea. Some sort of glossary is often much needed, and it's especially so now that people here can translate multiple projects and don't know all of them well. What do you mean by "simple word list", though?
Some projects don't always use a consistent terminology and this doesn't help; we should probably try to fix this as well, and perhaps some simpler process is needed.
I think that it would be even more useful and important to define the "official translations" for each project and language, so that people know how to translate technical words and use a consistent translation. This could perhaps be done by choosing the most used words or groups of words in the source language and then finding their most common translations with our internal "translation memory", leaving some space for manual correction and discussion.
This is a very good idea, as it can sometimes be very hard to maintain a consistent terminology in a project, especially when there are several translators working on it.
As for how it could be done, as I see it the ideal way would be for each project to have a terminology list in English that could then be translated into the different languages, and then be used as a kind of glossary when translating.
And what would be really nice is if the software could automatically detect words from the glossary, and show them together with their translations in the translation window.
Automatically listing glossary items in translation pages would require a complete list of grammar forms of each glossary item of the source language. We start with English, but there are translators using translations other than English as their source languages.
The software could just use the English text to find which words from the glossary are in the text, and then show them in every language the user desires.
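A minimal sketch of the lookup described above, in Python: it scans the English source text for glossary terms and returns their translations in the language the translator asked for. The `GLOSSARY` structure, the function name and the `ksh` language code are assumptions made for illustration only; the two sample translations are the Colognian examples quoted further down this page. It matches exact word forms only, so it does not solve the grammar-form problem raised above.

```python
import re

# Hypothetical glossary: English term -> {language code: translation}.
GLOSSARY = {
    "namespace": {"ksh": "Appachtemang"},
    "user": {"ksh": "Metmaacher"},
}


def find_glossary_terms(source_text, glossary, language):
    """Return (term, translation) pairs for glossary terms found in source_text."""
    # Split the English source into lowercase words; only exact forms match.
    words = set(re.findall(r"[a-z]+", source_text.lower()))
    hits = []
    for term in sorted(glossary):
        translation = glossary[term].get(language)
        if term in words and translation is not None:
            hits.append((term, translation))
    return hits


print(find_glossary_terms("This user cannot edit the Help namespace.", GLOSSARY, "ksh"))
```

A message containing only "users" would return nothing here, which is exactly the limitation the list of grammar forms is meant to address.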
I was bold and started a Terminology page. We should probably start by collecting existing word lists in this wiki (or elsewhere).
Sure. We could then see if there's some overlap and they can be combined in some way.
With regard to Terminology/mediawiki, it's really a shame that there's no MediaWiki glossary (apparently, only w:en:Wikipedia:Glossary and Wikibooks:Help:Glossary), and we should create one, but probably not here, rather on mw:, where it would be of general use; on the contrary, I don't think that creating a complete MediaWiki glossary in all languages is really feasible.
The glossaries we need here should include only those terms whose translation is not obvious: terms with a very technical meaning, or words without a true equivalent, which are hard to translate or end up being translated in a number of different ways; such terms are a subset of the complete glossary + the list of other used words (which may have a clear meaning in English but still be hard to translate).
We could pick up from the general English glossary and word list the words we think are hard to translate and start to make a huge table with a row per term and a column per language; perhaps not all languages will need to establish an "official translation" for all/the same terms, but if there's some overlap it will be useful (and should address Purodha's concern).
A glossary entry must list at least these things:
- a word, phrase or expression,
- a context (usually a software product or an extension or group of those, but may be a logical area in a program, or "unspecified", where the more specific context overrides the less specific in search queries)
- a language plus possibly an area and a script.
Do we need more?
Definition. Without a definition it is not easy to spot overlapping and ambiguous terms and to actually make the messages clearer. I also think we should start with monolingual terminology.
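To make the proposed fields concrete, here is a rough Python sketch of one glossary entry, including the definition asked for in the reply. The class name, field names, sample definitions and the `lookup` helper are all hypothetical; the only things taken from the discussion are the list of fields and the rule that a more specific context overrides "unspecified".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GlossaryEntry:
    expression: str                 # a word, phrase or expression
    context: str                    # a product, extension, program area, or "unspecified"
    language: str                   # language code
    definition: str                 # needed to spot overlapping or ambiguous terms
    area: Optional[str] = None      # optional regional variant
    script: Optional[str] = None    # optional writing system


def lookup(entries, expression, context):
    """Prefer the entry for the given context, fall back to "unspecified"."""
    matches = [e for e in entries if e.expression == expression]
    for wanted in (context, "unspecified"):
        for entry in matches:
            if entry.context == wanted:
                return entry
    return None


# Made-up sample data, for illustration only.
entries = [
    GlossaryEntry("template", "unspecified", "en", "A reusable pattern."),
    GlossaryEntry("template", "MediaWiki", "en", "A page meant to be included in other pages."),
]
print(lookup(entries, "template", "MediaWiki").definition)
```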
I agree that a terminology would be useful and I also agree that we should start with a monolingual English (or in future any other source language of a project) glossary for each project.
Eventually we will need to devise a way for each language to build a bilingual glossary out of the main monolingual glossary efficiently. Having the monolingual glossary to build on is going to cut out a lot of work in setting up bilingual glossaries and it will mean that language teams are not doubling-up on the work of defining terms. It will also mean that all glossaries will be compatible.
I think it will enhance the projects themselves too, to have a glossary for its users. If we can produce a good glossary here, then we should be able to return this to the project as a service to their users. Our task is to make or improve on the English glossary already made at the project, with advice from the project developers and being reviewed by them. Then, we can create bilingual glossaries of the terms themselves, for our own use. Optionally, we could also translate the definitions and supply the project with translated glossaries, if our translators are willing to translate the definitions also. Translated definitions are going to be useful here in the future in the languages which are used as fallback by other languages, if we acquire translators who work primarily with the fallback instead of English.
I agree with having Definitions, at least where needed for disambiguations; and of course, they would not hurt otherwise. I agree to start with English. I do not mind adding other languages as soon as an original glossary entry is stable, but we should carefully design a method to expand glossary entries:
- without confronting users with an endless list of all possible language translations,
- allowing users to view translations in selected languages in parallel,
- allowing several equivalent translations, or synonyms,
- allowing definitions or explanations to be translated, too, so as to support translators who understand less or no English.
My first idea for creating a glossary had been to set up a namespace "Glossary:" with translatable pages holding the expressions, and probably /qqq pages holding explanations, but that does not seem sufficient now.
Just so we're all clear, because I really don't see it mentioned too clearly: the aim is not only that we have a list of the English terms, but that every English term has a translation into all the different languages, which the translators can work from!
The problem with a monolingual wordlist is that it doesn't immediately encourage everyone's participation wiki-style. Perfecting a monolingual wordlist takes forever and will never succeed completely.
Defining everything in terms of a single language (English) also creates avoidable problems. Many English terms actually describe several different concepts depending on context. For instance, "to check" can have three meanings: 1. to mark, select; 2. to verify; 3. to examine, look up. In most (or all) other languages these are all translated with different words. So they should be three entries rather than one (or perhaps more; I might have missed something).
I don't think it's actually necessary to start off with a monolingual glossary and I disagree that translating is pointless before the monolingual glossary is complete. On the contrary, starting off multilingual would make problems such as the aforementioned example immediately obvious so they don't need to be fixed later.
I think this would be a good moment to mention OmegaWiki, which is basically a large-scale implementation of the above idea: each concept ("DefinedMeaning") has its own entry in the database with its translations ("expressions") in as many languages as possible. The key is that the central data unit is the concept a.k.a. DefinedMeaning, not a word in any particular language.
In order to make our central glossary we could either simply use OmegaWiki which is ready and functioning now, and already contains many computer terms -- or the admins could decide to install the Wikidata extension, which it uses to do its magic, here on this site so we can roll our own.
Even without using them or their software, I think OmegaWiki is a powerful example of how a truly multilingual glossary can work. If they can do it on that scale, surely a relatively minor thing like a software translation glossary is trivial by comparison.
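The "to check" example can be sketched in a few lines of Python, with the concept (not the English word) as the central unit. This is plain illustrative code under assumed names (`Concept`, `concepts_for`, `view`); it is not OmegaWiki's actual schema or API, and the German words are only example renderings, not agreed translations.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    definition: str
    # language code -> list of equivalent expressions (synonyms allowed)
    expressions: dict = field(default_factory=dict)

    def add(self, language, expression):
        self.expressions.setdefault(language, []).append(expression)


# Three separate concepts that English happens to express with one word.
concepts = [
    Concept("to mark or select"),
    Concept("to verify"),
    Concept("to examine or look up"),
]
for concept in concepts:
    concept.add("en", "check")

# Illustrative German renderings only.
concepts[0].add("de", "ankreuzen")
concepts[1].add("de", "überprüfen")
concepts[2].add("de", "nachschlagen")


def concepts_for(expression, language, concepts):
    """All concepts a given expression can stand for in one language."""
    return [c for c in concepts if expression in c.expressions.get(language, [])]


def view(concept, languages):
    """Show a concept's expressions for the selected languages only."""
    return {lang: concept.expressions.get(lang, []) for lang in languages}


for concept in concepts_for("check", "en", concepts):
    print(concept.definition, view(concept, ["en", "de"]))
```

Looking up "check" returns three entries, so the ambiguity is visible as soon as a second language is added, and `view` keeps the display limited to the languages a translator selects.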
"Defining everything in terms of a single language (English) also creates avoidable problems."
That's why the terms should be defined.
"I don't think it's actually necessary to start off with a monolingual glossary"
We have to start somewhere, and I'm afraid that unless we direct attention we never get anything into useful state.
"I think this would be a good moment to mention OmegaWiki,"
I know about OmegaWiki, I even wrote my candidate's thesis about OmegaWiki and translatewiki.net. The conclusion was that it is not suitable for us. Terminology is different from what OmegaWiki is doing. And we are building a terminology, not a (multilingual) glossary. There are some ideas we could borrow, I don't deny that. Currently I'm exploring if we can take advantage of Semantic MediaWiki.
"The problem with a monolingual wordlist is that it doesn't immediately encourage everyone's participation wiki-style. Perfecting a monolingual wordlist takes forever and will never succeed completely."
I also think that this is a bit of a worry. Would it help to break the job of writing the monolingual glossary, particularly the glossary for MediaWiki core, into sections of say 100 terms? After a monolingual section was done, it would be prepared and added to the glossary formatted for translation, in whatever format is decided upon. Once translation work is begun on the first section, we will also gain experience on how many revisions are needed to the definitions, and how often we can expect to have to split a term into two carrying different definitions.
I cannot see how glossary data could be held other than in database tables if it is to be useful for automated lookups; thus we shall need another extension (or an additional part of Extension:Translate) for its management.
Also, to make a glossary useful, we shall have to deal with grammatical forms of words for those languages that have them. For example, in English, we have both "page" and "pages", which should technically share a glossary entry. In other languages, such as Finnish, we likely have a dozen, and in Turkish several dozen, of such forms per word. As long as we translate from English only, this is not a huge burden, since there are not really many grammatical forms for most words, and the vast majority follow a few simple rules. On the target side, at least for my work it would be feasible and useful to have a list of all words translated via the glossary with all their potential grammar forms that could appear in our contexts, which one could just click in the order needed to combine them into a translated sentence. --Purodha Blissenbach 04:08, 2 August 2011 (UTC)
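For the English side, the "few simple rules" could look roughly like the Python sketch below, which maps each generated form back to its shared glossary entry. The function names are made up, the rules cover regular noun plurals only, and irregular words would need manual entries; languages such as Finnish or Turkish would need a far richer approach than this.

```python
def english_forms(noun):
    """Very rough regular plural rules; irregular nouns need manual entries."""
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        plural = noun + "es"
    elif noun.endswith("y") and noun[-2:-1] not in "aeiou":
        plural = noun[:-1] + "ies"
    else:
        plural = noun + "s"
    return {noun, plural}


def build_form_index(base_terms):
    """Map every generated form to the glossary entry it belongs to."""
    index = {}
    for term in base_terms:
        for form in english_forms(term):
            index[form] = term
    return index


index = build_form_index(["page", "category", "prefix"])
print(index.get("pages"), index.get("categories"), index.get("prefixes"))
```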
Purodha, if I understand you correctly, you are suggesting expanding the use of a glossary to partially automate the translation process. I don't think that this would be worth the effort to set up, and in the case of Welsh is not practical, owing to the complexity of the rules for initial mutation. The human translator can choose a verb form or noun form in a split second. So I don't think that we should develop our glossary/terminology much further than is already planned.
However, designing an easy way to access the English definitions and the agreed translations (the root forms of nouns and verbs) during translation work would be useful. I think that is what you are discussing in your first paragraph above.
Whilst discussion is still continuing on the design of suitable software for the glossary/terminology, I will continue to contribute to the building of the English glossary in its present table form, on the assumption that all the raw material, when edited and agreed by consensus, can be converted to its final format when that has been set up.
Well, yes, you are right, I was looking at supporting the translation process with some options to click instead of type, similar to what we already have with TM (Translation Memories).
This would hardly be helpful when the number of possible choices becomes too big. For my work, they're usually limited to three to five for nouns, which is easy to handle. My intention was not to ask for so much at this time, but rather to keep something in mind as a future potential that should not be sacrificed too easily.
Maybe you are interested in Komputeko, a collection of IT terms. Its goal is to provide a multilingual IT terminology collection (for now without definitions). I worked on the Slovak version some time ago (and handed it over to a Slovak student to finish as a bachelor thesis). I am in contact with the initiator and maintainers and can mediate the work (email me).
I've not read about Komputeko, but in theory we're doing FUEL. I don't know how FUEL is progressing but you may want to check the differences between the two projects and report back.
Last edit: 12:30, 4 September 2011
- I have done a good number of translations here at translatewiki. We had to create and use several new localised terms. When I come back after a long time to look up a term and how we localised it, even after using all the tricks mentioned in the FAQ support page I more often than not end up empty-handed.
- More important is the fact that when a new casual Wikipedia reader is referred here to look up a certain term, he finds it too complicated.
- The available translator force is just enough to take care of the minimal translation needs, and we hardly have enough time to generate a glossary.
- Besides, single-page glossaries are not ideal for searching.
- We need some better automated system that will search the entire message database for a given language and provide glossary results, and that we would be more comfortable referring a novice to through a search link.
- I suppose the translatewiki developers may have other priorities too; still, I believe that this usability point may already be on their minds, and that they will take care of it in due course.
Thanks and regards
We would love to have a terminology repo per language, but it does not have a high priority at the moment. We are working on features however that may help in the near future.
Although I do not have much time available to actually do sophisticated work on it at the moment, I am planning to automatically or semi-automatically extract bilingual (likely partial) dictionaries from existing messages, giving us "terminology in context" data, such as:
|English||Translation||Context||Usable Elsewhere?|
|allows to||määt et müjjelich,||*-desc messages of Extension:*||yes|
|namespace||Appachtemang||all of MediaWiki, text excluding wikilink-targets||no|
|user||Metmaacher||all of MediaWiki, text excluding wikilink-targets||maybe|
The leftmost (English) column of the list would likely have some overlap between languages. The "Context" and "Usable Elsewhere?" columns would likely require some human input or adjustments. A raw list of suggestions could imho likely be extracted pretty automatically from some standard source format, such as a .po file. I believe such data can be used either to help us produce very raw translations for languages that are not covered by better means, or at least to provide a list of clickable suggestions next to message edit fields, so as to save some typing effort.
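As a very crude illustration of such an extraction, the Python sketch below reads single-line msgid/msgstr pairs from .po-style text and keeps translated messages of at most two words as candidate terms. The heuristic, the function name and the `max_words` parameter are assumptions, not an existing tool; it ignores multi-line entries, plural forms and context lines, and the "Context" and "Usable Elsewhere?" columns would still have to be filled in by hand.

```python
import re

# Tiny .po-style sample; the last message is deliberately left untranslated.
PO_SAMPLE = '''
msgid "namespace"
msgstr "Appachtemang"

msgid "user"
msgstr "Metmaacher"

msgid "This page has been deleted."
msgstr ""
'''

# Matches only simple single-line entries.
ENTRY = re.compile(r'msgid "(.*)"\nmsgstr "(.*)"')


def candidate_terms(po_text, max_words=2):
    """Return short, translated messages as raw term suggestions."""
    candidates = {}
    for source, target in ENTRY.findall(po_text):
        if source and target and len(source.split()) <= max_words:
            candidates[source.lower()] = target
    return candidates


print(candidate_terms(PO_SAMPLE))
```

Terms that only occur inside longer sentences would need real alignment work, which this sketch does not attempt.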
If we can share ideas, insights, and thoughts, I would certainly like to.
Sorry to say: I'm all for thinking with you. However, over the past years you have often said you would be working on some code, without delivering much. I really hope this will change.
As for the terminology repository: I am almost certain that a standard for these already exists. Nikerabbit probably knows more about it and will be able to provide some web reference(s).
Would it be possible to have group-specific documentation? For example, words like "badge" or "template" can be translated in several ways in Portuguese, and this creates problems of consistency; especially when different message groups mean it in different ways. I was wondering if a page like Mediawiki:GroupName/pt (or maybe Mediawiki:GroupName/qqq/pt) could be made to show up among the translation helpers (translation memory, etc) for all messages from that group. Are these changes feasible/desirable?
What you're basically asking for is a localised product based terminology repository. Or maybe it should only be product based if so indicated explicitly. We have no idea yet how to implement such a thing, or what it should be doing exactly. For now my advice is to create these manually as a subpage of your language portal, and refer from there.
Yes, indeed, it would be sort of a dictionary, albeit a very brief one --a cheatsheet, if you will--, containing only the core terms for the message group, so that it would fit into 2 or 3 lines of text. Then it could be embedded the same way the qqq files are right now. The idea is for it to work quite similarly to the current scheme of /qqq subpages. The pattern to look for would be slightly different, but the rest of the current implementation could be reused (I suppose). Or are there other obstacles I'm overlooking?
(As for your statement "it should only be product based if so indicated explicitly", yes, that would make sense if we were talking about a large dictionary, but the idea here is about very short lists of terms used by that specific product, and therefore tailored to it, which would make them not exactly reusable.)
A few remarks: "quick and small hacks" tend to turn into unworkable long term solutions. Manually implementing this in qqq is not the way to go. There needs to be some process that will match words in the source message to defined terminology and translations so that everything works automagically.
I agree with the quick hacks thing, but I wasn't suggesting to manually(?) append these cheatsheets to the qqq messages, but rather to replicate that mechanism, with some changes in the pattern it uses to fetch the pages to transclude. This in principle addresses your concern about quick hacks -- unless the qqq thing is itself considered a quick hack. But if so, even though one wrong (is it?) does not justify another, I gotta say, while what you're talking about makes perfect sense and would be awesome, it seems like something that would take a while to be implemented (if ever). And in the meantime, maintaining consistency across translations stays a cumbersome task...
Thanks for the suggestion. We actually did that, some time ago, but it was chronically incomplete, and not very usable, because (1) we'd have to have the page open in a separate tab while translating and move forwards and backwards and (2) some terms would have to be translated differently depending on the message group.
In the FAQ there's a question: "How can we ensure consistent localised terminology?"
It's a very important question, but the answer is unfortunately a bit weak. It gives meta:Translation teams/fr/English-French Wikimedia Glossary as an example of an English-French glossary. That page has useful information for French translators, but technically it is just a plain page. It could be more structured - it could suggest translations according to the meaning of the word in context. Currently the Translate extension only suggests translations from automatic translation memory; it is often useful, but it is just guesswork.
I am not familiar with any Free Software translation platform that has such a feature, but I am familiar with one that is not free: Facebook. Its translation interface has many, many bugs and problems, but it has this nice feature: the writer of the English source message can mark specific words as referring to specific terms and the translator will see a list of these words, to which he needs to pay special attention and translate consistently. You know, words such as "friend", "like", "comment" etc.
Needless to say, I wish something like this was here in TWN, but of course I understand that it's not trivial. Is anyone familiar with any other Free translation platforms that do have something like this?
You can supply Google Translate with your own glossary when asking for translations. Though not really sufficient in general, imho, it fits nicely with the way translations are handled here at translatewiki.net, since we can have glossaries local to projects, extensions, or even message groups, if we decide so. I have on my agenda to dig into supplying glossaries (pages in TWN) to Google Translate, but since Google Translate does not do Colognian/Ripuarian, which I would want it for, it's low on my priority list.
There is a pretty versatile open source translation tool, OmegaT, that has a glossary functionality allowing one to select from a list of likely terms. I've only briefly tested it but could not get along with the glossary functionality.
As far as I know no other free translation platforms have glossary/terminology features either.
I'm part of a group that aims to build a term bank (including software and practices) for all fields of science in Finland. They are currently exploring different solutions and starting prototype projects with a wiki platform. What they come up with might be also useful to us.
Another direction, not necessarily orthogonal to that one, could be to investigate SemanticGlossary from the Semantic MediaWiki project.
I'm also programming an ontology edit interface plugin for MediaWiki, but that is unlikely to provide anything useful to us.
What is clear is that the need is there, and there is a lot of activity around, but that we don't have the resources to build anything totally new on our own. If an existing solution comes up that we can use and adapt, then there is hope of getting terminology stuff into twn.
My as of yet very superficial view of Extension:SemanticGlossary is that it adds markup to MediaWiki that you can use to annotate terms and abbreviations so that:
- hovering the mouse cursor over them may show a textual annotation or explanation,
- terms and their annotations can be viewed as (sortable?) lists via a special page,
- Semantic MediaWiki queries can be used on them,
- it does not look able to deal ideally with ambiguous terms (such as "blow" being both several verbs and several nouns, think of wind, boxing, fuses, misfortune, doors and windows, noses, glass, eggs, saxes and trumpets, bubbles, and many more)
While the latter can likely be overcome somehow with disambiguation techniques, I doubt it would currently be acceptable for
- most translators to annotate their translations with it,
- developers to add likewise annotations in either messages or message documentations,
- minor: twn would need to strip annotations from messages on export.
SemanticGlossary can of course be used to develop a glossary or thesaurus of terms, up to a complete (monolingual) dictionary. Another project doing something similar is OmegaWiki, only that OmegaWiki has a whole lot of additional goals, and is multilingual. At the moment, I see several drawbacks, none of them prohibitive though:
- I doubt that OmegaWiki's database would be very helpful to us at the moment, since it lacks many of the needed terms, but we could find ways to supply them.
- OmegaWiki proper is geared towards all sorts of terminology, and we would have to find a way to mark very specialized terminologies, such as that of a specific extension of a specific piece of software. Not a software problem, but one of labour and proper coordination and preparation.
- We do not have an interface in Extension:Translate for querying OmegaWiki, even less so for updating it, and whether or not the latter would at the moment (already) be accepted by OmegaWiki needs to be found out.
- OmegaWiki at the moment has no way to deal with "ad hoc" translation relations. It needs at least one verbal description of a concept (term) in one language to be able to start relating translations to it. While this may well turn out to be beneficial to us, it is most often not easily done. And it should be done well! As experience shows, one can easily spend half an hour, or even two hours, figuring out a good and comprehensible definition / description / explanation of a term. A good translator is not necessarily a good definition writer.
- Supplying definitions would likely be a separate step in twn's workflow.
- Translating them, though not necessary from a strict technical and egoistic perspective, would mean additional labour and require additional translator skills, apart from the ability to translate interface messages.
A somewhat similar project, inspired by OmegaWiki, is Ambaradan. It can do with and without definitions / descriptions / explanations. Unlike OmegaWiki, it can be configured to use only local storage, with or without data exchange with the rest of the world. Otherwise, I cannot really say much about it at the moment, since all my knowledge is more than 2 years old and theoretical.
Google Translate glossaries are pretty simplistic. They can be multilingual, which may come in handy for us. They can handle ambiguities in a "many to one" fashion (i.e. quasi-synonyms) but not vice versa in one glossary. If I understand that right, they are supplied with each translation request, so it would not be feasible to have very large ones. They appear great for supplying limited nonstandard terminology for the current translation context.
I actually wrote my Bachelor's thesis about using OmegaWiki to help interface translation. The result, put briefly, was that it is not worth the hassle.
That is a pity. Assuming that you wrote in Finnish, which I do not even remotely read, it is likely useless to ask for a copy?
Without research, and putting the question of writing definitions aside (possibly following the Ambaradan approach (briefly: If you do not have definitions, number them and wait for texts to be written later - possibly forever) but doing crude disambiguations (such as "File <menu item>" versus "File <name or container of data>") where required), I believe that a multilingual collection of terms can speed up translation processes. At least it could be used as an additional kind of translation memory. Since I was involved in a machine translation project both as a learner and a programmer when I was a student, my belief has "some" practical background :-) thus I would sincerely be interested to understand what led you to the conclusion that OmegaWiki would not be helpful enough.
I agree, it would not help unless properly coordinated and prepared, and this would have to be an extra step in the workflow, which would ideally have to take place before translations begin, and would require access to programmers' / developers' / designers' knowledge, and as a byproduct provide more and better message documentation.
Would you recommend feeding TWN's glossary data, if we collected any, to OmegaWiki? Provided someone wanted to do that, of course.
Basically it would be a lot of effort for very little gain:
- OmegaWiki currently contains almost none of the DefinedMeanings we need
- Even when it does, they have definitions in fewer than ten languages on average, usually including those languages which need it least.
- OmegaWiki lacks pretty much all high level term management functions (splitting, merging, changing terms)
- Proper integration would be a lot of work. Message annotation is slow and difficult and OmegaWiki is not helping us in this process.
- If we basically need to build our own terminology from scratch, why make it complicated and use OmegaWiki, which is not integrated in twn in any way?
I think it would be a good idea to try using Semantic MediaWiki for the identical messages collection.
Instead of referencing a template in the message documentation, we should only declare identical properties there: identical name and variation (plural, with comma, etc.).
A query over all message namespaces for messages with the same identical name property (if defined) should be run on every documentation page.
To lower the server workload, the actual querying of data may be deferred until somebody presses an "expand identical messages" button.
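The idea can be mocked up in plain Python to show the intended behaviour; this is not Semantic MediaWiki code, and the message keys, property values and function name below are invented for the example. Each message declares an identical name and a variation, and the grouping query only runs (and is then cached) when somebody actually asks for it.

```python
from collections import defaultdict
from functools import lru_cache

# Hypothetical annotations: message key -> (identical name, variation).
ANNOTATIONS = {
    "MediaWiki:Cancel": ("cancel", "plain"),
    "MediaWiki:Ext-foo-cancel": ("cancel", "plain"),
    "MediaWiki:Ext-bar-cancel-colon": ("cancel", "with colon"),
    "MediaWiki:Save": ("save", "plain"),
}


@lru_cache(maxsize=None)
def identical_messages(name):
    """Deferred lookup: scan all annotations only on request, then cache."""
    groups = defaultdict(list)
    for key, (identical_name, variation) in ANNOTATIONS.items():
        if identical_name == name:
            groups[variation].append(key)
    return {variation: sorted(keys) for variation, keys in groups.items()}


# Simulates pressing the "expand identical messages" button for one message.
print(identical_messages("cancel"))
```

Grouping by variation keeps "Cancel" and "Cancel:" apart, so a translator sees only the messages that should really be translated identically.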
This is just an idea; further talks with folks who have deeper knowledge of Semantic MediaWiki are a must :-)
I thought the whole "identical messages" template had already been obsoleted by the translation memory feature.
Translation memory has its own issues:
- It can't suggest similar messages to translate (by definition)
- Its current usage doesn't allow checking translations for consistency or possible errors (missing colons, parameters, etc.)
- It simply can't suggest a translation in quite trivial situations (see Toolserver:Toolserverstatus-toolserver-status-short-erro/en as an example).
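The consistency checks mentioned in the list (missing colons, parameters) are simple enough to sketch in Python. The function name and the exact rules are assumptions for illustration: it only looks for `$1`-style parameters and a trailing ASCII colon, so languages that use different punctuation would need their own rules.

```python
import re

# Numbered message parameters such as $1, $2.
PARAM = re.compile(r"\$\d+")


def check_translation(source, translation):
    """Return a list of likely problems in a translation, empty if none found."""
    problems = []
    missing = sorted(set(PARAM.findall(source)) - set(PARAM.findall(translation)))
    if missing:
        problems.append("missing parameters: " + ", ".join(missing))
    if source.rstrip().endswith(":") and not translation.rstrip().endswith(":"):
        problems.append("missing trailing colon")
    return problems


print(check_translation("Hello $1, you have $2 messages:", "Hallo $1"))
print(check_translation("Name:", "Name:"))
```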