User Guide for the MyDutchPal Translators Toolkit
Most of the tools in this toolkit were developed as part of my machine translation work for Siemens Nederland over the years. I’m packaging them in this application in the hope that individual tools will be of use to translators in the course of their work – especially if they don’t use other language technology tools.
Do you need to list all the words in a document, or the frequency of key phrases? Do you wish you could reverse the direction of a translation memory? Would you like to search in a bilingual corpus or concordance but don’t have a translation memory tool? This toolkit will help you do all that, and more.
Term Jotter (Glossary Manager)
This is a tool providing quick and easy storage of terminology. It is language independent and can handle a mixture of languages and subjects – even within the same glossary. Glossaries can be exported to the standard TBX format (with a simple back-up to the CSV format included). The tool is ideal for use as “temporary accommodation” for terminology.
Explanation of the buttons
Use this button to create a glossary. The name of the glossary is displayed in the status bar at the bottom of the screen. The glossary is located in the Glossaries directory which is a subdirectory of the user’s Home directory.
Use this button to select an existing glossary. You can only add, delete or find a term when a specific glossary is selected.
Click this button to add a term to the glossary. Terms can be typed in, pasted in or spoken (via dictation software).
Terms can be entered in any language and any script (although what you can display will depend on the constraints of your particular operating system).
The Note section of this term entry panel allows you to enter definitions, descriptions or examples of usage.
Click on this button to search for a term in the selected glossary. Individual terms can then be edited and the glossary updated by pressing the Update button.
Use this button to delete a specific term from the selected glossary.
This button lets you display a view of the entire contents of the selected glossary.
Clicking this button takes you directly onto the Web from within the Term Jotter application. The landing page is Google but you can navigate where
you want on the Web to find a translation, definition or example of a particular term.
Use this function to import a list of terms (and their translations). The list can be an Excel file saved as a tab-separated file, or a tab-delimited text file in which each term and its tab-separated translation occupy a single line. The user enters the source and target languages at the start of the operation, and the system fills in the creation date automatically. The record can then be completed manually.
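To illustrate the expected import layout, here is a minimal Python sketch that reads tab-separated term pairs into simple records. It is not Term Jotter's own code: the function name `read_term_list` and the record fields are assumptions made for this example only.

```python
import csv
import io
from datetime import date


def read_term_list(handle, source_lang, target_lang):
    """Read 'term<TAB>translation' lines into a list of simple records.

    The record layout is hypothetical; it only mirrors the fields
    mentioned in this guide (languages, creation date).
    """
    records = []
    # QUOTE_NONE keeps quotation marks inside terms untouched.
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 2 or not row[0].strip():
            continue  # skip blank or incomplete lines
        records.append({
            "source": row[0].strip(),
            "target": row[1].strip(),
            "source_lang": source_lang,
            "target_lang": target_lang,
            "created": date.today().isoformat(),  # filled in automatically
        })
    return records


# In practice the handle would be an open text file; a string is used here.
sample = io.StringIO("fiets\tbicycle\nhuis\thouse\n")
terms = read_term_list(sample, "nl", "en")
```

Each line of the input yields one record, and lines without a tab-separated translation are skipped.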
Export to TBX
Use this button to export your glossary to the international standard TBX format. A simplified version of the glossary (source, target, subject field) is also
exported in the standard CSV format, which can be imported into an Excel file.
Use this button to save and close the glossary.
The Tools item in the Term Jotter menu bar includes a list of the standard language codes, which you will need when entering language codes in your glossary (essential for creating a valid TBX export file).
This application will give you a list of words and their frequency of use in any text file. This is handy for identifying the key words in a document – particularly for interpreters preparing to work in a situation where knowledge of specialist terminology is needed. The application can handle documents of any size.
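For readers who want to see the idea in code, the following Python sketch counts word frequencies in a piece of text. It is an illustration only, not the toolkit's implementation; the tokenisation rule (lower-casing, keeping internal apostrophes and hyphens) is an assumption.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return a Counter mapping each lower-cased word to its frequency."""
    # A word is a run of letters/digits, optionally joined by ' or -.
    words = re.findall(r"\w+(?:['’-]\w+)*", text.lower())
    return Counter(words)


freq = word_frequencies("The pump feeds the boiler. The boiler heats the water.")
print(freq.most_common(2))  # [('the', 4), ('boiler', 2)]
```

For a large file, the same counter could be updated line by line instead of reading the whole text at once.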
This tool also works with text files. It goes through a document sentence by sentence and breaks each sentence down into ngrams (phrases) giving the frequency of occurrence for each phrase. This knowledge is very useful to the translator since he/she can identify the most common phrases in a document, particularly collocations, and make sure these are translated correctly. Phrases like “Ministry of Cultural Development” will be identified (in this case as a 4-gram). Frequently occurring phrases can be usefully stored in a TMX file (use our Tab2TMX conversion tool) which can be imported into a translation memory.
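The sentence-by-sentence approach described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: sentences are split on full stops, question marks and exclamation marks, and phrases of 2 to 4 words are counted. The function name `ngram_frequencies` is hypothetical, and the toolkit itself may segment and count differently.

```python
import re
from collections import Counter


def ngram_frequencies(text, max_n=4):
    """Count 2-grams up to max_n-grams, never crossing a sentence boundary."""
    counts = Counter()
    for sentence in re.split(r"[.!?]+", text):
        words = re.findall(r"\w+", sentence.lower())
        for n in range(2, max_n + 1):
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    return counts


text = ("The Ministry of Cultural Development issued a report. "
        "The report praised the Ministry of Cultural Development.")
phrases = ngram_frequencies(text)
print(phrases["ministry of cultural development"])  # 2
```

Because each sentence is processed separately, a phrase such as “report the”, which would span the boundary between the two sentences, is not counted.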
With this feature the translator can search a translation memory database (in TMX format) for a word or phrase – in either source or target language. The TM search engine will display all the records in which the search term (or phrase) occurs. This is an ideal solution for translators who want to consult bilingual corpora but don’t want to use a translation memory environment.
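As a rough illustration of what such a search involves, the Python sketch below scans the translation units of a TMX document and returns every unit in which the query occurs in any language. It is not the toolkit's search engine; the function name `search_tmx` and the sample data are assumptions, and matching is a plain case-insensitive substring test.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="nl" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="nl"><seg>De pomp is defect.</seg></tuv>
      <tuv xml:lang="en"><seg>The pump is faulty.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="nl"><seg>Sluit de klep.</seg></tuv>
      <tuv xml:lang="en"><seg>Close the valve.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def search_tmx(root, query):
    """Return a {language: segment} dict for every unit containing query."""
    query = query.casefold()
    hits = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.iter("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                # Older TMX files may use a plain 'lang' attribute.
                lang = tuv.get(XML_LANG) or tuv.get("lang", "")
                segments[lang] = "".join(seg.itertext())
        if any(query in text.casefold() for text in segments.values()):
            hits.append(segments)
    return hits


root = ET.fromstring(SAMPLE_TMX)
hits = search_tmx(root, "pump")
```

A real file would be loaded with `ET.parse(path).getroot()` instead of the inline sample.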
This tool reverses the language direction of a translation memory so that the source language becomes the target language, and vice versa. This can provide useful resources to translators working in the opposite language direction. Combined with our Tmx2txt tool, it can also generate the text files needed to train a machine translation engine in this new direction.
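In TMX terms, reversing the direction roughly means changing the `srclang` attribute in the header and putting the new source segment first in each translation unit. The Python sketch below shows one way this could be done; it is an assumption-laden illustration (the function name `reverse_tmx` and the sample are invented here), not the toolkit's own routine.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="nl" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="nl"><seg>De pomp is defect.</seg></tuv>
      <tuv xml:lang="en"><seg>The pump is faulty.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def reverse_tmx(root, new_source_lang):
    """Make new_source_lang the source language of the TMX tree, in place."""
    header = root.find("header")
    if header is not None:
        header.set("srclang", new_source_lang)
    for tu in list(root.iter("tu")):
        tuvs = tu.findall("tuv")
        # Stable sort: variants in the new source language come first.
        reordered = sorted(tuvs, key=lambda tuv: tuv.get(XML_LANG) != new_source_lang)
        for tuv in tuvs:
            tu.remove(tuv)
        tu.extend(reordered)
    return root


root = reverse_tmx(ET.fromstring(SAMPLE_TMX), "en")
first_tuv = root.find("./body/tu/tuv")
```

The segment text itself is left untouched; only the declared source language and the order of the language variants change.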
The toolkit provides numerous converters. These are:
– Tmx2Txt (generates separate source & target language files from a TMX file)
– Tmx2Csv (converts a TMX file into a bilingual Comma Separated Value file which can be imported into Excel and other tools)
– Tmx2Tsv (generates a Tab Separated (Delimited) Value file).
– Tab2Tmx (converts a Tab Delimited Value file into a TMX file which can be imported into almost any translation memory program. This offers a simple solution for getting data from a bilingual Excel file into a TMX file).
– Xliff2Tmx (converts an XLIFF file to a TMX file)
– Ttx2Tmx (will try to create a TMX file from a Trados TTX (bilingual) file. It requires the Trados file to be converted to UTF-8 encoding before processing. This is very much a “last resort” tool!)
– HTML Entity Converter (converts code points to HTML entities and vice versa).
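To give an idea of what a conversion such as Tab2Tmx involves, here is a minimal Python sketch that builds a TMX document from tab-delimited lines. It is a sketch only: the function name `tab_to_tmx` and the header values (`creationtool` and so on) are placeholders chosen for this example, not the values the toolkit writes.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tab_to_tmx(lines, source_lang, target_lang):
    """Build a TMX element tree from 'source<TAB>target' lines."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-sketch",   # placeholder value
        "creationtoolversion": "0.1",       # placeholder value
        "segtype": "sentence",
        "o-tmf": "tab-delimited",
        "adminlang": "en",
        "srclang": source_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for line in lines:
        parts = line.rstrip("\r\n").split("\t")
        if len(parts) < 2:
            continue  # skip lines without a translation
        tu = ET.SubElement(body, "tu")
        for lang, text in ((source_lang, parts[0]), (target_lang, parts[1])):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return tmx


tmx = tab_to_tmx(["Goedemorgen\tGood morning\n", "Tot ziens\tGoodbye\n"], "nl", "en")
xml_text = ET.tostring(tmx, encoding="unicode")
```

The resulting tree can be written to disk with `ET.ElementTree(tmx).write(path, encoding="utf-8", xml_declaration=True)`.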
Gives separate counts for source & target in a TMX file.
Language Code Changer
Offers a simple way to change the language code(s) in a translation memory (TMX file).
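Changing a language code in a TMX file amounts to rewriting the `xml:lang` attribute of each language variant and, where applicable, the `srclang` attribute in the header. The Python sketch below illustrates this under those assumptions; the function name `change_language_codes` and the mapping are hypothetical and do not describe the tool's actual interface.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="nl" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="nl"><seg>Goedemorgen</seg></tuv>
      <tuv xml:lang="en"><seg>Good morning</seg></tuv>
    </tu>
  </body>
</tmx>"""


def change_language_codes(root, mapping):
    """Replace language codes per mapping; return the number of variants changed."""
    header = root.find("header")
    if header is not None and header.get("srclang") in mapping:
        header.set("srclang", mapping[header.get("srclang")])
    changed = 0
    for tuv in root.iter("tuv"):
        old = tuv.get(XML_LANG)
        if old in mapping:
            tuv.set(XML_LANG, mapping[old])
            changed += 1
    return changed


root = ET.fromstring(SAMPLE_TMX)
changed = change_language_codes(root, {"nl": "nl-NL", "en": "en-GB"})
```

Codes that do not appear in the mapping are left as they are.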
This tool will adapt the number formats to the locale.