Category: Academic publications

Machine Translation Markers in Post-Edited Machine Translation Output

Abstract

The author has conducted an experiment for two consecutive years with postgraduate university students in which half produce an unaided human translation (HT) and the other half post-edit machine translation output (PEMT). Comparison of the texts produced shows, rather unsurprisingly, that post-editors faced with an acceptable solution tend not to edit it, even when, as is often the case, more than 60% of the translators tackling the same text prefer a range of other solutions. As a consequence, certain turns of phrase, expressions and word choices occur with greater frequency in PEMT than in HT, making it theoretically possible to design tests to tell them apart. To verify this, the author successfully carried out one such test on a small group of professional translators. This implies that PEMT may lack the variety and inventiveness of HT, and consequently may not actually reach the same standard. It is evident that the additional post-editing effort required to eliminate what are effectively MT markers is likely to nullify a great deal, if not all, of the time and cost-saving advantages of PEMT. However, the author argues that failure to eradicate these markers may eventually lead to lexical impoverishment of the target language.
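
The frequency comparison described in this abstract could be approximated along the following lines. This is only a minimal sketch and not the procedure used in the paper: the function names, the candidate phrases and the 0.3 threshold are all hypothetical, and the check is a plain case-insensitive substring match.

```python
def share_containing(texts, phrase):
    """Return the fraction of texts that contain the phrase (case-insensitive)."""
    if not texts:
        return 0.0
    needle = phrase.lower()
    return sum(needle in text.lower() for text in texts) / len(texts)


def find_markers(ht_texts, pemt_texts, candidates, min_gap=0.3):
    """List candidate phrases that are over-represented in PEMT.

    A phrase is reported when its share among PEMT texts exceeds its share
    among HT texts by at least min_gap (an arbitrary, assumed threshold).
    """
    markers = []
    for phrase in candidates:
        ht_share = share_containing(ht_texts, phrase)
        pemt_share = share_containing(pemt_texts, phrase)
        if pemt_share - ht_share >= min_gap:
            markers.append((phrase, ht_share, pemt_share))
    return markers
```

Called with two lists of translations of the same source text and a list of candidate expressions, `find_markers` returns the expressions that post-editors left in place far more often than unaided translators chose them.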

Published in

Translating and the Computer 40: proceedings. Asling: International Society for Advancement in Language Technology, 15-16 November 2018; pp. 50‑59 (ISBN 978-2-9701095-5-6).

Raw Output Evaluator, a Freeware Tool for Manually Assessing Raw Outputs from Different Machine Translation Engines

Abstract

Raw Output Evaluator is a freeware tool, which runs under Microsoft Windows. It allows quality evaluators to compare and manually assess raw outputs from different machine translation engines. The outputs may be assessed in comparison to each other and to other translations of the same input source text, and in absolute terms using standard industry metrics or ones designed specifically by the evaluators themselves. The errors found may be highlighted using various colours. Thanks to a built-in stopwatch, the same program can also be used as a simple post-editing tool in order to compare the time required to post-edit MT output with how long it takes to produce an unaided human translation of the same input text. The MT outputs may be imported into the tool in a variety of formats, or pasted in from the PC Clipboard. The project files created by the tool may also be exported and re-imported in several file formats. Raw Output Evaluator was developed for use during a postgraduate course module on machine translation and post-editing.

Published in

Translating and the Computer 40: proceedings. Asling: International Society for Advancement in Language Technology, 15-16 November 2018; pp. 38‑49 (ISBN 978-2-9701095-5-6).

Building a Custom Machine Translation Engine as part of a Postgraduate University Course: a Case Study

Abstract

In 2015, I was asked to design a postgraduate course on machine translation (MT) and post-editing. Following a preliminary theoretical part, the module concentrated on the building and practical use of custom machine translation (CMT) engines. This was a particularly ambitious proposition since it was not certain that students with undergraduate degrees in languages, translation and interpreting, without particular knowledge of computer science or computational linguistics, would succeed in assembling the necessary corpora and building a CMT engine. This paper looks at how the task was successfully achieved using KantanMT to build the CMT engines and Wordfast Anywhere to convert and align the training data.
The course was clearly a success since all students were able to train a working CMT engine and assess its output. The majority agreed their raw CMT engine output was better than Google Translate’s for the kinds of text it was trained for, and better than the raw output (pre-translation) from a translation memory tool.
There was some initial scepticism among the students about how useful MT actually is, but the mood had clearly changed by the end of the course, with virtually all students agreeing that post-edited MT has a legitimate role to play.

Published in

Translating and the Computer 39: proceedings. Asling: International Society for Advancement in Language Technology, 16-17 November 2017; pp. 35-39 (ISBN 978-2-9701095-3-2).

Solving Terminology Problems More Quickly with ‘IntelliWebSearch (Almost) Unlimited’

Abstract

Michael Farrell received several descriptions of university courses to translate from Italian into English in early 2005. The syllabuses boiled down to a list of topics and laws of mathematics and physics: not many complex sentences, but a great deal of terminology which needed translating and double checking with the utmost care and attention.
To do this, he found himself repeatedly copying terms to his PC clipboard, opening his browser, opening the most appropriate on-line resources, pasting terms into search boxes, setting search parameters, clicking search buttons, analysing results, copying the best solutions back to the clipboard, returning to the translation environment and pasting the terms found into the text.
He quickly realized that he needed to find a way to semi-automate the terminology search process in order to complete the translation in a reasonable time and for his own sanity. He immediately started looking around for a tool, but surprisingly there seemed to be nothing similar to what he needed on the market. Having already created some simple macros with a free scripting language called AutoHotkey, he set about writing something that would do the trick.
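
The kind of step such a macro takes over can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not IntelliWebSearch's own code (which, according to the abstract, grew out of AutoHotkey macros): the `SEARCH_TEMPLATES` table, the example.com URL and the function names are hypothetical.

```python
import webbrowser
from urllib.parse import quote_plus

# Hypothetical search templates; {term} is replaced by the URL-encoded term.
SEARCH_TEMPLATES = {
    "example": "https://www.example.com/search?q={term}",
}


def build_search_url(term, resource="example"):
    """Turn a copied term into a ready-made search URL for the chosen resource."""
    encoded = quote_plus(term.strip())
    return SEARCH_TEMPLATES[resource].format(term=encoded)


def search(term, resource="example"):
    """Open the search URL in the default browser and return it."""
    url = build_search_url(term, resource)
    webbrowser.open(url)
    return url
```

One call to `search("legge di Ohm")` then replaces the manual steps of opening the browser, choosing the resource and pasting the term into its search box.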
The first simple macro he knocked out gradually grew and developed until it became a fully fledged software tool: IntelliWebSearch. After speaking to several colleagues about it, he was persuaded to share his work and put together a small group of volunteer beta-testers. After a few weeks of testing on various Windows systems, he released the tool as freeware towards the end of 2005.
At the beginning of his workshop, Michael Farrell will explain what prompted him to create the tool and how he went about it. He will then go on to describe its use and its limitations, and show how it can save translators and terminologists a lot of time with a live demonstration, connectivity permitting.
The workshop will conclude with a presentation revealing for the first time in public some of the features of a new version which is currently being developed under the code name “IntelliWebSearch (Almost) Unlimited” (pre-alpha at the time of writing).
The workshop is aimed at professional translators, interpreters and terminologists in all fields, especially those interested in increasing efficiency through the use of technology without lowering quality standards.

Published in

Translating and the Computer 36: proceedings. Asling: International Society for Advancement in Language Technology, 27-28 November 2014; pp. 211-216 (ISBN 978-2-9700736-2-8).
