Category: Academic publications

Abstract

The experiment reported in this paper is a follow-up to one conducted in 2017/2018. The new experiment aimed to establish whether the previously observed lexical impoverishment in machine translation post-editing (MTPE) has become more marked as technology has developed or whether it has attenuated. This was done by focusing on two n-grams which had previously been identified as MT markers, i.e., n-grams that give rise to translation solutions occurring with a higher frequency in MTPE than is natural in human translation (HT). The new findings suggest that lexical impoverishment in the two short texts examined has indeed diminished with DeepL Translator. The new experiment also considered possible syntactic differences, namely the number of text segments in the target text; however, no significant difference was observed. The participants were asked to complete a short questionnaire on how they went about their tasks. It emerged that it was helpful to consult the source language text while post-editing, and the original unedited raw output while self-revising, suggesting that monolingual MTPE of the two chosen texts would have been unwise.
Although the post-editors were not given specific guidelines, their productivity increased. If the ISO 18587:2017 recommendation of using as much of the MT output as possible had been strictly followed, the MTPE would have been easier to distinguish from HT. If this can be taken to be generally true, it suggests that it is neither necessary nor advisable to follow this recommendation when lexical diversity is crucial for making the translation more engaging.
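As background, the frequency comparison that underlies an MT marker — an n-gram whose translation solutions turn up more often in MTPE than in HT — can be sketched as follows (a hypothetical illustration; the corpora, n-gram length and ratio threshold are invented, not taken from the paper):

```python
from collections import Counter

def ngrams(tokens, n=2):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def relative_freqs(texts, n=2):
    """Relative frequency of each n-gram over a list of tokenized texts."""
    counts = Counter()
    total = 0
    for tokens in texts:
        grams = ngrams(tokens, n)
        counts.update(grams)
        total += len(grams)
    return {g: c / total for g, c in counts.items()} if total else {}

def candidate_markers(mtpe_texts, ht_texts, n=2, ratio=2.0):
    """n-grams at least `ratio` times more frequent in MTPE than in HT."""
    mtpe = relative_freqs(mtpe_texts, n)
    ht = relative_freqs(ht_texts, n)
    return {g for g, f in mtpe.items() if ht.get(g) and f > ratio * ht[g]}
```

In practice the corpora would be real MTPE and HT versions of the same source texts, and candidate markers would then be checked by hand, as in the experiments described above.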

Presented at

International Conference HiT-IT 2023. Human-informed Translation and Interpreting Technology, 7-9 July 2023.

Download

Download full paper.

Abstract

This preliminary study consisted of two experiments. The first aimed to gauge the translation quality obtained from the free-plan version of ChatGPT in comparison with the free versions of DeepL Translator and Google Translate through human evaluation, and the second consisted of using the free-plan version of ChatGPT as an automatic post-editor of raw output from the pay-for version of DeepL Translator (both monolingual and bilingual full machine translation post-editing). The experiments were limited to a single language pair (from English to Italian) and only one text genre (Wikipedia articles). In the first experiment, DeepL Translator was judged to have performed best, Google Translate came second, and ChatGPT, last. In the second experiment, the free-plan version of ChatGPT equalled average human translation (HT) levels of lexical variety in automatic monolingual machine translation post-editing (MTPE) and exceeded average HT lexical variety levels in automatic bilingual MTPE. However, only one MT marker was considered, and the results of the post-editing were not quality-assessed for other features of MTPE that distinguish it from HT. It would therefore be inadvisable to generalize these findings at present. The author intends to carry out new translation experiments during the next academic year with ChatGPT Plus, instead of the free-plan version, both as an MT engine and as an automatic post-editor. The plan is to continue to evaluate the results manually and not automatically.
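The abstract does not specify the lexical-variety measure used, but a common first approximation is the type-token ratio (TTR) — distinct word forms over running words — which can be sketched as follows (an assumed stand-in for illustration, not necessarily the metric applied in the study):

```python
def type_token_ratio(tokens):
    """Distinct word forms (types) divided by running words (tokens)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

# A lexically varied sentence scores higher than a repetitive one.
varied = "the quick brown fox jumps over the lazy dog".split()
repetitive = "the dog saw the dog and the dog ran".split()
```

On these toy examples the varied sentence scores roughly 0.89 against roughly 0.56 for the repetitive one. Note that raw TTR is sensitive to text length, so comparisons are only meaningful between texts of similar size.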

Presented at

International Conference HiT-IT 2023. Human-informed Translation and Interpreting Technology, 7-9 July 2023.

Download

Download full paper.

Abstract

This book looks at various aspects of machine translation, including the history of its technological advancement, quality evaluation, typical errors, techniques for improving its output, and how human translators can transform machine translation into a tool that can take some of the grind out of their work.

Published by

Amazon Digital Services LLC – KDP, 2023

Buy

On Amazon.

Abstract

The author conducted an anonymous online survey between 23 July and 21 October 2022 to gain insight into the proportion of translators that use machine translation (MT) in their translation workflow and the various ways in which they do so. The results show that translators with more experience are less likely to accept MT post-editing (MTPE) assignments than their less experienced colleagues but are equally likely to use MT themselves in their translation work. Translators who deal with lower-resource languages are also less likely to accept MTPE jobs, but there is no such relationship regarding the use of MT in their own workflow. When left to their own devices, only 18.57% of the 69.54% of respondents who declared that they use MT while translating always or usually use it in the way the pioneers of MT envisaged, i.e., MTPE. Most either usually or always prefer to use MT in a whole range of other ways, including enabling MT functions in CAT tools and doing hybrid post-editing; using MT engines as if they were dictionaries; and using MT for inspiration. The vast majority of MT users see MT as just another tool that their clients do not necessarily need to be informed about.

Published in

Translating and the Computer 44: proceedings. Asling: International Society for Advancement in Language Technology, 24-25 November 2022; pp. 49‑60 (ISBN 978-2-9701733-0-4).

Download

Download full paper.

Abstract

The author has conducted an experiment for two consecutive years with postgraduate university students in which half do an unaided human translation (HT) and the other half post-edit machine translation output (PEMT). Comparison of the texts produced shows – rather unsurprisingly – that post-editors faced with an acceptable solution tend not to edit it, even when, often, more than 60% of translators tackling the same text prefer an array of different solutions. As a consequence, certain turns of phrase, expressions and choices of words occur with greater frequency in PEMT than in HT, making it theoretically possible to design tests to tell them apart. To verify this, the author successfully carried out one such test on a small group of professional translators. This implies that PEMT may lack the variety and inventiveness of HT, and consequently may not actually reach the same standard. It is evident that the additional post-editing effort required to eliminate what are effectively MT markers is likely to nullify a great deal, if not all, of the time and cost-saving advantages of PEMT. However, the author argues that failure to eradicate these markers may eventually lead to lexical impoverishment of the target language.
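In outline, the kind of test described — flagging a text as likely PEMT when known MT markers occur above some rate — might look like this (the marker set and threshold below are invented for illustration, not taken from the experiment):

```python
def marker_rate(tokens, markers, n=2):
    """Share of a text's n-grams that belong to a known set of MT markers."""
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return sum(g in markers for g in grams) / len(grams) if grams else 0.0

def looks_post_edited(tokens, markers, threshold=0.05):
    """Flag a text as likely PEMT when its marker rate exceeds the threshold."""
    return marker_rate(tokens, markers) > threshold
```

A real test would of course use markers empirically established from parallel PEMT/HT data, and a threshold calibrated against texts of known origin.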

Published in

Translating and the Computer 40: proceedings. Asling: International Society for Advancement in Language Technology, 15-16 November 2018; pp. 50‑59 (ISBN 978-2-9701095-5-6).

Download

Download full paper.
Alternative download.

Abstract

Raw Output Evaluator is a freeware tool that runs under Microsoft Windows. It allows quality evaluators to compare and manually assess raw outputs from different machine translation engines. The outputs may be assessed in comparison to each other and to other translations of the same input source text, and in absolute terms using standard industry metrics or ones designed specifically by the evaluators themselves. The errors found may be highlighted using various colours. Thanks to a built-in stopwatch, the same program can also be used as a simple post-editing tool in order to compare the time required to post-edit MT output with how long it takes to produce an unaided human translation of the same input text. The MT outputs may be imported into the tool in a variety of formats, or pasted in from the PC clipboard. The project files created by the tool may also be exported and re-imported in several file formats. Raw Output Evaluator was developed for use during a postgraduate course module on machine translation and post-editing.

Published in

Translating and the Computer 40: proceedings. Asling: International Society for Advancement in Language Technology, 15-16 November 2018; pp. 38‑49 (ISBN 978-2-9701095-5-6).

Download

Download full paper.
Alternative download.

Abstract

In 2015, I was asked to design a postgraduate course on machine translation (MT) and post-editing. Following a preliminary theoretical part, the module concentrated on the building and practical use of custom machine translation (CMT) engines. This was a particularly ambitious proposition since it was not certain that students with undergraduate degrees in languages, translation and interpreting, without particular knowledge of computer science or computational linguistics, would succeed in assembling the necessary corpora and building a CMT engine. This paper looks at how the task was successfully achieved using KantanMT to build the CMT engines and Wordfast Anywhere to convert and align the training data.
The course was clearly a success since all students were able to train a working CMT engine and assess its output. The majority agreed their raw CMT engine output was better than Google Translate’s for the kinds of text it was trained for, and better than the raw output (pre-translation) from a translation memory tool.
There was some initial scepticism among the students regarding the effective usefulness of MT, but the mood clearly changed at the end of the course with virtually all students agreeing that post-edited MT has a legitimate role to play.

Published in

Translating and the Computer 39: proceedings. Asling: International Society for Advancement in Language Technology, 16-17 November 2017; pp. 35-39 (ISBN 978-2-9701095-3-2).

Download

Download full paper.
Alternative download.

Abstract

In early 2005, Michael Farrell received several descriptions of university courses to translate from Italian into English. The syllabuses boiled down to a list of topics and laws of mathematics and physics: not many complex sentences, but a great deal of terminology which needed translating and double-checking with the utmost care and attention.
To do this, he found himself repeatedly copying terms to his PC clipboard, opening his browser, opening the most appropriate on-line resources, pasting terms into search boxes, setting search parameters, clicking search buttons, analysing results, copying the best solutions back to the clipboard, returning to the translation environment and pasting the terms found into the text.
He quickly realized that he needed to find a way to semi-automate the terminology search process in order to complete the translation in a reasonable time and for his own sanity. He immediately started looking around for a tool, but surprisingly there seemed to be nothing similar to what he needed on the market. Having already created some simple macros with a free scripting language called AutoHotkey, he set about writing something that would do the trick.
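The core of such a macro — wrap the copied term in a resource's search URL and hand it to the browser — can be sketched in a few lines of Python (the resource list below is invented for illustration; IntelliWebSearch itself is an AutoHotkey-based tool whose resources and shortcuts are user-configurable):

```python
import urllib.parse
import webbrowser

# Hypothetical examples: each entry maps a resource name to a search URL template.
RESOURCES = {
    "DuckDuckGo": "https://duckduckgo.com/?q={q}",
    "Wikipedia": "https://en.wikipedia.org/w/index.php?search={q}",
}

def build_search_url(term, resource="DuckDuckGo"):
    """Insert the URL-encoded term into the chosen resource's template."""
    return RESOURCES[resource].format(q=urllib.parse.quote_plus(term))

def look_up(term, resource="DuckDuckGo"):
    """Open the search in the default browser, as the macro would."""
    webbrowser.open(build_search_url(term, resource))
```

For example, `build_search_url("heat exchanger")` yields `https://duckduckgo.com/?q=heat+exchanger`, ready to open — replacing the copy, switch-window, paste, click routine with a single keystroke.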
The first simple macro he knocked out gradually grew and developed until it became a fully fledged software tool: IntelliWebSearch. After speaking to several colleagues about it, he was persuaded to share his work and put together a small group of volunteer beta-testers. After a few weeks of testing on various Windows systems, he released the tool as freeware towards the end of 2005.
At the beginning of his workshop, Michael Farrell will explain what prompted him to create the tool and how he went about it. He will then go on to describe its use and its limitations, and show how it can save translators and terminologists a lot of time with a live demonstration, connectivity permitting.
The workshop will conclude with a presentation revealing for the first time in public some of the features of a new version which is currently being developed under the code name “IntelliWebSearch (Almost) Unlimited” (pre-alpha at the time of writing).
The workshop is aimed at professional translators, interpreters and terminologists in all fields, especially those interested in increasing efficiency through the use of technology without lowering quality standards.

Published in

Translating and the Computer 36: proceedings. Asling: International Society for Advancement in Language Technology, 27-28 November 2014; pp. 211-216 (ISBN 978-2-9700736-2-8).

Download

Download full paper.
Alternative download.