Category: Academic publications

Abstract

In the article ‘I, robot’ in the May-June Bulletin, I left you with two puzzles and a provocative question (‘Do these puzzles demonstrate that, given enough examples…, it is always possible to translate accurately between two languages that you do not know without understanding the meaning of the sentence you need to translate?’). The idea behind the puzzles was to get you thinking like machines, and the purpose of the provocative question was, well, to provoke.

Published in

ITI Bulletin, July-August 2024

Download

Download full article.

Abstract

You’ll have read it all over the place by now: machine translation (MT) and generative artificial intelligence (GenAI) work by identifying patterns and reproducing them. They don’t really understand language. They just replace tokens (words or subwords) with numeric vectors, perform various arithmetic operations on them and calculate probabilities at an amazing speed.
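
To picture what that means in practice, here is a deliberately toy Python sketch (the vocabulary, vector size and random embeddings are all invented for illustration; real systems learn embeddings with billions of parameters). It maps tokens to numeric vectors, combines them arithmetically and turns the resulting scores into next-token probabilities with a softmax:

```python
import numpy as np

# Toy vocabulary with random embeddings; real systems learn these vectors.
vocab = ["the", "cat", "sat", "on", "mat"]
rng = np.random.default_rng(0)
embeddings = {tok: rng.normal(size=4) for tok in vocab}  # 4-dimensional vectors

def next_token_probabilities(context):
    """Turn a token context into a probability for each vocabulary token."""
    context_vec = np.mean([embeddings[t] for t in context], axis=0)
    scores = np.array([embeddings[t] @ context_vec for t in vocab])
    exp_scores = np.exp(scores - scores.max())  # numerically stable softmax
    return dict(zip(vocab, exp_scores / exp_scores.sum()))

print(next_token_probabilities(["the", "cat"]))
```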

Published in

ITI Bulletin, May-June 2024

Download

Download full article.

Abstract

There is no hiding it: the original idea of machine translation (MT) was to replace human translators completely. Way back in the 1950s (yes, MT is that old – older actually), when the first computer translation systems came into being, some of the researchers working on them were predicting that translators – or at least technical translators – would be gone within a matter of years.

Published in

ITI Bulletin, September-October 2023

Download

Download full article.

Abstract

The experiment reported in this paper is a follow-up to one conducted in 2017/2018. The new experiment aimed to establish whether the previously observed lexical impoverishment in machine translation post-editing (MTPE) has become more marked as technology has developed or whether it has attenuated. This was done by focusing on two n-grams, which had been previously identified as MT markers, i.e., n-grams that give rise to translation solutions that occur with a higher frequency in MTPE than is natural in human translation (HT). The new findings suggest that lexical impoverishment in the two short texts examined has indeed diminished with DeepL Translator. The new experiment also considered possible syntactic differences, namely the number of text segments in the target text. However, no significant difference was observed. The participants were asked to complete a short questionnaire on how they went about their tasks. It emerged that it was helpful to consult the source language text while post-editing, and the original unedited raw output while self-revising, suggesting that monolingual MTPE of the two chosen texts would have been unwise. Despite not being given specific guidelines, the productivity of the post-editors increased. If the ISO 18587:2017 recommendation of using as much of the MT output as possible had been strictly followed, the MTPE would have been easier to distinguish from HT. If this can be taken to be generally true, it suggests that it is neither necessary nor advisable to follow this recommendation when lexical diversity is crucial for making the translation more engaging.
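
To make the notion of an MT marker concrete, the following minimal Python sketch (the example texts and the frequency threshold are invented; the two n-grams actually studied in the paper are not reproduced here) counts n-gram frequencies in an MTPE text and an HT text and flags n-grams that are noticeably over-represented in the former:

```python
from collections import Counter

def ngram_freqs(text, n=2):
    """Relative frequency of each token n-gram in a text."""
    tokens = text.lower().split()
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values()) or 1
    return {gram: count / total for gram, count in counts.items()}

def candidate_mt_markers(mtpe_text, ht_text, n=2, ratio=2.0):
    """Flag n-grams noticeably more frequent in MTPE than in HT (invented threshold)."""
    # Note: n-grams absent from HT are always flagged here; a real study
    # would also require minimum counts and statistical testing.
    mtpe, ht = ngram_freqs(mtpe_text, n), ngram_freqs(ht_text, n)
    return [gram for gram, freq in mtpe.items() if freq > ratio * ht.get(gram, 0.0)]

mtpe = "the results are shown in the table the results are shown below"
ht = "the findings appear in the table and the outcome is illustrated below"
print(candidate_mt_markers(mtpe, ht))
```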

Published in

International Conference on Human-Informed Translation and Interpreting Technology (HiT-IT 2023): proceedings. Naples, Italy, 7-9 July 2023.

Download

Download full paper.

Abstract

This preliminary study consisted of two experiments. The first aimed to gauge the translation quality obtained from the free-plan version of ChatGPT in comparison with the free versions of DeepL Translator and Google Translate through human evaluation, and the second consisted of using the free-plan version of ChatGPT as an automatic post-editor of raw output from the pay-for version of DeepL Translator (both monolingual and bilingual full machine translation post-editing). The experiments were limited to a single language pair (from English to Italian) and only one text genre (Wikipedia articles). In the first experiment, DeepL Translator was judged to have performed best, Google Translate came second, and ChatGPT, last. In the second experiment, the free-plan version of ChatGPT equalled average human translation (HT) levels of lexical variety in automatic monolingual machine translation post-editing (MTPE) and exceeded average HT lexical variety levels in automatic bilingual MTPE. However, only one MT marker was considered, and the results of the post-editing were not quality-assessed for other features of MTPE that distinguish it from HT. It would therefore be inadvisable to generalize these findings at present. The author intends to carry out new translation experiments during the next academic year with ChatGPT Plus, instead of the free-plan version, both as an MT engine and as an automatic post-editor. The plan is to continue to evaluate the results manually and not automatically.
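
One simple way to quantify the lexical variety the experiment measured is the type-token ratio: distinct words over total words. A minimal sketch, with invented placeholder texts standing in for the MTPE output and HT baseline (the study itself compared against average HT levels on Wikipedia articles):

```python
def type_token_ratio(text):
    """Lexical variety as distinct words over total words (one simple measure)."""
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0

# Invented placeholder texts; the study compared real MTPE and HT of Wikipedia articles.
mtpe_output = "the system produced the same phrase again and again in the text"
ht_reference = "each translator rendered that recurring expression differently every time"
print("MTPE:", round(type_token_ratio(mtpe_output), 3))   # repetition lowers the ratio
print("HT:  ", round(type_token_ratio(ht_reference), 3))  # varied wording raises it
```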

Published in

International Conference on Human-Informed Translation and Interpreting Technology (HiT-IT 2023): proceedings. Naples, Italy, 7-9 July 2023.

Download

Download full paper.

Abstract

This book looks at various aspects of machine translation, including the history of its technological advancement, quality evaluation, typical errors, techniques for improving its output, and how human translators can transform machine translation into a tool that can take some of the grind out of their work.

Published by

Amazon Digital Services LLC – KDP, 2023

Buy

On Amazon.

Abstract

The author conducted an anonymous online survey between 23 July and 21 October 2022 to gain insight into the proportion of translators who use machine translation (MT) in their translation workflow and the various ways in which they do so. The results show that translators with more experience are less likely to accept MT post-editing (MTPE) assignments than their less experienced colleagues but are equally likely to use MT themselves in their translation work. Translators who deal with lower-resource languages are also less likely to accept MTPE jobs, but there is no such relationship regarding the use of MT in their own workflow. When left to their own devices, only 18.57% of the 69.54% of respondents who declared that they use MT while translating always or usually use it in the way the pioneers of MT envisaged, i.e., MTPE. Most usually or always prefer to use MT in a whole range of other ways, including enabling MT functions in CAT tools and doing hybrid post-editing; using MT engines as if they were dictionaries; and using MT for inspiration. The vast majority of MT users see MT as just another tool that their clients do not necessarily need to be informed about.

Published in

Translating and the Computer 44: proceedings. Asling: International Society for Advancement in Language Technology, 24-25 November 2022; pp. 49‑60 (ISBN 978-2-9701733-0-4).

Download

Download full paper.

Abstract

The author has conducted an experiment for two consecutive years with postgraduate university students in which half do an unaided human translation (HT) and the other half post-edit machine translation output (PEMT). Comparison of the texts produced shows – rather unsurprisingly – that post-editors faced with an acceptable solution tend not to edit it, even when more than 60% of translators tackling the same text often prefer an array of different solutions. As a consequence, certain turns of phrase, expressions and choices of words occur with greater frequency in PEMT than in HT, making it theoretically possible to design tests to tell them apart. To verify this, the author successfully carried out one such test on a small group of professional translators. This implies that PEMT may lack the variety and inventiveness of HT, and consequently may not actually reach the same standard. It is evident that the additional post-editing effort required to eliminate what are effectively MT markers is likely to nullify a great deal, if not all, of the time- and cost-saving advantages of PEMT. However, the author argues that failure to eradicate these markers may eventually lead to lexical impoverishment of the target language.
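
A test of the kind described could be as simple as checking whether known MT markers are over-represented in a text. The sketch below is an invented illustration (the marker phrases and the threshold are hypothetical, not the ones used in the paper):

```python
def looks_like_pemt(text, markers, threshold=0.002):
    """Flag a text as probable PEMT when known MT markers are over-represented.

    `markers` holds phrases previously observed to recur in PEMT; the
    threshold is invented for illustration, not taken from the paper.
    """
    lowered = text.lower()
    tokens = lowered.split()
    hits = sum(lowered.count(marker) for marker in markers)
    return hits / max(len(tokens), 1) > threshold

markers = {"in order to", "it is possible to"}  # hypothetical marker phrases
sample = "In order to proceed, it is possible to accept the default wording."
print(looks_like_pemt(sample, markers))  # True: both markers appear in a short text
```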

Published in

Translating and the Computer 40: proceedings. Asling: International Society for Advancement in Language Technology, 15-16 November 2018; pp. 50‑59 (ISBN 978-2-9701095-5-6).

Download

Download full paper.
Alternative download.

Abstract

Raw Output Evaluator is a freeware tool that runs under Microsoft Windows. It allows quality evaluators to compare and manually assess raw outputs from different machine translation engines. The outputs may be assessed in comparison with each other and with other translations of the same input source text, or in absolute terms using standard industry metrics or metrics designed specifically by the evaluators themselves. The errors found may be highlighted in various colours. Thanks to a built-in stopwatch, the same program can also be used as a simple post-editing tool in order to compare the time required to post-edit MT output with how long it takes to produce an unaided human translation of the same input text. The MT outputs may be imported into the tool in a variety of formats, or pasted in from the clipboard. The project files created by the tool may also be exported and re-imported in several file formats. Raw Output Evaluator was developed for use during a postgraduate course module on machine translation and post-editing.
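
As one illustration of what assessment "in absolute terms" can look like, the sketch below turns annotated errors into a weighted score per 100 words, loosely in the style of weighted-error industry metrics; the categories and weights are invented, not taken from the paper or the tool:

```python
# Invented categories and weights, loosely in the style of weighted-error metrics.
WEIGHTS = {"mistranslation": 5, "omission": 4, "terminology": 3,
           "grammar": 2, "punctuation": 1}

def quality_score(errors, word_count):
    """Weighted error penalty per 100 words: lower means better output."""
    penalty = sum(WEIGHTS[category] for category in errors)
    return 100.0 * penalty / max(word_count, 1)

# Example: two mistranslations and one punctuation slip in a 250-word raw output.
print(quality_score(["mistranslation", "mistranslation", "punctuation"], 250))  # 4.4
```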

Published in

Translating and the Computer 40: proceedings. Asling: International Society for Advancement in Language Technology, 15-16 November 2018; pp. 38‑49 (ISBN 978-2-9701095-5-6).

Download

Download full paper.
Alternative download.

Abstract

In 2015, I was asked to design a postgraduate course on machine translation (MT) and post-editing. Following a preliminary theoretical part, the module concentrated on the building and practical use of custom machine translation (CMT) engines. This was a particularly ambitious proposition since it was not certain that students with undergraduate degrees in languages, translation and interpreting, without particular knowledge of computer science or computational linguistics, would succeed in assembling the necessary corpora and building a CMT engine. This paper looks at how the task was successfully achieved using KantanMT to build the CMT engines and Wordfast Anywhere to convert and align the training data.
The course was clearly a success, since all students were able to train a working CMT engine and assess its output. The majority agreed that their raw CMT engine output was better than Google Translate’s for the kinds of text it was trained for, and better than the raw output (pre-translation) from a translation memory tool.
There was some initial scepticism among the students regarding the actual usefulness of MT, but the mood had clearly changed by the end of the course, with virtually all students agreeing that post-edited MT has a legitimate role to play.
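
For readers unfamiliar with the training data involved: engines of this kind are trained on aligned source-target segment pairs, commonly exchanged in TMX format. A minimal sketch using Python's standard library (the segment pairs are invented; on the course itself, Wordfast Anywhere handled the conversion and alignment):

```python
import xml.etree.ElementTree as ET

# Invented aligned segments; real training corpora contain many thousands of pairs.
pairs = [("The pump must be primed.", "La pompa deve essere adescata."),
         ("Close the inlet valve.", "Chiudere la valvola di ingresso.")]

tmx = ET.Element("tmx", version="1.4")
ET.SubElement(tmx, "header", {"srclang": "en", "datatype": "plaintext",
                              "segtype": "sentence", "adminlang": "en",
                              "o-tmf": "none", "creationtool": "example",
                              "creationtoolversion": "1.0"})
body = ET.SubElement(tmx, "body")
for source, target in pairs:
    tu = ET.SubElement(body, "tu")  # one translation unit per segment pair
    for lang, segment in (("en", source), ("it", target)):
        tuv = ET.SubElement(tu, "tuv", {"xml:lang": lang})
        ET.SubElement(tuv, "seg").text = segment

ET.ElementTree(tmx).write("training.tmx", encoding="utf-8", xml_declaration=True)
```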

Published in

Translating and the Computer 39: proceedings. Asling: International Society for Advancement in Language Technology, 16-17 November 2017; pp. 35-39 (ISBN 978-2-9701095-3-2).

Download

Download full paper.
Alternative download.