Estonian language technologists from the University of Tartu are helping improve the flexibility and quality of Mozilla’s new translation programme, Bergamot.
This article is published in collaboration with Research in Estonia.
Recently, news of Mozilla’s new translation programme, Bergamot, spread through international technology news portals. Few know the high-level team also includes language technologists from the University of Tartu who are helping improve the flexibility and quality of machine translation. Mark Fišel, a professor of natural language processing at the university’s Institute of Computer Science and in charge of the work done in Tartu, describes the background of the collaboration.
The Bergamot Project involves a machine translation programme for open-source web browsers, such as Mozilla Firefox. The largest difference, compared with other similar programmes, such as Google Translate, is privacy. Whereas most similar machine translation programmes are cloud-based, Bergamot is downloaded to the user’s computer and no user data is collected during its use.
In addition to the University of Tartu and Mozilla, the Bergamot consortium also includes the University of Edinburgh (the UK), Charles University (the Czech Republic) and the University of Sheffield (the UK).
Mark Fišel, what exactly is this project about?
It all began with language technologists from four universities wanting to do a European Commission-funded research project together on machine translation. One idea was to fit machine translation into a web browser. Thanks to a contact person at the University of Edinburgh, we asked Mozilla to be our partner and, in January 2019, the project kicked off.
This is a research project, which means most of our activity is exploratory: we are studying how we could alter the best existing machine translation methods to make them even better.
What exactly is machine translation?
The principle of machine translation is easy to explain: a machine or a computer must translate a text from one language to another automatically. It is one of the oldest language processing tasks and has been actively addressed since the beginning of the 1950s.
Despite the long history, ideal machine translation is yet to be developed; however, in practice, its quality is good enough for it to find use. Machine-translated text is mainly used in post-editing, where the automatically translated text is manually corrected. With many topics, the average time needed for post-editing is less than what is required for translating from scratch.
What needs to be done to make the quality of machine translation better? What does your daily work entail?
Our main role in this project is to make machine translation engines flexible and adaptable to the content and style of the text. For example, in the context of nature, the machine should translate the [Estonian] word aas as ‘meadow’, but on recognising a text about knitting, it should translate aas as ‘loop’. Or, on seeing a formal English text, the Estonian translation should use the form ‘teie’, not ‘sina’ [translating the English word ‘you’]. In the end, the programme should be able to make these decisions automatically.
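To make the aas example concrete, here is a minimal toy sketch in Python. It is not the method used in Bergamot, which relies on neural translation models; it only illustrates the idea of choosing a translation based on the surrounding text. The lexicon, the cue words and the function names are all hypothetical and chosen purely for illustration.

```python
from collections import Counter

# Hypothetical toy lexicon: English translations of Estonian "aas" by domain.
SENSES = {"aas": {"nature": "meadow", "knitting": "loop"}}

# Illustrative Estonian cue words per domain (an assumption, not project data).
DOMAIN_CUES = {
    "nature": {"mets", "jõgi", "lill"},        # forest, river, flower
    "knitting": {"lõng", "kuduma", "varras"},  # yarn, to knit, needle
}


def guess_domain(context_tokens):
    """Return the domain whose cue words occur most often, or None."""
    counts = Counter()
    for token in context_tokens:
        for domain, cues in DOMAIN_CUES.items():
            if token.lower() in cues:
                counts[domain] += 1
    return counts.most_common(1)[0][0] if counts else None


def translate_word(word, context_tokens, default=None):
    """Pick the translation of `word` that matches the guessed domain."""
    senses = SENSES.get(word)
    if not senses:
        return default
    return senses.get(guess_domain(context_tokens), default)


print(translate_word("aas", ["mets", "jõgi"]))
print(translate_word("aas", ["lõng", "kuduma"]))
```

A real system would not use hand-written word lists; the point is only that the same source word maps to different target words depending on what else appears on the page.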
We are also participating in other stages of the project: for example, we are working on the automatic estimation of translation quality. Its purpose is to decide, after the translation has been generated, whether it was successful or not. This is necessary to warn the user of a low-quality translation.
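The warning step described above could be sketched roughly as follows. This is a simplified stand-in, not the project’s quality estimation method: it assumes the translation engine exposes a log-probability for each output token, and it uses the geometric mean of the token probabilities as a crude confidence score. The function names and the 0.5 threshold are hypothetical.

```python
import math


def confidence_from_logprobs(token_logprobs):
    """Geometric mean of token probabilities; 0.0 for an empty translation."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def should_warn(token_logprobs, threshold=0.5):
    """Return True when the confidence score falls below the threshold."""
    return confidence_from_logprobs(token_logprobs) < threshold


# Hypothetical per-token log-probabilities for two translations.
confident = [math.log(0.9), math.log(0.9), math.log(0.9)]
uncertain = [math.log(0.2), math.log(0.3)]

print(should_warn(confident))
print(should_warn(uncertain))
```

In practice, quality estimation models are usually trained on examples of good and bad translations rather than relying on the engine’s own probabilities, but the final decision has the same shape: a score is compared with a threshold, and the user is warned if it is too low.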
What is the final product if everything goes according to plan?
A large proportion of the project is research and experiments, but a working prototype will also be made. Now we plan to make the new technology available in the Firefox browser.
What is its main difference compared with Google’s current automatic translation?
The main difference from Google’s automatic translation and its machine translation plugin for Chrome is that Google Translate is cloud-based, which means that all text input is sent to Google’s servers for translation. Bergamot machine translation will work on the client’s computer and not in the cloud, which ensures the privacy of the texts.
The second difference is that existing translation engines, including Google’s, translate single sentences without looking at the context. The contribution by the University of Tartu scientists should ensure that the translation engine adapts to the context and style of the entire web page and takes other additional information into account to improve translation quality.
What is the ‘shift to client-side translation’ that has received a lot of attention in the international media?
Our partners at Charles University in Prague are working on the so-called client-side translation. The idea is to make it possible to improve translation quality for users who are not fluent in the target language. The purpose of the machine translation system in this case would be to identify parts of the input that are either too complicated or too ambiguous to be translated successfully, and to ask the user to rephrase them.
In conclusion, it may be said that the researchers at the University of Tartu Institute of Computer Science are working on applications that most readers of this article probably use regularly. It is important to note that all the results of this research, when finished, will be freely released under permissive licences. The project covers translation from English into Estonian, Polish, Czech, German, French and Spanish, and vice versa.
Cover: The symbol of Mozilla Firefox browser.