Most frequent errors in machine translation

In this day and age, most people are familiar with the term “machine translation”, i.e., a translation done by a computer without human intervention. For example, when we don’t know what a word means in another language, we normally look up its meaning in Google Translate, which is just one of the existing machine translation tools.

Although these translation tools may be useful in an emergency, the quality of the final text is by no means guaranteed. Below we explain why, and which errors are most common for each type of machine translation tool.

There are three main types of Machine Translation (MT):

  • Rule-Based Machine Translation (RBMT): this type of translation is based on the grammars of the source and target languages, bilingual dictionaries and transfer rules. Its greatest drawback is that it cannot translate linguistic structures, words or expressions that do not appear in those resources; for this reason, the rules and dictionaries must be extremely extensive from the start and need to be updated continuously.
  • Statistical Machine Translation (SMT): this type of translation requires a monolingual corpus in the target language and a parallel corpus containing source texts alongside their translations into the target language. Since it works by selecting the translation that is statistically most probable according to those corpora, its main problem is its enormous dependence on the language pair (whether the two languages are similar or not), the quality of the corpora and the subject field or specialisation, among other factors.
  • Neural Machine Translation (NMT): this is based on extensive parallel corpora and loosely emulates the way in which our neurons work, associating words with underlying information to form associations of ideas and thus be able to translate. The drawback is that, since it builds its output from sequences of characters, it sometimes generates words that do not exist or make no sense.
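To make the statistical approach more concrete, here is a minimal sketch of how an SMT-style system might pick between candidate translations. It is only an illustration: the function name `best_translation`, the two tables and every probability in them are invented for this example, and a real system estimates such values from its corpora and scores whole sentences rather than single words. It uses the Spanish word “carrera”, which can mean “degree”, “career” or “race” depending on the context.

```python
# Toy SMT-style scoring. All names and probabilities are invented for illustration.

# Translation model, as if estimated from a parallel corpus:
# how often each English candidate is paired with the Spanish source word.
TRANSLATION_MODEL = {
    ("carrera", "career"): 0.6,
    ("carrera", "degree"): 0.3,
    ("carrera", "race"): 0.1,
}

# Language model, as if estimated from a monolingual English corpus:
# how likely each candidate is after the previous English word.
LANGUAGE_MODEL = {
    ("university", "career"): 0.05,
    ("university", "degree"): 0.90,
    ("university", "race"): 0.05,
}


def best_translation(source_word, previous_target_word, candidates):
    """Return the candidate with the highest combined probability."""

    def score(candidate):
        tm = TRANSLATION_MODEL.get((source_word, candidate), 0.0)
        lm = LANGUAGE_MODEL.get((previous_target_word, candidate), 0.0)
        return tm * lm

    return max(candidates, key=score)


print(best_translation("carrera", "university", ["career", "degree", "race"]))
# prints "degree"
```

In this sketch the monolingual statistics outweigh the more frequent pairing with “career”. If the context is missing from the tables, every score is zero and the choice becomes arbitrary, which is one way of seeing why the quality and coverage of the corpora matter so much.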

Most common errors in MT

  1. Translating proper nouns: poor-quality MT tools may translate a surname or a place name that should be left unchanged. A quick test will confirm this!
  2. False friends: if the corpus is not of good quality, the MT system may sometimes use inaccurate translations, as it does not know how a word is normally translated. One example is the Spanish word “carrera”, which in certain cases refers to an “academic degree”, but some MT tools translate it as “career”, thereby changing the meaning.
  3. Word order and structure: MT may misinterpret the structure of a sentence. For example, in English a verb takes the “-ing” form when it follows a preposition, but this has nothing to do with the continuous tense: “after having finished” should be translated into Spanish as “tras acabar” and not as “tras habiendo acabado”. In addition, unnatural structures may be used in the target language.
  4. Perpetuation of errors in the original: if the original text contains a spelling mistake, the MT tool may leave that word untranslated or may confuse it with another word that is unrelated to the text, whereas a professional translator would detect and correct the mistake.
  5. Use of incorrect meanings in the case of polysemy: this may happen when working with a low-quality corpus, as the system does not detect the context and picks the wrong meaning from the dictionary.
  6. Inability to detect and translate “new” terms: if a word does not exist in its lexicon, the MT tool will not be able to translate it. The same occurs in the case of words invented by the author in the source language, or neologisms.
  7. Errors due to homographs: sometimes, two words may be written the same way, but have different meanings or even different grammatical categories. One example is the third person singular of the verb “market”: “markets” may confuse the MT tool and end up being translated into Spanish as “mercados” instead of “él/ella vende”.
  8. Incorrect use of upper/lower case letters: some terms are written with an initial capital depending on the context (King/king), but an MT system may be unable to detect when to do this and when not to.
  9. Untranslated acronyms: certain acronyms have an official translation, but some MT tools are not good enough to have them in their corpus, so the acronym is left as it is. One example is WHO (World Health Organization), which in Spanish is OMS (Organización Mundial de la Salud).
  10. Literal translations: the most basic MT tools simply translate words, not meanings, so they may end up using structures or words that a native speaker of the target language would not use. For instance, one machine translation of “I’m not feeling well” could be “No me siento bien”, which sounds odd to a person from Spain, who would normally say “No me encuentro bien”.
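Several of the errors listed above come from translating words in isolation. The following minimal sketch shows the effect with a toy word-for-word translator. The lexicon, the function name `translate_word_for_word` and the made-up word “gizmotron” are all assumptions invented for this illustration; no real MT tool is this simple.

```python
# Hypothetical toy lexicon: one Spanish rendering per English word, no context.
LEXICON = {
    "she": "ella",
    "markets": "mercados",  # noun reading only; the verb reading is missing
    "the": "el",
}


def translate_word_for_word(sentence):
    """Translate each word independently; unknown words pass through unchanged."""
    output = []
    for token in sentence.split():
        # Look the word up in lower case; keep the original token if it is unknown.
        output.append(LEXICON.get(token.lower(), token))
    return " ".join(output)


print(translate_word_for_word("She markets the gizmotron"))
# prints "ella mercados el gizmotron"
```

The homograph “markets” is rendered as the noun “mercados” because the lexicon has no way of using the context, and the invented word “gizmotron” is simply copied across because it is not in the lexicon.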

Real examples of incorrect translations using an MT tool

Below are several examples of incorrect translations done by an MT tool that have raised quite a few eyebrows.

  • La Feria del Clítoris

Google Translate rendered the name of the “Feria do Grelo”, on the website of As Pontes in La Coruña, as the “Feria del Clítoris” (The Clitoris Fair). A “grelo” is the leafy top of the turnip plant (turnip greens), but the MT tool identified the text as Portuguese instead of Galician, and in Portuguese the word “grelo” can also mean “clitoris”.

  • Amazon in Sweden

A more recent case can be found on Amazon’s website for Sweden, which published machine-translated adverts for different brands and products. Because of erroneous translations, the text in the target language included a series of coarse expressions and unclear descriptions.

  • The Norwegian team at the PyeongChang Olympic Games (South Korea)

Another case is the mistake made by Google Translate when the chefs of the team ordered 1,500 eggs from a local Korean supplier and the tool changed the number to 15,000.
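A slip of this kind can often be caught with a simple automatic check before a translated order is sent. The sketch below compares the numbers found in the source text with those found in the translation. The function names and the sample sentences are invented for illustration (they are not the actual order), and the check assumes whole numbers written with Western digits, with an optional comma or full stop as the thousands separator.

```python
import re
from collections import Counter

# Matches digit runs with optional thousands separators, e.g. "1,500" or "15.000".
NUMBER_PATTERN = re.compile(r"\d+(?:[.,]\d{3})*")


def extract_numbers(text):
    """Return a multiset of the whole numbers in the text, ignoring separators."""
    return Counter(
        int(re.sub(r"[.,]", "", match)) for match in NUMBER_PATTERN.findall(text)
    )


def numbers_match(source_text, translated_text):
    """Return True when both texts contain exactly the same numbers."""
    return extract_numbers(source_text) == extract_numbers(translated_text)


# Invented sample sentences, not the real order.
print(numbers_match("We need 1,500 eggs", "We need 15,000 eggs"))  # prints False
print(numbers_match("We need 1,500 eggs", "Necesitamos 1.500 huevos"))  # prints True
```

A check like this does not say which number is right, only that the two texts disagree, so a person still has to review the flagged sentence.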