
Research Paper

Journal of
Language and Translation
Volume 12, Number 4, 2022 (pp. 107-118)

Assessing the Performance Quality of Google Translate in Translating English and Persian Newspaper Texts Based on the MQM-DQF Model
Zahra Foradi 1*, Jalilollah Faroughi Hendevalan 2, Mohammad Reza Rezaeian Delouei 3
1 MA in Translation Studies, English Department, University of Birjand, Iran
2 Associate Professor, English Department, University of Birjand, Iran
3 Assistant Professor, English Department, University of Birjand, Iran
* Corresponding author's email: zahraforadi@birjand.ac.ir
Received: November 10, 2021 Accepted: March 21, 2022

ABSTRACT
The use of machine translation to communicate and access information has become increasingly
common. Various translation software and systems appear on the Internet to enable interlingual
communication. Accelerating translation and reducing its cost are other factors in the increasing
popularity of machine translation. Even if the quality of this type of translation is lower than human
translation, it is still significant in many ways. The MQM-DQF model provides standards of error
typology for objective and quantitative assessment of translation quality. In this model, two criteria
(accuracy and fluency) are used to assess machine translation quality. The MQM-DQF model was used
in this study to assess the quality of Google Translate performance in translating English and Persian
newspaper texts. Five texts from Persian newspapers and five texts from English newspapers were
randomly selected and translated by Google Translate both at the sentence level and the whole text. The
translated texts were assessed based on the MQM-DQF model. Translation errors were identified and
coded at three severity levels: critical, major, and minor errors. By counting the errors and scoring them,
the percentage of accuracy and fluency criteria in each of the translated texts was calculated. The results
showed that Google Translate performs better in translating texts from Persian into English;
furthermore, in sentence-level translation, it performs better than the translation of the whole text.
Moreover, translations of different newspaper texts (economic, cultural, sports, political, and scientific)
were not of the same quality.

Keywords: Accuracy; Fluency; Machine Translation (MT); MQM-DQF Model; Translation Quality
Assessment (TQA)

INTRODUCTION
In Translation Studies, the issue of translation quality has always been of great importance. In order to eliminate the effect of subjectivity on translation quality assessment (TQA), assessment should be done based on pre-defined criteria and models. "The main goal of Translation Quality Assessment (TQA) is to maintain, and deliver to the client, buyer, user, reader, etc., of translated texts" (Doherty, 2017, p. 131).
Assessing the quality of translation is very complicated due to the subtleties of natural language. One of the issues that have always been considered in machine translation is the methods and parameters for assessing translation results. "The evaluation of machine translation (MT) systems is a vital field of research, both for determining the effectiveness of existing MT systems and for optimizing the performance of MT systems" (Dorr et al., 2011, p. 745).
As translation may be done by either a human or a machine, translation assessment may also be done by either a human or a machine. To do so, a set of predefined criteria can be used as instructions by a human assessor, or it can be built into quality-assessment software that performs the assessment automatically; each approach has advantages and disadvantages. Automatic assessment is low-cost and objective, but it does not show the types of errors in a text well. The most valid technique for judging the quality of a translation is manual assessment by human assessors, but its inevitable drawback is that it is time-consuming and costly (see Maučec & Donaj, 2019). Furthermore, human assessment depends heavily on the skills of the assessor. Ultimately, the performance quality of a machine assessor must also be assessed by a human assessor.
The structure of a translation quality assessment model includes a classification of error types and their weighting based on severity levels. If translation errors are considered across the text in general and a single score is finally assigned to the whole text, the assessment is holistic; if the errors are examined in detail, the assessment is analytic. Until recent years, all assessment approaches had been holistic (Karimnia, 2011) and evaluation criteria had been mainly subjective (Saldanha & O'Brien, 2013).
According to Waddington (2001), the literature on TQA shows that almost all previous studies have been descriptive or theoretical. Waddington's model was introduced through empirical studies and has been used in numerous studies to assess the quality of translation (see, for example, Andishe Borujeni, 2020).
With the expansion of machine translation use and the increasing awareness of translation consumers, the need for more detailed translation evaluation models, which could also be used for machine translation evaluation, gradually increased. Around 2011, TAUS (Translation Automation User Society) sought to create a new way of measuring translation quality to satisfy its customers. O'Brien undertook a project in collaboration with TAUS to explore the potential of a dynamic quality estimation model. The first part of this study was conducted in 2011 on 11 current quality evaluation models. It was the starting point of benchmarking translation quality evaluation models to achieve a comprehensive TQA model (O'Brien, 2012).
"Translation quality assessment (TQA) has suffered from a lack of standard methods. Starting in 2012, the Multidimensional Quality Metrics (MQM) and Dynamic Quality Framework (DQF) projects independently began to address the need for such shared methods" (Moorkens et al., 2018, p. 109). According to Lommel et al. (2018, p. 23), "the Multidimensional Quality Metrics (MQM) framework for describing and defining translation quality assessment metrics was developed in the EU-funded QTLaunchPad project".
Some studies in the field of TQA have been conducted based on the MQM model, including an evaluation of the quality of crowdsourced Persian translations of Wikipedia articles (Vahedi Kakhki, 2018), whose results showed the relatively high quality of these translations.
"Dynamic Quality Framework (DQF) is a comprehensive suite of tools for quality evaluation of both human and machine translation developed by Translation Automation User Society (TAUS)" (Lommel et al., 2018, p. 27). According to Moorkens et al. (2018, p. 109), "in 2014 these approaches were integrated, centering on a shared error typology (the 'DQF/MQM Error Typology') that brought them together." In this way, standard benchmarks were established for assessing translation quality. "This approach to quality evaluation provides a common vocabulary to describe and categorize translation errors and to create translation quality metrics that tie translation quality to specifications" (Moorkens et al., 2018, p. 109). From the perspective of this model, two quantified dependent variables, namely accuracy and fluency, influence the independent variable, i.e. quality.

The MQM-DQF model was the metric used in the present research to evaluate the quality of Google Translate in translating newspaper texts from English into Persian and vice versa.
Numerous case studies have been conducted to evaluate machine translation quality for various text types from English into Persian and vice versa. For example, Pajhooheshnia (2015) evaluated the quality of machine translation for technical texts from Persian into English based on the two criteria of adequacy and fluency, using human evaluators; the human evaluators' score was 2.62 for the accuracy criterion and 2.45 for the fluency criterion.
In another study, on the quality of proverbs translated from English into Persian by Google Translate, the results showed that "the minorities of the translated proverbs provide adequate comprehension" (Torkaman, 2013, p. iv).
According to Moradi's (2015) research, which evaluated machine translation quality in translating English scientific-technical articles from four empirical sciences, i.e. biology, chemistry, mathematics, and physics, based on EuroMatrix, "GT had the best performance in translating physical subtype and the worst in mathematical, biological and chemical being second and third in rank" (Moradi, 2015, p. 89). The overall adequacy and fluency values for GT were 47.5% and 46.27%, respectively.
Some comparative studies have also been performed. In a study conducted in 2017 to estimate the translation quality of two machine translators (including Google Translate) from English into Persian, most linguistic errors were found to have occurred in the areas of morphology, complex sentences, syntactic ambiguity, semantic analysis, generation of Persian, and long sentences (Aabedi, 2017). Aabedi (2017) and Sharifiyan (2018) found that Google performed better than the other translation machines.
In the field of machine translation quality assessment for literary texts, some research has also been done, including "Efficacy in Translating Verb Tense from English to Persian"; the results of this research showed that Google Translate could not translate verb tense appropriately into Persian (Hakiminejad & Alaeddini, 2016). In this study, we continued these machine translation quality assessments in the genre of newspaper text translation.

Purpose of the Study
Every day, large volumes of newspapers are published around the world in different languages. To communicate correctly and effectively with other nations and to increase public awareness, these texts need to be translated, and this is done by various news agencies around the world. Because of the high volume of such texts produced and published all over the world every day, translating newspaper texts is necessary and, at the same time, very difficult. Using machine translation is one solution for dealing with the time and volume constraints in the translation of journalistic texts.
The integrated model (MQM-DQF) was used in this study to assess the performance quality of Google Translate in translating newspaper texts, both at the level of the sentence and as whole texts, from English into Persian and vice versa. We sought to answer these research questions:
1) Based on the MQM-DQF assessment model, how is the translation quality of English newspaper texts translated into Persian by Google Translate assessed?
2) Based on the MQM-DQF assessment model, how is the translation quality of Persian newspaper texts translated into English by Google Translate assessed?
3) According to the MQM-DQF model, in which case is the performance quality of Google Translate better: in translating English newspaper texts into Persian or in translating Persian newspaper texts into English?
4) According to the MQM-DQF model, at which translation level (the sentence or the whole text) is the performance quality of Google Translate in translating Persian and English newspaper texts better?

METHOD
The purpose of the research was to assess the quality of Google Translate's performance in translating English and Persian newspaper texts based on the MQM-DQF model.
To this end, well-known and widely circulated domestic and foreign newspapers were first identified. Five types of text (economic, cultural, sports, scientific, and political) were then selected, and simple random sampling was used to choose five texts of these types from domestic newspapers and five from foreign newspapers.
The five selected Persian texts and the five selected English texts were each translated by Google Translate in two ways: sentence by sentence and as a whole text. The translated texts were then examined and analysed based on the MQM-DQF model, its two criteria (i.e. accuracy and fluency), and their subsets. Errors at three severity levels (critical, major, and minor) were identified. The error typology diagram used in the research is as follows.

Figure 1
A subset for MT analysis (Lommel and Melby, 2018, p. 30)

Sub-categories of criteria
According to Fig. 1, there are five sub-categories for checking accuracy in machine translation outputs:
Terminology: Is there a proper equivalent for each word?
Mistranslation: Has a word or phrase been mistranslated?
Omission: Has anything been deleted during the translation?
Addition: Has anything been added to the translated text that did not exist in the source text?
Untranslated words or phrases: Has an untranslated word or phrase been carried over from the source text into the translated text?
There are also seven general sub-categories for assessing fluency in machine translation outputs:
Register: Have the language characteristics of the source text been transferred into the target language?
Style: Has the social, cultural, and temporal position of the source language been transferred into the target language?
Inconsistency: Has a specific word or phrase repeated throughout the text been translated the same way everywhere? Does the meaning of one part of the translated text not contradict another part of it?
Spelling: Is the spelling of the words correct? (For example, a spelling mistake may change the meaning of a word entirely, or the first letter of a proper noun may be written in lowercase.)
Typography: Are the shape, size, and spacing of the letters and the paragraphing of the text appropriate?

Grammar: Are the grammatical rules followed correctly in the translated text?
Unintelligibility: Does the translated text convey its original meaning clearly?

Error severity levels (weighting)
When evaluating a translation, it is not enough just to know the number of errors. "Evaluators also need to know how severe they are and how important the error type is for the task at hand" (Moorkens et al., 2018, p. 120). According to Lommel et al. (2018, p. 33), "severity level is an indicator of the importance of an issue with an accompanying numerical representation and weight is a numeric representation of the importance of an issue type in a specific metric".
A critical error is an error that the user does not notice. Critical errors per se make the translation unsuitable for its intended purpose and can prevent it from achieving that purpose. For example, if, in the translation of an industrial text, a product that weighs 2 pounds (approximately 0.9 kg) is translated as weighing 2 kg, the result of this translation may be to the detriment of the user, who does not notice the problem; therefore, it is a critical error. The default score for each of these errors is 100.
Major errors make the meaning of the text unclear to the user. For example, if, in translating an educational text about insects from Italian into English, the word "ape", which means "bee" in Italian but "monkey" in English, is translated as "monkey", there are negative consequences, but the user is aware of the error; because the reader realizes that something is wrong, this type of error does not harm the user in the way a critical error does. The default score for this group is 10.
If there is more than one major error per thousand words in a text assessed with a holistic method, the assessor is required to switch to an analytic assessment method and find the exact cause of the errors.
Minor errors do not affect the usability of a translated text. In many cases, the user consciously ignores these types of errors. For example, in translating a text into English, the translator may write to who it may concern instead of to whom it may concern, which the user ignores. Minor errors do not pose a problem in conveying meaning. The default score for this group is 1.
Null errors are errors that are ignored by default before text assessment begins. For example, in assessing the translation of a brochure containing instructions for an electrical device, the style of writing may not be essential, so it will not be part of the translation criteria at all, and if such an error is seen in the text, it will be ignored. The score of these errors is always zero (Moorkens et al., 2018, pp. 116–122).
According to these definitions of error severity levels, each previously detected error was labelled. Then, the percentages for both the accuracy and fluency criteria were calculated using the following formula:

Score = 1 - [(minor x 1) + (major x 10) + (critical x 100)] / word count

The final score is numerically between zero and one.
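To make the weighting and the formula concrete, the short Python sketch below is our own illustration rather than part of the study's published procedure; the function name and the example error counts are assumptions.

SEVERITY_WEIGHTS = {"null": 0, "minor": 1, "major": 10, "critical": 100}

def quality_score(error_counts, word_count):
    # Score = 1 - (weighted sum of errors / number of words in the translated text)
    penalty = sum(SEVERITY_WEIGHTS[level] * n for level, n in error_counts.items())
    return 1 - penalty / word_count

# Hypothetical accuracy errors found in a 250-word machine-translated text:
errors = {"minor": 4, "major": 2, "critical": 0}
print(f"{quality_score(errors, word_count=250):.1%}")   # prints 90.4%

Reported as a percentage, a text with no penalised errors scores 100%, and a single critical error in a short text pushes the score down sharply, which is why the severity labels matter as much as the raw error count.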
The point here is to count the words in the translated texts correctly. Google Translate does not apply the zero-width non-joiner (ZWNJ) when translating from English into Persian; therefore, for the five texts translated from English into Persian, formal Persian editing was done to correct the ZWNJ and obtain the exact number of words in the translated texts. By weighting the errors and calculating their scores, the accuracy and fluency criteria can be reported quantitatively (as percentages), using descriptive statistics (frequency and percentage calculation).

RESULTS
The quality of Google Translate's performance in translating English newspaper texts (economic, cultural, sport, political, and scientific) into Persian, and the same Persian newspaper text types into English, at both sentence-level and whole-text translation, was examined based on the MQM-DQF model and compared by calculating the percentages of the two criteria, i.e. accuracy and fluency. As mentioned before, the object of the current study was not to compare the translation quality of Google Translate across different types of newspaper texts (i.e., scientific, cultural, political, economic, sports, etc.); such a comparison was left to future research. The results are as follows:

Research Findings
Economic texts: In English to Persian translation, at both levels, the largest number of accuracy errors belonged to the mistranslation subcategory, and for the fluency criterion, the largest number of errors belonged to the unintelligibility subcategory.
In Persian to English translation, at both levels, the largest number of accuracy errors was related to the terminology subcategory, and for the fluency criterion, the largest number of errors was related to the grammar subcategory.
According to the percentage calculated for each criterion in the translations of the two selected newspaper texts (Table 1 and Table 2), the percentages of the accuracy and fluency criteria in the whole-text translation of each text are lower than in the sentence-level translation, and the percentages of both criteria in Persian into English translation are higher than the corresponding percentages in English into Persian translation.
Cultural texts: In English to Persian translation, at both levels, the largest number of accuracy errors belonged to terminology, and the largest number of fluency errors belonged to the unintelligibility subcategory.
In Persian to English translation, the largest number of accuracy errors at the sentence level was related to terminology, and in the translation of the whole text, to the mistranslation subcategory. The largest number of fluency errors in the whole-text translation was related to the unintelligibility subcategory, while at the sentence level the numbers of errors in the unintelligibility and grammar subcategories were the same.
Based on the percentages calculated for each of the criteria in the two translated texts, the percentages of the accuracy and fluency criteria in the complete translation of each text are lower than in the sentence-level translation, and the percentages of both criteria in Persian to English translation are higher than in English to Persian translation.
Sport texts: In English to Persian translation, all accuracy errors were related to the mistranslation subcategory at both levels, and most fluency errors were related to the unintelligibility subcategory.
In Persian to English translation, most accuracy errors were related to the terminology subcategory, and most fluency errors were related to unintelligibility.
Based on the percentages calculated for each of the criteria in the two translated texts, the percentages of the accuracy and fluency criteria in the complete translation of each text were lower than or equal to those of the sentence-level translation. At both translation levels, accuracy in Persian to English translation was lower than in English to Persian translation, but fluency in Persian to English translation was higher than in English to Persian translation.
Political texts: In English to Persian translation, at both levels, the highest number of accuracy errors belonged to the mistranslation subcategory, and for the fluency criterion, the most errors belonged to the typography subcategory.
In Persian to English translation, at both levels, the highest number of accuracy errors was related to the mistranslation subcategory, and regarding fluency, the highest number of errors was related to unintelligibility.
Based on the percentages calculated for each of the criteria in the two selected texts, accuracy and fluency in the complete translation of each text were lower than in the sentence-level translation. At both translation levels, accuracy in Persian to English translation was lower, but fluency was higher, than in English to Persian translation.
Scientific texts: In English to Persian translation, the number, type, and severity of accuracy and fluency errors were similar at both levels. The highest number of accuracy errors was related to mistranslation, and the highest number of fluency errors was related to unintelligibility.

In Persian to English translation, the number and type of accuracy and fluency errors were similar at both levels. Most accuracy errors were related to mistranslation; half of the fluency errors were related to unintelligibility and the other half to grammar.
According to Table 1 and Table 2, the accuracy and fluency percentages in English to Persian translation were equal at both levels. In Persian to English translation, the accuracy percentage in sentence-level translation was higher than in complete-text translation, and the fluency percentage was lower. At both levels of translation, the percentages of both criteria were higher in Persian–English translation than in English–Persian translation.

Table 1
Percentages of quality criteria for newspaper texts translated from English into Persian (at the sentence level and for the whole text)

Criteria (%)   Economic           Cultural           Sport              Political          Scientific
               sentence  whole    sentence  whole    sentence  whole    sentence  whole    sentence  whole
Accuracy       90.4      87.6     79.7      73.8     86.9      86.9     92.2      89.7     72.2      72.2
Fluency        93.7      91.8     91.2      90.9     91.1      89.5     94.5      93.0     92.1      92.1

Table 2
Percentages of quality criteria for newspaper texts translated from Persian into English (at the sentence level and for the whole text)

Criteria (%)   Economic           Cultural           Sport              Political          Scientific
               sentence  whole    sentence  whole    sentence  whole    sentence  whole    sentence  whole
Accuracy       93.8      87.9     94.1      89.9     80.9      76.4     90.0      86.9     96.6      85.2
Fluency        99.8      95.2     98.2      94.6     94.8      93.0     99.8      94.9     98.2      98.2

Frequent errors in translating these texts by Google Translate are summarized below:
• Google Translate could not understand the implicit meaning of many satires, proverbs, metaphors, allusions, idioms, slang terms, cultural elements, and similar items, and made mistakes in translating them. For example, the phrase to tick every box is an idiom meaning "to satisfy all of the apparent requirements for success" (Collins dictionary, 2021). But this concept is not conveyed to the Persian-speaking reader, because Google Translate has translated it into Persian as برای علامت زدن هر جعبه (baray e alamat zadan e har ja'be), which is a word-by-word translation.
• In the UK, the Prime Minister's building number is 10, and across the UK "No. 10" implicitly means the government; however, Google Translate carried the phrase No. 10 into the translation exactly as it is, without any additional cultural explanation.
• For some children's activities, which have a cultural aspect, Google Translate does not have appropriate equivalent words and
phrases that can be understood in the target culture, including rainbow trails and doorstep discos, or electric milk float, which means a milk-distribution vehicle but has been translated into Persian as شناور شیر الکتریکی (shenavar e shir e electrici). These are considered major errors because Google Translate has translated them word by word into Persian with no additional explanation.
• Google Translate may make mistakes in translating words with several common meanings (polysemy), choosing the wrong sense. For example, the word pilot, which has several meanings, should have been translated as پارکینگ مسطح (pilot parking), but has been translated as خلبان (aviator).
• Google Translate may leave words that are unfamiliar in the target culture untranslated, or translate them as the most similar word structure it knows. For example, parklet, which means a small park, has been translated as پارک جیبی (pocket park), apparently by analogy with the structure of the word booklet.
• There were also some untransliterated words in these translations, most of which were proper nouns; these are minor fluency errors. Google Translate has carried English proper nouns over in the Latin rather than the Persian alphabet, such as CNBC and Covid, which are understandable for the reader according to context.
• Google Translate made mistakes in translating proper nouns from Persian into English, translating these nouns instead of just transliterating them, such as rendering Fekri (a person's name) as intellectual, or Esteghlal and Nassaji (club names) as independence and textile.
• Google Translate has made mistakes in converting and transferring units of weight, distance, time, volume, etc. For example, the solar (Iranian) year 1398 was transferred without conversion, so the reader notices this error, and it is a major accuracy error. However, in another text, Google Translate mistranslated the same solar year as 2009, which was considered a critical accuracy error.
• Google Translate could not recognize words containing a ZWNJ in Persian texts; it sometimes joins the parts of such words and translates the resulting word. Sometimes this joining creates an irrelevant word: in the phrase کم‌کاری برخی از بازیکنان استقلال (the negligence of Esteghlal players), the word کم‌کاری (negligence), written with a ZWNJ, was changed to کمکاری (helping), and the phrase was then translated as some Esteghlal players were helping.
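The ZWNJ behaviour can be illustrated with a short sketch (our own example strings, not taken from the study's data). In Unicode the zero-width non-joiner is U+200C, so the form written with it and the fused form are different strings even though the character itself is invisible:

ZWNJ = "\u200c"                      # zero-width non-joiner, U+200C
negligence = "کم" + ZWNJ + "کاری"    # kam-kari written with the ZWNJ, as in the source phrase
fused = "کمکاری"                      # the same letters once the ZWNJ is dropped
print(negligence == fused)            # False: the strings differ by one invisible code point
print(len(negligence), len(fused))    # 7 6
# str.split() does not treat the ZWNJ as whitespace, so the compound still counts
# as a single token in a simple whitespace-based word count.
print(len(("کم" + ZWNJ + "کاری برخی از بازیکنان").split()))   # 4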

Table 3
Comparison of the average percentages (A = accuracy, F = fluency, se = sentence-level translation, wh = whole-text translation)

                     Accuracy average,   Accuracy average,   Fluency average,    Fluency average,
                     sentence level      whole text          sentence level      whole text
English to Persian   x̅A.se = 84.28       x̅A.wh = 82.04       x̅F.se = 92.52       x̅F.wh = 91.46
Persian to English   x̅A.se = 91.08       x̅A.wh = 85.26       x̅F.se = 98.16       x̅F.wh = 95.18
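As a check on how Table 3 is obtained from Tables 1 and 2, the English-to-Persian sentence-level accuracy average is simply the mean of the five sentence-level accuracy values in Table 1: (90.4 + 79.7 + 86.9 + 92.2 + 72.2) / 5 = 84.28, the value reported as x̅A.se; the other averages are computed in the same way from the corresponding columns.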
The average of the accuracy percentages in translation from English into Persian is about 84% in sentence-level translation and about 82% in complete-text translation, and the average fluency percentages are approximately 92% and 91% in sentence-level and complete-text translation, respectively (Table 3).

The average accuracy percentages in translation from Persian into English are about 91% in sentence-level translation and about 85% in complete-text translation, and the average fluency percentages are approximately 98% in sentence-level translation and about 95% in whole-text translation (see Table 3).
As Table 3 shows, all the averages of the accuracy and fluency percentages in the texts translated from Persian into English were higher than the averages for the texts translated from English into Persian. According to Table 1 and Table 2, all percentages for sentence-level translations are equal to or higher than those for whole-text translations.

DISCUSSION
Machine translation (MT) concerns how computer software translates texts from one language into another without human intervention. The latest approach in machine translation, and the approach used in this research, is Neural Machine Translation (NMT). Google Neural Machine Translation (GNMT) is an NMT system developed by Google and introduced in November 2016. It uses an artificial neural network to increase fluency and accuracy in Google Translate. It was also stated that, with Google's upgrade to NMT, texts would be translated at the sentence level (Turovsky, 2016).
In recent years, studies have been conducted on the quality of online machine translation from English into Persian and vice versa, by either human or machine assessors, in various genres. However, it should be noted that the results of these studies may no longer be reliable today because of improvements in machine translation approaches. We discuss these studies and their results here.
In 2011, an automated accuracy evaluation of Google Translate was carried out across 51 languages based on BLEU. Because of the large number of languages in that study, human evaluation was not used. The results showed that "translations between European languages are usually good, while those involving Asian languages are often relatively poor" (Aiken & Balan, 2011, para. 1). These results were supported by Benjamin (2019): the corpus of stored texts in Google Translate is richer for the top languages, English above all, and poorer for the bottom languages. Moreover, in translating between any pair of non-English languages, Google acts indirectly: it first translates the source language into English and then from English into the other language. Therefore, the corpus of English texts in Google Translate is richer than for other languages (Benjamin, 2019). By contrast, according to a 2016 study of Google Translate performance, the difference between the frequencies of the different types of English–Persian and Persian–English errors did not reach statistical significance, the conclusion being that the direction of translation does not affect the quality of Google Translate's output (Ghasemi & Hashemian, 2016, pp. 15–16). In the present study, however, a directional difference was observed: the quality of Google Translate's output was lower when translating from English into Persian than when translating from Persian into English.
In 2019, a study was conducted on fifty of the languages used in Aiken and Balan's (2011) study, with the same sample texts, to measure how much Google Translate had improved in terms of accuracy. In 2011 Google Translate was based on PBMT (Phrase-Based Machine Translation), which was upgraded to NMT eight years later. In the 2011 study, the BLEU score for translation from English into Persian was 16; the 2019 study reported a BLEU score of 39 for translation from English into Persian and 66 for translation from Persian into English (Aiken, 2019). Likewise, the current research found that the quality of Google Translate's output is better when translating from Persian into English than from English into Persian.
According to a study conducted in 2018 to evaluate the translation quality of three machine translation systems (including Google Translate) from Persian into English, out of six common error categories, the most common were missing words, extra words, and word order (Sharifiyan, 2018).
In the present study, in both English into Persian and Persian into English translations,
the highest number of accuracy errors was related to mistranslation and the highest number of fluency errors to unintelligibility. Within the grammar subcategory, which includes morphology, part of speech, agreement, word order, and function words, the highest number of errors in translation from English into Persian and vice versa was related to part of speech.
As already mentioned, machine translation quality has improved since the previous evaluations were carried out, and this trend continues; because of this progress in machine translation quality, translation quality assessment models need to be improved as well.

CONCLUSION
We used the MQM-DQF model, one of the most up-to-date TQA models, and its two criteria (i.e. accuracy and fluency) and their subsets to assess the quality of Google Translate's performance in translating Persian and English newspaper texts.
It was found that the percentages of the accuracy criterion in the five English newspaper texts translated into Persian were between 72.2% and 92.2%, and the percentages of the fluency criterion in these texts were between 89.5% and 94.5% (Table 1). The percentages of the accuracy criterion in the five Persian newspaper texts translated into English were between 76.4% and 96.6%, and the percentages of the fluency criterion were between 93% and 99.8% (Table 2). The accuracy and fluency percentages in sentence-level translation were equal to or higher than those in whole-text translation. All the averages of accuracy and fluency in the newspaper texts translated from Persian into English were higher than those of the texts translated from English into Persian. Therefore, it can be concluded that Google Translate performs better in translating Persian newspaper texts into English than in translating newspaper texts from English into Persian. Moreover, the averages of accuracy and fluency in sentence-level translation are higher than in whole-text translation; therefore, it can be said that Google Translate performs better in translating newspaper texts at the level of the sentence than at the level of the whole text.

REFERENCES
Aabedi, F. (2017). Quality estimation and generated content of machine translation programs: Babylon 10 Premium Pro review and Google Translator (Master's thesis, Faculty of Literature and Human Science, Imam Reza International University, Mashhad, Iran). Retrieved May 8, 2021, from https://ganj.irandoc.ac.ir/#/articles/8fb842d6931a485d5874ea36c95e49cb
Aiken, M., & Balan, S. (2011). An analysis of Google Translate accuracy. Translation Journal, 16(2). Retrieved July 12, 2021, from https://translationjournal.net/journal/56google.htm
Aiken, M. (2019). An updated evaluation of Google Translate accuracy. Studies in Linguistics and Literature, 3(3), 253. https://doi.org/10.22158/sll.v3n3p253
Andishe Borujeni, F. (2020). The quality of translation of cultural elements translated by human and machine in Natour Dasht novel based on Waddington's model (Master's thesis, Faculty of Foreign Languages, Sheikh Baha'i University). Retrieved April 25, 2021, from https://ganj.irandoc.ac.ir/#/articles/24d46684b88c4d7010f7c2de74fb27ac
Benjamin, M. (2019). When & how to use Google Translate. Retrieved March 27, 2021, from https://www.teachyoubackwards.com/how-to-use-google-translate/
Collins dictionary online. (2021). Retrieved June 9, 2021, from https://www.collinsdictionary.com/dictionary/english/tick-all-the-boxes
Doherty, S. (2017). Issues in human and automatic translation quality assessment. In D. Kenny (Ed.), Human issues in translation technology (pp. 131–148). London, UK: Routledge.

Dorr, B., Olive, J., McCary, J., & Christianson, C. (2011). Machine translation evaluation and optimization. In J. Olive, C. Christianson, & J. McCary (Eds.), Handbook of Natural Language Processing and Machine Translation (p. 745). Springer Science+Business Media. https://doi.org/10.1007/978-1-4419-7713-7_5
Ghasemi, H., & Hashemian, M. (2016). A comparative study of Google Translate translations: An error analysis of English-to-Persian and Persian-to-English translations. English Language Teaching, 9(3), 13. https://doi.org/10.5539/elt.v9n3p13
Hakiminejad, A., & Alaeddini, M. (2016). A contrastive analysis of machine translation (Google Translate) and human translation: Efficacy in translating verb tense from English to Persian. Mediterranean Journal of Social Sciences, 7(4 S2). https://doi.org/10.5901/mjss.2016.v7n4S2p40
Karimnia, A. (2011). Waddington's model of translation quality assessment: A critical inquiry. Elixir Ling. & Trans., 40, 5219–5224.
Lommel, A., & Melby, A. (2018). MQM-DQF: A good marriage (translation quality for the 21st century). In Proceedings of the 13th Conference of the Association for Machine Translation in the Americas. Retrieved December 5, 2020, from www.conference.amtaweb.org
Lommel, A., et al. (2018). Harmonised metric (QT21 Deliverable 3.1, p. 23). Retrieved March 13, 2021, from http://www.qt21.eu/
Maučec, M., & Donaj, G. (2019). Machine translation and the evaluation of its quality. https://doi.org/10.5772/intechopen.89063. Retrieved July 28, 2021, from https://www.intechopen.com/chapters/68953
Moorkens, J., et al. (2018). Translation quality assessment: From principles to practice. Berlin, Germany: Springer.
Moradi, S. (2015). Comparison of translation of scientific-technical texts by two web-based translation machines based on Eurometrics (Master's thesis, Faculty of Literature and Humanities, University of Shahid Bahonar, Kerman, Iran). Retrieved February 22, 2021, from https://ganj.irandoc.ac.ir/#/articles/4182d0197e7b33873b141664483f5fd3
O'Brien, S. (2012). Towards a dynamic quality evaluation model for translation. The Journal of Specialised Translation, 17, 55–77. Retrieved April 18, 2021, from https://www.jostrans.org/issue17/art_obrien.pdf
Pajhooheshnia, M. (2015). Manual and automatic comparative evaluation in online machine translation systems; Case study: Persian-English translation of technical texts (Master's thesis, Sheikhbahaei University, Esfahan, Iran). Retrieved April 25, 2021, from https://ganj.irandoc.ac.ir/#/articles/41b29375cf566c1ae9ff607ea8029cfe
Saldanha, G., & O'Brien, S. (2013). Research methodologies in translation studies. London & New York: St. Jerome Publishing.
Sharifiyan, L. (2018). Evaluating cohesion and comprehensibility in Persian-English machine translated texts (Master's thesis, Faculty of Persian Literature and Foreign Languages, Allameh Tabataba'i University, Tehran, Iran). Retrieved May 2, 2021, from https://ganj.irandoc.ac.ir/#/articles/e8bb80d702dc3792ab08e01f4c2e36ad
Torkaman, E. (2013). A comparative quality assessment of machine (Google Translate) and human translation of proverbs from English to Persian (Master's thesis, Islamic Azad University, Central Tehran Branch, Faculty of Foreign Languages, Department of English). Retrieved July 21, 2021, from https://ganj.irandoc.ac.ir/#/articles/51d717a038e26c50aee9df9beb143c0f
Turovsky, B. (2016). Found in translation: More accurate, fluent sentences in Google Translate. Retrieved April 27, 2021, from https://blog.google/products/translate/found-translation-more-accurate-fluent-sentences-googletranslate/
Vahedi Kakhki, A. (2018). Quality evaluation of Persian crowdsourcing translations of Wikipedia articles by MQM (Master's thesis, Faculty of Literature and Humanities, Department of Foreign Languages, University of Shahid Bahonar, Kerman, Iran). Retrieved February 11, 2021, from https://ganj.irandoc.ac.ir/#/articles/1f803e5d33f98f3de54a959d92bbb2d2
Waddington, C. (2001). Different methods of evaluating student translation: The question of validity. Meta: Translators' Journal, 46(2), 311–325. Retrieved April 18, 2021, from https://www.erudit.org/fr/revues/meta/2001-v46-n2-meta159/004583ar.pdf

Biodata
Zahra Foradi holds an MA in Translation Studies from the University of Birjand, Iran. Her research interests include machine translation, parallel linguistic corpora, the history of translation, and bibliometric studies.
Email: zahraforadi@birjand.ac.ir

Dr. Jalilollah Faroughi Hendevalan is an associate professor at the University of Birjand, Iran. His research interests include machine translation and critical discourse analysis.
Email: jfaroughi@birjand.ac.ir

Dr. Mohammad Reza Rezaeian Delouei is an assistant professor at the University of Birjand, Iran. His research interests include conceptual research, interdisciplinary research, bibliometric studies, and translational language.
Email: mrrezaeiand@birjand.ac.ir
