
ERROR ANALYSIS OF TIGRINYA – ENGLISH MACHINE TRANSLATION SYSTEMS

Negasi Haile*, Nuredin Ali, Asmelash Teka*


Lesan.ai, Berlin, Germany*
University of Minnesota, Twin Cities, MN
negasihaile.abadi@gmail.com | ali00530@umn.edu | asme@lesan.ai

Introduction

● Tigrinya is a low-resource language spoken by more than 10 million native speakers, mainly in Tigray, Ethiopia, and in Eritrea.
● In recent years, there has been some progress in developing and deploying production MT systems for a handful of African languages.
● Evaluating the quality of such systems is fundamental to accelerating progress in machine translation.
● In this work, we evaluated the current state-of-the-art MT systems that support translation of Tigrinya to and from English: Google Translate, Microsoft Translator, and Lesan.

Data Collection

● The data is gathered from 4 domains: Arts and Culture, Science and Technology, Politics, and Business and Economics.
● The sources are diverse and include news sites, social media platforms, textbooks, and Wikipedia articles.
● The dataset contains 100 article snippets for each domain and each translation direction.
● In total, there are 805 snippets (403 Tigrinya and 402 English).

Methodology

● We used the Multidimensional Quality Metrics (MQM) and Dynamic Quality Framework (DQF) standard error typology.
● The typology provides a common vocabulary for translation errors and is a standard in MT evaluation.
● MQM-DQF error categories: Accuracy, Fluency, Terminology, Style, Design, Locale Convention, Verity.
● Two experts participated in the evaluation process. The annotators had 72% inter-annotator agreement on labeling the error types.

Fig 2. Distribution of error by domain. Arts and Culture, followed by Science and Technology, has the highest number of errors.

Findings

● 61.2% of the translations had quality issues.
● The most common error types are Mistranslation and Omission, at 66.2%.
● The translation systems perform poorly when translating Tigrinya sources to English.
● Arts and Culture is the most challenging domain for current systems, followed by Science and Technology.
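
The poster reports 72% agreement between the two annotators but does not say which statistic was used. As a minimal sketch, assuming simple percent agreement over per-segment error labels, the computation could look like the following. Cohen's kappa is included only as a common chance-corrected alternative, not as the authors' method. The label lists are hypothetical illustration data, not taken from the study.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Fraction of items on which both annotators chose the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement for two annotators (Cohen's kappa)."""
    n = len(labels_a)
    observed = percent_agreement(labels_a, labels_b)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Expected agreement if each annotator labeled independently
    # according to their own label frequencies.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels using MQM-DQF top-level categories.
annotator_1 = ["Accuracy", "Accuracy", "Fluency", "Terminology", "Accuracy"]
annotator_2 = ["Accuracy", "Fluency", "Fluency", "Terminology", "Accuracy"]

agreement = percent_agreement(annotator_1, annotator_2)
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"percent agreement: {agreement:.1%}, kappa: {kappa:.4f}")
```

Percent agreement is easy to interpret but ignores agreement expected by chance, which is why a chance-corrected statistic is often reported alongside it.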

Main Contributions

● Evaluate the current state of Tigrinya-English machine translation systems.
● Quantify the most common translation issues in current machine translation systems for Tigrinya to and from English.
● Provide practical suggestions for improvement, based on a comprehensive analysis of the systems' weaknesses.

Implications

● Current Tigrinya MT systems perform relatively well on particular domains, such as Politics, and Business and Economics.
● Increase the domain diversity of the training sources.
● Incorporate abbreviations and named entities to avoid code mixing.
● Using diverse data sources may help address issues with handling multiple dialects and styles.

Fig 2. Distribution of error by translation direction. The systems perform poorly when going from Tigrinya to English.
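
Distributions like the ones in the figures (errors by domain and by translation direction) can be tallied directly from annotated records. The sketch below assumes a hypothetical record layout with `domain`, `direction`, and `error` fields. The field names, the `ti-en`/`en-ti` direction codes, and the sample records are all illustrative assumptions, not the study's data or format.

```python
from collections import Counter

# Hypothetical annotation records; "error" is None when no error was found.
annotations = [
    {"domain": "Arts and Culture", "direction": "ti-en", "error": "Mistranslation"},
    {"domain": "Arts and Culture", "direction": "ti-en", "error": "Omission"},
    {"domain": "Science and Technology", "direction": "ti-en", "error": "Mistranslation"},
    {"domain": "Politics", "direction": "en-ti", "error": None},
    {"domain": "Business and Economics", "direction": "en-ti", "error": "Omission"},
]


def error_share(records, key):
    """Share of all annotated errors that fall under each value of `key`."""
    counts = Counter(r[key] for r in records if r["error"] is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: count / total for value, count in counts.items()}


by_direction = error_share(annotations, "direction")
by_domain = error_share(annotations, "domain")
by_type = error_share(annotations, "error")

for name, shares in [("direction", by_direction), ("domain", by_domain), ("type", by_type)]:
    print(name, {k: f"{v:.0%}" for k, v in shares.items()})
```

The same function serves all three breakdowns because it only groups error-bearing records by the chosen field.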
