● Tigrinya is a low-resource language spoken by more than 10 million native speakers, mainly in Tigray, Ethiopia, and in Eritrea.
● In recent years, we have seen some progress in developing MT systems for a handful of African languages and deploying them in production.
● Evaluating the quality of such systems is fundamental to accelerating progress in machine translation.
● In this work, we evaluated the current status of state-of-the-art MT systems that support translation of Tigrinya to and from English: Google Translate, Microsoft Translator, and Lesan.

● The data is gathered from 4 domains: Arts and Culture, Science and Technology, Politics, and Business and Economics.
● The data sources are diverse and include news sites, social media platforms, textbooks, and Wikipedia articles.
● The dataset contains 100 article snippets for each domain and each translation direction.
● In total, there are 805 snippets (403 Tigrinya and 402 English).

Methodology
● We used the Multidimensional Quality Metrics (MQM) and Dynamic Quality Framework (DQF) standard error typology.
● It provides a common vocabulary for translation errors and is a standard typology in MT evaluation.
● MQM-DQF error categories: Accuracy, Fluency, Terminology, Style, Design, Locale Convention, Verity.
● Two experts participated in the evaluation process. The annotators had 72% inter-annotator agreement on labeling the error types.

Fig 2. Distribution of errors by domain. Arts and Culture, followed by Science and Technology, has the highest number of errors.

Findings
● 61.2% of the evaluated translations had quality issues.
● The most common error types are Mistranslation and Omission, with 66.2%.
● The translation systems perform poorly when translating Tigrinya sources to English.
● Arts and Culture is the most challenging domain for current systems, followed by Science and Technology.
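
The agreement figure reported in the methodology can be illustrated with a minimal sketch of how agreement between two annotators on error-type labels might be computed. The poster does not state which agreement measure was used, so the sketch shows both raw percent agreement and Cohen's kappa as assumptions. The function names and the example labels are hypothetical and are not taken from the study's data.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Fraction of items on which both annotators chose the same label."""
    if len(labels_a) != len(labels_b):
        raise ValueError("Both annotators must label the same items")
    if not labels_a:
        raise ValueError("At least one labeled item is required")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohen_kappa(labels_a, labels_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(labels_a)
    observed = percent_agreement(labels_a, labels_b)
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    # Chance agreement: probability that both pick the same category
    # if each annotator labels independently with their own frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels using MQM-DQF category names (not real study data).
annotator_1 = ["Accuracy", "Fluency", "Accuracy", "Style", "Terminology"]
annotator_2 = ["Accuracy", "Fluency", "Fluency", "Style", "Accuracy"]

print(f"Percent agreement: {percent_agreement(annotator_1, annotator_2):.2f}")
print(f"Cohen's kappa:     {cohen_kappa(annotator_1, annotator_2):.2f}")
```

Raw percent agreement is easy to read but does not account for chance, which is why a chance-corrected measure such as kappa is often reported alongside it.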