1. Bidirectional Agewigna (Himtana) - English Machine Translation using neural
network machine translation approach
Introduction
Language is the expression of ideas using speech sounds combined into words. Words are
combined into sentences, and this combination corresponds to the combination of ideas into
thoughts. Natural language is the manifestation of human cognition and human intelligence. It
includes indefinite phrases and statements that reflect imprecision in the underlying
cognitive concepts. Natural language processing (NLP) helps computers communicate with humans
in their own language and scales other language-related tasks. For example, NLP makes it
possible for computers to read text, hear speech, interpret it, measure sentiment, and determine
which parts are important. Natural language processing came into existence because users who
wish to communicate with a computer cannot be forced to learn a machine-specific language. It
serves people such as managers or children who do not have enough time to learn or master such
specialized languages and who instead use natural languages such as Amharic, Himtana, English,
French, and Chinese. A language is a system: a set of rules and a set of symbols.

Machine translation (MT) is the automatic translation of content from one language (the source) to
another (the target). It uses computer software, and increasingly artificial intelligence (AI), to
translate text without human input. Only in the past ten years has machine translation become a viable
tool in widespread use. Advances in natural language processing, artificial intelligence, and computing
power have all contributed to this increasingly useful technology.

Several machine translation approaches have been proposed, but so far none of them matches the accuracy
of a human translator. The common approaches are statistical machine translation (SMT), rule-based
machine translation (RBMT), hybrid machine translation (HMT), which combines SMT and RBMT, and neural
machine translation (NMT).

Statistical Machine Translation (SMT) works by referring to statistical models that are based on the
analysis of large volumes of bilingual text. It aims to determine the correspondence between words in
the source language and words in the target language. A well-known example was Google Translate before
it moved to a neural system. SMT works well for basic translation, but its greatest drawback is that it
takes little account of context, which means translations can often be erroneous and output quality is
limited.

Rule-Based Machine Translation (RBMT), on the other hand, translates based on grammatical rules. It
conducts a grammatical analysis of the source language and the target language to generate the translated
sentence. However, RBMT requires extensive proofreading, and its heavy dependence on lexicons means
that efficiency is achieved only after a long period.

Hybrid Machine Translation (HMT), as the term indicates, is a blend of RBMT and SMT. It leverages a
translation memory, which makes it more effective in terms of quality. However, even HMT has its share
of drawbacks, the greatest of which is the need for extensive editing by human translators.

Neural Machine Translation (NMT) is a type of machine translation that relies on neural network
models (loosely inspired by the human brain) to learn a statistical model for translation. The primary
benefit of NMT is that it provides a single system that can be trained end to end on source and target
text.

The Agew people ruled Ethiopia during the Zagwe dynasty, from about 900 to 1270. The Zagwe kings built
churches carved into the rock. The Agaw or Agew (Ge'ez: አገው Agäw, modern Agew) are a Cushitic ethnic
group inhabiting Ethiopia and neighboring Eritrea. They speak the Agaw languages, which belong to the
Cushitic branch of the Afroasiatic language family and have a high degree of mutual intelligibility.
As the Agaw are a minority population in both Ethiopia and Eritrea, they face substantial pressures of
assimilation, and the Agaw languages are now considered critically endangered. The Agaw or Central
Cushitic languages are spoken by several groups in Ethiopia and, in one case, Eritrea. They form the
main substratum influence on Amharic and other Ethiopian Semitic languages. The Central Cushitic
languages are classified as follows: Awngi (South Agaw), spoken southwest of Lake Tana and by far the
largest, with over 350,000 speakers; Bilen (North Agaw), with 70,000 speakers in Eritrea around the town
of Keren and in eastern Sudan around the town of Kassala; Qimant (Western Agaw), spoken by the Qemant in
Semien Gondar Zone; and Xamtanga (Himtanga) (Central Agaw; also called Khamir or Khamta), with 143,000
speakers in the North Amhara Region.

The Ethiopic writing system is used to represent several languages, including the Semitic languages
Geez, Amharic, and Tigrigna as well as Agewigna (Himtagna), which is Cushitic. These languages are mainly
spoken in Ethiopia and Eritrea. They are written from left to right and share a similar alphabet, the
Fidel. Depending on the language, the alphabet has 33 or 35 base symbols (consonants), each with seven
orders that represent seven vowels. The vowels are not written as separate letters but appear as
modifiers of the base characters. The script makes no distinction between upper and lower case letters,
and there are no systematic variations in the form of a symbol according to its position in the word.
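
The seven-order structure described above is also visible in how Unicode lays out Ethiopic syllables.
The following is a minimal Python sketch, assuming that the basic syllables occupy U+1200 to U+135A in
rows of eight code points per base symbol, with the first seven slots holding the seven orders. The
function names are hypothetical, and irregular rows (for example the labialized series, which leave some
slots unassigned) are not handled specially.

```python
# Assumption: basic Ethiopic syllables span U+1200..U+135A in Unicode and are
# arranged in rows of eight code points per base symbol (consonant).
ETHIOPIC_START = 0x1200
ETHIOPIC_END = 0x135A


def split_fidel(ch):
    """Return (base_symbol, order) for one Ethiopic syllable, else None.

    order is 1..7; the first-order form is used as the base symbol.
    """
    cp = ord(ch)
    if not (ETHIOPIC_START <= cp <= ETHIOPIC_END):
        return None  # not an Ethiopic syllable
    offset = (cp - ETHIOPIC_START) % 8
    if offset > 6:
        return None  # eighth slot holds extra forms, not one of the seven orders
    return chr(cp - offset), offset + 1


def decompose(word):
    """Split every character of a word into (base_symbol, order) pairs."""
    return [split_fidel(ch) for ch in word]


if __name__ == "__main__":
    # The word for "Agew" used earlier in this section.
    print(decompose("\u12a0\u1308\u12cd"))
```

A tokenizer or transliteration step for the translation system could use such a split to treat the
consonant and the vowel order separately, although whether that helps translation quality is an open
question that this sketch does not answer.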
Statement of the problem
Nowadays the world is becoming more closely connected, and people come together to discuss global issues
and improve their environments. For example, countries participate in meetings on world issues, where
they discuss topics such as climate change adaptation, urban nature-based solutions, and social,
political, and economic integration. After a meeting is completed, each point of discussion is
translated into the official language of each country. Also, as the need for technology increases, the
number of social media and mass media outlets is increasing. These outlets report their news in
different languages to attract followers among speakers of different languages, and they also re-report
news from other outlets in the language of their own channel. In such cases, translating the news from
one channel into the language of the target channel is necessary. If they rely only on expert
translators, the slowness and cost of expert-based translation become a challenge, so they need to move
from expert-based to automated machine translation.
Moreover, inspirational books and academic materials are being translated from the writer's language
into other target languages. For example, several books are being translated from an original language
such as English into target languages such as Amharic. However, all of these tasks are difficult for
expert-based translation, which takes much time and requires many experts. Several studies and
applications have therefore been developed for foreign languages using different methodologies and
approaches. Most of this work covers language pairs of English and another language, such as Arabic,
Japanese, French, or Spanish, because English is the dominant international language. However, only a
little work has been done on machine translation between English and Ethiopian languages, such as
English to Amharic and English to Afan Oromo, and there is still no well-developed automated machine
translation system for translating from English to the Himtana language.
The existing studies between English and Ethiopian languages, such as English to Afan Oromo and English
to Amharic, use different machine translation approaches, including statistical, rule-based, hybrid, and
neural machine translation.
However, the results of machine translation between English and Ethiopian languages such as Amharic and
Afan Oromo are not yet satisfactory when compared with the results for English and other languages.
According to these studies, neural machine translation is the newest and best-performing approach. It is
therefore necessary to study machine translation between English and Himtana using neural networks to
improve translation quality. Since human translators work in both directions, our study makes the
translation between these languages bidirectional.
Research questions
To this end, this research attempts to answer the following research questions:
1. Is the neural network machine translation approach better than other machine translation
approaches for the English-Himtana machine translation system?
2. What dataset preparation and learning function are appropriate for English to Himtana
language machine translation?
3. What neural network model is suitable for English to Himtana language machine translation?
4. To what extent is the model effective?
Objectives
General Objective
The general objective of this thesis is to design and develop Bidirectional English-Agewigna (Himtana)
Machine Translation using a neural network machine translation approach.

Specific Objectives
The specific objectives are:

• Review related works in machine translation for different languages
• Identify the linguistic behaviors of English and Himtana languages
• Prepare a parallel corpus of English - Himtana language pair
• Align the parallel documents
• Develop a bilingual dictionary
• Evaluate the translation quality of the neural network machine translation system
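
The proposal does not name an evaluation metric. As one possible illustration of the last objective, the
following minimal Python sketch computes a sentence-level, single-reference BLEU-style score (modified
n-gram precision with a brevity penalty and no smoothing). The choice of BLEU, the function names, and
whitespace tokenization are all assumptions made for this sketch, not the method of the study.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu_sketch(candidate, reference, max_n=4):
    """Unsmoothed BLEU-style score for one candidate against one reference."""
    cand = candidate.split()  # assumption: whitespace tokenization
    ref = reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        ref_counts = ngram_counts(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if total == 0 or overlap == 0:
            return 0.0  # without smoothing, any empty precision gives zero
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty punishes candidates shorter than the reference.
    if len(cand) >= len(ref):
        brevity = 1.0
    else:
        brevity = math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    print(bleu_sketch("the cat sat on the mat", ref))
    print(bleu_sketch("the cat sat on mat", ref))
```

In practice a corpus-level score with an established tool and human evaluation would be more reliable
than this sentence-level sketch, especially for a morphologically rich language written in Fidel.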
2. Bidirectional Agewigna (Himtana) - Amharic Machine Translation using neural
network machine translation approach
Introduction
Language is the expression of ideas using speech sounds combined into words. Words are
combined into sentences, and this combination corresponds to the combination of ideas into
thoughts. Natural language is the manifestation of human cognition and human intelligence. It
includes indefinite phrases and statements that reflect imprecision in the underlying
cognitive concepts. Natural language processing (NLP) helps computers communicate with humans
in their own language and scales other language-related tasks. For example, NLP makes it
possible for computers to read text, hear speech, interpret it, measure sentiment, and determine
which parts are important. Natural language processing came into existence because users who
wish to communicate with a computer cannot be forced to learn a machine-specific language. It
serves people such as managers or children who do not have enough time to learn or master such
specialized languages and who instead use natural languages such as Amharic, Himtana, English,
French, and Chinese. A language is a system: a set of rules and a set of symbols.

Machine translation (MT) is the automatic translation of content from one language (the source) to
another (the target). It uses computer software, and increasingly artificial intelligence (AI), to
translate text without human input. Only in the past ten years has machine translation become a viable
tool in widespread use. Advances in natural language processing, artificial intelligence, and computing
power have all contributed to this increasingly useful technology.

Several machine translation approaches have been proposed, but so far none of them matches the accuracy
of a human translator. The common approaches are statistical machine translation (SMT), rule-based
machine translation (RBMT), hybrid machine translation (HMT), which combines SMT and RBMT, and neural
machine translation (NMT).

Statistical Machine Translation (SMT) works by referring to statistical models that are based on the
analysis of large volumes of bilingual text. It aims to determine the correspondence between words in
the source language and words in the target language. A well-known example was Google Translate before
it moved to a neural system. SMT works well for basic translation, but its greatest drawback is that it
takes little account of context, which means translations can often be erroneous and output quality is
limited.

Rule-Based Machine Translation (RBMT), on the other hand, translates based on grammatical rules. It
conducts a grammatical analysis of the source language and the target language to generate the translated
sentence. However, RBMT requires extensive proofreading, and its heavy dependence on lexicons means
that efficiency is achieved only after a long period.

Hybrid Machine Translation (HMT), as the term indicates, is a blend of RBMT and SMT. It leverages a
translation memory, which makes it more effective in terms of quality. However, even HMT has its share
of drawbacks, the greatest of which is the need for extensive editing by human translators.

Neural Machine Translation (NMT) is a type of machine translation that relies on neural network
models (loosely inspired by the human brain) to learn a statistical model for translation. The primary
benefit of NMT is that it provides a single system that can be trained end to end on source and target
text.

The Agew people ruled Ethiopia during the Zagwe dynasty, from about 900 to 1270. The Zagwe kings built
churches carved into the rock. The Agaw or Agew (Ge'ez: አገው Agäw, modern Agew) are a Cushitic ethnic
group inhabiting Ethiopia and neighboring Eritrea. They speak the Agaw languages, which belong to the
Cushitic branch of the Afroasiatic language family and have a high degree of mutual intelligibility.
As the Agaw are a minority population in both Ethiopia and Eritrea, they face substantial pressures of
assimilation, and the Agaw languages are now considered critically endangered. The Agaw or Central
Cushitic languages are spoken by several groups in Ethiopia and, in one case, Eritrea. They form the
main substratum influence on Amharic and other Ethiopian Semitic languages. The Central Cushitic
languages are classified as follows: Awngi (South Agaw), spoken southwest of Lake Tana and by far the
largest, with over 350,000 speakers; Bilen (North Agaw), with 70,000 speakers in Eritrea around the town
of Keren and in eastern Sudan around the town of Kassala; Qimant (Western Agaw), spoken by the Qemant in
Semien Gondar Zone; and Xamtanga (Himtanga) (Central Agaw; also called Khamir or Khamta), with 143,000
speakers in the North Amhara Region.

The Ethiopic writing system is used to represent several languages, including the Semitic languages
Geez, Amharic, and Tigrigna as well as Agewigna (Himtagna), which is Cushitic. These languages are mainly
spoken in Ethiopia and Eritrea. They are written from left to right and share a similar alphabet, the
Fidel. Depending on the language, the alphabet has 33 or 35 base symbols (consonants), each with seven
orders that represent seven vowels. The vowels are not written as separate letters but appear as
modifiers of the base characters. The script makes no distinction between upper and lower case letters,
and there are no systematic variations in the form of a symbol according to its position in the word.
Statement of the problem
Language translation is needed to increase communication and information exchange in society.
Since Amharic is the working language of Ethiopia, many materials are available in Amharic and
need to be translated into other Ethiopian languages, including scientific and technical
documentation, instruction manuals, books, administrative proposals, medical reports, commercial
documents, newspapers, and magazines. Media outlets that broadcast in the Himtana language
translate manually when they use Amharic source information. Bidirectional neural machine
translation would help facilitate their work. For countries where more than one language is
spoken, conveying information to all people in their own language has great social, political,
commercial, scientific, and intellectual or philosophical value. In Ethiopia, 70 to 80 different
languages are spoken across the country. These languages and ethnic groups have important social
and cultural practices that other ethnic groups in the country need to know about, so that good
practices are shared and harmful ones are abandoned.
From a social and political standpoint, citizens must have equal rights in the country
regardless of their culture, belief, or language. Expressing political and social issues in
the mother tongue helps people receive complete information, and they can express their inner
feelings more easily in their mother tongue than in one common language.
Since Amharic is the working language of Ethiopia, it is taught as a subject in every region.
For those whose mother tongue is not Amharic, the subject is harder to learn. Himtana, one of
the Ethiopian languages, is spoken by the Wag-Himra people and has over 143,000 speakers, but it is
dominated by Amharic and Tigrigna speakers. A machine translation system would give the language
more exposure, and using it would help improve language skills, since difficult sentences and phrases
could easily be translated into the learner's mother tongue.
Given these problems, it is difficult for professional translators to meet the increasing demand
for translation. Machine translation would bridge the gap created by language diversity and make
the work less tedious and repetitive.
Research questions
To this end, this research attempts to answer the following research questions:
1. Is the neural network machine translation approach better than other machine translation
approaches for the Amharic to Himtana machine translation system?
2. What dataset preparation and learning function are appropriate for Amharic to Himtana
language machine translation?
3. What neural network model is suitable for Amharic to Himtana language machine translation?
4. To what extent is the model effective?
Objectives
General Objective
The general objective of this thesis is to design and develop Bidirectional Amharic-Agewigna (Himtana)
Machine Translation using a neural network machine translation approach.

Specific Objectives
The specific objectives are:

• Review related works in machine translation for different languages
• Identify the linguistic behaviors of Amharic and Himtana languages
• Prepare a parallel corpus of Amharic to Himtana language pair
• Align the parallel documents
• Develop a bilingual dictionary
• Evaluate the translation quality of the neural network machine translation system
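
The corpus preparation and alignment objectives can be illustrated with a minimal Python sketch. It
assumes that the Amharic and Himtana sides are already line-aligned lists of sentences, drops empty or
badly length-mismatched pairs, and then builds training examples for both translation directions by
prefixing each source sentence with a target-language tag. The function names, the `max_ratio`
threshold, and the tags `<2him>` and `<2amh>` are hypothetical choices for this sketch, not part of the
proposal.

```python
def pair_sentences(source_lines, target_lines, max_ratio=2.5):
    """Pair line-aligned sentences and drop empty or length-mismatched pairs."""
    if len(source_lines) != len(target_lines):
        raise ValueError("both sides of the corpus must have the same number of lines")
    pairs = []
    for src, tgt in zip(source_lines, target_lines):
        src, tgt = src.strip(), tgt.strip()
        if not src or not tgt:
            continue  # one side is missing, so the pair is not usable
        n_src, n_tgt = len(src.split()), len(tgt.split())
        # A large word-count ratio often signals a misaligned pair (assumed heuristic).
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue
        pairs.append((src, tgt))
    return pairs


def both_directions(pairs, to_target_tag="<2him>", to_source_tag="<2amh>"):
    """Build (input, output) examples for both directions from aligned pairs."""
    examples = []
    for src, tgt in pairs:
        examples.append((f"{to_target_tag} {src}", tgt))  # Amharic -> Himtana
        examples.append((f"{to_source_tag} {tgt}", src))  # Himtana -> Amharic
    return examples


if __name__ == "__main__":
    # Placeholder tokens stand in for real Amharic and Himtana sentences.
    amharic = ["a b c", "", "x"]
    himtana = ["d e", "f", "p q r s"]
    clean = pair_sentences(amharic, himtana)
    for example in both_directions(clean):
        print(example)
```

Tagging the input with the desired output language is one way to train a single model for both
directions; training two separate one-directional models is an equally valid design that this sketch
does not rule out.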
