Introduction
exchange has made translation from one human language to another a matter of
to exhibit habits etc. which can be translated along a variety of channels (Bamisaye
O, 2000). There are over 6,800 living languages in the world which reflects the
language is of great interest and the means of sharing information across languages
communication and many voices will not be heard without this critical function.
(Koehn P, 2009) showed that, due to differences in culture and the multilingual
information and sharing of ideas. The need for translation is also very glaring in
the business community. It has been observed that language barriers between
companies and their global customers are stifling economic growth and in fact,
forty-nine percent of executives say a language barrier has stood in the way of a
major international business deal, nearly two-thirds (64 percent) of those same
Humans were originally responsible for translating from one language to another.
At a point the supply of translation services could no longer keep pace with the
consuming and inadequate for addressing the real-time needs of businesses to serve
translation paramount. The search led to the discovery of what is known today as
defined as the use of computers to translate messages in the form of text or speech
from one natural language (human language) into another natural language
(Ahmed & Mohd, 2014). It is the process of using software to translate text from
one natural language to another. This need has prompted research organizations
and government agencies to develop tools for automatic translation of text in
an attempt to achieve wider outreach and bridge the gap of language diversity (Koehn
P, 2010).
Social and political importance emerges from the necessity to understand the other.
volumes of texts into many languages in a very limited time. For instance,
the European Union allocates around €330m a year to translate from and into 23
official languages. In addition, the Union allocates nearly 1% of its annual budget for
all the language services (DG Translation official website, 2014). The European Union
uses an internal machine translation engine, which has shifted from rule-based to
the fact that for each step in international markets, from business agreements to
other. The delays in translation can be costly, so using MT can help translators and
Nigeria is the most populous country in Africa with a population of about 200
million people. It is also the seventh most populous country in the world (Ayegba F, 2016).
Nigeria is a multilingual country with over 500 ethnic groups. This shows the level
of linguistic and cultural diversity in the country. Idoma is one of the ethnic groups
in Nigeria.
The Idomas are people that primarily inhabit the lower western areas of Benue
State, Nigeria, and some of them can be found in Taraba State, Cross River
State, Enugu State, Kogi State and Nasarawa State in Nigeria. The Idoma
the Volta–Niger family, which include Igede, Alago, Agatu, Etulo, Ete, Akweya
(Akpa) and Yala languages of Benue, Nasarawa, Kogi, Enugu, and Northern Cross
River states. The Akweya subgroup is closely related to the Yatye-Akpa subgroup.
The bulk of the territory is inland, south of the River Benue, some seventy-two
kilometres east of its confluence with River Niger. The Idoma tribe are known to
be 'warriors' and 'hunters' of class, but hospitable and peace-loving. The greater
part of Idoma land remained largely unknown to the West until the 1920s, leaving
much of the colourful traditional culture of the Idoma intact. The population of the
Idomas is estimated to be about 3.5 million. The Idoma people have a traditional
ruler called the Och'Idoma who is the head of the Idoma Area Traditional Council.
This was introduced by the British. Each community has its own traditional chief
such as the former Ad'Ogbadibo of Orokam, the late Chief D. E. Enenche. The palace
of the Och'Idoma is located at Otukpo, Benue State. The present Och'Idoma, HRM
Elaigwu Odogbo John, the 5th Och'Idoma of the Idoma people, was installed on
30 June 2022 following the passing of his predecessor, HRH Agabaidu Elias
Ikoyi Obekpa, who ruled from 1996 to October 2021. Past Och'Idomas also
include HRH Agabaidu Edwin Ogbu, who reigned from 1996 to 1997; HRH
Abraham Ajene Okpabi of Igede descent, who ruled from 1960 to 1995; and HRH
Agabaidu Ogiri Oko, whose reign took place between 1948 and 1959.
applications for Idoma language has shut the people out of these benefits.
to Idoma language to address this challenge has become imperative. This research
seeks to address this challenge and bring to the Idoma people the numerous
The aim of the project is to develop a system that can be used to translate
i. Accept as input a sentence in English language and translate it into Idoma
language.
ii. Store such translations and print out translated output if so desired
The project is aimed at creating a machine translation system that will accept
English source sentences and generate the equivalent Idoma sentences. It is not a
bidirectional application that can translate from English to Idoma and from Idoma
to English.
The need for this research arose from the fact that a lot of information and
possibilities remain hidden from the Idoma people who have little or no knowledge
of English. The research will provide Idoma people greater access to information.
It will also give Idoma language a public profile in the information technology
world and provide a platform for people to really appreciate the beauty of their
indigenous language.
A survey of the existing literature and the internet shows that Idoma is not a
either in hard copy or in digital form. This greatly impeded the development
process.
limitations and organization of the project, which holds the foundation for the
and the work done by scholars in this field. It describes language translators with
translation and the various technologies for building language translators were
discussed.
Chapter Three: This chapter presents the analysis of the Idoma and English languages.
language translator designed. Here the output was tested to see if the stated
Parser: A natural language parser is a program that works out the grammatical
structure of sentences, for instance, which groups of words go together (as
"phrases") and which words are the subject or object of a verb.
Target Language: The language into which someone or an application translates
or interprets.
CHAPTER TWO
Literature Review
One theory argues that all languages share the same origin but slowly
evolved into thoroughly different entities, just as animals did. However,
the claim of a single root for all languages requires more evidence.
The first language on earth might be the origin of all languages or a dead language
that fathered only a few of today’s languages. Since language is estimated to be about 150,000 years old
and writing only about 6,000, no written evidence exists that can answer this
question.
The origin of language was perhaps the need to communicate. Maybe the initial
words were only howls and hoots, but eventually, they evolved to form a
The Babel myth documented in Genesis 11:6-9 indicates that there was a time
when all human beings spoke one language. Men later developed an inordinate
ambition of building a city and a tower contrary to the Creator’s plan and purpose.
different groups of people to occupy different parts of the universe. The resources
of nature are not evenly distributed; what is found in one part may not be found in
another. This made people travel from one region to another to meet their needs.
information one needs to play a full part in society (Andy & Hany, 2009).
Human translators have practical world knowledge which gives them the ability
to determine the proper connection between the words and between the sentences
throughout the document. This way the translator creates a legible document that is
also logical and contains correct grammar and accurate connections. Although
human translators produce accurate translations, only a limited number of
them are available. According to market studies, the demand for
translation outweighs its supply. Apart from being in short supply, human
translators are expensive, and translation takes considerable time. To keep
up with the ever-increasing demand for translation, translation technologies were
developed.
Machine translation was one of the first applications of computers, and the idea
was conceived even before the invention of computers (Hutchins, 1986). The fall
of Latin as the universal scientific language and the supposed inability of natural
Leibniz to come up with the idea of numerical codes for languages. Descartes, in a
letter dated 1629, described a universal language cipher, where the lexical
equivalents of all known languages would be given the same code (Hutchins,
1986). Such dictionaries were actually published by three people: by Cave Beck in
(Hutchins, 1986).
Automatic translation between human languages has been a long-term scientific
dream. The research started in the 1930s. The research was aimed at developing
software capable of translating from one language to another with minimal human
participation. The first recorded success in machine translation took place in the
middle of the 1950s when a team of scientists from Georgetown University, USA,
English. This success led many universities to establish their own development
centers for machine translation. The centers needed funds to proceed with the
research, and the government was appealed to for funding. The government in response
(ALPAC) which was commissioned in 1964. The committee was to report to the
government on the state of play with respect to machine translation as regards
quality, cost and prospect. The committee submitted a negative report in 1966 and
concluded that there was no shortage of human translators and that there was no
Optimism and enthusiasm for machine translation resurfaced in the 1980s for two
stimulated the demand for translation, and secondly because large-scale access to
establishing more business in different parts of the world and the need for
Omachonu, G, 2011, Fahime & Abbas, 2012, Finch & Hwang et al, 2005) provide
more details.
Machine translation approaches can be divided into different categories. Under this
classification, two main paradigms can be found: the rule-based approach and the
knowledge. The input is transferred to the target using a large set of sophisticated
require large initial investment and maintenance for every language pair (Egbunu,
F, 2013). Also within the empirical-based paradigm, two other approaches can be
built by human experts. The advantage is that, once the required techniques have
quickly developed for new language pairs using provided training data.
Although rule-based systems require a significant amount of linguistic knowledge,
the knowledge acquired for one natural language processing system may be reused
to build knowledge required for a similar task in another system. (Hieu, H, 2011)
approach for two main reasons: 1: less-resourced languages, for which large
are neither available nor easily affordable, and 2: for morphologically rich
languages, which even with the availability of corpora suffer from data sparseness.
It is clear from this argument that each of these technologies or approaches has
its strengths and weaknesses, which are discussed in detail in later sections.
based translation. The basic idea is that the words will be translated word by word,
usually without much consideration for context match between them (Ibrahim, S
source language is selected, and then looked up in the dictionary for the
corresponding word or sentence in the target language. That is why the literal
versatile.
This approach, also known as the first-generation approach, is the original and oldest
translation strategy, which was employed around the 1950s to 1960s when the need
for machine translation was mounting. It performs a word for word or phrase for
phrase translation. The word order of the target language text is the same as that of
the source language even where the target language does not permit the same word
order (Banjo & Jibowo, 2011). The method has no capacity for rearranging
syntactic construction or lexical selection. This means that the sentence is not
Osborne et al, 2006) the method depends heavily on a large bilingual dictionary and
lacks separation between analysis and generation. Given any source sentence, the
system picks up the direct equivalent of each source word from the bilingual
dictionary and presents the equivalents in the same order as the words appear in the source sentence.
Another problem with the literal translation approach is the lexical selection
problem. The approach does not analyze and translate words from their context.
This is especially problematic when the words to be translated have more than one
meaning.
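The two limitations described above, fixed word order and context-free lexical selection, can be seen in a minimal sketch of direct translation. This is an illustration only, not the system developed in this project: the dictionary, the function name `direct_translate`, and the English to Spanish pair are assumptions, chosen because the expected target forms are easy to check.

```python
# Toy bilingual dictionary (English -> Spanish), assumed for illustration.
# Each source word maps to exactly one target word, as in the direct approach.
BILINGUAL_DICT = {
    "the": "el",
    "red": "rojo",
    "car": "coche",
    "river": "río",
    "bank": "banco",  # the financial sense only; a river bank is "orilla"
}


def direct_translate(sentence: str) -> str:
    """Translate word by word, keeping the source word order."""
    words = sentence.lower().split()
    # Words missing from the dictionary are passed through unchanged.
    return " ".join(BILINGUAL_DICT.get(word, word) for word in words)


print(direct_translate("the red car"))     # el rojo coche
print(direct_translate("the river bank"))  # el río banco
```

The first output keeps the English adjective-noun order, although Spanish expects "el coche rojo". The second picks the financial sense of "bank" because the lookup never consults the neighbouring word "river".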
One more obvious limitation of this method which was pointed out by (Roberto, N,
These limitations notwithstanding the direct model has one advantage, it is highly
Along with the development of the literal translation method, the transfer-based
sentence structure and generates the target-language text based on the different
The transfer model belongs to the second generation of machine translation (mid-1960s
to 1980s). This model is more advanced than the direct model because
it does not merely perform local and morphological analysis but carries out both
comprehensive analysis of the source text. It has rules that map the grammatical
rules which are used for the structural transformation of phrases and resolving
ambiguity are stored in a database (Howard, J, 1982). The rules are also stored as
facts in a rule base (Rekha & Neha, 2012). It is these rules that ensure that the
The advantages of the transfer-based approach were discussed in (Ikani, F, 2010,
Gurlen & Navjor, 2013, Banjo & Jibowo, 2011). These advantages which make
system, one need only provide transfer components for the new language pair(s),
and monolingual components (analysis and synthesis) for the new languages.
less time and effort than Interlingua. Four, acquisition of linguistic knowledge is
easy, and the relevant set of rules is easy to construct, understand, modify and
maintain. Five, ambiguities that carry over from one language to another are
inherent limitations namely: A large set of transfer rules must be created for each
TAUM (Arnold & Sadler, 1990) and METEO (Asad & Habib, 2013) are examples
of transfer-based systems.
SMT uses statistical analysis and predictive algorithms to define rules that are best
suited for target sentence translation. These models are trained using a bilingual
corpus. Depending on the subject matter of the text used for training, an SMT system will
be best suited for documents pertaining to the same subject. Usually, a solid corpus
IBM in the late 1980s and early 1990s (Nagao, M, 1984, Newmark, P. 1988).
Though SMT arrived on the scene late, it has become the de facto technology for
community and the commercial sector. The requirement for using SMT approach
aligned bilingual corpora (Amparo, A, 2014). The progress of SMT has been
supported by the availability of large parallel corpora such as the Arabic–English
[96], the Europarl corpus (Omachonu & David, 2012) and the JRC-Acquis corpus
The notion of SMT implies the use of statistics. It is based on statistics derived
from the corpora of the naturally occurring language. The translation is done
(e.g. English) is the translation of the string f in the source language (e.g. French).
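The relationship between the source string f and the target string e is commonly written in the noisy-channel form shown below. This is the standard textbook formulation, offered here as an assumption about what the surrounding description refers to, since the source sentence is incomplete. Here P(e) is a language model over target strings, P(f | e) is a translation model estimated from the bilingual corpus, and P(f) is dropped because it is constant for a given input.

```latex
\hat{e} \;=\; \arg\max_{e} P(e \mid f)
        \;=\; \arg\max_{e} \frac{P(f \mid e)\, P(e)}{P(f)}
        \;=\; \arg\max_{e} P(f \mid e)\, P(e)
```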
introduced by Makoto Nagao in 1984 (Omar & Nazlia et al, 2010). It performs
example stored in a textual database and input expressions are rendered in the
target language by retrieving from the database that example which is most similar
to the input. EBMT has been described by different researchers in different ways;
(Carbonell & Klein et al, 2006) called it “case based”, (Nameh & Fakhrahmad et al,
EBMT and SMT share some similarities both in strengths and weaknesses.
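The retrieval step described above, finding the stored example most similar to the input, can be sketched as follows. This is a simplified illustration under stated assumptions: the example base, the function names, and the word-level similarity measure from Python's `difflib` are choices made for this sketch, and a full EBMT system would also recombine fragments of several examples rather than return one stored translation.

```python
from difflib import SequenceMatcher
from typing import Tuple

# Hypothetical example base: source sentences paired with stored translations.
# Spanish is used as the target only because the pairs are easy to verify.
EXAMPLES = [
    ("good morning", "buenos días"),
    ("thank you very much", "muchas gracias"),
    ("where is the market", "dónde está el mercado"),
]


def similarity(a: str, b: str) -> float:
    """Word-level similarity ratio between two sentences (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def ebmt_translate(sentence: str) -> Tuple[str, float]:
    """Return the translation of the most similar stored example and its score."""
    source, target = max(EXAMPLES, key=lambda pair: similarity(sentence, pair[0]))
    return target, similarity(sentence, source)


print(ebmt_translate("thank you"))
```

The score gives a rough indication of how far the retrieved example is from the input; a low score suggests the example base has no close match.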
Both rule-based and empirical-based approaches discussed so far have their
inherent strengths and weaknesses. The rule-based technology requires significant
linguistic expertise, which has to be manually encoded into its data structures and
and robustness. The empirical-based technology needs a large amount of data, which
is usually not readily available, especially for resource-poor languages; it also fails
when selection preferences need to be based on distant words. Due to the inherent
weaknesses of these approaches none has been able to singly achieve the required
level of accuracy and quality in translation. This situation led to the adoption of a
satisfactory result. Some popular machine translation systems which employ the
hybrid technology are PROMT, SYSTRAN and Asia Online. AppTek delivered
In 2017, Machine Translation made another technological leap with the advent of
power of Artificial Intelligence (AI) and uses neural networks to generate
translations.
deep learning algorithm in which massive datasets of translated sentences are used
NMT was introduced by (Kalchbrenner & Phil, 2013, Sutskever & Oriol et al, 2014,
Cho & Bart et al, 2014) who defined Recurrent Neural Network (RNN) models
Before NMT, statistical machine translation (SMT) provided state-of-the-art
results. While many initially believed that SMT would eventually become the
that went into a single translation model and lack of generalizability of a model
At a very high level, NMT models comprise an Encoder and a Decoder,
both of which are Recurrent Neural Networks that are trained jointly. An attention
mechanism helps align the input tokens with the output tokens in order to facilitate
the translation. The Encoder reads the input sentence and generates a sequence of
hidden states. These hidden states are then used by the Decoder to generate a
Although Neural machine translation has emerged as the most promising machine
benchmarks and rapid adoption in deployments by, e.g., Google, Systran, and
WIPO there have been reports of poor performance, such as the systems built
CHAPTER THREE
words are formed and modified. Some languages, like English, rely heavily on
new words, while others, like Chinese, utilize more analytic or isolating
morphological strategies.
Syntax: Syntax refers to the rules governing the combination of words to form
phrases, clauses, and sentences. Languages exhibit diverse word order patterns
Semantics: Semantic structures vary in how meanings are encoded and interpreted.
semantics (meaning derived from the combination of words and phrases), and
These differences reflect the rich diversity of human linguistic expression and the
The two languages, English and Idoma, were carefully studied in terms
experience and knowledge about Idoma history. The rules that govern the
Idoma, like English, has a fixed SVO (subject, verb, object) word order, but the
arrangement of words in noun phrases and adjective phrases is not the same.
English places modifiers before nouns in noun phrases; Idoma does the reverse,
languages was carried out. This knowledge provided the basis for the design of the
rule base, the inference engine and the full-form lexicon which are essential
components of the proposed rule based system for automatic translation of English
2016, Ugwu, Imoh & Yakubu, 201..) the following transformational rules that
govern the translation of English and Idoma phrases, presented in the tables below,
were formulated.
It was observed that the Idoma language has no auxiliary verbs, and some
English words such as is, are, of, a and an have no equivalents in Idoma.
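The two observations above, that modifiers follow the noun in Idoma and that certain English words have no Idoma equivalent, can be expressed as a small structural transfer step over tagged English tokens. The sketch below is an assumption-laden illustration, not the rule base of this project: the function name, the noun tag "N", the tag "DET", the choice to treat only ADJ as a modifier, and the decision to keep several modifiers in their original relative order are all hypothetical. It reorders English tokens only; replacing each word with its Idoma equivalent would be a separate lexical step.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (English word, part-of-speech tag)

# English words the text lists as having no Idoma equivalent.
NO_EQUIVALENT = {"is", "are", "of", "a", "an"}

# ADJ comes from the tag table; using "N" for nouns and treating only ADJ
# as a noun modifier are assumptions of this sketch.
MODIFIER_TAGS = {"ADJ"}
NOUN_TAG = "N"


def apply_transfer_rules(tokens: List[Token]) -> List[Token]:
    """Drop words without equivalents, then move modifiers after their noun."""
    kept = [t for t in tokens if t[0].lower() not in NO_EQUIVALENT]

    output: List[Token] = []
    pending: List[Token] = []  # modifiers waiting for the noun they precede
    for token in kept:
        if token[1] in MODIFIER_TAGS:
            pending.append(token)
        elif token[1] == NOUN_TAG:
            output.append(token)    # noun first
            output.extend(pending)  # then its modifiers, in original order
            pending = []
        else:
            output.extend(pending)  # modifiers not followed by a noun stay put
            pending = []
            output.append(token)
    output.extend(pending)
    return output


tagged = [("a", "DET"), ("big", "ADJ"), ("house", "N")]
print(apply_transfer_rules(tagged))  # [('house', 'N'), ('big', 'ADJ')]
```

Keeping the deletion and reordering rules separate from the lexicon means new rules can be added without touching the dictionary entries.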
3.1.1 Part of Speech tag system
Words are grouped into categories called parts of speech. There are eight parts of
The meaning of some words in Idoma depends on the nouns that follow them. For
this reason, we had to develop a part-of-speech tag system for the machine
translation system that differs from the conventional part-of-speech system so that meanings can
be appropriately conveyed. The part-of-speech tag system is presented in Table 3.6
below.
S/N   Tag   Description
14    ADJ   Adjective
15    ADV   Adverb
16    CDN   Cardinal Numbers
17    ODN   Ordinal Numbers
18    QTN   Quantifiers
19    DEM   Demonstratives
20    PDT   Predeterminers
21    PRP   Preposition
22    V     General verbs
3.2.2 Strength of the present system
knowledge. This makes the translation of English to Idoma carried out by human
The proposed rule based machine translation system from English to Idoma
language will overcome the identified weaknesses of the present system and
3.3 System design
[Figure: system design showing the Bilingual Dictionary DB, the Rule Engine, the arrays of English words, POS and Idoma equivalents, and the output Idoma sentence]
The full-form bilingual lexicon, which contains the English words and Idoma
equivalents together with the parts of speech, was developed on the Microsoft Access
database platform. The rule engine, which applies a collection of lexical and
syntactic transfer rules to generate the Idoma equivalent of the English sentence
3.3.2 Database Specification and Platform
The relational database model was used for building the database, and MySQL is
the relational database management system of choice. The database schema is described below:
This is the full-form bilingual dictionary, which contains an English word and the
Figure 3.2 displays the layout of the input and output screen. The user enters the
text to be translated in the space in the first section under Enter English text to
Translate and clicks the Translate button. The output of the translation is
displayed in the second section of the screen under Translated Idoma text.
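The lexicon lookup behind the Translate action, together with the storing of translations, can be sketched as follows. This is a hedged illustration rather than the project's implementation: Python with an in-memory SQLite database stands in for the database platform and scripting language described in this chapter, the table and column names are hypothetical, and the Idoma values are placeholders, not real Idoma words. The rule engine step is deliberately left out so that only storage and lookup are shown.

```python
import sqlite3

# In-memory SQLite stands in for the project's database (an assumption).
conn = sqlite3.connect(":memory:")

# lexicon: one row per English full form, its POS tag, and its Idoma equivalent.
conn.execute(
    """
    CREATE TABLE lexicon (
        english TEXT NOT NULL,
        pos     TEXT NOT NULL,
        idoma   TEXT NOT NULL,
        PRIMARY KEY (english, pos)
    )
    """
)
# translations: stored input/output pairs, so results can be printed later.
conn.execute(
    "CREATE TABLE translations (id INTEGER PRIMARY KEY, source TEXT, target TEXT)"
)

# Placeholder entries: the idoma values are NOT real Idoma words.
conn.executemany(
    "INSERT INTO lexicon VALUES (?, ?, ?)",
    [("house", "N", "<idoma:house>"), ("big", "ADJ", "<idoma:big>")],
)


def lookup(word: str):
    """Return (pos, idoma) for an English word, or None if it is not listed."""
    return conn.execute(
        "SELECT pos, idoma FROM lexicon WHERE english = ?", (word.lower(),)
    ).fetchone()


def translate_and_store(text: str) -> str:
    """Look up each word, keep unknown words unchanged, and store the result."""
    words = text.split()
    entries = [lookup(word) for word in words]
    target = " ".join(e[1] if e else w for e, w in zip(entries, words))
    conn.execute(
        "INSERT INTO translations (source, target) VALUES (?, ?)", (text, target)
    )
    return target
```

Calling `translate_and_store("big house")` returns the placeholder equivalents in source order and adds one row to the `translations` table.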
English to Idoma Automatic Translator
The inference engine, which contains the rules for translating input sentence in
The name PHP stands for PHP: Hypertext Preprocessor and denotes a server-side
scripting language, which suggests that applications written thereon run on web
one of the main languages for developing web-based applications. Leading social
networking sites like Facebook and reputed organizations like Harvard University
both use PHP, which makes PHP popular and increases its credibility.