EIdoma Translator

CHAPTER ONE
Introduction
1.1 Background of Research
The ever-increasing need for cross-regional communication and information
exchange has made translation from one human language to another a matter of
absolute necessity in today’s highly globalized and networked world.
Language is the medium of communication. Human language serves to
communicate ideas, emotions, feelings and desires, to enable co-operation among
social groups, and to exhibit habits, all of which can be conveyed along a variety
of channels (Bamisaye O, 2000). There are over 6,800 living languages in the
world, which reflects the scope of linguistic and cultural diversity. Access to
information written in another language is of great interest, and the means of
sharing information across languages is translation; developing technologies for
translating from one language to another is therefore very important. Without
translation, there can be no cross-regional communication, and many voices will
not be heard.
Koehn P (2009) showed that, owing to differences in culture and the multilingual
environment in India, inter-language translation was necessary for the transfer of
information and the sharing of ideas. The need for translation is also very evident
in the business community. It has been observed that language barriers between
companies and their global customers are stifling economic growth: forty-nine
percent of executives say a language barrier has stood in the way of a major
international business deal, and nearly two-thirds (64 percent) of those same
executives say language barriers make it difficult to gain a foothold in
international markets. Whether inside or outside a company, global audiences
prefer to communicate in their native languages, which increases efficiency and
receptivity and allows for easier understanding of concepts (Ayegba F, 2016).
Language translation is imperative in the globally united and yet linguistically and
culturally separated world in which we live.
Humans were originally responsible for translating from one language to another.
At some point the supply of translation services could no longer keep pace with the
demand for translated content. Moreover, human translation is costly, time
consuming and inadequate for addressing the real-time needs of businesses serving
multilingual prospects, partners and customers (Ayegba F, 2016). The inherent
limitations of human translation made the search for an alternative means of
translation paramount. The search led to what is known today as machine
translation or computer-assisted translation. Machine Translation (MT) is
defined as the use of computers to translate messages in the form of text or speech
from one natural (human) language into another (Ahmed & Mohd, 2014); in practice,
it is the process of using software to translate text from one natural language to
another. This need has prompted research organizations and government agencies to
develop tools for the automatic translation of text in an attempt to achieve wider
outreach and bridge the gap of language diversity (Koehn P, 2010).
MT has proved to be of social, political, scientific and philosophical importance.
Social and political importance emerges from the necessity to understand the other.
Binational or multinational countries and organizations need to translate great
volumes of texts into many languages in a very limited time. For instance, the
European Union allocates around €330m a year to translate from and into 23
official languages. In addition, the Union allocates nearly 1% of its annual budget
for all its language services (DG Translation official website, 2014). The European
Union uses an internal machine translation engine, which has shifted from a
rule-based to a statistical MT system in recent years. Commercial importance emerges from
the fact that for each step in international markets, from business agreements to
instruction manuals, translation is a requirement for people to interact with each
other. Delays in translation can be costly, so MT can help translators and
trading parties work more efficiently.
Nigeria is the most populous country in Africa with a population of about 200
million people. It is also the seventh largest country in the world (Ayegba F, 2016).
Nigeria is a multilingual country with over 500 ethnic groups. This shows the level
of linguistic and cultural diversity in the country. Idoma is one of the ethnic groups
in Nigeria.
The Idoma are a people who primarily inhabit the lower western areas of Benue
State, Nigeria; some of them can also be found in Taraba State, Cross River
State, Enugu State, Kogi State and Nasarawa State. The Idoma
language is classified in the Akweya subgroup of the Idomoid languages of
the Volta–Niger family, which includes the Igede, Alago, Agatu, Etulo, Ete, Akweya
(Akpa) and Yala languages of Benue, Nasarawa, Kogi, Enugu and northern Cross
River states. The Akweya subgroup is closely related to the Yatye-Akpa
subgroup. The bulk of the territory is inland, south of the River Benue, some
seventy-two kilometres east of its confluence with the River Niger. The Idoma tribe are known to
be 'warriors' and 'hunters' of class, but hospitable and peace-loving. The greater
part of Idoma land remained largely unknown to the West until the 1920s, leaving
much of the colourful traditional culture of the Idoma intact. The population of the
Idomas is estimated to be about 3.5 million. The Idoma people have a traditional
ruler called the Och'Idoma who is the head of the Idoma Area Traditional Council.
This office was introduced by the British. Each community has its own traditional
chief, such as the former Ad'Ogbadibo of Orokam, the late Chief D.E. Enenche. The
palace of the Och'Idoma is located at Otukpo, Benue State. The present Och'Idoma,
HRM Elaigwu Odogbo John, the 5th Och'Idoma of the Idoma people, was installed on
30 June 2022 following the passing of his predecessor, HRH Agabaidu Elias
Ikoyi Obekpa, who ruled from 1996 to October 2021. Past Och'Idomas also
include HRH Agabaidu Edwin Ogbu, who reigned from 1996 to 1997; HRH
Abraham Ajene Okpabi of Igede descent, who ruled from 1960 to 1995; and HRH
Agabaidu Ogiri Oko, whose reign took place between 1948 and 1959.
1.2 Statement of Problem

The benefits of machine translation systems have social, political, scientific,
philosophical and economic dimensions. The absence of machine translation
applications for the Idoma language has shut its speakers out of these benefits.
The development of a system for the automatic translation of English content
into Idoma has therefore become imperative. This research seeks to address this
challenge and bring to the Idoma people the numerous benefits associated with
machine translation.
1.3 Aim and Objectives of Project
The aim of the project is to develop a system that can be used to translate
simple English sentences into the Idoma language.
The specific objective is to develop a language processor that will have the
capacity to:
i. Accept as input a sentence in English and translate it into Idoma.
ii. Store such translations and print out the translated output if so desired.
1.4 Scope of Research
The project is aimed at creating a machine translation system that will accept
English source sentences and generate the equivalent Idoma sentences. The system
is unidirectional: it translates from English to Idoma only, not from Idoma to
English.
1.5 Significance of Research
The need for this research arose from the fact that a lot of information and
possibilities remain hidden from the Idoma people who have little or no knowledge
of English. The research will provide Idoma people greater access to information.
It will also give Idoma language a public profile in the information technology
world and provide a platform for people to really appreciate the beauty of their
indigenous language.
1.6 Limitations of Research
A survey of the existing literature and the internet shows that Idoma is not a
well-documented language. Linguistic resources such as parsers, morphological
analyzers, parallel corpora, part-of-speech taggers and bilingual dictionaries, which
facilitate the rapid development of machine translation applications, are
unavailable either in hard copy or in digital form. This greatly impeded the
development process.
1.7 Project Outline
This thesis is organized into five chapters and a number of appendices.
Chapter One: Presents the background, statement of the problem, aim and
objectives, scope, significance, limitations and organization of the project, which
lay the foundation for the rest of the work.
Chapter Two: Explores the existing literature on language and machine translation
and the work done by scholars in this field. It describes language translators with
examples, and discusses the components of machine translators, the procedures for
language translation and the various technologies for building language translators.
Chapter Three: Presents the analysis of the Idoma and English languages.
Existing methods of English-Idoma translation are studied, and the challenges of
translating from English to Idoma are identified and discussed.
Chapter Four: Presents the implementation of the proposed system, subjects the
machine translator to a translation session from English to Idoma, and discusses
the testing and evaluation of the translator. The output is tested to see whether
the stated objectives of the research were achieved.
Chapter Five: Presents the summary, conclusions and recommendations for
further research.
1.8 Definition of Technical Terms
Morphological Analyzer: A morphological analyzer is a program for analyzing
the morphology of an input word; it identifies the morphemes in a text.
Morpheme: The minimal distinctive unit of grammar in a language.
Parser: A natural language parser is a program that works out the grammatical
structure of sentences, for instance, which groups of words go together (as
"phrases") and which words are the subject or object of a verb.
Corpus: In linguistics, a corpus (plural corpora) or text corpus is a large and
structured set of texts, usually electronically stored and processed.
Parallel Corpus: A parallel corpus is a corpus that contains a collection of original
texts in language L1 and their translations into a set of languages L2 ... Ln. In most
cases, parallel corpora contain data from only two languages.
Bilingual Dictionary: A bilingual dictionary or translation dictionary is a
specialized dictionary used to translate words or phrases from one language to
another.
Part-of-Speech Tagger: A Part-of-Speech Tagger (POS Tagger) is a piece of
software that reads text in some language and assigns a part of speech to each word
(and other token), such as noun, verb, adjective, etc., although computational
applications generally use more fine-grained POS tags like 'noun-plural'.
Source Language: The language in which a text appears that is to be translated
into another language.
Target Language: The language into which someone or an application translates
or interprets.
CHAPTER TWO
Literature Review
2.0 Origin of languages and the need for Translation
Language is thought to have emerged around 150,000 years ago to meet humans’
communication needs. The origin of language remains under debate, as evidence of
languages before writing is almost impossible to find.
One theory argues that all languages share a single origin but slowly evolved into
thoroughly different entities, much as animal species did. However, establishing a
common root for all languages requires more evidence.
The first language on earth might be the origin of all languages, or a dead language
that fathered only a few of today’s languages. Since language is about 150,000
years old and writing only about 6,000, no written evidence can answer this
question.
The origin of language was perhaps the need to communicate. The initial words
may have been only howls and hoots, but eventually they evolved into a
systematic means of communication for humans.
The Babel account documented in Genesis 11:6-9 indicates that there was a time
when all human beings spoke one language. Men later developed an inordinate
ambition to build a city and a tower contrary to the Creator’s plan and purpose.
As a result, God gave people different languages. This resulted in the movement of
different groups of people to occupy different parts of the world. The resources
of nature are not evenly distributed; what is found in one part may not be found in
another. This led people to travel from one region to another to meet their needs.
The resulting need for a means of information exchange led to translation.
Translation is necessary for cross-regional communication and for gathering the
information one needs to play a full part in society (Andy & Hany, 2009).
Translation is a social, political, scientific, philosophical as well as economic
necessity in our multilingual society. Translation is essential for international and
intercultural activity, for it facilitates mutual understanding among different and
conflicting racial, ethnic, religious and cultural groups.
2.1 Human-driven Translation
Human translators have practical world knowledge, which gives them the ability
to determine the proper connections between words and between sentences
throughout a document. In this way the translator creates a readable document that
is logical, grammatically correct and coherent. Although human translators produce
accurate translations, only a limited number of them are available. According to
market studies, the demand for translation outweighs its supply. Apart from being
in short supply, human translators are expensive and translation is time consuming.
To keep up with the ever-increasing demand for translation, translation
technologies were developed.
2.2 History of Machine Translation
Machine translation was one of the first applications of computers, and the idea
was conceived even before the invention of computers (Hutchins, 1986). The decline
of Latin as the universal scientific language and the supposed inability of natural
languages to express thought unambiguously led thinkers such as Descartes and
Leibniz to come up with the idea of numerical codes for languages. Descartes, in a
letter dated 1629, described a universal language cipher in which the lexical
equivalents of all known languages would be given the same code (Hutchins,
1986). Such dictionaries were actually published by three people: Cave Beck in
1657, Johann Joachim Becher in 1661 and Athanasius Kircher in 1663
(Hutchins, 1986).
Automatic translation between human languages has been a long-term scientific
dream. The research started in the 1930s and was aimed at developing
software capable of translating from one language to another with minimal human
participation. The first recorded success in machine translation took place in the
middle of the 1950s, when a team of scientists from Georgetown University, USA,
had a machine successfully translate a number of sentences from Russian to
English. This success led many universities to establish their own development
centers for machine translation. The centers needed funds to proceed with the
research, and the government was approached for funding. In response, the
government set up the Automatic Language Processing Advisory Committee
(ALPAC) in 1964. The committee was to report to the government on the state of
play in machine translation with regard to quality, cost and prospects. The
committee submitted a negative report in 1966, concluding that there was no
shortage of human translators and that there was no immediate prospect of machine
translation producing useful translation of general scientific text. The report led
to the withdrawal of funding and to demoralization. Research in machine
translation came to a standstill at this point.
Optimism and enthusiasm for machine translation resurfaced in the 1980s for two
reasons: first, the administrative and commercial needs of multilingual communities
stimulated the demand for translation; second, large-scale access to
personal computers and word-processing programs produced a market for less
expensive machine translation systems. The important machine translation
applications developed at this time were GETA-Ariane (Grenoble), SUSY
(Saarbrücken), MU (Kyoto), and Eurotra (the European Union).
The beginning of the 1990s also witnessed vital developments in machine
translation with the emergence of different translation technologies. To this day
machine translation has continued to progress, fueled by competition to
establish more business in different parts of the world and by the need for
localization of industrial products and services as well as the provision of
information to a global audience. (Andy & Hany, 2009; Maryann F, 2009;
Omachonu, G, 2011; Fahime & Abbas, 2012; Finch & Hwang et al., 2005) provide
more details.
2.3 Machine Translation Approaches
Machine translation approaches can be divided into different categories. Two main
paradigms can be found: the rule-based approach and the empirical or data-driven
approach. Rule-based translation systems can be divided into three categories: the
literal translation method, the interlingua-based method and the transfer-based
method. Rule-based systems are built on linguistically informed foundations
requiring extensive morphological, syntactic and semantic knowledge. The input is
transferred to the target language using a large set of sophisticated linguistic
translation rules. Translation rules are created manually, demanding significant
multilingual and linguistic expertise. Therefore, rule-based systems require a large
initial investment and ongoing maintenance for every language pair (Egbunu,
F, 2013). Within the empirical paradigm, further approaches can be distinguished:
example-based, statistical and context-based (Ibrahim, S, 2014). Under the
empirical approach, knowledge is automatically extracted by analyzing translation
examples from a parallel corpus built by human experts. The advantage is that,
once the required techniques have been developed for a given language pair, MT
systems should, in theory, be quickly developed for new language pairs using the
provided training data.
Although rule-based systems require a significant amount of linguistic knowledge,
the knowledge acquired for one natural language processing system may be reused
to build the knowledge required for a similar task in another system. Hieu, H (2011)
posited that the rule-based approach is better than its corpus-based counterpart in
two main cases: (1) less-resourced languages, for which large corpora, possibly
parallel or bilingual, with representative structures and entities are neither
available nor easily affordable; and (2) morphologically rich languages, which
even with the availability of corpora suffer from data sparseness.
It is clear from this argument that each of these approaches has its strengths and
weaknesses, which will be discussed in detail in later sections.
2.3.1 Literal translation method
The literal translation method is a simple form of rule-based machine translation.
It is also called direct translation, word-based translation or dictionary-based
translation. The basic idea is that the text is translated word by word, usually
without much consideration of context (Ibrahim, S, 2014). It basically works as
follows: a word or phrase from the source language is selected and then looked up
in a dictionary for the corresponding word or phrase in the target language. For
this reason a literal translation system is generally designed for a particular
language pair and is not versatile.
This approach, also known as the first-generation approach, is the original and
oldest translation strategy; it was employed from around the 1950s to the 1960s,
when the need for machine translation was mounting. It performs a word-for-word
or phrase-for-phrase translation. The word order of the target language text is the
same as that of the source language, even where the target language does not
permit that word order (Banjo & Jibowo, 2011). The method has no capacity for
rearranging syntactic constructions or for lexical selection, because the sentence
is not analyzed structurally or morphologically. It maps directly from source
language to target language with very minimal analysis. According to
Callison-Burch & Osborne et al. (2006), the method depends heavily on a large
bilingual dictionary and lacks separation between analysis and generation. Given
any source sentence, the system picks the direct equivalent of each source word
from the bilingual dictionary and presents the equivalents in the same order as the
words appear in the source sentence.
Another problem with the literal translation approach is lexical selection. The
approach does not analyze and translate words in their context. This is especially
problematic when the words to be translated have more than one meaning.
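The word-for-word procedure described in this section can be sketched in a few lines of Python. This is only an illustrative sketch: the tiny English-to-French lexicon and the function name translate_word_for_word are assumptions introduced here (French is used because no verified Idoma lexicon is quoted in this text), and the pass-through of unknown words is one possible fallback rather than a documented feature of any particular system.

```python
# Toy English-to-French lexicon (an illustrative assumption, not an Idoma lexicon).
LEXICON = {
    "the": "la",
    "red": "rouge",
    "car": "voiture",
    "is": "est",
    "fast": "rapide",
}


def translate_word_for_word(sentence):
    """Look up each source word and keep the source word order unchanged."""
    words = sentence.lower().split()
    # Words missing from the lexicon are passed through unchanged (assumed fallback).
    return " ".join(LEXICON.get(word, word) for word in words)


literal = translate_word_for_word("The red car is fast")
print(literal)  # la rouge voiture est rapide
```

The output keeps the English adjective-noun order ("rouge voiture"), whereas French requires "voiture rouge". This is the word-order weakness of the direct method: nothing in the lookup step can rearrange the sentence.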
One more obvious limitation of this method, pointed out by Roberto, N (2009), is
lack of extensibility: adding a fresh language pair (direction of translation)
to a direct system is hardly distinguishable from creating an entirely new system.
These limitations notwithstanding, the direct model has one advantage: it is highly
robust and simple to implement.
2.3.2 Transfer Method
Along with the development of the literal translation method, the transfer-based
method was proposed. The transfer-based method performs an analysis of the
sentence structure and generates the target-language text based on the different
linguistic rules of the different languages.
The transfer model belongs to the second generation of machine translation
(mid-1960s to 1980s). This model is more advanced than the direct model because
it does not merely perform local and morphological analysis but also carries out
regional and grammatical analysis. In other words, the model conducts a
comprehensive analysis of the source text. It has rules that map the grammatical
segments of the source sentence to a representation in the target language. These
rules, which are used for the structural transformation of phrases and for resolving
ambiguity, are stored in a database (Howard, J, 1982); they are also described as
facts stored in a rule base (Rekha & Neha, 2012). It is these rules that ensure that
the translation is both morphologically and structurally correct in terms of word
order.
The advantages of the transfer-based approach were discussed in (Ikani, F, 2010;
Gurlen & Navjor, 2013; Banjo & Jibowo, 2011). These advantages, which make
the transfer-based architecture appealing to many researchers, include the
following. First, applicability: while it is difficult to reach the level of
abstractness required in interlingua systems, the level of analysis in transfer
models is attainable. Second, extensibility: to add a fresh language pair (or
direction of translation) to a transfer system, one need only provide transfer
components for the new language pair(s) and monolingual components (analysis and
synthesis) for the new languages. Existing monolingual components can be
preserved. For example, an English-Portuguese module may share several
transformations with an English-Spanish module. Third, ease of implementation:
developing a transfer-based system requires less time and effort than developing
an interlingua system. Fourth, acquisition of linguistic knowledge is easy, and the
relevant set of rules is easy to construct, understand, modify and maintain. Fifth,
ambiguities that carry over from one language to another are handled with minimal
effort.
The above advantages notwithstanding, the transfer-based architecture has an
inherent limitation: a large set of transfer rules must be created for each
source language/target language pair, so a translation system that accommodates n
languages requires on the order of n² sets of transfer rules (one for each of the
n(n − 1) ordered language pairs).
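The growth in the number of rule sets can be checked by enumerating ordered language pairs. The short sketch below is only an arithmetic illustration: the language list and the function name transfer_modules are assumptions chosen for the example, and each ordered pair stands for one set of transfer rules (English to Idoma and Idoma to English count separately).

```python
from itertools import permutations


def transfer_modules(languages):
    """Return every ordered (source, target) pair, one per transfer rule set."""
    return list(permutations(languages, 2))


# Hypothetical set of languages supported by one transfer-based system.
langs = ["English", "Idoma", "French", "Hausa"]
modules = transfer_modules(langs)

print(len(modules))  # 12, i.e. n * (n - 1) for n = 4
```

For n = 4 the system already needs 12 rule sets, and adding a fifth language raises the total to 20, which is why the cost of a transfer architecture grows roughly quadratically with the number of languages.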
TAUM (Arnold & Sadler, 1990) and METEO (Asad & Habib, 2013) are examples
of systems based on the transfer method.
2.3.3 Statistical Machine Translation (SMT)
SMT uses statistical analysis and predictive algorithms to learn the translation
rules best suited to a target sentence. The models are trained on a bilingual
corpus. An SMT system is best suited to documents on the same subject matter as
the text used to train it. Usually, a solid corpus requires about 100 million words
and 1 million aligned sentences to be effective. SMT can be divided into several
subgroups: word-based, phrase-based, syntax-based and hierarchical phrase-based.
The first statistical approach to MT was pioneered by a group of researchers at
IBM in the late 1980s and early 1990s (Nagao, M, 1984; Newmark, P, 1988).
Though SMT arrived on the scene relatively late, it has become the de facto
technology for building MT systems. It has gained tremendous momentum both in
the research community and in the commercial sector. The requirement for using
the SMT approach in machine translation is the availability of large, good-quality
and representative aligned bilingual corpora (Amparo, A, 2014). The progress of
SMT has been supported by the availability of large parallel corpora such as the
Arabic-English and Chinese-English parallel corpora distributed by the Linguistic
Data Consortium [96], the Europarl corpus (Omachonu & David, 2012) and the
JRC-Acquis corpus (Ralf & Bruno et al., 2006).
The notion of SMT implies the use of statistics. It is based on statistics derived
from corpora of naturally occurring language. Translation is done according to the
probability distribution p(e|f) that a string e in the target language
(e.g. English) is the translation of a string f in the source language (e.g. French).
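As a minimal sketch of how such a probability can drive translation, the example below picks the candidate e that maximizes p(f|e) · p(e), which by Bayes' rule is proportional to p(e|f). This decomposition into a translation model and a language model is the standard noisy-channel formulation, used here as an assumption since this text does not spell it out. All probabilities are hand-assigned toy values rather than corpus estimates, and the names TRANSLATION_MODEL, LANGUAGE_MODEL, score and decode are hypothetical.

```python
import math

# Toy, hand-assigned probabilities (illustrative assumptions, not corpus estimates).
# TRANSLATION_MODEL[(f, e)] stands for p(f | e); LANGUAGE_MODEL[e] stands for p(e).
TRANSLATION_MODEL = {
    ("la maison", "the house"): 0.7,
    ("la maison", "the home"): 0.6,
    ("la maison", "house the"): 0.7,
}
LANGUAGE_MODEL = {
    "the house": 0.05,
    "the home": 0.02,
    "house the": 0.0001,
}


def score(f, e):
    """Log of p(f|e) * p(e), which is proportional to p(e|f) by Bayes' rule."""
    return math.log(TRANSLATION_MODEL[(f, e)]) + math.log(LANGUAGE_MODEL[e])


def decode(f, candidates):
    """Return the candidate translation with the highest score."""
    return max(candidates, key=lambda e: score(f, e))


best = decode("la maison", list(LANGUAGE_MODEL))
print(best)  # the house
```

The candidate "house the" has the same translation-model score as "the house" but is rejected because the language model assigns it a very low probability; this is how the statistical approach prefers fluent target-language word order.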
2.3.4 Example Based Machine Translation (EBMT)
Example-based machine translation simulates the human translation process. It was
introduced by Makoto Nagao in 1984 (Omar & Nazlia et al., 2010). It performs
translation by retrieving, from a database of stored translation examples, the
example most similar to the input expression, and rendering the input in the
target language on the basis of that example's translation. EBMT has been
described by different researchers in different ways:
(Carbonell & Klein et al., 2006) called it “case-based”, (Nameh & Fakhrahmad et al.,
2011) called it “analogy-based” and [101] referred to it as “experience-guided”.
EBMT is an empirical or data-driven approach, and a major requirement for
its deployment is a parallel aligned corpus (Callison-Burch & Osborne et al., 2006).
EBMT and SMT share some similarities in both strengths and weaknesses.
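The retrieval step of EBMT can be sketched as a nearest-example search. The example below is a simplified illustration under stated assumptions: the three English-French example pairs are invented for the sketch, similarity is measured with Python's difflib.SequenceMatcher over word tokens (one of many possible similarity measures), and the adaptation step that a full EBMT system would apply to the mismatched words is omitted.

```python
from difflib import SequenceMatcher

# Hypothetical example base: (source sentence, stored translation).
EXAMPLES = [
    ("good morning", "bonjour"),
    ("thank you very much", "merci beaucoup"),
    ("the house is big", "la maison est grande"),
]


def similarity(a, b):
    """Word-level similarity between two sentences, between 0.0 and 1.0."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def retrieve(sentence):
    """Return the stored example closest to the input, with its similarity."""
    query = sentence.lower()
    source, target = max(EXAMPLES, key=lambda ex: similarity(query, ex[0]))
    return source, target, similarity(query, source)


match = retrieve("The house is very big")
print(match[0], "->", match[1])
```

Here the input differs from the stored example only by the word "very", so the stored translation is a close starting point; a complete system would still have to translate and insert the unmatched word.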
2.3.5 Hybrid Approach
Both the rule-based and empirical approaches discussed so far have their
inherent strengths and weaknesses. Rule-based technology requires significant
linguistic expertise, which has to be manually encoded into data structures and
algorithms either as special cases or as a full representation of the conceptual
content (Callison-Burch & Osborne et al., 2006). This impedes development speed
and robustness. Empirical technology needs a large amount of data, which
is usually not readily available, especially for resource-poor languages; it also
fails when selection preferences need to be based on distant words. Owing to these
inherent weaknesses, no single approach has been able to achieve the required
level of accuracy and quality in translation. This situation led to the adoption of a
hybrid approach. The hybrid approach is a machine translation technology that
integrates various machine translation technologies (Salem & Brian, 2009;
Cristina, E, 2010). The technologies complement each other to produce a more
satisfactory result. Some popular machine translation systems that employ
hybrid technology are PROMT, SYSTRAN and Asia Online. AppTek delivered
its first hybrid machine translation system in 2009.
2.3.6 Neural Machine Translation
In 2017, Machine Translation made another technological leap with the advent of
Neural Machine Translation (NMT). Neural Machine Translation harnesses the
power of Artificial Intelligence (AI) and uses neural networks to generate
translations.
Neural Machine Translation is the primary approach used in the industry to
perform machine translation. This state-of-the-art approach is an application of
deep learning in which massive datasets of translated sentences are used
to train a model capable of translating between two languages. It outperforms
phrase-based systems without the need to create handcrafted features such as
lexical or grammatical rules. It is used in translation services such as
Google Translate (https://translate.google.com/).
NMT was introduced by (Kalchbrenner & Phil, 2013; Sutskever & Oriol et al., 2014;
Cho & Bart et al., 2014), who defined Recurrent Neural Network (RNN) models
for machine translation.
Before NMT, statistical machine translation (SMT) provided state-of-the-art
results. While many initially believed that SMT would eventually become the
answer to machine translation, several issues, including the number of components
that went into a single translation model and the lack of generalizability of a
model, stalled SMT progress and prevented it from providing perfect translations.
At a very high level, NMT models comprise an Encoder and a Decoder,
both of which are Recurrent Neural Networks that are trained jointly. An attention
mechanism helps align the input tokens with the output tokens in order to facilitate
the translation. The Encoder reads the input sentence and generates a sequence of
hidden states. These hidden states are then used by the Decoder to generate a
sequence of output words, representing the translation of the input sentence.
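The role of the attention mechanism can be illustrated with a small, self-contained sketch of a single decoder step. Everything in it is an assumption made for illustration: the hidden states are hand-written two-dimensional vectors rather than the output of a trained RNN, the scoring function is a plain dot product (one of several attention variants), and the names softmax and attend are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def attend(decoder_state, encoder_states):
    """Return (weights, context) for one decoder step using dot-product scores."""
    scores = [sum(d * h for d, h in zip(decoder_state, state))
              for state in encoder_states]
    weights = softmax(scores)
    # The context vector is the weighted average of the encoder hidden states.
    context = [sum(w * state[i] for w, state in zip(weights, encoder_states))
               for i in range(len(decoder_state))]
    return weights, context


# Hypothetical encoder hidden states, one vector per input token.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Hypothetical decoder hidden state at the current output position.
decoder_state = [0.0, 2.0]

weights, context = attend(decoder_state, encoder_states)
print([round(w, 3) for w in weights])
```

The second and third input tokens receive most of the weight because their hidden states point in the same direction as the decoder state; the resulting context vector is what the Decoder would combine with its own state to predict the next output word.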
Although neural machine translation has emerged as the most promising machine
translation approach in recent years, showing superior performance on public
benchmarks and rapid adoption in deployments by, e.g., Google, Systran and
WIPO, there have been reports of poor performance, such as the systems built
under low-resource conditions in the DARPA LORELEI program.
A fundamental requirement for the deployment of NMT is the availability of a
massive dataset of translated sentences with which to train a model capable of
translating between two languages.
CHAPTER THREE
SYSTEMS ANALYSIS AND DESIGN
3.1 Language Differences
Despite the underlying universality of human languages, there are
significant differences in how their structural constituents are organized. These
differences can be observed at various levels of linguistic analysis, including
phonology, morphology, syntax, semantics and pragmatics.
Phonology: Languages vary in their phonological systems, including the inventory
of sounds (consonants, vowels, and tones), phonotactics (permissible arrangements
of sounds), and phonological rules (sound patterns and processes).
Morphology: Morphological structures differ across languages in terms of how
words are formed and modified. Some languages, like English, rely heavily on
affixes (prefixes, suffixes, infixes) to indicate grammatical relationships and derive
new words, while others, like Chinese, utilize more analytic or isolating
morphological strategies.
Syntax: Syntax refers to the rules governing the combination of words to form
phrases, clauses, and sentences. Languages exhibit diverse word order patterns
(e.g., subject-verb-object vs. subject-object-verb), syntactic constituency (e.g.,
head-initial vs. head-final), and syntactic agreement systems.
Semantics: Semantic structures vary in how meanings are encoded and interpreted.
This includes differences in lexical semantics (word meanings), compositional
semantics (meaning derived from the combination of words and phrases), and
pragmatic principles governing language use in context.
Pragmatics: Pragmatics deals with how language is used in context, including
principles of communication, speech acts, implicature, and discourse structure.
Cultural and situational factors influence pragmatic conventions, leading to
variation across languages.
These differences reflect the rich diversity of human linguistic expression and the
adaptability of language to diverse sociocultural environments. Linguists study
these variations to uncover universal principles underlying human language and to
understand the unique features of individual languages.
3.2 Analysis of English and Idoma Language
The two languages in the pair, English and Idoma, were carefully studied in terms
of syntax, semantics, morphology, parts of speech, word order, pluralization of
nouns, structural differences, etc. through document study, observation and
interaction with Idoma/English professionals and community elders with vast
experience and knowledge of Idoma history. The rules that govern the
combination of words to form correct sentences in Idoma were identified.
Idoma, like English, has a fixed SVO (subject, verb, object) word order, but the
arrangement of words in noun phrases and adjective phrases is not the same.
English places modifiers before nouns in noun phrases; Idoma does the reverse,
placing nouns before modifiers (Mary, 2016).
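The noun-before-modifier ordering can be sketched as a small reordering function. This is only an illustration under stated assumptions, not the rule set of the system: the function name is hypothetical, only the tags ADJ (adjective) and N (general noun) are handled, English words are left in place of their Idoma equivalents, and keeping several modifiers in their original relative order is an assumption rather than a documented fact about Idoma.

```python
def reorder_noun_phrase(tagged_words):
    """Move modifiers that precede a noun to the position after it.

    `tagged_words` is a list of (word, tag) pairs. Only the tags ADJ
    (adjective) and N (general noun) are handled in this sketch.
    """
    result = []
    pending_modifiers = []
    for word, tag in tagged_words:
        if tag == "ADJ":
            pending_modifiers.append((word, tag))  # hold until the noun arrives
        elif tag == "N":
            result.append((word, tag))             # noun first ...
            result.extend(pending_modifiers)       # ... then its modifiers
            pending_modifiers = []
        else:
            result.extend(pending_modifiers)       # no noun followed: keep order
            pending_modifiers = []
            result.append((word, tag))
    result.extend(pending_modifiers)               # trailing modifiers, if any
    return result

english_phrase = [("big", "ADJ"), ("house", "N")]
reordered = reorder_noun_phrase(english_phrase)
```

Here `reordered` holds the noun followed by its adjective, which is the order the lexical transfer step would then fill with Idoma words.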
A significant amount of linguistic knowledge is required for the successful
deployment of machine translation systems using rule-based machine
translation technology; therefore, a comprehensive study and analysis of the two
languages was carried out. This knowledge provided the basis for the design of the
rule base, the inference engine and the full-form lexicon, which are essential
components of the proposed rule-based system for automatic translation of English
sentences to Idoma sentences.
From the studies of Idoma sentence structure conducted by (Mary, 2016;
Ugwu, Imoh & Yakubu, 201..), the transformational rules that govern the
translation of English phrases into Idoma phrases were formulated. They are
presented in the tables below.
Table 3.1: Noun Phrases Transformational Rules
It was observed that the Idoma language has no auxiliary verbs and that some
English words, such as is, are, of, a and an, have no equivalents in Idoma.
Whenever they occur in a sentence, they are ignored in the translation.
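A minimal sketch of this step is given here, assuming the input has already been split into tokens. The set contains only the five words named above; the full list used by the system is not given in the text, and the function and variable names are hypothetical.

```python
# Words named in the text as having no Idoma equivalent (not an exhaustive list).
UNTRANSLATED_WORDS = {"is", "are", "of", "a", "an"}

def drop_untranslated(tokens):
    """Return the tokens with the untranslated English words removed."""
    return [t for t in tokens if t.lower() not in UNTRANSLATED_WORDS]

tokens = ["The", "boy", "is", "a", "student"]
kept = drop_untranslated(tokens)
```

The comparison is made on the lower-cased token so that a sentence-initial "A" or "Is" is dropped as well.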
3.1.1 Part-of-Speech Tag System
Words are grouped into categories called parts of speech. English is traditionally
described as having eight parts of speech: nouns, pronouns, verbs, adjectives,
adverbs, prepositions, conjunctions and interjections; determiners are often
treated as a further category. Idoma likewise has parts of speech such as nouns,
verbs, adjectives and adverbs.
The meaning of some words in Idoma depends on the nouns that follow them. For
this reason, we had to develop a part-of-speech tag system for the machine
translation that differs from the conventional part-of-speech system so that
meanings can be appropriately conveyed. The part-of-speech tag system is
presented in Table 3.6 below.
Table 3.6 Part of Speech Tags
S/N  Tag   Description
1    ANN   Animate singular noun
2    ANNS  Animate plural noun
3    NOP   Noun indicative of place or location
4    N     General nouns
5    VPP   Verb in present tense
6    VPD   Verb in past tense
7    VPT   Verb in past participle
8    PPN   Personal pronoun
9    POPN  Possessive pronoun
10   DA    Definite article
11   IDA   Indefinite article
12   SPV   Split verb
13   DRV   Direction verb
14   ADJ   Adjective
15   ADV   Adverb
16   CDN   Cardinal numbers
17   ODN   Ordinal numbers
18   QTN   Quantifiers
19   DEM   Demonstratives
20   PDT   Predeterminers
21   PRP   Preposition
22   V     General verbs
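One way a rule engine could use such fine-grained noun tags is sketched here. The text does not describe the actual mechanism, so this is an assumption: the lexicon is keyed by an English word together with the tag of the noun that follows it, so that the equivalent chosen depends on that noun. The entries are placeholders, not Idoma words, and all names in the code are hypothetical.

```python
# A few of the noun tags from the tag set above.
POS_TAGS = {
    "ANN": "Animate singular noun",
    "ANNS": "Animate plural noun",
    "NOP": "Noun indicative of place or location",
    "N": "General nouns",
}

# Hypothetical context-sensitive entries: each key pairs an English word with
# the tag of the noun that follows it. Values are placeholders, not Idoma.
CONTEXT_LEXICON = {
    ("old", "ANN"): "<old, animate sense>",
    ("old", "NOP"): "<old, place sense>",
}

def select_equivalent(word, next_noun_tag, default):
    """Pick the equivalent that matches the tag of the following noun."""
    if next_noun_tag not in POS_TAGS:
        raise ValueError(f"unknown tag: {next_noun_tag}")
    return CONTEXT_LEXICON.get((word, next_noun_tag), default)

animate_choice = select_equivalent("old", "ANN", "<old>")
general_choice = select_equivalent("old", "N", "<old>")
```

When no context-sensitive entry exists for a word and tag pair, the sketch falls back to the supplied default equivalent.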
3.2 Analysis of the Present System of English to Idoma Translation
3.2.1 Weaknesses of the Present System
The following weaknesses were identified in the present system of translating
from English to Idoma:
1. Professional translators are scarce.
2. The scarcity of professional translators limits the possibility of meeting
deadlines for large-volume assignments.
3. The cost of employing or hiring professional translators is prohibitive, and
only a handful of volunteer translators are available.
4. Human translation is very slow and time-consuming.
5. Lack of standardization in the Idoma language makes translation a difficult
task, especially with regard to technical documentation.
6. Lack of tools such as online dictionaries, glossaries and translation
memories inhibits the translation process.
3.2.2 Strength of the Present System
Human translators have practical world knowledge as well as pragmatic
knowledge. As a result, the English to Idoma translations produced by human
translators are of reasonably high quality.
3.2.3 Benefits of the Proposed Solution
The proposed rule-based machine translation system from English to Idoma
will overcome the identified weaknesses of the present system and
provide the following benefits:
1. Reduced translation cost. With sufficient translation volume, machine
translation is less expensive than human translation.
2. Improved delivery times. Delivery time for machine-generated translations is
limited only by the time it takes to revise them. In many applications
revision is not critical, so delivery is immediate.
3. Availability. MT systems have the advantage of being always on, so requests
are processed as they are received.
3.3 System Design
3.3.1 System Architecture

The conceptual architecture of the system is shown in Figure 3.1.
[Figure 3.1 components: English input; Preprocessing (Case Converter, Sanitizer,
Tokenizer); Bilingual Dictionary DB; Rule Engine; arrays of English words, POS and
Idoma equivalents; Idoma sentence]
Figure 3.1 System Architecture
The full-form bilingual lexicon, which contains the English words and their Idoma
equivalents together with their parts of speech, was developed on the Microsoft
Access database platform. The rule engine, which applies a collection of lexical and
syntactic transfer rules to generate the Idoma equivalent of the English sentence,
was developed in PHP.
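The flow through these components can be sketched end to end as follows. The sketch is written in Python for brevity and is not the PHP implementation of the system. Every function name is hypothetical, the dictionary entries are placeholders rather than Idoma words, the single reordering rule stands in for the full rule base, and skipping words that are missing from the dictionary is an assumption.

```python
import re

# Hypothetical dictionary rows: English word -> (Idoma placeholder, POS tag).
DICTIONARY = {
    "boy": ("<idoma:boy>", "ANN"),
    "good": ("<idoma:good>", "ADJ"),
}
NOUN_TAGS = ("N", "ANN", "ANNS", "NOP")

def convert_case(text):
    """Case converter: lower-case the input so lookups are uniform."""
    return text.lower()

def sanitize(text):
    """Sanitizer: drop everything except letters, digits and whitespace."""
    return re.sub(r"[^a-z0-9\s]", "", text)

def tokenize(text):
    """Tokenizer: split the cleaned sentence on whitespace."""
    return text.split()

def look_up(tokens):
    """Build parallel arrays of English words, POS tags and equivalents."""
    words, tags, equivalents = [], [], []
    for token in tokens:
        if token in DICTIONARY:  # assumption: unknown words are skipped
            equivalent, tag = DICTIONARY[token]
            words.append(token)
            tags.append(tag)
            equivalents.append(equivalent)
    return words, tags, equivalents

def apply_rules(tags, equivalents):
    """Stand-in rule engine with a single rule: noun before its adjective."""
    output = list(equivalents)
    i = 0
    while i < len(tags) - 1:
        if tags[i] == "ADJ" and tags[i + 1] in NOUN_TAGS:
            output[i], output[i + 1] = output[i + 1], output[i]
            i += 2  # skip the pair that was just reordered
        else:
            i += 1
    return " ".join(output)

def translate(sentence):
    """Run preprocessing, dictionary lookup and the rule engine in order."""
    tokens = tokenize(sanitize(convert_case(sentence)))
    _, tags, equivalents = look_up(tokens)
    return apply_rules(tags, equivalents)

result = translate("Good boy!")
```

For the input "Good boy!" the sketch lower-cases the text, removes the exclamation mark, looks up both tokens and emits the noun placeholder before the adjective placeholder.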
3.3.2 Database Specification and Platform
The relational database model was used for building the database, and MySQL is
the relational database management system of choice. The database schema is
described below:
1. Table Name: BilingualDictionary
S/NO  FIELD NAME   DATA TYPE   FIELD SIZE
1     Id           AUTONUMBER  AUTONUMBER
2     Englishword  VARCHAR     40
3     Idomaword    VARCHAR     70
4     pos          VARCHAR     7
Table 3.7 Bilingual Dictionary table structure
This full-form bilingual dictionary contains each English word, the
corresponding Idoma equivalent and other information necessary for translation.
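The table structure can be tried out with the sketch below, which uses Python's built-in sqlite3 module and an in-memory database as a stand-in for the database platform named in the text. The column names and sizes follow the table structure above; the NOT NULL constraints, the sample row (a placeholder, not an Idoma word) and the helper function are assumptions added for illustration.

```python
import sqlite3

# Throwaway in-memory database used only for this illustration.
connection = sqlite3.connect(":memory:")
connection.execute(
    """
    CREATE TABLE BilingualDictionary (
        Id          INTEGER PRIMARY KEY AUTOINCREMENT,
        Englishword VARCHAR(40) NOT NULL,
        Idomaword   VARCHAR(70) NOT NULL,
        pos         VARCHAR(7)  NOT NULL
    )
    """
)

# Placeholder row: the Idoma column holds a stand-in, not a real Idoma word.
connection.execute(
    "INSERT INTO BilingualDictionary (Englishword, Idomaword, pos) VALUES (?, ?, ?)",
    ("boy", "<idoma:boy>", "ANN"),
)

def look_up(word):
    """Return (Idomaword, pos) for an English word, or None if it is absent."""
    return connection.execute(
        "SELECT Idomaword, pos FROM BilingualDictionary WHERE Englishword = ?",
        (word,),
    ).fetchone()

entry = look_up("boy")
missing = look_up("girl")
```

The lookup uses a parameterized query, so the user's input is never spliced directly into the SQL text.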
3.3.3 Input/output Specification
Figure 3.2 displays the layout of the input and output screen. The user enters the
text to be translated in the first section, under "Enter English Text to
Translate", and clicks the Translate button. The output of the translation is
displayed in the second section of the screen, under "Translated Idoma Text".
[Figure 3.2: screen titled "English to Idoma Automatic Translator" with an
"Enter English Text to Translate" section and a "Translated Idoma Text" section]
Figure 3.2 Input/output Layout
3.3.4 Justification for Choice of Programming Language
The inference engine, which contains the rules for translating an input sentence
in English to Idoma, was developed using PHP.
PHP stands for PHP: Hypertext Preprocessor. It is a server-side scripting
language, which means that applications written in it run on web servers. PHP is
widely used in developing web applications and has become one of the main
languages for developing web-based applications. Leading social networking sites
like Facebook and reputable organizations like Harvard University both support
PHP, which adds to its popularity and credibility.