
CHAPTER ONE

1.1 INTRODUCTION

Living in the global village has brought man into contact with technology that seeks to express itself in every aspect of life, which may itself be viewed as culture. The concept of globalization pushes man back to nature, and nature back to nurture, in a back-and-forth manner. For instance, in Judaism, we are told the story of how the Supreme Being decided to frustrate a people whose common culture and language were a shared heritage for achieving a common purpose, even that of building a tower into the sky – the Tower of Babel.

When they were frustrated by the sudden collapse of what they were building and the sudden loss of a common language, no further progress was possible. Yet today man has not ceased in his quest to make progress: in international relations, bilateral trade and commerce, unionism, the economy, foreign representation, and so on. What "frustration" from a supreme being could halt these quests again? How else can the earth be more "global" if man cannot speak in one tongue, or men cannot communicate with one another as freely as possible?

In this research, we pursue an agenda of Example Based Machine Translation of English to the Igala language, the language of eastern Kogi. What, then, is Machine Translation? The idea is to use a machine (computer) to translate human languages, so that a sequence of text in one language can be rendered as a sequence of text in another language. Machine Translation (also MT) is not restricted to changing a sequence of text in language A into its equivalent in language B; it may be a translation of one language into more than one language, for example the translation of English into Hausa, Igbo, Igala, Itsekiri, etc. It also encompasses speech translation, where what a speaker says in one language is rendered as spoken words in another language. Example Based Machine Translation is, however, different from the translation of a programming language from high level to low level or to machine object code (Bowen et al., 1990).

In this research, attention is given to the translation of English sentences into Igala using Example Based Machine Translation.

1.2 BACKGROUND OF THE STUDY:

Example Based Machine Translation is no longer new, but it is an effort that keeps evolving. As early as the 1930s, patents were granted for the development of translation machines; Georges Artsrouni held a licence to develop a machine which he called the mechanical brain (Hutchins, 2005).

Earlier on, Machine Translation attempts centred on how to translate each sentence in one language into its equivalent in another language, with little concern given to the translation of "deeper meanings" (Nagao, 1984). Although those approaches were not effective, they paved the way for useful research. For instance, the earliest method of translation, called the Direct method, led to the idea of integrating syntactic and semantic rules into direct translation, resulting in a rise up the Vauquois triangle (Vauquois, 1968).

This rise was later developed into an approach that takes deep syntactic and semantic rules into consideration, called the Transfer method (Hutchins, 2005). When things became more complex, such that the application of rules was becoming difficult, statistical models for languages began to be built (Brown et al., 1990, 1993).

Today, translations are grouped according to whether they are done based on rules or based on large bodies of text called corpora. Our proposed translation of English sentences to Igala utilizes rule-based translation for reasons that are peculiar to the languages involved.

1.3 AIM:

The sole aim of this research work is to develop a system that translates English sentences into their linguistic equivalents in the Igala language.

1.4 OBJECTIVES:

The objectives of this research are:

a) To formulate an English-to-Igala transfer-based, example-based machine translation model for the translation of English language sentences into Igala language sentences.

b) To implement and evaluate the model developed in (a) above.

1.5 SCOPE OF THE RESEARCH:

This research is intended to cover the translation of English sentences to Igala sentences with the use of the Transfer method of Example Based Machine Translation.

No special field or register is considered; ordinary everyday conversational English sentences were used.

1.6 STATEMENT OF PROBLEM:

Given any input text, a translation model is expected to output the equivalent of the input text in the target language. Thus, for our language pair, we must develop a Transfer-based model for unidirectional (English-to-Igala) translation.

1.7 MOTIVATION:

The core motivating factors for this research are outlined as follows:

a) The desire to have a functional system in place that is capable of translating English to Igala at one's finger-tips.

b) The desire to project the Igala language as a language for all, by making it easy to learn.

c) The quest to remove language barriers and to enhance interaction in business and other dealings between those who can speak only Igala and those who cannot speak or understand it.

d) The need to carry the Igala language along and prevent its extinction as a language, owing to rapid westernization.

1.8 ORGANIZATION:

This research work begins with an overview of Machine Translation concepts, as contained in the preceding sections. Chapter two runs through previous attempts at Example Based Machine Translation, especially the Transfer approach; it also gives a brief account of the Igala language and the Igala people. Chapter three presents a detailed analysis of the pair of languages, English and Igala; it is the framework for the implementation of the translation system, in which patterns are identified in the structures of both the source language and the target language. These patterns form the basis of the transfer rules used to implement the translation tasks in chapter four. Chapter five gives a summary and conclusion of the whole system.
CHAPTER TWO

LITERATURE REVIEW

2.1 INTRODUCTION
The translation of English to Igala is similar to a translation between any other pair of languages. For instance, if one were to build a translator that translates English to Hausa, the same procedure is entailed as in the translation of English to Igala; the only difference lies in the morphology of each pair of languages. The features of the Igala language and the English language are comparatively examined here.

Igala is the language, and also the name of the ethnic group, of the people who live on the eastern flank of the confluence of the rivers Niger and Benue (Sani et al., 2014). Igala is the ninth most widely spoken language in Nigeria. The Igala people live between latitudes 6°30′ and 8°40′ north and longitudes 6°30′ and 7°40′ east, covering an area of about 13,665 square kilometers (Boston, 1967). According to the 1995 population census, the population of the Igala nationals is estimated at two million (Egbunu, 2013). Owing to the central location of Igala land, it is bordered by many states and various ethnic nationalities: the Igbos on the eastern and southern border, the Idomas on the east, and the Edos on the south-west. The prolonged interaction through trade between the Igala people and the surrounding ethnic groups, such as the Edos, Ebiras, Bassa-Nkomo, Idomas and northern Igbos, has also introduced considerable linguistic divergence into the accents of the communities on the boundaries; Ibaji, Ogugu and Odolu are cases in point.

The Igala language is closely related to the Yoruba and Itsekiri languages, but it has recently been affected to a large extent by other tribal groups sharing borders with the Igala, including the Hausas. As a consequence, the divergence is more pronounced near the borders than within central communities like Anyigba. A more general form of Igala is, however, used in this translation project.

2.2 REVIEW OF RELATED WORKS:

Machine Translation (MT) is the changing of text which denotes a meaning in one language into text that denotes an equivalent meaning in another language by computer. It is the process of converting one natural language into another natural language by means of the computer (Peng, 2013). Machine translation may be fully automatic, or merely a partial assistance by the computer system (computer-aided translation).

In the early 1930s, governments of some countries displayed enormous interest in the new effort to translate languages by means of machines. These interests were demonstrated by granting patents to some experts to enhance their pursuit of this endeavour, such as Georges Artsrouni, who developed the mechanical brain (Hutchins, 2005). In 1949, Warren Weaver suggested that the problem of translation could be attacked with statistical methods, with some ideas from information theory (Peter et al., 1990).


Machine translation is classified into rule-based (RBMT) and corpus-based translation (Peng, 2011). The Transfer approach to Machine Translation, along with others like Direct translation and Interlingua, belongs to the rule-based category, whereas the Statistical technique and the Example-based method are examples of corpus-based MT systems.

Transfer MT is a metamorphosis of the earliest Direct or literal translation: when there is only a shallow transfer from source-language strings to target-language strings, we have direct transfer; if the analysis of the input sentence includes syntactic or semantic processing, the process is called syntactic transfer or semantic transfer. Interlingua employs an approach where all sentences in different languages (usually two or more languages are involved) that express the same thing are represented in the same way. These levels are illustrated by the Vauquois triangle (Vauquois, 1968).
[Figure: the Vauquois triangle, with translation depth rising from direct transfer through syntactic transfer and semantic transfer to Interlingua at the apex; analysis of the source text is on one side and generation of the target text on the other.]

Fig. 2.1: Vauquois triangle

The Transfer approach handles syntactic transformation, applies lexical rules by using a bilingual dictionary and, for complex cases, implements word-sense disambiguation (Hutchins et al., 1992; Senellart et al., 2001). This makes the Transfer approach a better option compared to the Direct method. Studies also show that Transfer is better than Interlingua, because Interlingua performs well only in sub-language scenarios. The Statistical method may be preferred to the Transfer approach for reasons of scalability and less human labour, but it cannot be applied when large online resources (corpora) are not available; the statistical method also requires complex computational skills (Costa-Jussa, 2012). Therefore, for a translation task between English and Igala, Transfer MT is the most appropriate choice.

Some work has been done on the translation of languages by means of the Transfer method, and specifically between English and Igala: a lone example is noun phrase translation from English to Igala (Ayegba et al., 2014). In that work, only noun phrases were dealt with, in a quite shallow transfer. Shallow transfer machine translation was also implemented in 2009 by means of Alignment Templates (AT) for small parallel corpora (Sanchez-Martinez et al., 2009). That work likewise dealt with transfer at shallow levels, and was specifically for closely related language pairs like Spanish and Italian, so it cannot be applied to unrelated language pairs such as Igala and English. The MU MT system employed the transfer approach and the method of annotated tree structures; it was motivated by an attempt to overcome the difficulties of Interlingua (Nagao et al., 2005). The LFG-based Machine Translation engine for English and Filipino (Borra et al., 2001) uses a Lexical Functional Grammar transfer framework. It however lacks the ability to translate words that are not found within the database, and the database itself has too many entries, since the same word makes a separate entry for each of its different senses. In a work titled Rapid Prototyping of a Transfer-based Hebrew-to-English MT system, a statistical decoder was integrated into the system, and transliteration was used as well (Lavie et al., 2004). Although this work was implemented to obtain a rapid result, the lack of thorough editing and the low quality of the dictionary used have been criticized. A Transfer-based translation between Spanish and Basque made use of a finite-state transducer for lexical processing, an HMM for part-of-speech tagging, a deformatter to separate the source text from its format, and an analyzer for the Spanish side (Alegria et al., 2006). The system does not handle semantic ambiguity well, and it is not able to reverse the process, i.e. translate Basque to Spanish, because Spanish is more developed as a language resource than Basque. An attempt was also made to translate speech using a Transfer approach called Incremental Transfer (Matsubara et al., 1997). Incremental Transfer procedures include incremental parsing, transfer and generation. This attempt produces only literal speech translation and does poorly on parallel phrases, garden-path sentences and idioms, because it produces output that is unnatural in the target language.

A version of Hindi-to-English Transfer-based MT uses the CYK algorithm for parsing. Transliteration of words that do not have a corresponding entry in the database (like proper nouns) was adopted, and a tokenizer was also used. However, complex and complex-compound sentence structures were not effectively handled.

In developing an Icelandic-to-English shallow Transfer MT system, XML file management was used along with some existing language processing tools like IceNLP, IceMorphy, etc. (Brandt, 2011). The translation is a shallow one, and the system was built from pre-existing resources and tools which are not applicable to a language like Igala.

In recent years, transfer methods that combine some statistical tools have been developed. Shilon et al. (2011), in their Transfer-based MT system between morphologically rich and resource-poor languages, developed the xfer MT system for Hebrew-to-Arabic. xfer, or stat-XFER (Statistical Transfer Machine Translation), uses translation rules and an SMT decoder. Lavie et al. (2003) also developed an xfer system that uses an elicitation tool and a statistical decoder.

2.3 CONCEPTS OF TRANSFER IN MACHINE TRANSLATION:

The Transfer-based method analyzes the sentence structure, and then generates the target-language text on the basis of word-to-word translation according to the different linguistic rules of the pair of languages. Three dictionaries may be used: the source-language dictionary, the source-language to target-language bilingual dictionary, and the target-language dictionary (Hutchins, 1999).

The first stage of the translation is to analyze the input text for morphology and syntax (and sometimes semantics); next, an intermediate representation (IR) of the analyzed input is created. Transfer converts the source-language IR to the target-language IR by using both the bilingual dictionary and grammatical rules; the output is then generated from the target-language IR (this includes re-ordering, disambiguation, etc.). This is shown below:


SL sentence TL sentence

Analyzing Analyzing

IR for SL Transfer IR for TL

Figure 2.2: Framework for Transfer-based MT
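To make the pipeline concrete, the following minimal Python sketch mirrors the three stages above (analysis, transfer, generation). It is an illustration only, not the system developed in this work: the tiny dictionary, the function names and the sample output are assumptions made for the example.

# A minimal sketch (not the thesis implementation) of the three-stage
# Transfer pipeline: analyse the source sentence into an IR, transfer the IR
# with a bilingual dictionary, then generate the target sentence.
# The dictionary entries below are illustrative placeholders only.

BILINGUAL_DICT = {          # hypothetical English-to-Igala entries
    "bring": "du",
    "the": "",              # illustrative: no separate word in this toy entry
    "food": "ujẹñwu",
}

def analyse(sl_sentence):
    """Analysis: tokenize the source-language sentence into a simple IR."""
    return [w.lower().strip(".") for w in sl_sentence.split()]

def transfer(sl_ir):
    """Transfer: map each source token to a target token via the dictionary."""
    tl_ir = []
    for token in sl_ir:
        target = BILINGUAL_DICT.get(token, token)  # fall back to the source word
        if target:                                 # drop empty translations
            tl_ir.append(target)
    return tl_ir

def generate(tl_ir):
    """Generation: join (and, in a real system, re-order) the target IR."""
    return " ".join(tl_ir)

if __name__ == "__main__":
    print(generate(transfer(analyse("Bring the food."))))  # toy output only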

2.4 THE IGALA LANGUAGE:

The Igala language, as a spoken and written language, makes use of symbols and sounds (Omachonu, 2000). Unlike English, Igala has no formal alphabet of its own; the 26 letters of the English alphabet are used for the Igala writing system, with some exceptions and modifications. The 26 English letters include the uppercase A…Z and the lowercase a…z; the Igala writing system makes use of almost all of the English letters and their combinations, e.g. "a", "e", "o", "ba", "bu", "gbi", etc. Special characters are also used in the Igala writing system, such as "ẹ", "ọ" and "ñ"; these are used specifically to distinguish how one vowel is pronounced from another, e.g. "ẹ" specifies the vowel /e/ as found in the English word bet (/bet/), whereas "e" is used for the diphthong /ei/ which appears in an English word such as make (/meik/). Some English letters, e.g. "s", "q", "v", "x" and "z", are rarely or never used; this may be because these characters have no phonetically similar equivalents in Igala. Some of them are, however, used in Igala text when there is a transliteration, or as part of a "composite consonant" (e.g. "sh"); q, v and x are never used at all (Atadoga, 2013).

Also, the Igala language does not yet have a formal lexicon (language resources in soft copy, held in storage and retrievable for use); this explains why Igala can be described as a resource-scarce language. Like other languages, Igala has an inexhaustible lexicon, but unlike English, only a small, finite list of word classes can be described. These parts of speech are nouns, verbs, prepositions, determiners, adjectives, pronouns, and a few conjunctions and interjections. The few pronouns are gender-insensitive (and, to some extent, person-insensitive). Adjectives and adverbs are mostly derived.
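As an illustration of how such a lexicon could be held in soft copy, the short Python sketch below stores entries with a part-of-speech tag and an English gloss. The entry words are drawn from examples that appear elsewhere in this work, but the data structure, field names and glosses are assumptions made for illustration, not part of any actual Igala lexicon.

# A minimal sketch of a machine-readable Igala lexicon: each entry keeps the
# word (with its special characters as Unicode), a part-of-speech tag from the
# small set of Igala word classes, and an English gloss. Entries are
# illustrative only.

from dataclasses import dataclass

@dataclass
class LexiconEntry:
    igala: str      # Igala form, may contain "ẹ", "ọ", "ñ"
    pos: str        # noun, verb, pronoun, preposition, determiner, ...
    gloss: str      # English equivalent

LEXICON = [
    LexiconEntry("omi", "noun", "water"),      # hypothetical example entries
    LexiconEntry("jẹ", "verb", "eat"),
    LexiconEntry("ùwẹ", "pronoun", "you"),     # pronouns are gender-insensitive
]

def lookup(word):
    """Return every entry matching an Igala word (case-insensitive)."""
    return [e for e in LEXICON if e.igala.lower() == word.lower()]

print(lookup("jẹ"))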


CHAPTER THREE
SYSTEM ANALYSIS AND DESIGN

3.1 INTRODUCTION
Example-based Machine Translation tries to overcome the differences which exist between two languages by applying contrastive knowledge (i.e. knowledge about the differences between the source language and the target language). This strategy is referred to as the direct transfer model. The model requires some representation of the structure of the source language, from which the structure for the target language is derived, followed by a generation step that creates the output sentence. The transfer model for sub-language domain translation (sentences, in our case) follows exactly the same procedure as full-fledged language translation: an analysis of the source and target languages is done; then, with the differences in structure in mind, an intermediate representation of the source, and of how it transforms into the structures of the target, is enforced. This is done by means of direct transfer examples; EBMT rules are usually formalized from the grammars of the source and target languages. The final process is the generation of output, i.e. the translated text, or the result of the translation task.

In this chapter, a detailed analysis of English language grammar against Igala language grammar is presented, along with the preprocessing of inputs and the direct transfer formalism.

3.2 ANALYSIS OF EXISTING SYSTEMS:


A Transfer MT system goes through three phases, namely analysis, transfer and generation (Gehlot et al., 2015).
3.2.1 Analysis:
Analysis involves a keen evaluation of both the source-language and target-language syntax and semantics, including their lexical and morphological variations. For instance, in the translation of English to Arabic, study reveals that Arabic is a VSO (verb-subject-object) language, whereas English is SVO (subject-verb-object) (Jurafsky et al., 1999). Even though both Igala and English have the same word order (they are both SVO), Igala has a way of removing and attaching particles which differentiates the morphology of English and Igala; for instance, number inflection for the plural form of nouns in Igala is done by adding the prefix 'abo-' to the noun (as in abokẹlẹ for men). It is also known that Igala, as an isolating language, lacks inflection in its morphology (Ayegba et al., 2014). In view of these differences, an existing system tries to translate English to Igala by considering the use of derived morphology. In derived (more often, derivational) morphology, a sense is created by combining words from different function domains. This is easily seen when a term that is expressed with a single word in English has to be translated by almost a whole sentence in Igala. Example:

 English word: funny
 Igala sense: ẹñwu ki ache anyi / ẹñwu anyi

This illustration is one of the many instances that define the morphological differences between Igala and English. Many others are found in the verb and adjective word categories, e.g.:

 English word: slap
 Igala sense: k' onẹ awo
 English word: remove
 Igala sense: du kwo, etc.

English noun inflections for gender and number portray a wide gap between English and Igala morphology.
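The following minimal Python sketch illustrates the derived-morphology point: a single English word maps to a multi-word Igala sense, and plural formation uses the 'abo-' prefix rather than inflection. The word-to-sense mappings are those listed above; everything else (the function names and the fallback behaviour) is an illustrative assumption.

# A minimal sketch of a derived-morphology dictionary: one English word may
# map to several Igala words, and plurals are formed by prefixation.
# Mappings are copied from the examples in the text; the rest is illustrative.

SENSE_DICT = {
    "funny": "ẹñwu ki ache anyi",   # one English word -> several Igala words
    "slap": "k' onẹ awo",
    "remove": "du kwo",
}

def translate_word(english_word):
    """Look up the (possibly multi-word) Igala sense of an English word."""
    return SENSE_DICT.get(english_word.lower(), english_word)  # fall back to input

def pluralize_igala_noun(noun):
    """Plural by prefixation: 'abo-' + noun, as in abokẹlẹ for men."""
    return "abo" + noun

print(translate_word("funny"))        # -> ẹñwu ki ache anyi
print(pluralize_igala_noun("kẹlẹ"))   # illustrative only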
3.2.2 Generation:
The processes of analysis and transfer produce intermediate representations that are used to formalize the EBMT rules which produce the target output sentence (or phrase). These intermediate representations take the form of XML code or syntax trees; parsing usually produces trees which can be reliably put to use in rule formalism.
Generation is the phase that produces the translated text and ends the translation process. Most often, the generation phase has program source code embedded, which does re-ordering of words as well as ambiguity handling (word-sense disambiguation).

3.3 FEATURES OF THE PROPOSED SYSTEM:


English is known to be spoken in more parts of the world than any other language, and by more people than any language except Chinese. It has a vocabulary that surpasses that of any other language, at over a million words (Pado et al., 2009). This quickly explains why having to deal with even a sub-domain, or just the noun phrases of English, can be such a herculean task.

In designing this system, a number of English sentence types were closely studied so as to understand their structure. Over 50 sentence structures were identified from everyday use of English in ordinary conversation and in classrooms. More structures with established profiles were also downloaded from online sources, and each of these structures was developed into at least one sentence. The main component of the proposed system is Example-Based MT, which is similar in level to RBMT transfer: an intermediate level of NLP processing, in the middle of the Vauquois triangle. Example-Based MT has to do with template matching and recombination.

Figure 3.3: EBMT of the proposed system

Example-based MT computes a similarity score between input fragments and fragments in the database (the most critical step). Syntactic and/or semantic similarity is used to rank candidates, with NLP layers going (possibly) as deep as semantic analysis (closer to RBMT), and the translated fragments are then stitched together (closer to SMT). The discussion of Example-Based Machine Translation (EBMT) here covers the following:

i) EBMT's workflow

ii) EBMT's working

iii) EBMT vs. Case-Based Reasoning (CBR)

iv) Text similarity

v) Recombination

3.3.1 Essential steps in EBMT are:

i) Phrase fragment matching

ii) Translation of segments

iii) Recombination

Table 3.3.1: EBMT examples (matched templates)

Source Language (Igala)          | Target Language (English)
Náagò chọ́ ò                      | Thanks a lot
Ùwẹ ajòkwúta má m'omi-í          | You eat stone and do not drink water
Óla mií yā ṅ                     | I am not feeling well
Aladi ki a wa                    | The week that is coming

Table 3.3.1 indicates the transfer approach between the source language and the target language.
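A minimal sketch of the matched-template step is given below: the example base maps source fragments to their stored target fragments, and translating a known fragment is an exact lookup after simple normalization. The entries come from Table 3.3.1; the matching criteria of the actual system are not implied by this sketch.

# A minimal sketch of template matching against an example base.
# Normalization here is just lowercasing and whitespace collapsing;
# the real system's matching rules may differ.

EXAMPLE_BASE = {
    "náagò chọ́ ò": "Thanks a lot",
    "ùwẹ ajòkwúta má m'omi-í": "You eat stone and do not drink water",
    "óla mií yā ṅ": "I am not feeling well",
    "aladi ki a wa": "The week that is coming",
}

def normalize(text):
    return " ".join(text.lower().split())

def match_template(source_fragment):
    """Return the stored target fragment for an exact (normalized) match."""
    return EXAMPLE_BASE.get(normalize(source_fragment))

print(match_template("Náagò chọ́ ò"))   # -> Thanks a lot
print(match_template("unknown text"))   # -> None (no matched template)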

3.4 B+ Tree

A B+ tree is a data structure consisting of internal nodes linked by pointers, a special node called the root, and leaves. It has a unique path to each leaf, and all such paths are equal in length. Each internal node contains an ordered list of reference values and pointers to lower-level nodes; these pointers can be thought of as lying between the reference values. The tree stores keys only at the leaves, and stores reference values in the internal nodes. Key search is guided by the reference values, from the root down to the leaves. To search for or insert an element, the root of the B+ tree is the starting point, because it represents the whole range of values in the tree, of which every internal node covers a sub-interval. Suppose we are looking for a value k in the B+ tree. Starting from the root, we look for the leaf which may contain the value k. At each node, we find the adjacent reference values between which the searched-for value lies, and follow the corresponding pointer to the next node in the tree. An internal B+ tree node has children, each of which represents a different sub-interval. Recursion eventually leads to the desired value, or to the conclusion that the value is not present. B+ trees are often used in the implementation of database indexes: each record is stored in the database, while the reference number and the key of that record are stored in the B+ tree. To reach a certain record, we need to know its key, with which we obtain its reference number from the B+ tree; with that reference number, we can retrieve the required record directly and efficiently. The figure below shows the structure of EBMT and CBR with a B+ tree, and a minimal search sketch follows it.
Fig. 3.4: Example Based Machine Translation (source: Dr. Mariana Neves, 2017)
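The following minimal Python sketch illustrates the key search just described on a small, hand-built B+ tree; insertion and node splitting are omitted, and this node layout is only one of several possible representations.

# A minimal sketch of B+ tree key search: internal nodes hold ordered
# reference values and child pointers, leaves hold the keys and (here) the
# record reference numbers. The hand-built tree is illustrative only.

import bisect

class Internal:
    def __init__(self, refs, children):
        self.refs = refs            # ordered reference values
        self.children = children    # len(children) == len(refs) + 1

class Leaf:
    def __init__(self, keys, values):
        self.keys = keys            # ordered keys stored only at the leaves
        self.values = values        # e.g. record reference numbers

def search(node, k):
    """Follow reference values from the root down to the leaf that may hold k."""
    while isinstance(node, Internal):
        i = bisect.bisect_right(node.refs, k)   # pick the sub-interval containing k
        node = node.children[i]
    if k in node.keys:
        return node.values[node.keys.index(k)]
    return None                                  # key not present

# Hand-built example: the root splits the key space at 20 and 40.
root = Internal(
    refs=[20, 40],
    children=[
        Leaf([5, 12], ["rec-A", "rec-B"]),
        Leaf([20, 31], ["rec-C", "rec-D"]),
        Leaf([40, 57], ["rec-E", "rec-F"]),
    ],
)

print(search(root, 31))   # -> rec-D
print(search(root, 99))   # -> None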

3.4.1 Description of the proposed method

In this research, a new approach to an EBMT system is used. The proposed system depends mainly on the examples stored in the Example Base (EB) to get the translation of the input sentence. It searches for the input sentence in the EB; if the input sentence is found there, the system retrieves its corresponding translation.

[Figure: Sentence → Matching → Transfer → Recombination]

Figure 3.4.1: EBMT working strategies

Figure 3.4.1 illustrates that the Example-Based Machine Translation (EBMT) system rests on the idea that similar sentences will have similar translations. It uses past translation examples to generate a translation for a given source-language (SL) text. The system maintains an example base (EB) consisting of translation examples. When an SL sentence is given to the system, the system retrieves a similar SL sentence from the EB, together with its translation, and then adapts the example to generate the target-language (TL) sentence for the input.

The system has two main modules:

1) Retrieval
2) Adaptation

There are three tasks in EBMT: matching fragments against existing examples, transferring (identifying the corresponding translation fragments), and recombining the fragments to give the target text. A sketch of the retrieval and adaptation idea follows.
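In the sketch below, retrieval picks the most similar source-language example (a simple word-overlap score stands in here for whatever similarity measure the full system uses), and adaptation reuses that example's stored translation, reporting the similarity as a reliability factor. The example pairs are taken from tables in this chapter; the rest is an assumption for illustration.

# A minimal sketch of the Retrieval and Adaptation modules.
# Word-overlap (Jaccard) similarity is an illustrative stand-in measure.

EXAMPLE_BASE = [
    ("i am hungry", "Ébi ákpa mí."),          # (SL sentence, TL sentence)
    ("i am not feeling well", "Óla mií yā ṅ"),
    ("bring the food", "D'ùjẹñwu wá."),
]

def similarity(a, b):
    """Word-overlap (Jaccard) similarity between two sentences."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def retrieve(input_sentence):
    """Retrieval: the most similar SL example and its similarity score."""
    best = max(EXAMPLE_BASE, key=lambda ex: similarity(input_sentence, ex[0]))
    return best, similarity(input_sentence, best[0])

def adapt(input_sentence):
    """Adaptation: reuse the retrieved translation, reporting its reliability."""
    (sl, tl), score = retrieve(input_sentence)
    return {"translation": tl, "matched_example": sl, "reliability": score}

print(adapt("I am very hungry"))   # reuses the closest stored example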
3.4.2 Stages of EBMT
In general, there are four stages of work in EBMT: example acquisition, example-base management, example application, and target-sentence synthesis. Example acquisition is about how to obtain examples from a parallel bilingual corpus. Example-base management is about how examples are stored and maintained.
The example-application stage is about how examples are used to facilitate translation, which involves the decomposition of an input sentence into examples and the transformation of source texts into target texts in terms of existing translations. Target-sentence synthesis generates a target sentence by putting the converted examples into a smoothly readable order, with the aim of improving the readability of the target sentence after conversion.
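A minimal sketch of the example-acquisition stage is shown below, assuming two sentence-aligned text files with one sentence per line, where line i of each file forms a translation pair. The file names and the function are illustrative assumptions, not the resources used in this research.

# A minimal sketch of example acquisition from a sentence-aligned parallel
# corpus. File names are hypothetical placeholders.

def acquire_examples(english_path="english_corpus.txt",
                     igala_path="igala_corpus.txt"):
    """Build (English, Igala) example pairs from two aligned text files."""
    with open(english_path, encoding="utf-8") as en, \
         open(igala_path, encoding="utf-8") as ig:
        pairs = [(e.strip(), i.strip()) for e, i in zip(en, ig)
                 if e.strip() and i.strip()]
    return pairs

# Usage: example_base = acquire_examples(); len(example_base) gives the pair count.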
3.4.3 Advantages of EBMT
There are several main advantages to using EBMT:
 Improvement: EBMT has no rules, so improvement is effected simply by adding appropriate examples to the database. In other words, EBMT is easily upgraded.
 Translation speed: EBMT directly returns a translation by adapting the examples, without reasoning through a long chain of rules. In EBMT, deep semantic analysis is avoided because it is assumed that translations appropriate for a given domain can be obtained using domain-specific examples.
3.4.3.1 Translation Accuracy
In EBMT, a reliability factor is assigned to the translation result according to the distance between the input and the similar examples found. In other words, EBMT can tell when its translation is inappropriate.
3.4.4 Drawbacks of EBMT
Although the quality of translation improves as more examples are added to the database, there is a limit after which further examples do not improve the quality. There may even be cases where performance starts to decrease and retrieval from the example database becomes slow. The reason is the cost of storing and accessing a large corpus of examples, and of matching an input phrase or sentence against this corpus.
Thus, in the proposed method, EBMT is used in a way that avoids this problem, with a special dictionary designed for the source-language sentences that works to:
 provide efficient time for getting the translation of the source-language sentence;
 provide efficient memory usage in storing the source-language sentences.

3.4.5 Language Preprocessing:

The EBMT module shares a similar structure with the three stages of analysis, transfer and generation shown in Figure 3.3 (the Vauquois pyramid adapted for EBMT). Like the Direct, Transfer and Interlingua arrangements on that pyramid, it requires a minimum of prior knowledge and is therefore quickly adaptable to many language pairs. The particular EBMT system that we are examining works in the following way. Given an extensive corpus of aligned source-language and target-language sentences, and a source-language sentence to translate:

1.) It identifies exact substrings of the sentence to be translated within the source-language corpus, thereby returning a series of source-language sentences.
2.) It takes the corresponding sentences in the target-language corpus as the translations of those source-language sentences (which should be the case, since the corpora are aligned).
3.) Then, for each pair of sentences:
   i.) it attempts to align the source- and target-language sentences;
   ii.) it retrieves the portion of the target-language sentence marked as aligned with the corpus source-language sentence's substring, and returns it as the translation of the input source-language chunk.

A sketch of this matching-and-retrieval workflow is given below.
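The sketch walks through the workflow on toy data: it finds corpus sentences containing the input chunk as an exact substring and returns the target-side words aligned with that chunk. Because real word alignment is a separate, non-trivial step, a hand-written alignment dictionary per sentence pair stands in for it here, purely for illustration.

# A minimal sketch of substring matching plus alignment retrieval.
# The corpus, translations and word alignments are illustrative placeholders.

CORPUS = [
    # (source sentence, target sentence, {source word -> aligned target word})
    ("bring the food", "D'ùjẹñwu wá.", {"bring": "wá", "food": "ùjẹñwu"}),
    ("i am hungry", "Ébi ákpa mí.", {"i": "mí", "hungry": "ébi"}),
]

def translate_chunk(chunk):
    """Return candidate translations of an input chunk via aligned examples."""
    chunk = chunk.lower().strip()
    candidates = []
    for source, target, alignment in CORPUS:
        if chunk in source:                          # step 1: exact substring match
            aligned = [alignment[w] for w in chunk.split() if w in alignment]
            if aligned:                              # step 3: aligned target portion
                candidates.append((" ".join(aligned), target))
    return candidates

print(translate_chunk("the food"))   # -> [('ùjẹñwu', "D'ùjẹñwu wá.")]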
The above system is a specialization of generalized EBMT systems; other specific systems may operate on parse trees, or only on entire sentences. The system requires the following:

1. Sentence-aligned source and target corpora.
2. A source-to-target dictionary.
3. A stemmer.

The stemmer is necessary because we will typically find only uninflected forms in dictionaries. While the stemmer is consulted in the alignment algorithm, it is not consulted in the matching step; as stated before, those matches must be exact. This made the identification of distinct English sentences for machine translation into Igala easy and interesting. Some of the possible Igala translations of the identified English sentences were obtained from experienced native speakers of the Igala language, and these were also transferred by the EBMT. The following table shows some of the sentences identified and their possible translations in Igala.

Example-based Machine Translation of English to Igala:

ENGLISH                               | ÍGÁLÁÀ
You eat stone and do not drink water  | Ùwẹ ajòkwúta má m'omi-í
Touch him/her/it.                     | D'ọwó ̣ k'ō.̣
Bring me close to yourself            | Fà mí m'óla
Carry go; bring come.                 | Du ló; du wá.
Bring the food.                       | D'ùjẹñwu wá.
Take it and eat it.                   | Gbà k'é ̣ jẹ.
I am hungry                           | Ébi ákpa mí.
My mother starved me                  | Íye mi d'ebi kpa mí
I fell asleep                         | Ólu fù mí mú.
The eyes that recognize someone.      | Éjú k'ì m'ẹnẹ…
I saw them with my two eyes.          | Ù lí má kpàí éjú mi méjì
It was fire that burned me.           | Úná jō mi-í
I am not feeling well                 | Óla mií yā ṅ

Table 3.4.5: Possible English sentences and their Igala translations

3.5 Proposed Example Based Machine Translation (EBMT)

Example-based machine translation is one of the approaches to machine translation. The concept uses corpora of the two languages and translates the input text into the desired target text by proper matching.

[Figure: the English corpus and the Igala corpus feeding the example-based matching process.]

Figure 3.4.5: Example Based Machine Translation of English to Igala Language

Different languages have different language structures, such as subject-object-verb (SOV) alignment. The matching is arranged to give the proper meaning in the target-language text and to form the proper structure. In this research work, we describe Example Based Machine Translation using Natural Language Processing. The proposed EBMT framework can be used for the automatic translation of text by reusing examples of previous translations. This framework comprises three phases: matching, alignment and recombination.

A) Example Based Machine Translation

 English Corpus: We have used 50 English sentences to form a corpus. The sentences are news headlines from reputable newspapers.

 Igala Corpus: It consists of the translated Igala sentence for each of the English sentences.

 Knowledge Base: It stores the patterns of how English sentences are translated into their Igala form.

 Inference Engine: It is a collection of facts and rules. The inference engine compares the given English sentence with the English sentences stored in the corpus; after finding the best match, it translates it into Igala according to the Igala translation present in the Igala corpus.

The automatic translation of text from one language into another is Machine Translation. Due to globalization, there is a need in today's information-technology-dominated age to understand text written in different languages by using computers. However, there are numerous challenges for automatic machine translation owing to the diversity of language constructs. This research presents the implementation of Example Based Machine Translation (EBMT) using Natural Language Processing (NLP) techniques. In this system, the user can submit text in English and it will be translated into the Igala language. If the input sentence is not found among the examples in the EB, it will be partitioned into sub-sentences, which are compared against the examples in the EB. If these sub-sentences are found in the EB, the system retrieves their corresponding translations. If these sub-sentences are not found in the EB, the EBMT system falls back on a word-by-word analysis of the input sentence to get the translation. This fallback cascade is sketched below.
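The sketch assumes a toy example base, a toy word dictionary and a simple comma/"and" splitter for sub-sentences; none of these stand for the actual resources of the system.

# A minimal sketch of the fallback cascade: (1) whole-sentence lookup in the
# example base, (2) sub-sentence lookup, (3) word-by-word translation.
# All data and the splitter are illustrative placeholders.

import re

EXAMPLE_BASE = {"i am hungry": "Ébi ákpa mí.", "bring the food": "D'ùjẹñwu wá."}
WORD_DICT = {"i": "mí", "eat": "jẹ", "water": "omi"}

def normalize(text):
    return " ".join(text.lower().replace(".", "").split())

def translate(sentence):
    s = normalize(sentence)
    if s in EXAMPLE_BASE:                                    # 1) whole sentence
        return EXAMPLE_BASE[s]
    parts = [normalize(p) for p in re.split(r",| and ", s) if p.strip()]
    if len(parts) > 1 and all(p in EXAMPLE_BASE for p in parts):
        return " ".join(EXAMPLE_BASE[p] for p in parts)      # 2) sub-sentences
    return " ".join(WORD_DICT.get(w, w) for w in s.split())  # 3) word by word

print(translate("I am hungry"))                       # exact example match
print(translate("Bring the food, and I am hungry"))   # sub-sentence recombination
print(translate("I eat water"))                       # word-by-word fallback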

Human translation is a complex intellectual activity, and accordingly Machine Translation (MT) is a complex scientific task, involving virtually every aspect of Natural Language Processing. Many approaches to MT have been proposed, each of them inspired by some insight about translation. Each approach has its own merits, accounting for some aspect of translation better than other approaches, but typically each approach's advantages are countered by weaknesses in other respects. The real challenge is combining different approaches and insights into a comprehensive whole. To this end, it is important to analyze and classify the different approaches, for two reasons.

First, it is important to see to what extent differences are substantial or merely notational. Sometimes different approaches look at the same subject from different viewpoints, or use different representations, but a formal analysis shows that they are equivalent. This was the case with many formal systems (categorial and phrase-structure grammars, finite-state machines and regular grammars, explanation-based generalization and partial evaluation, etc.). In other cases, differences have been shown to be matters of degree (for instance, in the field of MT, the transfer and Interlingua approaches).

Second, it is important to see to what extent different approaches are mutually exclusive, or whether they can be integrated into one system that encompasses all of them.

In the broad and diversified panorama of MT, we believe that this classification task, far from being a pedantic exercise, is an important step towards separating essential differences among MT approaches from inessential ones. This effort may lead to uncovering overlaps between approaches that at first glance seem quite far apart, or conversely it may bring to light significant differences between approaches that are superficially similar. A better understanding of the relations among different approaches provides valuable insight that can guide MT researchers in their decisions about further directions to take.

Example-Based MT translates fragmental phrases by analogy, and in this respect it is similar to SMT's decoding process. Analogy (text similarity) is the key in EBMT. The requirements for text similarity are:

 a measure of similarity: similar documents should be measured as similar, and vice versa;

 large lexical knowledge networks to support similarity, e.g. WordNet, Wikipedia, etc.
