1.1 INTRODUCTION
Living in the global village has brought man in touch with technology that seeks to
articulate itself in every aspect of life, which may be viewed as culture. The
concept of globalization pushes man back to nature, and nature back to nurture, in
a back-and-forth manner. For instance, Judaism tells the story of how the
Supreme Being decided to frustrate a people whose culture and language remained
a common heritage for achieving a common purpose, even of climbing into the
When they were frustrated by the sudden collapse of what they were building and
the sudden loss of a common language, no further progress was possible. But today, man
has not ceased in his quest to make progress: to make progress in international
solve these quests again? How else can earth be more “global”, if man cannot
speak in one tongue or men cannot communicate with one another as freely as
possible?
English to Igala language, the language of Eastern Kogi. What then is Machine
Translation? The idea here is to use a machine (a computer) to translate
human languages, so that a sequence of text in one language can be rendered as a
sequence of text in another language or languages. Machine Translation (MT) is not just
language B, it may be a translation of one language into more than one language,
encompasses speech translation of what a speaker says in one language into spoken
different from the translation of programming language from high level to low
In this research, attention is given to the translation of English sentence into Igala
Example Based Machine Translation is no longer new, but it is an effort that keeps
evolving. As early as the 1930s, patents were granted for the development of
Earlier on, Example Based Machine Translation attempts were about how to
useful research. For instance, the earliest method of translation, called the Direct
method led to the idea of integrating syntax and semantic rules into the direct
This rise was later developed into an approach that takes deep syntactic and
semantic rules into consideration, and was called the Transfer method (Hutchins,
2005). But when things began to get more complex such that the application of
rules was beginning to be difficult, the Statistical models for languages were
Today, translations are grouped according to whether they are done based on rules
English sentence to Igala utilizes the rule-based translation for reasons that are
1.3 AIM:
The sole aim of this research work is to develop a system that translates English
1.4 OBJECTIVES:
language sentence.
This research is intended to cover the translation of
English sentences into Igala sentences using the Transfer method of Example
1.6 STATEMENT OF PROBLEM:
Given any input text, a translation model is expected to output the equivalent of the
input text in the target language. Thus for our language pair, we must develop a
1.7 MOTIVATION:
The core motivating factors for this research are outlined as follows:
b) Project the Igala language as a language for all by making it easy to learn.
c) The quest to remove language barriers and to maximize interaction
in business and other dealings between those who can speak only the Igala
language and those who cannot speak or understand it.
1.8 ORGANIZATION:
contained in the preceding sections. Chapter two runs through the previous
A brief account of the Igala language and the Igala people is also given.
English and Igala is presented; it serves as a framework for the implementation
of the translation system, in which patterns are identified in the structures
of both the source language and the target language. The patterns form the basis of
the transfer rules used to implement the translation tasks in chapter
four. Chapter five gives a summary and conclusion for the whole system.
CHAPTER TWO
LITERATURE REVIEW
2.1 INTRODUCTION
The translation of English to Igala is similar to a translation between any other pair
of languages. For instance, if one were to build a translator that can translate
Igala: the only difference lies in the morphology of each pair of languages. The
features of the Igala and English languages are compared here.
Igala is the name of both the language and the ethnic group that lives on the eastern
flank of the confluence of the rivers Niger and Benue (Sani et al., 2014). The Igala
language is the ninth most widely spoken language in Nigeria. The Igala people
live between latitudes 6°30′ and 8°40′ north and longitudes 6°30′ and 7°40′ east,
the 1995 population census, the population of the Igala nationals is estimated at
two million (Egbunu, 2013). Owing to its central location, Igala land is
bordered by many states and ethnic groups, such as the Igbos on the
eastern and southern borders, the Idomas on the east, and the Edos on the south-west.
Also, the prolonged interaction through trade between the Igala people and their
northern Igbos, have introduced a lot of linguistic divergences into the accents of
the communities on the boundaries; Ibaji, Ogugu and Odolu are cases in point.
The Igala language is closely related to the Yoruba and Itsekiri languages, but has
recently been affected to a large extent by other groups sharing a border with
near the borders than within the central communities like Anyigba. A more general
Machine Translation (MT) is the conversion of text that expresses a meaning in one
computer. It is the process of converting one natural language into another natural
(computer-aided translation).
the new effort to translate languages by means of the computer. These interests
the noble endeavor, such as Georges Artsrouni, who developed the mechanical brain
translation can be attacked with statistical methods, with some ideas from
like Direct translation and Interlingua belong to the rule-based category of machine
there is a shallow transfer between the source language strings to target language
strings, we have a direct transfer; if the analysis of the input sentence include
languages (usually two or more languages are involved) that express the same
thing are represented in the same way. These are demonstrated using the
Vauquois triangle (Vauquois, 1968).
[Figure: the Vauquois triangle, with labels including Interlingua and Direct transfer]
disambiguation (Hutchins et al., 1992; Senellart et al., 2001). This makes the Transfer
approach a better option than the Direct method. Studies also show that
Transfer is better than Interlingua because Interlingua performs well only in sub-
reasons of scalability and less human labour, but it cannot be applied when there is
Some work has been done on the translation of languages by means of the Transfer
method, specifically between English and Igala: a lone example is noun phrase
translation from English to Igala (Ayegba et al., 2014). In that work, only noun
phrases were handled, using a fairly shallow transfer. Also, shallow transfer machine
small parallel corpora (Sanchez-Martinez et al., 2009). The work also dealt with
transfer at shallow levels, and was specifically for closely related language pairs
like Spanish and Italian. So it cannot be applied to unrelated language pairs such
as Igala and English. The MU MT system employed transfer approach and the
Translation engine for English and Filipino (Borra et al., 2001) uses the Lexical
Functional Transfer framework. However, it lacks the ability to translate words that
are not found in the database. The database itself has too many entries, since the
same word has a separate entry for each of its senses. In a work titled
decoder was integrated into the system. Transliteration was used as well (Lavie et
al., 2004). Although this work was implemented to obtain a rapid result, the lack of
thorough editing and the low quality of the dictionary used have been criticized. A
separate the source text from its format, and an analyzer for the Spanish side
(Alegria et al., 2006). The system does not handle semantic ambiguity well,
nor is it able to reverse the process, i.e. translate Basque to Spanish, for the fact
incremental parsing, transfer and generation. This attempt produces only literal
speech translation and does poorly on parallel phrases, garden path sentences and
Transliteration of words that do not have a corresponding entry in the database
(like proper nouns) was adopted. A tokenizer was also used. However, complex
management was used along with some existing language processing tools like
IceNLP, IceMorphy, etc. (Brandt, 2011). The translation is a shallow one, and
system was built from pre-existing resources and tools which are not applicable
In recent years, transfer methods that combine some statistical tools are being
Translation, uses translation rules and an SMT decoder. Lavie et al. (2003) also
developed an Xfer system that uses an elicitation tool and a statistical decoder.
The Transfer-based method analyzes the sentence structure, while generating the
different linguistic rules of a pair of languages. Three dictionaries may be used: the
The first stage of the translation is to analyze the input text for morphology and
syntax (and sometimes semantics); next, an intermediate representation (IR) for the
analyzed input is created. Transfer converts the source language IR to the target
language IR by using both bilingual dictionaries and grammatical rules; the output
The Igala language, as a spoken and written language, makes use of symbols and
sounds (Omachonu, 2000). Igala, unlike English, has no formal alphabet of its own; the
26 letters of the English alphabet are, however, used for the Igala writing
system, with some exceptions and modifications. The English alphabet comprises
uppercase A… Z and lowercase a… z; the text system of the Igala language makes
use of almost all of the English letters and their combinations, e.g. “a”, “e”, “o”,
“ba”, “bu”, “gbi”, etc. Special characters are also used as letters in the Igala text
system, such as “ẹ”, “ọ”, “ñ”; these are used specifically to distinguish how one
vowel is pronounced from another, e.g. “ẹ” specifies the vowel /e/ as found in the
English word bet (/bet/), whereas “e” is used for the diphthong /ei/, which
appears in English words such as make (/meik/). Some English letters, e.g.
“s”, “q”, “v”, “x”, and “z”. This may be because these characters do not have
“composite consonant” (e.g. “sh”). q, v, and x are never used at all (Atadoga,
2013).
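Because the Igala text system relies on special characters such as “ẹ” and “ọ”, any preprocessing of Igala text has to treat each such letter consistently. The sketch below is an illustration only and is not part of the system described here: it assumes the text is stored as Unicode, where “ẹ” can be encoded either as one precomposed character or as “e” followed by a combining dot below, and it uses Python's standard unicodedata module to bring both encodings to a single form (NFC). The function name normalize_igala is hypothetical.

```python
import unicodedata


def normalize_igala(text: str) -> str:
    """Return text in Unicode NFC form (hypothetical helper).

    Assumption: the Igala text is Unicode. NFC composes a base letter and
    its combining marks into a precomposed character where one exists, so
    the same letter always compares equal.
    """
    return unicodedata.normalize("NFC", text)


decomposed = "e\u0323"   # 'e' followed by COMBINING DOT BELOW
precomposed = "\u1eb9"   # LATIN SMALL LETTER E WITH DOT BELOW

# The two strings look identical on screen but differ as raw strings.
print(decomposed == precomposed)
# After normalization they compare equal.
print(normalize_igala(decomposed) == precomposed)
```

Normalizing both the stored corpus and each input in the same way would keep lookups from failing merely because two visually identical letters were typed differently.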
Also, the Igala language is yet to have a formal lexicon (language resources in soft
copy that are held in storage and can be retrieved for use); this explains why Igala
can be described as a resource-scarce language. The Igala language, like other
languages, has an inexhaustible lexicon, but a finite list of word classes can be
described, unlike English language. These parts of speech are Nouns, Verbs,
interjections. The few pronouns are gender-insensitive (and to some extent, person-
3.1 INTRODUCTION
Example-based Machine Translation tries to overcome the differences which exist
between two languages by applying contrastive knowledge (i.e. knowledge about
the differences between the source language and the target language). This strategy
is referred to as the direct transfer model. The model requires some representation of
the structure of the source language, which yields the structure for the target
language, followed by a generation step that creates the output sentence.
The transfer model for sub-language domain translation (sentences in our case)
follows exactly the same procedure as a full-fledged language translation: an
analysis of the source and target languages is done; then, with the differences in
structure in mind, an intermediate representation for the source, and how it
transforms into the structures of the target, is enforced. This is done by means of
direct transfer examples; in EBMT these are usually formalized from the grammar of the
source and target languages. The final process is the generation of output, i.e. the
translated text, which is the result of the translation task.
In this chapter, a detailed analysis of English grammar in relation to Igala
grammar is presented, along with the preprocessing of inputs and the direct
formalism.
i) EBMT's Workflow
v) Recombination
iii) Recombination
Igala: Ùwẹ ajòkwúta má m’omi-í
English: You eat stone and do not drink water
Table 3.3.1 indicates the transfer approach between the source language and the target
language.
3.4. B+ Tree
A B+ tree is a data structure that consists of internal nodes linked by pointers,
a special node called the root, and leaves. There is a unique path to each leaf, and all
paths are equal in length. Each internal node contains an ordered list of reference
values and pointers to lower-level nodes in the tree. These pointers can be thought
of as lying between adjacent reference values. Keys are stored only at the leaves, and
reference values are stored in the internal nodes. A key search is guided by the
reference values, from the root to the leaves. To search for or insert an element,
the root of the B+ tree is the starting point because it represents
the whole range of values in the tree, while every internal node represents a subinterval.
Suppose we are looking for a value k in the B+ tree. Starting from the root, we look
for the leaf which may contain k. At each node, we find the adjacent reference values
between which k lies and follow the corresponding
pointer to the next node in the tree. Each child of an internal B+ tree node
represents a different sub-interval. Recursion eventually leads to
the desired value or to the conclusion that the value is not present. B+ trees are often
used in the implementation of database indexes: each record is
stored in the database, while the key and the reference number of that record are
stored in the B+ tree. To reach a certain record, we use its key to get its
reference number from the B+ tree. With the reference number of that
record, we can retrieve the required record directly and efficiently. The diagram
below illustrates the structure of EBMT and CBR with a B+ tree.
Fig. 3.4: Example Based Machine Translation (source: Dr. Mariana Neves, 2017)
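The search procedure described for the B+ tree can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: the class names Leaf and Internal, the integer keys, the reference numbers and the records dictionary standing in for the database are all hypothetical, and the tree is built by hand rather than by insertion. Each reference value in an internal node is taken to be the smallest key reachable through the pointer to its right.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Leaf:
    keys: list   # keys are stored only at the leaves
    refs: list   # record reference numbers, parallel to keys


@dataclass
class Internal:
    separators: list   # ordered reference values
    children: list     # one more child than there are separators


def search(node, key):
    """Follow reference values from the root to a leaf; return the
    reference number stored for key, or None if the key is absent."""
    while isinstance(node, Internal):
        # Index of the child whose sub-interval contains key.
        node = node.children[bisect_right(node.separators, key)]
    for k, ref in zip(node.keys, node.refs):
        if k == key:
            return ref
    return None


# Hand-built example tree (hypothetical keys and reference numbers).
root = Internal(
    separators=[20, 40],
    children=[
        Leaf([5, 10], [105, 110]),
        Leaf([20, 30], [120, 130]),
        Leaf([40, 50], [140, 150]),
    ],
)

# Stand-in for the database: reference number -> stored record.
records = {120: ("You eat stone and do not drink water",
                 "Ùwẹ ajòkwúta má m’omi-í")}

ref = search(root, 20)
print(ref, records.get(ref))
```

The index holds only keys and reference numbers, so the lookup touches one node per level before the record itself is fetched from the store by its reference number.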
ENGLISH: You eat stone and do not drink water
ÍGÁLÁÀ: Ùwẹ ajòkwúta má m’omi-í
[Figure components: Igala Corpus, English Corpus]
Figure 3.4.5: Example Based Machine Translation of English to Igala Language
Different languages differ in how they order the subject, object and
verb (for example, SOV alignment). The matched text is therefore rearranged to give
the proper meaning and structure in the target language. In this research work, we
describe Example Based Machine Translation using Natural Language
Processing. The proposed EBMT framework can be used for automatic translation
of text by reusing examples of previous translations. This framework comprises
three phases: matching, alignment and recombination.
Igala Corpus: It consists of the translated sentences in Igala for each of the
English sentences.
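The three phases can be illustrated with a small Python sketch. It is a simplified illustration under stated assumptions, not the system built in this work: matching is done with token-level similarity from the standard difflib module, alignment is at sentence level only (the i-th Igala sentence is taken to be the translation of the i-th English sentence), and recombination simply joins the retrieved translations. The function names and the 0.8 threshold are hypothetical, and the corpora contain only the one sentence pair given in this chapter.

```python
from difflib import SequenceMatcher

# Parallel corpora: entry i of one list translates entry i of the other.
english_corpus = ["You eat stone and do not drink water"]
igala_corpus = ["Ùwẹ ajòkwúta má m’omi-í"]


def tokenize(sentence):
    return sentence.lower().split()


def match(sentence, corpus):
    """Matching: return (index, score) of the most similar stored example."""
    tokens = tokenize(sentence)
    scored = [(SequenceMatcher(None, tokens, tokenize(ex)).ratio(), i)
              for i, ex in enumerate(corpus)]
    score, index = max(scored)
    return index, score


def align(index, target_corpus):
    """Alignment: sentence-level lookup in the parallel target corpus."""
    return target_corpus[index]


def recombine(fragments):
    """Recombination: join the translated fragments into one output."""
    return " ".join(fragments)


def translate(text, threshold=0.8):
    """Translate each sentence of text; return None if any has no close example."""
    fragments = []
    for sentence in filter(None, (s.strip() for s in text.split("."))):
        index, score = match(sentence, english_corpus)
        if score < threshold:
            return None
        fragments.append(align(index, igala_corpus))
    return recombine(fragments)


print(translate("You eat stone and do not drink water"))
```

An input that is close to a stored example but not identical to it receives the stored translation unchanged, so a fuller system would need sub-sentence alignment to adapt only the parts that differ.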
In the broad and diversified panorama of MT, we believe that this definition task,
far from being a pedantic exercise, is an important step towards separating
essential differences among MT approaches from inessential ones. This effort may
lead to uncovering overlaps between approaches that at first glance seem quite far
apart, or conversely it may bring to light significant differences between
approaches that are superficially similar. We also believe that a better
understanding of the relations among different approaches provides valuable
insight that can guide MT researchers in their decisions about further directions to
take.
Example-Based MT