
CHAPTER ONE

1.1 INTRODUCTION

Living in the global village has brought man into contact with technology that seeks to express itself in every aspect of life, which may itself be viewed as culture. The concept of globalization pushes man back to nature, and nature back to nurture, in a back-and-forth manner. For instance, in Judaism, we are told the story of how the Supreme Being decided to frustrate a people whose common culture and language were a shared heritage for achieving a common purpose, even that of building a tower into the sky – the Tower of Babel.

When they were frustrated by the sudden collapse of what they were building and the sudden loss of a common language, no further progress was possible. Yet today man has not ceased in his quest to make progress: in international relations, bilateral trade and commerce, unionism, the economy, foreign representation, and so on. What "frustration" from a supreme being could halt these quests again? How else can the earth be more "global" if man cannot speak in one tongue, or men cannot communicate with one another as freely as possible?

In this research, we pursue an agenda of Example Based Machine Translation of English to the Igala language, the language of eastern Kogi. What, then, is Machine Translation? The idea is to use a machine (computer) to translate human languages, so that a sequence of text in one language can be rendered as a sequence of text in another language. Machine Translation (also MT) is not restricted to changing a sequence of text in language A into its equivalent in language B; it may be a translation of one language into more than one language, for example the translation of English into Hausa, Igbo, Igala, Itsekiri, etc. It also encompasses speech translation, where what a speaker says in one language is rendered as spoken words in another language. Example Based Machine Translation is, however, different from the translation of a programming language from high level to low level or to machine object code (Bowen et al., 1990).

In this research, attention is given to the translation of English sentences into Igala using Example Based Machine Translation.

1.2 BACKGROUND OF THE STUDY:

Example Based Machine Translation is no longer new, but it is an effort that keeps evolving. As early as the 1930s, patents were granted for the development of translation machines; Georges Artsrouni held a licence to develop a machine which he called the mechanical brain (Hutchins, 2005).

Earlier on, Machine Translation attempts centred on how to translate each sentence in one language into its equivalent in another language, with little concern given to the translation of "deeper meanings" (Nagao, 1984). Although those approaches were not effective, they paved the way for useful research. For instance, the earliest method of translation, called the Direct method, led to the idea of integrating syntactic and semantic rules into direct translation, resulting in a rise up the Vauquois triangle (Vauquois, 1968).

This rise was later developed into an approach that takes deep syntactic and semantic rules into consideration, called the Transfer method (Hutchins, 2005). When things became more complex, such that the application of rules was becoming difficult, statistical models for languages began to be built (Brown et al., 1990, 1993).

Today, translations are grouped according to whether they are done based on rules or based on large bodies of text called corpora. Our proposed translation of English sentences to Igala utilizes rule-based translation for reasons that are peculiar to the languages involved.

1.3 AIM:

The sole aim of this research work is to develop a system that translates English sentences into their linguistic equivalents in the Igala language.

1.4 OBJECTIVES:

The objectives of this research are:

a) To formulate an English-to-Igala transfer-based, example-based machine translation model for the translation of English language sentences into Igala language sentences.

b) To implement and evaluate the model developed in (a) above.

1.5 SCOPE OF THE RESEARCH:

This research is intended to cover the translation of English sentences to Igala sentences with the use of the Transfer method of Example Based Machine Translation.

No special field or register is considered; ordinary everyday conversational English sentences were used.

1.6 STATEMENT OF PROBLEM:

Given any input text, a translation model is expected to output the equivalent of the input text in the target language. Thus, for our language pair, we must develop a Transfer-based model for unidirectional (English-to-Igala) translation.

1.7 MOTIVATION:

The core motivating factors for this research are outlined as follows:

a) The desire to have a functional system in place that is capable of translating English to Igala at one's finger-tips.

b) The desire to project the Igala language as a language for all, by making it easy to learn.

c) The quest to remove language barriers and to enhance interaction in business and other dealings between those who can speak only Igala and those who cannot speak or understand it.

d) The need to carry the Igala language along and prevent its extinction as a language, owing to rapid westernization.

1.8 ORGANIZATION:

This research work begins with an overview of Machine Translation concepts, as contained in the preceding sections. Chapter two runs through previous attempts at Example Based Machine Translation, especially the Transfer approach; it also gives a brief account of the Igala language and the Igala people. Chapter three presents a detailed analysis of the pair of languages, English and Igala; it is the framework for the implementation of the translation system, in which patterns are identified in the structures of both the source language and the target language. These patterns form the basis of the transfer rules used to implement the translation tasks in chapter four. Chapter five gives a summary and conclusion of the whole system.
CHAPTER TWO

LITERATURE REVIEW

2.1 INTRODUCTION
The translation of English to Igala is similar to a translation between any other pair of languages. For instance, if one were to build a translator that translates English to Hausa, the same procedure is entailed as in the translation of English to Igala; the only difference lies in the morphology of each pair of languages. The features of the Igala language and the English language are comparatively examined here.

Igala is the language, and also the name of the ethnic group, of the people who live on the eastern flank of the confluence of the rivers Niger and Benue (Sani et al., 2014). Igala is the ninth most widely spoken language in Nigeria. The Igala people live between latitudes 6°30′ and 8°40′ north and longitudes 6°30′ and 7°40′ east, covering an area of about 13,665 square kilometers (Boston, 1967). According to the 1995 population census, the population of the Igala nationals is estimated at two million (Egbunu, 2013). Owing to the central location of Igala land, it is bordered by many states and various ethnic nationalities: the Igbos on the eastern and southern border, the Idomas on the east, and the Edos on the south-west. The prolonged interaction through trade between the Igala people and the surrounding ethnic groups, such as the Edos, Ebiras, Bassa-Nkomo, Idomas and northern Igbos, has also introduced considerable linguistic divergence into the accents of the communities on the boundaries; Ibaji, Ogugu and Odolu are cases in point.

The Igala language is closely related to the Yoruba and Itsekiri languages, but it has recently been affected to a large extent by other tribal groups sharing borders with the Igala, including the Hausas. As a consequence, the divergence is more pronounced near the borders than within central communities like Anyigba. A more general form of Igala is, however, used in this translation project.

2.2 REVIEW OF RELATED WORKS:

Machine Translation (MT) is the changing of text which denotes a meaning in one language into text that denotes an equivalent meaning in another language by computer. It is the process of converting one natural language into another natural language by means of the computer (Peng, 2013). Machine translation may be fully automatic, or merely a partial assistance by the computer system (computer-aided translation).

In the early 1930s, governments of some countries displayed enormous interest in the new effort to translate languages by means of machines. These interests were demonstrated by granting patents to some experts to enhance their pursuit of this endeavour, such as Georges Artsrouni, who developed the mechanical brain (Hutchins, 2005). In 1949, Warren Weaver suggested that the problem of translation could be attacked with statistical methods, with some ideas from information theory (Peter et al., 1990).


Machine translation is classified into rule-based (RBMT) and corpus-based translation (Peng, 2011). The Transfer approach to Machine Translation, along with others like Direct translation and Interlingua, belongs to the rule-based category, whereas the Statistical technique and the Example-based method are examples of corpus-based MT systems.

Transfer MT is a metamorphosis of the earliest Direct or literal translation: when there is only a shallow transfer from source-language strings to target-language strings, we have direct transfer; if the analysis of the input sentence includes syntactic or semantic processing, the process is called syntactic transfer or semantic transfer. Interlingua employs an approach where all sentences in different languages (usually two or more languages are involved) that express the same thing are represented in the same way. These levels are illustrated by the Vauquois triangle (Vauquois, 1968).
[Figure: the Vauquois triangle, with translation depth rising from direct transfer through syntactic transfer and semantic transfer to Interlingua at the apex; analysis of the source text is on one side and generation of the target text on the other.]

Fig. 2.1: Vauquois triangle

The Transfer approach handles syntactic transformation, applies lexical rules by using a bilingual dictionary and, for complex cases, implements word-sense disambiguation (Hutchins et al., 1992; Senellart et al., 2001). This makes the Transfer approach a better option compared to the Direct method. Studies also show that Transfer is better than Interlingua, because Interlingua performs well only in sub-language scenarios. The Statistical method may be preferred to the Transfer approach for reasons of scalability and less human labour, but it cannot be applied when large online resources (corpora) are not available; the statistical method also requires complex computational skills (Costa-Jussa, 2012). Therefore, for a translation task between English and Igala, Transfer MT is the most appropriate choice.

Some work has been done on the translation of languages by means of the Transfer method, and specifically between English and Igala: a lone example is noun phrase translation from English to Igala (Ayegba et al., 2014). In that work, only noun phrases were dealt with, in a quite shallow transfer. Shallow transfer machine translation was also implemented in 2009 by means of Alignment Templates (AT) for small parallel corpora (Sanchez-Martinez et al., 2009). That work likewise dealt with transfer at shallow levels, and was specifically for closely related language pairs like Spanish and Italian, so it cannot be applied to unrelated language pairs such as Igala and English. The MU MT system employed the transfer approach and the method of annotated tree structures; it was motivated by an attempt to overcome the difficulties of Interlingua (Nagao et al., 2005). The LFG-based Machine Translation engine for English and Filipino (Borra et al., 2001) uses a Lexical Functional Grammar transfer framework. It however lacks the ability to translate words that are not found within the database, and the database itself has too many entries, since the same word makes a separate entry for each of its different senses. In a work titled Rapid Prototyping of a Transfer-based Hebrew-to-English MT system, a statistical decoder was integrated into the system, and transliteration was used as well (Lavie et al., 2004). Although this work was implemented to obtain a rapid result, the lack of thorough editing and the low quality of the dictionary used have been criticized. A Transfer-based translation between Spanish and Basque made use of a finite-state transducer for lexical processing, an HMM for part-of-speech tagging, a deformatter to separate the source text from its format, and an analyzer for the Spanish side (Alegria et al., 2006). The system does not handle semantic ambiguity well, and it is not able to reverse the process, i.e. translate Basque to Spanish, because Spanish is more developed as a language resource than Basque. An attempt was also made to translate speech using a Transfer approach called Incremental Transfer (Matsubara et al., 1997). Incremental Transfer procedures include incremental parsing, transfer and generation. This attempt produces only literal speech translation and does poorly on parallel phrases, garden-path sentences and idioms, because it produces output that is unnatural in the target language.

A version of Hindi-to-English Transfer-based MT uses the CYK algorithm for parsing. Transliteration of words that do not have a corresponding entry in the database (like proper nouns) was adopted, and a tokenizer was also used. However, complex and complex-compound sentence structures were not effectively handled.

In developing an Icelandic-to-English shallow Transfer MT system, XML file management was used along with some existing language processing tools like IceNLP, IceMorphy, etc. (Brandt, 2011). The translation is a shallow one, and the system was built from pre-existing resources and tools which are not applicable to a language like Igala.

In recent years, transfer methods that combine some statistical tools have been developed. Shilon et al. (2011), in their Transfer-based MT system between morphologically rich and resource-poor languages, developed the xfer MT system for Hebrew-to-Arabic. xfer, or stat-XFER (Statistical Transfer Machine Translation), uses translation rules and an SMT decoder. Lavie et al. (2003) also developed an xfer system that uses an elicitation tool and a statistical decoder.

2.3 CONCEPTS OF TRANSFER IN MACHINE TRANSLATION:

The Transfer-based method analyzes the sentence structure, and then generates the target-language text on the basis of word-to-word translation according to the different linguistic rules of the pair of languages. Three dictionaries may be used: the source-language dictionary, the source-language to target-language bilingual dictionary, and the target-language dictionary (Hutchins, 1999).

The first stage of the translation is to analyze the input text for morphology and syntax (and sometimes semantics); next, an intermediate representation (IR) of the analyzed input is created. Transfer converts the source-language IR to the target-language IR by using both the bilingual dictionary and grammatical rules; the output is then generated from the target-language IR (this includes re-ordering, disambiguation, etc.). This is shown below:


SL sentence TL sentence

Analyzing Analyzing

IR for SL Transfer IR for TL

Figure 2.2: Framework for Transfer-based MT
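To make the pipeline concrete, the following minimal Python sketch mirrors the three stages above (analysis, transfer, generation). It is an illustration only, not the system developed in this work: the tiny dictionary, the function names and the sample output are assumptions made for the example.

# A minimal sketch (not the thesis implementation) of the three-stage
# Transfer pipeline: analyse the source sentence into an IR, transfer the IR
# with a bilingual dictionary, then generate the target sentence.
# The dictionary entries below are illustrative placeholders only.

BILINGUAL_DICT = {          # hypothetical English-to-Igala entries
    "bring": "du",
    "the": "",              # illustrative: no separate word in this toy entry
    "food": "ujẹñwu",
}

def analyse(sl_sentence):
    """Analysis: tokenize the source-language sentence into a simple IR."""
    return [w.lower().strip(".") for w in sl_sentence.split()]

def transfer(sl_ir):
    """Transfer: map each source token to a target token via the dictionary."""
    tl_ir = []
    for token in sl_ir:
        target = BILINGUAL_DICT.get(token, token)  # fall back to the source word
        if target:                                 # drop empty translations
            tl_ir.append(target)
    return tl_ir

def generate(tl_ir):
    """Generation: join (and, in a real system, re-order) the target IR."""
    return " ".join(tl_ir)

if __name__ == "__main__":
    print(generate(transfer(analyse("Bring the food."))))  # toy output only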

2.4 THE IGALA LANGUAGE:

The Igala language, as a spoken and written language, makes use of symbols and sounds (Omachonu, 2000). Unlike English, Igala has no formal alphabet of its own; the 26 letters of the English alphabet are used for the Igala writing system, with some exceptions and modifications. The 26 English letters include the uppercase A…Z and the lowercase a…z; the Igala writing system makes use of almost all of the English letters and their combinations, e.g. "a", "e", "o", "ba", "bu", "gbi", etc. Special characters are also used in the Igala writing system, such as "ẹ", "ọ" and "ñ"; these are used specifically to distinguish how one vowel is pronounced from another, e.g. "ẹ" specifies the vowel /e/ as found in the English word bet (/bet/), whereas "e" is used for the diphthong /ei/ which appears in an English word such as make (/meik/). Some English letters, e.g. "s", "q", "v", "x" and "z", are rarely or never used; this may be because these characters have no phonetically similar equivalents in Igala. Some of them are, however, used in Igala text when there is a transliteration, or as part of a "composite consonant" (e.g. "sh"); q, v and x are never used at all (Atadoga, 2013).

Also, the Igala language does not yet have a formal lexicon (language resources in soft copy, held in storage and retrievable for use); this explains why Igala can be described as a resource-scarce language. Like other languages, Igala has an inexhaustible lexicon, but unlike English, only a small, finite list of word classes can be described. These parts of speech are nouns, verbs, prepositions, determiners, adjectives, pronouns, and a few conjunctions and interjections. The few pronouns are gender-insensitive (and, to some extent, person-insensitive). Adjectives and adverbs are mostly derived.
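As an illustration of how such a lexicon could be held in soft copy, the short Python sketch below stores entries with a part-of-speech tag and an English gloss. The entry words are drawn from examples that appear elsewhere in this work, but the data structure, field names and glosses are assumptions made for illustration, not part of any actual Igala lexicon.

# A minimal sketch of a machine-readable Igala lexicon: each entry keeps the
# word (with its special characters as Unicode), a part-of-speech tag from the
# small set of Igala word classes, and an English gloss. Entries are
# illustrative only.

from dataclasses import dataclass

@dataclass
class LexiconEntry:
    igala: str      # Igala form, may contain "ẹ", "ọ", "ñ"
    pos: str        # noun, verb, pronoun, preposition, determiner, ...
    gloss: str      # English equivalent

LEXICON = [
    LexiconEntry("omi", "noun", "water"),      # hypothetical example entries
    LexiconEntry("jẹ", "verb", "eat"),
    LexiconEntry("ùwẹ", "pronoun", "you"),     # pronouns are gender-insensitive
]

def lookup(word):
    """Return every entry matching an Igala word (case-insensitive)."""
    return [e for e in LEXICON if e.igala.lower() == word.lower()]

print(lookup("jẹ"))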


CHAPTER THREE
SYSTEM ANALYSIS AND DESIGN

3.1 INTRODUCTION
Example-based Machine Translation tries to overcome the differences which exist between two languages by applying contrastive knowledge (i.e. knowledge about the differences between the source language and the target language). This strategy is referred to as the direct transfer model. The model requires some representation of the structure of the source language, from which the structure for the target language is derived, followed by a generation step that creates the output sentence. The transfer model for sub-language domain translation (sentences, in our case) follows exactly the same procedure as full-fledged language translation: an analysis of the source and target languages is done; then, with the differences in structure in mind, an intermediate representation of the source, and of how it transforms into the structures of the target, is enforced. This is done by means of direct transfer examples; EBMT rules are usually formalized from the grammars of the source and target languages. The final process is the generation of output, i.e. the translated text, or the result of the translation task.

In this chapter, a detailed analysis of English language grammar against Igala language grammar is presented, along with the preprocessing of inputs and the direct transfer formalism.

3.2 ANALYSIS OF EXISTING SYSTEMS:


A Transfer MT system goes through three phases, namely analysis, transfer and generation (Gehlot et al., 2015).
3.2.1 Analysis:
Analysis involves a keen evaluation of both the source-language and target-language syntax and semantics, including their lexical and morphological variations. For instance, in the translation of English to Arabic, study reveals that Arabic is a VSO (verb-subject-object) language, whereas English is SVO (subject-verb-object) (Jurafsky et al., 1999). Even though both Igala and English have the same word order (they are both SVO), Igala has a way of removing and attaching particles which differentiates the morphology of English and Igala; for instance, number inflection for the plural form of nouns in Igala is done by adding the prefix 'abo-' to the noun (as in abokẹlẹ for men). It is also known that Igala, as an isolating language, lacks inflection in its morphology (Ayegba et al., 2014). In view of these differences, an existing system tries to translate English to Igala by considering the use of derived morphology. In derived (more often, derivational) morphology, a sense is created by combining words from different function domains. This is easily seen when a term that is expressed with a single word in English has to be translated by almost a whole sentence in Igala. Example:

 English word: funny
 Igala sense: ẹñwu ki ache anyi / ẹñwu anyi

This illustration is one of the many instances that define the morphological differences between Igala and English. Many others are found in the verb and adjective word categories, e.g.:

 English word: slap
 Igala sense: k' onẹ awo
 English word: remove
 Igala sense: du kwo, etc.

English noun inflections for gender and number portray a wide gap between English and Igala morphology.
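The following minimal Python sketch illustrates the derived-morphology point: a single English word maps to a multi-word Igala sense, and plural formation uses the 'abo-' prefix rather than inflection. The word-to-sense mappings are those listed above; everything else (the function names and the fallback behaviour) is an illustrative assumption.

# A minimal sketch of a derived-morphology dictionary: one English word may
# map to several Igala words, and plurals are formed by prefixation.
# Mappings are copied from the examples in the text; the rest is illustrative.

SENSE_DICT = {
    "funny": "ẹñwu ki ache anyi",   # one English word -> several Igala words
    "slap": "k' onẹ awo",
    "remove": "du kwo",
}

def translate_word(english_word):
    """Look up the (possibly multi-word) Igala sense of an English word."""
    return SENSE_DICT.get(english_word.lower(), english_word)  # fall back to input

def pluralize_igala_noun(noun):
    """Plural by prefixation: 'abo-' + noun, as in abokẹlẹ for men."""
    return "abo" + noun

print(translate_word("funny"))        # -> ẹñwu ki ache anyi
print(pluralize_igala_noun("kẹlẹ"))   # illustrative only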
3.2.2 Generation:
The processes of analysis and transfer produce intermediate representations that are used to formalize the EBMT rules which produce the target output sentence (or phrase). These intermediate representations take the form of XML code or syntax trees; parsing usually produces trees which can be reliably put to use in rule formalism.
Generation is the phase that produces the translated text and ends the translation process. Most often, the generation phase has program source code embedded, which does re-ordering of words as well as ambiguity handling (word-sense disambiguation).

3.3 FEATURES OF THE PROPOSED SYSTEM:


English is known to be spoken in more parts of the world than any other language, and by more people than any language except Chinese. It has a vocabulary that surpasses that of any other language, at over a million words (Pado et al., 2009). This quickly explains why having to deal with even a sub-domain, or just the noun phrases of English, can be such a herculean task.

In designing this system, a number of English sentence types were closely studied so as to understand their structure. Over 50 sentence structures were identified from everyday use of English in ordinary conversation and in classrooms. More structures with established profiles were also downloaded from online sources, and each of these structures was developed into at least one sentence. The main component of the proposed system is Example-Based MT, which is similar in level to RBMT transfer: an intermediate level of NLP processing, in the middle of the Vauquois triangle. Example-Based MT has to do with template matching and recombination.

Figure 3.3: EBMT of the proposed system

Example-based MT computes a similarity score between input fragments and fragments in the database (the most critical step). Syntactic and/or semantic similarity is used to rank candidates, with NLP layers going (possibly) as deep as semantic analysis (closer to RBMT), and the translated fragments are then stitched together (closer to SMT). The discussion of Example-Based Machine Translation (EBMT) here covers the following:

i) EBMT's workflow

ii) EBMT's working

iii) EBMT vs. Case-Based Reasoning (CBR)

iv) Text similarity

v) Recombination

3.3.1 Essential steps in EBMT are:

i) Phrase fragment matching

ii) Translation of segments

iii) Recombination

Table 3.3.1: EBMT examples (matched templates)

Source Language (Igala)          | Target Language (English)
Náagò chọ́ ò                      | Thanks a lot
Ùwẹ ajòkwúta má m'omi-í          | You eat stone and do not drink water
Óla mií yā ṅ                     | I am not feeling well
Aladi ki a wa                    | The week that is coming

Table 3.3.1 indicates the transfer approach between the source language and the target language.
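A minimal sketch of the matched-template step is given below: the example base maps source fragments to their stored target fragments, and translating a known fragment is an exact lookup after simple normalization. The entries come from Table 3.3.1; the matching criteria of the actual system are not implied by this sketch.

# A minimal sketch of template matching against an example base.
# Normalization here is just lowercasing and whitespace collapsing;
# the real system's matching rules may differ.

EXAMPLE_BASE = {
    "náagò chọ́ ò": "Thanks a lot",
    "ùwẹ ajòkwúta má m'omi-í": "You eat stone and do not drink water",
    "óla mií yā ṅ": "I am not feeling well",
    "aladi ki a wa": "The week that is coming",
}

def normalize(text):
    return " ".join(text.lower().split())

def match_template(source_fragment):
    """Return the stored target fragment for an exact (normalized) match."""
    return EXAMPLE_BASE.get(normalize(source_fragment))

print(match_template("Náagò chọ́ ò"))   # -> Thanks a lot
print(match_template("unknown text"))   # -> None (no matched template)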

3.4 B+ Tree

A B+ tree is a data structure consisting of internal nodes linked by pointers, a special node called the root, and leaves. It has a unique path to each leaf, and all such paths are equal in length. Each internal node contains an ordered list of reference values and pointers to lower-level nodes; these pointers can be thought of as lying between the reference values. The tree stores keys only at the leaves, and stores reference values in the internal nodes. Key search is guided by the reference values, from the root down to the leaves. To search for or insert an element, the root of the B+ tree is the starting point, because it represents the whole range of values in the tree, of which every internal node covers a sub-interval. Suppose we are looking for a value k in the B+ tree. Starting from the root, we look for the leaf which may contain the value k. At each node, we find the adjacent reference values between which the searched-for value lies, and follow the corresponding pointer to the next node in the tree. An internal B+ tree node has children, each of which represents a different sub-interval. Recursion eventually leads to the desired value, or to the conclusion that the value is not present. B+ trees are often used in the implementation of database indexes: each record is stored in the database, while the reference number and the key of that record are stored in the B+ tree. To reach a certain record, we need to know its key, with which we obtain its reference number from the B+ tree; with that reference number, we can retrieve the required record directly and efficiently. The figure below shows the structure of EBMT and CBR with a B+ tree, and a minimal search sketch follows it.
Fig. 3.4: Example Based Machine Translation (source: Dr. Mariana Neves, 2017)
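The following minimal Python sketch illustrates the key search just described on a small, hand-built B+ tree; insertion and node splitting are omitted, and this node layout is only one of several possible representations.

# A minimal sketch of B+ tree key search: internal nodes hold ordered
# reference values and child pointers, leaves hold the keys and (here) the
# record reference numbers. The hand-built tree is illustrative only.

import bisect

class Internal:
    def __init__(self, refs, children):
        self.refs = refs            # ordered reference values
        self.children = children    # len(children) == len(refs) + 1

class Leaf:
    def __init__(self, keys, values):
        self.keys = keys            # ordered keys stored only at the leaves
        self.values = values        # e.g. record reference numbers

def search(node, k):
    """Follow reference values from the root down to the leaf that may hold k."""
    while isinstance(node, Internal):
        i = bisect.bisect_right(node.refs, k)   # pick the sub-interval containing k
        node = node.children[i]
    if k in node.keys:
        return node.values[node.keys.index(k)]
    return None                                  # key not present

# Hand-built example: the root splits the key space at 20 and 40.
root = Internal(
    refs=[20, 40],
    children=[
        Leaf([5, 12], ["rec-A", "rec-B"]),
        Leaf([20, 31], ["rec-C", "rec-D"]),
        Leaf([40, 57], ["rec-E", "rec-F"]),
    ],
)

print(search(root, 31))   # -> rec-D
print(search(root, 99))   # -> None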

3.4.1 Description of the proposed method

In this research, a new approach to an EBMT system is used. The proposed system depends mainly on the examples stored in the Example Base (EB) to get the translation of the input sentence. It searches for the input sentence in the EB; if the input sentence is found there, the system retrieves its corresponding translation.

[Figure: Sentence → Matching → Transfer → Recombination]

Figure 3.4.1: EBMT working strategies

Figure 3.4.1 illustrates that the Example-Based Machine Translation (EBMT) system rests on the idea that similar sentences will have similar translations. It uses past translation examples to generate a translation for a given source-language (SL) text. The system maintains an example base (EB) consisting of translation examples. When an SL sentence is given to the system, the system retrieves a similar SL sentence from the EB, together with its translation, and then adapts the example to generate the target-language (TL) sentence for the input.

The system has two main modules:

1) Retrieval
2) Adaptation

There are three tasks in EBMT: matching fragments against existing examples, transferring (identifying the corresponding translation fragments), and recombining the fragments to give the target text. A sketch of the retrieval and adaptation idea follows.
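In the sketch below, retrieval picks the most similar source-language example (a simple word-overlap score stands in here for whatever similarity measure the full system uses), and adaptation reuses that example's stored translation, reporting the similarity as a reliability factor. The example pairs are taken from tables in this chapter; the rest is an assumption for illustration.

# A minimal sketch of the Retrieval and Adaptation modules.
# Word-overlap (Jaccard) similarity is an illustrative stand-in measure.

EXAMPLE_BASE = [
    ("i am hungry", "Ébi ákpa mí."),          # (SL sentence, TL sentence)
    ("i am not feeling well", "Óla mií yā ṅ"),
    ("bring the food", "D'ùjẹñwu wá."),
]

def similarity(a, b):
    """Word-overlap (Jaccard) similarity between two sentences."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def retrieve(input_sentence):
    """Retrieval: the most similar SL example and its similarity score."""
    best = max(EXAMPLE_BASE, key=lambda ex: similarity(input_sentence, ex[0]))
    return best, similarity(input_sentence, best[0])

def adapt(input_sentence):
    """Adaptation: reuse the retrieved translation, reporting its reliability."""
    (sl, tl), score = retrieve(input_sentence)
    return {"translation": tl, "matched_example": sl, "reliability": score}

print(adapt("I am very hungry"))   # reuses the closest stored example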
3.4.2 Stages of EBMT
In general, there are four stages of work in EBMT: example acquisition, example-base management, example application, and target-sentence synthesis. Example acquisition is about how to obtain examples from a parallel bilingual corpus. Example-base management is about how examples are stored and maintained.
The example-application stage is about how examples are used to facilitate translation, which involves the decomposition of an input sentence into examples and the transformation of source texts into target texts in terms of existing translations. Target-sentence synthesis generates a target sentence by putting the converted examples into a smoothly readable order, with the aim of improving the readability of the target sentence after conversion.
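A minimal sketch of the example-acquisition stage is shown below, assuming two sentence-aligned text files with one sentence per line, where line i of each file forms a translation pair. The file names and the function are illustrative assumptions, not the resources used in this research.

# A minimal sketch of example acquisition from a sentence-aligned parallel
# corpus. File names are hypothetical placeholders.

def acquire_examples(english_path="english_corpus.txt",
                     igala_path="igala_corpus.txt"):
    """Build (English, Igala) example pairs from two aligned text files."""
    with open(english_path, encoding="utf-8") as en, \
         open(igala_path, encoding="utf-8") as ig:
        pairs = [(e.strip(), i.strip()) for e, i in zip(en, ig)
                 if e.strip() and i.strip()]
    return pairs

# Usage: example_base = acquire_examples(); len(example_base) gives the pair count.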
3.4.3 Advantages of EBMT
There are several main advantages to using EBMT:
 Improvement: EBMT has no rules, so improvement is effected simply by adding appropriate examples to the database. In other words, EBMT is easily upgraded.
 Translation speed: EBMT directly returns a translation by adapting the examples, without reasoning through a long chain of rules. In EBMT, deep semantic analysis is avoided because it is assumed that translations appropriate for a given domain can be obtained using domain-specific examples.
3.4.3.1 Translation Accuracy
In EBMT, a reliability factor is assigned to the translation result according to the distance between the input and the similar examples found. In other words, EBMT can tell when its translation is inappropriate.
3.4.4 Drawbacks of EBMT
Although the quality of translation improves as more examples are added to the database, there is a limit after which further examples do not improve the quality. There may even be cases where performance starts to decrease and retrieval from the example database becomes slow. The reason is the cost of storing and accessing a large corpus of examples, and of matching an input phrase or sentence against this corpus.
Thus, in the proposed method, EBMT is used in a way that avoids this problem, with a special dictionary designed for the source-language sentences that works to:
 provide efficient time for getting the translation of the source-language sentence;
 provide efficient memory usage in storing the source-language sentences.

3.4.5 Language Preprocessing:

The EBMT module shares a similar structure with the three stages of analysis, transfer and generation shown in Figure 3.3 (the Vauquois pyramid adapted for EBMT). Like the Direct, Transfer and Interlingua arrangements on that pyramid, it requires a minimum of prior knowledge and is therefore quickly adaptable to many language pairs. The particular EBMT system that we are examining works in the following way. Given an extensive corpus of aligned source-language and target-language sentences, and a source-language sentence to translate:

1.) It identifies exact substrings of the sentence to be translated within the source-language corpus, thereby returning a series of source-language sentences.
2.) It takes the corresponding sentences in the target-language corpus as the translations of those source-language sentences (which should be the case, since the corpora are aligned).
3.) Then, for each pair of sentences:
   i.) it attempts to align the source- and target-language sentences;
   ii.) it retrieves the portion of the target-language sentence marked as aligned with the corpus source-language sentence's substring, and returns it as the translation of the input source-language chunk.

A sketch of this matching-and-retrieval workflow is given below.
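The sketch walks through the workflow on toy data: it finds corpus sentences containing the input chunk as an exact substring and returns the target-side words aligned with that chunk. Because real word alignment is a separate, non-trivial step, a hand-written alignment dictionary per sentence pair stands in for it here, purely for illustration.

# A minimal sketch of substring matching plus alignment retrieval.
# The corpus, translations and word alignments are illustrative placeholders.

CORPUS = [
    # (source sentence, target sentence, {source word -> aligned target word})
    ("bring the food", "D'ùjẹñwu wá.", {"bring": "wá", "food": "ùjẹñwu"}),
    ("i am hungry", "Ébi ákpa mí.", {"i": "mí", "hungry": "ébi"}),
]

def translate_chunk(chunk):
    """Return candidate translations of an input chunk via aligned examples."""
    chunk = chunk.lower().strip()
    candidates = []
    for source, target, alignment in CORPUS:
        if chunk in source:                          # step 1: exact substring match
            aligned = [alignment[w] for w in chunk.split() if w in alignment]
            if aligned:                              # step 3: aligned target portion
                candidates.append((" ".join(aligned), target))
    return candidates

print(translate_chunk("the food"))   # -> [('ùjẹñwu', "D'ùjẹñwu wá.")]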
The above system is a specialization of generalized EBMT systems; other specific systems may operate on parse trees, or only on entire sentences. The system requires the following:

1. Sentence-aligned source and target corpora.
2. A source-to-target dictionary.
3. A stemmer.

The stemmer is necessary because we will typically find only uninflected forms in dictionaries. While the stemmer is consulted in the alignment algorithm, it is not consulted in the matching step; as stated before, those matches must be exact. This made the identification of distinct English sentences for machine translation into Igala easy and interesting. Some of the possible Igala translations of the identified English sentences were obtained from experienced native speakers of the Igala language, and these were also transferred by the EBMT. The following table shows some of the sentences identified and their possible translations in Igala.

Example-based Machine Translation of English to Igala:

ENGLISH                               | ÍGÁLÁÀ
You eat stone and do not drink water  | Ùwẹ ajòkwúta má m'omi-í
Touch him/her/it.                     | D'ọwó ̣ k'ō.̣
Bring me close to yourself            | Fà mí m'óla
Carry go; bring come.                 | Du ló; du wá.
Bring the food.                       | D'ùjẹñwu wá.
Take it and eat it.                   | Gbà k'é ̣ jẹ.
I am hungry                           | Ébi ákpa mí.
My mother starved me                  | Íye mi d'ebi kpa mí
I fell asleep                         | Ólu fù mí mú.
The eyes that recognize someone.      | Éjú k'ì m'ẹnẹ…
I saw them with my two eyes.          | Ù lí má kpàí éjú mi méjì
It was fire that burned me.           | Úná jō mi-í
I am not feeling well                 | Óla mií yā ṅ

Table 3.4.5: Possible English sentences and their Igala translations

3.5 Proposed Example Based Machine Translation (EBMT)

Example-based machine translation is one of the approaches to machine translation. The concept uses corpora of the two languages and translates the input text into the desired target text by proper matching.

[Figure: the English corpus and the Igala corpus feeding the example-based matching process.]

Figure 3.4.5: Example Based Machine Translation of English to Igala Language

Different languages have different language structures, such as subject-object-verb (SOV) alignment. The matching is arranged to give the proper meaning in the target-language text and to form the proper structure. In this research work, we describe Example Based Machine Translation using Natural Language Processing. The proposed EBMT framework can be used for the automatic translation of text by reusing examples of previous translations. This framework comprises three phases: matching, alignment and recombination.

A) Example Based Machine Translation

 English Corpus: We have used 50 English sentences to form a corpus. The sentences are news headlines from reputable newspapers.

 Igala Corpus: It consists of the translated Igala sentence for each of the English sentences.

 Knowledge Base: It stores the patterns of how English sentences are translated into their Igala form.

 Inference Engine: It is a collection of facts and rules. The inference engine compares the given English sentence with the English sentences stored in the corpus; after finding the best match, it translates it into Igala according to the Igala translation present in the Igala corpus.

The automatic translation of text from one language into another is Machine Translation. Due to globalization, there is a need in today's information-technology-dominated age to understand text written in different languages by using computers. However, there are numerous challenges for automatic machine translation owing to the diversity of language constructs. This research presents the implementation of Example Based Machine Translation (EBMT) using Natural Language Processing (NLP) techniques. In this system, the user can submit text in English and it will be translated into the Igala language. If the input sentence is not found among the examples in the EB, it will be partitioned into sub-sentences, which are compared against the examples in the EB. If these sub-sentences are found in the EB, the system retrieves their corresponding translations. If these sub-sentences are not found in the EB, the EBMT system falls back on a word-by-word analysis of the input sentence to get the translation. This fallback cascade is sketched below.
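The sketch assumes a toy example base, a toy word dictionary and a simple comma/"and" splitter for sub-sentences; none of these stand for the actual resources of the system.

# A minimal sketch of the fallback cascade: (1) whole-sentence lookup in the
# example base, (2) sub-sentence lookup, (3) word-by-word translation.
# All data and the splitter are illustrative placeholders.

import re

EXAMPLE_BASE = {"i am hungry": "Ébi ákpa mí.", "bring the food": "D'ùjẹñwu wá."}
WORD_DICT = {"i": "mí", "eat": "jẹ", "water": "omi"}

def normalize(text):
    return " ".join(text.lower().replace(".", "").split())

def translate(sentence):
    s = normalize(sentence)
    if s in EXAMPLE_BASE:                                    # 1) whole sentence
        return EXAMPLE_BASE[s]
    parts = [normalize(p) for p in re.split(r",| and ", s) if p.strip()]
    if len(parts) > 1 and all(p in EXAMPLE_BASE for p in parts):
        return " ".join(EXAMPLE_BASE[p] for p in parts)      # 2) sub-sentences
    return " ".join(WORD_DICT.get(w, w) for w in s.split())  # 3) word by word

print(translate("I am hungry"))                       # exact example match
print(translate("Bring the food, and I am hungry"))   # sub-sentence recombination
print(translate("I eat water"))                       # word-by-word fallback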

Human translation is a complex intellectual activity, and accordingly Machine Translation (MT) is a complex scientific task, involving virtually every aspect of Natural Language Processing. Many approaches to MT have been proposed, each of them inspired by some insight about translation. Each approach has its own merits, accounting for some aspect of translation better than other approaches, but typically each approach's advantages are countered by weaknesses in other respects. The real challenge is combining different approaches and insights into a comprehensive whole. To this end, it is important to analyze and classify the different approaches, for two reasons.

First, it is important to see to what extent differences are substantial or merely notational. Sometimes different approaches look at the same subject from different viewpoints, or use different representations, but a formal analysis shows that they are equivalent. This was the case with many formal systems (categorial and phrase-structure grammars, finite-state machines and regular grammars, explanation-based generalization and partial evaluation, etc.). In other cases, differences have been shown to be matters of degree (for instance, in the field of MT, the transfer and Interlingua approaches).

Second, it is important to see to what extent different approaches are mutually exclusive, or whether they can be integrated into one system that encompasses all of them.

In the broad and diversified panorama of MT, we believe that this classification task, far from being a pedantic exercise, is an important step towards separating essential differences among MT approaches from inessential ones. This effort may lead to uncovering overlaps between approaches that at first glance seem quite far apart, or conversely it may bring to light significant differences between approaches that are superficially similar. A better understanding of the relations among different approaches provides valuable insight that can guide MT researchers in their decisions about further directions to take.

Example-Based MT translates fragmental phrases by analogy, and in this respect it is similar to SMT's decoding process. Analogy (text similarity) is the key in EBMT. The requirements for text similarity are:

 a measure of similarity: similar documents should be measured as similar, and vice versa;

 large lexical knowledge networks to support similarity, e.g. WordNet, Wikipedia, etc.
