CHAPTER ONE

Introduction

1.1 Background of Research

The ever-increasing need for cross-regional communication and information

exchange has made translation from one human language to another a matter of

absolute necessity in today’s highly globalized and connected/networked world.

Language is the medium of communication. Human language is used purposively to

communicate ideas, emotions, feelings and desires, to co-operate among social groups

and to exhibit habits, all of which can be transmitted along a variety of channels (Bamisaye

O, 2000). There are over 6,800 living languages in the world, which reflects the

scope of linguistic and cultural diversity. Access to information written in another

language is of great interest and the means of sharing information across languages

is translation, therefore developing technologies for translating from one language

to another is very important. Without translation, there can be no cross-regional

communication and many voices will not be heard without this critical function.

(Koehn P, 2009) showed that, owing to differences in culture and the multilingual

environment in India inter-language translation was necessary for the transfer of

information and sharing of ideas. The need for translation is also very glaring in

the business community. It has been observed that language barriers between

companies and their global customers are stifling economic growth. Forty-nine

percent of executives say a language barrier has stood in the way of a major

international business deal, and nearly two-thirds (64 percent) of those same

executives say language barriers make it difficult to gain a foothold in

international markets. Whether inside or outside a company, global audiences

prefer to communicate in their native languages, which increases efficiency and

receptivity and allows for easier understanding of concepts (Ayegba F,

2016). Language translation is imperative in the globally united and yet

linguistically and culturally separated world in which we live.

Humans were originally responsible for translating from one language to another.

At some point the supply of translation services could no longer keep pace with the

demand for translated content. Moreover, human translation is costly, time

consuming and inadequate for addressing the real-time needs of businesses to serve

multilingual prospects, partners and customers (Ayegba F, 2016). The inherent

limitations of human translation made the search for an alternative means of

translation paramount. The search led to the development of what is known today as

machine translation or computer-assisted translation. Machine Translation (MT) is

defined as the use of computers to translate messages in the form of text or speech

from one natural (human) language into another natural language

(Ahmed & Mohd, 2014). It is the process of using software to translate text from

one natural language to another. This need has prompted research organizations

and government agencies to develop tools for automatic translation of text in an

attempt to achieve wider outreach and bridge the gap of language diversity (Koehn

P, 2010).

MT has proved to be of social, political, scientific and philosophical importance.

Social and political importance emerges from the necessity to understand the other.

Binational or multinational countries and organizations need to translate great

volumes of texts into many languages in a very limited time. For instance, the

European Union allocates around €330m a year to translate from and into 23

official languages. In addition, the Union allocates nearly 1% of its annual budget to

all its language services (DG Translation official website, 2014). The European Union

uses an internal machine translation engine, which has shifted from a rule-based to a

statistical MT system in recent years. Commercial importance emerges from

the fact that for each step in international markets, from business agreements to

instruction manuals, translation is a requirement for people to interact with each

other. The delays in translation can be costly, so using MT can help translators and

trading parties in the most efficient ways.

Nigeria is the most populous country in Africa with a population of about 200

million people. It is also the seventh most populous country in the world (Ayegba F, 2016).

Nigeria is a multilingual country with over 500 ethnic groups. This shows the level

of linguistic and cultural diversity in the country. Idoma is one of the ethnic groups

in Nigeria.

The Idomas are people that primarily inhabit the lower western areas of Benue

State, Nigeria, and some of them can be found in Taraba State, Cross River

State, Enugu State, Kogi State and Nasarawa State in Nigeria. The Idoma

language is classified in the Akweya subgroup of the Idomoid languages of

the Volta–Niger family, which include Igede, Alago, Agatu, Etulo, Ete, Akweya

(Akpa) and Yala languages of Benue, Nasarawa, Kogi, Enugu, and Northern Cross

River states. The Akweya subgroup is closely related to the Yatye-Akpa sub-

group. The bulk of the territory is inland, south of river Benue, some seventy-two

kilometres east of its confluence with River Niger. The Idoma tribe are known to

be 'warriors' and 'hunters' of class, but hospitable and peace-loving. The greater

part of Idoma land remained largely unknown to the West until the 1920s, leaving

much of the colourful traditional culture of the Idoma intact. The population of the

Idomas is estimated to be about 3.5 million. The Idoma people have a traditional

ruler called the Och'Idoma who is the head of the Idoma Area Traditional Council.

This was introduced by the British. Each community has its own traditional chief

such as the former Ad'Ogbadibo of Orokam, Late Chief D.E Enenche. The Palace

of the Och'Idoma is located at Otukpo, Benue State. The present Och'Idoma, HRM,

Elaigwu Odogbo John, the 5th Och'Idoma of the Idoma People was installed on the

30th of June, 2022 following the passing of his Predecessor HRH Agabaidu Elias

Ikoyi Obekpa who ruled from 1996 to October 2021. Past Och'Idomas also

include: HRH, Agabaidu Edwin Ogbu, who reigned from 1996 to 1997, HRH,

Abraham Ajene Okpabi of Igede descent who ruled from 1960 to 1995 and HRH,

Agabaidu Ogiri Oko whose reign took place between 1948 and 1959.

1.2 Statement of Problem


The benefits of Machine translation systems have social, political, scientific,

philosophical and economic dimensions. The absence of machine translation

applications for the Idoma language has shut its speakers out of these benefits.

The development of a system for automatic translation of English content

into Idoma has therefore become imperative. This research

seeks to address this challenge and bring to the Idoma people the numerous

benefits associated with machine translation.

1.3 Aim and Objectives of Project

The aim of the project is to develop a system that can be used to translate

simple English sentences to Idoma language.

The specific objective is to develop a language processor that will have the

capacity to:

i. Accept as input a sentence in English and translate it into Idoma.
ii. Store such translations and print the translated output if so desired.

1.4 Scope of Research

The project is aimed at creating a machine translation system that will accept

English source sentences and generate the equivalent Idoma sentences. It is not a

bidirectional application that can translate from English to Idoma and from Idoma

to English.

1.5 Significance of Research

The need for this research arose from the fact that a lot of information and

possibilities remain hidden from the Idoma people who have little or no knowledge

of English. The research will provide Idoma people greater access to information.

It will also give Idoma language a public profile in the information technology

world and provide a platform for people to really appreciate the beauty of their

indigenous language.

1.6 Limitations of Research

A survey of the existing literature and the internet shows that Idoma is not a

well-documented language. Linguistic resources such as parsers, morphological

analyzers, parallel corpora, part-of-speech taggers and bilingual dictionaries, which

facilitate rapid development of machine translation applications, are non-existent

either in hard copy or in digitized form. This greatly impeded the development

process.

1.7 Project Outline

This thesis is organized into five Chapters and a number of appendices.

Chapter One: Lists criteria, objectives, statement of the problem, scope,

limitations and organization of the project, which holds the foundation for the

output of the project.

Chapter Two: Explores the existing literature on language and machine translation

and the work done by scholars in this field. It describes language translators with

examples and discusses the components of machine translators, the procedures for

language translation and the various technologies for building language translators.

Chapter Three: Presents the analysis of the Idoma and English languages.

Existing methods of English-Idoma translation are studied, and the challenges of

translating from English to Idoma are identified and discussed.

Chapter Four: Presents the implementation of the proposed system, subjects the

machine translator to a translation session from English to Idoma, and discusses the

testing and evaluation of the translator. The output is tested to see whether the

stated objectives of the research were achieved.

Chapter Five: Presents the summary, conclusions and recommendations for

extended research by future researchers.

1.8 Definition of Technical Terms

Morphological Analyzer: A morphological analyzer is a program for analyzing
the morphology of an input word; it detects the morphemes of any text.

Morphemes: The minimal distinctive units of grammar in a language.

Parser: A natural language parser is a program that works out the grammatical
structure of sentences, for instance, which groups of words go together (as
"phrases") and which words are the subject or object of a verb.

Corpus: In linguistics, a corpus (plural corpora) or text corpus is a large and
structured set of texts, usually electronically stored and processed.

Parallel corpus: A parallel corpus is a corpus that contains a collection of original
texts in language L1 and their translations into a set of languages L2 ... Ln. In most
cases, parallel corpora contain data from only two languages.

Bilingual Dictionary: A bilingual dictionary or translation dictionary is a
specialized dictionary used to translate words or phrases from one language to
another.

Part-of-speech tagger: A Part-Of-Speech Tagger (POS Tagger) is a piece of
software that reads text in some language and assigns parts of speech to each word
(and other token), such as noun, verb, adjective, etc., although generally
computational applications use more fine-grained POS tags like 'noun-plural'.

Source Language: The language in which a text appears that is to be translated
into another language.

Target Language: The language into which someone or an application translates
or interprets.

CHAPTER TWO

Literature Review

2.0 Origin of languages and the need for Translation

Language is thought to have emerged around 150,000 years ago to meet humans’

communicative needs. Its origin remains under debate, as evidence of languages before

writing is almost impossible to find.

One theory argues that all languages share a single origin but slowly

diverged into thoroughly different entities, much as animal species did. However,

the claim that all languages share the same root requires more evidence.

The first language on earth might be the origin of all languages or a dead language

that fathered only a few of today’s languages. Since language is estimated to be some

150,000 years old and writing only about 6,000, no written evidence from the earlier

period exists to answer this question.

The origin of language was perhaps the need to communicate. Maybe the initial

words were only howls and hoots, but eventually, they evolved to form a

systematic way of communication for humans.

The Babel myth documented in Genesis 11:6-9 indicates that there was a time

when all human beings spoke one language. Men later developed an inordinate

ambition of building a city and a tower contrary to the Creator’s plan and purpose.

As a result, God gave people different languages. This resulted in the movement of

different groups of people to occupy different parts of the world. The resources

of nature are not evenly distributed: what is found in one part may not be found in

another. This made people travel from one region to another to meet their needs.

A need for a means of information exchange thus arose, and this led to translation.

Translation is necessary for cross-regional communication and for gathering the

information one needs to play a full part in society (Andy & Hany, 2009).

Translation is a social, political, scientific, philosophical as well as economic

necessity in our multilingual society. Translation is essential for international and

intercultural activity, for it facilitates mutual understanding among different and

conflicting racial, ethnic, religious and cultural groups.

2.1 Human-driven Translation

Human translators have practical world knowledge which gives them the ability

to determine the proper connection between the words and between the sentences

throughout the document. This way the translator creates a legible document that is

also logical and contains the correct grammar and accurate connections. Although

human translators produce accurate translations, only a limited number of

them are available. According to market studies, the demand for

translation outweighs its supply. Apart from being in short supply, human

translators are expensive, and translation takes them considerable time. To keep

up with the ever-increasing demand for translation, translation technologies were

developed.

2.2 History of Machine Translation

Machine translation was one of the first applications of computers, and the idea

was conceived even before the invention of computers (Hutchins, 1986). The fall

of Latin as the universal scientific language and the supposed inability of natural

languages to express thought unambiguously led thinkers such as Descartes and

Leibniz to come up with the idea of numerical codes for languages. Descartes, in a

letter dated 1629, described a universal language cipher in which the lexical

equivalents in all known languages would be given the same code (Hutchins,

1986). Such dictionaries were actually published by three people: Cave Beck in

1657, Johann Joachim Becher in 1661 and Athanasius Kircher in 1663

(Hutchins, 1986).

Automatic translation between human languages has been a long term scientific

dream. The research started in the 1930s. The research was aimed at developing

software capable of translating from one language to another with minimal human

participation. The first recorded success in machine translation took place in the

middle of the 1950s when a team of scientists from Georgetown University, USA,

had a machine successfully translate a number of sentences from Russian to

English. This success led many universities to establish their own development

centers for machine translation. The centers needed funds to proceed with the

research, and the United States government was appealed to for funding. In response,

the government set up the Automatic Language Processing Advisory Committee

(ALPAC), which was commissioned in 1964. The committee was to report to the

government on the state of play in machine translation as regards

quality, cost and prospects. The committee submitted a negative report in 1966 and

concluded that there was no shortage of human translators and that there was no

immediate prospect of machine translation producing useful translation of general

scientific text. The report led to withdrawal of funding and demoralization.

Research in machine translation came to a standstill at this point.

Optimism and enthusiasm for machine translation resurfaced in the 1980s for two

reasons: first, the administrative and commercial needs of multilingual communities

stimulated the demand for translation; and second, large-scale access to

personal computers and word-processing programs produced a market for less

expensive machine translation systems. The important machine translation

applications developed at this time were GETA-Ariane (Grenoble), SUSY

(Saarbrücken), MU (Kyoto), and Eurotra (the European Union).

The beginning of the 1990s also witnessed vital developments in machine

translation with the emergence of different translation technologies. To this day,

machine translation has continued to progress, fueled by competition to

establish more business in different parts of the world and by the need for

localization of industrial products and services as well as the provision of

information to a global audience. More details are provided in (Andy & Hany, 2009,

Maryann F, 2009, Omachonu, G, 2011, Fahime & Abbas, 2012, Finch & Hwang et al.,

2005).

2.3 Machine Translation Approaches

Machine translation approaches can be divided into different categories. Two main

paradigms can be distinguished: the rule-based approach and the

empirical-based or data-driven approach. Rule-based translation systems can be

divided into three categories: literal translation method, interlingua-based method

and transfer-based method. Rule-based systems are based on linguistically-

informed foundations requiring extensive morphological, syntactic and semantic

knowledge. The input is transferred to the target using a large set of sophisticated

linguistic translation rules. Translation rules are created manually, demanding

significant multilingual and linguistic expertise. Therefore, rule-based systems

require large initial investment and maintenance for every language pair (Egbunu,

F, 2013). Within the empirical-based paradigm, three approaches can be

further distinguished: example-based, statistical and context-based

(Ibrahim, S, 2014). Under the empirical-based approach, the knowledge is

automatically extracted by analyzing translation examples from a parallel corpus

built by human experts. The advantage is that, once the required techniques have

been developed for a given language pair, MT systems should, in theory, be

quickly developed for new language pairs using the training data provided.

Although rule-based systems require a significant amount of linguistic knowledge,

the knowledge acquired for one natural language processing system may be reused

to build the knowledge required for a similar task in another system. (Hieu, H, 2011)

posited that the rule-based approach is better than its corpus-based counterpart

in two main cases: (1) less-resourced languages, for which large

corpora, possibly parallel or bilingual, with representative structures and entities

are neither available nor easily affordable, and (2) morphologically rich

languages, which even with the availability of corpora suffer from data sparseness.

It is clear from this argument that each of these approaches has

its strengths and weaknesses, which are discussed in detail in later sections.

2.3.1 Literal translation method

The literal translation method is a simple form of rule-based machine translation.

It is also called direct translation, word-based translation or dictionary-based

translation. The basic idea is that the text is translated word by word,

usually without much consideration of the context in which the words occur (Ibrahim, S,

2014). It basically works as follows: a word or phrase from the

source language is selected and then looked up in the dictionary for the

corresponding word or phrase in the target language. That is why a literal

translation system is generally designed for a particular language pair and is not

versatile.

This approach, also known as the first-generation approach, is the original and oldest

translation strategy, which was employed from around the 1950s to the 1960s, when the need

for machine translation was mounting. It performs a word-for-word or

phrase-for-phrase translation. The word order of the target language text is the same as that of

the source language even where the target language does not permit the same word

order (Banjo & Jibowo, 2011). The method has no capacity for rearranging

syntactic construction or lexical selection. This means that the sentence is not

analyzed structurally or morphologically. It maps directly from source language to

target language with very minimal analysis. According to (Callison-Burch &

Osborne et al., 2006), the method depends heavily on a large bilingual dictionary and

lacks separation between analysis and generation. Given any source sentence, the

system picks up the direct equivalent of each source word from the bilingual

dictionary and presents the equivalents in the same order as the words appear in the source sentence.

Another problem with the literal translation approach is the lexical selection

problem. The approach does not analyze and translate words from their context.

This is especially problematic when the words to be translated have more than one

meaning.
One more obvious limitation of this method, pointed out by (Roberto, N,

2009), is lack of extensibility: adding a fresh language pair (direction of translation)

to a direct system is hardly distinguishable from creating an entirely new system.

These limitations notwithstanding, the direct model has one advantage: it is highly

robust and simple to implement.
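
The word-for-word procedure described in this section can be illustrated with a short Python sketch. It is only an illustration under stated assumptions and not the system developed in this project: the function name direct_translate and the tiny English-Spanish dictionary are hypothetical, and Spanish is used simply because no Idoma lexicon has been presented at this point.

```python
# Toy bilingual dictionary (hypothetical; a real system needs a large one).
TOY_DICTIONARY = {
    "the": "el",
    "red": "rojo",
    "car": "coche",
    "is": "es",
    "new": "nuevo",
}


def direct_translate(sentence, dictionary):
    """Translate word by word, keeping the source word order."""
    output = []
    for word in sentence.lower().split():
        # Words missing from the dictionary are passed through unchanged.
        output.append(dictionary.get(word, word))
    return " ".join(output)


print(direct_translate("The red car is new", TOY_DICTIONARY))
```

The sketch returns "el rojo coche es nuevo", whereas correct Spanish places the adjective after the noun ("el coche rojo es nuevo"). This is the word-order weakness of the direct model noted above.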

2.3.2 Transfer method

Along with the development of the literal translation method, the transfer-based

method was proposed. The transfer-based method performs an analysis of the

sentence structure and generates the target-language text based on the different

linguistic rules of the different languages.

The transfer model belongs to the second generation of machine translation (from the

mid-1960s to the 1980s). This model is more advanced than the direct model because

it does not merely perform local and morphological analysis but carries out both

regional and grammatical analysis. In other words, the model conducts a

comprehensive analysis of the source text. It has rules that map the grammatical

segments of the source sentence to a representation in the target language. These

rules, which are used for the structural transformation of phrases and for resolving

ambiguity, are stored in a database (Howard, J, 1982). The rules may also be stored as

facts in a rule base (Rekha & Neha, 2012). It is these rules that ensure that the

translation is both morphologically and structurally correct in terms of word order.

The advantages of the transfer-based approach were discussed in (Ikani, F, 2010,

Gurlen & Navjor, 2013, Banjo & Jibowo, 2011). These advantages, which make

the transfer-based architecture appealing to many researchers, include the following. First,

applicability: while it is difficult to reach the level of abstractness required in

interlingua systems, the level of analysis in transfer models is attainable. Second,

extensibility: to add a fresh language pair (or direction of translation) to a transfer

system, one need only provide transfer components for the new language pair(s)

and monolingual components (analysis and synthesis) for the new languages.

Existing monolingual components can be preserved. For example, an English-Portuguese

module may share several transformations with an English-Spanish

module. Third, ease of implementation: developing a transfer-based system requires

less time and effort than developing an interlingua system. Fourth, acquisition of

linguistic knowledge is easy, and the relevant set of rules is easy to construct,

understand, modify and maintain. Fifth, ambiguities that carry over from one language

to another are handled with minimal effort.

The above advantages notwithstanding, the transfer-based architecture has an

inherent limitation: a large set of transfer rules must be created for each

source language/target language pair, so a translation system that accommodates n

languages requires on the order of n² sets of transfer rules (n(n - 1) if every direction is covered).

TAUM (Arnold & Sadler, 1990) and METEO (Asad & Habib, 2013) are examples

of transfer-based systems.
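
The three stages of a transfer system (analysis, transfer and generation) can be sketched as follows. This is a minimal sketch under stated assumptions, not the design of any of the systems cited above: the lexicon, the function names and the single transfer rule (adjective + noun becomes noun + adjective, as needed for English to Spanish) are hypothetical, and gender and number agreement are ignored.

```python
# Hypothetical toy lexicon: English word -> (part of speech, Spanish word).
LEXICON = {
    "the": ("DET", "el"),
    "red": ("ADJ", "rojo"),
    "black": ("ADJ", "negro"),
    "car": ("NOUN", "coche"),
    "cat": ("NOUN", "gato"),
}


def analyse(sentence):
    """Analysis: tag each source word (raises KeyError on unknown words)."""
    return [(word, LEXICON[word][0]) for word in sentence.lower().split()]


def transfer(tagged):
    """Transfer: apply one structural rule, ADJ NOUN -> NOUN ADJ."""
    result = list(tagged)
    i = 0
    while i < len(result) - 1:
        if result[i][1] == "ADJ" and result[i + 1][1] == "NOUN":
            result[i], result[i + 1] = result[i + 1], result[i]
            i += 2  # skip the pair that was just reordered
        else:
            i += 1
    return result


def generate(tagged):
    """Generation: look up the target word for each source word."""
    return " ".join(LEXICON[word][1] for word, _tag in tagged)


def transfer_translate(sentence):
    return generate(transfer(analyse(sentence)))


print(transfer_translate("the red car"))
```

Unlike the direct method, the reordering rule yields "el coche rojo". Supporting another language pair would mean writing a new transfer function and lexicon for that pair, which is where the growth in rule sets described above comes from.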

2.3.3 Statistical Machine Translation (SMT)

SMT uses statistical analysis and predictive algorithms to derive the rules that are best

suited for translating into the target language. The models are trained using a bilingual

corpus. Because a model reflects the subject matter of the text used to train it, an SMT

system is best suited to documents pertaining to the same subject. Usually, a solid corpus

requires about 100 million words and 1 million aligned sentences to be effective. SMT

can be approached through different subgroups: word-based, phrase-based,

syntax-based and hierarchical phrase-based.

The first statistical approach to MT was pioneered by a group of researchers at

IBM in the late 1980s and early 1990s (Nagao, M, 1984, Newmark, P. 1988).

Although SMT arrived on the scene relatively late, it became the de facto technology for

building MT systems and gained tremendous momentum both in the research

community and in the commercial sector. The requirement for using the SMT approach

in machine translation is the availability of large, good-quality and representative

aligned bilingual corpora (Amparo, A, 2014). The progress of SMT has been

supported by the availability of large parallel corpora such as the Arabic-English

and Chinese-English parallel corpora distributed by the Linguistic Data Consortium

[96], the Europarl corpus (Omachonu & David, 2012) and the JRC-Acquis corpus

(Ralf & Bruno et al., 2006).

The notion of SMT implies the use of statistics. It is based on statistics derived

from corpora of naturally occurring language. Translation is performed

according to the probability distribution p(e|f) that a string e in the target language

(e.g. English) is the translation of a string f in the source language (e.g. French).
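
For reference, the way this distribution is normally used can be written out as a short derivation. The decomposition below is the standard noisy-channel formulation from the SMT literature and is not stated in the sources cited in this section; the symbol \hat{e} for the chosen translation is introduced here only for illustration.

```latex
\hat{e} = \arg\max_{e} \, p(e \mid f)
        = \arg\max_{e} \, \frac{p(f \mid e)\, p(e)}{p(f)}
        = \arg\max_{e} \, p(f \mid e)\, p(e)
```

The second step is Bayes' rule, and p(f) can be dropped in the last step because it does not depend on e. Here p(f | e) is usually called the translation model, estimated from a parallel corpus, and p(e) the language model, estimated from target-language text alone.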

2.3.4 Example Based Machine Translation (EBMT)

Example-based machine translation simulates the human translation process. It was

introduced by Makoto Nagao in 1984 (Omar & Nazlia et al., 2010). It performs

translation by consulting translation examples stored, together with their translations,

in a textual database: an input expression is rendered in the

target language by retrieving from the database the example that is most similar

to the input. EBMT has been described by different researchers in different ways:

(Carbonell & Klein et al., 2006) called it “case based”, (Nameh & Fakhrahmad et al.,

2011) called it “analogy based” and [101] referred to it as “experience-guided”.

EBMT is an empirical-based or data-driven approach, and a major requirement for

its deployment is a parallel aligned corpus (Callison-Burch & Osborne et al., 2006).

EBMT and SMT share some similarities in both strengths and weaknesses.
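
The retrieval step at the heart of EBMT can be sketched in Python as follows. This is an illustrative sketch only: the example base, the function name most_similar_example and the use of the standard library's difflib.SequenceMatcher as the similarity measure are assumptions made here, and a full EBMT system would also adapt and recombine the retrieved fragments.

```python
from difflib import SequenceMatcher

# Hypothetical example base of (source sentence, stored translation) pairs.
EXAMPLE_BASE = [
    ("good morning", "buenos días"),
    ("good night", "buenas noches"),
    ("thank you very much", "muchas gracias"),
]


def most_similar_example(sentence, examples):
    """Return (score, source, translation) for the closest stored example."""
    best = None
    for source, translation in examples:
        # ratio() is a character-level similarity between 0.0 and 1.0.
        score = SequenceMatcher(None, sentence.lower(), source).ratio()
        if best is None or score > best[0]:
            best = (score, source, translation)
    return best


score, source, translation = most_similar_example("Good morning!", EXAMPLE_BASE)
print(source, "->", translation)
```

For the input "Good morning!" the closest stored example is "good morning", so its stored translation is returned. The quality of the output therefore depends directly on how well the example base covers the input, which is why an aligned parallel corpus is a prerequisite.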

2.3.5 Hybrid Approach

Both the rule-based and the empirical-based approaches discussed so far have their

inherent strengths and weaknesses. Rule-based technology requires significant

linguistic expertise, which has to be manually encoded into its data structures and

algorithms either as special cases or as a full representation of the conceptual

content (Callison-Burch & Osborne et al., 2006). This impedes development speed

and robustness. Empirical-based technology needs a large amount of data, which

is usually not readily available, especially for resource-poor languages. It also fails

when selection preferences need to be based on distant words. Due to the inherent

weaknesses of these approaches none has been able to singly achieve the required

level of accuracy and quality in translation. This situation led to the adoption of a

hybrid approach. The hybrid approach is a machine translation technology that

integrates various machine translation technologies (Salem & Brian, 2009,

Cristina, E, 2010). The technologies complement each other to produce a more

satisfactory result. Some popular machine translation systems which employ the

hybrid technology are PROMT, SYSTRAN and Asia Online. AppTek delivered

its first hybrid machine translation system in 2009.

2.3.6 Neural Machine Translation

By 2017, machine translation had made another technological leap with the rise of

Neural Machine Translation (NMT). NMT harnesses the

power of Artificial Intelligence (AI) and uses neural networks to generate

translations.

Neural Machine Translation is the primary approach used in industry to

perform machine translation. This state-of-the-art approach is an application of

deep learning in which massive datasets of translated sentences are used

to train a model capable of translating between two languages. It outperforms

phrase-based systems without the need to create handcrafted features such as

lexical or grammatical rules. It is used in translation services such as

Google Translate (https://translate.google.com/).

NMT was introduced by (Kalchbrenner & Phil, 2013, Sutskever & Oriol et al., 2014,

Cho & Bart et al., 2014), who defined Recurrent Neural Network (RNN) models

for machine translation.

Before NMT, statistical machine translation (SMT) provided the

state-of-the-art results. While many initially believed that SMT would eventually become the

answer to machine translation, several issues, including the number of components

that went into a single translation model and the lack of generalizability of a model,

slowed SMT progress and prevented it from providing perfect translations.

At a very high level, these NMT models comprise an Encoder and a Decoder,

both of which are Recurrent Neural Networks that are trained jointly. An attention

mechanism helps to align the input tokens with the output tokens in order to facilitate

the translation. The Encoder reads the input sentence and generates a sequence of

hidden states. These hidden states are then used by the Decoder to generate a

sequence of output words, representing the translation of the input sentence.
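
How an attention mechanism weighs the Encoder's hidden states for one Decoder step can be sketched in plain Python. This is a sketch under stated assumptions rather than the formulation of any particular NMT system: dot-product scoring is only one common choice, the function names are hypothetical, and the two-dimensional vectors are made up, whereas a real model learns its hidden states during training.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Dot-product attention over the encoder hidden states."""
    # One score per input token: similarity to the current decoder state.
    scores = [
        sum(d * h for d, h in zip(decoder_state, state))
        for state in encoder_states
    ]
    weights = softmax(scores)
    # Context vector: weighted average of the encoder hidden states.
    size = len(decoder_state)
    context = [
        sum(w * state[k] for w, state in zip(weights, encoder_states))
        for k in range(size)
    ]
    return weights, context


# Made-up hidden states for a three-token input sentence.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [0.0, 2.0]

weights, context = attend(decoder_state, encoder_states)
print(weights)
```

The weights show which input tokens the Decoder attends to at this step: the second and third tokens receive equal weight and the first receives the least. The context vector is what the Decoder would combine with its own state to predict the next output word.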

Although neural machine translation has emerged as the most promising machine

translation approach in recent years, showing superior performance on public

benchmarks and rapid adoption in deployments by, e.g., Google, Systran and

WIPO, there have been reports of poor performance, for example from the systems built

under low-resource conditions in the DARPA LORELEI program.

A fundamental requirement for the deployment of NMT is the availability of

a massive dataset of translated sentences with which to train a model capable of

translating between two languages.

CHAPTER THREE

Systems Analysis and Design

3.1 Language Differences

Despite the underlying universality of human languages, there are

significant differences in how their structural constituents are organized. These

differences can be observed at various levels of linguistic analysis, including

phonology, morphology, syntax, semantics and pragmatics.

Phonology: Languages vary in their phonological systems, including the inventory

of sounds (consonants, vowels, and tones), phonotactics (permissible arrangements

of sounds), and phonological rules (sound patterns and processes).

Morphology: Morphological structures differ across languages in terms of how

words are formed and modified. Some languages, like English, use

affixes (chiefly prefixes and suffixes) to indicate grammatical relationships and derive

new words, while others, like Chinese, utilize more analytic or isolating

morphological strategies.

Syntax: Syntax refers to the rules governing the combination of words to form

phrases, clauses, and sentences. Languages exhibit diverse word order patterns

(e.g., subject-verb-object vs. subject-object-verb), head directionality (e.g.,

head-initial vs. head-final), and syntactic agreement systems.

Semantics: Semantic structures vary in how meanings are encoded and interpreted.

This includes differences in lexical semantics (word meanings), compositional

semantics (meaning derived from the combination of words and phrases), and

pragmatic principles governing language use in context.

Pragmatics: Pragmatics deals with how language is used in context, including

principles of communication, speech acts, implicature, and discourse structure.

Cultural and situational factors influence pragmatic conventions, leading to

variation across languages.

These differences reflect the rich diversity of human linguistic expression and the

adaptability of language to diverse sociocultural environments. Linguists study

these variations to uncover universal principles underlying human language and to

understand the unique features of individual languages.

3.2 Analysis of English and Idoma Language

The two languages, English and Idoma, were carefully studied in terms of
syntax, semantics, morphology, parts of speech, word order, pluralization of
nouns, structural differences, etc., through document study, observation, and
interaction with Idoma/English professionals and community elders with vast
experience and knowledge of Idoma history. The rules that govern the
combination of words to form correct sentences in Idoma were identified.

Like English, Idoma has a fixed SVO (subject, verb, object) word order, but the
arrangement of words in noun phrases and adjective phrases is not the same.
English places modifiers before nouns in noun phrases; Idoma does the reverse,
placing nouns before modifiers (Mary, 2016).
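
The noun-phrase reordering described above can be sketched as follows. This is a minimal illustration in Python, not the rule set of the proposed system: it assumes that the input is already tagged, that only adjectives (tag ADJ) count as modifiers, and that several modifiers keep their relative order after the noun. The function name `reorder_noun_phrases` and the tag sets are illustrative, and the tokens are left in English because no Idoma forms are asserted here.

```python
# Tags treated as nouns in this sketch (assumed to mirror the noun tags
# of the part of speech tag system used in this chapter).
NOUN_TAGS = {"N", "ANN", "ANNS", "NOP"}


def reorder_noun_phrases(tagged, modifier_tags=frozenset({"ADJ"})):
    """Move modifiers that precede a noun to the position after it.

    `tagged` is a list of (word, tag) pairs in English order.
    Modifiers that are not followed by a noun are left where they are.
    """
    result = []
    pending = []  # modifiers waiting for the noun they modify
    for word, tag in tagged:
        if tag in modifier_tags:
            pending.append((word, tag))
        elif tag in NOUN_TAGS:
            result.append((word, tag))   # noun first
            result.extend(pending)       # then its modifiers
            pending = []
        else:
            result.extend(pending)       # no noun followed: keep order
            pending = []
            result.append((word, tag))
    result.extend(pending)
    return result


english_order = [("dog", "ANN"), ("eats", "VPP"), ("red", "ADJ"), ("meat", "N")]
idoma_order = reorder_noun_phrases(english_order)
```

Here the subject-verb-object order is untouched, and only the modifier and noun inside the object noun phrase swap places.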

A significant amount of linguistic knowledge is required for the successful
deployment of machine translation systems using rule-based machine translation
technology; therefore, a comprehensive study and analysis of the two languages
was carried out. This knowledge provided the basis for the design of the rule
base, the inference engine, and the full-form lexicon, which are essential
components of the proposed rule-based system for automatic translation of
English sentences to Idoma sentences.

From the studies of Idoma sentence structure conducted by Mary (2016) and
Ugwu, Imoh & Yakubu (201..), the transformational rules that govern the
translation of English phrases into Idoma phrases were formulated. They are
presented in the tables below.

Table 3.1: Noun Phrases Transformational Rules

It was observed that the Idoma language has no auxiliary verbs, and that some
English words, such as is, are, of, a, and an, have no equivalents in Idoma.
Whenever they occur in a sentence, they are ignored in the translation.
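
A minimal sketch of this step is shown below. The word list contains only the five examples named above; the complete list used by the system is not given in this chapter, and the names `UNTRANSLATED_WORDS` and `drop_untranslated` are illustrative.

```python
# English words reported above as having no Idoma equivalent.
# Only the five examples from the text are listed; the full list is
# an open assumption.
UNTRANSLATED_WORDS = {"is", "are", "of", "a", "an"}


def drop_untranslated(tokens):
    """Return the tokens with the untranslated English words removed.

    Matching is case-insensitive; the remaining tokens keep their order.
    """
    return [t for t in tokens if t.lower() not in UNTRANSLATED_WORDS]


remaining = drop_untranslated(["The", "boy", "is", "a", "student"])
```

In this example the tokens "is" and "a" are dropped, and the three remaining tokens are passed on for lexical lookup.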

3.1.1 Part of Speech tag system

Words are grouped into categories called parts of speech. Traditional English
grammar recognizes eight parts of speech: nouns, pronouns, verbs, adjectives,
adverbs, prepositions, conjunctions, and interjections; determiners are often
treated as a further category. Idoma likewise has parts of speech such as
nouns, verbs, adjectives, and adverbs.

The meaning of some words in Idoma depends on the nouns that follow them. For
this reason, we developed a part of speech tag system for the machine
translation system that differs from the conventional part of speech system, so
that meanings can be appropriately conveyed. The part of speech tag system is
presented in Table 3.6 below.

Table 3.6: Part of Speech Tags

S/N  Tag   Description
1    ANN   Animate singular noun
2    ANNS  Animate plural noun
3    NOP   Noun indicative of place or location
4    N     General nouns
5    VPP   Verb in present tense
6    VPD   Verb in past tense
7    VPT   Verb in past participle
8    PPN   Personal pronoun
9    POPN  Possessive pronoun
10   DA    Definite article
11   IDA   Indefinite article
12   SPV   Split verb
13   DRV   Direction verb
14   ADJ   Adjective
15   ADV   Adverb
16   CDN   Cardinal numbers
17   ODN   Ordinal numbers
18   QTN   Quantifiers
19   DEM   Demonstratives
20   PDT   Predeterminers
21   PRP   Preposition
22   V     General verbs
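
One way the finer-grained noun tags could be used is sketched below: the translation of a context-sensitive word is chosen according to the tag of the noun that follows it. This is an assumption about how the tag system is applied, not a description of the implemented rules. The lexicon entry, the English key, and the Idoma forms are placeholders (no real Idoma words are asserted), and `choose_form` is an illustrative name.

```python
# Hypothetical context-sensitive entry: English word -> {noun tag -> form}.
# The key and the angle-bracket strings are placeholders, not real data.
CONTEXT_LEXICON = {
    "example-word": {
        "ANN": "<form-before-animate-noun>",
        "NOP": "<form-before-place-noun>",
        "N": "<default-form>",
    },
}


def choose_form(word, next_tag, lexicon=CONTEXT_LEXICON):
    """Pick the form of `word` that matches the tag of the following noun.

    Falls back to the general-noun form ("N") when there is no form for
    `next_tag`, and returns None when the word is not context-sensitive.
    """
    forms = lexicon.get(word)
    if forms is None:
        return None
    if next_tag in forms:
        return forms[next_tag]
    return forms.get("N")


before_animate = choose_form("example-word", "ANN")
before_plural = choose_form("example-word", "ANNS")
```

With a conventional tag set that has a single noun tag, the first and second lookups could not be told apart, which is the motivation given above for the custom tags.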

3.2 Analysis of the Present System of English to Idoma Translation

3.2.1 Weaknesses of the Present System

The following weaknesses were identified in the present system of translating

from English to Idoma language:

1. Scarcity of professional translators.

2. The scarcity of professional translators limits the possibility of meeting
deadlines for large-volume assignments.

3. The cost of employing or hiring professional translators is prohibitive, and
volunteer translators are only a handful.

4. Human translation is very slow and time-consuming.

5. Lack of standardization in the Idoma language makes translation a difficult
task, especially with regard to technical documentation.

6. Lack of tools such as online dictionaries, glossaries, and translation
memories inhibits the translation process.

3.2.2 Strength of the present system

Human translators have practical world knowledge as well as pragmatic
knowledge. As a result, English to Idoma translations carried out by human
translators are of reasonably high quality.

3.2.3 Benefits of the proposed Solution

The proposed rule-based machine translation system from English to Idoma
language will overcome the identified weaknesses of the present system and
provide the following benefits:

1. Reduced translation cost. With sufficient translation volume, machine
translation is less expensive than human translation.

2. Improved delivery times. Delivery time for machine-generated translations is
limited only by the time it takes to revise them. In many applications
revision is not critical, so delivery is immediate.

3. Availability. MT systems have the advantage of being always on, so requests
are processed as they are received.

3.3 System design

3.3.1 System Architecture


The conceptual architecture of the system is shown in Figure 3.1.

[Figure 3.1 components: English Input; Preprocessing (Case Converter, Sanitizer, Tokenizer); Bilingual Dictionary DB; Rule Engine (arrays of English words, POS and Idoma equivalents); Idoma Sentence]

Figure 3.1 System Architecture

The full-form bilingual lexicon, which contains the English words and their
Idoma equivalents together with the parts of speech, was developed on the
Microsoft Access database platform. The rule engine, which applies a collection
of lexical and syntactic transfer rules to generate the Idoma equivalent of the
English sentence, was developed using PHP.
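
The preprocessing and lookup stages named in the architecture can be sketched as follows. The sketch is written in Python for illustration only (the system itself was developed in PHP), and its details are assumptions: the case converter is taken to lowercase the text, the sanitizer to strip punctuation, and the tokenizer to split on whitespace. The dictionary is a two-entry in-memory stand-in for the database, with placeholder strings instead of real Idoma words, and all function names are illustrative.

```python
import re

# Stand-in for the bilingual dictionary: English word -> (Idoma form, POS).
# The Idoma forms are placeholders, not real data.
TOY_DICTIONARY = {
    "boy": ("<idoma:boy>", "ANN"),
    "eats": ("<idoma:eats>", "VPP"),
}


def convert_case(text):
    """Case converter: lowercase the input (assumed behaviour)."""
    return text.lower()


def sanitize(text):
    """Sanitizer: replace anything other than letters, digits,
    apostrophes and whitespace with a space (assumed behaviour)."""
    return re.sub(r"[^a-z0-9\s']", " ", text)


def tokenize(text):
    """Tokenizer: split on runs of whitespace."""
    return text.split()


def preprocess(text):
    return tokenize(sanitize(convert_case(text)))


def lookup(tokens, dictionary):
    """Build three parallel arrays: English words, POS tags and Idoma
    equivalents. Words missing from the dictionary get None entries."""
    words, tags, equivalents = [], [], []
    for token in tokens:
        idoma, pos = dictionary.get(token, (None, None))
        words.append(token)
        tags.append(pos)
        equivalents.append(idoma)
    return words, tags, equivalents


tokens = preprocess("The BOY, eats!")
words, tags, equivalents = lookup(tokens, TOY_DICTIONARY)
```

The three arrays correspond to the "arrays of English words, POS and Idoma equivalents" in Figure 3.1 and would be the input on which the transfer rules of the rule engine operate.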

3.3.2 Database Specification and Platform

The relational database model was used for building the database, and MySQL is
the relational database management system of choice. The database schema is
described below:

1. Table Name: BilingualDictionary

S/NO  FIELD NAME   DATA TYPE   FIELD SIZE
1     Id           AUTONUMBER  AUTONUMBER
2     Englishword  VARCHAR     40
3     Idomaword    VARCHAR     70
4     pos          VARCHAR     7

Table 3.7 Bilingual Dictionary table structure

This is the full-form bilingual dictionary. Each record contains an English
word, the corresponding Idoma equivalent, and other information necessary for
translation.
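
The table structure can be tried out with the following sketch. It uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in for the database platform named in this chapter, so the AUTONUMBER field is approximated with an auto-incrementing integer key. The inserted row uses a placeholder instead of a real Idoma word, and `find_entries` is an illustrative helper.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the real platform.
conn = sqlite3.connect(":memory:")

# Field names and sizes follow the table structure above. SQLite accepts
# the VARCHAR sizes but does not enforce them.
conn.execute(
    """
    CREATE TABLE BilingualDictionary (
        Id          INTEGER PRIMARY KEY AUTOINCREMENT,
        Englishword VARCHAR(40),
        Idomaword   VARCHAR(70),
        pos         VARCHAR(7)
    )
    """
)

# Placeholder entry; "<idoma:boy>" is not a real Idoma word.
conn.execute(
    "INSERT INTO BilingualDictionary (Englishword, Idomaword, pos) "
    "VALUES (?, ?, ?)",
    ("boy", "<idoma:boy>", "ANN"),
)


def find_entries(connection, english_word):
    """Return all (Idomaword, pos) pairs stored for an English word."""
    cursor = connection.execute(
        "SELECT Idomaword, pos FROM BilingualDictionary "
        "WHERE Englishword = ?",
        (english_word,),
    )
    return cursor.fetchall()


entries = find_entries(conn, "boy")
```

Returning a list rather than a single row allows one English word to carry several Idoma equivalents, for example under different part of speech tags.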

3.3.3 Input/output Specification

Figure 3.2 displays the layout of the input and output screen. The user enters
the text to be translated in the first section, under "Enter English Text to
Translate", and clicks the Translate button. The output of the translation is
displayed in the second section of the screen, under "Translated Idoma Text".

[Figure 3.2: screen titled "English to Idoma Automatic Translator", with an input area labelled "Enter English Text to Translate" and an output area labelled "Translated Idoma Text"]

Figure 3.2 Input/output Layout

3.3.4 Justification for choice of programming language

The inference engine, which contains the rules for translating an input
sentence in English into Idoma, was developed using PHP.

PHP (a recursive acronym for "PHP: Hypertext Preprocessor") is a server-side
scripting language, which means that applications written in it run on web
servers. PHP is widely used in developing web applications and has become one
of the main languages for developing web-based applications. Leading social
networking sites like Facebook and reputable organizations like Harvard
University both support PHP, which adds to its popularity and credibility.
