(IJCSIS) International Journal of Computer Science and Information Security,Vol. 8, No. 9, December 2010
Structural Analysis of Bangla Sentences of Different Tenses for Automatic Bangla Machine Translator
Md. Musfique Anwar, Nasrin Sultana Shume and Md. Al-Amin Bhuiyan
Dept. of Computer Science & Engineering, Jahangirnagar University, Dhaka, Bangladesh
Email: musfique.anwar@gmail.com, shume_sultana@yahoo.com, alamin_bhuiyan@yahoo.com
Abstract
 
This paper addresses the structural mapping of Bangla sentences of different tenses for machine translation (MT). Machine translation requires analysis, transfer and generation steps to produce target-language output from a source-language input. A structural representation encodes the information in a Bangla sentence. A transfer module has been designed that generates English sentences using a Context Free Grammar (CFG). The MT system generates a parse tree according to the parse rules, and a lexicon provides the properties of each word and its meaning in the target language. The MT system can be extended to paragraph translation.
Keywords:
 
Machine Translation, Structural Representation, Context Free Grammar, Parse Tree, Lexicon
1. Introduction
A machine translator is a computerized system that produces a translation from one natural language to another, with or without human assistance. The term excludes computer-based translation tools that merely support human translators by providing access to on-line dictionaries, remote terminology databanks, transmission and reception of texts, etc. The core of MT is the automation of the full translation process; in short, machine translation (MT) means translation using computers.

To interpret any language, we first need to determine the structure of a sentence using grammatical rules. Parsing, or more formally syntactic analysis, is the process of analyzing a text, made of a sequence of tokens (for example, words), to determine its grammatical structure with respect to a given formal grammar. Parsing a sentence produces a structural representation (SR), or parse tree, of the sentence [1].

Analysis and generation are the two major phases of machine translation. Two main techniques are involved in the analysis phase: morphological analysis and syntactic analysis. A morphological parsing strategy decomposes a word into morphemes given a lexicon list, the proper lexicon order and various spelling-change rules [2]; that is, it incorporates the rules by which words are analyzed. For example, in the sentence “The young girl's behavior was unladylike”, the word unladylike can be divided into the morphemes un (prefix, “not”), lady (root word, “well-behaved female”) and like (suffix, “having the characteristics of”).
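The affix-stripping idea can be sketched in Python. This is a minimal sketch that assumes a hypothetical toy lexicon (PREFIXES, SUFFIXES, ROOTS) covering only the unladylike example. It also ignores the spelling-change rules mentioned above, which a real morphological parser would need.

```python
from typing import List, Optional, Tuple

# Hypothetical toy lexicon covering only the "unladylike" example.
PREFIXES = {"un": "not"}
SUFFIXES = {"like": "having the characteristics of"}
ROOTS = {"lady": "well-behaved female"}

Morpheme = Tuple[str, str, str]  # (form, kind, gloss)


def decompose(word: str) -> Optional[List[Morpheme]]:
    """Split word into optional prefix + root + optional suffix using the toy lexicon.

    Unaffixed readings are tried first; no spelling-change rules are applied.
    Returns None when no combination yields a known root.
    """
    w = word.lower()
    prefixes = [("", "")] + [(p, g) for p, g in PREFIXES.items() if w.startswith(p)]
    suffixes = [("", "")] + [(s, g) for s, g in SUFFIXES.items() if w.endswith(s)]
    for p, p_gloss in prefixes:
        for s, s_gloss in suffixes:
            root = w[len(p):len(w) - len(s)]
            if root in ROOTS:
                parts: List[Morpheme] = []
                if p:
                    parts.append((p, "prefix", p_gloss))
                parts.append((root, "root", ROOTS[root]))
                if s:
                    parts.append((s, "suffix", s_gloss))
                return parts
    return None


print(decompose("Unladylike"))
```

Trying the unaffixed reading first means a bare root such as "lady" is never split.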
 
Morphological information about words is stored together with their syntactic and semantic information.

The purpose of syntactic analysis is to determine the structure of the input text. This structure consists of a hierarchy of phrases: the smallest are the basic symbols and the largest is the sentence. It can be described by a tree, known as a parse tree or syntax tree, with one node for each phrase. Basic symbols are represented by leaf nodes, other phrases by interior nodes, and the root of the tree represents the sentence.

Syntactic analysis aims to identify four things:
• the sequence of grammatical elements (e.g. article, verb, preposition) or of functional elements (e.g. subject, predicate);
• the grouping of grammatical elements (e.g. noun phrases consisting of nouns, articles, adjectives and other modifiers);
• dependency relations, i.e. hierarchical relations;
• the syntactic constituents themselves.
Once the syntactic constituents of a sentence have been identified, its structural representation is easier to obtain [3].

Most grammar-rule formalisms are based on the idea of phrase structure: strings are composed of substrings called phrases, which come in different categories. There are three types of phrases in Bangla: noun phrase, adjective phrase and verb phrase. Simple sentences are composed of these phrases, and complex and compound sentences are composed of simple sentences [4].

Early standard transformational models assume that basic phrase markers are generated by phrase structure rules (PS rules) of the following sort [5]:

S → NP AUX VP
NP → ART N
VP → V NP

These PS rules state that a sentence (S) can consist of, or be expanded as, the sequence NP (noun phrase) AUX (auxiliary verb) VP (verb phrase). They also state that NP can be expanded as ART N and that VP can be expanded as V NP.
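To make the PS rules concrete, the Python sketch below applies them as a leftmost derivation starting from S. The dictionary PS_RULES simply transcribes the three rules above. None of the rules is recursive and each category has a single expansion, so the loop always terminates; this is a property of this tiny rule set, not of grammars in general.

```python
from typing import Dict, List

# The three PS rules from the text: each category has exactly one expansion.
PS_RULES: Dict[str, List[str]] = {
    "S": ["NP", "AUX", "VP"],
    "NP": ["ART", "N"],
    "VP": ["V", "NP"],
}


def leftmost_derivation(start: str = "S") -> List[List[str]]:
    """Repeatedly expand the leftmost expandable category and record every step."""
    form = [start]
    steps = [form]
    while True:
        idx = next((i for i, sym in enumerate(form) if sym in PS_RULES), None)
        if idx is None:  # only preterminal categories (ART, N, AUX, V) remain
            return steps
        form = form[:idx] + PS_RULES[form[idx]] + form[idx + 1:]
        steps.append(form)


for step in leftmost_derivation():
    print(" ".join(step))
```

The final line, ART N AUX V ART N, is the category sequence of a sentence such as “The boy is reading a book”, with “is” filling the AUX slot.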
70http://sites.google.com/site/ijcsis/ISSN 1947-5500
 
This paper implements a technique to perform structural analysis of Bangla sentences of different tenses using Context Free Grammar rules.
2. Bangla Sentences Structure
 
In Bangla, a simple sentence is formed by an independent (principal) clause. Example:

A complex sentence consists of one or more subordinate clauses within a principal clause [2]. For example:

A Bangla compound sentence is formed by two or more principal clauses joined by a conjunctive indeclinable. Example:

The types of Bangla tense are given in Fig. 1.

Fig. 1 Types of Bangla tense
2.1 Basic Structural Differences between Bangla and English
The following are the structural differences between the Bangla and English languages:
• The basic sentence pattern in English is subject + verb + object (SVO), whereas in Bangla it is subject + object + verb (SOV). Example:
English: I (S) eat (V) rice (O)
Bangla: (S) (O) (V)
• The auxiliary verb is absent in Bangla. Example:
English: I (Pronoun) am (Auxiliary verb) reading (Main verb) a (Article) book (Noun)
Bangla: (Pronoun) (Article) (Noun) (Main verb)
• A preposition is a word placed before a noun, pronoun or noun-equivalent to show its relation to some other word in the sentence [6]. In Bangla, the bivakti is instead placed after the noun, pronoun or noun-equivalent. Example:
English: The man sat on the chair
Bangla: (here “ ” is the bivakti)
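The first difference can be illustrated in Python by reordering a clause whose constituents have already been tagged with their grammatical roles. This is a minimal sketch. The role tags and the English glosses standing in for the Bangla words are assumptions for illustration; a real transfer module would take the roles from the parse tree.

```python
from typing import List, Tuple

# A token is a (gloss, role) pair; role is "S" (subject), "O" (object) or "V" (verb).
Token = Tuple[str, str]

# Target English position of each role: subject, then verb, then object.
SVO_POSITION = {"S": 0, "V": 1, "O": 2}


def sov_to_svo(clause: List[Token]) -> List[Token]:
    """Reorder a role-tagged clause from SOV into SVO order.

    sorted() is stable, so a multi-word subject or object keeps its inner order.
    """
    return sorted(clause, key=lambda tok: SVO_POSITION[tok[1]])


bangla_order = [("I", "S"), ("rice", "O"), ("eat", "V")]
print(" ".join(word for word, _ in sov_to_svo(bangla_order)))  # I eat rice
```

The other two differences (inserting an auxiliary, turning a bivakti into a preposition) need tense and case information. That information comes from the lexicon and parse tree, not from word order alone.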
2.2 Structural Transfer from Bangla to English
Parsing is the process of building a parse tree for an input string. The syntactic structure of a Bangla sentence can be extracted using either of two approaches: (i) top-down parsing or (ii) bottom-up parsing.
2.2.1 Top-Down Parsing
Top-down parsing starts at the most abstract level (the level of sentences) and works down to the most concrete level (the level of words). An input sentence is derived using the context-free grammar rules by matching the terminals of the sentence. Given an input string, we start by assuming that it is a sentence and then try to prove that it really is one by applying the grammar rules left-to-right. This works as follows: if we want to prove that the input is of category S and we have the rule S → NP VP, then we next try to prove that the input string consists of a noun phrase followed by a verb phrase.
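Top-down parsing is commonly implemented as a recursive-descent parser with backtracking; the Python sketch below is a minimal version of that technique. GRAMMAR is a hypothetical toy grammar, not the rule set used in the paper. Each input token is assumed to arrive already tagged with its part of speech, as a lexicon lookup would provide. As with any naive recursive-descent parser, left-recursive rules (e.g. NP → NP PP) would cause infinite recursion; this toy grammar has none.

```python
from typing import Iterator, List, Tuple

# Hypothetical toy grammar (not the rule set used in the paper).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["ART", "N"], ["N"]],
    "VP": [["V", "NP"], ["V"]],
}

Token = Tuple[str, str]  # (word, part-of-speech tag)


def parse(symbol: str, tokens: List[Token], pos: int) -> Iterator[Tuple[tuple, int]]:
    """Yield (tree, next_pos) for every way `symbol` can cover tokens starting at pos."""
    if symbol not in GRAMMAR:  # part-of-speech category: must match the next token's tag
        if pos < len(tokens) and tokens[pos][1] == symbol:
            yield (symbol, tokens[pos][0]), pos + 1
        return
    for rhs in GRAMMAR[symbol]:  # try each alternative in order, backtracking on failure
        yield from expand(symbol, rhs, tokens, pos, [])


def expand(lhs: str, rhs: List[str], tokens: List[Token], pos: int, children: list):
    """Match the symbols of rhs one after another, left to right."""
    if not rhs:
        yield (lhs, children), pos
        return
    for child, nxt in parse(rhs[0], tokens, pos):
        yield from expand(lhs, rhs[1:], tokens, nxt, children + [child])


def top_down(tokens: List[Token]):
    """Return the first parse tree that covers the whole input, or None."""
    for tree, end in parse("S", tokens, 0):
        if end == len(tokens):
            return tree
    return None


sentence = [("the", "ART"), ("boy", "N"), ("drinks", "V"), ("tea", "N")]
print(top_down(sentence))
```

The parser commits to S first and only then looks at words, which is exactly the “assume it is a sentence, then prove it” strategy described above.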
2.2.2 Bottom-Up Parsing
 
The basic idea of bottom-up parsing is to begin with the concrete data provided by the input string (that is, the words we have to parse or recognize) and to build more abstract, high-level information from them. Example: consider the Bangla sentence “ ”, which is parsed bottom-up using the following context-free grammar rules:

<SENTENCE> → <NOUN-PHRASE> <VERB-PHRASE*>
<NOUN-PHRASE> → <CMPLX-NOUN> | <CMPLX-NOUN> <PREP-PHRASE*> | <ART> <ADJ> <NOUN> <PREP-PHRASE*>
<VERB-PHRASE> → <CMPLX-VERB> | <CMPLX-VERB> <PREP-PHRASE*> | <CMPLX-VERB> <PREP-PHRASE*>
<PREP-PHRASE> → <PREP> <CMPLX-NOUN*> | <PREP> <PRONOUN>
<CMPLX-NOUN> → <ARTICLE> <NOUN> | <NOUN> | <PRONOUN> | <NOUN> <PRONOUN> <NOUN>
<CMPLX-VERB> → <MAIN-VERB> | <MAIN-VERB> <NOUN-PHRASE*>

During the bottom-up parsing of the Bangla sentence “ ”, we obtain the grammatical structure NOUN ARTICLE NOUN MAIN-VERB. The syntactic categories in this structure are then successively combined into higher-level constituents until a SENTENCE is obtained, as shown below:

Input sentence → NOUN ARTICLE NOUN MAIN-VERB
→ NOUN-PHRASE NOUN-PHRASE MAIN-VERB
→ NOUN-PHRASE CMPLX-VERB
→ NOUN-PHRASE VERB-PHRASE
→ SENTENCE
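The reduction sequence above can be reproduced with a small bottom-up reducer in Python. In each pass it scans the current form left to right and replaces every matching right-hand side with its left-hand side, trying longer rules first. The rules in RULES are our assumption, not the rule set listed above. They are written in Bangla (SOV) order so that they yield exactly the trace shown, whereas the listed CMPLX-VERB rule places the verb first. The reducer is deliberately naive and greedy: it does no backtracking, so it is not a complete parser for arbitrary grammars.

```python
from typing import List, Tuple

# Hypothetical rules in Bangla (SOV) order, chosen to reproduce the trace in the text.
RULES: List[Tuple[str, Tuple[str, ...]]] = [
    ("NOUN-PHRASE", ("NOUN", "ARTICLE")),
    ("NOUN-PHRASE", ("NOUN",)),
    ("CMPLX-VERB", ("NOUN-PHRASE", "MAIN-VERB")),
    ("VERB-PHRASE", ("CMPLX-VERB",)),
    ("SENTENCE", ("NOUN-PHRASE", "VERB-PHRASE")),
]
# Longest right-hand sides first; sorted() is stable, so ties keep the order above.
ORDERED_RULES = sorted(RULES, key=lambda rule: len(rule[1]), reverse=True)


def reduce_once(form: List[str]) -> List[str]:
    """One left-to-right pass replacing each matched right-hand side with its left-hand side."""
    out, i = [], 0
    while i < len(form):
        for lhs, rhs in ORDERED_RULES:
            if tuple(form[i:i + len(rhs)]) == rhs:
                out.append(lhs)
                i += len(rhs)
                break
        else:  # no rule starts here: keep the symbol unchanged
            out.append(form[i])
            i += 1
    return out


def bottom_up(tags: List[str]) -> List[List[str]]:
    """Reduce a part-of-speech sequence pass by pass until only SENTENCE remains."""
    trace = [list(tags)]
    while trace[-1] != ["SENTENCE"]:
        nxt = reduce_once(trace[-1])
        if nxt == trace[-1]:
            raise ValueError(f"cannot reduce {nxt}")
        trace.append(nxt)
    return trace


for form in bottom_up(["NOUN", "ARTICLE", "NOUN", "MAIN-VERB"]):
    print(" ".join(form))
```

Because each pass performs all non-overlapping reductions it finds, the first step turns both nouns into NOUN-PHRASE at once, matching the first line of the trace in the text.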
3. Proposed MT Model
The proposed model for structural analysis of Bangla sentences is shown in Fig. 2.
Fig. 2 Block diagram of the proposed MT model
3.1 Description of the Proposed Model
The proposed MT system takes a natural Bangla sentence as input for parsing. The stream of characters is scanned sequentially and grouped into tokens according to the lexicon, in which words having a collective meaning are grouped together. For the input sentence “Chele ti Boi Porche”, the output of the tokenizer is as follows [1] [4]:
TOKEN = (“Chele”, “Ti”, “Boi”, “Por”, “Che”).
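A tokenizer that produces the token list above can be sketched in Python. This is a minimal sketch built on two assumptions. First, LEXICON_FORMS is a hypothetical lexicon mapping lower-cased transliterated forms to their canonical token spelling. Second, an unknown word is split into a known root followed by a known inflection, preferring the longest root. That splitting rule is our assumption, not the paper's stated procedure.

```python
from typing import Tuple

# Hypothetical lexicon: lower-cased transliteration -> canonical token spelling.
LEXICON_FORMS = {"chele": "Chele", "ti": "Ti", "boi": "Boi", "por": "Por", "che": "Che"}


def tokenize(sentence: str) -> Tuple[str, ...]:
    """Split on whitespace, then split unknown words into a known root plus a known inflection."""
    tokens = []
    for word in sentence.split():
        w = word.lower()
        if w in LEXICON_FORMS:
            tokens.append(LEXICON_FORMS[w])
            continue
        # Try the longest possible root first, e.g. "porche" -> "por" + "che".
        for cut in range(len(w) - 1, 0, -1):
            root, rest = w[:cut], w[cut:]
            if root in LEXICON_FORMS and rest in LEXICON_FORMS:
                tokens.extend([LEXICON_FORMS[root], LEXICON_FORMS[rest]])
                break
        else:
            raise ValueError(f"unknown word: {word!r}")
    return tuple(tokens)


print(tokenize("Chele ti Boi Porche"))
```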
The parser groups the tokens into grammatical phrases that are used to synthesize the output. Usually, the phrases are represented by a parse tree that depicts the syntactic structure of the input.

A lexicon can be defined as a dictionary of words in which each word carries some syntactic, semantic, and possibly some pragmatic information. The entries in a lexicon can be grouped by word category (nouns, verbs, prepositions and so on), with every word listed under the categories to which it belongs [1] [4] [5] [7]. In our project, the lexicon contains the English meaning and the part of speech of each Bangla word.

A context-free grammar (CFG) is a set of recursive rewriting rules (or productions) used to generate patterns of strings. It provides a simple and precise mechanism for describing how phrases in a natural language are built from smaller blocks, capturing the "block structure" of sentences in a natural way. Categories such as noun, verb and preposition, together with their respective phrases, lead to a natural recursion, because a noun phrase may appear inside a verb phrase and vice versa. The most common way to represent a grammar is as a set of production rules that say how the parts of speech can be put together to make grammatical, or “well-formed”, sentences [8].

In the conversion unit, an input sentence is analyzed and a source language (SL) parse tree is produced using a bottom-up parsing methodology. The corresponding target language (TL) parse tree is then produced. Each Bangla word of the input sentence is replaced, in the target (English) parse tree, with the corresponding English word from the lexicon to produce the target (English) sentence. Structural representation (SR) is the process of finding a parse tree for a given input string. For example, the parse tree of the input sentence
 
and the corresponding parse tree of the English sentence “The boy drinks tea” are shown in Fig. 3.
[Fig. 2 labels: Source Language Sentence (Bangla); Parser; Parse Tree of Bangla Sentence; Conversion Unit; Context Free Grammar Rules; Lexicon; Parse Tree of English Sentence; Output Target Language Sentence (English)]
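The conversion unit can be sketched in Python as a recursive walk over the source parse tree. The walk reorders children according to transfer rules and replaces each leaf with its English meaning from the lexicon. Everything specific here is a hypothetical illustration, not the paper's data:
• the transliterated Bangla words chele, ti, cha and khay;
• their glosses in LEXICON;
• the tree shape, which follows the NOUN ARTICLE NOUN MAIN-VERB structure of Section 2.2.2;
• the two permutations in REORDER.
The English side puts MAIN-VERB before NOUN-PHRASE, the order used by the CMPLX-VERB rule in Section 2.2.2.

```python
from typing import Dict, List, Tuple, Union

# A node is (label, children) for phrases, or (part of speech, word) for leaves.
Tree = Tuple[str, Union[str, List["Tree"]]]

# Hypothetical lexicon: transliterated Bangla word -> (English meaning, part of speech).
LEXICON: Dict[str, Tuple[str, str]] = {
    "chele": ("boy", "NOUN"),
    "ti": ("the", "ARTICLE"),
    "cha": ("tea", "NOUN"),
    "khay": ("drinks", "MAIN-VERB"),
}

# Hypothetical transfer rules: node label -> {Bangla child labels: English child permutation}.
REORDER: Dict[str, Dict[Tuple[str, ...], Tuple[int, ...]]] = {
    "NOUN-PHRASE": {("NOUN", "ARTICLE"): (1, 0)},
    "CMPLX-VERB": {("NOUN-PHRASE", "MAIN-VERB"): (1, 0)},
}


def transfer(tree: Tree) -> Tree:
    """Build the English tree: translate leaves via the lexicon, then reorder children."""
    label, children = tree
    if isinstance(children, str):  # leaf: (part of speech, Bangla word)
        return (label, LEXICON[children][0])
    new_children = [transfer(child) for child in children]
    child_labels = tuple(child[0] for child in new_children)
    perm = REORDER.get(label, {}).get(child_labels)
    if perm is not None:
        new_children = [new_children[i] for i in perm]
    return (label, new_children)


def leaves(tree: Tree) -> List[str]:
    """Collect the words at the leaves from left to right."""
    label, children = tree
    if isinstance(children, str):
        return [children]
    return [word for child in children for word in leaves(child)]


def translate(bangla_tree: Tree) -> str:
    sentence = " ".join(leaves(transfer(bangla_tree)))
    return sentence[0].upper() + sentence[1:]


BANGLA_TREE: Tree = ("SENTENCE", [
    ("NOUN-PHRASE", [("NOUN", "chele"), ("ARTICLE", "ti")]),
    ("VERB-PHRASE", [
        ("CMPLX-VERB", [("NOUN-PHRASE", [("NOUN", "cha")]), ("MAIN-VERB", "khay")]),
    ]),
])

print(translate(BANGLA_TREE))  # The boy drinks tea
```

Keeping word order in REORDER and word meaning in LEXICON separates structural transfer from lexical substitution, the two tasks the conversion unit combines.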