(IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 9, December 2010
Structural Analysis of Bangla Sentences of Different Tenses for Automatic Bangla Machine Translator
Md. Musfique Anwar, Nasrin Sultana Shume and Md. Al-Amin Bhuiyan
Abstract: This paper addresses structural mappings of Bangla sentences of different tenses for machine translation (MT). Machine translation requires analysis, transfer and generation steps to produce target language output from a source language input. The structural representation of a Bangla sentence encodes the information in that sentence, and a transfer module has been designed that can generate English sentences using Context Free Grammar (CFG). The MT system generates a parse tree according to the parse rules, and a lexicon provides the properties of each word and its meaning in the target language. The MT system can be extended to paragraph translation.
Keywords: Machine Translation, Structural representation, Context Free Grammar, Parse tree, Lexicon
A machine translator is a computerized system responsible for producing translations from one natural language to another, with or without human assistance. The term excludes computer-based translation tools, which support translators by providing access to on-line dictionaries, remote terminology databanks, transmission and reception of texts, etc. The core of MT itself is the automation of the full translation process. Machine translation (MT) means translation using computers.

To interpret any language, we first need to determine the sentence structure using grammatical rules. Parsing or, more formally, syntactic analysis, is the process of analyzing a text, made of a sequence of tokens (for example, words), to determine its grammatical structure with respect to a given formal grammar. Parsing a sentence produces a structural representation (SR), or parse tree, of the sentence.

Analysis and generation are two major phases of machine translation. The analysis phase involves two main techniques: morphological analysis and syntactic analysis.

A morphological parsing strategy decomposes a word into morphemes, given a lexicon list, the proper lexicon order and different spelling change rules. That is, it incorporates the rules by which words are analyzed. For example, in the sentence "The young girl's behavior was unladylike", the word "unladylike" can be divided into the following morphemes:
Un - not (prefix)
Lady - well-behaved female (root word)
Like - having the characteristics of (suffix)
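
The decomposition described above can be sketched in a few lines of Python. This is only a minimal illustration, not the system described in this paper: the three small dictionaries and the function names `segment` and `gloss` are hypothetical, the lexicon holds only the morphemes of the example word, and spelling change rules are ignored.

```python
# Hypothetical toy lexicon: entries and glosses are taken from the
# "unladylike" example in the text; a real lexicon would be far larger.
PREFIXES = {"un": "not"}
ROOTS = {"lady": "well-behaved female"}
SUFFIXES = {"like": "having the characteristics of"}


def segment(word):
    """Split `word` into (prefix, root, suffix), or return None.

    Prefix and suffix are optional (empty string when absent);
    the remaining root must be listed in ROOTS.
    """
    word = word.lower()
    for prefix in [""] + list(PREFIXES):
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in [""] + list(SUFFIXES):
            if suffix and not rest.endswith(suffix):
                continue
            root = rest[:len(rest) - len(suffix)]
            if root in ROOTS:
                return prefix, root, suffix
    return None


def gloss(word):
    """Return a list of (morpheme, role, meaning) triples for `word`."""
    parts = segment(word)
    if parts is None:
        return []
    prefix, root, suffix = parts
    result = []
    if prefix:
        result.append((prefix, "prefix", PREFIXES[prefix]))
    result.append((root, "root", ROOTS[root]))
    if suffix:
        result.append((suffix, "suffix", SUFFIXES[suffix]))
    return result


for morpheme, role, meaning in gloss("unladylike"):
    print(f"{morpheme} - {meaning} ({role})")
```

The sketch tries each known prefix and suffix and accepts the first split whose remainder is a known root, which is enough for this example but would need spelling change rules and ordering constraints for realistic input.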
Morphological information about words is stored together with the syntactic and semantic information of the words.

The purpose of syntactic analysis is to determine the structure of the input text. This structure consists of a hierarchy of phrases, the smallest of which are the basic symbols and the largest of which is the sentence. It can be described by a tree, known as a parse tree or syntax tree, with one node for each phrase. Basic symbols are represented by leaf nodes and other phrases by interior nodes. The root of the tree represents the sentence.

Syntactic analysis aims to identify the sequence of grammatical elements (e.g. article, verb, preposition) or of functional elements (e.g. subject, predicate), the grouping of grammatical elements (e.g. nominal phrases consisting of nouns, articles, adjectives and other modifiers) and the recognition of dependency relations, i.e. hierarchical relations. If we can identify the syntactic constituents of sentences, it will be easier to obtain the structural representation of the sentence.

Most grammar rule formalisms are based on the idea of phrase structure: that strings are composed of substrings called phrases, which come in different categories. There are three types of phrases in Bangla: Noun Phrase, Adjective Phrase and Verb Phrase. Simple sentences are composed of these phrases. Complex and compound sentences are composed of simple sentences.

Within the early standard transformational models it is assumed that basic phrase markers are generated by phrase structure rules (PS rules) of the following sort:
S → NP AUX VP
NP → ART N
VP → V NP
The PS rules given above tell us that an S (sentence) can consist of, or can be expanded as, the sequence NP (noun phrase) AUX (auxiliary verb) VP (verb phrase). The rules also indicate that NP can be expanded as ART N (article, noun) and that VP can be expanded as V NP (verb, noun phrase).
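
To make the expansion of these rules concrete, the following Python sketch applies the three PS rules with a small top-down parser and returns a parse tree as nested tuples. It is an illustration under stated assumptions rather than the parser used by the MT system: the names `GRAMMAR`, `LEXICON`, `parse` and `parse_sentence` are hypothetical, and the tiny English lexicon is invented for the example.

```python
# The three PS rules from the text, written as symbol -> list of expansions.
GRAMMAR = {
    "S": [["NP", "AUX", "VP"]],
    "NP": [["ART", "N"]],
    "VP": [["V", "NP"]],
}

# Hypothetical lexicon mapping each word to its category (assumption).
LEXICON = {
    "the": "ART", "a": "ART",
    "boy": "N", "book": "N",
    "will": "AUX",
    "read": "V",
}


def parse(symbol, tokens, pos):
    """Try to derive `symbol` from tokens[pos:].

    Returns (tree, next_pos) on success and None on failure. A tree is
    a tuple whose first item is the category; leaves hold the word.
    """
    if symbol not in GRAMMAR:
        # Lexical category: the next word must carry this category.
        if pos < len(tokens) and LEXICON.get(tokens[pos]) == symbol:
            return (symbol, tokens[pos]), pos + 1
        return None
    for rhs in GRAMMAR[symbol]:
        children, cur = [], pos
        for child in rhs:
            result = parse(child, tokens, cur)
            if result is None:
                break
            subtree, cur = result
            children.append(subtree)
        else:
            # Every symbol on the right-hand side was matched in order.
            return (symbol, *children), cur
    return None


def parse_sentence(sentence):
    """Return the parse tree of `sentence`, or None if it is not derivable."""
    tokens = sentence.lower().split()
    result = parse("S", tokens, 0)
    if result is None or result[1] != len(tokens):
        return None
    return result[0]


print(parse_sentence("The boy will read a book"))
```

In the resulting tree the root is S, the interior nodes are NP and VP, and the leaves pair each word with its category, matching the description of parse trees given earlier. Because each symbol here has a single expansion, the sketch does not need backtracking; a grammar with competing alternatives would call for a more general parsing method.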