
PAPER ON ENGLISH TO HINDI - TRANSLATION SYSTEM

By:
Prof. Lata Gore, M.E. (Comp. Science & Engg)
Prof. Nishigandha Patil, B.E. (Comp. Science & Engg)

Dr. Babasaheb Ambedkar Marathwada University

Jawaharlal Nehru Engineering College
N-6, CIDCO, Aurangabad
431003, M.S., India

ABSTRACT
The goal of Natural Language Processing (NLP) is to design and build a computer system that can
analyze, understand and generate the languages that humans use naturally, so that eventually you can address your
computer as though you were addressing another person.

Machine Translation (MT) is one of the major areas of NLP. The objective of MT is to recognize the
content of a document in order to render it in another language. We have designed and developed an English – Hindi
translation system with special reference to the weather narration domain. The two objectives of this MT system are:
(a) To recognize the content in its source language.
(b) To generate the same content in the target language.

To recognize the content of the source document, our MT system takes into account the structural
properties of the source language. In the generation process, our system is able to resolve some of the
ambiguities in the source text and produces the same content in the target language.

1. INTRODUCTION.
Due to the inherent ambiguities of natural language, making a computer translate natural language is a
complex task, so the task has to be restricted to a specific domain. We selected the weather narration domain and
designed and developed an English – Hindi translation system for it.

Translation from English to Hindi, i.e. from a foreign language to a regional language, involves many
problems. Any natural language is a free language, i.e. its structure is not fixed. The structure can keep changing as
the user wishes. Hence a good translation system has to handle as many grammar constructs as possible.

Thus our purpose is to develop a translation system that can translate English text into Hindi, with special reference
to “Weather Narration”.

2. BODY OF PAPER
Evaluation and synthesis
The problems faced in translation from English to Hindi are immense, and a great deal of effort is being put into
solving them. On our part, we have tried to handle the different ambiguities and other problems as far as
possible. Considering the problems that may arise, we decided to store each word together with its attributes, or
those it governs, such as gender, number and person.

The translation is text based, i.e. the input is a text file given by the user and the output is a text file generated
by the program, which can be saved as the user requires. The input is read sentence by sentence, and each sentence is
converted into a linked list of words. A linked list turns out to be economical for the shifting of words required in
the translation process. Disambiguation is carried out step by step to arrive at the final output.
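
The linked-list representation can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names WordNode, sentence_to_list, move_to_end and to_words are assumptions made for illustration. The sketch splits a sentence on whitespace and shifts one word to the end by relinking nodes, without copying the other words.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WordNode:
    """One word of the input sentence (hypothetical node layout)."""
    text: str
    next: Optional["WordNode"] = None


def sentence_to_list(sentence: str) -> Optional[WordNode]:
    """Split a sentence on whitespace and chain the words into a linked list."""
    head = None
    tail = None
    for token in sentence.split():
        node = WordNode(token)
        if head is None:
            head = node
        else:
            tail.next = node
        tail = node
    return head


def move_to_end(head: Optional[WordNode], text: str) -> Optional[WordNode]:
    """Unlink the first node whose text matches and re-attach it as the last node."""
    prev, node = None, head
    while node is not None and node.text != text:
        prev, node = node, node.next
    if node is None or node.next is None:
        return head  # word not found, or it is already last
    # Unlink the node from its current position.
    if prev is None:
        head = node.next
    else:
        prev.next = node.next
    # Walk to the current tail and append the node there.
    tail = head
    while tail.next is not None:
        tail = tail.next
    tail.next = node
    node.next = None
    return head


def to_words(head: Optional[WordNode]) -> List[str]:
    """Collect the words of the list in order."""
    words = []
    while head is not None:
        words.append(head.text)
        head = head.next
    return words
```

Moving the verb of "rain lashed Mumbai" to the end with move_to_end changes only a few links; the remaining words stay where they are.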
To translate any text, it is first necessary to identify the nouns, verbs and phrases in the sentence. This
information is obtained from the database. It is then necessary to determine the subject and the object of the
sentence, because later steps depend on this information. It can be found by checking the positions of
the verbs, etc. in the sentence.
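
One simple way to locate subject and object from verb positions is sketched below in Python. The heuristic is an assumption, since the paper does not spell out its procedure: the words before the first verb group are treated as the subject and the words after it as the object. The tag names "AUX" and "V" and the function name split_on_verb_group are hypothetical; in the system the tags would come from the database.

```python
VERB_TAGS = {"AUX", "V"}  # assumed tag names for auxiliaries and main verbs


def split_on_verb_group(tagged):
    """Split [(word, tag), ...] into (subject, verb_group, object) word lists.

    The verb group is the first contiguous run of words tagged AUX or V.
    """
    start = None
    for i, (_, tag) in enumerate(tagged):
        if tag in VERB_TAGS:
            start = i
            break
    if start is None:
        # No verb found: treat the whole input as the subject part.
        return [word for word, _ in tagged], [], []
    end = start
    while end < len(tagged) and tagged[end][1] in VERB_TAGS:
        end += 1
    subject = [word for word, _ in tagged[:start]]
    verbs = [word for word, _ in tagged[start:end]]
    obj = [word for word, _ in tagged[end:]]
    return subject, verbs, obj
```

A positional rule like this only suits simple declarative sentences; passives and questions would need further checks.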

Grammar
To understand the syntactic structure of a sentence one must know two things: the grammar,
which is a formal specification of the structures allowable in the language, and the parsing technique, which is the
method of analyzing a sentence to determine its structure according to the grammar.

Our translation system uses a context-free grammar (CFG).


Example of a CFG:
S → NP VP
VP → V NP
NP → N
NP → D NP

Based on such rules, we have formalized grammar rules for English as well as Hindi, and these rules are
stored in the database. If the developer needs to change the grammar rules, he can update
the database directly without modifying the source code.

The parser uses the English grammar rules to determine whether the syntax of the input sentence is correct. If
it is correct, an English tree is generated. The parsing technique used is bottom-up parsing with top-down filtering.
To generate the tree for the corresponding Hindi sentence, the English grammar rules are mapped to Hindi grammar
rules.
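
The two steps can be sketched in Python under stated assumptions. The parser below is a plain bottom-up chart parser over the example CFG; it does not implement the top-down filtering that the system uses. The rule mapping HINDI_ORDER is hypothetical and covers only the verb phrase, where Hindi places the verb after its object. Words are left in English, because dictionary lookup is a separate step.

```python
# Grammar from the CFG example; each rule is (left-hand side, right-hand side).
RULES = [
    ("S", ("NP", "VP")),
    ("VP", ("V", "NP")),
    ("NP", ("N",)),
    ("NP", ("D", "NP")),
]

# Hypothetical English-to-Hindi rule mapping: the child order on the Hindi side.
HINDI_ORDER = {
    ("VP", ("V", "NP")): (1, 0),
}


def close_unary(cell):
    """Add every symbol reachable through one-symbol rules such as NP -> N."""
    changed = True
    while changed:
        changed = False
        for lhs, rhs in RULES:
            if len(rhs) == 1 and rhs[0] in cell and lhs not in cell:
                cell[lhs] = (lhs, [cell[rhs[0]]])
                changed = True


def parse(tagged):
    """Bottom-up chart parse of [(word, tag), ...]; returns an S tree or None.

    A leaf is (tag, word); an inner node is (symbol, [child trees]).
    """
    n = len(tagged)
    if n == 0:
        return None
    chart = {}  # chart[(i, j)] maps a symbol to one tree over words i..j-1
    for i, (word, tag) in enumerate(tagged):
        cell = {tag: (tag, word)}
        close_unary(cell)
        chart[(i, i + 1)] = cell
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = {}
            for k in range(i + 1, j):
                left, right = chart[(i, k)], chart[(k, j)]
                for lhs, rhs in RULES:
                    if (len(rhs) == 2 and rhs[0] in left
                            and rhs[1] in right and lhs not in cell):
                        cell[lhs] = (lhs, [left[rhs[0]], right[rhs[1]]])
            close_unary(cell)
            chart[(i, j)] = cell
    return chart[(0, n)].get("S")


def to_hindi_order(tree):
    """Reorder the children of each node according to HINDI_ORDER."""
    symbol, children = tree
    if isinstance(children, str):
        return tree  # leaf: nothing to reorder
    rhs = tuple(child[0] for child in children)
    order = HINDI_ORDER.get((symbol, rhs), tuple(range(len(children))))
    return (symbol, [to_hindi_order(children[i]) for i in order])


def leaves(tree):
    """Read the words of a tree from left to right."""
    symbol, children = tree
    if isinstance(children, str):
        return [children]
    return [word for child in children for word in leaves(child)]
```

For the tagged input the/D wind/N brings/V rain/N, parse returns an S tree, and reading the leaves after to_hindi_order places the verb last.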

Analysis and Modeling


[Figure: data flow diagrams of the translation system; only the labels are recoverable]

Level 0: Text Editor; English Text; process 1, Translation System; Hindi Text; Updation Module; Fetching words from Database; Database.

Level 1: Text Editor; English Text; process 1.1, Text Processing; process 1.2, Translation according to Hindi rules; Hindi Text; Text Editor; Updation Module; Fetching words from Database; Database.

Level 2: Text Editor; English Text; process 1.1.1, Text acceptance; process 1.1.2, Tokenization; process 1.1.3, Information Retrieval; process 1.2.1, English Tree generator; process 1.2.2, Hindi Tree generator; Hindi Text; Text Editor; Updation Module; Database.

File and Database Structure

The database for the project is designed in Microsoft Access 97. As the translated output is in Devanagari script,
the use of MS Access was felt necessary. The database is divided into four parts:

1) MAP-TABLE: This table stores all the English words and their corresponding Hindi translations, along with their
features. All the words are stored here. This is the main table of our database.

2) VERB-TABLE: This table stores all the verbs, with the root Hindi word and the type corresponding to
each verb.

3) AUXILIARY-TABLE: This table stores all the possible combinations of auxiliaries occurring in
English, along with their corresponding Hindi translations and their type. There are 8 types of
verbs in total. The auxiliary combinations for all these types are stored here.

4) RULES-TABLE: This table stores all the formalized English grammar rules and the corresponding Hindi
grammar rules.
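
The four tables can be sketched as follows. This is an illustration under assumptions, not the project's schema: it uses Python's built-in sqlite3 module in place of MS Access so that the example is self-contained, and every column name and helper function name is hypothetical.

```python
import sqlite3


def create_tables(conn):
    """Create the four tables named in the text; all column names are assumed."""
    conn.executescript(
        """
        CREATE TABLE map_table (
            english TEXT PRIMARY KEY, hindi TEXT, features TEXT);
        CREATE TABLE verb_table (
            english TEXT PRIMARY KEY, hindi_root TEXT, verb_type INTEGER);
        CREATE TABLE auxiliary_table (
            english_aux TEXT, hindi_aux TEXT, verb_type INTEGER);
        CREATE TABLE rules_table (
            english_rule TEXT PRIMARY KEY, hindi_rule TEXT);
        """
    )


def add_word(conn, english, hindi, features):
    """Store one English word with its Hindi translation and its features."""
    conn.execute(
        "INSERT INTO map_table VALUES (?, ?, ?)", (english, hindi, features)
    )


def lookup_word(conn, english):
    """Return (hindi, features) for an English word, or None if it is not stored."""
    return conn.execute(
        "SELECT hindi, features FROM map_table WHERE english = ?", (english,)
    ).fetchone()


def set_rule(conn, english_rule, hindi_rule):
    """Insert or replace a rule pair; no program code changes are needed."""
    conn.execute(
        "INSERT OR REPLACE INTO rules_table VALUES (?, ?)",
        (english_rule, hindi_rule),
    )


def hindi_rule_for(conn, english_rule):
    """Return the Hindi rule mapped to an English rule, or None."""
    row = conn.execute(
        "SELECT hindi_rule FROM rules_table WHERE english_rule = ?",
        (english_rule,),
    ).fetchone()
    return row[0] if row else None
```

Keeping the rule pairs in a table is what allows a grammar change to be made as a data update, for example set_rule(conn, "VP -> V NP", "VP -> NP V"), while the parser code stays the same.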

Software Modules
The translation system consists of two main modules:
1. Translation
2. Updation
In our research project we have used the transfer approach for translation. With this
approach, the MT system transfers the source sentence to the target sentence with the help of
grammatical rules and a bilingual dictionary. Our project is based on this approach because it gives
flexibility in using grammar. The translation module performs the translation task using the
transfer approach.

The translation module consists of the following sub-modules:

1. Input accepting module
2. Lexicon generating module
3. Information retrieval module
4. Pre-processing of sentence
5. English tree generator
6. Post-processing of English tree
7. Generation of Hindi tree
8. Post-processing of Hindi tree & generating output

Conclusion
Our system supports:

• Syntactically and semantically correct output text.
• Accuracy.
• High-speed translation.
• Facility for dynamic updating of the database.
• User-friendly environment.

Our translation system gives domain-specific translation with satisfactory results. By modifying the
database, it can be extended to other domains.

Because the intermediate representation of input sentences is language independent, the system
can be combined with generators for other regional languages.

The semantic analysis depends entirely on the depth of the lexicon, i.e. on the attributes considered for
each lexical entry.

Due to the inherent ambiguities of natural language, 100% accuracy was not possible. Still, this project is a
humble effort to develop a system for translating from English to Hindi that resolves as many ambiguities as possible.

Our system does not support discourse analysis.

