Dr. Babasaheb Ambedkar Marathwada University
Jawaharlal Nehru Engineering College
N-6, CIDCO, Aurangabad
The goal of Natural Language Processing (NLP) is to design and build a computer system that can analyze, understand and generate the languages that humans use naturally, so that eventually you can address your computer as though you were addressing another person.
Machine Translation (MT) is one of the major areas of NLP. The objective of MT is to recognize the content of a document in order to render it in another language. We have designed and developed an English-Hindi translation system with special reference to the weather narration domain. The two objectives of this MT system are:
To recognize the content of the source document, our MT system takes into account the structural properties of the source language. In the generation process, our system resolves some of the ambiguities in the source text and produces the same content in the target language.
Because of the ambiguities inherent in natural language, getting a computer to translate it is a complex task, so we have to restrict translation to a specific domain. We have selected the weather narration domain.
Translation from English to Hindi, i.e. from a foreign language to a regional language, poses many problems. A natural language is free-form, i.e. its structure is not fixed and can vary as the user wishes. Hence a good translation system has to handle as many grammar constructs as possible.
The problems faced in translation from English to Hindi are immense, and considerable effort is being put into solving them. On our part, we have tried to handle the different ambiguities and other problems as far as possible. Considering the problems that may arise, we decided to store each word together with its attributes, or those it governs, such as gender, number and person.
The translation is text-based, i.e. the input is a text file given by the user and the output is a text file generated by the program, which can be saved as the user requires. The input is processed sentence by sentence, and each sentence is converted into a linked list of words. A linked list makes the shifting of words required during translation economical. Disambiguation is carried out step by step to arrive at the final correct output.
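The linked-list representation can be sketched as follows. This is a minimal illustration, not the system's actual code: the names `WordNode`, `sentence_to_list`, `to_words` and `move_to_end`, and the sample sentence, are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WordNode:
    """One word of the sentence (hypothetical node layout)."""
    word: str
    next: Optional["WordNode"] = None


def sentence_to_list(sentence: str) -> Optional[WordNode]:
    """Build a singly linked list with one node per whitespace-separated word."""
    head = tail = None
    for token in sentence.split():
        node = WordNode(token)
        if head is None:
            head = tail = node
        else:
            tail.next = node
            tail = node
    return head


def to_words(head: Optional[WordNode]) -> List[str]:
    """Collect the words of the list in order."""
    words = []
    while head is not None:
        words.append(head.word)
        head = head.next
    return words


def move_to_end(head: Optional[WordNode], word: str) -> Optional[WordNode]:
    """Unlink the first node holding `word` and re-attach it at the end."""
    prev, cur = None, head
    while cur is not None and cur.word != word:
        prev, cur = cur, cur.next
    if cur is None or cur.next is None:
        return head  # word not found, or it is already the last node
    if prev is None:
        head = cur.next        # the moved node was the head
    else:
        prev.next = cur.next   # bypass the moved node
    tail = head
    while tail.next is not None:
        tail = tail.next
    tail.next = cur
    cur.next = None
    return head


head = sentence_to_list("rain is likely tomorrow")
head = move_to_end(head, "is")
print(to_words(head))
```

Moving a word only rewires a few `next` references; no other words are copied or shifted, which is the economy the linked list offers.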
To translate any text, it is first necessary to determine the nouns, verbs and phrases in the sentence. This information is obtained from the database. It is also important to determine the subject and object of the sentence, as later steps need this information. It can be found by checking the positions of the verbs and other words in the sentence.
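A rough sketch of this step is given below. It assumes a small in-memory dictionary, `LEXICON`, standing in for the database, and a deliberately naive position rule (nearest noun before the first verb is the subject, first noun after it is the object). The entries, function names and the rule itself are illustrative assumptions, not the system's actual method.

```python
from typing import List, Optional, Tuple

# Hypothetical stand-in for the word database: word -> attributes.
LEXICON = {
    "clouds": {"pos": "noun", "number": "plural"},
    "cover": {"pos": "verb"},
    "the": {"pos": "det"},
    "sky": {"pos": "noun", "number": "singular"},
}


def tag(words: List[str]) -> List[Tuple[str, str]]:
    """Look up the part of speech of each word; unknown words are marked as such."""
    return [(w, LEXICON.get(w.lower(), {}).get("pos", "unknown")) for w in words]


def subject_and_object(
    tagged: List[Tuple[str, str]]
) -> Tuple[Optional[str], Optional[str]]:
    """Guess subject and object from their position relative to the first verb."""
    verb_idx = next((i for i, (_, pos) in enumerate(tagged) if pos == "verb"), None)
    if verb_idx is None:
        return None, None
    # Nearest noun to the left of the verb is taken as the subject.
    subject = next(
        (w for w, pos in reversed(tagged[:verb_idx]) if pos == "noun"), None
    )
    # First noun to the right of the verb is taken as the object.
    obj = next((w for w, pos in tagged[verb_idx + 1:] if pos == "noun"), None)
    return subject, obj


tagged = tag("Clouds cover the sky".split())
print(subject_and_object(tagged))
```

A rule this simple only suits plain subject-verb-object sentences; passives and embedded clauses would need the fuller grammar-based analysis.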
In order to understand the syntactic structure of a sentence, one must know two things: the grammar, which is a formal specification of the structures allowable in the language, and the parsing technique, which is the method of analyzing a sentence to determine its structure according to the grammar.
On this basis we have formalized grammar rules for English as well as Hindi, which are stored in the database. If a developer needs to change the grammar rules, they can update the database directly without modifying the source code.
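One way to keep grammar rules in a database is sketched below using Python's built-in `sqlite3` module. The table name `grammar_rule`, its columns, and the sample rules are assumptions made for illustration; the actual schema is not described here.

```python
import sqlite3
from typing import List, Tuple


def create_rule_store() -> sqlite3.Connection:
    """Create an in-memory database holding a few sample rules (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE grammar_rule (language TEXT, lhs TEXT, rhs TEXT)")
    conn.executemany(
        "INSERT INTO grammar_rule VALUES (?, ?, ?)",
        [
            ("english", "S", "NP VP"),
            ("english", "VP", "V NP"),
            ("hindi", "S", "NP VP"),
            ("hindi", "VP", "NP V"),
        ],
    )
    conn.commit()
    return conn


def load_rules(conn: sqlite3.Connection, language: str) -> List[Tuple[str, Tuple[str, ...]]]:
    """Read the rules for one language as (lhs, (rhs symbols...)) pairs."""
    rows = conn.execute(
        "SELECT lhs, rhs FROM grammar_rule WHERE language = ? ORDER BY rowid",
        (language,),
    )
    return [(lhs, tuple(rhs.split())) for lhs, rhs in rows]


conn = create_rule_store()
# Adding a rule is a data change only; the parsing code stays untouched.
conn.execute("INSERT INTO grammar_rule VALUES (?, ?, ?)", ("english", "NP", "DET N"))
print(load_rules(conn, "english"))
```

Because the parser reads its rules at run time, extending the grammar amounts to inserting or updating rows.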
The parser uses the English grammar rules to determine whether the syntax of the input sentence is correct. If it is, an English parse tree is generated. The parsing technique used is bottom-up parsing with top-down filtering. To generate the tree for the corresponding Hindi sentence, the English grammar rules are mapped to Hindi grammar rules.
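The sketch below illustrates the general idea with a CYK-style chart, which is purely bottom-up; the top-down filtering mentioned above is omitted for brevity. The rule set `BINARY_RULES`, the mapping `REORDER` and all function names are illustrative assumptions. Words are left untranslated so that only the reordering from English verb-object order to Hindi object-verb order is visible.

```python
from typing import List, Optional, Tuple

# Hypothetical English rules in binary form: (left, right) -> parent.
BINARY_RULES = {("NP", "VP"): "S", ("V", "NP"): "VP", ("DET", "N"): "NP"}

# Hypothetical English-to-Hindi rule mapping: child order after transfer.
# English VP -> V NP becomes Hindi VP -> NP V.
REORDER = {("VP", ("V", "NP")): (1, 0)}


def parse(tagged: List[Tuple[str, str]]) -> Optional[tuple]:
    """Bottom-up chart parse. Leaves are (category, word); nodes are
    (category, left, right). Returns the S tree, or None if the input is rejected."""
    n = len(tagged)
    # chart[i][j] maps a category to one tree spanning words i..j-1.
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, (word, cat) in enumerate(tagged):
        chart[i][i + 1][cat] = (cat, word)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for lcat, ltree in chart[i][k].items():
                    for rcat, rtree in chart[k][j].items():
                        parent = BINARY_RULES.get((lcat, rcat))
                        if parent and parent not in chart[i][j]:
                            chart[i][j][parent] = (parent, ltree, rtree)
    return chart[0][n].get("S")


def transfer(tree: tuple) -> tuple:
    """Reorder children wherever a mapped rule applies."""
    if len(tree) == 2:  # leaf: (category, word)
        return tree
    cat, left, right = tree
    children = (transfer(left), transfer(right))
    order = REORDER.get((cat, (left[0], right[0])), (0, 1))
    return (cat, children[order[0]], children[order[1]])


def leaves(tree: tuple) -> List[str]:
    """Read the words off a tree from left to right."""
    if len(tree) == 2:
        return [tree[1]]
    return leaves(tree[1]) + leaves(tree[2])


tagged = [("the", "DET"), ("clouds", "N"), ("cover", "V"), ("the", "DET"), ("sky", "N")]
tree = parse(tagged)
print(leaves(transfer(tree)))
```

A sentence that cannot be reduced to `S` yields `None`, which corresponds to rejecting syntactically incorrect input before any Hindi tree is built.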
3) AUXILLARY-TABLE: This table stores all the possible combinations of auxiliaries occurring in English, along with their corresponding Hindi translations and their types. There are 8 types of verbs in total, and the auxiliary combinations for all of these types are stored here.