
INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET)

ISSN 0976 – 6367(Print) ISSN 0976 – 6375(Online) Volume 4, Issue 4, July-August (2013), pp. 350-357 © IAEME: www.iaeme.com/ijcet.asp Journal Impact Factor (2013): 6.1302 (Calculated by GISI) www.jifactor.com


ENGLISH TO PUNJABI MACHINE TRANSLATION SYSTEM USING HYBRID APPROACH OF WORD SENSE DISAMBIGUATION AND MACHINE TRANSLATION

1 Gurleen Kaur Sidhu, 2 Navjot Kaur

1 Department of Computer Science and Engineering, Sri Guru Granth Sahib World University, Fatehgarh Sahib, Punjab 140406, India
2 Department of Computer Science and Engineering, Punjabi University, Patiala, Punjab 140406, India

ABSTRACT

Machine Translation and Word Sense Disambiguation are among the most popular applications of Natural Language Processing. Machine Translation is a cheap and easy way to understand another language during conversation, whereas Word Sense Disambiguation helps to find the correct meaning of a word in the context in which it is used. Our system uses a hybrid approach that disambiguates words in order to improve the result of machine translation. The Conditional Random Field algorithm with a decision list using direct mapping is the easiest method, with the best results, for solving the problem of disambiguation. In our system, the Conditional Random Field divides the data into categories and calculates the frequency of words with respect to each category. The meaning of the sentence is related to the category that has the maximum frequency in the sentence. The accuracy of our system for correct sentences is 81.2%, on the basis of the tested sentences only.

Keywords: Conditional Random Field, Machine Translation, Natural language, Word Sense disambiguation, Hybrid approach.

I. INTRODUCTION

During automatic translation of sentences, words are often given an incorrect sense in the target text. The process of assigning the correct sense to a word according to its context is known as Word Sense Disambiguation. Many applications and online sites give the meaning of an input text, but they are not able to disambiguate between meanings. We try to solve this problem using a hybrid approach of word sense disambiguation and machine translation. Machine translation and word sense disambiguation are among the most popular applications of natural language processing. Natural language processing is the processing of data presented in natural language, such as the content available on the Internet in blogs, websites, social sites, and business sites. More information about the history and an overview of applications is given in Fig 1.

Fig 1. Introduction

The techniques previously used on different languages are reviewed in the Literature Survey. The Methodology section explains the proposed technique, which is a combination of various sub-techniques or algorithms of word sense disambiguation and machine translation. Results and Discussion covers the advantages and disadvantages of the system. The Conclusion explains how beneficial the proposed system is and also discusses its accuracy. Future Work gives directions for further research in this field.

II. LITERATURE SURVEY

A review of English grammar is given in Fig 2, which gives a brief introduction to six parts of speech and their subtypes. The remaining two parts are the preposition and the article. Articles are used to distinguish vowels and consonants, to define singular "a" ... used.


Fig 2. Review of Part of speech in English

Review of research papers on techniques:

[1] A hybrid (statistical + rules) approach to transliteration of person names: from a person name written in Punjabi (Gurumukhi script), the system produces its English (Roman script) transliteration. Experiments have shown that the performance is sufficiently high; the overall accuracy of the system is 95.23%. The reasons behind wrong answers for named entities are multiple transliterations, wrong input of words, character gap, and the one-to-multi mapping problem.

[2] Natural language processing is a multidisciplinary field at the intersection of linguistics, psycholinguistics, computer science and engineering, machine learning, and statistics. The paper also gives the reasons why natural language processing is growing in popularity: as the business world expands, more people move from one country to another and help counters are established everywhere, so natural language must be processed to convey the proper message.

[6] Machine translation translates the source text into the target text with or without human assistance. Machine translation has various approaches:

o Direct translation: words are translated directly, word for word.
o Transfer-based translation: translation is done with proper knowledge of the rules of the language into which we want to translate.
o Interlingua-based translation: an intermediary is used to convert into the target language.
o Corpus-based translation: uses a parallel corpus of source and target text.
o Hybrid translation: built with the help of all of the above.

[7] Nancy Ide (1998) defines the various applications in which the word sense disambiguation method can be used.

[11] A supervised learning method for word sense disambiguation based on cosine similarity. The researchers extract two sets of features, one of which is the set of words that have occurred frequently in the text. The cosine similarity algorithm uses the concept of the inner product of two vectors. After converting each context to a vector of words, cosine similarity measures the similarity between a new context and each existing context in the training corpus.

[12] The researchers work on Shahmukhi to Gurumukhi transliteration and try to remove the ambiguity problem. Two different approaches are used for word sense disambiguation: a state sequence representation as a Hidden Markov Model, and N-grams in which a small window of size ±5 is used. The accuracy of word sense disambiguation using both approaches is reported to be more than 92%.
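The cosine-similarity idea summarized for [11] can be illustrated with a minimal sketch. The training contexts, sense labels, and function names below are hypothetical and are not taken from [11]; the sketch only shows how a new context could be compared with labelled contexts through the inner product of word-count vectors.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors."""
    # Inner product over the words the two contexts share.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def to_vector(context: str) -> Counter:
    """Convert a context (a string of words) to a word-count vector."""
    return Counter(context.lower().split())


# Hypothetical labelled training contexts for the ambiguous word "bank".
TRAINING = [
    ("finance", "deposit money in the bank account"),
    ("geography", "the river bank was covered with water"),
]


def predict_sense(new_context: str) -> str:
    """Return the sense label of the most similar training context."""
    vec = to_vector(new_context)
    label, _ = max(
        TRAINING,
        key=lambda item: cosine_similarity(vec, to_vector(item[1])),
    )
    return label
```

Under these assumed training contexts, a new context that shares more words with the finance example than with the geography example is labelled "finance", and vice versa.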


III. METHODOLOGY

Fig 3. Flow Chart for Proposed System

Algorithm for Proposed English to Punjabi Machine Translation System:

Step 1: START, input text

Step 2: Check whether the text is present or not

o If present, move to Step 3
o Else display the message "please enter the text first"

Step 3: ANALYSIS of the sentence

o TOKENIZATION (split the sentence on the basis of white space and count the words). Repeat the next two steps for every token.
o PREPROCESSING (further divided into 2 subparts)
   o Text normalization (optional): implement the proposed algorithm for American to British English
   o Sentence differentiation: rules implemented to check whether the sentence is simple or compound
o PART OF SPEECH TAGGING (DIRECT MAPPING IMPLEMENTED)

After analysis of the sentence, move on to Step 4.

Step 4: SYNTHESIS of the sentence

o DIRECT MAPPING (WORD + POS)
   o If PRESENT, FETCH the MEANING (move on to REORDER)
   o Otherwise, implement the HYBRID APPROACH FOR WSD on the sentence
      If (WORD + POS) has multiple CATEGORIES, increase the counter of every category (repeat this step for all tokens).
      Check which category has (ambiguous word + maximum frequency) and assign that meaning to the ambiguous word. Fetch the meaning and move on to REORDER.
o REORDER
   o According to the target text.

Step 5: TRANSLATION ENGINE

o OUTPUT (after reordering, combine the words into a sentence and display it)

Step 6: END.
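Steps 2 to 4 of the algorithm above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the spelling table, the POS lexicon, the conjunction-based rule for compound sentences, the meaning table, and all function names are hypothetical, and the romanized glosses are placeholders standing in for Punjabi output. The WSD branch of Step 4 and the REORDER step are left out.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical American-to-British spelling table (text normalization).
AMERICAN_TO_BRITISH: Dict[str, str] = {"color": "colour", "center": "centre"}

# Hypothetical POS lexicon used for direct-mapping POS tagging.
POS_LEXICON: Dict[str, str] = {
    "we": "PRON", "like": "VERB", "the": "DET",
    "colour": "NOUN", "and": "CONJ",
}

# Hypothetical (word, POS) -> meaning table; romanized placeholders only.
MEANINGS: Dict[Tuple[str, str], str] = {
    ("we", "PRON"): "asin",
    ("like", "VERB"): "pasand",
    ("colour", "NOUN"): "rang",
}

# Assumed rule: a sentence containing one of these words is compound.
CONJUNCTIONS = {"and", "but", "or"}


def tokenize(text: str) -> List[str]:
    """Step 3: split the sentence on white space."""
    return text.lower().split()


def normalize(token: str) -> str:
    """Optional text normalization from American to British spelling."""
    return AMERICAN_TO_BRITISH.get(token, token)


def is_compound(tokens: List[str]) -> bool:
    """Sentence differentiation under the assumed conjunction rule."""
    return any(t in CONJUNCTIONS for t in tokens)


def analyse(text: str) -> Optional[List[Tuple[str, str]]]:
    """Steps 2-3: return (word, POS) pairs, or None when no text was entered."""
    if not text.strip():
        print("please enter the text first")
        return None
    tokens = [normalize(t) for t in tokenize(text)]
    # Words missing from the lexicon get the placeholder tag "UNK".
    return [(t, POS_LEXICON.get(t, "UNK")) for t in tokens]


def direct_mapping(tagged: List[Tuple[str, str]]) -> List[Optional[str]]:
    """Step 4: fetch the meaning for each (word, POS) pair; None if absent."""
    return [MEANINGS.get(pair) for pair in tagged]
```

In this sketch a `None` entry returned by `direct_mapping` marks a token for which no direct meaning was found. Ambiguous words would instead be passed to the WSD step.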

IV. RESULTS AND DISCUSSION

First case: the general case is explained with two main examples, which are given in the figures below and discussed according to their results. Here a simple sentence in the correct format is entered as input, and our system shows better output than the previous one.


Fig.4: Correct and incorrect Sentence with discussion

Random words used in a sentence: the system gives their meanings if they are present in the database but avoids generating a sentence.

To check whether sentence formation is incorrect: Fig. 5 shows the error given by our system due to incorrect formation of the input sentence. That is why our system gives the message "try again".

Fig. 5: System gives Error


Fig.7 : Lack of Word Sense disambiguation


Fig.8: Remove ambiguity of Words

Our system uses the Conditional Random Field to remove the ambiguity of words. In the figure above, the input sentence is "we visited the bank and that was situated at the bank". The word "bank" is ambiguous here. First we check for the conjunction, so that the meanings of words are fetched separately for each sub-part. In the first sub-part there is no specific category, so the sentence relates to the general category and we fetch the meaning that is most commonly used, which is the financial bank. Then we solve the second part, which contains the word "situated", belonging to the geography category. We fetch both meanings of "bank", but apply the condition that the meaning whose category has the maximum frequency in the sentence is used. So we use the meaning of "bank" related to the geography category for the second part. We then reorder the sentence with respect to POS and generate the target sentence as displayed in Fig. 8.
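The category-frequency rule in this example can be sketched as follows. This is a simplified illustration under stated assumptions rather than the authors' Conditional Random Field implementation: the category lexicon, the sense table, the fallback to a "general" category, and all names are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical lexicon: context word -> category it signals.
CATEGORY_OF: Dict[str, str] = {
    "situated": "geography", "river": "geography",
    "money": "finance", "loan": "finance",
}

# Hypothetical sense table: ambiguous word -> {category: sense label}.
SENSES: Dict[str, Dict[str, str]] = {
    "bank": {
        "general": "financial bank",
        "finance": "financial bank",
        "geography": "river bank",
    },
}

CONJUNCTIONS = {"and", "but", "or"}


def split_clauses(tokens: List[str]) -> List[List[str]]:
    """Split a compound sentence into sub-parts at conjunctions."""
    clauses: List[List[str]] = [[]]
    for tok in tokens:
        if tok in CONJUNCTIONS:
            clauses.append([])
        else:
            clauses[-1].append(tok)
    return [c for c in clauses if c]


def disambiguate(word: str, clause: List[str]) -> str:
    """Pick the sense whose category is most frequent in the clause."""
    counts = Counter(CATEGORY_OF[t] for t in clause if t in CATEGORY_OF)
    senses = SENSES[word]
    # Keep only categories for which the word actually has a sense.
    candidates = {c: n for c, n in counts.items() if c in senses}
    if not candidates:
        # No category signal in this sub-part: use the general meaning.
        return senses["general"]
    best = max(candidates, key=lambda c: candidates[c])
    return senses[best]


sentence = "we visited the bank and that was situated at the bank"
results = [disambiguate("bank", c) for c in split_clauses(sentence.split())]
print(results)  # ['financial bank', 'river bank']
```

Splitting at the conjunction first is what lets the two occurrences of "bank" receive different senses. Counting categories over the whole sentence would assign the geography sense to both.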

Causes of inaccurate results: character gap, wrong input, and words not present in the database.


V. CONCLUSION

We conclude that Machine Translation and Word Sense Disambiguation are among the most popular applications of Natural Language Processing. Machine Translation is a cheap and easy way to understand another language during conversation, whereas Word Sense Disambiguation helps to find the correct meaning of a word in the context in which it is used. From the Literature Survey, we learned the basic structure and the various sub-parts of the parts of speech of both languages, English and Punjabi, as well as the techniques previously implemented by different researchers. Our system uses a hybrid approach that disambiguates words in order to improve the result of machine translation. The Conditional Random Field algorithm with a decision list using direct mapping is the easiest method, with the best results, for solving the problem of disambiguation. The accuracy of our system is given below:


Fig.9: Accuracy table for testing the system

VI. FUTURE WORK

More techniques can be combined with this system for greater accuracy.

More data can be used.

Categories can be further classified into sub-parts.

Parts of speech can be explored further with sub-categories.

VII. ACKNOWLEDGEMENTS

As a part of my course, I took "English to Punjabi Machine Translation System using Hybrid Approach of Word Sense Disambiguation and Machine Translation" as my thesis topic. I am very thankful to Mrs. Navjot Kaur, Assistant Professor, Punjabi University, Patiala, for giving me such valuable support in doing my work. She provided all the relevant material that I needed to complete my thesis work, and she gave help and time whenever asked. Last but not least, a word of thanks to the authors of all the books and papers that I consulted during my thesis work and while preparing the report. Finally, thanks to the Almighty for not letting me down at the time of crisis and for showing me the silver lining in the dark clouds.


VIII. REFERENCES

JOURNAL

[1]. Kamal Deep, Dr. Vishal Goyal, Hybrid Approach for Punjabi to English Transliteration System, International Journal of Computer Applications (0975 – 8887), Volume 28 – No.1, August 2011.

[2]. Fabio Ciravegna, Recent Advances in Natural Language Processing, IEEE Computer Society, 2003.

[4]. J. Hutchins, An introduction to Machine Translation, Academic Press, 1992.

[7]. Nancy Ide, Jean Veronis, Introduction to the Special Issue on Word Sense Disambiguation: The State of the Art, 1998.

[8]. Pushpak Bhattacharyya, CS460/626: Natural Language Processing/Speech, NLP and the Web (Lecture 25 – Knowledge Based and Supervised WSD), IIT Bombay, 6th March, 2012, p.24.

[9]. Pushpak Bhattacharyya, CS460/626: Natural Language Processing/Speech, NLP and the Web (Lecture 25 – Knowledge Based and Supervised WSD), IIT Bombay, 6th March, 2012, p.35.

[10]. Durgesh D Rao, Machine Translation, pp.61-70, July 1998.

[13]. Kamaljeet Kaur Batra, G S Lehal, Rule Based Machine Translation of Noun Phrases from Punjabi to English, IJCSI International Journal of Computer Science Issues, Vol. 7, Issue 5, September 2010.

[14]. P. Tamilselvi, S.K. Srivatsa, Case Based Word Sense Disambiguation Using Optimal Features, 2011 International Conference on Information Communication and Management, IPCSIT vol.16, Singapore, (2011).

BOOKS

[15]. Wren & Martin, English Grammar and Composition, S.CHAND Publication.

THESIS

[6]. R. Harshawardhan, Rule Based Machine Translation System For English To Malayalam Language, Centre for Excellence in Computational Engineering and Networking, December 2011.

[28]. Kamal Deep, Dr.Vishal Goyal, Hybrid Approach for Punjabi to English Transliteration System, Punjabi university Patiala, September 2011.

PROCEEDING PAPER

[3].

[11]. M. Nameh, S.M. Fakhrahmad, M. Zolghadri Jahromi, A New Approach to Word Sense Disambiguation Based on Context Similarity, Proceedings of the World Congress on Engineering 2011, Vol I, pp. 456-459.

[12]. Tejinder Singh Saini, Gurpreet Singh Lehal, Word Disambiguation in Shahmukhi to Gurmukhi Transliteration, Proceedings of the 9th Workshop on Asian Language Resources, Chiang Mai, Thailand, November 12 and 13, 2011, pages 79–87.

Available: http://en.wikipedia.org/wiki/Natural_language_processing

[26]. Available at: http://en.wikipedia.org/wiki/Machine_translation

[27]. Available at: http://en.wikipedia.org/wiki/Word-sense_disambiguation