Context Based Word Sense Extraction in Text: Design Approach
Prof. D. M. Thakore
Kaustubh S. Raval
Department of Computer Engineering; M.Tech. (Computer Engineering)
email@example.com, firstname.lastname@example.org, email@example.com
Bharati Vidyapeeth Deemed University, College of Engineering, Dhankawadi, Pune–411043.
Today, users perform most of their work with electronic documents. Because huge volumes of unstructured electronic text are available, automated techniques are required to analyze these repositories of information and extract knowledge from them. This unstructured text may take the form of emails, plain text documents, HTML files, and so on. Understanding the meaning and semantics of such text is a complicated problem, and it becomes more critical when meanings have to be extracted with respect to context. Text mining, also known as Intelligent Text Analysis, extracts interesting information and knowledge from unstructured text. The agent for Context Based Sense Extraction in Text formulates standard natural language processing rules with certain weights. The weights for each rule ultimately help in deciding the particular meaning of a word and sentence. This paper presents the architecture and design of Context Based Word Sense Extraction.
Keywords: Text Mining, Word Sense, Data Mining.
Sensing multiple meanings in a large electronic text is far more difficult for a machine than it is for a human reading natural language. In natural language, a human extracts the sense of a word by relating it to its particular context. For electronic text, this work is done by natural language processing, which addresses two properties of a word:
1. It removes the ambiguity of an individual word that can be used (in different contexts) to express multiple meanings.
2. It identifies the different meanings of a word by extracting the relation between two words that are spelled the same way.
To sense any word, two resources are necessary: a context in which the word has been used, and knowledge for finding the relation of the word within that context. Humans have the ability to find this relational knowledge. For example, for the word “Fine” in the context of a human condition, a human associates further words such as “look”, “well”, and “feel”. Because of these associated words, “Fine” clearly describes a human condition and does not refer to a penalty.
Because a computer lacks this kind of knowledge, it relies on additional resources, such as dictionaries and tagged documents, to sense a word. The following are the main approaches used by computers to sense words.
The dictionary-based approach uses knowledge resources in the form of machine-readable dictionaries to extract the multiple senses of a word. A dictionary defines a term within a particular subject.
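As an illustration of the dictionary-based idea, the following minimal sketch picks the sense whose dictionary gloss shares the most words with the context of the target word. It is a simplified gloss-overlap heuristic, not the method of this paper. The `SENSES` mini-dictionary, its glosses, the `STOPWORDS` list, and the function names are all hypothetical and chosen only to mirror the “Fine” example.

```python
# Hypothetical two-sense dictionary for "fine". The glosses are illustrative
# and are not taken from any real lexical resource.
SENSES = {
    "fine": {
        "condition": "feel well look good healthy state of a person",
        "penalty": "money paid as punishment penalty for breaking a law or rule",
    }
}

# Small assumed stopword list so that function words do not count as overlap.
STOPWORDS = {"a", "an", "the", "of", "for", "as", "or", "i", "and",
             "to", "is", "he", "she", "was"}


def tokenize(text):
    """Lowercase, strip simple punctuation, and drop stopwords."""
    return {w.strip(".,!?").lower() for w in text.split()} - STOPWORDS


def dictionary_sense(word, sentence):
    """Return the sense of `word` whose gloss overlaps most with the sentence."""
    context = tokenize(sentence) - {word}
    scores = {
        sense: len(context & tokenize(gloss))
        for sense, gloss in SENSES[word].items()
    }
    # On a tie, max() keeps the first sense listed in the dictionary.
    return max(scores, key=scores.get)
```

Under these assumptions, `dictionary_sense("fine", "I feel fine and look well today")` selects the human-condition sense, because “feel”, “look”, and “well” all occur in that gloss, while a sentence about paying for breaking a rule selects the penalty sense.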
Supervised Disambiguation Algorithms:
This approach uses a knowledge resource in the form of tagged corpora, which define the meaning of each word occurrence. It builds a classifier that assigns the correct sense to new occurrences of a word depending on their context of use. It needs a large sense-tagged training set to extract the sense of a word.
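The supervised idea can be sketched with a small Naive Bayes classifier with add-one smoothing, trained on sense-tagged sentences. Naive Bayes is used here only as one common choice of classifier; the paper does not prescribe it. The `TRAINING` sentences, sense labels, and function names are hypothetical, and a real system would need a far larger tagged corpus.

```python
import math
from collections import Counter, defaultdict

# Hypothetical sense-tagged training sentences for the word "fine".
TRAINING = [
    ("i feel fine and look well", "condition"),
    ("she looks fine and feels well today", "condition"),
    ("he paid a fine for late parking", "penalty"),
    ("the court imposed a fine as penalty", "penalty"),
]


def train(examples):
    """Count senses and context words per sense from tagged examples."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sentence, sense in examples:
        words = sentence.lower().split()
        sense_counts[sense] += 1
        word_counts[sense].update(words)
        vocab.update(words)
    return sense_counts, word_counts, vocab


def classify(sentence, model):
    """Return the sense with the highest smoothed log-probability."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, float("-inf")
    for sense, count in sense_counts.items():
        score = math.log(count / total)  # log prior of the sense
        denom = sum(word_counts[sense].values()) + len(vocab)
        for w in sentence.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += math.log((word_counts[sense][w] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense
```

With `model = train(TRAINING)`, a new sentence such as "you look well and feel fine" is assigned the human-condition sense, because its context words were seen mostly with that tag.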
Unsupervised Disambiguation Algorithm:
An unsupervised disambiguation algorithm is equivalent to clustering: it groups instances of a word by meaning.
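A minimal sketch of this clustering view is given below. It groups untagged sentences containing the target word by the Jaccard similarity of their context words, using a simple single-pass greedy scheme. The similarity measure, the `threshold` value, and the function names are assumptions made for illustration; the paper does not specify a clustering method.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_contexts(sentences, target, threshold=0.2):
    """Greedily group sentences whose contexts of `target` are similar.

    Each sentence joins the most similar existing cluster if the similarity
    reaches `threshold` (an assumed value); otherwise it starts a new cluster.
    """
    clusters = []  # each cluster holds its pooled context words and members
    for sentence in sentences:
        context = set(sentence.lower().split()) - {target}
        best = max(
            clusters,
            key=lambda c: jaccard(context, c["words"]),
            default=None,
        )
        if best is not None and jaccard(context, best["words"]) >= threshold:
            best["words"] |= context
            best["members"].append(sentence)
        else:
            clusters.append({"words": set(context), "members": [sentence]})
    return [c["members"] for c in clusters]
```

No sense labels are supplied: the clusters that emerge stand in for the senses, and a human or a later step would still have to name them.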
(IJCSIS) International Journal of Computer Science and Information Security, Vol. 9, No. 5, May 2011. http://sites.google.com/site/ijcsis/ ISSN 1947-5500