Context Based Word Sense Extraction in Text: Design Approach

Published by: ijcsis on Jun 05, 2011
Copyright:Attribution Non-commercial

1 Ranjeetsingh S. Suryawanshi, M.Tech. (Computer Engineering), ranjeetsuryawanshi06@gmail.com
2 Prof. D. M. Thakore, Department of Computer Engineering, dmthakore@bvucoep.edu.in
3 Kaustubh S. Raval, M.Tech. (Computer Engineering), raval_kaustubh@yahoo.co.in
1, 2, 3 Bharati Vidyapeeth Deemed University, College of Engineering, Dhankawadi, Pune 411043.
Abstract - Today, users perform most of their work with electronic documents. Because of the huge volume of unstructured electronic text available, automated techniques are required to analyze these repositories of information and extract knowledge from them. This unstructured text can take the form of emails, plain text documents, HTML files, etc. Understanding the meaning and semantics of these texts is a complicated problem, and it becomes more critical when meanings have to be extracted with respect to context.
Text mining, also known as Intelligent Text Analysis, extracts interesting information and knowledge from unstructured text. The agent for Context Based Sense Extraction in Text formulates the standard natural language processing rules with certain weights. The weights for each rule ultimately help in deciding the particular meaning of a word and sentence. This paper presents the architecture and design of Context Based Word Sense Extraction.
Keywords - Text Mining, Word Sense, Data Mining.
I. INTRODUCTION
Sensing multiple meanings in a large electronic text is much more difficult for a machine than it is for a human reading natural language. In natural language, humans extract the sense of a word by relating it to its particular context. For electronic text, this work is done by natural language processing, which extracts two properties of a word:
1. It removes the ambiguity of an individual word that can be used (in different contexts) to express multiple meanings.
2. It identifies the different meanings of a word by extracting the relation between two words that are spelled the same way.
To sense any word, two resources are necessary: a context in which the word has been used, and knowledge for finding the relation of the word within that context. Humans have the ability to find this relational knowledge. For example, for the word “fine” in the context of a human condition, a human associates further words such as “look”, “well”, and “feel”. Because of these associated words, “fine” clearly describes a human condition and does not refer to a penalty.
Because a computer lacks this kind of knowledge, it uses extra resources such as dictionaries and tagged documents to sense a word. The following are the main approaches used by computers to sense a word [5].
Dictionary-based Algorithms:
These use knowledge resources in the form of machine-readable dictionaries to extract the multiple senses of a word. A dictionary defines a term within a particular subject.
Supervised Disambiguation Algorithms:
These use knowledge resources in the form of tagged corpora, which define the meaning of a word. They build a classifier that classifies a new word correctly depending on its context of use. They need a large sense-tagged training set to extract the sense of a word.
Unsupervised Disambiguation Algorithms:
Unsupervised disambiguation is equivalent to clustering: the algorithm groups instances of a word by meaning.
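The dictionary-based approach can be illustrated with a small overlap count in the spirit of the simplified Lesk technique, applied to the “fine” example from the introduction. This is only a minimal sketch: the two-sense mini-dictionary, its glosses, and the function names are assumptions made for illustration and do not come from the paper or from any real lexicon.

```python
import re

# Hypothetical two-sense mini-dictionary for "fine"; the glosses are
# illustrative assumptions, not taken from the paper or a real lexicon.
SENSE_GLOSSES = {
    "good condition": "feel well look healthy good condition",
    "penalty": "money pay penalty court punishment charge",
}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def disambiguate(context, glosses=SENSE_GLOSSES):
    """Return the sense whose gloss shares the most words with the context."""
    context_words = set(tokenize(context))
    overlaps = {
        sense: len(context_words & set(tokenize(gloss)))
        for sense, gloss in glosses.items()
    }
    best_sense = max(overlaps, key=overlaps.get)
    return best_sense, overlaps


print(disambiguate("You look well, do you feel fine?"))
print(disambiguate("He had to pay a fine in court"))
```

The first sentence shares “look”, “well”, and “feel” with the first gloss, so that sense is selected; the second shares “pay” and “court” with the penalty gloss. A real system would draw its glosses from a machine-readable dictionary instead of a hand-written table.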
(IJCSIS) International Journal of Computer Science and Information Security, Vol. 9, No. 5, May 2011. http://sites.google.com/site/ijcsis/ ISSN 1947-5500
 
II. TECHNOLOGY FOUNDATIONS
A. Data Mining:
Data mining is the analysis of large quantities of data in order to retrieve useful and meaningful patterns and rules. The volume of data is increasing day by day. To overcome the deficiencies of manual analysis, data mining techniques can be used so that accurate and optimal results are obtained. Data mining involves a series of steps. In classification, incoming data is grouped by comparing its features to the predefined elements of a class. In estimation, a limit is established, each data value is checked against it, and the value is classified as lying above or below that limit. Association rules help to decide which combinations produce the best outcome. In clustering, the data is grouped. [4]
There is a wide array of techniques that can be used to mine data. Statistical techniques, neural networks, machine learning techniques, genetic algorithms, rough set techniques, fuzzy set techniques, decision tree building procedures, k-nearest-neighbor techniques, and other tools are available for data mining. Each of these techniques has its strengths and weaknesses, and part of the value provided to a project by the data mining team lies in understanding which techniques to use, and when to use them.
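As a rough illustration of two of the steps named above, the sketch below classifies a sample with a k-nearest-neighbor vote and performs an estimation check against a limit. The data points, labels, function names, and the choice of k are invented for the example and are not taken from the paper.

```python
import math
from collections import Counter


def knn_classify(sample, labelled_points, k=3):
    """Label `sample` by majority vote among its k nearest labelled points."""
    by_distance = sorted(
        labelled_points, key=lambda item: math.dist(sample, item[0])
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def estimate(value, limit):
    """Estimation step: report whether a value lies above the limit or not."""
    return "above" if value > limit else "below"


# Toy, made-up training data: (feature vector, class label).
training = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
]

print(knn_classify((1.1, 0.9), training))
print(estimate(7.5, limit=5.0))
```

Here a value exactly equal to the limit is reported as "below"; that tie-breaking rule is an arbitrary choice for the sketch.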
B. Text Mining:
Text mining, also known as Knowledge Discovery in Text (KDT), refers to the process of extracting interesting information and knowledge from unstructured text. Data mining tools are designed to handle structured data from databases, while text mining can handle unstructured or semi-structured data sets such as emails, full-text documents, HTML files, etc. [1]
Humans can easily handle contextual meaning, but a computer cannot easily handle spelling variations and the contextual meaning of text unless it is provided with rules. This becomes more significant and critical when the meaning of a piece of text has to be extracted in a particular context. Natural language processing (NLP) is used to determine which sense of a word should be adopted for each instance of that word. Figure 1 depicts a generic process model for a text mining application [1].
Starting with a collection of documents, a text mining tool retrieves a particular document and pre-processes it by checking its format and character set. The document then goes through a text analysis phase, in which techniques are sometimes repeated until the information is extracted. The resulting information can then be passed to pattern discovery, which helps to interpret the target knowledge.
Figure 1: Generic process for a text mining application
[Diagram stages: Text Document Collection; Document Retrieval & Pre-processing; Information Extraction; Parsing; Classification; Pattern Discovery; Interpreting Result]
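The generic process in Figure 1 can be sketched as a chain of small functions, one per stage. This is a heavily simplified illustration under stated assumptions: the function names, the stopword list, the sample documents, and the use of term frequency as the "pattern" are all invented for the example and are not the paper's method.

```python
import re
from collections import Counter

# Assumed, minimal stopword list used only for this illustration.
STOPWORDS = frozenset({"the", "a", "is", "of", "and", "to", "in"})


def preprocess(raw):
    """Pre-processing: strip HTML-like tags, normalise whitespace and case."""
    no_tags = re.sub(r"<[^>]+>", " ", raw)
    return re.sub(r"\s+", " ", no_tags).strip().lower()


def extract_terms(text):
    """Information extraction (greatly simplified): keep non-stopword tokens."""
    return [t for t in re.findall(r"[a-z]+", text) if t not in STOPWORDS]


def discover_patterns(term_lists, top_n=2):
    """Pattern discovery (simplified): most frequent terms across documents."""
    counts = Counter(term for terms in term_lists for term in terms)
    return counts.most_common(top_n)


# Made-up document collection mixing HTML and plain text.
documents = [
    "<p>The bank approved the loan.</p>",
    "The river bank is muddy.",
    "A loan from the bank",
]

patterns = discover_patterns([extract_terms(preprocess(d)) for d in documents])
print(patterns)
```

Interpreting the result is left to the reader of the output: a frequent but ambiguous term such as "bank" is exactly the kind of word whose sense then has to be decided from context.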
 
C. Natural Language Processing:
Natural Language Processing (NLP) is an area of research and application that explores how computers can be used to understand and manipulate natural language text or speech to do useful things. Natural language is used to represent human thoughts and human actions. Natural language processing has produced technologies that teach computers to analyze, understand, and even generate text. Some of the technologies that have been developed and can be used in the text mining process are information extraction, categorization, clustering, concept linkage, information visualization, and question answering. Applications of NLP span a number of fields of study, such as machine translation, natural language text processing and summarization, user interfaces, multilingual and cross-language information retrieval (CLIR), speech recognition, and artificial intelligence. [6]
III. SYSTEM DESIGN
Figure 2 shows the use case diagram for Context Based Word Sense in Text, with the following entities.
Actor:
1) User.
Use Cases:
1) Frequency count for a word.
2) Calculate the weight matrix.
3) Find the relationship between words.
As shown in Figure 2, the user enters a query into the system. The system then generates a frequency count for the scenario provided by the user. If multiple meanings are possible for the entered query, the system finds the relation of the word within the documents. Finally, the system calculates the weight matrix to rank the possible senses of the word.
Figure 2: User Interaction
[Use case diagram elements: User; Input Word; Frequency of word; Co-occurrence and relation of word; Calculate Weight Matrix; Check Word Probability; relationships <<include>>, <<extend>>, <<communicate>>]
Figure 3 shows the data flow for the ‘Context based word sense text-mining system’. The basic process is broken down into sub-processes: ‘Process 1: Parsing, ordering and finding key-phrase’, ‘Process 2: Deciding contexts’, ‘Process 3: Calculate frequency count for word’, and ‘Process 4: Calculate weight matrix value and associated word senses’.
Figure 3: Data flow for Context based word sense system.
[Data flow diagram elements: User; Text Input; 1 Preprocessing and Tokenization; Keyphrase; Text Miner Agent; 2 Context Decision; Intelligent Agent; 3 Calculate Frequency for Word; Dataset; 4 Calculate Weight Matrix & Associated Word Senses; Word & Contexts]
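One possible reading of the sub-processes in Figure 3 is sketched below: tokenize the input, count the words that co-occur with the target word, and rank candidate senses by a weighted sum. The paper does not give concrete rules, weights, or a formula for the weight matrix, so the cue words, the weight values, the window size, and the scoring rule are all assumptions made purely for illustration.

```python
import re
from collections import Counter

# Hypothetical weights: sense -> {cue word: weight}. The paper does not give
# concrete rules or weights, so these values are illustrative assumptions.
SENSE_CUE_WEIGHTS = {
    "human condition": {"feel": 1.0, "look": 0.8, "well": 0.8},
    "penalty": {"pay": 1.0, "court": 0.9, "police": 0.7},
}


def tokenize(text):
    """Process 1 (simplified): lowercase the text and split it into tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cooccurrence_counts(tokens, target, window=4):
    """Process 3 (simplified): count words within `window` tokens of `target`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == target:
            lo, hi = max(0, i - window), i + window + 1
            counts.update(t for t in tokens[lo:hi] if t != target)
    return counts


def rank_senses(text, target, weights=SENSE_CUE_WEIGHTS):
    """Processes 2 and 4 (simplified): score each sense, highest score first."""
    neighbours = cooccurrence_counts(tokenize(text), target)
    scores = {
        sense: sum(w * neighbours[cue] for cue, w in cues.items())
        for sense, cues in weights.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


print(rank_senses("I feel fine today and you look fine as well", "fine"))
print(rank_senses("He had to pay a fine in court", "fine"))
```

In this sketch the nested dictionary plays the role of the weight matrix, with one row per sense and one column per cue word. The first sense in the returned list is the top-ranked candidate for the given input.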