
SEMANTIC DISAMBIGUATION

CHARBEL AL HELAYEL
RAMI SAADE
KARIM SALHAB
INTRODUCTION
 Word Sense Disambiguation (WSD) is the task of assigning the correct meaning to a word given its context, since a word can have multiple meanings
 Example:
 The word date can be assigned different meanings:
1. Larry took Ellie out on a date
2. Larry's favorite fruit to eat is a date

 In the first sentence, the meaning of the word date is a social engagement
 In the second sentence, the word date is a fruit

SEMANTIC DISAMBIGUATION 3
INTRODUCTION
 Consider another example:
1. I went fishing for some sea bass
2. The bass line of this song is too weak

 In the first example, bass is a type of fish
 In the second example, bass is a low-frequency tone

 Humans are quite good at word-sense disambiguation, with accuracy above 90%
 Machine accuracy is much lower than human accuracy

INTRODUCTION
 Replicating this human ability with algorithms is a demanding and difficult task
 Solving this problem benefits many computer-related applications, such as improving search engine relevance, speech recognition, and machine translation

CONTENTS

 History
 Difficulties
 Approaches and Algorithms
 Knowledge-based
 Unsupervised
 Supervised
 Evaluations and demos
 Conclusion
HISTORY
Difficulties

Approaches and Algorithms

Evaluations and demos

Conclusion
HISTORY
 WSD is one of the oldest problems in computational linguistics, dating back to the 1940s
 It was introduced by Warren Weaver, a pioneer of machine translation
 Early researchers believed semantic disambiguation to be unsolvable by electronic computers, due to the necessity of modeling all world knowledge
 In the 1970s, WSD systems were still very primitive, as they were:
 Hand-coded
 Rule-based (if-then statements)

HISTORY

 In the 1980s, the emergence of large-scale lexical resources enabled the switch from hand-coded knowledge to automatic extraction from these resources
 In the 1990s, supervised learning drove major improvements in WSD
 In the 2000s, supervised techniques reached a performance plateau, and other methods emerged, such as domain adaptation and semi-supervised and unsupervised corpus-based approaches
 Supervised systems are still the way to go

History

DIFFICULTIES
Approaches and Algorithms

Evaluations and demos

Conclusion
DIFFICULTIES
DIFFERENCE BETWEEN DICTIONARIES

 Different dictionaries can provide different definitions for the same word:
 Consider again the following example:
 The bass in this song is too strong
 Will the definition be that of the frequency or of the musical instrument?
 This inconsistency between resources can prevent the system from returning the most precise sense of the word
 This is a difficulty in WSD, since multiple dictionary resources are usually used together

DIFFICULTIES
PRAGMATICS

 Common sense is often needed in WSD, as some sentence formulations can lead to confusion
 The following example illustrates this:
 Bob and Jim are fathers
 Bob and Jim are brothers
 In the first sentence, Bob and Jim are fathers independently of each other
 In the second sentence, Bob and Jim are brothers of each other

DIFFICULTIES
SENSE INVENTORY AND ALGORITHMS TASK-DEPENDENCY

 Translating from one language to another can be ambiguous
 Consider the following example:
 The word chair in English has two translations in French:
 "Chaise" as in furniture
 "Directeur" as in director
 In addition, different applications require different algorithms to perform their corresponding tasks

History

Difficulties

APPROACHES AND ALGORITHMS


Evaluations and demos

Conclusion
KNOWLEDGE BASED
APPROACHES AND ALGORITHMS
LESK ALGORITHM
DICTIONARY AND KNOWLEDGE BASED METHODS

 One of the most popular knowledge-based algorithms is the Lesk algorithm
 It requires a machine-readable dictionary such as WordNet
 The steps are as follows:
1. Find the overlap between the features of the different senses of an ambiguous word (sense bag) and the features of the words in its context (context bag)
2. Weights can be assigned to each sense as appropriate
3. The sense with the maximum overlap is selected as the contextually appropriate sense
 Sense bag: contains the words in all of the definitions of the ambiguous word
 Context bag: contains the words in the definitions of all context words
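The steps above can be sketched in a few lines of Python. The sense inventory below is a hand-made toy (the definitions are paraphrased for illustration, not actual WordNet glosses), and stopword removal, which a real implementation would do, is skipped:

```python
# Toy sense inventory: each word maps its senses to a short definition.
SENSES = {
    "ash": {
        "tree": "trees of the olive family with pinnate leaves",
        "residue": "solid residue left when combustible material is burned",
    },
    "coal": {
        "ember": "piece of glowing carbon or burnt wood",
        "fuel": "black solid combustible substance used as fuel for burning",
    },
}

def tokens(text):
    return set(text.lower().split())

def lesk(target, context_words):
    """Pick the sense of `target` whose definition (sense bag) overlaps
    most with the definitions of the context words (context bag)."""
    context_bag = set()
    for w in context_words:
        for definition in SENSES.get(w, {}).values():
            context_bag |= tokens(definition)
    best_sense, best_overlap = None, -1
    for sense, definition in SENSES[target].items():
        overlap = len(tokens(definition) & context_bag)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

# "On burning coal we get ash": the residue sense wins via the shared
# words "solid" and "combustible".
print(lesk("ash", ["coal"]))  # prints "residue"
```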

APPROACHES AND ALGORITHMS
LESK ALGORITHM
DICTIONARY AND KNOWLEDGE BASED METHODS

 Consider the following example:
 On burning coal we get ash

Senses | Ash (Sense Bag) | Coal (Context Bag)
1 | Trees of the olive family with pinnate leaves, thin furrowed bark and gray branches. | A piece of glowing carbon or burnt wood.
2 | The solid residue left when combustible material is thoroughly burned or oxidized. | Charcoal.
3 | To convert into ash. | A black solid combustible substance formed by the partial decomposition of vegetable matter without free access to air and under the influence of moisture and often increased pressure and temperature that is widely used as a fuel for burning.

 The correct definition for ash is Sense 2

APPROACHES AND ALGORITHMS
LESK ALGORITHM
DICTIONARY AND KNOWLEDGE BASED METHODS

 Problem: for a long sentence, the number of sense combinations explodes and the algorithm takes a long time
 Example:
 I went to the bank to donate blood for an old person
 Bank has 18 definitions
 Donate has 2 definitions
 Blood has 24 definitions
 Old has 28 definitions
 Person has 16 definitions
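The explosion is easy to quantify: the full Lesk algorithm would have to consider every combination of senses, i.e. the product of the sense counts listed above:

```python
# Sense counts for the content words of the example sentence (as quoted
# in the slide above).
sense_counts = {"bank": 18, "donate": 2, "blood": 24, "old": 28, "person": 16}

combinations = 1
for n in sense_counts.values():
    combinations *= n  # every sense choice multiplies the search space

print(combinations)  # prints 387072 candidate sense assignments
```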

APPROACHES AND ALGORITHMS
LESK ALGORITHM
DICTIONARY AND KNOWLEDGE BASED METHODS

 Solution: use Simplified Lesk
 Find the sense that yields the highest overlap between the ambiguous word's dictionary definition and the current context
 Time-efficient and more precise
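A minimal sketch of Simplified Lesk: the definition of each sense is compared directly against the words of the sentence, so the definitions of the context words are never consulted. The two "chair" definitions below are paraphrased for illustration:

```python
def simplified_lesk(sense_definitions, context):
    """Simplified Lesk: score each sense of the ambiguous word by the
    overlap between its definition and the words of the sentence itself."""
    context_words = set(context.lower().split())
    return max(
        sense_definitions,
        key=lambda s: len(set(sense_definitions[s].lower().split()) & context_words),
    )

# Illustrative definitions for "chair" (paraphrased, not exact glosses).
chair = {
    "furniture": "a seat for one person with four legs and a back rest",
    "position": "a position of authority as of a judge or professor",
}

# The shared word "authority" selects the position-of-authority sense.
print(simplified_lesk(chair, "I am the chair, that's why I have authority"))
```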

APPROACHES AND ALGORITHMS
LESK ALGORITHM
DICTIONARY AND KNOWLEDGE BASED METHODS

 Consider the following example:
 I am the chair, that's why I have authority

Senses | Chair
1 | A seat, especially for one person, usually having four legs for support and a rest for the back and often having rests for the arms.
2 | A position of authority, as of a judge, professor, etc.

 The correct definition for chair is Sense 2, selected through the overlap with "authority"


APPROACHES AND ALGORITHMS
COMPARISON AND DRAWBACKS
DICTIONARY AND KNOWLEDGE BASED METHODS

Algorithm | Accuracy | Advantages | Disadvantages
Simplified Lesk | 58% | Faster; saves resources | Limited knowledge; not as accurate as Lesk
Lesk's algorithm | 50-70% | Best-in-class accuracy | Very sensitive to the exact words; not enough vocabulary to relate all words

Other algorithms, such as the Walker and Random Walk approaches, are described in Appendix A of the presentation.

SUPERVISED
APPROACHES AND ALGORITHMS
SUPERVISED METHODS

 In supervised methods, a classifier is trained on a corpus (a set of texts)
 Words around the target word provide clues about its sense; they are called features
 Each word in the corpus is "tagged", that is, assigned to different classes according to each of its meanings
 In WordNet, "bass" has 8 possible tags
 Collocational features of a target word are the words at specific positions around it, together with their part-of-speech (POS) tags

APPROACHES AND ALGORITHMS
NAÏVE BAYES ALGORITHM
SUPERVISED METHODS

 In simple terms, the classifier assigns the sense with the highest probability given the feature set
 We 'naively' assume that the features are independent of one another, and calculate the probabilities using the training data and the weights of the features
 Mathematically:

P̂(c) = N_c / N
P̂(w | c) = (count(w, c) + 1) / (count(c) + |V|)
c_MAP = argmax_c P̂(c) ∏_i P̂(x_i | c)

 Probabilities are estimated from the training set using relative frequency counts
 Each probability is calculated according to its frequency of appearance in the corpus, given the context
APPROACHES AND ALGORITHMS
NAÏVE BAYES ALGORITHM
SUPERVISED METHODS

Word: "bass"

Training data:
Doc | Words | Class
1 | fish smoked fish | f
2 | fish line | f
3 | fish haul smoked | f
4 | guitar jazz line | g
5 (test) | line guitar jazz jazz | ?

V = {fish, smoked, line, haul, guitar, jazz}, so |V| = 6

Priors:
P(f) = 3/4
P(g) = 1/4

Conditional probabilities (class f contains 8 tokens, class g contains 3):
 P(line|f) = (1+1)/(8+6) = 2/14
 P(guitar|f) = (0+1)/(8+6) = 1/14
 P(jazz|f) = (0+1)/(8+6) = 1/14
 P(line|g) = (1+1)/(3+6) = 2/9
 P(guitar|g) = (1+1)/(3+6) = 2/9
 P(jazz|g) = (1+1)/(3+6) = 2/9

Choosing a class:
P(f|d5) ∝ 3/4 × 2/14 × 1/14 × (1/14)² ≈ 0.00004
P(g|d5) ∝ 1/4 × 2/9 × 2/9 × (2/9)² ≈ 0.0006

The guitar sense (g) is chosen for "bass".
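The hand calculation can be reproduced with a small multinomial Naive Bayes sketch (add-one smoothing, toy corpus as on the slide):

```python
from collections import Counter

# Toy training corpus: f = fish sense of "bass", g = guitar sense.
train = [
    ("f", "fish smoked fish"),
    ("f", "fish line"),
    ("f", "fish haul smoked"),
    ("g", "guitar jazz line"),
]

classes = {c for c, _ in train}
vocab = {w for _, text in train for w in text.split()}
word_counts = {c: Counter() for c in classes}
for c, text in train:
    word_counts[c].update(text.split())

def posterior(c, doc):
    """P(c) * prod P(w|c), with add-one (Laplace) smoothing."""
    prior = sum(1 for cl, _ in train if cl == c) / len(train)
    total = sum(word_counts[c].values())  # tokens in class c
    p = prior
    for w in doc.split():
        p *= (word_counts[c][w] + 1) / (total + len(vocab))
    return p

test_doc = "line guitar jazz jazz"
scores = {c: posterior(c, test_doc) for c in classes}
print(max(scores, key=scores.get))  # prints "g": the guitar sense wins
```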
APPROACHES AND ALGORITHMS
DECISION LIST ALGORITHM
SUPERVISED METHODS

 Based on the 'one sense per collocation' property
 A 'collocation' is a sequence of words that frequently occur together and are not considered to do so by chance (e.g., crystal clear, cosmetic surgery)
 The sense of the target word is determined through consistent clues provided by nearby words
 Steps:
1. Collect a large set of collocations for the word
2. Calculate the log-likelihood ratio of each collocation: log( P(Sense_A | Collocation_i) / P(Sense_B | Collocation_i) )
3. Rank the collocations: a higher log-likelihood ratio means more predictive evidence

APPROACHES AND ALGORITHMS
DECISION LIST ALGORITHM
SUPERVISED METHODS

 Let's consider this example:
Your initial check should be drawn on your nominated bank account
 We only consider two senses for 'bank': A = financial and B = river
 In this example, the target word is associated with the word account; thus, according to our table, bank is being used in sense A

LogL | Collocation | Sense
11.3 | river bank | B
9.78 | money (within k words) | A
9.62 | water (within k words) | B
9.53 | bank account | A
9.50 | financial (within k words) | A
… | … | …
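Classifying with a decision list is then a top-down scan of the ranked tests, returning the sense of the first one that fires. The sketch below uses the log-likelihood values from the table; the matching functions are simplified to substring tests (a real system would use positional "within k words" checks):

```python
# Ranked (log-likelihood, test, sense) triples, highest evidence first.
decision_list = [
    (11.3, lambda s: "river bank" in s, "B"),
    (9.78, lambda s: "money" in s, "A"),
    (9.62, lambda s: "water" in s, "B"),
    (9.53, lambda s: "bank account" in s, "A"),
    (9.50, lambda s: "financial" in s, "A"),
]

def classify(sentence):
    """Return the sense of the first (most predictive) test that fires."""
    s = sentence.lower()
    for _, test, sense in decision_list:
        if test(s):
            return sense
    return None  # in practice, fall back to the most frequent sense

# "bank account" is the first collocation that matches, so sense A wins.
print(classify("Your initial check should be drawn on your nominated bank account"))
```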
APPROACHES AND ALGORITHMS
COMPARISON
SUPERVISED METHODS

Algorithm | Accuracy | Advantages | Disadvantages
Naïve Bayes | 64.13% | Easy to implement | Suffers from lack of data; large number of parameters to be trained
Decision Lists | 96% | Takes the single most predictive feature | One classifier per word; not tested enough (12 words)

UNSUPERVISED
APPROACHES AND ALGORITHMS
HYPERLEX
UNSUPERVISED APPROACHES

 One of the most popular unsupervised methods is the HyperLex approach
 It extracts the senses from the corpus itself, without any human classification
 The corpus senses correspond to clusters of similar contexts of a word
 These clusters form a co-occurrence graph of the target word, with each edge having a weight
 For example, the co-occurrence graph for the word gram would be:

APPROACHES AND ALGORITHMS
HYPERLEX
UNSUPERVISED APPROACHES

 A Minimum Spanning Tree (MST) is then generated from this graph
 The nodes with the highest weights are considered root hubs and represent the sense set of the target word
 The senses (root hubs) are connected with a weight of 0 to the target word and with a distance d to each context node
 Each node in the MST is assigned a score s given by:
s = 1 / (1 + d)

APPROACHES AND ALGORITHMS
HYPERLEX
UNSUPERVISED APPROACHES

 Consider this example for the word gram:
This smartphone feels heavier than the standard ones, how many grams does it weigh?

Computations for Sense 2:
s(weight) = 1/(1+0) = 1
s(heavy) = 1/(1+0.11) ≈ 0.90
s(standard) = 1/(1+0.09) ≈ 0.92
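The scoring step above is a one-liner; the sketch below just applies s = 1/(1+d) to the distances from the example (the distances themselves would come from the MST):

```python
def hyperlex_score(distance_to_hub):
    """HyperLex-style node score: 1 / (1 + d), where d is the distance
    from the context node to the root hub (sense) in the MST."""
    return 1.0 / (1.0 + distance_to_hub)

# Distances taken from the worked "gram" example above.
distances = {"weight": 0.0, "heavy": 0.11, "standard": 0.09}
scores = {w: round(hyperlex_score(d), 2) for w, d in distances.items()}
print(scores)  # prints {'weight': 1.0, 'heavy': 0.9, 'standard': 0.92}
```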

APPROACHES AND ALGORITHMS
LIN'S APPROACH
UNSUPERVISED APPROACHES

 Lin's approach works on the principle that two words can have similar meanings in a local context
 The algorithm's main goal is to search for selectors: words that appear in a context similar to that of the context word
 The target word sense that shares a hypernym* with most of the selectors is then chosen
 Consider this example for the target word "facility" and context word "employ":
The facility will employ 100 new employees.

Senses of facility:
 installation
 proficiency
 readiness
 toilet/bathroom

In this case, installation would be the chosen sense, since it shares its hypernym with most of the selectors of "employ".

* Hypernym: a word that names a broad category that includes other words, e.g. "primate" for "humans" and "chimpanzee".
APPROACHES AND ALGORITHMS
MORE UNSUPERVISED APPROACHES
UNSUPERVISED APPROACHES

 Another approach is the use of parallel corpora:
 The sense of a word is determined by looking at its distinct translations
 Words are translated, based on their context, into other languages where they have only one meaning
 Mostly used for determining the senses of Hindi words

APPROACHES AND ALGORITHMS
UNSUPERVISED APPROACHES COMPARISON
UNSUPERVISED APPROACHES

Algorithm | Accuracy | Corpus | Advantages | Disadvantages
Lin's algorithm | 68.5% | Tested on a corpus containing 25 million words and 2,832 polysemous* nouns | Most tested algorithm | General approach
HyperLex | 97% | Tested on a set of 10 highly polysemous French words | Highest precision rate; a new approach that extracts information from a corpus | Could fail to distinguish between finer senses of a word (e.g. the medicinal and narcotic senses of "drug")
WSD using parallel corpora | 62.4% | Trained using an English-Spanish parallel corpus of nouns | Can distinguish between the finer senses of a word due to distinct translations | Needs a parallel corpus of multiple languages, which requires a lot of processing

* Polysemous: words that can have multiple meanings
History

Difficulties

Approaches and Algorithms

EVALUATIONS AND DEMOS


Conclusion
EVALUATIONS AND DEMO
EVALUATION
 Comparing and evaluating different WSD systems is extremely difficult:
 Different test sets and knowledge resources are adopted
 For a fair test, all word occurrences should be annotated and all methods compared on the same corpus
 SemEval is an international word sense disambiguation competition in which evaluations of WSD systems are compared

Knowledge-Based | Supervised Approaches | Unsupervised Approaches
No need for a corpus. | Needs a corpus. | Needs a corpus.
The database used includes all word terms. | Classifications of the different senses of words. | Clusters of words with similar contexts.
No sense tagging. | Manual sense tagging is expensive and time-consuming. | No sense tagging.
Relies on the senses provided in machine-readable dictionaries. | Relies on the human classification of the senses. | No need for human declarations; results depend on the context words.

EVALUATIONS AND DEMO
PYTHIA DEMO

 Pythia
1. Go to http://omiotis.hua.gr/pythia/# or press above
2. Insert "Joe is a chair" as your text and press next, then choose short text
3. Choose Integer Linear Programming and then Lesk-Like, as we want to test an algorithm similar to Lesk's
4. Choose No for sense pruning and then choose All Features for the classification model
5. Check your text when you are prompted to; you can see that chair is given the right meaning
6. Now try the following text: "Joe took Ellie out on a date"; you will see that date is not given the correct meaning

EVALUATIONS AND DEMO
BABELFY DEMO

 Babelfy
1. Go to http://babelfy.org/index or press above
2. Insert "Rami is a chair" as your text and press Babelfy; you will see that chair has been given the wrong meaning
3. Now give Babelfy more context by trying "Rami is a chair member"; you will see that the meaning of chair is now correct
4. As you can see, even Babelfy, one of the most used WSD systems, needs a good amount of context to work

EVALUATIONS AND DEMO
GOOGLE TRANSLATE DEMO

 Google Translate
1. Go to https://translate.google.com.lb/ or press above
2. Choose to translate from English to French
3. Insert "Rami is a chair member" as your text and translate; you will see that chair has been given the wrong translation
4. Now give Google even more context by trying "Rami is the chair of the board"; you will see that the translation of chair is now correct
5. As you can see, Google Translate, one of the most used translation systems, needs an even larger amount of context to work correctly

History

Difficulties

Approaches and Algorithms

Evaluations and Demo

CONCLUSION
CONCLUSION
 All three families of methods are closely related
 The reported accuracy of each algorithm and the demos above show that WSD is still a long way from being fully reliable

APPENDIX A.1
WALKER'S ALGORITHM
DICTIONARY AND KNOWLEDGE BASED METHODS

 Uses the features provided by a thesaurus
 Works in two steps:
 Step 1: For each sense of the target word, find the thesaurus category to which that sense belongs
 Step 2: Calculate the score of each sense using the context words: a context word adds 1 to the score of a sense if the word's thesaurus category matches that of the sense

APPENDIX A.1
WALKER'S ALGORITHM
DICTIONARY AND KNOWLEDGE BASED METHODS

 Consider the following example:
 The money in this bank fetches an interest of 8% per annum
 Context words add 1 to a sense when the topic of the word matches that of the sense
 Hence bank is identified as finance and not a location

Context word | Sense 1: finance | Sense 2: location
Money | +1 | 0
Interest | +1 | 0
Fetch | 0 | 0
Annum | +1 | 0
Total | 3 | 0
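The two steps can be sketched directly. The thesaurus category assignments below are illustrative stand-ins for a real thesaurus lookup:

```python
# Toy thesaurus lookup: word -> category (illustrative assignments).
THESAURUS = {
    "money": "finance",
    "interest": "finance",
    "annum": "finance",
    "fetch": None,  # no category relevant to either sense
}

def walker(sense_categories, context_words):
    """Score each sense: +1 for every context word whose thesaurus
    category matches the sense's category."""
    scores = {sense: 0 for sense in sense_categories}
    for w in context_words:
        for sense, category in sense_categories.items():
            if THESAURUS.get(w) == category:
                scores[sense] += 1
    return scores

# The two candidate senses of "bank" and their thesaurus categories.
bank_senses = {"finance": "finance", "location": "location"}
print(walker(bank_senses, ["money", "interest", "fetch", "annum"]))
# prints {'finance': 3, 'location': 0}
```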

APPENDIX A.2
RANDOM WALK ALGORITHM
DICTIONARY AND KNOWLEDGE BASED METHODS

[Figure: sense graph for the text "bell ring church Sunday": each word contributes one vertex per sense (S1, S2, S3), the vertices are connected by weighted edges, and a score between 0 and 1 is computed for each sense vertex.]


Step 1: Add a vertex for each possible sense of each word in the text.
Step 2: Add weighted edges using definition-based semantic similarity (Lesk's method).
Step 3: Apply a graph-based ranking algorithm to find the score of each vertex (i.e., of each word sense).
Step 4: Select the vertex (sense) with the highest score.
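Steps 3 and 4 can be sketched with a weighted PageRank over a toy sense graph. The graph and its edge weights below are illustrative (in a real system they would come from Lesk similarity between sense definitions), with one densely connected sense per word plus a weakly connected alternative for bell and ring:

```python
# Toy sense graph: vertex = (word, sense), edges carry similarity weights.
edges = {
    ("bell", "s1"): {("ring", "s1"): 0.9, ("church", "s1"): 0.6},
    ("ring", "s1"): {("bell", "s1"): 0.9, ("church", "s1"): 0.5},
    ("church", "s1"): {("bell", "s1"): 0.6, ("ring", "s1"): 0.5, ("sunday", "s1"): 0.7},
    ("sunday", "s1"): {("church", "s1"): 0.7},
    ("bell", "s2"): {("ring", "s2"): 0.1},
    ("ring", "s2"): {("bell", "s2"): 0.1},
}

def pagerank(graph, damping=0.85, iterations=50):
    """Weighted PageRank: each node distributes its score along its
    outgoing edges in proportion to the edge weights."""
    nodes = list(graph)
    score = {v: 1.0 / len(nodes) for v in nodes}
    for _ in range(iterations):
        new = {}
        for v in nodes:
            rank = 0.0
            for u in nodes:
                total = sum(graph[u].values())
                if total:
                    rank += score[u] * graph[u].get(v, 0.0) / total
            new[v] = (1 - damping) / len(nodes) + damping * rank
        score = new
    return score

scores = pagerank(edges)
# Step 4: pick the highest-scoring sense of "bell".
best_bell = max((v for v in scores if v[0] == "bell"), key=scores.get)
print(best_bell)  # prints ('bell', 's1'): the densely connected sense wins
```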
