
SEMANTIC DISAMBIGUATION

CHARBEL AL HELAYEL
RAMI SAADE
KARIM SALHAB
INTRODUCTION
 Word Sense Disambiguation (WSD) is the task of assigning the correct meaning to a word given its context, since a word can have multiple meanings
 Example:
 The word date can be assigned different meanings:
1. Larry took Ellie out on a date
2. Larry's favorite fruit to eat is a date

 In the first sentence, the meaning of the word date is a social engagement
 In the second sentence, the word date is a fruit

SEMANTIC DISAMBIGUATION 3
INTRODUCTION
 Consider another example:
1. I went fishing for some sea bass
2. The bass line of this song is too weak

 In the first example, bass is a type of fish
 In the second example, bass is a low-frequency tone

 Humans are quite good at word-sense disambiguation, with accuracy above 90%
 Machine accuracy is much lower than human accuracy

INTRODUCTION
 Replicating this human ability with algorithms is a demanding and difficult task
 Solving this problem benefits many computer-related applications, such as improving search engine relevance, speech recognition, and machine translation

CONTENTS

 History
 Difficulties
 Approaches and Algorithms
 Knowledge-based
 Unsupervised
 Supervised
 Evaluations and demos
 Conclusion
HISTORY
Difficulties

Approaches and Algorithms

Evaluations and demos

Conclusion
HISTORY
 WSD is one of the oldest problems in computational linguistics, dating back to the 1940s
 It was introduced by Warren Weaver, a pioneer of machine translation
 Early researchers believed semantic disambiguation to be unsolvable by electronic computers, due to the necessity of modeling all world knowledge
 In the 1970s, WSD systems were still very primitive, as they were:
 Hand-coded
 Rule-based (if-then statements)

HISTORY

 In the 1980s, the emergence of large-scale lexical resources enabled the switch from hand-coded knowledge to automatic extraction from these resources
 In the 1990s, supervised learning drove major improvements in WSD
 In the 2000s, supervised techniques reached a performance plateau, and other methods emerged, such as domain adaptation and semi-supervised and unsupervised corpus-based approaches
 Supervised systems are still the way to go

History

DIFFICULTIES
Approaches and Algorithms

Evaluations and demos

Conclusion
DIFFICULTIES
DIFFERENCE BETWEEN DICTIONARIES

 Different dictionaries can provide different definitions for the same word:
 Consider again the following example:
 The bass in this song is too strong
 Will the definition be that of the frequency or of the musical instrument?
 This inconsistency between resources can prevent the system from returning the most precise sense of the word
 This is a difficulty in WSD, since multiple dictionary resources are usually used together

DIFFICULTIES
PRAGMATICS

 Common sense is often needed in WSD, as some sentence formulations can lead to confusion
 The following example illustrates this:
 Bob and Jim are fathers
 Bob and Jim are brothers
 In the first sentence, Bob and Jim are fathers independently of each other
 In the second sentence, Bob and Jim are brothers of each other

DIFFICULTIES
SENSE INVENTORY AND ALGORITHMS TASK-DEPENDENCY

 Translating from one language to another can be ambiguous
 Consider the following example:
 The word chair in English has two translations in French:
 "Chaise" as in furniture
 "Directeur" as in director
 In addition, different applications require different algorithms to perform their corresponding tasks

History

Difficulties

APPROACHES AND ALGORITHMS


Evaluations and demos

Conclusion
KNOWLEDGE BASED
APPROACHES AND ALGORITHMS
LESK ALGORITHM
DICTIONARY AND KNOWLEDGE BASED METHODS

 One of the most popular knowledge-based algorithms is the Lesk algorithm
 It requires a machine-readable dictionary such as WordNet
 The steps are as follows:
1. Find the overlap between the features of the different senses of an ambiguous word (sense bag) and the features of the words in its context (context bag)
2. Weights can be assigned to each sense as appropriate
3. The sense with the maximum overlap is selected as the contextually appropriate sense
 Sense bag: contains the words in all of the definitions of the ambiguous word
 Context bag: contains the words in the definitions of all context words
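The steps above can be sketched in a few lines of Python. The sense inventory below is a hand-made toy (the definitions are paraphrased for illustration, not actual WordNet glosses), and stopword removal, which a real implementation would do, is skipped:

```python
# Toy sense inventory: each word maps its senses to a short definition.
SENSES = {
    "ash": {
        "tree": "trees of the olive family with pinnate leaves",
        "residue": "solid residue left when combustible material is burned",
    },
    "coal": {
        "ember": "piece of glowing carbon or burnt wood",
        "fuel": "black solid combustible substance used as fuel for burning",
    },
}

def tokens(text):
    return set(text.lower().split())

def lesk(target, context_words):
    """Pick the sense of `target` whose definition (sense bag) overlaps
    most with the definitions of the context words (context bag)."""
    context_bag = set()
    for w in context_words:
        for definition in SENSES.get(w, {}).values():
            context_bag |= tokens(definition)
    best_sense, best_overlap = None, -1
    for sense, definition in SENSES[target].items():
        overlap = len(tokens(definition) & context_bag)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

# "On burning coal we get ash": the residue sense wins via the shared
# words "solid" and "combustible".
print(lesk("ash", ["coal"]))  # prints "residue"
```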

APPROACHES AND ALGORITHMS
LESK ALGORITHM
DICTIONARY AND KNOWLEDGE BASED METHODS

 Consider the following example:
 On burning coal we get ash

Senses | Ash (Sense Bag) | Coal (Context Bag)
1 | Trees of the olive family with pinnate leaves, thin furrowed bark and gray branches. | A piece of glowing carbon or burnt wood.
2 | The solid residue left when combustible material is thoroughly burned or oxidized. | Charcoal.
3 | To convert into ash. | A black solid combustible substance formed by the partial decomposition of vegetable matter without free access to air and under the influence of moisture and often increased pressure and temperature that is widely used as a fuel for burning.

 The correct definition for ash is Sense 2

APPROACHES AND ALGORITHMS
LESK ALGORITHM
DICTIONARY AND KNOWLEDGE BASED METHODS

 Problem: for a long sentence, the number of sense combinations explodes and the algorithm takes a long time
 Example:
 I went to the bank to donate blood for an old person
 Bank has 18 definitions
 Donate has 2 definitions
 Blood has 24 definitions
 Old has 28 definitions
 Person has 16 definitions
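The explosion is easy to quantify: the full Lesk algorithm would have to consider every combination of senses, i.e. the product of the sense counts listed above:

```python
# Sense counts for the content words of the example sentence (as quoted
# in the slide above).
sense_counts = {"bank": 18, "donate": 2, "blood": 24, "old": 28, "person": 16}

combinations = 1
for n in sense_counts.values():
    combinations *= n  # every sense choice multiplies the search space

print(combinations)  # prints 387072 candidate sense assignments
```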

APPROACHES AND ALGORITHMS
LESK ALGORITHM
DICTIONARY AND KNOWLEDGE BASED METHODS

 Solution: use Simplified Lesk
 Find the sense that yields the highest overlap between the ambiguous word's dictionary definition and the current context
 Time-efficient and more precise
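A minimal sketch of Simplified Lesk: the definition of each sense is compared directly against the words of the sentence, so the definitions of the context words are never consulted. The two "chair" definitions below are paraphrased for illustration:

```python
def simplified_lesk(sense_definitions, context):
    """Simplified Lesk: score each sense of the ambiguous word by the
    overlap between its definition and the words of the sentence itself."""
    context_words = set(context.lower().split())
    return max(
        sense_definitions,
        key=lambda s: len(set(sense_definitions[s].lower().split()) & context_words),
    )

# Illustrative definitions for "chair" (paraphrased, not exact glosses).
chair = {
    "furniture": "a seat for one person with four legs and a back rest",
    "position": "a position of authority as of a judge or professor",
}

# The shared word "authority" selects the position-of-authority sense.
print(simplified_lesk(chair, "I am the chair, that's why I have authority"))
```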

APPROACHES AND ALGORITHMS
LESK ALGORITHM
DICTIONARY AND KNOWLEDGE BASED METHODS

 Consider the following example:
 I am the chair, that's why I have authority

Senses | Chair
1 | A seat, especially for one person, usually having four legs for support and a rest for the back and often having rests for the arms.
2 | A position of authority, as of a judge, professor, etc.

 The correct definition for chair is Sense 2, selected through the overlap with "authority"


APPROACHES AND ALGORITHMS
COMPARISON AND DRAWBACKS
DICTIONARY AND KNOWLEDGE BASED METHODS

Algorithm | Accuracy | Advantages | Disadvantages
Simplified Lesk | 58% | Faster; saves resources | Limited knowledge; not as accurate as Lesk
Lesk's algorithm | 50-70% | Best-in-class accuracy | Very sensitive to the exact words; not enough vocabulary to relate all words

Other algorithms, such as the Walker and Random Walk approaches, are described in Appendix A of the presentation.

SUPERVISED
APPROACHES AND ALGORITHMS
SUPERVISED METHODS

 In supervised methods, a classifier is trained on a corpus (a set of texts)
 Words around the target word provide clues about its sense; they are called features
 Each word in the corpus is "tagged", that is, assigned to different classes according to each of its meanings
 In WordNet, "bass" has 8 possible tags
 Collocational features of a target word are the words at specific positions around it, together with their part-of-speech (POS) tags

APPROACHES AND ALGORITHMS
NAÏVE BAYES ALGORITHM
SUPERVISED METHODS

 In simple terms, the classifier assigns the sense with the highest probability given the feature set
 We 'naively' assume that the features are independent of one another, and calculate the probabilities using the training data and the weights of the features
 Mathematically:

P̂(c) = N_c / N
P̂(w | c) = (count(w, c) + 1) / (count(c) + |V|)
c_MAP = argmax_c P̂(c) ∏_i P̂(x_i | c)

 Probabilities are estimated from the training set using relative frequency counts
 Each probability is calculated according to its frequency of appearance in the corpus, given the context
APPROACHES AND ALGORITHMS
NAÏVE BAYES ALGORITHM
SUPERVISED METHODS

Word: "bass"

Training data:
Doc | Words | Class
1 | fish smoked fish | f
2 | fish line | f
3 | fish haul smoked | f
4 | guitar jazz line | g
5 (test) | line guitar jazz jazz | ?

V = {fish, smoked, line, haul, guitar, jazz}, so |V| = 6

Priors:
P(f) = 3/4
P(g) = 1/4

Conditional probabilities (class f contains 8 tokens, class g contains 3):
 P(line|f) = (1+1)/(8+6) = 2/14
 P(guitar|f) = (0+1)/(8+6) = 1/14
 P(jazz|f) = (0+1)/(8+6) = 1/14
 P(line|g) = (1+1)/(3+6) = 2/9
 P(guitar|g) = (1+1)/(3+6) = 2/9
 P(jazz|g) = (1+1)/(3+6) = 2/9

Choosing a class:
P(f|d5) ∝ 3/4 × 2/14 × 1/14 × (1/14)² ≈ 0.00004
P(g|d5) ∝ 1/4 × 2/9 × 2/9 × (2/9)² ≈ 0.0006

The guitar sense (g) is chosen for "bass".
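The hand calculation can be reproduced with a small multinomial Naive Bayes sketch (add-one smoothing, toy corpus as on the slide):

```python
from collections import Counter

# Toy training corpus: f = fish sense of "bass", g = guitar sense.
train = [
    ("f", "fish smoked fish"),
    ("f", "fish line"),
    ("f", "fish haul smoked"),
    ("g", "guitar jazz line"),
]

classes = {c for c, _ in train}
vocab = {w for _, text in train for w in text.split()}
word_counts = {c: Counter() for c in classes}
for c, text in train:
    word_counts[c].update(text.split())

def posterior(c, doc):
    """P(c) * prod P(w|c), with add-one (Laplace) smoothing."""
    prior = sum(1 for cl, _ in train if cl == c) / len(train)
    total = sum(word_counts[c].values())  # tokens in class c
    p = prior
    for w in doc.split():
        p *= (word_counts[c][w] + 1) / (total + len(vocab))
    return p

test_doc = "line guitar jazz jazz"
scores = {c: posterior(c, test_doc) for c in classes}
print(max(scores, key=scores.get))  # prints "g": the guitar sense wins
```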
APPROACHES AND ALGORITHMS
DECISION LIST ALGORITHM
SUPERVISED METHODS

 Based on the 'one sense per collocation' property
 A 'collocation' is a sequence of words that frequently occur together and are not considered to do so by chance (e.g., crystal clear, cosmetic surgery)
 The sense of the target word is determined through consistent clues provided by nearby words
 Steps:
1. Collect a large set of collocations for the word
2. Calculate the log-likelihood ratio of each collocation: log( P(Sense_A | Collocation_i) / P(Sense_B | Collocation_i) )
3. Rank the collocations: a higher log-likelihood ratio means more predictive evidence

APPROACHES AND ALGORITHMS
DECISION LIST ALGORITHM
SUPERVISED METHODS

 Let's consider this example:
Your initial check should be drawn on your nominated bank account
 We only consider two senses for 'bank': A = financial and B = river
 In this example, the target word is associated with the word account; thus, according to our table, bank is being used in sense A

LogL | Collocation | Sense
11.3 | river bank | B
9.78 | money (within k words) | A
9.62 | water (within k words) | B
9.53 | bank account | A
9.50 | financial (within k words) | A
… | … | …
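Classifying with a decision list is then a top-down scan of the ranked tests, returning the sense of the first one that fires. The sketch below uses the log-likelihood values from the table; the matching functions are simplified to substring tests (a real system would use positional "within k words" checks):

```python
# Ranked (log-likelihood, test, sense) triples, highest evidence first.
decision_list = [
    (11.3, lambda s: "river bank" in s, "B"),
    (9.78, lambda s: "money" in s, "A"),
    (9.62, lambda s: "water" in s, "B"),
    (9.53, lambda s: "bank account" in s, "A"),
    (9.50, lambda s: "financial" in s, "A"),
]

def classify(sentence):
    """Return the sense of the first (most predictive) test that fires."""
    s = sentence.lower()
    for _, test, sense in decision_list:
        if test(s):
            return sense
    return None  # in practice, fall back to the most frequent sense

# "bank account" is the first collocation that matches, so sense A wins.
print(classify("Your initial check should be drawn on your nominated bank account"))
```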
APPROACHES AND ALGORITHMS
COMPARISON
SUPERVISED METHODS

Algorithm | Accuracy | Advantages | Disadvantages
Naïve Bayes | 64.13% | Easy to implement | Suffers from lack of data; large number of parameters to be trained
Decision Lists | 96% | Takes the single most predictive feature | One classifier per word; not tested enough (12 words)

UNSUPERVISED
APPROACHES AND ALGORITHMS
HYPERLEX
UNSUPERVISED APPROACHES

 One of the most popular unsupervised methods is the HyperLex approach
 It extracts the senses from the corpus itself, without any human classification
 The corpus senses correspond to clusters of similar contexts of a word
 These clusters form a co-occurrence graph of the target word, with each edge having a weight
 For example, the co-occurrence graph for the word gram would be:

APPROACHES AND ALGORITHMS
HYPERLEX
UNSUPERVISED APPROACHES

 A Minimum Spanning Tree (MST) is then generated from this graph
 The nodes with the highest weights are considered root hubs and represent the sense set of the target word
 The senses (root hubs) are connected with a weight of 0 to the target word and with a distance d to each context node
 Each node in the MST is assigned a score s given by:
s = 1 / (1 + d)

APPROACHES AND ALGORITHMS
HYPERLEX
UNSUPERVISED APPROACHES

 Consider this example for the word gram:
This smartphone feels heavier than the standard ones, how many grams does it weigh?

Computations for Sense 2:
s(weight) = 1/(1+0) = 1
s(heavy) = 1/(1+0.11) ≈ 0.90
s(standard) = 1/(1+0.09) ≈ 0.92
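The scoring step above is a one-liner; the sketch below just applies s = 1/(1+d) to the distances from the example (the distances themselves would come from the MST):

```python
def hyperlex_score(distance_to_hub):
    """HyperLex-style node score: 1 / (1 + d), where d is the distance
    from the context node to the root hub (sense) in the MST."""
    return 1.0 / (1.0 + distance_to_hub)

# Distances taken from the worked "gram" example above.
distances = {"weight": 0.0, "heavy": 0.11, "standard": 0.09}
scores = {w: round(hyperlex_score(d), 2) for w, d in distances.items()}
print(scores)  # prints {'weight': 1.0, 'heavy': 0.9, 'standard': 0.92}
```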

APPROACHES AND ALGORITHMS
LIN'S APPROACH
UNSUPERVISED APPROACHES

 Lin's approach works on the principle that two words can have similar meanings in a local context
 The algorithm's main goal is to search for selectors: words that appear in a context similar to that of the context word
 The target word sense that shares a hypernym* with most of the selectors is then chosen
 Consider this example for the target word "facility" and context word "employ":
The facility will employ 100 new employees.

Senses of facility:
 installation
 proficiency
 readiness
 toilet/bathroom

In this case, installation would be the chosen sense, since it shares its hypernym with most of the selectors of "employ".

* Hypernym: a word that names a broad category that includes other words, e.g. "primate" for "humans" and "chimpanzee".
APPROACHES AND ALGORITHMS
MORE UNSUPERVISED APPROACHES
UNSUPERVISED APPROACHES

 Another approach is the use of parallel corpora:
 The sense of a word is determined by looking at its distinct translations
 Words are translated, based on their context, into other languages where they have only one meaning
 Mostly used for determining the senses of Hindi words

APPROACHES AND ALGORITHMS
UNSUPERVISED APPROACHES COMPARISON
UNSUPERVISED APPROACHES

Algorithm | Accuracy | Corpus | Advantages | Disadvantages
Lin's algorithm | 68.5% | Tested on a corpus containing 25 million words and 2,832 polysemous* nouns | Most tested algorithm | General approach
HyperLex | 97% | Tested on a set of 10 highly polysemous French words | Highest precision rate; a new approach that extracts information from a corpus | Could fail to distinguish between finer senses of a word (e.g. the medicinal and narcotic senses of "drug")
WSD using parallel corpora | 62.4% | Trained using an English-Spanish parallel corpus of nouns | Can distinguish between the finer senses of a word due to distinct translations | Needs a parallel corpus of multiple languages, which requires a lot of processing

* Polysemous: words that can have multiple meanings
History

Difficulties

Approaches and Algorithms

EVALUATIONS AND DEMOS


Conclusion
EVALUATIONS AND DEMO
EVALUATION
 Comparing and evaluating different WSD systems is extremely difficult:
 Different test sets and knowledge resources are adopted
 For a fair test, all word occurrences should be annotated and all methods compared on the same corpus
 SemEval is an international word sense disambiguation competition in which evaluations of WSD systems are compared

Knowledge-Based | Supervised Approaches | Unsupervised Approaches
No need for a corpus. | Needs a corpus. | Needs a corpus.
The database used includes all word terms. | Classifications of the different senses of words. | Clusters of words with similar contexts.
No sense tagging. | Manual sense tagging is expensive and time-consuming. | No sense tagging.
Relies on the senses provided in machine-readable dictionaries. | Relies on the human classification of the senses. | No need for human declarations; results depend on the context words.

EVALUATIONS AND DEMO
PYTHIA DEMO

 Pythia
1. Go to http://omiotis.hua.gr/pythia/# or press above
2. Insert "Joe is a chair" as your text and press next, then choose short text
3. Choose Integer Linear Programming and then Lesk-Like, as we want to test an algorithm similar to Lesk's
4. Choose No for sense pruning and then choose All Features for the classification model
5. Check your text when you are prompted to; you can see that chair is given the right meaning
6. Now try the following text: "Joe took Ellie out on a date"; you will see that date is not given the correct meaning

EVALUATIONS AND DEMO
BABELFY DEMO

 Babelfy
1. Go to http://babelfy.org/index or press above
2. Insert "Rami is a chair" as your text and press Babelfy; you will see that chair has been given the wrong meaning
3. Now give Babelfy more context by trying "Rami is a chair member"; you will see that the meaning of chair is now correct
4. As you can see, even Babelfy, one of the most used WSD systems, needs a good amount of context to work

EVALUATIONS AND DEMO
GOOGLE TRANSLATE DEMO

 Google Translate
1. Go to https://translate.google.com.lb/ or press above
2. Choose to translate from English to French
3. Insert "Rami is a chair member" as your text and translate; you will see that chair has been given the wrong translation
4. Now give Google even more context by trying "Rami is the chair of the board"; you will see that the translation of chair is now correct
5. As you can see, Google Translate, one of the most used translation systems, needs an even larger amount of context to work correctly

History

Difficulties

Approaches and Algorithms

Evaluations and Demo

CONCLUSION
CONCLUSION
 All three families of methods are closely related
 The reported accuracy of each algorithm and the demos above show that WSD is still a long way from being fully reliable

APPENDIX A.1
WALKER'S ALGORITHM
DICTIONARY AND KNOWLEDGE BASED METHODS

 Uses the features provided by a thesaurus
 Works in two steps:
 Step 1: For each sense of the target word, find the thesaurus category to which that sense belongs
 Step 2: Calculate the score of each sense using the context words: a context word adds 1 to the score of a sense if the word's thesaurus category matches that of the sense

APPENDIX A.1
WALKER'S ALGORITHM
DICTIONARY AND KNOWLEDGE BASED METHODS

 Consider the following example:
 The money in this bank fetches an interest of 8% per annum
 Context words add 1 to a sense when the topic of the word matches that of the sense
 Hence bank is identified as finance and not a location

Context word | Sense 1: finance | Sense 2: location
Money | +1 | 0
Interest | +1 | 0
Fetch | 0 | 0
Annum | +1 | 0
Total | 3 | 0
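The two steps can be sketched directly. The thesaurus category assignments below are illustrative stand-ins for a real thesaurus lookup:

```python
# Toy thesaurus lookup: word -> category (illustrative assignments).
THESAURUS = {
    "money": "finance",
    "interest": "finance",
    "annum": "finance",
    "fetch": None,  # no category relevant to either sense
}

def walker(sense_categories, context_words):
    """Score each sense: +1 for every context word whose thesaurus
    category matches the sense's category."""
    scores = {sense: 0 for sense in sense_categories}
    for w in context_words:
        for sense, category in sense_categories.items():
            if THESAURUS.get(w) == category:
                scores[sense] += 1
    return scores

# The two candidate senses of "bank" and their thesaurus categories.
bank_senses = {"finance": "finance", "location": "location"}
print(walker(bank_senses, ["money", "interest", "fetch", "annum"]))
# prints {'finance': 3, 'location': 0}
```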

APPENDIX A.2
RANDOM WALK ALGORITHM
DICTIONARY AND KNOWLEDGE BASED METHODS

[Figure: sense graph for the text "bell ring church Sunday": each word contributes one vertex per sense (S1, S2, S3), the vertices are connected by weighted edges, and a score between 0 and 1 is computed for each sense vertex.]


Step 1: Add a vertex for each possible sense of each word in the text.
Step 2: Add weighted edges using definition-based semantic similarity (Lesk's method).
Step 3: Apply a graph-based ranking algorithm to find the score of each vertex (i.e., of each word sense).
Step 4: Select the vertex (sense) with the highest score.
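Steps 3 and 4 can be sketched with a weighted PageRank over a toy sense graph. The graph and its edge weights below are illustrative (in a real system they would come from Lesk similarity between sense definitions), with one densely connected sense per word plus a weakly connected alternative for bell and ring:

```python
# Toy sense graph: vertex = (word, sense), edges carry similarity weights.
edges = {
    ("bell", "s1"): {("ring", "s1"): 0.9, ("church", "s1"): 0.6},
    ("ring", "s1"): {("bell", "s1"): 0.9, ("church", "s1"): 0.5},
    ("church", "s1"): {("bell", "s1"): 0.6, ("ring", "s1"): 0.5, ("sunday", "s1"): 0.7},
    ("sunday", "s1"): {("church", "s1"): 0.7},
    ("bell", "s2"): {("ring", "s2"): 0.1},
    ("ring", "s2"): {("bell", "s2"): 0.1},
}

def pagerank(graph, damping=0.85, iterations=50):
    """Weighted PageRank: each node distributes its score along its
    outgoing edges in proportion to the edge weights."""
    nodes = list(graph)
    score = {v: 1.0 / len(nodes) for v in nodes}
    for _ in range(iterations):
        new = {}
        for v in nodes:
            rank = 0.0
            for u in nodes:
                total = sum(graph[u].values())
                if total:
                    rank += score[u] * graph[u].get(v, 0.0) / total
            new[v] = (1 - damping) / len(nodes) + damping * rank
        score = new
    return score

scores = pagerank(edges)
# Step 4: pick the highest-scoring sense of "bell".
best_bell = max((v for v in scores if v[0] == "bell"), key=scores.get)
print(best_bell)  # prints ('bell', 's1'): the densely connected sense wins
```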
