© Sakthi Balan M
NLP
Lexical Semantics
• We introduce a richer model of the semantics of words, drawing on the linguistic study of
word meaning, a field called lexical semantics
• Lexeme: wordforms of the same lexeme share a common meaning although they may be spelt (Orthography) or pronounced (Phonetics) differently
• Lemma: A lemma or citation form is the grammatical form that is used to represent a
lexeme.
• For example: The lemma or citation form for sing, sang, sung is sing
• The specific forms sung or carpets or sing are also called wordforms
• For example, the wordform found can map to the lemma find (meaning ‘to
locate’) or the lemma found (‘to create an institution’), as illustrated in the
following WSJ examples:
• He has looked at 14 baseball and football stadiums and found that only one
– private Dodger Stadium – brought more money into a city than it took out
• Note: Lemmas are part-of-speech specific; thus the wordform tables has
two possible lemmas, the noun table and the verb table
• A lemma is not necessarily the same as the stem from the morphological
parse (Celebration is a lemma but celebrate is the stem for celebration)
• Polysemy: a single lemma with multiple related senses, e.g., sense 1 and sense 3 of bank:
• While some banks give blood only to the needy as a service, others may do it as a business (sense 3)
• He might have served his time, come out and led an upstanding life.
• Synonymy:
• Two words may be synonymous and yet not have identical meanings in all contexts
• Antonymy: pairs of words with opposite meanings:
• long / short, big / little, fast / slow, cold / hot, dark / light, rise / fall, up / down, in / out
• Two senses can be antonyms if they define a binary opposition, or are at opposite ends
of some scale. This is the case for long/short, fast/slow, or big/little, which are at
opposite ends of the length or size scale.
• From one perspective, antonyms have very different meanings, since they are opposite.
• From another perspective, they have very similar meanings, since they share almost all
aspects of their meaning except their position on a scale, or their direction
• Hyponym: a sense is a hyponym of a second sense if the first sense is more specific than the second
• For example: car is a hyponym of vehicle; dog is a hyponym of animal, and mango is
a hyponym of fruit
• Hypernym: We say that vehicle is a hypernym of car, and animal is a hypernym of dog.
The word superordinate is often used instead of hypernym
• The class denoted by the superordinate extensionally includes the class denoted by the hyponym
• Hypernymy can also be defined in terms of entailment. Under this definition, a sense A is
a hyponym of a sense B if everything that is A is also B
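The entailment definition can be made concrete with a small sketch; the toy IS-A table and helper function below are illustrative, not drawn from any real lexicon:

```python
# Toy IS-A hierarchy: child -> parent (hypernym). Illustrative only.
ISA = {"car": "vehicle", "dog": "animal", "mango": "fruit",
       "vehicle": "artifact", "animal": "organism"}

def is_hyponym_of(a, b):
    """True if everything that is an `a` is also a `b` (A entails B)."""
    while a in ISA:          # walk up the hypernym chain
        a = ISA[a]
        if a == b:
            return True
    return False

print(is_hyponym_of("car", "vehicle"))   # True
print(is_hyponym_of("car", "artifact"))  # True (entailment is transitive)
print(is_hyponym_of("vehicle", "car"))   # False
```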
Relations Between Senses
• The term ontology usually refers to a set of distinct objects resulting from
an analysis of a domain, or microworld.
• WordNet consists of three separate databases, one each for nouns and verbs, and a
third for adjectives and adverbs
• Each database consists of a set of lemmas, each one annotated with a set of senses
• The WordNet 3.0 release has 117,097 nouns, 11,488 verbs, 22,141 adjectives, and
4,601 adverbs
• The average noun has 1.23 senses, and the average verb has 2.16 senses
• WordNet can be accessed via the web or downloaded and accessed locally.
WordNet
[Tables omitted: WordNet noun relations and verb relations]
• In the lexical sample task: A small pre-selected set of target words is chosen, along with an
inventory of senses for each word from some lexicon. For each word, a number of corpus
instances (context sentences) can be selected and hand-labeled with the correct sense of
the target word in each. Classifier systems can then be trained using these labeled examples.
Unlabeled target words in context can then be labeled using such a trained classifier.
• Early work in word sense disambiguation focused solely on lexical sample tasks of this sort,
building word-specific algorithms for disambiguating single words like line, interest, or plant.
• In the all-words task: Systems are given entire texts and a lexicon with an inventory of senses
for each entry, and are required to disambiguate every content word in the text. The all-words
task is very similar to part-of-speech tagging, except with a much larger set of tags, since
each lemma has its own set of senses.
Supervised Word Sense Disambiguation
• Collocational features: words and their positions in a window around the target word (position matters)
• Bag-of-words features: an unordered set of words occurring in the context window (position ignored)
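The two feature types can be sketched as follows; the tokenized sentence, window sizes, and function names are illustrative choices, not a fixed recipe:

```python
# Sketch of the two classic WSD feature types for a target word.
def collocational_features(tokens, i, window=2):
    """Words at fixed positions around the target (position matters)."""
    feats = {}
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        feats[f"w[{off}]"] = tokens[j] if 0 <= j < len(tokens) else "<pad>"
    return feats

def bag_of_words_features(tokens, i, window=5):
    """Unordered set of neighboring words (position ignored)."""
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    return {w for k, w in enumerate(tokens[lo:hi], start=lo) if k != i}

toks = "an electric guitar and bass player stand off".split()
i = toks.index("bass")
print(collocational_features(toks, i))
print(sorted(bag_of_words_features(toks, i)))
```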
• Choose the most frequent sense for each word from the senses in a
labeled corpus
• For WordNet, this corresponds to the “take the first sense” heuristic, since senses in WordNet are generally ordered from most frequent to least frequent
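A minimal sketch of the most-frequent-sense baseline, assuming a toy sense-labeled corpus (the words and sense labels below are made up):

```python
from collections import Counter

# Toy sense-labeled corpus: (word, sense) pairs. Illustrative only.
sense_corpus = [("bass", "bass_fish"), ("bass", "bass_music"),
                ("bass", "bass_music"), ("plant", "plant_factory")]

def most_frequent_sense(word, corpus):
    """Pick the sense seen most often for `word` in the labeled corpus."""
    counts = Counter(sense for w, sense in corpus if w == word)
    return counts.most_common(1)[0][0] if counts else None

print(most_frequent_sense("bass", sense_corpus))   # bass_music
```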
• Algorithm that chooses the sense whose dictionary gloss or definition shares
the most words with the target word’s neighborhood
• Signature: set of words in the gloss and examples of the target word sense
• Find the overlap between the context and each sense’s signature, and choose the sense with the maximum overlap
The Lesk Algorithm
• The bank can guarantee deposits will eventually cover future tuition costs
because it invests in adjustable-rate mortgage securities
• The sense bank1 has two (non-stopword) overlaps with the context: deposits and mortgage
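Simplified Lesk can be sketched as follows; the two glosses for bank are paraphrased for illustration and the stopword list is a small placeholder, not a real one:

```python
# Simplified Lesk with a toy two-sense inventory for "bank".
STOPWORDS = {"a", "the", "of", "in", "it", "and", "that", "to", "will"}

SENSES = {
    "bank#1": "a financial institution that accepts deposits and "
              "channels the money into lending activities",
    "bank#2": "sloping land beside a body of water",
}

def signature(gloss):
    """Non-stopword set from a sense's gloss/examples."""
    return {w for w in gloss.lower().split() if w not in STOPWORDS}

def simplified_lesk(context, senses):
    """Choose the sense whose signature overlaps the context most."""
    ctx = {w for w in context.lower().split() if w not in STOPWORDS}
    return max(senses, key=lambda s: len(signature(senses[s]) & ctx))

ctx = ("the bank can guarantee deposits will eventually cover future "
       "tuition costs because it invests in adjustable-rate mortgage securities")
print(simplified_lesk(ctx, SENSES))   # bank#1
```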
• Original Lesk: instead of comparing a target word’s signature with the context words, the target signature is compared with the signatures of each of the context words
• Other Version (Corpus Lesk): Instead of just counting up the overlapping words, the Corpus Lesk
algorithm also applies a weight to each overlapping word.
• IDF measures how many different ’documents’ (in this case glosses and examples) a word occurs
in
• Since function words like the, of, etc., occur in many documents, their IDF is very low, while the IDF of content words is high. Corpus Lesk therefore weights each overlapping word by its IDF.
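IDF weighting over glosses can be sketched like this; the three toy “documents” are invented for illustration:

```python
import math

# IDF over a toy collection of glosses/examples, each treated as a document.
docs = [
    "the bank accepts deposits and makes loans",
    "the river bank was muddy",
    "she sat on the bank of the river",
]

def idf(word, documents):
    """log(N / df): low for words in many documents, high for rare ones."""
    df = sum(1 for d in documents if word in d.split())
    return math.log(len(documents) / df) if df else 0.0

print(round(idf("the", docs), 3))       # 0.0 (occurs in every document)
print(round(idf("deposits", docs), 3))  # 1.099 (occurs in one document)
```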
• The goal of the Yarowsky algorithm is to learn a classifier for a target word
• The algorithm is given a small seed-set Λ0 of labeled instances of each sense, and a much larger unlabeled corpus V0
• The algorithm first trains a classifier on the seed-set Λ0 and it then uses this classifier to label the unlabeled corpus V0
• The algorithm then selects the examples in V0 that it is most confident about, removes them, and adds them to the
training set (call it now Λ1)
• The algorithm then trains a new classifier (a new set of rules) on Λ1 and iterates by applying the classifier to the now-
smaller unlabeled set V1, extracting a new training set Λ2 and so on
• With each iteration of this process, the training corpus grows and the untagged corpus shrinks. The process is
repeated until some sufficiently low error-rate on the training set is reached, or until no further examples from the
untagged corpus are above threshold.
• The key to any bootstrapping approach lies in its ability to create a larger training set from a small set of seeds. This
requires an accurate initial set of seeds and a good confidence metric for picking good new examples to add to the
training set
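The bootstrapping loop can be sketched with a toy one-rule-per-collocation classifier; the seed rules (fish and play for bass, echoing Yarowsky's seeds), the sentences, and the stopword list are all illustrative, and a real implementation would rank decision-list rules by confidence:

```python
# Minimal bootstrapping loop in the spirit of the Yarowsky algorithm.
STOP = {"the", "he", "will", "was", "and", "near"}
rules = {"fish": "bass_fish", "play": "bass_music"}      # seed set Λ0
unlabeled = ["fish swim near the bass", "he will play bass tonight",
             "bass and trout fish", "play the bass line",
             "the bass was loud"]                        # unlabeled V0
labeled = []

while True:
    newly = []
    for sent in unlabeled:
        senses = {rules[w] for w in sent.split() if w in rules}
        if len(senses) == 1:             # confident: exactly one sense fires
            newly.append((sent, senses.pop()))
    if not newly:                        # nothing confident left: stop
        break
    for sent, sense in newly:            # grow Λ, shrink V
        labeled.append((sent, sense))
        unlabeled.remove(sent)
        for w in sent.split():           # learn new collocation rules
            if w not in STOP:
                rules.setdefault(w, sense)

print(len(labeled), len(unlabeled))      # 5 0 (all sentences labeled)
```

Note how "the bass was loud" gets no label in the first pass (no seed rule fires) but is labeled in the second pass, after bass itself has been learned as a collocation rule.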
[Figure omitted: partial results of searches for the strings “fish” and “play” in a corpus of bass examples drawn from the WSJ]
• Two words are more similar if they share more features of meaning, or are near-synonyms
• Two words are less similar, or have greater semantic distance, if they have fewer
common meaning elements
• Word similarity mainly reflects synonymy, but the broader notion of relatedness may include other relations as well
• Example: car and bicycle are similar, while car and gasoline are (merely) related
• In this unit we will not distinguish between similarity and relatedness
• Applications:
• Information retrieval
• Summarisation
• Generation
• Machine translation
• This suggests that a word/sense is very similar to its parents or siblings in the thesaurus, and less similar to words that are far away in the network
• This idea can be implemented by defining the shortest distance between words/senses as the similarity measure
• For example: dime is most similar to nickel and coin but it is less similar to money, and even
less similar to Richter scale.
• pathlen(c1,c2) = the number of edges in the shortest path in the thesaurus graph
between the sense nodes c1 and c2
• If pathlen is small, the two senses are more similar; if it is large, they are less similar
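pathlen can be computed with breadth-first search over the thesaurus graph; the node names and edges below form a toy fragment loosely inspired by the coin example, not actual WordNet data:

```python
from collections import deque

# Toy thesaurus graph: undirected edges between sense nodes. Illustrative only.
EDGES = {
    "nickel": ["coin"], "dime": ["coin"],
    "coin": ["nickel", "dime", "coinage"],
    "coinage": ["coin", "money"],
    "money": ["coinage", "medium_of_exchange"],
    "medium_of_exchange": ["money", "standard"],
    "standard": ["medium_of_exchange", "richter_scale"],
    "richter_scale": ["standard"],
}

def pathlen(c1, c2):
    """Number of edges on the shortest path between two sense nodes (BFS)."""
    seen, queue = {c1}, deque([(c1, 0)])
    while queue:
        node, d = queue.popleft()
        if node == c2:
            return d
        for nbr in EDGES.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, d + 1))
    return None

print(pathlen("dime", "nickel"))         # 2 (dime - coin - nickel)
print(pathlen("dime", "money"))          # 3
print(pathlen("dime", "richter_scale"))  # 6: much less similar
```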
Word Similarity — Thesaurus Methods
This measure assumes a uniform distance between nodes (which is NOT correct):
• For example nickel & money, and nickel & standard are of equal distance but
intuitively they are not the same
• Also, the link between medium of exchange and standard seems wider than that
between, say, coin and coinage
Lin (1998b) extended the Resnik intuition: a similarity metric between objects A
and B needs to do more than measure the amount of information in common
between A and B.
• commonality: the more information A and B have in common, the more similar they
are
• difference: the more differences between the information in A and B, the less
similar they are
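In the common thesaurus instantiation of Lin's metric, with LCS(c1, c2) the lowest common subsumer of the two senses and P(c) the probability of encountering an instance of sense c, the two intuitions combine as:

```latex
% Lin similarity: commonality in the numerator,
% total description of the two senses in the denominator.
\mathrm{sim}_{\mathrm{Lin}}(c_1, c_2) =
  \frac{2 \log P\big(\mathrm{LCS}(c_1, c_2)\big)}
       {\log P(c_1) + \log P(c_2)}
```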