
UNIT 2: WORD ASSOCIATIONS AND RELATION DISCOVERY

What is Word Association?

• Word association is a relation that exists between two words.
• There are two types of relations: Paradigmatic and Syntagmatic.
• Paradigmatic: A & B have a paradigmatic relation if they can be substituted for each other (i.e., A & B are in the same class)
• e.g., “cat” and “dog”; “Monday” and “Tuesday”
• Syntagmatic: A & B have a syntagmatic relation if they can be combined with each other (i.e., A & B are related semantically)
• e.g., “cat” and “scratch”; “car” and “drive”
• These two basic and complementary relations can be generalized to describe relations between any items in a language.

Why Mine Word Associations?

• They are useful for improving the accuracy of many NLP tasks
• POS tagging, parsing, entity recognition, acronym expansion
• Grammar learning
• They are directly useful for many applications in text retrieval and
mining
• Text retrieval (e.g., use word associations to suggest a variation of a
query)
• Automatic construction of topic map for browsing: words as nodes and
associations as edges
• Compare and summarize opinions (e.g., what words are most strongly
associated with “battery” in positive and negative reviews about iPhone 6,
respectively?)

Word Context

Word Co-occurrence

Mining Word Associations

• Paradigmatic
• Represent each word by its context
• Compute context similarity
• Words with high context similarity likely have paradigmatic relation
• Syntagmatic
• Count how many times two words occur together in a context (e.g., sentence or
paragraph)
• Compare their co-occurrences with their individual occurrences
• Words with high co-occurrences but relatively low individual occurrences likely
have syntagmatic relation
• Paradigmatically related words tend to have a syntagmatic relation with the same word → joint discovery of the two relations
• These ideas can be implemented in many different ways!
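As a minimal sketch of the paradigmatic idea (represent each word by its context, then compute context similarity), consider the Python below. The toy corpus, window size, and function names are assumptions made for illustration, not the lecture's reference implementation:

```python
from collections import Counter
from math import sqrt

def context_vector(word, sentences, window=2):
    """Collect counts of words appearing within `window` positions of `word`."""
    ctx = Counter()
    for sent in sentences:
        for i, w in enumerate(sent):
            if w == word:
                lo, hi = max(0, i - window), i + window + 1
                ctx.update(t for t in sent[lo:hi] if t != word)
    return ctx

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

sentences = [
    "my cat eats fish on saturday".split(),
    "my dog eats meat on sunday".split(),
]
# High context similarity suggests a paradigmatic relation (cat ~ dog).
print(cosine(context_vector("cat", sentences), context_vector("dog", sentences)))
```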

Word Context as “Pseudo Document”

Computing Similarity of Word Context

Syntagmatic Relation – Word Collocation

• Syntagmatic relation shows up as word co-occurrence – called collocation.
• If two words occur together in a context more often than by chance, they are in a syntagmatic relation (i.e., they are related words).
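A rough sketch of the “more often than chance” test: compare the observed co-occurrence rate with the rate expected if the two words occurred independently. The corpus below is invented for illustration:

```python
def cooccurrence_ratio(w1, w2, sentences):
    """Observed co-occurrence probability vs. what independence would predict."""
    n = len(sentences)
    p1 = sum(w1 in s for s in sentences) / n
    p2 = sum(w2 in s for s in sentences) / n
    p12 = sum(w1 in s and w2 in s for s in sentences) / n
    return p12 / (p1 * p2) if p1 and p2 else 0.0

sentences = [
    {"the", "cat", "scratched", "the", "sofa"},  # each sentence as a set of words
    {"the", "cat", "slept"},
    {"a", "dog", "barked"},
]
# A ratio well above 1 suggests a syntagmatic relation (collocation).
print(cooccurrence_ratio("cat", "scratched", sentences))
```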

Word Probability

• Word probability – how likely is a given word to appear in a text/context?
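One standard way to estimate this (an assumption here, since the slide's own formula is not shown) is the maximum-likelihood estimate: the word's count divided by the total number of tokens:

$$\hat{p}(w) = \frac{c(w)}{N}$$

For example, a word occurring 10 times in a 1,000-token text gets $\hat{p}(w) = 0.01$.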

Binomial Distribution

• Word (occurrence) probability is modeled by the Binomial Distribution.
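The formula is not reproduced above; under the standard binomial model (consistent with the slide title, but reconstructed here rather than copied), the probability of observing a word k times in n independent positions, with per-position probability p, is:

$$P(X = k) = \binom{n}{k} p^{k} (1-p)^{n-k}$$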

Entropy as a Measure of Randomness

• Entropy is a measure from Information Theory that indicates how even or skewed a distribution is -- a large entropy means the distribution is even/less skewed.
• For a boolean variable, entropy takes on a value in [0, 1] (between 0 and 1 inclusive); in general, it ranges from 0 to log2(c) for c possible values.
• Entropy of a collection S with respect to the target attribute, which takes on c distinct values, is calculated as:

$$\mathrm{Entropy}(S) = \sum_{i=1}^{c} -\,p_i \log_2 p_i$$
• This is the average number of bits required to
encode an instance in the dataset.
• For a boolean classification with positive-class probability p, the entropy function yields Entropy(S) = −p log2(p) − (1−p) log2(1−p), which peaks at 1 bit when the two classes are equally likely (p = 0.5).

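As a small illustration (not from the original slides), the entropy formula above can be computed directly:

```python
from math import log2

def entropy(probs):
    """Entropy in bits of a discrete distribution given as probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))  # 1.0 bit: maximally uncertain boolean
print(entropy([0.9, 0.1]))  # ~0.47 bits: skewed, more predictable
print(entropy([1.0]))       # 0.0 bits: no uncertainty
```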
Entropy for Word Probability

Mutual Information (MI) as a Measure of Word Collocation

• Mutual Information is a concept in probability theory that measures two random variables' mutual dependence – equivalently, the reduction of entropy in one variable obtained by knowing the other.
• How much reduction in the entropy of X can we obtain by knowing Y? (a larger reduction means more predictability)

$$I(X;Y) = \sum_{y \in Y} \sum_{x \in X} p(x,y)\,\log \frac{p(x,y)}{p(x)\,p(y)}$$

$$I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)$$

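A hedged sketch of computing I(X;Y) from a small joint probability table; the toy numbers are invented for illustration:

```python
from math import log2

def mutual_information(joint):
    """I(X;Y) from a joint probability table joint[x][y]."""
    px = {x: sum(row.values()) for x, row in joint.items()}
    py = {}
    for row in joint.values():
        for y, p in row.items():
            py[y] = py.get(y, 0.0) + p
    mi = 0.0
    for x, row in joint.items():
        for y, pxy in row.items():
            if pxy > 0:
                mi += pxy * log2(pxy / (px[x] * py[y]))
    return mi

# Toy joint distribution: X = "cat" present?, Y = "scratch" present?
joint = {1: {1: 0.20, 0: 0.05}, 0: {1: 0.05, 0: 0.70}}
print(mutual_information(joint))  # > 0: knowing Y reduces the entropy of X
```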
Mutual Information (MI) and Word Collocation

Probabilities in MI

Estimation of Word Probability

Point-wise Mutual Information

• Point-wise Mutual Information (PMI) is often used in place of MI.
• PMI measures a specific pair of outcomes (events) of the two random variables, rather than the expectation over all outcomes.

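In symbols (reconstructed from the definitions above, not copied from the slide): PMI scores one specific outcome pair, and MI is the expectation of PMI over all pairs:

$$\mathrm{PMI}(x,y) = \log \frac{p(x,y)}{p(x)\,p(y)}, \qquad I(X;Y) = \sum_{x,y} p(x,y)\,\mathrm{PMI}(x,y)$$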
Vector Semantics: Positive Pointwise Mutual Information (PPMI)
(slides adapted from Dan Jurafsky)

Word-Word matrix
Sample contexts: ±7 words around each target word.
[Example word-word count matrix from the original slide not reproduced]


Word-word matrix
• We showed only a 4x6 slice, but the real matrix is 50,000 x 50,000
• So it's very sparse
• Most values are 0.
• That's OK, since there are lots of efficient algorithms for sparse matrices.
• The size of the window depends on your goals
• The shorter the window, the more syntactic the representation (±1-3: very syntacticy)
• The longer the window, the more semantic the representation (±4-10: more semanticy)

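To illustrate the sparsity point, a co-occurrence matrix can be held in a sparse format so the zero cells cost nothing. The scipy usage below is one common option, assumed here rather than taken from the original slides:

```python
import numpy as np
from scipy.sparse import coo_matrix

# Only the nonzero co-occurrence counts are stored: (row, col, count) triples.
rows = np.array([0, 0, 1])    # word ids
cols = np.array([2, 5, 2])    # context word ids
counts = np.array([8, 4, 6])
m = coo_matrix((counts, (rows, cols)), shape=(50_000, 50_000))
print(m.nnz)  # 3 stored entries out of 2.5 billion cells
```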

Problem with raw counts


• Raw word frequency is not a great measure of
association between words
• It’s very skewed
• “the” and “of” are very frequent, but maybe not the most
discriminative
• We’d rather have a measure that asks whether a context word is
particularly informative about the target word.
• Positive Pointwise Mutual Information (PPMI)


Pointwise Mutual Information

• Pointwise mutual information:
Do events x and y co-occur more often than if they were independent?

$$\mathrm{PMI}(x,y) = \log_2 \frac{P(x,y)}{P(x)\,P(y)}$$

• PMI between two words (Church & Hanks 1989):
Do words w1 and w2 co-occur more often than if they were independent?

$$\mathrm{PMI}(w_1,w_2) = \log_2 \frac{P(w_1,w_2)}{P(w_1)\,P(w_2)}$$

Positive Pointwise Mutual Information


• PMI ranges from −∞ to +∞
• But the negative values are problematic
• Things are co-occurring less than we expect by chance
• Unreliable without enormous corpora
• Imagine w1 and w2, whose probabilities are each 10^-6
• Hard to be sure p(w1, w2) is significantly different from 10^-12
• Plus it's not clear people are good at “unrelatedness”
• So we just replace negative PMI values by 0
• Positive PMI (PPMI) between word1 and word2:
PPMI(w1, w2) = max(PMI(w1, w2), 0)

Computing PPMI on a term-context matrix


• Matrix F with W rows (words) and C columns (contexts)
• f_ij is the number of times word w_i occurs in context c_j

p(w=information, c=data) = 6/19 = .32
p(w=information) = 11/19 = .58
p(c=data) = 7/19 = .37


• pmi(information, data) = log2(.32 / (.37 × .58)) = .58 (.57 using full precision)

Other Word Collocation Measures

The term diabetes occurs in 63 documents.


Conditional Counts: Concept Linking

Centered term: a term chosen to investigate
Concept-linked term: a term that co-occurs with a centered term

diabetes (63/63)
+insulin (14/58)

• In this diagram, the centered term is diabetes, which occurred in 63 documents. The term insulin (and its stemmed variations) occurred in 58 documents, 14 of which also contained diabetes.
• Terms that are primary associates of insulin are secondary associates of diabetes.
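A rough sketch of how such conditional counts could be gathered; the document collection and helper name are assumptions for the example:

```python
def concept_links(centered, docs):
    """For each term, count docs containing it and docs also containing `centered`."""
    links = {}
    for doc in docs:
        terms = set(doc)
        for t in terms:
            n, both = links.get(t, (0, 0))
            links[t] = (n + 1, both + (centered in terms))
    return links

docs = [["diabetes", "insulin", "glucose"],
        ["insulin", "dosage"],
        ["diabetes", "diet"]]
# insulin -> (2, 1): in 2 docs, 1 of which also mentions diabetes
print(concept_links("diabetes", docs)["insulin"])
```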
