
Tutorial 2

bidisha.samanta
April 2019

1 Q1
Construct the dependency graph of the following sentences using the techniques
given

• This is an example sentence [Deterministic parsing]
• Economic news had little effect on financial markets. [Deterministic parsing]
• Only one of them concerns equality [Deterministic parsing]

2 Q2
Explain the formal conditions that are desirable for dependency graphs to satisfy.
Given the dependency graph, state which of the formal conditions are satisfied
and which are violated.

3 Q3
Consider the following graph in Figure 1 with a root node and 3 other vertices.
The edge weights between all pairs of nodes are provided. Use the Chu-Liu-Edmonds
algorithm to find the maximum spanning tree (MST) for this graph. You must
clearly show all the steps.

4 Q4
Consider the following dataset, consisting of 6 sentences.
• An automobile is a wheeled motor vehicle used for transporting passengers .
• A car is a form of transport , usually with four wheels and the capacity to carry around five passengers .
• Transport for the London games is limited , with spectators strongly advised to avoid the use of cars .
• The London 2012 soccer tournament began yesterday , with plenty of goals in the opening matches .
• Giggs scored the first goal of the football tournament at Wembley , North London .
• Bellamy was largely a passenger in the football match , playing no part in either goal .
Suppose your target words and term vocabulary are given as:
Target words: automobile, car, soccer, football
Term vocabulary: wheel, transport, passenger, tournament, London, goal, match
Solve the following:
1. Obtain a word space representation for the target words. You can use the sentence as the context window and co-occurrence counts for the representation. You may also assume that the words are lemmatized before computation.
2. Approximating these representations using binary vectors, compute the overlap coefficient between (automobile, soccer) and (soccer, football).
3. Compute the association score between (car, transport) using Pointwise Mutual Information, with and without discounting.

5 Q5
You are given a corpus C, with d documents and a vocabulary V. The corpus
is represented as a matrix C of size d × |V|. Each document di is a
1 × |V| vector such that dij represents the number of times word j appears
in document i.

1. Using the matrix C, write the expression to obtain a word-to-word co-occurrence matrix W for the words in V. Entry wij indicates how often the words wi and wj co-occur. The diagonal entries can be set to zero after obtaining W.
2. Calculate the all-pairs Dice coefficient and cosine similarity for the words ‘bank’, ‘river’ and ‘shore’. The vectors for the words should be obtained from the matrix W. Matrix C is given below.

6 Q6
You are given the sentence “There is a crack in the officer’s cabin”. Consider
the following:

Explain, in not more than 3 sentences, how the random walk algorithm can be
used for Word Sense Disambiguation (WSD) of the words. Assume that only the
words in boldface have multiple senses, and that only words with multiple senses
need to be considered while performing the random walk algorithm. Also assume
you are given a lexical resource that enumerates the senses of the words in
the sentence and provides a gloss for each of the senses.
What will be the number of nodes in the network?
What will be the number of edges in the network, provided we use a Markovian
assumption when forming edges?
Are there any restrictions on forming edges between any pair of nodes?
Explain briefly 2 approaches by which edge weights can be calculated using
only the given resources.
How will you construct the transition matrix for the random walk approach? How
is it different from the adjacency matrix of the graph?
What is the terminating condition for the approach, so that the senses can be
assigned to the words based on the random walk score?
The matrix formulation of the random walk can be written as p(t) = p(t−1)P.
Here p(t) represents the scores for the nodes in the network at time-step t,
and P represents the transition matrix. The score so obtained represents the
network-level relevance of individual nodes.
If we need to bias the scores of the nodes with respect to a particular node in
the network, how can the above matrix formulation be modified? What is this
approach popularly called?
State the only parameter whose value needs to be tuned, and state one
principled approach by which the parameter value can be tuned.

7 Q7
Suppose you are given the following words from the vocabulary along with the
number of times they occur together. Also, assume that they individually occur
50, 50, 20, 50, and 20 times in the corpus. Now, suppose that you want to use
the HyperLex algorithm to induce the word senses for the word ‘bar’.

1. Construct the co-occurrence graph G. You can use a threshold of 0.95 for
the edges.
2. What will be the first hub and corresponding high density component?

8 Q1 solution

Figure 1: Sentence 1

Figure 2: Sentence 2

Figure 3: Sentence 3

9 Q2 solution
• G is connected
– For every node i there is a node j such that i → j or j → i.
• G is acyclic
– If i → j then not j →∗ i.
• G obeys the single-head constraint
– If i → j then not k → j for any k ≠ i.
• G is projective
– If i → j then i →∗ k for every node k that lies between i and j in the linear order (j and k on the same side of i).
The given graph violates projectivity (the arc from Jenda to Z).
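These conditions can also be checked mechanically. A minimal sketch, assuming the graph is given as a head array where head[i] is the index of token i's head and the single root has head -1 (a representation chosen here purely for illustration):

```python
def is_connected_and_acyclic(head):
    # exactly one root, and every token reaches it without revisiting a node
    if head.count(-1) != 1:
        return False
    for i in range(len(head)):
        seen = set()
        while head[i] != -1:
            if i in seen:
                return False          # cycle detected
            seen.add(i)
            i = head[i]
    return True

def is_projective(head):
    # arc (h, d) is projective iff every token strictly between h and d
    # is dominated by h; assumes the graph is already known to be acyclic
    def dominated_by(h, k):
        while k != -1:
            if k == h:
                return True
            k = head[k]
        return False
    for d, h in enumerate(head):
        if h == -1:
            continue
        lo, hi = min(h, d), max(h, d)
        if not all(dominated_by(h, k) for k in range(lo + 1, hi)):
            return False
    return True
```

For example, head = [-1, 0, 1] (a simple chain) passes both checks, while head = [-1, 2, 0, 1] is connected and acyclic but non-projective, because the arc 1 → 3 spans token 2, which is not dominated by 1.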

10 Q3 solution
Consists of two stages:
1. Contracting (everything before the recursive call)
2. Expanding (everything after the recursive call)
Preprocessing step
1. Remove every edge incoming to ROOT
2. This ensures that ROOT is in fact the root of any solution
3. For every ordered pair of nodes vi, vj, remove all but the highest-scoring
edge from vi to vj.
Contracting stage
1. For each non-ROOT node v, set bestInEdge[v] to be its highest-scoring
incoming edge.
2. If a cycle C is formed:
(a) contract the nodes in C into a new node vC
(b) edges outgoing from any node in C now get source vC
(c) edges incoming to any node in C now get destination vC
(d) For each node u in C, and for each edge e incoming to u from outside
of C:
i. set e.kicksOut to bestInEdge[u],
ii. set e.score to be e.score - e.kicksOut.score.
3. Repeat until every non-ROOT node has an incoming edge and no cycles
are formed
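The contract-and-expand procedure above can be sketched recursively. This is a sketch only: the score dictionary at the bottom is an illustrative toy graph, not the weights of Figure 1, and the constant cycle weight is dropped from the contracted scores since it does not affect the argmax.

```python
def _find_cycle(parent):
    # parent maps each non-ROOT node to its current best head
    for start in parent:
        path, v = [start], start
        while v in parent:
            v = parent[v]
            if v in path:
                return path[path.index(v):]      # the nodes on the cycle
            path.append(v)
    return None

def max_arborescence(score, root=0):
    """score[u][v] = weight of edge u -> v. Returns {child: head}.
    Assumes every non-ROOT node has at least one incoming edge."""
    nodes = {root} | set(score) | {v for es in score.values() for v in es}
    best_in = {v: max((u for u in score if v in score[u] and u != v),
                      key=lambda u: score[u][v])
               for v in nodes if v != root}
    cycle = _find_cycle(best_in)
    if cycle is None:
        return best_in                           # no cycle: we are done
    cyc = set(cycle)
    c = ('contracted', tuple(sorted(cycle, key=str)))   # fresh node for the cycle
    new_score, corr = {}, {}                     # corr remembers original endpoints
    for u, es in score.items():
        for v, w in es.items():
            nu = c if u in cyc else u
            nv = c if v in cyc else v
            if nu == nv:
                continue                         # edge inside the cycle
            if nv == c:                          # edge entering the cycle:
                w -= score[best_in[v]][v]        # it kicks out bestInEdge[v]
            if w > new_score.setdefault(nu, {}).get(nv, float('-inf')):
                new_score[nu][nv] = w
                corr[(nu, nv)] = (u, v)
    sub = max_arborescence(new_score, root)      # recursive call
    # expanding stage: map contracted edges back to original edges
    result = {corr[(h, child)][1]: corr[(h, child)][0]
              for child, h in sub.items()}
    entering = corr[(sub[c], c)][1]              # cycle node that got a new head
    for v in cyc:
        if v != entering:
            result[v] = best_in[v]               # keep the cycle's other edges
    return result

# toy example: best incoming edges 2 -> 1 and 1 -> 2 form a cycle,
# which must be contracted and then broken by an edge from ROOT (node 0)
score = {0: {1: 4, 2: 4}, 1: {2: 5}, 2: {1: 5}}
parent = max_arborescence(score, 0)              # {1: 0, 2: 1}
```

The kicksOut adjustment on entering edges is exactly step 2(d) above: the adjusted score of an external edge into u is its score minus the score of bestInEdge[u].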

11 Q4 solution

             wheel  transport  passenger  tournament  London  goal  match
automobile     1        0          0          0          0      0     0
car            1        2          1          0          0      0     0
soccer         1        0          0          0          1      0     0
football       1        0          0          0          0      2     1

Overlap coefficient:

    Overlap(X, Y) = |X ∩ Y| / min(|X|, |Y|)    (1)
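The table above can be checked directly in code. A minimal sketch; the discounting variant of PMI is noted only in a comment, since several discounting schemes exist and the intended one is an assumption here:

```python
import math

vocab = ['wheel', 'transport', 'passenger', 'tournament', 'London', 'goal', 'match']
counts = {                       # co-occurrence counts from the table above
    'automobile': [1, 0, 0, 0, 0, 0, 0],
    'car':        [1, 2, 1, 0, 0, 0, 0],
    'soccer':     [1, 0, 0, 0, 1, 0, 0],
    'football':   [1, 0, 0, 0, 0, 2, 1],
}

def overlap(w1, w2):
    # binarise the vectors, then |X ∩ Y| / min(|X|, |Y|) as in Eq. (1)
    x = {i for i, c in enumerate(counts[w1]) if c > 0}
    y = {i for i, c in enumerate(counts[w2]) if c > 0}
    return len(x & y) / min(len(x), len(y))

def pmi(word, term):
    # PMI = log2( P(w, t) / (P(w) P(t)) ), estimated from the table counts
    n = sum(sum(v) for v in counts.values())
    j = vocab.index(term)
    cwt = counts[word][j]                        # joint count
    cw = sum(counts[word])                       # word marginal
    ct = sum(v[j] for v in counts.values())      # term marginal
    # one common discounting (due to Pantel and Lin) multiplies this by
    # cwt/(cwt+1) * min(cw, ct)/(min(cw, ct)+1); other variants exist
    return math.log2(cwt * n / (cw * ct))

print(overlap('automobile', 'soccer'))   # 1.0
print(overlap('soccer', 'football'))     # 0.5
print(round(pmi('car', 'transport'), 3)) # 1.459
```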

12 Q5 solution
1. Let us consider the vectors di to be Boolean vectors such that each element
dij is 1 or 0 [corresponding to Boolean True and False]. For any two words i
and j, the number of documents in which they co-occur is Σk dki dkj. Therefore,
wij = Σk dki dkj. The matrix W can also be obtained by computing Ŵ = CᵀC
and then setting

    wij = ŵij if i ≠ j, and wij = 0 if i == j.    (2)

That is, the diagonal entries are set to wii = 0 ∀i.
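The construction Ŵ = CᵀC with a zeroed diagonal can be sketched directly; the 3-document, 3-word Boolean matrix below is an illustrative assumption, not the tutorial's matrix C:

```python
def cooccurrence(C):
    """W = C^T C with zeroed diagonal, as in Eq. (2).
    C is a list of document rows (Boolean 0/1 entries)."""
    V = len(C[0])
    return [[0 if i == j else sum(d[i] * d[j] for d in C) for j in range(V)]
            for i in range(V)]

# each row is a document over a 3-word vocabulary
C = [[1, 0, 1],
     [1, 1, 0],
     [0, 1, 1]]
W = cooccurrence(C)   # [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
```

Entry W[i][j] counts the documents in which words i and j both appear, which is exactly Σk dki dkj.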


2. We obtain the matrix W as W = CᵀC.

Dice coefficients:
• Dice(bank, river) = 2·3/(4+7) = 0.545
• Dice(river, shore) = 2·2/(7+3) = 0.4
• Dice(shore, bank) = 2·2/(3+4) = 0.571

Cosine similarities:
• Cosine(bank, river) = 3/√(4×7) = 0.567
• Cosine(river, shore) = 2/√(7×3) = 0.436
• Cosine(shore, bank) = 2/√(3×4) = 0.577
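Since matrix C itself is not reproduced in this text, the arithmetic above can be re-checked from the binary set sizes the worked fractions imply (|bank| = 4, |river| = 7, |shore| = 3, pairwise overlaps 3, 2, 2). These sizes are read off the fractions, not the original matrix, so treat them as assumptions:

```python
import math

size = {'bank': 4, 'river': 7, 'shore': 3}    # non-zero entries per binary vector
overlap = {('bank', 'river'): 3, ('river', 'shore'): 2, ('shore', 'bank'): 2}

def _ov(a, b):
    return overlap[(a, b)] if (a, b) in overlap else overlap[(b, a)]

def dice(a, b):
    # 2|X ∩ Y| / (|X| + |Y|) for binary vectors
    return 2 * _ov(a, b) / (size[a] + size[b])

def cosine(a, b):
    # |X ∩ Y| / sqrt(|X| |Y|) for binary vectors
    return _ov(a, b) / math.sqrt(size[a] * size[b])

for pair in [('bank', 'river'), ('river', 'shore'), ('shore', 'bank')]:
    print(pair, round(dice(*pair), 3), round(cosine(*pair), 3))
```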

13 Q6 solution
In order to apply the Random Walk algorithm for WSD, first, we need to create
a graph with a vertex for each possible sense of each word in the text. We will
add weighted edges using definition based semantic similarity (Lesk’s method)
and apply graph based ranking algorithm to find score of each vector (i.e. for
each word sense). Finally we will select the vertex (sense) which has the highest
score.
No. of nodes in the network = 3 + 4 + 3 = 10 No. of edges in the network
(with a Markovian assumption) = 3 × 4 + 4 × 3 = 24 Here we only form weighted
edges between two senses of adjacent words since we have limited the context
window size to two words (previous and succeeding). The weights are calculated
based on the overlap between the features (glosses) of the two senses.
Two approaches for obtaining the edge weights:
1. Lesk's algorithm: for each n-word phrase that occurs in both glosses, add
a score of n².
2. Overlap coefficient: if X and Y are the two glosses, then the overlap score
is |X ∩ Y| / min(|X|, |Y|), where |X ∩ Y| denotes the number of words common
to X and Y, and |X| denotes the number of words in X.
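The first scheme can be sketched as a greedy longest-phrase matcher. How maximal shared phrases are extracted is not fixed by the text above, so the greedy longest-first strategy below is an assumption:

```python
def _longest_common_phrase(a, b):
    # longest contiguous run of words shared by the two token lists
    best = (0, 0, 0)                                 # (length, start_a, start_b)
    for i in range(len(a)):
        for j in range(len(b)):
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k > best[0]:
                best = (k, i, j)
    return best

def lesk_overlap(gloss1, gloss2):
    """Add n^2 for each maximal n-word phrase shared by both glosses,
    matching the longest phrase first and consuming it from both sides."""
    a, b = gloss1.lower().split(), gloss2.lower().split()
    score = 0
    while True:
        n, i, j = _longest_common_phrase(a, b)
        if n == 0:
            return score
        score += n * n                               # the n^2 scoring rule
        del a[i:i + n]                               # consume the matched phrase
        del b[j:j + n]
```

For example, two glosses sharing the 2-word phrase "a crack" contribute 2² = 4, which is more than the 1 + 1 = 2 that two isolated shared words would give; this is the point of the phrase-based scoring.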
The transition matrix of a random walk, P, is a square matrix denoting the
probability of transitioning from any vertex in the graph to any other vertex.
Formally, Puv = Pr[going from u to v, given that we are at u]. Thus, for a
random walk on an unweighted graph G = (V, E), Puv = 1/du if (u, v) ∈ E and
0 otherwise (where du is the degree of u). For a weighted graph, we can set
Puv = wuv / Σx∈nbd(u) wux if (u, v) ∈ E and 0 otherwise, where nbd(u) is the
set of nodes adjacent to u. P differs from the adjacency matrix of the graph
in that its entries are normalized by the node degrees: if A is the adjacency
matrix, then P = D⁻¹A, where D is the diagonal degree matrix with dii being
the degree of node i and dij = 0 for i ≠ j.
Using the matrix formulation of the random walk, p(t) = p(t−1)P, we can define
the termination condition as ‖p(t) − p(t−1)‖ < δ, where δ is a predefined small
threshold and ‖·‖ is a suitable vector norm.
Let N be the number of nodes in the network. Let us define a 1 × N probability
vector B such that Bu gives the probability of restarting from node u, with
Σu Bu = 1. For a uniform random restart, Bu = 1/N ∀u. However, if a particular
node i is to be given a bias, then we set Bi = β and Bj = (1 − β)/(N − 1) for
all j ≠ i, where 0 < β < 1, so that Σu Bu = 1 still holds. The matrix
formulation of the random walk then becomes p(t) = αp(t−1)P + (1 − α)B, where
α is a parameter called the damping factor. This approach is popularly called
Personalized PageRank (PPR).
The damping factor α is the only parameter that needs to be tuned. The original
PageRank algorithm suggested α = 0.85; however, this value can be tuned by
cross-validation.
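The biased formulation can be sketched as plain power iteration. The 3-node transition matrix and restart vector below are illustrative assumptions, not the sense graph of Q6:

```python
def personalized_pagerank(P, B, alpha=0.85, delta=1e-10):
    """Iterate p(t) = alpha * p(t-1) P + (1 - alpha) * B until
    the L1 change drops below delta (the termination condition above).
    P: row-stochastic transition matrix (list of rows); B: restart vector."""
    n = len(B)
    p = [1.0 / n] * n                    # start from the uniform distribution
    while True:
        q = [alpha * sum(p[u] * P[u][v] for u in range(n)) + (1 - alpha) * B[v]
             for v in range(n)]
        if sum(abs(a - b) for a, b in zip(p, q)) < delta:
            return q
        p = q

# toy 3-node graph: node 0 receives most of the probability mass
P = [[0.0, 0.5, 0.5],
     [1.0, 0.0, 0.0],
     [0.5, 0.5, 0.0]]
B = [1.0, 0.0, 0.0]                      # bias all restarts toward node 0
scores = personalized_pagerank(P, B)
```

Because every row of P sums to 1 and B sums to 1, each iterate q remains a probability vector, so the converged scores also sum to 1.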

14 Q7 solution
In the HyperLex algorithm, the distance between two nodes wi and wj is given
by wij = 1 − max{P(wi|wj), P(wj|wi)}, where P(wi|wj) = freqij / freqj.
The distances between word-pairs are given below:

         bar   iron  steel  gold  coffee
bar      0     0.92  0.75   0.98  0.75
iron     0.92  0     0.7    0.86  1
steel    0.75  0.7   0      0.95  1
gold     0.98  0.86  0.95   0     0.95
coffee   0.75  1     1      0.95  0
Since this is a distance measure, we keep only the entries whose values are
less than the threshold. With a threshold of 0.95, the adjacency matrix of the
co-occurrence graph is as follows:

         bar  iron  steel  gold  coffee
bar      0    1     1      0     1
iron     1    0     1      1     0
steel    1    1     0      0     0
gold     0    1     0      0     0
coffee   1    0     0      0     0
Represented graphically, the co-occurrence graph has the edges bar–iron,
bar–steel, bar–coffee, iron–steel and iron–gold.

The first hub is bar and the corresponding high-density component is (bar,
iron, steel, coffee).
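This construction can be checked mechanically. A sketch only: hub selection is simplified to highest degree with ties broken by list order (which reproduces the answer above), whereas HyperLex proper orders candidate hubs by corpus frequency; the distance matrix is the one tabulated above:

```python
words = ['bar', 'iron', 'steel', 'gold', 'coffee']
# pairwise distances w_ij = 1 - max{P(wi|wj), P(wj|wi)}, from the table above
dist = [
    [0.00, 0.92, 0.75, 0.98, 0.75],
    [0.92, 0.00, 0.70, 0.86, 1.00],
    [0.75, 0.70, 0.00, 0.95, 1.00],
    [0.98, 0.86, 0.95, 0.00, 0.95],
    [0.75, 1.00, 1.00, 0.95, 0.00],
]

threshold = 0.95
# keep an edge only when the distance is strictly below the threshold
adj = {w: set() for w in words}
for i, wi in enumerate(words):
    for j, wj in enumerate(words):
        if i != j and dist[i][j] < threshold:
            adj[wi].add(wj)

# first hub: highest-degree node (ties broken by list order here);
# its high-density component is the hub together with its neighbours
hub = max(words, key=lambda w: len(adj[w]))
component = {hub} | adj[hub]
print(hub, sorted(component))
```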
