used per test verb. The lexicon was evaluated against manually analysed corpus data after an empirically defined threshold of 0.025 was set on relative frequencies of SCFs to remove noisy SCFs. The method yielded 71.8% precision and 34.5% recall. When we removed the filtering threshold and evaluated the noisy distribution, the F-measure (footnote 4) dropped from 44.9 to 38.51 (footnote 5).

4 Clustering Method

Data clustering is a process which aims to partition a given set into subsets (clusters) of elements that are similar to one another, while ensuring that elements that are not similar are assigned to different clusters. We use clustering for partitioning a set of verbs. Our hypothesis is that information about SCFs and their associated frequencies is relevant for identifying semantically related verbs. Hence, we use SCFs as relevance features to guide the clustering process (footnote 6).

We chose two clustering methods which do not involve task-oriented tuning (such as pre-fixed thresholds or restricted cluster sizes) and which approach the data straightforwardly, in its distributional form: (i) a simple hard method that collects the nearest neighbours (NN) of each verb (figure 1), and (ii) the Information Bottleneck (IB), an iterative soft method (Tishby et al., 1999) based on information-theoretic grounds.

The NN method is very simple, but it has some disadvantages. It outputs only one clustering configuration, and therefore does not allow examination of different cluster granularities. It is also highly sensitive to noise: a few exceptional neighbourhood relations contradicting the typical trends in the data are enough to cause the formation of a single cluster which encompasses all elements.

Therefore we employed the more sophisticated IB method as well. The IB quantifies the relevance information of an SCF distribution with respect to the output clusters through their mutual information $I(Clusters; SCFs)$. The relevance information is maximized, while the compression information $I(Clusters; Verbs)$ is minimized. This ensures optimal compression of the data through the clusters.

Footnote 4: $F = \frac{2 \cdot precision \cdot recall}{precision + recall}$

Footnote 5: These figures are not particularly impressive because our evaluation is exceptionally hard. We use 1) highly polysemic test verbs, 2) a high number of SCFs, and 3) evaluate against manually analysed data rather than dictionaries (the latter have high precision but low recall).

Footnote 6: The relevance of the features to the task is evident when comparing the probability of a randomly chosen pair of verbs $verb_i$ and $verb_j$ sharing the same predominant sense (4.5%) with the probability obtained when $verb_j$ is the JS-divergence nearest neighbour of $verb_i$ (36%).
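For concreteness, the input to both methods can be pictured as a matrix of relative SCF frequencies, one row per verb. The following Python snippet is our own illustration, not material from the paper: the verbs, the frame inventory, and all numbers are invented.

```python
import numpy as np

# Hypothetical verbs and SCF inventory (illustration only, not the paper's data).
verbs = ["give", "send", "laugh", "giggle"]
scfs = ["NP", "NP-NP", "NP-PP_to", "INTRANS", "PP_at"]

# Each row is an SCF distribution p(S | verb); rows sum to 1.
p_s_given_v = np.array([
    [0.30, 0.45, 0.25, 0.00, 0.00],   # give
    [0.40, 0.30, 0.30, 0.00, 0.00],   # send
    [0.05, 0.00, 0.00, 0.65, 0.30],   # laugh
    [0.10, 0.00, 0.00, 0.70, 0.20],   # giggle
])
```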
NN Clustering:
1. For each verb $v$:
2. Calculate the JS divergence between the SCF distributions of $v$ and all other verbs:
   $JS(p, q) = \frac{1}{2} D\!\left[p \,\middle\|\, \frac{p+q}{2}\right] + \frac{1}{2} D\!\left[q \,\middle\|\, \frac{p+q}{2}\right]$
3. Connect $v$ with the most similar verb
4. Find all the connected components

Figure 1: Connected components nearest neighbour (NN) clustering. $D$ is the Kullback-Leibler distance.
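To make figure 1 concrete, here is a minimal Python sketch of the procedure (ours, not the authors' code). It assumes the verb list and SCF matrix from the earlier snippet; the function names are illustrative.

```python
import numpy as np


def js_divergence(p, q, eps=1e-12):
    """JS(p, q) = 1/2 D[p || (p+q)/2] + 1/2 D[q || (p+q)/2], with D the KL distance."""
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log((a + eps) / (b + eps))))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def nn_clusters(verbs, p_s_given_v):
    """Connected-components NN clustering; row i of p_s_given_v is p(S | verbs[i])."""
    n = len(verbs)
    # Steps 1-3: connect each verb with its most similar (JS-nearest) verb.
    nearest = [min((j for j in range(n) if j != i),
                   key=lambda j: js_divergence(p_s_given_v[i], p_s_given_v[j]))
               for i in range(n)]
    # Step 4: find all connected components of the resulting graph (union-find).
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in enumerate(nearest):
        parent[find(i)] = find(j)
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(verbs[i])
    return list(clusters.values())
```

On the toy matrix above, `nn_clusters(verbs, p_s_given_v)` connects give with send and laugh with giggle, yielding `[['give', 'send'], ['laugh', 'giggle']]`. Note how a single spurious nearest-neighbour edge between the two groups would merge everything into one cluster, which is exactly the noise sensitivity discussed above.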
IB Clustering (fixed $\beta$):
Perform till convergence, for each time step $t = 1, 2, \ldots$:
1. $z_t(K, V) = p_{t-1}(K)\, e^{-\beta D[p(S|V) \,\|\, p_{t-1}(S|K)]}$
   (When $t = 1$, initialize $z_t(K, V)$ arbitrarily)
2. $p_t(K|V) = \frac{z_t(K, V)}{\sum_{K'} z_t(K', V)}$
3. $p_t(K) = \sum_V p(V)\, p_t(K|V)$
4. $p_t(S|K) = \sum_V p(S|V)\, p_t(V|K)$

Figure 2: Information Bottleneck (IB) iterative clustering. $D$ is the Kullback-Leibler distance.
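Similarly, the updates in figure 2 translate almost directly into code. The sketch below is our own reading of the figure, assuming uniform verb priors $p(V)$, a fixed number of clusters, and a fixed $\beta$; the loop body computes the current $p_t(K)$ and $p_t(S|K)$ from the previous iteration's assignments (steps 3-4) before applying steps 1-2.

```python
import numpy as np


def ib_cluster(p_s_given_v, n_clusters, beta, n_iter=100, seed=0, eps=1e-12):
    """Iterative IB clustering for a fixed beta, following figure 2.

    p_s_given_v: (n_verbs, n_scfs) matrix; each row is p(S|V).
    Returns soft assignments p(K|V) of shape (n_verbs, n_clusters).
    """
    rng = np.random.default_rng(seed)
    n_verbs = p_s_given_v.shape[0]
    p_v = np.full(n_verbs, 1.0 / n_verbs)        # assumption: uniform p(V)
    # t = 1: initialize the assignments arbitrarily (random rows summing to 1).
    p_k_given_v = rng.dirichlet(np.ones(n_clusters), size=n_verbs)
    for _ in range(n_iter):
        # Step 3: p_t(K) = sum_V p(V) p_t(K|V).
        p_k = p_v @ p_k_given_v
        # Step 4: p_t(S|K) = sum_V p(S|V) p_t(V|K), with p_t(V|K) via Bayes' rule.
        p_v_given_k = (p_k_given_v * p_v[:, None]) / (p_k[None, :] + eps)
        p_s_given_k = p_v_given_k.T @ p_s_given_v
        # Step 1: z_t(K, V) = p_{t-1}(K) * exp(-beta * D[p(S|V) || p_{t-1}(S|K)]).
        kl = np.sum(p_s_given_v[:, None, :]
                    * np.log((p_s_given_v[:, None, :] + eps)
                             / (p_s_given_k[None, :, :] + eps)), axis=2)
        z = p_k[None, :] * np.exp(-beta * kl)
        # Step 2: normalize over clusters to obtain p_t(K|V).
        p_k_given_v = z / (z.sum(axis=1, keepdims=True) + eps)
    return p_k_given_v
```

For example, `ib_cluster(p_s_given_v, n_clusters=2, beta=5.0)` on the toy matrix returns soft assignments over two clusters; larger $\beta$ weights the relevance information more heavily and drives the assignments toward hard (0/1) clusters, while smaller $\beta$ keeps them softer.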
The tradeoff between the two constraints is realized through minimizing the cost term: