LESSON 5
Dissimilarity Measures
Proximity measures
• In clustering as well, patterns that are similar to each other are put
into the same cluster, while dissimilar patterns are put into different
clusters.
• Some of the proximity measures are metrics while some are not.
Distance measure
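• The Euclidean distance is the most popular distance measure. For two
d-dimensional patterns X and Y, the Euclidean distance is
$$d_2(X, Y) = \sqrt{\sum_{k=1}^{d} (x_k - y_k)^2}$$
It is the m = 2 case of the Minkowski metric
$$d_m(X, Y) = \left( \sum_{k=1}^{d} |x_k - y_k|^m \right)^{1/m}$$
where m = 1, 2, ..., ∞.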
Euclidean distance (ED)
$$d_2(X, Y) = \sqrt{\sum_{k=1}^{d} (x_k - y_k)^2}$$
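As a minimal sketch, the Euclidean distance formula can be written in plain Python. The function name `euclidean_distance` and the sample points are illustrative assumptions, not part of the lesson.

```python
import math

def euclidean_distance(x, y):
    """Square root of the summed squared coordinate differences."""
    if len(x) != len(y):
        raise ValueError("patterns must have the same number of features")
    return math.sqrt(sum((xk - yk) ** 2 for xk, yk in zip(x, y)))

# Hypothetical 2-D patterns chosen so the result is easy to check by hand.
print(euclidean_distance((0, 0), (3, 4)))  # 5.0
```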
Manhattan distance
• This is also called the L1 norm.
$$d(X, Y) = \sum_{k=1}^{d} |x_k - y_k|$$
$$L_\infty = \max_{k = 1, 2, \ldots, d} |x_k - y_k|$$
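The L1 and L∞ formulas translate almost directly into code. The sketch below is illustrative only: the function names and the sample vectors `a` and `b` are assumptions introduced here, not taken from the lesson.

```python
def manhattan_distance(x, y):
    """L1 norm: sum of absolute coordinate differences."""
    return sum(abs(xk - yk) for xk, yk in zip(x, y))

def l_inf_distance(x, y):
    """L-infinity norm: largest absolute coordinate difference."""
    return max(abs(xk - yk) for xk, yk in zip(x, y))

# Hypothetical 3-D patterns (not from the lesson).
a, b = (1, 2, 3), (4, 0, 3)
print(manhattan_distance(a, b))  # 5
print(l_inf_distance(a, b))      # 3
```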
• When some features are more significant than others, the weighted
distance measure is used.
$$d(X, Y) = \left( \sum_{k=1}^{d} w_k \, |x_k - y_k|^m \right)^{1/m}$$
• The greater the significance of the feature, the larger the weight.
With m = 2, the squared weighted distance is
$$d_2(X, Y)^2 = w_1 (x_1 - y_1)^2 + w_2 (x_2 - y_2)^2 + \cdots + w_d (x_d - y_d)^2$$
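A weighted distance with a general exponent m could be sketched as follows. The function name `weighted_distance`, its default m = 2, and the sample values are assumptions for illustration. The absolute value is taken before raising to the power m so that odd values of m also give a non-negative result.

```python
def weighted_distance(x, y, w, m=2):
    """Weighted Minkowski-style distance with one weight per feature."""
    if not (len(x) == len(y) == len(w)):
        raise ValueError("x, y and w must have the same length")
    # Absolute differences keep every term non-negative for any m >= 1.
    total = sum(wk * abs(xk - yk) ** m for xk, yk, wk in zip(x, y, w))
    return total ** (1.0 / m)

# Hypothetical 2-D patterns (not from the lesson).
a, b = (0, 0), (3, 4)
print(weighted_distance(a, b, (1, 1)))  # 5.0, unit weights give the plain Euclidean distance
print(weighted_distance(a, b, (1, 0)))  # 3.0, a zero weight ignores the second feature
```

Setting a weight to zero removes that feature from the comparison, while a larger weight makes differences in that feature count more.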
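The squared Mahalanobis distance of a pattern X from a mean µ with covariance matrix Σ is
$$(X - \mu)^t \, \Sigma^{-1} (X - \mu)$$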
For X = (5, 3, 7, 5, 1) and Y = (6, 5, 2, 9, 5), the Manhattan distance is
$$d(X, Y) = |5 - 6| + |3 - 5| + |7 - 2| + |5 - 9| + |1 - 5| = 16$$
L∞ norm
$$d_\infty(X, Y) = \max(1, 2, 5, 4, 4) = 5$$
Let the feature weights be w1 = 0.5, w2 = 0.1, w3 = 0.3, w4 = 0.7 and w5 = 0.4. The weighted distance is then
$$d(X, Y) = \sqrt{0.5 (5 - 6)^2 + 0.1 (3 - 5)^2 + 0.3 (7 - 2)^2 + 0.7 (5 - 9)^2 + 0.4 (1 - 5)^2} = \sqrt{26} \approx 5.1$$
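The three worked results can be checked with a few lines of Python. The vectors and weights below are read off the worked expressions; the variable names are illustrative.

```python
import math

# Patterns and weights as they appear in the worked expressions.
X = (5, 3, 7, 5, 1)
Y = (6, 5, 2, 9, 5)
W = (0.5, 0.1, 0.3, 0.7, 0.4)

diffs = [abs(xk - yk) for xk, yk in zip(X, Y)]  # [1, 2, 5, 4, 4]
manhattan = sum(diffs)                           # L1 norm
l_inf = max(diffs)                               # L-infinity norm
weighted = math.sqrt(sum(wk * dk ** 2 for wk, dk in zip(W, diffs)))

print(manhattan, l_inf, round(weighted, 1))  # 16 5 5.1
```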
• There are other dissimilarity measures besides the metric distance
measures described above.
k-Median Distance
• The absolute difference between corresponding elements of the two
vectors is computed.
• If the two vectors are
P = (10, 56, 23, 32, 78, 99, 65, 83, 1, 53) and
Q = (34, 87, 94, 21, 49, 81, 23, 77, 34, 61)
then the difference vector is
D = (|10 − 34|, |56 − 87|, |23 − 94|, |32 − 21|, |78 − 49|,
|99 − 81|, |65 − 23|, |83 − 77|, |1 − 34|, |53 − 61|)
= (24, 31, 71, 11, 29, 18, 42, 6, 33, 8)
If k = 5, then d(P, Q) = 29, the fifth-largest entry of D.
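A possible sketch of this computation is shown below. The lesson does not state how the differences are ordered. Sorting in descending order and counting k from 1 are assumptions, chosen because they reproduce the result 29 for k = 5. The function name `k_median_distance` is also illustrative.

```python
def k_median_distance(p, q, k):
    """Return the k-th largest absolute element-wise difference (k counts from 1)."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    if not 1 <= k <= len(p):
        raise ValueError("k must be between 1 and the vector length")
    # Assumption: differences are ranked from largest to smallest.
    diffs = sorted((abs(pk - qk) for pk, qk in zip(p, q)), reverse=True)
    return diffs[k - 1]

P = (10, 56, 23, 32, 78, 99, 65, 83, 1, 53)
Q = (34, 87, 94, 21, 49, 81, 23, 77, 34, 61)
print(k_median_distance(P, Q, 5))  # 29
```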