Calculating Dissimilarities

Calculating Minkowski Distance

Used for computing differences on NUMERIC data only


Minkowski Distance can be:
1. Manhattan Distance (1st degree)
2. Euclidean Distance (2nd degree)
3. Supremum or Chebyshev Distance (the limit as the degree goes to infinity)

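For reference, the whole family can be written as a single formula. In the sketch below, i and j are two objects described by p numeric attributes, $x_{if}$ is the value of attribute f for object i, and h is the order (the "degree" used in the list above); these symbols are notation chosen here, not taken from the handout.

$$ d(i, j) = \left( \sum_{f=1}^{p} \lvert x_{if} - x_{jf} \rvert^{h} \right)^{1/h} $$

Setting h = 1 gives the Manhattan distance, h = 2 gives the Euclidean distance, and letting h grow without bound gives the supremum distance:

$$ d(i, j) = \lim_{h \to \infty} \left( \sum_{f=1}^{p} \lvert x_{if} - x_{jf} \rvert^{h} \right)^{1/h} = \max_{f} \lvert x_{if} - x_{jf} \rvert $$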
SAMPLE: You need to measure the dissimilarity between 3 students based on 4 different grades:

Name     HS Final   Standardized Aptitude   College Entrance   Math Proficiency
         Grade      Test Grade              Exam Grade         Grade
Anna     95         84                      90                 88
Mark     95         87                      95                 90
Philip   95         88                      90                 90

Manhattan Distance:
d (Anna, Mark) = |95-95| + |84-87| + |90-95| + |88-90|
= 0 + 3 + 5 + 2
= 10
d (Anna, Philip) = |95-95| + |84-88| + |90-90| + |88-90|
= 0 + 4 + 0 + 2
= 6
d (Mark, Philip) = |95-95| + |87-88| + |95-90| + |90-90|
= 0 + 1 + 5 + 0
= 6
Dissimilarity Matrix for Manhattan Distance:

          Anna   Mark   Philip
Anna      0
Mark      10     0
Philip    6      6      0

Euclidean Distance:
d (Anna, Mark) = sqrt( (95-95)^2 + (84-87)^2 + (90-95)^2 + (88-90)^2 )
= sqrt( 0 + 9 + 25 + 4 )
= 6.164
d (Anna, Philip) = ???
= ???
= ???
d (Mark, Philip) = sqrt( (95-95)^2 + (87-88)^2 + (95-90)^2 + (90-90)^2 )
= sqrt( 0 + 1 + 25 + 0 )
= 5.099
Dissimilarity Matrix for Euclidean Distance:

          Anna    Mark    Philip
Anna      0
Mark      6.164   0
Philip    ???     5.099   0

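The Manhattan and Euclidean calculations above can be reproduced with a short Python sketch. It assumes the grades are stored as plain lists in the column order of the sample table (HS Final, Aptitude, College Entrance, Math Proficiency); the helper names `minkowski` and `dissimilarity_matrix` are hypothetical, chosen for illustration.

```python
from itertools import combinations

# Grades in column order: HS Final, Aptitude Test, College Entrance, Math Proficiency
grades = {
    "Anna":   [95, 84, 90, 88],
    "Mark":   [95, 87, 95, 90],
    "Philip": [95, 88, 90, 90],
}

def minkowski(x, y, h):
    """Minkowski distance of order h (h=1: Manhattan, h=2: Euclidean)."""
    return sum(abs(a - b) ** h for a, b in zip(x, y)) ** (1 / h)

def dissimilarity_matrix(data, dist):
    """Return {(name_i, name_j): distance} for every unordered pair of names."""
    return {(i, j): dist(data[i], data[j]) for i, j in combinations(data, 2)}

manhattan = dissimilarity_matrix(grades, lambda x, y: minkowski(x, y, 1))
euclidean = dissimilarity_matrix(grades, lambda x, y: minkowski(x, y, 2))

# One line per pair: Manhattan distance, then Euclidean distance rounded to 3 places
for pair in manhattan:
    print(pair, manhattan[pair], round(euclidean[pair], 3))
```

Each printed line lists a pair with its Manhattan and rounded Euclidean distance, so the output can be used to check the ??? entries once you have worked them out by hand.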
Chebyshev / Supremum Distance:


d (Anna, Mark) = max(|95-95|,|84-87|,|90-95|,|88-90|)
= max(0,3,5,2)
=5
d (Anna, Philip) = max(|95-95|,|84-88|,|90-90|,|88-90|)
= max(0,4,0,2)
=4
d (Mark, Philip) = ???
= ???
= ???
Dissimilarity Matrix for Chebyshev Distance:

          Anna   Mark   Philip
Anna      0
Mark      5      0
Philip    4      ???    0

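A sketch of the supremum (Chebyshev) distance for the same sample, with the grade lists re-declared so the snippet runs on its own. It also checks numerically that a Minkowski distance of large order approaches the supremum; the order 50 is an arbitrary choice for illustration.

```python
from itertools import combinations

# Grades in column order: HS Final, Aptitude Test, College Entrance, Math Proficiency
grades = {
    "Anna":   [95, 84, 90, 88],
    "Mark":   [95, 87, 95, 90],
    "Philip": [95, 88, 90, 90],
}

def chebyshev(x, y):
    """Supremum distance: the largest absolute difference over all attributes."""
    return max(abs(a - b) for a, b in zip(x, y))

def minkowski(x, y, h):
    """Minkowski distance of order h."""
    return sum(abs(a - b) ** h for a, b in zip(x, y)) ** (1 / h)

for i, j in combinations(grades, 2):
    print(i, j, chebyshev(grades[i], grades[j]))

# For a large order h the Minkowski distance is already very close to the supremum
print(minkowski(grades["Anna"], grades["Mark"], 50))
```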
Converting Ordinal Values for Dissimilarity Calculation


Each rank value r is normalized to [0.0, 1.0] using (r - 1) / (M - 1), where M = max(rank value) = 3.

Name     Rating (Rank Value)   Normalized Value
Martin   Excellent (3)         (3 - 1) / (3 - 1) = 1.0
Mary     Fair (1)              (1 - 1) / (3 - 1) = 0.0
Adrian   Good (2)              (2 - 1) / (3 - 1) = 0.5
Julie    Excellent (3)         (3 - 1) / (3 - 1) = 1.0

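The normalization step maps each rank r onto [0.0, 1.0] with (r - 1) / (M - 1). A minimal Python sketch follows; the dictionary `RANKS` is an assumed mapping that mirrors the Fair/Good/Excellent scale in the table, and `normalize_rank` is a hypothetical helper name.

```python
# Assumed rank values for the ordinal scale used in the table
RANKS = {"Fair": 1, "Good": 2, "Excellent": 3}

ratings = {"Martin": "Excellent", "Mary": "Fair", "Adrian": "Good", "Julie": "Excellent"}

def normalize_rank(rank, max_rank):
    """Map a rank in 1..max_rank onto [0.0, 1.0] via (rank - 1) / (max_rank - 1)."""
    return (rank - 1) / (max_rank - 1)

M = max(RANKS.values())  # M = 3, the highest rank on the scale
normalized = {name: normalize_rank(RANKS[r], M) for name, r in ratings.items()}
print(normalized)  # {'Martin': 1.0, 'Mary': 0.0, 'Adrian': 0.5, 'Julie': 1.0}
```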
Then apply any of the numeric Minkowski distance measures:

** Let's use Euclidean **
Euclidean Distance:
d (Martin, Mary) = sqrt( (1.0 - 0.0)^2 )
= 1.0
d (Martin, Adrian) = sqrt( (1.0 - 0.5)^2 )
= 0.5
d (Martin, Julie) = sqrt( (1.0 - 1.0)^2 )
= 0
d (Mary, Adrian) = ???
d (Mary, Julie) = ???
d (Adrian, Julie) = ???
Dissimilarity Matrix for Euclidean Distance:

          Martin   Mary   Adrian   Julie
Martin    0
Mary      1.0      0
Adrian    0.5      ???    0
Julie     0        ???    ???      0

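With a single ordinal attribute, the Euclidean distance reduces to the absolute difference of the normalized values, since sqrt((a - b)^2) = |a - b|. The sketch below starts from the normalized values in the table, treats each person as a one-element vector, and builds every pair, so it can be used to check the ??? cells.

```python
import math
from itertools import combinations

# Normalized ordinal values from the table above
normalized = {"Martin": 1.0, "Mary": 0.0, "Adrian": 0.5, "Julie": 1.0}

def euclidean(x, y):
    """Euclidean distance between two equal-length sequences of numbers."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Each person is described by a one-element vector: their normalized rating
dist = {(i, j): euclidean([normalized[i]], [normalized[j]])
        for i, j in combinations(normalized, 2)}

for (i, j), d in dist.items():
    print(f"d({i}, {j}) = {d}")
```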
Dissimilarity of Mixed Attribute Types

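$$ d(i, j) = \frac{\sum_{f=1}^{p} w_f \, d_{ij}^{(f)}}{\sum_{f=1}^{p} w_f} $$

where $w_f$ = weight of the indicator for attribute type f (usually given), $d_{ij}^{(f)}$ is the dissimilarity between objects i and j for attribute type f, normalized to [0.0, 1.0], and the sums run over all p attribute types.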


Assume the following dissimilarity matrices with different types and weights:
Numeric (weight = 2)

          Anna   Mark   Philip
Anna      0
Mark      10     0
Philip    6      6      0

Normalize the numeric table so that every value falls within [0.0, 1.0]. (The easiest way to normalize is to express each value as a fraction of the maximum; however, there are other methods to normalize values.)

          Anna          Mark          Philip
Anna      0
Mark      10/10 = 1.0   0
Philip    6/10 = 0.6    6/10 = 0.6    0
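Normalizing by the maximum amounts to one division per entry. A minimal sketch, assuming the matrix is stored as a dictionary of its off-diagonal entries keyed by name pairs (the diagonal is always 0); `normalize_by_max` is a hypothetical helper name.

```python
def normalize_by_max(matrix):
    """Scale every dissimilarity into [0.0, 1.0] by dividing by the largest entry."""
    largest = max(matrix.values())
    return {pair: value / largest for pair, value in matrix.items()}

# Off-diagonal entries of the numeric (weight = 2) matrix
numeric_w2 = {("Anna", "Mark"): 10, ("Anna", "Philip"): 6, ("Mark", "Philip"): 6}

print(normalize_by_max(numeric_w2))
# {('Anna', 'Mark'): 1.0, ('Anna', 'Philip'): 0.6, ('Mark', 'Philip'): 0.6}
```

If every entry were 0, the division would raise `ZeroDivisionError`, so a fuller implementation would need to guard against that case.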
Numeric (weight = 1)

          Anna   Mark   Philip
Anna      0
Mark      5      0
Philip    1      2      0

Normalize this table the same way, dividing each value by its maximum (5):

          Anna        Mark        Philip
Anna      0
Mark      5/5 = 1.0   0
Philip    1/5 = 0.2   2/5 = 0.4   0

Ordinal (weight = 1)

          Anna   Mark   Philip
Anna      0
Mark      0.5    0
Philip    0.5    1      0

Binary Asymmetric (weight = 2)

          Anna   Mark   Philip
Anna      0
Mark      1      0
Philip    1      0      0

d (Anna, Mark) = [(2)(1.0) + (1)(1.0) + (1)(0.5) + (2)(1)] / (2+1+1+2)
= (2 + 1 + 0.5 + 2) / 6
= 5.5 / 6 = 0.9167
d (Anna, Philip) = [(2)(0.6) + (1)(0.2) + (1)(0.5) + (2)(1)] / (2+1+1+2)
= (1.2 + 0.2 + 0.5 + 2) / 6
= 3.9 / 6 = 0.65
d (Mark, Philip) = [(2)(0.6) + (1)(0.4) + (1)(1) + (2)(0)] / (2+1+1+2)
= (1.2 + 0.4 + 1 + 0) / 6
= 2.6 / 6 = 0.4333
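The weighted combination can be checked with a short sketch. It assumes each attribute type's dissimilarities have already been normalized to [0.0, 1.0] and stored as a dictionary of off-diagonal entries keyed by name pairs; the names `types` and `mixed_dissimilarity` are hypothetical.

```python
# (weight, normalized off-diagonal dissimilarities) for each attribute type
types = [
    (2, {("Anna", "Mark"): 1.0, ("Anna", "Philip"): 0.6, ("Mark", "Philip"): 0.6}),  # numeric
    (1, {("Anna", "Mark"): 1.0, ("Anna", "Philip"): 0.2, ("Mark", "Philip"): 0.4}),  # numeric
    (1, {("Anna", "Mark"): 0.5, ("Anna", "Philip"): 0.5, ("Mark", "Philip"): 1.0}),  # ordinal
    (2, {("Anna", "Mark"): 1.0, ("Anna", "Philip"): 1.0, ("Mark", "Philip"): 0.0}),  # binary asymmetric
]

def mixed_dissimilarity(pair, types):
    """Weighted average of per-type dissimilarities: sum(w * d) / sum(w)."""
    numerator = sum(w * d[pair] for w, d in types)
    denominator = sum(w for w, _ in types)
    return numerator / denominator

for pair in [("Anna", "Mark"), ("Anna", "Philip"), ("Mark", "Philip")]:
    print(pair, round(mixed_dissimilarity(pair, types), 4))
# Matches the worked values: 0.9167, 0.65, 0.4333
```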

FINAL DISSIMILARITY MATRIX:

          Anna     Mark     Philip
Anna      0
Mark      0.9167   0
Philip    0.65     0.4333   0
