
IS328 Data Mining

Semester 2, 2019

Partitional Clustering Techniques


K-Means and K-Medoids Clustering

Tutorial 12 Exercises

Q1

Suppose we want to group the visitors to a website using just their age (a one-dimensional space)
as follows:
15,17,17,19,19,20,20,21,22,28,35,45,52,58,59,60,60,61,61
Take 15 and 20 as the initial centres.
Use the K-Means algorithm with K = 2. Show your calculations and steps.

The initial centres are 15 and 20.

The initial clusters (each age assigned to its nearest centre) are

Centre 15: [15, 17, 17]
Centre 20: [19, 19, 20, 20, 21, 22, 28, 35, 45, 52, 58, 59, 60, 60, 61, 61]

The new centres are 49/3 = 16.33 and 640/16 = 40.

The new clusters are
Centre 16.33: [15, 17, 17, 19, 19, 20, 20, 21, 22, 28]
Centre 40: [35, 45, 52, 58, 59, 60, 60, 61, 61]

The new centres are 198/10 = 19.8 and 491/9 = 54.6.

The new clusters are
Centre 19.8: [15, 17, 17, 19, 19, 20, 20, 21, 22, 28, 35]
Centre 54.6: [45, 52, 58, 59, 60, 60, 61, 61]

The new centres are 233/11 = 21.2 and 456/8 = 57.

The new clusters are
Centre 21.2: [15, 17, 17, 19, 19, 20, 20, 21, 22, 28, 35]
Centre 57: [45, 52, 58, 59, 60, 60, 61, 61]

The K-Means algorithm terminates here, as the clusters no longer change.
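
The iterations above can be checked with a short script. This is only a minimal sketch: the function name kmeans_1d and its arguments are made up for illustration, ties are assumed to go to the first centre, and no cluster is assumed to become empty. Centres are rounded to two decimals, so 54.6 and 21.2 appear as 54.56 and 21.18.

```python
def kmeans_1d(values, centres, max_iter=100):
    """Plain K-Means on 1-D data (hypothetical helper, not from the tutorial)."""
    clusters = None
    history = []  # centres after each update step, rounded to 2 decimals
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centre.
        new_clusters = [[] for _ in centres]
        for v in values:
            nearest = min(range(len(centres)), key=lambda i: abs(v - centres[i]))
            new_clusters[nearest].append(v)
        # Stop when the membership is the same as in the previous pass.
        if new_clusters == clusters:
            break
        clusters = new_clusters
        # Update step: each centre becomes the mean of its cluster
        # (assumes no cluster is empty).
        centres = [sum(c) / len(c) for c in clusters]
        history.append([round(c, 2) for c in centres])
    return centres, clusters, history


ages = [15, 17, 17, 19, 19, 20, 20, 21, 22, 28, 35,
        45, 52, 58, 59, 60, 60, 61, 61]
final_centres, clusters, history = kmeans_1d(ages, [15, 20])
for step, c in enumerate(history, start=1):
    print("centres after update", step, ":", c)
print("final clusters:", clusters)
```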

Q2. K-means clustering with Manhattan Distance
We are given the following data on 5 objects:

Object   X1   X2
1        3    2
2        8    6
3        6    7
4        3    4
5        7    2

Cluster this data into two clusters, using the k-means algorithm.
To initialize the algorithm, put objects 1 and 3 in one cluster, and objects 2, 4 and 5 in the
other cluster.
Show the steps of the algorithm clearly. Use Manhattan distance for calculating distances.

ITERATION 1
Cluster A = [ (3, 2), (6, 7)]
Cluster B = [(8, 6), (3, 4), (7, 2)]

C1 = ((3+6)/2, (2+7)/2) = (4.5, 4.5)

C2 = ((8+3+7)/3, (6+4+2)/3) = (6, 4)

Manhattan Distances from the Centres

Object    C1 (4.5, 4.5)   C2 (6, 4)
(3, 2)    4               5
(8, 6)    5               4
(6, 7)    4               3
(3, 4)    2               3
(7, 2)    5               3

ITERATION 2
Cluster A = [ (3, 2), (3,4)]
Cluster B = [(8, 6), (6,7), (7, 2)]

C1 = ((3+3)/2, (2+4)/2) = (3, 3)

C2 = ((8+6+7)/3, (6+7+2)/3) = (7, 5)

Manhattan Distances from the Centres

Object    C1 (3, 3)   C2 (7, 5)
(3, 2)    1           7
(8, 6)    8           2
(6, 7)    7           3
(3, 4)    1           5
(7, 2)    5           3

ITERATION 3
Cluster A = [ (3, 2), (3,4)]
Cluster B = [(8, 6), (6,7), (7, 2)]

The members of clusters A and B are the same in Iterations 2 and 3.
Therefore K-Means terminates.

The final clusters are


Cluster A = [ (3, 2), (3,4)]
Cluster B = [(8, 6), (6,7), (7, 2)]
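
The same procedure can be sketched in code. As in the solution above, the centres stay coordinate-wise means and only the assignment step uses Manhattan distance. The helper names (manhattan, centroid, kmeans_from_partition) are illustrative assumptions, and the sketch assumes no cluster ever becomes empty.

```python
def manhattan(p, q):
    """Manhattan (city-block) distance between two points."""
    return sum(abs(a - b) for a, b in zip(p, q))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def kmeans_from_partition(clusters, dist, max_iter=100):
    """K-Means started from an initial partition instead of initial centres."""
    points = [p for c in clusters for p in c]
    for _ in range(max_iter):
        centres = [centroid(c) for c in clusters]
        # Reassign every point to its nearest centre (ties go to the first).
        new_clusters = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)), key=lambda i: dist(p, centres[i]))
            new_clusters[nearest].append(p)
        # Stop once the membership no longer changes.
        if [set(c) for c in new_clusters] == [set(c) for c in clusters]:
            break
        clusters = new_clusters
    return clusters, [centroid(c) for c in clusters]


initial = [[(3, 2), (6, 7)], [(8, 6), (3, 4), (7, 2)]]
final_clusters, final_centres = kmeans_from_partition(initial, manhattan)
print(final_clusters)
print(final_centres)
```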

Q3. K-means clustering with Euclidean Distance


Use the k-means algorithm and Euclidean distance to cluster the following 6 objects into 3
clusters:

A1=(2,10), A2=(2,5), A3=(8,4), A4=(5,8), A5=(7,5), A6=(1,2)

Suppose that the initial seeds (centers of each cluster) are A1, A4 and A6.
Run the k-means algorithm to cluster the above data:
Draw a 10 by 10 space with all the 6 points and show the clusters after each iteration.

Initial centres are C1(2,10), C2(5, 8), C3 (1, 2)

ITERATION 1

Euclidean Distances from the Centres

Object       C1 (2, 10)   C2 (5, 8)   C3 (1, 2)
A1 (2, 10)   0            3.61        8.06
A2 (2, 5)    5            4.24        3.16
A3 (8, 4)    8.49         5           7.28
A4 (5, 8)    3.61         0           7.21
A5 (7, 5)    7.07         3.61        6.71
A6 (1, 2)    8.06         7.21        0

The current clusters are
Cluster A = [ (2, 10)]
Cluster B = [(8, 4), (5, 8), (7, 5)]
Cluster C = [(2, 5), (1, 2)]

The new centres are C1(2, 10), C2 (6.67, 5.67), and C3(1.5, 3.5)

ITERATION 2

Euclidean Distances from the Centres

Object       C1 (2, 10)   C2 (6.67, 5.67)   C3 (1.5, 3.5)
A1 (2, 10)   0            6.37              6.52
A2 (2, 5)    5            4.72              1.58
A3 (8, 4)    8.49         2.13              6.52
A4 (5, 8)    3.61         2.87              5.70
A5 (7, 5)    7.07         0.75              5.70
A6 (1, 2)    8.06         6.75              1.58

The current clusters are


Cluster A = [ (2, 10)]
Cluster B = [(8, 4), (5, 8), (7, 5)]
Cluster C = [(2, 5), (1, 2)]

Since the members have not changed, K-Means terminates here.


The final clusters are
Cluster A = [ (2, 10)]
Cluster B = [(8, 4), (5, 8), (7, 5)]
Cluster C = [(2, 5), (1, 2)]
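
A minimal sketch of the same run in code, assuming Python 3.8 or later for math.dist. The function name kmeans and the label encoding (0, 1, 2 for clusters A, B, C) are illustrative choices, not part of the tutorial.

```python
import math


def kmeans(points, seeds, max_iter=100):
    """K-Means with Euclidean distance, started from given seed centres."""
    centres = list(seeds)
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centre for every point.
        new_labels = [
            min(range(len(centres)), key=lambda i: math.dist(p, centres[i]))
            for p in points
        ]
        # Stop once no point changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for i in range(len(centres)):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # keep the old centre if a cluster is empty
                centres[i] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centres


points = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (1, 2)]  # A1 .. A6
seeds = [(2, 10), (5, 8), (1, 2)]                           # A1, A4, A6
labels, centres = kmeans(points, seeds)
print(labels)
print([tuple(round(x, 2) for x in c) for c in centres])
```

Points that share a label belong to the same cluster, so the printed labels can be compared directly with clusters A, B and C.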

Exercise 4: K-Medoid Clustering Using Distance Matrix


K-Medoids
Initial Clusters

C1 A, C, D
C2 B, E, F

Medoid of (A, C, D)

     A      C      D      Total
A    0      5.66   3.61   9.27
C    5.66   0      2.24   7.90
D    3.61   2.24   0      5.85

D has the smallest total distance (5.85), so D is the medoid.

Medoid of (B, E, F)

     B      E      F      Total
B    0      3.54   2.50   6.04
E    3.54   0      1.12   4.66
F    2.50   1.12   0      3.62

F has the smallest total distance (3.62), so F is the medoid.

Clusters 2

C1 D, C, E
C2 F, A, B

Medoid of (D, C, E)

     D      C      E      Total
D    0      2.24   1.00   3.24
C    2.24   0      1.41   3.65
E    1.00   1.41   0      2.41

E has the smallest total distance (2.41), so E is the medoid.

Medoid of (F, A, B)

     F      A      B      Total
F    0      3.20   2.50   5.70
A    3.20   0      0.71   3.91
B    2.50   0.71   0      3.21

B has the smallest total distance (3.21), so B is the medoid.

Clusters 3

C1 E, C, D, F
C2 B, A

Medoid of (D, C, E, F)

     D      C      E      F      Total
D    0      2.24   1.00   0.50   3.74
C    2.24   0      1.41   2.50   6.15
E    1.00   1.41   0      1.12   3.53
F    0.50   2.50   1.12   0      4.12

E has the smallest total distance (3.53), so E remains the medoid.

Clusters 4

C1 D, C, E, F
C2 B, A

The clusters are unchanged from the previous step, so the algorithm terminates.
Therefore the final clusters are {A, B} and {C, D, E, F}.
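
The medoid-update step used in each table can be sketched as follows. Only the pairwise distances that appear in the tables above are included, so this is not the full distance matrix. The distance between E and F is taken as 1.12, the value given in the (B, E, F) table. The names PAIRS, dist and medoid are hypothetical helpers introduced for illustration.

```python
# Pairwise distances copied from the tables above. The matrix is symmetric,
# so each pair is stored once with its labels in alphabetical order.
PAIRS = {
    ("A", "B"): 0.71, ("A", "C"): 5.66, ("A", "D"): 3.61, ("A", "F"): 3.20,
    ("B", "E"): 3.54, ("B", "F"): 2.50, ("C", "D"): 2.24, ("C", "E"): 1.41,
    ("C", "F"): 2.50, ("D", "E"): 1.00, ("D", "F"): 0.50, ("E", "F"): 1.12,
}


def dist(a, b):
    """Look up the distance between two objects (0 for an object and itself)."""
    if a == b:
        return 0.0
    return PAIRS[tuple(sorted((a, b)))]


def medoid(cluster):
    """Return the member with the smallest total distance to the others."""
    totals = {m: round(sum(dist(m, o) for o in cluster), 2) for m in cluster}
    return min(totals, key=totals.get), totals


for cluster in (["A", "C", "D"], ["B", "E", "F"], ["D", "C", "E"], ["F", "A", "B"]):
    best, totals = medoid(cluster)
    print(cluster, "totals:", totals, "medoid:", best)
```

The reassignment step would additionally need the distances from every object to each medoid, which requires the full matrix from the exercise.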
