
Answer

Given: K = 2, initial cluster centers (0,0) and (5,6), and

Data samples: (0,0), (0,1), (1,0), (3,3), (5,6), (8,9), (9,8), (9,9)

Iteration 1

              Distance from               Distance from
Data Sample   Cluster-1’s center (0,0)    Cluster-2’s center (5,6)
(0,0)         0                           7.81
(0,1)         1                           7.07
(1,0)         1                           7.21
(3,3)         4.24                        3.61
(5,6)         7.81                        0
(8,9)         12.04                       4.24
(9,8)         12.04                       4.47
(9,9)         12.73                       5

Updated cluster centers:

For Cluster-1: centroid of samples (0,0), (0,1), and (1,0) = (0.33,0.33)

For Cluster-2: centroid of samples (3,3), (5,6), (8,9), (9,8), and (9,9) = (6.8,7)
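The assignment and update steps of one iteration can be sketched in plain Python (variable names such as `clusters` and `euclidean` are my own; no libraries are assumed):

```python
import math

points = [(0, 0), (0, 1), (1, 0), (3, 3), (5, 6), (8, 9), (9, 8), (9, 9)]
centers = [(0, 0), (5, 6)]  # initial cluster centers

def euclidean(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Assignment step: each point goes to its nearest center.
clusters = [[] for _ in centers]
for p in points:
    nearest = min(range(len(centers)), key=lambda i: euclidean(p, centers[i]))
    clusters[nearest].append(p)

# Update step: each center moves to the centroid of its cluster.
centers = [(sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
           for c in clusters]

print(clusters)  # [[(0,0), (0,1), (1,0)], [(3,3), (5,6), (8,9), (9,8), (9,9)]]
print(centers)   # [(0.33..., 0.33...), (6.8, 7.0)]
```

Running the same two steps again reproduces Iterations 2 and 3 above.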
Iteration 2

              Distance from                     Distance from
Data Sample   Cluster-1’s center (0.33,0.33)    Cluster-2’s center (6.8,7)
(0,0)         0.47                              9.76
(0,1)         0.75                              9.07
(1,0)         0.75                              9.09
(3,3)         3.78                              5.52
(5,6)         7.35                              2.06
(8,9)         11.58                             2.33
(9,8)         11.58                             2.42
(9,9)         12.26                             2.97

Updated cluster centers:

For Cluster-1: centroid of samples (0,0), (0,1), (1,0), and (3,3) = (1,1)

For Cluster-2: centroid of samples (5,6), (8,9), (9,8), and (9,9) = (7.75,8)

Iteration 3

              Distance from               Distance from
Data Sample   Cluster-1’s center (1,1)    Cluster-2’s center (7.75,8)
(0,0)         1.41                        11.14
(0,1)         1                           10.44
(1,0)         1                           10.47
(3,3)         2.83                        6.90
(5,6)         6.40                        3.40
(8,9)         10.63                       1.03
(9,8)         10.63                       1.25
(9,9)         11.31                       1.60
Updated cluster centers:

For Cluster-1: centroid of samples (0,0), (0,1), (1,0), and (3,3) = (1,1)

For Cluster-2: centroid of samples (5,6), (8,9), (9,8), and (9,9) = (7.75,8)

No change in cluster centers. Therefore, final clusters are

Cluster-1: (0,0), (0,1), (1,0), and (3,3)

Cluster-2: (5,6), (8,9), (9,8), and (9,9)

Within-cluster sum of squares (WCSS):

WCSS = {(0−1)² + (0−1)²} + {(0−1)² + (1−1)²} + {(1−1)² + (0−1)²} + {(3−1)² + (3−1)²}
     + {(5−7.75)² + (6−8)²} + {(8−7.75)² + (9−8)²} + {(9−7.75)² + (8−8)²} + {(9−7.75)² + (9−8)²}
     = 12 + 16.75 = 28.75
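A quick numerical check of this sum, assuming the final centers (1,1) and (7.75,8) computed above (the `clusters` mapping is my own notation):

```python
# Final clusters keyed by their centroid.
clusters = {
    (1.0, 1.0): [(0, 0), (0, 1), (1, 0), (3, 3)],
    (7.75, 8.0): [(5, 6), (8, 9), (9, 8), (9, 9)],
}

# WCSS: squared Euclidean distance of every point to its own centroid.
wcss = sum((x - cx) ** 2 + (y - cy) ** 2
           for (cx, cy), members in clusters.items()
           for (x, y) in members)

print(wcss)  # 28.75
```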
Subjecting the data points in Question 3.1 to K-medoids

Assumption: K=2, initial medoids are (0,0) & (5,6), and Manhattan distance is
used as a dissimilarity measure/metric.

              Manhattan distance from     Manhattan distance from
Data Sample   Cluster-1’s medoid (0,0)    Cluster-2’s medoid (5,6)
(0,0)         0                           11
(0,1)         1                           10
(1,0)         1                           10
(3,3)         6                           5
(5,6)         11                          0
(8,9)         17                          6
(9,8)         17                          6
(9,9)         18                          7

Cost = 0+1+1+5+0+6+6+7 = 26.

What will be the cost if, in Cluster-1, the medoid (0,0) is swapped with the non-medoid data point (1,0)?

              Manhattan distance from     Manhattan distance from
Data Sample   Cluster-1’s medoid (1,0)    Cluster-2’s medoid (5,6)
(0,0)         1                           11
(0,1)         2                           10
(1,0)         0                           10
(3,3)         5                           5
(5,6)         10                          0
(8,9)         16                          6
(9,8)         16                          6
(9,9)         17                          7

Cost = 1+2+0+5+0+6+6+7 = 27. The cost increases; therefore, the swap should be rejected.
Will there be a decrease in the cost if in Cluster-2, the medoid (5,6) is swapped
with the non-medoid data point (8,9)?

              Manhattan distance from     Manhattan distance from
Data Sample   Cluster-1’s medoid (0,0)    Cluster-2’s medoid (8,9)
(0,0)         0                           17
(0,1)         1                           16
(1,0)         1                           16
(3,3)         6                           11
(5,6)         11                          6
(8,9)         17                          0
(9,8)         17                          2
(9,9)         18                          1

Cost = 0+1+1+6+6+0+2+1 = 17. The cost decreases and therefore, they can be
swapped. The medoid for Cluster-2 should be (8,9).
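The three cost comparisons above can be reproduced with a small sketch (the `cost` helper is my own, not from any library):

```python
points = [(0, 0), (0, 1), (1, 0), (3, 3), (5, 6), (8, 9), (9, 8), (9, 9)]

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def cost(medoids):
    # Each point contributes its Manhattan distance to the nearest medoid.
    return sum(min(manhattan(p, m) for m in medoids) for p in points)

print(cost([(0, 0), (5, 6)]))  # 26: initial configuration
print(cost([(1, 0), (5, 6)]))  # 27: swap (0,0)->(1,0) rejected
print(cost([(0, 0), (8, 9)]))  # 17: swap (5,6)->(8,9) accepted
```

This is exactly the PAM-style swap test: a candidate swap is kept only if it lowers the total cost.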

Since no other swap reduces the cost below 17, the final clusters are formed from the medoids (0,0) and (8,9). The final clusters will be

Cluster-1: (0,0), (0,1), (1,0), and (3,3)

Cluster-2: (5,6), (8,9), (9,8), and (9,9)


Answer for part-(a): single-linkage clustering

(The matrices below hold pairwise similarities, with 1 meaning identical, so each iteration merges the pair of clusters with the highest entry; under single linkage, the similarity between two clusters is the maximum similarity over their cross pairs.)

ITERATION 1

STEP 1.1

P1 P2 P3 P4 P5 P6
P1 1
P2 0.7895 1
P3 0.1579 0.3684 1
P4 0.0100 0.2105 0.8421 1
P5 0.5292 0.7023 0.5292 0.3840 1
P6 0.3542 0.5480 0.6870 0.5573 0.8105 1

STEP 1.2

P1 P2 P34 P5 P6
P1 1
P2 0.7895 1
P34 0.1579 0.3684 1
P5 0.5292 0.7023 0.5292 1
P6 0.3542 0.5480 0.6870 0.8105 1
ITERATION 2

STEP 2.1

P1 P2 P34 P5 P6
P1 1
P2 0.7895 1
P34 0.1579 0.3684 1
P5 0.5292 0.7023 0.5292 1
P6 0.3542 0.5480 0.6870 0.8105 1

STEP 2.2

P1 P2 P34 P56
P1 1
P2 0.7895 1
P34 0.1579 0.3684 1
P56 0.5292 0.7023 0.6870 1

ITERATION 3

STEP 3.1

P1 P2 P34 P56
P1 1
P2 0.7895 1
P34 0.1579 0.3684 1
P56 0.5292 0.7023 0.6870 1

STEP 3.2

P12 P34 P56


P12 1
P34 0.3684 1
P56 0.7023 0.6870 1
ITERATION 4

STEP 4.1

P12 P34 P56


P12 1
P34 0.3684 1
P56 0.7023 0.6870 1

The next (i.e. topmost) level of the hierarchy will have P1256 and P34.

Resulting dendrogram
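The merge sequence shown in the dendrogram can be reproduced with a minimal sketch in plain Python (names such as `link` and `merges` are my own):

```python
# Pairwise similarities from STEP 1.1 (1 = identical).
sim = {
    ('P1', 'P2'): 0.7895, ('P1', 'P3'): 0.1579, ('P1', 'P4'): 0.0100,
    ('P1', 'P5'): 0.5292, ('P1', 'P6'): 0.3542, ('P2', 'P3'): 0.3684,
    ('P2', 'P4'): 0.2105, ('P2', 'P5'): 0.7023, ('P2', 'P6'): 0.5480,
    ('P3', 'P4'): 0.8421, ('P3', 'P5'): 0.5292, ('P3', 'P6'): 0.6870,
    ('P4', 'P5'): 0.3840, ('P4', 'P6'): 0.5573, ('P5', 'P6'): 0.8105,
}

def link(a, b):
    # Single linkage: best (largest) similarity over all cross pairs.
    return max(sim[tuple(sorted((x, y)))] for x in a for y in b)

clusters = [frozenset([p]) for p in ('P1', 'P2', 'P3', 'P4', 'P5', 'P6')]
merges = []
while len(clusters) > 1:
    # Merge the most similar pair of clusters.
    a, b = max(((a, b) for i, a in enumerate(clusters) for b in clusters[i + 1:]),
               key=lambda pair: link(*pair))
    merges.append((sorted(a | b), link(a, b)))
    clusters = [c for c in clusters if c not in (a, b)] + [a | b]

for members, score in merges:
    print(members, score)
# P34 (0.8421), P56 (0.8105), P12 (0.7895), P1256 (0.7023), all (0.6870)
```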
Answer for part-(b): complete-linkage clustering

(Under complete linkage, the similarity between two clusters is the minimum similarity over their cross pairs, so each merged row below keeps the smaller of the two combined values.)

ITERATION 1

STEP 1.1

P1 P2 P3 P4 P5 P6
P1 1
P2 0.7895 1
P3 0.1579 0.3684 1
P4 0.0100 0.2105 0.8421 1
P5 0.5292 0.7023 0.5292 0.3840 1
P6 0.3542 0.5480 0.6870 0.5573 0.8105 1

STEP 1.2

P1 P2 P34 P5 P6
P1 1
P2 0.7895 1
P34 0.0100 0.2105 1
P5 0.5292 0.7023 0.3840 1
P6 0.3542 0.5480 0.5573 0.8105 1

ITERATION 2

STEP 2.1

P1 P2 P34 P5 P6
P1 1
P2 0.7895 1
P34 0.0100 0.2105 1
P5 0.5292 0.7023 0.3840 1
P6 0.3542 0.5480 0.5573 0.8105 1
STEP 2.2

P1 P2 P34 P56
P1 1
P2 0.7895 1
P34 0.0100 0.2105 1
P56 0.3542 0.5480 0.3840 1

ITERATION 3

STEP 3.1

P1 P2 P34 P56
P1 1
P2 0.7895 1
P34 0.0100 0.2105 1
P56 0.3542 0.5480 0.3840 1

STEP 3.2

P12 P34 P56


P12 1
P34 0.0100 1
P56 0.3542 0.3840 1

ITERATION 4

STEP 4.1

P12 P34 P56


P12 1
P34 0.0100 1
P56 0.3542 0.3840 1

The next (i.e. topmost) level of the hierarchy will have P3456 and P12.
Resulting dendrogram
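The same sketch as in part-(a) reproduces this merge order once the scoring rule is flipped to the minimum cross-pair similarity (again, `link` and `merges` are my own names):

```python
# Pairwise similarities from STEP 1.1 (1 = identical).
sim = {
    ('P1', 'P2'): 0.7895, ('P1', 'P3'): 0.1579, ('P1', 'P4'): 0.0100,
    ('P1', 'P5'): 0.5292, ('P1', 'P6'): 0.3542, ('P2', 'P3'): 0.3684,
    ('P2', 'P4'): 0.2105, ('P2', 'P5'): 0.7023, ('P2', 'P6'): 0.5480,
    ('P3', 'P4'): 0.8421, ('P3', 'P5'): 0.5292, ('P3', 'P6'): 0.6870,
    ('P4', 'P5'): 0.3840, ('P4', 'P6'): 0.5573, ('P5', 'P6'): 0.8105,
}

def link(a, b):
    # Complete linkage: worst (smallest) similarity over all cross pairs.
    return min(sim[tuple(sorted((x, y)))] for x in a for y in b)

clusters = [frozenset([p]) for p in ('P1', 'P2', 'P3', 'P4', 'P5', 'P6')]
merges = []
while len(clusters) > 1:
    # Still merge the most similar pair; only the cluster score changed.
    a, b = max(((a, b) for i, a in enumerate(clusters) for b in clusters[i + 1:]),
               key=lambda pair: link(*pair))
    merges.append((sorted(a | b), link(a, b)))
    clusters = [c for c in clusters if c not in (a, b)] + [a | b]

for members, score in merges:
    print(members, score)
# P34 (0.8421), P56 (0.8105), P12 (0.7895), P3456 (0.3840), all (0.0100)
```

Note the effect of the stricter rule: P34 and P56 now merge at 0.3840 rather than P12 joining P56, which is why the topmost level pairs P3456 with P12.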
