
K-MEDOIDS CLUSTERING

A Problem of K-means
 Sensitive to outliers

Outlier: an object with an extremely large (or small) value
 May substantially distort the distribution of the data

[Figure: a compact cluster of points plus one distant outlier]
Limitations of K-means: Outlier Problem

• The k-means algorithm is sensitive to outliers, since an object with an
extremely large value may substantially distort the distribution of the data.

• Solution: instead of taking the mean value of the objects in a cluster as a
reference point, use a medoid: the most centrally located object in the cluster.
k-Medoids Clustering Method
 k-medoids: Find k representative objects, called medoids

PAM (Partitioning Around Medoids, 1987)

CLARA (Kaufmann & Rousseeuw, 1990)

CLARANS (Ng & Han, 1994): Randomized sampling

[Figure: the same data set clustered by k-means (left) and k-medoids (right)]
MEDOID

1. A medoid can be defined as the point in the cluster whose similarity
with all the other points in the cluster is maximal.

2. In k-medoids clustering, each cluster is represented by one of the data
points in the cluster. These points are named cluster medoids.

3. The term medoid refers to an object within a cluster for which the
average dissimilarity between it and all the other members of the cluster
is minimal. It corresponds to the most centrally located point in the
cluster.
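Definition 3 above can be turned into code directly: pick the point that minimizes the total distance to the rest. A minimal sketch (the helper name `medoid` and the choice of Euclidean distance are mine, not from the slides):

```python
import math

def medoid(points):
    # The medoid is the data point whose total distance
    # to all other points in the cluster is minimal.
    def total_dist(p):
        return sum(math.dist(p, q) for q in points)
    return min(points, key=total_dist)

cluster = [(1, 1), (2, 1), (2, 2), (8, 8)]
print(medoid(cluster))  # (2, 2): the most centrally located point
```

Note that the outlier (8, 8) pulls a mean far outside the dense group, but it cannot become the medoid, which is exactly the robustness argument made above.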
PAM (Partitioning Around Medoids) (1987)

 PAM (Kaufman and Rousseeuw, 1987)

 Arbitrarily choose k objects as the initial medoids
 Until no change, do
  (Re)assign each object to the cluster with the nearest medoid
  Improve the quality of the k-medoids (randomly select a non-medoid
  object, Orandom, and compute the total cost of swapping a medoid with
  Orandom)
 Works well for small data sets (e.g., 100 objects in 5 clusters)
 Not efficient for medium and large data sets
A Typical K-Medoids Algorithm (PAM)

K = 2; Total Cost({mi}) = 20

1. Arbitrarily choose k objects as the initial medoids (mi, i = 1..k)
2. Assign each remaining object to the nearest medoid
3. Randomly select a non-medoid object, Oj
4. Compute the total cost of all possible new sets of medoids
   (Oj, {mi}, i ≠ t), i.e., of swapping a medoid mt with Oj
5. If cost({Oj, {mi, i ≠ t}}) is the smallest and
   cost({Oj, {mi, i ≠ t}}) < cost({mi}), swap mt and Oj
   (in the illustration, the swap reduces the total cost to 18)
6. Do loop: repeat steps 2–5 until no change

[Figure: three scatter plots showing the initial medoids, the first
assignment, and the configuration after a cost-reducing swap]
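The loop above can be sketched in Python. This is a minimal illustration, not the optimized PAM of Kaufman and Rousseeuw; the function names `pam` and `total_cost` are mine, and Manhattan distance is assumed to match the worked example later in the deck:

```python
import random
from itertools import product

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def total_cost(points, medoids):
    # Sum of distances from each point to its nearest medoid.
    return sum(min(manhattan(p, m) for m in medoids) for p in points)

def pam(points, k, seed=0):
    random.seed(seed)
    medoids = random.sample(points, k)          # arbitrary initial medoids
    best = total_cost(points, medoids)
    improved = True
    while improved:                             # "until no change"
        improved = False
        non_medoids = [p for p in points if p not in medoids]
        for m, o in product(list(medoids), non_medoids):
            candidate = [o if x == m else x for x in medoids]
            cost = total_cost(points, candidate)
            if cost < best:                     # keep the swap only if it lowers cost
                medoids, best = candidate, cost
                improved = True
                break                           # rescan from the new medoid set
    return medoids, best

objs = [(2, 6), (3, 4), (3, 8), (4, 7), (6, 2),
        (6, 4), (7, 3), (7, 4), (8, 5), (7, 6)]
print(total_cost(objs, [(3, 4), (7, 4)]))  # 20, the cost of medoids (O2, O8)
meds, cost = pam(objs, 2)                  # converges to a set at least that good
```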
The k-Medoids Algorithm

Total swapping cost (TCih)

 Total swapping cost: TCih = Σj Cjih
 – where Cjih is the cost of swapping medoid i with h, summed over all
   non-medoid objects j
 – Cjih varies depending on the case

Sum of Squared Error (SSE)

 E = Σ_{i=1}^{k} Σ_{j ∈ Ci} d²(j, i)

[Figure: two clusters, C1 = {x1, x2, x3} and C2 = {y1, y2, y3}]
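The SSE above is straightforward to compute. A small sketch, assuming clusters are given as a mapping from each representative point (mean or medoid) to its members; the function name `sse` is my own:

```python
import math

def sse(clusters):
    # Sum of squared distances from each point to its cluster's
    # representative; `clusters` maps a representative to its members.
    return sum(math.dist(p, rep) ** 2
               for rep, members in clusters.items()
               for p in members)

clusters = {(0, 0): [(0, 1), (1, 0)], (5, 5): [(5, 6), (6, 5)]}
print(sse(clusters))  # 1 + 1 + 1 + 1 = 4.0
```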
K-MEDOIDS ALGORITHM: EXAMPLE-1

1. Randomly select K medoids
2. Allocate each point to its closest medoid
3. Determine the new medoid of each cluster
4. Reallocate each point to its closest medoid
5. Repeat steps 3–4 until the medoids no longer change, then stop the process

[Figure: sequence of scatter plots illustrating steps 1–5]
Case (i): Computation of Cjih

• j currently belongs to the cluster represented by medoid i (the medoid
  being swapped out)
• j is less similar to medoid t than to h

Therefore, after the swap, j belongs to the cluster represented by h.

Cjih = d(j, h) - d(j, i)

[Figure: j reassigned from i to h]
Case (ii): Computation of Cjih

• j currently belongs to the cluster represented by medoid i (the medoid
  being swapped out)
• j is more similar to medoid t than to h

Therefore, after the swap, j belongs to the cluster represented by t.

Cjih = d(j, t) - d(j, i)

[Figure: j reassigned from i to t]
Case (iii): Computation of Cjih

• j currently belongs to the cluster represented by medoid t
• j is more similar to medoid t than to h

Therefore, j stays in the cluster represented by t.

Cjih = 0

[Figure: j remains with t]
Case (iv): Computation of Cjih

• j currently belongs to the cluster represented by medoid t
• j is less similar to medoid t than to h

Therefore, after the swap, j belongs to the cluster represented by h.

Cjih = d(j, h) - d(j, t)

[Figure: j reassigned from t to h]

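All four cases collapse into a single rule: Cjih is the change in j's distance to its nearest medoid when i is replaced by h. A sketch (the function name `swap_cost` is mine; Manhattan distance is assumed to match the worked example below):

```python
def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def swap_cost(j, medoids, i, h):
    # Change in j's distance to its nearest medoid when medoid i is
    # replaced by non-medoid h; this covers all four cases at once.
    old = min(manhattan(j, m) for m in medoids)
    new_set = [h if m == i else m for m in medoids]
    new = min(manhattan(j, m) for m in new_set)
    return new - old

medoids = [(3, 4), (7, 4)]
print(swap_cost((6, 2), medoids, (7, 4), (7, 3)))  # -1: j follows the new medoid h
print(swap_cost((2, 6), medoids, (7, 4), (7, 3)))  # 0: case (iii), j stays with t
```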
PAM or K-Medoids: Example

Data Objects
      A1  A2
 O1    2   6
 O2    3   4
 O3    3   8
 O4    4   7
 O5    6   2
 O6    6   4
 O7    7   3
 O8    7   4
 O9    8   5
 O10   7   6

Goal: create two clusters
Randomly choose two medoids: O8 = (7,4) and O2 = (3,4)

[Figure: scatter plot of the ten objects]
PAM or K-Medoids: Example (cont.)

Assign each object to the closest representative object.

Using the L1 metric (Manhattan distance), we form the following clusters:
 Cluster1 = {O1, O2, O3, O4}
 Cluster2 = {O5, O6, O7, O8, O9, O10}

[Figure: scatter plot with cluster1 and cluster2 marked]
PAM or K-Medoids: Example (cont.)

Compute the absolute-error criterion for the set of medoids (O2, O8):

 E = Σ_{i=1}^{k} Σ_{p ∈ Ci} |p − Oi|
   = (|O1 − O2| + |O3 − O2| + |O4 − O2|)
   + (|O5 − O8| + |O6 − O8| + |O7 − O8| + |O9 − O8| + |O10 − O8|)

PAM or K-Medoids: Example (cont.)

The absolute-error criterion for the set of medoids (O2, O8):

 E = (3 + 4 + 4) + (3 + 1 + 1 + 2 + 2) = 20
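This arithmetic can be checked mechanically; a quick sketch using the Manhattan distances from the data table (the dictionary layout is my own):

```python
def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

objects = {'O1': (2, 6), 'O2': (3, 4), 'O3': (3, 8), 'O4': (4, 7),
           'O5': (6, 2), 'O6': (6, 4), 'O7': (7, 3), 'O8': (7, 4),
           'O9': (8, 5), 'O10': (7, 6)}

cluster1 = ['O1', 'O2', 'O3', 'O4']               # medoid O2 contributes 0
cluster2 = ['O5', 'O6', 'O7', 'O8', 'O9', 'O10']  # medoid O8 contributes 0

E = (sum(manhattan(objects[o], objects['O2']) for o in cluster1)
     + sum(manhattan(objects[o], objects['O8']) for o in cluster2))
print(E)  # (3+4+4) + (3+1+1+2+2) = 20
```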
PAM or K-Medoids: Example (cont.)

• Choose a random non-medoid object, O7
• Swap O8 and O7
• Compute the absolute-error criterion for the set of medoids (O2, O7):

 E = (3 + 4 + 4) + (2 + 2 + 1 + 3 + 3) = 22
PAM or K-Medoids: Example (cont.)

→ Compute the cost function:

 S = absolute error (O2, O7) − absolute error (O2, O8)
   = 22 − 20 = 2

S > 0 => it is a bad idea to replace O8 by O7
PAM or K-Medoids: Example (cont.)

The same result via the per-object swapping costs:

 C187 = 0, C387 = 0, C487 = 0      (cluster-1 objects are unaffected)
 C587 = d(O7, O5) − d(O8, O5) = 2 − 3 = −1
 C687 = d(O7, O6) − d(O8, O6) = 2 − 1 = 1
 C987 = d(O7, O9) − d(O8, O9) = 3 − 2 = 1
 C1087 = d(O7, O10) − d(O8, O10) = 3 − 2 = 1

 TCih = Σj Cjih = −1 + 1 + 1 + 1 = 2 = S
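The per-object costs can be reproduced as plain distance differences (a sketch; only the cluster-2 objects contribute, since C187 = C387 = C487 = 0):

```python
def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

O5, O6, O7, O8 = (6, 2), (6, 4), (7, 3), (7, 4)
O9, O10 = (8, 5), (7, 6)

# Cost contribution of each affected object when O8 is swapped with O7:
# distance to the new medoid minus distance to the old one.
costs = [manhattan(j, O7) - manhattan(j, O8) for j in (O5, O6, O9, O10)]
TC = sum(costs)
print(costs, TC)  # [-1, 1, 1, 1] 2
```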
CLARA (Clustering Large Applications) (1990)

CLARA (Clustering LARge Applications, Kaufmann and Rousseeuw, 1990)

Sampling-based method:
 It draws multiple samples of the data set,
 applies PAM on each sample, and
 returns the best clustering as the output (minimizing the cost/SSE)

Strength: deals with larger data sets than PAM

Weakness:
 Efficiency depends on the sample size
 A good clustering based on samples will not necessarily represent a
 good clustering of the whole data set if the sample is biased
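The sampling loop can be sketched as follows. For brevity, the inner PAM call is replaced by an exhaustive search over medoid sets in the sample, which is feasible only because the sample is small; `clara` and `best_medoids` are my names, and Manhattan distance is assumed:

```python
import random
from itertools import combinations

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def total_cost(points, medoids):
    return sum(min(manhattan(p, m) for m in medoids) for p in points)

def best_medoids(sample, k):
    # Stand-in for PAM on a small sample: try every k-subset as medoids.
    return min(combinations(sample, k), key=lambda ms: total_cost(sample, ms))

def clara(points, k, n_samples=5, sample_size=40, seed=0):
    rng = random.Random(seed)
    best, best_cost = None, float('inf')
    for _ in range(n_samples):                    # draw multiple samples
        sample = rng.sample(points, min(sample_size, len(points)))
        medoids = best_medoids(sample, k)         # cluster the sample
        cost = total_cost(points, medoids)        # evaluate on ALL points
        if cost < best_cost:                      # keep the best clustering
            best, best_cost = list(medoids), cost
    return best, best_cost
```

Evaluating each sample's medoids on the full data set is what lets CLARA pick the sample whose clustering generalizes best; a biased sample still yields poor medoids, which is exactly the weakness noted above.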
