Partitioning Algorithms: Basic Concepts: Partition N Objects Into K Clusters
Minimize the total square error over all k clusters:

$E = \sum_{i=1}^{k} \sum_{p \in C_i} |d(p, m_i)|^2$

where $C_i$ is the i-th cluster, $m_i$ is its center, and $d$ is the distance function.
Example of Square Error of a Cluster
Ci = {P1, P2, P3}, with P1 = (3, 7), P2 = (2, 3), P3 = (7, 5); center mi = (4, 5).
|d(P1, mi)|^2 = (3-4)^2 + (7-5)^2 = 5
|d(P2, mi)|^2 = (2-4)^2 + (3-5)^2 = 8
|d(P3, mi)|^2 = (7-4)^2 + (5-5)^2 = 9
Error(Ci) = 5 + 8 + 9 = 22
[Figure: P1, P2, P3 and the cluster center mi plotted on a 10x10 grid.]
Example of Square Error of a Cluster
Cj = {P4, P5, P6}, with P4 = (4, 6), P5 = (5, 5), P6 = (3, 4); center mj = (4, 5).
|d(P4, mj)|^2 = (4-4)^2 + (6-5)^2 = 1
|d(P5, mj)|^2 = (5-4)^2 + (5-5)^2 = 1
|d(P6, mj)|^2 = (3-4)^2 + (4-5)^2 = 2
Error(Cj) = 1 + 1 + 2 = 4
[Figure: P4, P5, P6 and the cluster center mj plotted on a 10x10 grid.]
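Both computations can be checked in a few lines of Python; this is a minimal sketch, and the function name sq_error is illustrative, not from the slides.

def sq_error(points, center):
    # Sum of squared Euclidean distances from each point to the center.
    return sum(sum((x - c) ** 2 for x, c in zip(p, center)) for p in points)

Ci = [(3, 7), (2, 3), (7, 5)]      # P1, P2, P3
Cj = [(4, 6), (5, 5), (3, 4)]      # P4, P5, P6
print(sq_error(Ci, (4, 5)))        # 22
print(sq_error(Cj, (4, 5)))        # 4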
Partitioning Algorithms: Basic Concepts
Global optimum: examine all possible partitions; there are on the order of k^n of them, too expensive!
Heuristic methods: k-means and k-medoids
k-means (MacQueen '67): each cluster is represented by the center (mean) of the cluster
k-medoids (Kaufman & Rousseeuw '87): each cluster is represented by one of the objects (the medoid) in the cluster
K-means
Initialization
    Arbitrarily choose k objects as the initial cluster centers (centroids)
Iterate until no change
    For each object Oi
        Calculate the distances between Oi and the k centroids
        (Re)assign Oi to the cluster whose centroid is closest to Oi
    Update the cluster centroids based on the current assignment
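A minimal Python sketch of this loop, assuming objects are tuples of numbers and using squared Euclidean distance; all names here are illustrative.

import random

def kmeans(objects, k, max_iter=100):
    centroids = random.sample(objects, k)       # arbitrary initial centers
    assignment = [None] * len(objects)
    for _ in range(max_iter):
        # (Re)assign each object to the cluster with the closest centroid.
        new_assignment = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(o, centroids[c])))
            for o in objects
        ]
        if new_assignment == assignment:        # no change: stop iterating
            break
        assignment = new_assignment
        # Update each centroid to the mean of its current members.
        for c in range(k):
            members = [o for o, a in zip(objects, assignment) if a == c]
            if members:                         # guard against empty clusters
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignment, centroids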
The k-Means Clustering Method
[Figure: four panels of a k-means run on a 10x10 grid; objects are assigned to the current cluster means, the means are then relocated, and objects are reassigned to form new clusters, until assignments stabilize.]
Example
For simplicity, one-dimensional objects and k = 2. Objects: 1, 2, 5, 6, 7.
K-means:
Randomly select 5 and 6 as the initial centroids
=> two clusters {1, 2, 5} and {6, 7}; meanC1 = 8/3, meanC2 = 6.5
=> {1, 2} and {5, 6, 7}; meanC1 = 1.5, meanC2 = 6
=> no change.
Aggregate dissimilarity = 0.5^2 + 0.5^2 + 1^2 + 1^2 = 2.5
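Running the same steps deterministically (initial centroids fixed at 5 and 6 rather than chosen at random) reproduces this trace; the sketch below is self-contained and one-dimensional.

objects = [1, 2, 5, 6, 7]
centroids = [5.0, 6.0]
for _ in range(10):                 # more than enough iterations to converge
    clusters = [[], []]
    for o in objects:
        nearest = min((0, 1), key=lambda i: (o - centroids[i]) ** 2)
        clusters[nearest].append(o)
    centroids = [sum(c) / len(c) for c in clusters]
print(centroids)                    # [1.5, 6.0]
print(sum(min((o - c) ** 2 for c in centroids) for o in objects))   # 2.5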
Variations of the k-Means Method
Aspects in which variants of k-means differ:
    Selection of the initial k centroids
        E.g., choose the k farthest points
    Dissimilarity calculations
        E.g., use Manhattan distance
    Strategies to calculate cluster means
        E.g., update the means incrementally
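Two of these variations sketched in Python; the function names are illustrative. Manhattan distance simply replaces the dissimilarity, and greedy farthest-point seeding is one way to choose the initial k centroids.

def manhattan(p, q):
    # L1 (Manhattan) dissimilarity between two tuples.
    return sum(abs(x - y) for x, y in zip(p, q))

def farthest_first(objects, k, dist=manhattan):
    # Greedy "choose k farthest points" seeding: start from one object,
    # then repeatedly add the object farthest from the centers so far.
    centers = [objects[0]]
    while len(centers) < k:
        centers.append(max(objects,
                           key=lambda o: min(dist(o, c) for c in centers)))
    return centers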
Strengths of the k-Means Method
Strength
    Relatively efficient for large data sets: O(tkn), where n is the number of objects, k the number of clusters, and t the number of iterations; normally k, t << n
Weaknesses of the k-Means Method
Weakness
    Applicable only when the mean is defined; then what about categorical data?
        -> k-modes algorithm
    Unable to handle noisy data and outliers
        -> k-medoids algorithm
    Need to specify k, the number of clusters, in advance
        -> hierarchical algorithms, density-based algorithms
k-modes Algorithm
Handling categorical data: k-modes (Huang '98)
    Replaces the means of clusters with modes
    Given n records in a cluster, the mode is the record made up of the most frequent attribute values
    Uses new dissimilarity measures to deal with categorical objects

Example cluster:

    age      income   student   credit_rating
    <=30     high     no        fair
    <=30     high     no        excellent
    31...40  high     no        fair
    >40      medium   no        fair
    >40      low      yes       fair
    >40      low      yes       excellent
    31...40  low      yes       excellent
    <=30     medium   no        fair
    <=30     low      yes       fair
    >40      medium   yes       fair
    <=30     medium   yes       excellent
    31...40  medium   no        excellent
    31...40  high     yes       fair

In this example cluster, mode = (<=30, medium, yes, fair).
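The mode computation is a per-attribute majority vote, as this sketch shows for the table above (Counter.most_common picks the most frequent value in each column).

from collections import Counter

records = [  # (age, income, student, credit_rating)
    ("<=30", "high", "no", "fair"),
    ("<=30", "high", "no", "excellent"),
    ("31...40", "high", "no", "fair"),
    (">40", "medium", "no", "fair"),
    (">40", "low", "yes", "fair"),
    (">40", "low", "yes", "excellent"),
    ("31...40", "low", "yes", "excellent"),
    ("<=30", "medium", "no", "fair"),
    ("<=30", "low", "yes", "fair"),
    (">40", "medium", "yes", "fair"),
    ("<=30", "medium", "yes", "excellent"),
    ("31...40", "medium", "no", "excellent"),
    ("31...40", "high", "yes", "fair"),
]
mode = tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))
print(mode)   # ('<=30', 'medium', 'yes', 'fair')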
A Problem of k-Means
Sensitive to outliers
    Outlier: an object with extremely large (or small) values
    May substantially distort the distribution of the data
[Figure: an outlier drags the cluster mean (+) away from the bulk of the cluster.]
k-Medoids Clustering Method
k-medoids: Find k representative objects, called medoids
PAM (Partitioning Around Medoids, 1987)
CLARA (Kaufmann & Rousseeuw, 1990)
CLARANS (Ng & Han, 1994): Randomized sampling
[Figure: the same data set clustered by k-means, where centers are means, and by k-medoids, where centers are actual objects.]
PAM (Partitioning Around Medoids) (1987)
Minimizes the square error $E = \sum_{i=1}^{k} \sum_{p \in C_i} |d(p, m_i)|^2$, where each representative $m_i$ is an actual object (medoid) of cluster $C_i$.
For each candidate swap of a medoid m with a non-medoid object h, compute Eh - Em
    Negative: the swap brings benefit
Choose the swap with the minimum swapping cost
Four Swapping Cases
When a medoid m is to be swapped with a non-medoid object h, check each of the other non-medoid objects j:
j is in the cluster of m -> reassign j
    Case 1: j is closer to some other medoid k than to h; after swapping m and h, j relocates to the cluster represented by k
    Case 2: j is closer to h than to any other medoid k; after swapping m and h, j is in the cluster represented by h
j is in the cluster of some medoid k, not m -> compare k with h
    Case 3: j is closer to k than to h; after swapping m and h, j stays in the cluster represented by k
    Case 4: j is closer to h than to k; after swapping m and h, j relocates to the cluster represented by h
PAM Clustering: Total Swapping Cost
$TC_{mh} = \sum_j C_{jmh}$, where $C_{jmh}$ is the change in j's contribution to the square error when m is swapped with h.
[Figure: four panels on a 10x10 grid illustrating Cases 1-4, with an object j, current medoids m and k, and the candidate medoid h.]
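A sketch of the swap-cost computation in Python. Taking the minimum distance over the new medoid set covers all four cases at once; d is an assumed distance function and all names are illustrative.

def swap_cost(objects, medoids, m, h, d):
    # TC_mh = sum over all other non-medoid objects j of C_jmh, the change
    # in j's distance to its closest medoid when m is replaced by h.
    new_medoids = [x for x in medoids if x != m] + [h]
    tc = 0.0
    for j in objects:
        if j in medoids or j == h:
            continue
        before = min(d(j, o) for o in medoids)        # current assignment
        after = min(d(j, o) for o in new_medoids)     # assignment after swap
        tc += after - before                          # C_jmh
    return tc

PAM performs the (m, h) swap with the most negative TC_mh and repeats until no swap has a negative cost.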
Strength and Weakness of PAM
Strength: more robust than k-means in the presence of noise and outliers, because a medoid is less influenced by extreme values than a mean
Weakness: works well for small data sets but does not scale; each iteration examines all k(n-k) candidate swaps
CLARA (Clustering Large Applications) (1990)
CLARA (Kaufmann and Rousseeuw, 1990)
    Built into statistical analysis packages, such as S+
    Draws multiple samples of the data set, applies PAM on each sample, and returns the best clustering as output
    Handles larger data sets than PAM (e.g., 1,000 objects in 10 clusters)
    Efficiency and effectiveness depend on the sampling
CLARA - Algorithm
Set mincost to MAXIMUM;
Repeat q times                  // draw q samples
    Create S by drawing s objects randomly from D;
    Generate the set of medoids M from S by applying the PAM algorithm;
    Compute cost(M, D);
    If cost(M, D) < mincost
        mincost = cost(M, D);
        bestset = M;
    Endif;
Endrepeat;
Return bestset;
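A direct Python transcription of this pseudocode. pam(S, k) and cost(M, D) are assumed helpers: the PAM procedure applied to the sample, and the total dissimilarity of the whole data set under medoid set M.

import random

def clara(D, k, q, s, pam, cost):
    mincost, bestset = float("inf"), None
    for _ in range(q):                  # draw q samples
        S = random.sample(D, s)
        M = pam(S, k)                   # run PAM on the sample only...
        c = cost(M, D)                  # ...but evaluate on the whole data set
        if c < mincost:
            mincost, bestset = c, M
    return bestset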
Complexity of CLARA
Set mincost to MAXIMUM;                               O(1)
Repeat q times                                        O((s-k)^2 * k + (n-k) * k) per pass
    Create S by drawing s objects randomly from D;    O(1)
    Generate the set of medoids M from S by
        applying the PAM algorithm;                   O((s-k)^2 * k)
    Compute cost(M, D);                               O((n-k) * k)
    If cost(M, D) < mincost                           O(1)
        mincost = cost(M, D);
        bestset = M;
    Endif;
Endrepeat;
Return bestset;

Total: O(q * ((s-k)^2 * k + (n-k) * k)).
Strengths and Weaknesses of CLARA
Strength:
    Handles larger data sets than PAM (e.g., 1,000 objects in 10 clusters)
Weaknesses:
    Efficiency depends on the sample size
    A good clustering of the samples does not necessarily represent a good clustering of the whole data set if the samples are biased
CLARANS ("Randomized" CLARA) (1994)
CLARANS (A Clustering Algorithm based on Randomized Search) (Ng and Han '94)
CLARANS draws samples in the solution space dynamically
    A solution is a set of k medoids
    The solution space contains $\binom{n}{k}$ solutions in total
    The solution space can be represented by a graph where every node is a potential solution, i.e., a set of k medoids
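The combinatorial size of this space is easy to check; for instance, for n = 1,000 objects and k = 10 medoids:

import math
print(math.comb(1000, 10))   # ~2.63e23 candidate solutions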
Graph Abstraction
Every node is a potential solution (a set of k medoids)
Every node is associated with a squared error
Two nodes are adjacent if they differ by exactly one medoid
Every node has k(n-k) adjacent nodes
[Figure: the node {O1, O2, ..., Ok} and its k(n-k) neighbors, such as {O(k+1), O2, ..., Ok}, ..., {On, O2, ..., Ok}, each obtained by swapping one medoid for one non-medoid object.]
CLARANS
[Figure: CLARANS search. Starting from a random current node C, compare C with at most maxneighbor randomly chosen neighbors N; whenever a neighbor has lower cost, move to it and restart the comparison count. If no better neighbor is found within maxneighbor comparisons, C is a local minimum. Repeat from numlocal random starting nodes and return the best node found.]
CLARANS - Algorithm
Set mincost to MAXIMUM;
For i = 1 to numlocal do            // find numlocal local minima
    Randomly select a node as the current node C in the graph;
    J = 1;                          // counter of neighbors examined
    Repeat
        Randomly select a neighbor N of C;
        If Cost(N, D) < Cost(C, D)
            Assign N as the current node C;
            J = 1;
        Else
            J++;
        Endif;
    Until J > maxneighbor;
    Update mincost and bestnode with Cost(C, D) and C if Cost(C, D) < mincost;
End For;
Return bestnode;
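A Python sketch of this search. cost(M, D) is an assumed helper returning the total dissimilarity of D under medoid set M; the other names follow the pseudocode.

import random

def clarans(D, k, numlocal, maxneighbor, cost):
    bestnode, mincost = None, float("inf")
    for _ in range(numlocal):               # find numlocal local minima
        C = random.sample(D, k)             # a random node: a set of k medoids
        j = 1
        while j <= maxneighbor:
            # A random neighbor differs from C in exactly one medoid.
            N = list(C)
            N[random.randrange(k)] = random.choice(
                [o for o in D if o not in C])
            if cost(N, D) < cost(C, D):
                C, j = N, 1                 # move to the better neighbor
            else:
                j += 1
        if cost(C, D) < mincost:            # C is now a local minimum
            mincost, bestnode = cost(C, D), C
    return bestnode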
Graph Abstraction (k-means, k-modes, k-medoids)
Each vertex is a set of k representative objects (means, modes, or medoids)
Each iteration produces a new set of k representative objects with lower overall dissimilarity
Iterations correspond to a hill-descent process in a landscape (graph) of vertices
Comparison with PAM
PAM searches for the minimum in the graph (landscape): at each step, all adjacent vertices are examined, and the one with the deepest descent is chosen as the next set of k medoids; the search continues until a minimum is reached.
For large n and k (e.g., n = 1,000, k = 10), examining all k(n-k) adjacent vertices is time consuming, so PAM is inefficient for large data sets.
CLARANS vs. PAM:
    For large and medium data sets, CLARANS is much more efficient than PAM
    Even for small data sets, CLARANS significantly outperforms PAM
When n = 80, CLARANS is 5 times faster than PAM, while the cluster quality is the same.
Comparison with CLARA
CLARANS vs. CLARA:
    CLARANS is always able to find clusterings of better quality than those found by CLARA, though CLARANS may use much more time than CLARA
    When the time used is the same, CLARANS is still better than CLARA
Hierarchies of Co-expressed Genes and Coherent Patterns
The interpretation of co-expressed genes and coherent patterns mainly depends on domain knowledge.
A Subtle Situation
[Figure: group A splits into two subgroups, A1 and A2, at a finer clustering resolution.]