Dr. Vani V
Source: Textbook 1 & Textbook 2
Cluster Analysis: Basic Concepts and Methods
■ Cluster Analysis: Basic Concepts
■ Partitioning Methods
■ Hierarchical Methods
■ Density-Based Methods
■ Grid-Based Methods
■ Evaluation of Clustering
■ Summary
Hierarchical Clustering
■ Uses the distance matrix as the clustering criterion. This method
does not require the number of clusters k as an input, but it needs a
termination condition
[Figure: agglomerative clustering (AGNES) proceeds from Step 0 to Step 4, merging a, b, c, d, e into ab, de, cde, and finally abcde; divisive clustering (DIANA) performs the same steps in reverse order, Step 4 back to Step 0.]
Hierarchical Clustering
[Figure: example dendrogram; leaves ordered 1, 3, 2, 5, 4, 6, with merge heights between 0 and 0.1.]
AGNES (Agglomerative Nesting)
■ Introduced in Kaufmann and Rousseeuw (1990)
■ Implemented in statistical packages, e.g., S-PLUS
■ Uses the single-link method and the dissimilarity matrix
■ Merges the nodes that have the least dissimilarity
■ Continues merging in non-descending order of dissimilarity
■ Eventually all nodes belong to the same cluster
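As a minimal sketch (not from the slides), the behaviour described above can be reproduced with SciPy, whose linkage function with method='single' implements the single-link merging rule; the toy data below is hypothetical.

import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

# Hypothetical toy data: five objects with two features each.
X = np.array([[0.0, 0.0], [0.5, 0.2], [4.0, 4.1], [4.2, 3.9], [9.0, 9.0]])
# Single link = merge the pair of clusters with the least dissimilarity first.
Z = linkage(X, method='single', metric='euclidean')
print(Z)       # each row: (cluster i, cluster j, merge distance, merged size)
dendrogram(Z)  # tree of nested clusters; all objects end up in one cluster
plt.show()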
Dendrogram: Shows How Clusters Are Merged
Decomposes data objects into several levels of nested partitioning (a tree of
clusters), called a dendrogram
Agglomerative Clustering Algorithm
■ The more popular hierarchical clustering technique
■ Basic algorithm:
1. Compute the proximity matrix
2. Let each data point be a cluster
3. Repeat: merge the two closest clusters and update the proximity matrix
4. Until only a single cluster remains
■ The key operation is the computation of the proximity of two clusters;
a from-scratch sketch of the loop follows below
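A minimal from-scratch sketch of the basic algorithm, assuming single link as the inter-cluster proximity (the function name and data layout are mine, not from the slides):

import numpy as np

def agglomerate_single_link(D, k=1):
    """Merge the two closest clusters until k clusters remain.
    D is a symmetric distance matrix; returns clusters as index lists."""
    clusters = [[i] for i in range(D.shape[0])]  # step 2: each point is a cluster
    while len(clusters) > k:                     # step 3: repeat until done
        best = (np.inf, None, None)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single link: smallest pointwise distance between the clusters
                d = min(D[i, j] for i in clusters[a] for j in clusters[b])
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters[b]  # merge the two closest clusters
        del clusters[b]             # the "matrix update" happens implicitly
    return clusters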
DIANA (Divisive Analysis)
■ Introduced in Kaufmann and Rousseeuw (1990)
■ Inverse order of AGNES: starts with all objects in one cluster, which is
split step by step
■ Eventually each node forms a cluster on its own
Distance between Clusters
■ Single link: smallest distance between an element in one cluster
and an element in the other, i.e., dist(Ki, Kj) = min(tip, tjq)
■ Example: 1. Calculate the distance matrix. 2. Calculate the three cluster
distances between C1 = {a, b} and C2 = {c, d, e}.

  a b c d e
a 0 1 3 4 5
b 1 0 2 3 4
c 3 2 0 1 2
d 4 3 1 0 1
e 5 4 2 1 0

Single link:
dist(C1, C2) = min{d(a, c), d(a, d), d(a, e), d(b, c), d(b, d), d(b, e)}
             = min{3, 4, 5, 2, 3, 4} = 2
Complete link:
dist(C1, C2) = max{d(a, c), d(a, d), d(a, e), d(b, c), d(b, d), d(b, e)}
             = max{3, 4, 5, 2, 3, 4} = 5
Average:
dist(C1, C2) = (d(a, c) + d(a, d) + d(a, e) + d(b, c) + d(b, d) + d(b, e)) / 6
             = (3 + 4 + 5 + 2 + 3 + 4) / 6 = 21/6 = 3.5
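The three numbers above can be checked directly from the distance matrix; a minimal sketch in Python (the variable names are mine):

import numpy as np

# Distance matrix for objects a, b, c, d, e; C1 = {a, b}, C2 = {c, d, e}.
D = np.array([[0, 1, 3, 4, 5],
              [1, 0, 2, 3, 4],
              [3, 2, 0, 1, 2],
              [4, 3, 1, 0, 1],
              [5, 4, 2, 1, 0]], dtype=float)
C1, C2 = [0, 1], [2, 3, 4]
cross = D[np.ix_(C1, C2)]   # 2x3 block of between-cluster distances
print(cross.min())          # single link   -> 2.0
print(cross.max())          # complete link -> 5.0
print(cross.mean())         # average       -> 3.5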
Centroid, Radius and Diameter of a Cluster
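The formulas on this slide did not survive extraction; for a cluster of N points t1, ..., tN, the standard textbook definitions (Han & Kamber) are:

C_m = \frac{\sum_{i=1}^{N} t_i}{N}
\qquad
R_m = \sqrt{\frac{\sum_{i=1}^{N} (t_i - C_m)^2}{N}}
\qquad
D_m = \sqrt{\frac{\sum_{i=1}^{N} \sum_{j=1}^{N} (t_i - t_j)^2}{N(N-1)}}

The centroid Cm is the "middle" of the cluster; the radius Rm is the square root of the average squared distance from any point to the centroid; the diameter Dm is the square root of the average squared pairwise distance between points in the cluster.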
Starting Situation
■ Start with clusters of individual points and a proximity matrix
[Figure: proximity matrix with one row and one column per point p1, p2, p3, p4, p5, ..., and the points p1 ... p12 shown as singleton clusters.]
Intermediate Situation
■ After some merging steps, we have some clusters
[Figure: proximity matrix with one row and one column per cluster C1-C5, and the points p1 ... p12 grouped into those five clusters.]
Intermediate Situation
■ We want to merge the two closest clusters (C2 and C5) and
update the proximity matrix.
[Figure: the same proximity matrix, with the rows and columns of the two closest clusters, C2 and C5, highlighted.]
After Merging
■ The question is “How do we update the proximity matrix?”
[Figure: after merging, the proximity matrix has rows and columns for C1, C2 U C5, C3 and C4; every entry involving the new cluster C2 U C5 is marked "?" and must be recomputed.]
How to Define Inter-Cluster Similarity?
[Figure: two clusters and their proximity matrix, with the question "Similarity?" posed between them.]
■ MIN
■ MAX
■ Group Average
■ Distance Between Centroids
■ Other methods driven by an objective function
– Ward’s Method uses squared error
Single Link – Complete Link
■ Strength of MIN (single link): can handle non-elliptical cluster shapes
■ Limitation of MIN: sensitive to noise and outliers
■ Strength of MAX (complete link): less susceptible to noise and outliers
■ Limitations of MAX: tends to break large clusters; biased towards
globular clusters
Hierarchical Clustering: MIN
[Figure: six points clustered with single link (MIN): nested clusters plus a dendrogram with leaves ordered 3, 6, 4, 1, 2, 5 and merge heights between 0 and 0.35; only one row of the underlying distance matrix (point 1: 0, .24, .22, .37, .34, .23) survived extraction.]
Cluster Similarity: Group Average
■ Proximity of two clusters is the average of pairwise proximity
between points in the two clusters.
proximity(Clusteri, Clusterj) =
( Σ pi∈Clusteri, pj∈Clusterj proximity(pi, pj) ) / ( |Clusteri| * |Clusterj| )

[Figure: group-average dendrogram for the six-point example; leaves ordered 3, 6, 4, 1, 2, 5.]
Hierarchical Clustering: Group Average
Cluster Similarity: Ward’s Method
■ Similarity of two clusters is based on the increase in squared error when
the two clusters are merged
■ Hierarchical analogue of k-means; can be used to initialize k-means
Hierarchical Clustering: Comparison
[Figure: the same five-point data set clustered four ways (MIN, MAX, Group Average, Ward’s Method), showing how each linkage groups the points differently.]
Agglomerative Algorithm
■ The agglomerative algorithm is carried out in three steps:
1. Convert object attributes into a distance matrix
2. Set each object as a cluster (for N objects, N clusters at the start)
3. Repeat: merge the two closest clusters and update the distance matrix,
until the termination condition (e.g., a single cluster) is reached
Example
■ Problem: clustering analysis with the agglomerative algorithm
[Figure: a data matrix is converted, via Euclidean distance, into a distance matrix.]
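The original data matrix was lost in extraction; as a hypothetical stand-in, the step from data matrix to Euclidean distance matrix looks like this in SciPy (the six 2-D points are invented for illustration):

import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical data matrix: one row per object, one column per feature.
data = np.array([[1.0, 1.0], [1.5, 1.5], [5.0, 5.0],
                 [3.0, 4.0], [4.0, 4.0], [3.0, 3.5]])
dist = squareform(pdist(data, metric='euclidean'))  # symmetric n x n matrix
print(np.round(dist, 2))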
Example
■ Merge two closest clusters (iteration 1)
Example
■ Update distance matrix (iteration 1)
Example
■ Merge two closest clusters (iteration 2)
Example
■ Update distance matrix (iteration 2)
Example
■ Merge two closest clusters/update distance matrix (iteration 3)
Example
■ Merge two closest clusters/update distance matrix (iteration 4)
Example
■ Dendrogram tree representation
Example
■ Dendrogram tree representation: “clustering” USA states
Strengths of Hierarchical Clustering
■ No need to assume any particular number of clusters: any desired number
can be obtained by “cutting” the dendrogram at the proper level
■ The hierarchy may correspond to meaningful taxonomies (e.g., in the
biological sciences)
Hierarchical Clustering: Problems and Limitations
■ High computational complexity in time and space
■ Once a decision is made to combine two clusters, it cannot be undone
■ No objective function is directly minimized
■ Different schemes have problems with one or more of the following:
– Sensitivity to noise and outliers
– Difficulty handling different-sized clusters and convex shapes
– Breaking large clusters
Exercise
Given a data set of five objects characterized by a single continuous feature:

          a  b  c  d  e
Feature   1  2  4  5  6

Apply the agglomerative algorithm with the single-link, complete-link and
average cluster-distance measures to produce three dendrogram trees,
respectively. The Euclidean distance matrix is:

  a b c d e
a 0 1 3 4 5
b 1 0 2 3 4
c 3 2 0 1 2
d 4 3 1 0 1
e 5 4 2 1 0
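A sketch of how the three dendrograms can be produced programmatically; SciPy's 'single', 'complete' and 'average' methods correspond to the three cluster-distance measures in the exercise:

import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

X = np.array([[1.0], [2.0], [4.0], [5.0], [6.0]])  # feature values of a..e
for i, method in enumerate(['single', 'complete', 'average'], start=1):
    Z = linkage(X, method=method, metric='euclidean')
    plt.subplot(1, 3, i)
    plt.title(method)
    dendrogram(Z, labels=['a', 'b', 'c', 'd', 'e'])
plt.show()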
Summary
■ A hierarchical algorithm is a sequential clustering algorithm
– Uses a distance matrix to construct a tree of clusters (dendrogram)
– Gives a hierarchical representation without needing to know the number of
clusters in advance (a termination condition can be set if it is known)
■ Major weaknesses of agglomerative clustering methods
– Can never undo what was done previously
– Sensitive to the cluster-distance measure and to noise/outliers
– Less efficient: O(n² log n) time, where n is the total number of objects
■ There are several variants that overcome these weaknesses
– BIRCH: scalable to large data sets
– ROCK: clustering categorical data
– CHAMELEON: hierarchical clustering using dynamic modelling