
Agglomerative Hierarchical Clustering

Agglomerative clustering is a hierarchical clustering technique that builds nested clusters with a bottom-up approach: each data point starts in its own cluster and, as we move up, the closest clusters are merged on the basis of a distance matrix.

Agglomerative Hierarchical Clustering

We start with each object in a cluster of its own and then repeatedly merge the closest pair of clusters until we end up with just one cluster containing everything.
The basic algorithm is as follows:

1. Assign each object to its own single-object cluster and calculate the distance between each pair of clusters.
2. Choose the closest pair of clusters and merge them into a single cluster (so reducing the total number of clusters by one).
3. Calculate the distance between the new cluster and each of the old clusters.
4. Repeat steps 2 and 3 until all the objects are in a single cluster.
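As an illustration, the following minimal Python sketch follows these four steps directly, assuming single linkage (the distance between two clusters is the smallest point-to-point distance between them). The function name agglomerate and the sample points are invented for this example and are not part of the original slides.

from math import dist

def agglomerate(points):
    """points: list of coordinate tuples. Returns the sequence of merges."""
    # Step 1: every object starts in its own single-object cluster.
    clusters = [[i] for i in range(len(points))]
    merges = []
    # Step 4: repeat until everything is in one cluster.
    while len(clusters) > 1:
        # Step 2: find the closest pair of clusters (single linkage:
        # the smallest point-to-point distance between the two clusters).
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        # Step 3: merge the closest pair; distances involving the new
        # cluster are recomputed on the next pass through the loop.
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
        merges.append((merged, round(d, 2)))
    return merges

# Tiny example: objects 0 and 1 merge first, then 2 and 3, then everything.
print(agglomerate([(0, 0), (1, 0), (5, 0), (6, 0)]))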
Agglomerative Hierarchical Clustering

Suppose we start with eleven objects A, B, C, . . . , K located as shown and we merge clusters on the basis of Euclidean distance.

Agglomerative Hierarchical Clustering

Let us assume the process starts by choosing objects A and B as the pair that are closest and merging them into a new cluster, which we will call AB. The next step may be to choose clusters AB and C as the closest pair and to merge them. After two passes the clusters then look as shown.

Agglomerative Hierarchical Clustering

We will use notation such as A and B → AB to mean ‘clusters A and B are merged to form a new cluster, which we will call AB’. Without knowing the precise distances between each pair of objects, a plausible sequence of events is as follows.

Agglomerative Hierarchical Clustering

The final result of this hierarchical clustering process is shown below in what is called a dendrogram. A dendrogram is a binary tree (two branches at each node). However, the positioning of the clusters does not correspond to their physical location in the original diagram. All the original objects are placed at the same level (the bottom of the diagram) as leaf nodes.
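For reference, a dendrogram like this can be produced with SciPy. The sketch below is only an approximation of the slide's figure, since the actual coordinates of the objects A, ..., K are not given in the text and are generated at random here.

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

labels = list("ABCDEFGHIJK")
# Random 2-D positions stand in for the slide's eleven objects.
points = np.random.default_rng(0).uniform(0, 10, size=(len(labels), 2))

# Single linkage with Euclidean distance; each row of Z records one merge.
Z = linkage(points, method="single", metric="euclidean")

dendrogram(Z, labels=labels)   # all leaves sit at the same (bottom) level
plt.ylabel("merge distance")
plt.show()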

Linkage Methods of Clustering

• Single Linkage: the distance between two clusters is the minimum distance between any member of cluster 1 and any member of cluster 2.
• Complete Linkage: the distance between two clusters is the maximum distance between any member of cluster 1 and any member of cluster 2.
• Average Linkage: the distance between two clusters is the average distance between the members of cluster 1 and the members of cluster 2.
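In SciPy these three criteria correspond to the method argument of scipy.cluster.hierarchy.linkage. The toy points below are an assumption used only to show how the merge distances change with the linkage choice.

import numpy as np
from scipy.cluster.hierarchy import linkage

X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.5], [9.0, 0.5]])

for method in ("single", "complete", "average"):
    Z = linkage(X, method=method)   # minimum / maximum / average distance
    # The third column of Z holds the distance at which each merge happened.
    print(method, Z[:, 2].round(2))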
Agglomerative Hierarchical Clustering; dendrogram

Example
• Problem: clustering analysis with the agglomerative algorithm.
• Starting from a data matrix, the Euclidean distance between each pair of objects is computed to give a distance matrix.
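The slide's data matrix appears only as a figure, so the sketch below uses an assumed six-point data matrix chosen to be consistent with the merge distances quoted later in these slides (0.50, 0.71, 1.00, 1.41, 2.50), and turns it into a Euclidean distance matrix with SciPy.

import numpy as np
from scipy.spatial.distance import pdist, squareform

# Assumed data matrix (one row per object A..F, one column per feature).
data = np.array([[1.0, 1.0],    # A
                 [1.5, 1.5],    # B
                 [5.0, 5.0],    # C
                 [3.0, 4.0],    # D
                 [4.0, 4.0],    # E
                 [3.0, 3.5]])   # F

# Pairwise Euclidean distances, arranged as a square distance matrix.
dist_matrix = squareform(pdist(data, metric="euclidean"))
print(dist_matrix.round(2))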
Example
• Merge two closest clusters
(iteration 1)

Example
• Update distance matrix (iteration 1)

Example
• Merge two closest clusters (iteration 2)

Example
• Update distance matrix (iteration 2)

Example
• Merge two closest clusters/update
distance matrix (iteration 3)

Example
• Merge two closest clusters/update
distance matrix (iteration 4)

Example
• Final result (the termination condition is met)
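The whole iteration sequence can be reproduced with SciPy's linkage routine. The sketch below reuses the assumed six-point data matrix from the earlier example and prints one merge per iteration.

import numpy as np
from scipy.cluster.hierarchy import linkage

data = np.array([[1.0, 1.0], [1.5, 1.5], [5.0, 5.0],
                 [3.0, 4.0], [4.0, 4.0], [3.0, 3.5]])   # assumed A..F

Z = linkage(data, method="single", metric="euclidean")
# Each row of Z is one iteration: the two clusters merged, the merge
# distance, and the size of the new cluster. Indices 0-5 are the original
# objects A..F; indices 6 and above refer to clusters formed earlier.
for it, (i, j, d, size) in enumerate(Z, start=1):
    print(f"iteration {it}: merge {int(i)} and {int(j)} "
          f"at distance {d:.2f} (new cluster size {int(size)})")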
Key Concepts in Hierarchical Clustering
• Dendrogram tree representation

1. In the beginning we have 6 clusters: A, B, C, D, E and F.
2. We merge clusters D and F into cluster (D, F) at distance 0.50.
3. We merge clusters A and B into cluster (A, B) at distance 0.71.
4. We merge clusters E and (D, F) into ((D, F), E) at distance 1.00.
5. We merge clusters ((D, F), E) and C into (((D, F), E), C) at distance 1.41.
6. We merge clusters (((D, F), E), C) and (A, B) into ((((D, F), E), C), (A, B)) at distance 2.50.
7. The last cluster contains all the objects, which concludes the computation.

(Dendrogram figure: objects along the horizontal axis, merge distance, i.e. lifetime, on the vertical axis.)
Key Concepts in Hierarchical Clustering
• Lifetime vs K-cluster Lifetime

• Lifetime
The distance from the point at which a cluster is created to the point at which it disappears (merges with other clusters during clustering).
e.g. the lifetimes of A, B, C, D, E and F are 0.71, 0.71, 1.41, 0.50, 1.00 and 0.50, respectively; the lifetime of (A, B) is 2.50 – 0.71 = 1.79, ……

• K-cluster Lifetime
The distance from the point at which K clusters emerge to the point at which those K clusters vanish (due to the reduction to K-1 clusters).
e.g.
5-cluster lifetime is 0.71 – 0.50 = 0.21
4-cluster lifetime is 1.00 – 0.71 = 0.29
3-cluster lifetime is 1.41 – 1.00 = 0.41
2-cluster lifetime is 2.50 – 1.41 = 1.09
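The K-cluster lifetimes above follow directly from the merge distances; here is a short sketch of the arithmetic, using the distances quoted on the slide.

# Merge distances from the example above: 6 -> 5 -> 4 -> 3 -> 2 -> 1 clusters.
merge_distances = [0.50, 0.71, 1.00, 1.41, 2.50]
n = len(merge_distances) + 1                     # 6 objects: A..F

for k in range(n - 1, 1, -1):                    # K = 5, 4, 3, 2
    # K clusters exist from the (n-K)-th merge until the (n-K+1)-th merge.
    emerge = merge_distances[n - k - 1]
    vanish = merge_distances[n - k]
    print(f"{k}-cluster lifetime is {vanish:.2f} - {emerge:.2f} = {vanish - emerge:.2f}")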
Stopping Criteria
• Heuristic Approach
• Cut the dendrogram into clusters with a horizontal line placed according to your choice of the number of clusters.
• The cut shown by the black line will give two clusters.
• The cut shown by the green line will give three clusters.
Stopping Criteria

– If the number of clusters is known, the termination condition is given.
– The algorithm can be stopped when the desired number of clusters has been formed.
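With SciPy, stopping at a desired number of clusters amounts to cutting the linkage with fcluster; a minimal sketch, again using the assumed six-point data from the earlier example.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

data = np.array([[1.0, 1.0], [1.5, 1.5], [5.0, 5.0],
                 [3.0, 4.0], [4.0, 4.0], [3.0, 3.5]])   # assumed A..F
Z = linkage(data, method="single")

for k in (2, 3):
    # Cut the hierarchy as soon as exactly k clusters remain.
    assignments = fcluster(Z, t=k, criterion="maxclust")
    print(f"K = {k}:", assignments)              # cluster label for each of A..F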

Relevant Issues

• Major weaknesses of agglomerative clustering methods
– Can never undo what was done previously (merges are irreversible)
– Sensitive to the choice of cluster distance measure and to noise/outliers
– Less efficient: O(n² log n), where n is the total number of objects

• There are several variants designed to overcome these weaknesses
– BIRCH: scalable to large data sets (a small sketch follows below)
– ROCK: clustering categorical data
– CHAMELEON: hierarchical clustering using dynamic modelling
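Of these variants, BIRCH is available in scikit-learn as sklearn.cluster.Birch; the sketch below runs it on synthetic data (the two-blob data set is an assumption made for illustration).

import numpy as np
from sklearn.cluster import Birch

rng = np.random.default_rng(0)
# Two well-separated synthetic blobs of 500 points each.
X = np.vstack([rng.normal(0, 0.5, size=(500, 2)),
               rng.normal(5, 0.5, size=(500, 2))])

model = Birch(n_clusters=2)        # builds a CF-tree first, then clusters it
labels = model.fit_predict(X)
print(np.bincount(labels))         # roughly 500 points per cluster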

