
Hierarchical Clustering

Ke Chen

COMP24111 Machine Learning

Outline
• Introduction
• Cluster Distance Measures
• Agglomerative Algorithm
• Example and Demo
• Relevant Issues
• Summary

Introduction

• Hierarchical Clustering Approach
  – A typical clustering analysis approach via partitioning the data set sequentially
  – Construct nested partitions layer by layer by grouping objects into a tree of clusters (without the need to know the number of clusters in advance)
  – Use a (generalised) distance matrix as the clustering criterion

• Agglomerative vs. Divisive
  – Two sequential clustering strategies for constructing a tree of clusters
  – Agglomerative: a bottom-up strategy
      Initially each data object is in its own (atomic) cluster
      Then merge these atomic clusters into larger and larger clusters
  – Divisive: a top-down strategy
      Initially all objects are in one single cluster
      Then the cluster is subdivided into smaller and smaller clusters

Introduction

• Illustrative Example: agglomerative and divisive clustering on the data set {a, b, c, d, e}

  [Figure: the agglomerative direction (Step 0 → Step 4) merges a and b into ab, d and e into de, c and de into cde, and finally ab and cde into abcde; the divisive direction runs the same tree in reverse (Step 4 → Step 0).]

  Two design choices:
      Cluster distance
      Termination condition

Cluster Distance Measures

• Single link (min): smallest distance between an element in one cluster and an element in the other, i.e., d(Ci, Cj) = min{ d(xip, xjq) }

• Complete link (max): largest distance between an element in one cluster and an element in the other, i.e., d(Ci, Cj) = max{ d(xip, xjq) }

• Average: average distance between elements in one cluster and elements in the other, i.e., d(Ci, Cj) = avg{ d(xip, xjq) }

Note: d(C, C) = 0
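As an illustration (an addition, not part of the original slides), a minimal Python sketch of the three cluster distance measures, assuming each cluster is given as a NumPy array with one feature vector per row:

```python
import numpy as np
from scipy.spatial.distance import cdist

def cluster_distance(Ci, Cj, measure="single"):
    """Distance between clusters Ci and Cj (arrays of shape [n_i, d] and [n_j, d])."""
    D = cdist(Ci, Cj)              # all pairwise distances d(x_ip, x_jq)
    if measure == "single":        # smallest pairwise distance
        return D.min()
    if measure == "complete":      # largest pairwise distance
        return D.max()
    if measure == "average":       # mean of all pairwise distances
        return D.mean()
    raise ValueError("unknown measure: " + measure)
```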

Cluster Distance Measures

Example: Given a data set of five objects characterised by a single continuous feature, assume that there are two clusters: C1 = {a, b} and C2 = {c, d, e}.

            a   b   c   d   e
  Feature   1   2   4   5   6

1. Calculate the distance matrix.   2. Calculate the three cluster distances between C1 and C2.

Distance matrix:

      a   b   c   d   e
  a   0   1   3   4   5
  b   1   0   2   3   4
  c   3   2   0   1   2
  d   4   3   1   0   1
  e   5   4   2   1   0

Single link:
  dist(C1, C2) = min{ d(a,c), d(a,d), d(a,e), d(b,c), d(b,d), d(b,e) }
               = min{ 3, 4, 5, 2, 3, 4 } = 2

Complete link:
  dist(C1, C2) = max{ d(a,c), d(a,d), d(a,e), d(b,c), d(b,d), d(b,e) }
               = max{ 3, 4, 5, 2, 3, 4 } = 5

Average:
  dist(C1, C2) = ( d(a,c) + d(a,d) + d(a,e) + d(b,c) + d(b,d) + d(b,e) ) / 6
               = ( 3 + 4 + 5 + 2 + 3 + 4 ) / 6 = 21 / 6 = 3.5
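The same three numbers can be checked directly with NumPy/SciPy (this snippet is an addition, not part of the slides):

```python
import numpy as np
from scipy.spatial.distance import cdist

# Single-feature objects a..e with values 1, 2, 4, 5, 6
C1 = np.array([[1.0], [2.0]])             # cluster C1 = {a, b}
C2 = np.array([[4.0], [5.0], [6.0]])      # cluster C2 = {c, d, e}

D = cdist(C1, C2)                          # cross-cluster pairwise distances
print(D.min(), D.max(), D.mean())          # 2.0 5.0 3.5  (single, complete, average)
```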

Agglomerative Algorithm

• The agglomerative algorithm is carried out in three steps (see the sketch below):
  1) Convert all object features into a distance matrix
  2) Set each object as a cluster (thus, if we have N objects, we will have N clusters at the beginning)
  3) Repeat until the number of clusters is one (or a known # of clusters):
      Merge the two closest clusters
      Update the distance matrix
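To make the three steps concrete, here is a minimal from-scratch Python sketch (my own rendering, not code from the course), using single-link distances; it merges until a requested number of clusters remains:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def agglomerative_single_link(X, n_clusters=1):
    """Naive single-link agglomerative clustering on a data matrix X (N x d)."""
    D = squareform(pdist(X))                      # step 1: distance matrix
    clusters = [[i] for i in range(len(X))]       # step 2: one cluster per object
    while len(clusters) > n_clusters:             # step 3: repeat until done
        # find the two closest clusters (single link = min pairwise distance)
        best = (np.inf, None, None)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(D[p, q] for p in clusters[i] for q in clusters[j])
                if d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]   # merge the two closest clusters
        del clusters[j]                           # the "distance matrix update" is implicit:
                                                  # cluster distances are recomputed each pass
    return clusters
```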

Example

• Problem: clustering analysis with the agglomerative algorithm

  [Figure: a data matrix is converted, via Euclidean distance, into a distance matrix.]
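The slide's figure shows the data-matrix-to-distance-matrix step; its actual numbers are not reproduced in the text, so the snippet below uses hypothetical 2-D points purely to illustrate that step:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical data matrix, one object per row (not the slide's values)
X = np.array([[1.0, 2.0],
              [2.5, 4.5],
              [2.0, 2.0],
              [4.0, 1.5],
              [4.0, 2.5]])

D = squareform(pdist(X, metric="euclidean"))   # symmetric distance matrix, zero diagonal
print(np.round(D, 2))
```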

Example

• Merge the two closest clusters (iteration 1)

Example

• Update the distance matrix (iteration 1)

Example

• Merge the two closest clusters (iteration 2)

Example

• Update the distance matrix (iteration 2)

Example

• Merge the two closest clusters / update the distance matrix (iteration 3)

Example

• Merge the two closest clusters / update the distance matrix (iteration 4)

Example

• Final result (meeting the termination condition)

Example

• Dendrogram tree representation

  [Figure: dendrogram over objects A–F; the vertical axis shows the lifetime (merge distance), from 0 up to about 6.]

  1. In the beginning we have 6 clusters: A, B, C, D, E and F.
  2. We merge clusters D and F into cluster (D, F) at distance 0.50.
  3. We merge cluster A and cluster B into (A, B) at distance 0.71.
  4. We merge clusters E and (D, F) into ((D, F), E) at distance 1.00.
  5. We merge clusters ((D, F), E) and C into (((D, F), E), C) at distance 1.41.
  6. We merge clusters (((D, F), E), C) and (A, B) into ((((D, F), E), C), (A, B)) at distance 2.50.
  7. The last cluster contains all the objects, thus concluding the computation.
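The merge sequence above can also be read off programmatically: SciPy's `linkage` returns one row per merge, giving the two clusters joined and the distance at which they were joined. A sketch with hypothetical 2-D coordinates for A–F (the slide's own data matrix appears only in the figure; these values are merely chosen so that single-link merging reproduces roughly the distances listed above):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

labels = ["A", "B", "C", "D", "E", "F"]
X = np.array([[1.0, 1.0],    # A  -- hypothetical coordinates, not taken from the slide
              [1.5, 1.5],    # B
              [5.0, 5.0],    # C
              [3.0, 4.0],    # D
              [4.0, 4.0],    # E
              [3.0, 3.5]])   # F

Z = linkage(X, method="single")   # each row: [cluster i, cluster j, merge distance, size]
for step, (a, b, dist, size) in enumerate(Z, start=1):
    print(f"merge {step}: clusters {int(a)} and {int(b)} at distance {dist:.2f} (size {int(size)})")

# dendrogram(Z, labels=labels) would draw the corresponding tree (requires matplotlib to display)
```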

Example

• Dendrogram tree representation: “clustering” USA states

Exercise

Given a data set of five objects characterised by a single continuous feature:

            a   b   c   d   e
  Feature   1   2   4   5   6

Apply the agglomerative algorithm with single-link, complete-link and averaging cluster distance measures to produce three dendrogram trees, respectively.

Distance matrix:

      a   b   c   d   e
  a   0   1   3   4   5
  b   1   0   2   3   4
  c   3   2   0   1   2
  d   4   3   1   0   1
  e   5   4   2   1   0
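One way to check your three dendrograms afterwards (an addition, not part of the exercise itself): SciPy can build all three linkages directly from the one-dimensional feature values.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

X = np.array([[1.0], [2.0], [4.0], [5.0], [6.0]])   # objects a..e, single feature
labels = ["a", "b", "c", "d", "e"]

for method in ("single", "complete", "average"):
    Z = linkage(X, method=method)
    print(method, "merge distances:", np.round(Z[:, 2], 2))
    # scipy.cluster.hierarchy.dendrogram(Z, labels=labels) draws the corresponding tree
```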

Demo

• Agglomerative demo

Relevant Issues

• How to determine the number of clusters (a sketch of the heuristic follows below)
  – If the number of clusters is known, the termination condition is given!
  – The K-cluster lifetime is the range of threshold values on the dendrogram tree that leads to the identification of exactly K clusters.
  – Heuristic rule: cut the dendrogram tree at the K with the maximum K-cluster lifetime.
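A minimal sketch of this heuristic (my own rendering, not code from the course): given the sorted merge distances of an agglomerative run, the K-cluster lifetime is the gap between consecutive merge distances, and we pick the K with the largest gap.

```python
import numpy as np

def best_k_by_lifetime(merge_distances):
    """Pick K with the maximum K-cluster lifetime, given the merge distances
    of an agglomerative run (e.g. column 2 of SciPy's linkage matrix)."""
    d = np.sort(np.asarray(merge_distances, dtype=float))
    n = len(d) + 1                   # n objects produce n - 1 merges
    # Cutting the dendrogram between d[n-K-1] and d[n-K] yields exactly K clusters,
    # so the K-cluster lifetime is d[n-K] - d[n-K-1]  (defined for K = 2 .. n-1).
    lifetimes = {k: d[n - k] - d[n - k - 1] for k in range(2, n)}
    best_k = max(lifetimes, key=lifetimes.get)
    return best_k, lifetimes

# Merge distances from the A..F dendrogram example above:
print(best_k_by_lifetime([0.50, 0.71, 1.00, 1.41, 2.50]))
# -> best K is 2 (lifetime ~ 1.09, the largest gap between merge distances)
```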

Summary

• The hierarchical algorithm is a sequential clustering algorithm
  – Uses a distance matrix to construct a tree of clusters (dendrogram)
  – Gives a hierarchical representation without the need to know the # of clusters (a termination condition can be set when the # of clusters is known)

• Major weaknesses of agglomerative clustering methods
  – Can never undo what was done previously
  – Sensitive to the cluster distance measure and to noise/outliers
  – Less efficient: O(n² log n), where n is the total number of objects

• There are several variants to overcome its weaknesses
  – BIRCH: scalable to large data sets
  – ROCK: clustering categorical data
  – CHAMELEON: hierarchical clustering using dynamic modelling

Online tutorial: the hierarchical clustering functions in Matlab
https://www.youtube.com/watch?v=aYzjenNNOcc