Lecture 10
UNSUPERVISED LEARNING
hammad.afzal@mcs.edu.pk
CLUSTERING
There is no explicit teacher; the system forms clusters, or "natural groupings", from structure in the input patterns.
CLUSTERING
Data WITHOUT classes or labels: x1, x2, x3, …, xn, where each sample x is a d-dimensional vector.
In this case we can easily identify the 4 clusters into which the data can be divided.
TYPES OF CLUSTERING
Partitional clustering
Construct a partition of the data set to produce several clusters at once.
The process is repeated iteratively until a termination condition is met.
Examples:
K-means clustering
Fuzzy c-means clustering
K MEANS CLUSTERING
1. Choose the number (K) of clusters and randomly select the centroids of each cluster.
2. For each data point:
   I. Calculate the distance from the data point to each cluster centroid.
   II. Assign the data point to the closest cluster.
3. Recompute the centroid of each cluster.
4. Repeat steps 2 and 3 until there is no further change in the assignment of data points (or in the centroids).
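The four steps above can be sketched directly in Python. This is a minimal illustration, not the lecture's own code; the function name `kmeans` and its signature are my choices, and `math.dist` computes Euclidean distance:

```python
import math
import random

def kmeans(points, k, max_iters=100, seed=0):
    """Plain k-means on a list of coordinate tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)              # step 1: random initial centroids
    labels = None
    for _ in range(max_iters):
        # step 2: assign each point to its closest centroid
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:                   # step 4: assignments unchanged, stop
            break
        labels = new_labels
        # step 3: recompute each centroid as the mean of its assigned points
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels
```

With the four medicines A(1, 1), B(2, 1), C(4, 3), D(5, 4) and K = 2 from the example below, this converges to the clusters {A, B} and {C, D} with centroids (1.5, 1) and (4.5, 3.5).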
K MEANS – EXAMPLE 2
Suppose we have 4 medicines, each with two attributes (pH and weight index). Our goal is to group these objects into K = 2 clusters of medicines.
K MEANS – EXAMPLE 2
Compute the distance between all samples and the K centroids, taking c1 = A and c2 = B:
d(D, c1) = √((5 - 1)² + (4 - 1)²) = 5
d(D, c2) = √((5 - 2)² + (4 - 1)²) ≈ 4.24
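These two distances can be checked numerically. The coordinates below, c1 = A = (1, 1), c2 = B = (2, 1), and D = (5, 4), are inferred from the worked arithmetic on this slide:

```python
import math

# Coordinates inferred from the slide's arithmetic: c1 = A = (1, 1), c2 = B = (2, 1), D = (5, 4)
c1, c2, D = (1, 1), (2, 1), (5, 4)

d1 = math.dist(D, c1)  # sqrt((5-1)^2 + (4-1)^2) = sqrt(25)
d2 = math.dist(D, c2)  # sqrt((5-2)^2 + (4-1)^2) = sqrt(18)
print(round(d1, 2), round(d2, 2))  # 5.0 4.24
```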
K MEANS – EXAMPLE 2
Assign each sample to its closest cluster. An element in a row of the Group matrix below is 1 if and only if the object is assigned to that group.
K MEANS – EXAMPLE 2
Re-calculate the K centroids:
c1 = (1, 1)
c2 = ((2 + 4 + 5)/3, (1 + 3 + 4)/3) = (11/3, 8/3) ≈ (3.67, 2.67)
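The centroid update is just a coordinate-wise mean. The three points assumed assigned to cluster 2 below, B = (2, 1), C = (4, 3), D = (5, 4), are inferred from the sums 2 + 4 + 5 and 1 + 3 + 4 in the slide's arithmetic:

```python
# Points assumed assigned to cluster 2 (inferred from the sums shown):
# B = (2, 1), C = (4, 3), D = (5, 4)
members = [(2, 1), (4, 3), (5, 4)]

# Coordinate-wise mean: zip(*members) groups all the x's and all the y's
c2 = tuple(sum(coord) / len(members) for coord in zip(*members))
print(round(c2[0], 2), round(c2[1], 2))  # 3.67 2.67
```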
K MEANS – EXAMPLE 2
Repeat the above steps
K MEANS – EXAMPLE 2
c1 = ((1 + 2)/2, (1 + 1)/2) = (1.5, 1)
c2 = ((4 + 5)/2, (3 + 4)/2) = (4.5, 3.5)
Hierarchical Clustering
HIERARCHICAL CLUSTERING
Agglomerative and divisive clustering on the data set {a, b, c, d, e}
[Figure: dendrogram over {a, b, c, d, e}. Agglomerative clustering (Step 0 to Step 4) merges a and b into ab, d and e into de, then c with de into cde, and finally ab with cde into abcde; divisive clustering follows the same hierarchy in the opposite direction.]
AGGLOMERATIVE CLUSTERING
1. Convert object attributes to a distance matrix.
2. Set each object as a cluster (thus if we have N objects, we will have N clusters at the beginning).
3. Repeat until the number of clusters is one (or a known number of clusters):
   a. Merge the two closest clusters.
   b. Update the distance matrix.
[Figure: example with five objects d1 to d5: d4 and d5 merge first, then d3 joins to form (d3, d4, d5), while d1 and d2 merge into (d1, d2).]
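The loop above can be written directly. In this sketch (function name and signature are mine) the linkage distances are recomputed from scratch each pass instead of maintaining an explicit distance matrix, which is a simplification for brevity; passing `linkage=min` gives single link and `linkage=max` complete link:

```python
import math

def agglomerative(points, linkage=min, n_clusters=1):
    """Bottom-up clustering: start with one cluster per object, repeatedly
    merge the closest pair until n_clusters remain. Returns the remaining
    clusters and the merge history as (cluster_a, cluster_b, distance)."""
    clusters = [[p] for p in points]                  # step 2: N singleton clusters
    merges = []
    while len(clusters) > n_clusters:                 # step 3
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # linkage=min -> single link, linkage=max -> complete link
                d = linkage(math.dist(x, y) for x in clusters[i] for y in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], round(d, 2)))  # step 3a: merge
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] \
                   + [clusters[i] + clusters[j]]
        # step 3b is implicit: distances are recomputed from scratch each pass
    return clusters, merges
```

Run on the six points A to F of the worked example below with `linkage=min`, this reproduces the merge distances 0.50, 0.71, 1.00, 1.41, 2.50.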
STARTING SITUATION
Start with clusters of individual points and a
distance/proximity matrix
[Figure: empty distance matrix with one row and one column per point p1, p2, p3, p4, p5, …]
INTERMEDIATE SITUATION
After some merging steps, we have some clusters
[Figure: distance matrix over the current clusters C1, C2, C3, C4, C5.]
INTERMEDIATE SITUATION
How do we compare two clusters?
INTER CLUSTER DISTANCE MEASURES
Similarity?
Single Link
Average Link
Complete Link
INTERMEDIATE SITUATION
We want to merge the two closest clusters (C2 and C5)
and update the distance matrix.
[Figure: the distance matrix over C1 to C5, with the rows and columns of C2 and C5 highlighted as the closest pair.]
SINGLE LINK
Smallest distance between an element in one cluster and an element in the other:
D(ci, cj) = min{ D(x, y) : x ∈ ci, y ∈ cj }
COMPLETE LINK
Largest distance between an element in one cluster and an element in the other:
D(ci, cj) = max{ D(x, y) : x ∈ ci, y ∈ cj }
AVERAGE LINK
Average distance between an element in one cluster and an element in the other:
D(ci, cj) = avg{ D(x, y) : x ∈ ci, y ∈ cj }
DISTANCE BETWEEN CENTROIDS
Distance between the centroids of two clusters
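The three linkage rules, plus the centroid variant, are one-liners over two point sets. A minimal sketch, with function names of my choosing:

```python
import math

def single_link(ci, cj):
    """Smallest pairwise distance between the two clusters."""
    return min(math.dist(x, y) for x in ci for y in cj)

def complete_link(ci, cj):
    """Largest pairwise distance between the two clusters."""
    return max(math.dist(x, y) for x in ci for y in cj)

def average_link(ci, cj):
    """Average over all pairwise distances between the two clusters."""
    return sum(math.dist(x, y) for x in ci for y in cj) / (len(ci) * len(cj))

def centroid_link(ci, cj):
    """Distance between the two cluster centroids (coordinate-wise means)."""
    mean = lambda c: tuple(sum(v) / len(c) for v in zip(*c))
    return math.dist(mean(ci), mean(cj))
```

All four take two clusters as lists of coordinate tuples; only the reduction over the pairwise distances differs.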
AFTER MERGING
Update the distance matrix
[Figure: the distance matrix after merging, where every entry in the row and column of the new cluster C2 ∪ C5 is marked "?" and must be recomputed; the distances among C1, C3, and C4 are unchanged.]
AGGLOMERATIVE CLUSTERING - EXAMPLE
Data matrix:
     X1    X2
A    1     1
B    1.5   1.5
C    5     5
D    3     4
E    4     4
F    3     3.5
Euclidean distance, e.g. dAB = ((1 - 1.5)² + (1 - 1.5)²)^(1/2) = 0.707
Distance matrix:
Dist   A      B      C      D      E      F
A      0.00   0.71   5.66   3.61   4.24   3.20
B      0.71   0.00   4.95   2.92   3.54   2.50
C      5.66   4.95   0.00   2.24   1.41   2.50
D      3.61   2.92   2.24   0.00   1.00   0.50
E      4.24   3.54   1.41   1.00   0.00   1.12
F      3.20   2.50   2.50   0.50   1.12   0.00
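The distance matrix can be reproduced from the data matrix in a few lines, rounding to two decimals as in the table (the dictionary layout is my choice):

```python
import math

# Points from the example's data matrix
pts = {"A": (1, 1), "B": (1.5, 1.5), "C": (5, 5), "D": (3, 4), "E": (4, 4), "F": (3, 3.5)}

# Pairwise Euclidean distances, rounded to two decimals as in the table
D = {(a, b): round(math.dist(p, q), 2)
     for a, p in pts.items() for b, q in pts.items()}

print(D[("A", "B")], D[("D", "F")], D[("C", "E")])  # 0.71 0.5 1.41
```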
Merge the two closest clusters.
AGGLOMERATIVE CLUSTERING - EXAMPLE
Find the two closest clusters in the distance matrix: D and F, at distance 0.50. Merge them into a single cluster (D, F). (The data and distance matrices are as above.)
Update the distance matrix:
AGGLOMERATIVE CLUSTERING - EXAMPLE
Dist   A      B      C      D,F    E
A      0.00   0.71   5.66   ?      4.24
B      0.71   0.00   4.95   ?      3.54
C      5.66   4.95   0.00   ?      1.41
D,F    ?      ?      ?      0.00   ?
E      4.24   3.54   1.41   ?      0.00
Update the distance matrix (single linkage, minimum distance):
AGGLOMERATIVE CLUSTERING - EXAMPLE
D(D,F)→A = min(dDA, dFA) = min(3.61, 3.20) = 3.20
D(D,F)→B = min(dDB, dFB) = min(2.92, 2.50) = 2.50
D(D,F)→C = min(dDC, dFC) = min(2.24, 2.50) = 2.24
D(D,F)→E = min(dDE, dFE) = min(1.00, 1.12) = 1.00
AGGLOMERATIVE CLUSTERING - EXAMPLE
Dist   A      B      C      D,F    E
A      0.00   0.71   5.66   3.20   4.24
B      0.71   0.00   4.95   2.50   3.54
C      5.66   4.95   0.00   2.24   1.41
D,F    3.20   2.50   2.24   0.00   1.00
E      4.24   3.54   1.41   1.00   0.00
The next closest pair is A and B, at distance 0.71; merge them into (A, B).
Update the distance matrix (single linkage):
AGGLOMERATIVE CLUSTERING - EXAMPLE
D(A,B)→C = min(dCA, dCB) = min(5.66, 4.95) = 4.95
D(A,B)→(D,F) = min(dDA, dDB, dFA, dFB) = min(3.61, 2.92, 3.20, 2.50) = 2.50
D(A,B)→E = min(dEA, dEB) = min(4.24, 3.54) = 3.54
AGGLOMERATIVE CLUSTERING - EXAMPLE
Dist   A,B    C      D,F    E
A,B    0.00   4.95   2.50   3.54
C      4.95   0.00   2.24   1.41
D,F    2.50   2.24   0.00   1.00
E      3.54   1.41   1.00   0.00
Merge the two closest clusters, (D, F) and E at distance 1.00, and update the distance matrix:
AGGLOMERATIVE CLUSTERING - EXAMPLE
Dist      (A,B)   C      (D,F),E
(A,B)     0.00    4.95   2.50
C         4.95    0.00   1.41
(D,F),E   2.50    1.41   0.00
Final Result
AGGLOMERATIVE CLUSTERING - EXAMPLE
[Figure: dendrogram of the final clustering, drawn over the data matrix shown earlier.]
Dendrogram Representation
AGGLOMERATIVE CLUSTERING - EXAMPLE
1. In the beginning we have 6 clusters: A, B, C, D, E and F.
2. We merge clusters D and F into (D, F) at distance 0.50.
3. We merge clusters A and B into (A, B) at distance 0.71.
4. We merge cluster E and (D, F) into ((D, F), E) at distance 1.00.
5. We merge clusters ((D, F), E) and C into (((D, F), E), C) at distance 1.41.
6. We merge clusters (((D, F), E), C) and (A, B) into ((((D, F), E), C), (A, B)) at distance 2.50.
7. The last cluster contains all the objects, thus concluding the computation.
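The whole merge sequence can be replayed by collapsing the distance table with single-link (minimum-of-the-two-old-rows) updates, exactly as in the matrix-update slides. A sketch starting from the rounded distance table of the example; the label strings and variable names are mine, and the nesting order inside a merged label may differ from the summary's notation:

```python
# Replay the merges by collapsing the rounded distance table with
# single-link (minimum) updates.
labels = ["A", "B", "C", "D", "E", "F"]
dist = {
    ("A", "B"): 0.71, ("A", "C"): 5.66, ("A", "D"): 3.61, ("A", "E"): 4.24, ("A", "F"): 3.20,
    ("B", "C"): 4.95, ("B", "D"): 2.92, ("B", "E"): 3.54, ("B", "F"): 2.50,
    ("C", "D"): 2.24, ("C", "E"): 1.41, ("C", "F"): 2.50,
    ("D", "E"): 1.00, ("D", "F"): 0.50,
    ("E", "F"): 1.12,
}

def d(a, b):
    """Symmetric lookup into the (upper-triangular) distance table."""
    return dist[(a, b)] if (a, b) in dist else dist[(b, a)]

merges = []
while len(labels) > 1:
    # find the closest pair of current clusters
    a, b = min(((x, y) for i, x in enumerate(labels) for y in labels[i + 1:]),
               key=lambda p: d(*p))
    merges.append((a, b, d(a, b)))
    merged = f"({a},{b})"
    # single link: distance to the merged cluster is the min of the two old rows
    for other in labels:
        if other not in (a, b):
            dist[(merged, other)] = min(d(a, other), d(b, other))
    labels = [l for l in labels if l not in (a, b)] + [merged]

print([m[2] for m in merges])  # [0.5, 0.71, 1.0, 1.41, 2.5]
```

The printed distances match steps 2 to 6 of the summary: D and F at 0.50, A and B at 0.71, then E, then C, then everything at 2.50.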