Clustering Algorithms
Outline
Part 1: Basic Concepts of Data Clustering
  Unsupervised Learning and Clustering
  Hierarchical methods
    Single-link
    Complete-link
    Clustering Based on Dissimilarity Increments Criteria
Unsupervised Learning
Clustering Algorithms

Hierarchical Clustering
Use proximity matrix: n x n
[Figure: dendrogram over objects a, b, c, d, e; agglomerative methods proceed bottom-up (steps 0-4: a,b -> ab; d,e -> de; c,de -> cde; ab,cde -> abcde), divisive methods proceed top-down]
Agglomerative algorithm:
1. Start with N singleton clusters and compute the N x N proximity matrix
2. Find the closest pair of clusters and merge them
3. Update the proximity matrix to reflect the merge
4. Repeat steps (2) and (3) until a single cluster is obtained (i.e., N-1 times)
Agglomerative methods differ in how the proximity matrix is updated after a merge. The classical linkage rules are all instances of the Lance-Williams recurrence

  d(i U j, k) = a_i d(i,k) + a_j d(j,k) + b d(i,j) + c |d(i,k) - d(j,k)|

with the following coefficients:

Single-link:
  a_i = a_j = 0.5 ; b = 0 ; c = -0.5
  d(i U j, k) = min{ d(i,k), d(j,k) }

Complete-link:
  a_i = a_j = 0.5 ; b = 0 ; c = 0.5
  d(i U j, k) = max{ d(i,k), d(j,k) }

Centroid:
  a_i = n_i/(n_i+n_j) ; a_j = n_j/(n_i+n_j) ; b = -n_i n_j/(n_i+n_j)^2 ; c = 0

Median:
  a_i = a_j = 0.5 ; b = -0.25 ; c = 0

Average link:
  a_i = n_i/(n_i+n_j) ; a_j = n_j/(n_i+n_j) ; b = c = 0
  d(C_i, C_j) = (1/(n_i n_j)) sum_{a in C_i, b in C_j} d(a,b)

Ward's method (minimum variance):
  a_i = (n_k+n_i)/(n_k+n_i+n_j) ; a_j = (n_k+n_j)/(n_k+n_i+n_j) ; b = -n_k/(n_k+n_i+n_j) ; c = 0
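To make the coefficient table concrete, here is a minimal Python sketch (not part of the original slides) of a naive O(n^3) agglomerative procedure driven by the Lance-Williams recurrence; names are illustrative, and note that Ward's rule is conventionally applied to squared Euclidean distances.

```python
# Lance-Williams coefficients (a_i, a_j, b, c) as functions of cluster sizes.
# For single- and complete-link the sizes are irrelevant.
COEFFS = {
    "single":   lambda ni, nj, nk: (0.5, 0.5, 0.0, -0.5),
    "complete": lambda ni, nj, nk: (0.5, 0.5, 0.0, 0.5),
    "average":  lambda ni, nj, nk: (ni / (ni + nj), nj / (ni + nj), 0.0, 0.0),
    "ward":     lambda ni, nj, nk: ((nk + ni) / (nk + ni + nj),
                                    (nk + nj) / (nk + ni + nj),
                                    -nk / (nk + ni + nj), 0.0),
}

def agglomerate(D, method="single"):
    """Naive O(n^3) agglomerative clustering on a full distance matrix D.
    Returns the merge sequence as (cluster_a, cluster_b, distance) tuples."""
    coeff = COEFFS[method]
    clusters = {i: [i] for i in range(len(D))}
    d = {frozenset((i, j)): D[i][j] for i in clusters for j in clusters if i < j}
    merges = []
    while len(clusters) > 1:
        # find the closest pair of clusters
        (i, j), dij = min(((tuple(sorted(p)), v) for p, v in d.items()),
                          key=lambda t: t[1])
        merges.append((clusters[i], clusters[j], dij))
        new = max(clusters) + 1
        ni, nj = len(clusters[i]), len(clusters[j])
        # Lance-Williams update of the proximity matrix
        for k in clusters:
            if k in (i, j):
                continue
            dik, djk = d[frozenset((i, k))], d[frozenset((j, k))]
            ai, aj, b, c = coeff(ni, nj, len(clusters[k]))
            d[frozenset((new, k))] = ai * dik + aj * djk + b * dij + c * abs(dik - djk)
        for p in list(d):
            if i in p or j in p:
                del d[p]
        clusters[new] = clusters[i] + clusters[j]
        del clusters[i], clusters[j]
    return merges
```

On a toy 4-point distance matrix such as [[0,2,6,10],[2,0,5,9],[6,5,0,4],[10,9,4,0]], single-link yields merge heights 2, 4, 5 while complete-link yields 2, 4, 10, illustrating how the same data produce different dendrograms.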
Single Linkage: d(C_i, C_j) = min_{a in C_i, b in C_j} d(a, b)

Worked example on 5 points. The closest pair, {1,2}, is merged first; updating the proximity matrix by the single-link rule gives:

         {1,2}    3      4      5
  {1,2}    -     8.1    16     17.9
    3             -     9.8    9.8
    4                    -     8
    5                           -

The smallest entry is now d(4,5) = 8, so {4,5} is merged next.
After merging {4,5}:

         {1,2}    3    {4,5}
  {1,2}    -     8.1    16
    3             -     9.8

d({1,2},3) = 8.1 is the smallest entry, so 3 joins {1,2}; the final merge, {1,2,3} with {4,5}, occurs at 9.8.
[Dendrogram]
Complete-Link: d(C_i, C_j) = max_{a in C_i, b in C_j} d(a, b)

On the same data, {1,2} is again merged first, but the updated distances now take the maximum:

         {1,2}    3      4      5
  {1,2}    -    11.7    20     21.5
    3             -     9.8    9.8
    4                    -     8
    5                           -

{4,5} is merged at 8, and then:

         {1,2}    3    {4,5}
  {1,2}    -    11.7   21.5
    3             -     9.8

so 3 now joins {4,5} at 9.8 (under single-link it had joined {1,2}).
The final merge, {1,2} with {3,4,5}, occurs at 21.5.
[Dendrogram]
Comparing the two dendrograms:
  Single-link favours connectedness
  Complete-link favours compactness
SL algorithm:
  Favors connectedness
[Figure: single-link result on a 2-D point set]
SL algorithm:
  Favors connectedness
  Detects arbitrary-shaped clusters with even densities
  Cannot handle distinct density clusters
  Is sensitive to in-between patterns
[Figure: 2-D examples of these limitations]
(Ana Fred, From Single Clustering to Ensemble Methods, April 2009)
SL also needs criteria to set the final number of clusters.

CL algorithm:
  Favors compactness
[Figure: complete-link result on the same 2-D data]
CL algorithm (continued):
  Imposes spherical-shaped clusters on data
[Figure: complete-link result]
Summary:

SL algorithm:
  Favors connectedness
  Detects arbitrary-shaped clusters with even densities
  Cannot handle distinct density clusters
  Is sensitive to in-between patterns
  Needs criteria to set the final number of clusters

CL algorithm:
  Favors compactness
  Imposes spherical-shaped clusters on data
  Needs criteria to set the final number of clusters
These weaknesses of hierarchical clustering motivate additional cluster isolation criteria.

Clustering Based on Dissimilarity Increments Criteria
A. Fred, J. Leitão, A New Cluster Isolation Criterion Based on Dissimilarity Increments, IEEE PAMI, 2003.
Dissimilarity increments. Given a point x_i, let x_j be its nearest neighbour, and x_k the nearest neighbour of x_j other than x_i:

  x_j : j = argmin_l d(x_l, x_i), l != i
  x_k : k = argmin_l d(x_l, x_j), l != i, j

Dissimilarity increment:

  d_inc(x_i, x_j, x_k) = | d(x_i, x_j) - d(x_j, x_k) |
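The increment computation above is straightforward to implement. A small Python sketch, assuming points are given as coordinate tuples (function name is illustrative):

```python
import math

def dissimilarity_increments(points):
    """For each point x_i: find its nearest neighbour x_j, then the nearest
    neighbour x_k of x_j (excluding x_i and x_j), and return the increments
    |d(x_i, x_j) - d(x_j, x_k)|."""
    def d(a, b):
        return math.dist(a, b)  # Euclidean distance
    incs = []
    for i, xi in enumerate(points):
        j = min((l for l in range(len(points)) if l != i),
                key=lambda l: d(points[l], xi))
        k = min((l for l in range(len(points)) if l not in (i, j)),
                key=lambda l: d(points[l], points[j]))
        incs.append(abs(d(xi, points[j]) - d(points[j], points[k])))
    return incs
```

For collinear points (0,0), (1,0), (3,0), (7,0) this returns the increments [1.0, 2.0, 1.0, 2.0].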
Within a cluster, the dissimilarity increments follow an exponential distribution:

  p(x) = lambda * exp(-lambda * x)

[Figure: distribution of dissimilarity increments for 2D Gaussian data]
[Figure: distribution of dissimilarity increments for ring-shaped data]
[Figure: two clusters C_i and C_j; dt(C_i) and dt(C_j) are within-cluster dissimilarity statistics, d(C_i, C_j) is the inter-cluster distance, and gap_i, gap_j are the corresponding gaps between each cluster and the other]
Result: the tangent line to the exponential density at a point that is a multiple alpha/lambda of the distribution mean 1/lambda crosses the x axis at (alpha+1)/lambda.
[Figure: tangent construction on an exponential distribution]
Ring-Shaped Clusters
[Figure: concentric ring-shaped clusters, with gaps g1, g2 and distances d1, d2 between the rings]
[Figure: single-link result on the ring data, threshold th = 1.1]
Application example: clustering contour images
  Contour extraction based on a thresholding method
  String contour description: 8-directional differential chain code
  Dissimilarity between contours: string-edit distance
[Figure: sample contours t1-t5 and their string descriptions]
Outline
Partitional Methods
  K-Means
  Spectral Clustering
  EM-based Gaussian Mixture Decomposition
Partitional Methods: K-Means

Algorithm:
1. Randomly select k seed points from the data set, and take them as initial centroids
2. Partition the data into k clusters by assigning each object to the cluster with the nearest centroid. The centroid is the mean point of the cluster
3. Recompute the centroid of each cluster
4. Repeat steps (2) and (3) until the centroids no longer change
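The four steps can be sketched compactly in Python for 2-D points (a minimal sketch, not the slides' own code; empty-cluster handling is simplified):

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Plain K-Means on 2-D tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)                     # 1. random seed points
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                                  # 2. nearest-centroid assignment
            i = min(range(k),
                    key=lambda c: (p[0] - centroids[c][0]) ** 2
                                  + (p[1] - centroids[c][1]) ** 2)
            clusters[i].append(p)
        new = [(sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
               if cl else centroids[i]                    # 3. centroid = mean of cluster
               for i, cl in enumerate(clusters)]
        if new == centroids:                              # 4. stop when centroids stabilize
            break
        centroids = new
    return centroids, clusters
```

On two well-separated blobs the procedure recovers the blobs regardless of which points are drawn as seeds, since the centroids migrate toward the group means within a few iterations.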
[Figure: K-Means iterations on a 2-D data set]
Weakness:
  Imposes spherical-shaped clusters on data
  Is sensitive to the number of objects in the clusters
[Figure: K-Means on data violating these assumptions]
Strength:
  Favors compactness
  Fast
  Scalability
  Robust to noise
[Figure: K-Means clustering results]
Spectral Clustering

For a given data set X, spectral clustering finds a set of data clusters on the basis of spectral analysis of a similarity graph G = (V, E), with vertices V = {1, ..., N} corresponding to the data points in the data set; each edge between two vertices is weighted by the similarity between them. The weight matrix is also called the affinity matrix or the similarity matrix. A common choice is the Gaussian kernel:

  A_ij = exp( -||x_i - x_j||^2 / (2 sigma^2) )
By cutting edges of G we obtain disjoint subgraphs of G as the clusters of X. The goal of clustering is to organize the dataset into disjoint subsets with high intra-cluster similarity and low inter-cluster similarity.
Spectral Clustering
The graph partitioning for data clustering can be interpreted as a
minimization problem of an objective function, in which the compactness
and isolation are quantified by the subset sums of edge weights.
Rcut(C1 , , Ck ) :
l 1
cut(Cl , X \ Cl )
card Cl
Ncut(C1 , , Ck ) :
l 1
M cut(C1 , , Ck ) :
l 1
cut(Cl , X \ Cl )
cut(Cl , X )
cut(Cl , X \ Cl )
cut(Cl , Cl )
28
Unsupervised Learning
Clustering Algorithms
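These subset sums are easy to evaluate directly. A small Python sketch (names illustrative), assuming the affinity matrix is a list of lists and a partition is a list of vertex sets:

```python
def cut(A, S, T):
    """Sum of edge weights between vertex sets S and T in affinity matrix A."""
    return sum(A[i][j] for i in S for j in T)

def rcut(A, partition):
    """Ratio cut: between-cluster weight normalized by cluster cardinality."""
    X = set(range(len(A)))
    return sum(cut(A, C, X - set(C)) / len(C) for C in partition)

def ncut(A, partition):
    """Normalized cut: between-cluster weight normalized by total cluster degree."""
    X = set(range(len(A)))
    return sum(cut(A, C, X - set(C)) / cut(A, C, X) for C in partition)
```

On a graph with two tightly connected pairs and weak cross edges, the natural partition scores much lower than a mismatched one under both objectives.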
The solution of the minimization problem of any of the previous objective functions is obtained from the first k eigenvectors of a matrix derived from the affinity matrix (the Laplacian matrix). The eigenvectors for Ncut and Mcut are identical, and obtained from the symmetric Laplacian

  L_sym = D^(-1/2) A D^(-1/2)

where D is a diagonal matrix whose ii-th entry is the sum of the i-th row of A.
[Figure: data in the original feature space and its image in the eigenvector feature space, with A_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))]

A. Y. Ng, M. I. Jordan, Y. Weiss, On Spectral Clustering: Analysis and an algorithm, NIPS 2001

Algorithm (as in the reference above):
1. Form the affinity matrix A and the diagonal degree matrix D
2. Compute L_sym = D^(-1/2) A D^(-1/2)
3. Stack the k largest eigenvectors of L_sym as the columns of a matrix Y
4. Renormalize each row of Y to unit length
5. Cluster the rows of Y with K-Means; assign x_i to cluster j iff row i is assigned to cluster j
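A self-contained Python sketch of this pipeline for K = 2 on 1-D points (all helper names are illustrative; a small Jacobi rotation routine stands in for a library eigensolver, and the final grouping is a simple farthest-point split of the row-normalized embedding):

```python
import math

def jacobi_eig(M, sweeps=50):
    """Eigen-decomposition of a small symmetric matrix by cyclic Jacobi
    rotations. Returns (eigenvalues, V) with eigenvectors in the columns of V."""
    n = len(M)
    A = [row[:] for row in M]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p][q]) < 1e-12:
                    continue
                t = 0.5 * math.atan2(2 * A[p][q], A[q][q] - A[p][p])
                c, s = math.cos(t), math.sin(t)
                for k in range(n):   # A <- R A
                    A[p][k], A[q][k] = c * A[p][k] - s * A[q][k], s * A[p][k] + c * A[q][k]
                for k in range(n):   # A <- A R^T
                    A[k][p], A[k][q] = c * A[k][p] - s * A[k][q], s * A[k][p] + c * A[k][q]
                for k in range(n):   # accumulate V <- V R^T
                    V[k][p], V[k][q] = c * V[k][p] - s * V[k][q], s * V[k][p] + c * V[k][q]
    return [A[i][i] for i in range(n)], V

def spectral_2_clusters(points, sigma):
    """Spectral clustering sketch for K = 2 on 1-D points."""
    n = len(points)
    A = [[0.0 if i == j else math.exp(-(points[i] - points[j]) ** 2 / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A]
    L = [[A[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    vals, V = jacobi_eig(L)
    e1, e2 = sorted(range(n), key=lambda i: -vals[i])[:2]   # two largest eigenvalues
    Y = []
    for i in range(n):                                      # row-normalized embedding
        r = math.hypot(V[i][e1], V[i][e2])
        Y.append((V[i][e1] / r, V[i][e2] / r))
    far = max(range(n), key=lambda i: (Y[i][0] - Y[0][0]) ** 2 + (Y[i][1] - Y[0][1]) ** 2)
    return [0 if (Y[i][0] - Y[0][0]) ** 2 + (Y[i][1] - Y[0][1]) ** 2 <=
                 (Y[i][0] - Y[far][0]) ** 2 + (Y[i][1] - Y[far][1]) ** 2 else 1
            for i in range(n)]
```

For two well-separated groups of 1-D points the embedded rows of each group collapse onto (nearly) orthogonal unit vectors, so the split recovers the groups.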
[Figure: spectral clustering results for K=2, sigma=0.1 and K=2, sigma=0.4]
[Figure: spectral clustering results for K=3, sigma=0.1 and K=3, sigma=0.45]
Selection of parameter values (number of clusters K, scaling factor sigma):
  MSE in the eigenvector feature space:
    MSE_Y = (1/N) sum_{i=1..k} sum_{j=1..n_i} || y_j^(i) - m_i ||^2
  Eigengap of A
  Rcut-style score of the resulting partition, based on the between-cluster affinity sums sum_{i in S_k, j in S_l, l != k} A_ij
Spectral Clustering

Weakness:
  Computationally heavy
  Needs criteria to set the final number of clusters and scaling factor
Clustering Algorithms
f1(x)
Choose at random
f2(x)
random variable
fi(x)
Conditional: f (x|source i) = fi (x)
Joint: f (x and source i) = fi (x) i
fk(x)
Unconditional: f(x) =
i 1
34
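The generative story above can be sketched in a few lines of Python (names illustrative): pick a source with probability alpha_i, then sample from it.

```python
import random

def sample_mixture(alphas, samplers, n, seed=0):
    """Draw n points from a mixture: pick component i with probability
    alphas[i], then draw x from samplers[i]. Returns (component, value) pairs."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        i = rng.choices(range(len(alphas)), weights=alphas)[0]
        out.append((i, samplers[i](rng)))
    return out
```

For example, sampling 5000 points from a 0.3/0.7 mixture of two Gaussians yields a component-1 fraction close to 0.7.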
Mixture model:

  f(x | Theta) = sum_{i=1..k} alpha_i f(x | theta_i)

[Figure: a mixture of three component densities f_1(x), f_2(x), f_3(x)]

Component densities: f(x | theta_i)
Gaussian, common covariance: f(x | theta_i) = N(x | mu_i, C)
  Theta = { mu_1, mu_2, ..., mu_k, C, alpha_1, alpha_2, ..., alpha_{k-1} }
Maximum-likelihood estimation of the mixture parameters:

  Theta* = argmax_Theta L(x, Theta)

  L(x, Theta) = log prod_{j=1..n} f(x^(j) | Theta) = sum_{j=1..n} log sum_{i=1..k} alpha_i f(x^(j) | theta_i)
Caution: the mixture likelihood is unbounded. For a two-component Gaussian mixture in one dimension,

  f(x | mu_1, mu_2, sigma_1, sigma_2) = 1/(2 sqrt(2 pi) sigma_1) e^{-(x-mu_1)^2/(2 sigma_1^2)} + 1/(2 sqrt(2 pi) sigma_2) e^{-(x-mu_2)^2/(2 sigma_2^2)}

given data {x_1, x_2, ..., x_n}, set mu_1 = x_1; then

  L(x, Theta) = log[ 1/(2 sqrt(2 pi) sigma_1) + ... ] + sum_{j=2..n} log(...)  ->  infinity  as  sigma_1 -> 0
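A small numerical illustration of this degeneracy (data and parameter values chosen arbitrarily): pinning mu_1 on the first sample and shrinking sigma_1 makes the log-likelihood grow without bound.

```python
import math

def loglik(data, mu1, mu2, s1, s2):
    """Log-likelihood of an equal-weight two-component 1-D Gaussian mixture."""
    def N(x, mu, s):
        return math.exp(-(x - mu) ** 2 / (2 * s * s)) / (math.sqrt(2 * math.pi) * s)
    return sum(math.log(0.5 * N(x, mu1, s1) + 0.5 * N(x, mu2, s2)) for x in data)

data = [0.0, 1.0, 2.0]
# Pin mu_1 on the first sample and shrink sigma_1: the likelihood diverges.
Ls = [loglik(data, mu1=data[0], mu2=1.5, s1=s, s2=1.0)
      for s in (1.0, 0.1, 0.01, 0.001)]
```

Each tenfold shrinkage of sigma_1 eventually adds a constant to the log-likelihood (the density spike at x_1 grows like 1/sigma_1), so Ls is strictly increasing.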
Observed data: x = { x^(1), x^(2), ..., x^(n) }
Missing data: z^(j), indicating the component that generated x^(j) (z_i^(j) = 1 if x^(j) was generated by component i, 0 otherwise)

Complete-data log-likelihood:

  L_c(x, z, Theta) = sum_{j=1..n} sum_{i=1..k} z_i^(j) log[ alpha_i f_i(x^(j) | theta_i) ] = sum_{j=1..n} log f(x^(j), z^(j) | Theta)
The EM Algorithm

The E-step: compute the expected value of L_c(x, z, Theta),

  Q(Theta, Theta^(t)) = E[ L_c(x, z, Theta) | x, Theta^(t) ]

which amounts to computing the responsibilities

  w_i^(j,t) = alpha_i^(t) f(x^(j) | theta_i^(t)) / sum_{m=1..k} alpha_m^(t) f(x^(j) | theta_m^(t))

The M-step:

  alpha_i^(t+1) = (1/n) sum_{j=1..n} w_i^(j,t)

  mu_i^(t+1) = sum_{j=1..n} w_i^(j,t) x^(j) / sum_{j=1..n} w_i^(j,t)

  C_i^(t+1) = sum_{j=1..n} w_i^(j,t) (x^(j) - mu_i^(t+1)) (x^(j) - mu_i^(t+1))^T / sum_{j=1..n} w_i^(j,t)
Selecting the number of components k:

Usually:  k* = argmin_k C(Theta^(k), k),  k = k_min, ..., k_max

  C(Theta^(k), k) = -L(x, Theta^(k)) + P(Theta^(k))

where P penalizes model complexity. Alternatives: resampling-based techniques.
Clustering Algorithms
L(x | ( k ) ) log f (x | ( k ) )
MDL criterion: parameter code length
Parameter
code-length
MDL criterion:
(k)
(k)
(k)
( k )
1
log( n ' )
2
39
A component-wise M-step (MML-based) re-estimates the mixing probabilities as

  alpha_i^(t+1) = max{ 0, sum_{j=1..n} w_i^(j,t) - N_p/2 } / sum_{m=1..k} max{ 0, sum_{j=1..n} w_m^(j,t) - N_p/2 }

This M-step may annihilate components.

M. Figueiredo and A. K. Jain, Unsupervised Learning of Finite Mixture Models, IEEE TPAMI, 2002
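A sketch of this mixing-probability update, following the max{0, ...} form of Figueiredo and Jain's component-wise M-step (treat this as an illustration of the annihilation mechanism, not a full implementation of their algorithm):

```python
def mml_alpha_update(w, n_params):
    """MML-style mixing-probability update: components whose effective sample
    weight falls below n_params/2 are annihilated (alpha -> 0).
    w[i] is the list of responsibilities of component i over the data."""
    raw = [max(0.0, sum(wi) - n_params / 2.0) for wi in w]
    total = sum(raw)
    return [r / total for r in raw]
```

A component supported by only one effective sample out of 100 is driven to alpha = 0 when N_p/2 = 2, while the surviving probabilities are renormalized to sum to one.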
EM-based Gaussian Mixture Decomposition

Strength:
  Model-based approach
  Good for Gaussian data
  Handles touching clusters

Weakness:
  Convergence may be slow, as illustrated below
[Figure: EM runs requiring 74 iterations and 270 iterations to converge]
References
A. K. Jain and R. C. Dubes. Algorithms for Clustering Data. Prentice Hall, 1988.
A.K. Jain and M. N. Murty and P.J. Flynn, Data Clustering: A Review, ACM Computing Surveys, vol 31. No
3,pp 264-323, 1999.
Data Mining: Concepts and Techniques, J. Han and M. Kamber, Morgan Kaufmann Pusblishers, 2001.
A. Y. Ng and M. I. Jordan and Y. Weiss, On Spectral Clustering: Analysis and an algorithm, in Advances in
Neural Information Processing Systems 14, T. G. Dietterich and S. Becker and Z. Ghahramani, MIT
Press, 2002,
M. Figueiredo and A. K. Jain, Unsupervised Learning of Finite Mixture Models, IEEE TPAMI, 2002
A. L. N. Fred, J. M. N. Leito, A New Cluster Isolation Criterion Based on Dissimilarity Increments, IEEE
Trans. On Pattern Analysis and Machine Intelligence, vol 25, NO. 8, pp 2003.
42