
CS 221: Artificial Intelligence

Lecture 6:
Advanced Machine Learning
Sebastian Thrun and Peter Norvig
Slide credit: Mark Pollefeys, Dan Klein, Chris Manning
Outline
Clustering
K-Means
EM
Spectral Clustering
Dimensionality Reduction
The unsupervised learning problem
Many data points, no labels
Unsupervised Learning?
Google Street View
K-Means
Many data points, no labels
K-Means
Choose a fixed number of clusters

Choose cluster centers and point-cluster allocations to minimize error. We can't do this by exhaustive search, because there are too many possible allocations.

Algorithm (sketched in code below):
fix cluster centers; allocate points to the closest cluster
fix allocation; compute the best cluster centers

x could be any set of features for which we can compute a distance (careful about scaling)

Objective (the error being minimized):
$$\sum_{i \in \text{clusters}} \;\; \sum_{j \in \text{elements of cluster } i} \left\lVert x_j - \mu_i \right\rVert^2$$
* From Marc Pollefeys COMP 256 2003
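A minimal NumPy sketch of the two alternating steps above (Lloyd's algorithm); the function name `kmeans` and the random initialization are our own choices, not from the lecture:

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Alternate the two steps above until the centers stop moving."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random initial centers
    for _ in range(n_iters):
        # Fix cluster centers: allocate each point to its closest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Fix allocation: recompute each center as the mean of its points.
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```

Called as `centers, labels = kmeans(X, k=3)` on any feature matrix X (one row per point), with the scaling caveat above in mind.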
K-Means
* From Marc Pollefeys COMP 256 2003
K-means clustering using intensity alone and color alone (figure: original image, clusters on intensity, clusters on color)
* From Marc Pollefeys COMP 256 2003
Results of K-Means Clustering:
K-Means
Is an approximation to EM
Model (hypothesis space): Mixture of N Gaussians
Latent variables: Correspondence of data and Gaussians
We notice:
Given the mixture model, it's easy to calculate the correspondence
Given the correspondence, it's easy to estimate the mixture models

Expectation Maximization: Idea
Data generated from mixture of Gaussians






Latent variables: Correspondence between Data Items and Gaussians
Generalized K-Means (EM)
Gaussians
Univariate Gaussian:
$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\,\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

Multivariate Gaussian ($N$ dimensions):
$$p(x) = (2\pi)^{-N/2}\,\lvert\Sigma\rvert^{-1/2}\exp\!\left(-\tfrac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)\right)$$
ML Fitting Gaussians
Given data points $x_1, \dots, x_M$, the maximum-likelihood estimates are
$$\mu = \frac{1}{M}\sum_{i=1}^{M} x_i, \qquad \Sigma = \frac{1}{M}\sum_{i=1}^{M} (x_i-\mu)(x_i-\mu)^{T}$$
with $p(x)$ the Gaussian densities given on the previous slide.
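A small NumPy sketch of these maximum-likelihood estimates; the synthetic data and variable names are our own, purely for illustration:

```python
import numpy as np

# Hypothetical sample: M points from a 2-D Gaussian with known parameters.
rng = np.random.default_rng(0)
true_mu = np.array([1.0, -2.0])
true_Sigma = np.array([[2.0, 0.3], [0.3, 0.5]])
X = rng.multivariate_normal(true_mu, true_Sigma, size=1000)

# ML estimates from the slide: sample mean and (biased, 1/M) sample covariance.
M = len(X)
mu_hat = X.sum(axis=0) / M                  # (1/M) * sum_i x_i
diff = X - mu_hat
Sigma_hat = (diff.T @ diff) / M             # (1/M) * sum_i (x_i - mu)(x_i - mu)^T

print(np.round(mu_hat, 2), np.round(Sigma_hat, 2), sep="\n")  # close to the true values
```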

Learning a Gaussian Mixture


(with known covariance)
E-step: compute the expected assignments of points to Gaussians,
$$E[z_{ij}] \;=\; \frac{p(x_i \mid \mu_j)}{\sum_{n=1}^{k} p(x_i \mid \mu_n)} \;=\; \frac{\exp\!\left(-\frac{1}{2\sigma^2}\,(x_i-\mu_j)^2\right)}{\sum_{n=1}^{k} \exp\!\left(-\frac{1}{2\sigma^2}\,(x_i-\mu_n)^2\right)}$$

M-step: re-estimate the parameters from those expectations,
$$\mu_j \;=\; \frac{\sum_{i=1}^{M} E[z_{ij}]\, x_i}{\sum_{i=1}^{M} E[z_{ij}]}, \qquad \Sigma_j \;=\; \frac{\sum_{i=1}^{M} E[z_{ij}]\,(x_i-\mu_j)(x_i-\mu_j)^{T}}{\sum_{i=1}^{M} E[z_{ij}]}$$
Expectation Maximization
Converges!
Proof [Neal/Hinton, McLachlan/Krishnan]:
Each E/M step does not decrease the data likelihood
Converges at local minimum or saddle point
But subject to local minima
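To make the E-step / M-step above concrete, here is a minimal NumPy sketch for the known-covariance case; the shared spherical covariance σ²I, equal mixing weights, and random initialization are our own simplifying assumptions:

```python
import numpy as np

def em_gmm_known_sigma(X, k, sigma=1.0, n_iters=50, seed=0):
    """EM for a mixture of k Gaussians with known covariance sigma^2 * I
    and equal mixing weights (the E-step / M-step formulas above)."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=k, replace=False)]   # initial means
    for _ in range(n_iters):
        # E-step: E[z_ij] is proportional to exp(-||x_i - mu_j||^2 / (2 sigma^2)).
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        w = np.exp(-sq / (2.0 * sigma ** 2))
        Ez = w / w.sum(axis=1, keepdims=True)
        # M-step: each mean is the E[z]-weighted average of the data.
        mu = (Ez.T @ X) / Ez.sum(axis=0)[:, None]
    return mu, Ez
```

Replacing k-means' hard assignments with the soft expectations E[z_ij] is exactly what makes this the "generalized k-means" of the earlier slide.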
EM Clustering: Results
http://www.ece.neu.edu/groups/rpl/kmeans/
Practical EM
Number of clusters unknown
Suffers (badly) from local minima
Algorithm:
Start a new cluster center if many points are unexplained
Kill a cluster center that doesn't contribute
(Use an AIC/BIC criterion for all this, if you want to be formal; see the sketch below)
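One way to make the "use AIC/BIC" advice concrete, sketched with scikit-learn; the synthetic data and the range of candidate k are our own, and the lecture does not prescribe a particular library:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical data with an unknown number of clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.5, size=(100, 2)) for c in ([0, 0], [4, 0], [2, 3])])

# Several restarts per candidate k (to fight local minima), then pick k by lowest BIC.
bic = {k: GaussianMixture(n_components=k, n_init=5, random_state=0).fit(X).bic(X)
       for k in range(1, 8)}
print(min(bic, key=bic.get))   # typically 3 for this data
```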
Spectral Clustering
Spectral Clustering
The Two Spiral Problem
Spectral Clustering: Overview
Data → Similarities → Block-Detection
* Slides from Dan Klein, Sep Kamvar, Chris Manning, Natural Language Group Stanford University
Eigenvectors and Blocks
Block matrices have block eigenvectors:
$$\begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \end{pmatrix} \;\xrightarrow{\ \text{eigensolver}\ }\; \begin{pmatrix} .71 \\ .71 \\ 0 \\ 0 \end{pmatrix},\ \begin{pmatrix} 0 \\ 0 \\ .71 \\ .71 \end{pmatrix} \qquad \lambda_1 = 2,\ \lambda_2 = 2,\ \lambda_3 = 0,\ \lambda_4 = 0$$

Near-block matrices have near-block eigenvectors: [Ng et al., NIPS 02]
$$\begin{pmatrix} 1 & 1 & .2 & 0 \\ 1 & 1 & 0 & -.2 \\ .2 & 0 & 1 & 1 \\ 0 & -.2 & 1 & 1 \end{pmatrix} \;\xrightarrow{\ \text{eigensolver}\ }\; \begin{pmatrix} .71 \\ .69 \\ .14 \\ 0 \end{pmatrix},\ \begin{pmatrix} 0 \\ -.14 \\ .69 \\ .71 \end{pmatrix} \qquad \lambda_1 = 2.02,\ \lambda_2 = 2.02,\ \lambda_3 = -0.02,\ \lambda_4 = -0.02$$
* Slides from Dan Klein, Sep Kamvar, Chris Manning, Natural Language Group Stanford University
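A quick NumPy check of the near-block example above; variable names are ours, and eigenvector signs and basis may differ from the slide's values:

```python
import numpy as np

# The near-block matrix from the slide.
A = np.array([[ 1.0,  1.0,  0.2,  0.0],
              [ 1.0,  1.0,  0.0, -0.2],
              [ 0.2,  0.0,  1.0,  1.0],
              [ 0.0, -0.2,  1.0,  1.0]])

vals, vecs = np.linalg.eigh(A)     # symmetric matrix, so eigh is appropriate
order = np.argsort(vals)[::-1]     # largest eigenvalues first
print(np.round(vals[order], 2))    # [ 2.02  2.02 -0.02 -0.02]
# The two leading columns of vecs span the same space as the near-block eigenvectors
# above; since the top eigenvalues are (nearly) equal, the solver may return any
# rotated basis of that 2-D eigenspace.
print(np.round(vecs[:, order[:2]], 2))
```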
Spectral Space
Can put items into blocks by eigenvectors:
$$\begin{pmatrix} 1 & 1 & .2 & 0 \\ 1 & 1 & 0 & -.2 \\ .2 & 0 & 1 & 1 \\ 0 & -.2 & 1 & 1 \end{pmatrix} \qquad e_1 = \begin{pmatrix} .71 \\ .69 \\ .14 \\ 0 \end{pmatrix},\quad e_2 = \begin{pmatrix} 0 \\ -.14 \\ .69 \\ .71 \end{pmatrix}$$

Resulting clusters independent of row ordering:
$$\begin{pmatrix} 1 & .2 & 1 & 0 \\ .2 & 1 & 0 & 1 \\ 1 & 0 & 1 & -.2 \\ 0 & 1 & -.2 & 1 \end{pmatrix} \qquad e_1 = \begin{pmatrix} .71 \\ .14 \\ .69 \\ 0 \end{pmatrix},\quad e_2 = \begin{pmatrix} 0 \\ .69 \\ -.14 \\ .71 \end{pmatrix}$$
* Slides from Dan Klein, Sep Kamvar, Chris Manning, Natural Language Group Stanford University
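A toy sketch of the items-to-blocks idea, under our own simplifications: embed each item by the top eigenvectors and then run k-means in that spectral space (here via SciPy's `kmeans2`), which also shows that the clusters do not depend on row ordering:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def spectral_clusters(A, k):
    """Embed each item by the top-k eigenvectors of the affinity matrix A,
    then cluster the embedded rows with k-means."""
    vals, vecs = np.linalg.eigh(A)
    embedding = vecs[:, np.argsort(vals)[::-1][:k]]   # one row per item, in spectral space
    _, labels = kmeans2(embedding, k, minit='++')
    return labels

A = np.array([[ 1.0,  1.0,  0.2,  0.0],
              [ 1.0,  1.0,  0.0, -0.2],
              [ 0.2,  0.0,  1.0,  1.0],
              [ 0.0, -0.2,  1.0,  1.0]])
print(spectral_clusters(A, 2))                       # e.g. [0 0 1 1] (labels up to renaming)

# Reordering the items only permutes the labels; the clusters themselves are unchanged.
perm = np.array([0, 2, 1, 3])
print(spectral_clusters(A[np.ix_(perm, perm)], 2))   # e.g. [0 1 0 1]
```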
The Spectral Advantage
The key advantage of spectral clustering is the spectral space representation:
* Slides from Dan Klein, Sep Kamvar, Chris Manning, Natural Language Group Stanford University
Measuring Affinity
Intensity: $\mathrm{aff}(x,y) = \exp\!\left(-\frac{1}{2\sigma_i^2}\,\lVert I(x)-I(y)\rVert^2\right)$

Texture: $\mathrm{aff}(x,y) = \exp\!\left(-\frac{1}{2\sigma_t^2}\,\lVert c(x)-c(y)\rVert^2\right)$

Distance: $\mathrm{aff}(x,y) = \exp\!\left(-\frac{1}{2\sigma_d^2}\,\lVert x-y\rVert^2\right)$

where $I(\cdot)$ is intensity, $c(\cdot)$ is a texture (filter-response) vector, and $x, y$ are pixel positions.
* From Marc Pollefeys COMP 256 2003


Scale affects affinity
* From Marc Pollefeys COMP 256 2003
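A small sketch of such a Gaussian affinity on arbitrary feature vectors, illustrating the scale point above; the function and variable names are ours:

```python
import numpy as np

def affinity(features, sigma):
    """Gaussian affinity aff(x, y) = exp(-||f(x) - f(y)||^2 / (2 sigma^2)),
    where f can be intensity, a texture/filter-response vector, or position."""
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Three 1-D "intensities": two similar points and one far away.
f = np.array([[0.0], [0.1], [5.0]])
print(np.round(affinity(f, sigma=0.5), 3))   # small sigma: only near-identical points connect
print(np.round(affinity(f, sigma=10.0), 3))  # large sigma: everything looks similar
```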
Dimensionality Reduction
The Space of Digits (in 2D)
M. Brand, MERL
Dimensionality Reduction with PCA
gaelvaroquaux@normalesup.org
Linear: Principal Components

Fit a multivariate Gaussian
Compute the eigenvectors of the covariance
Project onto the eigenvectors with the largest eigenvalues (sketched below)
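A minimal NumPy sketch of these three steps; the function name `pca_project` is ours:

```python
import numpy as np

def pca_project(X, d):
    """Fit a Gaussian (mean and covariance), take the eigenvectors of the
    covariance with the largest eigenvalues, and project onto them."""
    mu = X.mean(axis=0)
    Xc = X - mu
    cov = (Xc.T @ Xc) / len(X)               # covariance of the fitted Gaussian
    vals, vecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    W = vecs[:, np.argsort(vals)[::-1][:d]]  # top-d principal directions
    return Xc @ W                            # d-dimensional coordinates
```

For the digit pictures earlier, X would hold one flattened image per row and d = 2 would give the 2-D map.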

Other examples of unsupervised learning
Mean face (after alignment)
Slide credit: Santiago Serrano
Eigenfaces
Slide credit: Santiago Serrano
Non-Linear Techniques
Isomap
Local Linear Embedding
Isomap, Science, M. Balasubramanian and E. Schwartz
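A hedged usage sketch of both techniques with scikit-learn's implementations; the swiss-roll-style toy data is our own, and the lecture itself only names the methods:

```python
import numpy as np
from sklearn.manifold import Isomap, LocallyLinearEmbedding

# Toy "swiss roll"-style data: a 2-D sheet rolled up in 3-D.
rng = np.random.default_rng(0)
t = 3 * np.pi * (1 + 2 * rng.random(500))
X = np.column_stack([t * np.cos(t), 20 * rng.random(500), t * np.sin(t)])

# Both methods use local neighborhoods to unroll the sheet into 2 dimensions.
X_iso = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
X_lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2).fit_transform(X)
print(X_iso.shape, X_lle.shape)   # (500, 2) (500, 2)
```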
SCAPE (Dragomir Anguelov et al.)
