MA5232 Modeling and Numerical Simulations
Lecture 2: Iterative Methods for Mixture-Model Segmentation
8 Apr 2015
National University of Singapore

Last time
PCA reduces the dimensionality of a data set while retaining as much of the data variation as possible.
Statistical view: the leading PCs are given by the leading eigenvectors of the covariance.
Geometric view: fitting a d-dimensional subspace model via the SVD.

Extensions of PCA
Probabilistic PCA via MLE
Kernel PCA via kernel functions and kernel matrices

This lecture
Review basic iterative algorithms for central clustering
Formulation of the subspace segmentation problem

Segmentation by Clustering

From: "Object Recognition as Machine Translation", Duygulu, Barnard, de Freitas, and Forsyth, ECCV 2002

Example 4.1
Euclidean distance-based clustering is not invariant to linear transformations of the data.
The distance metric therefore needs to be adjusted after a linear transformation.
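As a quick, hypothetical illustration (the data, means, and transformation below are my own, not from the lecture), the nearest-mean assignment under the Euclidean distance can change once one coordinate is rescaled:

```python
import numpy as np

def nearest_mean(x, means):
    """Index of the mean closest to x in Euclidean distance."""
    return int(((means - x) ** 2).sum(axis=1).argmin())

# Two cluster means and a query point in the original coordinates.
means = np.array([[0.0, 0.0],
                  [4.0, 1.0]])
x = np.array([3.0, 0.4])

# Anisotropic linear transformation: stretch the second coordinate by 10.
A = np.diag([1.0, 10.0])

print(nearest_mean(x, means))            # -> 1 in the original coordinates
print(nearest_mean(A @ x, means @ A.T))  # -> 0 after the linear transformation
```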

Central Clustering
Assume the data are sampled from a mixture of Gaussians.
The classical distance metric between a sample and the mean of the jth cluster is the Mahalanobis distance.
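For reference, with mean μ_j and covariance Σ_j for the jth cluster (standard notation, assumed here), the Mahalanobis distance from a sample x_i to that cluster is

```latex
d_{\Sigma_j}^{2}(x_i, \mu_j) \;=\; (x_i - \mu_j)^{\top}\, \Sigma_j^{-1}\, (x_i - \mu_j).
```

Unlike the plain Euclidean distance, this metric adapts to a linear transformation of the data through Σ_j.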

Central Clustering: K-Means
Assume a map function assigns each ith sample a cluster label.
An optimal clustering minimizes the within-cluster scatter, i.e., the average distance of all samples to their respective cluster means.
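In one common notation (assumed here, not copied from the handout), with assignment map c(i) ∈ {1, …, K} and cluster means μ_j, the within-cluster scatter to be minimized is

```latex
\min_{c,\;\mu_1,\dots,\mu_K}\;\; \frac{1}{n}\sum_{i=1}^{n} \big\lVert x_i - \mu_{c(i)} \big\rVert^{2}.
```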

Central Clustering: K-Means
However, since K is user-defined, the within-cluster scatter can always be reduced by increasing K; in the extreme case K = n, each point becomes a cluster by itself.
In this chapter, we assume the true K is known.

Algorithm
A chicken-and-egg view: the cluster labels are easy to assign once the means are known, and the means are easy to estimate once the labels are known.

Two-Step Iteration

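A minimal NumPy sketch of this two-step iteration (function and variable names are my own; the handout's pseudocode may differ in details such as the stopping rule):

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Minimal K-means: alternate the assignment step and the mean-update step."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)]  # initialize means from samples
    for _ in range(n_iters):
        # Step 1 (segmentation): assign each sample to its nearest mean.
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Step 2 (estimation): recompute each mean from its assigned samples.
        new_mu = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else mu[j]
                           for j in range(K)])
        if np.allclose(new_mu, mu):   # stop when the means no longer move
            break
        mu = new_mu
    return labels, mu
```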


Example
http://util.io/k-means


Feature Space

Source: K. Grauman


Results of K-Means Clustering
[Figure: the input image, clusters on intensity, and clusters on color; k-means clustering using intensity alone and color alone.]
From Marc Pollefeys, COMP 256, 2003


A bad local optimum


Characteristics of K-Means
It is a greedy algorithm and is not guaranteed to converge to the global optimum.
Given fixed initial clusters/Gaussian models, the iterative process is deterministic.
The result may be improved by running k-means multiple times with different starting conditions (see the sketch below).
The segmentation-estimation process can be treated as a generalized expectation-maximization algorithm.
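Building on the kmeans sketch above (again a hypothetical helper, not the handout's code), multiple restarts simply keep the run with the smallest within-cluster scatter:

```python
def kmeans_restarts(X, K, n_restarts=10):
    """Run k-means from several random initializations and keep the best run."""
    best = None
    for seed in range(n_restarts):
        labels, mu = kmeans(X, K, seed=seed)
        scatter = ((X - mu[labels]) ** 2).sum()   # within-cluster scatter of this run
        if best is None or scatter < best[0]:
            best = (scatter, labels, mu)
    return best[1], best[2]
```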


EM Algorithm [Dempster-Laird-Rubin 1977]
Expectation Maximization (EM) estimates the model parameters and the segmentation in a maximum-likelihood (ML) sense.
Assume the samples are independently drawn from a mixture distribution, with the component indicated by a hidden discrete variable z.
The conditional distributions can be Gaussian.
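In standard mixture-model notation (assumed here), each sample is drawn from

```latex
p(x \mid \theta) \;=\; \sum_{j=1}^{K} \pi_j\, p(x \mid z = j,\, \theta_j),
\qquad
p(x \mid z = j,\, \theta_j) \;=\; \mathcal{N}(x;\, \mu_j, \Sigma_j),
```

where π_j = p(z = j) are the mixing proportions and z is the hidden component label.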


The Maximum-Likelihood Estimation
The unknown parameters are the mixing proportions and the parameters of each component, collected in θ.
The likelihood function is the product of the densities of the independent samples.
The optimal solution maximizes the log-likelihood.
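In the notation assumed above, the likelihood and log-likelihood of n independent samples are

```latex
L(\theta) \;=\; \prod_{i=1}^{n} p(x_i \mid \theta),
\qquad
\ell(\theta) \;=\; \sum_{i=1}^{n} \log \sum_{j=1}^{K} \pi_j\, p(x_i \mid z_i = j,\, \theta_j),
\qquad
\hat{\theta} \;=\; \arg\max_{\theta}\, \ell(\theta).
```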


The Maximum-Likelihood Estimation
Directly maximizing the log-likelihood function is a high-dimensional nonlinear optimization problem.


Define a new function:
The first term is called the expected complete log-likelihood function;
the second term is the conditional entropy.
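One standard way to write such a function (notation assumed here, with w_ij the probability that sample i belongs to component j) is

```latex
g(\theta, w) \;=\; \sum_{i=1}^{n}\sum_{j=1}^{K} w_{ij}\, \log p(x_i,\, z_i = j \mid \theta)
\;+\; H(w),
\qquad
H(w) \;=\; -\sum_{i=1}^{n}\sum_{j=1}^{K} w_{ij}\, \log w_{ij},
```

where the first term is the expected complete log-likelihood and H(w) is the conditional entropy of the assignments.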


Observation:


The Maximum-Likelihood Estimation
Regard the (incomplete) log-likelihood as a function of two variables.
Maximize g iteratively: an E step, followed by an M step.
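In the usual form (same assumed notation), iteration t alternates

```latex
\text{E step:}\quad
w_{ij}^{(t+1)} \;=\; p\big(z_i = j \mid x_i,\, \theta^{(t)}\big)
\;=\; \frac{\pi_j^{(t)}\, p\big(x_i \mid z_i = j,\, \theta_j^{(t)}\big)}
           {\sum_{l=1}^{K} \pi_l^{(t)}\, p\big(x_i \mid z_i = l,\, \theta_l^{(t)}\big)},

\text{M step:}\quad
\theta^{(t+1)} \;=\; \arg\max_{\theta}\; \sum_{i=1}^{n}\sum_{j=1}^{K}
w_{ij}^{(t+1)}\, \log p\big(x_i,\, z_i = j \mid \theta\big).
```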


The iteration converges to a stationary point.


Prop 4.2: Update


Update
Recall the definition of g above.
Assume the membership weights w are fixed; then maximize the expected complete log-likelihood.


To maximize the expected log-likelihood, assume as an example that each cluster is an isotropic normal distribution.
Eliminate the constant term in the objective.
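Under this isotropic assumption (Σ_j = σ_j² I in R^D), the weighted M-step updates commonly take the form below; treat this as a sketch in the notation assumed above, since the handout may arrange the expressions differently.

```latex
\pi_j = \frac{1}{n}\sum_{i=1}^{n} w_{ij},
\qquad
\mu_j = \frac{\sum_{i=1}^{n} w_{ij}\, x_i}{\sum_{i=1}^{n} w_{ij}},
\qquad
\sigma_j^{2} = \frac{\sum_{i=1}^{n} w_{ij}\, \lVert x_i - \mu_j \rVert^{2}}{D \sum_{i=1}^{n} w_{ij}}.
```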


Exer 4.2
Compared to k-means, EM assigns the samples softly to each cluster according to a set of probabilities.
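A small NumPy sketch of this soft assignment for the isotropic model (names are my own; a hard k-means assignment would instead take the argmax over j):

```python
import numpy as np

def e_step_isotropic(X, pi, mu, sigma2):
    """Soft assignment: responsibilities w[i, j] = p(z_i = j | x_i, theta)."""
    n, D = X.shape
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)   # squared distances to means
    log_w = np.log(pi) - 0.5 * D * np.log(2 * np.pi * sigma2) - 0.5 * d2 / sigma2
    log_w -= log_w.max(axis=1, keepdims=True)                  # subtract max for stability
    w = np.exp(log_w)
    return w / w.sum(axis=1, keepdims=True)                    # rows sum to one
```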


EM Algorithm
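A minimal EM loop for the isotropic Gaussian mixture, reusing e_step_isotropic from the sketch above (again my own names and a fixed iteration count, not the handout's algorithm):

```python
import numpy as np

def em_isotropic_gmm(X, K, n_iters=100, seed=0):
    """Alternate E and M steps for a mixture of K isotropic Gaussians."""
    rng = np.random.default_rng(seed)
    n, D = X.shape
    mu = X[rng.choice(n, size=K, replace=False)]   # random initial means
    sigma2 = np.full(K, X.var())                   # shared initial variance
    pi = np.full(K, 1.0 / K)                       # uniform mixing proportions
    for _ in range(n_iters):
        w = e_step_isotropic(X, pi, mu, sigma2)    # E step: soft responsibilities
        # M step: weighted updates of the mixture parameters.
        Nj = w.sum(axis=0)
        pi = Nj / n
        mu = (w.T @ X) / Nj[:, None]
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        sigma2 = (w * d2).sum(axis=0) / (D * Nj)
    return pi, mu, sigma2, w
```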


Example 4.3: A global maximum may not exist


Alternative view of EM: Coordinate ascent
[Figure: coordinate-ascent illustration, built up over several slides; the objective is increased by alternately updating along the coordinate directions (axis labels w, w1, w2 in the original slides).]


Visual example of EM


Potential Problems
Incorrect number of mixture components
Singularities

Incorrect Number of Gaussians


Incorrect Number of Gaussians

Singularities
A minority of the data can have a disproportionate effect on the model likelihood.
For example:


GMM example

Singularities
When a mixture component collapses onto a single point, its mean becomes that point and its variance goes to zero.
Consider the likelihood function as the covariance goes to zero: the likelihood approaches infinity.
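Concretely, if one isotropic component centers exactly on a data point x_i (same assumed notation), its density at that point is

```latex
\mathcal{N}\!\big(x_i \mid \mu_j = x_i,\; \sigma_j^{2} I\big)
\;=\; \frac{1}{(2\pi \sigma_j^{2})^{D/2}}
\;\longrightarrow\; \infty
\quad\text{as } \sigma_j \to 0,
```

so the log-likelihood is unbounded above and a global maximum need not exist (cf. Example 4.3).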


K-means vs. EM
K-means clustering and EM clustering on an artificial data set ("mouse"). The tendency of k-means to produce equal-sized clusters leads to poor results here, while EM benefits from the Gaussian distributions present in the data set.


So far
K-means
Expectation Maximization


Next up
Multiple-Subspace Segmentation
K-subspaces
EM for Subspaces


Multiple-Subspace Segmentation
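In broad terms (a paraphrase in assumed notation, not the handout's exact statement): given samples drawn from a union of K linear subspaces,

```latex
x_1, \dots, x_n \;\in\; S_1 \cup S_2 \cup \dots \cup S_K \;\subset\; \mathbb{R}^{D},
\qquad
S_j \;=\; \mathrm{span}(U_j),\quad U_j \in \mathbb{R}^{D \times d_j},
```

the task is to segment the samples by subspace and to estimate a basis U_j for each subspace.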


K-subspaces


K-subspaces
With noise, we minimize the total squared distance from each sample to its assigned subspace.
Unfortunately, unlike PCA, there is no constructive solution to the above minimization problem. The main difficulty is that the foregoing objective is hybrid: it is a combination of minimization over the continuous variables {Uj} and the discrete variable j.
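A common way to write this objective (a sketch in assumed notation: subspaces through the origin with orthonormal bases U_j ∈ R^{D×d_j}; the handout may also include mean vectors) is

```latex
\min_{U_1,\dots,U_K}\;\sum_{i=1}^{n}\;\min_{j \in \{1,\dots,K\}}\;
\big\lVert\, x_i - U_j U_j^{\top} x_i \,\big\rVert^{2},
```

where U_j U_j^T x_i is the orthogonal projection of x_i onto the jth subspace.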



K-subspaces
Given the segmentation, updating each basis Uj is exactly the same as in PCA.
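A minimal sketch of the resulting alternation, assuming K subspaces through the origin of equal dimension d (names and initialization are my own):

```python
import numpy as np

def k_subspaces(X, K, d, n_iters=50, seed=0):
    """Alternate subspace assignment and per-cluster PCA refits."""
    rng = np.random.default_rng(seed)
    D = X.shape[1]
    # Initialize each basis U_j as a random orthonormal D x d matrix.
    U = [np.linalg.qr(rng.standard_normal((D, d)))[0] for _ in range(K)]
    for _ in range(n_iters):
        # Assignment step: residual of x to subspace j is ||x - U_j U_j^T x||^2.
        resid = np.stack([((X - X @ Uj @ Uj.T) ** 2).sum(axis=1) for Uj in U], axis=1)
        labels = resid.argmin(axis=1)
        # Estimation step: refit each subspace exactly as in PCA
        # (top-d right singular vectors of the assigned samples).
        for j in range(K):
            Xj = X[labels == j]
            if len(Xj) >= d:
                _, _, Vt = np.linalg.svd(Xj, full_matrices=False)
                U[j] = Vt[:d].T
    return labels, U
```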



EM for Subspaces


EM for Subspaces
In the M step



Relationship between K-subspaces and EM
At each iteration,
the K-subspaces algorithm gives a definite assignment of every data point to one of the subspaces;
the EM algorithm views the membership as a random variable and uses its expected value to give a probabilistic assignment of the data point.


Homework
Read the handout, Chapter 4: Iterative Methods for Multiple-Subspace Segmentation.
Complete Exercise 4.2 (page 111) of the handout.
