CSE586
Gaussian Mixtures and the EM Algorithm
Reading: Chapter 7.4, Prince book
Robert Collins
• Multivariate Gaussian

  $$ \mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\!\left(-\tfrac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)\right) $$

  Evaluates to a number. The mean $\mu$ is a d x 1 vector and the covariance $\Sigma$ is a d x d matrix. (Bishop, 2003)
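As a quick sanity check of this formula, here is a minimal NumPy sketch (the function name gaussian_density and the example numbers are mine, not from the slides):

```python
import numpy as np

def gaussian_density(x, mu, Sigma):
    """Evaluate the d-dimensional Gaussian N(x | mu, Sigma) at a single point x."""
    d = mu.shape[0]
    diff = x - mu
    # Mahalanobis term: (x - mu)^T Sigma^{-1} (x - mu)
    maha = diff @ np.linalg.solve(Sigma, diff)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    return np.exp(-0.5 * maha) / norm

# Example: evaluating a 2D Gaussian at its mean gives the peak density
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
print(gaussian_density(mu, mu, Sigma))   # equals 1 / (2*pi*sqrt(det(Sigma)))
```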
Likelihood Function
• Data set
• Assume observed data points are generated independently
• Viewed as a function of the parameters, this is known as the likelihood function
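The equation on this slide is not legible in this copy; for a data set $X = \{x_1, \dots, x_N\}$ drawn i.i.d. from a single Gaussian, the standard form it refers to is

$$ p(X \mid \mu, \Sigma) = \prod_{n=1}^{N} \mathcal{N}(x_n \mid \mu, \Sigma) $$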
Maximum Likelihood
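The derivation on this slide (from Bishop, 2003) is likewise lost in this copy; maximizing the log of the likelihood above gives the standard estimates

$$ \mu_{ML} = \frac{1}{N}\sum_{n=1}^{N} x_n, \qquad \Sigma_{ML} = \frac{1}{N}\sum_{n=1}^{N} (x_n - \mu_{ML})(x_n - \mu_{ML})^{T} $$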
Comments
[Figure: data scatter plot; horizontal axis: time between eruptions (minutes). Bishop, 2004]
[Figure: two panels (a) and (b), both plotted on the unit square. Bishop, 2004]
Aside: Sampling from a Mixture Model

[Figure: panel (a), samples on the unit square.]
Generate u = uniform random number between 0 and 1
If u < π1
generate x ~ N(x | µ1, Σ1)
elseif u < π1 + π2
generate x ~ N(x | µ2, Σ2)
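A runnable version of this sampling procedure (a sketch assuming a three-component 2D mixture; the weights, means, and covariances below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 3-component mixture; the weights must sum to 1
pis    = [0.5, 0.3, 0.2]
mus    = [np.array([0.2, 0.3]), np.array([0.6, 0.7]), np.array([0.8, 0.2])]
Sigmas = [0.01 * np.eye(2)] * 3

def sample_mixture():
    """Pick a component with probability pi_k, then draw from that Gaussian."""
    u = rng.uniform()                  # uniform random number between 0 and 1
    if u < pis[0]:
        return rng.multivariate_normal(mus[0], Sigmas[0])
    elif u < pis[0] + pis[1]:
        return rng.multivariate_normal(mus[1], Sigmas[1])
    else:
        return rng.multivariate_normal(mus[2], Sigmas[2])

samples = np.array([sample_mixture() for _ in range(1000)])
```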
• Log likelihood
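For a K-component Gaussian mixture, the log likelihood referred to here is

$$ \ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) $$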
EM Algorithm

Suppose some oracle told us which point comes from which Gaussian (i.e., gave us the labels).

[Figure: the labeled data set decomposed as a sum of three single-Gaussian clusters.]
And we can easily estimate each Gaussian, along with the mixture weights!
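The estimates implied here are not legible in this copy; they are just labeled sample statistics. Writing $N_k$ for the number of points labeled k,

$$ \pi_k = \frac{N_k}{N}, \qquad \mu_k = \frac{1}{N_k}\sum_{n:\,z_{nk}=1} x_n, \qquad \Sigma_k = \frac{1}{N_k}\sum_{n:\,z_{nk}=1} (x_n - \mu_k)(x_n - \mu_k)^{T} $$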
Given the labels, the likelihood splits into per-component terms, so each Gaussian (and its mixing weight) can be estimated separately.
Unfortunately, oracles don’t exist (or if they do, they won’t talk to us)
So we bootstrap: 1) start with a guess of the parameters; 2) estimate the labels given the parameters; 3) re-estimate the parameters given the labels; 4) iterate...
Insight
Since we don't know the latent variables, we instead take the expected value of the log likelihood with respect to their posterior distribution P(z | x, θ). In the GMM case, this is equivalent to "softening" the binary latent variables into continuous ones (the expected values of the latent variables), where the ownership weight for point n and component k is P(znk = 1 | xn, θ).
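In Bishop's notation this ownership weight (responsibility) is

$$ \gamma(z_{nk}) = P(z_{nk} = 1 \mid x_n) = \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)} $$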
With these soft ownership weights in place of hard labels, the expected log likelihood again decouples, so each component can still be estimated separately.

E step: compute the ownership weights.
M step: update the means, covariances, and mixing probabilities.
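A minimal NumPy/SciPy sketch of one such EM iteration (the function name em_step and the data layout, X as an N x d array, are my own choices, not from the slides):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, pis, mus, Sigmas):
    """One EM iteration for a Gaussian mixture model. X is an N x d array."""
    N, K = X.shape[0], len(pis)

    # E step: ownership weights gamma[n, k] = P(z_nk = 1 | x_n, theta)
    gamma = np.zeros((N, K))
    for k in range(K):
        gamma[:, k] = pis[k] * multivariate_normal.pdf(X, mus[k], Sigmas[k])
    gamma /= gamma.sum(axis=1, keepdims=True)

    # M step: weighted means, covariances, and mixing probabilities
    Nk = gamma.sum(axis=0)                      # effective number of points per component
    new_pis = Nk / N
    new_mus, new_Sigmas = [], []
    for k in range(K):
        mu_k = gamma[:, k] @ X / Nk[k]
        diff = X - mu_k
        Sigma_k = (gamma[:, k, None] * diff).T @ diff / Nk[k]
        new_mus.append(mu_k)
        new_Sigmas.append(Sigma_k)
    return new_pis, new_mus, new_Sigmas
```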
labeled: easy to estimate params (do each color separately)
unlabeled: hard to estimate params (we need to assign colors)
E step: ownership weights (soft labels)
M step: means, covariances
$$ z_{nj} = \begin{cases} 1 & \text{if } \mu_j \text{ is the closest mean to } x_n \\ 0 & \text{otherwise} \end{cases} $$

(hard labels, as in the K-means algorithm)
K-Means Algorithm
• Given N data points x1, x2, ..., xN
• Find K cluster centers µ1, µ2, ..., µK to minimize

  $$ J = \sum_{n=1}^{N} \sum_{k=1}^{K} z_{nk} \, \lVert x_n - \mu_k \rVert^2 $$

  (znk is 1 if point n belongs to cluster k; 0 otherwise)
• Algorithm:
  – initialize K cluster centers µ1, µ2, ..., µK
  – repeat
    • set znk labels to assign each point to the closest cluster center (E step)
    • revise each cluster center µj to be the center of mass of the points in that cluster (M step)
  – until convergence (e.g. the znk labels don't change)
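A compact NumPy sketch of this procedure (the function name kmeans, the random initialization, and the stopping rule are my choices; empty clusters are not handled):

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Hard-assignment K-means. X is an N x d array; returns centers and labels."""
    rng = np.random.default_rng(seed)
    # initialize the K cluster centers by picking K distinct data points
    centers = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # E-like step: assign each point to the closest cluster center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # M-like step: move each center to the mean of its assigned points
        new_centers = np.array([X[labels == k].mean(axis=0) for k in range(K)])
        if np.allclose(new_centers, centers):   # centers (and hence labels) stopped changing
            break
        centers = new_centers
    return centers, labels
```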