
EM Algorithm

Jur van den Berg

Kalman Filtering vs. Smoothing

Dynamics and observation model:
  x_{t+1} = A x_t + w_t,  w_t ~ N(0, Q)
  y_t = C x_t + v_t,      v_t ~ N(0, R)

Kalman filter:
  Compute X_t | Y_0 = y_0, ..., Y_t = y_t
  Real-time, given data so far

Kalman smoother:
  Compute X_t | Y_0 = y_0, ..., Y_T = y_T, for 0 <= t <= T
  Post-processing, given all data

EM Algorithm

  x_{t+1} = A x_t + w_t,  w_t ~ N(0, Q)
  y_t = C x_t + v_t,      v_t ~ N(0, R)

Kalman smoother:
  Compute the distributions X_0, ..., X_T
  given parameters A, C, Q, R and data y_0, ..., y_T.

EM algorithm:
  Simultaneously optimize X_0, ..., X_T and A, C, Q, R
  given data y_0, ..., y_T.

Probability vs. Likelihood

Probability: predict unknown outcomes based on known parameters:
  p(x | θ)

Likelihood: estimate unknown parameters based on known outcomes:
  L(θ | x) = p(x | θ)

Coin-flip example:
  θ is the probability of heads (parameter)
  x = HHHTTH is the outcome

Likelihood for Coin-flip Example

Probability of outcome given parameter:
  p(x = HHHTTH | θ = 0.5) = 0.5^6 ≈ 0.016

Likelihood of parameter given outcome:
  L(θ = 0.5 | x = HHHTTH) = p(x | θ) = 0.5^6 ≈ 0.016

The likelihood is maximal when θ = 2/3 ≈ 0.667.
The likelihood function is not a probability density.
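A quick numerical illustration of this slide. This is a sketch; the function name `likelihood` and the grid search are ours, not part of the slides:

```python
import numpy as np

def likelihood(theta, heads, tails):
    # L(theta | x) = p(x | theta) for an i.i.d. sequence of coin flips.
    return theta**heads * (1.0 - theta)**tails

# Outcome x = HHHTTH: 4 heads, 2 tails.
L_half = likelihood(0.5, 4, 2)
print(L_half)  # 0.5^6 = 0.015625, i.e. about 0.016

# Scanning a grid shows the likelihood peaks at theta = 4/6 = 2/3.
grid = np.linspace(0.0, 1.0, 601)
theta_best = grid[np.argmax(likelihood(grid, 4, 2))]
```

Note that the likelihoods need not sum to one over θ, which is why the likelihood function is not a probability density.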

Likelihood for Continuous Distributions

Six samples {-3, -2, -1, 1, 2, 3} believed to be drawn from some Gaussian N(0, σ²).

Likelihood L(σ | {-3, -2, -1, 1, 2, 3}):
  p(x = -3 | σ) · p(x = -2 | σ) · ... · p(x = 3 | σ)

Maximum likelihood:
  σ = sqrt( ((-3)² + (-2)² + (-1)² + 1² + 2² + 3²) / 6 ) ≈ 2.16
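The maximum-likelihood σ can be verified numerically. A minimal sketch (variable and function names are ours):

```python
import numpy as np

samples = np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])

def log_likelihood(sigma):
    # Sum of log N(x; 0, sigma^2) over the six samples.
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - samples**2 / (2 * sigma**2))

# Closed-form maximizer for zero-mean data: sigma^2 = mean of squared samples.
sigma_mle = np.sqrt(np.mean(samples**2))
print(sigma_mle)  # sqrt(28/6) ≈ 2.16

# Numerical check: a grid search lands on (almost) the same value.
grid = np.linspace(0.5, 5.0, 1000)
sigma_grid = grid[np.argmax([log_likelihood(s) for s in grid])]
```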

Likelihood for Stochastic Model

Dynamics model:
  x_{t+1} = A x_t + w_t,  w_t ~ N(0, Q)
  y_t = C x_t + v_t,      v_t ~ N(0, R)

Suppose x_t and y_t are given for 0 <= t <= T; what is the likelihood of A, C, Q and R?

  L(A, C, Q, R | x, y) = p(x, y | A, C, Q, R) = Π_{t=0}^{T-1} p(x_{t+1} | x_t) · Π_{t=0}^{T} p(y_t | x_t)

Compute the log-likelihood: log p(x, y | A, C, Q, R)
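As a sanity check, this likelihood can be evaluated directly on simulated data. A sketch with small illustrative matrices (all parameter values and function names are ours, not from the slides); the true parameters score higher than a perturbed A:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-D state / 1-D observation model (illustrative values).
A = np.array([[1.0, 0.1], [0.0, 0.9]])
C = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2)
R = np.array([[0.1]])

def log_gauss(x, mean, cov):
    """log N(x; mean, cov) for a multivariate normal."""
    k = len(x)
    d = x - mean
    return -0.5 * (k * np.log(2 * np.pi) + np.log(np.linalg.det(cov))
                   + d @ np.linalg.solve(cov, d))

def log_likelihood(A, C, Q, R, xs, ys):
    """log p(x, y) = sum_t log p(x_{t+1}|x_t) + sum_t log p(y_t|x_t)."""
    ll = sum(log_gauss(xs[t + 1], A @ xs[t], Q) for t in range(len(xs) - 1))
    ll += sum(log_gauss(ys[t], C @ xs[t], R) for t in range(len(ys)))
    return ll

# Simulate a short trajectory from the model and evaluate its likelihood.
T = 20
xs = [np.array([0.0, 1.0])]
ys = [C @ xs[0] + rng.multivariate_normal(np.zeros(1), R)]
for t in range(T):
    xs.append(A @ xs[-1] + rng.multivariate_normal(np.zeros(2), Q))
    ys.append(C @ xs[-1] + rng.multivariate_normal(np.zeros(1), R))

ll_true = log_likelihood(A, C, Q, R, xs, ys)
ll_bad = log_likelihood(2 * A, C, Q, R, xs, ys)
```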

Log-likelihood

  log p(x, y | A, C, Q, R) = log ( Π_{t=0}^{T-1} p(x_{t+1} | x_t) · Π_{t=0}^{T} p(y_t | x_t) )
                           = Σ_{t=0}^{T-1} log p(x_{t+1} | x_t) + Σ_{t=0}^{T} log p(y_t | x_t) = ...

The multivariate normal distribution N(μ, Σ) has pdf:
  p(x) = (2π)^{-k/2} |Σ|^{-1/2} exp( -1/2 (x - μ)^T Σ^{-1} (x - μ) )

From the model: x_{t+1} ~ N(A x_t, Q) and y_t ~ N(C x_t, R), so

  ... = Σ_{t=0}^{T-1} ( -1/2 log|Q| - 1/2 (x_{t+1} - A x_t)^T Q^{-1} (x_{t+1} - A x_t) )
      + Σ_{t=0}^{T} ( -1/2 log|R| - 1/2 (y_t - C x_t)^T R^{-1} (y_t - C x_t) ) + const

Log-likelihood #2

  Σ_{t=0}^{T-1} ( -1/2 log|Q| - 1/2 (x_{t+1} - A x_t)^T Q^{-1} (x_{t+1} - A x_t) )
  + Σ_{t=0}^{T} ( -1/2 log|R| - 1/2 (y_t - C x_t)^T R^{-1} (y_t - C x_t) ) + const = ...

Using a = Tr(a) if a is scalar, and bringing the summation inward:

  ... = -T/2 log|Q| - 1/2 Σ_{t=0}^{T-1} Tr( (x_{t+1} - A x_t)^T Q^{-1} (x_{t+1} - A x_t) )
        - (T+1)/2 log|R| - 1/2 Σ_{t=0}^{T} Tr( (y_t - C x_t)^T R^{-1} (y_t - C x_t) ) + const

Log-likelihood #3

  -T/2 log|Q| - 1/2 Σ_{t=0}^{T-1} Tr( (x_{t+1} - A x_t)^T Q^{-1} (x_{t+1} - A x_t) )
  - (T+1)/2 log|R| - 1/2 Σ_{t=0}^{T} Tr( (y_t - C x_t)^T R^{-1} (y_t - C x_t) ) + const = ...

Using Tr(AB) = Tr(BA) and Tr(A) + Tr(B) = Tr(A + B):

  ... = -T/2 log|Q| - 1/2 Tr( Q^{-1} Σ_{t=0}^{T-1} (x_{t+1} - A x_t)(x_{t+1} - A x_t)^T )
        - (T+1)/2 log|R| - 1/2 Tr( R^{-1} Σ_{t=0}^{T} (y_t - C x_t)(y_t - C x_t)^T ) + const

Log-likelihood #4

  -T/2 log|Q| - 1/2 Tr( Q^{-1} Σ_{t=0}^{T-1} (x_{t+1} - A x_t)(x_{t+1} - A x_t)^T )
  - (T+1)/2 log|R| - 1/2 Tr( R^{-1} Σ_{t=0}^{T} (y_t - C x_t)(y_t - C x_t)^T ) + const = ...

Expanding the quadratic terms gives:

l(A, C, Q, R | x, y) =
  -T/2 log|Q| - 1/2 Tr( Q^{-1} Σ_{t=0}^{T-1} ( x_{t+1} x_{t+1}^T - x_{t+1} x_t^T A^T - A x_t x_{t+1}^T + A x_t x_t^T A^T ) )
  - (T+1)/2 log|R| - 1/2 Tr( R^{-1} Σ_{t=0}^{T} ( y_t y_t^T - y_t x_t^T C^T - C x_t y_t^T + C x_t x_t^T C^T ) ) + const

Maximize likelihood

log is a monotone function:
  max log(f(x))  <=>  max f(x)

Maximize l(A, C, Q, R | x, y) in turn for A, C, Q and R:
  Solve ∂l(A, C, Q, R | x, y) / ∂A = 0 for A
  Solve ∂l(A, C, Q, R | x, y) / ∂C = 0 for C
  Solve ∂l(A, C, Q, R | x, y) / ∂Q = 0 for Q
  Solve ∂l(A, C, Q, R | x, y) / ∂R = 0 for R

Matrix derivatives

Defined for scalar functions f : R^{n×m} -> R.

Key identities:
  ∂(x^T A x) / ∂x = x^T (A^T + A)
  ∂ Tr(B^T A B) / ∂B = B^T (A^T + A)
  ∂ Tr(BA) / ∂A = B^T
  ∂ log|A| / ∂A = A^{-T}
  Tr(AB) = Tr(BA) = Tr(B^T A^T)
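These identities are easy to sanity-check numerically. A minimal sketch verifying ∂ log|A| / ∂A = A^{-T} by central finite differences (the test matrix is an arbitrary well-conditioned choice of ours):

```python
import numpy as np

# A diagonally dominant matrix, so |A| > 0 and log|A| is well defined.
rng = np.random.default_rng(1)
A = 3 * np.eye(3) + 0.1 * rng.standard_normal((3, 3))

# Entry-wise central finite differences of log|A|.
eps = 1e-6
grad = np.zeros_like(A)
for i in range(3):
    for j in range(3):
        E = np.zeros_like(A)
        E[i, j] = eps
        grad[i, j] = (np.log(np.linalg.det(A + E))
                      - np.log(np.linalg.det(A - E))) / (2 * eps)

# The identity predicts the gradient is the transposed inverse.
analytic = np.linalg.inv(A).T
print(np.max(np.abs(grad - analytic)))  # close to zero
```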

Optimizing A

Derivative:
  ∂l(A, C, Q, R | x, y) / ∂A = -1/2 Q^{-1} Σ_{t=0}^{T-1} ( -2 x_{t+1} x_t^T + 2 A x_t x_t^T )

Maximizer:
  A = ( Σ_{t=0}^{T-1} x_{t+1} x_t^T ) ( Σ_{t=0}^{T-1} x_t x_t^T )^{-1}

Optimizing C

Derivative:
  ∂l(A, C, Q, R | x, y) / ∂C = -1/2 R^{-1} Σ_{t=0}^{T} ( -2 y_t x_t^T + 2 C x_t x_t^T )

Maximizer:
  C = ( Σ_{t=0}^{T} y_t x_t^T ) ( Σ_{t=0}^{T} x_t x_t^T )^{-1}

Optimizing Q

Derivative with respect to the inverse:
  ∂l(A, C, Q, R | x, y) / ∂Q^{-1} = T/2 Q - 1/2 Σ_{t=0}^{T-1} ( x_{t+1} x_{t+1}^T - x_{t+1} x_t^T A^T - A x_t x_{t+1}^T + A x_t x_t^T A^T )

Maximizer:
  Q = 1/T Σ_{t=0}^{T-1} ( x_{t+1} x_{t+1}^T - x_{t+1} x_t^T A^T - A x_t x_{t+1}^T + A x_t x_t^T A^T )

Optimizing R

Derivative with respect to the inverse:
  ∂l(A, C, Q, R | x, y) / ∂R^{-1} = (T+1)/2 R - 1/2 Σ_{t=0}^{T} ( y_t y_t^T - y_t x_t^T C^T - C x_t y_t^T + C x_t x_t^T C^T )

Maximizer:
  R = 1/(T+1) Σ_{t=0}^{T} ( y_t y_t^T - y_t x_t^T C^T - C x_t y_t^T + C x_t x_t^T C^T )
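The four maximizers translate directly into code. A minimal numpy sketch assuming fully observed states xs[0..T] and observations ys[0..T] (function and variable names are ours; in the EM setting the sums over states are replaced by their expected values, as the later slides describe). On noise-free data generated by a known A and C, the estimates recover the true matrices and Q, R shrink to zero:

```python
import numpy as np

def m_step(xs, ys):
    """Closed-form maximizers for A, C, Q, R given sequences xs[0..T], ys[0..T]."""
    T = len(xs) - 1
    # A = (sum x_{t+1} x_t^T)(sum x_t x_t^T)^{-1}, sums over t = 0..T-1.
    Sxx  = sum(np.outer(xs[t],     xs[t]) for t in range(T))
    Sx1x = sum(np.outer(xs[t + 1], xs[t]) for t in range(T))
    A = Sx1x @ np.linalg.inv(Sxx)
    # C = (sum y_t x_t^T)(sum x_t x_t^T)^{-1}, sums over t = 0..T.
    Sxx_all = sum(np.outer(xs[t], xs[t]) for t in range(T + 1))
    Syx     = sum(np.outer(ys[t], xs[t]) for t in range(T + 1))
    C = Syx @ np.linalg.inv(Sxx_all)
    # Q = 1/T sum (x_{t+1} - A x_t)(x_{t+1} - A x_t)^T (expanded form above).
    Q = sum(np.outer(xs[t + 1] - A @ xs[t], xs[t + 1] - A @ xs[t])
            for t in range(T)) / T
    # R = 1/(T+1) sum (y_t - C x_t)(y_t - C x_t)^T.
    R = sum(np.outer(ys[t] - C @ xs[t], ys[t] - C @ xs[t])
            for t in range(T + 1)) / (T + 1)
    return A, C, Q, R

# Noise-free data from a known system (illustrative values).
A_true = np.array([[0.9, 0.2], [0.0, 0.8]])
C_true = np.array([[1.0, 1.0]])
xs = [np.array([1.0, -1.0])]
for t in range(30):
    xs.append(A_true @ xs[-1])
ys = [C_true @ x for x in xs]
A_hat, C_hat, Q_hat, R_hat = m_step(xs, ys)
```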

EM-algorithm

  x_{t+1} = A x_t + w_t,  w_t ~ N(0, Q)
  y_t = C x_t + v_t,      v_t ~ N(0, R)

Start with initial guesses of A, C, Q, R.

Kalman smoother (E-step):
  Compute the distributions X_0, ..., X_T
  given data y_0, ..., y_T and A, C, Q, R.

Update parameters (M-step):
  Update A, C, Q, R such that the
  expected log-likelihood is maximized.

Repeat until convergence (local optimum).

Kalman Smoother

for (t = 0; t < T; ++t)  // Kalman filter
  x_{t+1|t} = A x_{t|t}
  P_{t+1|t} = A P_{t|t} A^T + Q
  K_{t+1} = P_{t+1|t} C^T ( C P_{t+1|t} C^T + R )^{-1}
  x_{t+1|t+1} = x_{t+1|t} + K_{t+1} ( y_{t+1} - C x_{t+1|t} )
  P_{t+1|t+1} = P_{t+1|t} - K_{t+1} C P_{t+1|t}

for (t = T - 1; t >= 0; --t)  // Backward pass
  L_t = P_{t|t} A^T P_{t+1|t}^{-1}
  x_{t|T} = x_{t|t} + L_t ( x_{t+1|T} - x_{t+1|t} )
  P_{t|T} = P_{t|t} + L_t ( P_{t+1|T} - P_{t+1|t} ) L_t^T
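The pseudocode above maps directly onto numpy. A minimal sketch (the function name and the prior arguments x0 = x_{0|0}, P0 = P_{0|0} are our choices; like the pseudocode, y_0 is assumed folded into the initial estimate, since the loop only consumes y_1, ..., y_T):

```python
import numpy as np

def kalman_smoother(A, C, Q, R, ys, x0, P0):
    """Forward Kalman filter followed by the backward pass.
    Returns smoothed means x_{t|T}, covariances P_{t|T}, and gains L_t."""
    T = len(ys) - 1
    # xf[t], Pf[t] hold x_{t|t}, P_{t|t}; xp[t], Pp[t] hold x_{t|t-1}, P_{t|t-1}.
    xf, Pf = [x0], [P0]
    xp, Pp = [None], [None]
    for t in range(T):
        x_pred = A @ xf[t]                    # x_{t+1|t} = A x_{t|t}
        P_pred = A @ Pf[t] @ A.T + Q          # P_{t+1|t} = A P_{t|t} A^T + Q
        S = C @ P_pred @ C.T + R
        K = P_pred @ C.T @ np.linalg.inv(S)   # Kalman gain K_{t+1}
        xf.append(x_pred + K @ (ys[t + 1] - C @ x_pred))
        Pf.append(P_pred - K @ C @ P_pred)
        xp.append(x_pred)
        Pp.append(P_pred)
    # Backward pass: L_t = P_{t|t} A^T P_{t+1|t}^{-1}.
    xs_s, Ps_s, Ls = [None] * (T + 1), [None] * (T + 1), [None] * T
    xs_s[T], Ps_s[T] = xf[T], Pf[T]
    for t in range(T - 1, -1, -1):
        L = Pf[t] @ A.T @ np.linalg.inv(Pp[t + 1])
        xs_s[t] = xf[t] + L @ (xs_s[t + 1] - xp[t + 1])
        Ps_s[t] = Pf[t] + L @ (Ps_s[t + 1] - Pp[t + 1]) @ L.T
        Ls[t] = L
    return xs_s, Ps_s, Ls

# Example: smooth noisy position readings of a constant-velocity model
# (all parameter values are illustrative choices of ours).
rng = np.random.default_rng(2)
A = np.array([[1.0, 1.0], [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2)
R = np.array([[1.0]])
ys = [np.array([0.1 * t + rng.standard_normal()]) for t in range(30)]
xs_s, Ps_s, Ls = kalman_smoother(A, C, Q, R, ys, np.zeros(2), np.eye(2))
```

Conditioning on all data can only shrink the uncertainty, so each smoothed covariance P_{t|T} is bounded above (in the positive-semidefinite order) by the corresponding filtered covariance.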

Update Parameters

The likelihood is in terms of x, but only the distributions X are available:

l(A, C, Q, R | x, y) =
  -T/2 log|Q| - 1/2 Tr( Q^{-1} Σ_{t=0}^{T-1} ( x_{t+1} x_{t+1}^T - x_{t+1} x_t^T A^T - A x_t x_{t+1}^T + A x_t x_t^T A^T ) )
  - (T+1)/2 log|R| - 1/2 Tr( R^{-1} Σ_{t=0}^{T} ( y_t y_t^T - y_t x_t^T C^T - C x_t y_t^T + C x_t x_t^T C^T ) ) + const

The likelihood function is linear in x_t, x_t x_t^T and x_t x_{t+1}^T.
Expected likelihood: replace them with

  E( X_t | y ) = x_{t|T}
  E( X_t X_t^T | y ) = P_{t|T} + x_{t|T} x_{t|T}^T
  E( X_t X_{t+1}^T | y ) = x_{t|t} x_{t+1|T}^T + L_t ( P_{t+1|T} + ( x_{t+1|T} - x_{t+1|t} ) x_{t+1|T}^T )

Use the maximizers to update A, C, Q and R.

Convergence

Convergence to a local optimum is guaranteed.
Similar to coordinate ascent.

Conclusion

EM-algorithm to simultaneously optimize state estimates and model parameters.
Given "training data", the EM-algorithm can be used (off-line) to learn the
model for subsequent use in (real-time) Kalman filters.

Next time
Learning from demonstrations
Dynamic Time Warping
