x_{t+1} = A x_t + w_t,  w_t \sim N(0, Q)
y_t = C x_t + v_t,  v_t \sim N(0, R)

Kalman Filter:
Compute X_t | Y_0 = y_0, ..., Y_t = y_t
Real-time, given the data so far

Kalman Smoother:
Compute X_t | Y_0 = y_0, ..., Y_T = y_T, for 0 \le t \le T
Post-processing, given all the data
EM Algorithm

x_{t+1} = A x_t + w_t,  w_t \sim N(0, Q)
y_t = C x_t + v_t,  v_t \sim N(0, R)

Kalman smoother:
Compute the distributions of X_0, ..., X_T
given parameters A, C, Q, R and data y_0, ..., y_T.

EM Algorithm:
Simultaneously optimize X_0, ..., X_T and A, C, Q, R
given data y_0, ..., y_T.
Coin-flip example:
θ is the probability of heads (parameter)
x = HHHTTH is the outcome
Likelihood of θ:
ℓ(θ | HHHTTH) = p(H|θ)^4 p(T|θ)^2 = θ^4 (1 − θ)^2
Maximum likelihood: θ* = 4/6 = 2/3

Die-roll example, x = {3, 2, 1, 1, 2, 3}:
ℓ(θ | {3,2,1,1,2,3}) = p(3|θ)^2 p(2|θ)^2 p(1|θ)^2 = θ(3)^2 θ(2)^2 θ(1)^2
Maximum likelihood: θ(1) = θ(2) = θ(3) = 2/6
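A quick numerical check of the coin-flip estimate (a minimal sketch; the grid search stands in for setting the derivative to zero):

import numpy as np

# Likelihood of theta for the outcome HHHTTH: four heads, two tails
def likelihood(theta):
    return theta**4 * (1 - theta)**2

thetas = np.linspace(0.0, 1.0, 10001)
print(thetas[np.argmax(likelihood(thetas))])  # ~0.6667 = 4/6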
Log-likelihood

x_{t+1} = A x_t + w_t,  w_t \sim N(0, Q)
y_t = C x_t + v_t,  v_t \sim N(0, R)

Compute the log-likelihood:

\log p(x, y \mid A, C, Q, R) = \sum_{t=0}^{T-1} \log p(x_{t+1} \mid x_t) + \sum_{t=0}^{T} \log p(y_t \mid x_t)

The multivariate normal distribution N(\mu, \Sigma) has pdf:

p(x) = (2\pi)^{-k/2} |\Sigma|^{-1/2} \exp\!\left(-\tfrac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)

From the model, x_{t+1} \sim N(A x_t, Q) and y_t \sim N(C x_t, R), so

\log p(x, y \mid A, C, Q, R)
  = \sum_{t=0}^{T-1} \left(-\tfrac{1}{2}\log|Q| - \tfrac{1}{2}(x_{t+1} - A x_t)^T Q^{-1} (x_{t+1} - A x_t)\right)
  + \sum_{t=0}^{T} \left(-\tfrac{1}{2}\log|R| - \tfrac{1}{2}(y_t - C x_t)^T R^{-1} (y_t - C x_t)\right) + \text{const}
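To make this concrete, here is a sketch that simulates a short trajectory and evaluates the log-likelihood with scipy (dimensions and parameter values are made-up for illustration; the x_0 term is treated as part of the constant):

import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
A, C = np.array([[0.9]]), np.array([[1.0]])
Q, R = np.array([[0.1]]), np.array([[0.2]])
T = 50
x, y = np.zeros((T + 1, 1)), np.zeros((T + 1, 1))
for t in range(T + 1):
    if t > 0:
        x[t] = A @ x[t - 1] + rng.multivariate_normal([0.0], Q)
    y[t] = C @ x[t] + rng.multivariate_normal([0.0], R)

ll = sum(multivariate_normal.logpdf(x[t + 1], A @ x[t], Q) for t in range(T))
ll += sum(multivariate_normal.logpdf(y[t], C @ x[t], R) for t in range(T + 1))
print(ll)  # log p(x, y | A, C, Q, R) up to the constant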
Log-likelihood #2

Using a = Tr(a) if a is scalar, and pulling the log-determinant terms out of the sums:

\log p(x, y \mid A, C, Q, R)
  = -\tfrac{T}{2}\log|Q| - \tfrac{1}{2}\sum_{t=0}^{T-1} \mathrm{Tr}\!\left((x_{t+1} - A x_t)^T Q^{-1} (x_{t+1} - A x_t)\right)
  - \tfrac{T+1}{2}\log|R| - \tfrac{1}{2}\sum_{t=0}^{T} \mathrm{Tr}\!\left((y_t - C x_t)^T R^{-1} (y_t - C x_t)\right) + \text{const}
Log-likelihood #3

Using Tr(AB) = Tr(BA) and Tr(A) + Tr(B) = Tr(A + B), the summation moves inside the trace:

= -\tfrac{T}{2}\log|Q| - \tfrac{1}{2}\mathrm{Tr}\!\left(Q^{-1}\sum_{t=0}^{T-1} (x_{t+1} - A x_t)(x_{t+1} - A x_t)^T\right)
  - \tfrac{T+1}{2}\log|R| - \tfrac{1}{2}\mathrm{Tr}\!\left(R^{-1}\sum_{t=0}^{T} (y_t - C x_t)(y_t - C x_t)^T\right) + \text{const}
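Both trace identities are easy to confirm numerically; this sketch (random data, hypothetical dimensions) checks that the sum of scalar quadratic forms from #2 equals the single trace from #3:

import numpy as np

rng = np.random.default_rng(1)
n, T = 3, 10
x = rng.standard_normal((T + 1, n))
A = rng.standard_normal((n, n))
M = rng.standard_normal((n, n))
Qinv = M @ M.T                                  # any symmetric positive-definite matrix

res = x[1:] - x[:-1] @ A.T                      # residuals x_{t+1} - A x_t
lhs = sum(r @ Qinv @ r for r in res)            # each scalar equals its own trace
rhs = np.trace(Qinv @ sum(np.outer(r, r) for r in res))  # Tr(AB) = Tr(BA)
print(np.allclose(lhs, rhs))                    # True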
Log-likelihood #4

Expanding the outer products gives the log-likelihood as an explicit function of the parameters:

l(A, C, Q, R \mid x, y)
  = -\tfrac{T}{2}\log|Q| - \tfrac{1}{2}\mathrm{Tr}\!\left(Q^{-1}\sum_{t=0}^{T-1}\left(x_{t+1} x_{t+1}^T - x_{t+1} x_t^T A^T - A x_t x_{t+1}^T + A x_t x_t^T A^T\right)\right)
  - \tfrac{T+1}{2}\log|R| - \tfrac{1}{2}\mathrm{Tr}\!\left(R^{-1}\sum_{t=0}^{T}\left(y_t y_t^T - y_t x_t^T C^T - C x_t y_t^T + C x_t x_t^T C^T\right)\right) + \text{const}
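The expansion of the Q-term can be spot-checked on random data (the R-term expands the same way, with y_t and C in place of x_{t+1} and A):

import numpy as np

rng = np.random.default_rng(2)
n = 3
x1, x0 = rng.standard_normal(n), rng.standard_normal(n)
A = rng.standard_normal((n, n))

r = x1 - A @ x0
direct = np.outer(r, r)
expanded = (np.outer(x1, x1) - np.outer(x1, x0) @ A.T
            - A @ np.outer(x0, x1) + A @ np.outer(x0, x0) @ A.T)
print(np.allclose(direct, expanded))  # True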
Maximize likelihood

log is a monotone function: \arg\max_x \log f(x) = \arg\max_x f(x)

Solve \partial l(A, C, Q, R \mid x, y) / \partial A = 0 for A
Solve \partial l(A, C, Q, R \mid x, y) / \partial C = 0 for C
Solve \partial l(A, C, Q, R \mid x, y) / \partial Q = 0 for Q
Solve \partial l(A, C, Q, R \mid x, y) / \partial R = 0 for R
Matrix derivatives

Defined for scalar functions f : R^{n \times m} \to R.
Key identities:

\partial (x^T A x) / \partial x = x^T (A^T + A)
\partial\, \mathrm{Tr}(B^T A B) / \partial B = B^T (A^T + A)
\partial\, \mathrm{Tr}(AB) / \partial A = B^T,  using \mathrm{Tr}(AB) = \mathrm{Tr}(BA) = \mathrm{Tr}(B^T A^T)
\partial \log|A| / \partial A = A^{-T}
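The last two identities can be verified with finite differences (a sketch; grad computes the elementwise gradient with entries \partial f / \partial A_{ij}):

import numpy as np

rng = np.random.default_rng(3)
n, eps = 3, 1e-6
A = rng.standard_normal((n, n)) + 3 * np.eye(n)   # shifted to keep det(A) > 0
B = rng.standard_normal((n, n))

def grad(f, A):
    g = np.zeros_like(A)
    for i in range(n):
        for j in range(n):
            E = np.zeros_like(A)
            E[i, j] = eps
            g[i, j] = (f(A + E) - f(A - E)) / (2 * eps)   # central difference
    return g

print(np.allclose(grad(lambda A: np.trace(A @ B), A), B.T, atol=1e-4))
print(np.allclose(grad(lambda A: np.log(np.linalg.det(A)), A),
                  np.linalg.inv(A).T, atol=1e-4))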
Optimizing A

Derivative:
\partial l(A, C, Q, R \mid x, y) / \partial A = -\tfrac{1}{2} Q^{-1} \sum_{t=0}^{T-1} \left(-2 x_{t+1} x_t^T + 2 A x_t x_t^T\right)

Maximizer:
A = \left(\sum_{t=0}^{T-1} x_{t+1} x_t^T\right) \left(\sum_{t=0}^{T-1} x_t x_t^T\right)^{-1}
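In code, this maximizer is an ordinary least-squares fit of x_{t+1} on x_t built from the two sums (a sketch on simulated states; A_true exists only to generate data):

import numpy as np

rng = np.random.default_rng(4)
A_true = np.array([[0.9, 0.1], [0.0, 0.8]])
T = 2000
x = np.zeros((T + 1, 2))
for t in range(T):
    x[t + 1] = A_true @ x[t] + rng.multivariate_normal(np.zeros(2), 0.1 * np.eye(2))

S1 = sum(np.outer(x[t + 1], x[t]) for t in range(T))  # sum of x_{t+1} x_t^T
S0 = sum(np.outer(x[t], x[t]) for t in range(T))      # sum of x_t x_t^T
A_hat = S1 @ np.linalg.inv(S0)
print(A_hat)                                          # close to A_true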
Optimizing C

Derivative:
\partial l(A, C, Q, R \mid x, y) / \partial C = -\tfrac{1}{2} R^{-1} \sum_{t=0}^{T} \left(-2 y_t x_t^T + 2 C x_t x_t^T\right)

Maximizer:
C = \left(\sum_{t=0}^{T} y_t x_t^T\right) \left(\sum_{t=0}^{T} x_t x_t^T\right)^{-1}
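The C update has the same regression structure, now with the observations on the left-hand side (again a simulated-data sketch):

import numpy as np

rng = np.random.default_rng(5)
C_true = np.array([[1.0, 0.5]])
T = 2000
x = rng.standard_normal((T + 1, 2))                       # stand-in state sequence
y = x @ C_true.T + 0.1 * rng.standard_normal((T + 1, 1))  # y_t = C x_t + v_t

Syx = sum(np.outer(y[t], x[t]) for t in range(T + 1))     # sum of y_t x_t^T
Sxx = sum(np.outer(x[t], x[t]) for t in range(T + 1))     # sum of x_t x_t^T
C_hat = Syx @ np.linalg.inv(Sxx)
print(C_hat)                                              # close to C_true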
Optimizing Q

Derivative with respect to the inverse:
\partial l(A, C, Q, R \mid x, y) / \partial Q^{-1} = \tfrac{T}{2} Q - \tfrac{1}{2} \sum_{t=0}^{T-1} \left(x_{t+1} x_{t+1}^T - x_{t+1} x_t^T A^T - A x_t x_{t+1}^T + A x_t x_t^T A^T\right)

Maximizer:
Q = \tfrac{1}{T} \sum_{t=0}^{T-1} \left(x_{t+1} x_{t+1}^T - x_{t+1} x_t^T A^T - A x_t x_{t+1}^T + A x_t x_t^T A^T\right)
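Numerically this is just the average outer product of the state residuals, which equals the expanded form above when the same A is used (sketch):

import numpy as np

rng = np.random.default_rng(6)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
Q_true = 0.1 * np.eye(2)
T = 5000
x = np.zeros((T + 1, 2))
for t in range(T):
    x[t + 1] = A @ x[t] + rng.multivariate_normal(np.zeros(2), Q_true)

res = x[1:] - x[:-1] @ A.T                    # residuals x_{t+1} - A x_t
Q_hat = sum(np.outer(r, r) for r in res) / T  # (1/T) * sum of outer products
print(Q_hat)                                  # close to Q_true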
Optimizing R

Derivative with respect to the inverse:
\partial l(A, C, Q, R \mid x, y) / \partial R^{-1} = \tfrac{T+1}{2} R - \tfrac{1}{2} \sum_{t=0}^{T} \left(y_t y_t^T - y_t x_t^T C^T - C x_t y_t^T + C x_t x_t^T C^T\right)

Maximizer:
R = \tfrac{1}{T+1} \sum_{t=0}^{T} \left(y_t y_t^T - y_t x_t^T C^T - C x_t y_t^T + C x_t x_t^T C^T\right)
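Analogously, R averages the observation residuals over all T + 1 measurements (sketch):

import numpy as np

rng = np.random.default_rng(7)
C = np.array([[1.0, 0.5]])
R_true = np.array([[0.04]])
T = 5000
x = rng.standard_normal((T + 1, 2))
y = x @ C.T + rng.multivariate_normal(np.zeros(1), R_true, size=T + 1)

res = y - x @ C.T                                   # residuals y_t - C x_t
R_hat = sum(np.outer(r, r) for r in res) / (T + 1)  # (1/(T+1)) * sum
print(R_hat)                                        # close to R_true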
EM-algorithm

x_{t+1} = A x_t + w_t,  w_t \sim N(0, Q)
y_t = C x_t + v_t,  v_t \sim N(0, R)

Start with initial guesses of A, C, Q, R.
Kalman smoother (E-step):
Compute the distributions of X_0, ..., X_T
given data y_0, ..., y_T and the current A, C, Q, R.
Kalman Smoother

// Kalman filter (forward pass)
for (t = 0; t < T; ++t)
  x_{t+1|t} = A x_{t|t}
  P_{t+1|t} = A P_{t|t} A^T + Q
  K_{t+1} = P_{t+1|t} C^T (C P_{t+1|t} C^T + R)^{-1}
  x_{t+1|t+1} = x_{t+1|t} + K_{t+1} (y_{t+1} - C x_{t+1|t})
  P_{t+1|t+1} = (I - K_{t+1} C) P_{t+1|t}
// Backward pass
for (t = T - 1; t >= 0; --t)
  L_t = P_{t|t} A^T P_{t+1|t}^{-1}
  x_{t|T} = x_{t|t} + L_t (x_{t+1|T} - x_{t+1|t})
  P_{t|T} = P_{t|t} + L_t (P_{t+1|T} - P_{t+1|t}) L_t^T
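A direct numpy transcription of the two passes (a sketch: the initial mean x0 and covariance P0 are assumptions not shown on the slide, and the gains L_t are returned because the parameter update below needs them):

import numpy as np

def kalman_smoother(y, A, C, Q, R, x0, P0):
    # Forward Kalman filter followed by the backward (RTS) smoothing pass.
    # y: (T+1, m) observations; returns smoothed means/covariances and gains L_t.
    T, n = len(y) - 1, A.shape[0]
    xf = np.zeros((T + 1, n)); Pf = np.zeros((T + 1, n, n))  # x_{t|t}, P_{t|t}
    xp = np.zeros((T + 1, n)); Pp = np.zeros((T + 1, n, n))  # x_{t|t-1}, P_{t|t-1}
    xf[0], Pf[0] = x0, P0
    for t in range(T):                                       # forward pass
        xp[t + 1] = A @ xf[t]
        Pp[t + 1] = A @ Pf[t] @ A.T + Q
        K = Pp[t + 1] @ C.T @ np.linalg.inv(C @ Pp[t + 1] @ C.T + R)
        xf[t + 1] = xp[t + 1] + K @ (y[t + 1] - C @ xp[t + 1])
        Pf[t + 1] = (np.eye(n) - K @ C) @ Pp[t + 1]
    xs, Ps = xf.copy(), Pf.copy()                            # x_{t|T}, P_{t|T}
    L = np.zeros((T, n, n))
    for t in range(T - 1, -1, -1):                           # backward pass
        L[t] = Pf[t] @ A.T @ np.linalg.inv(Pp[t + 1])
        xs[t] = xf[t] + L[t] @ (xs[t + 1] - xp[t + 1])
        Ps[t] = Pf[t] + L[t] @ (Ps[t + 1] - Pp[t + 1]) @ L[t].T
    return xs, Ps, L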
Update Parameters

The likelihood is expressed in terms of the states x, but only the distributions of X (from the smoother) are available:

l(A, C, Q, R \mid x, y)
  = -\tfrac{T}{2}\log|Q| - \tfrac{1}{2}\mathrm{Tr}\!\left(Q^{-1}\sum_{t=0}^{T-1}\left(x_{t+1} x_{t+1}^T - x_{t+1} x_t^T A^T - A x_t x_{t+1}^T + A x_t x_t^T A^T\right)\right)
  - \tfrac{T+1}{2}\log|R| - \tfrac{1}{2}\mathrm{Tr}\!\left(R^{-1}\sum_{t=0}^{T}\left(y_t y_t^T - y_t x_t^T C^T - C x_t y_t^T + C x_t x_t^T C^T\right)\right) + \text{const}

The likelihood function is linear in x_t, x_t x_t^T, and x_{t+1} x_t^T.
Expected likelihood: replace them with their smoothed expectations
E(X_t \mid y) = x_{t|T}
E(X_t X_t^T \mid y) = P_{t|T} + x_{t|T} x_{t|T}^T
E(X_{t+1} X_t^T \mid y) = P_{t+1|T} L_t^T + x_{t+1|T} x_{t|T}^T
and maximize over A, C, Q, R as before (M-step).
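Putting the two steps together, one EM iteration might look like this sketch (it assumes the kalman_smoother function above; the lag-one term uses the standard RTS cross-covariance identity):

import numpy as np

def em_step(y, A, C, Q, R, x0, P0):
    # E-step: smoothed moments from the current parameters
    xs, Ps, L = kalman_smoother(y, A, C, Q, R, x0, P0)
    T = len(y) - 1
    Exx = Ps + np.einsum('ti,tj->tij', xs, xs)              # E[x_t x_t^T | y]
    Exx1 = (np.einsum('tij,tkj->tik', Ps[1:], L)            # E[x_{t+1} x_t^T | y]
            + np.einsum('ti,tj->tij', xs[1:], xs[:-1]))
    Syx = sum(np.outer(y[t], xs[t]) for t in range(T + 1))  # sum of y_t E[x_t]^T
    Syy = sum(np.outer(y[t], y[t]) for t in range(T + 1))
    # M-step: the closed-form maximizers with moments replaced by expectations
    A_new = Exx1.sum(0) @ np.linalg.inv(Exx[:-1].sum(0))
    C_new = Syx @ np.linalg.inv(Exx.sum(0))
    Q_new = (Exx[1:].sum(0) - Exx1.sum(0) @ A_new.T - A_new @ Exx1.sum(0).T
             + A_new @ Exx[:-1].sum(0) @ A_new.T) / T
    R_new = (Syy - C_new @ Syx.T - Syx @ C_new.T
             + C_new @ Exx.sum(0) @ C_new.T) / (T + 1)
    return A_new, C_new, Q_new, R_new

Iterating em_step until the parameters stop changing implements the full algorithm.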
Convergence

Convergence to a local optimum is guaranteed.
Similar to coordinate ascent.
Conclusion

EM-algorithm to simultaneously optimize state estimates and model parameters.
Given "training data", the EM-algorithm can be used (off-line) to learn the model for subsequent use in (real-time) Kalman filters.
Next time
Learning from demonstrations
Dynamic Time Warping