
The Kalman Filter Explained

Tristan Fletcher
www.cs.ucl.ac.uk/staff/T.Fletcher/

1 Introduction

The aim of this document is to derive the filtering equations for the simplest Linear Dynamical System case, the Kalman Filter, outline the filter's implementation, do a similar thing for the smoothing equations, and conclude with parameter learning in an LDS (calibrating the Kalman Filter).

The document is based closely on Bishop's [1] and Ghahramani's [2] texts, but is more suitable for those who wish to understand every aspect of the mathematics required and see how it all comes together in a procedural sense.

2 Model Specification

The simplest form of Linear Dynamical System (LDS) models a discrete time process where a latent variable h is updated every time step by a constant linear state transition A, with the addition of zero-mean Gaussian noise η^h:

    h_t = A h_{t-1} + η^h_t,   where η^h_t ~ N(0, Σ_H)
    ⇒ p(h_t | h_{t-1}) ~ N(A h_{t-1}, Σ_H)    (2.1)

This latent variable is observed through a constant linear function of the latent state B, also subject to zero-mean Gaussian noise η^v:

    v_t = B h_t + η^v_t,   where η^v_t ~ N(0, Σ_V)
    ⇒ p(v_t | h_t) ~ N(B h_t, Σ_V)    (2.2)

We wish to infer the probability distribution of h_t given the observations up to that point in time, v_{1:t}, i.e. p(h_t | v_{1:t}), which can be expressed recursively.
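Before deriving the filter it can help to see the generative model in code. The sketch below simulates (2.1) and (2.2) with numpy; the particular matrices (a position-velocity state observed through position only) and noise levels are illustrative choices, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative LDS: latent state h = (position, velocity), observed via position.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])          # constant linear state transition
B = np.array([[1.0, 0.0]])          # constant linear observation function
Sigma_H = 0.01 * np.eye(2)          # latent (process) noise covariance, eta^h
Sigma_V = np.array([[0.25]])        # observation noise covariance, eta^v

def simulate(T):
    """Draw h_1..h_T and v_1..v_T from the model in (2.1)-(2.2)."""
    h = rng.multivariate_normal(np.zeros(2), np.eye(2))    # h_1 ~ N(mu_0, sigma^2_0)
    hs, vs = [], []
    for _ in range(T):
        v = B @ h + rng.multivariate_normal(np.zeros(1), Sigma_V)  # v_t = B h_t + eta^v_t
        hs.append(h)
        vs.append(v)
        h = A @ h + rng.multivariate_normal(np.zeros(2), Sigma_H)  # h_t = A h_{t-1} + eta^h_t
    return np.array(hs), np.array(vs)

hs, vs = simulate(100)
print(hs.shape, vs.shape)  # (100, 2) (100, 1)
```

The inference problem of the following sections is to recover the hidden trajectory `hs` from the observed `vs` alone.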

Starting with an initial distribution for our latent variable given the first observation:

    p(h_1 | v_1) ∝ p(v_1 | h_1) p(h_1)

and the assumption that h_1 has a Gaussian distribution:

    p(h_1) ~ N(μ_0, σ²_0)

values for p(h_t | v_{1:t}) for subsequent values of t can be found by iteration:

    p(h_t | v_{1:t}) = p(h_t | v_t, v_{1:t-1})

                     = p(h_t, v_t | v_{1:t-1}) / p(v_t | v_{1:t-1})

                     ∝ p(h_t, v_t | v_{1:t-1})

                     = ∫_{h_{t-1}} p(h_t, v_t | v_{1:t-1}, h_{t-1}) p(h_{t-1} | v_{1:t-1})

                     = ∫_{h_{t-1}} p(v_t | v_{1:t-1}, h_{t-1}, h_t) p(h_t | h_{t-1}, v_{1:t-1}) p(h_{t-1} | v_{1:t-1})

    h_t ⊥ v_{1:t-1} | h_{t-1}  ⇒  = ∫_{h_{t-1}} p(v_t | v_{1:t-1}, h_{t-1}, h_t) p(h_t | h_{t-1}) p(h_{t-1} | v_{1:t-1})

    v_t ⊥ h_{t-1} | v_{1:t-1}, h_t  ⇒  = ∫_{h_{t-1}} p(v_t | v_{1:t-1}, h_t) p(h_t | h_{t-1}) p(h_{t-1} | v_{1:t-1})

                     = ∫_{h_{t-1}} p(v_t | h_t) p(h_t | h_{t-1}) p(h_{t-1} | v_{1:t-1})    (2.3)

The fact that the distributions described in (2.1) and (2.2) are both Gaussian, and that the operations in (2.3) of multiplication and then integration yield Gaussians when performed on Gaussians, means that we know p(h_t | v_{1:t}) will itself be a Gaussian:

    p(h_t | v_{1:t}) ~ N(μ_t, σ²_t)    (2.4)

and the task is to derive the expressions for μ_t and σ²_t.

3 Deriving the state estimate variance

If we define Δh ≡ h − E[ĥ], where h denotes the actual value of the latent variable, ĥ is its estimated value, E[ĥ] is the expected value of this estimate, and F is the covariance of the error estimate, then:

    F_{t|t-1} = E[Δh_t Δh_t^T]
              = E[(A Δh_{t-1} + Δη^h_t)(A Δh_{t-1} + Δη^h_t)^T]
              = E[A Δh_{t-1} Δh_{t-1}^T A^T + Δη^h_t Δη^{hT}_t + Δη^h_t Δh_{t-1}^T A^T + A Δh_{t-1} Δη^{hT}_t]
              = A E[Δh_{t-1} Δh_{t-1}^T] A^T + E[Δη^h_t Δη^{hT}_t]
              = A F_{t-1|t-1} A^T + Σ_H    (3.1)

The subscript in F_{t|t-1} denotes the fact that this is F's value before an observation is made at time t (i.e. its a priori value), while F_{t|t} denotes its value after an observation is made (its posterior value). This more informative notation allows the update equation in (2.1) to be expressed as follows:

    ĥ_{t|t-1} = A ĥ_{t-1|t-1}    (3.2)

Once we have an observation (and are therefore dealing with posterior values), we can define ε_t as the difference between the observation we'd expect to see given our estimate of the latent state (its a priori value) and the one actually observed, i.e.:

    ε_t = v_t − B ĥ_{t|t-1}    (3.3)

Now that we have an observation, if we wish to add a correction to our a priori estimate that is proportional to the error ε_t, we can use a coefficient κ:

    ĥ_{t|t} = ĥ_{t|t-1} + κ ε_t    (3.4)

This allows us to express F_{t|t} recursively:

    F_{t|t} = Cov(h_t − ĥ_{t|t})
            = Cov(h_t − (ĥ_{t|t-1} + κ ε_t))
            = Cov(h_t − (ĥ_{t|t-1} + κ(v_t − B ĥ_{t|t-1})))
            = Cov(h_t − (ĥ_{t|t-1} + κ(B h_t + η^v_t − B ĥ_{t|t-1})))
            = Cov(h_t − ĥ_{t|t-1} − κ B h_t − κ η^v_t + κ B ĥ_{t|t-1})
            = Cov((I − κB)(h_t − ĥ_{t|t-1}) − κ η^v_t)
            = Cov((I − κB)(h_t − ĥ_{t|t-1})) + Cov(κ η^v_t)
            = (I − κB) Cov(h_t − ĥ_{t|t-1}) (I − κB)^T + κ Cov(η^v_t) κ^T
            = (I − κB) F_{t|t-1} (I − κB)^T + κ Σ_V κ^T
            = (F_{t|t-1} − κ B F_{t|t-1})(I − κB)^T + κ Σ_V κ^T
            = F_{t|t-1} − κ B F_{t|t-1} − F_{t|t-1} (κB)^T + κ B F_{t|t-1} (κB)^T + κ Σ_V κ^T
            = F_{t|t-1} − κ B F_{t|t-1} − F_{t|t-1} B^T κ^T + κ(B F_{t|t-1} B^T + Σ_V) κ^T    (3.5)

If we define the innovation variance as S_t = B F_{t|t-1} B^T + Σ_V, then (3.5) becomes:

    F_{t|t} = F_{t|t-1} − κ B F_{t|t-1} − F_{t|t-1} B^T κ^T + κ S_t κ^T    (3.6)
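The algebra taking the symmetric (Joseph) form (I − κB) F_{t|t-1} (I − κB)^T + κ Σ_V κ^T through to (3.6) can be spot-checked numerically for arbitrary matrices; the dimensions and random values below are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

n, m = 3, 2                                       # illustrative dimensions
M = rng.standard_normal((n, n))
F_prior = M @ M.T                                 # F_{t|t-1}: any symmetric PSD matrix
N = rng.standard_normal((m, m))
Sigma_V = N @ N.T                                 # observation noise covariance
B = rng.standard_normal((m, n))
kappa = rng.standard_normal((n, m))               # an arbitrary (not necessarily optimal) gain
I = np.eye(n)

# Symmetric (Joseph) form from the middle of (3.5):
joseph = (I - kappa @ B) @ F_prior @ (I - kappa @ B).T + kappa @ Sigma_V @ kappa.T

# Expanded form (3.6), with S_t = B F_{t|t-1} B^T + Sigma_V:
S_t = B @ F_prior @ B.T + Sigma_V
expanded = (F_prior - kappa @ B @ F_prior - F_prior @ B.T @ kappa.T
            + kappa @ S_t @ kappa.T)

print(np.allclose(joseph, expanded))  # True
```

Note the identity holds for any gain κ, not just the optimal one derived next.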

4 Minimizing the state estimate variance

If we wish to minimize the variance of F_{t|t}, we can use the mean square error (MSE) measure:

    E[ ‖h_t − ĥ_{t|t}‖² ] = Tr(Cov(h_t − ĥ_{t|t})) = Tr(F_{t|t})    (4.1)

The only coefficient we have control over is κ, so we wish to find the κ that gives us the minimum MSE, i.e. we need to find κ such that:

    δTr(F_{t|t}) / δκ = 0

    (3.6) ⇒ δTr(F_{t|t-1} − κ B F_{t|t-1} − F_{t|t-1} B^T κ^T + κ S_t κ^T) / δκ = 0

    ⇒ κ = F_{t|t-1} B^T S_t^{-1}    (4.2)

This optimum value for κ in terms of minimizing MSE is known as the Kalman Gain and will be denoted K.

If we multiply both sides of (4.2) on the right by S_t K^T:

    K S_t K^T = F_{t|t-1} B^T K^T    (4.3)

Substituting this into (3.6):

    F_{t|t} = F_{t|t-1} − K B F_{t|t-1} − F_{t|t-1} B^T K^T + F_{t|t-1} B^T K^T
            = (I − KB) F_{t|t-1}    (4.4)
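That (4.2) really is the trace-minimizing gain can also be checked numerically: evaluating (3.6) at K and at randomly perturbed gains, no perturbation should do better. The dimensions and random matrices below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

n, m = 3, 2
M = rng.standard_normal((n, n))
F_prior = M @ M.T                                  # F_{t|t-1}
N = rng.standard_normal((m, m))
Sigma_V = N @ N.T
B = rng.standard_normal((m, n))
S_t = B @ F_prior @ B.T + Sigma_V                  # innovation variance

def posterior_trace(kappa):
    """Tr(F_{t|t}) as a function of the gain, per (3.6)."""
    F_post = (F_prior - kappa @ B @ F_prior - F_prior @ B.T @ kappa.T
              + kappa @ S_t @ kappa.T)
    return np.trace(F_post)

K = F_prior @ B.T @ np.linalg.inv(S_t)             # the Kalman Gain of (4.2)

# (3.6) is quadratic in kappa with positive-definite S_t, so K is the global minimum.
perturbed = [posterior_trace(K + 0.1 * rng.standard_normal((n, m)))
             for _ in range(100)]
print(posterior_trace(K) <= min(perturbed))  # True
```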

5 Filtered Latent State Estimation Procedure (The Kalman Filter)

The procedure for estimating the state of h_t, which when using the MSE-optimal value for κ is called Kalman Filtering, proceeds as follows:

1. Choose initial values for ĥ and F (i.e. ĥ_{0|0} and F_{0|0}).

2. Advance the latent state estimate:

    ĥ_{t|t-1} = A ĥ_{t-1|t-1}

3. Advance the estimate covariance:

    F_{t|t-1} = A F_{t-1|t-1} A^T + Σ_H

4. Make an observation v_t.

5. Calculate the innovation:

    ε_t = v_t − B ĥ_{t|t-1}

6. Calculate S_t:

    S_t = B F_{t|t-1} B^T + Σ_V

7. Calculate K:

    K = F_{t|t-1} B^T S_t^{-1}

8. Update the latent state estimate:

    ĥ_{t|t} = ĥ_{t|t-1} + K ε_t

9. Update the estimate covariance (from (4.4)):

    F_{t|t} = (I − KB) F_{t|t-1}

10. Cycle through stages 2 to 9 for each time step.

Note that ĥ_{t|t} and F_{t|t} correspond to μ_t and σ²_t from (2.4).
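The ten steps above translate almost line-for-line into code. A minimal numpy sketch follows; the 1-D random-walk model used to exercise it at the end is an illustrative choice, not from the text.

```python
import numpy as np

def kalman_filter(vs, A, B, Sigma_H, Sigma_V, h0, F0):
    """Run steps 2-9 of section 5 for each observation in vs.

    Returns the filtered means h_{t|t} and covariances F_{t|t}.
    """
    I = np.eye(h0.shape[0])
    h, F = h0, F0                          # step 1: initial values h_{0|0}, F_{0|0}
    h_filt, F_filt = [], []
    for v in vs:                           # step 4: observations v_t
        h_pred = A @ h                     # step 2: h_{t|t-1} = A h_{t-1|t-1}
        F_pred = A @ F @ A.T + Sigma_H     # step 3: F_{t|t-1}
        eps = v - B @ h_pred               # step 5: innovation
        S = B @ F_pred @ B.T + Sigma_V     # step 6: innovation variance
        K = F_pred @ B.T @ np.linalg.inv(S)  # step 7: Kalman gain
        h = h_pred + K @ eps               # step 8: h_{t|t}
        F = (I - K @ B) @ F_pred           # step 9: F_{t|t}
        h_filt.append(h)
        F_filt.append(F)
    return np.array(h_filt), np.array(F_filt)

# Illustrative 1-D random-walk example (these matrices are not from the text).
A = np.array([[1.0]]); B = np.array([[1.0]])
Sigma_H = np.array([[0.01]]); Sigma_V = np.array([[1.0]])
rng = np.random.default_rng(3)
truth = np.cumsum(0.1 * rng.standard_normal(200))          # hidden random walk
vs = truth[:, None] + rng.standard_normal(200)[:, None]    # noisy observations
h_filt, F_filt = kalman_filter(vs, A, B, Sigma_H, Sigma_V,
                               h0=np.zeros(1), F0=np.eye(1))
print(np.mean((h_filt[:, 0] - truth) ** 2) < np.mean((vs[:, 0] - truth) ** 2))  # True
```

The final line checks the point of the filter: the filtered estimates track the hidden state more closely than the raw observations do.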

6 Smoothed Latent State Estimation

The smoothed probability of the latent variable is the probability that it had a value at time t after a sequence of T observations, i.e. p(h_t | v_{1:T}). Unlike the Kalman Filter, which you can update with each observation, one has to wait until T observations have been made and then retrospectively calculate the probability the latent variable had a value at time t, where t < T.

Commencing at the final time step in the sequence (t = T) and working backwards to the start (t = 1), p(h_t | v_{1:T}) can be evaluated as follows:

    p(h_t | v_{1:T}) = ∫_{h_{t+1}} p(h_t | h_{t+1}, v_{1:T}) p(h_{t+1} | v_{1:T})

    h_t ⊥ v_{t+1:T} | h_{t+1}  ⇒  p(h_t | h_{t+1}, v_{1:T}) = p(h_t | h_{t+1}, v_{1:t})

    ∴ p(h_t | v_{1:T}) = ∫_{h_{t+1}} p(h_t | h_{t+1}, v_{1:t}) p(h_{t+1} | v_{1:T})

                       = ∫_{h_{t+1}} [ p(h_{t+1}, h_t | v_{1:t}) / p(h_{t+1} | v_{1:t}) ] p(h_{t+1} | v_{1:T})

                       ∝ ∫_{h_{t+1}} p(h_{t+1}, h_t | v_{1:t}) p(h_{t+1} | v_{1:T})

                       = ∫_{h_{t+1}} p(h_{t+1} | h_t, v_{1:t}) p(h_t | v_{1:t}) p(h_{t+1} | v_{1:T})

    h_{t+1} ⊥ v_{1:t} | h_t  ⇒  = ∫_{h_{t+1}} p(h_{t+1} | h_t) p(h_t | v_{1:t}) p(h_{t+1} | v_{1:T})    (6.1)

As before, we know that p(h_t | v_{1:T}) will be a Gaussian, and we will need to establish its mean and variance at each t, i.e. in a similar manner to (2.4):

    p(h_t | v_{1:T}) ~ N(h^s_t, F^s_t)    (6.2)

Using the filtered values calculated in the previous section for ĥ_{t|t} and F_{t|t} for each time step, the procedure for estimating the smoothed parameters h^s_t and F^s_t works backwards from the last time step in the sequence, i.e. from t = T, as follows:

1. Set h^s_T and F^s_T to ĥ_{T|T} and F_{T|T} from steps 8 and 9 in section 5.

2. Calculate A^s_t:

    A^s_t = (A F_{t|t})^T (A F_{t|t} A^T + Σ_H)^{-1}

3. Calculate S^s_t:

    S^s_t = F_{t|t} − A^s_t A F_{t|t}

4. Calculate the smoothed latent variable estimate h^s_t:

    h^s_t = A^s_t h^s_{t+1} + ĥ_{t|t} − A^s_t A ĥ_{t|t}

5. Calculate the smoothed estimate covariance F^s_t:

    F^s_t = ½ [ (A^s_t F^s_{t+1} A^{sT}_t + S^s_t) + (A^s_t F^s_{t+1} A^{sT}_t + S^s_t)^T ]

6. Calculate the smoothed cross-variance X^s_t:

    X^s_t = A^s_t F^s_t + h^s_t ĥ^T_{t|t}

7. Cycle through stages 2 to 6 for each time step, backwards through the sequence from t = T to t = 1.
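The backward pass above can be sketched directly in terms of the filtered output of section 5. Below is a minimal numpy version in the same notation; the "filtered" statistics used to exercise it at the end are random placeholders of the right shape, not the output of a real filter run.

```python
import numpy as np

def smooth(h_filt, F_filt, A, Sigma_H):
    """Steps 1-7 of section 6: backward recursion over the filtered statistics.

    h_filt[t], F_filt[t] hold h_{t|t} and F_{t|t} for t = 0..T-1 (0-indexed).
    Returns smoothed means h^s_t, covariances F^s_t and cross-variances X^s_t.
    """
    T = len(h_filt)
    h_s, F_s, X_s = [None] * T, [None] * T, [None] * T
    h_s[-1], F_s[-1] = h_filt[-1], F_filt[-1]                   # step 1
    for t in range(T - 2, -1, -1):                              # step 7: backwards
        h, F = h_filt[t], F_filt[t]
        A_s = (A @ F).T @ np.linalg.inv(A @ F @ A.T + Sigma_H)  # step 2
        S_s = F - A_s @ A @ F                                   # step 3
        h_s[t] = A_s @ h_s[t + 1] + h - A_s @ A @ h             # step 4
        P = A_s @ F_s[t + 1] @ A_s.T + S_s                      # step 5 (then symmetrize)
        F_s[t] = 0.5 * (P + P.T)
        X_s[t] = A_s @ F_s[t] + np.outer(h_s[t], h)             # step 6
    return h_s, F_s, X_s

# Exercise with random placeholder "filtered" statistics of the right shape.
rng = np.random.default_rng(4)
A = np.array([[1.0, 1.0], [0.0, 1.0]])
Sigma_H = 0.1 * np.eye(2)
h_filt = [rng.standard_normal(2) for _ in range(5)]
F_filt = []
for _ in range(5):
    M = rng.standard_normal((2, 2))
    F_filt.append(M @ M.T + 0.1 * np.eye(2))    # symmetric positive definite
h_s, F_s, X_s = smooth(h_filt, F_filt, A, Sigma_H)
print(all(np.allclose(F, F.T) for F in F_s))  # True
```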

7 Expectation Maximization (Calibrating the Kalman Filter)

The procedures outlined in the previous sections are fine if we assume that we know the values in the parameter set θ = {μ_0, σ²_0, A, Σ_H, B, Σ_V}, but in order to learn these values we will need to perform the Expectation Maximization (EM) algorithm.

The joint probability of T time steps of the latent and observable variables is:

    p(h_{1:T}, v_{1:T}) = p(h_1) ∏_{t=2}^{T} p(h_t | h_{t-1}) ∏_{t=1}^{T} p(v_t | h_t)    (7.1)

Making the dependence on the parameters explicit, the likelihood of the model given the parameter set θ is:

    p(h_{1:T}, v_{1:T} | θ) = p(h_1 | μ_0, σ²_0) ∏_{t=2}^{T} p(h_t | h_{t-1}, A, Σ_H) ∏_{t=1}^{T} p(v_t | h_t, B, Σ_V)    (7.2)

Taking logs gives us the model's log likelihood:

    ln p(h_{1:T}, v_{1:T} | θ) = ln p(h_1 | μ_0, σ²_0) + Σ_{t=2}^{T} ln p(h_t | h_{t-1}, A, Σ_H) + Σ_{t=1}^{T} ln p(v_t | h_t, B, Σ_V)    (7.3)

We will deal with each of the three components of (7.3) in turn. Using V to represent the set of observations v_{1:T}, H for h_{1:T}, θ_old to represent our parameter values before an iteration of the EM loop, the superscript n to represent the value of a parameter after an iteration of the loop, c to represent terms that are not dependent on μ_0 or σ²_0, λ to represent (σ²_0)^{-1}, and Q = E_{H|θ_old}[ln p(H, V | θ)], we will first find the expected value for ln p(h_1 | μ_0, σ²_0):

    Q = −½ ln|σ²_0| − E_{H|θ_old}[ ½ (h_1 − μ_0)^T λ (h_1 − μ_0) ] + c

      = −½ ln|σ²_0| − ½ E_{H|θ_old}[ h_1^T λ h_1 − h_1^T λ μ_0 − μ_0^T λ h_1 + μ_0^T λ μ_0 ] + c

      = ½ ( ln|λ| − Tr( λ E_{H|θ_old}[ h_1 h_1^T − h_1 μ_0^T − μ_0 h_1^T + μ_0 μ_0^T ] ) ) + c    (7.4)

In order to find the μ_0 which maximizes the expected log likelihood described in (7.4), we will differentiate it wrt μ_0 and set the differential to zero:

    δQ/δμ_0 = 2λ μ_0 − 2λ E[h_1] = 0

    ⇒ μ^n_0 = E[h_1]    (7.5)

Proceeding in a similar manner to establish the maximal λ:

    δQ/δλ = ½ ( σ²_0 − E[h_1 h_1^T] + E[h_1] μ_0^T + μ_0 E[h_1^T] − μ_0 μ_0^T ) = 0

    ⇒ σ²_0^n = E[h_1 h_1^T] − E[h_1] E[h_1^T]    (7.6)

In order to optimize for A and Σ_H we will substitute for p(h_t | h_{t-1}, A, Σ_H) in (7.3), giving:

    Q = −(T−1)/2 ln|Σ_H| − E_{H|θ_old}[ ½ Σ_{t=2}^{T} (h_t − A h_{t-1})^T Σ_H^{-1} (h_t − A h_{t-1}) ] + c    (7.7)

Maximizing with respect to these parameters then gives:

    A^n = ( Σ_{t=2}^{T} E[h_t h_{t-1}^T] ) ( Σ_{t=2}^{T} E[h_{t-1} h_{t-1}^T] )^{-1}    (7.8)

    Σ^n_H = 1/(T−1) Σ_{t=2}^{T} ( E[h_t h_t^T] − A^n E[h_{t-1} h_t^T] − E[h_t h_{t-1}^T] A^{nT} + A^n E[h_{t-1} h_{t-1}^T] (A^n)^T )    (7.9)

In order to determine values for B and Σ_V we substitute for p(v_t | h_t, B, Σ_V) in (7.3) to give:

    Q = −T/2 ln|Σ_V| − E_{H|θ_old}[ ½ Σ_{t=1}^{T} (v_t − B h_t)^T Σ_V^{-1} (v_t − B h_t) ] + c    (7.10)

Maximizing this with respect to B and Σ_V gives:

    B^n = ( Σ_{t=1}^{T} v_t E[h_t^T] ) ( Σ_{t=1}^{T} E[h_t h_t^T] )^{-1}    (7.11)

    Σ^n_V = 1/T Σ_{t=1}^{T} ( v_t v_t^T − B^n E[h_t] v_t^T − v_t E[h_t^T] B^{nT} + B^n E[h_t h_t^T] B^{nT} )    (7.12)

Using the values calculated from the smoothing procedure in section 6:

    E[h_t] = h^s_t
    E[h_t h_t^T] = F^s_t
    E[h_t h_{t-1}^T] = X^s_t

We can now set out the procedure for parameter learning using Expectation Maximization:

1. Choose starting values for the parameters θ = {μ_0, σ²_0, A, Σ_H, B, Σ_V}.

2. Using the parameter set θ, calculate the filtered statistics ĥ_{t|t} and F_{t|t} for each time step, as described in section 5.

3. Using the parameter set θ, calculate the smoothed statistics h^s_t, F^s_t and X^s_t for each time step, as described in section 6.

4. Update A:

    A^n = ( Σ_{t=1}^{T} h^s_t h^{sT}_{t-1} + Σ_{t=1}^{T} X^s_t ) ( Σ_{t=1}^{T} h^s_t h^{sT}_t + Σ_{t=1}^{T} F^s_t )^{-1}

5. Update Σ_H:

    Σ^n_H = (T−1)^{-1} ( Σ_{t=2}^{T} h^s_t h^{sT}_t + Σ_{t=2}^{T} F^s_t − A^n ( Σ_{t=1}^{T} h^s_t h^{sT}_{t-1} + Σ_{t=1}^{T} X^s_t )^T )

6. Update B:

    B^n = ( Σ_{t=1}^{T} v_t h^{sT}_t ) ( Σ_{t=1}^{T} h^s_t h^{sT}_t + Σ_{t=1}^{T} F^s_t )^{-1}

7. Update Σ_V:

    Σ^n_V = T^{-1} ( Σ_{t=1}^{T} v_t v_t^T − B^n ( Σ_{t=1}^{T} v_t h^{sT}_t )^T )

8. Update μ_0:

    μ^n_0 = h^s_1

9. Update σ²_0:

    σ²_0^n = F^s_1 + h^s_1 h^{sT}_1 − μ^n_0 μ^{nT}_0

10. Iterate steps 2 to 9 a given number of times, or until the difference between parameter values from succeeding iterations is below a pre-defined threshold.
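Steps 4-9 can be collected into a single M-step function operating on the smoothed statistics. The sketch below assumes the smoothed quantities from section 6 are already available as arrays; the function name and the boundary index conventions (the t = 1 terms of the lag sums) are my own choices, handled loosely here just as they are in the text.

```python
import numpy as np

def m_step(vs, h_s, F_s, X_s):
    """EM steps 4-9: updated parameters from observations and smoothed statistics.

    vs  : observations v_t, shape (T, m)
    h_s : smoothed means h^s_t, shape (T, n)
    F_s : smoothed covariances F^s_t, shape (T, n, n)
    X_s : smoothed cross-variances X^s_t, length T-1 (one per adjacent pair)
    """
    T = len(vs)
    hh     = sum(np.outer(h, h) for h in h_s) + sum(F_s)        # Σ h^s_t h^sT_t + Σ F^s_t
    hh_lag = (sum(np.outer(h_s[t], h_s[t - 1]) for t in range(1, T))
              + sum(X_s))                                       # Σ h^s_t h^sT_{t-1} + Σ X^s_t
    hh2    = (sum(np.outer(h_s[t], h_s[t]) for t in range(1, T))
              + sum(F_s[1:]))                                   # the same sums from t = 2
    vh     = sum(np.outer(v, h) for v, h in zip(vs, h_s))       # Σ v_t h^sT_t
    vv     = sum(np.outer(v, v) for v in vs)                    # Σ v_t v_t^T

    A_n       = hh_lag @ np.linalg.inv(hh)                      # step 4
    Sigma_H_n = (hh2 - A_n @ hh_lag.T) / (T - 1)                # step 5
    B_n       = vh @ np.linalg.inv(hh)                          # step 6
    Sigma_V_n = (vv - B_n @ vh.T) / T                           # step 7
    mu0_n     = h_s[0]                                          # step 8
    sigma0_n  = (F_s[0] + np.outer(h_s[0], h_s[0])
                 - np.outer(mu0_n, mu0_n))                      # step 9 (= F^s_1 here)
    return A_n, Sigma_H_n, B_n, Sigma_V_n, mu0_n, sigma0_n

# Exercise with placeholder smoothed statistics of the right shape.
rng = np.random.default_rng(5)
T, n, m = 6, 2, 1
vs  = rng.standard_normal((T, m))
h_s = rng.standard_normal((T, n))
F_s = np.stack([np.eye(n)] * T)
X_s = [0.1 * np.eye(n)] * (T - 1)
A_n, Sigma_H_n, B_n, Sigma_V_n, mu0_n, sigma0_n = m_step(vs, h_s, F_s, X_s)
print(A_n.shape, B_n.shape)  # (2, 2) (1, 2)
```

An EM loop then alternates this M-step with the filter and smoother passes of sections 5 and 6 (the E-step), per step 10.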

References

[1] C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics). Springer, 2006.

[2] Z. Ghahramani and G. Hinton, "Parameter estimation for linear dynamical systems," Tech. Rep. CRG-TR-96-2, University of Toronto, 1996.

