Kalman Filter
COMP 486
Continuous States
●
HMMs only allow us to reason about discrete-state
systems:
– Rainy or Sunny?
– Urn #1 or Urn #2?
●
We often need to work with continuous states:
– Robot position.
– Temperature and Humidity.
– etc.
●
One possibility is discretization.
– Forces us to trade off the number of states against the accuracy
of the model.
Kalman Filter
●
The Kalman filter allows us to solve the following
problem:
– Continuous state, discrete time.
– We have a model of the state transition.
– We have a model of how observations relate to the underlying
state.
– Predict the state given a sequence of observations.
●
We will not talk about learning for Kalman filters.
Kalman Filter Assumptions
●
State dynamics are linear – the current state is a linear
function of the previous state.
●
Noise in the state dynamics is normally distributed.
●
The observation process is linear – observations are a
linear function of the state.
●
The observation noise is normally distributed.
Linear State Dynamics
●
A simple example: Radioactive Decay.
– x is our one-dimensional state variable: the amount of decaying
material.
– The following equation describes an exponentially decreasing
amount of radioactive material (r < 1):
x_t = r x_{t-1}
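As a quick sketch, the decay dynamics can be iterated directly; the rate r = 0.5 and the initial amount are illustrative values, not from the slides:

```python
# Iterate the linear dynamics x_t = r x_{t-1} with r < 1.
r = 0.5       # illustrative decay rate
x = 100.0     # illustrative initial amount of material

history = [x]
for t in range(1, 5):
    x = r * x  # x_t = r x_{t-1}
    history.append(x)

print(history)  # the amount halves at every step
```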
Another Example
●
Motion of a particle moving in one dimension at a
constant velocity.
●
x is now a two-dimensional vector: x = [p_t, v_t]^T
●
Our update equation is x_t = A x_{t-1}, where A = [1 1; 0 1].
●
In other words: p_t = p_{t-1} + v_{t-1} and v_t = v_{t-1}.
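The constant-velocity transition matrix can be checked by iterating it; the starting state below is an illustrative choice:

```python
import numpy as np

# Constant-velocity dynamics: x = [p_t, v_t], x_t = A x_{t-1}.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])

x = np.array([0.0, 2.0])  # illustrative start: position 0, velocity 2
for t in range(3):
    x = A @ x

print(x)  # position advances by the velocity each step; velocity is unchanged
```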
●
There are a huge range of interesting questions here:
– Given a process, how do we specify the correct transition
matrix?
– Given a transition matrix, what can we say about the state
dynamics?
– What if we have a process that has fundamentally nonlinear
dynamics?
●
Take these classes:
●
Math 270 “NonLinear Dynamics and Chaos”
●
Math 260 “Differential Equations/Numerical Methods”
Process Noise
●
Previous examples assumed that our transition model
perfectly captures the true state dynamics.
●
In the real world, our model will never be perfect:
– For example, moving objects are influenced by friction, air
resistance etc.
●
We account for this by adding a noise term to our update
equation:
x_t = A x_{t-1} + w
●
where w ~ N(0, Q). (This is common shorthand for "w is
drawn from the normal distribution with mean 0 and
covariance matrix Q.")
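Simulating the noisy update equation is a one-line change to the earlier loop; Q and the seed are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
Q = 0.01 * np.eye(2)  # illustrative process noise covariance

x = np.array([0.0, 2.0])
for t in range(3):
    w = rng.multivariate_normal(np.zeros(2), Q)  # w ~ N(0, Q)
    x = A @ x + w  # x_t = A x_{t-1} + w

print(x)  # close to the noise-free result [6, 2], but perturbed
```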
Measurements (Observations)
●
As in an HMM, we do not have access to the true state.
●
We get observations z that are linearly related to the true
state x:
z_t = H x_t
●
The dimensionality of z may be different from that of x.
●
For example, in our 1D motion case, assume that
velocity cannot be directly observed: H = [1 0], so
z_t = [1 0] [p_t, v_t]^T = p_t
Noisy Measurements
●
Once again, our measurements don't perfectly reflect
the true system state.
●
We account for noisy measurements as follows:
z_t = H x_t + v
●
where v ~ N(0, R)
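A sketch of one noisy measurement of the 1D motion state; R and the true state are illustrative values:

```python
import numpy as np

rng = np.random.default_rng(1)

# Position-only measurement of the 2D state: z_t = H x_t + v.
H = np.array([[1.0, 0.0]])
R = np.array([[0.25]])       # illustrative measurement noise covariance

x = np.array([4.0, 2.0])     # true state (position 4, velocity 2)
v = rng.multivariate_normal(np.zeros(1), R)  # v ~ N(0, R)
z = H @ x + v

print(z)  # a noisy reading of the position: 4.0 plus noise
```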
Putting it all Together
●
State dynamics: x_t = A x_{t-1} + w,  w ~ N(0, Q)
●
Measurement model: z_t = H x_t + v,  v ~ N(0, R)
●
Complete parameterization:
– A – process transition matrix.
– Q – process noise covariance.
– H – measurement matrix.
– R – measurement noise covariance.
An Aside: Control
●
Note that the Welch & Bishop tutorial has the following:
x_t = A x_{t-1} + B u_{t-1} + w,  w ~ N(0, Q)
●
u is a control signal.
●
This raises other interesting questions that we will
ignore.
Filtering Steps
● The system starts in some true (unobservable) state x_1.
●
We make an initial guess at the state: x̂_1^-.
– The "^" indicates that this is an estimate.
– The "−" in the superscript indicates that this is an a priori estimate:
no observation has yet been made.
●
**We update our estimate to x̂_t based on the observation z_t.
– This is the a posteriori estimate: after an observation has been made.
● The state is updated according to the dynamics, resulting in x_{t+1}.
●
We propagate our estimate according to the state dynamics,
resulting in x̂_{t+1}^-.
●
Return to **.
Kalman Filter Derivation
●
This is the tricky part:
●
**We update our estimate to x̂_t based on the observation z_t.
●
We want our estimate to be unbiased, and to have the least
possible variance.
– Unbiased: the expected difference between our estimate
and the state should be 0.
– Minimum variance: as little uncertainty as possible in our
estimate.
●
(We won't show the whole derivation. Just the gist.)
Kalman Filter Derivation
●
We want an update rule of the following form:
x̂_t = x̂_t^- + K (z_t − H x̂_t^-)
a priori estimate error covariance: P_t^- = E[e_t^- (e_t^-)^T]
a posteriori estimate error covariance: P_t = E[e_t e_t^T]
Kalman Filter Derivation
●
The goal is to find a K that minimizes the a posteriori
estimate error covariance:
P_t = E[e_t e_t^T] = E[(x_t − x̂_t)(x_t − x̂_t)^T]
●
First step: substitute x̂_t^- + K (z_t − H x̂_t^-) for x̂_t above.
●
Then take the derivative of the trace of the expectation
with respect to K.
– The trace is the sum of the diagonal elements. In this case,
that means we are minimizing the sum of the variances of the
different dimensions.
●
Set the derivative to 0 and solve for K.
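Carried out symbolically, those steps look like this (a sketch of the standard derivation, with the intermediate algebra omitted):

```latex
\begin{aligned}
P_t &= E\big[(x_t - \hat{x}_t)(x_t - \hat{x}_t)^T\big] \\
    &= (I - KH)\,P_t^-\,(I - KH)^T + K R K^T
       \quad \text{(after substituting for } \hat{x}_t\text{)} \\
\frac{\partial\,\mathrm{tr}(P_t)}{\partial K}
    &= -2\,P_t^- H^T + 2\,K\,\big(H P_t^- H^T + R\big) = 0 \\
\Rightarrow\; K_t &= P_t^- H^T \big(H P_t^- H^T + R\big)^{-1}
\end{aligned}
```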
Kalman Filter Derivation
●
This is what we end up with:
K_t = P_t^- H^T (H P_t^- H^T + R)^{-1}
●
Reminders:
P_t^- = a priori estimate error covariance.
R = measurement noise covariance.
●
When R is large, K is small: we tend to ignore the
sensors if they are unreliable.
●
When P_t^- is small, K is small: we tend to ignore the
sensors if our a priori estimate is precise.
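The two limiting behaviors can be seen directly from the gain formula; the covariance values below are illustrative:

```python
import numpy as np

# K_t = P_t^- H^T (H P_t^- H^T + R)^{-1}
H = np.array([[1.0, 0.0]])
P_minus = np.eye(2)  # illustrative a priori error covariance

def kalman_gain(P_minus, H, R):
    S = H @ P_minus @ H.T + R          # innovation covariance
    return P_minus @ H.T @ np.linalg.inv(S)

K_reliable_sensor = kalman_gain(P_minus, H, np.array([[0.01]]))  # small R
K_noisy_sensor = kalman_gain(P_minus, H, np.array([[100.0]]))    # large R

# A large R (unreliable sensor) drives the gain toward zero.
print(K_reliable_sensor[0, 0], K_noisy_sensor[0, 0])
```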
Kalman Filter in a Nutshell
●
If we already have a reliable estimate:
– ignore our sensors.
●
If we have reliable sensors:
– ignore our estimate.
●
The gain factor gives the optimal tradeoff between these
two extremes.
The Whole Algorithm
PREDICT
project the state ahead:
x̂_t^- = A x̂_{t-1}
project the error covariance ahead:
P_t^- = A P_{t-1} A^T + Q
CORRECT
compute the Kalman gain:
K_t = P_t^- H^T (H P_t^- H^T + R)^{-1}
update estimate with the measurement:
x̂_t = x̂_t^- + K_t (z_t − H x̂_t^-)
update the error covariance:
P_t = (I − K_t H) P_t^-
From Welch and Bishop
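The predict/correct loop, applied to the slides' 1D constant-velocity model with position-only measurements, can be sketched as follows; Q, R, the seed, and the initial values are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(42)

A = np.array([[1.0, 1.0], [0.0, 1.0]])  # constant-velocity dynamics
H = np.array([[1.0, 0.0]])              # observe position only
Q = 0.001 * np.eye(2)                   # illustrative process noise covariance
R = np.array([[1.0]])                   # illustrative measurement noise covariance

x_true = np.array([0.0, 1.0])  # true state: position 0, velocity 1
x_hat = np.zeros(2)            # initial guess knows neither
P = 10.0 * np.eye(2)           # large initial uncertainty

for t in range(50):
    # Simulate the true system and a noisy measurement.
    x_true = A @ x_true + rng.multivariate_normal(np.zeros(2), Q)
    z = H @ x_true + rng.multivariate_normal(np.zeros(1), R)

    # PREDICT: project the state and error covariance ahead.
    x_hat = A @ x_hat
    P = A @ P @ A.T + Q

    # CORRECT: compute the gain, update the estimate and covariance.
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    x_hat = x_hat + K @ (z - H @ x_hat)
    P = (np.eye(2) - K @ H) @ P

print(x_hat, x_true)  # the estimate tracks the true state
```

Note that the filter recovers the velocity even though only the position is ever measured, because the transition matrix A couples the two state dimensions.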
An Illustration...
Nice Properties
●
Optimal (given the linear-Gaussian assumptions).
●
Efficient. (Takes advantage of temporal independence
assumptions)
●
Provides both a state estimate and a measure of the
uncertainty of that estimate.
The Extended Kalman Filter
●
An extension of the algorithm that handles nonlinear
state dynamics.
●
Not necessarily optimal.