
A Mickey Mouse Guide to Kalman Filtering

Mark J. L. Orr*
Advanced Robotics Research Ltd. and SD-Scicon UK Ltd.
February 15, 1993

*Currently a visiting worker at the Department of Artificial Intelligence, Edinburgh University.

Abstract
This document is an introduction to Kalman filtering and associated
techniques, starting at the level of, roughly, Mickey Mouse and the Girl's
Blouse and ending up at the level of, approximately, Robocop II.

1 Introduction
When people tell you something about a thing you want to know about, and
you already know a little bit about that thing, but you're not sure about what
you know, and they're not too sure either about what they're telling you, then
that's when you need ... a Kalman filter.

The Kalman filter, developed as recently as 1960, is a tool for estimating the
true state of affairs from unreliable information. It's a numerical tool, so both
the state of affairs, which henceforth we'll call the state, and the information,
henceforth the observations, must be representable numerically. In addition,
the unreliability of the observations and any prior knowledge about the state
must be modelled as normal (i.e. Gaussian) probability distributions. Such
distributions require only mean vectors and variance-covariance matrices for
their complete specification.
The Kalman filter is widely used in engineering, control, navigation and
communications. Some examples:

- estimating the parameters of an ellipse [7]
- rotation estimation in computer vision [5]
- target tracking [2]
- world modelling and sensor fusion in robotics [1, 3]
- satellite navigation
- econometrics
In its most general form, the Kalman filter can be applied to time-varying,
controlled, non-linear systems. We will look at the non-linear extensions but omit
the control and time-varying aspects. Consequently there will be no discussion
of state transition as there would be in more general treatments (e.g. [2]).
Most applications in robotics (so far) have been restricted to the time-invariant,
uncontrolled case.
In section 2 we look at representing uncertainty with probability distributions.
In section 3 we further define the terms state and observation and introduce
the measurement equation and recursive estimation. In the next section
we present the basic Kalman filter equations, and then, in the section following,
extensions to cope with non-linear systems. We present the associated Mahalanobis
distance test in section 6, and finally there is a simple illustrative
example in section 7.
The notation we use below is similar to Ayache and Faugeras [1], one of the
best references for Kalman filtering in a robotics context. Scalars are represented
by normal type letters, vectors by emboldened lower case letters and
matrices by emboldened upper case letters. By convention, vectors are single-column
matrices unless explicitly transposed, as in:

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{bmatrix}, \qquad \mathbf{x}^t = [x_1, x_2, \ldots, x_k]$$

We use the hat symbol to distinguish true from estimated values: $\mathbf{x}$ is the true
value while $\hat{\mathbf{x}}$ is an estimate.
For those of you who generally find understanding equations hard, I would
like to give some words of encouragement. The Kalman filter is a numerical tool
built on algebraic analysis. To understand how it works, even at the most basic
level, entails understanding certain equations. That cannot be avoided. However,
though these equations may look complicated, with all sorts of syntactic
confetti, I believe that most people with a basic knowledge of mathematics (and
this must surely include almost all ARRL engineers) can understand every one
of them. It may take a little patience to read an equation or explanatory
text more than once, but that patience is rewarded in the end. If you are
like me, understanding something mathematical which once was unintelligible
gives a peculiar sense of satisfaction (see my forthcoming work: The Orgasm in
Mathematics).

2 Uncertainty, Probability and Covariance


Uncertainty has to be represented as a normal probability distribution. This
means that every independent scalar parameter $x$ is to be regarded as having
been sampled from a normal probability distribution with a mean $\hat{x}$ and variance
$\sigma_x^2$ (or standard deviation $\sigma_x$). These two numbers, the mean and variance,
are sufficient to specify the distribution, which is:

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma_x} \exp\left(-\frac{1}{2}\left(\frac{x - \hat{x}}{\sigma_x}\right)^2\right)$$
For all but the simplest cases it requires more than one number to represent
a state or an observation, so we have to manipulate vector quantities. When
$\mathbf{x} = [x_1, x_2, \ldots, x_d]^t$ is a vector quantity of dimension $d$, its mean is also a
vector, but its uncertainty is represented by a $d \times d$ symmetric variance-covariance
matrix, $\Lambda_x$. This matrix records the expected, or most likely, value
of $(\mathbf{x} - \hat{\mathbf{x}})(\mathbf{x} - \hat{\mathbf{x}})^t$, just as $\hat{\mathbf{x}}$ is the expected, or most likely, value of $\mathbf{x}$:

$$E[\mathbf{x}] = \hat{\mathbf{x}}$$
$$E[(\mathbf{x} - \hat{\mathbf{x}})(\mathbf{x} - \hat{\mathbf{x}})^t] = \Lambda_x$$

Each diagonal entry of $\Lambda_x$, $\Lambda_{ii}$, is the variance of the corresponding component
$x_i$, while the off-diagonal entries, $\Lambda_{ij}$, are the covariances between different
components.

The probability distribution for the vector random variable is:

$$p(\mathbf{x}) = |2\pi\Lambda_x|^{-1/2} \exp\left(-\tfrac{1}{2}(\mathbf{x} - \hat{\mathbf{x}})^t \Lambda_x^{-1} (\mathbf{x} - \hat{\mathbf{x}})\right)$$
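To make this representation concrete, here is a small Python sketch (my own
illustration, not part of the original text; the names normal_pdf, x_hat and
Lam are mine):

import numpy as np

# A 2-dimensional quantity represented by a mean vector and a symmetric
# variance-covariance matrix, as described above.
x_hat = np.array([1.0, 2.0])
Lam = np.array([[0.5, 0.1],
                [0.1, 0.3]])

def normal_pdf(x, mean, cov):
    # p(x) = |2*pi*cov|^(-1/2) * exp(-0.5 * (x - mean)^t cov^-1 (x - mean))
    d = x - mean
    return (np.linalg.det(2.0 * np.pi * cov) ** -0.5
            * np.exp(-0.5 * d @ np.linalg.solve(cov, d)))

print(normal_pdf(np.array([1.2, 1.9]), x_hat, Lam))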

3 States, Observations, Measurements and Recursion
If we denote by $\mathbf{a}$ the true state and by $\mathbf{x}$ the perfect observation, then the
equation

$$\mathbf{x} = \mathbf{M}\mathbf{a}$$

describes mathematically how state and observation are related. This equation
is called the measurement equation or sensor model. $\mathbf{M}$ is a matrix of fixed
values with $m$ rows and $n$ columns, where $m$ and $n$ are the dimensions of,
respectively, $\mathbf{x}$ and $\mathbf{a}$. This is the linear form of the measurement equation; we
will meet the non-linear form in section 5.
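For example (an illustration of mine, not from the original text): if the state is a
point in the plane, $\mathbf{a} = [p\ q]^t$, and a sensor reports only the horizontal coordinate,
then $\mathbf{x} = \mathbf{M}\mathbf{a}$ with $\mathbf{M} = [1\ 0]$, so $m = 1$ and $n = 2$.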
Each imperfect observation, with mean $\hat{\mathbf{x}}_i$, covariance $\Lambda_i$ and measurement
equation $\mathbf{x}_i = \mathbf{M}_i \mathbf{a}$, is processed by the Kalman filter in turn, producing a new
estimate with mean $\hat{\mathbf{a}}_i$ and covariance $\mathbf{S}_i$, $i = 1, 2, \ldots, k$. The subscript $i$ is a
computation step (not the component index as in the previous section), as the
observations are processed sequentially. Note that since it is possible to process
different types of observations relating to the same state but in different ways,
we also need to subscript the measurement matrix, $\mathbf{M}_i$. At each step another
observation is filtered until, finally, all of them have been processed, leaving the
final state estimate $(\hat{\mathbf{a}}_k, \mathbf{S}_k)$ as a compact representation of the information in
all the observations plus the prior information in the initial state $(\hat{\mathbf{a}}_0, \mathbf{S}_0)$.
The enormous advantage of such a step-wise (also sometimes called recursive)
solution is that if we decide, after a set of $k$ measurements, to stop measuring,
we only have to keep $\hat{\mathbf{a}}_k$ and $\mathbf{S}_k$ as the whole memory of the measurement
process. If we decide later to take additional measurements into account, we only
have to initialise $\hat{\mathbf{a}}_0 = \hat{\mathbf{a}}_k$ and $\mathbf{S}_0 = \mathbf{S}_k$ and process the new measurements to
obtain exactly the same solution as if we had processed all the measurements
together.

4 The Basic Kalman Filter


The basic equations (which will be stated but not derived) are a recipe for
processing one observation at step $i$ and updating the state from the old estimate
$\hat{\mathbf{a}}_{i-1}$ to the new one $\hat{\mathbf{a}}_i$:

$$\hat{\mathbf{a}}_i = \hat{\mathbf{a}}_{i-1} + \mathbf{K}_i (\hat{\mathbf{x}}_i - \mathbf{M}_i \hat{\mathbf{a}}_{i-1})$$
$$\mathbf{S}_i = (\mathbf{I} - \mathbf{K}_i \mathbf{M}_i)\, \mathbf{S}_{i-1}$$

or equivalently

$$\mathbf{S}_i^{-1} = \mathbf{S}_{i-1}^{-1} + \mathbf{M}_i^t \Lambda_i^{-1} \mathbf{M}_i$$

One can see that the previously estimated mean $\hat{\mathbf{a}}_{i-1}$ is corrected by an amount
proportional to the current error $(\hat{\mathbf{x}}_i - \mathbf{M}_i \hat{\mathbf{a}}_{i-1})$, called the innovation. The
proportionality factor $\mathbf{K}_i$ is called the Kalman gain and is a matrix with $n$
rows and $m$ columns given by:

$$\mathbf{K}_i = \mathbf{S}_{i-1} \mathbf{M}_i^t (\Lambda_i + \mathbf{M}_i \mathbf{S}_{i-1} \mathbf{M}_i^t)^{-1}$$
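The update can be written directly as a small Python function. This is a minimal
sketch of one step of the basic filter (my own illustration; the names
kalman_update, a_hat, S, x_hat, Lam and M are mine, not the paper's):

import numpy as np

def kalman_update(a_hat, S, x_hat, Lam, M):
    # One step of the basic (linear) Kalman filter.
    #   a_hat, S   : previous state estimate and its covariance
    #   x_hat, Lam : observation mean and covariance
    #   M          : measurement matrix in x = M a
    innovation = x_hat - M @ a_hat                    # current error
    K = S @ M.T @ np.linalg.inv(Lam + M @ S @ M.T)    # Kalman gain
    a_new = a_hat + K @ innovation                    # corrected mean
    S_new = (np.eye(len(a_hat)) - K @ M) @ S          # updated covariance
    return a_new, S_new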

These equations are more intuitive than they might at first seem. As an
illustration, consider the very simple case when both the state and observation
are scalars and the measurement equation is simply $x = a$. In this case the
measurement matrix collapses to the scalar value $M = 1$ and, denoting the
measurement and state variances by $\sigma_i^2$ and $s_{i-1}^2$ respectively, the gain is also a
scalar:

$$K_i = \frac{s_{i-1}^2}{\sigma_i^2 + s_{i-1}^2}$$

so that

$$\hat{a}_i = \frac{\sigma_i^2\,\hat{a}_{i-1} + s_{i-1}^2\,\hat{x}_i}{\sigma_i^2 + s_{i-1}^2}, \qquad s_i^2 = \frac{\sigma_i^2\,s_{i-1}^2}{\sigma_i^2 + s_{i-1}^2}$$

Notice how if $s_{i-1}^2 \to \infty$ (no prior information) then $\hat{a}_i \to \hat{x}_i$ and $s_i^2 \to \sigma_i^2$, so the
observation becomes the new state. Conversely, if $\sigma_i^2 \to \infty$ (no information in
the observation), then $\hat{a}_i \to \hat{a}_{i-1}$ and $s_i^2 \to s_{i-1}^2$, so the observation is ignored.
The Kalman filter has been described as "a glorified weighted averaging
technique" [6]. The above illustration is suggestive of that, but it should be
remembered that it is a very simple example. In any case, the Kalman filter
is the best estimation technique in the sense that, if all the assumptions about
independent normal probability distributions and linear measurement equations
are true, then it gives provably optimal (minimum variance) solutions.
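To make the weighted-averaging interpretation concrete, the kalman_update
sketch above can be exercised on the scalar case $x = a$ (my own illustration;
the numbers are arbitrary and the function from the previous sketch is assumed
to be in scope):

import numpy as np

a_hat = np.array([0.0]); S = np.array([[4.0]])     # prior: mean 0, variance 4
x_hat = np.array([2.0]); Lam = np.array([[1.0]])   # observation: mean 2, variance 1
M = np.array([[1.0]])                              # scalar case, M = 1

a_new, S_new = kalman_update(a_hat, S, x_hat, Lam, M)
print(a_new)   # [1.6] = (1*0 + 4*2)/(1 + 4), the variance-weighted average
print(S_new)   # [[0.8]] = (1*4)/(1 + 4)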

5 The Iterated Extended Kalman Filter


Unfortunately, most interesting problems are non-linear. In particular, they
have non-linear measurement equations involving a function $\mathbf{f}_i$ (rather than a
matrix $\mathbf{M}_i$):

$$\mathbf{f}_i(\mathbf{x}_i, \mathbf{a}) = \mathbf{0}$$

The trick is to assume that the means $\hat{\mathbf{x}}_i$ and $\hat{\mathbf{a}}_{i-1}$ are good approximations
for the true values $\mathbf{x}_i$ and $\mathbf{a}$, expand the measurement equation in a Taylor
series, truncate the series after the first order terms and finally emerge with an
approximation to a linear system:

$$\mathbf{f}_i(\mathbf{x}_i, \mathbf{a}) = \mathbf{0} \approx \mathbf{f}_i(\hat{\mathbf{x}}_i, \hat{\mathbf{a}}_{i-1}) + \frac{\partial \mathbf{f}_i}{\partial \mathbf{x}_i}(\mathbf{x}_i - \hat{\mathbf{x}}_i) + \frac{\partial \mathbf{f}_i}{\partial \mathbf{a}}(\mathbf{a} - \hat{\mathbf{a}}_{i-1})$$

where the Jacobians

$$\frac{\partial \mathbf{f}_i}{\partial \mathbf{x}_i} = [\nabla_{\mathbf{x}_i} \mathbf{f}_i^t]^t, \qquad \frac{\partial \mathbf{f}_i}{\partial \mathbf{a}} = [\nabla_{\mathbf{a}} \mathbf{f}_i^t]^t$$

are evaluated at $(\hat{\mathbf{x}}_i, \hat{\mathbf{a}}_{i-1})$. This equation can be rewritten in the (linear)
form

$$\mathbf{y}_i = \mathbf{M}_i \mathbf{a} + \mathbf{u}_i$$

where

$$\mathbf{y}_i = \frac{\partial \mathbf{f}_i}{\partial \mathbf{a}}\, \hat{\mathbf{a}}_{i-1} - \mathbf{f}_i(\hat{\mathbf{x}}_i, \hat{\mathbf{a}}_{i-1})$$
$$\mathbf{M}_i = \frac{\partial \mathbf{f}_i}{\partial \mathbf{a}}$$
$$\mathbf{u}_i = \frac{\partial \mathbf{f}_i}{\partial \mathbf{x}_i}\, (\mathbf{x}_i - \hat{\mathbf{x}}_i)$$
This is now a linear measurement equation where $\mathbf{y}_i$ is the measurement, $\mathbf{M}_i$
the linear transformation and $\mathbf{u}_i$ the random measurement noise. Both $\mathbf{y}_i$
and $\mathbf{M}_i$ are readily computed from the actual measurement $\hat{\mathbf{x}}_i$, the estimate
$\hat{\mathbf{a}}_{i-1}$ and the function $\mathbf{f}_i$ and its first derivative. The second order statistics of
$\mathbf{u}_i$ are easily derived:

$$E[\mathbf{u}_i] = \mathbf{0}$$
$$\mathbf{W}_i \equiv E[\mathbf{u}_i \mathbf{u}_i^t] = \frac{\partial \mathbf{f}_i}{\partial \mathbf{x}_i}\, \Lambda_i \left(\frac{\partial \mathbf{f}_i}{\partial \mathbf{x}_i}\right)^t$$

We can now write down a new set of equations which take into account the
linearisation process and which constitute the extended Kalman filter:

$$\hat{\mathbf{a}}_i = \hat{\mathbf{a}}_{i-1} + \mathbf{K}_i (\mathbf{y}_i - \mathbf{M}_i \hat{\mathbf{a}}_{i-1}) = \hat{\mathbf{a}}_{i-1} - \mathbf{K}_i\, \mathbf{f}_i(\hat{\mathbf{x}}_i, \hat{\mathbf{a}}_{i-1})$$
$$\mathbf{S}_i = (\mathbf{I} - \mathbf{K}_i \mathbf{M}_i)\, \mathbf{S}_{i-1}$$

or equivalently

$$\mathbf{S}_i^{-1} = \mathbf{S}_{i-1}^{-1} + \mathbf{M}_i^t \mathbf{W}_i^{-1} \mathbf{M}_i$$

and where

$$\mathbf{K}_i = \mathbf{S}_{i-1} \mathbf{M}_i^t (\mathbf{W}_i + \mathbf{M}_i \mathbf{S}_{i-1} \mathbf{M}_i^t)^{-1}$$

If the estimate $\hat{\mathbf{a}}_{i-1}$ around which the Taylor expansion is performed is too
far from the correct parameter $\mathbf{a}$, the approximation which led above to a linear
measurement equation is not very good, and the optimal solution of the linear
system may differ significantly from the true one. A method to reduce the effect
of these approximation errors is to apply the iterated extended Kalman filter.
This consists of applying the update equation for the mean, $\hat{\mathbf{a}}_i = \hat{\mathbf{a}}_{i-1} - \mathbf{K}_i \mathbf{f}_i$,
for as long as $\hat{\mathbf{a}}_i - \hat{\mathbf{a}}_{i-1}$ is large enough, computing at each iteration new values of
$\mathbf{K}_i$, $\mathbf{y}_i$ and $\mathbf{M}_i$ obtained from a re-linearisation of $\mathbf{f}_i$ about the new estimate $\hat{\mathbf{a}}_i$.
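The whole of this section can be condensed into a short routine. The following
is a Python sketch of the iterated update (my own illustration rather than the
paper's code; iekf_update is a name I have invented, and f, dfdx and dfda are
functions supplied by the caller):

import numpy as np

def iekf_update(a_prev, S_prev, x_hat, Lam, f, dfdx, dfda,
                max_iter=10, tol=1e-9):
    # Process one observation with the iterated extended Kalman filter.
    # f(x, a) = 0 is the implicit measurement equation; dfdx and dfda return
    # its Jacobians with respect to x and a.
    a = a_prev.copy()                                   # current linearisation point
    for _ in range(max_iter):
        Fx = np.atleast_2d(dfdx(x_hat, a))              # df/dx at (x_hat, a)
        Fa = np.atleast_2d(dfda(x_hat, a))              # df/da at (x_hat, a)
        W = Fx @ Lam @ Fx.T                             # noise covariance W_i
        y = Fa @ a - np.atleast_1d(f(x_hat, a))         # linearised measurement y_i
        K = S_prev @ Fa.T @ np.linalg.inv(W + Fa @ S_prev @ Fa.T)
        a_new = a_prev + K @ (y - Fa @ a_prev)          # update about the current point
        if np.linalg.norm(a_new - a) < tol:             # stop when the change is small
            a = a_new
            break
        a = a_new
    S = (np.eye(len(a_prev)) - K @ Fa) @ S_prev         # covariance from final linearisation
    return a, S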
Ayache and Faugeras [1] say: "In general, after a few iterations $\hat{\mathbf{a}}_i$ is so
close to $\mathbf{a}$ that the linearisation error is almost zero, yielding an almost optimal
filter". I have to say I have found that this isn't always true. My experience
with problems requiring more than one observation to completely constrain the
solution is that if the initial guess, $\hat{\mathbf{a}}_0$, is very different from the true value of
$\mathbf{a}$ then quite often the true value is never approached. Consequently it is often
necessary to find some reasonable initial guess before starting the filter, perhaps
by waiting until enough observations are available to constrain the solution.

6 Mahalanobis Distance
A basic problem in tracking, data fusion and computer vision is how to associate
a given observation with a given model and its estimate. In most images there is
data about a number of different features of the environment; which data goes
with which feature? The basic, and still most common, method to solve this
problem is the Mahalanobis distance test [3], also called the nearest-neighbour
standard filter, the validation gate, or the normalised innovation.
This technique relies on calculating a normalised distance between observation
and state which can be used to gauge whether the observation plausibly
associates with the state. In the case of a non-linear system we have, at step
$i$, an estimate $\hat{\mathbf{a}}_{i-1}$ and an attached covariance $\mathbf{S}_{i-1}$ for the parameter $\mathbf{a}$. We also
have a noisy measurement $(\hat{\mathbf{x}}_i, \Lambda_i)$ of $\mathbf{x}_i$, and we want to test the plausibility
of this measurement with respect to the equation $\mathbf{f}_i(\mathbf{x}_i, \mathbf{a}) = \mathbf{0}$.

If we consider again a first order expansion of $\mathbf{f}_i(\mathbf{x}_i, \mathbf{a})$ about $(\hat{\mathbf{x}}_i, \hat{\mathbf{a}}_{i-1})$,
since $(\hat{\mathbf{x}}_i - \mathbf{x}_i)$ and $(\hat{\mathbf{a}}_{i-1} - \mathbf{a})$ are independent Gaussian processes, we see that
so is (up to a linear approximation) $\mathbf{f}_i(\hat{\mathbf{x}}_i, \hat{\mathbf{a}}_{i-1})$, whose mean and covariance are:

$$E[\mathbf{f}_i(\hat{\mathbf{x}}_i, \hat{\mathbf{a}}_{i-1})] = \mathbf{0}$$
$$\mathbf{Q}_i \equiv E[\mathbf{f}_i(\hat{\mathbf{x}}_i, \hat{\mathbf{a}}_{i-1})\, \mathbf{f}_i^t(\hat{\mathbf{x}}_i, \hat{\mathbf{a}}_{i-1})] = \frac{\partial \mathbf{f}_i}{\partial \mathbf{x}_i}\, \Lambda_i \left(\frac{\partial \mathbf{f}_i}{\partial \mathbf{x}_i}\right)^t + \frac{\partial \mathbf{f}_i}{\partial \mathbf{a}}\, \mathbf{S}_{i-1} \left(\frac{\partial \mathbf{f}_i}{\partial \mathbf{a}}\right)^t$$

Therefore if the rank of $\mathbf{Q}_i$ is $q$, the generalised Mahalanobis distance

$$d(\hat{\mathbf{x}}_i, \hat{\mathbf{a}}_{i-1}) = \mathbf{f}_i^t(\hat{\mathbf{x}}_i, \hat{\mathbf{a}}_{i-1})\, \mathbf{Q}_i^{-1}\, \mathbf{f}_i(\hat{\mathbf{x}}_i, \hat{\mathbf{a}}_{i-1})$$

has a $\chi^2$ distribution with $q$ degrees of freedom.

Looking at a $\chi^2$ distribution table, it is therefore possible to disassociate a
measurement $\hat{\mathbf{x}}_i$ at a given (e.g. 95%) confidence rate by setting an appropriate
threshold $\epsilon$ on the Mahalanobis distance. Only associations satisfying

$$d(\hat{\mathbf{x}}_i, \hat{\mathbf{a}}_{i-1}) < \epsilon$$

are acceptable.
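In code, the validation gate might look like the following sketch (my own
illustration; it assumes scipy is available for the chi-squared quantile and
reuses the Jacobian conventions of the sketch in section 5):

import numpy as np
from scipy.stats import chi2

def mahalanobis_gate(a_prev, S_prev, x_hat, Lam, f, dfdx, dfda, confidence=0.95):
    # Returns (distance, accept): the generalised Mahalanobis distance of the
    # observation and whether the association passes the chi-squared gate.
    fv = np.atleast_1d(f(x_hat, a_prev))
    Fx = np.atleast_2d(dfdx(x_hat, a_prev))
    Fa = np.atleast_2d(dfda(x_hat, a_prev))
    Q = Fx @ Lam @ Fx.T + Fa @ S_prev @ Fa.T      # covariance of f(x_hat, a_prev)
    d = fv @ np.linalg.solve(Q, fv)               # f^t Q^-1 f
    q = np.linalg.matrix_rank(Q)                  # degrees of freedom
    threshold = chi2.ppf(confidence, df=q)        # e.g. 3.84 for q = 1 at 95%
    return d, d < threshold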

7 Example
In this section we go through a fairly simple example to illustrate the techniques
discussed above. For a more complicated problem, namely the estimation of
rotations, see [4].
Consider the equation of a circle in the $(p, q)$ plane centred on $(\alpha, \beta)$ with
radius $\rho$:

$$(p - \alpha)^2 + (q - \beta)^2 = \rho^2$$

Suppose we have an instance of such a circle with the true (but unknown)
parameters:

$$\begin{bmatrix} \alpha \\ \beta \\ \rho \end{bmatrix} = \begin{bmatrix} 3.0 \\ 3.0 \\ 2.0 \end{bmatrix}$$
upon which (our imperfect sensors tell us) lie the points:

$$\hat{\mathbf{x}}_1 = \begin{bmatrix} 3.02 \\ 1.01 \end{bmatrix}, \qquad \hat{\mathbf{x}}_2 = \begin{bmatrix} 1.01 \\ 2.98 \end{bmatrix}, \qquad \hat{\mathbf{x}}_3 = \begin{bmatrix} 2.99 \\ 4.97 \end{bmatrix}$$

all with the same error:

$$\Lambda_i = \begin{bmatrix} 0.0001 & 0.0 \\ 0.0 & 0.0004 \end{bmatrix}$$

and that the prior information is:

$$\hat{\mathbf{a}}_0 = \begin{bmatrix} 3.5 \\ 2.5 \\ 1.5 \end{bmatrix}, \qquad \mathbf{S}_0 = \begin{bmatrix} 1.0 & 0.0 & 0.0 \\ 0.0 & 1.0 & 0.0 \\ 0.0 & 0.0 & 1.0 \end{bmatrix}$$
What then is the best estimate for the parameters of the circle? If we make
the parameters of the circle the state ($\mathbf{a} = [\alpha\ \beta\ \rho]^t$) and make the sensed points
the observations ($\mathbf{x}_i = [p_i\ q_i]^t$), we can use an extended Kalman filter. The
measurement equation is:

$$f_i(\mathbf{x}_i, \mathbf{a}) = (p_i - \alpha)^2 + (q_i - \beta)^2 - \rho^2 = 0$$

and its derivatives are:

$$\frac{\partial f_i}{\partial \mathbf{x}_i} = \begin{bmatrix} \frac{\partial f_i}{\partial p_i} & \frac{\partial f_i}{\partial q_i} \end{bmatrix} = [2(p_i - \alpha) \quad 2(q_i - \beta)]$$

$$\frac{\partial f_i}{\partial \mathbf{a}} = \begin{bmatrix} \frac{\partial f_i}{\partial \alpha} & \frac{\partial f_i}{\partial \beta} & \frac{\partial f_i}{\partial \rho} \end{bmatrix} = [2(\alpha - p_i) \quad 2(\beta - q_i) \quad -2\rho]$$

These expressions for the measurement equation and its derivatives, the
points $\hat{\mathbf{x}}_i$, $i = 1, 2, 3$, their covariance $\Lambda_i$ and the prior state information
$(\hat{\mathbf{a}}_0, \mathbf{S}_0)$ were all fed to an iterated extended Kalman filter. The following results
were obtained:

$$i = 0: \quad \hat{\mathbf{a}}_0 = \begin{bmatrix} 3.5 \\ 2.5 \\ 1.5 \end{bmatrix}, \quad \mathbf{S}_0 = \begin{bmatrix} 1.0 & 0.0 & 0.0 \\ 0.0 & 1.0 & 0.0 \\ 0.0 & 0.0 & 1.0 \end{bmatrix}$$

$$i = 1: \quad \hat{\mathbf{a}}_1 = \begin{bmatrix} 2.90 \\ 2.63 \\ 1.86 \end{bmatrix}, \quad \mathbf{S}_1 = \begin{bmatrix} 2.9\times 10^{-1} & 1.5\times 10^{-1} & 4.3\times 10^{-1} \\ 1.5\times 10^{-1} & 9.7\times 10^{-1} & 8.9\times 10^{-2} \\ 4.3\times 10^{-1} & 8.9\times 10^{-2} & 7.4\times 10^{-1} \end{bmatrix}$$

$$i = 2: \quad \hat{\mathbf{a}}_2 = \begin{bmatrix} 2.85 \\ 2.75 \\ 1.75 \end{bmatrix}, \quad \mathbf{S}_2 = \begin{bmatrix} 2.3\times 10^{-1} & 3.2\times 10^{-1} & 2.7\times 10^{-1} \\ 3.2\times 10^{-1} & 4.5\times 10^{-1} & 3.8\times 10^{-1} \\ 2.7\times 10^{-1} & 3.8\times 10^{-1} & 3.2\times 10^{-1} \end{bmatrix}$$

$$i = 3: \quad \hat{\mathbf{a}}_3 = \begin{bmatrix} 3.03 \\ 3.00 \\ 1.95 \end{bmatrix}, \quad \mathbf{S}_3 = \begin{bmatrix} 2.0\times 10^{-4} & 6.1\times 10^{-5} & 1.2\times 10^{-4} \\ 6.1\times 10^{-5} & 2.1\times 10^{-4} & 3.3\times 10^{-5} \\ 1.2\times 10^{-4} & 3.3\times 10^{-5} & 2.0\times 10^{-4} \end{bmatrix}$$
The first thing to notice is that $\hat{\mathbf{a}}_3$ is much closer to the value of $\mathbf{a} = [3\ 3\ 2]^t$
than $\hat{\mathbf{a}}_0$ and that the final covariance is small, indicating that the filter is confident
about this solution. Notice also how the covariance remains relatively high
until after the third observation has been filtered: a consequence of the fact
that it takes at least three points to fix a circle.
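Readers who want to reproduce the gist of this example can feed the same data
through the iekf_update sketch from section 5 (my code, not the original
experiment; the exact figures obtained will depend on the number of iterations
and the stopping tolerance):

import numpy as np

def f(x, a):                      # circle measurement equation, f(x, a) = 0
    p, q = x
    alpha, beta, rho = a
    return (p - alpha)**2 + (q - beta)**2 - rho**2

def dfdx(x, a):                   # derivative with respect to the observation
    p, q = x
    alpha, beta, rho = a
    return np.array([[2*(p - alpha), 2*(q - beta)]])

def dfda(x, a):                   # derivative with respect to the state
    p, q = x
    alpha, beta, rho = a
    return np.array([[2*(alpha - p), 2*(beta - q), -2*rho]])

a_hat = np.array([3.5, 2.5, 1.5])              # prior state (alpha, beta, rho)
S = np.eye(3)                                  # prior covariance S_0
Lam = np.array([[0.0001, 0.0],
                [0.0, 0.0004]])                # observation covariance
points = [np.array([3.02, 1.01]),
          np.array([1.01, 2.98]),
          np.array([2.99, 4.97])]

for x_hat in points:                           # filter the observations in turn
    a_hat, S = iekf_update(a_hat, S, x_hat, Lam, f, dfdx, dfda)
print(a_hat)                                   # close to the true values [3, 3, 2]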

References
[1] N. Ayache and O.D. Faugeras. Maintaining representations of the environment
of a mobile robot. In Robotics Research 4, pages 337-350. MIT Press, USA, 1988.
[2] Y. Bar-Shalom and T.E. Fortmann. Tracking and Data Association. Academic
Press, UK, 1988.
[3] H.F. Durrant-Whyte. Multi-sensor data fusion for (semi-)autonomous robots.
Technical review document, Advanced Robotics Research Ltd., University Road,
Salford M5 4PP, UK, 1990.
[4] M.J.L. Orr. On estimating rotations. Working Paper 233, Department of
Artificial Intelligence, Edinburgh University, 1992.
[5] M.J.L. Orr, R.B. Fisher, and J. Hallam. Uncertain reasoning: Intervals versus
probabilities. In British Machine Vision Conference, pages 351-354.
Springer-Verlag, 1991.
[6] G. Pegman, 1990. Personal communication.
[7] J. Porrill. Fitting ellipses and predicting confidence using a bias corrected
Kalman filter. Image and Vision Computing, 8(1):37-41, 1990.
