
[lecture NOTES] Ramsey Faragher

Understanding the Basis of the Kalman Filter via a Simple and Intuitive Derivation

Digital Object Identifier 10.1109/MSP.2012.2203621
Date of publication: 20 August 2012
IEEE Signal Processing Magazine, September 2012, pp. 128-132

This article provides a simple and intuitive derivation of the Kalman filter, with the aim of teaching this useful tool to students from disciplines that do not require a strong mathematical background. The most complicated level of mathematics required to understand this derivation is the ability to multiply two Gaussian functions together and reduce the result to a compact form.

The Kalman filter is over 50 years old but is still one of the most important and common data fusion algorithms in use today. Named after Rudolf E. Kálmán, the great success of the Kalman filter is due to its small computational requirement, elegant recursive properties, and its status as the optimal estimator for one-dimensional linear systems with Gaussian error statistics [1]. Typical uses of the Kalman filter include smoothing noisy data and providing estimates of parameters of interest. Applications include global positioning system receivers, phase-locked loops in radio equipment, smoothing the output from laptop trackpads, and many more.

From a theoretical standpoint, the Kalman filter is an algorithm permitting exact inference in a linear dynamical system, which is a Bayesian model similar to a hidden Markov model but where the state space of the latent variables is continuous and where all latent and observed variables have a Gaussian distribution (often a multivariate Gaussian distribution). The aim of this lecture note is to permit people who find this description confusing or terrifying to understand the basis of the Kalman filter via a simple and intuitive derivation.

RELEVANCE
The Kalman filter [2] (and its variants such as the extended Kalman filter [3] and unscented Kalman filter [4]) is one of the most celebrated and popular data fusion algorithms in the field of information processing. The most famous early use of the Kalman filter was in the Apollo navigation computer that took Neil Armstrong to the moon and, most importantly, brought him back. Today, Kalman filters are at work in every satellite navigation device, every smartphone, and many computer games.

The Kalman filter is typically derived using vector algebra as a minimum mean squared estimator [5], an approach suitable for students confident in mathematics but not one that is easy to grasp for students in disciplines that do not require strong mathematics. The Kalman filter is derived here from first principles by considering a simple physical example that exploits a key property of the Gaussian distribution: the product of two Gaussian distributions is another Gaussian distribution.

PREREQUISITES
This article is not designed to be a thorough tutorial for a brand-new student of the Kalman filter; in the interests of being concise, it instead aims to provide tutors with a simple method of teaching the concepts of the Kalman filter to students who are not strong mathematicians. The reader is expected to be familiar with vector notation and with terminology associated with Kalman filtering, such as the state vector and the covariance matrix. This article is aimed at those who need to teach the Kalman filter to others in a simple and intuitive manner, and at those who already have some experience with the Kalman filter but may not fully understand its foundations. It is not intended to be a thorough, standalone education tool for the complete novice, as that would require a chapter, rather than a few pages, to convey.

PROBLEM STATEMENT
The Kalman filter model assumes that the state of a system at time t evolved from the prior state at time t-1 according to the equation

x_t = F_t x_{t-1} + B_t u_t + w_t,   (1)

where
■ x_t is the state vector containing the terms of interest for the system (e.g., position, velocity, heading) at time t
■ u_t is the vector containing any control inputs (steering angle, throttle setting, braking force)
■ F_t is the state transition matrix, which applies the effect of each system state parameter at time t-1 on the system state at time t (e.g., the position and velocity at time t-1 both affect the position at time t)
■ B_t is the control input matrix, which applies the effect of each control input parameter in the vector u_t on the state vector (e.g., applies the effect of the throttle setting on the system velocity and position)


■ w_t is the vector containing the process noise terms for each parameter in the state vector. The process noise is assumed to be drawn from a zero-mean multivariate normal distribution with covariance given by the covariance matrix Q_t.

Measurements of the system can also be performed, according to the model

z_t = H_t x_t + v_t,   (2)

where
■ z_t is the vector of measurements
■ H_t is the transformation matrix that maps the state vector parameters into the measurement domain
■ v_t is the vector containing the measurement noise terms for each observation in the measurement vector. Like the process noise, the measurement noise is assumed to be zero-mean Gaussian white noise with covariance R_t.

In the derivation that follows, we will consider a simple one-dimensional tracking problem, specifically that of a train moving along a railway line (see Figure 1). We can therefore consider some example vectors and matrices for this problem. The state vector x_t contains the position and velocity of the train

x_t = [x_t, ẋ_t]^T.

The train driver may apply a braking or accelerating input to the system, which we will consider here as a function of an applied force f_t and the mass of the train m. Such control information is stored within the control vector u_t

u_t = f_t / m.

The relationship between the force applied via the brake or throttle during the time period Δt (the time elapsed between time epochs t-1 and t) and the position and velocity of the train is given by the following equations:

x_t = x_{t-1} + ẋ_{t-1} Δt + f_t (Δt)^2 / (2m)
ẋ_t = ẋ_{t-1} + f_t Δt / m.

These linear equations can be written in matrix form as

[x_t]   [1  Δt] [x_{t-1}]   [(Δt)^2/2]
[ẋ_t] = [0   1] [ẋ_{t-1}] + [   Δt   ] (f_t / m).

And so, by comparison with (1), we can see for this example that

F_t = [1  Δt]        B_t = [(Δt)^2/2]
      [0   1]              [   Δt   ].
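The train model above can be written down directly in code and used to propagate the state one epoch via (1). This is a minimal sketch with illustrative values for the time step, train mass, and applied force (these numbers are assumptions, not from the article), and the process noise w_t is omitted:

```python
import numpy as np

dt = 1.0      # time step Delta-t between epochs (s); illustrative value
m = 1000.0    # train mass (kg); illustrative value
f_t = 100.0   # applied throttle force (N); illustrative value

# State transition and control-input matrices from the train example
F = np.array([[1.0, dt],
              [0.0, 1.0]])
B = np.array([dt**2 / 2.0,
              dt])

x = np.array([0.0, 5.0])   # initial state: position 0 m, velocity 5 m/s
u = f_t / m                # control input u_t = f_t / m

# One application of (1), with the process noise term w_t omitted
x_new = F @ x + B * u
print(x_new)   # -> [5.05 5.1 ]: new position (m) and velocity (m/s)
```

The position gains ẋΔt plus the ½(f/m)Δt² contribution of the acceleration, and the velocity gains (f/m)Δt, exactly as in the scalar kinematic equations above.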
The true state of the system x_t cannot be directly observed, and the Kalman filter provides an algorithm to determine an estimate x̂_t by combining models of the system and noisy measurements of certain parameters or linear functions of parameters. The estimates of the parameters of interest in the state vector are therefore now provided by probability density functions (pdfs), rather than discrete values. The Kalman filter is based on Gaussian pdfs, as will become clear in the derivation outlined in the "Solutions" section below. To fully describe the Gaussian functions, we need to know their variances and covariances, and these are stored in the covariance matrix P_t. The terms along the main diagonal of P_t are the variances associated with the corresponding terms in the state vector. The off-diagonal terms of P_t provide the covariances between terms in the state vector. In the case of a well-modeled, one-dimensional linear system with measurement errors drawn from a zero-mean Gaussian distribution, the Kalman filter has been shown to be the optimal estimator [1]. In the remainder of this article, we will derive the Kalman filter equations that allow us to recursively calculate x̂_t by combining prior knowledge, predictions from system models, and noisy measurements.

The Kalman filter algorithm involves two stages: prediction and measurement update. The standard Kalman filter equations for the prediction stage are

x̂_{t|t-1} = F_t x̂_{t-1|t-1} + B_t u_t   (3)
P_{t|t-1} = F_t P_{t-1|t-1} F_t^T + Q_t,   (4)

where Q_t is the process noise covariance matrix associated with noisy control inputs. Equation (3) was derived explicitly in the discussion above.

[FIG1] This figure shows the one-dimensional system under consideration: the train's position r along the track, measured from the origin 0, together with a noisy measurement pdf and a prediction (estimate) pdf.
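The prediction stage (3) and (4) is only a few lines of linear algebra. Here is a minimal sketch reusing the train's F_t and B_t, with illustrative (assumed) values for the prior covariance and the process noise:

```python
import numpy as np

def predict(x_est, P_est, F, B, u, Q):
    """Prediction stage of the Kalman filter: (3) and (4)."""
    x_pred = F @ x_est + B * u        # (3): propagate the state estimate
    P_pred = F @ P_est @ F.T + Q      # (4): propagate the covariance
    return x_pred, P_pred

dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([dt**2 / 2.0, dt])
Q = 0.01 * np.eye(2)           # illustrative process noise covariance
x = np.array([0.0, 5.0])       # position (m) and velocity (m/s)
P = np.diag([1.0, 0.25])       # illustrative initial uncertainty

x_pred, P_pred = predict(x, P, F, B, u=0.0, Q=Q)
print(x_pred)   # -> [5. 5.]
print(P_pred)   # variances have grown: certainty decreases during prediction
```

Note that the diagonal of P grows through (4): predicting forward in time can only add uncertainty, which is exactly the behavior described around Figure 3 below.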


[FIG2] The initial knowledge of the system at time t = 0. The red Gaussian distribution represents the pdf providing the initial confidence in the estimate of the position of the train. The arrow pointing to the right represents the known initial velocity of the train.

We can derive (4) as follows. The variance associated with the prediction x̂_{t|t-1} of an unknown true value x_t is given by

P_{t|t-1} = E[(x_t - x̂_{t|t-1})(x_t - x̂_{t|t-1})^T],

and taking the difference between (3) and (1) gives

x_t - x̂_{t|t-1} = F(x_{t-1} - x̂_{t-1|t-1}) + w_t
⇒ P_{t|t-1} = E[(F(x_{t-1} - x̂_{t-1|t-1}) + w_t)(F(x_{t-1} - x̂_{t-1|t-1}) + w_t)^T]
            = F E[(x_{t-1} - x̂_{t-1|t-1})(x_{t-1} - x̂_{t-1|t-1})^T] F^T
              + F E[(x_{t-1} - x̂_{t-1|t-1}) w_t^T]
              + E[w_t (x_{t-1} - x̂_{t-1|t-1})^T] F^T
              + E[w_t w_t^T].

Noting that the state estimation errors and the process noise are uncorrelated,

E[(x_{t-1} - x̂_{t-1|t-1}) w_t^T] = E[w_t (x_{t-1} - x̂_{t-1|t-1})^T] = 0
⇒ P_{t|t-1} = F E[(x_{t-1} - x̂_{t-1|t-1})(x_{t-1} - x̂_{t-1|t-1})^T] F^T + E[w_t w_t^T]
⇒ P_{t|t-1} = F P_{t-1|t-1} F^T + Q_t.

The measurement update equations are given by

x̂_{t|t} = x̂_{t|t-1} + K_t (z_t - H_t x̂_{t|t-1})   (5)
P_{t|t} = P_{t|t-1} - K_t H_t P_{t|t-1},   (6)

where

K_t = P_{t|t-1} H_t^T (H_t P_{t|t-1} H_t^T + R_t)^{-1}   (7)

is the Kalman gain. In the remainder of this article, we will derive the measurement update equations (5)-(7) from first principles.

SOLUTIONS
The Kalman filter will be derived here by considering a simple one-dimensional tracking problem, specifically that of a train moving along a railway line. At every measurement epoch we wish to know the best possible estimate of the location of the train (or, more precisely, the location of the radio antenna mounted on the train roof). Information is available from two sources: 1) predictions based on the last known position and velocity of the train and 2) measurements from a radio ranging system deployed at the track side. The information from the predictions and the measurements is combined to provide the best possible estimate of the location of the train. The system is shown graphically in Figure 1.

The initial state of the system (at time t = 0 s) is known to a reasonable accuracy, as shown in Figure 2. The location of the train is given by a Gaussian pdf. At the next time epoch (t = 1 s), we can estimate the new position of the train, based on known limitations such as its position and velocity at t = 0, its maximum possible acceleration and deceleration, etc. In practice, we may also have some knowledge of the control inputs applied to the brake or accelerator by the driver. In any case, we have a prediction of the new position of the train, represented in Figure 3 by a new Gaussian pdf with a new mean and variance. Mathematically, this step is represented by (1). The variance has increased [see (4)], representing our reduced certainty in the accuracy of our position estimate compared to t = 0, due to the uncertainty associated with any process noise from accelerations or decelerations undertaken between time t = 0 and time t = 1.

[FIG3] Here, the prediction of the location of the train at time t = 1 and the level of uncertainty in that prediction are shown. The confidence in the knowledge of the position of the train has decreased, as we are not certain whether the train has undergone any accelerations or decelerations in the intervening period from t = 0 to t = 1.
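The measurement update equations (5)-(7) stated above can be exercised numerically before they are derived. This is a minimal sketch for the train example with a position-only ranging measurement, so H_t = [1 0]; all numerical values are illustrative assumptions:

```python
import numpy as np

def update(x_pred, P_pred, z, H, R):
    """Measurement-update stage of the Kalman filter: (5)-(7)."""
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # (7): Kalman gain
    x_est = x_pred + K @ (z - H @ x_pred)    # (5): fuse prediction and measurement
    P_est = P_pred - K @ H @ P_pred          # (6): uncertainty shrinks
    return x_est, P_est

# Illustrative values: predicted state, its covariance, and one noisy range
x_pred = np.array([5.0, 5.0])
P_pred = np.array([[1.25, 0.25], [0.25, 0.25]])
H = np.array([[1.0, 0.0]])     # measure position only
R = np.array([[4.0]])          # illustrative measurement noise variance
z = np.array([6.0])            # noisy position measurement (m)

x_est, P_est = update(x_pred, P_pred, z, H, R)
print(x_est)   # pulled part of the way toward the measurement
print(P_est)   # position variance smaller than before the update
```

Because the measurement here is much noisier than the prediction (R larger than the position variance in P), the gain is small and the estimate moves only a fraction of the way toward z, which is exactly the weighting behavior derived in the "Solutions" section.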


[FIG4] The measurement of the location of the train at time t = 1 and the level of uncertainty in that noisy measurement, represented by the blue Gaussian pdf. The combined knowledge of this system is provided by multiplying these two pdfs together.

At t = 1, we also make a measurement of the location of the train using the radio positioning system, and this is represented by the blue Gaussian pdf in Figure 4. The best estimate we can make of the location of the train is provided by combining our knowledge from the prediction and the measurement. This is achieved by multiplying the two corresponding pdfs together, and the result is represented by the green pdf in Figure 5.

[FIG5] The new pdf (green) generated by multiplying the pdfs associated with the prediction and measurement of the train's location at time t = 1. This new pdf provides the best estimate of the location of the train, by fusing the data from the prediction and the measurement.

A key property of the Gaussian function is exploited at this point: the product of two Gaussian functions is another Gaussian function. This is critical, as it permits an endless number of Gaussian pdfs to be multiplied over time without the resulting function increasing in complexity or number of terms; after each time epoch, the new pdf is fully represented by a Gaussian function. This is the key to the elegant recursive properties of the Kalman filter.
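The product property described above can be checked numerically: multiplying two Gaussian pdfs pointwise and renormalizing yields exactly another Gaussian, with mean (μ_1σ_2^2 + μ_2σ_1^2)/(σ_1^2 + σ_2^2) and variance σ_1^2σ_2^2/(σ_1^2 + σ_2^2). A minimal sketch, where the two means and variances are illustrative assumptions:

```python
import math

def gauss(r, mu, var):
    """Gaussian pdf with mean mu and variance var, evaluated at r."""
    return math.exp(-(r - mu)**2 / (2 * var)) / math.sqrt(2 * math.pi * var)

mu1, var1 = 2.0, 1.0   # prediction pdf (illustrative numbers)
mu2, var2 = 4.0, 0.5   # measurement pdf (illustrative numbers)

# Closed-form parameters of the (renormalized) product
mu_f = (mu1 * var2 + mu2 * var1) / (var1 + var2)
var_f = var1 * var2 / (var1 + var2)

# Evaluate the pointwise product on a fine grid and renormalize it
h = 0.001
xs = [i * h for i in range(-4000, 10000)]
prod = [gauss(r, mu1, var1) * gauss(r, mu2, var2) for r in xs]
area = sum(prod) * h
prod = [p / area for p in prod]

# Compare with the single Gaussian predicted by the closed form
err = max(abs(p - gauss(r, mu_f, var_f)) for p, r in zip(prod, xs))
print(err)   # tiny: the renormalized product is itself Gaussian
```

The maximum pointwise discrepancy is at the level of the numerical integration error, illustrating that no non-Gaussian residue is created by the multiplication.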
The stages described above in the figures are now considered again mathematically, to derive the Kalman filter measurement update equations.

The prediction pdf, represented by the red Gaussian function in Figure 3, is given by

y_1(r; μ_1, σ_1) ≜ (1/√(2πσ_1^2)) e^{-(r - μ_1)^2 / (2σ_1^2)}.   (8)

The measurement pdf, represented by the blue Gaussian function in Figure 4, is given by

y_2(r; μ_2, σ_2) ≜ (1/√(2πσ_2^2)) e^{-(r - μ_2)^2 / (2σ_2^2)}.   (9)

The information provided by these two pdfs is fused by multiplying the two together, i.e., by considering the prediction and the measurement together (see Figure 5). The new pdf representing the fusion of the information from the prediction and the measurement, and hence our best current estimate of the system, is therefore given by the product of these two Gaussian functions

y_fused(r; μ_1, σ_1, μ_2, σ_2) = (1/√(2πσ_1^2)) e^{-(r - μ_1)^2 / (2σ_1^2)} × (1/√(2πσ_2^2)) e^{-(r - μ_2)^2 / (2σ_2^2)}
  = (1/(2πσ_1σ_2)) e^{-((r - μ_1)^2 / (2σ_1^2) + (r - μ_2)^2 / (2σ_2^2))}.   (10)

The quadratic terms in this new function can be expanded, and the whole expression rewritten, up to a normalizing constant that does not depend on r, in Gaussian form

y_fused(r; μ_fused, σ_fused) = (1/√(2πσ_fused^2)) e^{-(r - μ_fused)^2 / (2σ_fused^2)},   (11)

where

μ_fused = (μ_1σ_2^2 + μ_2σ_1^2)/(σ_1^2 + σ_2^2) = μ_1 + σ_1^2(μ_2 - μ_1)/(σ_1^2 + σ_2^2)   (12)

and

σ_fused^2 = σ_1^2σ_2^2/(σ_1^2 + σ_2^2) = σ_1^2 - σ_1^4/(σ_1^2 + σ_2^2).   (13)

These last two equations represent the measurement update steps of the Kalman filter algorithm, as will be shown explicitly below. However, to present a more general case, we need to consider an extension to this example. In the example above, it was assumed that the predictions and measurements were made in the same coordinate frame and in the same units. This has resulted in a particularly concise pair of equations representing the prediction and measurement update stages.
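The fusion formulas (12) and (13) are a two-line function. A minimal scalar sketch with illustrative numbers:

```python
def fuse(mu1, var1, mu2, var2):
    """Fuse two scalar Gaussian estimates; implements (12) and (13)."""
    mu_fused = mu1 + var1 * (mu2 - mu1) / (var1 + var2)    # (12)
    var_fused = var1 - var1**2 / (var1 + var2)             # (13)
    return mu_fused, var_fused

# With equal confidence in prediction and measurement, the fused mean
# lands halfway between the two and the variance halves.
print(fuse(10.0, 4.0, 12.0, 4.0))   # -> (11.0, 2.0)
```

Note that the fused variance is always smaller than either input variance: combining two independent pieces of evidence can only increase confidence.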


It is important to note, however, that in reality a function is usually required to map predictions and measurements into the same domain. In a more realistic extension of our example, the position of the train will be predicted directly as a distance along the railway line in units of meters, but the time-of-flight measurements are recorded in units of seconds. To allow the prediction and measurement pdfs to be multiplied together, one must be converted into the domain of the other, and it is standard practice to map the predictions into the measurement domain via the transformation matrix H_t.

We now revisit (8) and (9) and, instead of allowing y_1 and y_2 to both represent values in meters along the railway track, we consider the distribution y_2 to represent the time of flight in seconds for a radio signal propagating from a transmitter positioned at x = 0 to the antenna on the train. The spatial prediction pdf y_1 is converted into the measurement domain by scaling the function by c, the speed of light. Equations (8) and (9) therefore must be rewritten as

y_1(s; μ_1, σ_1, c) ≜ (1/√(2π(σ_1/c)^2)) e^{-(s - μ_1/c)^2 / (2(σ_1/c)^2)}   (14)

and

y_2(s; μ_2, σ_2) ≜ (1/√(2πσ_2^2)) e^{-(s - μ_2)^2 / (2σ_2^2)},   (15)

where both distributions are now defined in the measurement domain: radio signals propagate along the time "s" axis, and the measurement unit is the second.

Following the derivation as before, we now find

μ_fused/c = μ_1/c + (σ_1/c)^2 (μ_2 - μ_1/c) / ((σ_1/c)^2 + σ_2^2)

⇒ μ_fused = μ_1 + ((σ_1/c)^2 / ((σ_1/c)^2 + σ_2^2)) · c(μ_2 - μ_1/c).   (16)

Substituting H = 1/c and K = Hσ_1^2 / (H^2σ_1^2 + σ_2^2) results in

μ_fused = μ_1 + K(μ_2 - Hμ_1).   (17)

Similarly, the fused variance estimate becomes

σ_fused^2 = σ_1^2 - ((σ_1/c)^4 / ((σ_1/c)^2 + σ_2^2)) c^2
⇒ σ_fused^2 = σ_1^2 - ((σ_1/c)^2 / ((σ_1/c)^2 + σ_2^2)) σ_1^2
            = σ_1^2 - KHσ_1^2.   (18)
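Equations (17) and (18) can be written as a small scalar update routine; with H = 1 (prediction and measurement in the same units) it reduces exactly to the simple fusion of (12) and (13). A minimal sketch, with illustrative values:

```python
def scalar_update(mu1, var1, mu2, var2, H):
    """Scalar measurement update with unit mapping H; implements (17) and (18)."""
    K = H * var1 / (H**2 * var1 + var2)    # scalar Kalman gain
    mu_fused = mu1 + K * (mu2 - H * mu1)   # (17)
    var_fused = var1 - K * H * var1        # (18)
    return mu_fused, var_fused

# With H = 1 the general update reproduces the same-units fusion result.
print(scalar_update(10.0, 4.0, 12.0, 4.0, 1.0))   # -> (11.0, 2.0)
```

In the train example H would be 1/c, mapping a predicted position in meters onto the time-of-flight axis in seconds before fusing it with the ranging measurement.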
We can now compare certain terms resulting from this scalar derivation with the standard vectors and matrices used in the Kalman filter algorithm:
■ μ_fused → x̂_{t|t}: the state vector following data fusion
■ μ_1 → x̂_{t|t-1}: the state vector before data fusion, i.e., the prediction
■ σ_fused^2 → P_{t|t}: the covariance matrix (confidence) following data fusion
■ σ_1^2 → P_{t|t-1}: the covariance matrix (confidence) before data fusion
■ μ_2 → z_t: the measurement vector
■ σ_2^2 → R_t: the uncertainty matrix associated with a noisy set of measurements
■ H → H_t: the transformation matrix used to map state vector parameters into the measurement domain
■ K = Hσ_1^2 / (H^2σ_1^2 + σ_2^2) → K_t = P_{t|t-1} H_t^T (H_t P_{t|t-1} H_t^T + R_t)^{-1}: the Kalman gain.

It is now easy to see how the standard Kalman filter equations relate to (17) and (18) derived above:

μ_fused = μ_1 + (Hσ_1^2 / (H^2σ_1^2 + σ_2^2)) (μ_2 - Hμ_1)  →  x̂_{t|t} = x̂_{t|t-1} + K_t (z_t - H_t x̂_{t|t-1})

σ_fused^2 = σ_1^2 - (Hσ_1^2 / (H^2σ_1^2 + σ_2^2)) Hσ_1^2  →  P_{t|t} = P_{t|t-1} - K_t H_t P_{t|t-1}.

CONCLUSIONS
The Kalman filter can be taught using a simple derivation involving scalar mathematics, basic algebraic manipulations, and an easy-to-follow thought experiment. This approach should permit students lacking a strong mathematical background to understand the core mathematics underlying the Kalman filter in an intuitive manner, and to understand that the recursive properties of the filter are provided by the unique multiplicative property of the Gaussian function.

AUTHOR
Ramsey Faragher (ramsey@cantab.net) is a principal scientist at the BAE Systems Advanced Technology Centre, United Kingdom.

REFERENCES
[1] B. D. O. Anderson and J. B. Moore, Optimal Filtering. New York: Dover, 2005.
[2] R. E. Kalman, "A new approach to linear filtering and prediction problems," J. Basic Eng., vol. 82, no. 1, pp. 35-45, Mar. 1960.
[3] P. D. Groves, Principles of GNSS, Inertial, and Multisensor Integrated Navigation Systems. Norwood, MA: Artech House, 2008.
[4] S. J. Julier and J. K. Uhlmann, "Unscented filtering and nonlinear estimation," Proc. IEEE, vol. 92, no. 3, pp. 401-422, 2004.
[5] J. Bibby and H. Toutenburg, Prediction and Improved Estimation in Linear Models. New York: Wiley, 1977.
