
4
Discrete Kalman Filter Basics
Modern filter theory began with N. Wiener’s work in the 1940s (1). His work was
based on minimizing the mean-square error, so this branch of filter theory is
sometimes referred to as least-squares filtering. This is an oversimplification
though, because a more exact description would be “linear time-domain minimum
mean-square error filtering.” This is a bit wordy, so the shortened version
will suffice for now. Regardless of what it is called, the central problem is simply a
matter of separation of the signal from an additive combination of signal and noise.
In hindsight, the Wiener solution turned out to be one of those subjects that was
much discussed in textbooks, but little used in practice. Perhaps Wiener’s main
contribution was the way in which he posed the problem in terms of minimizing
the mean-square error in the time domain. This is in contrast to the frequency-
separation methods that were in use at the time. However, in fairness to Wiener’s
work, the weighting function approach (which is central in the Wiener theory) still
has some merit. More is said of this in Section 6.8.
In 1960 R.E. Kalman considered the same problem that Wiener had dealt with
earlier, but in his 1960 paper he considered the noisy measurement to be a discrete
sequence in time in contrast to a continuous-time signal (2). He also posed the
problem in a state-space setting that accommodated the time-variable multiple-
input/multiple-output scenario nicely. Engineers, especially in the field of naviga-
tion, were quick to see the Kalman technique as a practical solution to some applied
filtering problems that were intractable using Wiener methods. Also, the rapid
advances in computer technology that occurred in the 1960s certainly contributed to
popularizing Kalman filtering as a practical means of separating signal from noise.
After some 50 years now, Kalman filtering is still alive and well, and new
applications keep appearing on the scene regularly.

4.1
A SIMPLE RECURSIVE EXAMPLE

When working with practical problems involving discrete data, it is important that our
methods be computationally feasible as well as mathematically correct. A simple
example will illustrate this. Consider the problem of estimating the mean of some
random constant based on a sequence of noisy measurements. That is, in filtering
terms the true mean is the “signal,” and the measurement error is the “noise.”


The resulting estimate is the filter output. Now think of processing the data on-line.
Let the measurement sequence be denoted as $z_1, z_2, \ldots, z_n$, where the subscript
denotes the time at which the measurement is taken. One method of processing
the data would be to store each measurement as it becomes available and then compute
the sample mean in accordance with the following algorithm (in words):
1. First measurement $z_1$: Store $z_1$ and estimate the mean as
$$ \hat{m}_1 = z_1 $$
2. Second measurement $z_2$: Store $z_2$ along with $z_1$ and estimate the mean as
$$ \hat{m}_2 = \frac{z_1 + z_2}{2} $$
3. Third measurement $z_3$: Store $z_3$ along with $z_1$ and $z_2$ and estimate the mean as
$$ \hat{m}_3 = \frac{z_1 + z_2 + z_3}{3} $$
4. And so forth.
Clearly, this would yield the correct sequence of sample means as the
experiment progresses. It should also be clear that the amount of memory needed
to store the measurements keeps increasing with time, and also the number of
arithmetic operations needed to form the estimate increases correspondingly. This
would lead to obvious problems when the total amount of data is large. Thus,
consider a simple variation in the computational procedure in which each new
estimate is formed as a blend of the old estimate and the current measurement. To be
specific, consider the following algorithm:
1. First measurement $z_1$: Compute the estimate as
$$ \hat{m}_1 = z_1 $$
Store $\hat{m}_1$ and discard $z_1$.
2. Second measurement $z_2$: Compute the estimate as a weighted sum of the
previous estimate $\hat{m}_1$ and the current measurement $z_2$:
$$ \hat{m}_2 = \tfrac{1}{2}\hat{m}_1 + \tfrac{1}{2}z_2 $$
Store $\hat{m}_2$ and discard $z_2$ and $\hat{m}_1$.
3. Third measurement $z_3$: Compute the estimate as a weighted sum of $\hat{m}_2$ and $z_3$:
$$ \hat{m}_3 = \tfrac{2}{3}\hat{m}_2 + \tfrac{1}{3}z_3 $$
Store $\hat{m}_3$ and discard $z_3$ and $\hat{m}_2$.
4. And so forth. It should be obvious that at the nth stage the weighted sum is
$$ \hat{m}_n = \left(\frac{n-1}{n}\right)\hat{m}_{n-1} + \left(\frac{1}{n}\right)z_n $$

Clearly, the above procedure yields the identical sequence of estimates as
before, but without the need to store all the previous measurements. We simply use
the result of the previous step to help obtain the estimate at the current step of the
process. In this way, the previous computational effort is used to good advantage
and not wasted. The second algorithm can proceed on ad infinitum without a
growing memory problem. Eventually, of course, as n becomes extremely large, a
round-off problem might be encountered. However, this is to be expected with
either of the two algorithms.
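
To see the recursive procedure in executable form, here is a minimal sketch in Python; the function name recursive_mean is ours, not from the text.

```python
def recursive_mean(measurements):
    """Recursively estimate the mean of a noisy constant.

    Only the previous estimate and the sample count are retained;
    no past measurements are stored.
    """
    m_hat = 0.0
    for n, z in enumerate(measurements, start=1):
        # m_hat_n = ((n - 1)/n) * m_hat_(n-1) + (1/n) * z_n
        m_hat = ((n - 1) / n) * m_hat + (1.0 / n) * z
    return m_hat

# Example: noisy measurements of a constant whose true value is 5
print(recursive_mean([5.2, 4.8, 5.1, 4.9]))  # -> 5.0
```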
The second algorithm is a simple example of a recursive mode of operation.
The key element in any recursive procedure is the use of the results of the previous
step to aid in obtaining the desired result for the current step. This is one of the main
features of Kalman filtering, and one that clearly distinguishes it from the
weighting-function (Wiener) approach.
In order to apply the recursive philosophy to estimation of a random process, it
is first necessary that both the process and the measurement noise be modeled in
vector form. This was discussed in Section 3.9, and we will proceed on that basis.

4.2
THE DISCRETE KALMAN FILTER

We will now proceed to develop the Kalman filter recursive equations. The optimiza-
tion criterion used here is minimization of the mean-square estimation error of the
random variable x. Then later in Section 4.7 we will show that this same linear estimate
also corresponds to the mean of x conditioned on the entire past measurement stream.
We begin by assuming the random process to be estimated can be modeled in
the form

$$ x_{k+1} = \phi_k x_k + w_k \tag{4.2.1} $$

The observation (measurement) of the process is assumed to occur at discrete points
in time in accordance with the linear relationship

$$ z_k = H_k x_k + v_k \tag{4.2.2} $$

and we assume that we know $\phi_k$, $H_k$, and the covariances describing $w_k$ and $v_k$.
Also, we will be using the same shortened notation here that was introduced earlier
in Section 3.9. It will be repeated here for easy reference:

$x_k$ = (n × 1) process state vector at time $t_k$

$\phi_k$ = (n × n) matrix relating $x_k$ to $x_{k+1}$ in the absence of a forcing function (if $x_k$ is
a sample of a continuous process, $\phi_k$ is the usual state transition matrix
$\phi(t_{k+1}, t_k)$)

$w_k$ = (n × 1) vector, assumed to be a white sequence with known covariance
structure. It is the input white noise contribution to the state vector for the
time interval $(t_{k+1}, t_k)$

$z_k$ = (m × 1) vector measurement at time $t_k$

$H_k$ = (m × n) matrix giving the ideal (noiseless) connection between the
measurement and the state vector at time $t_k$

$v_k$ = (m × 1) measurement error, assumed to be a white sequence with known
covariance structure and having zero crosscorrelation with the $w_k$ sequence

The covariance matrices for the $w_k$ and $v_k$ vectors are given by

$$ E\left[w_k w_i^T\right] = \begin{cases} Q_k, & i = k \\ 0, & i \ne k \end{cases} \tag{4.2.3} $$

$$ E\left[v_k v_i^T\right] = \begin{cases} R_k, & i = k \\ 0, & i \ne k \end{cases} \tag{4.2.4} $$

$$ E\left[w_k v_i^T\right] = 0, \quad \text{for all } k \text{ and } i \tag{4.2.5} $$
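
As a quick illustration of this model, the following Python sketch simulates a few steps of Eqs. (4.2.1) and (4.2.2). The two-state model and all numerical values here are hypothetical, chosen only for the demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)

phi = np.array([[1.0, 1.0],
                [0.0, 1.0]])   # state transition matrix phi_k
H = np.array([[1.0, 0.0]])     # measurement matrix H_k (m=1, n=2)
Q = 0.01 * np.eye(2)           # covariance Q_k of the white sequence w_k
R = np.array([[0.25]])         # covariance R_k of the measurement error v_k

x = np.zeros(2)                # process state x_k
for k in range(3):
    v = rng.multivariate_normal(np.zeros(1), R)   # measurement error v_k
    z = H @ x + v                                 # z_k = H_k x_k + v_k
    print(f"k={k}, z={z}")
    w = rng.multivariate_normal(np.zeros(2), Q)   # process noise w_k
    x = phi @ x + w                               # x_{k+1} = phi_k x_k + w_k
```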

We assume at this point that we have an initial estimate of the process at some
point in time $t_k$, and that this estimate is based on all our knowledge about the
process prior to $t_k$. This prior (or a priori) estimate will be denoted as $\hat{x}_k^-$ where the
“hat” denotes estimate, and the “super minus” is a reminder that this is our best
estimate prior to assimilating the measurement at $t_k$. (Note that super minus as used
here is not related in any way to the super minus notation used in spectral
factorization.) We also assume that we know the error covariance matrix associated
with $\hat{x}_k^-$. That is, we define the estimation error to be

$$ e_k^- = x_k - \hat{x}_k^- \tag{4.2.6} $$

and the associated error covariance matrix is*

$$ P_k^- = E\left[e_k^- e_k^{-T}\right] = E\left[\left(x_k - \hat{x}_k^-\right)\left(x_k - \hat{x}_k^-\right)^T\right] \tag{4.2.7} $$

In many cases, we begin the estimation problem with no prior measurements. Thus,
in this case, if the process mean is zero, the initial estimate is zero, and the
associated error covariance matrix is just the covariance matrix of x itself.
With the assumption of a prior estimate $\hat{x}_k^-$, we now seek to use the measure-
ment $z_k$ to improve the prior estimate. We choose a linear blending of the noisy
measurement and the prior estimate in accordance with the equation

$$ \hat{x}_k = \hat{x}_k^- + K_k\left(z_k - H_k \hat{x}_k^-\right) \tag{4.2.8} $$

where

$\hat{x}_k$ = updated estimate
$K_k$ = blending factor (yet to be determined)

The justification of the special form of Eq. (4.2.8) will be deferred until Section 4.7.
The problem now is to find the particular blending factor $K_k$ that yields an updated
estimate that is optimal in some sense. Just as in the Wiener solution, we use
minimum mean-square error as the performance criterion. Toward this end, we first
form the expression for the error covariance matrix associated with the updated
(a posteriori) estimate.
   
$$ P_k = E\left[e_k e_k^T\right] = E\left[\left(x_k - \hat{x}_k\right)\left(x_k - \hat{x}_k\right)^T\right] \tag{4.2.9} $$

 T 
* We tacitly assume here that the estimation error has zero mean, and thus, it is proper to refer to E e
k ek as a
covariance matrix. It is also, of course, a moment matrix, but it is usually not referred to as such.

Next, we substitute Eq. (4.2.2) into Eq. (4.2.8) and then substitute the resulting
expression for $\hat{x}_k$ into Eq. (4.2.9). The result is
   
$$ P_k = E\left\{\left[\left(x_k - \hat{x}_k^-\right) - K_k\left(H_k x_k + v_k - H_k \hat{x}_k^-\right)\right]\left[\left(x_k - \hat{x}_k^-\right) - K_k\left(H_k x_k + v_k - H_k \hat{x}_k^-\right)\right]^T\right\} \tag{4.2.10} $$

 
Now, performing the indicated expectation and noting that $\left(x_k - \hat{x}_k^-\right)$ is the a priori
estimation error, which is uncorrelated with the current measurement error $v_k$, we have

$$ P_k = \left(I - K_k H_k\right) P_k^- \left(I - K_k H_k\right)^T + K_k R_k K_k^T \tag{4.2.11} $$

Notice here that Eq. (4.2.11) is a perfectly general expression for the updated error
covariance matrix, and it applies for any gain $K_k$, suboptimal or otherwise.
Returning to the optimization problem, we wish to find the particular $K_k$ that
minimizes the individual terms along the major diagonal of $P_k$, because these terms
represent the estimation error variances for the elements of the state vector being
estimated. The optimization can be done in a number of ways. We will do this using
a straightforward differential calculus approach, and to do so we need two matrix
differentiation formulas. They are

$$ \frac{d\left[\text{trace}(AB)\right]}{dA} = B^T \quad (AB \text{ must be square}) \tag{4.2.12} $$

$$ \frac{d\left[\text{trace}\left(ACA^T\right)\right]}{dA} = 2AC \quad (C \text{ must be symmetric}) \tag{4.2.13} $$

where the derivative of a scalar with respect to a matrix is defined as

$$ \frac{ds}{dA} = \begin{bmatrix} \dfrac{ds}{da_{11}} & \dfrac{ds}{da_{12}} & \cdots \\[1ex] \dfrac{ds}{da_{21}} & \dfrac{ds}{da_{22}} & \cdots \\[1ex] \vdots & & \end{bmatrix} \tag{4.2.14} $$

The two matrix differentiation formulas can be easily verified by writing out the
indicated traces explicitly and then differentiating the results term by term. (This
will be left as an exercise.)
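
For readers who prefer a numerical spot-check to the pencil-and-paper verification, the following sketch (our own, using NumPy) compares both formulas against finite-difference derivatives of the traces:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))   # AB is square
C = rng.standard_normal((3, 3))
C = C + C.T                       # C must be symmetric

def matrix_gradient(f, A, eps=1e-6):
    """Finite-difference derivative of the scalar f(A) with respect to
    the matrix A, element by element, per Eq. (4.2.14)."""
    G = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            dA = np.zeros_like(A)
            dA[i, j] = eps
            G[i, j] = (f(A + dA) - f(A - dA)) / (2 * eps)
    return G

# Eq. (4.2.12): d[trace(AB)]/dA = B^T
print(np.allclose(matrix_gradient(lambda M: np.trace(M @ B), A), B.T, atol=1e-5))
# Eq. (4.2.13): d[trace(A C A^T)]/dA = 2AC
print(np.allclose(matrix_gradient(lambda M: np.trace(M @ C @ M.T), A), 2 * A @ C, atol=1e-4))
```

Both checks print True.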
We will now expand the general form for $P_k$, Eq. (4.2.11), and rewrite it in the
form:

$$ P_k = P_k^- - K_k H_k P_k^- - P_k^- H_k^T K_k^T + K_k\left(H_k P_k^- H_k^T + R_k\right) K_k^T \tag{4.2.15} $$

Notice that the second and third terms are linear in $K_k$ and that the fourth term is
quadratic in $K_k$. The two matrix differentiation formulas may now be applied to
Eq. (4.2.15). We wish to minimize the trace of $P_k$ because it is the sum of the mean-
square errors in the estimates of all the elements of the state vector. We can use
the argument here that the individual mean-square errors are also minimized when
the total is minimized, provided that we have enough degrees of freedom in the
variation of $K_k$, which we do in this case. We proceed now to differentiate the trace
of $P_k$ with respect to $K_k$, and we note that the trace of $P_k^- H_k^T K_k^T$ is equal to the trace
of its transpose $K_k H_k P_k^-$. The result is

$$ \frac{d\left(\text{trace } P_k\right)}{dK_k} = -2\left(H_k P_k^-\right)^T + 2K_k\left(H_k P_k^- H_k^T + R_k\right) \tag{4.2.16} $$

We now set the derivative equal to zero and solve for the optimal gain. The result is

$$ K_k = P_k^- H_k^T \left(H_k P_k^- H_k^T + R_k\right)^{-1} \tag{4.2.17} $$

This particular $K_k$, namely, the one that minimizes the mean-square estimation
error, is called the Kalman gain.
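
For intuition, note that in the scalar case with $H_k = 1$, Eq. (4.2.17) reduces to

$$ K_k = \frac{P_k^-}{P_k^- + R_k} $$

When the measurement noise variance $R_k$ is small relative to the prior uncertainty $P_k^-$, the gain approaches unity and the update of Eq. (4.2.8) leans heavily on the measurement; when $R_k$ is large, the gain approaches zero and the prior estimate is left nearly unchanged.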
The covariance matrix associated with the optimal estimate may now be
computed. Referring to Eq. (4.2.11), we have

$$ \begin{aligned} P_k &= \left(I - K_k H_k\right) P_k^- \left(I - K_k H_k\right)^T + K_k R_k K_k^T && (4.2.18) \\ &= P_k^- - K_k H_k P_k^- - P_k^- H_k^T K_k^T + K_k\left(H_k P_k^- H_k^T + R_k\right) K_k^T && (4.2.19) \end{aligned} $$

Routine substitution of the optimal gain expression, Eq. (4.2.17), into Eq. (4.2.19)
leads to

$$ P_k = P_k^- - P_k^- H_k^T \left(H_k P_k^- H_k^T + R_k\right)^{-1} H_k P_k^- \tag{4.2.20} $$

or

$$ P_k = P_k^- - K_k\left(H_k P_k^- H_k^T + R_k\right) K_k^T \tag{4.2.21} $$

or

$$ P_k = \left(I - K_k H_k\right) P_k^- \tag{4.2.22} $$

Note that we have four expressions for computing the updated $P_k$ from the prior
$P_k^-$. Three of these, Eqs. (4.2.20), (4.2.21), and (4.2.22), are valid only for the
optimal gain condition. However, Eq. (4.2.18) is valid for any gain, optimal or
suboptimal. All four equations yield identical results for the optimal gain with perfect
arithmetic. We note, though, that in the real engineering world Kalman filtering is a
numerical procedure, and some of the P-update equations may perform better
numerically than others under unusual conditions. More will be said of this later in
Section 4.9. For now, we will list the simplest update equation, that is, Eq. (4.2.22),
as the usual way to update the error covariance.
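
A brief numerical sketch of this point (our own, with arbitrarily chosen matrices): when the optimal gain of Eq. (4.2.17) is used, the general form of Eq. (4.2.18) and the short form of Eq. (4.2.22) agree to machine precision.

```python
import numpy as np

P_minus = np.array([[2.0, 0.5],
                    [0.5, 1.0]])   # prior error covariance P_k^-
H = np.array([[1.0, 0.0]])         # measurement matrix (m=1, n=2)
R = np.array([[0.25]])             # measurement noise covariance
I = np.eye(2)

# Kalman gain, Eq. (4.2.17)
K = P_minus @ H.T @ np.linalg.inv(H @ P_minus @ H.T + R)

# General form, Eq. (4.2.18), valid for any gain
P_general = (I - K @ H) @ P_minus @ (I - K @ H).T + K @ R @ K.T

# Short form, Eq. (4.2.22), valid only for the optimal gain
P_short = (I - K @ H) @ P_minus

print(np.allclose(P_general, P_short))  # True
```

The general form is a sum of two symmetric, nonnegative-definite terms, which is one reason it tends to behave better numerically than the short form.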
We now have a means of assimilating the measurement at $t_k$ by the use of
Eq. (4.2.8) with $K_k$ set equal to the Kalman gain as given by Eq. (4.2.17). Note that
we need $\hat{x}_k^-$ and $P_k^-$ to accomplish this, and we can anticipate a similar need at the
next step in order to make optimal use of the measurement $z_{k+1}$. The updated
estimate $\hat{x}_k$ is easily projected ahead via the transition matrix. We are justified in

ignoring the contribution of $w_k$ in Eq. (4.2.1) because it has zero mean and is not
correlated with any of the previous w’s.* Thus, we have

$$ \hat{x}_{k+1}^- = \phi_k \hat{x}_k \tag{4.2.23} $$
The error covariance matrix associated with $\hat{x}_{k+1}^-$ is obtained by first forming
the expression for the a priori error

$$ \begin{aligned} e_{k+1}^- &= x_{k+1} - \hat{x}_{k+1}^- \\ &= \left(\phi_k x_k + w_k\right) - \phi_k \hat{x}_k \\ &= \phi_k e_k + w_k \end{aligned} \tag{4.2.24} $$

We now note that $w_k$ and $e_k$ have zero crosscorrelation, because $w_k$ is the process
noise for the step ahead of $t_k$. Thus, we can write the expression for $P_{k+1}^-$ as

$$ P_{k+1}^- = E\left[e_{k+1}^- e_{k+1}^{-T}\right] = E\left[\left(\phi_k e_k + w_k\right)\left(\phi_k e_k + w_k\right)^T\right] = \phi_k P_k \phi_k^T + Q_k \tag{4.2.25} $$

We now have the needed quantities at time $t_{k+1}$, and the measurement $z_{k+1}$ can be
assimilated just as in the previous step.
Equations (4.2.8), (4.2.17), (4.2.22), (4.2.23), and (4.2.25) comprise the
Kalman filter recursive equations. It should be clear that once the loop is entered,
it can be continued ad infinitum. The pertinent equations and the sequence of
computational steps are shown pictorially in Fig. 4.1. This summarizes what is now
known as the Kalman filter.
Before we proceed to some examples, it is interesting to reflect on the Kalman
filter in perspective. If you were to stumble onto the recursive process of Fig. 4.1

Figure 4.1 Kalman filter loop. (Enter the prior estimate $\hat{x}_0^-$ and its error covariance
$P_0^-$; then, as each measurement $z_0, z_1, \ldots$ arrives: compute the Kalman gain,
$K_k = P_k^- H_k^T\left(H_k P_k^- H_k^T + R_k\right)^{-1}$; update the estimate with the measurement $z_k$,
$\hat{x}_k = \hat{x}_k^- + K_k\left(z_k - H_k \hat{x}_k^-\right)$; compute the error covariance for the updated estimate,
$P_k = \left(I - K_k H_k\right) P_k^-$; project ahead, $\hat{x}_{k+1}^- = \phi_k \hat{x}_k$ and $P_{k+1}^- = \phi_k P_k \phi_k^T + Q_k$.
The loop outputs the updated estimates $\hat{x}_0, \hat{x}_1, \ldots$)

* Recall that in our notation $w_k$ is the process noise that accumulates during the step ahead from $t_k$ to $t_{k+1}$. This
is purely a matter of notation (but an important one), and in some books it is denoted as $w_{k+1}$ rather than $w_k$.
Consistency in notation is the important thing here. Conceptually, we are thinking of doing real-time filtering in
contrast to smoothing, which we usually think of doing off-line (see Chapter 6).

without benefit of previous history, you might logically ask, “Why in the world did
somebody call that a filter? It looks more like a computer algorithm.” You would, of
course, be quite right in your observation. The Kalman filter is just a computer
algorithm for processing discrete measurements (the input) into optimal estimates
(the output). Its roots, though, go back to the days when filters were made of
electrical elements wired together in such a way as to yield the desired frequency
response. The design was often heuristic. Wiener then came on the scene in the
1940s and added a more sophisticated type of filter problem. The end result of his
solution was a filter weighting function or a corresponding transfer function in the
complex domain. Implementation in terms of electrical elements was left as a
further exercise for the designer. The discrete-time version of the Wiener problem
remained unsolved (in a practical sense, at least) until Kalman’s paper of 1960 (2).
Even though his presentation appeared to be quite abstract at first glance, engineers
soon realized that this work provided a practical solution to a number of unsolved
filtering problems, especially in the field of navigation.
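
Before moving on, it may help to see the loop of Fig. 4.1 written out as an algorithm. The following is a minimal sketch in Python with NumPy; the function name kalman_step and its argument names are our own.

```python
import numpy as np

def kalman_step(x_minus, P_minus, z, phi, H, Q, R):
    """One full cycle of the Kalman filter loop of Fig. 4.1."""
    I = np.eye(P_minus.shape[0])
    # Compute Kalman gain, Eq. (4.2.17)
    K = P_minus @ H.T @ np.linalg.inv(H @ P_minus @ H.T + R)
    # Update estimate with measurement z_k, Eq. (4.2.8)
    x_hat = x_minus + K @ (z - H @ x_minus)
    # Compute error covariance for updated estimate, Eq. (4.2.22)
    P = (I - K @ H) @ P_minus
    # Project ahead, Eqs. (4.2.23) and (4.2.25)
    x_minus_next = phi @ x_hat
    P_minus_next = phi @ P @ phi.T + Q
    return x_hat, P, x_minus_next, P_minus_next
```

Given the prior estimate $\hat{x}_0^-$ and its covariance $P_0^-$, calling this function once per measurement $z_0, z_1, \ldots$ continues the loop ad infinitum, just as described above.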

4.3
SIMPLE KALMAN FILTER EXAMPLES AND AUGMENTING
THE STATE VECTOR
The basic recursive equations for the Kalman filter were presented in Section 4.2.
We will now illustrate the use of these equations with two simple examples. The
emphasis here is on modeling. Monte Carlo examples with simulated measurements
will be considered later.

EXAMPLE 4.1 WIENER (BROWNIAN MOTION) PROCESS


The Wiener process is defined as integrated Gaussian white noise with the additional
stipulation that the initial value is zero. This is shown in Fig. 4.2, and the input u(t) is
unity white noise in this example. Let us say that we have uniformly spaced
measurements of x(t) beginning at $t = 0$ and that $\Delta t$ is 1 second. Also, we will assume
that the measurement errors are uncorrelated and have an rms value of 0.5 m. The first
thing we need to do in any Kalman filter problem is to develop the four key model
parameters $\phi_k$, $Q_k$, $H_k$, $R_k$ and the initial conditions. They are:

$$ \phi_k = 1 $$

$$ Q_k = E\left[\int_0^{\Delta t} u(\xi)\,d\xi \cdot \int_0^{\Delta t} u(\eta)\,d\eta\right] = \int_0^{\Delta t}\!\!\int_0^{\Delta t} E\left[u(\xi)u(\eta)\right] d\xi\,d\eta = \int_0^{\Delta t}\!\!\int_0^{\Delta t} \delta(\xi - \eta)\,d\xi\,d\eta = \Delta t = 1 $$

(Note the input is unity white noise.)

Figure 4.2 Block diagram of Wiener process and typical sample function
of the process.
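
As a preview of how these parameters are used, the following sketch (our own) simulates the process of Fig. 4.2 and runs the scalar filter loop on the simulated measurements. It assumes, consistent with the problem statement, that $H_k = 1$ (we measure x directly), $R_k = (0.5)^2 = 0.25\,\mathrm{m}^2$, and, because the initial value of the process is zero by definition, that $\hat{x}_0^- = 0$ with $P_0^- = 0$:

```python
import numpy as np

rng = np.random.default_rng(1)
n_steps, dt = 50, 1.0
phi, Q = 1.0, dt      # phi_k = 1 and Q_k = Delta t = 1, as derived above
H, R = 1.0, 0.25      # direct measurement of x, rms error 0.5 m

# Simulate the Wiener process (initial value zero) and noisy measurements
x_true = np.zeros(n_steps)
for k in range(1, n_steps):
    x_true[k] = x_true[k - 1] + rng.normal(0.0, np.sqrt(dt))
z = x_true + rng.normal(0.0, 0.5, n_steps)

# Scalar Kalman filter loop (all quantities are scalars here)
x_minus, P_minus = 0.0, 0.0
for zk in z:
    K = P_minus * H / (H * P_minus * H + R)             # Eq. (4.2.17)
    x_hat = x_minus + K * (zk - H * x_minus)            # Eq. (4.2.8)
    P = (1.0 - K * H) * P_minus                         # Eq. (4.2.22)
    x_minus, P_minus = phi * x_hat, phi * P * phi + Q   # Eqs. (4.2.23), (4.2.25)

print(x_hat, P)   # final updated estimate and its error variance
```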
