Gabriel A. Terejanu
Department of Computer Science and Engineering
University at Buffalo, Buffalo, NY 14260
terejanu@buffalo.edu
1 Introduction
Consider a stochastic dynamic model for a process, with model map f(·) and measurement map h(·), together with a sequence of noisy observations zk. Let

Zk = {zi | 1 ≤ i ≤ k}  (3)

be the set of the first k observations. Finding xak, the estimate or analysis of the state xk, given Zk
and the initial conditions is called the filtering problem. When the dynamic model for the process,
f(·), and for the measurements, h(·), are linear, and the random components x0 , wk , vk are uncorre-
lated Gaussian random vectors, then the solution is given by the classical Kalman filter equations [7].
The Kalman filter is named after Rudolph E. Kalman, who in 1960 published his famous paper describing a recursive solution to the discrete-data linear filtering problem [10]. It is
the optimal estimator for a large class of problems, finding the most probable state as an unbiased
linear minimum variance estimate of a system based on discrete observations of the system and a
model which describes the evolution of the system [5].
2 Dynamic process
A stochastic time-variant linear system is described by the difference equation and the observation model:

xk = Ak−1 xk−1 + Bk−1 uk−1 + wk−1  (4)
zk = Hk xk + vk  (5)

where the control input uk is a known nonrandom vector. The initial state x0 is a random vector
with known mean µ0 = E[x0] and covariance P0 = E[(x0 − µ0)(x0 − µ0)T].
In the following we assume that the random vector wk captures uncertainties in the model and vk
denotes the measurement noise. Both are temporally uncorrelated (white noise), zero-mean random
sequences with known covariances and both of them are uncorrelated with the initial state x0 .
E[wk] = 0,  E[wk wTk] = Qk,  E[wk wTj] = 0 for k ≠ j,  E[wk xT0] = 0 for all k  (6)
E[vk] = 0,  E[vk vTk] = Rk,  E[vk vTj] = 0 for k ≠ j,  E[vk xT0] = 0 for all k  (7)
The zero-mean and uncorrelatedness assumptions are not critical; extensions of the Kalman Filter can be derived when they do not hold.
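The model and noise assumptions above can be sketched numerically. In the snippet below, the matrices A, B, H, Q, R and the state dimension are illustrative choices, not values from the tutorial:

```python
# Simulating the linear stochastic system of Section 2 (illustrative values).
import numpy as np

rng = np.random.default_rng(0)

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # state transition A_{k-1}
B = np.array([[0.005], [0.1]])           # control input matrix B_{k-1}
H = np.array([[1.0, 0.0]])               # we observe the first state only
Q = 0.01 * np.eye(2)                     # process noise covariance Q_k
R = np.array([[0.25]])                   # measurement noise covariance R_k

def simulate(n_steps, x0, u=np.zeros((1,))):
    """Generate states x_k and observations z_k per the model above."""
    xs, zs = [], []
    x = x0
    for _ in range(n_steps):
        w = rng.multivariate_normal(np.zeros(2), Q)   # w_k ~ N(0, Q_k)
        v = rng.multivariate_normal(np.zeros(1), R)   # v_k ~ N(0, R_k)
        x = A @ x + B @ u + w
        z = H @ x + v
        xs.append(x)
        zs.append(z)
    return np.array(xs), np.array(zs)

xs, zs = simulate(100, x0=np.array([0.0, 1.0]))
```

Both noise sequences are drawn independently at each step, which realizes the whiteness assumptions (6)-(7).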
3 KF derivation
The optimal (minimum variance unbiased) estimate is the conditional mean and is computed in two
steps: the forecast step using the model difference equations and the data assimilation step. Hence
the Kalman Filter has a "predictor-corrector" structure.
Assume now that we have an optimal estimate xak−1 ≡ E[xk−1 | Zk−1] with covariance Pk−1 ≡ E[(xk−1 − xak−1)(xk−1 − xak−1)T] at time k − 1. The predictable part of xk is given by:

xfk ≡ E[xk | Zk−1] = Ak−1 xak−1 + Bk−1 uk−1  (11)
The forecast error is:

efk ≡ xk − xfk = Ak−1 ek−1 + wk−1

and the forecast covariance matrix is

Pfk ≡ E[efk (efk)T] = Ak−1 Pk−1 ATk−1 + Qk−1  (13)
Assume that the last term is a linear operation on the innovation zk − Hk xfk [10] (see also [9]: Projection theorem p. 408 and Kalman innovations p. 443). The innovation represents the new information contained in the observation zk.
Therefore:
So the easiest way to combine the two sources of information is to assume that the unbiased estimate xak is a linear combination of both the forecast and the measurement. In other words, the optimal estimate at time k equals the best prediction plus a correction term, an optimal weighting matrix Kk times the innovation, as in (17) [8].
xak = Ak−1 xak−1 + Bk−1 uk−1 + Kk (Hk xk + vk − Hk (Ak−1 xak−1 + Bk−1 uk−1 )) (18)
= Ak−1 xak−1 + Bk−1 uk−1 + Kk (Hk Ak−1 (xk−1 − xak−1 ) + Hk wk−1 + vk )
Figure 1: Sequential assimilation
ek ≡ xk − xak (19)
= Ak−1 ek−1 − Kk Hk Ak−1 ek−1 + (I − Kk Hk )wk−1 − Kk vk
= (I − Kk Hk )(Ak−1 ek−1 + wk−1 ) − Kk vk
where Dk ≡ Hk Pfk HTk + Rk. The posterior covariance

Pk ≡ E[ek eTk] = (I − Kk Hk) Pfk (I − Kk Hk)T + Kk Rk KTk
   = Pfk − Kk Hk Pfk − Pfk HTk KTk + Kk Dk KTk

holds for any Kk. The cross terms cancel because xk−1, wk−1 and vk are uncorrelated and ek−1 is a function of xk−1.
Our goal is to minimize the error eki in each component of the state estimate, i = 1, …, n. The problem is posed as a mean-squared-error minimization; the cost functional to be minimized is given by [1]:

J = E[ Σ_{i=1}^{n} e2ki ]  (23)
This is the sum of error variances for each state variable. Therefore the cost functional can be expressed
as the trace of the error covariance:
J = tr(Pk ) (24)
Since tr(Pk) is a function of Kk, and Kk is the only unknown, we minimize tr(Pk) with respect to Kk.
∂tr(Pk)/∂Kk = 0  (25)
The partial derivative of the trace is easily given using matrix calculus rules [4].
∂tr(Pfk − Kk Hk Pfk − Pfk HTk KTk + Kk Dk KTk)/∂Kk = 0  (26)
−(Hk Pfk)T − Pfk HTk + 2 Kk Dk = 0
Thus, the Kalman gain is given by:
Kk = Pfk HTk D−1k  (27)
   = Pfk HTk (Hk Pfk HTk + Rk)−1
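As a numerical sanity check, the gain in (27) can be verified to minimize tr(Pk); the matrices below are made-up examples, not values from the tutorial:

```python
# Check that the Kalman gain (27) minimizes J = tr(P_k) over all gains.
import numpy as np

Pf = np.array([[2.0, 0.5], [0.5, 1.0]])  # forecast covariance Pf_k (illustrative)
H = np.array([[1.0, 0.0]])               # observation matrix H_k
R = np.array([[0.5]])                    # measurement noise covariance R_k
I2 = np.eye(2)

D = H @ Pf @ H.T + R                     # innovation covariance D_k
K = Pf @ H.T @ np.linalg.inv(D)          # Kalman gain, equation (27)

def trace_P(Kc):
    """tr(P_k) for an arbitrary gain Kc (Joseph form, valid for any gain)."""
    return np.trace((I2 - Kc @ H) @ Pf @ (I2 - Kc @ H).T + Kc @ R @ Kc.T)

# Perturbing the optimal gain can only increase the cost J = tr(P_k).
rng = np.random.default_rng(1)
assert all(trace_P(K + 0.1 * rng.standard_normal(K.shape)) >= trace_P(K)
           for _ in range(100))
```

Since J is a convex quadratic in Kk with positive definite Dk, any perturbation of the optimal gain increases the trace, which is what the random probes confirm.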
[Figure: Kalman Filter and the innovation sequence over time τ]
5 KF original derivation
The following derivation respects Kalman original concept of derivation [10]. The notation that has
been changed for the consistency of the tutorial. The optimal estimate for the system (4)-(5) is derived
using orthogonal projections on the vector space of random variables.
Orthogonal Projection
Let the vector space Zk be the set of all linear combinations of the random variables (observations) z1, …, zk. Zk is a finite-dimensional subspace of the space of all possible observations.

Zk ≡ { z̄ | z̄ = Σ_{i=1}^{k} αi zi }  (29)

Two vectors u, v ∈ Zk are orthogonal if their correlation is zero. Any vector x can be uniquely decomposed into two parts, x̄ ∈ Zk and x̃ ⊥ Zk:

x = x̄ + x̃  (30)
Theorem [10]: Let {xk}, {zk} be random processes with zero mean. Suppose either (1) the random processes are Gaussian, or (2) the optimal estimate is restricted to be a linear function of the observed random variables and the loss function L(ek) is quadratic in ek = xk − x̂k, where L(·) is a positive non-decreasing function of the error. Then

x̂k = optimal estimate of xk given {zk}  (31)
    = orthogonal projection x̄k of xk on Zk
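The projection theorem can be illustrated with a small Monte Carlo sketch. The scalar state and the two observations below are invented for illustration; the point is that the best linear estimate leaves a residual uncorrelated with every observation:

```python
# Orthogonal projection of x on the span of the observations (illustration).
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
x = rng.standard_normal(n)               # zero-mean scalar state
z1 = x + 0.5 * rng.standard_normal(n)    # two noisy observations of x
z2 = x + 1.0 * rng.standard_normal(n)
Z = np.vstack([z1, z2])

# Projection coefficients alpha solve E[z z^T] alpha = E[x z].
Exz = (x * Z).mean(axis=1)               # sample E[x z_i]
Ezz = Z @ Z.T / n                        # sample E[z z^T]
alpha = np.linalg.solve(Ezz, Exz)
x_bar = alpha @ Z                        # projection of x on span{z1, z2}
x_tilde = x - x_bar                      # orthogonal component

# The residual is (empirically) uncorrelated with every observation.
assert np.all(np.abs((x_tilde * Z).mean(axis=1)) < 0.02)
```

This is exactly the decomposition x = x̄ + x̃ of (30), computed from sample moments.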
Derivation
Assume Zk−1 is known and zk is measured. Let z̃k be the component of zk orthogonal to Zk−1. The component z̃k generates a linear manifold Z̃k:

Zk = Zk−1 ∪ Z̃k  (32)
Every vector in Z̃k is orthogonal to every vector in Zk−1.
where the forecast value xfk of xk can be obtained as in (11) and the forecast covariance matrix is given by (13).
Assume that the last term in (33) is a linear operation on the random variable z̃k (called the innovation):

E[xk | Z̃k] = Kk z̃k  (36)

where

z̃k = zk − z̄k  (37)
ek ≡ xk − xak (40)
= Ak−1 ek−1 − Kk Hk Ak−1 ek−1 + (I − Kk Hk )wk−1 − Kk vk
= (I − Kk Hk )(Ak−1 ek−1 + wk−1 ) − Kk vk
where
We have to find an explicit formula for Kk by noting that the residual xk − E[xk | Z̃k] is orthogonal to Z̃k, and therefore orthogonal to z̃k. It results:

0 = E[(xk − E[xk | Z̃k]) z̃Tk]  (43)
  = E[(xk − Kk z̃k) z̃Tk]
  = E[xk z̃Tk] − Kk E[z̃k z̃Tk]
We know that xk = x̄k + x̃k with x̄k ∈ Zk−1, therefore x̄k ⊥ Z̃k and so x̄k ⊥ z̃k.

E[xk z̃Tk] = E[(x̄k + x̃k) z̃Tk]  (44)
          = E[x̃k z̃Tk]
          = E[(xk − x̄k) z̃Tk]
          = E[(Ak−1 xk−1 + Bk−1 uk−1 + wk−1 − E[xk | Zk−1]) z̃Tk]
          = E[(Ak−1 xk−1 + Bk−1 uk−1 + wk−1 − Ak−1 xak−1 − Bk−1 uk−1) z̃Tk]
          = E[(Ak−1 ek−1 + wk−1) z̃Tk]
          = Ak−1 E[ek−1 (zk − Hk xfk)T] + E[wk−1 (zk − Hk xfk)T]
We can obtain an expression for the innovation as a function of the estimation error:
zk − Hk xfk = Hk xk + vk − Hk xfk (45)
= Hk (xk − xfk ) + vk
= Hk (Ak−1 xk−1 + Bk−1 uk−1 + wk−1 − Ak−1 xak−1 − Bk−1 uk−1 ) + vk
= Hk Ak−1 ek−1 + Hk wk−1 + vk
Substituting this into (44):
E[xk z̃Tk] = Ak−1 E[ek−1 (Hk Ak−1 ek−1 + Hk wk−1 + vk)T]  (46)
           + E[wk−1 (Hk Ak−1 ek−1 + Hk wk−1 + vk)T]
          = Ak−1 Pk−1 ATk−1 HTk + Qk−1 HTk
          = Pfk HTk
The last term from (43) is:
E[z̃k z̃Tk] = E[(Hk Ak−1 ek−1 + Hk wk−1 + vk)(Hk Ak−1 ek−1 + Hk wk−1 + vk)T]  (47)
          = Hk Ak−1 Pk−1 ATk−1 HTk + Hk Qk−1 HTk + Rk
          = Hk Pfk HTk + Rk
With (46) and (47), (43) becomes:
0 = E[xk z̃Tk] − Kk E[z̃k z̃Tk]  (48)
  = Pfk HTk − Kk (Hk Pfk HTk + Rk)
It results that the gain matrix Kk is:
Kk = Pfk HTk (Hk Pfk HTk + Rk )−1 (49)
The Kalman Filter equations are given by (34), (35), (33), (49) and (41). Note that the gain ∆k derived in the original Kalman paper is given by ∆k = Ak−1 Kk.
6 Information form
In the information filter (inverse-covariance filter) the estimated state vector and the covariance matrix are replaced by the information state yk and the information matrix Yk, respectively. The forecast estimate and covariance matrix take the same information form. With these changes we can write the Kalman filter equations in information form [6], so the data assimilation equations become:
Derivation of the information matrix follows immediately from the posterior covariance matrix formula (the steps below assume Hk is invertible):

P−1k = (Pfk)−1 (I − Kk Hk)−1  (56)
     = (Pfk)−1 [Kk (K−1k − Hk)]−1
     = (Pfk)−1 [Kk ((Hk Pfk HTk + Rk)(HTk)−1 (Pfk)−1 − Hk)]−1
     = (Pfk)−1 [Kk Rk (HTk)−1 (Pfk)−1]−1
     = HTk R−1k K−1k
     = HTk R−1k (Hk Pfk HTk + Rk)(HTk)−1 (Pfk)−1
     = HTk R−1k Hk + HTk R−1k Rk (HTk)−1 (Pfk)−1
     = HTk R−1k Hk + (Pfk)−1

Yk = Yfk + HTk R−1k Hk  (57)
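Update (57) can be checked against the standard covariance update numerically; the matrices below are illustrative, and Hk is taken square as in the derivation:

```python
# Verify that the information update (57) matches inverting the standard
# posterior covariance P_k = (I - K_k H_k) Pf_k.
import numpy as np

Pf = np.array([[2.0, 0.3], [0.3, 1.5]])   # forecast covariance Pf_k
H = np.array([[1.0, 0.0], [0.0, 1.0]])    # observation matrix H_k (square)
R = np.diag([0.5, 0.8])                   # measurement noise covariance R_k

K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)      # gain (49)
P = (np.eye(2) - K @ H) @ Pf                        # standard covariance update
Y = np.linalg.inv(Pf) + H.T @ np.linalg.inv(R) @ H  # information update (57)

assert np.allclose(np.linalg.inv(P), Y)
```

The identity P−1k = (Pfk)−1 + HTk R−1k Hk is exactly what the chain of equalities above establishes.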
Provided that Ak−1 is nonsingular, the equations of the model forecast become:

where Mk = (A−1k)T Yk A−1k. Using the forecast covariance matrix recurrence formula we can derive the initial conditions:

Y0 = P−10
ya0 = Y0 µ0
7 Innovation approach
The innovation approach solves the estimation problem more easily by using the innovation process, which is the observed process converted into a white-noise process. The innovation represents the new information contained in the observation zk, given all the past observations and the information deduced from them. It is defined as:

z̃k = zk − E[zk | Zk−1]  (61)
1. The innovation z̃k, associated with the current observation, is uncorrelated with the past observations: E[z̃k zTj] = 0 for j = 1, 2, …, k − 1.
3. There is a one-to-one correspondence between the innovation and the associated observation.
where Ii is an n × m matrix to be determined. We know by the Projection Theorem that the forecast
error ek is uncorrelated with the innovation sequence. Therefore, for all i up to k:
0 = E[ek z̃Ti]  (63)
  = E[(xk − xak) z̃Ti]

E[xk z̃Ti] = E[xak z̃Ti]

E[xk z̃Ti] = Σ_{l=1}^{k} Il E[z̃l z̃Ti]
          = Ii E[z̃i z̃Ti]
All the terms with l ≠ i vanish since the innovations are temporally uncorrelated. Hence:

Ii = E[xk z̃Ti] Cov−1(z̃i)  (64)
where Kk = E[xk z̃Tk] Cov−1(z̃k).
ek ≡ xk − xak  (66)
   = Ak−1 ek−1 + wk−1 − Kk z̃k
Denoting Pfk = Ak−1 Pk−1 ATk−1 + Qk−1 and substituting back into (67) yields:
Pk = (I − Kk Hk) Pfk  (69)
Kk = Pfk HTk (Hk Pfk HTk + Rk)−1  (70)
The equations (61), (62), (69) and (70) define the Kalman Filter algorithm.
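The predictor-corrector loop defined by these equations can be sketched as follows (the model matrices are illustrative and the control input uk is taken to be zero), including a numerical check of innovation property 1:

```python
# A minimal Kalman filter loop, run on simulated data (illustrative model).
import numpy as np

rng = np.random.default_rng(3)
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # state transition A_{k-1}
H = np.array([[1.0, 0.0]])               # observation matrix H_k
Q = 0.01 * np.eye(2)                     # process noise covariance
R = np.array([[0.25]])                   # measurement noise covariance

def kalman_filter(zs, mu0, P0):
    """Predictor-corrector loop; returns analyses x^a_k and innovations z~_k."""
    x, P = mu0, P0
    xs, innovations = [], []
    for z in zs:
        # forecast (predictor) step; u_k = 0 here
        xf = A @ x
        Pf = A @ P @ A.T + Q
        # data assimilation (corrector) step
        innov = z - H @ xf                              # innovation (61)
        K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)  # gain (70)
        x = xf + K @ innov
        P = (np.eye(2) - K @ H) @ Pf                    # posterior covariance (69)
        xs.append(x)
        innovations.append(innov)
    return np.array(xs), np.array(innovations)

# Simulate the true system, then filter its observations.
x_true = np.array([0.0, 1.0])
zs = []
for _ in range(500):
    x_true = A @ x_true + rng.multivariate_normal(np.zeros(2), Q)
    zs.append(H @ x_true + rng.multivariate_normal(np.zeros(1), R))
xs, innov = kalman_filter(zs, mu0=np.zeros(2), P0=np.eye(2))

# Property 1: the innovation sequence should be (nearly) white.
lag1 = np.mean(innov[1:, 0] * innov[:-1, 0]) / np.var(innov[:, 0])
assert abs(lag1) < 0.2
```

A near-zero lag-1 sample correlation of the innovations is the practical signature of a correctly specified filter.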
8 Filter divergence
This phenomenon occurs when the filter appears to behave well, reporting a low error variance, while the estimate is in fact far from the truth. It is caused by errors in the system modeling: the model error is higher than expected, the system model has the wrong form, or the system is unstable or has bias errors.
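A minimal illustration of this overconfidence, using a scalar model with invented numbers: iterating the covariance recursion with an understated process noise Q drives the reported variance far below that of a correctly specified filter, even though the true errors are driven by the larger Q.

```python
# Steady-state posterior variance via Riccati iteration (scalar sketch).
A, H, R = 1.0, 1.0, 1.0
Q_true, Q_wrong = 1.0, 1e-4   # the filter wrongly assumes almost no model error

def steady_state_variance(Q, n_iter=2000):
    """Iterate forecast/update until the posterior variance settles."""
    P = 1.0
    for _ in range(n_iter):
        Pf = A * P * A + Q                   # forecast variance
        K = Pf * H / (H * Pf * H + R)        # scalar Kalman gain
        P = (1.0 - K * H) * Pf               # posterior variance
    return P

P_right = steady_state_variance(Q_true)
P_wrong = steady_state_variance(Q_wrong)
# The misspecified filter reports a tiny variance and trusts its (bad) model.
assert P_wrong < P_right
```

With the understated Q the gain shrinks toward zero, so new measurements are nearly ignored; the filter's reported variance stays small while its actual error follows the (much noisier) true process.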
9 Remarks
1. The filter produces the error covariance matrix Pk, which is an important measure of the accuracy of the estimate.
2. The filter is optimal for Gaussian sequences only.
3. While the measurement noise covariance Rk can usually be determined, the process noise covariance matrix Qk has to be adjusted to match the dynamics. Since we are not able to directly observe the process we are estimating, Qk must be tuned for good filter performance.
10 Conclusion
While most classical filters are formulated in the frequency domain, the Kalman Filter is a purely time-domain filter.
References
[1] Michael Athans. The Control Handbook, chapter Kalman Filtering, pages 589–594. CRC Press,
1996.
[3] Mourad Barkat. Signal Detection and Estimation. Artech House Inc, 2005.
[4] Jon Dattorro. Convex Optimization & Euclidean Distance Geometry, chapter Matrix Calculus.
Meboo Publishing USA, 2006.
[6] Mohinder S. Grewal and Angus P. Andrews. Kalman Filtering: Theory and Practice Using MATLAB, 2nd edition. John Wiley & Sons, 2001.
[7] John M. Lewis and S. Lakshmivarahan. Dynamic Data Assimilation: A Least Squares Approach. 2006.
[8] Peter S. Maybeck. Introduction to Random Signals and Applied Kalman Filtering. Academic Press, 1979.
[9] Athanasios Papoulis. Probability, Random Variables, and Stochastic Processes. McGraw-Hill, Inc., 2nd edition, 1965.
[10] R. E. Kalman. A New Approach to Linear Filtering and Prediction Problems. Trans. ASME, Journal of Basic Engineering, 1960.
[11] Greg Welch and Gary Bishop. An Introduction to the Kalman Filter. SIGGRAPH, ACM, 2001.