You are on page 1of 8

Lecture 16: State Space Model and Kalman Filter

Bus 41910, Time Series Analysis, Mr. R. Tsay


A state space model consists of two equations:
S
t+1
= FS
t
+Ge
t+1
, (1)
Z
t
= HS
t
+
t
(2)
where S
t
is a state vector of dimension m, Z
t
is the observed time series, F, G, H are
matrices of parameters, {e
t
} and {
t
} are iid random vectors satisfying
E(e
t
) = 0, E(
t
) = 0, Cov(e
t
) = Q, Cov(
t
) = R
and {e
t
} and {
t
} are independent. In the engineering literature, a state vector denotes
the unobservable vector that describes the status of the system. Thus, a state vector
can be thought of as a vector that contains the necessary information to predict the future
observations (i.e., minimum mean squared error forecasts).
Remark: In some applications, a state-space model is written as
S
t+1
= FS
t
+Ge
t
,
Z
t
= HS
t
+
t
.
Is there any dierence between the two parameterizations?
In this course, Z
t
is a scalar and F, G, H are constants. A general state space model in
fact allows for vector time series and time-varying parameters. Also, the independence
requirement between e
t
and
t
can be relaxed so that e
t+1
and
t
are correlated.
To appreciate the above state space model, we rst consider its relation with ARMA models.
The basic relations are
an ARMA model can be put into a state space form in innite many ways;
for a given state space model in (1)-(2), there is an ARMA model.
A. State space model to ARMA model:
The key here is the Cayley-Hamilton theorem, which says that for any m m matrix F
with characteristic equation
c() = |F I| =
m
+
1

m1
+
2

m2
+ +
m1
+
0
,
we have c(F) = 0. In other words, the matrix F satises its own characteristic equation,
i.e.
F
m
+
1
F
m1
+
2
F
m2
+ +
m1
F +
m
I = 0.
1
Next, from the state transition equation, we have
S
t
= S
t
S
t+1
= FS
t
+Ge
t+1
S
t+2
= F
2
S
t
+FGe
t+1
+Ge
t+2
S
t+3
= F
3
S
t
+F
2
Ge
t+1
+FGe
t+2
+Ge
t+3
.
.
. =
.
.
.
S
t+m
= F
m
S
t
+F
m1
Ge
t+1
+ +FGe
t+m1
+Ge
t+m
.
Multiplying the above equations by
m
,
m1
, ,
1
, 1, respectively, and summing, we
obtain
S
t+m
+
1
S
t+m1
+ +
m1
S
t+1
+
m
S
t
= Ge
t+m
+
1
e
t+m1
+ +
m1
e
t+1
. (3)
In the above, we have used the fact that c(F) = 0.
Finally, two cases are possible. First, assume that there is no observational noise, i.e.
t
= 0
for all t in (2). Then, by multiplying H from the left to equation (3) and using Z
t
= HS
t
,
we have
Z
t+m
+
1
Z
t+m1
+ +
m1
Z
t+1
+
m
Z
t
= a
t+m

1
a
t+m1

m1
a
t+1
,
where a
t
= HGe
t
. This is an ARMA(m, m 1) model.
The second possibility is that there is an observational noise. Then, the same argument
gives
(1 +
1
B + +
m
B
m
)(Z
t+m

t+m
) = (1
1
B
m1
B
m1
)a
t+m
.
By combining
t
with a
t
, the above equation is an ARMA(m, m) model.
B. ARMA model to state space model:
We begin the discussion with some simple examples. Three general approaches will be
given later.
Example 1: Consider the AR(2) model
Z
t
=
1
Z
t1
+
2
Z
t2
+a
t
.
For such an AR(2) process, to compute the forecasts, we need Z
t1
and Z
t2
. Therefore,
it is easily seen that

Z
t+1
Z
t


1

2
1 0

Z
t
Z
t1

1
0

a
t+1
and
Z
t
= [1, 0]S
t
2
where S
t
= (Z
t
, Z
t1
)

and there is no observational noise.


Example 2: Consider the MA(2) model
Z
t
= a
t

1
a
t1

2
a
t2
.
Method 1:

a
t
a
t1

0 0
1 0

a
t1
a
t2

1
0

a
t
Z
t
= [
1
,
2
]S
t
+a
t
.
Here the innovation a
t
shows up in both the state transition equation and the observation
equation. The state vector is of dimension 2.
Method 2: For an MA(2) model, we have
Z
t|t
= Z
t
Z
t+1|t
=
1
a
t

2
a
t1
Z
t+2|t
=
2
a
t
.
Let S
t
= (Z
t
,
1
a
t

2
a
t1
,
2
a
t
)

. Then,
S
t+1
=

0 1 0
0 0 1
0 0 0

S
t
+

a
t+1
and
Z
t
= [1, 0, 0]S
t
.
Here the state vector is of dimension 3, but there is no observational noise.
Exercise: Generalize the above result to an MA(q) model.
Next, we consider three general approaches.
Akaikes approach: For an ARMA(p, q) process, let m = max{p, q + 1},
i
= 0 for
i > p and
j
= 0 for j > q. Dene S
t
= (Z
t
, Z
t+1|t
, Z
t+2|t
, , Z
t+m1|t
)

where Z
t+|t
is the
conditional expectation of Z
t+
given
t
= {Z
t
, Z
t1
, }. By using the updating equation
of forecasts (recall what we discussed before)
Z
t+1
( 1) = Z
t
() +
1
a
t+1
,
it is easy to show that
S
t+1
= FS
t
+Ga
t+1
Z
t
= [1, 0, , 0]S
t
3
where
F =

0 1 0 0
0 0 1 0
.
.
.
.
.
.

m

m1

2

1

, G =

2
.
.
.

m1

.
The matrix F is call a companion matrix of the polynomial 1
1
B
m
B
m
.
Aokis Method: This is a two-step procedure. First, consider the MA(q) part. Letting
W
t
= a
t

1
a
t1

q
a
tq
, we have

a
t
a
t1
.
.
.
a
tq+1

0 0 0 0
1 0 0 0
.
.
.
0 0 1 0

a
t1
a
t2
.
.
.
a
tq

1
0
.
.
.
0

a
t
W
t
= [
1
,
2
, ,
q
]S
t
+a
t
.
In the next step, we use the usual AR(p) format for
Z
t

1
Z
t1

p
Z
tp
= W
t
.
Consequently, dene the state vector as
S
t
= (Z
t1
, Z
t2
, , Z
tp
, a
t1
, , a
tq
)

.
Then, we have

Z
t
Z
t1
.
.
.
Z
tp+1
a
t
a
t1
.
.
.
a
tq+1

1

2

p

1

1

q
1 0 0 0 0 0
.
.
.
.
.
.
0 1 0 0 0 0
0 0 0 0 0 0
0 0 0 1 0 0
.
.
. 0
0 0 0 0 1 0

Z
t1
Z
t2
.
.
.
Z
tp
a
t1
a
t2
.
.
.
a
tq

1
0
.
.
.
0
1
0
.
.
.
0

a
t
.
and
Z
t
= [
1
, ,
p
,
1
, ,
q
]S
t
+a
t
.
Third approach: The third method is used by some authors, e.g. Harvey and his asso-
ciates. Consider an ARMA(p, q) model
Z
t
=
p

i=1

i
Z
ti
+a
t

q

j=1

j
a
tj
.
4
Let m = max{p, q}. Dene
i
= 0 for i > p and
j
= 0 for j > q. The model can be
written as
Z
t
=
m

i=1

i
Z
ti
+a
t

m

i=1

i
a
ti
.
Using (B) = (B)/(B), we can obtain the -weights of the model by equating coecients
of B
j
in the equation
(1
1
B
m
B
m
) = (1
1
B
m
B
m
)(
0
+
1
B + +
m
B
m
+ ),
where
0
= 1. In particular, consider the coecient of B
m
, we have

m
=
m

0

m1

1

1

m1
+
m
.
Consequently,

m
=
m

i=1

mi

m
. (4)
Next, from the -weight representation
Z
t+mi
= a
t+mi
+
1
a
t+mi1
+
2
a
t+mi2
+ ,
we obtain
Z
t+mi|t
=
mi
a
t
+
mi+1
a
t1
+
mi+2
a
t2
+
Z
t+mi|t1
=
mi+1
a
t1
+
mi+2
a
t2
+ .
Consequently,
Z
t+mi|t
= Z
t+mi|t1
+
mi
a
t
, mi > 0. (5)
We are ready to setup a state space model. Dene S
t
= (Z
t|t1
, Z
t+1|t1
, , Z
t+m1|t1
)

.
Using Z
t
= Z
t|t1
+a
t
, the observational equation is
Z
t
= [1, 0, , 0]S
t
+a
t
.
The state-transition equation can be obtained by Equations (5) and (4). First, for the rst
m 1 elements of S
t+1
, Equation (5) applies. Second, for the last element of S
t+1
, the
model implies
Z
t+m|t
=
m

i=1

i
Z
t+mi|t

m
a
t
.
Using Equation (5), we have
Z
t+m|t
=
m

i=1

i
(Z
t+mi|t1
+
mi
a
t
)
m
a
t
=
m

i=1

i
Z
t+mi|t1
+ (
m

i=1

mi

m
)a
t
=
m

i=1

i
Z
t+mi|t1
+
m
a
t
,
5
where the last equality uses Equation (4). Consequently, the state-transition equation is
S
t+1
= FS
t
+Ga
t
where
F =

0 1 0 0
0 0 1 0
.
.
.
.
.
.

m

m1

2

1

, G =

3
.
.
.

.
Note that for this third state space model, the dimension of the state vector is m =
max{p, q}, which may be lower than that of the Akaikes approach. However, the innova-
tions to both the state-transition and observational equations are a
t
.
Kalman Filter
Kalman lter is a set of recursive equation that allows us to update the information in
a state space model. It basically decomposes an observation into conditional mean and
predictive residual sequentially. Thus, it has wide applications in statistical analysis.
The simplest way to derive the Kalman recursion is to use normality assumption. It should
be pointed out, however, that the recursion is a result of the least squares principle (or
projection) not normality. Thus, the recursion continues to hold for non-normal case.
The only dierence is that the solution obtained is only optimal within the class of linear
solutions. With normailty, the solution is optimal among all possible solutions (linear and
nonlinear).
Under normality, we have
that normal prior plus normal likelihood results in a normal posterior,
that if the random vector (X, Y ) are jointly normal

X
Y

N(


xx

xy

yx

yy

),
then the conditional distribution of X given Y = y is normal
X|Y = y N[
x
+
xy

1
yy
(y
y
),
xx

xy

1
yy

yx
].
Using these two results, we are ready to derive the Kalman lter. In what follows, let P
t+j|t
be the conditional covariance matrix of S
t+j
given {Z
t
, Z
t1
, } for j 0 and S
t+j|t
be
the conditional mean of S
t+j
given {Z
t
, Z
t1
, }.
First, by the state space model, we have
S
t+1|t
= FS
t|t
(6)
6
Z
t+1|t
= HS
t+1|t
(7)
P
t+1|t
= FP
t|t
F

+GQG

(8)
V
t+1|t
= HP
t+1|t
H

+R (9)
C
t+1|t
= HP
t+1|t
(10)
where V
t+1|t
is the conditional variance of Z
t+1
given {Z
t
, Z
t1
, } and C
t+1|t
denotes
the conditional covariance between Z
t+1
and S
t+1
. Next, consider the joint conditional
distribution between S
t+1
and Z
t+1
. The above results give

S
t+1
Z
t+1

t
N(

S
t+1|t
Z
t+1|t

P
t+1|t
P
t+1|t
H

HP
t+1|t
HP
t+1|t
H

+R

).
Finally, when Z
t+1
becomes available, we may use the property of nromality to update the
distribution of S
t+1
. More specically,
S
t+1|t+1
= S
t+1|t
+P
t+1|t
H

[HP
t+1|t
H

+R]
1
(Z
t+1
Z
t+1|t
) (11)
and
P
t+1|t+1
= P
t+1|t
P
t+1|t
H

[HP
t+1|t
H

+R]
1
HP
t+1|t
. (12)
Obviously,
r
t+1|t
= Z
t+1
Z
t+1|t
= Z
t+1
HS
t+1|t
is the predictive residual for time point t + 1. The updating equation in (11) says that
when the predictive residual r
t+1|t
is non-zero there is new information about the system so
that the state vector should be modied. The contribution of r
t+1|t
to the state vector, of
course, needs to be weighted by the variance of r
t+1|t
and the conditional covariance matrix
of S
t+1
.
In summary, the Kalman lter consists of the following equations:
Prediction: (6), (7), (8) and (9)
Updating: (11) and (12).
In practice, one starts with initial prior information S
0|0
and P
0|0
, then predicts Z
1|0
and
V
1|0
. Once the observation Z
1
is available, uses the updating equations to compute S
1|1
and
P
1|1
, which in turns serve as prior for the next observation. This is the Kalman recusion.
Applications of Kalman Filter
Kalman lter has many applications. They are often classied as ltering, prediction,
and smoothing. Let F
t1
be the information available at time t 1, i.e., F
t1
= -
led{Z
t1
, Z
t2
, . . .}.
Filtering: make inference on S
t
given F
t
.
Prediction: draw inference about S
t+h
with h > 0, given F
t
.
7
SMoothing: make inference about S
t
given the data F
T
, where T t is the sample
size.
We shall briey discuss some of the applications. A good reference is Chapter 11 of Tsay
(2005, 2nd ed.).
8