

5. State space models and Kalman filtering

Andrew Lesniewski

Baruch College
New York

Fall 2019


Outline

1 Warm-up: Recursive Least Squares

2 Kalman Filter

3 Nonlinear State Space Models

4 Particle Filtering


OLS regression
As a motivation for the remainder of this lecture, we consider the standard linear
model

$Y = X^T \beta + \varepsilon$,   (1)

where $Y \in \mathbb{R}$, $X \in \mathbb{R}^k$, and $\varepsilon \in \mathbb{R}$ is noise (this includes the model with an
intercept as a special case, in which the first component of X is assumed to be 1).
Given n observations $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$ of X and Y, respectively, the
ordinary least squares (OLS) regression leads to the following estimated
value of the coefficient β:

$\hat\beta_n = (X_n^T X_n)^{-1} X_n^T Y_n$.   (2)

The matrix $X_n$ and the vector $Y_n$ above are defined as

$X_n = \begin{pmatrix} x_1^T \\ \vdots \\ x_n^T \end{pmatrix} \in \mathrm{Mat}_{n,k}(\mathbb{R})$ and $Y_n = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} \in \mathbb{R}^n$,   (3)

respectively.
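
As a quick illustration, here is a minimal numpy sketch of the estimator (2) on synthetic data; the simulated data and all variable names are ours, not part of the lecture.

import numpy as np

# Synthetic data: n = 200 observations of a k = 3 dimensional X (first component is the intercept).
rng = np.random.default_rng(0)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

# OLS estimate (2): solve (X^T X) beta = X^T y rather than forming the inverse explicitly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)   # close to beta_true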


Recursive least squares

Suppose now that X and Y consist of a streaming set of data, and each new
observation leads to an updated value of the estimated β.
It is not optimal to redo the entire calculation above after a new observation
arrives in order to find the updated value.
Instead, we shall derive a recursive algorithm that stores the last calculated
value of the estimated β, and updates it to incorporate the impact of the latest
observation.
To this end, we introduce the notation:

$P_n = (X_n^T X_n)^{-1}$,
$B_n = X_n^T Y_n$,   (4)

and rewrite (2) as


$\hat\beta_n = P_n B_n$.   (5)


Recursive least squares

Using (3), we can write

$X_{n+1} = \begin{pmatrix} X_n \\ x_{n+1}^T \end{pmatrix}$,  $Y_{n+1} = \begin{pmatrix} Y_n \\ y_{n+1} \end{pmatrix}$.

Therefore, $P_n$ and $B_n$ obey the following recursive relations:

$P_{n+1}^{-1} = P_n^{-1} + x_{n+1} x_{n+1}^T$,
$B_{n+1} = B_n + x_{n+1} y_{n+1}$.


Recursive least squares

We shall now use the matrix inversion formula:

$(A + BD)^{-1} = A^{-1} - A^{-1} B (I + D A^{-1} B)^{-1} D A^{-1}$,   (6)

valid for a square invertible matrix A, and matrices B and D such that the
operations above are defined.
This yields the following relation:

$P_{n+1} = P_n - P_n x_{n+1} (1 + x_{n+1}^T P_n x_{n+1})^{-1} x_{n+1}^T P_n = P_n - K_{n+1} x_{n+1}^T P_n$,

where we have introduced the notation

$K_{n+1} = P_n x_{n+1} (1 + x_{n+1}^T P_n x_{n+1})^{-1}$.


Recursive least squares

Now, define the a priori error

$\hat\varepsilon_{n+1} = y_{n+1} - x_{n+1}^T \hat\beta_n$.

Then the recursion for $B_n$ can be recast as

$B_{n+1} = B_n + x_{n+1} x_{n+1}^T \hat\beta_n + x_{n+1} \hat\varepsilon_{n+1}$.

Using (5), we see that

$P_{n+1}^{-1} \hat\beta_{n+1} = P_n^{-1} \hat\beta_n + x_{n+1} x_{n+1}^T \hat\beta_n + x_{n+1} \hat\varepsilon_{n+1} = (P_n^{-1} + x_{n+1} x_{n+1}^T) \hat\beta_n + x_{n+1} \hat\varepsilon_{n+1} = P_{n+1}^{-1} \hat\beta_n + x_{n+1} \hat\varepsilon_{n+1}$.


Recursive least squares

In other words,

$\hat\beta_{n+1} = \hat\beta_n + P_{n+1} x_{n+1} \hat\varepsilon_{n+1}$.

However, from the definition of $K_{n+1}$,

$P_{n+1} x_{n+1} = K_{n+1}$,

and so

$\hat\beta_{n+1} = \hat\beta_n + K_{n+1} \hat\varepsilon_{n+1}$.


Recursive least squares

The algorithm can be summarized as follows. We initialize $\hat\beta_0$ (e.g. at 0) and $P_0$
(e.g. at $I$), and iterate:

$\hat\varepsilon_{n+1} = y_{n+1} - x_{n+1}^T \hat\beta_n$,
$K_{n+1} = P_n x_{n+1} (1 + x_{n+1}^T P_n x_{n+1})^{-1}$,
$P_{n+1} = P_n - K_{n+1} x_{n+1}^T P_n$,   (7)
$\hat\beta_{n+1} = \hat\beta_n + K_{n+1} \hat\varepsilon_{n+1}$.

Note that (i) we no longer have to store the (potentially large) matrices $X_n$ and
$Y_n$, and (ii) the computationally expensive operation of inverting the matrix $X_n^T X_n$
is replaced with a small number of simpler operations.
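
A minimal Python sketch of the recursion (7) is given below; the initialization $\hat\beta_0 = 0$, $P_0 = I$ follows the slide, and the streaming data are simulated purely for illustration.

import numpy as np

def rls_update(beta, P, x, y):
    """One step of the recursive least squares update (7)."""
    x = x.reshape(-1, 1)                            # column vector x_{n+1}
    eps = y - (x.T @ beta).item()                   # a priori error
    K = P @ x / (1.0 + (x.T @ P @ x).item())        # gain K_{n+1}
    P = P - K @ x.T @ P                             # updated P_{n+1}
    beta = beta + K * eps                           # updated estimate
    return beta, P

# Streaming demo on simulated data.
rng = np.random.default_rng(1)
k = 3
beta_true = np.array([[1.0], [-2.0], [0.5]])
beta, P = np.zeros((k, 1)), np.eye(k)
for _ in range(500):
    x = rng.normal(size=k)
    y = (x @ beta_true).item() + 0.1 * rng.normal()
    beta, P = rls_update(beta, P, x, y)
print(beta.ravel())                                 # close to beta_true

Each update costs only O(k^2) operations and requires storing nothing beyond beta and P.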
We now move on to the main topic of these notes, the Kalman filter and its
generalizations.


State space models

A state space model (SSM) is a time series model in which the time series Yt is
interpreted as the result of a noisy observation of a stochastic process Xt .
The values of the variables Xt and Yt can be continuous (scalar or vector) or
discrete.
Graphically, an SSM is represented as follows:

X1 → X2 → · · · → Xt → · · ·     (state process)
 ↓     ↓            ↓                     (8)
Y1    Y2    · · ·   Yt    · · ·     (observation process)

SSMs belong to the realm of Bayesian inference, and they have been
successfully applied in many fields to solve a broad range of problems.
Our discussion of SSMs follows largely [2].


State space models

It is usually assumed that the state process Xt is Markovian, i.e. Xt depends on
the history only through Xt−1, and Yt depends only on Xt:

$X_t \sim p(X_t | X_{t-1})$,
$Y_t \sim p(Y_t | X_t)$.   (9)

The most well studied SSM is the Kalman filter, for which the processes above
are linear and the sources of randomness are Gaussian.
Namely, a linear state space model has the form:

$X_{t+1} = G X_t + \varepsilon_{t+1}$,
$Y_t = H X_t + \eta_t$.   (10)


State space models

Here, the state vector $X_t \in \mathbb{R}^r$ is possibly unobservable, and it can be observed
only through the observation vector $Y_t \in \mathbb{R}^n$.
The matrices G ∈ Matr (R) and H ∈ Matn,r (R) are assumed to be known. For
example, their values may be given by (economic) theory, or they may have been
obtained through MLE estimation.
In fact, the matrices G and H may depend deterministically on time, i.e. G and H
may be replaced by known matrices Gt and Ht , respectively.
We also assume that the distribution of the initial value X1 is known and
Gaussian.


State space models

The vectors of residuals $\varepsilon_t \in \mathbb{R}^r$ and $\eta_t \in \mathbb{R}^n$ satisfy

$E(\varepsilon_t \varepsilon_s^T) = \delta_{ts} Q$,
$E(\eta_t \eta_s^T) = \delta_{ts} R$,   (11)

where δts denotes Kronecker’s delta, and where Q and R are known positive
definite (covariance) matrices.
We also assume that the components of εt and ηs are independent of each other
for all t and s.
The matrices Q and R may depend deterministically on time.
The first of the equations in (10) is called the state equation, while the second
one is referred to as the observation equation.


Inference for state space models

Let T denote the time horizon.


Our broad goal is to make inference about the states Xt based on a set of
observations Y1 , . . . , Yt .
Three questions are of particular interest:
(i) Filtering: t < T . What can we infer about the current state of the system
based on all available observations?
(ii) Smoothing: t = T . What can be inferred about the system based on the
information contained in the entire data sample? In particular, how can we
back fill missing observations?
(iii) Forecasting: t > T . What is the optimal prediction of a future observation
and / or a future state of the system?


Kalman filter

In principle, any inference for this model can be done using the standard
methods of multivariate statistics.
However, these methods require storing large amounts of data and inverting
tn × tn matrices. Notice that, as new data arrive, the storage requirements and
matrix dimensionality increase.
This is frequently computationally intractable and impractical.
Instead, the Kalman filter relies on a recursive approach which does not require
significant storage resources and involves inverting n × n matrices only.
We will go through a detailed derivation of this recursion.


Kalman filter

The purpose of filtering is to update the knowledge of the system each time a
new observation is made.
We define the one period predictor µt+1 , when the observation Yt is made, and
its covariance Pt+1 :

$\mu_{t+1} = E(X_{t+1} | Y_{1:t})$,
$P_{t+1} = \mathrm{Var}(X_{t+1} | Y_{1:t})$,   (12)

as well as the filtered estimator $\mu_{t|t}$ and its covariance $P_{t|t}$:

$\mu_{t|t} = E(X_t | Y_{1:t})$,
$P_{t|t} = \mathrm{Var}(X_t | Y_{1:t})$.   (13)

Our objective is to compute these quantities recursively.


Kalman filter

We let

$v_t = Y_t - E(Y_t | Y_{1:t-1})$   (14)

denote the one period prediction error or innovation.
In Homework Assignment #6 we show that the $v_t$'s are mutually independent.
Note that

$v_t = Y_t - E(H X_t + \eta_t | Y_{1:t-1}) = Y_t - H \mu_t$.

As a consequence, we have, for $t = 2, 3, \ldots$,

$E(v_t | Y_{1:t-1}) = E(H X_t + \eta_t - H \mu_t | Y_{1:t-1}) = 0$.   (15)


Kalman filter
Now we notice that

$\mathrm{Var}(v_t | Y_{1:t-1}) = \mathrm{Var}(H X_t + \eta_t - H \mu_t | Y_{1:t-1})$
$= \mathrm{Var}(H(X_t - \mu_t) | Y_{1:t-1}) + \mathrm{Var}(\eta_t | Y_{1:t-1})$
$= E(H(X_t - \mu_t)(X_t - \mu_t)^T H^T | Y_{1:t-1}) + E(\eta_t \eta_t^T | Y_{1:t-1})$
$= H P_t H^T + R$.

For convenience we denote

$F_t = \mathrm{Var}(v_t | Y_{1:t-1})$.   (16)

We will assume in the following that the matrix $F_t$ is invertible.
The result of the calculation above can thus be stated as:

$F_t = H P_t H^T + R$.   (17)

This relation allows us to derive a relation between µt and µt|t .


Lemma
First, we will establish the following Lemma. Let $X$ and $Y$ be jointly Gaussian
random vectors with

$E\begin{pmatrix} X \\ Y \end{pmatrix} = \begin{pmatrix} \mu_X \\ \mu_Y \end{pmatrix}$,

and

$\mathrm{Var}\begin{pmatrix} X \\ Y \end{pmatrix} = \begin{pmatrix} \Sigma_{XX} & \Sigma_{XY} \\ \Sigma_{XY}^T & \Sigma_{YY} \end{pmatrix}$.

Then

$E(X | Y) = \mu_X + \Sigma_{XY} \Sigma_{YY}^{-1} (Y - \mu_Y)$,   (18)

and

$\mathrm{Var}(X | Y) = \Sigma_{XX} - \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{XY}^T$.   (19)

Proof: Consider the random variable

$Z = X - \Sigma_{XY} \Sigma_{YY}^{-1} (Y - \mu_Y)$.


Lemma

Since $Z$ is linear in $X$ and $Y$, the vector $(Y, Z)$ is jointly Gaussian.
Furthermore,

$E(Z) = \mu_X$,
$\mathrm{Var}(Z) = E((Z - \mu_X)(Z - \mu_X)^T) = \Sigma_{XX} - \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{XY}^T$.

Finally,

$\mathrm{Cov}(Y, Z) = E(Y (Z - \mu_X)^T) = E(Y (X - \mu_X)^T - Y (Y - \mu_Y)^T \Sigma_{YY}^{-1} \Sigma_{XY}^T) = 0$.

This means that $Z$ and $Y$ are independently distributed!


Lemma

Consequently, $E(Z | Y) = E(Z)$ and $\mathrm{Var}(Z | Y) = \mathrm{Var}(Z)$.
Since

$X = Z + \Sigma_{XY} \Sigma_{YY}^{-1} (Y - \mu_Y)$,

we have

$E(X | Y) = \mu_X + \Sigma_{XY} \Sigma_{YY}^{-1} (Y - \mu_Y)$,

which proves (18).
Also, conditioned on $Y$, $X$ and $Z$ differ by a constant vector, and so

$\mathrm{Var}(X | Y) = \mathrm{Var}(Z | Y) = \mathrm{Var}(Z) = \Sigma_{XX} - \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{XY}^T$,

which proves (19). QED


Kalman filter

Now, going back to the main calculation, we have

$\mu_{t|t} = E(X_t | Y_{1:t}) = E(X_t | Y_{1:t-1}, v_t)$,

and

$\mu_{t+1} = E(X_{t+1} | Y_{1:t}) = E(X_{t+1} | Y_{1:t-1}, v_t)$.

Applying the Lemma to the joint distribution of $X_t$ and $v_t$ conditioned on $Y_{1:t-1}$
yields

$\mu_{t|t} = E(X_t | Y_{1:t-1}) + \mathrm{Cov}(X_t, v_t | Y_{1:t-1}) \mathrm{Var}(v_t | Y_{1:t-1})^{-1} v_t$.   (20)


Kalman filter

Note that

$\mathrm{Cov}(X_t, v_t | Y_{1:t-1}) = E(X_t (H X_t + \eta_t - H \mu_t)^T | Y_{1:t-1})$
$= E(X_t (X_t - \mu_t)^T H^T | Y_{1:t-1})$
$= E((X_t - \mu_t)(X_t - \mu_t)^T | Y_{1:t-1}) H^T$   (21)
$= P_t H^T$,

by the definition (12) of $P_t$.
This allows us to rewrite equation (20) in the form

$\mu_{t|t} = \mu_t + P_t H^T F_t^{-1} v_t$,   (22)

where $F_t$ is defined by (16).


Kalman filter

Next, we conclude from the Lemma that

$\mathrm{Var}(X_t | Y_{1:t}) = \mathrm{Var}(X_t | Y_{1:t-1}, v_t)$
$= \mathrm{Var}(X_t | Y_{1:t-1}) - \mathrm{Cov}(X_t, v_t | Y_{1:t-1}) \mathrm{Var}(v_t | Y_{1:t-1})^{-1} \mathrm{Cov}(X_t, v_t | Y_{1:t-1})^T$.

From the definition (13) of $P_{t|t}$, this can be stated as the following relation:

$P_{t|t} = P_t - P_t H^T F_t^{-1} H P_t = (I - K_t H) P_t$,   (23)

where

$K_t = P_t H^T F_t^{-1}$.   (24)

The matrix $K_t$ is referred to as the Kalman gain.


Kalman filter
Now we are ready to establish recursions for $\mu_t$ and $P_t$. From the state equation
(the first equation in (10)) we have

$\mu_{t+1} = E(G X_t + \varepsilon_{t+1} | Y_{1:t}) = G E(X_t | Y_{1:t}) = G \mu_{t|t}$.   (25)

Furthermore,

$P_{t+1} = \mathrm{Var}(G X_t + \varepsilon_{t+1} | Y_{1:t}) = G \mathrm{Var}(X_t | Y_{1:t}) G^T + \mathrm{Var}(\varepsilon_{t+1} | Y_{1:t})$.   (26)

Substituting (23) and (24) into (26) we find that

$P_{t+1} = G P_{t|t} G^T + Q$.   (27)

Relations (25) and (27) are called the prediction step of the Kalman filter.


Kalman filter
Using this notation, we can write the full system of recursive relations for
updating from t to t + 1 in the following form:

$v_t = Y_t - H \mu_t$,
$F_t = H P_t H^T + R$,
$K_t = P_t H^T F_t^{-1}$,
$\mu_{t|t} = \mu_t + K_t v_t$,   (28)
$P_{t|t} = (I - K_t H) P_t$,
$\mu_{t+1} = G \mu_{t|t}$,
$P_{t+1} = G P_{t|t} G^T + Q$,

for $t = 1, 2, \ldots$.
The initial values µ1 and P1 are assumed to be known (consequently,
X1 ∼ N(µ1 , P1 )).
If the matrices G, H, Q, R depend (deterministically) on t, the formulas above
remain valid with G replaced by Gt , etc.
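
The recursion (28) translates directly into code. Below is a minimal numpy sketch of our own (not code from [2]); it returns the one-step predictors, the filtered estimates, and the innovations with their covariances, which are reused in the smoothing and likelihood sketches further below.

import numpy as np

def kalman_filter(Y, G, H, Q, R, mu1, P1):
    """Kalman filter (28). Y has shape (T, n); returns (mu, P, mu_f, P_f, v, F)."""
    T, n = Y.shape
    r = G.shape[0]
    mu = np.zeros((T + 1, r)); P = np.zeros((T + 1, r, r))   # one-step predictors
    mu_f = np.zeros((T, r));   P_f = np.zeros((T, r, r))     # filtered estimates
    v = np.zeros((T, n));      F = np.zeros((T, n, n))       # innovations and covariances
    mu[0], P[0] = mu1, P1
    for t in range(T):
        v[t] = Y[t] - H @ mu[t]                    # innovation
        F[t] = H @ P[t] @ H.T + R                  # innovation covariance
        K = P[t] @ H.T @ np.linalg.inv(F[t])       # Kalman gain
        mu_f[t] = mu[t] + K @ v[t]                 # filtered mean
        P_f[t] = (np.eye(r) - K @ H) @ P[t]        # filtered covariance
        mu[t + 1] = G @ mu_f[t]                    # prediction step
        P[t + 1] = G @ P_f[t] @ G.T + Q
    return mu, P, mu_f, P_f, v, F

For time dependent matrices one simply passes sequences of matrices and indexes them inside the loop.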


Kalman filter with mean adjustments


It is sometimes necessary to consider a linear SSM with mean adjustments:

$X_{t+1} = G X_t + C_t + \varepsilon_{t+1}$,
$Y_t = H X_t + D_t + \eta_t$,   (29)

where Ct ∈ Rr and Dt ∈ Rn are deterministic (known).


Following the arguments above, we can derive the following Kalman filter for (29):

$v_t = Y_t - H \mu_t - D_t$,
$F_t = H P_t H^T + R$,
$K_t = P_t H^T F_t^{-1}$,
$\mu_{t|t} = \mu_t + K_t v_t$,   (30)
$P_{t|t} = (I - K_t H) P_t$,
$\mu_{t+1} = G \mu_{t|t} + C_t$,
$P_{t+1} = G P_{t|t} G^T + Q$,

for $t = 1, 2, \ldots$.

State smoothing

State smoothing refers to the process of estimation of the values of the states
X1 , . . . , XT , given the entire observation set.
The objective is thus to (recursively) determine the conditional mean

$\hat X_t = E(X_t | Y_{1:T})$   (31)

and the conditional variance

$V_t = \mathrm{Var}(X_t | Y_{1:T})$.   (32)

Since all distributions are normal, $X_t | Y_{1:T} \sim N(\hat X_t, V_t)$.
As before, we assume that $X_1 \sim N(\mu_1, P_1)$ with known $\mu_1$ and $P_1$.


State smoothing

An analysis similar to the derivation of the Kalman filter leads to the following
result (see [2] for the derivation).
The smoothing process consists of two phases:
(i) forward sweep of the Kalman filter (28) for t = 1, . . . , T ,
(ii) backward recursion

$R_{t-1} = H^T F_t^{-1} v_t + L_t^T R_t$,
$N_{t-1} = H^T F_t^{-1} H + L_t^T N_t L_t$,
$\hat X_t = \mu_t + P_t R_{t-1}$,   (33)
$V_t = P_t - P_t N_{t-1} P_t$,

where $L_t = G(I - K_t H)$, for $t = T, T-1, \ldots$, with the terminal conditions
$R_T = 0$ and $N_T = 0$.
This version of the smoothing algorithm is somewhat unintuitive but
computationally efficient.
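
A sketch of the two-pass smoother is shown below; it reuses the kalman_filter function sketched after (28) (our construction, not code from [2]) and implements the backward recursion (33) directly.

import numpy as np

def kalman_smoother(Y, G, H, Q, R, mu1, P1):
    """Forward Kalman pass followed by the backward recursion (33)."""
    mu, P, mu_f, P_f, v, F = kalman_filter(Y, G, H, Q, R, mu1, P1)
    T, r = Y.shape[0], G.shape[0]
    X_hat = np.zeros((T, r)); V = np.zeros((T, r, r))
    R_t = np.zeros(r); N_t = np.zeros((r, r))          # terminal conditions R_T = 0, N_T = 0
    for t in range(T - 1, -1, -1):
        Finv = np.linalg.inv(F[t])
        K = P[t] @ H.T @ Finv                          # Kalman gain at time t
        L = G @ (np.eye(r) - K @ H)                    # L_t = G (I - K_t H)
        R_t = H.T @ Finv @ v[t] + L.T @ R_t            # R_{t-1}
        N_t = H.T @ Finv @ H + L.T @ N_t @ L           # N_{t-1}
        X_hat[t] = mu[t] + P[t] @ R_t                  # smoothed mean
        V[t] = P[t] - P[t] @ N_t @ P[t]                # smoothed covariance
    return X_hat, V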


Forecasting with Kalman filter


The forecasting problem consists in predicting the values of Xt+d and Yt+d given
the observations Y1:t .
As discussed in Lecture Notes #1, the optimal forecasts of the state variable and
observation are given by:

$X^*_{t+d} = E(X_{t+d} | Y_{1:t})$,   (34)

with variance

$P^*_{t+d} = E\big((X_{t+d} - X^*_{t+d})(X_{t+d} - X^*_{t+d})^T | Y_{1:t}\big)$,   (35)

and

$Y^*_{t+d} = E(Y_{t+d} | Y_{1:t})$,   (36)

with variance

$V^*_{t+d} = E\big((Y_{t+d} - Y^*_{t+d})(Y_{t+d} - Y^*_{t+d})^T | Y_{1:t}\big)$,   (37)

respectively.


Forecasting with Kalman filter

The forecast is straightforward for $d = 1$:

$X^*_{t+1} = G \mu_{t|t}$,   (38)

with

$P^*_{t+1} = G P_{t|t} G^T + Q$,   (39)

and

$Y^*_{t+1} = H \mu_{t+1}$,   (40)

with

$V^*_{t+1} = H P_{t+1} H^T + R$.   (41)


Forecasting with Kalman filter

For $d > 1$ we obtain the recursions:

$X^*_{t+d} = G X^*_{t+d-1}$,   (42)

with

$P^*_{t+d} = G P^*_{t+d-1} G^T + Q$,   (43)

and

$Y^*_{t+d} = H X^*_{t+d}$,   (44)

with

$V^*_{t+d} = H P^*_{t+d} H^T + R$.   (45)
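
A short sketch of the recursions (42)-(45); the inputs are the filtered mean and covariance at the forecast origin, and the function name and interface are ours.

import numpy as np

def kalman_forecast(mu_filt, P_filt, G, H, Q, R, d):
    """d-step ahead state and observation forecasts, per (38)-(45)."""
    X_star, P_star = mu_filt, P_filt
    forecasts = []
    for _ in range(d):
        X_star = G @ X_star                  # X*_{t+d} = G X*_{t+d-1}
        P_star = G @ P_star @ G.T + Q        # P*_{t+d} = G P*_{t+d-1} G^T + Q
        Y_star = H @ X_star                  # Y*_{t+d} = H X*_{t+d}
        V_star = H @ P_star @ H.T + R        # V*_{t+d} = H P*_{t+d} H^T + R
        forecasts.append((X_star, P_star, Y_star, V_star))
    return forecasts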


MLE estimation of the parameters

We have left a number of parameters that may not have been specified, namely:
(i) the initial values µ1 and P1 that enter the probability distribution N(µ1 , P1 )
of the state X1 ,
(ii) the matrices G and H,
(iii) the variances Q and R of the disturbances in the state and observation
equations, respectively.
We denote the unspecified model parameters collectively by θ.
If µ1 and P1 are known, the remaining parameters θ can be estimated by means
of MLE.


MLE estimation of the parameters

To this end, we consider the joint probability of the observations:

$p(Y_{1:T} | \theta) = \prod_{t=1}^T p(Y_t | Y_{1:t-1})$,   (46)

where $p(Y_1 | Y_0) = p(Y_1)$.
Hence,

$-\log L(\theta | Y_{1:T}) = \frac{1}{2} \sum_{t=1}^T \big( \log \det(F_t) + v_t^T F_t^{-1} v_t \big) + \text{const}$,   (47)

where $v_t$ denotes the innovation.
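
The quantity (47) is easily computed with one pass of the filter. The sketch below reuses the kalman_filter function from the earlier sketch (our own construction), together with a user-supplied map from θ to the system matrices, and can be handed to a numerical optimizer.

import numpy as np

def neg_log_likelihood(theta, Y, build_model, mu1, P1):
    """Evaluate (47) for a parameter vector theta.

    build_model is a user-supplied function theta -> (G, H, Q, R); the
    parameterization of theta is left to the modeler and is not specified here.
    """
    G, H, Q, R = build_model(theta)
    _, _, _, _, v, F = kalman_filter(Y, G, H, Q, R, mu1, P1)
    nll = 0.0
    for t in range(Y.shape[0]):
        _, logdet = np.linalg.slogdet(F[t])                       # log det(F_t)
        nll += 0.5 * (logdet + v[t] @ np.linalg.solve(F[t], v[t]))
    return nll

# e.g. scipy.optimize.minimize(neg_log_likelihood, theta0,
#                              args=(Y, build_model, mu1, P1), method="BFGS")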


MLE estimation of the parameters

For each t, the value of the log likelihood function is calculated by running the
Kalman filter.
Searching for the minimum of the negative log likelihood (47) using an efficient
algorithm such as BFGS, we find estimates of θ.
In the case of unknown µ1 and P1, one can use the diffuse log likelihood
method, which is discussed in detail in [2].
Alternatively, one can regard µ1 and P1 as hyperparameters of the model.


Nonlinear state space models

The recursion relations defining the Kalman filter can be extended to the case of
more general state space models.
Some of these extensions are straightforward (such as adding constant terms to
the right hand sides of the state and observation equations in (10)), others are
fundamentally more complicated.
In the remainder of this lecture we summarize these extensions following [2].


Nonlinear state space models

In general, a state space model has the form:

X1 → X2 → · · · → Xt → · · ·
 ↓     ↓            ↓                     (48)
Y1    Y2    · · ·   Yt    · · ·

where the state and observed variables may be continuous or discrete.


It is also not required that the distributions of the residuals are Gaussian.
Such models include linear and non-linear Kalman filters, hidden Markov
models, stochastic volatility models, etc.
As in the case of a linear SSM, the filtering problem is to estimate sequentially
the values of the unobserved states Xt , given the values of the observation
process Y1 , · · · , Yt , for any time step t.


Nonlinear state space models


We assume that the states Xt and the observations Yt can be modeled in the
following form.
X1 , X2 , · · · , is a Markov process on Rn that evolves according to the transition
probability density p(Xt | Xt−1 ):

Xt | Xt−1 ∼ p(Xt | Xt−1 ). (49)

On the other hand, Yt depends only on the value of the state variable Xt :

Yt ∼ p(Yt | Xt ). (50)

Usually, these two relations are stated in explicit functional form:

$X_t = G(X_{t-1}, \varepsilon_t)$,
$Y_t = H(X_t, \eta_t)$,   (51)

where εt and ηt are noises.


Stochastic volatility model

An example of a nonlinear SSM is the stochastic volatility model

$X_{t+1} = a + X_t + \varepsilon_{t+1}$,
$Y_t = \exp(X_t)\, \eta_t$,   (52)

where $X_t, Y_t \in \mathbb{R}$, $a \in \mathbb{R}$, $\varepsilon_t \sim N(0, \alpha^2)$, and $\eta_t \sim N(0, 1)$.


This model can be thought of as follows:
(i) the (hidden) state variable Xt drives the stochastic volatility process
σt = exp(Xt ),
(ii) the volatility process is observed through the change Yt = Ft+1 − Ft in the
market observable Ft (such as asset price or forward rate).
One can view this model as a discretized version of a continuous time stochastic
volatility model such as SABR.
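
For concreteness, here is a short simulation of the model (52); the parameter values are purely illustrative and are our choice.

import numpy as np

def simulate_sv(T, a=-0.05, alpha=0.2, x1=-1.0, seed=0):
    """Simulate (X_t, Y_t), t = 1, ..., T, from the stochastic volatility model (52)."""
    rng = np.random.default_rng(seed)
    X = np.zeros(T); Y = np.zeros(T)
    X[0] = x1
    for t in range(T):
        Y[t] = np.exp(X[t]) * rng.normal()               # Y_t = exp(X_t) * eta_t
        if t + 1 < T:
            X[t + 1] = a + X[t] + alpha * rng.normal()   # X_{t+1} = a + X_t + eps_{t+1}
    return X, Y

X, Y = simulate_sv(1000)   # Y plays the role of the observed changes in the market variable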


Extended Kalman filter

The extended Kalman filter (EKF) consists in approximating a nonlinear SSM by


a linear SSM followed by applying the Kalman filter.
Namely, assume that the state and observation equations are given by

$X_{t+1} = G(X_t) + \varepsilon_{t+1}$,
$Y_t = H(X_t) + \eta_t$,   (53)

respectively, where
(i) G(x) and H(x) are differentiable functions on Rr ,
(ii) the disturbances εt and ηt are mutually and serially uncorrelated with
mean zero and covariances Q(Xt ) and R(Xt ), respectively (we do not
require that their distributions are Gaussian),
(iii) X1 has mean µ1 and variance P1 , and is uncorrelated with all noises.


Extended Kalman filter

We denote by $G_t'$ and $H_t'$ the matrices of first derivatives (Jacobi matrices) of
$G(X_t)$ and $H(X_t)$ evaluated at $\mu_{t|t}$ and $\mu_t$, respectively:

$G_t' = \nabla G(X_t)|_{X_t = \mu_{t|t}}$,
$H_t' = \nabla H(X_t)|_{X_t = \mu_t}$.

We now expand the functions $G$, $H$, $Q$ and $R$ in Taylor series to the orders
indicated:

$G(X_t) = G(\mu_{t|t}) + G_t'(X_t - \mu_{t|t}) + \ldots$,
$H(X_t) = H(\mu_t) + H_t'(X_t - \mu_t) + \ldots$,
$Q(X_t) = Q(\mu_{t|t}) + \ldots$,
$R(X_t) = R(\mu_t) + \ldots$,

and disregard the higher order terms denoted by $\ldots$.


Extended Kalman filter


As a result of this approximation we obtain a linear SSM with mean adjustment:

$X_{t+1} = G_t' X_t + G(\mu_{t|t}) - G_t' \mu_{t|t} + \varepsilon_{t+1}$,
$Y_t = H_t' X_t + H(\mu_t) - H_t' \mu_t + \eta_t$.   (54)


Applying the Kalman filter formulas to this SSM we obtain the following EKF
recursion:

$v_t = Y_t - H(\mu_t)$,
$F_t = H_t' P_t H_t'^T + R(\mu_t)$,
$K_t = P_t H_t'^T F_t^{-1}$,
$\mu_{t|t} = \mu_t + K_t v_t$,   (55)
$P_{t|t} = (I - K_t H_t') P_t$,
$\mu_{t+1} = G(\mu_{t|t})$,
$P_{t+1} = G_t' P_{t|t} G_t'^T + Q(\mu_{t|t})$,

for $t = 1, 2, \ldots$.
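
A minimal numpy sketch of the recursion (55) is given below; the functions G, H, their Jacobians, and the covariances Q, R are passed in as callables of the state, which is our assumption about the interface.

import numpy as np

def extended_kalman_filter(Y, G, H, G_jac, H_jac, Q, R, mu1, P1):
    """Extended Kalman filter (55); G, H, G_jac, H_jac, Q, R are callables of the state."""
    T, r = Y.shape[0], mu1.shape[0]
    mu, P = mu1.copy(), P1.copy()
    mu_filt = np.zeros((T, r)); P_filt = np.zeros((T, r, r))
    for t in range(T):
        Hp = H_jac(mu)                               # H'_t evaluated at mu_t
        v = Y[t] - H(mu)                             # innovation
        F = Hp @ P @ Hp.T + R(mu)                    # innovation covariance
        K = P @ Hp.T @ np.linalg.inv(F)              # gain
        mu_f = mu + K @ v                            # filtered mean
        P_f = (np.eye(r) - K @ Hp) @ P               # filtered covariance
        Gp = G_jac(mu_f)                             # G'_t evaluated at mu_{t|t}
        mu = G(mu_f)                                 # predicted mean
        P = Gp @ P_f @ Gp.T + Q(mu_f)                # predicted covariance
        mu_filt[t], P_filt[t] = mu_f, P_f
    return mu_filt, P_filt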

Extended Kalman filter

The EKF works well if the functions G and H are weakly nonlinear; for strongly
nonlinear models its performance may be poor.
Other extensions of the Kalman filter have been developed, including the
unscented Kalman filter (UKF).
It is based on a different principle than the EKF: rather than approximating G and
H by linear expressions, one matches approximately the first and second
moments of a nonlinear function of a Gaussian random variable, see [2] for
details.
Another approach to inference in nonlinear SSMs is via Monte Carlo (MC)
techniques known as particle filters, a.k.a. sequential Monte Carlo.


Particle filtering

Estimation of complex time series models requires evaluation of complex


expected values, often expressed as high dimensional, analytically intractable
integrals.
Particle filters provide a method for calculating such integrals approximately via
carefully crafted MC techniques.
In this approach, a continuous PDF is approximated by a discrete PDF made of
weighted outcomes called particles.
Particle filter algorithms are formulated recursively, very much in the spirit of the
Kalman filter.
They are far reaching generalizations of the Kalman filter to nonlinear,
non-Gaussian SSMs.
Since particle filtering is based on MC methods, its performance or accuracy
does not match that of the Kalman filter.


Nonlinear state space models

The probability distributions in the following depend on some parameters θ. In


order to streamline the notation, we will suppress θ from all the formulas.
All (Bayesian) inference about Xt is encoded in the posterior PDF p(Xt | Y1:t ).
The particle filter methodology provides an approximation of these conditional
probabilities using the empirical measure associated with a sampling algorithm.
The objective of a particle filter is to estimate the posterior PDF of the
(unobserved) state variables given a time series of observations.
Distribution properties of the state variable can be captured by the joint
smoothing distribution, which is defined as

$p(X_{1:t} | Y_{1:t}) = \dfrac{p(X_{1:t}, Y_{1:t})}{p(Y_{1:t})}$.   (56)


Joint smoothing distribution

We derive the following recursion relation for the joint smoothing distribution:

$p(X_{1:t} | Y_{1:t}) = \dfrac{p(Y_t | X_{1:t}, Y_{1:t-1})\, p(X_{1:t}, Y_{1:t-1})}{p(Y_t, Y_{1:t-1})}$
$= \dfrac{p(Y_t | X_{1:t}, Y_{1:t-1})\, p(X_t | X_{1:t-1}, Y_{1:t-1})}{p(Y_t | Y_{1:t-1})}\, p(X_{1:t-1} | Y_{1:t-1})$   (57)
$= \dfrac{p(Y_t | X_t)\, p(X_t | X_{t-1})}{p(Y_t | Y_{1:t-1})}\, p(X_{1:t-1} | Y_{1:t-1})$.

This recursion will be approximated by numerically tractable expressions.


Filtering recursion

An alternative to working directly with the joint smoothing distribution is to find


recursive relations for the one-period predictive and filtering distributions.
This is analogous to the approach we took when deriving the Kalman filter.
Assume that the initial distribution p(X1 ) is known.
The one-period prediction distribution is given by

$p(X_t | Y_{1:t-1}) = \int p(X_t | x_{t-1})\, p(x_{t-1} | Y_{1:t-1})\, dx_{t-1}$.   (58)


Filtering recursion

The filtering distribution is calculated based on the arrival of the new observation
Yt .
Namely, applying Bayes' rule and the fact that Yt depends on Xt only,

$p(X_t | Y_{1:t}) = \dfrac{p(Y_t, X_t | Y_{1:t-1})}{p(Y_t | Y_{1:t-1})}$
$= \dfrac{p(Y_t | X_t, Y_{1:t-1})\, p(X_t | Y_{1:t-1})}{\int p(Y_t | x_t)\, p(x_t | Y_{1:t-1})\, dx_t}$   (59)
$= \dfrac{p(Y_t | X_t)\, p(X_t | Y_{1:t-1})}{\int p(Y_t | x_t)\, p(x_t | Y_{1:t-1})\, dx_t}$.


Filtering recursion

The difficulty with this recursion is clear: there is a complicated integral in the
denominator, which cannot in general be calculated in closed form.
In some special cases this can be done: for example, in the case of a linear
Gaussian state space model, this integral is Gaussian and can be calculated.
The recursion above leads then to the Kalman filter.
Instead of trying to evaluate the integral numerically, we will develop a Monte
Carlo based approach for approximately solving recursions (58) and (59).


Importance sampling

Suppose we are faced with Monte Carlo evaluation of the expected value

$E(f(X_{1:t}) | Y_{1:t}) = \int f(x_{1:t})\, p(x_{1:t} | Y_{1:t})\, dx_{1:t}$.   (60)

The straightforward approach would be to generate a number of samples
$x_{1:t}^j$, $j = 1, \ldots, N$, from the distribution $p(x_{1:t} | Y_{1:t})$, evaluate the integrand $f(x_{1:t}^j)$
on each of these samples, and take the average of these values.
This approach may prove impractical if the density $p(x_{1:t} | Y_{1:t})$ is hard to
simulate from.
Instead, we use the method of importance sampling (IS).
We proceed as follows:


Importance sampling

1. Choose a proposal distribution $g(X_{1:t} | Y_{1:t})$, and write

$E(f(X_{1:t}) | Y_{1:t}) = \int f(x_{1:t})\, \dfrac{p(x_{1:t} | Y_{1:t})}{g(x_{1:t} | Y_{1:t})}\, g(x_{1:t} | Y_{1:t})\, dx_{1:t}$.   (61)

The proposal distribution should be chosen so that it is easy to sample from it.
2. Draw N samples of paths $x_{1:t}^1, \ldots, x_{1:t}^N$ from the proposal distribution, and assign
to each of them a weight proportional to the ratio of the target and proposal
distributions:

$w_t^j \propto \dfrac{p(x_{1:t}^j | Y_{1:t})}{g(x_{1:t}^j | Y_{1:t})}$.   (62)


Importance sampling

3. Given the sample, we define the estimated expected value by

$\hat E_N(f(X_{1:t}) | Y_{1:t}) = \sum_{j=1}^N \hat w_t^j f(x_{1:t}^j)$,   (63)

where the importance weights $\hat w_t^j$, $j = 1, \ldots, N$, are given by

$\hat w_t^j = \dfrac{w_t^j}{\sum_{i=1}^N w_t^i}$.   (64)

The efficiency of IS depends essentially on how closely the proposal distribution
$g(X_{1:t} | Y_{1:t})$ matches the target distribution.
One could, for example, settle on a parametric distribution such as a Gaussian and
fine tune its parameters by minimizing its KL divergence from $p(x_{1:t} | Y_{1:t})$.
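
As a toy (static, non-sequential) illustration of (61)-(64), the sketch below estimates E(X) under a target density N(2, 1) using samples from a wider Gaussian proposal; the densities and numbers are ours and chosen only to make the example self-contained.

import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(0)
N = 100_000
xs = rng.normal(loc=0.0, scale=3.0, size=N)                # draw from the proposal g = N(0, 9)
w = normal_pdf(xs, 2.0, 1.0) / normal_pdf(xs, 0.0, 3.0)    # unnormalized weights (62)
w_hat = w / w.sum()                                        # normalized weights (64)
estimate = np.sum(w_hat * xs)                              # self-normalized IS estimate (63), f(x) = x
print(estimate)                                            # close to 2.0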


Sequential importance sampling


Another serious limitation of IS is that it is computationally very expensive to
j
generate x1:t , and that this cost increases with t.
To mitigate it, the method of sequential importance sampling (SIS) has been
developed.
j
In this approach we retain the previously simulated values x1:t−1 and generate
j
the value of xt only.
j
In order to implement this idea, the samples x1:t are simulated from a sequence
of conditional distributions rather than a joint proposal distribution.
The proposal distribution can be factored into two pieces as follows :

g(X1:t , Y1:t )
g(X1:t | Y1:t ) =
g(Y1:t )
g(Xt | X1:t−1 , Y1:t )g(X1:t−1 , Y1:t )
=
g(Y1:t )
= g(Xt | X1:t−1 , Y1:t )g(X1:t−1 | Y1:t ).


Sequential importance sampling

Once the sample of X1:t−1 has been generated from g(X1:t−1 | Y1:t−1 ), its value
is independent of the observation Yt , and so g(X1:t−1 | Y1:t ) = g(X1:t−1 | Y1:t−1 ).
We can thus write the result of the calculation above as the following recursion:

g(X1:t | Y1:t ) = g(Xt | X1:t−1 , Y1:t )g(X1:t−1 | Y1:t−1 ). (65)

The second factor on the RHS of this equation, g(X1:t−1 | Y1:t−1 ), is the proposal
distribution built out of the paths that have already been generated in the
previous steps.
A new set of samples $x_t^1, \ldots, x_t^N$ is drawn from the first factor $g(X_t | X_{1:t-1}, Y_{1:t})$.
We then append the newly simulated values $x_t^1, \ldots, x_t^N$ to the simulated paths
$x_{1:t-1}^1, \ldots, x_{1:t-1}^N$ of length $t - 1$.
We thus obtain simulated paths $x_{1:t}^1, \ldots, x_{1:t}^N$ of length $t$.


Sequential importance sampling


The weights (62) can be computed as follows. Using (57) and (65),

$w_t^j \propto \dfrac{p(Y_t | x_t^j)\, p(x_t^j | x_{t-1}^j)\, p(x_{1:t-1}^j | Y_{1:t-1})}{p(Y_t | Y_{1:t-1})\, g(x_t^j | x_{1:t-1}^j, Y_{1:t})\, g(x_{1:t-1}^j | Y_{1:t-1})}$
$= \dfrac{p(Y_t | x_t^j)\, p(x_t^j | x_{t-1}^j)}{p(Y_t | Y_{1:t-1})\, g(x_t^j | x_{1:t-1}^j, Y_{1:t})} \times \dfrac{p(x_{1:t-1}^j | Y_{1:t-1})}{g(x_{1:t-1}^j | Y_{1:t-1})}$   (66)
$\propto \dfrac{p(Y_t | x_t^j)\, p(x_t^j | x_{t-1}^j)}{g(x_t^j | x_{1:t-1}^j, Y_{1:t})}\, w_{t-1}^j$
$= \tilde w_t^j\, w_{t-1}^j$,

where the factor $\tilde w_t^j$ is defined by

$\tilde w_t^j = \dfrac{p(Y_t | x_t^j)\, p(x_t^j | x_{t-1}^j)}{g(x_t^j | x_{1:t-1}^j, Y_{1:t})}$.   (67)

This recursion is initialized with $w_1^j = 1$.


Sequential importance sampling

The densities p(Yt | Xt ) and p(Xt | Xt−1 ) are determined by the state and
observation equations (51).
The only quantity that needs to be computed at each iteration is the ratio of
weights $\tilde w_t^j$.
As a result of each iteration, SIS produces N Monte Carlo paths $x_{1:t}^1, \ldots, x_{1:t}^N$
along with the unnormalized importance weights $w_t^1, \ldots, w_t^N$.
These paths are referred to as particles.
We define the normalized weights by

$\hat w_t^j = \dfrac{w_t^j}{\sum_{i=1}^N w_t^i}$.   (68)


Sequential importance sampling


The joint smoothing PDF is estimated as follows:

$\hat p(X_{1:t} | Y_{1:t}) = \sum_{j=1}^N \hat w_t^j\, \delta(X_{1:t} - x_{1:t}^j)$,   (69)

where $\delta$ denotes Dirac's delta function, and so

$\hat E(f(X_{1:t}) | Y_{1:t}) = \sum_{j=1}^N \hat w_t^j f(x_{1:t}^j)$.   (70)

The estimated contribution to the likelihood function at time t is equal to

$\hat p(Y_t | Y_{1:t-1}) = \sum_{j=1}^N \hat w_{t-1}^j\, \tilde w_t^j$.   (71)


Sequential importance sampling with resampling

It has been observed that in practice SIS suffers from the weight degeneracy
problem.
This manifests itself in the rapid increase of the variance of the distribution of the
importance weights as the number of time steps t increases.
As t increases, all the probability density gets eventually allocated to a single
particle.
That particle’s normalized weight converges to one, while the normalized weights
of the other particles converge to zero, and the SIS estimator becomes a
function of a single sample.


Sequential importance sampling with resampling


A remedy to this problem is resampling, a process in which a new population of
particles is replicated from the existing population in proportion to their
normalized importance weights.
This algorithm is called sequential importance sampling with resampling (SISR).
A new population of particles is generated by sampling from the existing
population:
(i) The probability of selecting a particle is proportional to its normalized
importance weight.
(ii) Once the resampled particles are selected, their weights are set equal (to
1/N). This prevents the weights from degenerating as in SIS.
We proceed as follows:
1. Initialize the filter: draw N samples $x_1^j \sim g(X_1)$ and define the weights

$w_1^j = \dfrac{p(x_1^j)}{g(x_1^j)}$.


Sequential importance sampling with resampling

2. For $t = 2, \ldots, T$:
(i) Generate N samples $x_t^j \sim g(X_t | x_{t-1}^j, Y_{1:t})$ and compute the importance
weights

$w_t^j \propto \dfrac{p(Y_t | x_t^j)\, p(x_t^j | x_{t-1}^j)}{g(x_t^j | x_{t-1}^j, Y_{1:t})}\, w_{t-1}^j$.

(ii) Normalize the importance weights:

$\hat w_t^j = \dfrac{w_t^j}{\sum_{i=1}^N w_t^i}$.   (72)

(iii) Resample N particles with probabilities $\hat w_t^1, \ldots, \hat w_t^N$, and define $w_t^j = 1/N$.
After every iteration, once the particles have been generated, quantities of
interest can be estimated.
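
A generic SISR skeleton following steps 1 and 2 above is sketched below; the samplers and densities are user-supplied callables acting on arrays of particles, which is our assumption about the interface rather than anything prescribed in the notes.

import numpy as np

def sisr(Y, N, sample_g1, w1_fn, sample_g, incr_weight, seed=0):
    """Sequential importance sampling with resampling (multinomial resampling each step).

    sample_g1(N)              -> initial particles x_1^j drawn from g(X_1)
    w1_fn(x)                  -> initial weights w_1^j = p(x_1^j) / g(x_1^j)
    sample_g(x_prev, y)       -> particles x_t^j drawn from g(X_t | x_{t-1}^j, Y_{1:t})
    incr_weight(x, x_prev, y) -> p(Y_t | x_t^j) p(x_t^j | x_{t-1}^j) / g(x_t^j | x_{t-1}^j, Y_{1:t})
    """
    rng = np.random.default_rng(seed)
    x = sample_g1(N)                            # step 1
    w = w1_fn(x)
    history = []
    for t in range(1, len(Y)):                  # step 2, t = 2, ..., T
        x_new = sample_g(x, Y[t])               # (i) propagate the particles
        w = incr_weight(x_new, x, Y[t]) * w     # (i) unnormalized weights w_t^j
        w_hat = w / w.sum()                     # (ii) normalize
        history.append((x_new.copy(), w_hat))
        idx = rng.choice(N, size=N, p=w_hat)    # (iii) resample with probabilities w_hat
        x, w = x_new[idx], np.full(N, 1.0 / N)
    return history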


Sequential importance sampling with resampling


The joint smoothing PDF at time t is estimated as follows:

$\hat p_N(X_{1:t} | Y_{1:t}) = \sum_{j=1}^N \hat w_t^j\, \delta(X_{1:t} - x_{1:t}^j)$,   (73)

where the normalized weights are given by (72).
The estimate of the expected value of a function $f(X_{1:t})$ of the path $X_{1:t}$ is given
by

$\hat E_N(f(X_{1:t}) | Y_{1:t}) = \sum_{j=1}^N \hat w_t^j f(x_{1:t}^j)$.   (74)

The contribution to the likelihood function at time t is estimated as follows:

$\hat p_N(Y_t | Y_{1:t-1}) \approx \int p(Y_t | x_t)\, p(x_t | Y_{1:t-1})\, dx_t \approx \dfrac{1}{N} \sum_{j=1}^N \tilde w_t^j$.   (75)


Bootstrap filter

The efficacy of the algorithms presented above depends on the choice of the
proposal distribution.
The simplest choice of the proposal distribution is

$g(X_t | X_{t-1}, Y_t) = p(X_t | X_{t-1})$.   (76)

This choice is called the prior kernel, and the corresponding particle filter is
called the bootstrap filter.
The bootstrap filter resamples by setting the incremental weight ratios equal to
$\tilde w_t = p(Y_t | X_t)$.
The prior kernel is an example of a blind proposal: it does not use the current
observation Yt .
Despite this, the bootstrap filter performs well in a number of situations.
Another popular version is the auxiliary particle filter, see [1] and [2].
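
As an illustration (our construction, not taken from the slides), here is a bootstrap filter for the stochastic volatility model (52): particles are propagated with the state equation and reweighted by p(Y_t | X_t), which is the N(0, exp(2 X_t)) density.

import numpy as np

def bootstrap_filter_sv(Y, N=2000, a=-0.05, alpha=0.2, mu1=-1.0, s1=0.5, seed=0):
    """Bootstrap particle filter for (52); returns estimates of E(X_t | Y_{1:t})."""
    rng = np.random.default_rng(seed)
    x = mu1 + s1 * rng.normal(size=N)                    # particles for X_1 (assumed Gaussian prior)
    filt_means = np.zeros(len(Y))
    for t in range(len(Y)):
        if t > 0:
            x = a + x + alpha * rng.normal(size=N)       # prior kernel: propagate with the state equation
        logw = -x - 0.5 * Y[t] ** 2 * np.exp(-2.0 * x)   # log p(Y_t | X_t) up to an additive constant
        w = np.exp(logw - logw.max()); w /= w.sum()      # normalized weights
        filt_means[t] = np.sum(w * x)                    # estimate of E(X_t | Y_{1:t})
        x = x[rng.choice(N, size=N, p=w)]                # resample
    return filt_means

Run on data from the simulate_sv sketch above, filt_means tracks the hidden log-volatility path X_t.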


References

[1] Creal, D.: A survey of sequential Monte Carlo methods for economics and
finance, Econometric Reviews, 31, 245-296 (2012).
[2] Durbin, J., and Koopman, S. J.: Time Series Analysis by State Space
Methods, Oxford University Press (2012).
