Kalman Filter
Nonlinear State Space Models
Particle Filtering
Andrew Lesniewski
Baruch College
New York
Fall 2019
Outline
1 Warm-up: Recursive Least Squares
2 Kalman Filter
3 Nonlinear State Space Models
4 Particle Filtering
OLS regression
As a motivation for the remainder of this lecture, we consider the standard linear model

Y = X^T β + ε,   (1)

where Y ∈ R, X ∈ R^k, and ε ∈ R is noise (this includes the model with an intercept as a special case, in which the first component of X is assumed to be 1).
Given n observations x_1, …, x_n and y_1, …, y_n of X and Y, respectively, the ordinary least squares (OLS) regression leads to the following estimated value of the coefficient β:

β̂_n = (X_n^T X_n)^{-1} X_n^T Y_n,   (2)

where X_n denotes the n × k matrix with rows x_1^T, …, x_n^T, and Y_n denotes the vector (y_1, …, y_n)^T, respectively.
Suppose now that X and Y consists of a streaming set of data, and each new
observation leads to an updated value of the estimated β.
It is not optimal to redo the entire calculation above after a new observation
arrives in order to find the updated value.
Instead, we shall derive a recursive algorithm that stores the last calculated
value of the estimated β, and updates it to incorporate the impact of the latest
observation.
To this end, we introduce the notation:

P_n = (X_n^T X_n)^{-1},
B_n = X_n^T Y_n,   (4)

so that β̂_n = P_n B_n. A new observation (x_{n+1}, y_{n+1}) updates these quantities as follows:

P_{n+1}^{-1} = P_n^{-1} + x_{n+1} x_{n+1}^T,
B_{n+1} = B_n + x_{n+1} y_{n+1}.   (5)
We now invoke the matrix inversion lemma (the Sherman-Morrison-Woodbury formula),

(A + BD)^{-1} = A^{-1} − A^{-1} B (I + D A^{-1} B)^{-1} D A^{-1},   (6)

valid for a square invertible matrix A, and matrices B and D such that the operations above are defined.
This yields the following relation:

P_{n+1} = P_n − P_n x_{n+1} (1 + x_{n+1}^T P_n x_{n+1})^{-1} x_{n+1}^T P_n
        = P_n − K_{n+1} x_{n+1}^T P_n,

where

K_{n+1} = P_n x_{n+1} (1 + x_{n+1}^T P_n x_{n+1})^{-1}.
Next, we define the prediction error

ε̂_{n+1} = y_{n+1} − x_{n+1}^T β̂_n.

Since y_{n+1} = x_{n+1}^T β̂_n + ε̂_{n+1}, the update of B_n can be written as

B_{n+1} = B_n + x_{n+1} x_{n+1}^T β̂_n + x_{n+1} ε̂_{n+1}.

Consequently,

P_{n+1}^{-1} β̂_{n+1} = P_n^{-1} β̂_n + x_{n+1} x_{n+1}^T β̂_n + x_{n+1} ε̂_{n+1}
                      = (P_n^{-1} + x_{n+1} x_{n+1}^T) β̂_n + x_{n+1} ε̂_{n+1}
                      = P_{n+1}^{-1} β̂_n + x_{n+1} ε̂_{n+1}.
In other words,

β̂_{n+1} = β̂_n + P_{n+1} x_{n+1} ε̂_{n+1}.

However, from the definition of K_{n+1}, P_{n+1} x_{n+1} = K_{n+1}, and so

β̂_{n+1} = β̂_n + K_{n+1} ε̂_{n+1}.
The algorithm can be summarized as follows. We initialize β̂_0 (e.g. at 0) and P_0 (e.g. at I), and iterate:

ε̂_{n+1} = y_{n+1} − x_{n+1}^T β̂_n,
K_{n+1} = P_n x_{n+1} (1 + x_{n+1}^T P_n x_{n+1})^{-1},
P_{n+1} = P_n − K_{n+1} x_{n+1}^T P_n,   (7)
β̂_{n+1} = β̂_n + K_{n+1} ε̂_{n+1}.
Note that (i) we no longer have to store the (potentially large) matrices X_n and Y_n, and (ii) the computationally expensive operation of inverting the matrix X_n^T X_n is replaced with a small number of simpler operations (the only "inversion" required is that of the scalar 1 + x_{n+1}^T P_n x_{n+1}).
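The recursion (7) translates directly into a few lines of code. The following is a minimal numpy sketch (the function and variable names are ours, not part of the lecture); it processes a synthetic stream of observations and recovers the true coefficients.

```python
import numpy as np

def rls_update(beta, P, x, y):
    """One step of the recursive least squares recursion (7)."""
    eps = y - x @ beta                    # prediction error
    K = P @ x / (1.0 + x @ P @ x)         # gain vector K_{n+1}
    P_new = P - np.outer(K, x @ P)        # P_{n+1} = P_n - K_{n+1} x^T P_n
    beta_new = beta + K * eps             # updated coefficient estimate
    return beta_new, P_new

# Streaming example: recover beta_true from noisy observations.
rng = np.random.default_rng(0)
beta_true = np.array([1.0, -2.0, 0.5])
beta, P = np.zeros(3), 1e6 * np.eye(3)    # diffuse initialization of P_0
for _ in range(500):
    x = rng.normal(size=3)
    y = x @ beta_true + 0.1 * rng.normal()
    beta, P = rls_update(beta, P, x, y)
```

Note that only β̂ and the k × k matrix P are stored between observations, exactly as promised by point (i) above.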
We now move on to the main topic of these notes, the Kalman filter and its
generalizations.
A state space model (SSM) is a time series model in which the time series Yt is
interpreted as the result of a noisy observation of a stochastic process Xt .
The values of the variables Xt and Yt can be continuous (scalar or vector) or
discrete.
Graphically, an SSM is represented as a chain of hidden states X_1 → X_2 → ⋯ → X_t → ⋯, with an arrow from each state X_t down to its observation Y_t.
SSMs belong to the realm of Bayesian inference, and they have been
successfully applied in many fields to solve a broad range of problems.
Our discussion of SSMs follows largely [2].
The model is specified by the transition and observation densities:

X_t ∼ p(X_t | X_{t−1}),
Y_t ∼ p(Y_t | X_t).   (9)
The most well studied SSM is the Kalman filter, for which the processes above are linear and the sources of randomness are Gaussian. Namely, a linear state space model has the form:

X_{t+1} = G X_t + ε_{t+1},
Y_t = H X_t + η_t,   (10)

where the disturbances are Gaussian with E(ε_t) = E(η_t) = 0, E(ε_t ε_s^T) = δ_{ts} Q, and E(η_t η_s^T) = δ_{ts} R. Here δ_{ts} denotes Kronecker's delta, and Q and R are known positive definite (covariance) matrices. We also assume that the components of ε_t and η_s are independent of each other for all t and s.
The matrices Q and R may depend deterministically on time.
The first of the equations in (10) is called the state equation, while the second
one is referred to as the observation equation.
Kalman filter
In principle, any inference for this model can be done using the standard
methods of multivariate statistics.
However, these methods require storing large amounts of data and inverting
tn × tn matrices. Notice that, as new data arrive, the storage requirements and
matrix dimensionality increase.
This is frequently computationally intractable and impractical.
Instead, the Kalman filter relies on a recursive approach which does not require
significant storage resources and involves inverting n × n matrices only.
We will go through a detailed derivation of this recursion.
The purpose of filtering is to update the knowledge of the system each time a
new observation is made.
We define the one period predictor µ_{t+1} and its covariance P_{t+1}, based on the observations up to time t:

µ_{t+1} = E(X_{t+1} | Y_{1:t}),
P_{t+1} = Var(X_{t+1} | Y_{1:t}),   (12)

as well as the filtered estimates

µ_{t|t} = E(X_t | Y_{1:t}),
P_{t|t} = Var(X_t | Y_{1:t}).   (13)
We let

v_t = Y_t − E(Y_t | Y_{1:t−1})   (14)

denote the one period prediction error, or innovation. In Homework Assignment #6 we show that the v_t's are mutually independent. Note that

v_t = Y_t − E(H X_t + η_t | Y_{1:t−1})
    = Y_t − H µ_t.
Now we notice that the variance F_t = Var(v_t | Y_{1:t−1}) of the innovation is given by

F_t = H P_t H^T + R.   (17)
Lemma
First, we will establish the following Lemma. Let X and Y be jointly Gaussian random vectors with

E(X) = µ_X,   E(Y) = µ_Y,

and covariance structure

Var(X) = Σ_XX,   Cov(X, Y) = Σ_XY,   Var(Y) = Σ_YY.

Then

E(X | Y) = µ_X + Σ_XY Σ_YY^{-1} (Y − µ_Y),   (18)

and

Var(X | Y) = Σ_XX − Σ_XY Σ_YY^{-1} Σ_XY^T.   (19)
Proof: Consider the random variable

Z = X − Σ_XY Σ_YY^{-1} (Y − µ_Y).

A direct calculation gives

E(Z) = µ_X,
Var(Z) = E((Z − µ_X)(Z − µ_X)^T)
       = Σ_XX − Σ_XY Σ_YY^{-1} Σ_XY^T.

Finally,

Cov(Y, Z) = E(Y (Z − µ_X)^T)
          = E(Y (X − µ_X)^T) − E(Y (Y − µ_Y)^T) Σ_YY^{-1} Σ_XY^T
          = 0.
Since Y and Z are jointly Gaussian and uncorrelated, they are independent. Hence E(Z | Y) = E(Z) = µ_X, and we have

E(X | Y) = µ_X + Σ_XY Σ_YY^{-1} (Y − µ_Y),

which proves (18). Also, conditioned on Y, X and Z differ by a constant vector, and so

Var(X | Y) = Var(Z | Y)
           = Var(Z)
           = Σ_XX − Σ_XY Σ_YY^{-1} Σ_XY^T,

which proves (19).
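As a quick numerical sanity check of (18) and (19), one can simulate a jointly Gaussian pair and verify that Z is indeed uncorrelated with Y, and that Var(Z) matches the conditional variance. The covariance values below are illustrative, not from the lecture.

```python
import numpy as np

# Illustrative scalar case: Var(X) = 2.0, Cov(X, Y) = 0.6, Var(Y) = 1.5.
S_XX, S_XY, S_YY = 2.0, 0.6, 1.5
mu_X, mu_Y = 1.0, -0.5

# Conditional variance from (19); for scalars, Sigma_XY^T = Sigma_XY.
cond_var = S_XX - S_XY ** 2 / S_YY

# Simulate jointly Gaussian (X, Y) and form Z = X - S_XY S_YY^{-1} (Y - mu_Y).
rng = np.random.default_rng(2)
L = np.linalg.cholesky(np.array([[S_XX, S_XY], [S_XY, S_YY]]))
xy = np.array([mu_X, mu_Y]) + rng.normal(size=(100_000, 2)) @ L.T
Z = xy[:, 0] - S_XY / S_YY * (xy[:, 1] - mu_Y)

# Cov(Y, Z) should be close to zero, and Var(Z) close to cond_var.
cov_YZ = np.cov(Z, xy[:, 1])[0, 1]
```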
We now apply the Lemma to X_t and v_t, conditionally on Y_{1:t−1}. Since v_t = H(X_t − µ_t) + η_t and η_t is independent of X_t, note that

Cov(X_t, v_t | Y_{1:t−1}) = E(X_t (X_t − µ_t)^T H^T | Y_{1:t−1})
                          = E((X_t − µ_t)(X_t − µ_t)^T | Y_{1:t−1}) H^T   (21)
                          = P_t H^T,
The Lemma then yields

µ_{t|t} = µ_t + P_t H^T F_t^{-1} v_t,
Var(X_t | Y_{1:t}) = P_t − P_t H^T F_t^{-1} H P_t.

From the definition (13) of P_{t|t}, this can be stated as the following relation:

µ_{t|t} = µ_t + K_t v_t,
P_{t|t} = (I − K_t H) P_t,

where

K_t = P_t H^T F_t^{-1}.   (24)

The matrix K_t is referred to as the Kalman gain.
Now we are ready to establish recursions for µ_t and P_t. From the state equation (the first equation in (10)) we have

µ_{t+1} = E(G X_t + ε_{t+1} | Y_{1:t})
        = G µ_{t|t}.   (25)

Furthermore,

P_{t+1} = Var(G X_t + ε_{t+1} | Y_{1:t})
        = G P_{t|t} G^T + Q.   (26)

Relations (25) and (26) are called the prediction step of the Kalman filter.
Using this notation, we can write the full system of recursive relations for updating from t to t + 1 in the following form:

v_t = Y_t − H µ_t,
F_t = H P_t H^T + R,
K_t = P_t H^T F_t^{-1},
µ_{t|t} = µ_t + K_t v_t,   (28)
P_{t|t} = (I − K_t H) P_t,
µ_{t+1} = G µ_{t|t},
P_{t+1} = G P_{t|t} G^T + Q,

for t = 1, 2, ….
The initial values µ1 and P1 are assumed to be known (consequently,
X1 ∼ N(µ1 , P1 )).
If the matrices G, H, Q, R depend (deterministically) on t, the formulas above
remain valid with G replaced by Gt , etc.
In particular, if constant terms are added to the right hand sides, X_{t+1} = G X_t + C_t + ε_{t+1} and Y_t = H X_t + D_t + η_t, the recursion becomes:

v_t = Y_t − H µ_t − D_t,
F_t = H P_t H^T + R,
K_t = P_t H^T F_t^{-1},
µ_{t|t} = µ_t + K_t v_t,   (30)
P_{t|t} = (I − K_t H) P_t,
µ_{t+1} = G µ_{t|t} + C_t,
P_{t+1} = G P_{t|t} G^T + Q,

for t = 1, 2, ….
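The recursion (28) is straightforward to implement. Below is a minimal numpy sketch of one full iteration (the function name and the illustrative AR(1) parameters are ours, not part of the lecture):

```python
import numpy as np

def kalman_step(mu, P, y, G, H, Q, R):
    """One iteration of the Kalman recursion (28): filtering step, then prediction step."""
    v = y - H @ mu                               # innovation v_t
    F = H @ P @ H.T + R                          # innovation variance F_t
    K = P @ H.T @ np.linalg.inv(F)               # Kalman gain K_t
    mu_filt = mu + K @ v                         # filtered mean mu_{t|t}
    P_filt = (np.eye(len(mu)) - K @ H) @ P       # filtered covariance P_{t|t}
    mu_next = G @ mu_filt                        # one period predictor mu_{t+1}
    P_next = G @ P_filt @ G.T + Q                # predictor covariance P_{t+1}
    return mu_next, P_next, v, F

# A scalar AR(1) state observed with noise (illustrative parameters).
G, H = np.array([[0.9]]), np.array([[1.0]])
Q, R = np.array([[0.1]]), np.array([[0.5]])
mu, P = np.array([0.0]), np.array([[1.0]])
for y in [0.3, -0.1, 0.4]:
    mu, P, v, F = kalman_step(mu, P, np.array([y]), G, H, Q, R)
```

Note that only the current µ and P are carried from step to step, and only the n × n matrix F is inverted, in line with the storage and cost remarks above.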
A. Lesniewski Time Series Analysis
State smoothing
State smoothing refers to the process of estimation of the values of the states
X1 , . . . , XT , given the entire observation set.
The objective is thus to (recursively) determine the conditional mean

X̂_t = E(X_t | Y_{1:T}).   (31)
An analysis similar to the derivation of the Kalman filter leads to the following result (see [2] for the derivation). The smoothing process consists of two phases:
(i) a forward sweep of the Kalman filter (28) for t = 1, …, T,
(ii) a backward recursion running from t = T down to t = 1; the explicit formulas are given in [2].
Forecasting
Forecasting d steps ahead amounts to computing the predictor

X*_{t+d} = E(X_{t+d} | Y_{1:t}),

with variance

P*_{t+d} = E((X_{t+d} − X*_{t+d})(X_{t+d} − X*_{t+d})^T | Y_{1:t}),   (35)

and

Y*_{t+d} = E(Y_{t+d} | Y_{1:t}),   (36)

with variance

V*_{t+d} = E((Y_{t+d} − Y*_{t+d})(Y_{t+d} − Y*_{t+d})^T | Y_{1:t}),   (37)

respectively.
For d = 1, these quantities are given by the prediction step of the Kalman filter:

X*_{t+1} = G µ_{t|t},   (38)
P*_{t+1} = G P_{t|t} G^T + Q,   (39)
Y*_{t+1} = H µ_{t+1},   (40)
V*_{t+1} = H P*_{t+1} H^T + R.   (41)

For d > 1, we iterate:

X*_{t+d} = G X*_{t+d−1},   (42)
P*_{t+d} = G P*_{t+d−1} G^T + Q,   (43)
Y*_{t+d} = H X*_{t+d},   (44)
V*_{t+d} = H P*_{t+d} H^T + R.   (45)
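Formulas (38)-(45) can be sketched as a short helper that returns, for each horizon d, the tuple (X*, P*, Y*, V*). The function name and the illustrative parameters are ours.

```python
import numpy as np

def forecast(mu_filt, P_filt, G, H, Q, R, d):
    """d-step ahead forecasts: start from (38)-(39), then iterate (42)-(45)."""
    X, Pf = G @ mu_filt, G @ P_filt @ G.T + Q        # (38)-(39)
    out = []
    for _ in range(d):
        Y, V = H @ X, H @ Pf @ H.T + R               # (44)-(45); also (40)-(41) for d = 1
        out.append((X, Pf, Y, V))
        X, Pf = G @ X, G @ Pf @ G.T + Q              # (42)-(43)
    return out

# Illustrative scalar model.
G, H = np.array([[0.5]]), np.array([[1.0]])
Q, R = np.array([[0.1]]), np.array([[0.2]])
preds = forecast(np.array([1.0]), np.array([[1.0]]), G, H, Q, R, d=3)
```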
So far we have left a number of model parameters unspecified, namely:
(i) the initial values µ1 and P1 that enter the probability distribution N(µ1 , P1 )
of the state X1 ,
(ii) the matrices G and H,
(iii) the variances Q and R of the disturbances in the state and observation
equations, respectively.
We denote the unspecified model parameters collectively by θ.
If µ1 and P1 are known, the remaining parameters θ can be estimated by means
of MLE.
The joint density of the observations factorizes as

p(Y_{1:T} | θ) = ∏_{t=1}^T p(Y_t | Y_{1:t−1}),   (46)

and, since v_t | Y_{1:t−1} ∼ N(0, F_t), the negative log likelihood takes the form

−log L(θ | Y_{1:T}) = (1/2) ∑_{t=1}^T [ log det(F_t) + v_t^T F_t^{-1} v_t ] + const.   (47)
For each t, the quantities v_t and F_t entering the log likelihood function are calculated by running the Kalman filter. Searching for the minimum of the negative log likelihood (47) using an efficient algorithm such as BFGS, we find estimates of θ.
In the case of unknown µ1 and P1 , one can use the diffuse log likelihood
method, which is discussed in detail in [2].
Alternatively, one can regard µ_1 and P_1 as hyperparameters of the model.
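As an illustration of (46)-(47), the sketch below evaluates the negative log likelihood of the scalar model X_{t+1} = g X_t + ε_{t+1}, Y_t = X_t + η_t by running the Kalman filter, and minimizes it over θ = (g, q). The parameterization and all names are our own choices for the example; in practice one would estimate R and the remaining parameters as well.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(theta, ys, mu1=0.0, P1=1.0, r=0.5):
    """Negative log likelihood (47) for X_{t+1} = g X_t + eps (Var q), Y_t = X_t + eta (Var r)."""
    g, q = theta
    mu, P, nll = mu1, P1, 0.0
    for y in ys:
        v = y - mu                       # innovation v_t
        F = P + r                        # innovation variance F_t
        nll += 0.5 * (np.log(F) + v * v / F)
        K = P / F                        # Kalman gain
        mu_f, P_f = mu + K * v, (1 - K) * P
        mu, P = g * mu_f, g * g * P_f + q
    return nll

# Simulate data from g = 0.8, q = 0.2, r = 0.5 and estimate theta.
rng = np.random.default_rng(1)
x, ys = 0.0, []
for _ in range(300):
    x = 0.8 * x + rng.normal(scale=np.sqrt(0.2))
    ys.append(x + rng.normal(scale=np.sqrt(0.5)))
res = minimize(neg_log_lik, x0=[0.5, 0.5], args=(np.array(ys),),
               method="L-BFGS-B", bounds=[(-0.99, 0.99), (1e-4, 5.0)])
```

The text suggests BFGS; we use L-BFGS-B here only so that stationarity and positivity bounds on (g, q) can be imposed.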
The recursion relations defining the Kalman filter can be extended to the case of
more general state space models.
Some of these extensions are straightforward (such as adding constant terms to
the right hand sides of the state and observation equations in (10)), others are
fundamentally more complicated.
In the remainder of this lecture we summarize these extensions following [2].
X_1 → X_2 → ⋯ → X_t → ⋯
 ↓     ↓          ↓            (48)
Y_1    Y_2   ⋯    Y_t   ⋯
The state process is Markov: the distribution of X_t depends only on X_{t−1},

X_t ∼ p(X_t | X_{t−1}).   (49)

On the other hand, Y_t depends only on the value of the state variable X_t:

Y_t ∼ p(Y_t | X_t).   (50)
In general, the state and observation equations take the form

X_t = G(X_{t−1}, ε_t),
Y_t = H(X_t, η_t),   (51)

an example of which is the stochastic volatility model

X_{t+1} = a + X_t + ε_{t+1},
Y_t = exp(X_t) η_t,   (52)

where
(i) G(x) and H(x) are differentiable functions on R^r,
(ii) the disturbances ε_t and η_t are mutually and serially uncorrelated with mean zero and covariances Q(X_t) and R(X_t), respectively (we do not require that their distributions are Gaussian),
(iii) X_1 has mean µ_1 and variance P_1, and is uncorrelated with all noises.
We denote by G′_t and H′_t the matrices of first derivatives (Jacobi matrices) of G and H, evaluated at µ_{t|t} and µ_t, respectively. We now expand G and H in Taylor series to first order around these points,

G(X_t) ≈ G(µ_{t|t}) + G′_t (X_t − µ_{t|t}),
H(X_t) ≈ H(µ_t) + H′_t (X_t − µ_t),

and Q and R to zeroth order, Q(X_t) ≈ Q(µ_{t|t}), R(X_t) ≈ R(µ_t).
Applying the Kalman filter formulas to this SSM, we obtain the following EKF recursion:

v_t = Y_t − H(µ_t),
F_t = H′_t P_t H′_t^T + R(µ_t),
K_t = P_t H′_t^T F_t^{-1},
µ_{t|t} = µ_t + K_t v_t,   (55)
P_{t|t} = (I − K_t H′_t) P_t,
µ_{t+1} = G(µ_{t|t}),
P_{t+1} = G′_t P_{t|t} G′_t^T + Q(µ_{t|t}),

for t = 1, 2, ….
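One iteration of (55) can be sketched as follows. The function accepts G, H and their Jacobians as callables; the scalar model used in the usage example (sin dynamics, squared observation) is purely illustrative and not from the lecture.

```python
import numpy as np

def ekf_step(mu, P, y, G, G_jac, H, H_jac, Q, R):
    """One iteration of the EKF recursion (55); G, H are functions, *_jac their Jacobians."""
    Hp = H_jac(mu)                               # H'_t, evaluated at the predictor mu_t
    v = y - H(mu)                                # innovation
    F = Hp @ P @ Hp.T + R(mu)                    # innovation variance
    K = P @ Hp.T @ np.linalg.inv(F)              # gain
    mu_f = mu + K @ v                            # filtered mean mu_{t|t}
    P_f = (np.eye(len(mu)) - K @ Hp) @ P         # filtered covariance P_{t|t}
    Gp = G_jac(mu_f)                             # G'_t, evaluated at the filtered mean
    return G(mu_f), Gp @ P_f @ Gp.T + Q(mu_f), v, F

# Illustrative scalar model: X_{t+1} = sin(X_t) + eps, Y_t = X_t^2 + eta.
G  = lambda x: np.sin(x);         G_jac = lambda x: np.diag(np.cos(x))
H  = lambda x: x ** 2;            H_jac = lambda x: np.diag(2 * x)
Q  = lambda x: np.array([[0.1]]); R = lambda x: np.array([[0.2]])
mu, P = np.array([0.5]), np.array([[1.0]])
for y in [0.4, 0.2, 0.6]:
    mu, P, v, F = ekf_step(mu, P, np.array([y]), G, G_jac, H, H_jac, Q, R)
```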
The EKF works well if the functions G and H are weakly nonlinear; for strongly nonlinear models its performance may be poor.
Other extensions of the Kalman filter have been developed, including the
unscented Kalman filter (UKF).
It is based on a different principle than the EKF: rather than approximating G and
H by linear expressions, one matches approximately the first and second
moments of a nonlinear function of a Gaussian random variable, see [2] for
details.
Another approach to inference in nonlinear SSMs is via Monte Carlo (MC) techniques known as particle filters, a.k.a. sequential Monte Carlo (SMC).
Particle filtering
The object of interest is the joint smoothing distribution of the states given the observations,

p(X_{1:t} | Y_{1:t}) = p(X_{1:t}, Y_{1:t}) / p(Y_{1:t}).   (56)
We derive the following recursion relation for the joint smoothing distribution:

p(X_{1:t} | Y_{1:t}) = p(X_{1:t−1} | Y_{1:t−1}) p(Y_t | X_t) p(X_t | X_{t−1}) / p(Y_t | Y_{1:t−1}).   (58)

Filtering recursion
The filtering distribution is calculated based on the arrival of the new observation
Yt .
Namely, applying Bayes' rule and the fact that Y_t depends on X_t only,

p(X_t | Y_{1:t}) = p(Y_t, X_t | Y_{1:t−1}) / p(Y_t | Y_{1:t−1})
                 = p(Y_t | X_t, Y_{1:t−1}) p(X_t | Y_{1:t−1}) / ∫ p(Y_t | x_t) p(x_t | Y_{1:t−1}) dx_t   (59)
                 = p(Y_t | X_t) p(X_t | Y_{1:t−1}) / ∫ p(Y_t | x_t) p(x_t | Y_{1:t−1}) dx_t.
The difficulty with this recursion is clear: there is a complicated integral in the
denominator, which cannot in general be calculated in closed form.
In some special cases this can be done: for example, in the case of a linear
Gaussian state space model, this integral is Gaussian and can be calculated.
The recursion above leads then to the Kalman filter.
Instead of trying to evaluate the integral numerically, we will develop a Monte
Carlo based approach for approximately solving recursions (58) and (59).
Importance sampling
Suppose we are faced with Monte Carlo evaluation of the expected value

E(f(X_{1:t}) | Y_{1:t}) = ∫ f(x_{1:t}) p(x_{1:t} | Y_{1:t}) dx_{1:t}.   (60)
The idea of importance sampling is to rewrite this integral in terms of a proposal distribution g:

E(f(X_{1:t}) | Y_{1:t}) = ∫ f(x_{1:t}) [p(x_{1:t} | Y_{1:t}) / g(x_{1:t} | Y_{1:t})] g(x_{1:t} | Y_{1:t}) dx_{1:t}.   (61)
The proposal distribution should be chosen so that it is easy to sample from it.
The algorithm proceeds as follows:
1. Choose a proposal distribution g(x_{1:t} | Y_{1:t}).
2. Draw N samples of paths x_{1:t}^1, …, x_{1:t}^N from the proposal distribution, and assign to each of them a weight proportional to the ratio of the target and proposal distributions:

w_t^j ∝ p(x_{1:t}^j | Y_{1:t}) / g(x_{1:t}^j | Y_{1:t}).   (62)
The expectation (60) is then approximated by

Ê_N(f(X_{1:t}) | Y_{1:t}) = ∑_{j=1}^N ŵ_t^j f(x_{1:t}^j),   (63)

where the normalized weights are given by

ŵ_t^j = w_t^j / ∑_{i=1}^N w_t^i.   (64)
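Steps (62)-(64) are easy to demonstrate on a toy example: estimate E(X) = 1 for a N(1, 1) target using samples from a N(0, 4) proposal. The densities and constants below are our own illustration.

```python
import numpy as np

# Self-normalized importance sampling, as in (62)-(64).
rng = np.random.default_rng(3)
N = 200_000
xs = rng.normal(0.0, 2.0, size=N)                # draws from the proposal g = N(0, 4)
log_p = -0.5 * (xs - 1.0) ** 2                   # unnormalized log target, N(1, 1)
log_g = -0.5 * (xs / 2.0) ** 2                   # unnormalized log proposal
w = np.exp(log_p - log_g)                        # unnormalized weights, cf. (62)
w_hat = w / w.sum()                              # normalized weights, cf. (64)
est = np.sum(w_hat * xs)                         # estimate of E(X) = 1, cf. (63)
```

Because the weights are only known up to a constant, the normalization (64) is what makes the estimator usable without knowing p(Y_{1:t}).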
We choose a proposal distribution which factorizes sequentially:

g(X_{1:t} | Y_{1:t}) = g(X_{1:t}, Y_{1:t}) / g(Y_{1:t})
                     = g(X_t | X_{1:t−1}, Y_{1:t}) g(X_{1:t−1}, Y_{1:t}) / g(Y_{1:t})
                     = g(X_t | X_{1:t−1}, Y_{1:t}) g(X_{1:t−1} | Y_{1:t}).
Once the sample of X1:t−1 has been generated from g(X1:t−1 | Y1:t−1 ), its value
is independent of the observation Yt , and so g(X1:t−1 | Y1:t ) = g(X1:t−1 | Y1:t−1 ).
We can thus write the result of the calculation above as the following recursion:

g(X_{1:t} | Y_{1:t}) = g(X_t | X_{1:t−1}, Y_{1:t}) g(X_{1:t−1} | Y_{1:t−1}).
The second factor on the RHS of this equation, g(X1:t−1 | Y1:t−1 ), is the proposal
distribution built out of the paths that have already been generated in the
previous steps.
A new set of samples x_t^1, …, x_t^N is drawn from the first factor g(X_t | X_{1:t−1}, Y_{1:t}). We then append the newly simulated values x_t^1, …, x_t^N to the simulated paths x_{1:t−1}^1, …, x_{1:t−1}^N of length t − 1. We thus obtain simulated paths x_{1:t}^1, …, x_{1:t}^N of length t.
We initialize this recursion with w_1^j = 1. The densities p(Y_t | X_t) and p(X_t | X_{t−1}) are determined by the state and observation equations (51). The only quantity that needs to be computed at each iteration is the incremental weight

w̃_t^j = p(Y_t | x_t^j) p(x_t^j | x_{t−1}^j) / g(x_t^j | x_{1:t−1}^j, Y_{1:t}),

so that w_t^j ∝ w̃_t^j w_{t−1}^j.
As a result of each iteration, SIS produces N Monte Carlo paths x_{1:t}^1, …, x_{1:t}^N along with the unnormalized importance weights w_t^1, …, w_t^N. These paths are referred to as particles.
We define the normalized weights by

ŵ_t^j = w_t^j / ∑_{i=1}^N w_t^i.   (68)
The filtering distribution is then approximated by the weighted empirical distribution

p̂(X_{1:t} | Y_{1:t}) = ∑_{j=1}^N ŵ_t^j δ(X_{1:t} − x_{1:t}^j),   (69)

and the corresponding estimator of (60) is

Ê(f(X_{1:t}) | Y_{1:t}) = ∑_{j=1}^N ŵ_t^j f(x_{1:t}^j).   (70)
The predictive likelihood is estimated by

p̂(Y_t | Y_{1:t−1}) = ∑_{j=1}^N ŵ_{t−1}^j w̃_t^j.   (71)
It has been observed that in practice SIS suffers from the weight degeneracy
problem.
This manifests itself in the rapid increase of the variance of the distribution of the
importance weights as the number of time steps t increases.
As t increases, all the probability density gets eventually allocated to a single
particle.
That particle’s normalized weight converges to one, while the normalized weights
of the other particles converge to zero, and the SIS estimator becomes a
function of a single sample.
The degeneracy problem is mitigated by adding a resampling step to SIS. The algorithm proceeds as follows:
1. Generate N samples x_1^j from the proposal distribution and compute the weights

w_1^j = p(x_1^j) / g(x_1^j).
2. For t = 2, …, T:
(i) Generate N samples x_t^j ∼ g(X_t | x_{t−1}^j, Y_{1:t}) and compute the importance weights

w_t^j ∝ [p(Y_t | x_t^j) p(x_t^j | x_{t−1}^j) / g(x_t^j | x_{t−1}^j, Y_{1:t})] w_{t−1}^j.
(ii) Compute the normalized weights

ŵ_t^j = w_t^j / ∑_{i=1}^N w_t^i.   (72)

(iii) Resample N particles with probabilities ŵ_t^1, …, ŵ_t^N, and define w_t^j = 1/N.
After every iteration, once the particles have been generated, quantities of
interest can be estimated.
The filtering distribution is approximated as before by

p̂_N(X_{1:t} | Y_{1:t}) = ∑_{j=1}^N ŵ_t^j δ(X_{1:t} − x_{1:t}^j),   (73)

while, since ŵ_{t−1}^j = 1/N after resampling, the predictive likelihood estimate (71) becomes

p̂(Y_t | Y_{1:t−1}) = ∑_{j=1}^N ŵ_{t−1}^j w̃_t^j   (74)
                   ≈ (1/N) ∑_{j=1}^N w̃_t^j.   (75)
Bootstrap filter
The efficacy of the algorithms presented above depends on the choice of the
proposal distribution.
The simplest choice of the proposal distribution is the transition density itself,

g(X_t | X_{1:t−1}, Y_{1:t}) = p(X_t | X_{t−1}).

This choice is called the prior kernel, and the corresponding particle filter is called the bootstrap filter. With the prior kernel, the incremental weight reduces to

w̃_t = p(Y_t | X_t).
The prior kernel is an example of a blind proposal: it does not use the current
observation Yt .
Despite this, the bootstrap filter performs well in a number of situations.
Another popular version is the auxiliary particle filter, see [1] and [2].
References
[1] Creal, D.: A survey of sequential Monte Carlo methods for economics and finance, Econometric Reviews, 31, 245-296 (2012).
[2] Durbin, J., and Koopman, S. J.: Time Series Analysis by State Space Methods, Oxford University Press (2012).