Professional Documents
Culture Documents
Abstract (EKF) as well as the Unscented Kalman Filter (UKF) similar to Kushner’s Nonlinear
In this article we present an introduction to various Filtering algorithms and some of Filter. We also tackle the subject of Non-Gaussian filters and describe the Particle
their applications to the world of Quantitative Finance. We shall first mention the Filtering (PF) algorithm. Lastly, we will apply the filters to the term structure model of
fundamental case of Gaussian noises where we obtain the well-known Kalman Filter. commodity prices and the stochastic volatility model.
Because of common nonlinearities, we will be discussing the Extended Kalman Filter
2 Wilmott magazine
TECHNICAL ARTICLE 1
We denote the dimension of x k as nx , the dimension of w k as nw and Needless to say for a linear system, the function matrices are equal to
so on. these Jacobians. This is the case for the simple Kalman Filter.
We define the a priori process estimate as
1.1.2 The Algorithm
The actual algorithm could be implemented as follows:
x̂−k = E [x k ] (3)
(1) Initialization of x0 and P0
which is the estimation at time step k − 1 prior to the step k measurement. For k in 1...N
Similarly, we define the a posteriori estimate (2) Time Update (Prediction) equations
ẑ−k = h(x̂−k , 0)
P k = E e k etk (5)
and
where the superscript t corresponds to the transpose operator. ν k = z k − ẑ−k
In order to evaluate the above means and covariances we will need
as the innovation process.
the conditional densities p(x k |zk−1 ) and p(x k |z k ), which are determined
(3-b) Measurement Update (Filtering) equations
iteratively via the Time Update and Measurement Update equations. The
basic idea is to find the probability density function corresponding to a x̂ k = x̂−k + K k ν k (10)
hidden state x k at time step k given all the observations z1:k up to that
time. and
The Time Update step is based upon the Chapman-Kolmogorov equation P k = (I − K k H k )P−k (11)
with
p(x k |z1:k−1 ) = p(x k |xk−1 , z1:k−1 )p(xk−1 |z1:k−1 )dxk−1 K k = P−k Htk (H k P−k Htk + U k R k Utk )−1 (12)
(6)
= p(x k |xk−1 )p(xk−1 |z1:k−1 )dxk−1 and I the Identity matrix.
The above Kalman gain K k corresponds to the mean of the condition-
via the Markov property. al distribution of x k upon the observation z k or equivalently, the matrix
The Measurement Update step is based upon the Bayes rule that would minimize the mean square error P k within the class of linear
estimators.
p(zk |x k )p(x k |z1:k−1 ) This interpretation is based upon the following observation. Having x
p(x k |z1:k ) = (7)
p(z k |z1:k−1 ) a Normally distributed random-variable with a mean mx and variance Pxx ,
with and z another Normally distributed random-variable with a mean mz and
p(z k |z1:k−1 ) = p(z k |x k )p(x k |z1:k−1 )dx k variance Pzz , and having Pzx = Pxz the covariance between x and z, the con-
ditional distribution of x|z is also Normal with
A proof of the above is given in the appendix. mx|z = mx + K(z − mz )
EKF is based on the linearization of the transition and measurement
equations, and uses the conservation of Normal property within the class with
−1
of linear functions. We therefore define the Jacobian matrices of f with K = Pxz Pzz
respect to the state variable and the system noise as A k and W k respec- which corresponds to our Kalman gain.
tively; and similarly for h, as H k and U k respectively.
More accurately, for every row i and column j we have 1.1.3 Parameter Estimation
For a parameter-set in the model, the calibration could be carried out
Aij = ∂fi /∂ x j (x̂k−1 , 0), Wij = ∂fi /∂ wj (x̂k−1 , 0), via a Maximum Likelihood Estimator (MLE) or in case of conditionally
^
Hij = ∂hi /∂ x j (x̂−k , 0), Uij = ∂hi /∂ u j (x̂−k , 0) Gaussian state variables, a Quasi-Maximum Likelihood (QML) algorithm.
Wilmott magazine 3
To do this we need to maximize Nk=1 p(z k |z1:k−1 ) , and given the where the scaling parameters α , β , κ and λ = α 2 (na + κ) − na will be cho-
Normal form of this probability density function, taking the logarithm, sen for tuning.
changing the signs and ignoring constant terms, this would be equiva- For k in 1...N
lent to minimizing over the parameter-set (1-b) State Space Augmentation: As mentioned earlier, we concatenate
N the state vector with the system noise and the observation noise, and cre-
z k − mz k
L1:N = ln(Pz k z k ) + (13) ate an augmented state vector for each time-step
Pz k z k
k=1 xk−1
xk−1 = wk−1
a
For the KF or EKF we have
uk−1
mz k = ẑ−k and therefore
x̂k−1
and a
x̂k−1 = a
E [xk−1 |z k ] = 0
Pz k z k = H k P−k Htk + U k R k Utk
0
and
1.2 The Unscented Kalman Filter and Kushner’s Pk−1 Pxw (k − 1|k − 1) 0
Nonlinear Filter a
Pk−1 = Pxw (k − 1|k − 1) Pww (k − 1|k − 1) 0
1.2.1 Background and Notations 0 0 Puu (k − 1|k − 1)
Recently Julier and Uhlmann (1997) proposed a new extension of the
(1-c) The Unscented Transformation: Following this, in order to use the
Kalman Filter to Nonlinear systems, different from the EKF. The new
Normal approximation, we need to construct the corresponding Sigma
method called the Unscented Kalman Filter (UKF) will calculate the
Points through the Unscented Transformation:
mean to a higher order of accuracy than the EKF, and the covariance to
the same order of accuracy. a
χk−1 (0) = x̂k−1
a
Unlike the EKF, this method does not require any Jacobian calculation
since it does not approximate the nonlinear functions of the process and For i = 1...na
the observation. Indeed it uses the true nonlinear models but approxi-
a
χk−1 (i) = x̂k−1
a
+ ( (na + λ)Pk−1
a
)i
mates the distribution of the state random variable x k (as well as the
and for i = na + 1...2na
observation z k ) with a Normal distribution by applying an Unscented
Transformation to it.
a
χk−1 (i) = x̂k−1
a
− ( (na + λ)Pk−1
a
)i−na (15)
In order to be able to apply this Gaussian approximation (unless we
have x k = f (xk−1 ) + w k and z k = h(x k ) + u k , i.e. unless the equations are where the above subscripts i and i − na correspond to the ith and i − nth
a
linear in noise) we will need to augment the state space by concatenating columns of the square-root matrix5.
the noises to it. This augmented state will have a dimension (2) Time Update equations are
na = nx + nw + nu . χk|k−1 (i) = f (χk−1
x w
(i), χk−1 (i)) (16)
1.2.2 The Algorithm for i = 0...2na + 1 and
The UKF algorithm could be written in the following way:
2n a
(1-a) Initialization: Similarly to the EKF, we start with an initial choice x̂−k =
(m)
Wi χk|k−1 (i) (17)
for the state vector x̂0 = E [x0 ] and its covariance matrix i=0
P0 = E [(x0 − x̂0 )(x0 − x̂0 )t ] . and
2n a
We also define the weights Wi(m) and Wi(c) as P−k = Wi (χk|k−1 (i) − x̂−k )(χ k|k−1 (i) − x̂−k )t
(c)
(18)
i=0
(m) λ
W0 = The superscripts x and w correspond to the state and system-noise por-
na + λ
tions of the augmented state respectively.
and (3-a) Innovation: We define
(c) λ
W0 = + (1 − α 2 + β) zk|k−1 (i) = h(χk|k−1 (i), χk−1
u
(i)) (19)
na + λ
and for i = 1...2na and
1
2n a
ẑ−k =
(m) (c) (m)
Wi = Wi = (14) Wi Z k|k−1 (i) (20)
2(na + λ) i=0
4 Wilmott magazine
TECHNICAL ARTICLE 1
which gives us the Kalman Gain where ζ t = ( ζi1 ... ζiN ) is the vector of the Gauss-Hermite roots of
order M and wi1 ...wiN are the corresponding weights.
K k = Px k z k Pz−1
kzk Note that even if both Kushner’s NLF and UKF use Gaussian
Quadratures, UKF only uses 2N + 1 sigma points, while NLF needs MN
and we have as previously
points for the computation of the integrals.
x̂ k = x̂−k + K k ν k (23) More accurately, for a Quadrature-order M and an N -dimensional (pos-
sibly augmented) variable, the sigma-points are defined for j = 1...N and
as well as
i j=1 ...M as
P k = P−k − K k Pz k z k K tk (23)
a
χk−1 (i1 , ..., iN ) = x̂k−1
a
+ Pk−1
a
ζ (i1 , ..., iN )
which completes the Measurement Update Equations.
where this square-root corresponds to the Cholesky factorization.
1.2.3 Parameter Estimation
Similarly to the UKF, we have the Time Update equations
The maximization of the likelihood could be done exactly as in (13) tak-
x
ing ẑ−k and Pz k z k defined as above. χk|k−1 (i1 , ..., iN ) = f χk−1 w
(i1 , ..., iN ), χk−1 (i1 , ..., iN )
approach, the authors suggest using a Normal approximation to the den- and
sities p(x k |zk−1 ) and p(x k |z k ). They then use the fact that a Normal distri-
M
M
bution is entirely determined via its first two moments, which reduces P−k = ... wi1 ...wiN (χk|k−1 (i1 , ..., iN ) − x̂−k )(χ k|k−1 (i1 , ..., iN ) − x̂−k )t
the calculations considerably. i1 =1 iN =1
They finally rewrite the moment calculation equations (3), (4) and (5)
and similarly for the measurement update equations.
using the above p(x k |zk−1 ) and p(x k |z k ) , after calculating these condi-
tional densities via the time and measurement update equations (6) and 1.3 The Non-Gaussian Case: The Particle Filter
(7). All integrals could be evaluated via Gaussian Quadratures7. It is possible to generalize the algorithm for the fundamental Gaussian
Note that when h(x, u) is strongly nonlinear, the Gauss Hermite inte- case to one applicable to any distribution.
gration is not efficient for evaluating the moments of the measurement
1.3.1 Background and Notations
update equation, since the term p(z k |x k ) contains the exponent z k − h(x k ).
In this approach, we use Markov-Chain Monte-Carlo simulations instead
The iterative methods based on the idea of importance sampling proposed
of using a Gaussian approximation for (x k |z k ) as the Kalman or Kushner
in Kushner (1967), Kushner and Budhiraja (2000) correct this problem at
Filters do. A detailed description is given in Doucet, De Freitas and
the price of a strong increase in computation time. As suggested in Ito and
Gordon (2001).
Xiong (2000), one way to avoid this integration would be to make the addi-
The idea is based on the Importance Sampling technique:
tional hypothesis that x k , h(x k )|z1:k−1 is Gaussian.
We can calculate an expected value
When nx = 1 and λ = 2, the numeric integration in the UKF will cor-
respond to a Gauss-Hermite Quadrature of order 3. However in the UKF E[ f (x k )] = f (x k )p(x k |z1:k )dx k (24)
we can tune the filter and reduce the higher term errors via the previ-
^
ously mentioned parameters α and β . by using a known and simple proposal distribution q().
Wilmott magazine 5
More precisely, it is possible to write However this choice of the proposal distribution does not take into
account our most recent observation z k at all and therefore could
p(x k |z1:k ) become inefficient.
E[f (x k )] = f (x k ) q(x k |z1:k )dx k
q(x k |z1:k ) Hence the idea of using a Gaussian Approximation for the proposal,
which could be also written as and in particular an approximation based on the Kalman Filter, in order
to incorporate the observations.
w k (x k ) We therefore will have
E[f (x k )] = f (x k ) q(x k |z1:k )dx k (25)
p(z1:k )
with q(x k |xk−1 , z1:k ) = N (x̂ k , P k ) (29)
p(z1:k |x k )p(x k )
w k (x k ) =
q(x k |z1:k ) using the same notations as in the section on the Kalman Filter. Such fil-
defined as the filtering non-normalized weight as step k. ters are sometimes referred to as the Extended Particle Filter (EPF) and
We therefore have the Unscented Particle Filter (UPF). See Haykin (2001) for a detailed
description of these algorithms.
Eq [w k (x k )f (x k )]
E[f (x k )] = = Eq [w̃ k (x k )f (x k )] (26) 1.3.2 The Algorithm
Eq [w k (x k )]
Given the above framework, the algorithm for an Extended or Unscented
with
Particle Filter could be implemented in the following way:
w k (x k )
w̃ k (x k ) = (1) For time step k = 0 choose x0 and P0 > 0.
Eq [w k (x k )]
defined as the filtering normalized weight as step k. For i such that 1 ≤ i ≤ Nsims take
Using Monte-Carlo sampling from the distribution q(x k |z1:k ) we can (i)
x0 = x0 + P0 Z(i)
write in the discrete framework:
where Z(i) is a standard Gaussian simulated number.
N sims
E[f (x k )] ≈
(i)
w̃ k (x k )f (x k )
(i)
(27) Also take P0(i) = P0 and w0(i) = 1/Nsims
i=1 While 1 ≤ k ≤ N
with again (i) (2) For each simulation-index i
(i) w k (x k )
w̃ k (x k ) = N (i) (i)
(j)
j=1 w k (x k )
sims
x̂ k = KF(xk−1 )
Now supposing that our proposal distribution q() satisfies the Markov with P (i)
k the associated a posteriori error covariance matrix.
property, it can be shown that w k verifies the recursive identity (KF could be either the EKF or the UKF)
Needless to say, the choice of the proposal distribution is crucial. (5) Normalize the weights
Many suggest using w
(i)
(i)
q(x k |xk−1 , z1:k ) = p(x k |xk−1 ) w̃ k = N k (i)
sims
i=1 wk
since it will give us a simple weight identity
(6) Resample the points x̃(i) (i) (i) (i)
k and get x k and reset w k = w̃ k = 1/Nsims .
(i) (i) (i)
w k = wk−1 p(z k |x k ) (7) Increment k, Go back to step (2) and Stop at the end of the While loop.
6 Wilmott magazine
TECHNICAL ARTICLE 1
Wilmott magazine 7
where: where:
— The ith component of the vector zt is F(τi )
µ − 12 σS2 t — h(St , Ct ) is a vector whose ith component is:
—c= is a (2 × 1) vector, and t is the period sepa-
καt [St × exp(−zi Ct/t−1 + B(τi ))]
−κ τ i
rating 2 observation dates 1−e
with: Zi =
κ
1 −t
—F= is a (2 × 2) matrix, 2
0 1 − κt σC2 σS σC ρ σC 1 − e−2κ τi
B(τi ) = r − α + − × τi + ×
2κ 2 κ 4 κ3
— T is an identity matrix, (2 × 2),
— wt is a sequence of uncorrelated errors, with: σC2 1 − e−κ τi
+ α κ + σS σC ρ − ×
σS2 t ρσS σC t κ κ2
E[wt ] = 0, and Q = Var[wt ] =
ρσS σC t σC2 t α = α − λ/κ
The measurement equation is directly written from the solution of — ut is a white noise vector, (N × 1), with no serial correlation:
the model:
Gt
zt = d + H × + ut , t = 1, . . . NT E[ut ] = 0 and R = Var[ut ]. R is (N × N)
Ct
where: Lastly A and H, the derivatives of the functions f and h with respect to the
— The ith component of the vector zt of size N is ln(F(τi )). N is the state variables, are the following:
number of maturities that were used for the estimation.
— The ith component of the vector d of size N is B(τi )
— A is a (2 × 2) matrix:
1 − e−κ τi
— H is the (N × 2) matrix whose i line is 1, −
th
1 + µt − Ct−1 t −St−1 t
κ A(St−1 , Ct−1 ) =
— ut is a white noise vector, of size N, with no serial correlation: 0 (1 − κt)
— H is a (N × 2) matrix whose ith line is:
E[ut ] = 0 and R = Var[ut ]. R is (N × N)
e(−Zi Ct− 1 +B(τ i )) − Zi × e(−zi Ct− 1 +B(τ i ))
2.1.3 Applying the extended filter to the Schwartz model
2
The transition equation is directly obtained from the dynamics of the σC2 σS σC ρ σ 1 − e−2κ τi
B(τi ) = r − α + − × τi + C ×
state variables: 2κ 2 κ 4 κ3
St
= f (St−1 , Ct−1 ) + T(St−1 , Ct−1 )wt σ2 1 − e−κ τi
Ct + α κ + σS σC ρ − C ×
κ κ2
where:
α = α − λ/κ
S
— t is the state vector, (2 × 1), 2.2 Practical difficulties associated with the empirical
Ct
study
— f (St−1 , Ct−1 ) is a (2 × 1) vector:
To perform the empirical study, some difficulties must be overcome.
St−1 (1 + µt − Ct−1 t) First, there are choices to make when the iterative process is started.
f (St−1 , Ct−1 ) =
καt + Ct−1 (1 − κt) Second, if the model has been expressed under the logarithmic form for
S 0 the simple Kalman filter, some precautions must be taken when the per-
—T(St−1 , Ct−1 ) is a (2 × 2) matrix: T(St−1 , Ct−1 ) = t−1
0 1 formance is being considered. Third, the stability of the iteration process
2
and the model’s performance are extremely sensitive to the covariance
σS ρσS σC
— Q is a (2 × 2) matrix: Var(wt ) = matrix R.
ρσS σC σC2
8 Wilmott magazine
TECHNICAL ARTICLE 1
term structure models of commodity prices, the nearest futures price is linearization on the results. We first present the empirical data. Then
generally used as the spot price S, and the convenience yield C is calcu- the performance criteria are exposed. Finally, the results are delivered
lated as the solution of the Brennan and Schwartz (1985) model. This and commented upon.
solution requires the use of two observed futures prices, for delivery in
2.3.1 The empirical data
T1 and in T2 :
The data used for the empirical study corresponds to daily crude oil
ln(F(S, t, T1 )) − ln(F(S, t, T2 ))
c=r− prices for the settlement of the Nymex’s WTI futures contracts,
T1 − T2 between the 18th of May 1998, and the 15th of October 2001. They have
where T1 is the nearest delivery, and T2 is the one immediately after- been arranged such that the first futures price’s maturity τ1 is actual-
wards. We choose the first value of the estimation period for the non- ly the one month maturity, and the second futures price’s corre-
observable variables. sponds to the two months maturity τ2 ,. . . Keeping the first observa-
The covariance matrix associated with the state variables must also be tion of each group of five, these daily data points were transformed
initialized. We choose a diagonal matrix, and calculate the variances into weekly data. For the parameters estimation, and for the measure
with the first 30 points of the estimation period. of the model’s performance, four series of futures prices 11 were
retained, corresponding to the one, the three, the six and the nine
2.2.2 Measuring the performance months maturities. The interest rates are set to the three-month T-bill
As we have transformed the equations taking the logarithms, in order to rates. Because interest rates are supposed to be constant in the model,
use the simple Kalman filter, care has to be taken not to introduce a bias we use the mean of all the observations between 1995 and 2002.
when transforming back the results. Indeed in that case, the innovations
are calculated with the logarithms of the futures prices. 2.3.2 The performance criteria
To measure the model’s performance, two criteria were retained: the
zt = ẑ−
t + σV mean pricing errors and the root mean squared errors.
where σ is the standard deviation of the innovations and V is a standard- The mean pricing errors (MPE) are defined in the following way:
ized Gaussian residual independent to ẑ− −
t . The exponential of ẑt is a
1
N
ẑ−
biased estimator of the future prices, because e = e t × e and taking
zt σ V
MPE = (F̂(n, τ ) − F(n, τ ))
N n=1
the expectation we find:
− σ2 where N is the number of observations, F̂(n, τ ) is the estimated futures price
E[ezt ] = E[eẑt ] × e 2
for a maturity τ at time-step n, and F(n, τ ) is the observed futures price.
We therefore have to correct it by the exponential factor. Retaining the same notations, the root mean squared error (RMSE), is
defined in the following way, for one given maturity τ :
2.2.3 Stabilizing the iteration process
Another important choice must be made before initiating the Kalman 1 N
RMSE = (F̂(n, τ ) − F(n, τ ))2
iteration process, concerning the estimation of the covariance matrix N n=1
associated with the errors introduced in the measurement equation. This
matrix R is crucial for the iteration’s stability, because it is added to the
2.3.3 The empirical results
innovations covariance matrix, during the innovation phase. Therefore,
First, the optimal parameters obtained with the two filters are compared.
the updating of the non-observable variable is strongly affected by the
Then the model’s ability to represent the price curve and its dynamics is
matrix R, and if the terms of this matrix are too high, the iteration can
appreciated. Finally, the sensitivity of the results to the errors covariance
become unstable.
matrix is presented.
Most of the time, the easiest way to estimate this matrix is to calculate
the variances and the covariances of the estimation database. This
2.3.3.1 Optimal parameters12
method was chosen to measure the model’s performance and is present-
The differences between the optimal parameters obtained with the two
ed in paragraph 2.3. But it is important to know how much the empirical
filters show that the linearization has a significant influence.
results are affected by this choice. In order to answer this question, a few
Nevertheless, the parameters have the same order size as those Schwartz
simulations will be run and analyzed.
obtained in 1997 on the crude oil market, on different periods.
2.3 Comparison between the two filters 2.3.3.2 The model’s performance
The comparison between the performance of the Schwartz model meas- The simple filter is always more precise than the extended one. This is true
^
ured with the two filters allows us to appreciate the influence of the for all the maturities14. These measures also always decrease with maturity,
Wilmott magazine 9
TABLE 1. OPTIMAL PARAMETERS, 1998–200113 TABLE 3. THE COMPARISON BETWEEN
THE MODEL’S PERFORMANCE ASSOCI-
Simple filter Extended filter ATED WITH THE SIMPLE FILTER, WITH
Parameters Gradients Parameters Gradients AND WITHOUT CORRECTIONS FOR THE
Pull back force: κ 1.59171 −0.003631 1.258133 0.000628 LOGARITHM, 1998–2001
Trend: µ 0.379926 0.000497 0.352014 −0.001178
Spot price’s volatility: σs 0.263525 −0.000448 0.320235 −0.000338 Simple filter Simple filter corrected
Long run mean: α 0.252260 −0.012867 0.232547 0.004723 Maturity MPE RMSE MPE RMSE
Convenience yield’s volatility: σc 0.237071 −0.000602 0.288427 −0.001070 1 month −0.060423 2.319730 0.065644 2.314178
Correlation coefficient: ρ 0.938487 −0.001533 0.969985 0.000008 3 months −0.107783 1.989428 0.006419 1.981453
Risk premium: λ 0.177159 0.009272 0.181955 −0.002426 6 months −0.054536 1.715223 0.026010 1.709931
9 months −0.007316 1.567467 0.061301 1.564854
Average −0.057514 1.897962 0.036637 1.892604
TABLE 2. THE MODEL’S PERFORMANCE Unit: USD/b.
WITH THE SIMPLE AND THE EXTENDED
FILTERS, 1998–2001
Simple filter Extended filter 7
6
Maturity MPE RMSE MPE RMSE 5
1 month −0.060423 2.319730 0.09793 2.294503 4
98
98
98
99
99
99
99
99
99
18 /00
18 00
18 00
18 00
18 00
18 00
18 /01
18 01
18 01
18 01
01
5/
7/
9/
1/
1/
3/
5/
7/
9/
1/
3/
5/
7/
9/
1/
3/
5/
7/
9/
1
1
/0
/0
/0
/1
/0
/0
/0
/0
/0
/1
/0
/0
/0
/0
/0
/1
/0
/0
/0
/0
/0
mean squared error increases again for deliveries after 15 months.
18
18
18
18
18
18
18
18
18
18
18
To be rigorous, the model’s performance associated with the simple Simple filter Extended filter
Kalman filter should be corrected when, as is the case here, the loga-
rithms of the estimations are used to obtain the estimations themselves. FIGURE 1: Innovations, 1998–2001
The correction slightly improves the performance, as shown in table 3:
the root mean squared errors and the mean pricing errors diminish a
bit for almost all the maturities.
Finally, the innovations range diminishes with the futures con- Nevertheless with an extended filter, the model’s ability to represent the
tracts maturity. Figure 1 illustrates the innovations behavior for the price curve is still quite good for this case.
one-month maturity case. It shows that they tend to return to zero for Another way to assess a model’s performance is to see if it is able to
the two filters. Nevertheless as the figure illustrates it, even if the reproduce the price dynamics, which can be shown graphically.
mean pricing errors are low for the two filters, the pricing errors can Considering this, the first important conclusion is that the model is
be rather important at certain specific dates. They reach USD 6 in able to reproduce the price dynamics quite precisely, even if as in 1998—
absolute value for the extended filter, which represents 25% of the 2001, there are very large fluctuations in the futures prices. Figure 2
mean futures price for the one-month maturity case. It is a bit lower shows the results obtained for the one-month maturity case. During
for the simple filter: USD 5. that period, the crude oil price goes from USD 11 per barrel to USD 37
Consequently we can say that there is clearly an impact of the lin- per barrel! Even if the Kalman filters are often suspected to be unstable,
earization introduced in the extended filter: it can be observed from the these results show that they can be used even with extremely volatile
optimal parameters, from the performance, and from the innovations. data. The graph also shows that the two filters attenuate the range of
10 Wilmott magazine
TECHNICAL ARTICLE 1
Most of the time, the terms of this matrix correspond to the variances
35 and the covariances of the estimation database, namely, in the case stud-
ied here, the variances and covariances between futures prices for differ-
30
ent maturities. But one must know that the results obtained with the
25
Kalman filter can be more precise if these terms are (artificially) lowered,
($/b)
as shown in table 4. This table exposes the different results obtained dur-
20 ing 1998—2001 with the extended Kalman filter. This period is especially
interesting because the data strongly fluctuates. The performance is
15
achieved, first with the matrix based on the observations, then with the
artificially lowered one.
10
Simulations 1 to 4 correspond respectively to the model’s perform-
98
98
98
98
99
99
99
99
99
99
00
00
00
00
00
00
01
01
01
01
01
5/
7/
9/
1/
1/
3/
5/
7/
9/
1/
1/
3/
5/
7/
9/
1/
1/
3/
5/
7/
9/
ance obtained by multiplying the system’s matrix R by (1/2), (1/16),
/0
/0
/0
/1
/0
/0
/0
/0
/0
/1
/0
/0
/0
/0
/0
/1
/0
/0
/0
/0
/0
18
18
18
18
18
18
18
18
18
18
18
18
18
18
18
18
18
18
18
18
18
Futures one month Simple filter Extended filter (1/160), and (1/1600). As the matrix elements are lowered, the model’s
performance strongly improves: from the initial simulation to the fourth
FIGURE 2: Estimated and observed futures prices for the one month maturity, one, the root-mean-squared-error is almost divided by two. The compari-
1998–2001 son between the third and the fourth simulations also illustrates the fact
that there is a limit to the performance amelioration. Figure 3 portrays
the main results of these simulations.
price f luctuations. This phenomenon can actually be observed for
every maturity.
35,00
2.3.3.3 Simulations
The last results presented in this sesction are simulations. They show 30,00
results.
15,00
10,00
98
18 /98
18 /98
18 /98
18 /99
18 /99
18 /99
18 /99
18 /99
18 /99
18 /00
18 /00
18 /00
18 /00
18 /00
18 /00
18 /01
18 /01
18 /01
18 /01
01
5/
9/
7
7
/0
/0
/0
/1
/0
/0
/0
/0
/0
/1
/0
/0
/0
/0
/0
/1
/0
/0
/0
/0
/0
TABLE 4. SIMULATIONS WITH DIFFERENT
18
18
Wilmott magazine 11
Let x follow an Ornstein-Uhlenbeck or mean-reverting diffusion, like initial series and the estimations we obtain with the two versions of
the convenience yield in the Schwartz model, and z be an exponential: the Kalman filter when a = 0.8 .
1
Z̃ k = (Z k − ρB k )
FIGURE 4: Real process versus its estimates: Top EKF, bottom Kushner’s NLF 1 − ρ2
12 Wilmott magazine
TECHNICAL ARTICLE 1
as well as
3.1.2 Other Stochastic Volatility Models
It is easy to generalize the above state space model to other stochastic H k = − 12 t
volatility approaches. Indeed we could replace (32) with
and
p
√ √ √
v k = vk−1 + (ω − θ vk−1 )t + ξ vk−1 t zk−1 (35) Uk = v k t
The same time update and measurement update equations could be used
where p = 1/2 would naturally correspond to the Heston (Square-Root)
with the UKF or Kushner’s NLF.
model, p = 1 to the GARCH diffusion-limit model, and p = 3/2 to the 3/2
model. These models have all been described and analyzed in Lewis 3.2.2 Particle Filters
(2000). We could also apply the Particle Filtering algorithm to our problem.
The new state transition equation would therefore become Using the same notations as in section 1.3.2 and calling
1 (x − m)2
p− 1 1 p− 1 n(x, m, s) = √ exp(− )
v k = vk−1 + ω − ρξ r k vk−12 − θ − ρξ vk−12 vk−1 t 2π s 2s2
2
(36)
Sk √ the Normal density with mean m and standard deviation s, we will have
p− 1 p
+ ρξ vk−12 ln +ξ 1− ρ 2 vk−1 t Z̃k−1
Sk−1 (i) (i) (i) (i) (i)
q(x̃ k |xk−1 , z1:k ) = n x̃ k , m = x̂ k , s = P k
Wilmott magazine 13
3.3 Parameter Estimation and Back-Testing MPE EKF −Heston = 3.58207e − 05 RMSE EKF −Heston = 1.83223e − 05
For the Gaussian MLE we will need to minimize MPE EKF −GARCH = 2.78438e − 05 RMSE EKF −GARCH = 1.42428e − 05
K MPE EKF − 32 = 2.63227e − 05 RMSE EKF − 32 = 1.74760e − 05
ν2
φ (ω, θ, ξ, ρ) = ln(F k ) + k
Fk
k=1
4e-05
By maximizing the above likelihood functions, we will find the optimal
errors
parameter-set ˆ = (ω̂, θ̂ , ξ̂ , ρ̂) . This calibration procedure could then be
3e-05
used for pricing of derivatives instruments or forecasting volatility.
We could perform a back-testing process in the following way.
2e-05
Choosing an original parameter-set
8e-05
which show the better performance of the Particle Filters. However, it
should be reminded that the Particle Filters are also more computation-
errors
14 Wilmott magazine
TECHNICAL ARTICLE 1
EPF errors for different SV Models Heston model errors for different Filters
5.5e-05 7e-05
Heston EKF
5e-05 GARCH UKF
3/2 6e-05 EPF
4.5e-05 UPF
4e-05 5e-05
3.5e-05
4e-05
errors
error
3e-05
3e-05
2.5e-05
2e-05
2e-05
1.5e-05
1e-05
1e-05
5e-06 0
0 200 400 600 800 1000 1200 200 400 600 800 1000 1200
Days Days
FIGURE 7: Comparison of EPF Filtering errors for Heston, GARCH and 3/2 FIGURE 9: Comparison of Filtering errors for the Heston Model
Models
UPF errors for different SV Models GARCH model errors for different Filters
2.8e-05 8e-05
EKF
2.6e-05 7e-05 UKF
Heston EPF
GARCH UPF
2.4e-05 6e-05
3/2
2.2e-05 5e-05
errors
errors
2e-05 4e-05
1.8e-05 3e-05
1.6e-05 2e-05
1.4e-05 1e-05
1.2e-05 0
0 200 400 600 800 1000 1200 200 400 600 800 1000 1200
Days Days
FIGURE 8: Comparison of UPF Filtering errors for Heston, GARCH and 3/2 FIGURE 10: Comparison of Filtering errors for the GARCH Model
Models
^
Wilmott magazine 15
3/2 model errors for different Filters As expected, we observe an improvement when Particle Filters are
7e-05 used. What is more, given the tests carried out on the S&P500 data it
EKF
UKF seems that, despite its vast popularity, the Heston model does not per-
6e-05 EPF form as well as the 3/2 representation.
UPF
This suggests further research on other existing models such as Jump
5e-05 Diffusion (Merton (1976)), Variance Gamma (Madan, Carr and Chang
(1998)) or CGMY (Carr, Geman, Madan and Yor (2002)). Clearly, because of
the non-Gaussianity of these models, the Particle Filtering technique
errors
4e-05
will need to be applied to them18.
Finally it would be instructive to compare the risk-neutral parameter-
3e-05
set obtained from the above time-series based approaches, to the parame-
ter-set resulting from a cross-sectional approach using options market
2e-05
prices at a given point in time19. Inconsistent parameters between the
two approaches would signal either arbitrage opportunities in the mar-
1e-05 ket, or a misspecification in the tested model20.
0
200 400 600
Days
800 1000 1200
4 Summary
In this article, we present an introduction to various filtering algo-
FIGURE 11: Comparison of Filtering errors for the 3/2 Model rithms: the Kalman filter, the Extended Filter (EKF), as well as the
Unscented Kalman Filter (UKF) similar to Kushner’s Nonlinear filter. We
also tackle the subject of Non-Gaussian filters and describe the Particle
MPE UKF −Heston = 3.00000e − 05 RMSE UKF −Heston = 1.91280e − 05 Filtering (PF) algorithm.
MPE UKF −GARCH = 2.99275e − 05 RMSE UKF −GARCH = 2.58131e − 05 We then apply the filters to a term structure model of commodity
prices. Our main results are the following: Firstly, the approximation
MPE UKF − 32 = 2.82279e − 05 RMSE UKF − 32 = 1.55777e − 05
introduced in the Extended filter has an influence on the model per-
formances. Secondly, the estimation results are sensitive to the system
MPE EPF −Heston = 2.70104e − 05 RMSE EPF −Heston = 1.34534e − 05 matrix containing the errors of the measurement equation. Thirdly, the
MPE EPF −GARCH = 2.48733e − 05 RMSE EPF −GARCH = 4.99337e − 06 approximation made in the extended filter is not a real issue until the
model becomes highly nonlinear. In that case, other nonlinear filters
MPE EPF − 32 = 2.26462e − 05 RMSE EPF − 32 = 2.58645e − 06
such as those described in section 1.2 may be used.
Lastly, the application of the filters to stochastic volatility models
MPE UPF −Heston = 2.04000e − 05 RMSE UPF −Heston = 2.74818e − 06 shows that the Particle Filters perform better than the Gaussian ones,
MPE UPF −GARCH = 2.63036e − 05 RMSE UPF −GARCH = 8.44030e − 07 however they are also more expensive. What is more, given the tests car-
ried out on the S&P500 data it seems that, despite its vast popularity, the
MPE UPF − 32 = 1.73857e − 05 RMSE UPF − 32 = 4.09918e − 06
Heston model does not perform as well as the 3/2 representation. This
suggests further research on other existing models such as Jump
Two immediate observations can be made: On the one hand the Particle Diffusion, Variance Gamma or CGMY. Clearly, because of the non-
Filters have a better performance than the Gaussian ones, which recon- Gaussianity of these models, the Particle Filtering technique will need to
firms what one would anticipate. On the other hand for most of the be applied to them.
Filters, the 3/2 model seems to outperform the Heston model, which is in
line with the findings of Engle & Ishida (2002).
Appendix
3.5 Conclusion The Measurement Update equation is
Using the Gaussian or Particle Filtering techniques, it is possible to esti-
mate the stochastic volatility parameters from the underlying asset time- p(z k |x k )p(x k |z1:k−1 )
p(x k |z1:k ) =
series, in the risk-neutral or real-world context. p(z k |z1:k−1 )
16 Wilmott magazine
TECHNICAL ARTICLE 1
where the denominator p(z k |z1:k−1 ) could be written as 13. An alternative approach would consist in choosing a volatility proxy such as f(ln
Sk,√ln Sk+1) = ln Sk+1 − ln Sk and ignoring the stock-drift term, given that t =
p(z k |z1:k−1 ) = p(z k |x k )p(x k |z1:k−1 )dx k o( t). We would therefore write zk = ln(|f(ln Sk, ln Sk+1)|) = E[ln(|f(lnS*k, ln S*k+1
)|)] + 12 ln vk + k with St* the same process as St but with a volatility of one, and k
and corresponds to the Likelihood Function for the time-step k. corresponding to the measurement noise. See Alizadeh, Brandt and Diebold (2002)
Indeed using the Bayes rule and the Markov property, we have for details.
14. In this section, all optimizations were made via the Direction-Set algorithm as
described in Press et al. (1997). The precision was set to 1.0e-6.
p(z1:k |x k )p(x k ) 15. A study on Filtering and Lévy processes has recently been done in Barndorff—Nielsen
p(x k |z1:k ) =
p(z1:k ) and Shephard N. (2002).
p(z k , z1:k−1 |x k )p(x k ) 16. This idea is explored in Aït-Sahalia, Wang and Yared (2001) in a Non-parametric
= fashion.
p(z k , z1:k−1 )
17. This comparison supposes that the Girsanov theorem is applicable to the tested
p(z k |z1:k−1 , x k )p(z1:k−1 |x k )p(x k )
= model.
p(z k |z1:k−1 )p(z1:k−1 )
p(z k |z1:k−1 , x k )p(x k |z1:k−1 )p(z1:k−1 )p(x k ) ■ Aït-Sahalia Y., Wang Y., Yared F. (2001) “Do Option Markets Correctly Price the
=
p(z k |z1:k−1 )p(z1:k−1 )p(x k ) Probabilities of Movement of the Underlying Asset?” Journal of Econometrics, 101
p(z k |x k )p(x k |z1:k−1 ) ■ Alizadeh S., Brandt M.W., Diebold F.X. (2002) “Range-Based Estimation of Stochastic
= Volatility Models” Journal of Finance, Vol. 57, No. 3
p(z k |z1:k−1 )
■ Anderson B. D. O., Moore J. B. (1979) “Optimal Filtering” Englewood Cliffs, Prentice
Hall
Note that p(x k |z k ) is proportional to
■ Arulampalam S., Maskell S., Gordon N., Clapp T. (2002) “A Tutorial on Particle Filters
for On-line Nonlinear/Non-Gaussian Bayesian Tracking” IEEE Transactions on Signal
exp − 12 (z k − h(x k ))t R−1
k (z k − h(x k ))
Processing, Vol. 50, No. 2
■ Babbs S. H., Nowman K. B. (1999) “Kalman Filtering of Generalized Vasicek Term
Structure Models” Journal of Financial and Quantitative Analysis, Vol. 34, No. 1
under the hypothesis of additive measurement noises.
■ Barndorff-Nielsen O. E., Shephard N. (2002) “Econometric Analysis of Realized
Volatility and its use in Estimating Stochastic Volatility Models” Journal of the Royal
Statistical Society, Series B, Vol. 64
FOOTNOTES & REFERENCES ■ Bates D. S. (2002) “Maximum Likelihood Estimation of Latent Affine Processes”
University of Iowa & NBER
1. Some prefer to write xk = f(xk-1 , wk-1). Needless to say, the two notations are ■ Brennan M.J., Schwartz E.S. (1985) “Evaluating Natural Resource Investments” The
equivalent. Journal of Business, Vol. 58, No. 2
2. The square-root can be computed via a Cholesky factorization combined with a ■ Carr P., Geman H., Madan D., Yor M. (2002) “The Fine Structure of Asset Returns”
Singular Value Decomposition. See Press et al. (1997) for details. Journal of Business, Vol. 75, No. 2
3. This analogy between Kushner’s NLF and the UKF has been studied in Ito and Xiong ■ Doucet A., De Freitas N., Gordon N. (Editors) (2001) “Sequential Monte-Carlo
(2000). Methods in Practice” Springer-Verlag
4. A description of the Gaussian Quadrature could be found in Press et al. (1997). ■ Engle R. F., Ishida I. (2002) “The Square-Root, the Affine, and the CEV GARCH
5. See Harvey (1989) for an explanation on Smoothing. Models” Working Paper, New York University and University of California San Diego
6. See for instance Arulampalam et al. (2002) or Van der Merwe, et al. (2000) for details. ■ Harvey A. C. (1989) “Forecasting, Structural Time Series Models and the Kalman Filter”
7. The same exact methodology could be used in a non risk-neutral setting. We are sup- Cambridge University Press
posing we have a small time-step t in order to be able to apply the Girsanov theorem. ■ Harvey A. C., Ruiz E., Shephard N. (1994) “Multivariate Stochastic Variance Models”
8. In that model, interest rates are supposed to be constant. Review of Economic Studies, Vol. 61, No. 2
9. This means that N = 4 in the case we study. ■ Haykin S. (Editor) (2001) “Kalman Filtering and Neural Networks” Wiley Inter-Science
10. In the whole empirical study, optimizations have been made with a precision of 1e-5 ■ Heston S. (1993) “A Closed-Form Solution for Options with Stochastic Volatility with
on the gradients. Applications to Bond and Currency Options” Review of Financial Studies, Vol. 6, No. 2
11. For the two filters, the parameters values retained to initiate the optimization are the ■ Ito K., Xiong K. (2000) “Gaussian Filters for Nonlinear Filtering Problems” IEEE
same. These values are the following: κ = 0.5; µ = 0.1; σ S = 0.3; α = 0.1; σ C = 0.4; ρ = Transactions on Automatic Control, Vol. 45, No. 5
0.5; λ = 0.1. ■ Javaheri A. (2002) “The Volatility Process” Ph.D. Thesis in progress, Ecole des Mines de
12. The MPE and the RMSE presented here can not be directly compared with those pro- Paris
posed by Schwartz in 1997 because his computations were done using the logarithm of ■ Julier S. J., Uhlmann J.K. (1997) “A New Extension of the Kalman Filter to Nonlinear
the futures prices. Systems” The University of Oxford, The Robotics Research Group
^
Wilmott magazine 17
TECHNICAL ARTICLE 1
ACKNOWLEDGEMENT
The second part of the study was realized with the support of the French Institute for
Energy (IFE). The data were provided by TotalFinaElf.
18 Wilmott magazine