A Stochastic Processes Toolkit for Risk Management∗

Damiano Brigo†, Antonio Dalessandro‡, Matthias Neugebauer§, Fares Triki¶

15 November 2007
Abstract
In risk management it is desirable to grasp the essential statistical features of a time series representing a risk factor. This tutorial aims to introduce a number of different stochastic processes that can help in grasping the essential features of risk factors describing different asset classes or behaviours. This paper does not aim at being exhaustive, but gives examples and a feeling for practically implementable models allowing for stylised features in the data. The reader may also use these models as building blocks to build more complex models, although for a number of risk management applications the models developed here suffice for the first step in the quantitative analysis. The broad qualitative features addressed here are fat tails and mean reversion. We give some orientation on the initial choice of a suitable stochastic process and then explain how the process parameters can be estimated based on historical data. Once the process has been calibrated, typically through maximum likelihood estimation, one may simulate the risk factor and build future scenarios for the risky portfolio. On the terminal simulated distribution of the portfolio one may then single out several risk measures, although here we focus on the stochastic processes estimation preceding the simulation of the risk factors. Finally, this first survey report focuses on single time series. Correlation, or more generally dependence across risk factors, leading to multivariate processes modelling, will be addressed in future work.
JEL Classiﬁcation code: G32, C13, C15, C16.
AMS Classification code: 60H10, 60J60, 60J75, 65C05, 65C20, 91B70
Key words: Risk Management, Stochastic Processes, Maximum Likelihood Estimation, Fat Tails, Mean
Reversion, Monte Carlo Simulation
∗ We are grateful to Richard Hrvatin and Lars Jebjerg for reviewing the manuscript and for helpful comments. Kyriakos Chourdakis further helped with comments and formatting issues. Contact: richard.hrvatin@fitchratings.com, lars.jebjerg@fitchratings.com, kyriakos.chourdakis@fitchratings.com
† Fitch Solutions, and Department of Mathematics, Imperial College, London. damiano.brigo@fitchratings.com
‡ Fitch Solutions, and Department of Mathematics, Imperial College, London. antonio.dalessandro@ic.ac.uk
§ Fitch Ratings. matthias.neugebauer@fitchratings.com
¶ Fitch Solutions, and Paris School of Economics, Paris. fares.triki@fitchratings.com
Contents
1 Introduction
2 Modelling with Basic Stochastic Processes: GBM
3 Fat Tails: GARCH Models
4 Fat Tails: Jump Diffusion Models
5 Fat Tails: Variance Gamma (VG) Process
6 The Mean Reverting Behaviour
7 Mean Reversion: The Vasicek Model
8 Mean Reversion: The Exponential Vasicek Model
9 Mean Reversion: The CIR Model
10 Mean Reversion and Fat Tails together
D. Brigo, A. Dalessandro, M. Neugebauer, F. Triki: A stochastic processes toolkit for Risk Management
1 Introduction
In risk management and in the rating practice it is desirable to grasp the essential statistical features
of a time series representing a risk factor to begin a detailed technical analysis of the product or the
entity under scrutiny. The quantitative analysis is not the ﬁnal tool, since it has to be integrated with
judgemental analysis and possible stresses, but represents a good starting point for the process leading
to the ﬁnal decision.
This tutorial aims to introduce a number of diﬀerent stochastic processes that, according to the
economic and ﬁnancial nature of the speciﬁc risk factor, can help in grasping the essential features of
the risk factor itself. For example, a family of stochastic processes that can be suited to capture foreign
exchange behaviour might not be suited to model hedge funds or commodities. Hence one needs to have at one's disposal a sufficiently large set of practically implementable stochastic processes that may be used to address the quantitative analysis at hand and begin a risk management or rating decision process.
This report does not aim at being exhaustive, since this would be a formidable task beyond the scope
of a survey paper. Instead, this survey gives examples and a feeling for practically implementable models
allowing for stylised features in the data. The reader may also use these models as building blocks to build
more complex models, although for a number of risk management applications the models developed here
suﬃce for the ﬁrst step in the quantitative analysis.
The broad qualitative features addressed here are fat tails and mean reversion. This report begins
with the geometric Brownian motion (GBM) as a fundamental example of an important stochastic process
featuring neither mean reversion nor fat tails, and then it goes through generalisations featuring any of the
two properties or both of them. According to the speciﬁc situation, the diﬀerent processes are formulated
either in (log) returns space or in levels space, as is more convenient. Levels and returns can be easily
converted into each other, so this is indeed a matter of mere convenience.
This tutorial will address the following processes:
• Basic process: Arithmetic Brownian Motion (ABM, returns) or GBM (levels).
• Fat tails processes: GBM with lognormal jumps (levels), ABM with normal jumps (returns),
GARCH (returns), Variance Gamma (returns).
• Mean Reverting processes: Vasicek, CIR (levels if interest rates or spreads, or returns in
general), exponential Vasicek (levels).
• Mean Reverting processes with Fat Tails: Vasicek with jumps (levels if interest rates or
spreads, or returns in general), exponential Vasicek with jumps (levels).
Diﬀerent ﬁnancial time series are better described by diﬀerent processes above. In general, when ﬁrst
presented with a historical time series of data for a risk factor, one should decide what kind of general
properties the series features. The financial nature of the time series can give some hints as to whether mean reversion and fat tails are expected to be present or not. Interest rate processes, for example, are often taken to be mean reverting, whereas foreign exchange series are often supposed to be fat tailed, and credit spreads feature both characteristics. In general one should:
Check for mean reversion or stationarity
Some of the tests mentioned in the report concern mean reversion. The presence of an autoregressive (AR) feature can be tested in the data, typically on the returns of a series, or on the series itself if this is an interest rate or spread series. In linear processes with normal shocks, this amounts to checking stationarity. If an AR feature is present, this can be a symptom of mean reversion, and a mean reverting process can be adopted as a first assumption for the data. If the AR hypothesis is rejected, this means that there is no mean reversion, at least under normal shocks and linear models, although the situation can be more complex for nonlinear processes with possibly fat tails, where the more general notions of stationarity and ergodicity may apply. These are not addressed in this report.
If the tests find AR features then the process is considered to be mean reverting, and one can compute autocorrelation and partial autocorrelation functions to see the lag in the regression. Or one can go directly to the continuous-time model and estimate it on the data through maximum likelihood. In this case, the main model to try is the Vasicek model.
If the tests do not ﬁnd AR features then the simplest form of mean reversion for linear processes
with Gaussian shocks is to be rejected. One has to be careful though that the process could still be
mean reverting in a more general sense. In a way, there could be some sort of mean reversion even under non-Gaussian shocks; examples of such cases are the jump-extended Vasicek and exponential Vasicek models, where mean reversion is mixed with fat tails, as discussed in the next point.
Check for fat tails
If the AR test fails, either the series is not mean reverting, or it is mean reverting but with tails fatter than the Gaussian distribution and possibly nonlinear behaviour. To test for fat tails a first graphical tool is the QQ-plot, comparing the tails in the data with the Gaussian distribution. The QQ-plot gives immediate graphical evidence on the possible presence of fat tails. Further evidence can be collected by considering the third and fourth sample moments, or better the skewness and excess kurtosis, to see how the data differ from the Gaussian distribution. More rigorous tests on normality should also be run following this preliminary analysis¹. If fat tails are not found and the AR test failed, one is likely dealing with a process lacking mean reversion but with Gaussian shocks, which could be modeled as an arithmetic Brownian motion. If fat tails are found (and the AR test has been rejected), one may start calibration with models featuring fat tails and no mean reversion (GARCH, NGARCH, Variance Gamma, arithmetic Brownian motion with jumps) or fat tails and mean reversion (Vasicek with jumps, exponential Vasicek with jumps). Comparing the goodness of fit of the models may suggest which alternative is preferable. Goodness of fit is determined again graphically through a QQ-plot of the model-implied distribution against the empirical distribution, although more rigorous approaches can be applied². The predictive power can also be tested, following the calibration and the goodness of fit tests³.
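As a complement to the QQ-plot, the skewness and excess kurtosis checks mentioned above take only a few lines. The following is an illustrative Python fragment (the paper's own examples are in MATLAB), using an equal-weight scale mixture of Gaussians as a stand-in for a fat-tailed return series:

```python
import math
import random

def skew_exkurt(x):
    """Sample skewness and excess kurtosis; both are 0 for Gaussian data."""
    n = len(x)
    m = sum(x) / n
    v = sum((xi - m) ** 2 for xi in x) / n
    skew = sum((xi - m) ** 3 for xi in x) / (n * v ** 1.5)
    exkurt = sum((xi - m) ** 4 for xi in x) / (n * v ** 2) - 3.0
    return skew, exkurt

random.seed(0)
# Gaussian returns: skewness and excess kurtosis should both be near 0.
gauss = [random.gauss(0.0, 1.0) for _ in range(100_000)]
# Fat-tailed returns: an equal mixture of N(0,1) and N(0,9) shocks.
heavy = [random.gauss(0.0, random.choice([1.0, 3.0])) for _ in range(100_000)]
print(skew_exkurt(gauss))  # both entries close to 0
print(skew_exkurt(heavy))  # excess kurtosis well above 0
```

Values of excess kurtosis persistently above zero on historical returns are the numerical counterpart of the tail bending visible in the QQ-plot.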
The above classification may be summarized in the following table, where typically the referenced variable is the return process or the process itself in case of interest rates or spreads:

                        Normal tails      Fat tails
    NO mean reversion   ABM               ABM + Jumps, (N)GARCH, VG
    Mean reversion      Vasicek, CIR      Exponential Vasicek, Vasicek with Jumps
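The classification above can be read as a small decision rule. A hypothetical Python helper (the function name and encoding are invented here for illustration, not taken from the paper) mapping the two diagnostic outcomes to candidate model families might look like:

```python
def suggest_models(mean_reverting, fat_tails):
    """Map the (mean reversion, fat tails) diagnosis to candidate processes,
    following the classification table in the text."""
    table = {
        (False, False): ["ABM"],
        (False, True): ["ABM with jumps", "(N)GARCH", "Variance Gamma"],
        (True, False): ["Vasicek", "CIR"],
        (True, True): ["Exponential Vasicek", "Vasicek with jumps"],
    }
    return table[(mean_reverting, fat_tails)]

print(suggest_models(False, True))  # ['ABM with jumps', '(N)GARCH', 'Variance Gamma']
```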
Once the process has been chosen and calibrated to historical data, typically through regression
or maximum likelihood estimation, one may use the process to simulate the risk factor over a given
time horizon and build future scenarios for the portfolio under examination. On the terminal simulated
distribution of the portfolio one may then single out several risk measures. This report does not focus
on the risk measures themselves but on the stochastic processes estimation preceding the simulation of
the risk factors. In case more than one model is suited to the task, one can analyze risks with diﬀerent
models and compare the outputs as a way to reduce model risk.
Finally, this ﬁrst survey report focuses on single time series. Correlation or more generally dependence
across risk factors, leading to multivariate processes modeling, will be addressed in future work.
Prerequisites
The following discussion assumes that the reader is familiar with basic probability theory, including probability distributions (density and cumulative density functions, moment generating functions), random variables and expectations. It is also assumed that the reader has some practical knowledge of stochastic calculus and of Itô's formula, and basic coding skills. For each section, the MATLAB code is provided to implement the specific models. Examples illustrating the possible use of the models on actual financial time series are also presented.
¹ Such as, for example, the Jarque-Bera, Shapiro-Wilk and Anderson-Darling tests, which are not addressed in this report.
² Examples are the Kolmogorov-Smirnov test, likelihood ratio methods and the Akaike information criterion, as well as methods based on the Kullback-Leibler information or relative entropy, the Hellinger distance and other divergences.
³ The Diebold-Mariano statistics can be mentioned as an example for AR processes. These approaches are not pursued here.
2 Modelling with Basic Stochastic Processes: GBM
This section provides an introduction to modelling through stochastic processes. All the concepts will be
introduced using the fundamental process for ﬁnancial modelling, the geometric Brownian motion with
constant drift and constant volatility. The GBM is ubiquitous in ﬁnance, being the process underlying
the Black and Scholes formula for pricing European options.
2.1 The Geometric Brownian Motion
The geometric Brownian motion (GBM) describes the random behaviour of the asset price level S(t) over
time. The GBM is speciﬁed as follows:
    dS(t) = µ S(t) dt + σ S(t) dW(t)    (1)
Here W is a standard Brownian motion, a special diffusion process⁴ that is characterised by independent identically distributed (iid) increments that are normally (or Gaussian) distributed with zero mean and a standard deviation equal to the square root of the time step. Independence in the increments implies that the model is a Markov process, a particular type of process for which only the current asset price is relevant for computing probabilities of events involving future asset prices. In other terms, to compute probabilities involving future asset prices, knowledge of the whole past does not add anything to knowledge of the present.
The d terms indicate that the equation is given in its continuous-time version⁵. The property of independent identically distributed increments is very important and will be exploited in the calibration as well as in the simulation of the random behaviour of asset prices. Using some basic stochastic calculus the equation can be rewritten as follows:

    d log S(t) = (µ − σ²/2) dt + σ dW(t)    (2)
where log denotes the standard natural logarithm. The process followed by the log is called an
Arithmetic Brownian Motion. The increments in the logarithm of the asset value are normally distributed.
This equation is straightforward to integrate between any two instants, t and u, leading to:

    log S(u) − log S(t) = (µ − σ²/2)(u − t) + σ(W(u) − W(t)) ∼ N((µ − σ²/2)(u − t), σ²(u − t)).    (3)
Through exponentiation and taking u = T and t = 0 the solution for the GBM equation is obtained:

    S(T) = S(0) exp((µ − σ²/2) T + σ W(T))    (4)

This equation shows that the asset price in the GBM follows a lognormal distribution, while the logarithmic returns log(S(t+∆t)/S(t)) are normally distributed.
The moments of S(T) are easily found using the above solution, the fact that W(T) is Gaussian with mean zero and variance T, and finally the moment generating function of a standard normal random variable Z, given by:

    E[e^(aZ)] = e^(a²/2).    (5)
Hence the ﬁrst two moments (mean and variance) of S(T) are:
⁴ Also referred to as a Wiener process.
⁵ A continuous-time stochastic process is one where the value of the price can change at any point in time. The theory is very complex and actually this differential notation is just shorthand for an integral equation. A discrete-time stochastic process on the other hand is one where the price of the financial asset can only change at certain fixed times. In practice, discrete behaviour is usually observed, but continuous-time processes prove to be useful to analyse the properties of the model, besides being paramount in valuation, where the assumption of continuous trading is at the basis of the Black and Scholes theory and its extensions.
    E[S(T)] = S(0) e^(µT),    Var[S(T)] = e^(2µT) S²(0) (e^(σ²T) − 1)    (6)
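The closed-form moments can be checked numerically by simulating S(T) exactly through the solution (4) and comparing sample statistics with (6). A minimal Python sketch with illustrative parameter values (the paper's own code is in MATLAB):

```python
import math
import random

random.seed(1)
S0, mu, sigma, T = 100.0, 0.10, 0.20, 1.0
n = 200_000

# Exact simulation of S(T) via equation (4): S(T) = S0 exp((mu - sigma^2/2) T + sigma W(T)),
# where W(T) ~ N(0, T).
samples = [S0 * math.exp((mu - 0.5 * sigma**2) * T
                         + sigma * math.sqrt(T) * random.gauss(0.0, 1.0))
           for _ in range(n)]

mc_mean = sum(samples) / n
mc_var = sum((s - mc_mean) ** 2 for s in samples) / n

# Closed-form moments from equation (6).
ex_mean = S0 * math.exp(mu * T)
ex_var = math.exp(2 * mu * T) * S0**2 * (math.exp(sigma**2 * T) - 1)

print(mc_mean, ex_mean)  # both near 110.5
print(mc_var, ex_var)    # both near 500
```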
To simulate this process, the continuous equation between discrete instants t₀ < t₁ < ... < tₙ needs to be solved as follows:

    S(t_{i+1}) = S(tᵢ) exp((µ − σ²/2)(t_{i+1} − tᵢ) + σ √(t_{i+1} − tᵢ) Z_{i+1})    (7)

where Z₁, Z₂, ..., Zₙ are independent random draws from the standard normal distribution. The simulation is exact and does not introduce any discretisation error, due to the fact that the equation can be solved exactly. The following charts plot a number of simulated sample paths using the above equation and the mean plus/minus one standard deviation and the first and 99th percentiles over time. The mean of the asset price grows exponentially with time.
[Two panels of simulated spot price paths over time in days.]

Figure 1: GBM Sample Paths and Distribution Statistics
The following Matlab function simulates sample paths of the GBM using equation (7), which was vectorised in Matlab.

Code 1 MATLAB Code to simulate GBM Sample Paths.

function S = GBM_simulation(N_Sim,T,dt,mu,sigma,S0)
 drift = (mu - 0.5*sigma^2)*dt;             % per-step drift of log-returns
 S = S0*ones(N_Sim,T+1);
 BM = sigma*sqrt(dt)*normrnd(0,1,N_Sim,T);  % scaled Brownian increments
 S(:,2:end) = S0*exp(cumsum(drift+BM,2));   % exact solution (7), vectorised
end
2.2 Parameter Estimation
This section describes how the GBM can be used as an attempt to model the random behaviour of the
FTSE100 stock index, the log of which is plotted in the left chart of Figure 2.
The second chart, on the right of Figure 2, plots the quantiles of the log-return distribution against the quantiles of the standard normal distribution. This QQ-plot allows one to compare distributions and to check the assumption of normality. The quantiles of the historical distribution are plotted on the Y-axis and the quantiles of the chosen modelling distribution on the X-axis. If the comparison distribution provides a good fit to the historical returns, then the QQ-plot approximates a straight line. In the case
[Left: log of the FTSE100 index, 1940-2003. Right: QQ plot of sample data versus standard normal.]

Figure 2: Historical FTSE100 Index and QQ Plot FTSE 100
of the FTSE100 log returns, the QQ-plot shows that the historical quantiles in the tail of the distribution are significantly larger than those of the normal distribution. This is in line with the general observation about the presence of fat tails in the return distribution of financial asset prices. Therefore the GBM at best provides only a rough approximation for the FTSE100. The next sections will outline some extensions of the GBM that provide a better fit to the historical distribution.
Another important property of the GBM is the independence of returns. Therefore, to ensure the
GBM is appropriate for modelling the asset price in a ﬁnancial time series one has to check that the
returns of the observed data are truly independent. The assumption of independence in statistical terms
means that there is no autocorrelation present in historical returns. A common test for autocorrelation
(typically on returns of a given asset) in a time series sample x₁, x₂, ..., xₙ realised from a random process X(tᵢ) is to plot the autocorrelation function of the lag k, defined as:

    ACF(k) = (1 / ((n − k) v̂)) Σ_{i=1}^{n−k} (xᵢ − m̂)(x_{i+k} − m̂),    k = 1, 2, ...    (8)
where m̂ and v̂ are the sample mean and variance of the series, respectively. Often, besides the ACF, one also considers the partial autocorrelation function (PACF). Without fully defining PACF, roughly speaking one may say that while ACF(k) gives an estimate of the correlation between X(tᵢ) and X(t_{i+k}), PACF(k) informs on the correlation between X(tᵢ) and X(t_{i+k}) that is not accounted for by the shorter lags from 1 to k − 1, or more precisely it is a sort of sample estimate of:

    Corr(X(tᵢ) − X̄ᵢ^{i+1,...,i+k−1}, X(t_{i+k}) − X̄_{i+k}^{i+1,...,i+k−1})

where X̄ᵢ^{i+1,...,i+k−1} and X̄_{i+k}^{i+1,...,i+k−1} are the best estimates (regressions in a linear context) of X(tᵢ) and X(t_{i+k}) given X(t_{i+1}), ..., X(t_{i+k−1}). PACF also gives an estimate of the lag in an autoregressive process.
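Equation (8) translates directly into code. The fragment below is an illustrative Python sketch (the paper works in MATLAB), checked on white noise, where all lags should be near zero, and on a simulated AR(1) series, where the lag-k autocorrelation should be near the k-th power of the autoregressive coefficient:

```python
import random

def acf(x, kmax):
    """Sample autocorrelation function for lags 1..kmax, equation (8)."""
    n = len(x)
    m = sum(x) / n
    v = sum((xi - m) ** 2 for xi in x) / n
    return [sum((x[i] - m) * (x[i + k] - m) for i in range(n - k)) / ((n - k) * v)
            for k in range(1, kmax + 1)]

random.seed(2)
noise = [random.gauss(0.0, 1.0) for _ in range(20_000)]

# AR(1): x_t = 0.8 x_{t-1} + eps_t, so ACF(k) should be close to 0.8^k.
ar1 = [0.0]
for _ in range(20_000):
    ar1.append(0.8 * ar1[-1] + random.gauss(0.0, 1.0))

print(acf(noise, 3))  # all entries near 0
print(acf(ar1, 3))    # near [0.8, 0.64, 0.512]
```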
ACF and PACF for the FTSE 100 equity index returns are shown in the charts in Figure 3. For the
FTSE100 one does not observe any signiﬁcant lags in the historical return time series, which means the
independence assumption is acceptable in this example.
Maximum Likelihood Estimation
Having tested the properties of independence as well as normality for the underlying historical data
one can now proceed to calibrate the parameter set Θ = (µ, σ) for the GBM based on the historical
returns. To find the Θ that yields the best fit to the historical dataset, the method of maximum likelihood estimation (MLE) is used.
MLE can be used for both continuous and discrete random variables. The basic concept of MLE, as
suggested by the name, is to ﬁnd the parameter estimates Θ for the assumed probability density function
[Left: sample autocorrelation function (ACF); right: sample partial autocorrelation function (PACF), lags 0-20.]

Figure 3: Autocorrelation and Partial Autocorrelation Function for FTSE 100
f_Θ (continuous case) or probability mass function (discrete case) that will maximise the likelihood or probability of having observed the given data sample x₁, x₂, x₃, ..., xₙ for the random vector X₁, ..., Xₙ. In other terms, the observed sample x₁, x₂, x₃, ..., xₙ is used inside f_{X₁,X₂,...,Xₙ;Θ}, so that the only variable in f is Θ, and the resulting function is maximised in Θ. The likelihood (or probability) of observing a particular data sample, i.e. the likelihood function, will be denoted by L(Θ).
In general, for stochastic processes, the Markov property is sufficient to be able to write the likelihood along a time series of observations as a product of transition likelihoods on single time steps between two adjacent instants. The key idea there is to notice that for a Markov process xₜ, one can write, again denoting f the probability density of the vector random sample:

    L(Θ) = f_{X(t₀),X(t₁),...,X(tₙ);Θ} = f_{X(tₙ)|X(t_{n−1});Θ} · f_{X(t_{n−1})|X(t_{n−2});Θ} ··· f_{X(t₁)|X(t₀);Θ} · f_{X(t₀);Θ}    (9)
In the more extreme cases where the observations in the data series are iid, the problem is considerably simplified, since f_{X(tᵢ)|X(t_{i−1});Θ} = f_{X(tᵢ);Θ} and the likelihood function becomes the product of the probability density of each data point in the sample without any conditioning. In the GBM case, this happens if the maximum likelihood estimation is done on log-returns rather than on the levels. By defining the X as

    X(tᵢ) := log S(tᵢ) − log S(t_{i−1}),    (10)

one sees that they are already independent, so that one does not need to use explicitly the above decomposition (9) through transitions. For general Markov processes different from GBM, or if one had worked in level space S(tᵢ) rather than in return space X(tᵢ), this can be necessary though: see, for example, the Vasicek example later on.
The likelihood function for iid variables is in general:

    L(Θ) = f_Θ(x₁, x₂, x₃, ..., xₙ) = Π_{i=1}^{n} f_Θ(xᵢ)    (11)
The MLE estimate Θ̂ is found by maximising the likelihood function. Since the product of density values could become very small, which would cause numerical problems with handling such numbers, the likelihood function is usually converted⁶ to the log-likelihood L* = log L.⁷ For the iid case the log-likelihood reads:

    L*(Θ) = Σ_{i=1}^{n} log f_Θ(xᵢ)    (12)
⁶ This is allowed, since maxima are not affected by monotone transformations.
⁷ The reader has to be careful not to confuse the log taken to move from levels S to returns X with the log used to move from the likelihood to the log-likelihood. The two are not directly related, in that one takes the log-likelihood in general even in situations when there are no log-returns but just general processes.
The maximum of the log-likelihood usually has to be found numerically using an optimisation algorithm⁸. In the case of the GBM, the log-return increments form normal iid random variables, each with a well known density f_Θ(x) = f(x; m, v), determined by mean and variance. Based on Equation (3) (with u = t_{i+1} and t = tᵢ) the parameters are computed as follows:

    m = (µ̂ − σ̂²/2) ∆t,    v = σ̂² ∆t    (13)
The estimates for the GBM parameters are deduced from the estimates of m and v. In the case of the GBM, the MLE method provides closed form expressions for m and v through MLE of Gaussian iid samples⁹. This is done by differentiating the Gaussian density function with respect to each parameter and setting the resulting derivative equal to zero. This leads to the following well known closed form expressions for the sample mean and variance¹⁰ of the log-return samples xᵢ for Xᵢ = log S(tᵢ) − log S(t_{i−1}):

    m̂ = Σ_{i=1}^{n} xᵢ / n,    v̂ = Σ_{i=1}^{n} (xᵢ − m̂)² / n.    (14)
Similarly one can ﬁnd the maximum likelihood estimates for both parameters simultaneously using
the following numerical search algorithm.
Code 2 MATLAB MLE Function for iid Normal Distribution.

function [mu, sigma] = GBM_calibration(S,dt,params)
 Ret = price2ret(S);
 n = length(Ret);
 options = optimset('MaxFunEvals',100000,'MaxIter',100000);
 est = fminsearch(@normalLL,params,options);  % numerical search for the MLE
 mu = est(1); sigma = abs(est(2));
 function mll = normalLL(p)
  m = p(1); s = abs(p(2));
  % log-likelihood (12) of Gaussian log-returns with mean m*dt and variance s^2*dt
  ll = n*log(1/sqrt(2*pi*dt)/s) + sum(-(Ret - m*dt).^2/2/(dt*s^2));
  mll = -ll;  % fminsearch minimises, so return minus the log-likelihood
 end
end
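The same calibration can be sketched in Python without an optimiser: the closed-form estimators (13)-(14) should maximise the log-likelihood (12), which can be verified by perturbing them. The parameter values below are illustrative, not from the paper:

```python
import math
import random

random.seed(3)
dt = 1.0 / 252.0
mu_true, sigma_true = 0.08, 0.25
n = 5000

# Simulated iid Gaussian log-returns, as in equation (3) over one step dt.
x = [(mu_true - 0.5 * sigma_true**2) * dt
     + sigma_true * math.sqrt(dt) * random.gauss(0.0, 1.0) for _ in range(n)]

# Closed-form MLE, equation (14), then back out the GBM parameters via (13).
m_hat = sum(x) / n
v_hat = sum((xi - m_hat) ** 2 for xi in x) / n
sigma_hat = math.sqrt(v_hat / dt)
mu_hat = m_hat / dt + 0.5 * sigma_hat**2

def loglik(m, v):
    """Gaussian iid log-likelihood, equation (12)."""
    return sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v) for xi in x)

# The closed-form pair (m_hat, v_hat) should beat any perturbed pair.
best = loglik(m_hat, v_hat)
for dm in (-0.001, 0.001):
    for dv in (0.9, 1.1):
        assert best >= loglik(m_hat + dm, v_hat * dv)

print(mu_hat, sigma_hat)  # sigma_hat is close to 0.25; mu_hat is noisy
```

Note that even with 5000 observations the drift estimate remains noisy, which anticipates the remark on drift estimation below equation (15).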
An important remark is in order for this example: given that xᵢ = log s(tᵢ) − log s(t_{i−1}), where s is the observed sample time series for the geometric Brownian motion S, m̂ is expressed as:

    m̂ = Σ_{i=1}^{n} xᵢ / n = (log(s(tₙ)) − log(s(t₀))) / n    (15)

The only relevant information used on the whole S sample is its initial and final points. All of the remaining sample, possibly consisting of hundreds of observations, is useless.¹¹
This is linked to the fact that drift estimation for random processes like GBM is extremely diﬃcult.
In a way, if one could really estimate the drift one would know locally the direction of the market in
the future, which is eﬀectively one of the most diﬃcult problems. The impossibility of estimating this
quantity with precision is what keeps the market alive. This is also one of the reasons for the success of
risk neutral valuation: in risk neutral pricing one is allowed to replace this very diﬃcult drift with the
risk free rate.
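The telescoping in equation (15) is easy to verify numerically; the following Python fragment (illustrative, not from the paper) shows that the sample mean of the log-returns depends on the path only through its endpoints:

```python
import math
import random

random.seed(4)
# Simulate a GBM price path of 1000 daily steps.
s = [100.0]
for _ in range(1000):
    s.append(s[-1] * math.exp(random.gauss(0.0004, 0.01)))

x = [math.log(s[i]) - math.log(s[i - 1]) for i in range(1, len(s))]
m_hat = sum(x) / len(x)                                  # sample mean of log-returns
endpoints = (math.log(s[-1]) - math.log(s[0])) / len(x)  # right-hand side of (15)

print(abs(m_hat - endpoints))  # zero up to floating-point error
```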
⁸ For example, Excel Solver or fminsearch in MATLAB, although for potentially multimodal likelihoods, such as possibly the jump diffusion cases further on, one may have to preprocess the optimisation through a coarse-grid global method such as genetic algorithms or simulated annealing, so as to avoid getting stuck in a local minimum given by a lower mode.
⁹ Hogg and Craig, Introduction to Mathematical Statistics, fourth edition, p. 205, example 4.
¹⁰ Note that the estimator for the variance is biased, since E(v̂) = v(n − 1)/n. The bias can be corrected by multiplying the estimator by n/(n − 1). However, for large data samples the difference is very small.
¹¹ Jacod, in Statistics of Diffusion Processes: Some Elements, CNR-IAMI 94.17, p. 2, in case v is known, shows that the drift estimation can be statistically consistent, converging in probability to the true drift value, only if the number of points times the time step tends to infinity. This confirms that increasing the observations between two given fixed instants is useless; the only helpful limit would be letting the final time of the sample go to infinity, or letting the number of observations increase while keeping a fixed time step.
The drift estimation will improve considerably for mean reverting processes, as shown in the next
sections.
Conﬁdence Levels for Parameter Estimates
For most applications it is important to know the uncertainty around the parameter estimates obtained
from historical data, which is statistically expressed by conﬁdence levels. Generally, the shorter the
historical data sample, the larger the uncertainty as expressed by wider conﬁdence levels.
For the GBM one can derive closed form expressions for the Conﬁdence Interval (CI) of the mean
and standard deviation of returns, using the Gaussian law of independent return increments and the fact
that the sum of independent Gaussians is still Gaussian, with the mean given by the sum of the means
and the variance given by the sum of variances. Let Cₙ = X₁ + X₂ + X₃ + ... + Xₙ; then one knows that Cₙ ∼ N(nm, nv)¹². One then has:

    P((Cₙ − nm)/(√v √n) ≤ z) = P((Cₙ/n − m)/(√v/√n) ≤ z) = Φ(z)    (16)
where Φ(·) denotes the cumulative standard normal distribution. Cₙ/n, however, is the MLE estimate for the sample mean m. Therefore the 95% confidence level for the sample mean is given by:

    Cₙ/n − 1.96 √v/√n ≤ m ≤ Cₙ/n + 1.96 √v/√n    (17)
This shows that as n increases the CI tightens, which means the uncertainty around the parameter
estimates declines.
Since a standard Gaussian squared is a chi-squared, and the sum of n independent chi-squared is a chi-squared with n degrees of freedom, this gives:

    P(Σ_{i=1}^{n} ((xᵢ − m̂)/√v)² ≤ z) = P((n/v) v̂ ≤ z) = χ²ₙ(z),    (18)
hence under the assumption that the returns are normal one finds the following confidence interval for the sample variance:

    (n/q_{χ,U}) v̂ ≤ v ≤ (n/q_{χ,L}) v̂    (19)

where q_{χ,L} and q_{χ,U} are the quantiles of the chi-squared distribution with n degrees of freedom corresponding to the desired confidence level.
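A quick way to sanity-check intervals (17) and (19) is a coverage simulation: generate many Gaussian samples with known m and v and count how often the true parameter falls inside its interval. The Python sketch below is illustrative; it uses the Wilson-Hilferty approximation for the chi-squared quantiles rather than a statistical table, and both coverages should come out near 95%:

```python
import math
import random

def chi2_quantile(z, dof):
    """Wilson-Hilferty approximation to the chi-squared quantile at the
    level whose standard-normal quantile is z (e.g. z = 1.96 for 97.5%)."""
    return dof * (1.0 - 2.0 / (9.0 * dof) + z * math.sqrt(2.0 / (9.0 * dof))) ** 3

random.seed(5)
n, trials, z = 100, 2000, 1.96
m_true, v_true = 0.0, 1.0
qU = chi2_quantile(z, n)   # upper 97.5% chi-squared quantile, n dof
qL = chi2_quantile(-z, n)  # lower 2.5% chi-squared quantile, n dof

cover_m = cover_v = 0
for _ in range(trials):
    x = [random.gauss(m_true, math.sqrt(v_true)) for _ in range(n)]
    m_hat = sum(x) / n
    v_hat = sum((xi - m_hat) ** 2 for xi in x) / n
    # Interval (17) for the mean (using the known v for the half-width).
    if m_hat - z * math.sqrt(v_true / n) <= m_true <= m_hat + z * math.sqrt(v_true / n):
        cover_m += 1
    # Interval (19) for the variance.
    if n * v_hat / qU <= v_true <= n * v_hat / qL:
        cover_v += 1

print(cover_m / trials, cover_v / trials)  # both near 0.95
```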
In the absence of these explicit relationships one can also derive the distribution of each of the parameters numerically, by simulating the sample parameters as follows. After the parameters have been estimated (either through MLE or with the best fit to the chosen model; call these the "first parameters"), simulate a sample path of length equal to the historical sample with those parameters. For this sample path one then estimates the best fit parameters as if the historical sample were the simulated one, and obtains a sample of the parameters. Then again, simulating from the first parameters, one goes through this procedure and obtains another sample of the parameters. By iterating one builds a random distribution of each parameter, all starting from the first parameters. An implementation of the simulated parameter distributions is provided in the next Matlab function.
The charts in Figure 4 plot the simulated distribution for the sample mean and sample variance against the normal and chi-squared distributions given by the theory. In both cases the simulated distributions are very close to their theoretical limits. The confidence levels and distributions of individual estimated parameters only tell part of the story. For example, in value-at-risk calculations, one would be interested in finding a particular percentile of the asset distribution. In expected shortfall calculations, one would be interested in computing the expectation of the asset
¹² Even without the assumption of Gaussian log-return increments, but knowing only that log-return increments are iid, one can reach a similar Gaussian limit for Cₙ, but only asymptotically in n, thanks to the central limit theorem.
Code 3 MATLAB Simulating distribution for Sample mean and variance.

load 'ftse100monthly1940.mat'
S=ftse100monthly1940data;
ret=log(S(2:end,2)./S(1:end-1,2));   % monthly log-returns
n=length(ret)
m=sum(ret)/n
v=sqrt(sum((ret-m).^2)/n)            % sample standard deviation of returns
SimR=normrnd(m,v,10000,n);           % 10000 resampled return paths of length n
m_dist=sum(SimR,2)/n;
m_tmp=repmat(m_dist,1,n);
v_dist=sum((SimR-m_tmp).^2,2)/n;
dt=1/12;
sigma_dist=sqrt(v_dist/dt);
mu_dist=m_dist/dt+0.5*sigma_dist.^2;
S0=100;
% 99th percentile of S(3): log S(3) ~ N(log S0+(mu-sigma^2/2)*3, sigma^2*3)
pct_dist=S0*logninv(0.99,(mu_dist-0.5*sigma_dist.^2)*3,sigma_dist*sqrt(3));
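For readers without MATLAB, the resampling idea can be sketched in plain Python. The "first parameters" below are hypothetical monthly-data values chosen for illustration, not estimates from the FTSE100 series:

```python
import math
import random

random.seed(6)
dt = 1.0 / 12.0           # monthly step
mu0, sigma0 = 0.07, 0.15  # hypothetical "first parameters"
n = 360                   # length of the (hypothetical) historical sample

m0 = (mu0 - 0.5 * sigma0**2) * dt  # per-step mean of log-returns
s0 = sigma0 * math.sqrt(dt)        # per-step standard deviation

mu_samples = []
for _ in range(500):
    # Simulate a log-return path of the same length and re-estimate mu.
    x = [random.gauss(m0, s0) for _ in range(n)]
    m_hat = sum(x) / n
    v_hat = sum((xi - m_hat) ** 2 for xi in x) / n
    sig_hat = math.sqrt(v_hat / dt)
    mu_samples.append(m_hat / dt + 0.5 * sig_hat**2)

avg = sum(mu_samples) / len(mu_samples)
print(avg)  # the simulated mu distribution is centred near mu0 = 0.07
```

Sorting `mu_samples` and reading off the empirical 2.5% and 97.5% points gives a simulated confidence band analogous to the closed-form intervals above.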
[Left: frequency histogram of the GBM parameter mu; right: normalised sample variance.]

Figure 4: Distribution of Sample Mean and Variance
conditional on exceeding such a percentile. In the case of the GBM, one knows the marginal distribution of the asset price to be lognormal and one can compute percentiles analytically for a given parameter set. The MATLAB code of Section 2.2, which computes the parameter distributions, also simulates the distribution of the 99th percentile of the asset price after three years when resampling from simulation according to the first estimated parameters. The histogram is plotted in Figure 5.
2.3 Characteristic and Moment Generating Functions
Let X be a random variable, either discrete with probability mass function p_X (first case) or continuous with density f_X (second case). The k-th moment m_k(X) is defined as:
\[ m_k(X) = E(X^k) = \begin{cases} \sum_x x^k \, p_X(x) \\ \int_{-\infty}^{+\infty} x^k f_X(x)\,dx. \end{cases} \]
Moments are not guaranteed to exist; for example, the Cauchy distribution does not even admit the first moment (mean). The moment generating function M_X(t) of a random variable X is defined, under some technical conditions for existence, as:
\[ M_X(t) = E(e^{tX}) = \begin{cases} \sum_x e^{tx} \, p_X(x) \\ \int_{-\infty}^{+\infty} e^{tx} f_X(x)\,dx \end{cases} \]
Figure 5: Distribution for the 99th Percentile of the FTSE100 over a Three-Year Period (histogram, with S0 = 100).
and can be seen to be the exponential generating function of its sequence of moments:
\[ M_X(t) = \sum_{k=0}^{\infty} \frac{m_k(X)}{k!}\, t^k. \qquad (20) \]
The moment generating function (MGF) owes its name to the fact that from its knowledge one may back out all moments of the distribution. Indeed, one sees easily that by differentiating the MGF j times and evaluating the derivative at t = 0 one obtains the j-th moment:
\[ \left.\frac{d^j}{dt^j} M_X(t)\right|_{t=0} = m_j(X). \qquad (21) \]
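Property (21) can be checked numerically. The following Python sketch uses a toy two-point distribution (chosen purely for illustration) and recovers the first two moments from finite-difference derivatives of the MGF at t = 0:

```python
import math

# Toy discrete distribution (illustrative): P(X=1) = 0.3, P(X=2) = 0.7,
# so m_1 = 1.7 and m_2 = 0.3*1 + 0.7*4 = 3.1.
def mgf(t):
    return 0.3 * math.exp(1.0 * t) + 0.7 * math.exp(2.0 * t)

h = 1e-5
# first moment: central-difference approximation of M'(0)
m1 = (mgf(h) - mgf(-h)) / (2 * h)
# second moment: central-difference approximation of M''(0)
m2 = (mgf(h) - 2 * mgf(0.0) + mgf(-h)) / h**2
```

The finite-difference values agree with the exact moments 1.7 and 3.1 up to discretisation error.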
A better and more general definition is that of the characteristic function, defined as:
\[ \phi_X(t) = E(e^{itX}), \]
where i is the unit imaginary number from complex numbers theory. The characteristic function exists for
every possible density, even for the Cauchy distribution, whereas the moment generating function does not
exist for some random variables, notably some with inﬁnite moments. If moments exist, one can derive
them also from the characteristic function through diﬀerentiation. Basically, the characteristic function
forces one to bring complex numbers into the picture but brings in considerable advantages. Finally, it
is worth mentioning that the moment generating function can be seen as the Laplace transform of the
probability density, whereas the characteristic function is the Fourier transform.
2.4 Properties of Moment Generating and Characteristic Functions
If two random variables, X and Y , have the same moment generating function M(t), then X and Y have
the same probability distribution. The same holds, more generally, for the characteristic function.
If X and Y are independent random variables and W = X +Y , then the moment generating function
of W is the product of the moment generating functions of X and Y :
\[ M_{X+Y}(t) = E(e^{t(X+Y)}) = E(e^{tX} e^{tY}) = E(e^{tX})\,E(e^{tY}) = M_X(t)\,M_Y(t). \]
The same property holds for the characteristic function. This is a very powerful property since the sum
of independent returns or shocks is common in modelling. Finding the density of a sum of independent
random variables can be diﬃcult, resulting in a convolution, whereas in moment or characteristic function
space it is simply a multiplication. This is why typically one moves in moment generating space, computes
a simple multiplication and then goes back to density space rather than working directly in density space
through convolution.
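This multiplication property is easy to verify numerically in the Gaussian case. A Python sketch (parameter values are arbitrary illustrations) checks that the product of two normal MGFs equals the MGF of a normal with summed means and variances:

```python
import math

def normal_mgf(t, mu, sig2):
    # M_Z(t) = exp(mu*t + sig2*t^2/2) for Z ~ N(mu, sig2)
    return math.exp(mu * t + 0.5 * sig2 * t * t)

mu1, s1 = 0.1, 0.04    # assumed parameters of X
mu2, s2 = -0.05, 0.09  # assumed parameters of Y
# compare M_X(t)*M_Y(t) with M_{X+Y}(t) at a few points
max_err = max(
    abs(normal_mgf(t, mu1, s1) * normal_mgf(t, mu2, s2)
        - normal_mgf(t, mu1 + mu2, s1 + s2))
    for t in (-1.0, 0.5, 2.0)
)
```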
As an example, if X and Y are two independent random variables, X normal with mean µ_1 and variance σ_1², and Y normal with mean µ_2 and variance σ_2², then one knows that W = X + Y is normal with mean µ_1 + µ_2 and variance σ_1² + σ_2². In fact, the moment generating function of a normal random variable Z ∼ N(µ, σ²) is given by:
\[ M_Z(t) = e^{\mu t + \frac{\sigma^2}{2} t^2}. \]
Hence
\[ M_X(t) = e^{\mu_1 t + \frac{\sigma_1^2}{2} t^2} \quad \text{and} \quad M_Y(t) = e^{\mu_2 t + \frac{\sigma_2^2}{2} t^2}. \]
From this,
\[ M_W(t) = M_X(t)\,M_Y(t) = e^{\mu_1 t + \frac{\sigma_1^2}{2} t^2}\, e^{\mu_2 t + \frac{\sigma_2^2}{2} t^2} = e^{(\mu_1 + \mu_2)t + \frac{(\sigma_1^2 + \sigma_2^2)}{2} t^2}, \]
and one obtains the moment generating function of a normal random variable with mean (µ_1 + µ_2) and variance (σ_1² + σ_2²), as expected.
3 Fat Tails: GARCH Models
GARCH^13 models were first introduced by Bollerslev (1986). The assumption underlying a GARCH model is that volatility changes with time and with the past information. In contrast, the GBM model introduced in the previous section assumes that volatility (σ) is constant. GBM could have been easily generalised to a time-dependent but fully deterministic volatility, not depending on the state of the process at any time. GARCH instead allows the volatility to depend on the evolution of the process. In particular, the GARCH(1,1) model, introduced by Bollerslev, assumes that the conditional variance (i.e. conditional on information available up to time t_i) is given by a linear combination of the past variance σ(t_{i−1})² and squared values of past returns:
\[ \frac{\Delta S(t_i)}{S(t_i)} = \mu\,\Delta t_i + \sigma(t_i)\,\Delta W(t_i) \]
\[ \sigma(t_i)^2 = \omega\,\bar\sigma^2 + \alpha\,\sigma(t_{i-1})^2 + \beta\,\epsilon(t_{i-1})^2 \]
\[ \epsilon(t_i)^2 = \left(\sigma(t_i)\,\Delta W(t_i)\right)^2 \qquad (22) \]
where ∆S(t_i) = S(t_{i+1}) − S(t_i) and the first equation is just the discrete-time counterpart of Equation (1).
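Before turning to the MATLAB routines later in this section, the variance recursion of Equation (22) can be sketched in a few lines of Python (all parameter values below are assumed purely for illustration):

```python
import math
import random

random.seed(42)
sigma_bar2 = 0.04            # long-term variance (assumed)
w, a, b = 0.1, 0.8, 0.1      # weights omega, alpha, beta, summing to one
mu, dt = 0.0, 1.0

var = sigma_bar2             # start from the long-term variance
returns = []
for _ in range(1000):
    eps = math.sqrt(var * dt) * random.gauss(0.0, 1.0)   # sigma(t_i) * dW(t_i)
    returns.append(mu * dt + eps)
    var = w * sigma_bar2 + a * var + b * eps * eps       # Eq. (22) variance update
```

With the weights summing to one, the simulated variance fluctuates around the long-term level sigma_bar2.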
If one had just written the discrete-time version of GBM, σ(t_i) would just be a known function of time and one would stop with the first equation above. But in this case, in the above GARCH model σ(t)² is assumed itself to be a stochastic process, although one depending only on past information. Indeed, for example,
\[ \sigma(t_2)^2 = \omega\bar\sigma^2 + \beta\epsilon(t_1)^2 + \alpha\omega\bar\sigma^2 + \alpha^2\sigma(t_0)^2, \]
so that σ(t_2)² depends on the random variable ε(t_1)². Clearly, conditional on the information at the previous time t_1 the volatility is no longer random, but unconditionally it is, contrary to the GBM volatility.
In GARCH the variance is an autoregressive process around a long-term average variance rate σ̄². The GARCH(1,1) model assumes one lag period in the variance. Interestingly, with ω = 0 and β = 1 − α one recovers the exponentially weighted moving average model, which, unlike the equally weighted MLE estimator in the previous section, puts more weight on the recent observations. The GARCH(1,1) model can be generalised to a GARCH(p, q) model with p lag terms in the variance and q terms in the squared returns. The GARCH parameters can be estimated by ordinary least squares regression or by maximum likelihood estimation, subject to the constraint ω + α + β = 1, a constraint summarising the fact that the variance at each instant is a weighted sum where the weights add up to one.
As a result of the time-varying, state-dependent volatility, the unconditional distribution of returns is non-Gaussian and exhibits so-called "fat" tails. Fat-tailed distributions allow the realizations of the random variables having those distributions to assume values that are more extreme than in the normal/Gaussian
13. Generalised Autoregressive Conditional Heteroscedasticity.
Figure 6: Historical FTSE100 Return Distribution vs. NGARCH Return Distribution (left: histogram of historical, NGARCH-simulated and normal returns; right: QQ-plot of simulated against historical quantiles).
case, and are therefore more suited to model risks of large movements than the Gaussian. Therefore,
returns in GARCH will tend to be “riskier” than the Gaussian returns of GBM.
GARCH is not the only approach to account for fat tails, and this report will introduce other ap
proaches to fat tails later on.
The particular structure of the GARCH model allows for volatility clustering (autocorrelation in the
volatility), which means a period of high volatility will be followed by high volatility and conversely
a period of low volatility will be followed by low volatility. This is an often observed characteristic
of ﬁnancial data. For example, Figure 8 shows the monthly volatility of the FTSE100 index and the
corresponding autocorrelation function, which conﬁrms the presence of autocorrelation in the volatility.
A long line of research followed the seminal work of Bollerslev, leading to customisations and generalisations including exponential GARCH (EGARCH) models, threshold GARCH (TGARCH) models and nonlinear GARCH (NGARCH) models, among others. In particular, the NGARCH model introduced by Engle and Ng (1993) is an interesting extension of the GARCH(1,1) model, since it allows for asymmetric behaviour in the volatility, such that "good news" or positive returns yield subsequently lower volatility, while "bad news" or negative returns yield a subsequent increase in volatility. The NGARCH is specified as follows:
\[ \frac{\Delta S(t_i)}{S(t_i)} = \mu\,\Delta t_i + \sigma(t_i)\,\Delta W(t_i) \]
\[ \sigma(t_i)^2 = \omega + \alpha\,\sigma(t_{i-1})^2 + \beta\,\left(\epsilon(t_{i-1}) - \gamma\,\sigma(t_{i-1})\right)^2 \]
\[ \epsilon(t_i)^2 = \left(\sigma(t_i)\,\Delta W(t_i)\right)^2 \qquad (23) \]
The NGARCH parameters ω, α and β are positive, and α, β and γ are subject to the stationarity constraint α + β(1 + γ²) < 1.
Compared to the GARCH(1,1), this model contains an additional parameter γ, which is an adjustment to the return innovations. γ equal to zero leads to the symmetric GARCH(1,1) model, in which positive and negative ε(t_{i−1}) have the same effect on the conditional variance. In the NGARCH γ is positive, thus reducing the impact of good news (ε(t_{i−1}) > 0) and increasing the impact of bad news (ε(t_{i−1}) < 0) on the variance.
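The asymmetric news effect can be read directly off the variance update in Equation (23). A small Python sketch (all parameter values assumed for illustration) compares the next-period variance after a positive and after a negative shock of equal size:

```python
import math

# assumed NGARCH parameters
omega, alpha, beta, gamma = 1e-5, 0.7, 0.1, 0.6
assert alpha + beta * (1 + gamma**2) < 1   # stationarity constraint

def next_var(eps_prev, s2_prev):
    # variance update of Eq. (23)
    return omega + alpha * s2_prev + beta * (eps_prev - gamma * math.sqrt(s2_prev))**2

s2 = 0.0004        # previous conditional variance (assumed)
shock = 0.02       # shock magnitude
v_good = next_var(+shock, s2)   # positive return: "good news"
v_bad = next_var(-shock, s2)    # negative return: "bad news"
```

With γ > 0, the negative shock produces the larger next-period variance.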
MATLAB Routine 4 extends Routine 1 by simulating the GBM with the NGARCH volatility structure. Notice that there is only one loop in the simulation routine, which is the time loop. Since the GARCH model is path dependent it is no longer possible to vectorise the simulation over the time step. For illustration purposes, this model is calibrated to the historic asset returns of the FTSE100 index. In order to estimate the parameters for the NGARCH geometric Brownian motion, the maximum-likelihood method introduced in the previous section is used.
Figure 7: NGARCH News Effect vs GARCH News Effect (4 years of daily simulation: market news level; GBM log-return volatility with GARCH and with N-GARCH effects; GBM log returns, plain, with GARCH effects and with N-GARCH effects).
Although the variance is changing over time and is state dependent, and hence stochastic conditional on information at the initial time, it is locally deterministic, which means that conditional on the information at t_{i−1} the variance is known and constant in the next step between time t_{i−1} and t_i. In other words,
Figure 8: Monthly Volatility of the FTSE100 and Autocorrelation Function (left: monthly volatility in %, 1940–2003; right: sample autocorrelation function (ACF) of the volatility by lag).
Code 4 MATLAB: Code to Simulate GBM with NGARCH volatility.
function S=NGARCH_simulation(N_Sim,T,params,S0)
    mu_=params(1); omega_=params(2); alpha_=params(3); beta_=params(4); gamma_=params(5);
    S=S0*ones(N_Sim,T);
    v0=omega_/(1-alpha_-beta_);          % long-run variance as starting value
    v=v0*ones(N_Sim,1);
    BM=normrnd(0,1,N_Sim,T);
    for i=2:T                            % time loop: GARCH is path dependent
        sigma_=sqrt(v);
        drift=(mu_-0.5*sigma_.^2);
        S(:,i)=S(:,i-1).*exp(drift+sigma_.*BM(:,i));
        v=omega_+alpha_*v+beta_*(sqrt(v).*BM(:,i)-gamma_*sqrt(v)).^2;
    end
end
the variance in the GARCH models is locally constant. Therefore, conditional on the variance at t_{i−1}, the density of the process at the next instant is still normal^14. Moreover, the conditional returns are still independent. The log-likelihood function for the sample x_1, ..., x_i, ..., x_n of
\[ X_i = \frac{\Delta S(t_i)}{S(t_i)} \]
is given by:
\[ L^*(\Theta) = \sum_{i=1}^{n} \log f_\Theta(x_i; x_{i-1}) \qquad (24) \]
\[ f_\Theta(x_i; x_{i-1}) = f_N(x_i; \mu, \sigma_i^2) = \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\!\left[-\frac{(x_i - \mu)^2}{2\sigma_i^2}\right] \qquad (25) \]
\[ L^*(\Theta) = \text{constant} + \frac{1}{2}\sum_{i=1}^{n}\left[-\log(\sigma_i^2) - (x_i - \mu)^2/\sigma_i^2\right] \qquad (26) \]
where f_Θ is the density function of the normal distribution. Note the appearance of x_{i−1} in the function, which indicates that the normal density is only locally applicable between successive returns, as noticed previously. The parameter set for the NGARCH GBM includes five parameters, Θ = (µ, ω, α, β, γ). Due to the time-changing, state-dependent variance one no longer finds explicit solutions for the individual parameters and has to use a numerical scheme to maximise the likelihood function through a solver. To do so one only needs to maximise the term in brackets in (26), since the constant and the factor do not depend on the parameters.
14. Note that the unconditional density of the NGARCH GBM is no longer normal, but an unknown heavy-tailed distribution.
Code 5 MATLAB: Code to Estimate Parameters for GBM NGARCH.
function [mu_ omega_ alpha_ beta_ gamma_]=NG_calibration(data,params)
    returns = price2ret(data);
    returnsLength = length(returns);
    options = optimset('MaxFunEvals',100000,'MaxIter',100000);
    params = fminsearch(@NG_JGBM_LL,params,options);
    mu_=params(1); omega_=abs(params(2)); alpha_=abs(params(3));
    beta_=abs(params(4)); gamma_=params(5);
    function mll = NG_JGBM_LL(params)
        mu_=params(1); omega_=abs(params(2)); alpha_=abs(params(3));
        beta_=abs(params(4)); gamma_=params(5);
        denum = 1-alpha_-beta_*(1+gamma_^2);
        if denum < 0; mll = intmax; return; end   % variance stationarity test
        var_t = omega_/denum;                     % unconditional variance as start
        innov = returns(1)-mu_;
        LogLikelihood = log(exp(-innov^2/var_t/2)/sqrt(pi*2*var_t));
        for time=2:returnsLength
            var_t = omega_+alpha_*var_t+beta_*(innov-gamma_*sqrt(var_t))^2;
            if var_t < 0; mll = intmax; return; end;
            innov = returns(time)-mu_;
            LogLikelihood = LogLikelihood + log(exp(-innov^2/var_t/2)/sqrt(pi*2*var_t));
        end
        mll = -LogLikelihood;                     % minimise the negative log-likelihood
    end
end
Having estimated the best-fit parameters, one can verify the fit of the NGARCH model for the FTSE100 monthly returns using the QQ-plot. A large number of monthly returns is first simulated using Routine 4. The QQ-plot of the simulated returns against the historical returns shows that the GARCH model does a significantly better job when compared to the constant-volatility model of the previous section. The quantiles of the simulated NGARCH distribution are much more aligned with the quantiles of the historical return distribution than was the case for the plain normal distribution (compare the QQ-plots in Figures 2 and 6).
4 Fat Tails: Jump Diﬀusion Models
Compared with the normal distribution of the returns in the GBM model, the log-returns of a GBM model with jumps are often leptokurtotic^15. This section discusses the implementation of a jump-diffusion model with compound Poisson jumps. In this model, a standard time-homogeneous Poisson process dictates the arrival of jumps, and the jump sizes are random variables with some preferred distribution (typically Gaussian, exponential or multinomial). Merton (1976) applied this model to option pricing for stocks that can be subject to idiosyncratic shocks, modelled through jumps. The model SDE can be written as:
\[ dS(t) = \mu S(t)\,dt + \sigma S(t)\,dW(t) + S(t)\,dJ(t) \qquad (27) \]
where again W(t) is a univariate Wiener process and J(t) is the univariate jump process defined by:
\[ J(T) = \sum_{j=1}^{N(T)} (Y_j - 1), \quad \text{or} \quad dJ(t) = (Y_{N(t)} - 1)\,dN(t), \qquad (28) \]
where (N(T))_{T≥0} follows a homogeneous Poisson process with intensity λ, and is thus distributed like a Poisson distribution with parameter λT. The Poisson distribution is a discrete probability distribution that expresses the probability of a number of events occurring in a fixed period of time. Since the Poisson distribution is the limiting distribution of a binomial distribution where the number of repetitions n of the Bernoulli experiment tends to infinity, each experiment having probability of success λT/n, it is usually adopted to model the occurrence of rare events. For a Poisson distribution with parameter λT, the Poisson probability mass function is:
\[ f_P(x; \lambda T) = \frac{\exp(-\lambda T)\,(\lambda T)^x}{x!}, \quad x = 0, 1, 2, \ldots \]
15. Fat-tailed distribution.
(recall that by convention 0! = 1). The Poisson process (N(T))_{T≥0} counts the number of arrivals in [0, T], and Y_j is the size of the j-th jump. The Y_j are i.i.d. lognormal variables, Y_j ∼ exp(N(µ_Y, σ_Y²)), that are also independent of the Brownian motion W and the basic Poisson process N.
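As a sanity check on the Poisson mass function above, the following Python snippet (intensity and horizon assumed for illustration) verifies that the truncated probabilities sum to one and have mean λT, up to negligible truncation error:

```python
import math

lam, T = 10.0, 1.0 / 252   # assumed: 10 jumps per year, one trading day
lt = lam * T
# truncated Poisson probability mass function f_P(x; lambda*T)
pmf = [math.exp(-lt) * lt**x / math.factorial(x) for x in range(20)]
total = sum(pmf)
mean = sum(x * p for x, p in enumerate(pmf))
```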
Also in the jump-diffusion case, and for comparison with (2), it can be useful to write the equation for the logarithm of S, which in this case reads:
\[ d\log S(t) = \left(\mu - \frac{1}{2}\sigma^2\right) dt + \sigma\,dW(t) + \log(Y_{N(t)})\,dN(t), \quad \text{or, equivalently,} \qquad (29) \]
\[ d\log S(t) = \left(\mu + \lambda\mu_Y - \frac{1}{2}\sigma^2\right) dt + \sigma\,dW(t) + \left[\log(Y_{N(t)})\,dN(t) - \mu_Y\,\lambda\,dt\right], \]
where now both the jump (between square brackets) and the diffusion shocks have zero mean, as is shown right below. Here too it is convenient to do MLE estimation in log-returns space, so that here too one defines the X(t_i)'s according to (10).
The solution of the SDE for the levels S is given easily by integrating the log equation:
\[ S(T) = S(0)\exp\!\left(\left(\mu - \frac{\sigma^2}{2}\right)T + \sigma W(T)\right) \prod_{j=1}^{N(T)} Y_j \qquad (30) \]
Code 6 MATLAB: Code to Simulate GBM with Jumps.
function S = JGBM_simulation(N_Sim,T,dt,params,S0)
    mu_star=params(1); sigma_=params(2); lambda_=params(3);
    mu_y_=params(4); sigma_y_=params(5);
    M_simul = zeros(N_Sim,T);
    for t=1:T
        jumpnb = poissrnd(lambda_*dt,N_Sim,1);   % number of jumps in dt
        jump = normrnd(mu_y_*(jumpnb-lambda_*dt),sqrt(jumpnb)*sigma_y_);  % compensated jump
        M_simul(:,t) = mu_star*dt + sigma_*sqrt(dt)*randn(N_Sim,1) + jump;
    end
    S=ret2price(M_simul',S0)';
end
The discretisation of Equation (30) for the given time step ∆t is:
\[ S(t) = S(t - \Delta t)\exp\!\left(\left(\mu - \frac{\sigma^2}{2}\right)\Delta t + \sigma\sqrt{\Delta t}\,\varepsilon_t\right) \prod_{j=1}^{n_t} Y_j \qquad (31) \]
where ε_t ∼ N(0, 1) and n_t = N(t) − N(t − ∆t) counts the jumps between times t − ∆t and t. Rewriting this equation for the log returns X(t) := ∆ log(S(t)) = log(S(t)) − log(S(t − ∆t)), we find:
\[ X(t) = \Delta\log(S(t)) = \mu^*\,\Delta t + \sigma\sqrt{\Delta t}\,\varepsilon_t + \Delta J^*_t \qquad (32) \]
where the jumps ∆J*_t in the time interval ∆t and the drift µ* are defined as:
\[ \Delta J^*_t = \sum_{j=1}^{n_t} \log(Y_j) - \lambda\,\Delta t\,\mu_Y, \qquad \mu^* = \mu + \lambda\mu_Y - \frac{1}{2}\sigma^2, \qquad (33) \]
so that the jumps ∆J*_t have zero mean. Since the calibration and simulation are done in a discrete-time framework, we use the fact that for the Poisson process with rate λ, the number of jump events occurring in the fixed small time interval ∆t has expectation λ∆t. Compute:
\[ E(\Delta J^*_t) = E\!\left[\sum_{j=1}^{n_t}\log Y_j\right] - \lambda\,\Delta t\,\mu_Y = E\!\left[E\!\left[\sum_{j=1}^{n_t}\log Y_j \,\middle|\, n_t\right]\right] - \lambda\,\Delta t\,\mu_Y = E(n_t\,\mu_Y) - \lambda\,\Delta t\,\mu_Y = 0. \]
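This zero-mean property of the compensated jump ∆J*_t can also be confirmed by Monte Carlo. The Python sketch below uses the jump parameters of Figure 9 with an assumed monthly time step:

```python
import math
import random

random.seed(0)
lam, dt = 10.0, 1.0 / 12          # intensity and (assumed) time step
mu_y, sigma_y = 0.03, 0.001       # lognormal jump-size parameters (Figure 9)
n_sim = 200000

def poisson_sample(mean):
    # Knuth's method, adequate for small means
    L, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

total = 0.0
for _ in range(n_sim):
    n_t = poisson_sample(lam * dt)
    # compensated jump: sum of log jump sizes minus lambda*dt*mu_Y
    total += sum(random.gauss(mu_y, sigma_y) for _ in range(n_t)) - lam * dt * mu_y
mean_jump = total / n_sim
```

The sample mean of the compensated jump is close to zero, as the derivation above predicts.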
Figure 9: The Effect of the Jumps on the Simulated GBM Path and on the Probability Density of the Log Returns (1 year of daily simulation: jump log-return levels; GBM and GBM-with-jumps paths; histograms of simulated daily log returns with and without jumps). µ* = 0, σ = 0.15, λ = 10, µ_Y = 0.03, σ_Y = 0.001.
The simulation of this model is best done by using Equation (32).
Conditional on the jumps occurrence n_t, the jump disturbance ∆J*_t is normally distributed:
\[ \Delta J^*_t \mid n_t \;\sim\; N\!\left((n_t - \lambda\Delta t)\,\mu_Y,\; n_t\,\sigma_Y^2\right). \]
Thus the conditional distribution of X(t) = ∆ log(S(t)) is also normal, and the first two conditional moments are:
\[ E(\Delta\log(S(t)) \mid n_t) = \mu^*\,\Delta t + (n_t - \lambda\,\Delta t)\,\mu_Y = \left(\mu - \sigma^2/2\right)\Delta t + n_t\,\mu_Y, \qquad (34) \]
\[ \mathrm{Var}(\Delta\log(S(t)) \mid n_t) = \sigma^2\,\Delta t + n_t\,\sigma_Y^2. \qquad (35) \]
The probability density function is the sum of the conditional probability densities weighted by the probability of the conditioning variable, the number of jumps. The calibration of the model parameters can then be done using MLE on the log-returns X, conditioning on the jumps.
Code 7 MATLAB: Code to Calibrate GBM with Jumps.
function [mu_star sigma_ lambda_ mu_y_ sigma_y_] = JGBM_calibration(data,dt,params)
    returns = price2ret(data);
    returnsLength = length(returns);
    options = optimset('MaxFunEvals',100000,'MaxIter',100000);
    params = fminsearch(@FT_JGBM_LL,params,options);
    mu_star=params(1); sigma_=abs(params(2)); lambda_=abs(params(3));
    mu_y_=params(4); sigma_y_=abs(params(5));
    function mll = FT_JGBM_LL(params)
        mu_=params(1); sigma_=abs(params(2)); lambda_=abs(params(3));
        mu_y_=params(4); sigma_y_=abs(params(5));
        Max_jumps = 5;                       % truncate the Poisson mixture
        factoriel = factorial(0:Max_jumps);
        LogLikelihood = 0;
        for time=1:returnsLength
            ProbDensity = 0;
            for jumps=0:Max_jumps-1
                jumpProb = exp(-lambda_*dt)*(lambda_*dt)^jumps/factoriel(jumps+1);
                condVol = dt*sigma_^2+jumps*sigma_y_^2;
                condMean = mu_*dt+mu_y_*(jumps-lambda_*dt);
                condToJumps = exp(-(returns(time)-condMean)^2/condVol/2)/sqrt(pi*2*condVol);
                ProbDensity = ProbDensity + jumpProb*condToJumps;
            end
            LogLikelihood = LogLikelihood + log(ProbDensity);
        end
        mll = -LogLikelihood;                % minimise the negative log-likelihood
    end
end
\[ \operatorname{argmax}_{\mu^*,\,\mu_Y,\,\lambda>0,\,\sigma>0,\,\sigma_Y>0} L^* = \log(L) \qquad (36) \]
The log-likelihood function for the returns X(t) at observed times t = t_1, ..., t_n with values x_1, ..., x_n, with ∆t = t_i − t_{i−1}, is then given by:
\[ L^*(\Theta) = \sum_{i=1}^{n} \log f(x_i; \mu, \mu_Y, \lambda, \sigma, \sigma_Y) \qquad (37) \]
\[ f(x_i; \mu, \mu_Y, \lambda, \sigma, \sigma_Y) = \sum_{j=0}^{+\infty} P(n_t = j)\, f_N\!\left(x_i;\, \left(\mu - \sigma^2/2\right)\Delta t + j\mu_Y,\; \sigma^2\Delta t + j\sigma_Y^2\right) \qquad (38) \]
This is an infinite mixture of Gaussian random variables, each weighted by a Poisson probability P(n_t = j) = f_P(j; λ∆t). Notice that if ∆t is small, the Poisson process typically jumps at most once. In that case, the above expression simplifies to:
\[ f(x_i; \mu, \mu_Y, \lambda, \sigma, \sigma_Y) = (1 - \lambda\Delta t)\, f_N\!\left(x_i;\, \left(\mu - \sigma^2/2\right)\Delta t,\; \sigma^2\Delta t\right) + \lambda\,\Delta t\, f_N\!\left(x_i;\, \left(\mu - \sigma^2/2\right)\Delta t + \mu_Y,\; \sigma^2\Delta t + \sigma_Y^2\right), \]
i.e. a mixture of two Gaussian random variables weighted by the probability of zero or one jump in ∆t.
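The accuracy of this two-term simplification for small ∆t can be checked numerically. The Python sketch below (parameter values assumed for illustration) evaluates a truncated version of the full Poisson mixture and the two-Gaussian approximation at a point:

```python
import math

def norm_pdf(x, m, v):
    return math.exp(-(x - m)**2 / (2 * v)) / math.sqrt(2 * math.pi * v)

# assumed parameters and a daily time step
mu, sigma, lam, mu_y, sigma_y = 0.05, 0.15, 3.0, -0.02, 0.05
dt = 1.0 / 252
drift = (mu - sigma**2 / 2) * dt

def full_density(x, n_terms=15):
    # Poisson-weighted mixture of Eq. (38), truncated at n_terms jumps
    s = 0.0
    for j in range(n_terms):
        pj = math.exp(-lam * dt) * (lam * dt)**j / math.factorial(j)
        s += pj * norm_pdf(x, drift + j * mu_y, sigma**2 * dt + j * sigma_y**2)
    return s

def two_term_density(x):
    # zero-or-one-jump approximation
    return ((1 - lam * dt) * norm_pdf(x, drift, sigma**2 * dt)
            + lam * dt * norm_pdf(x, drift + mu_y, sigma**2 * dt + sigma_y**2))

x0 = 0.0
rel_err = abs(full_density(x0) - two_term_density(x0)) / full_density(x0)
```

For this daily step the relative discrepancy between the two densities is tiny, supporting the approximation.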
The model parameters are estimated by MLE for the FTSE100 on monthly returns and then different paths are simulated. The goodness of fit is assessed with the QQ-plot of the simulated monthly log returns against the historical FTSE100 monthly returns in Fig. 10. This plot shows that the jump model fits the data better than the simple GBM model.
Finally, one can add markedly negative jumps to the arithmetic Brownian motion return process (29), to then exponentiate it to obtain a new level process for S featuring a more complex jump behaviour. Indeed, if µ_Y is larger than one and σ_Y² is not large, jumps in the log-returns will tend to be markedly positive only, which may not be realistic in some cases. For details on how to incorporate negative jumps and on the more complex mixture features this induces, the reader can refer to Section 10, where the more general mean-reverting case is described.
Figure 10: Historical FTSE100 Return Distribution vs. Jump GBM Return Distribution (left: histogram of historical, jump-GBM and GBM returns; right: QQ-plot).
5 Fat Tails: Variance Gamma (VG) Process
The two previous sections have addressed the modelling of phenomena where numerically large changes in value ("fat tails") are more likely than in the case of the normal distribution. This section presents yet another method to obtain fat tails, this time through a process built by time-changing a Brownian motion (with drift µ̄ and volatility σ̄) with an independent subordinator, that is, an increasing process with independent and stationary increments. In a way, instead of making the volatility state-dependent as in GARCH or random as in stochastic volatility models, and instead of adding a new source of randomness such as jumps, one makes the time of the process random through a new process, the subordinator. A possible related interpretative feature is that financial time, or market activity time, related to trading, is a random transform of calendar time.
Two models based on Brownian subordination are the Variance Gamma (VG) process and the Normal Inverse Gaussian (NIG) process. They are a subclass of the generalised hyperbolic (GH) distributions. They are characterised by a drift parameter, the volatility parameter of the Brownian motion, and a variance parameter of the subordinator. Both VG and NIG have exponential tails; the tail decay rate is the same for the two distributions and is smaller than for the normal distribution. Therefore these models allow for more flexibility in capturing fat-tail features and the high kurtosis of some historical financial distributions with respect to GBM.
Among fat-tailed distributions obtained through subordination, this section will focus on the VG distribution, which can also be thought of as a mixture of normal distributions, where the mixing weights density is given by the Gamma distribution of the subordinator. The philosophy is not so different from the GBM with jumps seen before, where the return process is also a mixture. The VG distribution was introduced in the financial literature by Madan and Seneta (1990).^16
The model considered for the log-returns of an asset price is of this type:
\[ d\log S(t) = \bar\mu\,dt + \bar\theta\,dg(t) + \bar\sigma\,dW(g(t)), \qquad S(0) = S_0 \qquad (39) \]
where µ̄, θ̄ and σ̄ are real constants and σ̄ > 0.^17 This model differs from the "usual" notation of Brownian motion mainly in the term g(t). In fact, one introduces g(t) to characterise the market activity time. One can define the 'market time' as a positive increasing random process g(t) with stationary increments g(u) − g(t) for u ≥ t ≥ 0. In probabilistic jargon, g(t) is a subordinator. An important assumption is that E[g(u) − g(t)] = u − t. This means that the market time has to reconcile with the calendar time between t and u on average. Conditional on the subordinator g, between two instants t and u:
\[ \bar\sigma\,\left(W(g(u)) - W(g(t))\right)\,\big|\,g \;\sim\; \bar\sigma\,\sqrt{g(u) - g(t)}\;\epsilon, \qquad (40) \]
16. For a more advanced introduction and background on the above processes refer to Seneta (2004); Cont and Tankov (2004); Moosbrucker (2006) and Madan, Carr, and Chang (1998).
17. In addition, one may say that (39) is a Lévy process of finite variation but of relatively low activity with small jumps.
where ε is a standard Gaussian, and hence:
\[ \left(\log(S(u)/S(t)) - \bar\mu(u - t)\right)\big|\,g \;\sim\; N\!\left(\bar\theta\,(g(u) - g(t)),\; \bar\sigma^2\,(g(u) - g(t))\right) \qquad (41) \]
VG assumes {g(t)} ∼ Γ(t/ν, ν), a Gamma process with parameter ν that is independent of the standard Brownian motion {W(t)}_{t≥0}. Notice also that the Gamma process definition implies {g(u) − g(t)} ∼ Γ((u − t)/ν, ν) for any u and t; increments are independent and stationary Gamma random variables. Consistently with the above condition on the expected subordinator, this Gamma assumption implies E[g(u) − g(t)] = u − t and Var(g(u) − g(t)) = ν(u − t).
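These two increment moments follow from the shape–scale parameterisation of the Gamma law and can be confirmed by sampling. A Python sketch, with ν and ∆t values assumed purely for illustration:

```python
import random

random.seed(1)
nu, dt = 0.2, 1.0 / 12        # assumed subordinator variance parameter and step
shape, scale = dt / nu, nu    # increment g(t+dt) - g(t) ~ Gamma(dt/nu, nu)
n = 200000
samples = [random.gammavariate(shape, scale) for _ in range(n)]
m = sum(samples) / n                          # should be close to dt
v = sum((s - m)**2 for s in samples) / n      # should be close to nu*dt
```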
Through iterated expectations, one has easily that:
\[ E[\log(S(u)) - \log(S(t))] = (\bar\mu + \bar\theta)(u - t), \qquad \mathrm{Var}(\log S(u) - \log S(t)) = (\bar\sigma^2 + \bar\theta^2\nu)(u - t). \]
The mean and variance of these log-returns can be compared to the log-returns of GBM in Equation (3), and found to be the same if µ̄ + θ̄ = µ − σ²/2 and σ̄² + θ̄²ν = σ². In this case, the differences will be visible with higher-order moments.
Code 8 MATLAB: Code to simulate the asset model in Eq. (39).
function S = VG_simulation(N_Sim,T,dt,params,S0)
    theta = params(1); nu = params(2); sigma = params(3); mu = params(4);
    g = gamrnd(dt/nu,nu,N_Sim,T);   % gamma subordinator increments
    S = S0*cumprod([ones(N_Sim,1), exp(mu*dt + theta*g + sigma*sqrt(g).*randn(N_Sim,T))],2);
end
The process in equation (39) is called a Variance-Gamma process.
5.1 Probability Density Function of the VG Process
The unconditional density may then be obtained by integrating out g in Eq. (41) through its Gamma density. This corresponds to the following density function, where one sets u − t =: ∆t and X_∆t := log(S(t + ∆t)/S(t)):
\[ f_{X_{\Delta t}}(x) = \int_0^{\infty} f_N\!\left(x;\, \bar\theta g,\, \bar\sigma^2 g\right) f_\Gamma\!\left(g;\, \frac{\Delta t}{\nu},\, \nu\right) dg \qquad (42) \]
As mentioned above, this is again an (infinite) mixture of Gaussian densities, where the weights are given by the Gamma densities rather than Poisson probabilities. The above integral converges, and the PDF of the VG process X_∆t defined in Eq. (39) is:
\[ f_{X_{\Delta t}}(x) = \frac{2\,e^{\bar\theta(x - \bar\mu)/\bar\sigma^2}}{\bar\sigma\,\sqrt{2\pi}\;\nu^{\Delta t/\nu}\,\Gamma(\Delta t/\nu)} \left(\frac{|x - \bar\mu|}{\sqrt{2\bar\sigma^2/\nu + \bar\theta^2}}\right)^{\Delta t/\nu - 1/2} K_{\Delta t/\nu - 1/2}\!\left(\frac{|x - \bar\mu|\,\sqrt{2\bar\sigma^2/\nu + \bar\theta^2}}{\bar\sigma^2}\right) \qquad (43) \]
Here K_η(·) is a modified Bessel function of the third kind with index η, given for x > 0 by:
\[ K_\eta(x) = \frac{1}{2}\int_0^{\infty} y^{\eta - 1}\exp\!\left(-\frac{x}{2}\left(y^{-1} + y\right)\right) dy. \]
Code 9 MATLAB: Code of the variance gamma density function as in Equation (43).
function fx = VGdensity(x, theta, nu, sigma, mu, T)
    v1 = 2*exp((theta*(x-mu))/sigma^2) / ( (nu^(T/nu)) * sqrt(2*pi) * sigma * gamma(T/nu) );
    M2 = (2*sigma^2)/nu + theta^2;
    v3 = abs(x-mu)./sqrt(M2);
    v4 = v3.^(T/nu - 0.5);
    v6 = (abs(x-mu).*sqrt(M2))./sigma^2;
    K = besselk(T/nu - 0.5, v6);
    fx = v1.*v4.*K;
end
Figure 11: On the left, VG densities for ν = .4, σ̄ = .9, µ̄ = 0 and various values of the parameter θ̄. On the right, simulated VG sample paths. To replicate the simulation, use Code 8 with the parameters µ̄ = θ̄ = 0, ν = 6 and σ̄ = .03.
5.2 Calibration of the VG Process
Given a time series of independent log-returns log(S(t_{i+1})/S(t_i)) (with t_{i+1} − t_i = ∆t), say (x_1, x_2, ..., x_n), and denoting by π = {µ̄, θ̄, σ̄, ν} the variance-gamma probability density function parameters that one would like to estimate, the maximum likelihood estimation (MLE) of π consists in finding π̄ that maximises the logarithm of the likelihood function L(π) = \prod_{i=1}^{n} f(x_i; π). The moment generating function of X_∆t := log(S(t + ∆t)/S(t)), E[e^{zX}], is given by:
\[ M_X(z) = e^{\bar\mu z \Delta t}\left(1 - \bar\theta\nu z - \frac{1}{2}\nu\bar\sigma^2 z^2\right)^{-\Delta t/\nu} \qquad (44) \]
and one obtains the four central moments of the return distribution over an interval of length ∆t:
\[ E[X] = \bar X = (\bar\mu + \bar\theta)\,\Delta t \]
\[ E[(X - \bar X)^2] = (\nu\bar\theta^2 + \bar\sigma^2)\,\Delta t \]
\[ E[(X - \bar X)^3] = (2\bar\theta^3\nu^2 + 3\bar\sigma^2\nu\bar\theta)\,\Delta t \]
\[ E[(X - \bar X)^4] = (3\nu\bar\sigma^4 + 12\bar\theta^2\bar\sigma^2\nu^2 + 6\bar\theta^4\nu^3)\,\Delta t + (3\bar\sigma^4 + 6\bar\theta^2\bar\sigma^2\nu + 3\bar\theta^4\nu^2)\,(\Delta t)^2. \]
Now considering the skewness and the kurtosis:
S = (2¯θ³ν² + 3¯σ²ν¯θ)∆t / ((ν¯θ² + ¯σ²)∆t)^(3/2)

K = [(3ν¯σ⁴ + 12¯θ²¯σ²ν² + 6¯θ⁴ν³)∆t + (3¯σ⁴ + 6¯θ²¯σ²ν + 3¯θ⁴ν²)(∆t)²] / ((ν¯θ² + ¯σ²)∆t)².
and assuming ¯θ to be small, thus ignoring ¯θ², ¯θ³, ¯θ⁴, one obtains:
S = 3ν¯θ / (¯σ√∆t)
K = 3(1 + ν/∆t).
So the initial guess for the parameters π_0 will be:

¯σ = √(V/∆t)
ν = (K/3 − 1)∆t
¯θ = S¯σ√∆t / (3ν)
¯µ = ¯X/∆t − ¯θ
Code 10 MATLAB code of the MLE for the variance gamma density function in (43).

%function params = VG_calibration(data,dt)
% computation of the initial parameters
data = (EURUSDsv(:,1));
hist(data)
dt = 1;
data = price2ret(data);
M = mean(data);
V = var(data); % sample variance (the initial guess formulas use the variance)
S = skewness(data);
K = kurtosis(data);
sigma = sqrt(V/dt);
nu = (K/3 - 1)*dt;
theta = (S*sigma*sqrt(dt))/(3*nu);
mu = (M/dt) - theta;
% VG MLE
pdf_VG = @(data,theta,nu,sigma,mu) VGdensity(data,theta,nu,sigma,mu,dt);
start = [theta, nu, sigma, mu];
lb = [-intmax 0 0 -intmax];
ub = [intmax intmax intmax intmax];
options = statset('MaxIter',1000,'MaxFunEvals',10000);
params = mle(data,'pdf',pdf_VG,'start',start,'lower',lb,'upper',ub,'options',options);
Figure 12: On the left side, VG densities for ¯θ = 1, ν = .4, ¯σ = 1.4, ¯µ = 0 and time from 1 to 10 years. On the right side, on the y-axis, percentiles of these densities as time varies on the x-axis.
An example can be the calibration of the historical time series of the EUR/USD exchange rate, where
the VG process is a more suitable choice than the geometric Brownian motion. The ﬁt is performed
on daily returns and the plot in Fig. 13 illustrates the results. Notice in particular that the diﬀerence
Code 11 MATLAB code to calculate the percentiles of the VG distribution as in Figure 12.

clear all; close all
lb = 0; ub = 20;
options = optimset('Display','off','TolFun',1e-5,'MaxFunEvals',2500);
X = linspace(-15,30,500);
sigma = 1.4;
f = @(x,t) VGdensity(x, 1, .4, sigma, 0, t);
nT = 10;
t = linspace(1,10,nT);
pc = [.1 .25 .5 .75 .9 .95 .99];
for k = 1:nT
    Y(:,k) = VGdensity(X, 1, .4, sigma, 0, t(k));
    f = @(x) VGdensity(x, 1, .4, sigma, 0, t(k));
    VG_PDF = quadl(f,-3,30);
    for zz = 1:length(pc)
        percF = @(a) quadl(f,-3,a) - pc(zz);
        PercA(k,zz) = fzero(percF,1,options);
    end
end
subplot(1,2,1)
plot(X,Y);
legend(' 1yr',' 2yr',' 3yr',' 4yr',' 5yr',' 6yr',' 7yr',' 8yr',' 9yr','10yr')
xlim([-5 27]); ylim([0 .32]); subplot(1,2,2); plot(t,PercA);
for zz = 1:length(pc)
    text(t(zz),PercA(zz,zz),[{'\fontsize{9}\color{black}',num2str(pc(zz))}],'HorizontalAlignment',...
        'center','BackgroundColor',[1 1 1]); hold on
end
xlim([.5 10]); xlabel('years'); grid on
between the Arithmetic Brownian motion for the returns and the VG process is expressed by the time
change parameter ν, all other parameters being roughly the same in the ABM and VG cases.
6 The Mean Reverting Behaviour
One can define mean reversion as the property to always revert to a certain constant or time-varying level with limited variance around it. This property holds for an AR(1) process if the absolute value of the autoregression coefficient is less than one (|α| < 1). To be precise, the formulation of the first order autoregressive process AR(1) is:

x_(t+1) = µ + αx_t + σǫ_(t+1)  ⇒  ∆x_(t+1) = (1 − α)(µ/(1 − α) − x_t) + σǫ_(t+1)   (45)
All the mean reverting behaviour in the processes that are introduced in this section is due to an
AR(1) feature in the discretised version of the relevant SDE.
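The contrast between a stationary AR(1), whose variance stays bounded, and a random walk, whose variance grows linearly in time (compare Figure 14), can be illustrated with a short simulation. The paper's code is in MATLAB; this is a Python sketch with the parameters of Figure 14:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, alpha, sigma = 0.0, 0.7, 1.0
n_steps, n_paths = 120, 20_000

ar1 = np.zeros(n_paths)
rw = np.zeros(n_paths)
for _ in range(n_steps):
    # AR(1) recursion x_{t+1} = mu + alpha*x_t + sigma*eps, vs a pure random walk
    ar1 = mu + alpha * ar1 + sigma * rng.standard_normal(n_paths)
    rw = rw + mu + sigma * rng.standard_normal(n_paths)

# stationary AR(1) variance: sigma^2/(1 - alpha^2); random walk: sigma^2 * n_steps
print(ar1.var(), sigma**2 / (1 - alpha**2))
print(rw.var(), sigma**2 * n_steps)
```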
In general, before thinking about ﬁtting a speciﬁc mean reverting process to available data, it is
important to check the validity of the mean reversion assumption. A simple way to do this is to test
for stationarity. Since for the AR(1) process |α| < 1 is also a necessary and sufficient condition for stationarity, testing for mean reversion is equivalent to testing for stationarity. Note that in the special case where α = 1, the process behaves like a pure random walk with constant drift, ∆x_(t+1) = µ + σǫ_(t+1).
There is a growing number of stationarity statistical tests available and ready for use in many econometric packages.18 The most popular are:
• the Dickey and Fuller (1979, DF) test;
• the Augmented DF (ADF) test of Said and Dickey (1984);
18 For example Stata, SAS, R, EViews, etc.
Figure 13: On the top chart the historical trend of the EUR/USD is reported. In the middle chart the daily return distribution presents fat tails with respect to a normal distribution. The MLE parameters for the normal random variable are µ = −.0002 and σ = 0.0061. The QQ-plot in the middle shows clearly the presence of tails more pronounced than in the normal case. The bottom chart is the plot of the daily return density against the estimated VG one, and the QQ-plot of the fitted daily historical FX return time series EUR/USD using a VG process with parameters ¯θ = −.0001, ν = 0.4, ¯σ = 0.0061, ¯µ = −.0001.
• the Phillips and Perron (1988, PP) test;
• the Variance Ratio (VR) test of Poterba and Summers (1988) and Cochrane (2001); and
• the Kwiatkowski, Phillips, Schmidt, and Shin (1992, KPSS) test.
All tests give results that are asymptotically valid, and are constructed under speciﬁc assumptions.
Once the stationarity has been established for the time series data, the only remaining test is for the
statistical significance of the autoregressive lag coefficient. A simple Ordinary Least Squares regression of the time series on its one-time-lagged version (Equation (45)) allows one to assess whether α is effectively significant.
Suppose that, as an example, one is interested in analysing the GLOBOXX index history. Due to the lack of Credit Default Swap (CDS) data for the time prior to 2004, the underlying GLOBOXX spread history has to be proxied by an index based on the Merrill Lynch corporate indices.19 This proxy index appears to be mean reverting. The risk that the CDS spreads rise in the medium term, reverting to the historical "long-term mean" of the index, may then impact the risk analysis of a product linked to the GLOBOXX index.
The mean reversion of the spread indices is supported by the intuition that the credit spread is linked to the economic cycle. This assumption also has some support from the econometric analysis. To illustrate this, the mean reversion of a Merrill Lynch index, namely the EMU Corporate A rated 5-7Y index,20 is tested using the Augmented Dickey-Fuller test (Said and Dickey (1984)).
19 Asset Swap Spreads data of a portfolio of equally weighted Merrill Lynch Indices that match the average credit quality of GLOBOXX, its duration, and its geographical composition.
20 Bloomberg Ticker ER33, Asset Swap Spreads data from 01/98 to 08/07.
D. Brigo, A. Dalessandro, M. Neugebauer, F. Triki: A stochastic processes toolkit for Risk Management 27
The ADF test for mean reversion is sensitive to outliers.21 Before testing for mean reversion, the index data have been cleaned by removing the innovations that exceed three times the volatility, as they are considered outliers. Figure 15 shows the EMU Corporate A rated 5-7Y index cleaned from outliers, together with the autocorrelation and partial autocorrelation plots of its log series. From this chart one can deduce that the weekly log spread of the EMU Corporate A rated 5-7Y index shows a clear AR(1) pattern. The AR(1) process is characterised by exponentially decreasing autocorrelations and by a partial autocorrelation that reaches zero after the first lag (see the discussion around Equation (8) for the related definitions):
ACF(k) = α^k, k = 0, 1, 2, . . . ;  PACF(1) = α;  PACF(k) = 0, k = 2, 3, 4, . . .   (46)
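Equation (46) can be checked numerically: the sample autocorrelations of a long simulated AR(1) path decay roughly like α^k. A Python sketch (the parameter values are illustrative, not estimated from the index):

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, sigma, n = 0.9, 1.0, 200_000

eps = sigma * rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = alpha * x[t - 1] + eps[t]   # AR(1) recursion

# sample ACF at lags 1, 2, 3
xc = x - x.mean()
acf = [xc[:-k] @ xc[k:] / (xc @ xc) for k in (1, 2, 3)]
print(acf)  # roughly [alpha, alpha**2, alpha**3]
```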
The mean reversion of the time series has been tested using the Augmented Dickey-Fuller test (Said and Dickey (1984)). The ADF statistic obtained for this time series is −3.7373. The more negative the ADF statistic, the stronger the rejection of the unit root hypothesis.22 The first and fifth percentiles are −3.44 and −2.87. This means that the test rejects the null hypothesis of a unit root at the 1% significance level.
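The logic behind DF-type statistics can be sketched in a few lines: regress ∆x_t on x_(t−1) and look at the t-statistic of the slope, which is strongly negative for a mean-reverting series. The following Python sketch implements only the basic DF statistic (no lag augmentation) on a simulated AR(1) series with illustrative parameters; in practice the critical values come from the Dickey-Fuller distribution, not the usual t tables:

```python
import numpy as np

rng = np.random.default_rng(3)
n, alpha = 500, 0.85
x = np.zeros(n)
for t in range(1, n):
    x[t] = alpha * x[t - 1] + rng.standard_normal()

# regress dx_t = c + gamma * x_{t-1} + eps; the DF statistic is the t-stat of gamma
dx, lag = np.diff(x), x[:-1]
X = np.column_stack([np.ones(n - 1), lag])
beta, *_ = np.linalg.lstsq(X, dx, rcond=None)
resid = dx - X @ beta
s2 = resid @ resid / (n - 1 - 2)
cov = s2 * np.linalg.inv(X.T @ X)
df_stat = beta[1] / np.sqrt(cov[1, 1])
print(df_stat)  # strongly negative for this stationary series
```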
Having discussed AR(1) stylised features in general, this tutorial now embarks on describing speciﬁc
models featuring this behaviour in continuous time.
7 Mean Reversion: The Vasicek Model
The Vasicek model, owing its name to Vasicek (1977), is one of the earliest stochastic models of the short
term interest rate. It assumes that the instantaneous spot rate (or ”short rate”) follows an Ornstein
Uhlenbeck process with constant coeﬃcients under the statistical or objective measure used for historical
estimation:
dx_t = α(θ − x_t)dt + σdW_t   (47)
21 For an outlier due to a bad quote, the up and down movement can be interpreted as a strong mean reversion in that particular data point and is able to influence the final result.
22 Before cleaning the data from the outliers the ADF statistic was −5.0226. This confirms the effect of the outliers in amplifying the detected mean reversion level.
Figure 14: The Random walk vs the AR(1) stationary process. (AR(1): µ = 0, α = 0.7, σ = 1 and
Random Walk: µ = 0, σ = 1)
Figure 15: On the top: the EMU Corporate A rated 5-7Y weekly index from 01/98 to 08/07. On the bottom: the ACF and PACF plots of this time series.
Figure 16: The histogram of the EMU Corporate A rated 5-7Y index.
with α, θ and σ positive and dW_t a standard Wiener process. Under the risk neutral measure used for
valuation and pricing one may assume a similar parametrisation for the same process but with one more
parameter in the drift modelling the market price of risk; see for example Brigo and Mercurio (2006).
This model assumes a mean-reverting stochastic behaviour of interest rates. The Vasicek model has a major shortcoming: the non-null probability of negative rates. This is an unrealistic property for the modelling of positive quantities like interest rates or credit spreads. The explicit solution to the SDE (47) between any two instants s and t, with 0 ≤ s < t, can be easily obtained from the solution to the Ornstein-Uhlenbeck SDE, namely:
x_t = θ(1 − e^(−α(t−s))) + x_s e^(−α(t−s)) + σe^(−αt) ∫_s^t e^(αu) dW_u   (48)
The discrete time version of this equation, on a time grid 0 = t_0, t_1, t_2, . . . with (assumed constant for simplicity) time step ∆t = t_i − t_(i−1), is:
x(t_i) = c + b x(t_(i−1)) + δǫ(t_i)   (49)
where the coefficients are:

c = θ(1 − e^(−α∆t))   (50)
b = e^(−α∆t)   (51)
and ǫ(t) is a Gaussian white noise (ǫ ∼ N(0, 1)). The volatility of the innovations can be deduced from the Itô isometry:

δ = σ √((1 − e^(−2α∆t))/(2α))   (52)
whereas if one used the Euler scheme to discretise the equation this would simply be σ√∆t, which is the same (to first order in ∆t) as the exact expression above.
Equation (49) is exactly the formulation of an AR(1) process (see Section 6). Having 0 < b < 1 when α > 0 implies that this AR(1) process is stationary and mean reverting to a long-term mean given by θ.
One can check this also directly by computing mean and variance of the process. As the distribution of
x(t) is Gaussian, it is characterised by its ﬁrst two moments. The conditional mean and the conditional
variance of x(t) given x(s) can be derived from Equation (48):
E_s[x_t] = θ + (x_s − θ)e^(−α(t−s))   (53)
Var_s[x_t] = (σ²/(2α))(1 − e^(−2α(t−s)))   (54)
One can easily check that as time increases the mean tends to the long-term value θ and the variance remains bounded, implying mean reversion. More precisely, the long-term distribution of the Ornstein-Uhlenbeck process is stationary and Gaussian, with mean θ and standard deviation σ/√(2α).
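The conditional moments (53)-(54) can be verified by simulating the exact discretisation (49)-(52). A Python sketch (the paper's code is MATLAB; the α, θ, σ values here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
alpha, theta, sigma = 2.0, 0.05, 0.02
dt, n_steps, n_paths = 1 / 52, 260, 100_000
x0 = 0.10

b = np.exp(-alpha * dt)                            # Eq. (51)
c = theta * (1 - b)                                # Eq. (50)
delta = sigma * np.sqrt((1 - b**2) / (2 * alpha))  # Eq. (52)

x = np.full(n_paths, x0)
for _ in range(n_steps):
    x = c + b * x + delta * rng.standard_normal(n_paths)  # Eq. (49)

t = n_steps * dt
print(x.mean(), theta + (x0 - theta) * np.exp(-alpha * t))              # Eq. (53)
print(x.var(), sigma**2 / (2 * alpha) * (1 - np.exp(-2 * alpha * t)))   # Eq. (54)
```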
To calibrate the Vasicek model the discrete form of the process, Equation (49), is used. The calibration of the coefficients c, b and δ is simply an OLS regression of the time series x(t_i) on its lagged form x(t_(i−1)). The OLS regression provides the maximum likelihood estimator for the parameters c, b and δ. By solving the three-equation system one gets the following α, θ and σ parameters:
α = −ln(b)/∆t   (55)
θ = c/(1 − b)   (56)
σ = δ / √((b² − 1)∆t/(2 ln(b)))   (57)
One may compare the above formulas with the estimators derived directly via maximum likelihood in Brigo and Mercurio (2006), where for brevity x(t_i) is denoted by x_i and n is assumed to be the number of observations.
Figure 17: The distribution of the long-term mean, the long-term variance and the 99th percentile estimated from several simulations of a Vasicek process with α = 8, θ = 50 and σ = 25.
ˆb = (n Σ_(i=1)^n x_i x_(i−1) − Σ_(i=1)^n x_i Σ_(i=1)^n x_(i−1)) / (n Σ_(i=1)^n x²_(i−1) − (Σ_(i=1)^n x_(i−1))²),   (58)

ˆθ = Σ_(i=1)^n [x_i − ˆb x_(i−1)] / (n(1 − ˆb)),   (59)

ˆδ² = (1/n) Σ_(i=1)^n [x_i − ˆb x_(i−1) − ˆθ(1 − ˆb)]².   (60)
From these estimators, using (55) and (57) one obtains immediately the estimators for the parameters
α and σ.
Code 12 MATLAB calibration of the Vasicek process.

function ML_VasicekParams = Vasicek_calibration(V_data,dt)
N = length(V_data);
x = [ones(N-1,1) V_data(1:N-1)];
ols = (x'*x)^(-1)*(x'*V_data(2:N));
resid = V_data(2:N) - x*ols;
c = ols(1);
b = ols(2);
delta = std(resid);
alpha = -log(b)/dt;
theta = c/(1-b);
sigma = delta/sqrt((b^2-1)*dt/(2*log(b)));
ML_VasicekParams = [alpha theta sigma];
For Monte Carlo simulation purposes the discrete time expression of the Vasicek model is used. The simulation is straightforward using the calibrated coefficients α, θ and σ. One only needs to draw a random normal process ǫ_(t_i) and then generate recursively the new x_(t_i) process, where now the t_i are future instants, using Equation (49), starting the recursion with the current value x_0.
8 Mean Reversion: The Exponential Vasicek Model
To address the positivity of interest rates and to ensure that their distribution has fatter tails, it is possible to transform a basic Vasicek process y(t) into a new positive process x(t). The most common transformations to obtain a positive number x from a possibly negative number y are the square and the exponential functions.

By squaring or exponentiating a normal random variable y one obtains a random variable x with fat tails. Since the exponential function increases faster than the square and is larger (exp(y) ≫ y² for large positive y), the tails in the exponential Vasicek model below will be fatter than the tails in the squared Vasicek (and CIR as well). In other terms, a lognormal distribution (exponential Vasicek) has fatter tails than a chi-squared (squared Vasicek) or even a noncentral chi-squared (CIR) distribution.
Therefore, if one assumes y to follow a Vasicek process,

dy_t = α(θ − y_t)dt + σdW_t   (61)

then one may transform y. If one takes the square to ensure positivity and defines x(t) = y(t)², then by Itô's formula x satisfies:

dx(t) = (B + A√x(t) − Cx(t))dt + ν√x(t) dW_t
for suitable constants A, B, C and ν. This "squared Vasicek" model is similar to the CIR model discussed in the next section, so it will not be treated further here.
Figure 18: EMU Corporate A rated 5-7Y index (in dark black) vs two simulated Exponential Vasicek
paths (in blue and green). The dashed red lines describe the 95th percentile and the 5th percentile of
the exponential Vasicek distribution. The dark red line is the distribution mean. The two remaining red
lines describe one standard deviation above and below the mean.
If, on the other hand, to ensure positivity one takes the exponential, x(t) = exp(y(t)), then one
obtains a model that at times has been termed Exponential Vasicek (for example in Brigo and Mercurio
(2006)).
Itô's formula this time reads, for x under the objective measure:

dx_t = αx_t(m − log x_t)dt + σx_t dW_t   (62)

where α and σ are positive, and a similar parametrisation can be used under the risk neutral measure. Also, m = θ + σ²/(2α).
In other words, this model for x assumes that the logarithm of the short rate (rather than the short
rate itself) follows an OrnsteinUhlenbeck process.
The explicit solution of the Exponential Vasicek equation is then, between any two instants s < t:

x_t = exp( θ(1 − e^(−α(t−s))) + log(x_s)e^(−α(t−s)) + σe^(−αt) ∫_s^t e^(αu) dW_u )   (63)
The advantage of this model is that the underlying process is mean reverting, stationary in the long run, strictly positive, and has a distribution skewed to the right and characterised by a fat tail for large positive values. This makes the Exponential Vasicek model an excellent candidate for the modelling of credit spreads when one needs to do historical estimation. As a shortcoming of the Exponential Vasicek model one can identify the explosion problem that appears when this model is used to describe the interest rate dynamics in a pricing model involving the bank account evaluation (like for the Eurodollar
future pricing, Brigo and Mercurio (2006)). One should also point out that if this model is assumed for
the short rate or stochastic default intensity, there is no closed form bond pricing or survival probability
formula and all curve construction and valuation according to this model has to be done numerically
through ﬁnite diﬀerence schemes, trinomial trees or Monte Carlo simulation.
The historical calibration and simulation of the Exponential Vasicek model can be done by calibrating and simulating its log-process y as described in the previous section. To illustrate the calibration, the EMU Corporate A rated 5-7Y index described in Section 6 is used. The histogram of the index shows a clear fat tail pattern. In addition, the log spread process is tested to be mean reverting (see again Section 6). This suggests using the Exponential Vasicek model, which has a lognormal distribution and an AR(1) disturbance in the log prices.
The OLS regression of the log-spread time series on its first lag (Equation (49)) gives the coefficients c, b and δ. Then the α, θ and σ parameters in Equation (62) are deduced using equations (55), (56) and (57):
c = 0.3625 α = 4.9701
b = 0.9054 θ = 3.8307
δ = 0.1894 σ = 1.4061
Figure 18 shows how the EMU Corporate A rated 5-7Y index time series fits into the simulation of 50,000 paths using the equation for the exponential of (49), starting with the first observation of the index (first week of 1998). The simulated path (in blue) illustrates that the Exponential Vasicek process can occasionally reach values that are much higher than its 95th percentile level.
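Using the calibrated coefficients above, the simulation behind Figure 18 can be sketched as the exponential of the AR(1) recursion (49). The following is a Python sketch of the procedure (the paper's simulation is in MATLAB; the starting log-spread value below is illustrative, since the index's first observation is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(5)
c, b, delta = 0.3625, 0.9054, 0.1894   # OLS coefficients of the log-spread AR(1)
n_weeks, n_paths = 500, 50_000

y = np.full(n_paths, np.log(20.0))     # illustrative starting log-spread
for _ in range(n_weeks):
    y = c + b * y + delta * rng.standard_normal(n_paths)   # Eq. (49) on log-spreads

x = np.exp(y)                          # Exponential Vasicek spread paths
# the long-run median of x is exp(theta), with theta = c/(1-b) as in Eq. (56)
print(np.median(x), np.exp(c / (1 - b)))
```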
9 Mean Reversion: The CIR Model
Originally the model proposed by Cox, Ingersoll, and Ross (1985b) was introduced in the context of
the general equilibrium model of the same authors Cox, Ingersoll, and Ross (1985a). By choosing an
appropriate market price of risk, the x
t
process dynamics can be written with the same structure under
both the objective measure and risk neutral measure. Under the objective measure one can write
dx_t = α(θ − x_t)dt + σ√(x_t) dW_t   (64)
with α, θ, σ > 0, and dW_t a standard Wiener process. To ensure that x_t is always strictly positive one needs to impose 2αθ > σ². By doing so the upward drift is sufficiently large with respect to the volatility to make the origin inaccessible to x_t.
The simulation of the CIR process can be done using a recursive discrete version of Equation (64) at discretisation times t_i:

x(t_i) = αθ∆t + (1 − α∆t)x(t_(i−1)) + σ√(x(t_(i−1)) ∆t) ǫ_(t_i)   (65)
with ǫ ∼ N(0, 1). If the time step is too large, however, there is a risk that the process approaches zero in some scenarios, which could lead to numerical problems and complex numbers surfacing in the scheme. Alternative positivity-preserving schemes are discussed, for example, in Brigo and Mercurio (2006). For simulation purposes it is more precise, but also more computationally expensive, to draw the simulation directly from the following exact conditional distribution of the CIR process:
f_CIR(x_(t_i) | x_(t_(i−1))) = c e^(−u−v) (v/u)^(q/2) I_q(2√(uv))   (66)

c = 2α / (σ²(1 − e^(−α∆t)))   (67)
u = c x_(t_(i−1)) e^(−α∆t)   (68)
v = c x_(t_i)   (69)
q = 2αθ/σ² − 1   (70)
Figure 19: EMU Corporate A rated 5-7Y index (in dark black) vs two simulated CIR paths (in blue and green). The dashed red lines describe the 95th and 5th percentiles of the distribution. The dark red line is the distribution mean. The two remaining red lines describe one standard deviation above and below the mean.
with I_q the modified Bessel function of the first kind of order q and ∆t = t_i − t_(i−1). The process x follows a noncentral chi-squared distribution: conditional on x(t_(i−1)), 2c x(t_i) ∼ χ²(2q + 2, 2u), with 2q + 2 degrees of freedom and non-centrality parameter 2u.
To calibrate this model the Maximum Likelihood Method is applied:

argmax_(α>0, θ>0, σ>0) log(L)   (71)
The likelihood can be decomposed, as for general Markov processes,23 according to Equation (9), as a product of transition likelihoods:

L = ∏_(i=1)^n f_CIR(x(t_i) | x(t_(i−1)); α, θ, σ)   (72)
Since this model is also a fat tail mean reverting one, it can be calibrated to the EMU Corporate A rated 5-7Y index described in Section 6. The calibration is done by Maximum Likelihood. To speed up the calibration one uses as initial guess for α the AR(1) coefficient of the Vasicek regression (Equation (49)) and then infers θ and σ using the first two moments of the process. The long-term mean of the CIR process is θ and its long-term variance is θσ²/(2α). Then:
α_0 = −log(b)/∆t   (73)
θ_0 = E(x_t)   (74)
σ_0 = √(2α_0 Var(x_t)/θ_0)   (75)
Code 13 MATLAB MLE calibration of the CIR process.

function ML_CIRparams = CIR_calibration(V_data,dt,params)
% ML_CIRparams = [alpha theta sigma]
N = length(V_data);
if nargin < 3
    x = [ones(N-1,1) V_data(1:N-1)];
    ols = (x'*x)^(-1)*(x'*V_data(2:N));
    m = mean(V_data); v = var(V_data);
    params = [-log(ols(2))/dt, m, sqrt(2*ols(2)*v/m)];
end
options = optimset('MaxFunEvals', 100000, 'MaxIter', 100000);
ML_CIRparams = fminsearch(@FT_CIR_LL_ExactFull, params, options);
    function mll = FT_CIR_LL_ExactFull(params)
        alpha = params(1); teta = params(2); sigma = params(3);
        c = (2*alpha)/((sigma^2)*(1 - exp(-alpha*dt)));
        q = ((2*alpha*teta)/(sigma^2)) - 1;
        u = c*exp(-alpha*dt)*V_data(1:N-1);
        v = c*V_data(2:N);
        mll = -(N-1)*log(c) + sum(u + v - log(v./u)*q/2 - ...
            log(besseli(q,2*sqrt(u.*v),1)) - abs(real(2*sqrt(u.*v))));
    end
end
The initial guess and fitted parameters for the EMU Corporate A rated 5-7Y index are the following:

α_0 = 0.8662   α = 1.2902
θ_0 = 49.330   θ = 51.7894
σ_0 = 4.3027   σ = 4.4966
Figure 19 shows how the EMU Corporate A rated 5-7Y index time series fits into the simulation of 40,000 paths using Equation (65), starting with the first observation of the index (first week of 1998).
23 Equation (65) shows that x(t_i) depends only on x(t_(i−1)) and not on previous values x(t_(i−2)), x(t_(i−3)), . . . , so CIR is Markov.
Figure 20: The tail of the EMU Corporate A rated 5-7Y index histogram (in blue) compared to the fitted Vasicek process probability density (in black), to the fitted Exponential Vasicek process probability density (in green) and to the fitted CIR process probability density (in red).
As observed before, the CIR model has less fat tails than the Exponential Vasicek one, but avoids the above-mentioned explosion problem of the bank account associated with lognormal processes. Furthermore, when used as an instantaneous short rate or default intensity model under the pricing measure, it comes with closed form solutions for bonds and survival probabilities. Figure 20 compares the tail behaviour of the calibrated CIR process with the calibrated Vasicek and Exponential Vasicek processes. Overall, the Exponential Vasicek process is easier to estimate and to simulate, and features fatter tails, so it is preferable to CIR. However, if used in conjunction with pricing applications, the tractability of CIR and the non-explosion of its associated bank account can make CIR preferable in some cases.
10 Mean Reversion and Fat Tails together
In the previous sections, two possible features of ﬁnancial time series were studied: fat tails and mean
reversion. The ﬁrst feature accounts for extreme movements in short time, whereas the second one
accounts for a process that in long times approaches some reference mean value with a bounded variance.
The question in this section is: can one combine the two features and have a process that in short time
instants can feature extreme movements, beyond the Gaussian statistics, and in the long run features
mean reversion?
In this section one example of one such model is formulated: a mean reverting diﬀusion process with
jumps. In other terms, rather than adding jumps to a GBM, one may decide to add jumps to a mean
reverting process such as Vasicek. Consider:
dx(t) = α(θ − x(t))dt + σdW(t) + dJ_t   (76)
where the jumps J are defined as:

J(t) = Σ_(j=1)^(N(t)) Y_j,   dJ(t) = Y_(N(t)) dN(t)

where N(t) is a Poisson process with intensity λ and the Y_j are iid random variables modelling the size of the j-th jump, independent of N and W.
The way the previous equation is written is slightly misleading, since the mean of the shocks is nonzero. To have zero-mean shocks, so as to be able to read the long-term mean level from the equation, one needs to rewrite it as:

dx(t) = α(θ_Y − x(t)) dt + σdW(t) + [dJ_t − λE[Y] dt]   (77)

where the actual long-term mean level is:

θ_Y := θ + λE[Y]/α,
and can be much larger than θ, depending on Y and λ. For the distribution of the jump sizes, Gaussian jump sizes are considered, but calculations are possible also with exponential or other jump sizes, particularly with stable or infinitely divisible distributions.
Otherwise, for a general jump size distribution, the process integrates into:

x_t = m_x(x_s, t − s) + σe^(−αt) ∫_s^t e^(αz) dW_z + e^(−αt) ∫_s^t e^(αz) dJ_z   (78)

m_x(x_s, t − s) := θ(1 − e^(−α(t−s))) + x_s e^(−α(t−s)),
v_x(t − s) := (σ²/(2α))(1 − e^(−2α(t−s))),

where m_x and v_x are, respectively, the mean and variance of the Vasicek model without jumps.
Notice that taking out the last term in (78) one still gets (48), the earlier equation for Vasicek.
Furthermore, the last term is independent of all others. Since a sum of independent random variables
leads to convolution in the density space and to product in the moment generating function space, the
moment generating function of the transition density between any two instants is computed:
M_(t,s)(u) = E_(x(s))[exp(ux(t))]   (79)
that in the model is given by

M_(t,s)(u) = exp( m_x(x_s, t − s)u + (v_x(t − s)/2)u² + λ ∫_s^t (M_Y(u e^(−α(t−z))) − 1) dz )   (80)
where the first exponential factor is due to the diffusion part (Vasicek) and the second one comes from the jumps.24
Now, recalling Equation (9), one knows that to write the likelihood of the Markov process it is enough to know the transition density between subsequent instants. In this case, such density between any two instants s = t_(i−1) and t = t_i can be computed from the above moment generating function through an inverse Laplace transform.25
24 As stated previously, the characteristic function is a better notion than the moment generating function. For the characteristic function one would obtain:

φ_(t,s)(u) = exp( m_x(x_s, t − s)iu − (v_x(t − s)/2)u² + λ ∫_s^t (φ_Y(u e^(−α(t−z))) − 1) dz ).   (81)

As recalled previously, the characteristic function always exists, whereas the moment generating function may not exist in some cases.
25 Or better, from the above characteristic function through an inverse Fourier transform.
Figure 21: EMU Corporate A rated 5-7Y index charts. The top chart compares the innovations fit of the Vasicek process to the fit of the Vasicek process with normal jumps (in red). The bottom left figure is the QQ-plot of the Vasicek process innovations against the historical data innovations. The bottom right figure is the QQ-plot of the Vasicek process with normal jumps innovations against the historical data innovations.
There are however some special jump size distributions where one does not need to use transform methods. These include the Gaussian jump size case, which is covered in detail now. Suppose then that the jump size Y is normally distributed, with mean µ_Y and variance σ²_Y. For small ∆t, one knows that there will be possibly just one jump. More precisely, there will be one jump in (t − ∆t, t] with probability λ∆t, and no jumps with probability 1 − λ∆t.
Furthermore, with the approximation:

∫_(t−∆t)^t e^(αz) dJ_z ≈ e^(α(t−∆t)) ∆J(t),   ∆J(t) := J(t) − J(t − ∆t),   (82)
one can write the transition density between s = t − ∆t and t as:

f_(x(t)|x(t−∆t))(x) = λ∆t f_N(x; m_x(∆t, x(t − ∆t)) + µ_Y, v_x(∆t) + σ²_Y)   (83)
                    + (1 − λ∆t) f_N(x; m_x(∆t, x(t − ∆t)), v_x(∆t))
This is a mixture of Gaussians, weighted by the arrival intensity λ. It is known that mixtures of
Gaussians feature fat tails, so this model is capable of reproducing more extreme movements than the
basic Vasicek model.
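As a sanity check on (83), the mixture density is easy to evaluate numerically. The sketch below is in Python rather than the MATLAB used elsewhere in this paper, and the helper names (`vasicek_moments`, `mixture_density`) are ours; m_x and v_x are the usual Vasicek conditional mean and variance.

```python
import numpy as np

def vasicek_moments(x0, alpha, theta, sigma, dt):
    """Conditional mean m_x and variance v_x of the pure-diffusion Vasicek step."""
    m = theta + (x0 - theta) * np.exp(-alpha * dt)
    v = sigma**2 * (1.0 - np.exp(-2.0 * alpha * dt)) / (2.0 * alpha)
    return m, v

def norm_pdf(x, m, v):
    """Gaussian density f_N(x; m, v)."""
    return np.exp(-(x - m)**2 / (2.0 * v)) / np.sqrt(2.0 * np.pi * v)

def mixture_density(x, x0, alpha, theta, sigma, lam, mu_Y, sig_Y, dt):
    """Small-dt transition density (83): a two-component Gaussian mixture,
    with weight lam*dt on the jump component."""
    m, v = vasicek_moments(x0, alpha, theta, sigma, dt)
    return (lam * dt * norm_pdf(x, m + mu_Y, v + sig_Y**2)
            + (1.0 - lam * dt) * norm_pdf(x, m, v))
```

Plotting `mixture_density` against a pure Vasicek density with the same diffusion parameters reproduces the heavier tail visible in Figure 21.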
To check that the process is indeed mean-reverting in the Gaussian jump size case, one can prove through straightforward calculations that the exact solution of Equation (77) (or equivalently Equation (76)) satisfies:

    E[x_t] = θ_Y + (x_0 − θ_Y) e^{−αt}    (84)

    Var[x_t] = (σ² + λ(µ_Y² + σ_Y²)) / (2α) · (1 − e^{−2αt})    (85)

This way one sees that the process is indeed mean reverting, since as time increases the mean tends to θ_Y while the variance remains bounded, tending to (σ² + λ(µ_Y² + σ_Y²))/(2α). Notice that letting either λ, or both µ_Y and σ_Y, go to zero (no jumps) gives back the mean and variance of the basic Vasicek process without jumps.
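These two limits are easy to verify by simulation. The following Python sketch (our notation; the paper's own code is in MATLAB) simulates the compensated dynamics with long-run mean θ_Y by an Euler scheme and compares the sample mean and variance at time T with (84) and (85):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (not calibrated to any series in the paper)
alpha, theta_Y, sigma = 3.0, 1.0, 0.2   # mean-reversion speed, long-run mean, diffusion vol
lam, mu_Y, sig_Y = 10.0, 0.15, 0.05     # jump intensity and Gaussian jump-size parameters
x0, T, n_steps, n_paths = 0.0, 2.0, 1000, 10000
dt = T / n_steps

# Euler scheme for dx = alpha*(theta_Y - x) dt + sigma dW + (dJ - lam*mu_Y dt),
# the compensated form whose mean reverts to theta_Y as in (84).
x = np.full(n_paths, x0)
for _ in range(n_steps):
    n_jumps = rng.poisson(lam * dt, n_paths)   # jump count in this step
    # sum of n_jumps iid N(mu_Y, sig_Y^2) sizes, drawn exactly given the count
    jump = mu_Y * n_jumps + sig_Y * np.sqrt(n_jumps) * rng.standard_normal(n_paths)
    x += (alpha * (theta_Y - x) * dt
          + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
          + jump - lam * mu_Y * dt)

mean_th = theta_Y + (x0 - theta_Y) * np.exp(-alpha * T)      # equation (84)
var_th = ((sigma**2 + lam * (mu_Y**2 + sig_Y**2)) / (2 * alpha)
          * (1.0 - np.exp(-2.0 * alpha * T)))                # equation (85)
```

Up to Monte Carlo noise and the small Euler bias, the sample moments of `x` match `mean_th` and `var_th`.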
It is important that the jump mean µ_Y not be too close to zero, or one gets again almost symmetric Gaussian shocks that are already there in the Brownian motion, and the model does not differ much from a pure diffusion Vasicek model. On the contrary, it is best to assume µ_Y to be significantly positive or negative, with a relatively small variance, so the jumps will be markedly positive or negative. If one aims at modelling both markedly positive and negative jumps, one may add a second jump process J⁻(t), defined as:

    dJ⁻(t) = Z_{M(t)} dM(t)

where now M is an arrival Poisson process with intensity λ⁻ counting the jumps, independent of J, and the Z are again iid normal random variables with mean µ_Z and variance σ_Z², independent of M, J, W. The overall process reads:
    dx(t) = α(θ − x(t)) dt + σ dW(t) + dJ(t) − dJ⁻(t),    or    (86)

    dx(t) = α(θ_J − x(t)) dt + σ dW(t) + [dJ(t) − λµ_Y dt] − [dJ⁻(t) − λ⁻µ_Z dt],    θ_J := θ + (λµ_Y − λ⁻µ_Z)/α
This model would feature as transition density, over small intervals [s, t] with s = t − ∆t:

    f_{x(t)|x(t−∆t)}(x) = λ∆t f_N(x; m_x(∆t, x(t−∆t)) + µ_Y, v_x(∆t) + σ_Y²)
                        + λ⁻∆t f_N(x; m_x(∆t, x(t−∆t)) − µ_Z, v_x(∆t) + σ_Z²)
                        + (1 − (λ + λ⁻)∆t) f_N(x; m_x(∆t, x(t−∆t)), v_x(∆t))    (87)
This is now a mixture of three Gaussian distributions, and allows in principle for more ﬂexibility in
the shapes the ﬁnal distribution can assume.
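For simulation, the three-component density (87) suggests a direct one-step sampler: draw which component fires, with probabilities λ∆t, λ⁻∆t and 1 − (λ + λ⁻)∆t, then sample from the corresponding Gaussian. A minimal Python sketch (function and parameter names are ours, not the paper's, which uses MATLAB):

```python
import numpy as np

def step_two_sided_jump_vasicek(x0, alpha, theta, sigma,
                                lam_up, mu_Y, sig_Y,
                                lam_dn, mu_Z, sig_Z,
                                dt, rng):
    """One transition draw from the three-Gaussian mixture (87)."""
    m = theta + (x0 - theta) * np.exp(-alpha * dt)               # Vasicek conditional mean
    v = sigma**2 * (1.0 - np.exp(-2.0 * alpha * dt)) / (2.0 * alpha)  # conditional variance
    u = rng.random()
    if u < lam_up * dt:                    # one upward jump in (t - dt, t]
        return rng.normal(m + mu_Y, np.sqrt(v + sig_Y**2))
    if u < (lam_up + lam_dn) * dt:         # one downward jump
        return rng.normal(m - mu_Z, np.sqrt(v + sig_Z**2))
    return rng.normal(m, np.sqrt(v))       # no jump

# Illustrative one-year daily path
rng = np.random.default_rng(1)
x, path = 0.0, [0.0]
for _ in range(252):
    x = step_two_sided_jump_vasicek(x, alpha=2.0, theta=0.0, sigma=0.15,
                                    lam_up=8.0, mu_Y=0.3, sig_Y=0.05,
                                    lam_dn=4.0, mu_Z=0.25, sig_Z=0.05,
                                    dt=1 / 252, rng=rng)
    path.append(x)
```

The sampler is only valid for small ∆t (at most one jump per step), consistently with the derivation of (87).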
[Figure 22 shown here. Axes: Normalized Frequency vs Spread level in bp; legend: Historical spread histogram, Vasicek, Vasicek with jumps.]
Figure 22: The tail of the EMU Corporate A rated 5-7Y index histogram (in blue) compared to the fitted Vasicek process probability density (in green) and to the fitted Vasicek-with-jumps process probability density (in red).
Again, one can build on the Vasicek model with jumps by squaring or exponentiating it. Indeed, even with positive jumps the basic model still features a positive probability for negative values. One can solve this problem by exponentiating the model: define a stochastic process y following the same dynamics as x in (86) or (76), but then define your basic process x as x(t) = exp(y(t)). Historical estimation can then be applied by working on the log-series, estimating y on the log-series rather than on the series itself.
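To make the log-series step concrete, here is a minimal Python sketch (helper name is ours) of the diffusion-part estimation: the exact Vasicek transition is an AR(1) regression on y = log x, whose coefficients map back to (α, θ, σ); the jump parameters would then be fitted on top, e.g. by MLE on the mixture transition density.

```python
import numpy as np

def fit_vasicek_on_log(series, dt):
    """OLS fit of the exact AR(1) transition y_t = a + b*y_{t-1} + eps
    on the log-series, mapped back to Vasicek parameters via
    b = exp(-alpha*dt), a = theta*(1 - b), Var(eps) = sigma^2*(1 - b^2)/(2*alpha)."""
    y = np.log(np.asarray(series, dtype=float))
    y0, y1 = y[:-1], y[1:]
    b, a = np.polyfit(y0, y1, 1)          # slope, intercept
    resid = y1 - (a + b * y0)
    alpha = -np.log(b) / dt
    theta = a / (1.0 - b)
    sigma = np.sqrt(resid.var() * 2.0 * alpha / (1.0 - b**2))
    return alpha, theta, sigma
```

Because the AR(1) mapping is exact for the Vasicek transition, no discretisation bias is introduced; only statistical noise (notably in α) remains.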
From the examples in Figures 21 and 22 one sees the goodness of fit of Vasicek with jumps, and notes that the Vasicek model with positive-mean Gaussian-size jumps can be an alternative to the exponential Vasicek. Indeed, Figure 20 shows that the tail of the exponential Vasicek almost exceeds the series, whereas Vasicek with jumps can be a weaker-tail alternative to exponential Vasicek, somehow similar to CIR, as one sees from Figure 22. However, by playing with the jump parameters, or better by exponentiating Vasicek with jumps, one can easily obtain a process with tails that are fatter than the exponential Vasicek when needed. While, given Figure 20, it may not be necessary to do so in this example, in general the need may occur in different applications.
More generally, given Vasicek with positive and possibly negative jumps, by playing with the jump parameters and possibly by squaring or exponentiating, one can obtain a large variety of tail behaviours for the simulated process. This may be sufficient for several applications. Furthermore, using a more powerful numerical apparatus, especially inverse Fourier transform methods, one may generalise the jump distribution beyond the Gaussian case and attain an even larger flexibility.
Notation

General

E : Expectation
Var : Variance
SDE : Stochastic Differential Equation
P(A) : Probability of event A
M_X(u) : Moment generating function of the random variable X computed at u
φ_X(u) : Characteristic function of X computed at u
q : Quantiles
f_X(x) : Density of the random variable X computed at x
F_X(x) : Cumulative distribution function of the random variable X computed at x
f_{X|Y=y}(x) or f_{X|Y}(x; y) : Density of the random variable X computed at x, conditional on the random variable Y being y
N(m, V) : Normal (Gaussian) random variable with mean m and variance V
f_N(x; m, V) : Normal density with mean m and variance V computed at x, = (2πV)^(−1/2) exp(−(x − m)²/(2V))
Φ : Standard normal cumulative distribution function
f_P(x; L) : Poisson probability mass at x = 0, 1, 2, ... with parameter L, = exp(−L) L^x / x!
f_Γ(x; κ, ζ) : Gamma probability density at x ≥ 0 with parameters κ and ζ, = x^(κ−1) exp(−x/ζ) / (ζ^κ Γ(κ))
∼ : Distributed as

Statistical Estimation

CI, ACF, PACF : Confidence Interval, Autocorrelation Function, Partial ACF
MLE : Maximum Likelihood Estimation
OLS : Ordinary Least Squares
Θ : Parameters for MLE
L(Θ)(x_1, ..., x_n) := f_{X_1,...,X_n|Θ}(x_1, ..., x_n) or f_{X_1,...,X_n}(x_1, ..., x_n; Θ) : likelihood distribution function with parameter Θ in the assumed distribution, for random variables X_1, ..., X_n, computed at x_1, ..., x_n
L*(Θ)(x_1, ..., x_n) := log(L(Θ)(x_1, ..., x_n)) : log-likelihood
Θ̂ : Estimator for Θ

Geometric Brownian Motion (GBM)

ABM : Arithmetic Brownian Motion
GBM : Geometric Brownian Motion
µ, µ̂ : Drift parameter and sample drift estimator of GBM
σ, σ̂ : Volatility and sample volatility
W_t or W(t) : Standard Wiener process

Fat Tails: GBM with Jumps, GARCH Models

J_t, J(t) : Jump process at time t
N_t, N(t) : Poisson process counting the number of jumps, jumping by one whenever a new jump occurs, with intensity λ
Y_1, Y_2, ..., Y_i, ... : Jump amplitude random variables

Time, Discretisation, Maturities, Time Steps

∆t : Time step in discretization
s, t, u : Three time instants, 0 ≤ s ≤ t ≤ u
0 = t_0, t_1, ..., t_i, ... : A sequence of equally spaced time instants, t_i − t_{i−1} = ∆t
T : Maturity
n : Sample size

Variance Gamma Models

VG : Variance Gamma
µ̄ : Log-return drift in calendar time
g(t) ∼ Γ(t/ν, ν) : Transform of calendar time t into market activity time
θ̄ : Log-return drift in market-activity time
σ̄ : Volatility

Vasicek and CIR Mean Reverting Models

θ, α, σ : Long-term mean reversion level, speed, and volatility
References

Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31, 307-327.

Brigo, D. and F. Mercurio (2006). Interest Rate Models: Theory and Practice - With Smile, Inflation and Credit. New York, NY: Springer Verlag.

Cochrane, J. H. (2001). Asset Pricing. Princeton: Princeton University Press.

Cont, R. and P. Tankov (2004). Financial Modelling with Jump Processes. Chapman and Hall, Financial Mathematics Series.

Cox, J., J. Ingersoll, and S. A. Ross (1985a). An intertemporal general equilibrium model of asset prices. Econometrica 53, 363-384.

Cox, J., J. Ingersoll, and S. A. Ross (1985b). A theory of the term structure of interest rates. Econometrica 53, 385-408.

Dickey, D. and W. Fuller (1979). Distribution of the estimators for autoregressive time series with a unit root. Journal of the American Statistical Association 74, 427-431.

Engle, R. and V. K. Ng (1993). Measuring and testing the impact of news on volatility. Journal of Finance 48, 1749-1778.

Kwiatkowski, D., P. C. B. Phillips, P. Schmidt, and Y. Shin (1992). Testing the null hypothesis of stationarity against the alternative of a unit root. Journal of Econometrics 54, 159-178.

Madan, D., P. Carr, and E. Chang (1998). The variance gamma process and option pricing. European Finance Review 2, 79-105.

Madan, D. B. and E. Seneta (1990). The variance gamma (V.G.) model for share market returns. Journal of Business 63, 511-524.

Merton, R. (1976). Option pricing when underlying stock returns are discontinuous. Journal of Financial Economics 3, 125-144.

Moosbrucker, T. (2006). Pricing CDOs with correlated variance gamma distributions. Journal of Fixed Income 12, 1-30.

Phillips, P. and P. Perron (1988). Testing for a unit root in time series regression. Biometrika 75, 335-346.

Poterba, J. and L. J. Summers (1988). Mean reversion in stock prices: Evidence and implications. Journal of Financial Economics 22, 27-59.

Said, S. and D. Dickey (1984). Testing for unit roots in autoregressive moving average models of unknown order. Biometrika 71, 599-607.

Seneta, E. (2004). Fitting the variance-gamma model to financial data. Journal of Applied Probability 41A, 177-187.

Vasicek, O. A. (1977). An equilibrium characterization of the term structure. Journal of Financial Economics 5, 177-188.
Contents

1  Introduction  3
2  Modelling with Basic Stochastic Processes: GBM  5
3  Fat Tails: GARCH Models  13
4  Fat Tails: Jump Diffusion Models  17
5  Fat Tails: Variance Gamma (VG) process  21
6  The Mean Reverting Behaviour  25
7  Mean Reversion: The Vasicek Model  27
8  Mean Reversion: The Exponential Vasicek Model  31
9  Mean Reversion: The CIR Model  33
10 Mean Reversion and Fat Tails together  36
1 Introduction
In risk management and in the rating practice it is desirable to grasp the essential statistical features of a time series representing a risk factor, to begin a detailed technical analysis of the product or the entity under scrutiny. The quantitative analysis is not the final tool, since it has to be integrated with judgemental analysis and possible stresses, but it represents a good starting point for the process leading to the final decision.

This tutorial aims to introduce a number of different stochastic processes that, according to the economic and financial nature of the specific risk factor, can help in grasping the essential features of the risk factor itself. For example, a family of stochastic processes that can be suited to capture foreign exchange behaviour might not be suited to model hedge funds or a commodity. Hence there is a need to have at one's disposal a sufficiently large set of practically implementable stochastic processes that may be used to address the quantitative analysis at hand, to begin a risk management or rating decision process.

This report does not aim at being exhaustive, since this would be a formidable task beyond the scope of a survey paper. Instead, this survey gives examples and a feeling for practically implementable models allowing for stylised features in the data. The reader may also use these models as building blocks to build more complex models, although for a number of risk management applications the models developed here suffice for the first step in the quantitative analysis. The broad qualitative features addressed here are fat tails and mean reversion. This report begins with the geometric Brownian motion (GBM) as a fundamental example of an important stochastic process featuring neither mean reversion nor fat tails, and then it goes through generalisations featuring either of the two properties or both of them.
According to the specific situation, the different processes are formulated either in (log-) returns space or in levels space, as is more convenient. Levels and returns can be easily converted into each other, so this is indeed a matter of mere convenience.

This tutorial will address the following processes:

• Basic process: Arithmetic Brownian Motion (ABM, returns) or GBM (levels).
• Fat tails processes: GBM with lognormal jumps (levels), ABM with normal jumps (returns), GARCH (returns), Variance Gamma (returns).
• Mean Reverting processes: Vasicek, CIR (levels if interest rates or spreads, or returns in general), exponential Vasicek (levels).
• Mean Reverting processes with Fat Tails: Vasicek with jumps (levels if interest rates or spreads, or returns in general), exponential Vasicek with jumps (levels).

Different financial time series are better described by different processes above. In general, when first presented with a historical time series of data for a risk factor, one should decide what kind of general properties the series features. The financial nature of the time series can give some hints at whether mean reversion and fat tails are expected to be present or not. Interest rate processes, for example, are often taken to be mean reverting, whereas foreign exchange series are supposed to be often fat tailed, and credit spreads feature both characteristics. In general one should:

Check for mean reversion or stationarity. Some of the tests mentioned in the report concern mean reversion. The presence of an autoregressive (AR) feature can be tested in the data, typically on the returns of a series or on the series itself if this is an interest rate or spread series. In linear processes with normal shocks, this amounts to checking stationarity. If this is present, this can be a symptom for mean reversion, and a mean reverting process can be adopted as a first assumption for the data.
If the AR test is rejected, this means that there is no mean reversion, at least under normal shocks and linear models, although the situation can be more complex for nonlinear processes with possibly fat tails, where the more general notions of stationarity and ergodicity may apply. These are not addressed in this report.

If the tests find AR features, then the process is considered as mean reverting, and one can compute autocorrelation and partial autocorrelation functions to see the lag in the regression. Or one can go
directly to the continuous time model and estimate it on the data through maximum likelihood. In this case, the main model to try is the Vasicek model. If the tests do not find AR features, then the simplest form of mean reversion, for linear processes with Gaussian shocks, is to be rejected. One has to be careful though, since the process could still be mean reverting in a more general sense: in a way, there could be some sort of mean reversion even under non-Gaussian shocks. Examples of such a case are the jump-extended Vasicek or exponential Vasicek models, where mean reversion is mixed with fat tails, as shown below in the next point.

Check for fat tails. If the AR test fails, either the series is not mean reverting, or it is but with fatter tails than the Gaussian distribution and possibly nonlinear behaviour. To test for fat tails a first graphical tool is the QQ-plot, comparing the tails in the data with the Gaussian distribution. The QQ-plot gives immediate graphical evidence on the possible presence of fat tails. Further evidence can be collected by considering the third and fourth sample moments, or better the skewness and excess kurtosis, to see how the data differ from the Gaussian distribution. More rigorous tests on normality should also be run following this preliminary analysis [1]. If fat tails are not found and the AR test failed, this means that one is likely dealing with a process lacking mean reversion but with Gaussian shocks, which could be modeled as an arithmetic Brownian motion. If fat tails are found (and the AR test has been rejected), one may start calibration with models featuring fat tails and no mean reversion (GARCH, NGARCH, Variance Gamma, arithmetic Brownian motion with jumps) or fat tails and mean reversion (Vasicek with jumps, exponential Vasicek with jumps). Comparing the goodness of fit of the models may suggest which alternative is preferable.
Goodness of fit is determined again graphically through a QQ-plot of the model implied distribution against the empirical distribution, although more rigorous approaches can be applied [2]. The predictive power can also be tested, following the calibration and the goodness of fit tests [3]. The above classification may be summarized in the following table, where typically the referenced variable is the return process, or the process itself in case of interest rates or spreads:

                  NO mean reversion              Mean Reversion
  Normal tails    ABM                            Vasicek
  Fat tails       ABM+Jumps, (N)GARCH, VG        Exponential Vasicek, CIR, Vasicek with Jumps
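As a quick numerical screen for the fat-tails row of the table, before running the formal normality tests, one can look at sample skewness and excess kurtosis of the returns. A minimal Python sketch (the function name is ours; the paper's own code is in MATLAB):

```python
import numpy as np

def tail_screen(returns):
    """Sample skewness and excess kurtosis; Gaussian data has both near zero."""
    r = np.asarray(returns, dtype=float)
    z = (r - r.mean()) / r.std()
    skew = np.mean(z**3)
    ex_kurt = np.mean(z**4) - 3.0   # excess over the Gaussian kurtosis of 3
    return skew, ex_kurt
```

A clearly positive excess kurtosis points towards the fat-tailed models in the table; skewness indicates asymmetric shocks, e.g. one-sided jumps.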
Once the process has been chosen and calibrated to historical data, typically through regression or maximum likelihood estimation, one may use the process to simulate the risk factor over a given time horizon and build future scenarios for the portfolio under examination. On the terminal simulated distribution of the portfolio one may then single out several risk measures. This report does not focus on the risk measures themselves but on the stochastic processes estimation preceding the simulation of the risk factors. In case more than one model is suited to the task, one can analyze risks with different models and compare the outputs as a way to reduce model risk.

Finally, this first survey report focuses on single time series. Correlation or more generally dependence across risk factors, leading to multivariate processes modeling, will be addressed in future work.

Prerequisites. The following discussion assumes that the reader is familiar with basic probability theory, including probability distributions (density and cumulative density functions, moment generating functions), random variables and expectations. It is also assumed that the reader has some practical knowledge of stochastic calculus and of Itô's formula, and basic coding skills. For each section, the MATLAB code is provided to implement the specific models. Examples illustrating the possible use of the models on actual financial time series are also presented.
[1] Such as, for example, the Jarque-Bera, the Shapiro-Wilk and the Anderson-Darling tests, that are not addressed in this report.
[2] Examples are the Kolmogorov-Smirnov test, likelihood ratio methods and the Akaike information criteria, as well as methods based on the Kullback-Leibler information or relative entropy, the Hellinger distance and other divergences.
[3] The Diebold-Mariano statistics can be mentioned as an example for AR processes. These approaches are not pursued here.
and ﬁnally the moment generating function of a standard normal random variable Z given by: E eaZ = e 2 a . The moments of S(T ) are easily found using the above solution. to compute probabilities involving future asset prices. Hence the ﬁrst two moments (mean and variance) of S(T ) are: referred to as a Wiener Process. which is a particular type of process for which only the current asset price is relevant for computing probabilities of events involving future asset prices. Independence in the increments implies that the model is a Markov Process. The theory is very complex and actually this diﬀerential notation is just shorthand for an integral equation. t and u. (3) 2 log S(u) − log S(t) = Through exponentiation and taking u = T and t = 0 the solution for the GBM equation is obtained: S(T ) = S(0) exp 1 µ − σ 2 T + σW (T ) 2 (4) This equation shows that the asset price in the GBM follows a lognormal distribution. F. while the logarithmic returns log(St+∆t /St ) are normally distributed. The property of independent identically distributed increments is very important and will be exploited in the calibration as well as in the simulation of the random behaviour of asset prices. The increments in the logarithm of the asset value are normally distributed. 5A 4 Also 1 2 (5) . Neugebauer. All the concepts will be introduced using the fundamental process for ﬁnancial modelling. Using some basic stochastic calculus the equation can be rewritten as follows: d log S(t) = 1 µ − σ 2 dt + σdW (t) 2 (2) where log denotes the standard natural logarithm. A. The GBM is speciﬁed as follows: dS(t) = µS(t)dt + σS(t)dW (t) (1) Here W is a standard Brownian motion. continuoustime stochastic process is one where the value of the price can change at any point in time. being the process underlying the Black and Scholes formula for pricing European options. the fact that W (T ) is Gaussian with mean zero and variance T . 
The d terms indicate that the equation is given in its continuous time version5 . the geometric Brownian motion with constant drift and constant volatility.1 The Geometric Brownian Motion The geometric Brownian motion (GBM) describes the random behaviour of the asset price level S(t) over time. Brigo. M. a special diﬀusion process4 that is characterised by independent identically distributed (iid) increments that are normally (or Gaussian) distributed with zero mean and a standard deviation equal to the square root of the time step. In other terms. A discretetime stochastic process on the other hand is one where the price of the ﬁnancial asset can only change at certain ﬁxed times. knowledge of the whole past does not add anything to knowledge of the present. In practice. Dalessandro. σ 2 (u − t) . but continuous time processes prove to be useful to analyse the properties of the model. leading to: 1 µ − σ 2 (u − t) + σ(W (u) − W (t)) ∼ N 2 1 µ − σ 2 (u − t). besides being paramount in valuation where the assumption of continuous trading is at the basis of the Black and Scholes theory and its extensions. 2.D. Triki: A stochastic processes toolkit for Risk Management 5 2 Modelling with Basic Stochastic Processes: GBM This section provides an introduction to modelling through stochastic processes. The GBM is ubiquitous in ﬁnance. discrete behavior is usually observed. This equation is straightforward to integrate between any two instants. The process followed by the log is called an Arithmetic Brownian Motion.
function S = G B M _ s i m u l a t i o n( N_Sim . 350 300 300 250 250 Spot Price Spot Price 0 100 200 300 Time in days 400 500 600 200 200 150 150 100 100 50 50 0 100 200 300 Time in days 400 500 Figure 1: GBM Sample Paths and Distribution Statistics The following Matlab function simulates sample paths of the GBM using equation (7). due to the fact that the equation can be solved exactly. the continuous equation between discrete instants t0 < t1 < . on the right of Figure 2. Zn are independent random draws from the standard normal distribution. The quantiles of the historical distribution are plotted on the Yaxis and the quantiles of the chosen modeling distribution on the Xaxis. The second chart. Code 1 M AT LAB Code to simulate GBM Sample Paths. . If the comparison distribution provides a good ﬁt to the historical returns. Neugebauer. the log of which is plotted in the left chart of Figure 2. then the QQplot approximates a straight line. end 2. . Dalessandro. The simulation is exact and does not introduce any discretisation error.2 Parameter Estimation This section describes how the GBM can be used as an attempt to model the random behaviour of the FTSE100 stock index. A. dt . . The following charts plot a number of simulated sample paths using the above equation and the mean plus/minus one standard deviation and the ﬁrst and 99th percentiles over time.D. S (: . S0 ) mean =( mu 0. which was vectorised in Matlab. sigma . T +1). BM = sigma * sqrt ( dt )* normrnd (0 . . < tn needs to be solved as follows: S(ti+1 ) = S(ti ) exp 1 µ − σ 2 (ti+1 − ti ) + σ 2 ti+1 − ti Zi+1 (7) where Z1 . Triki: A stochastic processes toolkit for Risk Management 6 E [S(T )] = S(0)eµT Var [S(T )] = e2µT S 2 (0) eσ 2 T −1 (6) To simulate this process. S = S0 * ones ( N_Sim . The mean of the asset price grows exponentially with time. N_Sim .1 . Z2 .2: end )= S0 * exp ( cumsum ( mean + BM .2)).T . 
plots the quantiles of the log return distribution against the quantiles of the standard normal distribution. mu .5* sigma ^2)* dt . In the case . F. M. T ). . Brigo. This QQplot allows one to compare distributions and to check the assumption of normality.
. x2 ..i+k−1 and Xi+k and X(ti+k ) given X(ti+1 ). Neugebauer. The next sections will outline some extensions of the GBM that provide a better ﬁt to the historical distribution. .. X(ti+k ) − Xi+k ¯ ¯ i+1. . . PACF also gives an estimate of the lag in an autoregressive process. .. F.. A. The basic concept of MLE.. . Therefore the GBM at best provides only a rough approximation for the FTSE100. .. ˆ ˆ (8) where m and v are the sample mean and variance of the series.i+k−1 . xn realised from a random process X(ti ) is to plot the autocorrelation function of the lag k.. Brigo. or more precisely it is a sort of sample estimate of: ¯ ¯ i+1. deﬁned as: ACF(k) = 1 (n − k)ˆ v n−k i=1 (xi − m)(xi+k − m). roughly speaking one may say that while ACF(k) gives an estimate of the correlation between X(ti ) and X(ti+k ).i+k−1 are the best estimates (regressions in a linear context) of X(ti ) where Xii+1. besides the ACF. ACF and PACF for the FTSE 100 equity index returns are shown in the charts in Figure 3. M.. to ensure the GBM is appropriate for modelling the asset price in a ﬁnancial time series one has to check that the returns of the observed data are truly independent.i+k−1 ) Corr(X(ti ) − Xii+1. 2. respectively.. To ﬁnd Θ that yields the best ﬁt to the historical dataset the method of maximum likelihood estimation is used (MLE). MLE can be used for both continuous and discrete random variables. X(ti+k−1 ). k = 1. . ˆ ˆ one also considers the Partial AutoCorrelation function (PACF). Triki: A stochastic processes toolkit for Risk Management 7 QQ Plot of Sample Data versus Standard Normal 8 7 6 Quantiles of Input Sample 6 log of FTSE 100 Index 5 4 3 2 −6 1 0 1940 1946 1953 1959 1965 1971 1978 1984 1990 1996 2003 −8 −10 −4 −3 −2 −1 0 1 Standard Normal Quantiles 2 3 4 4 2 0 −2 −4 10 8 Figure 2: Historical FTSE100 Index and QQ Plot FTSE 100 of the FTSE100 log returns. Another important property of the GBM is the independence of returns... Therefore. 
A common test for autocorrelation (typically on returns of a given asset) in a time series sample x1 .D. Dalessandro. . The assumption of independence in statistical terms means that there is no autocorrelation present in historical returns. For the FTSE100 one does not observe any signiﬁcant lags in the historical return time series. σ) for the GBM based on the historical returns.. Without fully deﬁning PACF. .. the QQplot show that the historical quantiles in the tail of the distribution are signiﬁcantly larger compared to the normal distribution. This is in line with the general observation about the presence of fat tails in the return distribution of ﬁnancial asset prices. . is to ﬁnd the parameter estimates Θ for the assumed probability density function .. which means the independence assumption is acceptable in this example. . as suggested by the name. Often. Maximum Likelihood Estimation Having tested the properties of independence as well as normality for the underlying historical data one can now proceed to calibrate the parameter set Θ = (µ. PACF(k) informs on the correlation between X(ti ) and X(ti+k ) that is not accounted for by the shorter lags from 1 to k − 1..
Neugebauer. and the resulting function is maximised in Θ..X(t1 )... .. the Vasicek example later on.. x3 . x3 . x3 . x2 .8 Sample Partial Autocorrelations 0 5 10 Lag 15 20 0. the Markov property is suﬃcient to be able to write the likelihood along a time series of observations as a product of transition likelihoods on single time steps between two adjacent instants. which would cause numerical problems with handling such numbers. Since the product of density values could become very small.4 0.Θ and the likelihood function becomes the product of the probability density of each data point in the sample without any conditioning.8 Sample Autocorrelation 0. Brigo. . . this happens if the maximum likelihood estimation is done on logreturns rather than on the levels... or if one had worked at level space S(ti ) rather than in return space X(ti )..2 −0..Θ · · · fX(t1 )X(t0 ). For general Markov processes diﬀerent from GBM. in that one takes the loglikelihood in general even in situations when there are no logreturns but just general processes... since maxima are not aﬀected by monotone transformations. the likelihood function is usually converted6 to the log likelihood L∗ = log L. the problem is considerably simpliﬁed. F.X(tn ). i. x2 .D. In other terms. xn for the random vector X1 . Triki: A stochastic processes toolkit for Risk Management 8 Sample Autocorrelation Function (ACF) Sample Partial Autocorrelation Function 0. so that the only variable in f is Θ.Θ = fX(tn )X(tn−1 ).4 0. again denoting f the probability density of the vector random sample: L(Θ) = fX(t0 ). one can write.Θ · fX(t0 ). L(Θ) = fΘ (x1 . The likelihood (or probability) of observing a particular data sample. The likelihood function for iid variables is in general: n In the more extreme cases where the observations in the data series are iid. xn ) = i=1 fΘ (xi ) (11) ˆ The MLE estimate Θ is found by maximising the likelihood function. the observed sample x1 . will be denoted by L(Θ). 
The key idea there is to notice that for a Markov process xt . M.Θ ...2 0 5 10 Lag 15 20 Figure 3: Autocorrelation and Partial Autocorrelation Function for FTSE 100 fΘ (continuous case) or probability mass function (discrete case) that will maximise the likelihood or probability of having observed the given data sample x1 .6 0.7 For the iid case the loglikelihood reads: n L∗ (Θ) = 6 This 7 The log fΘ (xi ) i=1 (12) is allowed. Xn .e. this can be necessary though: see..Θ · fX(tn−1 )X(tn−2 ).. By deﬁning the X as (10) X(ti ) := log S(ti ) − log S(ti−1 )... x2 .X2 . The two are not directly related..Θ (9) one sees that they are already independent so that one does not need to use explicitly the above decomposition (9) through transitions. xn is used inside fX1 .. . the likelihood function. for example.6 0.Θ = fX(ti ).. for stochastic processes.Xn .2 0. In general. . A. reader has to be careful not to confuse the log taken to move from levels S to returns X with the log used to move from the likelihood to the loglikelihood.2 0 0 −0. Dalessandro. In the GBM case. since fX(ti )X(ti−1 ).
The maximum of the log-likelihood usually has to be found numerically using an optimisation algorithm^8, although for potentially multimodal likelihoods, such as possibly the jump diffusion cases further on, one may have to preprocess the optimisation through a coarse grid global method such as genetic algorithms or simulated annealing, so as to avoid getting stuck in a local maximum given by a lower mode. In the case of the GBM, the MLE method provides closed form expressions for m and v through MLE of Gaussian iid samples^9. This is done by differentiating the Gaussian density function with respect to each parameter and setting the resulting derivative equal to zero. This leads to the following well known closed form expressions for the sample mean and variance^10 of the log-return samples x_i for X_i = log S(t_i) − log S(t_{i-1}):

m̂ = Σ_{i=1}^n x_i / n,    v̂ = Σ_{i=1}^n (x_i − m̂)² / n    (14)

Based on Equation (3) (with u = t_{i+1} and t = t_i) the parameters are related to m and v as follows:

m = (µ − σ²/2) ∆t,    v = σ² ∆t    (13)

so that the estimates for the GBM parameters µ and σ are deduced from the estimates of m and v. Similarly, one can find the maximum likelihood estimates for both parameters simultaneously using the following numerical search algorithm.

Code 2 MATLAB MLE Function for iid Normal Distribution, where S is the observed sample time series for geometric Brownian motion.

function [mu sigma] = GBM_calibration(S, dt, params)
Ret = price2ret(S);
n = length(Ret);
options = optimset('MaxFunEvals', 100000, 'MaxIter', 100000);
params = fminsearch(@normalLL, params, options);
mu = params(1);
sigma = abs(params(2));
function mll = normalLL(params)
mu = params(1);
sigma = abs(params(2));
ll = n*log(1/sqrt(2*pi*dt)/sigma) + sum(-(Ret - mu*dt).^2/2/(dt*sigma^2));
mll = -ll;
end
end

An important remark is in order for this important example: given that x_i = log s(t_i) − log s(t_{i-1}), where s is the observed sample time series for S, m̂ is expressed as:

m̂ = Σ_{i=1}^n x_i / n = (log(s(t_n)) − log(s(t_0))) / n    (15)

The only relevant information used on the whole S sample are its initial and final points. All of the remaining sample, possibly consisting of hundreds of observations, is useless.^11

This is linked to the fact that drift estimation for random processes like GBM is extremely difficult. In a way, if one could really estimate the drift one would know locally the direction of the market in the future, which is effectively one of the most difficult problems. The impossibility of estimating this quantity with precision is what keeps the market alive. This is also one of the reasons for the success of risk neutral valuation: in risk neutral pricing one is allowed to replace this very difficult drift with the risk free rate.

^8 For example, Excel Solver or fminsearch in MatLab.
^9 Hogg and Craig, Introduction to Mathematical Statistics, fourth edition, p. 205, example 4.
^10 Note that the estimator for the variance is biased, since E(v̂) = v(n − 1)/n. The bias can be corrected by multiplying the estimator by n/(n − 1); however, for large data samples the difference is very small.
^11 Jacod, in Statistics of Diffusion Processes: Some Elements, CNR-IAMI 94.17, p. 2, shows that the drift estimation can be statistically consistent, converging in probability to the true drift value, only if the number of observation points times the time step tends to infinity. This confirms that increasing the observations between two given fixed instants is useless; the only helpful limits are letting the final time of the sample go to infinity, or letting the number of observations increase while keeping a fixed time step.
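As a companion to the MATLAB routine, the closed-form estimators (13)-(15) can be sketched in a few lines of Python; the function name `gbm_mle` is ours, not part of the paper's toolkit:

```python
import math

def gbm_mle(prices, dt):
    """MLE of GBM parameters from a price path, via eqs. (13)-(14)."""
    x = [math.log(b / a) for a, b in zip(prices, prices[1:])]  # log returns
    n = len(x)
    m = sum(x) / n                          # sample mean of log returns
    v = sum((xi - m) ** 2 for xi in x) / n  # (biased) sample variance
    sigma = math.sqrt(v / dt)               # invert v = sigma^2 dt
    mu = m / dt + 0.5 * sigma ** 2          # invert m = (mu - sigma^2/2) dt
    return mu, sigma, m, v

# Remark (15): the sample mean only uses the first and last price.
prices = [100.0, 101.0, 99.5, 102.3, 103.1]
mu, sigma, m, v = gbm_mle(prices, dt=1 / 252)
assert abs(m - math.log(prices[-1] / prices[0]) / 4) < 1e-12
```

The final assertion illustrates the telescoping property behind (15): the intermediate prices cancel out of the sample mean.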
Confidence Levels for Parameter Estimates

For most applications it is important to know the uncertainty around the parameter estimates obtained from historical data, which is statistically expressed by confidence levels. Generally, the shorter the historical data sample, the larger the uncertainty as expressed by wider confidence levels. For the GBM one can derive closed form expressions for the Confidence Interval (CI) of the mean and standard deviation of returns, using the Gaussian law of independent return increments and the fact that the sum of independent Gaussians is still Gaussian, with the mean given by the sum of the means and the variance given by the sum of variances. Let C_n = X_1 + X_2 + X_3 + ... + X_n; then one knows that C_n ∼ N(nm, nv).^12 One then has:

P[ (C_n − nm)/(√v √n) ≤ z ] = P[ (C_n/n − m)/(√v/√n) ≤ z ] = Φ(z)    (16)

where Φ(·) denotes the cumulative standard normal distribution. C_n/n, the sample mean, is the MLE estimate for m. Therefore the 95% confidence level for the sample mean is given by:

C_n/n − 1.96 √(v/n) ≤ m ≤ C_n/n + 1.96 √(v/n)    (17)

This shows that as n increases the CI tightens, which means the uncertainty around the parameter estimates declines.

Since a standard Gaussian squared is a chi-squared, and the sum of n independent chi-squared variables is a chi-squared with n degrees of freedom, this gives:

P[ Σ_{i=1}^n ((x_i − m̂)/√v)² ≤ z ] = P[ n v̂/v ≤ z ] = χ²(z)    (18)

hence under the assumption that the returns are normal one finds the following confidence interval for the sample variance:

n v̂ / q_U^χ ≤ v ≤ n v̂ / q_L^χ    (19)

where q_L^χ, q_U^χ are the quantiles of the chi-squared distribution with n degrees of freedom corresponding to the desired confidence level.

The confidence levels and distributions of individual estimated parameters only tell part of the story. For example, in value-at-risk calculations one would be interested in finding a particular percentile of the asset distribution; in expected shortfall calculations one would be interested in computing the expectation of the asset conditional on exceeding such percentile. In the absence of explicit relationships one can also derive the distribution of each of the parameters numerically, by simulating the sample parameters as follows. After the parameters have been estimated (either through MLE or with the best fit to the chosen model; call these the "first parameters"), simulate a sample path of length equal to the historical sample with those parameters. For this sample path one then estimates the best fit parameters as if the historical sample were the simulated one, and obtains a sample of the parameters. Then again, simulating from the first parameters, one goes through this procedure and obtains another sample of the parameters. By iterating, one builds a random distribution of each parameter, all starting from the first parameters. An implementation of the simulated parameter distributions is provided in the next MATLAB function. The charts in Figure 4 plot the simulated distribution for the sample mean and sample variance against the normal and chi-squared distributions given by the theory. In both cases the simulated distributions are very close to their theoretical limits.

The drift estimation will improve considerably for mean reverting processes, as shown in the next sections.

^12 Even without the assumption of Gaussian log-return increments, but knowing only that the log-return increments are iid, one can reach a similar Gaussian limit for C_n, thanks to the central limit theorem, but only asymptotically in n.
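The resampling procedure just described can be sketched in Python as a parametric bootstrap; the function names and the parameter values below are illustrative only, not taken from the paper:

```python
import math
import random

def estimate(returns):
    """MLE sample mean and (biased) variance of iid Gaussian returns, eq. (14)."""
    n = len(returns)
    m = sum(returns) / n
    v = sum((x - m) ** 2 for x in returns) / n
    return m, v

def simulated_param_dist(m0, v0, n, n_iter, rng):
    """Resample paths from the 'first parameters' (m0, v0) and re-estimate
    on each simulated sample, building a distribution of the estimators."""
    dist = []
    for _ in range(n_iter):
        sample = [rng.gauss(m0, math.sqrt(v0)) for _ in range(n)]
        dist.append(estimate(sample))
    return dist

rng = random.Random(42)
dist = simulated_param_dist(m0=0.01, v0=0.04, n=200, n_iter=500, rng=rng)
mean_of_means = sum(m for m, _ in dist) / len(dist)
mean_of_vars = sum(v for _, v in dist) / len(dist)
```

Consistently with the theory above, the simulated estimator distributions centre on the first parameters (up to the small n/(n−1) bias of the variance estimator).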
Code 3 MATLAB Simulating distribution for Sample mean and variance.

load 'ftse100monthly1940.mat'
S = ftse100monthly1940data;
S0 = 100;
dt = 1/12;
ret = log(S(2:end,2)./S(1:end-1,2));
n = length(ret);
m = sum(ret)/n;
v = sqrt(sum((ret - m).^2)/n);
SimR = normrnd(m, v, 10000, n);
m_dist = sum(SimR, 2)/n;
m_tmp = repmat(m_dist, 1, n);
v_dist = sum((SimR - m_tmp).^2, 2)/n;
sigma_dist = sqrt(v_dist/dt);
mu_dist = m_dist/dt + 0.5*sigma_dist.^2;
pct_dist = S0*logninv(0.99, 3*(mu_dist - 0.5*sigma_dist.^2), sqrt(3)*sigma_dist);  % 99th percentile after three years

[Figure 4: Distribution of Sample Mean and Variance.]

In the case of the GBM, one knows the marginal distribution of the asset price to be lognormal, so one can compute percentiles analytically for a given parameter set. The MATLAB code above also simulates the distribution for the 99th percentile of the asset price after three years, when resampling from simulation according to the first estimated parameters. The histogram is plotted in Figure 5.

2.3 Characteristic and Moment Generating Functions

Let X be a random variable, either discrete with probability mass function p_X (first case) or continuous with density f_X (second case). The k-th moment m_k(X) is defined as:

m_k(X) = E(X^k) = Σ_x x^k p_X(x)   or   ∫_{−∞}^{+∞} x^k f_X(x) dx.

Moments are not guaranteed to exist; for example, the Cauchy distribution does not even admit the first moment (mean). The moment generating function M_X(t) of a random variable X is defined, under some technical conditions for existence, as:

M(t) = E(e^{tX}) = Σ_x e^{tx} p_X(x)   or   ∫_{−∞}^{+∞} e^{tx} f_X(x) dx
[Figure 5: Distribution for 99th Percentile of FTSE100 over a three year Period.]

The moment generating function can be seen to be the exponential generating function of the sequence of moments:

M_X(t) = Σ_{k=0}^∞ m_k(X) t^k / k!    (20)

The moment generating function (MGF) owes its name to the fact that from its knowledge one may back out all moments of the distribution. Indeed, one sees easily that by differentiating the MGF j times and computing it in 0 one obtains the j-th moment:

d^j/dt^j M_X(t) |_{t=0} = m_j(X).

A better and more general definition is that of the characteristic function, defined as:

φ_X(t) = E(e^{itX}),

where i is the unit imaginary number from complex numbers theory. The characteristic function exists for every possible density, even for the Cauchy distribution, whereas the moment generating function does not exist for some random variables, notably some with infinite moments. If moments exist, one can derive them also from the characteristic function through differentiation. Basically, the characteristic function forces one to bring complex numbers into the picture but brings in considerable advantages. Finally, it is worth mentioning that the moment generating function can be seen as the Laplace transform of the probability density, whereas the characteristic function is the Fourier transform.

2.4 Properties of Moment Generating and Characteristic Functions

If two random variables, X and Y, have the same moment generating function M(t), then X and Y have the same probability distribution. The same holds for the characteristic function. If X and Y are independent random variables and W = X + Y, then the moment generating function of W is the product of the moment generating functions of X and Y:

M_{X+Y}(t) = E(e^{t(X+Y)}) = E(e^{tX} e^{tY}) = E(e^{tX}) E(e^{tY}) = M_X(t) M_Y(t).    (21)

The same property holds for the characteristic function. This is a very powerful property, since the sum of independent returns or shocks is common in modelling. Finding the density of a sum of independent random variables can be difficult, resulting in a convolution, whereas in moment generating or characteristic function space it is simply a multiplication. This is why typically one moves to moment generating space, computes a simple multiplication and then goes back to density space, rather than working directly in density space through convolution.
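Both facts, moments by differentiation of the MGF at zero and the product rule (21) for independent sums, can be verified numerically for simple discrete distributions; this Python sketch uses hypothetical helper names:

```python
import math
from itertools import product

def mgf(pmf, t):
    """Moment generating function of a discrete distribution {value: prob}."""
    return sum(p * math.exp(t * x) for x, p in pmf.items())

def convolve(pX, pY):
    """Distribution of X + Y for independent discrete X and Y."""
    pW = {}
    for (x, px), (y, py) in product(pX.items(), pY.items()):
        pW[x + y] = pW.get(x + y, 0.0) + px * py
    return pW

pX = {0: 0.5, 1: 0.5}             # one fair coin
pY = {0: 0.25, 1: 0.5, 2: 0.25}   # sum of two fair coins
pW = convolve(pX, pY)

# Product rule (21): M_{X+Y}(t) = M_X(t) * M_Y(t) for independent X, Y
for t in (-1.0, 0.5, 2.0):
    assert abs(mgf(pW, t) - mgf(pX, t) * mgf(pY, t)) < 1e-9

# First moment from the MGF by numerical differentiation at t = 0
h = 1e-6
mean_x = (mgf(pX, h) - mgf(pX, -h)) / (2 * h)
assert abs(mean_x - 0.5) < 1e-6
```

The convolution loop is exactly the density-space computation that the multiplication in MGF space lets one avoid.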
As an example, the moment generating function of a normal random variable Z ∼ N(µ, σ²) is given by:

M_Z(t) = e^{µt + σ²t²/2}

Hence, if X and Y are two independent random variables, with X normal with mean µ_1 and variance σ_1² and Y normal with mean µ_2 and variance σ_2², then

M_X(t) = e^{µ_1 t + σ_1² t²/2},    M_Y(t) = e^{µ_2 t + σ_2² t²/2}.

From this one obtains

M_W(t) = M_X(t) M_Y(t) = e^{µ_1 t + σ_1² t²/2} e^{µ_2 t + σ_2² t²/2} = e^{(µ_1+µ_2)t + (σ_1²+σ_2²)t²/2},

which is the moment generating function of a normal random variable with mean µ_1 + µ_2 and variance σ_1² + σ_2². Hence one knows that W = X + Y is normal with mean µ_1 + µ_2 and variance σ_1² + σ_2².

3 Fat Tails: GARCH Models

GARCH^13 models were first introduced by Bollerslev (1986). The assumption underlying a GARCH model is that volatility changes with time and with the past information. In contrast, the GBM model introduced in the previous section assumes that volatility (σ) is constant. GBM could have been easily generalised to a time dependent but fully deterministic volatility, not depending on the state of the process at any time. GARCH instead allows the volatility to depend on the evolution of the process. In particular the GARCH(1,1) model, introduced by Bollerslev, assumes that the conditional variance (i.e. conditional on information available up to time t_i) is given by a linear combination of the past variance σ(t_{i-1})² and squared values of past returns:

∆S(t_i)/S(t_i) = µ ∆t_i + σ(t_i) ∆W(t_i)
σ(t_i)² = ω σ̄² + α σ(t_{i-1})² + β ε(t_{i-1})²    (22)
ε(t_i)² = (σ(t_i) ∆W(t_i))²

where ∆S(t_i) = S(t_{i+1}) − S(t_i) and the first equation is just the discrete time counterpart of Equation (1), subject to the constraint ω + α + β = 1, a constraint summarising the fact that the variance at each instant is a weighted sum where the weights add up to one. In GARCH the variance is an autoregressive process around a long-term average variance rate σ̄². Interestingly, with ω = 0 and β = 1 − α one recovers the exponentially weighted moving average model, which, unlike the equally weighted MLE estimator of the previous section, puts more weight on the recent observations. The GARCH(1,1) model assumes one lag period in the variance; it can be generalised to a GARCH(p, q) model with p lag terms in the variance and q terms in the squared returns. The GARCH parameters can be estimated by ordinary least squares regression or by maximum likelihood estimation.

If one had just written the discrete time version of GBM, σ(t_i) would just be a known function of time and one would stop with the first equation above. But in this case, in the above GARCH model, σ(t)² is assumed itself to be a stochastic process, although one depending only on past information. Indeed, for example,

σ(t_2)² = ω σ̄² + β ε(t_1)² + α ω σ̄² + α β ε(t_0)² + α² σ(t_0)²

so that σ(t_2) depends on the random variable ε(t_1)². Clearly, conditional on the information at the previous time t_1, the volatility is no longer random, but unconditionally it is, contrary to the GBM volatility.

As a result of the time varying state-dependent volatility, the unconditional distribution of returns is non-Gaussian and exhibits so called "fat" tails. Fat tail distributions allow the realizations of the random variables having those distributions to assume values that are more extreme than in the normal/Gaussian case, and are therefore more suited to model risks of large movements than the Gaussian.

^13 Generalised Autoregressive Conditional Heteroscedasticity
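The variance recursion in (22) is easy to sketch in Python; the function name is ours, and the update enforces the constraint ω + α + β = 1 from the text, with one step checked by hand:

```python
def garch_variance_path(eps2, var0, long_run_var, omega, alpha, beta):
    """Conditional variance recursion of eq. (22):
    sigma_i^2 = omega*sigma_bar^2 + alpha*sigma_{i-1}^2 + beta*eps_{i-1}^2."""
    assert abs(omega + alpha + beta - 1.0) < 1e-12  # weights add up to one
    path = [var0]
    for e2 in eps2:
        path.append(omega * long_run_var + alpha * path[-1] + beta * e2)
    return path

# Two steps checked by hand:
path = garch_variance_path(eps2=[0.04, 0.0], var0=0.02,
                           long_run_var=0.03, omega=0.1, alpha=0.8, beta=0.1)
assert abs(path[1] - (0.1 * 0.03 + 0.8 * 0.02 + 0.1 * 0.04)) < 1e-15
assert abs(path[2] - (0.1 * 0.03 + 0.8 * path[1])) < 1e-15
```

Substituting the recursion into itself, as done for σ(t_2)² above, shows how each squared shock feeds the variance at all later times with geometrically decaying weight α.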
GARCH is not the only approach to account for fat tails, and this report will introduce other approaches to fat tails later on.

[Figure 6: Historical FTSE100 Return Distribution vs. NGARCH Return Distribution.]

A long line of research followed the seminal work of Bollerslev, leading to customisations and generalisations including exponential GARCH (EGARCH) models, threshold GARCH (TGARCH) models and nonlinear GARCH (NGARCH) models, among others. For example, the NGARCH model introduced by Engle and Ng (1993) is an interesting extension of the GARCH(1,1) model, since it allows for asymmetric behaviour in the volatility, such that "good news" or positive returns yield subsequently lower volatility, while "bad news" or negative returns yield a subsequent increase in volatility. The NGARCH is specified as follows:

∆S(t_i)/S(t_i) = µ ∆t_i + σ(t_i) ∆W(t_i)
σ(t_i)² = ω + α σ(t_{i-1})² + β (ε(t_{i-1}) − γ σ(t_{i-1}))²    (23)
ε(t_i)² = (σ(t_i) ∆W(t_i))²

The NGARCH parameters ω, α, and β are positive, and α, β, and γ are subject to the stationarity constraint α + β(1 + γ²) < 1. Compared to the GARCH(1,1), this model contains an additional parameter γ, which is an adjustment to the return innovations. Gamma equal to zero leads to the symmetric GARCH(1,1) model, in which positive and negative ε(t_{i-1}) have the same effect on the conditional variance. In the NGARCH γ is positive, thus reducing the impact of good news (ε(t_{i-1}) > 0) and increasing the impact of bad news (ε(t_{i-1}) < 0) on the variance.

MATLAB Routine 4 extends Routine 1 by simulating the GBM with the NGARCH volatility structure. Notice that there is only one loop in the simulation routine, which is the time loop. Since the GARCH model is path dependent, it is no longer possible to vectorise the simulation over the time steps. For illustration purposes, this model is calibrated to the historic asset returns of the FTSE100 index. In order to estimate the parameters for the NGARCH geometric Brownian motion, the maximum likelihood method introduced in the previous section is used.

The particular structure of the GARCH model allows for volatility clustering (autocorrelation in the volatility), which means a period of high volatility will be followed by high volatility, and conversely a period of low volatility will be followed by low volatility. This is an often observed characteristic of financial data. Figure 8 shows the monthly volatility of the FTSE100 index and the corresponding autocorrelation function, which confirms the presence of autocorrelation in the volatility. Therefore, returns in GARCH will tend to be "riskier" than the Gaussian returns of GBM.
[Figure 7: NGARCH News Effect vs GARCH News Effect (simulated market news with the corresponding GBM log returns and volatilities under GARCH and NGARCH effects).]

Although the variance is changing over time and is state dependent, and hence stochastic conditional on the information at the initial time, it is locally deterministic, which means that conditional on the information at t_{i-1} the variance is known and constant in the next step between time t_{i-1} and t_i. In other words, the variance in the GARCH models is locally constant.

[Figure 8: Monthly Volatility of the FTSE100 and Autocorrelation Function.]

Code 4 MATLAB Code to Simulate GBM with NGARCH vol.

function S = NGARCH_simulation(N_Sim, T, params, S0)
mu_ = params(1);
omega_ = params(2);
alpha_ = params(3);
beta_ = params(4);
gamma_ = params(5);
S = S0*ones(N_Sim, T);
v0 = omega_/(1 - alpha_ - beta_);
v = v0*ones(N_Sim, 1);
BM = normrnd(0, 1, N_Sim, T);
for i = 2:T
sigma_ = sqrt(v);
mean_ = mu_ - 0.5*sigma_.^2;
S(:, i) = S(:, i-1).*exp(mean_ + sigma_.*BM(:, i));
v = omega_ + alpha_*v + beta_*(sqrt(v).*BM(:, i) - gamma_*sqrt(v)).^2;
end
end

Therefore, conditional on the variance at t_{i-1}, the density of the process at the next instant is still normal^14, which indicates that the normal density is only locally applicable between successive returns. Moreover, as noticed previously, the conditional returns are still independent. The log-likelihood function for the sample x_1, ..., x_n of X_i = ∆S(t_i)/S(t_i) is given by:

L*(Θ) = Σ_{i=1}^n log f_Θ(x_i; x_{i-1})    (24)

f_Θ(x_i; x_{i-1}) = f_N(x_i; µ, σ_i²) = (1/√(2πσ_i²)) exp( −(x_i − µ)² / (2σ_i²) )    (25)

L*(Θ) = constant + Σ_{i=1}^n (1/2) [ −log(σ_i²) − (x_i − µ)²/σ_i² ]    (26)

where f_Θ is the density function of the normal distribution. Note the appearance of x_{i-1} in the function: the conditional variance σ_i² depends on the past through the GARCH recursion. The parameter set for the NGARCH GBM includes five parameters, Θ = (µ, ω, α, β, γ). Due to the time changing, state dependent variance one no longer finds explicit solutions for the individual parameters and has to use a numerical scheme to maximise the likelihood function through a solver. To do so, one only needs to maximise the term in brackets in (26), since the constant and the factor do not depend on the parameters.

^14 Note that the unconditional density of the NGARCH GBM is no longer normal, but an unknown heavy tailed distribution.
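The resulting log-likelihood evaluation, i.e. the recursion of (23) combined with the conditional normal density (25)-(26), can be sketched in Python; the optimiser step over Θ is omitted and the function name is ours:

```python
import math

def ngarch_loglik(returns, mu, omega, alpha, beta, gamma):
    """NGARCH log-likelihood: conditional normal density (25)-(26) with the
    variance updated by the recursion (23), started at the stationary level."""
    denom = 1.0 - alpha - beta * (1.0 + gamma ** 2)
    if denom <= 0.0:                 # stationarity constraint violated
        return float("-inf")
    var = omega / denom              # stationary (unconditional) variance
    ll = 0.0
    for x in returns:
        innov = x - mu
        ll += -0.5 * math.log(2 * math.pi * var) - innov ** 2 / (2 * var)
        var = omega + alpha * var + beta * (innov - gamma * math.sqrt(var)) ** 2
    return ll

# With alpha = beta = 0 the variance stays at omega and the likelihood
# reduces to the iid Gaussian one:
ll = ngarch_loglik([0.0], mu=0.0, omega=1.0, alpha=0.0, beta=0.0, gamma=0.0)
assert abs(ll + 0.5 * math.log(2 * math.pi)) < 1e-12
```

A generic optimiser (the analogue of fminsearch) would then minimise the negative of this function over the five parameters, rejecting non-stationary candidates.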
Code 5 MATLAB Code to Estimate Parameters for GBM NGARCH.

function [mu_ omega_ alpha_ beta_ gamma_] = NG_calibration(data, params)
returns = price2ret(data);
returnsLength = length(returns);
options = optimset('MaxFunEvals', 100000, 'MaxIter', 100000);
params = fminsearch(@NG_JGBM_LL, params, options);
mu_ = params(1);
omega_ = abs(params(2));
alpha_ = abs(params(3));
beta_ = abs(params(4));
gamma_ = params(5);
function mll = NG_JGBM_LL(params)
mu_ = params(1);
omega_ = abs(params(2));
alpha_ = abs(params(3));
beta_ = abs(params(4));
gamma_ = params(5);
denum = 1 - alpha_ - beta_*(1 + gamma_^2);
if denum < 0; mll = intmax; return; end   % Variance stationarity test
var_t = omega_/denum;
innov = returns(1) - mu_;
LogLikelihood = log(exp(-innov^2/var_t/2)/sqrt(pi*2*var_t));
for time = 2:returnsLength
var_t = omega_ + alpha_*var_t + beta_*(innov - gamma_*sqrt(var_t))^2;
if var_t < 0; mll = intmax; return; end
innov = returns(time) - mu_;
LogLikelihood = LogLikelihood + log(exp(-innov^2/var_t/2)/sqrt(pi*2*var_t));
end
mll = -LogLikelihood;
end
end

Having estimated the best fit parameters, one can verify the fit of the NGARCH model for the FTSE100 monthly returns using the QQ-plot. A large number of monthly returns is first simulated with the calibrated parameters and the simulation routine above. The QQ-plot of the simulated returns against the historical returns shows that the GARCH model does a significantly better job when compared to the constant volatility model of the previous section: the quantiles of the simulated NGARCH distribution are much more aligned with the quantiles of the historical return distribution than was the case for the plain normal distribution (compare the QQ-plots in Figures 2 and 6).

4 Fat Tails: Jump Diffusion Models

Compared with the normal distribution of the returns in the GBM model, the log-returns of a GBM model with jumps are often leptokurtotic^15. This section discusses the implementation of a jump diffusion model with compound Poisson jumps. In this model, a standard time homogeneous Poisson process dictates the arrival of jumps, and the jump sizes are random variables with some preferred distribution (typically Gaussian, exponential or multinomial). Merton (1976) applied this model to option pricing for stocks that can be subject to idiosyncratic shocks, modelled through jumps. The model SDE can be written as:

dS(t) = µ S(t) dt + σ S(t) dW(t) + S(t) dJ(t)    (27)

where again W(t) is a univariate Wiener process and J(t) is the univariate jump process defined by:

J(T) = Σ_{j=1}^{N_T} (Y_j − 1),   or   dJ(t) = (Y_{N(t)} − 1) dN(t),    (28)

where (N_T)_{T≥0} follows a homogeneous Poisson process with intensity λ, and is thus distributed like a Poisson distribution with parameter λT, and Y_j is the size of the j-th jump. The Poisson distribution is a discrete probability distribution that expresses the probability of a number of events occurring in a fixed period of time. Since the Poisson distribution is the limiting distribution of a binomial distribution where the number of repetitions n of the Bernoulli experiment tends to infinity, each experiment with probability of success λ/n, it is usually adopted to model the occurrence of rare events. For a Poisson distribution with parameter λT, the Poisson probability mass function is:

f_P(x, λT) = exp(−λT) (λT)^x / x!,    x = 0, 1, 2, ...

^15 Fat-tailed distribution.
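The binomial-limit characterisation of the Poisson distribution can be checked numerically in a few lines of Python:

```python
import math

def poisson_pmf(x, lam_t):
    """Poisson probability mass function f_P(x, lambda*T)."""
    return math.exp(-lam_t) * lam_t ** x / math.factorial(x)

def binomial_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

lam = 2.0
# Binomial(n, lam/n) converges to Poisson(lam) as n grows:
for k in range(5):
    assert abs(binomial_pmf(k, 10_000, lam / 10_000) - poisson_pmf(k, lam)) < 1e-4
# The pmf sums to one:
assert abs(sum(poisson_pmf(k, lam) for k in range(50)) - 1.0) < 1e-12
```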
(Recall that by convention 0! = 1.) The Poisson process (N_T)_{T≥0} counts the number of arrivals in [0, T], and the number of jump events occurring in a fixed small time interval ∆t has expectation λ∆t. The Y_j are i.i.d. lognormal variables (Y_j ∼ exp(N(µ_Y, σ_Y²))) that are also independent from the Brownian motion W and the basic Poisson process N.

Also in the jump diffusion case, and for comparison with (2), it can be useful to write the equation for the logarithm of S, which in this case reads:

d log S(t) = (µ − σ²/2) dt + σ dW(t) + log(Y_{N(t)}) dN(t)

or, equivalently,

d log S(t) = (µ + λµ_Y − σ²/2) dt + σ dW(t) + [ log(Y_{N(t)}) dN(t) − µ_Y λ dt ]    (29)

where now both the jump shock (between square brackets) and the diffusion shock have zero mean, as is shown right below. The solution of the SDE for the levels S is given easily by integrating the log equation:

S(T) = S(0) exp( (µ − σ²/2) T + σ W(T) ) Π_{j=1}^{N(T)} Y_j    (30)

Code 6 MATLAB Code to Simulate GBM with Jumps.

function S = JGBM_simulation(N_Sim, T, dt, params, S0)
mu_star = params(1);
sigma_ = params(2);
lambda_ = params(3);
mu_y_ = params(4);
sigma_y_ = params(5);
M_simul = zeros(N_Sim, T);
for t = 1:T
jumpnb = poissrnd(lambda_*dt, N_Sim, 1);
jump = normrnd(mu_y_*(jumpnb - lambda_*dt), sqrt(jumpnb)*sigma_y_);
M_simul(:, t) = mu_star*dt + sigma_*sqrt(dt)*randn(N_Sim, 1) + jump;
end
S = ret2price(M_simul', S0)';
end

The discretisation of Equation (30) for a given time step ∆t is:

S(t) = S(t − ∆t) exp( (µ − σ²/2) ∆t + σ √∆t ε_t ) Π_{j=1}^{n_t} Y_j    (31)

where ε_t ∼ N(0, 1) and n_t = N_t − N_{t−∆t} counts the jumps between time t − ∆t and t. Since the calibration and simulation are done in a discrete time framework, we use the fact that for the Poisson process with rate λ, the number of jump events occurring in the fixed small time interval ∆t has the expectation λ∆t. When rewriting this equation for the log returns X(t) := ∆ log(S(t)) = log(S(t)) − log(S(t − ∆t)) we find:

X(t) = ∆ log(S(t)) = µ* ∆t + σ √∆t ε_t + ∆J*_t    (32)

where the jumps ∆J*_t in the time interval ∆t and the drift µ* are defined as:

∆J*_t = Σ_{j=1}^{n_t} log(Y_j) − λ ∆t µ_Y,    µ* = µ + λµ_Y − σ²/2    (33)

so that the jumps ∆J*_t have zero mean. Indeed:

E(∆J*_t) = E( Σ_{j=1}^{n_t} log Y_j ) − λ ∆t µ_Y = E( E( Σ_{j=1}^{n_t} log Y_j | n_t ) ) − λ ∆t µ_Y = E(n_t µ_Y) − λ ∆t µ_Y = 0.

Here too it is convenient to do MLE estimation in log-returns space, so that here too one defines the X(t_i) according to (10).
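A Python sketch of the log-return simulation (32)-(33), assuming Gaussian jump sizes in log space as in the text (standard library only, so the Poisson draw inverts the cumulative probabilities; function names are ours):

```python
import math
import random

def poisson_draw(lam_dt, rng):
    """Poisson sample by inversion of the cumulative probabilities."""
    n, p = 0, math.exp(-lam_dt)
    u, cum = rng.random(), p
    while u > cum:
        n += 1
        p *= lam_dt / n
        cum += p
    return n

def simulate_logreturns(n, dt, mu_star, sigma, lam, mu_y, sigma_y, rng):
    """Log returns X = mu* dt + sigma sqrt(dt) eps + DJ*, eqs. (32)-(33),
    with compensated jumps DJ* = sum_j log Y_j - lambda dt mu_Y."""
    out = []
    for _ in range(n):
        nt = poisson_draw(lam * dt, rng)
        dj = sum(rng.gauss(mu_y, sigma_y) for _ in range(nt)) - lam * dt * mu_y
        out.append(mu_star * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0) + dj)
    return out

rng = random.Random(7)
x = simulate_logreturns(20_000, 1 / 12, mu_star=0.0, sigma=0.15,
                        lam=3.0, mu_y=0.02, sigma_y=0.05, rng=rng)
mean_x = sum(x) / len(x)
# The compensated jumps have zero mean, so E[X] = mu* dt = 0 here.
assert abs(mean_x) < 0.003
```

The parameter values are illustrative; the final assertion is the Monte Carlo counterpart of the zero-mean computation for ∆J*_t above.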
[Figure 9: The Effect of the Jumps on the Simulated GBM Path and on the Probability Density of the Log Returns.]

The simulation of this model is best done by using Equation (32). The calibration of the model parameters can then be done using MLE on the log-returns X, conditioning on the jumps. Conditional on the jumps occurrence n_t, the jump disturbance ∆J*_t is normally distributed:

∆J*_t | n_t ∼ N( (n_t − λ∆t) µ_Y , n_t σ_Y² ).

Thus the conditional distribution of X(t) = ∆ log(S(t)) is also normal, and the first two conditional moments are:

E( ∆ log(S(t)) | n_t ) = µ* ∆t + (n_t − λ ∆t) µ_Y = (µ − σ²/2) ∆t + n_t µ_Y    (34)

Var( ∆ log(S(t)) | n_t ) = σ² ∆t + n_t σ_Y²    (35)

The probability density function is the sum of the conditional probability densities weighted by the probability of the conditioning variable: the number of jumps.
The density is then given by:

f(x_i; µ, σ, λ, µ_Y, σ_Y) = Σ_{j=0}^{+∞} P(n_t = j) f_N( x_i; (µ − σ²/2) ∆t + j µ_Y , σ² ∆t + j σ_Y² )    (37)

This is an infinite mixture of Gaussian random variables, each weighted by a Poisson probability P(n_t = j) = f_P(j, λ∆t). The log-likelihood function for the returns X(t) at observed times t = t_1, ..., t_n with values x_1, ..., x_n, with ∆t = t_i − t_{i−1}, is then:

L*(Θ) = Σ_{i=1}^n log f(x_i; µ, σ, λ, µ_Y, σ_Y),    maximised over Θ = (µ, σ, λ, µ_Y, σ_Y) subject to σ > 0, λ > 0, σ_Y > 0    (36)

Notice that if ∆t is small, typically the Poisson process jumps at most once. In that case, the above expression simplifies into:

f(x_i; Θ) = (1 − λ∆t) f_N( x_i; (µ − σ²/2)∆t , σ²∆t ) + λ∆t f_N( x_i; (µ − σ²/2)∆t + µ_Y , σ²∆t + σ_Y² )    (38)

i.e. a mixture of two Gaussian random variables weighted by the probabilities of zero or one jump in ∆t.

Code 7 MATLAB Code to Calibrate GBM with Jumps.

function [mu_star sigma_ lambda_ mu_y_ sigma_y_] = JGBM_calibration(data, dt, params)
returns = price2ret(data);
dataLength = length(returns);
Max_jumps = 5;
factoriel = factorial(0:Max_jumps);
options = optimset('MaxFunEvals', 100000, 'MaxIter', 100000);
params = fminsearch(@FT_JGBM_LL, params, options);
mu_star = params(1);
sigma_ = abs(params(2));
lambda_ = abs(params(3));
mu_y_ = params(4);
sigma_y_ = abs(params(5));
function mll = FT_JGBM_LL(params)
mu_ = params(1);
sigma_ = abs(params(2));
lambda_ = abs(params(3));
mu_y_ = params(4);
sigma_y_ = abs(params(5));
LogLikelihood = 0;
for time = 1:dataLength
ProbDensity = 0;
for jumps = 0:Max_jumps-1
jumpProb = exp(-lambda_*dt)*(lambda_*dt)^jumps/factoriel(jumps+1);
condMean = mu_*dt + mu_y_*(jumps - lambda_*dt);
condVol = dt*sigma_^2 + jumps*sigma_y_^2;
condToJumps = exp(-(returns(time) - condMean)^2/condVol/2)/sqrt(pi*2*condVol);
ProbDensity = ProbDensity + jumpProb*condToJumps;
end
LogLikelihood = LogLikelihood + log(ProbDensity);
end
mll = -LogLikelihood;
end
end

The model parameters are estimated by MLE for the FTSE100 on monthly returns, and different paths are then simulated. The goodness of fit is seen by showing the QQ-plot of the simulated monthly log returns against the historical FTSE100 monthly returns in Figure 10. This plot shows that the jumps model fits the data better than the simple GBM model.

Finally, notice that if µ_Y is markedly positive and σ_Y is not large, jumps in the log-returns will tend to be markedly positive only, which may not be realistic in some cases. In that case, one can add markedly negative jumps to the arithmetic Brownian motion return process (29), to then exponentiate it to obtain a new level process for S featuring a more complex jump behaviour. For details on how to incorporate negative jumps, and on the more complex mixture features this induces, the reader can refer to Section 10, where the more general mean reverting case is described.
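The truncated mixture density used in the calibration, i.e. (37) cut off at a maximum number of jumps as Code 7 does with Max_jumps, can be sketched in Python (the function name is ours); a natural sanity check is that it integrates to approximately one:

```python
import math

def jump_mixture_density(x, dt, mu, sigma, lam, mu_y, sigma_y, max_jumps=10):
    """Truncated version of the Poisson-weighted Gaussian mixture (37)."""
    total = 0.0
    for j in range(max_jumps + 1):
        pj = math.exp(-lam * dt) * (lam * dt) ** j / math.factorial(j)
        mean = (mu - sigma ** 2 / 2) * dt + j * mu_y
        var = sigma ** 2 * dt + j * sigma_y ** 2
        total += pj * math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
    return total

# Sanity check with illustrative parameters: the density integrates to
# (nearly) one on a wide grid, since the truncated Poisson mass is negligible.
dt, mu, sigma, lam, mu_y, sigma_y = 1 / 12, 0.05, 0.15, 3.0, 0.0, 0.05
step = 0.001
mass = sum(jump_mixture_density(-1.0 + i * step, dt, mu, sigma, lam, mu_y, sigma_y) * step
           for i in range(2001))
assert abs(mass - 1.0) < 1e-3
```

With λ∆t = 0.25 the probability of more than ten jumps in one month is negligible, which is why the truncation is harmless here.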
D. Brigo, A. Dalessandro, M. Neugebauer, F. Triki: A stochastic processes toolkit for Risk Management

Figure 10: Historical FTSE100 Return Distribution vs. Jump GBM Return Distribution. [Histogram and QQ-plot omitted.]

5 Fat Tails: Variance Gamma (VG) process

The two previous sections have addressed the modelling of phenomena where numerically large changes in value ("fat tails") are more likely than in the case of the normal distribution. This section presents yet another method to accomplish fat tails, and this is through a process obtained by time changing a Brownian motion (with drift μ̄ and volatility σ̄) with an independent subordinator, that is an increasing process with independent and stationary increments. In a way, the subordinator is a random transform of calendar time.

The philosophy is not so different from the GBM with jumps seen before, where the return process is also a mixture. Here, instead of adding a new source of randomness like jumps, and instead of making the volatility state dependent like in GARCH or random like in stochastic volatility models, one makes the time of the process random through a new process, the subordinator. A possible related interpretative feature is that financial time, related to trading, or market activity time, flows at a random rate. One can define the "market time" as a positive increasing random process, g(t), with stationary increments g(u) − g(t) for u ≥ t ≥ 0. An important assumption is that E[g(u) − g(t)] = u − t. This means that the market time has to reconcile with the calendar time between t and u on average.

Two models based on Brownian subordination are the Variance Gamma (VG) process and the Normal Inverse Gaussian (NIG) process. They are a subclass of the generalised hyperbolic (GH) distributions. They are characterised by drift parameters, a volatility parameter of the Brownian motion, and a variance parameter of the subordinator. Both VG and NIG have exponential tails; the tail decay rate is the same for both VG and NIG distributions and it is smaller than for the Normal distribution. Therefore these models allow for more flexibility in capturing fat tail features and high kurtosis of some historical financial distributions with respect to GBM. Among fat tail distributions obtained through subordination, this section will focus on the VG distribution, which can also be thought of as a mixture of normal distributions, where the mixing weights density is given by the Gamma distribution of the subordinator.

The VG distribution was introduced in the financial literature by¹⁶ Madan and Seneta (1990). The model considered here for the log-returns of an asset price is of this type:

d log S(t) = μ̄ dt + θ̄ dg(t) + σ̄ dW(g(t)),   S(0) = S₀    (39)

where μ̄, θ̄ and σ̄ are real constants and σ̄ ≥ 0.¹⁷ This model differs from the "usual" notation of Brownian motion mainly in the term g_t. In fact one introduces g_t to characterise the market activity time. In probabilistic jargon g(t) is a subordinator. Conditional on the subordinator g, the rescaled Brownian increment between two instants t and u is Gaussian:

σ̄ ( W(g(u)) − W(g(t)) ) | g  ∼  σ̄ √(g(u) − g(t)) ε    (40)

¹⁶ For a more advanced introduction and background on the above processes refer to Seneta (2004), Cont and Tankov (2004), Moosbrucker (2006) and Madan, Carr, and Chang (1998).
¹⁷ In addition one may say that (39) is a Lévy process of finite variation but of relatively low activity with small jumps.
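The time-change mechanism can be sketched in a few lines of Python (the paper's own code is MATLAB; names and parameter values here are illustrative). Gamma market-time increments with shape Δt/ν and scale ν have mean Δt, so market time reconciles with calendar time on average:

```python
import numpy as np

rng = np.random.default_rng(42)

def vg_increments(n, dt, mu, theta, sigma, nu):
    """Log-return increments of (39): mu*dt + theta*dg + sigma*dW(g),
    with Gamma market-time increments dg, E[dg] = dt, Var[dg] = nu*dt."""
    dg = rng.gamma(shape=dt / nu, scale=nu, size=n)
    return mu * dt + theta * dg + sigma * np.sqrt(dg) * rng.standard_normal(n)

dt, nu = 1 / 252, 0.2
dg = rng.gamma(shape=dt / nu, scale=nu, size=200_000)
print(dg.mean(), dt)   # the two should agree on average
```

The conditional normality (40) is visible in the last line of `vg_increments`: given the drawn `dg`, the increment is Gaussian with variance proportional to the elapsed market time.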
where ε is a standard Gaussian. VG assumes {g(t)} ∼ Γ(t/ν, ν), a Gamma process with parameter ν that is independent of the standard Brownian motion {W_t}_{t≥0}. Increments are independent and stationary Gamma random variables: the Gamma process definition implies g(u) − g(t) ∼ Γ((u − t)/ν, ν) for any u and t. Consistently with the above condition on the expected subordinator, this Gamma assumption implies E[g(u) − g(t)] = u − t and Var(g(u) − g(t)) = ν(u − t). Conditional on g, and recalling (40), one has:

( log(S(u)/S(t)) − μ̄(u − t) ) | g  ∼  N( θ̄(g(u) − g(t)), σ̄²(g(u) − g(t)) )    (41)

The process in equation (39) is called a Variance Gamma process. Through iterated expectations, one easily obtains:

E[log(S(u)) − log(S(t))] = (μ̄ + θ̄)(u − t),   Var(log S(u) − log S(t)) = (σ̄² + θ̄²ν)(u − t).

Mean and variance of these log-returns can be compared to the log-returns of GBM in Equation (3), and are found to be the same if (μ̄ + θ̄) = μ − σ²/2 and σ̄² + θ̄²ν = σ². In this case, the differences between the two models will be visible in higher order moments.

Code 8 MATLAB code that simulates the asset model in eq. (39).

function S = VG_simulation(N_Sim, T, params, dt, S0)
theta = params(1);
nu    = params(2);
sigma = params(3);
mu    = params(4);
g = gamrnd(dt/nu, nu, N_Sim, T);
returns = mu*dt + theta*g + sigma*sqrt(g).*randn(N_Sim, T);
S = S0 * cumprod([ones(N_Sim,1), exp(returns)], 2);

5.1 Probability Density Function of the VG Process

The unconditional density may then be obtained by integrating out g in Eq. (41) through its Gamma density, where one sets u − t =: Δt and X_Δt := log(S(t + Δt)/S(t)):

f_{X_Δt}(x) = ∫₀^∞ f_N( x; μ̄Δt + θ̄g, σ̄²g ) f_Γ( g; Δt/ν, ν ) dg    (42)

As mentioned above, this is again an (infinite) mixture of Gaussian densities, where the weights are given by the Gamma densities rather than Poisson probabilities. The above integral converges and the PDF of the VG process X_Δt defined in Eq. (39) is

f_{X_Δt}(x) = [ 2 e^{θ̄(x−μ̄)/σ̄²} / ( ν^{Δt/ν} √(2π) σ̄ Γ(Δt/ν) ) ] · ( |x − μ̄| / √(2σ̄²/ν + θ̄²) )^{Δt/ν − 1/2} · K_{Δt/ν − 1/2}( |x − μ̄| √(2σ̄²/ν + θ̄²) / σ̄² )    (43)

Here K_η(·) is a modified Bessel function of the third kind with index η, given for x > 0 by

K_η(x) = (1/2) ∫₀^∞ y^{η−1} exp( −(x/2)(y^{−1} + y) ) dy

Code 9 MATLAB code of the variance gamma density function as in equation (43).

function fx = VGdensity(x, theta, nu, sigma, mu, T)
v1 = 2*exp(( theta*(x-mu) )/sigma^2) / ( (nu^(T/nu)) * sqrt(2*pi) * sigma * gamma(T/nu) );
M2 = (2*sigma^2)/nu + theta^2;
v3 = abs(x-mu)./sqrt(M2);
v4 = v3.^(T/nu - 0.5);
v6 = ( abs(x-mu).*sqrt(M2) )./sigma^2;
K  = besselk(T/nu - 0.5, v6);
fx = v1.*v4.*K;
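The closed-form density (43) is straightforward to port to Python with scipy's Bessel K (a sketch mirroring Code 9; names are illustrative), and one can verify numerically that it integrates to one. The singularity at x = μ̄ is integrable, so the quadrature is told where it sits.

```python
import numpy as np
from scipy.special import kv, gamma as gamma_fn
from scipy.integrate import quad

def vg_density(x, theta, nu, sigma, mu, T):
    """VG probability density of X_T = log(S(T)/S(0)), eq. (43)."""
    c = 2 * np.exp(theta * (x - mu) / sigma**2) / (
        nu**(T / nu) * np.sqrt(2 * np.pi) * sigma * gamma_fn(T / nu))
    m2 = 2 * sigma**2 / nu + theta**2
    a = np.abs(x - mu)
    return c * (a / np.sqrt(m2))**(T / nu - 0.5) * kv(T / nu - 0.5, a * np.sqrt(m2) / sigma**2)

# the density should integrate to ~1 (integrable singularity at x = mu)
total, _ = quad(vg_density, -3, 3, args=(0.1, 0.4, 0.25, 0.0, 1.0),
                points=[0.0], limit=200)
print(total)
```

Since the VG tails decay exponentially, the truncation of the integration range to [−3, 3] is harmless for these parameter values.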
The moment generating function of X_Δt := log(S(t + Δt)/S(t)), M_X(z) = E[e^{zX}], is given by:

M_X(z) = e^{μ̄z} ( 1 − θ̄νz − (1/2) ν σ̄² z² )^{−Δt/ν}    (44)

and one obtains the four central moments of the return distribution over an interval of length Δt:

E[X] = X̄ = (μ̄ + θ̄)Δt
E[(X − X̄)²] = (νθ̄² + σ̄²)Δt
E[(X − X̄)³] = (2θ̄³ν² + 3σ̄²νθ̄)Δt
E[(X − X̄)⁴] = (3νσ̄⁴ + 12θ̄²σ̄²ν² + 6θ̄⁴ν³)Δt + (3σ̄⁴ + 6θ̄²σ̄²ν + 3θ̄⁴ν²)(Δt)²

Now considering the skewness and the kurtosis:

S = 3νθ̄Δt / ( (νθ̄² + σ̄²)Δt )^{3/2}
K = [ (3νσ̄⁴ + 12θ̄²σ̄²ν² + 6θ̄⁴ν³)Δt + (3σ̄⁴ + 6θ̄²σ̄²ν + 3θ̄⁴ν²)(Δt)² ] / ( (νθ̄² + σ̄²)Δt )²

and assuming θ̄ to be small, thus ignoring θ̄², θ̄³, θ̄⁴, one obtains:

S = 3νθ̄ / ( σ̄ √Δt ),   K = 3 ( 1 + ν/Δt ).

Figure 11: On the left side, VG densities for fixed ν and σ̄, with μ̄ = 0 and various values of the parameter θ̄. On the right side, simulated VG path samples. To replicate the simulation, use code (5) with the parameters μ̄ = θ̄ = 0.

5.2 Calibration of the VG Process

Given a time series of independent log-returns log(S(t_{i+1})/S(t_i)) (with t_{i+1} − t_i = Δt), say (x₁, x₂, …, x_n), and denoting by π = {μ̄, θ̄, σ̄, ν} the variance-gamma probability density function parameters that one would like to estimate, the maximum likelihood estimation (MLE) of π consists in finding π̄ that maximises the logarithm of the likelihood function L(π) = Πᵢ₌₁ⁿ f(xᵢ; π).
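The central-moment formulas derived from the MGF (44) can be checked against a Monte Carlo draw of VG increments. A Python sketch (parameter values are illustrative):

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(7)

mu, theta, sigma, nu, dt = 0.0, -0.15, 0.3, 0.5, 1.0

# Monte Carlo draw of X = mu*dt + theta*g + sigma*sqrt(g)*Z, g ~ Gamma(dt/nu, nu)
n = 1_000_000
g = rng.gamma(dt / nu, nu, n)
x = mu * dt + theta * g + sigma * np.sqrt(g) * rng.standard_normal(n)

# central moments implied by the MGF (44)
mean_th = (mu + theta) * dt
var_th = (nu * theta**2 + sigma**2) * dt
skew_th = (2 * theta**3 * nu**2 + 3 * sigma**2 * nu * theta) * dt / var_th**1.5
kurt_th = ((3 * nu * sigma**4 + 12 * theta**2 * sigma**2 * nu**2 + 6 * theta**4 * nu**3) * dt
           + (3 * sigma**4 + 6 * theta**2 * sigma**2 * nu + 3 * theta**4 * nu**2) * dt**2) / var_th**2

print(x.mean(), mean_th)
print(skew(x), skew_th)
```

With a negative θ̄ the simulated returns are left-skewed and leptokurtic, exactly as the formulas predict.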
So the initial guess for the parameters π₀ will be:

σ̄₀ = √(V/Δt)
ν₀ = (K/3 − 1)Δt
θ̄₀ = S σ̄₀ √Δt / (3ν₀)
μ̄₀ = X̄/Δt − θ̄₀

where X̄, V, S and K denote the sample mean, variance, skewness and kurtosis of the returns.

Code 10 MATLAB code of the MLE for the variance gamma density function in (43).

function params = VG_calibration(data, dt)
% computation of the initial parameters
data = price2ret(data);
M = mean(data);
V = var(data);
S = skewness(data);
K = kurtosis(data);
sigma = sqrt(V/dt);
nu    = (K/3 - 1)*dt;
theta = (S*sigma*sqrt(dt))/(3*nu);
mu    = (M/dt) - theta;
% VG MLE
pdf_VG = @(data, theta, nu, sigma, mu) VGdensity(data, theta, nu, sigma, mu, dt);
start  = [theta, nu, sigma, mu];
lb = [-intmax 0 0 -intmax];
ub = [ intmax intmax intmax intmax];
options = statset('MaxIter', 1000, 'MaxFunEvals', 10000);
params = mle(data, 'pdf', pdf_VG, 'start', start, 'lower', lb, 'upper', ub, 'options', options);

An example can be the calibration of the historical time series of the EUR/USD exchange rate, where the VG process is a more suitable choice than the geometric Brownian motion. The fit is performed on daily returns and the plot in Fig. 13 illustrates the results. Notice in particular that the difference between the Arithmetic Brownian motion for the returns and the VG process is expressed by the time change parameter ν, all other parameters being roughly the same in the ABM and VG cases.

Figure 12: On the left side, VG densities for fixed θ̄, ν and σ̄, with μ̄ = 0 and time from 1 to 10 years (value on the x-axis, density on the y-axis). On the right side, percentiles of these densities as time varies.
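The initial-guess formulas can be checked deterministically: feeding them the exact population moments implied by (44) returns the true parameters exactly when θ̄ = 0, and a close approximation for small θ̄. A Python sketch of the method-of-moments step in Code 10 (function name is illustrative):

```python
import math

def vg_initial_guess(mean, var, skewness, kurt, dt):
    """Method-of-moments starting values (sigma, nu, theta, mu), as in Code 10."""
    sigma0 = math.sqrt(var / dt)
    nu0 = (kurt / 3.0 - 1.0) * dt
    theta0 = skewness * sigma0 * math.sqrt(dt) / (3.0 * nu0)
    mu0 = mean / dt - theta0
    return sigma0, nu0, theta0, mu0

# exact population moments for theta = 0: S = 0 and K = 3(1 + nu/dt)
mu, sigma, nu, dt = 0.01, 0.2, 0.5, 1 / 12
mean, var = mu * dt, sigma**2 * dt
S, K = 0.0, 3.0 * (1.0 + nu / dt)
print(vg_initial_guess(mean, var, S, K, dt))
```

The roundtrip recovers (σ̄, ν, θ̄, μ̄) = (0.2, 0.5, 0, 0.01), which is then refined by the MLE step.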
Code 11 MATLAB code to calculate the percentiles of the VG distribution as in Fig. (12).

clear all; close all
theta = 1; nu = .4; sigma = 1.4; mu = 0;   % example parameters as in Figure 12
nT = 10; t = linspace(1, nT, nT);
X = linspace(-15, 30, 500);
Y = zeros(length(X), nT);
for k = 1:nT
    Y(:,k) = VGdensity(X, theta, nu, sigma, mu, t(k));
end
subplot(1,3,1); plot(X, Y);
xlim([-5 27]); ylim([0 .32]);
legend('1yr','2yr','3yr','4yr','5yr','6yr','7yr','8yr','9yr','10yr')
% percentiles of the VG distribution as time varies
pc = [.25 .5 .75 .9 .95 .99];
lb = -20;
options = optimset('Display','off','TolFun',1e-5,'MaxFunEvals',2500);
PercA = zeros(nT, length(pc));
for k = 1:nT
    f = @(x) VGdensity(x, theta, nu, sigma, mu, t(k));
    for zz = 1:length(pc)
        percF = @(a) quadl(f, lb, a) - pc(zz);   % CDF(a) minus target level
        PercA(k, zz) = fzero(percF, 1, options);
    end
end
subplot(1,3,2); plot(t, PercA); grid on; xlabel('years');
for zz = 1:length(pc)
    text(t(zz), PercA(zz,zz), ['\fontsize{9}\color{black}' num2str(pc(zz))], ...
        'HorizontalAlignment', 'center', 'BackgroundColor', [1 1 1]);
end

6 The Mean Reverting Behaviour

One can define mean reversion as the property to always revert to a certain constant or time varying level with limited variance around it. This property is true for an AR(1) process if the absolute value of the autoregression coefficient is less than one (|α| < 1). To be precise, the formulation of the first order autoregressive process AR(1) is:

x_{t+1} = μ + αx_t + σε_{t+1}   ⇒   Δx_{t+1} = (1 − α)( μ/(1 − α) − x_t ) + σε_{t+1}    (45)

All the mean reverting behaviour in the processes that are introduced in this section is due to an AR(1) feature in the discretised version of the relevant SDE. Note that in the special case where α = 1, the process behaves like a pure random walk with constant drift, Δx_{t+1} = μ + σε_{t+1}.

In general, before thinking about fitting a specific mean reverting process to available data, it is important to check the validity of the mean reversion assumption. A simple way to do this is to test for stationarity. Since for the AR(1) process α < 1 is also a necessary and sufficient condition for stationarity, testing for mean reversion is equivalent to testing for stationarity. There is a growing number of stationarity statistical tests available and ready for use in many econometric packages.¹⁸ The most popular are:

• the Dickey and Fuller (1979, DF) test;
• the Augmented DF (ADF) test of Said and Dickey (1984);

¹⁸ For example Stata, SAS, R, EViews, etc.
• the Phillips and Perron (1988, PP) test;
• the Variance Ratio (VR) test of Poterba and Summers (1988) and Cochrane (2001); and
• the Kwiatkowski, Phillips, Schmidt, and Shin (1992, KPSS) test.

All tests give results that are asymptotically valid, and are constructed under specific assumptions. Once the stationarity has been established for the time series data, the only remaining test is for the statistical significance of the autoregressive lag coefficient: a simple Ordinary Least Squares regression of the time series on its one time lagged version (Equation (45)) allows one to assess whether α is effectively significant.

Suppose that, as an example, one is interested in analysing the GLOBOXX index history. Due to the lack of Credit Default Swap (CDS) data for the time prior to 2004, the underlying GLOBOXX spread history has to be proxied by an index based on the Merrill Lynch corporate indices.¹⁹ This proxy index has appeared to be mean reverting. Then the risk of having the CDS spreads rising in the medium term to revert to the index historical "long-term mean" may impact the risk analysis of a product linked to the GLOBOXX index. The mean reversion of the spread indices is supported by the intuition that the credit spread is linked to the economic cycle. This assumption also has some support from the econometric analysis. To illustrate this, the mean reversion of a Merrill Lynch index, namely the EMU Corporate A rated 5-7Y index²⁰, is tested using the Augmented Dickey-Fuller test (Said and Dickey (1984)).

Figure 13: On the top chart the historical trend of the EUR/USD is reported. In the middle chart the daily return distribution presents fat tails with respect to a normal distribution; the QQ-plot shows clearly the presence of tails more pronounced than in the normal case. The bottom chart is the plot of the daily return density against the estimated VG one (parameters estimated by MLE).

¹⁹ Asset Swap Spreads data of a portfolio of equally weighted Merrill Lynch Indices that match the average credit quality of GLOBOXX, its duration, and its geographical composition.
²⁰ Bloomberg Tickers ER33 - Asset Swap Spreads data from 01/98 to 08/07.
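The OLS significance check on the lag coefficient described above takes only a few lines. A Python sketch (illustrative names and parameter values): simulate a stationary AR(1) and recover α from a regression of the series on its lagged version.

```python
import numpy as np

rng = np.random.default_rng(0)

def ar1_ols(x):
    """OLS of x_{t+1} on x_t (Equation (45)): returns intercept mu and slope alpha."""
    X = np.column_stack([np.ones(len(x) - 1), x[:-1]])
    coef, *_ = np.linalg.lstsq(X, x[1:], rcond=None)
    return coef[0], coef[1]

# simulate a stationary AR(1): x_{t+1} = mu + alpha*x_t + sigma*eps
mu, alpha, sigma, n = 1.0, 0.7, 0.5, 20_000
x = np.empty(n)
x[0] = mu / (1 - alpha)                     # start at the long-term mean
for t in range(n - 1):
    x[t + 1] = mu + alpha * x[t] + sigma * rng.standard_normal()

mu_hat, alpha_hat = ar1_ols(x)
print(mu_hat, alpha_hat)
```

The fitted slope sits close to 0.7 and clearly below one, which is the AR(1) signature of mean reversion; a unit-root test such as ADF formalises whether the distance from one is statistically significant.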
The ADF test for mean reversion is sensitive to outliers.²¹ Before testing for mean reversion, the index data have been cleaned by removing the innovations that exceed three times the volatility, as they are considered outliers. Figure 15 shows the EMU Corporate A rated 5-7Y index cleaned from outliers, together with the autocorrelation and partial autocorrelation plots of its log series. The AR(1) process is characterised by exponentially decreasing autocorrelations and by a partial autocorrelation that reaches zero after the first lag (see discussions around Equation (8) for the related definitions):

ACF(k) = α^k,  k = 0, 1, 2, …;   PACF(1) = α;   PACF(k) = 0,  k = 2, 3, 4, …    (46)

From this chart one can deduce that the weekly log spread of the EMU Corporate A rated 5-7Y index shows a clear AR(1) pattern.

The mean reversion of the time series has been tested using the Augmented Dickey-Fuller test (Said and Dickey (1984)). The ADF statistic obtained for this time series is −3.7373. The first and fifth percentiles are −3.44 and −2.87. This means that the test rejects the null hypothesis of unit root at the 1% significance level. The more negative the ADF statistic, the stronger the rejection of the unit root hypothesis.²²

Figure 14: The Random Walk vs the AR(1) stationary process (AR(1): μ = 0, α = 0.7, σ = 1; Random Walk: μ = 0, σ = 1). Left: AR(1) paths (in blue) vs Random Walk paths (in red). Right: variance of the stationary AR(1) process (in blue) vs variance of the Random Walk (in red).

7 Mean Reversion: The Vasicek Model

The Vasicek model, owing its name to Vasicek (1977), is one of the earliest stochastic models of the short-term interest rate. It assumes that the instantaneous spot rate (or "short rate") follows an Ornstein-Uhlenbeck process with constant coefficients under the statistical or objective measure used for historical estimation:

dx_t = α(θ − x_t)dt + σ dW_t    (47)

²¹ For an outlier due to a bad quote, the up and down movement can be interpreted as a strong mean reversion in that particular data point and is able to influence the final result.
²² Before cleaning the data from the outliers the ADF statistic was −5.0226. This confirms the effect of outliers in amplifying the detected level of mean reversion.
Figure 15: On the top: the EMU Corporate A rated 5-7Y weekly index from 01/98 to 08/07. On the bottom: the first 30 autocorrelations and partial autocorrelations of this time series.

Figure 16: The histogram of the EMU Corporate A rated 5-7Y index spread levels.
with α, θ and σ positive and dW_t a standard Wiener process. This model assumes a mean-reverting stochastic behaviour of interest rates. Under the risk neutral measure used for valuation and pricing one may assume a similar parametrisation for the same process, but with one more parameter in the drift modelling the market price of risk; see for example Brigo and Mercurio (2006). The Vasicek model has a major shortcoming, namely the non null probability of negative rates. This is an unrealistic property for the modelling of positive entities like interest rates or credit spreads.

The explicit solution to the SDE (47) between any two instants s and t, with 0 ≤ s < t, is:

x_t = θ( 1 − e^{−α(t−s)} ) + x_s e^{−α(t−s)} + σ e^{−αt} ∫_s^t e^{αu} dW_u    (48)

The conditional mean and the conditional variance of x_t given x_s can be derived from Equation (48):

E_s[x_t] = θ + (x_s − θ) e^{−α(t−s)}    (53)
Var_s[x_t] = (σ²/2α)( 1 − e^{−2α(t−s)} )    (54)

One can easily check that as time increases the mean tends to the long-term value θ and the variance remains bounded, implying mean reversion. More precisely, the long-term distribution of the Ornstein-Uhlenbeck process is stationary and is Gaussian, with θ as mean and √(σ²/(2α)) as standard deviation.

To calibrate the Vasicek model the discrete form of the process is used. The discrete time version of Equation (48), on a time grid 0 = t₀, t₁, t₂, … with (assumed constant for simplicity) time step Δt = t_i − t_{i−1}, can be easily obtained from the solution of the Ornstein-Uhlenbeck SDE:

x(t_i) = c + b x(t_{i−1}) + δ ε(t_i)    (49)

where the coefficients are:

c = θ( 1 − e^{−αΔt} )    (50)
b = e^{−αΔt}    (51)

and ε(t) is a Gaussian white noise (ε ∼ N(0, 1)). The volatility of the innovations can be deduced by the Itō isometry:

δ = σ √( (1 − e^{−2αΔt}) / (2α) )    (52)

whereas if one used the Euler scheme to discretise the equation this would simply be σ√Δt, which is the same (to first order in Δt) as the exact expression above.

Equation (49) is exactly the formulation of an AR(1) process (see Section 6). Having 0 < b < 1 when 0 < α implies that this AR(1) process is stationary and mean reverting to a long-term mean given by θ. One can check this also directly by computing mean and variance of the process; as the distribution of x(t) is Gaussian, it is characterised by its first two moments.

The calibration process is simply an OLS regression of the time series x(t_i) on its lagged form x(t_{i−1}). The OLS regression provides the maximum likelihood estimator for the parameters c, b and δ, which are calibrated using equation (49). By resolving the three-equation system one gets the following α, θ and σ parameters:

α = − ln(b)/Δt    (55)
θ = c/(1 − b)    (56)
σ = δ / √( (b² − 1)Δt / (2 ln(b)) )    (57)

One may compare the above formulas with the estimators derived directly via maximum likelihood in Brigo and Mercurio (2006), reported below, where for brevity x(t_i) is denoted by x_i and n is assumed to be the number of observations.
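The full calibration loop, simulate with the exact discretisation (49)-(52) and recover (α, θ, σ) via (55)-(57), can be sketched in Python (names and parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_vasicek(x0, alpha, theta, sigma, dt, n):
    """Exact discretisation (49)-(52) of dx = alpha*(theta - x)dt + sigma dW."""
    b = np.exp(-alpha * dt)
    c = theta * (1 - b)
    delta = sigma * np.sqrt((1 - b**2) / (2 * alpha))
    x = np.empty(n + 1)
    x[0] = x0
    for i in range(n):
        x[i + 1] = c + b * x[i] + delta * rng.standard_normal()
    return x

def calibrate_vasicek(x, dt):
    """OLS of x_i on x_{i-1} (Eq. (49)), inverted through (55)-(57)."""
    A = np.column_stack([np.ones(len(x) - 1), x[:-1]])
    coef = np.linalg.lstsq(A, x[1:], rcond=None)[0]
    c, b = coef
    delta = (x[1:] - A @ coef).std(ddof=2)
    alpha = -np.log(b) / dt
    theta = c / (1 - b)
    sigma = delta / np.sqrt((b**2 - 1) * dt / (2 * np.log(b)))
    return alpha, theta, sigma

x = simulate_vasicek(0.03, 2.0, 0.05, 0.02, 1 / 52, 20_000)
a_hat, th_hat, s_hat = calibrate_vasicek(x, 1 / 52)
print(a_hat, th_hat, s_hat)
```

With roughly 400 years of weekly data the speed of mean reversion α is recovered within a few percent; on realistic sample sizes α is the parameter with by far the largest estimation error.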
Figure 17: The distribution of the long-term mean, the long-term variance, and the 99th percentile estimated by several simulations of a Vasicek process with α = 8, θ = 50 and σ = 25.
b = [ n Σᵢ xᵢ xᵢ₋₁ − Σᵢ xᵢ Σᵢ xᵢ₋₁ ] / [ n Σᵢ x²ᵢ₋₁ − (Σᵢ xᵢ₋₁)² ]    (58)

θ = Σᵢ ( xᵢ − b xᵢ₋₁ ) / ( n(1 − b) )    (59)

δ² = (1/n) Σᵢ ( xᵢ − b xᵢ₋₁ − θ(1 − b) )²    (60)

From these estimators, using (55) and (57), one obtains immediately the estimators for the parameters α and σ.

Code 12 MATLAB calibration of the Vasicek process.

function ML_VasicekParams = Vasicek_calibration(V_data, dt)
N = length(V_data);
x = [ones(N-1,1) V_data(1:N-1)];
ols = (x'*x)^(-1)*(x'*V_data(2:N));
resid = V_data(2:N) - x*ols;
c = ols(1);
b = ols(2);
delta = std(resid);
alpha = -log(b)/dt;
theta = c/(1-b);
sigma = delta/sqrt((b^2-1)*dt/(2*log(b)));
ML_VasicekParams = [alpha theta sigma];

For the Monte Carlo simulation purpose the discrete time expression of the Vasicek model is used. The simulation is straightforward using the calibrated coefficients α, θ and σ and equation (49), where now the t_i are future instants. One only needs to draw a random normal process ε_t and then generate recursively the new x_{t_i} process, starting the recursion with the current value x₀.

8 Mean Reversion: The Exponential Vasicek Model

To address the positivity of interest rates and ensure their distribution to have fatter tails, it is possible to transform a basic Vasicek process y(t) into a new positive process x(t). The most common transformations to obtain a positive number x from a possibly negative number y are the square and the exponential functions. By squaring or exponentiating a normal random variable y one obtains a random variable x with fat tails. Since the exponential function increases faster than the square and is larger (exp(y) >> y² for positive and large y), the tails in the exponential Vasicek model below will be fatter than the tails in the squared Vasicek (and CIR as well). In other terms, a lognormal distribution (exponential Vasicek) has fatter tails than a chi-squared (squared Vasicek) or even a noncentral chi-squared (CIR) distribution.

Therefore, if one assumes y to follow a Vasicek process,

dy_t = α(θ − y_t)dt + σ dW_t    (61)

then one may transform y. If one takes the square to ensure positivity and defines x(t) = y(t)², then by Itō's formula x satisfies:

dx(t) = ( B + A √x(t) − C x(t) )dt + ν √x(t) dW_t

for suitable constants A, B, C and ν. This "squared Vasicek" model is similar to the CIR model that the subsequent section will address, so it will not be addressed further here.
If, on the other hand, to ensure positivity one takes the exponential, x(t) = exp(y(t)), then one obtains a model that at times has been termed Exponential Vasicek (for example in Brigo and Mercurio (2006)). Itō's formula this time reads, for x under the objective measure:

dx_t = α x_t ( m − log x_t )dt + σ x_t dW_t    (62)

where α and σ are positive and m = θ + σ²/(2α), and a similar parametrisation can be used under the risk neutral measure. In other words, this model for x assumes that the logarithm of the short rate (rather than the short rate itself) follows an Ornstein-Uhlenbeck process. The explicit solution of the Exponential Vasicek equation, between any two instants 0 < s < t, is then:

x_t = exp( θ( 1 − e^{−α(t−s)} ) + log(x_s) e^{−α(t−s)} + σ e^{−αt} ∫_s^t e^{αu} dW_u )    (63)

The advantage of this model is that the underlying process is mean reverting, stationary in the long run, strictly positive, and has a distribution skewed to the right and characterised by a fat tail for large positive values. This makes the Exponential Vasicek model an excellent candidate for the modelling of credit spreads when one needs to do historical estimation. As a shortcoming of the Exponential Vasicek model one can identify the explosion problem that appears when this model is used to describe the interest rate dynamics in a pricing model involving bank account evaluation (like for the Eurodollar future pricing).

Figure 18: EMU Corporate A rated 5-7Y index (in dark black) vs two simulated Exponential Vasicek paths (in blue and green). The dashed red lines describe the 95th percentile and the 5th percentile of the exponential Vasicek distribution. The dark red line is the distribution mean. The two remaining red lines describe one standard deviation above and below the mean.
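Equation (63) says that, conditional on x_s, log x_t is Gaussian, so x_t is lognormal with known mean. A Python sketch (illustrative parameter values) checks this by running the exact Ornstein-Uhlenbeck recursion for y = log x over many small steps and comparing the simulated mean with the lognormal moment exp(m + v/2):

```python
import numpy as np

rng = np.random.default_rng(3)

# Exponential Vasicek: x = exp(y), dy = alpha*(theta - y)dt + sigma dW
alpha, theta, sigma = 2.0, np.log(50.0), 0.6
x0, T, n_steps, n_paths = 40.0, 1.0, 50, 200_000
dt = T / n_steps

b = np.exp(-alpha * dt)
c = theta * (1 - b)
delta = sigma * np.sqrt((1 - b**2) / (2 * alpha))

y = np.full(n_paths, np.log(x0))
for _ in range(n_steps):                      # exact OU recursion for y
    y = c + b * y + delta * rng.standard_normal(n_paths)
x = np.exp(y)                                 # strictly positive by construction

# lognormal conditional moments implied by (63)
m = theta * (1 - np.exp(-alpha * T)) + np.log(x0) * np.exp(-alpha * T)
v = sigma**2 * (1 - np.exp(-2 * alpha * T)) / (2 * alpha)
print(x.mean(), np.exp(m + v / 2))
```

The exponentiation guarantees positivity, while the right-skewed lognormal law produces the occasional far-above-the-95th-percentile excursions visible in Figure 18.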
One should also point out that if this model is assumed for the short rate or stochastic default intensity, there is no closed form bond pricing or survival probability formula, and all curve construction and valuation according to this model has to be done numerically through finite difference schemes, trinomial trees or Monte Carlo simulation.

The historical calibration and simulation of the Exponential Vasicek model can be done by calibrating and simulating its log-process y as described in the previous Section. To illustrate the calibration, the EMU Corporate A rated 5-7Y index described in Section 6 is used. The histogram of the index shows a clear fat tail pattern, and the log spread process is tested to be mean reverting (see again Section 6). This leads to think about using the Exponential Vasicek model, which has a lognormal distribution and an AR(1) disturbance in the log prices. The OLS regression of the log-spread time series on its first lags (Equation (49)) gives the coefficients c, b and δ; then the α, θ and σ parameters in Equation (62) are deduced using equations (55), (56) and (57):

c = 0.3625    α = 4.9701
b = 0.9054    θ = 3.8307
δ = 0.1894    σ = 1.4061

Figure 18 shows how the EMU Corporate A rated 5-7Y index time series fits into the simulation of 50,000 paths using the equation for the exponential of (49), starting with the first observation of the index (first week of 1998). The simulated path (in blue) illustrates that the Exponential Vasicek process can occasionally reach values that are much higher than its 95th-percentile level.

9 Mean Reversion: The CIR Model

Originally the model proposed by Cox, Ingersoll, and Ross (1985b) was introduced in the context of the general equilibrium model of the same authors, Cox, Ingersoll, and Ross (1985a). By choosing an appropriate market price of risk, the x_t process dynamics can be written with the same structure under both the objective measure and the risk neutral measure. Under the objective measure one can write:

dx_t = α(θ − x_t)dt + σ √x_t dW_t    (64)

with α, θ, σ > 0, and dW_t a standard Wiener process. To ensure that x_t is always strictly positive one needs to impose σ² ≤ 2αθ. By doing so the upward drift is sufficiently large with respect to the volatility to make the origin inaccessible to x_t.

The simulation of the CIR process can be done using a recursive discrete version of Equation (64) at discretisation times t_i:

x(t_i) = αθΔt + (1 − αΔt) x(t_{i−1}) + σ √( x(t_{i−1}) ) √Δt ε_{t_i}    (65)

with ε ∼ N(0, 1). If the time step is too large, however, there is risk that the process approaches zero in some scenarios, which could lead to numerical problems and complex numbers surfacing in the scheme. Alternative positivity preserving schemes are discussed, for example, in Brigo and Mercurio (2006). For simulation purposes it is more precise, but also more computationally expensive, to draw the simulation directly from the following exact conditional distribution of the CIR process:

f_CIR( x_{t_i} | x_{t_{i−1}} ) = c e^{−u−v} (v/u)^{q/2} I_q( 2√(uv) )    (66)

c = 2α / ( σ²(1 − e^{−αΔt}) )    (67)
u = c x_{t_{i−1}} e^{−αΔt}    (68)
v = c x_{t_i}    (69)
q = 2αθ/σ² − 1    (70)
Figure 19: EMU Corporate A rated 5-7Y index (in dark black) vs two simulated CIR paths (in blue and green). The dashed red lines describe the 95th percentile and the 5th percentile of the distribution. The dark red line is the distribution mean. The two remaining red lines describe one standard deviation above and below the mean.
with I_q the modified Bessel function of the first kind of order q, and Δt = t_i − t_{i−1}. The process x follows a noncentral chi-squared distribution: 2c x(t_i) | x(t_{i−1}) ∼ χ²(2q + 2, 2u). The long-term mean of the CIR process is θ and its long-term variance is θσ²/(2α).

Since this model is also a fat tail mean reverting one, it can be calibrated to the EMU Corporate A rated 5-7Y index described in Section 6. The calibration is done by Maximum Likelihood:

argmax_{α>0, θ>0, σ>0} log(L)    (71)

The likelihood can be decomposed, as for general Markov processes²³, according to Equation (9), as a product of transition likelihoods:

L = Πᵢ₌₁ⁿ f_CIR( x(t_i) | x(t_{i−1}); α, θ, σ )    (72)

To speed up the calibration one uses as initial guess for α the AR(1) coefficient of the Vasicek regression (Equation (49)) and then infers θ and σ using the first two moments of the process. Then:

α₀ = − log(b)/Δt    (73)
θ₀ = E(x_t)    (74)
σ₀ = √( 2α₀ Var(x_t) / θ₀ )    (75)

Code 13 MATLAB MLE calibration of the CIR process.

function ML_CIRparams = CIR_calibration(V_data, dt, params)
% ML_CIRparams = [alpha theta sigma]
N = length(V_data);
if nargin < 3
    x = [ones(N-1,1) V_data(1:N-1)];
    ols = (x'*x)^(-1)*(x'*V_data(2:N));
    m = mean(V_data); v = var(V_data);
    alpha0 = -log(ols(2))/dt;
    params = [alpha0, m, sqrt(2*alpha0*v/m)];
end
options = optimset('MaxFunEvals', 100000, 'MaxIter', 100000);
ML_CIRparams = fminsearch(@FT_CIR_LL_ExactFull, params, options);

    function mll = FT_CIR_LL_ExactFull(params)
        alpha = params(1); teta = params(2); sigma = params(3);
        c = (2*alpha)/((sigma^2)*(1-exp(-alpha*dt)));
        q = ((2*alpha*teta)/(sigma^2))-1;
        u = c*exp(-alpha*dt)*V_data(1:N-1);
        v = c*V_data(2:N);
        mll = -(N-1)*log(c) + sum(u + v - log(v./u)*q/2 ...
            - log(besseli(q, 2*sqrt(u.*v), 1)) - abs(real(2*sqrt(u.*v))));
    end
end

The scaled Bessel function besseli(q, z, 1) = exp(-abs(real(z)))*I_q(z) is used for numerical stability, and the scaling is added back in the last term of mll. The initial guess and fitted parameters for the EMU Corporate A rated 5-7Y index are the following:

α₀ = 0.8662    α = 1.2902
θ₀ = 49.330    θ = 51.7894
σ₀ = 4.3027    σ = 4.4966

Figure 19 shows how the EMU Corporate A rated 5-7Y index time series fits into the simulation of 40,000 paths using Equation (65), starting with the first observation of the index (first week of 1998).

²³ Equation (65) shows that x(t_i) depends only on x(t_{i−1}) and not on the previous values x(t_{i−2}), x(t_{i−3}), …, so CIR is Markov.
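The noncentral chi-squared representation of the transition law both simplifies exact simulation and gives a numerically stable likelihood. A Python sketch (illustrative names and parameter values) checking that the Bessel-function density (66)-(70) agrees with scipy's noncentral chi-squared pdf:

```python
import numpy as np
from scipy.special import iv
from scipy.stats import ncx2

alpha, theta, sigma, dt = 1.29, 51.79, 4.50, 1 / 52
x_prev = 48.0

c = 2 * alpha / (sigma**2 * (1 - np.exp(-alpha * dt)))
q = 2 * alpha * theta / sigma**2 - 1
u = c * x_prev * np.exp(-alpha * dt)

def f_cir(x):
    """Transition density (66) with v = c*x_t, via the Bessel function I_q."""
    v = c * x
    return c * np.exp(-u - v) * (v / u) ** (q / 2) * iv(q, 2 * np.sqrt(u * v))

x = np.linspace(40.0, 60.0, 5)
pdf_ncx2 = 2 * c * ncx2.pdf(2 * c * x, 2 * q + 2, 2 * u)   # 2c*x_t ~ chi2'(2q+2, 2u)
print(np.max(np.abs(f_cir(x) - pdf_ncx2)))
# exact one-step simulation: x_next = rng.noncentral_chisquare(2*q + 2, 2*u) / (2*c)
```

Sampling the exact law this way avoids the positivity problems of the Euler scheme (65) entirely, at the cost of one noncentral chi-squared draw per step.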
As observed before, the CIR model has less fat tails than the Exponential Vasicek. Overall, the exponential Vasicek process is easier to estimate and to simulate, and features fatter tails, so it is preferable to CIR. However, if used in conjunction with pricing applications, the tractability of CIR and the non-explosion of its associated bank account can make CIR preferable in some cases: when used as an instantaneous short rate or default intensity model under the pricing measure, it comes with closed form solutions for bonds and survival probabilities, and avoids the above-mentioned explosion problem of the lognormal processes bank account. Figure 20 compares the tail behaviour of the calibrated CIR process with the calibrated Vasicek and Exponential Vasicek processes.

Figure 20: The tail of the EMU Corporate A rated 5-7Y index histogram (in blue) compared to the fitted Vasicek process probability density (in black), to the fitted Exponential Vasicek process probability density (in green), and to the fitted CIR process probability density (in red).

10 Mean Reversion and Fat Tails together

In the previous sections, two possible features of financial time series were studied: fat tails and mean reversion. The first feature accounts for extreme movements in short time, whereas the second one accounts for a process that in long times approaches some reference mean value with a bounded variance. The question in this section is: can one combine the two features and have a process that in short time instants can feature extreme movements, beyond the Gaussian statistics, and in the long run features mean reversion? In this section one example of one such model is formulated: a mean reverting diffusion process with jumps. In other terms, rather than adding jumps to a GBM, one may decide to add jumps to a mean reverting process such as Vasicek. Consider:

    dx(t) = α(θ − x(t)) dt + σ dW(t) + dJ(t)    (76)
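As a minimal illustration of Equation (76) (ours, not the paper's code; the Euler discretisation, the compound-Poisson jump term and all names are assumptions), one can simulate the mean reverting jump diffusion as:

```python
import numpy as np

def simulate_vasicek_jumps(x0, alpha, theta, sigma, lam, mu_y, sig_y,
                           dt, n_steps, seed=0):
    """Euler scheme for dx = alpha*(theta - x) dt + sigma dW + dJ, where J is
    a compound Poisson process with intensity lam and N(mu_y, sig_y^2) jump sizes."""
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps + 1)
    x[0] = x0
    for i in range(n_steps):
        n_jumps = rng.poisson(lam * dt)                 # typically 0 or 1 for small dt
        dJ = rng.normal(mu_y, sig_y, n_jumps).sum()     # sum of the jump sizes
        x[i + 1] = (x[i] + alpha * (theta - x[i]) * dt
                    + sigma * rng.normal(0.0, np.sqrt(dt)) + dJ)
    return x
```

With nonzero mean jump sizes, long simulated paths settle around the level θ + λ E[Y]/α rather than θ, which is the point made in the text about the long-term mean.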
where the jumps J are defined as

    J(t) = Σ_{j=1}^{N(t)} Yj,    dJ(t) = Y_{N(t)} dN(t),

where N(t) is a Poisson process with intensity λ and the Yj are iid random variables, independent of N and W, modelling the size of the jth jump. For the distribution of the jump sizes, Gaussian jump sizes are considered below, but calculations are possible also with exponential or other jump sizes.

The way the previous equation is written is slightly misleading, since the mean of the shocks is nonzero. To have zero-mean shocks, so as to be able to read the long-term mean level from the equation, one needs to rewrite it as:

    dx(t) = α(θY − x(t)) dt + σ dW(t) + [dJ(t) − λ E[Y] dt]    (77)

where the actual long-term mean level is

    θY := θ + λ E[Y]/α,

which can be much larger than θ, depending on Y and λ.

Now, recalling Equation (9), the earlier equation for Vasicek, the process integrates into:

    x(t) = mx(xs, t − s) + σ e^{−αt} ∫_s^t e^{αz} dWz + e^{−αt} ∫_s^t e^{αz} dJz    (78)

    mx(xs, t − s) := θ(1 − e^{−α(t−s)}) + xs e^{−α(t−s)},    vx(t − s) := (σ²/(2α))(1 − e^{−2α(t−s)}),

where mx and vx are, respectively, the mean and variance of the Vasicek model without jumps. Notice that taking out the last term in (78) one still gets (48). Furthermore, the last term is independent of all others.

One knows that to write the likelihood of the Markov process it is enough to know the transition density between subsequent instants. Since a sum of independent random variables leads to convolution in the density space and to a product in the moment generating function space, the moment generating function of the transition density between any two instants is computed:

    Mts(u) = E_{x(s)}[exp(u x(t))]    (79)

that in the model is given by

    Mts(u) = exp( mx(xs, t − s) u + (vx(t − s)/2) u² + λ ∫_s^t [ MY(u e^{−α(t−z)}) − 1 ] dz )    (80)

where the first exponential is due to the diffusion part (Vasicek) and the second one comes from the jumps [24]. For the characteristic function one would obtain:

    φts(u) = exp( mx(xs, t − s) iu − (vx(t − s)/2) u² + λ ∫_s^t [ φY(u e^{−α(t−z)}) − 1 ] dz )    (81)

In this case, for a general jump size distribution, the transition density between any two instants s = ti−1 and t = ti can be computed from the above moment generating function through inverse Laplace transform [25].

[24] As stated previously, the characteristic function always exists, whereas the moment generating function may not exist in some cases; particularly with stable or infinitely divisible distributions, the characteristic function is a better notion than the moment generating function.
[25] Or better, from the above characteristic function through inverse Fourier transform.
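The Fourier route can be sketched numerically. The following Python fragment (ours; the grid sizes, the quadrature and names such as `phi_transition` are assumptions) evaluates the characteristic function (81) for Gaussian jump sizes, for which φY(u) = exp(iµY u − σY² u²/2), and inverts it via f(x) = (1/π) ∫₀^∞ Re[e^{−iux} φts(u)] du:

```python
import numpy as np
from scipy.integrate import trapezoid

def phi_jump_gauss(u, mu_y, sig_y):
    # Characteristic function of a Gaussian jump size Y ~ N(mu_y, sig_y^2)
    return np.exp(1j * mu_y * u - 0.5 * (sig_y * u) ** 2)

def phi_transition(u, xs, dt, alpha, theta, sigma, lam, mu_y, sig_y, n_z=200):
    """Characteristic function (81) of x(t) given x(s) = xs, with t - s = dt;
    the z-integral over [s, t] is evaluated by the trapezoidal rule."""
    m = theta * (1.0 - np.exp(-alpha * dt)) + xs * np.exp(-alpha * dt)
    v = sigma**2 / (2.0 * alpha) * (1.0 - np.exp(-2.0 * alpha * dt))
    z = np.linspace(0.0, dt, n_z)                    # integration variable (z - s)
    damp = np.exp(-alpha * (dt - z))                 # e^{-alpha (t - z)}
    uu = np.multiply.outer(u, damp)                  # shape (len(u), n_z)
    jump_part = lam * trapezoid(phi_jump_gauss(uu, mu_y, sig_y) - 1.0, z, axis=-1)
    return np.exp(1j * m * u - 0.5 * v * u**2 + jump_part)

def transition_density(x, xs, dt, params, u_max=400.0, n_u=4001):
    """Inverse Fourier transform: f(x) = (1/pi) Int_0^inf Re[e^{-iux} phi(u)] du."""
    u = np.linspace(0.0, u_max, n_u)
    phi = phi_transition(u, xs, dt, *params)
    integrand = np.real(np.exp(-1j * np.outer(x, u)) * phi)
    return trapezoid(integrand, u, axis=-1) / np.pi
```

With λ = 0 the jump term vanishes and the recovered density reduces to the Gaussian Vasicek transition N(mx, vx), which gives a convenient sanity check of the inversion grid.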
Figure 21: EMU Corporate A rated 5-7Y index charts. The top chart compares the innovations fit of the Vasicek process to the fit of the Vasicek process with normal jumps (in red). The bottom left figure is the QQ-plot of the Vasicek process innovations against the historical data innovations. The bottom right figure is the QQ-plot of the Vasicek process with normal jumps innovations against the historical data innovations.
There are however some special jump size distributions where one does not need to use transform methods. These include the Gaussian jump size case, which is covered in detail now.

Suppose then that the jump size Y is normally distributed, with mean µY and variance σY². For small ∆t, one knows that there will be possibly just one jump. More precisely, there will be one jump in (t − ∆t, t] with probability λ∆t, and no jumps with probability 1 − λ∆t. Then, with the approximation

    ∫_{t−∆t}^{t} e^{αz} dJz ≈ e^{α(t−∆t)} ∆J(t),    ∆J(t) := J(t) − J(t − ∆t),    (82)

one can write the transition density between s = t − ∆t and t as:

    f_{x(t)|x(t−∆t)}(x) = λ∆t fN(x; mx(∆t, x(t − ∆t)) + µY, vx(∆t) + σY²) + (1 − λ∆t) fN(x; mx(∆t, x(t − ∆t)), vx(∆t))    (83)

This is a mixture of Gaussians, weighted by the arrival intensity λ. It is known that mixtures of Gaussians feature fat tails, so this model is capable of reproducing more extreme movements than the basic Vasicek model.

It is important that the jump mean µY not be too close to zero, or one gets again almost symmetric Gaussian shocks that are already there in Brownian motion, and the model does not differ much from a pure diffusion Vasicek model. It is best to assume µY to be significantly positive or negative with a relatively small variance, so the jumps will be markedly positive or negative.

To check that the process is indeed mean reverting, through straightforward calculations one can prove that the exact solution of Equation (77) (or equivalently Equation (76)) satisfies:

    E[xt] = θY + (x0 − θY) e^{−αt}    (84)

    Var[xt] = (σ² + λ(µY² + σY²)) / (2α) · (1 − e^{−2αt})    (85)

This way one sees that the process is indeed mean reverting, since as time increases the mean tends to θY while the variance remains bounded, tending to (σ² + λ(µY² + σY²))/(2α). Notice that, in the Gaussian jump size case, either letting λ, or µY and σY, go to zero (no jumps) gives back the mean and variance of the basic Vasicek process without jumps.

If one aims at modelling both markedly positive and negative jumps, one may add a second jump process J−(t) defined as:

    dJ−(t) = Z_{M(t)} dM(t),

where now M is an arrival Poisson process with intensity λ−, counting the jumps, independent of J and W, and the Z are again iid normal random variables with mean µZ and variance σZ², independent of M, J, W. The overall process reads:

    dx(t) = α(θ − x(t)) dt + σ dW(t) + dJ(t) − dJ−(t),    or    (86)
    dx(t) = α(θJ − x(t)) dt + σ dW(t) + [dJ(t) − λµY dt] − [dJ−(t) − λ−µZ dt],

    θJ := θ + (λµY − λ−µZ)/α.

This model would feature as transition density, over small intervals [s, t]:

    f_{x(t)|x(t−∆t)}(x) = λ∆t fN(x; mx(∆t, x(t − ∆t)) + µY, vx(∆t) + σY²) + λ−∆t fN(x; mx(∆t, x(t − ∆t)) − µZ, vx(∆t) + σZ²) + (1 − (λ + λ−)∆t) fN(x; mx(∆t, x(t − ∆t)), vx(∆t))    (87)

This is now a mixture of three Gaussian distributions and allows in principle for more flexibility in the shapes the final distribution can assume.
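The Gaussian mixture transition (83) makes maximum likelihood estimation direct. A hedged Python sketch (ours, not the paper's MATLAB; the function names and the Nelder-Mead settings are assumptions) of the resulting mixture likelihood:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def vasicek_jump_nll(params, x, dt):
    """Negative log-likelihood from the small-dt two-component Gaussian
    mixture transition density (83): jump with probability lam*dt, else none."""
    alpha, theta, sigma, lam, mu_y, sig_y = params
    if min(alpha, sigma, lam, sig_y) <= 0.0 or lam * dt >= 1.0:
        return np.inf
    m = theta * (1.0 - np.exp(-alpha * dt)) + x[:-1] * np.exp(-alpha * dt)
    v = sigma**2 / (2.0 * alpha) * (1.0 - np.exp(-2.0 * alpha * dt))
    p = lam * dt
    dens = (p * norm.pdf(x[1:], m + mu_y, np.sqrt(v + sig_y**2))
            + (1.0 - p) * norm.pdf(x[1:], m, np.sqrt(v)))
    return -np.sum(np.log(np.maximum(dens, 1e-300)))

def fit_vasicek_jumps(x, dt, start):
    # Nelder-Mead maximisation of the mixture likelihood from a user guess
    res = minimize(vasicek_jump_nll, start, args=(x, dt),
                   method="Nelder-Mead", options={"maxfev": 20000})
    return res.x, res.fun
```

The floor `1e-300` simply guards the logarithm against underflow on steps that the current parameters make essentially impossible.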
Figure 22: The tail of the EMU Corporate A rated 5-7Y index histogram (in blue) compared to the fitted Vasicek process probability density (in green) and to the fitted Vasicek with jumps process probability density (in red).

From the examples in Figures 21 and 22 one sees the goodness of fit of Vasicek with jumps, and notes that the Vasicek model with positive-mean Gaussian size jumps can be an alternative to the exponential Vasicek. Indeed, Figure 20 shows that the tail of the exponential Vasicek almost exceeds the series, whereas, as one sees from Figure 22, Vasicek with jumps can be a weaker tail alternative to the exponential Vasicek. However, by playing with the jump parameters, or better by exponentiating Vasicek with jumps, one can easily obtain a process with tails that are fatter than those of the exponential Vasicek when needed. Furthermore, one may generalise the jump distribution beyond the Gaussian case and attain an even larger flexibility, using a more powerful numerical apparatus, especially inverse Fourier transform methods. More generally, one can build on the Vasicek model with jumps by squaring or exponentiating it: given Vasicek with positive and possibly negative jumps, by playing with the jump parameters and possibly squaring or exponentiating, one can obtain a large variety of tail behaviours for the simulated process, somehow similar to CIR.

Again, even with positive jumps the basic model still features positive probability for negative values. One can solve this problem by exponentiating the model: define a stochastic process y following the same dynamics as x in (86) or (76), but then define your basic process x as x(t) = exp(y(t)). Then historical estimation can be applied working on the log-series, by estimating y on the log-series rather than on the series itself. This may be sufficient for several applications. While it may not be necessary to do so in this example, in general the need may occur in different applications.
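A small sketch of the exponentiated variant (ours; all names and parameters are illustrative assumptions): simulate y with the jump diffusion dynamics and take x(t) = exp(y(t)), which is strictly positive, so estimation can proceed on log x:

```python
import numpy as np

def simulate_exp_vasicek_jumps(y0, alpha, theta, sigma, lam, mu_y, sig_y,
                               dt, n_steps, seed=0):
    """Euler scheme for the log-process y with Vasicek-with-jumps dynamics;
    the modelled quantity is x(t) = exp(y(t)), which stays strictly positive."""
    rng = np.random.default_rng(seed)
    y = np.empty(n_steps + 1)
    y[0] = y0
    for i in range(n_steps):
        jumps = rng.normal(mu_y, sig_y, rng.poisson(lam * dt)).sum()
        y[i + 1] = (y[i] + alpha * (theta - y[i]) * dt
                    + sigma * rng.normal(0.0, np.sqrt(dt)) + jumps)
    return np.exp(y)
```

Taking logs of the simulated (or observed) series recovers the mean reverting jump diffusion y, on which the mixture likelihood above can be estimated directly.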
Notation

General

E — Expectation
Var — Variance
SDE — Stochastic Differential Equation
P(A) — Probability of the event A
MX(u) — Moment generating function of the random variable X computed at u
φX(u) — Characteristic function of X computed at u
q — Quantiles
fX(x) — Density of the random variable X computed at x
FX(x) — Cumulative distribution function of the random variable X computed at x
fX|Y=y(x) or fX|Y(x, y) — Density of the random variable X computed at x, conditional on the random variable Y being y
N(m, V) — Normal (Gaussian) random variable with mean m and variance V
fN(x; m, V) — Normal density with mean m and variance V computed at x: = (2πV)^(−1/2) exp(−(x − m)²/(2V))
Φ — Standard Normal cumulative distribution function
fP(x; L) — Poisson probability mass at x = 0, 1, 2, . . . with parameter L: = exp(−L) L^x / x!
fΓ(x; κ, ζ) — Gamma probability density at x ≥ 0 with parameters κ and ζ: = x^(κ−1) exp(−x/ζ) / (ζ^κ Γ(κ))
∼ — Distributed as

Statistical Estimation

CI — Confidence Interval
ACF, PACF — Autocorrelation Function, Partial ACF
MLE — Maximum Likelihood Estimation
OLS — Ordinary Least Squares
Θ — Parameters for MLE
L(Θ)(x1, . . . , xn) — := f_{X1,...,Xn}(x1, . . . , xn; Θ): likelihood distribution function with parameter Θ in the assumed distribution, computed at x1, . . . , xn, for random variables X1, X2, . . . , Xn
L⋆(Θ)(x1, . . . , xn) — := log(L(Θ)(x1, . . . , xn)), the log-likelihood
Θ̂ — Estimator for Θ

Geometric Brownian Motion (GBM)

ABM — Arithmetic Brownian Motion
GBM — Geometric Brownian Motion
µ, µ̂ — Drift parameter and sample drift estimator of GBM
σ, σ̂ — Volatility and sample volatility
Wt or W(t) — Standard Wiener process

Fat Tails: GBM with Jumps, Garch Models

Jt, J(t) — Jump process at time t
Nt, N(t) — Poisson process counting the number of jumps, jumping by one whenever a new jump occurs, with intensity λ
Y1, Y2, . . . , Yi, . . . — Jump amplitudes random variables

Time, Maturities, Discretisation, Time Steps

∆t — Time step in discretization
s, t, u — Three time instants, 0 ≤ s ≤ t ≤ u
0 = t0, t1, . . . , ti, . . . — A sequence of equally spaced time instants, ti − ti−1 = ∆t
T — Maturity
n — Sample size

Variance Gamma Models

VG — Variance Gamma
µ̄ — Log-return drift in calendar time
g(t) ∼ Γ(t/ν, ν) — Transform of calendar time t into market activity time
θ̄ — Log-return drift in market-activity time
σ̄ — Market-activity-time volatility

Vasicek and CIR Mean Reverting Models

θ, α, σ — Long-term mean reversion level, speed, and volatility
References

Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31, 307–327.

Brigo, D., and F. Mercurio (2006). Interest Rate Models: Theory and Practice – With Smile, Inflation and Credit. New York, NY: Springer Verlag.

Carr, P., Chang, E., and D. Madan (1998). The variance gamma process and option pricing. European Finance Review 2, 79–105.

Cochrane, J. (2001). Asset Pricing. Princeton: Princeton University Press.

Cont, R., and P. Tankov (2004). Financial Modelling with Jump Processes. Chapman and Hall, Financial Mathematics Series.

Cox, J., Ingersoll, J., and S. Ross (1985a). An intertemporal general equilibrium model of asset prices. Econometrica 53, 363–384.

Cox, J., Ingersoll, J., and S. Ross (1985b). A theory of the term structure of interest rates. Econometrica 53, 385–408.

Dickey, D., and W. Fuller (1979). Distribution of the estimators for autoregressive time series with a unit root. Journal of the American Statistical Association 74, 427–431.

Engle, R., and V. Ng (1993). Measuring and testing the impact of news on volatility. Journal of Finance 48, 1749–1778.

Kwiatkowski, D., Phillips, P., Schmidt, P., and Y. Shin (1992). Testing the null hypothesis of stationarity against the alternative of a unit root. Journal of Econometrics 54, 159–178.

Madan, D., and E. Seneta (1990). The variance gamma (v.g.) model for share market returns. Journal of Business 63, 511–524.

Merton, R. (1976). Option pricing when underlying stock returns are discontinuous. Journal of Financial Economics 3, 125–144.

Moosbrucker, T. (2006). Pricing CDOs with correlated Variance Gamma distributions.

Phillips, P., and P. Perron (1988). Testing for a unit root in time series regression. Biometrika 75, 335–346.

Poterba, J., and L. Summers (1988). Mean reversion in stock prices: Evidence and implications. Journal of Financial Economics 22, 27–59.

Said, S., and D. Dickey (1984). Testing for unit roots in autoregressive moving average models of unknown order. Biometrika 71, 599–607.

Seneta, E. (2004). Fitting the variance gamma model to financial data. Journal of Applied Probability 41A, 177–187.

Vasicek, O. (1977). An equilibrium characterization of the term structure. Journal of Financial Economics 5, 177–188.