Stochastic Processes in Risk Management
Francesco Menoncin
Università degli Studi di Brescia – Department of Economics and Management. Via S. Faustino 74/B – 25122 Brescia
(Italy). E-mail: [Link]@[Link]
Contents
1 The R software
1.1 Introduction
2 Stochastic processes
2.1 Introduction
2.2 The random walk
2.3 The Central Limit Theorem
2.4 The Brown/Wiener process
2.5 Conditional expected values
2.6 Tower property of iterated expected values
2.7 Martingales
2.8 Stopping time
2.9 Law of the maximum
3 Stochastic calculus
3.1 Stochastic differential equations
3.2 Quadratic variation
3.3 Correlated Wiener processes
3.4 Itô’s calculus
3.5 Some properties of Itô processes
3.6 Simulation of stochastic process
7 Asset prices
7.1 Forward
7.2 Futures
7.3 Options
7.4 Replication and hedging
7.5 Real options
8 Credit risk
8.1 Default measures
8.2 Double stochastic default intensity and asset pricing
8.3 Zero-coupon bond
8.4 Default-coupon bond
8.5 Credit Default Swap (CDS)
9 Risk measures
9.1 Coherent risk measures
9.2 The variance as a risk measure
9.3 Representation theorem
9.4 Expected Shortfall
9.5 Expected Shortfall: Historical simulation
9.6 Spectral risk measures
9.7 Value at Risk (VaR)
Introduction
In his PhD dissertation, Bachelier [1900] was the first to model asset prices on the Paris stock exchange through Gaussian processes. In particular, he used the so-called Brownian motions (or Wiener processes) because they had proved very useful for describing many natural phenomena (like heat transfer).
Nowadays, finance relies heavily on Wiener processes (also called diffusion processes) for describing the dynamic behaviour of asset prices. More recently, and mainly because of the financial crisis which broke out in 2007/2008, so-called jump processes have also become relevant in finance: they describe the behaviour of a stochastic variable which may undergo a finite variation in an infinitesimal time interval (i.e. a so-called jump).
In these notes we will present the main theoretical properties of diffusion and jump processes together with
numerical applications written in R.
Chapter 1
The R software
1.1 Introduction
In this book we will use the free software R [Link] and its free interface RStudio https://[Link]/. A full introduction to the R programming language is beyond the scope of this book. Here, we just outline the main features of R. Other similar software like Matlab (or its free clones like Scilab – [Link] – or Octave – [Link]) mainly uses a vector/matrix approach to computations, while R also, and mainly, works with data frames and lists.
In this work we use the package Knitr for LaTeX ([Link]), which allows R commands to be executed directly in a LaTeX document without calling the R software externally.
In the following code we show how to create three sets of data (with command c) and show the first one.
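The listing for this step is not reproduced here. A minimal sketch follows; the values are inferred from the outputs printed in the rest of this section, so they are an assumption and not necessarily the original listing.

```r
# Three sets of data created with the command c
# (values inferred from the outputs shown in this section: an assumption)
A = c(11, 12, 14)
B = c(19, 20, 21)
C = c(10, 9, 7)
A   # show the first set
```

With these values, the mean of B and the determinant of rbind(A, B, C) agree with the results printed in this section.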
All the sets that have been created can be put together into a data frame through the following commands
where we also show the whole set and a subset of it. Finally, the mean of the second subset is computed.
X = data.frame(A, B, C)
X
## A B C
## 1 11 19 10
## 2 12 20 9
## 3 14 21 7
X$A
## [1] 11 12 14
mean(X$B)
## [1] 20
A matrix can be created by concatenation of the sets A, B and C (row by row through the command rbind –
or column by column through the command cbind). The new matrix (called M ) can be used as the argument of a
matrix command like determinant (det) or transposition (t).
M = rbind(A, B, C)
det(M)
## [1] -21
t(M)
## A B C
## [1,] 11 19 10
## [2,] 12 20 9
## [3,] 14 21 7
If a single number is appended to a matrix through the command rbind, a new row is created and all its elements
are equal to the given number.
rbind(M, 2.5)
## [,1] [,2] [,3]
## A 11.0 12.0 14.0
## B 19.0 20.0 21.0
## C 10.0 9.0 7.0
## 2.5 2.5 2.5
The elements of a matrix or of a set are identified by their coordinates inside brackets.
M[2, 1]
## B
## 19
A[3]
## [1] 14
M[1:2, 1]
## A B
## 11 19
A matrix can be created through the command array, whose inputs are a set of elements (created through the command c) and the dimensions of the matrix.
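The listing that defines the matrix D used in the product shown next is not reproduced here. One matrix that is consistent with the printed result of D %*% t(D) is sketched here; its values are inferred from that output and are therefore an assumption.

```r
# A 2 x 3 matrix, filled column by column (illustrative values: an assumption)
D = array(c(1, 2, 3, 4, 5, 6), dim = c(2, 3))
D
```

The elements are placed column by column, so the first row of D is (1, 3, 5) and the second row is (2, 4, 6).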
The product between two matrices can be computed by using a particular multiplying operator as follows.
D %*% t(D)
## [,1] [,2]
## [1,] 35 44
## [2,] 44 56
A sequence can be created through the command seq whose arguments are as follows:
seq(from = , to = , by = , length.out = , along.with = )
where by contains the constant difference between two adjacent elements of the sequence, length.out is the number of elements in the sequence, and along.with is the object whose length we want the sequence to replicate.
Here are some examples.
seq(0, 2, by = 0.5)
## [1] 0.0 0.5 1.0 1.5 2.0
seq(0, 1, length.out = 10)
## [1] 0.0000000 0.1111111 0.2222222 0.3333333 0.4444444 0.5555556 0.6666667
## [8] 0.7777778 0.8888889 1.0000000
seq(0, 1, along.with = A)
## [1] 0.0 0.5 1.0
Remark 1.1.1. Contrary to other software, the arguments of any command in R can be given in any order, provided they are named. If the default order of the arguments is preserved, it is not necessary to use the names of the arguments. If, instead, a different order is chosen, it is necessary to specify the names of the arguments whose position has been altered.
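As a small illustration of the remark, the two calls sketched here should produce the same sequence: the first relies on the default order of the arguments of seq, the second names them and changes their order. The variable names s1 and s2 are only illustrative.

```r
# Default order of the arguments: from, to, by
s1 = seq(0, 2, 0.5)
# Same call with named arguments in a different order
s2 = seq(by = 0.5, to = 2, from = 0)
identical(s1, s2)
```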
Chapter 2
Stochastic processes
2.1 Introduction
The basic building block of any risk analysis is the «stochastic process», defined as follows.
Definition 2.1.1 (Stochastic process). A stochastic process is a collection of stochastic variables indexed by
time.
$$X_0, X_1, X_2, \ldots$$
$$X_0 = 0.$$
$$X_t = \sum_{i=1}^{t} Y_i,$$
Figure 2.1.1: 250 observations of a stochastic process (daily prices of the S&P500 starting at 3-Jan-1950)
[Figure: price (vertical axis) against days (horizontal axis)]
Figure 2.2.1: Example of a random walk where, starting at zero (and going towards the abscissa arrow), a «walker» turned once to the right, three times to the left, three times to the right, twice to the left and once to the right
[Figure: direction of the walk (vertical axis) against steps (horizontal axis)]
Thus, the random walk moves around a constant mean (which is zero), but the variance around that mean is proportional to time. The longer the forecasting horizon, the higher the volatility (i.e. the less reliable the forecast).
The random walk has the following properties:
• E[Xt] = X0 = 0
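The constant mean and the variance proportional to time can be checked in two lines. The derivation below assumes that the steps Yi are independent and take the values +1 and −1 with probability 1/2 each; this assumption is spelled out here because the step distribution is only implicit in this passage.

```latex
\begin{aligned}
E[Y_i] &= \tfrac{1}{2}(+1) + \tfrac{1}{2}(-1) = 0,
& V[Y_i] &= E[Y_i^2] - E[Y_i]^2 = 1, \\
E[X_t] &= \sum_{i=1}^{t} E[Y_i] = 0,
& V[X_t] &= \sum_{i=1}^{t} V[Y_i] = t.
\end{aligned}
```

The variance of the sum is the sum of the variances only because the steps are independent.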
The R function that simulates the extraction of random numbers from a binomial distribution is
rbinom(n,size,prob)
where n is the number of experiments, size is the number of trials performed in each experiment, and prob is the probability of success in each trial. Thus, if we simulate 5 times (n) 2 trials from a binomial distribution whose probability of success is 0.5, the number of successes in each experiment can be 0, 1 or 2. Some possible outcomes are shown in the following commands:
rbinom(5, 2, 0.5)
## [1] 1 0 1 0 0
rbinom(5, 2, 0.5)
## [1] 1 1 2 2 1
rbinom(5, 2, 0.5)
## [1] 0 1 1 1 0
If we want to simulate many random walks at the same time, we can think of a matrix containing as many rows as the steps of the walk and as many columns as the number of walks we want to generate. At each step a single trial is performed (i.e. size = 1) and we know that the events «going to the left» and «going to the right» have the same probability (i.e. p = 0.5).
With a single trial, the outcomes of the binomial distribution are either 0 or 1 (i.e. the function creates a series of zeros and ones). To map this case onto our random walk, whose steps are either 1 or −1, we can take the random numbers generated by the function rbinom, multiply them by 2 and subtract 1 (so that all the zeros become −1 and all the ones remain the same).
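The transformation just described can be sketched on a short vector of draws. The seed and the number of draws are arbitrary choices made only for this illustration.

```r
set.seed(1)               # arbitrary seed, only for reproducibility (assumption)
b = rbinom(10, 1, 0.5)    # ten experiments of one trial each: zeros and ones
y = 2 * b - 1             # each 0 becomes -1, each 1 stays 1
y
```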
Thus, the first step is to create the matrix Y containing the desired number of rows and columns. Then the matrix X is created by taking as its first row a vector of zeros (in each simulation the value X0 must be zero) and as its remaining elements the cumulative sum of the elements of matrix Y, computed down the rows of each column. In R we can code a function that returns the matrix X as follows, where:
• The inputs of the function are the number of rows (r) and the number of columns (c) to be created.
• The output (returned by the function) is the matrix X containing the c simulated random walks for r periods.
• Inside the function, the command apply is used; this command applies a given function (its third argument) along a given dimension (the second argument) of a given array or matrix (the first argument). In our code, the function cumsum is applied along the second dimension of the matrix Y, i.e. separately to each column.
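RW = function(r, c) {
  # Y: r x c matrix of steps, each either +1 or -1
  Y = array(2 * rbinom(r * c, 1, 0.5) - 1, dim = c(r, c))
  # X: a first row of zeros (X0 = 0), then the cumulative sums of each column of Y
  X = rbind(array(0, dim = c(1, c)), apply(Y, 2, cumsum))
  return(X)
}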
The function can be used for creating 100 simulations of the same random walk for t ∈ {1, 2, ..., 500} as follows (the result is plotted in Figure 2.2.2):
X=RW(500,100)
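A few quick checks on the simulated matrix can be sketched as follows. The function RW is repeated only so that the snippet runs on its own, and the seed is an arbitrary choice.

```r
# RW as defined above, repeated so that the snippet is self-contained
RW = function(r, c) {
  Y = array(2 * rbinom(r * c, 1, 0.5) - 1, dim = c(r, c))
  X = rbind(array(0, dim = c(1, c)), apply(Y, 2, cumsum))
  return(X)
}
set.seed(1)               # arbitrary seed (assumption)
X = RW(500, 100)
dim(X)                    # 501 rows (X0 plus 500 steps) and 100 columns
all(X[1, ] == 0)          # every walk starts at zero
all(abs(diff(X)) == 1)    # every step is either +1 or -1
mean(X[501, ])            # sample mean at t = 500 (theoretical value: 0)
var(X[501, ])             # sample variance at t = 500 (theoretical value: 500)
```

With only 100 walks, the sample mean and variance in the last two lines can differ noticeably from their theoretical values.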
Theorem 2.3.1 (Lindeberg–Lévy Central Limit Theorem). Suppose Yi is a sequence of i.i.d. random variables with mean µ and (finite) variance σ²; then, as n approaches infinity,
$$\sqrt{n}\left(\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) - \mu\right) \xrightarrow{d} N\left(0, \sigma^2\right),$$
where N(•, •) is the normal distribution whose arguments are the mean and the variance, respectively.
Remark. The CLT allows us to conclude that the rate of convergence of the empirical mean to the theoretical one is $\sqrt{n}$. This means that, in order to improve the approximation of an expected value computed by simulation by a factor of 10, the number of simulations must increase by a factor of 100.
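The remark can be illustrated with a small experiment, sketched here under the assumption that the simulated variable is the ±1 step of the random walk (so µ = 0 and σ = 1). The helper name err, the seed and the number of replications are illustrative choices.

```r
set.seed(1)                          # arbitrary seed (assumption)
# Standard deviation, across reps replications, of the mean of n simulated steps
err = function(n, reps = 1000) {
  m = replicate(reps, mean(2 * rbinom(n, 1, 0.5) - 1))
  sd(m)
}
e1 = err(100)      # should be close to 1 / sqrt(100)
e2 = err(10000)    # should be close to 1 / sqrt(10000)
e1 / e2            # should be close to 10
```

Multiplying the number of simulations by 100 should therefore reduce the typical error of the empirical mean by a factor of about 10.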
Figure 2.2.2: 100 simulations of the same random walk Xt for t ∈ {1, 2, ..., 500}
X = RW(500, 100)
matplot(X, type = "l", ylab = "", col = "lightgray")
grid()
Figure 2.2.3: 100 simulations of the same random walk Xt for t ∈ {1, 2, ..., 500} (in light grey), the mean and the standard deviation of the simulated random walks (in bold) and the theoretical mean (zero) and standard deviation ($\sqrt{t}$) of the random walks (dotted)
In the simple case of the random walk seen in the previous section, we can write
$$\frac{1}{\sqrt{t}} X_t \xrightarrow{d} N(0, 1).$$
rnorm(n,mean=0,sd=1)
Now, we can compare the random walk generated by the binomial distribution with the same random walk generated by the normal distribution by plotting them in two graphs through the following commands, where we use:
• The function par(mfrow=c(r,c)) for splitting the plotting area into r × c smaller graphs. The graphs are then drawn from the top-left to the bottom-right, row by row.
• The function par(mar=c(_,_,_,_)) for setting the margins around each graph. The first value is the lower margin, the second is the left margin, the third is the upper margin, and the fourth is the right margin.
The result is shown in Figure 2.3.1 where we actually see that the two simulations are very close. The powerful
result of the CLT is that we can simulate the random walk just through normally distributed variables.
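The commands for this comparison are not reproduced here. A minimal sketch of how the two sets of walks could be generated and plotted is given below; the variable names, the seed and the margin values are assumptions.

```r
set.seed(1)                            # arbitrary seed (assumption)
r = 500; n = 100                       # steps and number of walks
# Random walks with binomial steps (+1 or -1)
Yb = array(2 * rbinom(r * n, 1, 0.5) - 1, dim = c(r, n))
Xb = rbind(0, apply(Yb, 2, cumsum))
# Random walks with standard normal steps (same mean and variance as above)
Yn = array(rnorm(r * n, 0, 1), dim = c(r, n))
Xn = rbind(0, apply(Yn, 2, cumsum))
par(mfrow = c(2, 1))                   # two graphs, one above the other
par(mar = c(2, 2, 1, 1))               # lower, left, upper and right margins
matplot(Xb, type = "l", ylab = "", col = "lightgray")
matplot(Xn, type = "l", ylab = "", col = "lightgray")
```

The normal steps have mean 0 and standard deviation 1 so that they match the first two moments of the ±1 steps.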
where the process dWt is called Brownian motion or Wiener process because of the two main researchers who
studied it:
• the Scottish botanist Robert Brown (1773–1858) who discovered that some pollen particles suspended in a
liquid follow such a process;
• the US mathematician and statistician Norbert Wiener (1894–1964) who formally described the random walk
followed by particles suspended in a liquid.
Empirically, the switch between periods and sub-periods is justified by the frequency of the data. In economics and finance all the variables are expressed by using the year as the unit of time. For instance, when we read that the interest rate for investments having a 3-year maturity is 2%, this means that each year a capital of 100 Euros generates an interest of 2 Euros. Nevertheless, the frequency of financial data is much higher than one year. So, for example, asset prices and interest rates are usually available day by day (or even at an intra-day level). Accordingly, if the time unit of measure is the year and daily data are available, we can take dt = 1/250, where 250 is the number of working days in one year; in this way the year is «the period», while the day is the sub-period.
The previous function RW can of course be rewritten for simulating a random walk by suitably taking into account the sub-periods. This time we must add dt among the inputs of the function, while the first input r (the number of periods to be simulated) is now expressed in years. Thus, if r = 2 and dt = 1/250, then 2 years of 250 days each are simulated, which means that 500 steps are generated.
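The rewritten function is not reproduced here. One possible version is sketched below, assuming that each step is a normal variable with mean zero and variance dt, so that the variance accumulated over one year is 1. The name RWdt is hypothetical.

```r
# Hypothetical name RWdt: r is the horizon in years, c the number of walks,
# dt the length of a sub-period (e.g. 1/250 for daily steps)
RWdt = function(r, c, dt) {
  n = round(r / dt)                                      # number of steps
  dW = array(rnorm(n * c, 0, sqrt(dt)), dim = c(n, c))   # steps with variance dt (assumption)
  X = rbind(array(0, dim = c(1, c)), apply(dW, 2, cumsum))
  return(X)
}
set.seed(1)                    # arbitrary seed (assumption)
X = RWdt(2, 100, 1/250)
dim(X)                         # 501 rows: the starting value plus 500 daily steps
```

The number of steps is rounded because r / dt is computed in floating point and may not be an exact integer.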
Figure 2.3.1: Comparison between two sets of 100 simulations of random walks generated by either the binomial
variable (top) or the normal variable (bottom)
Figure 2.5.1: Information set at time t and at time t + 1 for a risky asset St and a risk-less asset Gt
$$E\left[X_{t+1} \mid \mathcal{F}_t\right] \equiv E_t\left[X_{t+1}\right],$$
and during this presentation we will use the notation on the right hand side, where the subscript on the expected
value stands for the information set that is available when the expected value is computed.
We assume that information is never lost as time goes on or, more formally,
$$\mathcal{F}_t \subseteq \mathcal{F}_{t+1},$$
i.e. everything which is known at time t is also known, for sure, at time t + 1, when new information may arise.
If the value of Xt is contained in the information set Ft, i.e., more formally,
$$X_t = E_t\left[X_t\right],$$
we say that Xt is Ft−measurable, or non-anticipated. This last term means that the value Xt is not known before t and becomes available in the information set only at time t.
In finance many variables are non-anticipated or Ft −measurable. For instance, the price of an asset is revealed
instant by instant on the stock exchange.
There exists a particular case of a variable which is known both in t and in t + 1 or, in other terms, which is
«anticipated». This is the case of a risk-less asset.
Let us call St the price, at time t, of a risky asset (e.g. a stock) whose price, one period ahead, is St+1. In the same way, we can define the price at time t of a risk-less asset as Gt (called «G» because only a Government is assumed to be able to issue a risk-less asset) whose price, at time t + 1, is Gt+1. The difference between St and Gt lies precisely in their measurability. In particular, we can write
$$E_t[S_t] = S_t,$$
$$E_t[G_t] = G_t,$$
$$E_t[G_{t+1}] = G_{t+1},$$
since the value of the risk-less asset at time t + 1 is already known at time t (if this were not the case, Gt would not be a risk-less asset). In another way, we can write
$$\mathcal{F}_t = \{S_t, G_t, G_{t+1}\}.$$
If we take into account three periods, with prices in t, in t + 1 and in t + 2, the situation can be represented as
in Figure 2.5.1.
Finally, the same property implies that the return (in the period [t, t + 1]) on a risk-less asset
$$\frac{G_{t+1} - G_t}{G_t} \equiv r,$$
is known at time t, while the return on a risky asset (in the same period)
$$\frac{S_{t+1} - S_t}{S_t} \equiv \mu,$$
is a stochastic variable (i.e. it is not known at time t).
It is important to stress that the risk-less asset is also a stochastic variable: at time t we know both Gt and Gt+1, but we do not know Gt+2. Thus, the only difference with respect to a risky asset is that Gt is predictable (but just one period ahead).
$$E_t[X_T] = \sum_{i=1}^{k} X_{T,i}\, p_i,$$
where k is the number of the states of the world, pi is the probability of each state, and XT,i is the value of the stochastic process X at time T and in the state i. In continuous time, instead, the expected value is computed as follows:
$$E_t[X_T] = \int_{\Omega} X_T(\omega)\, dP_{T|t}(\omega),$$
where $P_{T|t}(\omega)$ is the probability (i.e. the cumulative distribution function – CDF) of the event XT, given the information set at time t, and when the event ω happens. Here, Ω is the set of all the possible values of ω.
The information set F and the probability P are two fundamental elements of a so-called «probability space». The third element of this space is the set of all the possible outcomes (the so-called «sample space»). Until now this set has not been specified because we have assumed that it coincides with F (i.e. the set of all the past values of the stochastic process Xt). In a more general framework, the set of all the possible outcomes could even contain non-numeric values (e.g. the outcomes of tossing a coin). If we call Ω the sample space, then the full probability space is defined as the following triplet
$$(\Omega, \mathcal{F}_t, P_t).$$
With this toolbox, we are now ready to compute the expected value of an expected value, under two different
information sets. In particular, we want to show what follows.
Proposition 2.6.1 (Iterated expectation – Tower property). For any t0 ≤ s ≤ T, the following property holds:
$$E_{t_0}[X_T] = E_{t_0}\left[E_s[X_T]\right].$$
but since all the stochastic variables in the integral are Ft0−measurable, they are also known at time s ≥ t0. Thus, we can remove the external expected value and the final result is found
$$E_{t_0}\left[E_s[X_T]\right] = \int_{\Omega} X_T(u)\, dP_{T|t_0}(u) = E_{t_0}[X_T].$$
This property will be mainly useful when credit or mortality risk must be modelled.
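The tower property can be checked by exact enumeration on a small example. The sketch below assumes a two-step walk with an up-move probability of 0.6 and the payoff X2 = (Y1 + Y2)²; both choices are purely illustrative.

```r
p = 0.6                                            # assumed probability of a +1 step
paths = expand.grid(y1 = c(-1, 1), y2 = c(-1, 1))  # the four possible paths
prob = ifelse(paths$y1 == 1, p, 1 - p) * ifelse(paths$y2 == 1, p, 1 - p)
x2 = (paths$y1 + paths$y2)^2                       # value of X at T = 2 on each path
direct = sum(x2 * prob)                            # E_0[X_2]
# Inner expectation E_1[X_2], one value for each outcome of the first step
inner = sapply(c(-1, 1), function(y) {
  sel = paths$y1 == y
  sum(x2[sel] * prob[sel]) / sum(prob[sel])
})
iterated = sum(inner * c(1 - p, p))                # E_0[E_1[X_2]]
c(direct, iterated)
```

The two numbers coincide, as stated in Proposition 2.6.1 with t0 = 0, s = 1 and T = 2.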
2.7 Martingales
A fundamental property of some stochastic processes that is most often used in finance is the so-called martingale.
A stochastic process is a martingale when the best predictor at time t of its future value is the value at time t itself.
More formally we can write what follows.
$$X_t = E_t\left[X_{t+1} \mid \mathcal{F}_t\right],$$
A random walk is a martingale and this can be demonstrated as follows, with T > t:
$$\begin{aligned}
E_t[X_T] &= E_t\left[\sum_{i=1}^{T} Y_i\right] = E_t\left[\sum_{i=1}^{t} Y_i + \sum_{i=t+1}^{T} Y_i\right] \\
&= E_t\left[X_t + \sum_{i=t+1}^{T} Y_i\right] = X_t + E_t\left[\sum_{i=t+1}^{T} Y_i\right] \\
&= X_t,
\end{aligned}$$
where we have used the property that Xt is Ft−measurable (since Ft does contain Xt). The following result applies.
$$X_T = \prod_{i=1}^{T} Y_i,$$
$$X_0 = 1,$$
and we obtain
$$\begin{aligned}
E_t[X_T] &= E_t\left[\prod_{i=1}^{T} Y_i\right] = E_t\left[\prod_{i=1}^{t} Y_i \prod_{i=t+1}^{T} Y_i\right] = E_t\left[X_t \prod_{i=t+1}^{T} Y_i\right] \\
&= X_t E_t\left[\prod_{i=t+1}^{T} Y_i\right] = X_t \prod_{i=t+1}^{T} E_t[Y_i] = X_t \prod_{i=t+1}^{T} 1 = X_t,
\end{aligned}$$
and this is true also for t = 0. Thus, XT as defined above is a martingale. In these passages we have used: (i) the fact that Xt is Ft−measurable and (ii) the fact that the Yi are i.i.d.
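The martingale property of the product can be verified by exact enumeration. The sketch assumes multiplicative steps equal to 0.5 or 1.5 with probability 1/2 each, so that each step has expected value 1; these values are illustrative and do not come from the text.

```r
# Assumed steps: 0.5 or 1.5 with probability 1/2 each (expected value 1)
vals = c(0.5, 1.5)
paths = expand.grid(y1 = vals, y2 = vals, y3 = vals)  # the 8 equally likely paths
X3 = paths$y1 * paths$y2 * paths$y3                   # X_3 on each path (X_0 = 1)
mean(X3)                                              # E_0[X_3], to compare with X_0 = 1
# E_1[X_3] for each possible value of X_1 = y1
E1 = tapply(X3, paths$y1, mean)
E1
```

The conditional means in E1 coincide with the corresponding values of X1, which is the martingale property evaluated at t = 1 and T = 3.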
The Wiener process is a martingale. In fact, knowing that, for any s ≥ t,
$$E_t[dW_s] = 0,$$
we can integrate between t and T:
$$E_t\left[\int_t^T dW_s\right] = 0.$$
The integral of the differential of a function coincides with the function evaluated at the upper bound minus the function evaluated at the lower bound:
$$E_t[W_T - W_t] = 0,$$
and since Wt is Ft−measurable, we can conclude
$$E_t[W_T] = W_t,$$
Definition 2.8.1 (Stopping Time). Given a stochastic process Xt, a non-negative integer random variable τ is called a «stopping time» if, for all integers k ≥ 0, the event τ ≤ k depends only on {X0, X1, ..., Xk}.
In finance there are many examples of stopping times. For instance, the time of default of a firm is a stopping time. One of the first models used for describing the event of default is the so-called threshold model: if Xt is the value of the firm (i.e. the value of its assets reduced by the value of its liabilities), then the default (τ) is the first time when Xt = 0. Of course τ is a stopping time.
A martingale is still a martingale even if it is evaluated at a stopping time.
Theorem 2.8.1 (Optional Stopping Time). Given a martingale Xt for t ∈ [0, T], and a stopping time τ ≤ T, then Et[Xτ] = Xt.
One of the problems that can be solved by using the optional stopping time theorem is the so-called Gambler’s ruin problem. Let us take again the coin toss game whose total gain Xt is a martingale. We define τ as the first time when Xτ reaches either A Euros or −B Euros. What is the probability of reaching the threshold A first (rather than −B)?
From Theorem 2.8.1, we know that
$$E_0[X_\tau] = X_0 = 0.$$
If we call p the probability to reach first the amount A (and, thus, 1 − p is the probability to reach first the amount −B), the previous equation can be written as
$$pA + (1 - p)(-B) = 0,$$
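Solving this equation for p gives p = B / (A + B). A minimal simulation sketch of the coin toss game is shown below; the thresholds A = 3 and B = 2, the seed, the number of games and the function name hitsA are illustrative assumptions.

```r
# Play the coin toss game until the gain hits A or -B; TRUE if A is hit first
hitsA = function(A, B) {
  x = 0
  while (x < A && x > -B) x = x + sample(c(-1, 1), 1)
  x == A
}
set.seed(1)                              # arbitrary seed (assumption)
A = 3; B = 2                             # illustrative thresholds (assumption)
p_sim = mean(replicate(10000, hitsA(A, B)))
p_theory = B / (A + B)                   # solution of p*A + (1 - p)*(-B) = 0
c(p_sim, p_theory)
```

The simulated frequency should be close to, but in general not identical to, the theoretical probability.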
The probability that the maximum is higher than a constant threshold can be expressed as a function of the
probability that the Brownian motion itself is higher than the same threshold. The result is as follows.
Proof. In order to prove the theorem, we define a stopping time τa as the first time Wt reaches the threshold a
or, in formal terms,
$$\tau_a = \min\{t : W_t = a\},$$
then, because of the symmetry of Wt, we can write
or, in other terms, the probability that, after reaching the threshold, the Wiener process is either higher or lower
than the threshold itself is the same.
Now, the probability that the maximum at time t is higher than the threshold coincides with the probability that the time t is higher than τa:
$$P\{M_t > a\} = P\{t > \tau_a\}.$$
Any probability can be written as the sum of two conditional probabilities (each weighted by the probability of its condition) if the conditions are mutually exclusive and collectively exhaustive, as follows
where the last two passages are true because Wτa = a (since the Wiener process is continuous) and if Wt > a
we know that τa happened for sure before t (thus the probability of the conditional event is the same as the
probability of the unconditional event).
We know that Wt is distributed normally with mean zero and variance t. Thus, the probability in Theorem 2.9.1 can be computed as follows
$$P\{W_t > a\} = P\left\{\frac{W_t}{\sqrt{t}} > \frac{a}{\sqrt{t}}\right\} = P\left\{-\frac{W_t}{\sqrt{t}} < -\frac{a}{\sqrt{t}}\right\} = \Phi\left(-\frac{a}{\sqrt{t}}\right),$$
where Φ(•) is the CDF of the standard normal variable.
We can write a function in R which computes the probability P{Mt > a} both in closed form and by simulation. The inputs of the function are: the threshold a, the time horizon t, the time interval dt for simulating the Wiener differential dWt, and the number of simulations to be performed (N). The outputs are: the theoretical probability (pt) and the empirical probability (p). The function can be written as follows, where we use
• the function [Link](condition), which gives 1 if the «condition» written in the argument is true, and 0 otherwise.
In the function we have simulated the Wiener process Wt by summing the differentials dWt; in fact (with W0 = 0):
$$W_t = \int_0^t dW_s.$$
The empirical probability is obtained by computing the mean of a function whose value is 1 if the maximum of
Wt is greater than a and 0 otherwise.
Finally, the theoretical probability is computed by using the function
pnorm(x,mean,sd)
which gives the cumulative distribution function of a Gaussian variable in x (i.e. the probability that a Gaussian
variable takes values lower than x).
The result is shown in the following commands.
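Neither the function nor its output is reproduced here. The sketch below follows the description above and assumes that Theorem 2.9.1 is the standard reflection-principle result P{Mt > a} = 2 P{Wt > a}; the function name maxprob, the use of as.numeric for the indicator, the seed and the input values are all assumptions.

```r
# Hypothetical function name; inputs and outputs follow the description above
maxprob = function(a, t, dt, N) {
  n = round(t / dt)                                      # number of time steps
  dW = array(rnorm(n * N, 0, sqrt(dt)), dim = c(n, N))   # Wiener differentials
  W = apply(dW, 2, cumsum)                               # N simulated paths of W
  M = apply(W, 2, max)                                   # maximum of each path
  p = mean(as.numeric(M > a))                            # empirical probability
  pt = 2 * pnorm(-a / sqrt(t), 0, 1)                     # 2 * P{W_t > a} (assumed closed form)
  return(c(pt = pt, p = p))
}
set.seed(1)                                              # arbitrary seed (assumption)
res = maxprob(a = 1, t = 1, dt = 0.001, N = 2000)
res
```

The empirical value tends to lie slightly below the theoretical one, because the simulated maximum is observed only on a discrete time grid.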
The rate of convergence to the exact value is quite low (since it is proportional to the square root of the number of simulations N), and a very large number of experiments is needed in order to obtain a better approximation.
Chapter 3
Stochastic calculus
Remark 3.1.1. There are technical (Lipschitz) conditions on the functions f and g in (3.1.1) which guarantee the existence of a unique solution to this differential equation. For more details the reader is referred, for instance, to Karatzas and Shreve [1991] and Øksendal [2000]. Here, we note that these conditions are sufficient, but not necessary.
In ordinary calculus, when g = 0, the function h(t, Xt) coincides with the derivative of Xt with respect to time. In fact, in this case, we can write
$$\frac{dX_t}{dt} = h(t, X_t).$$
In order to apply the same idea to a differential equation with a non-zero diffusion, the Wiener process Wt should be differentiable (i.e. the ratio $dW_t/dt$ should exist). Unfortunately, this is not the case, as we show in the following section and, accordingly, a new version of differential calculus will be needed.
$$\sum_{i=0}^{n-1} \left(f(t_{i+1}) - f(t_i)\right)^2 = \sum_{i=0}^{n-1} \left(f'(s_i)(t_{i+1} - t_i)\right)^2.$$
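Now, if we substitute all the n terms f′(si) with their maximum, we obtain
$$\sum_{i=0}^{n-1} \left(f(t_{i+1}) - f(t_i)\right)^2 \le \left(\max_{0 \le s \le T} f'(s)\right)^2 \sum_{i=0}^{n-1} (t_{i+1} - t_i)^2,$$
and, when all the n intervals have the same length T/n,
$$\sum_{i=0}^{n-1} \left(f(t_{i+1}) - f(t_i)\right)^2 \le \left(\max_{0 \le s \le T} f'(s)\right)^2 \sum_{i=0}^{n-1} \left(\frac{T}{n}\right)^2 = \left(\max_{0 \le s \le T} f'(s)\right)^2 \frac{T^2}{n}.$$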
If we take the limit for n → ∞, the quadratic variation goes to zero. So, for any differentiable function, the
quadratic variation is zero. In continuous time we can write
$$\int_0^T \left(df(s)\right)^2 = 0.$$
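The vanishing quadratic variation of a differentiable function can be checked numerically. The sketch uses f(t) = sin(t) on [0, T] with T = 10; both are illustrative choices, and the helper name qv is hypothetical. Since the derivative of sin is bounded by 1 in absolute value, the bound derived above becomes T²/n.

```r
T = 10                                   # time horizon (illustrative value)
ns = c(10, 100, 1000, 10000)             # increasing numbers of intervals
# Quadratic variation of f(t) = sin(t) over n intervals of length T/n
qv = function(n) {
  t = seq(0, T, length.out = n + 1)
  sum(diff(sin(t))^2)
}
v = sapply(ns, qv)
v                                        # decreases towards zero as n grows
T^2 / ns                                 # upper bound (max f')^2 * T^2 / n
```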
Now, we can check what happens to the quadratic variation of a Brownian motion. Since we know that
$$W_{t_{i+1}} - W_{t_i} \sim N\left(0, \frac{T}{n}\right),$$
we can define
$$Z_i \equiv W_{t_{i+1}} - W_{t_i},$$
and write
$$\sum_{i=0}^{n-1} Z_i^2 = n\left(\frac{1}{n}\sum_{i=0}^{n-1} Z_i^2\right) \to n V[Z_i] = n\frac{T}{n} = T,$$
where the convergence (→) holds because of the strong version of the Law of Large Numbers and because all the
variances of Zi are the same. Thus, we can conclude what follows.
$$\lim_{n \to \infty} \sum_{i=0}^{n-1} \left(W_{t_{i+1}} - W_{t_i}\right)^2 = T.$$
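Remark 3.2.1. Note that, given (2.4.1), the square of dWt can be written as
$$(dW_t)^2 = \begin{cases} dt, & \text{with probability } \tfrac{1}{2}, \\ dt, & \text{with probability } \tfrac{1}{2}, \end{cases}$$
which is a degenerate stochastic variable such that $(dW_t)^2 = dt$.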
We can check this result in R through the following commands, where we assume T = 10 and a sufficiently small dt ($10^{-6}$), and the Wiener differentials dWt are generated. The sum of their squared values approaches T.
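T = 10
dt = 10^(-6)
for (i in 1:5) {
  # T/dt independent Wiener differentials, each with mean 0 and variance dt
  dW = rnorm(T/dt, 0, sqrt(dt))
  # the sum of the squared differentials should be close to T
  print(sum(dW^2))
}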
## [1] 9.99569
## [1] 10.00288
## [1] 9.995888
## [1] 9.996089
## [1] 10.00076
Because of this quadratic variation property (the quadratic variation does not vanish), the value of the stochastic integral depends on the point of each interval [ti, ti+1] at which the integrand is evaluated (which is not the case for the Riemann integral). Itô’s integral is obtained when the leftmost point of each interval is taken. When the midpoint is taken, another calculus is obtained: the Stratonovich calculus. In the Stratonovich version of calculus, the usual rules of integration and differentiation hold but, unfortunately, it is not suitable for finance. In fact, evaluating the integrand beyond the left end of each time interval amounts to assuming knowledge of the future, which is not realistic in finance. Thus, in order to develop a consistent financial framework, we have to rely on Itô’s calculus (where the quadratic variation of a Wiener process does not vanish).
In what follows we will use the properties of dWt that we have just demonstrated and that are summarised in Table 3.2.1, where the property dt × dWt = 0 directly comes from the quadratic variation result: dt × dWt = dt × $\sqrt{dt}$ = $(dt)^{3/2}$; if we let dt → 0, then $(dt)^{3/2}$ converges to zero faster than dt and can be set to 0.
where Wt is a vector of independent Wiener processes and C is a square matrix of coefficients that must be determined. Since Wt is normally distributed, we know that a linear transformation of it is still normally distributed. In particular, CWt and Ŵt are equivalent if they have the same mean and the same variance. The means already coincide since both are zero:
$$E_t\left[\hat{W}_t\right] = E_t[C W_t] = C E_t[W_t] = 0,$$
and now we have to find C such that also the variance is the same:
$$V_t\left[\hat{W}_t\right] = V_t[C W_t] = C V_t[W_t] C' = C C' t,$$
where C′ is the transpose of C. Thus, we can conclude that Equation (3.3.1) has a solution if there exists a matrix C such that
$$C C' = R.$$
Cholesky demonstrated that if R is a symmetric positive semi-definite matrix (which is our case here), then there exists a lower-triangular matrix C which satisfies the previous equality. We recall that a lower-triangular matrix is defined as follows
$$C = \begin{bmatrix} c_{1,1} & 0 & \ldots & 0 \\ c_{2,1} & c_{2,2} & \ldots & 0 \\ \ldots & \ldots & \ldots & \ldots \\ c_{k,1} & c_{k,2} & \ldots & c_{k,k} \end{bmatrix}.$$
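As an exercise, we show the application of this result for two correlated Wiener processes:
$$\hat{W}_t = \begin{bmatrix} \hat{W}_{1,t} \\ \hat{W}_{2,t} \end{bmatrix}, \qquad V_t\left[\hat{W}_t\right] = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix} t.$$
There exist four solutions to this system (of three equations in three unknowns) and one of them is
$$C = \begin{bmatrix} 1 & 0 \\ \rho & \sqrt{1 - \rho^2} \end{bmatrix}.$$
Thus, we can conclude that the initial vector of correlated Wiener processes Ŵt can be written by using independent Wiener processes as follows
$$\hat{W}_t = \begin{bmatrix} \hat{W}_{1,t} \\ \hat{W}_{2,t} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ \rho & \sqrt{1 - \rho^2} \end{bmatrix} \begin{bmatrix} W_{1,t} \\ W_{2,t} \end{bmatrix} = \begin{bmatrix} W_{1,t} \\ \rho W_{1,t} + \sqrt{1 - \rho^2}\, W_{2,t} \end{bmatrix}.$$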
Remark 3.3.1. When a financial model is written in terms of correlated Wiener processes, it can always be traced back to a fully equivalent model with independent Wiener processes through the Cholesky decomposition of the variance-covariance matrix.
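The decomposition can be reproduced in R with the built-in function chol. The sketch below assumes an illustrative correlation of 0.5 and daily increments (dt = 1/250); the variable names and the seed are also assumptions. Note that chol returns the upper-triangular factor, so it is transposed to obtain the lower-triangular matrix C.

```r
rho = 0.5                                  # illustrative correlation (assumption)
R = matrix(c(1, rho, rho, 1), 2, 2)
C = t(chol(R))                             # chol() gives the upper factor: transpose it
C                                          # rows (1, 0) and (rho, sqrt(1 - rho^2))
C %*% t(C)                                 # gives back R
set.seed(1)                                # arbitrary seed (assumption)
dt = 1/250; n = 100000
dW = matrix(rnorm(2 * n, 0, sqrt(dt)), nrow = 2)   # independent Wiener increments
dW_hat = C %*% dW                                   # correlated Wiener increments
cor(dW_hat[1, ], dW_hat[2, ])                       # should be close to rho
```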
As we have already demonstrated in the previous sections, the quadratic variation of any differentiable function is zero (and, a fortiori, the same is true for any power of the variation higher than two). This implies that, for any differentiable function f(t), all the terms $(dt)^i = 0$ for any i ≥ 2 and, so
$$df(t) = \frac{\partial f(t)}{\partial t}\, dt.$$
Nevertheless, we have also demonstrated that the quadratic variation of a Wiener process does not vanish (while the variations of higher powers, of course, do vanish).
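Thus, if we define a stochastic process as the solution to the following stochastic differential equation
$$dX_t = h(t, X_t)\, dt + g(t, X_t)\, dW_t,$$
we can use the Taylor expansion for computing the dynamics of a function Y(t, Xt) as follows:
$$dY = \frac{\partial Y}{\partial t}\, dt + \frac{\partial Y}{\partial X_t}\, dX_t + \frac{1}{2} \frac{\partial^2 Y}{\partial X_t^2} (dX_t)^2.$$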
Now, if dXt is plugged into this differential (and recalling the rules in Table 3.2.1) we obtain the final result:
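\[ dY = \left( \frac{\partial Y}{\partial t} + \frac{\partial Y}{\partial X_t}\, h(t, X_t) + \frac{1}{2} \frac{\partial^2 Y}{\partial X_t^2}\, g(t, X_t)^2 \right) dt + \frac{\partial Y}{\partial X_t}\, g(t, X_t)\, dW_t, \]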
which is known as Itô’s lemma.
Lemma 3.4.1 (Itô’s lemma). Given a stochastic process Xt which solves the differential equation
\[ dX_t = h(t, X_t)\, dt + g(t, X_t)\, dW_t, \]
any function Y (t, Xt ) which is at least differentiable once w.r.t. its first argument and twice w.r.t. its second
argument, solves the following differential equation:
\[ dY = \left( \frac{\partial Y}{\partial t} + \frac{\partial Y}{\partial X_t}\, h(t, X_t) + \frac{1}{2} \frac{\partial^2 Y}{\partial X_t^2}\, g(t, X_t)^2 \right) dt + \frac{\partial Y}{\partial X_t}\, g(t, X_t)\, dW_t. \]
Itô’s lemma is useful for reformulating the usual rules of calculus when they are applied in a stochastic framework. Let us
assume that, in the differential dXt we have X0 = 0, h (t, Xt ) = 0 and g (t, Xt ) = 1, so that Xt = Wt . Then, we
can use Itô’s lemma on a function Y (Wt ), as follows
\[ dY(W_t) = \frac{1}{2} \frac{\partial^2 Y(W_t)}{\partial W_t^2}\, dt + \frac{\partial Y(W_t)}{\partial W_t}\, dW_t. \]
Then, if this differential is integrated in the interval [0, t]:
\[ \int_0^t dY(W_s) = \int_0^t \frac{1}{2} \frac{\partial^2 Y(W_s)}{\partial W_s^2}\, ds + \int_0^t \frac{\partial Y(W_s)}{\partial W_s}\, dW_s, \]
32 3 Stochastic calculus
we finally obtain
\[ \int_0^t \frac{\partial Y(W_s)}{\partial W_s}\, dW_s = Y(W_t) - Y(0) - \frac{1}{2} \int_0^t \frac{\partial^2 Y(W_s)}{\partial W_s^2}\, ds, \tag{3.4.1} \]
where we observe that, in this framework, it is no longer true that the integral of the derivative of a function coincides
with the function itself, i.e.
\[ \int_0^t \frac{\partial Y(W_s)}{\partial W_s}\, dW_s \neq Y(W_t) - Y(0). \]
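If we define the Itô integral
\[ I_t \equiv \int_{t_0}^t f(s, X_s)\, dW_s \tag{3.5.1} \]
for any real function f (t, Xt ) of an Ft −measurable stochastic variable Xt , where of course It0 = 0, then we have
\[ \mathbb{E}_{t_0}[I_t] = 0. \]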
The demonstration is almost trivial if the tower property (Proposition 2.6.1) is used:
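\[
\begin{aligned}
\mathbb{E}_{t_0}\left[\int_{t_0}^t f(s, X_s)\, dW_s\right] &= \mathbb{E}_{t_0}\left[\mathbb{E}_s\left[\int_{t_0}^t f(s, X_s)\, dW_s\right]\right] \\
&= \mathbb{E}_{t_0}\left[\int_{t_0}^t \mathbb{E}_s\left[f(s, X_s)\, dW_s\right]\right] \\
&= \mathbb{E}_{t_0}\left[\int_{t_0}^t f(s, X_s)\, \mathbb{E}_s\left[dW_s\right]\right] \\
&= 0,
\end{aligned}
\]
where we have used the fact that f (s, Xs ) is known at time s, i.e.
\[ \mathbb{E}_s[f(s, X_s)] = f(s, X_s). \]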
Proposition 3.5.1. Any Itô integral of a real function of an Ft −measurable variable (as in (3.5.1)) is a
martingale, and its expected value is zero.
The result of Proposition 3.5.1 can be applied when we take the expected value of both sides of Equation (3.4.1)
for any function Y (Wt ) (in this case Xt coincides with Wt that is, of course, Ft −measurable):
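\[ \mathbb{E}_0\left[\int_0^t \frac{\partial Y(W_s)}{\partial W_s}\, dW_s\right] = \mathbb{E}_0\left[Y(W_t) - Y(0) - \frac{1}{2} \int_0^t \frac{\partial^2 Y(W_s)}{\partial W_s^2}\, ds\right], \]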
3.5 Some properties of Itô processes 33
which simplifies to
\[ 0 = \mathbb{E}_0[Y(W_t)] - Y(0) - \frac{1}{2} \int_0^t \mathbb{E}_0\left[\frac{\partial^2 Y(W_s)}{\partial W_s^2}\right] ds, \]
and, finally,
\[ Y(0) = \mathbb{E}_0[Y(W_t)] - \frac{1}{2} \int_0^t \mathbb{E}_0\left[\frac{\partial^2 Y(W_s)}{\partial W_s^2}\right] ds. \]
An obvious result is obtained when the second derivative of Y (Wt ) with respect to Wt is zero (i.e. when Y (Wt )
is a linear transformation of Wt ).
Corollary 3.5.1. A linear transformation of a Wiener process (like Y (Wt ) = a + bWt ) is a martingale.
The second property that we demonstrate is the so-called «isometry» of an Itô integral. We have just demonstrated
that the expected value of an Itô integral It (as in (3.5.1)) is zero. Now, we want to compute its variance
(since the expected value is zero, the variance coincides with the second moment):
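\[
\begin{aligned}
\mathbb{E}_{t_0}\left[\left(\int_{t_0}^t f(s, X_s)\, dW_s\right)^2\right] &= \mathbb{E}_{t_0}\left[\mathbb{E}_s\left[\left(\int_{t_0}^t f(s, X_s)\, dW_s\right)^2\right]\right] \\
&= \mathbb{E}_{t_0}\left[\mathbb{E}_s\left[\left(\int_{t_0}^t f(s, X_s)\, dW_s\right)^2\right] - 0\right] \\
&= \mathbb{E}_{t_0}\left[\mathbb{V}_s\left[\int_{t_0}^t f(s, X_s)\, dW_s\right]\right] \\
&= \mathbb{E}_{t_0}\left[\int_{t_0}^t \mathbb{V}_s\left[f(s, X_s)\, dW_s\right]\right] \\
&= \mathbb{E}_{t_0}\left[\int_{t_0}^t f(s, X_s)^2\, \mathbb{V}_s\left[dW_s\right]\right] \\
&= \mathbb{E}_{t_0}\left[\int_{t_0}^t f(s, X_s)^2\, ds\right],
\end{aligned}
\]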
Proposition 3.5.2 (Itô isometry). Given any real function f (t, Xt ) of an Ft −measurable stochastic variable
Xt , the following property holds
\[ \mathbb{E}_{t_0}\left[\left(\int_{t_0}^t f(s, X_s)\, dW_s\right)^2\right] = \int_{t_0}^t \mathbb{E}_{t_0}\left[f(s, X_s)^2\right] ds. \]
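Example 3.5.1. An easy application can be obtained when f (t, Xt ) = √t, as follows:
\[ \mathbb{E}_0\left[\left(\int_0^t \sqrt{s}\, dW_s\right)^2\right] = \int_0^t s\, ds = \frac{1}{2} t^2, \]
and when t = 1
\[ \mathbb{E}_0\left[\left(\int_0^1 \sqrt{s}\, dW_s\right)^2\right] = \frac{1}{2}. \]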
If we want to check this result in R, we can use the following commands where:
• The values of dWt are created between time 0 and time T = 1 (this period is divided into sub-periods whose
length is dt).
• The sum (as an approximation of the integral) of the products √s dWs is computed for s ∈ {dt, 2dt, 3dt, ..., T }.
• Finally the mean of the squares of all these results is computed for N = 10^4 simulations.
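N = 10^4      # number of simulations
dt = 1/250    # length of each sub-period
T = 1         # final time
# increments dW ~ N(0, dt): one row per time step, one column per simulation
dW = array(rnorm(T/dt * N, 0, sqrt(dt)), dim = c(T/dt, N))
# sum of sqrt(s) * dW over s for each simulation, then mean of the squared sums
mean(apply(sqrt(seq(dt, T, dt)) * dW, 2, sum)^2)
## [1] 0.5064259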
Remark 3.6.1. The continuous time stochastic process (3.1.1), once written in discrete time as in (3.6.1) is an
auto-regressive model of order one (AR (1)). In continuous time it is impossible to obtain any auto-regressive
process of order higher than 1.
The idea, then, is to start from an initial value x0 (at time t0 ), find x1 by using (3.6.1) as
eval(parse(text = 'function'))
which evaluates a text as a command. In our case the drift and the diffusion will be functions of the arguments: the
time (t) and the space (x). Thus, we can write the following commands.
Figure 3.6.1: 100 simulations of a daily Wiener process for a 2 year period (horizontal axis: days)
for (i in 2:(T/dt)) {
dx = h(t, x[i - 1, ]) * dt + g(t, x[i - 1, ]) * rnorm(N) *
sqrt(dt)
x[i, ] = x[i - 1, ] + dx
t = t + dt
}
return(x)
}
Remark 3.6.2. The «drift» and «diffusion» inputs of the euler function must be written between inverted commas
and the variables «time» and «space» must be specified with the letters «t» and «x». If this is not the case the
function may lead to wrong computations or give an error message.
If we set the drift to zero and the diffusion to 1, we can obtain 100 simulations of a daily Wiener process Wt for
a 2 year period (as shown in Figure 3.6.1).
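Since only the final part of the euler function is visible above, the following is a hedged, self-contained sketch of how such a function might look. The function name follows the text, but the argument names and their order (t0, T, dt, x0, N, drift, diffusion) are assumptions made here for illustration and are not necessarily the author's signature.

```r
# Hypothetical reconstruction: argument names and order are assumptions
euler <- function(t0, T, dt, x0, N, drift, diffusion) {
  # Turn the text expressions into functions of time (t) and space (x)
  h <- eval(parse(text = paste("function(t, x)", drift)))
  g <- eval(parse(text = paste("function(t, x)", diffusion)))
  n <- round((T - t0) / dt)
  x <- array(x0, dim = c(n, N))   # one column per simulated path
  t <- t0
  for (i in 2:n) {
    dx <- h(t, x[i - 1, ]) * dt + g(t, x[i - 1, ]) * rnorm(N) * sqrt(dt)
    x[i, ] <- x[i - 1, ] + dx
    t <- t + dt
  }
  return(x)
}

# Zero drift and unit diffusion: 100 daily Wiener paths over 2 years
set.seed(1)
W <- euler(t0 = 0, T = 2, dt = 1/250, x0 = 0, N = 100,
           drift = "0", diffusion = "1")
dim(W)
```

With these inputs, a call such as matplot(W, type = "l") would draw paths like those of Figure 3.6.1.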
Chapter 4
where the drift is linear in Xt and a (t) and b (t) are deterministic functions. Without knowing the functional form
of the diffusion, we are not able to compute the solution to (4.1.1). Nevertheless, we are able to compute the
expected value of Xt by applying Itô’s lemma to the following function
\[ X_t\, e^{-\int_{t_0}^t a(u)\, du}. \]
Now, if both sides are integrated between t0 and T , the following result is obtained
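\[ \mathbb{E}_{t_0}\left[X_T\, e^{-\int_{t_0}^T a(u)\, du} - X_{t_0}\right] = \mathbb{E}_{t_0}\left[-\int_{t_0}^T b(s)\, e^{-\int_{t_0}^s a(u)\, du}\, ds\right], \]
and then
\[ X_{t_0} = \mathbb{E}_{t_0}\left[\int_{t_0}^T b(s)\, e^{-\int_{t_0}^s a(u)\, du}\, ds + X_T\, e^{-\int_{t_0}^T a(u)\, du}\right]. \tag{4.1.2} \]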
We see that this equation has a suitable financial interpretation when Xt0 is interpreted as the value of an asset
at time t0 :
• the function a (t) plays the role of a discount interest rate; in fact, the exponential of the negative integral is a
discount factor in continuous time; the value of the function a (t) will be different according to the framework
which the asset will be evaluated in;
• the function b (t) measures the cash flows paid by the asset from time t0 to time T ;
• the value XT is the value at which the asset is expected to be sold at time T or, alternatively, the last cash
flow it pays.
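The financial reading of Equation (4.1.2) can be illustrated with a small numerical sketch. All the inputs below (a constant discount rate a, a constant cash flow b, a deterministic final value XT and a 10 year horizon) are hypothetical values chosen only for illustration; with constant a and b the integral also has a simple closed form, which is used here as a check.

```r
# Illustrative, hypothetical inputs: constant discount rate and cash flow
a <- 0.03        # a(t): discount rate
b <- 5           # b(t): cash flow per unit of time
XT <- 100        # expected resale value at time T
t0 <- 0
T_end <- 10

# Discount factor exp(-integral of a(u) du between t0 and s)
discount <- function(s) exp(-a * (s - t0))

# Equation (4.1.2): discounted cash flows plus discounted final value
pv_cash_flows <- integrate(function(s) b * discount(s),
                           lower = t0, upper = T_end)$value
X0 <- pv_cash_flows + XT * discount(T_end)

# Closed form available when a and b are constant
X0_closed <- b * (1 - exp(-a * (T_end - t0))) / a + XT * discount(T_end)
c(numerical = X0, closed_form = X0_closed)
```

Both numbers are the value at time t0 of an asset paying b per unit of time and sold for XT at time T.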
1 We recall that the derivative of the integral ∫_{t0}^{t} a (u) du with respect to t is a (t).
38 4 Stochastic processes used in finance
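\[
\begin{aligned}
&= -\int_{t_0}^T \alpha\beta\, e^{\alpha(s-t_0)}\, ds + \mathbb{E}_{t_0}[X_T]\, e^{\alpha(T-t_0)} \\
&= \beta\left(1 - e^{\alpha(T-t_0)}\right) + \mathbb{E}_{t_0}[X_T]\, e^{\alpha(T-t_0)},
\end{aligned}
\]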
Et [dXt ] = α (β − Xt ) dt,
and, thus, if at time t, Xt > β (or Xt < β), the expected value of the differential dXt is negative (positive) and Xt
tends to decrease (increase) towards β.
We can draw the behaviour of the expected value (4.2.2) as in Figure 4.2.1, where we see that the higher α the
higher the speed of convergence.
Many variables in economics exhibit a mean reversion behaviour like: the growth rate of GDP, the inflation rate,
the interest rate, the (foreign) exchange rate and, in general, the growth rates of any economic variable. Thus, the
process (4.2.1) is widely used in economics and finance.
In the following sections, we will take into account three processes that are obtained by giving particular
functional forms to the diffusion term g in (4.2.1).
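\[ X_t\, e^{\alpha(t-t_0)}, \]
as follows
\[ d\left(X_t\, e^{\alpha(t-t_0)}\right) = e^{\alpha(t-t_0)} \alpha\beta\, dt + e^{\alpha(t-t_0)} \sigma\, dW_t, \]
and, finally,
\[ X_t = X_{t_0} e^{-\alpha(t-t_0)} + \beta\left(1 - e^{-\alpha(t-t_0)}\right) + \sigma \int_{t_0}^t e^{-\alpha(t-s)}\, dW_s. \]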
4.3 The Ornstein-Uhlenbeck process 39
Figure 4.2.1: Behaviour of the expected value in (4.2.2) for 10 years when Xt0 = 25, β = 30 and for three values of
α: 0.5 continuous line, 0.2 dashed line, and 0.1 dotted line (vertical axis: expected value; horizontal axis: years)
Since dWt is normally distributed, we can conclude that also Xt is Gaussian. The mean and the variance can
be computed thanks to the properties of Itô integrals that we have shown in the previous chapters:
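\[ \mathbb{E}_{t_0}[X_t] = X_{t_0} e^{-\alpha(t-t_0)} + \beta\left(1 - e^{-\alpha(t-t_0)}\right), \]
\[ \mathbb{V}_{t_0}[X_t] = \sigma^2 \int_{t_0}^t e^{-2\alpha(t-s)}\, ds = \sigma^2\, \frac{1 - e^{-2\alpha(t-t_0)}}{2\alpha}. \]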
The stationary values of the mean and the variance are given by
\[ \lim_{t\to\infty} \mathbb{E}_{t_0}[X_t] = \beta, \qquad \lim_{t\to\infty} \mathbb{V}_{t_0}[X_t] = \frac{\sigma^2}{2\alpha}. \]
Again, as presented in the previous sections, β measures the long term mean of the process. The parameter α,
in this case, also affects the long term variance: the higher α the lower the variance. In fact, a high α means that
the process tends to stay close to its long term mean. When α tends towards infinity, the process is constantly
equal to β. When, instead, α = 0, the process is a random walk.
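The role of α in the mean and in the variance can be checked with a short sketch. The helper names ou_mean and ou_var are introduced here only for illustration; the values Xt0 = 25 and β = 30 are those of Figure 4.2.1, while σ = 2 is an arbitrary assumption.

```r
# Conditional mean and variance of the Ornstein-Uhlenbeck process,
# as functions of the elapsed time tau = t - t0
ou_mean <- function(tau, x0, alpha, beta) {
  x0 * exp(-alpha * tau) + beta * (1 - exp(-alpha * tau))
}
ou_var <- function(tau, alpha, sigma) {
  sigma^2 * (1 - exp(-2 * alpha * tau)) / (2 * alpha)
}

# x0 = 25 and beta = 30 as in Figure 4.2.1; sigma = 2 is an arbitrary choice
tau <- c(1, 5, 10, 100)
alphas <- c(0.1, 0.2, 0.5)
means <- sapply(alphas, function(a) ou_mean(tau, x0 = 25, alpha = a, beta = 30))
vars <- sapply(alphas, function(a) ou_var(tau, alpha = a, sigma = 2))
means   # rows: elapsed time, columns: alpha
vars
```

For large elapsed times each column of means approaches β, and each column of vars approaches σ²/(2α), which is lower for higher values of α.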
The Gaussian distribution of Xt implies that it can take negative values. Thus, this process is useful for
describing economic and financial variables which can become negative.
In particular, we can compute at time t0 the probability that Xt takes negative values as follows
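\[ P\{X_t < 0\} = P\left\{ \frac{X_t - \mathbb{E}_{t_0}[X_t]}{\sqrt{\mathbb{V}_{t_0}[X_t]}} < -\frac{\mathbb{E}_{t_0}[X_t]}{\sqrt{\mathbb{V}_{t_0}[X_t]}} \right\} = \Phi\left( -\frac{\mathbb{E}_{t_0}[X_t]}{\sqrt{\mathbb{V}_{t_0}[X_t]}} \right), \]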
where Φ (•) is the cumulative distribution function of a standard normal variable. In equilibrium, the probability of
having negative values is
\[ \lim_{t\to\infty} P\{X_t < 0\} = \Phi\left( -\frac{\beta}{|\sigma|} \sqrt{2\alpha} \right). \]
The process (4.3.1), when written in discrete time as in (3.6.1), becomes
\[ x_{i+1} = \alpha\beta\, dt + (1 - \alpha\, dt)\, x_i + \sigma\, dW_i, \]
and, accordingly, it can be estimated with Ordinary Least Squares as in the following model
\[ x_{i+1} = \rho_0 + \rho_1 x_i + \varepsilon_i, \]
where εi is a normally distributed error term. Once the parameters ρ0 and ρ1 are estimated, the parameters α and
β are obtained by solving the system
\[ \begin{cases} \rho_0 = \alpha\beta\, dt, \\ \rho_1 = 1 - \alpha\, dt, \end{cases} \]
from which
\[ \hat{\alpha} = \frac{1 - \rho_1}{dt}, \qquad \hat{\beta} = \frac{\rho_0}{1 - \rho_1}. \]
The parameter σ, instead, is directly estimated from the variance of the differences dXt :
\[ \mathbb{V}_t[dX_t] = \sigma^2\, dt \quad \Rightarrow \quad \hat{\sigma} = \sqrt{\frac{\mathbb{V}_t[dX_t]}{dt}}. \]
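The whole estimation procedure can be tried on simulated data, for which the true parameters are known. The following is only a sketch: the «true» values of α, β and σ, the sample size and the seed are arbitrary assumptions, and the data are generated with the discrete time equation written above.

```r
set.seed(123)
alpha <- 0.5; beta <- 0.02; sigma <- 0.01   # "true" values (arbitrary)
dt <- 1/12
n <- 20000

# Simulate x[i+1] = alpha*beta*dt + (1 - alpha*dt)*x[i] + sigma*dW[i]
x <- numeric(n)
x[1] <- beta
eps <- rnorm(n - 1, mean = 0, sd = sqrt(dt))
for (i in 1:(n - 1)) {
  x[i + 1] <- alpha * beta * dt + (1 - alpha * dt) * x[i] + sigma * eps[i]
}

# OLS regression of x[i+1] on x[i]
fit <- lm(x[-1] ~ x[-n])
rho0 <- unname(coef(fit)[1])
rho1 <- unname(coef(fit)[2])

# Back out the parameters of the continuous time process
alpha_hat <- (1 - rho1) / dt
beta_hat <- rho0 / (1 - rho1)
sigma_hat <- sd(diff(x)) / sqrt(dt)
c(alpha = alpha_hat, beta = beta_hat, sigma = sigma_hat)
```

With a long sample the three estimates should be close to the values used in the simulation; the estimate of σ is slightly biased upward because the differences also contain the drift term.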
Let us take into account the Harmonized Index of Consumer Prices – HICP (all items) for Euro area as shown
in Figure 4.3.1 for the period from 1/1/1996 to 1/9/2015 (monthly data of annual percentage changes).
If we want to estimate the parameters of the process (4.3.1) in such a way that it fits the HICP index at best,
we can at first compute the empirical standard deviation and then estimate (via Ordinary Least Squares – OLS) the
parameters ρ0 and ρ1 .
We have stored the inflation data in the variable hicp, then we compute the variance of the differences and,
finally, the estimated σ.
Figure 4.3.1: Harmonized Index of Consumer Prices (all items) for Euro area, monthly data (percentage change
with respect to 12 months ahead) for the period from 1/1/1996 to 1/9/2015 [Source: [Link]/fred2/] (vertical axis: HICP; horizontal axis: dates)
dt = 1/12
sigma = sd(diff(hicp))/sqrt(dt)
sigma
## [1] 0.008503048
Now, we can estimate α and β (in R we will call them a and b) by using the function (linear model)
lm(y~x)
where y is the dependent variable and x is the matrix of the independent variables. In the following code we use:
a = (1 - rho1)/dt
a
## [1] 0.3153725
b = rho0/(1 - rho1)
b
## [1] 0.01483822
We can check that the long term equilibrium value of the variable is close to its empirical mean (up to about 30
basis points).
mean(hicp)
## [1] 0.01793022
Now, we can simulate some paths of the process (4.3.1) by using the parameters we have just estimated and
compare the simulations with the empirical data. This goal is achieved through the commands shown in Figure
4.3.2 (where x0 is, here, the first empirical value of the inflation index). In the code we now use:
• The command abline for adding a straight line. If the option is h (or v), then a horizontal (or vertical) line
is drawn at the coordinate given by the value of the option itself.
Figure 4.3.2: 100 paths (in light grey) of the process (4.3.1) whose parameters have been estimated on the HICP
index (as shown in Figure 4.3.1); in bold the historical values of the HICP (horizontal axis: months)
The probability of having a negative value of the inflation index, in the long term, is as follows.
Actually, a probability of negative inflation which is higher than 8% seems to overestimate
reality and this is mainly due to the recent financial crisis which altered the economic (and financial) framework.
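The long term probability mentioned above can be reproduced from the equilibrium formula of the previous pages and from the estimates reported for the HICP series. The following sketch simply plugs those reported numbers into the formula; the variable name p_neg is introduced here for illustration.

```r
# Estimates reported above for the HICP series
a <- 0.3153725
b <- 0.01483822
sigma <- 0.008503048

# Long-run probability of a negative value: Phi(-beta * sqrt(2 * alpha) / |sigma|)
p_neg <- pnorm(-b * sqrt(2 * a) / abs(sigma))
p_neg
```

The result is about 0.083, i.e. slightly more than 8%.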
dXt = αβdt,
which is strictly positive. In this case, the level zero is said to be a reflecting barrier (as soon as the process hits
the barrier, it is reflected and pushed away from it).
There exists a condition for preventing the process from reaching the value zero (it is called Feller’s condition):
2αβ ≥ σ 2 .
While we are able to compute the mean and the variance of Xt in closed form, the process (4.4.1) does not have
a closed form solution.
Furthermore, the process (4.4.1) is heteroscedastic, since its variance depends on the value of Xt itself. In
particular, when Xt is higher, also its variance is higher. This is consistent with the empirical evidence about
interest rates, since a high interest rate is in general related to a period of economic troubles, when the variance of
any economic indicator becomes higher.
If we want to use OLS for estimating the parameters of (4.4.1), we must make it homoscedastic, i.e. we must
look for a transformation f (Xt ) such that
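\[ \frac{\partial f(X_t)}{\partial X_t} \sqrt{X_t} = 1. \]
One of the solutions to this differential equation is
\[ f(X_t) = 2\sqrt{X_t}. \]
Thus, if we define Yt = 2√Xt , its differential is
\[
\begin{aligned}
dY_t &= \frac{1}{\sqrt{X_t}} \left( \alpha\beta - \frac{1}{4}\sigma^2 - \alpha X_t \right) dt + \sigma\, dW_t \\
&= \left( \left( 2\alpha\beta - \frac{1}{2}\sigma^2 \right) \frac{1}{Y_t} - \frac{\alpha}{2} Y_t \right) dt + \sigma\, dW_t.
\end{aligned}
\]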
Figure 4.4.1: Volatility Index (VIX) on CBOE from 2/1/1990 to 19/11/2015 (daily data) [Source: re-[Link]/fred2/] (vertical axis: VIX; horizontal axis: dates)
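\[ Y_{i+1} = \left( 2\alpha\beta - \frac{1}{2}\sigma^2 \right) \frac{dt}{Y_i} + \left( 1 - \frac{\alpha\, dt}{2} \right) Y_i + \sigma\, dW_i, \]
which can be estimated through the model
\[ Y_{i+1} = \rho_1 \frac{1}{Y_i} + \rho_2 Y_i + \varepsilon_i. \]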
After estimating ρ1 and ρ2 , the values of α and β are obtained as follows:
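\[ \begin{cases} \left( 2\alpha\beta - \frac{1}{2}\sigma^2 \right) dt = \rho_1, \\ 1 - \frac{\alpha\, dt}{2} = \rho_2, \end{cases} \quad \Rightarrow \quad \begin{cases} \beta = \dfrac{\rho_1 + \frac{1}{2}\sigma^2 dt}{4 (1 - \rho_2)}, \\ \alpha = 2\, \dfrac{1 - \rho_2}{dt}. \end{cases} \]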
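The estimation method can be tried on a simulated path, for which the true parameters are known. In the following sketch the «true» values, the sample size and the seed are arbitrary assumptions; the path is generated with an Euler scheme, and the absolute value is only a simulation shortcut which keeps the discretised path positive.

```r
set.seed(7)
alpha <- 4; beta <- 20; sigma <- 4.5     # "true" values (arbitrary)
dt <- 1/250
n <- 50000

# Euler simulation of dX = alpha * (beta - X) dt + sigma * sqrt(X) dW;
# abs() keeps the discretised path positive (a simulation shortcut)
x <- numeric(n)
x[1] <- beta
eps <- rnorm(n - 1, sd = sqrt(dt))
for (i in 1:(n - 1)) {
  x[i + 1] <- abs(x[i] + alpha * (beta - x[i]) * dt + sigma * sqrt(x[i]) * eps[i])
}

# Homoscedastic transformation and volatility estimate
y <- 2 * sqrt(x)
sigma_hat <- sd(diff(y)) / sqrt(dt)

# Regression without intercept of Y[i+1] on 1/Y[i] and Y[i]
y_next <- y[-1]
y_lag <- y[-n]
fit <- lm(y_next ~ 0 + I(1 / y_lag) + y_lag)
rho1 <- unname(coef(fit)[1])
rho2 <- unname(coef(fit)[2])

alpha_hat <- 2 * (1 - rho2) / dt
beta_hat <- (rho1 + 0.5 * sigma_hat^2 * dt) / (4 * (1 - rho2))
c(alpha = alpha_hat, beta = beta_hat, sigma = sigma_hat)
```

The term «0 +» in the formula excludes the constant from the regression, as required by the model in Yi written above.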
We can now apply this estimation method to a process which never becomes negative and exhibits a mean
reverting property. The index we choose is the volatility index (VIX) listed on the Chicago Board Options
Exchange (CBOE – [Link]). It measures the volatility of the US financial market (measured through the
implied volatility computed on a portfolio of options). The downloaded data (from 2/1/1990 to 19/11/2015) are
shown in Figure 4.4.1.
Once the values of the VIX have been stored in variable vix, the volatility is estimated with the following
commands.
dt = 1/250
y = 2 * sqrt(vix)
sigma = sd(diff(y))/sqrt(dt)
sigma
## [1] 4.680272
Now, we can estimate α and β (in R we will call them a and b) by using the function lm(y~x) as we did in the
previous section. Inside the function lm we write «-1» for indicating that the constant of regression must not be
taken into account.
a = 2 * (1 - rho2)/dt
a
## [1] 4.408669
b = (rho1 + 0.5 * sigma^2 * dt)/(4 * (1 - rho2))
b
## [1] 19.83019
We can check that the long term equilibrium value of the variable is close to its empirical mean.
mean(vix)
## [1] 19.83174
Now, we can simulate some paths of the process (4.4.1) by using the parameters we have just estimated and
compare the simulations with the empirical data. This goal is achieved through the following commands (where we
use, as x0 , the first empirical value of the VIX index). The result is drawn in Figure 4.4.2, where we compare the
VIX index with 100 simulations (in the upper graph) and with only the first simulation (in the lower graph).
We see that the peaks in the simulated process reach high values (around 60) but are not able to replicate the
highest volatility reached during the recent financial crisis (in 2007/2008). Nevertheless, the simulations capture
quite accurately the process we are trying to reproduce.
Figure 4.4.2: In the upper graph, 100 paths (in light grey) of the process (4.4.1) whose parameters have been
estimated on the VIX index (as shown in Figure 4.4.1); in bold the historical values of the VIX. In the lower graph,
the comparison is performed between the VIX index (in bold) and only one simulated path (in light grey) (vertical axis: simulated VIX; horizontal axis: days)
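One of the most commonly used divergent processes is the so-called geometric Brownian motion (GBM), which
has the following form
\[ dX_t = \mu X_t\, dt + \sigma X_t\, dW_t. \tag{4.5.1} \]
Such a process is heteroscedastic, and it can be transformed into a homoscedastic process by taking its logarithmic
transformation:
\[ d \ln X_t = \left( \mu - \frac{1}{2}\sigma^2 \right) dt + \sigma\, dW_t, \]
which can be integrated for obtaining its solution in closed form:
\[ X_t = X_{t_0} \exp\left( \left( \mu - \frac{1}{2}\sigma^2 \right)(t - t_0) + \sigma \left( W_t - W_{t_0} \right) \right). \]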
The GBM has been widely used in finance for modelling asset prices since it is consistent with the price of an
asset whose continuous return is constant over time and given by µ. In fact, given the expected value of Xt , we can
write
\[ \mathbb{E}_{t_0}[X_t] = X_{t_0} e^{\mu(t-t_0)}, \]
which is a compounding rule in continuous time.
Since (4.5.1) is a divergent process, then it is suitable for describing processes that, on average, grow over time.
This is the case, for instance, of stock exchange indexes. In Figure 4.5.1, for instance, we show the evolution of the
S&P500 index from 3/1/1950 to 23/10/2015.
We can see that this index is, on average, increasing over time, even if we can immediately recognise six periods:
• from the beginning to 1995: the «normal» period (even if there is a big fall on October, 19th 1987 – the
so-called «black Monday»)
• from 1995 to 2000: the accumulation of the so-called «dot-com bubble»
• from 2000 to 2002: the burst of the bubble (on September, 11th 2001 there is the attack on the World Trade
Centre in New York)
• from 2002 to 2007: the accumulation of the so-called «sub-prime bubble»
• from 2007 to 2009: the burst of the bubble (on September, 15th 2008 Lehman Brothers goes bankrupt)
• from 2009: the recovery and the increase of the index because of the several rounds of quantitative easing by the Federal
Reserve
Once the values of the index are stored in the variable S (stock), we can estimate the parameters µ and σ as follows.
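\[ \begin{cases} \mu = \dfrac{1}{dt} \left( \mathbb{E}_t[d \ln X_t] + \dfrac{1}{2} \mathbb{V}_t[d \ln X_t] \right), \\ \sigma = \sqrt{\dfrac{\mathbb{V}_t[d \ln X_t]}{dt}}. \end{cases} \]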
dlnS = diff(log(S))
dt = 1/250
sigma = sqrt(var(dlnS)/dt)
sigma
## [1] 0.153757
mu = (mean(dlnS) + 0.5 * var(dlnS))/dt
mu
## [1] 0.08466284
Figure 4.5.1: Daily values of the stock exchange index S&P500 from 3/1/1950 to 23/10/2015 [Source: finance.[Link]] (vertical axis: S&P 500; horizontal axis: dates)
Figure 4.5.2: 100 paths (in light grey) of a geometric Brownian motion whose parameters have been estimated on
the S&P 500 index (in black) (horizontal axis: days)
Thus, we can conclude that the annualised standard deviation for the S&P500 is about 15%, while the average
annual return is about 8.5%.
Now we can generate some simulations of the S&P500 with the parameters we have just estimated, as shown in
Figure 4.5.2.
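A simulation of this kind, together with a check of the estimators of µ and σ, can be sketched as follows. The «true» parameters, the initial value S0, the sample length and the seed are arbitrary assumptions, and the path is generated from the distribution of the increments of ln Xt rather than with the euler function.

```r
set.seed(2015)
mu <- 0.08; sigma <- 0.15      # "true" annual parameters (arbitrary)
dt <- 1/250
n <- 15000                     # about 60 years of daily data
S0 <- 100

# Exact simulation: log-increments are N((mu - sigma^2/2) dt, sigma^2 dt)
dlnS <- rnorm(n, mean = (mu - 0.5 * sigma^2) * dt, sd = sigma * sqrt(dt))
S <- S0 * exp(c(0, cumsum(dlnS)))

# Same estimators as in the text, applied to the simulated path
d <- diff(log(S))
sigma_hat <- sqrt(var(d) / dt)
mu_hat <- (mean(d) + 0.5 * var(d)) / dt
c(mu = mu_hat, sigma = sigma_hat)
```

The estimate of σ is typically very precise, while the estimate of µ remains noisy even with many decades of daily data, since its precision depends on the length of the period and not on the number of observations.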
4.6 The Chan et al. [1992] process (and the simulated maximum likelihood estimation)
Chan et al. [1992] propose a model for interest rates which accommodates all the models shown in the previous
sections:
\[ dX_t = \alpha (\beta - X_t)\, dt + \sigma X_t^{\gamma}\, dW_t, \tag{4.6.1} \]
which is known as the CKLS model (from the initials of the authors).
The Vasiček [1977] model is obtained with γ = 0, while the Cox et al. [1985] model is obtained with γ = 1/2. In
this case, a homoscedastic transformation exists, but it is not useful for estimating the parameters of the model. In
This process is actually homoscedastic, but since the transformation can be done only by knowing the value of
the parameter γ, it could be used for estimating the other parameters only if there exists a method which allows
us to estimate γ first. Since estimating parameters in different steps is usually inefficient, we must rely on other
methods.
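A simple method is based on the following Euler scheme of Equation (4.6.1):
\[ x_{i+1} = \alpha\beta\, dt + (1 - \alpha\, dt)\, x_i + \sigma x_i^{\gamma}\, dW_i, \]
from which we see that, given the value of xi , the variable xi+1 is normally distributed:
\[ x_{i+1} \mid x_i \sim N\left( \alpha\beta\, dt + (1 - \alpha\, dt)\, x_i,\; \sigma^2 x_i^{2\gamma}\, dt \right). \]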
Remark 4.6.1. While we know that the distribution of xi+1 conditional to the value of xi is Gaussian, the
distribution of xi (unconditional) is not known.
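Thanks to this result, we can compute the likelihood function for each xi+1 given its previous value xi as follows:2
\[ L = \prod_{i=1}^{n-1} \frac{1}{\sqrt{2\pi}\, \sigma x_i^{\gamma} \sqrt{dt}} \exp\left( -\frac{1}{2} \left( \frac{x_{i+1} - \alpha\beta\, dt - (1 - \alpha\, dt)\, x_i}{\sigma x_i^{\gamma} \sqrt{dt}} \right)^2 \right). \]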
The parameters α, β, γ and σ are computed by maximising the value of L. Since the optimal values of the
parameters do not change with a monotonic transformation of L, we can maximise its logarithm:
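\[
\begin{aligned}
l \equiv \ln L &= \sum_{i=1}^{n-1} \left[ -\ln\left( \sqrt{2\pi}\, \sigma x_i^{\gamma} \sqrt{dt} \right) - \frac{1}{2} \left( \frac{x_{i+1} - \alpha\beta\, dt - (1 - \alpha\, dt)\, x_i}{\sigma x_i^{\gamma} \sqrt{dt}} \right)^2 \right] \\
&= -\frac{n-1}{2} \ln(2\pi\, dt) - (n-1) \ln\sigma - \gamma \sum_{i=1}^{n-1} \ln x_i - \frac{1}{2\sigma^2 dt} \sum_{i=1}^{n-1} \left( \frac{x_{i+1} - \alpha\beta\, dt - (1 - \alpha\, dt)\, x_i}{x_i^{\gamma}} \right)^2.
\end{aligned}
\]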
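\[ \min_{\alpha,\beta,\gamma,\sigma}\; \ln\sigma + \gamma\, \frac{1}{n-1} \sum_{i=1}^{n-1} \ln x_i + \frac{1}{2\sigma^2 dt}\, \frac{1}{n-1} \sum_{i=1}^{n-1} \left( \frac{x_{i+1} - \alpha\beta\, dt - (1 - \alpha\, dt)\, x_i}{x_i^{\gamma}} \right)^2, \]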
where the maximisation problem has become a minimisation one since the objective function has been multiplied
by −1.
Remark 4.6.2. When γ = 0 (i.e. when the process Xt is homoscedastic, like in the case of Vasiček), this method
coincides with the least squares method. In fact the minimisation problem becomes:
\[ \min_{\alpha,\beta,\gamma=0,\sigma}\; \ln\sigma + \frac{1}{2\sigma^2 dt}\, \frac{1}{n-1} \sum_{i=1}^{n-1} \left( x_{i+1} - \alpha\beta\, dt - (1 - \alpha\, dt)\, x_i \right)^2. \]
2 In general, if we know the density function f (x, θ) of a stochastic variable x, where θ contains the parameters of the model, the
Unfortunately, the system of the first derivatives with respect to the control variables does not have an algebraic closed
form solution. Thus, we have to solve the minimisation problem numerically. We can do that with R by defining
the log-likelihood function and then minimising it with respect to the parameters (α, β, σ, γ).
The log-likelihood function is defined in the following code, where the first argument is the set of the parameters,
and the other inputs are the data and the time interval dt. The names of the parameters are assigned in the first
lines, and then the present and lagged values of the data are set.
Now we can use one of the R functions which are meant to optimise a given user defined function. Its syntax is
as follows:
optim(parameters,function)
where function is the user defined function that must be minimised, and parameters is the set of the initial
values of the parameters which the iteration starts from. Since, in our case, the log-likelihood function has three
arguments and we want to perform the optimisation only with respect to the first argument (the parameters), the
other arguments must be defined inside the optim function.
Now, we can use the optimization procedure for adapting Equation (4.6.1) to the VIX data we have already
presented in the previous sections. We set the initial values of all parameters to 1 but for β which is assumed to
start from the mean value of the VIX index.
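The procedure just described can be sketched as follows. This is a minimal sketch and not necessarily the author's code: the function name ckls_negloglik and the order of the parameters are assumptions, the objective is the one of the minimisation problem written above, and the VIX data are replaced here by a simulated path with arbitrary «true» parameters (α = 4, β = 20, γ = 1, σ = 0.5).

```r
# Objective of the minimisation problem above; par = c(alpha, beta, gamma, sigma)
ckls_negloglik <- function(par, x, dt) {
  alpha <- par[1]; beta <- par[2]; gamma <- par[3]; sigma <- par[4]
  if (sigma <= 0) return(1e10)          # penalise inadmissible volatilities
  n <- length(x)
  x_now <- x[-1]                        # present values x[i+1]
  x_lag <- x[-n]                        # lagged values x[i]
  res <- (x_now - alpha * beta * dt - (1 - alpha * dt) * x_lag) / x_lag^gamma
  log(sigma) + gamma * mean(log(x_lag)) + mean(res^2) / (2 * sigma^2 * dt)
}

# Simulated stand-in for the data (arbitrary "true" parameters)
set.seed(99)
dt <- 1/250
n <- 5000
x <- numeric(n)
x[1] <- 20
for (i in 1:(n - 1)) {
  x[i + 1] <- x[i] + 4 * (20 - x[i]) * dt + 0.5 * x[i]^1 * rnorm(1, sd = sqrt(dt))
}

# All starting values equal to 1, except beta which starts from the sample mean
start <- c(1, mean(x), 1, 1)
fit <- optim(start, ckls_negloglik, x = x, dt = dt)
fit$par
```

The arguments x and dt are passed to the objective through optim, which by default uses the Nelder-Mead method. The result may depend on the starting values, so it can be worth restarting the optimisation from the values in fit$par.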
Figure 4.6.1: Simulation (in grey) of the VIX index (in black) by using Equation (4.6.1) with the parameters
estimated through the maximum likelihood method (horizontal axis: days)
We see that the γ parameter is much higher than 0.5 (the value of γ in the case of the CIR process) and,
accordingly, also the value of σ is different with respect to the one obtained for the CIR process. We can conclude
that the VIX process is «more» heteroscedastic than the CIR process. Instead, the magnitude of α and β is
definitely comparable with the values obtained for the CIR process.
With these new parameters we can now simulate the process (4.6.1) by using the euler function with the
following commands (the result is drawn in Figure 4.6.1).
Figure 4.6.1 allows us to conclude that the CKLS model is much more suitable for describing the VIX index
than the CIR model. The CKLS model, in fact, is able to capture even the high volatility peaks.
where C is the covariance operator and ρ is set constant (we recall that this model can be traced back to an
equivalent one with independent Wiener processes as shown in Section ).
Three versions of (4.7.1) play a relevant role in the financial literature:
• when γ = 1/2, i.e. the volatility follows a CIR process; this case is studied in Heston [1993];
5.1 Introduction
The Wiener process that has been introduced in the previous chapters is able to describe the «small» (infinitesimal)
changes in the prices, but it fails when the price of an asset falls by a «big» (finite) jump. Since these jumps are
rare events, they are often modelled through a Poisson distribution. In this chapter we present how
the stochastic calculus changes when a Poisson process is used for capturing finite jumps in financial variables (like
a negative jump in asset prices or a positive jump in price volatility after a crisis). For further reading about jump
processes we refer to Cont and Tankov [2004] and Øksendal and Sulem [2007].
\[ Y_i = \begin{cases} 1, & \text{with probability } p \\ 0, & \text{with probability } 1 - p \end{cases} \]
where p is assumed to be sufficiently low for describing a rare event. The expected value and the variance of this
variable are
E [Yi ] = p, V [Yi ] = p − p2 = p (1 − p) .
Now, we define the stochastic process Xt as the sum of the variables Yi for i ∈ {1, 2, ..., t}, with X0 = 0 by
definition:
\[ X_t = \sum_{i=1}^{t} Y_i. \tag{5.2.1} \]
If we assume that the variables Yi are i.i.d., then the expected value and the variance of Xt are as follows:
\[ \mathbb{E}_0[X_t] = \sum_{i=1}^{t} \mathbb{E}_0[Y_i] = pt, \]
\[ \mathbb{V}_0[X_t] = \sum_{i=1}^{t} \mathbb{V}_0[Y_i] = p (1 - p)\, t. \]
56 5 Stochastic processes with jumps
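Remark 5.2.1. Note that since E0 [Xt ] ≠ X0 , then this process is not a martingale. We can also check this
through the following steps:
\[
\begin{aligned}
\mathbb{E}_t[X_T] &= \mathbb{E}_t\left[ \sum_{i=1}^{t} Y_i + \sum_{i=t+1}^{T} Y_i \right] \\
&= \mathbb{E}_t[X_t] + \mathbb{E}_t\left[ \sum_{i=t+1}^{T} Y_i \right] \\
&= X_t + \sum_{i=t+1}^{T} \mathbb{E}_t[Y_i] = X_t + (T - t)\, p.
\end{aligned}
\]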
Sometimes, in order to preserve the martingale property of the random walk, the process Xt defined in (5.2.1)
is corrected (or, better, «compensated») by its own mean:
X̂t ≡ Xt − pt,
and now X̂t is a martingale. Of course, the compensation does not alter the variance of the process.
5.3 The continuous time version of the binomial model for rare events
Now, we divide each period into n sub-periods (each of length 1/n ≡ dt) for n → ∞ (and, thus, dt → 0). At the same
time, in order to represent a rare event, we let p tend towards zero, with the property that the product pn, i.e. the
expected number of events in one period, is constant and equal to λ:
\[ \lim_{p\to 0,\, n\to\infty} pn \equiv \lambda. \]
Figure 5.2.1: In the upper figure one path of the process Xt for t ∈ {0, 1, 2, ..., 500} in (5.2.1) with p = 0.01 is
shown, together with the mean of the process (dashed line). In the lower figure, 100 paths of the same process are
drawn, together with the empirical mean (black line) and the theoretical mean (dashed line)
RE = function(p, r, c) {
Y = array(rbinom(r * c, 1, p), dim = c(r, c))
X = rbind(array(0, dim = c(1, c)), apply(Y, 2, cumsum))
return(X)
}
X1 = RE(0.01, 500, 1)
X100 = RE(0.01, 500, 100)
par(mfrow = c(2, 1), mar = c(2, 3, 2, 2))
matplot(X1, type = "l", ylab = "", col = "lightgray")
lines(apply(X1, 1, mean))
lines(seq(0, 500), seq(0, 500) * 0.01, type = "l", lty = 3, lwd = 3)
grid()
matplot(X100, type = "l", ylab = "", col = "lightgray")
lines(apply(X100, 1, mean))
lines(seq(0, 500), seq(0, 500) * 0.01, type = "l", lty = 3, lwd = 3)
grid()
where O (•) measures the order of the infinite. Finally, we can write that the probability of having k events is:
\[ f(k) = \frac{\lambda^k e^{-\lambda}}{k!}, \]
which is the Poisson density function.
The mean of the process in each period is pn = λ and the variance of the process is
\[ \lim_{p\to 0,\, n\to\infty} p (1 - p)\, n = \lim_{p\to 0,\, n\to\infty} \lambda (1 - p) = \lambda. \]
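The convergence of the binomial model towards the Poisson distribution can be checked numerically with the built-in functions dbinom and dpois. In the following sketch the value of λ, the range of k and the values of n are arbitrary assumptions; for each n the probability of the single event is set to p = λ/n.

```r
lambda <- 5
k <- 0:30
n_values <- c(10, 100, 1000, 1e5)

# Largest gap between the binomial(n, lambda/n) and Poisson(lambda) probabilities
max_gap <- sapply(n_values, function(n) {
  max(abs(dbinom(k, size = n, prob = lambda / n) - dpois(k, lambda)))
})
data.frame(n = n_values, max_gap = max_gap)

# Mean and variance of the Poisson distribution both equal lambda
kk <- 0:200
moments <- c(mean = sum(kk * dpois(kk, lambda)),
             variance = sum((kk - lambda)^2 * dpois(kk, lambda)))
moments
```

The gap shrinks as n grows, and both moments of the limit distribution coincide with λ.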
Thus, we can conclude that the limit distribution for rare events has the following property:
In this case, it is also assumed that the jump width γt is independent of the number of jumps dNt . Thus, the
expected value of (5.4.1) is (we assume λt constant):
Remark 5.4.1. Note that the variable γt , in this case, is not measurable with respect to Ft . If it were measurable,
then we could know the width of the jump before it happens. In order to underline this property, sometimes
the notation of the expected value is changed as Et− [•] where we indicate with t− the instant just before the
jump. In this way, it is obvious that the value γt does not belong to the set Ft− .
We can simulate a Poisson process like (5.4.1) with the following commands, where we have used the command
rpois(n,lambda)
whose inputs are the number of observations (n) and the intensity of the process (λ). The following code allows us to
simulate N = 10 times a process which starts at X0 = 25, with µγ = 1, σγ = 2, and whose jumps occur with intensity
λ = 5 over daily time intervals dt = 1/250 (i.e. we expect to have, on average, 5 jumps per year) and for a period
of T = 10 years.
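A simulation along these lines can be sketched as follows. Since the original commands are not reproduced here, this is only an illustration: it assumes that the process is a pure jump process (dXt = γt dNt ), that the jump widths are normally distributed with mean µγ and standard deviation σγ , and all variable names are hypothetical.

```r
set.seed(5)
N <- 10; X0 <- 25
mu_gamma <- 1; sigma_gamma <- 2
lambda <- 5; dt <- 1/250; T_end <- 10
n <- round(T_end / dt)

# Number of jumps in each time step, for each path
dN <- array(rpois(n * N, lambda * dt), dim = c(n, N))

# Sum of dN i.i.d. normal jump widths: N(dN * mu, dN * sigma^2)
jumps <- rnorm(n * N, mean = dN * mu_gamma, sd = sqrt(dN) * sigma_gamma)
jumps <- array(jumps, dim = c(n, N))

# Paths: initial value followed by the cumulated jumps
X <- rbind(rep(X0, N), X0 + apply(jumps, 2, cumsum))
dim(X)
```

Each column of X is one path, which stays constant between two jumps; a call such as matplot(X, type = "l") would draw the simulated paths.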
5.4 The Poisson process 59
Figure 5.4.1: N = 10 simulations (in light grey) of daily values for T = 5 years of the process (5.4.1) with: X0 = 25,
µγ = 1, σγ = 2, λ = 5, dt = 1/250 (in black the mean of the 10 simulations)
Figure 5.4.2: N = 10 simulations (in light grey) of daily values for T = 5 years of the process (5.4.2) with: X0 = 25,
µγ = 1, σγ = 2, λ = 5, dt = 1/250 (in black the mean of the 10 simulations)
We see that the compensated Poisson process has a drift which is negative because it must compensate the
jumps which, on average, are positive (we have assumed µγ > 0).
In general, on a financial market the widths of the jumps are, on average, negative, since the negative jumps
are more frequent and have a greater magnitude with respect to the positive jumps.
not exist for dt → 0). Also the Poisson process cannot be differentiated, but for another reason: it is discontinuous.
In fact, when a jump occurs, the Poisson process jumps by a finite amount and this creates a discontinuity.
Now, if we take a function f (Nt ), the difference between its value in t = 0 and its value in any time t can be
decomposed as follows
\[ f(N_t) - f(N_0) = \sum_{s=1}^{t} \left( f(N_s + \Delta N_s) - f(N_s) \right), \]
where, on the right hand side, we have summed up all the differences between the value of the function at the
beginning and at the end of any period. Nevertheless, since ∆Nt is either equal to 1 (if a jump occurs) or to 0 (if
it does not), then we can also write
\[ f(N_t) - f(N_0) = \sum_{s=1}^{t} \left( f(N_s + 1) - f(N_s) \right) \Delta N_s, \]
in fact
\[ f(N_s + \Delta N_s) - f(N_s) = \begin{cases} f(N_s + 1) - f(N_s), & \Delta N_s = 1 \\ f(N_s) - f(N_s) = 0, & \Delta N_s = 0 \end{cases} \]
The same difference, in continuous time, is
\[ f(N_t) - f(N_0) = \int_0^t \left( f(N_s + 1) - f(N_s) \right) dN_s, \]
whose differential is
\[ df(N_t) = \left( f(N_t + 1) - f(N_t) \right) dN_t. \]
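If we take into account any stochastic process driven by a Poisson process
\[ dX_t = h(t, X_t)\, dt + \gamma(t, X_t)\, dN_t, \]
and we take a function Y (t, Xt ), the differential of this function can thus be written as
\[ dY(t, X_t) = \left( \frac{\partial Y}{\partial t} + \frac{\partial Y}{\partial X_t}\, h(t, X_t) \right) dt + \left( Y(t, X_t + \gamma(t, X_t)) - Y(t, X_t) \right) dN_t. \]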
Lemma 5.5.1 (Itô’s lemma for jump-diffusion processes). Given a stochastic process Xt which solves the differential equation
\[ dX_t = h(t, X_t)\, dt + g(t, X_t)\, dW_t + \gamma(t, X_t)\, dN_t, \]
any function Y (t, Xt ) which is differentiable at least once w.r.t. its first argument and twice w.r.t. its second argument, solves the following differential equation:
\[ \begin{aligned} dY = {} & \left( \frac{\partial Y}{\partial t} + \frac{\partial Y}{\partial X_t} h(t, X_t) + \frac{1}{2} \frac{\partial^2 Y}{\partial X_t^2} g(t, X_t)^2 \right) dt + \frac{\partial Y}{\partial X_t} g(t, X_t)\, dW_t \\ & + \left( Y(t, X_t + \gamma) - Y(t, X_t) \right) dN_t. \end{aligned} \]
With this new version of Itô’s lemma, we can check two interesting properties.
1. The process \( Y_t = e^{\alpha N_t - \lambda t (e^{\alpha} - 1)} \) is a martingale (α is a constant and λ is the intensity of the Poisson process).
Since any differential equation whose expected value is zero is also a martingale, we have demonstrated the
first statement.
2. The process \( Y_t = (1 + \alpha)^{N_t} e^{-\lambda \alpha t} \) is a martingale (α is a constant and λ is the intensity of the Poisson process).
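A necessary condition for both statements, E0[Yt] = Y0 = 1, can be checked numerically. The sketch below is only an illustration: it sums the Poisson probability mass function directly, and the helper name `poisson_expectation` as well as the values of λ, t and α are assumptions chosen for the example.

```python
import math

def poisson_expectation(g, lam_t, n_terms=200):
    """E[g(N)] for N ~ Poisson(lam_t), obtained by summing g(k) * P(N = k)."""
    total = 0.0
    log_p = -lam_t  # log P(N = 0)
    for k in range(n_terms):
        total += g(k) * math.exp(log_p)
        # recursion P(N = k + 1) = P(N = k) * lam_t / (k + 1), in logs
        log_p += math.log(lam_t) - math.log(k + 1)
    return total

lam, t, alpha = 5.0, 2.0, 0.3  # illustrative values (assumptions)

# Property 1: Y_t = exp(alpha * N_t - lam * t * (exp(alpha) - 1))
mean_1 = poisson_expectation(
    lambda k: math.exp(alpha * k - lam * t * (math.exp(alpha) - 1.0)), lam * t
)

# Property 2: Y_t = (1 + alpha) ** N_t * exp(-lam * alpha * t)
mean_2 = poisson_expectation(
    lambda k: (1.0 + alpha) ** k * math.exp(-lam * alpha * t), lam * t
)

print(mean_1, mean_2)
```

Both means should equal 1 up to rounding error. This confirms only that the expected value stays constant, not the full martingale property.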
where we see that the model makes sense if and only if γt < 1. In other words we must exclude the case when Xt
completely loses its value because of a jump. In fact, if we assume γt = 1, then when a jump occurs, the process
Xt takes value zero and it remains at that level (zero is a so-called absorbing barrier ).
Chapter 6
S (t0 ) = S0 ,
where IS is the diagonal matrix containing the asset values and the prime denotes transposition. Thus, for instance,
we could write a model with 2 assets and 3 risk sources as follows (we neglect all the functional dependences for
the sake of simplicity)
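\[ \begin{bmatrix} S_{t,1} & 0 \\ 0 & S_{t,2} \end{bmatrix}^{-1} \begin{bmatrix} dS_{t,1} \\ dS_{t,2} \end{bmatrix} = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} dt + \begin{bmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13} \\ \sigma_{21} & \sigma_{22} & \sigma_{23} \end{bmatrix} \begin{bmatrix} dW_{t,1} \\ dW_{t,2} \\ dW_{t,3} \end{bmatrix}. \]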
Here, we assume that the prices of the n assets are driven by k independent1 risk sources represented by Brownian
motions.
The expected (instantaneous) returns on these assets are
\[ E_t\left[ I_S^{-1} dS_t \right] = \mu(t, S_t)\, dt, \]
Hereafter, we will neglect the functional dependences of both µ and Σ′ with respect to time and space in order
to keep the notation as simple as possible.
Furthermore, on the financial market there is a riskless asset (issued by the Government) whose price G (t) follows a deterministic differential equation:
\[ \frac{dG_t}{G_t} = r_t\, dt. \qquad (6.1.2) \]
Remark 6.1.1. If we know the value in t0 of the asset Gt then the (unique) solution of the differential equation (6.1.2) is
\[ G_t = G_{t_0} e^{\int_{t_0}^{t} r_u du}, \]
and the ratio
\[ \frac{G_t}{G_T} = \frac{G_{t_0} e^{\int_{t_0}^{t} r_u du}}{G_{t_0} e^{\int_{t_0}^{T} r_u du}} = e^{-\int_{t}^{T} r_u du}, \]
for any T > t is the discount factor between t and T .
1 The independence hypothesis is not restrictive since we can always switch from a vector of dependent Wiener processes to a vector of independent Wiener processes (and vice versa) through the Cholesky decomposition of the variance and covariance matrix.
6.2 Portfolio
A portfolio is a linear combination of assets St and Gt whose value Rt is given by
\[ R_t = w_t' S_t + w_{t,G} G_t, \qquad (6.2.1) \]
where wt and wt,G are the number of risky and riskless assets held in the portfolio, respectively (if an element of
wt is negative then the corresponding asset is short sold).
Since the portfolio allocation wt and wt,G may change over time according to the changes in the asset values
then both wt and wt,G must be considered as stochastic variables. This means that the differential of Rt must be
computed as
where the term dwt,G dGt is absent because Gt is deterministic (in other words, the product of the two differentials is an infinitesimal of the same order as (dt)²).
The dynamics of wealth can also be written as
where dRt,1 are changes in wealth due to the changes in asset prices (dSt and dGt ) while dRt,2 are changes in wealth
due to changes in portfolio allocation (dwt and dwt,G ).
The changes in portfolio composition cannot be arbitrarily chosen, since we cannot invest more than our wealth.
Accordingly, we can distinguish three cases:
1. strict self-financing condition: the agent has no more wealth than his portfolio value and he wants neither
to contribute nor to withdraw any money from it; accordingly, in each period, he can invest more in one asset
only if he suitably decreases the amount of money invested in the other assets; this condition can be written
as
dwt′ (St + dSt ) + dwt,G × Gt = 0,
where we see that St + dSt is the new price of asset St , after the period dt;
2. outflows: at each period, the agent withdraws some money from his portfolio in order, for instance, to finance consumption; if we call c (t) dt the amount of consumption in the instant dt, then
dwt′ (St + dSt ) + dwt,G × Gt = −c (t) dt;
3. inflows: at each period, the agent receives some yield y (t) dt and a percentage α (t) of it is invested in the
portfolio; this means that
dwt′ (St + dSt ) + dwt,G × Gt = α (t) y (t) dt.
Case 2 is typical of a pension fund when it starts paying pensions: at each time the pensions paid are deducted from its wealth. Instead, case 3 is typical of a pension fund when it receives contributions from its sponsors, during the so-called «accumulation phase».
Right now we just take into account the case of a strict self-financing portfolio. Accordingly, the wealth differential can be written as
\[ dR_t = w_t' dS_t + w_{t,G}\, dG_t, \qquad (6.2.2) \]
which is the dynamic version of the constraint (6.2.1). Since both constraints must be verified at any time, we
can merge them by taking wt,G from (6.2.1) and plugging it into (6.2.2). Accordingly, we have just one (dynamic)
constraint as follows:
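\[ dR_t = w_t' dS_t + \underbrace{\frac{R_t - w_t' S_t}{G_t}}_{w_{t,G}} dG_t, \]
and after substituting for both dSt and dGt from (6.1.1) and (6.1.2) respectively, we have
\[ dR_t = \Big( R_t r_t + \underbrace{w_t'}_{1 \times n} \underbrace{I_S}_{n \times n} \big( \underbrace{\mu}_{n \times 1} - r_t \underbrace{\mathbf{1}}_{n \times 1} \big) \Big) dt + \underbrace{w_t'}_{1 \times n} \underbrace{I_S}_{n \times n} \underbrace{\Sigma'}_{n \times k} \underbrace{dW_t}_{k \times 1}, \]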
6.3 Arbitrage
A financial market is well defined if there is no arbitrage, which can be defined as a strategy (portfolio) without
risk and whose return is different from rt . Accordingly, θt is an arbitrage if the two following conditions hold
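\[ \underbrace{\theta_t'}_{1 \times n} \underbrace{\Sigma'}_{n \times k} = \underbrace{0}_{1 \times k}, \]
\[ \underbrace{\theta_t'}_{1 \times n} \big( \underbrace{\mu}_{n \times 1} - r_t \underbrace{\mathbf{1}}_{n \times 1} \big) \neq 0. \]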
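For checking whether there is an arbitrage on the financial market, we can use the following result.
Lemma 6.3.1 (Fredholm). One and only one of the two following cases is true:
\[ \exists x \in \mathbb{R}^k : \underbrace{A'}_{n \times k} \underbrace{x}_{k \times 1} = \underbrace{b}_{n \times 1}, \]
\[ \exists y \in \mathbb{R}^n : \underbrace{y'}_{1 \times n} \underbrace{A'}_{n \times k} = 0, \quad \underbrace{y'}_{1 \times n} \underbrace{b}_{n \times 1} \neq 0. \]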
Since in the real world the financial markets are actually arbitrage free, then we never take into account the third
case.
We highlight that Equation (6.3.1) has (at least) a solution if there exists the so-called left inverse of matrix
Σ′ . In particular, the matrix Σ′l is said to be the left inverse of Σ′ if
Σ′l Σ′ = I,
Exercise 6.3.1. A financial market with one risky asset driven by one risk source:
\[ \frac{dS_t}{S_t} = \mu\, dt + \sigma\, dW_t, \]
\[ \frac{dG_t}{G_t} = r\, dt, \]
is always arbitrage free when σ ≠ 0, since there exists the scalar ξt which solves
σξt = µ − r.
Furthermore, ξt coincides with the Sharpe ratio (and, in this case, it is constant). Let us stress that if σ = 0 (i.e. both assets are riskless), then the market is arbitrage free if and only if µ = r (on the financial market there cannot be more than one risk free return).
The vector ξ has a nice economic interpretation. If σ measures the risk and µ − r is the risk premium, then the
ratio between µ − r and σ is the risk premium for any unit of risk: actually, this is the «market price of risk». If
there are k risk sources on the financial market, then there must be k prices of risk. Thus, a financial market works
well (is arbitrage free) if and only if it is able to provide a price for any risk source.
The interpretation of this result is easy: if two assets depend on the same risk source, then their market price
of risk must be the very same!
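As a minimal numerical sketch of this statement, assume a market where every risky asset is driven by the same single Wiener process, with constant drifts µi and non-zero diffusions σi. The function names and the parameter values below are hypothetical and serve only as an illustration.

```python
def market_prices_of_risk(mu, sigma, r):
    """Sharpe ratio (mu_i - r) / sigma_i of each asset; every sigma_i must be non-zero."""
    return [(m - r) / s for m, s in zip(mu, sigma)]

def is_arbitrage_free(mu, sigma, r, tol=1e-12):
    """With one common risk source, no arbitrage requires identical prices of risk."""
    xi = market_prices_of_risk(mu, sigma, r)
    return all(abs(x - xi[0]) <= tol for x in xi)

def riskless_excess_return(mu, sigma, r):
    """Excess return of theta = (1/sigma_1, -1/sigma_2), a portfolio with zero diffusion."""
    theta = [1.0 / sigma[0], -1.0 / sigma[1]]
    return sum(th * (m - r) for th, m in zip(theta, mu))

r = 0.02
sigma = [0.10, 0.20]

consistent = is_arbitrage_free([0.05, 0.08], sigma, r)    # both Sharpe ratios are 0.30
inconsistent = is_arbitrage_free([0.05, 0.09], sigma, r)  # 0.30 against 0.35
excess = riskless_excess_return([0.05, 0.09], sigma, r)   # non-zero: an arbitrage

print(consistent, inconsistent, excess)
```

In the second market the portfolio θ = (1/σ1, −1/σ2) has no diffusion term but a non-zero excess return, which is exactly the definition of arbitrage given in this section.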
Since this equation has infinite solutions, then the financial market is arbitrage free.
Remark 6.3.1. Hereafter, we will always work with arbitrage free financial market.
Since the market is arbitrage free, then also the drift and diffusion terms of this asset must verify the no arbitrage
condition:
σF′ ξt = µF − rt .
We can conclude that on an arbitrage free financial market, the expected return on any asset is given by the
riskless interest rate augmented by the product between the diffusion term and the market price of risk.
Proposition 6.4.1. On an arbitrage free financial market (where ∃ξt : Σ′ ξt = µ−rt 1), the drift of any asset having
diffusion σF′ must be rt + σF′ ξt .
In order to replicate asset Ft , we must look for a portfolio θ such that the investor’s wealth
\[ \frac{dR_t}{R_t} = \left( r_t + \theta_t' (\mu - r_t \mathbf{1}) \right) dt + \theta_t' \Sigma' dW_t, \qquad (6.4.2) \]
coincides with dFt /Ft . The two stochastic processes (6.4.1) and (6.4.2) are equal if both their drifts and diffusions are equal. Nevertheless, the absence of arbitrage allows us to ask just for the diffusion terms to be equal, in fact, if we are able to find a portfolio such that
\[ \underbrace{\theta_t'}_{1 \times n} \underbrace{\Sigma'}_{n \times k} = \underbrace{\sigma_F'}_{1 \times k}, \qquad (6.4.3) \]
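\[ R_t = w_{t,G} G_t + w_t' S_t = F_t, \]
\[ w_{t,G} = \frac{F_t - w_t' S_t}{G_t}, \]
or, which is the same,
\[ \frac{1}{R_t} w_{t,G} G_t + \frac{1}{R_t} w_t' S_t = \frac{F_t}{R_t}, \]
\[ \theta_{t,G} + \theta_t' \mathbf{1} = \frac{F_t}{R_t}, \]
i.e.
\[ \theta_{t,G} = \frac{F_t}{R_t} - \theta_t' \mathbf{1}. \]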
System (6.4.3) has a solution if the matrix Σ′ has a so-called right inverse. In particular, we say that Σ′r is
the right inverse of matrix Σ′ if
Σ′ Σ′r = I.
Nevertheless, we recall that we have already assumed that Σ′ has a left inverse for the market to be arbitrage
free. Accordingly, if we now assume that Σ′ has also the right inverse, then we are assuming that it is invertible (in
fact a matrix which has both the left and the right inverse is invertible). If Σ′ is invertible then Equation (6.3.1)
has only one solution:
\[ \xi_t = (\Sigma')^{-1} (\mu - r_t \mathbf{1}). \]
Proposition 6.4.2. The financial market is complete if and only if there exists only one vector of market price of
risk solving Equation (6.3.1).
No arbitrage: ∃ξt : Σ′ ξt = µ − rt 1
Completeness: ∃!ξt : Σ′ ξt = µ − rt 1
Proposition 6.4.3. In a complete financial market defined by (6.1.1)-(6.1.2), an asset having diffusion σF (as in
6.4.1), is replicated by the portfolio θt = Σ−1 σF .
Σθt − σF ≠ 0,
which means that there will always be an error when trying to replicate an asset with a portfolio. Nevertheless,
we can try to approximate a kind of (non perfect) replicating portfolio. Many choices are available, but one quite
«natural» is to minimize the square of the replicating error:
The second order condition for this problem always holds since the variance-covariance matrix Σ′ Σ is always positive semi-definite. Thus, by solving the first order condition
where we recall that the variance-covariance matrix is invertible because of the no arbitrage condition (i.e. the
existence of the left inverse of matrix Σ′ ).
If this portfolio allocation is replaced in the wealth equation (6.4.2), it becomes
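\[ \frac{dR_t}{R_t} = \left( r_t + \sigma_F' \Sigma (\Sigma' \Sigma)^{-1} (\mu - r_t \mathbf{1}) \right) dt + \sigma_F' \Sigma (\Sigma' \Sigma)^{-1} \Sigma' dW_t. \]
The drift of this equation must coincide with the drift of the asset Ft :
\[ r_t + \sigma_F' \xi_t = r_t + \sigma_F' \Sigma (\Sigma' \Sigma)^{-1} (\mu - r_t \mathbf{1}), \]
and thus, the only market price of risk ξt compatible with this approach is
\[ \xi_t^* = \Sigma (\Sigma' \Sigma)^{-1} (\mu - r_t \mathbf{1}). \]
It is important to stress that this value of the market price of risk minimizes its squared norm ξt′ ξt under the no arbitrage condition Σ′ ξt = µ − rt 1.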
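The minimum-norm property of ξt∗ can be illustrated in the smallest incomplete market: one risky asset (n = 1) driven by two risk sources (k = 2), so that Σ′ is the row vector (σ1, σ2) and Σ′ Σ is a scalar. The sketch below is an assumption-laden illustration; the function names and numbers are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def min_norm_price_of_risk(sigma_row, excess_return):
    """xi* = Sigma (Sigma' Sigma)^{-1} (mu - r) when Sigma' is a single row (n = 1)."""
    scale = excess_return / dot(sigma_row, sigma_row)
    return [s * scale for s in sigma_row]

sigma_row = [0.15, 0.20]   # Sigma' (1 x 2), illustrative values
excess_return = 0.05       # mu - r

xi_star = min_norm_price_of_risk(sigma_row, excess_return)

# Any other solution differs from xi* by a vector orthogonal to sigma_row.
orthogonal = [sigma_row[1], -sigma_row[0]]
xi_other = [x + 1.0 * o for x, o in zip(xi_star, orthogonal)]

no_arbitrage_star = dot(sigma_row, xi_star)    # equals mu - r
no_arbitrage_other = dot(sigma_row, xi_other)  # also equals mu - r
norm_star = dot(xi_star, xi_star)
norm_other = dot(xi_other, xi_other)

print(xi_star, norm_star, norm_other)
```

Both vectors satisfy the no arbitrage condition, but ξt∗ has the smaller squared norm, in line with the statement above.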
A sufficient (but not necessary) condition for (6.6.2) to be a martingale on a given horizon T , is the so-called Novikov’s condition
\[ E_t\left[ e^{\frac{1}{2} \int_t^T \xi_s' \xi_s ds} \right] < \infty. \]
A demonstration of Theorem 6.6.1 can be found in Karatzas and Shreve [1991]. Theorem 6.6.1 allows us to
conclude that if the financial market is arbitrage free (and ξt is such that the Radon-Nikodym derivative is a martingale),
then we can switch from the historical probability P to another probability Q under which the expected return on
any financial asset coincides with the riskless interest rate. This is the reason why the new probability Q is called
risk neutral probability.
Theorem 6.6.1 is not based on the hypothesis of completeness. In fact, it is sufficient that the financial market
is arbitrage free.
Now we take Equation (6.1.1) and we rewrite it under the new probability by using (6.6.1):
\[ \begin{aligned} I_S^{-1} dS_t &= \mu\, dt + \Sigma' dW_t \\ &= \mu\, dt + \Sigma' \left( dW_t^Q - \xi_t dt \right) \\ &= (\mu - \Sigma' \xi_t)\, dt + \Sigma' dW_t^Q. \end{aligned} \]
Since the market is arbitrage free, then µ − Σ′ ξt = rt 1 (from Equation (6.3.1)), and so
\[ I_S^{-1} dS_t = r_t \mathbf{1}\, dt + \Sigma' dW_t^Q. \]
Remark 6.6.1. Girsanov’s theorem allows us to change the drift of any stochastic process, while the diffusion
cannot be changed.
where we have already underlined that Gt /GT is the discount factor between t and T . This is a fundamental result in asset pricing.
Theorem 6.6.2 (Fundamental Theorem of Asset Pricing (I)). On an arbitrage free financial market, the price
of any asset is given by the expected value, under the risk neutral probability, of its future value discounted by
the riskless interest rate.
According to Theorem 6.6.2, the value in t of an asset which pays 1 Euro in T (for sure) is given by
\[ B_{t,T} = E_t^Q\left[ 1 \times \frac{G_t}{G_T} \right] = E_t^Q\left[ e^{-\int_t^T r_u du} \right]. \qquad (6.6.3) \]
This means that the value of a zero coupon bond is given by the expected value of the discount factor (under
the risk neutral probability).
If an asset pays some cash flows δt at any instant t up to time T , and then it is sold in T at the price ST
(unknown in t), its value can be computed as the sum of each cash flow which is interpreted as a single asset (the
strategy of trading cash flows of an asset independently of the main asset is called «stripping») as shown in Figure
6.6.1.
2 We recall that given dGt = Gt rt dt, we have \( d\left( \frac{1}{G_t} \right) = -\frac{1}{G_t} r_t\, dt \).
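[Figure 6.6.1: stripping of an asset. Each cash flow is priced as a separate asset: \( E_{t_0}^Q\left[ \delta_{t_1} \frac{G_{t_0}}{G_{t_1}} \right] \), \( E_{t_0}^Q\left[ \delta_{t_2} \frac{G_{t_0}}{G_{t_2}} \right] \), ..., \( E_{t_0}^Q\left[ (\delta_{t_n} + S_{t_n}) \frac{G_{t_0}}{G_{t_n}} \right] \).]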
Theorem 6.6.3 (Fundamental Theorem of Asset Pricing (II)). On an arbitrage free financial market, the value
of any asset is given by the expected value, under the risk neutral probability, of its future cash flows discounted
by the riskless interest rate.
where \( \frac{dQ_{t|t_0}}{dP_{t|t_0}} \) is the Radon-Nikodym derivative (6.6.2), and Ω is the domain of the stochastic variable X.
Thus, we can always switch from an expected value under Q to an expected value under the historical probability
P and vice versa.
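From Girsanov theorem we know that
\[ \frac{dQ_{t|t_0}}{dP_{t|t_0}} = e^{-\frac{1}{2} \int_{t_0}^{t} \xi_s' \xi_s ds - \int_{t_0}^{t} \xi_s' dW_s} \equiv m_{t_0,t}, \]
or, alternatively,
\[ \begin{aligned} S_{t_0} &= E_{t_0}\left[ S_T\, m_{t_0,T}\, e^{-\int_{t_0}^{T} r_s ds} \right] \\ &= E_{t_0}\left[ S_T\, e^{-\frac{1}{2} \int_{t_0}^{T} \xi_s' \xi_s ds - \int_{t_0}^{T} \xi_s' dW_s} e^{-\int_{t_0}^{T} r_s ds} \right]. \end{aligned} \]
Thus, we can conclude that the value of an asset can also be computed under the historical probability but the discount factor must be stochastic:
\[ S_{t_0} = E_{t_0}\Big[ S_T \underbrace{e^{-\int_{t_0}^{T} \left( r_s + \frac{1}{2} \xi_s' \xi_s \right) ds - \int_{t_0}^{T} \xi_s' dW_s}}_{\text{Stochastic Discount Factor (SDF)}} \Big]. \]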
Since the SDF between time t0 and time t0 is of course equal to 1, then we can write
i.e. under the historical probability the asset prices are martingales if they are discounted by the stochastic discount
factor.
\[ \frac{dS_t}{S_t} + \frac{\delta_t}{S_t} dt = r_t\, dt + \sigma' dW_t^Q, \]
which becomes
\[ \frac{dS_t}{S_t} = \left( r_t - \frac{\delta_t}{S_t} \right) dt + \sigma' dW_t^Q. \]
This means that an asset which pays coupons/dividends has a growth rate lower than that of an asset which
does not pay any cash flow.
Exercise 6.8.1. Let us take into account the simplest case: everything is deterministic (coupons/dividends and
interest rate). The value of an asset paying cash flows δt from t to T is given by
\[ V_t = \int_t^T \delta_s e^{-\int_t^s r_u du} ds. \]
\[ E_t^Q\left[ dV_{t,T} \right] = (V_{t,T}\, r_t - \delta_t)\, dt. \]
If the coupons are deterministic then the value of Vt,T can be simplified as follows
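\[ \begin{aligned} V_{t,T} &= \int_t^T \delta_s E_t^Q\left[ \frac{G_t}{G_s} \right] ds + E_t^Q\left[ \frac{G_t}{G_T} \right] \\ &= \int_t^T \delta_s B_{t,s}\, ds + B_{t,T}. \end{aligned} \]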
Once the values of all the zero-coupon bonds are known, the asset price Vt,T can be easily computed.
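Exercise 6.8.2. An interesting case is that of a perfectly indexed bond whose coupon is equal to the riskless interest rate:
\[ V_{t,T} = E_t^Q\left[ \int_t^T r_s \frac{G_t}{G_s} ds + \frac{G_t}{G_T} \right]. \]
Now we recall that
\[ \frac{dG_s}{G_s} = r_s\, ds, \]
and so we can substitute rs ds in the integral with its corresponding value:
\[ \begin{aligned} V_{t,T} &= E_t^Q\Big[ \int_t^T \frac{G_t}{G_s} \underbrace{\frac{dG_s}{G_s}}_{r_s ds} + \frac{G_t}{G_T} \Big] = E_t^Q\left[ \int_t^T \frac{G_t}{G_s^2} dG_s + \frac{G_t}{G_T} \right] \\ &= E_t^Q\left[ \left[ -\frac{G_t}{G_s} \right]_{s=t}^{s=T} + \frac{G_t}{G_T} \right] = E_t^Q\left[ -\frac{G_t}{G_T} + \frac{G_t}{G_t} + \frac{G_t}{G_T} \right] \\ &= 1. \end{aligned} \]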
The result of this exercise teaches that a perfectly indexed bond must always be listed at par.
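The par result can be verified numerically in the deterministic case. The following sketch assumes a hypothetical, linearly increasing interest rate and uses a midpoint rule for the integrals; the function name `indexed_bond_value` and the rate function are assumptions made only for this illustration.

```python
import math

def indexed_bond_value(rate, t, T, steps=20000):
    """Integral of r_s * G_t/G_s over [t, T] plus G_t/G_T, for a deterministic rate."""
    h = (T - t) / steps
    integral_r = 0.0  # running value of the integral of r_u between t and the left node
    value = 0.0
    for i in range(steps):
        s_mid = t + (i + 0.5) * h
        r_mid = rate(s_mid)
        # discount factor G_t / G_s evaluated at the midpoint of the step
        discount_mid = math.exp(-(integral_r + 0.5 * h * r_mid))
        value += r_mid * discount_mid * h       # coupon r_s ds, discounted
        integral_r += r_mid * h
    return value + math.exp(-integral_r)        # face value paid in T, discounted

value = indexed_bond_value(lambda s: 0.02 + 0.01 * s, t=0.0, T=5.0)
print(value)
```

Whatever deterministic rate function is supplied, the value should stay at 1 up to the discretisation error, since the coupons exactly offset the discounting of the face value.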
Chapter 7
Asset prices
7.1 Forward
The fundamental theorem of asset pricing is a powerful tool which can be used for pricing any derivative once the rule for computing its cash flows is known.
The first derivative officially traded on the Chicago Stock Exchange was a forward contract: two parties agree at time t0 to exchange at time T > t0 an asset, which is called underlying asset (whose value will be ST ), against an amount of money (FT ). The pay-off of the party who receives ST and pays FT will be ST − FT . By applying the fundamental theorem of asset pricing we can conclude that the value in t0 of this forward contract is
\[ F_{t_0,T} = E_{t_0}^Q\left[ (S_T - F_T) \frac{G_{t_0}}{G_T} \right]. \qquad (7.1.1) \]
Both parties are willing to sign the contract if and only if the possible gains and the possible losses compensate,
i.e. if the value of this asset, at time t0 , is nil. This is consistent with the actual price of the forward: when the two
parties enter the contract, no money is exchanged between them and, thus, its price must be zero. This means that
\[ 0 = E_{t_0}^Q\left[ (S_T - F_T) \frac{G_{t_0}}{G_T} \right], \]
from which we can compute the equilibrium value of the forward price FT :
\[ 0 = E_{t_0}^Q\left[ S_T \frac{G_{t_0}}{G_T} \right] - F_T\, E_{t_0}^Q\left[ \frac{G_{t_0}}{G_T} \right]. \]
Since the fundamental theorem of asset pricing must be valid for any asset, and also for the underlying asset (which is supposed not to pay dividends), then
\[ S_{t_0} = E_{t_0}^Q\left[ S_T \frac{G_{t_0}}{G_T} \right], \]
\[ 0 = S_{t_0} - F_T B_{t_0,T}, \]
\[ F_T = \frac{S_{t_0}}{B_{t_0,T}}. \]
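As a small worked example of the forward price, assume a constant riskless rate, so that the zero-coupon bond price is simply e^{−r(T−t0)}. The function names and the numbers below are illustrative assumptions.

```python
import math

def zero_coupon_price(r, maturity):
    """B_{t0,T} for a constant riskless rate r and time to maturity T - t0."""
    return math.exp(-r * maturity)

def forward_contract_value(spot, delivery_price, bond):
    """Value in t0 of receiving S_T and paying the delivery price in T."""
    return spot - delivery_price * bond

def forward_price(spot, bond):
    """Delivery price F_T which makes the contract worth zero in t0."""
    return spot / bond

spot, r, maturity = 100.0, 0.03, 2.0
bond = zero_coupon_price(r, maturity)
f_T = forward_price(spot, bond)
value_at_inception = forward_contract_value(spot, f_T, bond)

print(f_T, value_at_inception)
```

With these numbers the forward price is the spot price capitalised at the riskless rate, and the contract value at inception is zero, as required for both parties to sign.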
7.2 Futures
While the forward contract does not pay any cash flow between its issuance and its maturity, the futures contract asks both parties to settle their position day by day. Thus, the party who is losing money because of a change in the value of the underlying will have to pay a cash flow to the other party. Accordingly, the value of the futures contract must be set to zero at any date (since the debt of the party who is losing money is immediately settled). If we call F̂t,T the value of the futures contract at any time t ∈ [t0 , T ] (where t0 is the date of issuance and T is the maturity), the cash flow paid at each period must be dF̂t,T (i.e. the change in the contract value) and, according to the fundamental theorem of asset pricing, we can write
\[ E_{t_0}^Q\left[ \int_{t_0}^{T} \frac{G_{t_0}}{G_s} d\hat{F}_{s,T} \right] = 0. \]
We know that the expected value of an Itô integral is zero if the integrator F̂t,T is a martingale, i.e. if it solves a stochastic differential equation with zero drift. Thus, we know that
\[ \hat{F}_{t_0,T} = E_{t_0}^Q\left[ \hat{F}_{T,T} \right]. \]
The price of the futures contract at maturity must coincide with the price of the underlying asset (what is a party willing to pay at time T for receiving the price ST at time T ? Exactly ST ). Thus, we can conclude that
\[ \hat{F}_{t_0,T} = E_{t_0}^Q\left[ S_T \right], \]
i.e. the price of a futures contract is the expected value of the price of the underlying, without any discount factor. The discount factor is no longer present since the position on the futures is settled day by day.
7.3 Options
If at maturity T the buyer of the contract has the right to choose whether to actually make the exchange between
ST and FT , the contract becomes an option (in particular a call option, which gives the right to buy an asset at
a given price K, called strike price). The pay-off of the call option is:
• positive: if at the time T the price ST is higher than the strike price K; in this case, in fact, it is convenient to
buy the stock at the strike price since the buyer pays K for an asset whose value is higher; thus, the pay-off
is ST − K;
• nil: if at time T the price ST is lower than K, then it is convenient to buy the asset directly on the market
(at the price ST < K) by quitting the option contract; in this case the pay-off is zero.
where Iε is the indicator function of the event ε whose value is 1 if the event ε happens and zero otherwise:
\[ I_{\varepsilon} = \begin{cases} 1, & \varepsilon \text{ happens}, \\ 0, & \varepsilon \text{ does not happen}. \end{cases} \]
If we buy the right to sell an asset at time T and at a given price K, this is a so-called put option and its value is given by
\[ P_{t_0} = E_{t_0}^Q\left[ (K - S_T)\, I_{K > S_T} \frac{G_{t_0}}{G_T} \right]. \qquad (7.3.2) \]
There exists a relationship between the prices of a call, a put and a forward, which can be found by recalling
that the indicator function of an event is equal to one minus the indicator function of the opposite event. Thus, if
we start from the value of a call, we can write
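\[ \begin{aligned} C_{t_0} &= E_{t_0}^Q\left[ (S_T - K)\, I_{S_T > K} \frac{G_{t_0}}{G_T} \right] = E_{t_0}^Q\left[ (S_T - K) (1 - I_{K > S_T}) \frac{G_{t_0}}{G_T} \right] \\ &= E_{t_0}^Q\left[ (S_T - K) \frac{G_{t_0}}{G_T} \right] + E_{t_0}^Q\left[ (K - S_T)\, I_{K > S_T} \frac{G_{t_0}}{G_T} \right] \\ &= F_{t_0} + P_{t_0}. \end{aligned} \]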
This is the so-called «put-call parity»: the value of a call is equal to the sum of a forward and a put written on the same underlying and having FT = K.
Now, we are going to simplify the price of a call through two changes in probability. After the first passage:
\[ C_{t_0} = E_{t_0}^Q\left[ (S_T - K)\, I_{S_T > K} \frac{G_{t_0}}{G_T} \right] = E_{t_0}^Q\left[ I_{S_T > K}\, S_T \frac{G_{t_0}}{G_T} \right] - K\, E_{t_0}^Q\left[ I_{S_T > K} \frac{G_{t_0}}{G_T} \right], \]
and obtain
\[ E_{t_0}^Q\left[ I_{S_T > K} \frac{G_{t_0}}{G_T} \right] = E_{t_0}^Q\left[ \frac{G_{t_0}}{G_T} \right] \int_{\Omega} I_{S_T > K}\, dF_T = B_{t_0,T}\, E_{t_0}^{F_T}\left[ I_{S_T > K} \right]. \]
Remark 7.3.1. The expected value of the indicator function of an event is the probability of the event:
Et [Iε ] = P {ε} .
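The first part of the option price can be simplified in a similar way:
\[ \begin{aligned} E_{t_0}^Q\left[ I_{S_T > K}\, S_T \frac{G_{t_0}}{G_T} \right] &= \int_{\Omega} I_{S_T > K}\, S_T \frac{G_{t_0}}{G_T} dQ = \int_{\Omega} I_{S_T > K}\, E_{t_0}^Q\left[ S_T \frac{G_{t_0}}{G_T} \right] \frac{S_T \frac{G_{t_0}}{G_T}}{E_{t_0}^Q\left[ S_T \frac{G_{t_0}}{G_T} \right]} dQ \\ &= E_{t_0}^Q\left[ S_T \frac{G_{t_0}}{G_T} \right] \int_{\Omega} I_{S_T > K} \frac{S_T \frac{G_{t_0}}{G_T}}{E_{t_0}^Q\left[ S_T \frac{G_{t_0}}{G_T} \right]} dQ, \end{aligned} \]
and so
\[ \begin{aligned} E_{t_0}^Q\left[ I_{S_T > K}\, S_T \frac{G_{t_0}}{G_T} \right] &= E_{t_0}^Q\left[ S_T \frac{G_{t_0}}{G_T} \right] \int_{\Omega} I_{S_T > K}\, dS = S_{t_0}\, E_{t_0}^{S}\left[ I_{S_T > K} \right] \\ &= S_{t_0}\, S\{S_T > K\}. \end{aligned} \]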
Finally, the value of a call option is
Ct0 = St0 S {ST > K} − KBt0 ,T FT {ST > K} .
If the riskless interest rate r is constant and the asset price St is log-normally distributed, then Ct0 coincides
with the Black and Scholes formula.
Because of the put-call parity we obtain the value of the put option as follows
Pt = Ct − Ft = KBt,T (1 − FT {ST > K}) − St (1 − S {ST > K}) .
Case: Black & Scholes formula
We assume that the interest rate r is constant and that the risky asset price follows, under the probability
Q, a log-normal distribution according to the following formula:
\[ S_T = S_t\, e^{r(T-t) - \frac{1}{2} \sigma^2 (T-t) + \sigma \sqrt{T-t}\, y}, \]
where y is a standard normal random variable (with zero mean and variance equal to 1). This means that the density of the variable y under Q is
\[ dQ = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} y^2}. \]
The riskless asset is given by
GT = Gt er(T −t) .
Now, we compute the value of a call option. In order to do so we have to compute both the forward
probability FT and the probability S.
The density of the forward probability is
\[ dF_T \equiv \frac{\frac{G_t}{G_T}}{E_{t_0}^Q\left[ \frac{G_t}{G_T} \right]} dQ = \frac{e^{-r(T-t)}}{E_{t_0}^Q\left[ e^{-r(T-t)} \right]} dQ = dQ. \]
If the interest rate r is constant, then the forward probability and the risk-neutral probability coincide. With
this result we can compute the probability that ST is higher than the strike price:
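Since y is normal, then we can use the cumulative distribution function of a normal variable (let us call it Φ):
\[ \begin{aligned} Q\left\{ y > \frac{\ln \frac{K}{S_t} - \left( r - \frac{1}{2} \sigma^2 \right) (T-t)}{\sigma \sqrt{T-t}} \right\} &= 1 - \Phi\left( \frac{\ln \frac{K}{S_t} - \left( r - \frac{1}{2} \sigma^2 \right) (T-t)}{\sigma \sqrt{T-t}} \right) \\ &= \Phi\left( -\frac{\ln \frac{K}{S_t} - \left( r - \frac{1}{2} \sigma^2 \right) (T-t)}{\sigma \sqrt{T-t}} \right) \\ &= \Phi\left( \frac{\ln \frac{S_t}{K} + \left( r - \frac{1}{2} \sigma^2 \right) (T-t)}{\sigma \sqrt{T-t}} \right). \end{aligned} \]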
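\[ \begin{aligned} dS &= \frac{e^{\sigma \sqrt{T-t}\, y}}{E_{t_0}^Q\left[ e^{\sigma \sqrt{T-t}\, y} \right]} dQ = \frac{e^{\sigma \sqrt{T-t}\, y}}{e^{\frac{1}{2} \sigma^2 (T-t)}} dQ = e^{-\frac{1}{2} \sigma^2 (T-t) + \sigma \sqrt{T-t}\, y} dQ \\ &= e^{-\frac{1}{2} \sigma^2 (T-t) + \sigma \sqrt{T-t}\, y} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} y^2} = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} \left( y - \sigma \sqrt{T-t} \right)^2}, \end{aligned} \]
where we see that the variable y, under the new probability S, is again normally distributed but its mean is σ√(T − t) instead of zero. This means that, under S, the price of the risky asset can be written as
\[ \begin{aligned} S_T &= S_t\, e^{r(T-t) - \frac{1}{2} \sigma^2 (T-t) + \sigma \sqrt{T-t} \left( y + \sigma \sqrt{T-t} \right)} \\ &= S_t\, e^{r(T-t) + \frac{1}{2} \sigma^2 (T-t) + \sigma \sqrt{T-t}\, y}, \end{aligned} \]
where we see that, with respect to the initial formula, just the sign of σ² has changed. Thus, we obtain: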
If we take into account a portfolio formed by θt,S asset St and θt,B asset Bt,T , then the value of a portfolio is
• a forward can be replicated by buying θt,S = 1 stock and short selling FT bonds (θt,B = −FT );
• a call option can be replicated by buying θt,S = S {ST > K} < 1 stocks and short selling KFT {ST > K} bonds (θt,B = −KFT {ST > K});
• a put option can be replicated by short selling (1 − S {ST > K}) < 1 stocks (θt,S = − (1 − S {ST > K})) and buying θt,B = K (1 − FT {ST > K}) bonds.
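The following sketch evaluates these replicating portfolios in the Black and Scholes case (constant r, log-normal price), where FT {ST > K} = Φ(d2) and S {ST > K} = Φ(d1). It relies on `statistics.NormalDist` from the Python standard library; the function name, the dictionary keys and the numerical inputs are assumptions used only for illustration.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # cumulative distribution function of a standard normal

def black_scholes(s, k, r, sigma, tau):
    """Call, put and forward values with their stock/bond replicating holdings."""
    vol = sigma * math.sqrt(tau)
    d2 = (math.log(s / k) + (r - 0.5 * sigma ** 2) * tau) / vol
    d1 = d2 + vol
    prob_f = PHI(d2)            # F_T{S_T > K}, equal to Q{S_T > K} when r is constant
    prob_s = PHI(d1)            # S{S_T > K}
    bond = math.exp(-r * tau)   # B_{t,T}
    return {
        "call": s * prob_s - k * bond * prob_f,
        "put": k * bond * (1.0 - prob_f) - s * (1.0 - prob_s),
        "forward": s - k * bond,                      # forward with F_T = K
        "call_holdings": (prob_s, -k * prob_f),       # (theta_S, theta_B)
        "put_holdings": (-(1.0 - prob_s), k * (1.0 - prob_f)),
        "bond": bond,
    }

res = black_scholes(s=100.0, k=100.0, r=0.05, sigma=0.2, tau=1.0)
theta_s, theta_b = res["call_holdings"]
replicated_call = theta_s * 100.0 + theta_b * res["bond"]

print(res["call"], res["put"], res["forward"])
```

The call holdings reproduce the call value by construction, and the difference between call and put equals the forward value, which is the put-call parity.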
We assume that we can invest in a stock whose price is St and in a derivative, written on this stock, whose price is Xt (St ). If we can buy/sell θt,S units of the stock and θt,X units of the derivative, the wealth is given by
\[ R_t = \theta_{t,S} S_t + \theta_{t,X} X_t(S_t). \]
If we want to hedge this wealth against changes in St , we must find the suitable portfolio composition (θt,S , θt,X ) which makes the value of the portfolio independent of the changes in St . In mathematical terms we want to set to zero the derivative of Rt with respect to St :
\[ \frac{\partial R_t}{\partial S_t} = \theta_{t,S} + \theta_{t,X} \frac{\partial X_t}{\partial S_t} = 0. \]
Definition 7.4.1. The mathematical derivative of a financial derivative with respect to its underlying asset is called «Delta».
which is called «hedging ratio». In order to hedge a portfolio against a risk, we must buy a number of derivatives
(written on that risk) with respect to stocks which is equal to the opposite of the inverse of the delta.
Sometimes, it is preferable to obtain the hedging ratio as the ratio between two amounts of money by multiplying both sides by Xt /St :
\[ \frac{\theta_{t,X} X_t}{\theta_{t,S} S_t} = -\frac{1}{\frac{\partial X_t}{\partial S_t} \frac{S_t}{X_t}}, \]
where we see that the denominator of the right hand side coincides with the elasticity of the derivative with respect to the underlying (that we call ηX,S ).
Remark 7.4.1. The elasticity can be computed as the ratio between the differentials of two log functions
\[ \frac{\partial X_t}{\partial S_t} \frac{S_t}{X_t} = \frac{d \ln X_t}{d \ln S_t}, \]
and since
d ln Xt = ηX,S d ln St ,
we can conclude that an easy way to estimate the elasticity is to perform an OLS as
d ln Xt = β0 + β1 d ln St + εt ,
where we expect β0 not to be significantly different from zero and β1 is the estimation of the elasticity.
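The regression in Remark 7.4.1 can be sketched with a hand-written OLS on log-differences. The data are synthetic: the derivative price is assumed to be an exact power function of the stock price, so that its true elasticity is known. Prices, function names and the elasticity of 1.8 are all hypothetical.

```python
import math

def log_differences(prices):
    """Discrete counterpart of d ln(price): differences of consecutive log prices."""
    return [math.log(b) - math.log(a) for a, b in zip(prices, prices[1:])]

def ols(x, y):
    """Intercept and slope of the simple regression y = b0 + b1 * x + error."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    return mean_y - b1 * mean_x, b1

true_elasticity = 1.8  # assumption used to build the synthetic derivative prices
stock = [100.0, 102.0, 101.0, 105.0, 103.0, 108.0, 110.0]
derivative = [0.5 * s ** true_elasticity for s in stock]

beta_0, beta_1 = ols(log_differences(stock), log_differences(derivative))
print(beta_0, beta_1)
```

On these noiseless data the slope recovers the elasticity and the intercept is zero; with market data both estimates would of course carry sampling error.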
The Delta plays another important role for measuring the volatility of a derivative. If we compute the differential of Xt (St ), we have
\[ dX_t = \frac{\partial X_t}{\partial S_t} dS_t, \]
from which
\[ V_t\left[ dX_t \right] = \left( \frac{\partial X_t}{\partial S_t} \right)^2 V_t\left[ dS_t \right]. \]
Thus, we see that the variance of a derivative is proportional to the variance of the underlying and the proportion is equal to the square of the Delta.
• the value of the investment Vt evolves over time by following a geometric Brownian motion
\[ \frac{dV_t}{V_t} = \mu\, dt + \sigma\, dW_t, \]
• the objective of the investor is to choose the time to exercise the option (τ ) so that the discounted expected value of the difference between the value Vτ and the cost I is maximised:
\[ \max_{\tau} C_{\tau} = E_0\left[ (V_{\tau} - I)\, e^{-r\tau} \right]. \]
The problem is solved through an easy application of Theorem 2.8.1. The first step is to rewrite the problem by
defining τ as the stopping time when the process Vt reaches, for the first time, a given threshold (let us call it v):
τ = inf {t ≥ 0 : Vt = v} .
where we still have to compute the expected value of the discount factor. The «trick» is to recall that Wt is a martingale and for any constant α we can write
\[ 1 = E_0\left[ e^{-\frac{1}{2} \alpha^2 t + \alpha W_t} \right], \]
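Now, since
\[ V_t = V_0\, e^{\left( \mu - \frac{1}{2} \sigma^2 \right) t + \sigma W_t} \iff W_t = \frac{\ln \frac{V_t}{V_0} - \left( \mu - \frac{1}{2} \sigma^2 \right) t}{\sigma}, \]
then the previous equality can be written as
\[ 1 = E_0\left[ \left( \frac{V_{\tau}}{V_0} \right)^{\frac{\alpha}{\sigma}} e^{\left( -\frac{1}{2} \alpha^2 - \frac{\alpha}{\sigma} \mu + \frac{1}{2} \frac{\alpha}{\sigma} \sigma^2 \right) \tau} \right], \]
and, since Vτ = v:
\[ \left( \frac{V_0}{v} \right)^{\frac{\alpha}{\sigma}} = E_0\left[ e^{-\left( \frac{1}{2} \alpha^2 + \frac{\alpha}{\sigma} \mu - \frac{1}{2} \frac{\alpha}{\sigma} \sigma^2 \right) \tau} \right]. \]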
Now, we are finally able to compute the expected value by choosing α which solves
\[ r = \frac{1}{2} \alpha^2 + \frac{\alpha}{\sigma} \mu - \frac{1}{2} \frac{\alpha}{\sigma} \sigma^2, \]
then, we immediately see that for an infinite threshold (i.e. v → ∞), τ is never reached and the value of the option must be zero. The power in the left hand side is zero for v → ∞ if and only if the exponent is positive and, accordingly, we choose only the positive solution
\[ \beta \equiv -\left( \frac{\mu}{\sigma^2} - \frac{1}{2} \right) + \sqrt{\left( \frac{\mu}{\sigma^2} - \frac{1}{2} \right)^2 + \frac{2r}{\sigma^2}}. \]
Remark 7.5.1. The original optimal stopping time problem has been changed into an optimal threshold problem where we aim at finding the value v above which it is expedient to exercise the option (and start investing).
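The exponent β can be checked against the quadratic equation it comes from. The sketch assumes that β stands for the ratio α/σ, so that the equation for α becomes ½σ²β² + (µ − ½σ²)β − r = 0 and the expected discount factor is (V0/v)^β. Function names and parameter values are illustrative assumptions.

```python
import math

def beta_exponent(mu, sigma, r):
    """Positive root of 0.5*sigma^2*b^2 + (mu - 0.5*sigma^2)*b - r = 0."""
    a = mu / sigma ** 2 - 0.5
    return -a + math.sqrt(a * a + 2.0 * r / sigma ** 2)

def expected_discount_factor(v0, threshold, mu, sigma, r):
    """E_0[exp(-r*tau)] = (V0 / v)**beta for a threshold v above V0."""
    return (v0 / threshold) ** beta_exponent(mu, sigma, r)

mu, sigma, r = 0.03, 0.20, 0.05
beta = beta_exponent(mu, sigma, r)
residual = 0.5 * sigma ** 2 * beta ** 2 + (mu - 0.5 * sigma ** 2) * beta - r

discount_near = expected_discount_factor(100.0, 120.0, mu, sigma, r)
discount_far = expected_discount_factor(100.0, 1e6, mu, sigma, r)

print(beta, residual, discount_near, discount_far)
```

The residual is zero up to rounding, and the expected discount factor shrinks towards zero as the threshold grows, which is the behaviour used above to select the positive root. When µ = r the root is exactly 1.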
Credit risk
with the natural boundary condition (t0 pt0 ) = 1, i.e. the probability to be solvent in t0 given that we are solvent in t0 is, of course, 1.
The previous differential equation can be also written as
\[ \frac{d({}_t p_{t_0})}{({}_t p_{t_0})} = -\frac{\pi_t}{({}_t p_{t_0})} dt, \]
where
\[ \frac{\pi_t}{({}_t p_{t_0})} = \frac{\pi_t}{1 - \int_{t_0}^{t} \pi(s)\, ds} \equiv \lambda_t, \qquad (8.1.1) \]
is often called hazard rate, while we will call it intensity of default (in the literature about the actuarial risk, this measure is called force of mortality). Since both the numerator and the denominator of this ratio are positive then λt must be positive for any t.
Remark 8.1.1. In discrete time, the hazard rate has the following meaning: the number of firms which go bankrupt in a given period (for instance one year) over the number of firms which were solvent at the beginning of the same period.
The solvency probability between t and T can be traced back to the probabilities of solvency between t0 and t and between t0 and T . In particular, if we solve the differential equation
\[ \frac{d({}_t p_{t_0})}{({}_t p_{t_0})} = -\lambda_t\, dt, \]
\[ ({}_{t_0} p_{t_0}) = 1, \]
we have
\[ ({}_t p_{t_0}) = e^{-\int_{t_0}^{t} \lambda_s ds}, \]
or, equivalently,
\[ ({}_T p_{t_0}) = e^{-\int_{t_0}^{T} \lambda_s ds}. \]
By using Bayes rule about conditional probability
\[ P(A|B) = \frac{P(B|A)\, P(A)}{P(B)}, \]
we can write
\[ P(\tau > T | \tau > t) = \frac{P(\tau > t | \tau > T)\, P(\tau > T)}{P(\tau > t)}, \]
where P (τ > T |τ > t) is the probability to be solvent in T given that one is solvent in t (with T > t). We can also write it as (T pt ). The probability of being solvent in t given that we are solvent in T is, of course 1 (i.e. P (τ > t|τ > T ) = 1). Then, since we have
\[ P(\tau > T) = ({}_T p_{t_0}), \]
and
\[ P(\tau > t) = ({}_t p_{t_0}), \]
we can finally write
\[ ({}_T p_t) = \frac{({}_T p_{t_0})}{({}_t p_{t_0})} = \frac{e^{-\int_{t_0}^{T} \lambda_s ds}}{e^{-\int_{t_0}^{t} \lambda_s ds}} = e^{-\int_{t}^{T} \lambda_s ds}. \qquad (8.1.2) \]
It is worth noting the analogy between rt and λt and between the value of a zero coupon Bt,T and the value of the survival probability (T pt ).
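Relation (8.1.2) can be illustrated with a deterministic, time-varying intensity of default. The intensity function below is a hypothetical choice, and the integral is approximated with a midpoint rule; none of the names come from the text.

```python
import math

def integrate(f, a, b, steps=10000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def survival_probability(intensity, start, end):
    """Probability of being solvent at `end` given solvency at `start`."""
    return math.exp(-integrate(intensity, start, end))

def intensity(s):
    return 0.01 + 0.002 * s  # hypothetical intensity of default

t0, t, T = 0.0, 2.0, 5.0
p_t0_T = survival_probability(intensity, t0, T)   # (T p t0)
p_t0_t = survival_probability(intensity, t0, t)   # (t p t0)
p_t_T = survival_probability(intensity, t, T)     # (T p t)

print(p_t0_T, p_t0_t, p_t_T, p_t0_T / p_t0_t)
```

The last two printed numbers should coincide: the conditional survival probability between t and T is the ratio of the two unconditional ones, exactly as a discount factor between t and T is the ratio of two values of the riskless asset.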
i.e. the expected value of the indicator function that the default has not happened yet at time T (i.e. τ > T ).
Since a probability always belongs to the domain [0, 1], from (8.2.2) we see that λt must belong to the domain [0, +∞[. A negative value of λt would imply a solvency probability higher than 1, which does not make any sense.
When we want to price an asset whose cash flows depend on the default date τ , we have to take into account
the expected value computed with respect to both the financial market risk (summarised by Q) and the credit risk
(summarised by τ ). Thus, the expected value takes the following form
\[ E_{t,t}^{Q,\tau}[\bullet] \equiv E^{Q,\tau}\left[ \bullet \,|\, \mathcal{F}_t \wedge \mathcal{G}_t \right], \]
where we see that it is computed with respect to two information sets:
Table 8.2.1: Comparison between the value of a zero-coupon bond and the default probability
Now, we can use a trick due to Lando [1998] and write the expected value by using the rule of iterated expected values (the so-called tower property for expected values)
\[ E_t^Q\left[ E_T^Q[\bullet] \right] = E_t^Q[\bullet], \]
as follows
\[ B_0(t,T) = E_{t,t}^{Q,\tau}\left[ E_{T,t}^{Q,\tau}\left[ I_{\tau > T}\, e^{-\int_t^T r_u du} \right] \right], \]
where the inner expected value has a bigger information set for what concerns the financial risk (Q is taken at time T ). Actually, at time T , all the financial variables (here just the interest rate) are known and so can be taken outside the inner expected value:
\[ B_0(t,T) = E_{t,t}^{Q,\tau}\left[ e^{-\int_t^T r_u du}\, E_{T,t}^{Q,\tau}\left[ I_{\tau > T} \right] \right]. \]
Now, we have the expected value of the solvency indicator function, as in (8.2.2), and we can accordingly write
\[ B_0(t,T) = E_{t,t}^{Q,\tau}\left[ e^{-\int_t^T r_u du}\, e^{-\int_t^T \lambda_u du} \right] = E_t^Q\left[ e^{-\int_t^T (r_u + \lambda_u) du} \right], \]
in fact a bond where there is more risk must have a lower price (in order to have a higher return). In particular,
if we compare the return on a non-defaultable bond (which is rt ) and the return on a defaultable bond (which is
rt + λt ), the difference is exactly the default intensity λt .
When default happens, the issuer of the zero-coupon is often able to pay back a percentage of the face value.
We call this percentage the «recovery rate» ϕt . Thus, the possible cash flows of this asset are given by:
• 1 at the maturity T if the default hasn’t happened yet;
8 Credit risk
where the first part can be simplified as in the previous case, while in the second part we can write
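$$ \mathbb{E}^{\mathbb{Q},\tau}_{t,t}\left[\mathbb{E}^{\mathbb{Q},\tau}_{\infty,t}\left[\phi_\tau\, \mathbb{I}_{\tau\le T}\, e^{-\int_t^\tau r_u\,du}\right]\right], $$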
where we have taken, for the financial risk, the biggest filtration (that corresponding to an infinite maturity).
Unfortunately, all the stochastic variables inside the inner expected value depend on τ and, thus, cannot be taken
outside the inner expected value. Nevertheless, the inner expected value allows us to compute it by using the
marginal density function of τ (i.e. π (τ )) since all the financial risk is known when t → ∞. Thus, we can simplify
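the previous formula as
$$ \mathbb{E}^{\mathbb{Q},\tau}_{t,t}\left[\mathbb{E}^{\mathbb{Q},\tau}_{\infty,t}\left[\int_t^\infty \phi_s\, \mathbb{I}_{s\le T}\, e^{-\int_t^s r_u\,du}\, \pi_s\, ds\right]\right], $$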
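where the indicator function can be eliminated from the integrand by changing the upper bound of the
integral. Recalling that, consistently with (8.2.2), the density of the default date is $\pi_s = \lambda_s e^{-\int_t^s \lambda_u\,du}$, we obtain:
$$ \mathbb{E}^{\mathbb{Q}}_t\left[\int_t^T \phi_s \lambda_s\, e^{-\int_t^s (r_u+\lambda_u)\,du}\, ds\right]. $$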
We can see that the value of this asset looks like a coupon bond, whose coupons are equal to ϕt λt , and at
maturity the face value is paid back. The discount rate is, once again, rt + λt , as before.
The price of a zero-coupon looks like the price of a coupon bond since the recovery rate ϕt may be paid at any
time and, thus, when weighted by the default probability, it works like a coupon.
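The coupon-like role of the recovery can be seen numerically. The sketch below assumes constant r, λ and ϕ (hypothetical values, chosen only for illustration), so that the recovery part of the price has the closed form ϕλ(1 − exp(−(r + λ)(T − t)))/(r + λ), and compares it with a numerical integration of the integrand.

```r
# Zero-coupon bond with recovery: survival part plus recovery "coupons".
# Assumption (for illustration only): constant r, lambda and phi.
r <- 0.03; lambda <- 0.02; phi <- 0.4; horizon <- 5

# Face value received at maturity if no default has occurred
survival_part <- exp(-(r + lambda) * horizon)

# Recovery part: integral of phi * lambda * exp(-(r + lambda) * s) over [0, T - t]
recovery_num <- integrate(function(s) phi * lambda * exp(-(r + lambda) * s),
                          lower = 0, upper = horizon)$value
recovery_closed <- phi * lambda * (1 - exp(-(r + lambda) * horizon)) / (r + lambda)

price_with_recovery <- survival_part + recovery_closed
print(c(no_recovery = survival_part, with_recovery = price_with_recovery))
```

With these inputs the recovery raises the price above the zero-recovery case, while the bond remains cheaper than a non-defaultable one.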
δt = a + rt ,
This means that such an asset should always be quoted above par. Nevertheless, during the last financial crisis,
many indexed assets (with a spread) fell below par. How can we explain that?
8.5 Credit Default Swap (CDS)
The answer is in the credit risk. If we take into account the credit risk, the value of such an asset is
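$$ \mathbb{E}^{\mathbb{Q},\tau}_{t,t}\left[\int_t^T (a+r_s)\, \mathbb{I}_{s<\tau}\, e^{-\int_t^s r_u\,du}\, ds + \mathbb{I}_{T<\tau}\, e^{-\int_t^T r_u\,du} + \mathbb{I}_{T\ge\tau}\, \phi_\tau\, e^{-\int_t^\tau r_u\,du}\right], $$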
which simplifies to
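$$
\begin{aligned}
&\mathbb{E}^{\mathbb{Q}}_t\left[\int_t^T (a+r_s)\, e^{-\int_t^s (\lambda_u+r_u)\,du}\, ds + e^{-\int_t^T (\lambda_u+r_u)\,du} + \int_t^T \lambda_s \phi_s\, e^{-\int_t^s (\lambda_u+r_u)\,du}\, ds\right] \\
&= \mathbb{E}^{\mathbb{Q}}_t\left[\int_t^T (a+r_s+\lambda_s\phi_s)\, e^{-\int_t^s (\lambda_u+r_u)\,du}\, ds + e^{-\int_t^T (\lambda_u+r_u)\,du}\right].
\end{aligned}
$$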
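$$ a + r_s + \lambda_s\phi_s = a + r_s + \lambda_s - (1-\phi_s)\lambda_s, $$
$$
\begin{aligned}
&\mathbb{E}^{\mathbb{Q}}_t\left[\int_t^T \left(a + r_s + \lambda_s - (1-\phi_s)\lambda_s\right) e^{-\int_t^s (\lambda_u+r_u)\,du}\, ds + e^{-\int_t^T (\lambda_u+r_u)\,du}\right] \\
&= \underbrace{\mathbb{E}^{\mathbb{Q}}_t\left[\int_t^T (r_s+\lambda_s)\, e^{-\int_t^s (\lambda_u+r_u)\,du}\, ds + e^{-\int_t^T (\lambda_u+r_u)\,du}\right]}_{1} \\
&\quad + \mathbb{E}^{\mathbb{Q}}_t\left[\int_t^T \left(a - (1-\phi_s)\lambda_s\right) e^{-\int_t^s (\lambda_u+r_u)\,du}\, ds\right] \\
&= 1 + \mathbb{E}^{\mathbb{Q}}_t\left[\int_t^T \left(a - (1-\phi_s)\lambda_s\right) e^{-\int_t^s (\lambda_u+r_u)\,du}\, ds\right],
\end{aligned}
$$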
where we see that this asset can be quoted below par if λt is sufficiently high. So, during a crisis, when the credit
risk is very high (i.e. when λt increases), the value a − (1 − ϕs ) λs could be negative. The term 1 − ϕt is known as
the «Loss Given Default» (LGD).
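A small numerical illustration may help. Assuming constant a, r, λ and ϕ (hypothetical values; the function name indexed_asset_value is introduced here only for the example), the last formula becomes 1 + (a − (1 − ϕ)λ)(1 − exp(−(r + λ)(T − t)))/(r + λ), and the asset is below par exactly when the spread a is lower than the expected loss rate (1 − ϕ)λ.

```r
# Value of an asset paying the indexed coupon a + r, with credit risk.
# Assumption (for illustration only): constant a, r, lambda and phi.
indexed_asset_value <- function(a, r, lambda, phi, horizon) {
  # Integral of exp(-(r + lambda) * s) over [0, horizon]
  annuity <- (1 - exp(-(r + lambda) * horizon)) / (r + lambda)
  1 + (a - (1 - phi) * lambda) * annuity
}

# Calm market: the spread a exceeds the expected loss rate (1 - phi) * lambda
calm   <- indexed_asset_value(a = 0.01, r = 0.03, lambda = 0.005, phi = 0.4, horizon = 5)
# Crisis: same spread, much higher default intensity
crisis <- indexed_asset_value(a = 0.01, r = 0.03, lambda = 0.05, phi = 0.4, horizon = 5)

print(c(calm = calm, crisis = crisis))
```

In the calm scenario the value is above 1, while in the crisis scenario it drops below 1.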
This formula allows us to compute the value of δ through the following simplifications:
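$$ \delta\, \mathbb{E}^{\mathbb{Q}}_t\left[\int_t^T e^{-\int_t^s (\lambda_u+r_u)\,du}\, ds\right] = \mathbb{E}^{\mathbb{Q}}_t\left[\int_t^T \lambda_s (1-\phi_s)\, e^{-\int_t^s (\lambda_u+r_u)\,du}\, ds\right], $$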
[Figure: structure of a Credit Default Swap. Parties: protection buyer (Firm A), protection seller (Firm B), reference entity (Firm C). Flows: premia (spread), debt, payment, no payment.]
Table 8.5.1: Spreads (and implied default probabilities) for 5 year CDS on sovereign debts
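$$ \delta = \frac{\mathbb{E}^{\mathbb{Q}}_t\left[\int_t^T \lambda_s (1-\phi_s)\, e^{-\int_t^s (\lambda_u+r_u)\,du}\, ds\right]}{\mathbb{E}^{\mathbb{Q}}_t\left[\int_t^T e^{-\int_t^s (\lambda_u+r_u)\,du}\, ds\right]}. $$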
We can conclude that the value of a spread for a CDS is given by the weighted mean of the terms λs (1 − ϕs ),
where the weights are given by the discount factors (at the rate ru + λu ).
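The weighted-mean interpretation can be sketched numerically. The example assumes deterministic r and ϕ and a hypothetical, linearly increasing intensity λ(s), so that the expected values reduce to ordinary integrals. The helper functions implied_lambda and implied_pd are illustrative names and rely on the further assumption of a constant intensity, under which the spread collapses to λ(1 − ϕ).

```r
# CDS spread as a discount-weighted mean of lambda_s * (1 - phi_s).
# Assumptions (for illustration only): deterministic r, phi and lambda(s).
r <- 0.03; phi <- 0.4; horizon <- 5
lambda_fun <- function(s) 0.01 + 0.004 * s   # hypothetical increasing intensity

# Discount factor exp(-int_0^s (lambda_u + r) du), with the inner integral in closed form
discount <- function(s) exp(-(r * s + 0.01 * s + 0.002 * s^2))

numerator   <- integrate(function(s) lambda_fun(s) * (1 - phi) * discount(s), 0, horizon)$value
denominator <- integrate(discount, 0, horizon)$value
cds_spread  <- numerator / denominator

# With a constant intensity the spread equals lambda * (1 - phi); inverting this
# relation gives the default probability implied by a quoted spread.
implied_lambda <- function(spread, phi) spread / (1 - phi)
implied_pd <- function(spread, phi, horizon) 1 - exp(-implied_lambda(spread, phi) * horizon)

print(c(spread = cds_spread, implied_pd_5y = implied_pd(cds_spread, phi, horizon)))
```

The resulting spread lies between the smallest and the largest value taken by λ(s)(1 − ϕ) on the interval, as expected from a weighted mean.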
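Through the mean value theorem, we know that there exist constants λ̂ and ϕ̂ such that
$$ \frac{\mathbb{E}^{\mathbb{Q}}_t\left[\int_t^T \lambda_s (1-\phi_s)\, e^{-\int_t^s (\lambda_u+r_u)\,du}\, ds\right]}{\mathbb{E}^{\mathbb{Q}}_t\left[\int_t^T e^{-\int_t^s (\lambda_u+r_u)\,du}\, ds\right]} = \frac{\mathbb{E}^{\mathbb{Q}}_t\left[\int_t^T \hat{\lambda} \left(1-\hat{\phi}\right) e^{-\int_t^s (\lambda_u+r_u)\,du}\, ds\right]}{\mathbb{E}^{\mathbb{Q}}_t\left[\int_t^T e^{-\int_t^s (\lambda_u+r_u)\,du}\, ds\right]}, $$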
Risk measures
Axiom 1 (Translation invariance). Given the P&L X, if some money is invested in a riskless activity whose
(non stochastic) payoff is a constant k, then:
Ψ (X + k) = Ψ (X) − k.
In other words, if investing in an asset could imply a loss of 1 Euro (per day), then investing in this asset and
having a riskless return of 1 Euro, would lead to a total risk of losing nothing.
Note that the property Ψ (0) = 0 together with Axiom 1 implies
Ψ (k) = −k,
that is if we invest in a riskless asset whose return is k (Euros), then our «risk» will actually be to gain k Euros (so
the risk function must take a negative value just for indicating this opportunity to gain something).
This axiom also implies that in order to set the risk to zero, we must invest in a riskless asset an amount of
money such that the return on this riskless asset is equal to the risk on the risky asset, i.e. k = Ψ (X). In this case,
in fact, if we apply Axiom 1, we have
Ψ (X + Ψ (X)) = Ψ (X) − Ψ (X) = 0.
Axiom 2 (Monotonicity). Given two P&L variables X1 and X2 such that X1 ≥ X2 in any state of the world,
then
Ψ (X1 ) ≤ Ψ (X2 ) .
The meaning of this axiom is that if one asset always pays more than another one, then a good risk measure
must indicate that the first asset is better (i.e. the first asset must imply a lower risk).
To understand this idea better, let us consider Axiom 1 again. Let us define X2 = X1 + k, i.e.
the second asset is actually a portfolio formed by the first asset and a riskless asset. Then we can write
Ψ (X2 ) = Ψ (X1 + k) = Ψ (X1 ) − k,
so that, for k ≥ 0, we have X2 ≥ X1 and Ψ (X2 ) ≤ Ψ (X1 ), in agreement with Axiom 2.
This axiom can be interpreted as «diversification is good»: the risk of a portfolio should never be greater than
the sum of the risks computed on each asset separately.
Ψ (αX) = αΨ (X) .
The meaning of this final axiom is that if we risk losing 2.5 Euros when we buy 1 asset, then when we buy 2 assets
we must risk losing 5 Euros.
Definition 9.1.1. Any risk measure Ψ (X) that satisfies the Axioms 1–2–3–4 is said to be coherent.
Proposition 9.1.1. Any convex combination of coherent risk measures is a coherent risk measure.
Given n risk measures Ψi (X), with i ∈ {1, 2, ..., n}, and some weights ci ≥ 0 such that $\sum_{i=1}^{n} c_i = 1$, the
new risk measure
$$ \hat{\Psi}(X) = \sum_{i=1}^{n} c_i \Psi_i(X) $$
is coherent. The proof is easy and is left to the reader (it is sufficient to check all the axioms).
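As a sketch of the check, two of the four axioms are worked out below: translation invariance and sub-additivity, the latter taken in the form $\Psi(X_1+X_2) \le \Psi(X_1)+\Psi(X_2)$. Monotonicity and positive homogeneity follow in the same way, using $c_i \ge 0$.

```latex
% Axiom 1 (translation invariance): uses \sum_i c_i = 1
\hat{\Psi}(X+k) = \sum_{i=1}^{n} c_i \Psi_i(X+k)
               = \sum_{i=1}^{n} c_i \left(\Psi_i(X) - k\right)
               = \hat{\Psi}(X) - k \sum_{i=1}^{n} c_i
               = \hat{\Psi}(X) - k .

% Axiom 3 (sub-additivity): uses c_i \ge 0
\hat{\Psi}(X_1+X_2) = \sum_{i=1}^{n} c_i \Psi_i(X_1+X_2)
                   \le \sum_{i=1}^{n} c_i \left(\Psi_i(X_1) + \Psi_i(X_2)\right)
                   = \hat{\Psi}(X_1) + \hat{\Psi}(X_2) .
```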
Remark 9.1.1. Any risk measure Ψ (X) inherits the unit of measure of the variable X. So if X is in Euros,
also Ψ (X) will provide a risk that is measured as the amount of Euros that we risk to lose. Instead, if X is a
percentage (or a return), also Ψ (X) will be a return.
Remark 9.2.1. Since the variance V [X] does not have the same unit of measure as X, we take as a risk measure
the standard deviation: $\sqrt{\mathbb{V}[X]}$.
Thus the variance does not satisfy Axiom 1 and then it is not coherent. Nevertheless, we can continue.
9.3 Representation theorem
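or
$$ \sqrt{\mathbb{V}[X_1+X_2]} \le \sqrt{\left(\sqrt{\mathbb{V}[X_1]} + \sqrt{\mathbb{V}[X_2]}\right)^2}, $$
and
$$ \sqrt{\mathbb{V}[X_1+X_2]} \le \sqrt{\mathbb{V}[X_1]} + \sqrt{\mathbb{V}[X_2]}, $$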
which is exactly Axiom 3.
Finally, the homogeneity is easy to check (recall that α is a positive constant):
$$ \sqrt{\mathbb{V}[\alpha X]} = \sqrt{\alpha^2\, \mathbb{V}[X]} = \alpha \sqrt{\mathbb{V}[X]}. $$
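So neither the variance nor the standard deviation is a coherent risk measure: both fail Axiom 1, since adding a constant k to X leaves them unchanged instead of reducing them by k.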
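The behaviour of the standard deviation with respect to Axioms 1, 3 and 4 can be checked on simulated data. The sketch below uses hypothetical, normally distributed P&L variables and R's sample standard deviation sd.

```r
# Standard deviation checked against the coherence axioms on simulated P&L.
set.seed(42)
X1 <- rnorm(1000)
X2 <- 0.5 * X1 + rnorm(1000)   # correlated with X1 (hypothetical data)
k <- 2   # riskless payoff
a <- 3   # positive scaling constant

translation_ok <- isTRUE(all.equal(sd(X1 + k), sd(X1) - k))   # Axiom 1
subadditive_ok <- sd(X1 + X2) <= sd(X1) + sd(X2)              # Axiom 3
homogeneous_ok <- isTRUE(all.equal(sd(a * X1), a * sd(X1)))   # Axiom 4

print(c(axiom1 = translation_ok, axiom3 = subadditive_ok, axiom4 = homogeneous_ok))
```

Only the first check fails: shifting X1 by k does not change its standard deviation.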
Theorem 9.3.1. A risk measure Ψ (X) is coherent if and only if there exists a family of probabilities $\mathcal{P}$ (on the
states of the world) such that
$$ \Psi(X) = -\inf_{\mathbb{P}\in\mathcal{P}} \left\{ \mathbb{E}^{\mathbb{P}}[X] \right\}. \qquad (9.3.1) $$
The theorem states that, given a family of probabilities, we have to choose the probability (belonging to that
family) which minimizes an expected value.
The proof that (9.3.1) satisfies all the coherence axioms is very easy indeed¹, while the proof of the other part of
the theorem (the «only if» part) is much more complicated.
The representation theorem implies what follows.
1. The minus sign in (9.3.1) makes the risk measure positive. In fact, it is reasonable to assume that the minimum
of the risky asset returns (under any probability distribution) is negative.
2. The possibility of choosing the probability (P) under which the risk measure is computed allows us to create
an infinite number of coherent risk measures. This result is positive because it allows us to adjust the risk measure
to the preferences of the economic agent who is computing it, but it is also negative because it states that it is
not possible to create a fully «objective» risk measure.
1 For the proof of the sub-additivity, we recall that the infimum of a sum is never lower than the sum of the infima.
Table 9.3.1: Three families of probabilities (P) which contain at least one zero and equally distributed events

Upper-left part (one zero)
X       P1     P2     P3     P4
−10     0      1/3    1/3    1/3
−5      1/3    0      1/3    1/3
0       1/3    1/3    0      1/3
20      1/3    1/3    1/3    0

Upper-right part (two zeros)
X       P1     P2     P3     P4     P5     P6
−10     0      0      0      1/2    1/2    1/2
−5      0      1/2    1/2    0      0      1/2
0       1/2    0      1/2    0      1/2    0
20      1/2    1/2    0      1/2    0      0

Lower part (three zeros)
X       P1     P2     P3     P4
−10     0      0      0      1
−5      0      0      1      0
0       0      1      0      0
20      1      0      0      0
E [X]   20     0      −5     −10
− inf Pi ∈P3 { EPi [X] } = 10
Now, we show how to create a coherent risk measure by using the representation theorem. We start from an example
of a stochastic variable taking only four values: X ∈ {−10, −5, 0, 20}.
In order to compute a (coherent) risk measure we must firstly choose a family of probabilities and then find the
probability (belonging to that family) which minimizes the expected value of X. Just as an example, we list three
possible families of probabilities.
1. The family giving zero probability to one state of the world and the same probability to the others; we show
this case in the upper-left part of Table 9.3.1; this family contains four distributions.
2. The family giving zero probability to two states of the world and the same probability to the others; we show
this case in the upper-right part of Table 9.3.1; this family contains six distributions.
3. The family giving zero probability to three states of the world and probability 1 to one state of the world; we
show this case in the lower part of Table 9.3.1; this family contains four distributions.
Of course, the expected value changes according to the probability family which is chosen (this is the subjective
part of the risk measure) as shown in Table 9.3.1. Once the family (P) has been chosen, we take the probability
(P) which minimizes the expected value.
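The procedure just described can be sketched in R for the three families listed above. The helper names make_family and risk_measure are introduced only for this illustration; each family is stored as a matrix with one probability distribution per column.

```r
# Coherent risk measure from the representation theorem:
# opposite of the minimum, over the family, of the expected value of X.
X <- c(-10, -5, 0, 20)

# Every way of giving equal probability to m states and zero to the others
make_family <- function(n_states, m) {
  supports <- combn(n_states, m)   # each column lists the states with positive probability
  apply(supports, 2, function(idx) {
    p <- numeric(n_states)
    p[idx] <- 1 / m
    p
  })                               # one probability distribution per column
}

# family * X multiplies each column elementwise by X; colSums gives E_P[X]
risk_measure <- function(X, family) -min(colSums(family * X))

family1 <- make_family(4, 3)   # one zero:    4 distributions
family2 <- make_family(4, 2)   # two zeros:   6 distributions
family3 <- make_family(4, 1)   # three zeros: 4 distributions

print(c(risk_measure(X, family1), risk_measure(X, family2), risk_measure(X, family3)))
```

The three risk measures are 5, 7.5 and 10: the value obtained under the third family is the one reported in the lower part of Table 9.3.1.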
9.4 Expected Shortfall
We see that under all the families proposed in Table 9.3.1, the risk measure coincides with the opposite of the mean of the worst
results (the three, two and one worst outcomes, respectively).
Proposition 9.3.1. The opposite of the mean of a given number of worst returns is a coherent risk measure.
A very particular case is given by the probability family which gives the same probability to all the events. In
this case we are computing the mean of the variable X and, accordingly, we can conclude that the opposite of the
mean return is a coherent risk measure.
Example 9.3.1. By using the values in Table 9.3.2 we want to compute the risk measure as the opposite of
the mean of the two worst scenarios.
In this case we have
$$ \Psi(X_1) = -\frac{-5-4}{2} = 4.5, $$
$$ \Psi(X_2) = -\frac{-2+0}{2} = 1. $$
Axiom 1 is satisfied since we have
$$ \Psi(X_1+2) = -\frac{-3-2}{2} = 2.5 = \Psi(X_1) - 2. $$
Axiom 2 is satisfied since X2 > X1 in any state of the world and, accordingly, Ψ (X2 ) < Ψ (X1 ).
Axiom 3 is satisfied since
$$ \Psi(X_1+X_2) = -\frac{-5-3}{2} = 4 < \Psi(X_1) + \Psi(X_2). $$
Axiom 4 is satisfied since
$$ \Psi(2X_1) = -\frac{-10-8}{2} = 9 = 2\Psi(X_1). $$
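The computations of this example can be reproduced with a few lines of R. The ten equally likely scenarios for X1 and X2 are taken from the scenario table reported in the section on the Value at Risk, on the assumption that it contains the same data (they reproduce all the figures above). The function name worst_mean is hypothetical.

```r
# Example 9.3.1: opposite of the mean of the two worst scenarios.
X1 <- c(-5, -4, -3, -2, -1, 0, 1, 2, 3, 4)
X2 <- c( 2,  1, -2,  1,  0, 1, 4, 3, 5, 6)

# Opposite of the mean of the m smallest outcomes
worst_mean <- function(X, m = 2) -mean(sort(X)[seq_len(m)])

psi1 <- worst_mean(X1)
psi2 <- worst_mean(X2)

axiom1 <- worst_mean(X1 + 2) == psi1 - 2        # translation invariance
axiom2 <- all(X2 >= X1) && psi2 <= psi1         # monotonicity
axiom3 <- worst_mean(X1 + X2) <= psi1 + psi2    # sub-additivity
axiom4 <- worst_mean(2 * X1) == 2 * psi1        # positive homogeneity

print(c(psi1 = psi1, psi2 = psi2))
print(c(axiom1, axiom2, axiom3, axiom4))
```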
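Figure 9.4.1: Relationship between the density function f (X) and the distribution function F (X)
[Figure: upper panel, density f (X) against X, with the area α to the left of −γ (maximum loss which happens with prob. α); lower panel, distribution function F (X) against X, with the level α on the vertical axis corresponding to F^{−1} (α) on the horizontal axis.]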
Table 9.4.1: Meaning of the distribution and density functions (as shown in Figure 9.4.1)
Variable                                   Description
f (X)                                      Density of X
F (−γ) ≡ ∫_{−∞}^{−γ} f (X) dX = α          Probability (α) that X takes values smaller than −γ
F^{−1} (α) = −γ                            Maximum loss (−γ) which happens with probability at least α
Let us compute, for instance, the mean of the losses greater than a threshold −γ as in Figure 9.4.1 (the meaning
of the symbols is summarised in Table 9.4.1), where −γ is the α−quantile of the distribution.
A conditional expected value is computed as follows:
$$ \mathbb{E}\left[X \mid X < -\gamma\right] = \frac{\int_{-\infty}^{-\gamma} X f(X)\,dX}{\int_{-\infty}^{-\gamma} f(X)\,dX} = \frac{1}{\alpha}\int_{-\infty}^{-\gamma} X f(X)\,dX. $$
Definition 9.4.1. Given a distribution function F (X), the expected shortfall at the confidence level α is
$$ ES_\alpha = -\frac{1}{\alpha}\int_0^\alpha F^{-1}(p)\,dp. \qquad (9.4.1) $$
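The link between this definition and the conditional mean of the tail is a change of variable. A sketch, assuming that F is continuous and strictly increasing, so that $F^{-1}$ exists:

```latex
% Substitution p = F(X), so that dp = f(X) dX and X = F^{-1}(p);
% the bounds X = -\infty and X = -\gamma become p = 0 and p = F(-\gamma) = \alpha.
\frac{1}{\alpha}\int_{-\infty}^{-\gamma} X f(X)\,dX
  = \frac{1}{\alpha}\int_{0}^{F(-\gamma)} F^{-1}(p)\,dp
  = \frac{1}{\alpha}\int_{0}^{\alpha} F^{-1}(p)\,dp ,
\qquad\text{so that}\qquad
ES_\alpha = -\,\mathbb{E}\left[X \mid X < -\gamma\right].
```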
We immediately see that when α = 1 the expected shortfall coincides with the (opposite of the) mean of X. Let
us compute the value of the expected shortfall when α → 0:
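$$ \lim_{\alpha\to 0} ES_\alpha = -\lim_{\alpha\to 0} \frac{\frac{\partial}{\partial\alpha}\int_0^\alpha F^{-1}(p)\,dp}{\frac{\partial}{\partial\alpha}\alpha} = -\lim_{\alpha\to 0} F^{-1}(\alpha) = -F^{-1}(0), $$
where we have applied de l’Hôpital’s rule. So, in this case, the risk measure coincides with the biggest loss.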
Example 9.4.1. We now compute the expected shortfall for a stochastic variable uniformly distributed on the
domain [a, b].
The density function is
$$ f(X) = \frac{1}{b-a}, $$
and the distribution function is
$$ F(X) = \int_a^X f(z)\,dz = \frac{X-a}{b-a}, $$
which can be inverted:
$$ F^{-1}(\alpha) = a + \alpha(b-a). $$
Finally, the expected shortfall is
$$ ES_\alpha = -\frac{1}{\alpha}\int_0^\alpha F^{-1}(p)\,dp = -\frac{1}{\alpha}\int_0^\alpha \left(a + p(b-a)\right)dp = -a - \frac{1}{2}\alpha(b-a). $$
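The closed form can be compared with a numerical integration of the quantile function. In the sketch below the bounds a and b and the level α are hypothetical; qunif is R's inverse distribution function of the uniform.

```r
# Expected shortfall of a uniform variable on [a, b]: closed form vs numerical integration.
a <- -0.10; b <- 0.20; alpha <- 0.05   # hypothetical values

# Closed form obtained in the example
es_closed <- -a - 0.5 * alpha * (b - a)

# Definition (9.4.1): -(1/alpha) * integral of F^{-1}(p) over [0, alpha]
es_numeric <- -integrate(function(p) qunif(p, min = a, max = b),
                         lower = 0, upper = alpha)$value / alpha

print(c(closed = es_closed, numeric = es_numeric))
```

As α shrinks towards zero, both values approach −a, the opposite of the biggest loss.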
Given this assumption we can compute the Expected Shortfall as the mean of the worst cases. If we have n
observations (each with probability 1/n) and we want to compute the mean of the αn worst cases, we must put the
possible outcomes of the variable X in increasing order (we call $\vec{X}$ the ordered variable) and compute
the mean of the first αn cases (α = 1% on 1000 observations gives the mean of the worst 10 scenarios):
$$ ES_\alpha = -\frac{1}{\alpha}\sum_{i=1}^{n\alpha} \frac{1}{n}\vec{X}_i = -\frac{1}{n\alpha}\sum_{i=1}^{n\alpha} \vec{X}_i. $$
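Unfortunately, most of the time αn is not an integer. If, for instance, we have n = 250 and α = 0.01, then αn = 2.5:
how many losses do we take? We must take 2 losses and the third one with a weight which is not 1/n but lower.
Thus, we sum all the losses up to the integer part of αn (denoted ⌊nα⌋) and then we add the loss ⌊nα⌋ + 1
with a weight suitable for reaching the value α:
$$
\begin{aligned}
ES_\alpha &= -\frac{\sum_{i=1}^{\lfloor n\alpha\rfloor} \frac{1}{n}\vec{X}_i + \left(\alpha - \frac{\lfloor n\alpha\rfloor}{n}\right)\vec{X}_{\lfloor n\alpha\rfloor+1}}{\alpha} \\
&= -\frac{\sum_{i=1}^{\lfloor n\alpha\rfloor} \vec{X}_i + \left(n\alpha - \lfloor n\alpha\rfloor\right)\vec{X}_{\lfloor n\alpha\rfloor+1}}{n\alpha}. \qquad (9.5.1)
\end{aligned}
$$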
In R, the function for putting the elements of a vector in increasing order is
sort(X)
and the function for computing the integer part of a decimal number is
floor(X)
Thus, we can write the following function for computing the expected shortfall, given the vector of returns X
and the confidence level α.
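ES = function(X, alpha) {
  X = sort(X)                       # increasing order: worst outcomes first
  n = length(X)
  k = floor(n * alpha)              # number of fully weighted worst cases
  w = n * alpha - k                 # residual weight on observation k + 1
  S = sum(X[seq_len(k)])            # seq_len(k) is empty when k = 0 (1:k is not)
  if (w > 0) S = S + w * X[k + 1]   # skipped when n * alpha is an integer (e.g. alpha = 1)
  ES = -S/(n * alpha)
  return(ES)
}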
Then, if we assume that the past 250 returns on an asset have been normally distributed with (annual) mean
0.08 and (annual) standard deviation 0.15, we can compute the expected shortfall at three confidence levels as follows.
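A minimal sketch of this computation is given below. It assumes that the 250 observations are daily returns, so the annual mean is scaled by 1/250 and the annual standard deviation by 1/sqrt(250). The three confidence levels and the seed are hypothetical choices, and the function ES is restated (following (9.5.1)) so that the block runs on its own.

```r
# Historical expected shortfall on simulated returns, as in (9.5.1).
ES <- function(X, alpha) {
  X <- sort(X)                     # worst outcomes first
  n <- length(X)
  k <- floor(n * alpha)            # fully weighted worst cases
  w <- n * alpha - k               # residual weight on observation k + 1
  tail_sum <- sum(X[seq_len(k)])   # empty sum when k = 0
  if (w > 0) tail_sum <- tail_sum + w * X[k + 1]
  -tail_sum / (n * alpha)
}

# Assumption: daily returns obtained by scaling the annual parameters
set.seed(123)
X <- rnorm(250, mean = 0.08 / 250, sd = 0.15 / sqrt(250))

alphas <- c(0.01, 0.05, 0.10)      # hypothetical confidence levels
es_values <- sapply(alphas, function(al) ES(X, al))
names(es_values) <- paste0("ES_", alphas)
print(es_values)
```

Since a lower α averages over a smaller set of worse outcomes, the expected shortfall cannot increase when α increases.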
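[Figure 9.6.1: spectrum of the expected shortfall, ϕ (p) = (1/α) Ip<α , plotted against p on [0, 1]; the spectrum is equal to 1/α for p < α and to zero afterwards.]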
Note that the ESα can be written as a spectral risk measure by using an indicator function as follows:
$$ \phi(p) = \frac{1}{\alpha}\mathbb{I}_{p<\alpha}. \qquad (9.6.2) $$
This spectrum is graphically represented in Figure 9.6.1.
In fact, we have
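$$
\begin{aligned}
ES_\alpha &= -\int_0^1 \frac{1}{\alpha}\mathbb{I}_{p<\alpha} F^{-1}(p)\,dp \\
&= -\frac{1}{\alpha}\left(\int_0^\alpha \mathbb{I}_{p<\alpha} F^{-1}(p)\,dp + \int_\alpha^1 \mathbb{I}_{p<\alpha} F^{-1}(p)\,dp\right) \\
&= -\frac{1}{\alpha}\int_0^\alpha F^{-1}(p)\,dp.
\end{aligned}
$$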
Acerbi [2002] demonstrates what follows.
Proposition 9.6.1. A spectral risk measure is coherent if and only if its spectrum ϕ (p) is such that, ∀p ∈ [0, 1]:
1. ϕ (p) ≥ 0
2. ϕ (p) is non increasing
3. $\int_0^1 \phi(p)\,dp = 1$.
Remark 9.6.1. Properties 1. and 3. in the previous proposition allow us to conclude that any coherent spectrum
is a density function (even if the converse is not true).
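The three conditions can be checked numerically on a grid, and a spectral risk measure can be approximated on equally likely scenarios. The sketch below assumes that a spectral risk measure has the form −∫ ϕ(p) F^{−1}(p) dp over [0, 1], consistently with the expected shortfall case shown above, and evaluates the spectrum at the midpoint of each probability cell (a simple quadrature choice). The function names are hypothetical.

```r
# Discretised spectral risk measure on n equally likely scenarios.
spectral_measure <- function(X, phi) {
  n <- length(X)
  p <- (seq_len(n) - 0.5) / n          # midpoint of the i-th probability cell
  -sum(phi(p) * sort(X)) / n
}

# Spectrum of the expected shortfall, as in (9.6.2)
es_spectrum <- function(alpha) function(p) as.numeric(p < alpha) / alpha

# Numerical check of the three conditions of Proposition 9.6.1 on a grid
is_coherent_spectrum <- function(phi, grid = seq(0.0005, 0.9995, by = 0.001)) {
  values <- phi(grid)
  all(values >= 0) &&                  # 1. non-negative
    all(diff(values) <= 1e-12) &&      # 2. non-increasing
    abs(mean(values) - 1) < 1e-2       # 3. integrates to one (midpoint rule)
}

X <- c(-5, -4, -3, -2, -1, 0, 1, 2, 3, 4)
print(spectral_measure(X, es_spectrum(0.2)))
print(is_coherent_spectrum(es_spectrum(0.2)))
print(is_coherent_spectrum(function(p) 2 * p))   # increasing spectrum
```

The expected shortfall spectrum passes the check, while the increasing spectrum 2p, although it is a density, fails condition 2.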
Definition 9.7.1. The Value at Risk at the confidence level α (VaRα ) is the (opposite of the) maximum loss
which happens with probability at least α.
In other words, the VaRα is (the opposite of) a quantile. Thus, we can conclude that the VaRα coincides with the
value of γ in Figure 9.4.1 and, accordingly, we can write
$$ VaR_\alpha = -F^{-1}(\alpha). $$
[Figure 9.7.1: spectrum of the VaR, ϕ (p) = δ (p − α), plotted against p on [0, 1]; the spectrum is a spike located at p = α.]
Probability    X1     X2     X1 + X2    2X1    X1 + 2
0.1            −5     2      −3         −10    −3
0.1            −4     1      −3         −8     −2
0.1            −3     −2     −5         −6     −1
0.1            −2     1      −1         −4     0
0.1            −1     0      −1         −2     1
0.1            0      1      1          0      2
0.1            1      4      5          2      3
0.1            2      3      5          4      4
0.1            3      5      8          6      5
0.1            4      6      10         8      6
VaR0.3         3      −1     3          6      1
The VaRα can be represented as a spectral risk measure through the Dirac δ function.² If we define the spectrum
as follows
$$ \phi(p) = \delta(p-\alpha), $$
then we obtain
$$ VaR_\alpha = -\int_0^1 \delta(p-\alpha) F^{-1}(p)\,dp = -F^{-1}(\alpha). \qquad (9.7.1) $$
The spectrum of the VaRα can be represented as in Figure 9.7.1, where we clearly see that it is not coherent
(since it is first increasing and then decreasing).
So, we can conclude that the VaRα is not a coherent risk measure (see Szegö, 2002).
We can check that the VaRα is not coherent with another example as well. Let us take the returns listed in Table
9.7.1 and compute, on them, the value of VaR0.3 .
We see that Axiom 3 does not hold, since
VaR0.3 (X1 + X2 ) = 3 > VaR0.3 (X1 ) + VaR0.3 (X2 ) = 3 − 1 = 2.
2 The Dirac function δ (x) is zero for every x ≠ 0, is infinite at x = 0, and integrates to one.
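The failure of sub-additivity on the scenarios of Table 9.7.1 can be reproduced with the following sketch. It assumes the convention that, with n equally likely scenarios, the α-quantile is the ⌈nα⌉-th smallest outcome, which reproduces the values of VaR0.3 reported in the table. The function name VaR is introduced only for this illustration.

```r
# VaR at level alpha on equally likely scenarios: opposite of the alpha-quantile.
X1 <- c(-5, -4, -3, -2, -1, 0, 1, 2, 3, 4)
X2 <- c( 2,  1, -2,  1,  0, 1, 4, 3, 5, 6)

# Assumed convention: quantile = ceiling(n * alpha)-th smallest outcome
# (the small tolerance guards against floating-point error in n * alpha)
VaR <- function(X, alpha) -sort(X)[ceiling(length(X) * alpha - 1e-9)]

var1  <- VaR(X1, 0.3)
var2  <- VaR(X2, 0.3)
var12 <- VaR(X1 + X2, 0.3)

print(c(VaR_X1 = var1, VaR_X2 = var2, VaR_sum = var12))
print(var12 <= var1 + var2)   # sub-additivity (Axiom 3)
```

The last line prints FALSE: the VaR of the portfolio (3) exceeds the sum of the individual VaRs (3 − 1 = 2).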
9.7 Value at Risk (VaR)
C. Acerbi. Spectral measures of risk: a coherent representation of subjective risk aversion. Journal of Banking and Finance, 26:1505–1518, 2002.
Ph. Artzner, F. Delbaen, J.-M. Eber, and D. Heath. Coherent measures of risk. Mathematical Finance, 9:203–228, 1999.
L. Bachelier. Théorie de la spéculation. Annales Scientifiques de l’École Normale Supérieure, 3:21–86, 1900.
I. Karatzas and S. E. Shreve. Brownian Motion and Stochastic Calculus. Springer, 1991.
D. Lando. On Cox processes and credit risky securities. Review of Derivatives Research, 2:99–120, 1998.
H. Markowitz. Portfolio selection. The Journal of Finance, 7:77–91, 1952.
G. P. Szegö. No more VaR (this is not a typo). Journal of Banking and Finance, 26:1247–1251, 2002.
O. Vasiček. An equilibrium characterization of the term structure. Journal of Financial Economics, 5:177–188, 1977.
B. Øksendal. Stochastic Differential Equations - An Introduction with Applications. Fifth edition. Springer-Verlag, 2000.
B. Øksendal and A. Sulem. Applied Stochastic Control of Jump Diffusions. Springer, 2007.