Stochastic Processes for Risk Management – With Applications in R

Francesco Menoncin1

7th December 2016

1 Università degli Studi di Brescia – Department of Economics and Management. Via S. Faustino 74/B – 25122 Brescia (Italy). E-mail: [Link]@[Link]
Contents

1 The R software 7
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2 Stochastic processes 11
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2 The random walk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3 The Central Limit Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.4 The Brown/Wiener process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.5 Conditional expected values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.6 Tower property of iterated expected values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.7 Martingales . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.8 Stopping time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.9 Law of the maximum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

3 Stochastic calculus 27
3.1 Stochastic differential equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2 Quadratic variation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.3 Correlated Wiener processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.4 Itô’s calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.5 Some properties of Itô processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.6 Simulation of stochastic processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

4 Stochastic processes used in finance 37


4.1 Linear stochastic differential equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.2 Mean reverting process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.3 The Ornstein-Uhlenbeck process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.4 The Cox et al. [1985] process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.5 The geometric Brownian motion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.6 The Chan et al. [1992] process (and the simulated maximum likelihood estimation) . . . . . . . . . . 50
4.7 Two factor models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

5 Stochastic processes with jumps 55


5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.2 Binomial model for rare events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.3 The continuous time version of the binomial model for rare events . . . . . . . . . . . . . . . . . . . 56
5.4 The Poisson process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.5 Itô calculus for Poisson process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

6 The financial market 63


6.1 Financial assets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
6.2 Portfolio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
6.3 Arbitrage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
6.4 Completeness (and asset pricing) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
6.5 Incomplete financial market and incomplete replication . . . . . . . . . . . . . . . . . . . . . . . . . . 69
6.6 Change of probability and asset pricing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70


6.7 The switch between probabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72


6.8 Assets with coupons/dividends . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

7 Asset prices 75
7.1 Forward . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
7.2 Futures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
7.3 Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
7.4 Replication and hedging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
7.5 Real options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

8 Credit risk 83
8.1 Default measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
8.2 Doubly stochastic default intensity and asset pricing . . . . . . . . . . . . . . . . . . . . . . . . . . 84
8.3 Zero-coupon bond . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
8.4 Default-coupon bond . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
8.5 Credit Default Swap (CDS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

9 Risk measures 89
9.1 Coherent risk measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
9.2 The variance as a risk measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
9.3 Representation theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
9.4 Expected Shortfall . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
9.5 Expected Shortfall: Historical simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
9.6 Spectral risk measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
9.7 Value at Risk (VaR) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
Introduction

In his PhD dissertation, Bachelier [1900] tried, for the first time in history, to model asset prices on the Paris stock exchange through Gaussian processes. In particular, he used the so-called Brownian motions (or Wiener processes) simply because they had proved very useful for describing many natural phenomena (like heat transfer).
Finance nowadays heavily relies on Wiener processes (also called diffusion processes) for describing the dynamic behaviour of asset prices. More recently, and mainly because of the big financial crisis which burst in 2007/2008, so-called jump processes have also become relevant in finance: they describe the behaviour of a stochastic variable which may exhibit a finite variation in an infinitesimal time interval (i.e. a so-called jump).
In these notes we will present the main theoretical properties of diffusion and jump processes together with
numerical applications written in R.

Chapter 1

The R software

1.1 Introduction
In this book we will use the free R software ([Link]) and its free interface RStudio ([Link]). A full introduction to the R programming language is out of the scope of this book. Here, we just outline the main features of R. Other (and similar) software like Matlab (or its freeware clones like Scilab – [Link] – or Octave – [Link]) mainly uses a vector/matrix approach to computations, while R also, and mainly, works with data frames and lists.
In this work we use the package knitr for LaTeX ([Link]), which allows one to execute R commands directly in a LaTeX document without calling the R software externally.
In the following code we show how to create three sets of data (with command c) and show the first one.

A = c(11, 12, 14)


B = c(19, 20, 21)
C = c(10, 9, 7)
A
## [1] 11 12 14

All the sets that have been created can be put together into a data frame through the following commands
where we also show the whole set and a subset of it. Finally, the mean of the second subset is computed.

X = data.frame(A, B, C)
X
## A B C
## 1 11 19 10
## 2 12 20 9
## 3 14 21 7
X$A
## [1] 11 12 14
mean(X$B)
## [1] 20

A matrix can be created by concatenation of the sets A, B and C (row by row through the command rbind –
or column by column through the command cbind). The new matrix (called M ) can be used as the argument of a
matrix command like determinant (det) or transposition (t).

M = rbind(A, B, C)
det(M)
## [1] -21
t(M)
## A B C
## [1,] 11 19 10
## [2,] 12 20 9


## [3,] 14 21 7

If a single number is appended to a matrix through the command rbind, a new row is created and all its elements
are equal to the given number.

rbind(M, 2.5)
## [,1] [,2] [,3]
## A 11.0 12.0 14.0
## B 19.0 20.0 21.0
## C 10.0 9.0 7.0
## 2.5 2.5 2.5

The elements of a matrix or of a set are identified by their coordinates inside brackets.

M[2, 1]
## B
## 19
A[3]
## [1] 14

A subset of elements can be selected by using brackets and colon as follows.

M[1:2, 1]
## A B
## 11 19

A matrix can also be created through the command array, whose inputs are a set of elements (created through the command c) and the dimensions of the matrix.

D = array(c(1, 2, 3, 4, 5, 6), dim = c(2, 3))


D
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6

The product between two matrices can be computed by using a particular multiplying operator as follows.

D %*% t(D)
## [,1] [,2]
## [1,] 35 44
## [2,] 44 56

A sequence can be created through the command seq whose arguments are as follows:
seq(from = , to = , by = , length.out = , along.with = )
where by contains the constant difference between two adjacent elements of the sequence, length.out is the number of elements in the sequence, and along.with is the object whose length we want the sequence to replicate.
Here are some examples.

seq(0, 2, by = 0.5)
## [1] 0.0 0.5 1.0 1.5 2.0
seq(0, 1, length.out = 10)
## [1] 0.0000000 0.1111111 0.2222222 0.3333333 0.4444444 0.5555556 0.6666667
## [8] 0.7777778 0.8888889 1.0000000
seq(0, 1, along.with = A)
## [1] 0.0 0.5 1.0

Remark 1.1.1. Contrary to other software, the arguments of any command in R can be put in any preferred order. If
the default order of the arguments is preserved, then it is not necessary to use the name of the arguments. Instead,
if a new order of the arguments is chosen, then it is necessary to specify the names of the arguments whose order
has been altered.

seq(to = 6, from = 0, by = 0.5)


## [1] 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0
Chapter 2

Stochastic processes

2.1 Introduction
The basic building block for any analysis of risk is a «stochastic process», defined as follows.

Definition 2.1.1 (Stochastic process). A stochastic process is a collection of stochastic variables indexed by
time.

Accordingly, a stochastic process can be defined either in discrete time

X0 , X1 , X2 , ...

or in continuous time: {Xt }t≥0 .


In Figure 2.1.1 we can see a financial example of a stochastic process: 250 daily observations of the S&P500 index starting on January 3rd, 1950. Thus, the need to formally define a stochastic process is evident. In the following section we present one of the simplest stochastic processes.

2.2 The random walk


One of the simplest examples of a stochastic process is the so-called random walk. Its name comes from the walk of a drunk man who, at each step, sways either to the left or to the right. In the same way, we can roll a die and move to the right if the face is even or to the left if it is odd. If we assume we start our walk at the origin of a Cartesian graph, then the «walk» can be represented as in Figure 2.2.1.
If we want to describe this random walk more formally, we can start from a stochastic variable Yi (∀i ≥ 1) which takes value either 1 or −1 with the same probability:

Yi = 1 with probability 1/2,   Yi = −1 with probability 1/2,

and whose first two moments are

E[Yi] = 0,
V[Yi] = E[Yi²] − E[Yi]² = 1,

where E[•] and V[•] are, respectively, the expected value and the variance operator.
If we assume that all the variables Yi are identically and independently distributed (i.i.d.), then the sum of the variables Yi for i ∈ {1, 2, ..., t} is both a stochastic process and a random walk. In particular, we can formally define the random walk Xt as follows:

Xt = Σ_{i=1}^{t} Yi,    X0 = 0.

Figure 2.1.1: 250 observations of a stochastic process (daily prices of the S&P500 starting at 3-Jan-1950) [plot of price against days]

Figure 2.2.1: Example of a random walk where, starting at zero (and going towards the abscissa arrow), a «walker» turned once to the right, three times to the left, three times to the right, twice to the left and once to the right [plot of the direction of the walk against the steps]

Because of the i.i.d. hypothesis we can write

E[Xt] = E[Σ_{i=1}^{t} Yi] = Σ_{i=1}^{t} E[Yi] = 0,
V[Xt] = V[Σ_{i=1}^{t} Yi] = Σ_{i=1}^{t} V[Yi] = t.

Thus, the random walk moves around a constant mean (which is zero), but the variance around that mean is proportional to time. The longer the horizon we are forecasting over, the higher the volatility (i.e. the less reliable the forecast).
The random walk has the following properties:

• E [Xt ] = X0 = 0

• independent increments: the increments Xti+1 − Xti over non-overlapping intervals are mutually independent

• stationarity: Xt+s − Xt has the same distribution as Xs

The R function that simulates the extraction of a random number from a binomial distribution is

rbinom(n,size,prob)

where n is the number of experiments, size is the number of trials performed in each experiment, and prob is the probability of success. Thus, if we want to simulate 5 times (n) 2 trials from a binomial distribution whose probability of success is 0.5, the number of successes can be either 0, 1 or 2. The possible outcomes are shown in the following commands:

rbinom(5, 2, 0.5)
## [1] 1 0 1 0 0
rbinom(5, 2, 0.5)
## [1] 1 1 2 2 1
rbinom(5, 2, 0.5)
## [1] 0 1 1 1 0

If we want to simulate at the same time many random walks, we can think of a matrix containing as many rows
as the steps of the walk and as many columns as the number of the walks we want to generate. At each step we
extract only one value (i.e. n = 1) and we know that the events «going to the left» and «going to the right» have
the same probability (i.e. p = 0.5).
The events of the binomial distribution are either 0 or 1 (i.e. the function creates a series of zeros and ones). If we want to trace this case back to our random walk, whose values are either 1 or −1, we can take the random numbers generated by the function rbinom, multiply them by 2 and subtract 1 (so that all the zeros become −1 and all the ones remain the same).
Thus, the first step is to create the matrix Y containing the desired number of rows and columns. Then the
matrix X is created by taking as its first row a vector of zeros (in each simulation the value X0 must be zero) and
as its elements the cumulative sum (over the first dimension, i.e. the rows) of all the elements in matrix Y . In R
we can code a function that allows to obtain the matrix X as follows, where:

• The body of any «function» written in R must be contained inside braces.

• The inputs of the function are: the number of rows (r) and the number of columns (c) to be created.

• The output (returned by the function) is the matrix X containing the c simulated random walks for r periods.

• Inside the function, the command apply is used; this command applies a given command (its third argument) to a dimension (the second argument) of a given array or matrix (the first argument). In the case of our code, the function cumsum is applied to the second dimension (i.e. the columns) of the matrix Y.

RW = function(r, c) {
Y = array(2 * rbinom(r * c, 1, 0.5) - 1, dim = c(r, c))
X = rbind(array(0, dim = c(1, c)), apply(Y, 2, cumsum))
return(X)
}

If the function is used for creating 100 simulations of the same random walk for t ∈ {1, 2, ..., 500} as follows

X=RW(500,100)

the result in Figure 2.2.2 is obtained, where:


• The command matplot draws the data from a matrix (or an array) by assuming that each column is a different series that must be plotted separately. Thus, we obtain as many lines as the columns of the matrix, each with length equal to the number of rows.
• The option type allows to set the kind of draw we want (in this case we take a «line» by setting type=’l’).
• The labels on the X-axis and Y-axis can be set through the commands: xlab=’string’ and ylab=’string’,
respectively. If a label is not set, the names of the variables are used.
• The option col='lightgray' draws all the random walks in a light-grey colour.
• The command grid draws a grid.
Now, in order to check the moments of the random walk, we can compute the mean and the standard deviation
of the simulations. On the same graph we can represent the random walks previously generated, together with
their empirical and theoretical mean (which should be zero) and their empirical and theoretical standard deviation
(which should be the square root of t). The commands for plotting the random walks in matrix X, their mean (computed across the second dimension of the matrix X, i.e. across the simulations) and their standard deviation (again across the second dimension of the matrix X) are listed below, where the square root of time is also plotted. In the code we use:
• The function lines, which allows one to add another plot without erasing the old one.
• The option lty=3 for defining the line type (1=continuous, 2=dashed, 3=dotted, 4=dashed-dotted, 5=long
dash, 6=long dash-short dash).
• The option lwd=3 for defining the line width (default is 1).
The result is shown in Figure 2.2.3.

2.3 The Central Limit Theorem


It is interesting to investigate the behaviour of a random walk when a sufficiently high number of draws is performed. The answer is provided by the well known Central Limit Theorem – CLT (that we provide here without any proof), stating that the suitably normalised sum of n i.i.d. random variables with the same mean and the same variance converges in distribution to a normal random variable as n approaches infinity.

Theorem 2.3.1 (Lindeberg–Lévy Central Limit Theorem). Suppose Yi is a sequence of i.i.d. random variables with mean µ and (finite) variance σ², then as n approaches infinity

√n ((1/n) Σ_{i=1}^{n} Yi − µ)  →d  N(0, σ²),

where N(•, •) is the normal distribution whose arguments are the mean and the variance, respectively.

Remark. The CLT allows us to conclude that the rate of convergence of the empirical mean to the theoretical one is √n. This means that, in order to improve the approximation of an expected value computed by simulation by a factor of 10, the number of simulations must increase by a factor of 100.

Figure 2.2.2: 100 simulations of the same random walk Xt for t ∈ {1, 2, ..., 500}

X = RW(500, 100)
matplot(X, type = "l", ylab = "", col = "lightgray")
grid()



Figure 2.2.3: 100 simulations of the same random walk Xt for t ∈ {1, 2, ..., 500} (in light grey), the mean and the standard deviation of the simulated random walks (in bold) and the theoretical mean (zero) and standard deviation (√t) of the random walks (dotted)

matplot(X, type = "l", col = "lightgray", ylab = "")
lines(apply(X, 1, mean))
lines(seq(0, 0, length.out = 501), type = "l", lty = 3, lwd = 3)
lines(apply(X, 1, sd))
lines(sqrt(seq(1, 501)), type = "l", lty = 3, lwd = 3)
grid()



In the simple case of the random walk seen in the previous section, we can write

(1/√t) Xt  →d  N(0, 1).
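As a quick empirical check (a minimal sketch that reuses the RW function defined in the previous section), we can simulate many random walks and verify that the rescaled terminal value Xt/√t has approximately zero mean and unit variance:

# Sketch: empirical check of the CLT result for the random walk
t = 500
X = RW(t, 10000)          # 10000 simulated walks of t steps each
Z = X[t + 1, ]/sqrt(t)    # rescaled terminal values (the first row of X is X0 = 0)
mean(Z)                   # should be close to 0
var(Z)                    # should be close to 1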

In R the command for simulating n extractions from a standard Gaussian distribution is

rnorm(n,mean=0,sd=1)

Now, we can compare the random walk generated by the binomial distribution and the same random walk
generated by the normal distribution by plotting them in two graphs through the following commands, where we
use:

• The function par(mfrow=c(r,c)), which splits a graph into r × c smaller graphs. Each graph is then drawn from the top-left to the bottom-right, row by row.

• The function par(mar=c(_,_,_,_)), which sets the margins between graphs. The first value is the lower margin, the second is the left margin, the third is the upper margin, and the fourth is the right margin.

The result is shown in Figure 2.3.1 where we actually see that the two simulations are very close. The powerful
result of the CLT is that we can simulate the random walk just through normally distributed variables.

2.4 The Brown/Wiener process


In order to have a sufficient number of observations for converging towards the normal distribution, we take each
period and we split it into a «sufficiently» high (possibly infinite) number of sub-periods with length dt (and we let
dt tend towards zero).
If in one period the variable Xt takes value either 1 or −1, in each of the sub-periods it should take value either dt or −dt. Nevertheless, since this variable converges towards a normal only when it is divided by the square root of the time length (which is, in this case, dt), we must divide both possible outcomes by √dt (and so we obtain either dt/√dt = √dt or −dt/√dt = −√dt). Now, we can define this new stochastic variable as

{ √dt with probability 1/2;  −√dt with probability 1/2 }  →d  N(0, dt) ≡ dWt,    (2.4.1)

where the process dWt is called Brownian motion or Wiener process because of the two main researchers who
studied it:

• the Scottish botanist Robert Brown (1773–1858) who discovered that some pollen particles suspended in a
liquid follow such a process;

• the US mathematician and statistician Norbert Wiener (1894–1964) who formally described the random walk
followed by particles suspended in a liquid.

Empirically, the switch between periods and sub-periods is justified by the frequency of the data. In economics
and finance all the variables are expressed by using the year as a unit of time. For instance, when we read that
the interest rate for investments having a 3 year maturity is 2%, this means that each year a capital of 100 Euros
generates an interest of 2 Euros. Nevertheless, the frequency of the financial data is much higher than the year.
So, for example, the asset prices and interest rate are usually available day by day (or even at an intra-day level).
Accordingly, if the time unit of measure is the year and daily data are available, we can take dt = 1/250, where 250 is the number of working days in one year; in this way the year is «the period», while the day is the sub-period.
The previous function RW can of course be rewritten for simulating a random walk by suitably taking into account the sub-periods. This time we must add dt among the inputs of the function, while the first input r (the number of periods to be simulated) is now expressed in years. Thus, if r = 2 and dt = 1/250, then 2 years of 250 days each are simulated, which means that 500 steps are generated.
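One possible way of rewriting the function along these lines (a sketch; the name RWdt and the argument order are our own choice, not the author's) is the following, where each of the r/dt steps has size ±√dt:

# Sketch: random walk simulated over sub-periods of length dt (r years, c walks)
RWdt = function(r, c, dt) {
    n = r/dt
    Y = array((2 * rbinom(n * c, 1, 0.5) - 1) * sqrt(dt), dim = c(n, c))
    X = rbind(array(0, dim = c(1, c)), apply(Y, 2, cumsum))
    return(X)
}
# Example: 2 years of daily steps (dt = 1/250) for 100 simulated walks
X = RWdt(2, 100, 1/250)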

Figure 2.3.1: Comparison between two sets of 100 simulations of random walks generated by either the binomial variable (top) or the normal variable (bottom)

par(mfrow = c(2, 1), mar = c(2, 3, 2, 2))
matplot(X, col = "lightgray", type = "l", ylab = "")
grid()
r = 500
c = 100
N = array(rnorm(r * c, mean = 0, sd = 1), dim = c(r, c))
matplot(rbind(array(0, dim = c(1, c)), apply(N, 2, cumsum)),
    col = "lightgray", type = "l", ylab = "")
grid()



Figure 2.5.1: Information set at time t and at time t + 1 for a risky asset St and a risk-less asset Gt [diagram of the prices St, St+1, St+2 and Gt, Gt+1, Gt+2, together with the information sets Ft and Ft+1]

2.5 Conditional expected values


An expected value in finance is usually computed conditionally on a given information set. Let us assume we have a stochastic process Xt (with t ∈ {0, 1, 2, ...}); then we could be interested in computing the expected value of Xt+1 given all the past values of Xi for i ∈ {0, 1, 2, ..., t}. Formally, if we call Ft = {X0, X1, ..., Xt} the information set,
we can write the expected value to be computed in the following way

E [ Xt+1 | Ft ] ≡ Et [Xt+1 ] ,

and during this presentation we will use the notation on the right hand side, where the subscript on the expected
value stands for the information set that is available when the expected value is computed.
We assume that information is never lost while time goes on or, more formally,

Ft ⊆ Ft+1 ,

i.e. everything which is known at time t is also known, for sure, at time t + 1 when new information may arise.
If the value of Xt is contained in the information set Ft , i.e., more formally,

Xt = Et [Xt ] ,

we say that Xt is Ft −measurable, or non-anticipated. This last term means that the value Xt is not known before
t and it becomes available in the information set only at time t.
In finance many variables are non-anticipated or Ft −measurable. For instance, the price of an asset is revealed
instant by instant on the stock exchange.
There exists a particular case of a variable which is known both in t and in t + 1 or, in other terms, which is
«anticipated». This is the case of a risk-less asset.
Let us call St the price, at time t, of a risky asset (e.g. a stock) whose price, one period ahead, is St+1 . In the
same way, we can define the price at time t of a risk-less asset as Gt (called «G» because only a Government is
assumed to be able to issue a risk-less asset) whose price, at time t + 1, is Gt+1 . The difference between St and Gt
lies exactly in their measurability properties. In particular, we can write

Et [St ] = St ,

Et [Gt ] = Gt ,
Et [Gt+1 ] = Gt+1 ,
since the value of the risk-less asset at time t + 1 is already known at time t (if this were not the case, Gt would
not be a risk-less asset). In another way, we can write

Ft = {St , Gt , Gt+1 } .

If we take into account three periods, with prices in t, in t + 1 and in t + 2, the situation can be represented as
in Figure 2.5.1.
Finally, the same property implies that the return (over the period [t, t + 1]) on the risk-less asset,

(Gt+1 − Gt)/Gt ≡ r,

is known at time t, while the return on the risky asset (over the same period),

(St+1 − St)/St ≡ µ,
is a stochastic variable (i.e. it is not known at time t).
It is important to stress that the risk-less asset is also a stochastic variable, since at time t we know both Gt and Gt+1, but we do not know Gt+2. Thus, the only difference with respect to a risky asset is that G is predictable (but just for one period ahead).

2.6 Tower property of iterated expected values


An expected value can be computed as the weighted sum of the values of a stochastic process, where the weights are given by the probability of each value. In discrete time the expected value can be written as

Et[XT] = Σ_{i=1}^{k} XT,i pi,

where k is the number of the states of the world, pi is the probability of each state, and XT,i is the value of the stochastic process X at time T in state i. In continuous time, instead, the expected value is computed as follows:

Et[XT] = ∫_Ω XT(ω) dP_{T|t}(ω),

where P_{T|t}(ω) is the probability (i.e. the cumulative distribution function – CDF) of the event XT, given the information set at time t, evaluated at the event ω. Here, Ω is the set of all the possible values of ω.
The information set F and the probability P are two fundamental elements of a so-called «probability space».
The third element of this space is the set of all the possible outcomes (so-called «sample space»). Until now this
set has not been specified because we have assumed that it coincided with F (i.e. the set of all the past values of
the stochastic process Xt). In a more general framework, the set of all the possible outcomes could even contain non-numeric values (e.g. the outcomes of tossing a coin). If we call Ω the sample space, then the full probability
space is defined as the following triplet
(Ω, Ft , Pt ) .
With this toolbox, we are now ready to compute the expected value of an expected value, under two different
information sets. In particular, we want to show what follows.

Proposition 2.6.1 (Iterated expectation – Tower property). For any t0 ≤ s ≤ T , the following property
holds:
Et0 [XT ] = Et0 [Es [XT ]] .

The first step is to write the inner expected value as an integral:

Et0[Es[XT]] = Et0[∫_Ω XT(u) dP_{T|s}(u)],

then the same is done for the «external» expected value:

Et0[Es[XT]] = ∫_Ω (∫_Ω XT(u) dP_{T|s}(u)) dP_{T|t0}(u).

Now, we can switch the order of the integrals:

Et0[Es[XT]] = ∫_Ω (∫_Ω XT(u) dP_{T|t0}(u)) dP_{T|s}(u),

and write back the «external» integral as an expected value:

Et0[Es[XT]] = Es[∫_Ω XT(u) dP_{T|t0}(u)],

but since the integral ∫_Ω XT(u) dP_{T|t0}(u) is Ft0−measurable, it is also known at time s ≥ t0. Thus, we can remove the external expected value and the final result is found:

Et0[Es[XT]] = ∫_Ω XT(u) dP_{T|t0}(u) = Et0[XT].

This property will be mainly useful when credit or mortality risk must be modelled.
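As a small illustration (a sketch based on a hypothetical two-step coin-toss example, not taken from the text), we can check the tower property by Monte Carlo: with XT = Y1 + Y2 and Yi = ±1 i.i.d., the intermediate information reveals Y1, so Es[XT] = Y1, and averaging this inner expectation gives back E0[XT] = 0.

# Sketch: Monte Carlo check of the tower property E_0[E_s[X_T]] = E_0[X_T]
N  = 10^6
Y1 = 2 * rbinom(N, 1, 0.5) - 1
Y2 = 2 * rbinom(N, 1, 0.5) - 1
XT = Y1 + Y2
inner = Y1                  # E_s[X_T] = Y1 + E[Y2] = Y1, once Y1 is observed
c(mean(inner), mean(XT))    # both close to E_0[X_T] = 0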

2.7 Martingales
A fundamental property of some stochastic processes that is most often used in finance is the so-called martingale.
A stochastic process is a martingale when the best predictor at time t of its future value is the value at time t itself.
More formally we can write what follows.

Definition 2.7.1 (Martingale). A stochastic process is a martingale if for all t ≥ 0

Xt = Et [ Xt+1 | Ft ] ,

where Ft = {X0 , X1 , ..., Xt }.

A random walk is a martingale, and this can be demonstrated as follows, with T > t:

Et[XT] = Et[Σ_{i=1}^{T} Yi] = Et[Σ_{i=1}^{t} Yi + Σ_{i=t+1}^{T} Yi]
       = Et[Xt + Σ_{i=t+1}^{T} Yi] = Xt + Et[Σ_{i=t+1}^{T} Yi]
       = Xt,

where we have used the property that Xt is Ft−measurable (since Ft does contain Xt) and the fact that the Yi, for i > t, have zero expected value. The following result applies.

Proposition 2.7.1. A random walk is a martingale.

Example 2.7.1. Another example of a martingale can be created by defining Yi i.i.d. as

Yi = 2 with probability 1/3,   Yi = 1/2 with probability 2/3,

where E[Yi] = 1. Now, we define the variable

XT = Π_{i=1}^{T} Yi,    X0 = 1,

and we obtain

Et[XT] = Et[Π_{i=1}^{T} Yi] = Et[Π_{i=1}^{t} Yi · Π_{i=t+1}^{T} Yi] = Et[Xt Π_{i=t+1}^{T} Yi]
       = Xt Et[Π_{i=t+1}^{T} Yi] = Xt Π_{i=t+1}^{T} Et[Yi] = Xt · 1 = Xt,

and this is true also for t = 0. Thus, XT as defined above is a martingale. In these passages we have used: (i) the fact that Xt is Ft−measurable and (ii) the fact that the Yi are i.i.d.
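A quick simulation (a sketch with hypothetical values T = 10 and 10^5 replications) can be used to check that the sample mean of XT is indeed close to X0 = 1:

# Sketch: simulation of the multiplicative martingale of Example 2.7.1
N = 10^5
T = 10
Y = array(ifelse(runif(N * T) < 1/3, 2, 0.5), dim = c(T, N))   # Y_i = 2 w.p. 1/3, 1/2 w.p. 2/3
XT = apply(Y, 2, prod)                                         # X_T is the product of the Y_i
mean(XT)                                                       # close to X_0 = 1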

The Wiener process is a martingale. In fact, knowing that, for any s ≥ t,

Et[dWs] = 0,

we can integrate both sides:

∫_t^T Et[dWs] = ∫_t^T 0 ds,

and, since the integral and the expected value can be switched,

Et[∫_t^T dWs] = 0.

The integral of the differential of a function coincides with the function computed at the upper bound minus the function computed at the lower bound:

Et[WT − Wt] = 0,

and since Wt is Ft−measurable, we can conclude

Et[WT] = Wt,

which is, in fact, the martingale property.


We will see that the martingale property plays a crucial role on the financial market.

2.8 Stopping time


Given a stochastic process Xt , we can define a time (τ ) when a particular event (depending on Xt ) occurs. If it
is possible to check whether such an event has occurred at time t only by using the information set Ft , then τ is
said to be a «stopping time». In fact, in this case at each time we are able to check whether the process must be
«stopped» or not.
Let us think of a coin toss game, where we gain 1 Euro in case of «head» and we lose 1 Euro in case of «tail»
(this is, of course, a random walk). If we define τ as the first time when the balance becomes 100 Euros, then τ
is a stopping time. Instead, if we define τ as the first time of a peak (i.e. a decrease in the balance which comes immediately after an increase), then this is not a stopping time. The reason is that if you want to know whether you have reached a peak at time t, then you need to know the value at t + 1, i.e. you have to wait one more period (and this value is not an element of the information set at time t). In other words, at time t you still do not know
whether you can stop the process or not. Thus, this time, τ is not a stopping time.
More formally we can use the following definition.

Definition 2.8.1 (Stopping Time). Given a stochastic process Xt , a non-negative integer random variable τ
is called a «stopping time» if for all integer k ≥ 0, the event τ ≤ k depends only on {X0 , X1 , ...Xk }.

In finance there are many examples of stopping time. For instance the time of default for a firm is a stopping
time. One of the first models used for describing the event of default is the so-called threshold model: if Xt is the
value of the firm (i.e. the value of its assets reduced by the value of its liabilities), then the default (τ ) is the first
time when Xt = 0. Of course τ is a stopping time.
A martingale is still a martingale even if it is evaluated at a stopping time.

Theorem 2.8.1 (Optional Stopping Time). Given a martingale Xt for t ∈ [0, T ], and a stopping time
τ ≤ T, then Et[Xτ] = Xt.

One of the problems that can be solved by using the optional stopping time theorem is the so-called Gambler's ruin problem. Let us take again the coin toss game, whose total gain Xt is a martingale. We define τ as the first time when Xt reaches either A Euros or −B Euros. What is the probability of reaching the threshold A (rather than −B) first? From Theorem 2.8.1, we know that

E0[Xτ] = X0 = 0.

If we call p the probability of reaching the amount A first (and, thus, 1 − p the probability of reaching the amount −B first), the previous equation can be written as

pA + (1 − p)(−B) = 0,

from which we can compute the probability of reaching A first:

p = B/(A + B).
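A simple Monte Carlo check of this result (a sketch with hypothetical thresholds A = 10 and B = 5, so that p = 1/3) could look as follows:

# Sketch: simulation of the gambler's ruin probability p = B/(A + B)
gamble = function(A, B) {
    x = 0
    while (x < A & x > -B) x = x + 2 * rbinom(1, 1, 0.5) - 1   # one coin toss per step
    return(as.numeric(x == A))                                 # 1 if A is reached first
}
p = mean(replicate(10000, gamble(10, 5)))
c(p, 5/(10 + 5))   # empirical vs theoretical probability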

2.9 Law of the maximum

In some applications it is useful to consider the maximum value that a Brownian motion has reached during a given period of time. Formally,

Mt = max_{s≤t} Ws.

The probability that the maximum is higher than a constant threshold can be expressed as a function of the
probability that the Brownian motion itself is higher than the same threshold. The result is as follows.

Theorem 2.9.1. For all t > 0 and a > 0

P {Mt > a} = 2P {Wt > a} .

Proof. In order to prove the theorem, we define a stopping time τa as the first time Wt reaches the threshold a or, in formal terms,

τa = min{t : Wt = a},

then, because of the symmetry of Wt, we can write

P{Wt > Wτa ∩ t > τa} = P{Wt < Wτa ∩ t > τa},

or, in other terms, the probability that, after reaching the threshold, the Wiener process is either higher or lower than the threshold itself is the same.
Now, the probability that the maximum at time t is higher than the threshold coincides with the probability that the time t is higher than τa:

P{Mt > a} = P{t > τa}.

Any probability can be written as the sum of two joint probabilities, if the events that are joined in are collectively exhaustive, as follows:

P{Mt > a} = P{t > τa}
          = P{Wt > Wτa ∩ t > τa} + P{Wt < Wτa ∩ t > τa}
          = 2P{Wt > Wτa ∩ t > τa}
          = 2P{Wt > a ∩ t > τa}
          = 2P{Wt > a},

where the last two passages are true because Wτa = a (since the Wiener process is continuous) and, if Wt > a, we know that τa happened for sure before t (thus the event {Wt > a ∩ t > τa} coincides with the event {Wt > a}).

We know that Wt is distributed normally with mean zero and variance t. Thus, the probability in Theorem 2.9.1 can be computed as follows:

P{Wt > a} = P{Wt/√t > a/√t} = P{−Wt/√t < −a/√t} = Φ(−a/√t),

where Φ(•) is the CDF of the standard normal variable.

We can write a function in R which computes the probability P {Mt > a} both in closed form and by simulation.
The inputs of the function are: the threshold a, the time horizon t, the time interval dt for simulating the Wiener
differential dWt , and the number of simulations to be performed (N ). The outputs are: the theoretical probability
(pt) and the empirical probability (p). The function can be written as follows, where we use
• the function as.numeric(condition), which gives 1 if the «condition» written in the argument is true, and 0 otherwise.

maxW = function(a, dt, t, N) {


dW = array(rnorm(t/dt * N, 0, 1), dim = c(t/dt, N)) * sqrt(dt)
W = rbind(array(0, dim = c(1, N)), apply(dW, 2, cumsum))
Wmax = apply(W, 2, max)
p = mean(as.numeric(Wmax > a))
pt = 2 * pnorm(-a/sqrt(t), 0, 1)
return(c(p, pt))
}

In the function we have simulated the Wiener process Wt by summing the differentials dWt; in fact (with W0 = 0):

Wt = ∫_0^t dWs.
The empirical probability is obtained by computing the mean of a function whose value is 1 if the maximum of
Wt is greater than a and 0 otherwise.
Finally, the theoretical probability is computed by using the function

pnorm(x,mean,sd)

which gives the cumulative distribution function of a Gaussian variable in x (i.e. the probability that a Gaussian
variable takes values lower than x).
The result is shown in the following commands.

maxW(3, 1/250, 2, 10000)


## [1] 0.03150000 0.03389485

The rate of convergence to the exact value is quite slow (since it is proportional to the square root of the number of simulations N) and a very large number of experiments would be needed in order to obtain a better approximation.
Chapter 3

Stochastic calculus

3.1 Stochastic differential equations


Once the differential of a Wiener process (dWt ) has been defined, we can build on it other stochastic processes. The
general formulation is as follows

dXt = h (t, Xt ) dt + g (t, Xt ) dWt , (3.1.1)


where the functions h and g are called drift and diffusion, respectively.

Remark 3.1.1. There are technical (Lipschitz) conditions on the functions h and g in (3.1.1) which guarantee the existence of a unique solution to this differential equation. For more details the reader is referred, for instance, to Karatzas and Shreve [1991] and Øksendal [2000]. Here, we note that these conditions are sufficient, but not necessary.

In ordinary calculus, when g = 0, the function h(t, Xt) coincides with the derivative of Xt with respect to time. In fact, in this case, we can write

dXt/dt = h(t, Xt).

In order to apply the same idea to a differential equation with a non-zero diffusion, the Wiener process dWt should be differentiable (i.e. the ratio dWt/dt should exist). Unfortunately, this is not the case, as we show in the following section; accordingly, a new version of differential calculus will be needed.

3.2 Quadratic variation


When a function is differentiable, its quadratic variation is nil, while this is no longer true for stochastic processes such as the Wiener process.
We are about to demonstrate this property.
We divide the time interval [0, T] into n sub-periods, each of length T/n. Each time ti of these sub-periods can be written as ti = iT/n, such that the length of the time interval [ti, ti+1] is constant and equal to T(i + 1)/n − Ti/n = T/n.
Think of any differentiable function f(t). The mean value theorem allows us to state that there exists si ∈ [ti, ti+1] such that

f′(si) = (f(ti+1) − f(ti))/(ti+1 − ti),
f(ti+1) − f(ti) = f′(si)(ti+1 − ti),

and, if we take the square of both sides and we sum them, we can write

Σ_{i=0}^{n−1} (f(ti+1) − f(ti))² = Σ_{i=0}^{n−1} (f′(si)(ti+1 − ti))².


Now, if we substitute all the n terms f′(si) with their maximum, we obtain

Σ_{i=0}^{n−1} (f(ti+1) − f(ti))² ≤ (max_{0≤s≤T} f′(s))² Σ_{i=0}^{n−1} (ti+1 − ti)²,

and since ti+1 − ti = T/n, we have

Σ_{i=0}^{n−1} (f(ti+1) − f(ti))² ≤ (max_{0≤s≤T} f′(s))² Σ_{i=0}^{n−1} (T/n)² = (max_{0≤s≤T} f′(s))² T²/n.

If we take the limit for n → ∞, the quadratic variation goes to zero. So, for any differentiable function, the quadratic variation is zero. In continuous time we can write

∫_0^T (df(s))² = 0.
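As a numerical illustration (a sketch with the hypothetical choice f(t) = sin(t) on [0, T] with T = 10), the sum of squared increments of a differentiable function shrinks as the partition becomes finer:

# Sketch: the quadratic variation of a differentiable function vanishes as n grows
T = 10
for (n in c(10^2, 10^4, 10^6)) {
    t = seq(0, T, length.out = n + 1)
    print(sum(diff(sin(t))^2))   # roughly proportional to 1/n
}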

Now, we can check what happens to the quadratic variation of a Brownian motion. Since we know that

Wti+1 − Wti ∼ N(0, T/n),

we can define

Zi ≡ Wti+1 − Wti,

and write

Σ_{i=0}^{n−1} Zi² = n · (1/n) Σ_{i=0}^{n−1} Zi²  →  n V[Zi] = n · T/n = T,

where the convergence (→) holds because of the strong version of the Law of Large Numbers and because all the variances of the Zi are the same. Thus, we can conclude what follows.

Theorem 3.2.1 (Quadratic variation). Let ti = iT/n; for all T > 0:

lim_{n→∞} Σ_{i=0}^{n−1} (Wti+1 − Wti)² = T.

In continuous time we can write

∫_0^T (dWt)² = T,

and, in differential terms,

(dWT)² = dT.

Remark 3.2.1. Note that, given (2.4.1), the square of dWt can be written as

(dWt)² = dt with probability 1/2,   (dWt)² = dt with probability 1/2,

which is a degenerate stochastic variable such that (dWt)² = dt.

We can check this result in R through the following commands, where we assume T = 10, a sufficiently small dt
(10−6 ) and the Wiener differentials dWt are generated. The sum of their square values approaches T .

Table 3.2.1: Multiplicative rules for Itô's calculus

  ×     dt   dWt
  dt    0    0
  dWt   0    dt

T = 10
dt = 10^(-6)
for (i in 1:5) {
dW = rnorm(T/dt, 0, sqrt(dt))
print(sum(dW^2))
}
## [1] 9.99569
## [1] 10.00288
## [1] 9.995888
## [1] 9.996089
## [1] 10.00076

Given this result, we can write the following limit:

lim_{dt→0} (Wt+dt − Wt)/dt ≃ lim_{dt→0} √dt/dt = lim_{dt→0} 1/√dt = +∞,

which implies that the derivative of Wt (i.e. dWt/dt) does not exist and, hence, the Wiener process cannot be differentiated.
Because of this quadratic variation property (the quadratic variation does not vanish), the value of a stochastic integral depends on the point of each interval [ti, ti+1] at which the integrand is evaluated (contrary to what happens for the Riemann integral). Itô's integral is obtained when the leftmost point of each interval is taken. When a point further to the right (the midpoint) is taken instead, another calculus is obtained: the Stratonovich calculus. In the Stratonovich version of calculus, the usual rules of integration and differentiation hold but, unfortunately, it cannot be applied in finance. In fact, in the Stratonovich version of stochastic calculus, evaluating the integrand beyond the left end of each time interval amounts to assuming some knowledge of the future, which is not realistic in finance. Thus, in order to develop a consistent financial framework, we have to rely on Itô's calculus (where the quadratic variation of a Wiener process does not vanish).
In what follows we will use the properties of dWt that we have just demonstrated and that are summarised in Table 3.2.1, where the property dt × dWt = 0 directly comes from the quadratic variation result: dt × dWt = dt × √dt = (dt)^{3/2}; if we let dt → 0, then (dt)^{3/2} converges to zero faster than dt and can be set to 0.

3.3 Correlated Wiener processes


In some financial applications, more than one Wiener process is used. In this case it is necessary to specify whether the Wiener processes involved are correlated. For instance, Ŵt could be a vector of k correlated Wiener processes:

Ŵt = [Ŵ1,t, Ŵ2,t, ..., Ŵk,t]′.

If the correlation between the Wiener processes Ŵi,t and Ŵj,t is such that

Ct[Ŵi,t, Ŵj,t] = ρi,j t = Ct[Ŵj,t, Ŵi,t] = ρj,i t,

then the mean and the variance of Ŵt can be written as

Et[Ŵt] = 0,    Vt[Ŵt] = R t,

with

R = [ 1     ρ1,2  ...  ρ1,k ]
    [ ρ2,1  1     ...  ρ2,k ]
    [ ...   ...   ...  ...  ]
    [ ρk,1  ρk,2  ...  1    ],

where 0 is a vector of zeros and R t is the variance-covariance matrix of Ŵt.


Now, we want to rewrite Ŵt by using independent Wiener processes, according to the following model:

Ŵt = C Wt,    (3.3.1)

where Wt = [W1,t, W2,t, ..., Wk,t]′ is a (k × 1) vector of independent Wiener processes and C is a (k × k) matrix of coefficients that must be determined. Since Wt is normally distributed, we know that a linear transformation of it is still normally distributed. In particular, CWt and Ŵt are equivalent if they have the same mean and the same variance. The means are already equal, since both are zero:

Et[Ŵt] = Et[CWt] = C Et[Wt] = 0,

and now we have to find C such that the variances are also the same:

Vt[Ŵt] = Vt[CWt] = C Vt[Wt] C′ = CC′ t,

where C′ is the transpose of C. Thus, we can conclude that Equation (3.3.1) has a solution if there exists a matrix C such that

CC′ = R.

Cholesky has demonstrated that if R is a square positive semi-definite matrix (which is our case here), then there exists a lower triangular matrix C which satisfies the previous equality. We recall that a lower triangular matrix has the form

C = [ c1,1  0     ...  0    ]
    [ c2,1  c2,2  ...  0    ]
    [ ...   ...   ...  ...  ]
    [ ck,1  ck,2  ...  ck,k ].
As an exercise, we show the application of this result to two correlated Wiener processes, with (matrices written row by row)

Ŵt = [Ŵ1,t, Ŵ2,t]′,    Vt[Ŵt] = [1, ρ; ρ, 1] t.

In this case we must find C such that

CC′ = [c1,1, 0; c2,1, c2,2] [c1,1, c2,1; 0, c2,2] = [c1,1², c1,1c2,1; c2,1c1,1, c2,1² + c2,2²] = [1, ρ; ρ, 1].

There exist four solutions to this system (of three equations in three unknowns) and one of them is

C = [1, 0; ρ, √(1 − ρ²)].

Thus, we can conclude that the initial vector of correlated Wiener processes Ŵt can be written by using independent Wiener processes as follows:

Ŵt = [Ŵ1,t, Ŵ2,t]′ = C [W1,t, W2,t]′ = [W1,t, ρW1,t + √(1 − ρ²) W2,t]′.

Remark 3.3.1. When a financial model is written on correlated Wiener processes, it can always be traced
back to a fully equivalent model with independent Wiener processes through the Cholesky decomposition of a
variance-covariance matrix.
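The decomposition can be checked numerically in R (a sketch with a hypothetical correlation ρ = 0.7; chol() returns the upper triangular Cholesky factor, so its transpose gives the lower triangular matrix C):

# Sketch: two correlated Wiener processes built from independent increments
rho = 0.7
dt = 1/250
n = 500                                               # number of time steps
R = array(c(1, rho, rho, 1), dim = c(2, 2))           # correlation matrix
C = t(chol(R))                                        # lower triangular factor, CC' = R
dW = array(rnorm(2 * n, 0, sqrt(dt)), dim = c(2, n))  # independent increments
dWhat = C %*% dW                                      # correlated increments
What = apply(dWhat, 1, cumsum)                        # the two correlated Wiener paths
cor(dWhat[1, ], dWhat[2, ])                           # close to rho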

3.4 Itô’s calculus


The Taylor expansion of the differential of any function f(t) (which can be differentiated infinitely many times) is given by

df(t) = (∂f(t)/∂t) dt + (1/2)(∂²f(t)/∂t²)(dt)² + ... + (1/n!)(∂ⁿf(t)/∂tⁿ)(dt)ⁿ + ...
      = Σ_{i=1}^{∞} (1/i!)(∂^i f(t)/∂t^i)(dt)^i.

As we have already demonstrated in the previous sections, the quadratic variation of any differentiable function is zero (and, a fortiori, the same is true for any other power of the variation higher than two). This implies that, for any differentiable function f(t), all the terms (dt)^i with i ≥ 2 vanish and, so,

df(t) = (∂f(t)/∂t) dt.

Nevertheless, we have also demonstrated that the quadratic variation of a stochastic process does not vanish (while the variations of the higher powers, of course, do vanish).
Thus, if we define a stochastic process as the solution to the following stochastic differential equation

dXt = h(t, Xt) dt + g(t, Xt) dWt,

we can use the Taylor expansion for computing the dynamics of a function Y(t, Xt) as follows:

dY = (∂Y/∂t) dt + (∂Y/∂Xt) dXt + (1/2)(∂²Y/∂Xt²)(dXt)².

Now, if dXt is plugged into this differential (and recalling the rules in Table 3.2.1), we obtain the final result:

dY = (∂Y/∂t + (∂Y/∂Xt) h(t, Xt) + (1/2)(∂²Y/∂Xt²) g(t, Xt)²) dt + (∂Y/∂Xt) g(t, Xt) dWt,

which is known as Itô's lemma.

Lemma 3.4.1 (Itô's lemma). Given a stochastic process Xt which solves the differential equation

dXt = h(t, Xt) dt + g(t, Xt) dWt,

any function Y(t, Xt) which is differentiable at least once w.r.t. its first argument and twice w.r.t. its second argument solves the following differential equation:

dY = (∂Y/∂t + (∂Y/∂Xt) h(t, Xt) + (1/2)(∂²Y/∂Xt²) g(t, Xt)²) dt + (∂Y/∂Xt) g(t, Xt) dWt.

Itô's lemma is useful for reformulating the usual rules of calculus in a stochastic framework. Let us assume that, in the differential dXt, we have X0 = 0, h(t, Xt) = 0 and g(t, Xt) = 1, so that Xt = Wt. Then, we can use Itô's lemma on a function Y(Wt) as follows:

dY(Wt) = (1/2)(∂²Y(Wt)/∂Wt²) dt + (∂Y(Wt)/∂Wt) dWt.

Then, if this differential is integrated over the interval [0, t]:

∫_0^t dY(Ws) = (1/2)∫_0^t (∂²Y(Ws)/∂Ws²) ds + ∫_0^t (∂Y(Ws)/∂Ws) dWs,

we finally obtain

∫_0^t (∂Y(Ws)/∂Ws) dWs = Y(Wt) − Y(0) − (1/2)∫_0^t (∂²Y(Ws)/∂Ws²) ds,    (3.4.1)

where we observe that, in this framework, it is no longer true that the integral of the derivative of a function coincides with the function itself, i.e.

∫_0^t (∂Y(Ws)/∂Ws) dWs ≠ Y(Wt) − Y(0).

Example 3.4.1. Let us take the function

Y(Wt) = Wt^{n+1}/(n + 1),

where n is a real number different from −1. In this case Equation (3.4.1) can be written as (we recall that W0 = 0)

∫_0^t Ws^n dWs = Wt^{n+1}/(n + 1) − (n/2)∫_0^t Ws^{n−1} ds,

where the second term on the right hand side arises because Wt is a stochastic variable.
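For n = 1 the formula gives ∫_0^t Ws dWs = Wt²/2 − t/2, which can be checked by simulation (a minimal sketch; the Itô integral is approximated by taking the integrand at the left point of each sub-interval):

# Sketch: numerical check of the Ito integral of W_s dW_s for n = 1
t = 1
dt = 10^(-5)
dW = rnorm(t/dt, 0, sqrt(dt))
W = cumsum(dW)
lhs = sum(c(0, W[-length(W)]) * dW)    # Ito integral with left-point evaluation
rhs = W[length(W)]^2/2 - t/2           # W_t^2/2 - t/2
c(lhs, rhs)                            # the two values should be close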

3.5 Some properties of Itô processes


In this section we present two relevant properties of Itô processes. The first one is that any Itô integral is a martingale. If we define

It ≡ ∫_{t0}^{t} f(s, Xs) dWs,    (3.5.1)

for any real function f(t, Xt) of an Ft−measurable stochastic variable Xt, where of course It0 = 0, then we have

It0 = Et0[It].

The demonstration is almost trivial if the tower property (Proposition 2.6.1) is used:

Et0[∫_{t0}^{t} f(s, Xs) dWs] = Et0[Es[∫_{t0}^{t} f(s, Xs) dWs]]
                             = Et0[∫_{t0}^{t} Es[f(s, Xs) dWs]]
                             = Et0[∫_{t0}^{t} f(s, Xs) Es[dWs]]
                             = 0,

where we have used the following properties:

• the variable Xs is Fs−measurable and, thus,

Es[f(s, Xs)] = f(s, Xs),

• the expected value of a Wiener differential is zero.

Proposition 3.5.1. Any Itô integral of a real function of an Ft−measurable variable (as in (3.5.1)) is a martingale, and its expected value is zero.

The result of Proposition 3.5.1 can be applied when we take the expected value of both sides of Equation (3.4.1) for any function Y(Wt) (in this case Xt coincides with Wt, which is, of course, Ft−measurable):

E0[∫_0^t (∂Y(Ws)/∂Ws) dWs] = E0[Y(Wt) − Y(0) − (1/2)∫_0^t (∂²Y(Ws)/∂Ws²) ds],

which simplifies to

0 = E0[Y(Wt)] − Y(0) − (1/2)∫_0^t E0[∂²Y(Ws)/∂Ws²] ds,

and, finally,

Y(0) = E0[Y(Wt)] − (1/2)∫_0^t E0[∂²Y(Ws)/∂Ws²] ds.

An obvious result is obtained when the second derivative of Y(Wt) with respect to Wt is zero (i.e. when Y(Wt) is a linear transformation of Wt).

Corollary 3.5.1. A linear transformation of a Wiener process (like Y (Wt ) = a + bWt ) is a martingale.

The second property that we demonstrate is the so-called «isometry» of the Itô integral. We have just demonstrated that the expected value of an Itô integral It (as in (3.5.1)) is zero. Now, we want to compute its variance (since the expected value is zero, the variance coincides with the second moment):

Et0[(∫_{t0}^{t} f(s, Xs) dWs)²] = Et0[Es[(∫_{t0}^{t} f(s, Xs) dWs)²]]
                                = Et0[Es[(∫_{t0}^{t} f(s, Xs) dWs)²] − 0]
                                = Et0[Vs[∫_{t0}^{t} f(s, Xs) dWs]]
                                = Et0[∫_{t0}^{t} Vs[f(s, Xs) dWs]]
                                = Et0[∫_{t0}^{t} f(s, Xs)² Vs[dWs]]
                                = Et0[∫_{t0}^{t} f(s, Xs)² ds],

where the demonstration uses the following properties:

• the tower property (Proposition 2.6.1);
• the independence of the increments of an Itô process; thus, the variance of a sum (integral) of increments is the sum (integral) of the variances of the increments;
• the variable Xs is Fs−measurable;
• the variance of the Wiener differential dWs is ds.

Proposition 3.5.2 (Itô isometry). Given any real function f(t, Xt) of an Ft−measurable stochastic variable Xt, the following property holds:

Et0[(∫_{t0}^{t} f(s, Xs) dWs)²] = ∫_{t0}^{t} Et0[f(s, Xs)²] ds.


Example 3.5.1. An easy application is obtained when f(t, Xt) = √t, as follows:

E0[(∫_0^t √s dWs)²] = ∫_0^t s ds = t²/2,

and when t = 1

E0[(∫_0^1 √s dWs)²] = 1/2.

If we want to check this result in R, we can use the following commands, where:
• The values of dWt are created between time 0 and time T = 1 (this period is divided into sub-periods whose length is dt).

• The sum (as an approximation of the integral) of the products √s dWs is computed for s ∈ {dt, 2dt, 3dt, ..., T}.
• Finally, the mean of the squares of all these results is computed for N = 10^4 simulations.

N = 10^4
dt = 1/250
T = 1
dW = array(rnorm(T/dt * N, 0, sqrt(dt)), dim = c(T/dt, N))
mean(apply(sqrt(seq(dt, T, dt)) * dW, 2, sum)^2)
## [1] 0.5064259

3.6 Simulation of stochastic processes


In this section we present an R function for simulating a stochastic differential equation like (3.1.1). The
first attempt to simulate a continuous time differential equation through a discrete version of it (i.e. a difference
equation) dates back to Euler. The Euler method is based on the idea to rewrite (3.1.1) in a discrete version. Let
us call
Xti = xi ,
and, thus
dXt ≃ Xti+1 − Xti = xi+1 − xi ,
then we can write
xi+1 = xi + h (ti , xi ) dt + g (ti , xi ) dWti , (3.6.1)
where, of course
dt ≃ ti+1 − ti .

Remark 3.6.1. The continuous time stochastic process (3.1.1), once written in discrete time as in (3.6.1) is an
auto-regressive model of order one (AR (1)). In continuous time it is impossible to obtain any auto-regressive
process of order higher than 1.

The idea, then, is to start from an initial value x0 (at time t0 ), find x1 by using (3.6.1) as

x1 = x0 + h (t0 , x0 ) dt + g (t0 , x0 ) dWt0 ,

then find x2 by using x1 and so on where, each time, ti+1 = ti + dt.


In R we define a function (called euler) which produces N possible paths of the variable x on an interval between t0 and T (dividing each period into sub-periods of length dt). The drift (h) and diffusion (g) functions must also be defined among the inputs. In order to create a general function which accepts any possible functional form for the drift and the diffusion, we use the command

eval(parse(text = ...))

which evaluates a text string as a command. In our case the drift and the diffusion will be functions of the arguments: the time (t) and the space (x). Thus, we can write the following commands.

euler = function(drift, diffusion, dt, T, t0, x0, N) {
    h = function(t, x) eval(parse(text = drift))
    g = function(t, x) eval(parse(text = diffusion))
    x = array(0, dim = c(T/dt, N))
    x[1, ] = rep(x0, N)
    t = t0
    for (i in 2:(T/dt)) {
        dx = h(t, x[i - 1, ]) * dt + g(t, x[i - 1, ]) * rnorm(N) *
            sqrt(dt)
        x[i, ] = x[i - 1, ] + dx
        t = t + dt
    }
    return(x)
}

Remark 3.6.2. The «drift» and «diffusion» inputs of the euler function must be written between inverted commas, and the variables «time» and «space» must be specified with the letters «t» and «x». If this is not the case, the function may lead to wrong computations or give an error message.

If we set the drift to zero and the diffusion to 1, we can obtain 100 simulations of a daily Wiener process Wt for a 2 year period (as shown in Figure 3.6.1).

Figure 3.6.1: 100 simulations of a daily Wiener process for a 2 year period

W = euler(drift = "0", diffusion = "1", dt = 1/250, T = 2, t0 = 0,
    x0 = 0, N = 100)
matplot(W, type = "l", xlab = "days", ylab = "Wiener process",
    col = "lightgray")
grid()
Chapter 4

Stochastic processes used in finance

4.1 Linear stochastic differential equation


One of the most commonly used stochastic processes in finance is written in the following way

dXt = (a (t) Xt − b (t)) dt + g (t, Xt ) dWt , (4.1.1)

where the drift is linear in Xt and a (t) and b (t) are deterministic functions. Without knowing the functional form
of the diffusion, we are not able to compute the solution to (4.1.1). Nevertheless, we are able to compute the
expected value of Xt by applying Itô’s lemma to the following function
$$X_t\, e^{-\int_{t_0}^{t} a(u)\,du}.$$

The differential of this product is¹
$$d\left(X_t\, e^{-\int_{t_0}^{t} a(u)du}\right) = -b(t)\, e^{-\int_{t_0}^{t} a(u)du}\,dt + e^{-\int_{t_0}^{t} a(u)du}\, g\left(t, X_t\right)dW_t,$$
whose expected value at time t0 is (we use Proposition 3.5.1)
$$E_{t_0}\left[d\left(X_t\, e^{-\int_{t_0}^{t} a(u)du}\right)\right] = E_{t_0}\left[-b(t)\, e^{-\int_{t_0}^{t} a(u)du}\,dt\right].$$

Now, if both sides are integrated between t0 and T , the following result is obtained
$$E_{t_0}\left[X_T\, e^{-\int_{t_0}^{T} a(u)du}\right] - X_{t_0} = E_{t_0}\left[-\int_{t_0}^{T} b(s)\, e^{-\int_{t_0}^{s} a(u)du}\,ds\right],$$
and then
$$X_{t_0} = E_{t_0}\left[\int_{t_0}^{T} b(s)\, e^{-\int_{t_0}^{s} a(u)du}\,ds + X_T\, e^{-\int_{t_0}^{T} a(u)du}\right]. \quad (4.1.2)$$

We see that this equation has a suitable financial interpretation when Xt0 is interpreted as the value of an asset
at time t0 :

• the function a(t) plays the role of a discount interest rate; in fact, the exponential of the negative integral is a
discount factor in continuous time; the value of the function a(t) will be different according to the framework
in which the asset is evaluated;

• the function b (t) measures the cash flows paid by the asset from time t0 to time T ;

• the value XT is the value at which the asset is expected to be sold at time T or, alternatively, the last cash
flow it pays.
¹ We recall that the derivative of the integral $\int_{t_0}^{t} a(u)\,du$ with respect to t is a(t).
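For instance (a special case added here for illustration), when a(t) = a and b(t) = b are constant, formula (4.1.2) reduces to
$$X_{t_0} = \int_{t_0}^{T} b\, e^{-a(s-t_0)}\,ds + E_{t_0}\left[X_T\right] e^{-a(T-t_0)} = \frac{b}{a}\left(1 - e^{-a(T-t_0)}\right) + E_{t_0}\left[X_T\right] e^{-a(T-t_0)},$$
i.e. the present value of a continuous cash flow b discounted at the constant rate a, plus the discounted expected final value.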


4.2 Mean reverting process


If the signs of a (t) and b (t) in (4.1.1) are changed and they are assumed to be constant, another interesting version
of a linear process is obtained:
dXt = α (β − Xt ) dt + g (t, Xt ) dWt . (4.2.1)
If we apply the solution (4.1.2) to this case (with, of course, a = −α and b = −αβ), we obtain
$$X_{t_0} = E_{t_0}\left[-\int_{t_0}^{T} \alpha\beta\, e^{\alpha(s-t_0)}\,ds + X_T\, e^{\alpha(T-t_0)}\right] = -\int_{t_0}^{T} \alpha\beta\, e^{\alpha(s-t_0)}\,ds + E_{t_0}\left[X_T\right] e^{\alpha(T-t_0)} = \beta\left(1 - e^{\alpha(T-t_0)}\right) + E_{t_0}\left[X_T\right] e^{\alpha(T-t_0)},$$
and finally, we have
$$E_{t_0}\left[X_T\right] = \left(X_{t_0} - \beta\right) e^{-\alpha(T-t_0)} + \beta. \quad (4.2.2)$$
This result implies that, if α > 0, then in equilibrium (i.e. for T → ∞) the expected value of the process will
converge towards β, which can be interpreted as the long term mean of the process. Thus, α can be seen as the
speed at which such a convergence happens.
The process (4.2.1) is called a mean reverting process (for α positive) since it tends to revert towards its long
term mean. The expected value of (4.2.1) is

Et [dXt ] = α (β − Xt ) dt,

and, thus, if at time t, Xt > β (or Xt < β), the expected value of the differential dXt is negative (positive) and Xt
tends to decrease (towards β).
We can draw the behaviour of the expected value (4.2.2) as in Figure 4.2.1, where we see that the higher α the
higher the speed of convergence.
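The code behind Figure 4.2.1 is not reported here; a minimal R sketch of these curves, using the parameter values given in the caption of the figure, could be the following.

t = seq(0, 10, 0.01)
X0 = 25
beta = 30
alpha = c(0.5, 0.2, 0.1)
# expected value (4.2.2) as a function of the horizon, for each value of alpha
EX = sapply(alpha, function(a) (X0 - beta) * exp(-a * t) + beta)
matplot(t, EX, type = "l", lty = c(1, 2, 3), col = "black",
        xlab = "years", ylab = "Expected value")
legend("bottomright", legend = paste("alpha =", alpha), lty = c(1, 2, 3))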
Many variables in economics exhibit a mean reverting behaviour, such as the growth rate of GDP, the inflation rate,
the interest rate, the (foreign) exchange rate and, in general, the growth rates of any economic variable. Thus, the
process (4.2.1) is widely used in economics and finance.
In the following sections, we will take into account three processes that are obtained by giving particular
functional forms to the diffusion term g in (4.2.1).

4.3 The Ornstein-Uhlenbeck process


When the diffusion term in (4.2.1) is constant (let us call this term σ), we obtain the so-called Ornstein-Uhlenbeck
process that Vasiček [1977] used for the first time to describe the interest rate (in fact, in finance the process is often
named after Vasiček):
dXt = α (β − Xt ) dt + σdWt . (4.3.1)
This process has a closed form solution that can be found by applying Itô's lemma to
$$X_t\, e^{\alpha(t-t_0)},$$
as follows
$$d\left(X_t\, e^{\alpha(t-t_0)}\right) = e^{\alpha(t-t_0)}\alpha\beta\,dt + e^{\alpha(t-t_0)}\sigma\,dW_t,$$
and if we integrate over the interval [t0, t] we obtain
$$\int_{t_0}^{t} d\left(X_s\, e^{\alpha(s-t_0)}\right) = \int_{t_0}^{t} e^{\alpha(s-t_0)}\alpha\beta\,ds + \int_{t_0}^{t} e^{\alpha(s-t_0)}\sigma\,dW_s,$$
and, finally,
$$X_t = X_{t_0}\, e^{-\alpha(t-t_0)} + \beta\left(1 - e^{-\alpha(t-t_0)}\right) + \sigma\int_{t_0}^{t} e^{-\alpha(t-s)}\,dW_s.$$

Figure 4.2.1: Behaviour of the expected value in (4.2.2) for 10 years when Xt0 = 25, β = 30 and for three values of
α: 0.5 continuous line, 0.2 dashed line, and 0.1 dotted line (vertical axis: expected value, from 25 to 30; horizontal
axis: years, from 0 to 10)

Since dWt is normally distributed, we can conclude that Xt is also Gaussian. The mean and the variance can
be computed thanks to the properties of Itô integrals that we have shown in the previous chapters:
$$E_{t_0}\left[X_t\right] = X_{t_0}\, e^{-\alpha(t-t_0)} + \beta\left(1 - e^{-\alpha(t-t_0)}\right),$$
$$V_{t_0}\left[X_t\right] = \sigma^2\int_{t_0}^{t} e^{-2\alpha(t-s)}\,ds = \sigma^2\,\frac{1 - e^{-2\alpha(t-t_0)}}{2\alpha}.$$
The stationary values of the mean and the variance are given by
$$\lim_{t\to\infty} E_{t_0}\left[X_t\right] = \beta, \qquad \lim_{t\to\infty} V_{t_0}\left[X_t\right] = \frac{\sigma^2}{2\alpha}.$$
Again, as presented in the previous sections, β measures the long term mean of the process. The parameter α,
in this case, also affects the long term variance: the higher α the lower the variance. In fact, a high α means that
the process tends to stay close to its long term mean. When α tends towards infinity, the process is constantly
equal to β. When, instead, α = 0, the process is a random walk.
The Gaussian distribution of Xt implies that it can take negative values. Thus, this process is useful for
describing economic and financial variables which can become negative.
In particular, we can compute at time t0 the probability that Xt takes negative values as follows
$$P\left\{X_t < 0\right\} = P\left\{\frac{X_t - E_{t_0}[X_t]}{\sqrt{V_{t_0}[X_t]}} < -\frac{E_{t_0}[X_t]}{\sqrt{V_{t_0}[X_t]}}\right\} = \Phi\left(-\frac{E_{t_0}[X_t]}{\sqrt{V_{t_0}[X_t]}}\right),$$
where Φ(•) is the cumulative distribution function of a standard normal density. In equilibrium, the probability of
negative values is
$$\lim_{t\to\infty} P\left\{X_t < 0\right\} = \Phi\left(-\frac{\beta}{|\sigma|}\sqrt{2\alpha}\right).$$
The process (4.3.1), when written in discrete time as in (3.6.1), becomes
xi+1 = αβdt + (1 − αdt) xi + σdWt ,
and, accordingly, it can be estimated with Ordinary Least Squares as in the following model
xi+1 = ρ0 + ρ1 xi + εi ,
where εi is a normally distributed error term. Once the parameters ρ0 and ρ1 are estimated, the parameters α and
β are obtained by solving the system
$$\begin{cases} \rho_0 = \alpha\beta\,dt, \\ \rho_1 = 1 - \alpha\,dt, \end{cases}$$
from which
$$\hat{\alpha} = \frac{1 - \rho_1}{dt}, \qquad \hat{\beta} = \frac{\rho_0}{1 - \rho_1}.$$
The parameter σ, instead, is directly estimated from the variance of the differences dXt :

$$V_t\left[dX_t\right] = \sigma^2\,dt \quad\Rightarrow\quad \hat{\sigma} = \sqrt{\frac{V_t\left[dX_t\right]}{dt}}.$$
Let us take into account the Harmonized Index of Consumer Prices – HICP (all items) for Euro area as shown
in Figure 4.3.1 for the period from 1/1/1996 to 1/9/2015 (monthly data of annual percentage changes).
If we want to estimate the parameters of the process (4.3.1) in such a way that it fits the HICP index at best,
we can at first compute the empirical standard deviation and then estimate (via Ordinary Least Square – OLS) the
parameters ρ0 and ρ1 .
We have stored the inflation data in the variable hicp, then we compute the variance of the differences and,
finally, the estimated σ.
Figure 4.3.1: Harmonized Index of Consumer Prices (all items) for Euro area, monthly data (percentage change
with respect to 12 months before) for the period from 1/1/1996 to 1/9/2015 [Source: [Link]/fred2/] (vertical
axis: HICP, from 0.00 to 0.04; horizontal axis: months from 1997 to 2015)

dt = 1/12
sigma = sd(diff(hicp))/sqrt(dt)
sigma
## [1] 0.008503048

Now, we can estimate α and β (in R we will call them a and b) by using the function (linear model)

lm(y~x)

where y is the dependent variable and x is the matrix of the independent variables. In the following code we use:

• The function lm to regress the HICP index on its own one-period lagged values.

rho = lm(hicp[-1] ~ head(hicp, -1))


summary(rho)
##
## Call:
## lm(formula = hicp[-1] ~ head(hicp, -1))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0104095 -0.0013877 0.0000462 0.0014066 0.0078492
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0003900 0.0003615 1.079 0.282
## head(hicp, -1) 0.9737190 0.0178963 54.409 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.002448 on 222 degrees of freedom
## Multiple R-squared: 0.9302,Adjusted R-squared: 0.9299
## F-statistic: 2960 on 1 and 222 DF, p-value: < 2.2e-16
rho0 = as.numeric(rho$coefficients[1])  # intercept, as a plain number
rho1 = as.numeric(rho$coefficients[2])  # slope, as a plain number

Once ρ0 and ρ1 have been obtained, we finally have what follows.

a = (1 - rho1)/dt
a
## [1] 0.3153725
b = rho0/(1 - rho1)
b
## [1] 0.01483822

We can check that the long term equilibrium value of the variable is close to its empirical mean (up to about 30
basis points).

mean(hicp)
## [1] 0.01793022

Now, we can simulate some paths of the process (4.3.1) by using the parameters we have just estimated and
compare the simulations with the empirical data. This goal is achieved through the commands shown in Figure
4.3.2 (where x0 is, here, the first empirical value of the inflation index). In the code we now use:

• The command abline for adding a straight line. If the option is h (or v), then a horizontal (or vertical) line
is drawn at the coordinate given by the value of the option itself.

Figure 4.3.2: 100 paths (in light grey) of the process (4.3.1) whose parameters have been estimated on the HICP
index (as shown in Figure 4.3.1); in bold the historical values of the HICP

sim = euler(drift = paste(toString(a), "*(", toString(b), "-x)"),


diffusion = toString(sigma), dt = 1/12, T = length(hicp)/12,
t0 = 0, x0 = hicp[1], N = 100)
matplot(sim, type = "l", col = "lightgray", xlab = "months",
ylab = "inflation")
lines(hicp)
abline(h = 0, lty = 3)
grid()

• The command toString for transforming a number to a string.

• The command paste to concatenate strings.

The probability of having a negative value of the inflation index, in the long term, is computed as follows.

pnorm(-b * sqrt(2 * a)/sigma, 0, 1)


## [1] 0.08288769

Actually, a probability of negative inflation higher than 8% seems to overestimate reality; this is mainly due to
the recent financial crisis, which altered the economic (and financial) framework.

4.4 The Cox et al. [1985] process


Cox et al. [1985] propose a process like (4.2.1) where the diffusion term is specified in order to prevent the process
from becoming negative:
$$dX_t = \alpha\left(\beta - X_t\right)dt + \sigma\sqrt{X_t}\,dW_t. \quad (4.4.1)$$
This process may reach the value zero, and if it does, it instantly departs from zero since its differential becomes
$$dX_t = \alpha\beta\,dt,$$
which is strictly positive. In this case, the level zero is said to be a reflecting barrier (as soon as the process hits
the barrier, it is reflected and pushed away from it).
There exists a condition for preventing the process from reaching the value zero (it is called Feller’s condition):

2αβ ≥ σ 2 .

While we are able to compute the mean and the variance of Xt in closed form, the process (4.4.1) does not have
a closed form solution.
Furthermore, the process (4.4.1) is heteroscedastic, since its variance depends on the value of Xt itself. In
particular, when Xt is higher, also its variance is higher. This is consistent with the empirical evidence about
interest rates, since a high interest rate is in general related to a period of economic troubles, when the variance of
any economic indicator becomes higher.
If we want to use OLS for estimating the parameters of (4.4.1), we must make it homoscedastic, i.e. we must
look for a transformation f(Xt) such that
$$\frac{\partial f\left(X_t\right)}{\partial X_t}\sqrt{X_t} = 1.$$
One of the solutions to this differential equation is
$$f\left(X_t\right) = 2\sqrt{X_t}.$$

Thus, if we define $Y_t = 2\sqrt{X_t}$, its differential is
$$dY_t = \frac{1}{\sqrt{X_t}}\left(\alpha\beta - \frac{1}{4}\sigma^2 - \alpha X_t\right)dt + \sigma\,dW_t = \left(\left(2\alpha\beta - \frac{1}{2}\sigma^2\right)\frac{1}{Y_t} - \frac{\alpha}{2}Y_t\right)dt + \sigma\,dW_t.$$

The parameter σ can accordingly be estimated through the following result



$$V_t\left[dY_t\right] = \sigma^2\,dt \quad\Rightarrow\quad \hat{\sigma} = \sqrt{\frac{V_t\left[dY_t\right]}{dt}}.$$
The other parameters are estimated by using OLS on the discrete version of dYt:
$$Y_{i+1} - Y_i = \left(2\alpha\beta - \frac{1}{2}\sigma^2\right)\frac{dt}{Y_i} - \frac{\alpha\,dt}{2}Y_i + \sigma\,dW_i,$$

Figure 4.4.1: Volatility Index (VIX) on CBOE from 2/1/1990 to 19/11/2015 (daily data) [Source: [Link]/fred2/]
(vertical axis: VIX, from 10 to 80; horizontal axis: days from 1990 to 2015)
$$Y_{i+1} = \left(2\alpha\beta - \frac{1}{2}\sigma^2\right)\frac{dt}{Y_i} + \left(1 - \frac{\alpha\,dt}{2}\right)Y_i + \sigma\,dW_i,$$
$$Y_{i+1} = \rho_1\frac{1}{Y_i} + \rho_2\,Y_i + \varepsilon_i.$$
After estimating ρ1 and ρ2, the values of α and β are obtained as follows:
$$\begin{cases} \left(2\alpha\beta - \frac{1}{2}\sigma^2\right)dt = \rho_1, \\ 1 - \frac{\alpha\,dt}{2} = \rho_2, \end{cases} \quad\Rightarrow\quad \begin{cases} \beta = \frac{\rho_1 + \frac{1}{2}\sigma^2 dt}{4\left(1 - \rho_2\right)}, \\ \alpha = 2\,\frac{1 - \rho_2}{dt}. \end{cases}$$

We can now apply this estimation method to a process which never becomes negative and exhibits a mean
reverting property. The index we choose is the volatility index (VIX) listed on the Chicago Board of Options
Exchange (CBOE – [Link]). It measures the volatility of the US financial market (measured through the
implied volatility computed on a portfolio of options). The downloaded data (from 2/1/1990 to 19/11/2015) are
shown in Figure 4.4.1.
Once the values of the VIX have been stored in variable vix, the volatility is estimated with the following
commands.

dt = 1/250
y = 2 * sqrt(vix)
sigma = sd(diff(y))/sqrt(dt)
sigma
## [1] 4.680272

Now, we can estimate α and β (in R we will call them a and b) by using the function lm(y~x) as we did in the
previous section. Inside the function lm we write «-1» to indicate that the regression must not include an intercept.

inverted = 1/head(y, -1)


retarded = head(y, -1)
rho = lm(y[-1] ~ inverted + retarded - 1)
summary(rho)
##
## Call:
## lm(formula = y[-1] ~ inverted + retarded - 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.10770 -0.15624 -0.01943 0.13011 2.58451
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## inverted 0.655588 0.091372 7.175 8.04e-13 ***
## retarded 0.991183 0.001223 810.725 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2948 on 6519 degrees of freedom
## Multiple R-squared: 0.9989,Adjusted R-squared: 0.9989
## F-statistic: 2.972e+06 on 2 and 6519 DF, p-value: < 2.2e-16
rho1 = as.numeric(rho$coefficients[1])  # coefficient on 1/Y, as a plain number
rho2 = as.numeric(rho$coefficients[2])  # coefficient on Y, as a plain number

Once ρ1 and ρ2 have been obtained, we finally have what follows:
$$\begin{cases} \beta = \frac{\rho_1 + \frac{1}{2}\sigma^2 dt}{4\left(1 - \rho_2\right)}, \\ \alpha = 2\,\frac{1 - \rho_2}{dt}. \end{cases}$$

a = 2 * (1 - rho2)/dt
a
## [1] 4.408669
b = (rho1 + 0.5 * sigma^2 * dt)/(4 * (1 - rho2))
b
## [1] 19.83019

We can check that the long term equilibrium value of the variable is close to its empirical mean.

mean(vix)
## [1] 19.83174

Now, we can simulate some paths of the process (4.4.1) by using the parameters we have just estimated and
compare the simulations with the empirical data. This goal is achieved through the following commands (where we
use, as x0 , the first empirical value of the VIX index). The result is drawn in Figure 4.4.2, where we compare the
VIX index with 100 simulations (in the upper graph) and with only the first simulation (in the lower graph).
We see that the peaks in the simulated process reach high values (around 60) but are not able to replicate the
highest volatility reached during the recent financial crisis (in 2007/2008). Nevertheless, the simulations capture
quite accurately the process we are trying to reproduce.

4.5 The geometric Brownian motion


The processes we have seen so far are mean reverting, but the parameters in (4.2.1) can also be set in order to have
a divergent process (this is the case with α < 0).

Figure 4.4.2: In the upper graph, 100 paths (in light grey) of the process (4.4.1) whose parameters have been
estimated on the VIX index (as shown in Figure 4.4.1); in bold the historical values of the VIX. In the lower graph,
the comparison is performed between the VIX index (in bold) and only one simulated path (in light grey)

sim = euler(drift = paste(toString(a), "*(", toString(b), "-x)"),


diffusion = paste(toString(sigma), "*sqrt(x)"), dt = 1/250,
T = length(vix)/250, t0 = 0, x0 = vix[1], N = 100)
par(mfrow = c(2, 1), mar = c(4, 5, 3, 1) + 0.1)
matplot(sim, type = "l", col = "lightgray", xlab = "days", ylab = "VIX",
ylim = c(min(sim, vix), max(sim, vix)))
lines(vix)
grid()
plot(sim[, 1], type = "l", xlab = "days", ylab = "simulated VIX",
col = "lightgray", ylim = c(min(sim, vix), max(sim, vix)))
lines(vix)
grid()

One of the most commonly used divergent processes is the so-called geometric Brownian motion (GBM), which
has the following form
$$dX_t = \mu X_t\,dt + \sigma X_t\,dW_t. \quad (4.5.1)$$
Such a process is heteroscedastic, and it can be transformed into a homoscedastic process by taking its logarithm:
$$d\ln X_t = \left(\mu - \frac{1}{2}\sigma^2\right)dt + \sigma\,dW_t,$$
which can be integrated to obtain the solution in closed form:
$$X_t = X_{t_0}\, e^{\left(\mu - \frac{1}{2}\sigma^2\right)(t-t_0) + \sigma\left(W_t - W_{t_0}\right)}.$$
From this result we see that Xt follows a log-normal distribution:
$$\ln X_t \sim N\left(\ln X_{t_0} + \left(\mu - \frac{1}{2}\sigma^2\right)(t-t_0),\ \sigma^2\,(t-t_0)\right).$$
The estimation of the parameters µ and σ, in this case, is trivial and can be obtained from the moments of the
log-transformation:
$$\begin{cases} E_t\left[d\ln X_t\right] = \left(\mu - \frac{1}{2}\sigma^2\right)dt, \\ V_t\left[d\ln X_t\right] = \sigma^2\,dt, \end{cases} \quad\Rightarrow\quad \begin{cases} \mu = \frac{1}{dt}\left(E_t\left[d\ln X_t\right] + \frac{1}{2}V_t\left[d\ln X_t\right]\right), \\ \sigma = \sqrt{\frac{V_t\left[d\ln X_t\right]}{dt}}. \end{cases}$$

The GBM has been widely used in finance for modelling asset prices since it is consistent with the price of an
asset whose continuous return is constant over time and given by µ. In fact, the expected value of Xt can be written as
$$E_{t_0}\left[X_t\right] = X_{t_0}\, e^{\mu(t-t_0)},$$
which is a compounding rule in continuous time.
Since (4.5.1) is a divergent process, then it is suitable for describing processes that, on average, grow over time.
This is the case, for instance, of stock exchange indexes. In Figure 4.5.1, for instance, we show the evolution of the
S&P500 index from 3/1/1950 to 23/10/2015.
We can see that this index is, on average, increasing over time, even if we can immediately recognise six periods:
• from the beginning to 1995: the «normal» period (even if there is a big fall on October 19th, 1987 – the
so-called «black Monday»)
• from 1995 to 2000: the accumulation of the so-called «dot-com bubble»
• from 2000 to 2002: the burst of the bubble (on September 11th, 2001 there is the attack on the World Trade
Center in New York)
• from 2002 to 2007: the accumulation of the so-called «sub-prime bubble»
• from 2007 to 2009: the burst of the bubble (on September 15th, 2008 Lehman Brothers goes bankrupt)
• from 2009: the recovery and the increase of the index because of the several rounds of quantitative easing by
the Federal Reserve
Once the values of the index are stored in the variable S (stock), we can estimate the parameters µ and σ as follows:
$$\begin{cases} \mu = \frac{1}{dt}\left(E_t\left[d\ln X_t\right] + \frac{1}{2}V_t\left[d\ln X_t\right]\right), \\ \sigma = \sqrt{\frac{V_t\left[d\ln X_t\right]}{dt}}. \end{cases}$$

dlnS = diff(log(S))
dt = 1/250
sigma = sqrt(var(dlnS)/dt)
sigma
## [1] 0.153757
mu = (mean(dlnS) + 0.5 * var(dlnS))/dt
mu
## [1] 0.08466284
Figure 4.5.1: Daily values of the stock exchange index S&P500 from 3/1/1950 to 23/10/2015 [Source: finance.
[Link]] (vertical axis: S&P 500, from 0 to 2000; horizontal axis: days from 1950 to 2015)

Figure 4.5.2: 100 paths (in light grey) of a geometric Brownian motion whose parameters have been estimated on
the S&P 500 index (in black)

sim = euler(drift = paste(toString(mu), "*x"), diffusion = paste(toString(sigma),


"*x"), dt = 1/250, T = length(S)/250, t0 = 0, x0 = S[1],
N = 100)
matplot(sim, type = "l", col = "lightgray", xlab = "days", ylab = "S&P500",
ylim = c(min(sim, S), max(sim, S)))
lines(S)
grid()

Thus, we can conclude that the average standard deviation for the S&P500 is about 15%, while the average
return is about 8.5%.
Now we can generate some simulations of the S&P500 with the parameters we have just estimated, as shown in
Figure 4.5.2.

4.6 The Chan et al. [1992] process (and the simulated maximum likelihood estimation)

Chan et al. [1992] propose a model for interest rates which accommodates all the models shown in the previous
sections:
$$dX_t = \alpha\left(\beta - X_t\right)dt + \sigma X_t^{\gamma}\,dW_t, \quad (4.6.1)$$
which is known as the CKLS model (from the initials of the authors).
The Vasiček [1977] model is obtained with γ = 0, while the Cox et al. [1985] model is obtained with γ = 1/2. In
this case, a homoscedastic transformation exists, but it is not useful for estimating the parameters of the model.

In fact, we can write, through Itô's lemma,
$$d\left(\frac{X_t^{1-\gamma}}{1-\gamma}\right) = \left(\alpha\beta X_t^{-\gamma} - \alpha X_t^{1-\gamma} - \frac{1}{2}\gamma X_t^{\gamma-1}\sigma^2\right)dt + \sigma\,dW_t.$$

This process is actually homoscedastic, but since the transformation can be performed only by knowing the value of
the parameter γ, it could be used for estimating the other parameters only if there existed a method to estimate γ
first. Since estimating parameters in separate steps is usually inefficient, we must rely on other methods.
A simple method is based on the following Euler scheme of Equation (4.6.1):
$$x_{i+1} - x_i = \alpha\left(\beta - x_i\right)dt + \sigma x_i^{\gamma}\,dW_i,$$
from which we see that, given the value of xi, the variable xi+1 is normally distributed:
$$x_{i+1}\mid x_i \sim N\left(\alpha\beta\,dt + \left(1 - \alpha\,dt\right)x_i,\ \sigma^2 x_i^{2\gamma}\,dt\right).$$

Remark 4.6.1. While we know that the distribution of xi+1 conditional to the value of xi is Gaussian, the
distribution of xi (unconditional) is not known.

Thanks to this result, we can compute the likelihood function for each xi+1 given its previous value xi as follows:²
$$L = \prod_{i=1}^{n-1} \frac{1}{\sqrt{2\pi}\,\sigma x_i^{\gamma}\sqrt{dt}}\; e^{-\frac{1}{2}\left(\frac{x_{i+1} - \alpha\beta\,dt - \left(1-\alpha\,dt\right)x_i}{\sigma x_i^{\gamma}\sqrt{dt}}\right)^2}.$$

The parameters α, β, γ and σ are computed by maximising the value of L. Since the optimal values of the
parameters do not change with a monotonic transformation of L, we can maximise its logarithm:
$$l \equiv \ln L = \sum_{i=1}^{n-1}\left[-\ln\left(\sqrt{2\pi}\,\sigma x_i^{\gamma}\sqrt{dt}\right) - \frac{1}{2}\left(\frac{x_{i+1} - \alpha\beta\,dt - \left(1-\alpha\,dt\right)x_i}{\sigma x_i^{\gamma}\sqrt{dt}}\right)^2\right]$$
$$= -\frac{n-1}{2}\ln\left(2\pi\,dt\right) - \left(n-1\right)\ln\sigma - \gamma\sum_{i=1}^{n-1}\ln x_i - \frac{1}{2\sigma^2 dt}\sum_{i=1}^{n-1}\left(\frac{x_{i+1} - \alpha\beta\,dt - \left(1-\alpha\,dt\right)x_i}{x_i^{\gamma}}\right)^2.$$

Finally, we can write the problem as follows:
$$\min_{\alpha,\beta,\gamma,\sigma}\ \ln\sigma + \gamma\,\frac{1}{n-1}\sum_{i=1}^{n-1}\ln x_i + \frac{1}{2\sigma^2 dt}\,\frac{1}{n-1}\sum_{i=1}^{n-1}\left(\frac{x_{i+1} - \alpha\beta\,dt - \left(1-\alpha\,dt\right)x_i}{x_i^{\gamma}}\right)^2,$$
where the maximisation problem has become a minimisation one since the objective function has been multiplied
by −1.

Remark 4.6.2. When γ = 0 (i.e. when the process Xt is homoscedastic, as in the case of Vasiček), this method
coincides with the least squares method. In fact, the minimisation problem becomes:
$$\min_{\alpha,\beta,\gamma=0,\sigma}\ \ln\sigma + \frac{1}{2\sigma^2 dt}\,\frac{1}{n-1}\sum_{i=1}^{n-1}\left(x_{i+1} - \alpha\beta\,dt - \left(1-\alpha\,dt\right)x_i\right)^2.$$

² In general, if we know the density function f(x, θ) of a stochastic variable x, where θ contains the parameters of the model, the
maximum likelihood estimate of θ is obtained by solving the problem
$$\max_{\theta}\ \prod_{i=1}^{n} f\left(x_i, \theta\right),$$
where we have assumed to know n observations of the variable x.



Unfortunately, the system of the first derivatives with respect to the control variables does not have an algebraic
closed form solution. Thus, we have to solve the minimisation problem numerically. We can do that with R by
defining the log-likelihood function and then minimising it with respect to the parameters (α, β, σ, γ).
The log-likelihood function is defined in the following code, where the first argument is the set of parameters,
and the other inputs are the data and the time interval dt. The names of the parameters are assigned in the first
rows, and then the current and lagged values of the data are set.

loglike = function(theta, x, dt) {


a = theta[1]
b = theta[2]
sigma = theta[3]
gam = theta[4]
x1 = head(x, -1)
x2 = tail(x, -1)
log(sigma) + gam * mean(log(x1)) + 0.5 * mean(((x2 - a *
b * dt - (1 - a * dt) * x1)/(sigma * x1^gam * sqrt(dt)))^2)
}

Now we can use one of the R functions which is meant to optimize a given user defined function, whose syntax is
as follows:

optim(parameters, function)

where function is the user defined function that must be minimised, and parameters is the set of initial
values of the parameters from which the iteration starts. Since, in our case, the log-likelihood function has three
arguments and we want to perform the optimization only with respect to the first argument (the parameters), the
other arguments must be specified inside the optim call.
Now, we can use the optimization procedure for fitting Equation (4.6.1) to the VIX data we have already
presented in the previous sections. We set the initial values of all parameters to 1, except for β, which is assumed
to start from the mean value of the VIX index.

ML = optim(c(1, mean(vix), 1, 1), loglike, x = vix, dt = 1/250)


ML
## $par
## [1] 3.7525814 19.9451681 0.4838176 1.2510442
##
## $value
## [1] 3.431153
##
## $counts
## function gradient
## 415 NA
##
## $convergence
## [1] 0
##
## $message
## NULL

The output of the function shows:

• The values of the parameters ($par).
• The (minimum) value of the log-likelihood function ($value).
• The convergence code ($convergence), whose value is 0 if the procedure has been performed successfully.

In this case we obtain a warning message because, during the optimization procedure, the algorithm has tried a
zero value for the parameter σ (and the log function is not defined at zero).

Figure 4.6.1: Simulation (in grey) of the VIX index (in black) by using Equation (4.6.1) with the parameters
estimated through the maximum likelihood method

sim = euler(drift = paste(toString(ML$par[1]), "*(", toString(ML$par[2]),


"-x)"), diffusion = paste(toString(ML$par[3]), "*x^", toString(ML$par[4])),
dt = 1/250, T = length(vix)/250, t0 = 0, x0 = vix[1], N = 1)
matplot(sim, type = "l", col = "lightgray", xlab = "days", ylab = "VIX",
ylim = c(min(sim, vix), max(sim, vix)))
lines(vix)
grid()

We see that the γ parameter is much higher than 0.5 (the value of γ in the case of the CIR process) and,
accordingly, the value of σ also differs from the one obtained for the CIR process. We can conclude that the VIX
process is «more» heteroscedastic than the CIR process. Instead, the magnitude of α and β is definitely comparable
with the values obtained for the CIR process.
With these new parameters we can now simulate the process (4.6.1) by using the euler function with the
following commands (the result is drawn in Figure 4.6.1).
Figure 4.6.1 allows us to conclude that the CKLS model is much more suitable than the CIR model for describing
the VIX index. The CKLS model, in fact, is able to capture even the high volatility peaks.

4.7 Two factor models


Sometimes a stochastic variable like the VIX index is not assumed to be fully described by only one stochastic
process. Instead, it seems to be more effective to assume that some parameters of the process describing the VIX
index are stochastic themselves. One of the most commonly used models is the so-called «stochastic volatility
model» where the relevant variable is assumed to follow a Wiener process whose volatility follows, itself, another

Wiener process (possibly correlated with the previous one).


For asset prices, for instance, the usual GBM is completed in the following way:
$$\begin{cases} \frac{dS_t}{S_t} = \mu\,dt + \sqrt{v_t}\,d\hat{W}_{S,t}, \\ dv_t = \alpha\left(\beta - v_t\right)dt + \sigma v_t^{\gamma}\,d\hat{W}_{v,t}, \end{cases} \quad (4.7.1)$$
where the volatility of the log-return (d ln St), given by vt, is assumed to follow a CKLS process (of course any other
model can be used). The processes ŴS,t and Ŵv,t are assumed to be correlated:
$$C\left[d\hat{W}_{S,t},\, d\hat{W}_{v,t}\right] = \rho\,dt,$$

where C is the covariance operator and ρ is set constant (we recall that this model can be traced back to an
equivalent one with independent Wiener processes, as shown in Section 3.3).
Three versions of (4.7.1) play a relevant role in the financial literature:
• when γ = 1/2, i.e. the volatility follows a CIR process; this case is studied in Heston [1993];
• when γ = 1, (4.7.1) is known as a Generalised Auto-Regressive Conditional Heteroscedasticity (GARCH)
model;
• when γ = 3/2, (4.7.1) is known as the «3/2 model».
The estimation of these models is not a trivial task and it is not our purpose to show here the methods used. In
particular, we underline that many financial econometrics manuals show how to estimate the GARCH models.
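Although the estimation of (4.7.1) is beyond our scope, its simulation is straightforward. The following is a minimal R sketch (all parameter values are hypothetical) of the Heston-type case γ = 1/2, where the correlated Gaussian increments are built from two independent draws.

svol = function(S0, v0, mu, a, b, sigma, gam, rho, dt, T, N) {
    n = T/dt
    S = array(0, dim = c(n, N))
    v = array(0, dim = c(n, N))
    S[1, ] = S0
    v[1, ] = v0
    for (i in 2:n) {
        z1 = rnorm(N)
        z2 = rho * z1 + sqrt(1 - rho^2) * rnorm(N)  # correlation rho with z1
        vpos = pmax(v[i - 1, ], 0)                  # keep the variance non negative
        S[i, ] = S[i - 1, ] * (1 + mu * dt + sqrt(vpos) * sqrt(dt) * z1)
        v[i, ] = v[i - 1, ] + a * (b - vpos) * dt + sigma * vpos^gam * sqrt(dt) * z2
    }
    list(S = S, v = v)
}
out = svol(S0 = 100, v0 = 0.04, mu = 0.08, a = 2, b = 0.04, sigma = 0.3,
           gam = 0.5, rho = -0.7, dt = 1/250, T = 1, N = 100)
matplot(out$S, type = "l", col = "lightgray", xlab = "days", ylab = "price")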
Chapter 5

Stochastic processes with jumps

5.1 Introduction
The Wiener process that has been introduced in the previous chapters is able to describe the «small» (infinitesimal)
changes in the prices, but it fails when the price of an asset falls by a «big» (finite) jump. Since these jumps are
rare events, they are often modelled through a Poisson distribution. We are about to present in this chapter how
the stochastic calculus changes when a Poisson process is used for capturing finite jumps in financial variables (like
a negative jump in asset prices or a positive jump in price volatility after a crisis). For further reading about jump
processes we refer to Cont and Tankov [2004] and Øksendal and Sulem [2007].

5.2 Binomial model for rare events


Like we did for the Wiener process, we start by defining a stochastic variable which takes only two values: either 1
if something «happens» (with probability p), or 0 otherwise:

$$Y_i = \begin{cases} 1, & p \\ 0, & 1 - p \end{cases}$$
where p is assumed to be sufficiently low for describing a rare event. The expected value and the variance of this
variable are
$$E\left[Y_i\right] = p, \qquad V\left[Y_i\right] = p - p^2 = p\left(1 - p\right).$$

Now, we define the stochastic process Xt as the sum of the variables Yi for i ∈ {1, 2, ..., t}, with X0 = 0 by
definition:
$$X_t = \sum_{i=1}^{t} Y_i. \quad (5.2.1)$$
If we assume that the variables Yi are i.i.d., then the expected value and the variance of Xt are as follows:
$$E_0\left[X_t\right] = \sum_{i=1}^{t} E_0\left[Y_i\right] = pt, \qquad V_0\left[X_t\right] = \sum_{i=1}^{t} V_0\left[Y_i\right] = p\left(1-p\right)t.$$


Remark 5.2.1. Note that since E0[Xt] ≠ X0, this process is not a martingale. We can check this through the
following passages:
$$E_t\left[X_T\right] = E_t\left[\sum_{i=1}^{t} Y_i + \sum_{i=t+1}^{T} Y_i\right] = E_t\left[X_t\right] + E_t\left[\sum_{i=t+1}^{T} Y_i\right] = X_t + \sum_{i=t+1}^{T} E_t\left[Y_i\right] = X_t + \left(T - t\right)p.$$
Sometimes, in order to preserve the martingale property (as in the case of the random walk), the process Xt defined in (5.2.1)
is corrected (or, better, «compensated») by its own mean:
$$\hat{X}_t \equiv X_t - pt,$$
and now X̂t is a martingale. Of course, the compensation does not alter the variance of the process.

The probability of having k events on t trials is given by
$$P\left\{k \text{ events on } t \text{ trials}\right\} = \frac{t!}{k!\left(t-k\right)!}\, p^k\left(1-p\right)^{t-k}.$$
We can change the R function we created for representing a random walk, in order to represent an event which
happens with a given probability (p). The new function called «RE» (rare-event) is written below and used for
representing one and 100 paths of the variable Xt for t ∈ {0, 1, 2, ..., 500} as in Figure 5.2.1.

5.3 The continuous time version of the binomial model for rare events
Now, we divide each period into n sub-periods (each of length n1 ≡ dt) for n → ∞ (and, thus, dt → 0). At the same
time, in order to represent a rare event, we let p tend towards zero, with the property that the product pn, i.e. the
probability that an event occurs in one period, is constant and equal to λ:
lim pn ≡ λ.
p→0,n→∞

The probability of having k events on n trials is given by
$$P\left\{k \text{ events on } n \text{ trials}\right\} = \frac{n!}{k!\left(n-k\right)!}\, p^k\left(1-p\right)^{n-k},$$
where we can substitute p with λ/n and compute the limit of the probability for n tending towards infinity:
$$\lim_{n\to\infty} \frac{n!}{k!\left(n-k\right)!}\left(\frac{\lambda}{n}\right)^k\left(1 - \frac{\lambda}{n}\right)^{-k}\left(1 - \frac{\lambda}{n}\right)^{n}.$$
This limit can be split into three simpler limits:
$$\lim_{n\to\infty}\left(1 - \frac{\lambda}{n}\right)^{n} = e^{-\lambda}, \qquad \lim_{n\to\infty}\left(1 - \frac{\lambda}{n}\right)^{-k} = 1,$$
and, finally,
$$\lim_{n\to\infty} \frac{n!}{k!\left(n-k\right)!}\left(\frac{\lambda}{n}\right)^k = \frac{\lambda^k}{k!}\lim_{n\to\infty}\frac{n!}{\left(n-k\right)!\,n^k} = \frac{\lambda^k}{k!}\lim_{n\to\infty}\frac{n\left(n-1\right)\left(n-2\right)\cdots\left(n-k+1\right)}{n^k} = \frac{\lambda^k}{k!}\lim_{n\to\infty}\frac{n^k + O\left(n^{k-1}\right)}{n^k} = \frac{\lambda^k}{k!},$$

Figure 5.2.1: In the upper figure one path of the process Xt for t ∈ {0, 1, 2, ..., 500} in (5.2.1) with p = 0.01 is
shown, together with the mean of the process (dashed line). In the lower figure, 100 paths of the same process are
drawn, together with the empirical mean (black line) and the theoretical mean (dashed line)

RE = function(p, r, c) {
Y = array(rbinom(r * c, 1, p), dim = c(r, c))
X = rbind(array(0, dim = c(1, c)), apply(Y, 2, cumsum))
return(X)
}
X1 = RE(0.01, 500, 1)
X100 = RE(0.01, 500, 100)
par(mfrow = c(2, 1), mar = c(2, 3, 2, 2))
matplot(X1, type = "l", ylab = "", col = "lightgray")
lines(apply(X1, 1, mean))
lines(seq(0, 500), seq(0, 500) * 0.01, type = "l", lty = 3, lwd = 3)
grid()
matplot(X100, type = "l", ylab = "", col = "lightgray")
lines(apply(X100, 1, mean))
lines(seq(0, 500), seq(0, 500) * 0.01, type = "l", lty = 3, lwd = 3)
grid()

where O(•) measures the order of the infinity. Finally, we can write the density function of having k events as
$$f\left(k\right) = \frac{\lambda^k e^{-\lambda}}{k!},$$
which is the Poisson density function.
The mean of the process in each period is pn = λ and the variance of the process is
$$\lim_{p\to 0,\, n\to\infty} p\left(1-p\right)n = \lim_{p\to 0,\, n\to\infty} \lambda\left(1-p\right) = \lambda.$$
Thus, we can conclude that the limit distribution for rare events has the following property:
$$E_0\left[X_t\right] = V_0\left[X_t\right] = \lambda t.$$
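We can quickly check this limit numerically in R (a simple illustration, not part of the original derivation): for a large n and a small p with pn = λ, the binomial probabilities are very close to the Poisson ones.

lambda = 5
n = 10^4
p = lambda/n
k = 0:15
round(cbind(binomial = dbinom(k, n, p), poisson = dpois(k, lambda)), 4)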

5.4 The Poisson process


In the small time interval dt, the difference in a Poisson process dNt can be approximated as follows:
$$dN_t = \begin{cases} 1, & \lambda_t\,dt \\ 0, & 1 - \lambda_t\,dt \end{cases}$$
where λt (possibly stochastic) is the so-called «intensity» of the process. The mean and the variance of the process
are
$$E_t\left[dN_t\right] = \lambda_t\,dt, \qquad V_t\left[dN_t\right] = \lambda_t\,dt.$$
The process dNt is called a «counting process» because it counts the number of occurrences. In order to describe
a stochastic variable which evolves through jumps, we can use the following stochastic process

dXt = γt dNt , (5.4.1)

whose integral, of course, is
$$X_t - X_0 = \int_0^t \gamma_s\,dN_s,$$
where γt may be a stochastic process itself which measures the magnitude of the jump. Thus, while dNt just
measures whether a jump occurs or not, γt measures the width of the jump.
It is quite common, in finance, to take i.i.d. jumps whose width is normally distributed:
$$\gamma_t \sim N\left(\mu_\gamma,\, \sigma_\gamma^2\right).$$

In this case, it is also assumed that the jump width γt is independent of the number of jumps dNt . Thus, the
expected value of (5.4.1) is (we assume λt constant):

Et [dXt ] = Et [γt ] λdt = µγ λdt.

Thus, on average, this process is increasing over time.

Remark 5.4.1. Note that the variable γt , in this case, is not measurable with respect to Ft . If it were measurable,
then we could know the width of the jump before it happens. In order to underline this property, sometimes
the notation of the expected value is changed as Et− [•] where we indicate with t− the instant just before the
jump. In this way, it is obvious that the value γt does not belong to the set Ft− .

We can simulate a Poisson process like (5.4.1) with the following commands, where we have used the command

rpois(n,lambda)

whose inputs are the number of observations (n) and the intensity of the process (λ). The following code allows us
to simulate N = 10 paths of a process which starts at X0 = 25, with µγ = 1 and σγ = 2, whose jumps occur, on
average, 5 times per year (λ = 5), with a daily time interval dt = 1/250 and for a period of T = 10 years.

Figure 5.4.1: N = 10 simulations (in light grey) of daily values for T = 10 years of the process (5.4.1) with: X0 = 25,
µγ = 1, σγ = 2, λ = 5, dt = 1/250 (in black the mean of the 10 simulations)

X = poisson(X0 = 25, mu = 1, sigma = 2, lambda = 5, dt = 1/250,


T = 10, N = 10)
matplot(X, type = "l", ylab = "", col = "lightgray")
lines(apply(X, 1, mean))
grid()

poisson = function(X0, mu, sigma, lambda, dt, T, N) {


X = array(0, dim = c(T/dt, N))
X[1, ] = rep(X0, N)
for (i in 2:(T/dt)) {
dX = rnorm(N, mean = mu, sd = sigma) * rpois(N, lambda *
dt)
X[i, ] = X[i - 1, ] + dX
}
return(X)
}

In Figure 5.4.1 the result of the simulations is shown.


If we want to transform the previous process dXt into a martingale, we must subtract its expected value:

dXt = γt dNt − Et [γt dNt ]


= −Et [γt ] λdt + γt dNt .

Thus, in the previous case, we have



Figure 5.4.2: N = 10 simulations (in light grey) of daily values for T = 10 years of the process (5.4.2) with: X0 = 25,
µγ = 1, σγ = 2, λ = 5, dt = 1/250 (in black the mean of the 10 simulations)

X = poisson(X0 = 25, mu = 1, sigma = 2, lambda = 5, dt = 1/250,


T = 10, N = 10)
matplot(X, type = "l", ylab = "", col = "lightgray")
lines(apply(X, 1, mean))
grid()

dXt = −µγ λdt + γt dNt . (5.4.2)


In this case we have used a so-called «compensated» Poisson process (which is a martingale), and the previous
code can be suitably adjusted in order to simulate this new (jump) martingale as follows.

poisson = function(X0, mu, sigma, lambda, dt, T, N) {


X = array(0, dim = c(T/dt, N))
X[1, ] = rep(X0, N)
for (i in 2:(T/dt)) {
dX = -mu * lambda * dt + rnorm(N, mean = mu, sd = sigma) *
rpois(N, lambda * dt)
X[i, ] = X[i - 1, ] + dX
}
return(X)
}

The graphic in Figure 5.4.2 is obtained.



We see that the compensated Poisson process has a drift which is negative because it must compensate the
jumps which, on average, are positive (we have assumed µγ > 0).
In general, on a financial market the widths of the jumps are, on average, negative, since the negative jumps
are more frequent and have a greater magnitude with respect to the positive jumps.

5.5 Itô calculus for Poisson process


We have previously demonstrated that a Wiener process cannot be differentiated (because the limit of dWt/dt does
not exist for dt → 0). Also the Poisson process cannot be differentiated, but for another reason: it is discontinuous.
not exist for dt → 0). Also the Poisson process cannot be differentiated, but for another reason: it is discontinuous.
In fact, when a jump occurs, the Poisson process jumps by a finite amount and this creates a discontinuity.
Now, if we take a function f(Nt), the difference between its value in t = 0 and its value in any time t can be
decomposed as follows
$$f\left(N_t\right) - f\left(N_0\right) = \sum_{s=1}^{t}\left(f\left(N_s + \Delta N_s\right) - f\left(N_s\right)\right),$$
where, on the right hand side, we have summed up all the differences between the value of the function at the
beginning and at the end of any period. Nevertheless, since ∆Nt is either equal to 1 (if a jump occurs) or to 0 (if
it does not), then we can also write
$$f\left(N_t\right) - f\left(N_0\right) = \sum_{s=1}^{t}\left(f\left(N_s + 1\right) - f\left(N_s\right)\right)\Delta N_s,$$
in fact
$$f\left(N_s + \Delta N_s\right) - f\left(N_s\right) = \begin{cases} f\left(N_s + 1\right) - f\left(N_s\right), & \Delta N_s = 1 \\ f\left(N_s\right) - f\left(N_s\right) = 0, & \Delta N_s = 0 \end{cases}$$
The same difference, in continuous time, is
$$f\left(N_t\right) - f\left(N_0\right) = \int_0^t\left(f\left(N_s + 1\right) - f\left(N_s\right)\right)dN_s,$$
whose differential is
$$df\left(N_t\right) = \left(f\left(N_t + 1\right) - f\left(N_t\right)\right)dN_t.$$
If we take into account any stochastic process driven by a Poisson process
$$dX_t = h\left(t, X_t\right)dt + \gamma\left(t, X_t\right)dN_t,$$
and we take a function Y(t, Xt), the differential of this function can thus be written as
$$dY\left(t, X_t\right) = \left(\frac{\partial Y}{\partial t} + \frac{\partial Y}{\partial X_t}\,h\left(t, X_t\right)\right)dt + \left(Y\left(t, X_t + \gamma\left(t, X_t\right)\right) - Y\left(t, X_t\right)\right)dN_t.$$
Thus, we can also conclude what follows.

Lemma 5.5.1 (Itô's lemma for jump-diffusion processes). Given a stochastic process Xt which solves the
differential equation
$$dX_t = h\left(t, X_t\right)dt + g\left(t, X_t\right)dW_t + \gamma\left(t, X_t\right)dN_t,$$
any function Y(t, Xt) which is differentiable at least once w.r.t. its first argument and twice w.r.t. its second
argument solves the following differential equation:
$$dY = \left(\frac{\partial Y}{\partial t} + \frac{\partial Y}{\partial X_t}\,h\left(t, X_t\right) + \frac{1}{2}\frac{\partial^2 Y}{\partial X_t^2}\,g\left(t, X_t\right)^2\right)dt + \frac{\partial Y}{\partial X_t}\,g\left(t, X_t\right)dW_t + \left(Y\left(t, X_t + \gamma\right) - Y\left(t, X_t\right)\right)dN_t.$$

With this new version of Itô’s lemma, we can check two interesting properties.

1. The process $Y_t = e^{\alpha N_t - \lambda t\left(e^{\alpha} - 1\right)}$ is a martingale (α is a constant and λ is the intensity of the Poisson process).
We can apply Lemma 5.5.1 with h = g = 0 and γ = 1:
$$dY_t = e^{\alpha N_t - \lambda t\left(e^{\alpha}-1\right)}\left(-\lambda\left(e^{\alpha} - 1\right)\right)dt + \left(e^{\alpha\left(N_t+1\right) - \lambda t\left(e^{\alpha}-1\right)} - e^{\alpha N_t - \lambda t\left(e^{\alpha}-1\right)}\right)dN_t = Y_t\left(-\lambda\left(e^{\alpha}-1\right)\right)dt + \left(e^{\alpha} - 1\right)Y_t\,dN_t,$$
whose expected value is
$$E_t\left[\frac{dY_t}{Y_t}\right] = -\lambda\left(e^{\alpha}-1\right)dt + \left(e^{\alpha}-1\right)E_t\left[dN_t\right] = -\lambda\left(e^{\alpha}-1\right)dt + \left(e^{\alpha}-1\right)\lambda\,dt = 0.$$
Since any process whose differential has zero expected value is a martingale, we have demonstrated the
first statement.
2. The process $Y_t = \left(1 + \alpha\right)^{N_t} e^{-\lambda\alpha t}$ is a martingale (α is a constant and λ is the intensity of the Poisson process).
We can apply again Lemma 5.5.1 with h = g = 0 and γ = 1:
$$dY_t = -\lambda\alpha\left(1+\alpha\right)^{N_t} e^{-\lambda\alpha t}\,dt + \left(\left(1+\alpha\right)^{N_t+1} e^{-\lambda\alpha t} - \left(1+\alpha\right)^{N_t} e^{-\lambda\alpha t}\right)dN_t = -\lambda\alpha\,Y_t\,dt + \alpha\,Y_t\,dN_t,$$
whose expected value is
$$E_t\left[\frac{dY_t}{Y_t}\right] = E_t\left[-\lambda\alpha\,dt + \alpha\,dN_t\right] = -\lambda\alpha\,dt + \alpha\lambda\,dt = 0,$$
and again we have demonstrated the initial statement.
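Both properties can also be verified by simulation. The following sketch (not part of the original text) checks the first one: since Nt is Poisson with parameter λt, the sample mean of Yt should be close to 1 for any horizon t.

set.seed(1)
alpha = 0.3
lambda = 5
t = 2
N = rpois(10^5, lambda * t)  # N_t is Poisson with parameter lambda * t
mean(exp(alpha * N - lambda * t * (exp(alpha) - 1)))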

Another example is the following jump-diffusion GBM:
$$\frac{dX_t}{X_t} = \left(\mu + \lambda\mu_\gamma\right)dt + \sigma\,dW_t - \gamma_t\,dN_t,$$
where µ, σ and λ are constant and µγ is the mean of the stochastic variable γt. In this case, if we apply Itô's lemma
to the logarithm of Xt we have
$$d\ln X_t = \left(\mu + \lambda\mu_\gamma - \frac{1}{2}\sigma^2\right)dt + \sigma\,dW_t + \left(\ln\left(X_t - X_t\gamma_t\right) - \ln X_t\right)dN_t = \left(\mu + \lambda\mu_\gamma - \frac{1}{2}\sigma^2\right)dt + \sigma\,dW_t + \ln\left(1 - \gamma_t\right)dN_t,$$
where we see that the model makes sense if and only if γt < 1. In other words, we must exclude the case when Xt
completely loses its value because of a jump. In fact, if we assume γt = 1, then when a jump occurs, the process
Xt takes the value zero and remains at that level (zero is a so-called absorbing barrier).
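A simulation of this jump-diffusion GBM can be obtained with a simple extension of the Euler scheme of Chapter 3. The following R sketch (with hypothetical parameter values) combines a Gaussian diffusion increment with a Poisson number of normally distributed jumps in each interval dt.

jumpGBM = function(X0, mu, sigma, mug, sigmag, lambda, dt, T, N) {
    X = array(0, dim = c(T/dt, N))
    X[1, ] = X0
    for (i in 2:(T/dt)) {
        dN = rpois(N, lambda * dt)          # number of jumps in dt
        gam = rnorm(N, mug, sigmag)         # jump widths (must stay below 1)
        X[i, ] = X[i - 1, ] * (1 + (mu + lambda * mug) * dt +
                 sigma * sqrt(dt) * rnorm(N) - gam * dN)
    }
    return(X)
}
X = jumpGBM(X0 = 100, mu = 0.08, sigma = 0.15, mug = 0.1, sigmag = 0.05,
            lambda = 2, dt = 1/250, T = 5, N = 10)
matplot(X, type = "l", col = "lightgray", xlab = "days", ylab = "price")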
Chapter 6

The financial market

6.1 Financial assets


Let us assume that on the financial market there are n risky assets which do not pay dividends and whose prices $S_t \in \mathbb{R}^n_+$
follow the matrix stochastic differential equation
$$I_S^{-1}\,dS_t = \mu\left(t, S_t\right)dt + \Sigma\left(t, S_t\right)'\,dW_t, \quad (6.1.1)$$
$$S\left(t_0\right) = S_0,$$
where IS is the n × n diagonal matrix containing the asset values, µ is n × 1, Σ' is n × k, Wt is k × 1, and the prime
denotes transposition. Thus, for instance, we could write a model with 2 assets and 3 risk sources as follows (we
neglect all the functional dependences for the sake of simplicity)
$$\begin{bmatrix} S_{t,1} & 0 \\ 0 & S_{t,2} \end{bmatrix}^{-1}\begin{bmatrix} dS_{t,1} \\ dS_{t,2} \end{bmatrix} = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}dt + \begin{bmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13} \\ \sigma_{21} & \sigma_{22} & \sigma_{23} \end{bmatrix}\begin{bmatrix} dW_{t,1} \\ dW_{t,2} \\ dW_{t,3} \end{bmatrix}.$$

Here, we assume that the prices of the n assets are driven by k independent1 risk sources represented by Brownian
motions.
The expected (instantaneous) returns on these assets are
$$E_t\left[I_S^{-1}\,dS_t\right] = \mu\left(t, S_t\right)dt,$$
while their (instantaneous) variance and covariance matrix is
$$V_t\left[I_S^{-1}\,dS_t\right] = \Sigma\left(t, S_t\right)'\,\Sigma\left(t, S_t\right)dt.$$

Hereafter, we will neglect the functional dependences of both µ and Σ′ with respect to time and space in order
to keep the notation as simple as possible.
Furthermore, on the financial market there is a riskless asset (issued by the Government) whose price Gt follows
a deterministic differential equation:
$$\frac{dG_t}{G_t} = r_t\,dt. \quad (6.1.2)$$
Remark 6.1.1. If we know the value in t0 of the asset Gt, then the (unique) solution of the differential equation
(6.1.2) is
$$G_t = G_{t_0}\, e^{\int_{t_0}^{t} r_u\,du},$$
and the ratio
$$\frac{G_t}{G_T} = \frac{G_{t_0}\, e^{\int_{t_0}^{t} r_u\,du}}{G_{t_0}\, e^{\int_{t_0}^{T} r_u\,du}} = e^{-\int_{t}^{T} r_u\,du},$$
for any T > t is the discount factor between t and T.
¹ The independence hypothesis is not restrictive since we can always switch from a vector of dependent Wiener processes to a vector
of independent Wiener processes (and vice versa) through the Cholesky decomposition of the variance and covariance matrix.


6.2 Portfolio
A portfolio is a linear combination of assets St and Gt whose value Rt is given by
$$R_t = w_t'\,S_t + w_{t,G}\,G_t, \quad (6.2.1)$$

where wt and wt,G are the numbers of risky and riskless assets held in the portfolio, respectively (if an element of
wt is negative then the corresponding asset is short sold).
Since the portfolio allocation wt and wt,G may change over time according to the changes in the asset values,
both wt and wt,G must be considered as stochastic variables. This means that the differential of Rt must be
computed as
$$dR_t = \underbrace{dw_t'\,S_t + w_t'\,dS_t + dw_t'\,dS_t}_{d\left(w_t'S_t\right)} + \underbrace{dw_{t,G}\,G_t + w_{t,G}\,dG_t}_{d\left(w_{t,G}G_t\right)},$$
where the term dw_{t,G}\,dG_t is missing because Gt is deterministic (in other words, that product of differentials is
an infinitesimal of the same order as (dt)²).
The dynamics of wealth can also be written as
$$dR_t = \underbrace{w_t'\,dS_t + w_{t,G}\,dG_t}_{dR_{t,1}} + \underbrace{dw_t'\left(S_t + dS_t\right) + dw_{t,G}\,G_t}_{dR_{t,2}},$$
where dR_{t,1} are the changes in wealth due to the changes in asset prices (dSt and dGt) while dR_{t,2} are the changes
in wealth due to changes in portfolio allocation (dwt and dw_{t,G}).
The changes in portfolio composition cannot be arbitrarily chosen, since we cannot invest more than our wealth.
Accordingly, we can distinguish three cases:

1. strict self-financing condition: the agent has no more wealth than his portfolio value and he wants neither
to contribute nor to withdraw any money from it; accordingly, in each period, he can invest more in one asset
only if he suitably decreases the amount of money invested in the other assets; this condition can be written
as
dwt′ (St + dSt ) + dwt,G × Gt = 0,

where we see that St + dSt is the new price of asset St , after the period dt;

2. outflows: at each period, the agent withdraws some money from his portfolio in order, for instance, to finance
consumption; if we call c (t) dt the amount of consumption in the instant dt, then

dwt′ (St + dSt ) + dwt,G × Gt = −c (t) dt,

3. inflows: at each period, the agent receives some yield y (t) dt and a percentage α (t) of it is invested in the
portfolio; this means that
dwt′ (St + dSt ) + dwt,G × Gt = α (t) y (t) dt.

Case 2 is typical of a pension fund when it starts paying pensions: at each time the amount of pensions are deducted
from its wealth. Instead, case 3 is typical of a pension fund when it receives contributions from its sponsors, during
the so-called «accumulation phase».
Right now we just take into account the case of a strict self-financing portfolio. Accordingly, the wealth differ-
ential can be written as
dRt = wt′ dSt + wt,G dGt , (6.2.2)
which is the dynamic version of the constraint (6.2.1). Since both constraints must be verified at any time, we
can merge them by taking wt,G from (6.2.1) and plugging it into (6.2.2). Accordingly, we have just one (dynamic)
constraint as follows:
$$dR_t = w_t'\,dS_t + \underbrace{\frac{R_t - w_t'S_t}{G_t}}_{w_{t,G}}\,dG_t,$$
and after substituting for both dSt and dGt from (6.1.1) and (6.1.2) respectively, we have
$$dR_t = \left(R_t\,r_t + w_t'\,I_S\left(\mu - r_t\mathbf{1}\right)\right)dt + w_t'\,I_S\,\Sigma'\,dW_t,$$
where 1 is a vector containing just 1's.


Here, w_t' I_S is the amount of wealth invested in each asset. Sometimes we prefer to write the wealth differential
equation as a function of the percentages of wealth invested in each asset:
$$\theta_t' \equiv \frac{1}{R_t}\,w_t'\,I_S.$$
In this case, the stochastic equation for wealth becomes
$$\frac{dR_t}{R_t} = \left(r_t + \theta_t'\left(\mu - r_t\mathbf{1}\right)\right)dt + \theta_t'\,\Sigma'\,dW_t.$$
The differences between the expected returns on the risky assets and the riskless interest rate are called risk
premia (µ − r1). In fact, they measure the excess return (with respect to r) which an agent requires in order to
bear a risk.
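As a small numerical illustration (with hypothetical parameter values, not taken from the text), the wealth of a constantly rebalanced portfolio with percentage weights θ in two risky assets can be simulated as follows.

theta = c(0.4, 0.3)                      # percentages invested in the risky assets
mu = c(0.07, 0.09)
Sigma_t = matrix(c(0.2, 0.05,
                   0.05, 0.3), nrow = 2, byrow = TRUE)   # Sigma' (n x k)
r = 0.02
dt = 1/250
T = 5
R = numeric(T/dt)
R[1] = 100
for (i in 2:(T/dt)) {
    dW = rnorm(2, 0, sqrt(dt))
    dR = R[i - 1] * ((r + sum(theta * (mu - r))) * dt +
         sum(theta * (Sigma_t %*% dW)))
    R[i] = R[i - 1] + dR
}
plot(R, type = "l", xlab = "days", ylab = "portfolio value")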

6.3 Arbitrage
A financial market is well defined if there is no arbitrage, which can be defined as a strategy (portfolio) without
risk and whose return is different from rt. Accordingly, θt is an arbitrage if the two following conditions hold
$$\begin{cases} \theta_t'\,\Sigma' = 0, \\ \theta_t'\left(\mu - r_t\mathbf{1}\right) \neq 0. \end{cases}$$

For checking whether there is an arbitrage on the financial market, we can use the following result.
Lemma 6.3.1 (Fredholm). Given a matrix A' (n × k) and a vector b (n × 1), one and only one of the two following
cases is true:
$$\exists x \in \mathbb{R}^k:\ A'x = b,$$
$$\exists y \in \mathbb{R}^n:\ \begin{cases} y'A' = 0, \\ y'b \neq 0. \end{cases}$$

Fredholm [1903]’s lemma allows us to conclude what follows.


Proposition 6.3.1. On the financial market (6.1.1)-(6.1.2) there is no arbitrage if and only if there exists a vector
ξt ∈ Rk such that
$$\Sigma'\,\xi_t = \mu - r_t\mathbf{1}. \quad (6.3.1)$$

Equation (6.3.1) is a linear system of n equations in k unknowns which can have:


1. only one solution (the market is arbitrage free);
2. infinite solutions (the market is arbitrage free);
3. no solution (the market is not arbitrage free).

Since in the real world the financial markets are actually arbitrage free, then we never take into account the third
case.
We highlight that Equation (6.3.1) has (at least) a solution if there exists the so-called left inverse of matrix
Σ′ . In particular, the matrix Σ′l is said to be the left inverse of Σ′ if

Σ′l Σ′ = I,

where I is the identity matrix.

Exercise 6.3.1. A financial market with one risky asset driven by one risk source:
$$\frac{dS_t}{S_t} = \mu\,dt + \sigma\,dW_t, \qquad \frac{dG_t}{G_t} = r\,dt,$$
is always arbitrage free since there exists the scalar ξt which solves
$$\sigma\,\xi_t = \mu - r.$$
Furthermore, ξt coincides with the Sharpe ratio (and, in this case, it is constant). Let us stress that if σ = 0
(i.e. both assets are riskless), then the market is arbitrage free if and only if µ = r (on the financial market
there cannot be more than one risk free return).

The vector ξ has a nice economic interpretation. If σ measures the risk and µ − r is the risk premium, then the
ratio between µ − r and σ is the risk premium for any unit of risk: actually, this is the «market price of risk». If
there are k risk sources on the financial market, then there must be k prices of risk. Thus, a financial market works
well (is arbitrage free) if and only if it is able to provide a price for any risk source.

Exercise 6.3.2. Let us take into account the following market:
$$\frac{dS_{t,1}}{S_{t,1}} = \mu_1\,dt + \sigma_1\,dW_{t,1}, \qquad \frac{dS_{t,2}}{S_{t,2}} = \mu_2\,dt + \sigma_2\,dW_{t,1}, \qquad \frac{dG_t}{G_t} = r\,dt,$$
where there are two risky assets driven by just one risk source dWt,1. In this case we can check the existence of
arbitrage by solving the linear system
$$\underbrace{\begin{bmatrix} \sigma_1 \\ \sigma_2 \end{bmatrix}}_{\Sigma'}\,\xi_t = \underbrace{\begin{bmatrix} \mu_1 - r \\ \mu_2 - r \end{bmatrix}}_{\mu - r\mathbf{1}}.$$
This system has a solution if and only if
$$\frac{\mu_1 - r}{\sigma_1} = \frac{\mu_2 - r}{\sigma_2},$$
i.e. if the Sharpe ratios of the two risky assets coincide. If this is not the case then the financial market is not
arbitrage free.

The interpretation of this result is easy: if two assets depend on the same risk source, then their market price
of risk must be the very same!

Exercise 6.3.3. Let us take into account the following market
$$\frac{dS_t}{S_t} = \mu\,dt + \sigma_1\,dW_{t,1} + \sigma_2\,dW_{t,2}, \qquad \frac{dG_t}{G_t} = r\,dt,$$
where there is just one risky asset whose price depends on two risk sources. In this case there is no arbitrage if
we are able to solve
$$\underbrace{\begin{bmatrix} \sigma_1 & \sigma_2 \end{bmatrix}}_{\Sigma'}\begin{bmatrix} \xi_{t,1} \\ \xi_{t,2} \end{bmatrix} = \underbrace{\mu - r}_{\mu - r\mathbf{1}}.$$
Since this equation has infinite solutions, the financial market is arbitrage free.

Exercise 6.3.4. Let us take into account the following market
$$\frac{dS_{t,1}}{S_{t,1}} = \mu_1\,dt + \sigma_{1,1}\,dW_{t,1} + \sigma_{1,2}\,dW_{t,2}, \qquad \frac{dS_{t,2}}{S_{t,2}} = \mu_2\,dt + \sigma_{2,1}\,dW_{t,1} + \sigma_{2,2}\,dW_{t,2}, \qquad \frac{dG_t}{G_t} = r\,dt,$$
where there are two risky assets driven by two risk sources. There is no arbitrage if and only if we are able to
solve
$$\underbrace{\begin{bmatrix} \sigma_{1,1} & \sigma_{1,2} \\ \sigma_{2,1} & \sigma_{2,2} \end{bmatrix}}_{\Sigma'}\begin{bmatrix} \xi_{t,1} \\ \xi_{t,2} \end{bmatrix} = \underbrace{\begin{bmatrix} \mu_1 - r \\ \mu_2 - r \end{bmatrix}}_{\mu - r\mathbf{1}}.$$
There exists only one solution to this system if
$$\sigma_{1,1}\sigma_{2,2} - \sigma_{1,2}\sigma_{2,1} \neq 0.$$
In this case we have
$$\begin{bmatrix} \xi_{t,1} \\ \xi_{t,2} \end{bmatrix} = \begin{bmatrix} \sigma_{1,1} & \sigma_{1,2} \\ \sigma_{2,1} & \sigma_{2,2} \end{bmatrix}^{-1}\begin{bmatrix} \mu_1 - r \\ \mu_2 - r \end{bmatrix} = \frac{1}{\sigma_{1,1}\sigma_{2,2} - \sigma_{1,2}\sigma_{2,1}}\begin{bmatrix} \sigma_{2,2}\sigma_{1,2}\left(\frac{\mu_1 - r}{\sigma_{1,2}} - \frac{\mu_2 - r}{\sigma_{2,2}}\right) \\ \sigma_{2,1}\sigma_{1,1}\left(\frac{\mu_2 - r}{\sigma_{2,1}} - \frac{\mu_1 - r}{\sigma_{1,1}}\right) \end{bmatrix}.$$
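In R, the market prices of risk of this last exercise can be computed by solving the linear system directly (the numbers below are purely illustrative).

Sigma_t = matrix(c(0.2, 0.05,
                   0.05, 0.3), nrow = 2, byrow = TRUE)   # Sigma' (2 x 2)
mu = c(0.07, 0.09)
r = 0.02
xi = solve(Sigma_t, mu - r)    # solves Sigma' xi = mu - r 1
xi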

Remark 6.3.1. Hereafter, we will always work with arbitrage free financial markets.

6.4 Completeness (and asset pricing)


Definition 6.4.1. A financial market is said to be complete if and only if any asset can be replicated by a suitable
portfolio.
Now, let us assume there is an asset whose price follows
$$\frac{dF_t}{F_t} = \mu_F\,dt + \sigma_F'\,dW_t,$$
where σF' is 1 × k. Since the market is arbitrage free, then also the drift and diffusion terms of this asset must verify
the no arbitrage condition:
$$\sigma_F'\,\xi_t = \mu_F - r_t.$$

This means that the previous differential equation can be written as
$$\frac{dF_t}{F_t} = \left(r_t + \sigma_F'\,\xi_t\right)dt + \sigma_F'\,dW_t. \quad (6.4.1)$$

We can conclude that on an arbitrage free financial market, the expected return on any asset is given by the
riskless interest rate augmented by the product between the diffusion term and the market price of risk.
Proposition 6.4.1. On an arbitrage free financial market (where ∃ξt : Σ′ ξt = µ−rt 1), the drift of any asset having
diffusion σF′ must be rt + σF′ ξt .
In order to replicate asset Ft, we must look for a portfolio θt such that the investor's wealth
$$\frac{dR_t}{R_t} = \left(r_t + \theta_t'\left(\mu - r_t\mathbf{1}\right)\right)dt + \theta_t'\,\Sigma'\,dW_t, \quad (6.4.2)$$
coincides with dFt/Ft. The two stochastic processes (6.4.1) and (6.4.2) are equal if both their drifts and diffusions are
equal. Nevertheless, the absence of arbitrage allows us to ask just for the diffusion terms to be equal; in fact, if we
are able to find a portfolio such that
$$\theta_t'\,\Sigma' = \sigma_F', \quad (6.4.3)$$

then equation (6.4.1) becomes
$$\frac{dF_t}{F_t} = \Big(r_t + \underbrace{\sigma_F'}_{\theta_t'\Sigma'}\,\xi_t\Big)dt + \underbrace{\sigma_F'}_{\theta_t'\Sigma'}\,dW_t = \left(r_t + \theta_t'\,\Sigma'\,\xi_t\right)dt + \theta_t'\,\Sigma'\,dW_t,$$
and, because of the no arbitrage condition (i.e. Σ'ξt = µ − rt1), we finally have
$$\frac{dF_t}{F_t} = \left(r_t + \theta_t'\left(\mu - r_t\mathbf{1}\right)\right)dt + \theta_t'\,\Sigma'\,dW_t,$$
which is exactly the differential equation of the wealth Rt. Accordingly, condition (6.4.3) is necessary and sufficient
for replicating asset Ft.
Remark 6.4.1. Condition (6.4.3) implies that the asset Ft and the wealth Rt behave in the very same way. Nevertheless,
it does not guarantee that they have the same value. This last condition (i.e. Ft = Rt) is satisfied by
finding the amount of riskless asset wt,G which solves
$$R_t = w_{t,G}\,G_t + w_t'\,S_t = F_t, \qquad w_{t,G} = \frac{F_t - w_t'\,S_t}{G_t},$$
or, which is the same,
$$\frac{1}{R_t}\,w_{t,G}\,G_t + \frac{1}{R_t}\,w_t'\,S_t = \frac{F_t}{R_t}, \qquad \theta_{t,G} + \theta_t'\mathbf{1} = \frac{F_t}{R_t},$$
i.e.
$$\theta_{t,G} = \frac{F_t}{R_t} - \theta_t'\mathbf{1}.$$
System (6.4.3) has a solution if the matrix Σ' has a so-called right inverse. In particular, we say that Σ'_r is
the right inverse of matrix Σ' if
$$\Sigma'\,\Sigma'_r = I.$$
Nevertheless, we recall that we have already assumed that Σ' has a left inverse for the market to be arbitrage
free. Accordingly, if we now assume that Σ' also has a right inverse, then we are assuming that it is invertible (in
fact a matrix which has both a left and a right inverse is invertible). If Σ' is invertible then Equation (6.3.1)
has only one solution:
$$\xi_t = \Sigma'^{-1}\left(\mu - r_t\mathbf{1}\right).$$

Proposition 6.4.2. The financial market is complete if and only if there exists only one vector of market price of
risk solving Equation (6.3.1).

No arbitrage: ∃ξt : Σ'ξt = µ − rt1          Completeness: ∃!ξt : Σ'ξt = µ − rt1

Proposition 6.4.3. In a complete financial market defined by (6.1.1)-(6.1.2), an asset having diffusion σF (as in
6.4.1), is replicated by the portfolio θt = Σ−1 σF .

6.5 Incomplete financial market and incomplete replication


Given the asset (6.4.1) and the wealth (6.4.2), if the financial market is arbitrage free but incomplete (i.e. ∄Σ−1 )
then it is impossible to solve the system Σθt = σF , or, in other terms

Σθt − σF ̸= 0,

which means that there will always be an error when trying to replicate an asset with a portfolio. Nevertheless,
we can try to approximate a kind of (non perfect) replicating portfolio. Many choices are available, but one quite
«natural» is to minimize the square of the replicating error:

min (θt′ Σ′ − σF′ ) (Σθt − σF ) .


θt

The second order condition for this problem always hold since the variance-covariance matrix Σ′ Σ is always
positive semi-definite. Thus, by solving the first order condition

2Σ′ Σθt − 2Σ′ σF = 0,

we can find the minimum square replicating portfolio:


\theta_t^* = (\Sigma'\Sigma)^{-1}\Sigma'\sigma_F,

where we recall that the variance-covariance matrix is invertible because of the no arbitrage condition (i.e. the
existence of the left inverse of matrix Σ′ ).
If this portfolio allocation is replaced in the wealth equation (6.4.2), it becomes

\frac{dR_t}{R_t} = \left(r_t + \sigma_F'\Sigma(\Sigma'\Sigma)^{-1}(\mu - r_t 1)\right)dt + \sigma_F'\Sigma(\Sigma'\Sigma)^{-1}\Sigma'\, dW_t.
The drift of this equation must coincide with the drift of the asset F_t:
r_t + \sigma_F'\xi_t = r_t + \sigma_F'\Sigma(\Sigma'\Sigma)^{-1}(\mu - r_t 1),
and thus, the only market price of risk ξ_t compatible with this approach is
\xi_t^* = \Sigma(\Sigma'\Sigma)^{-1}(\mu - r_t 1).

It is important to stress that this value of the market price of risk minimizes its squared norm under the no arbitrage condition:
\xi_t^* = \arg\min_{\xi_t}\; \xi_t'\xi_t \quad \text{s.t.} \quad \Sigma'\xi_t = \mu - r_t 1,
and, thus, it is consistent with the least-square framework.



The minimum replicating error is given by


\left((\theta_t^*)'\Sigma' - \sigma_F'\right)\left(\Sigma\theta_t^* - \sigma_F\right)
= \sigma_F'\left(\Sigma(\Sigma'\Sigma)^{-1}\Sigma' - I\right)\left(\Sigma(\Sigma'\Sigma)^{-1}\Sigma' - I\right)\sigma_F
= \sigma_F'\left(I - \Sigma(\Sigma'\Sigma)^{-1}\Sigma'\right)\sigma_F,
where we see that, if the market is complete, the error is zero since \Sigma(\Sigma'\Sigma)^{-1}\Sigma' = I, where I is the identity matrix.
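The least-squares replicating portfolio and its error can be computed directly in R. The sketch below assumes an illustrative incomplete market with k = 3 risk sources and n = 2 risky assets (Σ and σ_F are hypothetical values, not from the text):

# Illustrative incomplete market: Sigma is k x n (3 x 2), sigmaF is k x 1
Sigma = matrix(c(0.20, 0.05,
                 0.00, 0.15,
                 0.10, 0.10), nrow = 3, byrow = TRUE)
sigmaF = c(0.12, 0.08, 0.02)                                 # diffusion of the asset to replicate

theta_star = solve(t(Sigma) %*% Sigma, t(Sigma) %*% sigmaF)  # (Sigma'Sigma)^{-1} Sigma' sigmaF
P = Sigma %*% solve(t(Sigma) %*% Sigma) %*% t(Sigma)         # projection onto the span of Sigma
err = t(sigmaF) %*% (diag(3) - P) %*% sigmaF                 # minimum squared replication error
theta_star; err

The same coefficients could also be obtained as an OLS regression of σ_F on the columns of Σ without intercept, which is exactly the least-squares reading of the problem.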

6.6 Change of probability and asset pricing


We return to the stochastic differential equation for a generic asset F_t:
\frac{dF_t}{F_t} = \left(r_t + \sigma_F'\xi_t\right)dt + \sigma_F'\, dW_t,
which can be written as
\frac{dF_t}{F_t} = r_t\, dt + \sigma_F'\left(\xi_t\, dt + dW_t\right).
Girsanov demonstrated that the term ξ_t dt + dW_t is the differential of another Wiener process under a new probability measure. Thus, under this new measure, the expected return on any asset must be equal to the riskless interest rate (this is the reason why the new probability is often called «risk neutral probability»).
Theorem 6.6.1 (Girsanov). Given the market (6.1.1)-(6.1.2), if there exists a vector ξt such that
Σ′ ξt = µ − rt 1
then there exists a probability measure Q such that
dW_t^Q = \xi_t\, dt + dW_t, \qquad (6.6.1)
provided that the so-called Radon-Nikodym derivative
\frac{dQ_{T|t}}{dP_{T|t}} = e^{-\frac{1}{2}\int_t^T \xi_s'\xi_s\, ds - \int_t^T \xi_s'\, dW_s}, \qquad (6.6.2)
is a martingale (i.e. E_t\left[\frac{dQ_{T|t}}{dP_{T|t}}\right] = 1).

A sufficient (but not necessary) condition for (6.6.2) to be a martingale on a given horizon T is the so-called Novikov condition
E_t\left[e^{\frac{1}{2}\int_t^T \xi_s'\xi_s\, ds}\right] < \infty.
A demonstration of Theorem 6.6.1 can be found in Karatzas and Shreve [1991]. Theorem 6.6.1 allows us to
conclude that if the financial market is arbitrage free (and ξt is such that the Radon-Nikodym is a martingale),
then we can switch from the historical probability P to another probability Q under which the expected return on
any financial asset coincides with the riskless interest rate. This is the reason why the new probability Q is called
risk neutral probability.
Theorem 6.6.1 is not based on the hypothesis of completeness. In fact, it is sufficient that the financial market
is arbitrage free.
Now we take Equation (6.1.1) and we rewrite it under the new probability by using (6.6.1):
I_S^{-1} dS_t = \mu\, dt + \Sigma'\, dW_t = \mu\, dt + \Sigma'\left(dW_t^Q - \xi_t\, dt\right) = \left(\mu - \Sigma'\xi_t\right)dt + \Sigma'\, dW_t^Q.
Since the market is arbitrage free, then \mu - \Sigma'\xi_t = r_t 1 (from Equation (6.3.1)), and so
I_S^{-1} dS_t = r_t 1\, dt + \Sigma'\, dW_t^Q.

Remark 6.6.1. Girsanov’s theorem allows us to change the drift of any stochastic process, while the diffusion
cannot be changed.

The new probability measure Q allows us to obtain the following result:2


d\left(\frac{1}{G_t}S_t\right) = d\left(\frac{1}{G_t}\right)S_t + \frac{1}{G_t}dS_t + \underbrace{d\left(\frac{1}{G_t}\right)dS_t}_{0}
= -\frac{1}{G_t}r_t S_t\, dt + \frac{1}{G_t}I_S\left(r_t 1\, dt + \Sigma'\, dW_t^Q\right)
= \frac{1}{G_t}I_S\Sigma'\, dW_t^Q.

If we compute the integral from t up to T of both sides we have


\int_t^T d\left(\frac{1}{G_s}S_s\right) = \int_t^T \frac{1}{G_s}I_S\Sigma'\, dW_s^Q,
\frac{S_T}{G_T} - \frac{S_t}{G_t} = \int_t^T \frac{1}{G_s}I_S\Sigma'\, dW_s^Q,
whose expected value is
E_t^Q\left[\frac{S_T}{G_T}\right] = \frac{S_t}{G_t}.
This is the formula of a martingale. This is why Q is also known as the martingale equivalent measure (under it the asset prices are martingales if measured in terms of the riskless asset, which is accordingly the numéraire of the economy).
Furthermore, since G_t belongs to the information set in t, we can rewrite the previous equation as
S_t = E_t^Q\Big[S_T \underbrace{\tfrac{G_t}{G_T}}_{\text{discount factor}}\Big],
where we have already underlined that G_t/G_T is the discount factor between t and T. This is a fundamental result in asset pricing.

Theorem 6.6.2 (Fundamental Theorem of Asset Pricing (I)). On an arbitrage free financial market, the price
of any asset is given by the expected value, under the risk neutral probability, of its future value discounted by
the riskless interest rate.

According to Theorem 6.6.2, the value in t of an asset which pays 1 Euro in T (for sure) is given by
B_{t,T} = E_t^Q\left[1 \times \frac{G_t}{G_T}\right] = E_t^Q\left[e^{-\int_t^T r_u\, du}\right]. \qquad (6.6.3)

This means that the value of a zero coupon bond is given by the expected value of the discount factor (under
the risk neutral probability).
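Formula (6.6.3) can be approximated by Monte Carlo, simulating the short rate under Q and averaging the discount factor along each path. The mean-reverting (Vasiček-type) dynamics and all parameter values below are illustrative assumptions, not taken from the text:

set.seed(1)
# Illustrative short rate under Q: dr = a(b - r) dt + s dW^Q
a = 0.5; b = 0.03; s = 0.01; r0 = 0.02
T = 5; n = 250 * T; dt = T/n; nsim = 2000

B = numeric(nsim)
for (j in 1:nsim) {
  r = numeric(n + 1); r[1] = r0
  dW = sqrt(dt) * rnorm(n)
  for (i in 1:n) r[i + 1] = r[i] + a * (b - r[i]) * dt + s * dW[i]
  B[j] = exp(-sum(r[1:n]) * dt)      # e^{-int_t^T r_u du} along the simulated path
}
mean(B)                              # Monte Carlo estimate of B(t, T)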
If an asset pays some cash flows δ_t at any instant t up to time T, and is then sold in T at the price S_T (unknown in t), its value can be computed as the sum of its cash flows, each one interpreted as a single asset (the strategy of trading the cash flows of an asset independently of the main asset is called «stripping»), as shown in Figure 6.6.1.
² We recall that, given dG_t = G_t r_t dt, we have d(1/G_t) = -(1/G_t) r_t dt.

Figure 6.6.1: Expected present value of dividends. Along the time line, each cash flow δ_{t_i} is priced separately as E_{t_0}^Q[δ_{t_i} G_{t_0}/G_{t_i}], and the final payment δ_{t_n} + S_{t_n} as E_{t_0}^Q[(δ_{t_n} + S_{t_n}) G_{t_0}/G_{t_n}].

Accordingly, the value of a dividend paying asset is


S_t = \int_t^T E_t^Q\left[\delta_s \frac{G_t}{G_s}\right] ds + E_t^Q\left[S_T \frac{G_t}{G_T}\right] = E_t^Q\left[\int_t^T \delta_s \frac{G_t}{G_s}\, ds + S_T \frac{G_t}{G_T}\right].

Theorem 6.6.3 (Fundamental Theorem of Asset Pricing (II)). On an arbitrage free financial market, the value
of any asset is given by the expected value, under the risk neutral probability, of its future cash flows discounted
by the riskless interest rate.

6.7 The switch between probabilities


It should be quite clear that (with t ≥ t_0)
E_{t_0}^Q[X_t] = \int_\Omega X_t(\omega)\, dQ_{t|t_0}(\omega) = \int_\Omega X_t(\omega)\,\frac{dQ_{t|t_0}(\omega)}{dP_{t|t_0}(\omega)}\, dP_{t|t_0}(\omega) = E_{t_0}\left[X_t\,\frac{dQ_{t|t_0}}{dP_{t|t_0}}\right],
where dQ_{t|t_0}/dP_{t|t_0} is the Radon-Nikodym derivative (6.6.2), and Ω is the domain of the stochastic variable X.
Thus, we can always switch from an expected value under Q to an expected value under the historical probability P and vice versa.
From Girsanov's theorem we know that
\frac{dQ_{t|t_0}}{dP_{t|t_0}} = e^{-\frac{1}{2}\int_{t_0}^t \xi_s'\xi_s\, ds - \int_{t_0}^t \xi_s'\, dW_s} \equiv m_{t_0,t},

and by using Itô's lemma we can write
\frac{dm_{t_0,t}}{m_{t_0,t}} = -\xi_t'\, dW_t. \qquad (6.7.1)
Accordingly, the value of any asset can be written as
S_{t_0} = E_{t_0}^Q\left[S_T\, e^{-\int_{t_0}^T r_s\, ds}\right],

or, alternatively,
S_{t_0} = E_{t_0}\left[S_T\, m_{t_0,T}\, e^{-\int_{t_0}^T r_s\, ds}\right] = E_{t_0}\left[S_T\, e^{-\frac{1}{2}\int_{t_0}^T \xi_s'\xi_s\, ds - \int_{t_0}^T \xi_s'\, dW_s}\, e^{-\int_{t_0}^T r_s\, ds}\right].
Thus, we can conclude that the value of an asset can also be computed under the historical probability, but the discount factor must be stochastic:
S_{t_0} = E_{t_0}\Big[S_T \underbrace{e^{-\int_{t_0}^T \left(r_s + \frac{1}{2}\xi_s'\xi_s\right) ds - \int_{t_0}^T \xi_s'\, dW_s}}_{\text{Stochastic Discount Factor (SDF)}}\Big].
Since the SDF between time t_0 and time t_0 is of course equal to 1, we can write
S_{t_0}\,\mathrm{SDF}_{t_0,t_0} = E_{t_0}\left[S_T\,\mathrm{SDF}_{t_0,T}\right],
i.e. under the historical probability the asset prices are martingales if they are discounted by the stochastic discount factor.

6.8 Assets with coupons/dividends


If a financial asset instantaneously pays coupons/dividends δ_t (in monetary units), then the fundamental asset pricing theorem must be interpreted in the following way: under the probability Q, the «total» expected return on the asset must equal the riskless interest rate r_t, where the total return is the sum of the capital return (dS_t/S_t) and the coupon/dividend return ((δ_t/S_t) dt). Thus, we must write
\frac{dS_t}{S_t} + \frac{\delta_t}{S_t}\, dt = r_t\, dt + \sigma'\, dW_t^Q,
which becomes
\frac{dS_t}{S_t} = \left(r_t - \frac{\delta_t}{S_t}\right) dt + \sigma'\, dW_t^Q.
This means that an asset which pays coupons/dividends has a growth rate lower than that of an asset which does not pay any cash flow.

Exercise 6.8.1. Let us take into account the simplest case: everything is deterministic (coupons/dividends and
interest rate). The value of an asset paying cash flows δt from t to T is given by
V_t = \int_t^T \delta_s\, e^{-\int_t^s r_u\, du}\, ds.
Now, if we differentiate with respect to time we have
dV_t = \left(-\delta_t + r_t \int_t^T \delta_s\, e^{-\int_t^s r_u\, du}\, ds\right) dt,
which can be written as
\frac{dV_t}{V_t} = \left(r_t - \frac{\delta_t}{V_t}\right) dt,
which is, of course, the result we were looking for.

According to the result of the exercise, if the value of a bond is given by
V_{t,T} = E_t^Q\left[\int_t^T \delta_s \frac{G_t}{G_s}\, ds + \frac{G_t}{G_T}\right],

then the expected value of its differential is
E_t^Q[dV_{t,T}] = \left(V_{t,T}\, r_t - \delta_t\right) dt.
If the coupons are deterministic then the value of V_{t,T} can be simplified as follows
V_{t,T} = \int_t^T \delta_s\, E_t^Q\left[\frac{G_t}{G_s}\right] ds + E_t^Q\left[\frac{G_t}{G_T}\right] = \int_t^T \delta_s\, B_{t,s}\, ds + B_{t,T}.
Once the values of all the zero-coupon bonds are known, the asset price V_{t,T} can be easily computed.

Exercise 6.8.2. An interesting case is that of a perfectly indexed bond whose coupon is equal to the riskless interest rate:
V_{t,T} = E_t^Q\left[\int_t^T r_s \frac{G_t}{G_s}\, ds + \frac{G_t}{G_T}\right].
Now we recall that
\frac{dG_s}{G_s} = r_s\, ds,
and so we can substitute r_s ds in the integral with its corresponding value:
V_{t,T} = E_t^Q\left[\int_t^T \frac{G_t}{G_s}\frac{dG_s}{G_s} + \frac{G_t}{G_T}\right] = E_t^Q\left[\int_t^T \frac{G_t}{G_s^2}\, dG_s + \frac{G_t}{G_T}\right]
= E_t^Q\left[\left[-\frac{G_t}{G_s}\right]_{s=t}^{s=T} + \frac{G_t}{G_T}\right] = E_t^Q\left[-\frac{G_t}{G_T} + \frac{G_t}{G_t} + \frac{G_t}{G_T}\right] = 1.

The result of this exercise teaches that a perfectly indexed bond must always be listed at par.
Chapter 7

Asset prices

7.1 Forward
The fundamental theorem of asset pricing is a powerful tool which can be used for pricing any derivative once the rule for computing its cash flows is known.
The first derivative officially traded on the Chicago Stock Exchange was a forward contract: two parties agree at time t_0 to exchange at time T > t_0 an asset, called the underlying asset (whose value will be S_T), against an amount of money (F_T). The pay-off of the party who receives S_T and pays F_T will be S_T − F_T. By applying the fundamental theorem of asset pricing we can conclude that the value in t_0 of this forward contract is
F_{t_0,T} = E_{t_0}^Q\left[(S_T - F_T)\,\frac{G_{t_0}}{G_T}\right]. \qquad (7.1.1)

Both parties are willing to sign the contract if and only if the possible gains and the possible losses compensate, i.e. if the value of this asset, at time t_0, is nil. This is consistent with the actual working of the forward: when the two parties enter the contract, no money is exchanged between them and, thus, its price must be zero. This means that
0 = E_{t_0}^Q\left[(S_T - F_T)\,\frac{G_{t_0}}{G_T}\right],
from which we can compute the equilibrium value of the forward price F_T:
0 = E_{t_0}^Q\left[S_T\,\frac{G_{t_0}}{G_T}\right] - F_T\, E_{t_0}^Q\left[\frac{G_{t_0}}{G_T}\right].
Since the fundamental theorem of asset pricing must be valid for any asset, and also for the underlying asset (which is supposed not to pay dividends), then
S_{t_0} = E_{t_0}^Q\left[S_T\,\frac{G_{t_0}}{G_T}\right],
and so the previous equality can be simplified as
0 = S_{t_0} - F_T\, B_{t_0,T}, \qquad F_T = \frac{S_{t_0}}{B_{t_0,T}}.
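In R the equilibrium forward price is just the spot price divided by the zero-coupon bond price. The numbers below are purely illustrative (a constant rate is assumed only for computing B_{t0,T}):

S0 = 100                     # spot price of the underlying (no dividends), illustrative
r = 0.02; T = 1              # constant riskless rate and maturity, illustrative
B0T = exp(-r * T)            # zero-coupon bond price B_{t0,T}
FT = S0 / B0T                # forward price F_T = S_{t0} / B_{t0,T}
FT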

7.2 Futures
While the forward contract does not pay any cash flow between its issuance and its maturity, the futures contract asks both parties to mark their position to market day by day. Thus, the party who is losing money because of a change in the value of the underlying will have to pay a cash flow to the other party. Accordingly, the value of the futures position is reset to zero at any date (since the debt of the losing party is immediately settled). If we call F̂_{t,T} the price of the futures contract at any time t ∈ [t_0, T] (where t_0 is the date of issuance and T is the maturity), the cash flow paid at each period must be dF̂_{t,T} (i.e. the change in the contract value) and, according to the fundamental theorem of asset pricing, we can write
\hat F_{t_0,T} = E_{t_0}^Q\left[\int_{t_0}^T \frac{G_{t_0}}{G_s}\, d\hat F_{s,T}\right] = 0.
This expected value is zero if F̂_{t,T} is a martingale under Q, i.e. if its stochastic differential equation has zero drift, so that the integral above is a pure Itô integral (whose expected value is zero). Thus, we know that
d\hat F_{t,T} = \sigma_F'\, dW_t^Q,

and, if we integrate both sides:
\int_{t_0}^T d\hat F_{t,T} = \int_{t_0}^T \sigma_F'\, dW_t^Q,
\hat F_{T,T} - \hat F_{t_0,T} = \int_{t_0}^T \sigma_F'\, dW_t^Q,
\hat F_{t_0,T} = \hat F_{T,T} - \int_{t_0}^T \sigma_F'\, dW_t^Q,
and, finally, when the expected value is computed:
E_{t_0}^Q\left[\hat F_{t_0,T}\right] = E_{t_0}^Q\left[\hat F_{T,T} - \int_{t_0}^T \sigma_F'\, dW_t^Q\right],
\hat F_{t_0,T} = E_{t_0}^Q\left[\hat F_{T,T}\right].

The price of the futures contract at maturity must coincide with the price of the underlying asset (what is a party willing to pay at time T for receiving the price S_T at time T? Exactly S_T). Thus, we can conclude that
\hat F_{t_0,T} = E_{t_0}^Q[S_T],
i.e. the price of a futures contract is the expected value of the price of the underlying, without any discount factor. The discount factor is no longer present since the position on the futures is marked to market day by day.
The discount factor is no more present since the position on the futures is made up day by day.

7.3 Options
If at maturity T the buyer of the contract has the right to choose whether to actually make the exchange between
ST and FT , the contract becomes an option (in particular a call option, which gives the right to buy an asset at
a given price K, called strike price). The pay-off of the call option is:

• positive: if at the time T the price ST is higher than the strike price K; in this case, in fact, it is convenient to
buy the stock at the strike price since the buyer pays K for an asset whose value is higher; thus, the pay-off
is ST − K;

• nil: if at time T the price ST is lower than K, then it is convenient to buy the asset directly on the market
(at the price ST < K) by quitting the option contract; in this case the pay-off is zero.

The price of the call option can accordingly be represented as
C_{t_0} = E_{t_0}^Q\left[(S_T - K)\,\mathbb{I}_{S_T > K}\,\frac{G_{t_0}}{G_T}\right], \qquad (7.3.1)
where \mathbb{I}_\varepsilon is the indicator function of the event ε, whose value is 1 if the event ε happens and zero otherwise:
\mathbb{I}_\varepsilon = \begin{cases} 1, & \varepsilon \text{ happens} \\ 0, & \varepsilon \text{ does not happen} \end{cases}

If we buy the right to sell an asset at time T and at a given price K, this is a so-called put option and its value is given by
P_{t_0} = E_{t_0}^Q\left[(K - S_T)\,\mathbb{I}_{K > S_T}\,\frac{G_{t_0}}{G_T}\right]. \qquad (7.3.2)
There exists a relationship between the prices of a call, a put and a forward, which can be found by recalling that the indicator function of an event is equal to one minus the indicator function of the opposite event. Thus, if we start from the value of a call, we can write
C_{t_0} = E_{t_0}^Q\left[(S_T - K)\,\mathbb{I}_{S_T > K}\,\frac{G_{t_0}}{G_T}\right] = E_{t_0}^Q\left[(S_T - K)\left(1 - \mathbb{I}_{K > S_T}\right)\frac{G_{t_0}}{G_T}\right]
= E_{t_0}^Q\left[(S_T - K)\,\frac{G_{t_0}}{G_T}\right] + E_{t_0}^Q\left[(K - S_T)\,\mathbb{I}_{K > S_T}\,\frac{G_{t_0}}{G_T}\right]
= F_{t_0} + P_{t_0}.
This is the so-called «put-call parity»: the value of a call is equal to the sum of a forward and a put written on the same underlying and having F_T = K.
Now, we are going to simplify the price of a call through two changes of probability. After the first passage:
C_{t_0} = E_{t_0}^Q\left[(S_T - K)\,\mathbb{I}_{S_T > K}\,\frac{G_{t_0}}{G_T}\right] = E_{t_0}^Q\left[\mathbb{I}_{S_T > K}\, S_T\,\frac{G_{t_0}}{G_T}\right] - K\, E_{t_0}^Q\left[\mathbb{I}_{S_T > K}\,\frac{G_{t_0}}{G_T}\right],
the second expected value can be written as follows
E_{t_0}^Q\left[\mathbb{I}_{S_T > K}\,\frac{G_{t_0}}{G_T}\right] = \int_\Omega \mathbb{I}_{S_T > K}\,\frac{G_{t_0}}{G_T}\, dQ = E_{t_0}^Q\left[\frac{G_{t_0}}{G_T}\right]\int_\Omega \mathbb{I}_{S_T > K}\,\frac{\frac{G_{t_0}}{G_T}}{E_{t_0}^Q\left[\frac{G_{t_0}}{G_T}\right]}\, dQ,
where we can define the new probability
dF_T \equiv \frac{\frac{G_{t_0}}{G_T}}{E_{t_0}^Q\left[\frac{G_{t_0}}{G_T}\right]}\, dQ,
and obtain
E_{t_0}^Q\left[\mathbb{I}_{S_T > K}\,\frac{G_{t_0}}{G_T}\right] = E_{t_0}^Q\left[\frac{G_{t_0}}{G_T}\right]\int_\Omega \mathbb{I}_{S_T > K}\, dF_T = B_{t_0,T}\, E_{t_0}^{F_T}\left[\mathbb{I}_{S_T > K}\right].

Remark 7.3.1. The expected value of the indicator function of an event is the probability of the event:

Et [Iε ] = P {ε} .

By using this property we can finally write
E_{t_0}^Q\left[\mathbb{I}_{S_T > K}\,\frac{G_{t_0}}{G_T}\right] = B_{t_0,T}\, F_T\{S_T > K\}.

The first part of the option price can be simplified in a similar way:
E_{t_0}^Q\left[\mathbb{I}_{S_T > K}\, S_T\,\frac{G_{t_0}}{G_T}\right] = \int_\Omega \mathbb{I}_{S_T > K}\, S_T\,\frac{G_{t_0}}{G_T}\, dQ = E_{t_0}^Q\left[S_T\,\frac{G_{t_0}}{G_T}\right]\int_\Omega \mathbb{I}_{S_T > K}\,\frac{S_T\,\frac{G_{t_0}}{G_T}}{E_{t_0}^Q\left[S_T\,\frac{G_{t_0}}{G_T}\right]}\, dQ,
where the new probability is
dS \equiv \frac{S_T\,\frac{G_{t_0}}{G_T}}{E_{t_0}^Q\left[S_T\,\frac{G_{t_0}}{G_T}\right]}\, dQ,
and so
E_{t_0}^Q\left[\mathbb{I}_{S_T > K}\, S_T\,\frac{G_{t_0}}{G_T}\right] = E_{t_0}^Q\left[S_T\,\frac{G_{t_0}}{G_T}\right]\int_\Omega \mathbb{I}_{S_T > K}\, dS = S_{t_0}\, E_{t_0}^S\left[\mathbb{I}_{S_T > K}\right] = S_{t_0}\, S\{S_T > K\}.
Finally, the value of a call option is
C_{t_0} = S_{t_0}\, S\{S_T > K\} - K\, B_{t_0,T}\, F_T\{S_T > K\}.
If the riskless interest rate r is constant and the asset price S_t is log-normally distributed, then C_{t_0} coincides with the Black and Scholes formula.
Because of the put-call parity we obtain the value of the put option as follows
P_t = C_t - F_t = K\, B_{t,T}\left(1 - F_T\{S_T > K\}\right) - S_t\left(1 - S\{S_T > K\}\right).
Case: Black & Scholes formula
We assume that the interest rate r is constant and that the risky asset price follows, under the probability Q, a log-normal distribution according to the following formula:
S_T = S_t\, e^{r(T-t) - \frac{1}{2}\sigma^2(T-t) + \sigma\sqrt{T-t}\, y},
where y is a standard normal variable (with zero mean and variance equal to 1). This means that the density of the variable y under Q is
dQ = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}y^2}\, dy.
The riskless asset is given by
G_T = G_t\, e^{r(T-t)}.
Now, we compute the value of a call option. In order to do so we have to compute both the forward probability F_T and the probability S.
The density of the forward probability is
dF_T \equiv \frac{\frac{G_t}{G_T}}{E_t^Q\left[\frac{G_t}{G_T}\right]}\, dQ = \frac{e^{-r(T-t)}}{E_t^Q\left[e^{-r(T-t)}\right]}\, dQ = dQ.

If the interest rate r is constant, then the forward probability and the risk-neutral probability coincide. With this result we can compute the probability that S_T is higher than the strike price:
F_T\{S_T > K\} = Q\{S_T > K\} = Q\{\ln S_T > \ln K\}
= Q\left\{\ln S_t + r(T-t) - \frac{1}{2}\sigma^2(T-t) + \sigma\sqrt{T-t}\, y > \ln K\right\}
= Q\left\{y > \frac{\ln\frac{K}{S_t} - \left(r - \frac{1}{2}\sigma^2\right)(T-t)}{\sigma\sqrt{T-t}}\right\}.
Since y is normal, we can use the cumulative distribution function of a standard normal variable (let us call it Φ):
Q\left\{y > \frac{\ln\frac{K}{S_t} - \left(r - \frac{1}{2}\sigma^2\right)(T-t)}{\sigma\sqrt{T-t}}\right\} = 1 - \Phi\left(\frac{\ln\frac{K}{S_t} - \left(r - \frac{1}{2}\sigma^2\right)(T-t)}{\sigma\sqrt{T-t}}\right)
= \Phi\left(-\frac{\ln\frac{K}{S_t} - \left(r - \frac{1}{2}\sigma^2\right)(T-t)}{\sigma\sqrt{T-t}}\right) = \Phi\left(\frac{\ln\frac{S_t}{K} + \left(r - \frac{1}{2}\sigma^2\right)(T-t)}{\sigma\sqrt{T-t}}\right).

We now look at the probability S:
dS \equiv \frac{S_T\,\frac{G_t}{G_T}}{E_t^Q\left[S_T\,\frac{G_t}{G_T}\right]}\, dQ = \frac{S_t\, e^{-\frac{1}{2}\sigma^2(T-t) + \sigma\sqrt{T-t}\, y}}{E_t^Q\left[S_t\, e^{-\frac{1}{2}\sigma^2(T-t) + \sigma\sqrt{T-t}\, y}\right]}\, dQ
= \frac{e^{\sigma\sqrt{T-t}\, y}}{E_t^Q\left[e^{\sigma\sqrt{T-t}\, y}\right]}\, dQ = \frac{e^{\sigma\sqrt{T-t}\, y}}{e^{\frac{1}{2}\sigma^2(T-t)}}\, dQ = e^{-\frac{1}{2}\sigma^2(T-t) + \sigma\sqrt{T-t}\, y}\, dQ
= e^{-\frac{1}{2}\sigma^2(T-t) + \sigma\sqrt{T-t}\, y}\, \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}y^2}\, dy = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(y - \sigma\sqrt{T-t}\right)^2}\, dy,
where we see that the variable y, under the new probability S, is again normally distributed but its mean is \sigma\sqrt{T-t} instead of zero. This means that, under S, the price of the risky asset can be written as
S_T = S_t\, e^{r(T-t) - \frac{1}{2}\sigma^2(T-t) + \sigma\sqrt{T-t}\left(y + \sigma\sqrt{T-t}\right)} = S_t\, e^{r(T-t) + \frac{1}{2}\sigma^2(T-t) + \sigma\sqrt{T-t}\, y},
where we see that, with respect to the initial formula, just the sign of σ² has changed. Thus, we obtain:
S\{S_T > K\} = S\{\ln S_T > \ln K\} = S\left\{\ln S_t + r(T-t) + \frac{1}{2}\sigma^2(T-t) + \sigma\sqrt{T-t}\, y > \ln K\right\}
= S\left\{y > \frac{\ln\frac{K}{S_t} - \left(r + \frac{1}{2}\sigma^2\right)(T-t)}{\sigma\sqrt{T-t}}\right\} = \Phi\left(\frac{\ln\frac{S_t}{K} + \left(r + \frac{1}{2}\sigma^2\right)(T-t)}{\sigma\sqrt{T-t}}\right).

Finally, we are able to obtain:
C_t = S_t\, \Phi\left(\frac{\ln\frac{S_t}{K} + \left(r + \frac{1}{2}\sigma^2\right)(T-t)}{\sigma\sqrt{T-t}}\right) - K\, B_{t,T}\, \Phi\left(\frac{\ln\frac{S_t}{K} + \left(r - \frac{1}{2}\sigma^2\right)(T-t)}{\sigma\sqrt{T-t}}\right),
which is the Black and Scholes formula.
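The formula can be implemented directly in R. The function below is a straightforward transcription of the result above (with B_{t,T} = e^{−r(T−t)}); the final calls use purely illustrative inputs:

bs_call = function(S, K, r, sigma, tau) {
  d1 = (log(S/K) + (r + 0.5 * sigma^2) * tau) / (sigma * sqrt(tau))
  d2 = d1 - sigma * sqrt(tau)        # = (log(S/K) + (r - sigma^2/2) tau) / (sigma sqrt(tau))
  S * pnorm(d1) - K * exp(-r * tau) * pnorm(d2)
}
bs_put = function(S, K, r, sigma, tau) {
  # from the put-call parity: P = C - F = C - (S - K e^{-r tau})
  bs_call(S, K, r, sigma, tau) - S + K * exp(-r * tau)
}
bs_call(S = 100, K = 100, r = 0.02, sigma = 0.20, tau = 1)
bs_put(S = 100, K = 100, r = 0.02, sigma = 0.20, tau = 1)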

7.4 Replication and hedging


Once the value of a derivative has been computed, as in the previous section, we can also replicate it through a suitable portfolio. Let us take into account the derivatives seen in the previous section:
F_t = E_t^Q\left[(S_T - F_T)\frac{G_t}{G_T}\right] = S_t - F_T\, B_{t,T},
C_t = S_t\, S\{S_T > K\} - K\, B_{t,T}\, F_T\{S_T > K\},
P_t = K\, B_{t,T}\left(1 - F_T\{S_T > K\}\right) - S_t\left(1 - S\{S_T > K\}\right).
If we take into account a portfolio formed by θ_{t,S} units of asset S_t and θ_{t,B} units of asset B_{t,T}, then the value of the portfolio is
R_t = \theta_{t,S} S_t + \theta_{t,B} B_{t,T}.
Accordingly, we can easily check the following replicating portfolios:

• a forward can be replicated by buying θ_{t,S} = 1 stock and short selling θ_{t,B} = −F_T bonds;

• a call option can be replicated by buying θ_{t,S} = S{S_T > K} < 1 stock and short selling θ_{t,B} = K F_T{S_T > K} bonds;

• a put option can be replicated by short selling θ_{t,S} = (1 − S{S_T > K}) < 1 stock and buying θ_{t,B} = K(1 − F_T{S_T > K}) bonds.

We assume to be able to invest in a stock whose price is S_t and in a derivative, written on this stock, whose price is X_t(S_t). If we can buy/sell θ_{t,S} units of the stock and θ_{t,X} units of the derivative, the wealth is given by
R_t = \theta_{t,S} S_t + \theta_{t,X} X_t(S_t).
If we want to hedge this wealth against changes in S_t, we must find the suitable portfolio composition (θ_{t,S}, θ_{t,X}) which makes the value of the portfolio independent of the changes in S_t. In mathematical terms we want to set to zero the derivative of R_t with respect to S_t:
\frac{\partial R_t}{\partial S_t} = \theta_{t,S} + \theta_{t,X}\frac{\partial X_t}{\partial S_t} = 0.

Definition 7.4.1. The mathematical derivative of a financial derivative with respect to its underlying asset is called «Delta».

From the previous equation we obtain
\frac{\theta_{t,X}}{\theta_{t,S}} = -\frac{1}{\frac{\partial X_t}{\partial S_t}},
which is called «hedging ratio». In order to hedge a portfolio against a risk, we must buy a number of derivatives (written on that risk), relative to the number of stocks, equal to the opposite of the inverse of the Delta.
Sometimes, it is preferable to obtain the hedging ratio as the ratio between two amounts of money, by multiplying both sides by X_t/S_t:
\frac{\theta_{t,X} X_t}{\theta_{t,S} S_t} = -\frac{1}{\frac{\partial X_t}{\partial S_t}\frac{S_t}{X_t}},
where we see that the denominator of the right-hand side coincides with the elasticity of the derivative with respect to the underlying (which we call η_{X,S}).

Remark 7.4.1. The elasticity can be computed as the ratio between the differentials of two log functions
\frac{\partial X_t}{\partial S_t}\frac{S_t}{X_t} = \frac{d\ln X_t}{d\ln S_t},
and since
d\ln X_t = \eta_{X,S}\, d\ln S_t,
we can conclude that an easy way to estimate the elasticity is to run the OLS regression
d\ln X_t = \beta_0 + \beta_1\, d\ln S_t + \varepsilon_t,
where we expect β_0 not to be significantly different from zero and β_1 is the estimate of the elasticity.
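A minimal sketch of this regression in R, on simulated data (the true elasticity is set to 0.7 here purely for illustration):

set.seed(1)
n = 1000
dlnS = rnorm(n, 0, 0.01)                    # log-returns of the underlying (simulated)
dlnX = 0.7 * dlnS + rnorm(n, 0, 0.002)      # log-returns of the derivative, true elasticity 0.7
fit = lm(dlnX ~ dlnS)
coef(fit)                                   # beta0 close to 0, beta1 estimates the elasticity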

The Delta plays another important role in measuring the volatility of a derivative. If we compute the differential of X_t(S_t), we have
dX_t = \frac{\partial X_t}{\partial S_t}\, dS_t,
from which
V_t[dX_t] = \left(\frac{\partial X_t}{\partial S_t}\right)^2 V_t[dS_t].
Thus, we see that the variance of a derivative is proportional to the variance of the underlying, and the proportionality factor is the square of the Delta.

7.5 Real options


A real option is the right to start an investment at a chosen time which, of course, should be «optimal» in some way. In the simplest approach we assume that:

• the cost of the investment is constant over time and given by I;

• the value of the investment V_t evolves over time by following a geometric Brownian motion
\frac{dV_t}{V_t} = \mu\, dt + \sigma\, dW_t,
given the initial value V_0;

• the objective of the investor is to choose the time to exercise the option (τ) so that the discounted expected value of the difference between the value V_τ and the cost I is maximised:
\max_\tau\; C_\tau = E_0\left[(V_\tau - I)\, e^{-r\tau}\right].

The problem is solved through an easy application of Theorem 2.8.1. The first step is to rewrite the problem by defining τ as the stopping time when the process V_t reaches, for the first time, a given threshold (let us call it v):
\tau = \inf\{t \geq 0 : V_t = v\}.
Thus, the previous objective function can be written as follows
C_\tau = E_0\left[(V_\tau - I)\, e^{-r\tau}\right] = (v - I)\, E_0\left[e^{-r\tau}\right],
where we still have to compute the expected value of the discount factor. The «trick» is to recall that W_t is a martingale and for any constant α we can write
1 = E_0\left[e^{-\frac{1}{2}\alpha^2 t + \alpha W_t}\right],
which is true, because of Theorem 2.8.1, also for a stopping time:
1 = E_0\left[e^{-\frac{1}{2}\alpha^2 \tau + \alpha W_\tau}\right].

Now, since
V_t = V_0\, e^{\left(\mu - \frac{1}{2}\sigma^2\right)t + \sigma W_t} \iff W_t = \frac{\ln\frac{V_t}{V_0} - \left(\mu - \frac{1}{2}\sigma^2\right)t}{\sigma},
the previous equality can be written as
1 = E_0\left[\left(\frac{V_\tau}{V_0}\right)^{\frac{\alpha}{\sigma}} e^{-\left(\frac{1}{2}\alpha^2 + \frac{\alpha}{\sigma}\mu - \frac{1}{2}\frac{\alpha}{\sigma}\sigma^2\right)\tau}\right],
and, since V_τ = v:
\left(\frac{V_0}{v}\right)^{\frac{\alpha}{\sigma}} = E_0\left[e^{-\left(\frac{1}{2}\alpha^2 + \frac{\alpha}{\sigma}\mu - \frac{1}{2}\alpha\sigma\right)\tau}\right].
Now, we are finally able to compute the expected value by choosing α which solves
r = \frac{1}{2}\alpha^2 + \frac{\alpha}{\sigma}\mu - \frac{1}{2}\alpha\sigma,
whose solutions are
\frac{\alpha^*}{\sigma} = -\left(\frac{\mu}{\sigma^2} - \frac{1}{2}\right) \pm \sqrt{\left(\frac{\mu}{\sigma^2} - \frac{1}{2}\right)^2 + \frac{2r}{\sigma^2}}.
One solution is positive, while the other one is negative. Since
\left(\frac{V_0}{v}\right)^{\frac{\alpha^*}{\sigma}} = E_0\left[e^{-r\tau}\right],
we immediately see that for an infinite threshold (i.e. v → ∞) τ is never reached and the expected discount factor E_0[e^{-rτ}] must be zero. The left-hand side tends to zero for v → ∞ if and only if the exponent is positive and, accordingly, we choose only the positive solution
\beta \equiv -\left(\frac{\mu}{\sigma^2} - \frac{1}{2}\right) + \sqrt{\left(\frac{\mu}{\sigma^2} - \frac{1}{2}\right)^2 + \frac{2r}{\sigma^2}}.

Accordingly, the value of the option is
\max_\tau\; C_\tau = \max_\tau\; E_0\left[(V_\tau - I)\, e^{-r\tau}\right] = \max_v\; (v - I)\left(\frac{V_0}{v}\right)^\beta.
Remark 7.5.1. The original optimal stopping time problem has been changed into an optimal threshold problem, where we aim at finding the value v above which it is expedient to exercise the option (and start investing).

The first order condition is
\frac{\partial C}{\partial v} = \left(\frac{V_0}{v}\right)^\beta - \beta\,\frac{v - I}{v}\left(\frac{V_0}{v}\right)^\beta = 0,
from which we obtain the candidate for the optimal threshold:
v^* = I\,\frac{\beta}{\beta - 1}.
The second order condition is
\frac{\partial^2 C}{\partial v^2} = -\beta\left(\frac{V_0}{v}\right)^\beta\frac{1}{v} - \beta\,\frac{I}{v^2}\left(\frac{V_0}{v}\right)^\beta + \beta^2\,\frac{v - I}{v^2}\left(\frac{V_0}{v}\right)^\beta < 0,
which is satisfied for
(\beta - 1)\, v < (1 + \beta)\, I,
and in the optimal solution v^* this inequality holds true.
Finally, we can conclude that it is optimal to invest at time τ*, defined as follows:
\tau^* = \inf\left\{t \geq 0 : V_t = I\,\frac{\beta}{\beta - 1}\right\}.
Chapter 8

Credit risk

8.1 Default measures


There exists a straightforward parallel between the probability to die for humans and the probability to go bankrupt
for firms. In fact, the model that will be presented here is able to describe both cases.
Let us call πτ the density function of the default time τ (this could be the «death time» for an agent) whose
domain is assumed to be [t0 , ω] (but the analysis does not change if ω → ∞). It must be true that πt > 0 for any
t ∈ [t0 , ω] and
∫ ω
πs ds = 1.
t0

The probability to go bankrupt between time t0 and time t is given by


∫ t
(t qt0 ) = πs ds,
t0

while the probability to be solvable between time t0 and time t is of course


∫ t
(t pt0 ) = 1 − πs ds.
t0

If we differentiate this equation we have


d (t pt0 ) = −πt dt,

with the natural boundary condition (t0 pt0 ) = 1, i.e. the probability to be solvable in t0 given that we are solvable
in t0 is, of course, 1.
The previous differential equation can also be written as
\frac{d(\,_t p_{t_0})}{(\,_t p_{t_0})} = -\frac{\pi_t}{(\,_t p_{t_0})}\, dt,
where
\frac{\pi_t}{(\,_t p_{t_0})} = \frac{\pi_t}{1 - \int_{t_0}^t \pi_s\, ds} \equiv \lambda_t, \qquad (8.1.1)
is often called hazard rate, while we will call it intensity of default (in the actuarial literature this measure is called force of mortality). Since both the numerator and the denominator of this ratio are positive, λ_t must be positive for any t.

Remark 8.1.1. In discrete time, the hazard rate has the following meaning: the number of firms which go bankrupt in a given period (for instance one year) divided by the number of firms that were solvable at the beginning of the same period.


The solvency probability between t and T can be traced back to the probabilities of solvency between t_0 and t and between t_0 and T. In particular, if we solve the differential equation
\frac{d(\,_t p_{t_0})}{(\,_t p_{t_0})} = -\lambda_t\, dt, \qquad (\,_{t_0} p_{t_0}) = 1,
we have
(\,_t p_{t_0}) = e^{-\int_{t_0}^t \lambda_s\, ds},
or, equivalently,
(\,_T p_{t_0}) = e^{-\int_{t_0}^T \lambda_s\, ds}.
By using Bayes' rule about conditional probability
P(A|B) = \frac{P(B|A)\, P(A)}{P(B)},
we can write
P(\tau > T \,|\, \tau > t) = \frac{P(\tau > t \,|\, \tau > T)\, P(\tau > T)}{P(\tau > t)},
where P(τ > T | τ > t) is the probability to be solvable in T given that one is solvable in t (with T > t). We can also write it as (_T p_t). The probability of being solvable in t given that we are solvable in T is, of course, 1 (i.e. P(τ > t | τ > T) = 1). Then, since we have
P(\tau > T) = (\,_T p_{t_0}), \qquad P(\tau > t) = (\,_t p_{t_0}),
we can finally write
(\,_T p_t) = \frac{(\,_T p_{t_0})}{(\,_t p_{t_0})} = \frac{e^{-\int_{t_0}^T \lambda_s\, ds}}{e^{-\int_{t_0}^t \lambda_s\, ds}} = e^{-\int_t^T \lambda_s\, ds}. \qquad (8.1.2)
It is worth noting the analogy between r_t and λ_t and between the value of a zero-coupon bond B_{t,T} and the value of the survival probability (_T p_t).
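With a deterministic intensity, formula (8.1.2) can be evaluated numerically. A minimal sketch, assuming (purely for illustration) a linearly increasing hazard rate λ_t = 0.01 + 0.002 t:

lambda = function(t) 0.01 + 0.002 * t                  # illustrative deterministic hazard rate
surv_prob = function(t, T) {
  exp(-integrate(lambda, lower = t, upper = T)$value)  # (T p_t) = exp(-int_t^T lambda_s ds)
}
surv_prob(0, 5)          # probability of being solvent over the next 5 years
1 - surv_prob(0, 5)      # corresponding default probability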

8.2 Double stochastic default intensity and asset pricing


The default time τ is a stochastic variable. Nevertheless, in the previous section we have assumed that both λ_t and π_t were deterministic. A more realistic assumption is that λ_t (or π_t) is stochastic itself (this is the so-called double stochastic model). In this case, Equation (8.1.2) is valid in expected value terms:
(\,_T p_t) = E_t\left[e^{-\int_t^T \lambda_s\, ds}\right], \qquad (8.2.1)
where the expected value is computed under the historical probability.

The analogy between the survival probability in (8.2.1) and the value of a zero-coupon bond in (6.6.3) is apparent and it is summarised in Table 8.2.1.
Since we know that the probability of an event can be written as the expected value of the indicator function of that event, Equation (8.2.1) can also be written as
(\,_T p_t) = E_t\left[e^{-\int_t^T \lambda_s\, ds}\right] = E_t\left[\mathbb{I}_{T < \tau}\right], \qquad (8.2.2)
i.e. the expected value of the indicator function that the default hasn't happened yet at time T (i.e. τ > T).
Since a probability always belongs to the domain [0, 1], from (8.2.2) we see that λ_t must belong to the domain [0, +∞[. A negative value of λ_t would imply a solvency probability higher than 1, which does not make any sense.
When we want to price an asset whose cash flows depend on the default date τ, we have to take into account the expected value computed with respect to both the financial market risk (summarised by Q) and the credit risk (summarised by τ). Thus, the expected value takes the following form
E_{t,t}^{Q,\tau}[\bullet] \equiv E^{Q,\tau}\left[\,\bullet\,|\,\mathcal{F}_t \wedge \mathcal{G}_t\right],
where we see that it is computed with respect to two information sets:

Table 8.2.1: Comparison between the value of a zero-coupon bond and the default probability

Financial framework                                Credit risk framework
dG_t/G_t = r_t dt                                  d(_t p_0)/(_t p_0) = −λ_t dt
G_t = G_0 e^{∫_0^t r_u du}                         (_t p_0) = e^{−∫_0^t λ_u du}
B_{t,T} = E_t^Q[e^{−∫_t^T r_u du}]                 (_T p_t) = E_t[e^{−∫_t^T λ_u du}]

• the financial information set at time t, F_t (associated with the probability Q);

• the credit risk information set at time t, G_t (associated with the default time τ).

Such an expected value should be computed as the integral of its argument, weighted by the joint density function of both the financial risk and the default risk. We will see how to deal with this problem in the next sections.

8.3 Zero-coupon bond


Let us think of a zero-coupon bond with default risk whose value, at maturity T, is 1 if the issuer has not gone bankrupt, and 0 otherwise:
B_0(t,T) = E_{t,t}^{Q,\tau}\left[\left(1\cdot\mathbb{I}_{\tau > T} + 0\cdot\mathbb{I}_{\tau \leq T}\right) e^{-\int_t^T r_u\, du}\right] = E_{t,t}^{Q,\tau}\left[\mathbb{I}_{\tau > T}\, e^{-\int_t^T r_u\, du}\right].
Now, we can use a trick due to Lando [1998] and write the expected value by using the rule of iterated expected values (the so-called tower property for expected values)
E_t^Q\left[E_T^Q[\bullet]\right] = E_t^Q[\bullet],
as follows
B_0(t,T) = E_{t,t}^{Q,\tau}\left[E_{T,t}^{Q,\tau}\left[\mathbb{I}_{\tau > T}\, e^{-\int_t^T r_u\, du}\right]\right],
where the inner expected value has a bigger information set for what concerns the financial risk (Q is taken at time T). Actually, at time T, all the financial variables (here just the interest rate) are known and so they can be taken outside the inner expected value:
B_0(t,T) = E_{t,t}^{Q,\tau}\left[e^{-\int_t^T r_u\, du}\, E_{T,t}^{Q,\tau}\left[\mathbb{I}_{\tau > T}\right]\right].
Now, we have the expected value of the solvency indicator function, as in (8.2.2), and we can accordingly write
B_0(t,T) = E_{t,t}^{Q,\tau}\left[e^{-\int_t^T r_u\, du}\, e^{-\int_t^T \lambda_u\, du}\right] = E_t^Q\left[e^{-\int_t^T (r_u + \lambda_u)\, du}\right],
where we see that
\underbrace{E_t^Q\left[e^{-\int_t^T r_u\, du}\right]}_{B(t,T)} > \underbrace{E_t^Q\left[e^{-\int_t^T (\lambda_u + r_u)\, du}\right]}_{B_0(t,T)};
in fact a bond bearing more risk must have a lower price (in order to have a higher return). In particular, if we compare the return on a non-defaultable bond (which is r_t) and the return on a defaultable bond (which is r_t + λ_t), the difference is exactly the default intensity λ_t.
When default happens, the issuer of the zero-coupon is often able to pay back a percentage of the face value.
We call this percentage the «recovery rate» ϕt . Thus, the possible cash flows of this asset are given by:
• 1 at the maturity T if the default hasn’t happened yet;

• ϕτ at the time of default τ if default happens before maturity T .

Accordingly, we can write
B_\phi(t,T) = E_{t,t}^{Q,\tau}\left[\mathbb{I}_{\tau > T}\, e^{-\int_t^T r_u\, du} + \phi_\tau\, \mathbb{I}_{\tau \leq T}\, e^{-\int_t^\tau r_u\, du}\right],
where the first part can be simplified as in the previous case, while the second part can be written as
E_{t,t}^{Q,\tau}\left[E_{\infty,t}^{Q,\tau}\left[\phi_\tau\, \mathbb{I}_{\tau \leq T}\, e^{-\int_t^\tau r_u\, du}\right]\right],
where we have taken, for the financial risk, the biggest filtration (that corresponding to an infinite maturity). Unfortunately, all the stochastic variables inside the inner expected value depend on τ and, thus, cannot be taken outside of it. Nevertheless, the inner expected value can be computed by using the marginal density function of τ (i.e. π(τ)), since all the financial risk is known when t → ∞. Thus, we can simplify the previous formula as
E_{t,t}^{Q,\tau}\left[\int_t^\infty \phi_s\, \mathbb{I}_{s \leq T}\, e^{-\int_t^s r_u\, du}\, \pi_s\, ds\right].
Now, we recall that
\pi_s = \lambda_s\, e^{-\int_t^s \lambda_u\, du},
as in Equation (8.1.1). Accordingly, we can write
E_t^Q\left[\int_t^\infty \phi_s\, \lambda_s\, \mathbb{I}_{s \leq T}\, e^{-\int_t^s (r_u + \lambda_u)\, du}\, ds\right],
where the indicator function can be eliminated from the integrand by changing the upper bound of the integral as follows:
E_t^Q\left[\int_t^T \phi_s\, \lambda_s\, e^{-\int_t^s (r_u + \lambda_u)\, du}\, ds\right].
Finally, the value of a bond with default risk can be written as
B_\phi(t,T) = E_t^Q\left[\int_t^T \phi_s\, \lambda_s\, e^{-\int_t^s (\lambda_u + r_u)\, du}\, ds + e^{-\int_t^T (\lambda_u + r_u)\, du}\right].
We can see that the value of this asset looks like that of a coupon bond, whose coupons are equal to φ_t λ_t, and at maturity the face value is paid back. The discount rate is, once again, r_t + λ_t as before.
The price of a zero-coupon looks like the price of a coupon bond since the recovery rate φ_t may be paid at any time and, thus, when weighted by the default probability, it works like a coupon.
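When r, λ and φ are deterministic the expectation disappears and B_φ(t,T) reduces to a single numerical integration. A minimal sketch in R with illustrative constant parameters (t = 0):

r = 0.02; lambda = 0.03; phi = 0.4; T = 5            # illustrative constant parameters
coupon_leg = integrate(function(s) phi * lambda * exp(-(r + lambda) * s), 0, T)$value
face_leg = exp(-(r + lambda) * T)
B_phi = coupon_leg + face_leg                        # B_phi(0, T): bond with recovery
B0 = exp(-(r + lambda) * T)                          # B_0(0, T): zero-recovery defaultable bond
B = exp(-r * T)                                      # B(0, T): default-free bond
c(B = B, B_phi = B_phi, B0 = B0)                     # B > B_phi > B_0, as expected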

8.4 Default-coupon bond


We have already shown that a bond whose coupon is indexed to the interest rate as follows
\delta_t = a + r_t,
where a is a spread, has the following price
E_t^Q\left[\int_t^T (a + r_s)\, e^{-\int_t^s r_u\, du}\, ds + e^{-\int_t^T r_u\, du}\right]
= E_t^Q\left[\int_t^T a\, e^{-\int_t^s r_u\, du}\, ds\right] + E_t^Q\left[\int_t^T r_s\, e^{-\int_t^s r_u\, du}\, ds + e^{-\int_t^T r_u\, du}\right]
= a\int_t^T B(t,s)\, ds + 1.
This means that such an asset should always be listed above par. Nevertheless, during the last financial crisis, many indexed assets (with a spread) fell below par. How can we explain that?

The answer is in the credit risk. If we take into account the credit risk, the value of such an asset is
E_{t,t}^{Q,\tau}\left[\int_t^T (a + r_s)\, \mathbb{I}_{s < \tau}\, e^{-\int_t^s r_u\, du}\, ds + \mathbb{I}_{T < \tau}\, e^{-\int_t^T r_u\, du} + \mathbb{I}_{T \geq \tau}\, \phi_\tau\, e^{-\int_t^\tau r_u\, du}\right],
which simplifies to
E_t^Q\left[\int_t^T (a + r_s)\, e^{-\int_t^s (\lambda_u + r_u)\, du}\, ds + e^{-\int_t^T (\lambda_u + r_u)\, du} + \int_t^T \lambda_s\, \phi_s\, e^{-\int_t^s (\lambda_u + r_u)\, du}\, ds\right]
= E_t^Q\left[\int_t^T (a + r_s + \lambda_s \phi_s)\, e^{-\int_t^s (\lambda_u + r_u)\, du}\, ds + e^{-\int_t^T (\lambda_u + r_u)\, du}\right].
We can rewrite the coupon as follows
a + r_s + \lambda_s \phi_s = a + r_s + \lambda_s - (1 - \phi_s)\lambda_s,
in order to simplify the computations:
E_t^Q\left[\int_t^T \left(a + r_s + \lambda_s - (1 - \phi_s)\lambda_s\right) e^{-\int_t^s (\lambda_u + r_u)\, du}\, ds + e^{-\int_t^T (\lambda_u + r_u)\, du}\right]
= \underbrace{E_t^Q\left[\int_t^T (r_s + \lambda_s)\, e^{-\int_t^s (\lambda_u + r_u)\, du}\, ds + e^{-\int_t^T (\lambda_u + r_u)\, du}\right]}_{1} + E_t^Q\left[\int_t^T \left(a - (1 - \phi_s)\lambda_s\right) e^{-\int_t^s (\lambda_u + r_u)\, du}\, ds\right]
= 1 + E_t^Q\left[\int_t^T \left(a - (1 - \phi_s)\lambda_s\right) e^{-\int_t^s (\lambda_u + r_u)\, du}\, ds\right],
where we see that this asset can be listed below par if λ_t is sufficiently high. So, during a crisis, when the credit risk is very high (i.e. when λ_t increases), the value a − (1 − φ_s)λ_s could be negative. The term 1 − φ_t is known as the «Loss Given Default» (LGD).

8.5 Credit Default Swap (CDS)


Let us assume that a firm A has a credit towards a firm C. In a CDS, firm A (called «protection buyer») agrees to pay a constant amount of money (called «spread» δ) to another firm B (called «protection seller») in order to receive money if firm C does not pay its debt. The spread is paid until the maturity of the CDS (T) or until default if it happens before.
The scheme of a CDS can be represented as in Figure 8.5.1.
Firms A and B agree to sign the CDS if the expected present values of their cash flows are equal. The positive cash flows for the protection seller are the spreads, while the positive cash flow for the protection buyer is the loss given default (i.e. what firm C is not able to pay to firm A):
\underbrace{E_{t,t}^{Q,\tau}\left[\int_t^T \delta\, \mathbb{I}_{s < \tau}\, e^{-\int_t^s r_u\, du}\, ds\right]}_{\text{+ Protection seller}} = \underbrace{E_{t,t}^{Q,\tau}\left[\mathbb{I}_{T \geq \tau}\, (1 - \phi_\tau)\, e^{-\int_t^\tau r_u\, du}\right]}_{\text{+ Protection buyer}}.
This formula allows us to compute the value of δ through the following simplifications:
\delta\, E_t^Q\left[\int_t^T e^{-\int_t^s (\lambda_u + r_u)\, du}\, ds\right] = E_t^Q\left[\int_t^T \lambda_s (1 - \phi_s)\, e^{-\int_t^s (\lambda_u + r_u)\, du}\, ds\right],
t t

Figure 8.5.1: Cash flows generated by a Credit Default Swap contract. The protection buyer (firm A) pays the premia (spread) to the protection seller (firm B); firm B pays firm A only if the reference entity (firm C) does not honour its debt payment.

Table 8.5.1: Spreads (and implied default probabilities) for 5-year CDS on sovereign debts

                 26 January 2012               8 January 2014
Country     5y spread (bp)  5y default prob.   5y spread (bp)  5y default prob.
USA               46             2.83%               29             2.39%
Germany           90             5.47%               24             1.98%
France           172            10.19%               49             4.00%
UK                80             4.88%               26             2.14%
Italy            439            24.00%              147            11.53%
Japan            132             7.92%               44             3.60%
Ireland          633            32.67%              104             8.30%
Portugal        1294            55.46%              288            21.34%
Spain            365            20.40%              126             9.97%
Greece          5363            96.50%              548            36.66%

\delta = \frac{E_t^Q\left[\int_t^T \lambda_s (1 - \phi_s)\, e^{-\int_t^s (\lambda_u + r_u)\, du}\, ds\right]}{E_t^Q\left[\int_t^T e^{-\int_t^s (\lambda_u + r_u)\, du}\, ds\right]}.
We can conclude that the spread of a CDS is given by the weighted mean of the terms λ_s(1 − φ_s), where the weights are given by the discount factors (at the rate r_u + λ_u).
Through the mean value theorem, we know that there exist constants λ̂ and φ̂ such that
\frac{E_t^Q\left[\int_t^T \lambda_s (1 - \phi_s)\, e^{-\int_t^s (\lambda_u + r_u)\, du}\, ds\right]}{E_t^Q\left[\int_t^T e^{-\int_t^s (\lambda_u + r_u)\, du}\, ds\right]} = \frac{E_t^Q\left[\int_t^T \hat\lambda\, (1 - \hat\phi)\, e^{-\int_t^s (\lambda_u + r_u)\, du}\, ds\right]}{E_t^Q\left[\int_t^T e^{-\int_t^s (\lambda_u + r_u)\, du}\, ds\right]},
and since λ̂ and φ̂ are constant, we have
\delta = \hat\lambda\, (1 - \hat\phi).
In this case the survival probability of a firm from time t up to time T is given by
(\,_T p_t) = e^{-\int_t^T \hat\lambda\, ds} = e^{-\hat\lambda (T-t)},
and, given the previous result,
(\,_T p_t) = e^{-\frac{\delta}{1 - \hat\phi}(T-t)}.
This formula is often used in applications where δ is known (the spread is quoted on the financial market), and so is the time to maturity (T − t), while the recovery rate must be assumed (it is often set to 20%). Some spreads quoted on the financial market are shown in Table 8.5.1.
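The last formula is easily translated into R. The function below (spread in basis points, recovery rate assumed, here 20% as in the text) reproduces the kind of figures shown for the 2012 column of Table 8.5.1:

cds_default_prob = function(spread_bp, maturity = 5, recovery = 0.2) {
  delta = spread_bp/10000                              # spread in decimal form
  surv = exp(-delta/(1 - recovery) * maturity)         # (T p_t) = exp(-delta/(1-phi) (T-t))
  1 - surv                                             # implied default probability
}
cds_default_prob(439)     # Italy, 26 January 2012: about 24%
cds_default_prob(5363)    # Greece, 26 January 2012: about 96.5%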
Chapter 9

Risk measures

9.1 Coherent risk measures


When one tries to measure risk with a function that is not «correct» for this purpose, it is like trying to measure a
distance with an elastic band.
Artzner et al. [1999], for the first time, use an axiomatic approach to define the properties that a function must
have in order to provide a «good» risk measure.
The idea is to define profits and losses (P&L) on a given asset (or portfolio) as a stochastic variable X, and then
create a function Ψ (X) which gives, as a result, the risk (whatever it means) on X. We assume that the function
Ψ (X) is such that:
• Ψ (0) = 0, i.e. if we invest nothing on risky assets, then there is no risk;
• Ψ (X) > 0, measures the «risk», i.e. it measures the loss we may have when investing in a risky asset;
• Ψ (X) < 0, measures a gain, i.e. an amount of money that we should «risk» to obtain (instead of paying) or,
in other words, a positive cash flow.
Given these first statements, we say that a risk measure Ψ (X) is a good risk measure (or, in other words, it is
«coherent») if it satisfies the following axioms.

Axiom 1 (Translation invariance). Given the P&L X, if some money is invested in a riskless activity whose
(non stochastic) payoff is a constant k, then:

Ψ (X + k) = Ψ (X) − k.

In other words, if investing in an asset could imply a loss of 1 Euro (per day), then investing in this asset while also earning a riskless return of 1 Euro would lead to a total risk of losing nothing.
Note that the property Ψ (0) = 0 together with Axiom 1 implies

Ψ (k) = −k,

that is if we invest in a riskless asset whose return is k (Euros), then our «risk» will actually be to gain k Euros (so
the risk function must take a negative value just for indicating this opportunity to gain something).
This axiom also implies that in order to set the risk to zero, we must invest in a riskless asset an amount of
money such that the return on this riskless asset is equal to the risk on the risky asset, i.e. k = Ψ (X). In this case,
in fact, if we apply Axiom 1, we have

Ψ (X + Ψ (X)) = Ψ (X) − Ψ (X) = 0.

Axiom 2 (Monotonicity). Given two P&L variables X1 and X2 such that X1 ≥ X2 in any state of the world,
then
Ψ (X1 ) ≤ Ψ (X2 ) .


The meaning of this axiom is that if one asset always pays more than another one, then a good risk measure must indicate that the first asset is better (i.e. the first asset must imply a lower risk).
To understand this idea better, let us take into account Axiom 1 again. If we define X_2 = X_1 + k (with k > 0), the second asset is actually a portfolio formed by the first asset and a riskless asset, and we can write
Ψ(X_2) = Ψ(X_1 + k) = Ψ(X_1) − k < Ψ(X_1).

Axiom 3 (Sub-additivity). Given two P&L variables X1 and X2

Ψ (X1 + X2 ) ≤ Ψ (X1 ) + Ψ (X2 ) .

This axiom can be interpreted as «diversification is good»: the risk of a portfolio should never be greater than
the sum of the risks computed on each asset separately.

Axiom 4 (Positive homogeneity). Given a P&L X and a positive constant α:

Ψ (αX) = αΨ (X) .

The meaning of this final axiom is that if we risk to lose 2.5 Euros when we buy 1 asset, then if we buy 2 assets
we must risk to lose 5 Euros.

Definition 9.1.1. Any risk measure Ψ (X) that satisfies the Axioms 1–2–3–4 is said to be coherent.

An easy result follows.

Proposition 9.1.1. Any convex combination of coherent risk measures is a coherent risk measure.

Given n risk measures Ψ_i(X), with i ∈ {1, 2, ..., n}, and some weights c_i ≥ 0 such that \sum_{i=1}^n c_i = 1, the new risk measure
\hat\Psi(X) = \sum_{i=1}^n c_i\, \Psi_i(X)
is coherent. The demonstration is easy and is left to the reader (it is sufficient to check all the axioms).

Remark 9.1.1. Any risk measure Ψ (X) inherits the unit of measure of the variable X. So if X is in Euros,
also Ψ (X) will provide a risk that is measured as the amount of Euros that we risk to lose. Instead, if X is a
percentage (or a return), also Ψ (X) will be a return.

9.2 The variance as a risk measure


Since Markowitz [1952], the risk of an investment has always been measured by the variance of its return. The
higher the variance the higher the risk.
Now, we wonder whether the variance is a coherent risk measure. In order to check that we have to verify each
axiom.

Remark 9.2.1. Since the variance V[X] does not have the same unit of measure as X, we take the standard deviation \sqrt{V[X]} as a risk measure.

For what concerns the translation invariance we see that
\sqrt{V[X + k]} = \sqrt{V[X]} \neq \sqrt{V[X]} - k.

Thus the standard deviation does not satisfy Axiom 1 and therefore it is not coherent. Nevertheless, we can check the remaining axioms.

For what concerns the monotonicity, we see that if we define X_2 = X_1 + k (with k > 0), then
\sqrt{V[X_2]} = \sqrt{V[X_1 + k]} = \sqrt{V[X_1]},
and then also Axiom 2 is not verified.


Instead, we can demonstrate that the standard deviation satisfies the other two axioms. For what concerns the sub-additivity:
\sqrt{V[X_1 + X_2]} = \sqrt{V[X_1] + V[X_2] + 2C[X_1, X_2]}.
Now, we know that the correlation index between X_1 and X_2 is
\rho = \frac{C[X_1, X_2]}{\sqrt{V[X_1]}\sqrt{V[X_2]}},
and since it is always smaller than (or equal to) 1:
C[X_1, X_2] \leq \sqrt{V[X_1]}\sqrt{V[X_2]}.
So, if in the previous equality we substitute the covariance with the product of the standard deviations, we are (weakly) increasing the value of the right hand side:
\sqrt{V[X_1 + X_2]} \leq \sqrt{V[X_1] + V[X_2] + 2\sqrt{V[X_1]}\sqrt{V[X_2]}} = \sqrt{\left(\sqrt{V[X_1]} + \sqrt{V[X_2]}\right)^2} = \sqrt{V[X_1]} + \sqrt{V[X_2]},
which is exactly Axiom 3.
Finally, the homogeneity is easy to check (recall that α is a positive constant):
\sqrt{V[\alpha X]} = \sqrt{\alpha^2 V[X]} = \alpha\sqrt{V[X]}.

So the variance (or the standard deviation) cannot be used as a coherent measure of risk.

9.3 Representation theorem


Not only do Artzner et al. [1999] define the properties that a «good» risk measure must satisfy, but they also show the form that a risk measure must have in order to satisfy all the coherence axioms. Their result is shown in the following theorem (we refer to the original paper for the proof).

Theorem 9.3.1. A risk measure Ψ(X) is coherent if and only if there exists a family of probabilities P (on the states of the world) such that
\Psi(X) = -\inf_{P \in \mathcal{P}}\left\{E^P[X]\right\}. \qquad (9.3.1)

The theorem states that, given a family of probabilities, we have to choose the probability (belonging to that family) which minimizes an expected value.
The proof that (9.3.1) satisfies all the coherence axioms is very easy indeed¹, while the proof of the other part of the theorem (the «only if» part) is much more complicated.
The representation theorem implies what follows.
1. The minus sign in (9.3.1) makes the risk measure positive. In fact, it is reasonable to assume that the minimum of the risky asset returns (under any probability distribution) is negative.
2. The possibility to choose the family of probabilities (P) under which the risk measure must be computed allows us to create an infinite number of coherent risk measures. This result is positive because it allows the risk measure to be adjusted to the preferences of the economic agent who is computing it, but it is also negative because it states that it is not possible to create a fully «objective» risk measure.
¹ For the proof of the sub-additivity, we recall that the infimum of a sum is never smaller than the sum of the infima.

Table 9.3.1: Three families of probabilities (P) which contain at least one zero and equally distributed events

Family P1 (one zero and equal weights):
X        P1     P2     P3     P4
−10      0      1/3    1/3    1/3
−5       1/3    0      1/3    1/3
0        1/3    1/3    0      1/3
20       1/3    1/3    1/3    0
E[X]     15/3   10/3   5/3    −15/3
− inf_{Pi∈P1} {E^{Pi}[X]} = 15/3

Family P2 (two zeros and equal weights):
X        P1     P2     P3     P4     P5     P6
−10      0      0      0      1/2    1/2    1/2
−5       0      1/2    1/2    0      0      1/2
0        1/2    0      1/2    0      1/2    0
20       1/2    1/2    0      1/2    0      0
E[X]     20/2   15/2   −5/2   10/2   −10/2  −15/2
− inf_{Pi∈P2} {E^{Pi}[X]} = 15/2

Family P3 (three zeros and equal weights):
X        P1     P2     P3     P4
−10      0      0      0      1
−5       0      0      1      0
0        0      1      0      0
20       1      0      0      0
E[X]     20     0      −5     −10
− inf_{Pi∈P3} {E^{Pi}[X]} = 10

Now, we show how to create a coherent risk measure by using the representation theorem. We start from an example
of a stochastic variable taking only four values:

X ∈ {−10, −5, 0, 20} .

In order to compute a (coherent) risk measure we must firstly choose a family of probabilities and then find the
probability (belonging to that family) which minimizes the expected value of X. Just as an example, we list three
possible families of probabilities.

1. The family giving zero probability to one state of the world and the same probability to the others; we show
this case in the upper-left part of Table 9.3.1; this family contains four distributions.

2. The family giving zero probability to two states of the world and the same probability to the others; we show
this case in the upper-right part of Table 9.3.1; this family contains six distributions.

3. The family giving zero probability to three states of the world and probability 1 to one state of the world; we
show this case in the lower part of Table 9.3.1; this family contains four distributions.

Of course, the expected value changes according to the probability family which is chosen (this is the subjective
part of the risk measure) as shown in Table 9.3.1. Once the family (P) has been chosen, we take the probability
(P) which minimizes the expected value.

Table 9.3.2: P&L on two assets on the financial market

States of the world X1 X2 X1 + X2 2X1 X1 + 2


1 −5 2 −3 −10 −3
2 −4 1 −3 −8 −2
3 −3 −2 −5 −6 −1
4 −2 1 −1 −4 0
5 −1 0 −1 −2 1
6 0 1 1 0 2
7 1 4 5 2 3
8 2 3 5 4 4
9 3 5 8 6 5
10 4 6 10 8 6

We see that under all the families proposed in Table 9.3.1, the risk measure coincides with the mean of the worst
results.

Proposition 9.3.1. The opposite of the mean of a given number of worst returns is a coherent risk measure.

A very particular case is given by the probability family which gives the same probability to all the events. In
this case we are computing the mean of the variable X and, accordingly, we can conclude that the opposite of the
mean return is a coherent risk measure.

Example 9.3.1. By using the values in Table 9.3.2 we want to compute the risk measure as the opposite of
the mean of the two worst scenarios.
In this case we have
−5 − 4
Ψ (X1 ) = − = 4.5,
2
−2 + 0
Ψ (X2 ) = − = 1.
2
Axiom 1 is satisfied since we have
−3 − 2
Ψ (X1 + 2) = − = 2.5 = Ψ (X1 ) − 2.
2
Axiom 2 is satisfied since X2 > X1 in any state of the world.
Axiom 3 is satisfied since
−5 − 3
Ψ (X1 + X2 ) = − = 4 < Ψ (X1 ) + Ψ (X2 ) .
2
Axiom 4 is satisfied since
−10 − 8
Ψ (2X1 ) = − = 9 = 2Ψ (X1 ) .
2

9.4 Expected Shortfall


We have demonstrated that the (opposite of the) mean of the worst cases is a coherent risk measure. In finance
this measure is called «Expected Shortfall ».

Figure 9.4.1: Relationship between the density function f(X) and the distribution function F(X). The area α under f(X) to the left of −γ corresponds to the maximum loss −γ = F^{-1}(α) which happens with probability α.

Table 9.4.1: Meaning of the distribution and density functions (as shown in Figure 9.4.1)

Variable                                     Description
f(X)                                         Density of X
F(−γ) ≡ ∫_{−∞}^{−γ} f(X) dX = α              Probability (α) that X takes values smaller than −γ
F^{-1}(α) = −γ                               Maximum loss (−γ) which happens with probability at least α

Let us compute, for instance, the mean of the losses greater than a threshold −γ as in Figure 9.4.1 (the meaning of the symbols is summarised in Table 9.4.1), where −γ is the α-quantile of the distribution.
A conditional expected value is computed as follows:
E[X \,|\, X < -\gamma] = \frac{\int_{-\infty}^{-\gamma} X f(X)\, dX}{\int_{-\infty}^{-\gamma} f(X)\, dX} = \frac{1}{\alpha}\int_{-\infty}^{-\gamma} X f(X)\, dX.
Now, since we know that the distribution function is a probability:
F(X) = p \iff f(X)\, dX = dp,
and it is invertible,
X = F^{-1}(p),
we can substitute it in the previous integral and compute the integral with respect to p (instead of with respect to X):
E[X \,|\, X < -\gamma] = \frac{1}{\alpha}\int_{F(-\infty)}^{F(-\gamma)} F^{-1}(p)\, dp = \frac{1}{\alpha}\int_0^\alpha F^{-1}(p)\, dp.
Since F^{-1}(p) is the P&L, we see that here we are computing the average of the worst scenarios (happening with probability from 0 to α).

Definition 9.4.1. Given a distribution function F(X), the expected shortfall at the confidence level α is
\mathrm{ES}_\alpha = -\frac{1}{\alpha}\int_0^\alpha F^{-1}(p)\, dp. \qquad (9.4.1)

We immediately see that when α = 1 the expected shortfall coincides with the (opposite of the) mean of X. Let us compute the value of the expected shortfall when α → 0:
\lim_{\alpha\to 0}\mathrm{ES}_\alpha = -\lim_{\alpha\to 0}\frac{\frac{\partial}{\partial\alpha}\int_0^\alpha F^{-1}(p)\, dp}{\frac{\partial}{\partial\alpha}\alpha} = -\lim_{\alpha\to 0} F^{-1}(\alpha) = -F^{-1}(0),
where we have applied de l'Hôpital's theorem. So, in this case, the risk measure coincides with the biggest loss.

Example 9.4.1. We now compute the expected shortfall for a stochastic variable uniformly distributed on the domain [a, b].
The density function is
f(X) = \frac{1}{b-a},
and the distribution function is
F(X) = \int_a^X f(z)\, dz = \frac{X-a}{b-a},
which can be inverted:
F^{-1}(\alpha) = a + \alpha(b-a).
Finally, the expected shortfall is
\mathrm{ES}_\alpha = -\frac{1}{\alpha}\int_0^\alpha F^{-1}(p)\, dp = -\frac{1}{\alpha}\int_0^\alpha \left(a + p(b-a)\right) dp = -a - \frac{1}{2}\alpha(b-a).

9.5 Expected Shortfall: Historical simulation


When we want to compute the ES_α we need to know the density function f(X) (or the distribution function). An easy method used to estimate it is the so-called «historical simulation». If we call X_t the return obtained in the last period, and we assume to take into account n past returns (i.e. we take the stochastic variables from X_{t-n+1} to X_t), then the historical simulation is based on the assumption that X_{t+1} (the return obtained in the next period) can take the same values as the past returns, all with the same probability.
So, if the past returns have been
{−0.01, 0.01, 0.02, −0.03},
we assume that tomorrow's return has the following distribution
X_{t+1} = \begin{cases} -0.03, & \tfrac{1}{4} \\ -0.01, & \tfrac{1}{4} \\ 0.01, & \tfrac{1}{4} \\ 0.02, & \tfrac{1}{4} \end{cases}
Given this assumption we can compute the Expected Shortfall as the mean of the worst cases. If we have n observations (each with probability 1/n) and we want to compute the mean of the α worst cases, we must put the possible outcomes of the variable X in increasing order (we call \vec X such an ordered variable) and we must compute the mean of the nα worst cases (α = 1% on 1000 observations gives the mean of the worst 10 scenarios):
\mathrm{ES}_\alpha = -\frac{1}{\alpha}\sum_{i=1}^{n\alpha}\frac{1}{n}\vec X_i = -\frac{1}{n\alpha}\sum_{i=1}^{n\alpha}\vec X_i.

Unfortunately, most of the time αn is not an integer. If, for instance, we have n = 250 and α = 0.01, then αn = 2.5: how many losses do we take? We must take 2 losses and the third one with a weight which is not 1/n but lower. Thus, we sum all the losses up to the integer part of αn (which is denoted ⌊nα⌋) and then we add the loss ⌊nα⌋ + 1 with a weight suitable for reaching the value α:
\mathrm{ES}_\alpha = -\frac{\sum_{i=1}^{\lfloor n\alpha\rfloor}\frac{1}{n}\vec X_i + \left(\alpha - \frac{\lfloor n\alpha\rfloor}{n}\right)\vec X_{\lfloor n\alpha\rfloor+1}}{\alpha} = -\frac{\sum_{i=1}^{\lfloor n\alpha\rfloor}\vec X_i + \left(n\alpha - \lfloor n\alpha\rfloor\right)\vec X_{\lfloor n\alpha\rfloor+1}}{n\alpha}. \qquad (9.5.1)
In R, the function for putting the elements of a vector in increasing order is

sort(X)

and the function for computing the integer part of a decimal number is

floor(X)

Thus, we can write the following function for computing the expected shortfall, given the vector of returns X
and the confidence level α.

ES = function(X, alpha) {
  X = sort(X)                     # order the P&L from the worst to the best
  n = length(X)
  k = floor(n * alpha)            # number of full-weight worst observations
  # formula (9.5.1): full weight on the first k losses, partial weight on the (k+1)-th
  ES = -(sum(X[1:k]) + (n * alpha - k) * X[k + 1])/(n * alpha)
  return(ES)
}

Then, if we assume that the past 250 returns on an asset have been normally distributed with (annual) mean 0.08 and (annual) standard deviation 0.15, we can compute the expected shortfall at three confidence levels as follows (in the simulation below both the mean and the standard deviation are simply scaled by the number of trading days).

X = 0.08/250 + 0.15/250 * rnorm(250, 0, 1)


ES(X, 0.01)
## [1] 0.001353724
ES(X, 0.05)
## [1] 0.0009107003
ES(X, 0.1)
## [1] 0.0006839873

9.6 Spectral risk measures


Once we know the distribution function F(X), we can compute a risk measure by attaching a weight to each loss happening with a given probability, F^{-1}(p). The function ϕ(p) which gives the weight is called spectrum and, accordingly, the risk measure that is obtained is called «spectral risk measure».

Definition 9.6.1. A risk measure M_ϕ is said to be «spectral» if
M_\phi = -\int_0^1 \phi(p)\, F^{-1}(p)\, dp, \qquad (9.6.1)
where ϕ(p) is the spectrum function.


Figure 9.6.1: Graphical representation of the spectrum of the ES_α: ϕ(p) = (1/α) I_{p<α} is a step function equal to 1/α for p < α and to zero afterwards.

Note that the ES_α can be written as a spectral risk measure by using an indicator function as follows:
\phi(p) = \frac{1}{\alpha}\mathbb{I}_{p<\alpha}. \qquad (9.6.2)
This spectrum is graphically represented in Figure 9.6.1.
In fact, we have
\mathrm{ES}_\alpha = -\int_0^1 \frac{1}{\alpha}\mathbb{I}_{p<\alpha}\, F^{-1}(p)\, dp = -\frac{1}{\alpha}\left(\int_0^\alpha \mathbb{I}_{p<\alpha}\, F^{-1}(p)\, dp + \int_\alpha^1 \mathbb{I}_{p<\alpha}\, F^{-1}(p)\, dp\right) = -\frac{1}{\alpha}\int_0^\alpha F^{-1}(p)\, dp.
Acerbi [2002] demonstrates what follows.

Proposition 9.6.1. A spectral risk measure is coherent if and only if its spectrum ϕ (p) is such that, ∀p ∈ [0, 1]:

1. ϕ (p) ≥ 0
2. ϕ (p) is non increasing
∫1
3. 0 ϕ (p) dp = 1.

Remark 9.6.1. Properties 1. and 3. in the previous proposition allow us to conclude that any coherent spectrum is a density function (even if the converse is not true).

9.7 Value at Risk (VaR)

Definition 9.7.1. The Value at Risk at the confidence level α (VaR_α) is the (opposite of the) maximum loss which happens with probability at least α.

In other words, the VaR_α is (the opposite of) a quantile. Thus, we can conclude that the VaR_α coincides with the value of γ in Figure 9.4.1 and, accordingly, we can write
VaR_\alpha = -F^{-1}(\alpha).
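With the historical-simulation approach of Section 9.5, the VaR_α is simply the (opposite of the) empirical α-quantile of the P&L. A minimal sketch in R, using the same kind of simulated returns as in Section 9.5:

VaR = function(X, alpha) -quantile(X, probs = alpha, names = FALSE)
set.seed(1)
X = 0.08/250 + 0.15/250 * rnorm(250, 0, 1)   # illustrative daily returns
VaR(X, 0.01)
VaR(X, 0.05)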

Figure 9.7.1: Graphical representation of the spectrum of VaR_α: ϕ(p) = δ(p − α), a Dirac mass concentrated at p = α.

Table 9.7.1: Returns on two assets and V aR

Probability X1 X2 X1 + X2 2X1 X1 + 2
0.1 −5 2 −3 −10 −3
0.1 −4 1 −3 −8 −2
0.1 −3 −2 −5 −6 −1
0.1 −2 1 −1 −4 0
0.1 −1 0 −1 −2 1
0.1 0 1 1 0 2
0.1 1 4 5 2 3
0.1 2 3 5 4 4
0.1 3 5 8 6 5
0.1 4 6 10 8 6
V aR0.3 3 −1 3 6 1

The VaR_α can be represented as a spectral risk measure through the Dirac δ function.² If we define the spectrum as follows
\phi(p) = \delta(p - \alpha),
then we obtain
VaR_\alpha = -\int_0^1 \delta(p - \alpha)\, F^{-1}(p)\, dp = -F^{-1}(\alpha). \qquad (9.7.1)
The spectrum of the VaR_α can be represented as in Figure 9.7.1, where we clearly see that it is not coherent (since it is first increasing and then decreasing, it is not non-increasing as required by Proposition 9.6.1).
So, we can conclude that the V aRα is not a coherent risk measure (see Szego, 2002).
We can check that the VaR_α is not coherent even with another example. Let us take the returns listed in Table 9.7.1 and compute, on them, the value of VaR_{0.3}.
We see that Axiom 3 does not hold:
\underbrace{VaR_{0.3}(X_1 + X_2)}_{3} > \underbrace{VaR_{0.3}(X_1)}_{3} + \underbrace{VaR_{0.3}(X_2)}_{-1},
² The Dirac function δ(x) is always equal to 0, with the exception that lim_{x→0} δ(x) = +∞.

and so the VaR_α may suggest not to diversify a portfolio.

A way to avoid this negative result is to compute the weighted average of the worst VaR's. This risk measure is called «Conditional VaR» and is computed as follows:
CVaR_\alpha = \frac{1}{\alpha}\int_0^\alpha VaR_p\, dp;
nevertheless, since VaR_α = −F^{-1}(α), then
CVaR_\alpha = -\frac{1}{\alpha}\int_0^\alpha F^{-1}(p)\, dp,
and we see that this «new» measure coincides with the ES_α.


There exists another link between ES_α and VaR_α, which can be seen by computing the elasticity of the ES_α with respect to α:
\frac{\partial\mathrm{ES}_\alpha}{\partial\alpha}\frac{\alpha}{\mathrm{ES}_\alpha} = \left(\frac{1}{\alpha^2}\int_0^\alpha F^{-1}(p)\, dp - \frac{1}{\alpha}F^{-1}(\alpha)\right)\frac{\alpha}{\mathrm{ES}_\alpha} = -1 + \frac{VaR_\alpha}{\mathrm{ES}_\alpha}. \qquad (9.7.2)
If we discretise this equation we have
\frac{\mathrm{ES}_{\alpha+\epsilon} - \mathrm{ES}_\alpha}{\epsilon}\frac{\alpha}{\mathrm{ES}_\alpha} \simeq -1 + \frac{VaR_\alpha}{\mathrm{ES}_\alpha},
and
\mathrm{ES}_{\alpha+\epsilon} \simeq \frac{\alpha - \epsilon}{\alpha}\mathrm{ES}_\alpha + \frac{\epsilon}{\alpha}VaR_\alpha. \qquad (9.7.3)
Thus, given the ES and the V aR at a given level, we can compute the ES at a new level (as an approximation).
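Approximation (9.7.3) can be checked in R on the uniform example of Section 9.4, where both ES_α and VaR_α are known in closed form (illustrative values a = −10, b = 20):

a = -10; b = 20
ES = function(al) -a - 0.5 * al * (b - a)             # exact ES_alpha for U(a, b)
VaR = function(al) -(a + al * (b - a))                # exact VaR_alpha = -F^{-1}(alpha)
alpha = 0.05; eps = 0.01
ES_approx = (alpha - eps)/alpha * ES(alpha) + eps/alpha * VaR(alpha)
c(exact = ES(alpha + eps), approx = ES_approx)        # the two values coincide here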
Bibliography

C. Acerbi. Spectral measures of risk: a coherent representation of subjective risk aversion. Journal of Banking and Finance, 26:1505–1518, 2002.

Ph. Artzner, F. Delbaen, J.-M. Eber, and D. Heath. Coherent measures of risk. Mathematical Finance, 9(3):203–228, 1999.

L. Bachelier. Théorie de la spéculation. Annales Scientifiques de l'École Normale Supérieure, 3:21–86, 1900.

K. C. Chan, G. A. Karolyi, F. A. Longstaff, and A. B. Sanders. An empirical comparison of alternative models of the short-term interest rate. The Journal of Finance, 47(3):1209–1227, 1992.

R. Cont and P. Tankov. Financial Modelling With Jump Processes. Chapman & Hall/CRC Financial Mathematics Series, 2004.

J. C. Cox, J. E. Jr. Ingersoll, and S. A. Ross. A theory of the term structure of interest rates. Econometrica, 53:385–407, 1985.

E. I. Fredholm. Sur une classe d'équations fonctionnelles. Acta Mathematica, 27:365–390, 1903.

S. L. Heston. A closed-form solution for options with stochastic volatility with applications to bond and currency options. The Review of Financial Studies, 6(2), 1993.

I. Karatzas and S. E. Shreve. Brownian Motion and Stochastic Calculus. Springer, 1991.

D. Lando. On Cox processes and credit risky securities. Review of Derivatives Research, 2:99–120, 1998.

H. Markowitz. Portfolio selection. The Journal of Finance, 7:77–91, 1952.

G. P. Szego. No more VaR (this is not a typo). Journal of Banking & Finance, 26(7):1247–1251, 2002.

O. Vasiček. An equilibrium characterization of the term structure. Journal of Financial Economics, 5:177–188, 1977.

B. Øksendal. Stochastic Differential Equations: An Introduction with Applications. Fifth edition. Springer-Verlag, 2000.

B. Øksendal and A. Sulem. Applied Stochastic Control of Jump Diffusions. Springer, 2007.
