Applied Quantitative Methods: Slides On Selected Topics

Selected topics:
• Lévy processes
• Wald, Likelihood Ratio and Lagrange Multiplier Tests in Econometrics
• Empirical Pricing Kernels
Lévy processes
Szymon Borak
Center for Applied Statistics and Economics

Humboldt-Universität zu Berlin
borak@wiwi.hu-berlin.de
Introduction
The aim of this lecture is to present methods for modelling price fluctuations of financial assets.
[Figure: Allianz asset price, 1991-03-19 to 1992-12-30]
Application of financial models
• derivatives - option pricing

• risk management - Value at Risk calculations
Option pricing
A derivative is a financial instrument whose value depends on the values of other, underlying variables, typically other financial instruments.
An option is a contract between two parties that gives the buyer the right to buy (or sell) an asset at a specified time at a fixed price K.
For this right the buyer pays a price, the option premium.
Call options
A call option gives the right to buy the asset at the fixed price K.
Pay-off at maturity: $\max\{S_T - K, 0\}$
Figure 1: Value of a call option on the delivery day
Put options
A put option gives the right to sell the asset at the fixed price K.
Pay-off at maturity: $\max\{K - S_T, 0\}$
Figure 2: Value of a put option on the delivery day
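The two pay-off functions can be written down directly. The following is a minimal sketch; the function names and the strike of 100 are illustrative assumptions, not part of the slides.

```python
def call_payoff(s_T, K):
    """Pay-off of a call at maturity: max(S_T - K, 0)."""
    return max(s_T - K, 0.0)


def put_payoff(s_T, K):
    """Pay-off of a put at maturity: max(K - S_T, 0)."""
    return max(K - s_T, 0.0)


# Evaluate both pay-offs for a few terminal prices (illustrative values).
payoffs = [(s, call_payoff(s, 100.0), put_payoff(s, 100.0))
           for s in (80.0, 100.0, 120.0)]
```

At any terminal price at most one of the two pay-offs is positive, which is what Figures 1 and 2 show.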
Value at Risk
VaR quantifies the maximal amount that may be lost in a portfolio over a given period of time, at a certain confidence level.
Statistically speaking, VaR is a quantile of the P&L distribution:
$$P(L > \mathrm{VaR}) \le 1 - \alpha$$
where:
α is the confidence level, typically 95% or 99%
$L = -\Delta X(\tau)$ is the loss, i.e. the negative of the relative change (return) in portfolio value over the horizon τ.
Stochastic processes
Stochastic processes were proposed for modelling price changes of financial assets.
A stochastic process in discrete time is a sequence of random variables $\{X_k;\ k \ge 0\}$, observed at regular intervals (e.g. daily, monthly, quarterly) t = 0, 1, 2, ...
A stochastic process in continuous time is a collection of random variables $\{X_t;\ t \in \mathbb{R}_+\}$ with a continuous time variable t.
Binomial Processes
The simple random walk is a stochastic process whose increments $Z_k = X_k - X_{k-1}$ are either +1 or −1.
Assume that:
1. $X_0, Z_1, Z_2, \ldots$ are independent
2. $P(Z_k = 1) = p$, $P(Z_k = -1) = 1 - p$ for all k
The random walk can be written as follows:
$$X_n = X_0 + \sum_{k=1}^{n} Z_k, \quad n = 1, 2, \ldots$$
Figure 3: Simple random walk, binomial process with p = 0.5 (SFMBinomp.xpl)
As $\operatorname{Var}(Z_k) = \operatorname{Var}(Z_1)$ and $X_0, Z_1, Z_2, \ldots$ are independent, the variance of $X_n$ equals:
$$\operatorname{Var}(X_n) = \operatorname{Var}(X_0) + n \cdot \operatorname{Var}(Z_1)$$
The variance increases with n and therefore the standard deviation increases with $\sqrt{n}$.
The variance of $Z_k$ follows from $E[Z_k] = 2p - 1$ and $Z_k^2 = 1$: $\operatorname{Var}(Z_k) = 1 - (2p - 1)^2 = 4p(1 - p)$.
Approximation of the binomial distribution

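For $X_0 = 0$:
$$E[X_n] = n(2p - 1), \qquad \operatorname{Var}(X_n) = 4np(1 - p)$$
For large n the distribution $\mathcal{L}(X_n)$ of $X_n$ is approximately normal:
$$\mathcal{L}(X_n) \approx N\big(n(2p - 1),\ 4np(1 - p)\big)$$
In the symmetric case p = 1/2 this is N(0, n).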
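The moments of the random walk can be checked by simulation. The sketch below assumes increments of ±1 with P(+1) = p, for which $E[Z_k] = 2p - 1$ and $\operatorname{Var}(Z_k) = 4p(1 - p)$; the function name, seed and parameter values are illustrative choices.

```python
import random


def simulate_random_walk(n, p, rng, x0=0):
    """Return X_n for a walk with steps +1 (probability p) or -1."""
    x = x0
    for _ in range(n):
        x += 1 if rng.random() < p else -1
    return x


rng = random.Random(0)          # fixed seed for reproducibility
n, p, M = 100, 0.3, 4000        # steps, up-probability, number of walks
samples = [simulate_random_walk(n, p, rng) for _ in range(M)]

sample_mean = sum(samples) / M
sample_var = sum((s - sample_mean) ** 2 for s in samples) / (M - 1)

theory_mean = n * (2 * p - 1)       # n(2p - 1)
theory_var = 4 * n * p * (1 - p)    # 4np(1 - p)
```

With these values the sample mean and variance should be close to the theoretical values, up to Monte Carlo error.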
The Wiener process

Fix the time interval [0, t].
Now decrease the time unit and the increment: consider the process $\{X_t^{\Delta};\ t \ge 0\}$ (continuous time) which, after each time step ∆t, decreases by ∆x or increases by ∆x with probability p = 1/2.
At time $t = n \cdot \Delta t$:
$$X_t^{\Delta} = \sum_{k=1}^{n} Z_k \cdot \Delta x = X_n \cdot \Delta x$$
where the independent increments $Z_k \Delta x$ take the values ∆x or −∆x, each with probability 1/2.
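Since $X_t^{\Delta}$ is a multiple of $X_n$:
$$E[X_t^{\Delta}] = 0, \qquad \operatorname{Var}(X_t^{\Delta}) = (\Delta x)^2 \cdot \operatorname{Var}(X_n) = (\Delta x)^2 \cdot n = t \cdot \frac{(\Delta x)^2}{\Delta t}$$
Now let ∆t, ∆x → 0. $\operatorname{Var}(X_t^{\Delta})$ must stay finite and should not tend to zero:
$$\Delta t \to 0, \quad \Delta x = \sqrt{\Delta t}, \quad \text{hence } \operatorname{Var}(X_t^{\Delta}) \to t.$$
If ∆t is small then n = t/∆t is large and $X_n$ (symmetric random walk) is approximately N(0, n). Thus for all t:
$$\mathcal{L}(X_t^{\Delta}) \approx N(0, n(\Delta x)^2) \approx N(0, t).$$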
The limiting process $\{X_t;\ t \ge 0\}$ which we obtain from $\{X_t^{\Delta};\ t \ge 0\}$ with $\Delta t \to 0$, $\Delta x = \sqrt{\Delta t}$ is the Wiener process or Brownian motion.
Properties
(i) $X_0 = 0$
(ii) $X_t \sim N(0, t)$, $t \ge 0$
(iii) $\{X_t;\ t \ge 0\}$ has independent increments: $X_t - X_s$ is independent of $X_s$ for all $t > s \ge 0$
(iv) $(X_t - X_s) \sim N(0, t - s)$
Figure 4: Typical paths of the Wiener process $X_t^{\Delta}$ over time t, with ∆t = 0.1 and Var = 1.0 · t (SFMWienerProcess.xpl)
Poisson process
The exponential distribution with parameter λ has density $\lambda e^{-\lambda x} 1_{x \ge 0}$.
Let $(\tau_i)$ be a sequence of independent exponential random variables with parameter λ and $T_n = \sum_{i=1}^{n} \tau_i$.
The process $(N_t, t \ge 0)$ defined by
$$N_t = \sum_{n \ge 1} 1_{t \ge T_n}$$
is called a Poisson process with intensity λ.
Poisson process
Figure 5: Typical paths of Poisson process (genpoiss.xpl)
Properties of Poisson process

1. $N_0 = 0$
2. sample paths are right continuous with left limits (càdlàg)
3. for any t > 0, $N_t$ follows a Poisson distribution with parameter λt:
$$P(N_t = n) = e^{-\lambda t} \frac{(\lambda t)^n}{n!}$$
4. $N_t$ has independent and stationary increments
Classical financial models
• Bachelier model: St = S0 + σWt , t ≥ 0
• Black-Scholes model: St = S0 exp(σWt + µt), t ≥ 0
Allianz vs Black-Scholes simulation
Figure 6: Comparison between Allianz stock prices and prices generated with the Black-Scholes model
Option price in Black-Scholes model
In the Black-Scholes model the price of a call option can be derived as:
$$C(S, K, \tau, r) = S\,\Phi(y + \sigma\sqrt{\tau}) - K e^{-r\tau}\,\Phi(y)$$
where
$$y = \frac{\log\frac{S}{K} + (r - \frac{1}{2}\sigma^2)\tau}{\sigma\sqrt{\tau}}$$
and Φ(·) is the standard normal distribution function.
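The call price formula is straightforward to evaluate numerically. The following sketch uses the standard form with the discounted strike, $S\Phi(y + \sigma\sqrt{\tau}) - Ke^{-r\tau}\Phi(y)$; the function names and the example parameters are illustrative assumptions.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(S, K, tau, r, sigma):
    """Black-Scholes price of a European call with time to maturity tau."""
    y = (math.log(S / K) + (r - 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    return S * norm_cdf(y + sigma * math.sqrt(tau)) - K * math.exp(-r * tau) * norm_cdf(y)


# At-the-money call, one year to maturity (illustrative parameters).
price = bs_call(S=100.0, K=100.0, tau=1.0, r=0.05, sigma=0.2)
```

For these parameters the price is approximately 10.45.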
Figure 7: Comparison of the Allianz price returns from 1991-03-19 to 1992-12-30 (panel "Returns of Allianz") with returns generated in the Black-Scholes model (panel "Returns of BS").
Figure 8: Comparison of the log returns of the DM/USD FX rate from 1992-10-01 to 1993-09-30 (panel "Returns of DM/USD") with returns generated in the Black-Scholes model (panel "Returns of BS").
Figure 9: Left tails of the empirical distribution function of log returns of Allianz prices (red) and of a Black-Scholes simulation (blue) in double logarithmic scale, log(CDF(x)) against log(x). The black line is the Gaussian fit for the stock log returns.
Lévy process
A stochastic process $(X_t)_{t \ge 0}$ in ℝ is called a Lévy process if:
(i) $(X_t)$ has independent and stationary increments.
(ii) $X_0 = 0$
(iii) $(X_t)$ has càdlàg trajectories.
Paul Lévy
1886 Lévy is born in Paris.
1905 Publishes his first paper, on semiconvergent series.
1912 Receives his Docteur ès Sciences.
1913 Becomes professor at the École des Mines in Paris.
1920 Becomes professor at the École Polytechnique, where he remains until his retirement.
1963 Is elected to honorary membership of the London Mathematical Society and, in the following year, to the Académie des Sciences.
1971 Lévy dies in Paris.
The following processes are Lévy-processes :

• Wiener process with diffusion coefficient σ and drift µ
• Poisson process
Compound Poisson process
A compound Poisson process with intensity λ is a stochastic process $X_t$ defined as:
$$X_t = \sum_{i=1}^{N_t} Y_i$$
where the $Y_i$ are i.i.d. with distribution f, $N_t$ is a Poisson process with intensity λ, and $N_t$ is independent of the $Y_i$.
When $Y_i = 1$ we obtain the standard Poisson process.
Figure 10: Typical paths of compound Poisson process with standard normal distribution of jump size (gencpoiss.xpl)
The characteristic function of a compound Poisson process has the form:
$$E \exp(iuX_t) = \exp\Big(t\lambda \int_{-\infty}^{+\infty} (e^{iux} - 1)\,f(dx)\Big)$$
Introducing the new measure ν(A) = λ f(A):
$$E \exp(iuX_t) = \exp\Big(t \int_{-\infty}^{+\infty} (e^{iux} - 1)\,\nu(dx)\Big)$$
ν is called the Lévy measure. In general it is NOT a probability measure, since its total mass is ν(ℝ) = λ.
Interpretation of Lévy measure
ν(A) is the expected number, per unit time, of jumps whose size belongs to A.
Example
ν(ℝ) = 5 means that the expected number of jumps on the interval [0, 1] is 5.
ν([1, 2]) = 3 means that one can expect 3 jumps of size between 1 and 2 (inclusive) on a unit time interval.
Composing Lévy processes
A simple Lévy process can be created from a Brownian motion with drift γ and diffusion coefficient a, $\gamma t + aW_t$, and an independent compound Poisson process $C_t$:
$$X_t = \gamma t + aW_t + C_t$$
Figure 11: Typical paths of Lévy process composed from Brownian motion with drift and compound Poisson process (genlevy.xpl)
Merton model
The simplest application of jump processes is the Merton model, where the price is modelled with the equation:
$$S_t = S_0 \exp\{X_t\} = S_0 \exp\Big\{\gamma t + \sigma W_t + \sum_{i=1}^{N_t} Y_i\Big\}$$
where:
$W_t$ is a standard Wiener process
$N_t$ is a Poisson process with intensity λ, independent of $W_t$
$Y_i \sim N(\mu, \delta^2)$ are i.i.d. and independent of $W_t$ and $N_t$
Infinite activity Lévy processes
The Lévy measure ν is finite on every compact set A such that 0 ∉ A. Otherwise the càdlàg property would have to be given up.
In general the Lévy measure ν is not necessarily a finite measure: a Lévy process can have infinitely many small jumps on an interval [0, T].
The sum of jumps then becomes an infinite series, and its convergence imposes a condition on the measure ν:
$$\int (|x|^2 \wedge 1)\,\nu(dx) < \infty$$
Lévy-Itô decomposition
Each Lévy process can be decomposed into a Brownian motion with diffusion coefficient a and drift γ, a compound Poisson process with jumps larger than 1, and a limit of compensated compound Poisson processes with jumps smaller than 1:
$$X_t = \gamma t + aW_t + C_t^1 + \lim_{\varepsilon \to 0} C_t^{\varepsilon}$$
where:
$$C_t^{\varepsilon} = \sum_{0 \le s \le t,\ \varepsilon \le |\Delta C_s| \le 1} \Delta C_s - t \int_{\varepsilon \le |x| \le 1} x\,\nu(dx)$$
(Compensated compound Poisson process. The compensation is used for technical reasons, in order to ensure convergence.)
Lévy triplet
A Lévy triplet is a triplet (a, ν, γ), where:
• a ∈ [0, ∞) is the diffusion coefficient
• ν is a Lévy measure on ℝ with ν({0}) = 0
• γ ∈ ℝ is the drift
and
$$\int_{\mathbb{R}} (1 \wedge |x|^2)\,\nu(dx) < \infty$$
Lévy-Khintchine theorem
Let $(X_t)$ be a Lévy process. Then there exists a unique Lévy triplet (a, ν, γ) with
$$E(e^{iuX_t}) = e^{t\psi(u)}, \quad t \ge 0, \qquad (1)$$
where ψ is given by
$$\psi(u) = i\gamma u - \frac{1}{2} a u^2 + \int_{\mathbb{R}} \{e^{iux} - 1 - iux\,1_{[-1,1]}(x)\}\,\nu(dx), \quad u \in \mathbb{R}.$$
Infinitely divisible
A random variable X is called infinitely divisible if for each n ∈ ℕ there is an i.i.d. sequence $Y_1^{(n)}, \ldots, Y_n^{(n)}$ such that
$$X \overset{\mathcal{L}}{=} Y_1^{(n)} + \ldots + Y_n^{(n)}$$
Example
• normal distribution
• Poisson distribution
• Gamma distribution: if $Y_1, \ldots, Y_n \sim \mathrm{Gamma}(a/n, \beta)$ are independent, then $Y_1 + \ldots + Y_n \sim \mathrm{Gamma}(a, \beta)$
• Generalized hyperbolic distributions
Let $(X_t)$ be a Lévy process. Then $X_t$ is infinitely divisible for each t ≥ 0. In particular $X_1$ is infinitely divisible.
Conversely, let Y be an infinitely divisible random variable. Then there exists a Lévy process $(X_t)$ with $X_1 \overset{\mathcal{L}}{=} Y$.
Given an infinitely divisible distribution for $X_1$, one can therefore construct a continuous-time Lévy process, defined for every t.
Wiener process
(i) W0 = 0
(ii) Wt ∼ N (0, t), t ≥ 0
(iii) {Wt ; t ≥ 0} has independent increments: Wt − Ws is independent
from Ws , ∀ t > s ≥ 0
(iv) (Wt − Ws ) ∼ N (0, (t − s))
Wiener process
Figure 12: Typical paths of Wiener process (genwiener.xpl)
Poisson process
The exponential distribution with parameter λ has density $\lambda e^{-\lambda x} 1_{x \ge 0}$.
Let $(\tau_i)$ be a sequence of independent exponential random variables with parameter λ and $T_n = \sum_{i=1}^{n} \tau_i$.
The process $(N_t, t \ge 0)$ defined by
$$N_t = \sum_{n \ge 1} 1_{t \ge T_n}$$
is called a Poisson process with intensity λ.
Poisson process
Figure 13: Typical paths of Poisson process (genpoiss.xpl)
A compound Poisson process with intensity λ is a stochastic process $X_t$ defined as:
$$X_t = \sum_{i=1}^{N_t} Y_i$$
where the $Y_i$ are i.i.d. with distribution f, $N_t$ is a Poisson process with intensity λ, and $N_t$ is independent of the $Y_i$.
When $Y_i = 1$ we obtain the standard Poisson process.
Figure 14: Typical paths of compound Poisson process with standard normal distribution of jump size (gencpoiss.xpl)
A simple Lévy process can be created from a Brownian motion with drift γ and diffusion coefficient a, $\gamma t + aW_t$, and an independent compound Poisson process $C_t$:
$$X_t = \gamma t + aW_t + C_t$$
Figure 15: Typical paths of Lévy process composed from Brownian motion with drift and compound Poisson process (genlevy.xpl)
Lévy process
A stochastic process $(X_t)_{t \ge 0}$ in ℝ is called a Lévy process if:
(i) $(X_t)$ has independent and stationary increments.
(ii) $X_0 = 0$
(iii) $(X_t)$ has càdlàg trajectories.
Examples:
• combination of Brownian motion with drift and compound Poisson process
• processes with an infinite number of jumps
Simulation
Although Lévy processes allow us to build more realistic models, we pay a price in increased computational complexity. In applications, analytical methods for option pricing are rarely available, so numerical methods are unavoidable.
In order to apply Lévy processes one needs efficient simulation methods.
Monte Carlo for stochastic processes
For stochastic processes one needs to simulate many trajectories of the process in order to obtain estimates of densities or quantiles.
Each trajectory is approximated on a finite grid of time points.
Computer representation of stochastic process
Set a grid of I + 1 time points on the interval $[t_0, T]$:
$$t_0 < t_1 < \ldots < t_I = T$$
where $t_i = t_0 + i\tau$ for i = 0, 1, ..., I and $\tau = (T - t_0)/I$.
For each point $t_i$ set the value of the process $X_{t_i}$. The set of values $X_{t_0}, X_{t_1}, \ldots, X_{t_I}$ forms one trajectory of the process.
Repeat this procedure M times to obtain M trajectories of the process.
$$\begin{pmatrix}
X_{t_0}^1 & X_{t_1}^1 & X_{t_2}^1 & \cdots & X_{t_{I-1}}^1 & X_{t_I}^1 \\
X_{t_0}^2 & X_{t_1}^2 & X_{t_2}^2 & \cdots & X_{t_{I-1}}^2 & X_{t_I}^2 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
X_{t_0}^M & X_{t_1}^M & X_{t_2}^M & \cdots & X_{t_{I-1}}^M & X_{t_I}^M
\end{pmatrix}$$
Each row represents an approximation of one trajectory.
Each column represents an approximation of the distribution of the process at a particular time point.
Figure 16: 10 paths of simulation of asset prices in Black-Scholes model (BStrajectories.xpl)
In order to calculate an option price with the Monte Carlo method, generate sufficiently many trajectories of the asset's possible prices. Set the option price as the discounted value of the mean of the payoff:
$$C^{MC}(K) = e^{-rT} \frac{1}{M} \sum_{i=1}^{M} \max(S_{t_I}^i - K, 0)$$
where $S_{t_I}^i$ is the simulated price of the asset in the i-th trajectory at time point $t_I = T$.
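The Monte Carlo estimator can be sketched for the Black-Scholes model, where the terminal price $S_T = S_0 \exp(\sigma W_T + (r - \sigma^2/2)T)$ can be sampled directly without a time grid. The function name, the seed and the parameter values below are illustrative assumptions.

```python
import math
import random


def mc_call_price(S0, K, T, r, sigma, M, seed=0):
    """Discounted mean pay-off of a European call over M simulated prices."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * T
    vol = sigma * math.sqrt(T)
    payoff_sum = 0.0
    for _ in range(M):
        # Terminal price under the risk-neutral Black-Scholes dynamics.
        s_T = S0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        payoff_sum += max(s_T - K, 0.0)
    return math.exp(-r * T) * payoff_sum / M


price_mc = mc_call_price(S0=100.0, K=100.0, T=1.0, r=0.05, sigma=0.2, M=50_000)
```

For these parameters the estimate should lie close to the closed-form Black-Scholes value of about 10.45, with a standard error that shrinks like $1/\sqrt{M}$.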
In order to calculate Value-at-Risk:
$$P(L > \mathrm{VaR}) = 1 - \alpha$$
approximate the distribution of the loss as:
$$L = S_0 - S_{t_I}$$
VaR is then the α-quantile of the loss distribution L.
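A minimal sketch of this calculation: simulate terminal prices, form the losses $L = S_0 - S_T$ and take the empirical α-quantile, so that a fraction of about 1 − α of the simulated losses exceeds VaR. The price model $S_T = S_0 \exp(\sigma W_T + \mu T)$, the function name and the parameters (10 trading days out of an assumed 250 per year) are illustrative choices.

```python
import math
import random


def mc_value_at_risk(S0, T, mu, sigma, alpha, M, seed=0):
    """Empirical alpha-quantile of the simulated loss L = S0 - S_T."""
    rng = random.Random(seed)
    losses = []
    for _ in range(M):
        s_T = S0 * math.exp(sigma * math.sqrt(T) * rng.gauss(0.0, 1.0) + mu * T)
        losses.append(S0 - s_T)
    losses.sort()
    k = math.ceil(alpha * M) - 1      # index of the empirical alpha-quantile
    return losses[k], losses


var_95, losses = mc_value_at_risk(S0=100.0, T=10 / 250, mu=0.0,
                                  sigma=0.2, alpha=0.95, M=20_000)
# Share of simulated losses that exceed the VaR estimate.
exceed_share = sum(1 for loss in losses if loss > var_95) / len(losses)
```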
Simulation of Wiener process
• divide the time interval [0, T] by I + 1 fixed time points $0 = t_0 < t_1 < \ldots < t_I = T$
• set $W_0 = 0$
• simulate I standard normal variables $N_1, \ldots, N_I$
• set $\Delta W_i = N_i \sqrt{t_i - t_{i-1}}$
• set $W_{t_i} = \sum_{k=1}^{i} \Delta W_k$
• repeat the whole procedure M times
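The steps above can be sketched as follows, assuming an equidistant grid so that every increment has variance T/I. The function name and the values of T, I and M are illustrative.

```python
import math
import random


def simulate_wiener(T, I, M, seed=0):
    """Return M paths of a Wiener process on an equidistant grid of I steps."""
    rng = random.Random(seed)
    dt = T / I
    paths = []
    for _ in range(M):
        w = [0.0]                                   # W_0 = 0
        for _ in range(I):
            dW = rng.gauss(0.0, 1.0) * math.sqrt(dt)
            w.append(w[-1] + dW)                    # cumulative sum of increments
        paths.append(w)
    return paths


paths = simulate_wiener(T=1.0, I=50, M=2000)
terminal = [p[-1] for p in paths]
mean_T = sum(terminal) / len(terminal)
var_T = sum((x - mean_T) ** 2 for x in terminal) / (len(terminal) - 1)
```

Each inner list is one row of the trajectory matrix; the sample variance of the last column should be close to T.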
Simulation of Poisson process
Since the process $(N_t, t \ge 0)$ is defined by
$$N_t = \sum_{n \ge 1} 1_{t \ge T_n}$$
the algorithm for simulation is the following:
• fix the grid $0 = t_0 < t_1 < \ldots < t_I = T$ and set $N_0 = 0$
• simulate waiting times $\tau_k$ from Exp(λ) while $\sum_{i=1}^{k} \tau_i < T$
• set $N_{t_i} = \sup\{k : \sum_{j=1}^{k} \tau_j \le t_i\}$
Simulation of Poisson process
Improved algorithm
• fix the grid $0 = t_0 < t_1 < \ldots < t_I = T$ and set $N_0 = 0$
• simulate the number of jumps N from Poiss(λT)
• simulate N variables $U_1, \ldots, U_N$ uniformly distributed on the interval [0, T] (they correspond to the jump times)
• set $N_{t_i} = \#\{j : U_j \le t_i\}$, the number of jump times not exceeding $t_i$
The improved algorithm for simulating a Poisson process is based on two properties:
• the number of jumps on the interval [0, T] has a Poisson distribution with parameter λT
• conditionally on $N_T$, the exact jump times on the interval [0, T] have the same distribution as $N_T$ independent random numbers uniformly distributed on this interval, rearranged in increasing order.
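A sketch of the improved algorithm, under two assumptions that are not in the slides: the Poisson number of jumps is drawn with Knuth's multiplication method (the Python standard library has no Poisson sampler), and the grid is equidistant. All function names are illustrative.

```python
import bisect
import math
import random


def sample_poisson(mean, rng):
    """Knuth's multiplication method; adequate for moderate means."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_poisson_path(lam, T, I, rng):
    """Values N_{t_i} of a Poisson process on an equidistant grid."""
    grid = [T * i / I for i in range(I + 1)]
    n_jumps = sample_poisson(lam * T, rng)
    # Conditionally on the number of jumps, jump times are sorted uniforms.
    jump_times = sorted(rng.uniform(0.0, T) for _ in range(n_jumps))
    # N_{t_i} is the number of jump times not exceeding t_i.
    counts = [bisect.bisect_right(jump_times, t) for t in grid]
    return grid, counts


rng = random.Random(0)
grid, counts = simulate_poisson_path(lam=5.0, T=1.0, I=100, rng=rng)
finals = [simulate_poisson_path(5.0, 1.0, 100, rng)[1][-1] for _ in range(4000)]
mean_final = sum(finals) / len(finals)
```

The average of the terminal counts should be close to λT, which is 5 here.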
Simulation of compound Poisson process

• fix the grid $0 = t_0 < t_1 < \ldots < t_I = T$ and set $C_0 = 0$
• generate the total number of jumps N and the jump times $J_1, J_2, \ldots, J_N$ as in the Poisson process case
• simulate N random variables $Y_1, Y_2, \ldots, Y_N$ from the jump size distribution f = ν/λ
• set $C_{t_i} = \sum_{j=0}^{N_{t_i}} Y_j$ where $Y_0 = 0$ and the $Y_j$ are ordered by jump time
Simulation of simple Lévy process
A simple Lévy process with characteristic triplet (a, ν, γ)
$$X_t = \gamma t + aW_t + C_t$$
can be approximated with the following algorithm:
• fix the grid $0 = t_0 < t_1 < \ldots < t_I = T$ and set $X_0 = 0$
• generate the Wiener process $W_t$ and the compound Poisson process $C_t$ on the grid
• set $X_{t_i} = \gamma t_i + aW_{t_i} + C_{t_i}$
Monte Carlo for Merton model
In the Merton model:
$$S_t = S_0 \exp\{X_t\} = S_0 \exp\Big\{\gamma t + \sigma W_t + \sum_{i=1}^{N_t} Y_i\Big\}$$
one has a simple Lévy process: the sum of a Wiener process with drift and a compound Poisson process.
Using the techniques for simulation of simple Lévy processes, simulated paths of the asset price in the Merton model are obtained by a simple exponential transformation.
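A sketch of one way to simulate Merton price paths on a grid: a Wiener path, a compound Poisson path with normal jump sizes, and the exponential transformation. The Poisson sampler (Knuth's method), the equidistant grid, the function names and all parameter values are assumptions made for illustration.

```python
import bisect
import math
import random


def sample_poisson(mean, rng):
    """Knuth's multiplication method; adequate for moderate means."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_merton_path(S0, gamma, sigma, lam, mu, delta, T, I, rng):
    """One path S_{t_i} = S0 * exp(gamma t_i + sigma W_{t_i} + C_{t_i})."""
    dt = T / I
    grid = [T * i / I for i in range(I + 1)]
    w = [0.0]
    for _ in range(I):                       # Wiener part
        w.append(w[-1] + rng.gauss(0.0, 1.0) * math.sqrt(dt))
    n_jumps = sample_poisson(lam * T, rng)   # compound Poisson part
    jump_times = sorted(rng.uniform(0.0, T) for _ in range(n_jumps))
    cum_jumps = [0.0]                        # cum_jumps[n] = Y_1 + ... + Y_n
    for _ in range(n_jumps):
        cum_jumps.append(cum_jumps[-1] + rng.gauss(mu, delta))
    x = [gamma * t + sigma * w_t + cum_jumps[bisect.bisect_right(jump_times, t)]
         for t, w_t in zip(grid, w)]
    return grid, [S0 * math.exp(v) for v in x]


rng = random.Random(0)
pars = dict(S0=100.0, gamma=0.05, sigma=0.2, lam=1.0, mu=-0.1, delta=0.15, T=1.0, I=20)
log_returns = []
for _ in range(3000):
    _, path = simulate_merton_path(rng=rng, **pars)
    log_returns.append(math.log(path[-1] / pars["S0"]))
mean_log_return = sum(log_returns) / len(log_returns)
```

The mean of $X_T$ is $\gamma T + \lambda T \mu$, which equals −0.05 for these values, so the sample mean of the log returns gives a quick sanity check.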
Building and simulating other Lévy process
Not every Lévy process can be obtained as a simple sum of a compound Poisson process and a Wiener process with drift.
There is a huge class of Lévy processes that have infinitely many jumps. Most of them are not easily tractable and therefore can hardly be applied. However, there are some particular cases in which processes of this kind can be used.
Building Lévy Processes
There are three convenient ways to define Lévy processes in a parametric way:
• subordinating a Brownian motion with an independent increasing Lévy process
• directly specifying the Lévy measure
• specifying the density of increments on a given time scale as an infinitely divisible density
Subordination
A Lévy process $S_t$ with monotonically increasing paths is called a subordinator.
Let (0, ρ, b) be the generating triplet of $S_t$. Then for each u ≤ 0 the moment generating function of $S_t$ has the form:
$$E(e^{uS_t}) = e^{t\,l(u)}$$
where
$$l(u) = bu + \int_0^{\infty} (e^{ux} - 1)\,\rho(dx)$$
is called the Laplace exponent.
Subordination
Let $X_t$ be a Lévy process with triplet (a, ν, γ) and characteristic exponent Ψ(u), and let $S_t$ be an independent subordinator with Laplace exponent l(u) and triplet (0, ρ, b).
The process $Y_t \overset{\text{def}}{=} X_{S_t}$ is a Lévy process.
Its characteristic function is given by:
$$E(e^{iuY_t}) = e^{t\,l(\Psi(u))}$$
Subordination
It is also possible to find the triplet $(a^Y, \nu^Y, \gamma^Y)$ of $Y_t$:
$$a^Y = ba$$
$$\nu^Y(B) = b\nu(B) + \int_0^{\infty} p_s^X(B)\,\rho(ds)$$
$$\gamma^Y = b\gamma + \int_0^{\infty} \rho(ds) \int_{|x| \le 1} x\,p_s^X(dx)$$
where $p_t^X$ is the probability distribution of $X_t$.
Since we need to specify $p_t^X$, Brownian motion is a natural candidate for $X_t$.
We will construct new Lévy processes by subordination of a Wiener process with drift µ and volatility σ:
$$L_t = \sigma W_{S_t} + \mu S_t$$
Generating the subordinated Brownian motion
Since many processes are based on Brownian subordination, it is important to know how to simulate them.
Algorithm for simulating subordinated Brownian motion:
• fix the grid $0 = t_0 < t_1 < \ldots < t_I = T$ and set $X_0 = 0$
• simulate the increments of the subordinator $\Delta S_i = S_{t_i} - S_{t_{i-1}}$
• simulate I standard normal variables $N_1, \ldots, N_I$
• set $\Delta X_i = \sigma N_i \sqrt{\Delta S_i} + \mu \Delta S_i$, where σ is the volatility and µ is the drift
• set $X_{t_i} = \sum_{k=1}^{i} \Delta X_k$
Example
Consider the Lévy measure of the form:
$$\rho(x) = \frac{c e^{-\lambda x}}{x} 1_{x > 0}$$
where c and λ are positive.
The probability density of such a process at time t is given by:
$$p_t(x) = \frac{\lambda^{ct}}{\Gamma(ct)} x^{ct - 1} e^{-\lambda x} 1_{x > 0}$$
This process is called the gamma process and is a subordinator.
A Brownian motion subordinated by a gamma process is called a variance gamma process.
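As a sketch of how a variance gamma path could be simulated: on an equidistant grid with step ∆t, the gamma subordinator has increments with shape c∆t and rate λ, and each increment of the subordinated Brownian motion is $\sigma N \sqrt{\Delta S} + \mu \Delta S$. The function name and parameter values are illustrative assumptions; `random.gammavariate` takes a shape and a scale, so the scale is 1/λ.

```python
import math
import random


def simulate_variance_gamma(c, lam, sigma, mu, T, I, rng):
    """One path of Brownian motion (volatility sigma, drift mu)
    time-changed by a gamma process with parameters c and lam."""
    dt = T / I
    x = [0.0]
    for _ in range(I):
        dS = rng.gammavariate(c * dt, 1.0 / lam)      # subordinator increment
        dX = sigma * rng.gauss(0.0, 1.0) * math.sqrt(dS) + mu * dS
        x.append(x[-1] + dX)
    return x


rng = random.Random(0)
c, lam, sigma, mu, T, I = 10.0, 10.0, 0.2, -0.1, 1.0, 20
terminal = [simulate_variance_gamma(c, lam, sigma, mu, T, I, rng)[-1]
            for _ in range(4000)]
mean_terminal = sum(terminal) / len(terminal)
theory_mean = mu * c * T / lam        # E[X_T] = mu * E[S_T]
```

With c = λ the subordinator has $E[S_t] = t$, so business time runs on average at the speed of calendar time.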
Example
Consider the Lévy measure of the form:
$$\rho(x) = \frac{c e^{-\lambda x}}{x^{3/2}} 1_{x > 0}$$
where c and λ are positive.
The probability density of such a process at time t is given by:
$$p_t(x) = \frac{ct}{x^{3/2}} e^{-\lambda x - \pi c^2 t^2 / x + 2ct\sqrt{\pi\lambda}} 1_{x > 0}$$
This process is called the inverse Gaussian process and is a subordinator.
A Brownian motion subordinated by an inverse Gaussian process is called a normal inverse Gaussian process.
Stable process
A stable distribution is a distribution with characteristic function φ given by:
$$\log \phi(t) = \begin{cases} -\sigma^{\alpha} |t|^{\alpha} \{1 - i\beta\,\mathrm{sign}(t) \tan\frac{\pi\alpha}{2}\} + i\mu t, & \alpha \ne 1, \\ -\sigma |t| \{1 + i\beta\,\mathrm{sign}(t) \frac{2}{\pi} \log|t|\} + i\mu t, & \alpha = 1. \end{cases}$$
Stable processes are Lévy processes whose increments have a stable distribution. For α = 2 the stable process is a Brownian motion.
Generalized hyperbolic process
The probability density function of the generalized hyperbolic distribution has the form:
$$p(x) = C\,(\delta^2 + (x - \mu)^2)^{\frac{\lambda}{2} - \frac{1}{4}}\, K_{\lambda - \frac{1}{2}}\big(\alpha\sqrt{\delta^2 + (x - \mu)^2}\big)\, e^{\beta(x - \mu)}$$
where:
$$C = \frac{(\alpha^2 - \beta^2)^{\lambda/2}}{\sqrt{2\pi}\,\alpha^{\lambda - 1/2}\,\delta^{\lambda}\,K_{\lambda}\big(\delta\sqrt{\alpha^2 - \beta^2}\big)}$$
A Lévy process $(X_t)$ is called a Generalized Hyperbolic Lévy Motion when $X_1$ has a generalized hyperbolic distribution.
Remark
Let $(X_t)$ be a Generalized Hyperbolic Lévy Motion. Then for t ≠ 1, $X_t$ need not have a generalized hyperbolic distribution.
Option
An option is a contract between two parties that gives the buyer the right to buy (or sell) an asset at a specified time at a fixed price K.
For this right the buyer pays a price, the option premium.
The Chicago Board Options Exchange (CBOE) first created standardized, listed options (European calls on 16 stocks) in April 1973.
Call options
A call option gives the right to buy the asset at the fixed price K.
Pay-off at maturity: $\max\{S_T - K, 0\}$
Figure 17: Value of a call option on the delivery day
Put options
A put option gives the right to sell the asset at the fixed price K.
Pay-off at maturity: $\max\{K - S_T, 0\}$
Figure 18: Value of a put option on the delivery day
Black-Scholes model
In the Black-Scholes model the price of the asset is modelled with:
$$S_t = S_0 \exp\Big(\sigma W_t + \big(r - \frac{\sigma^2}{2}\big)t\Big), \quad t \ge 0$$
where:
$S_t$ is the asset's price at time t
$S_0$ is the asset's price at the current time (t = 0)
r is the interest rate
$W_t$ is a standard Wiener process
σ is the volatility
Option price in Black-Scholes model
In the Black-Scholes model the price of a call option can be derived as:
$$C(S_0, K, \tau, r) = S_0\,\Phi(y + \sigma\sqrt{\tau}) - K e^{-r\tau}\,\Phi(y)$$
where
$$y = \frac{\log\frac{S_0}{K} + (r - \frac{1}{2}\sigma^2)\tau}{\sigma\sqrt{\tau}}$$
and Φ(·) is the standard normal distribution function.
Put-call parity for European options: for a put and a call with the same expiry, the same strike price and the same underlying, it holds that
$$P = C - S_t + K e^{-r(T - t)}$$
where C is the call price and P the put price at time t, and r is the risk-free interest rate.
Parameters of the option price:

• S0 value of underlying asset
• K exercise price or strike price
• r interest rate
• τ time to maturity
• σ volatility
$S_0$ and r are values taken from the market.
K and τ are specified in the option contract.
σ is an unknown parameter which measures the uncertainty of future price changes. It is often estimated as the standard deviation of the returns $Z_1, \ldots, Z_n$:
$$\hat{\sigma} = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} \Big(Z_i - \frac{1}{n} \sum_{j=1}^{n} Z_j\Big)^2}$$
Implied volatility
On the market the prices of derivatives are determined by the law of supply and demand. This means that $C(S_0, K, \tau, r)$ is observed.
In the Black-Scholes formula for the call option price, only σ is not observed:
$$C(S_0, K, \tau, r) = S_0\,\Phi(y + \sigma\sqrt{\tau}) - K e^{-r\tau}\,\Phi(y)$$
$$y = \frac{\log\frac{S_0}{K} + (r - \frac{1}{2}\sigma^2)\tau}{\sigma\sqrt{\tau}}$$
Implied volatility
Since the option price is a monotonic function of volatility, it is possible to find a unique parameter $\sigma_I$ that satisfies the equation:
$$C^{BS}(S_0, K, \tau, r, \sigma_I) = C^*(K, \tau)$$
where
$C^{BS}(S_0, K, \tau, r, \sigma_I)$ is the price given by the Black-Scholes formula
$C^*(K, \tau)$ is the price observed in the market
$\sigma_I$ is called the implied volatility.
Since the Black-Scholes formula cannot be inverted in closed form, the implied volatility is not given explicitly. One needs to use numerical techniques to obtain the result.
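One simple numerical technique is bisection, which works because the call price is increasing in σ. The sketch below is one possible choice, not necessarily the method used for the figures in these slides; the bracketing interval, the tolerance and the function names are assumptions.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(S0, K, tau, r, sigma):
    """Black-Scholes call price with discounted strike."""
    y = (math.log(S0 / K) + (r - 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    return S0 * norm_cdf(y + sigma * math.sqrt(tau)) - K * math.exp(-r * tau) * norm_cdf(y)


def implied_vol(price, S0, K, tau, r, lo=1e-6, hi=5.0, tol=1e-10):
    """Solve bs_call(sigma) = price for sigma by bisection on [lo, hi]."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(S0, K, tau, r, mid) < price:
            lo = mid          # model price too low: volatility must be higher
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Round trip: price an option with sigma = 0.25, then recover sigma.
market_price = bs_call(100.0, 110.0, 0.5, 0.03, 0.25)
sigma_I = implied_vol(market_price, 100.0, 110.0, 0.5, 0.03)
```

The round trip at the end recovers the volatility that was used to produce the price.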
Prices on the option market are commonly quoted in terms of Black-Scholes implied volatility. The Black-Scholes formula is then not used as a pricing model but as a tool for representing prices in terms of implied volatility.
In the Black-Scholes model σ is assumed to be constant. In real markets implied volatilities exhibit non-constant behaviour.
If we denote the implied volatility by $\sigma_I(K, T)$, then the surface $\{\sigma_I(K, T)\}_{K,T}$ contains the implied volatility for all strikes and maturities.
On the option market we can observe only a few points of this surface.
Figure 19: Implied volatility surface (plot label: 20020516)
Merton model
In the Merton model the price of an asset is modelled as:
$$S_t = S_0 \exp\Big\{\gamma t + \sigma W_t + \sum_{i=1}^{N_t} Y_i\Big\}$$
where:
$W_t$ is a standard Wiener process
$N_t$ is a Poisson process with intensity λ, independent of $W_t$
$Y_i \sim N(\mu, \delta^2)$ are i.i.d. and independent of $W_t$ and $N_t$
$\gamma = r - \frac{\sigma^2}{2} - \lambda(e^{\mu + \delta^2/2} - 1)$
In the Black-Scholes model the generated implied volatility surface is constant, which contradicts observed option prices.
In the Merton model the generated implied volatility surface is not constant and replicates the behaviour of option prices more realistically.
Implied volatility in Merton model
Figure 20: Implied volatilities generated in the Merton model (smilemerton.xpl)
Calibration problem
In order to use the Merton model efficiently (for risk management or pricing exotic options) one needs to specify the set of parameters (λ, σ, δ, µ):
• λ intensity of jumps
• σ volatility
• δ standard deviation of jumps
• µ mean of jumps
Specifying this set of parameters is called calibration of the model.
When calculating implied volatilities in the Black-Scholes model one has one option and one parameter. The solution is unique.
In calibration of the Merton model there are many options and four parameters. The solution need not be unique.
The idea of calibration is to search for model parameters that minimize the distance between the implied volatility surface (IVS) of the model and the IVS observed on the market.
Minimizing function
$$f(\Theta) = \sum_i \frac{(C_i^* - C_i^M(\Theta))^2}{C_i^*} 1_{S \le K} + \sum_i \frac{(P_i^* - P_i^M(\Theta))^2}{P_i^*} 1_{S > K}$$
where:
Θ is the set of parameters (λ, σ, δ, µ)
$C_i^*$, $P_i^*$ are call/put option prices from the market
$C_i^M(\Theta)$, $P_i^M(\Theta)$ are option prices calculated with the Merton model with parameters Θ
Estimated parameters of the Merton model:
• $\hat{\lambda} = 0.950717$
• $\hat{\sigma} = 0.100115$
• $\hat{\delta} = 0.119883$
• $\hat{\mu} = -0.109419$
Figure 21: Implied volatility of the calibrated Merton model, time to maturity T = 0.3288. Blue points denote the options used in the calibration.
Problems with efficient calibration
There are several obstacles to calibrating the parameters of the Merton model efficiently and accurately:
• The option pricing function needs to be called many times. A Monte Carlo pricing function is too slow, so using it in calibration is not reasonable.
• The parameter space has four dimensions, so it is hard to say anything about the shape of the function being minimized.
Fast method of option pricing
Carr and Madan proposed a method for option valuation based on the fast Fourier transform (FFT).
Some motivations for the use of the FFT:
• the considerable computational power of the FFT
• the Fourier transform of the (logarithm of the) price process is known for many models, especially models based on Lévy processes such as the Merton model
• the FFT allows prices to be calculated for a whole range of strikes at once
Pricing a single call
The value $C_T(k)$ of a T-maturity call with strike K = exp(k) is given by
$$C_T(k) = e^{-rT} \int_k^{\infty} (e^s - e^k)\,q_T(s)\,ds$$
where $q_T$ is the density of the log price $s_T = \log S_T$.
As the function $C_T$ is not square-integrable, we cannot apply Fourier inversion directly. Thus we consider the modified function
$$c_T(k) = \exp(\alpha k)\,C_T(k)$$
which should be square-integrable for a suitable α > 0.
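The Fourier transform of $c_T$ is defined by
$$\psi(v) = \int_{-\infty}^{\infty} e^{ivk} c_T(k)\,dk.$$
The Fourier transform ψ can also be expressed as:
$$\psi(v) = \frac{e^{-rT} \phi(v - (\alpha + 1)i)}{\alpha^2 + \alpha - v^2 + i(2\alpha + 1)v}$$
where φ is the Fourier transform of $q_T$.
Example
In the Merton model (at time t, with the price normalized so that $S_0 = 1$)
$$\phi(u) = \exp\Big(-\frac{\sigma^2 u^2 t}{2} + i\gamma u t + \lambda t\,\big(e^{-\delta^2 u^2/2 + i\mu u} - 1\big)\Big)$$
where $\gamma = r - \frac{\sigma^2}{2} - \lambda(e^{\mu + \delta^2/2} - 1)$.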
As $c_T$ is square-integrable, we can get back the call price by applying the inverse Fourier transform:
$$C_T(k) = \frac{\exp(-\alpha k)}{\pi} \int_0^{\infty} e^{-ivk} \psi(v)\,dv.$$
The call price can be computed numerically using the trapezoid rule:
$$C_T(k) \approx \frac{\exp(-\alpha k)}{\pi} \sum_{j=0}^{N-1} w_j e^{-iv_j k} \psi(v_j)\,\eta$$
where $v_j = \eta j$, j = 0, ..., N − 1, with some η > 0, and the weights are $w_0 = w_{N-1} = \frac{1}{2}$ and $w_1 = \ldots = w_{N-2} = 1$.
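The trapezoid formula for a single strike can be sketched without an FFT by evaluating the sum directly. The example below uses the Black-Scholes characteristic function of $\log S_T$ as a test case, because the result can then be compared with the closed-form price; the damping parameter α = 1.5, the step η = 0.1, N = 1024 and the function names are illustrative assumptions. Only the real part of each term is summed, since the call price is real.

```python
import cmath
import math


def bs_char_fn(u, S0, r, sigma, T):
    """Characteristic function of log S_T in the Black-Scholes model."""
    m = math.log(S0) + (r - 0.5 * sigma ** 2) * T
    return cmath.exp(1j * u * m - 0.5 * sigma ** 2 * u * u * T)


def carr_madan_call(K, T, r, char_fn, alpha=1.5, eta=0.1, N=1024):
    """Call price from the damped Fourier integral, trapezoid rule."""
    k = math.log(K)
    total = 0.0
    for j in range(N):
        v = eta * j
        psi = (math.exp(-r * T) * char_fn(v - (alpha + 1.0) * 1j)
               / (alpha ** 2 + alpha - v * v + 1j * (2.0 * alpha + 1.0) * v))
        w = 0.5 if j in (0, N - 1) else 1.0          # trapezoid weights
        total += w * (cmath.exp(-1j * v * k) * psi).real * eta
    return math.exp(-alpha * k) / math.pi * total


price_fourier = carr_madan_call(
    K=100.0, T=1.0, r=0.05,
    char_fn=lambda u: bs_char_fn(u, S0=100.0, r=0.05, sigma=0.2, T=1.0),
)
```

For these parameters the result should agree with the Black-Scholes value of about 10.45. Replacing `bs_char_fn` by the characteristic function of another model, multiplied by the factor $e^{iu \log S_0}$ when that function is given for $S_0 = 1$, prices the same call in that model.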
Pricing calls with different strikes
Let us now consider N calls with maturity T and log strikes
$$k_u = -\frac{1}{2} N\lambda + \lambda u, \quad u = 0, \ldots, N - 1$$
where λ > 0 is the distance between the log strikes.
The formula for the numerical approximation of the call price gives
$$C_T(k_u) \approx \frac{\exp(-\alpha k_u)}{\pi} \sum_{j=0}^{N-1} w_j e^{-i\lambda\eta ju} e^{i\frac{1}{2}N\lambda v_j} \psi(v_j)\,\eta, \quad u = 0, \ldots, N - 1.$$
This representation allows a direct application of the FFT, which is an efficient algorithm for computing the sums
$$a_k = \sum_{j=0}^{N-1} e^{-i \frac{2\pi}{N} jk}\, x_j, \quad k = 0, \ldots, N-1.$$
The parameters $\lambda, \eta, N$ only have to satisfy the constraint
$$\lambda\eta = \frac{2\pi}{N}.$$
If we choose $\eta$ small in order to obtain a fine grid for the numerical integration, then we obtain call prices at relatively large strike spacings, with few strikes lying in the desired region near the stock price.
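The FFT pricing steps can be sketched in Python as follows. This is a minimal illustration, not the implementation behind the timings reported in these slides. It assumes that $\phi$ is the characteristic function of the log price $\log S_T$ (so it carries the factor $e^{iu \log S_0}$), and the function names, the spot price and the default values $\alpha = 1.5$, $\eta = 0.25$, $N = 4096$ are hypothetical choices. Setting the jump intensity to zero reduces the Merton model to Black-Scholes, which gives a simple check.

```python
import math
import numpy as np

def merton_log_price_cf(u, t, s0, r, sigma, lam, mu, delta):
    """Characteristic function of log S_t in the Merton model (assumed convention)."""
    gamma = r - 0.5 * sigma**2 - lam * (np.exp(mu + 0.5 * delta**2) - 1.0)
    jumps = lam * t * (np.exp(-0.5 * delta**2 * u**2 + 1j * mu * u) - 1.0)
    return np.exp(1j * u * np.log(s0) - 0.5 * sigma**2 * u**2 * t
                  + 1j * gamma * u * t + jumps)

def fft_call_prices(cf, T, r, alpha=1.5, eta=0.25, N=4096):
    """Call prices on the log-strike grid k_u = -N*lam/2 + lam*u via one FFT."""
    lam = 2.0 * np.pi / (N * eta)          # constraint lam * eta = 2*pi/N
    j = np.arange(N)
    v = eta * j                            # integration grid v_j
    k = -0.5 * N * lam + lam * j           # log strikes k_u
    psi = (np.exp(-r * T) * cf(v - (alpha + 1.0) * 1j)
           / (alpha**2 + alpha - v**2 + 1j * (2.0 * alpha + 1.0) * v))
    w = np.ones(N)
    w[0] = w[-1] = 0.5                     # trapezoid weights
    x = w * np.exp(1j * 0.5 * N * lam * v) * psi * eta
    # np.fft.fft computes sum_j exp(-2*pi*i*j*u/N) * x_j
    prices = np.exp(-alpha * k) / np.pi * np.fft.fft(x).real
    return k, prices

def black_scholes_call(s0, K, T, r, sigma):
    """Closed-form Black-Scholes call, used only as a benchmark."""
    d1 = (math.log(s0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    cdf = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return s0 * cdf(d1) - K * math.exp(-r * T) * cdf(d2)

# Hypothetical inputs: spot 100, r = 5%, one year to maturity.
cf = lambda u: merton_log_price_cf(u, 1.0, 100.0, 0.05, 0.2, 0.1, -0.1, 0.15)
log_strikes, call_prices = fft_call_prices(cf, T=1.0, r=0.05)
```

Only the strikes near the spot price are of practical interest; prices at the far ends of the grid are multiplied by large factors $\exp(-\alpha k_u)$ and should not be trusted.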
FFT versus MC
FFT time: 0.015 sec.
MC time: 36.531 sec. (5000 simulations, 500 time steps)
Disadvantages of FFT:
• unstable for fixed FFT parameters α, η, N
• applicable only to European options
Searching for minimum
In order to find the parameters of the model, one needs to minimize an appropriate objective function numerically.
The objective function may have many local minima, which makes the problem more difficult.
The performance of the method can also depend on the starting values of the algorithm. There is no general rule for choosing the starting point.
[Figure: plot of Y (0 to 4) against X (-10 to 10)]
Simulated annealing
Simulated annealing is a numerical algorithm for finding a global minimum of a function. Each step of the algorithm is perturbed by adding a random variable with variance T. After a certain number of function calls, T is decreased and the algorithm is restarted from the best point found so far.
The hope is that, thanks to the random perturbation, the algorithm will jump out of the valley of a local minimum and find the valley containing the global minimum.
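A one-dimensional sketch of the scheme just described is given below. The slides do not state the acceptance rule, so the Metropolis rule used here is an assumption, as are the cooling factor, the number of temperature levels and the test function `bumpy`; none of these come from the lecture.

```python
import math
import random

def simulated_annealing(f, x0, temp0=4.0, cooling=0.8, n_levels=30,
                        calls_per_level=200, seed=0):
    """Minimize f starting from x0; returns (best_x, best_value)."""
    rng = random.Random(seed)
    best_x, best_val = x0, f(x0)
    temp = temp0
    for _ in range(n_levels):
        # Restart each temperature level from the best point found so far.
        x, val = best_x, best_val
        for _ in range(calls_per_level):
            # Perturb the current point with a random step of variance temp.
            cand = x + rng.gauss(0.0, math.sqrt(temp))
            cand_val = f(cand)
            # Metropolis rule (assumption): always accept improvements,
            # accept a worse point with probability exp(-increase / temp).
            if cand_val <= val or rng.random() < math.exp(-(cand_val - val) / temp):
                x, val = cand, cand_val
                if val < best_val:
                    best_x, best_val = x, val
        temp *= cooling  # decrease T after a fixed number of function calls
    return best_x, best_val

def bumpy(x):
    """Hypothetical objective with many local minima and global minimum at 0."""
    return x * x / 20.0 + 1.0 - math.cos(2.0 * x)

best_x, best_val = simulated_annealing(bumpy, x0=8.0)
```

In a calibration setting `f` would be the distance between model and market option prices, and `x` a parameter vector rather than a scalar.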
The Merton model does not fit well when the whole surface is considered, which is why even more complicated models need to be considered.
Estimated parameters of the Merton model for six different maturities:
• $\hat\lambda = 0.096349$
• $\hat\sigma = 0.127587$
• $\hat\delta = 0.17323$
• $\hat\mu = -0.568271$
[Six figures: Y (0 to 0.4) plotted against X (2000 to 3500)]
Wald, Likelihood Ratio and Lagrange Multiplier Tests in Econometrics
• Definitions
• Linearization
• General Formulation of Wald, LR and Lagrange Multiplier Tests
• Composite Null Hypothesis
• Likelihood Ratio Tests

Basics
The null hypothesis H0 is tested. If the test statistic falls inside the critical region, the test rejects H0; otherwise it cannot reject it.
Type I error: H0 is falsely rejected
Type II error: H0 is not rejected although it is false
α – size of a test: probability of a Type I error (typically 5%)
β – probability of a Type II error
(1 − β) – power of a test: probability of rejecting H0 when it is false
Comparison of tests: a test is better than the others if it has the maximum power (minimum β) among the tests with size α ≤ α0, where α0 is fixed.
The alternative hypothesis H1 must be specified.

Model and Definitions
y is a T × 1 vector
f (y, θ) is joint density
θ is a k × 1 vector of parameters
Θ, θ ∈ Θ, is the parameter space

H0 : θ ∈ Θ0 ⊂ Θ
H1 : θ ∈ Θ1 ⊂ Θ, Θ0 ∩ Θ1 = ∅
Often Θ1 = Θ\Θ0
For a critical region CT the size αT and the power πT are:
αT = P (y ∈ CT |θ ∈ Θ0 )
πT = P (y ∈ CT |θ ∈ Θ1 )
Note: size does not depend on θ in most situations

If H0 is composite (includes multiple values of θ), the class of tests is restricted to those whose size does not depend on the particular value of θ ∈ Θ0. Such tests are called similar.
Often critical regions are indexed by T (sample size):
$$\alpha = \lim_{T \to \infty} \alpha_T$$
$$\pi(\theta) = \lim_{T \to \infty} \pi_T, \quad \text{for } \theta \in \Theta_1$$
A test is consistent if π(θ) = 1 for all θ ∈ Θ1.
Most tests are consistent, so they have to be compared through their power functions.
Local alternatives: sequences of alternatives tending to H0. Typical econometric testing problem:
$$\theta = (\theta_1^\top, \theta_2^\top)^\top, \quad \Theta_1 = \{\theta_1\}, \quad \theta_2 \text{ unconstrained, i.e. } \theta_2 \in \Theta$$
Example 1: θ1 is the mean, θ2 is the variance
Example 2: regression problem, $\theta_1 = \theta_1^0$ vs. $\theta_1 \neq \theta_1^0$ (e.g. variance, serial correlation etc.)
$H_0: \theta_1 = \theta_1^0$, $\theta_2$ unrestricted
Sequence of local alternatives:
$H_1: \theta_{1,T} = \theta_1^0 + \delta T^{-1/2}$, $\theta_2$ unrestricted, for some vector $\delta$
The choice of δ determines in which direction the test seeks departures from H0.
A test that is equally good in all directions is called an invariant test.

Summary
LRT is asymptotically locally most powerful among all invariant tests.
Asymptotically optimal: asymptotically locally most powerful. Tests $\xi_1$ and $\xi_2$ are asymptotically equivalent if $\xi_1$, $\xi_2$ have the same critical values and
$$|\xi_1 - \xi_2| \xrightarrow{P} 0 \quad \text{under } H_0 \text{ and } H_1.$$

Linearization
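Nonlinear hypothesis: $H_0: g(\theta^0) = 0$
$g$ is a $p \times 1$ vector of functions
$$g(\theta) = g(\theta^0) + G(\bar\theta)(\theta - \theta^0),$$
where $\bar\theta$ is between $\theta$ and $\theta^0$ and $G$ is the matrix of first derivatives of $g$.
$$G(\bar\theta) \to G(\theta^0) \equiv G$$
The restriction is linear if it has the form $G\theta = G\theta^0$.
For local alternatives there is no loss of generality in considering only the linear hypothesis.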

General Formulation of Wald, LR and Lagrange Multiplier Tests
• Breusch and Pagan (1980)
• Savin (1976)
• Berndt and Savin (1977)
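Log-likelihood:
$$L(\theta, y) = \log f(y, \theta)$$
FOC: $\frac{\partial L}{\partial\theta}(\hat\theta, y) = 0$
Score function: $s(\theta, y) = \frac{\partial L(\theta, y)}{\partial\theta}$; the MLE sets the score to zero.
Fisher information:
$$V(\hat\theta) = J^{-1}(\theta)/T$$
where
$$J(\theta) = -E\left[\frac{\partial^2 L}{\partial\theta\, \partial\theta^\top}(\theta)\right]/T$$
If $J(\hat\theta)$ is a consistent estimator of $J(\theta^0)$ and $\hat\theta$ is asymptotically normal (Wald, 1943):
$$\xi_W = T(\hat\theta - \theta^0)^\top J(\hat\theta)(\hat\theta - \theta^0) \xrightarrow{L} \chi^2_k \quad \text{under } H_0$$
LRT (Wilks, 1938):
$$\xi_{LR} = -2\{L(\theta^0, y) - L(\hat\theta, y)\} \xrightarrow{L} \chi^2_k \quad \text{under } H_0$$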

Lagrange Multipliers:
$$H = L(\theta, y) - \lambda^\top(\theta - \theta^0)$$
Maximization of the likelihood subject to the constraint $\theta = \theta^0$ yields a set of Lagrange multipliers that measure the shadow price of the constraint (Aitchison and Silvey, 1958), (Silvey, 1959) and (Rao, 1948):
$$\frac{\partial L}{\partial\theta} = \lambda, \quad \theta = \theta^0$$
Under $H_0$ the score has mean zero and variance $J(\theta^0)T$:
$$\xi_{LM} = s(\theta^0, y)^\top J^{-1}(\theta^0)\, s(\theta^0, y)/T \xrightarrow{L} \chi^2_k$$

The Three Principles

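[Figure: log-likelihood $L(\theta)$ plotted against $\theta$, with $\theta^0$, $\hat\theta$, $L(\theta^0)$ and $L(\hat\theta)$ marked and the Wald test, LRT and LMT indicated]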

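Lemma 1: If $L = b - \frac{1}{2}(\theta - \hat\theta)^\top A(\theta - \hat\theta)$, where $A$ is a symmetric positive definite matrix which may depend upon the data and known parameters, $b$ is a scalar and $\hat\theta$ is a function of the data, then the W, LR and LM tests are identical.
Proof:
$$\frac{\partial L}{\partial\theta} = -A(\theta - \hat\theta) = s(\theta),$$
$$\frac{\partial^2 L}{\partial\theta\, \partial\theta^\top} = -A = -TJ$$
Thus:
$$\xi_W = (\theta^0 - \hat\theta)^\top A(\theta^0 - \hat\theta),$$
$$\xi_{LM} = s(\theta^0)^\top A^{-1} s(\theta^0) = (\theta^0 - \hat\theta)^\top A(\theta^0 - \hat\theta)$$
Finally, by direct substitution:
$$\xi_{LR} = (\theta^0 - \hat\theta)^\top A(\theta^0 - \hat\theta). \quad \text{Q.E.D.}$$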
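Lemma 1 can be checked numerically. The sketch below builds an arbitrary symmetric positive definite matrix $A$ and evaluates the three statistics for the quadratic log-likelihood; the function names, the dimension and the random inputs are illustrative assumptions.

```python
import numpy as np

def quadratic_loglik(theta, theta_hat, A, b=0.0):
    """L = b - 0.5 * (theta - theta_hat)' A (theta - theta_hat)."""
    d = theta - theta_hat
    return b - 0.5 * d @ A @ d

def three_statistics(theta0, theta_hat, A):
    """Wald, LR and LM statistics for H0: theta = theta0, using T*J = A."""
    d = theta0 - theta_hat
    wald = d @ A @ d
    score = -A @ d                              # s(theta0) = -A (theta0 - theta_hat)
    lm = score @ np.linalg.solve(A, score)      # s' A^{-1} s
    lr = -2.0 * (quadratic_loglik(theta0, theta_hat, A)
                 - quadratic_loglik(theta_hat, theta_hat, A))
    return wald, lr, lm

rng = np.random.default_rng(0)
B = rng.normal(size=(3, 3))
A = B @ B.T + 3.0 * np.eye(3)   # symmetric positive definite by construction
theta_hat = rng.normal(size=3)
theta0 = np.zeros(3)
wald, lr, lm = three_statistics(theta0, theta_hat, A)
```

All three values agree up to rounding, whatever $A$, $\hat\theta$ and $\theta^0$ are chosen.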

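Composite Null Hypothesis
$$\theta = (\theta_1^\top, \theta_2^\top)^\top, \quad \hat\theta = (\hat\theta_1^\top, \hat\theta_2^\top)^\top, \quad \theta_1 \in \mathbb{R}^{k_1}$$
$H_0: \theta_1 = \theta_1^0$
ML estimate of $\theta_2$ under $H_0$: $\tilde\theta_2$, $\tilde\theta = (\theta_1^{0\top}, \tilde\theta_2^\top)^\top$
$J^{11}$ is the upper-left block of the partitioned inverse of
$$J = \begin{pmatrix} J_{11} & J_{12} \\ J_{21} & J_{22} \end{pmatrix}, \qquad (J^{11})^{-1} = J_{11} - J_{12} J_{22}^{-1} J_{21}$$
Wald test:
$$\xi_W = T(\hat\theta_1 - \theta_1^0)^\top (J^{11})^{-1} (\hat\theta_1 - \theta_1^0) \xrightarrow{L} \chi^2_{k_1} \quad \text{under } H_0$$
LRT:
$$\xi_{LR} = -2\{L(\tilde\theta, y) - L(\hat\theta, y)\} \xrightarrow{L} \chi^2_{k_1} \quad \text{under } H_0$$
LMT:
$$H = L(\theta, y) - \lambda^\top(\theta_1 - \theta_1^0)$$
FOC:
$$\frac{\partial L}{\partial\theta_1}(\theta, y) = \lambda, \qquad \frac{\partial L}{\partial\theta_2}(\theta, y) = 0, \qquad \theta_1 = \theta_1^0$$
Thus:
$$\xi_{LM} = s(\tilde\theta, y)^\top J^{-1}(\tilde\theta)\, s(\tilde\theta, y)/T = s_1(\tilde\theta, y)^\top J^{11} s_1(\tilde\theta, y)/T \xrightarrow{L} \chi^2_{k_1}$$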

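Lemma 2: If the likelihood is locally quadratic as in Lemma 1, then all tests are identical.
Proof:
$$\xi_W = (\theta_1^0 - \hat\theta_1)^\top (A^{11})^{-1} (\theta_1^0 - \hat\theta_1) = (\theta_1^0 - \hat\theta_1)^\top (A_{11} - A_{12} A_{22}^{-1} A_{21})(\theta_1^0 - \hat\theta_1)$$
For the two other tests $\tilde\theta_2$ must be estimated. It solves $S_2(\theta, y) = 0$, where
$$\begin{pmatrix} S_1 \\ S_2 \end{pmatrix} = \frac{\partial L}{\partial\theta} = -A(\theta - \hat\theta) = -\begin{pmatrix} A_{11}(\theta_1 - \hat\theta_1) + A_{12}(\theta_2 - \hat\theta_2) \\ A_{21}(\theta_1 - \hat\theta_1) + A_{22}(\theta_2 - \hat\theta_2) \end{pmatrix}$$
$$S_2 = 0 \Rightarrow \tilde\theta_2 - \hat\theta_2 = -A_{22}^{-1} A_{21}(\theta_1 - \hat\theta_1)$$
The concentrated likelihood function:
$$L = b - \frac{1}{2}(\theta_1 - \hat\theta_1)^\top (A_{11} - A_{12} A_{22}^{-1} A_{21})(\theta_1 - \hat\theta_1)$$
and:
$$\xi_{LR} = (\theta_1^0 - \hat\theta_1)^\top (A_{11} - A_{12} A_{22}^{-1} A_{21})(\theta_1^0 - \hat\theta_1)$$
the score:
$$S_1(\tilde\theta) = -\{A_{11}(\theta_1^0 - \hat\theta_1) + A_{12}(\tilde\theta_2 - \hat\theta_2)\} = -(A_{11} - A_{12} A_{22}^{-1} A_{21})(\theta_1^0 - \hat\theta_1)$$
$$\Rightarrow \xi_{LM} = S_1(\tilde\theta)^\top A^{11} S_1(\tilde\theta) = (\theta_1^0 - \hat\theta_1)^\top (A_{11} - A_{12} A_{22}^{-1} A_{21})(\theta_1^0 - \hat\theta_1)$$
Q.E.D.

The tests do not depend on the value of $\theta_2$ (under $H_0$) ⇒ all tests are similar.
An easier construction of LMT:
if $T^{-1/2} S(\theta^0, y) \xrightarrow{L} N(0, V)$ under $H_0$ ⇒
$$\xi_{LM} = S^\top V^{-1} S/T$$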
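Example
$y_t$, $t = 1, \ldots, T$, is a set of independent binomial (Bernoulli) random variables
$$y_t = \begin{cases} 1 & \text{with } p = \theta \\ 0 & \text{with } p = 1 - \theta \end{cases}$$

Example (cont.)
$H_0: \theta = \theta^0$
$H_1: \theta \neq \theta^0$, $\theta \in (0, 1)$
$$\bar y = \sum_t y_t / T$$
$$L(\theta, y) = \sum_t \{y_t \log\theta + (1 - y_t)\log(1 - \theta)\}$$
ML estimate: $\hat\theta = \bar y$
$$s(\theta, y) = \frac{1}{\theta(1 - \theta)} \sum_t (y_t - \theta)$$
$$J(\theta) = E\left[\frac{T\theta(1 - \theta) + (1 - 2\theta)\sum_t (y_t - \theta)}{\theta^2(1 - \theta)^2}\right]/T = \frac{1}{\theta(1 - \theta)}$$
Wald test:
$$\xi_W = T(\theta^0 - \bar y)^2 / \{\bar y(1 - \bar y)\}$$
LMT:
$$\xi_{LM} = \left[\frac{\sum_t (y_t - \theta^0)}{\theta^0(1 - \theta^0)}\right]^2 \frac{\theta^0(1 - \theta^0)}{T} = T(\theta^0 - \bar y)^2 / \{\theta^0(1 - \theta^0)\}$$
LRT:
$$\xi_{LR} = 2T\{\bar y \log(\bar y/\theta^0) + (1 - \bar y)\log[(1 - \bar y)/(1 - \theta^0)]\}$$
A Taylor expansion about $\bar y = \theta^0$ establishes that under $H_0$ the three tests will have the same asymptotic distribution.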
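The three statistics of this example are easy to evaluate directly. The sketch below uses a made-up sample with 60 ones out of 100 observations and $\theta^0 = 0.5$; the function name and the data are illustrative only.

```python
import math

def bernoulli_tests(y, theta0):
    """Wald, LR and LM statistics for H0: theta = theta0 (needs 0 < mean(y) < 1)."""
    T = len(y)
    ybar = sum(y) / T
    wald = T * (theta0 - ybar) ** 2 / (ybar * (1 - ybar))
    lm = T * (theta0 - ybar) ** 2 / (theta0 * (1 - theta0))
    lr = 2 * T * (ybar * math.log(ybar / theta0)
                  + (1 - ybar) * math.log((1 - ybar) / (1 - theta0)))
    return wald, lr, lm

y = [1] * 60 + [0] * 40          # hypothetical sample, ybar = 0.6
wald, lr, lm = bernoulli_tests(y, theta0=0.5)
# wald is about 4.17, lr about 4.03, lm about 4.00
```

The statistics differ in finite samples only through the point at which the variance $\theta(1 - \theta)$ is evaluated; each would be compared with a $\chi^2_1$ critical value.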

Example 2
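$y^*$ is $T \times 1$, $x^*$ is $T \times k$
$y^* | x^* \sim N(x^*\beta, \sigma^2 I)$
$H_0: R\beta = r$, where $R$ is $k_1 \times k$ and $r$ is a $k_1 \times 1$ vector of constants
We may reparametrize the problem to $y | x \sim N(x\theta, \sigma^2 I)$
$H_0: \theta_1 = 0$; $y$ and $x$ are linear combinations of $y^*$ and $x^*$
Log-likelihood:
$$L(\theta, y) = c - \frac{T}{2}\log\sigma^2 - \frac{1}{2\sigma^2}(y - x\theta)^\top(y - x\theta), \quad c = \text{const}$$
If $\sigma^2$ were known, Lemmas 1 and 2 would guarantee that W, LR and LM are identical.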
The score and the information matrix are:
$$s(\theta, y) = x^\top u/\sigma^2, \quad u = y - x\theta$$
$$J_{\theta\theta} = x^\top x/(\sigma^2 T)$$
Notice that the score is proportional to the correlation coefficient between the residuals and the x variables.
This correlation coefficient is zero for $\theta = \hat\theta$, but not for the estimates $\tilde\theta$ under $H_0$.
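The three test statistics are:
$$\xi_W = (\theta_1^0 - \hat\theta_1)^\top \{x_1^\top x_1 - x_1^\top x_2 (x_2^\top x_2)^{-1} x_2^\top x_1\}(\theta_1^0 - \hat\theta_1)/\hat\sigma^2$$
$$\xi_{LM} = \tilde u^\top x_1 \{x_1^\top x_1 - x_1^\top x_2 (x_2^\top x_2)^{-1} x_2^\top x_1\}^{-1} x_1^\top \tilde u/\tilde\sigma^2$$
$$\xi_{LR} = T\log(\tilde u^\top \tilde u/\hat u^\top \hat u)$$
where:
$\hat u = y - x\hat\theta$, $\tilde u = y - x\tilde\theta$
$\hat\sigma^2 = \hat u^\top \hat u/T$, $\tilde\sigma^2 = \tilde u^\top \tilde u/T$
$x = (x_1, x_2)$
The statistics can be rewritten as:
$$\xi_W = T(\tilde u^\top \tilde u - \hat u^\top \hat u)/\hat u^\top \hat u$$
$$\xi_{LM} = T(\tilde u^\top \tilde u - \hat u^\top \hat u)/\tilde u^\top \tilde u$$
⇒
$$\xi_{LR} = T\log(1 + \xi_W/T)$$
$$\xi_{LM} = \xi_W/(1 + \xi_W/T)$$
$$(T - k)\xi_W/(T k_1) \sim F_{k_1, T-k} \quad \text{under } H_0$$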
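The residual-sum-of-squares forms of the three statistics can be verified on simulated data. In the sketch below the data-generating process, the dimensions and the function name are assumptions chosen for illustration; `x1` holds the regressors whose coefficients are zero under $H_0$.

```python
import numpy as np

def regression_tests(y, x1, x2):
    """W, LR, LM for H0: theta_1 = 0 in y = x1 theta_1 + x2 theta_2 + u."""
    T = len(y)
    x = np.column_stack([x1, x2])
    u_hat = y - x @ np.linalg.lstsq(x, y, rcond=None)[0]     # unrestricted residuals
    u_til = y - x2 @ np.linalg.lstsq(x2, y, rcond=None)[0]   # residuals under H0
    rss_u = u_hat @ u_hat
    rss_r = u_til @ u_til
    wald = T * (rss_r - rss_u) / rss_u
    lm = T * (rss_r - rss_u) / rss_r
    lr = T * np.log(rss_r / rss_u)
    return wald, lr, lm

rng = np.random.default_rng(1)
T = 200
x1 = rng.normal(size=(T, 2))                               # k1 = 2 tested regressors
x2 = np.column_stack([np.ones(T), rng.normal(size=T)])     # constant plus one regressor
y = x2 @ np.array([1.0, 0.5]) + rng.normal(size=T)         # H0 is true here
wald, lr, lm = regression_tests(y, x1, x2)
```

The ordering $\xi_{LM} \le \xi_{LR} \le \xi_W$ follows from the two relations above, so with a common $\chi^2$ critical value the three tests can reach different decisions in finite samples.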

References
R. F. Engle (1994): Wald, Likelihood Ratio, and Lagrange Multiplier
Tests in Econometrics in Handbook of Econometrics, 4th edition,
pp.776-826, North-Holland

Handel, Michael
Institut für Statistik und Ökonometrie

Humboldt-Universität zu Berlin
www.case.hu-berlin.de

Motivation
• The asset pricing kernel summarizes investor preferences for payoffs over different states of the world. In the absence of arbitrage, the Lucas asset pricing equation holds:
$$P_t = E_t[M_t X_{t+1}]$$
$P_t$ - asset price at time $t$, $M_t$ - pricing kernel, $X_{t+1} = P_{t+1} + d_{t+1}$ - asset payoff in one period, $d_t$ - dividend at time $t$.

• The goal is to investigate the empirical characteristics of

investor risk aversion over equity return states by estimating a
time-varying pricing kernel, which is called Empirical Pricing
Kernel (EPK).
Based on a paper by J.V. Rosenberg and R.F. Engle
[Rosenberg & Engle, 2001].

Presentation Outline
• Background and Common Problems - Pricing Kernels and Risk

Aversion
• Empirical Pricing Kernels (EPK) - Estimation and Specification
• The Stochastic Volatility Model
• Data and Results
• Conclusions

Asset Pricing and Pricing Kernels - Common Facts
• Asset pricing equation: $P_t = E_t[M_t X_{t+1}]$, or: $1 = E_t[M_t \cdot R_{t+1}]$, where $R_{t+1} \stackrel{\text{def}}{=} X_{t+1}/P_t$ is the return on the asset.

• A special case of the kernel is $M_t = \beta \cdot \frac{u_c(c_{t+1})}{u_c(c_t)} \stackrel{\text{def}}{=} MRS_t$, the marginal rate of substitution at $t$, which describes consumption smoothing; $\beta$ is a discount factor.
• Under power utility, $u(c_t) = \frac{1}{1-\gamma} c_t^{1-\gamma}$, the $MRS$ is a function of consumption growth.

• Covariance decomposition:
$$P_t = E_t[M_t X_{t+1}] = \underbrace{E_t[M_t]}_{[R^f_{t,t+1}]^{-1}} \cdot \underbrace{E_t[X_{t+1}]}_{\text{Payoff}} + \underbrace{Cov_t[M_t, X_{t+1}]}_{\text{Risk Premium}}$$

Risk Aversion - Common Facts
• Risk Aversion: u(αcL + (1 − α)cH ) > αu(cL ) + (1 − α)u(cH ).

• Coefficient of Relative Risk Aversion: $CRRA(c_t) \stackrel{\text{def}}{=} -\frac{u_{cc}(c_t)\, c_t}{u_c(c_t)}$. Under power utility, $CRRA = \gamma$.
• Generalized $\gamma = -\frac{M'_{t+1}(c_{t+1}) \cdot c_{t+1}}{M_{t+1}(c_{t+1})}$ [Arrow, 1965], [Pratt, 1964]
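
A short worked check of the two claims above, using the power utility $u(c) = \frac{1}{1-\gamma} c^{1-\gamma}$ stated earlier and assuming $c > 0$ and $\gamma \neq 1$:
$$u_c(c) = c^{-\gamma}, \qquad u_{cc}(c) = -\gamma c^{-\gamma-1} \quad \Rightarrow \quad CRRA(c) = -\frac{-\gamma c^{-\gamma-1} \cdot c}{c^{-\gamma}} = \gamma$$
The marginal rate of substitution then depends only on consumption growth:
$$MRS_t = \beta\, \frac{c_{t+1}^{-\gamma}}{c_t^{-\gamma}} = \beta \left(\frac{c_{t+1}}{c_t}\right)^{-\gamma}$$
Treating this kernel as a function of $c_{t+1}$, its derivative is $-\gamma$ times the kernel divided by $c_{t+1}$, so the generalized measure $-M' c_{t+1}/M$ also equals $\gamma$ in this special case.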

Pricing Kernels - Bounding the Return

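Covariance decomposition (Return on Assets):
$$1 = E_t[M_t] \cdot E_t[R_{t+1}] + Cov_t[M_t, R_{t+1}] = E_t[M_t] \cdot E_t[R_{t+1}] + \rho_{M,R} \cdot \sigma_R \cdot \sigma_M$$
Since $E_t[M_t] = [R^f_{t,t+1}]^{-1}$ and $|\rho_{M,R}| \le 1$:
$$\Rightarrow E_t[R_{t+1}] \in \left[R^f_{t,t+1} \pm R^f_{t,t+1} \cdot \sigma_R \cdot \sigma_M\right] \quad \text{where } \sigma_M \text{ is unknown.}$$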

Pricing Kernels - Puzzles
• $\gamma$ and $\sigma_M$ cannot be estimated empirically.
• Under the assumptions of power utility and log-normal consumption growth, $\sigma_M \approx \frac{1}{R^f_{t,t+1}} \cdot \gamma\, \sigma_{c_{t+1}/c_t}$.

• Equity Premium Puzzle: Actual returns are outside the bounded

region [Mehra & Prescott, 1985].



• Risk-free Rate Puzzle: Increasing γ leads to excessive risk-free
returns [Weil, 1989].


Possible solutions: Altering the Pricing Kernel or estimating it

empirically!!

Estimation and Specification

Pricing Kernel Projection
• Pricing kernel and risk aversion are functions of many variables, not
only consumption. We look for the projection of pricing kernels onto
the payoff of the asset Xt+1 .
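• $M_t \stackrel{\text{def}}{=} M_t(Z_t, Z_{t+1})$, where $Z_t$ is a vector of all pricing kernel state variables.
• Now $P_t = E_t[M_t^*(X_{t+1}) \cdot X_{t+1}]$, where $M_t^*(X_{t+1}) \stackrel{\text{def}}{=} E_t[M_t(Z_t, Z_{t+1}) | X_{t+1}]$ is the projected pricing kernel.
• Equivalently, $\gamma_t^* \stackrel{\text{def}}{=} -\frac{M^{*\prime}_{t+1}(X_{t+1}) \cdot X_{t+1}}{M^*_{t+1}(X_{t+1})}$ is the projected risk aversion.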

Estimation Technique
• Choosing an Empirical Pricing Kernel (EPK) function which is the
best fit to current derivative prices, given current expectations on
future payoff.
• Exchanging the payoff of asset i with a payoff function gi (rt+1 ) and
an estimated probability density function fˆt (rt+1 ) of the underlying
asset’s return rt+1 , gives:
$$\hat P_{i,t} = E_t\left[\hat M_t^*(r_{t+1}) \cdot g_i(r_{t+1})\right] = \int \hat M_t^*(r_{t+1})\, g_i(r_{t+1})\, \hat f_t(r_{t+1})\, dr_{t+1}$$
$\hat M_t^*(r_{t+1})$ - estimated pricing kernel, projected on the asset return
$\hat P_{i,t}$ - estimated price of asset $i$ at time $t$.

Two Pricing Kernel Specifications
1. Power function of the asset's gross return:
$$\hat M_t^*(r_{t+1}, \theta_t) = \theta_{0,t}\, (r_{t+1})^{-\theta_{1,t}}$$
• $\theta_{0,t}$ - scaling factor, $\theta_{1,t}$ - pricing kernel slope.
• Level of risk aversion is $\gamma_t^* = \theta_{1,t}$, time-varying or time-invariant.

2. Orthogonal Polynomial Expansion - exponential of the generalized Chebyshev polynomial with N+1 terms:
$$\hat M_t^*(r_{t+1}, \theta_t) = \theta_{0,t}\, T_0(r_{t+1}) \exp\left[\sum_{n=1}^{N} \theta_{n,t}\, T_n(r_{t+1})\right]$$
The polynomial is defined over $[a, b]$ with terms
$$T_n(r_{t+1}) = \cos\left(n \cdot \cos^{-1}\left(\frac{2r_{t+1} - b - a}{b - a}\right)\right)$$
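
Both specifications are straightforward to evaluate. The sketch below is an illustration only: the function names and the interval $[a, b] = [0.8, 1.2]$ are assumptions, and the parameter values are the sample means reported in Table 1, used here purely as example inputs.

```python
import numpy as np

def power_kernel(r, theta0, theta1):
    """Power specification: theta0 * r**(-theta1), r is the gross return."""
    return theta0 * np.asarray(r, dtype=float) ** (-theta1)

def chebyshev_term(n, r, a, b):
    """T_n(r) = cos(n * arccos((2r - b - a) / (b - a))) for r in [a, b]."""
    x = (2.0 * np.asarray(r, dtype=float) - b - a) / (b - a)
    return np.cos(n * np.arccos(x))

def polynomial_kernel(r, theta, a, b):
    """Polynomial specification: theta[0] * T_0(r) * exp(sum_{n>=1} theta[n] * T_n(r))."""
    theta = np.asarray(theta, dtype=float)
    exponent = sum(theta[n] * chebyshev_term(n, r, a, b)
                   for n in range(1, len(theta)))
    return theta[0] * chebyshev_term(0, r, a, b) * np.exp(exponent)

returns = np.linspace(0.9, 1.1, 5)                # hypothetical gross returns
m_power = power_kernel(returns, 1.01, 7.36)
m_poly = polynomial_kernel(returns, [0.19, -2.25, -0.88, -1.08], a=0.8, b=1.2)
```

The power kernel is monotonically decreasing in the return for $\theta_{1,t} > 0$, whereas the polynomial kernel can take more flexible shapes.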

Stochastic Volatility Model
• Equity index return volatility is stochastic, mean-reverting and

responds asymmetrically to positive and negative returns.
• Using an asymmetric GARCH(1,1) model (approximation):
$$\ln\frac{S_t}{S_{t-1}} - r_f = \mu + \varepsilon_t, \quad \text{where } \varepsilon_t \sim f(0, \sigma^2_{t|t-1})$$
$$\sigma^2_{t|t-1} = \omega_1 + \omega_2 I + \alpha\varepsilon^2_{t-1} + \beta\sigma^2_{t-1|t-2} + \delta\max[0, -\varepsilon_{t-1}]^2$$
$\omega_2 I$ - optional constant, shift in long-run volatility.
• Model parameters estimated using maximum likelihood with a normal innovation density, tested empirically and found to be the best fit.
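
The variance recursion can be written in a few lines. In this sketch the single constant `omega` stands for $\omega_1 + \omega_2 I$, and the starting variance `sigma2_0` and all numerical values are hypothetical.

```python
def garch_variances(eps, omega, alpha, beta, delta, sigma2_0):
    """Conditional variances from the asymmetric GARCH(1,1) recursion.

    Returns a list with sigma2_0 followed by one variance per innovation.
    """
    sigma2 = [sigma2_0]
    for e in eps:
        neg = max(0.0, -e)  # only negative innovations feed the asymmetry term
        sigma2.append(omega + alpha * e * e + beta * sigma2[-1] + delta * neg * neg)
    return sigma2

# A positive and a negative innovation of different size (hypothetical values).
variances = garch_variances([0.01, -0.02], omega=1e-5, alpha=0.05,
                            beta=0.9, delta=0.1, sigma2_0=1e-4)
```

With $\delta > 0$ a negative innovation raises next-period variance by more than a positive innovation of the same size, which is the asymmetric response mentioned above.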

Empirical Innovation Density Estimation

• Modelling $f$ by factorizing it into a time-invariant standardized innovation $\varepsilon_t/\sigma_{t|t-1}$ and a time-varying $\sigma_{t|t-1}$ component.
• Having a set of standardized innovations as the time-invariant component and the conditional standard deviation as the time-varying component of the empirical density function $f$.
• The set of estimated standardized innovations forms a pdf with extreme return behavior such as excess skewness and kurtosis.
• Creating a multi-period return density by simulating many multi-period return paths, updating the conditional standard deviation after each time-step.
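
The simulation step can be sketched as a bootstrap over the standardized innovations. This is one possible reading of the procedure, not the authors' code: the function name, the resampling with replacement, and all parameter values are assumptions, and `omega` again stands for $\omega_1 + \omega_2 I$.

```python
import numpy as np

def simulate_multiperiod_returns(z, sigma2_start, horizon, n_paths,
                                 mu, omega, alpha, beta, delta, seed=0):
    """Simulate cumulative excess log returns over `horizon` periods.

    z holds the estimated standardized innovations eps_t / sigma_t.
    """
    rng = np.random.default_rng(seed)
    z = np.asarray(z, dtype=float)
    sigma2 = np.full(n_paths, float(sigma2_start))
    total = np.zeros(n_paths)
    for _ in range(horizon):
        draws = rng.choice(z, size=n_paths, replace=True)  # resample innovations
        eps = np.sqrt(sigma2) * draws                       # rescale by current volatility
        total += mu + eps
        neg = np.maximum(0.0, -eps)
        # Update the conditional variance after each time step.
        sigma2 = omega + alpha * eps**2 + beta * sigma2 + delta * neg**2
    return total

z_hat = np.random.default_rng(42).standard_normal(500)   # placeholder innovations
paths = simulate_multiperiod_returns(z_hat, 1e-4, horizon=20, n_paths=1000,
                                     mu=3e-4, omega=1e-6, alpha=0.03,
                                     beta=0.9, delta=0.08)
```

A histogram or kernel density estimate of `paths` then serves as the multi-period return density.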

Hedging Ratio Specification

• Delta- and Gamma-neutral portfolio, hedging with put options.
• Stock prices follow a trinomial tree with ε-sized increments.
• The put price according to our asset pricing model is:
$$Put_t = E_t[M^*(r_{t,t+T}, \theta_t) \max[0, K - S_{t+T}]]$$
• For the next period we can approximate:
$$Put_{t+1|S_{t+1}} \approx E_{t+1|S_{t+1}}[M^*(r_{t+1,t+T}, E_t[\theta_{t+1}]) \max[0, K - S_{t+T}]]$$
with the AR(K) process $\theta_{t+1} = \alpha + \beta_0\theta_t + \ldots + \beta_K\theta_{t-K} + e_{t+1}$
• Using Monte-Carlo simulation gives:
$$Put_{t+1|S_{t+1}} \approx \frac{1}{J}\sum_{j=1}^{J} M^*(r_{t,t+T}, \underbrace{[\alpha + \beta_0\theta_t + \ldots + \beta_K\theta_{t-K}]}_{=E_t[\theta_{t+1}]}) \max[0, K - S_{t+T,j}]$$

The Data and Results

• Subset of Berkeley Options Database 1991-1995, time-synchronized
daily closing prices.
• Implied volatility used to correct different closing times effect.
• Screening criteria, e.g. moneyness $|K/S_t - 1| < 0.10$ and prices that satisfy no-arbitrage bounds.
• Due to Put-Call Parity, only OTM Puts and OTM-Calls are used for
EPK estimation.
• Annualized mean return 7.55%, annualized standard deviation
14.79%, negative skewness, excess kurtosis, evident serial
correlation.

Power Specification
N=53                Mean    StD     Min     Max
θ0,t                1.01    0.01    0.99    1.02
θ1,t                7.36    2.58    2.36    12.55
Pricing Error StD   $0.63   $0.26   $0.28   $1.34

Polynomial Specification
N=53                Mean    StD     Min     Max
θ0,t                0.19    0.10    0.04    0.40
θ1,t                -2.25   1.06    -4.38   -0.25
θ2,t                -0.88   0.68    -2.52   0.19
θ3,t                -1.08   0.42    -1.94   -0.19
Pricing Error StD   $0.09   $0.05   $0.03   $0.24

Table 1: EPK estimation results. The polynomial specification fits better.

Estimated S&P500 Empirical Pricing Kernel

Both graphs show a time-varying risk aversion [Rosenberg & Engle, 2001]

Estimation of S&P500 EPK and Risk Aversion - Summary
• Orthogonal polynomial specification fits S&P500 option prices

better than power specification, with respect to pricing errors.
• Both orthogonal polynomial and power EPK estimates show a clear
time-varying level of risk aversion.
• Orthogonal polynomial estimates exhibit known risk-aversion
characteristics, such as a region of negative absolute risk aversion
around 0% returns and positive autocorrelation [Jackwerth, 2000].

The Hedging Test
• Estimating hedge ratios using time-invariant and time-varying EPK

with both specifications, and using ATM puts and S&P500 index as
hedging instruments.

• Estimating a forecast model in which tomorrow’s parameter vector is
AR(1): θt+1 = α + βθt + et+1 , both coefficients are significant in a
regression.

• 3 possible realizations for next day’s S&P500 level and
corresponding put prices, depending on expected pricing kernel and
payoff density function (estimated by the simulation using the
asymmetric GARCH model).

• Measuring the time-series of hedging errors and standard deviation
of hedge portfolio price.

Hedging Test Results

Pricing Kernel Specification      Portfolio StD   StD Reduction   t-stat
No hedge:
  $100 OTM written put position   $22.56
Hedge with underlying:
  Time-invariant power            $12.41
  EPK power                       $12.11          2.39%           1.16
  Time-invariant polynomial       $13.45
  EPK polynomial                  $13.13          2.36%           1.95
Hedge with ATM put:
  EPK power                       $11.10          0.95%           2.82
Hedge with both:
  EPK power                       $11.29          0.63%           2.94
Table 2: Hedging test results

Hedging Test Results - Summary
• Strong evidence that pricing kernel is time-varying, we always

improve hedging performance by 1% to 3%. 4 of the 6 cases show
significant improvement.
• The best performance is achieved when hedging with ATM put
alone, with a power specified pricing kernel. Hedging with ATM put
alone is superior to hedging with the S&P500 portfolio alone and
with the combination of ATM puts and the underlying.
• Hedging performance using power specification is consistently
superior to the performance using orthogonal polynomial
specification. The improvement is significant in 5 of the 6 cases.

Empirical Pricing Kernels -

Conclusions
• We used S&P500 index option prices and estimated S&P500 return

densities to estimate empirical pricing kernel and empirical risk
aversion each month between 1991-1995.

• Orthogonal polynomial pricing kernel specification fits option
price data better than power specification.
• However, power specified pricing kernel is superior in terms of
hedging performance.
• Empirical risk aversion is countercyclical.

References
[Arrow, 1965] Arrow K.J., Aspects of the Theory of Risk-Bearing,
Yrjö Jahnsson Foundation, Helsinki, 1965.
[Cochrane, 2001] Cochrane J.H., Asset Pricing, Princeton University
Press, 2001.
[Ebell, 2004] Ebell M., Capital Markets and Macroeconomy, lecture
notes, HU-Berlin, Winter 2003-04.
[Franke, Härdle & Hafner, 2003] Franke J., Härdle W., Hafner C.,
Einführung in die Statistik der Finanzmärkte, 2. Edition, Springer
2003.
[Jackwerth, 2000] Jackwerth J., Recovering Risk Aversion from Option
Prices and Realized Returns, Review of Financial Studies, vol.13,
2000.

[Ljungqvist & Uhlig, 1999] Ljungqvist L., Uhlig H., On Consumption

Bunching under Campbell-Cochrane Habit Formation, Working
paper, 1999.
[Mehra & Prescott, 1985] Mehra R. and Prescott E.C., The Equity
Premium: A Puzzle, Journal of Monetary Economics, vol.15, 1985.
[Pratt, 1964] Pratt J.W., Risk Aversion in the Small and in the Large,
Econometrica, Vol. 32, 1964.
[Rosenberg, 2000] Rosenberg J.V., Asset Pricing Puzzles: Evidence from
Options Markets, Working paper, 2000.
[Rosenberg & Engle, 2001] Rosenberg J.V., Engle R.F., Empirical
Pricing Kernels, Working paper, 2001.
[Weil, 1989] Weil P., The equity premium puzzle and the risk-free rate
puzzle, Journal of Monetary Economics, vol.24, 1989.
