Without Agonizing Pain
P.A. Forsyth∗
February 22, 2005
Contents
1 The First Option Trade
2 The Black-Scholes Equation
2.1 Background
2.2 Definitions
2.3 A Simple Example: The Two State Tree
2.4 A hedging strategy
2.5 Brownian Motion
2.6 Geometric Brownian motion with drift
2.6.1 Ito’s Lemma
2.6.2 Some uses of Ito’s Lemma
2.6.3 Some more uses of Ito’s Lemma
2.7 The Black-Scholes Analysis
2.8 Hedging in Continuous Time
2.9 The option price
2.10 American early exercise
3 The Risk Neutral World
4 Monte Carlo Methods
4.1 Monte Carlo Error Estimators
4.2 Random Numbers and Monte Carlo
4.3 The Box-Muller Algorithm
4.3.1 An improved Box Muller
4.4 Speeding up Monte Carlo
4.5 Estimating the mean and variance
4.6 Low Discrepancy Sequences
4.7 Correlated Random Numbers
4.8 Integration of Stochastic Differential Equations
4.8.1 The Brownian Bridge
4.8.2 Strong and Weak Convergence
5 The Binomial Model
5.1 A No-arbitrage Lattice
6 More on Ito’s Lemma
7 Derivative Contracts on non-traded Assets and Real Options
7.1 Derivative Contracts
7.2 A Forward Contract
7.2.1 Convenience Yield
8 Discrete Hedging
8.1 Delta Hedging
8.2 Gamma Hedging
8.3 Vega Hedging
9 Jump Diffusion
9.1 The Poisson Process
9.2 The Jump Diffusion Pricing Equation
10 Mean Variance Portfolio Optimization
10.1 Special Cases
10.2 The Portfolio Allocation Problem
10.3 Adding a Risk-free asset
10.4 Criticism
10.5 Individual Securities
11 Stocks for the Long Run?
12 Further Reading
12.1 General Interest
12.2 More Background
12.3 More Technical
∗ School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada, N2L 3G1, paforsyt@elora.uwaterloo.ca,
www.scicom.uwaterloo.ca/ paforsyt, tel: (519) 8884567x4415, fax: (519) 8851208
“Men wanted for hazardous journey, small wages, bitter cold, long months of complete darkness,
constant dangers, safe return doubtful. Honour and recognition in case of success.” Advertisement
placed by Ernest Shackleton in 1914. He received 5000 replies. An example of extreme
risk-seeking behaviour. Hedging with options is used to mitigate risk, and would not appeal to
members of Shackleton’s expedition.
1 The First Option Trade
Many people think that options and futures are recent inventions. However, options have a long history,
going back to ancient Greece.
As recorded by Aristotle in Politics, the sixth century BC philosopher Thales of Miletus took part in a
sophisticated trading strategy. The main point of this trade was to confirm that philosophers could become
rich if they so chose. This is perhaps the first rejoinder to the famous question “If you are so smart, why
aren’t you rich?” which has dogged academics throughout the ages.
Thales observed that the weather was very favourable to a good olive crop, which would result in a bumper
harvest of olives. If there had been an established Athens Board of Olives Exchange, Thales could have simply
sold olive futures short (a surplus of olives would cause the price of olives to go down). Since the exchange
did not exist, Thales put a deposit on all the olive presses surrounding Miletus. When the olive crop was
harvested, demand for olive presses reached enormous proportions (olives were not a storable commodity).
Thales then sublet the presses for a profit. Note that by placing a deposit on the presses, Thales was actually
manufacturing an option on the olive crop, i.e. the most he could lose was his deposit. If he had sold olive
futures short, he would have been liable to an unlimited loss in the event that the olive crop turned out bad
and the price of olives went up. In other words, he had an option on a future of a non-storable commodity.
2 The Black-Scholes Equation
This is the basic PDE used in option pricing. We will derive this PDE for a simple case below. Things get
much more complicated for real contracts.
2.1 Background
Over the past few years derivative securities (options, futures, and forward contracts) have become essential
tools for corporations and investors alike. Derivatives facilitate the transfer of financial risks. As such, they
may be used to hedge risk exposures or to assume risks in the anticipation of profits. To take a simple yet
instructive example, a gold mining firm is exposed to fluctuations in the price of gold. The firm could use a
forward contract to fix the price of its future sales. This would protect the firm against a fall in the price of
gold, but it would also sacrifice the upside potential from a gold price increase. This upside could be preserved by
using options instead of a forward contract.
Individual investors can also use derivatives as part of their investment strategies. This can be done
through direct trading on financial exchanges. In addition, it is quite common for financial products to include
some form of embedded derivative. Any insurance contract can be viewed as a put option. Consequently, any
investment which provides some kind of protection actually includes an option feature. Standard examples
include deposit insurance guarantees on savings accounts as well as the provision of being able to redeem a
savings bond at par at any time. These types of embedded options are becoming increasingly common and
increasingly complex. A prominent current example is the investment guarantees being offered by insurance
companies (“segregated funds”) and mutual funds. In such contracts, the initial investment is guaranteed,
and gains can be locked in (reset) a fixed number of times per year at the option of the contract holder. This
is actually a very complex put option, known as a shout option. How much should an investor be willing to
pay for this insurance? Determining the fair market value of these sorts of contracts is a problem in option
pricing.
Figure 2.1: A simple case where the stock value can either be $22 or $18, with a European call option, K =
$21. The stock price today is $20. In the up state the stock price is $22 and the option price is $1; in the down state the stock price is $18 and the option price is $0.
2.2 Definitions
Let’s consider some simple European put/call options. At some time T in the future (the expiry or exercise
date) the holder has the right, but not the obligation, to
• Buy an asset at a prescribed price K (the exercise or strike price). This is a call option.
• Sell the asset at a prescribed price K (the exercise or strike price). This is a put option.
At expiry time T, we know with certainty what the value of the option is, in terms of the price of the
underlying asset S,
\[
\begin{aligned}
\text{Payoff} &= \max(S - K, 0) \quad \text{for a call} \\
\text{Payoff} &= \max(K - S, 0) \quad \text{for a put}
\end{aligned} \tag{2.1}
\]
Note that the payoff from an option is always nonnegative, since the holder has a right but not an obligation.
This contrasts with a forward contract, where the holder must buy or sell at a prescribed price.
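The payoffs in equation (2.1) can be written as two small functions. This is only an illustrative sketch; the function names `call_payoff` and `put_payoff` are not from the text.

```python
def call_payoff(s, k):
    """Payoff at expiry of a European call: max(S - K, 0)."""
    return max(s - k, 0.0)


def put_payoff(s, k):
    """Payoff at expiry of a European put: max(K - S, 0)."""
    return max(k - s, 0.0)


# Evaluate both payoffs for a strike of 21 at a few asset prices.
for s in (18.0, 21.0, 22.0):
    print(s, call_payoff(s, 21.0), put_payoff(s, 21.0))
```

Neither function can return a negative number, which is the nonnegativity property noted above.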
2.3 A Simple Example: The Two State Tree
This example is taken from Options, futures, and other derivatives, by John Hull. Suppose the value of a
stock is currently $20. It is known that at the end of three months, the stock price will be either $22 or $18.
We assume that the stock pays no dividends, and we would like to value a European call option to buy the
stock in three months for $21. This option can have only two possible values in three months: if the stock
price is $22, the option is worth $1, if the stock price is $18, the option is worth zero. This is illustrated in
Figure 2.1.
In order to price this option, we can set up an imaginary portfolio consisting of the option and the stock,
in such a way that there is no uncertainty about the value of the portfolio at the end of three months. Since
the portfolio has no risk, the return earned by this portfolio must be the risk-free rate.
Consider a portfolio consisting of a long (positive) position of δ shares of stock, and short (negative) one
call option. We will compute δ so that the portfolio is riskless. If the stock moves up to $22 or goes down
to $18, then the value of the portfolio is
\[
\begin{aligned}
\text{Value if stock goes up} &= \$22\,\delta - 1 \\
\text{Value if stock goes down} &= \$18\,\delta - 0
\end{aligned} \tag{2.2}
\]
The portfolio is riskless when these two values are equal, i.e. 22δ − 1 = 18δ. So, if we choose δ = .25, then the value of the portfolio is
\[
\begin{aligned}
\text{Value if stock goes up} &= \$22\,\delta - 1 = \$4.50 \\
\text{Value if stock goes down} &= \$18\,\delta - 0 = \$4.50
\end{aligned} \tag{2.3}
\]
So, regardless of whether the stock moves up or down, the value of the portfolio is $4.50. A risk-free portfolio
must earn the risk-free rate. Suppose the current risk-free rate is 12% per year (continuously compounded); then the value of the portfolio today
must be the present value of $4.50 received in three months (.25 years), or
\[
4.50 \times e^{-.12 \times .25} = 4.367
\]
The value of the stock today is $20. Let the value of the option be V. The value of the portfolio is
\[
20 \times .25 - V = 4.367 \quad \rightarrow \quad V = .633
\]
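The arithmetic of the two state tree can be reproduced in a few lines. The sketch below assumes continuous compounding, as in the present value computation above; the function name and argument names are illustrative and not from the text.

```python
import math


def two_state_call_price(s0, s_up, s_down, strike, rate, maturity):
    """Value a European call on a two state tree via a riskless portfolio."""
    payoff_up = max(s_up - strike, 0.0)
    payoff_down = max(s_down - strike, 0.0)
    # Choose delta so that (delta shares - one option) is worth the same in both states.
    delta = (payoff_up - payoff_down) / (s_up - s_down)
    riskless_value = s_up * delta - payoff_up
    # A riskless portfolio must earn the risk-free rate, so discount at that rate.
    present_value = riskless_value * math.exp(-rate * maturity)
    # Portfolio value today is delta * s0 - V; solve for V.
    option_value = s0 * delta - present_value
    return delta, present_value, option_value


delta, pv, value = two_state_call_price(20.0, 22.0, 18.0, 21.0, 0.12, 0.25)
print(f"delta = {delta:.2f}, portfolio PV = {pv:.3f}, option value = {value:.3f}")
# prints: delta = 0.25, portfolio PV = 4.367, option value = 0.633
```

Note that no probability of an up or down move appears anywhere in the function.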
2.4 A hedging strategy
So, if we sell the above option (we hold a short position in the option), then we can hedge this position in
the following way. Today, we sell the option for $.633, borrow $4.367 from the bank at the risk-free rate (this
means that we have to pay the bank back $4.50 in three months), which gives us $5.00 in cash. Then, we
buy .25 shares at $20.00 (the current price of the stock). In three months' time, one of two things happens:
• The stock goes up to $22. Our stock holding is now worth $5.50, we pay the option holder $1.00, which
leaves us with $4.50, just enough to pay off the bank loan.
• The stock goes down to $18.00. The call option is worthless. The value of the stock holding is now
$4.50, which is just enough to pay off the bank loan.
Consequently, in this simple situation, we see that the theoretical price of the option is the cost for the seller
to set up a portfolio which will precisely pay off the option holder and any bank loans required to set up the
hedge, at the expiry of the option. In other words, this is the price which a hedger requires to ensure that there
is always just enough money at the end to net out at zero gain or loss. If the market price of the option
was higher than this value, the seller could sell at the higher price and lock in an instantaneous risk-free
gain. Alternatively, if the market price of the option was lower than the theoretical, or fair market value, it
would be possible to lock in a risk-free gain by selling the portfolio short. Any such arbitrage opportunities
are rapidly exploited in the market, so that for most investors, we can assume that such opportunities are
not possible (the no-arbitrage condition), and therefore that the market price of the option should be the
theoretical price.
Note that this hedge works regardless of whether the stock goes up or down. Once we set up this
hedge, we don’t have a care in the world. The value of the option is also independent of the probability that
the stock goes up to $22 or down to $18. This is somewhat counterintuitive.
2.5 Brownian Motion
Before we consider a model for stock price movements, let’s consider the idea of Brownian motion with drift.
Suppose X is a random variable, and in time $t \to t + dt$, $X \to X + dX$, where
\[
dX = \alpha\,dt + \sigma\,dZ \tag{2.4}
\]
where $\alpha\,dt$ is the drift term, σ is the volatility, and dZ is a random term. The dZ term has the form
\[
dZ = \phi\sqrt{dt} \tag{2.5}
\]
where φ is a random variable drawn from a normal distribution with mean zero and variance one (φ ∼ N(0, 1),
i.e. φ is normally distributed).
If E is the expectation operator, then
\[
E(\phi) = 0, \qquad E(\phi^2) = 1. \tag{2.6}
\]
Now in a time interval dt, we have
\[
\begin{aligned}
E(dX) &= E(\alpha\,dt) + E(\sigma\,dZ) \\
&= \alpha\,dt,
\end{aligned} \tag{2.7}
\]
and the variance of dX, denoted by Var(dX), is
\[
\begin{aligned}
\mathrm{Var}(dX) &= E\left([dX - E(dX)]^2\right) \\
&= E\left([\sigma\,dZ]^2\right) \\
&= \sigma^2\,dt.
\end{aligned} \tag{2.8}
\]
Let’s look at a discrete model to understand this process more completely. Suppose that we have a
discrete lattice of points. Let $X = X_0$ at t = 0. Suppose that at $t = \Delta t$,
\[
\begin{aligned}
X_0 &\to X_0 + \Delta h \quad \text{with probability } p \\
X_0 &\to X_0 - \Delta h \quad \text{with probability } q
\end{aligned} \tag{2.9}
\]
where p + q = 1. Assume that
• X follows a Markov process, i.e. the probability distribution in the future depends only on where it is
now.
• The probability of an up or down move is independent of what happened in the past.
• X can move only up or down by $\Delta h$.
At any lattice point $X_0 + i\Delta h$, the probability of an up move is p, and the probability of a down move is q.
The probabilities of reaching any particular lattice point for the first three moves are shown in Figure 2.2.
Each move takes place in the time interval $t \to t + \Delta t$.
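Let $\Delta X$ be the change in X over the interval $t \to t + \Delta t$. Then
\[
\begin{aligned}
E(\Delta X) &= (p - q)\Delta h \\
E([\Delta X]^2) &= p(\Delta h)^2 + q(-\Delta h)^2 \\
&= (\Delta h)^2,
\end{aligned} \tag{2.10}
\]
so that the variance of $\Delta X$ is (over $t \to t + \Delta t$)
\[
\begin{aligned}
\mathrm{Var}(\Delta X) &= E([\Delta X]^2) - [E(\Delta X)]^2 \\
&= (\Delta h)^2 - (p - q)^2(\Delta h)^2 \\
&= 4pq(\Delta h)^2.
\end{aligned} \tag{2.11}
\]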
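Now, suppose we consider the distribution of X after n moves, so that $t = n\Delta t$. The probability of j up
moves and (n − j) down moves, P(n, j), is
\[
P(n, j) = \frac{n!}{j!\,(n - j)!}\, p^j q^{n-j} \tag{2.12}
\]
which is just a binomial distribution. Now, if $X_n$ is the value of X after n steps on the lattice, then
\[
\begin{aligned}
E(X_n - X_0) &= n\,E(\Delta X) \\
\mathrm{Var}(X_n - X_0) &= n\,\mathrm{Var}(\Delta X),
\end{aligned} \tag{2.13}
\]
which follows from the properties of a binomial distribution (each up or down move is independent of
previous moves). Consequently, from equations (2.10, 2.11, 2.13) we obtain
\[
\begin{aligned}
E(X_n - X_0) &= n(p - q)\Delta h \\
&= \frac{t}{\Delta t}(p - q)\Delta h \\
\mathrm{Var}(X_n - X_0) &= n\,4pq(\Delta h)^2 \\
&= \frac{t}{\Delta t}\,4pq(\Delta h)^2
\end{aligned} \tag{2.14}
\]
Figure 2.2: Probabilities of reaching the discrete lattice points for the first three moves. After one move: $X_0 + \Delta h$ with probability $p$, $X_0 - \Delta h$ with probability $q$. After two moves: $X_0 + 2\Delta h$ with probability $p^2$, $X_0$ with probability $2pq$, $X_0 - 2\Delta h$ with probability $q^2$. After three moves: $X_0 + 3\Delta h$ with probability $p^3$, $X_0 + \Delta h$ with probability $3p^2q$, $X_0 - \Delta h$ with probability $3pq^2$, $X_0 - 3\Delta h$ with probability $q^3$.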
Now, we would like to take the limit as $\Delta t \to 0$ in such a way that the mean and variance of X, after a
finite time t, are independent of $\Delta t$, and we would like to recover
\[
\begin{aligned}
dX &= \alpha\,dt + \sigma\,dZ \\
E(dX) &= \alpha\,dt \\
\mathrm{Var}(dX) &= \sigma^2\,dt
\end{aligned} \tag{2.15}
\]
as $\Delta t \to 0$. Now, since $0 \le p, q \le 1$, we need to choose $\Delta h = \text{Const}\,\sqrt{\Delta t}$. Otherwise, from equation (2.14)
we get that $\mathrm{Var}(X_n - X_0)$ is either 0 or infinite after a finite time. (Stock variances do not have either of
these properties, so this is obviously not a very interesting case).
Let’s choose $\Delta h = \sigma\sqrt{\Delta t}$, which gives (from equation (2.14))
\[
\begin{aligned}
E(X_n - X_0) &= (p - q)\frac{\sigma t}{\sqrt{\Delta t}} \\
\mathrm{Var}(X_n - X_0) &= t\,4pq\,\sigma^2
\end{aligned} \tag{2.16}
\]
Now, for $E(X_n - X_0)$ to be independent of $\Delta t$ as $\Delta t \to 0$, we must have
\[
(p - q) = \text{Const.}\,\sqrt{\Delta t} \tag{2.17}
\]
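If we choose
\[
p - q = \frac{\alpha}{\sigma}\sqrt{\Delta t} \tag{2.18}
\]
we get (using p + q = 1)
\[
\begin{aligned}
p &= \frac{1}{2}\left[1 + \frac{\alpha}{\sigma}\sqrt{\Delta t}\right] \\
q &= \frac{1}{2}\left[1 - \frac{\alpha}{\sigma}\sqrt{\Delta t}\right]
\end{aligned} \tag{2.19}
\]
Now, putting together equations (2.16-2.19) gives
\[
\begin{aligned}
E(X_n - X_0) &= \alpha t \\
\mathrm{Var}(X_n - X_0) &= t\sigma^2\left(1 - \frac{\alpha^2}{\sigma^2}\Delta t\right) \\
&= t\sigma^2 \; ; \quad \Delta t \to 0.
\end{aligned} \tag{2.20}
\]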
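The limit in equation (2.20) can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function name, the parameter values and the sample size are illustrative choices, not from the text. It uses the fact that the number of up moves in n steps is binomial, equation (2.12), with the step size $\Delta h = \sigma\sqrt{\Delta t}$ and the probabilities of equation (2.19).

```python
import numpy as np


def lattice_walk_moments(alpha, sigma, t, n_steps, n_paths=200_000, seed=0):
    """Sample X_n - X_0 on the lattice and return its sample mean and variance."""
    dt = t / n_steps
    dh = sigma * np.sqrt(dt)                         # Delta h = sigma * sqrt(Delta t)
    p = 0.5 * (1.0 + alpha / sigma * np.sqrt(dt))    # up probability, equation (2.19)
    if not 0.0 <= p <= 1.0:
        raise ValueError("time step too large: p is not a probability")
    rng = np.random.default_rng(seed)
    ups = rng.binomial(n_steps, p, size=n_paths)     # number of up moves, equation (2.12)
    x_change = (2 * ups - n_steps) * dh              # (ups - downs) * Delta h
    return x_change.mean(), x_change.var()


mean, var = lattice_walk_moments(alpha=0.1, sigma=0.2, t=1.0, n_steps=1000)
print("sample mean:", mean, " alpha*t   =", 0.1 * 1.0)
print("sample var: ", var, " sigma^2*t =", 0.2**2 * 1.0)
```

With these values the sample mean and variance should be close to $\alpha t$ and $\sigma^2 t$, up to sampling error and the small $O(\Delta t)$ correction in equation (2.20).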
Now, let’s imagine that $X(t_n) - X(t_0) = X_n - X_0$ is very small, so that $X_n - X_0 \simeq dX$ and $t_n - t_0 \simeq dt$, so
that equation (2.20) becomes
\[
\begin{aligned}
E(dX) &= \alpha\,dt \\
\mathrm{Var}(dX) &= \sigma^2\,dt,
\end{aligned} \tag{2.21}
\]
which agrees with equations (2.7-2.8). Hence, in the limit as $\Delta t \to 0$, we can interpret the random walk for
X on the lattice (with these parameters) as the solution to the stochastic differential equation (SDE)
\[
\begin{aligned}
dX &= \alpha\,dt + \sigma\,dZ \\
dZ &= \phi\sqrt{dt}.
\end{aligned} \tag{2.22}
\]
Consider the case where α = 0, σ = 1, so that $dX = dZ = Z(t_i) - Z(t_{i-1}) = Z_i - Z_{i-1} = X_i - X_{i-1}$.
Now we can write
\[
\int_0^t dZ = \lim_{\Delta t \to 0} \sum_i (Z_{i+1} - Z_i) = (Z_n - Z_0). \tag{2.23}
\]
From equation (2.20) (α = 0, σ = 1) we have
\[
\begin{aligned}
E(Z_n - Z_0) &= 0 \\
\mathrm{Var}(Z_n - Z_0) &= t.
\end{aligned} \tag{2.24}
\]
Now, if n is large ($\Delta t \to 0$), recall that the binomial distribution (2.12) tends to a normal distribution. From
equation (2.24), we have that the mean of this distribution is zero, with variance t, so that
\[
\begin{aligned}
(Z_n - Z_0) &\sim N(0, t) \\
&= \int_0^t dZ.
\end{aligned} \tag{2.25}
\]
In other words, after a finite time t, $\int_0^t dZ$ is normally distributed with mean zero and variance t (the limit
of a binomial distribution is a normal distribution).
Recall that we have $Z_i - Z_{i-1} = \sqrt{\Delta t}$ with probability p and $Z_i - Z_{i-1} = -\sqrt{\Delta t}$ with probability q.
Note that $(Z_i - Z_{i-1})^2 = \Delta t$, with certainty, so that we can write
\[
(Z_i - Z_{i-1})^2 \simeq (dZ)^2 = \Delta t. \tag{2.26}
\]
To summarize
• We can interpret the SDE
\[
\begin{aligned}
dX &= \alpha\,dt + \sigma\,dZ \\
dZ &= \phi\sqrt{dt}
\end{aligned} \tag{2.27}
\]
as the limit of a discrete random walk on a lattice as the timestep tends to zero.
• Var(dZ) = dt, otherwise, after any finite time, $\mathrm{Var}(X_n - X_0)$ is either zero or infinite.
• We can integrate the term dZ to obtain
\[
\begin{aligned}
\int_0^t dZ &= Z(t) - Z(0) \\
&\sim N(0, t).
\end{aligned} \tag{2.28}
\]
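Going back to our lattice example, note that the total distance traveled over any finite interval of time
becomes infinite. Each move has size $\Delta h$ whether it is up or down,
\[
E(|\Delta X|) = \Delta h \tag{2.29}
\]
so that the total distance traveled in n steps is
\[
\begin{aligned}
n\Delta h &= \frac{t}{\Delta t}\Delta h \\
&= \frac{t\sigma}{\sqrt{\Delta t}}
\end{aligned} \tag{2.30}
\]
which goes to infinity as $\Delta t \to 0$. Similarly,
\[
\frac{\Delta x}{\Delta t} = \pm\frac{\sigma}{\sqrt{\Delta t}} \to \pm\infty \quad \text{as } \Delta t \to 0. \tag{2.31}
\]
Consequently, Brownian motion is very jagged at every timescale. These paths are not differentiable, i.e.
$\frac{dx}{dt}$ does not exist, so we cannot speak of
\[
E\left(\frac{dx}{dt}\right) \tag{2.32}
\]
but we can possibly define
\[
\frac{E(dx)}{dt}. \tag{2.33}
\]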
2.6 Geometric Brownian motion with drift
Of course, the actual path followed by a stock is more complex than the simple situation described above.
More realistically, we assume that the relative changes in stock prices (the returns) follow Brownian motion
with drift. We suppose that in an infinitesimal time dt, the stock price S changes to S + dS, where
\[
\frac{dS}{S} = \mu\,dt + \sigma\,dZ \tag{2.34}
\]
where µ is the drift rate, σ is the volatility, and dZ is the increment of a Wiener process,
\[
dZ = \phi\sqrt{dt} \tag{2.35}
\]
where φ ∼ N(0, 1). Equations (2.34) and (2.35) are called geometric Brownian motion with drift. So,
superimposed on the upward (relative) drift is a (relative) random walk. The degree of randomness is given
by the volatility σ. Figure 2.3 gives an illustration of ten realizations of this random process for two different
values of the volatility. In this case, we assume that the drift rate µ equals the risk-free rate.
Figure 2.3: Realizations of asset price following geometric Brownian motion. Left: low volatility case (σ = .20 per year); right:
high volatility case (σ = .40 per year). Risk-free rate of return r = .05. Each panel plots asset price ($), from 0 to 1000, against time (years), from 0 to 12, together with the risk-free return curve.
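Realizations like those in Figure 2.3 can be generated by stepping equations (2.34) and (2.35) forward with a small but finite timestep. The sketch below assumes NumPy; the initial price of 100, the 250 steps per year and the function name are assumptions made for illustration, since the text does not state how the figure was produced.

```python
import numpy as np


def simulate_gbm_paths(s0, mu, sigma, years, steps_per_year, n_paths, seed=0):
    """Step dS = mu*S*dt + sigma*S*dZ forward in time, with dZ = phi*sqrt(dt)."""
    rng = np.random.default_rng(seed)
    n_steps = int(round(years * steps_per_year))
    dt = years / n_steps
    paths = np.empty((n_paths, n_steps + 1))
    paths[:, 0] = s0
    for n in range(n_steps):
        phi = rng.standard_normal(n_paths)             # phi ~ N(0, 1)
        d_z = phi * np.sqrt(dt)                        # equation (2.35)
        d_s = paths[:, n] * (mu * dt + sigma * d_z)    # equation (2.34)
        paths[:, n + 1] = paths[:, n] + d_s
    return paths


# Ten realizations over 12 years with drift equal to r = .05 (assumed start price 100).
low_vol = simulate_gbm_paths(100.0, 0.05, 0.20, 12, 250, 10)
high_vol = simulate_gbm_paths(100.0, 0.05, 0.40, 12, 250, 10)
print("final prices, sigma = .20:", low_vol[:, -1].round(1))
print("final prices, sigma = .40:", high_vol[:, -1].round(1))
```

Because both calls use the same seed, the two sets of paths share the same random draws, so the only difference between them is the volatility.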
Note that
\[
\begin{aligned}
E(dS) &= E(\sigma S\,dZ + \mu S\,dt) \\
&= \mu S\,dt \quad \text{since } E(dZ) = 0
\end{aligned} \tag{2.36}
\]
and that the variance of dS is (to leading order in dt)
\[
\begin{aligned}
\mathrm{Var}[dS] &= E(dS^2) - [E(dS)]^2 \\
&= E(\sigma^2 S^2\,dZ^2) \\
&= \sigma^2 S^2\,dt
\end{aligned} \tag{2.37}
\]
so that σ is a measure of the degree of randomness of the stock price movement.
Equation (2.34) is a stochastic differential equation. The normal rules of calculus don’t apply, since for
example
\[
\frac{dZ}{dt} = \phi\frac{1}{\sqrt{dt}} \to \infty \quad \text{as } dt \to 0.
\]
The study of these sorts of equations uses results from stochastic calculus. However, for our purposes, we
need only one result, which is Ito’s Lemma (see Derivatives: the theory and practice of financial engineering,
by P. Wilmott). Suppose we have some function G = G(S, t), where S follows the stochastic process of equation
(2.34). Then, in a small time increment dt, $G \to G + dG$, where
\[
dG = \left(\mu S\frac{\partial G}{\partial S} + \frac{\sigma^2 S^2}{2}\frac{\partial^2 G}{\partial S^2} + \frac{\partial G}{\partial t}\right)dt + \sigma S\frac{\partial G}{\partial S}\,dZ \tag{2.38}
\]
An informal derivation of this result is given in the following section.
2.6.1 Ito’s Lemma
We give an informal derivation of Ito’s lemma (2.38). Suppose we have a variable S which follows
\[
dS = a(S, t)\,dt + b(S, t)\,dZ \tag{2.39}
\]
where dZ is the increment of a Wiener process.
Now since
\[
dZ^2 = \phi^2\,dt \tag{2.40}
\]
where φ is a random variable drawn from a normal distribution with mean zero and unit variance, we have
that, if E is the expectation operator, then
\[
E(\phi) = 0, \qquad E(\phi^2) = 1 \tag{2.41}
\]
so that the expected value of $dZ^2$ is
\[
E(dZ^2) = dt \tag{2.42}
\]
Now, it can be shown (see Section 6) that in the limit as $dt \to 0$, we have that $\phi^2\,dt$ becomes non-stochastic,
so that with probability one
\[
dZ^2 \to dt \quad \text{as } dt \to 0 \tag{2.43}
\]
Now, suppose we have some function G = G(S, t), then
\[
dG = G_S\,dS + G_t\,dt + G_{SS}\frac{dS^2}{2} + \dots \tag{2.44}
\]
Now (from (2.39))
\[
\begin{aligned}
(dS)^2 &= (a\,dt + b\,dZ)^2 \\
&= a^2\,dt^2 + 2ab\,dZ\,dt + b^2\,dZ^2
\end{aligned} \tag{2.45}
\]
Since $dZ = O(\sqrt{dt})$ and $dZ^2 \to dt$, equation (2.45) becomes
\[
(dS)^2 = b^2\,dZ^2 + O((dt)^{3/2}) \tag{2.46}
\]
or
\[
(dS)^2 \to b^2\,dt \quad \text{as } dt \to 0 \tag{2.47}
\]
Now, equations (2.39, 2.44, 2.47) give
\[
\begin{aligned}
dG &= G_S\,dS + G_t\,dt + G_{SS}\frac{dS^2}{2} + \dots \\
&= G_S(a\,dt + b\,dZ) + dt\left(G_t + G_{SS}\frac{b^2}{2}\right) \\
&= G_S\,b\,dZ + \left(a\,G_S + G_{SS}\frac{b^2}{2} + G_t\right)dt
\end{aligned} \tag{2.48}
\]
So, we have the result that if
\[
dS = a(S, t)\,dt + b(S, t)\,dZ \tag{2.49}
\]
and if G = G(S, t), then
\[
dG = G_S\,b\,dZ + \left(a\,G_S + G_{SS}\frac{b^2}{2} + G_t\right)dt \tag{2.50}
\]
Equation (2.38) can be deduced by setting a = µS and b = σS in equation (2.50).
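The key step in this derivation is equation (2.43), the statement that $dZ^2$ behaves like dt. A rough numerical illustration, assuming NumPy, is to sum $(dZ)^2$ over all the increments in [0, t] and watch the sum settle near t as the timestep shrinks. The function name and the step counts are illustrative choices; this is a sanity check, not a proof.

```python
import numpy as np


def sum_dz_squared(t, n_steps, seed=0):
    """Sum of (dZ)^2 over n_steps increments of [0, t], with dZ = phi*sqrt(dt)."""
    rng = np.random.default_rng(seed)
    dt = t / n_steps
    d_z = rng.standard_normal(n_steps) * np.sqrt(dt)
    return float(np.sum(d_z**2))


# The sum has mean t and variance 2*t*dt, so its scatter about t shrinks as dt -> 0.
for n_steps in (10, 1_000, 100_000):
    print(n_steps, sum_dz_squared(1.0, n_steps))
```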
2.6.2 Some uses of Ito’s Lemma
Suppose we have
\[
dS = \mu\,dt + \sigma\,dZ. \tag{2.51}
\]
If µ, σ = Const., then this can be integrated (from t = 0 to t = t) exactly to give
\[
S(t) = S(0) + \mu t + \sigma(Z(t) - Z(0)) \tag{2.52}
\]
and from equation (2.28)
\[
Z(t) - Z(0) \sim N(0, t) \tag{2.53}
\]
Suppose instead we use the more usual geometric Brownian motion
\[
dS = \mu S\,dt + \sigma S\,dZ \tag{2.54}
\]
Let F(S) = log S, so that $F_S = 1/S$, $F_{SS} = -1/S^2$ and $F_t = 0$, and use Ito’s Lemma
\[
\begin{aligned}
dF &= F_S\,S\sigma\,dZ + \left(F_S\,\mu S + F_{SS}\frac{\sigma^2 S^2}{2} + F_t\right)dt \\
&= \left(\mu - \frac{\sigma^2}{2}\right)dt + \sigma\,dZ,
\end{aligned} \tag{2.55}
\]
so that we can integrate this to get
\[
F(t) = F(0) + \left(\mu - \frac{\sigma^2}{2}\right)t + \sigma(Z(t) - Z(0)) \tag{2.56}
\]
or, since $S = e^F$,
\[
S(t) = S(0)\exp\left[\left(\mu - \frac{\sigma^2}{2}\right)t + \sigma(Z(t) - Z(0))\right]. \tag{2.57}
\]
Unfortunately, these cases are about the only situations where we can exactly integrate the SDE (constant
σ, µ).
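Because equation (2.57) is exact, S(t) can be sampled in a single step, with no timestepping error. The sketch below assumes NumPy; the parameter values and the function name are illustrative. It draws Z(t) − Z(0) from N(0, t) as in equation (2.53), and then checks that log(S(t)/S(0)) has mean (µ − σ²/2)t and variance σ²t, as equation (2.56) implies.

```python
import numpy as np


def sample_gbm_terminal(s0, mu, sigma, t, n_samples, seed=0):
    """Draw samples of S(t) from the exact solution, equation (2.57)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_samples) * np.sqrt(t)   # Z(t) - Z(0) ~ N(0, t)
    return s0 * np.exp((mu - 0.5 * sigma**2) * t + sigma * z)


s0, mu, sigma, t = 100.0, 0.10, 0.20, 1.0
samples = sample_gbm_terminal(s0, mu, sigma, t, 200_000)
log_return = np.log(samples / s0)
print("mean of log return:", log_return.mean(), " expected:", (mu - 0.5 * sigma**2) * t)
print("var of log return: ", log_return.var(), " expected:", sigma**2 * t)
```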
2.6.3 Some more uses of Ito’s Lemma
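We can often use Ito’s Lemma and some algebraic tricks to determine some properties of distributions. Let
\[
dX = a(X, t)\,dt + b(X, t)\,dZ, \tag{2.58}
\]
then if G = G(X, t), then
\[
dG = \left[a\,G_X + G_t + \frac{b^2}{2}G_{XX}\right]dt + G_X\,b\,dZ. \tag{2.59}
\]
If $E[X] = \bar{X}$, then (b(X, t) and dZ are independent)
\[
\begin{aligned}
E[dX] = d\,E[X] = d\bar{X} &= E[a\,dt] + E[b]\,E[dZ] \\
&= E[a\,dt],
\end{aligned} \tag{2.60}
\]
so that
\[
\begin{aligned}
\frac{d\bar{X}}{dt} &= E[a] = \bar{a} \\
\bar{X}(t) &= \bar{X}(0) + E\left[\int_0^t a\,dt\right].
\end{aligned} \tag{2.61}
\]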
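Let $\bar{G} = E[(X - \bar{X})^2] = \mathrm{var}(X)$. Applying equation (2.59) to $G = (X - \bar{X})^2$, with $G_X = 2(X - \bar{X})$, $G_{XX} = 2$ and $G_t = -2(X - \bar{X})\bar{a}$, gives
\[
\begin{aligned}
d\bar{G} &= E[dG] \\
&= E[2(X - \bar{X})a - 2(X - \bar{X})\bar{a} + b^2]\,dt + E[2b(X - \bar{X})]\,E[dZ] \\
&= E[b^2\,dt] + E[2(X - \bar{X})(a - \bar{a})\,dt],
\end{aligned} \tag{2.62}
\]
which means that
\[
\bar{G} = \mathrm{var}(X) = E\left[\int_0^t b^2\,dt\right] + E\left[\int_0^t 2(a - \bar{a})(X - \bar{X})\,dt\right]. \tag{2.63}
\]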
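In a particular case, we can sometimes get more useful expressions. If
\[
dS = \mu S\,dt + \sigma S\,dZ \tag{2.64}
\]
with µ, σ constant, then
\[
\begin{aligned}
E[dS] = d\bar{S} &= E[\mu S]\,dt \\
&= \mu\bar{S}\,dt,
\end{aligned} \tag{2.65}
\]
so that
\[
\begin{aligned}
d\bar{S} &= \mu\bar{S}\,dt \\
\bar{S} &= S_0 e^{\mu t}.
\end{aligned} \tag{2.66}
\]
Now, let $G(S) = S^2$, so that $E[G] = \bar{G} = E[S^2]$, then (from Ito’s Lemma)
\[
\begin{aligned}
d\bar{G} &= E[2\mu S^2 + \sigma^2 S^2]\,dt + E[2S^2\sigma]\,E[dZ] \\
&= E[2\mu S^2 + \sigma^2 S^2]\,dt \\
&= (2\mu + \sigma^2)\bar{G}\,dt,
\end{aligned} \tag{2.67}
\]
so that
\[
\begin{aligned}
\bar{G} &= \bar{G}_0 e^{(2\mu + \sigma^2)t} \\
E[S^2] &= S_0^2 e^{(2\mu + \sigma^2)t}.
\end{aligned} \tag{2.68}
\]
From equations (2.66) and (2.68) we then have
\[
\begin{aligned}
\mathrm{var}(S) &= E[S^2] - (E[S])^2 \\
&= E[S^2] - \bar{S}^2 \\
&= S_0^2 e^{2\mu t}(e^{\sigma^2 t} - 1) \\
&= \bar{S}^2(e^{\sigma^2 t} - 1).
\end{aligned} \tag{2.69}
\]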
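One can use the same ideas to compute the skewness (third central moment), $E[(S - \bar{S})^3]$. If $G(S) = S^3$ and $\bar{G} = E[G(S)] = E[S^3]$,
then
\[
\begin{aligned}
d\bar{G} &= E[\mu S \cdot 3S^2 + \sigma^2 S^2/2 \cdot 3 \cdot 2S]\,dt + E[3S^2\sigma S]\,E[dZ] \\
&= E[3\mu S^3 + 3\sigma^2 S^3]\,dt \\
&= 3(\mu + \sigma^2)\bar{G}\,dt,
\end{aligned} \tag{2.70}
\]
so that
\[
\begin{aligned}
\bar{G} &= E[S^3] \\
&= S_0^3 e^{3(\mu + \sigma^2)t}.
\end{aligned} \tag{2.71}
\]
We can then obtain the skewness from
\[
\begin{aligned}
E[(S - \bar{S})^3] &= E[S^3 - 3S^2\bar{S} + 3S\bar{S}^2 - \bar{S}^3] \\
&= E[S^3] - 3\bar{S}\,E[S^2] + 2\bar{S}^3.
\end{aligned} \tag{2.72}
\]
Equations (2.66, 2.68, 2.71) can then be substituted into equation (2.72) to get the desired result.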
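The substitution described above is mechanical, so it is easy to carry out in code. This sketch evaluates equations (2.66), (2.68) and (2.71), and then combines them using the binomial expansion $E[(S - \bar{S})^3] = E[S^3] - 3\bar{S}E[S^2] + 2\bar{S}^3$. The function name and the parameter values are illustrative, not from the text.

```python
import math


def gbm_moments(s0, mu, sigma, t):
    """Mean, variance and third central moment of S(t) under equation (2.64)."""
    mean = s0 * math.exp(mu * t)                              # equation (2.66)
    second = s0**2 * math.exp((2.0 * mu + sigma**2) * t)      # equation (2.68)
    third = s0**3 * math.exp(3.0 * (mu + sigma**2) * t)       # equation (2.71)
    variance = second - mean**2                               # equation (2.69)
    third_central = third - 3.0 * mean * second + 2.0 * mean**3
    return mean, variance, third_central


mean, variance, third_central = gbm_moments(s0=1.0, mu=0.1, sigma=0.2, t=1.0)
print("mean:", mean)
print("variance:", variance)
print("third central moment:", third_central)
```

The third central moment comes out positive, which reflects the long right tail of the asset price distribution under geometric Brownian motion.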
2.7 The Black-Scholes Analysis
Assume
• The stock price follows geometric Brownian motion, equation (2.34).
• The risk-free rate of return is a constant r.
• There are no arbitrage opportunities, i.e. all risk-free portfolios must earn the risk-free rate of return.
• Short selling is permitted (i.e. we can own negative quantities of an asset).
Suppose that we have an option whose value is given by V = V(S, t). Construct an imaginary portfolio,
consisting of one option, and a number $(-(\alpha^h))$ of the underlying asset. (If $(\alpha^h) > 0$, then we have sold
the asset short, i.e. we have borrowed an asset, sold it, and are obligated to give it back at some future
date).
The value of this portfolio P is
\[
P = V - (\alpha^h)S \tag{2.73}
\]
In a small time dt, $P \to P + dP$,
\[
dP = dV - (\alpha^h)\,dS \tag{2.74}
\]
Note that in equation (2.74) we have not included a term $(\alpha^h)_S S$. This is actually a rather subtle point, since
we shall see (later on) that $(\alpha^h)$ actually depends on S. However, if we think of a real situation, at any
instant in time, we must choose $(\alpha^h)$, and then we hold the portfolio while the asset moves randomly. So,
equation (2.74) is actually the change in the value of the portfolio, not a differential. If we were taking a
true differential then equation (2.74) would be
\[
dP = dV - (\alpha^h)\,dS - S\,d(\alpha^h)
\]
but we have to remember that $(\alpha^h)$ does not change over a small time interval, since we pick $(\alpha^h)$, and
then S changes randomly. We are not allowed to peek into the future (otherwise, we could get rich without
risk, which is not permitted by the no-arbitrage condition), and hence $(\alpha^h)$ is not allowed to contain any
information about future asset price movements. The principle of no peeking into the future is why Ito
stochastic calculus is used. Other forms of stochastic calculus are used in Physics applications (e.g. turbulent
flow).
Substituting equations (2.34) and (2.38) into equation (2.74) gives
\[
dP = \sigma S\left(V_S - (\alpha^h)\right)dZ + \left(\mu S V_S + \frac{\sigma^2 S^2}{2}V_{SS} + V_t - \mu(\alpha^h)S\right)dt \tag{2.75}
\]
We can make this portfolio riskless over the time interval dt, by choosing $(\alpha^h) = V_S$ in equation (2.75). This
eliminates the dZ term in equation (2.75). (This is the analogue of our choice of the amount of stock in the
riskless portfolio for the two state tree model.) So, letting
\[
(\alpha^h) = V_S \tag{2.76}
\]
then substituting equation (2.76) into equation (2.75) gives
\[
dP = \left(V_t + \frac{\sigma^2 S^2}{2}V_{SS}\right)dt \tag{2.77}
\]
Since P is now risk-free in the interval $t \to t + dt$, then no-arbitrage says that
\[
dP = rP\,dt \tag{2.78}
\]
Therefore, equations (2.77) and (2.78) give
\[
rP\,dt = \left(V_t + \frac{\sigma^2 S^2}{2}V_{SS}\right)dt \tag{2.79}
\]
Since
\[
P = V - (\alpha^h)S = V - V_S S \tag{2.80}
\]
then substituting equation (2.80) into equation (2.79) gives
\[
V_t + \frac{\sigma^2 S^2}{2}V_{SS} + rSV_S - rV = 0 \tag{2.81}
\]
which is the Black-Scholes equation. Note the rather remarkable fact that equation (2.81) is independent of
the drift rate µ.
Equation (2.81) is solved backwards in time from the option expiry time t = T to the present t = 0.
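One way to build confidence in equation (2.81) is to check it numerically against a known solution. The sketch below uses the standard closed-form value of a European call, which is not derived in this section and is brought in here as an assumption, and estimates the derivatives in equation (2.81) by central finite differences. The function names, step sizes and parameter values are illustrative.

```python
import math


def norm_cdf(x):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(s, t, strike, r, sigma, expiry):
    """Standard closed-form European call value at asset price s and time t < expiry."""
    tau = expiry - t
    d1 = (math.log(s / strike) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return s * norm_cdf(d1) - strike * math.exp(-r * tau) * norm_cdf(d2)


def pde_residual(s, t, strike, r, sigma, expiry, ds=1e-2, dt=1e-4):
    """Left-hand side of equation (2.81), using central differences for V_t, V_S, V_SS."""
    def v(s_, t_):
        return bs_call(s_, t_, strike, r, sigma, expiry)

    v_t = (v(s, t + dt) - v(s, t - dt)) / (2.0 * dt)
    v_s = (v(s + ds, t) - v(s - ds, t)) / (2.0 * ds)
    v_ss = (v(s + ds, t) - 2.0 * v(s, t) + v(s - ds, t)) / ds**2
    return v_t + 0.5 * sigma**2 * s**2 * v_ss + r * s * v_s - r * v(s, t)


print("call value:  ", bs_call(100.0, 0.0, 100.0, 0.05, 0.20, 1.0))
print("PDE residual:", pde_residual(100.0, 0.0, 100.0, 0.05, 0.20, 1.0))
```

The residual should be zero up to finite difference error. The drift rate µ is not an argument of either function, in line with the remark above.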
2.8 Hedging in Continuous Time
We can construct a hedging strategy based on the solution to the above equation. Suppose we sell an option
at price V at t = 0. Then we carry out the following
• We sell one option worth V. (This gives us V in cash initially).
• We borrow $\left(S\frac{\partial V}{\partial S} - V\right)$ from the bank.
• We buy $\frac{\partial V}{\partial S}$ shares at price S.
At every instant in time, we adjust the amount of stock we own so that we always have $\frac{\partial V}{\partial S}$ shares. Note
that this is a dynamic hedge, since we have to continually rebalance the portfolio. Cash will flow into and
out of the bank account, in response to changes in S. If the amount in the bank is positive, we receive the
risk-free rate of return. If negative, then we borrow at the risk-free rate.
So, our hedging portfolio will be
• Short one option worth V.
• Long $\frac{\partial V}{\partial S}$ shares at price S.
• $V - S\frac{\partial V}{\partial S}$ cash in the bank account.
At any instant in time (including the terminal time), this portfolio can be liquidated and any obligations
implied by the short position in the option can be covered, at zero gain or loss, regardless of the value of S.
Note that given the receipt of the cash for the option, this strategy is self-financing.
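In practice the hedge can only be rebalanced at discrete times, so the final gain or loss is small rather than exactly zero. The following sketch simulates the strategy above with a finite number of rebalances, assuming NumPy. It relies on the standard closed-form European call value and its delta, N(d1), neither of which is derived in this section; the function names, the drift µ = .10 and the other parameter values are illustrative assumptions.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)


def norm_cdf(x):
    """Standard normal CDF applied elementwise to an array."""
    return 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))


def call_value_and_delta(s, tau, strike, r, sigma):
    """Closed-form European call value and delta (dV/dS) with time tau to expiry."""
    d1 = (np.log(s / strike) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    value = s * norm_cdf(d1) - strike * math.exp(-r * tau) * norm_cdf(d2)
    return value, norm_cdf(d1)


def hedging_error(s0=100.0, strike=100.0, r=0.05, mu=0.10, sigma=0.20,
                  expiry=1.0, n_steps=100, n_paths=2000, seed=0):
    """Final value of (shares + bank account - option payoff) on each simulated path."""
    rng = np.random.default_rng(seed)
    dt = expiry / n_steps
    s = np.full(n_paths, s0)
    value, delta = call_value_and_delta(s, expiry, strike, r, sigma)
    cash = value - delta * s                    # bank account: V - S * dV/dS
    for k in range(1, n_steps + 1):
        phi = rng.standard_normal(n_paths)
        s = s * np.exp((mu - 0.5 * sigma**2) * dt + sigma * math.sqrt(dt) * phi)
        cash = cash * math.exp(r * dt)          # interest earned or paid at rate r
        if k < n_steps:
            _, new_delta = call_value_and_delta(s, expiry - k * dt, strike, r, sigma)
            cash -= (new_delta - delta) * s     # buy or sell shares to rebalance
            delta = new_delta
    payoff = np.maximum(s - strike, 0.0)
    return delta * s + cash - payoff


errors = hedging_error()
print("mean hedging error:", errors.mean())
print("std of hedging error:", errors.std())
```

The stock paths are generated with drift µ, yet the hedging error stays centred near zero. Increasing `n_steps` should shrink its standard deviation, which is the discrete counterpart of the zero gain or loss statement above.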
2.9 The option price
So, we can see that the price of the option valued by the Black-Scholes equation is the market price of the
option at any time. If the price was higher than the Black-Scholes price, we could construct the hedging
portfolio, dynamically adjust the hedge, and end up with a positive amount at the end. Similarly, if the price
was lower than the Black-Scholes price, we could short the hedging portfolio, and end up with a positive
gain. By the no-arbitrage condition, this should not be possible.
Note that we are not trying to predict the price movements of the underlying asset, which is a random
process. The value of the option is based on a hedging strategy which is dynamic, and must be continuously
rebalanced. The price is the cost of setting up the hedging portfolio. The Black-Scholes price is not the
expected payoff.
The Black-Scholes price is not the value of the option to a speculator, who buys and
holds the option. A speculator is making bets about the underlying drift rate of the stock (note that the
drift rate does not appear in the Black-Scholes equation). For a speculator, the value of the option is given
by an equation similar to the Black-Scholes equation, except that the drift rate appears. In this case, the
price can be interpreted as the expected payoff based on the guess for the drift rate. But this is art, not
science!
2.10 American early exercise
Actually, most options traded are American options, which have the feature that they can be exercised at
any time. Consequently, an investor acting optimally, will always exercise the option if the value falls below
the payoﬀ or exercise value. So, the value of an American option is given by the solution to equation (2.81)
with the additional constraint
V (S, t) ≥
max(S −K, 0) for a call
max(K −S, 0) for a put
(2.82)
Note that since we are working backwards in time, we know what the option is worth in future, and therefore
we can determine the optimal course of action.
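In order to write equation (2.81) in more conventional form, define $\tau = T - t$, so that equation (2.81)
becomes
$$
\begin{aligned}
V_\tau &= \frac{\sigma^2 S^2}{2} V_{SS} + r S V_S - r V \\
V(S, \tau = 0) &= \begin{cases} \max(S-K,0) & \text{for a call} \\ \max(K-S,0) & \text{for a put} \end{cases} \\
V(0, \tau) &\to V_\tau = -rV \\
V(S = \infty, \tau) &\to \begin{cases} S & \text{for a call} \\ 0 & \text{for a put} \end{cases}
\end{aligned}
\tag{2.83}
$$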
If the option is American, then we also have the additional constraints
$$ V(S,\tau) \ge \begin{cases} \max(S-K,0) & \text{for a call} \\ \max(K-S,0) & \text{for a put} \end{cases} \tag{2.84} $$
Define the operator
$$ LV \equiv V_\tau - \left( \frac{\sigma^2 S^2}{2} V_{SS} + r S V_S - r V \right) \tag{2.85} $$
and let $V(S, 0) = V^*$. More formally, the American option pricing problem can be stated as
$$
\begin{aligned}
LV &\ge 0 \\
V - V^* &\ge 0 \\
(V - V^*)\, LV &= 0
\end{aligned}
\tag{2.86}
$$
3 The Risk Neutral World
Suppose instead of valuing an option using the above no-arbitrage argument, we wanted to know the expected
value of the option. We can imagine that we are buying and holding the option, and not hedging. If we
are considering the value of risky cash flows in the future, then these cash flows should be discounted at an
appropriate discount rate, which we will call $\rho$ (i.e. the riskier the cash flows, the higher $\rho$).
Consequently the value of an option today can be considered to be the discounted future value. This
is simply the old idea of net present value. Regard $S$ today as known, and let $V(S+dS, t+dt)$ be the value
of the option at some future time $t+dt$, which is uncertain, since $S$ evolves randomly. Thus
$$ V(S,t) = \frac{1}{1+\rho\,dt}\, E(V(S+dS, t+dt)) \tag{3.1} $$
where $E(...)$ is the expectation operator, i.e. the expected value of $V(S+dS, t+dt)$ given that $V = V(S,t)$
at time $t$. We can rewrite equation (3.1) as (ignoring terms of $o(dt)$, where $o(dt)$ represents terms that go to
zero faster than $dt$)
$$ \rho\,dt\,V(S,t) = E(V(S,t)+dV) - V(S,t) . \tag{3.2} $$
Since we regard $V$ as known today, then
$$ E(V(S,t)+dV) - V(S,t) = E(dV) , \tag{3.3} $$
so that equation (3.2) becomes
$$ \rho\,dt\,V(S,t) = E(dV) . \tag{3.4} $$
Assume that
$$ \frac{dS}{S} = \mu\,dt + \sigma\,dZ . \tag{3.5} $$
From Ito's Lemma (2.38) we have that
$$ dV = \left( V_t + \frac{\sigma^2 S^2}{2} V_{SS} + \mu S V_S \right) dt + \sigma S V_S\, dZ . \tag{3.6} $$
Noting that
$$ E(dZ) = 0 \tag{3.7} $$
then
$$ E(dV) = \left( V_t + \frac{\sigma^2 S^2}{2} V_{SS} + \mu S V_S \right) dt . \tag{3.8} $$
Combining equations (3.4-3.8) gives
$$ V_t + \frac{\sigma^2 S^2}{2} V_{SS} + \mu S V_S - \rho V = 0 . \tag{3.9} $$
Equation (3.9) is the PDE for the expected value of an option. If we are not hedging, maybe this is the
value that we are interested in, not the no-arbitrage value. However, if this is the case, we have to estimate
the drift rate $\mu$, and the discount rate $\rho$. Estimating the appropriate discount rate is always a thorny issue.
Now, note the interesting fact, if we set $\rho = r$ and $\mu = r$ in equation (3.9) then we simply get the
Black-Scholes equation (2.81).
This means that the no-arbitrage price of an option is identical to the expected value if $\rho = r$ and $\mu = r$.
In other words, we can determine the no-arbitrage price by pretending we are living in a world where all
assets drift at rate $r$, and all investments are discounted at rate $r$. This is the so-called risk neutral world.
This result is the source of endless confusion. It is best to think of this as simply a mathematical fluke.
This does not have any reality. Investors would be very stupid to think that the drift rate of risky investments
is $r$. I'd rather just buy risk-free bonds in this case. There is in reality no such thing as a risk-neutral world.
Nevertheless, this result is useful for determining the no-arbitrage value of an option using a Monte Carlo
approach. Using this numerical method, we simply assume that
$$ dS = rS\,dt + \sigma S\,dZ \tag{3.10} $$
and simulate a large number of random paths. If we know the option payoff as a function of $S$ at $t = T$,
then we compute
$$ V(S, 0) = e^{-rT} E^Q(V(S,T)) \tag{3.11} $$
which should be the no-arbitrage value.
Note the $E^Q$ in the above equation. This makes it clear that we are taking the expectation in the risk
neutral world (the expectation in the $Q$ measure). This contrasts with the real-world expectation (the $P$
measure).
Suppose we want to know the expected value (in the real world) of an asset which pays $V(S, t = T)$ at
$t = T$ in the future. Then, the expected value (today) is given by solving
$$ V_t + \frac{\sigma^2 S^2}{2} V_{SS} + \mu S V_S = 0 . \tag{3.12} $$
where we have dropped the discounting term. In particular, suppose we are going to receive $V = S(t = T)$,
i.e. just the asset at $t = T$. Assume that the solution to equation (3.12) is $V = \text{Const.}\, A(t) S$, and we find
that
$$ V = \text{Const.}\, S e^{\mu(T-t)} . \tag{3.13} $$
Noting that we receive $V = S$ at $t = T$ means that
$$ V = S e^{\mu(T-t)} . \tag{3.14} $$
Today, we can acquire the asset for price $S(t = 0)$. At $t = T$, the asset is worth $S(t = T)$. Equation (3.14)
then says that
$$ E[V(S(t=0), t=0)] = E[S(t=T)] = S(t=0)\, e^{\mu T} \tag{3.15} $$
In other words, if
$$ dS = S\mu\,dt + S\sigma\,dZ \tag{3.16} $$
then (setting $t = T$)
$$ E[S(t)] = S(0)\, e^{\mu t} . \tag{3.17} $$
Recall that the exact solution to equation (3.16) is (equation (2.57))
$$ S(t) = S(0) \exp\left[ \left( \mu - \frac{\sigma^2}{2} \right) t + \sigma (Z(t) - Z(0)) \right] . \tag{3.18} $$
So that we have just shown that $E[S(t)] = S(0) e^{\mu t}$ by using a simple PDE argument and Ito's Lemma. Isn't this
easier than using brute force statistics? PDEs are much more elegant.
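For comparison, the brute force statistical check is also easy to sketch. The following Python fragment is only an illustration: the function names and parameter values are our own choices, not part of these notes. It samples the exact solution (3.18), using the fact that $Z(t) - Z(0)$ is normal with mean zero and variance $t$, and averages the result so that it can be compared with $S(0)e^{\mu t}$.

```python
import math
import random


def sample_gbm_terminal(s0, mu, sigma, t, rng):
    """Draw S(t) from the exact solution (3.18), with Z(t) - Z(0) = phi * sqrt(t)."""
    phi = rng.gauss(0.0, 1.0)
    return s0 * math.exp((mu - 0.5 * sigma ** 2) * t + sigma * math.sqrt(t) * phi)


def mean_terminal_value(s0, mu, sigma, t, n_samples, seed=0):
    """Sample mean of S(t); it should approach s0 * exp(mu * t) as n_samples grows."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += sample_gbm_terminal(s0, mu, sigma, t, rng)
    return total / n_samples


if __name__ == "__main__":
    s0, mu, sigma, t = 100.0, 0.06, 0.4, 1.0  # illustrative values only
    print("sampled mean :", mean_terminal_value(s0, mu, sigma, t, 100_000))
    print("S(0) e^{mu t}:", s0 * math.exp(mu * t))
```

The two printed numbers should agree up to the sampling error, which shrinks only like one over the square root of the number of samples.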
4 Monte Carlo Methods
This brings us to the simplest numerical method for computing the no-arbitrage value of an option. Suppose
that we assume that the underlying process is
$$ \frac{dS}{S} = r\,dt + \sigma\,dZ \tag{4.1} $$
then we can simulate a path forward in time, starting at some price today $S_0$, using a forward Euler
timestepping method ($S_i = S(t_i)$)
$$ S_{i+1} = S_i + S_i ( r\Delta t + \sigma \phi_i \sqrt{\Delta t} ) \tag{4.2} $$
where $\Delta t$ is the finite timestep, and $\phi_i$ is a random number which is $N(0, 1)$. Note that at each timestep,
we generate a new random number. After $N$ steps, with $T = N\Delta t$, we have a single realized path. Given
the payoff function of the option, the value for this path would be
$$ \text{Value} = \text{Payoff}(S_N) . \tag{4.3} $$
For example, if the option was a European call, then
$$ \text{Value} = \max(S_N - K, 0), \qquad K = \text{Strike Price} \tag{4.4} $$
Suppose we run a series of trials, $m = 1, ..., M$, and denote the payoff after the $m$th trial as payoff($m$).
Then, the no-arbitrage value of the option is
$$ \text{Option Value} = e^{-rT} E(\text{payoff}) \simeq e^{-rT} \frac{1}{M} \sum_{m=1}^{M} \text{payoff}(m) . \tag{4.5} $$
Recall that these paths are not the real paths, but are the risk neutral paths.
Now, we should remember that we are
1. approximating the solution to the SDE by forward Euler, which has $O(\Delta t)$ truncation error.
2. approximating the expectation by the mean of many random paths. This Monte Carlo error is of size
$O(1/\sqrt{M})$, which is slowly converging.
There are thus two sources of error in the Monte Carlo approach: timestepping error and sampling error.
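The procedure in equations (4.2) to (4.5) can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions (a European call payoff, the hypothetical function name euler_mc_call, and Python's built-in generator for the $N(0,1)$ draws), not a tuned implementation.

```python
import math
import random


def euler_mc_call(s0, strike, r, sigma, T, n_steps, n_paths, seed=0):
    """Price a European call by forward Euler timestepping (4.2) and averaging (4.5)."""
    rng = random.Random(seed)
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    payoff_sum = 0.0
    for _ in range(n_paths):
        s = s0
        for _ in range(n_steps):
            phi = rng.gauss(0.0, 1.0)          # new N(0,1) draw at every timestep
            s += s * (r * dt + sigma * phi * sqrt_dt)
        payoff_sum += max(s - strike, 0.0)     # payoff for this risk neutral path
    return math.exp(-r * T) * payoff_sum / n_paths


if __name__ == "__main__":
    # Illustrative parameters only.
    print(euler_mc_call(100.0, 100.0, 0.05, 0.2, 1.0, n_steps=20, n_paths=20_000))
```

Both error sources are visible in the arguments: n_steps controls the timestepping error and n_paths controls the sampling error.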
The slow rate of convergence of Monte Carlo methods makes these techniques unattractive except when
the option is written on several (i.e. more than three) underlying assets. As well, since we are simulating
forward in time, we cannot know at a given point in the forward path if it is optimal to exercise or hold an
American style option. This is easy if we use a PDE method, since we solve the PDE backwards in time, so
we always know the continuation value and hence can act optimally. However, if we have more than three
factors, PDE methods become very expensive computationally. As well, if we want to determine the eﬀects
of discrete hedging, for example, a Monte Carlo approach is very easy to implement.
The error in the Monte Carlo method is then
$$ \text{Error} = O\left( \max\left( \Delta t, \frac{1}{\sqrt{M}} \right) \right), \qquad \Delta t = \text{timestep}, \quad M = \text{number of Monte Carlo paths} \tag{4.6} $$
Now, it doesn't make sense to drive the Monte Carlo error down to zero if there is $O(\Delta t)$ timestepping error.
We should seek to balance the timestepping error and the sampling error. In order to make these two errors
the same order, we should choose $M = O(1/(\Delta t)^2)$. This makes the total error $O(\Delta t)$. We also have that
$$ \text{Complexity} = O\left( \frac{M}{\Delta t} \right) = O\left( \frac{1}{(\Delta t)^3} \right), \qquad \Delta t = O\left( (\text{Complexity})^{-1/3} \right) \tag{4.7} $$
and hence
$$ \text{Error} = O\left( \frac{1}{(\text{Complexity})^{1/3}} \right) . \tag{4.8} $$
In practice, convergence with respect to the timestep is often not checked. People just pick a timestep,
e.g. one day, and increase the number of Monte Carlo samples until they achieve convergence in terms of
sampling error, and ignore the timestep error. Sometimes this gives bad results!
Note that the exact solution to Geometric Brownian motion (2.57) has the property that the asset value
$S$ can never reach $S = 0$ if $S(0) > 0$, in any finite time. However, due to the approximate nature of our
Forward Euler method for solving the SDE, it is possible that a negative or zero $S_i$ can show up. We can
do one of three things in this case
• Cut back the timestep at this point in the simulation so that $S$ is positive.
• Set $S = 0$ and continue. In this case, $S$ remains zero for the rest of this particular simulation.
• Use Ito's Lemma, and determine the SDE for $\log S$, i.e. if $F = \log S$, then, from equation (2.55), we
obtain (with $\mu = r$)
$$ dF = \left( r - \frac{\sigma^2}{2} \right) dt + \sigma\,dZ , \tag{4.9} $$
so that now, if $F < 0$, there is no problem, since $S = e^F$, and if $F < 0$, this just means that $S$ is very
small. We can use this idea for any stochastic process where the variable should not go negative.
Usually, most people set S = 0 and continue. As long as the timestep is not too large, this situation is
probably due to an event of low probability, hence any errors incurred will not aﬀect the expected value very
much. If negative S values show up many times, this is a signal that the timestep is too large.
In the case of simple Geometric Brownian motion, where $r, \sigma$ are constants, then the SDE can be solved
exactly, and we can avoid timestepping errors (see Section 2.6.2). In this case
$$ S(T) = S(0) \exp\left[ \left( r - \frac{\sigma^2}{2} \right) T + \sigma \phi \sqrt{T} \right] \tag{4.10} $$
where $\phi \sim N(0, 1)$. I'll remind you that equation (4.10) is exact. For these simple cases, we should always
use equation (4.10). Unfortunately, this does not work in more realistic situations.
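As a rough illustration of equation (4.10), the sketch below prices a European call by sampling $S(T)$ in a single step, so that only sampling error remains. The function names are hypothetical. For checking purposes we also include the standard closed-form Black-Scholes call value, which is not derived in this section and is brought in here as an assumption.

```python
import math
import random


def exact_mc_call(s0, strike, r, sigma, T, n_paths, seed=0):
    """Monte Carlo call price using the exact solution (4.10): no timestepping error."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * T
    vol = sigma * math.sqrt(T)
    payoff_sum = 0.0
    for _ in range(n_paths):
        s_T = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        payoff_sum += max(s_T - strike, 0.0)
    return math.exp(-r * T) * payoff_sum / n_paths


def black_scholes_call(s0, strike, r, sigma, T):
    """Standard closed-form European call value, used here only as a reference."""
    d1 = (math.log(s0 / strike) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))  # cumulative normal
    return s0 * cdf(d1) - strike * math.exp(-r * T) * cdf(d2)


if __name__ == "__main__":
    args = (100.0, 100.0, 0.05, 0.2, 1.0)  # illustrative parameters only
    print("Monte Carlo :", exact_mc_call(*args, n_paths=100_000))
    print("closed form :", black_scholes_call(*args))
```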
Monte Carlo is popular because
• It is simple to code. Easily handles complex path dependence.
• Easily handles multiple assets.
The disadvantages of Monte Carlo methods are
• It is diﬃcult to apply this idea to problems involving optimal decision making (e.g. American options).
• It is hard to compute the Greeks ($V_S$, $V_{SS}$), which are the hedging parameters, very accurately.
• MC converges slowly.
4.1 Monte Carlo Error Estimators
The sampling error can be estimated via a statistical approach. If the estimated mean of the sample is
$$ \hat{\mu} = \frac{e^{-rT}}{M} \sum_{m=1}^{M} \text{payoff}(m) \tag{4.11} $$
and the standard deviation of the estimate is
$$ \omega = \left[ \frac{1}{M-1} \sum_{m=1}^{M} \left( e^{-rT} \text{payoff}(m) - \hat{\mu} \right)^2 \right]^{1/2} \tag{4.12} $$
then the 95% confidence interval for the actual value $V$ of the option is
$$ \hat{\mu} - \frac{1.96\,\omega}{\sqrt{M}} < V < \hat{\mu} + \frac{1.96\,\omega}{\sqrt{M}} \tag{4.13} $$
Note that in order to reduce this error by a factor of 10, the number of simulations must be increased by
a factor of 100.
The timestep error can be estimated by running the problem with different size timesteps and comparing the
solutions.
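The estimators (4.11) to (4.13) translate directly into code. The following Python sketch assumes the discounted payoffs $e^{-rT}\text{payoff}(m)$ have already been collected in a list; the function name mc_confidence_interval is our own.

```python
import math


def mc_confidence_interval(discounted_payoffs, z=1.96):
    """Return (lower, mean, upper) following equations (4.11)-(4.13)."""
    m = len(discounted_payoffs)
    if m < 2:
        raise ValueError("need at least two samples")
    mean = sum(discounted_payoffs) / m                                  # (4.11)
    omega = math.sqrt(
        sum((x - mean) ** 2 for x in discounted_payoffs) / (m - 1)      # (4.12)
    )
    half_width = z * omega / math.sqrt(m)                               # (4.13)
    return mean - half_width, mean, mean + half_width
```

The default z = 1.96 corresponds to the 95% interval used above; other confidence levels would need a different multiplier.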
4.2 Random Numbers and Monte Carlo
There are many good algorithms for generating random sequences which are uniformly distributed in [0, 1].
See for example, (Numerical Recipes in C++, Press et al., Cambridge University Press, 2002). As pointed
out in this book, often the system supplied random number generators, such as rand in the standard C
library, and the infamous RANDU IBM function, are extremely bad. The Matlab functions appear to be
quite good. For more details, please look at (Park and Miller, ACM Transactions on Mathematical Software,
31 (1988) 1192-1201). Another good generator is described in (Matsumoto and Nishimura, "The Mersenne
Twister: a 623-dimensionally equidistributed uniform pseudorandom number generator," ACM Transactions
on Modeling and Computer Simulation, 8 (1998) 3-30.) Code can be downloaded from the authors' Web
site.
However, we need random numbers which are normally distributed on [−∞, +∞], with mean zero and
variance one (N(0, 1)).
Suppose we have uniformly distributed numbers on [0, 1], i.e. the probability of obtaining a number
between $x$ and $x + dx$ is
$$ p(x)\,dx = \begin{cases} dx & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{4.14} $$
Let's take a function of this random variable $y(x)$. How is $y(x)$ distributed? Let $\hat{p}(y)$ be the probability
distribution of obtaining $y$ in $[y, y + dy]$. Consequently, we must have (recall the law of transformation of
probabilities)
$$ |p(x)\,dx| = |\hat{p}(y)\,dy| $$
or
$$ \hat{p}(y) = p(x) \left| \frac{dx}{dy} \right| . \tag{4.15} $$
Suppose we want $\hat{p}(y)$ to be normal,
$$ \hat{p}(y) = \frac{e^{-y^2/2}}{\sqrt{2\pi}} . \tag{4.16} $$
If we start with a uniform distribution, $p(x) = 1$ on [0, 1], then from equations (4.15-4.16) we obtain
$$ \frac{dx}{dy} = \frac{e^{-y^2/2}}{\sqrt{2\pi}} . \tag{4.17} $$
Now, for $x \in [0, 1]$, we have that the probability of obtaining a number in $[0, x]$ is
$$ \int_0^x dx' = x , \tag{4.18} $$
but from equation (4.17) we have
$$ dx' = \frac{e^{-(y')^2/2}}{\sqrt{2\pi}}\, dy' . \tag{4.19} $$
So, there exists a $y$ such that the probability of getting a $y'$ in $[-\infty, y]$ is equal to the probability of getting
$x'$ in $[0, x]$,
$$ \int_0^x dx' = \int_{-\infty}^{y} \frac{e^{-(y')^2/2}}{\sqrt{2\pi}}\, dy' , \tag{4.20} $$
or
$$ x = \int_{-\infty}^{y} \frac{e^{-(y')^2/2}}{\sqrt{2\pi}}\, dy' . \tag{4.21} $$
So, if we generate uniformly distributed numbers $x$ on [0, 1], then to determine $y$ which are $N(0, 1)$, we do
the following
• Generate $x$
• Find $y$ such that
$$ x = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} e^{-(y')^2/2}\, dy' . \tag{4.22} $$
We can write this last step as
$$ y = F(x) \tag{4.23} $$
where $F(x)$ is the inverse cumulative normal distribution.
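The two steps above can be sketched in Python, where the standard library happens to supply an inverse cumulative normal through statistics.NormalDist.inv_cdf. This is only one possible choice for $F$; the function name normal_by_inversion is ours, and we skip the (extremely unlikely) draw $x = 0$ because the inverse is only defined for $0 < x < 1$.

```python
import random
from statistics import NormalDist


def normal_by_inversion(n, seed=0):
    """Generate n draws y = F(x), with x uniform on (0, 1), as in equation (4.23)."""
    rng = random.Random(seed)
    std_normal = NormalDist(0.0, 1.0)
    samples = []
    while len(samples) < n:
        x = rng.random()            # uniform on [0, 1)
        if x == 0.0:
            continue                # F(0) = -infinity, so discard this draw
        samples.append(std_normal.inv_cdf(x))
    return samples
```

The sample mean and variance of the returned list should be close to zero and one respectively.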
4.3 The Box-Muller Algorithm
Starting from random numbers which are uniformly distributed on [0, 1], there is actually a simpler method
for obtaining random numbers which are normally distributed.
If $p(x)$ is the probability of finding $x \in [x, x + dx]$ and if $y = y(x)$, and $\hat{p}(y)$ is the probability of finding
$y \in [y, y + dy]$, then, from equation (4.15) we have
$$ |p(x)\,dx| = |\hat{p}(y)\,dy| \tag{4.24} $$
or
$$ \hat{p}(y) = p(x) \left| \frac{dx}{dy} \right| . \tag{4.25} $$
Now, suppose we have two original random variables $x_1, x_2$, and let $p(x_1, x_2)$ be the probability of
obtaining $(x_1, x_2)$ in $[x_1, x_1 + dx_1] \times [x_2, x_2 + dx_2]$. Then, if
$$
\begin{aligned}
y_1 &= y_1(x_1, x_2) \\
y_2 &= y_2(x_1, x_2)
\end{aligned}
\tag{4.26}
$$
we have that
$$ \hat{p}(y_1, y_2) = p(x_1, x_2) \left| \frac{\partial(x_1, x_2)}{\partial(y_1, y_2)} \right| \tag{4.27} $$
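where the Jacobian of the transformation is defined as
$$ \frac{\partial(x_1, x_2)}{\partial(y_1, y_2)} = \det \begin{bmatrix} \dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2} \\ \dfrac{\partial x_2}{\partial y_1} & \dfrac{\partial x_2}{\partial y_2} \end{bmatrix} \tag{4.28} $$
Recall that the Jacobian of the transformation can be regarded as the scaling factor which transforms $dx_1\,dx_2$
to $dy_1\,dy_2$, i.e.
$$ dx_1\,dx_2 = \left| \frac{\partial(x_1, x_2)}{\partial(y_1, y_2)} \right| dy_1\,dy_2 . \tag{4.29} $$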
Now, suppose that we have $x_1, x_2$ uniformly distributed on $[0, 1] \times [0, 1]$, i.e.
$$ p(x_1, x_2) = U(x_1)\,U(x_2) \tag{4.30} $$
where
$$ U(x) = \begin{cases} 1 & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases} . \tag{4.31} $$
We denote this distribution as $x_1 \sim U[0, 1]$ and $x_2 \sim U[0, 1]$.
If $p(x_1, x_2)$ is given by equation (4.30), then we have from equation (4.27) that
$$ \hat{p}(y_1, y_2) = \left| \frac{\partial(x_1, x_2)}{\partial(y_1, y_2)} \right| \tag{4.32} $$
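Now, we want to find a transformation $y_1 = y_1(x_1, x_2)$, $y_2 = y_2(x_1, x_2)$ which results in normal distributions
for $y_1, y_2$. Consider
$$
\begin{aligned}
y_1 &= \sqrt{-2 \log x_1}\, \cos 2\pi x_2 \\
y_2 &= \sqrt{-2 \log x_1}\, \sin 2\pi x_2
\end{aligned}
\tag{4.33}
$$
or solving for $(x_1, x_2)$
$$
\begin{aligned}
x_1 &= \exp\left( -\frac{1}{2} (y_1^2 + y_2^2) \right) \\
x_2 &= \frac{1}{2\pi} \tan^{-1}\left[ \frac{y_2}{y_1} \right] .
\end{aligned}
\tag{4.34}
$$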
After some tedious algebra, we can see that (using equation (4.34))
$$ \left| \frac{\partial(x_1, x_2)}{\partial(y_1, y_2)} \right| = \frac{1}{\sqrt{2\pi}} e^{-y_1^2/2}\, \frac{1}{\sqrt{2\pi}} e^{-y_2^2/2} \tag{4.35} $$
Now, assuming that equation (4.30) holds, then from equations (4.32-4.35) we have
$$ \hat{p}(y_1, y_2) = \frac{1}{\sqrt{2\pi}} e^{-y_1^2/2}\, \frac{1}{\sqrt{2\pi}} e^{-y_2^2/2} \tag{4.36} $$
so that $(y_1, y_2)$ are independent, normally distributed random variables, with mean zero and variance one,
or
$$ y_1 \sim N(0, 1) \; ; \; y_2 \sim N(0, 1) . \tag{4.37} $$
This gives the following algorithm for generating normally distributed random numbers (given uniformly
distributed numbers):
Box Muller Algorithm
Repeat
    Generate $u_1 \sim U(0, 1)$, $u_2 \sim U(0, 1)$
    $\theta = 2\pi u_2$, $\rho = \sqrt{-2 \log u_1}$
    $z_1 = \rho \cos\theta$; $z_2 = \rho \sin\theta$
End Repeat
(4.38)
This has the effect that $z_1 \sim N(0, 1)$ and $z_2 \sim N(0, 1)$.
Note that we generate two draws from a normal distribution on each pass through the loop.
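A direct Python transcription of algorithm (4.38) might look as follows. The function names are our own, and we map the uniform draw for $u_1$ onto $(0, 1]$ so that $\log u_1$ is always defined; this small adjustment is an implementation detail not discussed in the text.

```python
import math
import random


def box_muller(rng):
    """One pass through the loop in (4.38): returns two N(0,1) draws."""
    u1 = 1.0 - rng.random()        # in (0, 1], avoids log(0)
    u2 = rng.random()
    theta = 2.0 * math.pi * u2
    rho = math.sqrt(-2.0 * math.log(u1))
    return rho * math.cos(theta), rho * math.sin(theta)


def normal_pairs(n_pairs, seed=0):
    """Generate n_pairs pairs (z1, z2) with a reproducible seed."""
    rng = random.Random(seed)
    return [box_muller(rng) for _ in range(n_pairs)]
```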
4.3.1 An improved Box Muller
The algorithm (4.38) can be expensive due to the trigonometric function evaluations. We can use the
following method to avoid these evaluations. Let
$$
\begin{aligned}
U_1 \sim U[0, 1] \; &; \; U_2 \sim U[0, 1] \\
V_1 = 2U_1 - 1 \; &; \; V_2 = 2U_2 - 1
\end{aligned}
\tag{4.39}
$$
which means that $(V_1, V_2)$ are uniformly distributed in $[-1, 1] \times [-1, 1]$. Now, we carry out the following
procedure
Rejection Method
Repeat
    If ( $V_1^2 + V_2^2 < 1$ )
        Accept
    Else
        Reject
    Endif
End Repeat
(4.40)
which means that if we define $(V_1, V_2)$ as in equation (4.39), and then process the pairs $(V_1, V_2)$ using
algorithm (4.40) we have that $(V_1, V_2)$ are uniformly distributed on the disk centered at the origin, with
radius one, in the $(V_1, V_2)$ plane. This is denoted by
$$ (V_1, V_2) \sim D(0, 1) . \tag{4.41} $$
If $(V_1, V_2) \sim D(0, 1)$ and $R^2 = V_1^2 + V_2^2$, then the probability of finding $R$ in $[R, R + dR]$ is
$$ p(R)\, dR = \frac{2\pi R\, dR}{\pi (1)^2} = 2R\, dR . \tag{4.42} $$
From the fundamental law of transformation of probabilities, we have that
$$ p(R^2)\, d(R^2) = p(R)\, dR = 2R\, dR \tag{4.43} $$
so that
$$ p(R^2) = \frac{2R}{\dfrac{d(R^2)}{dR}} = 1 \tag{4.44} $$
so that $R^2$ is uniformly distributed on [0, 1], ($R^2 \sim U[0, 1]$).
As well, if $\theta = \tan^{-1}(V_2/V_1)$, i.e. $\theta$ is the angle between a line from the origin to the point $(V_1, V_2)$ and
the $V_1$ axis, then $\theta \sim U[0, 2\pi]$. Note that
$$ \cos\theta = \frac{V_1}{\sqrt{V_1^2 + V_2^2}}, \qquad \sin\theta = \frac{V_2}{\sqrt{V_1^2 + V_2^2}} . \tag{4.45} $$
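Now in the original Box Muller algorithm (4.38),
$$
\begin{aligned}
\rho &= \sqrt{-2 \log U_1} \; ; \; U_1 \sim U[0, 1] \\
\theta &= 2\pi U_2 \; ; \; U_2 \sim U[0, 1] ,
\end{aligned}
\tag{4.46}
$$
but $\theta = \tan^{-1}(V_2/V_1) \sim U[0, 2\pi]$, and $R^2 \sim U[0, 1]$. Therefore, if we let $W = R^2$, then we can replace $\theta, \rho$
in algorithm (4.38) by
$$
\begin{aligned}
\theta &= \tan^{-1}\left( \frac{V_2}{V_1} \right) \\
\rho &= \sqrt{-2 \log W} .
\end{aligned}
\tag{4.47}
$$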
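Now, the last step in the Box Muller algorithm (4.38) is
$$
\begin{aligned}
Z_1 &= \rho \cos\theta \\
Z_2 &= \rho \sin\theta ,
\end{aligned}
\tag{4.48}
$$
but since $W = R^2 = V_1^2 + V_2^2$, then $\cos\theta = V_1/R$, $\sin\theta = V_2/R$, so that
$$
\begin{aligned}
Z_1 &= \rho \frac{V_1}{\sqrt{W}} \\
Z_2 &= \rho \frac{V_2}{\sqrt{W}} .
\end{aligned}
\tag{4.49}
$$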
This leads to the following algorithm
Polar form of Box Muller
Repeat
    Generate $U_1 \sim U[0, 1]$, $U_2 \sim U[0, 1]$.
    Let
        $V_1 = 2U_1 - 1$
        $V_2 = 2U_2 - 1$
        $W = V_1^2 + V_2^2$
    If ( $W < 1$ ) then
        $Z_1 = V_1 \sqrt{-2 \log W / W}$
        $Z_2 = V_2 \sqrt{-2 \log W / W}$
    End If
End Repeat
(4.50)
Consequently, $(Z_1, Z_2)$ are independent (uncorrelated), and $Z_1 \sim N(0, 1)$, and $Z_2 \sim N(0, 1)$. Because of the
rejection step (4.40), about $(1 - \pi/4)$ of the random draws in $[-1, +1] \times [-1, +1]$ are rejected (about 21%),
but this method is still generally more efficient than brute force Box Muller.
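Algorithm (4.50) can be sketched in Python as below. The names are hypothetical, and we also reject the single point $W = 0$ so that $\log W / W$ is always defined, which the pseudocode above leaves implicit.

```python
import math
import random


def polar_box_muller(rng):
    """Polar form (4.50): loop until a point falls inside the unit disk."""
    while True:
        v1 = 2.0 * rng.random() - 1.0
        v2 = 2.0 * rng.random() - 1.0
        w = v1 * v1 + v2 * v2
        if 0.0 < w < 1.0:                       # rejection step (4.40)
            factor = math.sqrt(-2.0 * math.log(w) / w)
            return v1 * factor, v2 * factor     # no sin/cos evaluations needed


def polar_normal_pairs(n_pairs, seed=0):
    """Generate n_pairs accepted pairs (Z1, Z2) with a reproducible seed."""
    rng = random.Random(seed)
    return [polar_box_muller(rng) for _ in range(n_pairs)]
```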
4.4 Speeding up Monte Carlo
Monte Carlo methods are slow to converge, since the error is given by
$$ \text{Error} = O\left( \frac{1}{\sqrt{M}} \right) $$
where $M$ is the number of samples. There are many methods which can be used to try to speed up
convergence. These are usually termed Variance Reduction techniques.
Perhaps the simplest idea is the Antithetic Variable method. Suppose we compute a random asset path
$$ S_{i+1} = S_i + S_i \mu \Delta t + S_i \sigma \phi_i \sqrt{\Delta t} $$
where $\phi_i$ are $N(0, 1)$. We store all the $\phi_i$, $i = 1, ...,$ for a given path. Call the estimate for the option price
from this sample path $V^+$. Then compute a second sample path where $(\phi_i)' = -\phi_i$, $i = 1, ...,$. Call this
estimate $V^-$. Then compute the average
$$ \bar{V} = \frac{V^+ + V^-}{2} , $$
and continue sampling in this way. Averaging over all the $\bar{V}$, slightly faster convergence is obtained.
Intuitively, we can see that this symmetrizes the random paths.
Let $X^+$ be the option values obtained from all the $V^+$ simulations, and $X^-$ be the estimates obtained
from all the $V^-$ simulations. Note that $Var(X^+) = Var(X^-)$ (they have the same distribution). Then
$$
\begin{aligned}
Var\left( \frac{X^+ + X^-}{2} \right) &= \frac{1}{4} Var(X^+) + \frac{1}{4} Var(X^-) + \frac{1}{2} Cov(X^+, X^-) \\
&= \frac{1}{2} Var(X^+) + \frac{1}{2} Cov(X^+, X^-)
\end{aligned}
\tag{4.51}
$$
which will be smaller than $Var(X^+)$ if $Cov(X^+, X^-)$ is nonpositive. Warning: this is not always the case.
For example, if the payoff is not a monotonic function of $S$, the results may actually be worse than crude
Monte Carlo. For example, if the payoff is a capped call
$$ \text{payoff} = \min(K_2, \max(S - K_1, 0)), \qquad K_2 > K_1 $$
then the antithetic method performs poorly.
Note that this method can be used to estimate the mean. In the MC error estimator (4.13), compute the
standard deviation of the estimator as $\omega = \sqrt{Var\left( \frac{X^+ + X^-}{2} \right)}$.
However, if we want to estimate the distribution of option prices (i.e. a probability distribution), then
we should not average each $V^+$ and $V^-$, since this changes the variance of the actual distribution.
If we want to know the actual variance of the distribution (and not just the mean), then to compute the
variance of the distribution, we should just use the estimates $V^+$, and compute the estimate of the variance
in the usual way. This should also be used if we want to plot a histogram of the distribution, or compute
the Value at Risk.
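A small sketch of the antithetic idea is given below. To keep it short we make several assumptions of our own: the payoff is a European call, each "path" is a single exact step of the form (4.10) rather than a timestepped path, and the function names are hypothetical. The function returns both the averaged pairs $\bar{V}$ and the plain $V^+$ values, so that the two variances in (4.51) can be compared.

```python
import math
import random


def antithetic_call_samples(s0, strike, r, sigma, T, n_pairs, seed=0):
    """Return (list of pair averages Vbar, list of plain V+ values), all discounted."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * T
    vol = sigma * math.sqrt(T)
    disc = math.exp(-r * T)
    pair_averages, plus_values = [], []
    for _ in range(n_pairs):
        phi = rng.gauss(0.0, 1.0)
        v_plus = disc * max(s0 * math.exp(drift + vol * phi) - strike, 0.0)
        v_minus = disc * max(s0 * math.exp(drift - vol * phi) - strike, 0.0)  # phi' = -phi
        pair_averages.append(0.5 * (v_plus + v_minus))
        plus_values.append(v_plus)
    return pair_averages, plus_values


def sample_variance(values):
    """Unbiased sample variance, as in (4.52)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / (len(values) - 1)
```

For a payoff that is monotonic in $S$, such as this call, the variance of the pair averages should come out below half the variance of the $V^+$ values, consistent with a negative covariance in (4.51).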
4.5 Estimating the mean and variance
An estimate of the mean $\bar{x}$ and variance $s_M^2$ of $M$ numbers $x_1, x_2, ..., x_M$ is
$$
\begin{aligned}
s_M^2 &= \frac{1}{M-1} \sum_{i=1}^{M} (x_i - \bar{x})^2 \\
\bar{x} &= \frac{1}{M} \sum_{i=1}^{M} x_i
\end{aligned}
\tag{4.52}
$$
Alternatively, one can use
$$ s_M^2 = \frac{1}{M-1} \left[ \sum_{i=1}^{M} x_i^2 - \frac{1}{M} \left( \sum_{i=1}^{M} x_i \right)^2 \right] \tag{4.53} $$
which has the advantage that the estimate of the mean and standard deviation can be computed in one loop.
In order to avoid roundoff, the following method is suggested by Seydel (R. Seydel, Tools for Computational
Finance, Springer, 2002). Set
$$ \alpha_1 = x_1 \; ; \; \beta_1 = 0 \tag{4.54} $$
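then compute recursively
$$
\begin{aligned}
\alpha_i &= \alpha_{i-1} + \frac{x_i - \alpha_{i-1}}{i} \\
\beta_i &= \beta_{i-1} + \frac{(i-1)(x_i - \alpha_{i-1})^2}{i}
\end{aligned}
\tag{4.55}
$$
so that
$$
\begin{aligned}
\bar{x} &= \alpha_M \\
s_M^2 &= \frac{\beta_M}{M-1}
\end{aligned}
\tag{4.56}
$$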
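The recursion (4.54) to (4.56) can be written as a single loop. The Python sketch below uses a function name of our own choosing and requires at least two values so that the divisor $M - 1$ is nonzero.

```python
def running_mean_variance(values):
    """One-pass mean and variance using the recursion (4.54)-(4.56)."""
    alpha = None   # running mean, alpha_i
    beta = 0.0     # running sum of squared deviations, beta_i
    count = 0
    for i, x in enumerate(values, start=1):
        count = i
        if i == 1:
            alpha = x                                # (4.54)
        else:
            delta = x - alpha                        # x_i - alpha_{i-1}
            beta += (i - 1) * delta * delta / i      # (4.55), uses the old alpha
            alpha += delta / i
    if count < 2:
        raise ValueError("need at least two values")
    return alpha, beta / (count - 1)                 # (4.56)
```

Because only differences from the running mean are squared, this form avoids the cancellation that equation (4.53) can suffer when the $x_i$ are large compared with their spread.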
4.6 Low Discrepancy Sequences
In an effort to get around the $1/\sqrt{M}$ ($M$ = number of samples) behaviour of Monte Carlo methods, quasi-Monte
Carlo methods have been devised.
These techniques use a deterministic sequence of numbers (low discrepancy sequences). The idea here
is that a Monte Carlo method does not fill the sample space very evenly (after all, it's random). A low
discrepancy sequence tends to sample the space in an orderly fashion. If $d$ is the dimension of the space, then
the worst case error bound for an LDS method is
$$ \text{Error} = O\left( \frac{(\log M)^d}{M} \right) \tag{4.57} $$
where $M$ is the number of samples used. Clearly, if $d$ is small, then this error bound is (at least asymptotically)
better than Monte Carlo.
LDS methods generate numbers on [0, 1]. We cannot use the Box-Muller method in this case to produce
normally distributed numbers, since these numbers are deterministic. We have to invert the cumulative
normal distribution in order to get the numbers distributed with mean zero and standard deviation one on
$[-\infty, +\infty]$. So, if $F(x)$ is the inverse cumulative normal distribution, then
$$
\begin{aligned}
x_{LDS} &= \text{uniformly distributed on } [0, 1] \\
y_{LDS} &= F(x_{LDS}) \text{ is } N(0, 1) .
\end{aligned}
\tag{4.58}
$$
Another problem has to do with the fact that if we are stepping through time, i.e.
$$ S_{n+1} = S_n + S_n ( r\Delta t + \phi \sigma \sqrt{\Delta t} ), \qquad \phi = N(0, 1) \tag{4.59} $$
with, say, $N$ steps in total, then we need to think of this as a problem in $N$ dimensional space. In other
words, the $k$th timestep is sampled from the $k$th coordinate in this $N$ dimensional space. We are trying
to uniformly sample from this $N$ dimensional space.
Let $\hat{x}$ be a vector of LDS numbers on [0, 1], in $N$ dimensional space
$$ \hat{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix} . \tag{4.60} $$
So, an LDS algorithm would proceed as follows, for the $j$th trial
• Generate $\hat{x}^j$ (the $j$th LDS number in an $N$ dimensional space).
• Generate the normally distributed vector $\hat{y}^j$ by inverting the cumulative normal distribution for each
component
$$ \hat{y}^j = \begin{bmatrix} F(x_1^j) \\ F(x_2^j) \\ \vdots \\ F(x_N^j) \end{bmatrix} \tag{4.61} $$
• Generate a complete sample path $k = 0, ..., N-1$
$$ S_j^{k+1} = S_j^k + S_j^k ( r\Delta t + \hat{y}_{k+1}^j \sigma \sqrt{\Delta t} ) \tag{4.62} $$
• Compute the payoff at $S = S_j^N$
The option value is the average of these trials.
There are a variety of LDS numbers: Halton, Sobol, Niederreiter, etc. Our tests seem to indicate that
Sobol is the best.
Note that the worst case error bound for the error is given by equation (4.57). If we use a reasonable
number of timesteps, say 50-100, then, $d$ = 50-100, which gives a very bad error bound. For $d$ large, the
numerator in equation (4.57) dominates. The denominator only dominates when
$$ M \simeq e^d \tag{4.63} $$
which is a very large number of trials for $d \simeq 100$. Fortunately, at least for path-dependent options, we have
found that things are not quite this bad, and LDS seems to work if the number of timesteps is less than
100-200. However, once the dimensionality gets above a few hundred, convergence seems to slow down.
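To make the idea concrete, here is a deliberately small one dimensional sketch ($d = 1$). It is an illustration under our own assumptions, not the method used in our tests: the low discrepancy points come from the base 2 van der Corput sequence (the first coordinate of a Halton sequence, chosen because it fits in a few lines, whereas Sobol would need a library), the terminal price is sampled in one exact step as in (4.10), and $F$ is taken from Python's statistics.NormalDist.inv_cdf. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def van_der_corput(n, base=2):
    """n-th point (n >= 1) of the van der Corput sequence in the given base."""
    q, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)
        denom *= base
        q += digit / denom       # reflect the digits of n about the radix point
    return q


def qmc_call(s0, strike, r, sigma, T, n_points):
    """European call value using deterministic points x_LDS and y_LDS = F(x_LDS), (4.58)."""
    inverse_normal = NormalDist(0.0, 1.0).inv_cdf
    drift = (r - 0.5 * sigma ** 2) * T
    vol = sigma * math.sqrt(T)
    payoff_sum = 0.0
    for j in range(1, n_points + 1):             # start at 1: the 0-th point is x = 0
        y = inverse_normal(van_der_corput(j))
        payoff_sum += max(s0 * math.exp(drift + vol * y) - strike, 0.0)
    return math.exp(-r * T) * payoff_sum / n_points
```

For a path with $N$ timesteps, each trial would instead need an $N$ dimensional low discrepancy point, one coordinate per timestep, as in (4.60) to (4.62).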
4.7 Correlated Random Numbers
In many cases involving multiple assets, we would like to generate correlated, normally distributed random
numbers. Suppose we have $i = 1, ..., d$ assets, and each asset follows the simulated path
$$ S_i^{n+1} = S_i^n + S_i^n ( r\Delta t + \phi_i^n \sigma_i \sqrt{\Delta t} ) \tag{4.64} $$
where $\phi_i^n$ is $N(0, 1)$ and
$$ E(\phi_i^n \phi_j^n) = \rho_{ij} \tag{4.65} $$
where $\rho_{ij}$ is the correlation between asset $i$ and asset $j$.
Now, it is easy to generate a set of $d$ uncorrelated $N(0, 1)$ variables. Call these $\epsilon_1, ..., \epsilon_d$. So, how do we
produce correlated numbers? Let
$$ [\Psi]_{ij} = \rho_{ij} \tag{4.66} $$
be the matrix of correlation coefficients. Assume that this matrix is SPD (if not, one of the random variables
is a linear combination of the others, hence this is a degenerate case). Assuming $\Psi$ is SPD, we can Cholesky
factor $\Psi = LL^t$, so that
$$ \rho_{ij} = \sum_k L_{ik} L^t_{kj} \tag{4.67} $$
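Let $\bar{\phi}$ be the vector of correlated normally distributed random numbers (i.e. what we want to get), and let
$\bar{\epsilon}$ be the vector of uncorrelated $N(0, 1)$ numbers (i.e. what we are given).
$$ \bar{\phi} = \begin{bmatrix} \phi_1 \\ \phi_2 \\ \vdots \\ \phi_d \end{bmatrix} \; ; \; \bar{\epsilon} = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_d \end{bmatrix} \tag{4.68} $$
So, given $\bar{\epsilon}$, we have
$$ E(\epsilon_i \epsilon_j) = \delta_{ij} $$
where
$$ \delta_{ij} = \begin{cases} 0 & \text{if } i \ne j \\ 1 & \text{if } i = j \end{cases} $$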
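since the $\epsilon_i$ are uncorrelated. Now, let
$$ \phi_i = \sum_j L_{ij} \epsilon_j \tag{4.69} $$
which gives
$$ \phi_i \phi_k = \sum_j \sum_l L_{ij} L_{kl}\, \epsilon_l \epsilon_j = \sum_j \sum_l L_{ij}\, \epsilon_l \epsilon_j\, L^t_{lk} . \tag{4.70} $$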
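Now,
$$
\begin{aligned}
E(\phi_i \phi_k) &= E\left[ \sum_j \sum_l L_{ij}\, \epsilon_l \epsilon_j\, L^t_{lk} \right] \\
&= \sum_j \sum_l L_{ij}\, E(\epsilon_l \epsilon_j)\, L^t_{lk} \\
&= \sum_j \sum_l L_{ij}\, \delta_{lj}\, L^t_{lk} \\
&= \sum_l L_{il} L^t_{lk} \\
&= \rho_{ik}
\end{aligned}
\tag{4.71}
$$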
So, in order to generate correlated $N(0, 1)$ numbers:
• Factor the correlation matrix $\Psi = LL^t$
• Generate uncorrelated $N(0, 1)$ numbers $\epsilon_i$
• Correlated numbers $\phi_i$ are given from
$$ \bar{\phi} = L \bar{\epsilon} $$
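The three steps above can be sketched in plain Python. This is an illustrative version only: the Cholesky routine is a textbook lower triangular factorization written out by hand (in practice a linear algebra library would be used), it assumes the matrix really is SPD, and the function names are our own.

```python
import math
import random


def cholesky(psi):
    """Lower triangular L with psi = L L^t; psi must be symmetric positive definite."""
    d = len(psi)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            partial = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(psi[i][i] - partial)
            else:
                L[i][j] = (psi[i][j] - partial) / L[j][j]
    return L


def correlated_normals(L, rng):
    """phi = L eps, equation (4.69), with eps a vector of uncorrelated N(0,1) draws."""
    d = len(L)
    eps = [rng.gauss(0.0, 1.0) for _ in range(d)]
    return [sum(L[i][j] * eps[j] for j in range(i + 1)) for i in range(d)]


def sample_product_mean(L, i, k, n_samples, seed=0):
    """Sample estimate of E(phi_i phi_k), which should approach rho_ik as in (4.71)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        phi = correlated_normals(L, rng)
        total += phi[i] * phi[k]
    return total / n_samples
```

For two assets with correlation 0.6, for example, the factor is $L = [[1, 0], [0.6, 0.8]]$ and the sampled product mean should settle near 0.6.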
4.8 Integration of Stochastic Diﬀerential Equations
Up to now, we have been fairly slack about defining what we mean by convergence when we use forward
Euler timestepping (4.2) to integrate
$$ dS = \mu S\, dt + \sigma S\, dZ . \tag{4.72} $$
The forward Euler algorithm is simply
$$ S_{i+1} = S_i + S_i ( \mu h + \sigma \phi_i \sqrt{h} ) \tag{4.73} $$
where $h = \Delta t$ is the finite timestep. For a good overview of these methods, check out ("An algorithmic
introduction to numerical simulation of stochastic differential equations," by D. Higham, SIAM Review vol.
43 (2001) 525-546). This article also has some good tips on solving SDEs using Matlab, in particular, taking
full advantage of the vectorization of Matlab. Note that eliminating as many for loops as possible (i.e.
computing all the MC realizations for each timestep in a vector) can reduce computation time by orders of
magnitude.
Before we start defining what we mean by convergence, let's consider the following situation. Recall that
$$ dZ = \phi \sqrt{dt} \tag{4.74} $$
where $\phi$ is a random draw from a normal distribution with mean zero and variance one. Let's imagine
generating a set of $Z$ values at discrete times $t_i$, e.g. $Z(t_i) = Z_i$, by
$$ Z_{i+1} = Z_i + \phi \sqrt{\Delta t} . \tag{4.75} $$
Now, these are all legitimate points along a Brownian motion path, since there is no timestepping error here,
in view of equation (2.53). So, this set of values $\{Z_0, Z_1, ...\}$ are valid points along a Brownian path. Now,
recall that the exact solution (for a given Brownian path) of equation (4.72) is given by equation (2.57)
$$ S(T) = S(0) \exp\left[ \left( \mu - \frac{\sigma^2}{2} \right) T + \sigma (Z(T) - Z(0)) \right] \tag{4.76} $$
where $T$ is the stopping time of the simulation.
Now if we integrate equation (4.72) using forward Euler, with the discrete timesteps $\Delta t = t_{i+1} - t_i$, using
the realization of the Brownian path $\{Z_0, Z_1, ...\}$, we will not get the exact solution (4.76). This is because
even though the Brownian path points are exact, time discretization errors are introduced in equation (4.73).
So, how can we systematically study convergence of algorithm (4.73)? We can simply take smaller timesteps.
However, we want to do this by filling in new $Z$ values in the Brownian path, while keeping the old values
(since these are perfectly legitimate values). Let $S(T)^h$ represent the forward Euler solution (4.73) for a
fixed timestep $h$. Let $S(T)$ be the exact solution (4.76). As $h \to 0$, we would expect $S(T)^h \to S(T)$, for a
given path.
4.8.1 The Brownian Bridge
So, given a set of valid $Z_k$, how do we refine this path, keeping the existing points along this path? In
particular, suppose we have two points $Z_i, Z_k$, at $(t_i, t_k)$, and we would like to generate a point $Z_j$ at $t_j$,
with $t_i < t_j < t_k$. How should we pick $Z_j$? What density function should we use when generating $Z_j$, given
that $Z_k$ is known?
Let $x, y$ be two draws from a normal distribution with mean zero and variance one. Suppose we have the
point $Z(t_i) = Z_i$ and we generate $Z(t_j) = Z_j$, $Z(t_k) = Z_k$ along the Wiener path,
$$ Z_j = Z_i + x \sqrt{t_j - t_i} \tag{4.77} $$
$$ Z_k = Z_j + y \sqrt{t_k - t_j} \tag{4.78} $$
$$ Z_k = Z_i + x \sqrt{t_j - t_i} + y \sqrt{t_k - t_j} . \tag{4.79} $$
So, given $(x, y)$, and $Z_i$, we can generate $Z_j, Z_k$. Suppose on the other hand, we have $Z_i$, and we generate
$Z_k$ directly using
$$ Z_k = Z_i + z \sqrt{t_k - t_i} , \tag{4.80} $$
where $z$ is $N(0, 1)$. Then how do we generate $Z_j$ using equation (4.77)? Since we are now specifying that
we know $Z_k$, this means that our method for generating $Z_j$ is constrained. For example, given $z$, we must
have that, from equations (4.79) and (4.80)
$$ y = \frac{z \sqrt{t_k - t_i} - x \sqrt{t_j - t_i}}{\sqrt{t_k - t_j}} . \tag{4.81} $$
Now the probability density of drawing the pair $(x, y)$ given $z$, denoted by $p(x, y|z)$ is
$$ p(x, y|z) = \frac{p(x)\,p(y)}{p(z)} \tag{4.82} $$
where $p(..)$ is a standard normal distribution, and we have used the fact that successive increments of a
Brownian process are uncorrelated.
From equation (4.81), we can write $y = y(x, z)$, so that $p(x, y|z) = p(x, y(x, z)|z)$
$$ p(x, y(x, z)|z) = \frac{p(x)\,p(y(x, z))}{p(z)} = \frac{1}{\sqrt{2\pi}} \exp\left[ -\frac{1}{2} (x^2 + y^2 - z^2) \right] \tag{4.83} $$
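or (after some algebra, using equation (4.81))
$$
\begin{aligned}
p(x|z) &= \frac{1}{\sqrt{2\pi}} \exp\left[ -\frac{1}{2} (x - \alpha z)^2 / \beta^2 \right] \\
\alpha &= \sqrt{\frac{t_j - t_i}{t_k - t_i}} \\
\beta &= \sqrt{\frac{t_k - t_j}{t_k - t_i}}
\end{aligned}
\tag{4.84}
$$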
so that $x$ is normally distributed with mean $\alpha z$ and variance $\beta^2$. Since
$$ z = \frac{Z_k - Z_i}{\sqrt{t_k - t_i}} \tag{4.85} $$
we have that $x$ has mean
$$ E(x) = \frac{\sqrt{t_j - t_i}}{t_k - t_i} (Z_k - Z_i) \tag{4.86} $$
and variance
$$ E[(x - E(x))^2] = \frac{t_k - t_j}{t_k - t_i} \tag{4.87} $$
Now, let
$$ x = \frac{\sqrt{t_j - t_i}}{t_k - t_i} (Z_k - Z_i) + \phi \sqrt{\frac{t_k - t_j}{t_k - t_i}} \tag{4.88} $$
where $\phi$ is $N(0, 1)$. Clearly, $x$ satisfies equations (4.86) and (4.87). Substituting equation (4.88) into (4.77)
gives
$$ Z_j = \left( \frac{t_k - t_j}{t_k - t_i} \right) Z_i + \left( \frac{t_j - t_i}{t_k - t_i} \right) Z_k + \phi \sqrt{\frac{(t_j - t_i)(t_k - t_j)}{(t_k - t_i)}} \tag{4.89} $$
where $\phi$ is $N(0, 1)$. Equation (4.89) is known as the Brownian Bridge.
Figure 4.1 shows different Brownian paths constructed for different timestep sizes. An initial coarse path
is constructed, then the fine timestep path is constructed from the coarse path using a Brownian Bridge. By
construction, the fine timestep path will pass through the coarse timestep nodes.
Figure 4.2 shows the asset paths integrated using the forward Euler algorithm (4.73) fed with the Brownian
paths in Figure 4.1. In this case, note that the fine timestep path does not coincide with the coarse
timestep nodes, due to the timestepping error.
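The refinement step can be sketched as follows. This Python fragment is a minimal illustration with hypothetical function names: bridge_point evaluates equation (4.89) for a given draw $\phi$, and refine_path inserts one new point at the midpoint of every interval while leaving the existing points untouched. Midpoint insertion is our choice for the example; equation (4.89) holds for any $t_i < t_j < t_k$.

```python
import math
import random


def bridge_point(t_i, z_i, t_k, z_k, t_j, phi):
    """Brownian Bridge (4.89): Z_j at time t_j, given Z_i, Z_k and an N(0,1) draw phi."""
    span = t_k - t_i
    mean = ((t_k - t_j) / span) * z_i + ((t_j - t_i) / span) * z_k
    std = math.sqrt((t_j - t_i) * (t_k - t_j) / span)
    return mean + phi * std


def refine_path(times, z_values, seed=0):
    """Insert a bridge point at the midpoint of each interval, keeping the old points."""
    rng = random.Random(seed)
    new_times, new_z = [times[0]], [z_values[0]]
    for t_i, z_i, t_k, z_k in zip(times, z_values, times[1:], z_values[1:]):
        t_j = 0.5 * (t_i + t_k)
        new_times.append(t_j)
        new_z.append(bridge_point(t_i, z_i, t_k, z_k, t_j, rng.gauss(0.0, 1.0)))
        new_times.append(t_k)
        new_z.append(z_k)           # the coarse node is kept unchanged
    return new_times, new_z
```

Calling refine_path repeatedly halves the timestep each time, which is one way to build the sequence of nested paths described above.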
4.8.2 Strong and Weak Convergence
Since we are dealing with a probabilistic situation here, it is not obvious how to define convergence. Given
a number of points along a Brownian path, we could imagine refining this path (using a Brownian Bridge),
and then seeing if the solution converged to the exact solution. For the model SDE (4.72), we could ask that
$$ E\left[ \left| S(T) - S^h(T) \right| \right] \le \text{Const.}\, h^\gamma \tag{4.90} $$
[Figure 4.1: two panels, each plotting Brownian Path against Time for $0 \le t \le 1$.]
Figure 4.1: Effect of adding more points to a Brownian path using a Brownian bridge. Note that the small
timestep points match the coarse timestep points. Left: each coarse timestep is divided into 16 substeps.
Right: each coarse timestep is divided into 64 substeps.
[Figure 4.2: two panels, each plotting Asset Price against Time for $0 \le t \le 1$.]
Figure 4.2: Brownian paths shown in Figure 4.1 used to determine asset price paths using forward Euler
timestepping (4.73). In this case, note that the asset paths for fine and coarse timestepping do not agree at
the final time (due to the timestepping error). Eventually, for small enough timesteps, the final asset value
will converge to the exact solution to the SDE. Left: each coarse timestep is divided into 16 substeps. Right:
each coarse timestep is divided into 64 substeps.
    $T$        0.25
    $\sigma$   0.4
    $\mu$      0.06
    $S_0$      100
Table 4.1: Data used in the convergence tests.
where the expectation in equation (4.90) is over many Brownian paths, and $h$ is the timestep size. Note
that $S(T)$ is the exact solution along a particular Brownian path; the same path used to compute $S^h(T)$.
Criterion (4.90) is called strong convergence. A less strict criterion is
$$ \left| E[S(T)] - E[S^h(T)] \right| \le \text{Const.}\, h^\gamma \tag{4.91} $$
It can be shown that using forward Euler results in weak convergence with $\gamma = 1$, and strong convergence
with $\gamma = .5$.
Table 4.1 shows some test data used to integrate the SDE (4.72) using method (4.73). A series of Brownian paths was constructed, beginning with a coarse timestep path. These paths were systematically refined using the Brownian Bridge construction. Table 4.2 shows results where the strong and weak convergence errors are estimated as
$$\text{Strong Error} = \frac{1}{N} \sum_{i=1}^{N} \left| S(T)_i - S^h(T)_i \right| \tag{4.92}$$
$$\text{Weak Error} = \left| \frac{1}{N} \sum_{i=1}^{N} S(T)_i - \frac{1}{N} \sum_{i=1}^{N} S^h(T)_i \right| , \tag{4.93}$$
where $S^h(T)_i$ is the solution obtained by forward Euler timestepping along the $i$th Brownian path, $S(T)_i$ is the exact solution along this same path, and $N$ is the number of samples. Note that for equation (4.72), we have the exact solution
$$\lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} S(T)_i = S_0 e^{\mu T} \tag{4.94}$$
but we do not replace the sampled mean of the exact solution in equation (4.93) by the theoretical limit (4.94). If we used enough Monte Carlo samples, we could replace the sampled mean
$$\frac{1}{N} \sum_{i=1}^{N} S(T)_i$$
by $S_0 e^{\mu T}$, but for normal parameters, the Monte Carlo sampling error is much larger than the timestepping error, so we would have to use an enormous number of Monte Carlo samples. Estimating the weak error using equation (4.93) measures the timestepping error, as opposed to the Monte Carlo sampling error. However, for normal parameters, even using equation (4.93) requires a large number of Monte Carlo samples in order to ensure that the error is dominated by the timestepping error.
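In Table 4.2, we can see that each time the number of timesteps is doubled, the error is reduced by a factor of about $\sqrt{2}$ for the strong error, and by a factor of about two for the weak error (at the finer timesteps). This is consistent with a convergence rate of $\gamma = 0.5$ for strong convergence, and $\gamma = 1.0$ for weak convergence.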
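The error estimates (4.92) and (4.93) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes that the model SDE (4.72) is geometric Brownian motion, dS = µS dt + σS dZ, and that (4.73) is the usual forward Euler step; the function name `convergence_errors` is hypothetical. Instead of refining a coarse path with a Brownian bridge, the sketch samples each path on the finest grid and sums increments to obtain the coarser grids, which gives the same joint distribution of nodes.

```python
import math
import random


def convergence_errors(steps_list, n_paths=1000, T=0.25, sigma=0.4,
                       mu=0.06, S0=100.0, seed=0):
    """Estimate the strong (4.92) and weak (4.93) errors of forward Euler.

    Assumes the SDE is dS = mu*S*dt + sigma*S*dZ (an assumption about (4.72)).
    Every entry of steps_list must divide max(steps_list).
    """
    rng = random.Random(seed)
    n_fine = max(steps_list)
    sd_fine = math.sqrt(T / n_fine)
    strong = {n: 0.0 for n in steps_list}
    mean_euler = {n: 0.0 for n in steps_list}
    mean_exact = 0.0
    for _ in range(n_paths):
        # One Brownian path, sampled on the finest grid.
        dZ = [rng.gauss(0.0, sd_fine) for _ in range(n_fine)]
        # Exact solution of geometric Brownian motion along this path.
        exact = S0 * math.exp((mu - 0.5 * sigma ** 2) * T + sigma * sum(dZ))
        mean_exact += exact / n_paths
        for n in steps_list:
            k = n_fine // n            # fine increments per coarse step
            dt = T / n
            S = S0
            for j in range(n):
                dz = sum(dZ[j * k:(j + 1) * k])    # same path, coarser grid
                S += mu * S * dt + sigma * S * dz  # forward Euler step
            strong[n] += abs(exact - S) / n_paths
            mean_euler[n] += S / n_paths
    # Weak error uses the sampled mean of the exact solution, as in (4.93).
    weak = {n: abs(mean_exact - mean_euler[n]) for n in steps_list}
    return strong, weak


if __name__ == "__main__":
    strong, weak = convergence_errors([72, 144, 288, 576])
    for n in sorted(strong):
        print(n, round(strong[n], 4), round(weak[n], 5))
```

With only a thousand paths the weak error estimate is noisy, which is the point made above about needing a large number of samples; the strong error estimate is much more stable.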
Timesteps   Strong Error (4.90)   Weak Error (4.91)
72          0.0269                0.00194
144         0.0190                0.00174
288         0.0135                0.00093
576         0.0095                0.00047
Table 4.2: Convergence results, 100,000 samples used. Data in Table 4.1.

5 The Binomial Model
We have seen that a problem with the Monte Carlo method is that it is difficult to use for valuing American style options. Recall that the holder of an American option can exercise the option at any time and receive the payoff. In order to determine whether or not it is worthwhile to hold the option, we have to compare the value of continuing to hold the option (the continuation value) with the payoff. If the continuation value is greater than the payoff, then we hold; otherwise, we exercise.
At any point in time, the continuation value depends on what happens in the future. Clearly, if we
simulate forward in time, as in the Monte Carlo approach, we don’t know what happens in the future, and
hence we don’t know how to act optimally. This is actually a dynamic programming problem. These sorts
of problems are usually solved by proceeding from the end point backwards. We use the same idea here. We
have to start from the terminal time and work backwards.
Recall that we can determine the no-arbitrage value of an option by pretending we live in a risk-neutral world, where risky assets drift at $r$ and are discounted at $r$. If we let $X = \log S$, then the risk neutral process for $X$ is (from equation (2.55))
$$dX = \left( r - \frac{\sigma^2}{2} \right) dt + \sigma\, dZ . \tag{5.1}$$
Now, we can construct a discrete approximation to this random walk using the lattice discussed in Section 2.5. In fact, all we have to do is let $\alpha = r - \frac{\sigma^2}{2}$, so that equation (5.1) is formally identical to equation (2.4). In order to ensure that in the limit as $\Delta t \to 0$, we get the process (5.1), we require that the sizes of the random jumps are $\Delta X = \sigma\sqrt{\Delta t}$ and that the probabilities of up ($p$) and down ($q$) moves are
$$p^r = \frac{1}{2}\left[ 1 + \frac{\alpha}{\sigma}\sqrt{\Delta t} \right] = \frac{1}{2}\left[ 1 + \left( \frac{r}{\sigma} - \frac{\sigma}{2} \right)\sqrt{\Delta t} \right]$$
$$q^r = \frac{1}{2}\left[ 1 - \frac{\alpha}{\sigma}\sqrt{\Delta t} \right] = \frac{1}{2}\left[ 1 - \left( \frac{r}{\sigma} - \frac{\sigma}{2} \right)\sqrt{\Delta t} \right] , \tag{5.2}$$
where we have denoted the risk neutral probabilities by $p^r$ and $q^r$ to distinguish them from the real probabilities $p, q$.
Now, we will switch to a more common notation. If we are at node $j$, timestep $n$, we will denote this node location by $X^n_j$. Recall that $X = \log S$, so that in terms of asset price, this is $S^n_j = e^{X^n_j}$.
Now, consider that at node $(j, n)$, the asset can move up with probability $p^r$ and down with probability $q^r$. In other words
$$S^n_j \to S^{n+1}_{j+1} ; \quad \text{with probability } p^r$$
$$S^n_j \to S^{n+1}_{j} ; \quad \text{with probability } q^r \tag{5.3}$$
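Now, in Section 2.5 we showed that $\Delta X = \sigma\sqrt{\Delta t}$, so that (since $S = e^X$)
$$S^{n+1}_{j+1} = S^n_j e^{\sigma\sqrt{\Delta t}}$$
$$S^{n+1}_{j} = S^n_j e^{-\sigma\sqrt{\Delta t}} \tag{5.4}$$
or
$$S^n_j = S^0_0 e^{(2j-n)\sigma\sqrt{\Delta t}} ; \quad j = 0, \ldots, n \tag{5.5}$$
[Figure 5.1: Lattice of stock price values. The root node $S^0_0$ branches to $S^1_1$ and $S^1_0$, which in turn branch to $S^2_2$, $S^2_1$ and $S^2_0$.]
So, the first step in the process is to construct a tree of stock price values, as shown in Figure 5.1.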
Associated with each stock price on the lattice is the option value $V^n_j$. We first set the value of the option at $T = N\Delta t$ to the payoff. For example, if we are valuing a put option, then
$$V^N_j = \max(K - S^N_j, 0) ; \quad j = 0, \ldots, N \tag{5.6}$$
Then, we can use the risk neutral world idea to determine the no-arbitrage value of the option (it is the expected value in the risk neutral world). We can do this by working backward through the lattice. The value today is the discounted expected future value

European Lattice Algorithm
$$V^n_j = e^{-r\Delta t}\left( p^r V^{n+1}_{j+1} + q^r V^{n+1}_{j} \right) ; \quad n = N-1, \ldots, 0 ; \quad j = 0, \ldots, n \tag{5.7}$$
Rolling back through the tree, we obtain the value at $S^0_0$ today, which is $V^0_0$.
If the option is an American put, we can determine if it is optimal to hold or exercise, since we know the continuation value. In this case the rollback (5.7) becomes

American Lattice Algorithm
$$(V^n_j)^c = e^{-r\Delta t}\left( p^r V^{n+1}_{j+1} + q^r V^{n+1}_{j} \right)$$
$$V^n_j = \max\left( (V^n_j)^c , \; \max(K - S^n_j, 0) \right) ; \quad n = N-1, \ldots, 0 ; \quad j = 0, \ldots, n \tag{5.8}$$
which is illustrated in Figure 5.2.
[Figure 5.2: Backward recursion step. The value $V^n_j$ is obtained from $V^{n+1}_{j+1}$ (weight $p$) and $V^{n+1}_{j}$ (weight $q$).]
The binomial lattice method has the following advantages
• It is very easy to code for simple cases.
• It is easy to explain to managers.
• American options are easy to handle.
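As an illustration of the first point, the rollbacks (5.7) and (5.8) for a put can be sketched as follows. This is a minimal sketch, not a production implementation: the function name `lattice_put` and its argument names are hypothetical, the probabilities are the $p^r, q^r$ of equation (5.2), and the lattice nodes are those of equation (5.5).

```python
import math


def lattice_put(S0, K, r, sigma, T, N, american=False):
    """Value a put on the binomial lattice using p^r, q^r from (5.2)."""
    dt = T / N
    dx = sigma * math.sqrt(dt)                 # jump size in X = log S
    alpha = r - 0.5 * sigma ** 2
    p = 0.5 * (1.0 + (alpha / sigma) * math.sqrt(dt))
    q = 1.0 - p
    disc = math.exp(-r * dt)
    # Payoff at the terminal time, equation (5.6); node prices from (5.5).
    V = [max(K - S0 * math.exp((2 * j - N) * dx), 0.0) for j in range(N + 1)]
    # Roll back through the lattice, n = N-1, ..., 0.
    for n in range(N - 1, -1, -1):
        for j in range(n + 1):
            value = disc * (p * V[j + 1] + q * V[j])   # continuation value
            if american:
                S = S0 * math.exp((2 * j - n) * dx)
                value = max(value, max(K - S, 0.0))    # hold or exercise
            V[j] = value   # safe in place: V[j+1] is still the level n+1 value
    return V[0]


if __name__ == "__main__":
    args = (100.0, 100.0, 0.06, 0.3, 0.25, 400)
    print("European put:", round(lattice_put(*args), 3))
    print("American put:", round(lattice_put(*args, american=True), 3))
```

The values are stored in a single list that is overwritten in place, so the memory cost is O(N) even though the lattice has O(N^2) nodes.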
However, the binomial lattice method has the following disadvantages
• Except for simple cases, coding becomes complex. For example, if we want to handle simple barrier options, things become nightmarish.
• This method is algebraically identical to an explicit finite difference solution of the Black-Scholes equation. Consequently, convergence is at an $O(\Delta t)$ rate.
• The probabilities $p^r, q^r$ are not real probabilities, they are simply the coefficients in a particular discretization of a PDE. Regarding them as probabilities leads to much fuzzy thinking, and complex wrong-headed arguments.
If we are going to solve the Black-Scholes PDE, we might as well do it right, and not fool around with lattices.
5.1 A No-arbitrage Lattice
We can also derive the lattice method directly from the discrete lattice model in Section 2.5. Suppose we assume that
$$dS = \mu S\, dt + \sigma S\, dZ \tag{5.9}$$
and letting $X = \log S$, we have that
$$dX = \left( \mu - \frac{\sigma^2}{2} \right) dt + \sigma\, dZ \tag{5.10}$$
so that $\alpha = \mu - \frac{\sigma^2}{2}$ in equation (2.19). Now, let's consider the usual hedging portfolio at $t = n\Delta t$, $S = S^n_j$,
$$P^n_j = V^n_j - (\alpha^h) S^n_j , \tag{5.11}$$
where $V^n_j$ is the value of the option at $t = n\Delta t$, $S = S^n_j$. At $t = (n+1)\Delta t$,
$$S^n_j \to S^{n+1}_{j+1} ; \quad \text{with probability } p$$
$$S^n_j \to S^{n+1}_{j} ; \quad \text{with probability } q$$
$$S^{n+1}_{j+1} = S^n_j e^{\sigma\sqrt{\Delta t}}$$
$$S^{n+1}_{j} = S^n_j e^{-\sigma\sqrt{\Delta t}}$$
so that the value of the hedging portfolio at $t = (n+1)\Delta t$ is
$$P^{n+1}_{j+1} = V^{n+1}_{j+1} - (\alpha^h) S^{n+1}_{j+1} ; \quad \text{with probability } p \tag{5.12}$$
$$P^{n+1}_{j} = V^{n+1}_{j} - (\alpha^h) S^{n+1}_{j} ; \quad \text{with probability } q . \tag{5.13}$$
Now, as in Section 2.3, we can determine $(\alpha^h)$ so that the value of the hedging portfolio is independent of $p, q$. We do this by requiring that
$$P^{n+1}_{j+1} = P^{n+1}_{j} \tag{5.14}$$
so that
$$V^{n+1}_{j+1} - (\alpha^h) S^{n+1}_{j+1} = V^{n+1}_{j} - (\alpha^h) S^{n+1}_{j}$$
which gives
$$(\alpha^h) = \frac{V^{n+1}_{j+1} - V^{n+1}_{j}}{S^{n+1}_{j+1} - S^{n+1}_{j}} . \tag{5.15}$$
Since this portfolio is risk free, it must earn the risk free rate of return, so that
$$P^n_j = e^{-r\Delta t} P^{n+1}_{j+1} = e^{-r\Delta t} P^{n+1}_{j} . \tag{5.16}$$
Now, substituting for $P^n_j$ from equation (5.11), for $P^{n+1}_{j+1}$ from equation (5.12), and for $(\alpha^h)$ from equation (5.15) gives
$$V^n_j = e^{-r\Delta t}\left( p^{r*} V^{n+1}_{j+1} + q^{r*} V^{n+1}_{j} \right)$$
$$p^{r*} = \frac{e^{r\Delta t} - e^{-\sigma\sqrt{\Delta t}}}{e^{\sigma\sqrt{\Delta t}} - e^{-\sigma\sqrt{\Delta t}}}$$
$$q^{r*} = 1 - p^{r*} . \tag{5.17}$$
Note that $p^{r*}, q^{r*}$ do not depend on the real drift rate $\mu$, which is expected. If we expand $p^{r*}, q^{r*}$ in a Taylor Series, and compare with the $p^r, q^r$ in equations (5.2), we can show that
$$p^{r*} = p^r + O((\Delta t)^{3/2})$$
$$q^{r*} = q^r + O((\Delta t)^{3/2}) . \tag{5.18}$$
After a bit more work, one can show that the value of the option at $t = 0$, $V^0_0$, using either $p^{r*}, q^{r*}$ or $p^r, q^r$ is the same to $O(\Delta t)$, which is not surprising, since these methods can both be regarded as an explicit finite difference approximation to the Black-Scholes equation, having truncation error $O(\Delta t)$. The definition $p^{r*}, q^{r*}$ is the common definition in finance books, since the tree has no-arbitrage.
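The estimate (5.18) is easy to check numerically. The following sketch (function names are hypothetical; the parameter values r = 0.06, σ = 0.3 are chosen only for illustration) evaluates both definitions and compares their difference at two timestep sizes. If the difference is $O((\Delta t)^{3/2})$, reducing $\Delta t$ by a factor of 4 should reduce it by a factor of about 8.

```python
import math


def p_lattice(r, sigma, dt):
    """Up probability p^r from equation (5.2)."""
    return 0.5 * (1.0 + (r / sigma - 0.5 * sigma) * math.sqrt(dt))


def p_no_arbitrage(r, sigma, dt):
    """Up probability p^{r*} from equation (5.17)."""
    up = math.exp(sigma * math.sqrt(dt))
    down = math.exp(-sigma * math.sqrt(dt))
    return (math.exp(r * dt) - down) / (up - down)


if __name__ == "__main__":
    r, sigma = 0.06, 0.3
    previous = None
    for dt in (0.01, 0.0025, 0.000625):
        diff = abs(p_no_arbitrage(r, sigma, dt) - p_lattice(r, sigma, dt))
        ratio = previous / diff if previous is not None else float("nan")
        print(dt, diff, ratio)   # the ratio should approach 4**1.5 = 8
        previous = diff
```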
What is the meaning of a no-arbitrage tree? If we are sitting at node $S^n_j$, and assuming that there are only two possible future states
$$S^n_j \to S^{n+1}_{j+1} ; \quad \text{with probability } p$$
$$S^n_j \to S^{n+1}_{j} ; \quad \text{with probability } q$$
then using $(\alpha^h)$ from equation (5.15) guarantees that the hedging portfolio has the same value in both future states.
But let's be a bit more sensible here. Suppose we are hedging a portfolio of RIM stock. Let ∆t = one day. Suppose the price of RIM stock is 10 dollars today. Do we actually believe that tomorrow there are only two possible prices for RIM stock
$$S_{up} = 10 e^{\sigma\sqrt{\Delta t}} , \qquad S_{down} = 10 e^{-\sigma\sqrt{\Delta t}} \; ? \tag{5.19}$$
Of course not. This is obviously a highly simplified model. The fact that there is no-arbitrage in the context of the simplified model does not really have a lot of relevance to the real-world situation. The best that can be said is that if the Black-Scholes model was perfect, then the portfolio hedging ratios computed using either $p^r, q^r$ or $p^{r*}, q^{r*}$ are both correct to $O(\Delta t)$.
6 More on Ito’s Lemma
In Section 2.6.1, we mysteriously made the infamous comment
...it can be shown that $dZ^2 \to dt$ as $dt \to 0$
In this Section, we will give some justification for this remark. For a lot more details here, we refer the reader to Stochastic Differential Equations, by Bernt Oksendal, Springer, 1998.
We have to go back here, and decide what the statement
$$dX = \alpha\, dt + c\, dZ \tag{6.1}$$
really means. The only sensible interpretation of this is
$$X(t) - X(0) = \int_0^t \alpha(X(s), s)\, ds + \int_0^t c(X(s), s)\, dZ(s) . \tag{6.2}$$
where we can interpret the integrals as the limit, as $\Delta t \to 0$, of a discrete sum. For example,
$$\int_0^t c(X(s), s)\, dZ(s) = \lim_{\Delta t \to 0} \sum_{j=0}^{N-1} c_j \Delta Z_j$$
$$c_j = c(X(Z_j), t_j)$$
$$Z_j = Z(t_j)$$
$$\Delta Z_j = Z(t_{j+1}) - Z(t_j)$$
$$\Delta t = t_{j+1} - t_j$$
$$N = t/(\Delta t) \tag{6.3}$$
In particular, in order to derive Ito's Lemma, we have to decide what
$$\int_0^t c(X(s), s)\, dZ(s)^2 \tag{6.4}$$
means. Replace the integral by a sum,
$$\int_0^t c(X(s), s)\, dZ(s)^2 = \lim_{\Delta t \to 0} \sum_{j=0}^{N-1} c(X_j, t_j)\, \Delta Z_j^2 . \tag{6.5}$$
Note that we have evaluated the integral using the left hand end point of each subinterval (the no peeking into the future principle).
From now on, we will use the notation
$$\sum_j \equiv \sum_{j=0}^{N-1} . \tag{6.6}$$
Now, we claim that
$$\int_0^t c(X(s), s)\, dZ^2(s) = \int_0^t c(X(s), s)\, ds \tag{6.7}$$
or
$$\lim_{\Delta t \to 0} \left[ \sum_j c_j \Delta Z_j^2 \right] = \lim_{\Delta t \to 0} \sum_j c_j \Delta t \tag{6.8}$$
which is what we mean by equation (6.7), i.e. we can say that $dZ^2 \to dt$ as $dt \to 0$.
Now, let's consider a finite $\Delta t$, and consider the expression
$$E\left[ \left( \sum_j c_j \Delta Z_j^2 - \sum_j c_j \Delta t \right)^2 \right] \tag{6.9}$$
If equation (6.9) tends to zero as $\Delta t \to 0$, then we can say that (in the mean square limit)
$$\lim_{\Delta t \to 0} \left[ \sum_j c_j \Delta Z_j^2 \right] = \lim_{\Delta t \to 0} \sum_j c_j \Delta t = \int_0^t c(X(s), s)\, ds \tag{6.10}$$
so that in this sense
$$\int_0^t c(X, s)\, dZ^2 = \int_0^t c(X, s)\, ds \tag{6.11}$$
and hence we can say that
$$dZ^2 \to dt \tag{6.12}$$
with probability one as $\Delta t \to 0$.
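Now, expanding equation (6.9) gives
$$E\left[ \left( \sum_j c_j \Delta Z_j^2 - \sum_j c_j \Delta t \right)^2 \right] = \sum_{ij} E\left[ c_j (\Delta Z_j^2 - \Delta t)\, c_i (\Delta Z_i^2 - \Delta t) \right] . \tag{6.13}$$
Now, we assume that
$$E\left[ c_j (\Delta Z_j^2 - \Delta t)\, c_i (\Delta Z_i^2 - \Delta t) \right] = \delta_{ij}\, E\left[ c_i^2 (\Delta Z_i^2 - \Delta t)^2 \right] \tag{6.14}$$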
which means that $c_j (\Delta Z_j^2 - \Delta t)$ and $c_i (\Delta Z_i^2 - \Delta t)$ are independent if $i \ne j$. This follows since
• The increments of Brownian motion are independent, so that $\mathrm{Cov}[\Delta Z_i, \Delta Z_j] = 0$, $i \ne j$, and also $\mathrm{Cov}[\Delta Z_i^2, \Delta Z_j^2] = 0$, or $E[(\Delta Z_j^2 - \Delta t)(\Delta Z_i^2 - \Delta t)] = 0$, $i \ne j$.
• $c_i = c(t_i, X(Z_i))$, and $\Delta Z_i$ are independent.
It also follows from the above properties that
$$E[c_j^2 (\Delta Z_j^2 - \Delta t)^2] = E[c_j^2]\; E[(\Delta Z_j^2 - \Delta t)^2] \tag{6.15}$$
since $c_j$ and $(\Delta Z_j^2 - \Delta t)$ are independent.
Using equations (6.14)-(6.15), equation (6.13) becomes
$$\sum_{ij} E\left[ c_j (\Delta Z_j^2 - \Delta t)\, c_i (\Delta Z_i^2 - \Delta t) \right] = \sum_i E[c_i^2]\; E\left[ (\Delta Z_i^2 - \Delta t)^2 \right] . \tag{6.16}$$
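Now,
$$\sum_i E[c_i^2]\; E\left[ (\Delta Z_i^2 - \Delta t)^2 \right] = \sum_i E[c_i^2] \left( E\left[ \Delta Z_i^4 \right] - 2\Delta t\, E\left[ \Delta Z_i^2 \right] + (\Delta t)^2 \right) . \tag{6.17}$$
Recall that $\Delta Z_i$ is $N(0, \Delta t)$ (normally distributed with mean zero and variance $\Delta t$) so that
$$E\left[ (\Delta Z_i)^2 \right] = \Delta t$$
$$E\left[ (\Delta Z_i)^4 \right] = 3(\Delta t)^2 \tag{6.18}$$
so that the bracketed term in equation (6.17) becomes
$$E\left[ \Delta Z_i^4 \right] - 2\Delta t\, E\left[ \Delta Z_i^2 \right] + (\Delta t)^2 = 2(\Delta t)^2 \tag{6.19}$$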
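and
$$\sum_i E[c_i^2]\; E\left[ (\Delta Z_i^2 - \Delta t)^2 \right] = 2 \sum_i E[c_i^2] (\Delta t)^2 = 2\Delta t \left( \sum_i E[c_i^2] \Delta t \right) = O(\Delta t) \tag{6.20}$$
so that we have
$$E\left[ \left( \sum_j c_j \Delta Z_j^2 - \sum_j c_j \Delta t \right)^2 \right] = O(\Delta t) \tag{6.21}$$
or
$$\lim_{\Delta t \to 0} E\left[ \left( \sum_j c_j \Delta Z_j^2 - \int_0^t c(s, X(s))\, ds \right)^2 \right] = 0 \tag{6.22}$$
so that in this sense we can write
$$dZ^2 \to dt ; \quad dt \to 0 . \tag{6.23}$$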
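The estimate (6.21) can be checked by simulation in the simplest case $c \equiv 1$, where equation (6.20) gives a mean square error of exactly $2N(\Delta t)^2 = 2t\Delta t$. The sketch below is only an illustration under that assumption; the function name `mean_square_error` is hypothetical.

```python
import math
import random


def mean_square_error(n_steps, t=1.0, n_samples=2000, seed=0):
    """Sample estimate of E[(sum_j dZ_j^2 - t)^2] for the case c = 1."""
    rng = random.Random(seed)
    dt = t / n_steps
    sd = math.sqrt(dt)
    total = 0.0
    for _ in range(n_samples):
        # Sum of squared Brownian increments along one path.
        qv = sum(rng.gauss(0.0, sd) ** 2 for _ in range(n_steps))
        total += (qv - t) ** 2
    return total / n_samples


if __name__ == "__main__":
    for n_steps in (25, 100, 400):
        dt = 1.0 / n_steps
        # The second column is the theoretical value 2*t*dt with t = 1.
        print(n_steps, 2.0 * dt, mean_square_error(n_steps))
```

The sampled values should track $2t\Delta t$ and so shrink linearly in $\Delta t$, which is the sense in which $dZ^2 \to dt$.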
7 Derivative Contracts on non-traded Assets and Real Options
The hedging arguments used in previous sections use the underlying asset to construct a hedging portfolio. What if the underlying asset cannot be bought and sold, or is non-storable? If the underlying variable is an interest rate, we can't store this. Or if the underlying asset is bandwidth, we can't store this either. However, we can get around this using the following approach.
7.1 Derivative Contracts
Let the underlying variable follow
$$dS = a(S, t)\, dt + b(S, t)\, dZ , \tag{7.1}$$
and let $F = F(S, t)$, so that from Ito's Lemma
$$dF = \left[ a F_S + \frac{b^2}{2} F_{SS} + F_t \right] dt + b F_S\, dZ , \tag{7.2}$$
or in shorter form
$$dF = \mu\, dt + \sigma^*\, dZ$$
$$\mu = a F_S + \frac{b^2}{2} F_{SS} + F_t$$
$$\sigma^* = b F_S . \tag{7.3}$$
Now, instead of hedging with the underlying asset, we will hedge one contract with another. Suppose we have two contracts $F_1, F_2$ (they could have different maturities for example). Then
$$dF_1 = \mu_1\, dt + \sigma^*_1\, dZ$$
$$dF_2 = \mu_2\, dt + \sigma^*_2\, dZ$$
$$\mu_i = a (F_i)_S + \frac{b^2}{2} (F_i)_{SS} + (F_i)_t$$
$$\sigma^*_i = b (F_i)_S ; \quad i = 1, 2 . \tag{7.4}$$
Consider the portfolio $P$ such that (recall that $\Delta_1, \Delta_2$ are constant over the hedging interval)
$$P = \Delta_1 F_1 + \Delta_2 F_2$$
$$dP = \Delta_1 (\mu_1\, dt + \sigma^*_1\, dZ) + \Delta_2 (\mu_2\, dt + \sigma^*_2\, dZ) \tag{7.5}$$
We can eliminate the random term by choosing
$$\Delta_1 = \sigma^*_2$$
$$\Delta_2 = -\sigma^*_1 \tag{7.6}$$
so that
$$dP = (\sigma^*_2 \mu_1 - \sigma^*_1 \mu_2)\, dt \tag{7.7}$$
Since this portfolio is riskless, it must earn the risk-free rate of return,
$$(\sigma^*_2 \mu_1 - \sigma^*_1 \mu_2)\, dt = r(\Delta_1 F_1 + \Delta_2 F_2)\, dt \tag{7.8}$$
and after some simplification we obtain
$$\frac{\mu_1 - r F_1}{\sigma^*_1} = \frac{\mu_2 - r F_2}{\sigma^*_2} \tag{7.9}$$
Let $\lambda_S(S, t)$ be the value of both sides of equation (7.9), so that
$$\frac{\mu_1 - r F_1}{\sigma^*_1} = \lambda_S(S, t)$$
$$\frac{\mu_2 - r F_2}{\sigma^*_2} = \lambda_S(S, t) . \tag{7.10}$$
Dropping the subscripts, we obtain
$$\frac{\mu - r F}{\sigma^*} = \lambda_S \tag{7.11}$$
Substituting $\mu, \sigma^*$ from equations (7.3) into equation (7.11) gives
$$F_t + \frac{b^2}{2} F_{SS} + (a - \lambda_S b) F_S - r F = 0 . \tag{7.12}$$
Equation (7.12) is the PDE satisfied by a derivative contract on any asset $S$, traded or not. Suppose
$$\mu = F \mu'$$
$$\sigma^* = F \sigma' \tag{7.13}$$
so that we can write
$$dF = F \mu'\, dt + F \sigma'\, dZ \tag{7.14}$$
then using equation (7.13) in equation (7.11) gives
$$\mu' = r + \lambda_S \sigma' \tag{7.15}$$
which has the convenient interpretation that the expected return on holding (not hedging) the derivative contract $F$ is the risk-free rate plus extra compensation due to the riskiness of holding $F$. The extra return is $\lambda_S \sigma'$, where $\lambda_S$ is the market price of risk of $S$ (which should be the same for all contracts depending on $S$) and $\sigma'$ is the volatility of $F$. Note that the volatility and drift of $F$ are not the volatility and drift of the underlying asset $S$.
If $S$ is a traded asset, it must satisfy equation (7.12). Substituting $F = S$ into equation (7.12) gives $(a - \lambda_S b) = rS$, so that $F$ then satisfies the usual Black-Scholes equation if $b = \sigma S$.
If we believe that the Capital Asset Pricing Model holds, then a simple minded idea is to estimate
$$\lambda_S = \rho_{SM} \lambda_M \tag{7.16}$$
where $\lambda_M$ is the price of risk of the market portfolio, and $\rho_{SM}$ is the correlation of returns between $S$ and the returns of the market portfolio.
Another idea is the following. Suppose we can find some companies whose main source of business is based on $S$. Let $q_i$ be the price of such a company's stock at $t = t_i$. The return of the stock over $t_i - t_{i-1}$ is
$$R_i = \frac{q_i - q_{i-1}}{q_{i-1}} .$$
Let $R^M_i$ be the return of the market portfolio (i.e. a broad index) over the same period. We compute $\beta$ as the best fit linear regression to
$$R_i = \alpha + \beta R^M_i$$
which means that
$$\beta = \frac{\mathrm{Cov}(R, R^M)}{\mathrm{Var}(R^M)} . \tag{7.17}$$
Now, from CAPM we have that
$$E(R) = r + \beta \left[ E(R^M) - r \right] \tag{7.18}$$
where $E(\ldots)$ is the expectation operator. We would like to determine the unlevered $\beta$, denoted by $\beta^u$, which is the $\beta$ for an investment made using equity only. In other words, if the firm we used to compute the $\beta$ above has significant debt, its riskiness with respect to $S$ is amplified. The unlevered $\beta$ can be computed by
$$\beta^u = \frac{E}{E + (1 - T_c) D}\, \beta \tag{7.19}$$
where
D = long term debt
E = total market capitalization
T_c = corporate tax rate . (7.20)
So, now the expected return from a pure equity investment based on $S$ is
$$E(R^u) = r + \beta^u \left[ E(R^M) - r \right] . \tag{7.21}$$
If we assume that $F$ in equation (7.14) is the company stock, then
$$\mu' = E(R^u) = r + \beta^u \left[ E(R^M) - r \right] \tag{7.22}$$
But equation (7.15) says that
$$\mu' = r + \lambda_S \sigma' . \tag{7.23}$$
Combining equations (7.22)-(7.23) gives
$$\lambda_S \sigma' = \beta^u \left[ E(R^M) - r \right] . \tag{7.24}$$
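Recall from equations (7.3) and (7.13) that
$$\sigma^* = F \sigma'$$
$$\sigma^* = b F_S ,$$
or
$$\sigma' = \frac{b F_S}{F} . \tag{7.25}$$
Combining equations (7.24)-(7.25) gives
$$\lambda_S = \frac{\beta^u \left[ E(R^M) - r \right]}{\dfrac{b F_S}{F}} . \tag{7.26}$$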
In principle, we can now compute $\lambda_S$, since
• The unleveraged $\beta^u$ is computed as described above. This can be done using market data for a specific firm, whose main business is based on $S$, and the firm's balance sheet.
• $b(S, t)/S$ is the volatility rate of $S$ (equation (7.1)).
• $[E(R^M) - r]$ can be determined from historical data. For example, the expected return of the market index above the risk free rate is about 6% for the past 50 years of Canadian data.
• The risk free rate $r$ is simply the current T-bill rate.
• $F_S$ can be estimated by computing a linear regression of the stock price of a firm which invests in $S$, and $S$. Now, this may have to be unlevered, to reduce the effect of debt. If we are going to now value the real option for a specific firm, we will have to make some assumption about how the firm will finance a new investment. If it is going to use pure equity, then we are done. If it is a mixture of debt and equity, we should relever the value of $F_S$. At this point, we need to talk to a Finance Professor to get this right.
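The chain of estimates (7.17), (7.19) and (7.26) can be sketched as follows. All function names and all numbers in the usage example are hypothetical and purely illustrative; in practice the inputs would come from market data and the firm's balance sheet, as listed above.

```python
def regression_beta(stock_returns, market_returns):
    """Beta from equation (7.17): Cov(R, R^M) / Var(R^M)."""
    n = len(stock_returns)
    mean_r = sum(stock_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum((r - mean_r) * (m - mean_m)
              for r, m in zip(stock_returns, market_returns)) / n
    var = sum((m - mean_m) ** 2 for m in market_returns) / n
    return cov / var


def unlevered_beta(beta, equity, debt, tax_rate):
    """Unlevered beta from equation (7.19)."""
    return equity / (equity + (1.0 - tax_rate) * debt) * beta


def market_price_of_risk(beta_u, excess_market_return, b, F_S, F):
    """lambda_S from equation (7.26), with sigma' = b*F_S/F from (7.25)."""
    sigma_prime = b * F_S / F
    return beta_u * excess_market_return / sigma_prime


if __name__ == "__main__":
    # Made-up return series, used only to exercise the functions.
    market = [0.01, -0.02, 0.03, 0.015, -0.005]
    stock = [0.002 + 1.5 * m for m in market]
    beta = regression_beta(stock, market)
    beta_u = unlevered_beta(beta, equity=60.0, debt=40.0, tax_rate=0.3)
    lam = market_price_of_risk(beta_u, 0.06, b=15.0, F_S=0.8, F=40.0)
    print(beta, beta_u, lam)
```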
7.2 A Forward Contract
A forward contract is a special type of derivative contract. The holder of a forward contract agrees to buy or sell the underlying asset at some delivery price $K$ in the future. $K$ is determined so that the cost of entering into the forward contract is zero at its inception.
The payoff of a (long) forward contract expiring at $t = T$ is then
$$V(S, \tau = 0) = S(T) - K . \tag{7.27}$$
Note that there is no optionality in a forward contract.
A forward contract is a contingent claim, and its value is given by equation (7.12)
$$V_t + \frac{b^2}{2} V_{SS} + (a - \lambda_S b) V_S - r V = 0 . \tag{7.28}$$
Now we can also use a simple no-arbitrage argument to express the value of a forward contract in terms of the original delivery price $K$ (which is set at the inception of the contract) and the current forward price $f(S, \tau)$. Suppose we are long a forward contract with delivery price $K$. At some time $t > 0$ ($\tau < T$), the forward price is no longer $K$. Suppose the forward price is $f(S, \tau)$, then the payoff of a long forward contract, entered into at $\tau$, is
$$\text{Payoff} = S(T) - f(S(\tau), \tau) .$$
Suppose we are long the forward contract struck at $t = 0$ with delivery price $K$. At some time $t > 0$, we hedge this contract by going short a forward with the current delivery price $f(S, \tau)$ (which costs us nothing to enter into). The payoff of this portfolio is
$$S - K - (S - f) = f - K \tag{7.29}$$
Since $f, K$ are known with certainty at $(S, \tau)$, then the value of this portfolio today is
$$(f - K) e^{-r\tau} . \tag{7.30}$$
But if we hold a forward contract today, we can always construct the above hedge at no cost. Therefore,
$$V(S, \tau) = (f - K) e^{-r\tau} . \tag{7.31}$$
Substituting equation (7.31) into equation (7.28), and noting that $K$ is a constant, gives us the following PDE for the forward price (the delivery price which makes the forward contract worth zero at inception)
$$f_\tau = \frac{b^2}{2} f_{SS} + (a - \lambda_S b) f_S \tag{7.32}$$
with terminal condition
$$f(S, \tau = 0) = S \tag{7.33}$$
which can be interpreted as the fact that the forward price must equal the spot price at $t = T$.
Suppose we can estimate $a, b$ in equation (7.32), and there are a set of forward prices available. We can then estimate $\lambda_S$ by solving equation (7.32) and adjusting $\lambda_S$ until we obtain a good fit for the observed forward prices.
7.2.1 Convenience Yield
We can also write equation (7.32) as
$$f_t + \frac{b^2}{2} f_{SS} + (r - \delta) S f_S = 0 \tag{7.34}$$
where $\delta$ is defined as
$$\delta = r - \frac{a - \lambda_S b}{S} . \tag{7.35}$$
In this case, we can interpret $\delta$ as the convenience yield for holding the asset. For example, there is a convenience to holding supplies of natural gas in reserve.
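As a worked special case (an illustrative assumption, not something stated above), suppose the convenience yield $\delta$ is a constant and write $\tau = T - t$ for the time remaining to expiry. Then a forward price that is linear in $S$ solves equation (7.34) with the terminal condition (7.33), whatever the form of $b$:

```latex
\begin{align*}
f(S,\tau) &= S\, e^{(r-\delta)\tau}, \qquad \tau = T - t, \\
f_t &= -(r-\delta)\, S\, e^{(r-\delta)\tau}, \qquad
S f_S = S\, e^{(r-\delta)\tau}, \qquad
f_{SS} = 0, \\
f_t + \frac{b^2}{2} f_{SS} + (r-\delta)\, S f_S
  &= -(r-\delta)\, S\, e^{(r-\delta)\tau} + 0 + (r-\delta)\, S\, e^{(r-\delta)\tau} = 0, \\
f(S, \tau = 0) &= S .
\end{align*}
```

Under this assumption an observed forward curve pins down $\delta$ directly, since $\delta = r - \frac{1}{\tau} \log\left( f/S \right)$.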
8 Discrete Hedging
In practice, we cannot hedge at infinitesimal time intervals. In fact, we would like to hedge as infrequently as possible, since in real life, there are transaction costs (something which is ignored in the basic Black-Scholes equation, but which can be taken into account and results in a nonlinear PDE).
8.1 Delta Hedging
Recall that the basic derivation of the Black-Scholes equation used a hedging portfolio where we hold $V_S$ shares. In finance, $V_S$ is called the option delta, hence this strategy is called delta hedging.
As an example, consider the hedging portfolio $P(t)$ which is composed of
• A short position in an option $-V(t)$.
• Long $\alpha^h(t) S(t)$ shares
• An amount in a risk-free bank account $B(t)$.
Initially, we have
$$P(0) = 0 = -V(0) + \alpha^h(0) S(0) + B(0)$$
$$\alpha^h = V_S$$
$$B(0) = V(0) - \alpha^h(0) S(0)$$
The hedge is rebalanced at discrete times $t_i$. Defining
$$\alpha^h_i = V_S(S_i, t_i)$$
$$V_i = V(S_i, t_i)$$
then, we have to update the hedge by purchasing $\alpha^h_i - \alpha^h_{i-1}$ shares at $t = t_i$, so that updating our share position requires
$$S(t_i)(\alpha^h_i - \alpha^h_{i-1})$$
in cash, which we borrow from the bank if $(\alpha^h_i - \alpha^h_{i-1}) > 0$. If $(\alpha^h_i - \alpha^h_{i-1}) < 0$, then we sell some shares and deposit the proceeds in the bank account. If $\Delta t = t_i - t_{i-1}$, then the bank account balance is updated by
$$B_i = e^{r\Delta t} B_{i-1} - S_i (\alpha^h_i - \alpha^h_{i-1})$$
At the instant after the rebalancing time $t_i$, the value of the portfolio is
$$P(t_i) = -V(t_i) + \alpha^h(t_i) S(t_i) + B(t_i)$$
Since we are hedging at discrete time intervals, the hedge is no longer risk free (it is risk free only in the limit as the hedging interval goes to zero). We can determine the distribution of profit and loss (P&L) by carrying out a Monte Carlo simulation. Suppose we have precomputed the values of $V_S$ for all the likely $(S, t)$ values. Then, we simulate a series of random paths. For each random path, we determine the discounted relative hedging error
$$\text{error} = \frac{e^{-rT^*} P(T^*)}{V(S_0, t = 0)} \tag{8.1}$$
After computing many sample paths, we can plot a histogram of relative hedging error, i.e. the fraction of Monte Carlo trials giving a hedging error between $E$ and $E + \Delta E$. We can compute the variance of this distribution, and also the value at risk (VAR). VAR is the worst case loss with a given probability. For example, a typical VAR number reported is the maximum loss that would occur 95% of the time. In other words, find the value of $E$ along the x-axis such that the area under the histogram plot to the right of this point is 0.95 times the total area.
[Figure 8.1: two histogram panels; x-axis: relative hedge error (-3 to 1), y-axis: relative frequency (0 to 4).]
Figure 8.1: Relative frequency (y-axis) versus relative P&L of delta hedging strategies. Left: no hedging, right: rebalance hedge once a month. American put, T = 0.25, σ = 0.3, r = 0.06, µ = 0.08, K = S_0 = 100. The relative P&L is computed by dividing the actual P&L by the Black-Scholes price.
As an example, consider the case of an American put option, T = 0.25, σ = 0.3, r = 0.06, K = S_0 = 100. At t = 0, S_0 = 100. Since there are discrete hedging errors, the results in this case will depend on the stock drift rate, which we set at µ = 0.08. The initial value of the American put, obtained by solving the Black-Scholes linear complementarity problem, is 5.34 dollars. Figure 8.1 shows the results for no hedging, and hedging once a month. The x-axis in these plots shows the relative P&L of this portfolio (i.e. P&L divided by the Black-Scholes price), and the y-axis shows the relative frequency.
$$\text{Relative P\&L} = \frac{\text{Actual P\&L}}{\text{Black-Scholes price}} \tag{8.2}$$
Note that the no-hedging strategy actually has a high probability of ending up with a profit (from the option writer's point of view) since the drift rate of the stock is positive. In this case, the hedger does nothing, but simply pockets the option premium. Note the sudden jump in the relative frequency at relative P&L = 1. This is because the maximum the option writer stands to gain is the option premium, which we assume is the Black-Scholes value. The writer makes this premium for any path which ends up S > K, which is many paths, hence the sudden jump in probability. However, there is significant probability of a loss as well. Figure 8.1 also shows the relative frequency of the P&L of hedging once a month (only three times during the life of the option).
In fact, there is a history of Ponzi-like hedge funds which simply write put options with essentially no hedging. In this case, these funds will perform very well for several years, since markets tend to drift up on average. However, then a sudden market drop occurs, and they will blow up. Blowing up is a technical term for losing all your capital and being forced to get a real job. However, usually the owners of these hedge funds walk away with large bonuses, and the shareholders take all the losses.
Figure 8.2 shows the results for rebalancing the hedge once a week, and daily. We can see clearly here that the mean is zero, and the variance is getting smaller as the hedging interval is reduced. In fact, one can show that the standard deviation of the hedge error should be proportional to $\sqrt{\Delta t}$, where $\Delta t$ is the hedge rebalancing interval.
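The delta hedging experiment can be sketched in Python. To keep the example self-contained, this sketch makes a simplifying assumption that differs from the example above: it hedges a European put, whose price and delta are available in closed form from the Black-Scholes formula, rather than the American put (which requires a numerical PDE solution). The function names are hypothetical. Paths are generated under the real drift µ, the bank account follows the update rule given earlier, and the error is the discounted relative hedging error (8.1).

```python
import math
import random
import statistics


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_put(S, K, r, sigma, tau):
    """Black-Scholes European put price and delta, time to expiry tau > 0."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    price = K * math.exp(-r * tau) * norm_cdf(-d2) - S * norm_cdf(-d1)
    return price, norm_cdf(d1) - 1.0


def relative_hedge_errors(n_rebalance, n_paths=2000, S0=100.0, K=100.0,
                          r=0.06, sigma=0.3, mu=0.08, T=0.25, seed=1):
    """Discounted relative hedging error (8.1) for each simulated path."""
    rng = random.Random(seed)
    dt = T / n_rebalance
    V0, delta0 = bs_put(S0, K, r, sigma, T)
    errors = []
    for _ in range(n_paths):
        S, alpha = S0, delta0
        B = V0 - alpha * S0                    # so that P(0) = 0
        for i in range(1, n_rebalance + 1):
            # Exact lognormal step under the real-world drift mu.
            S *= math.exp((mu - 0.5 * sigma ** 2) * dt
                          + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0))
            B *= math.exp(r * dt)              # interest on the bank account
            if i < n_rebalance:                # rebalance before expiry only
                _, new_alpha = bs_put(S, K, r, sigma, T - i * dt)
                B -= S * (new_alpha - alpha)
                alpha = new_alpha
        P_T = -max(K - S, 0.0) + alpha * S + B
        errors.append(math.exp(-r * T) * P_T / V0)
    return errors


def summarize(errors):
    """Mean, standard deviation, and the 5th percentile (a 95% VAR level)."""
    ordered = sorted(errors)
    return (statistics.mean(errors), statistics.pstdev(errors),
            ordered[int(0.05 * len(ordered))])


if __name__ == "__main__":
    for n in (3, 13, 63):                      # monthly, weekly, daily
        print(n, summarize(relative_hedge_errors(n)))
```

In this sketch the standard deviation of the relative error should fall roughly like the square root of the rebalancing interval as the number of rebalances increases.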
[Figure 8.2: two histogram panels; x-axis: relative hedge error (-3 to 1), y-axis: relative frequency (0 to 4).]
Figure 8.2: Relative frequency (y-axis) versus relative P&L of delta hedging strategies. Left: rebalance hedge once a week, right: rebalance hedge daily. American put, T = 0.25, σ = 0.3, r = 0.06, µ = 0.08, K = S_0 = 100. The relative P&L is computed by dividing the actual P&L by the Black-Scholes price.

8.2 Gamma Hedging
In an attempt to account for some of the errors in delta hedging at finite hedging intervals, we can try to use second derivative information. The second derivative of an option value $V_{SS}$ is called the option gamma, hence this strategy is termed delta-gamma hedging.
A gamma hedge consists of
• A short option position $-V(t)$.
• Long $\alpha^h S(t)$ shares
• Long $\beta$ units of another derivative security $I$.
• An amount in a risk-free bank account $B(t)$.
Now, recall that we consider $\alpha^h, \beta$ to be constant over the hedging interval (no peeking into the future), so we can regard these as constants (for the duration of the hedging interval).
The hedge portfolio $P(t)$ is then
$$P(t) = -V + \alpha^h S + \beta I + B(t)$$
Assuming that we buy and hold $\alpha^h$ shares and $\beta$ of the secondary instrument at the beginning of each hedging interval, then we require that
$$\frac{\partial P}{\partial S} = -\frac{\partial V}{\partial S} + \alpha^h + \beta \frac{\partial I}{\partial S} = 0$$
$$\frac{\partial^2 P}{\partial S^2} = -\frac{\partial^2 V}{\partial S^2} + \beta \frac{\partial^2 I}{\partial S^2} = 0 \tag{8.3}$$
Note that
• If β = 0, then we get back the usual delta hedge.
• In order for the gamma hedge to work, we need an instrument which has some gamma (the asset S has
second derivative zero). Hence, traders often speak of being long (positive) or short (negative) gamma,
and try to buy/sell things to get gamma neutral.
So, at $t = 0$ we have
$$P(0) = 0 \Rightarrow B(0) = V(0) - \alpha^h_0 S_0 - \beta_0 I_0$$
The amounts $\alpha^h_i, \beta_i$ are determined by requiring that equation (8.3) hold
$$-(V_S)_i + \alpha^h_i + \beta_i (I_S)_i = 0$$
$$-(V_{SS})_i + \beta_i (I_{SS})_i = 0 \tag{8.4}$$
The bank account balance is then updated at each hedging time $t_i$ by
$$B_i = e^{r\Delta t} B_{i-1} - S_i (\alpha^h_i - \alpha^h_{i-1}) - I_i (\beta_i - \beta_{i-1})$$
[Figure 8.3: two histogram panels; x-axis: relative hedge error (-3 to 1), y-axis: relative frequency (0 to 40).]
Figure 8.3: Relative frequency (y-axis) versus relative P&L of gamma hedging strategies. Left: rebalance hedge once a week, right: rebalance hedge daily. Dotted lines show the delta hedge for comparison. American put, T = 0.25, σ = 0.3, r = 0.06, µ = 0.08, K = 100, S_0 = 100. Secondary instrument: European put option, same strike, T = 0.5 years. The relative P&L is computed by dividing the actual P&L by the Black-Scholes price.
We will consider the same example as we used in the delta hedge example. For an additional instrument, we will use a European put option written on the same underlying with the same strike price and a maturity of T = 0.5 years.
Figure 8.3 shows the results of gamma hedging, along with a comparison with delta hedging. In principle, gamma hedging produces a smaller variance with less frequent hedging. However, we are exposed to more model error in this case, since we need to be able to compute the second derivative of the theoretical price.
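Solving the two conditions (8.4) for the hedge weights is a two-line calculation: the second equation gives $\beta_i = (V_{SS})_i / (I_{SS})_i$, and the first then gives $\alpha^h_i = (V_S)_i - \beta_i (I_S)_i$. The sketch below illustrates this under a simplifying assumption: both the hedged option and the secondary instrument are treated as European puts, so that delta and gamma come from the Black-Scholes closed form (the example above hedges an American put, whose derivatives would come from a numerical solution). Function names are hypothetical.

```python
import math


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def put_delta_gamma(S, K, r, sigma, tau):
    """Black-Scholes delta and gamma of a European put."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    delta = norm_cdf(d1) - 1.0
    gamma = norm_pdf(d1) / (S * sigma * math.sqrt(tau))
    return delta, gamma


def gamma_hedge_weights(V_S, V_SS, I_S, I_SS):
    """Solve equation (8.4) for (alpha^h, beta)."""
    beta = V_SS / I_SS              # second equation of (8.4)
    alpha_h = V_S - beta * I_S      # first equation of (8.4)
    return alpha_h, beta


if __name__ == "__main__":
    V_S, V_SS = put_delta_gamma(100.0, 100.0, 0.06, 0.3, 0.25)   # hedged option
    I_S, I_SS = put_delta_gamma(100.0, 100.0, 0.06, 0.3, 0.5)    # secondary put
    print(gamma_hedge_weights(V_S, V_SS, I_S, I_SS))
```

Note that the weights are undefined if the secondary instrument has zero gamma, which is why the underlying asset alone cannot be used for this purpose.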
8.3 Vega Hedging
The most important parameter in option pricing is the volatility. What if we are not sure about the value of the volatility? It is possible to assume that the volatility itself is stochastic, i.e.
$$dS = \mu S\, dt + \sqrt{v}\, S\, dZ_1$$
$$dv = \kappa(\theta - v)\, dt + \sigma_v \sqrt{v}\, dZ_2 \tag{8.5}$$
where $\mu$ is the expected growth rate of the stock price, $\sqrt{v}$ is its instantaneous volatility, $\kappa$ is a parameter controlling how fast $v$ reverts to its mean level of $\theta$, $\sigma_v$ is the "volatility of volatility" parameter, and $Z_1, Z_2$ are Wiener processes with correlation parameter $\rho$.
If we use the model in equation (8.5), then this will result in a two factor PDE to solve for the option price and the hedging parameters. Since there are two sources of risk ($dZ_1, dZ_2$), we will need to hedge with the underlying asset and another option (Heston, A closed form solution for options with stochastic volatility with applications to bond and currency options, Rev. Fin. Studies 6 (1993) 327-343).
Another possibility is to assume that the volatility is uncertain, and to assume that
$$\sigma_{min} \le \sigma \le \sigma_{max} ,$$
and to hedge based on a worst case (from the hedger's point of view). This results in an uncertain volatility model (Avellaneda, Levy, Paris, Pricing and Hedging Derivative Securities in Markets with Uncertain Volatilities, Appl. Math. Fin. 2 (1995) 77-88). This is great if you can get someone to buy this option at this price, because the hedger is always guaranteed to end up with a non-negative balance in the hedging portfolio. But you may not be able to sell at this price, since the option price is expensive (after all, the price you get has to cover the worst case scenario).
An alternative, much simpler, approach (and therefore popular in industry), is to construct a vega hedge. We assume that we know the volatility, and price the option in the usual way. Then, as with a gamma hedge, we construct a portfolio
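• A short option position $-V(t)$.
• Long $\alpha^h S(t)$ shares
• Long $\beta$ units of another derivative security $I$.
• An amount in a risk-free bank account $B(t)$.
The hedge portfolio $P(t)$ is then
$$P(t) = -V + \alpha^h S + \beta I + B(t)$$
Assuming that we buy and hold $\alpha^h$ shares and $\beta$ of the secondary instrument at the beginning of each hedging interval, then we require that
$$\frac{\partial P}{\partial S} = -\frac{\partial V}{\partial S} + \alpha^h + \beta \frac{\partial I}{\partial S} = 0$$
$$\frac{\partial P}{\partial \sigma} = -\frac{\partial V}{\partial \sigma} + \beta \frac{\partial I}{\partial \sigma} = 0 \tag{8.6}$$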
Note that if we assume that $\sigma$ is constant when pricing the option, yet do not assume $\sigma$ is constant when we hedge, this is somewhat inconsistent. Nevertheless, we can determine the derivatives in equation (8.6) numerically (solve the pricing equation for several different values of $\sigma$, and then finite difference the solutions).
In practice, we would sell the option priced using our best estimate of $\sigma$ (today). This is usually based on looking at the prices of traded options, and then backing out the volatility which gives back today's traded option price (this is the implied volatility). Then as time goes on, the implied volatility will likely change. We use the current implied volatility to determine the current hedge parameters in equation (8.6). Since this implied volatility has likely changed since we last rebalanced the hedge, there is some error in the hedge. However, taking into account the change in the hedge portfolio through equations (8.6) should make up for this error. This procedure is called delta-vega hedging.
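The numerical procedure behind equation (8.6) can be sketched as follows. This is only an illustration under stated assumptions: both the option being hedged and the secondary instrument are taken to be European calls priced with the constant-volatility Black-Scholes formula, and the strikes, maturities, rate and volatility are hypothetical values, not data from these notes. The derivatives are obtained by central finite differences, and the two equations in (8.6) are then solved for the share position (called `alpha_h` here) and `beta`.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, r, sigma, T):
    """Black-Scholes price of a European call with constant volatility."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

def fd_delta_vega(price, S, sigma, dS=1e-3, dsig=1e-4):
    """Central finite differences for d(price)/dS and d(price)/d(sigma)."""
    delta = (price(S + dS, sigma) - price(S - dS, sigma)) / (2.0 * dS)
    vega = (price(S, sigma + dsig) - price(S, sigma - dsig)) / (2.0 * dsig)
    return delta, vega

def delta_vega_hedge(V, I, S, sigma):
    """Solve the two conditions in (8.6) for the share and instrument positions."""
    V_S, V_sig = fd_delta_vega(V, S, sigma)
    I_S, I_sig = fd_delta_vega(I, S, sigma)
    beta = V_sig / I_sig            # second equation of (8.6)
    alpha_h = V_S - beta * I_S      # first equation of (8.6)
    return alpha_h, beta

# Hypothetical market data and contracts (assumptions for illustration only).
S0, r, sigma = 100.0, 0.05, 0.2

def V(S, sig):
    return bs_call(S, 100.0, r, sig, 0.5)   # option we sold

def I(S, sig):
    return bs_call(S, 110.0, r, sig, 1.0)   # secondary hedging instrument

alpha_h, beta = delta_vega_hedge(V, I, S0, sigma)
print("shares:", alpha_h, "units of I:", beta)
```

In a real setting the pricing function would be whatever numerical PDE or Monte Carlo solver is in use, evaluated at the current implied volatility each time the hedge is rebalanced.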
In fact, even if the underlying process has stochastic volatility, the vega hedge computed using a constant volatility model works surprisingly well (Hull and White, The pricing of options on assets with stochastic volatilities, J. of Finance, 42 (1987) 281-300).
9 Jump Diﬀusion
Recall that if
\[
dS = \mu S\,dt + \sigma S\,dZ \tag{9.1}
\]
then from Ito's Lemma we have
\[
d[\log S] = \left[\mu - \frac{\sigma^2}{2}\right] dt + \sigma\,dZ. \tag{9.2}
\]
Now, suppose that we observe asset prices at discrete times $t_i$, i.e. $S(t_i) = S_i$, with $\Delta t = t_{i+1} - t_i$. Then from equation (9.2) we have
\[
\log S_{i+1} - \log S_i = \log\left(\frac{S_{i+1}}{S_i}\right) \simeq \left[\mu - \frac{\sigma^2}{2}\right] \Delta t + \sigma \phi \sqrt{\Delta t} \tag{9.3}
\]
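where $\phi$ is $N(0, 1)$. Now, if $\Delta t$ is sufficiently small, then $\Delta t$ is much smaller than $\sqrt{\Delta t}$, so that equation (9.3) can be approximated by
\[
\log\left(\frac{S_{i+1} - S_i + S_i}{S_i}\right) = \log\left(1 + \frac{S_{i+1} - S_i}{S_i}\right) \simeq \sigma \phi \sqrt{\Delta t}. \tag{9.4}
\]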
Define the return $R_i$ in the period $t_{i+1} - t_i$ as
\[
R_i = \frac{S_{i+1} - S_i}{S_i} \tag{9.5}
\]
so that equation (9.4) becomes
\[
\log(1 + R_i) \simeq R_i = \sigma \phi \sqrt{\Delta t}.
\]
Consequently, a plot of the discretely observed returns of $S$ should be normally distributed, if the assumption (9.1) is true. In Figure 9.1 we can see a histogram of monthly returns from the S&P500 for the period 1982-2002. The histogram has been scaled to zero mean and unit standard deviation. A standard normal distribution is also shown. Note that for real data, there is a higher peak, and fatter tails, than for the normal distribution. This means that there is a higher probability of zero return, or of a large gain or loss, compared to a normal distribution.
Figure 9.1: Probability density functions for the S&P 500 monthly returns 1982-2002, scaled to zero mean and unit standard deviation, and the standardized Normal distribution.
As $\Delta t \to 0$, Geometric Brownian Motion (equation (9.1)) assumes that the probability of a large return also tends to zero. The amplitude of the return is proportional to $\sqrt{\Delta t}$, so that the tails of the distribution become unimportant.
But, in real life, we can sometimes see very large returns (positive or negative) in small time increments.
It therefore appears that Geometric Brownian Motion (GBM) is missing something important.
9.1 The Poisson Process
Consider a process where most of the time nothing happens (contrast this with Brownian motion, where
some changes occur at any time scale we look at), but on rare occasions, a jump occurs. The jump size does
not depend on the time interval, but the probability of the jump occurring does depend on the interval.
More formally, consider the process $dq$ where, in the interval $[t, t + dt]$,
\[
dq = \begin{cases} 1 & \text{with probability } \lambda\,dt \\ 0 & \text{with probability } 1 - \lambda\,dt. \end{cases} \tag{9.6}
\]
Note, once again, that the size of the Poisson outcome does not depend on $dt$. Also, the probability of a jump occurring in $[t, t + dt]$ goes to zero as $dt \to 0$, in contrast to Brownian motion, where some movement always takes place (the probability of movement is constant as $dt \to 0$), but the size of the movement tends to zero as $dt \to 0$. For future reference, note that
\[
E[dq] = \lambda\,dt \cdot 1 + (1 - \lambda\,dt) \cdot 0 = \lambda\,dt \tag{9.7}
\]
and
\[
\begin{aligned}
Var(dq) &= E[(dq - E[dq])^2] \\
&= E[(dq - \lambda\,dt)^2] \\
&= (1 - \lambda\,dt)^2 \cdot \lambda\,dt + (0 - \lambda\,dt)^2 \cdot (1 - \lambda\,dt) \\
&= \lambda\,dt + O((dt)^2).
\end{aligned} \tag{9.8}
\]
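Equations (9.7) and (9.8) are easy to check by simulation. The sketch below draws many independent increments $dq$ from (9.6) and compares the sample mean and variance with $\lambda\,dt$. The values of `lam`, `dt`, the sample size and the seed are arbitrary choices made for the illustration.

```python
import random

def simulate_dq(lam, dt, n_steps, rng):
    """Draw n_steps increments dq, each equal to 1 with probability lam*dt, else 0."""
    p = lam * dt
    return [1 if rng.random() < p else 0 for _ in range(n_steps)]

rng = random.Random(12345)      # fixed seed so the run is repeatable
lam, dt = 0.5, 0.01             # hypothetical jump intensity and time step
n = 200_000

dq = simulate_dq(lam, dt, n, rng)
mean_dq = sum(dq) / n
var_dq = sum((x - mean_dq) ** 2 for x in dq) / n

print("sample mean    :", mean_dq, " (lambda dt =", lam * dt, ")")
print("sample variance:", var_dq)
```

Both sample statistics should be close to $\lambda\,dt$, with the variance differing from it only by a term of size $(\lambda\,dt)^2$.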
Now, suppose we assume that, along with the usual GBM, occasionally the asset jumps, i.e. S → JS,
where J is the size of a (proportional) jump. We will restrict J to be nonnegative.
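Suppose a jump occurs in $[t, t + dt]$, with probability $\lambda\,dt$. Let's write this jump process as an SDE, i.e.
\[
[dS]_{\text{jump}} = (J - 1) S\,dq
\]
since, if a jump occurs
\[
\begin{aligned}
S_{\text{after jump}} &= S_{\text{before jump}} + [dS]_{\text{jump}} \\
&= S_{\text{before jump}} + (J - 1) S_{\text{before jump}} \\
&= J S_{\text{before jump}}
\end{aligned} \tag{9.9}
\]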
which is what we want to model. So, if we have a combination of GBM and a rare jump event, then
\[
dS = \mu S\,dt + \sigma S\,dZ + (J - 1) S\,dq \tag{9.10}
\]
Assume that the jump size has some known probability density $g(J)$, i.e. given that a jump occurs, then the probability of a jump size in $[J, J + dJ]$ is $g(J)\,dJ$, and
\[
\int_{-\infty}^{+\infty} g(J)\,dJ = \int_{0}^{\infty} g(J)\,dJ = 1 \tag{9.11}
\]
since we assume that $g(J) = 0$ if $J < 0$. For future reference, if $f = f(J)$, then the expected value of $f$ is
\[
E[f] = \int_{0}^{\infty} f(J) g(J)\,dJ. \tag{9.12}
\]
9.2 The Jump Diﬀusion Pricing Equation
Now, form the usual hedging portfolio
\[
P = V - \alpha S. \tag{9.13}
\]
Now, consider
\[
[dP]_{\text{total}} = [dP]_{\text{Brownian}} + [dP]_{\text{jump}} \tag{9.14}
\]
where, from Ito's Lemma
\[
[dP]_{\text{Brownian}} = \left[V_t + \frac{\sigma^2 S^2}{2} V_{SS}\right] dt + [V_S - \alpha](\mu S\,dt + \sigma S\,dZ) \tag{9.15}
\]
and, noting that the jump is of finite size,
\[
[dP]_{\text{jump}} = [V(JS, t) - V(S, t)]\,dq - \alpha (J - 1) S\,dq. \tag{9.16}
\]
If we hedge the Brownian motion risk, by setting $\alpha = V_S$, then equations (9.14-9.16) give us
\[
dP = \left[V_t + \frac{\sigma^2 S^2}{2} V_{SS}\right] dt + [V(JS, t) - V(S, t)]\,dq - V_S (J - 1) S\,dq. \tag{9.17}
\]
So, we still have a random component ($dq$) which we have not hedged away. Let's take the expected value of this change in the portfolio, i.e.
\[
E(dP) = \left[V_t + \frac{\sigma^2 S^2}{2} V_{SS}\right] dt + E[V(JS, t) - V(S, t)]\,E[dq] - V_S S\,E[J - 1]\,E[dq] \tag{9.18}
\]
where we have assumed that the probability of the jump and the probability of the size of the jump are independent. Defining $E(J - 1) = \kappa$, and using $E[dq] = \lambda\,dt$ from equation (9.7), equation (9.18) becomes
\[
E(dP) = \left[V_t + \frac{\sigma^2 S^2}{2} V_{SS}\right] dt + E[V(JS, t) - V(S, t)]\,\lambda\,dt - V_S S \kappa \lambda\,dt. \tag{9.19}
\]
Now, we make a rather interesting assumption. Assume that an investor holds a diversified portfolio of these hedging portfolios, for many different stocks. If we make the rather dubious assumption that these jumps for different stocks are uncorrelated, then the variance of this portfolio of portfolios is small, hence there is little risk in this portfolio. Hence, the expected return should be
\[
E[dP] = rP\,dt. \tag{9.20}
\]
Now, equating equations (9.19) and (9.20), with $P = V - V_S S$, gives
\[
V_t + \frac{\sigma^2 S^2}{2} V_{SS} + V_S [rS - S \kappa \lambda] - (r + \lambda) V + E[V(JS, t)]\,\lambda = 0. \tag{9.21}
\]
Using equation (9.12) in equation (9.21) gives
\[
V_t + \frac{\sigma^2 S^2}{2} V_{SS} + V_S [rS - S \kappa \lambda] - (r + \lambda) V + \lambda \int_{0}^{\infty} g(J) V(JS, t)\,dJ = 0. \tag{9.22}
\]
Equation (9.22) is a Partial Integral Differential Equation (PIDE).
A common assumption is to assume that $g(J)$ is log normal,
\[
g(J) = \frac{\exp\left(-\dfrac{(\log(J) - \hat{\mu})^2}{2\gamma^2}\right)}{\sqrt{2\pi}\,\gamma J}, \tag{9.23}
\]
where some algebra shows that
\[
E(J - 1) = \kappa = \exp(\hat{\mu} + \gamma^2/2) - 1. \tag{9.24}
\]
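The "some algebra" behind equation (9.24) can be checked numerically. The sketch below samples log normal jump sizes $J = \exp(\hat{\mu} + \gamma \phi)$, with $\phi \sim N(0,1)$, and compares the sample mean of $J - 1$ with the closed form. The parameter values `mu_hat` and `gamma` are hypothetical, chosen only for the illustration.

```python
import math
import random

def kappa_closed_form(mu_hat, gamma):
    """kappa = E[J - 1] for log normal J, equation (9.24)."""
    return math.exp(mu_hat + 0.5 * gamma ** 2) - 1.0

def kappa_monte_carlo(mu_hat, gamma, n, rng):
    """Sample mean of J - 1, where log(J) is normal with mean mu_hat and s.d. gamma."""
    total = 0.0
    for _ in range(n):
        J = math.exp(rng.gauss(mu_hat, gamma))
        total += J - 1.0
    return total / n

rng = random.Random(2005)       # fixed seed so the run is repeatable
mu_hat, gamma = -0.9, 0.45      # hypothetical jump parameters

kappa_exact = kappa_closed_form(mu_hat, gamma)
kappa_mc = kappa_monte_carlo(mu_hat, gamma, 100_000, rng)
print("closed form:", kappa_exact, " Monte Carlo:", kappa_mc)
```

With a negative $\hat{\mu}$ as above, $\kappa < 0$, i.e. jumps are downward on average.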
Now, what about our dubious assumption that jump risk was diversifiable? In practice, we can regard $\sigma, \hat{\mu}, \gamma, \lambda$ as parameters, and fit them to observed option prices. If we do this (see L. Andersen and J. Andreasen, Jump-Diffusion processes: Volatility smile fitting and numerical methods, Review of Derivatives Research (2002), vol 4, pages 231-262), then we find that $\sigma$ is close to historical volatility, but that the fitted values of $\lambda, \hat{\mu}$ are at odds with the historical values. The fitted values seem to indicate that investors are pricing in larger, more frequent jumps than have been historically observed. In other words, actual prices seem to indicate that investors do require some compensation for jump risk, which makes sense: these parameters contain a market price of risk.
Consequently, our assumption about jump risk being diversifiable is not really a problem if we fit the jump parameters from market (as opposed to historical) data, since the market-fit parameters will contain some effect due to risk preferences of investors.
One can be more rigorous about this if one assumes some utility function for investors. See (Alan Lewis, Fear of jumps, Wilmott Magazine, December, 2002, pages 60-67) or (V. Naik, M. Lee, General equilibrium pricing of options on the market portfolio with discontinuous returns, The Review of Financial Studies, vol 3 (1990) pages 493-521).
10 Mean Variance Portfolio Optimization
An introduction to Computational Finance would not be complete without some discussion of Portfolio
Optimization. Consider a risky asset which follows Geometric Brownian Motion with drift
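\[
\frac{dS}{S} = \mu\,dt + \sigma\,dZ, \tag{10.1}
\]
where as usual $dZ = \phi \sqrt{dt}$ and $\phi \sim N(0, 1)$. Suppose we consider a fixed finite interval $\Delta t$, then we can write equation (10.1) as
\[
\begin{aligned}
R &= \mu' + \sigma' \phi \\
R &= \frac{\Delta S}{S} \\
\mu' &= \mu \Delta t \\
\sigma' &= \sigma \sqrt{\Delta t},
\end{aligned} \tag{10.2}
\]
where $R$ is the actual return on the asset in $[t, t + \Delta t]$, $\mu'$ is the expected return on the asset in $[t, t + \Delta t]$, and $\sigma'$ is the standard deviation of the return on the asset in $[t, t + \Delta t]$.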
Now consider a portfolio of $N$ risky assets. Let $R_i$ be the return on asset $i$ in $[t, t + \Delta t]$, so that
\[
R_i = \mu'_i + \sigma'_i \phi_i \tag{10.3}
\]
Suppose that the correlation between asset $i$ and asset $j$ is given by $\rho_{ij} = E[\phi_i \phi_j]$. Suppose we buy $x_i$ of each asset at $t$, to form the portfolio $P$
\[
P = \sum_{i=1}^{N} x_i S_i. \tag{10.4}
\]
Then, over the interval $[t, t + \Delta t]$
\[
\begin{aligned}
P + dP &= \sum_{i=1}^{N} x_i S_i (1 + R_i) \\
dP &= \sum_{i=1}^{N} x_i S_i R_i \\
\frac{dP}{P} &= \sum_{i=1}^{N} w_i R_i \\
w_i &= \frac{x_i S_i}{\sum_{j=1}^{N} x_j S_j}
\end{aligned} \tag{10.5}
\]
In other words, we divide up our total wealth $W = \sum_{i=1}^{N} x_i S_i$ into each asset with weight $w_i$. Note that $\sum_{i=1}^{N} w_i = 1$.
To summarize, given some initial wealth at $t$, we suppose that an investor allocates a fraction $w_i$ of this wealth to each asset $i$. We assume that the total wealth is allocated to this risky portfolio $P$, so that
\[
\begin{aligned}
\sum_{i=1}^{N} w_i &= 1 \\
P &= \sum_{i=1}^{N} x_i S_i \\
R_p &= \frac{dP}{P} = \sum_{i=1}^{N} w_i R_i.
\end{aligned} \tag{10.6}
\]
The expected return on this portfolio $\overline{R}_p$ in $[t, t + \Delta t]$ is
\[
\overline{R}_p = \sum_{i=1}^{N} w_i \mu'_i, \tag{10.7}
\]
while the variance of $R_p$ in $[t, t + \Delta t]$ is
\[
Var(R_p) = \sum_{i=1}^{N} \sum_{j=1}^{N} w_i w_j \sigma'_i \sigma'_j \rho_{ij}. \tag{10.8}
\]
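As a small worked illustration of equations (10.7) and (10.8), the sketch below evaluates the expected return and variance of a two-asset portfolio. The weights, expected returns, standard deviations and correlation are hypothetical numbers chosen so that the arithmetic is easy to follow by hand.

```python
def portfolio_mean(w, mu):
    """Expected portfolio return, equation (10.7)."""
    return sum(wi * mi for wi, mi in zip(w, mu))

def portfolio_variance(w, sigma, rho):
    """Portfolio variance as the double sum in equation (10.8)."""
    n = len(w)
    return sum(w[i] * w[j] * sigma[i] * sigma[j] * rho[i][j]
               for i in range(n) for j in range(n))

# Hypothetical two-asset data.
w = [0.6, 0.4]
mu = [0.10, 0.05]
sigma = [0.2, 0.1]
rho = [[1.0, 0.5],
       [0.5, 1.0]]

mean_p = portfolio_mean(w, mu)              # 0.6*0.10 + 0.4*0.05
var_p = portfolio_variance(w, sigma, rho)   # 0.0144 + 0.0016 + 2*0.0024
print(mean_p, var_p)
```

The cross term appears twice in the double sum (once as $(i,j)$ and once as $(j,i)$), which is where the factor of 2 in the hand calculation comes from.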
10.1 Special Cases
Suppose the assets all have zero correlation with one another, i.e. $\rho_{ij} \equiv 0$, $\forall i \ne j$ (with $\rho_{ii} = 1$). Then equation (10.8) becomes
\[
Var(R_p) = \sum_{i=1}^{N} (\sigma'_i)^2 (w_i)^2. \tag{10.9}
\]
Now, suppose we equally weight all the assets in the portfolio, i.e. $w_i = 1/N$, $\forall i$. Let $\max_i \sigma'_i = \sigma'_{\max}$, then
\[
Var(R_p) = \frac{1}{N^2} \sum_{i=1}^{N} (\sigma'_i)^2 \le \frac{N (\sigma'_{\max})^2}{N^2} = O\left(\frac{1}{N}\right), \tag{10.10}
\]
so that in this special case, if we diversify over a large number of assets, the standard deviation of the portfolio tends to zero as $N \to \infty$.
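The $O(1/N)$ behaviour in equation (10.10) can be seen directly in a few lines. In this sketch the individual standard deviations are drawn at random from an arbitrary range (an assumption made only to have some data), the assets are uncorrelated and equally weighted, and the portfolio variance is compared with the bound $(\sigma'_{\max})^2 / N$.

```python
import random

def equal_weight_variance(sigmas):
    """Variance of an equally weighted portfolio of uncorrelated assets, eq. (10.10)."""
    n = len(sigmas)
    return sum(s * s for s in sigmas) / n ** 2

rng = random.Random(1)          # fixed seed so the run is repeatable
results = []
for n in (10, 100, 1000):
    # Hypothetical per-period standard deviations between 10% and 40%.
    sigmas = [rng.uniform(0.1, 0.4) for _ in range(n)]
    var_p = equal_weight_variance(sigmas)
    bound = max(sigmas) ** 2 / n
    results.append((n, var_p, bound))
    print(n, var_p, bound)
```

Each tenfold increase in $N$ reduces the variance by roughly a factor of ten, so the standard deviation falls like $1/\sqrt{N}$.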
Consider another case: all assets are perfectly correlated, $\rho_{ij} = 1$, $\forall i, j$. In this case
\[
Var(R_p) = \sum_{i=1}^{N} \sum_{j=1}^{N} w_i w_j \sigma'_i \sigma'_j = \left( \sum_{j=1}^{N} w_j \sigma'_j \right)^2 \tag{10.11}
\]
so that if $sd(R) = \sqrt{Var(R)}$ is the standard deviation of $R$, then, in this case
\[
sd(R_p) = \sum_{j=1}^{N} w_j \sigma'_j, \tag{10.12}
\]
which means that in this case the standard deviation of the portfolio is simply the weighted average of the individual asset standard deviations.
In general, we can expect that $0 < |\rho_{ij}| < 1$, so that the standard deviation of a portfolio of assets will be smaller than the weighted average of the individual asset standard deviations, but larger than zero. This means that diversification will be a good thing (as Martha Stewart would say) in terms of risk versus reward. In fact, a portfolio of as few as 10-20 stocks tends to reap most of the benefits of diversification.
10.2 The Portfolio Allocation Problem
Different investors will choose different portfolios depending on how much risk they wish to take. However, all investors like to achieve the highest possible expected return for a given amount of risk. We are assuming that risk and standard deviation of portfolio return are synonymous.
Let the covariance matrix $C$ be defined as
\[
[C]_{ij} = C_{ij} = \sigma'_i \sigma'_j \rho_{ij} \tag{10.13}
\]
and define the vectors $\bar{\mu} = [\mu'_1, \mu'_2, ..., \mu'_N]^t$, $\bar{w} = [w_1, w_2, ..., w_N]^t$. In theory, the covariance matrix should be symmetric positive semidefinite. However, measurement errors may result in $C$ having a negative eigenvalue, which should be fixed up somehow.
The expected return on the portfolio is then
\[
\overline{R}_p = \bar{w}^t \bar{\mu}, \tag{10.14}
\]
and the variance is
\[
Var(R_p) = \bar{w}^t C \bar{w}. \tag{10.15}
\]
We can think of the portfolio allocation problem as the following. Let $\alpha$ represent the degree to which investors want to maximize return at the expense of assuming more risk. If $\alpha \to 0$, then investors want to avoid as much risk as possible. On the other hand, if $\alpha \to \infty$, then investors seek only to maximize expected return, and don't care about risk. The portfolio allocation problem is then (for given $\alpha$) find $\bar{w}$ which satisfies
\[
\min_{\bar{w}} \; \bar{w}^t C \bar{w} - \alpha \bar{w}^t \bar{\mu} \tag{10.16}
\]
subject to the constraints
\[
\sum_i w_i = 1 \tag{10.17}
\]
\[
L_i \le w_i \le U_i \; ; \quad i = 1, ..., N. \tag{10.18}
\]
Constraint (10.17) is simply equation (10.6), while constraints (10.18) may arise due to the nature of the portfolio. For example, most mutual funds can only hold long positions ($w_i \ge 0$), and they may also be prohibited from having a large position in any one asset (e.g. $w_i \le .20$). Long-short hedge funds will not have these types of restrictions. For fixed $\alpha$, equations (10.16-10.18) constitute a quadratic programming problem.
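Let
\[
sd(R_p) = \text{standard deviation of } R_p = \sqrt{Var(R_p)} \tag{10.19}
\]
We can now trace out a curve on the $(sd(R_p), \overline{R}_p)$ plane. We pick various values of $\alpha$, and then solve the quadratic programming problem (10.16-10.18). Figure 10.1 shows a typical curve, which is also known as the efficient frontier. The data used for this example is
\[
\bar{\mu} = \begin{bmatrix} .15 \\ .20 \\ .08 \end{bmatrix} ; \quad
C = \begin{bmatrix} .20 & .05 & -.01 \\ .05 & .30 & .015 \\ -.01 & .015 & .1 \end{bmatrix} ; \quad
L = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} ; \quad
U = \begin{bmatrix} \infty \\ \infty \\ \infty \end{bmatrix} \tag{10.20}
\]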
We have restricted this portfolio to be long only. For a given value of the standard deviation of the portfolio return ($sd(R_p)$), any point below the curve is not efficient, since there is another portfolio with the same risk (standard deviation) and higher expected return. Only points on the curve are efficient in this manner. In general, a linear combination of portfolios at two points along the efficient frontier will be feasible, i.e. satisfy the constraints. This feasible region will be convex along the efficient frontier. Another way of saying this is that a straight line joining any two points along the curve does not intersect the curve except at the given two points. Why is this the case? If this were not true, then the efficient frontier would not really be efficient (see Portfolio Theory and Capital Markets, W. Sharpe, McGraw Hill, 1970, reprinted in 2000).

Figure 10.1: A typical efficient frontier (expected return plotted against standard deviation). This curve shows, for each value of portfolio standard deviation $sd(R_p)$, the maximum possible expected portfolio return $\overline{R}_p$. Data in equation (10.20).
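To make the construction of the frontier concrete, here is a rough sketch using the data of equation (10.20). It does not use a quadratic programming solver: instead, as a deliberately crude stand-in, it searches a regular grid of long-only weight vectors on the simplex and picks the one minimizing the objective (10.16) for each $\alpha$. The grid resolution and the list of $\alpha$ values are arbitrary choices, and the function names are hypothetical.

```python
import math

# Data from equation (10.20): expected returns and covariance matrix.
MU = [0.15, 0.20, 0.08]
C = [[0.20, 0.05, -0.01],
     [0.05, 0.30, 0.015],
     [-0.01, 0.015, 0.10]]

def mean_return(w):
    """Expected portfolio return, equation (10.14)."""
    return sum(wi * mi for wi, mi in zip(w, MU))

def variance(w):
    """Portfolio variance, equation (10.15)."""
    return sum(w[i] * C[i][j] * w[j] for i in range(3) for j in range(3))

def long_only_grid(steps):
    """All weight vectors with w_i >= 0 and sum(w) = 1 on a regular grid."""
    for a in range(steps + 1):
        for b in range(steps + 1 - a):
            yield (a / steps, b / steps, (steps - a - b) / steps)

def solve_for_alpha(alpha, steps=100):
    """Brute-force substitute for the QP: minimize w^t C w - alpha w^t mu."""
    return min(long_only_grid(steps),
               key=lambda w: variance(w) - alpha * mean_return(w))

# Trace out (standard deviation, expected return) pairs for several alpha.
frontier = []
for alpha in (0.0, 0.1, 0.25, 0.5, 1.0, 100.0):
    w = solve_for_alpha(alpha)
    frontier.append((math.sqrt(variance(w)), mean_return(w)))
    print(alpha, w, frontier[-1])
```

For more than a handful of assets the grid becomes hopelessly large, which is why a proper quadratic programming solver is used in practice.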
Figure 10.2 shows results if we allow the portfolio to hold up to .25 short positions in each asset. In other words, the data is the same as in (10.20) except that
\[
L = \begin{bmatrix} -.25 \\ -.25 \\ -.25 \end{bmatrix}. \tag{10.21}
\]
In general, long-short portfolios are more efficient than long-only portfolios. This is the advertised advantage of long-short hedge funds.
Since the feasible region is convex, we can actually proceed in a different manner when constructing the efficient frontier. First of all, we can determine the maximum possible expected return ($\alpha = \infty$ in equation (10.16)),
\[
\begin{aligned}
&\min_{\bar{w}} \; -\bar{w}^t \bar{\mu} \\
&\sum_i w_i = 1 \\
&L_i \le w_i \le U_i \; ; \quad i = 1, ..., N
\end{aligned} \tag{10.22}
\]
which is simply a linear programming problem. If the solution weight vector to this problem is $(\bar{w})_{\max}$, then the maximum possible expected return is $(\overline{R}_p)_{\max} = \bar{w}^t_{\max} \bar{\mu}$.
Then determine the portfolio with the smallest possible risk ($\alpha = 0$ in equation (10.16))
\[
\begin{aligned}
&\min_{\bar{w}} \; \bar{w}^t C \bar{w} \\
&\sum_i w_i = 1 \\
&L_i \le w_i \le U_i \; ; \quad i = 1, ..., N.
\end{aligned} \tag{10.23}
\]

Figure 10.2: Efficient frontier (expected return plotted against standard deviation), comparing results for a long-only portfolio (10.20) and a long-short portfolio (same data except that the lower bound constraint is replaced by equation (10.21)).
If the solution weight vector to this quadratic program is given by $\bar{w}_{\min}$, then the minimum possible portfolio return is $(\overline{R}_p)_{\min} = \bar{w}^t_{\min} \bar{\mu}$. We then divide up the range $[(\overline{R}_p)_{\min}, (\overline{R}_p)_{\max}]$ into a large number of discrete portfolio returns $(\overline{R}_p)_k$; $k = 1, ..., N_{pts}$. Let $e = [1, 1, ..., 1]$, and
\[
A = \begin{bmatrix} \bar{\mu}^t \\ e \end{bmatrix} ; \quad B_k = \begin{bmatrix} (\overline{R}_p)_k \\ 1 \end{bmatrix} \tag{10.24}
\]
then, for given $(\overline{R}_p)_k$ we solve the quadratic program
\[
\begin{aligned}
&\min_{\bar{w}} \; \bar{w}^t C \bar{w} \\
&A \bar{w} = B_k \\
&L_i \le w_i \le U_i \; ; \quad i = 1, ..., N,
\end{aligned} \tag{10.25}
\]
with solution vector $(\bar{w})_k$ and hence portfolio standard deviation $sd((R_p)_k) = \sqrt{(\bar{w})^t_k C (\bar{w})_k}$. This gives us a set of pairs $(sd((R_p)_k), (\overline{R}_p)_k)$, $k = 1, ..., N_{pts}$.
10.3 Adding a Risk-free asset
Up to now, we have assumed that each asset is risky, i.e. $\sigma'_i > 0$, $\forall i$. However, what happens if we add a risk-free asset to our portfolio? This risk-free asset must earn the risk-free rate $r' = r \Delta t$, and its standard deviation is zero. The data for this case is (the risk-free asset is added to the end of the weight vector, with $r' = .03$)
\[
\bar{\mu} = \begin{bmatrix} .15 \\ .20 \\ .08 \\ .03 \end{bmatrix} ; \quad
C = \begin{bmatrix} .20 & .05 & -.01 & 0.0 \\ .05 & .30 & .015 & 0.0 \\ -.01 & .015 & .1 & 0.0 \\ 0.0 & 0.0 & 0.0 & 0.0 \end{bmatrix} ; \quad
L = \begin{bmatrix} 0 \\ 0 \\ 0 \\ -\infty \end{bmatrix} ; \quad
U = \begin{bmatrix} \infty \\ \infty \\ \infty \\ \infty \end{bmatrix} \tag{10.26}
\]
where we have assumed that we can borrow any amount at the risk-free rate (a dubious assumption).
Figure 10.3: The efficient frontier from Figure 10.1 (all risky assets), and the efficient frontier with the same assets as in Figure 10.1, except that we include a risk-free asset. In this case, the efficient frontier becomes a straight line, shown as the capital market line. (Plot of expected return against standard deviation; the labels on the plot mark the risk-free return, the market portfolio, and the lending and borrowing segments of the capital market line.)
If we compute the efficient frontier with a portfolio of risky assets and include one risk-free asset, we get the result labeled capital market line in Figure 10.3. In other words, in this case the efficient frontier is a straight line. Note that this straight line is always above the efficient frontier for the portfolio consisting of all risky assets (as in Figure 10.1). In fact, given the efficient frontier from Figure 10.1, we can construct the efficient frontier for a portfolio of the same risky assets plus a risk-free asset in the following way. First of all, we start at the point $(0, r')$ in the $(sd(R_p), \overline{R}_p)$ plane, corresponding to a portfolio which consists entirely of the risk-free asset. We then draw a straight line passing through $(0, r')$, which touches the all-risky-asset efficient frontier at a single point (the straight line is tangent to the all-risky-asset efficient frontier). Let the portfolio weights at this single point be denoted by $\bar{w}_M$. The portfolio corresponding to the weights $\bar{w}_M$ is termed the market portfolio. Let $(\overline{R}_p)_M = \bar{w}^t_M \bar{\mu}$ be the expected return on this market portfolio, with corresponding standard deviation $sd((R_p)_M)$. Let $w_r$ be the fraction invested in the risk-free asset. Then, any point along the capital market line has
\[
\begin{aligned}
\overline{R}_p &= w_r r' + (1 - w_r)(\overline{R}_p)_M \\
sd(R_p) &= (1 - w_r)\,sd((R_p)_M).
\end{aligned} \tag{10.27}
\]
If $w_r \ge 0$, then we are lending at the risk-free rate. If $w_r < 0$, we are borrowing at the risk-free rate.
Consequently, given a portfolio of risky assets, and a risk-free asset, then all investors should divide their assets between the risk-free asset and the market portfolio. Any other choice for the portfolio is not efficient. Note that the actual fraction selected for investment in the market portfolio depends on the risk preferences of the investor.
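The tangency construction can be sketched numerically for the data in (10.20) with $r' = .03$. One assumption is made here that is not in the notes: when the bound constraints do not bind at the tangency point, the market portfolio weights are proportional to $C^{-1}(\bar{\mu} - r' e)$, a standard result for the unconstrained problem. The sketch solves this small linear system by Gaussian elimination, normalizes the weights, and then evaluates points on the capital market line using equation (10.27). The helper names are hypothetical.

```python
import math

def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

# Risky-asset data from equation (10.20), risk-free return from (10.26).
MU = [0.15, 0.20, 0.08]
C = [[0.20, 0.05, -0.01],
     [0.05, 0.30, 0.015],
     [-0.01, 0.015, 0.10]]
R_FREE = 0.03

# Assumed unconstrained tangency formula: w_M proportional to C^{-1}(mu - r').
z = solve_linear(C, [m - R_FREE for m in MU])
w_M = [zi / sum(z) for zi in z]
R_M = sum(w * m for w, m in zip(w_M, MU))
sd_M = math.sqrt(sum(w_M[i] * C[i][j] * w_M[j] for i in range(3) for j in range(3)))
lambda_M = (R_M - R_FREE) / sd_M

def capital_market_line(w_r):
    """(expected return, standard deviation) from equation (10.27), for w_r <= 1."""
    return w_r * R_FREE + (1.0 - w_r) * R_M, (1.0 - w_r) * sd_M

print("market portfolio weights:", w_M)
print("slope of capital market line:", lambda_M)
for w_r in (0.5, 0.0, -0.5):    # lending, fully invested, borrowing
    print(w_r, capital_market_line(w_r))
```

For this data the computed weights all turn out positive, so the long-only bounds in (10.20) are not active and the unconstrained formula is consistent with the constrained problem.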
The capital market line is so important that the equation of this line is written as $\overline{R}_p = r' + \lambda_M\,sd(R_p)$, where $\lambda_M$ is the market price of risk. In other words, all diversified investors, at any particular point in time, should have diversified portfolios which plot along the capital market line. All portfolios should have the same Sharpe ratio
\[
\lambda_M = \frac{\overline{R}_p - r'}{sd(R_p)}. \tag{10.28}
\]
10.4 Criticism
Is mean-variance portfolio optimization the solution to all our problems? Not exactly. We have assumed that $\mu'$, $\sigma'$ are independent of time. This is not likely. Even if these parameters are reasonably constant, they are difficult to estimate. In particular, $\mu'$ is hard to determine if the time series of returns is not very long. Remember that for short time series, the noise term (Brownian motion) will dominate. If we have a long time series, we can get a better estimate for $\mu'$, but why do we think $\mu'$ for a particular firm will be constant for long periods? Probably, stock analysts should be estimating $\mu'$ from company balance sheets, sales data, etc. However, for the past few years, analysts have been too busy hyping stocks and going to lunch to do any real work. So, there will be lots of different estimates of $\mu'$, $C$, and hence many different optimal portfolios.
In fact, some recent studies have suggested that if investors simply use the $1/N$ rule, whereby initial wealth is allocated equally between $N$ assets, this does a pretty good job, assuming that there is uncertainty in the estimates of $\mu'$, $C$.
We have also assumed that risk is measured by standard deviation of portfolio return. Actually, if I am
long an asset, I like it when the asset goes up, and I don’t like it when the asset goes down. In other words,
volatility which makes the price increase is good. This suggests that perhaps it may be more appropriate to
minimize downside risk only (assuming a long position).
Perhaps one of the most useful ideas that come from mean-variance portfolio optimization is that diversified investors (at any point in time) expect that any optimal portfolio will produce a return
\[
\begin{aligned}
\overline{R}_p &= r' + \lambda_M \sigma'_p \\
\overline{R}_p &= \text{Expected portfolio return} \\
r' &= \text{risk-free return in period } \Delta t \\
\lambda_M &= \text{market price of risk} \\
\sigma'_p &= \text{Portfolio volatility},
\end{aligned} \tag{10.29}
\]
where different investors will choose portfolios with different $\sigma'_p$ (volatility), depending on their risk preferences, but $\lambda_M$ is the same for all investors. Of course, we also have
\[
\overline{R}_M = r' + \lambda_M \sigma'_M. \tag{10.30}
\]
Note: there is a whole field called Behavioural Finance, whose adherents don't think much of mean-variance portfolio optimization.
Another recent approach is to compute the optimal portfolio weights using many different perturbed input data sets. The input data (expected returns, and covariances) are determined by resampling, i.e. assuming that the observed values have some observational errors. In this way, we can get some sort of optimal portfolio weights which have some effect of data errors incorporated in the result. This gives us an average efficient frontier, which, it is claimed, is less sensitive to data errors.
10.5 Individual Securities
Equation (10.30) refers to an efficient portfolio. What is the relationship between risk and reward for individual securities? Consider the following portfolio: divide all wealth between the market portfolio, with weight $w_M$, and security $i$, with weight $w_i$. By definition
\[
w_M + w_i = 1, \tag{10.31}
\]
and we define
\[
\begin{aligned}
\overline{R}_M &= \text{expected return on the market portfolio} \\
\overline{R}_i &= \text{expected return on asset } i \\
\sigma'_M &= \text{s.d. of return on market portfolio} \\
\sigma'_i &= \text{s.d. of return on asset } i \\
C_{i,M} &= \sigma'_M \sigma'_i \rho_{i,M} = \text{Covariance between } i \text{ and } M
\end{aligned} \tag{10.32}
\]
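Now, the expected return on this portfolio is
\[
\overline{R}_p = E[R_p] = w_i \overline{R}_i + w_M \overline{R}_M = w_i \overline{R}_i + (1 - w_i) \overline{R}_M \tag{10.33}
\]
and the variance is
\[
\begin{aligned}
Var(R_p) = (\sigma'_p)^2 &= w_i^2 (\sigma'_i)^2 + 2 w_i w_M C_{i,M} + w_M^2 (\sigma'_M)^2 \\
&= w_i^2 (\sigma'_i)^2 + 2 w_i (1 - w_i) C_{i,M} + (1 - w_i)^2 (\sigma'_M)^2
\end{aligned} \tag{10.34}
\]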
For a set of values $\{w_i\}$, equations (10.33-10.34) will plot a curve in the expected return-standard deviation plane $(\overline{R}_p, \sigma'_p)$ (e.g. Figure 10.3). Let's determine the slope of this curve when $w_i \to 0$, i.e. when this curve intersects the capital market line at the market portfolio.
\[
\begin{aligned}
2 (\sigma'_p) \frac{\partial (\sigma'_p)}{\partial w_i} &= 2 w_i (\sigma'_i)^2 + 2 (1 - 2 w_i) C_{i,M} + 2 (w_i - 1)(\sigma'_M)^2 \\
\frac{\partial \overline{R}_p}{\partial w_i} &= \overline{R}_i - \overline{R}_M.
\end{aligned} \tag{10.35}
\]
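Now,
\[
\frac{\partial \overline{R}_p}{\partial (\sigma'_p)} = \frac{\partial \overline{R}_p / \partial w_i}{\partial (\sigma'_p) / \partial w_i} = \frac{(\overline{R}_i - \overline{R}_M)(\sigma'_p)}{w_i (\sigma'_i)^2 + (1 - 2 w_i) C_{i,M} + (w_i - 1)(\sigma'_M)^2}. \tag{10.36}
\]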
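Now, let $w_i \to 0$ in equation (10.36), so that $\sigma'_p \to \sigma'_M$, then we obtain
\[
\frac{\partial \overline{R}_p}{\partial (\sigma'_p)} = \frac{(\overline{R}_i - \overline{R}_M)(\sigma'_M)}{C_{i,M} - (\sigma'_M)^2} \tag{10.37}
\]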
But this curve should be tangent to the capital market line, equation (10.30), at the point where the capital market line touches the efficient frontier. If this curve is not tangent to the capital market line, then this implies that if we choose $w_i = \pm\epsilon$, then the curve would be above the capital market line, which should not be possible (the capital market line is the most efficient possible portfolio). This assumes that positions with $w_i < 0$ in asset $i$ are possible.
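Assuming that the slope of the $\overline{R}_p$ portfolio is tangent to the capital market line gives (from equations (10.30, 10.37))
\[
\frac{\overline{R}_M - r'}{(\sigma'_M)} = \frac{(\overline{R}_i - \overline{R}_M)(\sigma'_M)}{C_{i,M} - (\sigma'_M)^2} \tag{10.38}
\]
or
\[
\begin{aligned}
\overline{R}_i &= r' + \beta_i (\overline{R}_M - r') \\
\beta_i &= \frac{C_{i,M}}{(\sigma'_M)^2}.
\end{aligned} \tag{10.39}
\]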
The coefficient $\beta_i$ in equation (10.39) has a nice intuitive definition. Suppose we have a time series of returns
\[
\begin{aligned}
(R_i)_k &= \text{Return on asset } i \text{ in period } k \\
(R_M)_k &= \text{Return on market portfolio in period } k.
\end{aligned} \tag{10.40}
\]
Typically, we assume that the market portfolio is a broad index, such as the TSX 300. Now, suppose we try to obtain a least squares fit to the above data, using the equation
\[
R_i \simeq \alpha_i + b_i R_M. \tag{10.41}
\]
Figure 10.4: Return on Rogers Wireless Communications versus return on TSE 300. Each point represents pairs of daily returns. The vertical axis measures the daily return on the stock and the horizontal axis that of the TSE 300.
Carrying out the usual least squares analysis (e.g. do a linear regression of $R_i$ vs. $R_M$), we find that
\[
b_i = \frac{C_{i,M}}{(\sigma'_M)^2} \tag{10.42}
\]
so that we can write
\[
R_i \simeq \alpha_i + \beta_i R_M. \tag{10.43}
\]
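The least squares result (10.42) is easy to reproduce on synthetic data. In the sketch below, the "market" returns and the idiosyncratic noise are generated from normal distributions with made-up parameters, and the values `true_alpha` and `true_beta` are hypothetical. The fitted slope is computed as sample covariance divided by sample variance of the market return, which is exactly the ratio in (10.42).

```python
import random

def least_squares_line(x, y):
    """Return (intercept, slope) of the least squares fit y ~ intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    slope = cov_xy / var_x          # C_{i,M} / (sigma_M)^2, equation (10.42)
    return mean_y - slope * mean_x, slope

rng = random.Random(42)             # fixed seed so the run is repeatable
true_alpha, true_beta = 0.0005, 1.3 # hypothetical values used to generate data

# Synthetic daily returns: market, then asset = alpha + beta * market + noise.
R_M = [rng.gauss(0.0004, 0.01) for _ in range(5000)]
R_i = [true_alpha + true_beta * m + rng.gauss(0.0, 0.02) for m in R_M]

alpha_i, beta_i = least_squares_line(R_M, R_i)
print("fitted alpha:", alpha_i, " fitted beta:", beta_i)
```

With real data, `R_M` would be the index returns and `R_i` the stock returns over the same periods, as in the scatter plot of Figure 10.4.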
This means that $\beta_i$ is the slope of the best fit straight line to a $((R_i)_k, (R_M)_k)$ scatter plot. An example is shown in Figure 10.4. Now, from equation (10.39) we have that
\[
\overline{R}_i = r' + \beta_i (\overline{R}_M - r') \tag{10.44}
\]
which is consistent with equation (10.43) if
\[
\begin{aligned}
R_i &= \alpha_i + \beta_i R_M + \epsilon_i \\
E[\epsilon_i] &= 0 \\
\alpha_i &= r' (1 - \beta_i) \\
E[\epsilon_i, R_M] &= 0,
\end{aligned} \tag{10.45}
\]
since
\[
E[R_i] = \overline{R}_i = \alpha_i + \beta_i \overline{R}_M. \tag{10.46}
\]
Equation (10.45) has the interpretation that the return on asset $i$ can be decomposed into a drift component, a part which is correlated to the market portfolio (the broad index), and a random part uncorrelated with the index. Make the following assumptions
\[
E[\epsilon_i \epsilon_j] = \begin{cases} 0 & ; \; i \ne j \\ e_i^2 & ; \; i = j \end{cases} \tag{10.47}
\]
i.e. that returns on each asset are correlated only through their correlation with the index. Consider once again a portfolio where the wealth is divided amongst $N$ assets, each asset receiving a fraction $w_i$ of the initial wealth. In this case, the return on the portfolio is
\[
\begin{aligned}
R_p &= \sum_{i=1}^{N} w_i R_i \\
\overline{R}_p &= \sum_{i=1}^{N} w_i \alpha_i + \overline{R}_M \sum_{i=1}^{N} w_i \beta_i
\end{aligned} \tag{10.48}
\]
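and
\[
\begin{aligned}
s.d.(R_p) &= \sqrt{(\sigma'_M)^2 \sum_{i=1}^{N} \sum_{j=1}^{N} w_i w_j \beta_i \beta_j + \sum_{i=1}^{N} w_i^2 e_i^2} \\
&= \sqrt{(\sigma'_M)^2 \left( \sum_{i=1}^{N} w_i \beta_i \right)^2 + \sum_{i=1}^{N} w_i^2 e_i^2}.
\end{aligned} \tag{10.49}
\]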
Now, if $w_i = O(1/N)$, then
\[
\sum_{i=1}^{N} w_i^2 e_i^2 \tag{10.50}
\]
is $O(1/N)$ as $N$ becomes large, hence equation (10.49) becomes
\[
s.d.(R_p) \simeq \sigma'_M \left| \sum_{i=1}^{N} w_i \beta_i \right|. \tag{10.51}
\]
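Note that if we write
\[
\overline{R}_i = r' + \lambda_i \sigma'_i \tag{10.52}
\]
then we also have that
\[
\overline{R}_i = r' + \beta_i (\overline{R}_M - r') \tag{10.53}
\]
so that the market price of risk of security $i$ is
\[
\lambda_i = \frac{\beta_i (\overline{R}_M - r')}{\sigma'_i} \tag{10.54}
\]
which is useful in real options analysis.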
11 Stocks for the Long Run?
Conventional wisdom states that investment in a diversified portfolio of equities has a low risk for a long term investor. However, in a recent article ("Irrational Optimism," Fin. Anal. J., E. Dimson, P. Marsh, M. Staunton, vol 60 (January, 2004) 2525) an extensive analysis of historical data of equity returns was carried out. Projecting this information forward, the authors conclude that the probability of a negative real return over a twenty year period, for an investor holding a diversified portfolio, is about 14 per cent. In fact, most individuals in defined contribution pension plans have poorly diversified portfolios. Making more realistic assumptions for defined contribution pension plans, the authors find that the probability of a negative real return over twenty years is about 25 per cent.
Let's see if we can explain why there is this common misconception about the riskiness of long term equity investing. Table 11.1 shows a typical table in a Mutual Fund advertisement. From this table, we are supposed to conclude that
1 year    2 years    5 years    10 years    20 years    30 years    30 year bond yield
2%        5%         10%        8%          7%          6%          3%
Table 11.1: Historical annualized compound return, XYZ Mutual Equity Funds. Also shown is the current yield on a long term government bond.
• Long term equity investment is not very risky, with an annualized compound return about 3% higher than the current yield on government bonds.
• If $S$ is the value of the mutual fund, and $B$ is the value of the government bond, then
\[
\begin{aligned}
B(T) &= B(0) e^{rT} \; ; \quad r = .03 \\
S(T) &\simeq S(0) e^{\alpha T} \; ; \quad \alpha = .06,
\end{aligned} \tag{11.1}
\]
for $T$ large, which gives
\[
\frac{S(T=30)/S(0)}{B(T=30)/B(0)} = e^{1.8 - .9} = e^{.9} \simeq 2.46, \tag{11.2}
\]
indicating that you more than double your return by investing in equities compared to bonds (over the long term).
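A convenient way to measure the relative returns on these two investments (bonds and stocks) is to compare the total compound return
\[
\begin{aligned}
\text{Compound return: stocks} &= \log\left[\frac{S(T)}{S(0)}\right] = \alpha T \\
\text{Compound return: bonds} &= \log\left[\frac{B(T)}{B(0)}\right] = rT,
\end{aligned} \tag{11.3}
\]
or the annualized compound returns
\[
\begin{aligned}
\text{Annualized compound return: stocks} &= \frac{1}{T} \log\left[\frac{S(T)}{S(0)}\right] = \alpha \\
\text{Annualized compound return: bonds} &= \frac{1}{T} \log\left[\frac{B(T)}{B(0)}\right] = r.
\end{aligned} \tag{11.4}
\]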
If we assume that the value of the equity portfolio $S$ follows a Geometric Brownian Motion
\[
dS = \mu S\,dt + \sigma S\,dZ \tag{11.5}
\]
then from equation (2.56) we have that
\[
\log\left(\frac{S(T)}{S(0)}\right) \sim N\left(\left(\mu - \frac{\sigma^2}{2}\right) T, \; \sigma^2 T\right), \tag{11.6}
\]
i.e. the compound return is normally distributed with mean $(\mu - \frac{\sigma^2}{2}) T$ and variance $\sigma^2 T$, so that the variance of the total compound return increases as $T$ becomes large.
Since $var(aX) = a^2 var(X)$, it follows that
\[
\frac{1}{T} \log\left(\frac{S(T)}{S(0)}\right) \sim N\left(\left(\mu - \frac{\sigma^2}{2}\right), \; \sigma^2 / T\right), \tag{11.7}
\]
so that the variance of the annualized return tends to zero as $T$ becomes large.

Figure 11.1: Histogram of distribution of returns, $T = 5$ years. $\mu = .08$, $\sigma = .2$, 100,000 simulations. Left: annualized return $1/T \log[S(T)/S(0)]$. Right: return $\log[S(T)/S(0)]$.
Of course, what we really care about is the total compound return (that's how much we actually have at $t = T$, relative to what we invested at $t = 0$) at the end of the investment horizon. This is why Table 11.1 is misleading. There is significant risk in equities, even over the long term (30 years would be long-term for most investors).
Figure 11.1 shows the results of 100,000 simulations of asset prices assuming that the asset follows equation (11.5), with $\mu = .08$, $\sigma = .2$. The investment horizon is 5 years. The results are given in terms of histograms of the annualized compound return (equation (11.4)) and the total compound return (equation (11.3)).
Figure 11.2 shows similar results for an investment horizon of 30 years. Note how the variance of the annualized return has decreased, while the variance of the total return has increased (verifying equations (11.6-11.7)).
Assuming long term bonds yield 3%, this gives a total compound return over 30 years of .90 for bonds.
The right hand panel of Figure 11.2 shows that there are many possible scenarios where the return
on equities will be less than that of risk free bonds after 30 years. The number of scenarios with return less than
risk free bonds is given by the area to the left of .9 in the histogram.
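The area to the left of .9 can also be estimated without simulation, directly from the normal distribution in equation (11.6). The sketch below assumes the bond return is continuously compounded (so that the total compound return is simply yield × T, as in the .90 figure above); the function names are hypothetical.

```python
import math


def normal_cdf(x):
    """Cumulative distribution function of N(0, 1), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_equity_below_bonds(mu, sigma, T, bond_yield):
    """P( log[S(T)/S(0)] < bond_yield * T ) under equation (11.6)."""
    mean = (mu - 0.5 * sigma ** 2) * T   # mean of the total compound return
    std = sigma * math.sqrt(T)           # standard deviation of the total compound return
    return normal_cdf((bond_yield * T - mean) / std)


if __name__ == "__main__":
    # Parameters of Figure 11.2: mu = .08, sigma = .2, T = 30 years, bonds at 3%.
    print(prob_equity_below_bonds(0.08, 0.2, 30, 0.03))
```

For these parameters the probability comes out at roughly 0.2, i.e. under this model about one scenario in five leaves the equity investor behind the bond investor after 30 years.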
[Figure 11.2: two histogram panels. Left panel: "Annualized Returns - 30 years", horizontal axis "Ann. Returns ($)". Right panel: "Log Returns - 30 years", horizontal axis "Log Returns ($)".]
Figure 11.2: Histogram of distribution of returns T = 30 years. µ = .08, σ = .2, 100,000 simulations. Left:
annualized return 1/T log[S(T)/S(0)]. Right: return log[S(T)/S(0)].
12 Further Reading
12.1 General Interest
• Peter Bernstein, Capital Ideas: the improbable origins of modern Wall Street, The Free Press, New
York, 1992.
• Peter Bernstein, Against the Gods: the remarkable story of risk, John Wiley, New York, 1998, ISBN
0471295639.
• Burton Malkiel, A random walk down Wall Street, W.W. Norton, New York, 1999, ISBN 0393320405.
• N. Taleb, Fooled by Randomness, Texere, 2001, ISBN 1587990717.
12.2 More Background
• A. Dixit and R. Pindyck, Investment under uncertainty, Princeton University Press, 1994.
• John Hull, Options, futures and other derivatives, Prentice-Hall, 1997, ISBN 0131864793.
• S. Ross, R. Westerfield, J. Jaffe, Corporate Finance, McGraw-Hill Irwin, 2002, ISBN 0072831375.
• W. Sharpe, Portfolio Theory and Capital Markets, Wiley, 1970, reprinted in 2000, ISBN 0071353208.
(Still a classic).
• Lenos Trigeorgis, Real Options: Managerial Flexibility and Strategy for Resource Allocation, MIT
Press, 1996, ISBN 026220102X.
12.3 More Technical
• P. Brandimarte, Numerical Methods in Finance: A Matlab Introduction, Wiley, 2002, ISBN 0471396869.
• Boyle, Broadie, Glasserman, Monte Carlo methods for security pricing, J. Econ. Dyn. Con., 21:1267-1321 (1997).
• P. Glasserman, Monte Carlo Methods in Financial Engineering, Springer (2004) ISBN 0387004513.
• D. Higham, An Introduction to Financial Option Valuation, Cambridge (2004) ISBN 0521838843.
• P. Jackel, Monte Carlo Methods in Finance, Wiley, 2002, ISBN 047149741X.
• Y.K. Kwok, Mathematical Models of Finance, Springer Singapore, 1998, ISBN 9813083565.
• S. Neftci, An Introduction to the Mathematics of Financial Derivatives, Academic Press (2000) ISBN
0125153929.
• R. Seydel, Tools for Computational Finance, Springer, 2002, ISBN 354043609X.
• D. Tavella and C. Randall, Pricing Financial Instruments: the Finite Diﬀerence Method, Wiley, 2000,
ISBN 0471197602.
• D. Tavella, Quantitative Methods in Derivatives Pricing: An Introduction to Computational Finance,
Wiley (2002) ISBN 0471394475.
• N. Taleb, Dynamic Hedging, Wiley, 1997, ISBN 0471152803.
• P. Wilmott, S. Howison, J. Dewynne, The mathematics of ﬁnancial derivatives: A student introduction,
Cambridge, 1997, ISBN 0521497892.
• Paul Wilmott, Paul Wilmott on Quantitative Finance, Wiley, 2000, ISBN 0471874388.
5 The Binomial Model
5.1 A No-arbitrage Lattice
6 More on Ito's Lemma
7 Derivative Contracts on non-traded Assets and Real Options
7.1 Derivative Contracts
7.2 A Forward Contract
7.2.1 Convenience Yield
8 Discrete Hedging
8.1 Delta Hedging
8.2 Gamma Hedging
8.3 Vega Hedging
9 Jump Diffusion
9.1 The Poisson Process
9.2 The Jump Diffusion Pricing Equation
10 Mean Variance Portfolio Optimization
10.1 Special Cases
10.2 The Portfolio Allocation Problem
10.3 Adding a Risk-free asset
10.4 Criticism
10.5 Individual Securities
11 Stocks for the Long Run?
12 Further Reading
12.1 General Interest
12.2 More Background
12.3 More Technical
“Men wanted for hazardous journey, small wages, bitter cold, long months of complete darkness, constant dangers, safe return doubtful. Honour and recognition in case of success.” Advertisement placed by Ernest Shackleton in 1914. He received 5000 replies. An example of extreme risk-seeking behaviour. Hedging with options is used to mitigate risk, and would not appeal to members of Shackleton’s expedition.
1 The First Option Trade
Many people think that options and futures are recent inventions. However, options have a long history, going back to ancient Greece. As recorded by Aristotle in Politics, the sixth century BC philosopher Thales of Miletus took part in a sophisticated trading strategy. The main point of this trade was to confirm that philosophers could become rich if they so chose. This is perhaps the first rejoinder to the famous question “If you are so smart, why aren’t you rich?” which has dogged academics throughout the ages.
Thales observed that the weather was very favourable to a good olive crop, which would result in a bumper harvest of olives. If there had been an established Athens Board of Olives Exchange, Thales could have simply sold olive futures short (a surplus of olives would cause the price of olives to go down). Since the exchange did not exist, Thales put a deposit on all the olive presses surrounding Miletus. When the olive crop was harvested, demand for olive presses reached enormous proportions (olives were not a storable commodity). Thales then sublet the presses for a profit. Note that by placing a deposit on the presses, Thales was actually manufacturing an option on the olive crop, i.e. the most he could lose was his deposit. If he had sold short olive futures, he would have been liable to an unlimited loss, in the event that the olive crop turned out bad, and the price of olives went up. In other words, he had an option on a future of a non-storable commodity.
2 The Black-Scholes Equation
This is the basic PDE used in option pricing. We will derive this PDE for a simple case below. Things get much more complicated for real contracts.
2.1 Background
Over the past few years derivative securities (options, futures, and forward contracts) have become essential tools for corporations and investors alike. Derivatives facilitate the transfer of financial risks. As such, they may be used to hedge risk exposures or to assume risks in the anticipation of profits. To take a simple yet instructive example, a gold mining firm is exposed to fluctuations in the price of gold. The firm could use a forward contract to fix the price of its future sales. This would protect the firm against a fall in the price of gold, but it would also sacrifice the upside potential from a gold price increase. This could be preserved by using options instead of a forward contract.
Individual investors can also use derivatives as part of their investment strategies. This can be done through direct trading on financial exchanges. In addition, it is quite common for financial products to include some form of embedded derivative. Any insurance contract can be viewed as a put option. Consequently, any investment which provides some kind of protection actually includes an option feature. Standard examples include deposit insurance guarantees on savings accounts as well as the provision of being able to redeem a savings bond at par at any time. These types of embedded options are becoming increasingly common and increasingly complex. A prominent current example is the investment guarantees being offered by insurance companies (“segregated funds”) and mutual funds. In such contracts, the initial investment is guaranteed, and gains can be locked in (reset) a fixed number of times per year at the option of the contract holder. This is actually a very complex put option, known as a shout option. How much should an investor be willing to pay for this insurance? Determining the fair market value of these sorts of contracts is a problem in option pricing.
2.2 Definitions
Let's consider some simple European put/call options. At some time T in the future (the expiry or exercise date) the holder has the right, but not the obligation, to
• Buy an asset at a prescribed price K (the exercise or strike price). This is a call option.
• Sell the asset at a prescribed price K (the exercise or strike price). This is a put option.
At expiry time T, we know with certainty what the value of the option is, in terms of the price of the underlying asset S,
$$\text{Payoff} = \begin{cases} \max(S - K, 0) & \text{for a call} \\ \max(K - S, 0) & \text{for a put} \end{cases} \tag{2.1}$$
Note that the payoff from an option is always non-negative, since the holder has a right but not an obligation. This contrasts with a forward contract, where the holder must buy or sell at a prescribed price.

2.3 A Simple Example: The Two State Tree
This example is taken from Options, futures, and other derivatives, by John Hull. Suppose the value of a stock is currently $20. It is known that at the end of three months, the stock price will be either $22 or $18. We assume that the stock pays no dividends, and we would like to value a European call option to buy the stock in three months for $21. This option can have only two possible values in three months: if the stock price is $22, the option is worth $1, if the stock price is $18, the option is worth zero. This is illustrated in Figure 2.1.
[Figure 2.1: two state tree. Today: Stock Price = $20. Up state: Stock Price = $22, Option Price = $1. Down state: Stock Price = $18, Option Price = $0.]
Figure 2.1: A simple case where the stock value can either be $22 or $18, with a European call option, K = $21.
In order to price this option, we can set up an imaginary portfolio consisting of the option and the stock, in such a way that there is no uncertainty about the value of the portfolio at the end of three months. Since the portfolio has no risk, the return earned by this portfolio must be the risk-free rate.
Consider a portfolio consisting of a long (positive) position of δ shares of stock, and short (negative) one call option. We will compute δ so that the portfolio is riskless. If the stock moves up to $22 or goes down to $18, then the value of the portfolio is
$$\begin{aligned} \text{Value if stock goes up} &= \$22\,\delta - 1 \\ \text{Value if stock goes down} &= \$18\,\delta - 0 \end{aligned} \tag{2.2}$$
The portfolio is riskless if these two values are equal, i.e. 22δ − 1 = 18δ. So, if we choose δ = .25, then the value of the portfolio is
$$\begin{aligned} \text{Value if stock goes up} &= \$22\,\delta - 1 = \$4.50 \\ \text{Value if stock goes down} &= \$18\,\delta - 0 = \$4.50 \end{aligned} \tag{2.3}$$
So, regardless of whether the stock moves up or down, the value of the portfolio is $4.50. A risk-free portfolio must earn the risk free rate. Suppose the current risk-free rate is 12%, then the value of the portfolio today must be the present value of $4.50, or
$$4.50 \times e^{-.12 \times .25} = 4.367$$
The value of the stock today is $20. Let the value of the option be V. The value of the portfolio is
$$20 \times .25 - V = 4.367 \quad \rightarrow \quad V = .633$$

2.4 A hedging strategy
So, if we sell the above option (we hold a short position in the option), then we can hedge this position in the following way. Today, we sell the option for $.633, borrow $4.367 from the bank at the risk free rate (this means that we have to pay the bank back $4.50 in three months), which gives us $5.00 in cash. Then, we buy .25 shares at $20.00 (the current price of the stock). In three months time, one of two things happens
• The stock goes up to $22. Our stock holding is now worth $5.50, we pay the option holder $1.00, which leaves us with $4.50, just enough to pay off the bank loan.
• The stock goes down to $18. The call option is worthless. The value of the stock holding is now $4.50, which is just enough to pay off the bank loan.
Consequently, in this simple situation, we see that the theoretical price of the option is the cost for the seller to set up a portfolio, which will precisely pay off the option holder and any bank loans required to set up the hedge, at the expiry of the option. In other words, this is the price which a hedger requires to ensure that there is always just enough money at the end to net out at zero gain or loss. If the market price of the option was higher than this value, the seller could sell at the higher price and lock in an instantaneous risk-free gain. Alternatively, if the market price of the option was lower than the theoretical, or fair market value, it would be possible to lock in a risk-free gain by selling the portfolio short. Any such arbitrage opportunities are rapidly exploited in the market, so that for most investors, we can assume that such opportunities are not possible (the no arbitrage condition), and therefore that the market price of the option should be the theoretical price.
Note that this hedge works regardless of whether or not the stock goes up or down. Once we set up this hedge, we don't have a care in the world. The value of the option is also independent of the probability that the stock goes up to $22 or down to $18. This is somewhat counterintuitive.

2.5 Brownian Motion
Before we consider a model for stock price movements, let's consider the idea of Brownian motion with drift. Suppose X is a random variable, and in time t → t + dt, X → X + dX, where
$$dX = \alpha\,dt + \sigma\,dZ \tag{2.4}$$
where α dt is the drift term, σ is the volatility, and dZ is a random term. The dZ term has the form
$$dZ = \phi\sqrt{dt} \tag{2.5}$$
where φ is a random variable drawn from a normal distribution with mean zero and variance one (φ ∼ N(0, 1), i.e. φ is normally distributed).
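As a concrete illustration of equations (2.4) and (2.5), a path of X can be generated by adding up increments α dt + σ φ √dt with a fresh normal draw φ at each step. This is only a minimal sketch: the finite step size dt = T/n_steps, the function name and the parameter values are assumptions made for the example.

```python
import math
import random


def brownian_path(x0, alpha, sigma, T, n_steps, rng):
    """One path of dX = alpha dt + sigma dZ, with dZ = phi sqrt(dt), phi ~ N(0, 1)."""
    dt = T / n_steps
    path = [x0]
    for _ in range(n_steps):
        phi = rng.gauss(0.0, 1.0)       # standard normal draw
        dZ = phi * math.sqrt(dt)        # equation (2.5)
        path.append(path[-1] + alpha * dt + sigma * dZ)   # equation (2.4)
    return path


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical parameters: alpha = 0.5, sigma = 1, horizon T = 2.
    ends = [brownian_path(0.0, 0.5, 1.0, 2.0, 50, rng)[-1] for _ in range(5000)]
    mean = sum(ends) / len(ends)
    var = sum((e - mean) ** 2 for e in ends) / (len(ends) - 1)
    print("sample mean of X(T) - X(0):", mean)   # compare with alpha * T
    print("sample variance:", var)               # compare with sigma^2 * T
```

Since the increments are independent, the sample mean and variance of X(T) − X(0) over many paths should be close to αT and σ²T.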
If E is the expectation operator, then
$$E(\phi) = 0, \qquad E(\phi^2) = 1. \tag{2.6}$$
Now in a time interval dt, we have
$$E(dX) = E(\alpha\,dt) + E(\sigma\,dZ) = \alpha\,dt, \tag{2.7}$$
and the variance of dX, denoted by Var(dX) is
$$Var(dX) = E([dX - E(dX)]^2) = E([\sigma\,dZ]^2) = \sigma^2\,dt. \tag{2.8}$$
Let's look at a discrete model to understand this process more completely. Suppose that we have a discrete lattice of points. Let $X = X_0$ at t = 0. Suppose that at t = ∆t,
$$\begin{aligned} X_0 &\rightarrow X_0 + \Delta h, && \text{with probability } p \\ X_0 &\rightarrow X_0 - \Delta h, && \text{with probability } q \end{aligned} \tag{2.9}$$
where p + q = 1. Assume that
• X follows a Markov process, i.e. the probability distribution in the future depends only on where it is now.
• The probability of an up or down move is independent of what happened in the past.
• X can move only up or down ∆h.
At any lattice point $X_0 + i\Delta h$, the probability of an up move is p, and the probability of a down move is q. The probabilities of reaching any particular lattice point for the first three moves are shown in Figure 2.2. Each move takes place in the time interval t → t + ∆t.
[Figure 2.2: lattice diagram. After one move: $X_0 + \Delta h$ (probability $p$), $X_0 - \Delta h$ ($q$). After two moves: $X_0 + 2\Delta h$ ($p^2$), $X_0$ ($2pq$), $X_0 - 2\Delta h$ ($q^2$). After three moves: $X_0 + 3\Delta h$ ($p^3$), $X_0 + \Delta h$ ($3p^2q$), $X_0 - \Delta h$ ($3pq^2$), $X_0 - 3\Delta h$ ($q^3$).]
Figure 2.2: Probabilities of reaching the discrete lattice points for the first three moves.
Let ∆X be the change in X over the interval t → t + ∆t. Then
$$\begin{aligned} E(\Delta X) &= (p - q)\Delta h \\ E([\Delta X]^2) &= p(\Delta h)^2 + q(-\Delta h)^2 = (\Delta h)^2, \end{aligned} \tag{2.10}$$
so that the variance of ∆X is (over t → t + ∆t)
$$Var(\Delta X) = E([\Delta X]^2) - [E(\Delta X)]^2 = (\Delta h)^2 - (p - q)^2(\Delta h)^2 = 4pq(\Delta h)^2. \tag{2.11}$$
Now, suppose we consider the distribution of X after n moves, so that t = n∆t. The probability of j up moves, and (n − j) down moves (P(n, j)) is
$$P(n, j) = \frac{n!}{j!(n - j)!}\,p^j q^{n-j} \tag{2.12}$$
which is just a binomial distribution. Now, if $X_n$ is the value of X after n steps on the lattice, then
$$\begin{aligned} E(X_n - X_0) &= n E(\Delta X) \\ Var(X_n - X_0) &= n Var(\Delta X), \end{aligned} \tag{2.13}$$
which follows from the properties of a binomial distribution (each up or down move is independent of previous moves). Consequently, from equations (2.10, 2.11, 2.13) we obtain
$$\begin{aligned} E(X_n - X_0) &= n(p - q)\Delta h = \frac{t}{\Delta t}(p - q)\Delta h \\ Var(X_n - X_0) &= n\,4pq(\Delta h)^2 = \frac{t}{\Delta t}\,4pq(\Delta h)^2. \end{aligned} \tag{2.14}$$
Now, we would like to take the limit as ∆t → 0 in such a way that the mean and variance of X, after a finite time t, are independent of ∆t, and we would like to recover
$$\begin{aligned} dX &= \alpha\,dt + \sigma\,dZ \\ E(dX) &= \alpha\,dt \\ Var(dX) &= \sigma^2\,dt \end{aligned} \tag{2.15}$$
as ∆t → 0. Now, since 0 ≤ p, q ≤ 1, we need to choose $\Delta h = \text{Const}\sqrt{\Delta t}$. Otherwise, from equation (2.14) we get that $Var(X_n - X_0)$ is either 0 or infinite after a finite time. (Stock variances do not have either of these properties, so this is obviously not a very interesting case). Let's choose $\Delta h = \sigma\sqrt{\Delta t}$, which gives (from equation (2.14))
$$\begin{aligned} E(X_n - X_0) &= (p - q)\frac{\sigma t}{\sqrt{\Delta t}} \\ Var(X_n - X_0) &= t\,4pq\,\sigma^2. \end{aligned} \tag{2.16}$$
Now, for $E(X_n - X_0)$ to be independent of ∆t as ∆t → 0, we must have
$$(p - q) = \text{Const.}\sqrt{\Delta t}. \tag{2.17}$$
If we choose
$$p - q = \frac{\alpha}{\sigma}\sqrt{\Delta t} \tag{2.18}$$
we get
$$\begin{aligned} p &= \frac{1}{2}\left[1 + \frac{\alpha}{\sigma}\sqrt{\Delta t}\right] \\ q &= \frac{1}{2}\left[1 - \frac{\alpha}{\sigma}\sqrt{\Delta t}\right]. \end{aligned} \tag{2.19}$$
Now, putting together equations (2.16-2.19) gives
$$\begin{aligned} E(X_n - X_0) &= \alpha t \\ Var(X_n - X_0) &= t\sigma^2\left(1 - \frac{\alpha^2}{\sigma^2}\Delta t\right) \\ &= t\sigma^2\ ; \quad \Delta t \rightarrow 0. \end{aligned} \tag{2.20}$$
Now, let's imagine that $X(t_n) - X(t_0) = X_n - X_0$ is very small, so that $X_n - X_0 \simeq dX$ and $t_n - t_0 \simeq dt$, so that equation (2.20) becomes
$$\begin{aligned} E(dX) &= \alpha\,dt \\ Var(dX) &= \sigma^2\,dt, \end{aligned} \tag{2.21}$$
which agrees with equations (2.7-2.8). Hence, in the limit as ∆t → 0, we can interpret the random walk for X on the lattice (with these parameters) as the solution to the stochastic differential equation (SDE)
$$\begin{aligned} dX &= \alpha\,dt + \sigma\,dZ \\ dZ &= \phi\sqrt{dt}. \end{aligned} \tag{2.22}$$
Consider the case where α = 0, σ = 1, so that $dX = dZ \simeq Z(t_i) - Z(t_{i-1}) = Z_i - Z_{i-1} = X_i - X_{i-1}$. Now we can write
$$\int_0^t dZ = \lim_{\Delta t \rightarrow 0}\sum_i (Z_{i+1} - Z_i) = (Z_n - Z_0). \tag{2.23}$$
From equation (2.20) (α = 0, σ = 1) we have
$$\begin{aligned} E(Z_n - Z_0) &= 0 \\ Var(Z_n - Z_0) &= t. \end{aligned} \tag{2.24}$$
Now, if n is large (∆t → 0), recall that the binomial distribution (2.12) tends to a normal distribution. From equation (2.24), we have that the mean of this distribution is zero, with variance t, so that
$$(Z_n - Z_0) \sim N(0, t) = \int_0^t dZ. \tag{2.25}$$
In other words, after a finite time t, $\int_0^t dZ$ is normally distributed with mean zero and variance t (the limit of a binomial distribution is a normal distribution).
Recall that $Z_i - Z_{i-1} = \sqrt{\Delta t}$ with probability p and $Z_i - Z_{i-1} = -\sqrt{\Delta t}$ with probability q. Note that $(Z_i - Z_{i-1})^2 = \Delta t$, with certainty, so that we can write
$$(Z_i - Z_{i-1})^2 \simeq (dZ)^2 = \Delta t. \tag{2.26}$$
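The lattice construction can be checked numerically. The sketch below simulates the random walk with ∆h = σ√∆t and the up probability p of equation (2.19), then compares the sample mean and variance of X_n − X_0 with equation (2.20). The function name, the parameter values and the number of paths are assumptions chosen for the illustration.

```python
import math
import random


def lattice_walk_endpoints(alpha, sigma, t, n_steps, n_paths, seed=0):
    """Simulate X_n - X_0 for the binomial lattice walk of equations (2.9)-(2.19)."""
    rng = random.Random(seed)
    dt = t / n_steps
    dh = sigma * math.sqrt(dt)                               # step size, Delta h = sigma sqrt(Delta t)
    p = 0.5 * (1.0 + (alpha / sigma) * math.sqrt(dt))        # up probability, equation (2.19)
    ends = []
    for _ in range(n_paths):
        ups = sum(1 for _ in range(n_steps) if rng.random() < p)
        downs = n_steps - ups
        ends.append((ups - downs) * dh)                      # net displacement after n moves
    return ends


if __name__ == "__main__":
    # Hypothetical parameters: alpha = 0.1, sigma = 0.2, t = 1.
    ends = lattice_walk_endpoints(0.1, 0.2, 1.0, n_steps=100, n_paths=4000)
    mean = sum(ends) / len(ends)
    var = sum((e - mean) ** 2 for e in ends) / (len(ends) - 1)
    print("mean:", mean, "expected alpha * t =", 0.1)
    print("variance:", var, "expected about sigma^2 * t =", 0.04)
```

As the number of steps grows (∆t → 0), the correction factor (1 − α²∆t/σ²) in the variance tends to one and a histogram of the endpoints approaches a normal density.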
To summarize
• We can interpret the SDE
$$\begin{aligned} dX &= \alpha\,dt + \sigma\,dZ \\ dZ &= \phi\sqrt{dt} \end{aligned} \tag{2.27}$$
as the limit of a discrete random walk on a lattice as the timestep tends to zero.
• Var(dZ) = dt, otherwise, after any finite time, the $Var(X_n - X_0)$ is either zero or infinite.
• We can integrate the term dZ to obtain
$$\int_0^t dZ = Z(t) - Z(0) \sim N(0, t). \tag{2.28}$$
Going back to our lattice example, note that the total distance traveled over any finite interval of time becomes infinite,
$$E(|\Delta X|) = \Delta h \tag{2.29}$$
so that the total distance traveled in n steps is
$$n\,\Delta h = \frac{t}{\Delta t}\Delta h = \frac{t\sigma}{\sqrt{\Delta t}} \tag{2.30}$$
which goes to infinity as ∆t → 0. Similarly,
$$\frac{\Delta x}{\Delta t} \rightarrow \pm\infty. \tag{2.31}$$
Consequently, Brownian motion is very jagged at every timescale. These paths are not differentiable, i.e. dx/dt does not exist, so we cannot speak of
$$E\left(\frac{dx}{dt}\right) \tag{2.32}$$
but we can possibly define
$$\frac{E(dx)}{dt}. \tag{2.33}$$

2.6 Geometric Brownian motion with drift
Of course, the actual path followed by stock is more complex than the simple situation described above. More realistically, we assume that the relative changes in stock prices (the returns) follow Brownian motion with drift. We suppose that in an infinitesimal time dt, the stock price S changes to S + dS, where
$$\frac{dS}{S} = \mu\,dt + \sigma\,dZ \tag{2.34}$$
where µ is the drift rate, σ is the volatility, and dZ is the increment of a Wiener process,
$$dZ = \phi\sqrt{dt} \tag{2.35}$$
where φ ∼ N(0, 1). Equations (2.34) and (2.35) are called geometric Brownian motion with drift. So, superimposed on the upward (relative) drift is a (relative) random walk. The degree of randomness is given by the volatility σ. Figure 2.3 gives an illustration of ten realizations of this random process for two different values of the volatility. In this case, we assume that the drift rate µ equals the risk free rate.
[Figure 2.3: two panels of simulated paths, Asset Price ($) against Time (years), each with the "Risk Free Return" curve marked. Left panel: "Low Volatility Case, σ = .20 per year". Right panel: "High Volatility Case, σ = .40 per year".]
Figure 2.3: Realizations of asset price following geometric Brownian motion. Left: low volatility case; right: high volatility case. Risk-free rate of return r = .05.
Note that
$$E(dS) = E(\sigma S\,dZ + \mu S\,dt) = \mu S\,dt \quad \text{since } E(dZ) = 0 \tag{2.36}$$
and that the variance of dS is
$$Var[dS] = E(dS^2) - [E(dS)]^2 = E(\sigma^2 S^2 dZ^2) = \sigma^2 S^2\,dt \tag{2.37}$$
so that σ is a measure of the degree of randomness of the stock price movement.
Equation (2.34) is a stochastic differential equation. The normal rules of calculus don't apply, since for example
$$\frac{dZ}{dt} = \phi\frac{1}{\sqrt{dt}} \rightarrow \infty \quad \text{as } dt \rightarrow 0.$$
The study of these sorts of equations uses results from stochastic calculus. However, for our purposes, we need only one result, which is Ito's Lemma (see Derivatives: the theory and practice of financial engineering, by P. Wilmott). Suppose we have some function G = G(S, t), where S follows the stochastic process equation (2.34), then, in small time increment dt, G → G + dG, where
$$dG = \left(\mu S\frac{\partial G}{\partial S} + \frac{\sigma^2 S^2}{2}\frac{\partial^2 G}{\partial S^2} + \frac{\partial G}{\partial t}\right)dt + \sigma S\frac{\partial G}{\partial S}\,dZ \tag{2.38}$$
An informal derivation of this result is given in the following section.

2.6.1 Ito's Lemma
We give an informal derivation of Ito's lemma (2.38). Suppose we have a variable S which follows
$$dS = a(S, t)\,dt + b(S, t)\,dZ \tag{2.39}$$
where dZ is the increment of a Wiener process. Now since
$$dZ^2 = \phi^2\,dt \tag{2.40}$$
where φ is a random variable drawn from a normal distribution with mean zero and unit variance, we have that, if E is the expectation operator, then
$$E(\phi) = 0, \qquad E(\phi^2) = 1 \tag{2.41}$$
so that the expected value of $dZ^2$ is
$$E(dZ^2) = dt \tag{2.42}$$
Now, it can be shown (see Section 6) that in the limit as dt → 0, we have that $\phi^2 dt$ becomes non-stochastic, so that with probability one
$$dZ^2 \rightarrow dt \quad \text{as } dt \rightarrow 0 \tag{2.43}$$
Now, suppose we have some function G = G(S, t), then
$$dG = G_S\,dS + G_t\,dt + G_{SS}\frac{dS^2}{2} + \ldots \tag{2.44}$$
Now (from (2.39))
$$(dS)^2 = (a\,dt + b\,dZ)^2 = a^2\,dt^2 + 2ab\,dZ\,dt + b^2\,dZ^2 \tag{2.45}$$
Since $dZ = O(\sqrt{dt})$ and $dZ^2 \rightarrow dt$, equation (2.45) becomes
$$(dS)^2 = b^2\,dZ^2 + O((dt)^{3/2}) \tag{2.46}$$
or
$$(dS)^2 \rightarrow b^2\,dt \quad \text{as } dt \rightarrow 0 \tag{2.47}$$
Now, equations (2.39, 2.44, 2.47) give
$$\begin{aligned} dG &= G_S\,dS + G_t\,dt + G_{SS}\frac{dS^2}{2} + \ldots \\ &= G_S(a\,dt + b\,dZ) + dt\left(G_t + G_{SS}\frac{b^2}{2}\right) \\ &= G_S\,b\,dZ + \left(a\,G_S + G_{SS}\frac{b^2}{2} + G_t\right)dt \end{aligned} \tag{2.48}$$
So, we have the result that if
$$dS = a(S, t)\,dt + b(S, t)\,dZ \tag{2.49}$$
and if G = G(S, t), then
$$dG = G_S\,b\,dZ + \left(a\,G_S + G_{SS}\frac{b^2}{2} + G_t\right)dt \tag{2.50}$$
Equation (2.38) can be deduced by setting a = µS and b = σS in equation (2.50).
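The key step in the informal derivation is equation (2.43), the claim that dZ² behaves like dt. One way to get a feeling for this is to add up the squared increments φ²∆t over a fixed interval [0, t] and watch the sum settle down to t as the number of steps grows. The following is a small sketch under that interpretation; the function name and step counts are hypothetical choices.

```python
import math
import random


def sum_squared_increments(t, n_steps, rng):
    """Sum of (dZ)^2 over [0, t], where each dZ = phi sqrt(dt) and phi ~ N(0, 1)."""
    dt = t / n_steps
    total = 0.0
    for _ in range(n_steps):
        dZ = rng.gauss(0.0, 1.0) * math.sqrt(dt)
        total += dZ * dZ
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    for n_steps in (10, 1000, 100000):
        # Each term has mean dt and variance 2 dt^2, so the sum has mean t
        # and variance 2 t^2 / n_steps, which shrinks as n_steps grows.
        print(n_steps, sum_squared_increments(1.0, n_steps, rng))
```

The individual increments dZ remain random at every step size, but the sum of their squares becomes essentially deterministic, which is why the dZ² term in equation (2.45) can be replaced by dt.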
2.6.2 Some uses of Ito's Lemma
Suppose we have
$$dS = \mu\,dt + \sigma\,dZ. \tag{2.51}$$
If µ, σ = Const., then this can be integrated (from t = 0 to t = t) exactly to give
$$S(t) = S(0) + \mu t + \sigma(Z(t) - Z(0)) \tag{2.52}$$
and from equation (2.28)
$$Z(t) - Z(0) \sim N(0, t) \tag{2.53}$$
Suppose instead we use the more usual geometric Brownian motion
$$dS = \mu S\,dt + \sigma S\,dZ \tag{2.54}$$
Let F(S) = log S, and use Ito's Lemma
$$\begin{aligned} dF &= F_S\,S\sigma\,dZ + \left(F_S\,\mu S + F_{SS}\frac{\sigma^2 S^2}{2} + F_t\right)dt \\ &= \left(\mu - \frac{\sigma^2}{2}\right)dt + \sigma\,dZ, \end{aligned} \tag{2.55}$$
so that we can integrate this to get
$$F(t) = F(0) + \left(\mu - \frac{\sigma^2}{2}\right)t + \sigma(Z(t) - Z(0)) \tag{2.56}$$
or, since $S = e^F$,
$$S(t) = S(0)\exp\left[\left(\mu - \frac{\sigma^2}{2}\right)t + \sigma(Z(t) - Z(0))\right]. \tag{2.57}$$
Unfortunately, these cases are about the only situations where we can exactly integrate the SDE (constant σ, µ).

2.6.3 Some more uses of Ito's Lemma
We can often use Ito's Lemma and some algebraic tricks to determine some properties of distributions. Let
$$dX = a(X, t)\,dt + b(X, t)\,dZ, \tag{2.58}$$
then if G = G(X, t), then
$$dG = \left[a\,G_X + G_t + \frac{b^2}{2}G_{XX}\right]dt + G_X\,b\,dZ. \tag{2.59}$$
If $E[X] = \bar{X}$, then (b(X, t) and dZ are independent)
$$E[dX] = d\,E[X] = d\bar{X} = E[a\,dt] + E[b]\,E[dZ] = E[a\,dt], \tag{2.60}$$
so that
$$\begin{aligned} \frac{d\bar{X}}{dt} &= E[a] = \bar{a} \\ \bar{X} &= E\left[\int_0^t a\,dt\right]. \end{aligned} \tag{2.61}$$
Let $\bar{G} = E[(X - \bar{X})^2] = var(X)$, then
$$\begin{aligned} d\bar{G} &= E[dG] \\ &= E[2(X - \bar{X})a - 2(X - \bar{X})\bar{a} + b^2]\,dt + E[2b(X - \bar{X})]\,E[dZ] \\ &= E[b^2\,dt] + E[2(X - \bar{X})(a - \bar{a})\,dt], \end{aligned} \tag{2.62}$$
which means that
$$\bar{G} = var(X) = E\left[\int_0^t b^2\,dt\right] + E\left[\int_0^t 2(a - \bar{a})(X - \bar{X})\,dt\right]. \tag{2.63}$$
In a particular case, we can sometimes get more useful expressions. If
$$dS = \mu S\,dt + \sigma S\,dZ \tag{2.64}$$
with µ, σ constant, then
$$E[dS] = d\bar{S} = E[\mu S]\,dt = \mu\bar{S}\,dt, \tag{2.65}$$
so that
$$\begin{aligned} d\bar{S} &= \mu\bar{S}\,dt \\ \bar{S} &= S_0 e^{\mu t}. \end{aligned} \tag{2.66}$$
Now, let $G(S) = S^2$, so that $E[G] = \bar{G} = E[S^2]$, then (from Ito's Lemma)
$$\begin{aligned} d\bar{G} &= E[2\mu S^2 + \sigma^2 S^2]\,dt + E[2S^2\sigma]\,E[dZ] \\ &= E[2\mu S^2 + \sigma^2 S^2]\,dt \\ &= (2\mu + \sigma^2)\bar{G}\,dt, \end{aligned} \tag{2.67}$$
so that
$$\begin{aligned} \bar{G} &= \bar{G}_0\,e^{(2\mu + \sigma^2)t} \\ E[S^2] &= S_0^2\,e^{(2\mu + \sigma^2)t}. \end{aligned} \tag{2.68}$$
From equations (2.66) and (2.68) we then have
$$\begin{aligned} var(S) &= E[S^2] - (E[S])^2 \\ &= E[S^2] - \bar{S}^2 \\ &= S_0^2\,e^{2\mu t}(e^{\sigma^2 t} - 1) \\ &= \bar{S}^2(e^{\sigma^2 t} - 1). \end{aligned} \tag{2.69}$$
One can use the same ideas to compute the skewness, $E[(S - \bar{S})^3]$. If $G(S) = S^3$ and $\bar{G} = E[G(S)] = E[S^3]$, then
$$\begin{aligned} d\bar{G} &= E[\mu S \cdot 3S^2 + \sigma^2 S^2/2 \cdot 3 \cdot 2S]\,dt + E[3S^2\sigma S]\,E[dZ] \\ &= E[3\mu S^3 + 3\sigma^2 S^3]\,dt \\ &= 3(\mu + \sigma^2)\bar{G}\,dt, \end{aligned} \tag{2.70}$$
so that
$$\bar{G} = E[S^3] = S_0^3\,e^{3(\mu + \sigma^2)t}. \tag{2.71}$$
We can then obtain the skewness from
$$\begin{aligned} E[(S - \bar{S})^3] &= E[S^3 - 3S^2\bar{S} + 3S\bar{S}^2 - \bar{S}^3] \\ &= E[S^3] - 3\bar{S}\,E[S^2] + 2\bar{S}^3. \end{aligned} \tag{2.72}$$
Equations (2.66, 2.68, 2.71) can then be substituted into equation (2.72) to get the desired result.
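The moment formulas (2.66), (2.68) and (2.69) are easy to sanity check by sampling S(t) from the exact solution (2.57). The following is a minimal sketch of such a check; the function names and the parameter values (S0 = 100, µ = .08, σ = .2, t = 1) are assumptions made for the example.

```python
import math
import random


def gbm_moments_exact(S0, mu, sigma, t):
    """E[S], E[S^2] and var(S) from equations (2.66), (2.68) and (2.69)."""
    mean = S0 * math.exp(mu * t)
    second = S0 ** 2 * math.exp((2.0 * mu + sigma ** 2) * t)
    var = mean ** 2 * (math.exp(sigma ** 2 * t) - 1.0)
    return mean, second, var


def gbm_moments_mc(S0, mu, sigma, t, n_paths, seed=0):
    """Monte Carlo estimates of the same quantities, sampling S(t) from (2.57)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_paths):
        z = math.sqrt(t) * rng.gauss(0.0, 1.0)   # Z(t) - Z(0) ~ N(0, t), equation (2.53)
        samples.append(S0 * math.exp((mu - 0.5 * sigma ** 2) * t + sigma * z))
    mean = sum(samples) / n_paths
    second = sum(s * s for s in samples) / n_paths
    return mean, second, second - mean ** 2


if __name__ == "__main__":
    print("exact:      ", gbm_moments_exact(100.0, 0.08, 0.2, 1.0))
    print("Monte Carlo:", gbm_moments_mc(100.0, 0.08, 0.2, 1.0, 100000))
```

The two sets of numbers should agree to within sampling error, which decreases like one over the square root of the number of samples.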
e. sold it. P → P + dP .75). at any instant in time. i.34) and (2. (This is the analogue of our choice of the amount of stock in the riskless portfolio for the two state tree model.74) is actually the change in the value of the portfolio. • There are no arbitrage opportunities.7 The BlackScholes Analysis Assume • The stock price follows geometric Brownian motion.78) 14 . letting (αh ) = VS then substituting equation (2.74) Note that in equation (2. consisting of one option. turbulent ﬂow). t).75) We can make this portfolio riskless over the time interval dt. Substituting equations (2. However. This is actually a rather subtle point. and are obligated to give it back at some future date).73) In a small time dt. we can own negative quantities of an asset). The principle of no peeking into the future is why Ito stochastic calculus is used.34).77) (2. and then S changes randomly.78) give rP dt = Vt + σ2 S 2 VSS dt 2 (2. i. which is not permitted by the noarbitrage condition) and hence (αh ) is not allowed to contain any information about future asset price movements. The value of this portfolio P is P = V − (αh )S (2. equations (2. we have borrowed an asset.76) into equation (2. dP = dV − (αh )dS h (2. then noarbitrage says that dP = rP dt Therefore.74) gives dP = σS VS − (αh ) dZ + µSVS + σ2 S 2 VSS + Vt − µ(αh )S dt 2 (2. we could get rich without risk. if we think of a real situation. • Short selling is permitted (i. (If (αh ) > 0. and then we hold the portfolio while the asset moves randomly.) So. all riskfree portfolios must earn the riskfree rate of return. If we were taking a true diﬀerential then equation (2.38) into equation (2.75) gives dP = Vt + σ2 S 2 VSS dt 2 (2. since we pick (αh ).e. equation (2.74) we not included a term (α )S S. Other forms of stochastic calculus are used in Physics applications (i. Construct an imaginary portfolio. not a diﬀerential. This eliminates the dZ term in equation (2.e.e.75). by choosing (αh ) = VS in equation (2. 
and a number of (−(αh )) of the underlying asset.77) and (2.79) (2. equation (2. We are not allowed to peek into the future. we must choose (αh ). then we have sold the asset short. Suppose that we have an option whose value is given by V = V (S.2. (otherwise.74) would be dP = dV − (αh )dS − Sd(αh ) but we have to remember that (αh ) does not change over a small time interval. since we shall see (later on) that (αh ) actually depends on S. So.76) Since P is now riskfree in the interval t → t + dt. • The riskfree rate of return is a constant r.
Since
$$P = V - (\alpha^h)S = V - V_S S \tag{2.80}$$
then substituting equation (2.80) into equation (2.79) gives
$$V_t + \frac{\sigma^2 S^2}{2}V_{SS} + rSV_S - rV = 0 \tag{2.81}$$
which is the Black-Scholes equation. Note the rather remarkable fact that equation (2.81) is independent of the drift rate µ.
Equation (2.81) is solved backwards in time from the option expiry time t = T to the present t = 0.

2.8 Hedging in Continuous Time
We can construct a hedging strategy based on the solution to the above equation. Suppose we sell an option at price V at t = 0. Then we carry out the following
• We sell one option worth V. (This gives us V in cash initially).
• We borrow $(S\,\partial V/\partial S - V)$ from the bank.
• We buy $\partial V/\partial S$ shares at price S.
At every instant in time, we adjust the amount of stock we own so that we always have $\partial V/\partial S$ shares. Note that this is a dynamic hedge, since we have to continually rebalance the portfolio. Cash will flow into and out of the bank account, in response to changes in S. If the amount in the bank is positive, we receive the risk free rate of return. If negative, then we borrow at the risk free rate.
So, our hedging portfolio will be
• Short one option worth V.
• Long $\partial V/\partial S$ shares at price S.
• $V - S\,\partial V/\partial S$ cash in the bank account.
At any instant in time (including the terminal time), this portfolio can be liquidated and any obligations implied by the short position in the option can be covered, at zero gain or loss, regardless of the value of S. Note that given the receipt of the cash for the option, this strategy is self-financing.

2.9 The option price
So, we can see that the price of the option valued by the Black-Scholes equation is the market price of the option at any time. If the price was higher than the Black-Scholes price, we could construct the hedging portfolio, dynamically adjust the hedge, and end up with a positive amount at the end. Similarly, if the price was lower than the Black-Scholes price, we could short the hedging portfolio, and end up with a positive gain. By the no-arbitrage condition, this should not be possible.
Note that we are not trying to predict the price movements of the underlying asset, which is a random process. The value of the option is based on a hedging strategy which is dynamic, and must be continuously rebalanced. The price is the cost of setting up the hedging portfolio. The Black-Scholes price is not the expected payoff.
The price given by the Black-Scholes equation is not the value of the option to a speculator, who buys and holds the option. A speculator is making bets about the underlying drift rate of the stock (note that the drift rate does not appear in the Black-Scholes equation). For a speculator, the value of the option is given by an equation similar to the Black-Scholes equation, except that the drift rate appears. In this case, the price can be interpreted as the expected payoff based on the guess for the drift rate. But this is art, not science!
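The self-financing hedge of Section 2.8 can be imitated on a computer by rebalancing at a finite number of dates instead of continuously. The sketch below is only an illustration under several assumptions that are not part of the derivation above: it uses the standard closed-form Black-Scholes price and delta for a European call (not derived in these notes up to this point), it simulates the stock with a real-world drift µ that the hedger never uses, and all function names and parameter values are hypothetical.

```python
import math
import random


def norm_cdf(x):
    """Cumulative distribution function of N(0, 1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(S, K, r, sigma, tau):
    """Closed-form call price and delta (dV/dS); assumed here, not derived above."""
    if tau <= 0.0:
        return max(S - K, 0.0), (1.0 if S > K else 0.0)
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2), norm_cdf(d1)


def hedging_error(S0, K, r, sigma, mu, T, n_steps, rng):
    """Sell one call, delta hedge at n_steps dates, return the terminal gain or loss."""
    dt = T / n_steps
    price, delta = bs_call(S0, K, r, sigma, T)
    S = S0
    cash = price - delta * S            # V - S dV/dS in the bank (negative means a loan)
    for i in range(1, n_steps + 1):
        phi = rng.gauss(0.0, 1.0)
        # The stock moves with the real-world drift mu.
        S *= math.exp((mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * phi)
        cash *= math.exp(r * dt)        # bank account accrues at the risk-free rate
        if i < n_steps:
            _, new_delta = bs_call(S, K, r, sigma, T - i * dt)
            cash -= (new_delta - delta) * S   # rebalance so that we hold dV/dS shares
            delta = new_delta
    # Liquidate the stock and pay the option holder.
    return cash + delta * S - max(S - K, 0.0)


def mean_abs_error(mu, n_steps, n_paths=300, seed=1):
    """Average absolute hedging error over n_paths simulated paths."""
    rng = random.Random(seed)
    errs = [hedging_error(100.0, 100.0, 0.05, 0.2, mu, 1.0, n_steps, rng)
            for _ in range(n_paths)]
    return sum(abs(e) for e in errs) / n_paths


if __name__ == "__main__":
    for mu in (0.05, 0.15):
        for n_steps in (10, 50, 250):
            print("mu =", mu, "rebalances =", n_steps,
                  "mean |error| =", mean_abs_error(mu, n_steps))
```

With discrete rebalancing the hedge is no longer perfect, but the terminal error should shrink as the number of rebalancing dates increases, and it should stay small relative to the option price whatever drift µ is used to generate the paths. This is consistent with the statement that the Black-Scholes price is the cost of the hedge and not an expected payoff.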
2.10 American early exercise

Actually, most options traded are American options, which have the feature that they can be exercised at any time. Consequently, an investor acting optimally will always exercise the option if the value falls below the payoff or exercise value. So, the value of an American option is given by the solution to equation (2.81) with the additional constraint

$$V(S,t) \ge \begin{cases} \max(S-K,0) & \text{for a call} \\ \max(K-S,0) & \text{for a put} \end{cases} \tag{2.82}$$

Note that since we are working backwards in time, we know what the option is worth in future, and therefore we can determine the optimal course of action.

In order to write equation (2.81) in more conventional form, define $\tau = T - t$, so that equation (2.81) becomes

$$\begin{aligned} V_\tau &= \frac{\sigma^2 S^2}{2} V_{SS} + rSV_S - rV \\ V(S,\tau=0) &= \begin{cases} \max(S-K,0) & \text{for a call} \\ \max(K-S,0) & \text{for a put} \end{cases} \\ V(0,\tau) &\to V_\tau = -rV \\ V(S=\infty,\tau) &\to \begin{cases} S & \text{for a call} \\ 0 & \text{for a put} \end{cases} \end{aligned} \tag{2.83}$$

If the option is American, then we also have the additional constraints

$$V(S,\tau) \ge \begin{cases} \max(S-K,0) & \text{for a call} \\ \max(K-S,0) & \text{for a put} \end{cases} \tag{2.84}$$

Define the operator

$$LV \equiv V_\tau - \left( \frac{\sigma^2 S^2}{2} V_{SS} + rSV_S - rV \right) \tag{2.85}$$

and let $V(S,0) = V^*$. More formally, the American option pricing problem can be stated as

$$\begin{aligned} LV &\ge 0 \\ V - V^* &\ge 0 \\ (V - V^*)\,LV &= 0 \end{aligned} \tag{2.86}$$

3 The Risk Neutral World

Suppose instead of valuing an option using the above no-arbitrage argument, we wanted to know the expected value of the option. We can imagine that we are buying and holding the option, and not hedging. If we are considering the value of risky cash flows in the future, then these cash flows should be discounted at an appropriate discount rate, which we will call $\rho$ (i.e. the riskier the cash flows, the higher $\rho$). Consequently the value of an option today can be considered to be the discounted future value. This is simply the old idea of net present value. Regard $S$ today as known, and let $V(S+dS, t+dt)$ be the value of the option at some future time $t+dt$, which is uncertain, since $S$ evolves randomly. Thus

$$V(S,t) = \frac{1}{1+\rho\,dt}\, E\big(V(S+dS,\, t+dt)\big) \tag{3.1}$$

where $E(\cdot)$ is the expectation operator, i.e. the expected value of $V(S+dS, t+dt)$ given that $V = V(S,t)$ at $t = t$. We can rewrite equation (3.1) as (ignoring terms of $o(dt)$, where $o(dt)$ represents terms that go to zero faster than $dt$)

$$\rho\,dt\,V(S,t) = E\big(V(S,t) + dV\big) - V(S,t). \tag{3.2}$$
2 (3. Then.2) becomes ρdtV (S. However.12) .11) which should be the noarbitrage value. this result is useful for determining the noarbitrage value of an option using a Monte Carlo approach. In other words. maybe this is the value that we are interested in. t) + dV ) − V (S.9) then we simply get the BlackScholes equation (2. t) = E(dV ) .9) Vt + σ2 S 2 VSS + µSVS dt . T )) (3. It is best to think of this as simply a mathematical ﬂuke. 2 17 (3. Suppose we want to know the expected value (in the real world) of an asset which pays V (S. then E(V (S. Investors would be very stupid to think that the drift rate of risky investments is r. so that equation (3. Nevertheless. Estimating the appropriate discount rate is always a thorny issue.3) (3. we have to estimate the drift rate µ. 0) = e−rT EQ (V (S. S σ2 S 2 VSS + µSVS dt + σSdZ .9) is the PDE for the expected value of an option. note the interesting fact. we can determine the noarbitrage price by pretending we are living in a world where all assets drift at rate r.6) Equation (3. I’d rather just buy riskfree bonds in this case.8) gives Vt + σ2 S 2 VSS + µSVS − ρV = 0 . This makes it clear that we are taking the expectation in the risk neutral world (the expectation in the Q measure). This means that the noarbitrage price of an option is identical to the expected value if ρ = r and µ = r. the expected value (today) is given by solving Vt + σ2 S 2 VSS + µSVS = 0 . if this is the case. t) = E(dV ) . If we are not hedging.5) From Ito’s Lemma (2.81). This result is the source of endless confusion. then we compute V (S.4) (3. There is in reality no such thing as a riskneutral world. This is the socalled risk neutral world. t = T ) at t = T in the future.8) = 0 (3. if we set ρ = r and µ = r in equation (3. This does not have any reality. Assume that dS = µdt + σdZ . 2 (3. not the noarbitrage value.38) we have that dV Noting that E(dZ) then E(dV ) = Combining equations (3. we simply assume that dS = rSdt + σSdZ (3. 2 (3. 
and the discount rate ρ. If we know the option payoﬀ as a function of S at t = T . Now.7) = Vt + (3. This contrasts with the realworld expectation (the P measure). and all investments are discounted at rate r.43.Since we regard V as known today. Note the EQ in the above equation. Using this numerical method.10) and simulate a large number of random paths.
where we have dropped the discounting term. In particular, suppose we are going to receive $V = S(t=T)$, i.e. just the asset at $t = T$. Assume that the solution to equation (3.12) is $V = \text{Const.}\,A(t)S$, and we find that

$$V = \text{Const.}\; S e^{\mu(T-t)}. \tag{3.13}$$

Noting that we receive $V = S$ at $t = T$ means that

$$V = S e^{\mu(T-t)}. \tag{3.14}$$

Today, we can acquire the asset for price $S(t=0)$. At $t = T$, the asset is worth $S(t=T)$. Equation (3.14) then says that

$$V(S(t=0),\, t=0) = E[S(t=T)] = S(t=0)\,e^{\mu T}. \tag{3.15}$$

In other words, if

$$dS = S\mu\,dt + S\sigma\,dZ \tag{3.16}$$

then (setting $t = T$)

$$E[S] = S e^{\mu t}. \tag{3.17}$$

Recall that the exact solution to equation (3.16) is (equation (2.57))

$$S(t) = S(0)\exp\left[\left(\mu - \frac{\sigma^2}{2}\right)t + \sigma\big(Z(t) - Z(0)\big)\right]. \tag{3.18}$$

So that we have just shown that $E[S] = Se^{\mu t}$ by using a simple PDE argument and Ito's Lemma. Isn't this easier than using brute force statistics? PDEs are much more elegant.

4 Monte Carlo Methods

This brings us to the simplest numerical method for computing the no-arbitrage value of an option. Suppose that we assume that the underlying process is

$$\frac{dS}{S} = r\,dt + \sigma\,dZ \tag{4.1}$$

then we can simulate a path forward in time, starting at some price today $S^0$, using a forward Euler timestepping method ($S^i = S(t_i)$)

$$S^{i+1} = S^i + S^i\left(r\Delta t + \sigma\phi^i\sqrt{\Delta t}\right) \tag{4.2}$$

where $\Delta t$ is the finite timestep, and $\phi^i$ is a random number which is $N(0,1)$. Note that at each timestep, we generate a new random number. After $N$ steps, with $T = N\Delta t$, we have a single realized path. Given the payoff function of the option, the value for this path would be

$$\text{Value} = \text{Payoff}(S^N). \tag{4.3}$$

For example, if the option was a European call, then

$$\text{Value} = \max(S^N - K, 0), \qquad K = \text{Strike Price}. \tag{4.4}$$

Suppose we run a series of trials, $m = 1, \ldots, M$, and denote the payoff after the $m$'th trial as $\text{payoff}(m)$. Then, the no-arbitrage value of the option is

$$\text{Option Value} = e^{-rT} E(\text{payoff}) \simeq e^{-rT}\,\frac{1}{M}\sum_{m=1}^{M} \text{payoff}(m). \tag{4.5}$$

Recall that these paths are not the real paths, but are the risk neutral paths. Now, we should remember that we are
1. approximating the solution to the SDE by forward Euler, which has $O(\Delta t)$ truncation error.
2. approximating the expectation by the mean of many random paths. This Monte Carlo error is of size $O(1/\sqrt{M})$, which is slowly converging.

There are thus two sources of error in the Monte Carlo approach: timestepping error and sampling error. The slow rate of convergence of Monte Carlo methods makes these techniques unattractive except when the option is written on several (i.e. more than three) underlying assets. As well, since we are simulating forward in time, we cannot know at a given point in the forward path if it is optimal to exercise or hold an American style option. This is easy if we use a PDE method, since we solve the PDE backwards in time, so we always know the continuation value and hence can act optimally. However, if we have more than three factors, PDE methods become very expensive computationally. As well, if we want to determine the effects of discrete hedging, for example, a Monte Carlo approach is very easy to implement.

The error in the Monte Carlo method is then

$$\text{Error} = O\left(\max\left(\Delta t, \frac{1}{\sqrt{M}}\right)\right), \qquad \Delta t = \text{timestep}, \quad M = \text{number of Monte Carlo paths}. \tag{4.6}$$

Now, it doesn't make sense to drive the Monte Carlo error down to zero if there is $O(\Delta t)$ timestepping error. We should seek to balance the timestepping error and the sampling error. In order to make these two errors the same order, we should choose $M = O\left(\frac{1}{(\Delta t)^2}\right)$. This makes the total error $O(\Delta t)$. We also have that

$$\text{Complexity} = O\left(\frac{M}{\Delta t}\right) = O\left(\frac{1}{(\Delta t)^3}\right), \qquad \Delta t = O\left((\text{Complexity})^{-1/3}\right) \tag{4.7}$$

and hence

$$\text{Error} = O\left(\frac{1}{(\text{Complexity})^{1/3}}\right). \tag{4.8}$$

In practice, the convergence in terms of timestep error is often not done. People just pick a timestep, i.e. one day, and increase the number of Monte Carlo samples until they achieve convergence in terms of sampling error, and ignore the timestep error. Sometimes this gives bad results!

Note that the exact solution to Geometric Brownian motion (2.57) has the property that the asset value $S$ can never reach $S = 0$ if $S(0) > 0$, in any finite time. However, due to the approximate nature of our Forward Euler method for solving the SDE, it is possible that a negative or zero $S^i$ can show up. We can do one of three things here, in this case

• Cut back the timestep at this point in the simulation so that $S$ is positive.
• Set $S = 0$ and continue. In this case, $S$ remains zero for the rest of this particular simulation.
• Use Ito's Lemma, and determine the SDE for $\log S$, i.e. if $F = \log S$, then, from equation (2.55), we obtain (with $\mu = r$)

$$dF = \left(r - \frac{\sigma^2}{2}\right)dt + \sigma\,dZ, \tag{4.9}$$

so that now, if $F < 0$, there is no problem, since $S = e^F$, and if $F < 0$, this just means that $S$ is very small. We can use this idea for any stochastic process where the variable should not go negative.
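As a rough illustration of the third option above combined with the estimator (4.5), the following Python sketch timesteps $F = \log S$ with forward Euler and averages the discounted payoff of a European call. The function name, its parameters and the use of Python's `random` module are assumptions made for this example only; they do not come from these notes. Note that for constant $r$ and $\sigma$ the Euler step in $F$ happens to be exact in distribution, so this particular sketch only carries sampling error.

```python
import math
import random


def mc_european_call(s0, strike, r, sigma, T, n_steps, n_paths, seed=0):
    """Monte Carlo estimate of a European call, timestepping F = log S.

    Hypothetical helper written for illustration; all names are assumptions.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    drift = (r - 0.5 * sigma ** 2) * dt   # deterministic part of dF, eq. (4.9)
    vol = sigma * math.sqrt(dt)           # scale of the random part of dF
    total = 0.0
    for _ in range(n_paths):
        f = math.log(s0)
        for _ in range(n_steps):
            f += drift + vol * rng.gauss(0.0, 1.0)   # new N(0,1) draw each step
        total += max(math.exp(f) - strike, 0.0)      # call payoff at t = T
    return math.exp(-r * T) * total / n_paths        # discounted mean, eq. (4.5)
```

Because $S = e^F$ is recovered only at the end of each path, a negative asset price can never appear, whatever the timestep.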
Usually, most people set $S = 0$ and continue. As long as the timestep is not too large, this situation is probably due to an event of low probability, hence any errors incurred will not affect the expected value very much. If negative $S$ values show up many times, this is a signal that the timestep is too large.

In the case of simple Geometric Brownian motion, where $r$, $\sigma$ are constants, then the SDE can be solved exactly, and we can avoid timestepping errors (see Section 2.6.2). In this case

$$S(T) = S(0)\exp\left[\left(r - \frac{\sigma^2}{2}\right)T + \sigma\phi\sqrt{T}\right] \tag{4.10}$$

where $\phi \sim N(0,1)$. I'll remind you that equation (4.10) is exact. For these simple cases, we should always use equation (4.10). Unfortunately, this does not work in more realistic situations.

Monte Carlo is popular because

• It is simple to code. Easily handles complex path dependence.
• Easily handles multiple assets.

The disadvantages of Monte Carlo methods are

• It is difficult to apply this idea to problems involving optimal decision making (e.g. American options).
• It is hard to compute the Greeks ($V_S$, $V_{SS}$), which are the hedging parameters, very accurately.
• MC converges slowly.

4.1 Monte Carlo Error Estimators

The sampling error can be estimated via a statistical approach. If the estimated mean of the sample is

$$\hat\mu = \frac{e^{-rT}}{M}\sum_{m=1}^{M} \text{payoff}(m) \tag{4.11}$$

and the standard deviation of the estimate is

$$\omega = \left[\frac{1}{M-1}\sum_{m=1}^{M}\left(e^{-rT}\,\text{payoff}(m) - \hat\mu\right)^2\right]^{1/2} \tag{4.12}$$

then the 95% confidence interval for the actual value $V$ of the option is

$$\hat\mu - \frac{1.96\,\omega}{\sqrt{M}} < V < \hat\mu + \frac{1.96\,\omega}{\sqrt{M}}. \tag{4.13}$$

Note that in order to reduce this error by a factor of 10, the number of simulations must be increased by 100.

The timestep error can be estimated by running the problem with different size timesteps, comparing the solutions.

4.2 Random Numbers and Monte Carlo

There are many good algorithms for generating random sequences which are uniformly distributed in $[0,1]$. See for example, (Numerical Recipes in C++, Press et al, Cambridge University Press, 2002). As pointed out in this book, often the system supplied random number generators, such as rand in the standard C library, and the infamous RANDU IBM function, are extremely bad. The Matlab functions appear to be quite good. For more details, please look at (Park and Miller, ACM Transactions on Mathematical Software, 31 (1988) 1192-1201). Another good generator is described in (Matsumoto and Nishimura, "The Mersenne Twister: a 623 dimensionally equidistributed uniform pseudorandom number generator," ACM Transactions
on Modelling and Computer Simulation, 8 (1998) 3-30.) Code can be downloaded from the authors' Web site.

However, we need random numbers which are normally distributed on $[-\infty, +\infty]$, with mean zero and variance one ($N(0,1)$).

Suppose we have uniformly distributed numbers on $[0,1]$, i.e. the probability of obtaining a number between $x$ and $x + dx$ is

$$p(x)\,dx = \begin{cases} dx, & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases} \tag{4.14}$$

Let's take a function of this random variable $y(x)$. How is $y(x)$ distributed? Let $\hat p(y)$ be the probability distribution of obtaining $y$ in $[y, y+dy]$. Consequently, we must have (recall the law of transformation of probabilities)

$$p(x)\,|dx| = \hat p(y)\,|dy| \quad\text{or}\quad \hat p(y) = p(x)\left|\frac{dx}{dy}\right|. \tag{4.15}$$

Suppose we want $\hat p(y)$ to be normal,

$$\hat p(y) = \frac{e^{-y^2/2}}{\sqrt{2\pi}}. \tag{4.16}$$

If we start with a uniform distribution, $p(x) = 1$ on $[0,1]$, then from equations (4.15-4.16) we obtain

$$\frac{dx}{dy} = \frac{e^{-y^2/2}}{\sqrt{2\pi}}. \tag{4.17}$$

Now, for $x \in [0,1]$, we have that the probability of obtaining a number in $[0,x]$ is

$$\int_0^x dx' = x, \tag{4.18}$$

but from equation (4.17) we have

$$dx' = \frac{e^{-(y')^2/2}}{\sqrt{2\pi}}\,dy'. \tag{4.19}$$

So, there exists a $y$ such that the probability of getting a $y'$ in $[-\infty, y]$ is equal to the probability of getting $x'$ in $[0,x]$,

$$\int_0^x dx' = \int_{-\infty}^{y} \frac{e^{-(y')^2/2}}{\sqrt{2\pi}}\,dy', \tag{4.20}$$

or

$$x = \int_{-\infty}^{y} \frac{e^{-(y')^2/2}}{\sqrt{2\pi}}\,dy'. \tag{4.21}$$

So, if we generate uniformly distributed numbers $x$ on $[0,1]$, then to determine $y$ which are $N(0,1)$, we do the following

• Generate $x$
• Find $y$ such that

$$x = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{y} e^{-(y')^2/2}\,dy'. \tag{4.22}$$

We can write this last step as

$$y = F(x) \tag{4.23}$$

where $F(x)$ is the inverse cumulative normal distribution.
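The two steps above (generate a uniform $x$, then find $y = F(x)$) can be sketched in Python as follows. This is only an illustrative sketch: it assumes that the standard library's `statistics.NormalDist.inv_cdf` is an acceptable stand-in for $F$, and the function name `normal_by_inversion` is made up for the example.

```python
import random
from statistics import NormalDist


def normal_by_inversion(n, seed=0):
    """Return n draws that are N(0,1), obtained by inverting uniform draws.

    Hypothetical helper for illustration; not part of the original notes.
    """
    rng = random.Random(seed)
    std_normal = NormalDist(0.0, 1.0)
    draws = []
    while len(draws) < n:
        x = rng.random()                 # uniform on [0, 1)
        if x > 0.0:                      # inv_cdf is only defined for 0 < x < 1
            draws.append(std_normal.inv_cdf(x))   # y = F(x), eq. (4.23)
    return draws
```

The guard on `x > 0.0` reflects the fact that $F(0) = -\infty$, so an exact zero from the uniform generator has to be discarded.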
4.3 The Box-Muller Algorithm

Starting from random numbers which are uniformly distributed on $[0,1]$, there is actually a simpler method for obtaining random numbers which are normally distributed.

If $p(x)$ is the probability of finding $x \in [x, x+dx]$ and if $y = y(x)$, and $\hat p(y)$ is the probability of finding $y \in [y, y+dy]$, then, from equation (4.15) we have

$$|p(x)\,dx| = |\hat p(y)\,dy| \quad\text{or}\quad \hat p(y) = p(x)\left|\frac{dx}{dy}\right|. \tag{4.24}$$

Now, suppose we have two original random variables $x_1, x_2$, and let $p(x_1, x_2)$ be the probability of obtaining $(x_1, x_2)$ in $[x_1, x_1+dx_1] \times [x_2, x_2+dx_2]$. Then, if

$$y_1 = y_1(x_1, x_2) \tag{4.25}$$
$$y_2 = y_2(x_1, x_2) \tag{4.26}$$

we have that

$$\hat p(y_1, y_2) = p(x_1, x_2)\left|\frac{\partial(x_1, x_2)}{\partial(y_1, y_2)}\right| \tag{4.27}$$

where the Jacobian of the transformation is defined as

$$\frac{\partial(x_1, x_2)}{\partial(y_1, y_2)} = \det\begin{bmatrix} \dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2} \\[2mm] \dfrac{\partial x_2}{\partial y_1} & \dfrac{\partial x_2}{\partial y_2} \end{bmatrix} \tag{4.28}$$

Recall that the Jacobian of the transformation can be regarded as the scaling factor which transforms $dx_1\,dx_2$ to $dy_1\,dy_2$, i.e.

$$dx_1\,dx_2 = \left|\frac{\partial(x_1, x_2)}{\partial(y_1, y_2)}\right| dy_1\,dy_2. \tag{4.29}$$

Now, suppose that we have $x_1, x_2$ uniformly distributed on $[0,1] \times [0,1]$, i.e.

$$p(x_1, x_2) = U(x_1)\,U(x_2) \tag{4.30}$$

where

$$U(x) = \begin{cases} 1, & 0 \le x \le 1 \\ 0, & \text{otherwise.} \end{cases} \tag{4.31}$$

We denote this distribution as $x_1 \sim U[0,1]$ and $x_2 \sim U[0,1]$. If $p(x_1, x_2)$ is given by equation (4.30), then we have from equation (4.27) that

$$\hat p(y_1, y_2) = \left|\frac{\partial(x_1, x_2)}{\partial(y_1, y_2)}\right|. \tag{4.32}$$

Now, we want to find a transformation $y_1 = y_1(x_1, x_2)$, $y_2 = y_2(x_1, x_2)$ which results in normal distributions for $y_1, y_2$. Consider

$$\begin{aligned} y_1 &= \sqrt{-2\log x_1}\,\cos 2\pi x_2 \\ y_2 &= \sqrt{-2\log x_1}\,\sin 2\pi x_2 \end{aligned} \tag{4.33}$$
or solving for $(x_1, x_2)$

$$\begin{aligned} x_1 &= \exp\left(-\frac{1}{2}\left(y_1^2 + y_2^2\right)\right) \\ x_2 &= \frac{1}{2\pi}\tan^{-1}\left(\frac{y_2}{y_1}\right). \end{aligned} \tag{4.34}$$

After some tedious algebra, we can see that (using equation (4.34))

$$\left|\frac{\partial(x_1, x_2)}{\partial(y_1, y_2)}\right| = \frac{1}{\sqrt{2\pi}}e^{-y_1^2/2}\,\frac{1}{\sqrt{2\pi}}e^{-y_2^2/2}. \tag{4.35}$$

Now, assuming that equation (4.30) holds, then from equations (4.32-4.35) we have

$$\hat p(y_1, y_2) = \frac{1}{\sqrt{2\pi}}e^{-y_1^2/2}\,\frac{1}{\sqrt{2\pi}}e^{-y_2^2/2} \tag{4.36}$$

so that $(y_1, y_2)$ are independent, normally distributed random variables, with mean zero and variance one, or

$$y_1 \sim N(0,1)\,; \quad y_2 \sim N(0,1). \tag{4.37}$$

This gives the following algorithm for generating normally distributed random numbers (given uniformly distributed numbers):

Box Muller Algorithm

Repeat
   Generate $u_1 \sim U(0,1)$, $u_2 \sim U(0,1)$
   $\theta = 2\pi u_2$, $\rho = \sqrt{-2\log u_1}$
   $z_1 = \rho\cos\theta$; $z_2 = \rho\sin\theta$
End Repeat (4.38)

This has the effect that $Z_1 \sim N(0,1)$ and $Z_2 \sim N(0,1)$. Note that we generate two draws from a normal distribution on each pass through the loop.

4.3.1 An improved Box Muller

The algorithm (4.38) can be expensive due to the trigonometric function evaluations. We can use the following method to avoid these evaluations. Let

$$\begin{aligned} U_1 &\sim U[0,1]\,; \quad U_2 \sim U[0,1] \\ V_1 &= 2U_1 - 1\,; \quad V_2 = 2U_2 - 1 \end{aligned} \tag{4.39}$$

which means that $(V_1, V_2)$ are uniformly distributed in $[-1,1] \times [-1,1]$. Now, we carry out the following procedure

Rejection Method

Repeat
   If ( $V_1^2 + V_2^2 < 1$ )
      Accept
   Else
      Reject
   Endif
End Repeat (4.40)

which means that if we define $(V_1, V_2)$ as in equation (4.39), and then process the pairs $(V_1, V_2)$ using algorithm (4.40), we have that $(V_1, V_2)$ are uniformly distributed on the disk centered at the origin, with radius one, in the $(V_1, V_2)$ plane. This is denoted by

$$(V_1, V_2) \sim D(0,1). \tag{4.41}$$

If $(V_1, V_2) \sim D(0,1)$ and $R^2 = V_1^2 + V_2^2$, then the probability of finding $R$ in $[R, R+dR]$ is

$$p(R)\,dR = \frac{2\pi R\,dR}{\pi(1)^2} = 2R\,dR. \tag{4.42}$$
From the fundamental law of transformation of probabilities, we have that

$$p(R^2)\,d(R^2) = p(R)\,dR = 2R\,dR \tag{4.43}$$

so that

$$p(R^2) = \frac{2R}{\dfrac{d(R^2)}{dR}} = 1 \tag{4.44}$$
so that $R^2$ is uniformly distributed on $[0,1]$, ($R^2 \sim U[0,1]$). As well, if $\theta = \tan^{-1}(V_2/V_1)$, i.e. $\theta$ is the angle between a line from the origin to the point $(V_1, V_2)$ and the $V_1$ axis, then $\theta \sim U[0, 2\pi]$. Note that

$$\cos\theta = \frac{V_1}{\sqrt{V_1^2 + V_2^2}}, \qquad \sin\theta = \frac{V_2}{\sqrt{V_1^2 + V_2^2}}. \tag{4.45}$$
Now in the original Box Muller algorithm (4.38),

$$\begin{aligned} \rho &= \sqrt{-2\log U_1}\,; \quad U_1 \sim U[0,1] \\ \theta &= 2\pi U_2\,; \quad U_2 \sim U[0,1], \end{aligned} \tag{4.46}$$
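but $\theta = \tan^{-1}(V_2/V_1) \sim U[0, 2\pi]$, and $R^2 \sim U[0,1]$. Therefore, if we let $W = R^2$, then we can replace $\theta$, $\rho$ in algorithm (4.38) by

$$\begin{aligned} \theta &= \tan^{-1}\left(\frac{V_2}{V_1}\right) \\ \rho &= \sqrt{-2\log W}. \end{aligned} \tag{4.47}$$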
Now, the last step in the Box Muller algorithm (4.38) is

$$\begin{aligned} Z_1 &= \rho\cos\theta \\ Z_2 &= \rho\sin\theta, \end{aligned} \tag{4.48}$$

but since $W = R^2 = V_1^2 + V_2^2$, then $\cos\theta = V_1/R$, $\sin\theta = V_2/R$, so that

$$\begin{aligned} Z_1 &= \rho\,\frac{V_1}{\sqrt{W}} \\ Z_2 &= \rho\,\frac{V_2}{\sqrt{W}}. \end{aligned} \tag{4.49}$$

This leads to the following algorithm

Polar form of Box Muller

Repeat
   Generate $U_1 \sim U[0,1]$, $U_2 \sim U[0,1]$.
   Let
      $V_1 = 2U_1 - 1$
      $V_2 = 2U_2 - 1$
      $W = V_1^2 + V_2^2$
   If ( $W < 1$ ) then
      $Z_1 = V_1\sqrt{-2\log W / W}$
      $Z_2 = V_2\sqrt{-2\log W / W}$ (4.50)
   End If
End Repeat
Consequently, (Z1 , Z2 ) are independent (uncorrelated), and Z1 ∼ N (0, 1), and Z2 ∼ N (0, 1). Because of the rejection step (4.40), about (1 − π/4) of the random draws in [−1, +1] × [−1, +1] are rejected (about 21%), but this method is still generally more eﬃcient than brute force Box Muller.
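A minimal Python sketch of the polar form (4.50) is given below. The function name and the use of the standard `random` module for the uniform draws are assumptions made for this illustration. The sketch also rejects the (very unlikely) draw $W = 0$, which algorithm (4.50) does not mention but which would otherwise cause a division by zero.

```python
import math
import random


def polar_box_muller(n_pairs, seed=0):
    """Return 2 * n_pairs N(0,1) draws using the polar form of Box Muller.

    Hypothetical helper for illustration; names are assumptions.
    """
    rng = random.Random(seed)
    out = []
    while len(out) < 2 * n_pairs:
        v1 = 2.0 * rng.random() - 1.0      # V1 = 2 U1 - 1
        v2 = 2.0 * rng.random() - 1.0      # V2 = 2 U2 - 1
        w = v1 * v1 + v2 * v2
        if 0.0 < w < 1.0:                  # keep only points inside the unit disk
            scale = math.sqrt(-2.0 * math.log(w) / w)
            out.append(v1 * scale)         # Z1
            out.append(v2 * scale)         # Z2
    return out
```

No trigonometric functions are evaluated; the price is that roughly 21% of the uniform pairs are thrown away by the rejection test.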
4.4 Speeding up Monte Carlo

Monte Carlo methods are slow to converge, since the error is given by

$$\text{Error} = O\left(\frac{1}{\sqrt{M}}\right)$$
where $M$ is the number of samples. There are many methods which can be used to try to speed up convergence. These are usually termed Variance Reduction techniques. Perhaps the simplest idea is the Antithetic Variable method. Suppose we compute a random asset path

$$S^{i+1} = S^i + S^i\mu\Delta t + S^i\sigma\phi^i\sqrt{\Delta t}$$

where $\phi^i$ are $N(0,1)$. We store all the $\phi^i$, $i = 1, \ldots$, for a given path. Call the estimate for the option price from this sample path $V^+$. Then compute a second sample path where $(\phi^i)' = -\phi^i$, $i = 1, \ldots$. Call this estimate $V^-$. Then compute the average

$$\bar V = \frac{V^+ + V^-}{2},$$

and continue sampling in this way. Averaging over all the $\bar V$, slightly faster convergence is obtained. Intuitively, we can see that this symmetrizes the random paths.
Let $X^+$ be the option values obtained from all the $V^+$ simulations, and $X^-$ be the estimates obtained from all the $V^-$ simulations. Note that $Var(X^+) = Var(X^-)$ (they have the same distribution). Then

$$\begin{aligned} Var\left(\frac{X^+ + X^-}{2}\right) &= \frac{1}{4}Var(X^+) + \frac{1}{4}Var(X^-) + \frac{1}{2}Cov(X^+, X^-) \\ &= \frac{1}{2}Var(X^+) + \frac{1}{2}Cov(X^+, X^-) \end{aligned} \tag{4.51}$$
which will be smaller than $Var(X^+)$ if $Cov(X^+, X^-)$ is nonpositive. Warning: this is not always the case. For example, if the payoff is not a monotonic function of $S$, the results may actually be worse than crude Monte Carlo. For example, if the payoff is a capped call

$$\text{payoff} = \min(K_2, \max(S - K_1, 0)), \qquad K_2 > K_1$$

then the antithetic method performs poorly.

Note that this method can be used to estimate the mean. In the MC error estimator (4.13), compute the standard deviation of the estimator as

$$\omega = \sqrt{Var\left(\frac{X^+ + X^-}{2}\right)}.$$

However, if we want to estimate the distribution of option prices (i.e. a probability distribution), then we should not average each $V^+$ and $V^-$, since this changes the variance of the actual distribution. If we want to know the actual variance of the distribution (and not just the mean), then to compute the variance of the distribution, we should just use the estimates $V^+$, and compute the estimate of the variance in the usual way. This should also be used if we want to plot a histogram of the distribution, or compute the Value at Risk.
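The antithetic procedure described in this section might be sketched in Python as below, for a European call priced along forward Euler paths. Everything named here (`antithetic_call`, its parameters, the risk neutral drift $r$ in place of $\mu$, the use of `random.gauss`) is an assumption for the purpose of illustration. The standard error is computed from the averaged pairs $\bar V$, following the remark about the MC error estimator above.

```python
import math
import random


def antithetic_call(s0, strike, r, sigma, T, n_steps, n_pairs, seed=0):
    """Antithetic-variable estimate of a European call (illustrative sketch)."""
    rng = random.Random(seed)
    dt = T / n_steps
    sqdt = math.sqrt(dt)
    disc = math.exp(-r * T)

    def path_value(phis, sign):
        # Forward Euler path driven by sign * phi, then the discounted payoff.
        s = s0
        for phi in phis:
            s += s * (r * dt + sigma * sign * phi * sqdt)
        return disc * max(s - strike, 0.0)

    vbars = []
    for _ in range(n_pairs):
        phis = [rng.gauss(0.0, 1.0) for _ in range(n_steps)]  # stored draws
        v_plus = path_value(phis, +1.0)      # V+
        v_minus = path_value(phis, -1.0)     # V-, same draws with flipped sign
        vbars.append(0.5 * (v_plus + v_minus))

    mean = sum(vbars) / n_pairs
    var = sum((v - mean) ** 2 for v in vbars) / (n_pairs - 1)
    return mean, math.sqrt(var / n_pairs)    # estimate and its standard error
```

Returning the pairs' standard error makes it easy to compare against a crude Monte Carlo run with the same total number of paths, which is the practical test of whether the method helps for a given payoff.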
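4.5 Estimating the mean and variance

An estimate of the mean $\bar x$ and variance $s_M^2$ of $M$ numbers $x_1, x_2, \ldots, x_M$ is

$$\begin{aligned} s_M^2 &= \frac{1}{M-1}\sum_{i=1}^{M}(x_i - \bar x)^2 \\ \bar x &= \frac{1}{M}\sum_{i=1}^{M} x_i \end{aligned} \tag{4.52}$$

Alternatively, one can use

$$s_M^2 = \frac{1}{M-1}\left[\sum_{i=1}^{M} x_i^2 - \frac{1}{M}\left(\sum_{i=1}^{M} x_i\right)^2\right] \tag{4.53}$$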
which has the advantage that the estimate of the mean and standard deviation can be computed in one loop. In order to avoid roundoff, the following method is suggested by Seydel (R. Seydel, Tools for Computational Finance, Springer, 2002). Set

$$\alpha_1 = x_1\,; \quad \beta_1 = 0 \tag{4.54}$$

then compute recursively

$$\begin{aligned} \alpha_i &= \alpha_{i-1} + \frac{x_i - \alpha_{i-1}}{i} \\ \beta_i &= \beta_{i-1} + \frac{(i-1)(x_i - \alpha_{i-1})^2}{i} \end{aligned} \tag{4.55}$$

so that

$$\begin{aligned} \bar x &= \alpha_M \\ s_M^2 &= \frac{\beta_M}{M-1} \end{aligned} \tag{4.56}$$
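The recursion (4.54)-(4.56) translates almost line by line into code. The following Python sketch is one possible rendering; the function name `running_mean_variance` is an assumption made for this example.

```python
def running_mean_variance(xs):
    """One-pass mean and sample variance using the recursion (4.54)-(4.56).

    Hypothetical helper for illustration; not taken from Seydel's book.
    """
    alpha = None      # running mean, alpha_i
    beta = 0.0        # running sum of squared deviations, beta_i
    count = 0
    for x in xs:
        count += 1
        if count == 1:
            alpha = float(x)                         # alpha_1 = x_1, beta_1 = 0
        else:
            delta = x - alpha                        # x_i - alpha_{i-1}
            alpha += delta / count
            beta += (count - 1) * delta * delta / count
    if count < 2:
        raise ValueError("need at least two samples for a variance estimate")
    return alpha, beta / (count - 1)
```

Because only differences from the running mean are squared, the sketch avoids the cancellation that equation (4.53) suffers when the samples are large compared with their spread.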
4.6 Low Discrepancy Sequences

In an effort to get around the $1/\sqrt{M}$ ($M$ = number of samples) behaviour of Monte Carlo methods, quasi-Monte Carlo methods have been devised. These techniques use a deterministic sequence of numbers (low discrepancy sequences). The idea here is that a Monte Carlo method does not fill the sample space very evenly (after all, it's random). A low discrepancy sequence tends to sample the space in an orderly fashion. If $d$ is the dimension of the space, then the worst case error bound for an LDS method is

$$\text{Error} = O\left(\frac{(\log M)^d}{M}\right) \tag{4.57}$$

where $M$ is the number of samples used. Clearly, if $d$ is small, then this error bound is (at least asymptotically) better than Monte Carlo.

LDS methods generate numbers on $[0,1]$. We cannot use the Box-Muller method in this case to produce normally distributed numbers, since these numbers are deterministic. We have to invert the cumulative normal distribution in order to get the numbers distributed with mean zero and standard deviation one on $[-\infty, +\infty]$. So, if $F(x)$ is the inverse cumulative normal distribution, then

$$\begin{aligned} x_{LDS} &= \text{uniformly distributed on } [0,1] \\ y_{LDS} &= F(x_{LDS}) \text{ is } N(0,1). \end{aligned} \tag{4.58}$$

Another problem has to do with the fact that if we are stepping through time, i.e.

$$S^{n+1} = S^n + S^n\left(r\Delta t + \phi\sigma\sqrt{\Delta t}\right), \qquad \phi = N(0,1) \tag{4.59}$$

with, say, $N$ steps in total, then we need to think of this as a problem in $N$ dimensional space. In other words, the $k$-th timestep is sampled from the $k$-th coordinate in this $N$ dimensional space. We are trying to uniformly sample from this $N$ dimensional space.

Let $\hat x$ be a vector of LDS numbers on $[0,1]$, in $N$ dimensional space

$$\hat x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix}. \tag{4.60}$$

So, an LDS algorithm would proceed as follows, for the $j$'th trial

• Generate $\hat x^j$ (the $j$'th LDS number in an $N$ dimensional space).
• Generate the normally distributed vector $\hat y^j$ by inverting the cumulative normal distribution for each component

$$\hat y^j = \begin{bmatrix} F(x_1^j) \\ F(x_2^j) \\ \vdots \\ F(x_N^j) \end{bmatrix} \tag{4.61}$$

• Generate a complete sample path $k = 0, \ldots, N-1$

$$S_j^{k+1} = S_j^k + S_j^k\left(r\Delta t + \hat y_{k+1}^j\,\sigma\sqrt{\Delta t}\right) \tag{4.62}$$

• Compute the payoff at $S = S_j^N$
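The per-trial steps above can be sketched in Python as follows. To keep the example self-contained, it swaps in a hand-written Halton sequence (one prime base per timestep) as the low discrepancy generator; this is purely an assumption for illustration, not the generator used for any results in these notes. The names `first_primes`, `radical_inverse` and `halton_call` are likewise made up, and `statistics.NormalDist.inv_cdf` stands in for $F$.

```python
import math
from statistics import NormalDist


def first_primes(n):
    """Return the first n primes by trial division (fine for small n)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def radical_inverse(index, base):
    """Van der Corput radical inverse of a positive integer in a given base."""
    result, frac = 0.0, 1.0 / base
    while index > 0:
        result += (index % base) * frac
        index //= base
        frac /= base
    return result


def halton_call(s0, strike, r, sigma, T, n_steps, n_trials):
    """European call priced with Halton points, one coordinate per timestep."""
    inv_cdf = NormalDist().inv_cdf
    bases = first_primes(n_steps)
    dt = T / n_steps
    total = 0.0
    for j in range(1, n_trials + 1):       # start at 1 so no coordinate is 0
        s = s0
        for k in range(n_steps):
            x = radical_inverse(j, bases[k])   # k-th coordinate of j-th point
            y = inv_cdf(x)                     # y = F(x), eq. (4.61)
            s += s * (r * dt + y * sigma * math.sqrt(dt))   # eq. (4.62)
        total += max(s - strike, 0.0)
    return math.exp(-r * T) * total / n_trials
```

The sketch is only sensible for a small number of timesteps: plain Halton points are known to degrade as the dimension grows, which is one practical reason to prefer a carefully constructed generator in real work.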
The option value is the average of these trials.

There are a variety of LDS numbers: Halton, Sobol, Niederreiter, etc. Our tests seem to indicate that Sobol is the best.

Note that the worst case error bound for the error is given by equation (4.57). If we use a reasonable number of timesteps, say 50 - 100, then, $d = 50 - 100$, which gives a very bad error bound. For $d$ large, the numerator in equation (4.57) dominates. The denominator only dominates when

$$M \simeq e^d \tag{4.63}$$

which is a very large number of trials for $d \simeq 100$. Fortunately, at least for path-dependent options, we have found that things are not quite this bad, and LDS seems to work if the number of timesteps is less than 100 - 200. However, once the dimensionality gets above a few hundred, convergence seems to slow down.

4.7 Correlated Random Numbers

In many cases involving multiple assets, we would like to generate correlated, normally distributed random numbers. Suppose we have $i = 1, \ldots, d$ assets, and each asset follows the simulated path

$$S_i^{n+1} = S_i^n + S_i^n\left(r\Delta t + \phi_i^n\sigma_i\sqrt{\Delta t}\right) \tag{4.64}$$

where $\phi_i^n$ is $N(0,1)$ and

$$E(\phi_i^n\phi_j^n) = \rho_{ij} \tag{4.65}$$

where $\rho_{ij}$ is the correlation between asset $i$ and asset $j$.

Now, it is easy to generate a set of $d$ uncorrelated $N(0,1)$ variables. Call these $\epsilon_1, \ldots, \epsilon_d$. So, how do we produce correlated numbers? Let

$$[\Psi]_{ij} = \rho_{ij} \tag{4.66}$$

be the matrix of correlation coefficients. Assume that this matrix is SPD (if not, one of the random variables is a linear combination of the others, hence this is a degenerate case). Assuming $\Psi$ is SPD, we can Cholesky factor $\Psi = LL^t$, so that

$$\rho_{ij} = \sum_k L_{ik}L^t_{kj}. \tag{4.67}$$

Let $\bar\phi$ be the vector of correlated normally distributed random numbers (i.e. what we want to get), and let $\bar\epsilon$ be the vector of uncorrelated $N(0,1)$ numbers (i.e. what we are given),

$$\bar\phi = \begin{bmatrix} \phi_1 \\ \phi_2 \\ \vdots \\ \phi_d \end{bmatrix}; \qquad \bar\epsilon = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_d \end{bmatrix}. \tag{4.68}$$

So, given $\bar\epsilon$, we have

$$E(\epsilon_i\epsilon_j) = \delta_{ij}$$

where

$$\delta_{ij} = \begin{cases} 0, & \text{if } i \ne j \\ 1, & \text{if } i = j, \end{cases}$$
SIAM Review vol.73) (4. check out (“An algorithmic introduction to numerical simulation of stochastic diﬀerential equations. This article also has some good tips on solving SDEs using Matlab. (4.2) to integrate dS = µS dt + σS dZ . Recall that √ dZ = φ dt (4. Now.” by D.69) which gives φi φk = j l Lij Lkl Lij j l l j = Now. taking full advantage of the vectorization of Matlab.71) ¯ φ = L¯ 4. in order to generate correlated N (0.70) j l Lij Lij E( t l j Llk t l j )Llk = j l Lij δlj Lt lk Lil Lt lk l = = ρij So.since the i are uncorrelated. let’s consider the following situation. The forward Euler algorithm is simply S i+1 √ = S i + S i (µh + φi h) (4. For a good overview of these methods. 1) numbers • Correlated numbers φi are given from i (4. we have been fairly slack about deﬁning what we mean by convergence when we use forward Euler timestepping (4. Note that eliminating as many for loops as possible (i.e. 43 (2001) 525546).74) 29 . computing all the MC realizations for each timestep in a vector) can reduce computation time by orders of magnitude.8 Integration of Stochastic Diﬀerential Equations Up to now. Higham. E(φi φk ) = E = j l t l j Llk . 1) numbers: • Factor the correlation matrix Ψ = LLt • Generate uncorrelated N (0. Before we start deﬁning what we mean by convergence. let φi = j Lij j (4. in particular.72) where h = ∆t is the ﬁnite timestep.
where $\phi$ is a random draw from a normal distribution with mean zero and variance one. Let's imagine generating a set of $Z$ values at discrete times $t_i$, e.g. $Z(t_i) = Z_i$, by

$$Z_{i+1} = Z_i + \phi\sqrt{\Delta t}. \tag{4.75}$$

Now, these are all legitimate points along a Brownian motion path, since there is no timestepping error here, in view of equation (2.53). So, this set of values $\{Z_0, Z_1, \ldots\}$ are valid points along a Brownian path. Now, recall that the exact solution (for a given Brownian path) of equation (4.72) is given by equation (2.57)

$$S(T) = S(0)\exp\left[\left(\mu - \frac{\sigma^2}{2}\right)T + \sigma\big(Z(T) - Z(0)\big)\right] \tag{4.76}$$

where $T$ is the stopping time of the simulation.

Now if we integrate equation (4.72) using forward Euler, with the discrete timesteps $\Delta t = t_{i+1} - t_i$, using the realization of the Brownian path $\{Z_0, Z_1, \ldots\}$, we will not get the exact solution (4.76). This is because even though the Brownian path points are exact, time discretization errors are introduced in equation (4.73). So, how can we systematically study convergence of algorithm (4.73)? We can simply take smaller timesteps. However, we want to do this by filling in new $Z$ values in the Brownian path, while keeping the old values (since these are perfectly legitimate values). Let $S(T)^h$ represent the forward Euler solution (4.73) for a fixed timestep $h$. Let $S(T)$ be the exact solution (4.76). As $h \to 0$, we would expect $S(T)^h \to S(T)$, for a given path.

4.8.1 The Brownian Bridge

So, given a set of valid $Z_k$, how do we refine this path, keeping the existing points along this path? In particular, suppose we have two points $Z_i, Z_k$, at $(t_i, t_k)$, and we would like to generate a point $Z_j$ at $t_j$, with $t_i < t_j < t_k$. How should we pick $Z_j$? What density function should we use when generating $Z_j$, given that $Z_k$ is known?

Let $x, y$ be two draws from a normal distribution with mean zero and variance one. Suppose we have the point $Z(t_i) = Z_i$ and we generate $Z(t_j) = Z_j$, $Z(t_k) = Z_k$ along the Wiener path,

$$Z_j = Z_i + x\sqrt{t_j - t_i} \tag{4.77}$$
$$Z_k = Z_j + y\sqrt{t_k - t_j} \tag{4.78}$$
$$Z_k = Z_i + x\sqrt{t_j - t_i} + y\sqrt{t_k - t_j}. \tag{4.79}$$

So, given $(x, y)$, and $Z_i$, we can generate $Z_j, Z_k$. Suppose on the other hand, we have $Z_i$, and we generate $Z_k$ directly using

$$Z_k = Z_i + z\sqrt{t_k - t_i}, \tag{4.80}$$

where $z$ is $N(0,1)$. Then how do we generate $Z_j$ using equation (4.77)? Since we are now specifying that we know $Z_k$, this means that our method for generating $Z_j$ is constrained. For example, given $z$, we must have that, from equations (4.79) and (4.80)

$$y = \frac{z\sqrt{t_k - t_i} - x\sqrt{t_j - t_i}}{\sqrt{t_k - t_j}}. \tag{4.81}$$

Now the probability density of drawing the pair $(x, y)$ given $z$, denoted by $p(x, y|z)$ is

$$p(x, y|z) = \frac{p(x)\,p(y)}{p(z)} \tag{4.82}$$

where $p(\cdot)$ is a standard normal distribution, and we have used the fact that successive increments of a Brownian process are uncorrelated.
From equation (4.81), we can write $y = y(x, z)$, so that $p(x, y|z) = p(x, y(x,z)|z)$

$$p(x, y(x,z)|z) = \frac{p(x)\,p(y(x,z))}{p(z)} = \frac{1}{\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(x^2 + y^2 - z^2\right)\right] \tag{4.83}$$

or (after some algebra, using equation (4.81))

$$p(x|z) = \frac{1}{\sqrt{2\pi}}\exp\left[-\frac{1}{2}(x - \alpha z)^2/\beta^2\right], \qquad \alpha = \sqrt{\frac{t_j - t_i}{t_k - t_i}}, \quad \beta = \sqrt{\frac{t_k - t_j}{t_k - t_i}} \tag{4.84}$$

so that $x$ is normally distributed with mean $\alpha z$ and variance $\beta^2$. Since

$$z = \frac{Z_k - Z_i}{\sqrt{t_k - t_i}} \tag{4.85}$$

we have that $x$ has mean

$$E(x) = \frac{\sqrt{t_j - t_i}}{t_k - t_i}\,(Z_k - Z_i) \tag{4.86}$$

and variance

$$E\left[(x - E(x))^2\right] = \frac{t_k - t_j}{t_k - t_i}. \tag{4.87}$$

Now, let

$$x = \frac{\sqrt{t_j - t_i}}{t_k - t_i}\,(Z_k - Z_i) + \phi\sqrt{\frac{t_k - t_j}{t_k - t_i}} \tag{4.88}$$

where $\phi$ is $N(0,1)$. Clearly, $x$ satisfies equations (4.86) and (4.87). Substituting equation (4.88) into (4.77) gives

$$Z_j = \left(\frac{t_k - t_j}{t_k - t_i}\right)Z_i + \left(\frac{t_j - t_i}{t_k - t_i}\right)Z_k + \phi\sqrt{\frac{(t_j - t_i)(t_k - t_j)}{t_k - t_i}} \tag{4.89}$$

where $\phi$ is $N(0,1)$. Equation (4.89) is known as the Brownian Bridge.

Figure 4.1 shows different Brownian paths constructed for different timestep sizes. An initial coarse path is constructed, then the fine timestep path is constructed from the coarse path using a Brownian Bridge. By construction, the final timestep path will pass through the coarse timestep nodes.

Figure 4.2 shows the asset paths integrated using the forward Euler algorithm (4.73) fed with the Brownian paths in Figure 4.1. In this case, note that the fine timestep path does not coincide with the coarse timestep nodes, due to the timestepping error.

4.8.2 Strong and Weak Convergence

Since we are dealing with a probabilistic situation here, it is not obvious how to define convergence. Given a number of points along a Brownian path, we could imagine refining this path (using a Brownian Bridge), and then seeing if the solution converged to exact solution. For the model SDE (4.72), we could ask that

$$E\left[\,\left|S(T) - S^h(T)\right|\,\right] \le \text{Const.}\;h^\gamma \tag{4.90}$$
Figure 4.1: Effect of adding more points to a Brownian path using a Brownian bridge. Note that the small timestep points match the coarse timestep points. Left: each coarse timestep is divided into 16 substeps. Right: each coarse timestep divided into 64 substeps. (Axes: Brownian Path versus Time.)

Figure 4.2: Brownian paths shown in Figure 4.1 used to determine asset price paths using forward Euler timestepping (4.73). In this case, note that the asset paths for fine and coarse timestepping do not agree at the final time (due to the timestepping error). Eventually, for small enough timesteps, the final asset value will converge to the exact solution to the SDE. Left: each coarse timestep is divided into 16 substeps. Right: each coarse timestep divided into 64 substeps. (Axes: Asset Price versus Time.)
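The refinement illustrated in Figures 4.1 and 4.2 relies on the Brownian Bridge formula (4.89). A minimal Python sketch of one refinement pass is shown below; the function names `bridge_point` and `refine_path`, and the choice of inserting a single midpoint per interval, are assumptions made for this illustration.

```python
import math
import random


def bridge_point(t_i, z_i, t_k, z_k, t_j, phi):
    """Draw Z(t_j) given Z(t_i) = z_i and Z(t_k) = z_k, following eq. (4.89).

    phi is an N(0,1) draw supplied by the caller.
    """
    span = t_k - t_i
    mean = ((t_k - t_j) * z_i + (t_j - t_i) * z_k) / span
    std = math.sqrt((t_j - t_i) * (t_k - t_j) / span)
    return mean + phi * std


def refine_path(times, zs, rng):
    """Insert a midpoint between every pair of neighbouring path nodes."""
    new_t, new_z = [times[0]], [zs[0]]
    for i in range(len(times) - 1):
        t_mid = 0.5 * (times[i] + times[i + 1])
        z_mid = bridge_point(times[i], zs[i], times[i + 1], zs[i + 1],
                             t_mid, rng.gauss(0.0, 1.0))
        new_t += [t_mid, times[i + 1]]   # the old node is kept unchanged
        new_z += [z_mid, zs[i + 1]]
    return new_t, new_z
```

Calling `refine_path` repeatedly on its own output halves the timestep each time while leaving every previously generated node in place, which is the property needed to compare coarse and fine forward Euler solutions along the same Brownian path.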
where the expectation in equation (4.90) is over many Brownian paths, and $h$ is the timestep size. Note that $S(T)$ is the exact solution along a particular Brownian path; the same path used to compute $S^h(T)$. Criterion (4.90) is called strong convergence. A less strict criterion is

$$\left|\,E[S(T)] - E\left[S^h(T)\right]\,\right| \le \text{Const.}\;h^\gamma \tag{4.91}$$

It can be shown that using forward Euler results in weak convergence with $\gamma = 1$, and strong convergence with $\gamma = .5$.

Table 4.1 shows some test data used to integrate the SDE (4.72) using method (4.73). A series of Brownian paths was constructed, beginning with a coarse timestep path. These paths were systematically refined using the Brownian Bridge construction.

T      .25
σ      .4
µ      .06
S0     100

Table 4.1: Data used in the convergence tests.

Table 4.2 shows results where the strong and weak convergence errors are estimated as

$$\text{Strong Error} = \frac{1}{N}\sum_{i=1}^{N}\left|S(T)_i - S^h(T)_i\right| \tag{4.92}$$

$$\text{Weak Error} = \left|\frac{1}{N}\sum_{i=1}^{N}\left[S(T)_i\right] - \frac{1}{N}\sum_{i=1}^{N}\left[S^h(T)_i\right]\right|, \tag{4.93}$$

where $S^h(T)_i$ is the solution obtained by forward Euler timestepping along the $i$'th Brownian path, and $S(T)_i$ is the exact solution along this same path, and $N$ is the number of samples. Note that for equation (4.72), we have the exact solution

$$\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}\left[S(T)_i\right] = S_0 e^{\mu T} \tag{4.94}$$

but we do not replace the approximate sampled value of the limit in equation (4.93) by the theoretical limit (4.94). If we use enough Monte Carlo samples, we could replace the approximate expression

$$\frac{1}{N}\sum_{i=1}^{N}\left[S(T)_i\right]$$

by $S_0 e^{\mu T}$, but for normal parameters, the Monte Carlo sampling error is much larger than the timestepping error, so we would have to use an enormous number of Monte Carlo samples. Estimating the weak error using equation (4.93) will measure the timestepping error, as opposed to the Monte Carlo sampling error. However, for normal parameters, even using equation (4.93) requires a large number of Monte Carlo samples in order to ensure that the error is dominated by the timestepping error.

In Table 4.2, we can see that the ratio of the errors is about $\sqrt{2}$ for the strong error, and about two for the weak error. This is consistent with a convergence rate of $\gamma = .5$ for strong convergence, and $\gamma = 1.0$ for weak convergence.

5 The Binomial Model

We have seen that a problem with the Monte Carlo method is that it is difficult to use for valuing American style options. Recall that the holder of an American option can exercise the option at any time and receive
and hence we don’t know how to act optimally. we require that the sizes of √ the random jumps are ∆X = σ ∆t and that the probabilities of up (p) and down (q) moves are pr = = qr = = 1 [1 + 2 1 [1 + 2 1 [1 − 2 1 [1 − 2 α√ ∆t] σ r σ √ − ∆t] σ 2 √ α ∆t] σ r σ √ − ∆t] . n+1 → Sj . with probability pr with probability q r (5. Now.5. Clearly. This is actually a dynamic programming problem.1. timestep n. Recall that we can determine the noarbitrage value of an option by pretending we live in a riskneutral world. 2 (5. In fact. we have to compare the value of continuing to hold the option (the continuation value) with the payoﬀ. we get the process (5.00174 . At any point in time.0095 Weak Error (4. the asset can move up with probability pr and down with probability r q .55) ) dX = (r − σ2 )dt + σdZ . we exercise.2) where we have denoted the risk neutral probabilities by pr and q r to distinguish them from the real probabilities p. so that in terms of asset price. 100.00093 .1) Now. otherwise. In order to determine whether or not it is worthwhile to hold the option. These sorts of problems are usually solved by proceeding from the end point backwards. so that equation (5. q.91) .2: Convergence results. We have to start from the terminal time and work backwards. Now. We use the same idea here.000 samples used. Data in Table 4.00047 Table 4.90) . If we are at node j. we don’t know what happens in the future. if we simulate forward in time. where risky assets drift at r and are discounted at r. consider that at node (j. n). In other words n Sj n Sj n+1 → Sj+1 .4). so that (S = eX ) n+1 Sj+1 n+1 Sj n = Sj eσ √ ∆t ∆t n = Sj e−σ √ (5. we will denote this n n n node location by Xj . we can construct a discrete approximation to this random walk using the lattice discussed in Section 2 2.0135 . this is Sj = eXj . Recall that X = log S.0190 . σ 2 (5.5 we showed that ∆X = σ ∆t. since in Section 2.3) √ Now. 
all we have to do is let let α = r − σ .Timesteps 72 144 288 576 Strong Error (4. as in the Monte Carlo approach.1). the continuation value depends on what happens in the future.00194 . then the risk neutral process for X is (from equation (2.0269 .1) is formally identical to equation 2 (2.4) 34 . In order to ensure that in the limit as ∆t → 0. If the continuation value is greater than the payoﬀ. we will switch to a more common notation. If we let X = log S. the payoﬀ. then we hold.
or

$$ S_j^n = S_0^0\, e^{(2j - n) \sigma \sqrt{\Delta t}} , \quad j = 0, \ldots, n \tag{5.5} $$

So, the first step in the process is to construct a tree of stock price values, as shown on Figure 5.1.

[Figure 5.1: Lattice of stock price values. The nodes are $S_0^0$ at $n = 0$; $S_0^1$, $S_1^1$ at $n = 1$; and $S_0^2$, $S_1^2$, $S_2^2$ at $n = 2$.]

Associated with each stock price on the lattice is the option value $V_j^n$. We first set the value of the option at $T = N \Delta t$ to the payoff. For example, if we are valuing a put option, then

$$ V_j^N = \max(K - S_j^N, 0) , \quad j = 0, \ldots, N \tag{5.6} $$

Then, we can use the risk neutral world idea to determine the no-arbitrage value of the option (it is the expected value in the risk neutral world). We can do this by working backward through the lattice. The value today is the discounted expected future value

European Lattice Algorithm

$$ V_j^n = e^{-r \Delta t} \left( p^r V_{j+1}^{n+1} + q^r V_j^{n+1} \right) , \quad n = N-1, \ldots, 0 ; \quad j = 0, \ldots, n \tag{5.7} $$

Rolling back through the tree, we obtain the value at $S_0^0$ today, which is $V_0^0$.

If the option is an American put, we can determine if it is optimal to hold or exercise, since we know the continuation value. In this case the rollback (5.7) becomes

American Lattice Algorithm

$$ \begin{aligned} (V_j^n)^c &= e^{-r \Delta t} \left( p^r V_{j+1}^{n+1} + q^r V_j^{n+1} \right) \\ V_j^n &= \max\left( (V_j^n)^c , \max(K - S_j^n, 0) \right) , \quad n = N-1, \ldots, 0 ; \quad j = 0, \ldots, n \end{aligned} \tag{5.8} $$

which is illustrated in Figure 5.2.
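The rollback in equations (5.7) and (5.8) translates almost line for line into code. The following is a minimal sketch rather than a production pricer: it assumes NumPy, uses the risk neutral probabilities of equation (5.2) and the node prices of equation (5.5), and the function name `lattice_put` and the parameter values in the usage lines are illustrative choices.

```python
import numpy as np

def lattice_put(s0, strike, r, sigma, T, n_steps, american):
    """Value a put on the binomial lattice by backward rollback."""
    dt = T / n_steps
    alpha = r - 0.5 * sigma**2
    p = 0.5 * (1.0 + (alpha / sigma) * np.sqrt(dt))   # equation (5.2)
    q = 1.0 - p
    disc = np.exp(-r * dt)

    # Payoff at expiry on the nodes S_j^N, j = 0..N (equations (5.5), (5.6)).
    j = np.arange(n_steps + 1)
    s = s0 * np.exp((2 * j - n_steps) * sigma * np.sqrt(dt))
    v = np.maximum(strike - s, 0.0)

    for n in range(n_steps - 1, -1, -1):
        # Continuation value: discounted expected value over the two branches.
        # v[1:] holds V_{j+1}^{n+1} and v[:-1] holds V_j^{n+1}, j = 0..n.
        v = disc * (p * v[1:] + q * v[:-1])
        if american:
            j = np.arange(n + 1)
            s = s0 * np.exp((2 * j - n) * sigma * np.sqrt(dt))
            v = np.maximum(v, np.maximum(strike - s, 0.0))   # equation (5.8)
    return v[0]

european = lattice_put(100.0, 100.0, 0.06, 0.3, 0.25, 500, american=False)
american = lattice_put(100.0, 100.0, 0.06, 0.3, 0.25, 500, american=True)
print(european, american)
```

Only one array of option values is kept and overwritten at each timestep, since the rollback at step n needs nothing but the values at step n + 1.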
[Figure 5.2: Backward recursion step. The value $V_j^n$ is obtained from $V_{j+1}^{n+1}$ (reached with probability $p$) and $V_j^{n+1}$ (reached with probability $q$).]

The binomial lattice method has the following advantages

• It is very easy to code for simple cases.

• It is easy to explain to managers.

• American options are easy to handle.

However, the binomial lattice method has the following disadvantages

• Except for simple cases, coding becomes complex. For example, if we want to handle simple barrier options, things become nightmarish.

• This method is algebraically identical to an explicit finite difference solution of the Black-Scholes equation. Consequently, convergence is at an $O(\Delta t)$ rate.

• The probabilities $p^r$, $q^r$ are not real probabilities, they are simply the coefficients in a particular discretization of a PDE. Regarding them as probabilities leads to much fuzzy thinking, and complex wrong-headed arguments.

If we are going to solve the Black-Scholes PDE, we might as well do it right, and not fool around with lattices.

5.1 A No-arbitrage Lattice

We can also derive the lattice method directly from the discrete lattice model in Section 2.5. Suppose we assume that

$$ dS = \mu S\, dt + \sigma S\, dZ \tag{5.9} $$

and letting $X = \log S$, we have that

$$ dX = \left( \mu - \frac{\sigma^2}{2} \right) dt + \sigma\, dZ \tag{5.10} $$

so that $\alpha = \mu - \frac{\sigma^2}{2}$ in equation (2.19). Now, let's consider the usual hedging portfolio at $t = n \Delta t$, $S = S_j^n$,

$$ P_j^n = V_j^n - (\alpha^h) S_j^n , \tag{5.11} $$
where $V_j^n$ is the value of the option at $t = n \Delta t$, $S = S_j^n$. At $t = (n+1) \Delta t$,

$$ \begin{aligned} S_j^n &\to S_{j+1}^{n+1} ; \quad \text{with probability } p \\ S_j^n &\to S_j^{n+1} ; \quad \text{with probability } q \\ S_{j+1}^{n+1} &= S_j^n e^{\sigma \sqrt{\Delta t}} \\ S_j^{n+1} &= S_j^n e^{-\sigma \sqrt{\Delta t}} \end{aligned} $$

so that the value of the hedging portfolio at $t = (n+1) \Delta t$ is

$$ P_{j+1}^{n+1} = V_{j+1}^{n+1} - (\alpha^h) S_{j+1}^{n+1} \tag{5.12} $$

$$ P_j^{n+1} = V_j^{n+1} - (\alpha^h) S_j^{n+1} . \tag{5.13} $$

Now, as in Section 2.3, we can determine $(\alpha^h)$ so that the value of the hedging portfolio is independent of $p$, $q$. We do this by requiring that

$$ P_{j+1}^{n+1} = P_j^{n+1} \tag{5.14} $$

so that

$$ V_{j+1}^{n+1} - (\alpha^h) S_{j+1}^{n+1} = V_j^{n+1} - (\alpha^h) S_j^{n+1} $$

which gives

$$ (\alpha^h) = \frac{V_{j+1}^{n+1} - V_j^{n+1}}{S_{j+1}^{n+1} - S_j^{n+1}} . \tag{5.15} $$

Since this portfolio is risk free, it must earn the risk free rate of return, so that

$$ P_j^n = e^{-r \Delta t} P_{j+1}^{n+1} = e^{-r \Delta t} P_j^{n+1} . \tag{5.16} $$

Now, substituting for $P_j^n$ from equation (5.11), with $P_{j+1}^{n+1}$ from equation (5.12), and $(\alpha^h)$ from equation (5.15) gives

$$ \begin{aligned} V_j^n &= e^{-r \Delta t} \left( p^{r*} V_{j+1}^{n+1} + q^{r*} V_j^{n+1} \right) \\ p^{r*} &= \frac{e^{r \Delta t} - e^{-\sigma \sqrt{\Delta t}}}{e^{\sigma \sqrt{\Delta t}} - e^{-\sigma \sqrt{\Delta t}}} \\ q^{r*} &= 1 - p^{r*} . \end{aligned} \tag{5.17} $$

Note that $p^{r*}$, $q^{r*}$ do not depend on the real drift rate $\mu$, which is expected. If we expand $p^{r*}$, $q^{r*}$ in a Taylor Series, and compare with the $p^r$, $q^r$ in equations (5.2), we can show that

$$ \begin{aligned} p^{r*} &= p^r + O((\Delta t)^{3/2}) \\ q^{r*} &= q^r + O((\Delta t)^{3/2}) . \end{aligned} \tag{5.18} $$

After a bit more work, one can show that the value of the option at $t = 0$, $V_0^0$, using either $p^{r*}$, $q^{r*}$ or $p^r$, $q^r$ is the same to $O(\Delta t)$, which is not surprising, since these methods can both be regarded as an explicit finite difference approximation to the Black-Scholes equation, having truncation error $O(\Delta t)$. The definition $p^{r*}$, $q^{r*}$ is the common definition in finance books, since the tree has no-arbitrage.

What is the meaning of a no-arbitrage tree? If we are sitting at node $S_j^n$, and assuming that there are only two possible future states

$$ \begin{aligned} S_j^n &\to S_{j+1}^{n+1} ; \quad \text{with probability } p \\ S_j^n &\to S_j^{n+1} ; \quad \text{with probability } q , \end{aligned} $$
then using $(\alpha^h)$ from equation (5.15) guarantees that the hedging portfolio has the same value in both future states.

But let's be a bit more sensible here. Suppose we are hedging a portfolio of RIM stock. Let $\Delta t$ = one day. Suppose the price of RIM stock is \$10 today. Do we actually believe that tomorrow there are only two possible prices for RIM stock

$$ \begin{aligned} S_{up} &= 10\, e^{\sigma \sqrt{\Delta t}} \\ S_{down} &= 10\, e^{-\sigma \sqrt{\Delta t}} \; ? \end{aligned} \tag{5.19} $$

Of course not. This is obviously a highly simplified model. The fact that there is no-arbitrage in the context of the simplified model does not really have a lot of relevance to the real-world situation. The best that can be said is that if the Black-Scholes model was perfect, then we have that the portfolio hedging ratios computed using either $p^r$, $q^r$ or $p^{r*}$, $q^{r*}$ are both correct to $O(\Delta t)$.

6 More on Ito's Lemma

In Section 2.6.1, we mysteriously made the infamous comment

...it can be shown that $dZ^2 \to dt$ as $dt \to 0$

In this Section, we will give some justification for this remark. For a lot more details here, we refer the reader to Stochastic Differential Equations, by Bernt Oksendal, Springer, 1998.

We have to go back here, and decide what the statement

$$ dX = \alpha\, dt + c\, dZ \tag{6.1} $$

really means. The only sensible interpretation of this is

$$ X(t) - X(0) = \int_0^t \alpha(X(s), s)\, ds + \int_0^t c(X(s), s)\, dZ(s) , \tag{6.2} $$

where we can interpret the integrals as the limit, as $\Delta t \to 0$, of a discrete sum. For example,

$$ \begin{aligned} \int_0^t c(X(s), s)\, dZ(s) &= \lim_{\Delta t \to 0} \sum_{j=0}^{j=N-1} c_j \Delta Z_j \\ c_j &= c(X(Z_j), t_j) \\ Z_j &= Z(t_j) \\ \Delta Z_j &= Z(t_{j+1}) - Z(t_j) \\ \Delta t &= t_{j+1} - t_j \\ N &= t / (\Delta t) \end{aligned} \tag{6.3} $$

In particular, we have to decide what

$$ \int_0^t c(X(s), s)\, dZ(s)^2 \tag{6.4} $$

means. Replace the integral by a sum,

$$ \int_0^t c(X(s), s)\, dZ(s)^2 = \lim_{\Delta t \to 0} \sum_{j=0}^{j=N-1} c(X_j, t_j)\, \Delta Z_j^2 . \tag{6.5} $$

Note that we have evaluated the integral using the left hand end point of each subinterval (the no peeking into the future principle).
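The sum on the right hand side of equation (6.5) is easy to examine numerically. The sketch below is only an illustration under simplifying assumptions: it takes $c \equiv 1$ and $t = 1$, and the function name is hypothetical. For this special case the sum is a scaled chi-squared variable, so its mean square deviation from $t$ is exactly $2 t \Delta t$, which shrinks linearly with the timestep.

```python
import numpy as np

def mean_square_deviation(n_steps, t=1.0, n_paths=5000, seed=0):
    """Monte Carlo estimate of E[(sum_j dZ_j^2 - t)^2] for c = 1."""
    rng = np.random.default_rng(seed)
    dt = t / n_steps
    # Brownian increments: independent N(0, dt) samples.
    dz = np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
    sums = np.sum(dz**2, axis=1)
    return np.mean((sums - t) ** 2)

for n_steps in (10, 100, 1000):
    dt = 1.0 / n_steps
    print(n_steps, mean_square_deviation(n_steps), 2.0 * dt)
```

The printed estimate and the value 2Δt should agree to within Monte Carlo noise.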
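From now on, we will use the notation

$$ \sum_j \equiv \sum_{j=0}^{j=N-1} . \tag{6.6} $$

Now, we claim that

$$ \int_0^t c(X(s), s)\, dZ^2(s) = \int_0^t c(X(s), s)\, ds \tag{6.7} $$

or

$$ \lim_{\Delta t \to 0} \sum_j c_j \Delta Z_j^2 = \lim_{\Delta t \to 0} \sum_j c_j \Delta t \tag{6.8} $$

which is what we mean by equation (6.7), i.e. we can say that $dZ^2 \to dt$ as $dt \to 0$.

Now, let's consider a finite $\Delta t$, and consider the expression

$$ E\left[ \left( \sum_j c_j \Delta Z_j^2 - \sum_j c_j \Delta t \right)^2 \right] \tag{6.9} $$

If equation (6.9) tends to zero as $\Delta t \to 0$, then we can say that (in the mean square limit)

$$ \lim_{\Delta t \to 0} \sum_j c_j \Delta Z_j^2 = \lim_{\Delta t \to 0} \sum_j c_j \Delta t = \int_0^t c(X(s), s)\, ds \tag{6.10} $$

so that in this sense

$$ \int_0^t c(X, s)\, dZ^2 = \int_0^t c(X, s)\, ds \tag{6.11} $$

and hence we can say that $dZ^2 \to dt$ with probability one as $\Delta t \to 0$.

Now, expanding equation (6.9) gives

$$ E\left[ \left( \sum_j c_j \Delta Z_j^2 - \sum_j c_j \Delta t \right)^2 \right] = \sum_{ij} E\left[ c_j (\Delta Z_j^2 - \Delta t)\, c_i (\Delta Z_i^2 - \Delta t) \right] . \tag{6.13} $$

Now, we assume that

$$ E\left[ c_j (\Delta Z_j^2 - \Delta t)\, c_i (\Delta Z_i^2 - \Delta t) \right] = \delta_{ij}\, E\left[ c_i^2 (\Delta Z_i^2 - \Delta t)^2 \right] \tag{6.14} $$

which means that $c_j (\Delta Z_j^2 - \Delta t)$ and $c_i (\Delta Z_i^2 - \Delta t)$ are uncorrelated if $i \ne j$. This follows since

• The increments of Brownian motion are uncorrelated, i.e. $Cov[\Delta Z_i\, \Delta Z_j] = 0$, $i \ne j$, which means that $Cov[\Delta Z_i^2\, \Delta Z_j^2] = 0$, or $E[(\Delta Z_j^2 - \Delta t)(\Delta Z_i^2 - \Delta t)] = 0$, $i \ne j$.

• $c_i = c(t_i, X(Z_i))$, and $\Delta Z_i$ are independent.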
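It also follows from the above properties that

$$ E[c_j^2 (\Delta Z_j^2 - \Delta t)^2] = E[c_j^2]\; E[(\Delta Z_j^2 - \Delta t)^2] \tag{6.15} $$

since $c_j$ and $(\Delta Z_j^2 - \Delta t)$ are independent. Using equations (6.14-6.15), equation (6.13) becomes

$$ \sum_{ij} E\left[ c_j (\Delta Z_j^2 - \Delta t)\, c_i (\Delta Z_i^2 - \Delta t) \right] = \sum_i E[c_i^2]\; E\left[ (\Delta Z_i^2 - \Delta t)^2 \right] . \tag{6.16} $$

Now,

$$ \sum_i E[c_i^2]\; E\left[ (\Delta Z_i^2 - \Delta t)^2 \right] = \sum_i E[c_i^2] \left( E[\Delta Z_i^4] - 2 \Delta t\, E[\Delta Z_i^2] + (\Delta t)^2 \right) . \tag{6.17} $$

Recall that $\Delta Z_i$ is $N(0, \Delta t)$ (normally distributed with mean zero and variance $\Delta t$) so that

$$ \begin{aligned} E[(\Delta Z_i)^2] &= \Delta t \\ E[(\Delta Z_i)^4] &= 3 (\Delta t)^2 \end{aligned} \tag{6.18} $$

so that the bracketed term in equation (6.17) becomes

$$ E[\Delta Z_i^4] - 2 \Delta t\, E[\Delta Z_i^2] + (\Delta t)^2 = 2 (\Delta t)^2 \tag{6.19} $$

and

$$ \sum_i E[c_i^2]\; E\left[ (\Delta Z_i^2 - \Delta t)^2 \right] = 2 \sum_i E[c_i^2] (\Delta t)^2 = 2 \Delta t \left( \sum_i E[c_i^2]\, \Delta t \right) = O(\Delta t) \tag{6.20} $$

so that we have

$$ E\left[ \left( \sum_j c_j \Delta Z_j^2 - \sum_j c_j \Delta t \right)^2 \right] = O(\Delta t) \tag{6.21} $$

or

$$ \lim_{\Delta t \to 0} E\left[ \left( \sum_j c_j \Delta Z_j^2 - \int_0^t c(s, X(s))\, ds \right)^2 \right] = 0 \tag{6.22} $$

so that in this sense we can write

$$ dZ^2 \to dt ; \quad dt \to 0 . \tag{6.23} $$

7 Derivative Contracts on non-traded Assets and Real Options

The hedging arguments used in previous sections use the underlying asset to construct a hedging portfolio. What if the underlying asset cannot be bought and sold, or is non-storable? If the underlying variable is an interest rate, we can't store this. Or if the underlying asset is bandwidth, we can't store this either. However, we can get around this using the following approach.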
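7.1 Derivative Contracts

Let the underlying variable follow

$$ dS = a(S, t)\, dt + b(S, t)\, dZ , \tag{7.1} $$

and let $F = F(S, t)$, so that from Ito's Lemma

$$ dF = \left[ a F_S + \frac{b^2}{2} F_{SS} + F_t \right] dt + b F_S\, dZ , \tag{7.2} $$

or in shorter form

$$ \begin{aligned} dF &= \mu\, dt + \sigma^* dZ \\ \mu &= a F_S + \frac{b^2}{2} F_{SS} + F_t \\ \sigma^* &= b F_S . \end{aligned} \tag{7.3} $$

Now, instead of hedging with the underlying asset, we will hedge one contract with another. Suppose we have two contracts $F_1$, $F_2$ (they could have different maturities for example). Then

$$ \begin{aligned} dF_1 &= \mu_1\, dt + \sigma_1^* dZ \\ dF_2 &= \mu_2\, dt + \sigma_2^* dZ \\ \mu_i &= a (F_i)_S + \frac{b^2}{2} (F_i)_{SS} + (F_i)_t \\ \sigma_i^* &= b (F_i)_S ; \quad i = 1, 2 . \end{aligned} \tag{7.4} $$

Consider the portfolio $P$ such that (recall that $\Delta_1$, $\Delta_2$ are constant over the hedging interval)

$$ \begin{aligned} P &= \Delta_1 F_1 + \Delta_2 F_2 \\ dP &= \Delta_1 (\mu_1\, dt + \sigma_1^* dZ) + \Delta_2 (\mu_2\, dt + \sigma_2^* dZ) \end{aligned} \tag{7.5} $$

We can eliminate the random term by choosing

$$ \begin{aligned} \Delta_1 &= \sigma_2^* \\ \Delta_2 &= -\sigma_1^* \end{aligned} \tag{7.6} $$

so that

$$ dP = (\sigma_2^* \mu_1 - \sigma_1^* \mu_2)\, dt . \tag{7.7} $$

Since this portfolio is riskless, it must earn the risk-free rate of return,

$$ (\sigma_2^* \mu_1 - \sigma_1^* \mu_2)\, dt = r (\Delta_1 F_1 + \Delta_2 F_2)\, dt \tag{7.8} $$

and after some simplification we obtain

$$ \frac{\mu_1 - r F_1}{\sigma_1^*} = \frac{\mu_2 - r F_2}{\sigma_2^*} \tag{7.9} $$

Let $\lambda_S(S, t)$ be the value of both sides of equation (7.9), so that

$$ \begin{aligned} \frac{\mu_1 - r F_1}{\sigma_1^*} &= \lambda_S(S, t) \\ \frac{\mu_2 - r F_2}{\sigma_2^*} &= \lambda_S(S, t) . \end{aligned} \tag{7.10} $$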
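The simplification that takes equation (7.8) to equation (7.9) is short enough to spell out. Substituting the hedge ratios (7.6) into the right hand side of (7.8), cancelling $dt$, and collecting the terms belonging to each contract gives

```latex
\begin{align*}
\sigma_2^* \mu_1 - \sigma_1^* \mu_2 &= r \left( \sigma_2^* F_1 - \sigma_1^* F_2 \right) \\
\sigma_2^* \left( \mu_1 - r F_1 \right) &= \sigma_1^* \left( \mu_2 - r F_2 \right) \\
\frac{\mu_1 - r F_1}{\sigma_1^*} &= \frac{\mu_2 - r F_2}{\sigma_2^*}
\end{align*}
```

The last step divides by $\sigma_1^* \sigma_2^*$, so it assumes that neither contract has zero volatility.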
Dropping the subscripts, we obtain

$$ \frac{\mu - r F}{\sigma^*} = \lambda_S \tag{7.11} $$

Substituting $\mu$, $\sigma^*$ from equations (7.3) into equation (7.11) gives

$$ F_t + \frac{b^2}{2} F_{SS} + (a - \lambda_S b) F_S - r F = 0 . \tag{7.12} $$

Equation (7.12) is the PDE satisfied by a derivative contract on any asset $S$, traded or not. If $S$ is a traded asset, it must satisfy equation (7.12). Substituting $F = S$ into equation (7.12) gives $(a - \lambda_S b) = r S$, so that $F$ then satisfies the usual Black-Scholes equation if $b = \sigma S$.

Suppose

$$ \begin{aligned} \mu &= F \mu' \\ \sigma^* &= F \sigma' \end{aligned} \tag{7.13} $$

so that we can write

$$ dF = F \mu'\, dt + F \sigma'\, dZ \tag{7.14} $$

then using equation (7.13) in equation (7.11) gives

$$ \mu' = r + \lambda_S \sigma' \tag{7.15} $$

which has the convenient interpretation that the expected return on holding (not hedging) the derivative contract $F$ is the risk-free rate plus extra compensation due to the riskiness of holding $F$. The extra return is $\lambda_S \sigma'$, where $\lambda_S$ is the market price of risk of $S$ (which should be the same for all contracts depending on $S$) and $\sigma'$ is the volatility of $F$. Note that the volatility and drift of $F$ are not the volatility and drift of the underlying asset $S$.

If we believe that the Capital Asset Pricing Model holds, then a simple minded idea is to estimate

$$ \lambda_S = \rho_{SM} \lambda_M \tag{7.16} $$

where $\lambda_M$ is the price of risk of the market portfolio, and $\rho_{SM}$ is the correlation of returns between $S$ and the returns of the market portfolio.

Another idea is the following. Suppose we can find some companies whose main source of business is based on $S$. Let $q_i$ be the price of this company's stock at $t = t_i$. The return of the stock over $t_i - t_{i-1}$ is

$$ R_i = \frac{q_i - q_{i-1}}{q_{i-1}} . $$

Let $R_i^M$ be the return of the market portfolio (i.e. a broad index) over the same period. We compute $\beta$ as the best fit linear regression to

$$ R_i = \alpha + \beta R_i^M $$

which means that

$$ \beta = \frac{Cov(R, R^M)}{Var(R^M)} . \tag{7.17} $$

Now, from CAPM we have that

$$ E(R) = r + \beta \left[ E(R^M) - r \right] \tag{7.18} $$

where $E(\cdot)$ is the expectation operator. We would like to determine the unlevered $\beta$, denoted by $\beta^u$, which is the $\beta$ for an investment made using equity only. In other words, if the firm we used to compute the $\beta$ above has significant debt, its riskiness with respect to $S$ is amplified. The unlevered $\beta$ can be computed by

$$ \beta^u = \frac{E}{E + (1 - T_c) D}\, \beta \tag{7.19} $$
where

$$ \begin{aligned} D &= \text{long term debt} \\ E &= \text{Total market capitalization} \\ T_c &= \text{Corporate Tax rate} . \end{aligned} \tag{7.20} $$

So, now the expected return from a pure equity investment based on $S$ is

$$ E(R^u) = r + \beta^u \left[ E(R^M) - r \right] . \tag{7.21} $$

If we assume that $F$ in equation (7.14) is the company stock, then

$$ \mu' = E(R^u) = r + \beta^u \left[ E(R^M) - r \right] \tag{7.22} $$

But equation (7.15) says that

$$ \mu' = r + \lambda_S \sigma' . \tag{7.23} $$

Combining equations (7.22-7.23) gives

$$ \lambda_S \sigma' = \beta^u \left[ E(R^M) - r \right] . \tag{7.24} $$

Recall from equations (7.3) and (7.13) that

$$ \begin{aligned} \sigma^* &= F \sigma' \\ \sigma^* &= b F_S , \end{aligned} $$

or

$$ \sigma' = \frac{b F_S}{F} . \tag{7.25} $$

Combining equations (7.24-7.25) gives

$$ \lambda_S = \frac{\beta^u \left[ E(R^M) - r \right]}{b F_S / F} . \tag{7.26} $$

In principle, we can now compute $\lambda_S$, since

• The unleveraged $\beta^u$ is computed as described above. This can be done using market data for a specific firm, whose main business is based on $S$, and the firm's balance sheet.

• $b(S, t)/S$ is the volatility rate of $S$ (equation (7.1)).

• $[E(R^M) - r]$ can be determined from historical data. For example, the expected return of the market index above the risk free rate is about 6% for the past 50 years of Canadian data.

• The risk free rate $r$ is simply the current T-bill rate.

• $F_S$ can be estimated by computing a linear regression of the stock price of a firm which invests in $S$, and $S$. Now, this may have to be unlevered, to reduce the effect of debt. If we are going to now value the real option for a specific firm, we will have to make some assumption about how the firm will finance a new investment. If it is going to use pure equity, then we are done. If it is a mixture of debt and equity, we should relever the value of $F_S$. At this point, we need to talk to a Finance Professor to get this right.
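The chain of estimates in equations (7.17), (7.19) and (7.26) can be collected into three small functions. This is a sketch only: the function names are hypothetical, the return series are synthetic (generated with a known slope so the regression can be checked), and the balance sheet figures and the values of $b$, $F_S$ and $F$ are made-up numbers chosen for illustration, not market data.

```python
import numpy as np

def regression_beta(stock_returns, market_returns):
    """Slope of the least-squares fit R = alpha + beta R^M, equation (7.17)."""
    cov = np.cov(stock_returns, market_returns, ddof=1)
    return cov[0, 1] / cov[1, 1]

def unlever_beta(beta, equity, debt, tax_rate):
    """Unlevered beta, equation (7.19)."""
    return equity / (equity + (1.0 - tax_rate) * debt) * beta

def market_price_of_risk(beta_u, excess_market_return, b, F_S, F):
    """lambda_S from equation (7.26), with sigma' = b F_S / F."""
    sigma_prime = b * F_S / F
    return beta_u * excess_market_return / sigma_prime

# Synthetic monthly returns with a true slope of 1.3 (illustrative only).
rng = np.random.default_rng(0)
market = 0.005 + 0.04 * rng.standard_normal(240)
stock = 0.001 + 1.3 * market + 0.02 * rng.standard_normal(240)

beta = regression_beta(stock, market)
beta_u = unlever_beta(beta, equity=800.0, debt=200.0, tax_rate=0.3)
lam = market_price_of_risk(beta_u, excess_market_return=0.06,
                           b=3.0, F_S=2.0, F=40.0)
print(beta, beta_u, lam)
```

Using the covariance ratio rather than a general regression routine keeps the correspondence with equation (7.17) explicit; the two give the same slope.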
Therefore. Suppose the forward price is f (S. we can always construct the above hedge at no cost. τ ). K are known with certainty at (S. and there are a set of forward prices available.32). we hedge this contract by going short a forward with the current delivery price f (S. Note that there is no optionality in a forward contract.2 A Forward Contract A forward contract is a special type of derivative contract.31) into equation (7. Suppose we can estimate a. b in equation (7. τ ) (which costs us nothing to enter into). At some time t > 0.2. then the value of this portfolio today is (f − K)e−rτ .28). The payoﬀ of a (long) forward contract expiring at t = T is then V (S.27) But if we hold a forward contract today. 7. τ ).32) which can be interpreted as the fact that the forward price must equal the spot price at t = T . At some time t > 0. V (S.1 Convenience Yield We can also write equation (7.30) (7. Suppose we are long the forward contract struck at t = 0 with delivery price K. = (f − K)e−rτ . we can interpret δ as the convenience yield for holding the asset. there is a convenience to holding supplies of natural gas in reserve. and noting that K is a constant. For example. δ=r− 44 b2 fSS + (r − δ)SfS = 0 2 (7. The payoﬀ of this portfolio is S − K − (S − f ) = f − K Since f. and its value is given by equation (7.12) b2 VSS + (a − λS b)VS − rV = 0 . We can then estimate λS by solving equation (7. K is determined so that the cost of entering into the forward contract is zero at its inception. τ ) .35) S In this case. gives us the following PDE for the forward price (the delivery price which makes the forward contract worth zero at inception) fτ = with terminal condition f (S.31) Substituting equation (7. The value of a forward contract is a contingent claim. (τ < T ).29) (7. τ ). τ = 0) = S (7. (which is set at the inception of the contract) and the current forward price f (S. (7. (7.7. the forward price is no longer K. 
Suppose we are long a forward contract with delivery price K. then the payoﬀ of a long forward contract.32) and adjusting λS until we obtain a good ﬁt for the observed forward prices.34) .33) b2 fSS + (a − λS b)fS 2 (7. τ ) (7. (7.28) 2 Now we can also use a simple noarbitrage argument to express the value of a forward contract in terms of the original delivery price K.32) as ft + where δ is deﬁned as a − λS b . τ = 0) = S(T ) − K . entered into at (τ ) is Vt + Payoﬀ = S(T ) − f (S(τ ). The holder of a forward contract agrees to buy or sell the underlying asset at some delivery price K in the future.
since in real life. t = 0) ∗ (8. we would like to hedge as infrequently as possible. • Long α(t)h S(t) shares • An amount in a riskfree bank account B(t). we simulate a series of random paths. which we borrow from the bank if (αi − αi−1 ) > 0. we have to update the hedge by purchasing αi − αi−1 shares at t = ti . VS is called the option delta. 8. If (αi − αi−1 ) < 0. i. the hedge is no longer risk free (it is risk free only in the limit as the hedging interval goes to zero). We can compute the variance of this distribution. but which can be taken into account and results in a nonlinear PDE). we determine the discounted relative hedging error error = e−rT P (T ∗ ) V (S0 . we cannot hedge at inﬁnitesimal time intervals. In ﬁnance. Initially.1 Delta Hedging Recall that the basic derivation of the BlackScholes equation used a hedging portfolio where we hold VS shares. we have P (0) = 0 = −V (0) + α(0)h S(0) + B(0) α = VS B(0) = V (0) − α(0)h S(0) The hedge is rebalanced at discrete times ti . the value of the portfolio is P (ti ) = −V (ti ) + α(ti )h S(ti ) + B(ti ) Since we are hedging at discrete time intervals. then we sell some shares and deposit the proceeds in the bank account. For each random path. consider the hedging portfolio P (t) which is composed of • A short option position in an option −V (t). and also the value at risk (VAR). fraction of Monte Carlo trials giving a hedging error between E and E +∆E. Then. In fact.1) After computing many sample paths. VAR is the worst case loss with a given probability. ﬁnd the value 45 . ti ) = V (Si . ti ) then. In other words. hence this strategy is called delta hedging. a typical VAR number reported is the maximum loss that would occur 95% of the time. Suppose we have precomputed the values of VS for all the likely (S.e. so that updating our share position requires h h S(ti )(αi − αi−1 ) h h h h in cash. we can plot a histogram of relative hedging error. If ∆t = ti − ti−1 .8 Discrete Hedging In practice. 
We can determine the distribution of proﬁt and loss ( P& L) by carrying out a Monte Carlo simulation. then the bank account balance is updated by h h Bi = er∆t Bi−1 − Si (αi − αi−1 ) At the instant after the rebalancing time ti . t) values. For example. As an example. there are transaction costs (something which is ignored in the basic BlackScholes equation. Deﬁning h αi Vi = VS (Si .
of $E$ along the x-axis such that the area under the histogram plot to the right of this point is $.95 \times$ the total area.

As an example, consider the case of an American put option, $T = .25$, $\sigma = .3$, $r = .06$, $K = S_0 = 100$. At $t = 0$, $S_0 = 100$. Since there are discrete hedging errors, the results in this case will depend on the stock drift rate, which we set at $\mu = .08$. The initial value of the American put, obtained by solving the Black-Scholes linear complementarity problem, is \$5.34. Figure 8.1 shows the results for no hedging, and hedging once a month. The x-axis in these plots shows the relative P&L of this portfolio (i.e. P&L divided by the Black-Scholes price), and the y-axis shows the relative frequency.

$$ \text{Relative P\&L} = \frac{\text{Actual P\&L}}{\text{Black-Scholes price}} \tag{8.2} $$

[Figure 8.1: Relative frequency (y-axis) versus relative P&L of delta hedging strategies. Left: no hedging, right: rebalance hedge once a month. American put, $T = .25$, $\sigma = .3$, $r = .06$, $\mu = .08$, $K = S_0 = 100$. The relative P&L is computed by dividing the actual P&L by the Black-Scholes price.]

Note that the no-hedging strategy actually has a high probability of ending up with a profit (from the option writer's point of view) since the drift rate of the stock is positive. In this case, the hedger does nothing, but simply pockets the option premium. Note the sudden jump in the relative frequency at relative P&L = 1. This is because the maximum the option writer stands to gain is the option premium, which we assume is the Black-Scholes value. The writer makes this premium for any path which ends up $S > K$, which is many paths, hence the sudden jump in probability. However, there is significant probability of a loss as well. Figure 8.1 also shows the relative frequency of the P&L of hedging once a month (only three times during the life of the option).

In fact, there is a history of Ponzi-like hedge funds which simply write put options with essentially no hedging. In this case, these funds will perform very well for several years, since markets tend to drift up on average. However, then a sudden market drop occurs, and they will blow up. Blowing up is a technical term for losing all your capital and being forced to get a real job. However, usually the owners of these hedge funds walk away with large bonuses, and the shareholders take all the losses.

Figure 8.2 shows the results for rebalancing the hedge once a week, and daily. We can see clearly here that the mean is zero, and the variance is getting smaller as the hedging interval is reduced. In fact, one can show that the standard deviation of the hedge error should be proportional to $\sqrt{\Delta t}$ (so that the variance is proportional to $\Delta t$), where $\Delta t$ is the hedge rebalancing interval.

8.2 Gamma Hedging

In an attempt to account for some of the errors in delta hedging at finite hedging intervals, we can try to use second derivative information. The second derivative of an option value $V_{SS}$ is called the option gamma, hence this strategy is termed delta-gamma hedging. A gamma hedge consists of
• A short option position $-V(t)$.

• Long $\alpha^h S(t)$ shares

• Long $\beta$ another derivative security $I$.

• An amount in a risk-free bank account $B(t)$.

Now, recall that we consider $\alpha^h$, $\beta$ to be constant over the hedging interval (no peeking into the future), so we can regard these as constants (for the duration of the hedging interval).

The hedge portfolio $P(t)$ is then

$$ P(t) = -V + \alpha^h S + \beta I + B(t) $$

Assuming that we buy and hold $\alpha^h$ shares and $\beta$ of the secondary instrument at the beginning of each hedging interval, then we require that

$$ \begin{aligned} \frac{\partial P}{\partial S} &= -\frac{\partial V}{\partial S} + \alpha^h + \beta \frac{\partial I}{\partial S} = 0 \\ \frac{\partial^2 P}{\partial S^2} &= -\frac{\partial^2 V}{\partial S^2} + \beta \frac{\partial^2 I}{\partial S^2} = 0 \end{aligned} \tag{8.3} $$

Note that

• If $\beta = 0$, then we get back the usual delta hedge.

• In order for the gamma hedge to work, we need an instrument which has some gamma (the asset $S$ has second derivative zero). Hence, traders often speak of being long (positive) or short (negative) gamma, and try to buy/sell things to get gamma neutral.

So, at $t = 0$ we have

$$ P(0) = 0 \Rightarrow B(0) = V(0) - \alpha_0^h S_0 - \beta_0 I_0 $$

The amounts $\alpha_i^h$, $\beta_i$ are determined by requiring that equation (8.3) hold

$$ \begin{aligned} -(V_S)_i + \alpha_i^h + \beta_i (I_S)_i &= 0 \\ -(V_{SS})_i + \beta_i (I_{SS})_i &= 0 \end{aligned} \tag{8.4} $$

The bank account balance is then updated at each hedging time $t_i$ by

$$ B_i = e^{r \Delta t} B_{i-1} - S_i (\alpha_i^h - \alpha_{i-1}^h) - I_i (\beta_i - \beta_{i-1}) $$

[Figure 8.2: Relative frequency (y-axis) versus relative P&L of delta hedging strategies. Left: rebalance hedge once a week, right: rebalance hedge daily. American put, $T = .25$, $\sigma = .3$, $r = .06$, $\mu = .08$, $K = S_0 = 100$. The relative P&L is computed by dividing the actual P&L by the Black-Scholes price.]
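With $\beta = 0$ the bookkeeping above reduces to the delta hedge of Section 8.1, and the experiment behind the weekly and daily histograms can be imitated in a few lines. The sketch below makes simplifying assumptions that differ from the example in the text: it hedges a European put, so that the option value and delta come from the closed-form Black-Scholes formula instead of a precomputed numerical solution for the American put, and the function names are illustrative. It returns the discounted relative hedging error of equation (8.1) for each simulated path.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def norm_cdf(x):
    return 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))

def bs_put(s, tau, strike, r, sigma):
    """Black-Scholes European put value and delta, time to expiry tau > 0."""
    d1 = (np.log(s / strike) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    value = strike * math.exp(-r * tau) * norm_cdf(-d2) - s * norm_cdf(-d1)
    return value, norm_cdf(d1) - 1.0

def hedging_errors(n_rebalance, n_paths, s0=100.0, strike=100.0, r=0.06,
                   sigma=0.3, mu=0.08, T=0.25, seed=0):
    rng = np.random.default_rng(seed)
    dt = T / n_rebalance
    s = np.full(n_paths, s0)
    v0, alpha = bs_put(s, T, strike, r, sigma)
    bank = v0 - alpha * s                      # so that P(0) = 0
    for i in range(1, n_rebalance + 1):
        phi = rng.standard_normal(n_paths)
        # Stock moves under the real drift mu over one hedging interval.
        s = s * np.exp((mu - 0.5 * sigma**2) * dt + sigma * math.sqrt(dt) * phi)
        bank = bank * math.exp(r * dt)
        if i < n_rebalance:                    # rebalance, paying from the bank
            _, new_alpha = bs_put(s, T - i * dt, strike, r, sigma)
            bank = bank - s * (new_alpha - alpha)
            alpha = new_alpha
    payoff = np.maximum(strike - s, 0.0)
    final = -payoff + alpha * s + bank         # P(T)
    return math.exp(-r * T) * final / v0       # equation (8.1)

weekly = hedging_errors(13, 5000)
daily = hedging_errors(63, 5000)
print(weekly.mean(), weekly.std())
print(daily.mean(), daily.std())
```

The 13 and 63 rebalancing dates are rough stand-ins for weekly and daily hedging over a quarter of a year.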
We will consider the same example as we used in the delta hedge example. For an additional instrument, we will use a European put option written on the same underlying with the same strike price and a maturity of $T = .5$ years.

Figure 8.3 shows the results of gamma hedging, along with a comparison with delta hedging. In principle, gamma hedging produces a smaller variance with less frequent hedging. However, we are exposed to more model error in this case, since we need to be able to compute the second derivative of the theoretical price.

[Figure 8.3: Relative frequency (y-axis) versus relative P&L of gamma hedging strategies. Left: rebalance hedge once a week, right: rebalance hedge daily. Dotted lines show the delta hedge for comparison. American put, $T = .25$, $\sigma = .3$, $r = .06$, $\mu = .08$, $K = 100$, $S_0 = 100$. Secondary instrument: European put option, same strike, $T = .5$ years. The relative P&L is computed by dividing the actual P&L by the Black-Scholes price.]

8.3 Vega Hedging

The most important parameter in the option pricing is the volatility. What if we are not sure about the value of the volatility? It is possible to assume that the volatility itself is stochastic, i.e.

$$ \begin{aligned} dS &= \mu S\, dt + \sqrt{v}\, S\, dZ_1 \\ dv &= \kappa (\theta - v)\, dt + \sigma_v \sqrt{v}\, dZ_2 \end{aligned} \tag{8.5} $$

where $\mu$ is the expected growth rate of the stock price, $\sqrt{v}$ is its instantaneous volatility, $\kappa$ is a parameter controlling how fast $v$ reverts to its mean level of $\theta$, $\sigma_v$ is the "volatility of volatility" parameter, and $Z_1$, $Z_2$ are Wiener processes with correlation parameter $\rho$. If we use the model in equation (8.5), then this will result in a two factor PDE to solve for the option price and the hedging parameters. Since there are two sources of risk ($dZ_1$, $dZ_2$), we will need to hedge with the underlying asset and another option (Heston, A closed form solution for options with stochastic volatility with applications to bond and currency options, Rev. Fin. Studies 6 (1993) 327-343).

Another possibility is to assume that the volatility is uncertain, and to assume that $\sigma_{min} \le \sigma \le \sigma_{max}$, and to hedge based on a worst case (from the hedger's point of view). This results in an uncertain volatility model (Avellaneda, Levy, Parás, Pricing and Hedging Derivative Securities in Markets with Uncertain Volatilities, Appl. Math. Fin. 2 (1995) 77-88). This is great if you can get someone to buy this option at this price, because the hedger is always guaranteed to end up with a non-negative balance in the hedging portfolio. But you may not be able to sell at this price, since the option price is expensive (after all, the price you get has to cover the worst case scenario).

An alternative, much simpler, approach (and therefore popular in industry), is to construct a vega hedge. We assume that we know the volatility, and price the option in the usual way. Then, as with a gamma hedge, we construct a portfolio
• A short option position $-V(t)$.

• Long $\alpha^h S(t)$ shares

• Long $\beta$ another derivative security $I$.

• An amount in a risk-free bank account $B(t)$.

The hedge portfolio $P(t)$ is then

$$ P(t) = -V + \alpha^h S + \beta I + B(t) $$

Assuming that we buy and hold $\alpha^h$ shares and $\beta$ of the secondary instrument at the beginning of each hedging interval, then we require that

$$ \begin{aligned} \frac{\partial P}{\partial S} &= -\frac{\partial V}{\partial S} + \alpha^h + \beta \frac{\partial I}{\partial S} = 0 \\ \frac{\partial P}{\partial \sigma} &= -\frac{\partial V}{\partial \sigma} + \beta \frac{\partial I}{\partial \sigma} = 0 \end{aligned} \tag{8.6} $$

Note that if we assume that $\sigma$ is constant when pricing the option, yet do not assume $\sigma$ is constant when we hedge, this is somewhat inconsistent. Nevertheless, we can determine the derivatives in equation (8.6) numerically (solve the pricing equation for several different values of $\sigma$, and then finite difference the solutions).

In practice, we would sell the option priced using our best estimate of $\sigma$ (today). This is usually based on looking at the prices of traded options, and then backing out the volatility which gives back today's traded option price (this is the implied volatility). Then as time goes on, the implied volatility will likely change. We use the current implied volatility to determine the current hedge parameters in equation (8.6). Since this implied volatility has likely changed since we last rebalanced the hedge, there is some error in the hedge. However, taking into account the change in the hedge portfolio through equations (8.6) should make up for this error. This procedure is called delta-vega hedging.

In fact, even if the underlying process is a stochastic volatility, the vega hedge computed using a constant volatility model works surprisingly well (Hull and White, The pricing of options on assets with stochastic volatilities, J. of Finance, 42 (1987) 281-300).

9 Jump Diffusion

Recall that if

$$ dS = \mu S\, dt + \sigma S\, dZ \tag{9.1} $$

then from Ito's Lemma we have

$$ d[\log S] = \left[ \mu - \frac{\sigma^2}{2} \right] dt + \sigma\, dZ . \tag{9.2} $$

Now, suppose that we observe asset prices at discrete times $t_i$, i.e. $S(t_i) = S_i$, with $\Delta t = t_{i+1} - t_i$. Then from equation (9.2) we have

$$ \log S_{i+1} - \log S_i = \log\left( \frac{S_{i+1}}{S_i} \right) \simeq \left[ \mu - \frac{\sigma^2}{2} \right] \Delta t + \sigma \phi \sqrt{\Delta t} \tag{9.3} $$

where $\phi$ is $N(0, 1)$. Now, if $\Delta t$ is sufficiently small, then $\Delta t$ is much smaller than $\sqrt{\Delta t}$, so that equation (9.3) can be approximated by

$$ \log\left( \frac{S_{i+1} - S_i + S_i}{S_i} \right) = \log\left( 1 + \frac{S_{i+1} - S_i}{S_i} \right) \simeq \sigma \phi \sqrt{\Delta t} . \tag{9.4} $$
Define the return $R_i$ in the period $t_{i+1} - t_i$ as

$$ R_i = \frac{S_{i+1} - S_i}{S_i} \tag{9.5} $$

so that equation (9.4) becomes

$$ \log(1 + R_i) \simeq R_i = \sigma \phi \sqrt{\Delta t} . $$

Consequently, a plot of the discretely observed returns of $S$ should be normally distributed, if the assumption (9.1) is true. In Figure 9.1 we can see a histogram of monthly returns from the S&P 500 for the period 1982 - 2002. The histogram has been scaled to zero mean and unit standard deviation. A standard normal distribution is also shown. Note that for real data, there is a higher peak, and fatter tails than the normal distribution. This means that there is higher probability of zero return, or a large gain or loss compared to a normal distribution.

[Figure 9.1: Probability density functions for the S&P 500 monthly returns 1982 - 2002, scaled to zero mean and unit standard deviation, and the standardized Normal distribution.]

As $\Delta t \to 0$, Geometric Brownian Motion (equation (9.1)) assumes that the probability of a large return also tends to zero. The amplitude of the return is proportional to $\sqrt{\Delta t}$, so that the tails of the distribution become unimportant. But, in real life, we can sometimes see very large returns (positive or negative) in small time increments. It therefore appears that Geometric Brownian Motion (GBM) is missing something important.

9.1 The Poisson Process

Consider a process where most of the time nothing happens (contrast this with Brownian motion, where some changes occur at any time scale we look at), but on rare occasions, a jump occurs. The jump size does not depend on the time interval, but the probability of the jump occurring does depend on the interval. More formally, consider the process $dq$ where, in the interval $[t, t + dt]$,

$$ \begin{aligned} dq &= 1 ; \quad \text{with probability } \lambda\, dt \\ &= 0 ; \quad \text{with probability } 1 - \lambda\, dt . \end{aligned} \tag{9.6} $$

Note, once again, that the size of the Poisson outcome does not depend on $dt$. Also, the probability of a jump occurring in $[t, t + dt]$ goes to zero as $dt \to 0$, in contrast to Brownian motion, where some movement always takes place (the probability of movement is constant as $dt \to 0$), but the size of the movement tends to zero as $dt \to 0$. For future reference, note that

$$ E[dq] = \lambda\, dt \cdot 1 + (1 - \lambda\, dt) \cdot 0 = \lambda\, dt \tag{9.7} $$
and

$$\begin{aligned} Var(dq) &= E[(dq - E[dq])^2] \\ &= E[(dq - \lambda\, dt)^2] \\ &= (1 - \lambda\, dt)^2 \cdot \lambda\, dt + (0 - \lambda\, dt)^2 \cdot (1 - \lambda\, dt) \\ &= \lambda\, dt + O((dt)^2) . \end{aligned} \tag{9.8}$$

Now, suppose we assume that, along with the usual GBM, occasionally the asset jumps, i.e. $S \to JS$, where $J$ is the size of a (proportional) jump. We will restrict $J$ to be non-negative. Suppose a jump occurs in $[t, t + dt]$, with probability $\lambda\, dt$. Let's write this jump process as an SDE, i.e.

$$[dS]_{jump} = (J - 1) S\, dq$$

since, if a jump occurs,

$$\begin{aligned} S_{after\ jump} &= S_{before\ jump} + [dS]_{jump} \\ &= S_{before\ jump} + (J - 1) S_{before\ jump} \\ &= J S_{before\ jump} \end{aligned} \tag{9.9}$$

which is what we want to model. So, if we have a combination of GBM and a rare jump event, then

$$dS = \mu S\, dt + \sigma S\, dZ + (J - 1) S\, dq \tag{9.10}$$

Assume that the jump size has some known probability density $g(J)$, i.e. given that a jump occurs, then the probability of a jump in $[J, J + dJ]$ is $g(J)\, dJ$, and

$$\int_{-\infty}^{+\infty} g(J)\, dJ = \int_0^{\infty} g(J)\, dJ = 1 \tag{9.11}$$

since we assume that $g(J) = 0$ if $J < 0$. For future reference, if $f = f(J)$, then the expected value of $f$ is

$$E[f] = \int_0^{\infty} f(J) g(J)\, dJ . \tag{9.12}$$

9.2 The Jump Diffusion Pricing Equation

Now, form the usual hedging portfolio

$$P = V - \alpha S . \tag{9.13}$$

Now, consider

$$[dP]_{total} = [dP]_{Brownian} + [dP]_{jump} \tag{9.14}$$

where, from Ito's Lemma,

$$[dP]_{Brownian} = \left[ V_t + \frac{\sigma^2 S^2}{2} V_{SS} \right] dt + [V_S - \alpha](\mu S\, dt + \sigma S\, dZ) \tag{9.15}$$

and, noting that the jump is of finite size,

$$[dP]_{jump} = [V(JS, t) - V(S, t)]\, dq - \alpha (J - 1) S\, dq . \tag{9.16}$$

If we hedge the Brownian motion risk, by setting $\alpha = V_S$, then equations (9.14-9.16) give us

$$dP = \left[ V_t + \frac{\sigma^2 S^2}{2} V_{SS} \right] dt + [V(JS, t) - V(S, t)]\, dq - V_S (J - 1) S\, dq . \tag{9.17}$$
So, we still have a random component ($dq$) which we have not hedged away. Let's take the expected value of this change in the portfolio, e.g.

$$E(dP) = \left[ V_t + \frac{\sigma^2 S^2}{2} V_{SS} \right] dt + E[V(JS, t) - V(S, t)]\, E[dq] - V_S S\, E[J - 1]\, E[dq] \tag{9.18}$$

where we have assumed that probability of the jump and the probability of the size of the jump are independent. Defining $E(J - 1) = \kappa$, then we have that equation (9.18) becomes

$$E(dP) = \left[ V_t + \frac{\sigma^2 S^2}{2} V_{SS} \right] dt + E[V(JS, t) - V(S, t)]\, \lambda\, dt - V_S S \kappa \lambda\, dt . \tag{9.19}$$

Now, we make a rather interesting assumption. Assume that an investor holds a diversified portfolio of these hedging portfolios, for many different stocks. If we make the rather dubious assumption that these jumps for different stocks are uncorrelated, then the variance of this portfolio of portfolios is small, hence there is little risk in this portfolio. Hence, the expected return should be

$$E[dP] = rP\, dt . \tag{9.20}$$

Now, equating equations (9.19) and (9.20) gives

$$V_t + \frac{\sigma^2 S^2}{2} V_{SS} + V_S [rS - S\kappa\lambda] - (r + \lambda) V + E[V(JS, t)]\, \lambda = 0 . \tag{9.21}$$

Using equation (9.12) in equation (9.21) gives

$$V_t + \frac{\sigma^2 S^2}{2} V_{SS} + V_S [rS - S\kappa\lambda] - (r + \lambda) V + \lambda \int_0^{\infty} g(J) V(JS, t)\, dJ = 0 . \tag{9.22}$$

Equation (9.22) is a Partial Integral Differential Equation (PIDE).

A common assumption is to assume that $g(J)$ is log normal,

$$g(J) = \frac{\exp\left( - \frac{(\log(J) - \hat{\mu})^2}{2\gamma^2} \right)}{\sqrt{2\pi}\, \gamma J} , \tag{9.23}$$

where, some algebra shows that

$$E(J - 1) = \kappa = \exp(\hat{\mu} + \gamma^2/2) - 1 . \tag{9.24}$$

Now, what about our dubious assumption that jump risk was diversifiable? In practice, we can regard $\sigma, \hat{\mu}, \gamma, \lambda$ as parameters, and fit them to observed option prices. If we do this (see L. Andersen and J. Andreasen, Jump-Diffusion processes: Volatility smile fitting and numerical methods, Review of Derivatives Research (2002), vol 4, pages 231-262), then we find that $\sigma$ is close to historical volatility, but that the fitted values of $\lambda, \hat{\mu}$ are at odds with the historical values. The fitted values seem to indicate that investors are pricing in larger more frequent jumps than has been historically observed. In other words, actual prices seem to indicate that investors do require some compensation for jump risk, which makes sense. In other words, these parameters contain a market price of risk.

Consequently, our assumption about jump risk being diversifiable is not really a problem if we fit the jump parameters from market (as opposed to historical) data, since the market-fit parameters will contain some effect due to risk preferences of investors. One can be more rigorous about this if you assume some utility function for investors. See (Alan Lewis, Fear of jumps, Wilmott Magazine, December, 2002, pages 60-67) or (V. Naik, M. Lee, General equilibrium pricing of options on the market portfolio with discontinuous returns, The Review of Financial Studies, vol 3 (1990) pages 493-521.)
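The moments of the Poisson increment in equations (9.7)-(9.8) and the lognormal result (9.24) can be checked numerically. The following is a minimal sketch, not part of the original notes: the function names and the parameter values (`MU_HAT`, `GAMMA`, `LAM`, `DT`) are illustrative assumptions, and the Monte Carlo estimate simply samples $J = \exp(\hat{\mu} + \gamma\phi)$ with $\phi \sim N(0, 1)$.

```python
import math
import random


def kappa_lognormal(mu_hat, gamma):
    """Closed form kappa = E[J - 1] when log J ~ N(mu_hat, gamma^2), equation (9.24)."""
    return math.exp(mu_hat + 0.5 * gamma ** 2) - 1.0


def kappa_monte_carlo(mu_hat, gamma, n_samples, seed=0):
    """Estimate E[J - 1] by sampling J = exp(mu_hat + gamma * phi), phi ~ N(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += math.exp(mu_hat + gamma * rng.gauss(0.0, 1.0)) - 1.0
    return total / n_samples


def dq_moments(lam, dt):
    """Exact mean and variance of dq, which is 1 with probability lam*dt and 0 otherwise."""
    p = lam * dt
    mean = p * 1.0 + (1.0 - p) * 0.0
    var = (1.0 - mean) ** 2 * p + (0.0 - mean) ** 2 * (1.0 - p)
    return mean, var


# Hypothetical parameter values, chosen only for illustration.
MU_HAT, GAMMA, LAM, DT = -0.9, 0.45, 0.1, 1.0e-3

kappa_exact = kappa_lognormal(MU_HAT, GAMMA)
kappa_mc = kappa_monte_carlo(MU_HAT, GAMMA, 100000)
mean_dq, var_dq = dq_moments(LAM, DT)

print("kappa (closed form):", kappa_exact)
print("kappa (Monte Carlo):", kappa_mc)
print("E[dq], Var(dq):", mean_dq, var_dq)
```

With a negative $\hat{\mu}$ as assumed here, $\kappa$ is negative, i.e. the average jump is downward. The variance of $dq$ differs from $\lambda\, dt$ only by a term of size $(\lambda\, dt)^2$, as stated in equation (9.8).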
10 Mean Variance Portfolio Optimization

An introduction to Computational Finance would not be complete without some discussion of Portfolio Optimization. Consider a risky asset which follows Geometric Brownian Motion with drift

$$\frac{dS}{S} = \mu\, dt + \sigma\, dZ , \tag{10.1}$$

where as usual $dZ = \phi \sqrt{dt}$ and $\phi \sim N(0, 1)$. Suppose we consider a fixed finite interval $\Delta t$, then we can write equation (10.1) as

$$R = \mu' + \sigma' \phi , \qquad R = \frac{\Delta S}{S} , \qquad \mu' = \mu \Delta t , \qquad \sigma' = \sigma \sqrt{\Delta t} , \tag{10.2}$$

where $R$ is the actual return on the asset in $[t, t + \Delta t]$, $\mu'$ is the expected return on the asset in $[t, t + \Delta t]$, and $\sigma'$ is the standard deviation of the return on the asset in $[t, t + \Delta t]$.

Now consider a portfolio of $N$ risky assets. Let $R_i$ be the return on asset $i$ in $[t, t + \Delta t]$, so that

$$R_i = \mu'_i + \sigma'_i \phi_i \tag{10.3}$$

Suppose that the correlation between asset $i$ and asset $j$ is given by $\rho_{ij} = E[\phi_i \phi_j]$. Suppose we buy $x_i$ of each asset at $t$, to form the portfolio $P$

$$P = \sum_{i=1}^{i=N} x_i S_i . \tag{10.4}$$

Then, over the interval $[t, t + \Delta t]$,

$$\begin{aligned} P + dP &= \sum_{i=1}^{i=N} x_i S_i (1 + R_i) \\ dP &= \sum_{i=1}^{i=N} x_i S_i R_i \\ \frac{dP}{P} &= \sum_{i=1}^{i=N} w_i R_i \\ w_i &= \frac{x_i S_i}{\sum_{j=1}^{j=N} x_j S_j} \end{aligned} \tag{10.5}$$

In other words, we divide up our total wealth $W = \sum_{i=1}^{i=N} x_i S_i$ into each asset with weight $w_i$. Note that $\sum_{i=1}^{i=N} w_i = 1$.

To summarize, given some initial wealth at $t$, we suppose that an investor allocates a fraction $w_i$ of this wealth to each asset $i$. We assume that the total wealth is allocated to this risky portfolio $P$, so that

$$\begin{aligned} \sum_{i=1}^{i=N} w_i &= 1 \\ P &= \sum_{i=1}^{i=N} x_i S_i \\ R_p = \frac{dP}{P} &= \sum_{i=1}^{i=N} w_i R_i . \end{aligned} \tag{10.6}$$
The expected return on this portfolio $R_p$ in $[t, t + \Delta t]$ is

$$\bar{R}_p = \sum_{i=1}^{i=N} w_i \mu'_i , \tag{10.7}$$

while the variance of $R_p$ in $[t, t + \Delta t]$ is

$$Var(R_p) = \sum_{i=1}^{i=N} \sum_{j=1}^{j=N} w_i w_j \sigma'_i \sigma'_j \rho_{ij} . \tag{10.8}$$

10.1 Special Cases

Suppose the assets all have zero correlation with one another, i.e. $\rho_{ij} \equiv 0, \ \forall i \neq j$ (with $\rho_{ii} = 1$). Then equation (10.8) becomes

$$Var(R_p) = \sum_{i=1}^{i=N} (\sigma'_i)^2 (w_i)^2 . \tag{10.9}$$

Now, suppose we equally weight all the assets in the portfolio, i.e. $w_i = 1/N, \ \forall i$. Let $\max_i \sigma'_i = \sigma'_{max}$, then

$$Var(R_p) = \frac{1}{N^2} \sum_{i=1}^{i=N} (\sigma'_i)^2 \leq \frac{N (\sigma'_{max})^2}{N^2} = O\left( \frac{1}{N} \right) , \tag{10.10}$$

so that in this special case, if we diversify over a large number of assets, the standard deviation of the portfolio tends to zero as $N \to \infty$.

Consider another case: all assets are perfectly correlated, $\rho_{ij} = 1, \ \forall i, j$. In this case

$$Var(R_p) = \sum_{i=1}^{i=N} \sum_{j=1}^{j=N} w_i w_j \sigma'_i \sigma'_j = \left( \sum_{j=1}^{j=N} w_j \sigma'_j \right)^2 \tag{10.11}$$

so that if $sd(R) = \sqrt{Var(R)}$ is the standard deviation of $R$, then, in this case

$$sd(R_p) = \sum_{j=1}^{j=N} w_j \sigma'_j , \tag{10.12}$$

which means that in this case the standard deviation of the portfolio is simply the weighted average of the individual asset standard deviations.

In general, we can expect that $0 < |\rho_{ij}| < 1$, so that the standard deviation of a portfolio of assets will be smaller than the weighted average of the individual asset standard deviation, but larger than zero. This means that diversification will be a good thing (as Martha Stewart would say) in terms of risk versus reward. In fact, a portfolio of as little as 10 - 20 stocks tends to reap most of the benefits of diversification.
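The special cases above can be illustrated with a few lines of code. This is a minimal sketch under stated assumptions: every asset has the same standard deviation and the same pairwise correlation, and the helper names (`portfolio_variance`, `constant_correlation`, `equal_weight_sd`) and the sample values are hypothetical, not taken from the notes.

```python
import math


def portfolio_variance(weights, sigmas, rho):
    """Var(R_p) = sum_i sum_j w_i w_j sigma_i sigma_j rho_ij, as in equation (10.8)."""
    n = len(weights)
    return sum(
        weights[i] * weights[j] * sigmas[i] * sigmas[j] * rho[i][j]
        for i in range(n)
        for j in range(n)
    )


def constant_correlation(n, r):
    """Correlation matrix with ones on the diagonal and r everywhere else."""
    return [[1.0 if i == j else r for j in range(n)] for i in range(n)]


def equal_weight_sd(n, sigma, r):
    """Standard deviation of an equally weighted portfolio of n identical assets."""
    weights = [1.0 / n] * n
    sigmas = [sigma] * n
    return math.sqrt(portfolio_variance(weights, sigmas, constant_correlation(n, r)))


# Illustrative values: 100 assets, each with standard deviation 0.2.
for r in (0.0, 0.3, 1.0):
    print("correlation", r, "-> portfolio sd", equal_weight_sd(100, 0.2, r))
```

With zero correlation the portfolio standard deviation falls like $1/\sqrt{N}$, with perfect correlation it stays at the weighted average of the individual standard deviations, and an intermediate correlation lands between the two.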
10.2 The Portfolio Allocation Problem

Different investors will choose different portfolios depending on how much risk they wish to take. However, all investors like to achieve the highest possible expected return for a given amount of risk. We are assuming that risk and standard deviation of portfolio return are synonymous.

Let the covariance matrix $C$ be defined as

$$[C]_{ij} = C_{ij} = \sigma'_i \sigma'_j \rho_{ij} \tag{10.13}$$

and define the vectors $\bar{\mu} = [\mu'_1, \mu'_2, ..., \mu'_N]^t$, $\bar{w} = [w_1, w_2, ..., w_N]^t$. In theory, the covariance matrix should be symmetric positive semi-definite. However, measurement errors may result in $C$ having a negative eigenvalue, which should be fixed up somehow.

The expected return on the portfolio is then

$$\bar{R}_p = \bar{w}^t \bar{\mu} , \tag{10.14}$$

and the variance is

$$Var(R_p) = \bar{w}^t C \bar{w} . \tag{10.15}$$

We can think of portfolio allocation problem as the following. Let $\alpha$ represent the degree with which investors want to maximize return at the expense of assuming more risk. If $\alpha \to 0$, then investors want to avoid as much risk as possible. On the other hand, if $\alpha \to \infty$, then investors seek only to maximize expected return, and don't care about risk. The portfolio allocation problem is then (for given $\alpha$) find $\bar{w}$ which satisfies

$$\min_{\bar{w}} \ \bar{w}^t C \bar{w} - \alpha \bar{w}^t \bar{\mu} \tag{10.16}$$

subject to the constraints

$$\sum_i w_i = 1 \tag{10.17}$$

$$L_i \leq w_i \leq U_i ; \quad i = 1, ..., N . \tag{10.18}$$

Constraint (10.17) is simply equation (10.6), while constraints (10.18) may arise due to the nature of the portfolio. For example, most mutual funds can only hold long positions ($w_i \geq 0$), and they may also be prohibited from having a large position in any one asset (e.g. $w_i \leq .20$). Long-short hedge funds will not have these types of restrictions. For fixed $\alpha$, equations (10.16-10.18) constitute a quadratic programming problem.

Let

$$sd(R_p) = \text{standard deviation of } R_p = \sqrt{Var(R_p)} \tag{10.19}$$

We can now trace out a curve on the $(sd(R_p), \bar{R}_p)$ plane. We pick various values of $\alpha$, and then solve the quadratic programming problem (10.16-10.18). Figure 10.1 shows a typical curve, which is also known as the efficient frontier. The data used for this example is

$$\bar{\mu} = \begin{bmatrix} .15 \\ .20 \\ .08 \end{bmatrix} ; \quad C = \begin{bmatrix} .20 & .05 & -.01 \\ .05 & .30 & .015 \\ -.01 & .015 & .1 \end{bmatrix} ; \quad L = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} ; \quad U = \begin{bmatrix} \infty \\ \infty \\ \infty \end{bmatrix} \tag{10.20}$$

We have restricted this portfolio to be long only. For a given value of the standard deviation of the portfolio return ($sd(R_p)$), then any point below the curve is not efficient, since there is another portfolio
with the same risk (standard deviation) and higher expected return. Only points on the curve are efficient in this manner. In general, a linear combination of portfolios at two points along the efficient frontier will be feasible, i.e. satisfy the constraints. This feasible region will be convex along the efficient frontier. Another way of saying this is that a straight line joining any two points along the curve does not intersect the curve except at the given two points. Why is this the case? If this was not true, then the efficient frontier would not really be efficient. (see Portfolio Theory and Capital Markets, W. Sharpe, McGraw Hill, 1970, reprinted in 2000).

Figure 10.1: A typical efficient frontier. This curve shows, for each value of portfolio standard deviation $SD(R_p)$, the maximum possible expected portfolio return $\bar{R}_p$. Data in equation (10.20). (Axes: Standard Deviation, Expected Return.)

Figure 10.2 shows results if we allow the portfolio to hold up to .25 short positions in each asset. In other words, the data is the same as in (10.20) except that

$$L = \begin{bmatrix} -.25 \\ -.25 \\ -.25 \end{bmatrix} . \tag{10.21}$$

In general, long-short portfolios are more efficient than long-only portfolios. This is the advertised advantage of long-short hedge funds.

Since the feasible region is convex, we can actually proceed in a different manner when constructing the efficient frontier. First of all, we can determine the maximum possible expected return ($\alpha = \infty$ in equation (10.16)),

$$\min_{\bar{w}} \ -\bar{w}^t \bar{\mu} \quad \text{subject to} \quad \sum_i w_i = 1 ; \quad L_i \leq w_i \leq U_i ; \quad i = 1, ..., N \tag{10.22}$$

which is simply a linear programming problem. If the solution weight vector to this problem is $(\bar{w})_{max}$, then the maximum possible expected return is $(\bar{R}_p)_{max} = \bar{w}^t_{max} \bar{\mu}$.

Then determine the portfolio with the smallest possible risk ($\alpha = 0$ in equation (10.16))

$$\min_{\bar{w}} \ \bar{w}^t C \bar{w} \quad \text{subject to} \quad \sum_i w_i = 1 ; \quad L_i \leq w_i \leq U_i ; \quad i = 1, ..., N . \tag{10.23}$$
If the solution weight vector to this quadratic program is given by $\bar{w}_{min}$, then the minimum possible portfolio return is $(\bar{R}_p)_{min} = \bar{w}^t_{min} \bar{\mu}$. We then divide up the range $[(\bar{R}_p)_{min}, (\bar{R}_p)_{max}]$ into a large number of discrete portfolio returns $(\bar{R}_p)_k ; \ k = 1, ..., N_{pts}$. Let $e = [1, 1, ..., 1]^t$, and

$$A = \begin{bmatrix} \bar{\mu}^t \\ e^t \end{bmatrix} ; \quad B^k = \begin{bmatrix} (\bar{R}_p)_k \\ 1 \end{bmatrix} \tag{10.24}$$

then, for given $(\bar{R}_p)_k$ we solve the quadratic program

$$\min_{\bar{w}} \ \bar{w}^t C \bar{w} \quad \text{subject to} \quad A \bar{w} = B^k ; \quad L_i \leq w_i \leq U_i ; \quad i = 1, ..., N , \tag{10.25}$$

with solution vector $(\bar{w})_k$ and hence portfolio standard deviation $sd((R_p)_k) = \sqrt{(\bar{w})^t_k C (\bar{w})_k}$. This gives us a set of pairs $(sd((R_p)_k), (\bar{R}_p)_k), \ k = 1, ..., N_{pts}$.

Figure 10.2: Efficient frontier, comparing results for long-only portfolio (10.20) and a long-short portfolio (same data except that lower bound constraint is replaced by equation (10.21)). (Axes: Standard Deviation, Expected Return; curves: Long Only, Long-Short.)

10.3 Adding a Risk-free asset

Up to now, we have assumed that each asset is risky, i.e. $\sigma'_i > 0, \ \forall i$. However, what happens if we add a risk free asset to our portfolio? This risk-free asset must earn the risk free rate $r' = r \Delta t$, and its standard deviation is zero. The data for this case is (the risk-free asset is added to the end of the weight vector, with $r' = .03$)

$$\bar{\mu} = \begin{bmatrix} .15 \\ .20 \\ .08 \\ .03 \end{bmatrix} ; \quad C = \begin{bmatrix} .20 & .05 & -.01 & 0.0 \\ .05 & .30 & .015 & 0.0 \\ -.01 & .015 & .1 & 0.0 \\ 0.0 & 0.0 & 0.0 & 0.0 \end{bmatrix} ; \quad L = \begin{bmatrix} 0 \\ 0 \\ 0 \\ -\infty \end{bmatrix} ; \quad U = \begin{bmatrix} \infty \\ \infty \\ \infty \\ \infty \end{bmatrix} \tag{10.26}$$

where we have assumed that we can borrow any amount at the risk-free rate (a dubious assumption).
If we compute the efficient frontier with a portfolio of risky assets and include one risk-free asset, we get the result labeled capital market line in Figure 10.3. In other words, in this case the efficient frontier is a straight line. Note that this straight line is always above the efficient frontier for the portfolio consisting of all risky assets (as in Figure 10.1). In fact, given the efficient frontier from Figure 10.1, we can construct the efficient frontier for a portfolio of the same risky assets plus a risk free asset in the following way. First of all, we start at the point $(0, r')$ in the $(sd(R_p), \bar{R}_p)$ plane, corresponding to a portfolio which consists entirely of the risk free asset. We then draw a straight line passing through $(0, r')$, which touches the all-risky-asset efficient frontier at a single point (the straight line is tangent to the all-risky-asset efficient frontier). Let the portfolio weights at this single point be denoted by $\bar{w}_M$. The portfolio corresponding to the weights $\bar{w}_M$ is termed the market portfolio. Let $(\bar{R}_p)_M = \bar{w}^t_M \bar{\mu}$ be the expected return on this market portfolio, with corresponding standard deviation $sd((R_p)_M)$. Let $w_r$ be the fraction invested in the risk free asset. Then, any point along the capital market line has

$$\begin{aligned} \bar{R}_p &= w_r r' + (1 - w_r)(\bar{R}_p)_M \\ sd(R_p) &= (1 - w_r)\, sd((R_p)_M) . \end{aligned} \tag{10.27}$$

If $w_r \geq 0$, then we are lending at the risk-free rate. If $w_r < 0$, we are borrowing at the risk-free rate.

Figure 10.3: The efficient frontier from Figure 10.1 (all risky assets), and the efficient frontier with the same assets as in Figure 10.1, except that we include a risk free asset. In this case, the efficient frontier becomes a straight line, shown as the capital market line. (Axes: Standard Deviation, Expected Return; labels: Risk Free Return, Lending, Market Portfolio, Borrowing, Efficient Frontier All Risky Assets, Capital Market Line.)

Consequently, given a portfolio of risky assets, and a risk-free asset, then all investors should divide their assets between the risk-free asset and the market portfolio. Any other choice for the portfolio is not efficient. Note that the actual fraction selected for investment in the market portfolio depends on the risk preferences of the investor.

The capital market line is so important, that the equation of this line is written as $\bar{R}_p = r' + \lambda_M\, sd(R_p)$, where $\lambda_M$ is the market price of risk. In other words, all diversified investors, at any particular point in time, should have diversified portfolios which plot along the capital market line. All portfolios should have the same Sharpe ratio

$$\lambda_M = \frac{\bar{R}_p - r'}{sd(R_p)} \tag{10.28}$$
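The tangency construction can be sketched numerically for the risky-asset data in equation (10.26). This is a sketch under an explicit assumption, not the quadratic programming procedure of the notes: if no bound constraint is active at the tangency point, the market portfolio weights are proportional to the solution $z$ of $C z = \bar{\mu} - r' e$ (a standard result for the unconstrained problem). For these inputs the computed weights come out positive, so the long-only bounds are not binding. All function names below are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def tangency_portfolio(mu, cov, r_free):
    """Weights proportional to C^{-1}(mu - r_free), normalized to sum to one.

    Assumes no bound constraint is active and that the unnormalized sum is positive.
    """
    z = solve_linear(cov, [m - r_free for m in mu])
    total = sum(z)
    return [zi / total for zi in z]


def portfolio_stats(weights, mu, cov):
    """Expected return w^t mu and standard deviation sqrt(w^t C w)."""
    n = len(weights)
    mean = sum(weights[i] * mu[i] for i in range(n))
    var = sum(weights[i] * cov[i][j] * weights[j] for i in range(n) for j in range(n))
    return mean, math.sqrt(var)


# Risky-asset block of the data in equation (10.26), risk free rate r' = .03.
MU = [0.15, 0.20, 0.08]
COV = [[0.20, 0.05, -0.01], [0.05, 0.30, 0.015], [-0.01, 0.015, 0.1]]
R_FREE = 0.03

w_market = tangency_portfolio(MU, COV, R_FREE)
ret_market, sd_market = portfolio_stats(w_market, MU, COV)
lambda_market = (ret_market - R_FREE) / sd_market  # market price of risk, equation (10.28)

# A point on the capital market line, equation (10.27), with half the wealth lent.
w_r = 0.5
ret_cml = w_r * R_FREE + (1.0 - w_r) * ret_market
sd_cml = (1.0 - w_r) * sd_market

print("market weights:", w_market)
print("market return, sd:", ret_market, sd_market)
print("market price of risk:", lambda_market)
```

Any choice of `w_r` gives a portfolio with the same ratio of excess return to standard deviation, which is the sense in which all portfolios on the capital market line share one Sharpe ratio.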
10.4 Criticism

Is mean-variance portfolio optimization the solution to all our problems? Not exactly. We have assumed that $\mu', \sigma'$ are independent of time. This is not likely. Even if these parameters are reasonably constant, they are difficult to estimate. In particular, $\mu'$ is hard to determine if the time series of returns is not very long. Remember that for short time series, the noise term (Brownian motion) will dominate. If we have a long time series, we can get a better estimate for $\mu'$, but why do we think $\mu'$ for a particular firm will be constant for long periods? Probably, stock analysts should be estimating $\mu'$ from company balance sheets, sales data, etc. However, for the past few years, analysts have been too busy hyping stocks and going to lunch to do any real work. So, there will be lots of different estimates of $\mu', C$, and hence many different optimal portfolios.

In fact, some recent studies have suggested that if investors simply use the $1/N$ rule, whereby initial wealth is allocated equally between $N$ assets, this does a pretty good job, assuming that there is uncertainty in the estimates of $\mu', C$.

We have also assumed that risk is measured by standard deviation of portfolio return. Actually, if I am long an asset, I like it when the asset goes up, and I don't like it when the asset goes down. In other words, volatility which makes the price increase is good. This suggests that perhaps it may be more appropriate to minimize downside risk only (assuming a long position).

Perhaps one of the most useful ideas that come from mean-variance portfolio optimization is that diversified investors (at any point in time) expect that any optimal portfolio will produce a return

$$\begin{aligned} \bar{R}_p &= r' + \lambda_M \sigma'_p \\ \bar{R}_p &= \text{Expected portfolio return} \\ r' &= \text{risk-free return in period } \Delta t \\ \lambda_M &= \text{market price of risk} \\ \sigma'_p &= \text{Portfolio volatility} , \end{aligned} \tag{10.29}$$

where different investors will choose portfolios with different $\sigma'_p$ (volatility), depending on their risk preferences, but $\lambda_M$ is the same for all investors. Of course, we also have

$$\bar{R}_M = r' + \lambda_M \sigma'_M . \tag{10.30}$$

Note: there is a whole field called Behavioural Finance, whose adherents don't think much of mean-variance portfolio optimization.

Another recent approach is to compute the optimal portfolio weights using many different perturbed input data sets. The input data (expected returns, and covariances) are determined by resampling, i.e. assuming that the observed values have some observational errors. In this way, we can get some sort of optimal portfolio weights which have some effect of data errors incorporated in the result. This gives us an average efficient frontier, which, it is claimed, is less sensitive to data errors.

10.5 Individual Securities

Equation (10.30) refers to an efficient portfolio. What is the relationship between risk and reward for individual securities? Consider the following portfolio: divide all wealth between the market portfolio, with weight $w_M$, and security $i$, with weight $w_i$. By definition

$$w_M + w_i = 1 , \tag{10.31}$$

and we define

$$\begin{aligned} \bar{R}_M &= \text{expected return on the market portfolio} \\ \bar{R}_i &= \text{expected return on asset } i \\ \sigma'_M &= \text{s.d. of return on market portfolio} \\ \sigma'_i &= \text{s.d. of return on asset } i \\ C_{i,M} &= \sigma'_M \sigma'_i \rho_{i,M} = \text{Covariance between } i \text{ and } M \end{aligned} \tag{10.32}$$
Now, the expected return on this portfolio is

$$\bar{R}_p = E[R_p] = w_i \bar{R}_i + w_M \bar{R}_M = w_i \bar{R}_i + (1 - w_i) \bar{R}_M \tag{10.33}$$

and the variance is

$$\begin{aligned} Var(R_p) = (\sigma'_p)^2 &= w_i^2 (\sigma'_i)^2 + 2 w_i w_M C_{i,M} + w_M^2 (\sigma'_M)^2 \\ &= w_i^2 (\sigma'_i)^2 + 2 w_i (1 - w_i) C_{i,M} + (1 - w_i)^2 (\sigma'_M)^2 \end{aligned} \tag{10.34}$$

For a set of values $\{w_i\}$, equations (10.33-10.34) will plot a curve in expected return-standard deviation plane $(\bar{R}_p, \sigma'_p)$ (e.g. Figure 10.3). Let's determine the slope of this curve when $w_i \to 0$, i.e. when this curve intersects the capital market line at the market portfolio.

$$\begin{aligned} 2 (\sigma'_p) \frac{\partial (\sigma'_p)}{\partial w_i} &= 2 w_i (\sigma'_i)^2 + 2 (1 - 2 w_i) C_{i,M} + 2 (w_i - 1)(\sigma'_M)^2 \\ \frac{\partial \bar{R}_p}{\partial w_i} &= \bar{R}_i - \bar{R}_M . \end{aligned} \tag{10.35}$$

Now,

$$\frac{\partial \bar{R}_p}{\partial (\sigma'_p)} = \frac{\partial \bar{R}_p / \partial w_i}{\partial (\sigma'_p) / \partial w_i} = \frac{(\bar{R}_i - \bar{R}_M)(\sigma'_p)}{w_i (\sigma'_i)^2 + (1 - 2 w_i) C_{i,M} + (w_i - 1)(\sigma'_M)^2} \tag{10.36}$$

Now, let $w_i \to 0$ in equation (10.36), then we obtain

$$\frac{\partial \bar{R}_p}{\partial (\sigma'_p)} = \frac{(\bar{R}_i - \bar{R}_M)(\sigma'_M)}{C_{i,M} - (\sigma'_M)^2} \tag{10.37}$$

But this curve should be tangent to the capital market line, equation (10.30), at the point where the capital market line touches the efficient frontier. If this curve is not tangent to the capital market line, then this implies that if we choose $w_i = \pm \epsilon$, then the curve would be above the capital market line, which should not be possible (the capital market line is the most efficient possible portfolio). This assumes that positions with $w_i < 0$ in asset $i$ are possible.

Assuming that the slope of the $R_p$ portfolio is tangent to the capital market line gives (from equations (10.30, 10.37))

$$\frac{\bar{R}_M - r'}{(\sigma'_M)} = \frac{(\bar{R}_i - \bar{R}_M)(\sigma'_M)}{C_{i,M} - (\sigma'_M)^2} \tag{10.38}$$

or

$$\bar{R}_i = r' + \beta_i (\bar{R}_M - r') , \qquad \beta_i = \frac{C_{i,M}}{(\sigma'_M)^2} . \tag{10.39}$$

The coefficient $\beta_i$ in equation (10.39) has a nice intuitive definition. Suppose we have a time series of returns

$$\begin{aligned} (R_i)_k &= \text{Return on asset } i \text{, in period } k \\ (R_M)_k &= \text{Return on market portfolio in period } k . \end{aligned} \tag{10.40}$$

Typically, we assume that the market portfolio is a broad index, such as the TSX 300. Now, suppose we try to obtain a least squares fit to the above data, using the equation

$$R_i \simeq \alpha_i + b_i R_M . \tag{10.41}$$
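A least squares fit of the form (10.41) can be sketched in a few lines. This is an illustration on synthetic data, not on the index data used in the notes: the function name and the sample returns are hypothetical. It uses the standard fact that the slope of a one-variable least squares line is the sample covariance of the two series divided by the sample variance of the regressor.

```python
def least_squares_alpha_beta(asset_returns, market_returns):
    """Fit R_i ~ alpha + b * R_M by least squares and return (alpha, b)."""
    n = len(asset_returns)
    mean_i = sum(asset_returns) / n
    mean_m = sum(market_returns) / n
    # Sample covariance of asset and market returns, and sample variance of the market.
    cov_im = sum(
        (ri - mean_i) * (rm - mean_m)
        for ri, rm in zip(asset_returns, market_returns)
    ) / n
    var_m = sum((rm - mean_m) ** 2 for rm in market_returns) / n
    slope = cov_im / var_m
    intercept = mean_i - slope * mean_m
    return intercept, slope


# Synthetic returns (assumed values): the asset moves 1.5 times as much as the market.
market = [0.010, -0.020, 0.015, 0.003, -0.007]
asset = [0.001 + 1.5 * rm for rm in market]

alpha_hat, beta_hat = least_squares_alpha_beta(asset, market)
print("alpha:", alpha_hat, "beta:", beta_hat)
```

Because the synthetic asset returns lie exactly on a line, the fit recovers the intercept and slope used to build them; with real return data the points scatter around the fitted line.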
Carrying out the usual least squares analysis (e.g. do a linear regression of $R_i$ vs. $R_M$), we find that

$$b_i = \frac{C_{i,M}}{(\sigma'_M)^2} \tag{10.42}$$

so that we can write

$$R_i \simeq \alpha_i + \beta_i R_M . \tag{10.43}$$

This means that $\beta_i$ is the slope of the best fit straight line to a $((R_i)_k, (R_M)_k)$ scatter plot. An example is shown in Figure 10.4.

Figure 10.4: Return on Rogers Wireless Communications versus return on TSE 300. Each point represents pairs of daily returns. The vertical axis measures the daily return on the stock and the horizontal axis that of the TSE300.

Now, from equation (10.39) we have that

$$\bar{R}_i = r' + \beta_i (\bar{R}_M - r') \tag{10.44}$$

which is consistent with equation (10.43) if

$$\begin{aligned} R_i &= \alpha_i + \beta_i R_M + \epsilon_i \\ E[\epsilon_i] &= 0 \\ \alpha_i &= r' (1 - \beta_i) \\ E[\epsilon_i R_M] &= 0 , \end{aligned} \tag{10.45}$$

since

$$E[R_i] = \bar{R}_i = \alpha_i + \beta_i \bar{R}_M . \tag{10.46}$$

Equation (10.46) has the interpretation that the return on asset $i$ can be decomposed into a drift component, a part which is correlated to the market portfolio (the broad index), and a random part uncorrelated with the index. Make the following assumptions

$$E[\epsilon_i \epsilon_j] = \begin{cases} 0 , & i \neq j \\ e_i^2 , & i = j \end{cases} \tag{10.47}$$

e.g. that returns on each asset are correlated only through their correlation with the index. Consider once again a portfolio where the wealth is divided amongst $N$ assets, each asset receiving a fraction $w_i$ of
the initial wealth. In this case, the return on the portfolio is

$$\begin{aligned} R_p &= \sum_{i=1}^{i=N} w_i R_i \\ \bar{R}_p &= \sum_{i=1}^{i=N} w_i \alpha_i + \bar{R}_M \sum_{i=1}^{i=N} w_i \beta_i \end{aligned} \tag{10.48}$$

and

$$\text{s.d.}(R_p) = \sqrt{ (\sigma'_M)^2 \sum_{i=1}^{i=N} \sum_{j=1}^{j=N} w_i w_j \beta_i \beta_j + \sum_{i=1}^{i=N} w_i^2 e_i^2 } = \sqrt{ (\sigma'_M)^2 \left( \sum_{i=1}^{i=N} w_i \beta_i \right)^2 + \sum_{i=1}^{i=N} w_i^2 e_i^2 } . \tag{10.49}$$

Now, if $w_i = O(1/N)$, then

$$\sum_{i=1}^{i=N} w_i^2 e_i^2 \tag{10.50}$$

is $O(1/N)$ as $N$ becomes large, hence equation (10.49) becomes

$$\text{s.d.}(R_p) \simeq \sigma'_M \left| \sum_{i=1}^{i=N} w_i \beta_i \right| . \tag{10.51}$$

Note that if we write

$$\bar{R}_i = r' + \lambda_i \sigma'_i \tag{10.52}$$

then we also have that

$$\bar{R}_i = r' + \beta_i (\bar{R}_M - r') \tag{10.53}$$

so that the market price of risk of security $i$ is

$$\lambda_i = \frac{\beta_i (\bar{R}_M - r')}{\sigma'_i} \tag{10.54}$$

which is useful in real options analysis.

11 Stocks for the Long Run?

Conventional wisdom states that investment in a diversified portfolio of equities has a low risk for a long term investor. However, in a recent article ("Irrational Optimism," E. Dimson, P. Marsh, M. Staunton, Fin. Anal. J. vol 60 (January, 2004) 2525) an extensive analysis of historical data of equity returns was carried out. Projecting this information forward, the authors conclude that the probability of a negative real return over a twenty year period, for an investor holding a diversified portfolio, is about 14 per cent. In fact, most individuals in defined contribution pension plans have poorly diversified portfolios. Making more realistic assumptions for defined contribution pension plans, the authors find that the probability of a negative real return over twenty years is about 25 per cent.

Let's see if we can explain why there is this common misconception about the riskiness of long term equity investing. Table 11.1 shows a typical table in a Mutual Fund advertisement. From this table, we are supposed to conclude that
• Long term equity investment is not very risky, with an annualized compound return about 3% higher than the current yield on government bonds.

• If $S$ is the value of the mutual fund, and $B$ is the value of the government bond, then

$$\begin{aligned} B(T) &= B(0) e^{rT} , & r &= .03 \\ S(T) &\simeq S(0) e^{\alpha T} , & \alpha &= .06 , \end{aligned} \tag{11.1}$$

for $T$ large, which gives

$$\frac{S(T = 30) / S(0)}{B(T = 30) / B(0)} = e^{1.8 - .9} = e^{.9} \simeq 2.46 , \tag{11.2}$$

indicating that you more than double your return by investing in equities compared to bonds (over the long term).

Holding period          Annualized compound return
1 year                  2%
2 years                 5%
5 years                 10%
10 years                8%
20 years                7%
30 years                6%
30 year bond yield      3%

Table 11.1: Historical annualized compound return, XYZ Mutual Equity Funds. Also shown is the current yield on a long term government bond.

A convenient way to measure the relative returns on these two investments (bonds and stocks) is to compare the total compound return

$$\begin{aligned} \text{Compound return: stocks} &= \log \left[ \frac{S(T)}{S(0)} \right] = \alpha T \\ \text{Compound return: bonds} &= \log \left[ \frac{B(T)}{B(0)} \right] = rT , \end{aligned} \tag{11.3}$$

or the annualized compound returns

$$\begin{aligned} \text{Annualized compound return: stocks} &= \frac{1}{T} \log \left[ \frac{S(T)}{S(0)} \right] = \alpha \\ \text{Annualized compound return: bonds} &= \frac{1}{T} \log \left[ \frac{B(T)}{B(0)} \right] = r . \end{aligned} \tag{11.4}$$

If we assume that the value of the equity portfolio $S$ follows a Geometric Brownian Motion

$$dS = \mu S\, dt + \sigma S\, dZ \tag{11.5}$$

then from equation (2.56) we have that

$$\log \left[ \frac{S(T)}{S(0)} \right] \sim N\left( \left( \mu - \frac{\sigma^2}{2} \right) T , \ \sigma^2 T \right) , \tag{11.6}$$

i.e. the compound return is normally distributed with mean $(\mu - \frac{\sigma^2}{2}) T$ and variance $\sigma^2 T$, so that the variance of the total compound return increases as $T$ becomes large. Since $var(aX) = a^2 var(X)$, it follows that

$$\frac{1}{T} \log \left[ \frac{S(T)}{S(0)} \right] \sim N\left( \left( \mu - \frac{\sigma^2}{2} \right) , \ \sigma^2 / T \right) , \tag{11.7}$$
so that the variance of the annualized return tends to zero as $T$ becomes large.

Of course, what we really care about is the total compound return (that's how much we actually have at $t = T$, relative to what we invested at $t = 0$) at the end of the investment horizon. This is why Table 11.1 is misleading. There is significant risk in equities, even over the long term (30 years would be long-term for most investors).

Figure 11.1 shows the results of 100,000 simulations of asset prices assuming that the asset follows equation (11.5), with $\mu = .08$, $\sigma = .2$. The investment horizon is 5 years. The results are given in terms of histograms of the annualized compound return (equation (11.4)) and the total compound return (equation (11.3)).

Figure 11.1: Histogram of distribution of returns $T = 5$ years. $\mu = .08$, $\sigma = .2$, 100,000 simulations. Left: annualized return $1/T \log[S(T)/S(0)]$. Right: return $\log[S(T)/S(0)]$.

Figure 11.2 shows similar results for an investment horizon of 30 years. Note how the variance of the annualized return has decreased, while the variance of the total return has increased (verifying equations (11.6-11.7)).

Assuming long term bonds yield 3%, this gives a total compound return over 30 years of .90, for bonds. Looking at the right hand panel of Figure 11.2 shows that there are many possible scenarios where the return on equities will be less than risk free bonds after 30 years. The number of scenarios with return less than risk free bonds is given by the area to the left of .9 in the histogram.

12 Further Reading

12.1 General Interest

• Peter Bernstein, Capital Ideas: the improbable origins of modern Wall street, The Free Press, New York, 1992.
• Peter Bernstein, Against the Gods: the remarkable story of risk, John Wiley, New York, 1998, ISBN 0471295639.
• Burton Malkiel, A random walk down Wall Street, W.W. Norton, New York, 1999, ISBN 0393320405.
• N. Taleb, Fooled by Randomness, Texere, 2001, ISBN 1587990717.

12.2 More Background

• A. Dixit and R. Pindyck, Investment under uncertainty, Princeton University Press, 1994.
Figure 11.2: Histogram of distribution of returns $T = 30$ years. $\mu = .08$, $\sigma = .2$, 100,000 simulations. Left: annualized return $1/T \log[S(T)/S(0)]$. Right: return $\log[S(T)/S(0)]$.

• John Hull, Options, futures and other derivatives, Prentice-Hall, 1997, ISBN 0131864793.
• S. Ross, R. Westerfield, J. Jaffe, Corporate Finance, McGraw-Hill Irwin, 2002, ISBN 0072831375.
• W. Sharpe, Portfolio Theory and Capital Markets, McGraw Hill, 1970, reprinted in 2000, ISBN 0071353208. (Still a classic).
• Lenos Trigeorgis, Real Options: Managerial Flexibility and Strategy for Resource Allocation, MIT Press, 1996, ISBN 026220102X.

12.3 More Technical

• P. Brandimarte, Numerical Methods in Finance: A Matlab Introduction, Wiley, 2002, ISBN 0471396869.
• Boyle, Broadie, Glasserman, Monte Carlo methods for security pricing, J. Econ. Dyn. Con., 21:1267-1321 (1997)
• P. Glasserman, Monte Carlo Methods in Financial Engineering, Springer (2004) ISBN 0387004513.
• D. Higham, An Introduction to Financial Option Valuation, Cambridge (2004) ISBN 0521838843.
• P. Jackel, Monte Carlo Methods in Finance, Wiley, 2002, ISBN 047149741X.
• Y.K. Kwok, Mathematical Models of Finance, Springer Singapore, 1998, ISBN 9813083565.
• S. Neftci, An Introduction to the Mathematics of Financial Derivatives, Academic Press (2000) ISBN 0125153929.
• R. Seydel, Tools for Computational Finance, Springer, 2002, ISBN 354043609X.
• D. Tavella and C. Randall, Pricing Financial Instruments: the Finite Difference Method, Wiley, 2000, ISBN 0471197602.
• D. Tavella, Quantitative Methods in Derivatives Pricing: An Introduction to Computational Finance, Wiley (2002) ISBN 0471394475.
• N. Taleb, Dynamic Hedging, Wiley, 1997, ISBN 0471152803.
• P. Wilmott, S. Howison, J. Dewynne, The mathematics of financial derivatives: A student introduction, Cambridge, 1997, ISBN 0521497892.
• Paul Wilmott, Paul Wilmott on Quantitative Finance, Wiley, 2000, ISBN 0471874388.