
# An Introduction to Computational Finance

**Without Agonizing Pain**

© Peter Forsyth 2005

P.A. Forsyth

∗

August 5, 2005

∗ School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada, N2L 3G1, paforsyt@elora.uwaterloo.ca, www.scicom.uwaterloo.ca/ paforsyt, tel: (519) 888-4567x4415, fax: (519) 885-1208

Contents

1 The First Option Trade

2 The Black-Scholes Equation

2.1 Background

2.2 Definitions

2.3 A Simple Example: The Two State Tree

2.4 A hedging strategy

2.5 Brownian Motion

2.6 Geometric Brownian motion with drift

2.6.1 Ito’s Lemma

2.6.2 Some uses of Ito’s Lemma

2.6.3 Some more uses of Ito’s Lemma

2.7 The Black-Scholes Analysis

2.8 Hedging in Continuous Time

2.9 The option price

2.10 American early exercise

3 The Risk Neutral World

4 Monte Carlo Methods

4.1 Monte Carlo Error Estimators

4.2 Random Numbers and Monte Carlo

4.3 The Box-Muller Algorithm

4.3.1 An improved Box Muller

4.4 Speeding up Monte Carlo

4.5 Estimating the mean and variance

4.6 Low Discrepancy Sequences

4.7 Correlated Random Numbers

4.8 Integration of Stochastic Differential Equations

4.8.1 The Brownian Bridge

4.8.2 Strong and Weak Convergence

4.9 Matlab and Monte Carlo Simulation

5 The Binomial Model

5.1 A No-arbitrage Lattice

6 More on Ito’s Lemma

7 Derivative Contracts on non-traded Assets and Real Options

7.1 Derivative Contracts

7.2 A Forward Contract

7.2.1 Convenience Yield

8 Discrete Hedging

8.1 Delta Hedging

8.2 Gamma Hedging

8.3 Vega Hedging

9 Jump Diffusion

9.1 The Poisson Process

9.2 The Jump Diffusion Pricing Equation

10 Mean Variance Portfolio Optimization

10.1 Special Cases

10.2 The Portfolio Allocation Problem

10.3 Adding a Risk-free asset

10.4 Criticism

10.5 Individual Securities

11 Stocks for the Long Run?

12 Further Reading

12.1 General Interest

12.2 More Background

12.3 More Technical

“Men wanted for hazardous journey, small wages, bitter cold, long months of complete darkness, constant dangers, safe return doubtful. Honour and recognition in case of success.” Advertisement placed by Ernest Shackleton in 1914. He received 5000 replies. An example of extreme risk-seeking behaviour. Hedging with options is used to mitigate risk, and would not appeal to members of Shackleton’s expedition.

1 The First Option Trade

Many people think that options and futures are recent inventions. However, options have a long history,

going back to ancient Greece.

As recorded by Aristotle in Politics, the sixth century BC philosopher Thales of Miletus took part in a

sophisticated trading strategy. The main point of this trade was to conﬁrm that philosophers could become

rich if they so chose. This is perhaps the ﬁrst rejoinder to the famous question “If you are so smart, why

aren’t you rich?” which has dogged academics throughout the ages.

Thales observed that the weather was very favourable to a good olive crop, which would result in a bumper

harvest of olives. If there was an established Athens Board of Olives Exchange, Thales could have simply

sold olive futures short (a surplus of olives would cause the price of olives to go down). Since the exchange

did not exist, Thales put a deposit on all the olive presses surrounding Miletus. When the olive crop was

harvested, demand for olive presses reached enormous proportions (olives were not a storable commodity).

Thales then sublet the presses for a proﬁt. Note that by placing a deposit on the presses, Thales was actually

manufacturing an option on the olive crop, i.e. the most he could lose was his deposit. If he had sold short

olive futures, he would have been liable to an unlimited loss, in the event that the olive crop turned out bad,

and the price of olives went up. In other words, he had an option on a future of a non-storable commodity.

2 The Black-Scholes Equation

This is the basic PDE used in option pricing. We will derive this PDE for a simple case below. Things get

much more complicated for real contracts.

2.1 Background

Over the past few years derivative securities (options, futures, and forward contracts) have become essential

tools for corporations and investors alike. Derivatives facilitate the transfer of ﬁnancial risks. As such, they

may be used to hedge risk exposures or to assume risks in the anticipation of proﬁts. To take a simple yet

instructive example, a gold mining ﬁrm is exposed to ﬂuctuations in the price of gold. The ﬁrm could use a

forward contract to ﬁx the price of its future sales. This would protect the ﬁrm against a fall in the price of

gold, but it would also sacriﬁce the upside potential from a gold price increase. This could be preserved by

using options instead of a forward contract.

Individual investors can also use derivatives as part of their investment strategies. This can be done

through direct trading on ﬁnancial exchanges. In addition, it is quite common for ﬁnancial products to include

some form of embedded derivative. Any insurance contract can be viewed as a put option. Consequently, any

investment which provides some kind of protection actually includes an option feature. Standard examples

include deposit insurance guarantees on savings accounts as well as the provision of being able to redeem a

savings bond at par at any time. These types of embedded options are becoming increasingly common and

increasingly complex. A prominent current example is the investment guarantees being offered by insurance

companies (“segregated funds”) and mutual funds. In such contracts, the initial investment is guaranteed,

and gains can be locked-in (reset) a ﬁxed number of times per year at the option of the contract holder. This

is actually a very complex put option, known as a shout option. How much should an investor be willing to

pay for this insurance? Determining the fair market value of these sorts of contracts is a problem in option

pricing.

[Figure: two-state tree. Today: Stock Price = $20. Up state: Stock Price = $22, Option Price = $1. Down state: Stock Price = $18, Option Price = $0.]

Figure 2.1: A simple case where the stock value can either be $22 or $18, with a European call option, K = $21.

2.2 Deﬁnitions

Let’s consider some simple European put/call options. At some time T in the future (the expiry or exercise

date) the holder has the right, but not the obligation, to

• Buy an asset at a prescribed price K (the exercise or strike price). This is a call option.

• Sell the asset at a prescribed price K (the exercise or strike price). This is a put option.

At expiry time T, we know with certainty what the value of the option is, in terms of the price of the

underlying asset S,

Payoﬀ = max(S −K, 0) for a call

Payoﬀ = max(K −S, 0) for a put (2.1)

Note that the payoﬀ from an option is always non-negative, since the holder has a right but not an obligation.

This contrasts with a forward contract, where the holder must buy or sell at a prescribed price.
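The payoffs in (2.1) can be sketched in a few lines of Python. The function names below are illustrative choices, not part of the notes, and the long forward payoff S − K is included only to show the contrast described above (it can be negative).

```python
def call_payoff(S: float, K: float) -> float:
    """Payoff at expiry of a European call: max(S - K, 0)."""
    return max(S - K, 0.0)


def put_payoff(S: float, K: float) -> float:
    """Payoff at expiry of a European put: max(K - S, 0)."""
    return max(K - S, 0.0)


def long_forward_payoff(S: float, K: float) -> float:
    """Payoff of a long forward: the holder must buy at K, so this can be negative."""
    return S - K


# Strike K = 21, evaluated at the two expiry prices used in Figure 2.1.
for S in (22.0, 18.0):
    print(S, call_payoff(S, 21.0), put_payoff(S, 21.0), long_forward_payoff(S, 21.0))
```

The option payoffs are never negative, whereas the forward loses 3 when the asset ends at 18.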

2.3 A Simple Example: The Two State Tree

This example is taken from Options, futures, and other derivatives, by John Hull. Suppose the value of a

stock is currently $20. It is known that at the end of three months, the stock price will be either $22 or $18.

We assume that the stock pays no dividends, and we would like to value a European call option to buy the

stock in three months for $21. This option can have only two possible values in three months: if the stock

price is $22, the option is worth $1, if the stock price is $18, the option is worth zero. This is illustrated in

Figure 2.1.

In order to price this option, we can set up an imaginary portfolio consisting of the option and the stock,

in such a way that there is no uncertainty about the value of the portfolio at the end of three months. Since

the portfolio has no risk, the return earned by this portfolio must be the risk-free rate.

Consider a portfolio consisting of a long (positive) position of δ shares of stock, and short (negative) one

call option. We will compute δ so that the portfolio is riskless. If the stock moves up to $22 or goes down

to $18, then the value of the portfolio is

Value if stock goes up = $22δ −1

Value if stock goes down = $18δ −0 (2.2)


So, if we choose δ = .25, then the value of the portfolio is

Value if stock goes up = $22δ −1 = $4.50

Value if stock goes down = $18δ −0 = $4.50 (2.3)

So, regardless of whether the stock moves up or down, the value of the portfolio is $4.50. A risk-free portfolio

must earn the risk free rate. Suppose the current risk-free rate is 12%, then the value of the portfolio today

must be the present value of $4.50, or

$$4.50\, e^{-0.12 \times 0.25} = 4.367$$

The value of the stock today is $20. Let the value of the option be V. The value of the portfolio is

$$20 \times 0.25 - V = 4.367 \quad\rightarrow\quad V = 0.633$$
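The arithmetic of the two state tree can be reproduced with a short Python sketch. The function name and argument names are assumptions made for illustration; the numbers are those of the example (rate 12%, three months taken as T = 0.25 years).

```python
import math


def two_state_call_price(S0, S_up, S_down, K, r, T):
    """Price a European call on a two state tree by building a riskless portfolio.

    The portfolio is long `delta` shares and short one call.
    """
    V_up = max(S_up - K, 0.0)        # option value if the stock goes up
    V_down = max(S_down - K, 0.0)    # option value if the stock goes down
    # Choose delta so that delta*S - V is the same in both states.
    delta = (V_up - V_down) / (S_up - S_down)
    riskless_value_at_expiry = delta * S_up - V_up
    # A riskless portfolio must earn the risk-free rate, so discount it.
    riskless_value_today = riskless_value_at_expiry * math.exp(-r * T)
    # Today the portfolio is worth delta*S0 - V; solve for V.
    V = delta * S0 - riskless_value_today
    return delta, V


delta, V = two_state_call_price(S0=20.0, S_up=22.0, S_down=18.0, K=21.0, r=0.12, T=0.25)
print(f"delta = {delta:.2f}, V = {V:.3f}")  # delta = 0.25, V = 0.633
```

Note that no probability of an up or down move is passed to the function.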

2.4 A hedging strategy

So, if we sell the above option (we hold a short position in the option), then we can hedge this position in

the following way. Today, we sell the option for $.633, borrow $4.367 from the bank at the risk free rate (this

means that we have to pay the bank back $4.50 in three months), which gives us $5.00 in cash. Then, we

buy .25 shares at $20.00 (the current price of the stock). In three months time, one of two things happens

• The stock goes up to $22, our stock holding is now worth $5.50, we pay the option holder $1.00, which

leaves us with $4.50, just enough to pay oﬀ the bank loan.

• The stock goes down to $18.00. The call option is worthless. The value of the stock holding is now

$4.50, which is just enough to pay oﬀ the bank loan.

Consequently, in this simple situation, we see that the theoretical price of the option is the cost for the seller to set up a portfolio, which will precisely pay off the option holder and any bank loans required to set up the hedge, at the expiry of the option. In other words, this is the price which a hedger requires to ensure that there is always just enough money at the end to net out at zero gain or loss. If the market price of the option was higher than this value, the seller could sell at the higher price and lock in an instantaneous risk-free gain. Alternatively, if the market price of the option was lower than the theoretical, or fair market value, it would be possible to lock in a risk-free gain by selling the portfolio short. Any such arbitrage opportunities are rapidly exploited in the market, so that for most investors, we can assume that such opportunities are not possible (the no arbitrage condition), and therefore that the market price of the option should be the theoretical price.

Note that this hedge works regardless of whether or not the stock goes up or down. Once we set up this

hedge, we don’t have a care in the world. The value of the option is also independent of the probability that

the stock goes up to $22 or down to $18. This is somewhat counterintuitive.
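A quick way to see that the hedge nets out to zero in both states is to follow the seller's cash through to expiry. The sketch below uses the rounded figures from the text (0.25 shares, a loan of $4.367 at 12% for three months); the function name and its defaults are illustrative assumptions.

```python
import math


def hedger_final_cash(S_T, shares=0.25, strike=21.0, loan=4.367, r=0.12, T=0.25):
    """Seller's net position at expiry: stock holding, minus option payout, minus loan."""
    stock_value = shares * S_T
    option_payout = max(S_T - strike, 0.0)   # paid to the option holder
    loan_repayment = loan * math.exp(r * T)  # about $4.50
    return stock_value - option_payout - loan_repayment


for S_T in (22.0, 18.0):
    print(S_T, round(hedger_final_cash(S_T), 3))
```

Both results are zero up to the rounding of the loan amount, and the probability of either outcome never enters the calculation.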

2.5 Brownian Motion

Before we consider a model for stock price movements, let’s consider the idea of Brownian motion with drift.

Suppose X is a random variable, and in time t →t +dt, X →X +dX, where

dX = αdt +σdZ (2.4)

where αdt is the drift term, σ is the volatility, and dZ is a random term. The dZ term has the form

$$dZ = \phi \sqrt{dt} \tag{2.5}$$

where φ is a random variable drawn from a normal distribution with mean zero and variance one (φ ∼ N(0, 1),

i.e. φ is normally distributed).

If E is the expectation operator, then

$$E(\phi) = 0, \qquad E(\phi^2) = 1 . \tag{2.6}$$

Now in a time interval dt, we have

$$E(dX) = E(\alpha\, dt) + E(\sigma\, dZ) = \alpha\, dt , \tag{2.7}$$

and the variance of dX, denoted by Var(dX), is

$$
\begin{aligned}
\mathrm{Var}(dX) &= E([dX - E(dX)]^2) \\
&= E([\sigma\, dZ]^2) \\
&= \sigma^2\, dt .
\end{aligned}
\tag{2.8}
$$
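These two moments can be checked numerically by sampling dX = α dt + σ φ √dt directly. This is a minimal sketch using only the Python standard library; the parameter values, sample size and seed are arbitrary illustrative choices.

```python
import math
import random


def sample_increments(alpha, sigma, dt, n_samples, seed=0):
    """Draw samples of dX = alpha*dt + sigma*phi*sqrt(dt), with phi ~ N(0, 1)."""
    rng = random.Random(seed)
    return [alpha * dt + sigma * rng.gauss(0.0, 1.0) * math.sqrt(dt)
            for _ in range(n_samples)]


def mean_and_variance(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v


alpha, sigma, dt = 0.1, 0.3, 0.01
m, v = mean_and_variance(sample_increments(alpha, sigma, dt, 200_000))
print(f"sample mean     {m:.5f}  vs  alpha*dt   = {alpha * dt:.5f}")
print(f"sample variance {v:.6f}  vs  sigma^2*dt = {sigma ** 2 * dt:.6f}")
```

The sample values agree with (2.7) and (2.8) only up to Monte Carlo sampling error.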

Let’s look at a discrete model to understand this process more completely. Suppose that we have a discrete lattice of points. Let $X = X_0$ at $t = 0$. Suppose that at $t = \Delta t$,

$$
\begin{aligned}
X_0 &\to X_0 + \Delta h \quad \text{with probability } p \\
X_0 &\to X_0 - \Delta h \quad \text{with probability } q
\end{aligned}
\tag{2.9}
$$

where $p + q = 1$. Assume that

• X follows a Markov process, i.e. the probability distribution in the future depends only on where it is

now.

• The probability of an up or down move is independent of what happened in the past.

• X can move only up or down ∆h.

At any lattice point $X_0 + i\Delta h$, the probability of an up move is p, and the probability of a down move is q.

The probabilities of reaching any particular lattice point for the ﬁrst three moves are shown in Figure 2.2.

Each move takes place in the time interval t →t + ∆t.

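Let $\Delta X$ be the change in X over the interval $t \to t + \Delta t$. Then

$$
\begin{aligned}
E(\Delta X) &= (p - q)\Delta h \\
E([\Delta X]^2) &= p(\Delta h)^2 + q(-\Delta h)^2 = (\Delta h)^2 ,
\end{aligned}
\tag{2.10}
$$

so that the variance of $\Delta X$ is (over $t \to t + \Delta t$)

$$
\begin{aligned}
\mathrm{Var}(\Delta X) &= E([\Delta X]^2) - [E(\Delta X)]^2 \\
&= (\Delta h)^2 - (p - q)^2 (\Delta h)^2 \\
&= 4pq(\Delta h)^2 .
\end{aligned}
\tag{2.11}
$$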

[Figure: lattice of points $X_0$, $X_0 \pm \Delta h$, $X_0 \pm 2\Delta h$, $X_0 \pm 3\Delta h$. Node probabilities: p, q after one move; $p^2$, $2pq$, $q^2$ after two moves; $p^3$, $3p^2q$, $3pq^2$, $q^3$ after three moves.]

Figure 2.2: Probabilities of reaching the discrete lattice points for the first three moves.

Now, suppose we consider the distribution of X after n moves, so that $t = n\Delta t$. The probability of j up moves, and (n − j) down moves, P(n, j), is

$$P(n, j) = \frac{n!}{j!\,(n - j)!}\, p^j q^{n-j} \tag{2.12}$$

which is just a binomial distribution. Now, if $X_n$ is the value of X after n steps on the lattice, then

$$
\begin{aligned}
E(X_n - X_0) &= n E(\Delta X) \\
\mathrm{Var}(X_n - X_0) &= n \mathrm{Var}(\Delta X) ,
\end{aligned}
\tag{2.13}
$$

which follows from the properties of a binomial distribution (each up or down move is independent of previous moves). Consequently, from equations (2.10, 2.11, 2.13) we obtain

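$$
\begin{aligned}
E(X_n - X_0) &= n(p - q)\Delta h = \frac{t}{\Delta t}(p - q)\Delta h \\
\mathrm{Var}(X_n - X_0) &= n\,4pq(\Delta h)^2 = \frac{t}{\Delta t}\,4pq(\Delta h)^2
\end{aligned}
\tag{2.14}
$$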

Now, we would like to take the limit as $\Delta t \to 0$ in such a way that the mean and variance of X, after a finite time t, are independent of $\Delta t$, and we would like to recover

$$
\begin{aligned}
dX &= \alpha\, dt + \sigma\, dZ \\
E(dX) &= \alpha\, dt \\
\mathrm{Var}(dX) &= \sigma^2\, dt
\end{aligned}
\tag{2.15}
$$

as $\Delta t \to 0$. Now, since $0 \le p, q \le 1$, we need to choose $\Delta h = \text{Const}\,\sqrt{\Delta t}$. Otherwise, from equation (2.14) we get that $\mathrm{Var}(X_n - X_0)$ is either 0 or infinite after a finite time. (Stock variances do not have either of these properties, so this is obviously not a very interesting case).

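Let’s choose $\Delta h = \sigma\sqrt{\Delta t}$, which gives (from equation (2.14))

$$
\begin{aligned}
E(X_n - X_0) &= (p - q)\frac{\sigma t}{\sqrt{\Delta t}} \\
\mathrm{Var}(X_n - X_0) &= t\,4pq\,\sigma^2
\end{aligned}
\tag{2.16}
$$

Now, for $E(X_n - X_0)$ to be independent of $\Delta t$ as $\Delta t \to 0$, we must have

$$(p - q) = \text{Const.}\,\sqrt{\Delta t} \tag{2.17}$$

If we choose

$$p - q = \frac{\alpha}{\sigma}\sqrt{\Delta t} \tag{2.18}$$

we get

$$
\begin{aligned}
p &= \frac{1}{2}\left[1 + \frac{\alpha}{\sigma}\sqrt{\Delta t}\right] \\
q &= \frac{1}{2}\left[1 - \frac{\alpha}{\sigma}\sqrt{\Delta t}\right]
\end{aligned}
\tag{2.19}
$$

Now, putting together equations (2.16-2.19) gives

$$
\begin{aligned}
E(X_n - X_0) &= \alpha t \\
\mathrm{Var}(X_n - X_0) &= t\sigma^2\left(1 - \frac{\alpha^2}{\sigma^2}\Delta t\right) \\
&= t\sigma^2 \; ; \quad \Delta t \to 0 .
\end{aligned}
\tag{2.20}
$$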
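The limit in (2.20) can be checked without any simulation, since (2.14) gives the lattice mean and variance exactly. The sketch below evaluates them for the choices Δh = σ√Δt and p, q from (2.19); the function name and the parameter values are illustrative assumptions.

```python
import math


def lattice_moments(alpha, sigma, t, n):
    """Exact mean and variance of X_n - X_0 on the lattice, from equation (2.14)."""
    dt = t / n
    dh = sigma * math.sqrt(dt)                             # jump size
    p = 0.5 * (1.0 + (alpha / sigma) * math.sqrt(dt))      # up probability (2.19)
    q = 1.0 - p                                            # down probability
    mean = n * (p - q) * dh
    var = n * 4.0 * p * q * dh ** 2
    return mean, var


alpha, sigma, t = 0.1, 0.3, 1.0
for n in (10, 100, 10_000):
    mean, var = lattice_moments(alpha, sigma, t, n)
    print(f"n = {n:6d}  mean = {mean:.6f}  var = {var:.6f}")
print(f"limit:      mean = {alpha * t:.6f}  var = {sigma ** 2 * t:.6f}")
```

The mean equals αt for every n, while the variance approaches σ²t from below as the timestep shrinks.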

Now, let’s imagine that $X(t_n) - X(t_0) = X_n - X_0$ is very small, so that $X_n - X_0 \simeq dX$ and $t_n - t_0 \simeq dt$, so that equation (2.20) becomes

$$
\begin{aligned}
E(dX) &= \alpha\, dt \\
\mathrm{Var}(dX) &= \sigma^2\, dt ,
\end{aligned}
\tag{2.21}
$$

which agrees with equations (2.7-2.8). Hence, in the limit as $\Delta t \to 0$, we can interpret the random walk for X on the lattice (with these parameters) as the solution to the stochastic differential equation (SDE)

$$
\begin{aligned}
dX &= \alpha\, dt + \sigma\, dZ \\
dZ &= \phi\sqrt{dt} .
\end{aligned}
\tag{2.22}
$$

Consider the case where α = 0, σ = 1, so that $dX = dZ \simeq Z(t_i) - Z(t_{i-1}) = Z_i - Z_{i-1} = X_i - X_{i-1}$. Now we can write

$$\int_0^t dZ = \lim_{\Delta t \to 0} \sum_i (Z_{i+1} - Z_i) = (Z_n - Z_0) . \tag{2.23}$$

From equation (2.20) (α = 0, σ = 1) we have

$$
\begin{aligned}
E(Z_n - Z_0) &= 0 \\
\mathrm{Var}(Z_n - Z_0) &= t .
\end{aligned}
\tag{2.24}
$$

Now, if n is large ($\Delta t \to 0$), recall that the binomial distribution (2.12) tends to a normal distribution. From equation (2.24), we have that the mean of this distribution is zero, with variance t, so that

$$\int_0^t dZ = (Z_n - Z_0) \sim N(0, t) . \tag{2.25}$$

In other words, after a finite time t, $\int_0^t dZ$ is normally distributed with mean zero and variance t (the limit of a binomial distribution is a normal distribution).

Recall that we have $Z_i - Z_{i-1} = \sqrt{\Delta t}$ with probability p and $Z_i - Z_{i-1} = -\sqrt{\Delta t}$ with probability q. Note that $(Z_i - Z_{i-1})^2 = \Delta t$, with certainty, so that we can write

$$(Z_i - Z_{i-1})^2 \simeq (dZ)^2 = \Delta t . \tag{2.26}$$

To summarize

• We can interpret the SDE

$$
\begin{aligned}
dX &= \alpha\, dt + \sigma\, dZ \\
dZ &= \phi\sqrt{dt}
\end{aligned}
\tag{2.27}
$$

as the limit of a discrete random walk on a lattice as the timestep tends to zero.

• $\mathrm{Var}(dZ) = dt$, otherwise, after any finite time, $\mathrm{Var}(X_n - X_0)$ is either zero or infinite.

• We can integrate the term dZ to obtain

$$\int_0^t dZ = Z(t) - Z(0) \sim N(0, t) . \tag{2.28}$$

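Going back to our lattice example, note that the total distance traveled over any finite interval of time becomes infinite,

$$E(|\Delta X|) = \Delta h \tag{2.29}$$

so that the total distance traveled in n steps is

$$n\Delta h = \frac{t}{\Delta t}\Delta h = \frac{t\sigma}{\sqrt{\Delta t}} \tag{2.30}$$

which goes to infinity as $\Delta t \to 0$. Similarly, since each step has size $\Delta h = \sigma\sqrt{\Delta t}$,

$$\frac{\Delta x}{\Delta t} = \pm\frac{\sigma}{\sqrt{\Delta t}} \to \pm\infty . \tag{2.31}$$

Consequently, Brownian motion is very jagged at every timescale. These paths are not differentiable, i.e. $\frac{dx}{dt}$ does not exist, so we cannot speak of

$$E\left(\frac{dx}{dt}\right) \tag{2.32}$$

but we can possibly define

$$\frac{E(dx)}{dt} . \tag{2.33}$$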

2.6 Geometric Brownian motion with drift

Of course, the actual path followed by stock is more complex than the simple situation described above.

More realistically, we assume that the relative changes in stock prices (the returns) follow Brownian motion

with drift. We suppose that in an inﬁnitesimal time dt, the stock price S changes to S +dS, where

$$\frac{dS}{S} = \mu\, dt + \sigma\, dZ \tag{2.34}$$

where µ is the drift rate, σ is the volatility, and dZ is the increment of a Wiener process,

$$dZ = \phi\sqrt{dt} \tag{2.35}$$

where φ ∼ N(0, 1). Equations (2.34) and (2.35) are called geometric Brownian motion with drift. So, superimposed on the upward (relative) drift is a (relative) random walk. The degree of randomness is given by the volatility σ. Figure 2.3 gives an illustration of ten realizations of this random process for two different values of the volatility. In this case, we assume that the drift rate µ equals the risk free rate.

[Figure: two panels plotting Asset Price ($), from 0 to 1000, against Time (years), from 0 to 12, each showing the Risk Free Return curve. Left panel: Low Volatility Case, σ = .20 per year. Right panel: High Volatility Case, σ = .40 per year.]

Figure 2.3: Realizations of asset price following geometric Brownian motion. Left: low volatility case; right: high volatility case. Risk-free rate of return r = .05.

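Note that

$$
\begin{aligned}
E(dS) &= E(\sigma S\, dZ + \mu S\, dt) \\
&= \mu S\, dt \qquad \text{since } E(dZ) = 0
\end{aligned}
\tag{2.36}
$$

and that the variance of dS is

$$
\begin{aligned}
\mathrm{Var}[dS] &= E(dS^2) - [E(dS)]^2 \\
&= E(\sigma^2 S^2\, dZ^2) \\
&= \sigma^2 S^2\, dt
\end{aligned}
\tag{2.37}
$$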

so that σ is a measure of the degree of randomness of the stock price movement.

Equation (2.34) is a stochastic diﬀerential equation. The normal rules of calculus don’t apply, since for

example

$$\frac{dZ}{dt} = \phi\,\frac{1}{\sqrt{dt}} \to \infty \quad \text{as } dt \to 0 .$$

The study of these sorts of equations uses results from stochastic calculus. However, for our purposes, we

need only one result, which is Ito’s Lemma (see Derivatives: the theory and practice of ﬁnancial engineering,

by P. Wilmott). Suppose we have some function G = G(S, t), where S follows the stochastic process equation

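(2.34), then, in small time increment dt, $G \to G + dG$, where

$$dG = \left(\mu S \frac{\partial G}{\partial S} + \frac{\sigma^2 S^2}{2}\frac{\partial^2 G}{\partial S^2} + \frac{\partial G}{\partial t}\right) dt + \sigma S \frac{\partial G}{\partial S}\, dZ \tag{2.38}$$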

An informal derivation of this result is given in the following section.


2.6.1 Ito’s Lemma

We give an informal derivation of Ito’s lemma (2.38). Suppose we have a variable S which follows

dS = a(S, t)dt +b(S, t)dZ (2.39)

where dZ is the increment of a Wiener process.

Now since

$$dZ^2 = \phi^2\, dt \tag{2.40}$$

where φ is a random variable drawn from a normal distribution with mean zero and unit variance, we have that, if E is the expectation operator, then

$$E(\phi) = 0, \qquad E(\phi^2) = 1 \tag{2.41}$$

so that the expected value of $dZ^2$ is

$$E(dZ^2) = dt \tag{2.42}$$

Now, it can be shown (see Section 6) that in the limit as $dt \to 0$, we have that $\phi^2 dt$ becomes non-stochastic, so that with probability one

$$dZ^2 \to dt \quad \text{as } dt \to 0 \tag{2.43}$$

Now, suppose we have some function G = G(S, t), then

$$dG = G_S\, dS + G_t\, dt + G_{SS}\frac{dS^2}{2} + \dots \tag{2.44}$$

Now (from (2.39))

$$
\begin{aligned}
(dS)^2 &= (a\, dt + b\, dZ)^2 \\
&= a^2\, dt^2 + 2ab\, dZ\, dt + b^2\, dZ^2
\end{aligned}
\tag{2.45}
$$

Since $dZ = O(\sqrt{dt})$ and $dZ^2 \to dt$, equation (2.45) becomes

$$(dS)^2 = b^2\, dZ^2 + O((dt)^{3/2}) \tag{2.46}$$

or

$$(dS)^2 \to b^2\, dt \quad \text{as } dt \to 0 \tag{2.47}$$

Now, equations (2.39, 2.44, 2.47) give

$$
\begin{aligned}
dG &= G_S\, dS + G_t\, dt + G_{SS}\frac{dS^2}{2} + \dots \\
&= G_S (a\, dt + b\, dZ) + dt\left(G_t + G_{SS}\frac{b^2}{2}\right) \\
&= G_S\, b\, dZ + \left(a G_S + G_{SS}\frac{b^2}{2} + G_t\right) dt
\end{aligned}
\tag{2.48}
$$

So, we have the result that if

$$dS = a(S, t)\, dt + b(S, t)\, dZ \tag{2.49}$$

and if G = G(S, t), then

$$dG = G_S\, b\, dZ + \left(a\, G_S + G_{SS}\frac{b^2}{2} + G_t\right) dt \tag{2.50}$$

Equation (2.38) can be deduced by setting a = µS and b = σS in equation (2.50).


2.6.2 Some uses of Ito’s Lemma

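Suppose we have

$$dS = \mu\, dt + \sigma\, dZ . \tag{2.51}$$

If µ, σ = Const., then this can be integrated (from t = 0 to t = t) exactly to give

$$S(t) = S(0) + \mu t + \sigma (Z(t) - Z(0)) \tag{2.52}$$

and from equation (2.28)

$$Z(t) - Z(0) \sim N(0, t) \tag{2.53}$$

Suppose instead we use the more usual geometric Brownian motion

$$dS = \mu S\, dt + \sigma S\, dZ \tag{2.54}$$

Let F(S) = log S, and use Ito’s Lemma

$$
\begin{aligned}
dF &= F_S\, S \sigma\, dZ + \left(F_S\, \mu S + F_{SS}\frac{\sigma^2 S^2}{2} + F_t\right) dt \\
&= \left(\mu - \frac{\sigma^2}{2}\right) dt + \sigma\, dZ ,
\end{aligned}
\tag{2.55}
$$

so that we can integrate this to get

$$F(t) = F(0) + \left(\mu - \frac{\sigma^2}{2}\right) t + \sigma (Z(t) - Z(0)) \tag{2.56}$$

or, since $S = e^F$,

$$S(t) = S(0) \exp\left[\left(\mu - \frac{\sigma^2}{2}\right) t + \sigma (Z(t) - Z(0))\right] . \tag{2.57}$$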

Unfortunately, these cases are about the only situations where we can exactly integrate the SDE (constant

σ, µ).
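Because the solution (2.57) is exact, the asset price at time t can be sampled in a single step, with no timestepping error. The following Python sketch draws Z(t) − Z(0) from N(0, t) and checks that the log return log(S(t)/S(0)) has mean (µ − σ²/2)t and variance σ²t, as (2.56) implies. The function names, parameter values and seed are illustrative assumptions.

```python
import math
import random


def sample_log_returns(mu, sigma, t, n_paths, seed=0):
    """Sample log(S(t)/S(0)) = (mu - sigma^2/2) t + sigma (Z(t) - Z(0))."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_paths):
        z_increment = math.sqrt(t) * rng.gauss(0.0, 1.0)  # Z(t) - Z(0) ~ N(0, t)
        samples.append((mu - 0.5 * sigma ** 2) * t + sigma * z_increment)
    return samples


def terminal_price(S0, log_return):
    """S(t) = S(0) exp(log return), equation (2.57)."""
    return S0 * math.exp(log_return)


mu, sigma, t = 0.05, 0.2, 1.0
log_returns = sample_log_returns(mu, sigma, t, 100_000)
mean_lr = sum(log_returns) / len(log_returns)
var_lr = sum((x - mean_lr) ** 2 for x in log_returns) / (len(log_returns) - 1)
print(f"mean log return {mean_lr:.4f}  vs  (mu - sigma^2/2) t = {(mu - 0.5 * sigma ** 2) * t:.4f}")
print(f"var  log return {var_lr:.4f}  vs  sigma^2 t          = {sigma ** 2 * t:.4f}")
print("one sampled terminal price:", round(terminal_price(100.0, log_returns[0]), 2))
```

The agreement is only up to Monte Carlo sampling error, which shrinks as the number of paths grows.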

2.6.3 Some more uses of Ito’s Lemma

We can often use Ito’s Lemma and some algebraic tricks to determine some properties of distributions. Let

dX = a(X, t) dt +b(X, t) dZ , (2.58)

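then if G = G(X, t), then

$$dG = \left[a G_X + G_t + \frac{b^2}{2} G_{XX}\right] dt + G_X\, b\, dZ . \tag{2.59}$$

If $E[X] = \bar{X}$, then (b(X, t) and dZ are independent)

$$
\begin{aligned}
E[dX] = d\,E[X] = d\bar{X} &= E[a\, dt] + E[b]\, E[dZ] \\
&= E[a\, dt] ,
\end{aligned}
\tag{2.60}
$$

so that

$$
\begin{aligned}
\frac{d\bar{X}}{dt} &= E[a] = \bar{a} \\
\bar{X} &= E\left[\int_0^t a\, dt\right] .
\end{aligned}
\tag{2.61}
$$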

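Let $\bar{G} = E[(X - \bar{X})^2] = \mathrm{var}(X)$, then

$$
\begin{aligned}
d\bar{G} &= E[dG] \\
&= E[2(X - \bar{X})a - 2(X - \bar{X})\bar{a} + b^2]\, dt + E[2b(X - \bar{X})]\, E[dZ] \\
&= E[b^2\, dt] + E[2(X - \bar{X})(a - \bar{a})\, dt] ,
\end{aligned}
\tag{2.62}
$$

which means that

$$\bar{G} = \mathrm{var}(X) = E\left[\int_0^t b^2\, dt\right] + E\left[\int_0^t 2(a - \bar{a})(X - \bar{X})\, dt\right] . \tag{2.63}$$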

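In a particular case, we can sometimes get more useful expressions. If

$$dS = \mu S\, dt + \sigma S\, dZ \tag{2.64}$$

with µ, σ constant, then

$$
\begin{aligned}
E[dS] = d\bar{S} &= E[\mu S]\, dt \\
&= \mu \bar{S}\, dt ,
\end{aligned}
\tag{2.65}
$$

so that

$$
\begin{aligned}
d\bar{S} &= \mu \bar{S}\, dt \\
\bar{S} &= S_0 e^{\mu t} .
\end{aligned}
\tag{2.66}
$$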

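Now, let $G(S) = S^2$, so that $E[G] = \bar{G} = E[S^2]$, then (from Ito’s Lemma)

$$
\begin{aligned}
d\bar{G} &= E[2\mu S^2 + \sigma^2 S^2]\, dt + E[2 S^2 \sigma]\, E[dZ] \\
&= E[2\mu S^2 + \sigma^2 S^2]\, dt \\
&= (2\mu + \sigma^2)\, \bar{G}\, dt ,
\end{aligned}
\tag{2.67}
$$

so that

$$
\begin{aligned}
\bar{G} &= \bar{G}_0\, e^{(2\mu + \sigma^2) t} \\
E[S^2] &= S_0^2\, e^{(2\mu + \sigma^2) t} .
\end{aligned}
\tag{2.68}
$$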

From equations (2.66) and (2.68) we then have

$$
\begin{aligned}
\mathrm{var}(S) &= E[S^2] - (E[S])^2 \\
&= E[S^2] - \bar{S}^2 \\
&= S_0^2\, e^{2\mu t} (e^{\sigma^2 t} - 1) \\
&= \bar{S}^2 (e^{\sigma^2 t} - 1) .
\end{aligned}
\tag{2.69}
$$

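One can use the same ideas to compute the skewness, $E[(S - \bar{S})^3]$. If $G(S) = S^3$ and $\bar{G} = E[G(S)] = E[S^3]$, then

$$
\begin{aligned}
d\bar{G} &= E\left[\mu S \cdot 3S^2 + \frac{\sigma^2 S^2}{2} \cdot 3 \cdot 2S\right] dt + E[3 S^2 \sigma S]\, E[dZ] \\
&= E[3\mu S^3 + 3\sigma^2 S^3]\, dt \\
&= 3(\mu + \sigma^2)\, \bar{G}\, dt ,
\end{aligned}
\tag{2.70}
$$

so that

$$
\begin{aligned}
\bar{G} &= E[S^3] \\
&= S_0^3\, e^{3(\mu + \sigma^2) t} .
\end{aligned}
\tag{2.71}
$$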

We can then obtain the skewness from

$$
\begin{aligned}
E[(S - \bar{S})^3] &= E[S^3 - 3S^2\bar{S} + 3S\bar{S}^2 - \bar{S}^3] \\
&= E[S^3] - 3\bar{S}\, E[S^2] + 2\bar{S}^3 .
\end{aligned}
\tag{2.72}
$$

Equations (2.66, 2.68, 2.71) can then be substituted into equation (2.72) to get the desired result.
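As a worked sketch of that substitution, start from the binomial expansion $(S - \bar{S})^3 = S^3 - 3S^2\bar{S} + 3S\bar{S}^2 - \bar{S}^3$, take expectations, and insert $\bar{S} = S_0 e^{\mu t}$, $E[S^2] = S_0^2 e^{(2\mu + \sigma^2)t}$ and $E[S^3] = S_0^3 e^{3(\mu + \sigma^2)t}$:

$$
\begin{aligned}
E[(S - \bar{S})^3] &= E[S^3] - 3\bar{S}\, E[S^2] + 2\bar{S}^3 \\
&= S_0^3 e^{3(\mu + \sigma^2)t} - 3 S_0^3 e^{(3\mu + \sigma^2)t} + 2 S_0^3 e^{3\mu t} \\
&= \bar{S}^3 \left(e^{3\sigma^2 t} - 3 e^{\sigma^2 t} + 2\right) .
\end{aligned}
$$

The bracket factors as $(e^{\sigma^2 t} - 1)^2 (e^{\sigma^2 t} + 2)$, so this third central moment is positive whenever σ > 0 and t > 0.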


2.7 The Black-Scholes Analysis

Assume

• The stock price follows geometric Brownian motion, equation (2.34).

• The risk-free rate of return is a constant r.

• There are no arbitrage opportunities, i.e. all risk-free portfolios must earn the risk-free rate of return.

• Short selling is permitted (i.e. we can own negative quantities of an asset).

Suppose that we have an option whose value is given by V = V(S, t). Construct an imaginary portfolio, consisting of one option, and a number $-(\alpha^h)$ of the underlying asset. (If $(\alpha^h) > 0$, then we have sold the asset short, i.e. we have borrowed an asset, sold it, and are obligated to give it back at some future date).

The value of this portfolio P is

$$P = V - (\alpha^h) S \tag{2.73}$$

In a small time dt, $P \to P + dP$,

$$dP = dV - (\alpha^h)\, dS \tag{2.74}$$

Note that in equation (2.74) we have not included a term $(\alpha^h)_S\, S$. This is actually a rather subtle point, since we shall see (later on) that $(\alpha^h)$ actually depends on S. However, if we think of a real situation, at any instant in time, we must choose $(\alpha^h)$, and then we hold the portfolio while the asset moves randomly. So, equation (2.74) is actually the change in the value of the portfolio, not a differential. If we were taking a true differential then equation (2.74) would be

$$dP = dV - (\alpha^h)\, dS - S\, d(\alpha^h)$$

but we have to remember that $(\alpha^h)$ does not change over a small time interval, since we pick $(\alpha^h)$, and then S changes randomly. We are not allowed to peek into the future (otherwise, we could get rich without risk, which is not permitted by the no-arbitrage condition), and hence $(\alpha^h)$ is not allowed to contain any information about future asset price movements. The principle of no peeking into the future is why Ito stochastic calculus is used. Other forms of stochastic calculus are used in Physics applications (e.g. turbulent flow).

Substituting equations (2.34) and (2.38) into equation (2.74) gives

$$dP = \sigma S \left(V_S - (\alpha^h)\right) dZ + \left(\mu S V_S + \frac{\sigma^2 S^2}{2} V_{SS} + V_t - \mu (\alpha^h) S\right) dt \tag{2.75}$$

We can make this portfolio riskless over the time interval dt, by choosing $(\alpha^h) = V_S$ in equation (2.75). This eliminates the dZ term in equation (2.75). (This is the analogue of our choice of the amount of stock in the riskless portfolio for the two state tree model.) So, letting

$$(\alpha^h) = V_S \tag{2.76}$$

then substituting equation (2.76) into equation (2.75) gives

$$dP = \left(V_t + \frac{\sigma^2 S^2}{2} V_{SS}\right) dt \tag{2.77}$$

Since P is now risk-free in the interval $t \to t + dt$, then no-arbitrage says that

$$dP = rP\, dt \tag{2.78}$$

Therefore, equations (2.77) and (2.78) give

$$rP\, dt = \left(V_t + \frac{\sigma^2 S^2}{2} V_{SS}\right) dt \tag{2.79}$$

Since

$$P = V - (\alpha^h) S = V - V_S S \tag{2.80}$$

then substituting equation (2.80) into equation (2.79) gives

$$V_t + \frac{\sigma^2 S^2}{2} V_{SS} + r S V_S - rV = 0 \tag{2.81}$$

which is the Black-Scholes equation. Note the rather remarkable fact that equation (2.81) is independent of the drift rate µ.

Equation (2.81) is solved backwards in time from the option expiry time t = T to the present t = 0.
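One way to build confidence in equation (2.81) is to plug a known solution into it numerically. The sketch below assumes the standard closed-form value of a European call, which is not derived in this section and is brought in purely as a test function, and estimates the derivatives by central finite differences. The function names, step sizes and parameter values are illustrative assumptions.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(S, t, K, r, sigma, T):
    """Standard closed-form European call value (assumed here, not derived in the text)."""
    tau = T - t
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)


def pde_residual(S, t, K, r, sigma, T, hS=1e-2, ht=1e-4):
    """Left-hand side of (2.81), with derivatives from central differences."""
    V = lambda s, u: bs_call(s, u, K, r, sigma, T)
    V_t = (V(S, t + ht) - V(S, t - ht)) / (2.0 * ht)
    V_S = (V(S + hS, t) - V(S - hS, t)) / (2.0 * hS)
    V_SS = (V(S + hS, t) - 2.0 * V(S, t) + V(S - hS, t)) / hS ** 2
    return V_t + 0.5 * sigma ** 2 * S ** 2 * V_SS + r * S * V_S - r * V(S, t)


K, r, sigma, T = 100.0, 0.05, 0.2, 1.0
for S in (80.0, 100.0, 120.0):
    print(S, round(bs_call(S, 0.5, K, r, sigma, T), 4), pde_residual(S, 0.5, K, r, sigma, T))
```

The residuals are zero up to finite-difference error, and the drift rate µ is not an input to either function.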

2.8 Hedging in Continuous Time

We can construct a hedging strategy based on the solution to the above equation. Suppose we sell an option

at price V at t = 0. Then we carry out the following

• We sell one option worth V. (This gives us V in cash initially).

• We borrow $\left(S \frac{\partial V}{\partial S} - V\right)$ from the bank.

• We buy $\frac{\partial V}{\partial S}$ shares at price S.

At every instant in time, we adjust the amount of stock we own so that we always have $\frac{\partial V}{\partial S}$ shares. Note that this is a dynamic hedge, since we have to continually rebalance the portfolio. Cash will flow into and out of the bank account, in response to changes in S. If the amount in the bank is positive, we receive the risk free rate of return. If negative, then we borrow at the risk free rate.

So, our hedging portfolio will be

• Short one option worth V.

• Long $\frac{\partial V}{\partial S}$ shares at price S.

• $V - S \frac{\partial V}{\partial S}$ cash in the bank account.

At any instant in time (including the terminal time), this portfolio can be liquidated and any obligations

implied by the short position in the option can be covered, at zero gain or loss, regardless of the value of S.

Note that given the receipt of the cash for the option, this strategy is self-ﬁnancing.
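In practice the hedge can only be rebalanced at discrete times, so the final position is close to zero but not exactly zero. The following Python sketch simulates the strategy for a European call. It assumes the standard closed-form call value and its derivative with respect to S (neither is derived in this section), and all function names, parameter values, path counts and the seed are illustrative. The stock is simulated with a real-world drift µ different from r to show that the hedge does not depend on µ.

```python
import math
import random


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def call_price_and_delta(S, tau, K, r, sigma):
    """Closed-form European call value and dV/dS (assumed, not derived here)."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    price = S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)
    return price, norm_cdf(d1)


def hedging_error(n_steps, rng, S0=100.0, K=100.0, r=0.05, mu=0.10, sigma=0.2, T=1.0):
    """Final value of: short one call, long dV/dS shares, remainder in the bank."""
    dt = T / n_steps
    S = S0
    V0, shares = call_price_and_delta(S, T, K, r, sigma)
    bank = V0 - shares * S  # cash from selling the option, minus the cost of the shares
    for i in range(1, n_steps + 1):
        phi = rng.gauss(0.0, 1.0)
        S *= math.exp((mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * phi)
        bank *= math.exp(r * dt)  # interest earned or paid at the risk free rate
        if i < n_steps:
            _, new_shares = call_price_and_delta(S, T - i * dt, K, r, sigma)
            bank -= (new_shares - shares) * S  # rebalance to the new dV/dS
            shares = new_shares
    # Liquidate at expiry and pay the option holder.
    return bank + shares * S - max(S - K, 0.0)


def rms_error(n_steps, n_paths=500, seed=0):
    """Root mean square hedging error over simulated paths."""
    rng = random.Random(seed)
    errors = [hedging_error(n_steps, rng) for _ in range(n_paths)]
    return math.sqrt(sum(e * e for e in errors) / n_paths)


for n_steps in (10, 250):
    print(f"rebalances = {n_steps:4d}   RMS hedging error = {rms_error(n_steps):.3f}")
```

The error should shrink as the number of rebalancing times grows, which is the discrete counterpart of the zero gain or loss obtained with continuous rebalancing.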

2.9 The option price

So, we can see that the price of the option valued by the Black-Scholes equation is the market price of the

option at any time. If the price was higher than the Black-Scholes price, we could construct the hedging

portfolio, dynamically adjust the hedge, and end up with a positive amount at the end. Similarly, if the price

was lower than the Black-Scholes price, we could short the hedging portfolio, and end up with a positive

gain. By the no-arbitrage condition, this should not be possible.

Note that we are not trying to predict the price movements of the underlying asset, which is a random

process. The value of the option is based on a hedging strategy which is dynamic, and must be continuously

rebalanced. The price is the cost of setting up the hedging portfolio. The Black-Scholes price is not the

expected payoﬀ.

The price given by the Black-Scholes equation is not the value of the option to a speculator, who buys and

holds the option. A speculator is making bets about the underlying drift rate of the stock (note that the

drift rate does not appear in the Black-Scholes equation). For a speculator, the value of the option is given

by an equation similar to the Black-Scholes equation, except that the drift rate appears. In this case, the

price can be interpreted as the expected payoﬀ based on the guess for the drift rate. But this is art, not

science!


2.10 American early exercise

Actually, most options traded are American options, which have the feature that they can be exercised at

any time. Consequently, an investor acting optimally, will always exercise the option if the value falls below

the payoﬀ or exercise value. So, the value of an American option is given by the solution to equation (2.81)

with the additional constraint

$$V(S, t) \ge \begin{cases} \max(S - K, 0) & \text{for a call} \\ \max(K - S, 0) & \text{for a put} \end{cases} \tag{2.82}$$

Note that since we are working backwards in time, we know what the option is worth in future, and therefore

we can determine the optimal course of action.

In order to write equation (2.81) in more conventional form, deﬁne τ = T − t, so that equation (2.81)

becomes

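$$\begin{aligned}
V_\tau &= \frac{\sigma^2 S^2}{2} V_{SS} + r S V_S - r V \\
V(S, \tau = 0) &= \begin{cases} \max(S - K, 0) & \text{for a call} \\ \max(K - S, 0) & \text{for a put} \end{cases} \\
V(0, \tau) &\to V_\tau = -rV \\
V(S = \infty, \tau) &\to \begin{cases} \simeq S & \text{for a call} \\ \simeq 0 & \text{for a put} \end{cases}
\end{aligned} \tag{2.83}$$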

If the option is American, then we also have the additional constraints

$$V(S, \tau) \ge \begin{cases} \max(S - K, 0) & \text{for a call} \\ \max(K - S, 0) & \text{for a put} \end{cases} \tag{2.84}$$

Deﬁne the operator

$$LV \equiv V_\tau - \left( \frac{\sigma^2 S^2}{2} V_{SS} + r S V_S - r V \right) \tag{2.85}$$

and let $V(S, 0) = V^*$. More formally, the American option pricing problem can be stated as

$$\begin{aligned}
LV &\ge 0 \\
V - V^* &\ge 0 \\
(V - V^*)\, LV &= 0
\end{aligned} \tag{2.86}$$

3 The Risk Neutral World

Suppose instead of valuing an option using the above no-arbitrage argument, we wanted to know the expected

value of the option. We can imagine that we are buying and holding the option, and not hedging. If we

are considering the value of risky cash ﬂows in the future, then these cash ﬂows should be discounted at an

appropriate discount rate, which we will call ρ (i.e. the riskier the cash ﬂows, the higher ρ).

Consequently the value of an option today can be considered to be the discounted future value. This

is simply the old idea of net present value. Regard S today as known, and let V (S +dS, t +dt) be the value

of the option at some future time t +dt, which is uncertain, since S evolves randomly. Thus

$$V(S, t) = \frac{1}{1 + \rho\, dt}\, E(V(S + dS, t + dt)) \tag{3.1}$$

where E(...) is the expectation operator, i.e. the expected value of V (S +dS, t +dt) given that V = V (S, t)

at t = t. We can rewrite equation (3.1) as (ignoring terms of o(dt), where o(dt) represents terms that go to

zero faster than dt )

ρdtV (S, t) = E(V (S, t) +dV ) −V (S, t) . (3.2)


Since we regard V as the expected value, we have E[V (S, t)] = V (S, t), and then

E(V (S, t) +dV ) −V (S, t) = E(dV ) , (3.3)

so that equation (3.2) becomes

ρdtV (S, t) = E(dV ) . (3.4)

Assume that

$$\frac{dS}{S} = \mu\, dt + \sigma\, dZ . \tag{3.5}$$

From Ito’s Lemma (2.38) we have that

$$dV = \left( V_t + \frac{\sigma^2 S^2}{2} V_{SS} + \mu S V_S \right) dt + \sigma S V_S\, dZ . \tag{3.6}$$

Noting that

E(dZ) = 0 (3.7)

then

$$E(dV) = \left( V_t + \frac{\sigma^2 S^2}{2} V_{SS} + \mu S V_S \right) dt . \tag{3.8}$$

Combining equations (3.4-3.8) gives

$$V_t + \frac{\sigma^2 S^2}{2} V_{SS} + \mu S V_S - \rho V = 0 . \tag{3.9}$$

Equation (3.9) is the PDE for the expected value of an option. If we are not hedging, maybe this is the

value that we are interested in, not the no-arbitrage value. However, if this is the case, we have to estimate

the drift rate µ, and the discount rate ρ. Estimating the appropriate discount rate is always a thorny issue.

Now, note the interesting fact, if we set ρ = r and µ = r in equation (3.9) then we simply get the

Black-Scholes equation (2.81).

This means that the no-arbitrage price of an option is identical to the expected value if ρ = r and µ = r.

In other words, we can determine the no-arbitrage price by pretending we are living in a world where all

assets drift at rate r, and all investments are discounted at rate r. This is the so-called risk neutral world.

This result is the source of endless confusion. It is best to think of this as simply a mathematical ﬂuke.

This does not have any reality. Investors would be very stupid to think that the drift rate of risky investments

is r. I’d rather just buy risk-free bonds in this case. There is in reality no such thing as a risk-neutral world.

Nevertheless, this result is useful for determining the no-arbitrage value of an option using a Monte Carlo

approach. Using this numerical method, we simply assume that

dS = rSdt +σSdZ (3.10)

and simulate a large number of random paths. If we know the option payoﬀ as a function of S at t = T,

then we compute

$$V(S, 0) = e^{-rT} E^Q(V(S, T)) \tag{3.11}$$

which should be the no-arbitrage value.

Note the $E^Q$ in the above equation. This makes it clear that we are taking the expectation in the risk

neutral world (the expectation in the Q measure). This contrasts with the real-world expectation (the P

measure).

Suppose we want to know the expected value (in the real world) of an asset which pays V (S, t = T) at

t = T in the future. Then, the expected value (today) is given by solving

$$V_t + \frac{\sigma^2 S^2}{2} V_{SS} + \mu S V_S = 0 , \tag{3.12}$$

where we have dropped the discounting term. In particular, suppose we are going to receive V = S(t = T),

i.e. just the asset at t = T. Assume that the solution to equation (3.12) is $V = \text{Const.}\, A(t) S$, and we find that

$$V = \text{Const.}\, S e^{\mu (T - t)} . \tag{3.13}$$

Noting that we receive V = S at t = T means that

$$V = S e^{\mu (T - t)} . \tag{3.14}$$

Today, we can acquire the asset for price S(t = 0). At t = T, the asset is worth S(t = T). Equation (3.14)

then says that

$$E[V(S(t = 0), t = 0)] = E[S(t = T)] = S(t = 0)\, e^{\mu T} \tag{3.15}$$

In other words, if

dS = Sµ dt +Sσ dZ (3.16)

then (setting t = T)

$$E[S] = S e^{\mu t} . \tag{3.17}$$

Recall that the exact solution to equation (3.16) is (equation (2.57))

$$S(t) = S(0) \exp\left[ \left( \mu - \frac{\sigma^2}{2} \right) t + \sigma (Z(t) - Z(0)) \right] . \tag{3.18}$$

So we have just shown that $E[S] = S e^{\mu t}$ by using a simple PDE argument and Ito’s Lemma. Isn’t this

easier than using brute force statistics? PDEs are much more elegant.
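For readers who would like to see the brute force check anyway, here is a small sketch that samples the exact solution (3.18) many times and compares the sample mean with $S(0) e^{\mu t}$. The function name and the parameter values are illustrative assumptions, not part of the notes.

```python
import math
import random

def mean_terminal_price(s0, mu, sigma, t, n_samples, seed=0):
    """Sample mean of S(t) drawn from the exact GBM solution (3.18)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # Z(t) - Z(0) is normally distributed with mean 0 and variance t
        dz = rng.gauss(0.0, math.sqrt(t))
        total += s0 * math.exp((mu - 0.5 * sigma**2) * t + sigma * dz)
    return total / n_samples

s0, mu, sigma, t = 100.0, 0.10, 0.20, 1.0
estimate = mean_terminal_price(s0, mu, sigma, t, n_samples=200_000)
exact = s0 * math.exp(mu * t)
print(f"sample mean = {estimate:.3f}, S(0) exp(mu t) = {exact:.3f}")
```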

4 Monte Carlo Methods

This brings us to the simplest numerical method for computing the no-arbitrage value of an option. Suppose

that we assume that the underlying process is

$$\frac{dS}{S} = r\, dt + \sigma\, dZ \tag{4.1}$$

then we can simulate a path forward in time, starting at some price today $S_0$, using a forward Euler timestepping method ($S_i = S(t_i)$)

$$S_{i+1} = S_i + S_i \left( r \Delta t + \sigma \phi_i \sqrt{\Delta t} \right) \tag{4.2}$$

where $\Delta t$ is the finite timestep, and $\phi_i$ is a random number which is N(0, 1). Note that at each timestep,

we generate a new random number. After N steps, with T = N∆t, we have a single realized path. Given

the payoﬀ function of the option, the value for this path would be

$$\text{Value} = \text{Payoff}(S_N) . \tag{4.3}$$

For example, if the option was a European call, then

$$\text{Value} = \max(S_N - K, 0), \qquad K = \text{Strike Price} \tag{4.4}$$

Suppose we run a series of trials, m = 1, ..., M, and denote the payoff after the $m$th trial as payoff(m).

Then, the no-arbitrage value of the option is

$$\text{Option Value} = e^{-rT} E(\text{payoff}) \simeq e^{-rT} \frac{1}{M} \sum_{m=1}^{M} \text{payoff}(m) . \tag{4.5}$$

Recall that these paths are not the real paths, but are the risk neutral paths.
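Equations (4.2), (4.4) and (4.5) translate almost line for line into code. The following is a minimal sketch for a European call; the function name, the seed and the parameter values are illustrative assumptions, and plain Python loops are used for clarity rather than speed.

```python
import math
import random

def euler_mc_call(s0, strike, r, sigma, T, n_steps, n_paths, seed=0):
    """Monte Carlo value of a European call using forward Euler paths."""
    rng = random.Random(seed)
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    payoff_sum = 0.0
    for _ in range(n_paths):
        s = s0
        for _ in range(n_steps):
            phi = rng.gauss(0.0, 1.0)                    # new N(0,1) draw each step
            s += s * (r * dt + sigma * phi * sqrt_dt)    # equation (4.2)
        payoff_sum += max(s - strike, 0.0)               # equation (4.4)
    return math.exp(-r * T) * payoff_sum / n_paths       # equation (4.5)

price = euler_mc_call(100.0, 100.0, 0.05, 0.2, 1.0, n_steps=50, n_paths=20_000)
print(f"Monte Carlo call value = {price:.3f}")
```

Note that the drift used in the paths is r, not µ, since these are risk neutral paths.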

Now, we should remember that we are


1. approximating the solution to the SDE by forward Euler, which has O(∆t) truncation error.

2. approximating the expectation by the mean of many random paths. This Monte Carlo error is of size $O(1/\sqrt{M})$, which is slowly converging.

There are thus two sources of error in the Monte Carlo approach: timestepping error and sampling error.

The slow rate of convergence of Monte Carlo methods makes these techniques unattractive except when

the option is written on several (i.e. more than three) underlying assets. As well, since we are simulating

forward in time, we cannot know at a given point in the forward path if it is optimal to exercise or hold an

American style option. This is easy if we use a PDE method, since we solve the PDE backwards in time, so

we always know the continuation value and hence can act optimally. However, if we have more than three

factors, PDE methods become very expensive computationally. As well, if we want to determine the eﬀects

of discrete hedging, for example, a Monte Carlo approach is very easy to implement.

The error in the Monte Carlo method is then

$$\text{Error} = O\left( \max\left( \Delta t, \frac{1}{\sqrt{M}} \right) \right), \qquad \Delta t = \text{timestep}, \quad M = \text{number of Monte Carlo paths} \tag{4.6}$$

Now, it doesn’t make sense to drive the Monte Carlo error down to zero if there is O(∆t) timestepping error.

We should seek to balance the timestepping error and the sampling error. In order to make these two errors

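the same order, we should choose $M = O\left( \frac{1}{(\Delta t)^2} \right)$. This makes the total error $O(\Delta t)$. We also have that

$$\text{Complexity} = O\left( \frac{M}{\Delta t} \right) = O\left( \frac{1}{(\Delta t)^3} \right), \qquad \Delta t = O\left( (\text{Complexity})^{-1/3} \right) \tag{4.7}$$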

and hence

$$\text{Error} = O\left( \frac{1}{(\text{Complexity})^{1/3}} \right) . \tag{4.8}$$

In practice, the convergence in terms of timestep error is often not done. People just pick a timestep,

e.g. one day, and increase the number of Monte Carlo samples until they achieve convergence in terms of

sampling error, and ignore the timestep error. Sometimes this gives bad results!

Note that the exact solution to Geometric Brownian motion (2.57) has the property that the asset value

S can never reach S = 0 if S(0) > 0, in any ﬁnite time. However, due to the approximate nature of our

Forward Euler method for solving the SDE, it is possible that a negative or zero $S_i$ can show up. We can do one of three things in this case:

• Cut back the timestep at this point in the simulation so that S is positive.

• Set S = 0 and continue. In this case, S remains zero for the rest of this particular simulation.

• Use Ito’s Lemma, and determine the SDE for log S, i.e. if F = log S, then, from equation (2.55), we obtain (with µ = r)

$$dF = \left( r - \frac{\sigma^2}{2} \right) dt + \sigma\, dZ , \tag{4.9}$$

so that now, if F < 0, there is no problem, since $S = e^F$, and F < 0 just means that S < 1. We can use this idea for any stochastic process where the variable should not go negative.
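The third option can be sketched as follows: step F = log S with equation (4.9) and recover S by exponentiation, so S stays positive however large the timestep. The function name and the deliberately harsh test parameters (σ = 2, only four steps) are assumptions chosen for illustration.

```python
import math
import random

def log_euler_path(s0, r, sigma, T, n_steps, rng):
    """Simulate one path by timestepping F = log S with equation (4.9)."""
    dt = T / n_steps
    f = math.log(s0)
    path = [s0]
    for _ in range(n_steps):
        phi = rng.gauss(0.0, 1.0)
        f += (r - 0.5 * sigma**2) * dt + sigma * phi * math.sqrt(dt)
        path.append(math.exp(f))      # S = exp(F) is always positive
    return path

rng = random.Random(7)
paths = [log_euler_path(1.0, 0.05, 2.0, 1.0, 4, rng) for _ in range(1000)]
smallest = min(min(p) for p in paths)
print(f"smallest simulated S over 1000 paths = {smallest:.3e}")
```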


Usually, most people set S = 0 and continue. As long as the timestep is not too large, this situation is

probably due to an event of low probability, hence any errors incurred will not aﬀect the expected value very

much. If negative S values show up many times, this is a signal that the timestep is too large.

In the case of simple Geometric Brownian motion, where r, σ are constants, then the SDE can be solved

exactly, and we can avoid timestepping errors (see Section 2.6.2). In this case

$$S(T) = S(0) \exp\left[ \left( r - \frac{\sigma^2}{2} \right) T + \sigma \phi \sqrt{T} \right] \tag{4.10}$$

where φ ∼ N(0, 1). I’ll remind you that equation (4.10) is exact. For these simple cases, we should always

use equation (4.10). Unfortunately, this does not work in more realistic situations.
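As a sketch of how equation (4.10) is used, the following values a European call with one normal draw per path and no timestepping at all. The function name and parameter values are illustrative assumptions.

```python
import math
import random

def exact_mc_call(s0, strike, r, sigma, T, n_paths, seed=0):
    """Monte Carlo value of a European call, sampling S(T) from (4.10)."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma**2) * T
    vol = sigma * math.sqrt(T)
    payoff_sum = 0.0
    for _ in range(n_paths):
        phi = rng.gauss(0.0, 1.0)
        s_T = s0 * math.exp(drift + vol * phi)   # exact, no timestepping error
        payoff_sum += max(s_T - strike, 0.0)
    return math.exp(-r * T) * payoff_sum / n_paths

exact_price = exact_mc_call(100.0, 100.0, 0.05, 0.2, 1.0, n_paths=100_000)
print(f"call value from exact sampling = {exact_price:.3f}")
```

The only remaining error here is the sampling error.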

Monte Carlo is popular because

• It is simple to code. Easily handles complex path dependence.

• Easily handles multiple assets.

The disadvantages of Monte Carlo methods are

• It is diﬃcult to apply this idea to problems involving optimal decision making (e.g. American options).

• It is hard to compute the Greeks ($V_S$, $V_{SS}$), which are the hedging parameters, very accurately.

• MC converges slowly.

4.1 Monte Carlo Error Estimators

The sampling error can be estimated via a statistical approach. If the estimated mean of the sample is

$$\hat{\mu} = \frac{e^{-rT}}{M} \sum_{m=1}^{M} \text{payoff}(m) \tag{4.11}$$

and the sample standard deviation of the discounted payoffs is

$$\omega = \left[ \frac{1}{M - 1} \sum_{m=1}^{M} \left( e^{-rT}\, \text{payoff}(m) - \hat{\mu} \right)^2 \right]^{1/2} \tag{4.12}$$

then the 95% confidence interval for the actual value V of the option is

$$\hat{\mu} - \frac{1.96\, \omega}{\sqrt{M}} < V < \hat{\mu} + \frac{1.96\, \omega}{\sqrt{M}} \tag{4.13}$$

Note that in order to reduce this error by a factor of 10, the number of simulations must be increased by a factor of 100.

The timestep error can be estimated by running the problem with diﬀerent size timesteps, comparing the

solutions.
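Equations (4.11) to (4.13) can be sketched as a small helper that takes a list of discounted payoffs and returns the estimate together with its 95% confidence interval. The function name and the toy input list are assumptions for illustration; in practice the list would hold $e^{-rT}\,\text{payoff}(m)$ for each path.

```python
import math

def mc_confidence_interval(discounted_payoffs):
    """Return (mu_hat, lower, upper) following equations (4.11)-(4.13)."""
    M = len(discounted_payoffs)
    mu_hat = sum(discounted_payoffs) / M                         # (4.11)
    omega = math.sqrt(
        sum((p - mu_hat) ** 2 for p in discounted_payoffs) / (M - 1)
    )                                                            # (4.12)
    half_width = 1.96 * omega / math.sqrt(M)                     # (4.13)
    return mu_hat, mu_hat - half_width, mu_hat + half_width

# Toy data, only to show the call
mu_hat, lower, upper = mc_confidence_interval([1.0, 2.0, 3.0, 4.0, 5.0])
print(f"{lower:.4f} < V < {upper:.4f}, estimate {mu_hat:.4f}")
```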

4.2 Random Numbers and Monte Carlo

There are many good algorithms for generating random sequences which are uniformly distributed in [0, 1].

See, for example, (Numerical Recipes in C++, Press et al., Cambridge University Press, 2002). As pointed

out in this book, often the system supplied random number generators, such as rand in the standard C

library, and the infamous RANDU IBM function, are extremely bad. The Matlab functions appear to be

quite good. For more details, please look at (Park and Miller, Communications of the ACM, 31 (1988) 1192-1201). Another good generator is described in (Matsumoto and Nishimura, “The Mersenne Twister: a 623 dimensionally equidistributed uniform pseudorandom number generator,” ACM Transactions on Modeling and Computer Simulation, 8 (1998) 3-30.) Code can be downloaded from the authors’ Web site.

However, we need random numbers which are normally distributed on [−∞, +∞], with mean zero and

variance one (N(0, 1)).

Suppose we have uniformly distributed numbers on [0, 1], i.e. the probability of obtaining a number

between x and x +dx is

p(x)dx = dx ; 0 ≤ x ≤ 1

= 0 ; otherwise (4.14)

Let’s take a function of this random variable y(x). How is y(x) distributed? Let $\hat{p}(y)$ be the probability distribution of obtaining y in [y, y + dy]. Consequently, we must have (recall the law of transformation of probabilities)

$$p(x) |dx| = \hat{p}(y) |dy|$$

or

$$\hat{p}(y) = p(x) \left| \frac{dx}{dy} \right| . \tag{4.15}$$

Suppose we want $\hat{p}(y)$ to be normal,

$$\hat{p}(y) = \frac{e^{-y^2/2}}{\sqrt{2\pi}} . \tag{4.16}$$

If we start with a uniform distribution, p(x) = 1 on [0, 1], then from equations (4.15-4.16) we obtain

$$\frac{dx}{dy} = \frac{e^{-y^2/2}}{\sqrt{2\pi}} . \tag{4.17}$$

Now, for x ∈ [0, 1], we have that the probability of obtaining a number in [0, x] is

$$\int_0^x dx' = x , \tag{4.18}$$

but from equation (4.17) we have

$$dx' = \frac{e^{-(y')^2/2}}{\sqrt{2\pi}}\, dy' . \tag{4.19}$$

So, there exists a y such that the probability of getting a $y'$ in [−∞, y] is equal to the probability of getting $x'$ in [0, x],

$$\int_0^x dx' = \int_{-\infty}^{y} \frac{e^{-(y')^2/2}}{\sqrt{2\pi}}\, dy' , \tag{4.20}$$

or

$$x = \int_{-\infty}^{y} \frac{e^{-(y')^2/2}}{\sqrt{2\pi}}\, dy' . \tag{4.21}$$

So, if we generate uniformly distributed numbers x on [0, 1], then to determine y which are N(0, 1), we do

the following

• Generate x

• Find y such that

$$x = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} e^{-(y')^2/2}\, dy' . \tag{4.22}$$

We can write this last step as

y = F(x) (4.23)

where F(x) is the inverse cumulative normal distribution.


4.3 The Box-Muller Algorithm

Starting from random numbers which are uniformly distributed on [0, 1], there is actually a simpler method

for obtaining random numbers which are normally distributed.

If p(x) is the probability of finding x ∈ [x, x + dx] and if y = y(x), and $\hat{p}(y)$ is the probability of finding y ∈ [y, y + dy], then, from equation (4.15) we have

$$|p(x)\, dx| = |\hat{p}(y)\, dy| \tag{4.24}$$

or

$$\hat{p}(y) = p(x) \left| \frac{dx}{dy} \right| . \tag{4.25}$$

Now, suppose we have two original random variables $x_1, x_2$, and let $p(x_1, x_2)$ be the probability of obtaining $(x_1, x_2)$ in $[x_1, x_1 + dx_1] \times [x_2, x_2 + dx_2]$. Then, if

$$y_1 = y_1(x_1, x_2), \qquad y_2 = y_2(x_1, x_2) \tag{4.26}$$

we have that

$$\hat{p}(y_1, y_2) = p(x_1, x_2) \left| \frac{\partial (x_1, x_2)}{\partial (y_1, y_2)} \right| \tag{4.27}$$

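where the Jacobian of the transformation is defined as

$$\frac{\partial (x_1, x_2)}{\partial (y_1, y_2)} = \det \begin{bmatrix} \frac{\partial x_1}{\partial y_1} & \frac{\partial x_1}{\partial y_2} \\ \frac{\partial x_2}{\partial y_1} & \frac{\partial x_2}{\partial y_2} \end{bmatrix} \tag{4.28}$$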

Recall that the Jacobian of the transformation can be regarded as the scaling factor which transforms $dx_1\, dx_2$ to $dy_1\, dy_2$, i.e.

$$dx_1\, dx_2 = \left| \frac{\partial (x_1, x_2)}{\partial (y_1, y_2)} \right| dy_1\, dy_2 . \tag{4.29}$$

Now, suppose that we have $x_1, x_2$ uniformly distributed on $[0, 1] \times [0, 1]$, i.e.

$$p(x_1, x_2) = U(x_1)\, U(x_2) \tag{4.30}$$

where

U(x) = 1 ; 0 ≤ x ≤ 1

= 0 ; otherwise . (4.31)

We denote this distribution as $x_1 \sim U[0, 1]$ and $x_2 \sim U[0, 1]$.

If $p(x_1, x_2)$ is given by equation (4.30), then we have from equation (4.27) that

$$\hat{p}(y_1, y_2) = \left| \frac{\partial (x_1, x_2)}{\partial (y_1, y_2)} \right| \tag{4.32}$$

Now, we want to find a transformation $y_1 = y_1(x_1, x_2)$, $y_2 = y_2(x_1, x_2)$ which results in normal distributions for $y_1, y_2$. Consider

$$y_1 = \sqrt{-2 \log x_1}\, \cos 2\pi x_2, \qquad y_2 = \sqrt{-2 \log x_1}\, \sin 2\pi x_2 \tag{4.33}$$

or solving for $(x_1, x_2)$

$$x_1 = \exp\left[ -\frac{1}{2} (y_1^2 + y_2^2) \right], \qquad x_2 = \frac{1}{2\pi} \tan^{-1}\left[ \frac{y_2}{y_1} \right] . \tag{4.34}$$

After some tedious algebra, we can see that (using equation (4.34))

$$\left| \frac{\partial (x_1, x_2)}{\partial (y_1, y_2)} \right| = \frac{1}{\sqrt{2\pi}} e^{-y_1^2/2}\; \frac{1}{\sqrt{2\pi}} e^{-y_2^2/2} \tag{4.35}$$

Now, assuming that equation (4.30) holds, then from equations (4.32-4.35) we have

$$\hat{p}(y_1, y_2) = \frac{1}{\sqrt{2\pi}} e^{-y_1^2/2}\; \frac{1}{\sqrt{2\pi}} e^{-y_2^2/2} \tag{4.36}$$

so that $(y_1, y_2)$ are independent, normally distributed random variables, with mean zero and variance one, or

$$y_1 \sim N(0, 1) ; \qquad y_2 \sim N(0, 1) . \tag{4.37}$$

This gives the following algorithm for generating normally distributed random numbers (given uniformly

distributed numbers):

Box Muller Algorithm

    Repeat
        Generate U_1 ∼ U(0, 1), U_2 ∼ U(0, 1)
        θ = 2π U_2, ρ = sqrt(−2 log U_1)
        Z_1 = ρ cos θ; Z_2 = ρ sin θ
    End Repeat                                        (4.38)

This has the effect that $Z_1 \sim N(0, 1)$ and $Z_2 \sim N(0, 1)$.

Note that we generate two draws from a normal distribution on each pass through the loop.
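One pass through the loop of algorithm (4.38) can be sketched as below. The function name is an assumption, and the first uniform is shifted to the interval (0, 1] so that the logarithm is never evaluated at zero; this guard is an implementation detail not discussed in the text.

```python
import math
import random

def box_muller(rng):
    """Return two independent N(0,1) draws from two uniform draws (4.38)."""
    u1 = 1.0 - rng.random()     # in (0, 1], avoids log(0)
    u2 = rng.random()
    rho = math.sqrt(-2.0 * math.log(u1))
    theta = 2.0 * math.pi * u2
    return rho * math.cos(theta), rho * math.sin(theta)

rng = random.Random(0)
draws = []
for _ in range(50_000):
    z1, z2 = box_muller(rng)
    draws.extend((z1, z2))
bm_mean = sum(draws) / len(draws)
bm_var = sum((z - bm_mean) ** 2 for z in draws) / (len(draws) - 1)
print(f"mean = {bm_mean:.4f}, variance = {bm_var:.4f}")
```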

4.3.1 An improved Box Muller

The algorithm (4.38) can be expensive due to the trigonometric function evaluations. We can use the

following method to avoid these evaluations. Let

$$\begin{aligned}
U_1 &\sim U[0, 1] ; & U_2 &\sim U[0, 1] \\
V_1 &= 2U_1 - 1 ; & V_2 &= 2U_2 - 1
\end{aligned} \tag{4.39}$$

which means that $(V_1, V_2)$ are uniformly distributed in $[-1, 1] \times [-1, 1]$. Now, we carry out the following procedure

Rejection Method

    Repeat
        If ( V_1^2 + V_2^2 < 1 )
            Accept
        Else
            Reject
        Endif
    End Repeat                                        (4.40)

which means that if we define $(V_1, V_2)$ as in equation (4.39), and then process the pairs $(V_1, V_2)$ using algorithm (4.40), we have that the accepted $(V_1, V_2)$ are uniformly distributed on the disk centered at the origin, with radius one, in the $(V_1, V_2)$ plane. This is denoted by

$$(V_1, V_2) \sim D(0, 1) . \tag{4.41}$$

If $(V_1, V_2) \sim D(0, 1)$ and $R^2 = V_1^2 + V_2^2$, then the probability of finding $R$ in $[R, R + dR]$ is

$$p(R)\, dR = \frac{2\pi R\, dR}{\pi (1)^2} = 2R\, dR . \tag{4.42}$$

From the fundamental law of transformation of probabilities, we have that

$$p(R^2)\, d(R^2) = p(R)\, dR = 2R\, dR \tag{4.43}$$

so that

$$p(R^2) = \frac{2R}{\dfrac{d(R^2)}{dR}} = 1 \tag{4.44}$$

so that $R^2$ is uniformly distributed on [0, 1], ($R^2 \sim U[0, 1]$).

As well, if $\theta = \tan^{-1}(V_2 / V_1)$, i.e. $\theta$ is the angle between a line from the origin to the point $(V_1, V_2)$ and the $V_1$ axis, then $\theta \sim U[0, 2\pi]$. Note that

$$\cos\theta = \frac{V_1}{\sqrt{V_1^2 + V_2^2}}, \qquad \sin\theta = \frac{V_2}{\sqrt{V_1^2 + V_2^2}} . \tag{4.45}$$

Now in the original Box Muller algorithm (4.38),

$$\begin{aligned}
\rho &= \sqrt{-2 \log U_1} ; & U_1 &\sim U[0, 1] \\
\theta &= 2\pi U_2 ; & U_2 &\sim U[0, 1] ,
\end{aligned} \tag{4.46}$$

but $\theta = \tan^{-1}(V_2 / V_1) \sim U[0, 2\pi]$, and $R^2 \sim U[0, 1]$. Therefore, if we let $W = R^2$, then we can replace $\theta, \rho$ in algorithm (4.38) by

$$\theta = \tan^{-1}\left( \frac{V_2}{V_1} \right), \qquad \rho = \sqrt{-2 \log W} . \tag{4.47}$$

Now, the last step in the Box Muller algorithm (4.38) is

$$Z_1 = \rho \cos\theta, \qquad Z_2 = \rho \sin\theta , \tag{4.48}$$

but since $W = R^2 = V_1^2 + V_2^2$, then $\cos\theta = V_1 / R$, $\sin\theta = V_2 / R$, so that

$$Z_1 = \rho \frac{V_1}{\sqrt{W}}, \qquad Z_2 = \rho \frac{V_2}{\sqrt{W}} . \tag{4.49}$$

This leads to the following algorithm

Polar form of Box Muller

    Repeat
        Generate U_1 ∼ U[0, 1], U_2 ∼ U[0, 1].
        Let
            V_1 = 2U_1 − 1
            V_2 = 2U_2 − 1
            W = V_1^2 + V_2^2
        If ( W < 1 ) then
            Z_1 = V_1 sqrt(−2 log W / W)
            Z_2 = V_2 sqrt(−2 log W / W)              (4.50)
        End If
    End Repeat

Consequently, $(Z_1, Z_2)$ are independent (uncorrelated), with $Z_1 \sim N(0, 1)$ and $Z_2 \sim N(0, 1)$. Because of the rejection step (4.40), about $(1 - \pi/4)$ of the random draws in $[-1, +1] \times [-1, +1]$ are rejected (about 21%),

but this method is still generally more eﬃcient than brute force Box Muller.
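A sketch of the polar form (4.50) follows, together with a count of how many candidate pairs are rejected. The function name and the bookkeeping of rejections are assumptions added for illustration; the case W = 0 is also rejected so that the logarithm is always defined.

```python
import math
import random

def polar_box_muller(rng):
    """Return (z1, z2, n_rejected) using the polar form (4.50)."""
    n_rejected = 0
    while True:
        v1 = 2.0 * rng.random() - 1.0
        v2 = 2.0 * rng.random() - 1.0
        w = v1 * v1 + v2 * v2
        if 0.0 < w < 1.0:                       # accept points inside the unit disk
            factor = math.sqrt(-2.0 * math.log(w) / w)
            return v1 * factor, v2 * factor, n_rejected
        n_rejected += 1                         # outside the disk: draw again

rng = random.Random(1)
polar_draws, rejected, accepted = [], 0, 50_000
for _ in range(accepted):
    z1, z2, n_rej = polar_box_muller(rng)
    polar_draws.extend((z1, z2))
    rejected += n_rej
reject_fraction = rejected / (rejected + accepted)
polar_mean = sum(polar_draws) / len(polar_draws)
polar_var = sum((z - polar_mean) ** 2 for z in polar_draws) / (len(polar_draws) - 1)
print(f"rejected fraction = {reject_fraction:.3f}, 1 - pi/4 = {1 - math.pi / 4:.3f}")
```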

4.4 Speeding up Monte Carlo

Monte Carlo methods are slow to converge, since the error is given by

$$\text{Error} = O\left( \frac{1}{\sqrt{M}} \right)$$

where M is the number of samples. There are many methods which can be used to try to speed up

convergence. These are usually termed Variance Reduction techniques.

Perhaps the simplest idea is the Antithetic Variable method. Suppose we compute a random asset path

$$S_{i+1} = S_i + S_i \mu \Delta t + S_i \sigma \phi_i \sqrt{\Delta t}$$

where $\phi_i$ are N(0, 1). We store all the $\phi_i$, i = 1, ..., for a given path. Call the estimate for the option price from this sample path $V^+$. Then compute a second sample path where $(\phi_i)' = -\phi_i$, i = 1, .... Call this estimate $V^-$. Then compute the average

$$\bar{V} = \frac{V^+ + V^-}{2} ,$$

and continue sampling in this way. Averaging over all the $\bar{V}$, slightly faster convergence is obtained. Intuitively, we can see that this symmetrizes the random paths.


Let $X^+$ be the option values obtained from all the $V^+$ simulations, and $X^-$ be the estimates obtained from all the $V^-$ simulations. Note that $\text{Var}(X^+) = \text{Var}(X^-)$ (they have the same distribution). Then

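$$\begin{aligned}
\text{Var}\left( \frac{X^+ + X^-}{2} \right) &= \frac{1}{4} \text{Var}(X^+) + \frac{1}{4} \text{Var}(X^-) + \frac{1}{2} \text{Cov}(X^+, X^-) \\
&= \frac{1}{2} \text{Var}(X^+) + \frac{1}{2} \text{Cov}(X^+, X^-)
\end{aligned} \tag{4.51}$$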

which will be smaller than $\text{Var}(X^+)$ if $\text{Cov}(X^+, X^-)$ is nonpositive. Warning: this is not always the case.

For example, if the payoﬀ is not a monotonic function of S, the results may actually be worse than crude

Monte Carlo. For example, if the payoﬀ is a capped call

$$\text{payoff} = \min(K_2, \max(S - K_1, 0)), \qquad K_2 > K_1$$

then the antithetic method performs poorly.

Note that this method can be used to estimate the mean. In the MC error estimator (4.13), compute the standard deviation of the estimator as $\omega = \sqrt{\text{Var}\left( \frac{X^+ + X^-}{2} \right)}$.

However, if we want to estimate the distribution of option prices (i.e. a probability distribution), then

we should not average each $V^+$ and $V^-$, since this changes the variance of the actual distribution.

If we want to know the actual variance of the distribution (and not just the mean), then to compute the

variance of the distribution, we should just use the estimates $V^+$, and compute the estimate of the variance

in the usual way. This should also be used if we want to plot a histogram of the distribution, or compute

the Value at Risk.
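Here is a sketch of the antithetic idea for a European call. To keep it short it makes one simplifying assumption that differs from the text: each path uses the exact solution (4.10) with a single normal draw, rather than a stored sequence of Euler increments, so the antithetic path just uses −φ. The function name and parameters are illustrative.

```python
import math
import random
import statistics

def antithetic_call_samples(s0, strike, r, sigma, T, n_pairs, seed=0):
    """Return (v_plus, v_bar): plain estimates and antithetic pair averages."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma**2) * T
    vol = sigma * math.sqrt(T)
    disc = math.exp(-r * T)
    v_plus, v_bar = [], []
    for _ in range(n_pairs):
        phi = rng.gauss(0.0, 1.0)
        plus = disc * max(s0 * math.exp(drift + vol * phi) - strike, 0.0)
        minus = disc * max(s0 * math.exp(drift - vol * phi) - strike, 0.0)
        v_plus.append(plus)                  # V+ from the original draw
        v_bar.append(0.5 * (plus + minus))   # average of V+ and V-
    return v_plus, v_bar

v_plus, v_bar = antithetic_call_samples(100.0, 100.0, 0.05, 0.2, 1.0, 20_000)
anti_price = statistics.fmean(v_bar)
anti_std_error = statistics.stdev(v_bar) / math.sqrt(len(v_bar))
print(f"price = {anti_price:.3f} +/- {1.96 * anti_std_error:.3f}")
print(f"Var(V+) = {statistics.variance(v_plus):.2f}, "
      f"Var(Vbar) = {statistics.variance(v_bar):.2f}")
```

As noted in the text, the list `v_plus` (not `v_bar`) is the one to use for a histogram or a variance of the actual distribution.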

4.5 Estimating the mean and variance

An estimate of the mean $\bar{x}$ and variance $s_M^2$ of M numbers $x_1, x_2, ..., x_M$ is

$$s_M^2 = \frac{1}{M - 1} \sum_{i=1}^{M} (x_i - \bar{x})^2, \qquad \bar{x} = \frac{1}{M} \sum_{i=1}^{M} x_i \tag{4.52}$$

Alternatively, one can use

$$s_M^2 = \frac{1}{M - 1} \left[ \sum_{i=1}^{M} x_i^2 - \frac{1}{M} \left( \sum_{i=1}^{M} x_i \right)^2 \right] \tag{4.53}$$

which has the advantage that the estimate of the mean and standard deviation can be computed in one loop.

In order to avoid roundoff, the following method is suggested by Seydel (R. Seydel, Tools for Computational Finance, Springer, 2002). Set

$$\alpha_1 = x_1 ; \qquad \beta_1 = 0 \tag{4.54}$$

then compute recursively

$$\begin{aligned}
\alpha_i &= \alpha_{i-1} + \frac{x_i - \alpha_{i-1}}{i} \\
\beta_i &= \beta_{i-1} + \frac{(i - 1)(x_i - \alpha_{i-1})^2}{i}
\end{aligned} \tag{4.55}$$

so that

$$\bar{x} = \alpha_M, \qquad s_M^2 = \frac{\beta_M}{M - 1} \tag{4.56}$$
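The recursion (4.54) to (4.56) can be sketched as a single loop over the data. The function name is an assumption; the second example uses numbers with a large common offset, which is the kind of input where the one-pass formula (4.53) tends to lose digits.

```python
import statistics

def running_mean_variance(xs):
    """One-pass mean and variance using the recursion (4.54)-(4.56)."""
    alpha, beta = 0.0, 0.0
    count = 0
    for i, x in enumerate(xs, start=1):
        if i == 1:
            alpha, beta = x, 0.0                    # (4.54)
        else:
            delta = x - alpha                       # x_i - alpha_{i-1}
            beta += (i - 1) * delta * delta / i     # (4.55), uses the old alpha
            alpha += delta / i
        count = i
    return alpha, beta / (count - 1)                # (4.56)

mean_small, var_small = running_mean_variance([1.0, 2.0, 3.0, 4.0, 5.0])
shifted = [1.0e9 + k for k in (1.0, 2.0, 3.0, 4.0, 5.0)]
mean_big, var_big = running_mean_variance(shifted)
print(mean_small, var_small, var_big)
```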


4.6 Low Discrepancy Sequences

In an effort to get around the $1/\sqrt{M}$ ($M$ = number of samples) behaviour of Monte Carlo methods, quasi-Monte Carlo methods have been devised.

These techniques use a deterministic sequence of numbers (low discrepancy sequences). The idea here

is that a Monte Carlo method does not fill the sample space very evenly (after all, it’s random). A low

discrepancy sequence tends to sample the space in an orderly fashion. If d is the dimension of the space, then

the worst case error bound for an LDS method is

$$\text{Error} = O\left( \frac{(\log M)^d}{M} \right) \tag{4.57}$$

where M is the number of samples used. Clearly, if d is small, then this error bound is (at least asymptotically)

better than Monte Carlo.

LDS methods generate numbers on [0, 1]. We cannot use the Box-Muller method in this case to produce

normally distributed numbers, since these numbers are deterministic. We have to invert the cumulative

normal distribution in order to get the numbers distributed with mean zero and standard deviation one on

[−∞, +∞]. So, if F(x) is the inverse cumulative normal distribution, then

$$\begin{aligned}
x_{LDS} &= \text{uniformly distributed on } [0, 1] \\
y_{LDS} &= F(x_{LDS}) \text{ is } N(0, 1) .
\end{aligned} \tag{4.58}$$

Another problem has to do with the fact that if we are stepping through time, i.e.

$$S^{n+1} = S^n + S^n \left( r \Delta t + \phi \sigma \sqrt{\Delta t} \right), \qquad \phi \sim N(0, 1) \tag{4.59}$$

with, say, N steps in total, then we need to think of this as a problem in N dimensional space. In other words, the k-th timestep is sampled from the k-th coordinate in this N dimensional space. We are trying to uniformly sample from this N dimensional space.

Let $\hat{x}$ be a vector of LDS numbers on [0, 1], in N dimensional space

$$\hat{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix} . \tag{4.60}$$

So, an LDS algorithm would proceed as follows, for the $j$th trial

• Generate $\hat{x}^j$ (the $j$th LDS number in an N dimensional space).

• Generate the normally distributed vector $\hat{y}^j$ by inverting the cumulative normal distribution for each component

$$\hat{y}^j = \begin{bmatrix} F(x^j_1) \\ F(x^j_2) \\ \vdots \\ F(x^j_N) \end{bmatrix} \tag{4.61}$$

• Generate a complete sample path k = 0, ..., N − 1

$$S^{k+1}_j = S^k_j + S^k_j \left( r \Delta t + \hat{y}^j_{k+1} \sigma \sqrt{\Delta t} \right) \tag{4.62}$$

• Compute the payoff at $S = S^N_j$

The option value is the average of these trials.

There are a variety of LDS numbers: Halton, Sobol, Niederreiter, etc. Our tests seem to indicate that

Sobol is the best.
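The LDS algorithm above can be sketched with a Halton sequence, chosen here only because it takes a few lines to write by hand (a Sobol generator, which the text prefers, needs direction numbers and is not reproduced). The function names, the list of primes and the test parameters are assumptions. The inverse cumulative normal F is taken from `statistics.NormalDist.inv_cdf`.

```python
import math
from statistics import NormalDist

PRIMES = [2, 3, 5, 7, 11, 13, 17, 19]   # one base per dimension (timestep)

def van_der_corput(index, base):
    """Radical inverse of a positive integer index in the given base."""
    result, weight = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * weight
        weight /= base
    return result

def halton_point(j, dim):
    """The j-th Halton point (j >= 1) in dim dimensions, each in (0, 1)."""
    return [van_der_corput(j, PRIMES[k]) for k in range(dim)]

def lds_call_price(s0, strike, r, sigma, T, n_steps, n_trials):
    """European call value: Halton points, inversion (4.61), Euler path (4.62)."""
    inv_cdf = NormalDist().inv_cdf
    dt = T / n_steps
    payoff_sum = 0.0
    for j in range(1, n_trials + 1):
        x_hat = halton_point(j, n_steps)
        s = s0
        for k in range(n_steps):
            y = inv_cdf(x_hat[k])                       # k-th coordinate, k-th step
            s += s * (r * dt + y * sigma * math.sqrt(dt))
        payoff_sum += max(s - strike, 0.0)
    return math.exp(-r * T) * payoff_sum / n_trials

lds_price = lds_call_price(100.0, 100.0, 0.05, 0.2, 1.0, n_steps=4, n_trials=4096)
print(f"LDS call value (4 timesteps) = {lds_price:.3f}")
```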

Note that the worst case error bound for the error is given by equation (4.57). If we use a reasonable

number of timesteps, say 50 −100, then, d = 50 −100, which gives a very bad error bound. For d large, the

numerator in equation (4.57) dominates. The denominator only dominates when

$$M \simeq e^d \tag{4.63}$$

which is a very large number of trials for $d \simeq 100$. Fortunately, at least for path-dependent options, we have

found that things are not quite this bad, and LDS seems to work if the number of timesteps is less than

100 −200. However, once the dimensionality gets above a few hundred, convergence seems to slow down.

4.7 Correlated Random Numbers

In many cases involving multiple assets, we would like to generate correlated, normally distributed random

numbers. Suppose we have i = 1, ..., d assets, and each asset follows the simulated path

$$S^{n+1}_i = S^n_i + S^n_i \left( r \Delta t + \phi^n_i \sigma_i \sqrt{\Delta t} \right) \tag{4.64}$$

where $\phi^n_i$ is N(0, 1) and

$$E(\phi^n_i \phi^n_j) = \rho_{ij} \tag{4.65}$$

where $\rho_{ij}$ is the correlation between asset i and asset j.

Now, it is easy to generate a set of d uncorrelated N(0, 1) variables. Call these $\epsilon_1, ..., \epsilon_d$. So, how do we produce correlated numbers? Let

$$[\Psi]_{ij} = \rho_{ij} \tag{4.66}$$

be the matrix of correlation coefficients. Assume that this matrix is symmetric positive definite (SPD) (if not, one of the random variables is a linear combination of the others, hence this is a degenerate case). Assuming Ψ is SPD, we can Cholesky factor $\Psi = L L^t$, so that

$$\rho_{ij} = \sum_k L_{ik} L^t_{kj} \tag{4.67}$$

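Let $\bar{\phi}$ be the vector of correlated normally distributed random numbers (i.e. what we want to get), and let $\bar{\epsilon}$ be the vector of uncorrelated N(0, 1) numbers (i.e. what we are given).

$$\bar{\phi} = \begin{bmatrix} \phi_1 \\ \phi_2 \\ \vdots \\ \phi_d \end{bmatrix} ; \qquad \bar{\epsilon} = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_d \end{bmatrix} \tag{4.68}$$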

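So, given $\bar{\epsilon}$, we have

$$E(\epsilon_i \epsilon_j) = \delta_{ij}$$

where

$$\delta_{ij} = \begin{cases} 0 & \text{if } i \ne j \\ 1 & \text{if } i = j \end{cases}$$

since the $\epsilon_i$ are uncorrelated. Now, let

$$\phi_i = \sum_j L_{ij} \epsilon_j \tag{4.69}$$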

which gives

$$\phi_i \phi_k = \sum_j \sum_l L_{ij} L_{kl}\, \epsilon_l \epsilon_j = \sum_j \sum_l L_{ij}\, \epsilon_l \epsilon_j\, L^t_{lk} . \tag{4.70}$$

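Now,

$$\begin{aligned}
E(\phi_i \phi_k) &= E\left[ \sum_j \sum_l L_{ij}\, \epsilon_l \epsilon_j\, L^t_{lk} \right] \\
&= \sum_j \sum_l L_{ij}\, E(\epsilon_l \epsilon_j)\, L^t_{lk} \\
&= \sum_j \sum_l L_{ij}\, \delta_{lj}\, L^t_{lk} \\
&= \sum_l L_{il} L^t_{lk} \\
&= \rho_{ik}
\end{aligned} \tag{4.71}$$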

So, in order to generate correlated N(0, 1) numbers:

• Factor the correlation matrix $\Psi = L L^t$

• Generate uncorrelated N(0, 1) numbers $\epsilon_i$

• Correlated numbers $\phi_i$ are given from

$$\bar{\phi} = L \bar{\epsilon}$$
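The three steps above can be sketched in pure Python for a small correlation matrix. The function names and the 2 × 2 example with correlation 0.6 are assumptions for illustration; for large problems one would normally call a library Cholesky routine instead of the hand-written loop.

```python
import math
import random

def cholesky(psi):
    """Lower triangular L with L L^t = psi, for a symmetric positive definite psi."""
    n = len(psi)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(psi[i][i] - partial)
            else:
                L[i][j] = (psi[i][j] - partial) / L[j][j]
    return L

def correlated_normals(L, rng):
    """Draw uncorrelated N(0,1) numbers and return phi = L * epsilon."""
    eps = [rng.gauss(0.0, 1.0) for _ in L]
    return [sum(L[i][j] * eps[j] for j in range(i + 1)) for i in range(len(L))]

psi = [[1.0, 0.6],
       [0.6, 1.0]]
L = cholesky(psi)
rng = random.Random(5)
pairs = [correlated_normals(L, rng) for _ in range(50_000)]
sample_rho = sum(p[0] * p[1] for p in pairs) / len(pairs)   # estimates E(phi_1 phi_2)
print(f"L = {L}, sample correlation = {sample_rho:.3f}")
```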

4.8 Integration of Stochastic Diﬀerential Equations

Up to now, we have been fairly slack about deﬁning what we mean by convergence when we use forward

Euler timestepping (4.2) to integrate

dS = µS dt +σS dZ . (4.72)

The forward Euler algorithm is simply

$$S_{i+1} = S_i + S_i \left( \mu h + \sigma \phi_i \sqrt{h} \right) \tag{4.73}$$

where h = ∆t is the ﬁnite timestep. For a good overview of these methods, check out (“An algorithmic

introduction to numerical simulation of stochastic diﬀerential equations,” by D. Higham, SIAM Review vol.

43 (2001) 525-546). This article also has some good tips on solving SDEs using Matlab, in particular, taking

full advantage of the vectorization of Matlab. Note that eliminating as many for loops as possible (i.e.

computing all the MC realizations for each timestep in a vector) can reduce computation time by orders of

magnitude.

Before we start deﬁning what we mean by convergence, let’s consider the following situation. Recall that

$$dZ = \phi \sqrt{dt} \tag{4.74}$$

where φ is a random draw from a normal distribution with mean zero and variance one. Let’s imagine generating a set of Z values at discrete times $t_i$, e.g. $Z(t_i) = Z_i$, by

$$Z_{i+1} = Z_i + \phi \sqrt{\Delta t} . \tag{4.75}$$

Now, these are all legitimate points along a Brownian motion path, since there is no timestepping error here, in view of equation (2.53). So, this set of values $\{Z_0, Z_1, ...\}$ are valid points along a Brownian path. Now, recall that the exact solution (for a given Brownian path) of equation (4.72) is given by equation (2.57)

$$S(T) = S(0) \exp\left[ \left( \mu - \frac{\sigma^2}{2} \right) T + \sigma (Z(T) - Z(0)) \right] \tag{4.76}$$

where T is the stopping time of the simulation.

Now if we integrate equation (4.72) using forward Euler, with the discrete timesteps $\Delta t = t_{i+1} - t_i$, using the realization of the Brownian path $\{Z_0, Z_1, ...\}$, we will not get the exact solution (4.76). This is because even though the Brownian path points are exact, time discretization errors are introduced in equation (4.73). So, how can we systematically study convergence of algorithm (4.73)? We can simply take smaller timesteps. However, we want to do this by filling in new Z values in the Brownian path, while keeping the old values (since these are perfectly legitimate values). Let $S(T)^h$ represent the forward Euler solution (4.73) for a fixed timestep h. Let S(T) be the exact solution (4.76). As h → 0, we would expect $S(T)^h \to S(T)$, for a given path.

4.8.1 The Brownian Bridge

So, given a set of valid $Z_k$, how do we refine this path, keeping the existing points along this path? In particular, suppose we have two points $Z_i, Z_k$, at $(t_i, t_k)$, and we would like to generate a point $Z_j$ at $t_j$, with $t_i < t_j < t_k$. How should we pick $Z_j$? What density function should we use when generating $Z_j$, given that $Z_k$ is known?

Let x, y be two draws from a normal distribution with mean zero and variance one. Suppose we have the point $Z(t_i) = Z_i$ and we generate $Z(t_j) = Z_j$, $Z(t_k) = Z_k$ along the Wiener path,

$$Z_j = Z_i + x \sqrt{t_j - t_i} \tag{4.77}$$

$$Z_k = Z_j + y \sqrt{t_k - t_j} \tag{4.78}$$

$$Z_k = Z_i + x \sqrt{t_j - t_i} + y \sqrt{t_k - t_j} . \tag{4.79}$$

So, given (x, y) and $Z_i$, we can generate $Z_j, Z_k$. Suppose on the other hand, we have $Z_i$, and we generate $Z_k$ directly using

$$Z_k = Z_i + z \sqrt{t_k - t_i} , \tag{4.80}$$

where z is N(0, 1). Then how do we generate $Z_j$ using equation (4.77)? Since we are now specifying that we know $Z_k$, this means that our method for generating $Z_j$ is constrained. For example, given z, we must have that, from equations (4.79) and (4.80)

$$y = \frac{z \sqrt{t_k - t_i} - x \sqrt{t_j - t_i}}{\sqrt{t_k - t_j}} . \tag{4.81}$$

Now the probability density of drawing the pair $(x, y)$ given $z$, denoted by $p(x, y|z)$ is

$$p(x, y|z) = \frac{p(x)\,p(y)}{p(z)} \tag{4.82}$$

where $p(\cdot)$ is a standard normal distribution, and we have used the fact that successive increments of a Brownian process are uncorrelated.

From equation (4.81), we can write $y = y(x, z)$, so that $p(x, y|z) = p(x, y(x, z)|z)$

$$p(x, y(x, z)|z) = \frac{p(x)\,p(y(x, z))}{p(z)} = \frac{1}{\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(x^2 + y^2 - z^2\right)\right] \tag{4.83}$$

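or (after some algebra, using equation (4.81))

$$p(x|z) = \frac{1}{\sqrt{2\pi}}\exp\left[-\frac{1}{2}\,\frac{(x - \alpha z)^2}{\beta^2}\right] , \qquad \alpha = \sqrt{\frac{t_j - t_i}{t_k - t_i}} , \qquad \beta = \sqrt{\frac{t_k - t_j}{t_k - t_i}} \tag{4.84}$$

so that $x$ is normally distributed with mean $\alpha z$ and variance $\beta^2$. Since

$$z = \frac{Z_k - Z_i}{\sqrt{t_k - t_i}} \tag{4.85}$$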

we have that $x$ has mean

$$E(x) = \frac{\sqrt{t_j - t_i}}{t_k - t_i}\,(Z_k - Z_i) \tag{4.86}$$

and variance

$$E[(x - E(x))^2] = \frac{t_k - t_j}{t_k - t_i} \tag{4.87}$$

Now, let

$$x = \frac{\sqrt{t_j - t_i}}{t_k - t_i}\,(Z_k - Z_i) + \phi\,\sqrt{\frac{t_k - t_j}{t_k - t_i}} \tag{4.88}$$

where $\phi$ is $N(0, 1)$. Clearly, $x$ satisfies equations (4.86) and (4.87). Substituting equation (4.88) into (4.77) gives

$$Z_j = \frac{t_k - t_j}{t_k - t_i}\,Z_i + \frac{t_j - t_i}{t_k - t_i}\,Z_k + \phi\,\sqrt{\frac{(t_j - t_i)(t_k - t_j)}{t_k - t_i}} \tag{4.89}$$

where $\phi$ is $N(0, 1)$. Equation (4.89) is known as the Brownian Bridge.
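The bridge formula (4.89) can be sketched in a few lines of Python. This is only an illustration, not code from these notes: the function names `bridge_point` and `refine_path`, the midpoint refinement rule, and the four-step coarse path are all assumptions made for the example.

```python
import math
import random

def bridge_point(t_i, z_i, t_k, z_k, t_j, rng):
    """Sample Z(t_j) given Z(t_i) = z_i and Z(t_k) = z_k, with t_i < t_j < t_k,
    using the Brownian bridge formula (4.89)."""
    span = t_k - t_i
    mean = ((t_k - t_j) * z_i + (t_j - t_i) * z_k) / span
    std = math.sqrt((t_j - t_i) * (t_k - t_j) / span)
    return mean + std * rng.gauss(0.0, 1.0)

def refine_path(times, values, rng):
    """Insert one bridge point at the midpoint of every interval,
    keeping all of the existing nodes unchanged."""
    new_t, new_z = [times[0]], [values[0]]
    left = zip(times, values)
    right = zip(times[1:], values[1:])
    for (t_i, z_i), (t_k, z_k) in zip(left, right):
        t_j = 0.5 * (t_i + t_k)
        new_t += [t_j, t_k]
        new_z += [bridge_point(t_i, z_i, t_k, z_k, t_j, rng), z_k]
    return new_t, new_z

rng = random.Random(100)

# Hypothetical coarse path: 4 steps on [0, 1], built from independent increments.
coarse_t = [0.25 * n for n in range(5)]
coarse_z = [0.0]
for _ in range(4):
    coarse_z.append(coarse_z[-1] + math.sqrt(0.25) * rng.gauss(0.0, 1.0))

# One level of refinement: 8 steps, passing through every coarse node.
fine_t, fine_z = refine_path(coarse_t, coarse_z, rng)
```

Calling `refine_path` repeatedly doubles the number of timesteps each time while leaving every previously generated node in place, which is the property needed for the convergence study.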

Figure 4.1 shows different Brownian paths constructed for different timestep sizes. An initial coarse path is constructed, then the fine timestep path is constructed from the coarse path using a Brownian Bridge. By construction, the fine timestep path will pass through the coarse timestep nodes.

Figure 4.2 shows the asset paths integrated using the forward Euler algorithm (4.73) fed with the Brownian paths in Figure 4.1. In this case, note that the fine timestep path does not coincide with the coarse timestep nodes, due to the timestepping error.

4.8.2 Strong and Weak Convergence

Since we are dealing with a probabilistic situation here, it is not obvious how to define convergence. Given a number of points along a Brownian path, we could imagine refining this path (using a Brownian Bridge), and then seeing if the solution converged to the exact solution. For the model SDE (4.72), we could ask that

$$E\left[\,|S(T) - S^h(T)|\,\right] \le \text{Const.}\ h^{\gamma} \tag{4.90}$$

[Figure 4.1: two panels, each plotting Brownian Path (vertical axis, -0.5 to 0.5) against Time (horizontal axis, 0 to 1).]

Figure 4.1: Effect of adding more points to a Brownian path using a Brownian bridge. Note that the small timestep points match the coarse timestep points. Left: each coarse timestep is divided into 16 substeps. Right: each coarse timestep divided into 64 substeps.

[Figure 4.2: two panels, each plotting Asset Price (vertical axis, 80 to 120) against Time (horizontal axis, 0 to 1).]

Figure 4.2: Brownian paths shown in Figure 4.1 used to determine asset price paths using forward Euler timestepping (4.73). In this case, note that the asset paths for fine and coarse timestepping do not agree at the final time (due to the timestepping error). Eventually, for small enough timesteps, the final asset value will converge to the exact solution to the SDE. Left: each coarse timestep is divided into 16 substeps. Right: each coarse timestep divided into 64 substeps.

Parameter    Value
T            0.25
σ            0.4
µ            0.06
S_0          100

Table 4.1: Data used in the convergence tests.

where the expectation in equation (4.90) is over many Brownian paths, and $h$ is the timestep size. Note that $S(T)$ is the exact solution along a particular Brownian path; the same path used to compute $S^h(T)$. Criterion (4.90) is called strong convergence. A less strict criterion is

$$\left|\,E[S(T)] - E[S^h(T)]\,\right| \le \text{Const.}\ h^{\gamma} \tag{4.91}$$

It can be shown that using forward Euler results in weak convergence with $\gamma = 1$, and strong convergence with $\gamma = 0.5$.

Table 4.1 shows some test data used to integrate the SDE (4.72) using method (4.73). A series of Brownian paths was constructed, beginning with a coarse timestep path. These paths were systematically refined using the Brownian Bridge construction. Table 4.2 shows results where the strong and weak convergence errors are estimated as

$$\text{Strong Error} = \frac{1}{N}\sum_{i=1}^{N}\left|\,S(T)_i - S^h(T)_i\,\right| \tag{4.92}$$

$$\text{Weak Error} = \left|\,\frac{1}{N}\sum_{i=1}^{N} S(T)_i - \frac{1}{N}\sum_{i=1}^{N} S^h(T)_i\,\right| , \tag{4.93}$$

where $S^h(T)_i$ is the solution obtained by forward Euler timestepping along the $i$th Brownian path, and $S(T)_i$ is the exact solution along this same path, and $N$ is the number of samples. Note that for equation (4.72), we have the exact solution

$$\lim_{N \to \infty}\frac{1}{N}\sum_{i=1}^{N} S(T)_i = S_0 e^{\mu T} \tag{4.94}$$

but we do not replace the approximate sampled value of the limit in equation (4.93) by the theoretical limit (4.94). If we use enough Monte Carlo samples, we could replace the approximate expression

$$\frac{1}{N}\sum_{i=1}^{N} S(T)_i$$

by $S_0 e^{\mu T}$, but for normal parameters, the Monte Carlo sampling error is much larger than the timestepping error, so we would have to use an enormous number of Monte Carlo samples. Estimating the weak error using equation (4.93) will measure the timestepping error, as opposed to the Monte Carlo sampling error. However, for normal parameters, even using equation (4.93) requires a large number of Monte Carlo samples in order to ensure that the error is dominated by the timestepping error.
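The estimators (4.92) and (4.93) can be sketched as follows. This is a simplified illustration under stated assumptions: forward Euler (4.73) is assumed to be the update $S_{i+1} = S_i + S_i(\mu\Delta t + \sigma\Delta Z_i)$, the function name `euler_errors` is hypothetical, and each timestep size draws its own fresh Brownian paths rather than refining a common coarse path with the Brownian Bridge. Within one run, the Euler and exact solutions do share the same path, which is what the strong error requires.

```python
import math
import random

def euler_errors(n_steps, n_paths, T=0.25, sigma=0.4, mu=0.06, s0=100.0, seed=100):
    """Estimate the strong error (4.92) and weak error (4.93) of forward Euler
    for dS = mu*S*dt + sigma*S*dZ, using the data of Table 4.1 as defaults."""
    rng = random.Random(seed)
    h = T / n_steps
    sqrt_h = math.sqrt(h)
    sum_abs = sum_exact = sum_euler = 0.0
    for _ in range(n_paths):
        s_h, z = s0, 0.0
        for _ in range(n_steps):
            dz = sqrt_h * rng.gauss(0.0, 1.0)
            s_h += s_h * (mu * h + sigma * dz)   # assumed forward Euler step
            z += dz                              # accumulate Z(T) along this path
        # exact solution (4.76) along the same Brownian path
        s_exact = s0 * math.exp((mu - 0.5 * sigma ** 2) * T + sigma * z)
        sum_abs += abs(s_exact - s_h)
        sum_exact += s_exact
        sum_euler += s_h
    strong = sum_abs / n_paths
    weak = abs(sum_exact - sum_euler) / n_paths
    return strong, weak

# Hypothetical small experiment: the timestep is reduced by a factor of four.
strong_coarse, weak_coarse = euler_errors(8, 4000)
strong_fine, weak_fine = euler_errors(32, 4000)
```

With so few samples the weak error estimate is noisy, which is exactly the difficulty described above; the strong error is much easier to resolve.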

In Table 4.2, each row halves the timestep. We can see that the ratio of successive errors is about $\sqrt{2}$ for the strong error, and approaches two for the weak error as the timestep is refined. This is consistent with a convergence rate of $\gamma = 0.5$ for strong convergence, and $\gamma = 1.0$ for weak convergence.
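As a quick arithmetic check, the observed convergence rates can be computed from the entries of Table 4.2. Since each row halves the timestep $h$, the observed $\gamma$ between two rows is $\log_2$ of the error ratio. The helper name `observed_orders` is an assumption of this sketch.

```python
import math

# Data copied from Table 4.2
timesteps = [72, 144, 288, 576]
strong = [0.0269, 0.0190, 0.0135, 0.0095]
weak = [0.00194, 0.00174, 0.00093, 0.00047]

def observed_orders(errors):
    """log2 of successive error ratios; each row of the table halves h."""
    return [math.log2(coarse / fine) for coarse, fine in zip(errors, errors[1:])]

strong_orders = observed_orders(strong)   # estimates of gamma for (4.90)
weak_orders = observed_orders(weak)       # estimates of gamma for (4.91)

for n, gs, gw in zip(timesteps[1:], strong_orders, weak_orders):
    print(f"{n:4d} timesteps: strong gamma = {gs:.2f}, weak gamma = {gw:.2f}")
```

The first weak ratio (72 to 144 timesteps) is noticeably below the later ones, which may reflect residual Monte Carlo sampling error in the coarsest entries.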

4.9 Matlab and Monte Carlo Simulation

A straightforward implementation of Monte Carlo timestepping for solving the SDE

dS = µS dt +σS dZ (4.95)

in Matlab is shown in Algorithm (4.96). This code runs very slowly.

Timesteps    Strong Error (4.90)    Weak Error (4.91)
72           0.0269                 0.00194
144          0.0190                 0.00174
288          0.0135                 0.00093
576          0.0095                 0.00047

Table 4.2: Convergence results, 100,000 samples used. Data in Table 4.1.

Slow.m

randn('state',100);  % seed the random number generator
T = 1.00;            % expiry time
sigma = 0.25;        % volatility
mu = .10;            % P measure drift
S_init = 100;        % initial value
N_sim = 10000;       % number of simulations
N = 100;             % number of timesteps
delt = T/N;          % timestep
drift = mu*delt;
sigma_sqrt_delt = sigma*sqrt(delt);
S_new = zeros(N_sim,1);
for m=1:N_sim
    S = S_init;
    for i=1:N % timestep loop
        S = S + S*( drift + sigma_sqrt_delt*randn(1,1) );
        S = max(0.0, S );
        % check to make sure that S cannot be < 0
    end % timestep loop
    S_new(m,1) = S;
end % simulation loop
n_bin = 200;
hist(S_new, n_bin);
stndrd_dev = std(S_new);
disp(sprintf('standard deviation: %.5g\n',stndrd_dev));
mean_S = mean(S_new);
disp(sprintf('mean: %.5g\n',mean_S));

(4.96)

Alternatively, we can use Matlab’s vectorization capabilities to interchange the timestep loop and the

simulation loop. The innermost simulation loop can be replaced by vector statements, as shown in Algorithm

(4.97). This runs much faster.

Fast.m

randn('state',100);  % seed the random number generator
%
T = 1.00;            % expiry time
sigma = 0.25;        % volatility
mu = .10;            % P measure drift
S_init = 100;        % initial value
N_sim = 10000;       % number of simulations
N = 100;             % number of timesteps
delt = T/N;          % timestep
drift = mu*delt;
sigma_sqrt_delt = sigma*sqrt(delt);
S_old = zeros(N_sim,1);
S_new = zeros(N_sim,1);
S_old(1:N_sim,1) = S_init;
for i=1:N % timestep loop
    % now, for each timestep, generate info for
    % all simulations
    S_new(:,1) = S_old(:,1) +...
        S_old(:,1).*( drift + sigma_sqrt_delt*randn(N_sim,1) );
    S_new(:,1) = max(0.0, S_new(:,1) );
    % check to make sure that S_new cannot be < 0
    S_old(:,1) = S_new(:,1);
    %
    % end of generation of all data for all simulations
    % for this timestep
end % timestep loop
n_bin = 200;
hist(S_new, n_bin);
stndrd_dev = std(S_new);
disp(sprintf('standard deviation: %.5g\n',stndrd_dev));
mean_S = mean(S_new);
disp(sprintf('mean: %.5g\n',mean_S));

(4.97)

5 The Binomial Model

We have seen that a problem with the Monte Carlo method is that it is difficult to use for valuing American style options. Recall that the holder of an American option can exercise the option at any time and receive the payoff. In order to determine whether or not it is worthwhile to hold the option, we have to compare the value of continuing to hold the option (the continuation value) with the payoff. If the continuation value is greater than the payoff, then we hold; otherwise, we exercise.

At any point in time, the continuation value depends on what happens in the future. Clearly, if we simulate forward in time, as in the Monte Carlo approach, we don't know what happens in the future, and hence we don't know how to act optimally. This is actually a dynamic programming problem. These sorts of problems are usually solved by proceeding from the end point backwards. We use the same idea here. We have to start from the terminal time and work backwards.

Recall that we can determine the no-arbitrage value of an option by pretending we live in a risk-neutral world, where risky assets drift at $r$ and are discounted at $r$. If we let $X = \log S$, then the risk neutral process for $X$ is (from equation (2.55))

$$dX = \left(r - \frac{\sigma^2}{2}\right)dt + \sigma\,dZ . \tag{5.1}$$

Now, we can construct a discrete approximation to this random walk using the lattice discussed in Section 2.5. In fact, all we have to do is let $\alpha = r - \frac{\sigma^2}{2}$, so that equation (5.1) is formally identical to equation (2.4). In order to ensure that in the limit as $\Delta t \to 0$, we get the process (5.1), we require that the sizes of the random jumps are $\Delta X = \sigma\sqrt{\Delta t}$ and that the probabilities of up ($p$) and down ($q$) moves are

$$\begin{aligned}
p^r &= \frac{1}{2}\left[1 + \frac{\alpha}{\sigma}\sqrt{\Delta t}\right] = \frac{1}{2}\left[1 + \left(\frac{r}{\sigma} - \frac{\sigma}{2}\right)\sqrt{\Delta t}\right] \\
q^r &= \frac{1}{2}\left[1 - \frac{\alpha}{\sigma}\sqrt{\Delta t}\right] = \frac{1}{2}\left[1 - \left(\frac{r}{\sigma} - \frac{\sigma}{2}\right)\sqrt{\Delta t}\right] ,
\end{aligned} \tag{5.2}$$

where we have denoted the risk neutral probabilities by $p^r$ and $q^r$ to distinguish them from the real probabilities $p, q$.

Now, we will switch to a more common notation. If we are at node $j$, timestep $n$, we will denote this node location by $X^n_j$. Recall that $X = \log S$, so that in terms of asset price, this is $S^n_j = e^{X^n_j}$.

Now, consider that at node $(j, n)$, the asset can move up with probability $p^r$ and down with probability $q^r$. In other words

$$\begin{aligned}
S^n_j &\to S^{n+1}_{j+1} \ ; \quad \text{with probability } p^r \\
S^n_j &\to S^{n+1}_{j} \ ; \quad \text{with probability } q^r
\end{aligned} \tag{5.3}$$

Now, in Section 2.5 we showed that $\Delta X = \sigma\sqrt{\Delta t}$, so that ($S = e^X$)

$$\begin{aligned}
S^{n+1}_{j+1} &= S^n_j\, e^{\sigma\sqrt{\Delta t}} \\
S^{n+1}_{j} &= S^n_j\, e^{-\sigma\sqrt{\Delta t}}
\end{aligned} \tag{5.4}$$

or

$$S^n_j = S^0_0\, e^{(2j - n)\sigma\sqrt{\Delta t}} \ ; \quad j = 0, \ldots, n \tag{5.5}$$

So, the first step in the process is to construct a tree of stock price values, as shown in Figure 5.1.

Associated with each stock price on the lattice is the option value $V^n_j$. We first set the value of the option at $T = N\Delta t$ to the payoff. For example, if we are valuing a put option, then

$$V^N_j = \max(K - S^N_j, 0) \ ; \quad j = 0, \ldots, N \tag{5.6}$$

Then, we can use the risk neutral world idea to determine the no-arbitrage value of the option (it is the expected value in the risk neutral world). We can do this by working backward through the lattice. The value today is the discounted expected future value

European Lattice Algorithm

$$V^n_j = e^{-r\Delta t}\left(p^r V^{n+1}_{j+1} + q^r V^{n+1}_{j}\right) \ ; \quad n = N-1, \ldots, 0 \ ; \quad j = 0, \ldots, n \tag{5.7}$$

Rolling back through the tree, we obtain the value at $S^0_0$ today, which is $V^0_0$.

[Figure 5.1: recombining lattice with nodes $S^0_0$; $S^1_1$, $S^1_0$; $S^2_2$, $S^2_1$, $S^2_0$.]

Figure 5.1: Lattice of stock price values

If the option is an American put, we can determine if it is optimal to hold or exercise, since we know the continuation value. In this case the rollback (5.7) becomes

American Lattice Algorithm

$$\begin{aligned}
(V^n_j)^c &= e^{-r\Delta t}\left(p^r V^{n+1}_{j+1} + q^r V^{n+1}_{j}\right) \\
V^n_j &= \max\left((V^n_j)^c,\ \max(K - S^n_j, 0)\right) \\
n &= N-1, \ldots, 0 \ ; \quad j = 0, \ldots, n
\end{aligned} \tag{5.8}$$

which is illustrated in Figure 5.2.
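The two rollbacks (5.7) and (5.8) differ only in the early exercise test, so a single routine can cover both. The following is a minimal sketch, assuming the lattice probabilities (5.2) and the node prices (5.5); the function name `lattice_put`, the `american` flag, and the parameter values are illustrative assumptions.

```python
import math

def lattice_put(s0, K, r, sigma, T, N, american=False):
    """Value a put on the binomial lattice using p^r, q^r from (5.2).
    european: rollback (5.7); american: rollback (5.8)."""
    dt = T / N
    dx = sigma * math.sqrt(dt)                       # jump size in X = log S
    alpha = r - 0.5 * sigma ** 2
    p = 0.5 * (1.0 + (alpha / sigma) * math.sqrt(dt))
    q = 1.0 - p
    disc = math.exp(-r * dt)
    # payoff at n = N, with S^N_j from (5.5) and V^N_j from (5.6)
    V = [max(K - s0 * math.exp((2 * j - N) * dx), 0.0) for j in range(N + 1)]
    # roll back; V[j] is overwritten only after it has been used
    for n in range(N - 1, -1, -1):
        for j in range(n + 1):
            value = disc * (p * V[j + 1] + q * V[j])  # continuation value
            if american:
                S = s0 * math.exp((2 * j - n) * dx)
                value = max(value, max(K - S, 0.0))   # exercise if payoff is larger
            V[j] = value
    return V[0]

# Hypothetical test parameters
european = lattice_put(100.0, 100.0, 0.06, 0.3, 0.25, 400)
american = lattice_put(100.0, 100.0, 0.06, 0.3, 0.25, 400, american=True)
```

The option values are stored in a single array that is overwritten level by level, so the storage is $O(N)$ while the work is $O(N^2)$.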

The binomial lattice method has the following advantages

• It is very easy to code for simple cases.

• It is easy to explain to managers.

• American options are easy to handle.

However, the binomial lattice method has the following disadvantages

• Except for simple cases, coding becomes complex. For example, if we want to handle simple barrier options, things become nightmarish.

• This method is algebraically identical to an explicit finite difference solution of the Black-Scholes equation. Consequently, convergence is at an $O(\Delta t)$ rate.

• The probabilities $p^r, q^r$ are not real probabilities, they are simply the coefficients in a particular discretization of a PDE. Regarding them as probabilities leads to much fuzzy thinking, and complex wrong-headed arguments.

[Figure 5.2: node $V^n_j$ connected to $V^{n+1}_{j+1}$ with weight $p$ and to $V^{n+1}_{j}$ with weight $q$.]

Figure 5.2: Backward recursion step.

If we are going to solve the Black-Scholes PDE, we might as well do it right, and not fool around with

lattices.

5.1 A No-arbitrage Lattice

We can also derive the lattice method directly from the discrete lattice model in Section 2.5. Suppose we assume that

$$dS = \mu S\,dt + \sigma S\,dZ \tag{5.9}$$

and letting $X = \log S$, we have that

$$dX = \left(\mu - \frac{\sigma^2}{2}\right)dt + \sigma\,dZ \tag{5.10}$$

so that $\alpha = \mu - \frac{\sigma^2}{2}$ in equation (2.19). Now, let's consider the usual hedging portfolio at $t = n\Delta t$, $S = S^n_j$,

$$P^n_j = V^n_j - (\alpha^h) S^n_j , \tag{5.11}$$

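where $V^n_j$ is the value of the option at $t = n\Delta t$, $S = S^n_j$. At $t = (n+1)\Delta t$,

$$\begin{aligned}
S^n_j &\to S^{n+1}_{j+1} \ ; \quad \text{with probability } p \\
S^n_j &\to S^{n+1}_{j} \ ; \quad \text{with probability } q \\
S^{n+1}_{j+1} &= S^n_j\, e^{\sigma\sqrt{\Delta t}} \\
S^{n+1}_{j} &= S^n_j\, e^{-\sigma\sqrt{\Delta t}}
\end{aligned}$$

so that the value of the hedging portfolio at $t = (n+1)\Delta t$ is

$$P^{n+1}_{j+1} = V^{n+1}_{j+1} - (\alpha^h) S^{n+1}_{j+1} \ ; \quad \text{with probability } p \tag{5.12}$$

$$P^{n+1}_{j} = V^{n+1}_{j} - (\alpha^h) S^{n+1}_{j} \ ; \quad \text{with probability } q . \tag{5.13}$$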

Now, as in Section 2.3, we can determine $(\alpha^h)$ so that the value of the hedging portfolio is independent of $p, q$. We do this by requiring that

$$P^{n+1}_{j+1} = P^{n+1}_{j} \tag{5.14}$$

so that

$$V^{n+1}_{j+1} - (\alpha^h) S^{n+1}_{j+1} = V^{n+1}_{j} - (\alpha^h) S^{n+1}_{j}$$

which gives

$$(\alpha^h) = \frac{V^{n+1}_{j+1} - V^{n+1}_{j}}{S^{n+1}_{j+1} - S^{n+1}_{j}} . \tag{5.15}$$

Since this portfolio is risk free, it must earn the risk free rate of return, so that

$$P^n_j = e^{-r\Delta t} P^{n+1}_{j+1} = e^{-r\Delta t} P^{n+1}_{j} . \tag{5.16}$$

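Now, substituting for $P^n_j$ from equation (5.11), with $P^{n+1}_{j+1}$ from equation (5.12), and $(\alpha^h)$ from equation (5.15) gives

$$\begin{aligned}
V^n_j &= e^{-r\Delta t}\left(p^{r*} V^{n+1}_{j+1} + q^{r*} V^{n+1}_{j}\right) \\
p^{r*} &= \frac{e^{r\Delta t} - e^{-\sigma\sqrt{\Delta t}}}{e^{\sigma\sqrt{\Delta t}} - e^{-\sigma\sqrt{\Delta t}}} \\
q^{r*} &= 1 - p^{r*} .
\end{aligned} \tag{5.17}$$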

Note that $p^{r*}, q^{r*}$ do not depend on the real drift rate $\mu$, which is expected. If we expand $p^{r*}, q^{r*}$ in a Taylor Series, and compare with the $p^r, q^r$ in equations (5.2), we can show that

$$\begin{aligned}
p^{r*} &= p^r + O((\Delta t)^{3/2}) \\
q^{r*} &= q^r + O((\Delta t)^{3/2}) .
\end{aligned} \tag{5.18}$$

After a bit more work, one can show that the value of the option at $t = 0$, $V^0_0$, using either $p^{r*}, q^{r*}$ or $p^r, q^r$ is the same to $O(\Delta t)$, which is not surprising, since these methods can both be regarded as an explicit finite difference approximation to the Black-Scholes equation, having truncation error $O(\Delta t)$. The definition $p^{r*}, q^{r*}$ is the common definition in finance books, since the tree has no-arbitrage.
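The relation (5.18) can be checked numerically. In this sketch (function names and parameter values are assumptions made for the illustration), the difference between the no-arbitrage probability $p^{r*}$ of (5.17) and the lattice probability $p^r$ of (5.2) is computed for two timestep sizes. If the difference is $O((\Delta t)^{3/2})$, reducing $\Delta t$ by a factor of four should reduce the difference by a factor of roughly $4^{3/2} = 8$.

```python
import math

def p_lattice(r, sigma, dt):
    """p^r from equation (5.2)."""
    return 0.5 * (1.0 + (r / sigma - 0.5 * sigma) * math.sqrt(dt))

def p_no_arbitrage(r, sigma, dt):
    """p^{r*} from equation (5.17)."""
    up = math.exp(sigma * math.sqrt(dt))
    down = math.exp(-sigma * math.sqrt(dt))
    return (math.exp(r * dt) - down) / (up - down)

# Hypothetical parameters
r, sigma = 0.06, 0.3
diffs = [abs(p_no_arbitrage(r, sigma, dt) - p_lattice(r, sigma, dt))
         for dt in (0.01, 0.0025)]
ratio = diffs[0] / diffs[1]   # close to 8 if the difference is O(dt^(3/2))
```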

What is the meaning of a no-arbitrage tree? If we are sitting at node $S^n_j$, and assuming that there are only two possible future states

$$\begin{aligned}
S^n_j &\to S^{n+1}_{j+1} \ ; \quad \text{with probability } p \\
S^n_j &\to S^{n+1}_{j} \ ; \quad \text{with probability } q
\end{aligned}$$

then using $(\alpha^h)$ from equation (5.15) guarantees that the hedging portfolio has the same value in both future states.

But let's be a bit more sensible here. Suppose we are hedging a portfolio of RIM stock. Let $\Delta t$ = one day. Suppose the price of RIM stock is \$10 today. Do we actually believe that tomorrow there are only two possible prices for RIM stock

$$\begin{aligned}
S^{up} &= 10\,e^{\sigma\sqrt{\Delta t}} \\
S^{down} &= 10\,e^{-\sigma\sqrt{\Delta t}} \ ?
\end{aligned} \tag{5.19}$$

Of course not. This is obviously a highly simplified model. The fact that there is no-arbitrage in the context of the simplified model does not really have a lot of relevance to the real-world situation. The best that can be said is that if the Black-Scholes model was perfect, then we have that the portfolio hedging ratios computed using either $p^r, q^r$ or $p^{r*}, q^{r*}$ are both correct to $O(\Delta t)$.

6 More on Ito’s Lemma

In Section 2.6.1, we mysteriously made the infamous comment

...it can be shown that $dZ^2 \to dt$ as $dt \to 0$

In this Section, we will give some justification for this remark. For a lot more details here, we refer the reader to Stochastic Differential Equations, by Bernt Oksendal, Springer, 1998.

We have to go back here, and decide what the statement

$$dX = \alpha\,dt + c\,dZ \tag{6.1}$$

really means. The only sensible interpretation of this is

$$X(t) - X(0) = \int_0^t \alpha(X(s), s)\,ds + \int_0^t c(X(s), s)\,dZ(s) , \tag{6.2}$$

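where we can interpret the integrals as the limit, as $\Delta t \to 0$ of a discrete sum. For example,

$$\begin{aligned}
\int_0^t c(X(s), s)\,dZ(s) &= \lim_{\Delta t \to 0} \sum_{j=0}^{N-1} c_j\,\Delta Z_j \\
c_j &= c(X(Z_j), t_j) \\
Z_j &= Z(t_j) \\
\Delta Z_j &= Z(t_{j+1}) - Z(t_j) \\
\Delta t &= t_{j+1} - t_j \\
N &= t/(\Delta t)
\end{aligned} \tag{6.3}$$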

In particular, in order to derive Ito's Lemma, we have to decide what

$$\int_0^t c(X(s), s)\,dZ(s)^2 \tag{6.4}$$

means. Replace the integral by a sum,

$$\int_0^t c(X(s), s)\,dZ(s)^2 = \lim_{\Delta t \to 0} \sum_{j=0}^{N-1} c(X_j, t_j)\,\Delta Z_j^2 . \tag{6.5}$$

Note that we have evaluated the integral using the left hand end point of each subinterval (the no peeking into the future principle).

From now on, we will use the notation

$$\sum_j \equiv \sum_{j=0}^{N-1} . \tag{6.6}$$

Now, we claim that

$$\int_0^t c(X(s), s)\,dZ^2(s) = \int_0^t c(X(s), s)\,ds \tag{6.7}$$

or

$$\lim_{\Delta t \to 0}\left[\sum_j c_j\,\Delta Z_j^2\right] = \lim_{\Delta t \to 0} \sum_j c_j\,\Delta t \tag{6.8}$$

which is what we mean by equation (6.7), i.e. we can say that $dZ^2 \to dt$ as $dt \to 0$.

Now, let's consider a finite $\Delta t$, and consider the expression

$$E\left[\left(\sum_j c_j\,\Delta Z_j^2 - \sum_j c_j\,\Delta t\right)^2\right] \tag{6.9}$$

If equation (6.9) tends to zero as $\Delta t \to 0$, then we can say that (in the mean square limit)

$$\lim_{\Delta t \to 0}\left[\sum_j c_j\,\Delta Z_j^2\right] = \lim_{\Delta t \to 0} \sum_j c_j\,\Delta t = \int_0^t c(X(s), s)\,ds \tag{6.10}$$

so that in this sense

$$\int_0^t c(X, s)\,dZ^2 = \int_0^t c(X, s)\,ds \tag{6.11}$$

and hence we can say that

$$dZ^2 \to dt \tag{6.12}$$

with probability one as $\Delta t \to 0$.

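Now, expanding equation (6.9) gives

$$E\left[\left(\sum_j c_j\,\Delta Z_j^2 - \sum_j c_j\,\Delta t\right)^2\right] = \sum_{ij} E\left[c_j(\Delta Z_j^2 - \Delta t)\,c_i(\Delta Z_i^2 - \Delta t)\right] . \tag{6.13}$$

Now, we assume that

$$E\left[c_j(\Delta Z_j^2 - \Delta t)\,c_i(\Delta Z_i^2 - \Delta t)\right] = \delta_{ij}\,E\left[c_i^2(\Delta Z_i^2 - \Delta t)^2\right] \tag{6.14}$$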

which means that $c_j(\Delta Z_j^2 - \Delta t)$ and $c_i(\Delta Z_i^2 - \Delta t)$ are independent if $i \ne j$. This follows since

• The increments of Brownian motion are uncorrelated, i.e. $\text{Cov}[\Delta Z_i, \Delta Z_j] = 0$, $i \ne j$, which means that $\text{Cov}[\Delta Z_i^2, \Delta Z_j^2] = 0$, or $E[(\Delta Z_j^2 - \Delta t)(\Delta Z_i^2 - \Delta t)] = 0$, $i \ne j$.

• $c_i = c(t_i, X(Z_i))$, and $\Delta Z_i$ are independent.

It also follows from the above properties that

$$E[c_j^2(\Delta Z_j^2 - \Delta t)^2] = E[c_j^2]\;E[(\Delta Z_j^2 - \Delta t)^2] \tag{6.15}$$

since $c_j$ and $(\Delta Z_j^2 - \Delta t)$ are independent.

Using equations (6.14-6.15), then equation (6.13) becomes

$$\sum_{ij} E\left[c_j(\Delta Z_j^2 - \Delta t)\,c_i(\Delta Z_i^2 - \Delta t)\right] = \sum_i E[c_i^2]\;E\left[(\Delta Z_i^2 - \Delta t)^2\right] . \tag{6.16}$$

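Now,

$$\sum_i E[c_i^2]\;E\left[(\Delta Z_i^2 - \Delta t)^2\right] = \sum_i E[c_i^2]\left(E\left[\Delta Z_i^4\right] - 2\Delta t\,E\left[\Delta Z_i^2\right] + (\Delta t)^2\right) . \tag{6.17}$$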

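Recall that $\Delta Z$ is $N(0, \Delta t)$ (normally distributed with mean zero and variance $\Delta t$) so that

$$\begin{aligned}
E\left[(\Delta Z_i)^2\right] &= \Delta t \\
E\left[(\Delta Z_i)^4\right] &= 3(\Delta t)^2
\end{aligned} \tag{6.18}$$

so that the bracketed term in equation (6.17) becomes

$$E\left[\Delta Z_i^4\right] - 2\Delta t\,E\left[\Delta Z_i^2\right] + (\Delta t)^2 = 2(\Delta t)^2 \tag{6.19}$$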

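and

$$\begin{aligned}
\sum_i E[c_i^2]\;E\left[(\Delta Z_i^2 - \Delta t)^2\right] &= 2\sum_i E[c_i^2]\,(\Delta t)^2 \\
&= 2\Delta t\left(\sum_i E[c_i^2]\,\Delta t\right) \\
&= O(\Delta t)
\end{aligned} \tag{6.20}$$

so that we have

$$E\left[\left(\sum_j c_j\,\Delta Z_j^2 - \sum_j c_j\,\Delta t\right)^2\right] = O(\Delta t) \tag{6.21}$$

or

$$\lim_{\Delta t \to 0} E\left[\left(\sum_j c_j\,\Delta Z_j^2 - \int_0^t c(s, X(s))\,ds\right)^2\right] = 0 \tag{6.22}$$

so that in this sense we can write

$$dZ^2 \to dt \ ; \quad dt \to 0 . \tag{6.23}$$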
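The mean square statement (6.21) is easy to check by simulation in the simplest case $c \equiv 1$. Then (6.20) gives $E[(\sum_j \Delta Z_j^2 - t)^2] = 2N(\Delta t)^2 = 2t\,\Delta t$ exactly. The sketch below estimates this expectation by Monte Carlo for two timestep sizes; the function names, seed, and sample sizes are assumptions made for the illustration.

```python
import math
import random

def sum_squared_increments(t, n_steps, rng):
    """sum_j (Delta Z_j)^2 over [0, t] for one Brownian path."""
    dt = t / n_steps
    return sum((math.sqrt(dt) * rng.gauss(0.0, 1.0)) ** 2 for _ in range(n_steps))

def mean_square_error(t, n_steps, n_paths, rng):
    """Monte Carlo estimate of E[(sum_j dZ_j^2 - t)^2], i.e. (6.9) with c = 1."""
    total = 0.0
    for _ in range(n_paths):
        total += (sum_squared_increments(t, n_steps, rng) - t) ** 2
    return total / n_paths

rng = random.Random(100)
t = 1.0
mse_coarse = mean_square_error(t, 10, 2000, rng)    # theory: 2*t*dt = 0.2
mse_fine = mean_square_error(t, 160, 2000, rng)     # theory: 2*t*dt = 0.0125
```

Reducing $\Delta t$ by a factor of 16 should reduce the mean square error by about the same factor, consistent with the $O(\Delta t)$ estimate.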

7 Derivative Contracts on non-traded Assets and Real Options

The hedging arguments used in previous sections use the underlying asset to construct a hedging portfolio.

What if the underlying asset cannot be bought and sold, or is non-storable? If the underlying variable is

an interest rate, we can’t store this. Or if the underlying asset is bandwidth, we can’t store this either.

However, we can get around this using the following approach.

7.1 Derivative Contracts

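Let the underlying variable follow

$$dS = a(S, t)\,dt + b(S, t)\,dZ , \tag{7.1}$$

and let $F = F(S, t)$, so that from Ito's Lemma

$$dF = \left[a F_S + \frac{b^2}{2} F_{SS} + F_t\right]dt + b F_S\,dZ , \tag{7.2}$$

or in shorter form

$$\begin{aligned}
dF &= \mu\,dt + \sigma^*\,dZ \\
\mu &= a F_S + \frac{b^2}{2} F_{SS} + F_t \\
\sigma^* &= b F_S .
\end{aligned} \tag{7.3}$$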

Now, instead of hedging with the underlying asset, we will hedge one contract with another. Suppose we have two contracts $F_1, F_2$ (they could have different maturities for example). Then

$$\begin{aligned}
dF_1 &= \mu_1\,dt + \sigma^*_1\,dZ \\
dF_2 &= \mu_2\,dt + \sigma^*_2\,dZ \\
\mu_i &= a (F_i)_S + \frac{b^2}{2} (F_i)_{SS} + (F_i)_t \\
\sigma^*_i &= b (F_i)_S \ ; \quad i = 1, 2 .
\end{aligned} \tag{7.4}$$

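Consider the portfolio $P$ such that (recall that $\Delta_1, \Delta_2$ are constant over the hedging interval)

$$\begin{aligned}
P &= \Delta_1 F_1 + \Delta_2 F_2 \\
dP &= \Delta_1(\mu_1\,dt + \sigma^*_1\,dZ) + \Delta_2(\mu_2\,dt + \sigma^*_2\,dZ)
\end{aligned} \tag{7.5}$$

We can eliminate the random term by choosing

$$\begin{aligned}
\Delta_1 &= \sigma^*_2 \\
\Delta_2 &= -\sigma^*_1
\end{aligned} \tag{7.6}$$

so that

$$dP = (\sigma^*_2 \mu_1 - \sigma^*_1 \mu_2)\,dt \tag{7.7}$$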

Since this portfolio is riskless, it must earn the risk-free rate of return,

$$(\sigma^*_2 \mu_1 - \sigma^*_1 \mu_2)\,dt = r(\Delta_1 F_1 + \Delta_2 F_2)\,dt \tag{7.8}$$

and after some simplification we obtain

$$\frac{\mu_1 - r F_1}{\sigma^*_1} = \frac{\mu_2 - r F_2}{\sigma^*_2} \tag{7.9}$$

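Let $\lambda_S(S, t)$ be the value of both sides of equation (7.9), so that

$$\begin{aligned}
\frac{\mu_1 - r F_1}{\sigma^*_1} &= \lambda_S(S, t) \\
\frac{\mu_2 - r F_2}{\sigma^*_2} &= \lambda_S(S, t) .
\end{aligned} \tag{7.10}$$

Dropping the subscripts, we obtain

$$\frac{\mu - r F}{\sigma^*} = \lambda_S \tag{7.11}$$

Substituting $\mu, \sigma^*$ from equations (7.3) into equation (7.11) gives

$$F_t + \frac{b^2}{2} F_{SS} + (a - \lambda_S b) F_S - r F = 0 . \tag{7.12}$$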

Equation (7.12) is the PDE satisfied by a derivative contract on any asset $S$, traded or not. Suppose

$$\begin{aligned}
\mu &= F\mu' \\
\sigma^* &= F\sigma'
\end{aligned} \tag{7.13}$$

so that we can write

$$dF = F\mu'\,dt + F\sigma'\,dZ \tag{7.14}$$

then using equation (7.13) in equation (7.11) gives

$$\mu' = r + \lambda_S \sigma' \tag{7.15}$$

which has the convenient interpretation that the expected return on holding (not hedging) the derivative contract $F$ is the risk-free rate plus extra compensation due to the riskiness of holding $F$. The extra return is $\lambda_S \sigma'$, where $\lambda_S$ is the market price of risk of $S$ (which should be the same for all contracts depending on $S$) and $\sigma'$ is the volatility of $F$. Note that the volatility and drift of $F$ are not the volatility and drift of the underlying asset $S$.

If $S$ is a traded asset, it must satisfy equation (7.12). Substituting $F = S$ into equation (7.12) gives $(a - \lambda_S b) = rS$, so that $F$ then satisfies the usual Black-Scholes equation if $b = \sigma S$.

If we believe that the Capital Asset Pricing Model holds, then a simple minded idea is to estimate

$$\lambda_S = \rho_{SM}\,\lambda_M \tag{7.16}$$

where $\lambda_M$ is the price of risk of the market portfolio, and $\rho_{SM}$ is the correlation of returns between $S$ and the returns of the market portfolio.

Another idea is the following. Suppose we can find some companies whose main source of business is based on $S$. Let $q_i$ be the price of such a company's stock at $t = t_i$. The return of the stock over $t_i - t_{i-1}$ is

$$R_i = \frac{q_i - q_{i-1}}{q_{i-1}} .$$

Let $R^M_i$ be the return of the market portfolio (i.e. a broad index) over the same period. We compute $\beta$ as the best fit linear regression to

$$R_i = \alpha + \beta R^M_i$$

which means that

$$\beta = \frac{\text{Cov}(R, R^M)}{\text{Var}(R^M)} . \tag{7.17}$$

Now, from CAPM we have that

$$E(R) = r + \beta\left[E(R^M) - r\right] \tag{7.18}$$

where $E(\ldots)$ is the expectation operator. We would like to determine the unlevered $\beta$, denoted by $\beta^u$, which is the $\beta$ for an investment made using equity only. In other words, if the firm we used to compute the $\beta$ above has significant debt, its riskiness with respect to $S$ is amplified. The unlevered $\beta$ can be computed by

$$\beta^u = \frac{E}{E + (1 - T_c) D}\,\beta \tag{7.19}$$

where

$$\begin{aligned}
D &= \text{long term debt} \\
E &= \text{Total market capitalization} \\
T_c &= \text{Corporate Tax rate} .
\end{aligned} \tag{7.20}$$

So, now the expected return from a pure equity investment based on $S$ is

$$E(R^u) = r + \beta^u\left[E(R^M) - r\right] . \tag{7.21}$$

If we assume that $F$ in equation (7.14) is the company stock, then

$$\mu' = E(R^u) = r + \beta^u\left[E(R^M) - r\right] \tag{7.22}$$

But equation (7.15) says that

$$\mu' = r + \lambda_S \sigma' . \tag{7.23}$$

Combining equations (7.22-7.23) gives

$$\lambda_S \sigma' = \beta^u\left[E(R^M) - r\right] . \tag{7.24}$$

Recall from equations (7.3) and (7.13) that

$$\begin{aligned}
\sigma^* &= F\sigma' \\
\sigma^* &= b F_S ,
\end{aligned}$$

or

$$\sigma' = \frac{b F_S}{F} . \tag{7.25}$$

Combining equations (7.24-7.25) gives

$$\lambda_S = \frac{\beta^u\left[E(R^M) - r\right]}{\dfrac{b F_S}{F}} . \tag{7.26}$$

In principle, we can now compute $\lambda_S$, since

• The unlevered $\beta^u$ is computed as described above. This can be done using market data for a specific firm, whose main business is based on $S$, and the firm's balance sheet.

• $b(S, t)/S$ is the volatility rate of $S$ (equation (7.1)).

• $[E(R^M) - r]$ can be determined from historical data. For example, the expected return of the market index above the risk free rate is about 6% for the past 50 years of Canadian data.

• The risk free rate $r$ is simply the current T-bill rate.

• $F_S$ can be estimated by computing a linear regression of the stock price of a firm which invests in $S$, and $S$. Now, this may have to be unlevered, to reduce the effect of debt. If we are going to now value the real option for a specific firm, we will have to make some assumption about how the firm will finance a new investment. If it is going to use pure equity, then we are done. If it is a mixture of debt and equity, we should relever the value of $F_S$. At this point, we need to talk to a Finance Professor to get this right.
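The chain of estimates above, (7.17), (7.19) and (7.26), can be sketched as follows. All numbers in the example are made up for illustration, and the function names are assumptions of this sketch; in practice the inputs would come from market data and the firm's balance sheet as described in the list.

```python
def beta_from_returns(stock_returns, market_returns):
    """Regression slope beta = Cov(R, R^M) / Var(R^M), equation (7.17)."""
    n = len(stock_returns)
    mean_r = sum(stock_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum((r - mean_r) * (m - mean_m)
              for r, m in zip(stock_returns, market_returns)) / n
    var = sum((m - mean_m) ** 2 for m in market_returns) / n
    return cov / var

def unlevered_beta(beta, equity, debt, tax_rate):
    """beta^u = E / (E + (1 - T_c) D) * beta, equation (7.19)."""
    return equity / (equity + (1.0 - tax_rate) * debt) * beta

def market_price_of_risk(beta_u, excess_market_return, b, F_S, F):
    """lambda_S = beta^u [E(R^M) - r] / (b F_S / F), equation (7.26)."""
    sigma_prime = b * F_S / F          # volatility of F, equation (7.25)
    return beta_u * excess_market_return / sigma_prime

# Hypothetical data: the stock returns are an exact linear function of the market.
market = [0.02, -0.01, 0.03, 0.00]
stock = [0.001 + 1.5 * m for m in market]

beta = beta_from_returns(stock, market)
beta_u = unlevered_beta(beta, equity=80.0, debt=40.0, tax_rate=0.5)
# b = 0.3 * S with S = 10, F_S = 1, F = 20, and a 6% market excess return
lam = market_price_of_risk(beta_u, 0.06, b=0.3 * 10.0, F_S=1.0, F=20.0)
```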

7.2 A Forward Contract

A forward contract is a special type of derivative contract. The holder of a forward contract agrees to buy or

sell the underlying asset at some delivery price K in the future. K is determined so that the cost of entering

into the forward contract is zero at its inception.

The payoff of a (long) forward contract expiring at $t = T$ is then

$$V(S, \tau = 0) = S(T) - K . \tag{7.27}$$

Note that there is no optionality in a forward contract.

A forward contract is a contingent claim, and its value is given by equation (7.12)

$$V_t + \frac{b^2}{2} V_{SS} + (a - \lambda_S b) V_S - r V = 0 . \tag{7.28}$$

Now we can also use a simple no-arbitrage argument to express the value of a forward contract in terms of the original delivery price $K$ (which is set at the inception of the contract) and the current forward price $f(S, \tau)$. Suppose we are long a forward contract with delivery price $K$. At some time $t > 0$, ($\tau < T$), the forward price is no longer $K$. Suppose the forward price is $f(S, \tau)$, then the payoff of a long forward contract, entered into at ($\tau$) is

$$\text{Payoff} = S(T) - f(S(\tau), \tau) .$$

Suppose we are long the forward contract struck at $t = 0$ with delivery price $K$. At some time $t > 0$, we hedge this contract by going short a forward with the current delivery price $f(S, \tau)$ (which costs us nothing to enter into). The payoff of this portfolio is

$$S - K - (S - f) = f - K \tag{7.29}$$

Since $f, K$ are known with certainty at $(S, \tau)$, then the value of this portfolio today is

$$(f - K)e^{-r\tau} . \tag{7.30}$$

But if we hold a forward contract today, we can always construct the above hedge at no cost. Therefore,

$$V(S, \tau) = (f - K)e^{-r\tau} . \tag{7.31}$$

Substituting equation (7.31) into equation (7.28), and noting that $K$ is a constant, gives us the following PDE for the forward price (the delivery price which makes the forward contract worth zero at inception)

$$f_\tau = \frac{b^2}{2} f_{SS} + (a - \lambda_S b) f_S \tag{7.32}$$

with terminal condition

$$f(S, \tau = 0) = S \tag{7.33}$$

which can be interpreted as the fact that the forward price must equal the spot price at $t = T$.

Suppose we can estimate $a, b$ in equation (7.32), and there are a set of forward prices available. We can then estimate $\lambda_S$ by solving equation (7.32) and adjusting $\lambda_S$ until we obtain a good fit for the observed forward prices.

7.2.1 Convenience Yield

We can also write equation (7.32) as

$$f_t + \frac{b^2}{2} f_{SS} + (r - \delta) S f_S = 0 \tag{7.34}$$

where $\delta$ is defined as

$$\delta = r - \frac{a - \lambda_S b}{S} . \tag{7.35}$$

In this case, we can interpret $\delta$ as the convenience yield for holding the asset. For example, there is a convenience to holding supplies of natural gas in reserve.
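As a worked check of (7.34), consider the special case where $r$ and $\delta$ are constants (an assumption made only for this example; in general $\delta$ depends on $S$ and $t$ through (7.35)). Then the familiar cost-of-carry forward price solves (7.34) with the terminal condition $f = S$ at $t = T$:

```latex
% Assumption for this example only: r and \delta are constants.
f(S, t) = S\, e^{(r - \delta)(T - t)}
\quad\Longrightarrow\quad
f_t = -(r - \delta)\, f , \qquad
f_S = e^{(r - \delta)(T - t)} , \qquad
f_{SS} = 0 ,

f_t + \frac{b^2}{2} f_{SS} + (r - \delta) S f_S
  = -(r - \delta) f + 0 + (r - \delta) f = 0 ,
\qquad f(S, T) = S .
```

Note that $b$ drops out because $f$ is linear in $S$, so in this special case the forward price does not depend on the volatility of $S$.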

8 Discrete Hedging

In practice, we cannot hedge at infinitesimal time intervals. In fact, we would like to hedge as infrequently as possible, since in real life, there are transaction costs (something which is ignored in the basic Black-Scholes equation, but which can be taken into account and results in a nonlinear PDE).

8.1 Delta Hedging

Recall that the basic derivation of the Black-Scholes equation used a hedging portfolio where we hold $V_S$ shares. In finance, $V_S$ is called the option delta, hence this strategy is called delta hedging.

As an example, consider the hedging portfolio $P(t)$ which is composed of

• A short position in an option, $-V(t)$.

• A long position of $\alpha^h(t)$ shares, worth $\alpha^h(t) S(t)$.

• An amount in a risk-free bank account $B(t)$.

Initially, we have

$$\begin{aligned}
P(0) &= 0 = -V(0) + \alpha^h(0) S(0) + B(0) \\
\alpha^h &= V_S \\
B(0) &= V(0) - \alpha^h(0) S(0)
\end{aligned}$$

The hedge is rebalanced at discrete times $t_i$. Defining

$$\begin{aligned}
\alpha^h_i &= V_S(S_i, t_i) \\
V_i &= V(S_i, t_i)
\end{aligned}$$

then, we have to update the hedge by purchasing $\alpha^h_i - \alpha^h_{i-1}$ shares at $t = t_i$, so that updating our share position requires

$$S(t_i)(\alpha^h_i - \alpha^h_{i-1})$$

in cash, which we borrow from the bank if $(\alpha^h_i - \alpha^h_{i-1}) > 0$. If $(\alpha^h_i - \alpha^h_{i-1}) < 0$, then we sell some shares and deposit the proceeds in the bank account. If $\Delta t = t_i - t_{i-1}$, then the bank account balance is updated by

$$B_i = e^{r\Delta t} B_{i-1} - S_i(\alpha^h_i - \alpha^h_{i-1})$$

At the instant after the rebalancing time $t_i$, the value of the portfolio is

$$P(t_i) = -V(t_i) + \alpha^h(t_i) S(t_i) + B(t_i)$$

Since we are hedging at discrete time intervals, the hedge is no longer risk free (it is risk free only in the limit as the hedging interval goes to zero). We can determine the distribution of profit and loss (P&L) by carrying out a Monte Carlo simulation. Suppose we have precomputed the values of $V_S$ for all the likely $(S, t)$ values. Then, we simulate a series of random paths. For each random path, we determine the discounted relative hedging error

$$\text{error} = \frac{e^{-rT^*} P(T^*)}{V(S_0, t = 0)} \tag{8.1}$$

After computing many sample paths, we can plot a histogram of relative hedging error, i.e. fraction of Monte Carlo trials giving a hedging error between $E$ and $E + \Delta E$. We can compute the variance of this distribution, and also the value at risk (VAR). VAR is the worst case loss with a given probability. For example, a typical VAR number reported is the maximum loss that would occur 95% of the time. In other words, find the value of $E$ along the x-axis such that the area under the histogram plot to the right of this point is 0.95 of the total area.
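The rebalancing rule and the error measure (8.1) can be sketched in Python. To keep the example self-contained, it hedges a European put using the closed-form Black-Scholes value and delta, rather than an American put with precomputed $V_S$; that substitution, the function names, the parameter defaults, and the choice $T^* = T$ are all assumptions of this sketch.

```python
import math
import random

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_put(S, K, r, sigma, tau):
    """Black-Scholes European put value and delta V_S, time to expiry tau > 0."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    value = K * math.exp(-r * tau) * norm_cdf(-d2) - S * norm_cdf(-d1)
    return value, norm_cdf(d1) - 1.0

def relative_hedge_error(n_rebalance, rng, S0=100.0, K=100.0, r=0.06,
                         sigma=0.3, mu=0.08, T=0.25):
    """Discounted relative hedging error (8.1) along one simulated path."""
    dt = T / n_rebalance
    V0, alpha = bs_put(S0, K, r, sigma, T)
    S = S0
    B = V0 - alpha * S0                      # bank account chosen so that P(0) = 0
    for i in range(1, n_rebalance + 1):
        # exact GBM step under the real-world drift mu
        S *= math.exp((mu - 0.5 * sigma ** 2) * dt
                      + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0))
        B *= math.exp(r * dt)                # interest on the bank account
        if i < n_rebalance:                  # rebalance at interior times only
            _, alpha_new = bs_put(S, K, r, sigma, T - i * dt)
            B -= S * (alpha_new - alpha)     # cash needed to change the share position
            alpha = alpha_new
    P_T = -max(K - S, 0.0) + alpha * S + B   # short option, shares, bank account
    return math.exp(-r * T) * P_T / V0

def error_stats(n_rebalance, n_paths, rng):
    errors = [relative_hedge_error(n_rebalance, rng) for _ in range(n_paths)]
    mean = sum(errors) / n_paths
    std = math.sqrt(sum((e - mean) ** 2 for e in errors) / n_paths)
    return mean, std

mean_monthly, std_monthly = error_stats(3, 2000, random.Random(1))    # 3 hedge intervals
mean_daily, std_daily = error_stats(60, 2000, random.Random(2))       # 60 hedge intervals
```

The list of errors returned inside `error_stats` is what would be binned to produce the histogram of relative hedging error; the standard deviation shrinks as the hedge is rebalanced more often.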

As an example, consider the case of an American put option, $T = 0.25$, $\sigma = 0.3$, $r = 0.06$, $K = S_0 = 100$. At $t = 0$, $S_0 = 100$. Since there are discrete hedging errors, the results in this case will depend on the stock drift rate, which we set at $\mu = 0.08$. The initial value of the American put, obtained by solving the Black-Scholes linear complementarity problem, is \$5.34. Figure 8.1 shows the results for no hedging, and hedging once a month. The x-axis in these plots shows the relative P&L of this portfolio (i.e. P&L divided by the Black-Scholes price), and the y-axis shows the relative frequency.

$$\text{Relative P\&L} = \frac{\text{Actual P\&L}}{\text{Black-Scholes price}} \tag{8.2}$$

[Figure 8.1: two histograms, each plotting Relative frequency (vertical axis, 0 to 4) against Relative hedge error (horizontal axis, -3 to 1).]

Figure 8.1: Relative frequency (y-axis) versus relative P&L of delta hedging strategies. Left: no hedging, right: rebalance hedge once a month. American put, $T = 0.25$, $\sigma = 0.3$, $r = 0.06$, $\mu = 0.08$, $K = S_0 = 100$. The relative P&L is computed by dividing the actual P&L by the Black-Scholes price.

Note that the no-hedging strategy actually has a high probability of ending up with a proﬁt (from the

option writer’s point of view) since the drift rate of the stock is positive. In this case, the hedger does

nothing, but simply pockets the option premium. Note the sudden jump in the relative frequency at relative

P&L = 1. This is because the maximum the option writer stands to gain is the option premium, which

we assume is the Black-Scholes value. The writer makes this premium for any path which ends up S > K,

which is many paths, hence the sudden jump in probability. However, there is signiﬁcant probability of a

loss as well. Figure 8.1 also shows the relative frequency of the P&L of hedging once a month (only three

times during the life of the option).

In fact, there is a history of Ponzi-like hedge funds which simply write put options with essentially no

hedging. In this case, these funds will perform very well for several years, since markets tend to drift up on

average. However, then a sudden market drop occurs, and they will blow up. Blowing up is a technical term

for losing all your capital and being forced to get a real job. However, usually the owners of these hedge

funds walk away with large bonuses, and the shareholders take all the losses.

Figure 8.2 shows the results for rebalancing the hedge once a week, and daily. We can see clearly here

that the mean is zero, and variance is getting smaller as the hedging interval is reduced. In fact, one can

show that the standard deviation of the hedge error should be proportional to $\sqrt{\Delta t}$, where $\Delta t$ is the hedge rebalance interval.

8.2 Gamma Hedging

In an attempt to account for some of the errors in delta hedging at finite hedging intervals, we can try to use second derivative information. The second derivative of an option value, $V_{SS}$, is called the option gamma, hence this strategy is termed delta-gamma hedging.

A gamma hedge consists of

• A short option position −V (t).

• Long $\alpha^h$ shares, with value $\alpha^h S(t)$.

• Long $\beta$ units of another derivative security $I$.

• An amount in a risk-free bank account B(t).

Figure 8.2: Relative frequency (y-axis) versus relative P&L (x-axis, relative hedge error) of delta hedging strategies. Left: rebalance hedge once a week, right: rebalance hedge daily. American put, $T = .25$, $\sigma = .3$, $r = .06$, $\mu = .08$, $K = S_0 = 100$. The relative P&L is computed by dividing the actual P&L by the Black-Scholes price.

Now, recall that we consider $\alpha^h$, $\beta$ to be constant over the hedging interval (no peeking into the future).

The hedge portfolio $P(t)$ is then

$$P(t) = -V + \alpha^h S + \beta I + B(t) .$$

Assuming that we buy and hold $\alpha^h$ shares and $\beta$ of the secondary instrument at the beginning of each hedging interval, then we require that

$$\frac{\partial P}{\partial S} = -\frac{\partial V}{\partial S} + \alpha^h + \beta \frac{\partial I}{\partial S} = 0$$

$$\frac{\partial^2 P}{\partial S^2} = -\frac{\partial^2 V}{\partial S^2} + \beta \frac{\partial^2 I}{\partial S^2} = 0 \qquad (8.3)$$

Note that

• If β = 0, then we get back the usual delta hedge.

• In order for the gamma hedge to work, we need an instrument which has some gamma (the asset S has

second derivative zero). Hence, traders often speak of being long (positive) or short (negative) gamma,

and try to buy/sell things to get gamma neutral.

So, at $t = 0$ we have

$$P(0) = 0 \;\Rightarrow\; B(0) = V(0) - \alpha^h_0 S_0 - \beta_0 I_0 .$$

The amounts $\alpha^h_i$, $\beta_i$ are determined by requiring that equation (8.3) hold:

$$-(V_S)_i + \alpha^h_i + \beta_i (I_S)_i = 0$$

$$-(V_{SS})_i + \beta_i (I_{SS})_i = 0 \qquad (8.4)$$

The bank account balance is then updated at each hedging time $t_i$ by

$$B_i = e^{r\Delta t} B_{i-1} - S_i(\alpha^h_i - \alpha^h_{i-1}) - I_i(\beta_i - \beta_{i-1}) .$$
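Equations (8.4) are a triangular linear system: the gamma condition fixes $\beta_i$, and the delta condition then fixes $\alpha^h_i$. The sketch below solves it for one rebalancing date. As an assumption for illustration, both the hedged option and the secondary instrument are treated as European puts with closed-form Black-Scholes delta and gamma; in the example of these notes the hedged option is American, so its derivatives would come from a numerical solution instead.

```python
import math


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_put_delta_gamma(S, K, r, sigma, tau):
    """Black-Scholes European put delta and gamma (stand-ins for V_S and V_SS)."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    delta = norm_cdf(d1) - 1.0
    gamma = norm_pdf(d1) / (S * sigma * math.sqrt(tau))
    return delta, gamma


def gamma_hedge_weights(V_S, V_SS, I_S, I_SS):
    """Solve equations (8.4) for (alpha_h, beta)."""
    beta = V_SS / I_SS           # second equation: -V_SS + beta * I_SS = 0
    alpha_h = V_S - beta * I_S   # first equation: -V_S + alpha_h + beta * I_S = 0
    return alpha_h, beta


V_S, V_SS = bs_put_delta_gamma(100.0, 100.0, 0.06, 0.3, 0.25)  # option being hedged
I_S, I_SS = bs_put_delta_gamma(100.0, 100.0, 0.06, 0.3, 0.5)   # secondary instrument
alpha_h, beta = gamma_hedge_weights(V_S, V_SS, I_S, I_SS)
print("alpha_h = %.4f, beta = %.4f" % (alpha_h, beta))
```

Note that the system is only solvable when the secondary instrument has non-zero gamma, which is why the underlying asset alone cannot be used.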

We will consider the same example as we used in the delta hedge example. For an additional instrument,

we will use a European put option written on the same underlying with the same strike price and a maturity

of T=.5 years.

Figure 8.3 shows the results of gamma hedging, along with a comparison with delta hedging. In principle,

gamma hedging produces a smaller variance with less frequent hedging. However, we are exposed to more

model error in this case, since we need to be able to compute the second derivative of the theoretical price.

Figure 8.3: Relative frequency (y-axis) versus relative P&L (x-axis, relative hedge error) of gamma hedging strategies. Left: rebalance hedge once a week, right: rebalance hedge daily. Dotted lines show the delta hedge for comparison. American put, $T = .25$, $\sigma = .3$, $r = .06$, $\mu = .08$, $K = 100$, $S_0 = 100$. Secondary instrument: European put option, same strike, $T = .5$ years. The relative P&L is computed by dividing the actual P&L by the Black-Scholes price.

8.3 Vega Hedging

The most important parameter in the option pricing is the volatility. What if we are not sure about the

value of the volatility? It is possible to assume that the volatility itself is stochastic, i.e.

$$dS = \mu S\,dt + \sqrt{v}\,S\,dZ_1$$

$$dv = \kappa(\theta - v)\,dt + \sigma_v \sqrt{v}\,dZ_2 \qquad (8.5)$$

where $\mu$ is the expected growth rate of the stock price, $\sqrt{v}$ is its instantaneous volatility, $\kappa$ is a parameter controlling how fast $v$ reverts to its mean level of $\theta$, $\sigma_v$ is the “volatility of volatility” parameter, and $Z_1$, $Z_2$ are Wiener processes with correlation parameter $\rho$.

If we use the model in equation (8.5), then this will result in a two factor PDE to solve for the option price and the hedging parameters. Since there are two sources of risk ($dZ_1$, $dZ_2$), we will need to hedge with the underlying asset and another option (Heston, A closed form solution for options with stochastic volatility with applications to bond and currency options, Rev. Fin. Studies 6 (1993) 327-343).

Another possibility is to assume that the volatility is uncertain, with

$$\sigma_{\min} \le \sigma \le \sigma_{\max} ,$$

and to hedge based on a worst case (from the hedger’s point of view). This results in an uncertain volatility model (Avellaneda, Levy, Paras, Pricing and Hedging Derivative Securities in Markets with Uncertain Volatilities, Appl. Math. Fin. 2 (1995) 77-88). This is great if you can get someone to buy this option at

this price, because the hedger is always guaranteed to end up with a non-negative balance in the hedging

portfolio. But you may not be able to sell at this price, since the option price is expensive (after all, the

price you get has to cover the worst case scenario).

An alternative, much simpler, approach (and therefore popular in industry), is to construct a vega hedge.

We assume that we know the volatility, and price the option in the usual way. Then, as with a gamma

hedge, we construct a portfolio

• A short option position −V (t).

• Long $\alpha^h$ shares, with value $\alpha^h S(t)$.

• Long $\beta$ units of another derivative security $I$.

• An amount in a risk-free bank account B(t).


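The hedge portfolio $P(t)$ is then

$$P(t) = -V + \alpha^h S + \beta I + B(t) .$$

Assuming that we buy and hold $\alpha^h$ shares and $\beta$ of the secondary instrument at the beginning of each hedging interval, then we require that

$$\frac{\partial P}{\partial S} = -\frac{\partial V}{\partial S} + \alpha^h + \beta \frac{\partial I}{\partial S} = 0$$

$$\frac{\partial P}{\partial \sigma} = -\frac{\partial V}{\partial \sigma} + \beta \frac{\partial I}{\partial \sigma} = 0 \qquad (8.6)$$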

Note that if we assume that σ is constant when pricing the option, yet do not assume σ is constant when

we hedge, this is somewhat inconsistent. Nevertheless, we can determine the derivatives in equation (8.6)

numerically (solve the pricing equation for several diﬀerent values of σ, and then ﬁnite diﬀerence the solu-

tions).
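The finite-difference recipe just described can be sketched as follows. As an assumption for illustration, the "pricing equation" here is the closed-form Black-Scholes European put, so that the numerical derivatives can be checked; with an American option one would call a PDE solver in its place. The step sizes `h` and the helper names are choices made for this sketch.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_put_value(S, K, r, sigma, tau):
    """Black-Scholes European put value (stand-in for a numerical pricer)."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return K * math.exp(-r * tau) * norm_cdf(-d2) - S * norm_cdf(-d1)


def fd_delta(price, S, sigma, h=1e-3):
    """Central difference approximation to d(price)/dS."""
    return (price(S + h, sigma) - price(S - h, sigma)) / (2.0 * h)


def fd_vega(price, S, sigma, h=1e-4):
    """Central difference approximation to d(price)/d(sigma)."""
    return (price(S, sigma + h) - price(S, sigma - h)) / (2.0 * h)


def delta_vega_weights(V_S, V_sigma, I_S, I_sigma):
    """Solve equations (8.6) for (alpha_h, beta)."""
    beta = V_sigma / I_sigma     # vega condition
    alpha_h = V_S - beta * I_S   # delta condition
    return alpha_h, beta


def V(S, sigma):
    return bs_put_value(S, 100.0, 0.06, sigma, 0.25)  # option sold


def I(S, sigma):
    return bs_put_value(S, 100.0, 0.06, sigma, 0.5)   # secondary instrument


S_now, sigma_now = 100.0, 0.3
V_S, V_sigma = fd_delta(V, S_now, sigma_now), fd_vega(V, S_now, sigma_now)
I_S, I_sigma = fd_delta(I, S_now, sigma_now), fd_vega(I, S_now, sigma_now)
alpha_h, beta = delta_vega_weights(V_S, V_sigma, I_S, I_sigma)
print("alpha_h = %.4f, beta = %.4f" % (alpha_h, beta))
```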

In practice, we would sell the option priced using our best estimate of σ (today). This is usually based on

looking at the prices of traded options, and then backing out the volatility which gives back today’s traded

option price (this is the implied volatility). Then as time goes on, the implied volatility will likely change.

We use the current implied volatility to determine the current hedge parameters in equation (8.6). Since

this implied volatility has likely changed since we last rebalanced the hedge, there is some error in the hedge.

However, taking into account the change in the hedge portfolio through equations (8.6) should make up for

this error. This procedure is called delta-vega hedging.

In fact, even if the underlying process is a stochastic volatility, the vega hedge computed using a constant

volatility model works surprisingly well (Hull and White, The pricing of options on assets with stochastic

volatilities, J. of Finance, 42 (1987) 281-300).

9 Jump Diﬀusion

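Recall that if

$$dS = \mu S\,dt + \sigma S\,dZ \qquad (9.1)$$

then from Ito’s Lemma we have

$$d[\log S] = \left[\mu - \frac{\sigma^2}{2}\right] dt + \sigma\,dZ . \qquad (9.2)$$

Now, suppose that we observe asset prices at discrete times $t_i$, i.e. $S(t_i) = S_i$, with $\Delta t = t_{i+1} - t_i$. Then from equation (9.2) we have

$$\log S_{i+1} - \log S_i = \log\left(\frac{S_{i+1}}{S_i}\right) \simeq \left[\mu - \frac{\sigma^2}{2}\right]\Delta t + \sigma\phi\sqrt{\Delta t} \qquad (9.3)$$

where $\phi$ is $N(0, 1)$. Now, if $\Delta t$ is sufficiently small, then $\Delta t$ is much smaller than $\sqrt{\Delta t}$, so that equation (9.3) can be approximated by

$$\log\left(\frac{S_{i+1} - S_i + S_i}{S_i}\right) = \log\left(1 + \frac{S_{i+1} - S_i}{S_i}\right) \simeq \sigma\phi\sqrt{\Delta t} . \qquad (9.4)$$

Define the return $R_i$ in the period $t_{i+1} - t_i$ as

$$R_i = \frac{S_{i+1} - S_i}{S_i} \qquad (9.5)$$

so that equation (9.4) becomes

$$\log(1 + R_i) \simeq R_i = \sigma\phi\sqrt{\Delta t} .$$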

Consequently, a plot of the discretely observed returns of S should be normally distributed, if the as-

sumption (9.1) is true. In Figure 9.1 we can see a histogram of monthly returns from the S&P500 for the

period 1982 −2002. The histogram has been scaled to zero mean and unit standard deviation. A standard

normal distribution is also shown. Note that for real data, there is a higher peak, and fatter tails than

the normal distribution. This means that there is higher probability of zero return, or a large gain or loss

compared to a normal distribution.

Figure 9.1: Probability density functions for the S&P 500 monthly returns 1982-2002, scaled to zero mean and unit standard deviation, and the standardized Normal distribution.

As $\Delta t \to 0$, Geometric Brownian Motion (equation (9.1)) assumes that the probability of a large return also tends to zero. The amplitude of the return is proportional to $\sqrt{\Delta t}$, so that the tails of the distribution become unimportant.

But, in real life, we can sometimes see very large returns (positive or negative) in small time increments.

It therefore appears that Geometric Brownian Motion (GBM) is missing something important.

9.1 The Poisson Process

Consider a process where most of the time nothing happens (contrast this with Brownian motion, where

some changes occur at any time scale we look at), but on rare occasions, a jump occurs. The jump size does

not depend on the time interval, but the probability of the jump occurring does depend on the interval.

More formally, consider the process $dq$ where, in the interval $[t, t + dt]$,

$$dq = \begin{cases} 1 & \text{with probability } \lambda\,dt \\ 0 & \text{with probability } 1 - \lambda\,dt . \end{cases} \qquad (9.6)$$

Note, once again, that the size of the Poisson outcome does not depend on $dt$. Also, the probability of a jump occurring in $[t, t + dt]$ goes to zero as $dt \to 0$, in contrast to Brownian motion, where some movement always takes place (the probability of movement is constant as $dt \to 0$), but the size of the movement tends to zero as $dt \to 0$. For future reference, note that

$$E[dq] = \lambda\,dt \cdot 1 + (1 - \lambda\,dt) \cdot 0 = \lambda\,dt \qquad (9.7)$$

and

$$\begin{aligned} Var(dq) &= E[(dq - E[dq])^2] \\ &= E[(dq - \lambda\,dt)^2] \\ &= (1 - \lambda\,dt)^2\, \lambda\,dt + (0 - \lambda\,dt)^2 (1 - \lambda\,dt) \\ &= \lambda\,dt + O((dt)^2) . \end{aligned} \qquad (9.8)$$

Figure 9.2: Some realizations of a jump diffusion process which follows equation (9.10). x-axis: time (years), y-axis: asset price.
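The moments (9.7) and (9.8) are easy to check numerically. The sketch below evaluates the exact two-outcome expressions and compares the mean against a simulated sample of $dq$; the values of `lam`, `dt`, the sample size and the seed are arbitrary choices for illustration.

```python
import random


def poisson_increment_moments(lam, dt):
    """Exact mean and variance of dq from the two-outcome definition (9.6)."""
    p = lam * dt
    mean = p * 1.0 + (1.0 - p) * 0.0
    var = (1.0 - p) ** 2 * p + (0.0 - p) ** 2 * (1.0 - p)
    return mean, var


def sample_dq(lam, dt, n, seed=0):
    """Draw n independent realizations of dq."""
    rng = random.Random(seed)
    return [1 if rng.random() < lam * dt else 0 for _ in range(n)]


lam, dt = 2.0, 0.01  # illustrative values
mean, var = poisson_increment_moments(lam, dt)
draws = sample_dq(lam, dt, 200000)
sample_mean = sum(draws) / len(draws)
print(mean, var, sample_mean)
```

The exact variance equals $\lambda\,dt - (\lambda\,dt)^2$, which agrees with (9.8) up to the $O((dt)^2)$ term.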

Now, suppose we assume that, along with the usual GBM, occasionally the asset jumps, i.e. S → JS,

where J is the size of a (proportional) jump. We will restrict J to be non-negative.

Suppose a jump occurs in $[t, t + dt]$, with probability $\lambda\,dt$. Let’s write this jump process as an SDE, i.e.

$$[dS]_{\text{jump}} = (J - 1) S\,dq$$

since, if a jump occurs

$$\begin{aligned} S_{\text{after jump}} &= S_{\text{before jump}} + [dS]_{\text{jump}} \\ &= S_{\text{before jump}} + (J - 1) S_{\text{before jump}} \\ &= J S_{\text{before jump}} \end{aligned} \qquad (9.9)$$

which is what we want to model. So, if we have a combination of GBM and a rare jump event, then

$$dS = \mu S\,dt + \sigma S\,dZ + (J - 1) S\,dq \qquad (9.10)$$

Assume that the jump size has some known probability density $g(J)$, i.e. given that a jump occurs, then the probability of a jump in $[J, J + dJ]$ is $g(J)\,dJ$, and

$$\int_{-\infty}^{+\infty} g(J)\,dJ = \int_0^{\infty} g(J)\,dJ = 1 \qquad (9.11)$$

since we assume that $g(J) = 0$ if $J < 0$. For future reference, if $f = f(J)$, then the expected value of $f$ is

$$E[f] = \int_0^{\infty} f(J)\,g(J)\,dJ . \qquad (9.12)$$

The process (9.10) is basically geometric Brownian motion (a continuous process) with rare discontinuous

jumps. Some example realizations of jump diﬀusion paths are shown in Figure 9.2.
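A path of the process (9.10) can be generated with a short loop. The sketch below is one possible discretization, not necessarily the one used to produce the figure: between jumps it uses the exact lognormal GBM step, and in each step a jump occurs with probability $\lambda\,\Delta t$. The jump size is drawn as a lognormal variable, which is an assumption of this sketch, and the parameter names `mu_hat`, `gamma` and all numerical values are illustrative.

```python
import math
import random


def jump_diffusion_path(S0, mu, sigma, lam, mu_hat, gamma, T, n_steps, rng):
    """Simulate one path of dS = mu*S*dt + sigma*S*dZ + (J - 1)*S*dq."""
    dt = T / n_steps
    S = S0
    path = [S]
    for _ in range(n_steps):
        phi = rng.gauss(0.0, 1.0)
        # Diffusion part: exact GBM step over dt.
        S *= math.exp((mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * phi)
        # Jump part: dq = 1 with probability lam*dt, in which case S -> J*S.
        if rng.random() < lam * dt:
            J = math.exp(rng.gauss(mu_hat, gamma))  # assumed lognormal jump size
            S *= J
        path.append(S)
    return path


rng = random.Random(1)
path = jump_diffusion_path(100.0, 0.08, 0.3, 2.0, -0.1, 0.2, 0.25, 250, rng)
print(len(path), min(path), max(path))
```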


9.2 The Jump Diﬀusion Pricing Equation

Now, form the usual hedging portfolio

$$P = V - \alpha S . \qquad (9.13)$$

Now, consider

$$[dP]_{\text{total}} = [dP]_{\text{Brownian}} + [dP]_{\text{jump}} \qquad (9.14)$$

where, from Ito’s Lemma

$$[dP]_{\text{Brownian}} = \left[V_t + \frac{\sigma^2 S^2}{2} V_{SS}\right] dt + [V_S - \alpha](\mu S\,dt + \sigma S\,dZ) \qquad (9.15)$$

and, noting that the jump is of finite size,

$$[dP]_{\text{jump}} = [V(JS, t) - V(S, t)]\,dq - \alpha (J - 1) S\,dq . \qquad (9.16)$$

If we hedge the Brownian motion risk, by setting $\alpha = V_S$, then equations (9.14-9.16) give us

$$dP = \left[V_t + \frac{\sigma^2 S^2}{2} V_{SS}\right] dt + [V(JS, t) - V(S, t)]\,dq - V_S (J - 1) S\,dq . \qquad (9.17)$$

So, we still have a random component ($dq$) which we have not hedged away. Let’s take the expected value of this change in the portfolio, i.e.

$$E(dP) = \left[V_t + \frac{\sigma^2 S^2}{2} V_{SS}\right] dt + E[V(JS, t) - V(S, t)]\,E[dq] - V_S S\,E[J - 1]\,E[dq] \qquad (9.18)$$

where we have assumed that the probability of the jump and the probability of the size of the jump are independent. Defining $E(J - 1) = \kappa$, then we have that equation (9.18) becomes

$$E(dP) = \left[V_t + \frac{\sigma^2 S^2}{2} V_{SS}\right] dt + E[V(JS, t) - V(S, t)]\,\lambda\,dt - V_S S \kappa \lambda\,dt . \qquad (9.19)$$

Now, we make a rather interesting assumption. Assume that an investor holds a diversiﬁed portfolio of

these hedging portfolios, for many diﬀerent stocks. If we make the rather dubious assumption that these

jumps for diﬀerent stocks are uncorrelated, then the variance of this portfolio of portfolios is small, hence

there is little risk in this portfolio. Hence, the expected return should be

$$E[dP] = rP\,dt . \qquad (9.20)$$

Now, equating equations (9.19) and (9.20), with $P = V - V_S S$, gives

$$V_t + \frac{\sigma^2 S^2}{2} V_{SS} + V_S [rS - S\kappa\lambda] - (r + \lambda) V + E[V(JS, t)]\,\lambda = 0 . \qquad (9.21)$$

Using equation (9.12) in equation (9.21) gives

$$V_t + \frac{\sigma^2 S^2}{2} V_{SS} + V_S [rS - S\kappa\lambda] - (r + \lambda) V + \lambda \int_0^{\infty} g(J)\,V(JS, t)\,dJ = 0 . \qquad (9.22)$$

Equation (9.22) is a Partial Integral Differential Equation (PIDE).

A common assumption is that $g(J)$ is log normal,

$$g(J) = \frac{\exp\left(-\dfrac{(\log(J) - \hat\mu)^2}{2\gamma^2}\right)}{\sqrt{2\pi}\,\gamma J} , \qquad (9.23)$$

where some algebra shows that

$$E(J - 1) = \kappa = \exp(\hat\mu + \gamma^2/2) - 1 . \qquad (9.24)$$
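The algebra behind (9.24) is short enough to spell out. The sketch below assumes only the density (9.23), i.e. that $\log J$ is normal with mean $\hat\mu$ and variance $\gamma^2$; the substitution variable $x$ is introduced here for convenience and is not part of the original notation.

```latex
\begin{aligned}
E[J] &= \int_0^\infty J\, g(J)\, dJ
      = \int_{-\infty}^{\infty} e^{x}\,
        \frac{1}{\sqrt{2\pi}\,\gamma}
        \exp\!\left(-\frac{(x-\hat\mu)^2}{2\gamma^2}\right) dx
        \qquad (x = \log J,\; dJ = J\,dx) \\
     &= e^{\hat\mu + \gamma^2/2}
        \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\gamma}
        \exp\!\left(-\frac{(x-\hat\mu-\gamma^2)^2}{2\gamma^2}\right) dx
      = e^{\hat\mu + \gamma^2/2} , \\
\kappa &= E[J-1] = E[J] - 1 = \exp(\hat\mu + \gamma^2/2) - 1 .
\end{aligned}
```

The second line completes the square in the exponent, using $x - (x-\hat\mu)^2/(2\gamma^2) = \hat\mu + \gamma^2/2 - (x-\hat\mu-\gamma^2)^2/(2\gamma^2)$; the remaining integral is that of a normal density and equals one.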

Now, what about our dubious assumption that jump risk was diversifiable? In practice, we can regard $\sigma$, $\hat\mu$, $\gamma$, $\lambda$ as parameters, and fit them to observed option prices. If we do this (see L. Andersen and J. Andreasen, Jump-Diffusion processes: Volatility smile fitting and numerical methods, Review of Derivatives Research (2002), vol 4, pages 231-262), then we find that $\sigma$ is close to historical volatility, but that the fitted values of $\lambda$, $\hat\mu$ are at odds with the historical values. The fitted values seem to indicate that investors are pricing in larger, more frequent jumps than have been historically observed. In other words, actual prices seem to indicate that investors do require some compensation for jump risk, which makes sense: these parameters contain a market price of risk.

Consequently, our assumption about jump risk being diversiﬁable is not really a problem if we ﬁt the

jump parameters from market (as opposed to historical) data, since the market-ﬁt parameters will contain

some eﬀect due to risk preferences of investors.

One can be more rigorous about this if you assume some utility function for investors. See (Alan Lewis,

Fear of jumps, Wilmott Magazine, December, 2002, pages 60-67) or (V. Naik, M. Lee, General equilibrium

pricing of options on the market portfolio with discontinuous returns, The Review of Financial Studies, vol

3 (1990) pages 493-521.)

10 Mean Variance Portfolio Optimization

An introduction to Computational Finance would not be complete without some discussion of Portfolio

Optimization. Consider a risky asset which follows Geometric Brownian Motion with drift

$$\frac{dS}{S} = \mu\,dt + \sigma\,dZ , \qquad (10.1)$$

where as usual $dZ = \phi\sqrt{dt}$ and $\phi \sim N(0, 1)$. Suppose we consider a fixed finite interval $\Delta t$, then we can write equation (10.1) as

$$R = \mu' + \sigma'\phi , \qquad R = \frac{\Delta S}{S} , \qquad \mu' = \mu\Delta t , \qquad \sigma' = \sigma\sqrt{\Delta t} , \qquad (10.2)$$

where $R$ is the actual return on the asset in $[t, t + \Delta t]$, $\mu'$ is the expected return on the asset in $[t, t + \Delta t]$, and $\sigma'$ is the standard deviation of the return on the asset in $[t, t + \Delta t]$.

Now consider a portfolio of $N$ risky assets. Let $R_i$ be the return on asset $i$ in $[t, t + \Delta t]$, so that

$$R_i = \mu'_i + \sigma'_i \phi_i \qquad (10.3)$$

Suppose that the correlation between asset $i$ and asset $j$ is given by $\rho_{ij} = E[\phi_i \phi_j]$. Suppose we buy $x_i$ of each asset at $t$, to form the portfolio $P$

$$P = \sum_{i=1}^{N} x_i S_i . \qquad (10.4)$$

Then, over the interval $[t, t + \Delta t]$

$$P + \Delta P = \sum_{i=1}^{N} x_i S_i (1 + R_i)$$

$$\Delta P = \sum_{i=1}^{N} x_i S_i R_i$$

$$\frac{\Delta P}{P} = \sum_{i=1}^{N} w_i R_i$$

$$w_i = \frac{x_i S_i}{\sum_{j=1}^{N} x_j S_j} \qquad (10.5)$$

In other words, we divide up our total wealth $W = \sum_{i=1}^{N} x_i S_i$ into each asset with weight $w_i$. Note that $\sum_{i=1}^{N} w_i = 1$.

To summarize, given some initial wealth at $t$, we suppose that an investor allocates a fraction $w_i$ of this wealth to each asset $i$. We assume that the total wealth is allocated to this risky portfolio $P$, so that

$$\sum_{i=1}^{N} w_i = 1$$

$$P = \sum_{i=1}^{N} x_i S_i$$

$$R_p = \frac{\Delta P}{P} = \sum_{i=1}^{N} w_i R_i . \qquad (10.6)$$

The expected return on this portfolio $\bar R_p$ in $[t, t + \Delta t]$ is

$$\bar R_p = \sum_{i=1}^{N} w_i \mu'_i , \qquad (10.7)$$

while the variance of $R_p$ in $[t, t + \Delta t]$ is

$$Var(R_p) = \sum_{i=1}^{N} \sum_{j=1}^{N} w_i w_j \sigma'_i \sigma'_j \rho_{ij} . \qquad (10.8)$$
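Equations (10.7) and (10.8) translate directly into code. The following minimal sketch uses plain Python lists; the two-asset numbers are made up for illustration and are not taken from the notes.

```python
def portfolio_mean(w, mu):
    """Expected portfolio return, equation (10.7)."""
    return sum(wi * mi for wi, mi in zip(w, mu))


def portfolio_variance(w, sigma, rho):
    """Portfolio variance, equation (10.8), from s.d.'s and a correlation matrix."""
    n = len(w)
    return sum(w[i] * w[j] * sigma[i] * sigma[j] * rho[i][j]
               for i in range(n) for j in range(n))


# Hypothetical two-asset example.
w = [0.5, 0.5]
mu = [0.10, 0.20]
sigma = [0.20, 0.40]
rho = [[1.0, 0.5],
       [0.5, 1.0]]
print(portfolio_mean(w, mu), portfolio_variance(w, sigma, rho))
```

By hand, the variance is $0.25 \cdot 0.04 + 0.25 \cdot 0.16 + 2 \cdot 0.25 \cdot 0.2 \cdot 0.4 \cdot 0.5 = 0.07$, which the code should reproduce up to rounding.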

10.1 Special Cases

Suppose the assets all have zero correlation with one another, i.e. $\rho_{ij} \equiv 0, \forall i \ne j$ (of course $\rho_{ii} = 1$). Then equation (10.8) becomes

$$Var(R_p) = \sum_{i=1}^{N} (\sigma'_i)^2 (w_i)^2 . \qquad (10.9)$$

Now, suppose we equally weight all the assets in the portfolio, i.e. $w_i = 1/N, \forall i$. Let $\max_i \sigma'_i = \sigma'_{\max}$, then

$$Var(R_p) = \frac{1}{N^2} \sum_{i=1}^{N} (\sigma'_i)^2 \le \frac{N (\sigma'_{\max})^2}{N^2} = O\left(\frac{1}{N}\right) , \qquad (10.10)$$

so that in this special case, if we diversify over a large number of assets, the standard deviation of the portfolio tends to zero as $N \to \infty$.

Consider another case: all assets are perfectly correlated, $\rho_{ij} = 1, \forall i, j$. In this case

$$Var(R_p) = \sum_{i=1}^{N} \sum_{j=1}^{N} w_i w_j \sigma'_i \sigma'_j = \left( \sum_{j=1}^{N} w_j \sigma'_j \right)^2 \qquad (10.11)$$

so that if $sd(R) = \sqrt{Var(R)}$ is the standard deviation of $R$, then, in this case

$$sd(R_p) = \sum_{j=1}^{N} w_j \sigma'_j , \qquad (10.12)$$

which means that in this case the standard deviation of the portfolio is simply the weighted average of the

individual asset standard deviations.

In general, we can expect that $0 < |\rho_{ij}| < 1$, so that the standard deviation of a portfolio of assets will be smaller than the weighted average of the individual asset standard deviations, but larger than zero.

This means that diversification will be a good thing (as Martha Stewart would say) in terms of risk versus reward. In fact, a portfolio of as few as 10-20 stocks tends to reap most of the benefits of diversification.
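The two special cases and the intermediate case can be compared numerically. The sketch below assumes, purely for illustration, that every asset has the same standard deviation (0.3) and that every pair of distinct assets has the same correlation; neither assumption comes from the notes.

```python
import math


def equal_weight_sd(sigma, rho_offdiag):
    """s.d. of an equally weighted portfolio with a common pairwise correlation."""
    n = len(sigma)
    w = 1.0 / n
    var = 0.0
    for i in range(n):
        for j in range(n):
            rho = 1.0 if i == j else rho_offdiag
            var += w * w * sigma[i] * sigma[j] * rho
    return math.sqrt(var)


for n in (1, 10, 100):
    sigma = [0.3] * n
    print(n,
          round(equal_weight_sd(sigma, 0.0), 4),   # uncorrelated: decays like 1/sqrt(N)
          round(equal_weight_sd(sigma, 0.3), 4),   # partially correlated
          round(equal_weight_sd(sigma, 1.0), 4))   # perfectly correlated: no benefit
```

With partial correlation the standard deviation falls quickly for the first few assets and then levels off, which is consistent with the remark that a modest number of stocks captures most of the benefit.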

10.2 The Portfolio Allocation Problem

Diﬀerent investors will choose diﬀerent portfolios depending on how much risk they wish to take. However,

all investors like to achieve the highest possible expected return for a given amount of risk. We are assuming

that risk and standard deviation of portfolio return are synonymous.

Let the covariance matrix $C$ be defined as

$$[C]_{ij} = C_{ij} = \sigma'_i \sigma'_j \rho_{ij} \qquad (10.13)$$

and define the vectors $\bar\mu = [\mu'_1, \mu'_2, ..., \mu'_N]^t$, $\bar w = [w_1, w_2, ..., w_N]^t$. In theory, the covariance matrix should be symmetric positive semi-definite. However, measurement errors may result in $C$ having a negative eigenvalue, which should be fixed up somehow.

The expected return on the portfolio is then

$$\bar R_p = \bar w^t \bar\mu , \qquad (10.14)$$

and the variance is

$$Var(R_p) = \bar w^t C \bar w . \qquad (10.15)$$

We can think of the portfolio allocation problem as follows. Let $\alpha$ represent the degree to which investors want to maximize return at the expense of assuming more risk. If $\alpha \to 0$, then investors want to avoid as much risk as possible. On the other hand, if $\alpha \to \infty$, then investors seek only to maximize expected return, and don’t care about risk. The portfolio allocation problem is then (for given $\alpha$) to find $\bar w$ which satisfies

$$\min_{\bar w} \; \bar w^t C \bar w - \alpha\, \bar w^t \bar\mu \qquad (10.16)$$

subject to the constraints

$$\sum_i w_i = 1 \qquad (10.17)$$

$$L_i \le w_i \le U_i \; ; \quad i = 1, ..., N . \qquad (10.18)$$

Figure 10.1: A typical efficient frontier (x-axis: standard deviation, y-axis: expected return). This curve shows, for each value of portfolio standard deviation $sd(R_p)$, the maximum possible expected portfolio return $\bar R_p$. Data in equation (10.20).

Constraint (10.17) is simply equation (10.6), while constraints (10.18) may arise due to the nature of the portfolio. For example, most mutual funds can only hold long positions ($w_i \ge 0$), and they may also be prohibited from having a large position in any one asset (e.g. $w_i \le .20$). Long-short hedge funds will not have these types of restrictions. For fixed $\alpha$, equations (10.16-10.18) constitute a quadratic programming problem.

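Let

$$sd(R_p) = \text{standard deviation of } R_p = \sqrt{Var(R_p)} \qquad (10.19)$$

We can now trace out a curve on the $(sd(R_p), \bar R_p)$ plane. We pick various values of $\alpha$, and then solve the quadratic programming problem (10.16-10.18). Figure 10.1 shows a typical curve, which is also known as the efficient frontier. The data used for this example is

$$\bar\mu = \begin{bmatrix} .15 \\ .20 \\ .08 \end{bmatrix} ; \quad C = \begin{bmatrix} .20 & .05 & -.01 \\ .05 & .30 & .015 \\ -.01 & .015 & .1 \end{bmatrix}$$

$$L = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} ; \quad U = \begin{bmatrix} \infty \\ \infty \\ \infty \end{bmatrix} \qquad (10.20)$$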

We have restricted this portfolio to be long only. For a given value of the standard deviation of the portfolio return ($sd(R_p)$), any point below the curve is not efficient, since there is another portfolio

with the same risk (standard deviation) and higher expected return. Only points on the curve are eﬃcient

in this manner. In general, a linear combination of portfolios at two points along the eﬃcient frontier will be

feasible, i.e. satisfy the constraints. This feasible region will be convex along the eﬃcient frontier. Another

way of saying this is that a straight line joining any two points along the curve does not intersect the curve

except at the given two points. Why is this the case? If this was not true, then the eﬃcient frontier would

not really be eﬃcient. (see Portfolio Theory and Capital Markets, W. Sharpe, McGraw Hill, 1970, reprinted

in 2000).
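The procedure of picking values of $\alpha$ and solving (10.16-10.18) can be illustrated without a quadratic programming library, because the example has only three assets. The sketch below replaces the QP solver with a brute-force search over a grid of long-only weight vectors. That substitution is an assumption of this example and only gives the frontier to grid accuracy; the grid spacing and the list of $\alpha$ values are arbitrary. The data are those of equation (10.20).

```python
import math

MU = [0.15, 0.20, 0.08]
C = [[0.20, 0.05, -0.01],
     [0.05, 0.30, 0.015],
     [-0.01, 0.015, 0.10]]


def expected_return(w):
    """Expected portfolio return, w^t mu."""
    return sum(wi * mi for wi, mi in zip(w, MU))


def variance(w):
    """Portfolio variance, w^t C w."""
    return sum(w[i] * C[i][j] * w[j] for i in range(3) for j in range(3))


def long_only_grid(steps=20):
    """All grid weight vectors with w_i >= 0 and w_1 + w_2 + w_3 = 1."""
    return [(a / steps, b / steps, (steps - a - b) / steps)
            for a in range(steps + 1) for b in range(steps + 1 - a)]


def solve_for_alpha(alpha, grid):
    """Grid minimizer of the objective (10.16): w^t C w - alpha * w^t mu."""
    return min(grid, key=lambda w: variance(w) - alpha * expected_return(w))


grid = long_only_grid()
frontier = []
for alpha in [0.0, 0.25, 0.5, 1.0, 2.0, 5.0, 1000.0]:
    w = solve_for_alpha(alpha, grid)
    frontier.append((alpha, w, math.sqrt(variance(w)), expected_return(w)))
for alpha, w, sd, ret in frontier:
    print(alpha, w, round(sd, 4), round(ret, 4))
```

Plotting the (sd, return) pairs gives a coarse version of the frontier; for more assets or tighter accuracy a proper quadratic programming solver is the practical choice.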

Figure 10.2 shows results if we allow the portfolio to hold short positions of up to .25 in each asset. In other words, the data is the same as in (10.20) except that

$$L = \begin{bmatrix} -.25 \\ -.25 \\ -.25 \end{bmatrix} . \qquad (10.21)$$

Figure 10.2: Efficient frontier (x-axis: standard deviation, y-axis: expected return), comparing results for a long-only portfolio (10.20) and a long-short portfolio (same data except that the lower bound constraint is replaced by equation (10.21)).

In general, long-short portfolios are more eﬃcient than long-only portfolios. This is the advertised advantage

of long-short hedge funds.

Since the feasible region is convex, we can actually proceed in a different manner when constructing the efficient frontier. First of all, we can determine the maximum possible expected return ($\alpha = \infty$ in equation (10.16)),

$$\min_{\bar w} \; -\bar w^t \bar\mu$$

$$\sum_i w_i = 1$$

$$L_i \le w_i \le U_i \; ; \quad i = 1, ..., N \qquad (10.22)$$

which is simply a linear programming problem. If the solution weight vector to this problem is $(\bar w)_{\max}$, then the maximum possible expected return is $(\bar R_p)_{\max} = \bar w^t_{\max} \bar\mu$.

Then determine the portfolio with the smallest possible risk ($\alpha = 0$ in equation (10.16)),

$$\min_{\bar w} \; \bar w^t C \bar w$$

$$\sum_i w_i = 1$$

$$L_i \le w_i \le U_i \; ; \quad i = 1, ..., N . \qquad (10.23)$$

If the solution weight vector to this quadratic program is given by $\bar w_{\min}$, then the expected return of this minimum risk portfolio is $(\bar R_p)_{\min} = \bar w^t_{\min} \bar\mu$. We then divide up the range $[(\bar R_p)_{\min}, (\bar R_p)_{\max}]$ into a large number of discrete portfolio returns $(\bar R_p)_k \; ; \; k = 1, ..., N_{pts}$. Let $e = [1, 1, ..., 1]^t$, and

$$A = \begin{bmatrix} \bar\mu^t \\ e^t \end{bmatrix} ; \quad B_k = \begin{bmatrix} (\bar R_p)_k \\ 1 \end{bmatrix} \qquad (10.24)$$

then, for given $(\bar R_p)_k$ we solve the quadratic program

$$\min_{\bar w} \; \bar w^t C \bar w$$

$$A \bar w = B_k$$

$$L_i \le w_i \le U_i \; ; \quad i = 1, ..., N , \qquad (10.25)$$

with solution vector $(\bar w)_k$ and hence portfolio standard deviation $sd((R_p)_k) = \sqrt{(\bar w)^t_k\, C\, (\bar w)_k}$. This gives us a set of pairs $(sd((R_p)_k), (\bar R_p)_k)$, $k = 1, ..., N_{pts}$.

10.3 Adding a Risk-free asset

Up to now, we have assumed that each asset is risky, i.e. $\sigma'_i > 0, \forall i$. However, what happens if we add a risk free asset to our portfolio? This risk-free asset must earn the risk free rate $r' = r\Delta t$, and its standard deviation is zero. The data for this case is (the risk-free asset is added to the end of the weight vector, with $r' = .03$)

$$\bar\mu = \begin{bmatrix} .15 \\ .20 \\ .08 \\ .03 \end{bmatrix} ; \quad C = \begin{bmatrix} .20 & .05 & -.01 & 0.0 \\ .05 & .30 & .015 & 0.0 \\ -.01 & .015 & .1 & 0.0 \\ 0.0 & 0.0 & 0.0 & 0.0 \end{bmatrix}$$

$$L = \begin{bmatrix} 0 \\ 0 \\ 0 \\ -\infty \end{bmatrix} ; \quad U = \begin{bmatrix} \infty \\ \infty \\ \infty \\ \infty \end{bmatrix} \qquad (10.26)$$

where we have assumed that we can borrow any amount at the risk-free rate (a dubious assumption).

If we compute the efficient frontier with a portfolio of risky assets and include one risk-free asset, we get the result labeled capital market line in Figure 10.3. In other words, in this case the efficient frontier is a straight line. Note that this straight line always lies on or above the efficient frontier for the portfolio consisting of all risky assets (as in Figure 10.1). In fact, given the efficient frontier from Figure 10.1, we can construct the efficient frontier for a portfolio of the same risky assets plus a risk free asset in the following way. First of all, we start at the point $(0, r')$ in the $(sd(R_p), \bar R_p)$ plane, corresponding to a portfolio which consists entirely of the risk free asset. We then draw a straight line passing through $(0, r')$, which touches the all-risky-asset efficient frontier at a single point (the straight line is tangent to the all-risky-asset efficient frontier). Let the portfolio weights at this single point be denoted by $\bar w_M$. The portfolio corresponding to the weights $\bar w_M$ is termed the market portfolio. Let $(\bar R_p)_M = \bar w^t_M \bar\mu$ be the expected return on this market portfolio, with corresponding standard deviation $sd((R_p)_M)$. Let $w_r$ be the fraction invested in the risk free asset. Then, any point along the capital market line has

$$\bar R_p = w_r r' + (1 - w_r)(\bar R_p)_M$$

$$sd(R_p) = (1 - w_r)\, sd((R_p)_M) . \qquad (10.27)$$

If $w_r \ge 0$, then we are lending at the risk-free rate. If $w_r < 0$, we are borrowing at the risk-free rate.

Consequently, given a portfolio of risky assets, and a risk-free asset, then all investors should divide their

assets between the risk-free asset and the market portfolio. Any other choice for the portfolio is not eﬃcient.

Note that the actual fraction selected for investment in the market portfolio depends on the risk preferences

of the investor.
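The tangency construction can be sketched numerically. The line from $(0, r')$ that is tangent to the risky frontier is the line of steepest slope through that point, so the market portfolio is the risky portfolio that maximizes (expected return minus $r'$) divided by standard deviation. The sketch below finds it by brute force over a long-only weight grid, which is an assumption of this example rather than the method used for the figures, and then evaluates points on the capital market line from equation (10.27). It uses the three risky assets of the example with $r' = .03$.

```python
import math

MU = [0.15, 0.20, 0.08]
C = [[0.20, 0.05, -0.01],
     [0.05, 0.30, 0.015],
     [-0.01, 0.015, 0.10]]
R_FREE = 0.03


def expected_return(w):
    """Expected return of a risky portfolio, w^t mu."""
    return sum(wi * mi for wi, mi in zip(w, MU))


def sd(w):
    """Standard deviation of a risky portfolio, sqrt(w^t C w)."""
    return math.sqrt(sum(w[i] * C[i][j] * w[j] for i in range(3) for j in range(3)))


def slope(w):
    """Slope of the line from (0, R_FREE) to the portfolio's (sd, return) point."""
    return (expected_return(w) - R_FREE) / sd(w)


def market_portfolio(steps=50):
    """Grid approximation to the tangency (market) portfolio, long-only weights."""
    grid = [(a / steps, b / steps, (steps - a - b) / steps)
            for a in range(steps + 1) for b in range(steps + 1 - a)]
    return max(grid, key=slope)


def capital_market_line_point(w_r, ret_M, sd_M):
    """(sd, expected return) for a fraction w_r in the risk-free asset, eq. (10.27)."""
    return (1.0 - w_r) * sd_M, w_r * R_FREE + (1.0 - w_r) * ret_M


w_M = market_portfolio()
ret_M, sd_M = expected_return(w_M), sd(w_M)
print("market portfolio weights:", w_M)
for w_r in (1.0, 0.5, 0.0, -0.5):  # negative w_r means borrowing
    print(w_r, capital_market_line_point(w_r, ret_M, sd_M))
```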

The capital market line is so important that the equation of this line is written as $\bar R_p = r' + \lambda_M\, sd(R_p)$, where $\lambda_M$ is the market price of risk. In other words, all diversified investors, at any particular point in time, should have diversified portfolios which plot along the capital market line. All such portfolios should have the same Sharpe ratio

$$\lambda_M = \frac{\bar R_p - r'}{sd(R_p)} . \qquad (10.28)$$

Figure 10.3: The efficient frontier from Figure 10.1 (all risky assets), and the efficient frontier with the same assets as in Figure 10.1, except that we include a risk free asset (x-axis: standard deviation, y-axis: expected return). In this case, the efficient frontier becomes a straight line, shown as the capital market line. The plot also marks the risk free return, the market portfolio, and the lending and borrowing segments of the capital market line.

10.4 Criticism

Is mean-variance portfolio optimization the solution to all our problems? Not exactly. We have assumed that $\mu'$, $\sigma'$ are independent of time. This is not likely. Even if these parameters are reasonably constant, they are difficult to estimate. In particular, $\mu'$ is hard to determine if the time series of returns is not very long. Remember that for short time series, the noise term (Brownian motion) will dominate. If we have a long time series, we can get a better estimate for $\mu'$, but why do we think $\mu'$ for a particular firm will be constant for long periods? Probably, stock analysts should be estimating $\mu'$ from company balance sheets, sales data, etc. However, for the past few years, analysts have been too busy hyping stocks and going to lunch to do any real work. So, there will be lots of different estimates of $\mu'$, $C$, and hence many different optimal portfolios.

In fact, some recent studies have suggested that if investors simply use the 1/N rule, whereby initial wealth is allocated equally between N assets, this does a pretty good job, assuming that there is uncertainty in the estimates of $\mu'$, $C$.

We have also assumed that risk is measured by standard deviation of portfolio return. Actually, if I am

long an asset, I like it when the asset goes up, and I don’t like it when the asset goes down. In other words,

volatility which makes the price increase is good. This suggests that perhaps it may be more appropriate to

minimize downside risk only (assuming a long position).

Perhaps one of the most useful ideas that comes from mean-variance portfolio optimization is that diversified investors (at any point in time) expect that any optimal portfolio will produce a return

$$\bar R_p = r' + \lambda_M \sigma_p$$

$\bar R_p$ = expected portfolio return

$r'$ = risk-free return in period $\Delta t$

$\lambda_M$ = market price of risk

$\sigma_p$ = portfolio volatility , (10.29)

where different investors will choose portfolios with different $\sigma_p$ (volatility), depending on their risk preferences, but $\lambda_M$ is the same for all investors. Of course, we also have

$$\bar R_M = r' + \lambda_M \sigma_M . \qquad (10.30)$$

Note: there is a whole field called Behavioural Finance, whose adherents don’t think much of mean-variance portfolio optimization.

Another recent approach is to compute the optimal portfolio weights using many different perturbed input data sets. The input data (expected returns and covariances) are determined by resampling, i.e. assuming that the observed values have some observational errors. In this way, we obtain optimal portfolio weights which have some effect of data errors incorporated in the result. This gives us an average efficient frontier, which, it is claimed, is less sensitive to data errors.
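The resampling idea can be sketched as follows. This is only an illustration under loud assumptions: the "optimal" weights are taken to be proportional to $C^{-1}\mu$ and scaled to sum to one (a simple stand-in for whatever optimizer is actually used), only the expected returns are perturbed (the covariance matrix is held fixed), and all function names and input numbers are hypothetical.

```python
import numpy as np

def normalized_weights(mu, C):
    """Illustrative allocation rule: weights proportional to C^{-1} mu, summing to one."""
    raw = np.linalg.solve(C, mu)
    return raw / raw.sum()

def resampled_weights(mu, C, mu_error, n_samples=2000, seed=0):
    """Average the weights over many perturbed copies of the expected returns."""
    rng = np.random.default_rng(seed)
    total = np.zeros_like(mu)
    for _ in range(n_samples):
        # Pretend each observed expected return carries an observational error.
        mu_perturbed = mu + mu_error * rng.standard_normal(mu.shape)
        total += normalized_weights(mu_perturbed, C)
    return total / n_samples

# Hypothetical inputs: three assets with common correlation 0.3.
mu = np.array([0.06, 0.08, 0.10])
vols = np.array([0.15, 0.20, 0.25])
corr = np.full((3, 3), 0.3) + 0.7 * np.eye(3)
C = np.outer(vols, vols) * corr

w_point = normalized_weights(mu, C)
w_avg = resampled_weights(mu, C, mu_error=0.01)
print("weights from point estimates:", np.round(w_point, 3))
print("averaged resampled weights  :", np.round(w_avg, 3))
```

Each resampled weight vector sums to one, so the average does as well; the averaged weights reflect the whole cloud of plausible inputs rather than a single point estimate.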

10.5 Individual Securities

Equation (10.30) refers to an efficient portfolio. What is the relationship between risk and reward for individual securities? Consider the following portfolio: divide all wealth between the market portfolio, with weight $w_M$, and security $i$, with weight $w_i$. By definition

$$
w_M + w_i = 1 , \tag{10.31}
$$

and we define

$$
\begin{aligned}
\bar{R}_M &= \text{expected return on the market portfolio} \\
\bar{R}_i &= \text{expected return on asset } i \\
\sigma_M &= \text{s.d. of return on market portfolio} \\
\sigma_i &= \text{s.d. of return on asset } i \\
C_{i,M} &= \sigma_M \sigma_i \rho_{i,M} = \text{Covariance between } i \text{ and } M
\end{aligned}
\tag{10.32}
$$

Now, the expected return on this portfolio is

$$
\bar{R}_p = E[R_p] = w_i \bar{R}_i + w_M \bar{R}_M = w_i \bar{R}_i + (1 - w_i)\bar{R}_M \tag{10.33}
$$

and the variance is

$$
\begin{aligned}
Var(R_p) = (\sigma_p)^2 &= w_i^2 (\sigma_i)^2 + 2 w_i w_M C_{i,M} + w_M^2 (\sigma_M)^2 \\
&= w_i^2 (\sigma_i)^2 + 2 w_i (1 - w_i) C_{i,M} + (1 - w_i)^2 (\sigma_M)^2
\end{aligned}
\tag{10.34}
$$

For a set of values $\{w_i\}$, equations (10.33-10.34) will plot a curve in the expected return-standard deviation plane $(\bar{R}_p, \sigma_p)$ (e.g. Figure 10.3). Let’s determine the slope of this curve when $w_i \to 0$, i.e. when this curve intersects the capital market line at the market portfolio.

$$
\begin{aligned}
2(\sigma_p)\frac{\partial(\sigma_p)}{\partial w_i} &= 2 w_i (\sigma_i)^2 + 2(1 - 2w_i) C_{i,M} + 2(w_i - 1)(\sigma_M)^2 \\
\frac{\partial \bar{R}_p}{\partial w_i} &= \bar{R}_i - \bar{R}_M .
\end{aligned}
\tag{10.35}
$$

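Now,

$$
\frac{\partial \bar{R}_p}{\partial(\sigma_p)} = \frac{\partial \bar{R}_p / \partial w_i}{\partial(\sigma_p) / \partial w_i} = \frac{(\bar{R}_i - \bar{R}_M)(\sigma_p)}{w_i (\sigma_i)^2 + (1 - 2w_i) C_{i,M} + (w_i - 1)(\sigma_M)^2} . \tag{10.36}
$$

Now, let $w_i \to 0$ in equation (10.36), then we obtain

$$
\frac{\partial \bar{R}_p}{\partial(\sigma_p)} = \frac{(\bar{R}_i - \bar{R}_M)(\sigma_M)}{C_{i,M} - (\sigma_M)^2} \tag{10.37}
$$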
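As a sanity check on the algebra, the slope in equation (10.37) can be compared with a finite-difference slope of the curve defined by equations (10.33-10.34) near $w_i = 0$. This is a minimal sketch; the numerical inputs are hypothetical and chosen only for illustration.

```python
import math

def portfolio_point(w_i, R_i, R_M, sigma_i, sigma_M, C_iM):
    """Expected return and s.d. of the (asset i, market) portfolio, eqs (10.33)-(10.34)."""
    R_p = w_i * R_i + (1.0 - w_i) * R_M
    var_p = (w_i**2 * sigma_i**2
             + 2.0 * w_i * (1.0 - w_i) * C_iM
             + (1.0 - w_i)**2 * sigma_M**2)
    return R_p, math.sqrt(var_p)

# Hypothetical inputs.
R_i, R_M = 0.12, 0.08
sigma_i, sigma_M, rho = 0.35, 0.18, 0.8
C_iM = rho * sigma_i * sigma_M

# Central finite difference of R_p with respect to sigma_p around w_i = 0.
h = 1e-5
R_plus, s_plus = portfolio_point(+h, R_i, R_M, sigma_i, sigma_M, C_iM)
R_minus, s_minus = portfolio_point(-h, R_i, R_M, sigma_i, sigma_M, C_iM)
slope_fd = (R_plus - R_minus) / (s_plus - s_minus)

# Closed form, equation (10.37).
slope_formula = (R_i - R_M) * sigma_M / (C_iM - sigma_M**2)

print(f"finite difference slope: {slope_fd:.6f}")
print(f"equation (10.37) slope : {slope_formula:.6f}")
```

The check uses both positive and negative $w_i$, i.e. it assumes short positions in asset $i$ are allowed.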

But this curve should be tangent to the capital market line, equation (10.30), at the point where the capital market line touches the efficient frontier. If this curve is not tangent to the capital market line, then this implies that if we choose $w_i = \pm\epsilon$ (with $\epsilon$ small), then the curve would be above the capital market line, which should not be possible (the capital market line is the most efficient possible portfolio). This assumes that positions with $w_i < 0$ in asset $i$ are possible.

Assuming that the curve traced out by the $\bar{R}_p$ portfolio is tangent to the capital market line gives (from equations (10.30, 10.37))

$$
\frac{\bar{R}_M - r}{(\sigma_M)} = \frac{(\bar{R}_i - \bar{R}_M)(\sigma_M)}{C_{i,M} - (\sigma_M)^2} \tag{10.38}
$$

or

$$
\begin{aligned}
\bar{R}_i &= r + \beta_i (\bar{R}_M - r) \\
\beta_i &= \frac{C_{i,M}}{(\sigma_M)^2} .
\end{aligned}
\tag{10.39}
$$

The coefficient $\beta_i$ in equation (10.39) has a nice intuitive definition. Suppose we have a time series of returns

$$
\begin{aligned}
(R_i)_k &= \text{Return on asset } i \text{, in period } k \\
(R_M)_k &= \text{Return on market portfolio in period } k .
\end{aligned}
\tag{10.40}
$$

Typically, we assume that the market portfolio is a broad index, such as the TSE 300. Now, suppose we try to obtain a least squares fit to the above data, using the equation

$$
R_i \simeq \alpha_i + b_i R_M . \tag{10.41}
$$

Carrying out the usual least squares analysis (e.g. do a linear regression of $R_i$ vs. $R_M$), we find that

$$
b_i = \frac{C_{i,M}}{(\sigma_M)^2} \tag{10.42}
$$

so that we can write

$$
R_i \simeq \alpha_i + \beta_i R_M . \tag{10.43}
$$
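The equality between the least squares slope and the ratio in equation (10.42) is easy to confirm numerically. The sketch below uses synthetic daily returns (the generating parameters are hypothetical, not market data) and compares the slope from `numpy.polyfit` with the sample covariance divided by the sample variance of the market return.

```python
import numpy as np

rng = np.random.default_rng(42)
n_days = 500
true_beta = 1.3  # hypothetical value used to generate the synthetic data

# Synthetic daily returns: a market factor plus noise specific to the stock.
market = 0.01 * rng.standard_normal(n_days)
noise = 0.015 * rng.standard_normal(n_days)
stock = 0.0002 + true_beta * market + noise

# Beta as covariance over variance, equation (10.42).
beta_cov = np.cov(stock, market)[0, 1] / np.var(market, ddof=1)

# Beta as the slope of the least squares line of stock return vs. market return.
slope, intercept = np.polyfit(market, stock, 1)

print(f"beta from covariance/variance: {beta_cov:.6f}")
print(f"beta from regression slope   : {slope:.6f}")
```

The two numbers agree to rounding error; both differ from the generating value only through sampling noise.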

This means that $\beta_i$ is the slope of the best fit straight line to a $((R_i)_k, (R_M)_k)$ scatter plot. An example is shown in Figure 10.4. Now, from equation (10.39) we have that

$$
\bar{R}_i = r + \beta_i (\bar{R}_M - r) \tag{10.44}
$$

which is consistent with equation (10.43) if

$$
\begin{aligned}
R_i &= \alpha_i + \beta_i R_M + \epsilon_i \\
E[\epsilon_i] &= 0 \\
\alpha_i &= r(1 - \beta_i) \\
E[\epsilon_i , R_M] &= 0 ,
\end{aligned}
\tag{10.45}
$$

since

$$
E[R_i] = \bar{R}_i = \alpha_i + \beta_i \bar{R}_M . \tag{10.46}
$$

Figure 10.4: Return on Rogers Wireless Communications versus return on TSE 300. Each point represents a pair of daily returns. The vertical axis measures the daily return on the stock and the horizontal axis that of the TSE 300. [Scatter plot titled “Rogers Wireless Communications”; axes: Return on stock versus Return on market.]

Equation (10.46) has the interpretation that the return on asset $i$ can be decomposed into a drift component, a part which is correlated to the market portfolio (the broad index), and a random part uncorrelated with the index. Make the following assumptions

$$
E[\epsilon_i \epsilon_j] =
\begin{cases}
0 & ; \; i \neq j \\
e_i^2 & ; \; i = j
\end{cases}
\tag{10.47}
$$

i.e. that returns on each asset are correlated only through their correlation with the index. Consider once again a portfolio where the wealth is divided amongst $N$ assets, each asset receiving a fraction $w_i$ of the initial wealth. In this case, the return on the portfolio is

$$
\begin{aligned}
\bar{R}_p &= \sum_{i=1}^{i=N} w_i \bar{R}_i \\
\bar{R}_p &= \sum_{i=1}^{i=N} w_i \alpha_i + \bar{R}_M \sum_{i=1}^{i=N} w_i \beta_i
\end{aligned}
\tag{10.48}
$$

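and

$$
\begin{aligned}
\text{s.d.}(R_p) &= \sqrt{ (\sigma_M)^2 \sum_{i=1}^{i=N} \sum_{j=1}^{j=N} w_i w_j \beta_i \beta_j + \sum_{i=1}^{i=N} w_i^2 e_i^2 } \\
&= \sqrt{ (\sigma_M)^2 \left( \sum_{i=1}^{i=N} w_i \beta_i \right)^2 + \sum_{i=1}^{i=N} w_i^2 e_i^2 } .
\end{aligned}
\tag{10.49}
$$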

Now, if $w_i = O(1/N)$, then

$$
\sum_{i=1}^{i=N} w_i^2 e_i^2 \tag{10.50}
$$

is $O(1/N)$ as $N$ becomes large, hence equation (10.49) becomes

$$
\text{s.d.}(R_p) \simeq \sigma_M \left| \sum_{i=1}^{i=N} w_i \beta_i \right| . \tag{10.51}
$$
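The diversification effect in equations (10.49)-(10.51) can be seen numerically. The sketch below uses equal weights $w_i = 1/N$ and randomly drawn $\beta_i$ and $e_i$ (all values are hypothetical) and compares the exact standard deviation with the approximation that ignores the asset-specific term.

```python
import numpy as np

def portfolio_sd_exact(w, beta, e, sigma_M):
    """Equation (10.49): market term plus asset-specific (idiosyncratic) term."""
    systematic = (sigma_M * np.dot(w, beta)) ** 2
    idiosyncratic = np.sum(w**2 * e**2)
    return np.sqrt(systematic + idiosyncratic)

def portfolio_sd_approx(w, beta, sigma_M):
    """Equation (10.51): only the market term survives for large N."""
    return sigma_M * abs(np.dot(w, beta))

sigma_M = 0.18  # hypothetical market volatility
rng = np.random.default_rng(1)
for N in (5, 50, 500, 5000):
    beta = rng.uniform(0.5, 1.5, N)   # hypothetical betas
    e = rng.uniform(0.1, 0.4, N)      # hypothetical asset-specific s.d.
    w = np.full(N, 1.0 / N)           # equal weights, so w_i = O(1/N)
    exact = portfolio_sd_exact(w, beta, e, sigma_M)
    approx = portfolio_sd_approx(w, beta, sigma_M)
    print(f"N = {N:5d}: exact = {exact:.5f}, approx = {approx:.5f}")
```

As $N$ grows, the gap between the two columns shrinks like $1/N$: asset-specific risk is diversified away, while the market risk is not.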


Note that if we write

$$
\bar{R}_i = r + \lambda_i \sigma_i \tag{10.52}
$$

then we also have that

$$
\bar{R}_i = r + \beta_i (\bar{R}_M - r) \tag{10.53}
$$

so that the market price of risk of security $i$ is

$$
\lambda_i = \frac{\beta_i (\bar{R}_M - r)}{\sigma_i} \tag{10.54}
$$

which is useful in real options analysis.
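A short worked step, not in the original notes but following directly from the definition of $\beta_i$ in equation (10.39), the covariance in equation (10.32) and the capital market line (10.30), shows that the market price of risk of security $i$ is just the market price of risk scaled by the correlation with the market:

$$
\lambda_i = \frac{\beta_i (\bar{R}_M - r)}{\sigma_i}
= \frac{\sigma_M \sigma_i \rho_{i,M}}{(\sigma_M)^2} \cdot \frac{\lambda_M \sigma_M}{\sigma_i}
= \rho_{i,M} \, \lambda_M .
$$

So, under these assumptions, a security perfectly correlated with the market earns the full market price of risk, and a security uncorrelated with the market earns none.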

11 Stocks for the Long Run?

Conventional wisdom states that investment in a diversified portfolio of equities has a low risk for a long term investor. However, in a recent article (“Irrational Optimism,” Fin. Anal. J., E. Dimson, P. Marsh, M. Staunton, vol 60 (January, 2004) 25-35) an extensive analysis of historical data of equity returns was carried out. Projecting this information forward, the authors conclude that the probability of a negative real return over a twenty year period, for an investor holding a diversified portfolio, is about 14 per cent. In fact, most individuals in defined contribution pension plans have poorly diversified portfolios. Making more realistic assumptions for defined contribution pension plans, the authors find that the probability of a negative real return over twenty years is about 25 per cent.

Let’s see if we can explain why there is this common misconception about the riskiness of long term equity investing. Table 11.1 shows a typical table in a Mutual Fund advertisement. From this table, we are supposed to conclude that

• Long term equity investment is not very risky, with an annualized compound return about 3% higher than the current yield on government bonds.

• If $S$ is the value of the mutual fund, and $B$ is the value of the government bond, then

$$
\begin{aligned}
B(T) &= B(0) e^{rT} , \quad r = .03 \\
S(T) &\simeq S(0) e^{\alpha T} , \quad \alpha = .06 ,
\end{aligned}
\tag{11.1}
$$

for $T$ large, which gives

$$
\frac{S(T=30)/S(0)}{B(T=30)/B(0)} = e^{1.8 - .9} = e^{.9} \simeq 2.46 , \tag{11.2}
$$

indicating that you more than double your return by investing in equities compared to bonds (over the long term).

Table 11.1: Historical annualized compound return, XYZ Mutual Equity Funds. Also shown is the current yield on a long term government bond.

    1 year   2 years   5 years   10 years   20 years   30 years   30 year bond yield
    -2%      -5%       10%       8%         7%         6%         3%

A convenient way to measure the relative returns on these two investments (bonds and stocks) is to compare the total compound return

$$
\begin{aligned}
\text{Compound return: stocks} &= \log\left[\frac{S(T)}{S(0)}\right] = \alpha T \\
\text{Compound return: bonds} &= \log\left[\frac{B(T)}{B(0)}\right] = rT ,
\end{aligned}
\tag{11.3}
$$

or the annualized compound returns

$$
\begin{aligned}
\text{Annualized compound return: stocks} &= \frac{1}{T}\log\left[\frac{S(T)}{S(0)}\right] = \alpha \\
\text{Annualized compound return: bonds} &= \frac{1}{T}\log\left[\frac{B(T)}{B(0)}\right] = r .
\end{aligned}
\tag{11.4}
$$

If we assume that the value of the equity portfolio $S$ follows a Geometric Brownian Motion

$$
dS = \mu S \, dt + \sigma S \, dZ \tag{11.5}
$$

then from equation (2.56) we have that

$$
\log\left[\frac{S(T)}{S(0)}\right] \sim N\left( \left(\mu - \frac{\sigma^2}{2}\right) T , \; \sigma^2 T \right) , \tag{11.6}
$$

i.e. the compound return is normally distributed with mean $(\mu - \frac{\sigma^2}{2})T$ and variance $\sigma^2 T$, so that the variance of the total compound return increases as $T$ becomes large.

Since $var(aX) = a^2 var(X)$, it follows that

$$
\frac{1}{T}\log\left[\frac{S(T)}{S(0)}\right] \sim N\left( \left(\mu - \frac{\sigma^2}{2}\right) , \; \sigma^2 / T \right) , \tag{11.7}
$$

so that the variance of the annualized return tends to zero as $T$ becomes large.

Of course, what we really care about is the total compound return (that’s how much we actually have at $t = T$, relative to what we invested at $t = 0$) at the end of the investment horizon. This is why Table 11.1 is misleading. There is significant risk in equities, even over the long term (30 years would be long-term for most investors).

Figure 11.1 shows the results of 100,000 simulations of asset prices assuming that the asset follows equation (11.5), with $\mu = .08$, $\sigma = .2$. The investment horizon is 5 years. The results are given in terms of histograms of the annualized compound return (equation (11.4)) and the total compound return (equation (11.3)).

Figure 11.2 shows similar results for an investment horizon of 30 years. Note how the variance of the

annualized return has decreased, while the variance of the total return has increased (verifying equations

(11.6-11.7)).
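An experiment of this kind can be sketched in a few lines. The code below is not the author’s simulation; it is a minimal sketch that samples $\log[S(T)/S(0)]$ directly from the normal distribution in equation (11.6), using the parameters quoted above, and compares the sample variances with $\sigma^2 T$ and $\sigma^2/T$. The function name and random seed are arbitrary.

```python
import numpy as np

mu, sigma = 0.08, 0.20
n_sims = 100_000

def simulate_log_returns(T, n_sims, rng):
    """Sample log[S(T)/S(0)] = (mu - sigma^2/2) T + sigma sqrt(T) phi, phi ~ N(0,1)."""
    phi = rng.standard_normal(n_sims)
    return (mu - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * phi

rng = np.random.default_rng(0)
for T in (5, 30):
    total = simulate_log_returns(T, n_sims, rng)   # total compound return, eq (11.3)
    annualized = total / T                         # annualized compound return, eq (11.4)
    print(f"T = {T:2d} years")
    print(f"  total return     : variance {total.var():.4f}   (theory {sigma**2 * T:.4f})")
    print(f"  annualized return: variance {annualized.var():.6f} (theory {sigma**2 / T:.6f})")
```

Histograms of `total` and `annualized` (for example with `numpy.histogram`) reproduce the qualitative picture described for Figures 11.1 and 11.2.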

Assuming long term bonds yield 3%, this gives a total compound return over 30 years of .90, for bonds.

Looking at the right hand panel of Figure 11.2 shows that there are many possible scenarios where the return

on equities will be less than risk free bonds after 30 years. The number of scenarios with return less than

risk free bonds is given by the area to the left of .9 in the histogram.
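Under the lognormal model of equation (11.6), this area can also be computed in closed form as a normal probability. The sketch below does so with the parameters used in the text ($\mu = .08$, $\sigma = .2$, bond yield $r = .03$); the function names are illustrative.

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def prob_stocks_below_bonds(mu: float, sigma: float, r: float, T: float) -> float:
    """P( log[S(T)/S(0)] < r T ) when log[S(T)/S(0)] ~ N((mu - sigma^2/2) T, sigma^2 T)."""
    mean = (mu - 0.5 * sigma**2) * T
    sd = sigma * math.sqrt(T)
    return normal_cdf((r * T - mean) / sd)

for T in (5, 10, 30):
    p = prob_stocks_below_bonds(mu=0.08, sigma=0.20, r=0.03, T=T)
    print(f"T = {T:2d} years: P(equities return less than bonds) = {p:.3f}")
```

With these parameters, roughly one scenario in five ends up behind the risk free bond after 30 years.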

Figure 11.1: Histogram of distribution of returns, T = 5 years. µ = .08, σ = .2, 100,000 simulations. Left: annualized return 1/T log[S(T)/S(0)]. Right: return log[S(T)/S(0)]. [Panel titles: “Annualized Returns − 5 years” and “Log Returns − 5 years”; horizontal axes: Ann. Returns ($) and Log Returns ($).]

Figure 11.2: Histogram of distribution of returns, T = 30 years. µ = .08, σ = .2, 100,000 simulations. Left: annualized return 1/T log[S(T)/S(0)]. Right: return log[S(T)/S(0)]. [Panel titles: “Annualized Returns − 30 years” and “Log Returns − 30 years”; horizontal axes: Ann. Returns ($) and Log Returns ($).]

12 Further Reading

12.1 General Interest

• Peter Bernstein, Capital Ideas: the improbable origins of modern Wall Street, The Free Press, New York, 1992.

• Peter Bernstein, Against the Gods: the remarkable story of risk, John Wiley, New York, 1998, ISBN 0-471-29563-9.

• Burton Malkiel, A random walk down Wall Street, W.W. Norton, New York, 1999, ISBN 0-393-32040-5.

• N. Taleb, Fooled by Randomness, Texere, 2001, ISBN 1-58799-071-7.

12.2 More Background

• A. Dixit and R. Pindyck, Investment under uncertainty, Princeton University Press, 1994.

• John Hull, Options, futures and other derivatives, Prentice-Hall, 1997, ISBN 0-13-186479-3.

• S. Ross, R. Westerﬁeld, J. Jaﬀe, Corporate Finance, McGraw-Hill Irwin, 2002, ISBN 0-07-283137-5.

• W. Sharpe, Portfolio Theory and Capital Markets, Wiley, 1970, reprinted in 2000, ISBN 0-07-135320-8. (Still a classic).

• Lenos Trigeorgis, Real Options: Managerial Flexibility and Strategy for Resource Allocation, MIT Press, 1996, ISBN 0-262-20102-X.

12.3 More Technical

• P. Brandimarte, Numerical Methods in Finance: A Matlab Introduction, Wiley, 2002, ISBN 0-471-39686-9.

• Boyle, Broadie, Glasserman, Monte Carlo methods for security pricing, J. Econ. Dyn. Con., 21:1267-1321 (1997).

• P. Glasserman, Monte Carlo Methods in Financial Engineering, Springer (2004) ISBN 0-387-00451-3.

• D. Higham, An Introduction to Financial Option Valuation, Cambridge (2004) ISBN 0-521-83884-3.

• P. Jackel, Monte Carlo Methods in Finance, Wiley, 2002, ISBN 0-471-49741-X.

• Y.K. Kwok, Mathematical Models of Finance, Springer Singapore, 1998, ISBN 981-3083-565.

• S. Neftci, An Introduction to the Mathematics of Financial Derivatives, Academic Press (2000) ISBN 0-12-515392-9.

• R. Seydel, Tools for Computational Finance, Springer, 2002, ISBN 3-540-43609-X.

• D. Tavella and C. Randall, Pricing Financial Instruments: the Finite Difference Method, Wiley, 2000, ISBN 0-471-19760-2.

• D. Tavella, Quantitative Methods in Derivatives Pricing: An Introduction to Computational Finance, Wiley (2002) ISBN 0-471-39447-5.

• N. Taleb, Dynamic Hedging, Wiley, 1997, ISBN 0-471-15280-3.

• P. Wilmott, S. Howison, J. Dewynne, The mathematics of financial derivatives: A student introduction, Cambridge, 1997, ISBN 0-521-49789-2.

• Paul Wilmott, Paul Wilmott on Quantitative Finance, Wiley, 2000, ISBN 0-471-87438-8.


“Men wanted for hazardous journey, small wages, bitter cold, long months of complete darkness, constant dangers, safe return doubtful. Honour and recognition in case of success.”

Advertisement placed by Ernest Shackleton in 1914. He received 5000 replies. An example of extreme risk-seeking behaviour. Hedging with options is used to mitigate risk, and would not appeal to members of Shackleton’s expedition.

1 The First Option Trade

Many people think that options and futures are recent inventions. However, options have a long history, going back to ancient Greece. As recorded by Aristotle in Politics, the fifth century BC philosopher Thales of Miletus took part in a sophisticated trading strategy. The main point of this trade was to confirm that philosophers could become rich if they so chose. This is perhaps the first rejoinder to the famous question “If you are so smart, why aren’t you rich?” which has dogged academics throughout the ages.

Thales observed that the weather was very favourable to a good olive crop, which would result in a bumper harvest of olives. If there had been an established Athens Board of Olives Exchange, Thales could have simply sold olive futures short (a surplus of olives would cause the price of olives to go down). Since the exchange did not exist, Thales put a deposit on all the olive presses surrounding Miletus. When the olive crop was harvested, demand for olive presses reached enormous proportions (olives were not a storable commodity). Thales then sublet the presses for a profit.

Note that by placing a deposit on the presses, Thales was actually manufacturing an option on the olive crop, i.e. the most he could lose was his deposit. If he had sold short olive futures, he would have been liable to an unlimited loss, in the event that the olive crop turned out bad, and the price of olives went up. In other words, he had an option on a future of a non-storable commodity.

2 The Black-Scholes Equation

This is the basic PDE used in option pricing. We will derive this PDE for a simple case below. Things get much more complicated for real contracts.

2.1 Background

Over the past few years derivative securities (options, futures, and forward contracts) have become essential tools for corporations and investors alike. Derivatives facilitate the transfer of financial risks. As such, they may be used to hedge risk exposures or to assume risks in the anticipation of profits. To take a simple yet instructive example, a gold mining firm is exposed to fluctuations in the price of gold. The firm could use a forward contract to fix the price of its future sales. This would protect the firm against a fall in the price of gold, but it would also sacrifice the upside potential from a gold price increase. This could be preserved by using options instead of a forward contract.

Individual investors can also use derivatives as part of their investment strategies. This can be done through direct trading on financial exchanges. In addition, it is quite common for financial products to include some form of embedded derivative. Any insurance contract can be viewed as a put option. Consequently, any investment which provides some kind of protection actually includes an option feature. Standard examples include deposit insurance guarantees on savings accounts as well as the provision of being able to redeem a savings bond at par at any time. These types of embedded options are becoming increasingly common and increasingly complex.

A prominent current example is the investment guarantees being offered by insurance companies (“segregated funds”) and mutual funds. In such contracts, the initial investment is guaranteed, and gains can be locked-in (reset) a fixed number of times per year at the option of the contract holder. This is actually a very complex put option, known as a shout option. How much should an investor be willing to pay for this insurance? Determining the fair market value of these sorts of contracts is a problem in option pricing.

2.2 Definitions

Let’s consider some simple European put/call options. At some time $T$ in the future (the expiry or exercise date) the holder has the right, but not the obligation, to

• Buy an asset at a prescribed price $K$ (the exercise or strike price). This is a call option.

• Sell the asset at a prescribed price $K$ (the exercise or strike price). This is a put option.

At expiry time $T$, we know with certainty what the value of the option is, in terms of the price of the underlying asset $S$,

$$
\begin{aligned}
\text{Payoff} &= \max(S - K, 0) \quad \text{for a call} \\
\text{Payoff} &= \max(K - S, 0) \quad \text{for a put}
\end{aligned}
\tag{2.1}
$$

Note that the payoff from an option is always non-negative, since the holder has a right but not an obligation. This contrasts with a forward contract, where the holder must buy or sell at a prescribed price.

2.3 A Simple Example: The Two State Tree

This example is taken from Options, futures, and other derivatives, by John Hull. Suppose the value of a stock is currently $20. It is known that at the end of three months, the stock price will be either $22 or $18. We assume that the stock pays no dividends, and we would like to value a European call option to buy the stock in three months for $21. This option can have only two possible values in three months: if the stock price is $22, the option is worth $1; if the stock price is $18, the option is worth zero. This is illustrated in Figure 2.1.

Figure 2.1: A simple case where the stock value can either be $22 or $18, with a European call option, K = $21. [Tree: Stock Price = $20 today; up state: Stock Price = $22, Option Price = $1; down state: Stock Price = $18, Option Price = $0.]

In order to price this option, we can set up an imaginary portfolio consisting of the option and the stock, in such a way that there is no uncertainty about the value of the portfolio at the end of three months. Since the portfolio has no risk, the return earned by this portfolio must be the risk-free rate.

Consider a portfolio consisting of a long (positive) position of $\delta$ shares of stock, and short (negative) one call option. We will compute $\delta$ so that the portfolio is riskless. If the stock moves up to $22 or goes down to $18, then the value of the portfolio is

$$
\begin{aligned}
\text{Value if stock goes up} &= \$22\delta - 1 \\
\text{Value if stock goes down} &= \$18\delta - 0
\end{aligned}
\tag{2.2}
$$

So, if we choose $\delta = .25$, then the value of the portfolio is

$$
\begin{aligned}
\text{Value if stock goes up} &= \$22\delta - 1 = \$4.50 \\
\text{Value if stock goes down} &= \$18\delta - 0 = \$4.50
\end{aligned}
\tag{2.3}
$$

So, regardless of whether the stock moves up or down, the value of the portfolio is $4.50. A risk-free portfolio must earn the risk free rate. Suppose the current risk-free rate is 12%, then the value of the portfolio today must be the present value of $4.50, or

$$
4.50 \times e^{-.12 \times .25} = 4.367
$$

The value of the stock today is $20. Let the value of the option be $V$. The value of the portfolio is

$$
20 \times .25 - V = 4.367 \quad \rightarrow \quad V = .633
$$

2.4 A hedging strategy

So, if we sell the above option (we hold a short position in the option), then we can hedge this position in the following way. Today, we sell the option for $.633, borrow $4.367 from the bank at the risk free rate (this means that we have to pay the bank back $4.50 in three months), which gives us $5.00 in cash. Then, we buy .25 shares at $20.00 (the current price of the stock). In three months time, one of two things happens

• The stock goes up to $22, our stock holding is now worth $5.50, we pay the option holder $1.00, which leaves us with $4.50, just enough to pay off the bank loan.

• The stock goes down to $18. The call option is worthless. The value of the stock holding is now $4.50, which is just enough to pay off the bank loan.

Consequently, in this simple situation, we see that the theoretical price of the option is the cost for the seller to set up a portfolio which will precisely pay off the option holder and any bank loans required to set up the hedge, at the expiry of the option. In other words, this is the price which a hedger requires to ensure that there is always just enough money at the end to net out at zero gain or loss. If the market price of the option was higher than this value, the seller could sell at the higher price and lock in an instantaneous risk-free gain. Alternatively, if the market price of the option was lower than the theoretical, or fair market value, it would be possible to lock in a risk-free gain by selling the portfolio short. Any such arbitrage opportunities are rapidly exploited in the market, so that for most investors, we can assume that such opportunities are not possible (the no arbitrage condition), and therefore that the market price of the option should be the theoretical price.

Note that this hedge works regardless of whether or not the stock goes up or down. Once we set up this hedge, we don’t have a care in the world. The value of the option is also independent of the probability that the stock goes up to $22 or down to $18. This is somewhat counterintuitive.

2.5 Brownian Motion

Before we consider a model for stock price movements, let’s consider the idea of Brownian motion with drift. Suppose $X$ is a random variable, and in time $t \to t + dt$, $X \to X + dX$, where

$$
dX = \alpha \, dt + \sigma \, dZ \tag{2.4}
$$

where $\alpha \, dt$ is the drift term, $\sigma$ is the volatility, and $dZ$ is a random term. The $dZ$ term has the form

$$
dZ = \phi \sqrt{dt} \tag{2.5}
$$

where $\phi$ is a random variable drawn from a normal distribution with mean zero and variance one ($\phi \sim N(0, 1)$, i.e. $\phi$ is normally distributed).
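Looking back at the two state example of Sections 2.3 and 2.4, the hedge arithmetic can be reproduced in a few lines. This is a minimal sketch using only the numbers given in the text; the variable names are illustrative.

```python
import math

S0 = 20.0                      # stock price today
S_up, S_down = 22.0, 18.0      # possible stock prices in three months
K = 21.0                       # strike of the European call
r, T = 0.12, 0.25              # risk-free rate and time to expiry (years)

# Option payoffs in the two states, equation (2.1).
payoff_up = max(S_up - K, 0.0)
payoff_down = max(S_down - K, 0.0)

# Choose delta so that (delta shares - one call) is worth the same in both states.
delta = (payoff_up - payoff_down) / (S_up - S_down)

# Riskless portfolio value at expiry and its present value.
portfolio_T = S_up * delta - payoff_up
portfolio_0 = portfolio_T * math.exp(-r * T)

# Today's portfolio is delta shares minus the option: S0 * delta - V = portfolio_0.
V = S0 * delta - portfolio_0

print(f"delta = {delta:.2f}")
print(f"portfolio value at expiry = {portfolio_T:.2f}")
print(f"present value of portfolio = {portfolio_0:.3f}")
print(f"option value V = {V:.3f}")
```

Note that no probability of an up or down move appears anywhere in the calculation.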

If $E$ is the expectation operator, then

$$
E(\phi) = 0 , \qquad E(\phi^2) = 1 . \tag{2.6}
$$

Now in a time interval $dt$, we have

$$
E(dX) = E(\alpha \, dt) + E(\sigma \, dZ) = \alpha \, dt , \tag{2.7}
$$

and the variance of $dX$, denoted by $Var(dX)$, is

$$
Var(dX) = E([dX - E(dX)]^2) = E([\sigma \, dZ]^2) = \sigma^2 dt . \tag{2.8}
$$

Let’s look at a discrete model to understand this process more completely. Suppose that we have a discrete lattice of points. Let $X = X_0$ at $t = 0$. Suppose that at $t = \Delta t$,

$$
\begin{aligned}
X_0 &\to X_0 + \Delta h \; ; \quad \text{with probability } p \\
X_0 &\to X_0 - \Delta h \; ; \quad \text{with probability } q
\end{aligned}
\tag{2.9}
$$

where $p + q = 1$. Assume that

• $X$ follows a Markov process, i.e. the probability distribution in the future depends only on where it is now.

• The probability of an up or down move is independent of what happened in the past.

• $X$ can move only up or down $\Delta h$.

At any lattice point $X_0 + i\Delta h$, the probability of an up move is $p$, and the probability of a down move is $q$. The probabilities of reaching any particular lattice point for the first three moves are shown in Figure 2.2. Each move takes place in the time interval $t \to t + \Delta t$.

Figure 2.2: Probabilities of reaching the discrete lattice points for the first three moves. [Lattice from $X_0$ with nodes $X_0 \pm \Delta h$, $X_0 \pm 2\Delta h$, $X_0 \pm 3\Delta h$; probabilities after one move: $p$, $q$; after two moves: $p^2$, $2pq$, $q^2$; after three moves: $p^3$, $3p^2q$, $3pq^2$, $q^3$.]

Let $\Delta X$ be the change in $X$ over the interval $t \to t + \Delta t$. Then

$$
\begin{aligned}
E(\Delta X) &= (p - q)\Delta h \\
E([\Delta X]^2) &= p(\Delta h)^2 + q(-\Delta h)^2 = (\Delta h)^2 ,
\end{aligned}
\tag{2.10}
$$

so that the variance of $\Delta X$ is (over $t \to t + \Delta t$)

$$
Var(\Delta X) = E([\Delta X]^2) - [E(\Delta X)]^2 = (\Delta h)^2 - (p - q)^2 (\Delta h)^2 = 4pq(\Delta h)^2 . \tag{2.11}
$$

Now, suppose we consider the distribution of $X$ after $n$ moves, so that $t = n\Delta t$. The probability of $j$ up moves, and $(n - j)$ down moves ($P(n, j)$) is

$$
P(n, j) = \frac{n!}{j!(n - j)!} p^j q^{n-j} \tag{2.12}
$$

which is just a binomial distribution. Now, if $X_n$ is the value of $X$ after $n$ steps on the lattice, then

$$
\begin{aligned}
E(X_n - X_0) &= nE(\Delta X) \\
Var(X_n - X_0) &= nVar(\Delta X) ,
\end{aligned}
\tag{2.13}
$$

which follows from the properties of a binomial distribution (each up or down move is independent of previous moves). Now, from equations (2.10, 2.11, 2.13) we obtain

$$
\begin{aligned}
E(X_n - X_0) &= n(p - q)\Delta h = \frac{t}{\Delta t}(p - q)\Delta h \\
Var(X_n - X_0) &= n4pq(\Delta h)^2 = \frac{t}{\Delta t}4pq(\Delta h)^2
\end{aligned}
\tag{2.14}
$$

Now, we would like to take the limit as $\Delta t \to 0$ in such a way that the mean and variance of $X$, after a finite time $t$, are independent of $\Delta t$, and we would like to recover

$$
\begin{aligned}
dX &= \alpha \, dt + \sigma \, dZ \\
E(dX) &= \alpha \, dt \\
Var(dX) &= \sigma^2 dt
\end{aligned}
\tag{2.15}
$$

as $\Delta t \to 0$. Now, since $0 \le p, q \le 1$, we need to choose $\Delta h = Const \sqrt{\Delta t}$. Otherwise, from equation (2.14) we get that $Var(X_n - X_0)$ is either 0 or infinite after a finite time. (Stock variances do not have either of these properties, so this is obviously not a very interesting case). Let’s choose $\Delta h = \sigma\sqrt{\Delta t}$, which gives (from equation (2.14))

$$
\begin{aligned}
E(X_n - X_0) &= (p - q)\frac{\sigma t}{\sqrt{\Delta t}} \\
Var(X_n - X_0) &= t4pq\sigma^2
\end{aligned}
\tag{2.16}
$$

Now, for $E(X_n - X_0)$ to be independent of $\Delta t$ as $\Delta t \to 0$, we must have

$$
(p - q) = Const. \sqrt{\Delta t} \tag{2.17}
$$
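The scaling argument can be checked by evaluating the lattice moments in equation (2.14) for a sequence of shrinking timesteps. In this sketch, $\Delta h = \sigma\sqrt{\Delta t}$ as in the text, and the constant in equation (2.17) is an arbitrary illustrative value `c` (an assumption made only for this example).

```python
import math

def lattice_moments(t, dt, sigma, c):
    """Mean and variance of X_n - X_0 from equation (2.14).

    Uses dh = sigma * sqrt(dt) and p - q = c * sqrt(dt); requires |c| * sqrt(dt) <= 1
    so that p and q are valid probabilities.
    """
    dh = sigma * math.sqrt(dt)
    p_minus_q = c * math.sqrt(dt)
    p = 0.5 * (1.0 + p_minus_q)
    q = 0.5 * (1.0 - p_minus_q)
    n = t / dt
    mean = n * (p - q) * dh
    var = n * 4.0 * p * q * dh**2
    return mean, var

t, sigma, c = 1.0, 0.2, 0.5   # hypothetical values
for dt in (1e-1, 1e-2, 1e-3, 1e-4):
    mean, var = lattice_moments(t, dt, sigma, c)
    print(f"dt = {dt:7.0e}: mean = {mean:.6f}, variance = {var:.6f}")
```

The mean stays at $c \sigma t$ for every timestep, and the variance tends to $\sigma^2 t$ as $\Delta t \to 0$, so both limits are finite and independent of $\Delta t$.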

If we choose

$$
p - q = \frac{\alpha}{\sigma}\sqrt{\Delta t} \tag{2.18}
$$

we get

$$
\begin{aligned}
p &= \frac{1}{2}\left[1 + \frac{\alpha}{\sigma}\sqrt{\Delta t}\right] \\
q &= \frac{1}{2}\left[1 - \frac{\alpha}{\sigma}\sqrt{\Delta t}\right]
\end{aligned}
\tag{2.19}
$$

Now, putting together equations (2.16-2.19) gives

$$
\begin{aligned}
E(X_n - X_0) &= \alpha t \\
Var(X_n - X_0) &= t\sigma^2\left(1 - \frac{\alpha^2}{\sigma^2}\Delta t\right) \\
&= t\sigma^2 \; ; \quad \Delta t \to 0 .
\end{aligned}
\tag{2.20}
$$

Now, let’s imagine that $X(t_n) - X(t_0) = X_n - X_0$ is very small, so that $X_n - X_0 \simeq dX$ and $t_n - t_0 \simeq dt$, so that equation (2.20) becomes

$$
\begin{aligned}
E(dX) &= \alpha \, dt \\
Var(dX) &= \sigma^2 dt ,
\end{aligned}
\tag{2.21}
$$

which agrees with equations (2.7-2.8). Hence, in the limit as $\Delta t \to 0$, we can interpret the random walk for $X$ on the lattice (with these parameters) as the solution to the stochastic differential equation (SDE)

$$
\begin{aligned}
dX &= \alpha \, dt + \sigma \, dZ \\
dZ &= \phi\sqrt{dt} .
\end{aligned}
\tag{2.22}
$$

Consider the case where $\alpha = 0$, $\sigma = 1$, so that

$$
dX = dZ \simeq Z(t_i) - Z(t_{i-1}) = Z_i - Z_{i-1} = X_i - X_{i-1} .
$$

Now we can write

$$
\int_0^t dZ = \lim_{\Delta t \to 0} \sum_i (Z_{i+1} - Z_i) = (Z_n - Z_0) . \tag{2.23}
$$

From equation (2.20) ($\alpha = 0$, $\sigma = 1$) we have

$$
\begin{aligned}
E(Z_n - Z_0) &= 0 \\
Var(Z_n - Z_0) &= t .
\end{aligned}
\tag{2.24}
$$

Now, if $n$ is large ($\Delta t \to 0$), recall that the binomial distribution (2.12) tends to a normal distribution. From equation (2.24), we have that the mean of this distribution is zero, with variance $t$, so that

$$
(Z_n - Z_0) \sim N(0, t) = \int_0^t dZ . \tag{2.25}
$$

In other words, after a finite time $t$, $\int_0^t dZ$ is normally distributed with mean zero and variance $t$ (the limit of a binomial distribution is a normal distribution).

Recall that we have $Z_i - Z_{i-1} = \sqrt{\Delta t}$ with probability $p$ and $Z_i - Z_{i-1} = -\sqrt{\Delta t}$ with probability $q$. Note that $(Z_i - Z_{i-1})^2 = \Delta t$, with certainty, so that we can write

$$
(Z_i - Z_{i-1})^2 \simeq (dZ)^2 = \Delta t . \tag{2.26}
$$

To summarize

• We can interpret the SDE

$$
\begin{aligned}
dX &= \alpha \, dt + \sigma \, dZ \\
dZ &= \phi\sqrt{dt} ,
\end{aligned}
\tag{2.27}
$$

as the limit of a discrete random walk on a lattice as the timestep tends to zero.

• $Var(dZ) = dt$, otherwise, after any finite time, the $Var(X_n - X_0)$ is either zero or infinite.

• We can integrate the term $dZ$ to obtain

$$
\int_0^t dZ = Z(t) - Z(0) \sim N(0, t) . \tag{2.28}
$$

Going back to our lattice example, note that the total distance traveled over any finite interval of time becomes infinite,

$$
E(|\Delta X|) = \Delta h \tag{2.29}
$$

so that the total distance traveled in $n$ steps is

$$
n\Delta h = \frac{t}{\Delta t}\Delta h = \frac{t\sigma}{\sqrt{\Delta t}} \tag{2.30}
$$

which goes to infinity as $\Delta t \to 0$. Similarly,

$$
\frac{\Delta x}{\Delta t} = \pm\infty . \tag{2.31}
$$

Consequently, Brownian motion is very jagged at every timescale. These paths are not differentiable, i.e. $\frac{dx}{dt}$ does not exist, so we cannot speak of

$$
E\left(\frac{dx}{dt}\right) \tag{2.32}
$$

but we can possibly define

$$
E(dx) . \tag{2.33}
$$

2.6 Geometric Brownian motion with drift

Of course, the actual path followed by stock is more complex than the simple situation described above. More realistically, we assume that the relative changes in stock prices (the returns) follow Brownian motion with drift. We suppose that in an infinitesimal time $dt$, the stock price $S$ changes to $S + dS$, where

$$
\frac{dS}{S} = \mu \, dt + \sigma \, dZ \tag{2.34}
$$

where $\mu$ is the drift rate, $\sigma$ is the volatility, and $dZ$ is the increment of a Wiener process,

$$
dZ = \phi\sqrt{dt} \tag{2.35}
$$

where $\phi \sim N(0, 1)$. Equations (2.34) and (2.35) are called geometric Brownian motion with drift. So, superimposed on the upward (relative) drift is a (relative) random walk. The degree of randomness is given by the volatility $\sigma$. Figure 2.3 gives an illustration of ten realizations of this random process for two different values of the volatility. In this case, we assume that the drift rate $\mu$ equals the risk free rate.

Figure 2.3: Realizations of asset price following geometric Brownian motion. Left: low volatility case ($\sigma = .20$ per year); right: high volatility case ($\sigma = .40$ per year). Risk-free rate of return $r = .05$. [Axes: Asset Price ($) versus Time (years); the risk free return is also plotted.]

Note that

$$
E(dS) = E(\sigma S \, dZ + \mu S \, dt) = \mu S \, dt \quad \text{since } E(dZ) = 0 \tag{2.36}
$$

and that the variance of $dS$ is

$$
Var[dS] = E(dS^2) - [E(dS)]^2 = E(\sigma^2 S^2 dZ^2) = \sigma^2 S^2 dt \tag{2.37}
$$

so that $\sigma$ is a measure of the degree of randomness of the stock price movement.

Equation (2.34) is a stochastic differential equation. The normal rules of calculus don’t apply, since for example

$$
\frac{dZ}{dt} = \phi\frac{1}{\sqrt{dt}} \to \infty \quad \text{as } dt \to 0 .
$$

The study of these sorts of equations uses results from stochastic calculus. However, for our purposes, we need only one result, which is Ito’s Lemma (see Derivatives: the theory and practice of financial engineering, by P. Wilmott). Suppose we have some function $G = G(S, t)$, where $S$ follows the stochastic process equation (2.34), then, in small time increment $dt$, $G \to G + dG$, where

$$
dG = \left( \mu S \frac{\partial G}{\partial S} + \frac{\sigma^2 S^2}{2}\frac{\partial^2 G}{\partial S^2} + \frac{\partial G}{\partial t} \right) dt + \sigma S \frac{\partial G}{\partial S} dZ \tag{2.38}
$$

An informal derivation of this result is given in the following section.
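Realizations like those described for Figure 2.3 can be generated by stepping equations (2.34)-(2.35) forward in time. The sketch below uses a simple forward (Euler) timestepping of $dS = \mu S \, dt + \sigma S \, dZ$; this discretization, the function name and the parameter values are assumptions made for illustration, not necessarily how the original figure was produced.

```python
import numpy as np

def simulate_gbm_paths(S0, mu, sigma, T, n_steps, n_paths, seed=0):
    """Forward timestepping of dS = mu S dt + sigma S dZ with dZ = phi sqrt(dt)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    S = np.empty((n_paths, n_steps + 1))
    S[:, 0] = S0
    for k in range(n_steps):
        phi = rng.standard_normal(n_paths)   # phi ~ N(0, 1), one draw per path
        dZ = phi * np.sqrt(dt)               # equation (2.35)
        S[:, k + 1] = S[:, k] * (1.0 + mu * dt + sigma * dZ)
    return S

# Ten realizations over 12 years for a low and a high volatility case.
for sigma in (0.20, 0.40):
    paths = simulate_gbm_paths(S0=100.0, mu=0.05, sigma=sigma, T=12.0,
                               n_steps=12 * 250, n_paths=10, seed=1)
    print(f"sigma = {sigma:.2f}: final prices range from "
          f"{paths[:, -1].min():.1f} to {paths[:, -1].max():.1f}")
```

Each row of the returned array is one price path; plotting the rows against time gives a picture of the kind described in the figure caption.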

2.6.1 Ito's Lemma

We give an informal derivation of Ito's lemma (2.38). Suppose we have a variable $S$ which follows

$$dS = a(S, t)\,dt + b(S, t)\,dZ \tag{2.39}$$

where $dZ$ is the increment of a Wiener process.

Now since

$$dZ^2 = \phi^2 dt \tag{2.40}$$

where $\phi$ is a random variable drawn from a normal distribution with mean zero and unit variance, we have that, if $E$ is the expectation operator, then

$$E(\phi) = 0 ; \qquad E(\phi^2) = 1 \tag{2.41}$$

so that the expected value of $dZ^2$ is

$$E(dZ^2) = dt \tag{2.42}$$

Now, it can be shown (see Section 6) that in the limit as $dt \to 0$, we have that $\phi^2 dt$ becomes non-stochastic, so that with probability one

$$dZ^2 \to dt \quad \text{as } dt \to 0 \tag{2.43}$$

Now, suppose we have some function $G = G(S, t)$, then

$$dG = G_S\,dS + G_t\,dt + G_{SS}\frac{dS^2}{2} + \ldots \tag{2.44}$$

Now (from (2.39))

$$(dS)^2 = (a\,dt + b\,dZ)^2 = a^2 dt^2 + 2ab\,dZ\,dt + b^2 dZ^2 \tag{2.45}$$

Since $dZ = O(\sqrt{dt})$ and $dZ^2 \to dt$, equation (2.45) becomes

$$(dS)^2 = b^2 dZ^2 + O((dt)^{3/2}) \tag{2.46}$$

or

$$(dS)^2 \to b^2 dt \quad \text{as } dt \to 0 \tag{2.47}$$

Now, equations (2.39, 2.44, 2.47) give

$$\begin{aligned}
dG &= G_S\,dS + G_t\,dt + G_{SS}\frac{dS^2}{2} + \ldots \\
   &= G_S(a\,dt + b\,dZ) + dt\left(G_t + G_{SS}\frac{b^2}{2}\right) \\
   &= G_S\,b\,dZ + \left(a\,G_S + G_{SS}\frac{b^2}{2} + G_t\right)dt
\end{aligned} \tag{2.48}$$

So, we have the result that if

$$dS = a(S, t)\,dt + b(S, t)\,dZ \tag{2.49}$$

and if $G = G(S, t)$, then

$$dG = G_S\,b\,dZ + \left(a\,G_S + G_{SS}\frac{b^2}{2} + G_t\right)dt \tag{2.50}$$

Equation (2.38) can be deduced by setting $a = \mu S$ and $b = \sigma S$ in equation (2.50).
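The key step in the derivation is (2.43), the claim that $dZ^2$ behaves like the deterministic quantity $dt$. One informal way to see this is to add up $dZ^2$ over many small steps covering $[0, t]$ and observe that the sum settles down to $t$ as the steps shrink. The sketch below does this; the function name and the step counts are illustrative assumptions, not part of the original notes.

```python
import math
import random


def sum_of_squared_increments(t, n_steps, rng):
    """Return the sum of dZ^2 over [0, t], with dZ = phi*sqrt(dt), phi ~ N(0, 1)."""
    dt = t / n_steps
    total = 0.0
    for _ in range(n_steps):
        d_z = rng.gauss(0.0, 1.0) * math.sqrt(dt)
        total += d_z * d_z
    return total


if __name__ == "__main__":
    rng = random.Random(42)
    # As the number of steps grows, the sum should cluster ever more tightly around t = 1.
    for n_steps in (10, 1000, 100000):
        print(n_steps, sum_of_squared_increments(1.0, n_steps, rng))
```

The sum has mean $t$ for every step count; what shrinks is its variance, which is $2t^2/n$ for $n$ steps.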

2.6.2 Some uses of Ito's Lemma

Suppose we have

$$dS = \mu\,dt + \sigma\,dZ . \tag{2.51}$$

If $\mu, \sigma = \text{Const.}$, then this can be integrated (from $t = 0$ to $t = t$) exactly to give

$$S(t) = S(0) + \mu t + \sigma(Z(t) - Z(0)) \tag{2.52}$$

and from equation (2.28)

$$Z(t) - Z(0) \sim N(0, t) \tag{2.53}$$

Suppose instead we use the more usual geometric Brownian motion

$$dS = \mu S\,dt + \sigma S\,dZ \tag{2.54}$$

Let $F(S) = \log S$, and use Ito's Lemma

$$\begin{aligned}
dF &= F_S S\sigma\,dZ + \left(F_S\mu S + F_{SS}\frac{\sigma^2 S^2}{2} + F_t\right)dt \\
   &= \left(\mu - \frac{\sigma^2}{2}\right)dt + \sigma\,dZ ,
\end{aligned} \tag{2.55}$$

so that we can integrate this to get

$$F(t) = F(0) + \left(\mu - \frac{\sigma^2}{2}\right)t + \sigma(Z(t) - Z(0)) \tag{2.56}$$

or, since $S = e^F$,

$$S(t) = S(0)\exp\left[\left(\mu - \frac{\sigma^2}{2}\right)t + \sigma(Z(t) - Z(0))\right] . \tag{2.57}$$

Unfortunately, these cases are about the only situations where we can exactly integrate the SDE (constant $\sigma$, $\mu$).

2.6.3 Some more uses of Ito's Lemma

We can often use Ito's Lemma and some algebraic tricks to determine some properties of distributions. Let

$$dX = a(X, t)\,dt + b(X, t)\,dZ , \tag{2.58}$$

then if $G = G(X)$, then

$$dG = \left(a\,G_X + G_t + \frac{b^2}{2}G_{XX}\right)dt + G_X\,b\,dZ . \tag{2.59}$$

If $E[X] = \bar{X}$, then ($b(X, t)$ and $dZ$ are independent)

$$E[dX] = d\,E[X] = d\bar{X} = E[a\,dt] + E[b]\,E[dZ] = E[a\,dt] , \tag{2.60}$$

so that

$$\begin{aligned}
\frac{d\bar{X}}{dt} &= E[a] = \bar{a} \\
\bar{X} &= E\left[\int_0^t a\,dt\right] .
\end{aligned} \tag{2.61}$$

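Let $\bar{G} = E[(X - \bar{X})^2] = \mathrm{var}(X)$, then

$$\begin{aligned}
d\bar{G} &= E[dG] \\
 &= E[2(X - \bar{X})a - 2(X - \bar{X})\bar{a} + b^2]\,dt + E[2b(X - \bar{X})]\,E[dZ] \\
 &= E[b^2\,dt] + E[2(X - \bar{X})(a - \bar{a})\,dt] ,
\end{aligned} \tag{2.62}$$

which means that

$$\bar{G} = \mathrm{var}(X) = E\left[\int_0^t b^2\,dt\right] + E\left[\int_0^t 2(a - \bar{a})(X - \bar{X})\,dt\right] . \tag{2.63}$$

In a particular case, we can sometimes get more useful expressions. If

$$dS = \mu S\,dt + \sigma S\,dZ \tag{2.64}$$

with $\mu$, $\sigma$ constant, then

$$E[dS] = d\bar{S} = E[\mu S]\,dt = \mu\bar{S}\,dt , \tag{2.65}$$

so that

$$\begin{aligned}
d\bar{S} &= \mu\bar{S}\,dt \\
\bar{S} &= S_0 e^{\mu t} .
\end{aligned} \tag{2.66}$$

Now, let $G(S) = S^2$, so that $E[G] = \bar{G} = E[S^2]$, then (from Ito's Lemma)

$$\begin{aligned}
d\bar{G} &= E[2\mu S^2 + \sigma^2 S^2]\,dt + E[2S^2\sigma]\,E[dZ] \\
 &= E[2\mu S^2 + \sigma^2 S^2]\,dt \\
 &= (2\mu + \sigma^2)\bar{G}\,dt ,
\end{aligned} \tag{2.67}$$

so that

$$\begin{aligned}
\bar{G} &= \bar{G}_0 e^{(2\mu + \sigma^2)t} \\
E[S^2] &= S_0^2 e^{(2\mu + \sigma^2)t} .
\end{aligned} \tag{2.68}$$

From equations (2.66) and (2.68) we then have

$$\begin{aligned}
\mathrm{var}(S) &= E[S^2] - (E[S])^2 \\
 &= E[S^2] - \bar{S}^2 \\
 &= S_0^2 e^{2\mu t}(e^{\sigma^2 t} - 1) \\
 &= \bar{S}^2(e^{\sigma^2 t} - 1) .
\end{aligned} \tag{2.69}$$

One can use the same ideas to compute the skewness, $E[(S - \bar{S})^3]$. If $G(S) = S^3$ and $\bar{G} = E[G(S)] = E[S^3]$, then

$$\begin{aligned}
d\bar{G} &= E[\mu S \cdot 3S^2 + \sigma^2 S^2/2 \cdot 3 \cdot 2S]\,dt + E[3S^2\sigma S]\,E[dZ] \\
 &= E[3\mu S^3 + 3\sigma^2 S^3]\,dt \\
 &= 3(\mu + \sigma^2)\bar{G}\,dt ,
\end{aligned} \tag{2.70}$$

so that

$$\bar{G} = E[S^3] = S_0^3 e^{3(\mu + \sigma^2)t} . \tag{2.71}$$

We can then obtain the skewness from

$$\begin{aligned}
E[(S - \bar{S})^3] &= E[S^3 - 3S^2\bar{S} + 3S\bar{S}^2 - \bar{S}^3] \\
 &= E[S^3] - 3\bar{S}E[S^2] + 2\bar{S}^3 .
\end{aligned} \tag{2.72}$$

Equations (2.66, 2.68, 2.71) can then be substituted into equation (2.72) to get the desired result.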
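The moment formulas (2.66) and (2.69) are easy to check by sampling from the exact solution (2.57). The following is a rough sketch under assumed parameter values; the function names are hypothetical and the sample size is only large enough to make the agreement visible.

```python
import math
import random


def sample_gbm(s0, mu, sigma, t, rng):
    """Draw S(t) from the exact solution (2.57), using Z(t) - Z(0) ~ N(0, t)."""
    z = rng.gauss(0.0, math.sqrt(t))           # second argument is the standard deviation
    return s0 * math.exp((mu - 0.5 * sigma ** 2) * t + sigma * z)


def gbm_mean(s0, mu, t):
    """E[S] from equation (2.66)."""
    return s0 * math.exp(mu * t)


def gbm_variance(s0, mu, sigma, t):
    """var(S) from equation (2.69)."""
    return gbm_mean(s0, mu, t) ** 2 * (math.exp(sigma ** 2 * t) - 1.0)


def sample_moments(s0, mu, sigma, t, n_samples, rng):
    """Sample mean and (unbiased) sample variance of S(t)."""
    xs = [sample_gbm(s0, mu, sigma, t, rng) for _ in range(n_samples)]
    mean = sum(xs) / n_samples
    var = sum((x - mean) ** 2 for x in xs) / (n_samples - 1)
    return mean, var


if __name__ == "__main__":
    mean, var = sample_moments(100.0, 0.1, 0.2, 1.0, 200000, random.Random(3))
    print("sampled mean, var:", mean, var)
    print("formula mean, var:", gbm_mean(100.0, 0.1, 1.0), gbm_variance(100.0, 0.1, 0.2, 1.0))
```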

2.7 The Black-Scholes Analysis

Assume

• The stock price follows geometric Brownian motion, equation (2.34).
• The risk-free rate of return is a constant $r$.
• There are no arbitrage opportunities, i.e. all risk-free portfolios must earn the risk-free rate of return.
• Short selling is permitted (i.e. we can own negative quantities of an asset).

Suppose that we have an option whose value is given by $V = V(S, t)$. Construct an imaginary portfolio, consisting of one option, and a number of $(-(\alpha^h))$ of the underlying asset. (If $(\alpha^h) > 0$, then we have sold the asset short, i.e. we have borrowed an asset, sold it, and are obligated to give it back at some future date.)

The value of this portfolio $P$ is

$$P = V - (\alpha^h)S \tag{2.73}$$

In a small time $dt$, $P \to P + dP$,

$$dP = dV - (\alpha^h)\,dS \tag{2.74}$$

Note that in equation (2.74) we have not included a term $(\alpha^h)_S\,S$. This is actually a rather subtle point, since we shall see (later on) that $(\alpha^h)$ actually depends on $S$. However, if we think of a real situation, at any instant in time, we must choose $(\alpha^h)$, and then we hold the portfolio while the asset moves randomly. So, equation (2.74) is actually the change in the value of the portfolio, not a differential. If we were taking a true differential then equation (2.74) would be

$$dP = dV - (\alpha^h)\,dS - S\,d(\alpha^h)$$

but we have to remember that $(\alpha^h)$ does not change over a small time interval, since we pick $(\alpha^h)$, and then $S$ changes randomly. We are not allowed to peek into the future (otherwise, we could get rich without risk, which is not permitted by the no-arbitrage condition) and hence $(\alpha^h)$ is not allowed to contain any information about future asset price movements. The principle of no peeking into the future is why Ito stochastic calculus is used. Other forms of stochastic calculus are used in Physics applications (e.g. turbulent flow).

Substituting equations (2.34) and (2.38) into equation (2.74) gives

$$dP = \sigma S\left(V_S - (\alpha^h)\right)dZ + \left(\mu S V_S + \frac{\sigma^2 S^2}{2}V_{SS} + V_t - \mu(\alpha^h)S\right)dt \tag{2.75}$$

We can make this portfolio riskless over the time interval $dt$, by choosing $(\alpha^h) = V_S$ in equation (2.75). This eliminates the $dZ$ term in equation (2.75). (This is the analogue of our choice of the amount of stock in the riskless portfolio for the two state tree model.) So, letting

$$(\alpha^h) = V_S \tag{2.76}$$

then substituting equation (2.76) into equation (2.75) gives

$$dP = \left(V_t + \frac{\sigma^2 S^2}{2}V_{SS}\right)dt \tag{2.77}$$

Since $P$ is now risk-free in the interval $t \to t + dt$, then no-arbitrage says that

$$dP = rP\,dt \tag{2.78}$$

Therefore, equations (2.77) and (2.78) give

$$rP\,dt = \left(V_t + \frac{\sigma^2 S^2}{2}V_{SS}\right)dt \tag{2.79}$$

Since

$$P = V - (\alpha^h)S = V - V_S S \tag{2.80}$$

then substituting equation (2.80) into equation (2.79) gives

$$V_t + \frac{\sigma^2 S^2}{2}V_{SS} + rSV_S - rV = 0 \tag{2.81}$$

which is the Black-Scholes equation. Note the rather remarkable fact that equation (2.81) is independent of the drift rate $\mu$.

Equation (2.81) is solved backwards in time from the option expiry time $t = T$ to the present $t = 0$.

2.8 Hedging in Continuous Time

We can construct a hedging strategy based on the solution to the above equation. Suppose we sell an option at price $V$ at $t = 0$. Then we carry out the following

• We sell one option worth $V$. (This gives us $V$ in cash initially.)
• We borrow $\left(S\frac{\partial V}{\partial S} - V\right)$ from the bank.
• We buy $\frac{\partial V}{\partial S}$ shares at price $S$.

At every instant in time, we adjust the amount of stock we own so that we always have $\frac{\partial V}{\partial S}$ shares. Note that this is a dynamic hedge, since we have to continually rebalance the portfolio. Cash will flow into and out of the bank account, in response to changes in $S$. If the amount in the bank is positive, we receive the risk free rate of return. If negative, then we borrow at the risk free rate.

So, our hedging portfolio will be

• Short one option worth $V$.
• Long $\frac{\partial V}{\partial S}$ shares at price $S$.
• $V - S\frac{\partial V}{\partial S}$ cash in the bank account.

At any instant in time (including the terminal time), this portfolio can be liquidated and any obligations implied by the short position in the option can be covered, at zero gain or loss, regardless of the value of $S$. Note that given the receipt of the cash for the option, this strategy is self-financing.

2.9 The option price

So, we can see that the price of the option valued by the Black-Scholes equation is the market price of the option at any time. If the price was higher than the Black-Scholes price, we could construct the hedging portfolio, dynamically adjust the hedge, and end up with a positive amount at the end. Similarly, if the price was lower than the Black-Scholes price, we could short the hedging portfolio, and end up with a positive gain. By the no-arbitrage condition, this should not be possible.

Note that we are not trying to predict the price movements of the underlying asset, which is a random process. The value of the option is based on a hedging strategy which is dynamic, and must be continuously rebalanced. The price is the cost of setting up the hedging portfolio. The Black-Scholes price is not the expected payoff.

The price given by the Black-Scholes equation is not the value of the option to a speculator, who buys and holds the option. A speculator is making bets about the underlying drift rate of the stock (note that the drift rate does not appear in the Black-Scholes equation). For a speculator, the value of the option is given by an equation similar to the Black-Scholes equation, except that the drift rate appears. In this case, the price can be interpreted as the expected payoff based on the guess for the drift rate. But this is art, not science!
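The self-financing strategy of Section 2.8 can be checked numerically by rebalancing at a finite number of times and looking at what is left after the option is paid off. The sketch below is only an illustration under several assumptions: it uses the standard closed-form Black-Scholes value and delta of a European call (not derived in the text up to this point), it moves the stock with a real-world drift `mu` that differs from `r`, and all function names and parameter values are hypothetical.

```python
import math
import random


def norm_cdf(x):
    """Cumulative standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(s, k, r, sigma, tau):
    """Closed-form value and delta (V_S) of a European call; tau = time to expiry > 0."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    value = s * norm_cdf(d1) - k * math.exp(-r * tau) * norm_cdf(d2)
    return value, norm_cdf(d1)


def hedging_error(s0, k, r, mu, sigma, horizon, n_rebalance, rng):
    """Sell one call, delta hedge at equally spaced times, return the final gain or loss."""
    dt = horizon / n_rebalance
    s = s0
    value, delta = bs_call(s, k, r, sigma, horizon)
    bank = value - delta * s                   # V - S*V_S in the bank account
    for i in range(1, n_rebalance + 1):
        # The stock moves under the real-world drift mu, not r.
        z = rng.gauss(0.0, 1.0)
        s *= math.exp((mu - 0.5 * sigma ** 2) * dt + sigma * z * math.sqrt(dt))
        bank *= math.exp(r * dt)               # interest received or paid at rate r
        if i < n_rebalance:
            _, new_delta = bs_call(s, k, r, sigma, horizon - i * dt)
            bank -= (new_delta - delta) * s    # trade shares so that we hold V_S of them
            delta = new_delta
    # Liquidate the shares and cover the short option position.
    return bank + delta * s - max(s - k, 0.0)


if __name__ == "__main__":
    rng = random.Random(11)
    errors = [hedging_error(100.0, 100.0, 0.05, 0.15, 0.2, 1.0, 250, rng) for _ in range(200)]
    rms = math.sqrt(sum(e * e for e in errors) / len(errors))
    print("mean hedging error:", sum(errors) / len(errors), "rms:", rms)
```

With continuous rebalancing the final gain or loss would be exactly zero; with a finite number of rebalances it is a small random quantity that should shrink as `n_rebalance` grows, whatever value of `mu` is used.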

2.10 American early exercise

Actually, most options traded are American options, which have the feature that they can be exercised at any time. Consequently, an investor acting optimally will always exercise the option if the value falls below the payoff or exercise value. So, the value of an American option is given by the solution to equation (2.81) with the additional constraint

$$V(S, t) \geq \begin{cases} \max(S - K, 0) & \text{for a call} \\ \max(K - S, 0) & \text{for a put} \end{cases} \tag{2.82}$$

Note that since we are working backwards in time, we know what the option is worth in future, and therefore we can determine the optimal course of action.

In order to write equation (2.81) in more conventional form, define $\tau = T - t$, so that equation (2.81) becomes

$$\begin{aligned}
V_\tau &= \frac{\sigma^2 S^2}{2}V_{SS} + rSV_S - rV \\
V(S, \tau = 0) &= \begin{cases} \max(S - K, 0) & \text{for a call} \\ \max(K - S, 0) & \text{for a put} \end{cases} \\
V(0, \tau) &\to V_\tau = -rV \\
V(S = \infty, \tau) &\to \begin{cases} S & \text{for a call} \\ 0 & \text{for a put} \end{cases}
\end{aligned} \tag{2.83}$$

If the option is American, then we also have the additional constraints

$$V(S, \tau) \geq \begin{cases} \max(S - K, 0) & \text{for a call} \\ \max(K - S, 0) & \text{for a put} \end{cases} \tag{2.84}$$

Define the operator

$$LV \equiv V_\tau - \left(\frac{\sigma^2 S^2}{2}V_{SS} + rSV_S - rV\right) \tag{2.85}$$

and let $V(S, 0) = V^*$. More formally, the American option pricing problem can be stated as

$$\begin{aligned}
LV &\geq 0 \\
V - V^* &\geq 0 \\
(V - V^*)\,LV &= 0
\end{aligned} \tag{2.86}$$

3 The Risk Neutral World

Suppose instead of valuing an option using the above no-arbitrage argument, we wanted to know the expected value of the option. We can imagine that we are buying and holding the option, and not hedging. If we are considering the value of risky cash flows in the future, then these cash flows should be discounted at an appropriate discount rate, which we will call $\rho$ (i.e. the riskier the cash flows, the higher $\rho$). Consequently the value of an option today can be considered to be the discounted future value. This is simply the old idea of net present value. Regard $S$ today as known, and let $V(S + dS, t + dt)$ be the value of the option at some future time $t + dt$, which is uncertain, since $S$ evolves randomly. Thus

$$V(S, t) = \frac{1}{1 + \rho\,dt}E(V(S + dS, t + dt)) \tag{3.1}$$

where $E(\ldots)$ is the expectation operator, i.e. the expected value of $V(S + dS, t + dt)$ given that $V = V(S, t)$ at $t = t$. We can rewrite equation (3.1) as (ignoring terms of $o(dt)$, where $o(dt)$ represents terms that go to zero faster than $dt$)

$$\rho\,dt\,V(S, t) = E(V(S, t) + dV) - V(S, t) . \tag{3.2}$$

Since we regard $V$ as the expected value, so that

$$E[V(S, t)] = V(S, t) , \tag{3.3}$$

and then

$$E(V(S, t) + dV) - V(S, t) = E(dV) , \tag{3.4}$$

so that equation (3.2) becomes

$$\rho\,dt\,V(S, t) = E(dV) . \tag{3.5}$$

Assume that

$$\frac{dS}{S} = \mu\,dt + \sigma\,dZ .$$

From Ito's Lemma (2.38) we have that

$$dV = \left(V_t + \frac{\sigma^2 S^2}{2}V_{SS} + \mu SV_S\right)dt + \sigma SV_S\,dZ . \tag{3.6}$$

Noting that

$$E(dZ) = 0 \tag{3.7}$$

then

$$E(dV) = \left(V_t + \frac{\sigma^2 S^2}{2}V_{SS} + \mu SV_S\right)dt . \tag{3.8}$$

Combining equations (3.4-3.8) gives

$$V_t + \frac{\sigma^2 S^2}{2}V_{SS} + \mu SV_S - \rho V = 0 . \tag{3.9}$$

Equation (3.9) is the PDE for the expected value of an option. If we are not hedging, maybe this is the value that we are interested in, not the no-arbitrage value. However, if this is the case, we have to estimate the drift rate $\mu$, and the discount rate $\rho$. Estimating the appropriate discount rate is always a thorny issue.

Now, note the interesting fact: if we set $\rho = r$ and $\mu = r$ in equation (3.9) then we simply get the Black-Scholes equation (2.81).

This means that the no-arbitrage price of an option is identical to the expected value if $\rho = r$ and $\mu = r$. In other words, we can determine the no-arbitrage price by pretending we are living in a world where all assets drift at rate $r$, and all investments are discounted at rate $r$. This is the so-called risk neutral world.

This result is the source of endless confusion. It is best to think of this as simply a mathematical fluke. This does not have any reality. Investors would be very stupid to think that the drift rate of risky investments is $r$. I'd rather just buy risk-free bonds in this case. There is in reality no such thing as a risk-neutral world. Nevertheless, this result is useful for determining the no-arbitrage value of an option using a Monte Carlo approach. Using this numerical method, we simply assume that

$$dS = rS\,dt + \sigma S\,dZ \tag{3.10}$$

and simulate a large number of random paths. If we know the option payoff as a function of $S$ at $t = T$, then we compute

$$V(S, 0) = e^{-rT}E_Q(V(S, T)) \tag{3.11}$$

which should be the no-arbitrage value.

Note the $E_Q$ in the above equation. This makes it clear that we are taking the expectation in the risk neutral world (the expectation in the Q measure). This contrasts with the real-world expectation (the P measure).

Suppose we want to know the expected value (in the real world) of an asset which pays $V(S, t = T)$ at $t = T$ in the future. Then, the expected value (today) is given by solving

$$V_t + \frac{\sigma^2 S^2}{2}V_{SS} + \mu SV_S = 0 , \tag{3.12}$$

where we have dropped the discounting term. In particular, suppose we are going to receive $V = S(t = T)$, i.e. just the asset at $t = T$. Assume that the solution to equation (3.12) is $V = \text{Const.}\,A(t)S$, and we find that

$$V = \text{Const.}\,S e^{\mu(T - t)} . \tag{3.13}$$

Noting that we receive $V = S$ at $t = T$ means that

$$V = S e^{\mu(T - t)} . \tag{3.14}$$

Today, we can acquire the asset for price $S(t = 0)$. At $t = T$, the asset is worth $S(t = T)$. Equation (3.14) then says that

$$E[V(S(t = 0), t = 0)] = E[S(t = T)] = S(t = 0)e^{\mu T} \tag{3.15}$$

In other words, if

$$dS = S\mu\,dt + S\sigma\,dZ \tag{3.16}$$

then (setting $t = T$)

$$E[S] = S e^{\mu t} . \tag{3.17}$$

Recall that the exact solution to equation (3.16) is (equation (2.57))

$$S(t) = S(0)\exp\left[\left(\mu - \frac{\sigma^2}{2}\right)t + \sigma(Z(t) - Z(0))\right] . \tag{3.18}$$

So that we have just shown that $E[S] = S e^{\mu t}$ by using a simple PDE argument and Ito's Lemma. Isn't this easier than using brute force statistics? PDEs are much more elegant.

4 Monte Carlo Methods

This brings us to the simplest numerical method for computing the no-arbitrage value of an option. Suppose that we assume that the underlying process is

$$\frac{dS}{S} = r\,dt + \sigma\,dZ \tag{4.1}$$

then we can simulate a path forward in time, starting at some price today $S^0$, using a forward Euler timestepping method ($S^i = S(t_i)$)

$$S^{i+1} = S^i + S^i\left(r\Delta t + \sigma\phi^i\sqrt{\Delta t}\right) \tag{4.2}$$

where $\Delta t$ is the finite timestep, and $\phi^i$ is a random number which is $N(0, 1)$. Note that at each timestep, we generate a new random number. After $N$ steps, with $T = N\Delta t$, we have a single realized path. Given the payoff function of the option, the value for this path would be

$$\text{Value} = \text{Payoff}(S^N) . \tag{4.3}$$

For example, if the option was a European call, then

$$\text{Value} = \max(S^N - K, 0) ; \qquad K = \text{Strike Price} \tag{4.4}$$

Suppose we run a series of trials, $m = 1, \ldots, M$, and denote the payoff after the $m$th trial as $\text{payoff}(m)$. Then, the no-arbitrage value of the option is

$$\text{Option Value} = e^{-rT}E(\text{payoff}) \simeq e^{-rT}\frac{1}{M}\sum_{m=1}^{M}\text{payoff}(m) . \tag{4.5}$$

Recall that these paths are not the real paths, but are the risk neutral paths.

Now, we should remember that we are

1. approximating the solution to the SDE by forward Euler, which has $O(\Delta t)$ truncation error.
2. approximating the expectation by the mean of many random paths. This Monte Carlo error is of size $O(1/\sqrt{M})$, which is slowly converging.

There are thus two sources of error in the Monte Carlo approach: timestepping error and sampling error.

The slow rate of convergence of Monte Carlo methods makes these techniques unattractive except when the option is written on several (i.e. more than three) underlying assets. As well, since we are simulating forward in time, we cannot know at a given point in the forward path if it is optimal to exercise or hold an American style option. This is easy if we use a PDE method, since we solve the PDE backwards in time, so we always know the continuation value and hence can act optimally. However, if we have more than three factors, PDE methods become very expensive computationally. As well, if we want to determine the effects of discrete hedging, for example, a Monte Carlo approach is very easy to implement.

The error in the Monte Carlo method is then

$$\begin{aligned}
\text{Error} &= O\left(\max\left(\Delta t, \frac{1}{\sqrt{M}}\right)\right) \\
\Delta t &= \text{timestep} \\
M &= \text{number of Monte Carlo paths}
\end{aligned} \tag{4.6}$$

Now, it doesn't make sense to drive the Monte Carlo error down to zero if there is $O(\Delta t)$ timestepping error. We should seek to balance the timestepping error and the sampling error. In order to make these two errors the same order, we should choose $M = O\left(\frac{1}{(\Delta t)^2}\right)$. This makes the total error $O(\Delta t)$. We also have that

$$\begin{aligned}
\text{Complexity} &= O\left(\frac{M}{\Delta t}\right) = O\left(\frac{1}{(\Delta t)^3}\right) \\
\Delta t &= O\left((\text{Complexity})^{-1/3}\right)
\end{aligned} \tag{4.7}$$

and hence

$$\text{Error} = O\left(\frac{1}{(\text{Complexity})^{1/3}}\right) . \tag{4.8}$$

In practice, the convergence study in terms of timestep error is often not done. People just pick a timestep, e.g. one day, and increase the number of Monte Carlo samples until they achieve convergence in terms of sampling error, and ignore the timestep error. Sometimes this gives bad results!

Note that the exact solution to Geometric Brownian motion (2.57) has the property that the asset value $S$ can never reach $S = 0$ if $S(0) > 0$, in any finite time. However, due to the approximate nature of our Forward Euler method for solving the SDE, it is possible that a negative or zero $S^i$ can show up. We can do one of three things here, in this case

• Cut back the timestep at this point in the simulation so that $S$ is positive.
• Set $S = 0$ and continue. In this case, $S$ remains zero for the rest of this particular simulation.
• Use Ito's Lemma, and determine the SDE for $\log S$, i.e. if $F = \log S$, then, from equation (2.55), we obtain (with $\mu = r$)

$$dF = \left(r - \frac{\sigma^2}{2}\right)dt + \sigma\,dZ , \tag{4.9}$$

so that now, if $F < 0$, there is no problem, since $S = e^F$, and if $F < 0$, this just means that $S$ is very small. We can use this idea for any stochastic process where the variable should not go negative.
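The forward Euler scheme (4.2), the call payoff (4.4) and the discounted average (4.5) can be put together in a few lines. This is a minimal sketch rather than a production pricer: the function name, parameter values and seed are assumptions, and a non-positive Euler iterate is handled with the second option in the list above (set $S = 0$ and continue).

```python
import math
import random


def euler_call_price(s0, k, r, sigma, horizon, n_steps, n_paths, rng):
    """Estimate e^{-rT} E[max(S^N - K, 0)] using forward Euler paths, equation (4.2)."""
    dt = horizon / n_steps
    sqrt_dt = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        s = s0
        for _ in range(n_steps):
            phi = rng.gauss(0.0, 1.0)          # a new N(0, 1) draw at every timestep
            s = s + s * (r * dt + sigma * phi * sqrt_dt)
            if s <= 0.0:                       # Euler can overshoot below zero
                s = 0.0                        # set S = 0 ...
                break                          # ... and it stays zero for the rest of the path
        total += max(s - k, 0.0)               # payoff (4.4) for this path
    return math.exp(-r * horizon) * total / n_paths   # equation (4.5)


if __name__ == "__main__":
    price = euler_call_price(100.0, 100.0, 0.05, 0.2, 1.0, 20, 20000, random.Random(5))
    print("Monte Carlo estimate of the call value:", price)
```

Both error sources discussed above are present in the printed number: the timestep error from using only 20 Euler steps and the sampling error from using 20000 paths.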

Usually, most people set $S = 0$ and continue. As long as the timestep is not too large, this situation is probably due to an event of low probability, hence any errors incurred will not affect the expected value very much. If negative $S$ values show up many times, this is a signal that the timestep is too large.

In the case of simple Geometric Brownian motion, where $r$, $\sigma$ are constants, then the SDE can be solved exactly, and we can avoid timestepping errors (see Section 2.6.2). In this case

$$S(T) = S(0)\exp\left[\left(r - \frac{\sigma^2}{2}\right)T + \sigma\phi\sqrt{T}\right] \tag{4.10}$$

where $\phi \sim N(0, 1)$. I'll remind you that equation (4.10) is exact. For these simple cases, we should always use equation (4.10). Unfortunately, this does not work in more realistic situations.

Monte Carlo is popular because

• It is simple to code. Easily handles complex path dependence.
• Easily handles multiple assets.

The disadvantages of Monte Carlo methods are

• It is difficult to apply this idea to problems involving optimal decision making (e.g. American options).
• It is hard to compute the Greeks ($V_S$, $V_{SS}$), which are the hedging parameters, very accurately.
• MC converges slowly.

4.1 Monte Carlo Error Estimators

The sampling error can be estimated via a statistical approach. If the estimated mean of the sample is

$$\hat{\mu} = \frac{e^{-rT}}{M}\sum_{m=1}^{M}\text{payoff}(m) \tag{4.11}$$

and the sample standard deviation is

$$\omega = \left(\frac{1}{M - 1}\sum_{m=1}^{M}\left(e^{-rT}\text{payoff}(m) - \hat{\mu}\right)^2\right)^{1/2} \tag{4.12}$$

then the 95% confidence interval for the actual value $V$ of the option is

$$\hat{\mu} - \frac{1.96\,\omega}{\sqrt{M}} < V < \hat{\mu} + \frac{1.96\,\omega}{\sqrt{M}} \tag{4.13}$$

Note that in order to reduce this error by a factor of 10, the number of simulations must be increased by a factor of 100.

The timestep error can be estimated by running the problem with different size timesteps, and comparing the solutions.

4.2 Random Numbers and Monte Carlo

There are many good algorithms for generating random sequences which are uniformly distributed in $[0, 1]$. See for example (Numerical Recipes in C++, Press et al, Cambridge University Press, 2002). As pointed out in this book, often the system supplied random number generators, such as rand in the standard C library, and the infamous RANDU IBM function, are extremely bad. The Matlab functions appear to be quite good. For more details, please look at (Park and Miller, Communications of the ACM, 31 (1988) 1192-1201). Another good generator is described in (Matsumoto and Nishimura, "The Mersenne Twister: a 623 dimensionally equidistributed uniform pseudorandom number generator," ACM Transactions on Modelling and Computer Simulation, 8 (1998) 3-30). Code can be downloaded from the authors' Web site.

However, we need random numbers which are normally distributed on $[-\infty, +\infty]$, with mean zero and variance one ($N(0, 1)$).

Suppose we have uniformly distributed numbers on $[0, 1]$, i.e. the probability of obtaining a number between $x$ and $x + dx$ is

$$p(x)\,dx = \begin{cases} dx ; & 0 \leq x \leq 1 \\ 0 ; & \text{otherwise} \end{cases} \tag{4.14}$$

Let's take a function of this random variable $y(x)$. How is $y(x)$ distributed? Let $\hat{p}(y)$ be the probability distribution of obtaining $y$ in $[y, y + dy]$. Consequently, we must have (recall the law of transformation of probabilities)

$$p(x)|dx| = \hat{p}(y)|dy|$$

or

$$\hat{p}(y) = p(x)\left|\frac{dx}{dy}\right| . \tag{4.15}$$

Suppose we want $\hat{p}(y)$ to be normal,

$$\hat{p}(y) = \frac{e^{-y^2/2}}{\sqrt{2\pi}} . \tag{4.16}$$

If we start with a uniform distribution, $p(x) = 1$ on $[0, 1]$, then from equations (4.15-4.16) we obtain

$$\frac{dx}{dy} = \frac{e^{-y^2/2}}{\sqrt{2\pi}} . \tag{4.17}$$

Now, for $x \in [0, 1]$, we have that the probability of obtaining a number in $[0, x]$ is

$$\int_0^x dx' = x , \tag{4.18}$$

but from equation (4.17) we have

$$dx' = \frac{e^{-(y')^2/2}}{\sqrt{2\pi}}\,dy' . \tag{4.19}$$

So, there exists a $y$ such that the probability of getting a $y'$ in $[-\infty, y]$ is equal to the probability of getting $x'$ in $[0, x]$,

$$\int_0^x dx' = \int_{-\infty}^{y}\frac{e^{-(y')^2/2}}{\sqrt{2\pi}}\,dy' , \tag{4.20}$$

or

$$x = \int_{-\infty}^{y}\frac{e^{-(y')^2/2}}{\sqrt{2\pi}}\,dy' . \tag{4.21}$$

So, if we generate uniformly distributed numbers $x$ on $[0, 1]$, then to determine $y$ which are $N(0, 1)$, we do the following

• Generate $x$
• Find $y$ such that

$$x = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{y}e^{-(y')^2/2}\,dy' . \tag{4.22}$$

We can write this last step as

$$y = F(x) \tag{4.23}$$

where $F(x)$ is the inverse cumulative normal distribution.
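The two steps above (generate $x$, then find $y = F(x)$) can be combined with the exact solution (4.10) and the error estimator (4.11)-(4.13). The sketch below assumes Python's standard library `statistics.NormalDist.inv_cdf` as the inverse cumulative normal $F$; the function names and parameter values are illustrative assumptions only.

```python
import math
import random
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(0.0, 1.0)


def normal_by_inversion(rng):
    """Generate x ~ U(0, 1), then return y = F(x), equation (4.23)."""
    x = rng.random()
    while x == 0.0:                  # inv_cdf is only defined on the open interval (0, 1)
        x = rng.random()
    return STANDARD_NORMAL.inv_cdf(x)


def call_price_with_interval(s0, k, r, sigma, horizon, n_paths, rng):
    """Return (mu_hat, half_width) from equations (4.10)-(4.13) for a European call."""
    disc = math.exp(-r * horizon)
    values = []
    for _ in range(n_paths):
        phi = normal_by_inversion(rng)
        # Exact solution (4.10): no timestepping error.
        s_t = s0 * math.exp((r - 0.5 * sigma ** 2) * horizon
                            + sigma * phi * math.sqrt(horizon))
        values.append(disc * max(s_t - k, 0.0))
    mu_hat = sum(values) / n_paths                                   # (4.11)
    omega = math.sqrt(sum((v - mu_hat) ** 2 for v in values) / (n_paths - 1))   # (4.12)
    return mu_hat, 1.96 * omega / math.sqrt(n_paths)                 # half width of (4.13)


if __name__ == "__main__":
    mu_hat, half_width = call_price_with_interval(
        100.0, 100.0, 0.05, 0.2, 1.0, 50000, random.Random(2))
    print(f"95% confidence interval: {mu_hat - half_width:.3f} < V < {mu_hat + half_width:.3f}")
```

Quadrupling `n_paths` should roughly halve the reported half width, in line with the $O(1/\sqrt{M})$ sampling error.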

4.3 The Box-Muller Algorithm

Starting from random numbers which are uniformly distributed on $[0, 1]$, there is actually a simpler method for obtaining random numbers which are normally distributed.

If $p(x)$ is the probability of finding $x \in [x, x + dx]$ and if $y = y(x)$, and $\hat{p}(y)$ is the probability of finding $y \in [y, y + dy]$, then, from equation (4.15) we have

$$|p(x)\,dx| = |\hat{p}(y)\,dy| \tag{4.24}$$

or

$$\hat{p}(y) = p(x)\left|\frac{dx}{dy}\right| . \tag{4.25}$$

Now, suppose we have two original random variables $x_1, x_2$, and let $p(x_1, x_2)$ be the probability of obtaining $(x_1, x_2)$ in $[x_1, x_1 + dx_1] \times [x_2, x_2 + dx_2]$. Then, if

$$\begin{aligned}
y_1 &= y_1(x_1, x_2) \\
y_2 &= y_2(x_1, x_2)
\end{aligned} \tag{4.26}$$

we have that

$$\hat{p}(y_1, y_2) = p(x_1, x_2)\left|\frac{\partial(x_1, x_2)}{\partial(y_1, y_2)}\right| \tag{4.27}$$

where the Jacobian of the transformation is defined as

$$\frac{\partial(x_1, x_2)}{\partial(y_1, y_2)} = \det\begin{pmatrix} \dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2} \\[2ex] \dfrac{\partial x_2}{\partial y_1} & \dfrac{\partial x_2}{\partial y_2} \end{pmatrix} \tag{4.28}$$

Recall that the Jacobian of the transformation can be regarded as the scaling factor which transforms $dx_1\,dx_2$ to $dy_1\,dy_2$, i.e.

$$dx_1\,dx_2 = \left|\frac{\partial(x_1, x_2)}{\partial(y_1, y_2)}\right|dy_1\,dy_2 . \tag{4.29}$$

Now, suppose that we have $x_1, x_2$ uniformly distributed on $[0, 1] \times [0, 1]$, i.e.

$$p(x_1, x_2) = U(x_1)U(x_2) \tag{4.30}$$

where

$$U(x) = \begin{cases} 1 ; & 0 \leq x \leq 1 \\ 0 ; & \text{otherwise} . \end{cases} \tag{4.31}$$

We denote this distribution as $x_1 \sim U[0, 1]$ and $x_2 \sim U[0, 1]$.

If $p(x_1, x_2)$ is given by equation (4.30), then we have from equation (4.27) that

$$\hat{p}(y_1, y_2) = \left|\frac{\partial(x_1, x_2)}{\partial(y_1, y_2)}\right| \tag{4.32}$$

Now, we want to find a transformation $y_1 = y_1(x_1, x_2)$, $y_2 = y_2(x_1, x_2)$ which results in normal distributions for $y_1, y_2$. Consider

$$\begin{aligned}
y_1 &= \sqrt{-2\log x_1}\,\cos 2\pi x_2 \\
y_2 &= \sqrt{-2\log x_1}\,\sin 2\pi x_2
\end{aligned} \tag{4.33}$$

or solving for $(x_1, x_2)$

$$\begin{aligned}
x_1 &= \exp\left(\frac{-1}{2}(y_1^2 + y_2^2)\right) \\
x_2 &= \frac{1}{2\pi}\tan^{-1}\left(\frac{y_2}{y_1}\right) .
\end{aligned} \tag{4.34}$$

After some tedious algebra, we can see that (using equation (4.34))

$$\left|\frac{\partial(x_1, x_2)}{\partial(y_1, y_2)}\right| = \frac{1}{\sqrt{2\pi}}e^{-y_1^2/2}\,\frac{1}{\sqrt{2\pi}}e^{-y_2^2/2} \tag{4.35}$$

Now, assuming that equation (4.30) holds, then from equations (4.32-4.35) we have

$$\hat{p}(y_1, y_2) = \frac{1}{\sqrt{2\pi}}e^{-y_1^2/2}\,\frac{1}{\sqrt{2\pi}}e^{-y_2^2/2} \tag{4.36}$$

so that $(y_1, y_2)$ are independent, normally distributed random variables, with mean zero and variance one, or

$$y_1 \sim N(0, 1) ; \qquad y_2 \sim N(0, 1) . \tag{4.37}$$

This gives the following algorithm for generating normally distributed random numbers (given uniformly distributed numbers):

Box Muller Algorithm

    Repeat
        Generate u1 ∼ U(0, 1), u2 ∼ U(0, 1)
        θ = 2π u2, ρ = √(−2 log u1)
        Z1 = ρ cos θ, Z2 = ρ sin θ
    End Repeat                                        (4.38)

This has the effect that $Z_1 \sim N(0, 1)$ and $Z_2 \sim N(0, 1)$.

Note that we generate two draws from a normal distribution on each pass through the loop.

4.3.1 An improved Box Muller

The algorithm (4.38) can be expensive due to the trigonometric function evaluations. We can use the following method to avoid these evaluations. Let

$$\begin{aligned}
U_1 \sim U[0, 1] &; \qquad U_2 \sim U[0, 1] \\
V_1 = 2U_1 - 1 &; \qquad V_2 = 2U_2 - 1
\end{aligned} \tag{4.39}$$

which means that $(V_1, V_2)$ are uniformly distributed in $[-1, 1] \times [-1, 1]$. Now, we carry out the following procedure

Rejection Method

    Repeat
        If ( V1² + V2² < 1 )
            Accept
        Else
            Reject
        Endif
    End Repeat                                        (4.40)

which means that if we define $(V_1, V_2)$ as in equation (4.39), and then process the pairs $(V_1, V_2)$ using algorithm (4.40) we have that $(V_1, V_2)$ are uniformly distributed on the disk centered at the origin, with radius one, in the $(V_1, V_2)$ plane. This is denoted by

$$(V_1, V_2) \sim D(0, 1) . \tag{4.41}$$

If $(V_1, V_2) \sim D(0, 1)$ and $R^2 = V_1^2 + V_2^2$, then the probability of finding $R$ in $[R, R + dR]$ is

$$p(R)\,dR = \frac{2\pi R\,dR}{\pi(1)^2} = 2R\,dR . \tag{4.42}$$

From the fundamental law of transformation of probabilities, we have that

$$p(R^2)\,d(R^2) = p(R)\,dR = 2R\,dR \tag{4.43}$$

so that

$$p(R^2) = \frac{2R}{\dfrac{d(R^2)}{dR}} = 1 \tag{4.44}$$

so that $R^2$ is uniformly distributed on $[0, 1]$, ($R^2 \sim U[0, 1]$).

As well, if $\theta = \tan^{-1}(V_2/V_1)$, i.e. $\theta$ is the angle between a line from the origin to the point $(V_1, V_2)$ and the $V_1$ axis, then $\theta \sim U[0, 2\pi]$. Note that

$$\begin{aligned}
\cos\theta &= \frac{V_1}{\sqrt{V_1^2 + V_2^2}} \\
\sin\theta &= \frac{V_2}{\sqrt{V_1^2 + V_2^2}} .
\end{aligned} \tag{4.45}$$

Now in the original Box Muller algorithm (4.38),

$$\begin{aligned}
\rho = \sqrt{-2\log U_1} &; \qquad U_1 \sim U[0, 1] \\
\theta = 2\pi U_2 &; \qquad U_2 \sim U[0, 1] ,
\end{aligned} \tag{4.46}$$

but $\theta = \tan^{-1}(V_2/V_1) \sim U[0, 2\pi]$, and $R^2 \sim U[0, 1]$. Therefore, if we let $W = R^2$, then we can replace $\theta$, $\rho$ in algorithm (4.38) by

$$\begin{aligned}
\theta &= \tan^{-1}\left(\frac{V_2}{V_1}\right) \\
\rho &= \sqrt{-2\log W} .
\end{aligned} \tag{4.47}$$

Now, the last step in the Box Muller algorithm (4.38) is

$$\begin{aligned}
Z_1 &= \rho\cos\theta \\
Z_2 &= \rho\sin\theta ,
\end{aligned} \tag{4.48}$$

but since $W = R^2 = V_1^2 + V_2^2$, then $\cos\theta = V_1/R$, $\sin\theta = V_2/R$, so that

$$\begin{aligned}
Z_1 &= \rho\frac{V_1}{\sqrt{W}} \\
Z_2 &= \rho\frac{V_2}{\sqrt{W}} .
\end{aligned} \tag{4.49}$$

This leads to the following algorithm

Polar form of Box Muller

    Repeat
        Generate U1 ∼ U[0, 1], U2 ∼ U[0, 1].
        Let
            V1 = 2 U1 − 1
            V2 = 2 U2 − 1
            W  = V1² + V2²
        If ( W < 1 ) then
            Z1 = V1 √(−2 log W / W)
            Z2 = V2 √(−2 log W / W)                   (4.50)
        End If
    End Repeat

Consequently, $(Z_1, Z_2)$ are independent (uncorrelated), and $Z_1 \sim N(0, 1)$, and $Z_2 \sim N(0, 1)$. Because of the rejection step (4.40), about $(1 - \pi/4)$ of the random draws in $[-1, +1] \times [-1, +1]$ are rejected (about 21%), but this method is still generally more efficient than brute force Box Muller.

4.4 Speeding up Monte Carlo

Monte Carlo methods are slow to converge, since the error is given by

$$\text{Error} = O\left(\frac{1}{\sqrt{M}}\right)$$

where $M$ is the number of samples. There are many methods which can be used to try to speed up convergence. These are usually termed Variance Reduction techniques.

Perhaps the simplest idea is the Antithetic Variable method. Suppose we compute a random asset path

$$S^{i+1} = S^i + S^i\mu\Delta t + S^i\sigma\phi^i\sqrt{\Delta t}$$

where $\phi^i$ are $N(0, 1)$. We store all the $\phi^i$, $i = 1, \ldots$, for a given path. Call the estimate for the option price from this sample path $V^+$. Then compute a second sample path where $(\phi^i)' = -\phi^i$, $i = 1, \ldots$. Call this estimate $V^-$. Then compute the average

$$\bar{V} = \frac{V^+ + V^-}{2} ,$$

and continue sampling in this way. Averaging over all the $\bar{V}$, slightly faster convergence is obtained. Intuitively, we can see that this symmetrizes the random paths.
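The polar form of Box Muller (4.50) described earlier on this page translates almost line for line into code. This is a minimal sketch with a hypothetical function name; it also rejects the (extremely unlikely) draw $W = 0$, which the pseudocode does not mention, because $\log W$ is undefined there.

```python
import math
import random


def polar_box_muller(rng):
    """Return two independent N(0, 1) draws using the polar form, algorithm (4.50)."""
    while True:
        v1 = 2.0 * rng.random() - 1.0          # V1 uniform on [-1, 1)
        v2 = 2.0 * rng.random() - 1.0          # V2 uniform on [-1, 1)
        w = v1 * v1 + v2 * v2
        if 0.0 < w < 1.0:                      # accept only points inside the unit disk
            factor = math.sqrt(-2.0 * math.log(w) / w)
            return v1 * factor, v2 * factor
        # otherwise reject this pair and draw again


if __name__ == "__main__":
    rng = random.Random(9)
    draws = []
    for _ in range(50000):
        draws.extend(polar_box_muller(rng))
    mean = sum(draws) / len(draws)
    var = sum((z - mean) ** 2 for z in draws) / (len(draws) - 1)
    print("sample mean:", mean, "sample variance:", var)
```

No trigonometric functions are evaluated; the price is that roughly one pair in five is thrown away.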

Let $X^+$ be the option values obtained from all the $V^+$ simulations, and $X^-$ be the estimates obtained from all the $V^-$ simulations. Note that $\mathrm{Var}(X^+) = \mathrm{Var}(X^-)$ (they have the same distribution). Then

$$\begin{aligned}
\mathrm{Var}\left(\frac{X^+ + X^-}{2}\right) &= \frac{1}{4}\mathrm{Var}(X^+) + \frac{1}{4}\mathrm{Var}(X^-) + \frac{1}{2}\mathrm{Cov}(X^+, X^-) \\
 &= \frac{1}{2}\mathrm{Var}(X^+) + \frac{1}{2}\mathrm{Cov}(X^+, X^-)
\end{aligned} \tag{4.51}$$

which will be smaller than $\mathrm{Var}(X^+)$ if $\mathrm{Cov}(X^+, X^-)$ is nonpositive (in that case it is at most $\mathrm{Var}(X^+)/2$, the variance of the mean of two independent samples). Warning: this is not always the case. For example, if the payoff is not a monotonic function of $S$, the results may actually be worse than crude Monte Carlo. For example, if the payoff is a capped call

$$\begin{aligned}
\text{payoff} &= \min(K_2, \max(S - K_1, 0)) \\
K_2 &> K_1
\end{aligned}$$

then the antithetic method performs poorly.

Note that this method can be used to estimate the mean. In the MC error estimator (4.13), compute the standard deviation of the estimator as $\omega = \sqrt{\mathrm{Var}\left(\frac{X^+ + X^-}{2}\right)}$.

However, if we want to estimate the distribution of option prices (i.e. a probability distribution), then we should not average each $V^+$ and $V^-$, since this changes the variance of the actual distribution.

If we want to know the actual variance of the distribution (and not just the mean), then to compute the variance of the distribution, we should just use the estimates $V^+$, and compute the estimate of the variance in the usual way. This should also be used if we want to plot a histogram of the distribution, or compute the Value at Risk.

4.5 Estimating the mean and variance

An estimate of the mean $\bar{x}$ and variance $s_M^2$ of $M$ numbers $x_1, x_2, \ldots, x_M$ is

$$\begin{aligned}
s_M^2 &= \frac{1}{M - 1}\sum_{i=1}^{M}(x_i - \bar{x})^2 \\
\bar{x} &= \frac{1}{M}\sum_{i=1}^{M}x_i
\end{aligned} \tag{4.52}$$

Alternatively, one can use

$$s_M^2 = \frac{1}{M - 1}\left(\sum_{i=1}^{M}x_i^2 - \frac{1}{M}\left(\sum_{i=1}^{M}x_i\right)^2\right) \tag{4.53}$$

which has the advantage that the estimate of the mean and standard deviation can be computed in one loop.

In order to avoid roundoff, the following method is suggested by Seydel (R. Seydel, Tools for Computational Finance, Springer, 2002). Set

$$\alpha_1 = x_1 ; \qquad \beta_1 = 0 \tag{4.54}$$

then compute recursively

$$\begin{aligned}
\alpha_i &= \alpha_{i-1} + \frac{x_i - \alpha_{i-1}}{i} \\
\beta_i &= \beta_{i-1} + \frac{(i - 1)(x_i - \alpha_{i-1})^2}{i}
\end{aligned} \tag{4.55}$$

so that

$$\begin{aligned}
\bar{x} &= \alpha_M \\
s_M^2 &= \frac{\beta_M}{M - 1}
\end{aligned} \tag{4.56}$$
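The recursion (4.54)-(4.56) needs only one pass over the data and two running quantities. A minimal sketch follows; the function name is hypothetical, and starting from $\alpha_0 = 0$, $\beta_0 = 0$ is an implementation convenience that reproduces $\alpha_1 = x_1$, $\beta_1 = 0$ on the first step.

```python
def running_mean_variance(xs):
    """One-pass mean and sample variance via the recursion (4.54)-(4.56)."""
    alpha = 0.0
    beta = 0.0
    count = 0
    for count, x in enumerate(xs, start=1):
        delta = x - alpha                              # x_i - alpha_{i-1}
        alpha += delta / count                         # first line of (4.55)
        beta += (count - 1) * delta * delta / count    # second line of (4.55)
    if count < 2:
        raise ValueError("need at least two values for a sample variance")
    return alpha, beta / (count - 1)                   # equation (4.56)


if __name__ == "__main__":
    # Values with a large common offset, where the one-loop formula (4.53)
    # subtracts two nearly equal large numbers.
    data = [1.0e9 + i for i in (1.0, 2.0, 3.0, 4.0)]
    print(running_mean_variance(data))
```

Because the update works with differences from the running mean rather than with sums of squares, a large common offset in the data does not swamp the variance.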

4.6 Low Discrepancy Sequences

In an effort to get around the $1/\sqrt{M}$ ($M$ = number of samples) behaviour of Monte Carlo methods, quasi-Monte Carlo methods have been devised. These techniques use a deterministic sequence of numbers (low discrepancy sequences, LDS). The idea here is that a Monte Carlo method does not fill the sample space very evenly (after all, it's random). A low discrepancy sequence tends to sample the space in an orderly fashion. If $d$ is the dimension of the space, then the worst case error bound for an LDS method is

$$\text{Error} = O\!\left(\frac{(\log M)^d}{M}\right) \tag{4.57}$$

where $M$ is the number of samples used. Clearly, if $d$ is small, then this error bound is (at least asymptotically) better than Monte Carlo.

LDS methods generate numbers on $[0, 1]$. We cannot use the Box-Muller method in this case to produce normally distributed numbers, since these numbers are deterministic. We have to invert the cumulative normal distribution in order to get numbers distributed with mean zero and standard deviation one on $[-\infty, +\infty]$. So, if $F(x)$ is the inverse cumulative normal distribution, then

$$x_{LDS} = \text{uniformly distributed on } [0, 1], \qquad y_{LDS} = F(x_{LDS}) \text{ is } N(0, 1). \tag{4.58}$$

Another problem has to do with the fact that if we are stepping through time, i.e.

$$S^{n+1} = S^n + S^n\left(r\Delta t + \phi\,\sigma\sqrt{\Delta t}\right), \qquad \phi = N(0, 1) \tag{4.59}$$

with, say, $N$ steps in total, then we need to think of this as a problem in $N$ dimensional space. In other words, the $k$-th timestep is sampled from the $k$-th coordinate in this $N$ dimensional space. We are trying to uniformly sample from this $N$ dimensional space.

Let $\hat{x}$ be a vector of LDS numbers on $[0, 1]$, in $N$ dimensional space

$$\hat{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix}. \tag{4.60}$$

So, an LDS algorithm would proceed as follows, for the $j$-th trial

• Generate $\hat{x}^j$ (the $j$-th LDS number in an $N$ dimensional space).

• Generate the normally distributed vector $\hat{y}^j$ by inverting the cumulative normal distribution for each component

$$\hat{y}^j = \begin{bmatrix} F(x_1^j) \\ F(x_2^j) \\ \vdots \\ F(x_N^j) \end{bmatrix} \tag{4.61}$$

• Generate a complete sample path, $k = 0, \ldots, N-1$

$$S_j^{k+1} = S_j^k + S_j^k\left(r\Delta t + \hat{y}^j_{k+1}\,\sigma\sqrt{\Delta t}\right) \tag{4.62}$$

• Compute the payoff at $S = S_j^N$
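The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used for any results in these notes: it uses a Halton sequence (chosen only because it is short to write down), a European put payoff, and discounts the average payoff at the risk free rate. The function names are hypothetical. The inverse cumulative normal $F$ is taken from the standard library (`statistics.NormalDist().inv_cdf`).

```python
import math
from statistics import NormalDist


def first_primes(n):
    """Return the first n primes, used as Halton bases (one per timestep)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def radical_inverse(index, base):
    """Reflect the base-`base` digits of index (>= 1) about the radix point."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def lds_put_price(s0, strike, r, sigma, T, n_steps, n_trials):
    """Average discounted put payoff over n_trials Halton-driven Euler paths."""
    dt = T / n_steps
    bases = first_primes(n_steps)
    inv_cdf = NormalDist().inv_cdf                    # F in (4.58)
    total = 0.0
    for j in range(1, n_trials + 1):                  # start at 1: no coordinate is 0
        x = [radical_inverse(j, b) for b in bases]    # x^j, eq. (4.60)
        y = [inv_cdf(u) for u in x]                   # y^j, eq. (4.61)
        s = s0
        for k in range(n_steps):                      # sample path, eq. (4.62)
            s = s + s * (r * dt + y[k] * sigma * math.sqrt(dt))
        total += max(strike - s, 0.0)                 # payoff at S = S_j^N (assumed put)
    return math.exp(-r * T) * total / n_trials        # discounting is an assumption here


if __name__ == "__main__":
    # Illustrative parameters only.
    print(lds_put_price(100.0, 100.0, 0.05, 0.2, 1.0, n_steps=4, n_trials=2000))
```

Halton points with one prime base per timestep are known to lose uniformity as the number of dimensions grows, so this sketch is only sensible for a small number of timesteps.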

The option value is the average of these trials.

There are a variety of LDS numbers: Halton, Sobol, Niederreiter, etc. Our tests seem to indicate that Sobol is the best.

Note that the worst case error bound for the error is given by equation (4.57). If we use a reasonable number of timesteps, say 50 − 100, then $d = 50 - 100$, which gives a very bad error bound. For $d$ large, the numerator in equation (4.57) dominates. The denominator only dominates when

$$M \simeq e^d \tag{4.63}$$

which is a very large number of trials for $d \simeq 100$. Fortunately, at least for path-dependent options, we have found that things are not quite this bad, and LDS seems to work if the number of timesteps is less than 100 − 200. However, once the dimensionality gets above a few hundred, convergence seems to slow down.

4.7 Correlated Random Numbers

In many cases involving multiple assets, we would like to generate correlated, normally distributed random numbers. Suppose we have $i = 1, \ldots, d$ assets, and each asset follows the simulated path

$$S_i^{n+1} = S_i^n + S_i^n\left(r\Delta t + \phi_i^n\,\sigma_i\sqrt{\Delta t}\right) \tag{4.64}$$

where $\phi_i^n$ is $N(0, 1)$ and

$$E(\phi_i^n \phi_j^n) = \rho_{ij} \tag{4.65}$$

where $\rho_{ij}$ is the correlation between asset $i$ and asset $j$.

Now, it is easy to generate a set of $d$ uncorrelated $N(0, 1)$ variables. Call these $\epsilon_1, \ldots, \epsilon_d$. So, how do we produce correlated numbers? Let

$$[\Psi]_{ij} = \rho_{ij} \tag{4.66}$$

be the matrix of correlation coefficients. Assume that this matrix is SPD (if not, one of the random variables is a linear combination of the others, hence this is a degenerate case). Assuming $\Psi$ is SPD, we can Cholesky factor $\Psi = LL^t$, so that

$$\rho_{ij} = \sum_k L_{ik} L^t_{kj} \tag{4.67}$$

Let $\bar{\phi}$ be the vector of correlated normally distributed random numbers (i.e. what we want to get), and let $\bar{\epsilon}$ be the vector of uncorrelated $N(0, 1)$ numbers (i.e. what we are given),

$$\bar{\phi} = \begin{bmatrix} \phi_1 \\ \phi_2 \\ \vdots \\ \phi_d \end{bmatrix}; \qquad \bar{\epsilon} = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_d \end{bmatrix} \tag{4.68}$$

So, given $\bar{\epsilon}$, we have

$$E(\epsilon_i \epsilon_j) = \delta_{ij}, \qquad \delta_{ij} = \begin{cases} 0, & \text{if } i \neq j \\ 1, & \text{if } i = j \end{cases}$$

since the $\epsilon_i$ are uncorrelated. Now, let

$$\phi_i = \sum_j L_{ij}\,\epsilon_j \tag{4.69}$$

which gives

$$\phi_i \phi_k = \sum_j \sum_l L_{ij} L_{kl}\,\epsilon_l \epsilon_j = \sum_j \sum_l L_{ij}\,\epsilon_l \epsilon_j\, L^t_{lk}. \tag{4.70}$$

Now,

$$E(\phi_i \phi_k) = E\left[\sum_j \sum_l L_{ij}\,\epsilon_l \epsilon_j\, L^t_{lk}\right] = \sum_j \sum_l L_{ij}\, E(\epsilon_l \epsilon_j)\, L^t_{lk} = \sum_j \sum_l L_{ij}\,\delta_{lj}\, L^t_{lk} = \sum_l L_{il} L^t_{lk} = \rho_{ik}$$

So, in order to generate correlated $N(0, 1)$ numbers:

• Factor the correlation matrix $\Psi = LL^t$

• Generate uncorrelated $N(0, 1)$ numbers $\epsilon_i$

• Correlated numbers $\phi_i$ are given from

$$\bar{\phi} = L\bar{\epsilon} \tag{4.71}$$

4.8 Integration of Stochastic Differential Equations

Up to now, we have been fairly slack about defining what we mean by convergence when we use forward Euler timestepping (4.2) to integrate

$$dS = \mu S\,dt + \sigma S\,dZ. \tag{4.72}$$

The forward Euler algorithm is simply

$$S^{i+1} = S^i + S^i\left(\mu h + \sigma\phi^i\sqrt{h}\right) \tag{4.73}$$

where $h = \Delta t$ is the finite timestep. For a good overview of these methods, check out "An algorithmic introduction to numerical simulation of stochastic differential equations," by D. Higham, SIAM Review vol. 43 (2001) 525-546. This article also has some good tips on solving SDEs using Matlab, in particular, taking full advantage of the vectorization of Matlab. Note that eliminating as many for loops as possible (i.e. computing all the MC realizations for each timestep in a vector) can reduce computation time by orders of magnitude.

Before we start defining what we mean by convergence, let's consider the following situation. Recall that

$$dZ = \phi\sqrt{dt} \tag{4.74}$$

where $\phi$ is a random draw from a normal distribution with mean zero and variance one. Let's imagine generating a set of $Z$ values at discrete times $t_i$, e.g. $Z(t_i) = Z_i$, by

$$Z_{i+1} = Z_i + \phi\sqrt{\Delta t}. \tag{4.75}$$

Now, these are all legitimate points along a Brownian motion path, since there is no timestepping error here, in view of equation (2.53). So, this set of values $\{Z_0, Z_1, \ldots\}$ are valid points along a Brownian path. Now, recall that the exact solution (for a given Brownian path) of equation (4.72) is given by equation (2.57)

$$S(T) = S(0)\exp\left[\left(\mu - \frac{\sigma^2}{2}\right)T + \sigma\,(Z(T) - Z(0))\right] \tag{4.76}$$

where $T$ is the stopping time of the simulation.

Now if we integrate equation (4.72) using forward Euler, with the discrete timesteps $\Delta t = t_{i+1} - t_i$, using the realization of the Brownian path $\{Z_0, Z_1, \ldots\}$, we will not get the exact solution (4.76). This is because even though the Brownian path points are exact, time discretization errors are introduced in equation (4.73). So, how can we systematically study convergence of algorithm (4.73)? We can simply take smaller timesteps. However, we want to do this by filling in new $Z$ values in the Brownian path, while keeping the old values (since these are perfectly legitimate values). Let $S(T)^h$ represent the forward Euler solution (4.73) for a fixed timestep $h$. Let $S(T)$ be the exact solution (4.76). As $h \to 0$, we would expect $S(T)^h \to S(T)$, for a given path.

4.8.1 The Brownian Bridge

So, given a set of valid $Z_k$, how do we refine this path, keeping the existing points along this path? In particular, suppose we have two points $Z_i, Z_k$, at $(t_i, t_k)$, and we would like to generate a point $Z_j$ at $t_j$, with $t_i < t_j < t_k$. How should we pick $Z_j$? What density function should we use when generating $Z_j$, given that $Z_k$ is known?

Let $x, y$ be two draws from a normal distribution with mean zero and variance one. Suppose we have the point $Z(t_i) = Z_i$ and we generate $Z(t_j) = Z_j$, $Z(t_k) = Z_k$ along the Wiener path,

$$Z_j = Z_i + x\sqrt{t_j - t_i} \tag{4.77}$$

$$Z_k = Z_j + y\sqrt{t_k - t_j} \tag{4.78}$$

$$Z_k = Z_i + x\sqrt{t_j - t_i} + y\sqrt{t_k - t_j}. \tag{4.79}$$

So, given $(x, y)$, and $Z_i$, we can generate $Z_j, Z_k$. Suppose on the other hand, we have $Z_i$, and we generate $Z_k$ directly using

$$Z_k = Z_i + z\sqrt{t_k - t_i}, \tag{4.80}$$

where $z$ is $N(0, 1)$. Then how do we generate $Z_j$ using equation (4.77)? Since we are now specifying that we know $Z_k$, this means that our method for generating $Z_j$ is constrained. For example, given $z$, we must have that, from equations (4.79) and (4.80)

$$y = \frac{z\sqrt{t_k - t_i} - x\sqrt{t_j - t_i}}{\sqrt{t_k - t_j}}. \tag{4.81}$$

Now the probability density of drawing the pair $(x, y)$ given $z$, denoted by $p(x, y|z)$, is

$$p(x, y|z) = \frac{p(x)\,p(y)}{p(z)} \tag{4.82}$$

where $p(\cdot)$ is a standard normal distribution, and we have used the fact that successive increments of a Brownian process are uncorrelated.

From equation (4.81), we can write $y = y(x, z)$, so that $p(x|z) = p(x, y(x, z)|z)$

$$p(x, y(x, z)|z) = \frac{p(x)\,p(y(x, z))}{p(z)} = \frac{1}{\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(x^2 + y^2 - z^2\right)\right] \tag{4.83}$$

or (after some algebra, using equation (4.81))

$$p(x|z) = \frac{1}{\sqrt{2\pi}}\exp\left[-\frac{1}{2}\,\frac{(x - \alpha z)^2}{\beta^2}\right], \qquad \alpha = \sqrt{\frac{t_j - t_i}{t_k - t_i}}, \qquad \beta = \sqrt{\frac{t_k - t_j}{t_k - t_i}} \tag{4.84}$$

so that $x$ is normally distributed with mean $\alpha z$ and variance $\beta^2$. Since

$$z = \frac{Z_k - Z_i}{\sqrt{t_k - t_i}} \tag{4.85}$$

we have that $x$ has mean

$$E(x) = \frac{\sqrt{t_j - t_i}}{t_k - t_i}\,(Z_k - Z_i) \tag{4.86}$$

and variance

$$E[(x - E(x))^2] = \frac{t_k - t_j}{t_k - t_i}. \tag{4.87}$$

Now, let

$$x = \frac{\sqrt{t_j - t_i}}{t_k - t_i}\,(Z_k - Z_i) + \phi\sqrt{\frac{t_k - t_j}{t_k - t_i}} \tag{4.88}$$

where $\phi$ is $N(0, 1)$. Clearly, $x$ satisfies equations (4.86) and (4.87). Substituting equation (4.88) into (4.77) gives

$$Z_j = \left(\frac{t_k - t_j}{t_k - t_i}\right)Z_i + \left(\frac{t_j - t_i}{t_k - t_i}\right)Z_k + \phi\sqrt{\frac{(t_j - t_i)(t_k - t_j)}{t_k - t_i}} \tag{4.89}$$

where $\phi$ is $N(0, 1)$. Equation (4.89) is known as the Brownian Bridge.

Figure 4.1 shows different Brownian paths constructed for different timestep sizes. An initial coarse path is constructed, then the fine timestep path is constructed from the coarse path using a Brownian Bridge. By construction, the fine timestep path will pass through the coarse timestep nodes.

Figure 4.2 shows the asset paths integrated using the forward Euler algorithm (4.73) fed with the Brownian paths in Figure 4.1. In this case, note that the fine timestep path does not coincide with the coarse timestep nodes, due to the timestepping error.

4.8.2 Strong and Weak Convergence

Since we are dealing with a probabilistic situation here, it is not obvious how to define convergence. Given a number of points along a Brownian path, we could imagine refining this path (using a Brownian Bridge), and then seeing if the solution converged to the exact solution. For the model SDE (4.72), we could ask that

$$E\left[\,|S(T) - S^h(T)|\,\right] \leq \text{Const.}\; h^\gamma \tag{4.90}$$

[Figure 4.1 (two panels, Brownian Path versus Time): Effect of adding more points to a Brownian path using a Brownian bridge. Note that the small timestep points match the coarse timestep points. Left: each coarse timestep is divided into 16 substeps. Right: each coarse timestep divided into 64 substeps.]

[Figure 4.2 (two panels, Asset Price versus Time): Brownian paths shown in Figure 4.1 used to determine asset price paths using forward Euler timestepping (4.73). In this case, note that the asset paths for fine and coarse timestepping do not agree at the final time (due to the timestepping error). Eventually, for small enough timesteps, the final asset value will converge to the exact solution to the SDE. Left: each coarse timestep is divided into 16 substeps. Right: each coarse timestep divided into 64 substeps.]
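The construction behind Figures 4.1 and 4.2 can be sketched as follows. This is a minimal illustration under stated assumptions: each refinement inserts the midpoint of every interval using the Brownian Bridge (4.89), forward Euler (4.73) is driven by the increments of the refined path, and the exact solution (4.76) is evaluated on the same path with $Z(0) = 0$. The function names and parameter values are hypothetical.

```python
import math
import random


def refine_path(times, z, rng):
    """Insert one Brownian Bridge point (4.89) at the midpoint of each interval."""
    new_t, new_z = [times[0]], [z[0]]
    for i in range(len(times) - 1):
        ti, tk = times[i], times[i + 1]
        tj = 0.5 * (ti + tk)
        phi = rng.gauss(0.0, 1.0)
        zj = ((tk - tj) * z[i] + (tj - ti) * z[i + 1]) / (tk - ti) \
            + phi * math.sqrt((tj - ti) * (tk - tj) / (tk - ti))
        new_t += [tj, tk]          # the old node tk is kept unchanged
        new_z += [zj, z[i + 1]]
    return new_t, new_z


def euler_final(s0, mu, sigma, times, z):
    """Forward Euler (4.73), using the path increments Z_{i+1} - Z_i = phi*sqrt(h)."""
    s = s0
    for i in range(len(times) - 1):
        s += s * (mu * (times[i + 1] - times[i]) + sigma * (z[i + 1] - z[i]))
    return s


def exact_final(s0, mu, sigma, T, z_T):
    """Exact solution (4.76) along the same Brownian path, with Z(0) = 0."""
    return s0 * math.exp((mu - 0.5 * sigma ** 2) * T + sigma * z_T)


if __name__ == "__main__":
    rng = random.Random(0)
    s0, mu, sigma, T = 100.0, 0.06, 0.4, 0.25      # illustrative values
    n_coarse = 8
    h = T / n_coarse
    times = [i * h for i in range(n_coarse + 1)]
    z = [0.0]
    for _ in range(n_coarse):                      # coarse path, eq. (4.75)
        z.append(z[-1] + rng.gauss(0.0, 1.0) * math.sqrt(h))
    exact = exact_final(s0, mu, sigma, T, z[-1])
    for _ in range(6):
        approx = euler_final(s0, mu, sigma, times, z)
        print(len(times) - 1, abs(exact - approx)) # timesteps, error on this path
        times, z = refine_path(times, z, rng)
```

Because the end point $Z(T)$ is never changed by the refinement, the exact value is computed once and every refinement level is compared against it. The error on a single path need not decrease monotonically.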

where the expectation in equation (4.90) is over many Brownian paths, and $h$ is the timestep size. Note that $S(T)$ is the exact solution along a particular Brownian path; the same path used to compute $S^h(T)$. Criterion (4.90) is called strong convergence. A less strict criterion is

$$\left|E[S(T)] - E[S^h(T)]\right| \leq \text{Const.}\; h^\gamma \tag{4.91}$$

It can be shown that using forward Euler results in weak convergence with $\gamma = 1$, and strong convergence with $\gamma = .5$.

Table 4.1 shows some test data used to integrate the SDE (4.72) using method (4.73). A series of Brownian paths was constructed, beginning with a coarse timestep path. These paths were systematically refined using the Brownian Bridge construction.

| Parameter | Value |
|---|---|
| $T$ | .25 |
| $\sigma$ | .4 |
| $\mu$ | .06 |
| $S_0$ | 100 |

Table 4.1: Data used in the convergence tests.

Table 4.2 shows results where the strong and weak convergence errors are estimated as

$$\text{Strong Error} = \frac{1}{N}\sum_{i=1}^{N}\left|S(T)_i - S^h(T)_i\right| \tag{4.92}$$

$$\text{Weak Error} = \left|\frac{1}{N}\sum_{i=1}^{N} S(T)_i - \frac{1}{N}\sum_{i=1}^{N} S^h(T)_i\right|, \tag{4.93}$$

where $S^h(T)_i$ is the solution obtained by forward Euler timestepping along the $i$-th Brownian path, $S(T)_i$ is the exact solution along this same path, and $N$ is the number of samples. Note that for equation (4.72), we have the exact solution

$$\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N} S(T)_i = S_0 e^{\mu T} \tag{4.94}$$

but we do not replace the approximate sampled value of the limit in equation (4.93) by the theoretical limit (4.94). If we use enough Monte Carlo samples, we could replace the approximate expression

$$\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N} S(T)_i$$

by $S_0 e^{\mu T}$, but for normal parameters, the Monte Carlo sampling error is much larger than the timestepping error, so we would have to use an enormous number of Monte Carlo samples. Estimating the weak error using equation (4.93) will measure the timestepping error, as opposed to the Monte Carlo sampling error. However, for normal parameters, even using equation (4.93) requires a large number of Monte Carlo samples in order to ensure that the error is dominated by the timestepping error.

In Table 4.2, we can see that each time the number of timesteps is doubled, the ratio of successive errors is about $\sqrt{2}$ for the strong error, and approaches two for the weak error at the finer timesteps. This is consistent with a convergence rate of $\gamma = .5$ for strong convergence, and $\gamma = 1.0$ for weak convergence.

4.9 Matlab and Monte Carlo Simulation

A straightforward implementation of Monte Carlo timestepping for solving the SDE

$$dS = \mu S\,dt + \sigma S\,dZ \tag{4.95}$$

in Matlab is shown in Algorithm (4.96). This code runs very slowly.

| Timesteps | Strong Error (4.90) | Weak Error (4.91) |
|---|---|---|
| 72 | .0269 | .00194 |
| 144 | .0190 | .00174 |
| 288 | .0135 | .00093 |
| 576 | .0095 | .00047 |

Table 4.2: Convergence results, 100,000 samples used. Data in Table 4.1.

Slow.m

    randn('state',100);

    T = 1.00;        % expiry time
    sigma = 0.25;    % volatility
    mu = .10;        % P measure drift
    S_init = 100;    % initial value
    N_sim = 10000;   % number of simulations
    N = 100;         % number of timesteps
    delt = T/N;      % timestep

    drift = mu*delt;
    sigma_sqrt_delt = sigma*sqrt(delt);

    S_new = zeros(N_sim,1);

    for m=1:N_sim
        S = S_init;
        for i=1:N % timestep loop
            S = S + S*( drift + sigma_sqrt_delt*randn(1,1) );
            S = max(0.0, S );
            % check to make sure that S_new cannot be < 0
        end % timestep loop
        S_new(m,1) = S;
    end % simulation loop

    n_bin = 200;
    hist(S_new, n_bin);

    stndrd_dev = std(S_new);
    disp(sprintf('standard deviation: %.5g\n',stndrd_dev));

    mean_S = mean(S_new);
    disp(sprintf('mean: %.5g\n',mean_S));   % print the mean, not stndrd_dev

(4.96)

Alternatively, we can use Matlab's vectorization capabilities to interchange the timestep loop and the simulation loop. The innermost simulation loop can be replaced by vector statements, as shown in Algorithm (4.97). This runs much faster.

1) ). % T = 1.100). for i=1:N % timestep loop % now. stndrd_dev = std(S_new). for each timestep. S_new = zeros(N_sim.25.*( drift + sigma_sqrt_delt*randn(N_sim.00. sigma_sqrt_delt = sigma*sqrt(delt).stndrd_dev)). generate info for % all simulations S_new(:.1) +.97) 5 The Binomial Model We have seen that a problem with the Monte Carlo method is that it is diﬃcult to use for valuing American style options. N = 100.m randn(’state’.10.1) ). mu = . mean_S = mean(S_new). S_old = zeros(N_sim.5g\n’.1). % % % % % % % expiry time volatility P measure drift initial value number of simulations number of timesteps timestep drift = mu*delt. disp(sprintf(’mean: %. N_sim = 10000. S_new(:. sigma = 0.1) = max(0.5g\n’..1) = S_old(:. S_old(1:N_sim. (4.stndrd_dev)).Fast.1). Recall that the holder of an American option can exercise the option at any time and receive 35 . % % end of generation of all data for all simulations % for this timestep end % timestep loop n_bin = 200. hist(S_new.1). S_init = 100. S_old(:. S_new(:..1) = S_init. n_bin). % check to make sure that S_new cannot be < 0 S_old(:.1). delt = T/N.0. disp(sprintf(’standard deviation: %.1) = S_new(:.

For example.4).. If we are at node j. then the risk neutral process for X is (from equation (2. If we let X = log S. (5. as shown on Figure 5. Recall that we can determine the no-arbitrage value of an option by pretending we live in a risk-neutral world. the continuation value depends on what happens in the future.. if we are valuing a put option. The value today is the discounted expected future value 36 . This is actually a dynamic programming problem. all we have to do is let let α = r − σ . We ﬁrst set the value of the option at T = N ∆t to the payoﬀ.5) So. consider that at node (j. and hence we don’t know how to act optimally. At any point in time. Recall that X = log S. then we hold. These sorts of problems are usually solved by proceeding from the end point backwards. we can use the risk neutral world idea to determine the no-arbitrage value of the option (it is the expected value in the risk neutral world). where risky assets drift at r and are discounted at r.5 we showed that ∆X = σ ∆t. . . If the continuation value is greater than the payoﬀ. n). so that (S = e ) n+1 Sj+1 n+1 Sj n = Sj eσ √ ∆t ∆t √ n = Sj e−σ (5. j = 0.3) √ X Now.1) is formally identical to equation 2 (2. 0) . so that in terms of asset price. In order to ensure that in the limit as ∆t → 0. then VjN = N max(K − Sj . we require that the sizes of √ the random jumps are ∆X = σ ∆t and that the probabilities of up (p) and down (q) moves are 1 α√ pr = [1 + ∆t] 2 σ 1 r σ √ = [1 + − ∆t] 2 σ 2 √ 1 α qr = [1 − ∆t] 2 σ 1 r σ √ = [1 − − ∆t] . We use the same idea here. the ﬁrst step in the process is to construct a tree of stock price values. Now. otherwise. we can construct a discrete approximation to this random walk using the lattice discussed in Section 2 2. n (5. with probability pr with probability q r (5. this is Sj = eXj . Now.55) ) σ2 )dt + σdZ . so that equation (5.1). n+1 → Sj .4) or n Sj 0 = S0 e(2j−n)σ √ ∆t . as in the Monte Carlo approach. q. since in Section 2. In fact. 
we get the process (5. the asset can move up with probability pr and down with probability r q .1) 2 Now. timestep n. if we simulate forward in time.. We can do this by working backward through the lattice. Associated with each stock price on the lattice is the option value Vjn .the payoﬀ. j = 0. In order to determine whether or not it is worthwhile to hold the option. Clearly. we don’t know what happens in the future. we exercise. N (5.5.6) Then. (5.. we have to compare the value of continuing to hold the option (the continuation value) with the payoﬀ. we will switch to a more common notation. In other words dX = (r − n Sj n Sj n+1 → Sj+1 . we will denote this n n n node location by Xj .2) 2 σ 2 where we have denoted the risk neutral probabilities by pr and q r to distinguish them from the real probabilities p.. We have to start from the terminal time and work backwards.

[Figure 5.1: Lattice of stock price values, with nodes $S_0^0$; $S_0^1, S_1^1$; $S_0^2, S_1^2, S_2^2$.]

European Lattice Algorithm

$$V_j^n = e^{-r\Delta t}\left(p^r V_{j+1}^{n+1} + q^r V_j^{n+1}\right), \qquad n = N-1, \ldots, 0; \quad j = 0, \ldots, n \tag{5.7}$$

Rolling back through the tree, we obtain the value at $S_0^0$ today, which is $V_0^0$.

If the option is an American put, we can determine if it is optimal to hold or exercise, since we know the continuation value. In this case the rollback (5.7) becomes

American Lattice Algorithm

$$(V_j^n)^c = e^{-r\Delta t}\left(p^r V_{j+1}^{n+1} + q^r V_j^{n+1}\right)$$

$$V_j^n = \max\left((V_j^n)^c,\; \max(K - S_j^n, 0)\right), \qquad n = N-1, \ldots, 0; \quad j = 0, \ldots, n \tag{5.8}$$

which is illustrated in Figure 5.1.

The binomial lattice method has the following advantages

• It is very easy to code for simple cases.

• It is easy to explain to managers.

• American options are easy to handle.

However, the binomial lattice method has the following disadvantages

• Except for simple cases, coding becomes complex. For example, if we want to handle simple barrier options, things become nightmarish.

• This method is algebraically identical to an explicit finite difference solution of the Black-Scholes equation. Consequently, convergence is at an $O(\Delta t)$ rate.

• The probabilities $p^r, q^r$ are not real probabilities, they are simply the coefficients in a particular discretization of a PDE. Regarding them as probabilities leads to much fuzzy thinking, and complex wrong-headed arguments.

If we are going to solve the Black-Scholes PDE, we might as well do it right, and not fool around with lattices.

[Figure 5.2: Backward recursion step. $V_j^n$ is obtained from $V_{j+1}^{n+1}$ (weight $p$) and $V_j^{n+1}$ (weight $q$).]

5.1 A No-arbitrage Lattice

We can also derive the lattice method directly from the discrete lattice model in Section 2.5. Suppose we assume that

$$dS = \mu S\,dt + \sigma S\,dZ \tag{5.9}$$

and letting $X = \log S$, we have that

$$dX = \left(\mu - \frac{\sigma^2}{2}\right)dt + \sigma\,dZ \tag{5.10}$$

so that $\alpha = \mu - \frac{\sigma^2}{2}$ in equation (2.19). Now, let's consider the usual hedging portfolio at $t = n\Delta t$, $S = S_j^n$,

$$P_j^n = V_j^n - (\alpha^h)S_j^n, \tag{5.11}$$

where $V_j^n$ is the value of the option at $t = n\Delta t$, $S = S_j^n$. At $t = (n+1)\Delta t$,

$$S_j^n \to S_{j+1}^{n+1}; \text{ with probability } p$$

$$S_j^n \to S_j^{n+1}; \text{ with probability } q$$

$$S_{j+1}^{n+1} = S_j^n\, e^{\sigma\sqrt{\Delta t}}, \qquad S_j^{n+1} = S_j^n\, e^{-\sigma\sqrt{\Delta t}}$$

so that the value of the hedging portfolio at $t = (n+1)\Delta t$ is

$$P_{j+1}^{n+1} = V_{j+1}^{n+1} - (\alpha^h)S_{j+1}^{n+1}, \tag{5.12}$$

$$P_j^{n+1} = V_j^{n+1} - (\alpha^h)S_j^{n+1}. \tag{5.13}$$

Now, as in Section 2.3, we can determine $(\alpha^h)$ so that the value of the hedging portfolio is independent of $p, q$. We do this by requiring that

$$P_{j+1}^{n+1} = P_j^{n+1} \tag{5.14}$$

so that

$$V_{j+1}^{n+1} - (\alpha^h)S_{j+1}^{n+1} = V_j^{n+1} - (\alpha^h)S_j^{n+1}$$

which gives

$$(\alpha^h) = \frac{V_{j+1}^{n+1} - V_j^{n+1}}{S_{j+1}^{n+1} - S_j^{n+1}}. \tag{5.15}$$

Since this portfolio is risk free, it must earn the risk free rate of return, so that

$$P_j^n = e^{-r\Delta t}P_{j+1}^{n+1} = e^{-r\Delta t}P_j^{n+1}. \tag{5.16}$$

Now, substituting for $P_j^n$ from equation (5.11), with $P_{j+1}^{n+1}$ from equation (5.12), and $(\alpha^h)$ from equation (5.15) gives

$$V_j^n = e^{-r\Delta t}\left(p^{r*}V_{j+1}^{n+1} + q^{r*}V_j^{n+1}\right)$$

$$p^{r*} = \frac{e^{r\Delta t} - e^{-\sigma\sqrt{\Delta t}}}{e^{\sigma\sqrt{\Delta t}} - e^{-\sigma\sqrt{\Delta t}}}, \qquad q^{r*} = 1 - p^{r*}. \tag{5.17}$$

Note that $p^{r*}, q^{r*}$ do not depend on the real drift rate $\mu$, which is expected. If we expand $p^{r*}, q^{r*}$ in a Taylor Series, and compare with the $p^r, q^r$ in equations (5.2), we can show that

$$p^{r*} = p^r + O((\Delta t)^{3/2}), \qquad q^{r*} = q^r + O((\Delta t)^{3/2}). \tag{5.18}$$

After a bit more work, one can show that the value of the option at $t = 0$, $V_0^0$, using either $p^{r*}, q^{r*}$ or $p^r, q^r$ is the same to $O(\Delta t)$, which is not surprising, since these methods can both be regarded as an explicit finite difference approximation to the Black-Scholes equation, having truncation error $O(\Delta t)$. The definition $p^{r*}, q^{r*}$ is the common definition in finance books, since the tree has no-arbitrage.

What is the meaning of a no-arbitrage tree? If we are sitting at node $S_j^n$, and assuming that there are only two possible future states

$$S_j^n \to S_{j+1}^{n+1}; \text{ with probability } p$$

$$S_j^n \to S_j^{n+1}; \text{ with probability } q$$

then using $(\alpha^h)$ from equation (5.15) guarantees that the hedging portfolio has the same value in both future states.

But let's be a bit more sensible here. Suppose we are hedging a portfolio of RIM stock. Let $\Delta t$ = one day. Suppose the price of RIM stock is \$10 today. Do we actually believe that tomorrow there are only two possible prices for RIM stock

$$S_{up} = 10\,e^{\sigma\sqrt{\Delta t}}, \qquad S_{down} = 10\,e^{-\sigma\sqrt{\Delta t}}\;? \tag{5.19}$$

Of course not. This is obviously a highly simplified model. The fact that there is no-arbitrage in the context of the simplified model does not really have a lot of relevance to the real-world situation. The best that can be said is that if the Black-Scholes model were perfect, then the portfolio hedging ratios computed using either $p^r, q^r$ or $p^{r*}, q^{r*}$ would both be correct to $O(\Delta t)$.
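The rollbacks (5.7) and (5.8), with either the probabilities (5.2) or the no-arbitrage probabilities (5.17), fit in one short routine. The following is a minimal sketch assuming a put payoff (5.6); the function name, its flags and the parameter values are hypothetical and chosen only for illustration.

```python
import math


def lattice_put(s0, strike, r, sigma, T, n_steps, american=False, no_arbitrage=False):
    """Value a put on the binomial lattice by backward recursion."""
    dt = T / n_steps
    up = math.exp(sigma * math.sqrt(dt))
    if no_arbitrage:
        p = (math.exp(r * dt) - 1.0 / up) / (up - 1.0 / up)           # p^{r*}, eq. (5.17)
    else:
        p = 0.5 * (1.0 + (r / sigma - 0.5 * sigma) * math.sqrt(dt))   # p^r, eq. (5.2)
    q = 1.0 - p
    disc = math.exp(-r * dt)

    def price(j, n):
        return s0 * up ** (2 * j - n)                                 # S_j^n, eq. (5.5)

    # Terminal payoff, eq. (5.6)
    values = [max(strike - price(j, n_steps), 0.0) for j in range(n_steps + 1)]

    for n in range(n_steps - 1, -1, -1):
        new_values = []
        for j in range(n + 1):
            v = disc * (p * values[j + 1] + q * values[j])            # eq. (5.7)
            if american:
                v = max(v, max(strike - price(j, n), 0.0))            # eq. (5.8)
            new_values.append(v)
        values = new_values
    return values[0]                                                  # V_0^0


if __name__ == "__main__":
    args = (100.0, 100.0, 0.06, 0.3, 0.25, 400)   # s0, K, r, sigma, T, N (illustrative)
    print("European, p^r  :", lattice_put(*args))
    print("European, p^r* :", lattice_put(*args, no_arbitrage=True))
    print("American, p^r  :", lattice_put(*args, american=True))
```

Only one time level of option values is stored at a time, which is all the backward recursion needs. Running both probability choices side by side is a simple way to observe that the two values differ only by a small amount that shrinks as the number of timesteps grows.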

6 More on Ito's Lemma

In Section 2.6.1, we mysteriously made the infamous comment

...it can be shown that $dZ^2 \to dt$ as $dt \to 0$

In this Section, we will give some justification for this remark. For a lot more details here, we refer the reader to Stochastic Differential Equations, by Bernt Oksendal, Springer, 1998.

We have to go back here, and decide what the statement

$$dX = \alpha\,dt + c\,dZ \tag{6.1}$$

really means. The only sensible interpretation of this is

$$X(t) - X(0) = \int_0^t \alpha(X(s), s)\,ds + \int_0^t c(X(s), s)\,dZ(s), \tag{6.2}$$

where we can interpret the integrals as the limit, as $\Delta t \to 0$, of a discrete sum. For example,

$$\int_0^t c(X(s), s)\,dZ(s) = \lim_{\Delta t \to 0}\sum_{j=0}^{j=N-1} c_j\,\Delta Z_j$$

$$c_j = c(X(Z_j), t_j), \quad Z_j = Z(t_j), \quad \Delta Z_j = Z(t_{j+1}) - Z(t_j), \quad \Delta t = t_{j+1} - t_j, \quad N = t/(\Delta t) \tag{6.3}$$

In particular, in order to derive Ito's Lemma, we have to decide what

$$\int_0^t c(X(s), s)\,dZ(s)^2 \tag{6.4}$$

means. Replace the integral by a sum,

$$\int_0^t c(X(s), s)\,dZ(s)^2 = \lim_{\Delta t \to 0}\sum_{j=0}^{j=N-1} c(X_j, t_j)\,\Delta Z_j^2. \tag{6.5}$$

Note that we have evaluated the integral using the left hand end point of each subinterval (the no peeking into the future principle).

From now on, we will use the notation

$$\sum_j \equiv \sum_{j=0}^{j=N-1}. \tag{6.6}$$

Now, we claim that

$$\int_0^t c(X(s), s)\,dZ^2(s) = \int_0^t c(X(s), s)\,ds \tag{6.7}$$

or

$$\lim_{\Delta t \to 0}\left[\sum_j c_j\,\Delta Z_j^2\right] = \lim_{\Delta t \to 0}\left[\sum_j c_j\,\Delta t\right] \tag{6.8}$$

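which is what we mean by equation (6.7), i.e. we can say that $dZ^2 \to dt$ as $dt \to 0$.

Now, let's consider a finite $\Delta t$, and consider the expression

$$E\left[\left(\sum_j c_j\,\Delta Z_j^2 - \sum_j c_j\,\Delta t\right)^2\right] \tag{6.9}$$

If equation (6.9) tends to zero as $\Delta t \to 0$, then we can say that (in the mean square limit)

$$\lim_{\Delta t \to 0}\left[\sum_j c_j\,\Delta Z_j^2\right] = \lim_{\Delta t \to 0}\left[\sum_j c_j\,\Delta t\right] = \int_0^t c(X(s), s)\,ds \tag{6.10}$$

so that in this sense

$$\int_0^t c(X, s)\,dZ^2 = \int_0^t c(X, s)\,ds \tag{6.11}$$

and hence we can say that

$$dZ^2 \to dt \tag{6.12}$$

with probability one as $\Delta t \to 0$.

Now, expanding equation (6.9) gives

$$E\left[\left(\sum_j c_j\,\Delta Z_j^2 - \sum_j c_j\,\Delta t\right)^2\right] = \sum_{ij} E\left[c_j(\Delta Z_j^2 - \Delta t)\,c_i(\Delta Z_i^2 - \Delta t)\right]. \tag{6.13}$$

Now, we assume that

$$E\left[c_j(\Delta Z_j^2 - \Delta t)\,c_i(\Delta Z_i^2 - \Delta t)\right] = \delta_{ij}\,E\left[c_i^2(\Delta Z_i^2 - \Delta t)^2\right] \tag{6.14}$$

which means that $c_j(\Delta Z_j^2 - \Delta t)$ and $c_i(\Delta Z_i^2 - \Delta t)$ are independent if $i \neq j$. This follows since

• The increments of Brownian motion are uncorrelated, i.e. $\text{Cov}[\Delta Z_i\,\Delta Z_j] = 0$, $i \neq j$, which means that $\text{Cov}[\Delta Z_i^2\,\Delta Z_j^2] = 0$, or $E[(\Delta Z_j^2 - \Delta t)(\Delta Z_i^2 - \Delta t)] = 0$, $i \neq j$.

• $c_i = c(t_i, X(Z_i))$, and $\Delta Z_i$ are independent.

It also follows from the above properties that

$$E\left[c_j^2(\Delta Z_j^2 - \Delta t)^2\right] = E[c_j^2]\;E\left[(\Delta Z_j^2 - \Delta t)^2\right] \tag{6.15}$$

since $c_j$ and $(\Delta Z_j^2 - \Delta t)$ are independent.

Using equations (6.14-6.15), equation (6.13) becomes

$$\sum_{ij} E\left[c_j(\Delta Z_j^2 - \Delta t)\,c_i(\Delta Z_i^2 - \Delta t)\right] = \sum_i E[c_i^2]\;E\left[(\Delta Z_i^2 - \Delta t)^2\right]. \tag{6.16}$$

Now,

$$\sum_i E[c_i^2]\;E\left[(\Delta Z_i^2 - \Delta t)^2\right] = \sum_i E[c_i^2]\left(E[\Delta Z_i^4] - 2\Delta t\,E[\Delta Z_i^2] + (\Delta t)^2\right). \tag{6.17}$$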

Recall that $\Delta Z$ is $N(0, \Delta t)$ (normally distributed with mean zero and variance $\Delta t$) so that

$$E\left[(\Delta Z_i)^2\right] = \Delta t, \qquad E\left[(\Delta Z_i)^4\right] = 3(\Delta t)^2 \tag{6.18}$$

so that the bracketed term in equation (6.17) becomes

$$E[\Delta Z_i^4] - 2\Delta t\,E[\Delta Z_i^2] + (\Delta t)^2 = 2(\Delta t)^2 \tag{6.19}$$

and

$$\sum_i E[c_i^2]\;E\left[(\Delta Z_i^2 - \Delta t)^2\right] = 2\sum_i E[c_i^2]\,(\Delta t)^2 = 2\Delta t\left(\sum_i E[c_i^2]\,\Delta t\right) = O(\Delta t) \tag{6.20}$$

so that we have

$$E\left[\left(\sum_j c_j\,\Delta Z_j^2 - \sum_j c_j\,\Delta t\right)^2\right] = O(\Delta t) \tag{6.21}$$

or

$$\lim_{\Delta t \to 0} E\left[\left(\sum_j c_j\,\Delta Z_j^2 - \int_0^t c(s, X(s))\,ds\right)^2\right] = 0 \tag{6.22}$$

so that in this sense we can write

$$dZ^2 \to dt; \quad dt \to 0. \tag{6.23}$$

7 Derivative Contracts on non-traded Assets and Real Options

The hedging arguments used in previous sections use the underlying asset to construct a hedging portfolio. What if the underlying asset cannot be bought and sold, or is non-storable? If the underlying variable is an interest rate, we can't store this. Or if the underlying asset is bandwidth, we can't store this either. However, we can get around this using the following approach.

7.1 Derivative Contracts

Let the underlying variable follow

$$dS = a(S, t)\,dt + b(S, t)\,dZ, \tag{7.1}$$

and let $F = F(S, t)$, so that from Ito's Lemma

$$dF = \left[aF_S + \frac{b^2}{2}F_{SS} + F_t\right]dt + bF_S\,dZ, \tag{7.2}$$

or in shorter form

$$dF = \mu\,dt + \sigma^*\,dZ, \qquad \mu = aF_S + \frac{b^2}{2}F_{SS} + F_t, \qquad \sigma^* = bF_S. \tag{7.3}$$
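As a quick check of (7.2) and (7.3), consider the special case of geometric Brownian motion, $a = \mu_S S$, $b = \sigma S$, with $F = \log S$. The symbol $\mu_S$ for the drift rate of $S$ is introduced here only to avoid a clash with the $\mu$ of (7.3); it is not notation used elsewhere in these notes.

```latex
\begin{align*}
F_S &= \frac{1}{S}, \qquad F_{SS} = -\frac{1}{S^2}, \qquad F_t = 0, \\
\mu &= a F_S + \frac{b^2}{2} F_{SS} + F_t
     = \mu_S S \cdot \frac{1}{S} - \frac{\sigma^2 S^2}{2}\cdot\frac{1}{S^2}
     = \mu_S - \frac{\sigma^2}{2}, \\
\sigma^* &= b F_S = \sigma S \cdot \frac{1}{S} = \sigma, \\
dF &= \left(\mu_S - \frac{\sigma^2}{2}\right) dt + \sigma\, dZ .
\end{align*}
```

This reproduces the process (5.10) for $X = \log S$, as it should.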

Now, instead of hedging with the underlying asset, we will hedge one contract with another. Suppose we have two contracts $F_1, F_2$ (they could have different maturities for example). Then

$$dF_1 = \mu_1\,dt + \sigma_1^*\,dZ, \qquad dF_2 = \mu_2\,dt + \sigma_2^*\,dZ$$

$$\mu_i = a(F_i)_S + \frac{b^2}{2}(F_i)_{SS} + (F_i)_t, \qquad \sigma_i^* = b(F_i)_S; \quad i = 1, 2. \tag{7.4}$$

Consider the portfolio $P$ such that (recall that $\Delta_1, \Delta_2$ are constant over the hedging interval)

$$P = \Delta_1 F_1 + \Delta_2 F_2$$

$$dP = \Delta_1(\mu_1\,dt + \sigma_1^*\,dZ) + \Delta_2(\mu_2\,dt + \sigma_2^*\,dZ) \tag{7.5}$$

We can eliminate the random term by choosing

$$\Delta_1 = \sigma_2^*, \qquad \Delta_2 = -\sigma_1^* \tag{7.6}$$

so that

$$dP = (\sigma_2^*\mu_1 - \sigma_1^*\mu_2)\,dt \tag{7.7}$$

Since this portfolio is riskless, it must earn the risk-free rate of return,

$$(\sigma_2^*\mu_1 - \sigma_1^*\mu_2)\,dt = r(\Delta_1 F_1 + \Delta_2 F_2)\,dt \tag{7.8}$$

and after some simplification we obtain

$$\frac{\mu_1 - rF_1}{\sigma_1^*} = \frac{\mu_2 - rF_2}{\sigma_2^*} \tag{7.9}$$

Let $\lambda_S(S, t)$ be the value of both sides of equation (7.9), so that

$$\frac{\mu_1 - rF_1}{\sigma_1^*} = \lambda_S(S, t), \qquad \frac{\mu_2 - rF_2}{\sigma_2^*} = \lambda_S(S, t). \tag{7.10}$$

Dropping the subscripts, we obtain

$$\frac{\mu - rF}{\sigma^*} = \lambda_S \tag{7.11}$$

Substituting $\mu, \sigma^*$ from equations (7.3) into equation (7.11) gives

$$F_t + \frac{b^2}{2}F_{SS} + (a - \lambda_S b)F_S - rF = 0. \tag{7.12}$$

Equation (7.12) is the PDE satisfied by a derivative contract on any asset $S$, traded or not.

Suppose

$$\mu = F\mu', \qquad \sigma^* = F\sigma' \tag{7.13}$$

so that we can write

$$dF = F\mu'\,dt + F\sigma'\,dZ \tag{7.14}$$

then using equation (7.13) in equation (7.11) gives

$$\mu' = r + \lambda_S\,\sigma' \tag{7.15}$$

which has the convenient interpretation that the expected return on holding (not hedging) the derivative contract $F$ is the risk-free rate plus extra compensation due to the riskiness of holding $F$. The extra return is $\lambda_S\sigma'$, where $\lambda_S$ is the market price of risk of $S$ (which should be the same for all contracts depending on $S$) and $\sigma'$ is the volatility of $F$. Note that the volatility and drift of $F$ are not the volatility and drift of the underlying asset $S$.

If $S$ is a traded asset, it must satisfy equation (7.12). Substituting $F = S$ into equation (7.12) gives $(a - \lambda_S b) = rS$, so that $F$ then satisfies the usual Black-Scholes equation if $b = \sigma S$.

If we believe that the Capital Asset Pricing Model holds, then a simple minded idea is to estimate

$$\lambda_S = \rho_{SM}\,\lambda_M \tag{7.16}$$

where $\lambda_M$ is the price of risk of the market portfolio, and $\rho_{SM}$ is the correlation of returns between $S$ and the returns of the market portfolio.

Another idea is the following. Suppose we can find some companies whose main source of business is based on $S$. Let $q_i$ be the price of this company's stock at $t = t_i$. The return of the stock over $t_i - t_{i-1}$ is

$$R_i = \frac{q_i - q_{i-1}}{q_{i-1}}.$$

Let $R_i^M$ be the return of the market portfolio (i.e. a broad index) over the same period. We compute $\beta$ as the best fit linear regression to

$$R_i = \alpha + \beta R_i^M$$

which means that

$$\beta = \frac{\text{Cov}(R, R^M)}{\text{Var}(R^M)}. \tag{7.17}$$

Now, from CAPM we have that

$$E(R) = r + \beta\left[E(R^M) - r\right] \tag{7.18}$$

where $E(\cdot)$ is the expectation operator. We would like to determine the unlevered $\beta$, denoted by $\beta^u$, which is the $\beta$ for an investment made using equity only. In other words, if the firm we used to compute the $\beta$ above has significant debt, its riskiness with respect to $S$ is amplified. The unlevered $\beta$ can be computed by

$$\beta^u = \frac{E}{E + (1 - T_c)D}\,\beta \tag{7.19}$$

where

$$D = \text{long term debt}, \qquad E = \text{Total market capitalization}, \qquad T_c = \text{Corporate Tax rate}. \tag{7.20}$$

So, now the expected return from a pure equity investment based on $S$ is

$$E(R^u) = r + \beta^u\left[E(R^M) - r\right]. \tag{7.21}$$

If we assume that $F$ in equation (7.14) is the company stock, then

$$\mu' = E(R^u) = r + \beta^u\left[E(R^M) - r\right] \tag{7.22}$$
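The regression (7.17), the unlevering formula (7.19) and the expected return (7.21) amount to a few lines of arithmetic. The following is a minimal sketch; the function names are hypothetical and every number in the usage example (prices, returns, debt, equity, tax rate, market excess return) is made up purely for illustration.

```python
def returns_from_prices(prices):
    """R_i = (q_i - q_{i-1}) / q_{i-1} for consecutive prices."""
    return [(q1 - q0) / q0 for q0, q1 in zip(prices, prices[1:])]


def beta_from_returns(stock_returns, market_returns):
    """Least-squares slope of R on R^M, i.e. Cov(R, R^M) / Var(R^M), eq. (7.17)."""
    n = len(stock_returns)
    mean_r = sum(stock_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum((r - mean_r) * (m - mean_m)
              for r, m in zip(stock_returns, market_returns))
    var = sum((m - mean_m) ** 2 for m in market_returns)
    return cov / var


def unlever_beta(beta, equity, debt, tax_rate):
    """beta^u = E / (E + (1 - T_c) D) * beta, eq. (7.19)."""
    return equity / (equity + (1.0 - tax_rate) * debt) * beta


def expected_equity_return(r, beta_u, market_excess_return):
    """E(R^u) = r + beta^u [E(R^M) - r], eq. (7.21)."""
    return r + beta_u * market_excess_return


if __name__ == "__main__":
    # Hypothetical price histories for one firm and a broad index.
    stock = returns_from_prices([20.0, 21.0, 20.4, 21.6, 22.1])
    index = returns_from_prices([1000.0, 1030.0, 1015.0, 1050.0, 1062.0])
    beta = beta_from_returns(stock, index)
    beta_u = unlever_beta(beta, equity=600.0, debt=400.0, tax_rate=0.25)
    print(beta, beta_u, expected_equity_return(0.04, beta_u, 0.06))
```

The common factor $1/n$ (or $1/(n-1)$) cancels in the ratio of covariance to variance, so it is omitted from both sums.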

But equation (7.15) says that

$$\mu' = r + \lambda_S\,\sigma'. \tag{7.23}$$

Combining equations (7.22-7.23) gives

$$\lambda_S\,\sigma' = \beta^u\left[E(R^M) - r\right]. \tag{7.24}$$

Recall from equations (7.3) and (7.13) that

$$\sigma^* = F\sigma', \qquad \sigma^* = bF_S,$$

or

$$\sigma' = \frac{bF_S}{F}. \tag{7.25}$$

Combining equations (7.24-7.25) gives

$$\lambda_S = \frac{\beta^u\left[E(R^M) - r\right]}{bF_S / F}. \tag{7.26}$$

In principle, we can now compute $\lambda_S$, since

• The unleveraged $\beta^u$ is computed as described above. This can be done using market data for a specific firm, whose main business is based on $S$, and the firm's balance sheet.

• $b(S, t)/S$ is the volatility rate of $S$ (equation (7.1)).

• $[E(R^M) - r]$ can be determined from historical data. For example, the expected return of the market index above the risk free rate is about 6% for the past 50 years of Canadian data.

• The risk free rate $r$ is simply the current T-bill rate.

• $F_S$ can be estimated by computing a linear regression of the stock price of a firm which invests in $S$, and $S$. Now, this may have to be unlevered, to reduce the effect of debt. If we are going to now value the real option for a specific firm, we will have to make some assumption about how the firm will finance a new investment. If it is going to use pure equity, then we are done. If it is a mixture of debt and equity, we should relever the value of $F_S$. At this point, we need to talk to a Finance Professor to get this right.

7.2 A Forward Contract

A forward contract is a special type of derivative contract. The holder of a forward contract agrees to buy or sell the underlying asset at some delivery price $K$ in the future. $K$ is determined so that the cost of entering into the forward contract is zero at its inception.

The payoff of a (long) forward contract expiring at $t = T$ is then

$$V(S, \tau = 0) = S(T) - K. \tag{7.27}$$

Note that there is no optionality in a forward contract.

The value of a forward contract is a contingent claim, and its value is given by equation (7.12)

$$V_t + \frac{b^2}{2}V_{SS} + (a - \lambda_S b)V_S - rV = 0. \tag{7.28}$$

Now we can also use a simple no-arbitrage argument to express the value of a forward contract in terms of the original delivery price $K$ (which is set at the inception of the contract) and the current forward price

$f(S, \tau)$. Suppose we are long a forward contract with delivery price $K$. At some time $t > 0$ ($\tau < T$), the forward price is no longer $K$. Suppose the forward price is $f(S, \tau)$, then the payoff of a long forward contract, entered into at $\tau$, is

$$\text{Payoff} = S(T) - f(S(\tau), \tau). \tag{7.29}$$

Suppose we are long the forward contract struck at $t = 0$ with delivery price $K$. At some time $t > 0$, we hedge this contract by going short a forward with the current delivery price $f(S, \tau)$ (which costs us nothing to enter into). The payoff of this portfolio is

$$S - K - (S - f) = f - K \tag{7.30}$$

Since $f, K$ are known with certainty at $(S, \tau)$, the value of this portfolio today is $(f - K)e^{-r\tau}$. But if we hold a forward contract today, we can always construct the above hedge at no cost. Therefore,

$$V(S, \tau) = (f - K)e^{-r\tau}. \tag{7.31}$$

Substituting equation (7.31) into equation (7.28), and noting that $K$ is a constant, gives us the following PDE for the forward price (the delivery price which makes the forward contract worth zero at inception)

$$f_\tau = \frac{b^2}{2}f_{SS} + (a - \lambda_S b)f_S \tag{7.32}$$

with terminal condition

$$f(S, \tau = 0) = S \tag{7.33}$$

which can be interpreted as the fact that the forward price must equal the spot price at $t = T$.

Suppose we can estimate $a, b$ in equation (7.32), and there are a set of forward prices available. We can then estimate $\lambda_S$ by solving equation (7.32) and adjusting $\lambda_S$ until we obtain a good fit for the observed forward prices.

7.2.1 Convenience Yield

We can also write equation (7.32) as

$$f_t + \frac{b^2}{2}f_{SS} + (r - \delta)Sf_S = 0 \tag{7.34}$$

where $\delta$ is defined as

$$\delta = r - \frac{a - \lambda_S b}{S}. \tag{7.35}$$

In this case, we can interpret $\delta$ as the convenience yield for holding the asset. For example, there is a convenience to holding supplies of natural gas in reserve.

8 Discrete Hedging

In practice, we cannot hedge at infinitesimal time intervals. In fact, we would like to hedge as infrequently as possible, since in real life, there are transaction costs (something which is ignored in the basic Black-Scholes equation, but which can be taken into account and results in a nonlinear PDE).

8.1 Delta Hedging

Recall that the basic derivation of the Black-Scholes equation used a hedging portfolio where we hold $V_S$ shares. In finance, $V_S$ is called the option delta, hence this strategy is called delta hedging.

As an example, consider the hedging portfolio $P(t)$ which is composed of

• A short option position in an option $-V(t)$.
• Long $\alpha(t)^h S(t)$ shares
• An amount in a risk-free bank account $B(t)$.

Initially, we have

$$ P(0) = 0 = -V(0) + \alpha(0)^h S(0) + B(0), \qquad \alpha = V_S, \qquad B(0) = V(0) - \alpha(0)^h S(0) . $$

The hedge is rebalanced at discrete times $t_i$. Defining

$$ \alpha_i^h = V_S(S_i, t_i), \qquad V_i = V(S_i, t_i), $$

we have to update the hedge by purchasing $\alpha_i^h - \alpha_{i-1}^h$ shares at $t = t_i$, so that updating our share position requires

$$ S(t_i)(\alpha_i^h - \alpha_{i-1}^h) $$

in cash, which we borrow from the bank if $(\alpha_i^h - \alpha_{i-1}^h) > 0$. If $(\alpha_i^h - \alpha_{i-1}^h) < 0$, then we sell some shares and deposit the proceeds in the bank account. If $\Delta t = t_i - t_{i-1}$, then the bank account balance is updated by

$$ B_i = e^{r \Delta t} B_{i-1} - S_i (\alpha_i^h - \alpha_{i-1}^h) . $$

At the instant after the rebalancing time $t_i$, the value of the portfolio is

$$ P(t_i) = -V(t_i) + \alpha(t_i)^h S(t_i) + B(t_i) . $$

Since we are hedging at discrete time intervals, the hedge is no longer risk free (it is risk free only in the limit as the hedging interval goes to zero). We can determine the distribution of profit and loss (P&L) by carrying out a Monte Carlo simulation. Suppose we have precomputed the values of $V_S$ for all the likely $(S, t)$ values. Then, we simulate a series of random paths. For each random path, we determine the discounted relative hedging error

$$ \text{error} = \frac{e^{-rT^*} P(T^*)}{V(S_0, t = 0)} . \tag{8.1} $$

After computing many sample paths, we can plot a histogram of relative hedging error, i.e. fraction of Monte Carlo trials giving a hedging error between $E$ and $E + \Delta E$. We can compute the variance of this distribution, and also the value at risk (VAR). VAR is the worst case loss with a given probability. For example, a typical VAR number reported is the maximum loss that would occur 95% of the time. In other words, find the value of $E$ along the x-axis such that the area under the histogram plot to the right of this point is $.95 \times$ the total area.

As an example, consider the case of an American put option, $T = .25$, $\sigma = .3$, $r = .06$, $K = S_0 = 100$. At $t = 0$, $S_0 = 100$. Since there are discrete hedging errors, the results in this case will depend on the stock drift rate, which we set at $\mu = .08$. The initial value of the American put, obtained by solving the Black-Scholes linear complementarity problem, is \$5.34. Figure 8.1 shows the results for no hedging, and hedging once a month.
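The Monte Carlo procedure for the discrete delta hedge can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not in the notes: the option is a European put, so that the value and delta come from the closed-form Black-Scholes formula rather than from a precomputed American solution, and the function names (`bs_put`, `hedging_errors`, `value_at_risk`) are hypothetical.

```python
import math
import random
import statistics


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_put(S, K, r, sigma, tau):
    """Black-Scholes European put value and delta (tau = time to expiry > 0)."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    value = K * math.exp(-r * tau) * norm_cdf(-d2) - S * norm_cdf(-d1)
    delta = norm_cdf(d1) - 1.0
    return value, delta


def hedging_errors(n_rebalance, n_paths=1000, S0=100.0, K=100.0, r=0.06,
                   sigma=0.3, mu=0.08, T=0.25, seed=0):
    """Discounted relative hedging error, as in (8.1), for each simulated path."""
    rng = random.Random(seed)
    dt = T / n_rebalance
    V0, alpha0 = bs_put(S0, K, r, sigma, T)
    errors = []
    for _ in range(n_paths):
        S, alpha = S0, alpha0
        B = V0 - alpha * S0                      # P(0) = 0
        for i in range(1, n_rebalance + 1):
            phi = rng.gauss(0.0, 1.0)
            # Real-world path: the stock drifts at mu, not r.
            S *= math.exp((mu - 0.5 * sigma**2) * dt + sigma * math.sqrt(dt) * phi)
            B *= math.exp(r * dt)                # interest on the bank account
            if i < n_rebalance:                  # rebalance before expiry
                _, alpha_new = bs_put(S, K, r, sigma, T - i * dt)
                B -= S * (alpha_new - alpha)     # pay for the extra shares
                alpha = alpha_new
        payoff = max(K - S, 0.0)                 # option value at expiry
        P_T = -payoff + alpha * S + B
        errors.append(math.exp(-r * T) * P_T / V0)
    return errors


def value_at_risk(errors, level=0.95):
    """Error E such that a fraction `level` of the outcomes lie to its right."""
    ordered = sorted(errors)
    return ordered[int((1.0 - level) * len(ordered))]


if __name__ == "__main__":
    for n in (3, 13, 63):                        # roughly monthly, weekly, daily
        e = hedging_errors(n)
        print(n, statistics.mean(e), statistics.pstdev(e), value_at_risk(e))
```

Running the sketch for a few rebalancing frequencies gives a rough feel for how the spread of the relative hedging error shrinks as the hedge is rebalanced more often.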

The x-axis in these plots shows the relative P&L of this portfolio (i.e. P&L divided by the Black-Scholes price), and the y-axis shows the relative frequency.

$$ \text{Relative P\&L} = \frac{\text{Actual P\&L}}{\text{Black-Scholes price}} \tag{8.2} $$

Figure 8.1: Relative frequency (y-axis) versus relative P&L of delta hedging strategies. Left: no hedging, right: rebalance hedge once a month. American put, $T = .25$, $\sigma = .3$, $r = .06$, $\mu = .08$, $K = S_0 = 100$. The relative P&L is computed by dividing the actual P&L by the Black-Scholes price.

Note that the no-hedging strategy actually has a high probability of ending up with a profit (from the option writer's point of view) since the drift rate of the stock is positive. In this case, the hedger does nothing, but simply pockets the option premium. Note the sudden jump in the relative frequency at relative P&L = 1. This is because the maximum the option writer stands to gain is the option premium, which we assume is the Black-Scholes value. The writer makes this premium for any path which ends up $S > K$, which is many paths, hence the sudden jump in probability. However, there is significant probability of a loss as well. Figure 8.1 also shows the relative frequency of the P&L of hedging once a month (only three times during the life of the option).

In fact, there is a history of Ponzi-like hedge funds which simply write put options with essentially no hedging. In this case, these funds will perform very well for several years, since markets tend to drift up on average. However, then a sudden market drop occurs, and they will blow up. Blowing up is a technical term for losing all your capital and being forced to get a real job. However, usually the owners of these hedge funds walk away with large bonuses, and the shareholders take all the losses.

Figure 8.2 shows the results for rebalancing the hedge once a week, and daily. We can see clearly here that the mean is zero, and variance is getting smaller as the hedging interval is reduced. In fact, one can show that the standard deviation of the hedge error should be proportional to $\sqrt{\Delta t}$ where $\Delta t$ is the hedge rebalance frequency.

8.2 Gamma Hedging

In an attempt to account for some of the errors in delta hedging at finite hedging intervals, we can try to use second derivative information. The second derivative of an option value $V_{SS}$ is called the option gamma, hence this strategy is termed delta-gamma hedging. A gamma hedge consists of

• A short option position $-V(t)$.
• Long $\alpha^h S(t)$ shares
• Long $\beta$ another derivative security $I$.
• An amount in a risk-free bank account $B(t)$.

Figure 8.2: Relative frequency (y-axis) versus relative P&L of delta hedging strategies. Left: rebalance hedge once a week, right: rebalance hedge daily. American put, $T = .25$, $\sigma = .3$, $r = .06$, $\mu = .08$, $K = S_0 = 100$. The relative P&L is computed by dividing the actual P&L by the Black-Scholes price.

Now, recall that we consider $\alpha^h, \beta$ to be constant over the hedging interval (no peeking into the future), so we can regard these as constants (for the duration of the hedging interval). The hedge portfolio $P(t)$ is then

$$ P(t) = -V + \alpha^h S + \beta I + B(t) . $$

Assuming that we buy and hold $\alpha^h$ shares and $\beta$ of the secondary instrument at the beginning of each hedging interval, then we require that

$$ \frac{\partial P}{\partial S} = -\frac{\partial V}{\partial S} + \alpha^h + \beta \frac{\partial I}{\partial S} = 0, \qquad \frac{\partial^2 P}{\partial S^2} = -\frac{\partial^2 V}{\partial S^2} + \beta \frac{\partial^2 I}{\partial S^2} = 0 . \tag{8.3} $$

Note that

• If $\beta = 0$, then we get back the usual delta hedge.
• In order for the gamma hedge to work, we need an instrument which has some gamma (the asset $S$ has second derivative zero). Hence, traders often speak of being long (positive) or short (negative) gamma, and try to buy/sell things to get gamma neutral.

So, at $t = 0$ we have

$$ P(0) = 0 \;\Rightarrow\; B(0) = V(0) - \alpha_0^h S_0 - \beta_0 I_0 . $$

The amounts $\alpha_i^h, \beta_i$ are determined by requiring that equation (8.3) hold

$$ -(V_S)_i + \alpha_i^h + \beta_i (I_S)_i = 0, \qquad -(V_{SS})_i + \beta_i (I_{SS})_i = 0 . \tag{8.4} $$

The bank account balance is then updated at each hedging time $t_i$ by

$$ B_i = e^{r \Delta t} B_{i-1} - S_i (\alpha_i^h - \alpha_{i-1}^h) - I_i (\beta_i - \beta_{i-1}) . $$

We will consider the same example as we used in the delta hedge example. For an additional instrument, we will use a European put option written on the same underlying with the same strike price and a maturity of $T = .5$ years. Figure 8.3 shows the results of gamma hedging, along with a comparison with delta hedging. In principle, gamma hedging produces a smaller variance with less frequent hedging. However, we are exposed to more model error in this case, since we need to be able to compute the second derivative of the theoretical price.
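Equations (8.4) are two linear equations in the two unknowns $\alpha_i^h, \beta_i$, so the hedge weights follow directly once the deltas and gammas are known. The sketch below illustrates this; as an assumption made only for the illustration, both the hedged option and the secondary instrument are European puts with closed-form Black-Scholes greeks, and the function names are hypothetical.

```python
import math


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def put_delta_gamma(S, K, r, sigma, tau):
    """Black-Scholes delta and gamma of a European put (tau > 0)."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    delta = norm_cdf(d1) - 1.0
    gamma = norm_pdf(d1) / (S * sigma * math.sqrt(tau))
    return delta, gamma


def gamma_hedge_weights(V_S, V_SS, I_S, I_SS):
    """Solve (8.4): -V_S + alpha + beta*I_S = 0 and -V_SS + beta*I_SS = 0."""
    if I_SS == 0.0:
        raise ValueError("secondary instrument must have non-zero gamma")
    beta = V_SS / I_SS              # second equation fixes beta
    alpha = V_S - beta * I_S        # first equation then fixes alpha
    return alpha, beta


if __name__ == "__main__":
    S, K, r, sigma = 100.0, 100.0, 0.06, 0.3
    V_S, V_SS = put_delta_gamma(S, K, r, sigma, 0.25)   # option being hedged
    I_S, I_SS = put_delta_gamma(S, K, r, sigma, 0.5)    # secondary instrument
    print(gamma_hedge_weights(V_S, V_SS, I_S, I_SS))
```

Because the stock has zero gamma, $\beta$ is pinned down by the gamma condition alone, and the share position then mops up whatever delta is left.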

Figure 8.3: Relative frequency (y-axis) versus relative P&L of gamma hedging strategies. Left: rebalance hedge once a week, right: rebalance hedge daily. Dotted lines show the delta hedge for comparison. American put, $T = .25$, $\sigma = .3$, $r = .06$, $\mu = .08$, $K = 100$, $S_0 = 100$. Secondary instrument: European put option, same strike, $T = .5$ years. The relative P&L is computed by dividing the actual P&L by the Black-Scholes price.

8.3 Vega Hedging

The most important parameter in the option pricing is the volatility. What if we are not sure about the value of the volatility? It is possible to assume that the volatility itself is stochastic, i.e.

$$ dS = \mu S \, dt + \sqrt{v}\, S \, dZ_1, \qquad dv = \kappa(\theta - v)\, dt + \sigma_v \sqrt{v}\, dZ_2 \tag{8.5} $$

where $\mu$ is the expected growth rate of the stock price, $\sqrt{v}$ is its instantaneous volatility, $\kappa$ is a parameter controlling how fast $v$ reverts to its mean level of $\theta$, $\sigma_v$ is the "volatility of volatility" parameter, and $Z_1, Z_2$ are Wiener processes with correlation parameter $\rho$. If we use the model in equation (8.5), then this will result in a two factor PDE to solve for the option price and the hedging parameters. Since there are two sources of risk ($dZ_1, dZ_2$), we will need to hedge with the underlying asset and another option (Heston, A closed form solution for options with stochastic volatility with applications to bond and currency options, Rev. Fin. Studies 6 (1993) 327-343).

Another possibility is to assume that the volatility is uncertain, and to assume that $\sigma_{\min} \leq \sigma \leq \sigma_{\max}$, and to hedge based on a worst case (from the hedger's point of view). This results in an uncertain volatility model (Avellaneda, Levy, Paris, Pricing and Hedging Derivative Securities in Markets with Uncertain Volatilities, Appl. Math. Fin. 2 (1995) 77-88). This is great if you can get someone to buy this option at this price, because the hedger is always guaranteed to end up with a non-negative balance in the hedging portfolio. But you may not be able to sell at this price, since the option price is expensive (after all, the price you get has to cover the worst case scenario).

An alternative, much simpler, approach (and therefore popular in industry), is to construct a vega hedge. We assume that we know the volatility, and price the option in the usual way. Then, as with a gamma hedge, we construct a portfolio

• A short option position $-V(t)$.
• Long $\alpha^h S(t)$ shares
• Long $\beta$ another derivative security $I$.
• An amount in a risk-free bank account $B(t)$.

The hedge portfolio $P(t)$ is then

$$ P(t) = -V + \alpha^h S + \beta I + B(t) . $$

Assuming that we buy and hold $\alpha^h$ shares and $\beta$ of the secondary instrument at the beginning of each hedging interval, then we require that

$$ \frac{\partial P}{\partial S} = -\frac{\partial V}{\partial S} + \alpha^h + \beta \frac{\partial I}{\partial S} = 0, \qquad \frac{\partial P}{\partial \sigma} = -\frac{\partial V}{\partial \sigma} + \beta \frac{\partial I}{\partial \sigma} = 0 . \tag{8.6} $$

Note that if we assume that $\sigma$ is constant when pricing the option, yet do not assume $\sigma$ is constant when we hedge, this is somewhat inconsistent. Nevertheless, we can determine the derivatives in equation (8.6) numerically (solve the pricing equation for several different values of $\sigma$, and then finite difference the solutions).

In practice, we would sell the option priced using our best estimate of $\sigma$ (today). This is usually based on looking at the prices of traded options, and then backing out the volatility which gives back today's traded option price (this is the implied volatility). Then as time goes on, the implied volatility will likely change. We use the current implied volatility to determine the current hedge parameters in equation (8.6). Since this implied volatility has likely changed since we last rebalanced the hedge, there is some error in the hedge. However, taking into account the change in the hedge portfolio through equations (8.6) should make up for this error. This procedure is called delta-vega hedging. In fact, even if the underlying process is a stochastic volatility, the vega hedge computed using a constant volatility model works surprisingly well (Hull and White, The pricing of options on assets with stochastic volatilities, J. of Finance, 42 (1987) 281-300).

9 Jump Diffusion

Recall that if

$$ dS = \mu S \, dt + \sigma S \, dZ \tag{9.1} $$

then from Ito's Lemma we have

$$ d[\log S] = \left[ \mu - \frac{\sigma^2}{2} \right] dt + \sigma \, dZ . \tag{9.2} $$

Now, suppose that we observe asset prices at discrete times $t_i$, i.e. $S(t_i) = S_i$, with $\Delta t = t_{i+1} - t_i$. Then from equation (9.2) we have

$$ \log S_{i+1} - \log S_i = \log\left( \frac{S_{i+1}}{S_i} \right) \simeq \left[ \mu - \frac{\sigma^2}{2} \right] \Delta t + \sigma \phi \sqrt{\Delta t} \tag{9.3} $$

where $\phi$ is $N(0, 1)$. Now, if $\Delta t$ is sufficiently small, then $\Delta t$ is much smaller than $\sqrt{\Delta t}$, so that equation (9.3) can be approximated by

$$ \log\left( \frac{S_{i+1} - S_i + S_i}{S_i} \right) = \log\left( 1 + \frac{S_{i+1} - S_i}{S_i} \right) \simeq \sigma \phi \sqrt{\Delta t} . \tag{9.4} $$

Define the return $R_i$ in the period $t_{i+1} - t_i$ as

$$ R_i = \frac{S_{i+1} - S_i}{S_i} \tag{9.5} $$

so that equation (9.4) becomes

$$ \log(1 + R_i) \simeq R_i = \sigma \phi \sqrt{\Delta t} . $$

Consequently, if the assumption (9.1) is true, a plot of the discretely observed returns of $S$ should be normally distributed. In Figure 9.1 we can see a histogram of monthly returns from the S&P 500 for the period 1982-2002. The histogram has been scaled to zero mean and unit standard deviation. A standard normal distribution is also shown. Note that for real data, there is a higher peak, and fatter tails than the normal distribution. This means that there is higher probability of zero return, or a large gain or loss compared to a normal distribution.

Figure 9.1: Probability density functions for the S&P 500 monthly returns 1982-2002, scaled to zero mean and unit standard deviation and the standardized Normal distribution.

As $\Delta t \to 0$, Geometric Brownian Motion (equation (9.1)) assumes that the probability of a large return also tends to zero. The amplitude of the return is proportional to $\sqrt{\Delta t}$, so that the tails of the distribution become unimportant. But, in real life, we can sometimes see very large returns (positive or negative) in small time increments. It therefore appears that Geometric Brownian Motion (GBM) is missing something important.

9.1 The Poisson Process

Consider a process where most of the time nothing happens (contrast this with Brownian motion, where some changes occur at any time scale we look at), but on rare occasions, a jump occurs. The jump size does not depend on the time interval, but the probability of the jump occurring does depend on the interval. More formally, consider the process $dq$ where, in the interval $[t, t + dt]$,

$$ dq = \begin{cases} 1 & \text{with probability } \lambda \, dt \\ 0 & \text{with probability } 1 - \lambda \, dt . \end{cases} \tag{9.6} $$

Note, once again, that size of the Poisson outcome does not depend on $dt$. Also, the probability of a jump occurring in $[t, t + dt]$ goes to zero as $dt \to 0$, in contrast to Brownian motion, where some movement always takes place (the probability of movement is constant as $dt \to 0$), but the size of the movement tends to zero as $dt \to 0$. For future reference, note that

$$ E[dq] = \lambda \, dt \cdot 1 + (1 - \lambda \, dt) \cdot 0 = \lambda \, dt \tag{9.7} $$

and

$$ \begin{aligned} Var(dq) &= E[(dq - E[dq])^2] \\ &= E[(dq - \lambda \, dt)^2] \\ &= (1 - \lambda \, dt)^2 \cdot \lambda \, dt + (0 - \lambda \, dt)^2 \cdot (1 - \lambda \, dt) \\ &= \lambda \, dt + O((dt)^2) . \end{aligned} \tag{9.8} $$

Now, suppose we assume that, along with the usual GBM, occasionally the asset jumps, i.e. $S \to JS$, where $J$ is the size of a (proportional) jump. We will restrict $J$ to be non-negative. Suppose a jump occurs in $[t, t + dt]$, with probability $\lambda \, dt$. Let's write this jump process as an SDE, i.e.

$$ [dS]_{jump} = (J - 1) S \, dq $$

since, if a jump occurs

$$ S_{\text{after jump}} = S_{\text{before jump}} + [dS]_{jump} = S_{\text{before jump}} + (J - 1) S_{\text{before jump}} = J S_{\text{before jump}} \tag{9.9} $$

which is what we want to model. So, if we have a combination of GBM and a rare jump event, then

$$ dS = \mu S \, dt + \sigma S \, dZ + (J - 1) S \, dq . \tag{9.10} $$

Assume that the jump size has some known probability density $g(J)$, i.e. given that a jump occurs, then the probability of a jump in $[J, J + dJ]$ is $g(J) \, dJ$, and

$$ \int_{-\infty}^{+\infty} g(J) \, dJ = \int_0^{\infty} g(J) \, dJ = 1 \tag{9.11} $$

since we assume that $g(J) = 0$ if $J < 0$. For future reference, if $f = f(J)$, then the expected value of $f$ is

$$ E[f] = \int_0^{\infty} f(J) g(J) \, dJ . \tag{9.12} $$

The process (9.10) is basically geometric Brownian motion (a continuous process) with rare discontinuous jumps. Some example realizations of jump diffusion paths are shown in Figure 9.2.

Figure 9.2: Some realizations of a jump diffusion process which follows equation (9.10). Axes: Time (years) versus Asset Price.
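A single realization of the process (9.10) can be generated with a simple time-stepping loop. The sketch below is illustrative only: it assumes, purely for the sake of having a concrete density, that $\log J$ is normally distributed (parameters `mu_hat`, `gamma`), it treats the jump in each step as a Bernoulli event with probability $\lambda \, dt$ exactly as in (9.6), and the function name is hypothetical.

```python
import math
import random


def simulate_jump_diffusion(S0, mu, sigma, lam, mu_hat, gamma, T, n_steps, rng):
    """One path of dS = mu*S dt + sigma*S dZ + (J - 1)*S dq on [0, T].

    Assumption for illustration: log(J) ~ N(mu_hat, gamma^2).
    Requires lam * T / n_steps to be small (it is used as a probability).
    """
    dt = T / n_steps
    S = S0
    path = [S0]
    for _ in range(n_steps):
        # Continuous (GBM) part, stepped exactly in log S.
        phi = rng.gauss(0.0, 1.0)
        S *= math.exp((mu - 0.5 * sigma**2) * dt + sigma * math.sqrt(dt) * phi)
        # Jump part: dq = 1 with probability lam*dt, else 0.
        if rng.random() < lam * dt:
            J = math.exp(rng.gauss(mu_hat, gamma))
            S *= J                               # S -> J S
        path.append(S)
    return path


if __name__ == "__main__":
    rng = random.Random(0)
    path = simulate_jump_diffusion(100.0, 0.08, 0.3, 2.0, -0.1, 0.2, 0.25, 250, rng)
    print(path[-1])
```

Setting `lam = 0` switches the jumps off and recovers plain geometric Brownian motion, which is a convenient sanity check.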

9.2 The Jump Diffusion Pricing Equation

Now, form the usual hedging portfolio

$$ P = V - \alpha S . \tag{9.13} $$

Now, consider

$$ [dP]_{total} = [dP]_{Brownian} + [dP]_{jump} \tag{9.14} $$

where, from Ito's Lemma

$$ [dP]_{Brownian} = \left[ V_t + \frac{\sigma^2 S^2}{2} V_{SS} \right] dt + [V_S - \alpha](\mu S \, dt + \sigma S \, dZ) \tag{9.15} $$

and, noting that the jump is of finite size,

$$ [dP]_{jump} = [V(JS, t) - V(S, t)] \, dq - \alpha (J - 1) S \, dq . \tag{9.16} $$

If we hedge the Brownian motion risk, by setting $\alpha = V_S$, then equations (9.14-9.16) give us

$$ dP = \left[ V_t + \frac{\sigma^2 S^2}{2} V_{SS} \right] dt + [V(JS, t) - V(S, t)] \, dq - V_S (J - 1) S \, dq . \tag{9.17} $$

So, we still have a random component ($dq$) which we have not hedged away. Let's take the expected value of this change in the portfolio, i.e.

$$ E(dP) = \left[ V_t + \frac{\sigma^2 S^2}{2} V_{SS} \right] dt + E[V(JS, t) - V(S, t)] E[dq] - V_S S E[J - 1] E[dq] \tag{9.18} $$

where we have assumed that probability of the jump and the probability of the size of the jump are independent. Defining $E(J - 1) = \kappa$, then we have that equation (9.18) becomes

$$ E(dP) = \left[ V_t + \frac{\sigma^2 S^2}{2} V_{SS} \right] dt + E[V(JS, t) - V(S, t)] \lambda \, dt - V_S S \kappa \lambda \, dt . \tag{9.19} $$

Now, we make a rather interesting assumption. Assume that an investor holds a diversified portfolio of these hedging portfolios, for many different stocks. If we make the rather dubious assumption that these jumps for different stocks are uncorrelated, then the variance of this portfolio of portfolios is small, hence there is little risk in this portfolio. Hence, the expected return should be

$$ E[dP] = rP \, dt . \tag{9.20} $$

Now, equating equations (9.19) and (9.20) gives

$$ V_t + \frac{\sigma^2 S^2}{2} V_{SS} + V_S [rS - S \kappa \lambda] - (r + \lambda) V + E[V(JS, t)] \lambda = 0 . \tag{9.21} $$
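To spell out the step from (9.19) and (9.20) to (9.21): with $P = V - \alpha S$ and $\alpha = V_S$, the right hand side of (9.20) is $r(V - S V_S)\, dt$, and since $V(S, t)$ does not depend on $J$ we have $E[V(JS, t) - V(S, t)] = E[V(JS, t)] - V(S, t)$. Equating the two expressions for $E(dP)$ and dividing by $dt$ gives

$$ V_t + \frac{\sigma^2 S^2}{2} V_{SS} + \lambda E[V(JS, t)] - \lambda V - \kappa \lambda S V_S = rV - r S V_S , $$

and collecting the $V_S$ and $V$ terms on the left hand side yields equation (9.21).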

Using equation (9.12) in equation (9.21) gives

$$ V_t + \frac{\sigma^2 S^2}{2} V_{SS} + V_S [rS - S \kappa \lambda] - (r + \lambda) V + \lambda \int_0^{\infty} g(J) V(JS, t) \, dJ = 0 . \tag{9.22} $$
Equation (9.22) is a Partial Integral Differential Equation (PIDE). A common assumption is to assume that $g(J)$ is log normal,

$$ g(J) = \frac{\exp\left( -\dfrac{(\log(J) - \hat{\mu})^2}{2 \gamma^2} \right)}{\sqrt{2\pi}\, \gamma J} . \tag{9.23} $$

where, some algebra shows that

$$ E(J - 1) = \kappa = \exp(\hat{\mu} + \gamma^2/2) - 1 . \tag{9.24} $$
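The algebra is short. Under the log normal density, $X = \log J$ is normally distributed with mean $\hat{\mu}$ and variance $\gamma^2$, so that

$$ E[J] = E[e^X] = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\, \gamma} \exp\left( x - \frac{(x - \hat{\mu})^2}{2\gamma^2} \right) dx . $$

Completing the square in the exponent,

$$ x - \frac{(x - \hat{\mu})^2}{2\gamma^2} = -\frac{(x - \hat{\mu} - \gamma^2)^2}{2\gamma^2} + \hat{\mu} + \frac{\gamma^2}{2} , $$

and the remaining Gaussian integral equals one, hence $E[J] = \exp(\hat{\mu} + \gamma^2/2)$ and $\kappa = E[J] - 1$.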

Now, what about our dubious assumption that jump risk was diversifiable? In practice, we can regard $\sigma, \hat{\mu}, \gamma, \lambda$ as parameters, and fit them to observed option prices. If we do this, (see L. Andersen and J. Andreasen, Jump-Diffusion processes: Volatility smile fitting and numerical methods, Review of Derivatives Research (2002), vol 4, pages 231-262), then we find that $\sigma$ is close to historical volatility, but that the fitted values of $\lambda, \hat{\mu}$ are at odds with the historical values. The fitted values seem to indicate that investors are pricing in larger, more frequent jumps than have been historically observed. In other words, actual prices seem to indicate that investors do require some compensation for jump risk, which makes sense: these parameters contain a market price of risk. Consequently, our assumption about jump risk being diversifiable is not really a problem if we fit the jump parameters from market (as opposed to historical) data, since the market-fit parameters will contain some effect due to risk preferences of investors. One can be more rigorous about this if one assumes some utility function for investors. See (Alan Lewis, Fear of jumps, Wilmott Magazine, December, 2002, pages 60-67) or (V. Naik, M. Lee, General equilibrium pricing of options on the market portfolio with discontinuous returns, The Review of Financial Studies, vol 3 (1990) pages 493-521.)

10 Mean Variance Portfolio Optimization

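An introduction to Computational Finance would not be complete without some discussion of Portfolio Optimization. Consider a risky asset which follows Geometric Brownian Motion with drift

$$ \frac{dS}{S} = \mu \, dt + \sigma \, dZ , \tag{10.1} $$

where as usual $dZ = \phi \sqrt{dt}$ and $\phi \sim N(0, 1)$. Suppose we consider a fixed finite interval $\Delta t$, then we can write equation (10.1) as

$$ R = \mu' + \sigma' \phi, \qquad R = \frac{\Delta S}{S}, \qquad \mu' = \mu \Delta t, \qquad \sigma' = \sigma \sqrt{\Delta t} , \tag{10.2} $$

where $R$ is the actual return on the asset in $[t, t + \Delta t]$, $\mu'$ is the expected return on the asset in $[t, t + \Delta t]$, and $\sigma'$ is the standard deviation of the return on the asset in $[t, t + \Delta t]$. Now consider a portfolio of $N$ risky assets. Let $R_i$ be the return on asset $i$ in $[t, t + \Delta t]$, so that

$$ R_i = \mu_i' + \sigma_i' \phi_i \tag{10.3} $$

where $\mu_i'$ and $\sigma_i'$ are the expected return and the standard deviation of the return on asset $i$ in $[t, t + \Delta t]$.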

Suppose that the correlation between asset $i$ and asset $j$ is given by $\rho_{ij} = E[\phi_i \phi_j]$. Suppose we buy $x_i$ of each asset at $t$, to form the portfolio $P$

$$ P = \sum_{i=1}^{N} x_i S_i . \tag{10.4} $$

Then, over the interval $[t, t + \Delta t]$

$$ P + \Delta P = \sum_{i=1}^{N} x_i S_i (1 + R_i), \qquad \Delta P = \sum_{i=1}^{N} x_i S_i R_i, \qquad \frac{\Delta P}{P} = \sum_{i=1}^{N} w_i R_i, \qquad w_i = \frac{x_i S_i}{\sum_{j=1}^{N} x_j S_j} . \tag{10.5} $$

In other words, we divide up our total wealth $W = \sum_{i=1}^{N} x_i S_i$ into each asset with weight $w_i$. Note that $\sum_{i=1}^{N} w_i = 1$. To summarize, given some initial wealth at $t$, we suppose that an investor allocates a fraction $w_i$ of this wealth to each asset $i$. We assume that the total wealth is allocated to this risky portfolio $P$, so that

$$ \sum_{i=1}^{N} w_i = 1, \qquad P = \sum_{i=1}^{N} x_i S_i, \qquad R_p = \frac{\Delta P}{P} = \sum_{i=1}^{N} w_i R_i . \tag{10.6} $$

The expected return on this portfolio $\bar{R}_p$ in $[t, t + \Delta t]$ is

$$ \bar{R}_p = \sum_{i=1}^{N} w_i \mu_i' , \tag{10.7} $$

while the variance of $R_p$ in $[t, t + \Delta t]$ is

$$ Var(R_p) = \sum_{i=1}^{N} \sum_{j=1}^{N} w_i w_j \sigma_i' \sigma_j' \rho_{ij} . \tag{10.8} $$
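Equations (10.7) and (10.8) translate directly into a sum and a double sum. The sketch below is a plain-Python illustration; the function name and the sample numbers are hypothetical, and the two correlation matrices (identity and all ones) are used only as sanity checks.

```python
def portfolio_mean_variance(w, mu, sigma, rho):
    """Expected return (10.7) and variance (10.8) of a portfolio.

    w     : weights w_i (should sum to one)
    mu    : expected returns of each asset over the period
    sigma : standard deviations of each asset's return over the period
    rho   : correlation matrix as a list of rows, rho[i][j]
    """
    n = len(w)
    mean = sum(w[i] * mu[i] for i in range(n))
    var = sum(w[i] * w[j] * sigma[i] * sigma[j] * rho[i][j]
              for i in range(n) for j in range(n))
    return mean, var


if __name__ == "__main__":
    w = [1.0 / 3.0] * 3
    mu = [0.10, 0.20, 0.30]
    sigma = [0.2, 0.3, 0.4]
    uncorrelated = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    perfectly_correlated = [[1.0] * 3 for _ in range(3)]
    print(portfolio_mean_variance(w, mu, sigma, uncorrelated))
    print(portfolio_mean_variance(w, mu, sigma, perfectly_correlated))
```

The expected return is the same in both cases, while the variance depends strongly on the correlation matrix supplied.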

10.1 Special Cases

Suppose the assets all have zero correlation with one another, i.e. $\rho_{ij} \equiv 0, \ \forall i \neq j$ (of course $\rho_{ii} = 1$). Then equation (10.8) becomes

$$ Var(R_p) = \sum_{i=1}^{N} (\sigma_i')^2 (w_i)^2 . \tag{10.9} $$

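Now, suppose we equally weight all the assets in the portfolio, i.e. $w_i = 1/N, \ \forall i$. Let $\max_i \sigma_i' = \sigma_{\max}'$, then

$$ Var(R_p) = \frac{1}{N^2} \sum_{i=1}^{N} (\sigma_i')^2 \leq \frac{N (\sigma_{\max}')^2}{N^2} = O\left( \frac{1}{N} \right) , \tag{10.10} $$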

so that in this special case, if we diversify over a large number of assets, the standard deviation of the portfolio tends to zero as $N \to \infty$.

Consider another case: all assets are perfectly correlated, $\rho_{ij} = 1, \ \forall i, j$. In this case

$$ Var(R_p) = \sum_{i=1}^{N} \sum_{j=1}^{N} w_i w_j \sigma_i' \sigma_j' = \left( \sum_{j=1}^{N} w_j \sigma_j' \right)^2 \tag{10.11} $$

so that if $sd(R) = \sqrt{Var(R)}$ is the standard deviation of $R$, then, in this case

$$ sd(R_p) = \sum_{j=1}^{N} w_j \sigma_j' , \tag{10.12} $$

which means that in this case the standard deviation of the portfolio is simply the weighted average of the individual asset standard deviations.

In general, we can expect that $0 < |\rho_{ij}| < 1$, so that the standard deviation of a portfolio of assets will be smaller than the weighted average of the individual asset standard deviation, but larger than zero. This means that diversification will be a good thing (as Martha Stewart would say) in terms of risk versus reward. In fact, a portfolio of as little as 10-20 stocks tends to reap most of the benefits of diversification.

10.2 The Portfolio Allocation Problem

Different investors will choose different portfolios depending on how much risk they wish to take. However, all investors like to achieve the highest possible expected return for a given amount of risk. We are assuming that risk and standard deviation of portfolio return are synonymous.

Let the covariance matrix $C$ be defined as

$$ [C]_{ij} = C_{ij} = \sigma_i' \sigma_j' \rho_{ij} \tag{10.13} $$

and define the vectors $\bar{\mu} = [\mu_1', \mu_2', ..., \mu_N']^t$, $\bar{w} = [w_1, w_2, ..., w_N]^t$. In theory, the covariance matrix should be symmetric positive semi-definite. However, measurement errors may result in $C$ having a negative eigenvalue, which should be fixed up somehow.

The expected return on the portfolio is then

$$ \bar{R}_p = \bar{w}^t \bar{\mu} , \tag{10.14} $$

and the variance is

$$ Var(R_p) = \bar{w}^t C \bar{w} . \tag{10.15} $$

We can think of the portfolio allocation problem as the following. Let $\alpha$ represent the degree with which investors want to maximize return at the expense of assuming more risk. If $\alpha \to 0$, then investors want to avoid as much risk as possible. On the other hand, if $\alpha \to \infty$, then investors seek only to maximize expected return, and don't care about risk. The portfolio allocation problem is then (for given $\alpha$) find $\bar{w}$ which satisfies

$$ \min_{\bar{w}} \ \bar{w}^t C \bar{w} - \alpha \bar{w}^t \bar{\mu} \tag{10.16} $$

subject to the constraints

$$ \sum_i w_i = 1 \tag{10.17} $$

$$ L_i \leq w_i \leq U_i \ ; \quad i = 1, ..., N . \tag{10.18} $$

Constraint (10.17) is simply equation (10.6), while constraints (10.18) may arise due to the nature of the portfolio. For example, most mutual funds can only hold long positions ($w_i \geq 0$), and they may also be prohibited from having a large position in any one asset (e.g. $w_i \leq .20$). Long-short hedge funds will not have these types of restrictions. For fixed $\alpha$, equations (10.16-10.18) constitute a quadratic programming problem.

Let

$$ sd(R_p) = \text{standard deviation of } R_p = \sqrt{Var(R_p)} . \tag{10.19} $$

We can now trace out a curve on the $(sd(R_p), \bar{R}_p)$ plane. We pick various values of $\alpha$, and then solve the quadratic programming problem (10.16-10.18). Figure 10.1 shows a typical curve, which is also known as the efficient frontier. The data used for this example is

$$ \bar{\mu} = \begin{bmatrix} .15 \\ .20 \\ .08 \end{bmatrix} ; \quad C = \begin{bmatrix} .20 & .05 & -.01 \\ .05 & .30 & .015 \\ -.01 & .015 & .1 \end{bmatrix} ; \quad L = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} ; \quad U = \begin{bmatrix} \infty \\ \infty \\ \infty \end{bmatrix} . \tag{10.20} $$

We have restricted this portfolio to be long only.

Figure 10.1: A typical efficient frontier (Standard Deviation on the x-axis, Expected Return on the y-axis). This curve shows, for each value of portfolio standard deviation $sd(R_p)$, the maximum possible expected portfolio return $\bar{R}_p$. Data in equation (10.20).

For a given value of the standard deviation of the portfolio return ($sd(R_p)$), any point below the curve is not efficient, since there is another portfolio with the same risk (standard deviation) and higher expected return. Only points on the curve are efficient in this manner. In general, a linear combination of portfolios at two points along the efficient frontier will be feasible, i.e. satisfy the constraints. This feasible region will be convex along the efficient frontier. Another way of saying this is that a straight line joining any two points along the curve does not intersect the curve except at the given two points. Why is this the case? If this was not true, then the efficient frontier would not really be efficient. (see Portfolio Theory and Capital Markets, W. Sharpe, McGraw Hill, 1970, reprinted in 2000).

Figure 10.2 shows results if we allow the portfolio to hold up to .25 short positions in each asset. In other words, the data is the same as in (10.20) except that

$$ L = \begin{bmatrix} -.25 \\ -.25 \\ -.25 \end{bmatrix} . \tag{10.21} $$

In general, long-short portfolios are more efficient than long-only portfolios. This is the advertised advantage of long-short hedge funds.

Figure 10.2: Efficient frontier, comparing results for long-only portfolio (10.20) and a long-short portfolio (same data except that lower bound constraint is replaced by equation (10.21)).

Since the feasible region is convex, we can actually proceed in a different manner when constructing the efficient frontier. First of all, we can determine the maximum possible expected return ($\alpha = \infty$ in equation (10.16)),

$$ \min_{\bar{w}} \ -\bar{w}^t \bar{\mu} , \qquad \sum_i w_i = 1 , \qquad L_i \leq w_i \leq U_i \ ; \ i = 1, ..., N \tag{10.22} $$

which is simply a linear programming problem. If the solution weight vector to this problem is $(\bar{w})_{\max}$, then the maximum possible expected return is $(\bar{R}_p)_{\max} = \bar{w}_{\max}^t \bar{\mu}$.

Then determine the portfolio with the smallest possible risk ($\alpha = 0$ in equation (10.16))

$$ \min_{\bar{w}} \ \bar{w}^t C \bar{w} , \qquad \sum_i w_i = 1 , \qquad L_i \leq w_i \leq U_i \ ; \ i = 1, ..., N . \tag{10.23} $$

If the solution weight vector to this quadratic program is given by $\bar{w}_{\min}$, then the minimum possible portfolio return is $(\bar{R}_p)_{\min} = \bar{w}_{\min}^t \bar{\mu}$. We then divide up the range $[(\bar{R}_p)_{\min}, (\bar{R}_p)_{\max}]$ into a large number of discrete portfolio returns $(\bar{R}_p)_k \ ; \ k = 1, ..., N_{pts}$. Let $e = [1, 1, ..., 1]^t$, and

$$ A = \begin{bmatrix} \bar{\mu}^t \\ e^t \end{bmatrix} ; \qquad B_k = \begin{bmatrix} (\bar{R}_p)_k \\ 1 \end{bmatrix} . \tag{10.24} $$

Then, for given $(\bar{R}_p)_k$ we solve the quadratic program

$$ \min_{\bar{w}} \ \bar{w}^t C \bar{w} , \qquad A \bar{w} = B_k , \qquad L_i \leq w_i \leq U_i \ ; \ i = 1, ..., N , \tag{10.25} $$

with solution vector $(\bar{w})_k$ and hence portfolio standard deviation $sd((R_p)_k) = \sqrt{(\bar{w})_k^t C (\bar{w})_k}$. This gives us a set of pairs $(sd((R_p)_k), (\bar{R}_p)_k)$, $k = 1, ..., N_{pts}$.

10.3 Adding a Risk-free asset

Up to now, we have assumed that each asset is risky, i.e. $\sigma_i' > 0, \ \forall i$. However, what happens if we add a risk free asset to our portfolio? This risk-free asset must earn the risk free rate $r' = r \Delta t$, and its standard deviation is zero. The data for this case is (the risk-free asset is added to the end of the weight vector, with $r' = .03$)

$$ \bar{\mu} = \begin{bmatrix} .15 \\ .20 \\ .08 \\ .03 \end{bmatrix} ; \quad C = \begin{bmatrix} .20 & .05 & -.01 & 0.0 \\ .05 & .30 & .015 & 0.0 \\ -.01 & .015 & .1 & 0.0 \\ 0.0 & 0.0 & 0.0 & 0.0 \end{bmatrix} ; \quad L = \begin{bmatrix} 0 \\ 0 \\ 0 \\ -\infty \end{bmatrix} ; \quad U = \begin{bmatrix} \infty \\ \infty \\ \infty \\ \infty \end{bmatrix} \tag{10.26} $$

where we have assumed that we can borrow any amount at the risk-free rate (a dubious assumption).

If we compute the efficient frontier with a portfolio of risky assets and include one risk-free asset, we get the result labeled capital market line in Figure 10.3. In other words, in this case the efficient frontier is a straight line. Note that this straight line is always above the efficient frontier for the portfolio consisting of all risky assets (as in Figure 10.1). In fact, given the efficient frontier from Figure 10.1, we can construct the efficient frontier for a portfolio of the same risky assets plus a risk free asset in the following way. First of all, we start at the point $(0, r')$ in the $(sd(R_p), \bar{R}_p)$ plane, corresponding to a portfolio which consists entirely of the risk free asset. We then draw a straight line passing through $(0, r')$, which touches the all-risky-asset efficient frontier at a single point (the straight line is tangent to the all-risky-asset efficient frontier). Let the portfolio weights at this single point be denoted by $\bar{w}_M$. The portfolio corresponding to the weights $\bar{w}_M$ is termed the market portfolio. Let $(\bar{R}_p)_M = \bar{w}_M^t \bar{\mu}$ be the expected return on this market portfolio, with corresponding standard deviation $sd((R_p)_M)$. Let $w_r$ be the fraction invested in the risk free asset. Then, any point along the capital market line has

$$ \bar{R}_p = w_r r' + (1 - w_r)(\bar{R}_p)_M , \qquad sd(R_p) = (1 - w_r) \, sd((R_p)_M) . \tag{10.27} $$

If $w_r \geq 0$, then we are lending at the risk-free rate. If $w_r < 0$, we are borrowing at the risk-free rate.

Consequently, given a portfolio of risky assets, and a risk-free asset, then all investors should divide their assets between the risk-free asset and the market portfolio. Any other choice for the portfolio is not efficient. Note that the actual fraction selected for investment in the market portfolio depends on the risk preferences of the investor.

The capital market line is so important, that the equation of this line is written as $\bar{R}_p = r' + \lambda_M \, sd(R_p)$, where $\lambda_M$ is the market price of risk. In other words, all diversified investors, at any particular point in time, should have diversified portfolios which plot along the capital market line. All portfolios should have the same Sharpe ratio

$$ \lambda_M = \frac{\bar{R}_p - r'}{sd(R_p)} . \tag{10.28} $$
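The constructions of this section can be illustrated numerically with NumPy. The sketch below makes two simplifying assumptions that are not part of the notes: the bound constraints $L_i \leq w_i \leq U_i$ are dropped, so that the quadratic program with the equality constraints $A \bar{w} = B_k$ reduces to one linear (KKT) system, and the market portfolio is obtained from the standard tangency formula $\bar{w}_M \propto C^{-1}(\bar{\mu} - r' e)$ rather than graphically. The function names and the three-asset numbers are illustrative.

```python
import numpy as np

# Illustrative inputs: three risky assets and a risk-free return of 0.03.
MU = np.array([0.15, 0.20, 0.08])
C = np.array([[0.20, 0.05, -0.01],
              [0.05, 0.30, 0.015],
              [-0.01, 0.015, 0.10]])
R_FREE = 0.03


def min_variance_weights(C, mu, target):
    """Minimise w^T C w subject to mu^T w = target and sum(w) = 1.

    Bounds on w are ignored (an assumption), so the KKT conditions
    2 C w + A^T y = 0, A w = B form a single linear system.
    """
    n = len(mu)
    A = np.vstack([mu, np.ones(n)])              # rows: mu^T and e^T
    B = np.array([target, 1.0])
    kkt = np.block([[2.0 * C, A.T], [A, np.zeros((2, 2))]])
    rhs = np.concatenate([np.zeros(n), B])
    return np.linalg.solve(kkt, rhs)[:n]


def market_portfolio(C, mu, r):
    """Tangency portfolio: weights proportional to C^{-1} (mu - r e).

    Assumes the entries of C^{-1} (mu - r e) sum to a positive number.
    """
    x = np.linalg.solve(C, mu - r)
    return x / x.sum()


def sharpe(w, C, mu, r):
    """(expected return - r) / standard deviation for weights w."""
    return (mu @ w - r) / np.sqrt(w @ C @ w)


if __name__ == "__main__":
    for target in np.linspace(0.10, 0.20, 5):    # points on the frontier
        w = min_variance_weights(C, MU, target)
        print(target, np.sqrt(w @ C @ w))
    w_m = market_portfolio(C, MU, R_FREE)
    print("market portfolio:", w_m, "lambda_M:", sharpe(w_m, C, MU, R_FREE))
```

With these inputs the tangency weights come out positive, so the market portfolio found this way also satisfies a long-only constraint; with bounds that bind, a genuine quadratic programming solver would be needed instead.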

Figure 10.3: The efficient frontier from Figure 10.1 (all risky assets), and the efficient frontier with the same assets as in Figure 10.1, except that we include a risk free asset. In this case, the efficient frontier becomes a straight line, shown as the capital market line. (Axes: Standard Deviation versus Expected Return; the plot marks the risk free return, the market portfolio, and the lending and borrowing segments of the capital market line.)

10.4 Criticism

Is mean-variance portfolio optimization the solution to all our problems? Not exactly. We have assumed that $\mu', \sigma'$ are independent of time. This is not likely. Even if these parameters are reasonably constant, they are difficult to estimate. In particular, $\mu'$ is hard to determine if the time series of returns is not very long. Remember that for short time series, the noise term (Brownian motion) will dominate. If we have a long time series, we can get a better estimate for $\mu'$, but why do we think $\mu'$ for a particular firm will be constant for long periods? Probably, stock analysts should be estimating $\mu'$ from company balance sheets, sales data, etc. However, for the past few years, analysts have been too busy hyping stocks and going to lunch to do any real work. So, there will be lots of different estimates of $\mu', C$, and hence many different optimal portfolios.

In fact, some recent studies have suggested that if investors simply use the $1/N$ rule, whereby initial wealth is allocated equally between $N$ assets, that this does a pretty good job, assuming that there is uncertainty in the estimates of $\mu', C$.

We have also assumed that risk is measured by standard deviation of portfolio return. Actually, if I am long an asset, I like it when the asset goes up, and I don't like it when the asset goes down. In other words, volatility which makes the price increase is good. This suggests that perhaps it may be more appropriate to minimize downside risk only (assuming a long position).

Perhaps one of the most useful ideas that come from mean-variance portfolio optimization is that diversified investors (at any point in time) expect that any optimal portfolio will produce a return

$$ \bar{R}_p = r' + \lambda_M \sigma_p' \tag{10.29} $$

where

$\bar{R}_p$ = Expected portfolio return
$r'$ = risk-free return in period $\Delta t$
$\lambda_M$ = market price of risk
$\sigma_p'$ = Portfolio volatility,

where different investors will choose portfolios with different $\sigma_p$ (volatility), depending on their risk preferences, but $\lambda_M$ is the same for all investors. Of course, we also have

$$R_M = r + \lambda_M \sigma_M . \tag{10.30}$$

Note: there is a whole field called Behavioural Finance, whose adherents don't think much of mean-variance portfolio optimization.

Another recent approach is to compute the optimal portfolio weights using many different perturbed input data sets. The input data (expected returns, and covariances) are determined by resampling, i.e. assuming that the observed values have some observational errors. In this way, we can get some sort of optimal portfolio weights which have some effect of data errors incorporated in the result. This gives us an average efficient frontier, which, it is claimed, is less sensitive to data errors.

10.5 Individual Securities

Equation (10.30) refers to an efficient portfolio. What is the relationship between risk and reward for individual securities? Consider the following portfolio: divide all wealth between the market portfolio, with weight $w_M$, and security $i$, with weight $w_i$. By definition

$$w_M + w_i = 1 , \tag{10.31}$$

and we define

$$
\begin{aligned}
R_M &= \text{expected return on the market portfolio} \\
R_i &= \text{expected return on asset } i \\
\sigma_M &= \text{s.d. of return on market portfolio} \\
\sigma_i &= \text{s.d. of return on asset } i \\
C_{i,M} &= \sigma_M \sigma_i \rho_{i,M} = \text{Covariance between } i \text{ and } M .
\end{aligned}
\tag{10.32}
$$

Now, the expected return on this portfolio is

$$R_p = E[R_p] = w_i R_i + w_M R_M = w_i R_i + (1 - w_i) R_M \tag{10.33}$$

and the variance is

$$
\begin{aligned}
Var(R_p) = (\sigma_p)^2 &= w_i^2 (\sigma_i)^2 + 2 w_i w_M C_{i,M} + w_M^2 (\sigma_M)^2 \\
&= w_i^2 (\sigma_i)^2 + 2 w_i (1 - w_i) C_{i,M} + (1 - w_i)^2 (\sigma_M)^2 .
\end{aligned}
\tag{10.34}
$$

For a set of values $\{w_i\}$, equations (10.33-10.34) will plot a curve in the expected return-standard deviation plane $(R_p, \sigma_p)$ (e.g. Figure 10.3). Let's determine the slope of this curve when $w_i \to 0$, i.e. when this curve intersects the capital market line at the market portfolio. Differentiating (10.34) and (10.33) with respect to $w_i$ gives

$$
\begin{aligned}
2 (\sigma_p) \frac{\partial (\sigma_p)}{\partial w_i} &= 2 w_i (\sigma_i)^2 + 2 (1 - 2 w_i) C_{i,M} + 2 (w_i - 1) (\sigma_M)^2 \\
\frac{\partial R_p}{\partial w_i} &= R_i - R_M .
\end{aligned}
\tag{10.35}
$$

Now,

$$
\frac{\partial R_p}{\partial (\sigma_p)} = \frac{\partial R_p / \partial w_i}{\partial (\sigma_p) / \partial w_i}
= \frac{(R_i - R_M)(\sigma_p)}{w_i (\sigma_i)^2 + (1 - 2 w_i) C_{i,M} + (w_i - 1)(\sigma_M)^2} .
\tag{10.36}
$$
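As a quick sanity check of the slope expression (10.36), the sketch below compares it with a finite-difference slope of the curve traced out by (10.33-10.34). The numerical values chosen for $R_i$, $R_M$, $\sigma_i$, $\sigma_M$ and $\rho_{i,M}$ are illustrative assumptions, not data from these notes.

```python
import math

# Illustrative (assumed) inputs; none of these values come from the notes.
R_i, R_M = 0.12, 0.08          # expected returns on asset i and on the market portfolio
sig_i, sig_M = 0.30, 0.20      # standard deviations of the two returns
rho = 0.5                      # assumed correlation between asset i and the market
C_iM = rho * sig_i * sig_M     # covariance, as defined in (10.32)


def expected_return(w):
    """Expected portfolio return, equation (10.33), with w_M = 1 - w."""
    return w * R_i + (1.0 - w) * R_M


def std_dev(w):
    """Portfolio standard deviation: square root of equation (10.34)."""
    var = w**2 * sig_i**2 + 2.0 * w * (1.0 - w) * C_iM + (1.0 - w) ** 2 * sig_M**2
    return math.sqrt(var)


def slope_formula(w):
    """Slope dR_p/d(sigma_p) from equation (10.36)."""
    denom = w * sig_i**2 + (1.0 - 2.0 * w) * C_iM + (w - 1.0) * sig_M**2
    return (R_i - R_M) * std_dev(w) / denom


def slope_fd(w, h=1e-5):
    """Central finite-difference estimate of the same slope."""
    d_return = expected_return(w + h) - expected_return(w - h)
    d_sigma = std_dev(w + h) - std_dev(w - h)
    return d_return / d_sigma


for w in (0.0, 0.1, 0.5):
    print(w, slope_formula(w), slope_fd(w))
```

The two columns should agree to several digits; at $w_i = 0$ the formula reduces to $(R_i - R_M)\sigma_M / (C_{i,M} - \sigma_M^2)$, the limit used in the next step of the derivation.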

Now, let $w_i \to 0$ in equation (10.36), so that $\sigma_p \to \sigma_M$; then we obtain

$$\frac{\partial R_p}{\partial (\sigma_p)} = \frac{(R_i - R_M)(\sigma_M)}{C_{i,M} - (\sigma_M)^2} . \tag{10.37}$$

But this curve should be tangent to the capital market line, equation (10.30), at the point where the capital market line touches the efficient frontier. If this curve is not tangent to the capital market line, then this implies that if we choose $w_i = \pm\epsilon$, then the curve would be above the capital market line, which should not be possible (the capital market line is the most efficient possible portfolio). This assumes that positions with $w_i < 0$ in asset $i$ are possible.

Requiring that the $R_p$ curve be tangent to the capital market line, i.e. that its slope at $w_i = 0$ equal the slope $\lambda_M = (R_M - r)/\sigma_M$ of the capital market line, gives (from equations (10.30, 10.37))

$$\frac{R_M - r}{(\sigma_M)} = \frac{(R_i - R_M)(\sigma_M)}{C_{i,M} - (\sigma_M)^2} \tag{10.38}$$

or

$$
\begin{aligned}
R_i &= r + \beta_i (R_M - r) \\
\beta_i &= \frac{C_{i,M}}{(\sigma_M)^2} .
\end{aligned}
\tag{10.39}
$$

The coefficient $\beta_i$ in equation (10.39) has a nice intuitive definition. Suppose we have a time series of returns

$$
\begin{aligned}
(R_i)_k &= \text{Return on asset } i \text{, in period } k \\
(R_M)_k &= \text{Return on market portfolio in period } k .
\end{aligned}
\tag{10.40}
$$

Typically, we assume that the market portfolio is a broad index, such as the TSX 300. Now, suppose we try to obtain a least squares fit to the above data, using the equation

$$R_i \simeq \alpha_i + b_i R_M . \tag{10.41}$$

Carrying out the usual least squares analysis (e.g. do a linear regression of $R_i$ vs. $R_M$), we find that

$$b_i = \frac{C_{i,M}}{(\sigma_M)^2} \tag{10.42}$$

so that we can write

$$R_i \simeq \alpha_i + \beta_i R_M . \tag{10.43}$$

This means that $\beta_i$ is the slope of the best fit straight line to a $((R_i)_k, (R_M)_k)$ scatter plot. An example is shown in Figure 10.4. Now, from equation (10.39) we have that

$$R_i = r + \beta_i (R_M - r) \tag{10.44}$$

which is consistent with equation (10.43) if

$$
\begin{aligned}
R_i &= \alpha_i + \beta_i R_M + \epsilon_i \\
E[\epsilon_i] &= 0 \\
\alpha_i &= r (1 - \beta_i) \\
E[\epsilon_i, R_M] &= 0 ,
\end{aligned}
\tag{10.45}
$$

since

$$E[R_i] = R_i = \alpha_i + \beta_i R_M . \tag{10.46}$$
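The least squares interpretation of $\beta_i$ can be illustrated with a short sketch. The return series below are synthetic: the "true" values of $\alpha_i$ and $\beta_i$, the noise levels, and the number of periods are all assumptions made up for the illustration, and no real market data is used.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Synthetic (assumed) data: true_alpha, true_beta and the noise levels are made up.
true_alpha, true_beta = 0.0002, 1.3
n_periods = 5000
R_M = [random.gauss(0.0004, 0.01) for _ in range(n_periods)]                  # (R_M)_k
R_i = [true_alpha + true_beta * rm + random.gauss(0.0, 0.02) for rm in R_M]  # (R_i)_k


def mean(xs):
    """Sample mean of a list of numbers."""
    return sum(xs) / len(xs)


def beta_from_covariance(ri, rm):
    """Estimate beta_i = C_{i,M} / sigma_M^2 from sample covariance and variance."""
    mi, mm = mean(ri), mean(rm)
    cov = sum((a - mi) * (b - mm) for a, b in zip(ri, rm)) / (len(rm) - 1)
    var = sum((b - mm) ** 2 for b in rm) / (len(rm) - 1)
    return cov / var


beta_hat = beta_from_covariance(R_i, R_M)
alpha_hat = mean(R_i) - beta_hat * mean(R_M)   # least squares intercept
print(beta_hat, alpha_hat)
```

The ratio of sample covariance to sample variance is exactly the slope of the ordinary least squares line through the $((R_i)_k, (R_M)_k)$ scatter plot, so `beta_hat` should land close to the assumed `true_beta`, up to sampling noise.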

[Figure 10.4: scatter plot titled "Rogers Wireless Communications". Horizontal axis: Return on market. Vertical axis: Return on stock.]

Figure 10.4: Return on Rogers Wireless Communications versus return on TSE 300. Each point represents pairs of daily returns. The vertical axis measures the daily return on the stock and the horizontal axis that of the TSE300.

Equation (10.46) has the interpretation that the return on asset $i$ can be decomposed into a drift component, a part which is correlated to the market portfolio (the broad index), and a random part uncorrelated with the index. Make the following assumptions

$$
E[\epsilon_i \epsilon_j] =
\begin{cases}
0 , & i \ne j \\
e_i^2 , & i = j
\end{cases}
\tag{10.47}
$$

i.e. that returns on each asset are correlated only through their correlation with the index. Consider once again a portfolio where the wealth is divided amongst $N$ assets, each asset receiving a fraction $w_i$ of the initial wealth. In this case, the return on the portfolio is

$$
\begin{aligned}
R_p &= \sum_{i=1}^{i=N} w_i R_i \\
E[R_p] &= \sum_{i=1}^{i=N} w_i \alpha_i + R_M \sum_{i=1}^{i=N} w_i \beta_i
\end{aligned}
\tag{10.48}
$$

and

$$
\begin{aligned}
\text{s.d.}(R_p) &= \sqrt{(\sigma_M)^2 \sum_{i=1}^{i=N} \sum_{j=1}^{j=N} w_i w_j \beta_i \beta_j + \sum_{i=1}^{i=N} w_i^2 e_i^2} \\
&= \sqrt{(\sigma_M)^2 \left( \sum_{i=1}^{i=N} w_i \beta_i \right)^2 + \sum_{i=1}^{i=N} w_i^2 e_i^2} .
\end{aligned}
\tag{10.49}
$$

Now, if $w_i = O(1/N)$, then

$$\sum_{i=1}^{i=N} w_i^2 e_i^2 \tag{10.50}$$

is $O(1/N)$ as $N$ becomes large, hence equation (10.49) becomes

$$\text{s.d.}(R_p) \simeq \sigma_M \left| \sum_{i=1}^{i=N} w_i \beta_i \right| . \tag{10.51}$$
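The $O(1/N)$ decay of the idiosyncratic term can be seen in a small numerical sketch. It assumes equal weights $w_i = 1/N$, a common $\beta_i$ for every asset, and a common $e_i$; these simplifications and all numerical values are assumptions for illustration only.

```python
import math

sigma_M = 0.20   # assumed market volatility
beta = 1.0       # assumed common beta for every asset
e = 0.30         # assumed s.d. of each idiosyncratic term epsilon_i


def portfolio_sd(n_assets):
    """Return (exact s.d. from (10.49), approximation (10.51), sum (10.50)) for equal weights."""
    w = [1.0 / n_assets] * n_assets
    beta_sum = sum(wi * beta for wi in w)
    systematic = sigma_M**2 * beta_sum**2
    idiosyncratic = sum(wi**2 * e**2 for wi in w)   # the sum in (10.50)
    exact = math.sqrt(systematic + idiosyncratic)
    approx = sigma_M * abs(beta_sum)
    return exact, approx, idiosyncratic


for n in (1, 10, 100, 1000):
    exact, approx, idio = portfolio_sd(n)
    print(n, round(exact, 4), round(approx, 4), idio)
```

With these inputs the sum (10.50) equals $e^2/N$, so it shrinks by a factor of ten each time $N$ grows tenfold, while the systematic part $\sigma_M \left| \sum w_i \beta_i \right|$ stays fixed: diversification removes the asset-specific risk but not the market risk.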

Note that if we write

$$R_i = r + \lambda_i \sigma_i \tag{10.52}$$

then we also have that

$$R_i = r + \beta_i (R_M - r) \tag{10.53}$$

so that the market price of risk of security $i$ is

$$\lambda_i = \frac{\beta_i (R_M - r)}{\sigma_i} \tag{10.54}$$

which is useful in real options analysis.

11 Stocks for the Long Run?

Conventional wisdom states that investment in a diversified portfolio of equities has a low risk for a long term investor. However, in a recent article ("Irrational Optimism," Fin. Anal. J., E. Dimson, P. Marsh, M. Staunton, vol 60 (January, 2004) 25-35) an extensive analysis of historical data of equity returns was carried out. Projecting this information forward, the authors conclude that the probability of a negative real return over a twenty year period, for an investor holding a diversified portfolio, is about 14 per cent. In fact, most individuals in defined contribution pension plans have poorly diversified portfolios. Making more realistic assumptions for defined contribution pension plans, the authors find that the probability of a negative real return over twenty years is about 25 per cent.

Let's see if we can explain why there is this common misconception about the riskiness of long term equity investing. Table 11.1 shows a typical table in a Mutual Fund advertisement. From this table, we are supposed to conclude that

• Long term equity investment is not very risky, with an annualized compound return about 3% higher than the current yield on government bonds.

• If $S$ is the value of the mutual fund, and $B$ is the value of the government bond, then

$$
\begin{aligned}
B(T) &= B(0) e^{rT} , \quad r = .03 \\
S(T) &= S(0) e^{\alpha T} , \quad \alpha = .06 ,
\end{aligned}
\tag{11.1}
$$

for $T$ large, which gives

$$\frac{S(T=30)/S(0)}{B(T=30)/B(0)} = e^{1.8 - .9} = e^{.9} \simeq 2.46 , \tag{11.2}$$

indicating that you more than double your return by investing in equities compared to bonds (over the long term).

A convenient way to measure the relative returns on these two investments (bonds and stocks) is to compare the total compound return

$$
\begin{aligned}
\text{Compound return: stocks} &= \log \left[ \frac{S(T)}{S(0)} \right] = \alpha T \\
\text{Compound return: bonds} &= \log \left[ \frac{B(T)}{B(0)} \right] = rT .
\end{aligned}
\tag{11.3}
$$
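The arithmetic behind (11.1)-(11.3) is easy to reproduce. The sketch below uses the values $r = .03$, $\alpha = .06$ and $T = 30$ quoted above; the variable names are simply labels chosen for this illustration.

```python
import math

r, alpha, T = 0.03, 0.06, 30.0   # values quoted in (11.1) and the horizon used in (11.2)

bond_compound = r * T        # total compound return on bonds, log[B(T)/B(0)]
stock_compound = alpha * T   # total compound return on stocks, log[S(T)/S(0)]

# Ratio of terminal wealth per dollar invested: [S(T)/S(0)] / [B(T)/B(0)]
ratio = math.exp(stock_compound - bond_compound)

print(bond_compound, stock_compound, round(ratio, 2))
```

The ratio rounds to 2.46, which is the "more than double" figure in (11.2). Note that this calculation treats $\alpha$ as a certain growth rate; it says nothing about the spread of outcomes around it.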

We can also compare the annualized compound returns

$$
\begin{aligned}
\text{Annualized compound return: stocks} &= \frac{1}{T} \log \left[ \frac{S(T)}{S(0)} \right] = \alpha \\
\text{Annualized compound return: bonds} &= \frac{1}{T} \log \left[ \frac{B(T)}{B(0)} \right] = r .
\end{aligned}
\tag{11.4}
$$

| Period | Annualized compound return |
|---|---|
| 1 year | -2% |
| 2 years | -5% |
| 5 years | 10% |
| 10 years | 8% |
| 20 years | 7% |
| 30 years | 6% |
| 30 year bond yield | 3% |

Table 11.1: Historical annualized compound return, XYZ Mutual Equity Funds. Also shown is the current yield on a long term government bond.

If we assume that the value of the equity portfolio $S$ follows a Geometric Brownian Motion

$$dS = \mu S \, dt + \sigma S \, dZ \tag{11.5}$$

then from equation (2.56) we have that

$$\log \left[ \frac{S(T)}{S(0)} \right] \sim N\left( \left( \mu - \frac{\sigma^2}{2} \right) T , \ \sigma^2 T \right) , \tag{11.6}$$

i.e. the compound return is normally distributed with mean $(\mu - \frac{\sigma^2}{2}) T$ and variance $\sigma^2 T$, so that the variance of the total compound return increases as $T$ becomes large.

Since $var(aX) = a^2 var(X)$, it follows that

$$\frac{1}{T} \log \left[ \frac{S(T)}{S(0)} \right] \sim N\left( \left( \mu - \frac{\sigma^2}{2} \right) , \ \sigma^2 / T \right) , \tag{11.7}$$

so that the variance of the annualized return tends to zero as $T$ becomes large.

Of course, what we really care about is the total compound return (that's how much we actually have at $t = T$, relative to what we invested at $t = 0$) at the end of the investment horizon. This is why Table 11.1 is misleading. There is significant risk in equities, even over the long term (30 years would be long-term for most investors).

Figure 11.1 shows the results of 100,000 simulations of asset prices assuming that the asset follows equation (11.5), with $\mu = .08$, $\sigma = .2$. The investment horizon is 5 years. The results are given in terms of histograms of the annualized compound return (equation (11.4)) and the total compound return (equation (11.3)).

Figure 11.2 shows similar results for an investment horizon of 30 years. Note how the variance of the annualized return has decreased, while the variance of the total return has increased (verifying equations (11.6-11.7)).

Assuming long term bonds yield 3%, this gives a total compound return over 30 years of .90, for bonds. Looking at the right hand panel of Figure 11.2 shows that there are many possible scenarios where the return on equities will be less than risk free bonds after 30 years. The number of scenarios with return less than risk free bonds is given by the area to the left of .9 in the histogram.

[Figure 11.1: two histogram panels. Left panel: "Annualized Returns − 5 years", horizontal axis Ann. Returns ($). Right panel: "Log Returns − 5 years", horizontal axis Log Returns ($).]

Figure 11.1: Histogram of distribution of returns $T = 5$ years. $\mu = .08$, $\sigma = .2$, 100,000 simulations. Left: annualized return $1/T \log[S(T)/S(0)]$. Right: return $\log[S(T)/S(0)]$.

[Figure 11.2: two histogram panels. Left panel: "Annualized Returns − 30 years", horizontal axis Ann. Returns ($). Right panel: "Log Returns − 30 years", horizontal axis Log Returns ($).]

Figure 11.2: Histogram of distribution of returns $T = 30$ years. $\mu = .08$, $\sigma = .2$, 100,000 simulations. Left: annualized return $1/T \log[S(T)/S(0)]$. Right: return $\log[S(T)/S(0)]$.

12 Further Reading

12.1 General Interest

• Peter Bernstein, Capital Ideas: the improbable origins of modern Wall street, The Free Press, New York, 1992.

• Peter Bernstein, Against the Gods: the remarkable story of risk, John Wiley, New York, 1998, ISBN 0-471-29563-9.

• Burton Malkiel, A random walk down Wall Street, W.W. Norton, New York, 1999, ISBN 0-393-32040-5.

• N. Taleb, Fooled by Randomness, Texere, 2001, ISBN 1-58799-071-7.

12.2 More Background

• A. Dixit and R. Pindyck, Investment under uncertainty, Princeton University Press, 1994.

• John Hull, Options, futures and other derivatives, Prentice-Hall, 1997, ISBN 0-13-186479-3.

• S. Ross, R. Westerfield, J. Jaffe, Corporate Finance, McGraw-Hill Irwin, 2002, ISBN 0-07-283137-5.

• W. Sharpe, Portfolio Theory and Capital Markets, Wiley, 1970, reprinted in 2000, ISBN 0-07-135320-8. (Still a classic).

• Lenos Trigeorgis, Real Options: Managerial Flexibility and Strategy for Resource Allocation, MIT Press, 1996, ISBN 0-262-20102-X.

12.3 More Technical

• P. Brandimarte, Numerical Methods in Finance: A Matlab Introduction, Wiley, 2002, ISBN 0-471-39686-9.

• Boyle, Broadie, Glasserman, Monte Carlo methods for security pricing, J. Econ. Dyn. Con., 21:1267-1321 (1997)

• P. Glasserman, Monte Carlo Methods in Financial Engineering, Springer (2004) ISBN 0-387-00451-3.

• D. Higham, An Introduction to Financial Option Valuation, Cambridge (2004) ISBN 0-521-83884-3.

• P. Jackel, Monte Carlo Methods in Finance, Wiley, 2002, ISBN 0-471-49741-X.

• Y.K. Kwok, Mathematical Models of Finance, Springer Singapore, 1998, ISBN 981-3083-565.

• S. Neftci, An Introduction to the Mathematics of Financial Derivatives, Academic Press (2000) ISBN 0-12-515392-9.

• R. Seydel, Tools for Computational Finance, Springer, 2002, ISBN 3-540-43609-X.

• D. Tavella and C. Randall, Pricing Financial Instruments: the Finite Difference Method, Wiley, 2000, ISBN 0-471-19760-2.

• D. Tavella, Quantitative Methods in Derivatives Pricing: An Introduction to Computational Finance, Wiley (2002) ISBN 0-471-39447-5.

• N. Taleb, Dynamic Hedging, Wiley, 1997, ISBN 0-471-15280-3.

• P. Wilmott, S. Howison, J. Dewynne, The mathematics of financial derivatives: A student introduction, Cambridge, 1997, ISBN 0-521-49789-2.

• Paul Wilmott, Paul Wilmott on Quantitative Finance, Wiley, 2000, ISBN 0-471-87438-8.
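The simulation experiment of Section 11 (Figures 11.1 and 11.2) can be sketched in a few lines. This is a minimal illustration and not the code used to produce the figures: it assumes that, instead of time stepping equation (11.5), we sample $\log[S(T)/S(0)]$ directly from the normal distribution in (11.6). The seed and the helper names are arbitrary choices for the sketch; $\mu = .08$, $\sigma = .2$, the 3% bond yield and the 100,000 simulations are the values quoted in the text.

```python
import math
import random

random.seed(1)  # arbitrary seed, only for reproducibility of the sketch

mu, sigma = 0.08, 0.2      # drift and volatility quoted for Figures 11.1 and 11.2
n_sims = 100_000           # number of simulated scenarios
bond_yield = 0.03          # long term bond yield assumed in the text


def simulate_log_returns(T):
    """Sample log[S(T)/S(0)] from the normal distribution in equation (11.6)."""
    mean = (mu - 0.5 * sigma**2) * T
    sd = sigma * math.sqrt(T)
    return [random.gauss(mean, sd) for _ in range(n_sims)]


def variance(xs):
    """Unbiased sample variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


for T in (5.0, 30.0):
    total = simulate_log_returns(T)          # total compound return, (11.3)
    annualized = [x / T for x in total]      # annualized compound return, (11.4)
    # Fraction of scenarios where equities end up below the risk free bond return r*T
    below_bonds = sum(x < bond_yield * T for x in total) / n_sims
    print(T, variance(total), variance(annualized), below_bonds)
```

Going from $T = 5$ to $T = 30$, the sample variance of the total return should grow roughly like $\sigma^2 T$ while that of the annualized return should shrink roughly like $\sigma^2 / T$, in line with (11.6-11.7). Under these assumptions the 30 year total return is $N(1.8, 1.2)$, so the fraction of scenarios below the bond return of .9 should come out near 0.21, i.e. about one scenario in five.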