
Monte Carlo Simulation: IEOR E4703 © 2010 by Martin Haugh

Probability Review and Overview of Monte Carlo


The first part of these notes provides a brief overview of results and models that we will need from applied
probability. We then provide an overview of Monte-Carlo simulation and some of the topics we will cover in
this course. In a very brief section at the end we provide links to some online resources for learning Matlab, the
default software that we will use in this course.
1 Probability Review
We now review some probability theory. Monte Carlo simulation relies extensively on probability theory so it is
very important that you understand this material well. We begin with some standard distributions (both discrete
and continuous) and then move on to discuss other topics.
Discrete Probability Distributions
Example 1 (Binomial Distribution)
We say X has a Binomial distribution, or X ∼ B(n, p), if
$$P(X = r) = \binom{n}{r} p^r (1-p)^{n-r}.$$
For example, X might represent the number of heads in n independent coin tosses, where p = P(head). The
mean and variance of the binomial distribution satisfy
E[X] = np
Var[X] = np(1 − p).
Question: Assuming we know how to generate random coin tosses, how would we generate a B(n, p) random
variable?
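One possible answer, sketched in Matlab below, is to simulate the n tosses directly and count the heads, where each toss is represented as the event {U < p} for a U(0, 1) draw U.

```matlab
% Sketch: generate a B(n,p) rv as the number of heads in n coin tosses,
% where each toss is simulated as (rand < p).
n = 10; p = 0.3;
tosses = (rand(n,1) < p);   % n independent tosses, 1 = head
X = sum(tosses);            % X ~ B(n,p)
```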
Example 2 (Geometric Distribution)
We say X has a Geometric distribution if
$$P(X = r) = (1-p)^{r-1}\, p.$$
For example, X could be the number of independent coin tosses required until the 1st head, where p = P(head).
The mean and variance of the geometric distribution satisfy
E[X] = 1/p
Var[X] = (1 − p)/p².
Question: Assuming we know how to generate random coin tosses, how would we generate a geometric
random variable?
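A minimal sketch of one answer: keep tossing the coin until the first head appears and count the number of tosses.

```matlab
% Sketch: toss coins until the first head appears; the number of
% tosses is then Geometric(p).
p = 0.3;
X = 1;
while rand >= p             % (rand < p) represents a head
    X = X + 1;
end                         % X ~ Geometric(p)
```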
Example 3 (Poisson Distribution)
We say X has a Poisson(λ) distribution if
$$P(X = r) = \frac{\lambda^r e^{-\lambda}}{r!}.$$
The mean and variance of the Poisson distribution satisfy
E[X] = λ
Var[X] = λ.
Question: How do we generate a Poisson(λ) random variable? Could we use coin tosses?
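One standard answer, sketched below, multiplies U(0, 1) draws together until the product falls below e^{−λ}; the number of multiplications performed is then Poisson(λ). (This is equivalent to counting exponential arrivals in one unit of time.)

```matlab
% Sketch: multiply uniforms until the product drops below exp(-lambda);
% the count of multiplications is Poisson(lambda).
lambda = 2;
X = 0; u = rand;
while u >= exp(-lambda)
    u = u * rand;
    X = X + 1;
end                         % X ~ Poisson(lambda)
```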
Continuous Probability Distributions
We now review some continuous distributions, beginning with the most important distribution in simulation, the
uniform distribution. But first, we review some notation and important facts regarding continuous random
variables. This also applies to discrete random variables but with integrals replaced by summations.
Definition 1 The cumulative distribution function (CDF), F(x), of a rv, X, is defined by
$$F(x) := P(X \le x).$$
Definition 2 A continuous rv, X, has probability density function (PDF), f(·), if f(x) ≥ 0 and, for any set A,
$$P(X \in A) = \int_A f(y)\, dy.$$
The CDF and PDF are related by
$$F(x) = \int_{-\infty}^{x} f(y)\, dy$$
so that if you differentiate the CDF you obtain the PDF. It is often convenient to observe that
$$P\left(X \in \left(x - \frac{\epsilon}{2},\; x + \frac{\epsilon}{2}\right)\right) \approx \epsilon\, f(x)$$
for small ε > 0.
Example 4 (Uniform Distribution)
We say X has the uniform distribution on (a, b) if it has PDF
$$f(x) = \begin{cases} 1/(b-a) & x \in (a, b) \\ 0 & \text{otherwise.} \end{cases}$$
We write X ∼ U(a, b). The mean and variance of the uniform distribution satisfy
E[X] = (a + b)/2
Var[X] = (b − a)²/12.
Question: Suppose for now that we can simulate, i.e. generate, independent and identically distributed (IID) U(0, 1)
random variables (rvs). How do we generateate a random coin toss with P(Head) = p?
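A minimal sketch of the standard answer: compare a single U(0, 1) draw to p.

```matlab
% Sketch: one coin toss with P(Head) = p from a single U(0,1) draw.
p = 0.6;
head = (rand <= p);         % head = 1 with probability p, 0 otherwise
```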
Example 5 (Normal Distribution)
We say X has a Normal distribution, or X ∼ N(μ, σ²), if
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).$$
The mean and variance of the normal distribution satisfy
E[X] = μ
Var[X] = σ².
Question: How do we generate a normal random variable?
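One classical answer is the Box-Muller transform, sketched below, which converts two independent U(0, 1) draws into two independent N(0, 1) draws; in practice Matlab's built-in randn can be used instead.

```matlab
% Sketch: Box-Muller transform of two U(0,1) draws into two N(0,1) draws.
mu = 1; sigma = 2;
U1 = rand; U2 = rand;
Z1 = sqrt(-2*log(U1)) * cos(2*pi*U2);
Z2 = sqrt(-2*log(U1)) * sin(2*pi*U2);   % a second, independent N(0,1) draw
X = mu + sigma * Z1;                    % X ~ N(mu, sigma^2)
```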
Example 6 (Exponential Distribution)
We say X has an Exponential(λ) distribution if
$$f(x) = \lambda e^{-\lambda x}, \qquad x > 0.$$
The mean and variance of the exponential distribution satisfy
E[X] = 1/λ
Var[X] = 1/λ².
Question: How do we generate an exponential random variable?
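A minimal sketch of the inverse transform answer: if U ∼ U(0, 1) then −log(U)/λ ∼ Exponential(λ).

```matlab
% Sketch: inverse transform method for the exponential distribution.
lambda = 0.5;
X = -log(rand) / lambda;    % X ~ Exponential(lambda)
```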
Example 7 (Lognormal Distribution)
We say X has a lognormal distribution, or X ∼ LN(μ, σ²), if
log(X) ∼ N(μ, σ²).
The mean and variance of the lognormal distribution satisfy
E[X] = exp(μ + σ²/2)
Var[X] = exp(2μ + σ²) (exp(σ²) − 1).
The lognormal distribution has played a very important role in financial applications.
Question: How do we generate a lognormal random variable?
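A minimal sketch, using the definition directly: generate a N(μ, σ²) rv and exponentiate it.

```matlab
% Sketch: exponentiate a normal rv to obtain a lognormal rv.
mu = 0; sigma = 0.2;
X = exp(mu + sigma * randn);   % log(X) ~ N(mu, sigma^2), so X ~ LN(mu, sigma^2)
```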
Example 8 (Gamma Distribution)
We say X has a Gamma(n, λ) distribution if
$$f(x) = \frac{\lambda e^{-\lambda x} (\lambda x)^{n-1}}{(n-1)!}, \qquad x > 0.$$
The mean and variance of the gamma distribution satisfy
E[X] = n/λ
Var[X] = n/λ².
Question: How do we generate a gamma random variable?
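For integer n, one possible answer (sketched below) uses the fact that a Gamma(n, λ) rv is the sum of n independent Exponential(λ) rvs.

```matlab
% Sketch: Gamma(n, lambda) with integer n as a sum of n exponentials.
n = 5; lambda = 0.5;
X = sum(-log(rand(n,1)) / lambda);   % X ~ Gamma(n, lambda)
```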
Multivariate Distributions
Let X = (X_1, . . . , X_n) be an n-dimensional vector of random variables. We have the following definitions and
statements.
Definition 3 (Joint CDF) For all x = (x_1, . . . , x_n)^T ∈ R^n, the joint cumulative distribution function (CDF)
of X satisfies
$$F_X(x) = F_X(x_1, \ldots, x_n) = P(X_1 \le x_1, \ldots, X_n \le x_n).$$
Definition 4 (Marginal CDF) For a fixed i, the marginal CDF of X_i satisfies
$$F_{X_i}(x_i) = F_X(\infty, \ldots, \infty, x_i, \infty, \ldots, \infty).$$
It is straightforward to generalize the previous definition to joint marginal distributions. For example, the joint
marginal distribution of X_i and X_j satisfies
$$F_{ij}(x_i, x_j) = F_X(\infty, \ldots, \infty, x_i, \infty, \ldots, \infty, x_j, \infty, \ldots, \infty).$$
If the joint CDF is absolutely continuous, then it has an associated probability density function (PDF) so that
$$F_X(x_1, \ldots, x_n) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} f(u_1, \ldots, u_n)\, du_1 \ldots du_n.$$
Similar statements also apply to the marginal CDFs.
Definition 5 (Independent Random Variables) A collection of random variables is independent if the
joint CDF (or PDF if it exists) can be factored into the product of the marginal CDFs (or PDFs).
Definition 6 (Conditional Distribution) If X_1 = (X_1, . . . , X_k)^T and X_2 = (X_{k+1}, . . . , X_n)^T is a
partition of X then the conditional CDF satisfies
$$F_{X_2|X_1}(x_2|x_1) = P(X_2 \le x_2 \mid X_1 = x_1).$$
If X has a PDF, f(·), then it satisfies
$$F_{X_2|X_1}(x_2|x_1) = \int_{-\infty}^{x_{k+1}} \cdots \int_{-\infty}^{x_n} \frac{f(x_1, \ldots, x_k, u_{k+1}, \ldots, u_n)}{f_{X_1}(x_1)}\, du_{k+1} \ldots du_n$$
where f_{X_1}(·) is the joint marginal PDF of X_1.
Assuming it exists, the mean vector of X is given by
$$E[X] := (E[X_1], \ldots, E[X_n])^T$$
whereas, again assuming it exists, the covariance matrix of X satisfies
$$\text{Cov}(X) := \Sigma := E\left[(X - E[X])\,(X - E[X])^T\right]$$
so that the (i, j)th element of Σ is simply the covariance of X_i and X_j. Note that the covariance matrix is
symmetric so that Σ^T = Σ, its diagonal elements satisfy Σ_{i,i} ≥ 0, and it is positive semi-definite so that
x^T Σ x ≥ 0 for all x ∈ R^n. The correlation matrix, ρ(X), has as its (i, j)th element ρ_{ij} := Corr(X_i, X_j). It is
also symmetric, positive semi-definite and has 1's along the diagonal. For any matrix A ∈ R^{k×n} and vector
a ∈ R^k we have
$$E[AX + a] = A\, E[X] + a \qquad (1)$$
$$\text{Cov}(AX + a) = A\, \text{Cov}(X)\, A^T. \qquad (2)$$
Note that (2) implies
$$\text{Var}(aX + bY) = a^2\, \text{Var}(X) + b^2\, \text{Var}(Y) + 2ab\, \text{Cov}(X, Y).$$
If X and Y are independent, then Cov(X, Y) = 0, though the converse is not true in general. Note, however, that if
(X, Y) is bivariate normal then Cov(X, Y) = 0 does imply that X and Y are independent.
Definition 7 The characteristic function of X is given by
$$\phi_X(s) := E\left[e^{is^T X}\right] \quad \text{for } s \in R^n \qquad (3)$$
and, if it exists, the moment-generating function (MGF) is given by (3) with s replaced by −is.
Example 9 (The Multivariate Normal Distribution)
If the n-dimensional vector X is multivariate normal with mean vector μ and covariance matrix Σ then we write
X ∼ MN_n(μ, Σ).
The standard multivariate normal has μ = 0 and Σ = I_n, the n × n identity matrix. The PDF of X is given by
$$f(x) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}}\, e^{-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)} \qquad (4)$$
where | · | denotes the determinant, and its characteristic function satisfies
$$\phi_X(s) = E\left[e^{is^T X}\right] = e^{is^T \mu - \frac{1}{2} s^T \Sigma s}. \qquad (5)$$
Recall again our partition of X into X_1 = (X_1, . . . , X_k)^T and X_2 = (X_{k+1}, . . . , X_n)^T. If we extend this
notation naturally so that
$$\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} \quad \text{and} \quad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$$
then we obtain the following results regarding the marginal and conditional distributions of X.
Marginal Distribution
The marginal distribution of a multivariate normal random vector is itself (multivariate) normal. In particular,
X_i ∼ MN(μ_i, Σ_{ii}), for i = 1, 2.
Conditional Distribution
Assuming Σ is positive definite, the conditional distribution of a multivariate normal distribution is also a
(multivariate) normal distribution. In particular,
$$X_2 \mid X_1 = x_1 \;\sim\; MN(\mu_{2.1}, \Sigma_{2.1})$$
where μ_{2.1} = μ_2 + Σ_{21} Σ_{11}^{-1} (x_1 − μ_1) and Σ_{2.1} = Σ_{22} − Σ_{21} Σ_{11}^{-1} Σ_{12}.
Linear Combinations
Linear combinations of multivariate normal random vectors remain normally distributed with mean vector and
covariance matrix given by (1) and (2), respectively.
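These properties also suggest a standard way to simulate MN(μ, Σ) random vectors, sketched below: if Σ = CC^T, e.g. via the Cholesky decomposition, and Z ∼ MN(0, I_n), then (1) and (2) imply μ + CZ ∼ MN(μ, Σ).

```matlab
% Sketch: simulate MN(mu, Sigma) as a linear transformation of standard
% normals; if Sigma = C*C' then mu + C*Z ~ MN(mu, Sigma).
mu = [0; 1];
Sigma = [1 0.5; 0.5 2];
C = chol(Sigma, 'lower');   % lower-triangular Cholesky factor
Z = randn(2, 1);            % Z ~ MN(0, I_2)
X = mu + C * Z;             % X ~ MN(mu, Sigma)
```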
Example 10 (The Multivariate t Distribution)
We say X has the multivariate t distribution with ν degrees-of-freedom (dof) if
$$X = \mu + \frac{Z}{\sqrt{V/\nu}}$$
where Z ∼ MN(0, Σ) and V ∼ χ²_ν independently of Z. We write X ∼ t_n(ν, μ, Σ) and note that
Cov(X) = νΣ/(ν − 2), but this is only defined when ν > 2. Clearly E[X] = μ. The multivariate t distribution
plays an important role in risk management as it often provides a very good fit to asset return distributions.
Question: Suppose we can simulate MN(μ, Σ) random vectors and χ²_ν random variables. How would we
simulate X ∼ t_n(ν, μ, Σ)?
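A minimal sketch of one answer, combining the previous Cholesky construction with a chi-squared draw (chi2rnd is from the Statistics toolbox):

```matlab
% Sketch: simulate X ~ t_n(nu, mu, Sigma) from its definition.
nu = 5;
mu = [0; 1];
Sigma = [1 0.5; 0.5 2];
Z = chol(Sigma, 'lower') * randn(2, 1);   % Z ~ MN(0, Sigma)
V = chi2rnd(nu);                          % V ~ chi^2_nu, independent of Z
X = mu + Z / sqrt(V / nu);                % X ~ t_n(nu, mu, Sigma)
```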
Conditional Expectations and Conditional Variances
For random variables, X and Y , we have the conditional expectation identity
E[X] = E[E[X|Y ]]
and the conditional variance identity
Var(X) = Var(E[X|Y ]) + E[Var(X|Y )].
These identities, while useful in general, are particularly useful in simulation.
Example 11 (A Random Sum of Random Variables)
Let W = X_1 + X_2 + · · · + X_n where the X_i's are IID and n is also a rv, independent of the X_i's.
1. What is E[W] ?
Use E[X] = E[E[X|Y ]]
What is X and what is Y ?
2. What is Var(W) ?
Use Var(X) = Var(E[X|Y ]) + E[Var(X|Y )]
Again, what is X and what is Y ?
Example 12 (Chickens and Eggs)
A hen lays N eggs where N ∼ Poisson(λ). Each egg hatches with probability p, independently of the other eggs
and N. Let K be the number of chickens.
1. What is E[K|N]?
2. What is E[K]?
3. What is E[N|K]?
Limit Theorems
Theorem 1 (The Weak Law of Large Numbers)
Suppose X_1, . . . , X_n are IID with mean μ. Then for any ε > 0
$$P\left(\left|\frac{X_1 + \cdots + X_n}{n} - \mu\right| > \epsilon\right) \to 0 \quad \text{as } n \to \infty.$$
Theorem 2 (The Strong Law of Large Numbers)
Suppose X_1, . . . , X_n are IID with mean μ. Then
$$\lim_{n \to \infty} \frac{X_1 + \cdots + X_n}{n} = \mu$$
with probability 1.
The Strong Law of Large Numbers is a very important result and indeed it will justify our simulation
methodology.
Theorem 3 (Central Limit Theorem)
Suppose X_1, . . . , X_n are IID with mean μ and finite variance σ². Define
$$S_n := \frac{X_1 + \cdots + X_n}{n}.$$
Then
$$\sqrt{n}\, \frac{S_n - \mu}{\sigma} \;\to\; N(0, 1) \quad \text{as } n \to \infty.$$
We will see later that the Central Limit Theorem allows us to easily analyze the output of our simulations.
The Poisson Process
Consider a stochastic process where events (or arrivals) occur at random points in time. Let N(t) denote the
number of events (or arrivals) that have occurred by time t and assume that N(t) has the following properties:
1. N(0) = 0
2. The number of arrivals in non-overlapping time intervals are independent
3. The distribution of the number of arrivals in a time interval only depends on the length of the interval
4. For h > 0 very small,
P(N(h) = 1) ≈ λh
P(N(h) ≥ 2) ≈ 0
P(N(h) = 0) ≈ 1 − λh
Then we say that N(t) is a Poisson process with intensity λ. The Poisson process and generalizations of it are
very useful for modeling arrival processes. In particular, it can be used for modeling
- the emission of particles from a radioactive source
- market crashes
- corporate defaults
- the arrivals of customers to a queue.
It can be shown that
$$P(N(t) = r) = \frac{(\lambda t)^r e^{-\lambda t}}{r!}$$
so that N(t) has the familiar Poisson(λt) distribution. Let X_i be the interval between the (i − 1)th and the ith
arrivals, i.e. the ith interarrival time. Then our assumptions imply (why?) that the X_i's are IID. What is the
distribution of X_i? We can answer this question by computing P(X_1 > t) = P(N(t) = 0) = . . .?
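As a sketch of how the defining properties pin down this distribution, we can approximately simulate N(t) directly from property 4 by chopping [0, t] into many short intervals of length h, each containing an arrival with probability roughly λh:

```matlab
% Sketch: approximate a Poisson process using its defining small-h
% properties; each short interval holds an arrival with prob ~ lambda*h.
lambda = 2; t = 3; h = 1e-4;
arrivals = (rand(round(t/h), 1) < lambda * h);
N_t = sum(arrivals);        % approximately Poisson(lambda * t)
```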
Later in the course we will review Brownian motion and any other topics from probability theory that we might
need. We end our probability review with a simple yet often very confusing example.
Example 13 (A Brain-Teaser)
The king comes from a family of 2 children. What is the probability that the other child is his brother?
Notation:
BG means the 1st child is a boy and the 2nd child is a girl;
BB means both children are boys, etc.
What is it we want to compute?
2 Overview of Monte Carlo Simulation
Monte-Carlo techniques are used to understand (and sometimes control) stochastic systems that are
too complex to be understood or controlled using analytic or numerical methods. Examples of such systems
include weather and climate systems, telecommunications networks and financial markets. Before using
simulation, we first need to build a mathematical model of the system.
Modeling the System
A good model should
- facilitate a good understanding of the system, either analytically, numerically, or through simulation
- capture the salient details of the system, omitting factors that are not relevant.
To build a good model we must perform the following steps:
- identify the system, including the features and state variables that need to be included and those that should be excluded
- make necessary probabilistic assumptions about state variables and features
- test these assumptions
- identify the inputs and initial values of the state variables
- identify the performance measure that we wish to obtain from the system
- identify the mathematical relationship between the terminal state variables and the performance measure
- solve the model either analytically, numerically or using simulation.
Example 14 (An Inventory Problem)
A retailer sells a perishable commodity and each day he places an order for Q units. Each unit that is sold gives
a profit of 60 cents and units not sold at the end of the day are discarded at a loss of 40 cents per unit. The
demand, D, on any given day is uniformly distributed on [80, 140]. How many units should the retailer order to
maximize expected profit?
Solution 1: Use Simulation
Let P denote the retailer's profit so that
$$P = \begin{cases} 0.6\,Q & \text{if } D \ge Q \\ 0.6\,D - 0.4\,(Q - D) & \text{if } Q > D. \end{cases}$$
We begin by setting Q = 80 and then generating n IID replications of D. For each replication, we compute the
profit or loss, P_i(80) say, for i = 1, . . . , n. We then estimate E[P(80)] with Σ_i P_i(80)/n. We then repeat the entire
exercise for different values of Q and then select the value that gives the biggest estimated profit.
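A minimal Matlab sketch of this procedure (the number of replications n and the grid of Q values below are our choices):

```matlab
% Sketch of Solution 1: estimate E[P(Q)] by simulation for each Q.
n = 1e5;                                  % replications per value of Q
Qvals = 80:140;
estProfit = zeros(size(Qvals));
for k = 1:length(Qvals)
    Q = Qvals(k);
    D = 80 + 60 * rand(n, 1);             % IID demands, D ~ U(80, 140)
    P = 0.6 * min(D, Q) - 0.4 * max(Q - D, 0);   % daily profit
    estProfit(k) = mean(P);
end
[bestProfit, iBest] = max(estProfit);
Qbest = Qvals(iBest)                      % should be close to Q* = 116
```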
Solution 2: Solve Analytically!
We can write the expected profit as
$$E[P] = \int_{Q}^{140} \frac{0.6\,Q}{60}\, dx + \int_{80}^{Q} \frac{0.6x - 0.4(Q - x)}{60}\, dx$$
and then use calculus to find that the optimal value of Q is Q* = 116.
Note that the analytic solution gives the exact answer and requires far less work than simulation! But this
problem is simple and for more complex problems, simulation is often the only option.
The Issues in Simulation
In this course we assume that the model has already been constructed and we will use simulation techniques to
analyze it. (In many financial applications the analysis amounts to little more than estimating the expected
value of some random variable which may represent the payoff of a derivative security or the riskiness of a
security or portfolio.) A number of issues then arise:
- What distribution do the random variables have?
- How do we generate these random variables for the simulation?
- How do we analyze the output of the simulation?
- How many simulation runs do we need?
- How do we improve the efficiency of the simulation?
We will answer all of these questions in this course. Consider now the following example where we wish to price
an Asian call option.
Example 15 (Pricing Asian Options)
Consider a stock whose time t price is denoted by S_t. For a given positive integer, m, the arithmetic mean, A,
and the geometric mean, G, are given respectively by
$$A := \frac{\sum_{j=1}^{m} S_{jT/m}}{m} \qquad\qquad G := \left(\prod_{j=1}^{m} S_{jT/m}\right)^{1/m}.$$
Now consider an asset that pays [A − K]^+ := max(A − K, 0) at time T. This is a European call option on the
arithmetic mean. K is the strike price and is fixed in advance at t = 0. T is the expiration date.
Similarly, a call option on the geometric mean has payoff at time T given by [G − K]^+ := max(G − K, 0).
Finance theory implies that the fair price, C_A, of the arithmetic option is given by
$$C_A := E_Q\left[e^{-rT}(A - K)^+\right] \qquad (6)$$
where the expectation in (6) is taken with respect to the risk-neutral probability distribution, Q, and r is the
risk-free interest rate. Similarly, the price, C_G, of the geometric option is given by
$$C_G := E_Q\left[e^{-rT}(G - K)^+\right].$$
We can sometimes compute C_G in closed form but this is generally not true of C_A. So in order to find C_A we
could use simulation. We have the following simulation algorithm:
Monte-Carlo Algorithm
1. Generate random values of S_{jT/m} for j = 1, . . . , m.
2. Compute a_1 := ((1/m) Σ_{j=1}^{m} S_{jT/m} − K)^+.
3. Repeat n times obtaining sample payoffs a_1, . . . , a_n.
4. Estimate the Asian option price with
$$\widehat{C}_A = e^{-rT}\, \frac{a_1 + \cdots + a_n}{n}.$$
Several questions now arise. How do we generate sample values of S_{jT/m}? How large should n be? How do we
assess the quality of Ĉ_A? In particular, what are the values of E[Ĉ_A] and Var(Ĉ_A)? Can we make the simulation
more efficient?
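To make the algorithm concrete, here is a minimal Matlab sketch. Note that it assumes a model these notes have not specified: the stock is taken to follow risk-neutral geometric Brownian motion, with S0, sigma and r below being our assumed parameters.

```matlab
% Sketch, ASSUMING risk-neutral GBM (an assumption, not given above):
% S_{t+dt} = S_t * exp((r - sigma^2/2)*dt + sigma*sqrt(dt)*Z), Z ~ N(0,1).
S0 = 100; K = 100; r = 0.05; sigma = 0.2; T = 1;
m = 12;                        % number of monitoring dates
n = 1e5;                       % number of replications
dt = T / m;
Z = randn(n, m);
logS = log(S0) + cumsum((r - 0.5*sigma^2)*dt + sigma*sqrt(dt)*Z, 2);
S = exp(logS);                 % row i holds one path of S_{jT/m}, j = 1..m
A = mean(S, 2);                % arithmetic means, one per path
a = max(A - K, 0);             % sample payoffs a_1, ..., a_n
C_A_hat = exp(-r*T) * mean(a)  % the estimator from step 4
```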
Note that in the preceding example we have assumed that we know the model. The state variables and the
mathematical relationship between the inputs and outputs have all been identified. The goal is simply to price
the option.
Variance Reduction Techniques
A particularly important topic is simulation efficiency. We will wish to develop techniques that improve the
efficiency of our simulations, and this leads us to variance reduction methods. Later in the course we
will study the following techniques:
1. Common random numbers
2. Antithetic variates
3. Control variates
4. Conditional expectations
5. Stratified sampling
6. Importance sampling
We now give a very brief introduction to and motivation for some of these variance reduction methods.
Antithetic Variates
Suppose S_t is lognormally distributed so that log(S_t) ∼ N(μ, σ²) and that we wish to generate values of S_t.
Question: Do we need to generate 2 independent N(μ, σ²) random variables in order to generate 2 sample values of
S_t?
Answer: Maybe not! Note that if X ∼ N(0, 1) then −X ∼ N(0, 1). Recalling that
$$\text{Var}(W + Y) = \text{Var}(W) + \text{Var}(Y) + 2\,\text{Cov}(W, Y)$$
can you see how this observation might be of use?
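A minimal sketch of the resulting antithetic-variates estimator for E[S_t]:

```matlab
% Sketch: antithetic variates for E[S_t] with log(S_t) ~ N(mu, sigma^2).
mu = 0; sigma = 0.3; n = 1e5;
Z = randn(n, 1);
S_plus  = exp(mu + sigma * Z);             % uses Z
S_minus = exp(mu - sigma * Z);             % uses -Z, also lognormal
estimate = mean((S_plus + S_minus) / 2);   % antithetic pairs are negatively
                                           % correlated, reducing variance
```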
Control Variates
Control variates are based on the idea of using what you know. Returning to the Asian option example,
suppose we know how to compute the price of the call option on the geometric mean. Can we use this
information? Consider using the following estimator for the price of the Asian option on the arithmetic mean:
$$\widehat{C}_A^{\,cv} = \widehat{C}_A + b\,(C_G - \widehat{C}_G)$$
where C_G is the true price of the option on the geometric mean, and Ĉ_A and Ĉ_G are the Monte-Carlo
estimates of the arithmetic- and geometric-mean options, respectively. Why might this work, i.e., result in a better
estimator of C_A? How do we choose b?
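One common answer chooses b to minimize the variance of the combined estimator, which gives b* = Cov(Ĉ_A, Ĉ_G)/Var(Ĉ_G); b* can itself be estimated from the simulated samples. The following self-contained toy sketch (not the Asian option itself) illustrates the mechanics by estimating θ = E[e^U] with U ∼ U(0, 1), using U as the control since E[U] = 1/2 is known exactly:

```matlab
% Toy sketch of control variates: estimate theta = E[exp(U)], U ~ U(0,1),
% with control variate U, whose mean E[U] = 1/2 is known.
n = 1e5;
U = rand(n, 1);
Y = exp(U);                          % crude samples; mean(Y) estimates theta
covMat = cov(Y, U);                  % 2x2 sample covariance matrix
b = covMat(1, 2) / covMat(2, 2);     % b* = Cov(Y,U)/Var(U) minimizes variance
theta_cv = mean(Y) + b * (0.5 - mean(U))   % control-variate estimator
```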
Conditional Monte Carlo and Stratified Monte Carlo
Recall our conditional variance identity which states
Var(X) = Var(E[X|Y ]) + E[Var(X|Y )].
This identity is very useful for certain variance reduction techniques. In particular, both conditional Monte Carlo
and stratified Monte Carlo are motivated by it. We will see why later in the course.
Importance Sampling
Sometimes it is useful to sample from a different distribution! For example, let X be a random variable with
density f(·) and suppose we want to estimate θ = E[h(X)]. Let g(·) be another density function. Then
$$\theta = E[h(X)] = \int h(x) f(x)\, dx = \int \frac{h(x) f(x)}{g(x)}\, g(x)\, dx = E_g\left[\frac{h(X) f(X)}{g(X)}\right] \qquad (7)$$
where E_g[·] denotes an expectation with respect to the PDF, g(·). What condition must g(·) satisfy in order to
justify (7)? This kind of argument is often very useful for simulating rare events such as the probability of a
severe portfolio loss occurring.
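A minimal sketch for such a rare event: estimate θ = P(X > 4) for X ∼ N(0, 1), taking g to be the N(4, 1) density so that the samples land near the region of interest.

```matlab
% Sketch: importance sampling for theta = P(X > 4), X ~ N(0,1),
% sampling instead from g = N(4,1) and weighting by f(Y)/g(Y).
n = 1e5; c = 4;
Y = c + randn(n, 1);                      % samples from g = N(c,1)
w = exp(-Y.^2/2) ./ exp(-(Y - c).^2/2);   % f(Y)/g(Y); constants cancel
theta_hat = mean((Y > c) .* w)            % true value 1 - Phi(4), about 3.17e-5
```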
Monte-Carlo Simulation in Finance
Monte-Carlo methods have many applications in finance besides estimating European option prices. They are
also used extensively to estimate risk measures such as Value-at-Risk (VaR) and Conditional Value-at-Risk
(CVaR). In recent years, Monte-Carlo methods have also been used to price high-dimensional American options
for which standard numerical techniques¹ are inadequate. They can also be used to estimate the sensitivity of
derivatives prices to underlying parameters, i.e. to estimate the Greeks. If time permits, we will study all of
these applications in this course. We will also learn how to simulate stochastic differential equations (SDEs)
and, again if time permits, we will study the basics of Markov Chain Monte-Carlo (MCMC), the standard
computational tool of Bayesian statistics².

¹ Such as lattice models and other finite difference schemes.
² Bayesian statistics also has many applications in finance. In particular it is often used for (i) asset allocation and portfolio
optimization problems and (ii) model calibration. For example, the classic Black-Litterman framework is a Bayesian framework
that is widely used in practice for asset allocation.
3 Introduction to Matlab
Matlab will be the default programming language that we will use in this course. That said, students are free to
use R / S-Plus, C++, VBA or any language of their choosing when completing their assignments. Matlab
should be available in the computing labs at Columbia as well as in the computing lab of the IEOR department.
If you wish to purchase it for yourself, then you should purchase the student edition. Make sure you also obtain
the Statistics toolbox. The Optimization and Financial toolboxes are often useful as well but we will probably
not need them in this course. The easiest way to learn Matlab is by working through examples and you should
download one of the many Matlab tutorials that can be found online. They are relatively short and working
through them is probably the best way to start.
Some online tutorials can be found at
http://www.eece.maine.edu/mm/matweb.html
A particularly good one by Kermit Sigmon can be found at
http://web.mit.edu/6.777/www/downloads/primer.pdf
We will use Matlab extensively in this course for lectures and assignment solutions.