Stochastic Processes and the Mathematics of Finance
Jonathan Block
April 1, 2008
Information for the class
Office: DRL3E2-A
Telephone: 215-898-8468
Office Hours: Tuesday 1:30-2:30, Thursday 1:30-2:30.
Email: blockj@math.upenn.edu
References:
1. Financial Calculus, an introduction to derivative pricing, by Martin
Baxter and Andrew Rennie.
2. The Mathematics of Financial Derivatives-A Student Introduction, by
Wilmott, Howison and Dewynne.
3. A Random Walk Down Wall Street, Malkiel.
4. Options, Futures and Other Derivatives, Hull.
5. Black-Scholes and Beyond, Option Pricing Models, Chriss
6. Dynamic Asset Pricing Theory, Duffie
I prefer to use my own lecture notes, which cover exactly the topics that I
want. I like each of the books above very much. I list below a little about
each book.
1. Baxter and Rennie—Does a great job of explaining things, especially
in discrete time.
2. Hull—More a book in straight finance, which is what it is intended to
be. Not much math. Explains financial aspects very well. Go here for
details about financial matters.
3. Duffie—This is a full-fledged introduction to continuous time finance
for those with a background in measure-theoretic probability theory.
Too advanced. But you might want to see how our course compares to
a PhD-level course in this material.
4. Wilmott, Howison and Dewynne—Immediately translates the issues
into PDE. It is really a book in PDE. Doesn't really touch much on
the probabilistic underpinnings of the subject.
Class grade will be based on homework and a take-home final. I will try to
give homework every week and you will have a week to hand it in.
Syllabus
1. Probability theory. The following material will not be covered in class.
I am assuming familiarity with this material (from Stat 430). I will
hand out notes regarding this material for those of you who are rusty,
or for those of you who have not taken a probability course but think
that you can become comfortable with this material.
(a) Probability spaces and random variables.
(b) Basic probability distributions.
(c) Expectation and variance, moments.
(d) Bivariate distributions.
(e) Conditional probability.
2. Derivatives.
(a) What is a derivative security?
(b) Types of derivatives.
(c) The basic problem: How much should I pay for an option? Fair
price.
(d) Expectation pricing. Einstein and Bachelier, what they knew
about derivative pricing. And what they didn’t know.
(e) Arbitrage and no arbitrage. The simple case of futures. Arbitrage
arguments.
(f) The arbitrage theorem.
(g) Arbitrage pricing and hedging.
3. Discrete time stochastic processes and pricing models.
(a) Binomial methods without much math. Arbitrage and reassigning
probabilities.
(b) A first look at martingales.
(c) Stochastic processes, discrete in time.
(d) Conditional expectations.
(e) Random walks.
(f) Change of probabilities.
(g) Martingales.
(h) Martingale representation theorem.
(i) Pricing a derivative and hedging portfolios.
(j) Martingale approach to dynamic asset allocation.
4. Continuous time processes. Their connection to PDE.
(a) Wiener processes.
(b) Stochastic integration.
(c) Stochastic differential equations and Ito’s lemma.
(d) Black-Scholes model.
(e) Derivation of the Black-Scholes Partial Differential Equation.
(f) Solving the Black-Scholes equation. Comparison with the martingale
method.
(g) Optimal portfolio selection.
5. Finer structure of financial time series.
Chapter 1
Basic Probability
The basic concept in probability theory is that of a random variable. A
random variable is a function of the basic outcomes in a probability space.
To define a probability space one needs three ingredients:
1. A sample space, that is a set S of “outcomes” for some experiment.
This is the set of all “basic” things that can happen. This set can be
a discrete set (such as the set of 5-card poker hands, or the possible
outcomes of rolling two dice) or it can be a continuous set (such as an
interval of the real number line for measuring temperature, etc).
2. A sigma-algebra (or σ-algebra) of subsets of S. This means a set Ω of
subsets of S (so Ω itself is a subset of the power set T(S) of all subsets
of S) that contains the empty set, contains S itself, and is closed under
countable intersections and countable unions. That is, when E_j ∈ Ω
for j = 1, 2, 3, . . . is a sequence of subsets in Ω, then

∪_{j=1}^{∞} E_j ∈ Ω   and   ∩_{j=1}^{∞} E_j ∈ Ω

In addition, if E ∈ Ω, then its complement is too; that is, we assume
E^c ∈ Ω.
The above definition is not central to our approach of probability but it
is an essential part of the measure theoretic foundations of probability
theory. So we include the definition for completeness and in case you
come across the term in the future.
When the basic set S is finite (or countably infinite), then Ω is often
taken to be all subsets of S. As a technical point (that you should
probably ignore), when S is a continuous subset of the real line, this
is not possible, and one usually restricts attention to the set of subsets
that can be obtained by starting with all open intervals and taking
intersections and unions (countable) of them – the so-called Borel sets
(or more generally, Lebesgue measurable subsets of S). Ω is the collection
of sets to which we will assign probabilities and you should think
of them as the possible events. In fact, we will most often call them
events.
3. A function P from Ω to the real numbers that assigns probabilities to
events. The function P must have the properties that:
(a) P(S) = 1, P(∅) = 0.
(b) 0 ≤ P(A) ≤ 1 for every A ∈ Ω
(c) If A_i, i = 1, 2, 3, . . . is a countable (finite or infinite) collection of
disjoint sets (i.e., A_i ∩ A_j = ∅ for all i different from j), then

P(∪ A_i) = ∑ P(A_i).
These axioms imply that if A^c is the complement of A, then P(A^c) =
1 − P(A), and the principle of inclusion and exclusion: P(A ∪ B) = P(A) +
P(B) − P(A ∩ B), even if A and B are not disjoint.
General probability spaces are a bit abstract and can be hard to deal
with. One of the purposes of random variables is to bring things back to
something we are more familiar with. As mentioned above, a random variable
X on a probability space (S, Ω, P) is a function that to every outcome
s ∈ S assigns a real number X(s) ∈ R. For a subset A ⊂ R let's
define X^{−1}(A) to be the following subset of S:

X^{−1}(A) = {s ∈ S | X(s) ∈ A}
It may take a while to get used to what X^{−1}(A) means, but do not think of
X^{−1} as a function. In order for X^{−1} to be a function we would need X to be
one-to-one, which we are not assuming. X^{−1}(A) is just a subset of S as defined
above. The point of this definition is that if A ⊂ R then X^{−1}(A) ⊂ S,
that is, X^{−1}(A) is the event that X of the outcome will be in A. If for every
interval A = (a, b) ⊂ R we have X^{−1}(A) ∈ Ω (this is a technical assumption on
X, called measurability, and should really be part of the definition of a random
variable), then we can define

P_X(A) = P(X^{−1}(A)).
Proposition 1.0.1. P_X is a probability function on R.

So we have in some sense transferred the probability function to the real
line. P_X is called the probability distribution of the random variable X.
In practice, it is the probability distribution that we deal with, and mention
of (S, Ω, P) is omitted. They are lurking somewhere in the background.
1.0.1 Discrete Random Variables
These take on only isolated (discrete) values, such as when you are counting
something. Usually, but not always, the values of a discrete random variable
X are (subset of the) integers, and we can assign a probability to any subset
of the sample space, as soon as we know the probability of any set containing
one element, i.e., P({X = k}) for all k. It will be useful to write p(k) =
P({X = k}), and to set p(k) = 0 for all numbers not in the sample space. p(k)
is called the probability density function (or pdf for short) of X. We repeat:
for discrete random variables, the value p(k) represents the probability that
the event {X = k} occurs. So any function from the integers to the (real)
interval [0, 1] that has the property that

∑_{k=−∞}^{∞} p(k) = 1

defines a discrete probability distribution.
1.0.2 Finite Discrete Random Variables
1. Uniform distribution – This is often encountered, e.g., coin flips, rolls
of a single die, other games of chance: S is the set of whole numbers
from a to b (this is a set with b − a + 1 elements!), Ω is the set of all
subsets of S, and P is defined by giving its values on all sets consisting
of one element each (since then the rule for disjoint unions takes over
to calculate the probability on other sets). “Uniform” means that the
same value is assigned to each one-element set. Since P(S) = 1, the
value that must be assigned to each one element set is 1/(b −a + 1).
For example, the possible outcomes of rolling one die are {1}, {2}, {3}, {4}, {5}
and {6}. Each of these outcomes has the same probability, namely
1/6. We can express this by making a table, or by specifying a function
f(k) = 1/6 for all k = 1, 2, 3, 4, 5, 6 and f(k) = 0 otherwise. Using
the disjoint union rule, we find for example that P({1, 2, 5}) = 1/2,
P({2, 3}) = 1/3, etc.
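The disjoint-union rule above is easy to check numerically. A minimal sketch (the names `f` and `P` are just illustrative):

```python
from fractions import Fraction

# Probability function for one fair die: f(k) = 1/6 for k = 1, ..., 6, else 0.
def f(k):
    return Fraction(1, 6) if k in {1, 2, 3, 4, 5, 6} else Fraction(0)

# Disjoint-union rule: the probability of an event is the sum of the
# probabilities of its one-element subsets.
def P(event):
    return sum(f(k) for k in event)

print(P({1, 2, 5}))  # 1/2
print(P({2, 3}))     # 1/3
```

Using exact fractions avoids any floating-point noise in these small checks.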
2. Binomial distribution – flip n fair coins, how many come up heads?
i.e., what is the probability that k of them come up heads? Or do a
sample in a population that favors the Democrat over the Republican
60 percent to 40 percent. What is the probability that in a sample of
size 100, more than 45 will favor the Republican?
The sample space S is {0, 1, 2, . . . , n} since these are the possible outcomes
(number of heads, number of people favoring the Republican [n = 100
in this case]). As before, the sigma-algebra Ω is the set of all subsets
of S. The function P is more interesting this time:
P({k}) = (n choose k) / 2^n

where the binomial coefficient

(n choose k) = n! / (k!(n − k)!)

equals the number of subsets of an n-element set that have exactly k elements.
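As a quick sanity check (a sketch, not part of the notes), these probabilities can be computed with Python's `math.comb`; they sum to 1 because the binomial coefficients count all 2^n subsets of an n-element set:

```python
from fractions import Fraction
from math import comb

# P({k}) = C(n, k) / 2^n for k heads in n fair coin flips.
def binom_p(n, k):
    return Fraction(comb(n, k), 2 ** n)

n = 10
# Summing over k = 0, ..., n exhausts all 2^n equally likely outcomes.
total = sum(binom_p(n, k) for k in range(n + 1))
print(total)          # 1
print(binom_p(4, 2))  # 3/8
```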
1.0.3 Infinite Discrete Random Variables
3. Poisson distribution (with parameter α) – this arises as the number of
(random) events of some kind (such as people lining up at a bank, or
Geiger-counter clicks, or telephone calls arriving) per unit time. The
sample space S is the set of all nonnegative integers S = {0, 1, 2, 3, . . .},
and again Ω is the set of all subsets of S. The probability function on
Ω is derived from:

P({k}) = e^{−α} α^k / k!
Note that this is an honest probability function, since we will have

P(S) = P(∪_{k=0}^{∞} {k}) = ∑_{k=0}^{∞} e^{−α} α^k / k! = e^{−α} ∑_{k=0}^{∞} α^k / k! = e^{−α} e^{α} = 1
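The same computation can be checked numerically; truncating the series at a large k gives a total probability indistinguishable from 1 (a sketch; the value of α is an arbitrary choice for illustration):

```python
from math import exp, factorial

alpha = 2.5  # arbitrary illustrative parameter

# P({k}) = e^{-alpha} alpha^k / k!
def poisson_p(k):
    return exp(-alpha) * alpha ** k / factorial(k)

# The tail beyond k = 100 is negligible for this alpha.
total = sum(poisson_p(k) for k in range(100))
print(abs(total - 1.0) < 1e-12)  # True
```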
1.0.4 Continuous Random Variables
A continuous random variable can take on any real value, such as when
you are measuring something. Usually the values are (a subset of) the reals,
and for technical reasons, we can only assign a probability to certain subsets
of the sample space (but there are a lot of them). These subsets, either
the collection of Borel sets (sets that can be obtained by taking countable
unions and intersections of intervals) or Lebesgue-measurable sets (the Borel
sets plus some other exotic sets), comprise the set Ω. As soon as we know the
probability of any interval, i.e., P([a, b]) for all a and b, we can calculate the
probability of any Borel set. Recall that in the discrete case, the probabilities
were determined once we knew the probabilities of individual outcomes. That
is, P was determined by p(k). In the continuous case, the probability of a
single outcome is always 0: P({x}) = 0. However, it is enough to know
the probabilities of “very small” intervals of the form [x, x + dx], and we
can calculate continuous probabilities as integrals of “probability density
functions”, so-called pdf’s. We think of dx as a very small number and then
take its limit at 0. Thus, in the continuous case, the pdf is

p(x) = lim_{dx→0} (1/dx) P([x, x + dx])

so that

P([a, b]) = ∫_a^b p(x) dx    (1.0.1)
This is really the defining equation of the continuous pdf, and I cannot stress
too strongly how much you need to use this. Any function p(x) that takes
on only nonnegative values (they don't have to be between 0 and 1, though),
and whose integral over the whole sample space (we can use the whole real
line if we assign the value p(x) = 0 for points x outside the sample space) is
equal to 1, can be a pdf. In this case, we have (for small dx) that p(x) dx
represents (approximately) the probability of the set (interval) [x, x + dx]
(with error that goes to zero faster than dx does). More generally, we have
the probability of the set (interval) [a, b] is:
P([a, b]) = ∫_a^b p(x) dx
So any nonnegative function on the real numbers that has the property that

∫_{−∞}^{∞} p(x) dx = 1

defines a continuous probability distribution.
Uniform distribution
As with the discrete uniform distribution, the variable takes on values in
some interval [a, b] and all values are equally likely. In other words, all
small intervals [x, x + dx] are equally likely as long as dx is fixed and only
x varies. That means that p(x) should be a constant for x between a and
b, and zero outside the interval [a, b]. What constant? Well, to have the
integral of p(x) come out to be 1, we need the constant to be 1/(b − a). It is
easy to calculate that if a < r < s < b, then

P([r, s]) = (s − r)/(b − a)
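The formula P([r, s]) = (s − r)/(b − a) can be confirmed with a crude midpoint-rule integration of the constant density (a sketch; the interval endpoints are arbitrary):

```python
a, b = 1.0, 5.0  # arbitrary interval

# Uniform pdf on [a, b]: constant 1/(b - a) inside, 0 outside.
def p(x):
    return 1.0 / (b - a) if a <= x <= b else 0.0

# Midpoint-rule approximation of the integral of p over [r, s].
def prob(r, s, n=10_000):
    dx = (s - r) / n
    return sum(p(r + (i + 0.5) * dx) for i in range(n)) * dx

print(abs(prob(2.0, 3.0) - 0.25) < 1e-9)  # True; (3 - 2)/(5 - 1) = 0.25
```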
The Normal Distribution (or Gaussian distribution)
This is the most important probability distribution, because the distribution
of the average of the results of repeated experiments always approaches a
normal distribution (this is the “central limit theorem”). The sample space
for the normal distribution is always the entire real line. But to begin, we
need to calculate an integral:

I = ∫_{−∞}^{∞} e^{−x^2} dx
We will use a trick that goes back (at least) to Liouville: Now note that
I^2 = ( ∫_{−∞}^{∞} e^{−x^2} dx ) ( ∫_{−∞}^{∞} e^{−x^2} dx )    (1.0.2)

    = ( ∫_{−∞}^{∞} e^{−x^2} dx ) ( ∫_{−∞}^{∞} e^{−y^2} dy )    (1.0.3)

    = ∫_{−∞}^{∞} ∫_{−∞}^{∞} e^{−x^2−y^2} dx dy    (1.0.4)
because we can certainly change the name of the variable in the second in-
tegral, and then we can convert the product of single integrals into a double
integral. Now (the critical step), we’ll evaluate the integral in polar coordi-
nates (!!) – note that over the whole plane, r goes from 0 to infinity as θ
goes from 0 to 2π, and dxdy becomes rdrdθ:
I^2 = ∫_0^{2π} ∫_0^{∞} e^{−r^2} r dr dθ = ∫_0^{2π} [ −(1/2) e^{−r^2} ]_0^{∞} dθ = ∫_0^{2π} (1/2) dθ = π
Therefore, I = √π. We need to arrange things so that the integral is 1, and
for reasons that will become apparent later, we arrange this as follows: define

N(x) = (1/√(2π)) e^{−x^2/2}

Then N(x) defines a probability distribution, called the standard normal
distribution. More generally, we define the normal distribution with parameters
µ and σ to be

p(x) = (1/σ) N((x − µ)/σ) = (1/(√(2π) σ)) e^{−(x−µ)^2/(2σ^2)}
Exponential distribution (with parameter α)
This arises when measuring waiting times until an event, or time-to-failure
in reliability studies. For this distribution, the sample space is the positive
part of the real line [0, ∞) (or we can just let p(x) = 0 for x < 0). The
probability density function is given by p(x) = α e^{−αx}. It is easy to check
that the integral of p(x) from 0 to infinity is equal to 1, so p(x) defines a
bona fide probability density function.
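The check is immediate by hand (the antiderivative of α e^{−αx} is −e^{−αx}), and a numerical sketch agrees (α chosen arbitrarily):

```python
from math import exp

alpha = 1.5  # arbitrary illustrative parameter

# Exponential pdf: alpha * e^{-alpha x} for x >= 0, else 0.
def p(x):
    return alpha * exp(-alpha * x) if x >= 0 else 0.0

# Midpoint rule on [0, 40/alpha]; the tail beyond that is negligible.
n = 200_000
upper = 40.0 / alpha
dx = upper / n
total = sum(p((i + 0.5) * dx) for i in range(n)) * dx
print(abs(total - 1.0) < 1e-6)  # True
```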
1.1 Expectation of a Random Variable
The expectation of a random variable is essentially the average value it is
expected to take on. Therefore, it is calculated as the weighted average of
the possible outcomes of the random variable, where the weights are just the
probabilities of the outcomes. As a trivial example, consider the (discrete)
random variable X whose sample space is the set {1, 2, 3} with probability
function given by p(1) = 0.3, p(2) = 0.1 and p(3) = 0.6. If we repeated this
experiment 100 times, we would expect to get about 30 occurrences of X = 1,
10 of X = 2 and 60 of X = 3. The average X would then be ((30)(1) +
(10)(2) + (60)(3))/100 = 2.3. In other words, (1)(0.3) + (2)(0.1) + (3)(0.6).
This reasoning leads to the defining formula:

E(X) = ∑_{x∈S} x p(x)

for any discrete random variable. The notation E(X) for the expectation of
X is standard; also in use is the notation ⟨X⟩.
For continuous random variables, the situation is similar, except the sum
is replaced by an integral (think of summing up the average values of x by
dividing the sample space into small intervals [x, x + dx] and calculating the
probability p(x) dx that X falls into the interval). By reasoning similar to
the previous paragraph, the expectation should be

E(X) = lim_{dx→0} ∑_{x∈S} x p(x) dx = ∫_S x p(x) dx.
This is the formula for the expectation of a continuous random variable.
Example 1.1.1.

1. Uniform discrete: We'll need to use the formula for ∑ k that you
learned in freshman calculus when you evaluated Riemann sums:
E(X) = ∑_{k=a}^{b} k · 1/(b − a + 1)    (1.1.1)

     = (1/(b − a + 1)) ( ∑_{k=0}^{b} k − ∑_{k=0}^{a−1} k )    (1.1.2)

     = (1/(b − a + 1)) ( b(b + 1)/2 − (a − 1)a/2 )    (1.1.3)

     = (1/2) (1/(b − a + 1)) (b^2 + b − a^2 + a)    (1.1.4)

     = (1/2) (1/(b − a + 1)) (b − a + 1)(b + a)    (1.1.5)

     = (b + a)/2    (1.1.6)
which is what we should have expected from the uniform distribution.
2. Uniform continuous: (We expect to get (b + a)/2 again, right?). This
is easier:
E(X) = ∫_a^b x · 1/(b − a) dx    (1.1.7)

     = (1/(b − a)) · (b^2 − a^2)/2 = (b + a)/2    (1.1.8)
3. Poisson distribution with parameter α: Before we do this, recall the
Taylor series formula for the exponential function:

e^x = ∑_{k=0}^{∞} x^k / k!

Note that we can take the derivative of both sides to get the formula:

e^x = ∑_{k=0}^{∞} k x^{k−1} / k!

If we multiply both sides of this formula by x we get

x e^x = ∑_{k=0}^{∞} k x^k / k!
We will use this formula with x replaced by α.
If X is a discrete random variable with a Poisson distribution, then its
expectation is:

E(X) = ∑_{k=0}^{∞} k e^{−α} α^k / k! = e^{−α} ( ∑_{k=0}^{∞} k α^k / k! ) = e^{−α} (α e^{α}) = α.
4. Exponential distribution with parameter α. This is a little like the
Poisson calculation (with improper integrals instead of series), and we
will have to integrate by parts (we'll use u = x so du = dx, and
dv = e^{−αx} dx so that v will be −(1/α) e^{−αx}):

E(X) = ∫_0^{∞} x α e^{−αx} dx = α [ −(1/α) x e^{−αx} − (1/α^2) e^{−αx} ]_0^{∞} = 1/α.
Note the difference between the expectations of the Poisson and expo-
nential distributions!!
5. By the symmetry of the respective distributions around their “centers”,
it is pretty easy to conclude that the expectation of the binomial dis-
tribution (with parameter n) is n/2, and the expectation of the normal
distribution (with parameters µ and σ) is µ.
6. Geometric random variable (with parameter r): It is defined on the
sample space ¦0, 1, 2, 3, ...¦ of non-negative integers and, given a fixed
value of r between 0 and 1, has probability function given by

p(k) = (1 − r) r^k

The geometric series formula:

∑_{k=0}^{∞} a r^k = a/(1 − r)
shows that p(k) is a probability function. We can compute the expec-
tation of a random variable X having a geometric distribution with
parameter r as follows (the trick is reminiscent of what we used on the
Poisson distribution):
E(X) = ∑_{k=0}^{∞} k (1 − r) r^k = (1 − r) r ∑_{k=0}^{∞} k r^{k−1}

     = (1 − r) r (d/dr) ∑_{k=0}^{∞} r^k = (1 − r) r (d/dr) 1/(1 − r) = r/(1 − r)
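A numerical sketch of the geometric case (r chosen arbitrarily); the partial sums of k(1 − r)r^k converge to r/(1 − r):

```python
r = 0.6  # arbitrary parameter in (0, 1)

# p(k) = (1 - r) r^k for k = 0, 1, 2, ...
def p(k):
    return (1 - r) * r ** k

# 2000 terms are far more than enough for r = 0.6.
total = sum(p(k) for k in range(2000))
mean = sum(k * p(k) for k in range(2000))

print(abs(total - 1.0) < 1e-9)         # True
print(abs(mean - r / (1 - r)) < 1e-9)  # True
```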
7. Another distribution, defined on the positive integers {1, 2, 3, ...} by
the function p(k) = 6/(π^2 k^2). This is a probability function based on the
famous formula due to Euler:

∑_{k=1}^{∞} 1/k^2 = π^2/6

But an interesting thing about this distribution concerns its expectation:
because the harmonic series diverges to infinity, we have that

E(X) = (6/π^2) ∑_{k=1}^{∞} k/k^2 = (6/π^2) ∑_{k=1}^{∞} 1/k = ∞
So the expectation of this random variable is infinite (Can you interpret
this in terms of an experiment whose outcome is a random variable with
this distribution?)
If X is a random variable (discrete or continuous), it is possible to define
related random variables by taking various functions of X, say X^2 or sin(X)
or whatever.
If Y = f(X) for some function f, then the probability of Y being in some
set A is defined to be the probability of X being in the set f^{−1}(A).
As an example, consider an exponentially distributed random variable
X with parameter α = 1. Let Y = X^2. Since X can only be positive, the
probability that Y is in the interval [a, b] is the same as the probability that
X is in the interval [√a, √b].
We can calculate the probability density function p(y) of Y by recalling
that the probability that Y is in the interval [0, y] (actually (−∞, y]) is the
integral of p(y) from 0 to y. In other words, p(y) is the derivative of the function
h(y), where h(y) is the probability that Y is in the interval [0, y]. But h(y)
is the same as the probability that X is in the interval [0, √y]. We calculate:
p(y) = (d/dy) ∫_0^{√y} e^{−x} dx = (d/dy) ( −e^{−x} |_0^{√y} ) = (d/dy) (1 − e^{−√y}) = (1/(2√y)) e^{−√y}
There are two ways to calculate the expectation of Y. The first is obvious:
we can integrate y p(y). The other is to make the change of variables y = x^2 in
this integral, which will yield (check this!) that the expectation of Y = X^2 is

E(Y) = E(X^2) = ∫_0^{∞} x^2 e^{−x} dx
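Both routes can be checked numerically. The sketch below integrates the density p(y) = e^{−√y}/(2√y) derived above by the midpoint rule (the 1/√y singularity at 0 is integrable but slows convergence, hence the loose tolerances), and compares E(Y) with the value 2 of ∫_0^∞ x^2 e^{−x} dx:

```python
from math import exp, sqrt

# Density of Y = X^2 when X is exponential with alpha = 1.
def p_y(y):
    return exp(-sqrt(y)) / (2.0 * sqrt(y))

n = 400_000
upper = 400.0  # sqrt(400) = 20, so the tail is negligible
dy = upper / n
total = sum(p_y((i + 0.5) * dy) for i in range(n)) * dy
e_y = sum(((i + 0.5) * dy) * p_y((i + 0.5) * dy) for i in range(n)) * dy

print(abs(total - 1.0) < 0.02)  # True
print(abs(e_y - 2.0) < 0.01)    # True
```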
Theorem 1.1.2 (Law of the unconscious statistician). If f(x) is any function,
then the expectation of the function f(X) of the random variable X is

E(f(X)) = ∫ f(x) p(x) dx   or   ∑_x f(x) p(x)

where p(x) is the probability density function of X if X is continuous, or the
probability function of X if X is discrete.
1.2 Variance and higher moments
We are now lead more or less naturally to a discussion of the “moments” of
a random variable.
The r-th moment of X is defined to be the expected value of X^r. In
particular the first moment of X is its expectation. If s > r, then having
a finite s-th moment is a more restrictive condition than having an r-th one
(this is a convergence issue as x approaches infinity, since x^s > x^r for large
values of x).
A more useful set of moments is called the set of central moments. These
are defined to be the r-th moments of the variable X − E(X). In particular,
the second moment of X − E(X) is called the variance of X and is denoted
σ^2(X). (Its square root is the standard deviation.) It is a crude measure of
the extent to which the distribution of X is spread out from its expectation
value. It is a useful exercise to work out that

σ^2(X) = E((X − E(X))^2) = E(X^2) − (E(X))^2.
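The identity σ^2(X) = E(X^2) − (E(X))^2 can be verified on the toy distribution from the expectation section (outcomes 1, 2, 3 with probabilities 0.3, 0.1, 0.6):

```python
pmf = {1: 0.3, 2: 0.1, 3: 0.6}

e_x = sum(x * p for x, p in pmf.items())
e_x2 = sum(x * x * p for x, p in pmf.items())

# Variance two ways: from the definition and from the shortcut identity.
var_direct = sum((x - e_x) ** 2 * p for x, p in pmf.items())
var_shortcut = e_x2 - e_x ** 2

print(abs(var_direct - var_shortcut) < 1e-12)  # True
print(round(var_direct, 6))                    # 0.81
```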
As an example, we compute the variance of the uniform and exponential
distributions:
1. Uniform discrete: If X has a discrete uniform distribution on the in-
terval [a, b], then recall that E(X) = (b + a)/2. We calculate Var(X)
as follows:
Var(X) = E(X^2) − ((b + a)/2)^2 = ∑_{k=a}^{b} k^2/(b − a + 1) − ((b + a)/2)^2 = (1/12)(b − a + 2)(b − a).
2. Uniform continuous: If X has a continuous uniform distribution on the
interval [a, b], then its variance is calculated as follows:
σ^2(X) = ∫_a^b x^2/(b − a) dx − ((b + a)/2)^2 = (1/12)(b − a)^2.
3. Exponential with parameter α: If X is exponentially distributed with
parameter α, recall that E(X) = 1/α. Thus:
Var(X) = ∫_0^{∞} x^2 α e^{−αx} dx − 1/α^2 = 1/α^2
The variance decreases as α increases; this agrees with intuition gained
from the graphs of exponential distributions shown last time.
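A numerical sketch of the exponential case (α chosen arbitrarily): two integrations by parts give E(X^2) = 2/α^2, so the variance is 2/α^2 − 1/α^2 = 1/α^2, which the midpoint-rule computation below confirms.

```python
from math import exp

alpha = 2.0  # arbitrary illustrative parameter

n = 200_000
upper = 50.0 / alpha
dx = upper / n

# E(X^2) for the density alpha e^{-alpha x}; should be 2/alpha^2.
e_x2 = sum(((i + 0.5) * dx) ** 2 * alpha * exp(-alpha * (i + 0.5) * dx)
           for i in range(n)) * dx
var = e_x2 - (1.0 / alpha) ** 2

print(abs(var - 1.0 / alpha ** 2) < 1e-6)  # True
```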
1.3 Bivariate distributions
It is often the case that a random experiment yields several measurements (we
can think of the collection of measurements as a random vector) – one simple
example would be the numbers on the top of two thrown dice. When there are
several numbers like this, it is common to consider each as a random variable
in its own right, and to form the joint probability density (or joint probability)
function p(x, y, z, ...). For example, in the case of two dice, X and Y are
discrete random variables each with sample space S = {1, 2, 3, 4, 5, 6}, and
p(x, y) = 1/36 for each (x, y) in the cartesian product S × S. More generally,
we can consider discrete probability functions p(x, y) on sets of the form
S × T, where X ranges over S and Y over T. Any function p(x, y) such that
p(x, y) is between 0 and 1 for all (x, y) and such that

∑_{x∈S} ∑_{y∈T} p(x, y) = 1

defines a (joint) probability function on S × T.
For continuous functions, the idea is the same. p(x, y) is the joint pdf of
X and Y if p(x, y) is non-negative and satisfies
∫_{−∞}^{∞} ∫_{−∞}^{∞} p(x, y) dx dy = 1
We will use the following example of a joint pdf throughout the rest of this
section:
X and Y will be random variables that can take on values (x, y) in the
triangle with vertices (0, 0), (2, 0) and (2, 2). The joint pdf of X and Y will
be given by p(x, y) = 1/(2x) if (x, y) is in the triangle and 0 otherwise. To
see that this is a probability density function, we need to integrate p(x, y)
over the triangle and get 1:
∫_0^2 ∫_0^x 1/(2x) dy dx = ∫_0^2 ( y/(2x) ) |_0^x dx = ∫_0^2 1/2 dx = 1.
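A brute-force numerical check of this double integral (a sketch; the grid resolution is arbitrary, and the discontinuity of p along the triangle's edges limits the achievable accuracy):

```python
# Joint pdf on the triangle 0 < y < x < 2: p(x, y) = 1/(2x), else 0.
def p(x, y):
    return 1.0 / (2.0 * x) if 0.0 < y < x < 2.0 else 0.0

# Midpoint-rule double integral over the bounding square [0, 2] x [0, 2].
n = 1000
d = 2.0 / n
total = sum(p((i + 0.5) * d, (j + 0.5) * d)
            for i in range(n) for j in range(n)) * d * d

print(abs(total - 1.0) < 0.01)  # True
```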
For discrete variables, we let p(i, j) be the probability that X = i and
Y = j, P(X = i ∩ Y = j). This gives a function p, the joint probability
function of X and Y, that is defined on (some subset of) the set of pairs of
integers and such that 0 ≤ p(i, j) ≤ 1 for all i and j and

∑_{i=−∞}^{∞} ∑_{j=−∞}^{∞} p(i, j) = 1

When we find it convenient to do so, we will set p(i, j) = 0 for all i and j
outside the domain we are considering.
For continuous variables, we define the joint probability density function
p(x, y) on (some subset of) the plane of pairs of real numbers. We interpret
the function as follows: p(x, y)dxdy is (approximately) the probability that
X is between x and x + dx and Y is between y and y + dy (with error that
goes to zero faster than dx and dy as they both go to zero). Thus, p(x, y)
must be a non-negative valued function with the property that
∫_{−∞}^{∞} ∫_{−∞}^{∞} p(x, y) dx dy = 1
As with discrete variables, if our random variables always lie in some subset
of the plane, we will define p(x, y) to be 0 for all (x, y) outside that subset.
We take one simple example of each kind of random variable. For the
discrete random variable, we consider the roll of a pair of dice. We assume
that we can tell the dice apart, so there are thirty-six possible outcomes and
each is equally likely. Thus our joint probability function will be
p(i, j) = 1/36 if 1 ≤ i ≤ 6, 1 ≤ j ≤ 6
and p(i, j) = 0 otherwise.
For our continuous example, we take the example mentioned at the end
of the last lecture:
p(x, y) = 1/(2x)
for (x, y) in the triangle with vertices (0, 0), (2, 0) and (2, 2), and p(x, y) = 0
otherwise. We checked last time that this is a probability density function
(its integral is 1).
1.4 Marginal distributions
Often when confronted with the joint probability of two random variables,
we wish to restrict our attention to the value of just one or the other. We
can calculate the probability distribution of each variable separately in a
straightforward way, if we simply remember how to interpret probability
functions. These separated probability distributions are called the marginal
distributions of the respective individual random variables.
Given the joint probability function p(i, j) of the discrete variables X and
Y, we will show how to calculate the marginal distributions p_X(i) of X and
p_Y(j) of Y. To calculate p_X(i), we recall that p_X(i) is the probability that
X = i. It is certainly equal to the probability that X = i and Y = 0, or
X = i and Y = 1, or . . . . In other words the event X = i is the union of the
events X = i and Y = j as j runs over all possible values. Since these events
are disjoint, the probability of their union is the sum of the probabilities of
the events (namely, the sum of p(i, j)). Thus:

p_X(i) = ∑_{j=−∞}^{∞} p(i, j)

Likewise,

p_Y(j) = ∑_{i=−∞}^{∞} p(i, j)
Make sure you understand the reasoning behind these two formulas!
An example of the use of this formula is provided by the roll of two
dice discussed above. Each of the 36 possible rolls has probability 1/36 of
occurring, so we have probability function p(i, j) as indicated in the following
table:

i\j      1     2     3     4     5     6   | p_X(i)
1      1/36  1/36  1/36  1/36  1/36  1/36  |  1/6
2      1/36  1/36  1/36  1/36  1/36  1/36  |  1/6
3      1/36  1/36  1/36  1/36  1/36  1/36  |  1/6
4      1/36  1/36  1/36  1/36  1/36  1/36  |  1/6
5      1/36  1/36  1/36  1/36  1/36  1/36  |  1/6
6      1/36  1/36  1/36  1/36  1/36  1/36  |  1/6
p_Y(j)  1/6   1/6   1/6   1/6   1/6   1/6
The marginal probability distributions are given in the last column and last
row of the table. They are the probabilities for the outcomes of the first (resp
second) of the dice, and are obtained either by common sense or by adding
across the rows (resp down the columns).
For continuous random variables, the situation is similar. Given the joint
probability density function p(x, y) of a bivariate distribution of the two
random variables X and Y (where p(x, y) is positive on the actual sample
space subset of the plane, and zero outside it), we wish to calculate the
marginal probability density functions of X and Y. To do this, recall that
p_X(x) dx is (approximately) the probability that X is between x and x + dx.
So to calculate this probability, we should sum all the probabilities that both
X is in [x, x + dx] and Y is in [y, y + dy] over all possible values of Y. In the
limit as dy approaches zero, this becomes an integral:

p_X(x) dx = ( ∫_{−∞}^{∞} p(x, y) dy ) dx

In other words,

p_X(x) = ∫_{−∞}^{∞} p(x, y) dy

Similarly,

p_Y(y) = ∫_{−∞}^{∞} p(x, y) dx
Again, you should make sure you understand the intuition and the reasoning
behind these important formulas.
We return to our example:

p(x, y) = 1/(2x)

for (x, y) in the triangle with vertices (0, 0), (2, 0) and (2, 2), and p(x, y) = 0
otherwise, and compute its marginal density functions. The easy one is p_X(x),
so we do that one first. Note that for a given value of x between 0 and 2, y
ranges from 0 to x inside the triangle:

p_X(x) = ∫_{−∞}^{∞} p(x, y) dy = ∫_0^x 1/(2x) dy = ( y/(2x) ) |_{y=0}^{y=x} = 1/2

if 0 ≤ x ≤ 2, and p_X(x) = 0 otherwise. This indicates that the values of X
are uniformly distributed over the interval from 0 to 2 (this agrees with the
intuition that the random points occur with greater density toward the left
side of the triangle but there is more area on the right side to balance this
out).
To calculate p_Y(y), we begin with the observation that for each value of
y between 0 and 2, x ranges from y to 2 inside the triangle:

p_Y(y) = ∫_{−∞}^{∞} p(x, y) dx = ∫_y^2 1/(2x) dx = ( (1/2) ln(x) ) |_{x=y}^{x=2} = (1/2)(ln(2) − ln(y))

if 0 ≤ y ≤ 2 and p_Y(y) = 0 otherwise. Note that p_Y(y) approaches infinity as
y approaches 0 from above, and p_Y(y) approaches 0 as y approaches 2. You
should check that this function is actually a probability density function on
the interval [0, 2], i.e., that its integral is 1.
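That check can be done by hand (using ∫ ln y dy = y ln y − y, the integral of (ln 2 − ln y)/2 over [0, 2] comes out to 1) or numerically; a sketch:

```python
from math import log

# Marginal density of Y: p_Y(y) = (ln 2 - ln y)/2 for 0 < y <= 2.
def p_y(y):
    return 0.5 * (log(2.0) - log(y)) if 0.0 < y <= 2.0 else 0.0

# Midpoint rule; the logarithmic singularity at y = 0 is integrable.
n = 200_000
dy = 2.0 / n
total = sum(p_y((i + 0.5) * dy) for i in range(n)) * dy

print(abs(total - 1.0) < 1e-4)  # True
```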
1.5 Functions of two random variables
Frequently, it is necessary to calculate the probability (density) function of
a function of two random variables, given the joint probability (density)
function. By far, the most common such function is the sum of two random
variables, but the idea of the calculation applies in principle to any function
of two (or more!) random variables.
The principle we will follow for discrete random variables is as follows:
to calculate the probability function for F(X, Y ), we consider the events
{F(X, Y) = f} for each value of f that can result from evaluating F at
points of the sample space of (X, Y). Since there are only countably many
points in the sample space, the random variable F that results is discrete.
Then the probability function p_F(f) is

p_F(f) = ∑_{(x,y)∈S | F(x,y)=f} p(x, y)
This seems like a pretty weak principle, but it is surprisingly useful when
combined with a little insight (and cleverness).
As an example, we calculate the distribution of the sum of the two dice.
Since the outcome of each of the dice is a number between 1 and 6, the
outcome of the sum must be a number between 2 and 12. So for each f
between 2 and 12:
p_F(f) = ∑_{x=−∞}^{∞} p(x, f − x) = ∑_{x=max(1,f−6)}^{min(6,f−1)} 1/36 = (1/36)(6 − |f − 7|)
A table of the probabilities of various sums is as follows:
f 2 3 4 5 6 7 8 9 10 11 12
p
F
(f) 1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36
The ”tent-shaped” distribution that results is typical of the sum of (inde-
pendent) uniformly distributed random variables.
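The convolution above can also be carried out mechanically; a small Python sketch using exact fractions reproduces both the table and the closed form:

```python
from fractions import Fraction

# Probability function of one fair die
die = {i: Fraction(1, 6) for i in range(1, 7)}

# Convolve: p_F(f) = sum over (x, y) with x + y = f of p(x) p(y)
p_F = {}
for x, px in die.items():
    for y, py in die.items():
        p_F[x + y] = p_F.get(x + y, Fraction(0)) + px * py

# Compare with the closed form (6 - |f - 7|)/36 from the text
for f in range(2, 13):
    assert p_F[f] == Fraction(6 - abs(f - 7), 36)
print(p_F[7])  # 1/6
```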
For continuous distributions, our principle will be a little more complicated, but more powerful as well. To enunciate it, we recall that to calculate
the probability of the event $F < f$, we integrate the pdf of $F$ from $-\infty$ to $f$:

$$P(F < f) = \int_{-\infty}^{f} p_F(g)\, dg$$

Conversely, to recover the pdf of $F$, we can differentiate the resulting function:

$$p_F(f) = \frac{d}{df} \int_{-\infty}^{f} p_F(g)\, dg$$

(this is simply the first fundamental theorem of calculus). Our principle for
calculating the pdf of a function of two random variables $F(X, Y)$ will be to
calculate the probabilities of the events $\{F(X, Y) < f\}$ (by integrating the
joint pdf over the region of the plane defined by this inequality), and then
to differentiate with respect to $f$ to get the pdf.
We apply this principle to calculate the pdf of the sum of the random
variables $X$ and $Y$ in our example:

$$p(x, y) = \frac{1}{2x}$$

for $(x, y)$ in the triangle $T$ with vertices $(0, 0)$, $(2, 0)$ and $(2, 2)$, and $p(x, y) = 0$
otherwise. Let $Z = X + Y$. To calculate the pdf $p_Z(z)$, we first note that
for any fixed number $z$, the region of the plane where $Z < z$ is the half
plane below and to the left of the line $y = z - x$. To calculate the probability
$P(Z < z)$, we must integrate the joint pdf $p(x, y)$ over this region. Of course,
for $z \le 0$, we get zero, since the half plane $x + y < z$ then has no points in common
with the triangle where the pdf is supported. Likewise, since both $X$ and
$Y$ are always between 0 and 2, the biggest the sum can be is 4. Therefore
$P(Z < z) = 1$ for all $z \ge 4$.
For $z$ between 0 and 4, we need to integrate $1/2x$ over the intersection of
the half-plane $x + y < z$ and the triangle $T$. The shape of this intersection is
different, depending upon whether $z$ is greater than or less than 2. If $0 \le z \le 2$,
the intersection is a triangle with vertices at the points $(0, 0)$, $(z/2, z/2)$
and $(z, 0)$. In this case, it is easier to integrate first with respect to $x$ and
then with respect to $y$, and we can calculate:

$$P(Z < z) = \int_0^{z/2} \int_y^{z-y} \frac{1}{2x}\, dx\, dy = \frac{1}{2} \int_0^{z/2} \left( \ln(x)\,\Big|_{x=y}^{x=z-y} \right) dy$$
$$= \frac{1}{2} \int_0^{z/2} (\ln(z-y) - \ln(y))\, dy = \frac{1}{2} \left( z - (z-y)\ln(z-y) - y\ln(y) \right)\Big|_{y=0}^{y=z/2} = \frac{1}{2}\, z \ln(2)$$

And since the (cumulative) probability that $Z < z$ is $\frac{1}{2} z \ln(2)$ for $0 < z < 2$,
the pdf over this range is $p_Z(z) = \frac{\ln(2)}{2}$.
The calculation of the pdf for $2 \le z \le 4$ is somewhat trickier because
the intersection of the half-plane $x + y < z$ and the triangle $T$ is more
complicated. The intersection in this case is a quadrilateral with vertices
at the points $(0, 0)$, $(z/2, z/2)$, $(2, z - 2)$ and $(2, 0)$. We could calculate
$P(Z < z)$ by integrating $p(x, y)$ over this quadrilateral. But we will be a
little more clever: note that the quadrilateral is the "difference" of two sets.
It consists of the points inside the triangle with vertices $(0, 0)$, $(z/2, z/2)$ and $(z, 0)$
that are to the left of the line $x = 2$. In other words, it is the points inside this
large triangle (and note that we have already computed the integral of $1/2x$
over this large triangle to be $\frac{1}{2} z \ln(2)$) that are NOT inside the triangle with
vertices $(2, 0)$, $(2, z - 2)$ and $(z, 0)$. Thus, for $2 \le z \le 4$, we can calculate
$P(Z < z)$ as
$$P(Z < z) = \frac{1}{2} z \ln(2) - \int_2^z \int_0^{z-x} \frac{1}{2x}\, dy\, dx$$
$$= \frac{1}{2} z \ln(2) - \int_2^z \left( \frac{y}{2x}\,\Big|_{y=0}^{y=z-x} \right) dx$$
$$= \frac{1}{2} z \ln(2) - \int_2^z \frac{z-x}{2x}\, dx = \frac{1}{2} z \ln(2) - \frac{1}{2}\,(z\ln(x) - x)\,\Big|_{x=2}^{x=z}$$
$$= \frac{1}{2} z \ln(2) - \frac{1}{2}\,(z\ln(z) - z - z\ln(2) + 2)$$
$$= z \ln(2) - \frac{1}{2} z \ln(z) + \frac{1}{2} z - 1$$
To get the pdf for $2 < z < 4$, we need only differentiate this quantity, to get

$$p_Z(z) = \frac{d}{dz}\left( z \ln(2) - \frac{1}{2} z \ln(z) + \frac{1}{2} z - 1 \right) = \ln(2) - \frac{1}{2} \ln(z)$$

Now we have the pdf of $Z = X + Y$ for all values of $z$. It is $p_Z(z) = \ln(2)/2$
for $0 < z < 2$, it is $p_Z(z) = \ln(2) - \frac{1}{2}\ln(z)$ for $2 < z < 4$, and it is 0 otherwise.
It would be good practice to check that the integral of $p_Z(z)$ is 1.
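That practice check is easy to do numerically; a short Python sketch integrating the piecewise pdf by the midpoint rule:

```python
import math

def p_Z(z):
    # pdf of Z = X + Y derived above
    if 0 < z < 2:
        return math.log(2) / 2
    if 2 <= z < 4:
        return math.log(2) - 0.5 * math.log(z)
    return 0.0

# Midpoint rule over [0, 4]
n = 200_000
h = 4.0 / n
total = sum(p_Z((i + 0.5) * h) * h for i in range(n))
print(round(total, 4))  # 1.0
```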
The following fact is extremely useful.
Proposition 1.5.1. If $X$ and $Y$ are two random variables, then $E(X + Y) = E(X) + E(Y)$.
1.6 Conditional Probability
In our study of stochastic processes, we will often be presented with situations
where we have some knowledge that will affect the probability of whether
some event will occur. For example, in the roll of two dice, suppose we already
know that the sum will be greater than 7. This changes the probabilities
from those that we computed above. The event $F \ge 8$ has probability
$(5 + 4 + 3 + 2 + 1)/36 = 15/36$. So we are restricted to less than half of the
original sample space. We might wish to calculate the probability of getting
a 9 under these conditions. The quantity we wish to calculate is denoted
$P(F = 9 \mid F \ge 8)$, read "the probability that $F = 9$ given that $F \ge 8$".
In general, to calculate $P(A \mid B)$ for two events $A$ and $B$ (it is not necessary
that $A$ be a subset of $B$), we need only compute the fraction of the time that,
when $B$ occurs, $A$ also occurs. In symbols, we have

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
For our dice example (noting that the event $F = 9$ is a subset of the event
$F \ge 8$), we get

$$P(F = 9 \mid F \ge 8) = \frac{P(F = 9)}{P(F \ge 8)} = \frac{4/36}{15/36} = \frac{4}{15}.$$
As another example (with continuous probability this time), we calculate
for our $1/2x$-on-the-triangle example the conditional probabilities $P(X > 1 \mid Y > 1)$
as well as $P(Y > 1 \mid X > 1)$ (just to show that the probabilities of
$A$ given $B$ and $B$ given $A$ are usually different).
First, $P(X > 1 \mid Y > 1)$. This one is easy! Note that in the triangle with
vertices $(0, 0)$, $(2, 0)$ and $(2, 2)$ it is true that $Y > 1$ implies that $X > 1$.
Therefore the events $(Y > 1) \cap (X > 1)$ and $Y > 1$ are the same, so the
fraction we need to compute will have the same numerator and denominator.
Thus $P(X > 1 \mid Y > 1) = 1$.
For $P(Y > 1 \mid X > 1)$ we actually need to compute something. But note
that $Y > 1$ is a subset of the event $X > 1$ in the triangle, so we get:

$$P(Y > 1 \mid X > 1) = \frac{P(Y > 1)}{P(X > 1)} = \frac{\int_1^2 \int_1^x \frac{1}{2x}\, dy\, dx}{\int_1^2 \int_0^x \frac{1}{2x}\, dy\, dx} = \frac{\frac{1}{2}(1 - \ln(2))}{\frac{1}{2}} = 1 - \ln(2)$$
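The answer can also be checked by simulation. Note that the marginal of $X$ is $\int_0^x \frac{1}{2x}\, dy = \frac{1}{2}$ on $(0, 2)$, so $X$ is uniform there, and given $X = x$, the conditional density of $Y$ is uniform on $(0, x)$; this gives an easy way to sample from the triangle. A Python sketch:

```python
import math, random

random.seed(1)
n = 400_000
num = den = 0
for _ in range(n):
    x = random.uniform(0, 2)   # marginal of X is uniform on (0, 2)
    y = random.uniform(0, x)   # given X = x, Y is uniform on (0, x)
    if x > 1:
        den += 1
        if y > 1:
            num += 1
print(num / den, 1 - math.log(2))  # the two numbers should be close
```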
Two events $A$ and $B$ are called independent if the probability of $A$ given
$B$ is the same as the probability of $A$ (with no knowledge of $B$) and vice
versa. The assumption of independence of certain events is essential to many
probabilistic arguments. Independence of two events is expressed
by the equations

$$P(A \mid B) = P(A), \qquad P(B \mid A) = P(B)$$

and especially

$$P(A \cap B) = P(A)\, P(B)$$

Two random variables $X$ and $Y$ are independent if the probability that
$a < X < b$ remains unaffected by knowledge of the value of $Y$, and vice versa.
This reduces to the fact that the joint probability (or probability density)
function of $X$ and $Y$ "splits" as a product

$$p(x, y) = p_X(x)\, p_Y(y)$$

of the marginal probabilities (or probability densities). This formula is a
straightforward consequence of the definition of independence and is left as
an exercise.
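To see concretely that the $X$ and $Y$ of the triangle example are not independent, compare the joint density with the product of the marginals at one point (the marginal of $X$ is $1/2$ on $(0, 2)$, by the same computation as for $p_Y$):

```python
import math

def p(x, y):                      # joint density on the triangle 0 < y < x < 2
    return 1 / (2 * x) if 0 < y < x < 2 else 0.0

def p_X(x):                       # marginal of X: 1/2 on (0, 2)
    return 0.5 if 0 < x < 2 else 0.0

def p_Y(y):                       # marginal of Y, computed earlier
    return 0.5 * (math.log(2) - math.log(y)) if 0 < y < 2 else 0.0

# If X and Y were independent, p(x, y) would equal p_X(x) * p_Y(y)
x, y = 1.0, 0.5
print(p(x, y), p_X(x) * p_Y(y))  # 0.5 vs about 0.347: not equal, so dependent
```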
Proposition 1.6.1. 1. If $X$ is a random variable, then $\sigma^2(aX) = a^2 \sigma^2(X)$
for any real number $a$.

2. If $X$ and $Y$ are two independent random variables, then $\sigma^2(X + Y) = \sigma^2(X) + \sigma^2(Y)$.
1.7 Law of large numbers
The point of probability theory is to organize the random world. For many
centuries, people have noticed that, though certain measurements or pro-
cesses seem random, one can give a quantitative account of this randomness.
Probability gives a mathematical framework for understanding statements
like "the chance of hitting 13 on a roulette wheel is 1/38". This belief about
the way the random world works is embodied in the empirical law of averages. This law is not a mathematical theorem. It states that if a random
experiment is performed repeatedly under identical and independent condi-
tions, then the proportion of trials that result in a given outcome converges
to a limit as the number of trials goes to infinity. We now state for future
reference the strong law of large numbers. This theorem is the main rigorous
result giving credence to the empirical law of averages.
Theorem 1.7.1. (The strong law of large numbers) Let $X_1, X_2, \ldots$ be a
sequence of independent random variables, all with the same distribution,
whose expectation is $\mu$. Then

$$P\left( \lim_{n \to \infty} \frac{\sum_{i=1}^{n} X_i}{n} = \mu \right) = 1$$
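A simulation makes the theorem vivid; here is a minimal Python sketch (exponential random variables with mean 2 are an arbitrary illustrative choice) showing the running averages settling down to $\mu$:

```python
import random

random.seed(2)
mu = 2.0  # expectation of each X_i (exponential with rate 1/mu)
for n in (100, 10_000, 1_000_000):
    avg = sum(random.expovariate(1 / mu) for _ in range(n)) / n
    print(n, avg)  # the running averages approach mu = 2.0
```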
Exercises
1. Make your own example of a probability space that is finite and discrete.
Calculate the expectation of the underlying random variable X.
2. Make your own example of a probability space that is infinite and
discrete. Calculate the expectation of the underlying random variable
X.
3. Make your own example of a continuous random variable. Calculate
its expectation.
4. Prove that the normal distribution function

$$\varphi(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / 2\sigma^2}$$

is really a probability density function on the real line (i.e., that it
is positive and that its integral from $-\infty$ to $\infty$ is 1). Calculate the
expectation of this random variable.
5. What is the relationship among the following partial derivatives?

$$\frac{\partial \varphi}{\partial \sigma}, \qquad \frac{\partial^2 \varphi}{\partial x^2}, \qquad \text{and} \qquad \frac{\partial \varphi}{\partial t},$$

where $t = \sigma^2$ (for the last one, rewrite $\varphi$ as a function of
$x$, $\mu$ and $t$).
6. Consider the experiment of picking a point at random from a uniform
distribution on the disk of radius R centered at the origin in the plane
(“uniform” here means if two regions of the disk have the same area,
then the random point is equally likely to be in either one). Calculate
the probability density function and the expectation of the random
variable D, defined to be the distance of the random point from the
origin.
7. Prove the principle of inclusion and exclusion, $p(A \cup B) = p(A) + p(B) - p(A \cap B)$, and $p(A^c) = 1 - p(A)$.
8. If $S$ is a finite set with $n$ elements, how many elements are in $\mathcal{P}(S)$,
the power set of $S$? (Hint: try it for the first few values of $n$, and don't
forget to count the empty set.)
9. Show that for a function $f : S \to \mathbb{R}$ and for $A_1, A_2, A_3, \ldots$, a collection
of subsets of the real numbers,

$$f^{-1}\left( \bigcup_{j=1}^{\infty} A_j \right) = \bigcup_{j=1}^{\infty} f^{-1}(A_j)$$

Use this to prove that if $f$ is a random variable on a probability space
$(S, \Omega, P)$, then its distribution $P_f$ as defined above is also a probability.
10. For each of the examples of random variables you gave in problems 1,
2 and 3 calculate the expectation and the variance, if they exist.
11. Calculate the variance of the binomial, Poisson and normal distributions. The answer for the normal distribution is $\sigma^2$.
12. Let $X$ be any random variable with finite second moment. Consider
the function $f(a)$ defined as follows: $f(a) = E((X - a)^2)$. Show that
the minimum value of $f$ occurs when $a = E(X)$.
13. Fill in the details of the calculation of the variance of the uniform and
exponential distributions. Also, prove that

$$\sigma^2(X) = E((X - E(X))^2) = E(X^2) - (E(X))^2.$$
14. Let $X$ be a random variable with the standard normal distribution.
Find the mean and variance of each of the following: $|X|$, $X^2$ and $e^{tX}$.
15. Let X be the sine of an angle in radians chosen uniformly from the
interval (−π/2, π/2). Find the pdf of X, its mean and its variance.
16. Let $p(x, y)$ be the uniform joint probability density on the unit disk,
i.e.,

$$p(x, y) = \frac{1}{\pi} \quad \text{if } x^2 + y^2 < 1$$

and $p(x, y) = 0$ otherwise. Calculate the pdf of $X + Y$.

17. Suppose $X$ and $Y$ are independent random variables, each distributed
according to the exponential distribution with parameter $\alpha$. Find the
joint pdf of $X$ and $Y$ (easy). Find the pdf of $X + Y$. Also find the
mean and variance of $X + Y$.
Chapter 2
Aspects of finance and
derivatives
The point of this book is to develop some mathematical techniques that can
be used in finance to model movements of assets, and then to use these to solve
some interesting problems that occur in concrete practice.
Two of the problems that we will encounter are asset allocation and derivative valuation. The first, asset allocation, asks how we should allocate our resources among various assets to achieve our goals. These goals are usually phrased
in terms of how much risk we are willing to acquire, how much income we
want to derive from the assets, and what level of return we are looking for.
The second, derivative valuation, asks us to calculate an intrinsic value of a
derivative security (for example, an option on a stock). Though it might
seem more esoteric, the second question is actually more basic than the first
(at least in our approach to the problem). Thus we will attack derivative
valuation first and then apply it to asset allocation.
2.1 What is a derivative security?
What is a derivative security? By this we mean any financial instrument
whose value depends on (i.e., is derived from) an underlying security. The
underlying in theory could be practically anything: a stock, a stock index,
a bond or much more exotic things. Our most fundamental example is an
option on a stock. An options contract gives the purchaser of the contract
the right, but not the obligation, to buy (or sell) a specified (in our case) stock
at a specific price at a specific future time. For example, I could buy a (call)
option to buy 100 shares of IBM at $90 a share on January 19. Currently,
this contract is selling for $19 1/2. The seller of this option earns the $19 1/2 (the
price (premium) of the options contract) and incurs the obligation to sell me
100 shares of IBM stock on January 19 for $90 if I should so choose. Of
course, when January 19 rolls around, I would exercise my option only if the
price of IBM is more than $90.
Another well known and simpler example is a forward. A forward contract
is an agreement/contract between two parties that gives the right and the
obligation to one of the parties to purchase (or sell) a specified commodity
(or financial instrument) at a specified price at a specific future time. No
money changes hands at the outset of the agreement.
For example, I could buy a June forward contract for 50,000 gallons of frozen
orange juice concentrate for $.25/gallon. This would obligate me to take
delivery of the frozen orange juice concentrate in June at that price. No
money changes hands at the establishment of the contract.
A similar notion is that of a futures contract. The difference between a forward and a futures contract is that futures are generally traded on exchanges
and involve other technical aspects, such as margin accounts.
2.2 Securities
The two most common types of securities are an ownership stake or a debt
stake in some entity, for example a company or a government. You become
a part owner in a company by buying shares of the company’s stock. A debt
security is usually acquired by buying a company’s or a government’s bonds.
A debt security like a bond is a loan to a company in exchange for interest
income and the promise to repay the loan at a future maturity date.
Securities can either be sold in a market (an exchange) or over the counter.
When sold in an exchange the price of the security is subject to market forces.
When sold over the counter, the price is arrived at through negotiation by
the parties involved.
Securities sold on an exchange are generally handled by market makers
or specialists. These are traders whose job it is to make sure that
the security can always be bought or sold at some price. Thus they provide
liquidity for the security involved. Market makers must quote bid prices (the
amount they are willing to buy the security for) and ask prices (the amount
they are willing to sell the security for) of the securities. Of course the ask
price is usually greater than the bid, and this spread is one of the main ways market
makers make their money. The spot price is the amount that the security is
currently selling for; we will usually assume that the bid and ask
prices are the same. This is one of many idealizations that we will make as
we analyze derivatives.
2.3 Types of derivatives
A European call option (on a stock) is a contract that gives the buyer the
right but not the obligation to buy a specified stock for a specified price $K$,
the strike price, at a specified date $T$, the expiration date, from the seller of
the option. If the owner of the option can exercise the option at any time up
to $T$, then it is called an American call.
A put is just the opposite: it gives the owner of the contract the right to
sell. It also comes in European and American varieties.
Some terminology: If one owns an asset, one is said to have a long position
in that asset. Thus if I own a share of IBM, I am long one share in IBM.
The opposite position is called short. Thus if I owe a share of IBM, I am
short one share of IBM. Short selling is a basic financial maneuver. To short
sell, one borrows the stock (from your broker, for example) and sells it in the
market. In the future, one must return the stock (together with any dividend
income that the stock earned). Short selling allows negative amounts of a
stock in a portfolio and is essential to our ability to value derivatives.
One can have long or short positions in any asset, in particular in options.
Being long an option means owning it. Being short an option means that you
are the writer of the option.
These basic call and put options are sold on exchanges and their prices
are determined by market forces.
There are also many types of exotic options.
Asian options: these are options allowing one to buy a share of stock for the
average value of the stock over the lifetime of the option. Different versions
of these are based on calculating the average in different ways. An interesting
feature of these is that their payoff depends not only on the value of the
stock at expiration, but on the entire history of the stock's movements. We
call this feature path dependence.
Barrier options: These are options that either go into effect or out of effect if
the stock value passes some threshold. For example, a down and out option
is an option that expires worthless if the barrier $X$ is reached from above
before expiration. There are also down and in, up and in, and up and out options, and
you are encouraged to think about what these mean. These are path dependent.
Lookback options: Options whose payoff depends on the maximum or
minimum value of the stock over the lifetime of the option.
2.4 The basic problem: How much should I
pay for an option? Fair value
A basic notion of modern finance is that of risk. Contrary to common par-
lance, risk is not only a bad thing. Risk is what one gets rewarded for in
financial investments. Quantifying risk and the level of reward that goes
along with a given level of risk is a major activity in both theoretical and
practical circles in finance. It is essential that every individual or corporation
be able to select and manipulate the level of risk for a given transaction. This
redistribution of risk towards those who are willing and able to assume it is
what modern financial markets are about.
Options can be used to reduce or increase risk. Markets for options
and other derivatives are important in the sense that agents who anticipate
future revenues or payments can ensure a profit above a certain level or insure
themselves against a loss above a certain level.
It is pretty clear that a prerequisite for efficient management of risk is
that these derivative products are correctly valued, and priced. Thus our
basic question: How much should I pay for an option?
We make a formal distinction between value and price. The price is what
is actually charged, while the value is what the thing is actually worth.
When a developer wants to figure out how much to charge for a new
house, say, he calculates his costs: land, material and labor, insurance, cost
of financing, etc. This is the house's value. The developer, wanting to make
a profit, adds to this a markup. The total we call the price. Our problem is
to determine the value of an option.
In 1997, the Nobel Prize in economics was awarded to Robert Merton
and Myron Scholes, who along with Fischer Black wrote down a solution
to this problem. Black died in his mid-fifties in 1995 and was therefore
not awarded the Nobel Prize. Black and Scholes wrote down their formula,
the Black-Scholes formula, in 1973. They had a difficult time getting their
paper published. However, within a year of its publication, Hewlett-Packard
had a calculator with this formula hardwired into it. Today this formula is
an indispensable tool for thousands of traders and investors to value stock
options in markets throughout the world.
Their method is quite general and can be used and modified to handle
many kinds of exotic options, as well as to value insurance contracts, and even
has applications outside of finance.
2.5 Expectation pricing. Einstein and Bachelier, what they knew about derivative pricing. And what they didn't know
As opposed to the case of pricing a house, pricing an option seems more like pricing
bets in a casino. That is, the amount the casino charges for a given bet
is based on models for the payoffs of the bets. Such models are of course
statistical/probabilistic.
As an example, suppose we play a (rather silly) game of tossing a die,
and the casino pays the face value on the upside of the die in dollars. How
much should the casino charge to play this game? Well, the first thing is
to figure out the fair value. The casino of course is figuring that people
will play this game many, many times. Assuming that the die is fair (i.e.,
$P(X = i) = 1/6$ for $i = 1, 2, \ldots, 6$) and that each roll of the die doesn't affect
(that is, is independent of) any other roll, the law of large numbers says that if
there are $n$ plays of the game, the average of the face values on the die
will, with probability close to 1, be close to

$$E(X) = 1 \cdot \tfrac{1}{6} + 2 \cdot \tfrac{1}{6} + \cdots + 6 \cdot \tfrac{1}{6} = \tfrac{21}{6}$$

Thus after $n$ plays of the game, the casino feels reasonably confident that
it will be paying off something close to $(21/6) \cdot n$ dollars. So 21/6 is the fair value
per game. The casino, desiring to make a profit, sets the actual cost to play
at $4. This is not too much higher than the fair value, or no one would want to play;
they would lose their money too fast.
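The casino's reasoning is easy to simulate; a minimal Python sketch (the $4 price is the one from the text):

```python
import random

random.seed(3)
n = 500_000
price = 4.0                       # what the casino charges per game
payout = sum(random.randint(1, 6) for _ in range(n))
profit = price * n - payout
print(payout / n)                 # close to 21/6 = 3.5, the fair value
print(profit / n)                 # close to 0.5, expected profit per game
```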
This is an example of expectation pricing. That is, the payoff of some
experiment is random. One models this payoff as a random variable and uses
the strong law of large numbers to argue that the fair value is the expectation
of the random variable. This works much of the time.
Sometimes it doesn't. There are a number of reasons why the actual value
might not be the expectation price. In order for the law of large numbers
to kick in, the experiment has to be performed many times, in independent
trials. If you are offered the chance to play a game only once, the expectation
might not be a very good guide to its value. What's the fair value to play
Russian roulette? As we will see below, there are more subtle and interesting
reasons why the fair value is not enforced in the market.
Thinking back to stocks and options, what would be the obvious way to
use expectation pricing to value an option? The first order of business would
be to model the movement of the underlying stock. Unfortunately, the best
we can do here as well is to give a probabilistic model.
The French mathematician Louis Bachelier, in his doctoral dissertation
around 1900, was one of the earliest to attempt to model stock movements.
He did so by what we now call a random walk (discrete version) or a Wiener
process (continuous version). This actually predated the use of these processes
to model Brownian motion (which Einstein did in his famous 1905 paper).
Bachelier's thesis advisor, Henri Poincaré, did not have such high regard for
Bachelier's work.
2.5.1 Payoffs of derivatives
We will have a lot to say in the future about models for stock movements.
But for now, let's see how we might use such a model to value an option. Let us
consider a European call option $C$ to buy a share of XYZ stock with strike
price $K$ at time $T$. Let $S_t$ denote the value of the stock at time $t$. The one
time we know the value of the option is at expiration. At expiration there
are two possibilities. If the value of the stock at expiration is less than or equal
to the strike price, $S_T \le K$, we don't exercise our option and it expires worthless. If
$S_T > K$, then it makes sense to exercise the option, paying $K$ for the stock
and selling it on the market for $S_T$, the whole transaction netting $S_T - K$ dollars.
Thus, the option is worth $\max(S_T - K, 0)$. This is what we call the payoff
of the derivative. It is what the derivative pays at expiration, which in this
case only depends on the value of the stock at expiration.
On your own, calculate the payoff of all the derivatives defined in section
2.3.
Assuming a probabilistic model for $S_t$, we would arrive at a random variable $S_T$ describing in a probabilistic way the value of XYZ at time $T$. Thus
the value of the option at expiration would be the random variable $\max(S_T - K, 0)$,
and expectation pricing would say that the fair price should be $E(\max(S_T - K, 0))$.
(I am ignoring the time value of money; that is, I am assuming interest rates
are 0.)
Bachelier attempted to value options along these lines. Unfortunately, it
turns out that expectation pricing (naively understood, at least) does not
work. There is another wrinkle to option pricing. It's called arbitrage.
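As a toy illustration of expectation pricing, one can estimate $E(\max(S_T - K, 0))$ by simulation. The uniform model for $S_T$ below is purely an illustrative assumption, not a realistic stock model:

```python
import random

random.seed(4)
K = 100.0          # strike price
n = 200_000

# Toy model (an assumption for illustration only): S_T uniform on (80, 120)
payoffs = [max(random.uniform(80.0, 120.0) - K, 0.0) for _ in range(n)]
price = sum(payoffs) / n
# Exact value for this toy model: E(max(S_T - K, 0)) = (1/40) * integral
# over (100, 120) of (s - 100) ds = 5
print(price)       # close to 5.0
```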
2.6 The simple case of forwards. Arbitrage
arguments
I will illustrate the failure of expectation pricing with the example of a forward contract. This time I will also keep track of the time value of money.
The time value of money is expressed by interest rates. If one has $B$ dollars at
time $t$, then at time $T > t$ it will be worth $\exp(r(T - t))B$ dollars. Bonds are the
market instruments that express the time value of money. A zero-coupon
bond is a promise of a specific amount of money (the face or par value) at
some specified time in the future, the expiration date. So if a bond promises
to pay $B_T$ at time $T$, then its current value $B_t$ can be expressed as $B_T$
discounted by some interest rate $r$, that is, $B_t = \exp(-r(T - t))B_T$. For the
meanwhile, we will assume that the interest rate $r$ is constant. This makes
some sense: most options are rather short lived, 3 months to a year or so.
Over this period, interest rates do not vary so much.
Now suppose that we enter into a forward contract with another person to
buy one share of XYZ stock at time $T$. We want to calculate what the forward
price $F$ is. Thus at time $T$, I must buy a share of XYZ for $F$ from him and
he must sell it to me for $F$. So at time $T$, the contract is worth $S_T - F$.
According to expectation pricing, and adjusting for the time value of money,
this contract should at time $t$ be worth $\exp(-r(T - t))E(S_T - F)$. Since the
contract costs nothing to set up, if this is not zero, it would lead to profits or
losses over the long run, by the law of large numbers. So expectation pricing
says $F = E(S_T)$.
On the other hand, consider the following strategy for the party who
must deliver the stock at time $T$. If the stock is now, at time $t$, worth $S_t$,
he borrows that amount at the risk-free interest rate $r$, buys the stock now,
and holds the stock until time $T$. At time $T$ he can just hand over the stock,
collect $F$ and repay the loan, $\exp(r(T - t))S_t$. After the transaction is over
he has $F - \exp(r(T - t))S_t$. At the beginning of the contract, it is worth
nothing; no money changes hands. So he starts with a contract costing no
money to set up, and in the end has $F - \exp(r(T - t))S_t$. If this is positive,
then we have an arbitrage opportunity, that is, a risk-free profit. No matter
how small the profit is on any single transaction, one can parlay this into
unlimited profits by performing the transaction in large multiples. On the
other hand, what will happen in practice if this is positive? Someone else will
come along and offer the same contract but with a slightly smaller $F$, still
making a positive (and thus potentially unlimited) profit. Therefore, as
long as $F - \exp(r(T - t))S_t > 0$, someone can come along and undercut
him and still make unlimited money. So $F - \exp(r(T - t))S_t \le 0$.
If $F - \exp(r(T - t))S_t < 0$, you can work it from the other side. That
is, the party to the contract who must buy the stock for $F$ at time $T$ can
at time $t$ short the stock (i.e., borrow it) and sell it in the market for $S_t$,
putting the money in the risk-free instrument. At time $T$, she buys the stock for $F$,
returns the stock, and is left with $\exp(r(T - t))S_t - F$, which is positive. So
again, at time $t$ the contract costs nothing to set up, and at time $T$ she has
a positive amount: again, an arbitrage opportunity. A
similar argument as in the paragraph above shows that $\exp(r(T - t))S_t - F \le 0$.
So we arrive at $F = \exp(r(T - t))S_t$. So we see something very
interesting: the forward price has nothing to do with the expected value
of the stock $S_T$ at time $T$. It only has to do with the value of the stock at
the time of the beginning of the contract. This is an example of the basic
arbitrage argument. In practice, the price arrived at by expectation pricing,
$F = E(S_T)$, would be much higher than the price arrived at by the arbitrage
argument.
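The cash-and-carry strategy above fits in a few lines of Python; the numbers $S_t = 100$, $r = 0.05$, $T - t = 0.5$ are arbitrary illustrative choices:

```python
import math

# Cash-and-carry: borrow S_t, buy the stock, deliver it at T for F, repay the loan
def carry_profit(F, S_t, r, T_minus_t):
    return F - math.exp(r * T_minus_t) * S_t

S_t, r, dt = 100.0, 0.05, 0.5
F_no_arb = math.exp(r * dt) * S_t          # the no-arbitrage forward price
print(carry_profit(F_no_arb, S_t, r, dt))  # 0.0: no risk-free profit at this F
print(carry_profit(105.0, S_t, r, dt))     # positive: the seller locks in a risk-free profit
```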
Let’s give another example of an arbitrage argument. This is also a basic
result expressing the relationship between the value of a call and of a put.
Proposition 2.6.1. (Put-call parity) If $S_t$ denotes the value of a stock XYZ
at time $t$, and $C$ and $P$ denote the value of a call and put on XYZ with
the same strike price $K$ and expiration $T$, then at time $t$,

$$P + S_t = C + K \exp(-r(T - t)).$$

Proof: By now, you should have calculated that the payoff on a put is
$\max(K - S_T, 0)$. Now form the portfolio $P + S - C$, that is, long a share of
stock and a put, and short a call. Then at time $T$ we calculate.

Case 1: $S_T \le K$. Then $P + S - C = \max(K - S_T, 0) + S_T - \max(S_T - K, 0) = K - S_T + S_T - 0 = K$.

Case 2: $S_T > K$. Then $P + S - C = \max(K - S_T, 0) + S_T - \max(S_T - K, 0) = 0 + S_T - (S_T - K) = K$.

So the payoff at time $T$ from this portfolio is $K$ no matter what; that is, it is
risk-free. Regardless of what $S_T$ is, the portfolio is worth $K$ at time $T$. What
other financial entity has this property? A zero-coupon bond with par value
$K$. Hence the value of this portfolio at time $t$ is $K \exp(-r(T - t))$, giving
the theorem.
Q.E.D.
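The case analysis in the proof can be checked mechanically; a minimal Python sketch evaluating the portfolio's payoff in a few states:

```python
# Payoff at expiration of the portfolio long stock + long put + short call
def portfolio_payoff(S_T, K):
    put = max(K - S_T, 0.0)
    call = max(S_T - K, 0.0)
    return put + S_T - call

K = 50.0
for S_T in (0.0, 25.0, 50.0, 75.0, 100.0):
    assert portfolio_payoff(S_T, K) == K   # always K, hence risk-free
print("payoff is K in every state")
```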
2.7 Arbitrage and no-arbitrage
Already we have made arguments based on the fact that it is natural for
competing players to undercut prices as long as they are guaranteed a positive
risk-free profit, and thus able to generate large profits by carrying out
multiple transactions. This is formalized by the no-arbitrage hypothesis. (We
formalize this even more in the next section.) An arbitrage is a portfolio that
guarantees an (instantaneous) risk-free profit. It is important to realize that
a zero-coupon bond is not an arbitrage. It is risk free to be sure, but the
growth of the risk-free bond is just an expression of the time value of money.
Thus the word "instantaneous" in the definition is equivalent to saying that
the portfolio is risk free and grows faster than the risk-free interest rate.
Definition 2.7.1. The no-arbitrage hypothesis is the assumption that arbi-
trage opportunities do not exist.
What is the sense of this? Of course arbitrage opportunities exist. There
exist people called arbitrageurs whose job it is to find these opportunities.
(And some of them make a lot of money doing it.)
Actually, a good discussion of this is in Malkiel's book, along with the
efficient market hypothesis. Malkiel and others would insist that in
actual fact, arbitrage opportunities don't really exist. But even if you don't
buy this, it seems to be a reasonable assumption (especially in view of the
arguments we made above) as a starting point.
In fact, the existence of arbitrageurs makes the no-arbitrage hypothesis
more reasonable. The existence of many people seeking out arbitrage opportunities means that when such an opportunity arises, it closes all the more
quickly. In publicly traded stocks, arbitrage opportunities are increasingly
rare. So as a basic principle in valuing derivatives, it is a good starting point.
Also, if one sees derivatives mispriced, it might mean the existence of arbitrage opportunities. So you have a dichotomy: you can be right (about the
no-arbitrage hypothesis), in which case your evaluation of derivative prices
is better founded, or you can make money (having found an arbitrage opportunity).
2.8 The arbitrage theorem
In this section we establish the basic connection between finance and
mathematics. Our first goal is to figure out what arbitrage means
mathematically, and for this we model financial markets in the following way.
Consider N possible securities, labeled {1, . . . , N}. These may be stocks,
bonds, commodities, options, etc. Suppose that now, at time 0, these
securities are selling at S_1, S_2, S_3, . . . , S_N units of account (say
dollars if you need closure). Note that this notation is different from the
notation in the rest of the notes in that the subscript does not denote time
but indexes the different securities. Consider the column vector S whose
components are the S_j's. At time 1, the random nature of financial markets is
represented by the possible states of the world, which are labeled
1, 2, 3, 4, . . . , M. That is, from time 0 to time 1 the world has changed
from its current state (reflected in the securities prices S) to one of M
possible states. Each security, say i, has a payoff d_{ij} in state j. This
payoff reflects any change in the security's price, any dividends that are
paid, or any possible change to the overall worth of the security. We can
organize these payoffs into an N × M matrix,
D = ( d_{1,1}  d_{1,2}  . . .  d_{1,M} )
    ( d_{2,1}  d_{2,2}  . . .  d_{2,M} )
    (  . . .    . . .   . . .   . . .  )
    ( d_{N,1}  d_{N,2}  . . .  d_{N,M} )        (2.8.1)
Our portfolio is represented by an N × 1 column matrix

θ = ( θ_1 )
    ( θ_2 )
    ( ... )
    ( θ_N )        (2.8.2)

where θ_i represents the number of copies of security i in the portfolio. This
can be a negative number, indicating short selling.
Now, the market value of the portfolio is θ · S = θ^T S. The payoff of the
portfolio can be represented by the M × 1 matrix

D^T θ = ( Σ_{j=1}^N d_{j,1} θ_j )
        ( Σ_{j=1}^N d_{j,2} θ_j )
        (          ...          )
        ( Σ_{j=1}^N d_{j,M} θ_j )        (2.8.3)
where the ith row of the matrix, which we write as (D^T θ)_i, represents the
payoff of the portfolio in state i. The portfolio is riskless if the payoff of
the portfolio is independent of the state of the world, i.e.
(D^T θ)_i = (D^T θ)_j for any states of the world i and j.
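To make the bookkeeping concrete, here is a minimal numerical sketch (using numpy, and the two-security, two-state market that appears as equation (2.8.5) below) of the payoff D^T θ and the risklessness test; the function names are my own, not from the notes.

```python
import numpy as np

# Two securities, two states; row i holds security i's payoffs across states.
# This is the matrix D of equation (2.8.5) below.
D = np.array([[1.0, 1.0],
              [0.5, 2.0]])

def payoff(D, theta):
    """Payoff vector of the portfolio theta: the M x 1 matrix D^T theta."""
    return D.T @ theta

def is_riskless(D, theta):
    """Riskless means the payoff is the same in every state of the world."""
    p = payoff(D, theta)
    return bool(np.allclose(p, p[0]))

theta = np.array([2.0, -1.0])                # short one unit of security 2
print(payoff(D, theta))                      # [1.5, 0.]: state-dependent
print(is_riskless(D, theta))                 # False
print(is_riskless(D, np.array([1.0, 0.0])))  # True: security 1 alone
```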
Let

R^n_+ = {(x_1, . . . , x_n) | x_i ≥ 0 for all i}

and

R^n_{++} = {(x_1, . . . , x_n) | x_i > 0 for all i}.

We write v ≥ 0 if v ∈ R^n_+ and v > 0 if v ∈ R^n_{++}.
Definition 2.8.1. An arbitrage is a portfolio θ such that either

1. θ · S ≤ 0 and D^T θ > 0, or

2. θ · S < 0 and D^T θ ≥ 0.
These conditions express that the portfolio goes from being worthless
(respectively, less than worthless) to being worth something (resp. at least
worth nothing), regardless of the state at time 1. In common parlance, an
arbitrage is an (instantaneous) risk-free profit, in other words, a free lunch.
Definition 2.8.2. A state price vector is an M × 1 column vector ψ such that
ψ > 0 and S = Dψ.
Theorem 2.8.1. (Arbitrage theorem) There are no arbitrage portfolios if and
only if there is a state price vector.
Before we prove this theorem, we need an abstract theorem from linear
algebra. Let's recall that a subset C ⊂ R^n is called convex if the line
segment between any two points of C is completely contained in C. That is, for
any two points x, y ∈ C and any t ∈ [0, 1], one has

x + t(y − x) ∈ C.
Theorem 2.8.2. (The separating hyperplane theorem) Let C be a compact
convex subset of R^n and V ⊂ R^n a vector subspace such that V ∩ C = ∅.
Then there exists a point ξ_0 ∈ R^n such that ξ_0 · c > 0 for all c ∈ C and
ξ_0 · v = 0 for all v ∈ V.
First we need a lemma.
Lemma 2.8.3. Let C be a closed and convex subset of R^n not containing 0.
Then there exists ξ_0 ∈ C such that ξ_0 · c ≥ ||ξ_0||^2 for all c ∈ C.
Proof: Let r > 0 be such that B_r(0) ∩ C ≠ ∅. Here B_r(x) denotes all those
points a distance r or less from x. Let ξ_0 ∈ C ∩ B_r(0) be an element which
minimizes the distance from the origin among elements of C ∩ B_r(0). That is,

||ξ_0|| ≤ ||c||        (2.8.4)

for all c ∈ B_r(0) ∩ C. Such a ξ_0 exists by the compactness of C ∩ B_r(0).
Then it follows that (2.8.4) holds for all c ∈ C, since if c ∉ C ∩ B_r(0) then
||ξ_0|| ≤ r ≤ ||c||.
Now since C is convex, it follows that for any c ∈ C we have
ξ_0 + t(c − ξ_0) ∈ C, so that

||ξ_0||^2 ≤ ||ξ_0 + t(c − ξ_0)||^2 = ||ξ_0||^2 + 2t ξ_0 · (c − ξ_0) + t^2 ||c − ξ_0||^2.

Hence

2||ξ_0||^2 − t||c − ξ_0||^2 ≤ 2 ξ_0 · c

for all t ∈ (0, 1]. Letting t → 0 gives ξ_0 · c ≥ ||ξ_0||^2. QED.
Proof (of the separating hyperplane theorem): Let C be compact and convex,
V ⊂ R^n as in the statement of the theorem. Then consider

C − V = {c − v | c ∈ C, v ∈ V}.

This set is closed and convex. (Here is where you need C to be compact.) Then
since C ∩ V = ∅, the set C − V does not contain 0. Hence by the lemma
there exists ξ_0 ∈ C − V such that ξ_0 · x ≥ ||ξ_0||^2 for all x ∈ C − V. But
this implies that for all c ∈ C, v ∈ V,

ξ_0 · c − ξ_0 · v ≥ ||ξ_0||^2 > 0,

so that

ξ_0 · c > ξ_0 · v

for all c ∈ C and v ∈ V. Now note that since V is a linear subspace, for any
v ∈ V we have av ∈ V for all a ∈ R. Hence ξ_0 · c > ξ_0 · (av) = a ξ_0 · v for
all a. But this can't happen unless ξ_0 · v = 0. So we have ξ_0 · v = 0 for
all v ∈ V, and also ξ_0 · c > 0.
QED.
Proof (of the arbitrage theorem): Suppose S = Dψ with ψ > 0. Let θ be a
portfolio with S · θ ≤ 0. Then

0 ≥ S · θ = Dψ · θ = ψ · D^T θ,

and since ψ > 0 this rules out D^T θ > 0. The same argument handles the case
S · θ < 0: then ψ · D^T θ < 0, which rules out D^T θ ≥ 0. So there are no
arbitrage portfolios.
To prove the opposite implication, let

V = {(−S · θ, D^T θ) | θ ∈ R^N} ⊂ R^{M+1}.

Now consider C = R_+ × R^M_+. C is closed and convex, and by definition there
exists an arbitrage exactly when there is a non-zero point of intersection of
V and C. Now consider

C_1 = {(x_0, x_1, . . . , x_M) ∈ C | x_0 + · · · + x_M = 1}.

C_1 is compact and convex, and assuming there is no arbitrage, V ∩ C_1 = ∅ (a
point of V ∩ C_1 would be a non-zero point of V ∩ C). Hence by Theorem 2.8.2
there exists x ∈ R^{M+1} with

1. x · v = 0 for all v ∈ V, and

2. x · c > 0 for all c ∈ C_1.
Then writing x = (x_0, x_1, . . . , x_M), the condition x · v = 0 implies that
for all θ ∈ R^N,

(−S · θ) x_0 + D^T θ · (x_1, . . . , x_M)^T = 0.

So D^T θ · (x_1, . . . , x_M)^T = x_0 S · θ, or
θ · D(x_1, . . . , x_M)^T = x_0 S · θ. Note that x_0 > 0 and each x_j > 0, by
applying condition 2 to the standard basis vectors, which lie in C_1. Hence
letting

ψ = (x_1, . . . , x_M)^T / x_0,

we have θ · Dψ = S · θ for all θ. Thus Dψ = S and ψ is a state price vector.
Q.E.D.
Let’s consider the following example. A market consisting of two stocks
1, 2. And two states 1, 2. Suppose that S
1
= 1 and S
2
= 1. And
D =
_
1 1
1
2
2
_
(2.8.5)
So security 1 pays off 1 regardless of state and security 2 pays of 1/2 in state
1 and 2 in state 2. Hence if
θ =
_
θ
1
θ
2
_
(2.8.6)
is a portfolio its payoff will be D
T
θ
_
θ
1

2
/2
θ
1
+ 2θ
2
_
(2.8.7)
Is there a ψ with S = Dψ. One can find this matrix easily by matrix inversion
so that
_
1
1
_
=
_
1 1
1
2
2
__
ψ
1
ψ
2
_
(2.8.8)
So that
_
ψ
1
ψ
2
_
=
_
4/3 −2/3
−1/3 2/3
__
1
1
_
=
_
2/3
1/3
_
(2.8.9)
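A quick numerical check of (2.8.8)-(2.8.9), solving S = Dψ directly rather than inverting by hand (a sketch using numpy):

```python
import numpy as np

D = np.array([[1.0, 1.0],     # d_ij: payoff of security i in state j
              [0.5, 2.0]])
S = np.array([1.0, 1.0])      # time-0 prices of the two securities

psi = np.linalg.solve(D, S)   # state price vector: S = D psi
print(psi)                    # [0.6667, 0.3333], i.e. (2/3, 1/3)
assert np.all(psi > 0)        # psi > 0: a genuine state price vector
```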
Now we address two natural and important questions: What is the meaning of
the state price vector? How do we use it to value derivatives?
We formalize the notion of a derivative.

Definition 2.8.3. A derivative (or a contingency claim) is a function on the
set of states.

We can write this function as an M × 1 vector. So if h is a contingency
claim, then h will pay off h_j if state j is realized. To see how this
compares with what we have been talking about up until now, let us model a
call on
security 2 in the example above. Consider a call with strike price 1½ on
security 2 with expiration at time 1. Then h_j = max(d_{2,j} − 1½, 0), or

h = ( h_1 )   (  0  )
    ( h_2 ) = ( 1/2 )        (2.8.10)
So our derivatives are really nothing new here, except that this formulation
allows the possibility of a payoff depending on a combination of securities,
which is certainly a degree of generality that we might want.
To understand what the state price vector means, we introduce some important
special contingency claims. Let e_j be the contingency claim that pays 1 if
state j occurs, and 0 if any other state occurs. So

e_j = ( 0 )
      ( ⋮ )
      ( 1 )  ← 1 in the jth row
      ( ⋮ )
      ( 0 )        (2.8.11)
These are often called binary options. The claim e_j is just a bet paying off
1 if state j occurs. How much is this bet worth? Suppose that we can find a
portfolio θ^j such that D^T θ^j = e_j. Let ψ be a state price vector. Then
the market value of the portfolio θ^j is

S · θ^j = Dψ · θ^j = ψ · D^T θ^j = ψ · e_j = ψ_j.

Hence we have shown that the market value of a portfolio whose payoff is e_j
is ψ_j. So ψ_j is the market value of a bet that state j will be realized.
This is the meaning of ψ.
Now consider

B = Σ_{j=1}^M e_j = ( 1 )
                    ( 1 )
                    ( ⋮ )
                    ( 1 )        (2.8.12)

This is a contingency claim that pays 1 regardless of state. That is, B is
a riskless zero-coupon bond with par value 1 and expiration at time 1. Its
market value will then be R = Σ_{j=1}^M ψ_j. So this R is the discount factor
for riskless borrowing. (You might want to think of R as (1 + r)^{−1}, where
r is the interest rate.) Now we can form the vector Q = (1/R)ψ. Note that
Q ∈ R^M_{++} and Σ_{j=1}^M q_j = 1. So Q can be interpreted as a probability
function on the state space. It is important to remark that these
probabilities are not the real world probabilities that might be estimated
for the states j. They are probabilities that are derived only from the
no-arbitrage condition. They are therefore often called pseudo-probabilities.
Coming back to our problem of valuing derivatives, consider a derivative h.
If θ is a portfolio whose payoff is h, i.e. D^T θ = h, we say that θ
replicates h. Then the market value of θ is

S · θ = Dψ · θ = ψ · D^T θ = ψ · h = R Q · h = R Σ_{j=1}^M q_j h_j = R E_Q(h).

That is, the market value of θ (and therefore of the derivative) is the
discounted expectation of h with respect to the pseudo-probability Q. Hence,
we have regained our idea that the value of a contingency claim should be an
expectation, but with respect to probabilities derived only from the
no-arbitrage hypothesis, not from any actual probabilities of the states
occurring.
Let us go back to the example above and value the option on security
number 2. R = 1 in this example, so the value of h is

R E_Q(h) = 1 · (0 · (2/3) + (1/2) · (1/3)) = 1/6.
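The same valuation, done numerically for the example market (a sketch; the variable names are mine):

```python
import numpy as np

psi = np.array([2/3, 1/3])    # state price vector from (2.8.9)
h = np.array([0.0, 0.5])      # call on security 2, strike 1 1/2: (2.8.10)

R = psi.sum()                 # discount factor; here R = 1
Q = psi / R                   # pseudo-probabilities
value = R * (Q @ h)           # R * E_Q(h)
print(value)                  # 0.1666... = 1/6
```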
Coming back to general issues, there is one further thing we need to discuss:
when is there a portfolio θ whose payoff D^T θ is equal to a given contingency
claim h? For this we make the following

Definition 2.8.4. A market is complete if for each of the e_j there is a
portfolio θ^j with D^T θ^j = e_j.
Now we see that if a market is complete, then for any contingency claim h,
the portfolio θ = Σ_{j=1}^M h_j θ^j satisfies D^T θ = h. That is, a market is
complete if and only if every claim can be replicated.
We state this and an additional fact as a
Theorem 2.8.4. 1. A market is complete if and only if every contingency
claim can be replicated.
2. An arbitrage free market is complete if and only if the state price vector
is unique.
Proof: Having proved the first statement above, we prove the second. It is
clear from the first statement that if a market is complete then
D^T : R^N → R^M is onto, that is, has rank M. Thus if ψ_1 and ψ_2 are two
state price vectors, then D(ψ_1 − ψ_2) · θ = (S − S) · θ = 0 for all θ. But
then 0 = D(ψ_1 − ψ_2) · θ = (ψ_1 − ψ_2) · D^T θ. Since D^T is onto, this
means that (ψ_1 − ψ_2) · v = 0 for all v ∈ R^M, which means that
ψ_1 − ψ_2 = 0.

We prove the other direction by contradiction. So assume that the market is
not complete, and that ψ_1 is a state price vector. Then D^T is not onto, so
we can find a nonzero vector τ orthogonal to the image, i.e. D^T θ · τ = 0 for
all θ. Then θ · Dτ = 0 for all θ, so Dτ = 0. Scaling τ so that ψ_1 + τ > 0
(possible since ψ_1 > 0), we see that ψ_2 = ψ_1 + τ is another state price
vector. QED.
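Both characterizations are easy to test numerically. Here is a sketch for the example market, checking completeness via the rank of D^T (restating "D^T is onto") and uniqueness via invertibility, using numpy:

```python
import numpy as np

D = np.array([[1.0, 1.0],
              [0.5, 2.0]])
S = np.array([1.0, 1.0])
M = D.shape[1]                # number of states

# Complete iff D^T : R^N -> R^M is onto, i.e. has rank M.
complete = np.linalg.matrix_rank(D.T) == M
print(complete)               # True

# Here D is square with full rank, so S = D psi has exactly one solution:
psi = np.linalg.solve(D, S)
print(psi)                    # the unique state price vector (2/3, 1/3)
```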
Exercises
1. Consider a variant of the game described above, tossing a die where the
casino pays the face value on the upside of the die in dollars. Except
this time, if the player doesn't like her first toss, she is allowed one
additional toss, and the casino then pays the face value on the second
toss. What is the fair value of this game?
2. Suppose a stock is worth $S_− just before it pays a dividend, $D. What
is the value of the stock, S_+, just after the dividend? Use an arbitrage
argument.
3. Suppose that stock XYZ pays no dividends, and C is an American call on
XYZ with strike price K and expiration T. Show, using an arbitrage argument,
that it is never advantageous to exercise early; that is, for any
intermediate time t < T, C_t > S_t − K. (Hint: Calculate the payoff of the
option as seen from the same time T in two cases: 1) if you exercise early at
time t < T; 2) if you hold it until expiration. To finance the first case,
you borrow K at time t to buy the stock, and repay the loan at time T.
Calculate this portfolio at time T. In the other case you know the payoff.
Now compare, and set up an arbitrage if S_t − K ≥ C_t.)
4. Suppose that C is a call on a non-dividend-paying stock S, with strike
price K and expiration T. Show that for any time t < T,
C_t > S_t − exp(−r(T − t))K.
5. Suppose that C and D are two calls on the same stock with the same
expiration T, but with strike prices K < L respectively. Show by an
arbitrage argument that C_t ≥ D_t for all t < T.
6. Suppose that C and D are two calls on the same stock with the same
strike price K, but with expirations T_1 < T_2, respectively. Show by an
arbitrage argument that D_t ≥ C_t for all t < T_1.
7. Using the arbitrage theorem, show that there can only be one bond
price. That is, if there are 2 securities paying $1, regardless of state,
then their market value is the same.
8. Consider the example of the market described in section (2.8). Let h be a
put on security 2 with strike price 1/2. Calculate its market value.
9. Make up your own market, and show that there is a state price vector.
Consider a specific derivative (of your choice) and calculate its market
value.
Chapter 3
Discrete time
In this chapter, we proceed from the one-period model of the last chapter to
the multi-period model.
3.1 A first look at martingales
In this section I would like to give a very intuitive sense of what a martingale
is, why they come up in option pricing, and then give an application to
calculating expected stopping times.
The concept of a martingale is one of primary importance in the modern
theory of probability and in theoretical finance. Let me begin by stating the
slogan that a martingale is a mathematical model for a fair game. In fact,
the term martingale derives from the well known betting strategy (in
roulette) of doubling one's bet after losing. First let's think about what we
mean by a fair game.
Girolamo Cardano, an Italian, was one of the first mathematicians to
study the theory of gambling. In his treatise, The Book of Games of Chance,
he wrote: ”The most fundamental principle of all in gambling is simply equal
conditions, for example, of opponents, of bystanders, of money, of situation,
of the dice box, and of the die itself. To the extent to which you depart from
that equality, if it is in your opponent’s favor, you are a fool, and if in your
own, you are unjust.” (From A. Hald, A History of Probability and Statistics
and their applications before 1750, John Wiley and Sons, New York.)
How do we mathematically model this notion of a fair game, that is, of
equal conditions? Consider a number of gamblers, say N, playing a gambling
game. We will assume that this game is played in discrete steps called hands.
We also assume that the gamblers make equal initial wagers and the winner
wins the whole pot. It is reasonable to model each gambler by his or her
current fortune; that is, this is the only information that we really need to
talk about. Noting the gamblers' fortunes at a single time is not so useful,
so we keep track of their fortunes at each moment of time. So let F_n denote
the fortune of one of the gamblers after the nth hand. Because of the random
nature of gambling we let F_n be a random variable for each n. A sequence of
random variables F_n is what is called a stochastic process.
Now for the equal conditions. A reasonable way to insist on the equal
conditions of the gamblers is to assume that each of the gamblers' fortune
processes is distributed (as a random variable) in exactly the same way. That
is, if p_n(x) denotes the probability density function of F_n, and G_n is any
of the other gamblers' fortunes, with pdf q_n(x), then we have
p_n(x) = q_n(x) for all n and x.
What are the consequences of this condition? The amount that the gambler
wins in hand n is the random variable ξ_n = F_n − F_{n−1}, and the equal
distribution of the gamblers' fortunes implies that the gamblers' wins in the
nth hand are also equally distributed. From the conditions on the gambling
game described above, we must have E(ξ_n) = 0. (Think about this.) Hence we
have that

E(F_n) = E(F_{n−1} + ξ_n) = E(F_{n−1}) + E(ξ_n) = E(F_{n−1}).

So the expected fortune of the gambler will not change over time. This is
close to the martingale condition. The same analysis says that if the gambler
has F_{n−1} after the (n−1)st hand, then he can expect to have F_{n−1} after
the nth hand. That is, E(F_n | F_{n−1}) = F_{n−1}. (Here E(F_n | F_{n−1})
denotes a conditional expectation, which we explain below.) This is closer to
the martingale condition, which says that
E(F_n | F_{n−1}, F_{n−2}, . . . , F_0) = F_{n−1}.
O Romeo, Romeo! wherefore art thou Romeo?
Deny thy father and refuse thy name;
Or, if thou wilt not, be but sworn my love,
And I’ll no longer be a Capulet.
-Juliet
Now let me describe a very pretty application of the notion of martingale
to a simple problem. Everyone has come across the theorem in a basic
probability course that a monkey typing randomly at a typewriter, with a
probability of 1/26 of typing any given letter (we will always ignore spaces,
capitals and punctuation) and typing one letter per second, will eventually
type the complete works of Shakespeare with probability 1. But how long will
it take? More particularly, let's consider the question of how long it will
take just to type: "O Romeo, Romeo". Everyone recognizes this as Juliet's
line in Act 2, scene 2 of Shakespeare's Romeo and Juliet. (Benvolio also says
this in Act 3, scene 1.) So the question, more mathematically, is: what is
the expected time for a monkey to type "O Romeo Romeo", ignoring spaces and
commas? This seems perhaps pretty hard. So let's consider a couple of
similar but simpler questions, and also get our mathematical juices going.
1. Flipping a fair coin over and over, what is the expected number of flips
before getting a Head and then a Tail: HT?

2. Flipping a fair coin over and over, what is the expected number of flips
before getting two Heads in a row: HH?
We can actually calculate these numbers by hand. Let T_1 denote the random
variable giving the time until HT and T_2 the time until HH. Then we want to
calculate E(T_1) and E(T_2). If T denotes one of these RV's (random
variables) then

E(T) = Σ_{n=1}^∞ n P(T = n).
We then have

P(T_1 = 1) = 0
P(T_1 = 2) = P({HT}) = 1/4
P(T_1 = 3) = P({HHT, THT}) = 2/8
P(T_1 = 4) = P({HHHT, THHT, TTHT}) = 3/16
P(T_1 = 5) = P({HHHHT, THHHT, TTHHT, TTTHT}) = 4/32
P(T_1 = 6) = P({HHHHHT, THHHHT, TTHHHT, TTTHHT, TTTTHT}) = 5/64        (3.1.1)
We can see the pattern P(T_1 = n) = (n − 1)/2^n, and so

E(T_1) = Σ_{n=2}^∞ n(n − 1)/2^n.

To sum this, let's write down the "generating function"

f(t) = Σ_{n=1}^∞ n(n − 1) t^n

so that E(T_1) = f(1/2). Being able to calculate with such generating
functions is a useful skill. First note that the term n(n − 1) is what would
occur in the general term after differentiating a power series twice. On the
other hand, this term sits in front of t^n instead of t^{n−2}, so we need to
multiply the second derivative by t^2. At this point, we make some guesses
and then adjustments to see that

f(t) = t^2 (Σ_{n=0}^∞ t^n)′′

so it is easy to see that

f(t) = 2t^2/(1 − t)^3

and that E(T_1) = f(1/2) = 4.
So far pretty easy and no surprises. Let's do E(T_2). Here

P(T_2 = 1) = 0
P(T_2 = 2) = P({HH}) = 1/4
P(T_2 = 3) = P({THH}) = 1/8
P(T_2 = 4) = P({T-THH, H-THH}) = 2/16
P(T_2 = 5) = P({TT-THH, HT-THH, TH-THH}) = 3/32
P(T_2 = 6) = P({TTT-THH, TTH-THH, THT-THH, HTT-THH, HTH-THH}) = 5/64
P(T_2 = 7) = P({TTTT-THH, TTTH-THH, TTHT-THH, THTT-THH, HTTT-THH,
               THTH-THH, HTHT-THH, HTTH-THH}) = 8/128        (3.1.2)
Again the pattern is easy to see: P(T_2 = n) = f_{n−1}/2^n, where f_n denotes
the nth Fibonacci number (prove it!). (Recall that the Fibonacci numbers are
defined by the recursive relationship f_1 = 1, f_2 = 1 and
f_n = f_{n−1} + f_{n−2}.) So

E(T_2) = Σ_{n=2}^∞ n f_{n−1}/2^n.
Writing g(t) = Σ_{n=1}^∞ n f_{n−1} t^n, we calculate in a similar manner (but
more complicated) that

g(t) = (2t^2 − t^3)/(1 − t − t^2)^2

and that E(T_2) = g(1/2) = 6. A little surprising perhaps.
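Both answers can be checked by summing the distributions we derived, truncating the (rapidly converging) series; a sketch:

```python
from fractions import Fraction

def expected_T1(terms=200):
    # P(T1 = n) = (n - 1)/2^n, so E(T1) = sum of n(n-1)/2^n
    return float(sum(Fraction(n * (n - 1), 2**n) for n in range(2, terms)))

def expected_T2(terms=200):
    # P(T2 = n) = f_{n-1}/2^n with Fibonacci numbers f_1 = f_2 = 1
    f = [0, 1, 1]
    for n in range(3, terms):
        f.append(f[-1] + f[-2])
    return float(sum(Fraction(n * f[n - 1], 2**n) for n in range(2, terms)))

print(expected_T1())   # 4.0 (up to a negligible truncation error)
print(expected_T2())   # 6.0 (up to a negligible truncation error)
```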
What is accounting for the fact that HH takes longer? A simple answer is
that if we are awaiting the target string (HT or HH), and we succeed in
getting the first letter at time n but then fail at time n + 1 to get the
second letter, the earliest we can succeed after that is n + 2 in the case of
HT but n + 3 in the case of HH.
The same kind of analysis could in theory be done for ”OROMEOROMEO”
but it would probably be hopelessly complicated.
To approach "OROMEOROMEO", let's turn it into a game. Let T denote the
random variable giving the time until "OROMEOROMEO" is typed. I walk into a
casino, and sitting at a table, dressed in the casino's standard uniform, is
a monkey in front of a typewriter. People are standing around the table
making bets on the letters of the alphabet. Then the monkey strikes a key at
random (it is a fair monkey) and the people win or lose according to whether
their bet was on the letter struck. If their letter came up, they are paid
off at a rate of 26 to 1. The game then continues. For simplicity, assume
that a round of monkey typing takes exactly one minute. Clearly the odds and
the pay-off are set up to yield a "fair game": this is an example of a
martingale. I know in my mathematical bones that I cannot expect to do
better on average than to break even. That is:
Eventually this will be a theorem in martingale theory (and not a very hard
one at that). Now a group of gamblers conspires on the following betting
scheme. On the first play, the first bettor walks up to the table and bets
one Italian lira (monkey typing is illegal in the United States), ITL 1, that
the first letter is an "O". If he wins, he gets ITL 26 and bets all ITL 26 on
the second play that the second letter is an "R". If he wins, he bets all
ITL 26^2 on the third play that the letter is an "O". Of course, if at some
point he loses, all the money is confiscated and he walks away with a net
loss of the initial ITL 1. Now, in addition to starting the above betting
sequence on the first play, the second bettor begins the same betting
sequence starting with the second play; that is, regardless of what has
happened on the first play, she also bets ITL 1 that the second play will be
an "O", etc. One after another, the bettors (there are a lot of them)
initiate a new betting sequence at each play. If at some point one of the
bettors gets through "OROMEOROMEO", the game ends, and the conspirators walk
away with their winnings and losses, which they share. Then they calculate
their total wins and losses.
Again, because the game is fair, the conspirators' expected winnings are 0.
On the other hand, let's calculate the amount they've won once
"OROMEOROMEO" has been achieved. Because they got through, the betting
sequence beginning at the start of "OROMEOROMEO" has yielded ITL 26^11. In
addition to this, though, the sequence starting with the final "O" of the
first "ROMEO" and continuing through the end has gone undefeated and yields
ITL 26^6. And finally, the final "O" earns an additional ITL 26. On the other
hand, at each play, the ITL 1 betting sequence initiated at that play has
been defeated, except for those enumerated above. Each of these then counts
as an ITL 1 loss. How many such losses are there? Well, one for each play of
monkey typing. Therefore, their net earnings will be

26^11 + 26^6 + 26 − T.

Since the expected value of this is 0, it means that the expected value
E(T) = 26^11 + 26^6 + 26. A fair amount of money, even in lira.
Try this for the ”HH” problem above to see more clearly how this works.
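The bookkeeping in the argument (one surviving betting sequence for each suffix of the target that is also a prefix of it) is mechanical enough to code; a sketch, with a function name of my own:

```python
def expected_wait(pattern, alphabet_size):
    """Expected waiting time for `pattern` in i.i.d. uniform letters, via the
    fair-game argument: add alphabet_size**k for every suffix of length k
    that is also a prefix of the pattern (the undefeated betting sequences)."""
    return sum(alphabet_size ** k
               for k in range(1, len(pattern) + 1)
               if pattern[-k:] == pattern[:k])

print(expected_wait("HT", 2))             # 4
print(expected_wait("HH", 2))             # 6
print(expected_wait("OROMEOROMEO", 26))   # 26**11 + 26**6 + 26
```

On "HH" the surviving sequences are the full string and the final "H", giving 4 + 2 = 6, in agreement with the generating-function computation above.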
3.2 Introducing stochastic processes
A stochastic process which is discrete in both space and time is a sequence
of random variables defined on the same sample space (that is, they are
jointly distributed), {X_0, X_1, X_2, . . .}, such that the totality of
values of all the X_n's combined forms a discrete set. The different values
that the X_n's can take are called the states of the stochastic process. The
subscript n in X_n is to be thought of as time.
We are interested in understanding the evolution of such a process. One way
to describe such a thing is to consider how X_n depends on
X_0, X_1, . . . , X_{n−1}.
Example 3.2.1. Let W_n denote the position of a random walker. That is, at
time 0, W_0 = 0, say, and at each moment in time the random walker has a
probability p of taking a step to the left and a probability 1 − p of taking
a step to the right. Thus, if at time n, W_n = k, then at time n + 1 there is
probability p of being at k − 1 and probability 1 − p of being at k + 1. We
express this mathematically by saying

P(W_{n+1} = j | W_n = k) = { p       if j = k − 1
                           { 1 − p   if j = k + 1
                           { 0       otherwise        (3.2.1)
In general, for a stochastic process X_n we might specify

P(X_{n+1} = j | X_n = k_n, X_{n−1} = k_{n−1}, . . . , X_0 = k_0).

If, as in the case of a random walk,

P(X_{n+1} = j | X_n = k_n, X_{n−1} = k_{n−1}, . . . , X_0 = k_0) = P(X_{n+1} = j | X_n = k_n),
the process is called a Markov process. We write

p_{jk} = P(X_{n+1} = k | X_n = j).

The numbers p_{jk} form a matrix (an infinite-by-infinite matrix if the
random variables X_n take infinitely many values), called the matrix of
transition probabilities. They satisfy (merely by virtue of being
probabilities):

1. p_{jk} ≥ 0;

2. Σ_k p_{jk} = 1 for each j.
The matrix p_{jk} as we've defined it tells us how the probabilities change
from the nth to the (n + 1)st step. So p_{jk} a priori depends on n. If it
doesn't, then the process is called a stationary Markov process. Let's
reiterate: p_{jk} represents the probability of going from state j to state k
in one time step. Thus, if X_n is a stationary Markov process, once we know
the p_{jk}, the only information left to know is the initial probability
distribution of X_0, that is, p_0(k) = P(X_0 = k). The basic question is:
given p_0 and p_{jk}, what is the probability distribution of X_n? Towards
this end we introduce the "higher" transition probabilities

p^{(n)}_{jk} = P(X_n = k | X_0 = j),

which by stationarity also equals P(X_{m+n} = k | X_m = j), the probability
of going from j to k in n steps.
Proposition 3.2.1.

p^{(n)}_{jk} = Σ_h p^{(n−1)}_{jh} p_{hk}

Proof:

p^{(n)}_{jk} = P(X_n = k | X_0 = j).

Now, a process that goes from state j to state k in n steps must land
somewhere on the (n − 1)st step. Suppose it happens to land on the state h.
Then the probability

P(X_n = k | X_{n−1} = h) = P(X_1 = k | X_0 = h) = p_{hk}

by stationarity. Now the probability of going from state j to state h in
n − 1 steps and then going from state h to state k in one step is

P(X_{n−1} = h | X_0 = j) P(X_n = k | X_{n−1} = h) = p^{(n−1)}_{jh} p_{hk}.

Now the proposition follows by summing up over all the states one could have
landed upon on the (n − 1)st step. Q.E.D.
It follows by repeated application of the preceding proposition that

p^{(n+m)}_{jk} = Σ_h p^{(n)}_{jh} p^{(m)}_{hk}.

This is called the Chapman-Kolmogorov equation. Expressed in terms of matrix
multiplication, it just says that if P = (p_{jk}) denotes the matrix of
transition probabilities from one step to the next, then P^n, the matrix
raised to the nth power, represents the higher transition probabilities, and
of course P^{n+m} = P^n P^m.
Now we can answer the basic question posed above. If P(X_0 = j) = p_0(j) is
the initial probability distribution of a stationary Markov process with
transition matrix P = (p_{jk}), then

P(X_n = k) = Σ_j p_0(j) p^{(n)}_{jk}.
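These formulas are just matrix algebra, so they are easy to check numerically; a sketch with a made-up 3-state chain (numpy):

```python
import numpy as np

# A made-up stationary Markov chain on 3 states; each row sums to 1.
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 0.4, 0.6]])

# Chapman-Kolmogorov: P^(n+m) = P^n P^m
n, m = 3, 4
lhs = np.linalg.matrix_power(P, n + m)
rhs = np.linalg.matrix_power(P, n) @ np.linalg.matrix_power(P, m)
print(np.allclose(lhs, rhs))          # True

# Distribution of X_n: the row vector p_0 times P^n
p0 = np.array([1.0, 0.0, 0.0])        # start surely in state 0
pn = p0 @ np.linalg.matrix_power(P, 5)
print(pn, pn.sum())                   # a probability vector summing to 1
```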
3.2.1 Generating functions
Before getting down to some examples, I want to introduce a technique for
doing calculations with random variables. Let X be a discrete random variable
whose states are integers j, with distribution p(k). We introduce its
generating function

F_X(t) = Σ_k p(k) t^k.

At this point we do not really pay attention to the convergence of F_X; we
will do formal manipulations with F_X, and therefore F_X is called a formal
power series (or, more properly, a formal Laurent series, since it can have
negative powers of t). Here are some of the basic facts about the generating
function of X:

1. F_X(1) = 1, since this is just the sum over all the states of the
probabilities, which must be 1.

2. E(X) = F′_X(1), since F′_X(t) = Σ_k k p(k) t^{k−1}; now evaluate at 1.

3. Var(X) = F″_X(1) + F′_X(1) − (F′_X(1))^2. This is an easy calculation as
well.

4. If X and Y are independent random variables, then
F_{X+Y}(t) = F_X(t) F_Y(t).
Before going on, let’s use these facts to do some computations.
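For instance, here is a sketch (using sympy) verifying facts 1-3 for a fair die, whose generating function is Σ_{k=1}^6 (1/6) t^k:

```python
import sympy as sp

t = sp.symbols('t')
# Generating function of a fair die: states 1..6, each with probability 1/6.
F = sum(sp.Rational(1, 6) * t**k for k in range(1, 7))

print(F.subs(t, 1))                          # 1       (fact 1)
E = sp.diff(F, t).subs(t, 1)
print(E)                                     # 7/2     (fact 2)
Var = sp.diff(F, t, 2).subs(t, 1) + E - E**2
print(Var)                                   # 35/12   (fact 3)
```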
Example 3.2.2. Let X be a random variable representing a Bernoulli trial,
that is, P(X = −1) = p, P(X = 1) = 1 − p and P(X = k) = 0 otherwise. Thus
F_X = p t^{−1} + (1 − p)t. Now a random walk can be described by

W_n = X_1 + · · · + X_n

where each X_k is an independent copy of X: X_k represents taking a step to
the left or right at the kth step. Now, according to fact 4 above, since the
X_k are mutually independent we have

F_{W_n} = F_{X_1} F_{X_2} · · · F_{X_n} = (F_X)^n = (p t^{−1} + (1 − p)t)^n
        = Σ_{j=0}^n (n choose j) (p t^{−1})^j ((1 − p)t)^{n−j}
        = Σ_{j=0}^n (n choose j) p^j (1 − p)^{n−j} t^{n−2j}.
Hence the distribution for the random walk W_n can be read off as the
coefficients of the generating function. So
P(W_n = n − 2j) = (n choose j) p^j (1 − p)^{n−j} for j = 0, 1, 2, . . . , n,
and is zero for all other values. (Note: this is our third time calculating
this distribution.) Also, E(W_n) = F′_{W_n}(1). Since

F′_{W_n}(t) = d/dt (p t^{−1} + (1 − p)t)^n = n(p t^{−1} + (1 − p)t)^{n−1}((1 − p) − p t^{−2}),

evaluating at 1 gives n(p + (1 − p))((1 − p) − p) = n((1 − p) − p). Similarly,

F″_{W_n}(t) = n(n − 1)(p t^{−1} + (1 − p)t)^{n−2}((1 − p) − p t^{−2})^2 + n(p t^{−1} + (1 − p)t)^{n−1}(2p t^{−3}),

and evaluating at 1 gives n(n − 1)((1 − p) − p)^2 + 2np. So

Var(W_n) = F″_{W_n}(1) + F′_{W_n}(1) − F′_{W_n}(1)^2
         = n(n − 1)((1 − p) − p)^2 + 2np + n((1 − p) − p) − n^2((1 − p) − p)^2
         = −n((1 − p) − p)^2 + 2np + n((1 − p) − p)
         = n((1 − p) − p)(1 − ((1 − p) − p)) + 2np
         = 2pn((1 − p) − p) + 2np
         = 2np(1 − p) − 2np^2 + 2np = 2np(1 − p) + 2np(1 − p) = 4np(1 − p).
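These derivative computations are error-prone by hand; a symbolic sketch (sympy) confirms them:

```python
import sympy as sp

t, p, n = sp.symbols('t p n', positive=True)
F = (p / t + (1 - p) * t)**n        # F_{W_n}(t) = (p t^-1 + (1-p) t)^n

E = sp.simplify(sp.diff(F, t).subs(t, 1))
Var = sp.simplify(sp.diff(F, t, 2).subs(t, 1) + E - E**2)

print(E)                                 # n(1 - 2p), i.e. n((1-p) - p)
print(sp.simplify(Var - 4*n*p*(1 - p)))  # 0: Var(W_n) = 4np(1-p)
```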
Consider the following experiment: roll two dice and read off the sum on
them; then flip a coin that number of times and count the number of heads.
This describes a random variable X. What is its distribution? We can
describe X mathematically as follows. Let N denote the random variable of
the sum on two dice. So, according to a calculation in a previous set of
notes, P(N = n) = (1/36)(6 − |7 − n|) for n = 2, 3, . . . , 12, and is 0
otherwise. Also, let Y count the number of heads in one toss, so that
P(Y = k) = 1/2 if k = 0 or 1, and is 0 otherwise. Hence our random variable
X can be written as

X = Σ_{j=1}^N Y_j        (3.2.2)

where the Y_j are independent copies of Y; note that the upper limit of the
sum is the random variable N. This is an example of a composite random
variable. That is, in general, a composite random variable is one that can
be expressed by (3.2.2), where N and Y are general random variables. The
generating function of such a composite is easy to calculate.
Proposition 3.2.2. Let X be the composite of two random variables N and Y. Then F_X(t) = F_N(F_Y(t)), that is, F_X = F_N ∘ F_Y.

Proof:

P(X = k) = Σ_{n=0}^{∞} P(N = n) P(Σ_{j=1}^{n} Y_j = k)

Now F_{Σ_{j=1}^{n} Y_j}(t) = (F_Y(t))^n since the Y_j's are mutually independent. Therefore

F_X(t) = Σ_{n=0}^{∞} P(N = n)(F_Y(t))^n = F_N(F_Y(t))

Q.E.D.
So in our example above of rolling dice and then flipping coins, F_N(t) = Σ_{n=2}^{12} (1/36)(6 − |7 − n|) t^n and F_Y(t) = 1/2 + (1/2)t. Therefore F_X(t) = Σ_{n=2}^{12} (1/36)(6 − |7 − n|)(1/2 + (1/2)t)^n, from which one can read off the distribution of X.
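Reading off that distribution amounts to expanding a polynomial composition, which is easy to do exactly (a minimal sketch of the dice-then-coins example above):

```python
from fractions import Fraction
from math import comb

# P(N = n) for the sum of two dice: (1/36)(6 - |7 - n|).
pN = {n: Fraction(6 - abs(7 - n), 36) for n in range(2, 13)}

# F_X(t) = F_N(F_Y(t)) with F_Y(t) = 1/2 + t/2; the coefficient of t^k
# in (1/2 + t/2)^n is C(n, k) / 2^n.
pX = {}
for n, pn in pN.items():
    for k in range(n + 1):
        pX[k] = pX.get(k, 0) + pn * Fraction(comb(n, k), 2**n)

assert sum(pX.values()) == 1
# Sanity check via the chain rule: E(X) = F_N'(1) F_Y'(1) = E(N) E(Y) = 7 * 1/2.
assert sum(k * q for k, q in pX.items()) == Fraction(7, 2)
```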
The above sort of compound random variables appear naturally in the context of branching processes. Suppose we try to model a population (of amoebae, of humans, of subatomic particles) in the following way. We denote the number of members in the nth generation by X_n. We start with a single individual in the zeroth generation, so X_0 = 1. This individual splits into (or becomes) X_1 individuals (offspring) in the first generation. The number of offspring X_1 is a random variable with a certain distribution P(X_1 = k) = p(k): p(0) is the probability that X_0 dies, p(1) is the probability that X_0 has one descendant, p(2) that it has two descendants, etc. Now in the first generation we have X_1 individuals, each of which will produce offspring according to the same probability distribution as its parent. Thus in the second generation,

X_2 = Σ_{j=1}^{X_1} X_{1,j}

where the X_{1,j} are independent copies of X_1. That is, X_2 is the compound random variable of X_1 and itself. Continuing on, we ask that in the nth generation, each individual produce offspring according to the same distribution as its ancestor X_1. Thus

X_n = Σ_{j=1}^{X_{n−1}} X_{n−1,j}

where the X_{n−1,j} are independent copies of X_1.
Using generating functions we can calculate the distribution of X_n. So we start with F_{X_1}(t). Then by the formula for the generating function of a compound random variable,

F_{X_2}(t) = F_{X_1}(F_{X_1}(t))

and more generally,

F_{X_n} = F_{X_1} ∘ ··· ∘ F_{X_1}   (n times)

Without generating functions, this expression would be quite difficult.
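Iterating the generating function is also easy to do numerically. A minimal sketch (the offspring distribution p(0) = 1/4, p(1) = 1/4, p(2) = 1/2 is an arbitrary choice); in particular, the extinction probability P(X_n = 0) is F_{X_1} composed with itself n times, evaluated at 0:

```python
def F(t, p=(0.25, 0.25, 0.5)):
    """Offspring generating function F_{X_1}(t) = sum_k p(k) t^k."""
    return sum(pk * t**k for k, pk in enumerate(p))

def extinction_by_generation(n):
    """P(X_n = 0) = (F_{X_1} o ... o F_{X_1})(0), composed n times."""
    t = 0.0
    for _ in range(n):
        t = F(t)
    return t

assert extinction_by_generation(1) == 0.25          # P(X_1 = 0) = p(0)
assert extinction_by_generation(2) < extinction_by_generation(10)
# Mean offspring is 0.25 + 2*0.5 = 1.25 > 1, so the extinction probability
# tends to the smaller root q = 1/2 of F(q) = q, not to 1.
assert abs(extinction_by_generation(100) - 0.5) < 1e-9
```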
3.3 Conditional expectation

We now introduce a very subtle concept of utmost importance to us: conditional expectation. We will begin by defining it in its simplest incarnation: the expectation of a random variable conditioned on an event. Given an event A, define its indicator to be the random variable

1_A(ω) = 1 if ω ∈ A, and 0 if ω ∉ A.

Define

E(X|A) = E(X 1_A) / P(A).

In the discrete case, this equals Σ_x x P(X = x|A), and in the continuous case ∫ x p(x|A) dx.
Some basic properties of conditional expectation are:

Proposition 3.3.1. (Properties of conditional expectation)

1. E(X + Y|A) = E(X|A) + E(Y|A).

2. If X is a constant c on A, then E(XY|A) = cE(Y|A).

3. If B = ∪_{i=1}^{n} A_i is a disjoint union, then

E(X|B) = Σ_{i=1}^{n} E(X|A_i) P(A_i)/P(B)

(Jensen's inequality) A function f is convex if for all t ∈ [0, 1] and x < y, f satisfies the inequality

f((1 − t)x + ty) ≤ (1 − t)f(x) + tf(y)

Then for f convex, E(f(X)) ≥ f(E(X)) and E(f(X)|A) ≥ f(E(X|A)).
If X and Y are two random variables defined on the same sample space, then we can talk of the particular kind of event {Y = j}. Hence it makes sense to talk of E(X|Y = j). There is nothing really new here. But a more useful way to think of the conditional expectation of X conditioned on Y is as a random variable. As j runs over the values of the variable Y, E(X|Y = j) takes on different values. These are the values of a new random variable. Thus we make the following

Definition 3.3.1. Let X and Y be two random variables defined on the same sample space. Define a random variable, denoted E(X|Y) and called the expectation of X conditioned on Y, whose values are E(X|Y = j) as j runs over the values of Y. Finally,

P[E(X|Y) = e] = Σ_{{j | E(X|Y=j) = e}} P[Y = j]

More generally, if Y_0, Y_1, ..., Y_n are a collection of random variables, we may define E(X|Y_0 = y_0, Y_1 = y_1, ..., Y_n = y_n) and E(X|Y_0, Y_1, ..., Y_n).
Some basic properties of the random variable E(X|Y) are listed in the following proposition.

Proposition 3.3.2. Let X, Y be two random variables defined on the same sample space. Then

1. (Tower property) E(E(X|Y)) = E(X).

2. If X, Y are independent, then E(X|Y) = E(X).

3. Letting Z = f(Y), then E(ZX|Y) = ZE(X|Y).
Proof:

1. Recall that

P(E(X|Y) = e) = Σ_{{j | E(X|Y=j) = e}} P(Y = j)

Then

E(E(X|Y)) = Σ_e e P(E(X|Y) = e)
= Σ_j E(X|Y = j) P(Y = j)
= Σ_j (Σ_k k P(X = k|Y = j)) P(Y = j)
= Σ_j Σ_k k (P((X = k) ∩ (Y = j))/P(Y = j)) P(Y = j)
= Σ_j Σ_k k P((X = k) ∩ (Y = j)) = Σ_k Σ_j k P((X = k) ∩ (Y = j))
= Σ_k k P((X = k) ∩ (∪_j {Y = j})) = Σ_k k P(X = k) = E(X)   (3.3.1)
2. If X, Y are independent, then

E(X|Y = j) = Σ_k k P(X = k|Y = j) = Σ_k k P(X = k) = E(X).   (3.3.2)

This is true for all j, so the conditional expectation is a constant.
3. We have

E(ZX|Y = j) = Σ_{k,l} kl P((Z = k) ∩ (X = l)|Y = j)
= Σ_{k,l} kl P((f(Y) = k) ∩ (X = l)|Y = j)
= f(j) Σ_l l P(X = l|Y = j) = Z E(X|Y = j).   (3.3.3)

Q.E.D.
We also have an analogous statement for conditioning on a set of variables.

Proposition 3.3.3. Let X, Y_0, ..., Y_n be random variables defined on the same sample space. Then

1. (Tower property) E(E(X|Y_0, ..., Y_n)) = E(X).

2. If X is independent of Y_0, ..., Y_n, then E(X|Y_0, ..., Y_n) = E(X).

3. Letting Z = f(Y_0, ..., Y_n), then E(ZX|Y_0, ..., Y_n) = ZE(X|Y_0, ..., Y_n).
Proof: See exercises.
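The random variable E(X|Y) and the tower property can be checked concretely on a small finite sample space (a minimal sketch; the joint distribution below is an arbitrary choice):

```python
from fractions import Fraction

# An arbitrary joint distribution P(X = x, Y = y) on a finite sample space.
joint = {(1, 0): Fraction(1, 6), (2, 0): Fraction(1, 3),
         (1, 1): Fraction(1, 4), (3, 1): Fraction(1, 4)}

pY = {}
for (x, y), p in joint.items():
    pY[y] = pY.get(y, 0) + p

def cond_exp_X_given_Y(y):
    """E(X | Y = y) = sum_x x P(X = x, Y = y) / P(Y = y)."""
    return sum(x * p for (x, yy), p in joint.items() if yy == y) / pY[y]

# E(E(X|Y)) = sum_y E(X|Y = y) P(Y = y); the tower property says this is E(X).
lhs = sum(cond_exp_X_given_Y(y) * p for y, p in pY.items())
EX = sum(x * p for (x, y), p in joint.items())
assert lhs == EX
```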
3.4 Discrete time finance

Now let us return to the market. If S_n denotes the value of a security at time n, is S_n a martingale? There are two reasons why not. The first reason is the time value of money. If S_n were a martingale, no one would buy it; you might as well just put your money under your mattress. What about the discounted security process? Discounting would turn a bank account into a martingale, but any other security would not be: if the discounted stock value were a martingale, you would expect to make only as much money as in your bank account, and people would just stick their money in the bank. The second reason why the discounted stock price process is not a martingale is the non-existence of arbitrage opportunities.
Let's go back to our model of the market from the last chapter. Thus we have a vector of securities S and a pay-off matrix D = (d_{ij}), where security i pays off d_{ij} in state j. In fact, let's consider a very simple market consisting of two securities, a bond S_1 and a stock S_2. Then the value of the stock at time 1 is d_{2j} if state j occurs. Under the no-arbitrage condition, there is a state price vector ψ so that S = Dψ. Setting R = Σ_j ψ_j and q_j = ψ_j/R as in the last chapter, we have Σ_j q_j = 1, and the q_j's determine a probability Q on the set of states. Now the value of the stock at time 1 is a random variable, which we will call d_2; with respect to the probabilities Q, P(d_2 = d_{2j}) = q_j. The expected value with respect to Q of the stock at time 1 is

E_Q(d_2) = Σ_j d_{2j} q_j = (D(ψ/R))_2 = (S/R)_2 = S_2/R

In other words, E_Q(R d_2) = S_2. That is, the expected discounted stock price at time 1 is the same as the value at time 0, with respect to the probabilities Q which were arrived at by the no-arbitrage assumption. This is the first step to finding Q such that the discounted stock prices are martingales. This is the basic phenomenon that we will take advantage of in the rest of this chapter to value derivatives.

We've seen that the no-arbitrage hypothesis led to the existence of a state price vector, or, what is more or less the same thing, to a probability on the set of states such that the expected value of the discounted price of the stock at time 1 is the same as the stock price at time 0. The arbitrage theorem of the last chapter only gives us an existence theorem. It does not tell us how to find Q in actual practice. For this we will simplify our market drastically and specify a more definite model for stock movements.
Our market will consist of a single stock S and a bond B. Let us change notation a little. Now, at time 0, the stock is worth S_0 and the bond is worth B_0. At time 1, there will be only two states of nature, u (up) or d (down). That is, at time 1·δt, the stock will either go up to some value S_1^u or down to S_1^d, while the bond goes to e^{rδt}B_0:

S_0 → S_1^u (state u) or S_1^d (state d);   B_0 → e^{rδt}B_0   (3.4.1)

According to this picture, we need to find a probability Q such that E_Q(e^{−rδt}S_1) = S_0.
Let q be the probability that the stock goes up. Then we want

S_0 = E_Q(e^{−rδt}S_1) = q e^{−rδt}S_1^u + (1 − q) e^{−rδt}S_1^d

Thus we need

q(e^{−rδt}S_1^u − e^{−rδt}S_1^d) + e^{−rδt}S_1^d = S_0

So

q = (e^{rδt}S_0 − S_1^d)/(S_1^u − S_1^d),   1 − q = (S_1^u − e^{rδt}S_0)/(S_1^u − S_1^d)   (3.4.2)
If h is a contingency claim on S, then according to our formula of last chapter, the value of the derivative is

V(h) = R E_Q(h(S_1))
= e^{−rδt}(q h(S_1^u) + (1 − q) h(S_1^d))
= e^{−rδt}[ h(S_1^u) (e^{rδt}S_0 − S_1^d)/(S_1^u − S_1^d) + h(S_1^d) (S_1^u − e^{rδt}S_0)/(S_1^u − S_1^d) ].
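In code, the one-period pricing formula is a few lines (a minimal sketch; the numbers S_0 = 100, S_1^u = 120, S_1^d = 90, r = 0.05, δt = 1 and the call payoff with strike 100 are all arbitrary choices):

```python
import math

def one_period_price(S0, Su, Sd, r, dt, h):
    """V(h) = e^{-r dt} (q h(Su) + (1 - q) h(Sd)),
    with q = (e^{r dt} S0 - Sd) / (Su - Sd) as in (3.4.2)."""
    q = (math.exp(r * dt) * S0 - Sd) / (Su - Sd)
    assert 0 <= q <= 1, "no-arbitrage requires Sd <= e^{r dt} S0 <= Su"
    return math.exp(-r * dt) * (q * h(Su) + (1 - q) * h(Sd))

# A call struck at 100 on the example market.
call = one_period_price(100.0, 120.0, 90.0, 0.05, 1.0, h=lambda s: max(s - 100.0, 0.0))
assert 0 < call < 20

# Pricing the stock itself gives back S0: the discounted stock is a
# Q-martingale by construction of q.
assert abs(one_period_price(100.0, 120.0, 90.0, 0.05, 1.0, h=lambda s: s) - 100.0) < 1e-9
```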
Let's rederive this based on an ab initio arbitrage argument for h. Let's try to construct a replicating portfolio for h, that is, a portfolio consisting of the stock and bond so that no matter whether the stock goes up or down, the portfolio is worth the same as the derivative. Thus, suppose at time 0 we set up a portfolio

θ = (φ, ψ)   (3.4.3)

of φ units of stock S_0 and ψ units of bond B_0. At time δt, this portfolio is worth φS_1^u + ψe^{rδt}B_0 if the state is u, or φS_1^d + ψe^{rδt}B_0 if the state is d. We want this value to be the same as h(S_1). So we solve for φ and ψ:

φS_1^u + ψe^{rδt}B_0 = h(S_1^u)
φS_1^d + ψe^{rδt}B_0 = h(S_1^d)   (3.4.4)

Subtracting,

φ(S_1^u − S_1^d) = h(S_1^u) − h(S_1^d).
So we find

φ = (h(S_1^u) − h(S_1^d))/(S_1^u − S_1^d)

while

ψ = (h(S_1^u) − φS_1^u)/(e^{rδt}B_0)
= [h(S_1^u)(S_1^u − S_1^d) − (h(S_1^u) − h(S_1^d))S_1^u]/(e^{rδt}B_0(S_1^u − S_1^d))
= [h(S_1^u)S_1^u − h(S_1^u)S_1^d − h(S_1^u)S_1^u + h(S_1^d)S_1^u]/(e^{rδt}B_0(S_1^u − S_1^d))
= (h(S_1^d)S_1^u − h(S_1^u)S_1^d)/(e^{rδt}B_0(S_1^u − S_1^d))
Thus, the value of the portfolio at time 0 is

φS_0 + ψB_0 = [(h(S_1^u) − h(S_1^d))/(S_1^u − S_1^d)] S_0 + [(h(S_1^d)S_1^u − h(S_1^u)S_1^d)/(e^{rδt}B_0(S_1^u − S_1^d))] B_0
= [e^{rδt}(h(S_1^u) − h(S_1^d))/(e^{rδt}(S_1^u − S_1^d))] S_0 + (h(S_1^d)S_1^u − h(S_1^u)S_1^d)/(e^{rδt}(S_1^u − S_1^d))
= e^{−rδt}[ h(S_1^u)(e^{rδt}S_0 − S_1^d)/(S_1^u − S_1^d) + h(S_1^d)(S_1^u − e^{rδt}S_0)/(S_1^u − S_1^d) ].   (3.4.5)

Since the value of this portfolio and the derivative are the same at time 1, by the no-arbitrage hypothesis they must be equal at time 0. So we've arrived at the value of the derivative by a somewhat different argument.
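A quick numerical check that the replicating portfolio really matches the claim in both states and costs exactly the risk-neutral price at time 0 (a minimal sketch; all numbers and the call payoff are arbitrary choices):

```python
import math

S0, Su, Sd, r, dt, B0 = 100.0, 120.0, 90.0, 0.05, 1.0, 1.0
h = lambda s: max(s - 100.0, 0.0)   # a call struck at 100 (arbitrary choice)

# Replicating portfolio: phi units of stock, psi units of bond.
phi = (h(Su) - h(Sd)) / (Su - Sd)
psi = (h(Sd) * Su - h(Su) * Sd) / (math.exp(r * dt) * B0 * (Su - Sd))

# It matches the claim in both states at time 1 ...
assert abs(phi * Su + psi * math.exp(r * dt) * B0 - h(Su)) < 1e-9
assert abs(phi * Sd + psi * math.exp(r * dt) * B0 - h(Sd)) < 1e-9

# ... so its time-0 cost equals the risk-neutral value e^{-r dt} E_Q(h(S_1)).
q = (math.exp(r * dt) * S0 - Sd) / (Su - Sd)
V = math.exp(-r * dt) * (q * h(Su) + (1 - q) * h(Sd))
assert abs(phi * S0 + psi * B0 - V) < 1e-9
```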
Now we move on to a multi-period model. We are interested in defining a reasonably flexible and concrete model for the movement of stocks. We now want to divide the time period from now, time 0, to some horizon T, which will often be the expiration time for a derivative, into N periods, each of length δt, so that Nδt = T.

One step, redone with δt:

S_0 → S_{δt}^u with probability q, or S_{δt}^d with probability 1 − q   (3.4.6)

q = (S_0 e^{rδt} − S_{δt}^d)/(S_{δt}^u − S_{δt}^d),   1 − q = (S_{δt}^u − S_0 e^{rδt})/(S_{δt}^u − S_{δt}^d)

Then for a contingency claim h we have V(h) = e^{−rδt}E_Q(h).
Two steps: the tree (3.4.7) now has two ticks. Starting from S_0, the stock moves up to S_{1δt}^u (with probability q_0) or down to S_{1δt}^d (with probability 1 − q_0); from S_{1δt}^u it moves up to S_{2δt}^{uu} (probability q_1^u) or down to S_{2δt}^{ud} (probability 1 − q_1^u), and from S_{1δt}^d it moves up to S_{2δt}^{du} (probability q_1^d) or down to S_{2δt}^{dd} (probability 1 − q_1^d).   (3.4.7)

The one-step formula at each node gives

q_1^d = (e^{rδt}S_{1δt}^d − S_{2δt}^{dd})/(S_{2δt}^{du} − S_{2δt}^{dd}),   1 − q_1^d = (S_{2δt}^{du} − e^{rδt}S_{1δt}^d)/(S_{2δt}^{du} − S_{2δt}^{dd})

q_1^u = (e^{rδt}S_{1δt}^u − S_{2δt}^{ud})/(S_{2δt}^{uu} − S_{2δt}^{ud}),   1 − q_1^u = (S_{2δt}^{uu} − e^{rδt}S_{1δt}^u)/(S_{2δt}^{uu} − S_{2δt}^{ud})

q_0 = (e^{rδt}S_0 − S_{1δt}^d)/(S_{1δt}^u − S_{1δt}^d),   1 − q_0 = (S_{1δt}^u − e^{rδt}S_0)/(S_{1δt}^u − S_{1δt}^d)
Suppose h is a contingency claim, and that we know h(S_{2δt}). Then we have

h(S_{1δt}^u) = e^{−rδt}(q_1^u h(S_{2δt}^{uu}) + (1 − q_1^u) h(S_{2δt}^{ud})),
h(S_{1δt}^d) = e^{−rδt}(q_1^d h(S_{2δt}^{du}) + (1 − q_1^d) h(S_{2δt}^{dd})).

Finally,

h(S_0) = e^{−rδt}(q_0 h(S_{1δt}^u) + (1 − q_0) h(S_{1δt}^d))
= e^{−rδt}(q_0 e^{−rδt}(q_1^u h(S_{2δt}^{uu}) + (1 − q_1^u)h(S_{2δt}^{ud})) + (1 − q_0) e^{−rδt}(q_1^d h(S_{2δt}^{du}) + (1 − q_1^d)h(S_{2δt}^{dd})))
= e^{−2rδt}(q_0 q_1^u h(S_{2δt}^{uu}) + q_0(1 − q_1^u)h(S_{2δt}^{ud}) + (1 − q_0)q_1^d h(S_{2δt}^{du}) + (1 − q_0)(1 − q_1^d)h(S_{2δt}^{dd})).
Let's try to organize this into an N-step tree. First, let's notice what the coefficients of h(S_{2δt}^{••}) are: q_0 q_1^u represents going up twice, q_0(1 − q_1^u) represents going up, then down, etc. That is, these coefficients are the probability of ending up at a particular node of the tree. Thus, if we define a sample space

Ψ = {uu, ud, du, dd}

with

P(uu) = q_0 q_1^u,  P(ud) = q_0(1 − q_1^u),  P(du) = (1 − q_0)q_1^d,  P(dd) = (1 − q_0)(1 − q_1^d).
We check that this is a probability on the final nodes of the two-step tree:

P(uu) + P(ud) + P(du) + P(dd)
= q_0 q_1^u + q_0(1 − q_1^u) + (1 − q_0)q_1^d + (1 − q_0)(1 − q_1^d)
= q_0(q_1^u + 1 − q_1^u) + (1 − q_0)(q_1^d + 1 − q_1^d) = q_0 + 1 − q_0 = 1.
Now h(S_2) is a random variable depending on w_0 w_1 ∈ Ψ. Hence we have

V(h) = e^{−2rδt}E_Q(h(S_2)).

So what's the general case? For N steps, the sample space is Ψ = {w_0 w_1 ... w_{N−1} | w_i = u or d}, and

P(w_0 w_1 ... w_{N−1}) = ∏_{j=0}^{N−1} P(w_j)
where

P(w_0) = q_0 if w_0 = u, and 1 − q_0 if w_0 = d,

and more generally

P(w_j) = q_j^{w_0···w_{j−1}} if w_j = u, and 1 − q_j^{w_0···w_{j−1}} if w_j = d,

where q_j^{w_0···w_{j−1}} denotes the up-probability at the node reached by the path w_0···w_{j−1}. Then

V(h) = e^{−rNδt}E_Q(h(S_N))
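The N-step formula is easy to implement by backward induction through the tree (a minimal sketch, assuming the standard recombining model where the stock is multiplied by u or d at each tick; all the numbers below are arbitrary choices):

```python
import math

def binomial_price(S0, u, d, r, dt, N, h):
    """Price a claim h(S_N) by backward induction on a recombining tree.
    At every node q = (e^{r dt} - d) / (u - d), since S -> u*S or d*S."""
    q = (math.exp(r * dt) - d) / (u - d)
    assert 0 <= q <= 1, "no-arbitrage requires d <= e^{r dt} <= u"
    disc = math.exp(-r * dt)
    # Terminal values h(S0 u^k d^(N-k)), where k counts up-moves.
    v = [h(S0 * u**k * d**(N - k)) for k in range(N + 1)]
    for _ in range(N):  # step back one tick at a time
        v = [disc * ((1 - q) * v[k] + q * v[k + 1]) for k in range(len(v) - 1)]
    return v[0]

# Pricing the stock itself recovers S0 (the discounted stock is a Q-martingale).
assert abs(binomial_price(100.0, 1.1, 0.95, 0.03, 0.1, 20, h=lambda s: s) - 100.0) < 1e-9
call = binomial_price(100.0, 1.1, 0.95, 0.03, 0.1, 20, h=lambda s: max(s - 100.0, 0.0))
assert call > 0
```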
3.5 Basic martingale theory

We will now begin our formal study of martingales.

Definition 3.5.1. Let M_n be a stochastic process. We say M_n is a martingale if

1. E(|M_n|) < ∞, and

2. E(M_{n+1} | M_n = m_n, M_{n−1} = m_{n−1}, ..., M_0 = m_0) = m_n.
Noting that E(M_{n+1} | M_n = m_n, M_{n−1} = m_{n−1}, ..., M_0 = m_0) are the values of a random variable which we called E(M_{n+1} | M_n, M_{n−1}, ..., M_0), we may write condition 2 above as E(M_{n+1} | M_n, M_{n−1}, ..., M_0) = M_n.
We also define a stochastic process M_n to be a supermartingale if

1. E(|M_n|) < ∞, and

2. E(M_{n+1} | M_n = m_n, ..., M_0 = m_0) ≤ m_n (or equivalently E(M_{n+1} | M_n, M_{n−1}, ..., M_0) ≤ M_n).

We say M_n is a submartingale if

1. E(|M_n|) < ∞, and

2. E(M_{n+1} | M_n = m_n, ..., M_0 = m_0) ≥ m_n (or equivalently E(M_{n+1} | M_n, M_{n−1}, ..., M_0) ≥ M_n).
The following theorem is one expression of the fair game aspect of martingales.

Theorem 3.5.1. If Y_n is a

1. supermartingale, then E(Y_n) ≤ E(Y_m) for n ≥ m;

2. submartingale, then E(Y_n) ≥ E(Y_m) for n ≥ m;

3. martingale, then E(Y_n) = E(Y_m) for all n, m.

Proof:

1. For n ≥ m, the tower property of conditional expectation implies

E(Y_n) = E(E(Y_n | Y_m, Y_{m−1}, ..., Y_0)) ≤ E(Y_m).

2. is similar.

3. is implied by 1. and 2. together.

Q.E.D.
It will often be useful to consider several stochastic processes at once, or to express a stochastic process in terms of other ones. But we will usually assume that there is some basic or driving stochastic process and that others are expressed in terms of this one and defined on the same sample space. Thus we make the following more general definition of a martingale.

Definition 3.5.2. (Generalization) Let X_n be a stochastic process. A stochastic process M_0, M_1, ... is a martingale relative to X_0, ..., X_n if

1. E(|M_n|) < ∞, and

2. E(M_{n+1} − M_n | X_0 = x_0, ..., X_n = x_n) = 0.
Examples

1. Let X_n = a + ξ_1 + ··· + ξ_n, where the ξ_i are independent random variables with E(|ξ_i|) < ∞ and E(ξ_i) = 0. Then X_n is a martingale. For

E(X_{n+1} | X_0 = x_0, ..., X_n = x_n)
= E(X_n + ξ_{n+1} | X_0 = x_0, ..., X_n = x_n)
= E(X_n | X_0 = x_0, ..., X_n = x_n) + E(ξ_{n+1} | X_0 = x_0, ..., X_n = x_n)

Now by the basic properties of conditional expectation (Proposition 3.3.3: the first summand is a function of X_0, ..., X_n, and ξ_{n+1} is independent of them), this is

= X_n + E(ξ_{n+1}) = X_n,

and thus X_n is a martingale. A special case of this is when the ξ_i are IID with P(ξ_i = +1) = 1/2 and P(ξ_i = −1) = 1/2: the symmetric random walk.
2. There is a multiplicative analogue of the previous example. Let X_n = ξ_1 ξ_2 ··· ξ_n, where the ξ_i are independent random variables with E(|ξ_i|) < ∞ and E(ξ_i) = 1. Then X_n is a martingale. We leave the verification of this to you.
3. Let S_n = S_0 + ξ_1 + ··· + ξ_n, with S_0 constant, the ξ_i independent random variables, E(ξ_i) = 0 and σ²(ξ_i) = σ_i². Set v_n = Σ_{i=1}^{n} σ_i². Then M_n = S_n² − v_n is a martingale relative to S_0, ..., S_n.
For

M_{n+1} − M_n = (S_{n+1}² − v_{n+1}) − (S_n² − v_n)
= (S_n + ξ_{n+1})² − S_n² − (v_{n+1} − v_n)
= 2S_n ξ_{n+1} + ξ_{n+1}² − σ_{n+1}²,   (3.5.1)

and thus

E(M_{n+1} − M_n | S_0, S_1, ..., S_n) = E(2S_n ξ_{n+1} + ξ_{n+1}² − σ_{n+1}² | S_0, ..., S_n)
= 2S_n E(ξ_{n+1}) + σ_{n+1}² − σ_{n+1}² = 0
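Example 3 can be checked by brute force on the symmetric random walk, enumerating all 2^n equally likely paths exactly (a minimal sketch; here σ_i² = 1, so v_n = n):

```python
from fractions import Fraction
from itertools import product

def expected_M(n, S0=0):
    """E(M_n) for M_n = S_n^2 - v_n on the symmetric random walk
    (sigma_i^2 = 1, so v_n = n), averaged over all 2^n paths of +-1 steps."""
    total = sum((S0 + sum(path))**2 - n for path in product([-1, 1], repeat=n))
    return Fraction(total, 2**n)

# E(M_n) is constant in n, as Theorem 3.5.1 predicts for a martingale:
# E(M_n) = E(M_0) = S0^2.
assert all(expected_M(n) == 0 for n in range(1, 9))
assert expected_M(6, S0=3) == 9
```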
4. There is a very general way to get martingales, usually ascribed to Lévy. This will be very important to us. Let X_n be a driving stochastic process and let Y be a random variable defined on the same sample space. Then we may define Y_n = E(Y | X_n, ..., X_0). Then Y_n is a martingale. For by the tower property of conditional expectation, E(Y_{n+1} | X_n, ..., X_0) is by definition

E(E(Y | X_{n+1}, ..., X_0) | X_n, ..., X_0) = E(Y | X_n, ..., X_0) = Y_n
We now introduce a very useful notation. Let F(X_0, ..., X_n) be the collection of random variables that can be written as g(X_0, ..., X_n) for some function g of n + 1 variables. More compactly, given a stochastic process X_k, we let F_n = F(X_0, ..., X_n). F_n represents all the information that is knowable about the stochastic process X_k at time n, and we call it the filtration induced by X_n. We write E(Y|F_n) for E(Y | X_0, ..., X_n). We restate Proposition (3.3.3) in this notation.
Proposition 3.5.1. Let X_0, ..., X_n, ... be a stochastic process with filtration F_n, and X a random variable defined on the same sample space. Then

1. E(E(X|F_n)|F_m) = E(X|F_m) if m ≤ n.

2. If X, X_0, ..., X_n are independent, then E(X|F_n) = E(X).

3. Letting X ∈ F_n, so that X = f(X_0, ..., X_n), we have E(XY|F_n) = XE(Y|F_n).
Definition 3.5.3. Let X_n be a stochastic process and F_n its filtration. We say a process φ_n is

1. adapted to X_n (or to F_n) if φ_n ∈ F_n for all n;

2. previsible relative to X_n (or F_n) if φ_n ∈ F_{n−1} for all n.
We motivate the definitions above and the following construction by thinking of a gambling game. Let S_n be a stochastic process representing the price of a stock at the nth tick. Suppose we are playing a game in which our winnings per unit bet in the nth round are X_n − X_{n−1}. Let φ_n denote the amount of our bet on the nth hand. It is a completely natural assumption that φ_n should only depend on the history of the game up to time n − 1. This is the previsible hypothesis. Given a bet of φ_n on the nth hand, our winnings playing this game up until time n will be

φ_1(X_1 − X_0) + φ_2(X_2 − X_1) + ··· + φ_n(X_n − X_{n−1}) = Σ_{j=1}^{n} φ_j(X_j − X_{j−1}) = Σ_{j=1}^{n} φ_j ΔX_j

This is called the martingale transform of φ with respect to X_n. We denote it more compactly by (Σ φΔX)_n. The name comes from the second theorem below, which gives us yet another way of forming new martingales from old.
Theorem 3.5.2. Let Y_n be a supermartingale relative to X_0, X_1, ..., with F_n the filtration associated to X_n. Let φ_n be a previsible process with 0 ≤ φ_n ≤ c_n. Then

G_n = G_0 + Σ_{m=1}^{n} φ_m(Y_m − Y_{m−1})

is an F_n-supermartingale.

Proof: G_{n+1} − G_n = φ_{n+1}(Y_{n+1} − Y_n). Thus

E(G_{n+1} − G_n | X_0, ..., X_n) = E(φ_{n+1}(Y_{n+1} − Y_n) | X_0, ..., X_n)
= φ_{n+1} E(Y_{n+1} − Y_n | X_0, ..., X_n)
≤ φ_{n+1} · 0 ≤ 0.   (3.5.2)

Q.E.D.
Theorem 3.5.3. If φ_{n+1} ∈ F_n, |φ_n| ≤ c_n, and Y_n is a martingale relative to X_i, then

G_n = G_0 + Σ_{m=1}^{n} φ_m(Y_m − Y_{m−1})

is a martingale.
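Here is a brute-force check of Theorem 3.5.3 on the symmetric random walk, with a previsible bet that doubles whenever the walk is below its starting point (an arbitrary choice of strategy); E(G_n) stays at G_0 = 0 exactly:

```python
from fractions import Fraction
from itertools import product

def expected_transform(n):
    """E(G_n) for G_n = sum_m phi_m (X_m - X_{m-1}), where the previsible
    bet phi_m uses only X_0, ..., X_{m-1} (the walk so far)."""
    total = Fraction(0)
    for steps in product([-1, 1], repeat=n):   # all 2^n equally likely paths
        X, G = 0, Fraction(0)
        for xi in steps:
            phi = 2 if X < 0 else 1   # decided before the step: previsible
            G += phi * xi             # phi_m * (X_m - X_{m-1})
            X += xi
        total += G
    return total / 2**n

# The transform of a martingale is a martingale, so E(G_n) = G_0 = 0.
assert all(expected_transform(n) == 0 for n in range(1, 9))
```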
Definition 3.5.4. A random variable T is called a stopping time relative to X_i if the event {T = n} is determined by X_0, ..., X_n.

Example: In this example, we define a simple betting strategy of betting 1 until the stopping time tells us to stop. Thus, set

φ_n(ω) = 1 if T(ω) ≥ n, and 0 if T(ω) < n.   (3.5.3)

Let's check that φ_n is previsible, i.e., φ_n ∈ F(X_0, ..., X_{n−1}). By definition, {T(ω) = n} is determined by X_0, ..., X_n. Now {φ_n(ω) = 0} = {T ≤ n − 1} = ∪_{k=0}^{n−1}{T = k} is determined by X_0, ..., X_{n−1}.

Let T ∧ n be the minimum of T and n. Thus

(T ∧ n)(ω) = T(ω) if T(ω) < n, and n if T(ω) ≥ n.   (3.5.4)
Theorem. 1. Let Y_n be a supermartingale relative to X_i and T a stopping time. Then Y_{T∧n} is a supermartingale.

2. Let Y_n be a submartingale relative to X_i and T a stopping time. Then Y_{T∧n} is a submartingale.

3. Let Y_n be a martingale relative to X_i and T a stopping time. Then Y_{T∧n} is a martingale.

Proof: Let φ_n be the previsible process defined above, i.e.,

φ_n = 1 if n ≤ T, and 0 if n > T.   (3.5.5)

Then

G_n = Y_0 + Σ_{m=1}^{n} φ_m(Y_m − Y_{m−1})
= Y_0 + φ_1(Y_1 − Y_0) + φ_2(Y_2 − Y_1) + ··· + φ_n(Y_n − Y_{n−1})
= Y_n if n ≤ T, and Y_T if n > T;

that is, G_n = Y_{T∧n}.   (3.5.6)

So it follows by the above theorems on martingale transforms.

Q.E.D.
Example: The following betting strategy of doubling after each loss is known to most children and is in fact THE classic martingale, the object that gives martingales their name. Let P(ξ_i = ±1) = 1/2, with the ξ_i independent, identically distributed random variables, and X_n = Σ_{i=1}^{n} ξ_i, which is a martingale. Now φ_n is the previsible process defined in the following way: φ_1 = 1, and inductively φ_n = 2φ_{n−1} if ξ_{n−1} = −1 and φ_n = 1 if ξ_{n−1} = 1. Then φ_n is previsible. Now define T = min{n | ξ_n = 1} and consider the process G_n = (Σ φΔX)_n. Then G_n is a martingale by a previous theorem. On the other hand, E(G_T) = 1; that is, when you finally win according to this betting strategy, you always win 1. This seems like a sure-fire way to win. But it requires an arbitrarily large pot to make it work. Under more realistic hypotheses, we find
Theorem 3.5.4. (Doob's optional stopping theorem) Let M_n be a martingale relative to X_n, and T a stopping time relative to X_n with P(T < ∞) = 1. If there is a K such that |M_{T∧n}| ≤ K for all n, then E(M_T) = E(M_0).

Proof: We know that E(M_{T∧n}) = E(M_0), since by the previous theorem M_{T∧n} is a martingale. Then

|E(M_0) − E(M_T)| = |E(M_{T∧n}) − E(M_T)| ≤ 2K P(T > n) → 0

as n → ∞.

Q.E.D.
An alternative version is

Theorem 3.5.5. If M_n is a martingale relative to X_n, T a stopping time with |M_{n+1} − M_n| ≤ K and E(T) < ∞, then E(M_T) = E(M_0).
We now discuss one last theorem before getting back to the finance. This is the martingale representation theorem. According to what we proved above, if X_n is a martingale and φ_n is previsible, then the martingale transform (Σ φΔX)_n is also a martingale. The martingale representation theorem is a converse of this: it gives conditions under which a martingale can be expressed as a martingale transform of a given martingale. For this, we need a condition on our driving martingale. Thus we define

Definition 3.5.5. A generalized random walk or binomial process is a stochastic process X_n in which, at each time n, the value of X_{n+1} − X_n can take only one of two possible values.

This is like an ordinary random walk in that at every step the random walker can take only two possible steps, but which steps she can take depends on the path up to time n. Thus
X_{n+1} = X_n + ξ_{n+1}(X_1, X_2, ..., X_n)

Hence assume that

P(ξ_{n+1}(X_1, ..., X_n) = a) = p   and   P(ξ_{n+1}(X_1, ..., X_n) = b) = 1 − p

where I am being a little sloppy in that a, b, and p all depend on X_1, ..., X_n. Then

E(X_{n+1}|F_n) = E(X_n + ξ_{n+1}(X_1, ..., X_n)|F_n)
= X_n + E(ξ_{n+1}(X_1, ..., X_n)|F_n) = X_n + pa + (1 − p)b   (3.5.7)

So in order that X_n be a martingale we need

pa + (1 − p)b = 0   (3.5.8)
Theorem 3.5.6. Let X_0, X_1, ... be a generalized random walk with associated filtration F_n. Let M_n be a martingale relative to F_n. Then there exists a previsible process φ_n such that

M_n = M_0 + Σ_{i=1}^{n} φ_i(X_i − X_{i−1}) = M_0 + (Σ φΔX)_n
Proof: Since M_n is adapted to X_n, there exist functions f_n such that M_n = f_n(X_0, ..., X_n). By induction, we need to show there exists φ_{n+1} ∈ F_n with M_{n+1} = φ_{n+1}(X_{n+1} − X_n) + M_n. Now

0 = E(M_{n+1} − M_n|F_n)
= E(f_{n+1}(X_0, ..., X_{n+1}) − f_n(X_0, ..., X_n)|F_n)
= E(f_{n+1}(X_0, ..., X_{n+1})|F_n) − f_n(X_0, ..., X_n).   (3.5.9)
(3.5.9)
Now by assumption, X
n+1
can take on only 2 possible values: X
n
+ a, with
probability p, and X
n
+b, with probability 1−p. So E(f
n+1
(X
0
, . . . , X
n+1
)[T
n
)
= f
n+1
(X
0
, . . . , X
n
, X
n
+a) p +f
n+1
(X
0
, . . . , X
n
, X
n
+b) (1 −p).
So continuing the earlier calculation,

0 = E(f_{n+1}(X_0, ..., X_{n+1})|F_n) − f_n(X_0, ..., X_n)
= (f_{n+1}(X_0, ..., X_n, X_n + a) − f_n(X_0, ..., X_n))·p + (f_{n+1}(X_0, ..., X_n, X_n + b) − f_n(X_0, ..., X_n))·(1 − p)
= [(f_{n+1}(X_0, ..., X_n, X_n + a) − f_n(X_0, ..., X_n))/a]·ap + [(f_{n+1}(X_0, ..., X_n, X_n + b) − f_n(X_0, ..., X_n))/b]·b(1 − p)
= [(f_{n+1}(X_0, ..., X_n, X_n + a) − f_n(X_0, ..., X_n))/a]·ap + [(f_{n+1}(X_0, ..., X_n, X_n + b) − f_n(X_0, ..., X_n))/b]·(−ap)   (3.5.10)
since b(1 − p) = −ap by (3.5.8). Or equivalently (assuming neither a nor p is zero),

(f_{n+1}(X_0, ..., X_n, X_n + a) − f_n(X_0, ..., X_n))/a = (f_{n+1}(X_0, ..., X_n, X_n + b) − f_n(X_0, ..., X_n))/b

Set φ_{n+1} to be equal to this common value. Clearly, φ_{n+1} only depends on X_0, ..., X_n and is therefore previsible. Now we have two cases to consider: X_{n+1} = X_n + a or X_{n+1} = X_n + b. Both calculations are the same. So if X_{n+1} = X_n + a, then

φ_{n+1}(X_{n+1} − X_n) + M_n = φ_{n+1}·a + M_n = M_{n+1}

Q.E.D.
3.6 Pricing a derivative

We now discuss the problem we were originally out to solve: to value a derivative. So suppose we have a stock whose price process we denote by S_n, where each tick of the clock represents some time interval δt. We will assume, to be concrete, that our riskless bond is given by the standard formula B_n = e^{nrδt}B_0. We use the following notation: if V_n represents some price process, then Ṽ_n = e^{−nrδt}V_n is the discounted price process. Suppose h denotes the payoff function from a derivative on S_n.
There are three steps to pricing h.

Step 1: Find Q such that S̃_n = e^{−rnδt}S_n is a Q-martingale. Let F_n denote the filtration corresponding to S̃.

Step 2: Set Ṽ_n = E_Q(e^{−rNδt}h(S_N)|F_n); then Ṽ_n is a martingale as well. By the martingale representation theorem there exists a previsible φ_i such that

Ṽ_n = V_0 + Σ_{i=1}^{n} φ_i(S̃_i − S̃_{i−1}).
Step 3: Construct a self-financing hedging portfolio. Just as in the one-period case, we will form a portfolio Π consisting of the stock and bond. Thus Π_n = φ_n S_n + ψ_n B_n, where B_n = e^{rnδt}B_0, so Π̃_n = φ_n S̃_n + ψ_n B_0. We require two conditions on Π:

1. φ_n and ψ_n should be previsible. From a financial point of view this is a completely reasonable assumption, since decisions about what to hold during the nth tick must be made prior to the nth tick.

2. The portfolio Π should be self-financing. This too is financially justified. In order for the portfolio to be viewed as equivalent to the claim h, it must act like the claim. The claim needs no infusion of money and provides no income until expiration, and this is also what we require of the portfolio.
The self-financing condition is derived from the following considerations. After tick n − 1 the portfolio is worth Π_{n−1} = φ_{n−1}S_{n−1} + ψ_{n−1}B_{n−1}. Between tick n − 1 and n we readjust the portfolio in anticipation of the nth tick, but we only want to use the resources that are available to us via the portfolio itself. Also, between the (n−1)st and nth tick, the stock is still worth S_{n−1}. Therefore we require

(SFC1) φ_n S_{n−1} + ψ_n B_{n−1} = φ_{n−1}S_{n−1} + ψ_{n−1}B_{n−1}.
Another equivalent formulation of the self-financing condition is arrived at by

(ΔΠ)_n = Π_n − Π_{n−1} = φ_n S_n + ψ_n B_n − (φ_{n−1}S_{n−1} + ψ_{n−1}B_{n−1})
= φ_n S_n + ψ_n B_n − (φ_n S_{n−1} + ψ_n B_{n−1})
= φ_n(S_n − S_{n−1}) + ψ_n(B_n − B_{n−1})
= φ_n(ΔS)_n + ψ_n(ΔB)_n   (3.6.1)

So we have

(SFC2) (ΔΠ)_n = φ_n(ΔS)_n + ψ_n(ΔB)_n.
Now we have Ṽ_n = E_Q(e^{−rNδt}h(S_N)|F_n), and we can write

Ṽ_n = V_0 + Σ_{i=1}^{n} φ_i ΔS̃_i,

for some previsible φ_i. Now we want Π̃_n = φ_n S̃_n + ψ_n B_0 to equal Ṽ_n. So set

ψ_n = (Ṽ_n − φ_n S̃_n)/B_0, or equivalently ψ_n = (V_n − φ_n S_n)/B_n.

Clearly, Π̃_n = φ_n S̃_n + ψ_n B_0 = Ṽ_n.
We now need to check two things:

1. previsibility of φ_n and ψ_n;

2. self-financing of the portfolio.

1. Previsibility: φ_n is previsible by the martingale representation theorem. As for ψ_n, we have

ψ_n = (Ṽ_n − φ_n S̃_n)/B_0
= (V_0 + φ_n(S̃_n − S̃_{n−1}) + Σ_{j=1}^{n−1} φ_j(S̃_j − S̃_{j−1}) − φ_n S̃_n)/B_0
= (V_0 − φ_n S̃_{n−1} + Σ_{j=1}^{n−1} φ_j(S̃_j − S̃_{j−1}))/B_0

which is manifestly independent of S̃_n; hence ψ_n is previsible.
2. Self-financing: We need to verify

φ_n S_{n−1} + ψ_n B_{n−1} = φ_{n−1}S_{n−1} + ψ_{n−1}B_{n−1},

or, what is the same thing,

(SFC3) (φ_n − φ_{n−1})S̃_{n−1} + (ψ_n − ψ_{n−1})B_0 = 0.

We calculate the second term:

(ψ_n − ψ_{n−1})B_0 = ((Ṽ_n − φ_n S̃_n)/B_0 − (Ṽ_{n−1} − φ_{n−1}S̃_{n−1})/B_0)·B_0
= Ṽ_n − Ṽ_{n−1} + φ_{n−1}S̃_{n−1} − φ_n S̃_n
= φ_n(S̃_n − S̃_{n−1}) + φ_{n−1}S̃_{n−1} − φ_n S̃_n
= −(φ_n − φ_{n−1})S̃_{n−1},

which clearly implies (SFC3).
So finally we have constructed a self-financing hedging strategy for the derivative h. By the no-arbitrage hypothesis, we must have that the value of h at time n (including time 0, i.e., now) is

V(h)_n = Π_n = V_n = e^{rnδt}Ṽ_n = e^{rnδt}E_Q(e^{−rNδt}h(S_N)|F_n) = e^{−r(N−n)δt}E_Q(h(S_N)|F_n)
3.7 Dynamic portfolio selection

In this section we will use our martingale theory and derivative pricing theory to solve a problem in portfolio selection. We will carry this out in a rather simplified situation, but the techniques are valid in much more generality.

Thus suppose that we want to invest in a stock S_n and the bond B_n. In our first go-round, we just want to find a portfolio Π_n = (φ_n, ψ_n) which maximizes something. To maximize the expected value of Π_N, where N represents the time horizon, will just result in investing everything in the stock, because it has a higher expected return. This does not take into account our aversion to risk. It is better to maximize E(U(Π_N)), where U is a (differentiable) utility function that encodes our risk aversion. U is required to satisfy two conditions:
1. U should be increasing, which just represents that more money is better than less; this means U′(x) > 0.

2. U′ should be decreasing, that is U″ < 0, which represents that the utility of an absolute gain is decreasing: the utility of going from a billion dollars to a billion and one dollars is less than the utility of going from one hundred dollars to one hundred and one dollars.

Some common utility functions are

U_0(x) = ln(x)
U_p(x) = x^p/p for 0 ≠ p < 1   (3.7.1)

We will carry out the problem for U(x) = U_0(x) = ln(x) and leave the case of U_p(x) as a problem.
So our problem is now to maximize E_P(ln(Π_N)), but we have constraints. Here P represents the real world probabilities, since we want to maximize in the real world, not the risk neutral one. First, we want the value of our portfolio at time 0 to be the amount that we are willing to invest in it, so we want Π_0 = W_0 for some initial wealth W_0. Second, if we want to let this portfolio go, with no other additions or withdrawals, then we want it to be self-financing; thus Π_n should satisfy SFC. And finally, for Π_n to be realistically tradeable, it must also be previsible.
In chapter 3 we saw that there was an equivalence between two sets of objects:

1. self-financing previsible portfolio processes Π_n, and

2. derivatives h on the stock S.

If in addition we want the value of the portfolio to satisfy Π_0 = W_0, then our problem

    max E_P(ln(Π_N)) subject to Π_0 = W_0

can be reformulated as

    max E_P(ln(h)) subject to the value of h at time 0 being W_0.
And now we may use our derivative pricing formula to look at this condition, since the value of a derivative h at time 0 is

    E_Q(exp(−rT) h)

where Q is the risk neutral probability. Thus the above problem becomes

    max E_P(ln(h)) subject to E_Q(exp(−rT) h) = W_0
Now we are in a position where we are really using two probabilities at the same time. We need the following theorem, which allows us to compare them.

Theorem 3.7.1. (Radon-Nikodym) Given two probabilities P and Q, assume that if P(E) = 0 then Q(E) = 0 for any event E. Then there exists a random variable ξ such that

    E_Q(X) = E_P(ξX)

for any random variable X.
Proof. In our discrete setting this theorem is easy. For each outcome ω, set ξ(ω) = Q(ω)/P(ω) if P(ω) ≠ 0, and 0 otherwise. Then

    E_Q(X) = Σ_ω X(ω) Q(ω) = Σ_ω X(ω) P(ω) (Q(ω)/P(ω)) = E_P(ξX)
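In the discrete setting the theorem is also easy to verify numerically. The sketch below uses made-up measures and a made-up random variable on a three-point sample space; it is only an illustration of the identity, not anything from the text:

```python
# Two probability measures on a three-point sample space. P is strictly
# positive, so the hypothesis "P(E) = 0 implies Q(E) = 0" holds.
P = {"a": 0.5, "b": 0.3, "c": 0.2}
Q = {"a": 0.25, "b": 0.25, "c": 0.5}

# The Radon-Nikodym derivative xi(w) = Q(w)/P(w).
xi = {w: Q[w] / P[w] for w in P}

# An arbitrary random variable X on the same sample space.
X = {"a": 1.0, "b": -2.0, "c": 7.0}

E_Q_X = sum(X[w] * Q[w] for w in Q)            # E_Q(X)
E_P_xiX = sum(xi[w] * X[w] * P[w] for w in P)  # E_P(xi X)
print(E_Q_X, E_P_xiX)  # both equal 3.25
```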
So now our problem is

    max E_P(ln(h)) subject to E_P(ξ exp(−rT) h) = W_0

We can now approach this extremal problem by the method of Lagrange multipliers. According to this method, to find an extremum of E_P(ln(h)) subject to E_P(ξ exp(−rT) h) = W_0, we introduce the multiplier λ and solve

    d/dε|_{ε=0} [ E_P(ln(h + εv)) − λ E_P(ξ exp(−rT)(h + εv)) ] = 0
for all random variables v. Now the left hand side is

    E_P( d/dε|_{ε=0} ln(h + εv) ) − λ E_P( d/dε|_{ε=0} ξ exp(−rT)(h + εv) )
      = E_P( (1/h) v − λ ξ exp(−rT) v )
      = E_P( [1/h − λ ξ exp(−rT)] v )    (3.7.2)
and we want this to be 0 for all random variables v. In order for this expectation to be 0 for all v, it must be the case that

    1/h − λ ξ exp(−rT) = 0

almost surely. Hence h = exp(rT)/(λξ). To calculate λ we put h back into the constraint:

    W_0 = E_P(ξ exp(−rT) h) = E_P(ξ exp(−rT) exp(rT)/(λξ)) = E_P(1/λ)    (3.7.3)

So λ = 1/W_0 and thus finally we have

    h = W_0 exp(rT)/ξ
To go further we need to write down a specific model for the stock. Thus let X_1, X_2, . . . be IID with law P(X_j = ±1) = 1/2. Set S_0 to be the initial value of the stock and inductively

    S_n = S_{n−1}(1 + µ δt + σ √δt X_n)

In other words, S_n is just a binomial stock model with

    u = 1 + µ δt + σ √δt

and

    d = 1 + µ δt − σ √δt

The sample space for this process can be taken to be sequences x_1 x_2 x_3 · · · x_N where each x_i is u or d, representing either an up move or a down move.
Then by our formula for the risk neutral probability,

    q = (exp(rδt) − d)/(u − d) = (rδt − (µδt − σ√δt)) / ((µδt + σ√δt) − (µδt − σ√δt))

up to first order in δt. So

    q = (1/2)(1 + ((r − µ)/σ)√δt)

and

    1 − q = (1/2)(1 − ((r − µ)/σ)√δt)
This then determines the random variable ξ in the Radon-Nikodym theorem 3.7.1:

    ξ(x_1 x_2 x_3 · · · x_N) = ξ_1(x_1) ξ_2(x_2) · · · ξ_N(x_N)

where x_i denotes either u or d and

    ξ_n(u) = q/p = 1 + ((r − µ)/σ)√δt    (3.7.4)

and

    ξ_n(d) = (1 − q)/(1 − p) = 1 − ((r − µ)/σ)√δt
We now know which derivative has as its replicating portfolio the portfolio that solves our optimization problem. We now calculate what the replicating portfolio actually is. For this we use our theory from chapter 3 again. This tells us that we take

    V̄_n = E_Q(exp(−rT) h | F_n)

Then by the martingale representation theorem there is a previsible process φ_n such that

    V̄_n = V̄_0 + Σ_{k=1}^{n} φ_k (S̄_k − S̄_{k−1})

Moreover, it tells us how to pick φ_n:

    φ_n = (V̄_n − V̄_{n−1}) / (S̄_n − S̄_{n−1})
Now let’s recall that by the fact that
¯
S
n
is a martingale with respect to Q
¯
S
n−1
= E
Q
(
¯
S
n
[T
n−1
) = q
¯
S
u
n
+ (1 −q)
¯
S
d
n
where S̄_n^u denotes the value of S̄_n if S takes an uptick at time n. Similarly, we have

    V̄_{n−1} = E_Q(V̄_n | F_{n−1}) = q V̄_n^u + (1 − q) V̄_n^d
From this, we see that if S takes an uptick at time n, then

    φ_n = (V̄_n − V̄_{n−1}) / (S̄_n − S̄_{n−1})
        = (V̄_n^u − (q V̄_n^u + (1 − q) V̄_n^d)) / (S̄_n^u − (q S̄_n^u + (1 − q) S̄_n^d))
        = (V̄_n^u − V̄_n^d) / (S̄_n^u − S̄_n^d)
A similar check will give the same exact answer if S took a downtick at time n. (After all, φ_n must be previsible and therefore independent of whether S takes an uptick or downtick at time n.) Finally let's remark that φ_n can be expressed in terms of the undiscounted prices:

    φ_n = (V̄_n^u − V̄_n^d)/(S̄_n^u − S̄_n^d) = (V_n^u − V_n^d)/(S_n^u − S_n^d)

since both the numerator and the denominator are expressions at the same time n, and therefore the discounting factors for the numerator and the denominator cancel.
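The formula lends itself to a one-step numerical check. The sketch below uses hypothetical numbers throughout (S_0 = 100, u = 1.2, d = 0.9, r = 0.05, a call struck at 100): it computes φ from the formula and verifies that the resulting stock-plus-bond portfolio replicates the payoff in both states.

```python
import math

# One-step binomial hedge for a call h(S) = max(S - K, 0). All the
# numbers (S0, u, d, r, K) are hypothetical.
S0, u, d = 100.0, 1.2, 0.9
r, dt = 0.05, 1.0
K = 100.0

Su, Sd = S0 * u, S0 * d
Vu, Vd = max(Su - K, 0.0), max(Sd - K, 0.0)

# Stock holding from phi = (V^u - V^d)/(S^u - S^d); the discounting
# factors cancel, so undiscounted prices can be used.
phi = (Vu - Vd) / (Su - Sd)

# Risk-neutral probability and the derivative's value now.
q = (math.exp(r * dt) - d) / (u - d)
V0 = math.exp(-r * dt) * (q * Vu + (1 - q) * Vd)

# Put the rest of the portfolio's value in the bond; the portfolio then
# replicates the payoff in both states.
psi = V0 - phi * S0
up_value = phi * Su + psi * math.exp(r * dt)
down_value = phi * Sd + psi * math.exp(r * dt)
print(phi, up_value - Vu, down_value - Vd)  # 0.666..., ~0, ~0
```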
Now we use the following refinement of the Radon-Nikodym theorem.

Theorem 3.7.2. Given two probabilities P and Q, assume that if P(E) = 0 then Q(E) = 0 for any event E. Let ξ be the random variable such that

    E_Q(X) = E_P(ξX)

for any random variable X. Then for any event A, we have

    E_Q(X | A) = E_P(ξX | A) / E_P(ξ | A)
I will ask you to prove this.
From this theorem it follows that

    V̄_n = E_Q(h̄ | F_n) = E_P(ξ h̄ | F_n) / E_P(ξ | F_n)

Since h̄ = e^{−rT} h = W_0/ξ, the numerator is the constant W_0, while E_P(ξ | F_n) = ξ_1(x_1) · · · ξ_n(x_n), because each E_P(ξ_k) = 1. Therefore we see that

    V̄_n = V̄_{n−1} / ξ_n(x_n)
recalling that x_n denotes the nth tick and ξ_n is given by (3.7.4).

Now we are ready to finish our calculation. From

    V̄_n = V̄_{n−1} / ξ_n(x_n)

we get

    V̄_n^u = V̄_{n−1} / ξ_n(u) = V̄_{n−1} / (1 + ((r − µ)/σ)√δt)

    V̄_n^d = V̄_{n−1} / ξ_n(d) = V̄_{n−1} / (1 − ((r − µ)/σ)√δt)
We also have

    S_n^u = S_{n−1}(1 + µδt + σ√δt)

so

    S̄_n^u = S̄_{n−1}(1 + (µ − r)δt + σ√δt)

up to first order in δt. Similarly,

    S̄_n^d = S̄_{n−1}(1 + (µ − r)δt − σ√δt)
to first order in δt. So we have

    φ_n = (V̄_n^u − V̄_n^d) / (S̄_n^u − S̄_n^d)
        = V̄_{n−1} [ 1/(1 + ((r−µ)/σ)√δt) − 1/(1 − ((r−µ)/σ)√δt) ] / ( S̄_{n−1} · 2σ√δt )
        = (V̄_{n−1}/S̄_{n−1}) [ (1 − ((r−µ)/σ)√δt) − (1 + ((r−µ)/σ)√δt) ] / [ (1 − ((r−µ)/σ)² δt) · 2σ√δt ]
        = (V̄_{n−1}/S̄_{n−1}) ( −2((r−µ)/σ)√δt ) / ( 2σ√δt )    (3.7.5)
up to terms of order δt. So we have

    φ_n = ((µ − r)/σ²) V̄_{n−1}/S̄_{n−1} = ((µ − r)/σ²) V_{n−1}/S_{n−1}    (3.7.6)

where the last equality holds because the quotient of the two discounting factors cancels. V_{n−1} represents the value of the portfolio at time n − 1, and φ_n represents
the number of shares of stock in this portfolio. Hence the amount of money invested in the stock is φ_n S_{n−1} = ((µ − r)/σ²) V_{n−1}. Thus formula (3.7.6) expresses the fact that at every tick, you should have the proportion (µ − r)/σ² of your portfolio invested in the stock.
Over the period from 1926 to 1985, the S&P 500 had an excess return µ − r over short term treasury bills of about .058, with a standard deviation σ of about .216. Doing the calculation with these values yields .058/(.216)² ≈ 1.24 = 124%. That is, with log utility, the optimal portfolio is one where one invests all of one's wealth in the S&P 500, borrows an additional 24% of current wealth, and invests it in the S&P as well.
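A quick numerical check of this proportion, using the figures just quoted, plus a rough cross-check: a grid search over the per-period expected log growth of the binomial stock model with a small, arbitrarily chosen δt should peak near the same value.

```python
import math

# Log-utility optimal proportion (mu - r)/sigma^2 with the quoted
# S&P figures.
mu_minus_r, sigma = 0.058, 0.216
fraction = mu_minus_r / sigma**2  # about 1.24, i.e. 124%

# Cross-check: grid-search the per-period expected log growth of the
# binomial model (riskless rate taken as the baseline; dt is arbitrary).
dt = 1e-4

def log_growth(f):
    # Up and down moves each carry real-world probability 1/2.
    up = 1 + f * (mu_minus_r * dt + sigma * math.sqrt(dt))
    down = 1 + f * (mu_minus_r * dt - sigma * math.sqrt(dt))
    return 0.5 * math.log(up) + 0.5 * math.log(down)

best_f = max((k * 0.005 for k in range(200, 301)), key=log_growth)
print(fraction, best_f)  # the grid maximizer lands near the formula
```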
3.8 Exercises
1. Let S_0, S_1^u, S_1^d be as in class, for the one-step binomial model. For the pseudo-probabilities q of going up and 1 − q of going down that we calculated in class, what is E_Q(S_1)?
2. We arrived at the following formula for discounting. If B is a bond
with par value 1 at time T, then at time t < T the bond is worth
exp(−r(T − t)) assuming constant interest rates. Now assume that
interest rates are not constant but are described by a known function
of time r(t). Recalling the derivation of the formula above for constant
interest rate, calculate the value of the same bond under the interest
rate function r(t). Your answer should have an integral in it.
3. A fair die is rolled twice. Let X denote the sum of the values. What is E(X | A), where A is the event that the first roll was a 2?
4. Toss a nickel, a dime and a quarter. Let B_i denote the event that i of the coins landed heads up, i = 0, 1, 2, 3. Let X denote the sum of the amounts shown; that is, a coin says how much it is worth only on the tails side, so X is the sum of the values of the coins that landed tails up. What is E(X | B_i)?
5. Let X be a random variable with pdf p(x) = α exp(−αx) (a continuous distribution). Calculate E(X | {X ≥ t}).

In the next three problems, calculate E(X | Y) for the two random variables X and Y. That is, describe E(X | Y) as a function of the values of Y. Also, construct the pdf of this random variable.
6. A die is rolled twice. X is the sum of the values, and Y is the value on
the first roll.
7. From a bag containing four balls numbered 1, 2, 3 and 4 you draw two balls. If at least one of the balls is numbered 2 you win $1, otherwise you lose $1. Let X denote the amount won or lost. Let Y denote the number on the first ball chosen.
8. A bag contains three coins, but one of them is fake and has two heads.
Two coins are drawn and tossed. Let X be the number of heads tossed
and Y denote the number of honest coins drawn.
9. Let W_n denote the symmetric random walk starting at 0, that is, p = 1/2 and W_0 = 0. Let X denote the number of steps to the right the random walker takes in the first 6 steps. Calculate

(a) E(X | W_1 = −1).
(b) E(X | W_1 = 1).
(c) E(X | W_1 = 1, W_2 = −1).
(d) E(X | W_1 = 1, W_2 = −1, W_3 = −1).
10. We’ve seen that if ξ
n
are independent with expectations all 0, then
M
n
=
n

j=1
ξ
j
is a Martingale. It is not necessary for the ξ
j
’s to be independent.
In this exercise you are asked to prove that, though they may not be
independent, they are uncorrelated. Thus, let M
n
be a Martingale and
let ξ
j
= M
j
−M
j−1
be its increments. Show
(a) E(ξ
j
) = 0 for all j.
(b) E(ξ
j
ξ
k
) = 0 for all j ,= k. (This means that ξ
j
and ξ
k
have 0
covariance, since their expectations are 0.) (Hint: You will have
to use several of the properties of conditional expectations.)
In the following three problems, you are asked to derive what is called the gambler's ruin using techniques from martingale theory.

Suppose a gambler repeatedly plays a game of tossing a coin with probability p of getting heads, in which case he wins $1, and 1 − p of getting tails, in which case he loses $1. Suppose the gambler begins with $a and keeps gambling until he either goes bankrupt or gets $b, a < b. Mathematically, let ξ_j, j = 1, 2, 3, . . . be IID random variables with distribution P(ξ_j = 1) = p and P(ξ_j = −1) = 1 − p, where 0 < p < 1. Let S_n denote the gambler's fortune after the nth toss, that is,

    S_n = a + ξ_1 + · · · + ξ_n

Define the stopping time T = min{n | S_n = 0 or b}; that is, T is the time when the gambler goes bankrupt or achieves a fortune of b. Define

    A_n = S_n − n(2p − 1) and M_n = ((1 − p)/p)^{S_n}.
11. Show A_n and M_n are martingales.
12. Calculate P(S_T = 0), the probability that the gambler will eventually go bankrupt, in the following way. Using Doob's optional stopping theorem, calculate E(M_T). Now use the definition of E(M_T) in terms of P(S_T = j) to express P(S_T = 0) in terms of E(M_T), and solve for P(S_T = 0).
13. Calculate E(S_T).

14. Calculate E(T), the expected time till bust or fortune, in the following way: calculate E(A_T) by the optional stopping theorem. Express E(A_T) in terms of E(S_T) and E(T). Solve for E(T).
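These exercises can be sanity-checked by simulation. The sketch below (arbitrary p, a, b) compares a Monte Carlo bankruptcy frequency against the closed form that the optional-stopping argument of exercise 12 produces; the formula is stated here only as a check, deriving it is the exercise.

```python
import random

# Monte Carlo gambler's ruin: start at $a, stop at $0 or $b.
random.seed(1)
p, a, b = 0.45, 3, 10
ratio = (1 - p) / p

trials = 20000
ruined = 0
for _ in range(trials):
    s = a
    while 0 < s < b:
        s += 1 if random.random() < p else -1
    ruined += (s == 0)

# Optional stopping on M_n = ratio**S_n gives
# ratio**a = P(ruin) * ratio**0 + (1 - P(ruin)) * ratio**b.
predicted = (ratio**a - ratio**b) / (1 - ratio**b)
print(ruined / trials, predicted)  # the two should be close
```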
In the following problems, you are asked to find an optimal betting strategy using martingale methods.

Suppose a gambler is playing a game repeatedly where the winnings of a $1 bet pay off $1 and the probability p > 1/2 of winning each round is in the gambler's favor. (By counting cards, blackjack can be made such a game.) What betting strategy should the gambler adopt? It is not hard to show that if the gambler wants to maximize his expected fortune after n trials, then he should bet his entire fortune each round. This seems counterintuitive, and in fact doesn't take into much account the gambler's strong interest in not going bankrupt. It turns out to make more sense to maximize not one's expected fortune, but the expected value of the logarithm of one's fortune.

As above, let ξ_j be IID with P(ξ_j = 1) = p and P(ξ_j = −1) = 1 − p, and assume this time that p > 1/2. Let F_n denote the fortune of the gambler at time n. Assume the gambler starts with $a and adopts a betting strategy φ_j (a previsible process); that is, in the jth round he bets $φ_j. Then we can express his fortune as

    F_n = a + Σ_{j=1}^{n} φ_j ξ_j

We assume that 0 < φ_n < F_{n−1}; that is, the gambler must bet at least something to stay in the game, and he doesn't ever want to bet his entire fortune. Let e = p ln(p) + (1 − p) ln(1 − p) + ln 2.
15. Show L_n = ln(F_n) − ne is a supermartingale, and thus E(ln(F_n/F_0)) ≤ ne.

16. Show that for some strategy, L_n is a martingale. What is the strategy? (Hint: You should bet a fixed percentage of your current fortune.)
17. Carry out a similar analysis to the above using the utility function U_p(x) = x^p/p, p < 1 not equal to 0. Reduce to terms of order δt. You will have to use Taylor's theorem. Your answer should not have a δt in it.
Chapter 4
Stochastic processes and
finance in continuous time
In this chapter we begin our study of continuous time finance. We could develop it similarly to the model of chapter 3, using continuous time martingales, the martingale representation theorem, change of measure, etc. But the probabilistic infrastructure needed to handle this is more formidable. An approach often taken is to model things by partial differential equations, which is the approach we will take in this chapter. We will, however, discuss continuous time stochastic processes and their relationship to partial differential equations in more detail than most books.
We will warm up by progressing slowly from stochastic processes discrete
in both time and space to processes continuous in both time and space.
4.1 Stochastic processes continuous in time,
discrete in space
We segue from our completely discrete processes by now allowing the time parameter to be continuous but still demanding that space be discrete. Thus the second kind of stochastic process we will consider is a process X_t where the time t is a real-valued deterministic variable and, at each time t, the sample space S of the random variable X_t is a (finite or infinite) discrete set. In such a case, a state X of the entire process is described by a sequence of jump times (τ_1, τ_2, . . .) and a sequence of values of X (elements of the common discrete state space S) j_0, j_1, . . .. These describe a "path" of the system through S, given by X_t = j_0 for 0 < t < τ_1, X_t = j_1 for τ_1 < t < τ_2, etc. Because X moves by a sequence of jumps, these processes are often called "jump" processes.
Thus, to describe the stochastic process, we need, for each j ∈ S, a continuous random variable F_j that describes how long it will be before the process (starting at X = j) will jump, and a set of discrete random variables Q_j that describe the probabilities of jumping from j to other states when the jump occurs. We will let Q_{jk} be the probability that the state starting at j will jump to k (when it jumps), so

    Σ_k Q_{jk} = 1 and Q_{jj} = 0

We also define the function F_j(t) to be the probability that the process starting at X = j has already jumped at time t, i.e.,

    F_j(t) = P(τ_1 < t | X_0 = j).
All this is defined so that

1.  P(τ_1 < t, X_{τ_1} = k | X_0 = j) = F_j(t) Q_{jk}    (4.1.1)

(this is an independence assumption: where we jump to is independent of when we jump).

Instead of writing P(. . . | X_0 = j), we will begin to write P_j(. . .).
We will make the usual stationarity and Markov assumptions as follows. We assume first that

2.  P_j(τ_1 < s, X_{τ_1} = k, τ_2 − τ_1 < t, X_{τ_2} = l)
      = P(τ_1 < s, X_{τ_1} = k, τ_2 − τ_1 < t, X_{τ_2} = l | X_0 = j)
      = F_j(s) Q_{jk} F_k(t) Q_{kl}.
It will be convenient to define the quantity P_{jk}(t) to be the probability that X_t = k, given that X_0 = j, i.e.,

    P_{jk}(t) = P_j(X_t = k).
3. The Markov assumption is

    P(X_t = k | X_{s_1} = j_1, . . . , X_{s_n} = j_n, X_s = j) = P_{jk}(t − s).
The three assumptions 1, 2, 3 will enable us to derive a (possibly infinite) system of differential equations for the probability functions P_{jk}(t) for j and k ∈ S. First, the stationarity assumption tells us that

    P_j(τ_1 > t + s | τ_1 > s) = P_j(τ_1 > t).

This means that the probability of having to wait at least one minute more before the process jumps is independent of how long we have waited already, i.e., we can take any time to be t = 0 and the rules governing the process will not change. (From a group-theoretic point of view this means that the process is invariant under time translation.) Using the formula for conditional probability (P(A | B) = P(A ∩ B)/P(B)) and the fact that the event τ_1 > t + s is a subset of the event τ_1 > s, we translate this equation into

    (1 − F_j(t + s))/(1 − F_j(s)) = 1 − F_j(t)
This implies that the function 1 − F_j(t) is a multiplicative function, i.e., it must be an exponential function. It is decreasing, since F_j(t) represents the probability that something happens before time t, and F_j(0) = 0, so we must have 1 − F_j(t) = e^{−q_j t} for some positive number q_j. In other words, F_j is exponentially distributed, with probability density function

    f_j(t) = q_j e^{−q_j t} for t ≥ 0

To check this, note that

    P_j(τ_1 > t) = 1 − F_j(t) = ∫_t^∞ q_j e^{−q_j s} ds = e^{−q_j t}

so the two ways of computing F_j(t) agree.
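Both the exponential survival probability and the memorylessness that forced it are easy to check by simulation; the rate and time values below are arbitrary.

```python
import math
import random

# Empirical check that P_j(tau_1 > t) = exp(-q_j t), and that the
# exponential distribution is memoryless.
random.seed(3)
q_j, t, s, n = 1.5, 0.8, 0.5, 50000

samples = [random.expovariate(q_j) for _ in range(n)]

# Unconditional survival past t.
frac = sum(x > t for x in samples) / n

# Conditional survival: having already waited s, survive t more.
survivors = [x for x in samples if x > s]
frac_cond = sum(x > s + t for x in survivors) / len(survivors)

print(frac, frac_cond, math.exp(-q_j * t))  # all three are close
```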
The next consequence of our stationarity and Markov assumptions is the set of Chapman-Kolmogorov equations in this setting:

    P_{jk}(t + s) = Σ_{l∈S} P_{jl}(t) P_{lk}(s).
They say that to calculate the probability of going from state j to state k
in time t + s, one can sum up all the ways of going from j to k via any l at
the intermediate time t. This is an obvious consistency equation, but it will
prove very useful later on.
Now we can begin the calculations that will produce our system of differential equations for P_{jk}(t). The first step is to express P_{jk}(t) based on the time of the first jump from j. The result we intend to prove is the following:

    P_{jk}(t) = δ_{jk} e^{−q_j t} + ∫_0^t q_j e^{−q_j s} ( Σ_{l≠j} Q_{jl} P_{lk}(t − s) ) ds
As intimidating as this formula looks, it can be understood in a fairly intuitive way. We take it a term at a time:

δ_{jk} e^{−q_j t}: This term is the probability that the first jump hasn't happened yet, i.e., that the state has remained j from the start. That is why the Kronecker delta symbol is there (recall that δ_{jk} = 1 if j = k and δ_{jk} = 0 if j is different from k). This term is zero unless j = k, in which case it gives the probability that τ_1 > t, i.e., that no jump has occurred yet.
The integral term can best be understood as a "double sum", and then by using the Chapman-Kolmogorov equation. For any s between 0 and t, the definition of f_j(t) tells us that the probability that the first jump occurs between time s and time s + ds is (approximately) f_j(s) ds = q_j e^{−q_j s} ds (where the error in the approximation goes to zero faster than ds as ds goes to zero). Therefore, the probability that, for l different from j, the first jump occurs between time s and time s + ds and the jump goes from j to l is (approximately) q_j e^{−q_j s} Q_{jl} ds. Based on this, we see that the probability that the first jump occurs between time s and time s + ds, and the jump goes from j to l, and in the intervening time (between s and t) the process goes from l to k, is (approximately)

    q_j e^{−q_j s} Q_{jl} P_{lk}(t − s) ds

To account for all of the ways of going from j to k in time t, we need to add up these probabilities over all possible times s of the first jump (this gives an integral with respect to ds in the limit as ds goes to 0) and over all possible results l different from j of the first jump. This reasoning yields the second term of the formula.
We can make a change of variables in the integral term of the formula: let s be replaced by t − s. Note that this will transform ds into −ds, but it will also reverse the limits of integration. The result of this change of variables is (after taking out of the sum and integral the factors that depend neither upon l nor upon s):

    P_{jk}(t) = δ_{jk} e^{−q_j t} + q_j e^{−q_j t} ∫_0^t e^{q_j s} ( Σ_{l≠j} Q_{jl} P_{lk}(s) ) ds
              = e^{−q_j t} [ δ_{jk} + q_j ∫_0^t e^{q_j s} ( Σ_{l≠j} Q_{jl} P_{lk}(s) ) ds ]
Now P_{jk}(t) is a continuous function of t, because it is defined as the product of a continuous function with a constant, plus a constant times the integral of a bounded function of t (the integrand is bounded on any bounded t interval). But once P_{jk} is continuous, we can differentiate both sides of the last equation with respect to t and show that P_{jk}(t) is also differentiable. We take this derivative (and take note of the fact that when we differentiate the first factor we just get a constant times what we started with) using the fundamental theorem of calculus to arrive at:

    P′_{jk}(t) = −q_j P_{jk}(t) + q_j Σ_{l≠j} Q_{jl} P_{lk}(t).
A special case of this differential equation occurs when t = 0. Note that if we know X_0 = j, then it must be true that P_{jk}(0) = δ_{jk}. This observation enables us to write:

    P′_{jk}(0) = −q_j P_{jk}(0) + q_j Σ_{l≠j} Q_{jl} P_{lk}(0)
               = −q_j δ_{jk} + q_j Σ_{l≠j} Q_{jl} δ_{lk}
               = −q_j δ_{jk} + q_j Q_{jk}.
One last bit of notation! Let q_{jk} = P′_{jk}(0). So q_{jj} = −q_j and q_{jk} = q_j Q_{jk} if j ≠ k. (Note that this implies that

    Σ_{k≠j} q_{jk} = q_j = −q_{jj}

so the sum of the q_{jk} as k ranges over all of S is 0.) The constants q_{jk} are called the infinitesimal parameters of the stochastic process.
Given this notation, we can rewrite our equation for P′_{jk}(t) given above as follows:

    P′_{jk}(t) = Σ_{l∈S} q_{jl} P_{lk}(t)

for all t > 0. This system of differential equations is called the backward equation of the process.
Another system of differential equations we can derive is the forward equation of the process. To do this, we differentiate both sides of the Chapman-Kolmogorov equation with respect to s and get

    P′_{jk}(t + s) = Σ_{l∈S} P_{jl}(t) P′_{lk}(s)

If we set s = 0 in this equation and use the definition of q_{jk}, we get the forward equation:

    P′_{jk}(t) = Σ_{l∈S} q_{lk} P_{jl}(t).
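For a finite state space the forward equation can be integrated numerically. The sketch below does this for a two-state chain, where P_00(t) also has a well-known closed form to compare against; the jump rates are arbitrary.

```python
import math

# Two-state chain: jump rates alpha (0 -> 1) and beta (1 -> 0).
alpha, beta = 2.0, 1.0
q = [[-alpha, alpha], [beta, -beta]]   # infinitesimal parameters q_jk

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Integrate P'(t) = P(t) q with Euler steps P(t + dt) ~ P(t)(I + q dt),
# starting from P(0) = identity.
dt, t = 1e-4, 0.7
euler_step = [[(1.0 if i == j else 0.0) + q[i][j] * dt for j in range(2)]
              for i in range(2)]
P = [[1.0, 0.0], [0.0, 1.0]]
for _ in range(int(round(t / dt))):
    P = matmul(P, euler_step)

# Known closed form for this chain.
s = alpha + beta
exact_P00 = beta / s + (alpha / s) * math.exp(-s * t)
print(P[0][0], exact_P00)  # the two agree closely
```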
4.2 Examples of stochastic processes continuous in time, discrete in space
The simplest useful example of a stochastic process that is continuous in time but discrete in space is the Poisson process. The Poisson process is a special case of a set of processes called birth processes. In these processes, the sample space for each random variable X_t is the set of non-negative integers, and for each non-negative integer j we choose a positive real parameter λ_j, which will be equal to both q_{j,j+1} and −q_{jj}. Thus, in such a process, the value of X can either jump up by 1 or stay the same. The time until the process jumps up by one, starting from X = j, is exponentially distributed with parameter λ_j. So if we think of X as representing a population, it is one that experiences only births (one at a time) at random times, but no deaths.

The specialization in the Poisson process is that λ_j is chosen to have the same value, λ, for all j. Since we move one step to the right (on the j-axis) according to the exponential distribution, we can conclude without much thought that

    P_{jk}(t) = 0 for k < j and t > 0

and that

    P_{jj}(t) = e^{−λt}

for all t > 0. To learn any more, we must consider the system of differential equations (the "forward equations") derived in the preceding lecture.
Recall that the forward equation

    P′_{jk}(t) = Σ_{l∈S} q_{lk} P_{jl}(t)

gave the rate of change in the probability of being at k (given that the process started at j) in terms of the probabilities of jumping to k from any l just at time t (the terms of the sum for l ≠ k) and (minus) the probability of already being at k and jumping away just at time t (recall that q_{kk} is negative).
For the Poisson process, the forward equations simplify considerably, because q_{jk} is nonzero only if k = j or k = j + 1. And for these values we have:

    q_{jj} = −q_j = −λ and q_{j,j+1} = λ

Therefore the forward equation reduces to:

    P′_{jk}(t) = Σ_{l=0}^{∞} q_{lk} P_{jl}(t) = q_{k−1,k} P_{j,k−1}(t) + q_{kk} P_{jk}(t)
               = λ P_{j,k−1}(t) − λ P_{jk}(t).
Since we already know P_{jk}(t) for k ≤ j, we will use the preceding differential equation inductively to calculate P_{jk}(t) for k > j, one k at a time.

To do this, though, we need to review a little about linear first-order differential equations. In particular, we will consider equations of the form

    f′ + λf = g

where f and g are functions of t and λ is a constant. To solve such equations, we make the left side have the form of the derivative of a product by multiplying through by e^{λt}:

    e^{λt} f′ + λ e^{λt} f = e^{λt} g

We recognize the left side as the derivative of the product of e^{λt} and f:

    (e^{λt} f)′ = e^{λt} g

We can integrate both sides of this (with s substituted for t) from 0 to t to recover f(t):
    ∫_0^t (d/ds)(e^{λs} f(s)) ds = ∫_0^t e^{λs} g(s) ds

Of course, the integral of the derivative may be evaluated using the (second) fundamental theorem of calculus:

    e^{λt} f(t) − f(0) = ∫_0^t e^{λs} g(s) ds

Now we can solve algebraically for f(t):

    f(t) = e^{−λt} f(0) + e^{−λt} ∫_0^t e^{λs} g(s) ds
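As a quick check of this solution formula, take a case where everything is computable by hand: g ≡ 1 with f(0) = 0, for which f′ + λf = 1 has the exact solution (1 − e^{−λt})/λ. The λ and t values are arbitrary.

```python
import math

# Evaluate the integrating-factor formula
#   f(t) = e^{-lam t} f(0) + e^{-lam t} * int_0^t e^{lam s} g(s) ds
# for g(s) = 1, f(0) = 0, via a midpoint Riemann sum.
lam, t, n = 0.7, 2.0, 100000

ds = t / n
integral = sum(math.exp(lam * (i + 0.5) * ds) * 1.0 * ds for i in range(n))
f_formula = math.exp(-lam * t) * 0.0 + math.exp(-lam * t) * integral

# Exact solution of f' + lam f = 1 with f(0) = 0.
f_exact = (1 - math.exp(-lam * t)) / lam
print(f_formula, f_exact)  # the two should agree closely
```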
Using this formula for the solution of the differential equation, we substitute P_{jk}(t) for f(t) and λP_{j,k−1}(t) for g(t). We get that

    P_{jk}(t) = e^{−λt} P_{jk}(0) + λ ∫_0^t e^{−λ(t−s)} P_{j,k−1}(s) ds

Since we are assuming that the process starts at j, i.e., X_0 = j, we know that P_{jk}(0) = 0 for all k ≠ j and P_{jj}(0) = 1. In other words,

    P_{jk}(0) = δ_{jk}

(this is the Kronecker delta).
Now we are ready to compute. Note that we have to compute P_{jk}(t) for all possible nonnegative integer values of j and k. In other words, we have to fill in all the entries in the table:

    P_00(t)  P_01(t)  P_02(t)  P_03(t)  P_04(t)  · · ·
    P_10(t)  P_11(t)  P_12(t)  P_13(t)  P_14(t)  · · ·
    P_20(t)  P_21(t)  P_22(t)  P_23(t)  P_24(t)  · · ·
    P_30(t)  P_31(t)  P_32(t)  P_33(t)  P_34(t)  · · ·
      ...      ...      ...      ...      ...

Of course, we already know part of the table:

    e^{−λt}  P_01(t)  P_02(t)  P_03(t)  P_04(t)  · · ·
    0        e^{−λt}  P_12(t)  P_13(t)  P_14(t)  · · ·
    0        0        e^{−λt}  P_23(t)  P_24(t)  · · ·
    0        0        0        e^{−λt}  P_34(t)  · · ·
      ...      ...      ...      ...      ...
So we begin the induction. We know that P_{jj}(t) = e^{−λt}, so we can use the differential equation

    P′_{j,j+1}(t) + λ P_{j,j+1}(t) = λ P_{jj}(t)

to get that

    P_{j,j+1}(t) = λ ∫_0^t e^{−λ(t−s)} e^{−λs} ds = λt e^{−λt}

Note that this fills in the next diagonal of the table. Then we can use the differential equation

    P′_{j,j+2}(t) + λ P_{j,j+2}(t) = λ P_{j,j+1}(t)

to get that

    P_{j,j+2}(t) = λ ∫_0^t e^{−λ(t−s)} λs e^{−λs} ds = ((λt)²/2) e^{−λt}
Continuing by induction we get that

    P_{jk}(t) = ((λt)^{k−j}/(k − j)!) e^{−λt}

provided 0 ≤ j ≤ k and t > 0. This means that the random variable X_t (the infinite discrete random variable that gives the value of j at time t, i.e., tells how many jumps have happened up until time t) has a Poisson distribution with parameter λt. This is the important connection between the exponential distribution and the Poisson distribution.
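This connection can be seen directly by simulation: generate the jump times by summing exponential waiting times, count the jumps up to time t, and compare the empirical frequencies with the Poisson(λt) probabilities. The λ and t values below are arbitrary.

```python
import math
import random

# Simulate the Poisson process via exponential waiting times.
random.seed(4)
lam, t, n = 2.0, 3.0, 20000

def jumps_by_time(t):
    total, count = 0.0, 0
    while True:
        total += random.expovariate(lam)
        if total > t:
            return count
        count += 1

counts = [jumps_by_time(t) for _ in range(n)]

# Compare empirical frequencies with the Poisson(lam * t) pmf.
for k in range(4, 9):
    empirical = sum(c == k for c in counts) / n
    pmf = (lam * t) ** k / math.factorial(k) * math.exp(-lam * t)
    print(k, empirical, pmf)  # empirical frequencies track the pmf
```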
4.3 The Wiener process
4.3.1 Review of Taylor’s theorem
Taylor series are familiar to every student of calculus as a means writing
functions as power series. The real force of Taylor series is what is called
Taylor’s theorem which tells one how well the Taylor polynomials approxi-
mate the function. We will be satisfied with the following version of Taylor’s
theorem. First, we introduce some notation.
Definition 4.3.1. For two functions f and g, we say that f(h) is o(g(h))
(as h → 0) (read ”f is little o of g) if
lim
h→0
f(h)/g(h) = 0
It of course means that f(h) is smaller than g(h) (as h gets small) but in
a very precise manner.
As an example, note that another way to say that f′(a) = b is to say that

    f(a + h) − (f(a) + bh) = o(h)

That is, the difference between the function h ↦ f(a + h) and the linear function h ↦ f(a) + bh is small as h → 0.
Taylor’s theorem is just the higher order version of this statement.
Theorem 4.3.1. Let f be a function defined on an interval (α, β) be n times
continuously differentiable. Then for any a ∈ (α, β)
f(a +h) = f(a) +f

(a)h +
f

(a)
2!
h
2
+
f
(3)
(a)
3!
h
3
+ +
f
(n)
(a)
n!
h
n
+o(h
n
)
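The o(h^n) claim can be probed numerically. For f = exp, a = 0 and n = 3, the remainder divided by h³ should shrink with h (in fact it behaves like h/24):

```python
import math

# Remainder of the degree-3 Taylor polynomial of exp at 0, scaled by h^3.
def remainder_over_h3(h):
    taylor = 1 + h + h**2 / 2 + h**3 / 6
    return (math.exp(h) - taylor) / h**3

print([remainder_over_h3(h) for h in (0.1, 0.01, 0.001)])  # shrinks with h
```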
We will in fact need a multivariable version of this, but only of low order.

Theorem 4.3.2. Let f(x, y) be a function of two variables defined on an open set O ⊂ R². Then for any point (a, b) ∈ O one has

    f(a + h, b + k) = f(a, b) + (∂f/∂x)(a, b) h + (∂f/∂y)(a, b) k
      + (1/2)[ (∂²f/∂x²)(a, b) h² + 2(∂²f/∂x∂y)(a, b) hk + (∂²f/∂y²)(a, b) k² ] + o(h², k², hk)
4.3.2 The Fokker-Planck equation

We will give a derivation of the Fokker-Planck equation, which governs the evolution of the probability density function of a random-variable-valued function X_t that satisfies a "first-order stochastic differential equation". The variable evolves according to what in the literature is called a Wiener process, named after Norbert Wiener, who developed a great deal of the theory of stochastic processes during the first half of the twentieth century.

We shall begin with a generalized version of random walk (an "unbalanced" version where the probability of moving in one direction is not necessarily the same as that of moving in the other). Then we shall let both the space and time increments shrink in such a way that we can make sense of the limit as both ∆x and ∆t approach 0.
Recall from the previous sections the one-dimensional random walk (on the integers) beginning at W_0 = 0. We fix a parameter p, and let p be the probability of taking a step to the right, so that 1 − p is the probability of taking a step to the left, at any (integer-valued) time n. In symbols, this is:

    p = P(W_{n+1} = x + 1 | W_n = x) and 1 − p = P(W_{n+1} = x − 1 | W_n = x).

Thus

    W_n = ξ_1 + · · · + ξ_n

where the ξ_j are IID with P(ξ_j = 1) = p and P(ξ_j = −1) = 1 − p. Since we are given that W_0 = 0, we know that the probability function of W_n will be essentially binomially distributed (with parameters n and p, except that at time n, x will go from −n to +n, skipping every other integer as discussed earlier). We will set v_{x,n} = P(W_n = x). Then

    v_{x,n} = C(n, (n + x)/2) p^{(n+x)/2} (1 − p)^{(n−x)/2}

for x = −n, −(n − 2), . . . , (n − 2), n, and v_{x,n} = 0 otherwise, where C(n, k) denotes the binomial coefficient.
For use later, we calculate the expected value and variance of W_n. First,

    E(W_n) = Σ_{x=−n}^{n} x v_{x,n} = Σ_{k=0}^{n} (2k − n) v_{2k−n,n}
           = Σ_{k=0}^{n} (2k − n) C(n, k) p^k (1 − p)^{n−k} = n(2p − 1)

(it's an exercise to check the algebra, but the result is intuitive). Also,

    Var(W_n) = Σ_{k=0}^{n} (2k − n)² v_{2k−n,n} − (E(W_n))²
             = Σ_{k=0}^{n} (2k − n)² C(n, k) p^k (1 − p)^{n−k} − n²(2p − 1)² = 4np(1 − p)

(another exercise!).
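Both moments are quick to verify by summing over the binomial distribution exactly; the n and p values below are arbitrary.

```python
import math

# Exact check of E(W_n) = n(2p - 1) and Var(W_n) = 4np(1 - p).
n, p = 12, 0.3

mean = var = 0.0
for k in range(n + 1):           # k = number of steps to the right
    x = 2 * k - n                # walker's position
    prob = math.comb(n, k) * p**k * (1 - p)**(n - k)
    mean += x * prob
    var += x * x * prob
var -= mean**2

print(mean, var)  # n(2p - 1) = -4.8 and 4np(1 - p) = 10.08
```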
Now let's form a new stochastic process W^∆ (shorthand for W^{∆t,∆x}), in which we let the space steps ∆x and time steps ∆t be different from 1.

First, the time steps. We will take time steps of length ∆t. This will have little effect on our formulas, except that everywhere we will have to note that the number n of time steps needed to get from time 0 to time t is

    n = t/∆t

Each step to the left or right has length ∆x instead of length 1. This has the effect of scaling W by the factor ∆x. Multiplying a random variable by a constant multiplies its expected value by the same constant and its variance by the square of the constant. Therefore, the mean and variance of the total displacement in time t are:

    E(W^∆_t) = (t/∆t)(2p − 1)∆x and Var(W^∆_t) = 4p(1 − p) t (∆x)²/∆t
We want both of these quantities to remain finite for all fixed t, since we don't want our process to have a positive probability of zooming off to infinity in finite time, and we also want Var(W^∆_t) to be non-zero; otherwise the process will be completely deterministic.

We need to examine what we have control over in order to achieve these ends. First off, we don't expect to let ∆x and ∆t go to zero in any arbitrary manner. In fact, it seems clear that in order for the variance to be finite and non-zero, we should insist that ∆t be roughly the same order of magnitude as (∆x)². This is because we don't seem to have control over anything else in the expression for the variance. It is reasonable that p and 1 − p vary with ∆x, but we probably don't want their product to approach zero (that would force our process to move always to the left or always to the right), and it can't approach infinity (since the maximum value of p(1 − p) is 1/4). So we are forced to make the assumption that (∆x)²/∆t remain bounded as they both go to zero. In fact, we will go one step further. We will insist that

    (∆x)²/∆t = 2D    (4.3.1)

for some (necessarily positive) constant D.
Now we are in a bind with the expected value. The expected value will go to infinity as ∆x → 0, and the only parameter we have to control it is 2p − 1. We need 2p − 1 to have the same order of magnitude as ∆x in order to keep E(W′_t) bounded. Since p = 1 − p when p = 1/2, we will make the assumption that

p = 1/2 + (c/(2D)) ∆x   (4.3.2)

for some real (positive, negative, or zero) constant c, which entails

1 − p = 1/2 − (c/(2D)) ∆x.

The reason for the choice of the parameters will become apparent. D is called the diffusion coefficient, and c is called the drift coefficient.
Note that as ∆x and ∆t approach zero, the values v_{k,n} = P(W′_{n∆t} = k∆x) = p(k∆x, n∆t) specify values of the function p at more and more points in the plane. In fact, the points at which p is defined become more and more dense in the plane as ∆x and ∆t approach zero. This means that given any point in the plane, we can find a sequence of points at which such a p is defined that approaches our given point as ∆x and ∆t approach zero. We would like to be able to assert the existence of the limiting function, and that it expresses how the probability of a continuous random variable evolves with time. We will not quite do this, but, assuming the existence of the limiting function, we will derive a partial differential equation that it must satisfy. Then we shall use a probabilistic argument to obtain a candidate limiting function. Finally, we will check that the candidate limiting function is a solution of the partial differential equation. This isn't quite a proof of the result we would like, but it touches on the important steps in the argument.
We begin with the derivation of the partial differential equation. To do this, we note that for the original random walk, v_{k,n} satisfies the following difference equation:

v_{k,n+1} = p v_{k−1,n} + (1−p) v_{k+1,n}.

This is just the matrix of the (infinite discrete) Markov process that defines the random walk. When we pass into the ∆x, ∆t realm, and use p(x, t) notation instead of v_{k∆x,n∆t}, this equation becomes:

(∗) p(x, t + ∆t) = p · p(x − ∆x, t) + (1−p) · p(x + ∆x, t)
Next, we need Taylor's expansion for a function of two variables. As we do the expansion, we must keep in mind that we are going to let ∆t → 0, and as we do this, we must account for all terms of order up to and including that of ∆t. Thus, we must account for both ∆x and (∆x)² terms (but (∆x)³ and ∆t∆x both go to zero faster than ∆t).

We expand each of the three expressions in equation (∗) around the point (x, t), keeping terms up to order ∆t and (∆x)². We get:
p(x, t + ∆t) = p(x, t) + ∆t (∂p/∂t)(x, t) + ···

p(x + ∆x, t) = p(x, t) + ∆x (∂p/∂x)(x, t) + ((∆x)²/2)(∂²p/∂x²)(x, t) + ···

p(x − ∆x, t) = p(x, t) − ∆x (∂p/∂x)(x, t) + ((∆x)²/2)(∂²p/∂x²)(x, t) + ···
Substituting these three expansions into (∗) and recalling that p + (1−p) = 1, we get

∆t (∂p/∂t) = −(2p−1)∆x (∂p/∂x) + ((∆x)²/2)(∂²p/∂x²)

(where all the partial derivatives are evaluated at (x, t)). Finally, we divide through by ∆t, and recall that (∆x)²/∆t = 2D and 2p − 1 = (c/D)∆x, so that (2p−1)∆x/∆t = c(∆x)²/(D∆t) = 2c, to arrive at the equation

∂p/∂t = −2c (∂p/∂x) + D (∂²p/∂x²).

This last partial differential equation is called the Fokker–Planck equation.
To find a solution of the Fokker–Planck equation, we use a probabilistic argument to find the limit of the random walk distribution as ∆x and ∆t approach 0. The key step in this argument uses the fact that the binomial distribution with parameters n and p approaches the normal distribution with expected value np and variance np(1−p) as n approaches infinity. In terms of our assumptions about ∆x, ∆t, p and 1−p, we see that at time t the distribution of W′_t has mean

t(2p−1)∆x/∆t = 2ct

and variance

4p(1−p) t (∆x)²/∆t,

which approaches 2Dt as ∆t → 0. Therefore, we expect W_t in the limit to be normally distributed with mean 2ct and variance 2Dt. Thus, the probability that W_t < x is

P(W_t < x) = (1/√(2π)) ∫_{−∞}^{(x−2ct)/√(2Dt)} e^{−y²/2} dy.

Therefore the pdf of W_t is the derivative of this with respect to x, which yields the pdf

p(x, t) = (1/(2√(πDt))) e^{−(x−2ct)²/(4Dt)}.
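The limiting distribution can be illustrated numerically: simulate the scaled walk W′ under assumptions (4.3.1) and (4.3.2) and compare the sample mean and variance with 2ct and 2Dt. A rough sketch (all parameter values are arbitrary choices):

```python
import random
import statistics

def scaled_walk(t, dt, D, c, rng):
    """One sample of W'_t with (dx)^2/dt = 2D and p = 1/2 + (c/(2D)) dx."""
    dx = (2.0 * D * dt) ** 0.5
    p = 0.5 + (c / (2.0 * D)) * dx
    n = int(t / dt)
    return dx * sum(1 if rng.random() < p else -1 for _ in range(n))

rng = random.Random(1)
t, dt, D, c = 1.0, 0.002, 0.5, 0.3
samples = [scaled_walk(t, dt, D, c, rng) for _ in range(5000)]
# Theory: mean -> 2ct = 0.6, variance -> 2Dt = 1.0.
print(statistics.fmean(samples), statistics.pvariance(samples))
```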
A special case of this is when c = 0 and D = 1/2. The corresponding process is called the Wiener process (or Brownian motion). The following properties characterize the Wiener process, which we record as a theorem.

Theorem 4.3.3.
1. W_0 = 0 with probability 1.
2. W_t is a continuous function of t with probability 1.
3. For t > s, W_t − W_s is normally distributed with mean 0 and variance t − s.
4. For t ≥ s ≥ u ≥ v, W_t − W_s and W_u − W_v are independent.
5. The probability density for W_t is

p(x, t) = (1/√(2πt)) e^{−x²/(2t)}.
Proof: Let’s discuss these points.
1. This is clear since we defined W
t
as a limit of random walks which all
started at 0.
3. This we derived from the Fokker-Plank equation.
4. Again, we defined W
t
as a limit of random walks, W

. Now W

t
=
∆x(ξ
1
+ ξ
n
) where n = t/∆t. So if t = n∆t, s = m∆t, u = k∆t and
v = l∆t where n ≥ m ≥ k ≥ l then we have
W
t
−W
s
= ∆x(ξ
m+1

m+2
+ ξ
n
)
108 CHAPTER 4. CONTINUOUS TIME
and
W
u
−W
v
= ∆x(ξ
l+1
+ ξ
k
)
which are independent.
2. This fact is beyond the scope of our discussion but is intuitive.
Q.E.D.
While, according to 2. above, the paths W_t are continuous, they are very rough, and we see from the next proposition that, with probability 1, they are not differentiable.

Proposition 4.3.4. For all t > 0 and all n = 1, 2, . . .,

lim_{h→0} P(|W_{t+h} − W_t| > n|h|) = 1,

so that W_t is nowhere differentiable.
Proof: Assume without loss of generality that h > 0. Then

P(|W_{t+h} − W_t| > nh) = P(|W_h| > nh) = 1 − P(|W_h| ≤ nh)

by 3) of Theorem 4.3.3. Now

P(|W_h| ≤ nh) = (1/√(2πh)) ∫_{−nh}^{nh} e^{−x²/(2h)} dx ≤ (1/√(2πh)) ∫_{−nh}^{nh} 1 dx = 2nh/√(2πh) = n√(2h/π),

which converges to 0 as h → 0 for any n. Q.E.D.
Almost everything we use about the Wiener process will follow from Theorem 4.3.3. We now do some sample computations to show how to manipulate the Wiener process, some of which will be useful later.

1. Calculate the expected distance of the Wiener process from 0.

Solution: This is just E(|W_t|), which is

∫_{−∞}^{∞} |x| (1/√(2πt)) e^{−x²/(2t)} dx = 2 ∫_{0}^{∞} x (1/√(2πt)) e^{−x²/(2t)} dx.

Changing variables, u = x²/(2t), this integral becomes

(2t/√(2πt)) ∫_{0}^{∞} e^{−u} du = √(2t/π).
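A quick Monte Carlo check of E(|W_t|) = √(2t/π), using the fact that W_t ~ N(0, t); the seed, choice of t, and sample size here are arbitrary:

```python
import math
import random

rng = random.Random(2)
t = 2.0
n = 200000
# W_t ~ N(0, t), so sample |W_t| directly.
est = sum(abs(rng.gauss(0.0, math.sqrt(t))) for _ in range(n)) / n
exact = math.sqrt(2.0 * t / math.pi)
print(est, exact)  # the two values agree to about two decimal places
```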
2. Calculate E(W_t W_s).

Solution: Assume without loss of generality that t > s. Then

E(W_t W_s) = E((W_t − W_s)W_s + W_s²) = E((W_t − W_s)W_s) + E(W_s²).

Now W_t − W_s is independent of W_s since t > s, and W_t − W_s has mean 0, therefore we get

E(W_t W_s) = E(W_t − W_s)E(W_s) + E(W_s²) = 0 + Var(W_s) = s.

(In general, then, E(W_t W_s) = min(s, t).)
The Wiener Process Starting at y, W^y

The defining properties of W^y_t = W_t + y, the Wiener process starting at y, are:

1. W^y_0 = y.
2. W^y_t is continuous with probability 1.
3. W^y_t is normally distributed with µ = y, σ² = t.
4. W^y_t − W^y_s is normally distributed with µ = 0, σ² = t − s, for t > s.
5. W^y_t − W^y_s and W^y_u − W^y_v are independent for t ≥ s ≥ u ≥ v.
6. The pdf is given by

p(x, t|y) = (1/√(2πt)) e^{−(x−y)²/(2t)},

so that

P(a ≤ W^y_t ≤ b) = (1/√(2πt)) ∫_a^b e^{−(x−y)²/(2t)} dx.
How do we use the Wiener process to model stock movements? We would like to make sense of the idea that

dS_t/dt = µS_t + noise(σ, S_t),

where, if the noise term were absent, S_t would be a bank account earning interest rate µ. There are two obvious problems:

(1) The Wiener process is differentiable nowhere, so why should S_t be differentiable anywhere?

(2) "Noise" is very difficult to make rigorous.
To motivate our way out of these problems, let us look at deterministic motion. The most general first-order differential equation is

df/dt = φ(f(t), t).

By integrating both sides we arrive at the following integral equation:

f(t) − f(t_0) = ∫_{t_0}^{t} (df/ds) ds = ∫_{t_0}^{t} φ(f(s), s) ds.

That is,

f(t) = f(t_0) + ∫_{t_0}^{t} φ(f(s), s) ds.
Theoretically, integral equations are better behaved than differential equations (they tend to smooth out bumps). So we try to make the corresponding conversion from differential to integral equation in the stochastic world. Our first goal is just to define a reasonable notion of stochastic integral.

We had the martingale transform: if S_n was a stock and φ_n denoted the amount of stock held at tick n, then the gain from tick 0 to tick N was

Σ_{i=1}^{N} φ_i (S_i − S_{i−1}).

It is these sorts of sums that become integrals in continuous time.
Recall the Riemann integral ∫_a^b f(x) dx: take a partition of [a, b], T = {a = t_0 < t_1 < ··· < t_n = b}.

Theorem 4.3.5 (Riemann's Theorem). Let f be a function such that the probability of choosing a point of discontinuity is 0. Let mesh T = max{|t_i − t_{i−1}| : i = 1, . . . , n}, let t_{i−1} < ξ_i < t_i, and let

RS(f, T) = Σ_{i=1}^{n} f(ξ_i)(t_i − t_{i−1})

(RS stands for Riemann sum). Then

lim_{mesh T→0} RS(f, T)

exists, and is independent of the choices of ξ_i within their respective intervals. We call this limit the Riemann integral ∫_a^b f(x) dx.
Note that the summation is beginning to look like a martingale transform.
The next level of integral technology is the Riemann–Stieltjes integral. Suppose φ(x) is a non-decreasing function, T a partition, and ξ_i values between partition points. Let

RSS(f, T, φ) = Σ_{i=1}^{n} f(ξ_i)(φ(t_i) − φ(t_{i−1})).

("RSS" stands for Riemann–Stieltjes sum.) If lim_{mesh T→0} RSS(f, T, φ) exists, then by definition ∫_a^b f(t) dφ(t) is this limit. Assuming φ is differentiable, the corresponding Riemann's theorem holds, namely

lim_{mesh T→0} RSS(f, T, φ)

exists (and is independent of the choices of ξ_i).

Proposition 4.3.6. If φ is differentiable, ∫_a^b f(t) dφ(t) = ∫_a^b f(t) φ′(t) dt.
Proof. Let T = {a = t_0 < t_1 < ··· < t_n = b}. Then

RSS(f, T, φ) = Σ_{i=1}^{n} f(ξ_i)(φ(t_i) − φ(t_{i−1}))
= Σ_{i=1}^{n} f(ξ_i) [(φ(t_i) − φ(t_{i−1}))/(t_i − t_{i−1})] (t_i − t_{i−1})
= Σ_{i=1}^{n} f(ξ_i)(φ′(t_i) + o(1))(t_i − t_{i−1}),   (4.3.3)

and so lim_{mesh T→0} Σ_{i=1}^{n} f(ξ_i) φ′(t_i)(t_i − t_{i−1}) = ∫_a^b f(t) φ′(t) dt.
Example. Let φ(x) be differentiable. Then

∫_a^b φ(t) dφ(t) = ∫_a^b φ(t) φ′(t) dt = ½φ(b)² − ½φ(a)².
A stochastic integral is formed much like the Riemann–Stieltjes integral, with one twist. For X_t and φ_t stochastic processes,

∫_a^b φ_t dX_t

is to be a random variable. In the Riemann and Riemann–Stieltjes cases the choices of the points ξ_i in the intervals were irrelevant; in the stochastic case they are very much relevant. As an example, take T = {a = t_0 < t_1 < ··· < t_n = b} and consider

Σ_{i=1}^{n} φ_{ξ_i}(X_{t_i} − X_{t_{i−1}}).
Let’s work out the above sum for X
t
= φ
t
= W
t
, where W
t
is the Wiener
process, using two different choices for the ξ
i
’s.
First, let ξ
i
= t
i−1
, the left endpoint. Then
E
_

n
i=1
W
t
i−1
(W
t
i
−W
t
i−1
)
_
=

n
i=1
E(W
t
i−1
(W
t
i
−W
t
i−1
))
=

n
i=1
E(W
t
i−1
)E(W
t
i
−W
t
i−1
) = 0 ,
using independence.
Second, let ξ_i = t_i, the right endpoint. Then

E(Σ_{i=1}^{n} W_{t_i}(W_{t_i} − W_{t_{i−1}})) = Σ_{i=1}^{n} E(W_{t_i}(W_{t_i} − W_{t_{i−1}})) = Σ_{i=1}^{n} E(W_{t_i}² − W_{t_i} W_{t_{i−1}})
= Σ_{i=1}^{n} (E(W_{t_i}²) − E(W_{t_i} W_{t_{i−1}}))
= Σ_{i=1}^{n} (t_i − t_{i−1}) = (t_1 − t_0) + (t_2 − t_1) + ··· + (t_n − t_{n−1}) = t_n − t_0 = b − a,

using E(W_{t_i}²) = t_i and E(W_{t_i} W_{t_{i−1}}) = t_{i−1} from the computations above.
Which endpoint to choose? Following Itô, one uses the left-hand endpoint. One defines the Itô sum

IS(φ_t, T, X_t) = Σ_{i=1}^{n} φ_{t_{i−1}}(X_{t_i} − X_{t_{i−1}}).
There are two motivations for this choice.

1. The left-hand choice looks a little like a previsibility condition. For example, if X_t is a stock process and φ_t a portfolio process, then the left-hand choice embodies the idea that our gains from time t_{i−1} to time t_i should be based on a choice made at time t_{i−1}.

2. It turns out that if X_t is a martingale, then the Itô sum will be a martingale as well.

Following Stratonovich, one takes ξ_i = ½(t_{i−1} + t_i), the midpoint. As an advantage, the resulting Stratonovich calculus looks like traditional calculus. As a disadvantage, the derived stochastic process is not a martingale.
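The endpoint computations above can be reproduced numerically: over many simulated paths on [0, 1], the left-endpoint sums average near 0 and the right-endpoint sums near b − a = 1. A sketch (step count and sample size are arbitrary):

```python
import math
import random

def endpoint_sums(n_steps, rng):
    """One Wiener path on [0, 1]; return (left-endpoint sum, right-endpoint sum)."""
    dt = 1.0 / n_steps
    w = [0.0]
    for _ in range(n_steps):
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))
    left = sum(w[i] * (w[i + 1] - w[i]) for i in range(n_steps))
    right = sum(w[i + 1] * (w[i + 1] - w[i]) for i in range(n_steps))
    return left, right

rng = random.Random(3)
trials = [endpoint_sums(200, rng) for _ in range(5000)]
mean_left = sum(l for l, _ in trials) / len(trials)
mean_right = sum(r for _, r in trials) / len(trials)
print(mean_left, mean_right)  # near 0 and near 1, respectively
```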
Let X_t be a stochastic process, and let φ_t be another stochastic process defined on the same sample space as X_t.

Definition 4.3.2. φ_t is adapted to X_t if φ_t is independent of X_s − X_t for s > t.

Then we define the Itô integral

∫_a^t φ_s dX_s ≡ lim_{mesh T→0} IS(φ, T, X) = lim_{mesh T→0} Σ_{i=1}^{n} φ_{t_{i−1}}(X_{t_i} − X_{t_{i−1}}).
Let us call a random variable with finite first and second moments square integrable. One knows that for X, Y square integrable random variables defined on the same sample space, P(X = Y) = 1 iff E(X) = E(Y) and Var(X − Y) = 0. Now Var(X − Y) = E((X − Y)²) − (E(X − Y))², so

P(X = Y) = 1 iff E((X − Y)²) = 0.

This is a criterion for checking whether two square integrable random variables are equal (almost surely).

Definition 4.3.3. A sequence X_α of square integrable random variables converges in mean-square to X if E((X − X_α)²) → 0 as α → 0. In symbols, lim X_α = X. This is equivalent to E(X_α) → E(X) and Var(X − X_α) → 0.
This is the sense of the limit in the definition of the stochastic integral. Here is a sample computation of a stochastic integral. Let's evaluate

∫_a^b W_t dW_t.

We need the following facts: E(W_t) = 0, E(W_t²) = t, E(W_t³) = 0, and E(W_t⁴) = 3t². Writing ∆W_{t_i} = W_{t_i} − W_{t_{i−1}},

IS(W_t, T, W_t) = Σ_{i=1}^{n} W_{t_{i−1}} ∆W_{t_i} = Σ_{i=1}^{n} W_{t_{i−1}}(W_{t_i} − W_{t_{i−1}})
= ½ Σ_{i=1}^{n} [(W_{t_{i−1}} + ∆W_{t_i})² − W_{t_{i−1}}² − (∆W_{t_i})²]
= ½ Σ_{i=1}^{n} (W_{t_i}² − W_{t_{i−1}}²) − ½ Σ_{i=1}^{n} (∆W_{t_i})²
= ½ ((W_{t_1}² − W_{t_0}²) + (W_{t_2}² − W_{t_1}²) + ··· + (W_{t_n}² − W_{t_{n−1}}²)) − ½ Σ_{i=1}^{n} (∆W_{t_i})²
= ½ (W_b² − W_a²) − ½ Σ_{i=1}^{n} (∆W_{t_i})².
The first part is recognizable as what the Riemann–Stieltjes integral gives formally. But what is Σ_{i=1}^{n} (∆W_{t_i})² as mesh T → 0? The simple calculation

E(Σ_{i=1}^{n} (∆W_{t_i})²) = Σ_{i=1}^{n} E((W_{t_i} − W_{t_{i−1}})²) = Σ_{i=1}^{n} (t_i − t_{i−1}) = b − a

leads us to the

Claim:

lim_{mesh T→0} Σ_{i=1}^{n} (∆W_{t_i})² = b − a (in mean-square),

and therefore

∫_a^b W_t dW_t = ½(W_b² − W_a²) − ½(b − a).
Proof.

E((Σ_{i=1}^{n} (∆W_{t_i})² − (b−a))²)
= E((Σ_{i=1}^{n} (∆W_{t_i})²)² − 2(b−a) Σ_{i=1}^{n} (∆W_{t_i})² + (b−a)²)
= E(Σ_{i=1}^{n} (∆W_{t_i})⁴ + 2 Σ_{i<j} (∆W_{t_i})²(∆W_{t_j})² − 2(b−a) Σ_{i=1}^{n} (∆W_{t_i})² + (b−a)²)
= Σ_{i=1}^{n} 3(∆t_i)² + 2 Σ_{i<j} E((∆W_{t_i})²) E((∆W_{t_j})²) − 2(b−a) Σ_{i=1}^{n} ∆t_i + (b−a)²
= 3 Σ_{i=1}^{n} (∆t_i)² + 2 Σ_{i<j} (∆t_i)(∆t_j) − 2(b−a) Σ_{i=1}^{n} ∆t_i + (b−a)²
= 2 Σ_{i=1}^{n} (∆t_i)² + (Σ_{i=1}^{n} (∆t_i)² + 2 Σ_{i<j} (∆t_i)(∆t_j)) − 2(b−a) Σ_{i=1}^{n} ∆t_i + (b−a)²
= 2 Σ_{i=1}^{n} (∆t_i)² + (Σ_{i=1}^{n} ∆t_i)² − 2(b−a) Σ_{i=1}^{n} ∆t_i + (b−a)²
= 2 Σ_{i=1}^{n} (∆t_i)² + (Σ_{i=1}^{n} ∆t_i − (b−a))²
= 2 Σ_{i=1}^{n} (∆t_i)²,

using E((∆W_{t_i})⁴) = 3(∆t_i)², the independence of the increments, and Σ_{i=1}^{n} ∆t_i = b − a.
Now for each i, ∆t_i ≤ mesh T, so (∆t_i)² ≤ ∆t_i · mesh T. (Note that only one ∆t_i is being rounded up.) Summing over i = 1 to n,

0 ≤ Σ_{i=1}^{n} (∆t_i)² ≤ (Σ_{i=1}^{n} ∆t_i) mesh T = (b − a) mesh T → 0 as mesh T → 0.

So

lim_{mesh T→0} Σ_{i=1}^{n} (∆t_i)² = 0.

Intuitively, ∆t_i ≈ mesh T ≈ 1/n, so lim_{mesh T→0} Σ_{i=1}^{n} (∆t_i)² ≈ lim_{n→∞} n(1/n)² = 0.
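Both the quadratic-variation claim and the value of ∫_a^b W_t dW_t can be seen on a single fine path. A sketch with [a, b] = [0, 1] (the identity IS = ½W_b² − ½Σ(∆W)² holds exactly by the telescoping above, and Σ(∆W)² concentrates near b − a):

```python
import math
import random

rng = random.Random(4)
a, b, n = 0.0, 1.0, 100000
dt = (b - a) / n

w = 0.0          # W_a = 0
ito_sum = 0.0    # left-endpoint (Ito) sum of W dW
quad_var = 0.0   # running sum of (Delta W)^2
for _ in range(n):
    dw = rng.gauss(0.0, math.sqrt(dt))
    ito_sum += w * dw
    quad_var += dw * dw
    w += dw

formula = 0.5 * w * w - 0.5 * (b - a)  # (1/2)(W_b^2 - W_a^2) - (1/2)(b - a)
print(ito_sum, formula, quad_var)
```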
The underlying reason for the difference between the stochastic integral and the Stieltjes integral is that ∆W_{t_i} is, with high probability, of order √(∆t_i).

Slogan of the Itô Calculus: (dW_t)² = dt.

In deriving the Wiener process, (∆x)²/∆t = 2D = 1, so (∆x)² = (∆W)² = ∆t.

(dW_t)² = dt is only given precise sense through its integral form:

Theorem 4.3.7. If φ_s is adapted to W_s, then

∫_a^t φ_s (dW_s)² = ∫_a^t φ_s ds.

Moreover, for n > 2,

∫_a^t φ_s (dW_s)^n = 0.
Proving this will be in the exercises.
Properties of the Itô Integral. If φ_s, ψ_s are adapted to the given process (X_s or W_s), then

(1) ∫_a^t dX_s = X_t − X_a,

(2) ∫_a^t (φ_s + ψ_s) dX_s = ∫_a^t φ_s dX_s + ∫_a^t ψ_s dX_s,

(3) E(∫_a^t φ_s dW_s) = 0, and

(4) E((∫_a^t φ_s dW_s)²) = E(∫_a^t φ_s² ds).
Proof.

1. For any partition T = {a = s_0 < s_1 < ··· < s_n = t}, IS(1, T, X_s) = Σ_{i=1}^{n} 1·(X_{s_i} − X_{s_{i−1}}), which telescopes to X_{s_n} − X_{s_0} = X_t − X_a. So the limit as mesh T → 0 is X_t − X_a.

2. It is obvious for any partition T that IS(φ_s + ψ_s, T, X_s) = IS(φ_s, T, X_s) + IS(ψ_s, T, X_s). Now take the limit as mesh T → 0.
3. Given a partition T, IS(φ_s, T, W_s) = Σ_{i=1}^{n} φ_{s_{i−1}}(W_{s_i} − W_{s_{i−1}}). Taking expected values, and using that by adaptedness φ_{s_{i−1}} is independent of the increment W_{s_i} − W_{s_{i−1}}, which has mean 0, gives

E(IS(φ_s, T, W_s)) = Σ_{i=1}^{n} E(φ_{s_{i−1}}(W_{s_i} − W_{s_{i−1}})) = Σ_{i=1}^{n} E(φ_{s_{i−1}}) E(W_{s_i} − W_{s_{i−1}}) = 0.

Taking the limit as mesh T → 0 gives the result.
4. Given a partition T,

E(IS(φ_s, T, W_s)²) = E((Σ_{i=1}^{n} φ_{s_{i−1}}(W_{s_i} − W_{s_{i−1}}))²)
= E(Σ_{i=1}^{n} φ_{s_{i−1}}²(W_{s_i} − W_{s_{i−1}})² + 2 Σ_{i<j} φ_{s_{i−1}} φ_{s_{j−1}}(W_{s_i} − W_{s_{i−1}})(W_{s_j} − W_{s_{j−1}}))
= E(Σ_{i=1}^{n} φ_{s_{i−1}}²(∆W_{s_i})²) + 2 Σ_{i<j} E(φ_{s_{i−1}} φ_{s_{j−1}}(W_{s_i} − W_{s_{i−1}})) E(W_{s_j} − W_{s_{j−1}})
= Σ_{i=1}^{n} E(φ_{s_{i−1}}²) ∆s_i + 2 Σ_{i<j} E(φ_{s_{i−1}} φ_{s_{j−1}}(W_{s_i} − W_{s_{i−1}})) · 0
= Σ_{i=1}^{n} E(φ_{s_{i−1}}²) ∆s_i = E(IS(φ_s², T, s)).

Now take the limit as mesh T → 0.
4.3.3 Stochastic Differential Equations.

A first-order stochastic differential equation is an expression of the form

dX_t = a(X_t, t) dt + b(X_t, t) dW_t,

where a(x, t), b(x, t) are functions of two variables. A stochastic process X_t is a solution to this SDE if

X_t − X_a = ∫_a^t dX_s = ∫_a^t a(X_s, s) ds + ∫_a^t b(X_s, s) dW_s.
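Solutions of such SDEs are commonly approximated by discretizing the integral form, replacing dW by Gaussian increments of variance ∆t (the scheme is usually called Euler–Maruyama). The sketch below, including its parameter choices, is an illustration and not part of the notes; the test case uses the fact that dX = ½X dt + X dW has E(X_t) = X_0 e^{t/2}:

```python
import math
import random

def euler_maruyama(a, b, x0, t_end, n_steps, rng):
    """Approximate X_{t_end} for dX = a(X,t) dt + b(X,t) dW, X_0 = x0."""
    dt = t_end / n_steps
    x, t = x0, 0.0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        x += a(x, t) * dt + b(x, t) * dw
        t += dt
    return x

# Example: dX = (1/2) X dt + X dW, for which E(X_t) = X_0 e^{t/2}.
rng = random.Random(5)
n_samples = 10000
est = sum(euler_maruyama(lambda x, t: 0.5 * x, lambda x, t: x, 1.0, 1.0, 200, rng)
          for _ in range(n_samples)) / n_samples
print(est, math.exp(0.5))
```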
Lemma 4.3.8 (Itô's Lemma). Suppose we are given a twice differentiable function f(x), and a solution X_t to the SDE dX_t = a dt + b dW_t. Then f(X_t) is a stochastic process, and it satisfies the SDE

df(X_t) = [f′(X_t) a + ½ f″(X_t) b²] dt + f′(X_t) b dW_t.

More generally, if f(x, t) is a function of two variables, twice differentiable in x, once differentiable in t, one has

df(X_t, t) = [∂f/∂t (X_t, t) + ∂f/∂x (X_t, t) a + ½ ∂²f/∂x² (X_t, t) b²] dt + ∂f/∂x (X_t, t) b dW_t.
Proof. This is just a calculation using two terms of the Taylor series for f, and the rules (dW_t)² = dt and (dW_t)^n = 0 for n > 2. (Note that dt dW_t = (dW_t)³ = 0 and dt² = (dW_t)⁴ = 0.)

df(X_t) = f(X_{t+dt}) − f(X_t) = f(X_t + dX_t) − f(X_t)
= [f(X_t) + f′(X_t) dX_t + ½ f″(X_t)(dX_t)² + O((dX_t)³)] − f(X_t)
= f′(X_t) dX_t + ½ f″(X_t)(dX_t)²
= f′(X_t)[a(X_t, t) dt + b(X_t, t) dW_t] + ½ f″(X_t)[a² dt² + 2ab dt dW_t + b² (dW_t)²]
= [f′(X_t) a(X_t, t) + ½ f″(X_t)(b(X_t, t))²] dt + f′(X_t) b(X_t, t) dW_t.
Example. To calculate ∫_a^t W_s dW_s, show that it is equal to ∫_a^t dX_s = X_t − X_a for some X_t. That is, find f(x) such that W_t dW_t = d(f(W_t)). Since ∫ x dx = ½x², consider f(x) = ½x². Then f′(x) = x and f″(x) = 1, so Itô's lemma, applied to dW_t = 0 dt + 1 dW_t, gives

d(f(W_t)) = f′(W_t) dW_t + ½ f″(W_t) dt = W_t dW_t + ½ dt.

Equivalently, W_t dW_t = d(f(W_t)) − ½ dt. Therefore,

∫_a^t W_s dW_s = ∫_a^t d(f(W_s)) − ∫_a^t ½ ds = f(W_t) − f(W_a) − ½(t − a) = ½W_t² − ½W_a² − ½(t − a).
Example. Calculate ∫_a^t W_s² dW_s. Since ∫ x² dx = (1/3)x³, consider f(x) = (1/3)x³. Then f′(x) = x² and f″(x) = 2x. Itô's lemma implies

d((1/3)W_s³) = W_s² dW_s + ½ · 2W_s ds = W_s² dW_s + W_s ds,

or W_s² dW_s = d((1/3)W_s³) − W_s ds, so

∫_a^t W_s² dW_s = (1/3)W_t³ − (1/3)W_a³ − ∫_a^t W_s ds.

(The last integrand does not really simplify.)
Example. Solve dX_s = ½X_s ds + X_s dW_s. Consider f with f′(X_s)·X_s = 1. With ordinary variables, this is f′(x)x = 1, or f′(x) = 1/x. So we are led to consider f(x) = ln x, with f′(x) = 1/x and f″(x) = −1/x².

Then, using dX_s = ½X_s ds + X_s dW_s and (dX_s)² = X_s² ds,

d(ln X_s) = (1/X_s) dX_s + ½(−1/X_s²)(dX_s)² = ½ ds + dW_s − ½ ds = dW_s.

Now integrate both sides, getting

∫_a^t d(ln X_s) = ∫_a^t dW_s = W_t − W_a,

or

ln(X_t/X_a) = W_t − W_a,

and so

X_t = X_a e^{W_t − W_a}.
Asset Models

The basic model for a stock is the SDE

dS_t = µS_t dt + σS_t dW_t,

where µ is the short-term expected return, and σ is the standard deviation of the short-term returns, called the volatility.

Note that if σ = 0, the model simplifies to dS_t = µS_t dt, implying S_t = S_a e^{µt}, which is just the value of an interest-bearing, risk-free security.

Compared with the generic SDE

dX_t = a(X_t, t) dt + b(X_t, t) dW_t,

the stock model uses a(x, t) = µx and b(x, t) = σx. We have (dS_t)² = σ²S_t² dt.
The model SDE is similar to the last example, and the same change of variables will work. Using f(x) = ln x, f′(x) = 1/x, and f″(x) = −1/x², Itô's lemma gives

d(ln S_t) = (1/S_t) dS_t + ½(−1/S_t²)(dS_t)² = µ dt + σ dW_t − ½σ² dt = (µ − ½σ²) dt + σ dW_t.

Integrating both sides gives

∫_0^t d(ln S_s) = ∫_0^t (µ − ½σ²) ds + ∫_0^t σ dW_s
ln S_t − ln S_0 = (µ − ½σ²)t + σW_t
S_t/S_0 = e^{(µ−½σ²)t + σW_t}
S_t = S_0 e^{(µ−½σ²)t + σW_t}.
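Since W_t ~ N(0, t), this closed form lets us sample S_t directly. One consequence worth checking numerically is E(S_t) = S_0 e^{µt}, which follows from E(e^{σW_t}) = e^{σ²t/2}. A sketch (the parameter values are arbitrary):

```python
import math
import random

def sample_gbm(s0, mu, sigma, t, rng):
    """Draw S_t = S_0 exp((mu - sigma^2/2) t + sigma W_t) with W_t ~ N(0, t)."""
    w = rng.gauss(0.0, math.sqrt(t))
    return s0 * math.exp((mu - 0.5 * sigma ** 2) * t + sigma * w)

rng = random.Random(6)
s0, mu, sigma, t = 100.0, 0.08, 0.2, 1.0
n = 100000
mean_s = sum(sample_gbm(s0, mu, sigma, t, rng) for _ in range(n)) / n
print(mean_s, s0 * math.exp(mu * t))  # the two should be close
```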
The expected log-return of S_t is

E(ln(S_t/S_0)) = E((µ − ½σ²)t + σW_t) = (µ − ½σ²)t.
Vasicek’s interest rate model is dr
t
= a(b−r
t
) dt+σ dW
t
. It satisfies “mean
reversion”, meaning that if r
t
< b, then dr
t
> 0, pulling r
t
upwards, while if
r
t
> b, then dr
t
< 0, pushing r
t
downwards.
The Cox-Ingersoll-Ross model is dr
t
= a(b−r
t
) dt +σ

r
t
dW
t
. It has mean
reversion, and a volatility that grows with r
t
.
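Mean reversion is easy to see numerically: start the Vasicek rate far from b, and the sample mean of r_t settles near b (the exact mean is b + (r_0 − b)e^{−at}). A sketch using an Euler discretization, with all parameter values chosen arbitrarily:

```python
import math
import random

def vasicek_endpoint(r0, a, b, sigma, t_end, n_steps, rng):
    """Euler discretization of dr = a(b - r) dt + sigma dW; returns r_{t_end}."""
    dt = t_end / n_steps
    r = r0
    for _ in range(n_steps):
        r += a * (b - r) * dt + sigma * rng.gauss(0.0, math.sqrt(dt))
    return r

rng = random.Random(7)
# Start at 10% with long-run level b = 4%.
ends = [vasicek_endpoint(0.10, 2.0, 0.04, 0.01, 5.0, 500, rng) for _ in range(2000)]
print(sum(ends) / len(ends))  # close to b = 0.04
```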
Ornstein–Uhlenbeck Process. An O-U process is a solution to

dX_t = µX_t dt + σ dW_t

with an initial condition X_0 = x_0.

This SDE is solved by using the integrating factor e^{−µt}. First,

e^{−µt} dX_t = µe^{−µt} X_t dt + σe^{−µt} dW_t

holds just by multiplying the given SDE. Letting f(x, t) = xe^{−µt}, then

∂f/∂t = −µxe^{−µt}, ∂f/∂x = e^{−µt}, and ∂²f/∂x² = 0.
Itˆo’s lemma gives
df(X
t
, t) =
∂f
∂t
(X
t
, t) dt +
∂f
∂x
(X
t
, t) dX
t
+

2
f
∂x
2
(X
t
, t) dX
t
2
d(e
−µt
X
t
) = −µe
−µt
X
t
dt +e
−µt
dX
t
or
e
−µt
dX
t
= d(e
−µt
X
t
) +µe
−µt
X
t
dt .
So the SDE, with integrating factor, rewrites as
d(e
−µt
X
t
) +µe
−µt
X
t
dt = µe
−µt
X
t
dt +σe
−µt
dW
t
d(e
−µt
X
t
) = σe
−µt
dW
t
.
Now integrate both sides from 0 to t:
_
t
0
d(e
−µs
X
s
) =
_
t
0
σe
−µs
dW
s
e
−µt
X
t
−X
0
=
_
t
0
σe
−µs
dW
s
X
t
= e
µt
X
0
+e
µt
_
t
0
σe
−µs
dW
s
.
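Since the stochastic integral on the right has mean 0 (property (3) above), the solution gives E(X_t) = e^{µt} X_0, which a simulation can confirm. A sketch using an Euler discretization (parameter values arbitrary):

```python
import math
import random

def ou_endpoint(x0, mu, sigma, t_end, n_steps, rng):
    """Euler discretization of dX = mu X dt + sigma dW; returns X_{t_end}."""
    dt = t_end / n_steps
    x = x0
    for _ in range(n_steps):
        x += mu * x * dt + sigma * rng.gauss(0.0, math.sqrt(dt))
    return x

rng = random.Random(8)
x0, mu, sigma, t = 1.0, -1.5, 0.5, 2.0
ends = [ou_endpoint(x0, mu, sigma, t, 400, rng) for _ in range(5000)]
est_mean = sum(ends) / len(ends)
print(est_mean, x0 * math.exp(mu * t))  # E(X_t) = e^{mu t} X_0
```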
Let's go back to the stock model

dS_t = µS_t dt + σS_t dW_t

with solution

S_t = S_0 e^{(µ−½σ²)t + σW_t}.

We calculate the pdf of S_t as follows. First, since (µ − ½σ²)t + σW_t is normally distributed with mean (µ − ½σ²)t and variance σ²t,

P(S_t ≤ S) = P(S_0 e^{(µ−½σ²)t + σW_t} ≤ S)
= P(e^{(µ−½σ²)t + σW_t} ≤ S/S_0)
= P((µ − ½σ²)t + σW_t ≤ ln(S/S_0))
= (1/√(2πσ²t)) ∫_{−∞}^{ln(S/S_0)} e^{−(x−(µ−½σ²)t)²/(2σ²t)} dx.
Second, the pdf is the derivative of the above with respect to S. The fundamental theorem of calculus gives us derivatives of the form (d/du) ∫_a^u φ(x) dx = φ(u). The chain rule then gives, where u = g(S),

(d/dS) ∫_a^{g(S)} φ(x) dx = ((d/du) ∫_a^u φ(x) dx)(du/dS) = φ(u) g′(S) = φ(g(S)) g′(S).

In the case at hand, g(S) = ln(S/S_0) = ln S − ln S_0, so g′(S) = 1/S, and the pdf is

(d/dS) P(S_t ≤ S) = (1/(S√(2πσ²t))) e^{−(ln(S/S_0)−(µ−½σ²)t)²/(2σ²t)}

for S > 0, and identically 0 for S ≤ 0. (This is the lognormal density.)
Pricing Derivatives in Continuous Time

Our situation concerns a stock with price the stochastic variable S_t, and a derivative security, assumed to expire at time T, whose (gross) payoff depends only on T and S_T. Let V(S_T, T) denote this payoff. More generally, let V(S_t, t) denote the payoff for this derivative if it expires at time t with stock price S_t. In particular, V(S_t, t) depends only on t and S_t. That is, we have a deterministic function V(s, t), and we are plugging in a stochastic process S_t and getting a stochastic process V(S_t, t).
To price V, construct a self-financing, hedging portfolio for V. That is, set up a portfolio consisting of φ_t units of stock S_t and ψ_t units of bond B_t,

Π_t = φ_t S_t + ψ_t B_t,

such that the terminal value of the portfolio equals the derivative payoff,

Π_T = V_T,

subject to the self-financing condition

(SFC1) Π_t = Π_0 + ∫_0^t φ_s dS_s + ∫_0^t ψ_s dB_s.

The pricing of the portfolio is straightforward, and the no-arbitrage assumption implies Π_t = V_t for all t ≤ T.
Recall, also, the self-financing condition in the discrete case:

(∆Π)_n = φ_n ∆S_n + ψ_n ∆B_n.

The continuous limit is

(SFC2) dΠ_t = φ_t dS_t + ψ_t dB_t.

We now model the stock and bond. The stock is dS_t = µS_t dt + σS_t dW_t, while the bond is dB_t = rB_t dt.
Theorem 4.3.9 (Black–Scholes). Given a derivative as above, there exists a self-financing, hedging portfolio Π_t = φ_t S_t + ψ_t B_t with Π_t = V(S_t, t) if and only if

(BSE) ∂V/∂t + ½σ²S² ∂²V/∂S² + rS ∂V/∂S − rV = 0,

and furthermore, φ_t = ∂V/∂S.
Proof. Let’s calculate dV (S
t
, t) = dΠ
t
two ways. First, by Ito’s lemma
dV (S
t
, t) =
∂V
∂t
(S
t
, t) dt +
∂V
∂S
(S
t
, t) dS
t
+
1
2

2
V
∂S
2
(S
t
, t)(dS
t
)
2
,
or, suppressing the S
t
, t arguments,
dV
t
=
∂V
∂t
dt +
∂V
∂S
dS
t
+
1
2

2
V
∂S
2
(dS
t
)
2
= (
∂V
∂t
+µS
t
∂V
∂S
+
1
2
σ
2
S
t
2 ∂
2
V
∂S
2
) dt +
∂V
∂S
σS
t
dW
t
.
On the other hand, using our stock and bond model on a portfolio, the self
financing condition (SFC2) says that V
t
= Π
t
= φ
t
S
t

t
B
t
,
dV
t
= dΠ
t
= φ
t
dS
t

t
dB
t
= (φ
t
µS
t

t
rB
t
) dt +φ
t
σS
t
dW
t
.
If these are to be equal, the coefficients of dt and of dW_t must be equal. Equating the coefficients of dW_t gives

φ_t σS_t = σS_t ∂V/∂S, hence φ_t = ∂V/∂S,

and equating the coefficients of dt gives

φ_t µS_t + ψ_t rB_t = ∂V/∂t + µS_t ∂V/∂S + ½σ²S_t² ∂²V/∂S²,

hence

ψ_t rB_t = ∂V/∂t + ½σ²S_t² ∂²V/∂S².

Since ψ_t B_t = Π_t − φ_t S_t = V_t − (∂V/∂S) S_t, we have

r(V_t − (∂V/∂S) S_t) = ∂V/∂t + ½σ²S_t² ∂²V/∂S²,

or in other words (BSE), completing the proof.
In summary, to construct a self-financing, replicating portfolio for a derivative, you need the value of the derivative V_t = V(S_t, t) to satisfy the Black–Scholes equation, and you need the stock holding to be φ_t = ∂V/∂S.

The theory of partial differential equations tells us that in order to get unique solutions, we need boundary and temporal conditions. Since the BSE is first order in t, we need one time condition; and since the BSE is second order in S, we need two boundary conditions.
The time condition is simply the expiration payoff V(S_T, T) = h(S_T). (Usually one has an "initial" condition, corresponding to time 0; here one has a "final" or "terminal" condition.) The boundary conditions are usually based on S = 0 and S → ∞.

For example, consider a call. By definition, h(S) = max(S − K, 0) is the payoff of a call at expiry, so V(S_T, T) = max(S_T − K, 0) is the final condition. If the stock price is zero, a call is worthless, hence V(0, t) = 0 is one boundary condition. If the stock price grows arbitrarily large, the call is essentially worth the same as the stock, that is, V(S, t) ∼ S as S → ∞.
The One-Dimensional Heat Equation

Let the temperature of a homogeneous one-dimensional object at position x and time t be given by u(x, t). The heat equation is

∂u/∂t = k ∂²u/∂x²,

where k is a constant, the thermal conductivity of the object's material. Taking k = ½ gives the Fokker–Planck equation for a Wiener path (c = 0, D = ½), with solution

u(x, t) = (1/√(2πt)) e^{−x²/(2t)}.

If we let k = 1, the corresponding solution is

u(x, t) = (1/√(4πt)) e^{−x²/(4t)}.
What is lim_{t→0} u(x, t), the initial temperature distribution? To answer this, we use the following notion.

Definition. A Dirac δ-function is a one-parameter family δ_t, indexed by t > 0, of piecewise smooth functions such that

(1) δ_t(x) ≥ 0 for all t, x,
(2) ∫_{−∞}^{∞} δ_t(x) dx = 1 for all t,
(3) lim_{t→0} δ_t(x) = 0 for all x ≠ 0.

Example. Consider

δ_t(x) = 0 if x < −t or x > t, and δ_t(x) = 1/(2t) if −t ≤ x ≤ t.
Important Example. The solutions to the heat equation

δ_t(x) = u(x, t) = (1/√(4πt)) e^{−x²/(4t)}

are a Dirac δ-function.
Proposition. If f is any bounded smooth function and δ_t any Dirac δ-function, then

lim_{t→0} ∫_{−∞}^{∞} δ_t(x) f(x) dx = f(0),

and more generally,

lim_{t→0} ∫_{−∞}^{∞} δ_t(x − y) f(x) dx = f(y).

Intuitively, this can be seen as follows. Every Dirac δ-function is something like the first example, and on a small interval continuous functions are approximately constant, so

lim_{t→0} ∫_{−∞}^{∞} δ_t(x − y) f(x) dx ≈ lim_{t→0} ∫_{y−t}^{y+t} (1/(2t)) f(x) dx ≈ lim_{t→0} 2t · (1/(2t)) f(y) = f(y).
Proceeding, the heat equation

∂u/∂t = ∂²u/∂x²,

with initial condition u(x, 0) = φ(x) and boundary conditions u(∞, t) = u(−∞, t) = 0, will be solved using

δ_t(x) = (1/√(4πt)) e^{−x²/(4t)},

which is both a Dirac δ-function and a solution to the heat equation for t > 0. Let

v(y, t) = (1/√(4πt)) ∫_{−∞}^{∞} e^{−(x−y)²/(4t)} φ(x) dx for t > 0.
Claim. lim_{t→0} v(y, t) = φ(y). This is just an example of the previous proposition:

lim_{t→0} (1/√(4πt)) ∫_{−∞}^{∞} e^{−(x−y)²/(4t)} φ(x) dx = lim_{t→0} ∫_{−∞}^{∞} δ_t(x − y) φ(x) dx = φ(y).
Claim. v satisfies the heat equation (in the variables y and t). Write K(x − y, t) = (1/√(4πt)) e^{−(x−y)²/(4t)}, which as a function of y and t satisfies the heat equation for t > 0. Then

∂v/∂t = (∂/∂t) ∫_{−∞}^{∞} K(x − y, t) φ(x) dx
= ∫_{−∞}^{∞} (∂/∂t) K(x − y, t) φ(x) dx
= ∫_{−∞}^{∞} (∂²/∂y²) K(x − y, t) φ(x) dx
= (∂²/∂y²) ∫_{−∞}^{∞} K(x − y, t) φ(x) dx = ∂²v/∂y².
Back to the Black–Scholes equation

∂V/∂t + ½σ²S² ∂²V/∂S² + rS ∂V/∂S − rV = 0.

Note that µ, the expected return of S, does not appear explicitly in the equation. This is called risk neutrality. The particulars of the derivative are encoded in the boundary conditions. For a call C(S, t) one has

∂C/∂t + ½σ²S² ∂²C/∂S² + rS ∂C/∂S − rC = 0,
C(S, T) = max(S − K, 0),
C(0, t) = 0 for all t, C(S, t) ∼ S as S → ∞.
And for a put P(S, t) one has

∂P/∂t + ½σ²S² ∂²P/∂S² + rS ∂P/∂S − rP = 0,
P(S, T) = max(K − S, 0),
P(0, t) = Ke^{−r(T−t)} for all t (the sure payoff K, discounted), P(S, t) → 0 as S → ∞.
Black–Scholes Formula

The solution to the call PDE above is

C(S, t) = S N(d₊) − Ke^{−r(T−t)} N(d₋),

where

d₊ = [ln(S/K) + (r + ½σ²)(T − t)] / (σ√(T − t)),
d₋ = [ln(S/K) + (r − ½σ²)(T − t)] / (σ√(T − t)),

and

N(x) = (1/√(2π)) ∫_{−∞}^{x} e^{−s²/2} ds = (1/√(2π)) ∫_{−x}^{∞} e^{−s²/2} ds

is the cumulative normal distribution.
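The formula translates directly into code; math.erf gives N(x) without any special libraries. A sketch (the sample parameters and the put-call-parity put are illustrative additions, not from the notes):

```python
import math

def norm_cdf(x):
    """Cumulative normal distribution N(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(s, k, r, sigma, tau):
    """Black-Scholes call value; tau = T - t is the time to expiry."""
    d_plus = (math.log(s / k) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d_minus = d_plus - sigma * math.sqrt(tau)
    return s * norm_cdf(d_plus) - k * math.exp(-r * tau) * norm_cdf(d_minus)

def bs_put(s, k, r, sigma, tau):
    """Put value via put-call parity: C - P = S - K e^{-r tau}."""
    return bs_call(s, k, r, sigma, tau) - s + k * math.exp(-r * tau)

c = bs_call(100.0, 100.0, 0.05, 0.2, 1.0)
p = bs_put(100.0, 100.0, 0.05, 0.2, 1.0)
print(c, p)  # about 10.45 and 5.57
```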
If you are given data, including the market price for derivatives, and nu-
merically solve the Black-Scholes formula for σ, the resulting value is called
the implied volatility.
We derive the Black-Scholes formula from the Black-Scholes equation, by
using a change of variables to obtain the heat equation, whose solutions we
have identified, at least in a formal way.
The first difference is that the heat equation has an initial condition at t = 0, while the Black–Scholes equation has a final condition at t = T. This can be compensated for by letting τ = T − t; then, in terms of τ, the final condition at t = T is an initial condition at τ = 0.
The second difference is that the Black–Scholes equation has S² ∂²V/∂S² while the heat equation has ∂²u/∂x². If we let S = e^x, or equivalently x = ln S, then

∂/∂x = (∂S/∂x) ∂/∂S = e^x ∂/∂S = S ∂/∂S,

and hence

∂²/∂x² = S ∂/∂S (S ∂/∂S) = S² ∂²/∂S² + S ∂/∂S.
The third difference is that the Black–Scholes equation has ∂V/∂S and V terms, but the heat equation has no ∂u/∂x or u terms. An integrating factor can eliminate such terms, as follows. Suppose we are given

∂u/∂t = ∂²u/∂x² + a ∂u/∂x + bu.

Let

u(x, t) = e^{αx+βt} v(x, t),
with α, β to be chosen later. Then

∂u/∂t = βe^{αx+βt} v + e^{αx+βt} ∂v/∂t,
∂u/∂x = αe^{αx+βt} v + e^{αx+βt} ∂v/∂x,
∂²u/∂x² = α²e^{αx+βt} v + 2αe^{αx+βt} ∂v/∂x + e^{αx+βt} ∂²v/∂x².
So
∂u/∂t = ∂²u/∂x² + a ∂u/∂x + b u
becomes
e^{αx+βt} (βv + ∂v/∂t) = e^{αx+βt} (α²v + 2α ∂v/∂x + ∂²v/∂x²) + a e^{αx+βt} (αv + ∂v/∂x) + b e^{αx+βt} v,
or
βv + ∂v/∂t = α²v + 2α ∂v/∂x + ∂²v/∂x² + a (αv + ∂v/∂x) + b v,
hence,
∂v/∂t = ∂²v/∂x² + (a + 2α) ∂v/∂x + (α² + aα + b − β) v.
With a and b given, one can set a + 2α = 0 and α² + aα + b − β = 0 and solve for α and β. One gets α = −(1/2)a and β = b − α² = b − (1/4)a². In other words, if v(x, t) is a solution to
∂v/∂t = ∂²v/∂x²,
then u(x, t) = e^{−(1/2)ax + (b − (1/4)a²)t} v(x, t) satisfies
∂u/∂t = ∂²u/∂x² + a ∂u/∂x + b u.
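A quick numerical sanity check of this integrating-factor identity (an illustration, not part of the notes): take the particular heat-equation solution v(x, t) = e^{−t} sin x and the arbitrary illustrative values a = 1, b = 2, and compare finite-difference approximations of the two sides of the target equation.

```python
import math

# Illustrative check: with a = 1, b = 2, alpha = -a/2, beta = b - a^2/4,
# and v(x,t) = e^{-t} sin x a solution of v_t = v_xx,
# u = e^{alpha x + beta t} v should satisfy u_t = u_xx + a u_x + b u.
a, b = 1.0, 2.0
alpha, beta = -a / 2, b - a**2 / 4

def v(x, t):                         # solves the heat equation v_t = v_xx
    return math.exp(-t) * math.sin(x)

def u(x, t):
    return math.exp(alpha * x + beta * t) * v(x, t)

def residual(x, t, h=1e-5):
    """|u_t - (u_xx + a u_x + b u)| by central differences; should be ~0."""
    u_t = (u(x, t + h) - u(x, t - h)) / (2 * h)
    u_x = (u(x + h, t) - u(x - h, t)) / (2 * h)
    u_xx = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / h**2
    return abs(u_t - (u_xx + a * u_x + b * u(x, t)))
```

The residual is dominated by finite-difference error, so it is small but not exactly zero.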
With these three differences accounted for, we will now solve the Black-Scholes equation for a European call. We are considering
∂C/∂t + (1/2)σ²S² ∂²C/∂S² + rS ∂C/∂S − rC = 0
with final condition
C(S, T) = max(S − K, 0).
First make the change of variables S = Ke^x, or equivalently, x = ln(S/K) = ln S − ln K, and also let τ = (1/2)σ²(T − t), so C(S, t) = Ku(x, τ). Then
1. ∂C/∂t = (∂C/∂τ)(dτ/dt) = (K ∂u/∂τ)(−(1/2)σ²) = −(1/2)σ²K ∂u/∂τ,
2. ∂C/∂S = (∂C/∂x)(dx/dS) = (K ∂u/∂x)(1/S) = (K/S) ∂u/∂x,
3. ∂²C/∂S² = ∂/∂S((K/S) ∂u/∂x) = −(K/S²) ∂u/∂x + (K/S) ∂/∂S(∂u/∂x)
= −(K/S²) ∂u/∂x + (K/S) (∂/∂x(∂u/∂x)) (∂x/∂S)
= −(K/S²) ∂u/∂x + (K/S²) ∂²u/∂x².
Substituting this into the Black-Scholes equation, we have
−(1/2)σ²K ∂u/∂τ + (1/2)σ²S² (−(K/S²) ∂u/∂x + (K/S²) ∂²u/∂x²) + rS (K/S) ∂u/∂x − rKu = 0
or
(1/2)σ² ∂u/∂τ = (1/2)σ² ∂²u/∂x² − (1/2)σ² ∂u/∂x + r ∂u/∂x − ru,
or
∂u/∂τ = ∂²u/∂x² + (2r/σ² − 1) ∂u/∂x − (2r/σ²) u.
Set k = 2r/σ². The equation becomes
∂u/∂τ = ∂²u/∂x² + (k − 1) ∂u/∂x − ku.
And the final condition C(S, T) = max(S − K, 0) becomes the initial condition Ku(x, 0) = max(Ke^x − K, 0), which becomes u(x, 0) = max(e^x − 1, 0).
As we saw, this reduces to the heat equation if we let u(x, τ) = e^{αx+βτ} v(x, τ), with α = −(1/2)(k − 1) and β = −k − (1/4)(k − 1)² = −(1/4)(k + 1)². And the initial condition u(x, 0) = max(e^x − 1, 0) becomes
e^{−(1/2)(k−1)x} v(x, 0) = max(e^x − 1, 0),
or equivalently,
v(x, 0) = max(e^x e^{(1/2)(k−1)x} − e^{(1/2)(k−1)x}, 0) = max(e^{(1/2)(k+1)x} − e^{(1/2)(k−1)x}, 0).
Recall that the heat equation
∂v/∂τ = ∂²v/∂x², v(x, 0) = φ(x),
is solved by
v(x, τ) = (1/√(4πτ)) ∫_{−∞}^{∞} e^{−(y−x)²/(4τ)} φ(y) dy,
which in this case has φ(y) = max(e^{(1/2)(k+1)y} − e^{(1/2)(k−1)y}, 0).
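As a sanity check (an addition to the notes, with arbitrary illustrative parameter values), one can evaluate this heat-kernel integral numerically, undo the changes of variables, and compare the result with the Black-Scholes closed form.

```python
import math

# Illustrative parameters (time t = 0): S = 110, K = 100, r = 0.05, sigma = 0.2, T = 1.
S, K, r, sigma, T = 110.0, 100.0, 0.05, 0.2, 1.0
k = 2 * r / sigma**2
x = math.log(S / K)
tau = 0.5 * sigma**2 * T
alpha, beta = -0.5 * (k - 1), -0.25 * (k + 1)**2

def phi(y):
    return max(math.exp(0.5 * (k + 1) * y) - math.exp(0.5 * (k - 1) * y), 0.0)

# v(x, tau) = (4 pi tau)^{-1/2} integral of e^{-(y-x)^2/(4 tau)} phi(y) dy (midpoint rule)
n = 20000
lo, hi = x - 12 * math.sqrt(2 * tau), x + 12 * math.sqrt(2 * tau)
dy = (hi - lo) / n
v = sum(math.exp(-((lo + (j + 0.5) * dy) - x)**2 / (4 * tau)) * phi(lo + (j + 0.5) * dy)
        for j in range(n)) * dy / math.sqrt(4 * math.pi * tau)
C_heat = K * math.exp(alpha * x + beta * tau) * v   # undo u = e^{alpha x + beta tau} v, C = K u

# Black-Scholes closed form for comparison
N = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))
d_plus = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
d_minus = d_plus - sigma * math.sqrt(T)
C_formula = S * N(d_plus) - K * math.exp(-r * T) * N(d_minus)
```

The two values agree up to quadrature error, which is exactly what the derivation below establishes analytically.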
Now, e^{(1/2)(k+1)y} > e^{(1/2)(k−1)y} if and only if e^{(1/2)(k+1)y}/e^{(1/2)(k−1)y} > 1, if and only if e^y > 1, if and only if y > 0. So in our situation, we can write
φ(y) = e^{(1/2)(k+1)y} − e^{(1/2)(k−1)y} if y > 0, and φ(y) = 0 if y ≤ 0.
Now make the change of variables s = (y − x)/√(2τ). Then ds = dy/√(2τ), so dy = √(2τ) ds. Then since φ(√(2τ)s + x) = 0 when √(2τ)s + x < 0, that is, when s < −x/√(2τ),
v(x, τ) = (√(2τ)/√(4πτ)) ∫_{−∞}^{∞} e^{−(x−(√(2τ)s+x))²/(4τ)} φ(√(2τ)s + x) ds
= (1/√(2π)) ∫_{−∞}^{∞} e^{−2τs²/(4τ)} φ(√(2τ)s + x) ds
= (1/√(2π)) ∫_{−∞}^{∞} e^{−s²/2} φ(√(2τ)s + x) ds
= (1/√(2π)) ∫_{−x/√(2τ)}^{∞} e^{−s²/2} (e^{(1/2)(k+1)(√(2τ)s+x)} − e^{(1/2)(k−1)(√(2τ)s+x)}) ds
= I₊ − I₋,
where
I± = (1/√(2π)) ∫_{−x/√(2τ)}^{∞} e^{−(1/2)s² + (1/2)(k±1)(√(2τ)s+x)} ds.
We calculate I₊ by completing the square inside the exponent.
I₊ = (1/√(2π)) ∫_{−x/√(2τ)}^{∞} e^{−(1/2)s² + (1/2)(k+1)(√(2τ)s+x)} ds
= (1/√(2π)) e^{(1/2)(k+1)x} ∫_{−x/√(2τ)}^{∞} e^{−(1/2)s² + (1/2)(k+1)√(2τ)s} ds
= (e^{(1/2)(k+1)x}/√(2π)) ∫_{−x/√(2τ)}^{∞} e^{−(1/2)(s − (1/2)(k+1)√(2τ))² + (1/4)(k+1)²τ} ds
= (e^{(1/2)(k+1)x + (1/4)(k+1)²τ}/√(2π)) ∫_{−x/√(2τ)}^{∞} e^{−(1/2)(s − (1/2)(k+1)√(2τ))²} ds
= (e^{(1/2)(k+1)x + (1/4)(k+1)²τ}/√(2π)) ∫_{−x/√(2τ) − (1/2)(k+1)√(2τ)}^{∞} e^{−(1/2)t²} dt
= (e^{(1/2)(k+1)x + (1/4)(k+1)²τ}/√(2π)) ∫_{−∞}^{x/√(2τ) + (1/2)(k+1)√(2τ)} e^{−(1/2)t²} dt
= e^{(1/2)(k+1)x + (1/4)(k+1)²τ} N(d₊) = e^{x − αx − βτ} N(d₊),
using α = −(1/2)(k − 1) and β = −(1/4)(k + 1)², and where d₊ = x/√(2τ) + (1/2)(k + 1)√(2τ). The second to last line was arrived at since
∫_{a}^{∞} e^{−t²/2} dt = ∫_{−∞}^{−a} e^{−t²/2} dt.
Now S = Ke^x and 2τ = σ²(T − t), so
d₊ = x/√(2τ) + (1/2)(k + 1)√(2τ)
= ln(S/K)/(σ√(T − t)) + (1/2)(2r/σ² + 1) σ√(T − t)
= [ln(S/K) + (r + (1/2)σ²)(T − t)]/(σ√(T − t)).
The integral I₋ is treated in the same manner and gives
I₋ = e^{(1/2)(k−1)x + (1/4)(k−1)²τ} N(d₋) = e^{−αx − βτ − kτ} N(d₋),
where d₋ = x/√(2τ) + (1/2)(k − 1)√(2τ) = [ln(S/K) + (r − (1/2)σ²)(T − t)]/(σ√(T − t)). So with v(x, τ) = I₊ − I₋ established, we have, substituting everything back in with x = ln(S/K), τ = (σ²/2)(T − t) and k = 2r/σ²:
C(S, t) = Ku(x, τ) = Ke^{αx+βτ} v(x, τ)
= Ke^{αx+βτ} (I₊ − I₋)
= Ke^{αx+βτ} e^{x−αx−βτ} N(d₊) − Ke^{αx+βτ} e^{−αx−βτ−kτ} N(d₋)
= Ke^x N(d₊) − Ke^{−kτ} N(d₋)
= S N(d₊) − Ke^{−r(T−t)} N(d₋).
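As an independent check of the formula (not part of the derivation), one can simulate S_T under the risk-neutral dynamics dS = rS dt + σS dW, so that S_T = S_0 exp((r − σ²/2)T + σW_T), and compare the discounted expected payoff with the closed form; the parameter values below are illustrative.

```python
import math
import random

# Monte Carlo check: e^{-rT} E[max(S_T - K, 0)] under risk-neutral drift r
# should reproduce C(S_0, 0) = S_0 N(d+) - K e^{-rT} N(d-).
random.seed(0)
S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0
n = 200_000
payoff = 0.0
for _ in range(n):
    WT = random.gauss(0.0, math.sqrt(T))                         # W_T ~ N(0, T)
    ST = S0 * math.exp((r - 0.5 * sigma**2) * T + sigma * WT)
    payoff += max(ST - K, 0.0)
mc_price = math.exp(-r * T) * payoff / n

N = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))
d_plus = (math.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
bs_price = S0 * N(d_plus) - K * math.exp(-r * T) * N(d_plus - sigma * math.sqrt(T))
```

The Monte Carlo estimate agrees with the formula to within sampling error (a few cents at this sample size).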
4.4 Exercises
1. Draw some graphs of the distribution of a Poisson process X_t for various values of t (and λ). What happens as t increases?
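For Exercise 1, tabulating the Poisson pmf is a reasonable substitute for graphs if no plotting library is at hand; the values of λ and t below are arbitrary illustrative choices, and X_t has pmf p(k) = e^{−λt}(λt)^k/k!.

```python
import math

def poisson_pmf(k, mean):
    """pmf of a Poisson random variable with the given mean (= lambda * t)."""
    return math.exp(-mean) * mean**k / math.factorial(k)

lam = 2.0
table = {}
for t in (0.5, 1.0, 4.0):
    # as t increases the mass drifts right and spreads out (mean = variance = lam * t)
    table[t] = [poisson_pmf(k, lam * t) for k in range(15)]
```

Printing the rows of `table` (or plotting them) shows the distribution of X_t shifting right and widening as t grows.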
2. Since for a Poisson process P_{jk}(t) = P_{0,k−j}(t) for all positive t, show that if X_0 = j, then the random variable X_t − j has a Poisson distribution with parameter λt. Then, use the Markov assumption to prove that X_t − X_s has a Poisson distribution with parameter λ(t − s).
3. Let T_m be the continuous random variable that gives the time to the mth jump of a Poisson process with parameter λ. Find the pdf of T_m.
4. For a Poisson process with parameter λ, find P(X_s = m | X_t = n), both for the case
0 ≤ n ≤ m and 0 ≤ t ≤ s
and the case
0 ≤ m ≤ n and 0 ≤ s ≤ t.
5. Verify that the expected value and variance of W_t are as claimed in the text.
6. Memorize the properties of the Wiener process, then close your notes and books and write them down.
7. Verify that the function p(x, t) obtained at the end of the section satisfies the Fokker-Planck equation.
8. Calculate E(e^{λW_t}).
9. Suppose a particle is moving along the real line following a Wiener process. At time 10 you observe that the position of the particle is 3. What is the probability that at time 12 the particle will be found in the interval [5, 7]? (Write down the expression and then use a computer, or a table if you must, to estimate this value.)
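One way to carry out the computation in Exercise 9 (a sketch of one possible solution): by independence of increments, W_12 − W_10 is N(0, 2), so given W_10 = 3 the position W_12 is normal with mean 3 and variance 2, and the answer is Φ((7−3)/√2) − Φ((5−3)/√2).

```python
import math

# P(5 <= W_12 <= 7 | W_10 = 3) for a standard Wiener process:
# W_12 | W_10 = 3 is N(3, 2), so standardize with sqrt(2).
Phi = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))   # standard normal cdf
p = Phi((7 - 3) / math.sqrt(2)) - Phi((5 - 3) / math.sqrt(2))
```

The value comes out to roughly 0.076.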
10. On a computer (or a calculator), find the smallest value of n = 1, 2, 3, . . . such that P(|W_1| ≥ n) is calculated as 0.
11. What is E(W_s² W_t)? You must break up the cases s < t and s > t.
The following three problems provide justification for the Ito calculus rule (dW_s)² = ds. These problems are very similar to the calculation of the Ito integral ∫ W_t dW_t. Recall how the Ito integral ∫_{t_0}^{t} φ_s dX_s is defined for a process φ_t adapted to X_t. First, for a partition T = {t_0 < t_1 < t_2 < · · · < t_n = t} we define the Ito sum
I(φ_s, T, dX_s) = Σ_{j=1}^{n} φ_{t_{j−1}} ΔX_{t_j},
where ΔX_{t_j} = X_{t_j} − X_{t_{j−1}}. Then one forms the limit as the mesh size of the partitions goes to zero:
∫_{t_0}^{t} φ_s dX_s = lim_{mesh T → 0} I(φ_s, T, dX_s).
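The rule (dW_s)² = ds that these problems justify can also be seen in simulation. The following sketch (an illustration, not a required part of the exercises) takes φ_s = W_s, an adapted process, and compares the two Ito sums Σ φ_{t_{j−1}} (ΔW_{t_j})² and Σ φ_{t_{j−1}} Δt_j on one fine uniform partition of [0, 1]:

```python
import math
import random

random.seed(1)

def ito_sums(n, t=1.0):
    """Return (sum phi (dW)^2, sum phi dt) with phi_s = W_s on a uniform n-step partition."""
    dt = t / n
    W = 0.0
    s_dW2 = 0.0   # Ito sum against (dW)^2
    s_dt = 0.0    # Ito sum against dt
    for _ in range(n):
        dW = random.gauss(0.0, math.sqrt(dt))
        s_dW2 += W * dW**2      # phi_{t_{j-1}} (Delta W_{t_j})^2
        s_dt += W * dt          # phi_{t_{j-1}} Delta t_j
        W += dW
    return s_dW2, s_dt

a, b = ito_sums(200_000)
```

By the mean-square estimate of Exercise 13, the difference of the two sums has variance of order (Δt)², so it vanishes as the mesh shrinks.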
12. Let W_t be the Wiener process.
(a) Show E((ΔW_{t_i})²) = Δt_i.
(b) Show E(((ΔW_{t_i})² − Δt_i)²) = 2(Δt_i)².
13. For a partition T as above, calculate the mean square difference, which is by definition
E((I(φ_s, T, (dW_s)²) − I(φ_s, T, dt))²)
= E((Σ_{j=1}^{n} φ_{t_{j−1}} (ΔW_{t_j})² − Σ_{j=1}^{n} φ_{t_{j−1}} Δt_j)²),
and show that it equals
2 Σ_{j=1}^{n} E(φ²_{t_{j−1}}) (Δt_j)².
14. Show that for an adapted process φ_s such that E(φ_s²) ≤ M for some M > 0 and for all s, the limit of this expression as mesh T → 0 is 0, and thus that for any such adapted φ_s,
∫_{t_0}^{t} φ_s (dW_s)² = ∫_{t_0}^{t} φ_s ds.
15. Calculate
∫_{0}^{t} W_s^n dW_s.
16. (a) Solve the stock model SDE
dS_t = μS_t dt + σS_t dW_t.
(b) Find the pdf of S_t.
(c) If μ = .1 and σ = .25 and S_0 = 100, what is
P(S_1 ≥ 110)?
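A numerical sketch for part (c), using the solution S_t = S_0 exp((μ − σ²/2)t + σW_t) from part (a), so that ln S_1 is normal with mean ln S_0 + μ − σ²/2 and standard deviation σ:

```python
import math

# P(S_1 >= 110) with mu = .1, sigma = .25, S_0 = 100:
# ln S_1 ~ N(ln S_0 + mu - sigma^2/2, sigma^2), so standardize ln 110.
mu, sigma, S0 = 0.1, 0.25, 100.0
m = math.log(S0) + mu - 0.5 * sigma**2
z = (math.log(110.0) - m) / sigma
p = 0.5 * math.erfc(z / math.sqrt(2))   # = 1 - Phi(z)
```

The probability comes out just under one half, since the median of S_1 is 100 e^{0.06875} ≈ 107, slightly below 110.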
17. What is d(sin(W_t))?
18. Solve the Vasicek model for interest rates
dr_t = a(b − r_t) dt + σ dW_t
subject to the initial condition r_0 = c. (Hint: You will need to use a multiplier similar to the one for the Ornstein-Uhlenbeck process.)
19. This problem is long and has many parts. If you cannot do one part, go on to the next.
In this problem, you will show how to synthesize any derivative (whose payoff depends only on the final value of the stock) in terms of calls with different strikes. This allows us to express the value of any derivative in terms of the Black-Scholes formula.
(a) Consider the following functions. Fix a number h > 0 and define
δ_h(x) = 0 if |x| ≥ h,  (x + h)/h² if x ∈ [−h, 0],  (h − x)/h² if x ∈ [0, h].   (4.4.1)
Show that the collection of functions δ_h(x) for h > 0 forms a δ-function. Recall that this implies that for any continuous function f,
f(y) = lim_{h→0} ∫_{−∞}^{∞} f(x) δ_h(x − y) dx.
(b) Consider a non-dividend-paying stock S. Suppose European calls on S with expiration T are available with arbitrary strike prices K. Let C(S, t, K) denote the value at time t of the call with strike K. Thus, for example, C(S, T, K) = max(S − K, 0). Show
δ_h(S − K) = (1/h²)(C(S, T, K + h) − 2C(S, T, K) + C(S, T, K − h)).
(c) Let h(S) denote the payoff function of some derivative. Then we can synthesize the derivative by
h(S) = lim_{h→0} ∫_{0}^{∞} h(K) δ_h(S − K) dK.
If V(S, t) denotes the value at time t of the derivative with payoff function h if the stock price is S (at time t), then why is
V(S, t) = lim_{h→0} ∫_{0}^{∞} h(K) (1/h²)(C(S, t, K + h) − 2C(S, t, K) + C(S, t, K − h)) dK?
(d) Using Taylor’s theorem, show that
lim_{h→0} (1/h²)(C(S, t, K + h) − 2C(S, t, K) + C(S, t, K − h)) = ∂²C(S, t, K)/∂K².
(e) Show that V(S, t) is equal to
h(K) ∂C(S, t, K)/∂K |_{K=0}^{∞} − h′(K) C(S, t, K) |_{K=0}^{∞} + ∫_{0}^{∞} h″(K) C(S, t, K) dK.
(Hint: Integrate by parts.)
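The tent functions of part (a) and the butterfly combination of calls in part (b) can be checked numerically; the sketch below (with arbitrary illustrative values of h, S and K) verifies that δ_h integrates to 1 and that it agrees with the butterfly payoff at expiry.

```python
def delta_h(x, h):
    """The tent function of (4.4.1): supported on [-h, h], peak 1/h at 0."""
    if abs(x) >= h:
        return 0.0
    return (x + h) / h**2 if x <= 0 else (h - x) / h**2

def butterfly(S, K, h):
    """Expiry payoff of the butterfly (C(K+h) - 2C(K) + C(K-h)) / h^2."""
    call = lambda k: max(S - k, 0.0)   # call payoff at expiry
    return (call(K + h) - 2 * call(K) + call(K - h)) / h**2

h = 0.25
# integral of delta_h over [-h, h] by the midpoint rule; should be 1
n = 10000
dx = 2 * h / n
total = sum(delta_h(-h + (j + 0.5) * dx, h) for j in range(n)) * dx
```

Since δ_h(S − K) equals the butterfly payoff, the family of butterflies concentrates all payoff near S = K as h → 0, which is the mechanism behind parts (c)-(e).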
This set can be a discrete set (such as the set of 5-card poker hands. 2. for j = 1. then ∞ Ej ∈ Ω j=1 and ∞ Ej ∈ Ω j=1 In addition. that is a set S of “outcomes” for some experiment. A sigma-algebra (or σ-algebra) of subsets of S. 2. .Chapter 1 Basic Probability The basic concept in probability theory is that of a random variable. This is the set of all “basic” things that can happen. 3. That is when Ej ∈ Ω. contains S itself. and is closed under countable intersections and countable unions. To define a probability space one needs three ingredients: 1. . etc). if E ∈ Ω then its complement is too. is a sequence of subsets in Ω. This means a set Ω of subsets of S (so Ω itself is a subset of the power set P(S) of all subsets of S) that contains the empty set. or the possible outcomes of rolling two dice) or it can be a continuous set (such as an interval of the real number line for measuring temperature. . A random variable is a function of the basic outcomes in a probability space. that is we assume Ec ∈ Ω 7 . A sample space.

General probability spaces are a bit abstract and can be hard to deal with. and one usually restricts attention to the set of subsets that can be obtained by starting with all open intervals and taking intersections and unions (countable) of them – the so-called Borel sets [or more generally. is a countable (finite or infinite) collection of disjoint sets (i. These axioms imply that if Ac is the complement of A. In fact. Ai ∩ Aj = ∅ for all i different from j). then P (Ac ) = 1 − P (A). we will most often call them events. BASIC PROBABILITY The above definition is not central to our approach of probability but it is an essential part of the measure theoretic foundations of probability theory. So we include the definition for completeness and in case you come across the term in the future. and the principle of inclusion and exclusion: P (A ∪ B) = P (A) + P (B) − P (A ∩ B). For a subset A ⊂ R let’s define X −1 (A) to be the following subset of S: X −1 (A) = {s ∈ S|X(s) ∈ A} It may take a while to get used to what X −1 (A) means. . As a technical point (that you should probably ignore). is to bring things back to something we are more familiar with. The function P must have the properties that: (a) P (S) = 1. .8 CHAPTER 1. then P (∪Ai )) = P (Ai ). P (∅) = 0. a random variable is a function X on a probability space (S. Ω. P ) is a function that to every outcome s ∈ S gives a real number X(s) ∈ R.e. Ω is the collection of sets to which we will assign probabilities and you should think of them as the possible events. then Ω is often taken to be all subsets of S. When the basic set S is finite (or countably infinite). Lebesgue measurable subsets of S). when S is a continuous subset of the real line. 3. but do not think of X −1 as a function. A function P from Ω to the real numbers that assigns probabilities to events. In order for X −1 to be a function we would need X to be . (b) 0 ≤ P (A) ≤ 1 for every A ∈ Ω (c) If Ai . even if A and B are not disjoint. 
As mentioned above. One of the purposes of random variables. . i = 1. 2. 3.. this is not possible.

Usually. and we can assign a probability to any subset of the sample space. p(k) is called the probability density function (or pdf for short) of X. the value p(k) represents the probability that the event {X = k} occurs. So we have in some sense transferred the probability function to the real line. Proposition 1.0. rolls of a single die. If for A = (a. and should really be part of the definition of random variable. it is the probability density that we deal with and mention of (S. but not always. 1.) then we can define PX (A) = P (X −1 (A)).e.0. Ω is the set of all subsets of S.. The point of this definition is that if A ⊂ R then X −1 (A) ⊂ S. X −1 (A) ∈ Ω (this is a techinical assumption on X.2 Finite Discrete Random Variables 1.9 one-one. for discrete random variables. PX is called the probability density of the random variable X. b) ⊂ R an inverval. that is X −1 (A) is the event that X of the outcome will be in A. 1. Uniform distribution – This is often encountered. the values of a discrete random variable X are (subset of the) integers. coin flips. P ) is omitted. 1] that has the property that ∞ p(k) = 1 k=−∞ defines a discrete probability distribution. It will be useful to write p(k) = P ({X = k}) — set p(k) = 0 for all numbers not in the sample space.. X −1 (A) is just a subset of S as defined above.g. We repeat. other games of chance: S is the set of whole numbers from a to b (this is a set with b − a + 1 elements!). such as when you are counting something. P ({X = k}) for all k.1 Discrete Random Variables These take on only isolated (discrete) values. i. called measurability.1. So any function from the integers to the (real) interval [0. and P is defined by giving its values on all sets consisting of one element each (since then the rule for disjoint unions takes over . They are lurking somewhere in the background. In practice. as soon as we know the probability of any set containing one element. Ω.0. e. PX is a probability function on R. 
which we are not assuming.

6 and f (k) = 0 otherwise. Each of these outcomes has the same probability. what is the probability that k of them come up heads? Or do a sample in a population that favors the Democrat over the Republican 60 percent to 40 percent. 2. . the possible outcomes of rolling one die are {1}.3 Infinite Discrete Random Variables 3. 1. we find for example that P ({1. Poisson distribution (with parameter α) – this arises as the number of (random) events of some kind (such as people lining up at a bank. . number of people favoring the Republican [n=100 in this case]). namely 1/6. We can express this by making a table. P ({2. Using the disjoint union rule. As before. how many come up heads? i.10 CHAPTER 1. “Uniform” means that the same value is assigned to each one-element set. Since P (S) = 1. or specifying a function f (k) = 1/6 for all k = 1. 1.. 1.. 2. 2.. The function P is more interesting this time: P ({k}) = where n k n k 2n is the binomial coefficient n n! = k!(n − k)! k which equals the number of subsets of an n-element set that have exactly k elements. {3}. etc.. The sample space S is the set of all nonnegative integers S = 0.. For example. 4. 2. 3.0. or Geiger-counter clicks. What is the probability that in a sample of size 100. Binomial distribution – flip n fair coins. BASIC PROBABILITY to calculate the probability on other sets).. the value that must be assigned to each one element set is 1/(b − a + 1).n since these are the possible outcomes (number of heads. {4}. .. more than 45 will favor the Republican? The sample space S is 0.. {2}. {5} and {6}. or telephone calls arriving) per unit time.. 5. 3. the sigma algebra Ω is the set of all subsets of S.e. 5}) = 1/2. 3}) = 1/3. 2.

in the continuous case. Thus. b] = a 1 P ([x. Recall that in the discrete case. since we will have ∞ P (S) = P (∪∞ {k}) k=1 = k=1 e −α α k ∞ k! =e −α k=1 αk = e−α eα = 1 k! 1. either the collection of Borel sets ((sets that can be obtained by taking countable unions and intersections of intervals)) or Lebesgue-measurable sets ((Borels plus a some other exotic sets)) comprise the set Ω. and whose integral over the whole sample space (we can use the whole real line if we assign the value p(x) = 0 for points x outside the sample space) is . and for technical reasons. the probability of a single outcome is always 0.4 Continuous Random Variables A continuous random variable can take on only any real values. so-called pdf’s. These subsets. However. b]) for all a and b. P ([a. We think of dx as a very small number and then taking its limit at 0. The probability function on Ω is derived from: αk P ({k}) = e−α k! Note that this is an honest probability function.11 and again Ω is the set of all subsets of S. x + dx]) dx→0 dx b p(x)dx (1.0.1) This is really the defining equation of the continuous pdf and I can not stress too strongly how much you need to use this.0. In the continuous case. the probabilities were determined once we knew the probabilities of individual outcomes. Any function p(x) that takes on only positive values (they don’t have to be between 0 and 1 though). P ({x}) = 0. i. such as when you are measuring something. That is P was determined by p(k). we can only assign a probability to certain subsets of the sample space (but there are a lot of them). it is enough to know the probabilities of “very small” intervals of the form [x. the pdf is p(x) = lim so that P [a.. we can calculate the probabililty of any Borel set.e. x + dx]. and we can calculate continuous probabilities as integrals of “probability density functions”. As soon as we know the probability of any interval. Usually the values are (subset of the) reals.

x + dx] are equally likely as long as dx is fixed and only x varies. BASIC PROBABILITY equal to 1 can be a pdf. b] and all variables are equally likely. It is easy to calculate that if a < r < s < b. then p([r. But to begin. More generally. b]. In other words. b] is: b p([a. In this case. That means that p(x) should be a constant for x between a and b. b]) = a p(x) dx So any nonnegative function on the real numbers that has the property that ∞ p(x) dx = 1 −∞ defines a continuous probability distribution. Uniform distribution As with the discrete uniform distribution. all small intervals [x. The sample space for the normal distribution is always the entire real line. we need the constant to be 1/(b − a). What constant? Well. we need to calculate an integral: ∞ I= −∞ e−x dx 2 We will use a trick that goes back (at least) to Liouville: Now note that . s]) = s−r b−a The Normal Distribution (or Gaussian distribution) This is the most important probability distribution. we have the probability of the set (interval) [a. and zero outside the interval [a. we have (for small dx) that p(x)dx represents (approximately) the probability of the set (interval) [x. the variable takes on values in some interval [a. to have the integral of p(x) come out to be 1. x + dx] (with error that goes to zero faster than dx does). because the distribution of the average of the results of repeated experiments always approaches a normal distribution (this is the “central limit theorem”).12 CHAPTER 1.

we define the normal distribution with parameters µ and σ to be −(x−µ)2 1 x−µ 1 p(x) = N ( )= √ e 2σ2 σ σ 2πσ Exponential distribution (with parameter α) This arises when measuring waiting times until an event. More generally. I = π. and for reasons that will become apparent later. For this distribution. so p(x) defines a bona fide probability density function. we’ll evaluate the integral in polar coordinates (!!) – note that over the whole plane. or time-to-failure in reliability studies.0.0. we arrange this as follows: define 1 2 N (x) = √ e−x /2 2π Then N (x) defines a probability distribution. ∞) (or we can just let p(x) = 0 for x < 0). and dxdy becomes rdrdθ: 2π ∞ 0 I2 = 0 e−r rdrdθ = 0 2 2π 1 −r2 ∞ e |0 dθ = 2 2π 0 1 dθ = π 2 √ Therefore.0. We need to arrange things so that the integral is 1.13 ∞ I 2 = −∞ ∞ e −x2 ∞ dx · −∞ ∞ −∞ e−x dx e−y dy 2 2 (1. .4) = = e−x dx · −∞ ∞ ∞ 2 e−x −∞ −∞ 2 −y 2 dxdy because we can certainly change the name of the variable in the second integral. The probability density function is given by p(x) = αe−αx . Now (the critical step). It is easy to check that the integral of p(x) from 0 to infinity is equal to 1. and then we can convert the product of single integrals into a double integral. called the standard normal distribution.3) (1. r goes from 0 to infinity as θ goes from 0 to 2π. the sample space is the positive part of the real line [0.2) (1.

1) + (3)(0. Therefore.6).6.1 Expectation of a Random Variable The expectation of a random variable is essentially the average value it is expected to take on. This reasoning leads to the defining formula: E(X) = x∈S xp(x) for any discrete random variable. where the weights are just the probabilities of the outcomes.1. p(2) = 0. (1)(0. Uniform discrete: We’ll need to use the formula for k that you . x + dx] and calculating the probability p(x)dx that X fall s into the interval. BASIC PROBABILITY 1. it is calculated as the weighted average of the possible outcomes of the random variable.3. This is the formula for the expectation of a continuous random variable. By reasoning similar to the previous paragraph. As a trivial example. 10 of X = 2 and 60 of X = 3. In other words.3.3) + (2)(0. Example 1.14 CHAPTER 1. the situation is similar. consider the (discrete) random variable X whose sample space is the set {1. except the sum is replaced by an integral (think of summing up the average values of x by dividing the sample space into small intervals [x. The notation E(X) for the expectation of X is standard. 2. 3} with probability function given by p(1) = 0. If we repeated this experiment 100 times.1 and p(3) = 0. For continuous random variables. : 1. we would expect to get about 30 occurrences of X = 1. The average X would then be ((30)(1) + (10)(2) + (60)(3))/100 = 2. also in use is the notation X . the expectation should be E(X) = lim dx→0 xp(x)dx = x∈S S xp(x)dx.1.

EXPECTATION OF A RANDOM VARIABLE learned in freshman calculus when you evaluated Riemann sums: b 15 E(X) = k=a k· 1 b−a+1 b a−1 (1.1. right?).1) 1 = b−a+1 = k− k=0 k=0 k (1.8) 3.1.1.1.7) (1. Poisson distribution with parameter α: Before we do this.4) 2b−a+1 1 1 = (b − a + 1)(b + a) (1.2) b(b + 1) (a − 1)a 1 − (1.1.3) b−a+1 2 2 1 1 = (b2 + b − a2 + a) (1. 2.1.1.6) 2 which is what we should have expected from the uniform distribution.1.1. This is easier: E(X) = 1 dx b−a a 1 b 2 − a2 b+a = = b−a 2 2 x b (1. recall the Taylor series formula for the exponential function: ∞ e = k=0 x xk k! Note that we can take the derivative of both sides to get the formula: ∞ e = k=0 x kxk−1 k! If we multiply both sides of this formula by x we get ∞ xe = k=0 x kxk k! .5) 2b−a+1 b+a = (1. Uniform continuous: (We expect to get (b + a)/2 again.1.

} of non-negative integers and. BASIC PROBABILITY We will use this formula with x replaced by α. 4. This is a little like the Poisson calculation (with improper integrals instead of series). 6. it is pretty easy to conclude that the expectation of the binomial distribution (with parameter n) is n/2. and the expectation of the normal distribution (with parameters µ and σ) is µ. and 1 dv = e−αx dx so that v will be − α e−αx ): ∞ E(X) = 0 1 1 1 xαe−αx dx = α − xe−αx − 2 e−αx |∞ = . 0 α α α Note the difference between the expectations of the Poisson and exponential distributions!! 5. 3. Geometric random variable (with parameter r): It is defined on the sample space {0. given a fixed value of r between 0 and 1. Exponential distribution with parameter α. 1. and we will have to integrate by parts (we’ll use u = x so du = dx. .. If X is a discrete random variable with a Poisson distribution. We can compute the expectation of a random variable X having a geometric distribution with parameter r as follows (the trick is reminiscent of what we used on the Poisson distribution): ∞ ∞ E(X) = k=0 k(1 − r)r = (1 − r)r k=0 k krk−1 . By the symmetry of the respective distributions around their “centers”. 2. then its expectation is: e−α αk E(X) = = e−α k k! k=0 ∞ ∞ k=0 kαk k! = e−α (αeα ) = α.. has probability function given by p(k) = (1 − r)rk The geometric series formula: ∞ ark = k=0 a 1−r shows that p(k) is a probability function.16 CHAPTER 1.

y] (actually (−∞. The other is to make the change of variables y = x2 in . we have that 6 E(X) = 2 π ∞ k=1 k 6 = 2 2 k π ∞ k=1 1 =∞ k So the expectation of this random variable is infinite (Can you interpret this in terms of an experiment whose outcome is a random variable with this distribution?) If X is a random variable (discrete or continuous). it is possible to define related random variables by taking various functions of X. b] is the same as the probability that in √ X is in the interval [ a. y]. We calculate: √ d p(y) = dy y e−x dx = 0 √ √ √ d d 1 y = (1 − e− y ) = √ e− y −e−x |0 dy dy 2 y There are two ways to calculate the expectation of Y . Since X can only be positive. where h(y) = the probability that Y is in the interval [0. then the probability of Y being in some set A is defined to be the probability of X being in the set f −1 (A). p(y) is the integral of the function h(y). consider an exponentially distributed random variable X with parameter α=1.} by the function p(k) = π26k2 . y]) is the integral of p(y) from 0 to y. 3. The first is obvious: we can integrate yp(y). Let Y = X 2 .1. EXPECTATION OF A RANDOM VARIABLE = (1 − r)r d dr ∞ 17 rk = (1 − r)r k=0 r d 1 = dr 1 − r 1−r 7.1.. Another distribution defined on the positive integers {1. In other words. y]. This is a probability function based on the famous formula due to Euler : ∞ k=1 1 π2 = k2 6 But an interesting thing about this distribution concerns its expectation: Because the harmonic series diverges to infinity. b]. 2. As an example. If Y = f (X) for some function f . We can calculate the probability density function p(y) of Y by recalling that the probability that Y is in the interval [0. . the probability that Y is√ the interval [a. say X 2 or sin(X) or whatever.. But h(y) √ is the same as the probability that X is in the interval [0.

12 . Uniform discrete: If X has a discrete uniform distribution on the interval [a. then recall that E(X) = (b + a)/2. As an example. In particular. b]. In particular the first moment of X is its expectation. then having a finite sth moment is a more restrictive condition than having and rth one (this is a convergence issue as x approaches infinity. (Law of the unconscious statistician) If f (x) is any function. since xr > xs for large values of x.18 CHAPTER 1.1. We calculate Var(X) as follows: b+a Var(X) = E(X )− 2 2 2 b = k=a k2 b+a − b−a+1 2 2 = 1 (b−a+2)(b−a).2. or the probability density function of X if X is discrete. A more useful set of moments is called the set of central moments. BASIC PROBABILITY this integral. 1. which will yield (Check this!) that the expectation of Y = X 2 is ∞ x2 e−x dx E(Y ) = E(X 2 ) = 0 Theorem 1. we compute the variance of the uniform and exponential distributions: 1. It is a useful exercise to work out that σ 2 (X) = E((X − E(X))2 ) = E(X 2 ) − (E(X))2 . (Its square root is the standard deviation. These are defined to be the rth moments of the variable X − E(X).) It is a crude measure of the extent to which the distribution of X is spread out from its expectation value. the second moment of X − E(X) is called the variance of X and is denoted σ 2 (X). If s > r. then the expectation of the function f (X) of the random variable X is E(f (X)) = f (x)p(x)dx or x f (x)p(x) where p(x) is the probability density function of X if X is continuous.2 Variance and higher moments We are now lead more or less naturally to a discussion of the “moments” of a random variable. The rth moment of X is defined to be the expected value of X r .

. y) = 1 x∈S y∈T defines a (joint) probability function on S × T . Any function p(x.1. 6}. Thus: ∞ Var(x) = 0 x2 αe−αx dx − 1 1 = 2 2 α α The variance decreases as α increases. in the case of two dice. More generally. 5. it is common to consider each as a random variable in its own right. When there are several numbers like this. recall that E(X) = 1/α. y)dxdy = 1 −∞ −∞ . y. and p(x. y) in the cartesian product S×S. y) = 1/36 for each (x. where X ranges over S and Y over T . . For example.3. 4. 2. 12 3. For continuous functions. b].). we can consider discrete probability functions p(x. p(x. then its variance is calculated as follows: b σ (X) = a 2 x2 dx − b−a b+a 2 2 = 1 (b − a)2 . y) is non-negative and satisfies ∞ ∞ p(x. y) is the joint pdf of X and Y if p(x. z. 1. y) and such that p(x.. BIVARIATE DISTRIBUTIONS 19 2. the idea is the same. y) is between 0 and 1 for all (x. Exponential with parameter α: If X is exponentially distributed with parameter α. X and Y are discrete random variables each with sample space S = {1. this agrees with intuition gained from the graphs of exponential distributions shown last time. y) such that p(x.3 Bivariate distributions It is often the case that a random experiment yields several measurements (we can think of the collection of measurements as a random vector) – one simple example would be the numbers on the top of two thrown dice. y) on sets of the form S × T . 3. Uniform continuous: If X has a continuous uniform distribution on the interval [a. and to form the joint probability density (or joint probability) function p(x.

For discrete variables, we let p(i, j) be the probability that X = i and Y = j, i.e. P(X = i ∩ Y = j). This gives a function p, the joint probability function of X and Y, that is defined on (some subset of) the set of pairs of integers and such that 0 ≤ p(i, j) ≤ 1 for all i and j and

Σ_{i=−∞}^{∞} Σ_{j=−∞}^{∞} p(i, j) = 1.

When we find it convenient to do so, we will set p(i, j) = 0 for all i and j outside the domain we are considering.

For continuous variables, we define the joint probability density function p(x, y) on (some subset of) the plane of pairs of real numbers. We interpret the function as follows: p(x, y) dx dy is (approximately) the probability that X is between x and x + dx and Y is between y and y + dy (with error that goes to zero faster than dx and dy as they both go to zero). Thus, p(x, y) must be a non-negative valued function with the property that

∫_{−∞}^{∞} ∫_{−∞}^{∞} p(x, y) dx dy = 1.

As with discrete variables, if our random variables always lie in some subset of the plane, we will define p(x, y) to be 0 for all (x, y) outside that subset.

We take one simple example of each kind of random variable. For the discrete random variable, we consider the roll of a pair of dice. We assume that we can tell the dice apart, so there are thirty-six possible outcomes and each is equally likely. Thus our joint probability function will be p(i, j) = 1/36 if 1 ≤ i ≤ 6 and 1 ≤ j ≤ 6, and p(i, j) = 0 otherwise.

We will use the following example of a joint pdf throughout the rest of this section: X and Y will be random variables that can take on values (x, y) in the triangle with vertices (0, 0), (2, 0) and (2, 2). The joint pdf of X and Y will be given by p(x, y) = 1/(2x) if (x, y) is in the triangle and 0 otherwise. To see that this is a probability density function, we need to integrate p(x, y) over the triangle and get 1:

∫_0^2 ∫_0^x 1/(2x) dy dx = ∫_0^2 [ y/(2x) ]_{y=0}^{y=x} dx = ∫_0^2 1/2 dx = 1.
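The closed-form integral over the triangle can be cross-checked with a crude midpoint rule. A Python sketch (the grid sizes are arbitrary choices):

```python
# Numerically integrate p(x, y) = 1/(2x) over the triangle with
# vertices (0,0), (2,0), (2,2); the result should be 1.
def integrate_triangle(n=200):
    dx = 2.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * dx      # midpoint of the i-th x strip
        m = i + 1               # subdivide [0, x] into m cells in y
        dy = x / m
        for j in range(m):
            total += (1.0 / (2.0 * x)) * dx * dy  # p(x, y) * area
    return total

assert abs(integrate_triangle() - 1.0) < 1e-9
```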

For our continuous example, we take the example mentioned at the end of the last lecture:

p(x, y) = 1/(2x) for (x, y) in the triangle with vertices (0, 0), (2, 0) and (2, 2), and p(x, y) = 0 otherwise.

We checked last time that this is a probability density function (its integral is 1).

1.4 Marginal distributions

Often when confronted with the joint probability of two random variables, we wish to restrict our attention to the value of just one or the other. We can calculate the probability distribution of each variable separately in a straightforward way, if we simply remember how to interpret probability functions. These separated probability distributions are called the marginal distributions of the respective individual random variables.

Given the joint probability function p(i, j) of the discrete variables X and Y, we will show how to calculate the marginal distributions pX(i) of X and pY(j) of Y. To calculate pX(i), we recall that pX(i) is the probability that X = i. It is certainly equal to the probability that X = i and Y = 0, or X = i and Y = 1, or .... In other words, the event X = i is the union of the events X = i and Y = j as j runs over all possible values. Since these events are disjoint, the probability of their union is the sum of the probabilities of the events (namely, the sum of p(i, j)). Thus:

pX(i) = Σ_{j=−∞}^{∞} p(i, j).

Likewise,

pY(j) = Σ_{i=−∞}^{∞} p(i, j).

Make sure you understand the reasoning behind these two formulas! An example of the use of this formula is provided by the roll of two dice discussed above. Each of the 36 possible rolls has probability 1/36 of occurring.
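The two marginal formulas are easy to compute mechanically from a joint probability function. A sketch, using the two-dice example (exact rational arithmetic via `fractions`):

```python
from fractions import Fraction

# joint probability function of the two dice: p(i, j) = 1/36
p = {(i, j): Fraction(1, 36) for i in range(1, 7) for j in range(1, 7)}

def marginals(p):
    pX, pY = {}, {}
    for (i, j), prob in p.items():
        pX[i] = pX.get(i, 0) + prob   # sum over j
        pY[j] = pY.get(j, 0) + prob   # sum over i
    return pX, pY

pX, pY = marginals(p)
assert all(pX[i] == Fraction(1, 6) for i in range(1, 7))
assert sum(pY.values()) == 1
```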

Thus we have the probability function p(i, j) as indicated in the following table:

 i\j      1      2      3      4      5      6    pX(i)
  1     1/36   1/36   1/36   1/36   1/36   1/36    1/6
  2     1/36   1/36   1/36   1/36   1/36   1/36    1/6
  3     1/36   1/36   1/36   1/36   1/36   1/36    1/6
  4     1/36   1/36   1/36   1/36   1/36   1/36    1/6
  5     1/36   1/36   1/36   1/36   1/36   1/36    1/6
  6     1/36   1/36   1/36   1/36   1/36   1/36    1/6
 pY(j)   1/6    1/6    1/6    1/6    1/6    1/6

The marginal probability distributions are given in the last column and last row of the table. They are the probabilities for the outcomes of the first (resp. second) of the dice, and are obtained either by common sense or by adding across the rows (resp. down the columns).

For continuous random variables, the situation is similar. Given the joint probability density function p(x, y) of a bivariate distribution of the two random variables X and Y (where p(x, y) is positive on the actual sample space subset of the plane, and zero outside it), we wish to calculate the marginal probability density functions of X and Y. To do this, recall that pX(x) dx is (approximately) the probability that X is between x and x + dx. So to calculate this probability, we should sum all the probabilities that both X is in [x, x + dx] and Y is in [y, y + dy], over all possible values of Y. In the limit as dy approaches zero, this becomes an integral:

pX(x) dx = ( ∫_{−∞}^{∞} p(x, y) dy ) dx.

In other words,

pX(x) = ∫_{−∞}^{∞} p(x, y) dy.

Similarly,

pY(y) = ∫_{−∞}^{∞} p(x, y) dx.

Again, you should make sure you understand the intuition and the reasoning behind these important formulas. We return to our example.

Recall that p(x, y) = 1/(2x) for (x, y) in the triangle with vertices (0, 0), (2, 0) and (2, 2), and p(x, y) = 0 otherwise; we compute its marginal density functions. The easy one is pX(x), so we do that one first. Note that for a given value of x between 0 and 2, y ranges from 0 to x inside the triangle:

pX(x) = ∫_{−∞}^{∞} p(x, y) dy = ∫_0^x 1/(2x) dy = [ y/(2x) ]_{y=0}^{y=x} = 1/2

if 0 ≤ x ≤ 2, and pX(x) = 0 otherwise. This indicates that the values of X are uniformly distributed over the interval from 0 to 2 (this agrees with the intuition that the random points occur with greater density toward the left side of the triangle, but there is more area on the right side to balance this out).

To calculate pY(y), we begin with the observation that for each value of y between 0 and 2, x ranges from y to 2 inside the triangle:

pY(y) = ∫_{−∞}^{∞} p(x, y) dx = ∫_y^2 1/(2x) dx = (1/2) ln(x) |_{x=y}^{x=2} = (1/2)(ln(2) − ln(y))

if 0 ≤ y ≤ 2, and pY(y) = 0 otherwise. Note that pY(y) approaches infinity as y approaches 0 from above, and pY(y) approaches 0 as y approaches 2. You should check that this function is actually a probability density function on the interval [0, 2], i.e., that its integral is 1.

1.5 Functions of two random variables

Frequently, given the joint probability (density) function, it is necessary to calculate the probability (density) function of a function of two random variables. By far, the most common such function is the sum of two random variables, but the idea of the calculation applies in principle to any function of two (or more!) random variables.

The principle we will follow for discrete random variables is as follows: to calculate the probability function for F(X, Y), we consider the events {F(X, Y) = f} for each value of f that can result from evaluating F at points of the sample space of (X, Y).
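As suggested above, one can check numerically that pY(y) = (1/2)(ln 2 − ln y) integrates to 1 on (0, 2]. A midpoint-rule sketch (the integrand is unbounded near 0 but integrable, and midpoints avoid the endpoint):

```python
import math

# Midpoint-rule integral of pY(y) = (ln 2 - ln y)/2 over (0, 2].
def integral_pY(n=100_000):
    dy = 2.0 / n
    return sum(0.5 * (math.log(2) - math.log((k + 0.5) * dy)) * dy
               for k in range(n))

assert abs(integral_pY() - 1.0) < 1e-4
```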

Since there are only countably many points in the sample space, the random variable F that results is discrete. Then the probability function pF(f) is

pF(f) = Σ_{(x,y) ∈ S : F(x,y) = f} p(x, y).

This seems like a pretty weak principle, but it is surprisingly useful when combined with a little insight (and cleverness).

As an example, we calculate the distribution of the sum of the two dice. Since the outcome of each of the dice is a number between 1 and 6, the outcome of the sum must be a number between 2 and 12. So for each f between 2 and 12:

pF(f) = Σ_{x=max(1, f−6)}^{min(6, f−1)} p(x, f − x) = (1/36)(6 − |f − 7|).

A table of the probabilities of various sums is as follows:

 f       2     3     4     5     6     7     8     9    10    11    12
 pF(f) 1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

The "tent-shaped" distribution that results is typical of the sum of (independent) uniformly distributed random variables.

For continuous distributions, our principle will be a little more complicated, but more powerful as well. To enunciate it, we recall that to calculate the probability of the event F < f, we integrate the pdf of F from −∞ to f:

P(F < f) = ∫_{−∞}^{f} pF(g) dg.

Conversely, to recover the pdf of F, we can differentiate the resulting function:

pF(f) = (d/df) ∫_{−∞}^{f} pF(g) dg

(this is simply the first fundamental theorem of calculus). Our principle for calculating the pdf of a function of two random variables F(X, Y) will be to calculate the probabilities of the events {F(X, Y) < f}, by integrating the joint pdf over the region of the plane defined by this inequality.
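The discrete computation above, the tent-shaped distribution of the two-dice sum, can be checked directly against the closed form (a sketch):

```python
from fractions import Fraction

# Distribution of the sum of two fair dice, checked against the
# closed form pF(f) = (6 - |f - 7|)/36.
pF = {}
for i in range(1, 7):
    for j in range(1, 7):
        pF[i + j] = pF.get(i + j, Fraction(0)) + Fraction(1, 36)

for f in range(2, 13):
    assert pF[f] == Fraction(6 - abs(f - 7), 36)
assert sum(pF.values()) == 1
```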

We then differentiate with respect to f to get the pdf.

We apply this principle to calculate the pdf of the sum of the random variables X and Y in our example: p(x, y) = 1/(2x) for (x, y) in the triangle T with vertices (0, 0), (2, 0) and (2, 2), and p(x, y) = 0 otherwise. Let Z = X + Y. To calculate the probability P(Z < z), we first note that for any fixed number z, the region of the plane where Z < z is the half plane below and to the left of the line y = z − x. Of course, for z ≤ 0 we get zero, since this half plane has no points in common with the triangle where the pdf is supported. Likewise, since both X and Y are always between 0 and 2, the biggest the sum can be is 4; therefore P(Z < z) = 1 for all z ≥ 4.

For z between 0 and 4, we need to integrate 1/(2x) over the intersection of the half-plane x + y < z and the triangle T. The shape of this intersection is different depending upon whether z is greater than or less than 2.

If 0 ≤ z ≤ 2, the intersection is a triangle with vertices at the points (0, 0), (z/2, z/2) and (z, 0). In this case, it is easier to integrate first with respect to x and then with respect to y, and we can calculate:

P(Z < z) = ∫_0^{z/2} ∫_y^{z−y} 1/(2x) dx dy
         = (1/2) ∫_0^{z/2} ln(x) |_{x=y}^{x=z−y} dy
         = (1/2) ∫_0^{z/2} (ln(z − y) − ln(y)) dy
         = (1/2) ( z − (z − y) ln(z − y) − y ln(y) ) |_{y=0}^{y=z/2}
         = (1/2) z ln(2).

And since the (cumulative) probability that Z < z is (1/2) z ln(2) for 0 < z < 2, the pdf over this range is

pZ(z) = ln(2)/2.

The calculation of the pdf for 2 ≤ z ≤ 4 is somewhat trickier, because the intersection of the half-plane x + y < z and the triangle T is more complicated. The intersection in this case is a quadrilateral with vertices at the points (0, 0), (2, 0), (2, z − 2) and (z/2, z/2). We could calculate P(Z < z) by integrating p(x, y) over this quadrilateral, but we will be a little more clever.

for 2 ≤ z ≤ 4.1. suppose we already know that the sum will be greater than 7. z/2). If X and Y are two random variables. to get pZ (z) = 1 1 1 d (z ln(2) − z ln(z) + z − 1) = ln(2) − ln(z) dz 2 2 2 Now we have the pdf of Z = X + Y for all values of z. then E(X +Y ) = E(X) + E(Y ). in the roll of two dice. we will often be presented with situations where we have some knowledge that will affect the probability of whether some event will occur. The event F ≥ 8 has probability .5. 0) that are to the left of the line x = 2. The following fact is extremely useful. For example. It consists of points inside the triangle with vertices (0. In other words it is points inside this large triangle (and note that we already have computed the integral of 1/2x 1 over this large triangle to be 2 z ln(2)) that are NOT inside the triangle with vertices (2. 0). we can calculate P (Z < z) as z z−x 1 1 dydx P (Z < z) = z ln(2) − 2 2x 2 0 1 = z ln(2) − 2 z z 2 y y=z−x | dx 2x y=0 z−x 1 1 1 dx = ln(2) + (z ln(x) − x)|x=z = z ln(2) − x=2 2 2x 2 2 2 1 1 = z ln(2) − (z ln(z) − z − z ln(2) + 2) 2 2 1 1 = z ln(2) − z ln(z) + z − 1 2 2 To get the pdf for 2 < z < 4. (z. it is pZ (z) = ln(2) − 1 ln(z) for 2 < z < 4 and it is 0 otherwise. (z/2. 1.26 CHAPTER 1. 2 It would be good practice to check that the integral of pZ (z) is 1. This changes the probabilities from those that we computed above. It is pZ (z) = ln(2)/2 for 0 < z < 2. BASIC PROBABILITY little more clever: Note that the quadrilateral is the ”difference” of two sets. we need only differentiate this quantity.6 Conditional Probability In our study of stochastic processes. Proposition 1. z − 2) and (z. 0). 0). (2. Thus.

The event F ≥ 8 has probability (5 + 4 + 3 + 2 + 1)/36 = 15/36, so we are restricted to less than half of the original sample space. We might wish to calculate the probability of getting a 9 under these conditions. The quantity we wish to calculate is denoted P(F = 9|F ≥ 8), read "the probability that F = 9 given that F ≥ 8".

In general, to calculate P(A|B) for two events A and B (it is not necessary that A be a subset of B), we need only realize that we must compute the fraction of the time that the event B is true in which it is also the case that A is true. In symbols,

P(A|B) = P(A ∩ B) / P(B).

For our dice example (noting that the event F = 9 is a subset of the event F ≥ 8), we get

P(F = 9|F ≥ 8) = P(F = 9)/P(F ≥ 8) = (4/36)/(15/36) = 4/15.

As another example (with continuous probability this time), we calculate, for our 1/(2x)-on-the-triangle example, the conditional probabilities P(X > 1|Y > 1) as well as P(Y > 1|X > 1) (just to show that the probabilities of A given B and of B given A are usually different).

First, P(X > 1|Y > 1). This one is easy! Note that in the triangle with vertices (0, 0), (2, 0) and (2, 2), it is true that Y > 1 implies that X > 1. In other words, Y > 1 is a subset of the event X > 1 in the triangle, so the events (Y > 1) ∩ (X > 1) and Y > 1 are the same, and the fraction we need to compute has the same numerator and denominator. Thus P(X > 1|Y > 1) = 1.

For P(Y > 1|X > 1) we actually need to compute something:

P(Y > 1|X > 1) = P((Y > 1) ∩ (X > 1)) / P(X > 1)
              = [ ∫_1^2 ∫_1^x 1/(2x) dy dx ] / [ ∫_1^2 ∫_0^x 1/(2x) dy dx ]
              = ( (1/2)(1 − ln(2)) ) / (1/2) = 1 − ln(2).

Two events A and B are called independent if the probability of A given B is the same as the probability of A (with no knowledge of B), and vice versa. The assumption of independence of certain events is essential to many probabilistic arguments. Independence of two events is expressed by the equations

P(A|B) = P(A),    P(B|A) = P(B).
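The value P(Y > 1|X > 1) = 1 − ln 2 ≈ 0.307 can be checked by Monte Carlo. The sampling scheme below uses the marginals computed earlier: X has density 1/2 on (0, 2) and, given X = x, Y is uniform on (0, x), which reproduces the joint density 1/(2x) (a sketch; sample size and seed are arbitrary):

```python
import math
import random

# Monte Carlo estimate of P(Y > 1 | X > 1) for the triangle example.
rng = random.Random(1)
hits = trials = 0
for _ in range(200_000):
    x = rng.uniform(0, 2)       # X is uniform on (0, 2)
    y = rng.uniform(0, x)       # given X = x, Y is uniform on (0, x)
    if x > 1:
        trials += 1
        if y > 1:
            hits += 1

estimate = hits / trials
assert abs(estimate - (1 - math.log(2))) < 0.01
```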

And especially,

P(A ∩ B) = P(A) · P(B).

Two random variables X and Y are independent if the probability that a < X < b remains unaffected by knowledge of the value of Y, and vice versa. This reduces to the fact that the joint probability (or probability density) function of X and Y "splits" as the product

p(x, y) = pX(x) pY(y)

of the marginal probabilities (or probability densities).

Proposition 1.6.1.
1. If X and Y are two independent random variables, then σ²(X + Y) = σ²(X) + σ²(Y).
2. If X is a random variable, then σ²(aX) = a² σ²(X) for any real number a.

This proposition is a straightforward consequence of the definition of independence and is left as an exercise.

1.7 Law of large numbers

The point of probability theory is to organize the random world. For many centuries, people have noticed that, though certain measurements or processes seem random, one can give a quantitative account of this randomness. Probability gives a mathematical framework for understanding statements like "the chance of hitting 13 on a roulette wheel is 1/38". This belief about the way the random world works is embodied in the empirical law of averages. This law is not a mathematical theorem. It states that if a random experiment is performed repeatedly under identical and independent conditions, then the proportion of trials that result in a given outcome converges to a limit as the number of trials goes to infinity.

We now state for future reference the strong law of large numbers. This theorem is the main rigorous result giving credence to the empirical law of averages.

Theorem 1.7.1 (The strong law of large numbers). Let X1, X2, ... be a sequence of independent random variables, all with the same distribution, whose expectation is µ. Then

P( lim_{n→∞} (Σ_{i=1}^{n} Xi)/n = µ ) = 1.
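The theorem can be illustrated (not proved!) by simulation. A sketch using fair die rolls, where µ = 3.5:

```python
import random

# Running average of n fair die rolls should be close to mu = 3.5.
rng = random.Random(42)
n = 100_000
total = 0
for _ in range(n):
    total += rng.randint(1, 6)

average = total / n
assert abs(average - 3.5) < 0.05
```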

Exercises

1. Make your own example of a probability space that is finite and discrete. Calculate the expectation of the underlying random variable X.

2. Make your own example of a probability space that is infinite and discrete. Calculate the expectation of the underlying random variable X.

3. Make your own example of a continuous random variable. Calculate its expectation.

4. Consider the experiment of picking a point at random from a uniform distribution on the disk of radius R centered at the origin in the plane ("uniform" here means that if two regions of the disk have the same area, then the random point is equally likely to be in either one). Calculate the probability density function and the expectation of the random variable D, defined to be the distance of the random point from the origin.

5. Prove that the normal distribution function

φ(x, µ, σ) = (1/(σ√(2π))) e^(−(x−µ)²/(2σ²))

is really a probability density function on the real line (i.e., that it is positive and that its integral from −∞ to ∞ is 1). Calculate the expectation of this random variable.

6. What is the relationship among the following partial derivatives: ∂φ/∂σ, ∂²φ/∂x², and ∂φ/∂t, where t = σ²? (For the last one, rewrite φ as a function of x, µ and t.)

7. Prove the principle of inclusion and exclusion, p(A ∪ B) = p(A) + p(B) − p(A ∩ B), and that p(A^c) = 1 − p(A).

8. If S is a finite set with n elements, how many elements are in P(S), the power set of S? (Hint: try it for the first few values of n, and don't forget to count the empty set.)

9. For each of the examples of random variables you gave in problems 1, 2 and 3, calculate the expectation and the variance, if they exist.

10. Calculate the variance of the binomial, Poisson and normal distributions. (The answer for the normal distribution is σ².)

11. Fill in the details of the calculation of the variance of the uniform and exponential distributions.

12. Prove that σ²(X) = E((X − E(X))²) = E(X²) − (E(X))².

13. Let X be any random variable with finite second moment. Consider the function f(a) defined as follows: f(a) = E((X − a)²). Show that the minimum value of f occurs when a = E(X).

14. Show that for a function f : S → R and for A1, A2, A3, ..., a collection of subsets of the real numbers,

f⁻¹(∪_{j=1}^{∞} Aj) = ∪_{j=1}^{∞} f⁻¹(Aj).

Use this to prove that if f is a random variable on a probability space (S, Ω, P), then its distribution Pf, as defined above, is also a probability.

15. Let X be the sine of an angle in radians chosen uniformly from the interval (−π/2, π/2). Find the pdf of X.

16. Let X be a random variable with the standard normal distribution. Find the mean and variance of each of the following: |X|, X² and e^(tX).

17. Suppose X and Y are independent random variables, each distributed according to the exponential distribution with parameter α. Find the joint pdf of X and Y (easy). Find the pdf of X + Y. Also find the mean and variance of X + Y.

18. Let p(x, y) be the uniform joint probability density on the unit disk, i.e., p(x, y) = 1/π if x² + y² < 1 and p(x, y) = 0 otherwise. Calculate the pdf of X + Y, its mean and its variance.

Chapter 2

Aspects of finance and derivatives

The point of this book is to develop some mathematical techniques that can be used in finance to model movements of assets, and then to use them to solve some interesting problems that occur in concrete practice.

Two of the problems that we will encounter are asset allocation and derivative valuation. The first, asset allocation, asks how we should allocate our resources in various assets to achieve our goals. These goals are usually phrased in terms of how much risk we are willing to acquire, how much income we want to derive from the assets, and what level of return we are looking for. The second, derivative valuation, asks us to calculate an intrinsic value of a derivative security (for example, an option on a stock). Though it might seem more esoteric, the second question is actually more basic than the first (at least in our approach to the problem). Thus we will attack derivative valuation first and then apply it to asset allocation.

2.1 What is a derivative security?

What is a derivative security? By this we mean any financial instrument whose value depends on (i.e., is derived from) an underlying security. The underlying in theory could be practically anything: a stock, a stock index, a bond, or much more exotic things. Our most fundamental example is an option on a stock. An options contract gives the right, but not the obligation, to the purchaser of the contract to buy (or sell) a specified asset (in our case, a stock) at a specific price at a specific future time.

For example, I could buy a (call) option to buy 100 shares of IBM at $90 a share on January 19. Currently, this contract is selling for $19 1/2. The seller of this option earns the $19 1/2 (the price, or premium, of the options contract) and incurs the obligation to sell me 100 shares of IBM stock on January 19 for $90 if I should so choose. Of course, when January 19 rolls around, I would exercise my option only if the price of IBM is more than $90.

Another well known and simpler example is a forward. A forward contract is an agreement/contract between two parties that gives the right and the obligation to one of the parties to purchase (or sell) a specified commodity (or financial instrument) at a specified price at a specific future time. For example, I could buy a June forward contract for 50,000 gallons of frozen orange juice concentrate for $.25/gallon. This would obligate me to take delivery of the frozen orange juice concentrate in June at that price. No money changes hands at the establishment of the contract.

A similar notion is that of a futures contract. The difference between a forward and a futures contract is that futures are generally traded on exchanges and require other technical aspects, such as margin accounts.

2.2 Securities

The two most common types of securities are an ownership stake or a debt stake in some entity, for example a company or a government. You become a part owner in a company by buying shares of the company's stock. A debt security is usually acquired by buying a company's or a government's bonds. A debt security like a bond is a loan to a company in exchange for interest income and the promise to repay the loan at a future maturity date.

Securities can either be sold in a market (an exchange) or over the counter. When sold in an exchange, the price of the security is subject to market forces. When sold over the counter, the price is arrived at through negotiation by the parties involved.

Securities sold on an exchange are generally handled by market makers or specialists. These are traders whose job it is to make sure that the security can always be bought or sold at some price. Thus they provide liquidity for the security involved. Market makers must quote bid prices (the amount they are willing to buy the security for) and ask prices (the amount

they are willing to sell the security for) of the securities. Of course, the ask price is usually greater than the bid, and this is one of the main ways market makers make their money. In the future, we will usually assume that the bid and ask prices are the same; this is one of many idealizations that we will make as we analyze derivatives. The spot price is the amount that the security is currently selling for.

Some terminology: if one owns an asset, one is said to have a long position in that asset. The opposite position is called short. Thus, if I own a share of IBM, I am long one share in IBM. If I owe a share of IBM, I am short one share of IBM.

Short selling is a basic financial maneuver. To short sell, one borrows the stock (from your broker, for example) and sells it in the market. In the future, one must return the stock (together with any dividend income that the stock earned). Short selling allows negative amounts of a stock in a portfolio and is essential to our ability to value derivatives.

One can have long or short positions in any asset, in particular in options. Being long an option means owning it. Being short an option means that you are the writer of the option.

2.3 Types of derivatives

A European call option (on a stock) is a contract that gives the buyer the right, but not the obligation, to buy a specified stock for a specified price K, the strike price, at a specified date T, the expiration date, from the seller of the option. If the owner of the option can exercise the option any time up to T, then it is called an American call. A put is just the opposite: it gives the owner of the contract the right to sell. It also comes in the European and American varieties. These basic call and put options are sold on exchanges and their prices are determined by market forces.

There are also many types of exotic options.

Asian options: these are options allowing one to buy a share of stock for the average value of the stock over the lifetime of the option. Different versions of these are based on calculating the average in different ways. An interesting feature of these is that their payoff depends not only on the value of the stock at expiration, but on the entire history of the stock's movements. We call this feature path dependence.

Barrier options: these are options that either go into effect or out of effect if

the stock value passes some threshold. For example, a down and out option is an option that expires worthless if the barrier X is reached from above before expiration. There are also down and in, up and in, and up and out options, and you are encouraged to think about what these mean. These are path dependent.

Lookback options: options whose payoff depends on the maximum or minimum value of the stock over the lifetime of the option.

Markets for options and other derivatives are important in the sense that agents who anticipate future revenues or payments can ensure a profit above a certain level, or insure themselves against a loss above a certain level. It is essential that every individual or corporation be able to select and manipulate the level of risk for a given transaction. This redistribution of risk towards those who are willing and able to assume it is what modern financial markets are about. Options can be used to reduce or increase risk.

2.4 The basic problem: How much should I pay for an option? Fair value

A basic notion of modern finance is that of risk. Contrary to common parlance, risk is not only a bad thing; risk is what one gets rewarded for in financial investments. Quantifying risk, and the level of reward that goes along with a given level of risk, is a major activity in both theoretical and practical circles in finance. It is pretty clear that a prerequisite for efficient management of risk is that these derivative products are correctly valued, and priced.

Thus our basic question: how much should I pay for an option?

We make a formal distinction between value and price. The price is what is actually charged, while the value is what the thing is actually worth. When a developer wants to figure out how much to charge for a new house, say, he calculates his costs: land, material and labor, cost of financing, insurance, etc. This is the house's value. The developer, wanting to make a profit, adds to this a markup; the total we call the price. Our problem is to determine the value of an option.

In 1997, the Nobel Prize in economics was awarded to Robert Merton and Myron Scholes, who, along with Fischer Black, wrote down a solution to this problem. Black died in his mid-fifties in 1995 and was therefore not awarded the Nobel Prize. Black and Scholes wrote down their formula,

. The casino desiring to make a profit sets the actual cost to play at 4. suppose we play a (rather silly) game of tossing a die and the casino pays the face value on the upside of the die in dollars. The casino of course is figuring that people will play this game many many times. . an option seems more like pricing bets in a casino.2. As an example. P (X = i) = 1/6. .5. This is an example of expectation pricing. Their method is quite general and can be used and modified to handle may kinds of exotic options as well as to value insurance contracts. They had a difficult time getting their paper published. and even has applications outside of finance.5 Expectation pricing. the amount the casino charges for a given bet is based on models for the payoffs of the bets. within a year of its publication. WHAT THEY KNEW ABOUT DER the Black-Scholes Formula in 1973. However.e. Not too much higher than the fair value or no would want to play. 2. Hewlett Packard had a calculator with this formula hardwired into it. what they knew about derivative pricing. the first thing is to figure out the fair value. 2. the casino feels reasonably confident that they will be paying off something close to $21/6 · n. . Today this formula is an indispensable tool for thousands of traders and investors to value stock options in markets throughout the world. EXPECTATION PRICING. And what they didn’t know As opposed to the case of pricing a house. the law of large numbers says that if there are n plays of the game. So 21/6 is the fair valuer per game. for i = 1. Assuming that the die is fair. How much should the casino charge to play this game. Such models are of course statistical/probabilistic. (i. That is. Einstein and Bachelier. they would lose their money too fast. 6 and that each roll of the die doesn’t affect (that is independent of ) any other roll. EINSTEIN AND BACHELIER. That is. One models this payoff as a random variable and uses . the payoff of some experiment is random. Well. 
that the average of the face values on the die will be with probability close to 1 E(X) = 1 · 1/6 + 2 · 1/6 + · · · + 6 · 1/6 = 21/6 Thus after n plays of the game.
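A quick simulation of the die game (a sketch; the $4 charge follows the example above, and the number of plays is an arbitrary choice):

```python
import random

# With many plays, the casino's average payout per game approaches
# the fair value E(X) = 21/6 = 3.5, so charging $4 yields a profit
# of about $0.50 per game.
rng = random.Random(7)
n = 200_000
payout = sum(rng.randint(1, 6) for _ in range(n))

charge = 4.0
average_payout = payout / n
profit_per_game = charge - average_payout

assert abs(average_payout - 3.5) < 0.02
assert profit_per_game > 0.45
```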

CHAPTER 2. ASPECTS OF FINANCE AND DERIVATIVES

Thus, in independent trials, one can use the strong law of large numbers to argue that the fair value is the expectation of the random variable. This works much of the time. Sometimes it doesn't. If you are offered to play a game once, the expectation might not be a very good value: in order for the law of large numbers to kick in, the experiment has to be performed many times. What's the fair value to play Russian Roulette? So there are a number of reasons why the actual value might not be the expectation price. As we will see below, there are more subtle and interesting reasons why the fair value is not enforced in the market.

Thinking back to stocks and options, what would be the obvious way to use expectation pricing to value an option? The first order of business would be to model the movement of the underlying stock. We will have a lot to say in the future about models for stock movements; for now, the best we can do here as well is to give a probabilistic model. The French mathematician Louis Bachelier, in his doctoral dissertation around 1900, was one of the earliest to attempt to model stock movements. He did so by what we now call a Random Walk (discrete version) or a Wiener process (continuous version). This actually predated the use of these processes to model Brownian motion (for which Einstein got his Nobel Prize). Unfortunately, Bachelier's thesis advisor, Jacques Hadamard, did not have such high regard for Bachelier's work.

2.5 Payoffs of derivatives

The one time we know the value of an option is at expiration. Let St denote the value of the stock at time t, and let us consider a European call option C to buy a share of XYZ stock with strike price K at time T. At expiration there are two possibilities. If ST > K, then it makes sense to exercise the option, paying K for the stock and selling it on the market for ST, the whole transaction netting $ST − K. If the value of the stock at expiration is less than the strike price, ST ≤ K, we don't exercise our option and it expires worthless. Thus, at expiration the option is worth max(ST − K, 0). This is what we call the payoff of the derivative; it is what the derivative pays at expiration, and in this case it only depends on the value of the stock at expiration. On your own, calculate the payoff of all the derivatives defined in section 2.3.
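Bachelier-style discrete stock movements — a symmetric random walk — can be sketched in a few lines of Python (a hypothetical illustration; the step size and starting price are arbitrary choices of ours):

```python
import random

def random_walk(n_steps, step=1.0, s0=100.0, seed=1):
    """Path of a symmetric random walk started at s0: at each tick the
    price moves up or down by `step` with equal probability."""
    rng = random.Random(seed)
    path = [s0]
    for _ in range(n_steps):
        path.append(path[-1] + (step if rng.random() < 0.5 else -step))
    return path

path = random_walk(250)  # e.g. roughly one year of daily moves
```

The continuous-time limit of such paths, suitably rescaled, is the Wiener process mentioned above.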

Assuming a probabilistic model for St, we would arrive at a random variable ST describing in a probabilistic way the value of XYZ at time T. Thus the value of the option would be the random variable max(ST − K, 0), and expectation pricing would say that the fair price should be E(max(ST − K, 0)). (I am ignoring the time value of money; for the meanwhile, I am assuming interest rates are 0.) Bachelier attempted to value options along these lines. This makes some sense. Unfortunately, it turns out that expectation pricing (naively understood at least) does not work: if prices were set this way, it would lead to profits or losses over the long run. There is another wrinkle to option pricing: the time value of money.

The time value of money is expressed by interest rates. If one has $B at time t, then at time T > t it will be worth $exp(r(T − t))B. Bonds are the market instruments that express the time value of money. A zero-coupon bond is a promise of a specific amount of money (the face or par value) at some specified time in the future, the expiration date. So if a bond promises to pay $BT at time T, then its current value Bt can be expressed as BT discounted by some interest rate r, that is, Bt = exp(−r(T − t))BT. For the meanwhile, we will assume that the interest rate r is constant. Most options are rather short lived, 3 months to a year or so. Over this period, the interest rates do not vary so much.

2.6 The simple case of forwards. Arbitrage arguments

I will illustrate the failure of expectation pricing with the example of a forward contract. This time I will also keep track of the time value of money. Suppose that we enter into a forward contract with another person to buy, at time T, one share of XYZ stock, at time t worth $St. At time T, I must buy a share of XYZ for $F from him and he must sell it to me for $F. We want to calculate what the forward price $F is. Thus at time T, the contract is worth $ST − F. According to expectation pricing, since the contract costs nothing to set up, this contract should at time t be worth $exp(−r(T − t))E(ST − F), and if this is not zero, one of the parties would be getting a bad deal. So expectation pricing says F = E(ST). On the other hand, consider the following strategy for the party who must deliver the stock at time T.

He borrows $St at the risk free interest rate r, buys the stock now, and holds the stock until time T. At time T he can just hand over the stock, collect $F, and repay the loan, which has grown to $exp(r(T − t))St. So he starts with a contract costing no money to set up, and after the transaction is over he has $(F − exp(r(T − t))St). That is a risk free profit. If this is positive, then we have an arbitrage opportunity: at the beginning of the contract no money changes hands, yet it yields a sure profit. No matter how small the profit is on any single transaction, one can parley this into unlimited profits by performing the transaction in large multiples. What will happen in practice if this is positive? Someone else will come along and offer the same contract but with a slightly smaller F, still making a positive (and thus a potentially unlimited) profit. So, as long as $(F − exp(r(T − t))St) > 0, someone can come along and undercut him and still make unlimited money. Therefore $(F − exp(r(T − t))St) ≤ 0.

If $(F − exp(r(T − t))St) < 0, you can work it from the other side. That is, the party to the contract who must buy the stock for $F at time T can at time t short the stock (i.e. borrow it) and sell it in the market for $St. She puts the money in the risk free instrument, and at time T she has $exp(r(T − t))St; she buys the stock for $F, returns the stock, and is left with $(exp(r(T − t))St − F), which is positive. Again, at time t the contract costs nothing to set up, so this is an arbitrage opportunity. A similar argument as in the paragraph above shows that $(exp(r(T − t))St − F) ≤ 0. So we arrive at

F = exp(r(T − t))St

So we see something very interesting. The forward price has nothing to do with the expected value of the stock at time T. It only has to do with the value of the stock at the beginning of the contract, adjusted for the time value of money. On the other hand, the price arrived at by expectation pricing, F = E(ST), would typically be much higher than the price arrived at by the arbitrage argument. This is an example of the basic arbitrage argument.

Let's give another example of an arbitrage argument, leading to a basic result expressing the relationship between the value of a call and of a put.

Proposition 2.6.1. (Put-Call parity) If St denotes the value of a stock XYZ at time t, and C and P denote the value of a call and a put on XYZ with the same strike price K and expiration T, then at time t,

P + St = C + K exp(−r(T − t))

Proof: By now, you should have calculated that the payoff on a put is max(K − ST, 0). Now form the portfolio P + S − C.

That is, long a share of stock, long a put, and short a call. Then at time T we calculate:

Case 1: ST ≤ K. Then P + S − C = max(K − ST, 0) + ST − max(ST − K, 0) = K − ST + ST − 0 = K.

Case 2: ST > K. Then P + S − C = max(K − ST, 0) + ST − max(ST − K, 0) = 0 + ST − (ST − K) = K.

So the payoff at time T from this portfolio is K no matter what. Regardless of what ST is, the portfolio is worth K at time T. That is, it's risk free. What other financial entity has this property? A zero-coupon bond with par value K. Hence the value of this portfolio at time t is K exp(−r(T − t)). Q.E.D.

2.7 Arbitrage and no-arbitrage

Already we have made arguments based on the fact that it is natural for competing players to undercut prices as long as they are guaranteed a positive risk free profit. This is formalized by the no arbitrage hypothesis. (We formalize this even more in the next section.)

Definition 2.7.1. An arbitrage is a portfolio that guarantees an (instantaneous) risk free profit. The no-arbitrage hypothesis is the assumption that arbitrage opportunities do not exist.

It is important to realize that a zero-coupon bond is not an arbitrage. It is risk free to be sure, but the growth of the risk free bond is just an expression of the time value of money. Thus the word instantaneous in the definition is equivalent to saying that the portfolio is risk free and grows faster than the risk-free interest rate.

What is the sense of this hypothesis? Of course arbitrage opportunities exist. There exist people called arbitrageurs whose job it is to find these opportunities. (And some of them make a lot of money doing it.) But Malkiel and others would insist that in actual fact arbitrage opportunities don't really exist; a good discussion of this is in Malkiel's book. But even if you don't buy this, along with the efficient market hypothesis, it seems to be a reasonable assumption (especially in view of the reasonable arguments we made above) as a starting point.
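The put-call parity relation rests on the fact that the payoff of the portfolio P + S − C at expiration equals K for every terminal stock price. That payoff identity can be checked directly (a small Python sketch; the function names are ours):

```python
def call_payoff(s_T, K):
    """Payoff of a European call at expiration."""
    return max(s_T - K, 0.0)

def put_payoff(s_T, K):
    """Payoff of a European put at expiration."""
    return max(K - s_T, 0.0)

# P + S - C is worth exactly K at expiration, whatever S_T turns out to be.
K = 100.0
for s_T in (0.0, 50.0, 100.0, 150.0, 300.0):
    assert put_payoff(s_T, K) + s_T - call_payoff(s_T, K) == K
```

Since the portfolio's payoff is a constant K, no-arbitrage forces its time-t value to equal that of a zero-coupon bond with par value K, which is exactly the parity relation.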

In fact, arbitrageurs, being able to generate large profits by carrying out multiple transactions, seek such opportunities out. The existence of many people seeking arbitrage opportunities out means that when such an opportunity arises, it closes all the more quickly. On publicly traded stocks the existence of arbitrage is increasingly rare. Also, the existence of arbitrageurs makes the no-arbitrage argument more reasonable. So you have a dichotomy: if one sees derivatives mispriced, either you can be right (about the no-arbitrage hypothesis), in which case your evaluation of derivative prices is better founded, or it might mean the existence of arbitrage opportunities, and you can make money (having found an arbitrage opportunity). So as a basic principle in valuing derivatives it is a good starting point.

2.8 The arbitrage theorem

In this section we establish the basic connection between finance and mathematics. Our first goal is to figure out what arbitrage means mathematically, and for this we model financial markets in the following way. We consider the following situation. Consider N possible securities, labeled {1, . . . , N}. These may be stocks, bonds, commodities, options, etc. Suppose that now, at time 0, these securities are selling at S1, S2, S3, . . . , SN units of account (say dollars if you need closure). (Note that this notation is different than notation in the rest of the notes, in that the subscript does not denote time but indexes the different securities.) Consider the column vector S whose components are the Sj's.

At time 1, the world has changed from its current state at time 0 (reflected in the securities prices S) to one of M possible states, labeled 1, 2, . . . , M. That is, the random nature of financial markets is represented by the possible states of the world. Each security, say i, has a payoff di,j in state j. This payoff reflects any change in the security's price, any dividends that are paid, or any possible change to the overall worth of the security. We can organize these payoffs into an N × M matrix

        ( d1,1  d1,2  · · ·  d1,M )
    D = ( d2,1  d2,2  · · ·  d2,M )        (2.8.1)
        (  ⋮     ⋮             ⋮  )
        ( dN,1  dN,2  · · ·  dN,M )

Our portfolio is represented by an N × 1 column matrix

        ( θ1 )
    θ = ( θ2 )        (2.8.2)
        ( ⋮  )
        ( θN )

where θi represents the number of copies of security i in the portfolio. This can be a negative number, indicating short selling. The market value of the portfolio is θ · S = θT S. The payoff of the portfolio can be represented by the M × 1 matrix

           ( Σ_{j=1}^N dj,1 θj )
    DT θ = ( Σ_{j=1}^N dj,2 θj )        (2.8.3)
           (         ⋮         )
           ( Σ_{j=1}^N dj,M θj )

where the ith row of the matrix, which we write as (DT θ)i, represents the payoff of the portfolio in state i. The portfolio is riskless if the payoff of the portfolio is independent of the state of the world, in other words, (DT θ)i = (DT θ)j for any states of the world i and j.

Let R^n_+ = {(x1, . . . , xn) | xi ≥ 0 for all i} and R^n_++ = {(x1, . . . , xn) | xi > 0 for all i}. We write v ≥ 0 if v ∈ R^n_+ and v > 0 if v ∈ R^n_++.

Definition 2.8.1. An arbitrage is a portfolio θ such that either

1. θ · S ≤ 0 and DT θ > 0, or
2. θ · S < 0 and DT θ ≥ 0.

These conditions express that the portfolio goes from being worthless (respectively, less than worthless) to being worth something (respectively, at least worth nothing). In common parlance, an arbitrage is an (instantaneous) risk free profit: a free lunch.

Definition 2.8.2. A state price vector is an M × 1 column vector ψ such that ψ > 0 and S = Dψ.
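The two conditions in the definition of an arbitrage translate directly into code; a sketch (the function name and the numerical example are ours; the payoff matrix is the two-state market used as an example later in this section):

```python
import numpy as np

def is_arbitrage(theta, S, D):
    """Definition 2.8.1: theta is an arbitrage if either
    1. theta.S <= 0 and D^T theta > 0 (all components strictly positive), or
    2. theta.S <  0 and D^T theta >= 0 (all components nonnegative)."""
    cost = theta @ S
    payoff = D.T @ theta
    return (cost <= 0 and np.all(payoff > 0)) or (cost < 0 and np.all(payoff >= 0))

# Rows are securities, columns are states.
D = np.array([[1.0, 1.0],
              [0.5, 2.0]])
S = np.array([1.0, 1.0])
```

In this market no portfolio passes the test (there is a state price vector), but if security 2 were mispriced at, say, 0.2, the portfolio (−0.2, 1) would be a free lunch.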


Theorem 2.8.1. (Arbitrage theorem) There are no arbitrage portfolios if and only if there is a state price vector.

Before we prove this theorem, we need an abstract theorem from linear algebra. Let's recall that a subset C ⊂ Rn is called convex if the line between any two points of C is completely contained in C. That is, for any two points x, y ∈ C and any t ∈ [0, 1] one has x + t(y − x) ∈ C.

Theorem 2.8.2. (The separating hyperplane theorem) Let C be a compact convex subset of Rn and V ⊂ Rn a vector subspace such that V ∩ C = ∅. Then there exists a point ξ0 ∈ Rn such that ξ0 · c > 0 for all c ∈ C and ξ0 · v = 0 for all v ∈ V.

First we need a lemma.

Lemma 2.8.3. Let C be a closed and convex subset of Rn not containing 0. Then there exists ξ0 ∈ C such that ξ0 · c ≥ ||ξ0||² for all c ∈ C.

Proof: Let r > 0 be such that Br(0) ∩ C ≠ ∅. Here Br(x) denotes all those points a distance r or less from x. Let ξ0 ∈ C ∩ Br(0) be an element which minimizes the distance from the origin for elements in C ∩ Br(0). That is,

||ξ0|| ≤ ||c||        (2.8.4)

for all c ∈ Br(0) ∩ C. Such a ξ0 exists by the compactness of C ∩ Br(0). Then it follows that (2.8.4) holds for all c ∈ C, since if c ∈ C but c ∉ Br(0), then ||ξ0|| ≤ r ≤ ||c||. Now since C is convex, it follows that for any c ∈ C we have ξ0 + t(c − ξ0) ∈ C, so that

||ξ0||² ≤ ||ξ0 + t(c − ξ0)||² = ||ξ0||² + 2t ξ0 · (c − ξ0) + t²||c − ξ0||²

Hence

2||ξ0||² − t||c − ξ0||² ≤ 2 ξ0 · c

for all t ∈ (0, 1]. Letting t → 0, we get ξ0 · c ≥ ||ξ0||². QED.

Proof (of the separating hyperplane theorem): Let C be compact and convex, V ⊂ Rn, as in the statement of the theorem. Then consider

C − V = {c − v | c ∈ C, v ∈ V}


This set is closed and convex. (Here is where you need C to be compact.) Then since C ∩ V = ∅, we have that C − V doesn't contain 0. Hence by the lemma there exists ξ0 ∈ C − V such that ξ0 · x ≥ ||ξ0||² for all x ∈ C − V. But this implies that for all c ∈ C, v ∈ V, ξ0 · c − ξ0 · v ≥ ||ξ0||², or ξ0 · c > ξ0 · v for all c ∈ C and v ∈ V. Now note that since V is a linear subspace, for any v ∈ V we have av ∈ V for all a ∈ R. Hence ξ0 · c > ξ0 · av = a ξ0 · v for all a. But this can't happen unless ξ0 · v = 0. So we have ξ0 · v = 0 for all v ∈ V, and also ξ0 · c > 0 for all c ∈ C. QED.

Proof (of the arbitrage theorem): Suppose S = Dψ, with ψ > 0. Let θ be a portfolio with S · θ ≤ 0. Then 0 ≥ S · θ = Dψ · θ = ψ · DT θ, and since ψ > 0 this rules out DT θ > 0. The same argument works to handle the case S · θ < 0, so there are no arbitrage portfolios.

To prove the opposite implication, let

V = {(−S · θ, DT θ) | θ ∈ RN} ⊂ R^{M+1}

Now consider C = R_+ × R^M_+. C is closed and convex. Now by definition, there exists an arbitrage exactly when there is a non-zero point of intersection of V and C. Now consider

C1 = {(x0, x1, . . . , xM) ∈ C | x0 + · · · + xM = 1}

C1 is compact and convex. Hence, assuming no arbitrage, there exists according to (2.8.2) an x ∈ C1 with

1. x · v = 0 for all v ∈ V, and
2. x · c > 0 for all c ∈ C1.

Then writing x = (x0, x1, . . . , xM), we have that x · v = 0 implies that for all θ ∈ RN, (−S · θ)x0 + DT θ · (x1, . . . , xM)T = 0. So DT θ · (x1, . . . , xM)T = x0 S · θ, or θ · D(x1, . . . , xM)T = x0 S · θ. (Taking c = (1, 0, . . . , 0) ∈ C1 in condition 2 shows x0 > 0, so we may divide by it.) Hence letting

ψ = (x1, . . . , xM)T / x0


we have θ · Dψ = S · θ for all θ. Thus Dψ = S, and ψ is a state price vector. Q.E.D.

Let's consider the following example: a market consisting of two stocks 1, 2, and two states 1, 2. Suppose that S1 = 1 and S2 = 1, and

    D = (  1    1 )        (2.8.5)
        ( 1/2   2 )

So security 1 pays off 1 regardless of state, and security 2 pays off 1/2 in state 1 and 2 in state 2. Hence if

    θ = ( θ1 )        (2.8.6)
        ( θ2 )

is a portfolio, its payoff will be

    DT θ = ( θ1 + θ2/2 )        (2.8.7)
           ( θ1 + 2θ2  )

Is there a ψ with S = Dψ? One can find it easily by matrix inversion. We need

    (  1   1 ) ( ψ1 )   ( 1 )
    ( 1/2  2 ) ( ψ2 ) = ( 1 )        (2.8.8)

so that

    ( ψ1 )   (  4/3  −2/3 ) ( 1 )   ( 2/3 )
    ( ψ2 ) = ( −1/3   2/3 ) ( 1 ) = ( 1/3 )        (2.8.9)
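The same inversion can be done mechanically; a sketch using NumPy with the matrix and prices of the example above:

```python
import numpy as np

# Payoff matrix of the example: rows are securities, columns are states.
D = np.array([[1.0, 1.0],
              [0.5, 2.0]])
S = np.array([1.0, 1.0])  # current prices

psi = np.linalg.solve(D, S)  # solves S = D @ psi
# psi comes out to (2/3, 1/3), matching (2.8.9); since psi > 0,
# the arbitrage theorem says this market admits no arbitrage.
```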

Now we address two natural and important questions. What is the meaning of the state price vector? How do we use this to value derivatives? We formalize the notion of a derivative.

Definition 2.8.3. A derivative (or a contingency claim) is a function on the set of states.

We can write this function as an M × 1 vector. So if h is a contingency claim, then h will pay off hj if state j is realized. To see how this compares with what we have been talking about up until now, let us model a call on

security 2 in the example above. Consider a call with strike price 1 1/2 on security 2 with expiration at time 1. Then hj = max(d2,j − 1 1/2, 0), or

    h = ( h1 )   (  0  )        (2.8.10)
        ( h2 ) = ( 1/2 )

So our derivatives are really nothing new here, except that a contingency claim allows the possibility of a payoff depending on a combination of securities, which is certainly a degree of generality that we might want.

To understand what the state price vector means, we introduce some important special contingency claims. Let εj be a contingency claim that pays 1 if state j occurs, and 0 if any other state occurs. So

    εj = ( 0, . . . , 0, 1, 0, . . . , 0 )T        (2.8.11)

with the 1 in the jth row. These are often called binary options. εj is just a bet paying off 1 that state j occurs. How much is this bet worth? Suppose that we can find a portfolio θj such that DT θj = εj. Let ψ be a state price vector. Then the market value of the portfolio θj is

    S · θj = Dψ · θj = ψ · DT θj = ψ · εj = ψj

Hence we have shown that the market value of a portfolio whose payoff is εj is ψj. So the state price ψj is the market value of a bet that state j will be realized. This is the meaning of ψ.

Now consider

    B = Σ_{j=1}^M εj = ( 1, 1, . . . , 1 )T        (2.8.12)

This is a contingency claim that pays 1 regardless of state. That is, B is a riskless zero-coupon bond with par value 1 and expiration at time 1. Its market value will then be R = Σ_{j=1}^M ψj. (You might want to think of R as (1 + r)^{−1}, where r is the

interest rate.) So this R is the discount factor for riskless borrowing. Now we can form a vector Q = R^{−1} ψ. Note that Q ∈ R^M_++ and Σ_{j=1}^M qj = 1. So Q can be interpreted as a probability function on the state space.

Coming back to our problem of valuing derivatives, consider a derivative h. If θ is a portfolio whose payoff is h, that is, DT θ = h, we say that θ replicates h. Then the market value of θ is

    S · θ = Dψ · θ = ψ · DT θ = ψ · h = R Q · h = R Σ_{j=1}^M qj hj = R EQ(h)

That is, the market value of θ (and therefore of the derivative) is the discounted expectation of h with respect to the pseudo-probability Q. Thus we have regained our idea that the value of a contingency claim should be an expectation, but with respect to probabilities derived only from the no-arbitrage hypothesis. It is important to remark that these probabilities are not the real world probabilities that might be estimated for the states j. They are derived only from the no-arbitrage condition, not from any actual probabilities of the states occurring. They are therefore often called pseudo-probabilities.

Let us go back to the example above and value the option on security number 2. R = 1 in this example, so the value of h is

    R EQ(h) = 1 · ((2/3) · 0 + (1/3) · (1/2)) = 1/6

Coming back to general issues, there is one further thing we need to discuss. When is there a portfolio θ whose payoff DT θ is equal to a given contingency claim h? For this we make the following

Definition 2.8.4. A market is complete if for each j there is a portfolio θj with DT θj = εj.

Now we see that if a market is complete, then for any contingency claim h, the portfolio θ = Σ_{j=1}^M hj θj satisfies DT θ = h. Hence a market is complete if and only if every claim can be replicated. We state this and an additional fact as a

Theorem 2.8.4. 1. A market is complete if and only if every contingency claim can be replicated.

2. An arbitrage free market is complete if and only if the state price vector is unique.

Proof: Having proved the first statement above, we prove the second. It is clear from the first statement that if a market is complete then DT : RN → RM is onto, that is, has rank M. Thus if ψ1 and ψ2 are two state price vectors, then D(ψ1 − ψ2) · θ = (S − S) · θ = 0 for all θ. But then 0 = D(ψ1 − ψ2) · θ = (ψ1 − ψ2) · DT θ. Since DT is onto, this means that (ψ1 − ψ2) · v = 0 for all v ∈ RM, which means that ψ1 − ψ2 = 0.

We prove the other direction by contradiction. So assume that the market is not complete, and that ψ1 is a state price vector. Then DT is not onto, so we can find a non-zero vector τ orthogonal to the image, that is, DT θ · τ = 0 for all θ. Then θ · Dτ = 0 for all θ, so Dτ = 0, and ψ2 = ψ1 + τ (with τ scaled small enough that ψ1 + τ > 0) is another state price vector. QED.
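The whole valuation pipeline for the example — state prices, discount factor, pseudo-probabilities, and the discounted expectation — fits in a few lines (a NumPy sketch; variable names are ours):

```python
import numpy as np

D = np.array([[1.0, 1.0],   # security 1: the riskless bond
              [0.5, 2.0]])  # security 2
S = np.array([1.0, 1.0])

psi = np.linalg.solve(D, S)      # state price vector (2/3, 1/3)
R = psi.sum()                    # discount factor; here R = 1
Q = psi / R                      # pseudo-probabilities (2/3, 1/3)

h = np.maximum(D[1] - 1.5, 0.0)  # call on security 2, strike 1 1/2
value = R * (Q @ h)              # discounted expectation: 1/6
```

Note that the real-world probabilities of the two states never enter the computation, which is the point made in the text.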

Exercises

1. Consider the following game: you toss a die, and the casino pays the face value on the upside of the die in dollars. What is the fair value of this game?

2. Consider a variant of the game described above. Except this time, if the player doesn't like her first toss, she is allowed one additional toss, and the casino then pays the face value on the second toss. What is the fair value of this game?

3. Consider the example of the market described in section 2.8. Let h be a put on security 2 with strike price 1 1/2. Calculate its market value.

4. Using the arbitrage theorem, show that there can only be one bond price. That is, if there are 2 securities paying $1 regardless of state, then their market value is the same.

5. Suppose that C is a call on a non-dividend paying stock S, with strike price K and expiration T. Show that for any time t < T, Ct > St − exp(−r(T − t))K.

6. Suppose that stock XYZ pays no dividends, and C is an American call on XYZ with strike price K and expiration T. Show, using an arbitrage argument, that it is never advantageous to exercise early. That is, for any intermediate time t < T, Ct > St − K. (Hint: Calculate the payoff of the option as seen from the same time T in two cases: 1) If you exercise early at time t < T. 2) If you hold it until expiration. To finance the first case, you borrow K at time t to buy the stock, and repay the loan at time T. Calculate this portfolio at time T. The other case you know the payoff. Now compare, and set up an arbitrage if St − K ≥ Ct.)

7. Suppose that C and D are two calls on the same stock with the same expiration T, but with strike prices K < L respectively. Show by an arbitrage argument that Ct ≥ Dt for all t < T.

8. Suppose that C and D are two calls on the same stock with the same strike price K, but with expirations T1 < T2 respectively. Show by an arbitrage argument that Dt ≥ Ct for all t < T1.

9. Suppose a stock is worth $S− just before it pays a dividend $D. What is the value of the stock S+ just after the dividend? Use an arbitrage argument.

10. Make up your own market, and show that there is a state price vector. Consider a specific derivative (of your choice) and calculate its market value.


Chapter 3

Discrete time

In this chapter, we proceed from the one period model of the last chapter to the multi-period model.

3.1 A first look at martingales

In this section I would like to give a very intuitive sense of what a martingale is, why they come up in option pricing, and then give an application to calculating expected stopping times. The concept of a martingale is one of primary importance in the modern theory of probability and in theoretical finance. Let me begin by stating the slogan that a martingale is a mathematical model for a fair game. In fact, the term martingale derives from the well known betting strategy (in roulette) of doubling one's bet after losing.

First let's think about what we mean by a fair game. Girolamo Cardano, an Italian, was one of the first mathematicians to study the theory of gambling. In his treatise, The Book of Games of Chance, he wrote: "The most fundamental principle of all in gambling is simply equal conditions, that is, of opponents, of bystanders, of money, of situation, of the dice box, and of the die itself. To the extent to which you depart from that equality, if it is in your opponent's favor, you are a fool, and if in your own, you are unjust." (From A. Hald, A History of Probability and Statistics and their Applications before 1750, John Wiley and Sons, New York.)

How do we mathematically model this notion of a fair game, that is, of equal conditions? Consider a number of gamblers, say N, playing a gambling

game. We will assume that this game is played in discrete steps called hands. We also assume that the gamblers make equal initial wagers and the winner wins the whole pot. It is reasonable to model each gambler by his current fortune, so we keep track of the gamblers' fortunes at each moment of time; noting the gamblers' fortunes at a single time is not so useful. Think about this: this is the only information that we really need to talk about. So let Fn denote the fortune of one of the gamblers after the nth hand. Because of the random nature of gambling, we let Fn be a random variable for each n. A sequence of random variables, such as Fn, is what is called a stochastic process.

Now for the equal conditions. A reasonable way to insist on the equal conditions of the gamblers is to assume that each of the gamblers' fortune processes is distributed (as random variables) in exactly the same way. That is, if pn(x) denotes the probability density function of Fn, and Gn is any of the other gamblers' fortunes, with pdf qn(x), then we have pn(x) = qn(x) for all n and x.

What are the consequences of this condition? The amount that the gambler wins in hand n is the random variable ξn = Fn − Fn−1, and the equal distribution of the gamblers' fortunes implies that the gamblers' wins in the nth hand are also equally distributed. From the conditions on the gambling game described above, we must have E(ξn) = 0. Hence we have that

E(Fn) = E(Fn−1 + ξn) = E(Fn−1) + E(ξn) = E(Fn−1)

So the expected fortunes of the gambler will not change over time. The same analysis says that if the gambler has Fn−1 after the (n − 1)st hand, then he can expect to have Fn−1 after the nth hand. That is, E(Fn | Fn−1) = Fn−1. (Here E(Fn | Fn−1) denotes a conditional expectation, which we explain below.) This is close to the Martingale condition. The Martingale condition, which we also explain below, says that

E(Fn | Fn−1, · · · , F0) = Fn−1

O Romeo, Romeo! wherefore art thou Romeo?
Deny thy father and refuse thy name;
Or, if thou wilt not, be but sworn my love,
And I'll no longer be a Capulet.
-Juliet

Everyone recognizes this as Juliet's line in Act 2 scene 2 of Shakespeare's Romeo and Juliet. (Benvolio also says this in Act 3 scene 1.) Now let me describe a very pretty application of the notion of martingale to a simple problem. Everyone has come across the theorem in a basic probability course that a monkey typing randomly at a typewriter, with a probability of 1/26 of typing any given letter (we will always ignore spaces, capitals and punctuation) and typing one letter per second, will eventually type the complete works of Shakespeare with probability 1. But how long will it take? More particularly, let's consider the question of how long it will take just to type: "O Romeo, Romeo". So the question, more mathematically, is: what is the expected time for a monkey to type "OROMEOROMEO" (ignoring spaces and commas)?

This seems perhaps pretty hard. So let's consider a couple of similar but simpler questions, and also get our mathematical juices going.

1. Flipping a fair coin over and over, what is the expected number of times it takes to flip before getting a Heads and then a Tails: HT?

2. Flipping a fair coin over and over, what is the expected number of times it takes to flip two Heads in a row: HH?

We can actually calculate these numbers by hand. Let T1 denote the random variable of the time until HT, and T2 the time until HH. Then we want to calculate E(T1) and E(T2). If T denotes one of these RV's (random variables), then

E(T) = Σ_{n=1}^∞ n P(T = n)

We then have

P(T1 = 1) = 0
P(T1 = 2) = P({HT}) = 1/4
P(T1 = 3) = P({HHT, THT}) = 2/8
P(T1 = 4) = P({HHHT, THHT, TTHT}) = 3/16
P(T1 = 5) = P({HHHHT, THHHT, TTHHT, TTTHT}) = 4/32
P(T1 = 6) = P({HHHHHT, THHHHT, TTHHHT, TTTHHT, TTTTHT}) = 5/64        (3.1.1)

We can see the pattern: P(T1 = n) = (n − 1)/2^n, and so E(T1) = Σ_{n=2}^∞ n(n − 1)/2^n. To sum this, let's write down the "generating function"

f(t) = Σ_{n=1}^∞ n(n − 1) t^n

so that E(T1) = f(1/2). To be able to calculate with such generating functions is a useful skill. First note that the term n(n − 1) is what would occur in the general term after differentiating a power series twice. On the other hand, this term sits in front of t^n instead of t^{n−2}, so we need to multiply a second derivative by t². Thus we make some guesses, and then adjustments, to see that

f(t) = t² (Σ_{n=0}^∞ t^n)'' = t² (1/(1 − t))''

so it is easy to see that

f(t) = 2t²/(1 − t)³

and that E(T1) = f(1/2) = 4. So far pretty easy and no surprises.

Let's do E(T2). Here

P(T2 = 1) = 0
P(T2 = 2) = P({HH}) = 1/4
P(T2 = 3) = P({THH}) = 1/8
P(T2 = 4) = P({T-THH, H-THH}) = 2/16
P(T2 = 5) = P({TT-THH, HT-THH, TH-THH}) = 3/32
P(T2 = 6) = P({TTT-THH, HTT-THH, THT-THH, TTH-THH, HTH-THH}) = 5/64
P(T2 = 7) = P({TTTT-THH, HTTT-THH, THTT-THH, TTHT-THH, HTHT-THH, THTH-THH, TTTH-THH, HTTH-THH}) = 8/128        (3.1.2)

Again the pattern is easy to see: P(T2 = n) = f_{n−1}/2^n, where fn denotes the nth Fibonacci number. (Prove it!)

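These hand calculations of the waiting times for HT and HH can be checked by simulation (a Monte Carlo sketch; the function name and trial count are our choices):

```python
import random

def expected_time(pattern, trials=200_000, seed=7):
    """Monte Carlo estimate of the expected number of fair-coin flips
    until `pattern` (a string of 'H'/'T') first appears."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        recent = ""
        while not recent.endswith(pattern):
            recent += "H" if rng.random() < 0.5 else "T"
        total += len(recent)
    return total / trials

# The generating-function calculations give E = 4 for HT and E = 6 for HH.
```

Running `expected_time("HT")` and `expected_time("HH")` gives estimates close to 4 and 6 respectively, confirming that HH really does take longer on average.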
(Recall that the Fibonacci numbers are defined by the recursive relationship f1 = 1, f2 = 1 and fn = fn−1 + fn−2.) So

E(T2) = Σ_{n=2}^∞ n f_{n−1}/2^n

Writing g(t) = Σ_{n=1}^∞ n f_{n−1} t^n, we calculate in a similar manner (but more complicated) that

g(t) = (2t² − t³)/(1 − t − t²)²

and that E(T2) = g(1/2) = 6. A little surprising perhaps. What is accounting for the fact that HH takes longer? A simple answer is that if we are awaiting the target string (HT or HH), and we succeed in getting the first letter at time n but then fail at time n + 1 to get the second letter, then the earliest we can succeed after that is n + 2 in the case of HT, but n + 3 in the case of HH.

The same kind of analysis could in theory be done for "OROMEOROMEO", but it would probably be hopelessly complicated. To approach "OROMEOROMEO", let's turn it into a game. I walk into a casino, and sitting at a table, dressed in the casino's standard uniform, is a monkey in front of a typewriter. For simplicity, assume that a round of monkey typing takes exactly one minute. Let T denote the random variable giving the time until "OROMEOROMEO". People are standing around the table making bets on the letters of the alphabet. Then the monkey strikes a key at random (it is a fair monkey) and the people win or lose according to whether their bet was on the letter struck. If their letter came up, they are paid off at a rate of 26 to 1. Clearly the odds and the pay-off are set up to yield a "fair game": this is an example of a martingale. I know in my mathematical bones that I cannot expect to do better on average than to break even. Eventually this will be a theorem in martingale theory (and not a very hard one at that). That is:

Theorem 3.1.1. No matter what strategy I adopt, I can never do better on average than to break even.

Together, a group of gamblers conspire on the following betting scheme. On the first play, the first bettor walks up to the table and bets one Italian Lira (monkey typing is illegal in the United States), ITL 1, that the first letter is an "O". If he wins, he gets ITL 26 and bets all ITL 26 on the second play that the second letter is an "R". The game then continues.

regardless of what has happened on the first play. because the game is fair. On the other hand. 3. How many such losses are there? Well. Then they calculate their total wins and losses. at each play. the sequence starting with the final ”O” in the first ROMEO and continuing through the end has gone undefeated and yields ITL 266 . If at some point one of the bettors gets through ”OROMEOROMEO”. X1 . let’s calculate the amount they’ve won once ”OROMEOROMEO” has been achieved. . their net earnings will be 2611 + 266 + 26 − T Since the expected value of this is 0. the second bettor begins the same betting sequence starting with the second play. Because they got through. all the money is confiscated and he walks away with a net loss of the initial ITL 1. the betting sequence beginning ”OROMEOROMEO” has yielded ITL 2611 .2 Introducing stochastic processes A stochastic process which is discrete in both space and time is a sequence of random variables.} such that the totality of values of all the Xn ’s combined form a discrete set. . that is. the conspirators expected wins is going to be 0. A fair amount of money even in Lira. The subscript n in Xn is to be thought of as time. Now in addition to starting the above betting sequence on the first play.56 CHAPTER 3. In addition to this though. and the conspirators walk away with their winnings and losses which they share. ITL 1 the betting sequence initiated at that play has been defeated. it means that the expected value E(T ) = 2611 + 266 + 26. etc. except for those enumerated above. The different values that the Xn ’s can take are called the states of the stochastic process. Of course if at some point he loses. Again. Try this for the ”HH” problem above to see more clearly how this works. the game ends. One after another. one for each play of monkeytyping. Therefore. Each of these then counts as a ITL 1 loss. the bettors (there are a lot of them) initiate a new betting sequence at each play. . 
defined on the same sample space (that is they are jointly distrubuted) {X0 . . she also bets ITL 1 that the second play will be an O. On the other hand. And finally the final ”O” earns an additional ITL 26. DISCRETE TIME wins he bets all ITL 262 that on the third play the letter is an ”O”. X2 .
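Returning for a moment to the waiting-time argument: the "HH" exercise suggested above predicts E(T) = 2^2 + 2 = 6 for a fair coin (the pattern HH overlaps itself in its final H). As an illustrative sketch — ours, not part of the notes; the function names are made up — the prediction can be checked by simulation:

```python
import random

def time_to_pattern(rng, pattern="HH"):
    """Flip a fair coin until `pattern` first appears; return the number of flips."""
    history = ""
    while not history.endswith(pattern):
        history += rng.choice("HT")
    return len(history)

def estimate_expected_time(pattern="HH", trials=20_000, seed=0):
    """Monte Carlo estimate of E(T), the expected waiting time for `pattern`."""
    rng = random.Random(seed)
    return sum(time_to_pattern(rng, pattern) for _ in range(trials)) / trials

est = estimate_expected_time("HH")  # the martingale argument predicts 2**2 + 2 = 6
```

The same martingale bookkeeping gives E(T) = 2^2 = 4 for the non-self-overlapping pattern "HT", which the simulation also reproduces.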

We are interested in understanding the evolution of such a process. One way to describe such a thing is to consider how Xn depends on X0, . . . , Xn−1. That is, for a stochastic process Xn we might specify

P(Xn+1 = j | Xn = kn, Xn−1 = kn−1, . . . , X0 = k0).

If, as in the case of a random walk,

P(Xn+1 = j | Xn = kn, Xn−1 = kn−1, . . . , X0 = k0) = P(Xn+1 = j | Xn = kn),

the process is called a Markov process.

Example 3.2.1. Let Wn denote the position of a random walker. At time 0, W0 = 0 say, and at each moment in time the random walker has a probability p of taking a step to the left and a probability 1 − p of taking a step to the right. Thus, if at time n we have Wn = k, then at time n + 1 there is probability p of being at k − 1 and probability 1 − p of being at k + 1. We express this mathematically by saying

P(Wn+1 = j | Wn = k) = p if j = k − 1, 1 − p if j = k + 1, and 0 otherwise.   (3.2.1)

In general, for a Markov process we write

pjk = P(Xn+1 = k | Xn = j).

So pjk a priori depends on n; if it doesn't, then the process is called a stationary Markov process. The numbers pjk form a matrix (an infinite-by-infinite matrix if the random variables Xn have infinitely many states), called the matrix of transition probabilities. They satisfy, merely by virtue of being probabilities,

1. pjk ≥ 0;
2. Σ_k pjk = 1 for each j.

Let's reiterate: pjk represents the probability of going from state j to state k in one time step. Thus, if Xn is a stationary Markov process, once we know the pjk the only information left to know is the initial probability distribution of X0, that is, p0(k) = P(X0 = k). The basic question is: given p0 and pjk, what is the probability distribution of Xn?

Towards this end we introduce the "higher" transition probabilities

p^(n)_jk = P(Xn = k | X0 = j),

which by stationarity also equals P(Xm+n = k | Xm = j), the probability of going from j to k in n steps.

Proposition 3.2.1. p^(n)_jk = Σ_h p^(n−1)_jh p_hk.

Proof: p^(n)_jk = P(Xn = k | X0 = j). Now a process that goes from state j to state k in n steps must land somewhere on the (n − 1)st step. Suppose it happens to land on the state h. Then, by stationarity, the probability P(Xn = k | Xn−1 = h) = P(X1 = k | X0 = h) = p_hk. So the probability of going from state j to state h in n − 1 steps and then going from state h to state k in one step is

P(Xn−1 = h | X0 = j) P(Xn = k | Xn−1 = h) = p^(n−1)_jh p_hk.

Now the proposition follows by summing up over all the states one could have landed upon on the (n − 1)st step. Q.E.D.

It follows by repeated application of the preceding proposition that

p^(n+m)_jk = Σ_h p^(n)_jh p^(m)_hk.

This is called the Chapman-Kolmogorov equation. Expressed in terms of matrix multiplication, it just says that if P = (pjk) denotes the matrix of transition probabilities from one step to the next, then P^n, the matrix raised to the nth power, represents the higher transition probabilities, and of course P^(n+m) = P^n P^m.

Now we can answer the basic question posed above: if P(X0 = j) = p0(j) is the initial probability distribution of a stationary Markov process with transition matrix P = (pjk), then

P(Xn = k) = Σ_j p0(j) p^(n)_jk.
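The answer to the basic question is thus a matrix computation: the distribution of Xn is the initial row vector p0 multiplied by the nth power of the transition matrix. A small numerical sketch (ours, with a made-up 3-state chain), also checking the Chapman-Kolmogorov identity P^5 = P^2 P^3:

```python
def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][h] * B[h][k] for h in range(n)) for k in range(n)]
            for i in range(n)]

def mat_pow(P, n):
    """nth power of a square matrix (n >= 1), by repeated multiplication."""
    R = P
    for _ in range(n - 1):
        R = mat_mul(R, P)
    return R

def distribution(p0, P, n):
    """P(Xn = k) = sum_j p0(j) p^(n)_jk: initial row vector times P^n."""
    Pn = mat_pow(P, n)
    m = len(p0)
    return [sum(p0[j] * Pn[j][k] for j in range(m)) for k in range(m)]

# A 3-state stationary Markov chain; each row of P sums to 1.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
p0 = [1.0, 0.0, 0.0]  # start in state 0 with probability 1
```

The rows of P^n again sum to 1, since P^n is itself a matrix of (n-step) transition probabilities.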

3.2.1 Generating functions

Before getting down to some examples, I want to introduce a technique for doing calculations with random variables: generating functions. Let X be a discrete random variable whose states are integers j, with distribution p(k) = P(X = k). We introduce its generating function

F_X(t) = Σ_k p(k) t^k.

At this point we do not really pay attention to the convergence of F_X; we will do formal manipulations with F_X, and therefore F_X is called a formal power series (or, more properly, a formal Laurent series, since it can have negative powers of t). Here are some of the basic facts about the generating function of X.

1. F_X(1) = 1, since this is just the sum over all the states of the probabilities, which must be 1.
2. E(X) = F_X'(1). For F_X'(t) = Σ_k k p(k) t^(k−1); now evaluate at 1.
3. Var(X) = F_X''(1) + F_X'(1) − (F_X'(1))^2. This is an easy calculation as well.
4. If X and Y are independent random variables, then F_(X+Y)(t) = F_X(t) F_Y(t).

Before going on, let's use these facts to do some computations.

Example 3.2.2. Let X be a random variable representing a Bernoulli trial, that is,

P(X = −1) = p, P(X = 1) = 1 − p, and P(X = k) = 0 otherwise.

Thus F_X = p t^(−1) + (1 − p) t. Now a random walk can be described by

Wn = X1 + · · · + Xn,

where each Xk is an independent copy of X: Xk represents taking a step to the left or right at the kth step. According to fact 4 above, since the Xk are mutually independent, we have

F_(Wn) = F_(X1) F_(X2) · · · F_(Xn) = (F_X)^n = (p t^(−1) + (1 − p) t)^n = Σ_(j=0)^n (n choose j) (p t^(−1))^j ((1 − p) t)^(n−j)

= Σ_(j=0)^n (n choose j) p^j (1 − p)^(n−j) t^(n−2j).

Hence the distribution for the random walk Wn can be read off as the coefficients of the generating function:

P(Wn = n − 2j) = (n choose j) p^j (1 − p)^(n−j) for j = 0, 1, . . . , n,

and is zero for all other values. (Note: this is our third time calculating this distribution.)

Also, E(Wn) = F_(Wn)'(1). Now

F_(Wn)'(t) = d/dt (p t^(−1) + (1 − p) t)^n = n (p t^(−1) + (1 − p) t)^(n−1) ((1 − p) − p t^(−2)),

and evaluating at 1 gives n (p + (1 − p))^(n−1) ((1 − p) − p) = n((1 − p) − p). Similarly,

F_(Wn)''(t) = n(n − 1)(p t^(−1) + (1 − p) t)^(n−2) ((1 − p) − p t^(−2))^2 + n (p t^(−1) + (1 − p) t)^(n−1) (2 p t^(−3)),

and evaluating at 1 gives n(n − 1)((1 − p) − p)^2 + 2np. So

Var(Wn) = F_(Wn)''(1) + F_(Wn)'(1) − (F_(Wn)'(1))^2
= n(n − 1)((1 − p) − p)^2 + 2np + n((1 − p) − p) − n^2 ((1 − p) − p)^2
= −n((1 − p) − p)^2 + n((1 − p) − p) + 2np
= n((1 − p) − p)(1 − ((1 − p) − p)) + 2np
= 2pn((1 − p) − p) + 2np
= 2np(1 − p) − 2np^2 + 2np
= 2np(1 − p) + 2np(1 − p) = 4np(1 − p).

Consider the following experiment. Roll two dice and read off the sum on them; then flip a coin that number of times and count the number of heads. This describes a random variable X. What is its distribution? We can describe X mathematically as follows. Let N denote the random variable given by the sum on two dice. So, according to a calculation in a previous set of notes,

P(N = n) = (1/36)(6 − |7 − n|) for n = 2, . . . , 12, and 0 otherwise.

Also, let Y count the number of heads in one coin toss, so that P(Y = k) = 1/2 if k = 0 or 1, and 0 otherwise. Hence our random variable X can be written as

X = Σ_(j=1)^N Yj,   (3.2.2)

where the Yj are independent copies of Y; note that the upper limit of the sum is the random variable N. This is an example of a composite random variable. In general, a composite random variable is one that can be expressed by (3.2.2), where N and Y are general random variables.
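The generating-function manipulation for Wn can also be carried out numerically. In this sketch (ours, not the notes'), a formal Laurent series is stored as a dictionary mapping powers of t to coefficients; multiplying (F_X)^n out then recovers exactly the binomial distribution of Wn read off above.

```python
from collections import defaultdict
from math import comb

def poly_mul(f, g):
    """Multiply two formal Laurent series given as {power: coefficient} dicts."""
    h = defaultdict(float)
    for i, a in f.items():
        for j, b in g.items():
            h[i + j] += a * b
    return dict(h)

def walk_generating_function(n, p):
    """(F_X)^n for F_X = p*t^(-1) + (1-p)*t, as a {power: coeff} dict."""
    FX = {-1: p, 1: 1 - p}
    F = {0: 1.0}
    for _ in range(n):
        F = poly_mul(F, FX)
    return F

n, p = 6, 0.3
F = walk_generating_function(n, p)
# Coefficient of t^(n-2j) should be C(n, j) p^j (1-p)^(n-j).
for j in range(n + 1):
    expected = comb(n, j) * p**j * (1 - p) ** (n - j)
    assert abs(F[n - 2 * j] - expected) < 1e-12
```

Facts 2 and 3 above can be checked on the coefficients as well: the mean Σ k·p(k) comes out to n((1 − p) − p) and the variance to 4np(1 − p).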

Proposition 3.2.2. Let X be the composite of two random variables N and Y. Then

F_X(t) = F_N(F_Y(t)) = (F_N ◦ F_Y)(t).

Proof:

P(X = k) = Σ_(n=0)^∞ P(N = n) P(Σ_(j=1)^n Yj = k).

Now F_(Σ_(j=1)^n Yj) = (F_Y(t))^n, since the Yj's are mutually independent. Therefore

F_X = Σ_n P(N = n) (F_Y(t))^n = F_N(F_Y(t)). Q.E.D.

So in our example above of rolling dice and then flipping coins,

F_N = Σ_(n=2)^12 (1/36)(6 − |7 − n|) t^n and F_Y(t) = 1/2 + (1/2)t.

Therefore F_X(t) = Σ_(n=2)^12 (1/36)(6 − |7 − n|)(1/2 + (1/2)t)^n, from which one can read off the distribution of X.

The above sort of compound random variables appear naturally in the context of branching processes. Suppose we try to model a population (of amoebae, of humans, of subatomic particles) in the following way. We start with a single individual in the zeroth generation; so X0 = 1. This individual splits into (or becomes) X1 individuals (offspring) in the first generation. The number of offspring X1 is a random variable with a certain distribution P(X1 = k) = p(k): p(0) is the probability that X0 dies, p(1) is the probability that X0 has one descendant, p(2) that it has two descendants, etc. Now in the first generation we have X1 individuals, each of which will produce offspring according to the same probability distribution as its parent. Thus in the second generation

X2 = Σ_(j=1)^(X1) X_(1,j),

where each X_(1,j) is an independent copy of X1. That is, X2 is the compound random variable of X1 and itself. Continuing on, we ask that in the nth generation each individual produce offspring according to the same distribution as its ancestor X1. Denoting the number of members in the nth generation by Xn, we have

Xn = Σ_(j=1)^(Xn−1) X_(n−1,j),

where the X_(n−1,j) are independent copies of X1.
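Proposition 3.2.2 is easy to test numerically on the dice-and-coins example. The sketch below (ours) expands F_N(F_Y(t)) = Σ_n P(N = n)(1/2 + t/2)^n into polynomial coefficients, which by the proposition should be the distribution of X, and compares against a direct simulation of the experiment.

```python
import random
from math import comb

def fx_coefficients():
    """Read off P(X = k) as the coefficients of F_N(F_Y(t)) = sum_n P(N=n)(1/2 + t/2)^n."""
    coeffs = [0.0] * 13  # X can take the values 0..12
    for n in range(2, 13):
        pn = (6 - abs(7 - n)) / 36.0
        for k in range(n + 1):
            coeffs[k] += pn * comb(n, k) / 2**n  # coeff of t^k in (1/2 + t/2)^n
    return coeffs

def simulate_x(rng):
    """Roll two dice, flip that many fair coins, count heads."""
    n = rng.randint(1, 6) + rng.randint(1, 6)
    return sum(rng.random() < 0.5 for _ in range(n))

rng = random.Random(1)
trials = 100_000
counts = [0] * 13
for _ in range(trials):
    counts[simulate_x(rng)] += 1
empirical = [c / trials for c in counts]
theoretical = fx_coefficients()
```

As a cross-check, E(X) = E(N) · E(Y) = 7 · (1/2) = 3.5, which the coefficients reproduce.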

Using generating functions we can calculate the distribution of Xn. So we start with F_(X1)(t). Then, by the formula for the generating function of a compound random variable,

F_(X2)(t) = F_(X1)(F_(X1)(t)),

and more generally,

F_(Xn) = F_(X1) ◦ · · · ◦ F_(X1)   (n times).

Without generating functions, this expression would be quite difficult.

3.3 Conditional expectation

We now introduce a very subtle concept of utmost importance to us: conditional expectation. We will begin by defining it in its simplest incarnation: the expectation of a random variable conditioned on an event. Given an event A, define its indicator to be the random variable

1_A(ω) = 1 if ω ∈ A, 0 if ω ∉ A.

Define

E(X|A) = E(X · 1_A) / P(A).

In the discrete case, this equals Σ_x x P(X = x | A), and in the continuous case, ∫ x p(x|A) dx.

Some basic properties of conditional expectation are

Proposition 3.3.1. (Properties of conditional expectation)

1. E(X + Y | A) = E(X|A) + E(Y|A).
2. If X is a constant c on A, then E(XY | A) = c E(Y|A).

3. If B = ∪_(i=1)^n A_i is a disjoint union, then

E(X|B) = Σ_(i=1)^n E(X|A_i) P(A_i)/P(B).

4. (Jensen's inequality) A function f is convex if for all t ∈ [0, 1] and x < y, f satisfies the inequality

f((1 − t)x + ty) ≤ (1 − t)f(x) + t f(y).

Then for f convex,

E(f(X)) ≥ f(E(X)) and E(f(X)|A) ≥ f(E(X|A)).

Now let X and Y be two random variables defined on the same sample space. Then we can talk of the particular kind of event {Y = j}, and hence it makes sense to talk of E(X|Y = j). As j runs over the values of the variable Y, E(X|Y = j) takes on different values. These are the values of a new random variable. There is nothing really new here, but a more useful way to think of the conditional expectation of X conditioned on Y is as a random variable. Thus we make the following

Definition 3.3.1. Let X, Y be two random variables defined on the same sample space. Define a random variable E(X|Y), called the expectation of X conditioned on Y, whose values are E(X|Y = j) as j runs over the values of Y. Finally,

P[E(X|Y) = e] = Σ_({j | E(X|Y=j) = e}) P[Y = j].

More generally, if Y0, Y1, . . . , Yn are a collection of random variables defined on the same sample space, we may define E(X | Y0 = y0, Y1 = y1, . . . , Yn = yn) and E(X | Y0, Y1, . . . , Yn).

Some basic properties of the random variable E(X|Y) are listed in the following proposition.

Proposition 3.3.2. Let X, Y be two random variables defined on the same sample space. Then

1. (Tower property) E(E(X|Y)) = E(X).
2. If X, Y are independent, then E(X|Y) = E(X).
3. Letting Z = f(Y), then E(ZX|Y) = Z E(X|Y).

Proof:

1. P(E(X|Y) = e) = Σ_({j | E(X|Y=j) = e}) P(Y = j). Then

E(E(X|Y)) = Σ_e e P(E(X|Y) = e)
= Σ_j E(X|Y = j) P(Y = j)
= Σ_j (Σ_k k P(X = k | Y = j)) P(Y = j)
= Σ_j Σ_k k (P((X = k) ∩ (Y = j)) / P(Y = j)) P(Y = j)
= Σ_j Σ_k k P((X = k) ∩ (Y = j))
= Σ_k k Σ_j P((X = k) ∩ (Y = j))
= Σ_k k P((X = k) ∩ (∪_j {Y = j}))
= Σ_k k P(X = k) = E(X).   (3.3.1)

2. Since X and Y are independent,

E(X|Y = j) = Σ_k k P(X = k | Y = j) = Σ_k k P(X = k) = E(X).   (3.3.2)

This is true for all j, so the conditional expectation is a constant.

3. Letting Z = f(Y), we have

E(ZX|Y = j) = Σ_(k,l) k l P((Z = k) ∩ (X = l) | Y = j)
= Σ_(k,l) k l P((f(Y) = k) ∩ (X = l) | Y = j)
= f(j) Σ_l l P(X = l | Y = j) = Z E(X|Y = j).   (3.3.3)

Q.E.D.

We also have an analogous statement for conditioning on a set of variables.

Proposition 3.3.3. Let X, Y0, . . . , Yn be random variables defined on the same sample space. Then

1. (Tower property) E(E(X | Y0, . . . , Yn)) = E(X).
2. If X is independent of Y0, . . . , Yn, then E(X | Y0, . . . , Yn) = E(X).
3. Letting Z = f(Y0, . . . , Yn), then E(ZX | Y0, . . . , Yn) = Z E(X | Y0, . . . , Yn).

Proof: See exercises.
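The view of E(X|Y) as a random variable is concrete enough to compute. In this small sketch (ours, with a made-up joint distribution), E(X|Y = j) is computed for each j and the tower property E(E(X|Y)) = E(X) is verified directly.

```python
from collections import defaultdict

# Joint distribution of (X, Y) on a finite sample space:
# keys are (x, y) value pairs, values are probabilities summing to 1.
joint = {
    (0, 0): 0.10, (0, 1): 0.20,
    (1, 0): 0.25, (1, 1): 0.15,
    (2, 0): 0.05, (2, 1): 0.25,
}

def cond_expectation(joint):
    """Return ({j: E(X | Y = j)}, {j: P(Y = j)})."""
    py = defaultdict(float)
    exy = defaultdict(float)
    for (x, y), p in joint.items():
        py[y] += p
        exy[y] += x * p
    return {j: exy[j] / py[j] for j in py}, dict(py)

cond, py = cond_expectation(joint)
# Tower property: E(E(X|Y)) = sum_j E(X|Y=j) P(Y=j) should equal E(X).
tower = sum(cond[j] * py[j] for j in py)
ex = sum(x * p for (x, _), p in joint.items())
```

The dictionary `cond` is exactly the random variable E(X|Y): a function of the value of Y alone.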

3.4 Discrete time finance

Now let us return to the market. Let's go back to our model of the market from the last chapter: we have a vector of securities S and a pay-off matrix D = (dij), where security i pays off dij in state j. We've seen that the no-arbitrage hypothesis led to the existence of a state price vector ψ so that S = Dψ. Setting R = Σ_j ψ_j and q_j = ψ_j/R as in the last chapter, we have Σ_j q_j = 1, and thus the q_j's determine a probability Q on the set of states.

If Sn denotes the value of a security at time n, is Sn a Martingale? There are two reasons why not. The first reason is the time value of money: if Sn were a Martingale, the expected value of the security tomorrow would be no more than its value today, and no one would buy it — you might as well just put your money under your mattress. What about the discounted security process? Discounting would indeed turn a bank account into a Martingale; but any other security would not be. This is the second reason why the discounted stock price process is not a Martingale: if the discounted stock value were a Martingale, you would expect to make as much money holding the stock as in your bank account, and people would just stick their money in the bank.

On the other hand, consider the discounted stock price with respect to the probabilities Q which were arrived at by the no-arbitrage assumption. To be concrete, let's consider a very simple market consisting of two securities, a bond S1 and a stock S2. The value of the stock at time 1 is a random variable, which we will call d2: the value of the stock at time 1 is d2j if state j occurs, and with respect to the probabilities Q we have P(d2 = d2j) = qj. The expected value with respect to Q of the stock at time 1 is

EQ(d2) = Σ_j d_(2,j) q_j = (D(ψ/R))_2 = (S/R)_2 = S2/R.

In other words, EQ(R d2) = S2: the expected discounted stock price at time 1 is the same as the value at time 0.

Thus the no-arbitrage hypothesis leads to the existence of a state price vector, or, to what is more or less the same thing, to a probability Q on the set of states such that the expected value of the discounted price of the stock at time 1 is the same as the stock price at time 0. This is the first step to finding Q such that the discounted stock prices are martingales, and it is the basic phenomenon that we will take advantage of in the rest of this chapter to value derivatives. The arbitrage theorem of last chapter, however, only gives us an existence theorem. It does not tell us how to find Q in actual practice. For this we will simplify our market drastically and specify a more definite model for stock movements.
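A tiny numerical sketch (ours, with made-up pay-offs and state prices) of the passage from a state price vector to Q: given D and ψ with S = Dψ, set R = Σ_j ψ_j and q_j = ψ_j/R, and check that the Q-expected discounted pay-off of each security reproduces its price.

```python
# Two securities (a bond and a stock), three states. Numbers are illustrative.
D = [
    [1.05, 1.05, 1.05],   # bond pays 1.05 in every state
    [1.30, 1.00, 0.80],   # stock pay-off by state
]
psi = [0.30, 0.40, 0.25]  # state price vector (assumed given)

S = [sum(D[i][j] * psi[j] for j in range(3)) for i in range(2)]  # prices S = D psi
R = sum(psi)                       # discount factor
q = [p / R for p in psi]           # the probabilities q_j = psi_j / R

# E_Q(R d_i) = S_i for every security i: discounted prices are Q-expectations.
for i in range(2):
    eq = R * sum(D[i][j] * q[j] for j in range(3))
    assert abs(eq - S[i]) < 1e-12
```

The check is an identity once S = Dψ holds; the point is that the q_j are genuine probabilities (nonnegative, summing to 1), so "discounted expected pay-off" makes probabilistic sense.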

Our market will consist of a single stock S and a bond B. At time 0, the stock is worth S0 and the bond is worth B0. At time 1 · δt, there will be only two states of nature, u (up) or d (down): the stock will either go up to some value S1^u or down to S1^d,

S0 → S1^u or S1^d.

According to our picture up there, we need to find a probability Q such that EQ(e^(−rδt) S1) = S0. Let q be the probability that the stock goes up. Then we want

S0 = EQ(e^(−rδt) S1) = q e^(−rδt) S1^u + (1 − q) e^(−rδt) S1^d.

Thus we need

q (e^(−rδt) S1^u − e^(−rδt) S1^d) + e^(−rδt) S1^d = S0.

So

q = (e^(rδt) S0 − S1^d) / (S1^u − S1^d),
1 − q = (S1^u − e^(rδt) S0) / (S1^u − S1^d).   (3.4.1)

The bond, meanwhile, grows at the riskless rate:

B0 → e^(rδt) B0.   (3.4.2)

If h is a contingency claim on S, then according to our formula of last chapter, the value of the derivative is

V(h) = R EQ(h(S1))
= e^(−rδt) (q h(S1^u) + (1 − q) h(S1^d))
= e^(−rδt) ( h(S1^u) (e^(rδt) S0 − S1^d)/(S1^u − S1^d) + h(S1^d) (S1^u − e^(rδt) S0)/(S1^u − S1^d) ).

Let's rederive this based on an ab initio arbitrage argument for h. Let's try to construct a replicating portfolio for h: that is, a portfolio consisting of the stock and bond so that no matter whether the stock goes up or down, the portfolio is worth the same as the derivative.

Suppose at time 0 we set up a portfolio

θ = (φ, ψ)   (3.4.3)

of φ units of stock S0 and ψ units of bond B0. At time δt, the value of this portfolio is worth either φ S1^u + ψ e^(rδt) B0 if the state is u, or φ S1^d + ψ e^(rδt) B0 if the state is d. We want this value to be the same as h(S1). So we solve for φ and ψ:

φ S1^u + ψ e^(rδt) B0 = h(S1^u),
φ S1^d + ψ e^(rδt) B0 = h(S1^d).   (3.4.4)

Subtracting,

φ (S1^u − S1^d) = h(S1^u) − h(S1^d).

So we find

φ = (h(S1^u) − h(S1^d)) / (S1^u − S1^d),

while

ψ = (h(S1^u) − φ S1^u) / (e^(rδt) B0)
= ( h(S1^u)(S1^u − S1^d) − (h(S1^u) − h(S1^d)) S1^u ) / ( e^(rδt) B0 (S1^u − S1^d) )
= ( h(S1^d) S1^u − h(S1^u) S1^d ) / ( e^(rδt) B0 (S1^u − S1^d) ).

Thus the value of the portfolio at time 0 is

φ S0 + ψ B0 = ( (h(S1^u) − h(S1^d)) / (S1^u − S1^d) ) · S0 + ( (h(S1^d) S1^u − h(S1^u) S1^d) / (e^(rδt) B0 (S1^u − S1^d)) ) · B0
= e^(−rδt) ( h(S1^u) (e^(rδt) S0 − S1^d)/(S1^u − S1^d) + h(S1^d) (S1^u − e^(rδt) S0)/(S1^u − S1^d) ).   (3.4.5)
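The one-step model is small enough to verify end to end. This sketch (ours; the parameter values are made up for illustration) computes the risk-neutral q of (3.4.1), prices a claim by discounted expectation, and checks that the replicating portfolio (φ, ψ) matches the claim in both states and has the same time-0 value, as in (3.4.5).

```python
import math

def one_step_price(S0, Su, Sd, r, dt, h, B0=1.0):
    """Risk-neutral price and replicating portfolio for a claim h on a one-step binomial stock."""
    growth = math.exp(r * dt)
    q = (growth * S0 - Sd) / (Su - Sd)          # risk-neutral up-probability (3.4.1)
    price = math.exp(-r * dt) * (q * h(Su) + (1 - q) * h(Sd))
    phi = (h(Su) - h(Sd)) / (Su - Sd)           # stock holding
    psi = (h(Su) - phi * Su) / (growth * B0)    # bond holding
    return q, price, phi, psi

# A call option with strike 100 on made-up numbers.
S0, Su, Sd, r, dt, B0 = 100.0, 120.0, 90.0, 0.05, 1.0, 1.0
h = lambda s: max(s - 100.0, 0.0)
q, price, phi, psi = one_step_price(S0, Su, Sd, r, dt, h, B0)

growth = math.exp(r * dt)
assert abs(phi * Su + psi * growth * B0 - h(Su)) < 1e-10  # replicates in state u
assert abs(phi * Sd + psi * growth * B0 - h(Sd)) < 1e-10  # replicates in state d
assert abs(phi * S0 + psi * B0 - price) < 1e-10           # same time-0 value
```

Note that the real-world probability of an up-move never appears: only S0, the two future prices, and the riskless rate enter.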

Since the value of this portfolio and the value of the derivative are the same at time 1, they must be equal at time 0, by the no-arbitrage hypothesis. So we've arrived at the value of the derivative by a somewhat different argument.

Now we move on to a multi-period model. We are interested in defining a reasonably flexible and concrete model for the movement of stocks. We now want to divide the time period from now, time 0, to some horizon T, which will often be the expiration time for a derivative, into N periods, each of length δt, so that N δt = T.

One step (redone with δt in the notation): S0 goes to S_δt^u with probability q and to S_δt^d with probability 1 − q, where

q = (e^(rδt) S0 − S_δt^d) / (S_δt^u − S_δt^d),
1 − q = (S_δt^u − e^(rδt) S0) / (S_δt^u − S_δt^d).   (3.4.6)

Then for a contingency claim h we have V(h) = e^(−rδt) EQ(h).

Two steps:

S0 → S_1δt^u (probability q0) or S_1δt^d (probability 1 − q0);
S_1δt^u → S_2δt^uu (probability q1^u) or S_2δt^ud (probability 1 − q1^u);
S_1δt^d → S_2δt^du (probability q1^d) or S_2δt^dd (probability 1 − q1^d),   (3.4.7)

where, applying (3.4.6) at each node,

q1^u = (e^(rδt) S_1δt^u − S_2δt^ud) / (S_2δt^uu − S_2δt^ud),
1 − q1^u = (S_2δt^uu − e^(rδt) S_1δt^u) / (S_2δt^uu − S_2δt^ud),
q1^d = (e^(rδt) S_1δt^d − S_2δt^dd) / (S_2δt^du − S_2δt^dd),
1 − q1^d = (S_2δt^du − e^(rδt) S_1δt^d) / (S_2δt^du − S_2δt^dd),
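The one-step recursion (3.4.6) iterates across the tree: applied backward from the terminal nodes it values a claim over any number of steps. A sketch (ours; the recombining tree with constant up/down factors u and d is an assumption made for concreteness, not something fixed by the notes):

```python
import math

def binomial_value(S0, u, d, r, dt, N, payoff):
    """Value a claim by backward induction on a recombining N-step binomial tree.

    The stock moves S -> S*u or S -> S*d each period; q is the risk-neutral
    up-probability from (3.4.6), constant here because u and d are constant.
    """
    growth = math.exp(r * dt)
    q = (growth - d) / (u - d)
    assert 0 < q < 1, "no-arbitrage requires d < e^(r dt) < u"
    # Terminal claim values at the N+1 final nodes (j = number of up-moves).
    values = [payoff(S0 * u**j * d**(N - j)) for j in range(N + 1)]
    # Step back through the tree, discounting Q-expectations one period at a time.
    for _ in range(N):
        values = [math.exp(-r * dt) * (q * values[j + 1] + (1 - q) * values[j])
                  for j in range(len(values) - 1)]
    return values[0]
```

A useful sanity check: for the claim h(s) = s the procedure returns S0 itself, which is exactly the statement that the discounted stock price is a Q-martingale.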

and

q0 = (e^(rδt) S0 − S_1δt^d) / (S_1δt^u − S_1δt^d),
1 − q0 = (S_1δt^u − e^(rδt) S0) / (S_1δt^u − S_1δt^d).

Suppose h is a contingency claim and that we know h(S_2δt). Then we have

h(S_1δt^u) = e^(−rδt) (q1^u h(S_2δt^uu) + (1 − q1^u) h(S_2δt^ud)),
h(S_1δt^d) = e^(−rδt) (q1^d h(S_2δt^du) + (1 − q1^d) h(S_2δt^dd)),

and therefore

h(S0) = e^(−rδt) (q0 h(S_1δt^u) + (1 − q0) h(S_1δt^d))
= e^(−rδt) ( q0 (e^(−rδt)(q1^u h(S_2δt^uu) + (1 − q1^u) h(S_2δt^ud))) + (1 − q0)(e^(−rδt)(q1^d h(S_2δt^du) + (1 − q1^d) h(S_2δt^dd))) )
= e^(−2rδt) ( q0 q1^u h(S_2δt^uu) + q0 (1 − q1^u) h(S_2δt^ud) + (1 − q0) q1^d h(S_2δt^du) + (1 − q0)(1 − q1^d) h(S_2δt^dd) ).

First, let's notice what the coefficients of the h(S_2δt) terms are: q0 q1^u represents going up twice, q0 (1 − q1^u) represents going up, then down, etc. That is, these coefficients are the probabilities of ending up at a particular node of the tree. Thus, if we define a sample space

Ψ = {uu, ud, du, dd}

with

P(uu) = q0 q1^u, P(ud) = q0 (1 − q1^u), P(du) = (1 − q0) q1^d, P(dd) = (1 − q0)(1 − q1^d),

then h(S2) is a random variable depending on w0 w1 ∈ Ψ. We check that this is a probability on the final nodes of the two-step tree:

P(uu) + P(ud) + P(du) + P(dd)
= q0 q1^u + q0 (1 − q1^u) + (1 − q0) q1^d + (1 − q0)(1 − q1^d)
= q0 (q1^u + 1 − q1^u) + (1 − q0)(q1^d + 1 − q1^d)
= q0 + 1 − q0 = 1.

Hence we have

V(h) = e^(−2rδt) EQ(h(S2)).

So what's the general case? Let's try to organize this into an N-step tree. For N steps, the sample space is

Ψ = {w0 w1 . . . w_(N−1) | wi = u or d},

and

P(w0 w1 . . . w_(N−1)) = ∏_(j=0)^(N−1) P(wj),

where

P(w0) = q0 if w0 = u, and 1 − q0 if w0 = d,

and more generally

P(wj) = q_j^(w0 · · · w_(j−1)) if wj = u, and 1 − q_j^(w0 · · · w_(j−1)) if wj = d,

q_j^(w0 · · · w_(j−1)) being the one-step up-probability at the node reached by the path w0 · · · w_(j−1). Finally,

V(h) = e^(−rNδt) EQ(h(SN)).

3.5 Basic martingale theory

We will now begin our formal study of martingales.

Definition 3.5.1. Let Mn be a stochastic process. We say Mn is a Martingale if

1. E(|Mn|) < ∞, and
2. E(Mn+1 | Mn = mn, Mn−1 = mn−1, . . . , M0 = m0) = mn.

We say Mn is a submartingale if

1. E(|Mn|) < ∞, and
2. E(Mn+1 | Mn = mn, Mn−1 = mn−1, . . . , M0 = m0) ≥ mn.

We also define a stochastic process Mn to be a supermartingale if

1. E(|Mn|) < ∞, and
2. E(Mn+1 | Mn = mn, Mn−1 = mn−1, . . . , M0 = m0) ≤ mn.

Noting that E(Mn+1 | Mn = mn, . . . , M0 = m0) are the values of a random variable which we called E(Mn+1 | Mn, . . . , M0), we may write condition 2. above as E(Mn+1 | Mn, . . . , M0) = Mn (respectively ≥ Mn for a submartingale, and ≤ Mn for a supermartingale). The following theorem is one expression of the fair game aspect of martingales.

Theorem 3.5.1.

1. If Yn is a martingale, then E(Yn) = E(Ym) for all n, m.
2. If Yn is a supermartingale, then E(Yn) ≤ E(Ym) for n ≥ m.
3. If Yn is a submartingale, then E(Yn) ≥ E(Ym) for n ≥ m.

Proof:

2. For n ≥ m, the tower property of conditional expectation implies

E(Yn) = E(E(Yn | Ym, Ym−1, . . . , Y0)) ≤ E(Ym).

3. is similar, and 1. is implied by 2. and 3. together. Q.E.D.

It will often be useful to consider several stochastic processes at once, or to express a stochastic process in terms of other ones. But we will usually assume that there is some basic or driving stochastic process, and that the others are expressed in terms of this one and defined on the same sample space. Thus we make the following more general definition of a martingale.

Definition 3.5.2. (Generalization) Let Xn be a stochastic process. A stochastic process M0, M1, . . . is a martingale relative to X0, . . . , Xn if

1. E(|Mn|) < ∞, and
2. E(Mn+1 − Mn | X0 = x0, . . . , Xn = xn) = 0.

Examples:

1. Let Xn = a + ξ1 + · · · + ξn, where the ξi are independent random variables with E(|ξi|) < ∞ and E(ξi) = 0. Then Xn is a martingale. For

E(Xn+1 | X0 = x0, . . . , Xn = xn) = E(Xn + ξn+1 | X0 = x0, . . . , Xn = xn)
= E(Xn | X0 = x0, . . . , Xn = xn) + E(ξn+1 | X0 = x0, . . . , Xn = xn).

Now, by property 1 of conditional expectation for the first summand and property 2 for the second, this is

= Xn + E(ξn+1) = Xn,

and thus a martingale. A special case of this is when the ξi are IID with P(ξi = +1) = 1/2 and P(ξi = −1) = 1/2: the symmetric random walk.

2. There is a multiplicative analogue of the previous example. Let Xn = a ξ1 · ξ2 · · · ξn, where the ξi are independent random variables with E(|ξi|) < ∞ and E(ξi) = 1. Then Xn is a martingale. We leave the verification of this to you.

3. Let Sn = S0 + ξ1 + · · · + ξn, with S0 constant and the ξi independent random variables with E(ξi) = 0 and σ^2(ξi) = σi^2. Set vn = Σ_(i=1)^n σi^2. Then Mn = Sn^2 − vn is a martingale relative to S0, S1, . . . , Sn. For

Mn+1 − Mn = (S_(n+1)^2 − v_(n+1)) − (Sn^2 − vn)
= (Sn + ξ_(n+1))^2 − Sn^2 − (v_(n+1) − vn)
= 2 Sn ξ_(n+1) + ξ_(n+1)^2 − σ_(n+1)^2,   (3.5.1)

and thus

E(Mn+1 − Mn | S0, . . . , Sn) = E(2 Sn ξ_(n+1) + ξ_(n+1)^2 − σ_(n+1)^2 | S0, . . . , Sn)
= 2 Sn E(ξ_(n+1)) + σ_(n+1)^2 − σ_(n+1)^2 = 0.

4. There is a very general way to get martingales, usually ascribed to Levy. Let Xn be a driving stochastic process and let Y be a random variable defined on the same sample space. Then we may define Yn = E(Y | Xn, . . . , X0). Then Yn is a martingale. For, by the tower property of conditional expectation, we have

E(Yn+1 | Xn, . . . , X0) = E(E(Y | Xn+1, Xn, . . . , X0) | Xn, . . . , X0) = E(Y | Xn, . . . , X0) = Yn.

This will be very important to us.

We now introduce a very useful notation. Given a stochastic process Xk, let F(X0, . . . , Xn) be the collection of random variables that can be written as g(X0, . . . , Xn) for some function g of n + 1 variables. More compactly, we let Fn = F(X0, . . . , Xn). Fn represents all the information that is knowable about the stochastic process Xk at time n, and we call it the filtration induced by Xn. We write E(Y|Fn) for E(Y | X0, . . . , Xn). We restate Proposition (3.3.3) in this notation.

Proposition 3.5.1. Let Xn be a stochastic process, Fn the filtration associated to Xn, and X a random variable defined on the same sample space. Then

1. E(E(X|Fn)|Fm) = E(X|Fm) if m ≤ n.
2. If X, X0, . . . , Xn are independent, then E(X|Fn) = E(X).
3. Letting X ∈ Fn, so X = f(X0, . . . , Xn), then E(XY|Fn) = X E(Y|Fn),

which gives us yet another way of forming new martingales from old.

We motivate the definitions below and the following construction by thinking of a gambling game. Suppose we are playing a game and that our winnings per unit bet in the nth game is Xn − X_(n−1). (Similarly, Sn might be a stochastic process representing the price of a stock at the nth tick, with the "bet" the amount of stock held over the nth tick.) Let φn denote the amount of our bet on the nth hand. It is a completely natural assumption that φn should only depend on the history of the game up to time n − 1; this is the previsible hypothesis.

Definition 3.5.3. Let Xn be a stochastic process and Fn its filtration. We say a process φn is

1. adapted to Xn (or to Fn) if φn ∈ Fn for all n;
2. previsible relative to Xn (or Fn) if φn ∈ F_(n−1) for all n.

Given a bet of φn on the nth hand, our winnings playing this game up til time n will be

φ1(X1 − X0) + φ2(X2 − X1) + · · · + φn(Xn − X_(n−1)) = Σ_(j=1)^n φj (Xj − X_(j−1)) = Σ_(j=1)^n φj ∆Xj.

This is called the martingale transform of φ with respect to Xn. We denote it more compactly by (Σ φ∆X)n. The name comes from the second theorem below.

Theorem 3.5.2. Let Yn be a supermartingale relative to X0, X1, . . ., and let φn be a previsible process with 0 ≤ φn ≤ cn for constants cn. Then

Gn = G0 + Σ_(m=1)^n φm (Ym − Y_(m−1))

is an Fn-supermartingale.
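Before the proof, a simulation sketch (ours) contrasting the previsible hypothesis with cheating: a bounded bet chosen from the history up to time n − 1 keeps the transform of a fair-coin martingale at mean 0, while an "anticipating" strategy that peeks at the nth step does not.

```python
import random

def transform_mean(peek, trials=20_000, n=20, seed=0):
    """Mean of the martingale transform (sum phi dX)_n for a fair-coin walk X.

    The bet phi_m is 2 after a losing step and 1 otherwise -- previsible and
    bounded. With peek=True we cheat: we also bet only when the coming step is
    a win, making phi depend on the step m outcome itself (not previsible).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        g, prev_loss = 0.0, False
        for _ in range(n):
            dx = rng.choice((-1, 1))
            phi = 2.0 if prev_loss else 1.0
            if peek and dx != 1:
                phi = 0.0          # illegal: uses information about step m
            g += phi * dx
            prev_loss = dx == -1
        total += g
    return total / trials
```

The previsible version averages near 0 (a fair game stays fair under any legal betting scheme); the peeking version averages roughly 0.75 per step.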

Proof: G_(n+1) = Gn + φ_(n+1)(Y_(n+1) − Yn). Thus, since φ_(n+1) ∈ Fn,

E(G_(n+1) − Gn | X0, . . . , Xn) = E(φ_(n+1)(Y_(n+1) − Yn) | X0, . . . , Xn)
= φ_(n+1) E(Y_(n+1) − Yn | X0, . . . , Xn) ≤ φ_(n+1) · 0 ≤ 0.

Q.E.D.

Likewise, if φ_(n+1) ∈ Fn with |φn| ≤ cn, and Yn is a martingale relative to Xi, then Gn = G0 + Σ_(m=1)^n φm(Ym − Y_(m−1)) is a martingale.

Definition 3.5.4. A random variable T is called a stopping time relative to Xi if the event {T = n} is determined by X0, . . . , Xn; that is, {T(ω) = n} is determined by X0, . . . , Xn.

Example: In this example, given a stopping time T, we define a simple betting strategy of betting 1 until the stopping time tells us to stop:

φn = 1 if n ≤ T, and 0 if n > T.   (3.5.2)

Let's check that φn ∈ F(X0, . . . , X_(n−1)), i.e. that φn is previsible. By definition, φn(ω) = 1 if T(ω) ≥ n and φn(ω) = 0 if T(ω) < n. Now

{φn(ω) = 0} = {T ≤ n − 1} = ∪_(k=0)^(n−1) {T = k}

is determined by X0, . . . , X_(n−1). Thus φn is previsible.

Let T ∧ n be the minimum of T and n; thus

(T ∧ n)(ω) = T(ω) if T(ω) < n, and = n if T(ω) ≥ n.   (3.5.3)

Theorem 3.5.3. Let Yn be a supermartingale relative to Xi and T a stopping time. Then Y_(T∧n) is a supermartingale.

Theorem 3.5.4. Let Yn be a submartingale relative to Xi and T a stopping time. Then Y_(T∧n) is a submartingale.

Theorem 3.5.5. Let Yn be a martingale relative to Xi and T a stopping time. Then Y_(T∧n) is a martingale.

Proof: Let φn be the previsible process defined above.

Then

Gn = Y0 + Σ_(m=1)^n φm(Ym − Y_(m−1))
= Y0 + φ1(Y1 − Y0) + φ2(Y2 − Y1) + · · · + φn(Yn − Y_(n−1))
= Yn if n ≤ T, YT if n > T
= Y_(T∧n),   (3.5.6)

which is a martingale by a previous theorem. So it follows by the above that Y_(T∧n) is a martingale, and likewise in the super- and submartingale cases. Q.E.D.

Theorem 3.5.6. (Doob's optional stopping theorem) Let Mn be a martingale relative to Xn, and T a stopping time relative to Xn with P(T < ∞) = 1. If there is a K such that |M_(T∧n)| ≤ K for all n, then E(MT) = E(M0).

Proof: We know that E(M_(T∧n)) = E(M0), since by the previous theorem M_(T∧n) is a martingale. Then

|E(M0) − E(MT)| = |E(M_(T∧n)) − E(MT)| ≤ 2K P(T > n) → 0 as n → ∞.

Q.E.D.

An alternative version, under more realistic hypotheses, is

Theorem 3.5.7. If Mn is a martingale relative to Xn, and T is a stopping time with |M_(n+1) − Mn| ≤ K and E(T) < ∞, then E(MT) = E(M0).

Example: The following betting strategy of doubling after each loss is known to most children and is in fact THE classic martingale. Let P(ξi = ±1) = 1/2, where the ξi are independent, identically distributed random variables, and set Xn = Σ_(i=1)^n ξi, which is a martingale. Now φn is the previsible process defined in the following way: φ1 = 1, and inductively φn = 2φ_(n−1) if ξ_(n−1) = −1 and φn = 1 if ξ_(n−1) = 1. Then φn is previsible. Now define the stopping time T = min{n | ξn = 1}, and consider the process Gn = (Σ φ∆X)n. Then Gn is a martingale by a previous theorem. On the other hand, when you finally win according to this betting strategy, you always win 1; that is, E(GT) = 1. This seems like a sure-fire way to win. But it requires an arbitrarily large pot to make it work: the boundedness hypotheses of the optional stopping theorems fail, which is how E(GT) = 1 can differ from E(G0) = 0.

We now discuss one last theorem before getting back to the finance. This is the martingale representation theorem. According to what we proved above, if Xn is a martingale and φn is previsible, then the martingale transform

(Σ φ∆X)n is also a martingale. The martingale representation theorem is a converse of this: it gives conditions under which a martingale can be expressed as a martingale transform of a given martingale. For this, we need a condition on our driving martingale. Thus we define

Definition 3.5.5. A generalized random walk or binomial process is a stochastic process Xn in which, at each time n, the value of X_(n+1) − Xn can only be one of two possible values. This is like an ordinary random walk, in that at every step the random walker can take only 2 possible steps; but what steps she can take depend on the path up to time n. Thus X_(n+1) = Xn + ξ_(n+1)(X1, . . . , Xn). Hence assume that

P(ξ_(n+1)(X1, . . . , Xn) = a) = p and P(ξ_(n+1)(X1, . . . , Xn) = b) = 1 − p,

where I am being a little sloppy in that a, b, and p are all dependent on X1, . . . , Xn. Then

E(X_(n+1)|Fn) = E(Xn + ξ_(n+1)(X0, . . . , Xn)|Fn) = Xn + E(ξ_(n+1)(X0, . . . , Xn)|Fn) = Xn + pa + (1 − p)b.   (3.5.7)

So in order that Xn be a martingale, we need

pa + (1 − p)b = 0.   (3.5.8)

Theorem 3.5.8. Let X0, X1, X2, . . . be a generalized random walk with associated filtration Fn, and suppose it is a martingale. Let Mn be a Martingale relative to Fn. Then there exists a previsible process φn such that

Mn = Σ_(i=1)^n φi (Xi − X_(i−1)) = (Σ φ∆X)n.

Proof: Since Mn is adapted to Xn, there exist functions fn such that Mn = fn(X0, . . . , Xn). By induction, we need to show that there exists φ_(n+1) ∈ Fn with

M_(n+1) = φ_(n+1)(X_(n+1) − Xn) + Mn.

Now by assumption, X_(n+1) can take on only 2 possible values: Xn + a, with probability p, and Xn + b, with probability 1 − p. So we have two cases to consider: X_(n+1) = Xn + a or X_(n+1) = Xn + b. Now

0 = E(M_(n+1) − Mn | Fn)
= E(f_(n+1)(X0, . . . , Xn, X_(n+1)) − fn(X0, . . . , Xn) | Fn)
= E(f_(n+1)(X0, . . . , Xn, X_(n+1)) | Fn) − fn(X0, . . . , Xn).

So, continuing the calculation,

E(f_(n+1)(X0, . . . , Xn, X_(n+1)) | Fn) = f_(n+1)(X0, . . . , Xn, Xn + a) · p + f_(n+1)(X0, . . . , Xn, Xn + b) · (1 − p),

and therefore

0 = (f_(n+1)(X0, . . . , Xn, Xn + a) − fn(X0, . . . , Xn)) · p + (f_(n+1)(X0, . . . , Xn, Xn + b) − fn(X0, . . . , Xn)) · (1 − p).   (3.5.9)

Since b(1 − p) = −ap by (3.5.8), this gives (assuming neither a nor p is zero)

(f_(n+1)(X0, . . . , Xn, Xn + a) − fn(X0, . . . , Xn)) / a = (f_(n+1)(X0, . . . , Xn, Xn + b) − fn(X0, . . . , Xn)) / b.   (3.5.10)

Set φ_(n+1) equal to this common value. Clearly, φ_(n+1) only depends on X0, . . . , Xn and is therefore previsible. So if X_(n+1) = Xn + a, then

φ_(n+1)(X_(n+1) − Xn) + Mn = φ_(n+1) · a + Mn = f_(n+1)(X0, . . . , Xn, Xn + a) = M_(n+1).

Both calculations are the same in the case X_(n+1) = Xn + b. Q.E.D.

3.6 Pricing a derivative

We now discuss the problem we were originally out to solve: to value a derivative. So suppose we have a stock whose price process we denote by S_n, where each tick of the clock represents some time interval δt. We will assume, to be concrete, that our riskless bond is given by the standard formula B_n = e^{nrδt}B_0. We use the following notation: if V_n represents some price process, then Ṽ_n = e^{−nrδt}V_n is the discounted price process; thus S̃_n = e^{−rnδt}S_n, and the discounted bond price is B̃_n = B_0. Let F_n denote the filtration corresponding to S.

There are three steps to pricing h:

Step 1: Find Q such that S̃_n = e^{−rnδt}S_n is a Q-martingale.

Step 2: Set Ṽ_n = E_Q(e^{−rNδt}h(S_N)|F_n); then Ṽ_n is a martingale as well. By the martingale representation theorem there exists a previsible process φ_i such that Ṽ_n = Ṽ_0 + Σ_{i=1}^n φ_i(S̃_i − S̃_{i−1}).

Step 3: Construct a self-financing hedging portfolio.

Just as in the one period case, we will form a portfolio Π consisting of the stock and the bond, Π_n = φ_n S_n + ψ_n B_n, so that Π̃_n = φ_n S̃_n + ψ_n B_0. We require two conditions on Π:

1. φ_n and ψ_n should be previsible. From a financial point of view this is a completely reasonable assumption, since decisions about what to hold during the nth tick must be made prior to the nth tick.

2. The portfolio Π should be self-financing. In order for the portfolio to be viewed as equivalent to the claim h, it must act like the claim. The claim needs no infusion of money and provides no income until expiration, and this is also what we require of the portfolio.

The self-financing condition is derived from the following considerations. After tick n − 1 the portfolio is worth Π_{n−1} = φ_{n−1}S_{n−1} + ψ_{n−1}B_{n−1}. Between tick n − 1 and tick n we readjust the portfolio in anticipation of the nth tick; at the time we readjust, that is, between the (n − 1)st and nth tick, the stock is still worth S_{n−1}. Therefore we require

(SFC1)  φ_n S_{n−1} + ψ_n B_{n−1} = φ_{n−1}S_{n−1} + ψ_{n−1}B_{n−1}.
But we only want to use the resources that are available to us via the portfolio itself. Suppose h denotes the payoff function from a derivative on Sn .

Another equivalent formulation of the self-financing condition is arrived at by

(∆Π)_n = Π_n − Π_{n−1}
       = φ_n S_n + ψ_n B_n − (φ_{n−1}S_{n−1} + ψ_{n−1}B_{n−1})
       = φ_n S_n + ψ_n B_n − (φ_n S_{n−1} + ψ_n B_{n−1})
       = φ_n(S_n − S_{n−1}) + ψ_n(B_n − B_{n−1})
       = φ_n(∆S)_n + ψ_n(∆B)_n.

So we have

(SFC2)  (∆Π)_n = φ_n(∆S)_n + ψ_n(∆B)_n.

Now we have

Ṽ_n = E_Q(e^{−rNδt}h(S_N)|F_n)    (3.6.1)

and we can write

Ṽ_n = Ṽ_0 + Σ_{i=1}^n φ_i ∆S̃_i

for some previsible φ_i. Now we want Π̃_n = φ_n S̃_n + ψ_n B_0 to equal Ṽ_n. So set

ψ_n = (Ṽ_n − φ_n S̃_n)/B_0,  or equivalently  ψ_n = (V_n − φ_n S_n)/B_n.

Clearly, Π̃_n = φ_n S̃_n + ψ_n B_0 = Ṽ_n. We now need to check two things:

1. previsibility of φ_n and ψ_n;
2. self-financing of the portfolio.

1. Previsibility: φ_n is previsible by the martingale representation theorem. As for ψ_n, we have

ψ_n = (Ṽ_n − φ_n S̃_n)/B_0
    = (φ_n(S̃_n − S̃_{n−1}) + Ṽ_0 + Σ_{j=1}^{n−1} φ_j(S̃_j − S̃_{j−1}) − φ_n S̃_n)/B_0
    = (−φ_n S̃_{n−1} + Ṽ_0 + Σ_{j=1}^{n−1} φ_j(S̃_j − S̃_{j−1}))/B_0
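The three pricing steps can be sketched in a one-period binomial example. All numbers here (stock price, up/down factors, strike, rate) are illustrative, not from the text: find the risk-neutral q, discount the risk-neutral expectation, and check that the hedge (φ, ψ) replicates the claim in both states.

```python
import math

# One-period binomial sketch of the pricing recipe (illustrative parameters).
S0, u, d = 100.0, 1.2, 0.9
r, dt, B0 = 0.05, 1.0, 1.0

def h(S):                               # European call payoff, strike 100
    return max(S - 100.0, 0.0)

# Step 1: q makes the discounted stock a Q-martingale.
q = (math.exp(r * dt) - d) / (u - d)
# Step 2: price = discounted risk-neutral expectation of the payoff.
V0 = math.exp(-r * dt) * (q * h(S0 * u) + (1 - q) * h(S0 * d))
# Step 3: the hedge of phi shares and psi bonds replicates the claim.
phi = (h(S0 * u) - h(S0 * d)) / (S0 * u - S0 * d)
psi = (V0 - phi * S0) / B0
up_ok   = abs(phi * S0 * u + psi * B0 * math.exp(r * dt) - h(S0 * u)) < 1e-9
down_ok = abs(phi * S0 * d + psi * B0 * math.exp(r * dt) - h(S0 * d)) < 1e-9
print(up_ok and down_ok)                # True: the portfolio pays exactly h
```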

which is manifestly independent of S̃_n; hence ψ_n is previsible.

2. Self-financing: We need to verify

φ_n S_{n−1} + ψ_n B_{n−1} = φ_{n−1}S_{n−1} + ψ_{n−1}B_{n−1},

or, what is the same thing,

(SFC3)  (φ_n − φ_{n−1})S̃_{n−1} + (ψ_n − ψ_{n−1})B_0 = 0.

We calculate the second term:

(ψ_n − ψ_{n−1})B_0 = ((Ṽ_n − φ_n S̃_n)/B_0 − (Ṽ_{n−1} − φ_{n−1}S̃_{n−1})/B_0)·B_0
                  = Ṽ_n − Ṽ_{n−1} + φ_{n−1}S̃_{n−1} − φ_n S̃_n
                  = φ_n(S̃_n − S̃_{n−1}) + φ_{n−1}S̃_{n−1} − φ_n S̃_n
                  = −(φ_n − φ_{n−1})S̃_{n−1},

which clearly implies (SFC3). So finally we have constructed a self-financing hedging strategy for the derivative h. By the no-arbitrage hypothesis, we must have that the value of h at time n (including time 0, i.e. now) is

V(h)_n = Π_n = V_n = e^{rnδt}Ṽ_n = e^{rnδt}E_Q(e^{−rNδt}h(S_N)|F_n) = e^{−r(N−n)δt}E_Q(h(S_N)|F_n).

3.7 Dynamic portfolio selection

In this section we will use our martingale theory and derivative pricing theory to solve a problem in portfolio selection. We will carry this out in a rather simplified situation, but the techniques are valid in much more generality. Thus suppose that we want to invest in a stock S_n and the bond B_n. In our first go round, we just want to find a portfolio Π_n = (φ_n, ψ_n) which maximizes something. Maximizing the expected value of Π_N, where N represents the time horizon, will just result in investing everything in the stock, because it has a higher expected return; this does not take into account our aversion to risk. It is better to maximize E(U(Π_N)), where U is a (differentiable) utility function that encodes our risk aversion. U is required to satisfy two conditions:

1. U should be increasing, which just represents that more money is better than less; this means U′(x) > 0.

2. U′ should be decreasing, that is, U″ < 0, which represents that the utility of an absolute gain is decreasing: the utility of going from a billion dollars to a billion and one dollars is less than the utility of going from one hundred dollars to one hundred and one dollars.

Some common utility functions are

U_0(x) = ln(x),
U_p(x) = x^p/p for 0 ≠ p < 1.    (3.7.1)

We will carry out the problem for U(x) = U_0(x) = ln(x) and leave the case of U_p(x) as a problem. So our problem is now to maximize E_P(ln(Π_N)), but we have constraints. Here P represents the real world probabilities, not the risk neutral ones, since we want to maximize in the real world. First, for Π_n to be realistically tradeable, it must be previsible. Second, if we want to let this portfolio go, with no other additions or withdrawals, then we want it to be self-financing. And finally, we want the value of our portfolio at time 0 to be the amount that we are willing to invest in it. So we want Π_0 = W_0 for some initial wealth W_0. Thus Π_n should satisfy SFC and previsibility.

In chapter 3 we saw that there was an equivalence between two sets of objects:

1. self-financing previsible portfolio processes Π_n, and
2. derivatives h on the stock S.

If in addition we want the value of the portfolio to satisfy Π_0 = W_0, then our problem, max E_P(ln(Π_N)) subject to Π_0 = W_0, can be reformulated as: max E_P(ln(h)) subject to the value of h at time 0 being W_0,

since the value of a derivative h at time 0 is E_Q(exp(−rT)h), where Q is the risk neutral probability. Thus the above problem becomes

max E_P(ln(h)) subject to E_Q(exp(−rT)h) = W_0.

Now we are in a position where we are really using two probabilities at the same time. We need the following theorem, which allows us to compare them. In our discrete setting this theorem is easy.

Theorem 3.7.1. (Radon-Nikodym) Given two probabilities P and Q, assume that if P(E) = 0 then Q(E) = 0 for any event E. Then there exists a random variable ξ such that E_Q(X) = E_P(ξX) for any random variable X.

Proof. For each outcome ω, set ξ(ω) = Q(ω)/P(ω) if P(ω) ≠ 0, and ξ(ω) = 0 otherwise. Then

E_Q(X) = Σ_ω X(ω)Q(ω) = Σ_ω X(ω)(Q(ω)/P(ω))P(ω) = E_P(ξX).

So now our problem is

max E_P(ln(h)) subject to E_P(ξ exp(−rT)h) = W_0.

We can now approach this extremal problem by the method of Lagrange multipliers. According to this method, to find an extremum of E_P(ln(h)) subject to E_P(ξ exp(−rT)h) = W_0, we introduce the multiplier λ. And we solve

d/dε|_{ε=0} [E_P(ln(h + εv)) − λ E_P(ξ exp(−rT)(h + εv))] = 0
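The discrete Radon-Nikodym theorem is easy to verify numerically. Here is a sketch on a made-up two-point sample space (all probabilities and values are illustrative):

```python
# Discrete Radon-Nikodym check: with xi(w) = Q(w)/P(w), E_Q(X) = E_P(xi * X).
P = {"u": 0.6, "d": 0.4}          # real-world probabilities (illustrative)
Q = {"u": 0.5, "d": 0.5}          # risk-neutral probabilities (illustrative)
X = {"u": 3.0, "d": -1.0}         # an arbitrary random variable

xi = {w: Q[w] / P[w] for w in P}  # the Radon-Nikodym derivative
EQ = sum(X[w] * Q[w] for w in Q)
EP_xiX = sum(xi[w] * X[w] * P[w] for w in P)
print(abs(EQ - EP_xiX) < 1e-12)   # True: the two expectations agree
```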

for all random variables v. Now the left hand side is

d/dε|_{ε=0} E_P(ln(h + εv)) − λ d/dε|_{ε=0} E_P(ξ exp(−rT)(h + εv))
= E_P((1/h)v − λξ exp(−rT)v)
= E_P([1/h − λξ exp(−rT)]v),    (3.7.2)

and we want this to be 0 for all random variables v. In order for this expectation to be 0 for all v, it must be the case that

1/h − λξ exp(−rT) = 0

almost surely. Hence h = exp(rT)/(λξ). To calculate λ we put h back into the constraint:

W_0 = E_P(ξ exp(−rT)h) = E_P(ξ exp(−rT) · exp(rT)/(λξ)) = E_P(1/λ).

So λ = 1/W_0, and thus finally we have

h = W_0 exp(rT)/ξ.

To go further we need to write down a specific model for the stock. Thus let X_1, X_2, ... be IID with law P(X_j = ±1) = 1/2. Set S_0 to be the initial value of the stock, and inductively

S_n = S_{n−1}(1 + µδt + σ√δt X_n).

In other words, S_n is just a binomial stock model with

u = 1 + µδt + σ√δt  and  d = 1 + µδt − σ√δt.    (3.7.3)

The sample space for this process can be taken to be sequences x_1x_2x_3···x_N, where each x_i is u or d, representing either an up move or a down move. Then by our formula for the risk neutral probability,

q = (exp(rδt) − d)/(u − d) = (rδt − (µδt − σ√δt))/((µδt + σ√δt) − (µδt − σ√δt))

up to first order in δt. So

q = (1/2)(1 + ((r − µ)/σ)√δt)  and  1 − q = (1/2)(1 − ((r − µ)/σ)√δt).

This then determines the random variable ξ in the Radon-Nikodym theorem 3.7.1:

ξ(x_1x_2x_3···x_N) = ξ_1(x_1)ξ_2(x_2)···ξ_N(x_N),

where x_i denotes either u or d, and

ξ_n(u) = q/p = 1 + ((r − µ)/σ)√δt,
ξ_n(d) = (1 − q)/(1 − p) = 1 − ((r − µ)/σ)√δt.    (3.7.4)

We now know which derivative has as its replicating portfolio the portfolio that solves our optimization problem. We now calculate what the replicating portfolio actually is. For this we use our theory from chapter 3 again, which tells us to take

Ṽ_n = E_Q(exp(−rT)h|F_n).

Then by the martingale representation theorem there is a previsible process φ_n such that

Ṽ_n = Ṽ_0 + Σ_{k=1}^n φ_k(S̃_k − S̃_{k−1}).

Moreover, it tells us how to pick φ_n:

φ_n = (Ṽ_n − Ṽ_{n−1})/(S̃_n − S̃_{n−1}).

Now let's recall that, by the fact that S̃_n is a martingale with respect to Q,

S̃_{n−1} = E_Q(S̃_n|F_{n−1}) = q S̃_n^u + (1 − q)S̃_n^d,
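The first-order expression for q can be checked against the exact binomial risk-neutral probability for small δt. The parameter values µ, σ, r below are illustrative:

```python
import math

# Check q ~ (1/2)(1 + ((r - mu)/sigma)*sqrt(dt)) against the exact
# risk-neutral probability q = (e^{r dt} - d)/(u - d) for small dt.
mu, sigma, r, dt = 0.08, 0.2, 0.03, 1e-6
u = 1 + mu * dt + sigma * math.sqrt(dt)
d = 1 + mu * dt - sigma * math.sqrt(dt)
q_exact = (math.exp(r * dt) - d) / (u - d)
q_approx = 0.5 * (1 + ((r - mu) / sigma) * math.sqrt(dt))
print(abs(q_exact - q_approx) < 1e-6)   # True: they agree to first order
```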

Similarly. φn must be previsible and therefore independent of whether S takes an uptick or downtick at time n. we have d u Vn−1 = EQ (Vn |Fn−1 ) = q Vn + (1 − q)Vn From this. Theorem 3. Now we use the following refinement of the Radon-Nikodym theorem.3. Then for any event A.7. we see that if S takes an uptick at time n that φn = Vn − Vn−1 Sn − Sn−1 = u u d Vn − (q Vn + (1 − q)Vn ) u u d Sn − (q Sn + (1 − q)Sn ) = u d Vn − Vn u d Sn − S n A similar check will give the same exact answer if S took a downtick at time n. Given two probabilities P and Q assume that if P (E) = 0 then Q(E) = 0 for any event E. From this theorem it follows that Vn = EQ (h|Fn ) = Therefore we see that Vn = Vn−1 ξn (xn ) EP (ξ h|Fn ) EP (ξ|Fn ) EP (ξX|A) EP (ξ|A) . (After all. we have EQ (X|A) = I will ask you to prove this.7.) Finally let’s remark that φn can be expressed in terms of the undiscounted prices φn = u d Vn − Vn u d Sn − Sn = u d Vn − Vn u d Sn − Sn since both the numerator and the denominator are expressions at the same time n and therefore the discounting factor for the numerator and the denominator cancel.2. Let ξ be the random variable such that EQ (X) = EP (ξX) for any random variable X. DYNAMIC PORTFOLIO SELECTION 85 u where Sn denotes the value of Sn if S takes an uptick at time n.

Ṽ_n = Ṽ_{n−1}/ξ_n(x_n), so

Ṽ_n^u = Ṽ_{n−1}/ξ_n(u) = Ṽ_{n−1}/(1 + ((r − µ)/σ)√δt),
Ṽ_n^d = Ṽ_{n−1}/ξ_n(d) = Ṽ_{n−1}/(1 − ((r − µ)/σ)√δt),

recalling that x_n denotes the nth tick and ξ_n is given by (3.7.4). We also have

S_n^u = S_{n−1}(1 + µδt + σ√δt),

so, up to first order in δt,

S̃_n^u = S̃_{n−1}(1 + (µ − r)δt + σ√δt),
S̃_n^d = S̃_{n−1}(1 + (µ − r)δt − σ√δt).

Now we are ready to finish our calculation:

φ_n = (Ṽ_n^u − Ṽ_n^d)/(S̃_n^u − S̃_n^d)
    = (Ṽ_{n−1}/S̃_{n−1}) · [1/(1 + ((r − µ)/σ)√δt) − 1/(1 − ((r − µ)/σ)√δt)] / (2σ√δt)
    = (Ṽ_{n−1}/S̃_{n−1}) · [(1 − ((r − µ)/σ)√δt) − (1 + ((r − µ)/σ)√δt)] / ((1 − ((r − µ)/σ)²δt) · 2σ√δt)
    = (Ṽ_{n−1}/S̃_{n−1}) · (−2((r − µ)/σ)√δt) / ((1 − ((r − µ)/σ)²δt) · 2σ√δt).    (3.7.5)

So, up to terms of order δt, we have

φ_n = Ṽ_{n−1}((µ − r)/σ²)/S̃_{n−1} = V_{n−1}((µ − r)/σ²)/S_{n−1};    (3.7.6)

the last equality holds because the quotient of the two discounting factors cancels. Here V_{n−1} represents the value of the portfolio at time n − 1, and φ_n represents
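The first-order formula for φ_n can be checked numerically: for small δt, the exact one-tick ratio (V^u − V^d)/(S^u − S^d) built from the up/down values approaches V_{n−1}(µ − r)/(σ²S_{n−1}). All parameter values below are illustrative.

```python
import math

# Check that the exact hedge ratio approaches V*(mu - r)/(sigma^2 * S)
# as dt -> 0, using the one-tick up/down values from the derivation.
mu, r, sigma, dt = 0.10, 0.03, 0.25, 1e-8
V, S = 50.0, 100.0                       # V_{n-1}, S_{n-1} (illustrative)
c = (r - mu) / sigma
Vu = V / (1 + c * math.sqrt(dt))
Vd = V / (1 - c * math.sqrt(dt))
Su = S * (1 + (mu - r) * dt + sigma * math.sqrt(dt))
Sd = S * (1 + (mu - r) * dt - sigma * math.sqrt(dt))
phi_exact = (Vu - Vd) / (Su - Sd)
phi_approx = V * (mu - r) / (sigma**2 * S)
print(abs(phi_exact - phi_approx) < 1e-4)   # True for small dt
```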

the number of shares of stock in this portfolio. Hence the amount of money invested in the stock is φ_n S_{n−1} = ((µ − r)/σ²)V_{n−1}. Thus formula (3.7.6) expresses the fact that at every tick, with log utility, you should have the proportion (µ − r)/σ² of your portfolio invested in the stock.

Over the period from 1926 to 1985, the S&P 500 had an excess return µ − r over short term treasury bills of about 0.058, with a standard deviation σ of about 0.216. Doing the calculation with these values yields 0.058/(0.216)² ≈ 1.24. That is, with log utility, the optimal portfolio is one where one invests all one's wealth in the S&P 500 and borrows an additional 24% of current wealth to invest in the S&P as well.
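For the record, the arithmetic with the quoted S&P figures (0.058 excess return, 0.216 standard deviation); note that the ratio rounds to 1.24:

```python
# Log-utility (Merton) fraction (mu - r)/sigma^2 with the figures cited above.
excess, sigma = 0.058, 0.216
fraction = excess / sigma**2
print(round(fraction, 2))   # 1.24, i.e. 124% of wealth in the stock
```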

3.8 Exercises

1. Let S_0, S_1 be as in class (the one-step binomial model). For the pseudo-probabilities q of going up and 1 − q of going down that we calculated in class, what is E_Q(S_1)?

2. We arrived at the following formula for discounting: if B is a bond with par value 1 at time T, then at time t < T the bond is worth exp(−r(T − t)), assuming constant interest rates. Now assume that interest rates are not constant but are described by a known function of time r(t). Recalling the derivation of the formula above for constant interest rates, calculate the value of the same bond under the interest rate function r(t). Your answer should have an integral in it.

3. A fair die is rolled twice. Let X denote the sum of the values shown. What is E(X|A), where A is the event that the first roll was a 2?

4. Toss a nickel, a dime and a quarter. A coin says how much it is worth only on the tails side, so X is the sum of the values of the coins that landed tails up. Let B_i denote the event that i of the coins landed heads up. What is E(X|B_i), i = 0, 1, 2, 3?

5. Let X be a random variable with pdf p(x) = α exp(−αx) (a continuous distribution). Calculate E(X|{X ≥ t}).

In the next three problems, calculate E(X|Y) for the two random variables X and Y; that is, describe E(X|Y) as a function of the values of Y.

6. A die is rolled twice. X is the sum of the values, and Y is the value on the first roll.

7. From a bag containing four balls numbered 1, 2, 3 and 4, you draw two balls. If at least one of the balls is numbered 2 you win $1, otherwise

you lose $1. Let X denote the amount won or lost, and let Y denote the number on the first ball chosen.

8. A bag contains three coins, but one of them is fake and has two heads. Two coins are drawn and tossed. Let X be the number of heads tossed and Y denote the number of honest coins drawn.

9. Let W_n denote the symmetric random walk starting at 0, that is, p = 1/2 and W_0 = 0. Let X denote the number of steps to the right the random walker takes in the first 6 steps. Calculate
(a) E(X|W_1 = −1),
(b) E(X|W_1 = 1),
(c) E(X|W_1 = 1, W_2 = −1),
(d) E(X|W_1 = 1, W_2 = −1, W_3 = −1).

10. We've seen that if the ξ_n are independent with expectations all 0, then M_n = Σ_{j=1}^n ξ_j is a martingale. It is not necessary for the ξ_j's to be independent. In this exercise you are asked to prove that if M_n is a martingale and ξ_j = M_j − M_{j−1} are its increments, then
(a) E(ξ_j) = 0 for all j,
(b) E(ξ_j ξ_k) = 0 for all j ≠ k.
(This means that ξ_j and ξ_k have 0 covariance, since their expectations are 0; thus they are uncorrelated, though they may not be independent.) (Hint: You will have to use several of the properties of conditional expectations.)

In the following three problems, you are asked to derive what is called the gambler's ruin using techniques from martingale theory. Suppose a gambler plays a game repeatedly of tossing a coin with probability p of getting heads, in which case he wins $1, and 1 − p of getting tails, in which case he loses $1, where 0 < p < 1. Suppose the gambler begins with $a

and keeps gambling until he either goes bankrupt or gets $b, a < b. Mathematically, let ξ_j, j = 1, 2, 3, ..., be IID random variables with distribution P(ξ_j = 1) = p and P(ξ_j = −1) = 1 − p, and let S_n denote the gambler's fortune after the nth toss, that is,

S_n = a + ξ_1 + ··· + ξ_n.

Define the stopping time

T = min{n | S_n = 0 or b};

that is, T is the time when the gambler goes bankrupt or achieves a fortune of b.

11. Define A_n = S_n − n(2p − 1) and M_n = ((1 − p)/p)^{S_n}. Show A_n and M_n are martingales.

12. Calculate P(S_T = 0), the probability that the gambler will eventually go bankrupt, in the following way: using Doob's optional stopping theorem, calculate E(M_T). Now use the definition of E(M_T) in terms of P(S_T = j) to express P(S_T = 0) in terms of E(M_T), and solve for P(S_T = 0).

13. Calculate E(T), the expected time till bust or fortune, in the following way: calculate E(A_T) by the optional stopping theorem. Express E(A_T) in terms of E(S_T) and E(T). Calculate E(S_T). Solve for E(T).

In the following problems, you are asked to find an optimal betting strategy using martingale methods.

14. Suppose a gambler is playing a game repeatedly where the winnings of a $1 bet pay off $1 and the probability p > 1/2 of winning each round is in the gambler's favor. (By counting cards, blackjack can be made such a game.) What betting strategy should the gambler adopt? It is not hard to show that if the gambler wants to maximize his expected fortune after n trials, then he should bet his entire fortune each round. This seems counterintuitive, and in fact doesn't take into account the gambler's strong interest in not going bankrupt. It turns out to make more sense to maximize not one's expected fortune, but the expected value of the logarithm of his fortune. As above, let ξ_j be IID with P(ξ_j = 1) = p and P(ξ_j = −1) = 1 − p, and assume this time that p > 1/2. Let F_n denote the fortune of the

gambler at time n. Assume the gambler starts with $a and adopts a betting strategy φ_j (a previsible process); that is, in the jth round he bets $φ_j. Then we can express his fortune as

F_n = a + Σ_{j=1}^n φ_j ξ_j.

We assume that 0 < φ_n < F_{n−1}; that is, the gambler must bet at least something to stay in the game, and he doesn't ever want to bet his entire fortune.

15. Let e = p ln(p) + (1 − p) ln(1 − p) + ln 2. Show L_n = ln(F_n) − ne is a supermartingale, and thus E(ln(F_n/F_0)) ≤ ne.

16. Show that for some strategy, L_n is a martingale. What is the strategy? (Hint: You should bet a fixed percentage of your current fortune.)

17. Carry out a similar analysis to the above using the utility function U_p(x) = x^p/p, p < 1, p not equal to 0. You will have to use Taylor's theorem; reduce to terms of order δt. Your answer should not have a δt in it.


Chapter 4

Stochastic processes and finance in continuous time

In this chapter we begin our study of continuous time finance. We could develop it similarly to the model of chapter 3, using continuous time martingales, the martingale representation theorem, change of measure, etc. But the probabilistic infrastructure needed to handle this is more formidable. An approach often taken is to model things by partial differential equations, and that is the approach we will take in this chapter. We will, however, discuss continuous time stochastic processes and their relationship to partial differential equations in more detail than most books. We will warm up by progressing slowly from stochastic processes discrete in both time and space to processes continuous in both time and space.

4.1 Stochastic processes continuous in time, discrete in space

We segue from our completely discrete processes by now allowing the time parameter to be continuous but still demanding space to be discrete. Thus the second kind of stochastic process we will consider is a process X_t where the time t is a real-valued deterministic variable and, at each time t, the sample space S of the random variable X_t is a (finite or infinite) discrete set. In such a case, a state of the entire process is described as a sequence of jump times (τ_1, τ_2, ...) and a sequence of values of X (elements of the common

discrete state space) j_0, j_1, .... These describe a "path" of the system through S, given by X(t) = j_0 for 0 < t < τ_1, X(t) = j_1 for τ_1 < t < τ_2, etc. Because X moves by a sequence of jumps, these processes are often called "jump" processes.

Thus, to describe the stochastic process, we need, for each j ∈ S, a continuous random variable F_j that describes how long it will be before the process (starting at X = j) will jump, and a set of discrete random variables Q_j that describe the probabilities of jumping from j to other states when the jump occurs. We will let Q_jk be the probability that the state starting at j will jump to k (when it jumps), so

Σ_k Q_jk = 1  and  Q_jj = 0.

We also define the function F_j(t) to be the probability that the process starting at X = j has already jumped at time t, i.e., F_j(t) = P(τ_1 < t | X_0 = j). Instead of writing P(... | X_0 = j), we will begin to write P_j(...). It will be convenient to define the quantity P_jk(t) to be the probability that X_t = k, given that X_0 = j, i.e., P_jk(t) = P_j(X_t = k).

We will make the usual stationarity and Markov assumptions, as follows. All this is defined so that:

1. P(τ_1 < t, X_{τ_1} = k | X_0 = j) = F_j(t)Q_jk    (4.1.1)

(this is an independence assumption: where we jump to is independent of when we jump). We assume further that

2. P_j(τ_1 < s, X_{τ_1} = k, τ_2 − τ_1 < t, X_{τ_2} = l) = F_j(s)Q_jk F_k(t)Q_kl.

3. The Markov assumption: P(X_t = k | X_{s_1} = j_1, ..., X_{s_n} = j_n, X_s = j) = P_jk(t − s) for any times s_1 < ··· < s_n < s < t.

The three assumptions 1, 2, 3 will enable us to derive a (possibly infinite) system of differential equations for the probability functions P_jk(t) for j and k ∈ S.

First, the stationarity assumption tells us that P_j(τ_1 > t + s | τ_1 > s) = P_j(τ_1 > t). This means that the probability of having to wait at least one minute more before the process jumps is independent of how long we have waited already. In other words, we can take any time to be t = 0 and the rules governing the process will not change. (From a group-theoretic point of view, this means that the process is invariant under time translation.) Using the formula for conditional probability (P(A|B) = P(A ∩ B)/P(B)) and the fact that the event τ_1 > t + s is a subset of the event τ_1 > s, we can translate this equation into

(1 − F_j(t + s))/(1 − F_j(s)) = 1 − F_j(t).

This implies that the function 1 − F_j(t) is a multiplicative function. It is decreasing, since F_j(t) represents the probability that something happens before time t, and F_j(0) = 0; so it must be an exponential function. Thus we must have

1 − F_j(t) = e^{−q_j t}

for some positive number q_j. In other words, F_j is exponentially distributed, with probability density function f_j(t) = q_j e^{−q_j t}. To check this, note that for t ≥ 0,

P_j(τ_1 > t) = 1 − F_j(t) = ∫_t^∞ q_j e^{−q_j s} ds = e^{−q_j t},

so the two ways of computing F_j(t) agree.

The next consequence of our stationarity and Markov assumptions are the Chapman-Kolmogorov equations in this setting:

P_jk(t + s) = Σ_{l∈S} P_jl(t)P_lk(s).
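The memoryless property that forces F_j to be exponential can be checked by simulation. This is a sketch with illustrative parameters (rate q, times s and t), comparing the conditional tail probability with the unconditional one:

```python
import math
import random

# Memorylessness of the exponential law: P(tau > t+s | tau > s) = P(tau > t).
random.seed(3)
q, s, t, trials = 1.5, 0.4, 0.7, 200000
samples = [random.expovariate(q) for _ in range(trials)]
survived_s = [x for x in samples if x > s]                  # condition on tau > s
cond = sum(1 for x in survived_s if x > s + t) / len(survived_s)
print(abs(cond - math.exp(-q * t)) < 0.01)                  # True
```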

They say that to calculate the probability of going from state j to state k in time t + s, one can sum up all the ways of going from j to k via any l at the intermediate time t. This is an obvious consistency equation, but it will prove very useful later on.

Now we can begin the calculations that will produce our system of differential equations for P_jk(t). The first step is to express P_jk(t) based on the time of the first jump from j. The result we intend to prove is the following:

P_jk(t) = δ_jk e^{−q_j t} + ∫_0^t q_j e^{−q_j s} Σ_{l≠j} Q_jl P_lk(t − s) ds.

As intimidating as this formula looks, it can be understood in a fairly intuitive way. We take it a term at a time.

δ_jk e^{−q_j t}: This term is the probability that the first jump hasn't happened yet, i.e., that the state has remained j from the start. It is zero unless j = k, in which case it gives the probability that τ_1 > t, i.e., that no jump has occurred yet. That is why the Kronecker delta symbol is there (recall that δ_jk = 1 if j = k and δ_jk = 0 if j ≠ k).

The integral term can best be understood as a "double sum": over all possible times s of the first jump, and over all possible results l ≠ j of the first jump. The definition of f_j(t) tells us that the probability that the first jump occurs between time s and time s + ds is (approximately) f_j(s) ds = q_j e^{−q_j s} ds (where the error in the approximation goes to zero faster than ds as ds goes to zero). Based on this, the probability that the first jump occurs between time s and time s + ds and the jump goes from j to l is (approximately) q_j e^{−q_j s} Q_jl ds. Then, by using the Chapman-Kolmogorov equation, the probability that, for l different from j, the first jump occurs between time s and time s + ds, the jump goes from j to l, and in the intervening time (between s and t) the process goes from l to k, is (approximately)

q_j e^{−q_j s} Q_jl P_lk(t − s) ds.

To account for all of the ways of going from j to k in time t, we need to add up these probabilities over all possible times s of the first jump (this gives an integral with respect to ds in the limit as ds goes to 0) and over all possible results l ≠ j of the first jump. This reasoning yields the second term of the formula.

We can make a change of variables in the integral term of the formula: let s be replaced by t − s. Note that this will transform ds into −ds, but it will

also reverse the limits of integration. The result of this change of variables is (after taking out factors from the sum and integral that depend neither upon l nor upon s):

P_jk(t) = δ_jk e^{−q_j t} + q_j e^{−q_j t} ∫_0^t e^{q_j s} Σ_{l≠j} Q_jl P_lk(s) ds
        = e^{−q_j t} (δ_jk + q_j ∫_0^t e^{q_j s} Σ_{l≠j} Q_jl P_lk(s) ds).

Now P_jk(t) is a continuous function of t, because it is defined as the product of a continuous function with a constant plus a constant times the integral of a bounded function of t (the integrand is bounded on any bounded t interval). But once P_jk is continuous, we can differentiate both sides of the last equation with respect to t and show that P_jk(t) is also differentiable. We take this derivative (and take note of the fact that when we differentiate the first factor we just get a constant times what we started with), using the fundamental theorem of calculus, to arrive at:

P′_jk(t) = −q_j P_jk(t) + q_j Σ_{l≠j} Q_jl P_lk(t).

One last bit of notation! Let q_jk = P′_jk(0). Note that if we know that X_0 = j, then it must be true that P_jk(0) = δ_jk. A special case of this differential equation occurs when t = 0. This observation enables us to write:

q_jk = P′_jk(0) = −q_j P_jk(0) + q_j Σ_{l≠j} Q_jl P_lk(0) = −q_j δ_jk + q_j Σ_{l≠j} Q_jl δ_lk = −q_j δ_jk + q_j Q_jk.

So q_jj = −q_j and q_jk = q_j Q_jk if j ≠ k. (Note that this implies that Σ_{k≠j} q_jk = q_j = −q_jj,

so the sum of the q_jk, as k ranges over all of S, is 0.) The constants q_jk are called the infinitesimal parameters of the stochastic process. Given this notation, we can rewrite our equation for P′_jk(t) given above as follows:

P′_jk(t) = Σ_{l∈S} q_jl P_lk(t)

for all t > 0. This system of differential equations is called the backward equation of the process.

Another system of differential equations we can derive is the forward equation of the process. To do this, we differentiate both sides of the Chapman-Kolmogorov equation with respect to s and get

∂/∂s P_jk(t + s) = Σ_{l∈S} P_jl(t) P′_lk(s).

If we set s = 0 in this equation and use the definition of q_lk, we get the forward equation:

P′_jk(t) = Σ_{l∈S} q_lk P_jl(t).

4.2 Examples of stochastic processes continuous in time, discrete in space

The simplest useful example of a stochastic process that is continuous in time but discrete in space is the Poisson process. The Poisson process is a special case of a set of processes called birth processes: processes that experience only births (one at a time) at random times, but no deaths. In these processes, the sample space for each random variable X_t is the set of non-negative integers, and at any moment the value of X can either jump up by 1 or stay the same; so if we think of X as representing a population, the population only grows. To specify such a process, for each non-negative integer j we choose a positive real parameter λ_j, which will be equal to both q_{j,j+1} and −q_{jj}: the time until the process jumps up by one, starting from X = j, is exponentially distributed with parameter λ_j. The specialization in the Poisson process is that λ_j is chosen to have the same value, λ, for all j. Since we move one step to the right (on the j-axis)

according to the exponential distribution, to learn any more we must consider the system of differential equations (the "forward equations") derived in the preceding lecture. Recall that the forward equation

P′_jk(t) = Σ_{l∈S} q_lk P_jl(t)

gave the rate of change in the probability of being at k (given that the process started at j) in terms of the probabilities of jumping to k from any l just at time t (the terms of the sum for l ≠ k) and (minus) the probability of already being at k and jumping away just at time t (recall that q_kk is negative).

For the Poisson process, the forward equations simplify considerably, because q_lk is nonzero only if l = k or if k = l + 1. And for these values we have

q_jj = −q_j = −λ  and  q_{j,j+1} = λ.

Therefore the forward equation reduces to:

P′_jk(t) = Σ_{l=0}^∞ q_lk P_jl(t) = q_{k−1,k}P_{j,k−1}(t) + q_kk P_jk(t) = λP_{j,k−1}(t) − λP_jk(t).

We can conclude without much thought that P_jk(t) = 0 for k < j and t > 0, and that P_jj(t) = e^{−λt} for all t > 0. Since we already know P_jk(t) for k ≤ j, we will use the preceding differential equation inductively to calculate P_jk(t) for k > j, one k at a time.

To do this, we need to review a little about linear first-order differential equations. In particular, we will consider equations of the form

f′ + λf = g,

where f and g are functions of t and λ is a constant. To solve such equations, we make the left side have the form of the derivative of a product by multiplying through by e^{λt}:

e^{λt}f′ + λe^{λt}f = e^{λt}g.

We recognize the left side as the derivative of the product of e^{λt} and f:

(e^{λt}f)′ = e^{λt}g.

We can integrate both sides of this (with s substituted for t) from 0 to t to recover f(t):

∫_0^t d/ds (e^{λs}f(s)) ds = ∫_0^t e^{λs}g(s) ds.

Of course, the integral of the derivative may be evaluated using the (second) fundamental theorem of calculus:

e^{λt}f(t) − f(0) = ∫_0^t e^{λs}g(s) ds.

Now we can solve algebraically for f(t):

f(t) = e^{−λt}f(0) + e^{−λt} ∫_0^t e^{λs}g(s) ds.

Using this formula for the solution of the differential equation, we substitute P_jk(t) for f(t) and λP_{j,k−1}(t) for g(t). We get that

P_jk(t) = e^{−λt}P_jk(0) + λ ∫_0^t e^{−λ(t−s)}P_{j,k−1}(s) ds.

Since we are assuming that the process starts at j, i.e., X_0 = j, we know that P_jk(0) = δ_jk (this is the Kronecker delta): P_jk(0) = 0 for all k ≠ j and P_jj(0) = 1.

Now we are ready to compute. Note that we have to compute P_jk(t) for all possible nonnegative integer values of j and k. In other words, we have to fill in all the entries of the doubly infinite table whose (j, k) entry is P_jk(t), for j, k = 0, 1, 2, ....
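The integrating-factor solution formula can be sanity-checked numerically against straightforward Euler stepping of f′ + λf = g. The choice g(s) = sin s and the parameter values below are arbitrary, purely for illustration:

```python
import math

# Compare the closed form f(t) = e^{-lam t} f(0) + e^{-lam t} * I(t),
# I(t) = int_0^t e^{lam s} g(s) ds, with Euler integration of f' = g - lam*f.
lam, f0, T, N = 2.0, 1.0, 1.0, 200000
g = lambda s: math.sin(s)               # arbitrary forcing term
dt = T / N

f_euler, integral = f0, 0.0
for i in range(N):
    s = i * dt
    f_euler += dt * (g(s) - lam * f_euler)      # Euler step of the ODE
    integral += dt * math.exp(lam * s) * g(s)   # Riemann sum for I(t)
f_formula = math.exp(-lam * T) * f0 + math.exp(-lam * T) * integral
print(abs(f_euler - f_formula) < 1e-3)          # True: the two agree
```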

Of course, we already know part of the table: the diagonal entries are P_jj(t) = e^{−λt}, and every entry below the diagonal (k < j) is 0.

So we begin the induction. We know that P_jj = e^{−λt}, so we can use the differential equation P′_{j,j+1}(t) + λP_{j,j+1}(t) = λP_jj(t) to get that

P_{j,j+1}(t) = λ ∫_0^t e^{−λ(t−s)} e^{−λs} ds = λt e^{−λt}.

Note that this fills in the next diagonal of the table. Then we can use the differential equation P′_{j,j+2}(t) + λP_{j,j+2}(t) = λP_{j,j+1}(t) to get that

P_{j,j+2}(t) = λ ∫_0^t e^{−λ(t−s)} λs e^{−λs} ds = ((λt)²/2) e^{−λt}.

Continuing by induction, we get that

P_jk(t) = ((λt)^{k−j}/(k−j)!) e^{−λt},

provided 0 ≤ j ≤ k and t > 0. This means that the random variable X_t (the infinite discrete random variable that gives the value of j at time t, i.e., tells how many jumps have happened up until time t) has a Poisson distribution with parameter λt. This is the important connection between the exponential distribution and the Poisson distribution.
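This connection can be checked by simulation (illustrative parameters): counting how many exponential(λ) waiting times fit into [0, t] should reproduce the Poisson(λt) probabilities.

```python
import math
import random

# Monte-Carlo check that the jump count of a Poisson process by time t
# is Poisson(lam * t): stack exponential waiting times until time t.
random.seed(1)
lam, t, trials = 2.0, 1.5, 50000
counts = []
for _ in range(trials):
    elapsed, n = 0.0, 0
    while True:
        elapsed += random.expovariate(lam)
        if elapsed > t:
            break
        n += 1
    counts.append(n)

k = 3
empirical = counts.count(k) / trials
exact = (lam * t) ** k * math.exp(-lam * t) / math.factorial(k)
print(abs(empirical - exact) < 0.012)   # True (within Monte-Carlo error)
```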

4.3  The Wiener process

4.3.1  Review of Taylor's theorem

Taylor series are familiar to every student of calculus as a means of writing functions as power series. The real force of Taylor series is what is called Taylor's theorem, which tells one how well the Taylor polynomials approximate the function. We will be satisfied with the following version of Taylor's theorem. First, we introduce some notation.

Definition 4.3.1. For two functions f and g, we say that f(h) is o(g(h)) (as h → 0) (read "f is little o of g") if

lim_{h→0} f(h)/g(h) = 0.

It of course means that f(h) is smaller than g(h) (as h gets small), but in a very precise manner.

As an example, note that another way to say that f'(a) = b is to say that

f(a + h) − (f(a) + bh) = o(h).

That is, the difference between the function h ↦ f(a + h) and the linear function h ↦ f(a) + bh is small as h → 0. Taylor's theorem is just the higher-order version of this statement.

Theorem 4.3.1. Let f be a function defined on an interval (α, β), n times continuously differentiable. Then for any a ∈ (α, β),

f(a + h) = f(a) + f'(a)h + (f''(a)/2!)h² + (f^(3)(a)/3!)h³ + ⋯ + (f^(n)(a)/n!)h^n + o(h^n).

We will in fact need a multivariable version of this, but only of low order.

Theorem 4.3.2. Let f(x, y) be a function of two variables defined on an open set O ⊂ R². Then for any point (a, b) ∈ O one has

f(a + h, b + k) = f(a, b) + (∂f/∂x)(a, b)h + (∂f/∂y)(a, b)k
  + ½((∂²f/∂x²)(a, b)h² + 2(∂²f/∂x∂y)(a, b)hk + (∂²f/∂y²)(a, b)k²) + o(h², k², hk).


4.3.2  The Fokker-Planck equation

We will give a derivation of the Fokker-Planck equation, which governs the evolution of the probability density function of a random-variable-valued function X_t that satisfies a "first-order stochastic differential equation". The variable evolves according to what in the literature is called a Wiener process, named after Norbert Wiener, who developed a great deal of the theory of stochastic processes during the first half of the twentieth century.

We shall begin with a generalized version of random walk (an "unbalanced" version, where the probability of moving in one direction is not necessarily the same as that of moving in the other). Then we shall let both the space and time increments shrink in such a way that we can make sense of the limit as both ∆x and ∆t approach 0.

Recall from the previous sections the one-dimensional random walk (on the integers) beginning at W_0 = 0. We fix a parameter p, and let p be the probability of taking a step to the right, so that 1 − p is the probability of taking a step to the left, at any (integer-valued) time n. In symbols,

p = P(W_{n+1} = x + 1 | W_n = x)  and  1 − p = P(W_{n+1} = x − 1 | W_n = x).

Thus W_n = ξ_1 + ⋯ + ξ_n, where the ξ_j are IID with P(ξ_j = 1) = p and P(ξ_j = −1) = 1 − p. Since we are given that W_0 = 0, the probability function of W_n will be essentially binomially distributed (with parameters n and p, except that at time n, x goes from −n to +n, skipping every other integer, as discussed earlier). We will set v_{x,n} = P(W_n = x). Then

v_{x,n} = C(n, (n + x)/2) p^{(n+x)/2} (1 − p)^{(n−x)/2}
for x = −n, −(n − 2), …, (n − 2), n, and v_{x,n} = 0 otherwise. For use later, we calculate the expected value and variance of W_n. First,

E(W_n) = Σ_{x=−n}^n x v_{x,n} = Σ_{k=0}^n (2k − n) v_{2k−n,n} = Σ_{k=0}^n (2k − n) C(n, k) p^k (1 − p)^{n−k} = n(2p − 1)


(it's an exercise to check the algebra, but the result is intuitive). Also,

Var(W_n) = Σ_{k=0}^n (2k − n)² v_{2k−n,n} − E(W_n)² = Σ_{k=0}^n (2k − n)² C(n, k) p^k (1 − p)^{n−k} − n²(2p − 1)² = 4np(1 − p)

(another exercise!).

Now let's form a new stochastic process W^∆, in which we let the space steps ∆x and the time steps ∆t be different from 1. (W^∆ is shorthand for W^{∆t,∆x}.) First, the time steps. We will take time steps of length ∆t. This will have little effect on our formulas, except that everywhere we will have to note that the number n of time steps needed to get from time 0 to time t is

n = t/∆t.

Each step to the left or right has length ∆x instead of length 1. This has the effect of scaling W by the factor ∆x. Multiplying a random variable by a constant multiplies its expected value by the same constant and its variance by the square of the constant. Therefore, the mean and variance of the total displacement in time t are

E(W_t^∆) = (t/∆t)(2p − 1)∆x  and  Var(W_t^∆) = 4p(1 − p) t (∆x)²/∆t.

We want both of these quantities to remain finite for all fixed t, since we don't want our process to have a positive probability of zooming off to infinity in finite time, and we also want Var(W_t^∆) to be non-zero. Otherwise the process will be completely deterministic. We need to examine what we have control over in order to achieve these ends.

First off, we don't expect to let ∆x and ∆t go to zero in any arbitrary manner. In fact, it seems clear that in order for the variance to be finite and non-zero, we should insist that ∆t be roughly the same order of magnitude as (∆x)². This is because we don't seem to have control over anything else in the expression for the variance. It is reasonable that p and 1 − p vary with ∆x, but we probably don't want their product to approach zero (that would force our process to move always to the left or always to the right), and it can't approach infinity (since the maximum value of p(1 − p) is 1/4). So we are forced to make the assumption that (∆x)²/∆t remain bounded as they both go to zero. In fact, we will go one step further. We will insist that

(∆x)²/∆t = 2D    (4.3.1)

for some (necessarily positive) constant D.

Now we are in a bind with the expected value. The expected value will go to infinity as ∆x → 0, and the only parameter we have to control it is 2p − 1. We need 2p − 1 to have the same order of magnitude as ∆x in order to keep E(W_t^∆) bounded. Since p = 1 − p when p = 1/2, we will make the assumption that

p = 1/2 + (c/(2D))∆x    (4.3.2)

for some real (positive, negative or zero) constant c, which entails

1 − p = 1/2 − (c/(2D))∆x.
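To see why (4.3.1) and (4.3.2) are the right scalings, one can substitute them into the exact mean and variance of W_t^∆ computed above. The quick numerical check below is my own sketch, not part of the notes: the mean comes out exactly 2ct for every ∆x, and the variance tends to 2Dt as ∆x → 0.

```python
D, c, t = 1.0, 0.3, 2.0
for dx in (0.1, 0.01, 0.001):
    dt = dx * dx / (2 * D)        # the scaling assumption (4.3.1)
    p = 0.5 + c * dx / (2 * D)    # the drift assumption (4.3.2)
    n = t / dt                    # number of steps to reach time t
    mean = n * (2 * p - 1) * dx   # E(W_t) = (t/dt)(2p-1) dx
    var = 4 * p * (1 - p) * t * dx * dx / dt
    print(dx, mean, var)          # mean stays at 2ct; var approaches 2Dt
```

With these parameters, 2ct = 1.2 and 2Dt = 4.0, so the printed variance column should creep toward 4.0 as ∆x shrinks.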

The reason for the choice of the parameters will become apparent. D is called the diffusion coefficient, and c is called the drift coefficient.

Note that as ∆x and ∆t approach zero, the values v^∆_{k,n} = P(W^∆_{n∆t} = k∆x) = p(k∆x, n∆t) specify values of the function p at more and more points in the plane. In fact, the points at which p is defined become more and more dense in the plane as ∆x and ∆t approach zero. This means that given any point in the plane, we can find a sequence of points at which such a p is defined, that approach our given point as ∆x and ∆t approach zero. We would like to be able to assert the existence of the limiting function, and that it expresses how the probability of a continuous random variable evolves with time. We will not quite do this, but, assuming the existence of the limiting function, we will derive a partial differential equation that it must satisfy. Then we shall use a probabilistic argument to obtain a candidate limiting function. Finally, we will check that the candidate limiting function is a solution of the partial differential equation. This isn't quite a proof of the result we would like, but it touches on the important steps in the argument.

We begin with the derivation of the partial differential equation. To do this, we note that for the original random walk, v_{x,n} satisfies the following difference equation:

v_{k,n+1} = p v_{k−1,n} + (1 − p) v_{k+1,n}.

This is just the matrix of the (infinite discrete) Markov process that defines the random walk. When we pass into the ∆x, ∆t realm, and use the p(x, t) notation instead of v_{k∆x,n∆t}, this equation becomes

(∗)  p(x, t + ∆t) = p · p(x − ∆x, t) + (1 − p) · p(x + ∆x, t).

Next, we need Taylor's expansion for a function of two variables. We expand each of the three expressions in equation (∗) around the point (x, t), keeping terms up to order ∆t and (∆x)². As we do the expansion, we must keep in mind that we are going to let ∆t → 0, and as we do this, we must account for all terms of order up to and including that of ∆t. In terms of our assumptions about ∆x, ∆t, p and 1 − p, we must account for both ∆x and (∆x)² terms (but (∆x)³ and ∆t∆x both go to zero faster than ∆t). We get:

p(x, t + ∆t) = p(x, t) + ∆t (∂p/∂t)(x, t) + ⋯
p(x − ∆x, t) = p(x, t) − ∆x (∂p/∂x)(x, t) + ((∆x)²/2)(∂²p/∂x²)(x, t) + ⋯
p(x + ∆x, t) = p(x, t) + ∆x (∂p/∂x)(x, t) + ((∆x)²/2)(∂²p/∂x²)(x, t) + ⋯

We substitute these three equations into (∗), recalling that p + (1 − p) = 1, and get

∆t ∂p/∂t = −(2p − 1)∆x ∂p/∂x + ((∆x)²/2) ∂²p/∂x²

(where all the partial derivatives are evaluated at (x, t)). Finally, we divide through by ∆t, and recall that (∆x)²/∆t = 2D and 2p − 1 = ∆x c/D, to arrive at the equation

∂p/∂t = −2c ∂p/∂x + D ∂²p/∂x².

This last partial differential equation is called the Fokker-Planck equation.

To find a solution of the Fokker-Planck equation, we use a probabilistic argument to find the limit of the random walk distribution as ∆x and ∆t approach 0. The key step in this argument uses the fact that the binomial distribution with parameters n and p approaches the normal distribution with expected value np and variance np(1 − p) as n approaches infinity. Thus, we see that at time t, the binomial distribution has mean

(t/∆t)(2p − 1)∆x = 2ct

and variance

4p(1 − p) t (∆x)²/∆t,

which approaches 2Dt as ∆t → 0. Therefore, we expect W_t in the limit to be normally distributed with mean 2ct and variance 2Dt. Thus, the probability that W_t < x is

P(W_t < x) = (1/√(2π)) ∫_{−∞}^{(x−2ct)/√(2Dt)} e^{−y²/2} dy.

Therefore the pdf of W_t is the derivative of this with respect to x, which yields the pdf

p(x, t) = (1/(2√(πDt))) e^{−(x−2ct)²/(4Dt)}.

A special case of this is when c = 0 and D = 1/2. The corresponding process is called the Wiener process (or Brownian motion). Its probability density is

p(x, t) = (1/√(2πt)) e^{−x²/(2t)}.

The following properties characterize the Wiener process, which we record as a theorem.

Theorem 4.3.3.
1. W_0 = 0 with probability 1.
2. W_t is a continuous function of t with probability 1.
3. For t > s, W_t − W_s is normally distributed with mean 0 and variance t − s.
4. For t ≥ s ≥ u ≥ v, W_t − W_s and W_u − W_v are independent.

Proof: Let's discuss these points.

1. This is clear, since we defined W_t as a limit of random walks which all started at 0.

3. This we derived from the Fokker-Planck equation.

4. Again, we defined W_t as a limit of random walks W^∆. Now W_t^∆ = ∆x(ξ_1 + ⋯ + ξ_n), where n = t/∆t. So if t = n∆t, s = m∆t, u = k∆t and v = l∆t, where n ≥ m ≥ k ≥ l, then we have

W_t − W_s = ∆x(ξ_{m+1} + ξ_{m+2} + ⋯ + ξ_n)

and

W_u − W_v = ∆x(ξ_{l+1} + ⋯ + ξ_k),

which are independent, since the two sums share no ξ_i.

2. With probability 1, the paths W_t are continuous. This fact is beyond the scope of our discussion, but it is intuitive. While the paths are continuous, they are very rough, and we see according to the next proposition that they are not differentiable. Q.E.D.

Almost everything we use about the Wiener process will follow from Theorem 4.3.3.

Proposition 4.3.4. For all t > 0 and all n = 1, 2, …,

lim_{h→0} P(|W_{t+h} − W_t| > n|h|) = 1,

so that W_t is nowhere differentiable.

Proof: Assume without loss of generality that h > 0. Then

P(|W_{t+h} − W_t| > nh) = P(|W_h| > nh)

by 3) of Theorem 4.3.3. Now this equals 1 − P(|W_h| ≤ nh), and

P(|W_h| ≤ nh) = (1/√(2πh)) ∫_{−nh}^{nh} e^{−x²/(2h)} dx ≤ (1/√(2πh)) ∫_{−nh}^{nh} 1 dx = 2nh/√(2πh) = n√(2h/π),

which converges to 0 as h → 0 for any n. Q.E.D.

We now do some sample computations to show how to manipulate the Wiener process, some of which will be useful later.

1. Calculate the expected distance of the Wiener process from 0.

Solution: This is just E(|W_t|), which is

∫_{−∞}^∞ |x| (1/√(2πt)) e^{−x²/(2t)} dx

= 2 ∫_0^∞ x (1/√(2πt)) e^{−x²/(2t)} dx.

Changing variables with u = x²/(2t), this integral becomes

(2t/√(2πt)) ∫_0^∞ e^{−u} du = √(2t/π).

2. Calculate E(W_t W_s).

Solution: Assume without loss of generality that t > s. Then

E(W_t W_s) = E((W_t − W_s)W_s + W_s²) = E((W_t − W_s)W_s) + E(W_s²).

Now W_t − W_s is independent of W_s, since t > s, and W_t − W_s has mean 0; therefore we get

E(W_t W_s) = E(W_t − W_s)E(W_s) + σ²(W_s) = s.

The Wiener process starting at y. The defining properties of W_t^y = W_t + y, the Wiener process starting at y, are:

1. W_0^y = y.
2. W_t^y is continuous with probability 1.
3. For t > s, W_t^y − W_s^y is normally distributed with µ = 0, σ² = t − s.
4. W_t^y − W_s^y and W_u^y − W_v^y are independent for t ≥ s ≥ u ≥ v.
5. W_t^y is normally distributed with µ = y, σ² = t.
6. The pdf is given by

p(x, t | y) = (1/√(2πt)) e^{−(x−y)²/(2t)},

so that

P(a ≤ W_t^y ≤ b) = (1/√(2πt)) ∫_a^b e^{−(x−y)²/(2t)} dx.
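Both sample computations are easy to sanity-check by Monte Carlo. The snippet below is my own illustration, not from the notes: it samples W_s ~ N(0, s) and builds W_t from W_s plus an independent increment, then compares E(|W_t|) with √(2t/π) and E(W_t W_s) with s.

```python
import math
import random

random.seed(1)
s, t, n = 1.0, 2.0, 200_000
abs_sum, prod_sum = 0.0, 0.0
for _ in range(n):
    ws = random.gauss(0.0, math.sqrt(s))            # W_s ~ N(0, s)
    wt = ws + random.gauss(0.0, math.sqrt(t - s))   # independent increment W_t - W_s
    abs_sum += abs(wt)
    prod_sum += wt * ws
print(abs_sum / n, math.sqrt(2 * t / math.pi))  # both near sqrt(2t/pi)
print(prod_sum / n, s)                          # both near s
```

Using the exact increment structure (rather than simulating a fine random walk) is what makes the check cheap: only two Gaussians per sample.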

How do we use the Wiener process to model stock movements? The naive model is

dS_t/dt = µS_t + noise(σ, S_t),

where, if the noise term were absent, S_t would be a bank account earning interest µ. There are two obvious problems: (1) the Wiener process is differentiable nowhere, so why should S_t be differentiable anywhere? (2) "Noise" is very difficult to make rigorous.

To motivate our way out of these problems, let us look at deterministic motion. The most general first-order differential equation is

df/dt = φ(f(t), t).

By integrating both sides we arrive at the following integral equation:

f(t) − f(t_0) = ∫_{t_0}^t (df/dt) dt = ∫_{t_0}^t φ(f(t), t) dt,

that is,

f(t) = f(t_0) + ∫_{t_0}^t φ(f(t), t) dt.

Theoretically, integral equations are better behaved than differential equations (they tend to smooth out bumps). So we try to make the corresponding conversion from differential to integral equation in the stochastic world. We had the martingale transform: if S_n was a stock and φ_n denoted the amount of stock at tick n, then the gain from tick 0 to tick N was

Σ_{i=1}^N φ_i(S_i − S_{i−1}).

It is these sorts of sums that become integrals in continuous time. Our first goal is just to define a reasonable notion of stochastic integral.

Recall the Riemann integral ∫_a^b f(x) dx: take a partition P = {a = t_0 < t_1 < ⋯ < t_n = b} of [a, b], let ξ_i satisfy t_{i−1} < ξ_i < t_i, and form the Riemann sum

RS(f, P) = Σ_{i=1}^n f(ξ_i)(t_i − t_{i−1}).

Let mesh P = max{|t_i − t_{i−1}| : i = 1, …, n}.

Theorem 4.3.5. (Riemann's Theorem) Let f be a function such that the probability of choosing a point of discontinuity is 0.

Then

lim_{mesh P → 0} RS(f, P)

exists, and is independent of the choices of the ξ_i within their respective intervals. We call this limit the Riemann integral

∫_a^b f(x) dx.

Note that the summation is beginning to look like a martingale transform.

The next level of integral technology is the Riemann-Stieltjes integral. Suppose φ(x) is a non-decreasing function. Let P = {a = t_0 < t_1 < ⋯ < t_n = b} be a partition, and ξ_i values between partition points. Let

RSS(f, P, φ) = Σ_{i=1}^n f(ξ_i)(φ(t_i) − φ(t_{i−1})).

("RSS" stands for Riemann-Stieltjes sum.) If lim_{mesh P → 0} RSS(f, P, φ) exists, then by definition ∫_a^b f(t) dφ(t) is this limit. If φ is differentiable, the corresponding Riemann theorem holds, namely lim_{mesh P → 0} RSS(f, P, φ) exists (and is independent of the choices of ξ_i).

Proposition 4.3.6. Let φ(x) be differentiable. Then

∫_a^b f(t) dφ(t) = ∫_a^b f(t)φ'(t) dt.

Proof. Assuming φ is differentiable,

RSS(f, P, φ) = Σ_{i=1}^n f(ξ_i)(φ(t_i) − φ(t_{i−1}))
  = Σ_{i=1}^n f(ξ_i) [(φ(t_i) − φ(t_{i−1}))/(t_i − t_{i−1})] (t_i − t_{i−1})
  = Σ_{i=1}^n f(ξ_i)(φ'(t_i) + o(|t_i − t_{i−1}|))(t_i − t_{i−1}),

and so

lim_{mesh P → 0} Σ_{i=1}^n f(ξ_i)φ'(t_i)(t_i − t_{i−1}) = ∫_a^b f(t)φ'(t) dt.    (4.3.3)

Example.

∫_a^b φ(t) dφ(t) = ∫_a^b φ(t)φ'(t) dt = ½φ(b)² − ½φ(a)².

In the Riemann and Riemann-Stieltjes cases, the choices of the points ξ_i in the intervals were irrelevant. In the stochastic case it is very much relevant.

A stochastic integral is formed much like the Riemann-Stieltjes integral, with one twist. For X_t and φ_t stochastic processes,

∫_a^b φ_t dX_t

is to be a random variable. Take P = {a = t_0 < t_1 < ⋯ < t_n = b} and consider

Σ_{i=1}^n φ_{ξ_i}(X_{t_i} − X_{t_{i−1}}).

Let's work out the above sum for X_t = φ_t = W_t, where W_t is the Wiener process, using two different choices for the ξ_i's.

First, let ξ_i = t_{i−1}, the left endpoint. Then, using independence,

E(Σ_{i=1}^n W_{t_{i−1}}(W_{t_i} − W_{t_{i−1}})) = Σ_{i=1}^n E(W_{t_{i−1}}(W_{t_i} − W_{t_{i−1}})) = Σ_{i=1}^n E(W_{t_{i−1}})E(W_{t_i} − W_{t_{i−1}}) = 0.

Second, let ξ_i = t_i, the right endpoint. Then

E(Σ_{i=1}^n W_{t_i}(W_{t_i} − W_{t_{i−1}})) = Σ_{i=1}^n E(W_{t_i}² − W_{t_i}W_{t_{i−1}})
  = Σ_{i=1}^n [E(W_{t_i}²) − E(W_{t_i}W_{t_{i−1}})]
  = Σ_{i=1}^n (t_i − t_{i−1}) = (t_1 − t_0) + (t_2 − t_1) + ⋯ + (t_n − t_{n−1}) = t_n − t_0 = b − a.

So the choice of endpoint matters. Which endpoint to choose? Following Itô, one uses the left-hand endpoint. One defines the Itô sum

IS(φ_t, P, X_t) = Σ_{i=1}^n φ_{t_{i−1}}(X_{t_i} − X_{t_{i−1}}).

There are two motivations for this choice. First, if X_t is a stock process and φ_t a portfolio process, then the left-hand choice embodies the idea that our gains from time t_{i−1} to time t_i should be based on a choice made at time t_{i−1}. The left-hand choice looks a little like a previsibility condition. Second, it turns out that if X_t is a martingale, then the Itô sum will be a martingale as well.

Following Stratonovich, one takes ξ_i = ½(t_{i−1} + t_i), the midpoint. As an advantage, the resulting Stratonovich calculus looks like traditional calculus. As a disadvantage, the derived stochastic process is not a martingale.

Definition 4.3.2. Let X_t be a stochastic process. Let φ_t be another stochastic process defined on the same sample space as X_t. Then φ_t is adapted to X_t if φ_t is independent of X_s − X_t for s > t.

Let us call a random variable with finite first and second moments square integrable.

Definition 4.3.3. A sequence X^α of square integrable random variables converges in mean-square to X if E((X − X^α)²) → 0 as α → 0. In symbols, lim X^α = X. This is equivalent to E(X^α) → E(X) and Var(X − X^α) → 0.

One knows that for X, Y square integrable random variables defined on the same sample space,

P(X = Y) = 1 iff E((X − Y)²) = 0.

Now Var(X − Y) = E((X − Y)²) − (E(X − Y))², so

P(X = Y) = 1 iff E(X) = E(Y) and Var(X − Y) = 0.

This is a criterion for checking whether two square integrable random variables are equal (almost surely).

Then we define the Itô integral

∫_a^t φ_s dX_s ≡ lim_{mesh P → 0} IS(φ, P, X) = lim_{mesh P → 0} Σ φ_{t_{i−1}}(X_{t_i} − X_{t_{i−1}}).

This (mean-square convergence) is the sense of the limit in the definition of the stochastic integral.

Here is a sample computation of a stochastic integral. Let's evaluate

∫_a^b W_t dW_t.

We need the following facts: E(W_t) = 0, E(W_t²) = t, E(W_t³) = 0, and E(W_t⁴) = 3t². Writing ∆W_{t_i} = W_{t_i} − W_{t_{i−1}},

IS(W_t, P, W_t) = Σ_{i=1}^n W_{t_{i−1}} ∆W_{t_i} = Σ_{i=1}^n W_{t_{i−1}}(W_{t_i} − W_{t_{i−1}})
  = ½ Σ_{i=1}^n [(W_{t_{i−1}} + ∆W_{t_i})² − W_{t_{i−1}}² − (∆W_{t_i})²]
  = ½ Σ_{i=1}^n [W_{t_i}² − W_{t_{i−1}}²] − ½ Σ_{i=1}^n (∆W_{t_i})²
  = ½[(W_{t_1}² − W_{t_0}²) + (W_{t_2}² − W_{t_1}²) + ⋯ + (W_{t_n}² − W_{t_{n−1}}²)] − ½ Σ_{i=1}^n (∆W_{t_i})²
  = ½(W_b² − W_a²) − ½ Σ_{i=1}^n (∆W_{t_i})².

The first part is recognizable as what the Riemann-Stieltjes integral gives formally. But what is Σ_{i=1}^n (∆W_{t_i})² as mesh P → 0? The simple calculation

E(Σ_{i=1}^n (∆W_{t_i})²) = Σ_{i=1}^n E((∆W_{t_i})²) = Σ_{i=1}^n E((W_{t_i} − W_{t_{i−1}})²) = Σ_{i=1}^n (t_i − t_{i−1}) = b − a

leads us to the

Claim:

lim_{mesh P → 0} Σ_{i=1}^n (∆W_{t_i})² = b − a

and therefore

∫_a^b W_t dW_t = ½(W_b² − W_a²) − ½(b − a).

Proof.

E[(Σ_{i=1}^n (∆W_{t_i})² − (b − a))²]
  = E[(Σ_{i=1}^n (∆W_{t_i})²)²] − 2(b − a) E[Σ_{i=1}^n (∆W_{t_i})²] + (b − a)²
  = E[Σ_{i=1}^n (∆W_{t_i})⁴ + 2Σ_{i<j} (∆W_{t_i})²(∆W_{t_j})²] − 2(b − a) Σ_{i=1}^n ∆t_i + (b − a)²
  = Σ_{i=1}^n 3(∆t_i)² + 2Σ_{i<j} (∆t_i)(∆t_j) − 2(b − a) Σ_{i=1}^n ∆t_i + (b − a)²
  = 2Σ_{i=1}^n (∆t_i)² + [Σ_{i=1}^n (∆t_i)² + 2Σ_{i<j} (∆t_i)(∆t_j)] − 2(b − a) Σ_{i=1}^n ∆t_i + (b − a)²
  = 2Σ_{i=1}^n (∆t_i)² + (Σ_{i=1}^n ∆t_i)² − 2(b − a) Σ_{i=1}^n ∆t_i + (b − a)²
  = 2Σ_{i=1}^n (∆t_i)² + (Σ_{i=1}^n ∆t_i − (b − a))²
  = 2Σ_{i=1}^n (∆t_i)²,

since Σ_{i=1}^n ∆t_i = b − a.

Now for each i, ∆t_i ≤ mesh P, so (∆t_i)² ≤ ∆t_i · mesh P. (Note that only one ∆t_i is being rounded up.) Summing i = 1 to n,

0 ≤ Σ_{i=1}^n (∆t_i)² ≤ Σ_{i=1}^n ∆t_i · mesh P = (b − a) · mesh P → 0 as mesh P → 0.

So

lim_{mesh P → 0} Σ_{i=1}^n (∆t_i)² = 0.

Intuitively, ∆t_i ≈ mesh P ≈ 1/n, so lim_{mesh P → 0} Σ_{i=1}^n (∆t_i)² ≈ lim_{n→∞} n · (1/n)² → 0. Q.E.D.

The underlying reason for the difference between the stochastic integral and the Stieltjes integral is that ∆W_{t_i} is, with high probability, of order √(∆t_i), so (∆X)² = (∆W)² ≈ ∆t.

Slogan of the Itô Calculus: dW_t² = dt. The slogan dW_t² = dt is only given precise sense through its integral form:

∫_a^t φ_s (dW_s)² = ∫_a^t φ_s ds.

Moreover, ∫_a^t φ_s (dW_s)^n = 0 for n > 2. Proving this will be in the exercises.

Theorem 4.3.7. (Properties of the Itô Integral.) Suppose φ_s, ψ_s are adapted to the given process (X_s or W_s). Then

(1) ∫_a^t dX_s = X_t − X_a;
(2) ∫_a^t (φ_s + ψ_s) dX_s = ∫_a^t φ_s dX_s + ∫_a^t ψ_s dX_s;
(3) E(∫_a^t φ_s dW_s) = 0; and
(4) E((∫_a^t φ_s dW_s)²) = E(∫_a^t φ_s² ds).
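The claim is easy to verify on a simulated path. In the sketch below (my own check, not in the notes), the telescoping identity IS = ½(W_b² − W_a²) − ½Σ(∆W)² holds exactly for every path, and the quadratic variation Σ(∆W)² clusters around b − a.

```python
import math
import random

random.seed(0)
a, b, n = 0.0, 1.0, 20_000
dt = (b - a) / n
W = [0.0]                                   # W_a = 0
for _ in range(n):
    W.append(W[-1] + random.gauss(0.0, math.sqrt(dt)))

ito_sum = sum(W[i] * (W[i + 1] - W[i]) for i in range(n))   # left endpoints
quad_var = sum((W[i + 1] - W[i]) ** 2 for i in range(n))    # sum of (dW)^2
identity = 0.5 * (W[-1] ** 2 - W[0] ** 2) - 0.5 * quad_var

print(abs(ito_sum - identity))   # zero up to floating-point rounding
print(quad_var)                  # close to b - a = 1
```

Note the two different behaviors: the identity is pure algebra (exact on any path and any mesh), while quad_var ≈ b − a is a limit statement, with fluctuations of order √(2/n).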

Proof.

1. For any partition P = {a = s_0 < s_1 < ⋯ < s_n = t},

IS(1, P, X_s) = Σ_{i=1}^n 1 · (X_{s_i} − X_{s_{i−1}}),

which telescopes to X_{s_n} − X_{s_0} = X_t − X_a. So the limit as mesh P → 0 is X_t − X_a.

2. It is obvious for any partition P that IS(φ_s + ψ_s, P, X_s) = IS(φ_s, P, X_s) + IS(ψ_s, P, X_s). Now take the limit as mesh P → 0.

3. Given a partition P, taking expected values gives

E(IS(φ_s, P, W_s)) = E(Σ_{i=1}^n φ_{s_{i−1}}(W_{s_i} − W_{s_{i−1}})) = Σ_{i=1}^n E(φ_{s_{i−1}})E(W_{s_i} − W_{s_{i−1}}) = 0,

using the fact that φ is adapted. Now take the limit as mesh P → 0.

4. Given a partition P,

E(IS(φ_s, P, W_s)²) = E((Σ_{i=1}^n φ_{s_{i−1}}(W_{s_i} − W_{s_{i−1}}))²)
  = E(Σ_{i=1}^n φ_{s_{i−1}}²(∆W_{s_i})² + 2Σ_{i<j} φ_{s_{i−1}}φ_{s_{j−1}}(W_{s_i} − W_{s_{i−1}})(W_{s_j} − W_{s_{j−1}}))
  = Σ_{i=1}^n E(φ_{s_{i−1}}²)∆s_i + 2Σ_{i<j} E(φ_{s_{i−1}}φ_{s_{j−1}}(W_{s_i} − W_{s_{i−1}})) · E(W_{s_j} − W_{s_{j−1}})
  = Σ_{i=1}^n E(φ_{s_{i−1}}²)∆s_i + 2Σ_{i<j} E(φ_{s_{i−1}}φ_{s_{j−1}}(W_{s_i} − W_{s_{i−1}})) · 0
  = IS(φ_s², P, s).

Taking the limit as mesh P → 0 gives the result.

4.3.3  Stochastic Differential Equations

A first-order stochastic differential equation is an expression of the form

dX_t = a(X_t, t) dt + b(X_t, t) dW_t,

where a(x, t) and b(x, t) are functions of two variables. A stochastic process X_t is a solution to this SDE if

X_t − X_a = ∫_a^t dX_s = ∫_a^t a(X_s, s) ds + ∫_a^t b(X_s, s) dW_s.

Lemma 4.3.8. (Itô's Lemma). Suppose we are given a twice differentiable function f(x), and a solution X_t to the SDE dX_t = a dt + b dW_t. Then f(X_t) is a stochastic process, and it satisfies the SDE

df(X_t) = [f'(X_t)a + ½f''(X_t)b²] dt + f'(X_t)b dW_t.

Proof. This is just a calculation using two terms of the Taylor series for f, and the rules dW_t² = dt and dW_t^n = 0 for n > 2. (Note that dt dW_t = dW_t³ = 0 and dt² = dW_t⁴ = 0.)

df(X_t) = f(X_{t+dt}) − f(X_t) = f(X_t + dX_t) − f(X_t)
  = [f(X_t) + f'(X_t) dX_t + ½f''(X_t)(dX_t)² + o((dX_t)³)] − f(X_t)
  = f'(X_t) dX_t + ½f''(X_t)(dX_t)²
  = f'(X_t)[a(X_t, t) dt + b(X_t, t) dW_t] + ½f''(X_t)[a² dt² + 2ab dt dW_t + b² dW_t²]
  = [f'(X_t)a(X_t, t) + ½f''(X_t)(b(X_t, t))²] dt + f'(X_t)b(X_t, t) dW_t.

More generally, if f(x, t) is a function of two variables, once differentiable in t and twice differentiable in x, one has

df(X_t, t) = [(∂f/∂t)(X_t, t) + (∂f/∂x)(X_t, t)a + ½(∂²f/∂x²)(X_t, t)b²] dt + (∂f/∂x)(X_t, t)b dW_t.

Example. To calculate ∫_a^t W_t dW_t, find f(x) such that W_t dW_t = d(f(W_t)); that is, show that the integral is equal to ∫_a^t dX_t = X_t − X_a for some X_t. Since ∫x dx = ½x², consider f(x) = ½x². Then f'(x) = x and f''(x) = 1, so Itô's lemma, applied to dW_t = 0 · dt + 1 · dW_t, gives

d(f(W_t)) = f'(W_t) dW_t + ½f''(W_t) dt = W_t dW_t + ½ · 1 dt = W_t dW_t + ½ dt.

Equivalently, W_t dW_t = d(f(W_t)) − ½ dt. Therefore

∫_a^t W_s dW_s = ∫_a^t d(f(W_s)) − ∫_a^t ½ ds = f(W_t) − f(W_a) − ½(t − a) = ½W_t² − ½W_a² − ½(t − a).

Example. Calculate ∫_a^t W_s² dW_s. Since ∫x² dx = ⅓x³, consider f(x) = ⅓x³. Then f'(x) = x² and f''(x) = 2x, so Itô's lemma implies

d(⅓W_s³) = W_s² dW_s + ½ · 2W_s ds = W_s² dW_s + W_s ds,

or W_s² dW_s = d(⅓W_s³) − W_s ds, so

∫_a^t W_s² dW_s = ⅓W_t³ − ⅓W_a³ − ∫_a^t W_s ds.

(The last integrand does not really simplify.)

Example. Solve dX_s = ½X_s ds + X_s dW_s. Consider f with f'(X_s) · X_s = 1; with ordinary variables, this is f'(x) · x = 1, or f'(x) = 1/x. So we are led to consider f(x) = ln x, with f'(x) = 1/x and f''(x) = −1/x². Then, using dX_s = ½X_s ds + X_s dW_s and dX_s² = X_s² ds,

d(ln X_s) = (1/X_s) dX_s + ½(−1/X_s²)(dX_s)²
  = ½ ds + dW_s − ½ ds = dW_s.

Now integrate both sides, getting

ln(X_t/X_a) = ∫_a^t d(ln X_s) = ∫_a^t dW_s = W_t − W_a,

and so X_t = X_a e^{W_t − W_a}.

Asset Models

The basic model for a stock is the SDE

dS_t = µS_t dt + σS_t dW_t,

where µ is the short-term expected return, and σ, called the volatility, is the standard deviation of the short-term returns. Note that if σ = 0, the model simplifies to dS_t = µS_t dt, which is just the value of an interest-bearing, risk-free security, implying S_t = S_a e^{µt}.

Compared with the generic SDE dX_t = a(X_t, t) dt + b(X_t, t) dW_t, the stock model uses a(x, t) = µx and b(x, t) = σx. We have dS_t² = σ²S_t² dt. The model SDE is similar to the last example, and the same change of variables will work. Using f(x) = ln x, Itô's lemma gives

d(ln S_t) = (1/S_t) dS_t + ½(−1/S_t²)(dS_t)²
  = µ dt + σ dW_t − ½σ² dt
  = (µ − ½σ²) dt + σ dW_t.

Integrating both sides gives

∫_0^t d(ln S_s) = ∫_0^t (µ − ½σ²) ds + σ dW_s,
ln S_t − ln S_0 = (µ − ½σ²)t + σW_t,
S_t/S_0 = e^{(µ − ½σ²)t + σW_t},
S_t = S_0 e^{(µ − ½σ²)t + σW_t}.
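The closed form lets one simulate S_t exactly, with no time-stepping: draw W_t ~ N(0, t) and exponentiate. The sketch below is my own illustration, not from the notes; it also checks numerically that the log-return has mean (µ − ½σ²)t.

```python
import math
import random

random.seed(42)
S0, mu, sigma, t = 100.0, 0.08, 0.2, 1.0
n = 200_000
log_sum = 0.0
for _ in range(n):
    wt = random.gauss(0.0, math.sqrt(t))                          # W_t ~ N(0, t)
    St = S0 * math.exp((mu - 0.5 * sigma ** 2) * t + sigma * wt)  # exact solution
    log_sum += math.log(St / S0)
print(log_sum / n, (mu - 0.5 * sigma ** 2) * t)  # both near 0.06
```

Exact sampling like this is preferable to Euler stepping of the SDE whenever only the terminal value S_t is needed.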

The expected return of S_t is

E(ln(S_t/S_0)) = E((µ − ½σ²)t + σW_t) = (µ − ½σ²)t.

Vasicek's interest rate model is

dr_t = a(b − r_t) dt + σ dW_t.

It satisfies "mean reversion": if r_t < b, then the drift term is positive, pulling r_t upwards, while if r_t > b, the drift term is negative, pushing r_t downwards.

The Cox-Ingersoll-Ross model is

dr_t = a(b − r_t) dt + σ√(r_t) dW_t.

It has mean reversion, and a volatility that grows with r_t.

Ornstein-Uhlenbeck Process. An O-U process is a solution to

dX_t = µX_t dt + σ dW_t

with an initial condition X_0 = x_0. This SDE is solved by using the integrating factor e^{−µt}. First,

e^{−µt} dX_t = µe^{−µt}X_t dt + σe^{−µt} dW_t

holds just by multiplying the given SDE. Letting f(x, t) = xe^{−µt}, then

∂f/∂t = −µe^{−µt}x,  ∂f/∂x = e^{−µt},  and  ∂²f/∂x² = 0.

Itô's lemma gives

df(X_t, t) = (∂f/∂t)(X_t, t) dt + (∂f/∂x)(X_t, t) dX_t + ½(∂²f/∂x²)(X_t, t) dX_t²,

so

d(e^{−µt}X_t) = −µe^{−µt}X_t dt + e^{−µt} dX_t,

or

e^{−µt} dX_t = d(e^{−µt}X_t) + µe^{−µt}X_t dt.

So the SDE, with integrating factor, rewrites as

d(e^{−µt}X_t) + µe^{−µt}X_t dt = µe^{−µt}X_t dt + σe^{−µt} dW_t,
d(e^{−µt}X_t) = σe^{−µt} dW_t.

Now integrate both sides from 0 to t:

∫_0^t d(e^{−µs}X_s) = ∫_0^t σe^{−µs} dW_s,
e^{−µt}X_t − X_0 = ∫_0^t σe^{−µs} dW_s,
X_t = e^{µt}X_0 + e^{µt} ∫_0^t σe^{−µs} dW_s.
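Mean reversion in Vasicek's model can be made quantitative: taking expectations in the SDE kills the dW term (by property (3) of the Itô integral) and leaves dE(r_t) = a(b − E(r_t)) dt, whose solution is E(r_t) = b + (r_0 − b)e^{−at}. This last formula is my own addition, not stated in the notes; the sketch below checks it against an Euler discretization of the mean ODE.

```python
import math

a, b, r0, T = 1.5, 0.05, 0.10, 2.0   # reversion speed, long-run level, start, horizon
n = 100_000
dt = T / n
m = r0
for _ in range(n):
    m += a * (b - m) * dt            # Euler step for dE(r)/dt = a(b - E(r))
exact = b + (r0 - b) * math.exp(-a * T)
print(m, exact)                      # both near 0.0525: the mean has reverted toward b
```

The same expectation argument applied to the O-U process gives E(X_t) = x_0 e^{µt}, consistent with the explicit solution above (the stochastic integral there has mean zero).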

Let's go back to the stock model

dS_t = µS_t dt + σS_t dW_t  with solution  S_t = S_0 e^{(µ − ½σ²)t + σW_t}.

We calculate the pdf of S_t as follows. First,

P(S_t ≤ S) = P(S_0 e^{(µ−½σ²)t + σW_t} ≤ S)
  = P(e^{(µ−½σ²)t + σW_t} ≤ S/S_0)
  = P((µ − ½σ²)t + σW_t ≤ ln(S/S_0))
  = (1/√(2πσ²t)) ∫_{−∞}^{ln(S/S_0)} e^{−(x−(µ−½σ²)t)²/(2σ²t)} dx.

Then the pdf is the derivative of the above with respect to S. The fundamental theorem of calculus gives us derivatives of the form (d/du) ∫_a^u φ(x) dx = φ(u). More generally, the chain rule gives us, for u = g(S),

(d/dS) ∫_a^{g(S)} φ(x) dx = ((d/du) ∫_a^u φ(x) dx) · (du/dS) = φ(u) · g'(S) = φ(g(S)) · g'(S).

In the case at hand, g(S) = ln(S/S_0) = ln S − ln S_0, so g'(S) = 1/S, and the pdf is

(d/dS) P(S_t ≤ S) = (d/dS) (1/√(2πσ²t)) ∫_{−∞}^{ln(S/S_0)} e^{−(x−(µ−½σ²)t)²/(2σ²t)} dx
  = (1/(S√(2πσ²t))) e^{−(ln(S/S_0)−(µ−½σ²)t)²/(2σ²t)}

for S > 0, and identically 0 for S ≤ 0.

Pricing Derivatives in Continuous Time

Our situation concerns a stock with price stochastic variable S_t, and a derivative security, assumed to have expiration T, and further assumed that its (gross) payoff depends only on T and S_T. Let V(S_T, T) denote this payoff. To price V, we have a deterministic function V(s, t), and are plugging in a stochastic process S_t and getting a stochastic process V(S_t, t). In particular, V(S_t, t) depends only on t and S_t, and V(S_t, t) denotes the payoff for this derivative if it expires at time t with stock price S_t.

First, set up a portfolio consisting of φ_t units of stock S_t and ψ_t units of bond B_t:

Π_t = φ_t S_t + ψ_t B_t.

Second, construct a self-financing, hedging portfolio for V,

4. and the no-arbitrage assumption implies Πt = Vt for all t ≤ T . Recall. ∂t 2 ∂S ∂S ∂V . (Black-Scholes). t) = 1 dVt = ∂V dt + ∂V dSt + 2 ∂ V (dSt )2 ∂t ∂S ∂S 2 2 = ( ∂V + µSt ∂V + 1 σ 2 St 2 ∂ V ) dt + ∂t ∂S 2 ∂S 2 2 ∂V σSt ∂S dWt . Given a derivative as above. The continuous limit is SF C2 dΠt = φt dSt + ψt dBt . ∂t ∂S 2 ∂S 2 or. the self-financing condition in the discrete case: (∆Π)n = φn ∆Sn + ψn ∆Bn . We now model the stock and bond. t) dSt + (St . ∂S and furthermore. t arguments. THE WIENER PROCESS such that the terminal value of the portfolio equals the derivative payoff ΠT = VT . subject to the self-financing condition t 121 SF C1 Πt = Π 0 + 0 φs dSs + ψs dBs . the self financing condition (SFC2) says that Vt = Πt = φt St + ψt Bt . hedging portfolio Πt = φt St + ψt Bt with Πt = V (St . .3. by Ito’s lemma ∂V 1 ∂2V ∂V (St . there exists a self-financing. t) = dΠt two ways.3. The pricing of the portfolio is straightforward. dV (St . The stock is dSt = µSt dt + σSt dWt . suppressing the St . t) if and only if (BSE) 1 ∂V ∂2V ∂V + σ 2 S 2 2 + rS − rV = 0. First. Theorem 4. Let’s calculate dV (St . On the other hand. dVt = dΠt = φt dSt + ψt dBt = (φt µSt + ψt rBt ) dt + φt σSt dWt . while the bond is dBt = rBt dt.9. t) dt + (St . φt = Proof. t)(dSt )2 . also. using our stock and bond model on a portfolio.

If these are to be equal, the coefficients of dt and of dW_t must be equal. Equating the coefficients of dW_t gives

σS_t (∂V/∂S) = φ_t σS_t,  hence  φ_t = ∂V/∂S.

Equating the coefficients of dt gives

∂V/∂t + µS_t ∂V/∂S + ½σ²S_t² ∂²V/∂S² = φ_t µS_t + ψ_t rB_t.

Since ψ_t B_t = Π_t − φ_t S_t = V_t − (∂V/∂S)S_t, we have ψ_t rB_t = r(V_t − (∂V/∂S)S_t), hence

∂V/∂t + ½σ²S_t² ∂²V/∂S² = r(V_t − (∂V/∂S)S_t),

or in other words (BSE), completing the proof.

In summary, to construct a self-financing, replicating portfolio for a derivative, you need the value of the derivative V_t = V(S_t, t) to satisfy the Black-Scholes equation, and the stock holding to be φ_t = ∂V/∂S.

The theory of partial differential equations tells us that in order to get unique solutions, we need boundary and temporal conditions. Since the BSE is first order in t, we need one time condition. (Usually one has an "initial" condition, corresponding to time 0; here, one has a "final" or "terminal" condition.) The time condition is simply the expiration payoff V(S_T, T) = h(S_T). For example, consider a call: h(S) = max(S − K, 0) is, by definition, the payoff of a call at expiry, so V(S_T, T) = max(S_T − K, 0) is the final condition.

Since the BSE is second order in S, we need two boundary conditions. The boundary conditions are usually based on S = 0 and S → ∞. If the stock price is zero, a call is worthless, hence V(0, t) = 0 is one boundary condition. If the stock price grows arbitrarily large, the call is essentially worth the same as the stock, that is, V(S, t) ∼ S.

The One-Dimensional Heat Equation

Let the temperature of a homogeneous one-dimensional object at position x, time t, be given by u(x, t). The heat equation is

∂u/∂t = k ∂²u/∂x²,

where k is a constant, the thermal conductivity of the object's material. If we let k = 1, a solution is

u(x, t) = (1/√(4πt)) e^{−x²/(4t)}.

If we let k = ½, we get instead the solution

u(x, t) = (1/√(2πt)) e^{−x²/(2t)},

which is the Fokker-Planck density of a Wiener path. What is lim_{t→0} u(x, t), the initial temperature distribution? To answer this, we use the following notion.

Definition. A Dirac δ-function is a one-parameter family δ_t, indexed by t ≥ 0, of piecewise smooth functions such that

(1) ∫_{−∞}^∞ δ_t(x) dx = 1 for all t,
(2) lim_{t→0} δ_t(x) = 0 for all x ≠ 0,
(3) δ_t(x) ≥ 0 for all t.

Example. Consider

δ_t(x) = 1/(2t) if −t ≤ x ≤ t,  and  δ_t(x) = 0 if x < −t or x > t.

Important Example. The solutions to the heat equation

δ_t(x) = u(x, t) = (1/√(4πt)) e^{−x²/(4t)}

are a Dirac δ-function.

Proposition. If f is any bounded smooth function and δ_t any Dirac δ-function, then

lim_{t→0} ∫_{−∞}^∞ δ_t(x)f(x) dx = f(0);

more generally,

lim_{t→0} ∫_{−∞}^∞ δ_t(x − y)f(x) dx = f(y).

Intuitively, this can be seen as follows. Every Dirac δ-function is something like the first example, and on a small interval continuous functions are approximately constant, so

lim_{t→0} ∫_{−∞}^∞ δ_t(x − y)f(x) dx ≈ lim_{t→0} ∫_{y−t}^{y+t} (1/(2t)) f(x) dx ≈ lim_{t→0} (1/(2t)) · 2t · f(y) ≈ f(y).

Proceeding, the Heat Equation

    ∂u/∂t = ∂²u/∂x²,

with initial condition u(x, 0) = φ(x) and boundary conditions u(∞, t) = u(−∞, t) = 0, will be solved using

    δt(x) = (1/√(4πt)) e^(−x²/4t),

which is both a Dirac δ-function and a solution to the Heat Equation for t > 0. Let

    v(y, t) = (1/√(4πt)) ∫_{−∞}^{∞} e^(−(x−y)²/4t) φ(x) dx    for t > 0.

Claim. lim_{t→0} v(y, t) = φ(y). This is just an example of the previous proposition:

    lim_{t→0} (1/√(4πt)) ∫_{−∞}^{∞} e^(−(x−y)²/4t) φ(x) dx = lim_{t→0} ∫_{−∞}^{∞} δt(x − y) φ(x) dx = φ(y).

Claim. v satisfies the Heat Equation. Differentiating under the integral sign,

    ∂v/∂t = ∂/∂t ∫_{−∞}^{∞} (1/√(4πt)) e^(−(x−y)²/4t) φ(x) dx
          = ∫_{−∞}^{∞} ∂/∂t [(1/√(4πt)) e^(−(x−y)²/4t)] φ(x) dx
          = ∫_{−∞}^{∞} ∂²/∂y² [(1/√(4πt)) e^(−(x−y)²/4t)] φ(x) dx
          = ∂²/∂y² ∫_{−∞}^{∞} (1/√(4πt)) e^(−(x−y)²/4t) φ(x) dx
          = ∂²v/∂y².

The middle equality holds because the kernel itself satisfies the heat equation, in the variable y as well as in x, since it depends on them only through x − y.
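For a concrete instance of both claims (my own illustration, not from the notes): with φ(x) = cos(x) the convolution can be evaluated in closed form, v(y, t) = e^(−t) cos(y), which visibly solves the heat equation and tends to φ(y) as t → 0. A quick numerical check of the integral against this closed form:

```python
import math

def v(y, t, lo=-30.0, hi=30.0, n=60001):
    # v(y,t) = (1/sqrt(4 pi t)) * integral of exp(-(x-y)^2/(4t)) cos(x) dx
    # computed with a plain trapezoid rule
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        w = 0.5 if i in (0, n - 1) else 1.0
        total += w * math.exp(-(x - y) ** 2 / (4.0 * t)) * math.cos(x)
    return total * h / math.sqrt(4.0 * math.pi * t)

for t in (1.0, 0.1, 0.01):
    print(v(0.5, t), math.exp(-t) * math.cos(0.5))  # the two columns agree
```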

Back to the Black-Scholes equation

    ∂V/∂t + (1/2)σ²S² ∂²V/∂S² + rS ∂V/∂S − rV = 0.

Note that µ, the expected return of S, does not appear explicitly in the equation. This is called risk neutrality. The particulars of the derivative are encoded in the boundary conditions. For a call C(S, t) one has

    ∂C/∂t + (1/2)σ²S² ∂²C/∂S² + rS ∂C/∂S − rC = 0,
    C(S, T) = max(S − K, 0),
    C(0, t) = 0 for all t,
    C(S, t) ∼ S as S → ∞.

And for a put P(S, t) one has

    ∂P/∂t + (1/2)σ²S² ∂²P/∂S² + rS ∂P/∂S − rP = 0,
    P(S, T) = max(K − S, 0),
    P(0, t) = Ke^(−r(T−t)) for all t,
    P(S, t) → 0 as S → ∞.

Black-Scholes Formula

The solution to the call PDE above is

    C(S, t) = S N(d+) − K e^(−r(T−t)) N(d−),

where

    d+ = (ln(S/K) + (r + σ²/2)(T − t)) / (σ√(T − t)),
    d− = (ln(S/K) + (r − σ²/2)(T − t)) / (σ√(T − t)),

and

    N(x) = (1/√(2π)) ∫_{−∞}^{x} e^(−s²/2) ds = (1/√(2π)) ∫_{−x}^{∞} e^(−s²/2) ds

is the cumulative normal distribution. If you are given data, including the market price for derivatives, and numerically solve the Black-Scholes formula for σ, the resulting value is called the implied volatility.

We derive the Black-Scholes formula from the Black-Scholes equation by using a change of variables to obtain the heat equation, whose solutions we have identified, at least in a formal way. There are three differences between the two equations to account for.

The first difference is that the heat equation has an initial condition at t = 0, while the Black-Scholes equation has a final condition at t = T. This can be compensated for by letting τ = T − t. Then, in terms of τ, the final condition at t = T is an initial condition at τ = 0.

The second difference is that the Black-Scholes equation has S² ∂²V/∂S², while the heat equation has ∂²u/∂x². If we let S = e^x, or equivalently x = ln S, then

    ∂/∂x = (∂S/∂x) ∂/∂S = e^x ∂/∂S = S ∂/∂S,
    ∂²/∂x² = S² ∂²/∂S² + S ∂/∂S,

so S² ∂²/∂S² becomes ∂²/∂x² up to an extra first-order term, which merely changes the coefficient of the first-order term handled next.
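As a numerical companion to the formula (a sketch of mine, not part of the notes; the function names are my own and only the Python standard library is used), the price, the implied volatility, and a risk-neutral Monte Carlo cross-check can all be written in a few lines:

```python
import math, random

def N(x):
    # cumulative normal distribution via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, r, sigma, tau):
    # C = S N(d+) - K e^{-r tau} N(d-),  with tau = T - t
    d_plus = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d_minus = d_plus - sigma * math.sqrt(tau)
    return S * N(d_plus) - K * math.exp(-r * tau) * N(d_minus)

def implied_vol(price, S, K, r, tau, lo=1e-6, hi=5.0, tol=1e-10):
    # bisection works because the call price is increasing in sigma
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if bs_call(S, K, r, mid, tau) < price else (lo, mid)
    return 0.5 * (lo + hi)

def mc_call(S0, K, r, sigma, T, n=100_000, seed=0):
    # risk-neutral Monte Carlo: S_T = S0 exp((r - sigma^2/2) T + sigma sqrt(T) Z)
    rng = random.Random(seed)
    payoff = 0.0
    for _ in range(n):
        sT = S0 * math.exp((r - 0.5 * sigma ** 2) * T
                           + sigma * math.sqrt(T) * rng.gauss(0.0, 1.0))
        payoff += max(sT - K, 0.0)
    return math.exp(-r * T) * payoff / n

C = bs_call(100, 100, 0.05, 0.2, 1.0)
print(C)                                     # about 10.45
print(implied_vol(C, 100, 100, 0.05, 1.0))   # recovers sigma = 0.2
print(mc_call(100, 100, 0.05, 0.2, 1.0))     # close to the closed form
```

Feeding the closed-form price back into the bisection recovers σ exactly, which is the implied-volatility computation described above.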

The third difference is that the Black-Scholes equation has ∂V/∂S and V terms, but the heat equation has no ∂u/∂x or u terms. An integrating factor can eliminate such terms, as follows. Suppose we are given

    ∂u/∂t = ∂²u/∂x² + a ∂u/∂x + b u.

Let

    u(x, t) = e^(αx+βt) v(x, t),

with α, β to be chosen later. Then

    ∂u/∂t = β e^(αx+βt) v + e^(αx+βt) ∂v/∂t,
    ∂u/∂x = α e^(αx+βt) v + e^(αx+βt) ∂v/∂x,
    ∂²u/∂x² = α² e^(αx+βt) v + 2α e^(αx+βt) ∂v/∂x + e^(αx+βt) ∂²v/∂x².

So

    ∂u/∂t = ∂²u/∂x² + a ∂u/∂x + b u

becomes

    e^(αx+βt) (βv + ∂v/∂t) = e^(αx+βt) (α²v + 2α ∂v/∂x + ∂²v/∂x²) + a e^(αx+βt) (αv + ∂v/∂x) + b e^(αx+βt) v,

or

    βv + ∂v/∂t = α²v + 2α ∂v/∂x + ∂²v/∂x² + a (αv + ∂v/∂x) + b v,

hence

    ∂v/∂t = ∂²v/∂x² + (a + 2α) ∂v/∂x + (α² + aα + b − β) v.

With a and b given, one can set a + 2α = 0 and α² + aα + b − β = 0 and solve for α and β. One gets α = −a/2 and β = b − α² = b − a²/4. In other words, if v(x, t) is a solution to

    ∂v/∂t = ∂²v/∂x²,

then

    u(x, t) = e^(−(a/2)x + (b − a²/4)t) v(x, t)

satisfies

    ∂u/∂t = ∂²u/∂x² + a ∂u/∂x + b u.

With these three differences accounted for, we will now solve the Black-Scholes equation for a European call. We are considering

    ∂C/∂t + (1/2)σ²S² ∂²C/∂S² + rS ∂C/∂S − rC = 0

with final condition C(S, T) = max(S − K, 0). First make the change of variables S = Ke^x, or equivalently x = ln(S/K) = ln S − ln K, and also let τ = (σ²/2)(T − t) and C(S, t) = K u(x, τ). Then:

1.  ∂C/∂t = (∂C/∂τ)(dτ/dt) = (K ∂u/∂τ) · (−σ²/2) = −(σ²/2) K ∂u/∂τ.

2.  ∂C/∂S = (∂C/∂x)(dx/dS) = (K ∂u/∂x) · (1/S) = (K/S) ∂u/∂x.

3.  ∂²C/∂S² = ∂/∂S ((K/S) ∂u/∂x) = −(K/S²) ∂u/∂x + (K/S) ∂/∂S (∂u/∂x)
            = −(K/S²) ∂u/∂x + (K/S) (∂²u/∂x²)(dx/dS) = −(K/S²) ∂u/∂x + (K/S²) ∂²u/∂x².

Substituting this into the Black-Scholes equation, we have

    −(σ²/2) K ∂u/∂τ + (σ²/2) S² (−(K/S²) ∂u/∂x + (K/S²) ∂²u/∂x²) + rS (K/S) ∂u/∂x − rKu = 0,

or

    (σ²/2) ∂u/∂τ = (σ²/2) ∂²u/∂x² − (σ²/2) ∂u/∂x + r ∂u/∂x − ru,

that is,

    ∂u/∂τ = ∂²u/∂x² + (2r/σ² − 1) ∂u/∂x − (2r/σ²) u.
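The integrating-factor computation can be sanity-checked numerically: take for v the heat kernel, form u = e^(αx+βt) v with α = −a/2 and β = b − a²/4, and verify by centered finite differences that u satisfies ∂u/∂t = ∂²u/∂x² + a ∂u/∂x + bu. This sketch is mine, not from the notes, with arbitrarily chosen a and b:

```python
import math

a, b = 1.0, -0.5
alpha = -a / 2.0
beta = b - a * a / 4.0

def v(x, t):
    # fundamental solution of the heat equation v_t = v_xx
    return math.exp(-x * x / (4.0 * t)) / math.sqrt(4.0 * math.pi * t)

def u(x, t):
    # integrating factor: u = e^{alpha x + beta t} v
    return math.exp(alpha * x + beta * t) * v(x, t)

# centered finite differences at an arbitrary interior point
x0, t0, h = 0.3, 0.7, 1e-4
u_t = (u(x0, t0 + h) - u(x0, t0 - h)) / (2.0 * h)
u_x = (u(x0 + h, t0) - u(x0 - h, t0)) / (2.0 * h)
u_xx = (u(x0 + h, t0) - 2.0 * u(x0, t0) + u(x0 - h, t0)) / (h * h)

residual = u_t - (u_xx + a * u_x + b * u(x0, t0))
print(residual)  # approximately 0, up to finite-difference error
```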

Set k = 2r/σ². The equation becomes

    ∂u/∂τ = ∂²u/∂x² + (k − 1) ∂u/∂x − k u.

And the final condition C(S, T) = max(S − K, 0) becomes the initial condition K u(x, 0) = max(Ke^x − K, 0), or equivalently, u(x, 0) = max(e^x − 1, 0).

As we saw, this reduces to the heat equation if we let u(x, τ) = e^(αx+βτ) v(x, τ) with α = −(1/2)(k − 1) and β = −k − (1/4)(k − 1)² = −(1/4)(k + 1)². And the initial condition u(x, 0) = max(e^x − 1, 0) becomes

    e^(−(1/2)(k−1)x) v(x, 0) = max(e^x − 1, 0),

or equivalently,

    v(x, 0) = max(e^x · e^((1/2)(k−1)x) − e^((1/2)(k−1)x), 0) = max(e^((1/2)(k+1)x) − e^((1/2)(k−1)x), 0).

Recall that the heat equation

    ∂v/∂τ = ∂²v/∂x²

is solved by

    v(x, τ) = (1/√(4πτ)) ∫_{−∞}^{∞} e^(−(y−x)²/4τ) φ(y) dy,    where φ(y) = v(y, 0).

In our situation, φ(y) = max(e^((1/2)(k+1)y) − e^((1/2)(k−1)y), 0). Now e^((1/2)(k+1)y) > e^((1/2)(k−1)y) if and only if e^((1/2)(k+1)y)/e^((1/2)(k−1)y) > 1, if and only if e^y > 1, if and only if y > 0. So we can write

    φ(y) = e^((1/2)(k+1)y) − e^((1/2)(k−1)y) if y > 0,    φ(y) = 0 if y ≤ 0.

Now make the change of variables s = (y − x)/√(2τ), so y = √(2τ)s + x, dy = √(2τ) ds, and ds = dy/√(2τ). Then, since φ(√(2τ)s + x) = 0 when √(2τ)s + x < 0, that is, when s < −x/√(2τ),

    v(x, τ) = (1/√(4πτ)) ∫_{−∞}^{∞} e^(−(x−(√(2τ)s+x))²/4τ) φ(√(2τ)s + x) √(2τ) ds
            = (√(2τ)/√(4πτ)) ∫_{−∞}^{∞} e^(−s²/2) φ(√(2τ)s + x) ds
            = (1/√(2π)) ∫_{−x/√(2τ)}^{∞} e^(−s²/2) (e^((1/2)(k+1)(√(2τ)s+x)) − e^((1/2)(k−1)(√(2τ)s+x))) ds
            = I+ − I−.

Here

    I± = (1/√(2π)) ∫_{−x/√(2τ)}^{∞} e^(−s²/2 + (1/2)(k±1)(√(2τ)s+x)) ds.

We calculate I+ by completing the square inside the exponent:

    I+ = (1/√(2π)) ∫_{−x/√(2τ)}^{∞} e^(−s²/2 + (1/2)(k+1)(√(2τ)s+x)) ds
       = e^((1/2)(k+1)x) · (1/√(2π)) ∫_{−x/√(2τ)}^{∞} e^(−s²/2 + (1/2)(k+1)√(2τ)s) ds
       = (e^((1/2)(k+1)x)/√(2π)) ∫_{−x/√(2τ)}^{∞} e^(−(1/2)(s − (1/2)(k+1)√(2τ))² + (1/4)(k+1)²τ) ds
       = (e^((1/2)(k+1)x + (1/4)(k+1)²τ)/√(2π)) ∫_{−x/√(2τ)}^{∞} e^(−(1/2)(s − (1/2)(k+1)√(2τ))²) ds
       = (e^((1/2)(k+1)x + (1/4)(k+1)²τ)/√(2π)) ∫_{−x/√(2τ) − (1/2)(k+1)√(2τ)}^{∞} e^(−t²/2) dt
       = (e^((1/2)(k+1)x + (1/4)(k+1)²τ)/√(2π)) ∫_{−∞}^{x/√(2τ) + (1/2)(k+1)√(2τ)} e^(−t²/2) dt
       = e^((1/2)(k+1)x + (1/4)(k+1)²τ) · N(d+) = e^(x − αx − βτ) N(d+),

using α = −(1/2)(k − 1) and β = −(1/4)(k + 1)², and where

    d+ = x/√(2τ) + (1/2)(k + 1)√(2τ).

The second-to-last line was arrived at since

    ∫_{a}^{∞} e^(−t²/2) dt = ∫_{−∞}^{−a} e^(−t²/2) dt.

Now S = Ke^x and 2τ = σ²(T − t), so

    d+ = x/√(2τ) + (1/2)(k + 1)√(2τ)
       = ln(S/K)/(σ√(T − t)) + (1/2)(2r/σ² + 1) σ√(T − t)
       = (ln(S/K) + (r + σ²/2)(T − t)) / (σ√(T − t)).

The integral I− is treated in the same manner and gives

    I− = e^((1/2)(k−1)x + (1/4)(k−1)²τ) N(d−) = e^(−αx − βτ − kτ) N(d−),

where

    d− = x/√(2τ) + (1/2)(k − 1)√(2τ) = (ln(S/K) + (r − σ²/2)(T − t)) / (σ√(T − t)).

So, with v(x, τ) = I+ − I− established, we substitute everything back in, using x = ln(S/K), τ = (σ²/2)(T − t), and k = 2r/σ², so that kτ = r(T − t):

    C(S, t) = K u(x, τ)
            = K e^(αx+βτ) v(x, τ)
            = K e^(αx+βτ) (I+ − I−)
            = K e^(αx+βτ) e^(x−αx−βτ) N(d+) − K e^(αx+βτ) e^(−αx−βτ−kτ) N(d−)
            = K e^x N(d+) − K e^(−kτ) N(d−)
            = S N(d+) − K e^(−r(T−t)) N(d−).

4.4 Exercises

1. Draw some graphs of the distribution of a Poisson process Xt for various values of t (and λ). What happens as t increases?

2. Let Tm be the continuous random variable that gives the time to the mth jump of a Poisson process with parameter λ. Find the pdf of Tm.

3. For a Poisson process with parameter λ, find P(Xs = m | Xt = n), both for the case 0 ≤ n ≤ m and 0 ≤ t ≤ s, and the case 0 ≤ m ≤ n and 0 ≤ s ≤ t.

4. Since for a Poisson process Pjk(t) = P0,k−j(t) for all positive t, show that if X0 = j, then the random variable Xt − j has a Poisson distribution with parameter λt.

5. For a Poisson process, use the Markov assumption to prove that Xt − Xs has a Poisson distribution with parameter λ(t − s).

6. Memorize the properties of the Wiener process, then close your notes and books and write them down.

7. Verify that the expected value and variance of Wt∆ are as claimed in the text. Calculate E(e^(λWt)).

8. Verify that the function p(x, t) obtained at the end of the section satisfies the Fokker-Planck equation.

9. Suppose a particle is moving along the real line following a Wiener process. At time 10 you observe that the position of the particle is 3. What is the probability that at time 12 the particle will be found in the interval [5, 7]? (Write down the expression and then use a computer, or a table if you must, to estimate this value.)

10. Let Wt be the Wiener process. What is E(Ws² Wt)? You must break up the cases s < t and s > t.

11. On a computer (or a calculator), find the smallest value of n = 1, 2, 3, · · · such that P(|W1| ≤ n) is calculated as 1 to the precision of your machine.

Recall how, for a process φt adapted to Xt, the Ito integral

    ∫_{t0}^{t} φs dXs

is defined. First, for a partition P = {t0 < t1 < t2 < · · · < tn = t}, we define the Ito sum

    I(φs, P, dXs) = Σ_{j=1}^{n} φ_{tj−1} ∆Xtj,    where ∆Xtj = Xtj − Xtj−1.

Then one forms the limit as the mesh size of the partitions goes to zero:

    ∫_{t0}^{t} φs dXs = lim_{mesh P→0} I(φs, P, dXs).

The following three problems provide justification for the Ito calculus rule (dWs)² = ds.

12. Show
(a) E((∆Wti)²) = ∆ti,
(b) E(((∆Wti)² − ∆ti)²) = 2(∆ti)².

13. For a partition P as above, calculate the mean square difference, which is by definition

    E((I(φs, P, (dWs)²) − I(φs, P, dt))²) = E((Σ_{j=1}^{n} φ_{tj−1} (∆Wtj)² − Σ_{j=1}^{n} φ_{tj−1} ∆tj)²),

and show that it equals

    2 Σ_{j=1}^{n} E(φ_{tj−1}²)(∆tj)².

(This problem is very similar to the calculation of the Ito integral ∫ Wt dWt.)

14. Show that for an adapted process φs such that E(φs²) ≤ M for some M > 0 and for all s, the limit as the mesh P → 0 of this expression is 0, and thus that for any such adapted φs,

    ∫_{t0}^{t} φs (dWs)² = ∫_{t0}^{t} φs ds.

15. Calculate

    ∫_{0}^{t} Wsⁿ dWs.

16. (a) Solve the stock model SDE dSt = µSt dt + σSt dWt.
(b) Find the pdf of St.
(c) If µ = .1, σ = .25, and S0 = 100, what is P(S1 ≥ 110)?

17. What is d(sin(Wt))?

18. Solve the Vasicek model for interest rates,

    drt = a(b − rt) dt + σ dWt,

subject to the initial condition r0 = c. (Hint: You will need to use a multiplier similar to the one for the Ornstein-Uhlenbeck process.)

19. This problem is long and has many parts; if you cannot do one part, go on to the next. In this problem, you will show how to synthesize any derivative (whose payoff depends only on the final value of the stock) in terms of calls with different strikes. This allows us to express the value of any derivative in terms of the Black-Scholes formula.

(a) Consider the following functions. Fix a number h > 0 and define

    δh(x) = 0 if |x| ≥ h,
    δh(x) = (x + h)/h² if x ∈ [−h, 0],    (4.4.1)
    δh(x) = (h − x)/h² if x ∈ [0, h].

Show that the collection of functions δh(x) for h > 0 forms a δ-function. Recall that this implies that for any continuous function f,

    f(y) = lim_{h→0} ∫_{−∞}^{∞} f(x) δh(x − y) dx.

(b) Consider a non-dividend-paying stock S. Suppose European calls on S with expiration T are available with arbitrary strike prices K. Let C(S, t, T, K) denote the value at time t of the call with strike K; thus, for example, C(S, T, T, K) = max(S − K, 0). Show

    δh(S − K) = (1/h²) (C(S, T, T, K + h) − 2C(S, T, T, K) + C(S, T, T, K − h)).

(c) Let h(S) denote the payoff function of some derivative. Then we can synthesize the derivative by

    h(S) = lim_{h→0} ∫_{0}^{∞} h(K) δh(S − K) dK.

If V(S, t) denotes the value at time t of the derivative with payoff function h when the stock price is S (at time t), then why is

    V(S, t) = lim_{h→0} ∫_{0}^{∞} h(K) (1/h²) (C(S, t, T, K + h) − 2C(S, t, T, K) + C(S, t, T, K − h)) dK ?

(d) Using Taylor's theorem, show that

    lim_{h→0} (1/h²) (C(S, t, T, K + h) − 2C(S, t, T, K) + C(S, t, T, K − h)) = ∂²C(S, t, T, K)/∂K².

(e) Show that V(S, t) is equal to

    h(K) ∂C(S, t, T, K)/∂K |_{K=0}^{∞} − h′(K) C(S, t, T, K) |_{K=0}^{∞} + ∫_{0}^{∞} h″(K) C(S, t, T, K) dK.

(Hint: Integrate by parts.)
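Exercise 19 can be explored numerically (my own sketch, not part of the notes): integrate a payoff against the second strike-difference of Black-Scholes call prices and compare with an independently known value. Here the payoff of a European put with strike 100 is synthesized from calls and compared with the put price given by put-call parity.

```python
import math

def N(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, r, sigma, tau):
    d_plus = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    return S * N(d_plus) - K * math.exp(-r * tau) * N(d_plus - sigma * math.sqrt(tau))

S, r, sigma, tau = 100.0, 0.05, 0.2, 1.0
K0 = 100.0
payoff = lambda s: max(K0 - s, 0.0)  # synthesize a put with strike K0 from calls

h = 0.05
V = 0.0
K = 1.0            # strikes near 0 carry negligible weight here
while K < 150.0:
    # second difference in strike approximates  (d^2 C / dK^2) * h^2
    d2C = (bs_call(S, K + h, r, sigma, tau)
           - 2.0 * bs_call(S, K, r, sigma, tau)
           + bs_call(S, K - h, r, sigma, tau))
    V += payoff(K) * d2C / h         # h(K) * (d2C / h^2) * dK, with dK = h
    K += h

put = bs_call(S, K0, r, sigma, tau) - S + K0 * math.exp(-r * tau)  # put-call parity
print(V, put)   # the two values agree closely
```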
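The Ito-calculus exercises 12 through 15 above can likewise be explored by simulation. The sketch below (mine, not from the notes) forms the Ito sum for ∫ Ws dWs over a fine partition of [0, 1] and compares it with the closed form (W1² − 1)/2, and checks that Σ (∆W)² concentrates near t = 1, illustrating the rule (dWs)² = ds.

```python
import math, random

rng = random.Random(1)
T, n = 1.0, 100_000
dt = T / n

W = 0.0         # running value of the Wiener path
ito_sum = 0.0   # sum of W_{t_{j-1}} * dW  ->  integral W dW = (W_T^2 - T)/2
quad_var = 0.0  # sum of (dW)^2            ->  T, illustrating (dW)^2 = dt
for _ in range(n):
    dW = rng.gauss(0.0, math.sqrt(dt))
    ito_sum += W * dW
    quad_var += dW * dW
    W += dW

print(ito_sum, (W * W - T) / 2.0)  # the two agree for a fine partition
print(quad_var)                    # close to T = 1
```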
