The Basics of Financial Mathematics

Spring 2003
Richard F. Bass
Department of Mathematics
University of Connecticut
These notes are © 2003 by Richard Bass. They may be used for personal use or
class use, but not for commercial purposes. If you find any errors, I would appreciate
hearing from you: bass@math.uconn.edu
1. Introduction.
In this course we will study mathematical finance. Mathematical finance is not
about predicting the price of a stock. What it is about is figuring out the price of options
and derivatives.
The most familiar type of option is the option to buy a stock at a given price at
a given time. For example, suppose Microsoft is currently selling at $40 per share.
A European call option is something I can buy that gives me the right to buy a share of
Microsoft at a fixed price at some future date. To make up an example, suppose I have an option that
allows me to buy a share of Microsoft for $50 in three months' time, but does not compel
me to do so. If Microsoft happens to be selling at $45 in three months' time, the option is
worthless. I would be silly to buy a share for $50 when I could call my broker and buy it
for $45. So I would choose not to exercise the option. On the other hand, if Microsoft is
selling for $60 three months from now, the option would be quite valuable. I could exercise
the option and buy a share for $50. I could then turn around and sell the share on the
open market for $60 and make a profit of $10 per share. Therefore this stock option I
possess has some value. There is some chance it is worthless and some chance that it will
lead me to a profit. The basic question is: how much is the option worth today?
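The two scenarios above can be summarized by the value of the option at the exercise date. The following is a minimal sketch, not a pricing method: it only computes what the option is worth in three months for a given stock price, and the function name `call_payoff` is an illustrative choice, not notation from these notes.

```python
def call_payoff(stock_price, strike):
    """Value of a European call at the exercise date.

    The holder exercises only when the stock is worth more than the strike,
    so the payoff is the larger of (stock_price - strike) and 0.
    """
    return max(stock_price - strike, 0)

# The two scenarios from the text, with a strike of 50 dollars.
payoff_if_45 = call_payoff(45, 50)   # not exercised
payoff_if_60 = call_payoff(60, 50)   # exercised, then sold on the open market
```

What the option is worth today, before the future stock price is known, is the question the rest of the notes address.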
The huge impetus in financial derivatives was the seminal paper of Black and Scholes
in 1973. Although many researchers had studied this question, Black and Scholes gave a
definitive answer, and a great deal of research has been done since. These are not just
academic questions; today the market in financial derivatives is larger than the market
in stock securities. In other words, more money is invested in options on stocks than in
stocks themselves.
Options have been around for a long time. The earliest ones were used by
manufacturers and food producers to hedge their risk. A farmer might agree to sell a bushel of
wheat at a fixed price six months from now rather than take a chance on the vagaries of
market prices. Similarly a steel refinery might want to lock in the price of iron ore at a
fixed price.
The sections of these notes can be grouped into five categories. The first is elementary probability. Although someone who has had a course in undergraduate probability will be familiar with some of this, we will talk about a number of topics that are not usually covered in such a course: σ-fields, conditional expectations, martingales. The second category is the binomial asset pricing model. This is just about the simplest model of a stock that one can imagine, and this will provide a case where we can see most of the major ideas of mathematical finance, but in a very simple setting. Third, we will turn to advanced probability, that is, ideas such as Brownian motion, stochastic integrals, stochastic differential equations, and the Girsanov transformation. Although to do this rigorously requires measure theory, we can still learn enough to understand and work with these concepts. Fourth, we return to finance and work with the continuous model. We will derive the Black-Scholes formula, see the Fundamental Theorem of Asset Pricing, work with equivalent martingale measures, and the like. The fifth main category is term structure models, which means models of interest rate behavior.
I found some unpublished notes of Steve Shreve extremely useful in preparing these
notes. I hope that he has turned them into a book and that this book is now available.
The stochastic calculus part of these notes is from my own book: Probabilistic Techniques
in Analysis, Springer, New York, 1995.
I would also like to thank Evarist Giné, who pointed out a number of errors.
2. Review of elementary probability.
Let’s begin by recalling some of the definitions and basic concepts of elementary
probability. We will only work with discrete models at first.
We start with an arbitrary set, called the probability space, which we will denote by $\Omega$, the capital Greek letter “omega.” We are given a class $\mathcal{F}$ of subsets of $\Omega$. These are called events. We require $\mathcal{F}$ to be a σ-field.
Definition 2.1. A collection $\mathcal{F}$ of subsets of $\Omega$ is called a σ-field if
(1) $\emptyset \in \mathcal{F}$,
(2) $\Omega \in \mathcal{F}$,
(3) $A \in \mathcal{F}$ implies $A^c \in \mathcal{F}$, and
(4) $A_1, A_2, \ldots \in \mathcal{F}$ implies both $\bigcup_{i=1}^\infty A_i \in \mathcal{F}$ and $\bigcap_{i=1}^\infty A_i \in \mathcal{F}$.
Here $A^c = \{\omega \in \Omega : \omega \notin A\}$ denotes the complement of $A$. $\emptyset$ denotes the empty set, that is, the set with no elements. We will use without special comment the usual notations of $\cup$ (union), $\cap$ (intersection), $\subset$ (contained in), $\in$ (is an element of).
Typically, in an elementary probability course, $\mathcal{F}$ will consist of all subsets of $\Omega$, but we will later need to distinguish between various σ-fields. Here is an example. Suppose one tosses a coin two times and lets $\Omega$ denote all possible outcomes. So $\Omega = \{HH, HT, TH, TT\}$. A typical σ-field $\mathcal{F}$ would be the collection of all subsets of $\Omega$. In this case it is trivial to show that $\mathcal{F}$ is a σ-field, since every subset is in $\mathcal{F}$. But if we let $\mathcal{G} = \{\emptyset, \Omega, \{HH, HT\}, \{TH, TT\}\}$, then $\mathcal{G}$ is also a σ-field. One has to check the definition, but to illustrate, the event $\{HH, HT\}$ is in $\mathcal{G}$, so we require the complement of that set to be in $\mathcal{G}$ as well. But the complement is $\{TH, TT\}$ and that event is indeed in $\mathcal{G}$.
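Checking Definition 2.1 by hand is mechanical on a finite set, so it can be sketched in code. The snippet below is an illustration under the assumption that $\Omega$ is finite, in which case closure under pairwise unions and intersections is enough; the names `is_sigma_field`, `F_ALL`, `G_FIRST`, and `NOT_A_FIELD` are hypothetical.

```python
from itertools import combinations

OMEGA = frozenset({"HH", "HT", "TH", "TT"})

def is_sigma_field(collection, omega=OMEGA):
    """Check Definition 2.1 for a finite collection of subsets of a finite omega."""
    sets = {frozenset(s) for s in collection}
    # (1) and (2): the empty set and the whole space must be present.
    if frozenset() not in sets or frozenset(omega) not in sets:
        return False
    # (3): closed under complements.
    if any(frozenset(omega) - a not in sets for a in sets):
        return False
    # (4): on a finite collection, pairwise unions and intersections suffice.
    return all((a | b) in sets and (a & b) in sets
               for a, b in combinations(sets, 2))

def power_set(omega=OMEGA):
    items = sorted(omega)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

F_ALL = power_set()                                    # all 16 subsets
G_FIRST = [set(), OMEGA, {"HH", "HT"}, {"TH", "TT"}]   # first-toss events
NOT_A_FIELD = [set(), OMEGA, {"HH"}]                   # complement of {HH} is missing
```

Here `is_sigma_field(F_ALL)` and `is_sigma_field(G_FIRST)` both hold, while `NOT_A_FIELD` fails the complement test.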
One point of view which we will explore much more fully later on is that the σ-field tells you what events you “know.” In this example, $\mathcal{F}$ is the σ-field where you “know” everything, while $\mathcal{G}$ is the σ-field where you “know” only the result of the first toss but not the second. We won’t try to be precise here, but to try to add to the intuition, suppose one knows whether an event in $\mathcal{F}$ has happened or not for a particular outcome. We would then know which of the events $\{HH\}$, $\{HT\}$, $\{TH\}$, or $\{TT\}$ has happened and so would know what the two tosses of the coin showed. On the other hand, if we know which events in $\mathcal{G}$ happened, we would only know whether the event $\{HH, HT\}$ happened, which means we would know that the first toss was a heads, or we would know whether the event $\{TH, TT\}$ happened, in which case we would know that the first toss was a tails. But there is no way to tell what happened on the second toss from knowing which events in $\mathcal{G}$ happened. Much more on this later.
The third basic ingredient is a probability.
Definition 2.2. A function $P$ on $\mathcal{F}$ is a probability if it satisfies
(1) if $A \in \mathcal{F}$, then $0 \le P(A) \le 1$,
(2) $P(\Omega) = 1$,
(3) $P(\emptyset) = 0$, and
(4) if $A_1, A_2, \ldots \in \mathcal{F}$ are pairwise disjoint, then $P(\bigcup_{i=1}^\infty A_i) = \sum_{i=1}^\infty P(A_i)$.
A collection of sets $A_i$ is pairwise disjoint if $A_i \cap A_j = \emptyset$ unless $i = j$.
There are a number of conclusions one can draw from this definition. As one example, if $A \subset B$, then $P(A) \le P(B)$ and $P(A^c) = 1 - P(A)$. See Note 1 at the end of this section for a proof.
Someone who has had measure theory will realize that a σ-field is the same thing
as a σ-algebra and a probability is a measure of total mass one.
A random variable (abbreviated r.v.) is a function $X$ from $\Omega$ to $\mathbb{R}$, the reals. To be more precise, to be a r.v. $X$ must also be measurable, which means that $\{\omega : X(\omega) \ge a\} \in \mathcal{F}$ for all reals $a$.
The notion of measurability has a simple definition but is a bit subtle. If we take the point of view that we know all the events in $\mathcal{G}$, then if $Y$ is $\mathcal{G}$-measurable, then we know $Y$. Phrased another way, suppose we know whether or not the event has occurred for each event in $\mathcal{G}$. Then if $Y$ is $\mathcal{G}$-measurable, we can compute the value of $Y$.
Here is an example. In the example above where we tossed a coin two times, let $X$ be the number of heads in the two tosses. Then $X$ is $\mathcal{F}$ measurable but not $\mathcal{G}$ measurable. To see this, let us consider $A_a = \{\omega \in \Omega : X(\omega) \ge a\}$. This event will equal
$$A_a = \begin{cases} \Omega & \text{if } a \le 0; \\ \{HH, HT, TH\} & \text{if } 0 < a \le 1; \\ \{HH\} & \text{if } 1 < a \le 2; \\ \emptyset & \text{if } 2 < a. \end{cases}$$
For example, if $a = \frac32$, then the event where the number of heads is $\frac32$ or greater is the event where we had two heads, namely, $\{HH\}$. Now observe that for each $a$ the event $A_a$ is in $\mathcal{F}$ because $\mathcal{F}$ contains all subsets of $\Omega$. Therefore $X$ is measurable with respect to $\mathcal{F}$. However it is not true that $A_a$ is in $\mathcal{G}$ for every value of $a$. Take $a = \frac32$ as just one example: the subset $\{HH\}$ is not in $\mathcal{G}$. So $X$ is not measurable with respect to the σ-field $\mathcal{G}$.
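The same check can be sketched in code for the two-toss example. This is an illustration only: it assumes a finite sample space, where it is enough to test the thresholds $a$ equal to the values $X$ takes plus one value above the maximum, and the names `is_measurable`, `F_ALL`, and `G_FIRST` are hypothetical.

```python
from itertools import combinations

OMEGA = ["HH", "HT", "TH", "TT"]

def is_measurable(X, sigma_field, omega=OMEGA):
    """True if {w : X(w) >= a} belongs to sigma_field for every real a.

    For a between two consecutive values of X the event does not change,
    so only finitely many thresholds need to be examined.
    """
    events = {frozenset(s) for s in sigma_field}
    thresholds = sorted(set(X.values())) + [max(X.values()) + 1]
    return all(frozenset(w for w in omega if X[w] >= a) in events
               for a in thresholds)

# All subsets of OMEGA, and the sigma-field that only sees the first toss.
F_ALL = [frozenset(c) for r in range(5) for c in combinations(OMEGA, r)]
G_FIRST = [set(), set(OMEGA), {"HH", "HT"}, {"TH", "TT"}]

num_heads = {w: w.count("H") for w in OMEGA}            # the X of the text
first_is_heads = {w: 1 if w[0] == "H" else 0 for w in OMEGA}
```

The number of heads passes the test for all subsets but fails for the first-toss σ-field, while the indicator of a head on the first toss passes for both.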
A discrete r.v. is one where $P(\omega : X(\omega) = a) = 0$ for all but countably many $a$’s, say, $a_1, a_2, \ldots$, and $\sum_i P(\omega : X(\omega) = a_i) = 1$. In defining sets one usually omits the $\omega$; thus $(X = x)$ means the same as $\{\omega : X(\omega) = x\}$.
In the discrete case, to check measurability with respect to a σ-field $\mathcal{F}$, it is enough that $(X = a) \in \mathcal{F}$ for all reals $a$. The reason for this is that if $x_1, x_2, \ldots$ are the values of $x$ for which $P(X = x) \ne 0$, then we can write $(X \ge a) = \bigcup_{x_i \ge a} (X = x_i)$ and we have a countable union. So if $(X = x_i) \in \mathcal{F}$, then $(X \ge a) \in \mathcal{F}$.
Given a discrete r.v. $X$, the expectation or mean is defined by
$$EX = \sum_x x P(X = x)$$
provided the sum converges. If $X$ only takes finitely many values, then this is a finite sum and of course it will converge. This is the situation that we will consider for quite some time. However, if $X$ can take an infinite number of values (but countable), convergence needs to be checked. For example, if $P(X = 2^n) = 2^{-n}$ for $n = 1, 2, \ldots$, then $EX = \sum_{n=1}^\infty 2^n 2^{-n} = \infty$.
There is an alternate definition of expectation which is equivalent in the discrete setting. Set
$$EX = \sum_{\omega \in \Omega} X(\omega) P(\{\omega\}).$$
To see that this is the same, look at Note 2 at the end of the section. The advantage of the second definition is that some properties of expectation, such as $E(X + Y) = EX + EY$, are immediate, while with the first definition they require quite a bit of proof.
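To see the two definitions side by side, here is a small sketch on the two-toss sample space. It assumes a fair coin tossed independently (each outcome has probability 1/4), which is an assumption of the example rather than of the definitions; the function names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction

OMEGA = ["HH", "HT", "TH", "TT"]
P = {w: Fraction(1, 4) for w in OMEGA}      # assumed: fair, independent tosses

def expectation_by_values(Z):
    """First definition: sum over values x of x * P(Z = x)."""
    dist = defaultdict(Fraction)            # P(Z = x), built outcome by outcome
    for w in OMEGA:
        dist[Z[w]] += P[w]
    return sum(x * p for x, p in dist.items())

def expectation_by_outcomes(Z):
    """Second definition: sum over outcomes w of Z(w) * P({w})."""
    return sum(Z[w] * P[w] for w in OMEGA)

X = {w: w.count("H") for w in OMEGA}                 # number of heads
Y = {w: 1 if w[0] == "H" else 0 for w in OMEGA}      # head on the first toss
X_plus_Y = {w: X[w] + Y[w] for w in OMEGA}
```

With the second definition, linearity is just a regrouping of a single sum over outcomes, which is why `expectation_by_outcomes(X_plus_Y)` equals the sum of the two separate expectations.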
We say two events $A$ and $B$ are independent if $P(A \cap B) = P(A)P(B)$. Two random variables $X$ and $Y$ are independent if $P(X \in A, Y \in B) = P(X \in A)P(Y \in B)$ for all $A$ and $B$ that are subsets of the reals. The comma in the expression $P(X \in A, Y \in B)$ means “and.” Thus
$$P(X \in A, Y \in B) = P((X \in A) \cap (Y \in B)).$$
The extension of the definition of independence to the case of more than two events or random variables is not surprising: $A_1, \ldots, A_n$ are independent if
$$P(A_{i_1} \cap \cdots \cap A_{i_j}) = P(A_{i_1}) \cdots P(A_{i_j})$$
whenever $\{i_1, \ldots, i_j\}$ is a subset of $\{1, \ldots, n\}$.
A common misconception is that an event is independent of itself. If $A$ is an event that is independent of itself, then
$$P(A) = P(A \cap A) = P(A)P(A) = (P(A))^2.$$
The only finite solutions to the equation $x = x^2$ are $x = 0$ and $x = 1$, so an event is independent of itself only if it has probability 0 or 1.
Two σ-fields $\mathcal{F}$ and $\mathcal{G}$ are independent if $A$ and $B$ are independent whenever $A \in \mathcal{F}$ and $B \in \mathcal{G}$. A r.v. $X$ and a σ-field $\mathcal{G}$ are independent if $P((X \in A) \cap B) = P(X \in A)P(B)$ whenever $A$ is a subset of the reals and $B \in \mathcal{G}$.
As an example, suppose we toss a coin two times and we define the σ-fields $\mathcal{G}_1 = \{\emptyset, \Omega, \{HH, HT\}, \{TH, TT\}\}$ and $\mathcal{G}_2 = \{\emptyset, \Omega, \{HH, TH\}, \{HT, TT\}\}$. Then $\mathcal{G}_1$ and $\mathcal{G}_2$ are independent if $P(HH) = P(HT) = P(TH) = P(TT) = \frac14$. (Here we are writing $P(HH)$ when a more accurate way would be to write $P(\{HH\})$.) An easy way to understand this is that if we look at an event in $\mathcal{G}_1$ that is not $\emptyset$ or $\Omega$, then that is the event that the first toss is a heads or it is the event that the first toss is a tails. Similarly, a set other than $\emptyset$ or $\Omega$ in $\mathcal{G}_2$ will be the event that the second toss is a heads or that the second toss is a tails.
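The independence of the first-toss and second-toss σ-fields can be checked by brute force over all pairs of events. The sketch below does this for the fair coin and, for contrast, for a made-up "sticky" coin whose second toss always repeats the first; that second probability and all names in the code are assumptions introduced only for this illustration.

```python
from fractions import Fraction

OMEGA = frozenset({"HH", "HT", "TH", "TT"})
G1 = [frozenset(), OMEGA, frozenset({"HH", "HT"}), frozenset({"TH", "TT"})]
G2 = [frozenset(), OMEGA, frozenset({"HH", "TH"}), frozenset({"HT", "TT"})]

def prob(event, weights):
    return sum((weights[w] for w in event), Fraction(0))

def independent(field_a, field_b, weights):
    """True if P(A and B) = P(A) P(B) for every A in field_a and B in field_b."""
    return all(prob(a & b, weights) == prob(a, weights) * prob(b, weights)
               for a in field_a for b in field_b)

FAIR = {w: Fraction(1, 4) for w in OMEGA}
# Hypothetical dependent coin: the second toss always repeats the first.
STICKY = {"HH": Fraction(1, 2), "HT": Fraction(0),
          "TH": Fraction(0), "TT": Fraction(1, 2)}
```

Under `FAIR` every pair of events factors; under `STICKY` the pair $\{HH, HT\}$, $\{HH, TH\}$ has intersection probability $\frac12$ but product $\frac14$, so independence fails.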
If two r.v.s X and Y are independent, we have the multiplication theorem, which
says that E(XY ) = (EX)(EY ) provided all the expectations are finite. See Note 3 for a
proof.
Suppose $X_1, \ldots, X_n$ are $n$ independent r.v.s, such that for each one $P(X_i = 1) = p$, $P(X_i = 0) = 1 - p$, where $p \in [0, 1]$. The random variable $S_n = \sum_{i=1}^n X_i$ is called a binomial r.v., and represents, for example, the number of successes in $n$ trials, where the probability of a success is $p$. An important result in probability is that
$$P(S_n = k) = \frac{n!}{k!(n-k)!} p^k (1-p)^{n-k}.$$
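As a sanity check on the binomial formula, one can compare it with a direct enumeration of all $2^n$ sequences of zeros and ones. The following is a sketch for small $n$ only (the enumeration grows exponentially), using exact fractions; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import comb

def binomial_pmf(n, k, p):
    """P(S_n = k) from the closed-form expression."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def pmf_by_enumeration(n, k, p):
    """P(S_n = k) by summing the probabilities of all 0/1 sequences with k ones."""
    total = Fraction(0)
    for outcome in product([0, 1], repeat=n):
        if sum(outcome) == k:
            weight = Fraction(1)
            for x in outcome:
                weight *= p if x == 1 else 1 - p   # independence: multiply
            total += weight
    return total

p = Fraction(1, 3)
n = 5
formula = [binomial_pmf(n, k, p) for k in range(n + 1)]
enumerated = [pmf_by_enumeration(n, k, p) for k in range(n + 1)]
```

The two lists agree term by term, and the probabilities sum to 1.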
The variance of a random variable is
$$\operatorname{Var} X = E[(X - EX)^2].$$
This is also equal to
$$E[X^2] - (EX)^2.$$
It is an easy consequence of the multiplication theorem that if $X$ and $Y$ are independent,
$$\operatorname{Var}(X + Y) = \operatorname{Var} X + \operatorname{Var} Y.$$
The expression $E[X^2]$ is sometimes called the second moment of $X$.
We close this section with a definition of conditional probability. The probability of $A$ given $B$, written $P(A \mid B)$, is defined by
$$\frac{P(A \cap B)}{P(B)},$$
provided $P(B) \ne 0$. The conditional expectation of $X$ given $B$ is defined to be
$$\frac{E[X; B]}{P(B)},$$
provided $P(B) \ne 0$. The notation $E[X; B]$ means $E[X 1_B]$, where $1_B(\omega)$ is 1 if $\omega \in B$ and 0 otherwise. Another way of writing $E[X; B]$ is
$$E[X; B] = \sum_{\omega \in B} X(\omega) P(\{\omega\}).$$
(We will use the notation $E[X; B]$ frequently.)
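These definitions translate directly into code on a finite sample space. The sketch below uses the two-toss example with a fair coin (an assumption of the example) and conditions on the event that the first toss is a head; the function names are hypothetical.

```python
from fractions import Fraction

OMEGA = ["HH", "HT", "TH", "TT"]
P = {w: Fraction(1, 4) for w in OMEGA}       # assumed: fair, independent tosses

def prob(event):
    return sum((P[w] for w in event), Fraction(0))

def expect_on(X, B):
    """E[X; B]: the sum over w in B of X(w) P({w})."""
    return sum((X[w] * P[w] for w in B), Fraction(0))

def cond_prob(A, B):
    """P(A | B); raises ZeroDivisionError when P(B) = 0, where it is undefined."""
    return prob(set(A) & set(B)) / prob(B)

def cond_exp_given_event(X, B):
    """Conditional expectation of X given the event B."""
    return expect_on(X, B) / prob(B)

heads = {w: w.count("H") for w in OMEGA}
first_is_heads = {"HH", "HT"}
```

Given a head on the first toss, the probability of two heads is $\frac12$ and the expected number of heads is $\frac32$.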
Note 1. Suppose we have two disjoint sets $C$ and $D$. Let $A_1 = C$, $A_2 = D$, and $A_i = \emptyset$ for $i \ge 3$. Then the $A_i$ are pairwise disjoint and
$$P(C \cup D) = P\Big(\bigcup_{i=1}^\infty A_i\Big) = \sum_{i=1}^\infty P(A_i) = P(C) + P(D) \tag{2.1}$$
by Definition 2.2(3) and (4). Therefore Definition 2.2(4) holds when there are only two sets instead of infinitely many, and a similar argument shows the same is true when there are an arbitrary (but finite) number of sets.
Now suppose $A \subset B$. Let $C = A$ and $D = B - A$, where $B - A$ is defined to be $B \cap A^c$ (this is frequently written $B \setminus A$ as well). Then $C$ and $D$ are disjoint, and by (2.1)
$$P(B) = P(C \cup D) = P(C) + P(D) \ge P(C) = P(A).$$
The other equality we mentioned is proved by letting $C = A$ and $D = A^c$. Then $C$ and $D$ are disjoint, and
$$1 = P(\Omega) = P(C \cup D) = P(C) + P(D) = P(A) + P(A^c).$$
Solving for $P(A^c)$, we have
$$P(A^c) = 1 - P(A).$$
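Note 2. Let us show the two definitions of expectation are the same (in the discrete case). Starting with the first definition we have
$$\begin{aligned} EX &= \sum_x x P(X = x) \\ &= \sum_x x \sum_{\{\omega \in \Omega : X(\omega) = x\}} P(\{\omega\}) \\ &= \sum_x \sum_{\{\omega \in \Omega : X(\omega) = x\}} X(\omega) P(\{\omega\}) \\ &= \sum_{\omega \in \Omega} X(\omega) P(\{\omega\}), \end{aligned}$$
and we end up with the second definition.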
Note 3. Suppose $X$ can take the values $x_1, x_2, \ldots$ and $Y$ can take the values $y_1, y_2, \ldots$. Let $A_i = \{\omega : X(\omega) = x_i\}$ and $B_j = \{\omega : Y(\omega) = y_j\}$. Then
$$X = \sum_i x_i 1_{A_i}, \qquad Y = \sum_j y_j 1_{B_j},$$
and so
$$XY = \sum_i \sum_j x_i y_j 1_{A_i} 1_{B_j}.$$
Since $1_{A_i} 1_{B_j} = 1_{A_i \cap B_j}$, it follows that
$$E[XY] = \sum_i \sum_j x_i y_j P(A_i \cap B_j),$$
assuming the double sum converges. Since $X$ and $Y$ are independent, $A_i = (X = x_i)$ is independent of $B_j = (Y = y_j)$ and so
$$\begin{aligned} E[XY] &= \sum_i \sum_j x_i y_j P(A_i) P(B_j) \\ &= \sum_i x_i P(A_i) \Big( \sum_j y_j P(B_j) \Big) \\ &= \sum_i x_i P(A_i) \, EY \\ &= (EX)(EY). \end{aligned}$$
3. Conditional expectation.
Suppose we have 200 men and 100 women, 70 of the men are smokers, and 50 of
the women are smokers. If a person is chosen at random, then the conditional probability
that the person is a smoker given that it is a man is 70 divided by 200, or 35%, while the
conditional probability that the person is a smoker given that it is a woman is 50 divided by
100, or 50%. We will want to be able to encompass both facts in a single entity.
The way to do that is to make conditional probability a random variable rather than a number. To reiterate, we will make conditional probabilities random. Let $M$, $W$ be man, woman, respectively, and $S$, $S^c$ smoker and nonsmoker, respectively. We have
$$P(S \mid M) = .35, \qquad P(S \mid W) = .50.$$
We introduce the random variable
$$(.35) 1_M + (.50) 1_W$$
and use that for our conditional probability. So on the set $M$ its value is .35 and on the set $W$ its value is .50.
We need to give this random variable a name, so what we do is let $\mathcal{G}$ be the σ-field consisting of $\{\emptyset, \Omega, M, W\}$ and denote this random variable $P(S \mid \mathcal{G})$. Thus we are going to talk about the conditional probability of an event given a σ-field.
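To make the idea of a "random" conditional probability concrete, the following sketch builds the population of 300 people and evaluates the random variable at each person. The encoding of a person as a `(sex, smoker)` pair and the function name `cond_prob_smoker` are assumptions made for this illustration.

```python
from fractions import Fraction

# One entry per person, each equally likely to be chosen.
PEOPLE = ([("M", True)] * 70 + [("M", False)] * 130 +
          [("W", True)] * 50 + [("W", False)] * 50)

def cond_prob_smoker(person):
    """Value at `person` of the conditional probability of smoking given sex.

    The value depends only on whether the person is in M or in W:
    it is the smoking rate inside that person's group.
    """
    group = [q for q in PEOPLE if q[0] == person[0]]
    return Fraction(sum(1 for q in group if q[1]), len(group))

# Averaging the random variable over the whole population.
average = sum(cond_prob_smoker(q) for q in PEOPLE) / len(PEOPLE)
```

The random variable takes the value 7/20 on every man and 1/2 on every woman, and its average over the population is 120/300, the overall proportion of smokers.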
What is the precise definition?
Definition 3.1. Suppose there exist finitely (or countably) many sets $B_1, B_2, \ldots$, all having positive probability, such that they are pairwise disjoint, $\Omega$ is equal to their union, and $\mathcal{G}$ is the σ-field one obtains by taking all finite or countable unions of the $B_i$. Then the conditional probability of $A$ given $\mathcal{G}$ is
$$P(A \mid \mathcal{G}) = \sum_i \frac{P(A \cap B_i)}{P(B_i)} 1_{B_i}(\omega).$$
In short, on the set $B_i$ the conditional probability is equal to $P(A \mid B_i)$.
Not every σ-field can be so represented, so this definition will need to be extended when we get to continuous models. σ-fields that can be represented as in Definition 3.1 are called finitely (or countably) generated and are said to be generated by the sets $B_1, B_2, \ldots$.
Let’s look at another example. Suppose $\Omega$ consists of the possible results when we toss a coin three times: HHH, HHT, etc. Let $\mathcal{F}_3$ denote all subsets of $\Omega$. Let $\mathcal{F}_1$ consist of the sets $\emptyset$, $\Omega$, $\{HHH, HHT, HTH, HTT\}$, and $\{THH, THT, TTH, TTT\}$. So $\mathcal{F}_1$ consists of those events that can be determined by knowing the result of the first toss. We want to let $\mathcal{F}_2$ denote those events that can be determined by knowing the first two tosses. This will include the sets $\emptyset$, $\Omega$, $\{HHH, HHT\}$, $\{HTH, HTT\}$, $\{THH, THT\}$, $\{TTH, TTT\}$. This is not enough to make $\mathcal{F}_2$ a σ-field, so we add to $\mathcal{F}_2$ all sets that can be obtained by taking unions of these sets.
Suppose we tossed the coin independently and suppose that it was fair. Let us calculate $P(A \mid \mathcal{F}_1)$, $P(A \mid \mathcal{F}_2)$, and $P(A \mid \mathcal{F}_3)$ when $A$ is the event $\{HHH\}$. First the conditional probability given $\mathcal{F}_1$. Let $C_1 = \{HHH, HHT, HTH, HTT\}$ and $C_2 = \{THH, THT, TTH, TTT\}$. On the set $C_1$ the conditional probability is $P(A \cap C_1)/P(C_1) = P(HHH)/P(C_1) = \frac18 / \frac12 = \frac14$. On the set $C_2$ the conditional probability is $P(A \cap C_2)/P(C_2) = P(\emptyset)/P(C_2) = 0$. Therefore $P(A \mid \mathcal{F}_1) = (.25) 1_{C_1}$. This is plausible: the probability of getting three heads given the first toss is $\frac14$ if the first toss is a heads and 0 otherwise.
Next let us calculate $P(A \mid \mathcal{F}_2)$. Let $D_1 = \{HHH, HHT\}$, $D_2 = \{HTH, HTT\}$, $D_3 = \{THH, THT\}$, $D_4 = \{TTH, TTT\}$. So $\mathcal{F}_2$ is the σ-field consisting of all possible unions of some of the $D_i$’s. $P(A \mid D_1) = P(HHH)/P(D_1) = \frac18 / \frac14 = \frac12$. Also, as above, $P(A \mid D_i) = 0$ for $i = 2, 3, 4$. So $P(A \mid \mathcal{F}_2) = (.50) 1_{D_1}$. This is again plausible: the probability of getting three heads given the first two tosses is $\frac12$ if the first two tosses were heads and 0 otherwise.
Finally, $\mathcal{F}_3$ is generated by the eight one-point sets, so the same computation gives $P(A \mid \mathcal{F}_3) = 1_{\{HHH\}}$: once all three tosses are known, the probability of three heads is 1 if the outcome is HHH and 0 otherwise.
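The three-toss computation can be reproduced by enumeration. In the sketch below, the σ-field determined by the first $k$ tosses is represented by its generating partition (outcomes sharing the same first $k$ tosses); the fair-coin probabilities and the function name `cond_prob` are assumptions of the illustration.

```python
from fractions import Fraction
from itertools import product

OMEGA = ["".join(t) for t in product("HT", repeat=3)]
P = {w: Fraction(1, 8) for w in OMEGA}       # assumed: fair, independent tosses

def prob(event):
    return sum((P[w] for w in event), Fraction(0))

def cond_prob(A, w, k):
    """Value at outcome w of the conditional probability of A given the first k tosses."""
    # The generating set containing w: all outcomes with the same first k tosses.
    block = [v for v in OMEGA if v[:k] == w[:k]]
    return prob([v for v in block if v in A]) / prob(block)

A = {"HHH"}
given_first = {w: cond_prob(A, w, 1) for w in OMEGA}
given_first_two = {w: cond_prob(A, w, 2) for w in OMEGA}
given_all = {w: cond_prob(A, w, 3) for w in OMEGA}
```

Each dictionary is a random variable on the sample space: `given_first` is $\frac14$ on outcomes starting with H and 0 elsewhere, and `given_first_two` is $\frac12$ on outcomes starting with HH and 0 elsewhere.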
What about conditional expectation? Recall $E[X; B_i] = E[X 1_{B_i}]$ and also that $E[1_B] = 1 \cdot P(1_B = 1) + 0 \cdot P(1_B = 0) = P(B)$. Given a random variable $X$, we define
$$E[X \mid \mathcal{G}] = \sum_i \frac{E[X; B_i]}{P(B_i)} 1_{B_i}.$$
This is the obvious definition, and it agrees with what we had before because $E[1_A \mid \mathcal{G}]$ should be equal to $P(A \mid \mathcal{G})$.
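The definition of conditional expectation given a finitely generated σ-field is short enough to write out as a function. This is a sketch on the three-toss sample space with a fair coin (an assumption of the example); the σ-field is passed in as its generating partition, and the names `cond_exp` and `FIRST_TOSS` are hypothetical.

```python
from fractions import Fraction
from itertools import product

OMEGA = ["".join(t) for t in product("HT", repeat=3)]
P = {w: Fraction(1, 8) for w in OMEGA}       # assumed: fair, independent tosses

def cond_exp(X, partition):
    """Conditional expectation of X given the sigma-field generated by `partition`.

    Returns a dict on OMEGA whose value on each generating set B is E[X; B] / P(B).
    """
    Y = {}
    for B in partition:
        p_B = sum(P[w] for w in B)
        e_X_B = sum(X[w] * P[w] for w in B)      # E[X; B]
        for w in B:
            Y[w] = e_X_B / p_B
    return Y

# Generating sets of the sigma-field that sees only the first toss.
FIRST_TOSS = [[w for w in OMEGA if w[0] == c] for c in "HT"]

heads = {w: w.count("H") for w in OMEGA}
indicator_HHH = {w: 1 if w == "HHH" else 0 for w in OMEGA}

Y = cond_exp(heads, FIRST_TOSS)
mean_of_Y = sum(Y[w] * P[w] for w in OMEGA)
```

Given the first toss, the expected number of heads is 2 after a head and 1 after a tail; applying `cond_exp` to the indicator of HHH recovers the conditional probability $\frac14$ on outcomes that start with a head.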
We now turn to some properties of conditional expectation. Some of the following
propositions may seem a bit technical. In fact, they are! However, these properties are
crucial to what follows and there is no choice but to master them.
Proposition 3.2. $E[X \mid \mathcal{G}]$ is $\mathcal{G}$ measurable, that is, if $Y = E[X \mid \mathcal{G}]$, then $(Y \ge a)$ is a set in $\mathcal{G}$ for each real $a$.
Proof. By the definition,
$$Y = E[X \mid \mathcal{G}] = \sum_i \frac{E[X; B_i]}{P(B_i)} 1_{B_i} = \sum_i b_i 1_{B_i}$$
if we set $b_i = E[X; B_i]/P(B_i)$. The set $(Y \ge a)$ is a union of some of the $B_i$, namely, those $B_i$ for which $b_i \ge a$. But the union of any collection of the $B_i$ is in $\mathcal{G}$.
An example might help. Suppose
$$Y = 2 \cdot 1_{B_1} + 3 \cdot 1_{B_2} + 6 \cdot 1_{B_3} + 4 \cdot 1_{B_4}$$
and $a = 3.5$. Then $(Y \ge a) = B_3 \cup B_4$, which is in $\mathcal{G}$.
Proposition 3.3. If $C \in \mathcal{G}$ and $Y = E[X \mid \mathcal{G}]$, then $E[Y; C] = E[X; C]$.
Proof. Since $Y = \sum_i \frac{E[X; B_i]}{P(B_i)} 1_{B_i}$ and the $B_i$ are disjoint, then
$$E[Y; B_j] = \frac{E[X; B_j]}{P(B_j)} E 1_{B_j} = E[X; B_j].$$
Now if $C = B_{j_1} \cup \cdots \cup B_{j_n} \cup \cdots$, summing the above over the $j_k$ gives $E[Y; C] = E[X; C]$.
Let us look at the above example for this proposition, and let us do the case where $C = B_2$. Note $1_{B_2} 1_{B_2} = 1_{B_2}$ because the product is $1 \cdot 1 = 1$ if $\omega$ is in $B_2$ and 0 otherwise. On the other hand, it is not possible for an $\omega$ to be in more than one of the $B_i$, so $1_{B_2} 1_{B_i} = 0$ if $i \ne 2$. Multiplying $Y$ in the above example by $1_{B_2}$, we see that
$$E[Y; C] = E[Y; B_2] = E[Y 1_{B_2}] = E[3 \cdot 1_{B_2}] = 3 E[1_{B_2}] = 3 P(B_2).$$
However the number 3 is not just any number; it is $E[X; B_2]/P(B_2)$. So
$$3 P(B_2) = \frac{E[X; B_2]}{P(B_2)} P(B_2) = E[X; B_2] = E[X; C],$$
just as we wanted. If $C = B_2 \cup B_4$, for example, we then write
$$E[X; C] = E[X 1_C] = E[X(1_{B_2} + 1_{B_4})] = E[X 1_{B_2}] + E[X 1_{B_4}] = E[X; B_2] + E[X; B_4].$$
By the first part, this equals $E[Y; B_2] + E[Y; B_4]$, and we undo the above string of equalities but with $Y$ instead of $X$ to see that this is $E[Y; C]$.
If a r.v. $Y$ is $\mathcal{G}$ measurable, then for any $a$ we have $(Y = a) \in \mathcal{G}$, which means that $(Y = a)$ is the union of one or more of the $B_i$. Since the $B_i$ are disjoint, it follows that $Y$ must be constant on each $B_i$.
Again let us look at an example. Suppose $Z$ takes only the values 1, 3, 4, 7. Let $D_1 = (Z = 1)$, $D_2 = (Z = 3)$, $D_3 = (Z = 4)$, $D_4 = (Z = 7)$. Note that we can write
$$Z = 1 \cdot 1_{D_1} + 3 \cdot 1_{D_2} + 4 \cdot 1_{D_3} + 7 \cdot 1_{D_4}.$$
To see this, if $\omega \in D_2$, for example, the right hand side will be $0 + 3 \cdot 1 + 0 + 0$, which agrees with $Z(\omega)$. Now if $Z$ is $\mathcal{G}$ measurable, then $(Z \ge a) \in \mathcal{G}$ for each $a$. Take $a = 7$, and we see $D_4 \in \mathcal{G}$. Take $a = 4$ and we see $D_3 \cup D_4 \in \mathcal{G}$. Taking $a = 3$ shows $D_2 \cup D_3 \cup D_4 \in \mathcal{G}$. Now $D_3 = (D_3 \cup D_4) \cap D_4^c$, so since $\mathcal{G}$ is a σ-field, $D_3 \in \mathcal{G}$. Similarly $D_2, D_1 \in \mathcal{G}$. Because sets in $\mathcal{G}$ are unions of the $B_i$’s, we must have $Z$ constant on the $B_i$’s. For example, if it so happened that $D_1 = B_1$, $D_2 = B_2 \cup B_4$, $D_3 = B_3 \cup B_6 \cup B_7$, and $D_4 = B_5$, then
$$Z = 1 \cdot 1_{B_1} + 3 \cdot 1_{B_2} + 4 \cdot 1_{B_3} + 3 \cdot 1_{B_4} + 7 \cdot 1_{B_5} + 4 \cdot 1_{B_6} + 4 \cdot 1_{B_7}.$$
We still restrict ourselves to the discrete case. In this context, the properties given in Propositions 3.2 and 3.3 uniquely determine $E[X \mid \mathcal{G}]$.
Proposition 3.4. Suppose $Z$ is $\mathcal{G}$ measurable and $E[Z; C] = E[X; C]$ whenever $C \in \mathcal{G}$. Then $Z = E[X \mid \mathcal{G}]$.
Proof. Since $Z$ is $\mathcal{G}$ measurable, then $Z$ must be constant on each $B_i$. Let the value of $Z$ on $B_i$ be $z_i$. So $Z = \sum_i z_i 1_{B_i}$. Then
$$z_i P(B_i) = E[Z; B_i] = E[X; B_i],$$
or $z_i = E[X; B_i]/P(B_i)$ as required.
The following propositions contain the main facts about this new definition of conditional expectation that we will need.
Proposition 3.5. (1) If $X_1 \ge X_2$, then $E[X_1 \mid \mathcal{G}] \ge E[X_2 \mid \mathcal{G}]$.
(2) $E[aX_1 + bX_2 \mid \mathcal{G}] = aE[X_1 \mid \mathcal{G}] + bE[X_2 \mid \mathcal{G}]$.
(3) If $X$ is $\mathcal{G}$ measurable, then $E[X \mid \mathcal{G}] = X$.
(4) $E[E[X \mid \mathcal{G}]] = EX$.
(5) If $X$ is independent of $\mathcal{G}$, then $E[X \mid \mathcal{G}] = EX$.
We will prove Proposition 3.5 in Note 1 at the end of the section. At this point it is more fruitful to understand what the proposition says.
We will see in Proposition 3.8 below that we may think of $E[X \mid \mathcal{G}]$ as the best prediction of $X$ given $\mathcal{G}$. Accepting this for the moment, we can give an interpretation of (1)-(5). (1) says that if $X_1$ is larger than $X_2$, then the predicted value of $X_1$ should be larger than the predicted value of $X_2$. (2) says that the predicted value of $X_1 + X_2$ should be the sum of the predicted values. (3) says that if we know $\mathcal{G}$ and $X$ is $\mathcal{G}$ measurable, then we know $X$ and our best prediction of $X$ is $X$ itself. (4) says that the average of the predicted value of $X$ should be the average value of $X$. (5) says that if knowing $\mathcal{G}$ gives us no additional information on $X$, then the best prediction for the value of $X$ is just $EX$.
Proposition 3.6. If $Z$ is $\mathcal{G}$ measurable, then $E[XZ \mid \mathcal{G}] = Z E[X \mid \mathcal{G}]$.
We again defer the proof, this time to Note 2.
Proposition 3.6 says that as far as conditional expectations with respect to a σ-field $\mathcal{G}$ go, $\mathcal{G}$-measurable random variables act like constants: they can be taken inside or outside the conditional expectation at will.
Proposition 3.7. If $\mathcal{H} \subset \mathcal{G} \subset \mathcal{F}$, then
$$E[E[X \mid \mathcal{H}] \mid \mathcal{G}] = E[X \mid \mathcal{H}] = E[E[X \mid \mathcal{G}] \mid \mathcal{H}].$$
Proof. $E[X \mid \mathcal{H}]$ is $\mathcal{H}$ measurable, hence $\mathcal{G}$ measurable, since $\mathcal{H} \subset \mathcal{G}$. The left hand equality now follows by Proposition 3.5(3). To get the right hand equality, let $W$ be the right hand expression. It is $\mathcal{H}$ measurable, and if $C \in \mathcal{H} \subset \mathcal{G}$, then
$$E[W; C] = E[E[X \mid \mathcal{G}]; C] = E[X; C]$$
as required.
In words, if we are predicting a prediction of $X$ given limited information, this is the same as a single prediction given the least amount of information.
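Proposition 3.7 can be checked numerically on the three-toss space, taking the smaller σ-field to be the one determined by the first toss and the larger one to be the one determined by the first two tosses. The sketch below assumes a fair coin and uses an arbitrary test variable `X`; all names in the code are illustrative.

```python
from fractions import Fraction
from itertools import product

OMEGA = ["".join(t) for t in product("HT", repeat=3)]
P = {w: Fraction(1, 8) for w in OMEGA}       # assumed: fair, independent tosses

def partition_by_prefix(k):
    """Generating sets of the sigma-field determined by the first k tosses."""
    blocks = {}
    for w in OMEGA:
        blocks.setdefault(w[:k], []).append(w)
    return list(blocks.values())

def cond_exp(X, partition):
    """Conditional expectation given the sigma-field generated by `partition`."""
    Y = {}
    for B in partition:
        p_B = sum(P[w] for w in B)
        value = sum(X[w] * P[w] for w in B) / p_B
        for w in B:
            Y[w] = value
    return Y

H_PART = partition_by_prefix(1)     # smaller sigma-field: first toss only
G_PART = partition_by_prefix(2)     # larger sigma-field: first two tosses
X = {w: i * i for i, w in enumerate(OMEGA)}   # arbitrary test variable

inner_H_outer_G = cond_exp(cond_exp(X, H_PART), G_PART)
direct_H = cond_exp(X, H_PART)
inner_G_outer_H = cond_exp(cond_exp(X, G_PART), H_PART)
```

All three dictionaries coincide: conditioning in either order leaves only the information in the smaller σ-field.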
Let us verify that conditional expectation may be viewed as the best predictor of a random variable given a σ-field. If $X$ is a r.v., a predictor $Z$ is just another random variable, and the goodness of the prediction will be measured by $E[(X - Z)^2]$, which is known as the mean square error.
Proposition 3.8. If $X$ is a r.v., the best predictor among the collection of $\mathcal{G}$-measurable random variables is $Y = E[X \mid \mathcal{G}]$.
Proof. Let $Z$ be any $\mathcal{G}$-measurable random variable. We compute, using Proposition 3.5(3) and Proposition 3.6,
$$\begin{aligned} E[(X - Z)^2 \mid \mathcal{G}] &= E[X^2 \mid \mathcal{G}] - 2E[XZ \mid \mathcal{G}] + E[Z^2 \mid \mathcal{G}] \\ &= E[X^2 \mid \mathcal{G}] - 2Z E[X \mid \mathcal{G}] + Z^2 \\ &= E[X^2 \mid \mathcal{G}] - 2ZY + Z^2 \\ &= E[X^2 \mid \mathcal{G}] - Y^2 + (Y - Z)^2 \\ &= E[X^2 \mid \mathcal{G}] - 2Y E[X \mid \mathcal{G}] + Y^2 + (Y - Z)^2 \\ &= E[X^2 \mid \mathcal{G}] - 2E[XY \mid \mathcal{G}] + E[Y^2 \mid \mathcal{G}] + (Y - Z)^2 \\ &= E[(X - Y)^2 \mid \mathcal{G}] + (Y - Z)^2. \end{aligned}$$
We also used the fact that $Y$ is $\mathcal{G}$ measurable. Taking expectations and using Proposition 3.5(4),
$$E[(X - Z)^2] = E[(X - Y)^2] + E[(Y - Z)^2].$$
The right hand side is bigger than or equal to $E[(X - Y)^2]$ because $(Y - Z)^2 \ge 0$. So the error in predicting $X$ by $Z$ is at least as large as the error in predicting $X$ by $Y$, and will be equal if and only if $Z = Y$. So $Y$ is the best predictor.
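The decomposition of the mean square error at the end of the proof can be verified on a small example. The sketch below predicts the number of heads in three fair tosses from the first toss alone; the competing predictor `Z` is an arbitrary choice that depends only on the first toss, and all names are hypothetical.

```python
from fractions import Fraction
from itertools import product

OMEGA = ["".join(t) for t in product("HT", repeat=3)]
P = {w: Fraction(1, 8) for w in OMEGA}       # assumed: fair, independent tosses

def cond_exp_first_toss(X):
    """Conditional expectation of X given the first toss."""
    Y = {}
    for c in "HT":
        block = [w for w in OMEGA if w[0] == c]
        value = sum(X[w] * P[w] for w in block) / sum(P[w] for w in block)
        for w in block:
            Y[w] = value
    return Y

def mse(U, V):
    """Mean square error E[(U - V)^2]."""
    return sum((U[w] - V[w]) ** 2 * P[w] for w in OMEGA)

X = {w: w.count("H") for w in OMEGA}
Y = cond_exp_first_toss(X)                   # 2 after a head, 1 after a tail
# An arbitrary competitor that also depends only on the first toss.
Z = {w: Fraction(5, 2) if w[0] == "H" else Fraction(1, 2) for w in OMEGA}
```

Here `mse(X, Y)` is $\frac12$, `mse(Y, Z)` is $\frac14$, and `mse(X, Z)` is their sum $\frac34$, so the competitor does strictly worse.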
There is one more interpretation of conditional expectation that may be useful. The collection of all random variables is a linear space, and the collection of all $\mathcal{G}$-measurable random variables is clearly a subspace. Given $X$, the conditional expectation $Y = E[X \mid \mathcal{G}]$ is equal to the projection of $X$ onto the subspace of $\mathcal{G}$-measurable random variables. To see this, we write $X = Y + (X - Y)$, and what we have to check is that the inner product of $Y$ and $X - Y$ is 0, that is, $Y$ and $X - Y$ are orthogonal. In this context, the inner product of $X_1$ and $X_2$ is defined to be $E[X_1 X_2]$, so we must show $E[Y(X - Y)] = 0$. Note
$$E[Y(X - Y) \mid \mathcal{G}] = Y E[X - Y \mid \mathcal{G}] = Y(E[X \mid \mathcal{G}] - Y) = Y(Y - Y) = 0.$$
Taking expectations,
$$E[Y(X - Y)] = E[E[Y(X - Y) \mid \mathcal{G}]] = 0,$$
just as we wished.
If $Y$ is a discrete random variable, that is, it takes only countably many values $y_1, y_2, \ldots$, we let $B_i = (Y = y_i)$. These will be disjoint sets whose union is $\Omega$. If $\sigma(Y)$ is the collection of all unions of the $B_i$, then $\sigma(Y)$ is a σ-field, and is called the σ-field generated by $Y$. It is easy to see that this is the smallest σ-field with respect to which $Y$ is measurable. We write $E[X \mid Y]$ for $E[X \mid \sigma(Y)]$.
Note 1. We prove Proposition 3.5. (1) and (2) are immediate from the definition. To prove (3), note that if $Z = X$, then $Z$ is $\mathcal{G}$ measurable and $E[X; C] = E[Z; C]$ for any $C \in \mathcal{G}$; this is trivial. By Proposition 3.4 it follows that $Z = E[X \mid \mathcal{G}]$; this proves (3). To prove (4), if we let $C = \Omega$ and $Y = E[X \mid \mathcal{G}]$, then $EY = E[Y; C] = E[X; C] = EX$.
Last is (5). Let $Z = EX$. $Z$ is constant, so clearly $\mathcal{G}$ measurable. By the independence, if $C \in \mathcal{G}$, then $E[X; C] = E[X 1_C] = (EX)(E 1_C) = (EX)(P(C))$. But $E[Z; C] = (EX)(P(C))$ since $Z$ is constant. By Proposition 3.4 we see $Z = E[X \mid \mathcal{G}]$.
Note 2. We prove Proposition 3.6. Note that $Z E[X \mid \mathcal{G}]$ is $\mathcal{G}$ measurable, so by Proposition 3.4 we need to show its expectation over sets $C$ in $\mathcal{G}$ is the same as that of $XZ$. As in the proof of Proposition 3.3, it suffices to consider only the case when $C$ is one of the $B_i$. Now $Z$ is $\mathcal{G}$ measurable, hence it is constant on $B_i$; let its value be $z_i$. Then
$$E[Z E[X \mid \mathcal{G}]; B_i] = E[z_i E[X \mid \mathcal{G}]; B_i] = z_i E[E[X \mid \mathcal{G}]; B_i] = z_i E[X; B_i] = E[XZ; B_i]$$
as desired.
4. Martingales.
Suppose we have a sequence of σ-fields $\mathcal{F}_1 \subset \mathcal{F}_2 \subset \mathcal{F}_3 \subset \cdots$. An example would be repeatedly tossing a coin and letting $\mathcal{F}_k$ be the sets that can be determined by the first $k$ tosses. Another example is to let $\mathcal{F}_k$ be the events that are determined by the values of a stock at times 1 through $k$. A third example is to let $X_1, X_2, \ldots$ be a sequence of random variables and let $\mathcal{F}_k$ be the σ-field generated by $X_1, \ldots, X_k$, the smallest σ-field with respect to which $X_1, \ldots, X_k$ are measurable.
Definition 4.1. A r.v. $X$ is integrable if $E|X| < \infty$. Given an increasing sequence of σ-fields $\mathcal{F}_n$, a sequence of r.v.’s $X_n$ is adapted if $X_n$ is $\mathcal{F}_n$ measurable for each $n$.
Definition 4.2. A martingale $M_n$ is a sequence of random variables such that
(1) $M_n$ is integrable for all $n$,
(2) $M_n$ is adapted to $\mathcal{F}_n$, and
(3) for all $n$
$$E[M_{n+1} \mid \mathcal{F}_n] = M_n. \tag{4.1}$$
Usually (1) and (2) are easy to check, and it is (3) that is the crucial property. If we have (1) and (2), but instead of (3) we have
(3′) for all $n$
$$E[M_{n+1} \mid \mathcal{F}_n] \ge M_n,$$
then we say $M_n$ is a submartingale. If we have (1) and (2), but instead of (3) we have
(3″) for all $n$
$$E[M_{n+1} \mid \mathcal{F}_n] \le M_n,$$
then we say $M_n$ is a supermartingale.
Submartingales tend to increase and supermartingales tend to decrease. The nomenclature may seem like it goes the wrong way; Doob defined these terms by analogy with the notions of subharmonic and superharmonic functions in analysis. (Actually, it is more than an analogy: we won’t explore this, but it turns out that the composition of a subharmonic function with Brownian motion yields a submartingale, and similarly for superharmonic functions.)
Note that the definition of martingale depends on the collection of σ-fields. When it is needed for clarity, one can say that $(M_n, \mathcal{F}_n)$ is a martingale. To define conditional expectation, one needs a probability, so a martingale depends on the probability as well. When we need to, we will say that $M_n$ is a martingale with respect to the probability $P$. This is an issue when there is more than one probability around.
We will see that martingales are ubiquitous in financial math. For example, security prices and one’s wealth will turn out to be examples of martingales.
The word “martingale” is also used for the piece of a horse’s bridle that runs from
the horse’s head to its chest. It keeps the horse from raising its head too high. It turns out
that martingales in probability cannot get too large. The word also refers to a gambling
system. I did some searching on the Internet, and there seems to be no consensus on the
derivation of the term.
Here is an example of a martingale. Let $X_1, X_2, \ldots$ be a sequence of independent r.v.’s with mean 0. (Saying a r.v. $X_i$ has mean 0 is the same as saying $EX_i = 0$; this presupposes that $E|X_i|$ is finite.) Set $\mathcal{F}_n = \sigma(X_1, \ldots, X_n)$, the σ-field generated by $X_1, \ldots, X_n$. Let $M_n = \sum_{i=1}^n X_i$. Definition 4.2(2) is easy to see. Since $E|M_n| \le \sum_{i=1}^n E|X_i|$, Definition 4.2(1) also holds. We now check
$$E[M_{n+1} \mid \mathcal{F}_n] = X_1 + \cdots + X_n + E[X_{n+1} \mid \mathcal{F}_n] = M_n + EX_{n+1} = M_n,$$
where we used the independence.
Another example: suppose in the above that the $X_k$ all have variance 1, and let $M_n = S_n^2 - n$, where $S_n = \sum_{i=1}^n X_i$. Again (1) and (2) of Definition 4.2 are easy to check. We compute
$$E[M_{n+1} \mid \mathcal{F}_n] = E[S_n^2 + 2X_{n+1}S_n + X_{n+1}^2 \mid \mathcal{F}_n] - (n+1).$$
We have $E[S_n^2 \mid \mathcal{F}_n] = S_n^2$ since $S_n$ is $\mathcal{F}_n$ measurable. Next,
$$E[2X_{n+1}S_n \mid \mathcal{F}_n] = 2S_n E[X_{n+1} \mid \mathcal{F}_n] = 2S_n EX_{n+1} = 0.$$
And $E[X_{n+1}^2 \mid \mathcal{F}_n] = EX_{n+1}^2 = 1$. Substituting, we obtain $E[M_{n+1} \mid \mathcal{F}_n] = S_n^2 + 0 + 1 - (n+1) = S_n^2 - n = M_n$, so $M_n$ is a martingale.
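Both of the preceding examples can be checked by enumeration in one concrete case. The sketch below assumes the steps are $+1$ or $-1$ with probability $\frac12$ each (so they have mean 0 and variance 1); this specific distribution, and the function names, are choices made for the illustration, not part of the general statement.

```python
from fractions import Fraction
from itertools import product

STEPS = (-1, 1)     # assumed: each value with probability 1/2

def cond_exp_next(f, prefix):
    """Conditional expectation of f(X_1..X_{n+1}) given X_1..X_n = prefix."""
    # By independence, average over the two equally likely next steps.
    return sum(Fraction(1, 2) * f(prefix + (x,)) for x in STEPS)

def walk(path):
    """First example: the sum X_1 + ... + X_n."""
    return sum(path)

def walk_sq_minus_n(path):
    """Second example: S_n^2 - n."""
    return sum(path) ** 2 - len(path)

def walk_sq(path):
    """S_n^2 without the correction term, for contrast."""
    return sum(path) ** 2

def is_martingale(f, max_n=4):
    """Check the martingale identity on every history of length 0..max_n."""
    return all(cond_exp_next(f, prefix) == f(prefix)
               for n in range(max_n + 1)
               for prefix in product(STEPS, repeat=n))
```

The sum and the corrected square satisfy the identity on every history, while the uncorrected square does not: its conditional expectation exceeds its current value by 1, which is the submartingale inequality.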
A third example: Suppose you start with a dollar and you are tossing a fair coin independently. If it turns up heads you double your fortune, tails you go broke. This is “double or nothing.” Let $M_n$ be your fortune at time $n$. To formalize this, let $X_1, X_2, \ldots$ be independent r.v.’s that are equal to 2 with probability $\frac12$ and 0 with probability $\frac12$. Then $M_n = X_1 \cdots X_n$. Let $\mathcal{F}_n$ be the σ-field generated by $X_1, \ldots, X_n$. Note $0 \le M_n \le 2^n$, and so Definition 4.2(1) is satisfied, while (2) is easy. To compute the conditional expectation, note $EX_{n+1} = 1$. Then
$$E[M_{n+1} \mid \mathcal{F}_n] = M_n E[X_{n+1} \mid \mathcal{F}_n] = M_n EX_{n+1} = M_n,$$
using the independence.
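The double-or-nothing example can also be enumerated exactly. The following sketch checks the martingale identity on all short histories and computes the expected fortune and the chance of not being broke after $n$ rounds; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import prod

FACTORS = (0, 2)     # tails: go broke, heads: double; probability 1/2 each

def fortune(path):
    """M_n = X_1 * ... * X_n; the empty product is 1, the starting dollar."""
    return prod(path)

def cond_exp_next(prefix):
    """Conditional expectation of M_{n+1} given X_1..X_n = prefix."""
    return sum(Fraction(1, 2) * fortune(prefix + (x,)) for x in FACTORS)

def expected_fortune(n):
    return sum(Fraction(1, 2**n) * fortune(path)
               for path in product(FACTORS, repeat=n))

def prob_not_broke(n):
    return sum(Fraction(1, 2**n)
               for path in product(FACTORS, repeat=n) if fortune(path) > 0)

identity_holds = all(cond_exp_next(prefix) == fortune(prefix)
                     for n in range(5)
                     for prefix in product(FACTORS, repeat=n))
```

The expected fortune stays at one dollar for every $n$, even though the probability of still having any money after $n$ rounds is only $2^{-n}$.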
Before we give our fourth example, let us observe that
[E[X [ T][ ≤ E[[X[ [ T]. (4.2)
To see this, we have −[X[ ≤ X ≤ [X[, so −E[[X[ [ T] ≤ E[X [ T] ≤ E[[X[ [ T]. Since
E[[X[ [ T] is nonnegative, (4.2) follows.
Our fourth example will be used many times, so we state it as a proposition.
Proposition 4.3. Let $\mathcal{F}_1, \mathcal{F}_2, \ldots$ be given and let $X$ be a fixed r.v. with
$E|X| < \infty$. Let $M_n = E[X \mid \mathcal{F}_n]$. Then $M_n$ is a martingale.
Proof. Definition 4.2(2) is clear, while
$$E|M_n| \le E[E[|X| \mid \mathcal{F}_n]] = E|X| < \infty$$
by (4.2); this shows Definition 4.2(1). We have
$$E[M_{n+1} \mid \mathcal{F}_n] = E[E[X \mid \mathcal{F}_{n+1}] \mid \mathcal{F}_n] = E[X \mid \mathcal{F}_n] = M_n.$$
5. Properties of martingales.
When it comes to discussing American options, we will need the concept of stopping
times. A mapping $\tau$ from $\Omega$ into the nonnegative integers is a stopping time if
$(\tau = k) \in \mathcal{F}_k$ for each $k$. One sometimes allows $\tau$ to also take on the value $\infty$.
An example is $\tau = \min\{k : S_k \ge A\}$. This is a stopping time because
$(\tau = k) = (S_0, S_1, \ldots, S_{k-1} < A, S_k \ge A) \in \mathcal{F}_k$. We can think of a stopping
time as the first time something happens. $\sigma = \max\{k : S_k \ge A\}$, the last time, is not a
stopping time. (We will use the convention that the minimum of an empty set is $+\infty$; so, for
example, with the above definition of $\tau$, on the event that $S_k$ is never greater than or
equal to $A$, we have $\tau = \infty$.)
Here is an intuitive description of a stopping time. If I tell you to drive to the city
limits and then drive until you come to the second stop light after that, you know when
you get there that you have arrived; you don’t need to have been there before or to look
ahead. But if I tell you to drive until you come to the second stop light before the city
limits, either you must have been there before or else you have to go past where you are
supposed to stop, continue on to the city limits, and then turn around and come back two
stop lights. You don’t know when you first get to the second stop light before the city
limits that you get to stop there. The first set of instructions forms a stopping time, the
second set does not.
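Note $(\tau \le k) = \bigcup_{j=0}^k (\tau = j)$. Since $(\tau = j) \in \mathcal{F}_j \subset \mathcal{F}_k$,
then the event $(\tau \le k) \in \mathcal{F}_k$ for all $k$. Conversely, if $\tau$ is a r.v. with
$(\tau \le k) \in \mathcal{F}_k$ for all $k$, then
$$(\tau = k) = (\tau \le k) - (\tau \le k-1).$$
Since $(\tau \le k) \in \mathcal{F}_k$ and $(\tau \le k-1) \in \mathcal{F}_{k-1} \subset \mathcal{F}_k$, then
$(\tau = k) \in \mathcal{F}_k$, and such a $\tau$ must be a stopping time.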
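To make the distinction between a first time and a last time concrete, here is a small sketch that tests the defining property by brute force. It assumes a $\pm 1$ random walk of length 5 and a level $A = 2$; these choices and all function names are hypothetical and serve only as an illustration.

```python
from itertools import product

def is_stopping_time(tau, n):
    """tau maps a full +/-1 path of length n to a time in 0..n (or None).
    Returns True if the event (tau = k) is decided by the first k steps."""
    paths = list(product([-1, 1], repeat=n))
    for k in range(n + 1):
        seen = {}
        for w in paths:
            prefix, flag = w[:k], (tau(w) == k)
            # two paths sharing the first k steps must agree on (tau = k)
            if seen.setdefault(prefix, flag) != flag:
                return False
    return True

def partial_sums(w):
    s, out = 0, [0]
    for x in w:
        s += x
        out.append(s)
    return out

A = 2  # hypothetical level

def first_time(w):
    hits = [k for k, s in enumerate(partial_sums(w)) if s >= A]
    return hits[0] if hits else None   # None plays the role of +infinity

def last_time(w):
    hits = [k for k, s in enumerate(partial_sums(w)) if s >= A]
    return hits[-1] if hits else None

print(is_stopping_time(first_time, 5), is_stopping_time(last_time, 5))
```

The check for the last time fails because two paths can agree up to time $k$ while only one of them returns to the level afterwards.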
Our first result is Jensen’s inequality.
Proposition 5.1. If $g$ is convex, then
$$g(E[X \mid \mathcal{G}]) \le E[g(X) \mid \mathcal{G}]$$
provided all the expectations exist.
For ordinary expectations rather than conditional expectations, this is still true.
That is, if $g$ is convex and the expectations exist, then
$$g(EX) \le E[g(X)].$$
We already know some special cases of this: when $g(x) = |x|$, this says $|EX| \le E|X|$;
when $g(x) = x^2$, this says $(EX)^2 \le EX^2$, which we know because
$EX^2 - (EX)^2 = E(X - EX)^2 \ge 0$.
For Proposition 5.1 as well as many of the following propositions, the statement of
the result is more important than the proof, and we relegate the proof to Note 1 below.
One reason we want Jensen’s inequality is to show that a convex function applied
to a martingale yields a submartingale.
Proposition 5.2. If $M_n$ is a martingale and $g$ is convex, then $g(M_n)$ is a submartingale,
provided all the expectations exist.
Proof. By Jensen’s inequality,
$$E[g(M_{n+1}) \mid \mathcal{F}_n] \ge g(E[M_{n+1} \mid \mathcal{F}_n]) = g(M_n).$$
If $M_n$ is a martingale, then $EM_n = E[E[M_{n+1} \mid \mathcal{F}_n]] = EM_{n+1}$. So
$EM_0 = EM_1 = \cdots = EM_n$. Doob’s optional stopping theorem says the same thing holds when
fixed times $n$ are replaced by stopping times.
Theorem 5.3. Suppose $K$ is a positive integer, $N$ is a stopping time such that $N \le K$
a.s., and $M_n$ is a martingale. Then
$$EM_N = EM_K.$$
Here, to evaluate $M_N$, one first finds $N(\omega)$ and then evaluates $M_\cdot(\omega)$ for that value of $N$.
Proof. We have
$$EM_N = \sum_{k=0}^K E[M_N; N = k].$$
If we show that the $k$-th summand is $E[M_K; N = k]$, then the sum will be
$$\sum_{k=0}^K E[M_K; N = k] = EM_K$$
as desired. We have
$$E[M_N; N = k] = E[M_k; N = k]$$
by the definition of $M_N$. Now $(N = k)$ is in $\mathcal{F}_k$, so by Proposition 2.2 and the fact that
$M_k = E[M_{k+1} \mid \mathcal{F}_k]$,
$$E[M_k; N = k] = E[M_{k+1}; N = k].$$
We have $(N = k) \in \mathcal{F}_k \subset \mathcal{F}_{k+1}$. Since $M_{k+1} = E[M_{k+2} \mid \mathcal{F}_{k+1}]$,
Proposition 2.2 tells us that
$$E[M_{k+1}; N = k] = E[M_{k+2}; N = k].$$
We continue, using $(N = k) \in \mathcal{F}_k \subset \mathcal{F}_{k+1} \subset \mathcal{F}_{k+2} \subset \cdots$, and we obtain
$$E[M_N; N = k] = E[M_k; N = k] = E[M_{k+1}; N = k] = \cdots = E[M_K; N = k].$$
If we change the equalities in the above to inequalities, the same result holds for
submartingales.
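As an illustration of optional stopping, the following sketch computes $EM_N$ and $EM_K$ exactly for one bounded stopping time. It assumes $M$ is the fair $\pm 1$ random walk started at 0 and $N$ is the first time the walk reaches a given level, capped at $K$; the level and the value of $K$ are arbitrary choices made for this example.

```python
from fractions import Fraction
from itertools import product

def optional_stopping_demo(K=6, level=2):
    """Return (E[M_N], E[M_K]) for the fair +/-1 walk M and
    N = min(first k with M_k >= level, K)."""
    prob = Fraction(1, 2) ** K          # each path of K tosses is equally likely
    e_stopped = Fraction(0)
    e_final = Fraction(0)
    for w in product([-1, 1], repeat=K):
        walk = [0]
        for x in w:
            walk.append(walk[-1] + x)
        # N = first time the walk reaches `level`, capped at K
        N = next((k for k, m in enumerate(walk) if m >= level), K)
        e_stopped += prob * walk[N]
        e_final += prob * walk[K]
    return e_stopped, e_final

print(optional_stopping_demo())
```

Both expectations come out equal, even though the stopped walk and the walk at time $K$ have very different distributions.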
As a corollary we have two of Doob’s inequalities:
Theorem 5.4. If $M_n$ is a nonnegative submartingale,
(a) $P(\max_{k \le n} M_k \ge \lambda) \le \frac{1}{\lambda} EM_n$.
(b) $E(\max_{k \le n} M_k^2) \le 4EM_n^2$.
For the proof, see Note 2 below.
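Both inequalities can be checked numerically on a small example. The sketch below assumes the nonnegative submartingale $M_k = |S_k|$, where $S_k$ is a fair $\pm 1$ random walk (a submartingale by Proposition 5.2), and enumerates all paths of length 8; the choice of $n$ and of the levels $\lambda$ is arbitrary.

```python
from fractions import Fraction
from itertools import product

def doob_check(n=8, lambdas=(1, 2, 3, 4)):
    """Check Doob's inequalities (a) and (b) for M_k = |S_k|, k = 1..n."""
    prob = Fraction(1, 2) ** n
    e_mn = e_mn2 = e_max2 = Fraction(0)
    tail = {lam: Fraction(0) for lam in lambdas}
    for w in product([-1, 1], repeat=n):
        s, m = 0, []
        for x in w:
            s += x
            m.append(abs(s))            # M_1, ..., M_n along this path
        m_star = max(m)
        e_mn += prob * m[-1]
        e_mn2 += prob * m[-1] ** 2
        e_max2 += prob * m_star ** 2
        for lam in lambdas:
            if m_star >= lam:
                tail[lam] += prob       # accumulates P(max M_k >= lam)
    ok_a = all(tail[lam] <= e_mn / lam for lam in lambdas)
    ok_b = e_max2 <= 4 * e_mn2
    return ok_a, ok_b

print(doob_check())
```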
Note 1. We prove Proposition 5.1. If $g$ is convex, then the graph of $g$ lies above all the
tangent lines. Even if $g$ does not have a derivative at $x_0$, there is a line passing through $x_0$
which lies beneath the graph of $g$. So for each $x_0$ there exists $c(x_0)$ such that
$$g(x) \ge g(x_0) + c(x_0)(x - x_0).$$
Apply this with $x = X(\omega)$ and $x_0 = E[X \mid \mathcal{G}](\omega)$. We then have
$$g(X) \ge g(E[X \mid \mathcal{G}]) + c(E[X \mid \mathcal{G}])(X - E[X \mid \mathcal{G}]).$$
If $g$ is differentiable, we let $c(x_0) = g'(x_0)$. In the case where $g$ is not differentiable, then we
choose $c$ to be the left hand upper derivate, for example. (For those who are not familiar with
derivates, this is essentially the left hand derivative.) One can check that if $c$ is so chosen,
then $c(E[X \mid \mathcal{G}])$ is $\mathcal{G}$ measurable.
Now take the conditional expectation with respect to $\mathcal{G}$. The first term on the right is
$\mathcal{G}$ measurable, so remains the same. The second term on the right is equal to
$$c(E[X \mid \mathcal{G}]) E[X - E[X \mid \mathcal{G}] \mid \mathcal{G}] = 0.$$
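Note 2. We prove Theorem 5.4. Set $M_{n+1} = M_n$. It is easy to see that the sequence
$M_1, M_2, \ldots, M_{n+1}$ is also a submartingale. Let $N = \min\{k : M_k \ge \lambda\} \wedge (n+1)$, the first
time that $M_k$ is greater than or equal to $\lambda$, where $a \wedge b = \min(a, b)$. Then
$$P(\max_{k \le n} M_k \ge \lambda) = P(N \le n)$$
and if $N \le n$, then $M_N \ge \lambda$. Now
$$\begin{aligned}
P(\max_{k \le n} M_k \ge \lambda) = E[1_{(N \le n)}] &\le E\Big[\frac{M_N}{\lambda}; N \le n\Big] \\
&= \frac{1}{\lambda} E[M_{N \wedge n}; N \le n] \le \frac{1}{\lambda} EM_{N \wedge n}.
\end{aligned} \tag{5.1}$$
Finally, since $M_n$ is a submartingale, $EM_{N \wedge n} \le EM_n$.
We now look at (b). Let us write $M^*$ for $\max_{k \le n} M_k$. If $EM_n^2 = \infty$, there is nothing
to prove. If it is finite, then since $0 \le M_k \le E[M_n \mid \mathcal{F}_k]$ and by Jensen’s inequality, we have
$$EM_k^2 \le E[E[M_n \mid \mathcal{F}_k]^2] \le E[E[M_n^2 \mid \mathcal{F}_k]] = EM_n^2 < \infty$$
for $k \le n$. Then
$$E(M^*)^2 = E[\max_{1 \le k \le n} M_k^2] \le E\Big[\sum_{k=1}^n M_k^2\Big] < \infty.$$
We have
$$E[M_{N \wedge n}; N \le n] = \sum_{k=0}^{n} E[M_{k \wedge n}; N = k].$$
Arguing as in the proof of Theorem 5.3,
$$E[M_{k \wedge n}; N = k] \le E[M_n; N = k],$$
and so
$$E[M_{N \wedge n}; N \le n] \le \sum_{k=0}^{n} E[M_n; N = k] = E[M_n; N \le n].$$
The last expression is at most $E[M_n; M^* \ge \lambda]$. Combined with (5.1), this gives
$P(M^* \ge \lambda) \le \frac{1}{\lambda} E[M_n; M^* \ge \lambda]$. If we multiply this by $2\lambda$ and integrate over
$\lambda$ from 0 to $\infty$, we obtain
$$\begin{aligned}
\int_0^\infty 2\lambda P(M^* \ge \lambda)\, d\lambda &\le 2 \int_0^\infty E[M_n; M^* \ge \lambda]\, d\lambda \\
&= 2E \int_0^\infty M_n 1_{(M^* \ge \lambda)}\, d\lambda = 2E\Big[M_n \int_0^{M^*} d\lambda\Big] = 2E[M_n M^*].
\end{aligned}$$
Using Cauchy-Schwarz, this is bounded by
$$2(EM_n^2)^{1/2} (E(M^*)^2)^{1/2}.$$
On the other hand,
$$\int_0^\infty 2\lambda P(M^* \ge \lambda)\, d\lambda = E \int_0^\infty 2\lambda 1_{(M^* \ge \lambda)}\, d\lambda = E \int_0^{M^*} 2\lambda\, d\lambda = E(M^*)^2.$$
We therefore have
$$E(M^*)^2 \le 2(EM_n^2)^{1/2} (E(M^*)^2)^{1/2}.$$
Recall we showed $E(M^*)^2 < \infty$. We divide both sides by $(E(M^*)^2)^{1/2}$, square both sides,
and obtain (b).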
Note 3. We will show that bounded martingales converge. (The hypothesis of boundedness
can be weakened; for example, $E|M_n| \le c < \infty$ for some $c$ not depending on $n$ suffices.)
Theorem 5.5. Suppose $M_n$ is a martingale bounded in absolute value by $K$. That is,
$|M_n| \le K$ for all $n$. Then $\lim_{n \to \infty} M_n$ exists a.s.
Proof. Since $M_n$ is bounded, it can’t tend to $+\infty$ or $-\infty$. The only possibility is that it
might oscillate. Let $a < b$ be two rationals. What might go wrong is that $M_n$ might be larger
than $b$ infinitely often and less than $a$ infinitely often. If we show the probability of this is 0,
then taking the union over all pairs of rationals $(a, b)$ shows that almost surely $M_n$ cannot
oscillate, and hence must converge.
Fix $a < b$, let $N_n = (M_n - a)^+$, and let $S_1 = \min\{k : N_k \le 0\}$,
$T_1 = \min\{k > S_1 : N_k \ge b - a\}$, $S_2 = \min\{k > T_1 : N_k \le 0\}$, and so on. Let
$U_n = \max\{k : T_k \le n\}$. $U_n$ is called the number of upcrossings up to time $n$. We want to
show that $\max_n U_n < \infty$ a.s. Note by Jensen’s inequality $N_n$ is a submartingale. Since
$S_1 < T_1 < S_2 < \cdots$, then $S_{n+1} > n$. The sums below telescope, so we can write
$$2K \ge N_n - N_{S_1 \wedge n} = \sum_{k=1}^{n+1} (N_{S_{k+1} \wedge n} - N_{T_k \wedge n}) + \sum_{k=1}^{n+1} (N_{T_k \wedge n} - N_{S_k \wedge n}).$$
Now take expectations. The expectation of the first sum on the right is greater than or equal
to zero by optional stopping. The second sum is at least $(b - a)U_n$, since each completed
upcrossing contributes at least $b - a$ and the remaining terms are nonnegative. So we conclude
$$(b - a) EU_n \le 2K.$$
Let $n \to \infty$ to see that $E \max_n U_n < \infty$, which implies $\max_n U_n < \infty$ a.s., which is what we
needed.
Note 4. We will state Fatou’s lemma in the following form.
If $X_n$ is a sequence of nonnegative random variables converging to $X$ a.s., then
$EX \le \sup_n EX_n$.
This formulation is equivalent to the classical one and is better suited for our use.
6. The one step binomial asset pricing model.
Let us begin by giving the simplest possible model of a stock and see how a European
call option should be valued in this context.
Suppose we have a single stock whose price is $S_0$. Let $d$ and $u$ be two numbers with
$0 < d < 1 < u$. Here “d” is a mnemonic for “down” and “u” for “up.” After one time unit
the stock price will be either $uS_0$ with probability $P$ or else $dS_0$ with probability $Q$, where
$P + Q = 1$. We will assume $0 < P, Q < 1$. Instead of purchasing shares in the stock, you
can also put your money in the bank where you will earn interest at rate $r$. Alternatives
to the bank are money market funds or bonds; the key point is that these are considered
to be risk-free.
A European call option in this context is the option to buy one share of the stock
at time 1 at price $K$. $K$ is called the strike price. Let $S_1$ be the price of the stock at time
1. If $S_1$ is less than $K$, then the option is worthless at time 1. If $S_1$ is greater than $K$, you
can use the option at time 1 to buy the stock at price $K$, immediately turn around and
sell the stock for price $S_1$ and make a profit of $S_1 - K$. So the value of the option at time
1 is
$$V_1 = (S_1 - K)^+,$$
where $x^+$ is $\max(x, 0)$. The principal question to be answered is: what is the value $V_0$ of
the option at time 0? In other words, how much should one pay for a European call option
with strike price $K$?
It is possible to buy a negative number of shares of a stock. This is equivalent to
selling shares of a stock you don’t have and is called selling short. If you sell one share
of stock short, then at time 1 you must buy one share at whatever the market price is at
that time and turn it over to the person that you sold the stock short to. Similarly you
can buy a negative number of options, that is, sell an option.
You can also deposit a negative amount of money in the bank, which is the same
as borrowing. We assume that you can borrow at the same interest rate r, not exactly a
totally realistic assumption. One way to make it seem more realistic is to assume you have
a large amount of money on deposit, and when you borrow, you simply withdraw money
from that account.
We are looking at the simplest possible model, so we are going to allow only one
time step: one makes an investment, and looks at it again one day later.
Let’s suppose the price of a European call option is $V_0$ and see what conditions
one can put on $V_0$. Suppose you start out with $V_0$ dollars. One thing you could do is
buy one option. The other thing you could do is use the money to buy $\Delta_0$ shares of
stock. If $V_0 > \Delta_0 S_0$, there will be some money left over and you put that in the bank. If
$V_0 < \Delta_0 S_0$, you do not have enough money to buy the stock, and you make up the shortfall
by borrowing money from the bank. In either case, at this point you have $V_0 - \Delta_0 S_0$ in
the bank and $\Delta_0$ shares of stock.
If the stock goes up, at time 1 you will have
$$\Delta_0 u S_0 + (1 + r)(V_0 - \Delta_0 S_0),$$
and if it goes down,
$$\Delta_0 d S_0 + (1 + r)(V_0 - \Delta_0 S_0).$$
We have not said what $\Delta_0$ should be. Let us do that now. Let $V_1^u = (uS_0 - K)^+$
and $V_1^d = (dS_0 - K)^+$. Note these are deterministic quantities, i.e., not random. Let
$$\Delta_0 = \frac{V_1^u - V_1^d}{uS_0 - dS_0},$$
and we will also need
$$W_0 = \frac{1}{1 + r} \Big[ \frac{1 + r - d}{u - d} V_1^u + \frac{u - (1 + r)}{u - d} V_1^d \Big].$$
In a moment we will do some algebra and see that if the stock goes up and you had
bought stock instead of the option you would now have
$$V_1^u + (1 + r)(V_0 - W_0),$$
while if the stock went down, you would now have
$$V_1^d + (1 + r)(V_0 - W_0).$$
Let’s check the first of these, the second being similar. We need to show
$$\Delta_0 u S_0 + (1 + r)(V_0 - \Delta_0 S_0) = V_1^u + (1 + r)(V_0 - W_0). \tag{6.1}$$
The left hand side of (6.1) is equal to
$$\Delta_0 S_0 (u - (1 + r)) + (1 + r)V_0 = \frac{V_1^u - V_1^d}{u - d} (u - (1 + r)) + (1 + r)V_0. \tag{6.2}$$
The right hand side of (6.1) is equal to
$$V_1^u - \Big[ \frac{1 + r - d}{u - d} V_1^u + \frac{u - (1 + r)}{u - d} V_1^d \Big] + (1 + r)V_0. \tag{6.3}$$
Now check that the coefficients of $V_0$, of $V_1^u$, and of $V_1^d$ agree in (6.2) and (6.3).
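The identity (6.1) and its counterpart for a down move can also be confirmed with exact arithmetic. The numbers in this sketch ($S_0 = 10$, $K = 15$, $u = 2$, $d = 1/2$, $r = 1/10$, and a candidate price $V_0 = 3$) are illustrative choices, and the function names are invented for the example.

```python
from fractions import Fraction as F

def one_step(S0, K, u, d, r):
    """Return (Delta0, W0, Vu, Vd) for a one-step binomial call with strike K."""
    Vu = max(u * S0 - K, 0)              # option payoff if the stock goes up
    Vd = max(d * S0 - K, 0)              # option payoff if the stock goes down
    delta0 = (Vu - Vd) / (u * S0 - d * S0)
    p = (1 + r - d) / (u - d)
    q = (u - (1 + r)) / (u - d)
    W0 = (p * Vu + q * Vd) / (1 + r)
    return delta0, W0, Vu, Vd

def wealth_at_time_1(V0, delta0, S0, factor, r):
    """Hold delta0 shares, keep V0 - delta0*S0 in the bank; stock moves by `factor`."""
    return delta0 * factor * S0 + (1 + r) * (V0 - delta0 * S0)

# Hypothetical numbers, chosen only for illustration.
S0, K, u, d, r = F(10), F(15), F(2), F(1, 2), F(1, 10)
delta0, W0, Vu, Vd = one_step(S0, K, u, d, r)
V0 = F(3)                                # an arbitrary candidate option price
up = wealth_at_time_1(V0, delta0, S0, u, r)
down = wealth_at_time_1(V0, delta0, S0, d, r)
print(up - Vu == (1 + r) * (V0 - W0), down - Vd == (1 + r) * (V0 - W0))
```

In both states the stock-and-bank position exceeds the option payoff by the same amount $(1 + r)(V_0 - W_0)$, which is what drives the arbitrage argument.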
Suppose that $V_0 > W_0$. What you want to do is come along with no money, sell
one option for $V_0$ dollars, use the money to buy $\Delta_0$ shares, and put the rest in the bank
(or borrow if necessary). If the buyer of your option wants to exercise the option, you give
him one share of stock and sell the rest. If he doesn’t want to exercise the option, you sell
your shares of stock and pocket the money. Remember it is possible to have a negative
number of shares. You will have cleared $(1 + r)(V_0 - W_0)$, whether the stock went up or
down, with no risk.
If $V_0 < W_0$, you just do the opposite: sell $\Delta_0$ shares of stock short, buy one option,
and deposit or make up the shortfall from the bank. This time, you clear $(1 + r)(W_0 - V_0)$,
whether the stock goes up or down.
Now most people believe that you can’t make a profit on the stock market without
taking a risk. The name for this is “no free lunch,” or “arbitrage opportunities do not
exist.” The only way to avoid this is if $V_0 = W_0$. In other words, we have shown that the
only reasonable price for the European call option is $W_0$.
The “no arbitrage” condition is not just a reflection of the belief that one cannot get
something for nothing. It also represents the belief that the market is freely competitive.
The way it works is this: suppose $W_0 = \$3$. Suppose you could sell options at a price
$V_0 = \$5$; this is larger than $W_0$ and you would earn $V_0 - W_0 = \$2$ per option without risk.
Then someone else would observe this and decide to sell the same option at a price less
than $V_0$ but larger than $W_0$, say \$4. This person would still make a profit, and customers
would go to him and ignore you because they would be getting a better deal. But then a
third person would decide to sell the option for less than your competition but more than
$W_0$, say at \$3.50. This would continue as long as anyone would try to sell an option above
price $W_0$.
We will examine this problem of pricing options in more complicated contexts, and
while doing so, it will become apparent where the formulas for $\Delta_0$ and $W_0$ came from. At
this point, we want to make a few observations.
Remark 6.1. First of all, if $1 + r > u$, one would never buy stock, since one can always
do better by putting money in the bank. So we may suppose $1 + r < u$. We always have
$1 + r \ge 1 > d$. If we set
$$p = \frac{1 + r - d}{u - d}, \qquad q = \frac{u - (1 + r)}{u - d},$$
then $p, q \ge 0$ and $p + q = 1$. Thus $p$ and $q$ act like probabilities, but they have nothing to
do with $P$ and $Q$. Note also that the price $V_0 = W_0$ does not depend on $P$ or $Q$. It does
depend on $p$ and $q$, which seems to suggest that there is an underlying probability which
controls the option price and is not the one that governs the stock price.
Remark 6.2. There is nothing special about European call options in our argument
above. One could let $V_1^u$ and $V_1^d$ be any two values of any option, which are paid out if the
stock goes up or down, respectively. The above analysis shows we can exactly duplicate
the result of buying any option $V$ by instead buying some shares of stock. If in some model
one can do this for any option, the market is called complete in this model.
Remark 6.3. If we let $\mathbb{P}$ be the probability under which $S_1 = uS_0$ with probability $p$ and
$S_1 = dS_0$ with probability $q$ (as opposed to the probabilities $P$ and $Q$ above), and we let
$\mathbb{E}$ be the corresponding expectation, then some algebra shows that
$$V_0 = \frac{1}{1 + r} \mathbb{E} V_1.$$
This will be generalized later.
Remark 6.4. If one buys one share of stock at time 0, then one expects at time 1 to
have $(Pu + Qd)S_0$. One then divides by $1 + r$ to get the value of the stock in today’s
dollars. ($r$, the risk-free interest rate, can also be considered the rate of inflation. A dollar
tomorrow is equivalent to $1/(1 + r)$ dollars today.) Suppose instead of $P$ and $Q$ being the
probabilities of going up and down, they were in fact $p$ and $q$. One would then expect to
have $(pu + qd)S_0$ and then divide by $1 + r$. Substituting the values for $p$ and $q$, this reduces
to $S_0$. In other words, if $p$ and $q$ were the correct probabilities, one would expect to have
the same amount of money one started with. When we get to the binomial asset pricing
model with more than one step, we will see that the generalization of this fact is that the
discounted stock price at time $n$ is a martingale, still with the assumption that $p$ and $q$ are
the correct probabilities. This is a special case of the fundamental theorem of finance: there
always exists some probability, not necessarily the one you observe, under which the
discounted stock price is a martingale.
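The substitution mentioned in this remark is short enough to write out. With $p$ and $q$ as in Remark 6.1, the computation goes as follows.

```latex
\begin{aligned}
pu + qd &= \frac{(1 + r - d)\,u + (u - (1 + r))\,d}{u - d}
         = \frac{(1 + r)u - du + ud - (1 + r)d}{u - d} \\
        &= \frac{(1 + r)(u - d)}{u - d} = 1 + r,
\qquad\text{so}\qquad
\frac{(pu + qd)\,S_0}{1 + r} = S_0 .
\end{aligned}
```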
Remark 6.5. Our model allows after one time step the possibility of the stock going up or
going down, but only these two options. What if instead there are 3 (or more) possibilities?
Suppose, for example, that the stock goes up a factor $u$ with probability $P$, down a factor
$d$ with probability $Q$, and remains constant with probability $R$, where $P + Q + R = 1$.
The corresponding payoff of a European call option would be $(uS_0 - K)^+$, $(dS_0 - K)^+$, or
$(S_0 - K)^+$. If one could replicate this outcome by buying and selling shares of the stock,
then the “no arbitrage” rule would give the exact value of the call option in this model.
But, except in very special circumstances, one cannot do this, and the theory falls apart.
One has three equations one wants to satisfy, in terms of $V_1^u$, $V_1^d$, and $V_1^c$. (The “c” is
a mnemonic for “constant.”) There are however only two variables, $\Delta_0$ and $V_0$, at your
disposal, and most of the time three equations in two unknowns cannot be solved.
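A quick numerical sketch shows the failure of replication in the three-outcome model. It solves the “up” and “down” equations for $\Delta_0$ and $V_0$ and then checks whether the “constant” equation happens to hold. The parameter values are illustrative assumptions and the function name is hypothetical.

```python
from fractions import Fraction as F

def replicate_two_of_three(S0, K, u, d, r):
    """Solve the up and down equations for (Delta0, V0), then test the third."""
    Vu, Vd, Vc = (max(f * S0 - K, 0) for f in (u, d, F(1)))
    delta0 = (Vu - Vd) / ((u - d) * S0)
    # from  delta0*u*S0 + (1 + r)*(V0 - delta0*S0) = Vu
    V0 = delta0 * S0 + (Vu - delta0 * u * S0) / (1 + r)
    # wealth of the same portfolio if the stock price stays at S0
    wealth_if_constant = delta0 * S0 + (1 + r) * (V0 - delta0 * S0)
    return delta0, V0, wealth_if_constant, Vc

delta0, V0, wealth_c, Vc = replicate_two_of_three(F(10), F(15), F(2), F(1, 2), F(1, 10))
print(wealth_c, Vc, wealth_c == Vc)
```

For these numbers the portfolio that matches the option in the up and down states does not match it when the price is unchanged, so no choice of $\Delta_0$ and $V_0$ satisfies all three equations.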
Remark 6.6. In our model we ruled out the cases that P or Q were zero. If Q = 0,
that is, we are certain that the stock will go up, then we would always invest in the stock
if u > 1 + r, as we would always do better, and we would always put the money in the
bank if u ≤ 1 +r. Similar considerations apply when P = 0. It is interesting to note that
the cases where P = 0 or Q = 0 are the only ones in which our derivation is not valid.
It turns out that in more general models the true probabilities enter only in determining
which events have probability 0 or 1 and in no other way.
7. The multi-step binomial asset pricing model.
In this section we will obtain a formula for the pricing of options when there are n
time steps, but each time the stock can only go up by a factor u or down by a factor d.
The “Black-Scholes” formula we will obtain is already a nontrivial result that is useful.
We assume the following.
(1) Unlimited short selling of stock
(2) Unlimited borrowing
(3) No transaction costs
(4) Our buying and selling is on a small enough scale that it does not affect the market.
We need to set up the probability model. $\Omega$ will be all sequences of length $n$ of H’s
and T’s. $S_0$ will be a fixed number and we define $S_k(\omega) = u^j d^{k-j} S_0$ if the first $k$ elements
of a given $\omega \in \Omega$ have $j$ occurrences of H and $k - j$ occurrences of T. (What we are doing is
saying that if the $j$-th element of the sequence making up $\omega$ is an H, then the stock price
goes up by a factor $u$; if T, then down by a factor $d$.) $\mathcal{F}_k$ will be the σ-field generated by
$S_0, \ldots, S_k$.
Let
$$p = \frac{(1 + r) - d}{u - d}, \qquad q = \frac{u - (1 + r)}{u - d}$$
and define $\mathbb{P}(\omega) = p^j q^{n-j}$ if $\omega$ has $j$ appearances of H and $n - j$ appearances of T. We
observe that under $\mathbb{P}$ the random variables $S_{k+1}/S_k$ are independent and equal to $u$ with
probability $p$ and $d$ with probability $q$. To see this, let $Y_k = S_k/S_{k-1}$. Thus $Y_k$ is the
factor the stock price goes up or down at time $k$. Then
$\mathbb{P}(Y_1 = y_1, \ldots, Y_n = y_n) = p^j q^{n-j}$, where $j$ is the number of the $y_k$ that are equal to $u$.
On the other hand, this is equal to $\mathbb{P}(Y_1 = y_1) \cdots \mathbb{P}(Y_n = y_n)$. Let $\mathbb{E}$ denote the
expectation corresponding to $\mathbb{P}$.
The P we construct may not be the true probabilities of going up or down. That
doesn’t matter - it will turn out that using the principle of “no arbitrage,” it is P that
governs the price.
Our first result is the fundamental theorem of finance in the current context.
Proposition 7.1. Under $\mathbb{P}$ the discounted stock price $(1 + r)^{-k} S_k$ is a martingale.
Proof. Since the random variable $S_{k+1}/S_k$ is independent of $\mathcal{F}_k$, we have
$$\mathbb{E}[(1 + r)^{-(k+1)} S_{k+1} \mid \mathcal{F}_k] = (1 + r)^{-k} S_k (1 + r)^{-1} \mathbb{E}[S_{k+1}/S_k \mid \mathcal{F}_k].$$
Using the independence the conditional expectation on the right is equal to
$$\mathbb{E}[S_{k+1}/S_k] = pu + qd = 1 + r.$$
Substituting yields the proposition.
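The proposition can be checked by enumeration in a small model. The sketch below walks over every initial segment of tosses (an atom of $\mathcal{F}_k$) and compares the conditional expectation of the discounted price one step later with its current value. The parameter values and the function name are assumptions made for the illustration.

```python
from fractions import Fraction as F
from itertools import product

def discounted_price_is_martingale(S0, u, d, r, n):
    """Check E[(1+r)^-(k+1) S_{k+1} | F_k] = (1+r)^-k S_k under the measure built from p, q."""
    p = (1 + r - d) / (u - d)
    q = 1 - p
    for k in range(n):
        for prefix in product("HT", repeat=k):        # an atom of F_k
            j = prefix.count("H")
            S_k = u**j * d**(k - j) * S0
            # conditional expectation of the discounted price one step later
            cond = (p * u * S_k + q * d * S_k) / (1 + r) ** (k + 1)
            if cond != S_k / (1 + r) ** k:
                return False
    return True

print(discounted_price_is_martingale(F(10), F(2), F(1, 2), F(1, 10), 4))
```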
Let $\Delta_k$ be the number of shares held between times $k$ and $k + 1$. We require $\Delta_k$
to be $\mathcal{F}_k$ measurable. $\Delta_0, \Delta_1, \ldots$ is called the portfolio process. Let $W_0$ be the amount
of money you start with and let $W_k$ be the amount of money you have at time $k$. $W_k$ is
the wealth process. If we have $\Delta_k$ shares between times $k$ and $k + 1$, then at time $k + 1$
those shares will be worth $\Delta_k S_{k+1}$. The amount of cash we hold between time $k$ and $k + 1$
is $W_k$ minus the amount held in stock, that is, $W_k - \Delta_k S_k$. At time $k + 1$ this is worth
$(1 + r)[W_k - \Delta_k S_k]$. Therefore
$$W_{k+1} = \Delta_k S_{k+1} + (1 + r)[W_k - \Delta_k S_k].$$
Note that in the case where $r = 0$ we have
$$W_{k+1} - W_k = \Delta_k (S_{k+1} - S_k),$$
or
$$W_{k+1} = W_0 + \sum_{i=0}^{k} \Delta_i (S_{i+1} - S_i).$$
This is a discrete version of a stochastic integral. Since
$$\mathbb{E}[W_{k+1} - W_k \mid \mathcal{F}_k] = \Delta_k \mathbb{E}[S_{k+1} - S_k \mid \mathcal{F}_k] = 0,$$
it follows that in the case $r = 0$, $W_k$ is a martingale. More generally
Proposition 7.2. Under $\mathbb{P}$ the discounted wealth process $(1 + r)^{-k} W_k$ is a martingale.
Proof. We have
$$(1 + r)^{-(k+1)} W_{k+1} = (1 + r)^{-k} W_k + \Delta_k [(1 + r)^{-(k+1)} S_{k+1} - (1 + r)^{-k} S_k].$$
Observe that
$$\begin{aligned}
\mathbb{E}[\Delta_k [(1 + r)^{-(k+1)} S_{k+1} &- (1 + r)^{-k} S_k] \mid \mathcal{F}_k] \\
&= \Delta_k \mathbb{E}[(1 + r)^{-(k+1)} S_{k+1} - (1 + r)^{-k} S_k \mid \mathcal{F}_k] = 0.
\end{aligned}$$
The result follows.
Our next result is that the binomial model is complete. It is easy to lose the idea
in the algebra, so first let us try to see why the theorem is true.
For simplicity let us first consider the case $r = 0$. Let $V_k = \mathbb{E}[V \mid \mathcal{F}_k]$; by
Proposition 4.3 we see that $V_k$ is a martingale. We want to construct a portfolio process, i.e.,
choose $\Delta_k$’s, so that $W_n = V$. We will do it inductively by arranging matters so that
$W_k = V_k$ for all $k$. Recall that $W_k$ is also a martingale.
Suppose we have $W_k = V_k$ at time $k$ and we want to find $\Delta_k$ so that $W_{k+1} = V_{k+1}$.
At the $(k + 1)$-st step there are only two possible changes for the price of the stock and so
since $V_{k+1}$ is $\mathcal{F}_{k+1}$ measurable, only two possible values for $V_{k+1}$. We need to choose $\Delta_k$
so that $W_{k+1} = V_{k+1}$ for each of these two possibilities. We only have one parameter, $\Delta_k$,
to play with to match up two numbers, which may seem like an overconstrained system of
equations. But both $V$ and $W$ are martingales, which is why the system can be solved.
Now let us turn to the details. In the following proof we allow r ≥ 0.
Theorem 7.3. The binomial asset pricing model is complete.
The precise meaning of this is the following. If $V$ is any random variable that is $\mathcal{F}_n$
measurable, there exists a constant $W_0$ and a portfolio process $\Delta_k$ so that the wealth
process $W_k$ satisfies $W_n = V$. In other words, starting with $W_0$ dollars, we can trade
shares of stock to exactly duplicate the outcome of any option $V$.
Proof. Let
$$V_k = (1 + r)^k \mathbb{E}[(1 + r)^{-n} V \mid \mathcal{F}_k].$$
By Proposition 4.3 $(1 + r)^{-k} V_k$ is a martingale. If $\omega = (t_1, \ldots, t_n)$, where each $t_i$ is an H
or T, let
$$\Delta_k(\omega) = \frac{V_{k+1}(t_1, \ldots, t_k, H, t_{k+2}, \ldots, t_n) - V_{k+1}(t_1, \ldots, t_k, T, t_{k+2}, \ldots, t_n)}{S_{k+1}(t_1, \ldots, t_k, H, t_{k+2}, \ldots, t_n) - S_{k+1}(t_1, \ldots, t_k, T, t_{k+2}, \ldots, t_n)}.$$
Set $W_0 = V_0$, and we will show by induction that the wealth process at time $k$ equals $V_k$.
The first thing to show is that $\Delta_k$ is $\mathcal{F}_k$ measurable. Neither $S_{k+1}$ nor $V_{k+1}$ depends
on $t_{k+2}, \ldots, t_n$. So $\Delta_k$ depends only on the variables $t_1, \ldots, t_k$, hence is $\mathcal{F}_k$ measurable.
Now $t_{k+2}, \ldots, t_n$ play no role in the rest of the proof, and $t_1, \ldots, t_k$ will be fixed,
so we drop the $t$’s from the notation. If we write $V_{k+1}(H)$, this is an abbreviation for
$V_{k+1}(t_1, \ldots, t_k, H, t_{k+2}, \ldots, t_n)$.
We know $(1 + r)^{-k} V_k$ is a martingale under $\mathbb{P}$ so that
$$V_k = \mathbb{E}[(1 + r)^{-1} V_{k+1} \mid \mathcal{F}_k] = \frac{1}{1 + r} [p V_{k+1}(H) + q V_{k+1}(T)]. \tag{7.1}$$
(See Note 1.) We now suppose $W_k = V_k$ and want to show $W_{k+1}(H) = V_{k+1}(H)$ and
$W_{k+1}(T) = V_{k+1}(T)$. Then using induction we have $W_n = V_n = V$ as required. We show
the first equality, the second being similar.
$$\begin{aligned}
W_{k+1}(H) &= \Delta_k S_{k+1}(H) + (1 + r)[W_k - \Delta_k S_k] \\
&= \Delta_k [u S_k - (1 + r) S_k] + (1 + r) V_k \\
&= \frac{V_{k+1}(H) - V_{k+1}(T)}{(u - d) S_k} S_k [u - (1 + r)] + p V_{k+1}(H) + q V_{k+1}(T) \\
&= V_{k+1}(H).
\end{aligned}$$
We are done.
Finally, we obtain the Black-Scholes formula in this context. Let $V$ be any option
that is $\mathcal{F}_n$-measurable. The one we have in mind is the European call, for which
$V = (S_n - K)^+$, but the argument is the same for any option whatsoever.
Theorem 7.4. The value of the option $V$ at time 0 is $V_0 = (1 + r)^{-n} \mathbb{E} V$.
Proof. We can construct a portfolio process $\Delta_k$ so that if we start with
$W_0 = (1 + r)^{-n} \mathbb{E} V$, then the wealth at time $n$ will equal $V$, no matter what the market does in
between. If we could buy or sell the option $V$ at a price other than $W_0$, we could obtain a riskless
profit. That is, if the option $V$ could be sold at a price $c_0$ larger than $W_0$, we would sell
the option for $c_0$ dollars, use $W_0$ to buy and sell stock according to the portfolio process
$\Delta_k$, have a net worth of $V + (1 + r)^n (c_0 - W_0)$ at time $n$, meet our obligation to the buyer
of the option by using $V$ dollars, and have a net profit, at no risk, of $(1 + r)^n (c_0 - W_0)$.
If $c_0$ were less than $W_0$, we would do the same except buy an option, hold $-\Delta_k$ shares at
time $k$, and again make a riskless profit. By the “no arbitrage” rule, that can’t happen,
so the price of the option $V$ must be $W_0$.
Remark 7.5. Note that the proof of Theorem 7.4 tells you precisely what hedging
strategy (i.e., what portfolio process) to use.
In the binomial asset pricing model, there is no difficulty computing the price of a
European call. We have
$$\mathbb{E}(S_n - K)^+ = \sum_x (x - K)^+ \mathbb{P}(S_n = x)$$
and
$$\mathbb{P}(S_n = x) = \binom{n}{k} p^k q^{n-k}$$
if $x = u^k d^{n-k} S_0$. Therefore the price of the European call is
$$(1 + r)^{-n} \sum_{k=0}^{n} (u^k d^{n-k} S_0 - K)^+ \binom{n}{k} p^k q^{n-k}.$$
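The closed-form sum for the call price translates directly into a few lines of code. This is a minimal sketch using floating point arithmetic; the function name is invented for the example, and the sample call uses the parameters $n = 3$, $u = 2$, $d = 1/2$, $r = 0.1$, $S_0 = 10$, $K = 15$ of the worked example in this section.

```python
from math import comb

def binomial_call_price(S0, K, u, d, r, n):
    """Price of a European call in the n-step binomial model."""
    p = (1 + r - d) / (u - d)
    q = (u - (1 + r)) / (u - d)
    total = 0.0
    for k in range(n + 1):                       # k = number of up moves
        payoff = max(u**k * d**(n - k) * S0 - K, 0.0)
        total += payoff * comb(n, k) * p**k * q**(n - k)
    return total / (1 + r) ** n

print(round(binomial_call_price(10, 15, 2, 0.5, 0.1, 3), 4))
```

Only $n + 1$ terms are needed because the payoff of a call depends on the path only through the number of up moves.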
The formula in Theorem 7.4 holds for exotic options as well. Suppose
$$V = \max_{i=1,\ldots,n} S_i - \min_{j=1,\ldots,n} S_j.$$
In other words, you sell the stock for the maximum value it takes during the first $n$ time
steps and you buy at the minimum value the stock takes; you are allowed to wait until
time $n$ and look back to see what the maximum and minimum were. You can even do this
if the maximum comes before the minimum. This $V$ is still $\mathcal{F}_n$ measurable, so the theory
applies. Naturally, such a “buy low, sell high” option is very desirable, and the price of
such a $V$ will be quite high. It is interesting that even without using options, you can
duplicate the operation of buying low and selling high by holding an appropriate number
of shares $\Delta_k$ at time $k$, where you do not look into the future to determine $\Delta_k$.
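Because this payoff depends on the whole path, the binomial-coefficient shortcut no longer applies, but Theorem 7.4 can still be evaluated by summing over all $2^n$ paths. The sketch below does this in floating point; the function name and the sample parameters are illustrative assumptions.

```python
from itertools import product

def lookback_price(S0, u, d, r, n):
    """Price of V = max_i S_i - min_j S_j (i, j = 1..n) by path enumeration."""
    p = (1 + r - d) / (u - d)
    q = 1 - p
    expected = 0.0
    for w in product("HT", repeat=n):
        prices, s = [], S0
        for t in w:
            s *= u if t == "H" else d
            prices.append(s)                     # S_1, ..., S_n along this path
        prob = p ** w.count("H") * q ** w.count("T")
        expected += prob * (max(prices) - min(prices))
    return expected / (1 + r) ** n

print(lookback_price(10, 2, 0.5, 0.1, 3))
```

The enumeration costs $2^n$ steps, so this is only practical for small $n$.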
Let us look at an example of a European call so that it is clear how to do the
calculations. Consider the binomial asset pricing model with $n = 3$, $u = 2$, $d = \frac12$, $r = 0.1$,
$S_0 = 10$, and $K = 15$. If $V$ is a European call with strike price $K$ and exercise date $n$, let
us compute explicitly the random variables $V_1$ and $V_2$ and calculate the value $V_0$. Let us
also compute the hedging strategy $\Delta_0$, $\Delta_1$, and $\Delta_2$.
Let
$$p = \frac{(1 + r) - d}{u - d} = .4, \qquad q = \frac{u - (1 + r)}{u - d} = .6.$$
The following table describes the values of the stock, the payoff $V$, and the probabilities
for each possible outcome $\omega$.
ω      S_1     S_2       S_3        V     Probability
HHH    10u     10u^2     10u^3      65    p^3
HHT    10u     10u^2     10u^2 d    5     p^2 q
HTH    10u     10ud      10u^2 d    5     p^2 q
HTT    10u     10ud      10ud^2     0     p q^2
THH    10d     10ud      10u^2 d    5     p^2 q
THT    10d     10ud      10ud^2     0     p q^2
TTH    10d     10d^2     10ud^2     0     p q^2
TTT    10d     10d^2     10d^3      0     q^3
We then calculate
$$V_0 = (1 + r)^{-3} \mathbb{E} V = (1 + r)^{-3} (65p^3 + 15p^2 q) = 4.2074.$$
$V_1 = (1 + r)^{-2} \mathbb{E}[V \mid \mathcal{F}_1]$, so we have
$$V_1(H) = (1 + r)^{-2} (65p^2 + 10pq) = 10.5785, \qquad V_1(T) = (1 + r)^{-2}\, 5p^2 = .6612.$$
$V_2 = (1 + r)^{-1} \mathbb{E}[V \mid \mathcal{F}_2]$, so we have
$$V_2(HH) = (1 + r)^{-1} (65p + 5q) = 26.3636, \qquad V_2(HT) = (1 + r)^{-1}\, 5p = 1.8182,$$
$$V_2(TH) = (1 + r)^{-1}\, 5p = 1.8182, \qquad V_2(TT) = 0.$$
The formula for $\Delta_k$ is given by
$$\Delta_k = \frac{V_{k+1}(H) - V_{k+1}(T)}{S_{k+1}(H) - S_{k+1}(T)},$$
so
$$\Delta_0 = \frac{V_1(H) - V_1(T)}{S_1(H) - S_1(T)} = .6612,$$
where $V_1$ and $S_1$ are as above.
$$\Delta_1(H) = \frac{V_2(HH) - V_2(HT)}{S_2(HH) - S_2(HT)} = .8182, \qquad \Delta_1(T) = \frac{V_2(TH) - V_2(TT)}{S_2(TH) - S_2(TT)} = .2424.$$
$$\Delta_2(HH) = \frac{V_3(HHH) - V_3(HHT)}{S_3(HHH) - S_3(HHT)} = 1.0,$$
$$\Delta_2(HT) = \frac{V_3(HTH) - V_3(HTT)}{S_3(HTH) - S_3(HTT)} = .3333,$$
$$\Delta_2(TH) = \frac{V_3(THH) - V_3(THT)}{S_3(THH) - S_3(THT)} = .3333,$$
$$\Delta_2(TT) = \frac{V_3(TTH) - V_3(TTT)}{S_3(TTH) - S_3(TTT)} = 0.0.$$
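The whole example can be reproduced, and the hedge verified, with a short program. The sketch below computes $V_k$ by the backward recursion (7.1), computes $\Delta_k$ from the formula above, and then runs the wealth recursion $W_{k+1} = \Delta_k S_{k+1} + (1 + r)[W_k - \Delta_k S_k]$ along every path starting from $W_0 = V_0$. Exact fractions are used; the function names are chosen for this illustration only.

```python
from fractions import Fraction as F
from itertools import product

S0, K, u, d, r, n = F(10), F(15), F(2), F(1, 2), F(1, 10), 3
p = (1 + r - d) / (u - d)
q = 1 - p

def stock(prefix):
    """Stock price after the tosses in `prefix` (a tuple of 'H'/'T')."""
    h = prefix.count("H")
    return u**h * d**(len(prefix) - h) * S0

def value(prefix):
    """V_k on the atom `prefix`, computed by the backward recursion (7.1)."""
    if len(prefix) == n:
        return max(stock(prefix) - K, F(0))
    return (p * value(prefix + ("H",)) + q * value(prefix + ("T",))) / (1 + r)

def delta(prefix):
    """Hedge ratio Delta_k on the atom `prefix`."""
    up, down = prefix + ("H",), prefix + ("T",)
    return (value(up) - value(down)) / (stock(up) - stock(down))

def replicates():
    """Run the wealth recursion from W_0 = V_0 along every path."""
    for w in product("HT", repeat=n):
        wealth = value(())
        for k in range(n):
            pre = w[:k]
            wealth = (delta(pre) * stock(w[:k + 1])
                      + (1 + r) * (wealth - delta(pre) * stock(pre)))
        if wealth != value(w):       # terminal wealth must equal the payoff
            return False
    return True

print(float(value(())), float(delta(())), replicates())
```

Evaluating `value` and `delta` on other prefixes, for example `("H",)` or `("T", "H")`, gives the remaining entries of the example.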
Note 1. The second equality in (7.1) is not entirely obvious. Intuitively, it says that one has
a heads with probability $p$ and the value of $V_{k+1}$ is $V_{k+1}(H)$ and one has tails with probability
$q$, and the value of $V_{k+1}$ is $V_{k+1}(T)$.
Let us give a more rigorous proof of (7.1). The right hand side of (7.1) is $\mathcal{F}_k$ measurable,
so we need to show that if $A \in \mathcal{F}_k$, then
$$\mathbb{E}[V_{k+1}; A] = \mathbb{E}[p V_{k+1}(H) + q V_{k+1}(T); A].$$
By linearity, it suffices to show this for $A = \{\omega = (t_1 t_2 \cdots t_n) : t_1 = s_1, \ldots, t_k = s_k\}$, where
$s_1 s_2 \cdots s_k$ is any sequence of H’s and T’s. Now
$$\begin{aligned}
\mathbb{E}[V_{k+1}; s_1 \cdots s_k] &= \mathbb{E}[V_{k+1}; s_1 \cdots s_k H] + \mathbb{E}[V_{k+1}; s_1 \cdots s_k T] \\
&= V_{k+1}(s_1 \cdots s_k H) \mathbb{P}(s_1 \cdots s_k H) + V_{k+1}(s_1 \cdots s_k T) \mathbb{P}(s_1 \cdots s_k T).
\end{aligned}$$
By independence this is
$$V_{k+1}(s_1 \cdots s_k H) \mathbb{P}(s_1 \cdots s_k)\, p + V_{k+1}(s_1 \cdots s_k T) \mathbb{P}(s_1 \cdots s_k)\, q,$$
which is what we wanted.
8. American options.
An American option is one where you can exercise the option any time before some
fixed time T. For example, on a European call, one can only use it to buy a share of stock
at the expiration time T, while for an American call, at any time before time T, one can
decide to pay K dollars and obtain a share of stock.
Let us give an informal argument on how to price an American call, giving a more
rigorous argument in a moment. One can always wait until time $T$ to exercise an American
call, so the value must be at least as great as that of a European call. On the other hand,
suppose you decide to exercise early. You pay $K$ dollars, receive one share of stock, and
your wealth is $S_t - K$. You hold onto the stock, and at time $T$ you have one share of stock
worth $S_T$, and for which you paid $K$ dollars. So your wealth is $S_T - K \le (S_T - K)^+$. In
fact, we have strict inequality, because you lost the interest on your $K$ dollars that you
would have received if you had waited to exercise until time $T$. Therefore an American
call is worth no more than a European call, and hence its value must be the same as that
of a European call.
This argument does not work for puts, because selling stock gives you some money
on which you will receive interest, so it may be advantageous to exercise early. (A put is
the option to sell a stock at a price K at time T.)
Here is the more rigorous argument. Suppose that if you exercise the option at time
$k$, your payoff is $g(S_k)$. In present day dollars, that is, after correcting for inflation, you
have $(1 + r)^{-k} g(S_k)$. You have to make a decision on when to exercise the option, and that
decision can only be based on what has already happened, not on what is going to happen
in the future. In other words, we have to choose a stopping time $\tau$, and we exercise the
option at time $\tau(\omega)$. Thus our payoff is $(1 + r)^{-\tau} g(S_\tau)$. This is a random quantity. What
we want to do is find the stopping time that maximizes the expected value of this random
variable. As usual, we work with $\mathbb{P}$, and thus we are looking for the stopping time $\tau$ such
that $\tau \le n$ and
$$\mathbb{E}(1 + r)^{-\tau} g(S_\tau)$$
is as large as possible. The problem of finding such a $\tau$ is called an optimal stopping
problem.
Suppose $g(x)$ is convex with $g(0) = 0$. Certainly $g(x) = (x-K)^+$ is such a function. We will show that $\tau \equiv n$ is the solution to the above optimal stopping problem: the best time to exercise is as late as possible.
We have
$$g(\lambda x) = g(\lambda x + (1-\lambda)\cdot 0) \le \lambda g(x) + (1-\lambda) g(0) = \lambda g(x), \qquad 0 \le \lambda \le 1. \tag{8.1}$$
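By Jensen's inequality,
$$\begin{aligned}
E\big[(1+r)^{-(k+1)} g(S_{k+1}) \mid \mathcal{F}_k\big]
&= (1+r)^{-k}\, E\Big[\frac{1}{1+r}\, g(S_{k+1}) \,\Big|\, \mathcal{F}_k\Big] \\
&\ge (1+r)^{-k}\, E\Big[g\Big(\frac{1}{1+r}\, S_{k+1}\Big) \,\Big|\, \mathcal{F}_k\Big] \\
&\ge (1+r)^{-k}\, g\Big(E\Big[\frac{1}{1+r}\, S_{k+1} \,\Big|\, \mathcal{F}_k\Big]\Big) \\
&= (1+r)^{-k}\, g(S_k).
\end{aligned}$$
For the first inequality we used (8.1). So $(1+r)^{-k} g(S_k)$ is a submartingale. By optional stopping,
$$E\big[(1+r)^{-\tau} g(S_\tau)\big] \le E\big[(1+r)^{-n} g(S_n)\big],$$
so $\tau \equiv n$ always does best.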
For puts, the payoff is $g(S_k)$, where $g(x) = (K-x)^+$. This is also a convex function, but this time $g(0) = K \ne 0$, and the above argument fails.
Although good approximations are known, an exact solution to the problem of
valuing an American put is unknown, and is one of the major unsolved problems in financial
mathematics.
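The comparison between calls and puts can be checked numerically. The following is a minimal sketch, assuming a multiplicative binomial model with up factor `u`, down factor `d`, and per-period interest rate `r`; the function name `binomial_prices` and the numerical parameters are illustrative choices, not taken from the notes. It prices the European and American versions of a payoff $g$ by backward induction, allowing early exercise at every node for the American version.

```python
def binomial_prices(s0, r, u, d, n, payoff):
    """Return (european, american) prices on an n-step binomial tree.

    Assumes d < 1 + r < u so that p below is a genuine probability.
    """
    p = (1 + r - d) / (u - d)      # risk-neutral probability of an up move
    disc = 1.0 / (1 + r)           # one-period discount factor
    # Option values at the final time n, indexed by the number j of up moves.
    euro = [payoff(s0 * u**j * d**(n - j)) for j in range(n + 1)]
    amer = euro[:]
    for step in range(n - 1, -1, -1):
        new_euro, new_amer = [], []
        for j in range(step + 1):
            s = s0 * u**j * d**(step - j)
            cont_e = disc * (p * euro[j + 1] + (1 - p) * euro[j])
            cont_a = disc * (p * amer[j + 1] + (1 - p) * amer[j])
            new_euro.append(cont_e)
            # American holder takes the better of exercising now or waiting.
            new_amer.append(max(payoff(s), cont_a))
        euro, amer = new_euro, new_amer
    return euro[0], amer[0]


K = 100.0
call = lambda s: max(s - K, 0.0)
put = lambda s: max(K - s, 0.0)

euro_call, amer_call = binomial_prices(100.0, 0.05, 1.2, 0.8, 3, call)
euro_put, amer_put = binomial_prices(100.0, 0.05, 1.2, 0.8, 3, put)
print(euro_call, amer_call)   # the two call prices coincide
print(euro_put, amer_put)     # the American put is worth strictly more
```

With these parameters early exercise never helps the call, while for the put it is strictly better at the node where the stock has fallen twice.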
9. Continuous random variables.
We are now going to start working toward continuous times and stocks that can
take any positive number as a value, so we need to prepare by extending some of our
definitions.
Given any random variable $X \ge 0$, we can approximate it by r.v.'s $X_n$ that are discrete. We let
$$X_n = \sum_{i=0}^{n2^n} \frac{i}{2^n}\, 1_{(i/2^n \le X < (i+1)/2^n)}.$$
In words, if $X(\omega)$ lies between 0 and $n$, we let $X_n(\omega)$ be the closest value $i/2^n$ that is less than or equal to $X(\omega)$. For $\omega$ where $X(\omega) \ge n + 2^{-n}$ we set $X_n(\omega) = 0$. Clearly the $X_n$ are discrete, and approximate $X$. In fact, on the set where $X \le n$, we have that $|X(\omega) - X_n(\omega)| \le 2^{-n}$.
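As an illustration of the approximation just defined, here is a small sketch that evaluates $X_n(\omega)$ for a given value $x = X(\omega)$. The function name `dyadic_approx` is a hypothetical helper introduced only for this example.

```python
import math


def dyadic_approx(x, n):
    """Value of X_n when X takes the nonnegative value x.

    Rounds x down to the nearest multiple of 2**-n, and returns 0
    once x is at or beyond n + 2**-n (no term of the sum applies).
    """
    if x < 0:
        raise ValueError("x must be nonnegative")
    i = math.floor(x * 2**n)       # index of the dyadic interval containing x
    if i > n * 2**n:               # the sum only runs over i = 0, ..., n * 2**n
        return 0.0
    return i / 2**n


for n in (1, 2, 4, 8):
    print(n, dyadic_approx(math.pi, n))
```

For $x = \pi$ the value is 0 when $n \le 2$, because $\pi \ge n + 2^{-n}$ there, and from $n = 4$ on the error is at most $2^{-n}$.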
For reasonable $X$ we are going to define $EX = \lim EX_n$. Since the $X_n$ increase with $n$, the limit must exist, although it could be $+\infty$. If $X$ is not necessarily nonnegative, we define $EX = EX^+ - EX^-$, provided at least one of $EX^+$ and $EX^-$ is finite. Here $X^+ = \max(X, 0)$ and $X^- = \max(-X, 0)$.
There are some things one wants to prove, but all this has been worked out in
measure theory and the theory of the Lebesgue integral; see Note 1. Let us confine ourselves
here to showing this definition is the same as the usual one when X has a density.
Recall $X$ has a density $f_X$ if
$$P(X \in [a, b]) = \int_a^b f_X(x)\,dx$$
for all $a$ and $b$. In this case
$$EX = \int_{-\infty}^{\infty} x f_X(x)\,dx$$
provided $\int_{-\infty}^{\infty} |x| f_X(x)\,dx < \infty$. With our definition of $X_n$ we have
$$P(X_n = i/2^n) = P(X \in [i/2^n, (i+1)/2^n)) = \int_{i/2^n}^{(i+1)/2^n} f_X(x)\,dx.$$
Then
$$EX_n = \sum_i \frac{i}{2^n}\, P(X_n = i/2^n) = \sum_i \int_{i/2^n}^{(i+1)/2^n} \frac{i}{2^n}\, f_X(x)\,dx.$$
Since $x$ differs from $i/2^n$ by at most $1/2^n$ when $x \in [i/2^n, (i+1)/2^n)$, this will tend to $\int x f_X(x)\,dx$, unless the contribution to the integral for $|x| \ge n$ does not go to 0 as $n \to \infty$. As long as $\int |x| f_X(x)\,dx < \infty$, one can show that this contribution does indeed go to 0.
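The convergence $EX_n \to EX$ can be seen concretely for a specific density. The sketch below assumes $X$ is exponential with parameter 1 (so $EX = 1$ and $P(a \le X < b) = e^{-a} - e^{-b}$); this choice of distribution and the function name `expectation_of_approx` are illustrative and not from the notes.

```python
import math


def expectation_of_approx(n):
    """E X_n for X exponential(1), using the dyadic approximation X_n.

    Each value i / 2**n is weighted by P(i/2**n <= X < (i+1)/2**n).
    """
    h = 2.0 ** -n
    total = 0.0
    for i in range(n * 2**n + 1):
        prob = math.exp(-i * h) - math.exp(-(i + 1) * h)
        total += i * h * prob
    return total


for n in (2, 5, 10):
    print(n, expectation_of_approx(n))
```

The printed values increase toward 1; the remaining gap comes from rounding down within each dyadic interval and from the mass beyond $n + 2^{-n}$.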
We also need an extension of the definition of conditional probability. A r.v. is $\mathcal{G}$ measurable if $(X > a) \in \mathcal{G}$ for every $a$. How do we define $E[Z \mid \mathcal{G}]$ when $\mathcal{G}$ is not generated by a countable collection of disjoint sets?
Again, there is a completely worked out theory that holds in all cases; see Note 2. Let us give an equivalent definition that works except for a very few cases. Let us suppose that for each $n$ the σ-field $\mathcal{G}_n$ is finitely generated. This means that $\mathcal{G}_n$ is generated by finitely many disjoint sets $B_{n1}, \ldots, B_{nm_n}$. So for each $n$, the number of $B_{ni}$ is finite but arbitrary, the $B_{ni}$ are disjoint, and their union is $\Omega$. Suppose also that $\mathcal{G}_1 \subset \mathcal{G}_2 \subset \cdots$. Now $\cup_n \mathcal{G}_n$ will not in general be a σ-field, but suppose $\mathcal{G}$ is the smallest σ-field that contains all the $\mathcal{G}_n$. Finally, define $P(A \mid \mathcal{G}) = \lim P(A \mid \mathcal{G}_n)$.
This is a fairly general set-up. For example, let $\Omega$ be the real line and let $\mathcal{G}_n$ be generated by the sets $(-\infty, n)$, $[n, \infty)$ and $[i/2^n, (i+1)/2^n)$. Then $\mathcal{G}$ will contain every interval that is closed on the left and open on the right, hence $\mathcal{G}$ must be the σ-field that one works with when one talks about Lebesgue measure on the line.
The question that one might ask is: how does one know the limit exists? Since the $\mathcal{G}_n$ increase, we know by Proposition 4.3 that $M_n = P(A \mid \mathcal{G}_n)$ is a martingale with respect to the $\mathcal{G}_n$. It is certainly bounded above by 1 and bounded below by 0, so by the martingale convergence theorem, it must have a limit as $n \to \infty$.
Once one has a definition of conditional probability, one defines conditional expectation by what one expects. If $X$ is discrete, one can write $X$ as $\sum_j a_j 1_{A_j}$ and then one defines
$$E[X \mid \mathcal{G}] = \sum_j a_j\, P(A_j \mid \mathcal{G}).$$
If $X$ is not discrete, one writes $X = X^+ - X^-$, one approximates $X^+$ by discrete random variables, and takes a limit, and similarly for $X^-$. One has to worry about convergence, but everything does go through.
With this extended definition of conditional expectation, do all the properties of
Section 2 hold? The answer is yes. See Note 2 again.
With continuous random variables, we need to be more cautious about what we mean when we say two random variables are equal. We say $X = Y$ almost surely, abbreviated “a.s.”, if
$$P(\{\omega : X(\omega) \ne Y(\omega)\}) = 0.$$
So $X = Y$ except for a set of probability 0. The a.s. terminology is used other places as well: $X_n \to Y$ a.s. means that except for a set of $\omega$'s of probability zero, $X_n(\omega) \to Y(\omega)$.
Note 1. The best way to define expected value is via the theory of the Lebesgue integral. A probability $P$ is a measure that has total mass 1. So we define
$$EX = \int X(\omega)\, P(d\omega).$$
To recall how the definition goes, we say $X$ is simple if $X(\omega) = \sum_{i=1}^m a_i 1_{A_i}(\omega)$ with each $a_i \ge 0$, and for a simple $X$ we define
$$EX = \sum_{i=1}^m a_i\, P(A_i).$$
If $X$ is nonnegative, we define
$$EX = \sup\{EY : Y \text{ simple},\ Y \le X\}.$$
Finally, provided at least one of $EX^+$ and $EX^-$ is finite, we define
$$EX = EX^+ - EX^-.$$
This is the same definition as described above.
Note 2. The Radon-Nikodym theorem from measure theory says that if $Q$ and $P$ are two finite measures on $(\Omega, \mathcal{G})$ and $Q(A) = 0$ whenever $P(A) = 0$ and $A \in \mathcal{G}$, then there exists an integrable function $Y$ that is $\mathcal{G}$-measurable such that $Q(A) = \int_A Y\,dP$ for every measurable set $A$.
Let us apply the Radon-Nikodym theorem to the following situation. Suppose $(\Omega, \mathcal{F}, P)$ is a probability space and $X \ge 0$ is integrable: $EX < \infty$. Suppose $\mathcal{G} \subset \mathcal{F}$. Define two new measures on $\mathcal{G}$ as follows. Let $P' = P|_{\mathcal{G}}$, that is, $P'(A) = P(A)$ if $A \in \mathcal{G}$ and $P'(A)$ is not defined if $A \in \mathcal{F} - \mathcal{G}$. Define $Q$ by $Q(A) = \int_A X\,dP = E[X; A]$ if $A \in \mathcal{G}$. One can show (using the monotone convergence theorem from measure theory) that $Q$ is a finite measure on $\mathcal{G}$. (One can also use this definition to define $Q(A)$ for $A \in \mathcal{F}$, but we only want to define $Q$ on $\mathcal{G}$, as we will see in a moment.) So $Q$ and $P'$ are two finite measures on $(\Omega, \mathcal{G})$. If $A \in \mathcal{G}$ and $P'(A) = 0$, then $P(A) = 0$ and so it follows that $Q(A) = 0$. By the Radon-Nikodym theorem there exists an integrable random variable $Y$ such that $Y$ is $\mathcal{G}$ measurable (this is why we worried about which σ-field we were working with) and
$$Q(A) = \int_A Y\,dP'$$
if $A \in \mathcal{G}$. Note
(a) $Y$ is $\mathcal{G}$ measurable, and
(b) if $A \in \mathcal{G}$,
$$E[Y; A] = E[X; A]$$
because
$$E[Y; A] = E[Y 1_A] = \int_A Y\,dP = \int_A Y\,dP' = Q(A) = \int_A X\,dP = E[X 1_A] = E[X; A].$$
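We define $E[X \mid \mathcal{G}]$ to be the random variable $Y$. If $X$ is integrable but not necessarily nonnegative, then $X^+$ and $X^-$ will be integrable and we define
$$E[X \mid \mathcal{G}] = E[X^+ \mid \mathcal{G}] - E[X^- \mid \mathcal{G}].$$
We define
$$P(B \mid \mathcal{G}) = E[1_B \mid \mathcal{G}]$$
if $B \in \mathcal{F}$.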
Let us show that there is only one r.v., up to almost sure equivalence, that satisfies (a) and (b) above. If $Y$ and $Z$ are $\mathcal{G}$ measurable, and $E[Y; A] = E[X; A] = E[Z; A]$ for $A \in \mathcal{G}$, then the set $A_n = (Y > Z + \frac{1}{n})$ will be in $\mathcal{G}$, and so
$$E[Z; A_n] + \tfrac{1}{n} P(A_n) = E[Z + \tfrac{1}{n}; A_n] \le E[Y; A_n] = E[Z; A_n].$$
Consequently $P(A_n) = 0$. This is true for each positive integer $n$, so $P(Y > Z) = 0$. By symmetry, $P(Z > Y) = 0$, and therefore $P(Y \ne Z) = 0$ as we wished.
If one checks the proofs of Propositions 2.3, 2.4, and 2.5, one sees that only properties
(a) and (b) above were used. So the propositions hold for the new definition of conditional
expectation as well.
In the case where ( is finitely or countably generated, under both the new and old
definitions (a) and (b) hold. By the uniqueness result, the new and old definitions agree.
10. Stochastic processes.
We will be talking about stochastic processes. Previously we discussed sequences $S_1, S_2, \ldots$ of r.v.'s. Now we want to talk about processes $Y_t$ for $t \ge 0$. For example, we can think of $S_t$ being the price of a stock at time $t$. Any nonnegative time $t$ is allowed.
We typically let $\mathcal{F}_t$ be the smallest σ-field with respect to which $Y_s$ is measurable for all $s \le t$. So $\mathcal{F}_t = \sigma(Y_s : s \le t)$. As you might imagine, there are a few technicalities one has to worry about. We will try to avoid thinking about them as much as possible, but see Note 1.
We call a collection of σ-fields $\mathcal{F}_t$ with $\mathcal{F}_s \subset \mathcal{F}_t$ if $s < t$ a filtration. We say the filtration satisfies the “usual conditions” if the $\mathcal{F}_t$ are right continuous and complete (see Note 1); all the filtrations we consider will satisfy the usual conditions.
We say a stochastic process has continuous paths if the following holds. For each $\omega$, the map $t \to Y_t(\omega)$ defines a function from $[0, \infty)$ to $\mathbb{R}$. If this function is a continuous function for all $\omega$'s except for a set of probability zero, we say $Y_t$ has continuous paths.
Definition 10.1. A mapping $\tau : \Omega \to [0, \infty)$ is a stopping time if for each $t$ we have $(\tau \le t) \in \mathcal{F}_t$.
Typically, $\tau$ will be a continuous random variable and $P(\tau = t) = 0$ for each $t$, which is why we need a definition just a bit different from the discrete case.
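Since $(\tau < t) = \cup_{n=1}^{\infty} (\tau \le t - \frac{1}{n})$ and $(\tau \le t - \frac{1}{n}) \in \mathcal{F}_{t - \frac{1}{n}} \subset \mathcal{F}_t$, then for a stopping time $\tau$ we have $(\tau < t) \in \mathcal{F}_t$ for all $t$.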
Conversely, suppose $\tau$ is a nonnegative r.v. for which $(\tau < t) \in \mathcal{F}_t$ for all $t$. We claim $\tau$ is a stopping time. The proof is easy, but we need the right continuity of the $\mathcal{F}_t$ here, so we put the proof in Note 2.
A continuous time martingale (or submartingale) is what one expects: each $M_t$ is integrable, each $M_t$ is $\mathcal{F}_t$ measurable, and if $s < t$, then
$$E[M_t \mid \mathcal{F}_s] = M_s.$$
(Here we are saying the left hand side and the right hand side are equal almost surely; we will usually not write the “a.s.” since almost all of our equalities for random variables are only almost surely.)
The analogues of Doob’s theorems go through. Note 3 has the proofs.
Note 1. For technical reasons, one typically defines $\mathcal{F}_t$ as follows. Let $\mathcal{F}^0_t = \sigma(Y_s : s \le t)$. This is what we referred to as $\mathcal{F}_t$ above. Next add to $\mathcal{F}^0_t$ all sets $N$ for which $P(N) = 0$. Such sets are called null sets, and since they have probability 0, they don't affect anything. In fact, one wants to add all sets $N$ that we think of being null sets, even though they might not be measurable. To be more precise, we say $N$ is a null set if $\inf\{P(A) : A \in \mathcal{F},\ N \subset A\} = 0$. Recall we are starting with a σ-field $\mathcal{F}$ and all the $\mathcal{F}^0_t$'s are contained in $\mathcal{F}$. Let $\mathcal{F}^{00}_t$ be the σ-field generated by $\mathcal{F}^0_t$ and all null sets $N$, that is, the smallest σ-field containing $\mathcal{F}^0_t$ and every null set. In measure theory terminology, what we have done is to say $\mathcal{F}^{00}_t$ is the completion of $\mathcal{F}^0_t$.
Lastly, we want to make our σ-fields right continuous. We set $\mathcal{F}_t = \cap_{\varepsilon > 0} \mathcal{F}^{00}_{t+\varepsilon}$. Although the union of σ-fields is not necessarily a σ-field, the intersection of σ-fields is. $\mathcal{F}_t$ contains $\mathcal{F}^{00}_t$ but might possibly contain more besides. An example of an event that is in $\mathcal{F}_t$ but that may not be in $\mathcal{F}^{00}_t$ is
$$A = \{\omega : \lim_{n \to \infty} Y_{t + \frac{1}{n}}(\omega) \ge 0\}.$$
$A \in \mathcal{F}^{00}_{t + \frac{1}{m}}$ for each $m$, so it is in $\mathcal{F}_t$. There is no reason it needs to be in $\mathcal{F}^{00}_t$ if $Y$ is not necessarily continuous at $t$. It is easy to see that $\cap_{\varepsilon > 0} \mathcal{F}_{t+\varepsilon} = \mathcal{F}_t$, which is what we mean when we say $\mathcal{F}_t$ is right continuous.
When talking about a stochastic process $Y_t$, there are various types of measurability one can consider. Saying $Y_t$ is adapted to $\mathcal{F}_t$ means $Y_t$ is $\mathcal{F}_t$ measurable for each $t$. However, since $Y_t$ is really a function of two variables, $t$ and $\omega$, there are other notions of measurability that come into play. We will be considering stochastic processes that have continuous paths or that are predictable (the definition will be given later), so these various types of measurability will not be an issue for us.
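Note 2. Suppose $(\tau < t) \in \mathcal{F}_t$ for all $t$. Then for each positive integer $n_0$,
$$(\tau \le t) = \cap_{n = n_0}^{\infty} (\tau < t + \tfrac{1}{n}).$$
For $n \ge n_0$ we have $(\tau < t + \frac{1}{n}) \in \mathcal{F}_{t + \frac{1}{n}} \subset \mathcal{F}_{t + \frac{1}{n_0}}$. Therefore $(\tau \le t) \in \mathcal{F}_{t + \frac{1}{n_0}}$ for each $n_0$. Hence the set is in the intersection: $\cap_{n_0 \ge 1} \mathcal{F}_{t + \frac{1}{n_0}} \subset \cap_{\varepsilon > 0} \mathcal{F}_{t+\varepsilon} = \mathcal{F}_t$.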
Note 3. We want to prove the analogues of Theorems 5.3 and 5.4. The proof of Doob's inequalities is simpler. We will only need the analogue of Theorem 5.4(b).
Theorem 10.2. Suppose $M_t$ is a martingale with continuous paths and $EM_t^2 < \infty$ for all $t$. Then for each $t_0$
$$E\big[(\sup_{s \le t_0} M_s)^2\big] \le 4 E\big[|M_{t_0}|^2\big].$$
Proof. By the definition of martingale in continuous time, $N_k$ is a martingale in discrete time with respect to $\mathcal{G}_k$ when we set $N_k = M_{kt_0/2^n}$ and $\mathcal{G}_k = \mathcal{F}_{kt_0/2^n}$. By Theorem 5.4(b)
$$E\big[\max_{0 \le k \le 2^n} M^2_{kt_0/2^n}\big] = E\big[\max_{0 \le k \le 2^n} N_k^2\big] \le 4 E N^2_{2^n} = 4 E M^2_{t_0}.$$
(Recall $(\max_k a_k)^2 = \max_k a_k^2$ if all the $a_k \ge 0$.)
Now let $n \to \infty$. Since $M_t$ has continuous paths, $\max_{0 \le k \le 2^n} M^2_{kt_0/2^n}$ increases up to $\sup_{s \le t_0} M_s^2$. Our result follows from the monotone convergence theorem from measure theory (see Note 4).
We now prove the analogue of Theorem 5.3. The proof is simpler if we assume that $EM_t^2$ is finite; the result is still true without this assumption.
Theorem 10.3. Suppose $M_t$ is a martingale with continuous paths, $EM_t^2 < \infty$ for all $t$, and $\tau$ is a stopping time bounded almost surely by $t_0$. Then $EM_\tau = EM_{t_0}$.
Proof. We approximate $\tau$ by stopping times taking only finitely many values. For $n > 0$ define
$$\tau_n(\omega) = \inf\{kt_0/2^n : \tau(\omega) < kt_0/2^n\}.$$
$\tau_n$ takes only the values $kt_0/2^n$ for some $k \le 2^n$. The event $(\tau_n \le jt_0/2^n)$ is equal to $(\tau < jt_0/2^n)$, which is in $\mathcal{F}_{jt_0/2^n}$ since $\tau$ is a stopping time. So $(\tau_n \le s) \in \mathcal{F}_s$ if $s$ is of the form $jt_0/2^n$ for some $j$. A moment's thought, using the fact that $\tau_n$ only takes values of the form $kt_0/2^n$, shows that $\tau_n$ is a stopping time.
It is clear that $\tau_n \downarrow \tau$ for every $\omega$. Since $M_t$ has continuous paths, $M_{\tau_n} \to M_\tau$ a.s.
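The discretization of $\tau$ can be made concrete for a single outcome $\omega$. The sketch below computes $\tau_n(\omega)$ from the value $\tau(\omega)$; the function name `discretize` and the sample value 0.3 are illustrative assumptions.

```python
import math


def discretize(tau, t0, n):
    """Smallest grid point k * t0 / 2**n that is strictly greater than tau."""
    k = math.floor(tau * 2**n / t0) + 1
    return k * t0 / 2**n


tau, t0 = 0.3, 1.0
approximations = [discretize(tau, t0, n) for n in range(1, 11)]
print(approximations)
```

The values decrease toward 0.3 from above, and each stays within $t_0/2^n$ of $\tau$, which is what $\tau_n \downarrow \tau$ means for this outcome.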
Let $N_k$ and $\mathcal{G}_k$ be as in the proof of Theorem 10.2. Let $\sigma_n = k$ if $\tau_n = kt_0/2^n$. By Theorem 5.3,
$$EN_{\sigma_n} = EN_{2^n},$$
which is the same as saying
$$EM_{\tau_n} = EM_{t_0}.$$
To complete the proof, we need to show $EM_{\tau_n}$ converges to $EM_\tau$. This is almost obvious, because we already observed that $M_{\tau_n} \to M_\tau$ a.s. Without the assumption that $EM_t^2 < \infty$ for all $t$, this is actually quite a bit of work, but with the assumption it is not too bad.
Either $|M_{\tau_n} - M_\tau|$ is less than or equal to 1 or greater than 1. If it is greater than 1, it is less than $|M_{\tau_n} - M_\tau|^2$. So in either case,
$$|M_{\tau_n} - M_\tau| \le 1 + |M_{\tau_n} - M_\tau|^2. \tag{10.1}$$
Because both $|M_{\tau_n}|$ and $|M_\tau|$ are bounded by $\sup_{s \le t_0} |M_s|$, the right hand side of (10.1) is bounded by $1 + 4 \sup_{s \le t_0} |M_s|^2$, which is integrable by Theorem 10.2. $|M_{\tau_n} - M_\tau| \to 0$, and so by the dominated convergence theorem from measure theory (Note 4),
$$E|M_{\tau_n} - M_\tau| \to 0.$$
Finally,
$$|EM_{\tau_n} - EM_\tau| = |E(M_{\tau_n} - M_\tau)| \le E|M_{\tau_n} - M_\tau| \to 0.$$
Note 4. The dominated convergence theorem says that if $X_n \to X$ a.s. and $|X_n| \le Y$ a.s. for each $n$, where $EY < \infty$, then $EX_n \to EX$.
The monotone convergence theorem says that if $X_n \ge 0$ for each $n$, $X_n \le X_{n+1}$ for each $n$, and $X_n \to X$, then $EX_n \to EX$.
11. Brownian motion.
First, let us review a few facts about normal random variables. We say $X$ is a normal random variable with mean $a$ and variance $b^2$ if
$$P(c \le X \le d) = \int_c^d \frac{1}{\sqrt{2\pi b^2}}\, e^{-(y-a)^2/2b^2}\,dy$$
and we will abbreviate this by saying $X$ is $\mathcal{N}(a, b^2)$. If $X$ is $\mathcal{N}(a, b^2)$, then $EX = a$, $\operatorname{Var} X = b^2$, and $E|X|^p < \infty$ for every positive integer $p$. Moreover
$$Ee^{tX} = e^{at} e^{t^2 b^2/2}.$$
Let $S_n$ be a simple symmetric random walk. This means that $Y_k = S_k - S_{k-1}$ equals $+1$ with probability $\frac{1}{2}$, equals $-1$ with probability $\frac{1}{2}$, and is independent of $Y_j$ for $j < k$. We notice that $ES_n = 0$ while $ES_n^2 = \sum_{i=1}^n EY_i^2 + \sum_{i \ne j} EY_i Y_j = n$, using the fact that $E[Y_i Y_j] = (EY_i)(EY_j) = 0$ for $i \ne j$.
Define $X^n_t = S_{nt}/\sqrt{n}$ if $nt$ is an integer and by linear interpolation for other $t$. If $nt$ is an integer, $EX^n_t = 0$ and $E(X^n_t)^2 = t$. It turns out $X^n_t$ does not converge for any $\omega$.
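The moment computations for the random walk can be verified exactly. The sketch below uses the fact that $S_n = 2U - n$, where $U$ counts the $+1$ steps and is binomial with parameters $n$ and $\frac{1}{2}$; the function names are illustrative.

```python
from math import comb


def walk_mean(n):
    """E S_n, summing over the number u of +1 steps among n."""
    return sum(comb(n, u) * (2 * u - n) for u in range(n + 1)) / 2**n


def walk_second_moment(n):
    """E S_n^2, computed the same way."""
    return sum(comb(n, u) * (2 * u - n) ** 2 for u in range(n + 1)) / 2**n


n, t = 16, 0.5                      # n * t = 8 is an integer
steps = int(n * t)
print(walk_mean(steps))                  # E S_{nt}
print(walk_second_moment(steps) / n)     # E (X^n_t)^2, which should equal t
```

Dividing $ES_{nt}^2 = nt$ by $n$ gives $E(X^n_t)^2 = t$, as stated.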
However there is another kind of convergence, called weak convergence, that takes place. There exists a process $Z_t$ such that for each $k$, each $t_1 < t_2 < \cdots < t_k$, and each $a_1 < b_1, a_2 < b_2, \ldots, a_k < b_k$, we have
(1) The paths of $Z_t$ are continuous as a function of $t$.
(2) $P(X^n_{t_1} \in [a_1, b_1], \ldots, X^n_{t_k} \in [a_k, b_k]) \to P(Z_{t_1} \in [a_1, b_1], \ldots, Z_{t_k} \in [a_k, b_k])$.
See Note 1 for more discussion of weak convergence.
The limit $Z_t$ is called a Brownian motion starting at 0. It has the following properties.
(1) $EZ_t = 0$.
(2) $EZ_t^2 = t$.
(3) $Z_t - Z_s$ is independent of $\mathcal{F}_s = \sigma(Z_r, r \le s)$.
(4) $Z_t - Z_s$ has the distribution of a normal random variable with mean 0 and variance $t - s$. This means
$$P(Z_t - Z_s \in [a, b]) = \int_a^b \frac{1}{\sqrt{2\pi(t-s)}}\, e^{-y^2/2(t-s)}\,dy.$$
(This result follows from the central limit theorem.)
(5) The map $t \to Z_t(\omega)$ is continuous for almost all $\omega$.
See Note 2 for a few remarks on this definition.
It is common to use $B_t$ (“B” for Brownian) or $W_t$ (“W” for Wiener, who was the first person to prove rigorously that Brownian motion exists). We will most often use $W_t$.
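Properties (1) through (4) suggest a simple way to sample Brownian motion at finitely many times: add independent normal increments with variance equal to the time step. The following is a rough Monte Carlo sketch under that assumption; the helper name `brownian_path`, the time grid, the seed, and the sample size are all illustrative.

```python
import math
import random


def brownian_path(times, rng):
    """Sample (W_t) at the given increasing positive times, with W_0 = 0."""
    w, prev, out = 0.0, 0.0, []
    for t in times:
        # Increment over (prev, t] is normal with mean 0 and variance t - prev.
        w += rng.gauss(0.0, math.sqrt(t - prev))
        prev = t
        out.append(w)
    return out


rng = random.Random(0)
times = [0.5, 1.0, 2.0]
paths = [brownian_path(times, rng) for _ in range(20000)]

mean_w2 = sum(p[2] for p in paths) / len(paths)                   # estimates E W_2
var_w2 = sum(p[2] ** 2 for p in paths) / len(paths)               # estimates E W_2^2
cov_incr = sum(p[0] * (p[2] - p[1]) for p in paths) / len(paths)  # E[W_0.5 (W_2 - W_1)]
print(mean_w2, var_w2, cov_incr)
```

Up to sampling error, the three estimates should be near 0, 2, and 0 respectively, matching properties (1), (2), and (3).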
We will use Brownian motion extensively and develop some of its properties. As one might imagine for a limit of a simple random walk, the paths of Brownian motion have a huge number of oscillations. It turns out that the function $t \to W_t(\omega)$ is continuous, but it is not differentiable; in fact one cannot define a derivative at any value of $t$. Another bizarre property: if one looks at the set of times at which $W_t(\omega)$ is equal to 0, this is a set which is uncountable, but contains no intervals. There is nothing special about 0; the same is true for the set of times at which $W_t(\omega)$ is equal to $a$ for any level $a$.
In what follows, one of the crucial properties of a Brownian motion is that it is a
martingale with continuous paths. Let us prove this.
Proposition 11.1. $W_t$ is a martingale with respect to $\mathcal{F}_t$ and $W_t$ has continuous paths.
Proof. As part of the definition of a Brownian motion, $W_t$ has continuous paths. $W_t$ is $\mathcal{F}_t$ measurable by the definition of $\mathcal{F}_t$. Since the distribution of $W_t$ is that of a normal random variable with mean 0 and variance $t$, then $E|W_t| < \infty$ for all $t$. (In fact, $E|W_t|^n < \infty$ for all $n$.)
The key property is to show $E[W_t \mid \mathcal{F}_s] = W_s$.
$$E[W_t \mid \mathcal{F}_s] = E[W_t - W_s \mid \mathcal{F}_s] + E[W_s \mid \mathcal{F}_s] = E[W_t - W_s] + W_s = W_s.$$
We used here the facts that $W_t - W_s$ is independent of $\mathcal{F}_s$ and that $E[W_t - W_s] = 0$ because $W_t$ and $W_s$ have mean 0.
We will also need
Proposition 11.2. $W_t^2 - t$ is a martingale with continuous paths with respect to $\mathcal{F}_t$.
Proof. That $W_t^2 - t$ is integrable and is $\mathcal{F}_t$ measurable is as in the above proof. We calculate
$$\begin{aligned}
E[W_t^2 - t \mid \mathcal{F}_s] &= E[((W_t - W_s) + W_s)^2 \mid \mathcal{F}_s] - t \\
&= E[(W_t - W_s)^2 \mid \mathcal{F}_s] + 2E[(W_t - W_s) W_s \mid \mathcal{F}_s] + E[W_s^2 \mid \mathcal{F}_s] - t \\
&= E[(W_t - W_s)^2] + 2W_s E[W_t - W_s \mid \mathcal{F}_s] + W_s^2 - t.
\end{aligned}$$
We used the facts that $W_s$ is $\mathcal{F}_s$ measurable and that $(W_t - W_s)^2$ is independent of $\mathcal{F}_s$ because $W_t - W_s$ is. The second term on the last line is equal to $2W_s E[W_t - W_s] = 0$. The first term, because $W_t - W_s$ is normal with mean 0 and variance $t - s$, is equal to $t - s$. Substituting, the last line is equal to
$$(t - s) + 0 + W_s^2 - t = W_s^2 - s$$
as required.
Note 1. A sequence of random variables $X_n$ converges weakly to $X$ if $P(a < X_n < b) \to P(a < X < b)$ for all $a, b \in [-\infty, \infty]$ such that $P(X = a) = P(X = b) = 0$. Here $a$ and $b$ can be infinite. If $X_n$ converges to a normal random variable, then $P(X = a) = P(X = b) = 0$ for all $a$ and $b$. This is the type of convergence that takes place in the central limit theorem. It will not be true in general that $X_n$ converges to $X$ almost surely.
For a sequence of random vectors $(X^n_1, \ldots, X^n_k)$ to converge to a random vector $(X_1, \ldots, X_k)$, one can give an analogous definition. But saying that the normalized random walks $X^n(t)$ above converge weakly to $Z_t$ actually says more than (2). A result from probability theory says that $X_n$ converges to $X$ weakly if and only if $E[f(X_n)] \to E[f(X)]$ whenever $f$ is a bounded continuous function on $\mathbb{R}$. We use this to define weak convergence for stochastic processes. Let $C([0, \infty))$ be the collection of all continuous functions from $[0, \infty)$ to the reals. This is a metric space, so the notion of a function from $C([0, \infty))$ to $\mathbb{R}$ being continuous makes sense. We say that the processes $X^n$ converge weakly to the process $Z$, and mean by this that $E[F(X^n)] \to E[F(Z)]$ whenever $F$ is a bounded continuous function on $C([0, \infty))$. One example of such a function $F$ would be $F(f) = \sup_{0 \le t < \infty} |f(t)|$ if $f \in C([0, \infty))$; another would be $F(f) = \int_0^1 f(t)\,dt$.
The reason one wants to show that $X^n$ converges weakly to $Z$ instead of just showing (2) is that weak convergence can be shown to imply that $Z$ has continuous paths.
Note 2. First of all, there is some redundancy in the definition: one can show that parts of the definition are implied by the remaining parts, but we won't worry about this. Second, we actually want to let $\mathcal{F}_t$ be the completion of $\sigma(Z_s : s \le t)$, that is, we throw all the null sets into each $\mathcal{F}_t$. One can prove that the resulting $\mathcal{F}_t$ are right continuous, and hence the filtration $\mathcal{F}_t$ satisfies the “usual” conditions. Finally, the “almost all” in (5) means that $t \to Z_t(\omega)$ is continuous for all $\omega$, except for a set of $\omega$ of probability zero.
12. Stochastic integrals.
If one wants to consider the (deterministic) integral $\int_0^t f(s)\,dg(s)$, where $f$ and $g$ are continuous and $g$ is continuously differentiable, we can define it analogously to the usual Riemann integral as the limit of Riemann sums $\sum_{i=1}^n f(s_i)[g(s_i) - g(s_{i-1})]$, where $s_1 < s_2 < \cdots < s_n$ is a partition of $[0, t]$. This is known as the Riemann-Stieltjes integral. One can show (using the mean value theorem, for example) that
$$\int_0^t f(s)\,dg(s) = \int_0^t f(s) g'(s)\,ds.$$
If we were to take $f(s) = 1_{[0,a]}(s)$ (which is not continuous, but that is a minor matter here), one would expect the following:
$$\int_0^t 1_{[0,a]}(s)\,dg(s) = \int_0^t 1_{[0,a]}(s) g'(s)\,ds = \int_0^a g'(s)\,ds = g(a) - g(0).$$
Note that although we use the fact that $g$ is differentiable in the intermediate stages, the first and last terms make sense for any $g$.
We now want to replace $g$ by a Brownian path and $f$ by a random integrand. The expression $\int f(s)\,dW(s)$ does not make sense as a Riemann-Stieltjes integral because it is a fact that $W(s)$ is not differentiable as a function of $s$. We need to define the expression by some other means. We will show that it can be defined as the limit in $L^2$ of Riemann sums. The resulting integral is called a stochastic integral.
Let us consider a very special case first. Suppose $f$ is continuous and deterministic (i.e., does not depend on $\omega$). Suppose we take a Riemann sum approximation
$$I_n = \sum_{i=0}^{2^n - 1} f\big(\tfrac{i}{2^n}\big)\big[W\big(\tfrac{i+1}{2^n}\big) - W\big(\tfrac{i}{2^n}\big)\big].$$
Since $W_t$ has zero expectation for each $t$, $EI_n = 0$. Let us calculate the second moment:
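$$\begin{aligned}
EI_n^2 &= E\Big[\Big(\sum_i f\big(\tfrac{i}{2^n}\big)\big[W\big(\tfrac{i+1}{2^n}\big) - W\big(\tfrac{i}{2^n}\big)\big]\Big)^2\Big] \\
&= E \sum_{i=0}^{2^n - 1} f\big(\tfrac{i}{2^n}\big)^2 \big[W\big(\tfrac{i+1}{2^n}\big) - W\big(\tfrac{i}{2^n}\big)\big]^2 \\
&\qquad + E \sum_{i \ne j} f\big(\tfrac{i}{2^n}\big) f\big(\tfrac{j}{2^n}\big) \big[W\big(\tfrac{i+1}{2^n}\big) - W\big(\tfrac{i}{2^n}\big)\big]\big[W\big(\tfrac{j+1}{2^n}\big) - W\big(\tfrac{j}{2^n}\big)\big].
\end{aligned} \tag{12.1}$$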
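The first term is equal to
$$\sum_i f\big(\tfrac{i}{2^n}\big)^2 \frac{1}{2^n} \approx \int_0^1 f(t)^2\,dt,$$
since the second moment of $W(\frac{i+1}{2^n}) - W(\frac{i}{2^n})$ is $1/2^n$. Using the independence and the fact that $W_t$ has mean zero, for $i \ne j$
$$E\Big[\big[W\big(\tfrac{i+1}{2^n}\big) - W\big(\tfrac{i}{2^n}\big)\big]\big[W\big(\tfrac{j+1}{2^n}\big) - W\big(\tfrac{j}{2^n}\big)\big]\Big] = E\big[W\big(\tfrac{i+1}{2^n}\big) - W\big(\tfrac{i}{2^n}\big)\big]\, E\big[W\big(\tfrac{j+1}{2^n}\big) - W\big(\tfrac{j}{2^n}\big)\big] = 0,$$
and so the second sum on the right hand side of (12.1) is zero. This calculation is the key to the stochastic integral.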
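The second moment calculation can be tested by simulation. The sketch below assumes the particular integrand $f(s) = s$ and $n = 4$, neither of which comes from the notes, and compares a Monte Carlo estimate of $EI_n^2$ with the exact sum $\sum_i f(i/2^n)^2 / 2^n$ and with $\int_0^1 f(t)^2\,dt = 1/3$.

```python
import math
import random


def riemann_sum(f, n, rng):
    """One sample of I_n = sum_i f(i/2^n) [W((i+1)/2^n) - W(i/2^n)]."""
    m = 2**n
    dt = 1.0 / m
    # Brownian increments over disjoint intervals are independent N(0, dt).
    return sum(f(i * dt) * rng.gauss(0.0, math.sqrt(dt)) for i in range(m))


def exact_second_moment(f, n):
    """sum_i f(i/2^n)^2 / 2^n, the value the calculation above predicts."""
    m = 2**n
    return sum(f(i / m) ** 2 for i in range(m)) / m


f = lambda s: s          # illustrative deterministic integrand
n = 4
rng = random.Random(1)
samples = [riemann_sum(f, n, rng) for _ in range(20000)]

mc_mean = sum(samples) / len(samples)
mc_second = sum(x * x for x in samples) / len(samples)
exact = exact_second_moment(f, n)
print(mc_mean, mc_second, exact, 1.0 / 3.0)
```

The simulated second moment should agree with the exact sum up to sampling error, and the exact sum approaches $1/3$ as $n$ grows.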
We now turn to the construction. Let $W_t$ be a Brownian motion. We will only consider integrands $H_s$ such that $H_s$ is $\mathcal{F}_s$ measurable for each $s$ (see Note 1). We will construct $\int_0^t H_s\,dW_s$ for all $H$ with
$$E \int_0^t H_s^2\,ds < \infty. \tag{12.2}$$
Before we proceed we will need to define the quadratic variation of a continuous
martingale. We will use the following theorem without proof because in our applications we
can construct the desired increasing process directly. We often say a process is a continuous
process if its paths are continuous, and similarly a continuous martingale is a martingale
with continuous paths.
Theorem 12.1. Suppose $M_t$ is a continuous martingale such that $EM_t^2 < \infty$ for all $t$. There exists one and only one increasing process $A_t$ that is adapted to $\mathcal{F}_t$, has continuous paths, and $A_0 = 0$ such that $M_t^2 - A_t$ is a martingale.
The simplest example of such a martingale is Brownian motion. If $W_t$ is a Brownian motion, we saw in Proposition 11.2 that $W_t^2 - t$ is a martingale. So in this case $A_t = t$ almost surely, for all $t$. Hence $\langle W \rangle_t = t$.
We use the notation $\langle M \rangle_t$ for the increasing process given in Theorem 12.1 and call it the quadratic variation process of $M$. We will see later that in the case of stochastic integrals, where
$$N_t = \int_0^t H_s\,dW_s,$$
it turns out that $\langle N \rangle_t = \int_0^t H_s^2\,ds$.
We will use the following frequently, and in fact, these are the only two properties
of Brownian motion that play a significant role in the construction.
Lemma 12.1. (a) $E[W_b - W_a \mid \mathcal{F}_a] = 0$.
(b) $E[W_b^2 - W_a^2 \mid \mathcal{F}_a] = E[(W_b - W_a)^2 \mid \mathcal{F}_a] = b - a$.
Proof. (a) This is $E[W_b - W_a] = 0$ by the independence of $W_b - W_a$ from $\mathcal{F}_a$ and the fact that $W_b$ and $W_a$ have mean zero.
(b) $(W_b - W_a)^2$ is independent of $\mathcal{F}_a$, so the conditional expectation is the same as $E[(W_b - W_a)^2]$. Since $W_b - W_a$ is a $\mathcal{N}(0, b - a)$, the second equality in (b) follows.
To prove the first equality in (b), we write
$$\begin{aligned}
E[W_b^2 - W_a^2 \mid \mathcal{F}_a] &= E[((W_b - W_a) + W_a)^2 \mid \mathcal{F}_a] - E[W_a^2 \mid \mathcal{F}_a] \\
&= E[(W_b - W_a)^2 \mid \mathcal{F}_a] + 2E[W_a (W_b - W_a) \mid \mathcal{F}_a] + E[W_a^2 \mid \mathcal{F}_a] - E[W_a^2 \mid \mathcal{F}_a] \\
&= E[(W_b - W_a)^2 \mid \mathcal{F}_a] + 2W_a E[W_b - W_a \mid \mathcal{F}_a],
\end{aligned}$$
and the first equality follows by applying (a).
We construct the stochastic integral in three steps. We say an integrand $H_s = H_s(\omega)$ is elementary if
$$H_s(\omega) = G(\omega) 1_{(a,b]}(s)$$
where $0 \le a < b$ and $G$ is bounded and $\mathcal{F}_a$ measurable. We say $H$ is simple if it is a finite linear combination of elementary processes, that is,
$$H_s(\omega) = \sum_{i=1}^n G_i(\omega) 1_{(a_i, b_i]}(s). \tag{12.3}$$
We first construct the stochastic integral for $H$ elementary; the work here is showing the stochastic integral is a martingale. We next construct the integral for $H$ simple and here the difficulty is calculating the second moment. Finally we consider the case of general $H$.
First step. If $G$ is bounded and $\mathcal{F}_a$ measurable, let $H_s(\omega) = G(\omega) 1_{(a,b]}(s)$, and define the stochastic integral to be the process $N_t$, where $N_t = G(W_{t \wedge b} - W_{t \wedge a})$. Compare this to the first paragraph of this section, where we considered Riemann-Stieltjes integrals.
Proposition 12.2. $N_t$ is a continuous martingale, $EN_\infty^2 = E[G^2 (b - a)]$ and
$$\langle N \rangle_t = \int_0^t G^2 1_{[a,b]}(s)\,ds.$$
Proof. The continuity is clear. Let us look at $E[N_t \mid \mathcal{F}_s]$. In the case $a < s < t < b$, this is equal to
$$E[G(W_t - W_a) \mid \mathcal{F}_s] = G E[(W_t - W_a) \mid \mathcal{F}_s] = G(W_s - W_a) = N_s.$$
In the case $s < a < t < b$, $E[N_t \mid \mathcal{F}_s]$ is equal to
$$E[G(W_t - W_a) \mid \mathcal{F}_s] = E[G\, E[W_t - W_a \mid \mathcal{F}_a] \mid \mathcal{F}_s] = 0 = N_s.$$
The other possibilities are $s < t < a < b$, $a < b < s < t$, $s < a < b < t$, and $a < s < b < t$; these are done similarly.
For $EN_\infty^2$, we have using Lemma 12.1(b)
$$EN_\infty^2 = E[G^2 E[(W_b - W_a)^2 \mid \mathcal{F}_a]] = E[G^2 E[W_b^2 - W_a^2 \mid \mathcal{F}_a]] = E[G^2 (b - a)].$$
For $\langle N \rangle_t$, we need to show
$$E[G^2 (W_{t \wedge b} - W_{t \wedge a})^2 - G^2 (t \wedge b - t \wedge a) \mid \mathcal{F}_s] = G^2 (W_{s \wedge b} - W_{s \wedge a})^2 - G^2 (s \wedge b - s \wedge a).$$
We do this by checking all six cases for the relative locations of $a$, $b$, $s$, and $t$; we do one of the cases in Note 2.
Second step. Next suppose $H_s$ is simple as in (12.3). In this case define the stochastic integral
$$N_t = \int_0^t H_s\,dW_s = \sum_{i=1}^n G_i (W_{b_i \wedge t} - W_{a_i \wedge t}).$$
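For a single outcome $\omega$, the definition for simple integrands is just a finite sum. The sketch below assumes the path $t \to W_t(\omega)$ is available as a Python function and that each $G_i(\omega)$ has already been evaluated to a number; the helper name `simple_integral` and the smooth test path are illustrative stand-ins, not an actual Brownian path.

```python
def simple_integral(terms, path, t):
    """N_t = sum_i G_i (W_{b_i ^ t} - W_{a_i ^ t}) for one fixed outcome.

    terms: list of (g, a, b) describing H_s = sum_i g_i 1_{(a_i, b_i]}(s)
    path:  function s -> W_s(omega) for this outcome
    """
    return sum(g * (path(min(b, t)) - path(min(a, t))) for g, a, b in terms)


# A stand-in path used only to check the arithmetic.
path = lambda s: s * s
terms = [(2.0, 1.0, 3.0)]            # H_s = 2 on (1, 3], zero elsewhere

print(simple_integral(terms, path, 0.5))   # before the interval starts
print(simple_integral(terms, path, 2.0))   # inside the interval
print(simple_integral(terms, path, 5.0))   # after the interval ends
```

Before time $a$ the integral is 0, inside $(a, b]$ it tracks $G(W_t - W_a)$, and after $b$ it is frozen at $G(W_b - W_a)$.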
Proposition 12.3. $N_t$ is a continuous martingale, $EN_\infty^2 = E \int_0^\infty H_s^2\,ds$, and $\langle N \rangle_t = \int_0^t H_s^2\,ds$.
Proof. We may rewrite $H$ so that the intervals $(a_i, b_i]$ satisfy $a_1 \le b_1 \le a_2 \le b_2 \le \cdots \le b_n$. For example, if we had $a_1 < a_2 < b_1 < b_2$, we could write
$$H_s = G_1 1_{(a_1, a_2]} + (G_1 + G_2) 1_{(a_2, b_1]} + G_2 1_{(b_1, b_2]},$$
and then if we set $G'_1 = G_1$, $G'_2 = G_1 + G_2$, $G'_3 = G_2$ and $a'_1 = a_1$, $b'_1 = a_2$, $a'_2 = a_2$, $b'_2 = b_1$, $a'_3 = b_1$, $b'_3 = b_2$, we have written $H$ as
$$\sum_{i=1}^3 G'_i 1_{(a'_i, b'_i]}.$$
So now we have $H$ simple but with the intervals $(a'_i, b'_i]$ non-overlapping.
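The rewriting into non-overlapping intervals can be done mechanically by cutting at every endpoint. The sketch below treats each $G_i$ as a number (its value at a fixed $\omega$); the function name `disjointify` is a hypothetical helper for this illustration.

```python
def disjointify(terms):
    """Rewrite sum_i g_i 1_{(a_i, b_i]} using non-overlapping intervals.

    terms: list of (g, a, b) with a < b.
    Returns a list of (g', a', b') with disjoint (a', b'] and the same sum.
    """
    cuts = sorted({x for _, a, b in terms for x in (a, b)})
    pieces = []
    for lo, hi in zip(cuts, cuts[1:]):
        # Add up the coefficients of every original interval covering (lo, hi].
        g = sum(gi for gi, a, b in terms if a <= lo and hi <= b)
        if g != 0:
            pieces.append((g, lo, hi))
    return pieces


# The case a_1 < a_2 < b_1 < b_2 from the text, with G_1 = 2 and G_2 = 3.
print(disjointify([(2.0, 0, 2), (3.0, 1, 3)]))
```

The output has coefficients $G_1$, $G_1 + G_2$, $G_2$ on $(a_1, a_2]$, $(a_2, b_1]$, $(b_1, b_2]$, matching the example above.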
Since the sum of martingales is clearly a martingale, $N_t$ is a martingale. The sum of continuous processes will be continuous, so $N_t$ has continuous paths.
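We have
$$EN_\infty^2 = E\Big[\sum_i G_i^2 (W_{b_i} - W_{a_i})^2\Big] + 2E\Big[\sum_{i<j} G_i G_j (W_{b_i} - W_{a_i})(W_{b_j} - W_{a_j})\Big].$$
The terms in the second sum vanish, because when we condition on $\mathcal{F}_{a_j}$, we have
$$E[G_i G_j (W_{b_i} - W_{a_i})(W_{b_j} - W_{a_j}) \mid \mathcal{F}_{a_j}] = G_i G_j (W_{b_i} - W_{a_i})\, E[(W_{b_j} - W_{a_j}) \mid \mathcal{F}_{a_j}] = 0.$$
Taking expectations,
$$E[G_i G_j (W_{b_i} - W_{a_i})(W_{b_j} - W_{a_j})] = 0.$$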
For the terms in the first sum, by Lemma 12.1
$$E[G_i^2 (W_{b_i} - W_{a_i})^2] = E[G_i^2 E[(W_{b_i} - W_{a_i})^2 \mid \mathcal{F}_{a_i}]] = E[G_i^2 (b_i - a_i)].$$
So
$$EN_\infty^2 = \sum_{i=1}^n E[G_i^2 (b_i - a_i)],$$
and this is the same as $E \int_0^\infty H_s^2\,ds$.
Third step. Now suppose $H_s$ is adapted and $E \int_0^\infty H_s^2\,ds < \infty$. Using some results from measure theory (Note 3), we can choose $H^n_s$ simple such that $E \int_0^\infty (H^n_s - H_s)^2\,ds \to 0$. The triangle inequality then implies (see Note 3 again)
$$E \int_0^\infty (H^n_s - H^m_s)^2\,ds \to 0.$$
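Define $N^n_t = \int_0^t H^n_s\,dW_s$ using Step 2. By Doob's inequality (Theorem 10.2) we have
$$\begin{aligned}
E\big[\sup_t (N^n_t - N^m_t)^2\big] &= E\Big[\sup_t \Big(\int_0^t (H^n_s - H^m_s)\,dW_s\Big)^2\Big] \\
&\le 4E\Big[\Big(\int_0^\infty (H^n_s - H^m_s)\,dW_s\Big)^2\Big] \\
&= 4E \int_0^\infty (H^n_s - H^m_s)^2\,ds \to 0.
\end{aligned}$$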
This should look reminiscent of the definition of Cauchy sequences, and in fact that is what is going on here; Note 3 has details. In the present context Cauchy sequences converge, and one can show (Note 3) that there exists a process $N_t$ such that
$$E\Big[\sup_t \Big(\int_0^t H^n_s\,dW_s - N_t\Big)^2\Big] \to 0.$$
If $H^n_s$ and $H^{n\prime}_s$ are two sequences converging to $H$, then $E\big(\int_0^t (H^n_s - H^{n\prime}_s)\,dW_s\big)^2 = E \int_0^t (H^n_s - H^{n\prime}_s)^2\,ds \to 0$, or the limit is independent of which sequence $H^n$ we choose. See Note 4 for the proof that $N_t$ is a martingale, $EN_t^2 = E \int_0^t H_s^2\,ds$, and $\langle N \rangle_t = \int_0^t H_s^2\,ds$.
Because $\sup_t \big|\int_0^t H^n_s\,dW_s - N_t\big| \to 0$ in $L^2$, one can show there exists a subsequence such that the convergence takes place almost surely, and with probability one, $N_t$ has continuous paths (Note 5).
We write $N_t = \int_0^t H_s\,dW_s$ and call $N_t$ the stochastic integral of $H$ with respect to $W$.
We discuss some extensions of the definition. First of all, if we replace $W_t$ by a continuous martingale $M_t$ and $H_s$ is adapted with $E \int_0^t H_s^2\,d\langle M \rangle_s < \infty$, we can duplicate everything we just did (see Note 6) with $ds$ replaced by $d\langle M \rangle_s$ and get a stochastic integral. In particular, if $d\langle M \rangle_s = K_s^2\,ds$, we replace $ds$ by $K_s^2\,ds$.
There are some other extensions of the definition that are not hard. If the random variable $\int_0^\infty H_s^2\,d\langle M \rangle_s$ is finite but without its expectation being finite, we can define the stochastic integral by defining it for $t \le T_N$ for suitable stopping times $T_N$ and then letting $T_N \to \infty$; look at Note 7.
A process $A_t$ is of bounded variation if the paths of $A_t$ have bounded variation. This means that one can write $A_t = A^+_t - A^-_t$, where $A^+_t$ and $A^-_t$ have paths that are increasing. $|A|_t$ is then defined to be $A^+_t + A^-_t$. A semimartingale is the sum of a martingale and a process of bounded variation. If $\int_0^\infty H_s^2\,d\langle M \rangle_s + \int_0^\infty |H_s|\,|dA_s| < \infty$ and $X_t = M_t + A_t$, we define
$$\int_0^t H_s\,dX_s = \int_0^t H_s\,dM_s + \int_0^t H_s\,dA_s,$$
where the first integral on the right is a stochastic integral and the second is a Riemann-Stieltjes or Lebesgue-Stieltjes integral. For a semimartingale, we define $\langle X \rangle_t = \langle M \rangle_t$. Note 7 has more on this.
Given two semimartingales $X$ and $Y$ we define $\langle X, Y \rangle_t$ by what is known as polarization:
$$\langle X, Y \rangle_t = \tfrac{1}{2}\big[\langle X + Y \rangle_t - \langle X \rangle_t - \langle Y \rangle_t\big].$$
As an example, if X
t
=

t
0
H
s
dW
s
and Y
t
=

t
0
K
s
dW
s
, then (X+Y )
t
=

t
0
(H
s
+K
s
)dW
s
,
so
'X +Y `
t
=

t
0
(H
s
+K
s
)
2
ds =

t
0
H
2
s
ds +

t
0
2H
s
K
s
ds +

t
0
K
2
s
ds.
Since 'X`
t
=

t
0
H
2
s
ds with a similar formula for 'Y `
t
, we conclude
'X, Y `
t
=

t
0
H
s
K
s
ds.
The following holds, which is what one would expect.
Proposition 12.4. Suppose $K_s$ is adapted to $\mathcal{F}_s$ and $E\int_0^\infty K_s^2\,ds < \infty$. Let $N_t = \int_0^t K_s\,dW_s$. Suppose $H_s$ is adapted and $E\int_0^\infty H_s^2\,d\langle N\rangle_s < \infty$. Then $E\int_0^\infty H_s^2K_s^2\,ds < \infty$ and
\[ \int_0^t H_s\,dN_s = \int_0^t H_sK_s\,dW_s. \]
The argument for the proof is given in Note 8.
What does a stochastic integral mean? If one thinks of the derivative of $Z_t$ as being a white noise, then $\int_0^t H_s\,dZ_s$ is like a filter that increases or decreases the volume by a factor $H_s$.
For us, an interpretation is that $Z_t$ represents a stock price. Then $\int_0^t H_s\,dZ_s$ represents our profit (or loss) if we hold $H_s$ shares at time $s$. This can be seen most easily if $H_s = G1_{[a,b]}$. So we buy $G(\omega)$ shares at time $a$ and sell them at time $b$. The stochastic integral represents our profit or loss.
Since we are in continuous time, we are allowed to buy and sell continuously and instantaneously. What we are not allowed to do is look into the future to make our decisions, which is where the condition that $H_s$ be adapted comes in.
Note 1. Let us be more precise concerning the measurability of $H$ that is needed. $H$ is a stochastic process, so can be viewed as a map from $[0, \infty) \times \Omega$ to $\mathbb{R}$ by $H : (s, \omega) \to H_s(\omega)$. We define a σ-field $\mathcal{P}$ on $[0, \infty) \times \Omega$ as follows. Consider the collection of processes of the form $G(\omega)1_{(a,b]}(s)$ where $G$ is bounded and $\mathcal{F}_a$ measurable for some $a < b$. Define $\mathcal{P}$ to be the smallest σ-field with respect to which every process of this form is measurable. $\mathcal{P}$ is called the predictable or previsible σ-field, and if a process $H$ is measurable with respect to $\mathcal{P}$, then the process is called predictable. What we require for our integrands $H$ is that they be predictable processes.
If $H_s$ has continuous paths, then approximating continuous functions by step functions shows that such an $H$ can be approximated by linear combinations of processes of the form $G(\omega)1_{(a,b]}(s)$. So continuous processes are predictable. The majority of the integrands we will consider will be continuous.
If one is slightly more careful, one sees that processes whose paths are functions which are continuous from the left at each time point are also predictable. This gives an indication of where the name comes from. If $H_s$ has paths which are left continuous, then $H_t = \lim_{n\to\infty} H_{t-\frac1n}$, and we can “predict” the value of $H_t$ from the values at times that come before $t$. If $H_t$ is only right continuous and a path has a jump at time $t$, this is not possible.
Note 2. Let us consider the case $a < s < t < b$; again similar arguments take care of the other five cases. We need to show
\[ E[G^2(W_t - W_a)^2 - G^2(t-a) \mid \mathcal{F}_s] = G^2(W_s - W_a)^2 - G^2(s-a). \tag{12.4} \]
The left hand side is equal to $G^2E[(W_t - W_a)^2 - (t-a) \mid \mathcal{F}_s]$. We write this as
\begin{align*}
G^2E[((W_t - W_s) &+ (W_s - W_a))^2 - (t-a) \mid \mathcal{F}_s] \\
&= G^2\Big\{E[(W_t - W_s)^2 \mid \mathcal{F}_s] + 2E[(W_t - W_s)(W_s - W_a) \mid \mathcal{F}_s] + E[(W_s - W_a)^2 \mid \mathcal{F}_s] - (t-a)\Big\} \\
&= G^2\Big\{E[(W_t - W_s)^2] + 2(W_s - W_a)E[W_t - W_s \mid \mathcal{F}_s] + (W_s - W_a)^2 - (t-a)\Big\} \\
&= G^2\Big\{(t-s) + 0 + (W_s - W_a)^2 - (t-a)\Big\}.
\end{align*}
The last expression is equal to the right hand side of (12.4).
Note 3. A definition from measure theory says that if $\mu$ is a measure, $\|f\|_2$, the $L^2$ norm of $f$ with respect to the measure $\mu$, is defined as $\big(\int f(x)^2\,\mu(dx)\big)^{1/2}$. The space $L^2$ is defined to be the set of functions $f$ for which $\|f\|_2 < \infty$. (A technical proviso: one has to identify as equal functions which differ only on a set of measure 0.) If one defines a distance between two functions $f$ and $g$ by $d(f, g) = \|f - g\|_2$, this is a metric on the space $L^2$, and a theorem from measure theory says that $L^2$ is complete with respect to this metric. Another theorem from measure theory says that the collection of simple functions (functions of the form $\sum_{i=1}^n c_i1_{A_i}$) is dense in $L^2$ with respect to the metric.
Let us define a norm on stochastic processes; this is essentially an $L^2$ norm. Define
\[ \|N\| = \Big(E \sup_{0\le t<\infty} N_t^2\Big)^{1/2}. \]
One can show that this is a norm, and hence that the triangle inequality holds. Moreover, the space of processes $N$ such that $\|N\| < \infty$ is complete with respect to this norm. This means that if $N^n$ is a Cauchy sequence, i.e., if given $\varepsilon$ there exists $n_0$ such that $\|N^n - N^m\| < \varepsilon$ whenever $n, m \ge n_0$, then the Cauchy sequence converges, that is, there exists $N$ with $\|N\| < \infty$ such that $\|N^n - N\| \to 0$.
We can define another norm on stochastic processes. Define
\[ \|H\|_2 = \Big(E\int_0^\infty H_s^2\,ds\Big)^{1/2}. \]
This can be viewed as a standard $L^2$ norm, namely, the $L^2$ norm with respect to the measure $\mu$ defined on $\mathcal{P}$ by
\[ \mu(A) = E\int_0^\infty 1_A(s, \omega)\,ds. \]
Since the set of simple functions with respect to $\mu$ is dense in $L^2$, this says that if $H$ is measurable with respect to $\mathcal{P}$, then there exist simple processes $H^n_s$ that are also measurable with respect to $\mathcal{P}$ such that $\|H^n - H\|_2 \to 0$.
Note 4. We have $\|N^n - N\| \to 0$, where the norm here is the one described in Note 3. Each $N^n$ is a stochastic integral of the type described in Step 2 of the construction, hence each $N^n_t$ is a martingale. Let $s < t$ and $A \in \mathcal{F}_s$. Since $E[N^n_t \mid \mathcal{F}_s] = N^n_s$, then
\[ E[N^n_t; A] = E[N^n_s; A]. \tag{12.5} \]
By Cauchy-Schwarz,
\[ |E[N^n_t; A] - E[N_t; A]| \le E[\,|N^n_t - N_t|; A] \le \big(E[(N^n_t - N_t)^2]\big)^{1/2}\big(E[1_A^2]\big)^{1/2} \le \|N^n - N\| \to 0. \tag{12.6} \]
We have a similar limit when $t$ is replaced by $s$, so taking the limit in (12.5) yields
\[ E[N_t; A] = E[N_s; A]. \]
Since $N_s$ is $\mathcal{F}_s$ measurable and has the same expectation over sets $A \in \mathcal{F}_s$ as $N_t$ does, then by Proposition 4.3 $E[N_t \mid \mathcal{F}_s] = N_s$, or $N_t$ is a martingale.
Suppose $\|N^n - N\| \to 0$. Given $\varepsilon > 0$ there exists $n_0$ such that $\|N^n - N\| < \varepsilon$ if $n \ge n_0$. Take $\varepsilon = 1$ and choose $n_0$. By the triangle inequality,
\[ \|N\| \le \|N^n\| + \|N^n - N\| \le \|N^n\| + 1 < \infty \]
since $\|N^n\|$ is finite for each $n$.
That $N_t^2 - \langle N\rangle_t$ is a martingale is similar to the proof that $N_t$ is a martingale, but slightly more delicate. We leave the proof to the reader, but note that in place of (12.6) one writes
\[ |E[(N^n_t)^2; A] - E[(N_t)^2; A]| \le E[\,|(N^n_t)^2 - (N_t)^2|\,] \le E[\,|N^n_t - N_t|\,|N^n_t + N_t|\,]. \tag{12.7} \]
By Cauchy-Schwarz this is less than
\[ \|N^n_t - N_t\|\,\|N^n_t + N_t\|. \]
Since $\|N^n_t + N_t\| \le \|N^n_t\| + \|N_t\|$ is bounded independently of $n$, we see that the left hand side of (12.7) tends to 0.
Note 5. We have $\|N^n - N\| \to 0$, where the norm is described in Note 3. This means that
\[ E[\sup_t |N^n_t - N_t|^2] \to 0. \]
A result from measure theory implies that there exists a subsequence $n_k$ such that
\[ \sup_t |N^{n_k}_t - N_t|^2 \to 0, \quad \text{a.s.} \]
So except for a set of $\omega$'s of probability 0, $N^{n_k}_t(\omega)$ converges to $N_t(\omega)$ uniformly. Each $N^{n_k}_t(\omega)$ is continuous by Step 2, and the uniform limit of continuous functions is continuous, therefore $N_t(\omega)$ is a continuous function of $t$. Incidentally, this is the primary reason we considered Doob's inequalities.
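Note 6. If $M_t$ is a continuous martingale, $E[M_b - M_a \mid \mathcal{F}_a] = E[M_b \mid \mathcal{F}_a] - M_a = M_a - M_a = 0$. This is the analogue of Lemma 12.1(a). To show the analogue of Lemma 12.1(b),
\begin{align*}
E[(M_b - M_a)^2 \mid \mathcal{F}_a] &= E[M_b^2 \mid \mathcal{F}_a] - 2E[M_bM_a \mid \mathcal{F}_a] + E[M_a^2 \mid \mathcal{F}_a] \\
&= E[M_b^2 \mid \mathcal{F}_a] - 2M_aE[M_b \mid \mathcal{F}_a] + M_a^2 \\
&= E[M_b^2 - \langle M\rangle_b \mid \mathcal{F}_a] + E[\langle M\rangle_b - \langle M\rangle_a \mid \mathcal{F}_a] - M_a^2 + \langle M\rangle_a \\
&= E[\langle M\rangle_b - \langle M\rangle_a \mid \mathcal{F}_a],
\end{align*}
since $M_t^2 - \langle M\rangle_t$ is a martingale. That
\[ E[M_b^2 - M_a^2 \mid \mathcal{F}_a] = E[\langle M\rangle_b - \langle M\rangle_a \mid \mathcal{F}_a] \]
is just a rewriting of
\[ E[M_b^2 - \langle M\rangle_b \mid \mathcal{F}_a] = M_a^2 - \langle M\rangle_a = E[M_a^2 - \langle M\rangle_a \mid \mathcal{F}_a]. \]
With these two properties in place of Lemma 12.1, replacing $W_s$ by $M_s$ and $ds$ by $d\langle M\rangle_s$, the construction of the stochastic integral $\int_0^t H_s\,dM_s$ goes through exactly as above.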
Note 7. If we let $T_K = \inf\{t > 0 : \int_0^t H_s^2\,d\langle M\rangle_s \ge K\}$, the first time the integral is larger than or equal to $K$, and we let $H^K_s = H_s1_{(s\le T_K)}$, then $\int_0^\infty (H^K_s)^2\,d\langle M\rangle_s \le K$ and there is no difficulty defining $N^K_t = \int_0^t H^K_s\,dM_s$ for every $t$. One can show that if $t \le T_{K_1}$ and $t \le T_{K_2}$, then $N^{K_1}_t = N^{K_2}_t$ a.s. If $\int_0^t H_s^2\,d\langle M\rangle_s$ is finite for every $t$, then $T_K \to \infty$ as $K \to \infty$. If we call the common value $N_t$, this allows one to define the stochastic integral $N_t$ for each $t$ in the case where the integral $\int_0^t H_s^2\,d\langle M\rangle_s$ is finite for every $t$, even if the expectation of the integral is not.
We can do something similar if $M_t$ is a martingale but where we do not have $E\langle M\rangle_\infty < \infty$. Let $S_K = \inf\{t : |M_t| \ge K\}$, the first time $|M_t|$ is larger than or equal to $K$. If we let $M^K_t = M_{t\wedge S_K}$, where $t \wedge S_K = \min(t, S_K)$, then one can show $M^K$ is a martingale bounded in absolute value by $K$. So we can define $J^K_t = \int_0^t H_s\,dM^K_s$ for every $t$, using the paragraph above to handle the wider class of $H$'s, if necessary. Again, one can show that if $t \le S_{K_1}$ and $t \le S_{K_2}$, then the value of the stochastic integral will be the same no matter whether we use $M^{K_1}$ or $M^{K_2}$ as our martingale. We use the common value as a definition of the stochastic integral $J_t$. We have $S_K \to \infty$ as $K \to \infty$, so we have a definition of $J_t$ for each $t$.
Note 8. We only outline how the proof goes. To show
\[ \int_0^t H_s\,dN_s = \int_0^t H_sK_s\,dW_s, \tag{12.8} \]
one shows that (12.8) holds for $H_s$ simple and then takes limits. To show this, it suffices to look at $H_s$ elementary and use linearity. To show (12.8) for $H_s$ elementary, first prove this in the case when $K_s$ is elementary, use linearity to extend it to the case when $K$ is simple, and then take limits to obtain it for arbitrary $K$. Thus one reduces the proof to showing (12.8) when both $H$ and $K$ are elementary. In this situation, one can explicitly write out both sides of the equation and see that they are equal.
13. Ito’s formula.
Suppose $W_t$ is a Brownian motion and $f : \mathbb{R} \to \mathbb{R}$ is a $C^2$ function, that is, $f$ and its first two derivatives are continuous. Ito's formula, which is sometimes known as the change of variables formula, says that
\[ f(W_t) - f(W_0) = \int_0^t f'(W_s)\,dW_s + \frac12\int_0^t f''(W_s)\,ds. \]
Compare this with the fundamental theorem of calculus:
\[ f(t) - f(0) = \int_0^t f'(s)\,ds. \]
In Ito's formula we have a second order term to carry along.
The idea behind the proof is quite simple. By Taylor's theorem,
\begin{align*}
f(W_t) - f(W_0) &= \sum_{i=0}^{n-1} [f(W_{(i+1)t/n}) - f(W_{it/n})] \\
&\approx \sum_{i=0}^{n-1} f'(W_{it/n})(W_{(i+1)t/n} - W_{it/n}) + \frac12\sum_{i=0}^{n-1} f''(W_{it/n})(W_{(i+1)t/n} - W_{it/n})^2.
\end{align*}
The first sum on the right is approximately the stochastic integral, and the second is approximately an integral with respect to the quadratic variation, since each $(W_{(i+1)t/n} - W_{it/n})^2$ is approximately $t/n$.
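The Taylor-sum heuristic can be checked numerically. The following is a minimal sketch, not part of the original notes: it simulates one Brownian path on a fine grid with Python's standard library and compares $f(W_t) - f(W_0)$ with the first-order sum plus half the second-order sum, here for $f(x) = x^3$. The function name `ito_sums`, the grid size, and the seed are assumptions chosen for illustration.

```python
import math
import random

def ito_sums(f, df, d2f, t=1.0, n=100_000, seed=0):
    """Simulate W on a grid of n steps over [0, t] and return the pair
    (f(W_t) - f(W_0), first-order sum + 0.5 * second-order sum)."""
    rng = random.Random(seed)
    dt = t / n
    w = 0.0
    first = 0.0   # sum of f'(W_i) * (W_{i+1} - W_i): approximates the stochastic integral
    second = 0.0  # sum of f''(W_i) * (W_{i+1} - W_i)^2: approximates the ds integral
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over one grid step
        first += df(w) * dw
        second += d2f(w) * dw * dw
        w += dw
    return f(w) - f(0.0), first + 0.5 * second

lhs, rhs = ito_sums(lambda x: x ** 3, lambda x: 3 * x * x, lambda x: 6 * x)
print(lhs, rhs)  # the two numbers should agree closely
```

For this choice of $f$ the discrepancy between the two numbers is exactly the sum of the cubed increments, which is why it shrinks as the grid is refined.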
For a more general semimartingale $X_t = M_t + A_t$, Ito's formula reads
Theorem 13.1. If $f \in C^2$, then
\[ f(X_t) - f(X_0) = \int_0^t f'(X_s)\,dX_s + \frac12\int_0^t f''(X_s)\,d\langle M\rangle_s. \]
Let us look at an example. Let $W_t$ be Brownian motion, $X_t = \sigma W_t - \sigma^2t/2$, and $f(x) = e^x$. Then $\langle X\rangle_t = \langle \sigma W\rangle_t = \sigma^2t$, $f'(x) = f''(x) = e^x$, and
\begin{align*}
e^{\sigma W_t - \sigma^2t/2} &= 1 + \int_0^t e^{X_s}\sigma\,dW_s - \int_0^t e^{X_s}\tfrac12\sigma^2\,ds + \frac12\int_0^t e^{X_s}\sigma^2\,ds \tag{13.1} \\
&= 1 + \int_0^t e^{X_s}\sigma\,dW_s.
\end{align*}
This example will be revisited many times later on.
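One consequence of (13.1) is that $e^{\sigma W_t - \sigma^2t/2}$ equals 1 plus a stochastic integral, so its expectation should be 1 for every $t$. The sketch below, added here only as an illustration, estimates that expectation by Monte Carlo using the fact that $W_t$ is normal with mean 0 and variance $t$. The function name, sample size, and seed are assumptions.

```python
import math
import random

def mean_exponential_martingale(sigma, t, samples=200_000, seed=1):
    """Monte Carlo estimate of E exp(sigma * W_t - sigma^2 * t / 2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        w_t = rng.gauss(0.0, math.sqrt(t))  # W_t is N(0, t)
        total += math.exp(sigma * w_t - sigma * sigma * t / 2.0)
    return total / samples

print(mean_exponential_martingale(sigma=0.5, t=1.0))  # should be close to 1
```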
Let us give another example of the use of Ito's formula. Let $X_t = W_t$ and let $f(x) = x^k$. Then $f'(x) = kx^{k-1}$ and $f''(x) = k(k-1)x^{k-2}$. We then have
\begin{align*}
W_t^k &= W_0^k + \int_0^t kW_s^{k-1}\,dW_s + \frac12\int_0^t k(k-1)W_s^{k-2}\,d\langle W\rangle_s \\
&= \int_0^t kW_s^{k-1}\,dW_s + \frac{k(k-1)}{2}\int_0^t W_s^{k-2}\,ds.
\end{align*}
When $k = 3$, this says $W_t^3 - 3\int_0^t W_s\,ds$ is a stochastic integral with respect to a Brownian motion, and hence a martingale.
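For comparison, here is the same computation with $k = 2$, written out as a worked case for illustration:

```latex
\[
W_t^2 = \int_0^t 2W_s\,dW_s + \frac{2\cdot 1}{2}\int_0^t W_s^{0}\,ds
      = 2\int_0^t W_s\,dW_s + t .
\]
```

So $W_t^2 - t$ is a stochastic integral with respect to Brownian motion, hence a martingale, in agreement with $\langle W\rangle_t = t$.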
For a semimartingale $X_t = M_t + A_t$ we set $\langle X\rangle_t = \langle M\rangle_t$. Given two semimartingales $X$, $Y$, we define
\[ \langle X, Y\rangle_t = \tfrac12\big[\langle X+Y\rangle_t - \langle X\rangle_t - \langle Y\rangle_t\big]. \]
The following is known as Ito's product formula. It may also be viewed as an integration by parts formula.
Proposition 13.2. If $X_t$ and $Y_t$ are semimartingales,
\[ X_tY_t = X_0Y_0 + \int_0^t X_s\,dY_s + \int_0^t Y_s\,dX_s + \langle X, Y\rangle_t. \]
Proof. Applying Ito's formula with $f(x) = x^2$ to $X_t + Y_t$, we obtain
\[ (X_t + Y_t)^2 = (X_0 + Y_0)^2 + 2\int_0^t (X_s + Y_s)(dX_s + dY_s) + \langle X+Y\rangle_t. \]
Applying Ito's formula with $f(x) = x^2$ to $X$ and to $Y$, then
\[ X_t^2 = X_0^2 + 2\int_0^t X_s\,dX_s + \langle X\rangle_t \]
and
\[ Y_t^2 = Y_0^2 + 2\int_0^t Y_s\,dY_s + \langle Y\rangle_t. \]
Then some algebra and the fact that
\[ X_tY_t = \tfrac12\big[(X_t + Y_t)^2 - X_t^2 - Y_t^2\big] \]
yields the formula.
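The algebra can be spelled out as follows (a worked version of the step, using only the three displays in the proof and the polarization definition of $\langle X, Y\rangle_t$):

```latex
\begin{align*}
2X_tY_t &= (X_t+Y_t)^2 - X_t^2 - Y_t^2 \\
&= 2X_0Y_0 + 2\int_0^t (X_s+Y_s)(dX_s+dY_s)
   - 2\int_0^t X_s\,dX_s - 2\int_0^t Y_s\,dY_s \\
&\qquad + \langle X+Y\rangle_t - \langle X\rangle_t - \langle Y\rangle_t \\
&= 2X_0Y_0 + 2\int_0^t X_s\,dY_s + 2\int_0^t Y_s\,dX_s + 2\langle X,Y\rangle_t .
\end{align*}
```

Dividing by 2 gives the product formula.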
There is a multidimensional version of Ito's formula: if $X_t = (X^1_t, \ldots, X^d_t)$ is a vector, each component of which is a semimartingale, and $f \in C^2$, then
\[ f(X_t) - f(X_0) = \sum_{i=1}^d \int_0^t \frac{\partial f}{\partial x_i}(X_s)\,dX^i_s + \frac12\sum_{i,j=1}^d \int_0^t \frac{\partial^2 f}{\partial x_i\,\partial x_j}(X_s)\,d\langle X^i, X^j\rangle_s. \]
The following application of Ito's formula, known as Lévy's theorem, is important.
Theorem 13.3. Suppose $M_t$ is a continuous martingale with $\langle M\rangle_t = t$. Then $M_t$ is a Brownian motion.
Before proving this, recall from undergraduate probability that the moment generating function of a r.v. $X$ is defined by $m_X(a) = Ee^{aX}$ and that if two random variables have the same moment generating function, they have the same law. This is also true if we replace $a$ by $iu$. In this case we have $\varphi_X(u) = Ee^{iuX}$ and $\varphi_X$ is called the characteristic function of $X$. The reason for looking at the characteristic function is that $\varphi_X$ always exists, whereas $m_X(a)$ might be infinite. The one special case we will need is that if $X$ is a normal r.v. with mean 0 and variance $t$, then $\varphi_X(u) = e^{-u^2t/2}$. This follows from the formula for $m_X(a)$ with $a$ replaced by $iu$ (this can be justified rigorously).
Proof. We will prove that $M_t$ is a $\mathcal{N}(0, t)$; for the remainder of the proof see Note 1.
We apply Ito's formula with $f(x) = e^{iux}$. Then
\[ e^{iuM_t} = 1 + \int_0^t iue^{iuM_s}\,dM_s + \frac12\int_0^t (-u^2)e^{iuM_s}\,d\langle M\rangle_s. \]
Taking expectations and using $\langle M\rangle_s = s$ and the fact that a stochastic integral is a martingale, hence has 0 expectation, we have
\[ Ee^{iuM_t} = 1 - \frac{u^2}{2}\int_0^t Ee^{iuM_s}\,ds. \]
Let $J(t) = Ee^{iuM_t}$. The equation can be rewritten
\[ J(t) = 1 - \frac{u^2}{2}\int_0^t J(s)\,ds. \]
So $J'(t) = -\frac12u^2J(t)$ with $J(0) = 1$. The solution to this elementary ODE is $J(t) = e^{-u^2t/2}$. Since
\[ Ee^{iuM_t} = e^{-u^2t/2}, \]
then by our remarks above the law of $M_t$ must be that of a $\mathcal{N}(0, t)$, which shows that $M_t$ is a mean 0 variance $t$ normal r.v.
Note 1. If $A \in \mathcal{F}_s$ and we do the same argument with $M_t$ replaced by $M_{s+t} - M_s$, we have
\[ e^{iu(M_{s+t} - M_s)} = 1 + \int_0^t iue^{iu(M_{s+r} - M_s)}\,dM_{s+r} + \frac12\int_0^t (-u^2)e^{iu(M_{s+r} - M_s)}\,d\langle M\rangle_{s+r}. \]
Multiply this by $1_A$ and take expectations. Since a stochastic integral is a martingale, the stochastic integral term again has expectation 0. If we let $K(t) = E[e^{iu(M_{t+s} - M_s)}; A]$, we now arrive at $K'(t) = -\frac12u^2K(t)$ with $K(0) = P(A)$, so
\[ K(t) = P(A)e^{-u^2t/2}. \]
Therefore
\[ E\big[e^{iu(M_{t+s} - M_s)}; A\big] = Ee^{iu(M_{t+s} - M_s)}\,P(A). \tag{13.2} \]
If $f$ is a nice function and $\hat{f}$ is its Fourier transform, replace $u$ in the above by $-u$, multiply by $\hat{f}(u)$, and integrate over $u$. (To do the integral, we approximate the integral by a Riemann sum and then take limits.) We then have
\[ E[f(M_{s+t} - M_s); A] = E[f(M_{s+t} - M_s)]\,P(A). \]
By taking limits we have this for $f = 1_B$, so
\[ P(M_{s+t} - M_s \in B, A) = P(M_{s+t} - M_s \in B)P(A). \]
This implies that $M_{s+t} - M_s$ is independent of $\mathcal{F}_s$.
Note $\mathrm{Var}\,(M_t - M_s) = t - s$; take $A = \Omega$ in (13.2).
14. The Girsanov theorem.
Suppose $P$ is a probability and
\[ dX_t = dW_t + \mu(X_t)\,dt, \]
where $W_t$ is a Brownian motion. This is shorthand for
\[ X_t = X_0 + W_t + \int_0^t \mu(X_s)\,ds. \tag{14.1} \]
Let
\[ M_t = \exp\Big(-\int_0^t \mu(X_s)\,dW_s - \int_0^t \mu(X_s)^2\,ds/2\Big). \tag{14.2} \]
Then as we have seen before, by Ito's formula, $M_t$ is a martingale. This calculation is reviewed in Note 1. We also observe that $M_0 = 1$.
Now let us define a new probability by setting
\[ Q(A) = E[M_t; A] \tag{14.3} \]
if $A \in \mathcal{F}_t$. We had better be sure this $Q$ is well defined. If $A \in \mathcal{F}_s \subset \mathcal{F}_t$, then $E[M_t; A] = E[M_s; A]$ because $M_t$ is a martingale. We also check that $Q(\Omega) = E[M_t; \Omega] = EM_t$. This is equal to $EM_0 = 1$, since $M_t$ is a martingale.
What the Girsanov theorem says is
Theorem 14.1. Under $Q$, $X_t$ is a Brownian motion.
Under $P$, $W_t$ is a Brownian motion and $X_t$ is not. Under $Q$, the process $W_t$ is no longer a Brownian motion.
In order for a process $X_t$ to be a Brownian motion, we need at a minimum that $X_t$ is mean zero and variance $t$. To define mean and variance, we need a probability. Therefore a process might be a Brownian motion with respect to one probability and not another. Most of the other parts of the definition of being a Brownian motion also depend on the probability.
Similarly, to be a martingale, we need conditional expectations, and the conditional expectation of a random variable depends on what probability is being used.
There is a more general version of the Girsanov theorem.
Theorem 14.2. If $X_t$ is a martingale under $P$, then under $Q$ the process $X_t - D_t$ is a martingale where
\[ D_t = \int_0^t \frac{1}{M_s}\,d\langle X, M\rangle_s. \]
$\langle X\rangle_t$ is the same under both $P$ and $Q$.
Let us see how Theorem 14.1 can be used. Let $S_t$ be the stock price, and suppose
\[ dS_t = \sigma S_t\,dW_t + mS_t\,dt. \]
(So in the above formulation, $\mu(x) = m$ for all $x$.) Define
\[ M_t = e^{(-m/\sigma)W_t - (m^2/2\sigma^2)t}. \]
Then from (13.1) $M_t$ is a martingale and
\[ M_t = 1 + \int_0^t \Big(-\frac{m}{\sigma}\Big)M_s\,dW_s. \]
Let $X_t = W_t$. Then
\[ \langle X, M\rangle_t = \int_0^t \Big(-\frac{m}{\sigma}\Big)M_s\,ds = -\int_0^t M_s\frac{m}{\sigma}\,ds. \]
Therefore
\[ \int_0^t \frac{1}{M_s}\,d\langle X, M\rangle_s = -\int_0^t \frac{m}{\sigma}\,ds = -(m/\sigma)t. \]
Define $Q$ by (14.3). By Theorem 14.2, under $Q$ the process $\widetilde{W}_t = W_t + (m/\sigma)t$ is a martingale. Hence
\[ dS_t = \sigma S_t(dW_t + (m/\sigma)\,dt) = \sigma S_t\,d\widetilde{W}_t, \]
or
\[ S_t = S_0 + \int_0^t \sigma S_s\,d\widetilde{W}_s \]
is a martingale. So we have found a probability under which the asset price is a martingale. This means that $Q$ is the risk-neutral probability, which we have been calling $\overline{P}$.
Let us give another example of the use of the Girsanov theorem. Suppose $X_t = W_t + \mu t$, where $\mu$ is a constant. We want to compute the probability that $X_t$ exceeds the level $a$ by time $t_0$.
We first need the probability that a Brownian motion crosses a level $a$ by time $t_0$. If $A_t = \sup_{s\le t} W_s$ (note we are not looking at $|W_t|$), we have
\[ P(A_t > a,\ c \le W_t \le d) = \int_c^d \varphi(t, a, x)\,dx, \tag{14.4} \]
where
\[ \varphi(t, a, x) = \begin{cases} \dfrac{1}{\sqrt{2\pi t}}\,e^{-x^2/2t}, & x \ge a, \\[2mm] \dfrac{1}{\sqrt{2\pi t}}\,e^{-(2a-x)^2/2t}, & x < a. \end{cases} \]
This is called the reflection principle, and the name is due to the derivation, given in Note 2. Sometimes one says
\[ P(W_t = x, A_t > a) = P(W_t = 2a - x), \qquad x < a, \]
but this is not precise because $W_t$ is a continuous random variable and both sides of the above equation are zero; (14.4) is the rigorous version of the reflection principle.
Now let $W_t$ be a Brownian motion under $P$. Let $dQ/dP = M_t = e^{\mu W_t - \mu^2t/2}$. Let $Y_t = W_t - \mu t$. Theorem 14.1 says that under $Q$, $Y_t$ is a Brownian motion. We have $W_t = Y_t + \mu t$.
Let $A = (\sup_{s\le t_0} W_s \ge a)$. We want to calculate
\[ P(\sup_{s\le t_0} (W_s + \mu s) \ge a). \]
$W_t$ is a Brownian motion under $P$ while $Y_t$ is a Brownian motion under $Q$. So this probability is equal to
\[ Q(\sup_{s\le t_0} (Y_s + \mu s) \ge a). \]
This in turn is equal to
\[ Q(\sup_{s\le t_0} W_s \ge a) = Q(A). \]
Now we use the expression for $M_t$:
\begin{align*}
Q(A) &= E_P[e^{\mu W_{t_0} - \mu^2t_0/2}; A] \\
&= \int_{-\infty}^\infty e^{\mu x - \mu^2t_0/2}\,P(\sup_{s\le t_0} W_s \ge a,\ W_{t_0} = x)\,dx \\
&= e^{-\mu^2t_0/2}\Big[\int_{-\infty}^a \frac{1}{\sqrt{2\pi t_0}}\,e^{\mu x}e^{-(2a-x)^2/2t_0}\,dx + \int_a^\infty \frac{1}{\sqrt{2\pi t_0}}\,e^{\mu x}e^{-x^2/2t_0}\,dx\Big].
\end{align*}
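The two integrals in the last expression can be evaluated numerically. The sketch below is an illustration only: it applies a midpoint rule to both integrals and compares the result with the closed form $1 - \Phi((a - \mu t_0)/\sqrt{t_0}) + e^{2\mu a}\Phi((-a - \mu t_0)/\sqrt{t_0})$, which is obtained by completing the square in each integrand and is not stated in the notes. The function names, the truncation of the infinite ranges, and the assumption $a > 0$ are choices made for this example.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def crossing_prob_by_integration(mu, a, t0, n=50_000):
    """Midpoint-rule evaluation of the two integrals, assuming a > 0."""
    # Both integrands are negligible farther than `cutoff` from the level a.
    cutoff = 12.0 * math.sqrt(t0) + abs(mu) * t0
    h = cutoff / n
    norm = 1.0 / math.sqrt(2.0 * math.pi * t0)
    below = 0.0  # integral over x < a, with the reflected density
    above = 0.0  # integral over x >= a, with the ordinary normal density
    for i in range(n):
        x_lo = a - (i + 0.5) * h
        x_hi = a + (i + 0.5) * h
        below += math.exp(mu * x_lo - (2.0 * a - x_lo) ** 2 / (2.0 * t0))
        above += math.exp(mu * x_hi - x_hi ** 2 / (2.0 * t0))
    return math.exp(-mu * mu * t0 / 2.0) * norm * h * (below + above)

def crossing_prob_closed_form(mu, a, t0):
    """Closed form from completing the square (an added fact, see lead-in)."""
    s = math.sqrt(t0)
    return (1.0 - normal_cdf((a - mu * t0) / s)
            + math.exp(2.0 * mu * a) * normal_cdf((-a - mu * t0) / s))

print(crossing_prob_by_integration(0.3, 1.0, 2.0))
print(crossing_prob_closed_form(0.3, 1.0, 2.0))
```

When $\mu = 0$ the closed form reduces to $2(1 - \Phi(a/\sqrt{t_0}))$, the usual reflection principle value.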

Proof of Theorem 14.1. Using Ito's formula with $f(x) = e^x$,
\[ M_t = 1 - \int_0^t \mu(X_r)M_r\,dW_r. \]
So
\[ \langle W, M\rangle_t = -\int_0^t \mu(X_r)M_r\,dr. \]
Let $s < t$ and $A \in \mathcal{F}_s$. Since $Q(A) = E_P[M_t; A]$, it is not hard to see that
\[ E_Q[W_t; A] = E_P[M_tW_t; A]. \]
By Ito's product formula this is
\[ E_P\Big[\int_0^t M_r\,dW_r; A\Big] + E_P\Big[\int_0^t W_r\,dM_r; A\Big] + E_P\big[\langle W, M\rangle_t; A\big]. \]
Since $\int_0^t M_r\,dW_r$ and $\int_0^t W_r\,dM_r$ are stochastic integrals with respect to martingales, they are themselves martingales. Thus the above is equal to
\[ E_P\Big[\int_0^s M_r\,dW_r; A\Big] + E_P\Big[\int_0^s W_r\,dM_r; A\Big] + E_P\big[\langle W, M\rangle_t; A\big]. \]
Using the product formula again, this is
\[ E_P[M_sW_s; A] + E_P[\langle W, M\rangle_t - \langle W, M\rangle_s; A] = E_Q[W_s; A] + E_P[\langle W, M\rangle_t - \langle W, M\rangle_s; A]. \]
The last term on the right is equal to
\begin{align*}
E_P\Big[\int_s^t d\langle W, M\rangle_r; A\Big] &= -E_P\Big[\int_s^t M_r\mu(X_r)\,dr; A\Big] = -E_P\Big[\int_s^t E_P[M_t \mid \mathcal{F}_r]\,\mu(X_r)\,dr; A\Big] \\
&= -E_P\Big[\int_s^t M_t\mu(X_r)\,dr; A\Big] = -E_Q\Big[\int_s^t \mu(X_r)\,dr; A\Big] \\
&= -E_Q\Big[\int_0^t \mu(X_r)\,dr; A\Big] + E_Q\Big[\int_0^s \mu(X_r)\,dr; A\Big].
\end{align*}
Therefore
\[ E_Q\Big[W_t + \int_0^t \mu(X_r)\,dr; A\Big] = E_Q\Big[W_s + \int_0^s \mu(X_r)\,dr; A\Big], \]
which shows $X_t$ is a martingale with respect to $Q$.
Similarly, $X_t^2 - t$ is a martingale with respect to $Q$. By Lévy's theorem, $X_t$ is a Brownian motion.
In Note 3 we give a proof of Theorem 14.2 and in Note 4 we show how Theorem 14.1 is really a special case of Theorem 14.2.
Note 1. Let
\[ Y_t = -\int_0^t \mu(X_s)\,dW_s - \frac12\int_0^t [\mu(X_s)]^2\,ds. \]
We apply Ito's formula with the function $f(x) = e^x$. Note the martingale part of $Y_t$ is the stochastic integral term and the quadratic variation of $Y$ is the quadratic variation of the martingale part, so
\[ \langle Y\rangle_t = \int_0^t [-\mu(X_s)]^2\,ds. \]
Then $f'(x) = e^x$, $f''(x) = e^x$, and hence
\begin{align*}
M_t = e^{Y_t} &= e^{Y_0} + \int_0^t e^{Y_s}\,dY_s + \frac12\int_0^t e^{Y_s}\,d\langle Y\rangle_s \\
&= 1 + \int_0^t M_s(-\mu(X_s))\,dW_s - \frac12\int_0^t M_s[\mu(X_s)]^2\,ds + \frac12\int_0^t M_s[-\mu(X_s)]^2\,ds \\
&= 1 - \int_0^t M_s\mu(X_s)\,dW_s.
\end{align*}
Since stochastic integrals with respect to a Brownian motion are martingales, this completes the argument that $M_t$ is a martingale.
Note 2. Let $S_n$ be a simple random walk. This means that $X_1, X_2, \ldots$ are independent and identically distributed random variables with $P(X_i = 1) = P(X_i = -1) = \frac12$; let $S_n = \sum_{i=1}^n X_i$. If you are playing a game where you toss a fair coin and win $1 if it comes up heads and lose $1 if it comes up tails, then $S_n$ will be your fortune at time $n$. Let $A_n = \max_{0\le k\le n} S_k$. We will show the analogue of (14.4) for $S_n$, which is
\[ P(S_n = x, A_n \ge a) = \begin{cases} P(S_n = x), & x \ge a, \\ P(S_n = 2a - x), & x < a. \end{cases} \tag{14.5} \]
(14.4) can be derived from this using a weak convergence argument.
To establish (14.5), note that if $x \ge a$ and $S_n = x$, then automatically $A_n \ge a$, so the only case to consider is when $x < a$. Any path that crosses $a$ but is at level $x$ at time $n$ has a corresponding path determined by reflecting across level $a$ at the first time the random walk hits $a$; the reflected path will end up at $a + (a - x) = 2a - x$. The probability on the left hand side of (14.5) is the number of paths that hit $a$ and end up at $x$ divided by the total number of paths. Since the number of paths that hit $a$ and end up at $x$ is equal to the number of paths that end up at $2a - x$, then the probability on the left is equal to the number of paths that end up at $2a - x$ divided by the total number of paths; this is $P(S_n = 2a - x)$, which is the right hand side.
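Because (14.5) is a statement about counting paths, it can be verified by brute force for small $n$. The following sketch, added for illustration, enumerates all $2^n$ paths of the walk and compares the two counts for every endpoint $x$. The function name `check_reflection` is hypothetical, and the level $a$ is assumed to be an integer with $a \ge 1$.

```python
from itertools import product

def check_reflection(n, a):
    """Return True if the path-counting identity behind (14.5) holds for every x.
    Assumes a is an integer with a >= 1."""
    hit_and_end = {}  # x -> number of paths with S_n = x and max_k S_k >= a
    end = {}          # x -> number of paths with S_n = x
    for steps in product((1, -1), repeat=n):
        s, running_max = 0, 0  # S_0 = 0 is included in the maximum
        for step in steps:
            s += step
            running_max = max(running_max, s)
        end[s] = end.get(s, 0) + 1
        if running_max >= a:
            hit_and_end[s] = hit_and_end.get(s, 0) + 1
    for x in range(-n, n + 1):
        # Right hand side of (14.5): endpoint x itself, or its reflection 2a - x.
        target = x if x >= a else 2 * a - x
        if hit_and_end.get(x, 0) != end.get(target, 0):
            return False
    return True

print(check_reflection(n=10, a=3))
```

Since every path has probability $2^{-n}$, equality of the counts is the same as equality of the probabilities.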
Note 3. To prove Theorem 14.2, we proceed as follows. Assume without loss of generality that $X_0 = 0$. Then if $A \in \mathcal{F}_s$,
\begin{align*}
E_Q[X_t; A] &= E_P[M_tX_t; A] \\
&= E_P\Big[\int_0^t M_r\,dX_r; A\Big] + E_P\Big[\int_0^t X_r\,dM_r; A\Big] + E_P[\langle X, M\rangle_t; A] \\
&= E_P\Big[\int_0^s M_r\,dX_r; A\Big] + E_P\Big[\int_0^s X_r\,dM_r; A\Big] + E_P[\langle X, M\rangle_t; A] \\
&= E_Q[X_s; A] + E_P[\langle X, M\rangle_t - \langle X, M\rangle_s; A].
\end{align*}
Here we used the fact that stochastic integrals with respect to the martingales $X$ and $M$ are again martingales.
On the other hand,
\begin{align*}
E_P[\langle X, M\rangle_t - \langle X, M\rangle_s; A] &= E_P\Big[\int_s^t d\langle X, M\rangle_r; A\Big] = E_P\Big[\int_s^t M_r\,dD_r; A\Big] \\
&= E_P\Big[\int_s^t E_P[M_t \mid \mathcal{F}_r]\,dD_r; A\Big] = E_P\Big[\int_s^t M_t\,dD_r; A\Big] \\
&= E_P[(D_t - D_s)M_t; A] \\
&= E_Q[D_t - D_s; A].
\end{align*}
The proof of the quadratic variation assertion is similar.
Note 4. Here is an argument showing how Theorem 14.1 can also be derived from Theorem 14.2.
From our formula for $M$ we have $dM_t = -M_t\mu(X_t)\,dW_t$, and therefore $d\langle X, M\rangle_t = -M_t\mu(X_t)\,dt$. Hence by Theorem 14.2 we see that under $Q$, $X_t$ is a continuous martingale with $\langle X\rangle_t = t$. By Lévy's theorem, this means that $X$ is a Brownian motion under $Q$.
15. Stochastic differential equations.
Let $W_t$ be a Brownian motion. We are interested in the existence and uniqueness for stochastic differential equations (SDEs) of the form
\[ dX_t = \sigma(X_t)\,dW_t + b(X_t)\,dt, \qquad X_0 = x_0. \tag{15.1} \]
This means $X_t$ satisfies
\[ X_t = x_0 + \int_0^t \sigma(X_s)\,dW_s + \int_0^t b(X_s)\,ds. \tag{15.2} \]
Here $W_t$ is a Brownian motion, and (15.2) holds for almost every $\omega$.
We have to make some assumptions on $\sigma$ and $b$. We assume they are Lipschitz, which means:
\[ |\sigma(x) - \sigma(y)| \le c|x - y|, \qquad |b(x) - b(y)| \le c|x - y| \]
for some constant $c$. We also suppose that $\sigma$ and $b$ grow at most linearly, which means:
\[ |\sigma(x)| \le c(1 + |x|), \qquad |b(x)| \le c(1 + |x|). \]
Theorem 15.1. There exists one and only one solution to (15.2).
The idea of the proof is Picard iteration, which is how existence and uniqueness for ordinary differential equations is proved; see Note 1.
The intuition behind (15.1) is that $X_t$ behaves locally like a multiple of Brownian motion plus a constant drift: locally $X_{t+h} - X_t \approx \sigma(W_{t+h} - W_t) + b((t+h) - t)$. However the constants $\sigma$ and $b$ depend on the current value of $X_t$. When $X_t$ is at different points, the coefficients vary, which is why they are written $\sigma(X_t)$ and $b(X_t)$. $\sigma$ is sometimes called the diffusion coefficient and $b$ is sometimes called the drift coefficient.
The above theorem also works in higher dimensions. We want to solve
\[ dX^i_t = \sum_{j=1}^d \sigma_{ij}(X_t)\,dW^j_t + b_i(X_t)\,dt, \qquad i = 1, \ldots, d. \]
This is an abbreviation for the equation
\[ X^i_t = x^i_0 + \int_0^t \sum_{j=1}^d \sigma_{ij}(X_s)\,dW^j_s + \int_0^t b_i(X_s)\,ds. \]
Here the initial value is $x_0 = (x^1_0, \ldots, x^d_0)$, the solution process is $X_t = (X^1_t, \ldots, X^d_t)$, and $W^1_t, \ldots, W^d_t$ are $d$ independent Brownian motions. If all of the $\sigma_{ij}$ and $b_i$ are Lipschitz and grow at most linearly, we have existence and uniqueness for the solution.
Suppose one wants to solve
\[ dZ_t = aZ_t\,dW_t + bZ_t\,dt. \]
Note that this equation is linear in $Z_t$, and it turns out that linear equations are almost the only ones that have an explicit solution. In this case we can write down the explicit solution and then verify that it satisfies the SDE. The uniqueness result above (Theorem 15.1) shows that we have in fact found the solution.
Let
\[ Z_t = Z_0e^{aW_t - a^2t/2 + bt}. \]
We will verify that this is correct by using Ito's formula. Let $X_t = aW_t - a^2t/2 + bt$. Then $X_t$ is a semimartingale with martingale part $aW_t$ and $\langle X\rangle_t = a^2t$, and $Z_t = Z_0e^{X_t}$. By Ito's formula with $f(x) = Z_0e^x$,
\begin{align*}
Z_t &= Z_0 + \int_0^t Z_0e^{X_s}\,dX_s + \frac12\int_0^t Z_0e^{X_s}a^2\,ds \\
&= Z_0 + \int_0^t aZ_s\,dW_s - \int_0^t \frac{a^2}{2}Z_s\,ds + \int_0^t bZ_s\,ds + \frac12\int_0^t a^2Z_s\,ds \\
&= Z_0 + \int_0^t aZ_s\,dW_s + \int_0^t bZ_s\,ds.
\end{align*}
This is the integrated form of the equation we wanted to solve.
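As a numerical sanity check of the explicit solution, one can discretize $dZ_t = aZ_t\,dW_t + bZ_t\,dt$ step by step along a simulated Brownian path and compare the result with the closed-form expression evaluated on the same path. The sketch below uses the Euler scheme, a standard discretization that is not discussed in these notes; the function name, step count, and seed are assumptions made for illustration.

```python
import math
import random

def compare_exact_and_euler(a, b, z0=1.0, t=1.0, n=20_000, seed=2):
    """Return (exact, euler): the explicit solution Z_0 exp(a W_t - a^2 t/2 + b t)
    and an Euler discretization of dZ = a Z dW + b Z dt along the same path."""
    rng = random.Random(seed)
    dt = t / n
    w = 0.0
    z_euler = z0
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over one step
        z_euler += a * z_euler * dw + b * z_euler * dt
        w += dw
    z_exact = z0 * math.exp(a * w - a * a * t / 2.0 + b * t)
    return z_exact, z_euler

exact, euler = compare_exact_and_euler(a=0.3, b=0.1)
print(exact, euler)  # the two values should be close for a fine grid
```

Dropping the $-a^2t/2$ term from the exponent produces a visible mismatch, which is a quick way to see why the Ito correction is needed.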
There is a connection between SDEs and partial differential equations. Let $f$ be a $C^2$ function. If we apply Ito's formula,
\[ f(X_t) = f(X_0) + \int_0^t f'(X_s)\,dX_s + \frac12\int_0^t f''(X_s)\,d\langle X\rangle_s. \]
From (15.2) we know $\langle X\rangle_t = \int_0^t \sigma(X_s)^2\,ds$. If we substitute for $dX_s$ and $d\langle X\rangle_s$, we obtain
\begin{align*}
f(X_t) &= f(X_0) + \int_0^t f'(X_s)\sigma(X_s)\,dW_s + \int_0^t f'(X_s)b(X_s)\,ds + \frac12\int_0^t f''(X_s)\sigma(X_s)^2\,ds \\
&= f(X_0) + \int_0^t f'(X_s)\sigma(X_s)\,dW_s + \int_0^t Lf(X_s)\,ds,
\end{align*}
where we write
\[ Lf(x) = \tfrac12\sigma(x)^2f''(x) + b(x)f'(x). \]
$L$ is an example of a differential operator. Since the stochastic integral with respect to a Brownian motion is a martingale, we see from the above that
\[ f(X_t) - f(X_0) - \int_0^t Lf(X_s)\,ds \]
is a martingale. This fact can be exploited to derive results about PDEs from SDEs and vice versa.
Note 1. Let us illustrate the uniqueness part, and for simplicity, assume $b$ is identically 0 and $\sigma$ is bounded.
Proof of uniqueness. If $X$ and $Y$ are two solutions,
\[ X_t - Y_t = \int_0^t [\sigma(X_s) - \sigma(Y_s)]\,dW_s. \]
So
\[ E|X_t - Y_t|^2 = E\int_0^t |\sigma(X_s) - \sigma(Y_s)|^2\,ds \le c\int_0^t E|X_s - Y_s|^2\,ds, \]
using the Lipschitz hypothesis on $\sigma$. If we let $g(t) = E|X_t - Y_t|^2$, we have
\[ g(t) \le c\int_0^t g(s)\,ds. \]
Since we are assuming $\sigma$ is bounded, $EX_t^2 = E\int_0^t (\sigma(X_s))^2\,ds \le ct$ and similarly for $EY_t^2$, so $g(t) \le ct$. Then
\[ g(t) \le c\int_0^t \Big(c\int_0^s g(r)\,dr\Big)\,ds. \]
Iteration implies
\[ g(t) \le At^n/n! \]
for each $n$, which implies $g$ must be 0.
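The iteration can be written out explicitly. In the sketch below, $A$ denotes the constant in the starting bound $g(t) \le At$ and $c$ the constant in $g(t) \le c\int_0^t g(s)\,ds$; keeping the two constants separate is a choice made here for clarity, since the notes use $c$ for a generic constant.

```latex
\begin{align*}
g(t) &\le At, \\
g(t) &\le c\int_0^t g(s)\,ds \le c\int_0^t As\,ds = \frac{Ac\,t^2}{2}, \\
g(t) &\le c\int_0^t \frac{Ac\,s^2}{2}\,ds = \frac{Ac^2t^3}{3!}, \qquad \ldots, \qquad
g(t) \le \frac{Ac^{\,n-1}t^n}{n!} \to 0 \quad (n \to \infty).
\end{align*}
```

Since $g(t) \ge 0$ and the bound tends to 0 for each fixed $t$, we get $g(t) = 0$, that is, $X_t = Y_t$ almost surely.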
16. Continuous time financial models.
The most common model by far in finance is one where the security price is based on a Brownian motion. One does not want to say the price is some multiple of Brownian motion for two reasons. First of all, a Brownian motion can become negative, which doesn't make sense for stock prices. Second, if one invests $1,000 in a stock selling for $1 and it goes up to $2, one has the same profit, namely, $1,000, as if one invests $1,000 in a stock selling for $100 and it goes up to $200. It is the proportional increase one wants.
Therefore one sets $\Delta S_t/S_t$ to be the quantity related to a Brownian motion. Different stocks have different volatilities $\sigma$ (consider a high-tech stock versus a pharmaceutical). In addition, one expects a mean rate of return $\mu$ on one's investment that is positive (otherwise, why not just put the money in the bank?). In fact, one expects the mean rate of return to be higher than the risk-free interest rate $r$ because one expects something in return for undertaking risk.
So the model that is used is to let the stock price be modeled by the SDE
\[ dS_t/S_t = \sigma\,dW_t + \mu\,dt, \]
or what looks better,
\[ dS_t = \sigma S_t\,dW_t + \mu S_t\,dt. \tag{16.1} \]
Fortunately this SDE is one of those that can be solved explicitly, and in fact we gave the solution in Section 15.
Proposition 16.1. The solution to (16.1) is given by
\[ S_t = S_0e^{\sigma W_t + (\mu - \sigma^2/2)t}. \tag{16.2} \]
Proof. Using Theorem 15.1 there will only be one solution, so we need to verify that $S_t$ as given in (16.2) satisfies (16.1). We already did this, but it is important enough that we will do it again. Let us first assume $S_0 = 1$. Let $X_t = \sigma W_t + (\mu - \sigma^2/2)t$, let $f(x) = e^x$, and apply Ito's formula. We obtain
\begin{align*}
S_t = e^{X_t} &= e^{X_0} + \int_0^t e^{X_s}\,dX_s + \frac12\int_0^t e^{X_s}\,d\langle X\rangle_s \\
&= 1 + \int_0^t S_s\sigma\,dW_s + \int_0^t S_s(\mu - \tfrac12\sigma^2)\,ds + \frac12\int_0^t S_s\sigma^2\,ds \\
&= 1 + \int_0^t S_s\sigma\,dW_s + \int_0^t S_s\mu\,ds,
\end{align*}
which is (16.1). If $S_0 \ne 1$, just multiply both sides by $S_0$.
Suppose for the moment that the interest rate $r$ is 0. If one purchases $\Delta_0$ shares (possibly a negative number) at time $t_0$, then changes the investment to $\Delta_1$ shares at time $t_1$, then changes the investment to $\Delta_2$ at time $t_2$, etc., then one's wealth at time $t$ will be
\[ X_{t_0} + \Delta_0(S_{t_1} - S_{t_0}) + \Delta_1(S_{t_2} - S_{t_1}) + \cdots + \Delta_i(S_{t_{i+1}} - S_{t_i}). \tag{16.3} \]
To see this, at time $t_0$ one has the original wealth $X_{t_0}$. One buys $\Delta_0$ shares and the cost is $\Delta_0S_{t_0}$. At time $t_1$ one sells the $\Delta_0$ shares for the price of $S_{t_1}$ per share, and so one's wealth is now $X_{t_0} + \Delta_0(S_{t_1} - S_{t_0})$. One now pays $\Delta_1S_{t_1}$ for $\Delta_1$ shares at time $t_1$ and continues. The right hand side of (16.3) is the same as
\[ X_{t_0} + \int_{t_0}^t \Delta(s)\,dS_s, \]
where we have $t \ge t_{i+1}$ and $\Delta(s) = \Delta_i$ for $t_i \le s < t_{i+1}$. In other words, our wealth is given by a stochastic integral with respect to the stock price. The requirement that the integrand of a stochastic integral be adapted is very natural: we cannot base the number of shares we own at time $s$ on information that will not be available until the future.
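Formula (16.3) is a finite sum, so it is easy to compute directly. The sketch below is an illustration with made-up prices and holdings; the function name `wealth` and its argument layout are assumptions, with `holdings[i]` playing the role of $\Delta_i$ and `prices[i]` the role of $S_{t_i}$.

```python
def wealth(initial_wealth, holdings, prices):
    """Wealth from (16.3) with r = 0: holdings[i] shares are held from time t_i
    to t_{i+1}, and prices[i] is the share price at time t_i."""
    if len(prices) != len(holdings) + 1:
        raise ValueError("need one more price than holding periods")
    total = initial_wealth
    for i, shares in enumerate(holdings):
        total += shares * (prices[i + 1] - prices[i])  # gain or loss over [t_i, t_{i+1}]
    return total

# Hold 1 share, then 2 shares, then short 1 share.
print(wealth(100.0, [1, 2, -1], [10.0, 12.0, 11.0, 15.0]))  # 100 + 2 - 2 - 4 = 96.0
```

Note that each holding is multiplied only by the price change that follows the decision, which is the discrete counterpart of requiring the integrand to be adapted.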
How should we modify this when the interest rate $r$ is not zero? Let $P_t$ be the present value of the stock price. So
\[ P_t = e^{-rt}S_t. \]
Note that $P_0 = S_0$. When we hold $\Delta_i$ shares of stock from $t_i$ to $t_{i+1}$, our profit in present-day dollars will be
\[ \Delta_i(P_{t_{i+1}} - P_{t_i}). \]
The formula for our wealth then becomes
\[ X_{t_0} + \int_{t_0}^t \Delta(s)\,dP_s. \]
By Ito's product formula,
\begin{align*}
dP_t &= e^{-rt}\,dS_t - re^{-rt}S_t\,dt \\
&= e^{-rt}\sigma S_t\,dW_t + e^{-rt}\mu S_t\,dt - re^{-rt}S_t\,dt \\
&= \sigma P_t\,dW_t + (\mu - r)P_t\,dt.
\end{align*}
Similarly to (16.2), the solution to this SDE is
\[ P_t = P_0e^{\sigma W_t + (\mu - r - \sigma^2/2)t}. \tag{16.4} \]
The continuous time model of finance is that the security price is given by (16.1) (often called geometric Brownian motion), that there are no transaction costs, and that one can trade as many shares as one wants and vary the amount held in a continuous fashion. This clearly is not the way the market actually works (for example, stock prices are discrete), but this model has proved to be a very good one.
17. Markov properties of Brownian motion.
Let W
t
be a Brownian motion. Because W
t+r
−W
t
is independent of σ(W
s
: s ≤ t),
then knowing the path of W up to time s gives no help in predicting W
t+r
− W
t
. In
particular, if we want to predict W
t+r
and we know W
t
, then knowing the path up to time
t gives no additional advantage in predicting W
t+r
. Phrased another way, this says that
to predict the future, we only need to know where we are and not how we got there.
Let’s try to give a more precise description of this property, which is known as the
Markov property.
Fix $r$ and let $Z_t = W_{t+r} - W_r$. Clearly the map $t \to Z_t$ is continuous since the
same is true for $W$. Since $Z_t - Z_s = W_{t+r} - W_{s+r}$, the distribution of $Z_t - Z_s$ is
normal with mean zero and variance $(t + r) - (s + r) = t - s$. One can also check the other parts
of the definition to show that $Z_t$ is also a Brownian motion.
Recall that a stopping time in the continuous framework is a r.v. $T$ taking values
in $[0, \infty)$ such that $(T \le t) \in \mathcal{F}_t$ for all $t$. To make a satisfactory theory, we need that the
$\mathcal{F}_t$ be right continuous (see Section 10), but this is fairly technical and we will ignore it.
If $T$ is a stopping time, $\mathcal{F}_T$ is the collection of events $A$ such that $A \cap (T \le t) \in \mathcal{F}_t$ for all $t$.
Let us try to provide some motivation for this definition of $\mathcal{F}_T$. It will be simpler to
consider the discrete time case. The analogue of $\mathcal{F}_T$ in the discrete case is the following:
if $N$ is a stopping time, let
$$\mathcal{F}_N = \{A : A \cap (N \le k) \in \mathcal{F}_k \text{ for all } k\}.$$
If $X_k$ is a sequence that is adapted to the σ-fields $\mathcal{F}_k$, that is, $X_k$ is $\mathcal{F}_k$ measurable when
$k = 0, 1, 2, \ldots$, then knowing which events in $\mathcal{F}_k$ have occurred allows us to calculate $X_k$
for each $k$. So a reasonable definition of $\mathcal{F}_N$ should allow us to calculate $X_N$ whenever
we know which events in $\mathcal{F}_N$ have occurred or not. Or phrased another way, we want $X_N$
to be $\mathcal{F}_N$ measurable. Where did the sequence $X_k$ come from? It could be any adapted
sequence. Therefore one definition of the σ-field of events occurring before time $N$ might
be:
Consider the collection of random variables $X_N$ where $X_k$ is a sequence adapted
to $\mathcal{F}_k$. Let $\mathcal{G}_N$ be the smallest σ-field with respect to which each of these random
variables $X_N$ is measurable.
In other words, we want $\mathcal{G}_N$ to be the σ-field generated by the collection of random
variables $X_N$ for all sequences $X_k$ that are adapted to $\mathcal{F}_k$.
We show in Note 1 that $\mathcal{F}_N = \mathcal{G}_N$. The σ-field $\mathcal{F}_N$ is just a bit easier to work with.
Now we proceed to the strong Markov property for Brownian motion, the proof of
which is given in Note 2.
Proposition 17.1. If $X_t$ is a Brownian motion and $T$ is a bounded stopping time, then
$X_{T+t} - X_T$ is a mean 0 variance $t$ normal random variable and is independent of $\mathcal{F}_T$.
This proposition says: if you want to predict $X_{T+t}$, you could do it knowing all of
$\mathcal{F}_T$ or just knowing $X_T$. Since $X_{T+t} - X_T$ is independent of $\mathcal{F}_T$, the extra information
given in $\mathcal{F}_T$ does you no good at all.
We need a way of expressing the Markov and strong Markov properties that will
generalize to other processes.
Let $W_t$ be a Brownian motion. Consider the process $W^x_t = x + W_t$, which is known
as Brownian motion started at $x$. Define $\Omega'$ to be the set of continuous functions on $[0, \infty)$, let
$X_t(\omega) = \omega(t)$, and let the σ-field $\mathcal{F}'$ be the one generated by the $X_t$. Define $P^x$ on $(\Omega', \mathcal{F}')$ by
$$P^x(X_{t_1} \in A_1, \ldots, X_{t_n} \in A_n) = P(W^x_{t_1} \in A_1, \ldots, W^x_{t_n} \in A_n).$$
What we have done is gone from one probability space $\Omega$ with many processes $W^x_t$ to one
process $X_t$ with many probability measures $P^x$.
An example in the Markov chain setting might help. No knowledge of Markov chains
is necessary to understand this. Suppose we have a Markov chain with 3 states, A, B, and
C. Suppose we have a probability $P$ and three different Markov chains. The first, called
$X^A_n$, represents the position at time $n$ for the chain started at A. So $X^A_0 = A$, and $X^A_1$ can
be one of A, B, C, $X^A_2$ can be one of A, B, C, and so on. Similarly we have $X^B_n$, the chain
started at B, and $X^C_n$. Define $\Omega' = \{(AAA), (AAB), (ABA), \ldots, (BAA), (BAB), \ldots\}$.
So $\Omega'$ denotes the possible sequences of states for time $n = 0, 1, 2$. If $\omega = ABA$, set
$Y_0(\omega) = A$, $Y_1(\omega) = B$, $Y_2(\omega) = A$, and similarly for all the other 26 values of $\omega$. Define
$P^A(AAA) = P(X^A_0 = A, X^A_1 = A, X^A_2 = A)$. Similarly define $P^A(AAB), \ldots$. Define
$P^B(AAA) = P(X^B_0 = A, X^B_1 = A, X^B_2 = A)$ (this will be 0 because we know $X^B_0 = B$),
and similarly for the other values of $\omega$. We also define $P^C$. So we now have one process,
$Y_n$, and three probabilities $P^A, P^B, P^C$. As you can see, there really isn't all that much
going on here.
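The construction in this example can be written out explicitly. The sketch below builds the 27-point space of paths, the coordinate process $Y_n$, and the three probabilities $P^A$, $P^B$, $P^C$. The transition probabilities in `TRANSITION` are hypothetical numbers made up for the illustration; the notes do not specify any.

```python
from itertools import product

STATES = "ABC"

# Hypothetical one-step transition probabilities (an assumption, not from the notes).
TRANSITION = {
    "A": {"A": 0.5, "B": 0.3, "C": 0.2},
    "B": {"A": 0.1, "B": 0.6, "C": 0.3},
    "C": {"A": 0.4, "B": 0.4, "C": 0.2},
}

# The sample space: all sequences of states at times n = 0, 1, 2.
OMEGA = ["".join(path) for path in product(STATES, repeat=3)]


def Y(n, omega):
    """Coordinate process: the state occupied at time n along the path omega."""
    return omega[n]


def path_probability(start, omega):
    """P^start(omega): probability that the chain started at `start` follows omega."""
    if omega[0] != start:
        return 0.0  # the chain started at `start` cannot be elsewhere at time 0
    prob = 1.0
    for current, nxt in zip(omega, omega[1:]):
        prob *= TRANSITION[current][nxt]
    return prob


# One process Y_n, three probability measures indexed by the starting state.
P = {x: {omega: path_probability(x, omega) for omega in OMEGA} for x in STATES}

if __name__ == "__main__":
    print(len(OMEGA), "paths")
    print("P^A(ABA) =", P["A"]["ABA"])
    print("P^B(AAA) =", P["B"]["AAA"])
```

Each $P^x$ is a probability on the same 27 points, and $P^B$ gives mass 0 to every path that does not start at B, exactly as described above.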
Here is another formulation of the Markov property.
Proposition 17.2. If $s < t$ and $f$ is bounded or nonnegative, then
$$E^x[f(X_t) \mid \mathcal{F}_s] = E^{X_s}[f(X_{t-s})], \qquad \text{a.s.}$$
The right hand side is to be interpreted as follows. Define $\varphi(x) = E^x f(X_{t-s})$. Then
$E^{X_s} f(X_{t-s})$ means $\varphi(X_s(\omega))$. One often writes $P_t f(x)$ for $E^x f(X_t)$. We prove this in
Note 3.
This formula generalizes: If $s < t < u$, then
$$E^x[f(X_t) g(X_u) \mid \mathcal{F}_s] = E^{X_s}[f(X_{t-s}) g(X_{u-s})],$$
and so on for functions of $X$ at more times.
Using Proposition 17.1, the statement and proof of Proposition 17.2 can be extended
to stopping times.
Proposition 17.3. If $T$ is a bounded stopping time, then
$$E^x[f(X_{T+t}) \mid \mathcal{F}_T] = E^{X_T}[f(X_t)].$$
We can also establish the Markov property and strong Markov property in the
context of solutions of stochastic differential equations. If we let $X^x_t$ denote the solution
to
$$X^x_t = x + \int_0^t \sigma(X^x_s)\,dW_s + \int_0^t b(X^x_s)\,ds,$$
so that $X^x_t$ is the solution of the SDE started at $x$, we can define new probabilities by
$$P^x(X_{t_1} \in A_1, \ldots, X_{t_n} \in A_n) = P(X^x_{t_1} \in A_1, \ldots, X^x_{t_n} \in A_n).$$
This is similar to what we did in defining $P^x$ for Brownian motion, but here we do not
have translation invariance. One can show that when there is uniqueness for the solution
to the SDE, the family $(P^x, X_t)$ satisfies the Markov and strong Markov property. The
statement is precisely the same as the statement of Proposition 17.3.
Note 1. We want to show $\mathcal{G}_N = \mathcal{F}_N$. Since $\mathcal{G}_N$ is the smallest σ-field with respect to which
$X_N$ is measurable for all adapted sequences $X_k$ and it is easy to see that $\mathcal{F}_N$ is a σ-field, to
show $\mathcal{G}_N \subset \mathcal{F}_N$, it suffices to show that $X_N$ is measurable with respect to $\mathcal{F}_N$ whenever $X_k$
is adapted. Therefore we need to show that for such a sequence $X_k$ and any real number $a$,
the event $(X_N > a) \in \mathcal{F}_N$.
Now $(X_N > a) \cap (N = j) = (X_j > a) \cap (N = j)$. The event $(X_j > a) \in \mathcal{F}_j$
since $X$ is an adapted sequence. Since $N$ is a stopping time, then $(N \le j) \in \mathcal{F}_j$ and
$(N \le j-1)^c \in \mathcal{F}_{j-1} \subset \mathcal{F}_j$, and so the event $(N = j) = (N \le j) \cap (N \le j-1)^c$ is in $\mathcal{F}_j$. If
$j \le k$, then $(N = j) \in \mathcal{F}_j \subset \mathcal{F}_k$. Therefore
$$(X_N > a) \cap (N \le k) = \bigcup_{j=0}^{k} \big((X_N > a) \cap (N = j)\big) \in \mathcal{F}_k,$$
which proves that $(X_N > a) \in \mathcal{F}_N$.
To show $\mathcal{F}_N \subset \mathcal{G}_N$, we suppose that $A \in \mathcal{F}_N$. Let $X_k = 1_{A \cap (N \le k)}$. Since $A \in \mathcal{F}_N$,
then $A \cap (N \le k) \in \mathcal{F}_k$, so $X_k$ is $\mathcal{F}_k$ measurable. But $X_N = 1_{A \cap (N \le N)} = 1_A$, so
$A = (X_N > 0) \in \mathcal{G}_N$. We have thus shown that $\mathcal{F}_N \subset \mathcal{G}_N$, and combining with the previous paragraph,
we conclude $\mathcal{F}_N = \mathcal{G}_N$.
Note 2. Let $T_n$ be defined by $T_n(\omega) = (k+1)/2^n$ if $T(\omega) \in [k/2^n, (k+1)/2^n)$. It is easy
to check that $T_n$ is a stopping time. Let $f$ be bounded and continuous and $A \in \mathcal{F}_T$. Then $A \in \mathcal{F}_{T_n}$ as
well. We have
$$\begin{aligned}
E[f(X_{T_n+t} - X_{T_n}); A] &= \sum_k E\big[f(X_{\frac{k}{2^n}+t} - X_{\frac{k}{2^n}}); A \cap (T_n = k/2^n)\big] \\
&= \sum_k E\big[f(X_{\frac{k}{2^n}+t} - X_{\frac{k}{2^n}})\big]\, P(A \cap (T_n = k/2^n)) \\
&= E f(X_t)\, P(A).
\end{aligned}$$
Let $n \to \infty$; since $T_n \downarrow T$ and the paths of $X$ are continuous,
$$E[f(X_{T+t} - X_T); A] = E f(X_t)\, P(A).$$
Taking limits this equation holds for all bounded $f$.
If we take $A = \Omega$ and $f = 1_B$, we see that $X_{T+t} - X_T$ has the same distribution as $X_t$,
which is that of a mean 0 variance $t$ normal random variable. If we let $A \in \mathcal{F}_T$ be arbitrary
and $f = 1_B$, we see that
$$P(X_{T+t} - X_T \in B, A) = P(X_t \in B) P(A) = P(X_{T+t} - X_T \in B) P(A),$$
which implies that $X_{T+t} - X_T$ is independent of $\mathcal{F}_T$.
Note 3. Before proving Proposition 17.2, recall from undergraduate analysis that every
bounded function is the limit of linear combinations of functions $e^{iux}$, $u \in \mathbb{R}$. This follows
from using the inversion formula for Fourier transforms. There are various slightly different
formulas for the Fourier transform. We use $\hat f(u) = \int e^{iux} f(x)\,dx$. If $f$ is smooth enough and
has compact support, then one can recover $f$ by the formula
$$f(x) = \frac{1}{2\pi} \int e^{-iux} \hat f(u)\,du.$$
We can first approximate this improper integral by
$$\frac{1}{2\pi} \int_{-N}^{N} e^{-iux} \hat f(u)\,du$$
by taking $N$ larger and larger. For each $N$ we can approximate $\frac{1}{2\pi} \int_{-N}^{N} e^{-iux} \hat f(u)\,du$ by using
Riemann sums. Thus we can approximate $f(x)$ by a linear combination of terms of the form
$e^{iu_j x}$. Finally, bounded functions can be approximated by smooth functions with compact
support.
Proof. Let $f(x) = e^{iux}$. Then
$$\begin{aligned}
E^x[e^{iuX_t} \mid \mathcal{F}_s] &= e^{iuX_s}\, E^x[e^{iu(X_t - X_s)} \mid \mathcal{F}_s] \\
&= e^{iuX_s}\, e^{-u^2(t-s)/2}.
\end{aligned}$$
On the other hand,
$$\varphi(y) = E^y[f(X_{t-s})] = E[e^{iu(W_{t-s} + y)}] = e^{iuy}\, e^{-u^2(t-s)/2}.$$
So $\varphi(X_s) = E^x[e^{iuX_t} \mid \mathcal{F}_s]$. Using linearity and taking limits, we have the proposition for all $f$.
18. Martingale representation theorem.
In this section we want to show that every random variable that is $\mathcal{F}_t$ measurable
can be written as a stochastic integral of Brownian motion. In the next section we use
this to show that under the model of geometric Brownian motion the market is complete.
This means that no matter what option one comes up with, one can exactly replicate the
result (no matter what the market does) by buying and selling shares of stock.
In mathematical terms, we let $\mathcal{F}_t$ be the σ-field generated by $W_s$, $s \le t$. From (16.2)
we see that $\mathcal{F}_t$ is also the same as the σ-field generated by $S_s$, $s \le t$, so it doesn't matter
which one we work with. We want to show that if $V$ is $\mathcal{F}_t$ measurable, then there exists
$H_s$ adapted such that
$$V = V_0 + \int H_s\,dW_s, \tag{18.1}$$
where $V_0$ is a constant.
Our goal is to prove
Theorem 18.1. If $V$ is $\mathcal{F}_t$ measurable and $EV^2 < \infty$, then there exist a constant $c$ and
an adapted integrand $H_s$ with $E\int_0^t H_s^2\,ds < \infty$ such that
$$V = c + \int_0^t H_s\,dW_s.$$
Before we prove this, let us explain why this is called a martingale representation
theorem. Suppose $M_s$ is a martingale adapted to $\mathcal{F}_s$, where the $\mathcal{F}_s$ are the σ-fields generated
by a Brownian motion. Suppose also that $EM_t^2 < \infty$. Set $V = M_t$. By Theorem 18.1, we
can write
$$M_t = V = c + \int_0^t H_s\,dW_s.$$
The stochastic integral is a martingale, so for $r \le t$,
$$M_r = E[M_t \mid \mathcal{F}_r] = c + E\Big[\int_0^t H_s\,dW_s \,\Big|\, \mathcal{F}_r\Big] = c + \int_0^r H_s\,dW_s.$$
We already knew that stochastic integrals were martingales; what this says is the converse:
every martingale can be represented as a stochastic integral. Don't forget that we need
$EM_t^2 < \infty$ and $M_s$ adapted to the σ-fields of a Brownian motion.
In Note 1 we show that if every martingale can be represented as a stochastic
integral, then every random variable $V$ that is $\mathcal{F}_t$ measurable can, too, provided $EV^2 < \infty$.
There are several proofs of Theorem 18.1. Unfortunately, they are all technical. We
outline one proof here, giving details in the notes. We start with the following, proved in
Note 2.
Proposition 18.2. Suppose
$$V_n = c_n + \int_0^t H^n_s\,dW_s,$$
$c_n \to c$,
$$E|V_n - V|^2 \to 0,$$
and for each $n$ the process $H^n$ is adapted with $E\int_0^t (H^n_s)^2\,ds < \infty$. Then there exist a
constant $c$ and an adapted $H_s$ with $E\int_0^t H_s^2\,ds < \infty$ so that
$$V = c + \int_0^t H_s\,dW_s.$$
What this proposition says is that if we can represent a sequence of random variables $V_n$
and $V_n \to V$, then we can represent $V$.
Let $\mathcal{R}$ be the collection of random variables that can be represented as stochastic
integrals. By this we mean
$$\mathcal{R} = \Big\{V : EV^2 < \infty,\ V \text{ is } \mathcal{F}_t \text{ measurable},\ V = c + \int_0^t H_s\,dW_s
\text{ for some adapted } H \text{ with } E\int_0^t H_s^2\,ds < \infty\Big\}.$$
Next we show $\mathcal{R}$ contains a particular collection of random variables. (The proof is
in Note 3.)
Proposition 18.3. If $g$ is bounded, the random variable $g(W_t)$ is in $\mathcal{R}$.
An almost identical proof shows that if $f$ is bounded, then
$$f(W_t - W_s) = c + \int_s^t H_r\,dW_r$$
for some $c$ and $H_r$.
Proposition 18.4. If $t_0 \le t_1 \le \cdots \le t_n \le t$ and $f_1, \ldots, f_n$ are bounded functions, then
$f_1(W_{t_1} - W_{t_0})\, f_2(W_{t_2} - W_{t_1}) \cdots f_n(W_{t_n} - W_{t_{n-1}})$ is in $\mathcal{R}$.
See Note 4 for the proof.
We now finish the proof of Theorem 18.1. We have shown that a large class of
random variables is contained in $\mathcal{R}$.
Proof of Theorem 18.1. We have shown that random variables of the form
$$f_1(W_{t_1} - W_{t_0})\, f_2(W_{t_2} - W_{t_1}) \cdots f_n(W_{t_n} - W_{t_{n-1}}) \tag{18.2}$$
are in $\mathcal{R}$. Clearly if $V_i \in \mathcal{R}$ for $i = 1, \ldots, m$, and $a_i$ are constants, then $a_1 V_1 + \cdots + a_m V_m$ is
also in $\mathcal{R}$. Finally, from measure theory we know that if $EV^2 < \infty$ and $V$ is $\mathcal{F}_t$ measurable,
we can find a sequence $V_k$ such that $E|V_k - V|^2 \to 0$ and each $V_k$ is a linear combination
of random variables of the form given in (18.2). Now apply Proposition 18.2.
Note 1. Suppose we know that every martingale $M_s$ adapted to $\mathcal{F}_s$ with $EM_t^2 < \infty$ can be
represented as $M_r = c + \int_0^r H_s\,dW_s$ for some suitable $H$. If $V$ is $\mathcal{F}_t$ measurable with $EV^2 < \infty$,
let $M_r = E[V \mid \mathcal{F}_r]$. We know this is a martingale, so
$$M_r = c + \int_0^r H_s\,dW_s$$
for suitable $H$. Applying this with $r = t$,
$$V = E[V \mid \mathcal{F}_t] = M_t = c + \int_0^t H_s\,dW_s.$$
Note 2. We prove Proposition 18.2. By our assumptions,
$$E|(V_n - c_n) - (V_m - c_m)|^2 \to 0$$
as $n, m \to \infty$. So
$$E\Big|\int_0^t (H^n_s - H^m_s)\,dW_s\Big|^2 \to 0.$$
From our formulas for stochastic integrals, this means
$$E\int_0^t |H^n_s - H^m_s|^2\,ds \to 0.$$
This says that $H^n_s$ is a Cauchy sequence in the space $L^2$ (with respect to the norm $\|\cdot\|_2$ given
by $\|Y\|_2 = \big(E\int_0^t Y_s^2\,ds\big)^{1/2}$). Measure theory tells us that $L^2$ is a complete metric space, so
there exists $H_s$ such that
$$E\int_0^t |H^n_s - H_s|^2\,ds \to 0.$$
In particular $H^n_s \to H_s$ (along a subsequence, for almost every $s$ and $\omega$), and this implies $H_s$ is
adapted. Another consequence, due to Fatou's lemma, is that $E\int_0^t H_s^2\,ds < \infty$.
Let $U_t = \int_0^t H_s\,dW_s$. Then as above,
$$E|(V_n - c_n) - U_t|^2 = E\int_0^t (H^n_s - H_s)^2\,ds \to 0.$$
Since $V_n - c_n$ also converges to $V - c$ in $L^2$, we have $U_t = V - c$, and $U$ has the desired form.
Note 3. Here is the proof of Proposition 18.3. By Ito's formula with $X_s = -iuW_s + u^2 s/2$
and $f(x) = e^x$,
$$\begin{aligned}
e^{X_t} &= 1 + \int_0^t e^{X_s}(-iu)\,dW_s + \int_0^t e^{X_s}(u^2/2)\,ds + \frac12 \int_0^t e^{X_s}(-iu)^2\,ds \\
&= 1 - iu \int_0^t e^{X_s}\,dW_s.
\end{aligned}$$
If we multiply both sides by $e^{-u^2 t/2}$, which is a constant and hence adapted, we obtain
$$e^{-iuW_t} = c_u + \int_0^t H^u_s\,dW_s \tag{18.3}$$
for an appropriate constant $c_u$ and integrand $H^u$.
If $f$ is a smooth function (e.g., $C^\infty$ with compact support), then its Fourier transform
$\hat f$ will also be very nice. So if we multiply (18.3) by $\hat f(u)/(2\pi)$ and integrate over $u$ from $-\infty$ to
$\infty$, the Fourier inversion formula gives
$$f(W_t) = c + \int_0^t H_s\,dW_s$$
for some constant $c$ and some adapted integrand $H$. (We implicitly used Proposition 18.2,
because we approximate our integral by Riemann sums, and then take a limit.) Now using
Proposition 18.2 we take limits and obtain the proposition.
Note 4. The argument is by induction; let us do the case $n = 2$ for clarity. So we suppose
$$V = f(W_t)\, g(W_u - W_t).$$
From Proposition 18.3 we now have that
$$f(W_t) = c + \int_0^t H_s\,dW_s, \qquad g(W_u - W_t) = d + \int_t^u K_s\,dW_s.$$
Set $\overline H_r = H_r$ if $0 \le r < t$ and 0 otherwise. Set $\overline K_r = K_r$ if $t \le r < u$ and 0 otherwise. Let
$X_s = c + \int_0^s \overline H_r\,dW_r$ and $Y_s = d + \int_0^s \overline K_r\,dW_r$. Then
$$\langle X, Y\rangle_s = \int_0^s \overline H_r \overline K_r\,dr = 0.$$
Then by the Ito product formula,
$$\begin{aligned}
X_s Y_s &= X_0 Y_0 + \int_0^s X_r\,dY_r + \int_0^s Y_r\,dX_r + \langle X, Y\rangle_s \\
&= cd + \int_0^s [X_r \overline K_r + Y_r \overline H_r]\,dW_r.
\end{aligned}$$
If we now take $s = u$, that is exactly what we wanted. Note that $X_r \overline K_r + Y_r \overline H_r$ is 0 if $r > u$;
this is needed to do the general induction step.
19. Completeness.
Now let $P_t$ be a geometric Brownian motion. As we mentioned in Section 16, if
$P_t = P_0 \exp(\sigma W_t + (\mu - r - \sigma^2/2)t)$, then given $P_t$ we can determine $W_t$ and vice versa,
so the σ-fields generated by $P_t$ and $W_t$ are the same. Recall $P_t$ satisfies
$$dP_t = \sigma P_t\,dW_t + (\mu - r) P_t\,dt.$$
Define a new probability $\overline P$ by
$$\frac{d\overline P}{dP} = M_t = \exp(aW_t - a^2 t/2).$$
By the Girsanov theorem,
$$\widetilde W_t = W_t - at$$
is a Brownian motion under $\overline P$. So
$$dP_t = \sigma P_t\,d\widetilde W_t + \sigma P_t a\,dt + (\mu - r) P_t\,dt.$$
If we choose $a = -(\mu - r)/\sigma$, we then have
$$dP_t = \sigma P_t\,d\widetilde W_t. \tag{19.1}$$
Since $\widetilde W_t$ is a Brownian motion under $\overline P$, then $P_t$ must be a martingale, since it is a
stochastic integral of a Brownian motion. We can rewrite (19.1) as
$$d\widetilde W_t = \sigma^{-1} P_t^{-1}\,dP_t. \tag{19.2}$$
Given an $\mathcal{F}_t$ measurable, square integrable variable $V$, we know by Theorem 18.1 that there exist a
constant $c$ and an adapted process $H_s$ such that $\overline E \int_0^t H_s^2\,ds < \infty$ and
$$V = c + \int_0^t H_s\,d\widetilde W_s.$$
But then using (19.2) we have
$$V = c + \int_0^t H_s \sigma^{-1} P_s^{-1}\,dP_s.$$
We have therefore proved
Theorem 19.1. If $P_t$ is a geometric Brownian motion and $V$ is $\mathcal{F}_t$ measurable and square
integrable, then there exist a constant $c$ and an adapted process $K_s$ such that
$$V = c + \int_0^t K_s\,dP_s.$$
Moreover, there is a probability $\overline P$ under which $P_t$ is a martingale.
The probability $\overline P$ is called the risk-neutral measure. Under $\overline P$ the present day value
of the stock price is a martingale.
20. Black-Scholes formula, I.
We can now derive the formula for the price of any option. Let $T \ge 0$ be a fixed
real. If $V$ is $\mathcal{F}_T$ measurable, we have by Theorem 19.1 that
$$V = c + \int_0^T K_s\,dP_s, \tag{20.1}$$
and under $\overline P$, the process $P_s$ is a martingale.
Theorem 20.1. The price of $V$ must be $\overline E V$.
Proof. This is the “no arbitrage” principle again. Suppose the price of the option $V$ at
time 0 is $W_0$. Starting with 0 dollars, we can sell the option $V$ for $W_0$ dollars, and use the
$W_0$ dollars to buy and trade shares of the stock. In fact, if we use $c$ of those dollars, and
invest according to the strategy of holding $K_s$ shares at time $s$, then at time $T$ we will
have
$$e^{rT}(W_0 - c) + V$$
dollars. At time $T$ the buyer of our option exercises it and we use $V$ dollars to meet that
obligation. That leaves us a profit of $e^{rT}(W_0 - c)$ if $W_0 > c$, without any risk. Therefore
$W_0$ must be less than or equal to $c$. If $W_0 < c$, we just reverse things: we buy the option
instead of sell it, and hold $-K_s$ shares of stock at time $s$. By the same argument, since
we can't get a riskless profit, we must have $W_0 \ge c$, or $W_0 = c$.
Finally, under $\overline P$ the process $P_t$ is a martingale. So taking expectations in (20.1),
we obtain
$$\overline E V = c.$$
The formula in the statement of Theorem 20.1 is amenable to calculation. Suppose
we have the standard European option, where
$$V = e^{-rT}(S_T - K)^+ = (e^{-rT} S_T - e^{-rT} K)^+ = (P_T - e^{-rT} K)^+.$$
Recall that under $\overline P$ the stock price satisfies
$$dP_t = \sigma P_t\,d\widetilde W_t,$$
where $\widetilde W_t$ is a Brownian motion under $\overline P$. So then
$$P_t = P_0\, e^{\sigma \widetilde W_t - \sigma^2 t/2}.$$
Hence
$$\begin{aligned}
\overline E V &= \overline E[(P_T - e^{-rT} K)^+] \\
&= \overline E[(P_0\, e^{\sigma \widetilde W_T - (\sigma^2/2)T} - e^{-rT} K)^+].
\end{aligned} \tag{20.2}$$
We know the density of $\widetilde W_T$ is just $(2\pi T)^{-1/2} e^{-y^2/(2T)}$, so we can do some calculations
(see Note 1) and end up with the famous Black-Scholes formula:
$$W_0 = x\,\Phi(g(x, T)) - K e^{-rT}\,\Phi(h(x, T)),$$
where $\Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-y^2/2}\,dy$, $x = P_0 = S_0$,
$$g(x, T) = \frac{\log(x/K) + (r + \sigma^2/2)T}{\sigma\sqrt{T}},$$
$$h(x, T) = g(x, T) - \sigma\sqrt{T}.$$
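The formula is easy to evaluate numerically. The sketch below is a direct transcription of $W_0 = x\Phi(g) - Ke^{-rT}\Phi(h)$, with $\Phi$ computed from the error function; the function names and the sample parameter values are hypothetical.

```python
import math


def std_normal_cdf(z):
    """Phi(z), the standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def black_scholes_call(x, strike, r, sigma, T):
    """Price at time 0 of a European call: x Phi(g) - K exp(-rT) Phi(h).

    x is the current stock price, strike is K, r the interest rate,
    sigma the volatility and T the exercise time (T > 0, sigma > 0).
    """
    g = (math.log(x / strike) + (r + sigma**2 / 2) * T) / (sigma * math.sqrt(T))
    h = g - sigma * math.sqrt(T)
    return x * std_normal_cdf(g) - strike * math.exp(-r * T) * std_normal_cdf(h)


if __name__ == "__main__":
    # Illustrative numbers only: stock at 100, strike 100, r = 5%, sigma = 20%, T = 1 year.
    print(round(black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0), 4))
```

Note that the drift $\mu$ is not an argument of `black_scholes_call` at all, which is the point made in the discussion of the formula.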
It is of considerable interest that the final formula depends on $\sigma$ but is completely
independent of $\mu$. The reason for that can be explained as follows. Under $\overline P$ the process $P_t$
satisfies $dP_t = \sigma P_t\,d\widetilde W_t$, where $\widetilde W_t$ is a Brownian motion. Therefore, similarly to formulas
we have already done,
$$P_t = P_0\, e^{\sigma \widetilde W_t - \sigma^2 t/2},$$
and there is no $\mu$ present here. (We used the Girsanov formula to get rid of the $\mu$.) The
price of the option $V$ is
$$\overline E[(P_T - e^{-rT} K)^+], \tag{20.3}$$
which is independent of $\mu$ since the law of $P_t$ under $\overline P$ is.
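Note 1. We want to calculate
$$\overline E\big[(x e^{\sigma \widetilde W_T - \sigma^2 T/2} - e^{-rT} K)^+\big], \tag{20.4}$$
where $\widetilde W_t$ is a Brownian motion under $\overline P$ and we write $x$ for $P_0 = S_0$. Since $\widetilde W_T$ is a normal
random variable with mean 0 and variance $T$, we can write it as $\sqrt{T} Z$, where $Z$ is a standard
mean 0 variance 1 normal random variable.
Now
$$x e^{\sigma\sqrt{T} Z - \sigma^2 T/2} > e^{-rT} K$$
if and only if
$$\log x + \sigma\sqrt{T} Z - \sigma^2 T/2 > -rT + \log K,$$
or if
$$Z > \frac{(\sigma^2 T/2) - rT + \log K - \log x}{\sigma\sqrt{T}}.$$
We write $z_0$ for the right hand side of the above inequality. Recall that $1 - \Phi(z) = \Phi(-z)$ for
all $z$ by the symmetry of the normal density. So (20.4) is equal to
$$\begin{aligned}
&\frac{1}{\sqrt{2\pi}} \int_{z_0}^{\infty} (x e^{\sigma\sqrt{T} z - \sigma^2 T/2} - e^{-rT} K)\, e^{-z^2/2}\,dz \\
&\quad = x \frac{1}{\sqrt{2\pi}} \int_{z_0}^{\infty} e^{-\frac12 (z^2 - 2\sigma\sqrt{T} z + \sigma^2 T)}\,dz - K e^{-rT} \frac{1}{\sqrt{2\pi}} \int_{z_0}^{\infty} e^{-z^2/2}\,dz \\
&\quad = x \frac{1}{\sqrt{2\pi}} \int_{z_0}^{\infty} e^{-\frac12 (z - \sigma\sqrt{T})^2}\,dz - K e^{-rT} (1 - \Phi(z_0)) \\
&\quad = x \frac{1}{\sqrt{2\pi}} \int_{z_0 - \sigma\sqrt{T}}^{\infty} e^{-y^2/2}\,dy - K e^{-rT} \Phi(-z_0) \\
&\quad = x (1 - \Phi(z_0 - \sigma\sqrt{T})) - K e^{-rT} \Phi(-z_0) \\
&\quad = x \Phi(\sigma\sqrt{T} - z_0) - K e^{-rT} \Phi(-z_0).
\end{aligned}$$
This is the Black-Scholes formula if we observe that $\sigma\sqrt{T} - z_0 = g(x, T)$ and $-z_0 = h(x, T)$.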
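The calculation in this note can be checked numerically. The sketch below evaluates the expectation (20.4) by a plain midpoint rule against the standard normal density and compares it with the closed form $x\Phi(\sigma\sqrt{T} - z_0) - Ke^{-rT}\Phi(-z_0)$, taking $z_0 = [(\sigma^2T/2) - rT + \log K - \log x]/(\sigma\sqrt{T})$. The function names, the truncation of the integral to $[-10, 10]$, and the parameter values are assumptions made for the illustration.

```python
import math


def std_normal_cdf(z):
    """Phi(z), the standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def closed_form(x, strike, r, sigma, T):
    """x Phi(sigma sqrt(T) - z0) - K exp(-rT) Phi(-z0)."""
    root = sigma * math.sqrt(T)
    z0 = (sigma**2 * T / 2 - r * T + math.log(strike) - math.log(x)) / root
    return x * std_normal_cdf(root - z0) - strike * math.exp(-r * T) * std_normal_cdf(-z0)


def expectation_by_quadrature(x, strike, r, sigma, T, n=100_000, z_max=10.0):
    """Midpoint-rule approximation of E[(x exp(sigma sqrt(T) Z - sigma^2 T/2) - exp(-rT) K)^+]."""
    root = sigma * math.sqrt(T)
    dz = 2.0 * z_max / n
    total = 0.0
    for i in range(n):
        z = -z_max + (i + 0.5) * dz
        payoff = max(x * math.exp(root * z - sigma**2 * T / 2) - math.exp(-r * T) * strike, 0.0)
        total += payoff * math.exp(-z * z / 2.0) * dz
    return total / math.sqrt(2.0 * math.pi)


if __name__ == "__main__":
    args = (100.0, 110.0, 0.05, 0.3, 0.5)  # illustrative values only
    print(closed_form(*args), expectation_by_quadrature(*args))
```

The two numbers agree to within the quadrature error, which is a useful sanity check on the signs and on the factor $\sigma\sqrt{T}$ in $z_0$.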
21. Hedging strategies.
The previous section allows us to compute the value of any option, but we would also
like to know what the hedging strategy is. This means, if we know $V = \overline E V + \int_0^T H_s\,dS_s$,
what should $H_s$ be? This might be important to know if we wanted to duplicate an option
that was not available in the marketplace, or if we worked for a bank and wanted to provide
an option for sale.
It is not always possible to compute $H$, but in many cases of interest it is possible.
We illustrate one technique with two examples.
First, suppose we want to hedge the standard European call $V = e^{-rT}(S_T - K)^+ =
(P_T - e^{-rT} K)^+$. We are working here with the risk-neutral probability only. It turns out
it makes no difference: the definition of $\int_0^t H_s\,dX_s$ for a semimartingale $X$ does not depend
on the probability $P$, other than worrying about some integrability conditions.
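We can rewrite $V$ as
$$V = \overline E V + g(\widetilde W_T),$$
where
$$g(x) = (P_0\, e^{\sigma x - \sigma^2 T/2} - e^{-rT} K)^+ - \overline E V.$$
Therefore the expectation of $g(\widetilde W_T)$ is 0. Recall that under $\overline P$, $\widetilde W$ is a Brownian motion.
If we write $g(\widetilde W_T)$ as
$$\int_0^T H_s\,d\widetilde W_s, \tag{21.1}$$
then since $dP_t = \sigma P_t\,d\widetilde W_t$, we have
$$g(\widetilde W_T) = \int_0^T \frac{1}{\sigma P_s} H_s\,dP_s. \tag{21.2}$$
Therefore it suffices to find the representation of the form (21.1).
Recall from the section on the Markov property that
$$P_t f(x) = E^x f(\widetilde W_t) = \overline E f(x + \widetilde W_t) = \int \frac{1}{\sqrt{2\pi t}}\, e^{-y^2/2t} f(x + y)\,dy.$$
(Here $P_t f$ denotes the transition operator applied to $f$, not the discounted stock price.)
Let $M_t = \overline E[g(\widetilde W_T) \mid \mathcal{F}_t]$. By Proposition 4.3, we know that $M_t$ is a martingale. By the
Markov property Proposition 17.2, we see that
$$M_t = E^{\widetilde W_t}[g(\widetilde W_{T-t})] = P_{T-t}\, g(\widetilde W_t). \tag{21.3}$$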
Now let us apply Ito's formula with the function $f(x_1, x_2) = P_{x_2} g(x_1)$ to the process
$X_t = (X^1_t, X^2_t) = (\widetilde W_t, T - t)$. So we need to use the multidimensional version of Ito's
formula. We have $dX^1_t = d\widetilde W_t$ and $dX^2_t = -dt$. Since $X^2_t$ is a decreasing process and has
no martingale part, then $d\langle X^2\rangle_t = 0$ and $d\langle X^1, X^2\rangle_t = 0$, while $d\langle X^1\rangle_t = dt$. Ito's formula
says that
$$\begin{aligned}
f(X^1_t, X^2_t) &= f(X^1_0, X^2_0) + \int_0^t \sum_{i=1}^{2} \frac{\partial f}{\partial x_i}(X_s)\,dX^i_s
+ \frac12 \int_0^t \sum_{i,j=1}^{2} \frac{\partial^2 f}{\partial x_i \partial x_j}(X_s)\,d\langle X^i, X^j\rangle_s \\
&= c + \int_0^t \frac{\partial f}{\partial x_1}(X_s)\,d\widetilde W_s + \text{some terms with } dt.
\end{aligned}$$
But we know that $f(X_t) = P_{T-t}\, g(\widetilde W_t) = M_t$ is a martingale, so the sum of the terms
involving $dt$ must be zero; if not, $f(X_t)$ would have a bounded variation part. Moreover
$c = M_0 = \overline E g(\widetilde W_T) = 0$. We conclude
$$M_t = \int_0^t \frac{\partial}{\partial x} P_{T-s}\, g(\widetilde W_s)\,d\widetilde W_s.$$
If we take $t = T$, we then have
$$g(\widetilde W_T) = M_T = \int_0^T \frac{\partial}{\partial x} P_{T-s}\, g(\widetilde W_s)\,d\widetilde W_s,$$
and we have our representation.
For a second example, let's look at the sell-high option. Here the payoff is $\sup_{s \le T} S_s$,
the largest the stock price ever is up to time $T$. This is $\mathcal{F}_T$ measurable, so we can compute
its value. How can one get the equivalent outcome without looking into the future?
For simplicity, let us suppose the interest rate $r$ is 0. Let $N_t = \sup_{s \le t} S_s$, the
maximum up to time $t$. It is not the case that $N_t$ is a Markov process. Intuitively, the
reasoning goes like this: suppose the maximum up to time 1 is \$100, and we want to
predict the maximum up to time 2. If the stock price at time 1 is close to \$100, then we
have one prediction, while if the stock price at time 1 is close to \$2, we would definitely
have another prediction. So the prediction for $N_2$ does not depend just on $N_1$, but also
on the stock price at time 1. This same intuitive reasoning does suggest, however, that the
triple $Z_t = (S_t, N_t, t)$ is a Markov process, and this turns out to be correct. Adding in the
information about the current stock price gives a certain amount of evidence to predict
the future values of $N_t$; adding in the history of the stock prices up to time $t$ gives no
additional information.
Once we believe this, the rest of the argument is very similar to the first example.
Let $P_u f(z) = E^z f(Z_u)$, where $z = (s, n, t)$. Let $g(Z_t) = N_t - \overline E N_T$. Then
$$M_t = \overline E[g(Z_T) \mid \mathcal{F}_t] = E^{Z_t}[g(Z_{T-t})] = P_{T-t}\, g(Z_t).$$
We then let $f(s, n, t) = P_{T-t}\, g(s, n, t)$ and apply Ito's formula. The process $N_t$ is always
increasing, so has no martingale part, and hence $\langle N\rangle_t = 0$. When we apply Ito's formula,
we get a $dS_t$ term, which is the martingale term, we get some terms involving $dt$, which are
of bounded variation, and we get a term involving $dN_t$, which is also of bounded variation.
But $M_t$ is a martingale, so all the $dt$ and $dN_t$ terms must cancel. Therefore we should be
left with the martingale term, which is
$$\int_0^t \frac{\partial}{\partial s} P_{T-u}\, g(S_u, N_u, u)\,dS_u,$$
where again $g(s, n, t) = n - \overline E N_T$. This gives us our hedging strategy for the sell-high option, and
it can be explicitly calculated.
There is another way to calculate hedging strategies, using what is known as the
Clark-Haussmann-Ocone formula. This is a more complicated procedure, and most cases
can be done as well by an appropriate use of the Markov property.
22. Black-Scholes formula, II.
Here is a second approach to the Black-Scholes formula. This approach works for
European calls and several other options, but does not work in the generality that the
first approach does. On the other hand, it allows one to compute more easily what the
equivalent strategy of buying or selling stock should be to duplicate the outcome of the
given option. In this section we work with the actual price of the stock instead of the
present value.
Let $V_t$ be the value of the portfolio and assume $V_t = f(S_t, T - t)$ for all $t$, where $f$
is some function that is sufficiently smooth. We also want $V_T = (S_T - K)^+$.
Recall Ito's formula. The multivariate version is
$$f(X_t) = f(X_0) + \int_0^t \sum_{i=1}^{d} f_{x_i}(X_s)\,dX^i_s + \frac12 \int_0^t \sum_{i,j=1}^{d} f_{x_i x_j}(X_s)\,d\langle X^i, X^j\rangle_s.$$
Here $X_t = (X^1_t, \ldots, X^d_t)$ and $f_{x_i}$ denotes the partial derivative of $f$ in the $x_i$ direction,
and similarly for the second partial derivatives.
We apply this with $d = 2$ and $X_t = (S_t, T - t)$. From the SDE that $S_t$ solves,
$d\langle X^1\rangle_t = \sigma^2 S_t^2\,dt$, $\langle X^2\rangle_t = 0$ (since $T - t$ is of bounded variation and hence has no
martingale part), and $\langle X^1, X^2\rangle_t = 0$. Also, $dX^2_t = -dt$. Writing $f_x$ and $f_s$ for the partial
derivatives of $f(x, s)$ in its first and second variables, we then have
$$\begin{aligned}
V_t - V_0 &= f(S_t, T - t) - f(S_0, T) \\
&= \int_0^t f_x(S_u, T - u)\,dS_u - \int_0^t f_s(S_u, T - u)\,du
+ \frac12 \int_0^t \sigma^2 S_u^2 f_{xx}(S_u, T - u)\,du.
\end{aligned} \tag{22.1}$$
On the other hand, if $a_u$ and $b_u$ are the number of shares of stock and bonds, respectively,
held at time $u$,
$$V_t - V_0 = \int_0^t a_u\,dS_u + \int_0^t b_u\,d\beta_u. \tag{22.2}$$
This formula says that the increase in net worth is given by the profit we obtain by holding
$a_u$ shares of stock and $b_u$ bonds at time $u$. Since the value of the portfolio at time $t$ is
$$V_t = a_t S_t + b_t \beta_t,$$
we must have
$$b_t = (V_t - a_t S_t)/\beta_t. \tag{22.3}$$
Also, recall
$$\beta_t = \beta_0 e^{rt}. \tag{22.4}$$
To match up (22.2) with (22.1), note that $d\beta_u = r\beta_u\,du$ by (22.4), so by (22.3) we have
$b_u\,d\beta_u = r(V_u - a_u S_u)\,du$. We must therefore have
$$a_t = f_x(S_t, T - t) \tag{22.5}$$
and
$$r[f(S_t, T - t) - S_t f_x(S_t, T - t)] = -f_s(S_t, T - t) + \frac12 \sigma^2 S_t^2 f_{xx}(S_t, T - t) \tag{22.6}$$
for all $t$ and all $S_t$. (22.6) leads to the parabolic PDE
$$f_s = \frac12 \sigma^2 x^2 f_{xx} + r x f_x - r f, \qquad (x, s) \in (0, \infty) \times [0, T), \tag{22.7}$$
and
$$f(x, 0) = (x - K)^+. \tag{22.8}$$
Solving this equation for $f$, $f(x, T)$ is what $V_0$ should be, i.e., the cost of setting up the
equivalent portfolio. Equation (22.5) shows what the trading strategy should be.
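As a numerical sanity check, one can verify that the Black-Scholes call price, viewed as a function $f(x, s)$ of the stock price $x$ and the time $s$ remaining until exercise, satisfies the PDE (22.7) and that the hedge ratio $a_t = f_x$ of (22.5) agrees with the well-known closed form $\Phi(g(x, s))$. The sketch below uses central finite differences; the step sizes, function names and parameter values are assumptions made for the illustration.

```python
import math


def std_normal_cdf(z):
    """Phi(z), the standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def g_value(x, s, strike, r, sigma):
    """g(x, s) = [log(x/K) + (r + sigma^2/2) s] / (sigma sqrt(s))."""
    return (math.log(x / strike) + (r + sigma**2 / 2) * s) / (sigma * math.sqrt(s))


def f(x, s, strike, r, sigma):
    """Black-Scholes call value with stock price x and time s > 0 to exercise."""
    g = g_value(x, s, strike, r, sigma)
    h = g - sigma * math.sqrt(s)
    return x * std_normal_cdf(g) - strike * math.exp(-r * s) * std_normal_cdf(h)


def hedge_ratio(x, s, strike, r, sigma, dx=1e-2):
    """a = f_x, approximated by a central difference in x."""
    return (f(x + dx, s, strike, r, sigma) - f(x - dx, s, strike, r, sigma)) / (2 * dx)


def pde_residual(x, s, strike, r, sigma, dx=1e-2, ds=1e-4):
    """f_s - (sigma^2 x^2 f_xx / 2 + r x f_x - r f); should be close to 0 by (22.7)."""
    f0 = f(x, s, strike, r, sigma)
    f_s = (f(x, s + ds, strike, r, sigma) - f(x, s - ds, strike, r, sigma)) / (2 * ds)
    f_x = hedge_ratio(x, s, strike, r, sigma, dx)
    f_xx = (f(x + dx, s, strike, r, sigma) - 2 * f0 + f(x - dx, s, strike, r, sigma)) / dx**2
    return f_s - (0.5 * sigma**2 * x**2 * f_xx + r * x * f_x - r * f0)


if __name__ == "__main__":
    args = (100.0, 0.5, 100.0, 0.05, 0.2)  # illustrative values only
    print("PDE residual:", pde_residual(*args))
    print("hedge ratio :", hedge_ratio(*args), "vs Phi(g):", std_normal_cdf(g_value(*args)))
```

Finite differences are used here only to avoid hand-computed derivatives; in practice one would differentiate the closed form directly.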
23. The fundamental theorem of finance.
In Section 19, we showed there was a probability measure under which $P_t = e^{-rt} S_t$
was a martingale. This is true very generally. Let $S_t$ be the price of a security in today's
dollars. We will suppose $S_t$ is a continuous semimartingale, and can be written $S_t =
M_t + A_t$.
Arbitrage means that there is a trading strategy $H_s$ such that there is no chance that
we lose anything and there is a positive profit with positive probability. Mathematically,
arbitrage exists if there exists $H_s$ that is adapted and satisfies a suitable integrability
condition with
$$\int_0^T H_s\,dS_s \ge 0, \qquad \text{a.s.}$$
and
$$P\Big(\int_0^T H_s\,dS_s > b\Big) > \varepsilon$$
for some $b, \varepsilon > 0$. It turns out that to get a necessary and sufficient condition for $S_t$ to be
a martingale under some equivalent probability, we need to work with a slightly weaker notion than arbitrage.
The NFLVR condition (“no free lunch with vanishing risk”) is that there do not
exist a fixed time $T$, $\varepsilon, b > 0$, and $H_n$ (that are adapted and satisfy the appropriate
integrability conditions) such that
$$\int_0^t H_n(s)\,dS_s > -\frac{1}{n}, \qquad \text{a.s.}$$
for all $t \le T$ and
$$P\Big(\int_0^T H_n(s)\,dS_s > b\Big) > \varepsilon.$$
Here $T, b, \varepsilon$ do not depend on $n$. The condition says that one cannot, with positive
probability $\varepsilon$, make a profit of $b$ while risking a loss no larger than $1/n$.
Two probabilities $P$ and $Q$ are equivalent if $P(A) = 0$ if and only if $Q(A) = 0$,
i.e., the two probabilities have the same collection of sets of probability zero. $Q$ is an
equivalent martingale measure if $Q$ is a probability measure, $Q$ is equivalent to $P$, and $S_t$
is a martingale under $Q$.
Theorem 23.1. If $S_t$ is a continuous semimartingale and the NFLVR condition holds,
then there exists an equivalent martingale measure $Q$.
The proof is rather technical and involves some heavy-duty measure theory, so we
will only examine a part of it. Suppose that we happened to have $S_t = W_t + f(t)$,
where $f(t)$ is a deterministic increasing continuous function. To obtain the equivalent
martingale measure, we would want to let
$$M_t = e^{-\int_0^t f'(s)\,dW_s - \frac12 \int_0^t (f'(s))^2\,ds}.$$
In order for $M_t$ to make sense, we need $f$ to be differentiable. A result from measure
theory says that if $f$ is not differentiable (more precisely, not absolutely continuous), then we can
find a subset $A$ of $[0, \infty)$ such that $\int_0^t 1_A(s)\,ds = 0$ but the amount of increase of $f$ over the
set $A$ is positive. This last statement is phrased mathematically by saying
$$\int_0^t 1_A(s)\,df(s) > 0,$$
where the integral is a Riemann-Stieltjes (or better, a Lebesgue-Stieltjes) integral. Then
if we hold $H_s = 1_A(s)$ shares at time $s$, our net profit is
$$\int_0^t H_s\,dS_s = \int_0^t 1_A(s)\,dW_s + \int_0^t 1_A(s)\,df(s).$$
The second term would be positive since this is the amount of increase of $f$ over the set
$A$. The first term is 0, since $E\big(\int_0^t 1_A(s)\,dW_s\big)^2 = \int_0^t 1_A(s)^2\,ds = 0$. So our net profit is
nonrandom and positive, or in other words, we have made a net gain without risk. This
contradicts “no arbitrage.” See Note 1 for more on this.
Sometimes Theorem 23.1 is called the first fundamental theorem of asset pricing.
The second fundamental theorem is the following.
Theorem 23.2. The equivalent martingale measure is unique if and only if the market is
complete.
We will not prove this.
Note 1. We will not prove Theorem 23.1, but let us give a few more indications of what is
going on. First of all, recall the Cantor set. This is where $E_1 = [0, 1]$, $E_2$ is the set obtained
from $E_1$ by removing the open interval $(\frac13, \frac23)$, $E_3$ is the set obtained from $E_2$ by removing
the middle third from each of the two intervals making up $E_2$, and so on. The intersection,
$E = \cap_{n=1}^{\infty} E_n$, is the Cantor set, and is closed, nonempty, in fact uncountable, yet it contains
no intervals. Also, the Lebesgue measure of $E$ is 0. We set $A = E$. Let $f$ be the Cantor-
Lebesgue function. This is the function that is equal to 0 on $(-\infty, 0]$, 1 on $[1, \infty)$, equal to
$\frac12$ on the interval $[\frac13, \frac23]$, equal to $\frac14$ on $[\frac19, \frac29]$, equal to $\frac34$ on $[\frac79, \frac89]$, and is defined similarly on
each interval making up the complement of $A$. It turns out we can define $f$ on $A$ so that it is
continuous, and one can show $\int_0^1 1_A(s)\,df(s) = 1$. So this $A$ and $f$ provide a concrete example
of what we were discussing.
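The Cantor-Lebesgue function can be computed exactly at rational points from the ternary expansion, which makes the example easy to explore. The sketch below is one possible implementation (the function names and the truncation depth are assumptions): it reproduces the values listed above and shows that the intervals making up $E_n$ carry all of the increase of $f$ while their total length $(2/3)^{n-1}$ shrinks to 0.

```python
from fractions import Fraction


def cantor_function(x, depth=60):
    """Cantor-Lebesgue function at x, using at most `depth` ternary digits.

    Read the ternary digits of x up to the first digit 1, replace each 2 by 1,
    and interpret the result as a binary expansion.
    """
    x = Fraction(x)
    if x <= 0:
        return Fraction(0)
    if x >= 1:
        return Fraction(1)
    value, scale = Fraction(0), Fraction(1, 2)
    for _ in range(depth):
        x *= 3
        digit = int(x)  # next ternary digit: 0, 1 or 2
        x -= digit
        if digit == 1:
            return value + scale  # x lies in (the closure of) a removed middle third
        if digit == 2:
            value += scale
        scale /= 2
    return value


def cantor_level_intervals(n):
    """The 2^(n-1) closed intervals making up E_n, with E_1 = [0, 1]."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n - 1):
        refined = []
        for a, b in intervals:
            third = (b - a) / 3
            refined.append((a, a + third))
            refined.append((b - third, b))
        intervals = refined
    return intervals


def length_and_increase(n):
    """Total length of E_n and total increase of f over the intervals of E_n."""
    intervals = cantor_level_intervals(n)
    length = sum(b - a for a, b in intervals)
    increase = sum(cantor_function(b) - cantor_function(a) for a, b in intervals)
    return length, increase


if __name__ == "__main__":
    print(cantor_function(Fraction(1, 2)), cantor_function(Fraction(1, 9)),
          cantor_function(Fraction(7, 9)))
    for n in (1, 3, 5, 8):
        print(n, length_and_increase(n))
```

Since $f$ is constant on every removed interval, all of its increase is carried by $E_n$ for every $n$, which is consistent with $\int_0^1 1_A(s)\,df(s) = 1$.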
24. American puts.
The proper valuation of American puts is one of the important unsolved problems
in mathematical finance. Recall that a European put pays out $(K - S_T)^+$ at time $T$,
while an American put allows one to exercise early. If one exercises an American put at
time $t < T$, one receives $(K - S_t)^+$. Then during the period $[t, T]$ one receives interest,
and the amount one has is $(K - S_t)^+ e^{r(T-t)}$. In today's dollars that is the equivalent of
$(K - S_t)^+ e^{-rt}$. One wants to find a rule, known as the exercise policy, for when to exercise
the put, and then one wants to see what the value is for that policy. Since one cannot look
into the future, one is in fact looking for a stopping time $\tau$ that maximizes
$$E e^{-r\tau}(K - S_\tau)^+.$$
There is no good theoretical solution to finding the stopping time $\tau$, although good
approximations exist. We will, however, discuss just a bit of the theory of optimal stopping,
which reworks the problem into another form.
Let $G_t$ denote the amount you will receive at time $t$. For American puts, we set
$$G_t = e^{-rt}(K - S_t)^+.$$
Our problem is to maximize $E G_\tau$ over all stopping times $\tau$.
We first need
Proposition 24.1. If $S$ and $T$ are bounded stopping times with $S \le T$ and $M$ is a
martingale, then
$$E[M_T \mid \mathcal{F}_S] = M_S.$$
Proof. Let $A \in \mathcal{F}_S$. Define $U$ by
$$U(\omega) = \begin{cases} S(\omega) & \text{if } \omega \in A, \\ T(\omega) & \text{if } \omega \notin A. \end{cases}$$
It is easy to see that $U$ is a stopping time, so by Doob’s optional stopping theorem,
$$E M_0 = E M_U = E[M_S; A] + E[M_T; A^c].$$
Also,
$$E M_0 = E M_T = E[M_T; A] + E[M_T; A^c].$$
Taking the difference, $E[M_T; A] = E[M_S; A]$, which is what we needed to show.
Given two supermartingales $X_t$ and $Y_t$, it is routine to check that $X_t \wedge Y_t$ is also a
supermartingale. Also, if $X^n_t$ are supermartingales with $X^n_t \downarrow X_t$, one can check that $X_t$
is again a supermartingale. With these facts, one can show that given a process such as
$G_t$, there is a least supermartingale larger than $G_t$.
So we define $W_t$ to be a supermartingale (with respect to $P$, of course) such that
$W_t \ge G_t$ a.s. for each $t$ and if $Y_t$ is another supermartingale with $Y_t \ge G_t$ for all $t$, then
$W_t \le Y_t$ for all $t$. We set $\tau = \inf\{t : W_t = G_t\}$. We will show that $\tau$ is the solution to the
problem of finding the optimal stopping time. Of course, computing $W_t$ and $\tau$ is another
problem entirely.
Let
$$\mathcal{T}_t = \{\tau : \tau \text{ is a stopping time},\ t \le \tau \le T\}.$$
Let
$$V_t = \sup_{\tau \in \mathcal{T}_t} E[G_\tau \mid \mathcal{F}_t].$$
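Proposition 24.2. $V_t$ is a supermartingale and $V_t \ge G_t$ for all $t$.
Proof. The fixed time $t$ is a stopping time in $\mathcal{T}_t$, so $V_t \ge E[G_t \mid \mathcal{F}_t] = G_t$, or
$V_t \ge G_t$. So we only need to show that $V_t$ is a supermartingale.
Suppose $s < t$. Let $\pi$ be the stopping time in $\mathcal{T}_t$ for which $V_t = E[G_\pi \mid \mathcal{F}_t]$.
Note that $\pi \in \mathcal{T}_t \subset \mathcal{T}_s$. Then, conditioning first on $\mathcal{F}_t$ and then on $\mathcal{F}_s$,
$$E[V_t \mid \mathcal{F}_s] = E[G_\pi \mid \mathcal{F}_s] \le \sup_{\tau \in \mathcal{T}_s} E[G_\tau \mid \mathcal{F}_s] = V_s.$$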
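Proposition 24.3. If $Y_t$ is a supermartingale with $Y_t \ge G_t$ for all $t$, then $Y_t \ge V_t$.
Proof. If $\tau \in \mathcal{T}_t$, then since $Y_t$ is a supermartingale, we have
$$E[Y_\tau \mid \mathcal{F}_t] \le Y_t.$$
So, since $G_\tau \le Y_\tau$,
$$V_t = \sup_{\tau \in \mathcal{T}_t} E[G_\tau \mid \mathcal{F}_t] \le \sup_{\tau \in \mathcal{T}_t} E[Y_\tau \mid \mathcal{F}_t] \le Y_t.$$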
What we have shown is that $W_t$ is equal to $V_t$. It remains to show that $\tau$ is optimal.
There may in fact be more than one optimal time, but in any case $\tau$ is one of them. Recall
that $\mathcal{F}_0$ is the σ-field generated by $S_0$, and hence consists of only ∅ and Ω.
Proposition 24.4. τ is an optimal stopping time.
Proof. Since $\mathcal{F}_0$ is trivial, $V_0 = \sup_{\tau \in \mathcal{T}_0} E[G_\tau \mid \mathcal{F}_0] = \sup_\tau E[G_\tau]$. Let $\sigma$ be a
stopping time where the supremum is attained. Then
$$V_0 \ge E[V_\sigma \mid \mathcal{F}_0] = E[V_\sigma] \ge E[G_\sigma] = V_0.$$
Therefore all the inequalities must be equalities. Since $V_\sigma \ge G_\sigma$, we must have $V_\sigma = G_\sigma$.
Since $\tau$ was the first time that $W_t$ equals $G_t$ and $W_t = V_t$, we see that $\tau \le \sigma$. Then
$$E[G_\tau] = E[V_\tau] \ge E V_\sigma = E G_\sigma.$$
Therefore the expected value of $G_\tau$ is at least as large as the expected value of $G_\sigma$, and
hence $\tau$ is also an optimal stopping time.
The above representation of the optimal stopping problem may seem rather bizarre.
However, this procedure gives good usable results for some optimal stopping problems. An
example is where $G_t$ is a function of just $W_t$.
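In discrete time the least supermartingale dominating the payoff can be computed by backward induction, which gives one of the good approximations mentioned above. The following is a sketch under assumptions that are not in the notes: a Cox-Ross-Rubinstein binomial tree with $u = e^{\sigma\sqrt{\Delta t}}$, $d = 1/u$, and the hypothetical function name `binomial_put`. At each node the value is the larger of the exercise value and the discounted expected value one step ahead, so the discounted value process dominates $G$ and is a supermartingale, and the first time the two agree plays the role of $\tau$.

```python
import math


def binomial_put(S0, K, r, sigma, T, n, american=True):
    """Put price on an n-step CRR tree; backward induction over the nodes."""
    dt = T / n
    u = math.exp(sigma * math.sqrt(dt))     # up factor (assumed CRR choice)
    d = 1.0 / u                             # down factor
    disc = math.exp(-r * dt)                # one-step discount factor
    p = (math.exp(r * dt) - d) / (u - d)    # risk-neutral probability of an up move

    # At expiry the value is just the payoff (K - S_T)^+.
    values = [max(K - S0 * u**j * d**(n - j), 0.0) for j in range(n + 1)]

    for k in range(n - 1, -1, -1):
        new_values = []
        for j in range(k + 1):
            continuation = disc * (p * values[j + 1] + (1 - p) * values[j])
            if american:
                exercise = max(K - S0 * u**j * d**(k - j), 0.0)
                new_values.append(max(exercise, continuation))
            else:
                new_values.append(continuation)
        values = new_values
    return values[0]


american = binomial_put(100.0, 100.0, 0.05, 0.2, 1.0, 200, american=True)
european = binomial_put(100.0, 100.0, 0.05, 0.2, 1.0, 200, american=False)
print(american, european)
```

The American value should come out no smaller than the European one, since exercising only at time $T$ is one of the admissible stopping times.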
25. Term structure.
We now want to consider the case where the interest rate is nondeterministic, that
is, it has a random component. To do so, we take another look at option pricing.
Accumulation factor. Let $r(t)$ be the (random) interest rate at time $t$. Let
$$\beta(t) = e^{\int_0^t r(u)\,du}$$
be the accumulation factor. One dollar at time $T$ will be worth $1/\beta(T)$ in today’s dollars.
Let $V = (S_T - K)^+$ be the payoff on the standard European call option at time $T$
with strike price $K$, where $S_t$ is the stock price. In today’s dollars it is worth, as we have
seen, $V/\beta(T)$. Therefore the price of the option should be
$$E\Big[\frac{V}{\beta(T)}\Big].$$
We can also get an expression for the value of the option at time $t$. The payoff, in terms
of dollars at time $t$, should be the payoff at time $T$ discounted by the interest or inflation
rate, and so should be
$$e^{-\int_t^T r(u)\,du}(S_T - K)^+.$$
Therefore the value at time $t$ is
$$E\Big[e^{-\int_t^T r(u)\,du}(S_T - K)^+ \,\Big|\, \mathcal{F}_t\Big] = E\Big[\frac{\beta(t)}{\beta(T)}V \,\Big|\, \mathcal{F}_t\Big] = \beta(t)\,E\Big[\frac{V}{\beta(T)} \,\Big|\, \mathcal{F}_t\Big].$$
From now on we assume we have already changed to the risk-neutral measure and
we write $P$ instead of $\overline{P}$.
Zero coupon. A zero coupon bond with maturity date $T$ pays \$1 at time $T$ and nothing
before. This is equivalent to an option with payoff value $V = 1$. So its price at time $t$, as
above, should be
$$B(t, T) = \beta(t)\,E\Big[\frac{1}{\beta(T)} \,\Big|\, \mathcal{F}_t\Big] = E\Big[e^{-\int_t^T r(u)\,du} \,\Big|\, \mathcal{F}_t\Big].$$
Let’s derive the SDE satisfied by $B(t, T)$. Let $N_t = E[1/\beta(T) \mid \mathcal{F}_t]$. This is a
martingale. By the martingale representation theorem,
$$N_t = E[1/\beta(T)] + \int_0^t H_s\,dW_s$$
for some adapted integrand $H_s$. So $B(t, T) = \beta(t)N_t$. Here $T$ is fixed. By Ito’s product
formula,
$$dB(t, T) = \beta(t)\,dN_t + N_t\,d\beta(t)$$
$$= \beta(t)H_t\,dW_t + N_t r(t)\beta(t)\,dt$$
$$= \beta(t)H_t\,dW_t + B(t, T)r(t)\,dt,$$
and we thus have
$$dB(t, T) = \beta(t)H_t\,dW_t + B(t, T)r(t)\,dt. \qquad (25.1)$$
Forward rates. We now discuss forward rates. If one holds T fixed and graphs B(t, T) as
a function of t, the graph will not clearly show the behavior of r. One sometimes specifies
interest rates by what are known as forward rates.
Suppose we want to borrow \$1 at time $T$ and repay it with interest at time $T + \varepsilon$.
At the present time we are at time $t \le T$. Let us try to accomplish this by buying a zero
coupon bond with maturity date $T$ and shorting (i.e., selling) $N$ zero coupon bonds with
maturity date $T + \varepsilon$. Our outlay of money at time $t$ is
$$B(t, T) - N B(t, T + \varepsilon).$$
If we set
$$N = B(t, T)/B(t, T + \varepsilon),$$
our outlay at time $t$ is 0. At time $T$ we receive \$1. At time $T + \varepsilon$ we pay $B(t, T)/B(t, T + \varepsilon)$.
The effective rate of interest $R$ over the time period $T$ to $T + \varepsilon$ is given by
$$e^{\varepsilon R} = \frac{B(t, T)}{B(t, T + \varepsilon)}.$$
Solving for $R$, we have
$$R = \frac{\log B(t, T) - \log B(t, T + \varepsilon)}{\varepsilon}.$$
We now let $\varepsilon \to 0$. We define the forward rate by
$$f(t, T) = -\frac{\partial}{\partial T}\log B(t, T). \qquad (25.2)$$
Sometimes interest rates are specified by giving f(t, T) instead of B(t, T) or r(t).
Recovering B from f. Let us see how to recover $B(t, T)$ from $f(t, T)$. Integrating, we have
$$\int_t^T f(t, u)\,du = -\int_t^T \frac{\partial}{\partial u}\log B(t, u)\,du = -\log B(t, u)\Big|_{u=t}^{u=T} = -\log B(t, T) + \log B(t, t).$$
Since $B(t, t)$ is the value of a zero coupon bond at time $t$ which expires at time $t$, it is
equal to 1, and its log is 0. Solving for $B(t, T)$, we have
$$B(t, T) = e^{-\int_t^T f(t,u)\,du}. \qquad (25.3)$$
Recovering r from f. Next, let us show how to recover $r(t)$ from the forward rates. We
have
$$B(t, T) = E\Big[e^{-\int_t^T r(u)\,du} \,\Big|\, \mathcal{F}_t\Big].$$
Differentiating,
$$\frac{\partial}{\partial T}B(t, T) = E\Big[-r(T)\,e^{-\int_t^T r(u)\,du} \,\Big|\, \mathcal{F}_t\Big].$$
Evaluating this when $T = t$, we obtain
$$E[-r(t) \mid \mathcal{F}_t] = -r(t). \qquad (25.4)$$
On the other hand, from (25.3) we have
$$\frac{\partial}{\partial T}B(t, T) = -f(t, T)\,e^{-\int_t^T f(t,u)\,du}.$$
Setting $T = t$ we obtain $-f(t, t)$. Comparing with (25.4) yields
$$r(t) = f(t, t). \qquad (25.5)$$
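The relations (25.2), (25.3) and (25.5) can be checked numerically on a made-up curve. In the sketch below the forward curve $f(t, u) = 0.03 + 0.01(u - t)$ and all function names are hypothetical choices for illustration only. The derivative in (25.2) is replaced by a central difference and the integral in (25.3) by a midpoint rule.

```python
import math


def forward_rate(B, t, T, eps=1e-5):
    """(25.2): f(t, T) = -d/dT log B(t, T), approximated by a central difference."""
    return -(math.log(B(t, T + eps)) - math.log(B(t, T - eps))) / (2 * eps)


def bond_from_forwards(f, t, T, n=1000):
    """(25.3): B(t, T) = exp(-integral of f(t, u) du over [t, T]), midpoint rule."""
    h = (T - t) / n
    integral = sum(f(t, t + (i + 0.5) * h) for i in range(n)) * h
    return math.exp(-integral)


def example_forward(t, u):
    """Hypothetical forward curve, linear in time to maturity."""
    return 0.03 + 0.01 * (u - t)


def example_bond(t, T):
    """Bond price implied by example_forward, integrated by hand."""
    x = T - t
    return math.exp(-(0.03 * x + 0.005 * x * x))


print(bond_from_forwards(example_forward, 0.0, 5.0), example_bond(0.0, 5.0))
print(forward_rate(example_bond, 0.0, 5.0))   # close to f(0, 5) = 0.08
print(forward_rate(example_bond, 2.0, 2.0))   # close to r(2) = f(2, 2) = 0.03
```

The last line illustrates (25.5): evaluating the forward rate at $T = t$ returns the short rate.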
26. Some interest rate models.
Heath-Jarrow-Morton model
Instead of specifying $r$, the Heath-Jarrow-Morton model (HJM) specifies the forward
rates:
$$df(t, T) = \sigma(t, T)\,dW_t + \alpha(t, T)\,dt. \qquad (26.1)$$
Let us derive the SDE that $B(t, T)$ satisfies. Let
$$\alpha^*(t, T) = \int_t^T \alpha(t, u)\,du, \qquad \sigma^*(t, T) = \int_t^T \sigma(t, u)\,du.$$
Since $B(t, T) = \exp(-\int_t^T f(t, u)\,du)$, we derive the SDE for $B$ by using Ito’s formula with
the function $e^x$ and $X_t = -\int_t^T f(t, u)\,du$. We have
$$dX_t = f(t, t)\,dt - \int_t^T df(t, u)\,du$$
$$= r(t)\,dt - \int_t^T [\alpha(t, u)\,dt + \sigma(t, u)\,dW_t]\,du$$
$$= r(t)\,dt - \Big[\int_t^T \alpha(t, u)\,du\Big]dt - \Big[\int_t^T \sigma(t, u)\,du\Big]dW_t$$
$$= r(t)\,dt - \alpha^*(t, T)\,dt - \sigma^*(t, T)\,dW_t.$$
Therefore, using Ito’s formula,
$$dB(t, T) = B(t, T)\,dX_t + \tfrac{1}{2}B(t, T)(\sigma^*(t, T))^2\,dt$$
$$= B(t, T)\big[r(t) - \alpha^* + \tfrac{1}{2}(\sigma^*)^2\big]dt - \sigma^* B(t, T)\,dW_t.$$
From (25.1) we know the $dt$ term must be $B(t, T)r(t)\,dt$, hence
$$dB(t, T) = B(t, T)r(t)\,dt - \sigma^* B(t, T)\,dW_t.$$
Comparing the $dt$ terms in the last two displays, we see that if $P$ is the risk-neutral measure,
we have $\alpha^* = \frac{1}{2}(\sigma^*)^2$. See Note 1 for more on this.
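The risk-neutral relation $\alpha^* = \frac12(\sigma^*)^2$ can be illustrated numerically. The sketch below is an illustration only: the volatility $\sigma(t, T) = \sigma_0 e^{-\lambda(T - t)}$ and the constants `SIGMA0` and `LAM` are hypothetical choices, not taken from the notes. It uses the drift $\alpha(t, T) = \sigma(t, T)\sigma^*(t, T)$, which is what one gets by differentiating $\alpha^* = \frac12(\sigma^*)^2$ in $T$, integrates it back over $[t, T]$, and compares with $\frac12(\sigma^*)^2$.

```python
import math

SIGMA0, LAM = 0.01, 0.5   # hypothetical volatility level and decay rate


def integrate(g, lo, hi, n=2000):
    """Midpoint rule for the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return sum(g(lo + (i + 0.5) * h) for i in range(n)) * h


def sigma(t, T):
    """Assumed forward-rate volatility sigma(t, T)."""
    return SIGMA0 * math.exp(-LAM * (T - t))


def sigma_star(t, T):
    """sigma*(t, T) = integral of sigma(t, u) du over [t, T]."""
    return integrate(lambda u: sigma(t, u), t, T)


def alpha(t, T):
    """Drift obtained by differentiating alpha* = (sigma*)^2 / 2 with respect to T."""
    return sigma(t, T) * sigma_star(t, T)


def alpha_star(t, T):
    """alpha*(t, T) = integral of alpha(t, u) du over [t, T]."""
    return integrate(lambda u: alpha(t, u), t, T, n=200)


lhs = alpha_star(0.0, 5.0)
rhs = 0.5 * sigma_star(0.0, 5.0) ** 2
print(lhs, rhs)
```

The two printed numbers should agree up to the discretization error of the midpoint rule.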
Hull and White model
In this model, the interest rate $r$ is specified as the solution to the SDE
$$dr(t) = \sigma(t)\,dW_t + (a(t) - b(t)r(t))\,dt. \qquad (26.2)$$
Here $\sigma, a, b$ are deterministic functions. The stochastic integral term introduces randomness,
while the $a - br$ term causes a drift toward $a(t)/b(t)$. (Note that if $\sigma(t) = \sigma$, $a(t) = a$,
$b(t) = b$ are constants and $\sigma = 0$, then (26.2) has the constant solution $r(t) = a/b$, and for
$b > 0$ every solution converges to $a/b$ as $t \to \infty$.)
(26.2) is one of those SDE’s that can be solved explicitly. Let $K(t) = \int_0^t b(u)\,du$.
Then
$$d\big[e^{K(t)}r(t)\big] = e^{K(t)}r(t)b(t)\,dt + e^{K(t)}\big[a(t) - b(t)r(t)\big]dt + e^{K(t)}[\sigma(t)\,dW_t]$$
$$= e^{K(t)}a(t)\,dt + e^{K(t)}[\sigma(t)\,dW_t].$$
Integrating both sides,
$$e^{K(t)}r(t) = r(0) + \int_0^t e^{K(u)}a(u)\,du + \int_0^t e^{K(u)}\sigma(u)\,dW_u.$$
Multiplying both sides by $e^{-K(t)}$, we have the explicit solution
$$r(t) = e^{-K(t)}\Big[r(0) + \int_0^t e^{K(u)}a(u)\,du + \int_0^t e^{K(u)}\sigma(u)\,dW_u\Big].$$
If $F(u)$ is deterministic, then
$$\int_0^t F(u)\,dW_u = \lim \sum_i F(u_i)(W_{u_{i+1}} - W_{u_i}).$$
From undergraduate probability, linear combinations of Gaussian r.v.’s (Gaussian = normal)
are Gaussian, and also limits of Gaussian r.v.’s are Gaussian, so we conclude that the
r.v. $\int_0^t F(u)\,dW_u$ is Gaussian. We see that the mean at time $t$ is
$$E\,r(t) = e^{-K(t)}\Big[r(0) + \int_0^t e^{K(u)}a(u)\,du\Big].$$
We know how to calculate the second moment of a stochastic integral, so
$$\mathrm{Var}\,r(t) = e^{-2K(t)}\int_0^t e^{2K(u)}\sigma(u)^2\,du.$$
(One can similarly calculate the covariance of $r(s)$ and $r(t)$.) Limits of linear combinations
of Gaussians are Gaussian, so we can calculate the mean and variance of $\int_0^T r(t)\,dt$ and get
an explicit expression for
$$B(0, T) = E\,e^{-\int_0^T r(u)\,du}.$$
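The formulas for $E\,r(t)$ and $\mathrm{Var}\,r(t)$ can be evaluated by numerical integration for any deterministic coefficients. The sketch below is only an illustration: the function names and the parameter values are hypothetical. As a sanity check it compares against the constant-coefficient case, where the integrals can be done by hand and give $E\,r(t) = r(0)e^{-bt} + (a/b)(1 - e^{-bt})$ and $\mathrm{Var}\,r(t) = \sigma^2(1 - e^{-2bt})/(2b)$.

```python
import math


def integrate(g, lo, hi, n=400):
    """Midpoint rule for the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return sum(g(lo + (i + 0.5) * h) for i in range(n)) * h


def hull_white_mean_var(r0, a, b, sigma, t):
    """Mean and variance of r(t) in (26.2); a, b, sigma are functions of time."""
    def K(s):
        return integrate(b, 0.0, s)

    mean = math.exp(-K(t)) * (
        r0 + integrate(lambda u: math.exp(K(u)) * a(u), 0.0, t))
    var = math.exp(-2.0 * K(t)) * integrate(
        lambda u: math.exp(2.0 * K(u)) * sigma(u) ** 2, 0.0, t)
    return mean, var


# Constant coefficients (hypothetical values), where closed forms are available.
r0, a0, b0, s0, t = 0.03, 0.015, 0.3, 0.01, 2.0
mean, var = hull_white_mean_var(r0, lambda u: a0, lambda u: b0, lambda u: s0, t)
exact_mean = r0 * math.exp(-b0 * t) + (a0 / b0) * (1.0 - math.exp(-b0 * t))
exact_var = s0 ** 2 * (1.0 - math.exp(-2.0 * b0 * t)) / (2.0 * b0)
print(mean, exact_mean)
print(var, exact_var)
```

Since $r(t)$ is Gaussian, these two numbers determine its distribution completely.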
Cox-Ingersoll-Ross model
One drawback of the Hull and White model is that since $r(t)$ is Gaussian, it can take
negative values with positive probability, which doesn’t make sense. The Cox-Ingersoll-Ross
model avoids this by modeling $r$ by the SDE
$$dr(t) = (a - br(t))\,dt + \sigma\sqrt{r(t)}\,dW_t.$$
The difference from the Hull and White model is the square root of $r$ in the stochastic
integral term. This square root term implies that when $r(t)$ is small, the fluctuations in
$r(t)$ are larger than they are in the Hull and White model. Provided $a \ge \frac{1}{2}\sigma^2$, it can be
shown that $r(t)$ will never hit 0 and will always be positive. Although one cannot solve
for $r$ explicitly, one can calculate the distribution of $r$. It turns out to be related to the
square of what are known in probability theory as Bessel processes. (The density of $r(t)$,
for example, will be given in terms of Bessel functions.)
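Even without an explicit solution, the Cox-Ingersoll-Ross SDE is easy to simulate. The sketch below uses a plain Euler scheme and is an illustration only: the truncation of $r$ at 0 inside the drift and the square root is a device of the discretization (the Euler path, unlike the true solution, can dip below 0), and the function name and parameter values are hypothetical. Because the drift is linear in $r$, the mean satisfies the same ODE as in the deterministic case, so the sample mean can be compared with $r(0)e^{-bt} + (a/b)(1 - e^{-bt})$.

```python
import math
import random


def simulate_cir(r0, a, b, sigma, t, steps, rng):
    """One Euler path of dr = (a - b r) dt + sigma sqrt(r) dW; returns r at time t."""
    dt = t / steps
    r = r0
    for _ in range(steps):
        r_pos = max(r, 0.0)          # truncate so the square root is defined
        dw = math.sqrt(dt) * rng.gauss(0.0, 1.0)
        r += (a - b * r_pos) * dt + sigma * math.sqrt(r_pos) * dw
    return r


rng = random.Random(0)
r0, a, b, sigma, t = 0.03, 0.02, 0.5, 0.1, 2.0   # here a >= sigma^2 / 2
samples = [simulate_cir(r0, a, b, sigma, t, 200, rng) for _ in range(2000)]
sample_mean = sum(samples) / len(samples)
exact_mean = r0 * math.exp(-b * t) + (a / b) * (1.0 - math.exp(-b * t))
print(sample_mean, exact_mean)
```

The sample mean is random, so it should only be expected to match the exact mean up to Monte Carlo error.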
Note 1. If $P$ is not the risk-neutral measure, it is still possible that one exists. Let $\theta(t)$ be a
function of $t$, let $M_t = \exp\big(-\int_0^t \theta(u)\,dW_u - \frac{1}{2}\int_0^t \theta(u)^2\,du\big)$ and define
$\overline{P}(A) = E[M_T; A]$ for $A \in \mathcal{F}_T$. By the Girsanov theorem,
$$dB(t, T) = B(t, T)\big[r(t) - \alpha^* + \tfrac{1}{2}(\sigma^*)^2 + \sigma^*\theta\big]dt - \sigma^* B(t, T)\,d\widetilde{W}_t,$$
where $\widetilde{W}_t$ is a Brownian motion under $\overline{P}$. Again, comparing this with (25.1) we must have
$$\alpha^* = \tfrac{1}{2}(\sigma^*)^2 + \sigma^*\theta.$$
Differentiating with respect to $T$, we obtain
$$\alpha(t, T) = \sigma(t, T)\sigma^*(t, T) + \sigma(t, T)\theta(t).$$
If we try to solve this equation for $\theta$, there is no reason off-hand that $\theta$ depends only on $t$ and
not $T$. However, if $\theta$ does not depend on $T$, $\overline{P}$ will be the risk-neutral measure.
Problems
1. Show $E[X\,E[Y \mid \mathcal{G}]\,] = E[Y\,E[X \mid \mathcal{G}]\,]$.
2. Prove that $E[aX_1 + bX_2 \mid \mathcal{G}] = aE[X_1 \mid \mathcal{G}] + bE[X_2 \mid \mathcal{G}]$.
3. Suppose $X_1, X_2, \ldots, X_n$ are independent and for each $i$ we have
$P(X_i = 1) = P(X_i = -1) = \frac{1}{2}$. Let $S_n = \sum_{i=1}^n X_i$. Show that $M_n = S_n^3 - 3nS_n$ is a martingale.
4. Let $X_i$ and $S_n$ be as in Problem 3. Let $\varphi(x) = \frac{1}{2}(e^x + e^{-x})$. Show that
$M_n = e^{aS_n}\varphi(a)^{-n}$ is a martingale for each $a$ real.
5. Suppose $M_n$ is a martingale, $N_n = M_n^2$, and $E N_n < \infty$ for each $n$. Show
$E[N_{n+1} \mid \mathcal{F}_n] \ge N_n$ for each $n$. Do not use Jensen’s inequality.
6. Suppose $M_n$ is a martingale, $N_n = |M_n|$, and $E N_n < \infty$ for each $n$. Show
$E[N_{n+1} \mid \mathcal{F}_n] \ge N_n$ for each $n$. Do not use Jensen’s inequality.
7. Suppose $X_n$ is a martingale with respect to $\mathcal{G}_n$ and $\mathcal{F}_n = \sigma(X_1, \ldots, X_n)$. Show $X_n$ is
a martingale with respect to $\mathcal{F}_n$.
8. Show that if $X_n$ and $Y_n$ are martingales with respect to $\{\mathcal{F}_n\}$ and $Z_n = \max(X_n, Y_n)$,
then $E[Z_{n+1} \mid \mathcal{F}_n] \ge Z_n$.
9. Let $X_n$ and $Y_n$ be martingales with $E X_n^2 < \infty$ and $E Y_n^2 < \infty$. Show
$$E X_n Y_n - E X_0 Y_0 = \sum_{m=1}^{n} E(X_m - X_{m-1})(Y_m - Y_{m-1}).$$
10. Consider the binomial asset pricing model with $n = 3$, $u = 3$, $d = \frac{1}{2}$, $r = 0.1$, $S_0 = 20$,
and $K = 10$. If $V$ is a European call with strike price $K$ and exercise date $n$, compute
explicitly the random variables $V_1$ and $V_2$ and calculate the value $V_0$.
11. In the same model as Problem 10, compute the hedging strategy $\Delta_0$, $\Delta_1$, and $\Delta_2$.
12. Show that in the binomial asset pricing model the value of the option $V$ at time $k$ is $V_k$.
13. Suppose $X_n$ is a submartingale. Show there exists a martingale $M_n$ such that if
$A_n = X_n - M_n$, then $A_0 \le A_1 \le A_2 \le \cdots$ and $A_n$ is $\mathcal{F}_{n-1}$ measurable for each $n$.
14. Suppose $X_n$ is a submartingale and $X_n = M_n + A_n = M'_n + A'_n$, where both $A_n$ and
$A'_n$ are $\mathcal{F}_{n-1}$ measurable for each $n$, both $M$ and $M'$ are martingales, both $A_n$ and $A'_n$
increase in $n$, and $A_0 = A'_0$. Show $M_n = M'_n$ for each $n$.
15. Suppose that $S$ and $T$ are stopping times. Show that $\max(S, T)$ and $\min(S, T)$ are
also stopping times.
16. Suppose that $S_n$ is a stopping time for each $n$ and $S_1 \le S_2 \le \cdots$. Show $S = \lim_{n\to\infty} S_n$
is also a stopping time. Show that if instead $S_1 \ge S_2 \ge \cdots$ and $S = \lim_{n\to\infty} S_n$, then $S$ is
again a stopping time.
17. Let $W_t$ be Brownian motion. Show that $e^{iuW_t + u^2 t/2}$ can be written in the form
$\int_0^t H_s\,dW_s$ and give an explicit formula for $H_s$.
18. Suppose $M_t$ is a continuous bounded martingale for which $\langle M\rangle_\infty$ is also bounded.
Show that
$$\sum_{i=0}^{2^n - 1}\big(M_{(i+1)/2^n} - M_{i/2^n}\big)^2$$
converges to $\langle M\rangle_1$ as $n \to \infty$.
[Hint: Show that Ito’s formula implies
$$\big(M_{(i+1)/2^n} - M_{i/2^n}\big)^2 = 2\int_{i/2^n}^{(i+1)/2^n}\big(M_s - M_{i/2^n}\big)\,dM_s + \langle M\rangle_{(i+1)/2^n} - \langle M\rangle_{i/2^n}.$$
Then sum over $i$ and show that the stochastic integral term goes to zero as $n \to \infty$.]
19. Let $f_\varepsilon(0) = f'_\varepsilon(0) = 0$ and $f''_\varepsilon(x) = \frac{1}{2\varepsilon}1_{(-\varepsilon,\varepsilon)}(x)$. You may assume that it is valid to
use Ito’s formula with the function $f_\varepsilon$ (note $f_\varepsilon \notin C^2$). Show that
$$\frac{1}{2\varepsilon}\int_0^t 1_{(-\varepsilon,\varepsilon)}(W_s)\,ds$$
converges as $\varepsilon \to 0$ to a continuous nondecreasing process that is not identically zero and
that increases only when $W_t$ is at 0.
[Hint: Use Ito’s formula to rewrite $\frac{1}{2\varepsilon}\int_0^t 1_{(-\varepsilon,\varepsilon)}(W_s)\,ds$ in terms of $f_\varepsilon(W_t) - f_\varepsilon(W_0)$ plus a
stochastic integral term and take the limit in this formula.]
20. Let $X_t$ be the solution to
$$dX_t = \sigma(X_t)\,dW_t + b(X_t)\,dt, \qquad X_0 = x,$$
where $W_t$ is Brownian motion and $\sigma$ and $b$ are bounded $C^\infty$ functions and $\sigma$ is bounded
below by a positive constant. Find a nonconstant function $f$ such that $f(X_t)$ is a martingale.
[Hint: Apply Ito’s formula to $f(X_t)$ and obtain an ordinary differential equation that $f$
needs to satisfy.]
21. Suppose $X_t = W_t + F(t)$, where $F$ is a twice continuously differentiable function,
$F(0) = 0$, and $W_t$ is a Brownian motion under $P$. Find a probability measure $Q$ under
which $X_t$ is a Brownian motion and prove your statement. (You will need to use the
general Girsanov theorem.)
22. Suppose $X_t = W_t - \int_0^t X_s\,ds$. Show that
$$X_t = \int_0^t e^{s-t}\,dW_s.$$
23. Suppose we have a stock where $\sigma = 2$, $K = 15$, $S_0 = 10$, $r = 0.1$, and $T = 3$. Suppose
we are in the continuous time model. Determine the price of the standard European call
using the Black-Scholes formula.
24. Let
$$\psi(t, x, y, \mu) = P\big(\sup_{s \le t}(W_s + \mu s) = y \text{ for } s \le t,\ W_t = x\big),$$
where $W_t$ is a Brownian motion. More precisely, for each $A, B, C, D$,
$$P\big(A \le \sup_{s \le t}(W_s + \mu s) \le B,\ C \le W_t \le D\big) = \int_C^D\int_A^B \psi(t, x, y, \mu)\,dy\,dx.$$
($\psi$ has an explicit formula, but we don’t need that here.) Let the stock price $S_t$ be given
by the standard geometric Brownian motion. Let $V$ be the option that pays off $\sup_{s \le T} S_s$
at time $T$. Determine the price at time 0 of $V$ as an expression in terms of $\psi$.
25. Suppose the interest rate is 0 and $S_t$ is the standard geometric Brownian motion stock
price. Let $A$ and $B$ be fixed positive reals, and let $V$ be the option that pays off 1 at time
$T$ if $A \le S_T \le B$ and 0 otherwise.
(a) Determine the price at time 0 of $V$.
(b) Find the hedging strategy that duplicates the claim $V$.
26. Let $V$ be the standard European call that has strike price $K$ and exercise date $T$. Let
$r$ and $\sigma$ be constants, as usual, but let $\mu(t)$ be a deterministic (i.e., nonrandom) function.
Suppose the stock price is given by
$$dS_t = \sigma S_t\,dW_t + \mu(t)S_t\,dt,$$
where $W_t$ is a Brownian motion. Find the price at time 0 of $V$.


On the set C2 the conditional probability is P(A∩C2 )/P(C2 ) 8 2 4 = P(∅)/P(C2 ) = 0. T T T }. Let C1 = {HHH. The set (Y ≥ a) is a union of some of the Bi . T HT }. Bi ] E [X | G] = 1Bi . Bi ]/P(Bi ). In fact. as above. Bi ] = E [X1Bi ] and also that E [1B ] = 1 · P(1B = 1) + 0 · P(1B = 0) = P(B). HT T } and C2 = {T HH. so we add to F2 all sets that can be obtained by taking unions of these sets.
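The three-toss calculation can be reproduced by brute force. The following is a minimal sketch, assuming a fair coin and exact rational arithmetic; the helper names `partition_by_prefix` and `cond_prob` are illustrative and do not come from the notes. It applies the formula of Definition 3.1 block by block.

```python
from fractions import Fraction
from itertools import product

# Sample space: the 8 outcomes of three fair, independent tosses.
omega = ["".join(t) for t in product("HT", repeat=3)]
prob = {w: Fraction(1, 8) for w in omega}

def partition_by_prefix(k):
    """Blocks B_i generating F_k: outcomes grouped by their first k tosses."""
    blocks = {}
    for w in omega:
        blocks.setdefault(w[:k], []).append(w)
    return list(blocks.values())

def cond_prob(event, blocks):
    """P(A | G) as a dict omega -> value: P(A ∩ B_i)/P(B_i) on each block B_i."""
    result = {}
    for block in blocks:
        p_block = sum(prob[w] for w in block)
        p_joint = sum(prob[w] for w in block if w in event)
        for w in block:
            result[w] = p_joint / p_block
    return result

A = {"HHH"}
p_given_f1 = cond_prob(A, partition_by_prefix(1))  # 1/4 on C_1, 0 on C_2
p_given_f2 = cond_prob(A, partition_by_prefix(2))  # 1/2 on D_1, 0 elsewhere
p_given_f3 = cond_prob(A, partition_by_prefix(3))  # the indicator of A
```

Each result is a random variable, stored as a dictionary from outcomes to values, which is constant on the blocks of the corresponding partition.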

Proposition 3.3. If C ∈ G and Y = E[X | G], then E[Y; C] = E[X; C].

Proof. Since Y = \sum_i (E[X; B_i]/P(B_i)) 1_{B_i} and the B_i are disjoint, then

\[ E[Y; B_j] = \frac{E[X; B_j]}{P(B_j)} \, E 1_{B_j} = E[X; B_j]. \]

Now if C = B_{j_1} ∪ · · · ∪ B_{j_n} ∪ · · ·, summing the above over the j_k gives E[Y; C] = E[X; C].

Let us look at the above example for this proposition, and let us do the case where C = B_2. Note 1_{B_2} 1_{B_2} = 1_{B_2} because the product is 1 · 1 = 1 if ω is in B_2 and 0 otherwise. On the other hand, it is not possible for an ω to be in more than one of the B_i, so 1_{B_2} 1_{B_i} = 0 if i ≠ 2. Multiplying Y in the above example by 1_{B_2}, we see that

\[ E[Y; C] = E[Y; B_2] = E[Y 1_{B_2}] = E[3 \cdot 1_{B_2}] = 3 E[1_{B_2}] = 3 P(B_2). \]

However the number 3 is not just any number; it is E[X; B_2]/P(B_2). So

\[ 3 P(B_2) = \frac{E[X; B_2]}{P(B_2)} \, P(B_2) = E[X; B_2] = E[X; C], \]

just as we wanted. If C = B_2 ∪ B_4, for example, we then write

\[ E[X; C] = E[X 1_C] = E[X(1_{B_2} + 1_{B_4})] = E[X 1_{B_2}] + E[X 1_{B_4}] = E[X; B_2] + E[X; B_4]. \]

By the first part, this equals E[Y; B_2] + E[Y; B_4], and we undo the above string of equalities but with Y instead of X to see that this is E[Y; C].

If a r.v. Y is G measurable, then for any a we have (Y = a) ∈ G which means that (Y = a) is the union of one or more of the B_i. Since the B_i are disjoint, it follows that Y must be constant on each B_i.

Again let us look at an example. Suppose Z takes only the values 1, 3, 4, 7. Let D_1 = (Z = 1), D_2 = (Z = 3), D_3 = (Z = 4), D_4 = (Z = 7). Note that we can write

\[ Z = 1 \cdot 1_{D_1} + 3 \cdot 1_{D_2} + 4 \cdot 1_{D_3} + 7 \cdot 1_{D_4}. \]

To see this, if ω ∈ D_2, for example, the right hand side will be 0 + 3 · 1 + 0 + 0, which agrees with Z(ω). Now if Z is G measurable, then (Z ≥ a) ∈ G for each a. Take a = 7, and we see D_4 ∈ G. Take a = 4 and we see D_3 ∪ D_4 ∈ G. Taking a = 3 shows D_2 ∪ D_3 ∪ D_4 ∈ G.

Now D_3 = (D_3 ∪ D_4) ∩ D_4^c, so since G is a σ-field, D_3 ∈ G. Similarly D_2, D_1 ∈ G. Because sets in G are unions of the B_i's, we must have Z constant on the B_i's. For example, if it so happened that D_1 = B_1, D_2 = B_2 ∪ B_4, D_3 = B_3 ∪ B_6 ∪ B_7, and D_4 = B_5, then

\[ Z = 1 \cdot 1_{B_1} + 3 \cdot 1_{B_2} + 4 \cdot 1_{B_3} + 3 \cdot 1_{B_4} + 7 \cdot 1_{B_5} + 4 \cdot 1_{B_6} + 4 \cdot 1_{B_7}. \]

We still restrict ourselves to the discrete case. In this context, the properties given in Propositions 3.2 and 3.3 uniquely determine E[X | G].

Proposition 3.4. Suppose Z is G measurable and E[Z; C] = E[X; C] whenever C ∈ G. Then Z = E[X | G].

Proof. Since Z is G measurable, then Z must be constant on each B_i. Let the value of Z on B_i be z_i. So Z = \sum_i z_i 1_{B_i}. Then

\[ z_i P(B_i) = E[Z; B_i] = E[X; B_i], \]

or z_i = E[X; B_i]/P(B_i) as required.

The following propositions contain the main facts about this new definition of conditional expectation that we will need.

Proposition 3.5.
(1) If X_1 ≥ X_2, then E[X_1 | G] ≥ E[X_2 | G].
(2) E[aX_1 + bX_2 | G] = aE[X_1 | G] + bE[X_2 | G].
(3) If X is G measurable, then E[X | G] = X.
(4) E[E[X | G]] = EX.
(5) If X is independent of G, then E[X | G] = EX.

We will prove Proposition 3.5 in Note 1 at the end of the section. At this point it is more fruitful to understand what the proposition says.

We will see in Proposition 3.8 below that we may think of E[X | G] as the best prediction of X given G. Accepting this for the moment, we can give an interpretation of (1)-(5). (1) says that if X_1 is larger than X_2, then the predicted value of X_1 should be larger than the predicted value of X_2. (2) says that the predicted value of X_1 + X_2 should be the sum of the predicted values. (3) says that if we know G and X is G measurable, then we know X and our best prediction of X is X itself. (4) says that the average of the predicted value of X should be the average value of X. (5) says that if knowing G gives us no additional information on X, then the best prediction for the value of X is just EX.

Proposition 3.6. If Z is G measurable, then E[XZ | G] = ZE[X | G].

We again defer the proof, this time to Note 2.

Proposition 3.6 says that as far as conditional expectations with respect to a σ-field G go, G-measurable random variables act like constants: they can be taken inside or outside the conditional expectation at will.

Proposition 3.7. If H ⊂ G ⊂ F, then

\[ E[E[X \mid H] \mid G] = E[X \mid H] = E[E[X \mid G] \mid H]. \]

Proof. E[X | H] is H measurable, hence G measurable, since H ⊂ G. The left hand equality now follows by Proposition 3.5(3). To get the right hand equality, let W be the right hand expression. It is H measurable, and if C ∈ H ⊂ G, then

\[ E[W; C] = E[E[X \mid G]; C] = E[X; C] \]

as required.

In words, if we are predicting a prediction of X given limited information, this is the same as a single prediction given the least amount of information.

Let us verify that conditional expectation may be viewed as the best predictor of a random variable given a σ-field. If X is a r.v., a predictor Z is just another random variable, and the goodness of the prediction will be measured by E[(X − Z)^2], which is known as the mean square error.

Proposition 3.8. If X is a r.v., the best predictor among the collection of G-measurable random variables is Y = E[X | G].

Proof. Let Z be any G-measurable random variable. We compute, using Proposition 3.5(3) and Proposition 3.6,

\begin{align*}
E[(X - Z)^2 \mid G] &= E[X^2 \mid G] - 2E[XZ \mid G] + E[Z^2 \mid G] \\
&= E[X^2 \mid G] - 2Z E[X \mid G] + Z^2 \\
&= E[X^2 \mid G] - 2ZY + Z^2 \\
&= E[X^2 \mid G] - Y^2 + (Y - Z)^2 \\
&= E[X^2 \mid G] - 2Y E[X \mid G] + Y^2 + (Y - Z)^2 \\
&= E[X^2 \mid G] - 2E[XY \mid G] + E[Y^2 \mid G] + (Y - Z)^2 \\
&= E[(X - Y)^2 \mid G] + (Y - Z)^2.
\end{align*}

We also used the fact that Y is G measurable. Taking expectations and using Proposition 3.5(4),

\[ E[(X - Z)^2] = E[(X - Y)^2] + E[(Y - Z)^2]. \]

The right hand side is bigger than or equal to E[(X − Y)^2] because (Y − Z)^2 ≥ 0. So the error in predicting X by Z is larger than the error in predicting X by Y, and will be equal if and only if Z = Y. So Y is the best predictor.
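Both the tower property of Proposition 3.7 and the mean square error identity from the proof of Proposition 3.8 can be checked on the three-toss space. The sketch below is an illustration under assumptions not made in the notes: X is taken to be the number of heads, G is F_1, and the competing predictor Z is an arbitrary F_1-measurable choice. The helper names `blocks`, `cond_exp`, and `expect` are hypothetical.

```python
from fractions import Fraction
from itertools import product

omega = ["".join(t) for t in product("HT", repeat=3)]
prob = {w: Fraction(1, 8) for w in omega}

def blocks(k):
    """Partition of omega generating F_k (group by the first k tosses)."""
    groups = {}
    for w in omega:
        groups.setdefault(w[:k], []).append(w)
    return list(groups.values())

def cond_exp(X, k):
    """E[X | F_k]: on each block B_i the value is E[X; B_i] / P(B_i)."""
    out = {}
    for block in blocks(k):
        p_block = sum(prob[w] for w in block)
        value = sum(X[w] * prob[w] for w in block) / p_block
        for w in block:
            out[w] = value
    return out

def expect(X):
    """Ordinary expectation of a random variable given as a dict on omega."""
    return sum(X[w] * prob[w] for w in omega)

X = {w: Fraction(w.count("H")) for w in omega}   # number of heads (assumed example)
Y = cond_exp(X, 1)                               # best F_1-measurable predictor
tower_left = cond_exp(cond_exp(X, 2), 1)         # E[E[X | F_2] | F_1]

# An arbitrary competing F_1-measurable predictor (depends on the first toss only).
Z = {w: Fraction(3 if w[0] == "H" else 0) for w in omega}
mse_Z = expect({w: (X[w] - Z[w]) ** 2 for w in omega})
mse_Y = expect({w: (X[w] - Y[w]) ** 2 for w in omega})
gap = expect({w: (Y[w] - Z[w]) ** 2 for w in omega})
```

Here `mse_Z` equals `mse_Y + gap`, so any F_1-measurable predictor other than Y pays an extra penalty of E[(Y − Z)^2].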

There is one more interpretation of conditional expectation that may be useful. The collection of all random variables is a linear space, and the collection of all G-measurable random variables is clearly a subspace. Given X, the conditional expectation Y = E[X | G] is equal to the projection of X onto the subspace of G-measurable random variables. To see this, we write X = Y + (X − Y), and what we have to check is that the inner product of Y and X − Y is 0, that is, Y and X − Y are orthogonal. In this context, the inner product of X_1 and X_2 is defined to be E[X_1 X_2], so we must show E[Y(X − Y)] = 0. Note

\[ E[Y(X - Y) \mid G] = Y E[X - Y \mid G] = Y(E[X \mid G] - Y) = Y(Y - Y) = 0. \]

Taking expectations,

\[ E[Y(X - Y)] = E[E[Y(X - Y) \mid G]] = 0, \]

just as we wished.

If Y is a discrete random variable, that is, it takes only countably many values y_1, y_2, . . ., we let B_i = (Y = y_i). These will be disjoint sets whose union is Ω. If σ(Y) is the collection of all unions of the B_i, then σ(Y) is a σ-field, and is called the σ-field generated by Y. It is easy to see that this is the smallest σ-field with respect to which Y is measurable. We write E[X | Y] for E[X | σ(Y)].

Note 1. We prove Proposition 3.5. (1) and (2) are immediate from the definition. To prove (3), note that if Z = X, then Z is G measurable and E[X; C] = E[Z; C] for any C ∈ G; this is trivial. By Proposition 3.4 it follows that Z = E[X | G]; this proves (3). To prove (4), if we let C = Ω and Y = E[X | G], then EY = E[Y; C] = E[X; C] = EX.

Last is (5). Let Z = EX. Z is constant, so clearly G measurable. By the independence, if C ∈ G, then E[X; C] = E[X 1_C] = (EX)(E 1_C) = (EX)(P(C)). But E[Z; C] = (EX)(P(C)) since Z is constant. By Proposition 3.4 we see Z = E[X | G].

Note 2. We prove Proposition 3.6. Note that ZE[X | G] is G measurable, so by Proposition 3.4 we need to show its expectation over sets C in G is the same as that of XZ. As in the proof of Proposition 3.3, it suffices to consider only the case when C is one of the B_i. Now Z is G measurable, hence it is constant on B_i; let its value be z_i. Then

\[ E[Z E[X \mid G]; B_i] = E[z_i E[X \mid G]; B_i] = z_i E[E[X \mid G]; B_i] = z_i E[X; B_i] = E[XZ; B_i] \]

as desired.

4. Martingales.

Suppose we have a sequence of σ-fields F_1 ⊂ F_2 ⊂ F_3 · · ·. An example would be repeatedly tossing a coin and letting F_k be the sets that can be determined by the first k tosses. Another example is to let F_k be the events that are determined by the values of a stock at times 1 through k. A third example is to let X_1, X_2, . . . be a sequence of random variables and let F_k be the σ-field generated by X_1, . . . , X_k, the smallest σ-field with respect to which X_1, . . . , X_k are measurable.

Definition 4.1. A r.v. X is integrable if E|X| < ∞. Given an increasing sequence of σ-fields F_n, a sequence of r.v.'s X_n is adapted if X_n is F_n measurable for each n.

Definition 4.2. A martingale M_n is a sequence of random variables such that
(1) M_n is integrable for all n,
(2) M_n is adapted to F_n, and
(3) for all n

\[ E[M_{n+1} \mid F_n] = M_n. \tag{4.1} \]

Usually (1) and (2) are easy to check, and it is (3) that is the crucial property. If we have (1) and (2), but instead of (3) we have
(3′) for all n, E[M_{n+1} | F_n] ≥ M_n,
then we say M_n is a submartingale. If we have (1) and (2), but instead of (3) we have
(3′′) for all n, E[M_{n+1} | F_n] ≤ M_n,
then we say M_n is a supermartingale.

Submartingales tend to increase and supermartingales tend to decrease. The nomenclature may seem like it goes the wrong way; Doob defined these terms by analogy with the notions of subharmonic and superharmonic functions in analysis. (Actually, it is more than an analogy: we won't explore this, but it turns out that the composition of a subharmonic function with Brownian motion yields a submartingale, and similarly for superharmonic functions.)

Note that the definition of martingale depends on the collection of σ-fields. When it is needed for clarity, one can say that (M_n, F_n) is a martingale. To define conditional expectation, one needs a probability, so a martingale depends on the probability as well. When we need to, we will say that M_n is a martingale with respect to the probability P. This is an issue when there is more than one probability around.

We will see that martingales are ubiquitous in financial math. For example, security prices and one's wealth will turn out to be examples of martingales.

The word "martingale" is also used for the piece of a horse's bridle that runs from the horse's head to its chest. It keeps the horse from raising its head too high. It turns out that martingales in probability cannot get too large. The word also refers to a gambling system. I did some searching on the Internet, and there seems to be no consensus on the derivation of the term.

Here is an example of a martingale. Let X_1, X_2, . . . be a sequence of independent r.v.'s with mean 0. (Saying a r.v. X_i has mean 0 is the same as saying EX_i = 0; this presupposes that E|X_1| is finite.) Set F_n = σ(X_1, . . . , X_n), the σ-field generated by X_1, . . . , X_n. Let M_n = \sum_{i=1}^n X_i. Definition 4.2(2) is easy to see. Since E|M_n| ≤ \sum_{i=1}^n E|X_i|, Definition 4.2(1) also holds. We now check

\[ E[M_{n+1} \mid F_n] = X_1 + \cdots + X_n + E[X_{n+1} \mid F_n] = M_n + E X_{n+1} = M_n, \]

where we used the independence.

Another example: suppose in the above that the X_k all have variance 1, and let M_n = S_n^2 − n, where S_n = \sum_{i=1}^n X_i. Again (1) and (2) of Definition 4.2 are easy to check. We compute

\[ E[M_{n+1} \mid F_n] = E[S_n^2 + 2X_{n+1} S_n + X_{n+1}^2 \mid F_n] - (n + 1). \]

We have E[S_n^2 | F_n] = S_n^2 since S_n is F_n measurable.

\[ E[2X_{n+1} S_n \mid F_n] = 2S_n E[X_{n+1} \mid F_n] = 2S_n E X_{n+1} = 0. \]

And E[X_{n+1}^2 | F_n] = E X_{n+1}^2 = 1. Substituting, we obtain E[M_{n+1} | F_n] = M_n, or M_n is a martingale.

A third example: Suppose you start with a dollar and you are tossing a fair coin independently. If it turns up heads you double your fortune, tails you go broke. This is "double or nothing." Let M_n be your fortune at time n. To formalize this, let X_1, X_2, . . . be independent r.v.'s that are equal to 2 with probability 1/2 and 0 with probability 1/2. Then M_n = X_1 · · · X_n. Let F_n be the σ-field generated by X_1, . . . , X_n. Note 0 ≤ M_n ≤ 2^n, and so Definition 4.2(1) is satisfied, while (2) is easy. To compute the conditional expectation, note E X_{n+1} = 1. Then

\[ E[M_{n+1} \mid F_n] = M_n E[X_{n+1} \mid F_n] = M_n E X_{n+1} = M_n, \]

using the independence.

Before we give our fourth example, let us observe that

\[ |E[X \mid F]| \le E[|X| \mid F]. \tag{4.2} \]

To see this, we have −|X| ≤ X ≤ |X|, so −E[|X| | F] ≤ E[X | F] ≤ E[|X| | F]. Since E[|X| | F] is nonnegative, (4.2) follows.

Our fourth example will be used many times, so we state it as a proposition.

Proposition 4.3. Let F_1, F_2, . . . be given and let X be a fixed r.v. with E|X| < ∞. Let M_n = E[X | F_n]. Then M_n is a martingale.

Proof. Definition 4.2(2) is clear, while

\[ E|M_n| \le E[E[|X| \mid F_n]] = E|X| < \infty \]

by (4.2); this shows Definition 4.2(1). We have

\[ E[M_{n+1} \mid F_n] = E[E[X \mid F_{n+1}] \mid F_n] = E[X \mid F_n] = M_n. \]
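The first three martingale examples of this section can be checked by exact enumeration. The sketch below assumes a concrete choice the notes leave open, namely steps X_i = ±1 with probability 1/2 each (mean 0, variance 1), and it starts the index at n = 0 with the empty path; the function `is_martingale` and the other names are illustrative only.

```python
from fractions import Fraction
from itertools import product

def is_martingale(M, steps, step_probs, horizon):
    """Check E[M_{n+1} | F_n] = M_n for n < horizon by exact enumeration.

    M maps a tuple (x_1, ..., x_n) to the value of M_n on that path; the x_i
    are independent with the distribution given by steps and step_probs.
    """
    for n in range(horizon):
        for path in product(steps, repeat=n):
            # Conditional mean of M_{n+1} given the first n steps.
            cond_mean = sum(p * M(path + (x,)) for x, p in zip(steps, step_probs))
            if cond_mean != M(path):
                return False
    return True

half = Fraction(1, 2)
coin = ((1, -1), (half, half))          # assumed +-1 steps: mean 0, variance 1

def partial_sum(path):                  # M_n = X_1 + ... + X_n
    return Fraction(sum(path))

def square_minus_n(path):               # M_n = S_n^2 - n
    return Fraction(sum(path)) ** 2 - len(path)

def double_or_nothing(path):            # M_n = X_1 * ... * X_n, starting from 1 dollar
    fortune = Fraction(1)
    for x in path:
        fortune *= x
    return fortune

def square_only(path):                  # S_n^2 alone is a submartingale, not a martingale
    return Fraction(sum(path)) ** 2
```

For instance, `is_martingale(square_minus_n, *coin, 5)` succeeds while `is_martingale(square_only, *coin, 5)` fails, which shows why the compensating term −n is needed.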

5. Properties of martingales.

When it comes to discussing American options, we will need the concept of stopping times. A mapping τ from Ω into the nonnegative integers is a stopping time if (τ = k) ∈ F_k for each k. One sometimes allows τ to also take on the value ∞.

An example is τ = min{k : S_k ≥ A}. This is a stopping time because (τ = k) = (S_0, S_1, . . . , S_{k−1} < A, S_k ≥ A) ∈ F_k. We can think of a stopping time as the first time something happens. σ = max{k : S_k ≥ A}, the last time, is not a stopping time. (We will use the convention that the minimum of an empty set is +∞; so, for example, with the above definition of τ, on the event that S_k is never in A, we have τ = ∞.)

Here is an intuitive description of a stopping time. If I tell you to drive to the city limits and then drive until you come to the second stop light after that, you know when you get there that you have arrived; you don't need to have been there before or to look ahead. But if I tell you to drive until you come to the second stop light before the city limits, either you must have been there before or else you have to go past where you are supposed to stop, continue on to the city limits, and then turn around and come back two stop lights. You don't know when you first get to the second stop light before the city limits that you get to stop there. The first set of instructions forms a stopping time, the second set does not.

Note (τ ≤ k) = ∪_{j=0}^k (τ = j). Since (τ = j) ∈ F_j ⊂ F_k, then the event (τ ≤ k) ∈ F_k for all k. Conversely, if τ is a r.v. with (τ ≤ k) ∈ F_k for all k, then

\[ (\tau = k) = (\tau \le k) - (\tau \le k - 1). \]

Since (τ ≤ k) ∈ F_k and (τ ≤ k − 1) ∈ F_{k−1} ⊂ F_k, then (τ = k) ∈ F_k, and such a τ must be a stopping time.

Our first result is Jensen's inequality.

Proposition 5.1. If g is convex, then

\[ g(E[X \mid G]) \le E[g(X) \mid G] \]

provided all the expectations exist.

For ordinary expectations rather than conditional expectations, this is still true. That is, if g is convex and the expectations exist, then

\[ g(EX) \le E[g(X)]. \]

We already know some special cases of this: when g(x) = |x|, this says |EX| ≤ E|X|; when g(x) = x^2, this says (EX)^2 ≤ EX^2, which we know because EX^2 − (EX)^2 = E(X − EX)^2 ≥ 0.
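The difference between the first time τ and the last time σ can be seen by checking whether the event (time = k) is decided by the first k steps alone. This is a rough sketch under assumptions not in the notes: S_k is a ±1 walk started at S_0 = 0 over a finite horizon, `None` stands in for +∞, and all function names are hypothetical.

```python
from itertools import accumulate, product

def first_hit(path, level):
    """tau = min{k : S_k >= level}; None stands in for +infinity."""
    for k, s in enumerate(accumulate(path, initial=0)):
        if s >= level:
            return k
    return None

def last_hit(path, level):
    """sigma = max{k : S_k >= level} over the finite horizon, or None."""
    hits = [k for k, s in enumerate(accumulate(path, initial=0)) if s >= level]
    return hits[-1] if hits else None

def determined_by_prefix(time_rule, horizon, level):
    """True if, for every k, the event (time = k) depends only on the first k steps."""
    paths = list(product((1, -1), repeat=horizon))
    for k in range(horizon + 1):
        seen = {}
        for path in paths:
            flag = time_rule(path, level) == k
            # Two paths sharing the first k steps must agree on whether time = k.
            if seen.setdefault(path[:k], flag) != flag:
                return False
    return True

tau_ok = determined_by_prefix(first_hit, 4, 1)
sigma_ok = determined_by_prefix(last_hit, 4, 1)
```

With these choices `tau_ok` is true and `sigma_ok` is false: deciding whether the walk is at the level for the last time requires looking ahead.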

For Proposition 5.1 as well as many of the following propositions, the statement of the result is more important than the proof, and we relegate the proof to Note 1 below.

One reason we want Jensen's inequality is to show that a convex function applied to a martingale yields a submartingale.

Proposition 5.2. If M_n is a martingale and g is convex, then g(M_n) is a submartingale, provided all the expectations exist.

Proof. By Jensen's inequality,

\[ E[g(M_{n+1}) \mid F_n] \ge g(E[M_{n+1} \mid F_n]) = g(M_n). \]

If M_n is a martingale, then EM_n = E[E[M_{n+1} | F_n]] = EM_{n+1}. So EM_0 = EM_1 = · · · = EM_n. Doob's optional stopping theorem says the same thing holds when fixed times n are replaced by stopping times.

Theorem 5.3. Suppose K is a positive integer, N is a stopping time such that N ≤ K a.s., and M_n is a martingale. Then

\[ E M_N = E M_K. \]

Here, to evaluate M_N, one first finds N(ω) and then evaluates M_·(ω) for that value of N.

Proof. We have

\[ E M_N = \sum_{k=0}^{K} E[M_N; N = k]. \]

If we show that the k-th summand is E[M_K; N = k], then the sum will be

\[ \sum_{k=0}^{K} E[M_K; N = k] = E M_K \]

as desired. We have

\[ E[M_N; N = k] = E[M_k; N = k] \]

by the definition of M_N. Now (N = k) is in F_k, so by Proposition 2.2 and the fact that M_k = E[M_{k+1} | F_k],

\[ E[M_k; N = k] = E[M_{k+1}; N = k]. \]

We have (N = k) ∈ F_k ⊂ F_{k+1}. Since M_{k+1} = E[M_{k+2} | F_{k+1}], Proposition 2.2 tells us that

\[ E[M_{k+1}; N = k] = E[M_{k+2}; N = k]. \]

We continue, using (N = k) ∈ F_k ⊂ F_{k+1} ⊂ F_{k+2} ⊂ · · ·, and we obtain

\[ E[M_N; N = k] = E[M_k; N = k] = E[M_{k+1}; N = k] = \cdots = E[M_K; N = k]. \]

If we change the equalities in the above to inequalities, the same result holds for submartingales.

As a corollary we have two of Doob's inequalities:

Theorem 5.4. If M_n is a nonnegative submartingale,
(a) P(\max_{k \le n} M_k \ge \lambda) \le \frac{1}{\lambda} E M_n.
(b) E(\max_{k \le n} M_k^2) \le 4 E M_n^2.

For the proof, see Note 2 below.

Note 1. We prove Proposition 5.1. If g is convex, then the graph of g lies above all the tangent lines. Even if g does not have a derivative at x_0, there is a line passing through x_0 which lies beneath the graph of g. So for each x_0 there exists c(x_0) such that

\[ g(x) \ge g(x_0) + c(x_0)(x - x_0). \]

Apply this with x = X(ω) and x_0 = E[X | G](ω). We then have

\[ g(X) \ge g(E[X \mid G]) + c(E[X \mid G])(X - E[X \mid G]). \]

If g is differentiable, we let c(x_0) = g′(x_0). In the case where g is not differentiable, then we choose c to be the left hand upper derivate, for example. (For those who are not familiar with derivates, this is essentially the left hand derivative.) One can check that if c is so chosen, then c(E[X | G]) is G measurable.

Now take the conditional expectation with respect to G. The first term on the right is G measurable, so remains the same. The second term on the right is equal to

\[ c(E[X \mid G]) \, E[X - E[X \mid G] \mid G] = 0. \]

Note 2. We prove Theorem 5.4. Set M_{n+1} = M_n. It is easy to see that the sequence M_1, M_2, . . . , M_{n+1} is also a submartingale. Let N = min{k : M_k ≥ λ} ∧ (n + 1), the first time that M_k is greater than or equal to λ, where a ∧ b = min(a, b). Then

\[ P(\max_{k \le n} M_k \ge \lambda) = P(N \le n) \]

and if N ≤ n, then M_N ≥ λ. Now

\[ P(\max_{k \le n} M_k \ge \lambda) = E[1_{(N \le n)}] \le E\Big[\frac{M_N}{\lambda}; N \le n\Big] = \frac{1}{\lambda} E[M_{N \wedge n}; N \le n] \le \frac{1}{\lambda} E M_{N \wedge n}. \tag{5.1} \]

Finally, since M_n is a submartingale, E M_{N∧n} ≤ E M_n.

We now look at (b). Let us write M^* for \max_{k \le n} M_k. If E M_n^2 = ∞, there is nothing to prove. If it is finite, then since 0 ≤ M_k ≤ E[M_n | F_k] and by Jensen's inequality, we have

\[ E M_k^2 \le E[E[M_n \mid F_k]^2] \le E[E[M_n^2 \mid F_k]] = E M_n^2 < \infty \]

for k ≤ n. Then

\[ E(M^*)^2 = E[\max_{1 \le k \le n} M_k^2] \le E\Big[\sum_{k=1}^{n} M_k^2\Big] < \infty. \]

We have

\[ E[M_{N \wedge n}; N \le n] = \sum_{k=0}^{n} E[M_{k \wedge n}; N = k]. \]

Arguing as in the proof of Theorem 5.3,

\[ E[M_{k \wedge n}; N = k] \le E[M_n; N = k], \]

and so

\[ E[M_{N \wedge n}; N \le n] \le \sum_{k=0}^{n} E[M_n; N = k] = E[M_n; N \le n]. \]

The last expression is at most E[M_n; M^* ≥ λ]. If we multiply (5.1) by 2λ and integrate over λ from 0 to ∞, we obtain

\begin{align*}
\int_0^\infty 2\lambda P(M^* \ge \lambda) \, d\lambda &\le 2 \int_0^\infty E[M_n; M^* \ge \lambda] \, d\lambda \\
&= 2E \int_0^\infty M_n 1_{(M^* \ge \lambda)} \, d\lambda \\
&= 2E\Big[M_n \int_0^{M^*} d\lambda\Big] = 2E[M_n M^*].
\end{align*}

Using Cauchy-Schwarz, this is bounded by

\[ 2 (E M_n^2)^{1/2} (E (M^*)^2)^{1/2}. \]

On the other hand,

\[ \int_0^\infty 2\lambda P(M^* \ge \lambda) \, d\lambda = E \int_0^\infty 2\lambda 1_{(M^* \ge \lambda)} \, d\lambda = E \int_0^{M^*} 2\lambda \, d\lambda = E(M^*)^2. \]

We therefore have

\[ E(M^*)^2 \le 2 (E M_n^2)^{1/2} (E (M^*)^2)^{1/2}. \]

Recall we showed E(M^*)^2 < ∞. We divide both sides by (E(M^*)^2)^{1/2}, square both sides, and obtain (b).

Note 3. We will show that bounded martingales converge. (The hypothesis of boundedness can be weakened; for example, E|M_n| ≤ c < ∞ for some c not depending on n suffices.)

Theorem 5.5. Suppose M_n is a martingale bounded in absolute value by K. That is, |M_n| ≤ K for all n. Then lim_{n→∞} M_n exists a.s.

Proof. Since M_n is bounded, it can't tend to +∞ or −∞. The only possibility is that it might oscillate. Let a < b be two rationals. What might go wrong is that M_n might be larger than b infinitely often and less than a infinitely often. If we show the probability of this is 0, then taking the union over all pairs of rationals (a, b) shows that almost surely M_n cannot oscillate, and hence must converge.

Fix a < b, let N_n = (M_n − a)^+, and let S_1 = min{k : N_k ≤ 0}, T_1 = min{k > S_1 : N_k ≥ b − a}, S_2 = min{k > T_1 : N_k ≤ 0}, and so on. Let U_n = max{k : T_k ≤ n}. U_n is called the number of upcrossings up to time n.

We want to show that max_n U_n < ∞ a.s. Note by Jensen's inequality N_n is a submartingale. Since S_1 < T_1 < S_2 < · · ·, then S_{n+1} > n. We can write

\[ 2K \ge N_n - N_{S_{n+1} \wedge n} = \sum_{k=1}^{n+1} (N_{S_{k+1} \wedge n} - N_{T_k \wedge n}) + \sum_{k=1}^{n+1} (N_{T_k \wedge n} - N_{S_k \wedge n}). \]

Now take expectations. The expectation of the first sum on the right and the last term are greater than or equal to zero by optional stopping. The middle term is larger than (b − a)U_n, so we conclude

\[ (b - a) E U_n \le 2K. \]

Let n → ∞ to see that E max_n U_n < ∞, which implies max_n U_n < ∞ a.s., which is what we needed.

Note 4. We will state Fatou's lemma in the following form. If X_n is a sequence of nonnegative random variables converging to X a.s., then EX ≤ sup_n EX_n. This formulation is equivalent to the classical one and is better suited for our use.
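The optional stopping theorem and the two Doob inequalities can be sanity-checked on a small example. The sketch below assumes a fair ±1 random walk S_k over n = 6 steps (so S_k is a martingale and |S_k| is a nonnegative submartingale), a stopping level of 2, and λ = 3; these choices and all names are illustrative, not from the notes.

```python
from fractions import Fraction
from itertools import accumulate, product

n = 6
# Each path is (S_0, S_1, ..., S_n) for a fair +-1 walk; all paths are equally likely.
paths = [tuple(accumulate(steps, initial=0)) for steps in product((1, -1), repeat=n)]
weight = Fraction(1, 2 ** n)

def expect(f):
    """Exact expectation of a function of the whole path."""
    return sum(weight * f(s) for s in paths)

def stopped_value(s, level=2):
    """S_N where N = min{k : S_k >= level} ∧ n, a bounded stopping time."""
    for value in s:
        if value >= level:
            return value
    return s[-1]

# Optional stopping: E S_N should equal E S_n (both are 0 here).
e_stopped = expect(stopped_value)
e_terminal = expect(lambda s: s[-1])

# Doob's inequalities for the nonnegative submartingale M_k = |S_k|.
lam = 3
p_max = expect(lambda s: 1 if max(abs(v) for v in s) >= lam else 0)
bound_a = expect(lambda s: abs(s[-1])) / lam
lhs_b = expect(lambda s: max(abs(v) for v in s) ** 2)
rhs_b = 4 * expect(lambda s: s[-1] ** 2)
```

A check of this kind only illustrates the statements for one walk and one horizon; it is no substitute for the proofs in Notes 1 and 2.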

” After one time unit the stock price will be either uS0 with probability P or else dS0 with probability Q. you simply withdraw money from that account. This is equivalent to selling shares of a stock you don’t have and is called selling short. sell an option. A European call option in this context is the option to buy one share of the stock at time 1 at price K. If S1 is less than K. Q < 1. 0). If you sell one share of stock short. so we are going to allow only one time step: one makes an investment. One thing you could do is buy one option. where x+ is max(x. One way to make it seem more realistic is to assume you have a large amount of money on deposit. the key point is that these are considered to be risk-free. there will be some money left over and you put that in the bank.6. Let us begin by giving the simplest possible model of a stock and see how a European call option should be valued in this context. We will assume 0 < P. So the value of the option at time 1 is V1 = (S1 − K)+ . The principal question to be answered is: what is the value V0 of the option at time 0? In other words. In either case. then at time 1 you must buy one share at whatever the market price is at that time and turn it over to the person that you sold the stock short to. If S1 is greater than K. you can also put your money in the bank where one will earn interest at rate r. K is called the strike price. We are looking at the simplest possible model. which is the same as borrowing. that is. immediately turn around and sell the stock for price S1 and make a profit of S1 − K. The other thing you could do is use the money to buy ∆0 shares of stock. at this point you have V0 − ∆0 S0 in 24 . Here “d” is a mnemonic for “down” and “u” for “up. and looks at it again one day later. The one step binomial asset pricing model. Instead of purchasing shares in the stock. where P + Q = 1. you can use the option at time 1 to buy the stock at price K. Let d and u be two numbers with 0 < d < 1 < u. 
Similarly you can buy a negative number of options. then the option is worthless at time 1. If V0 > ∆0 S0 . Let S1 be the price of the stock at time 1. Let’s suppose the price of a European call option is V0 and see what conditions one can put on V0 . Alternatives to the bank are money market funds or bonds. how much should one pay for a European call option with strike price K? It is possible to buy a negative number of shares of a stock. not exactly a totally realistic assumption. We assume that you can borrow at the same interest rate r. Suppose you start out with V0 dollars. and you make up the shortfall by borrowing money from the bank. you do not have enough money to buy the stock. Suppose we have a single stock whose price is S0 . You can also deposit a negative amount of money in the bank. and when you borrow. If V0 < ∆0 S0 .

2) (6. We need to show ∆0 uS0 + (1 + r)(V0 − ∆0 S0 ) = V1u + (1 + r)(V0 − W0 ). of V1u . Let ∆0 = and we will also need W0 = 1 + r − d u u − (1 + r) d 1 V1 + V1 . Let us do that now. u−d u−d (6. and if it goes down. and of V1d agree in (6. ∆0 dS0 + (1 + r)(V0 − ∆0 S0 ). Note these are deterministic quantities. We have not said what ∆0 should be.e. If the stock goes up. Let V1u = (uS0 − K)+ and V1d = (dS0 − K)+ .1) is equal to V1u − 1 + r − d u u − (1 + r) d V1 + V1 + (1 + r)V0 .. 1+r u−d u−d V1u − V1d .3) (6.1) is equal to V1u − V1d (u − (1 + r)) + (1 + r)V0 .2) and (6. Suppose that V0 > W0 . The left hand side of (6. at time 1 you will have ∆0 uS0 + (1 + r)(V0 − ∆0 S0 ).1) Now check that the coefficients of V0 . the second being similar. while if the stock went down. sell one option for V0 dollars. you would now have V1d + (1 + r)(V0 − W0 ). uS0 − dS0 In a moment we will do some algebra and see that if the stock goes up and you had bought stock instead of the option you would now have V1u + (1 + r)(V0 − W0 ). What you want to do is come along with no money. ∆0 S0 (u − (1 + r)) + (1 + r)V0 = u−d The right hand side of (6.the bank and ∆0 shares of stock. not random. use the money to buy ∆0 shares. and put the rest in the bank 25 . Let’s check the first of these. i.3).

Thus p and q act like probabilities.2. u−d then p. we want to make a few observations. q ≥ 0 and p + q = 1. if 1 + r > u. and customers would go to him and ignore you because they would be getting a better deal. you give him one share of stock and sell the rest. If we set p= 1+r−d . and while doing so. There is nothing special about European call options in our argument above. Remark 6. If the buyer of your option wants to exercise the option. say $4. One could let V1u and Vd1 be any two values of any option. Then someone else would observe this and decide to sell the same option at a price less than V0 but larger than W0 . whether the stock went up or down. you just do the opposite: sell ∆0 shares of stock short. which are paid out if the 26 . First of all. Remember it is possible to have a negative number of shares. So we may suppose 1 + r < u.” or “arbitrage opportunities do not exist. buy one option. It does depend on p and q. u−d q= u − (1 + r) . This would continue as long as any one would try to sell an option above price W0 . one would never buy stock. you sell your shares of stock and pocket the money. it will become apparent where the formulas for ∆0 and W0 came from. Remark 6. Suppose you could sell options at a price V0 = $5. We always have 1 + r ≥ 1 > d. but they have nothing to do with P and Q.(or borrow if necessary). It also represents the belief that the market is freely competitive. You will have cleared (1 + r)(V0 − W0 ). say at $3.1. you clear (1 + r)(W0 − V0 ).” The only way to avoid this is if V0 = W0 . The way it works is this: suppose W0 = $3. But then a third person would decide to sell the option for less than your competition but more than W0 . whether the stock goes up or down. This time. since one can always do better by putting money in the bank. this is larger than W0 and you would earn V0 − W0 = $2 per option without risk. Note also that the price V0 = W0 does not depend on P or Q. This person would still make a profit. 
Now most people believe that you can’t make a profit on the stock market without taking a risk. we have shown that the only reasonable price for the European call option is W0 . The “no arbitrage” condition is not just a reflection of the belief that one cannot get something for nothing. In other words. and deposit or make up the shortfall from the bank. At this point. with no risk.50. If V0 < W0 . If he doesn’t want to exercise the option. The name for this is “no free lunch. which seems to suggest that there is an underlying probability which controls the option price and is not the one that governs the stock price. We will examine this problem of pricing options in more complicated contexts.

stock goes up or down.

Remark 6.2. The above analysis shows we can exactly duplicate the result of buying any option V by instead buying some shares of stock. If in some model one can do this for any option, the market is called complete in this model.

Remark 6.3. If we let P be the probability so that S_1 = uS_0 with probability p and S_1 = dS_0 with probability q and we let E be the corresponding expectation, then some algebra shows that

    V_0 = (1/(1 + r)) E V_1.

This will be generalized later.

Remark 6.4. If one buys one share of stock at time 0, then one expects at time 1 to have (Pu + Qd)S_0. One then divides by 1 + r to get the value of the stock in today's dollars. (r, the risk-free interest rate, can also be considered the rate of inflation. A dollar tomorrow is equivalent to 1/(1 + r) dollars today.) Suppose instead of P and Q being the probabilities of going up and down, they were in fact p and q. One would then expect to have (pu + qd)S_0 and then divide by 1 + r. Substituting the values for p and q, this reduces to S_0. In other words, if p and q were the correct probabilities, one would expect to have the same amount of money one started with. When we get to the binomial asset pricing model with more than one step, we will see that the generalization of this fact is that the stock price at time n is a martingale, still with the assumption that p and q are the correct probabilities. This is a special case of the fundamental theorem of finance: there always exists some probability, not necessarily the one you observe, under which the stock price is a martingale.

Remark 6.5. Our model allows after one time step the possibility of the stock going up or going down, but only these two options. What if instead there are 3 (or more) possibilities? Suppose for example, that the stock goes up a factor u with probability P, down a factor d with probability Q, and remains constant with probability R, where P + Q + R = 1. The corresponding price of a European call option would be (uS_0 − K)^+, (dS_0 − K)^+, or (S_0 − K)^+. If one could replicate this outcome by buying and selling shares of the stock, then the "no arbitrage" rule would give the exact value of the call option in this model. But, except in very special circumstances, one cannot do this, and the theory falls apart. One has three equations one wants to satisfy, in terms of V_1^u, V_1^d, and V_1^c. (The "c" is a mnemonic for "constant.") There are however only two variables, ∆_0 and V_0, at your disposal, and most of the time three equations in two unknowns cannot be solved.

Remark 6.6. In our model we ruled out the cases that P or Q were zero. If Q = 0, that is, we are certain that the stock will go up, then we would always invest in the stock if u > 1 + r, as we would always do better, and we would always put the money in the bank if u ≤ 1 + r. Similar considerations apply when P = 0. It is interesting to note that the cases where P = 0 or Q = 0 are the only ones in which our derivation is not valid. It turns out that in more general models the true probabilities enter only in determining which events have probability 0 or 1 and in no other way.
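The one-step remarks can be checked numerically. The following is a minimal sketch, assuming illustrative values (S_0 = 10, u = 2, d = 1/2, r = 0.1, K = 15) and a hypothetical helper name `one_step_replication`; it uses the one-step versions of the formulas for p, q, ∆_0 and V_0 and confirms that the replicating portfolio matches the call payoff in both states and that (pu + qd)S_0/(1 + r) reduces to S_0.

```python
def one_step_replication(S0, u, d, r, V1_up, V1_down):
    """Return (V0, delta0, p, q) for the one-step binomial model.

    Hypothetical helper for illustration; the formulas are the one-step
    versions of p, q, Delta_0 and V_0 used in these notes.
    """
    p = ((1 + r) - d) / (u - d)
    q = (u - (1 + r)) / (u - d)
    delta0 = (V1_up - V1_down) / ((u - d) * S0)   # shares held
    V0 = (p * V1_up + q * V1_down) / (1 + r)      # discounted expectation
    return V0, delta0, p, q


# Illustrative (assumed) parameters.
S0, u, d, r, K = 10.0, 2.0, 0.5, 0.1, 15.0
V1_up = max(u * S0 - K, 0.0)     # call payoff if the stock goes up
V1_down = max(d * S0 - K, 0.0)   # call payoff if the stock goes down

V0, delta0, p, q = one_step_replication(S0, u, d, r, V1_up, V1_down)

# Wealth at time 1: shares are worth delta0 * S1, cash grows by (1 + r).
cash = V0 - delta0 * S0
wealth_up = delta0 * u * S0 + (1 + r) * cash
wealth_down = delta0 * d * S0 + (1 + r) * cash

# Under p and q the discounted expected stock price equals S0.
discounted_expected_stock = (p * u + q * d) * S0 / (1 + r)
```

Starting from V_0 dollars and holding ∆_0 shares, the wealth at time 1 equals the option payoff whichever way the stock moves, which is the sense in which the one-step market is complete.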

7. The multi-step binomial asset pricing model.

In this section we will obtain a formula for the pricing of options when there are n time steps, but each time the stock can only go up by a factor u or down by a factor d. The "Black-Scholes" formula we will obtain is already a nontrivial result that is useful.

We assume the following.
(1) Unlimited short selling of stock
(2) Unlimited borrowing
(3) No transaction costs
(4) Our buying and selling is on a small enough scale that it does not affect the market.

We need to set up the probability model. Ω will be all sequences of length n of H's and T's. S_0 will be a fixed number and we define S_k(ω) = u^j d^{k−j} S_0 if the first k elements of a given ω ∈ Ω have j occurrences of H and k − j occurrences of T. (What we are doing is saying that if the j-th element of the sequence making up ω is an H, then the stock price goes up by a factor u; if T, then down by a factor d.) F_k will be the σ-field generated by S_0, . . . , S_k.

Let

    p = ((1 + r) − d)/(u − d),    q = (u − (1 + r))/(u − d)

and define P(ω) = p^j q^{n−j} if ω has j appearances of H and n − j appearances of T. We observe that under P the random variables S_{k+1}/S_k are independent and equal to u with probability p and d with probability q. To see this, let Y_k = S_k/S_{k−1}. Thus Y_k is the factor the stock price goes up or down at time k. Then P(Y_1 = y_1, . . . , Y_n = y_n) = p^j q^{n−j}, where j is the number of the y_k that are equal to u. On the other hand, this is equal to P(Y_1 = y_1) · · · P(Y_n = y_n). Let E denote the expectation corresponding to P.

The P we construct may not be the true probabilities of going up or down. That doesn't matter - it will turn out that using the principle of "no arbitrage," it is P that governs the price.

Our first result is the fundamental theorem of finance in the current context.

Proposition 7.1. Under P the discounted stock price (1 + r)^{−k} S_k is a martingale.

Proof. Since the random variable S_{k+1}/S_k is independent of F_k, we have

    E[(1 + r)^{−(k+1)} S_{k+1} | F_k] = (1 + r)^{−k} S_k (1 + r)^{−1} E[S_{k+1}/S_k | F_k].

Using the independence the conditional expectation on the right is equal to

    E[S_{k+1}/S_k] = pu + qd = 1 + r.

Substituting yields the proposition.

Let ∆_k be the number of shares held between times k and k + 1. We require ∆_k to be F_k measurable. ∆_0, ∆_1, . . . is called the portfolio process. Let W_0 be the amount of money you start with and let W_k be the amount of money you have at time k. W_k is the wealth process. If we have ∆_k shares between times k and k + 1, then at time k + 1 those shares will be worth ∆_k S_{k+1}. The amount of cash we hold between time k and k + 1 is W_k minus the amount held in stock, that is, W_k − ∆_k S_k. At time k + 1 this is worth (1 + r)[W_k − ∆_k S_k]. Therefore

    W_{k+1} = ∆_k S_{k+1} + (1 + r)[W_k − ∆_k S_k].

Note that in the case where r = 0 we have

    W_{k+1} − W_k = ∆_k (S_{k+1} − S_k),

or

    W_{k+1} = W_0 + Σ_{i=0}^{k} ∆_i (S_{i+1} − S_i).

This is a discrete version of a stochastic integral. Since

    E[W_{k+1} − W_k | F_k] = ∆_k E[S_{k+1} − S_k | F_k] = 0,

it follows that in the case r = 0 that W_k is a martingale. More generally

Proposition 7.2. Under P the discounted wealth process (1 + r)^{−k} W_k is a martingale.

Proof. We have

    (1 + r)^{−(k+1)} W_{k+1} = (1 + r)^{−k} W_k + ∆_k [(1 + r)^{−(k+1)} S_{k+1} − (1 + r)^{−k} S_k].

Observe that

    E[∆_k [(1 + r)^{−(k+1)} S_{k+1} − (1 + r)^{−k} S_k] | F_k] = ∆_k E[(1 + r)^{−(k+1)} S_{k+1} − (1 + r)^{−k} S_k | F_k] = 0.

The result follows.

Our next result is that the binomial model is complete. It is easy to lose the idea in the algebra, so first let us try to see why the theorem is true.

For simplicity let us first consider the case r = 0. Let V_k = E[V | F_k]; by Proposition 4.3 we see that V_k is a martingale. We want to construct a portfolio process, i.e., choose ∆_k's, so that W_n = V. We will do it inductively by arranging matters so that W_k = V_k for all k. Recall that W_k is also a martingale. Suppose we have W_k = V_k at time k and we want to find ∆_k so that W_{k+1} = V_{k+1}. At the (k + 1)-st step there are only two possible changes for the price of the stock and so since V_{k+1} is F_{k+1} measurable, only two possible values for V_{k+1}. We need to choose ∆_k so that W_{k+1} = V_{k+1} for each of these two possibilities. We only have one parameter, ∆_k, to play with to match up two numbers, which may seem like an overconstrained system of equations. But both V and W are martingales, which is why the system can be solved.

Now let us turn to the details. In the following proof we allow r ≥ 0.

Theorem 7.3. The binomial asset pricing model is complete.

The precise meaning of this is the following. If V is any random variable that is F_n measurable, there exists a constant W_0 and a portfolio process ∆_k so that the wealth process W_k satisfies W_n = V. In other words, starting with W_0 dollars, we can trade shares of stock to exactly duplicate the outcome of any option V.

Proof. Let

    V_k = (1 + r)^k E[(1 + r)^{−n} V | F_k].

By Proposition 4.3, (1 + r)^{−k} V_k is a martingale. If ω = (t_1, . . . , t_n), where each t_i is an H or T, let

    ∆_k(ω) = [V_{k+1}(t_1, . . . , t_k, H, t_{k+2}, . . . , t_n) − V_{k+1}(t_1, . . . , t_k, T, t_{k+2}, . . . , t_n)]
             / [S_{k+1}(t_1, . . . , t_k, H, t_{k+2}, . . . , t_n) − S_{k+1}(t_1, . . . , t_k, T, t_{k+2}, . . . , t_n)].

Set W_0 = V_0, and we will show by induction that the wealth process at time k equals V_k.

The first thing to show is that ∆_k is F_k measurable. Neither S_{k+1} nor V_{k+1} depends on t_{k+2}, . . . , t_n. So ∆_k depends only on the variables t_1, . . . , t_k, hence is F_k measurable.

Now t_{k+2}, . . . , t_n play no role in the rest of the proof, and t_1, . . . , t_k will be fixed, so we drop the t's from the notation. If we write V_{k+1}(H), this is an abbreviation for V_{k+1}(t_1, . . . , t_k, H, t_{k+2}, . . . , t_n).

We know (1 + r)^{−k} V_k is a martingale under P so that

    V_k = E[(1 + r)^{−1} V_{k+1} | F_k] = (1/(1 + r)) [p V_{k+1}(H) + q V_{k+1}(T)].        (7.1)

(See Note 1.) We now suppose W_k = V_k and want to show W_{k+1}(H) = V_{k+1}(H) and W_{k+1}(T) = V_{k+1}(T). Then using induction we have W_n = V_n = V as required. We show the first equality, the second being similar.

    W_{k+1}(H) = ∆_k S_{k+1}(H) + (1 + r)[W_k − ∆_k S_k]
               = ∆_k [uS_k − (1 + r)S_k] + (1 + r)V_k
               = [(V_{k+1}(H) − V_{k+1}(T)) / ((u − d)S_k)] S_k [u − (1 + r)] + p V_{k+1}(H) + q V_{k+1}(T)
               = V_{k+1}(H).

We are done.

Finally, we obtain the Black-Scholes formula in this context. Let V be any option that is F_n-measurable. The one we have in mind is the European call, for which V = (S_n − K)^+, but the argument is the same for any option whatsoever.

Theorem 7.4. The value of the option V at time 0 is V_0 = (1 + r)^{−n} E V.

Proof. We can construct a portfolio process ∆_k so that if we start with W_0 = (1 + r)^{−n} E V, then the wealth at time n will equal V, no matter what the market does in between. If we could buy or sell the option V at a price other than W_0, we could obtain a riskless profit. That is, if the option V could be sold at a price c_0 larger than W_0, we would sell the option for c_0 dollars, use W_0 to buy and sell stock according to the portfolio process ∆_k, have a net worth of V + (1 + r)^n (c_0 − W_0) at time n, meet our obligation to the buyer of the option by using V dollars, and have a net profit, at no risk, of (1 + r)^n (c_0 − W_0). If c_0 were less than W_0, we would do the same except buy an option, hold −∆_k shares at time k, and again make a riskless profit. By the "no arbitrage" rule, that can't happen, so the price of the option V must be W_0.

Remark 7.5. Note that the proof of Theorem 7.4 tells you precisely what hedging strategy (i.e., what portfolio process) to use.

In the binomial asset pricing model, there is no difficulty computing the price of a European call. We have

    E (S_n − K)^+ = Σ_x (x − K)^+ P(S_n = x)

and

    P(S_n = x) = C(n, k) p^k q^{n−k}    if x = u^k d^{n−k} S_0,

where C(n, k) is the binomial coefficient. Therefore the price of the European call is

    (1 + r)^{−n} Σ_{k=0}^{n} (u^k d^{n−k} S_0 − K)^+ C(n, k) p^k q^{n−k}.
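The closed-form sum for the European call translates directly into a few lines of code. This is a minimal sketch, with the function name `european_call_price` and the sample parameters chosen here only for illustration.

```python
from math import comb


def european_call_price(S0, K, u, d, r, n):
    """Price (1+r)^(-n) * sum_k (u^k d^(n-k) S0 - K)^+ C(n,k) p^k q^(n-k).

    Illustrative helper; p and q are the probabilities defined in this section.
    """
    p = ((1 + r) - d) / (u - d)
    q = (u - (1 + r)) / (u - d)
    expected_payoff = sum(
        max(u**k * d ** (n - k) * S0 - K, 0.0) * comb(n, k) * p**k * q ** (n - k)
        for k in range(n + 1)
    )
    return (1 + r) ** (-n) * expected_payoff


# Assumed sample parameters: three steps, u = 2, d = 1/2, r = 0.1.
price = european_call_price(S0=10.0, K=15.0, u=2.0, d=0.5, r=0.1, n=3)
```

Only the terminal distribution of S_n enters, so the cost is linear in n; a path-dependent payoff would instead require summing over all 2^n sequences.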

The formula in Theorem 7.4 holds for exotic options as well. Suppose

    V = max_{i=1,...,n} S_i − min_{j=1,...,n} S_j.

In other words, you sell the stock for the maximum value it takes during the first n time steps and you buy at the minimum value the stock takes; you are allowed to wait until time n and look back to see what the maximum and minimum were. You can even do this if the maximum comes before the minimum. This V is still F_n measurable, so the theory applies. Naturally, such a "buy low, sell high" option is very desirable, and the price of such a V will be quite high. It is interesting that even without using options, you can duplicate the operation of buying low and selling high by holding an appropriate number of shares ∆_k at time k, where you do not look into the future to determine ∆_k.

Let us look at an example of a European call so that it is clear how to do the calculations. Consider the binomial asset pricing model with n = 3, u = 2, d = 1/2, r = 0.1, S_0 = 10, and K = 15. If V is a European call with strike price K and exercise date n, let us compute explicitly the random variables V_1 and V_2 and calculate the value V_0. Let us also compute the hedging strategy ∆_0, ∆_1, and ∆_2. Let

    p = ((1 + r) − d)/(u − d) = .4,    q = (u − (1 + r))/(u − d) = .6.

The following table describes the values of the stock, the payoff V, and the probabilities for each possible outcome ω.

    ω      S_1    S_2      S_3       V     Probability
    HHH    10u    10u^2    10u^3     65    p^3
    HHT    10u    10u^2    10u^2 d   5     p^2 q
    HTH    10u    10ud     10u^2 d   5     p^2 q
    HTT    10u    10ud     10ud^2    0     pq^2
    THH    10d    10ud     10u^2 d   5     p^2 q
    THT    10d    10ud     10ud^2    0     pq^2
    TTH    10d    10d^2    10ud^2    0     pq^2
    TTT    10d    10d^2    10d^3     0     q^3

We then calculate

    V_0 = (1 + r)^{−3} E V = (1 + r)^{−3} (65p^3 + 15p^2 q) = 4.2074.

V_1 = (1 + r)^{−2} E[V | F_1], so we have

    V_1(H) = (1 + r)^{−2} (65p^2 + 10pq) = 10.5785,
    V_1(T) = (1 + r)^{−2} 5p^2 = .6612.

V_2 = (1 + r)^{−1} E[V | F_2], so we have

    V_2(HH) = (1 + r)^{−1} (65p + 5q) = 26.3636,
    V_2(HT) = (1 + r)^{−1} 5p = 1.8182,
    V_2(TH) = (1 + r)^{−1} 5p = 1.8182,
    V_2(TT) = 0.

The formula for ∆_k is given by

    ∆_k = (V_{k+1}(H) − V_{k+1}(T)) / (S_{k+1}(H) − S_{k+1}(T)),

so

    ∆_0 = (V_1(H) − V_1(T)) / (S_1(H) − S_1(T)) = .6612,

where V_1 and S_1 are as above. Similarly,

    ∆_1(H) = (V_2(HH) − V_2(HT)) / (S_2(HH) − S_2(HT)) = .8182,
    ∆_1(T) = (V_2(TH) − V_2(TT)) / (S_2(TH) − S_2(TT)) = .2424,
    ∆_2(HH) = (V_3(HHH) − V_3(HHT)) / (S_3(HHH) − S_3(HHT)) = 1.0,
    ∆_2(HT) = (V_3(HTH) − V_3(HTT)) / (S_3(HTH) − S_3(HTT)) = .3333,
    ∆_2(TH) = (V_3(THH) − V_3(THT)) / (S_3(THH) − S_3(THT)) = .3333,
    ∆_2(TT) = (V_3(TTH) − V_3(TTT)) / (S_3(TTH) − S_3(TTT)) = 0.0.
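The worked example can be recomputed mechanically by backward induction over all sequences of H's and T's, using V_k = (1 + r)^{−1}[p V_{k+1}(H) + q V_{k+1}(T)] and the formula for ∆_k. This is a sketch only; the dictionaries `V` and `delta`, keyed by the string of moves seen so far, are a representation chosen here for illustration.

```python
from itertools import product

# Parameters of the example: n = 3, u = 2, d = 1/2, r = 0.1, S0 = 10, K = 15.
u, d, r, S0, K, n = 2.0, 0.5, 0.1, 10.0, 15.0, 3
p = ((1 + r) - d) / (u - d)
q = (u - (1 + r)) / (u - d)


def stock(path):
    """Stock price after the moves in `path`, a string of 'H' and 'T'."""
    heads = path.count("H")
    return S0 * u**heads * d ** (len(path) - heads)


# Terminal values: the call payoff (S_n - K)^+ on each full path.
V = {"".join(w): max(stock("".join(w)) - K, 0.0) for w in product("HT", repeat=n)}
delta = {}

# Step backward from time n - 1 to time 0; the empty string is time 0.
for k in range(n - 1, -1, -1):
    for w in product("HT", repeat=k):
        path = "".join(w)
        up, down = path + "H", path + "T"
        V[path] = (p * V[up] + q * V[down]) / (1 + r)
        delta[path] = (V[up] - V[down]) / (stock(up) - stock(down))

V0 = V[""]          # option value at time 0
delta0 = delta[""]  # shares held between times 0 and 1
```

Running the wealth recursion W_{k+1} = ∆_k S_{k+1} + (1 + r)[W_k − ∆_k S_k] from W_0 = `V0` along any path reproduces `V[path]` at every step, which is the content of Theorem 7.3.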

Note 1. The second equality in (7.1) is not entirely obvious. Intuitively, it says that one has a heads with probability p and the value of V_{k+1} is V_{k+1}(H), and one has tails with probability q, and the value of V_{k+1} is V_{k+1}(T).

Let us give a more rigorous proof of (7.1). The right hand side of (7.1) is F_k measurable, so we need to show that if A ∈ F_k, then

    E[V_{k+1}; A] = E[p V_{k+1}(H) + q V_{k+1}(T); A].

By linearity, it suffices to show this for A = {ω = (t_1 t_2 · · · t_n) : t_1 = s_1, . . . , t_k = s_k}, where s_1 s_2 · · · s_k is any sequence of H's and T's. Now

    E[V_{k+1}; s_1 · · · s_k] = E[V_{k+1}; s_1 · · · s_k H] + E[V_{k+1}; s_1 · · · s_k T]
                             = V_{k+1}(s_1 · · · s_k H) P(s_1 · · · s_k H) + V_{k+1}(s_1 · · · s_k T) P(s_1 · · · s_k T).

By independence this is

    V_{k+1}(s_1 · · · s_k H) P(s_1 · · · s_k) p + V_{k+1}(s_1 · · · s_k T) P(s_1 · · · s_k) q,

which is what we wanted.

8. American options.

An American option is one where you can exercise the option any time before some fixed time T. For example, on a European call, one can only use it to buy a share of stock at the expiration time T, while for an American call, at any time before time T, one can decide to pay K dollars and obtain a share of stock.

Let us give an informal argument on how to price an American call, giving a more rigorous argument in a moment. One can always wait until time T to exercise an American call, so the value must be at least as great as that of a European call. On the other hand, suppose you decide to exercise early. You pay K dollars, receive one share of stock, and your wealth is S_t − K. You hold onto the stock, and at time T you have one share of stock worth S_T, and for which you paid K dollars. So your wealth is S_T − K ≤ (S_T − K)^+. In fact, we have strict inequality, because you lost the interest on your K dollars that you would have received if you had waited to exercise until time T. Therefore an American call is worth no more than a European call, and hence its value must be the same as that of a European call.

This argument does not work for puts, because selling stock gives you some money on which you will receive interest, so it may be advantageous to exercise early. (A put is the option to sell a stock at a price K at time T.)

Here is the more rigorous argument. Suppose that if you exercise the option at time k, your payoff is g(S_k). In present day dollars, that is, after correcting for inflation, you have (1 + r)^{−k} g(S_k). You have to make a decision on when to exercise the option, and that decision can only be based on what has already happened, not on what is going to happen in the future. In other words, we have to choose a stopping time τ, and we exercise the option at time τ(ω). Thus our payoff is (1 + r)^{−τ} g(S_τ). This is a random quantity. What we want to do is find the stopping time that maximizes the expected value of this random variable. As usual, we work with P, and thus we are looking for the stopping time τ such that τ ≤ n and

    E (1 + r)^{−τ} g(S_τ)

is as large as possible. The problem of finding such a τ is called an optimal stopping problem.

Suppose g(x) is convex with g(0) = 0. Certainly g(x) = (x − K)^+ is such a function. We will show that τ ≡ n is the solution to the above optimal stopping problem: the best time to exercise is as late as possible.

We have

    g(λx) = g(λx + (1 − λ) · 0) ≤ λ g(x) + (1 − λ) g(0) = λ g(x),    0 ≤ λ ≤ 1.        (8.1)

By Jensen's inequality,

    E[(1 + r)^{−(k+1)} g(S_{k+1}) | F_k] = (1 + r)^{−k} E[(1/(1 + r)) g(S_{k+1}) | F_k]
                                         ≥ (1 + r)^{−k} E[g((1/(1 + r)) S_{k+1}) | F_k]
                                         ≥ (1 + r)^{−k} g(E[(1/(1 + r)) S_{k+1} | F_k])
                                         = (1 + r)^{−k} g(S_k).

For the first inequality we used (8.1) with λ = 1/(1 + r); the second is Jensen's inequality, and the last equality holds because the discounted stock price is a martingale. So (1 + r)^{−k} g(S_k) is a submartingale. By optional stopping,

    E[(1 + r)^{−τ} g(S_τ)] ≤ E[(1 + r)^{−n} g(S_n)],

so τ ≡ n always does best.

For puts, the payoff is g(S_k), where g(x) = (K − x)^+. This is also a convex function, but this time g(0) ≠ 0, and the above argument fails. Although good approximations are known, an exact solution to the problem of valuing an American put is unknown, and is one of the major unsolved problems in financial mathematics.
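The contrast between calls and puts can be seen numerically in the binomial model. The sketch below prices an option by backward induction on the recombining tree, optionally allowing exercise at every node; the function name `binomial_price` and the parameters (n = 3, u = 2, d = 1/2, r = 0.1, S_0 = 10, K = 15) are assumptions made for illustration, not part of the argument above.

```python
def binomial_price(payoff, american, S0, u, d, r, n):
    """Value at time 0 of an option with payoff g(S) in the n-step binomial model.

    If `american` is True the holder may exercise at any node, so each node
    takes the larger of the continuation value and the immediate payoff.
    """
    p = ((1 + r) - d) / (u - d)
    q = (u - (1 + r)) / (u - d)
    # Values at time n, indexed by the number j of up moves.
    values = [payoff(S0 * u**j * d ** (n - j)) for j in range(n + 1)]
    for k in range(n - 1, -1, -1):
        step = []
        for j in range(k + 1):
            value = (p * values[j + 1] + q * values[j]) / (1 + r)
            if american:
                value = max(value, payoff(S0 * u**j * d ** (k - j)))
            step.append(value)
        values = step
    return values[0]


S0, K, u, d, r, n = 10.0, 15.0, 2.0, 0.5, 0.1, 3


def call(s):
    return max(s - K, 0.0)


def put(s):
    return max(K - s, 0.0)


european_call = binomial_price(call, False, S0, u, d, r, n)
american_call = binomial_price(call, True, S0, u, d, r, n)
european_put = binomial_price(put, False, S0, u, d, r, n)
american_put = binomial_price(put, True, S0, u, d, r, n)
```

With these parameters the two call prices coincide, while the American put is worth strictly more than the European put, in line with the discussion of g(0) ≠ 0.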

9. Continuous random variables.

We are now going to start working toward continuous times and stocks that can take any positive number as a value, so we need to prepare by extending some of our definitions.

Given any random variable X ≥ 0, we can approximate it by r.v.'s X_n that are discrete. We let

    X_n = Σ_{i=0}^{n 2^n} (i/2^n) 1_{(i/2^n ≤ X < (i+1)/2^n)}.

In words, if X(ω) lies between 0 and n, we let X_n(ω) be the closest value i/2^n that is less than or equal to X(ω). For ω where X(ω) ≥ n + 2^{−n} we set X_n(ω) = 0. Clearly the X_n are discrete, and approximate X. In fact, on the set where X ≤ n, we have that |X(ω) − X_n(ω)| ≤ 2^{−n}.

For reasonable X we are going to define E X = lim E X_n. Since the X_n increase with n, the limit must exist, although it could be +∞. If X is not necessarily nonnegative, we define E X = E X^+ − E X^−, provided at least one of E X^+ and E X^− is finite. Here X^+ = max(X, 0) and X^− = max(−X, 0).

There are some things one wants to prove, but all this has been worked out in measure theory and the theory of the Lebesgue integral; see Note 1. Let us confine ourselves here to showing this definition is the same as the usual one when X has a density.

Recall X has a density f_X if

    P(X ∈ [a, b]) = ∫_a^b f_X(x) dx

for all a and b. In this case

    E X = ∫_{−∞}^{∞} x f_X(x) dx

provided ∫_{−∞}^{∞} |x| f_X(x) dx < ∞. With our definition of X_n we have

    P(X_n = i/2^n) = P(X ∈ [i/2^n, (i + 1)/2^n)) = ∫_{i/2^n}^{(i+1)/2^n} f_X(x) dx.

Then

    E X_n = Σ_i (i/2^n) P(X_n = i/2^n) = Σ_i ∫_{i/2^n}^{(i+1)/2^n} (i/2^n) f_X(x) dx.

Since x differs from i/2^n by at most 1/2^n when x ∈ [i/2^n, (i + 1)/2^n), this will tend to ∫ x f_X(x) dx, unless the contribution to the integral for |x| ≥ n does not go to 0 as n → ∞. As long as ∫ |x| f_X(x) dx < ∞, one can show that this contribution does indeed go to 0.
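The dyadic approximation is easy to experiment with. The following sketch assumes X is uniform on [0, 1) (so f_X = 1 there) purely as an example; the helper names are made up for illustration. It rounds a value down to a multiple of 2^{−n} and computes E X_n exactly with rational arithmetic.

```python
from fractions import Fraction
from math import floor


def dyadic_approx(x, n):
    """Value of X_n when X takes the nonnegative value x.

    Rounds x down to a multiple of 2**-n, and returns 0 when x >= n + 2**-n.
    """
    if x >= n + 2.0 ** (-n):
        return 0.0
    return floor(x * 2**n) / 2**n


def mean_of_approx_uniform(n):
    """Exact E X_n for X uniform on [0, 1) and n >= 1 (assumed example).

    Each interval [i/2^n, (i+1)/2^n) inside [0, 1) has probability 2^-n.
    """
    return sum(Fraction(i, 2**n) * Fraction(1, 2**n) for i in range(2**n))


x3 = dyadic_approx(0.7, 3)   # 0.7 rounded down to a multiple of 1/8
gap = Fraction(1, 2) - mean_of_approx_uniform(10)   # distance from E X = 1/2
```

For this X the gap E X − E X_n is exactly 2^{−(n+1)}, so E X_n increases to E X = 1/2 as n grows.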

We also need an extension of the definition of conditional probability. A r.v. X is G measurable if (X > a) ∈ G for every a. How do we define E[Z | G] when G is not generated by a countable collection of disjoint sets?

Again, there is a completely worked out theory that holds in all cases; see Note 2. Let us give a definition that is equivalent that works except for a very few cases. Let us suppose that for each n the σ-field G_n is finitely generated. This means that G_n is generated by finitely many disjoint sets B_{n1}, . . . , B_{n m_n}. So for each n, the number of B_{ni} is finite but arbitrary, the B_{ni} are disjoint, and their union is Ω. Suppose also that G_1 ⊂ G_2 ⊂ · · ·. Now ∪_n G_n will not in general be a σ-field, but suppose G is the smallest σ-field that contains all the G_n. Finally, define P(A | G) = lim P(A | G_n).

This is a fairly general set-up. For example, let Ω be the real line and let G_n be generated by the sets (−∞, n), [n, ∞) and [i/2^n, (i + 1)/2^n). Then G will contain every interval that is closed on the left and open on the right, hence G must be the σ-field that one works with when one talks about Lebesgue measure on the line.

The question that one might ask is: how does one know the limit exists? Since the G_n increase, we know by Proposition 4.3 that M_n = P(A | G_n) is a martingale with respect to the G_n. It is certainly bounded above by 1 and bounded below by 0, so by the martingale convergence theorem, it must have a limit as n → ∞.

Once one has a definition of conditional probability, one defines conditional expectation by what one expects. If X is discrete, one can write X as Σ_j a_j 1_{A_j} and then one defines

    E[X | G] = Σ_j a_j P(A_j | G).

If X is not discrete, one writes X = X^+ − X^−, one approximates X^+ by discrete random variables, and takes a limit, and similarly for X^−. One has to worry about convergence, but everything does go through.

With this extended definition of conditional expectation, do all the properties of Section 2 hold? The answer is yes. See Note 2 again.

With continuous random variables, we need to be more cautious about what we mean when we say two random variables are equal. We say X = Y almost surely, abbreviated "a.s.", if

    P({ω : X(ω) ≠ Y(ω)}) = 0.

So X = Y except for a set of probability 0. The a.s. terminology is used other places as well: X_n → Y a.s. means that except for a set of ω's of probability zero, X_n(ω) → Y(ω).

Note 1. The best way to define expected value is via the theory of the Lebesgue integral. A probability P is a measure that has total mass 1. So we define

    E X = ∫ X(ω) P(dω).

F.) So Q and P are two finite measures on (Ω. A] if A ∈ G. then P(A) = 0 and so it follows that Q(A) = 0. but we only want to define Q on G.To recall how the definition goes. Let us apply the Radon-Nikodym theorem to the following situation. Define two new probabilities on G as follows. Y ≤ X}. One can show (using the monotone convergence theorem from measure theory) that Q is a finite measure on G. Let P = P|G . we say X is simple if X(ω) = ai ≥ 0. By the Radon-Nikodym theorem there exists an integrable random variable Y such that Y is G measurable (this is why we worried about which σ-field we were working with) and Q(A) = A Y dP if A ∈ G. 40 . Note 2. Finally. If A ∈ G and P (A) = 0. as we will see in a moment. This is the same definition as described above. we define E X = E X + − E X −. and for a simple X we define m m i=1 ai 1Ai (ω) with each EX = i=1 ai P(Ai ). If X is nonnegative. (One can also use this definition to define Q(A) for A ∈ F. A] because E [Y . provided at least one of E X + and E X − is finite. Note ((a) Y is G measurable. we define E X = sup{E Y : Y simple . and (b) if A ∈ G. The Radon-Nikodym theorem from measure theory says that if Q and P are two finite measures on (Ω. G) and Q(A) = 0 whenever P(A) = 0 and A ∈ G. A] = E [Y 1A ] = A Y dP = A Y dP = Q(A) = A XdP = E [X1A ] = E [X. Suppose G ⊂ F. Suppose (Ω. P) is a probability space and X ≥ 0 is integrable: E X < ∞. A] = E [X. G). Define Q by Q(A) = A XdP = E [X. that is. then there exists an integrable function Y that is G-measurable such that Q(A) = A Y dP for every measurable set A. E [Y . A]. P (A) = P(A) if A ∈ G and P (A) is not defined if A ∈ F − G.

We define E[X | G] to be the random variable Y. If X is integrable but not necessarily nonnegative, then X^+ and X^− will be integrable and we define

    E[X | G] = E[X^+ | G] − E[X^− | G].

We define P(B | G) = E[1_B | G] if B ∈ F.

Let us show that there is only one r.v., up to almost sure equivalence, that satisfies (a) and (b) above. If Y and Z are G measurable, and E[Y; A] = E[X; A] = E[Z; A] for A ∈ G, then the set A_n = (Y > Z + 1/n) will be in G, and so

    E[Z; A_n] + (1/n) P(A_n) = E[Z + 1/n; A_n] ≤ E[Y; A_n] = E[Z; A_n].

Consequently P(A_n) = 0. This is true for each positive integer n, so P(Y > Z) = 0. By symmetry, P(Z > Y) = 0, and therefore P(Y ≠ Z) = 0 as we wished.

If one checks the proofs of Propositions 2.3, 2.4, and 2.5, one sees that only properties (a) and (b) above were used. So the propositions hold for the new definition of conditional expectation as well.

In the case where G is finitely or countably generated, under both the new and old definitions (a) and (b) hold. By the uniqueness result, the new and old definitions agree.

10. Stochastic processes.

We will be talking about stochastic processes. Previously we discussed sequences S_1, S_2, . . . of r.v.'s. Now we want to talk about processes Y_t for t ≥ 0. For example, we can think of S_t being the price of a stock at time t. Any nonnegative time t is allowed.

We typically let F_t be the smallest σ-field with respect to which Y_s is measurable for all s ≤ t. So F_t = σ(Y_s : s ≤ t). As you might imagine, there are a few technicalities one has to worry about. We will try to avoid thinking about them as much as possible, but see Note 1.

We call a collection of σ-fields F_t with F_s ⊂ F_t if s < t a filtration. We say the filtration satisfies the "usual conditions" if the F_t are right continuous and complete (see Note 1); all the filtrations we consider will satisfy the usual conditions.

We say a stochastic process has continuous paths if the following holds. For each ω, the map t → Y_t(ω) defines a function from [0, ∞) to R. If this function is a continuous function for all ω's except for a set of probability zero, we say Y_t has continuous paths.

Definition 10.1. A mapping τ : Ω → [0, ∞) is a stopping time if for each t we have (τ ≤ t) ∈ F_t.

Typically, τ will be a continuous random variable and P(τ = t) = 0 for each t, which is why we need a definition just a bit different from the discrete case.

Since (τ < t) = ∪_{n=1}^{∞} (τ ≤ t − 1/n) and (τ ≤ t − 1/n) ∈ F_{t−1/n} ⊂ F_t, then for a stopping time τ we have (τ < t) ∈ F_t for all t. Conversely, suppose τ is a nonnegative r.v. for which (τ < t) ∈ F_t for all t. We claim τ is a stopping time. The proof is easy, but we need the right continuity of the F_t here, so we put the proof in Note 2.

A continuous time martingale (or submartingale) is what one expects: each M_t is integrable, each M_t is F_t measurable, and if s < t, then

    E[M_t | F_s] = M_s.

(Here we are saying the left hand side and the right hand side are equal almost surely; we will usually not write the "a.s." since almost all of our equalities for random variables are only almost surely.)

The analogues of Doob's theorems go through. Note 3 has the proofs.

Note 1. For technical reasons, one typically defines F_t as follows. Let F_t^0 = σ(Y_s : s ≤ t). This is what we referred to as F_t above. Next add to F_t^0 all sets N for which P(N) = 0. Such sets are called null sets, and since they have probability 0, they don't affect anything. In fact, one wants to add all sets N that we think of being null sets, even though they might not be measurable. To be more precise, we say N is a null set if inf{P(A) : A ∈ F, N ⊂ A} = 0. Recall we are starting with a σ-field F and all the F_t^0's are contained in F. Let F_t^{00} be the σ-field generated by F_t^0 and all null sets N, that is, the smallest σ-field containing F_t^0 and every null set. In measure theory terminology, what we have done is to say F_t^{00} is the completion of F_t^0.

Lastly, we want to make our σ-fields right continuous. We set F_t = ∩_{ε>0} F_{t+ε}^{00}. Although the union of σ-fields is not necessarily a σ-field, the intersection of σ-fields is. F_t contains F_t^{00} but might possibly contain more besides. An example of an event that is in F_t but that may not be in F_t^{00} is

    A = {ω : lim_{n→∞} Y_{t+1/n}(ω) ≥ 0}.

A ∈ F_{t+1/m}^{00} for each m, so it is in F_t. There is no reason it needs to be in F_t^{00} if Y is not necessarily continuous at t. It is easy to see that ∩_{ε>0} F_{t+ε} = F_t, which is what we mean when we say F_t is right continuous.

When talking about a stochastic process Y_t, there are various types of measurability one can consider. Saying Y_t is adapted to F_t means Y_t is F_t measurable for each t. However, since Y_t is really a function of two variables, t and ω, there are other notions of measurability that come into play. We will be considering stochastic processes that have continuous paths or that are predictable (the definition will be given later), so these various types of measurability will not be an issue for us.

Note 2. Suppose (τ < t) ∈ F_t for all t. Then for each positive integer n_0,

    (τ ≤ t) = ∩_{n=n_0}^{∞} (τ < t + 1/n).

For n ≥ n_0 we have (τ < t + 1/n) ∈ F_{t+1/n} ⊂ F_{t+1/n_0}. Therefore (τ ≤ t) ∈ F_{t+1/n_0} for each n_0. Hence the set is in the intersection: ∩_{n_0>1} F_{t+1/n_0} ⊂ ∩_{ε>0} F_{t+ε} = F_t.

Note 3. We want to prove the analogues of Theorems 5.3 and 5.4. The proofs of Doob's inequalities are simpler. We only will need the analogue of Theorem 5.4(b).

Theorem 10.2. Suppose M_t is a martingale with continuous paths and E M_t^2 < ∞ for all t. Then for each t_0

    E[(sup_{s≤t_0} M_s)^2] ≤ 4 E[|M_{t_0}|^2].

Proof. By the definition of martingale in continuous time, N_k is a martingale in discrete time with respect to G_k when we set N_k = M_{k t_0/2^n} and G_k = F_{k t_0/2^n}. By Theorem 5.4(b)

    E[max_{0≤k≤2^n} M_{k t_0/2^n}^2] = E[max_{0≤k≤2^n} N_k^2] ≤ 4 E N_{2^n}^2 = 4 E M_{t_0}^2.

(Recall (max_k a_k)^2 = max_k a_k^2 if all the a_k ≥ 0.)

Now let n → ∞. Since M_t has continuous paths, max_{0≤k≤2^n} M_{k t_0/2^n}^2 increases up to sup_{s≤t_0} M_s^2. Our result follows from the monotone convergence theorem from measure theory (see Note 4).

We now prove the analogue of Theorem 5.3. The proof is simpler if we assume that E M_t^2 is finite; the result is still true without this assumption.

Theorem 10.3. Suppose M_t is a martingale with continuous paths, E M_t^2 < ∞ for all t, and τ is a stopping time bounded almost surely by t_0. Then E M_τ = E M_{t_0}.

Proof. We approximate τ by stopping times taking only finitely many values. For n > 0 define

    τ_n(ω) = inf{k t_0/2^n : τ(ω) < k t_0/2^n}.

τ_n takes only the values k t_0/2^n for some k ≤ 2^n. The event (τ_n ≤ j t_0/2^n) is equal to (τ < j t_0/2^n), which is in F_{j t_0/2^n} since τ is a stopping time. So (τ_n ≤ s) ∈ F_s if s is of the form j t_0/2^n for some j. A moment's thought, using the fact that τ_n only takes values of the form k t_0/2^n, shows that τ_n is a stopping time.

It is clear that τ_n ↓ τ for every ω. Since M_t has continuous paths, M_{τ_n} → M_τ a.s.

Let N_k and G_k be as in the proof of Theorem 10.2. Let σ_n = k if τ_n = k t_0/2^n. By Theorem 5.3, E N_{σ_n} = E N_{2^n}, which is the same as saying E M_{τ_n} = E M_{t_0}.

To complete the proof, we need to show E M_{τ_n} converges to E M_τ. This is almost obvious, because we already observed that M_{τ_n} → M_τ a.s. Without the assumption that E M_t^2 < ∞ for all t, this is actually quite a bit of work, but with the assumption it is not too bad. Either |M_{τ_n} − M_τ| is less than or equal to 1 or greater than 1. If it is greater than 1, it is less than |M_{τ_n} − M_τ|^2. So in either case,

    |M_{τ_n} − M_τ| ≤ 1 + |M_{τ_n} − M_τ|^2.        (10.1)

Because both |M_{τ_n}| and |M_τ| are bounded by sup_{s≤t_0} |M_s|, the right hand side of (10.1) is bounded by 1 + 4 sup_{s≤t_0} |M_s|^2, which is integrable by Theorem 10.2. |M_{τ_n} − M_τ| → 0, and so by the dominated convergence theorem from measure theory (Note 4),

    E |M_{τ_n} − M_τ| → 0.

Finally,

    |E M_{τ_n} − E M_τ| = |E (M_{τ_n} − M_τ)| ≤ E |M_{τ_n} − M_τ| → 0.

Note 4. The dominated convergence theorem says that if Xn → X a.s. and |Xn | ≤ Y a.s. for each n, where E Y < ∞, then E Xn → E X. The monotone convergence theorem says that if Xn ≥ 0 for each n, Xn ≤ Xn+1 for each n, and Xn → X, then E Xn → E X.


11. Brownian motion.

First, let us review a few facts about normal random variables. We say X is a normal random variable with mean a and variance b^2 if

    P(c ≤ X ≤ d) = ∫_c^d (1/√(2πb^2)) e^{−(y−a)^2/2b^2} dy

and we will abbreviate this by saying X is N(a, b^2). If X is N(a, b^2), then E X = a, Var X = b^2, and E |X|^p is finite for every positive integer p. Moreover

    E e^{tX} = e^{at} e^{t^2 b^2/2}.

Let S_n be a simple symmetric random walk. This means that Y_k = S_k − S_{k−1} equals +1 with probability 1/2, equals −1 with probability 1/2, and is independent of Y_j for j < k. We notice that E S_n = 0 while

    E S_n^2 = Σ_{i=1}^{n} E Y_i^2 + Σ_{i≠j} E Y_i Y_j = n,

using the fact that E[Y_i Y_j] = (E Y_i)(E Y_j) = 0.

Define X_t^n = S_{nt}/√n if nt is an integer and by linear interpolation for other t. If nt is an integer, E X_t^n = 0 and E (X_t^n)^2 = t. It turns out X_t^n does not converge for any ω. However there is another kind of convergence, called weak convergence, that takes place. There exists a process Z_t such that for each k, each t_1 < t_2 < · · · < t_k, and each a_1 < b_1, a_2 < b_2, . . . , a_k < b_k, we have
(1) The paths of Z_t are continuous as a function of t.
(2) P(X_{t_1}^n ∈ [a_1, b_1], . . . , X_{t_k}^n ∈ [a_k, b_k]) → P(Z_{t_1} ∈ [a_1, b_1], . . . , Z_{t_k} ∈ [a_k, b_k]).
See Note 1 for more discussion of weak convergence.

The limit Z_t is called a Brownian motion starting at 0. It has the following properties.
(1) E Z_t = 0.
(2) E Z_t^2 = t.
(3) Z_t − Z_s is independent of F_s = σ(Z_r, r ≤ s).
(4) Z_t − Z_s has the distribution of a normal random variable with mean 0 and variance t − s. This means

    P(Z_t − Z_s ∈ [a, b]) = ∫_a^b (1/√(2π(t − s))) e^{−y^2/2(t−s)} dy.

(This result follows from the central limit theorem.)
(5) The map t → Z_t(ω) is continuous for almost all ω.
See Note 2 for a few remarks on this definition.

It is common to use B_t ("B" for Brownian) or W_t ("W" for Wiener, who was the first person to prove rigorously that Brownian motion exists). We will most often use W_t.
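The moment identity E (X_t^n)^2 = t and the normal limit in property (4) can both be checked exactly from the binomial law of the number of +1 steps. This is a rough sketch with made-up helper names; the choices n = 4, t = 3 and n = 400, a = 1 are arbitrary.

```python
from math import comb, erf, sqrt


def second_moment_of_walk(m):
    """Exact E[S_m^2] for the simple symmetric random walk.

    If k of the m steps equal +1, then S_m = 2k - m, with probability C(m,k)/2^m.
    """
    return sum(comb(m, k) * (2 * k - m) ** 2 for k in range(m + 1)) / 2**m


def prob_scaled_walk_at_most(n, a):
    """Exact P(S_n / sqrt(n) <= a)."""
    favorable = sum(comb(n, k) for k in range(n + 1) if (2 * k - n) <= a * sqrt(n))
    return favorable / 2**n


def normal_cdf(x):
    """Distribution function of a normal random variable with mean 0, variance 1."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


# E (X_t^n)^2 = E[S_{nt}^2] / n with n = 4 and t = 3, so nt = 12.
second_moment_scaled = second_moment_of_walk(12) / 4

# P(X_1^n <= 1) for n = 400 compared with the normal limit P(Z_1 <= 1).
walk_prob = prob_scaled_walk_at_most(400, 1.0)
limit_prob = normal_cdf(1.0)
```

The two probabilities agree only approximately for finite n, since the walk is a lattice distribution; the gap shrinks as n grows, as the central limit theorem predicts.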

We will use Brownian motion extensively and develop some of its properties. As one might imagine for a limit of a simple random walk, the paths of Brownian motion have a huge number of oscillations. It turns out that the function $t \to W_t(\omega)$ is continuous, but it is not differentiable; in fact one cannot define a derivative at any value of $t$. Another bizarre property: if one looks at the set of times at which $W_t(\omega)$ is equal to 0, this is a set which is uncountable, but contains no intervals. There is nothing special about 0; the same is true for the set of times at which $W_t(\omega)$ is equal to $a$ for any level $a$.

In what follows, one of the crucial properties of a Brownian motion is that it is a martingale with continuous paths. Let us prove this.

Proposition 11.1. $W_t$ is a martingale with respect to $\mathcal{F}_t$ and $W_t$ has continuous paths.

Proof. As part of the definition of a Brownian motion, $W_t$ has continuous paths. $W_t$ is $\mathcal{F}_t$ measurable by the definition of $\mathcal{F}_t$. Since the distribution of $W_t$ is that of a normal random variable with mean 0 and variance $t$, then $\mathbb{E}|W_t| < \infty$ for all $t$. (In fact, $\mathbb{E}|W_t|^n < \infty$ for all $n$.) The key property is to show $\mathbb{E}[W_t \mid \mathcal{F}_s] = W_s$:
\[ \mathbb{E}[W_t \mid \mathcal{F}_s] = \mathbb{E}[W_t - W_s \mid \mathcal{F}_s] + \mathbb{E}[W_s \mid \mathcal{F}_s] = \mathbb{E}[W_t - W_s] + W_s = W_s. \]
We used here the facts that $W_t - W_s$ is independent of $\mathcal{F}_s$ and that $\mathbb{E}[W_t - W_s] = 0$ because $W_t$ and $W_s$ have mean 0.

We will also need

Proposition 11.2. $W_t^2 - t$ is a martingale with continuous paths with respect to $\mathcal{F}_t$.

Proof. That $W_t^2 - t$ is integrable and is $\mathcal{F}_t$ measurable is as in the above proof. We calculate
\[ \begin{aligned}
\mathbb{E}[W_t^2 - t \mid \mathcal{F}_s] &= \mathbb{E}[((W_t - W_s) + W_s)^2 \mid \mathcal{F}_s] - t \\
&= \mathbb{E}[(W_t - W_s)^2 \mid \mathcal{F}_s] + 2\mathbb{E}[(W_t - W_s) W_s \mid \mathcal{F}_s] + \mathbb{E}[W_s^2 \mid \mathcal{F}_s] - t \\
&= \mathbb{E}[(W_t - W_s)^2] + 2 W_s \mathbb{E}[W_t - W_s \mid \mathcal{F}_s] + W_s^2 - t.
\end{aligned} \]
We used the facts that $W_s$ is $\mathcal{F}_s$ measurable and that $(W_t - W_s)^2$ is independent of $\mathcal{F}_s$ because $W_t - W_s$ is. The second term on the last line is equal to $2 W_s \mathbb{E}[W_t - W_s] = 0$. The first term, because $W_t - W_s$ is normal with mean 0 and variance $t - s$, is equal to $t - s$. Substituting, the last line is equal to
\[ (t - s) + 0 + W_s^2 - t = W_s^2 - s \]

as required.

Note 1. A sequence of random variables $X_n$ converges weakly to $X$ if $\mathbb{P}(a < X_n < b) \to \mathbb{P}(a < X < b)$ for all $a, b \in [-\infty, \infty]$ such that $\mathbb{P}(X = a) = \mathbb{P}(X = b) = 0$. Here $a$ and $b$ can be infinite. If $X_n$ converges to a normal random variable, then $\mathbb{P}(X = a) = \mathbb{P}(X = b) = 0$ for all $a$ and $b$. This is the type of convergence that takes place in the central limit theorem. It will not be true in general that $X_n$ converges to $X$ almost surely.

For a sequence of random vectors $(X_1^n, \ldots, X_k^n)$ to converge to a random vector $(X_1, \ldots, X_k)$, one can give an analogous definition. But saying that the normalized random walks $X_n(t)$ above converge weakly to $Z_t$ actually says more than (2). A result from probability theory says that $X_n$ converges to $X$ weakly if and only if $\mathbb{E}[f(X_n)] \to \mathbb{E}[f(X)]$ whenever $f$ is a bounded continuous function on $\mathbb{R}$. We use this to define weak convergence for stochastic processes. Let $C([0, \infty))$ be the collection of all continuous functions from $[0, \infty)$ to the reals. This is a metric space, so the notion of function from $C([0, \infty))$ to $\mathbb{R}$ being continuous makes sense. We say that the processes $X_n$ converge weakly to the process $Z$, and mean by this that $\mathbb{E}[F(X_n)] \to \mathbb{E}[F(Z)]$ whenever $F$ is a bounded continuous function on $C([0, \infty))$. One example of such a function $F$ would be $F(f) = \sup_{0 \le t < \infty} |f(t)|$ if $f \in C([0, \infty))$. Another would be $F(f) = \int_0^1 f(t)\, dt$.

The reason one wants to show that $X_n$ converges weakly to $Z$ instead of just showing (2) is that weak convergence can be shown to imply that $Z$ has continuous paths.

Note 2. First of all, there is some redundancy in the definition: one can show that parts of the definition are implied by the remaining parts, but we won't worry about this. Second, we actually want to let $\mathcal{F}_t$ be the completion of $\sigma(Z_s : s \le t)$, that is, we throw in all the null sets into each $\mathcal{F}_t$. One can prove that the resulting $\mathcal{F}_t$ are right continuous, and hence the filtration $\mathcal{F}_t$ satisfies the "usual" conditions. Finally, the "almost all" in (5) means that $t \to Z_t(\omega)$ is continuous for all $\omega$, except for a set of $\omega$ of probability zero.
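The martingale computations of Propositions 11.1 and 11.2 have an exact discrete analogue that can be checked by brute force. The sketch below is only an illustration under that substitution: it uses the simple symmetric random walk $S_n$ in place of $W_t$, and the helper name `conditional_expectation` is hypothetical. Given the first $k$ steps, it averages over all continuations to compute the conditional expectation; one expects $\mathbb{E}[S_n \mid Y_1, \ldots, Y_k] = S_k$ and $\mathbb{E}[S_n^2 - n \mid Y_1, \ldots, Y_k] = S_k^2 - k$.

```python
import itertools


def conditional_expectation(prefix, n, g):
    """E[g(S_n, n) | Y_1, ..., Y_k = prefix] for a simple symmetric random walk.

    Computed exactly by averaging over all 2^(n-k) equally likely continuations.
    """
    k = len(prefix)
    s_k = sum(prefix)
    tails = list(itertools.product((1, -1), repeat=n - k))
    return sum(g(s_k + sum(tail), n) for tail in tails) / len(tails)
```

With `prefix = (1, 1, -1)` we have $S_3 = 1$, so conditioning $S_7$ on these steps should return 1, and conditioning $S_7^2 - 7$ should return $1 - 3 = -2$.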

12. Stochastic integrals.

If one wants to consider the (deterministic) integral $\int_0^t f(s)\, dg(s)$, where $f$ and $g$ are continuous and $g$ is continuously differentiable, we can define it analogously to the usual Riemann integral as the limit of Riemann sums $\sum_{i=1}^n f(s_i)[g(s_i) - g(s_{i-1})]$, where $s_1 < s_2 < \cdots < s_n$ is a partition of $[0, t]$. This is known as the Riemann-Stieltjes integral. One can show (using the mean value theorem, for example) that
\[ \int_0^t f(s)\, dg(s) = \int_0^t f(s) g'(s)\, ds. \]
If we were to take $f(s) = 1_{[0,a]}(s)$ (which is not continuous, but that is a minor matter here), one would expect the following:
\[ \int_0^t 1_{[0,a]}(s)\, dg(s) = \int_0^t 1_{[0,a]}(s) g'(s)\, ds = \int_0^a g'(s)\, ds = g(a) - g(0). \]
Note that although we use the fact that $g$ is differentiable in the intermediate stages, the first and last terms make sense for any $g$.

We now want to replace $g$ by a Brownian path and $f$ by a random integrand. The expression $\int f(s)\, dW(s)$ does not make sense as a Riemann-Stieltjes integral because it is a fact that $W(s)$ is not differentiable as a function of $s$. We need to define the expression by some other means. We will show that it can be defined as the limit in $L^2$ of Riemann sums. The resulting integral is called a stochastic integral.

Let us consider a very special case first. Suppose $f$ is continuous and deterministic (i.e., does not depend on $\omega$). Suppose we take a Riemann sum approximation
\[ I_n = \sum_{i=0}^{2^n - 1} f\big(\tfrac{i}{2^n}\big)\big[W\big(\tfrac{i+1}{2^n}\big) - W\big(\tfrac{i}{2^n}\big)\big]. \]
Since $W_t$ has zero expectation for each $t$, $\mathbb{E} I_n = 0$. Let us calculate the second moment:
\[ \begin{aligned}
\mathbb{E} I_n^2 &= \mathbb{E}\Big(\sum_i f\big(\tfrac{i}{2^n}\big)\big[W\big(\tfrac{i+1}{2^n}\big) - W\big(\tfrac{i}{2^n}\big)\big]\Big)^2 \\
&= \mathbb{E} \sum_{i=0}^{2^n - 1} f\big(\tfrac{i}{2^n}\big)^2 \big[W\big(\tfrac{i+1}{2^n}\big) - W\big(\tfrac{i}{2^n}\big)\big]^2 \\
&\quad + \mathbb{E} \sum_{i \ne j} f\big(\tfrac{i}{2^n}\big) f\big(\tfrac{j}{2^n}\big)\big[W\big(\tfrac{i+1}{2^n}\big) - W\big(\tfrac{i}{2^n}\big)\big]\big[W\big(\tfrac{j+1}{2^n}\big) - W\big(\tfrac{j}{2^n}\big)\big].
\end{aligned} \tag{12.1} \]
The first sum is bounded by
\[ \sum_i f\big(\tfrac{i}{2^n}\big)^2 \frac{1}{2^n} \approx \int_0^1 f(t)^2\, dt, \]

since the second moment of $W(\tfrac{i+1}{2^n}) - W(\tfrac{i}{2^n})$ is $1/2^n$. Using the independence and the fact that $W_t$ has mean zero, for $i \ne j$
\[ \mathbb{E}\big[W\big(\tfrac{i+1}{2^n}\big) - W\big(\tfrac{i}{2^n}\big)\big]\big[W\big(\tfrac{j+1}{2^n}\big) - W\big(\tfrac{j}{2^n}\big)\big] = \mathbb{E}\big[W\big(\tfrac{i+1}{2^n}\big) - W\big(\tfrac{i}{2^n}\big)\big]\, \mathbb{E}\big[W\big(\tfrac{j+1}{2^n}\big) - W\big(\tfrac{j}{2^n}\big)\big] = 0, \]
and so the second sum on the right hand side of (12.1) is zero. This calculation is the key to the stochastic integral.

We now turn to the construction. Let $W_t$ be a Brownian motion. We will only consider integrands $H_s$ such that $H_s$ is $\mathcal{F}_s$ measurable for each $s$ (see Note 1). We will construct $\int_0^t H_s\, dW_s$ for all $H$ with
\[ \mathbb{E} \int_0^t H_s^2\, ds < \infty. \tag{12.2} \]

Before we proceed we will need to define the quadratic variation of a continuous martingale. We will use the following theorem without proof because in our applications we can construct the desired increasing process directly. We often say a process is a continuous process if its paths are continuous, and similarly a continuous martingale is a martingale with continuous paths.

Theorem 12.1. Suppose $M_t$ is a continuous martingale such that $\mathbb{E} M_t^2 < \infty$ for all $t$. There exists one and only one increasing process $A_t$ that is adapted to $\mathcal{F}_t$, has continuous paths, and $A_0 = 0$ such that $M_t^2 - A_t$ is a martingale.

We use the notation $\langle M \rangle_t$ for the increasing process given in Theorem 12.1 and call it the quadratic variation process of $M$. We will see later that in the case of stochastic integrals, where
\[ N_t = \int_0^t H_s\, dW_s, \]
it turns out that $\langle N \rangle_t = \int_0^t H_s^2\, ds$.

The simplest example of such a martingale is Brownian motion. If $W_t$ is a Brownian motion, we saw in Proposition 11.2 that $W_t^2 - t$ is a martingale. So in this case $A_t = t$ almost surely, for all $t$. Hence $\langle W \rangle_t = t$.

We will use the following frequently, and in fact, these are the only two properties of Brownian motion that play a significant role in the construction.

Lemma 12.1. (a) $\mathbb{E}[W_b - W_a \mid \mathcal{F}_a] = 0$.
(b) $\mathbb{E}[W_b^2 - W_a^2 \mid \mathcal{F}_a] = \mathbb{E}[(W_b - W_a)^2 \mid \mathcal{F}_a] = b - a$.

Proof. (a) This is $\mathbb{E}[W_b - W_a] = 0$ by the independence of $W_b - W_a$ from $\mathcal{F}_a$ and the fact that $W_b$ and $W_a$ have mean zero.

(b) $(W_b - W_a)^2$ is independent of $\mathcal{F}_a$, so the conditional expectation is the same as $\mathbb{E}[(W_b - W_a)^2]$. Since $W_b - W_a$ is a $N(0, b - a)$, the second equality in (b) follows.

To prove the first equality in (b), we write
\[ \begin{aligned}
\mathbb{E}[W_b^2 - W_a^2 \mid \mathcal{F}_a] &= \mathbb{E}[((W_b - W_a) + W_a)^2 \mid \mathcal{F}_a] - \mathbb{E}[W_a^2 \mid \mathcal{F}_a] \\
&= \mathbb{E}[(W_b - W_a)^2 \mid \mathcal{F}_a] + 2\mathbb{E}[W_a (W_b - W_a) \mid \mathcal{F}_a] + \mathbb{E}[W_a^2 \mid \mathcal{F}_a] - \mathbb{E}[W_a^2 \mid \mathcal{F}_a] \\
&= \mathbb{E}[(W_b - W_a)^2 \mid \mathcal{F}_a] + 2 W_a \mathbb{E}[W_b - W_a \mid \mathcal{F}_a],
\end{aligned} \]
and the first equality follows by applying (a).

We construct the stochastic integral in three steps. We say an integrand $H_s = H_s(\omega)$ is elementary if
\[ H_s(\omega) = G(\omega) 1_{(a,b]}(s) \]
where $0 \le a < b$ and $G$ is bounded and $\mathcal{F}_a$ measurable. We say $H$ is simple if it is a finite linear combination of elementary processes, that is,
\[ H_s(\omega) = \sum_{i=1}^n G_i(\omega) 1_{(a_i, b_i]}(s). \tag{12.3} \]
We first construct the stochastic integral for $H$ elementary; the work here is showing the stochastic integral is a martingale. We next construct the integral for $H$ simple and here the difficulty is calculating the second moment. Finally we consider the case of general $H$.

First step. If $G$ is bounded and $\mathcal{F}_a$ measurable, let $H_s(\omega) = G(\omega) 1_{(a,b]}(s)$, and define the stochastic integral to be the process $N_t$, where $N_t = G(W_{t \wedge b} - W_{t \wedge a})$. Compare this to the first paragraph of this section, where we considered Riemann-Stieltjes integrals.

Proposition 12.2. $N_t$ is a continuous martingale, $\mathbb{E} N_\infty^2 = \mathbb{E}[G^2 (b - a)]$ and
\[ \langle N \rangle_t = \int_0^t G^2 1_{[a,b]}(s)\, ds. \]

Proof. The continuity is clear. Let us look at $\mathbb{E}[N_t \mid \mathcal{F}_s]$. In the case $a < s < t < b$, this is equal to
\[ \mathbb{E}[G(W_t - W_a) \mid \mathcal{F}_s] = G\, \mathbb{E}[(W_t - W_a) \mid \mathcal{F}_s] = G(W_s - W_a) = N_s. \]
In the case $s < a < t < b$, $\mathbb{E}[N_t \mid \mathcal{F}_s]$ is equal to
\[ \mathbb{E}[G(W_t - W_a) \mid \mathcal{F}_s] = \mathbb{E}[G\, \mathbb{E}[W_t - W_a \mid \mathcal{F}_a] \mid \mathcal{F}_s] = 0 = N_s. \]

The other possibilities are $s < t < a < b$, $a < b < s < t$, $s < a < b < t$, and $a < s < b < t$; these are done similarly.

For $\mathbb{E} N_\infty^2$, we have using Lemma 12.1(b)
\[ \mathbb{E} N_\infty^2 = \mathbb{E}[G^2\, \mathbb{E}[(W_b - W_a)^2 \mid \mathcal{F}_a]] = \mathbb{E}[G^2\, \mathbb{E}[W_b^2 - W_a^2 \mid \mathcal{F}_a]] = \mathbb{E}[G^2 (b - a)]. \]
For $\langle N \rangle_t$, we need to show
\[ \mathbb{E}[G^2 (W_{t \wedge b} - W_{t \wedge a})^2 - G^2 (t \wedge b - t \wedge a) \mid \mathcal{F}_s] = G^2 (W_{s \wedge b} - W_{s \wedge a})^2 - G^2 (s \wedge b - s \wedge a). \]
We do this by checking all six cases for the relative locations of $a$, $b$, $s$, and $t$; we do one of the cases in Note 2.

Second step. Next suppose $H_s$ is simple as in (12.3). In this case define the stochastic integral
\[ N_t = \int_0^t H_s\, dW_s = \sum_{i=1}^n G_i (W_{b_i \wedge t} - W_{a_i \wedge t}). \]

Proposition 12.3. $N_t$ is a continuous martingale, $\mathbb{E} N_\infty^2 = \mathbb{E} \int_0^\infty H_s^2\, ds$, and $\langle N \rangle_t = \int_0^t H_s^2\, ds$.

Proof. We may rewrite $H$ so that the intervals $(a_i, b_i]$ satisfy $a_1 \le b_1 \le a_2 \le b_2 \le \cdots \le b_n$. For example, if we had $a_1 < a_2 < b_1 < b_2$, we could write
\[ H_s = G_1 1_{(a_1, a_2]} + (G_1 + G_2) 1_{(a_2, b_1]} + G_2 1_{(b_1, b_2]}, \]
and then if we set $G_1' = G_1$, $G_2' = G_1 + G_2$, $G_3' = G_2$ and $a_1' = a_1$, $b_1' = a_2$, $a_2' = a_2$, $b_2' = b_1$, $a_3' = b_1$, $b_3' = b_2$, we have written $H$ as
\[ \sum_{i=1}^3 G_i' 1_{(a_i', b_i']}. \]
So now we have $H$ simple but with the intervals $(a_i, b_i]$ non-overlapping.

Since the sum of martingales is clearly a martingale, $N_t$ is a martingale. The sum of continuous processes will be continuous, so $N_t$ has continuous paths.

We have
\[ \mathbb{E} N_\infty^2 = \mathbb{E} \sum_i G_i^2 (W_{b_i} - W_{a_i})^2 + 2 \mathbb{E} \sum_{i < j} G_i G_j (W_{b_i} - W_{a_i})(W_{b_j} - W_{a_j}). \]

The terms in the second sum vanish, because when we condition on $\mathcal{F}_{a_j}$, we have
\[ \mathbb{E}[G_i G_j (W_{b_i} - W_{a_i})\, \mathbb{E}[(W_{b_j} - W_{a_j}) \mid \mathcal{F}_{a_j}]] = 0. \]
Taking expectations,
\[ \mathbb{E}[G_i G_j (W_{b_i} - W_{a_i})(W_{b_j} - W_{a_j})] = 0. \]
For the terms in the first sum, by Lemma 12.1
\[ \mathbb{E}[G_i^2 (W_{b_i} - W_{a_i})^2] = \mathbb{E}[G_i^2\, \mathbb{E}[(W_{b_i} - W_{a_i})^2 \mid \mathcal{F}_{a_i}]] = \mathbb{E}[G_i^2 (b_i - a_i)]. \]
So
\[ \mathbb{E} N_\infty^2 = \sum_{i=1}^n \mathbb{E}[G_i^2 (b_i - a_i)], \]
and this is the same as $\mathbb{E} \int_0^\infty H_s^2\, ds$.

Third step. Now suppose $H_s$ is adapted and $\mathbb{E} \int_0^\infty H_s^2\, ds < \infty$. Using some results from measure theory (Note 3), we can choose $H_s^n$ simple such that $\mathbb{E} \int_0^\infty (H_s^n - H_s)^2\, ds \to 0$. The triangle inequality then implies (see Note 3 again)
\[ \mathbb{E} \int_0^\infty (H_s^n - H_s^m)^2\, ds \to 0. \]
Define $N_t^n = \int_0^t H_s^n\, dW_s$ using Step 2. By Doob's inequality (Theorem 10.3) we have
\[ \begin{aligned}
\mathbb{E}\big[\sup_t (N_t^n - N_t^m)^2\big] &= \mathbb{E}\Big[\sup_t \Big(\int_0^t (H_s^n - H_s^m)\, dW_s\Big)^2\Big] \\
&\le 4\, \mathbb{E}\Big(\int_0^\infty (H_s^n - H_s^m)\, dW_s\Big)^2 \\
&= 4\, \mathbb{E} \int_0^\infty (H_s^n - H_s^m)^2\, ds \to 0.
\end{aligned} \]
This should look reminiscent of the definition of Cauchy sequences, and in fact that is what is going on here; Note 3 has details. In the present context Cauchy sequences converge, and one can show (Note 3) that there exists a process $N_t$ such that
\[ \mathbb{E}\Big[\sup_t \Big(\int_0^t H_s^n\, dW_s - N_t\Big)^2\Big] \to 0. \]
If $H_s^n$ and $H_s^{n\prime}$ are two sequences converging to $H$, then $\mathbb{E}\big(\int_0^t (H_s^n - H_s^{n\prime})\, dW_s\big)^2 = \mathbb{E} \int_0^t (H_s^n - H_s^{n\prime})^2\, ds \to 0$, or the limit is independent of which sequence $H^n$ we choose. See Note 4 for the proof that $N_t$ is a martingale, $\mathbb{E} N_t^2 = \mathbb{E} \int_0^t H_s^2\, ds$, and $\langle N \rangle_t = \int_0^t H_s^2\, ds$.
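The identity $\mathbb{E} N_\infty^2 = \mathbb{E} \int_0^\infty H_s^2\, ds$ for simple integrands can be checked approximately by simulation. The following Python sketch is an illustration only; the function names and the example integrand `PIECES` are assumptions made for this example, not anything from the notes. Each coefficient $G_i$ is taken to be a bounded function of $W_{a_i}$ (hence $\mathcal{F}_{a_i}$ measurable), the intervals are assumed non-overlapping, and both sides of the identity are estimated by Monte Carlo.

```python
import math
import random


def sample_brownian(times, rng):
    """Sample W at the given increasing times (W_0 = 0) from independent normal increments."""
    values, w, prev = {}, 0.0, 0.0
    for s in times:
        if s > prev:
            w += rng.gauss(0.0, math.sqrt(s - prev))
        values[s] = w
        prev = s
    return values


def integral_and_energy(pieces, rng):
    """One sample of (N_infinity, integral of H_s^2 ds) for H = sum_i G_i 1_{(a_i, b_i]}.

    pieces: list of (a_i, b_i, g_i) with non-overlapping intervals, G_i = g_i(W_{a_i}).
    """
    times = sorted({s for a, b, _ in pieces for s in (a, b)})
    w = sample_brownian(times, rng)
    n_inf = sum(g(w[a]) * (w[b] - w[a]) for a, b, g in pieces)
    energy = sum(g(w[a]) ** 2 * (b - a) for a, b, g in pieces)
    return n_inf, energy


def isometry_estimates(pieces, n_samples, rng):
    """Monte Carlo estimates of E[N_infinity^2] and of E[integral of H_s^2 ds]."""
    sum_sq = 0.0
    sum_energy = 0.0
    for _ in range(n_samples):
        n_inf, energy = integral_and_energy(pieces, rng)
        sum_sq += n_inf ** 2
        sum_energy += energy
    return sum_sq / n_samples, sum_energy / n_samples


# Hypothetical example: bounded coefficients on (0, 0.5], (0.5, 1], (1, 2].
PIECES = [
    (0.0, 0.5, lambda x: 1.0),
    (0.5, 1.0, lambda x: 1.0 if x > 0 else -1.0),
    (1.0, 2.0, lambda x: max(-1.0, min(1.0, x))),
]
```

Calling `isometry_estimates(PIECES, 40000, random.Random(0))` should return two numbers that agree up to Monte Carlo error; the agreement is statistical, not exact.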

Because $\mathbb{E} \sup_t \big[\int_0^t H_s^n\, dW_s - N_t\big]^2 \to 0$, one can show there exists a subsequence such that the convergence takes place almost surely, and with probability one, $N_t$ has continuous paths (Note 5).

We write $N_t = \int_0^t H_s\, dW_s$ and call $N_t$ the stochastic integral of $H$ with respect to $W$.

We discuss some extensions of the definition. First of all, if we replace $W_t$ by a continuous martingale $M_t$ and $H_s$ is adapted with $\mathbb{E} \int_0^t H_s^2\, d\langle M \rangle_s < \infty$, we can duplicate everything we just did (see Note 6) with $ds$ replaced by $d\langle M \rangle_s$ and get a stochastic integral. In particular, if $d\langle M \rangle_s = K_s^2\, ds$, we replace $ds$ by $K_s^2\, ds$.

There are some other extensions of the definition that are not hard. If the random variable $\int_0^\infty H_s^2\, d\langle M \rangle_s$ is finite but without its expectation being finite, we can define the stochastic integral by defining it for $t \le T_N$ for suitable stopping times $T_N$ and then letting $T_N \to \infty$; look at Note 7.

A process $A_t$ is of bounded variation if the paths of $A_t$ have bounded variation. This means that one can write $A_t = A_t^+ - A_t^-$, where $A_t^+$ and $A_t^-$ have paths that are increasing. $|A|_t$ is then defined to be $A_t^+ + A_t^-$. A semimartingale is the sum of a martingale and a process of bounded variation. If $\int_0^\infty H_s^2\, d\langle M \rangle_s + \int_0^\infty |H_s|\, |dA_s| < \infty$ and $X_t = M_t + A_t$, we define
\[ \int_0^t H_s\, dX_s = \int_0^t H_s\, dM_s + \int_0^t H_s\, dA_s, \]
where the first integral on the right is a stochastic integral and the second is a Riemann-Stieltjes or Lebesgue-Stieltjes integral. For a semimartingale, we define $\langle X \rangle_t = \langle M \rangle_t$. Note 7 has more on this.

Given two semimartingales $X$ and $Y$ we define $\langle X, Y \rangle_t$ by what is known as polarization:
\[ \langle X, Y \rangle_t = \tfrac12 [\langle X + Y \rangle_t - \langle X \rangle_t - \langle Y \rangle_t]. \]
As an example, if $X_t = \int_0^t H_s\, dW_s$ and $Y_t = \int_0^t K_s\, dW_s$, then $(X + Y)_t = \int_0^t (H_s + K_s)\, dW_s$, so
\[ \langle X + Y \rangle_t = \int_0^t (H_s + K_s)^2\, ds = \int_0^t H_s^2\, ds + \int_0^t 2 H_s K_s\, ds + \int_0^t K_s^2\, ds. \]
Since $\langle X \rangle_t = \int_0^t H_s^2\, ds$ with a similar formula for $\langle Y \rangle_t$, we conclude
\[ \langle X, Y \rangle_t = \int_0^t H_s K_s\, ds. \]

The following holds, which is what one would expect.

Proposition 12.4. Suppose $K_s$ is adapted to $\mathcal{F}_s$ and $\mathbb{E} \int_0^\infty K_s^2\, ds < \infty$. Let $N_t = \int_0^t K_s\, dW_s$. Suppose $H_s$ is adapted and $\mathbb{E} \int_0^\infty H_s^2\, d\langle N \rangle_s < \infty$. Then $\mathbb{E} \int_0^\infty H_s^2 K_s^2\, ds < \infty$ and
\[ \int_0^t H_s\, dN_s = \int_0^t H_s K_s\, dW_s. \]
The argument for the proof is given in Note 8.

What does a stochastic integral mean? If one thinks of the derivative of $Z_t$ as being a white noise, then $\int_0^t H_s\, dZ_s$ is like a filter that increases or decreases the volume by a factor $H_s$.

For us, an interpretation is that $Z_t$ represents a stock price. Then $\int_0^t H_s\, dZ_s$ represents our profit (or loss) if we hold $H_s$ shares at time $s$. This can be seen most easily if $H_s = G 1_{[a,b]}$. So we buy $G(\omega)$ shares at time $a$ and sell them at time $b$. The stochastic integral represents our profit or loss.

Since we are in continuous time, we are allowed to buy and sell continuously and instantaneously. What we are not allowed to do is look into the future to make our decisions, which is where the $H_s$ adapted condition comes in.

Note 1. Let us be more precise concerning the measurability of $H$ that is needed. $H$ is a stochastic process, so can be viewed as a map from $[0, \infty) \times \Omega$ to $\mathbb{R}$ by $H : (s, \omega) \to H_s(\omega)$. We define a $\sigma$-field $\mathcal{P}$ on $[0, \infty) \times \Omega$ as follows. Consider the collection of processes of the form $G(\omega) 1_{(a,b]}(s)$ where $G$ is bounded and $\mathcal{F}_a$ measurable for some $a < b$. Define $\mathcal{P}$ to be the smallest $\sigma$-field with respect to which every process of this form is measurable. $\mathcal{P}$ is called the predictable or previsible $\sigma$-field, and if a process $H$ is measurable with respect to $\mathcal{P}$, then the process is called predictable. What we require for our integrands $H$ is that they be predictable processes.

If $H_s$ has continuous paths, then approximating continuous functions by step functions shows that such an $H$ can be approximated by linear combinations of processes of the form $G(\omega) 1_{(a,b]}(s)$. So continuous processes are predictable. The majority of the integrands we will consider will be continuous.

If one is slightly more careful, one sees that processes whose paths are functions which are continuous from the left at each time point are also predictable. This gives an indication of where the name comes from. If $H_s$ has paths which are left continuous, then $H_t = \lim_{n \to \infty} H_{t - \frac1n}$, and we can "predict" the value of $H_t$ from the values at times that come before $t$. If $H_t$ is only right continuous and a path has a jump at time $t$, this is not possible.

Note 2. Let us consider the case $a < s < t < b$; again similar arguments take care of the other five cases. We need to show
\[ \mathbb{E}[G^2 (W_t - W_a)^2 - G^2 (t - a) \mid \mathcal{F}_s] = G^2 (W_s - W_a)^2 - G^2 (s - a). \tag{12.4} \]

The left hand side is equal to $G^2\, \mathbb{E}[(W_t - W_a)^2 - (t - a) \mid \mathcal{F}_s]$. We write this as
\[ \begin{aligned}
& G^2\, \mathbb{E}[((W_t - W_s) + (W_s - W_a))^2 - (t - a) \mid \mathcal{F}_s] \\
&\quad = G^2 \big\{ \mathbb{E}[(W_t - W_s)^2 \mid \mathcal{F}_s] + 2\mathbb{E}[(W_t - W_s)(W_s - W_a) \mid \mathcal{F}_s] + \mathbb{E}[(W_s - W_a)^2 \mid \mathcal{F}_s] - (t - a) \big\} \\
&\quad = G^2 \big\{ \mathbb{E}[(W_t - W_s)^2] + 2(W_s - W_a)\, \mathbb{E}[W_t - W_s \mid \mathcal{F}_s] + (W_s - W_a)^2 - (t - a) \big\} \\
&\quad = G^2 \big\{ (t - s) + 0 + (W_s - W_a)^2 - (t - a) \big\}.
\end{aligned} \]
The last expression is equal to the right hand side of (12.4).

Note 3. A definition from measure theory says that if $\mu$ is a measure, $\|f\|_2$, the $L^2$ norm of $f$ with respect to the measure $\mu$, is defined as $\big(\int f(x)^2\, \mu(dx)\big)^{1/2}$. The space $L^2$ is defined to be the set of functions $f$ for which $\|f\|_2 < \infty$. (A technical proviso: one has to identify as equal functions which differ only on a set of measure 0.) If one defines a distance between two functions $f$ and $g$ by $d(f, g) = \|f - g\|_2$, this is a metric on the space $L^2$, and a theorem from measure theory says that $L^2$ is complete with respect to this metric. Another theorem from measure theory says that the collection of simple functions (functions of the form $\sum_{i=1}^n c_i 1_{A_i}$) is dense in $L^2$ with respect to the metric.

Let us define a norm on stochastic processes; this is essentially an $L^2$ norm. Define
\[ \|N\| = \Big(\mathbb{E}\big[\sup_{0 \le t < \infty} N_t^2\big]\Big)^{1/2}. \]
One can show that this is a norm, and hence that the triangle inequality holds. Moreover, the space of processes $N$ such that $\|N\| < \infty$ is complete with respect to this norm. This means that if $N^n$ is a Cauchy sequence, i.e., if given $\varepsilon$ there exists $n_0$ such that $\|N^n - N^m\| < \varepsilon$ whenever $n, m \ge n_0$, then the Cauchy sequence converges, that is, there exists $N$ with $\|N\| < \infty$ such that $\|N^n - N\| \to 0$.

We can define another norm on stochastic processes. Define
\[ \|H\|_2 = \Big(\mathbb{E} \int_0^\infty H_s^2\, ds\Big)^{1/2}. \]
This can be viewed as a standard $L^2$ norm, namely, the $L^2$ norm with respect to the measure $\mu$ defined on $\mathcal{P}$ by
\[ \mu(A) = \mathbb{E} \int_0^\infty 1_A(s, \omega)\, ds. \]
Since the set of simple functions with respect to $\mu$ is dense in $L^2$, this says that if $H$ is measurable with respect to $\mathcal{P}$, then there exist simple processes $H_s^n$ that are also measurable with respect to $\mathcal{P}$ such that $\|H^n - H\|_2 \to 0$.

Note 4. We have $\|N^n - N\| \to 0$, where the norm here is the one described in Note 3. Each $N^n$ is a stochastic integral of the type described in Step 2 of the construction, hence each $N_t^n$ is a martingale. Let $s < t$ and $A \in \mathcal{F}_s$. Since $\mathbb{E}[N_t^n \mid \mathcal{F}_s] = N_s^n$, then
\[ \mathbb{E}[N_t^n; A] = \mathbb{E}[N_s^n; A]. \tag{12.5} \]
By Cauchy-Schwarz,
\[ |\mathbb{E}[N_t^n; A] - \mathbb{E}[N_t; A]| \le \mathbb{E}[\,|N_t^n - N_t|; A] \le \big(\mathbb{E}[(N_t^n - N_t)^2]\big)^{1/2} \big(\mathbb{E}[1_A^2]\big)^{1/2} \le \|N^n - N\| \to 0. \tag{12.6} \]
We have a similar limit when $t$ is replaced by $s$, so taking the limit in (12.5) yields
\[ \mathbb{E}[N_t; A] = \mathbb{E}[N_s; A]. \]
Since $N_s$ is $\mathcal{F}_s$ measurable and has the same expectation over sets $A \in \mathcal{F}_s$ as $N_t$ does, then by Proposition 4.3 $\mathbb{E}[N_t \mid \mathcal{F}_s] = N_s$, or $N_t$ is a martingale.

Suppose $\|N^n - N\| \to 0$. Given $\varepsilon > 0$ there exists $n_0$ such that $\|N^n - N\| < \varepsilon$ if $n \ge n_0$. Take $\varepsilon = 1$ and choose $n_0$. By the triangle inequality,
\[ \|N\| \le \|N^n\| + \|N^n - N\| \le \|N^n\| + 1 < \infty \]
since $\|N^n\|$ is finite for each $n$.

That $N_t^2 - \langle N \rangle_t$ is a martingale is similar to the proof that $N_t$ is a martingale, but slightly more delicate. We leave the proof to the reader, but note that in place of (12.6) one writes
\[ |\mathbb{E}[(N_t^n)^2; A] - \mathbb{E}[(N_t)^2; A]| \le \mathbb{E}[\,|(N_t^n)^2 - (N_t)^2|\,] \le \mathbb{E}[\,|N_t^n - N_t|\, |N_t^n + N_t|\,]. \tag{12.7} \]
By Cauchy-Schwarz this is less than
\[ \|N_t^n - N_t\|\, \|N_t^n + N_t\|. \]
Since $\|N_t^n + N_t\| \le \|N_t^n\| + \|N_t\|$ is bounded independently of $n$, we see that the left hand side of (12.7) tends to 0.

Note 5. We have $\|N^n - N\| \to 0$, where the norm is described in Note 3. This means that
\[ \mathbb{E}\big[\sup_t |N_t^n - N_t|^2\big] \to 0. \]
A result from measure theory implies that there exists a subsequence $n_k$ such that
\[ \sup_t |N_t^{n_k} - N_t|^2 \to 0, \qquad \text{a.s.} \]

So except for a set of $\omega$'s of probability 0, $N_t^{n_k}(\omega)$ converges to $N_t(\omega)$ uniformly. Each $N_t^{n_k}(\omega)$ is continuous by Step 2, and the uniform limit of continuous functions is continuous; therefore $N_t(\omega)$ is a continuous function of $t$. Incidentally, this is the primary reason we considered Doob's inequalities.

Note 6. If $M_t$ is a continuous martingale, $\mathbb{E}[M_b - M_a \mid \mathcal{F}_a] = \mathbb{E}[M_b \mid \mathcal{F}_a] - M_a = M_a - M_a = 0$. This is the analogue of Lemma 12.1(a). To show the analogue of Lemma 12.1(b),
\[ \begin{aligned}
\mathbb{E}[(M_b - M_a)^2 \mid \mathcal{F}_a] &= \mathbb{E}[M_b^2 \mid \mathcal{F}_a] - 2\mathbb{E}[M_b M_a \mid \mathcal{F}_a] + \mathbb{E}[M_a^2 \mid \mathcal{F}_a] \\
&= \mathbb{E}[M_b^2 \mid \mathcal{F}_a] - 2 M_a \mathbb{E}[M_b \mid \mathcal{F}_a] + M_a^2 \\
&= \mathbb{E}[M_b^2 - \langle M \rangle_b \mid \mathcal{F}_a] + \mathbb{E}[\langle M \rangle_b - \langle M \rangle_a \mid \mathcal{F}_a] - M_a^2 + \langle M \rangle_a \\
&= \mathbb{E}[\langle M \rangle_b - \langle M \rangle_a \mid \mathcal{F}_a],
\end{aligned} \]
since $M_t^2 - \langle M \rangle_t$ is a martingale. That
\[ \mathbb{E}[M_b^2 - M_a^2 \mid \mathcal{F}_a] = \mathbb{E}[\langle M \rangle_b - \langle M \rangle_a \mid \mathcal{F}_a] \]
is just a rewriting of
\[ \mathbb{E}[M_b^2 - \langle M \rangle_b \mid \mathcal{F}_a] = M_a^2 - \langle M \rangle_a = \mathbb{E}[M_a^2 - \langle M \rangle_a \mid \mathcal{F}_a]. \]
With these two properties in place of Lemma 12.1, replacing $W_s$ by $M_s$ and $ds$ by $d\langle M \rangle_s$, the construction of the stochastic integral $\int_0^t H_s\, dM_s$ goes through exactly as above.

Note 7. If we let $T_K = \inf\{t > 0 : \int_0^t H_s^2\, d\langle M \rangle_s \ge K\}$, the first time the integral is larger than or equal to $K$, and we let $H_s^K = H_s 1_{(s \le T_K)}$, then $\int_0^\infty (H_s^K)^2\, d\langle M \rangle_s \le K$ and there is no difficulty defining $N_t^K = \int_0^t H_s^K\, dM_s$ for every $t$. One can show that if $t \le T_{K_1}$ and $t \le T_{K_2}$, then $N_t^{K_1} = N_t^{K_2}$ a.s. If $\int_0^t H_s^2\, d\langle M \rangle_s$ is finite for every $t$, then $T_K \to \infty$ as $K \to \infty$. If we call the common value $N_t$, this allows one to define the stochastic integral $N_t$ for each $t$ in the case where the integral $\int_0^t H_s^2\, d\langle M \rangle_s$ is finite for every $t$, even if the expectation of the integral is not.

We can do something similar if $M_t$ is a martingale but where we do not have $\mathbb{E} \langle M \rangle_\infty < \infty$. Let $S_K = \inf\{t : |M_t| \ge K\}$, the first time $|M_t|$ is larger than or equal to $K$. If we let $M_t^K = M_{t \wedge S_K}$, where $t \wedge S_K = \min(t, S_K)$, then one can show $M^K$ is a martingale bounded in absolute value by $K$. So we can define $J_t^K = \int_0^t H_s\, dM_s^K$ for every $t$, using the paragraph above to handle the wider class of $H$'s, if necessary. Again, one can show that if $t \le S_{K_1}$ and $t \le S_{K_2}$, then the value of the stochastic integral will be the same no matter whether we use $M^{K_1}$ or $M^{K_2}$ as our martingale. We use the common value as a definition of the stochastic integral $J_t$. We have $S_K \to \infty$ as $K \to \infty$, so we have a definition of $J_t$ for each $t$.

Note 8. We only outline how the proof goes. To show
\[ \int_0^t H_s\, dN_s = \int_0^t H_s K_s\, dW_s, \tag{12.8} \]
one shows that (12.8) holds for $H_s$ simple and then takes limits. To show this, it suffices to look at $H_s$ elementary and use linearity. To show (12.8) for $H_s$ elementary, first prove this in the case when $K_s$ is elementary, use linearity to extend it to the case when $K$ is simple, and then take limits to obtain it for arbitrary $K$. Thus one reduces the proof to showing (12.8) when both $H$ and $K$ are elementary. In this situation, one can explicitly write out both sides of the equation and see that they are equal.
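Returning to the interpretation of $\int_0^t H_s\, dZ_s$ as a trading profit, a discrete-time version is easy to write down. The sketch below is an illustration only, with a made-up function name `trading_profit`: if we hold `holdings[i]` shares over the period from time $i$ to time $i+1$, the profit is the sum of holdings times price changes. The holding for each period is chosen at the start of that period, which is the discrete counterpart of not looking into the future.

```python
def trading_profit(prices, holdings):
    """Discrete analogue of the stochastic integral of H against Z.

    prices:   observed prices Z_0, Z_1, ..., Z_n.
    holdings: shares H_0, ..., H_{n-1}; H_i is decided at time i and held until time i+1.
    """
    if len(holdings) != len(prices) - 1:
        raise ValueError("need exactly one holding per price increment")
    return sum(h * (prices[i + 1] - prices[i]) for i, h in enumerate(holdings))
```

For an elementary strategy, say buying 3 shares at time 1 and selling them at time 3, the holdings are `[0, 3, 3]` and the profit reduces to $3(Z_3 - Z_1)$, mirroring $N_t = G(W_{t \wedge b} - W_{t \wedge a})$.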

13. Ito's formula.

Suppose $W_t$ is a Brownian motion and $f : \mathbb{R} \to \mathbb{R}$ is a $C^2$ function, that is, $f$ and its first two derivatives are continuous. Ito's formula, which is sometimes known as the change of variables formula, says that
\[ f(W_t) - f(W_0) = \int_0^t f'(W_s)\, dW_s + \frac12 \int_0^t f''(W_s)\, ds. \]
Compare this with the fundamental theorem of calculus:
\[ f(t) - f(0) = \int_0^t f'(s)\, ds. \]
In Ito's formula we have a second order term to carry along.

The idea behind the proof is quite simple. By Taylor's theorem,
\[ \begin{aligned}
f(W_t) - f(W_0) &= \sum_{i=0}^{n-1} [f(W_{(i+1)t/n}) - f(W_{it/n})] \\
&\approx \sum_{i=0}^{n-1} f'(W_{it/n})(W_{(i+1)t/n} - W_{it/n}) + \frac12 \sum_{i=0}^{n-1} f''(W_{it/n})(W_{(i+1)t/n} - W_{it/n})^2.
\end{aligned} \]
The first sum on the right is approximately the stochastic integral and the second is approximately the quadratic variation.

For a more general semimartingale $X_t = M_t + A_t$, Ito's formula reads

Theorem 13.1. If $f \in C^2$, then
\[ f(X_t) - f(X_0) = \int_0^t f'(X_s)\, dX_s + \frac12 \int_0^t f''(X_s)\, d\langle M \rangle_s. \]

Let us look at an example. Let $W_t$ be Brownian motion, $X_t = \sigma W_t - \sigma^2 t/2$, and $f(x) = e^x$. Then $\langle X \rangle_t = \langle \sigma W \rangle_t = \sigma^2 t$, $f'(x) = f''(x) = e^x$, and
\[ \begin{aligned}
e^{\sigma W_t - \sigma^2 t/2} &= 1 + \int_0^t e^{X_s} \sigma\, dW_s - \frac12 \int_0^t e^{X_s} \sigma^2\, ds + \frac12 \int_0^t e^{X_s} \sigma^2\, ds \\
&= 1 + \int_0^t e^{X_s} \sigma\, dW_s.
\end{aligned} \tag{13.1} \]
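The identity $e^{\sigma W_t - \sigma^2 t/2} = 1 + \int_0^t e^{X_s} \sigma\, dW_s$ can be illustrated along a single simulated path. The following sketch is only a rough numerical check under stated assumptions: the Brownian path is approximated by Gaussian increments on a uniform grid, the stochastic integral is replaced by its left-endpoint Riemann sum, and the function name `exponential_check` is hypothetical. The two returned numbers should be close for a fine grid, but they are not expected to agree exactly.

```python
import math
import random


def exponential_check(sigma, t, n_steps, rng):
    """Compare exp(X_t), X_t = sigma*W_t - sigma^2 t/2, with 1 + sum exp(X_s) sigma dW_s.

    The sum evaluates the integrand at the left endpoint of each grid interval,
    as in the Riemann sums used to build the stochastic integral.
    """
    h = t / n_steps
    w = 0.0
    ito_sum = 1.0
    for i in range(n_steps):
        x_left = sigma * w - sigma ** 2 * (i * h) / 2  # X at the left endpoint
        dw = rng.gauss(0.0, math.sqrt(h))              # Brownian increment over [ih, (i+1)h]
        ito_sum += math.exp(x_left) * sigma * dw
        w += dw
    exact = math.exp(sigma * w - sigma ** 2 * t / 2)
    return exact, ito_sum
```

Note that no $ds$ term appears in the sum: the drift $-\sigma^2 t/2$ in $X_t$ is what cancels the second order term of Ito's formula.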

This example will be revisited many times later on.

Let us give another example of the use of Ito's formula. Let $X_t = W_t$ and let $f(x) = x^k$. Then $f'(x) = k x^{k-1}$ and $f''(x) = k(k-1) x^{k-2}$. We then have
\[ \begin{aligned}
W_t^k &= W_0^k + \int_0^t k W_s^{k-1}\, dW_s + \frac12 \int_0^t k(k-1) W_s^{k-2}\, d\langle W \rangle_s \\
&= \int_0^t k W_s^{k-1}\, dW_s + \frac{k(k-1)}{2} \int_0^t W_s^{k-2}\, ds.
\end{aligned} \]
When $k = 3$, this says $W_t^3 - 3 \int_0^t W_s\, ds$ is a stochastic integral with respect to a Brownian motion, and hence a martingale.

For a semimartingale $X_t = M_t + A_t$ we set $\langle X \rangle_t = \langle M \rangle_t$. Given two semimartingales $X$, $Y$, we define
\[ \langle X, Y \rangle_t = \tfrac12 [\langle X + Y \rangle_t - \langle X \rangle_t - \langle Y \rangle_t]. \]

The following is known as Ito's product formula. It may also be viewed as an integration by parts formula.

Proposition 13.2. If $X_t$ and $Y_t$ are semimartingales,
\[ X_t Y_t = X_0 Y_0 + \int_0^t X_s\, dY_s + \int_0^t Y_s\, dX_s + \langle X, Y \rangle_t. \]

Proof. Applying Ito's formula with $f(x) = x^2$ to $X_t + Y_t$, we obtain
\[ (X_t + Y_t)^2 = (X_0 + Y_0)^2 + 2 \int_0^t (X_s + Y_s)(dX_s + dY_s) + \langle X + Y \rangle_t. \]
Applying Ito's formula with $f(x) = x^2$ to $X$ and to $Y$, then
\[ X_t^2 = X_0^2 + 2 \int_0^t X_s\, dX_s + \langle X \rangle_t \]
and
\[ Y_t^2 = Y_0^2 + 2 \int_0^t Y_s\, dY_s + \langle Y \rangle_t. \]
Then some algebra and the fact that
\[ X_t Y_t = \tfrac12 [(X_t + Y_t)^2 - X_t^2 - Y_t^2] \]
yields the formula.

There is a multidimensional version of Ito's formula: if $X_t = (X_t^1, \ldots, X_t^d)$ is a vector, each component of which is a semimartingale, and $f \in C^2$, then
\[ f(X_t) - f(X_0) = \sum_{i=1}^d \int_0^t \frac{\partial f}{\partial x_i}(X_s)\, dX_s^i + \frac12 \sum_{i,j=1}^d \int_0^t \frac{\partial^2 f}{\partial x_i \partial x_j}(X_s)\, d\langle X^i, X^j \rangle_s. \]

The following application of Ito's formula, known as Lévy's theorem, is important.

Theorem 13.3. Suppose $M_t$ is a continuous martingale with $\langle M \rangle_t = t$. Then $M_t$ is a Brownian motion.

Before proving this, recall from undergraduate probability that the moment generating function of a r.v. $X$ is defined by $m_X(a) = \mathbb{E} e^{aX}$ and that if two random variables have the same moment generating function, they have the same law. This is also true if we replace $a$ by $iu$. In this case we have $\varphi_X(u) = \mathbb{E} e^{iuX}$ and $\varphi_X$ is called the characteristic function of $X$. The reason for looking at the characteristic function is that $\varphi_X$ always exists, whereas $m_X(a)$ might be infinite. The one special case we will need is that if $X$ is a normal r.v. with mean 0 and variance $t$, then $\varphi_X(u) = e^{-u^2 t/2}$. This follows from the formula for $m_X(a)$ with $a$ replaced by $iu$ (this can be justified rigorously).

Proof. We will prove that $M_t$ is a $N(0, t)$; for the remainder of the proof see Note 1. We apply Ito's formula with $f(x) = e^{iux}$. Then
\[ e^{iuM_t} = 1 + \int_0^t iu\, e^{iuM_s}\, dM_s + \frac12 \int_0^t (-u^2) e^{iuM_s}\, d\langle M \rangle_s. \]
Taking expectations and using $\langle M \rangle_s = s$ and the fact that a stochastic integral is a martingale, hence has 0 expectation, we have
\[ \mathbb{E} e^{iuM_t} = 1 - \frac{u^2}{2} \int_0^t \mathbb{E} e^{iuM_s}\, ds. \]
Let $J(t) = \mathbb{E} e^{iuM_t}$. The equation can be rewritten
\[ J(t) = 1 - \frac{u^2}{2} \int_0^t J(s)\, ds. \]
So $J'(t) = -\frac12 u^2 J(t)$ with $J(0) = 1$. The solution to this elementary ODE is $J(t) = e^{-u^2 t/2}$. Since $\mathbb{E} e^{iuM_t} = e^{-u^2 t/2}$, then by our remarks above the law of $M_t$ must be that of a $N(0, t)$, which shows that $M_t$ is a mean 0 variance $t$ normal r.v.

Note 1. If $A \in \mathcal{F}_s$ and we do the same argument with $M_t$ replaced by $M_{s+t} - M_s$, we have
\[ e^{iu(M_{s+t} - M_s)} = 1 + \int_0^t iu\, e^{iu(M_{s+r} - M_s)}\, dM_r + \frac12 \int_0^t (-u^2) e^{iu(M_{s+r} - M_s)}\, d\langle M \rangle_r. \]
Multiply this by $1_A$ and take expectations. Since a stochastic integral is a martingale, the stochastic integral term again has expectation 0. If we let $K(t) = \mathbb{E}[e^{iu(M_{t+s} - M_s)}; A]$, we now arrive at $K'(t) = -\frac12 u^2 K(t)$ with $K(0) = \mathbb{P}(A)$, so
\[ K(t) = \mathbb{P}(A) e^{-u^2 t/2}. \]

Therefore
\[ \mathbb{E}\big[e^{iu(M_{t+s} - M_s)}; A\big] = \mathbb{E}\, e^{iu(M_{t+s} - M_s)}\, \mathbb{P}(A). \tag{13.2} \]
If $f$ is a nice function and $\hat f$ is its Fourier transform, replace $u$ in the above by $-u$, multiply by $\hat f(u)$, and integrate over $u$. (To do the integral, we approximate the integral by a Riemann sum and then take limits.) We then have
\[ \mathbb{E}[f(M_{s+t} - M_s); A] = \mathbb{E}[f(M_{s+t} - M_s)]\, \mathbb{P}(A). \]
By taking limits we have this for $f = 1_B$, so
\[ \mathbb{P}(M_{s+t} - M_s \in B, A) = \mathbb{P}(M_{s+t} - M_s \in B)\, \mathbb{P}(A). \]
This implies that $M_{s+t} - M_s$ is independent of $\mathcal{F}_s$.

Note $\mathrm{Var}(M_t - M_s) = t - s$; take $A = \Omega$ in (13.2).
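Looking back at Ito's formula with $f(x) = x^2$, which gives $W_t^2 = 2 \int_0^t W_s\, dW_s + \langle W \rangle_t$, there is an exact discrete counterpart that holds for any sequence of increments: with $W_0 = 0$, one has $W_n^2 = 2 \sum_i W_i (W_{i+1} - W_i) + \sum_i (W_{i+1} - W_i)^2$. The sketch below is an illustration only; the function names are made up for this example. It computes the three quantities, and for Gaussian increments on a fine grid the last sum should be close to $t$, in line with $\langle W \rangle_t = t$.

```python
import math
import random


def ito_square_terms(increments):
    """For a path with W_0 = 0 and the given increments, return the triple
    (sum of W_i dW_i with left endpoints, W_n squared, sum of squared increments)."""
    w = 0.0
    stochastic_sum = 0.0
    quadratic_variation = 0.0
    for dw in increments:
        stochastic_sum += w * dw        # integrand evaluated before the increment
        quadratic_variation += dw * dw
        w += dw
    return stochastic_sum, w * w, quadratic_variation


def gaussian_increments(t, n_steps, rng):
    """Increments of an approximate Brownian path on a uniform grid of [0, t]."""
    h = t / n_steps
    return [rng.gauss(0.0, math.sqrt(h)) for _ in range(n_steps)]
```

The algebraic identity is exact; only the statement that the sum of squared increments is near $t$ relies on the increments being Brownian.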

14. The Girsanov theorem.

Suppose $\mathbb{P}$ is a probability and
\[ dX_t = dW_t + \mu(X_t)\, dt, \]
where $W_t$ is a Brownian motion. This is short hand for
\[ X_t = X_0 + W_t + \int_0^t \mu(X_s)\, ds. \tag{14.1} \]
Let
\[ M_t = \exp\Big(-\int_0^t \mu(X_s)\, dW_s - \int_0^t \mu(X_s)^2\, ds/2\Big). \tag{14.2} \]
Then as we have seen before, by Ito's formula, $M_t$ is a martingale. This calculation is reviewed in Note 1. We also observe that $M_0 = 1$.

Now let us define a new probability by setting
\[ \mathbb{Q}(A) = \mathbb{E}[M_t; A] \tag{14.3} \]
if $A \in \mathcal{F}_t$. We had better be sure this $\mathbb{Q}$ is well defined. If $A \in \mathcal{F}_s \subset \mathcal{F}_t$, then $\mathbb{E}[M_t; A] = \mathbb{E}[M_s; A]$ because $M_t$ is a martingale. We also check that $\mathbb{Q}(\Omega) = \mathbb{E}[M_t; \Omega] = \mathbb{E} M_t$. This is equal to $\mathbb{E} M_0 = 1$, since $M_t$ is a martingale.

What the Girsanov theorem says is

Theorem 14.1. Under $\mathbb{Q}$, $X_t$ is a Brownian motion.

Under $\mathbb{P}$, $W_t$ is a Brownian motion and $X_t$ is not. Under $\mathbb{Q}$, the process $W_t$ is no longer a Brownian motion.

In order for a process $X_t$ to be a Brownian motion, we need at a minimum that $X_t$ is mean zero and variance $t$. To define mean and variance, we need a probability. Therefore a process might be a Brownian motion with respect to one probability and not another. Most of the other parts of the definition of being a Brownian motion also depend on the probability.

Similarly, to be a martingale, we need conditional expectations, and the conditional expectation of a random variable depends on what probability is being used.

There is a more general version of the Girsanov theorem.

Theorem 14.2. If $X_t$ is a martingale under $\mathbb{P}$, then under $\mathbb{Q}$ the process $X_t - D_t$ is a martingale where
\[ D_t = \int_0^t \frac{1}{M_s}\, d\langle X, M \rangle_s. \]

$\langle X \rangle_t$ is the same under both $\mathbb{P}$ and $\mathbb{Q}$.

Let us see how Theorem 14.1 can be used. Let $S_t$ be the stock price, and suppose
\[ dS_t = \sigma S_t\, dW_t + m S_t\, dt. \]
(So in the above formulation, $\mu(x) = m$ for all $x$.) Define
\[ M_t = e^{(-m/\sigma) W_t - (m^2/2\sigma^2) t}. \]
Then from (13.1) $M_t$ is a martingale and
\[ M_t = 1 + \int_0^t -\frac{m}{\sigma} M_s\, dW_s. \]
Let $X_t = W_t$. Then
\[ \langle X, M \rangle_t = \int_0^t -\frac{m}{\sigma} M_s\, ds = -\int_0^t M_s \frac{m}{\sigma}\, ds. \]
Therefore
\[ \int_0^t \frac{1}{M_s}\, d\langle X, M \rangle_s = -\int_0^t \frac{m}{\sigma}\, ds = -(m/\sigma) t. \]
Define $\mathbb{Q}$ by (14.3). By Theorem 14.2, under $\mathbb{Q}$ the process $\widetilde W_t = W_t + (m/\sigma) t$ is a martingale. Hence
\[ dS_t = \sigma S_t (dW_t + (m/\sigma)\, dt) = \sigma S_t\, d\widetilde W_t, \]
or
\[ S_t = S_0 + \int_0^t \sigma S_s\, d\widetilde W_s \]
is a martingale. So we have found a probability under which the asset price is a martingale. This means that $\mathbb{Q}$ is the risk-neutral probability, which we have been calling $\overline{\mathbb{P}}$.

Let us give another example of the use of the Girsanov theorem. Suppose $X_t = W_t + \mu t$, where $\mu$ is a constant. We want to compute the probability that $X_t$ exceeds the level $a$ by time $t_0$.

We first need the probability that a Brownian motion crosses a level $a$ by time $t_0$. If $A_t = \sup_{s \le t} W_s$ (note we are not looking at $|W_t|$), we have
\[ \mathbb{P}(A_t > a, c \le W_t \le d) = \int_c^d \varphi(t, a, x)\, dx, \tag{14.4} \]
where
\[ \varphi(t, a, x) = \begin{cases} \dfrac{1}{\sqrt{2\pi t}}\, e^{-x^2/2t}, & x \ge a, \\[2ex] \dfrac{1}{\sqrt{2\pi t}}\, e^{-(2a - x)^2/2t}, & x < a. \end{cases} \]

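This is called the reflection principle, and the name is due to the derivation, given in Note 2. Sometimes one says
\[ \mathbb{P}(W_t = x, A_t > a) = \mathbb{P}(W_t = 2a - x), \qquad x < a, \]
but this is not precise because $W_t$ is a continuous random variable and both sides of the above equation are zero; (14.4) is the rigorous version of the reflection principle.

Now let $W_t$ be a Brownian motion under $\mathbb{P}$. Let $d\mathbb{Q}/d\mathbb{P} = M_t = e^{\mu W_t - \mu^2 t/2}$. Let $Y_t = W_t - \mu t$. Theorem 14.1 says that under $\mathbb{Q}$, $Y_t$ is a Brownian motion. We have $W_t = Y_t + \mu t$.

Let $A = (\sup_{s \le t_0} W_s \ge a)$. We want to calculate
\[ \mathbb{P}\big(\sup_{s \le t_0} (W_s + \mu s) \ge a\big). \]
$W_t$ is a Brownian motion under $\mathbb{P}$ while $Y_t$ is a Brownian motion under $\mathbb{Q}$. So this probability is equal to
\[ \mathbb{Q}\big(\sup_{s \le t_0} (Y_s + \mu s) \ge a\big). \]
This in turn is equal to
\[ \mathbb{Q}\big(\sup_{s \le t_0} W_s \ge a\big) = \mathbb{Q}(A). \]
Now we use the expression for $M_t$:
\[ \begin{aligned}
\mathbb{Q}(A) &= \mathbb{E}_{\mathbb{P}}\big[e^{\mu W_{t_0} - \mu^2 t_0/2}; A\big] \\
&= \int_{-\infty}^\infty e^{\mu x - \mu^2 t_0/2}\, \mathbb{P}\big(\sup_{s \le t_0} W_s \ge a, W_{t_0} = x\big)\, dx \\
&= e^{-\mu^2 t_0/2} \Big[ \int_{-\infty}^a \frac{1}{\sqrt{2\pi t_0}}\, e^{\mu x} e^{-(2a - x)^2/2t_0}\, dx + \int_a^\infty \frac{1}{\sqrt{2\pi t_0}}\, e^{\mu x} e^{-x^2/2t_0}\, dx \Big].
\end{aligned} \]

Proof of Theorem 14.1. Using Ito's formula with $f(x) = e^x$,
\[ M_t = 1 - \int_0^t \mu(X_r) M_r\, dW_r. \]
So
\[ \langle W, M \rangle_t = -\int_0^t \mu(X_r) M_r\, dr. \]
Since $\mathbb{Q}(A) = \mathbb{E}_{\mathbb{P}}[M_t; A]$, it is not hard to see that
\[ \mathbb{E}_{\mathbb{Q}}[W_t; A] = \mathbb{E}_{\mathbb{P}}[M_t W_t; A]. \]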

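By Ito's product formula this is
\[ \mathbb{E}_{\mathbb{P}}\Big[\int_0^t M_r\, dW_r; A\Big] + \mathbb{E}_{\mathbb{P}}\Big[\int_0^t W_r\, dM_r; A\Big] + \mathbb{E}_{\mathbb{P}}\big[\langle W, M \rangle_t; A\big]. \]
Since $\int_0^t M_r\, dW_r$ and $\int_0^t W_r\, dM_r$ are stochastic integrals with respect to martingales, they are themselves martingales. Thus the above is equal to
\[ \mathbb{E}_{\mathbb{P}}\Big[\int_0^s M_r\, dW_r; A\Big] + \mathbb{E}_{\mathbb{P}}\Big[\int_0^s W_r\, dM_r; A\Big] + \mathbb{E}_{\mathbb{P}}\big[\langle W, M \rangle_t; A\big]. \]
Using the product formula again, this is
\[ \mathbb{E}_{\mathbb{P}}[M_s W_s; A] + \mathbb{E}_{\mathbb{P}}\big[\langle W, M \rangle_t - \langle W, M \rangle_s; A\big] = \mathbb{E}_{\mathbb{Q}}[W_s; A] + \mathbb{E}_{\mathbb{P}}\big[\langle W, M \rangle_t - \langle W, M \rangle_s; A\big]. \]
The last term on the right is equal to
\[ \begin{aligned}
\mathbb{E}_{\mathbb{P}}\Big[\int_s^t d\langle W, M \rangle_r; A\Big] &= \mathbb{E}_{\mathbb{P}}\Big[-\int_s^t M_r \mu(X_r)\, dr; A\Big] = \mathbb{E}_{\mathbb{P}}\Big[-\int_s^t \mathbb{E}_{\mathbb{P}}[M_t \mid \mathcal{F}_r]\, \mu(X_r)\, dr; A\Big] \\
&= \mathbb{E}_{\mathbb{P}}\Big[-\int_s^t M_t\, \mu(X_r)\, dr; A\Big] = \mathbb{E}_{\mathbb{Q}}\Big[-\int_s^t \mu(X_r)\, dr; A\Big] \\
&= -\mathbb{E}_{\mathbb{Q}}\Big[\int_0^t \mu(X_r)\, dr; A\Big] + \mathbb{E}_{\mathbb{Q}}\Big[\int_0^s \mu(X_r)\, dr; A\Big].
\end{aligned} \]
Therefore
\[ \mathbb{E}_{\mathbb{Q}}\Big[W_t + \int_0^t \mu(X_r)\, dr; A\Big] = \mathbb{E}_{\mathbb{Q}}\Big[W_s + \int_0^s \mu(X_r)\, dr; A\Big], \]
which shows $X_t$ is a martingale with respect to $\mathbb{Q}$. Similarly, $X_t^2 - t$ is a martingale with respect to $\mathbb{Q}$. By Lévy's theorem, $X_t$ is a Brownian motion.

In Note 3 we give a proof of Theorem 14.2 and in Note 4 we show how Theorem 14.1 is really a special case of Theorem 14.2.

Note 1. Let
\[ Y_t = -\int_0^t \mu(X_s)\, dW_s - \frac12 \int_0^t [\mu(X_s)]^2\, ds. \]
We apply Ito's formula with the function $f(x) = e^x$. Note the martingale part of $Y_t$ is the stochastic integral term and the quadratic variation of $Y$ is the quadratic variation of the martingale part, so
\[ \langle Y \rangle_t = \int_0^t [-\mu(X_s)]^2\, ds. \]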

Then $f'(x) = e^x$, $f''(x) = e^x$, and hence
\[ \begin{aligned}
M_t = e^{Y_t} &= e^{Y_0} + \int_0^t e^{Y_s}\, dY_s + \frac12 \int_0^t e^{Y_s}\, d\langle Y \rangle_s \\
&= 1 + \int_0^t M_s (-\mu(X_s))\, dW_s - \frac12 \int_0^t M_s [\mu(X_s)]^2\, ds + \frac12 \int_0^t M_s [-\mu(X_s)]^2\, ds \\
&= 1 - \int_0^t M_s \mu(X_s)\, dW_s.
\end{aligned} \]
Since stochastic integrals with respect to a Brownian motion are martingales, this completes the argument that $M_t$ is a martingale.

Note 2. Let $S_n$ be a simple random walk. This means that $X_1, X_2, \ldots$ are independent and identically distributed random variables with $\mathbb{P}(X_i = 1) = \mathbb{P}(X_i = -1) = \frac12$; let $S_n = \sum_{i=1}^n X_i$. If you are playing a game where you toss a fair coin and win \$1 if it comes up heads and lose \$1 if it comes up tails, then $S_n$ will be your fortune at time $n$. Let $A_n = \max_{0 \le k \le n} S_k$. We will show the analogue of (14.4) for $S_n$, which is
\[ \mathbb{P}(S_n = x, A_n \ge a) = \begin{cases} \mathbb{P}(S_n = x), & x \ge a, \\ \mathbb{P}(S_n = 2a - x), & x < a. \end{cases} \tag{14.5} \]
(14.4) can be derived from this using a weak convergence argument.

To establish (14.5), note that if $x \ge a$ and $S_n = x$, then automatically $A_n \ge a$, so the only case to consider is when $x < a$. Any path that crosses $a$ but is at level $x$ at time $n$ has a corresponding path determined by reflecting across level $a$ at the first time the random walk hits $a$; the reflected path will end up at $a + (a - x) = 2a - x$. The probability on the left hand side of (14.5) is the number of paths that hit $a$ and end up at $x$ divided by the total number of paths. Since the number of paths that hit $a$ and end up at $x$ is equal to the number of paths that end up at $2a - x$, then the probability on the left is equal to the number of paths that end up at $2a - x$ divided by the total number of paths; this is $\mathbb{P}(S_n = 2a - x)$, which is the right hand side.

Note 3. To prove Theorem 14.2, we proceed as follows. Assume without loss of generality that $X_0 = 0$. Then if $A \in \mathcal{F}_s$,
\[ \begin{aligned}
\mathbb{E}_{\mathbb{Q}}[X_t; A] &= \mathbb{E}_{\mathbb{P}}[M_t X_t; A] \\
&= \mathbb{E}_{\mathbb{P}}\Big[\int_0^t M_r\, dX_r; A\Big] + \mathbb{E}_{\mathbb{P}}\Big[\int_0^t X_r\, dM_r; A\Big] + \mathbb{E}_{\mathbb{P}}\big[\langle X, M \rangle_t; A\big] \\
&= \mathbb{E}_{\mathbb{P}}\Big[\int_0^s M_r\, dX_r; A\Big] + \mathbb{E}_{\mathbb{P}}\Big[\int_0^s X_r\, dM_r; A\Big] + \mathbb{E}_{\mathbb{P}}\big[\langle X, M \rangle_t; A\big] \\
&= \mathbb{E}_{\mathbb{Q}}[X_s; A] + \mathbb{E}_{\mathbb{P}}\big[\langle X, M \rangle_t - \langle X, M \rangle_s; A\big].
\end{aligned} \]

Here we used the fact that stochastic integrals with respect to the martingales $X$ and $M$ are again martingales. On the other hand,
\[ E_P\big[\langle X, M\rangle_t - \langle X, M\rangle_s;\ A\big] = E_P\Big[\int_s^t d\langle X, M\rangle_r;\ A\Big] = E_P\Big[\int_s^t M_r\,dD_r;\ A\Big] \]
\[ = E_P\Big[\int_s^t E_P[M_t \mid \mathcal{F}_r]\,dD_r;\ A\Big] = E_P\Big[\int_s^t M_t\,dD_r;\ A\Big] = E_P[(D_t - D_s)M_t;\ A] = E_Q[D_t - D_s;\ A]. \]
The proof of the quadratic variation assertion is similar.

Note 4. Here is an argument showing how Theorem 14.1 can also be derived from Theorem 14.2. From our formula for $M$ we have $dM_t = -M_t\,\mu(X_t)\,dW_t$, and therefore $d\langle X, M\rangle_t = -M_t\,\mu(X_t)\,dt$. Hence by Theorem 14.2 we see that under $Q$, $X_t$ is a continuous martingale with $\langle X\rangle_t = t$. By Lévy's theorem, this means that $X$ is a Brownian motion under $Q$.
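The reflection identity (14.5) of Note 2 can be checked by brute force for small $n$. The following is a minimal sketch, not part of the original argument: it enumerates all $2^n$ coin-toss paths and compares the number of paths that reach level $a$ and end at $x$ with the number of paths that end at $2a - x$. The function name `reflection_counts` is introduced here only for illustration, and the check assumes $a$ is a positive integer and $x < a$.

```python
from itertools import product


def reflection_counts(n, a, x):
    """Return (paths with S_n = x and max >= a, paths with S_n = 2a - x).

    Enumerates every n-step path of +1/-1 steps started at 0.
    Intended for a positive integer level a and an endpoint x < a.
    """
    hit_and_end = 0
    end_reflected = 0
    for steps in product((1, -1), repeat=n):
        position = 0
        running_max = 0  # the maximum A_n includes time 0, where S_0 = 0
        for step in steps:
            position += step
            running_max = max(running_max, position)
        if position == x and running_max >= a:
            hit_and_end += 1
        if position == 2 * a - x:
            end_reflected += 1
    return hit_and_end, end_reflected


if __name__ == "__main__":
    for x in range(-8, 2):
        print(x, reflection_counts(8, 2, x))
```

Dividing either count by $2^n$ gives the corresponding probability, so equal counts are the same statement as (14.5) for $x < a$.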

15. Stochastic differential equations.

Let $W_t$ be a Brownian motion. We are interested in the existence and uniqueness for stochastic differential equations (SDEs) of the form
\[ dX_t = \sigma(X_t)\,dW_t + b(X_t)\,dt, \qquad X_0 = x_0. \qquad (15.1) \]
This means $X_t$ satisfies
\[ X_t = x_0 + \int_0^t \sigma(X_s)\,dW_s + \int_0^t b(X_s)\,ds. \qquad (15.2) \]
Here $W_t$ is a Brownian motion, and (15.2) holds for almost every $\omega$.

We have to make some assumptions on $\sigma$ and $b$. We assume they are Lipschitz, which means:
\[ |\sigma(x) - \sigma(y)| \le c|x - y|, \qquad |b(x) - b(y)| \le c|x - y| \]
for some constant $c$. We also suppose that $\sigma$ and $b$ grow at most linearly, which means:
\[ |\sigma(x)| \le c(1 + |x|), \qquad |b(x)| \le c(1 + |x|). \]

Theorem 15.1. There exists one and only one solution to (15.2).

The idea of the proof is Picard iteration, which is how existence and uniqueness for ordinary differential equations is proved; see Note 1.

The intuition behind (15.1) is that $X_t$ behaves locally like a multiple of Brownian motion plus a constant drift: locally $X_{t+h} - X_t \approx \sigma(W_{t+h} - W_t) + b((t + h) - t)$. However the constants $\sigma$ and $b$ depend on the current value of $X_t$. When $X_t$ is at different points, the coefficients vary, which is why they are written $\sigma(X_t)$ and $b(X_t)$. $\sigma$ is sometimes called the diffusion coefficient and $b$ is sometimes called the drift coefficient.

The above theorem also works in higher dimensions. We want to solve
\[ dX_t^i = \sum_{j=1}^d \sigma_{ij}(X_s)\,dW_s^j + b_i(X_s)\,ds, \qquad i = 1, \ldots, d. \]
This is an abbreviation for the equation
\[ X_t^i = x^i + \int_0^t \sum_{j=1}^d \sigma_{ij}(X_s)\,dW_s^j + \int_0^t b_i(X_s)\,ds. \]
Here the initial value is $x_0 = (x^1, \ldots, x^d)$, the solution process is $X_t = (X_t^1, \ldots, X_t^d)$, and $W_t^1, \ldots, W_t^d$ are $d$ independent Brownian motions. If all of the $\sigma_{ij}$ and $b_i$ are Lipschitz and grow at most linearly, we have existence and uniqueness for the solution.
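The local description $X_{t+h} - X_t \approx \sigma(X_t)(W_{t+h} - W_t) + b(X_t)h$ suggests a simple way to simulate a one-dimensional solution numerically, usually called the Euler-Maruyama scheme. The sketch below is an illustration only and is not the Picard iteration used in the proof; the function name `euler_maruyama` and its parameters are assumptions made here, and the step size is taken to be constant.

```python
import math
import random


def euler_maruyama(sigma, b, x0, t_end, n_steps, rng):
    """Approximate a path of dX = sigma(X) dW + b(X) dt on [0, t_end].

    sigma and b are functions of the current state, x0 is the starting
    point, and rng is a random.Random instance supplying the increments.
    Returns the list of approximate values at the n_steps + 1 grid times.
    """
    h = t_end / n_steps
    x = x0
    path = [x]
    for _ in range(n_steps):
        # A Brownian increment over a step of length h is N(0, h).
        dw = rng.gauss(0.0, math.sqrt(h))
        x = x + sigma(x) * dw + b(x) * h
        path.append(x)
    return path


if __name__ == "__main__":
    rng = random.Random(1)
    # Lipschitz coefficients with linear growth: sigma(x) = 0.2 x, b(x) = 0.1 x.
    path = euler_maruyama(lambda x: 0.2 * x, lambda x: 0.1 * x, 1.0, 1.0, 1000, rng)
    print(path[-1])
```

With $\sigma \equiv 0$ the scheme reduces to Euler's method for the ordinary differential equation $dX_t = b(X_t)\,dt$, which gives a quick sanity check.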

Suppose one wants to solve
\[ dZ_t = aZ_t\,dW_t + bZ_t\,dt. \]
Note that this equation is linear in $Z_t$, and it turns out that linear equations are almost the only ones that have an explicit solution. In this case we can write down the explicit solution and then verify that it satisfies the SDE. The uniqueness result above (Theorem 15.1) shows that we have in fact found the solution. Let
\[ Z_t = Z_0\,e^{aW_t - a^2 t/2 + bt}. \]
We will verify that this is correct by using Ito's formula. Let $X_t = aW_t - a^2 t/2 + bt$. Then $X_t$ is a semimartingale with martingale part $aW_t$ and $\langle X\rangle_t = a^2 t$, and $Z_t = Z_0 e^{X_t}$. By Ito's formula with $f(x) = e^x$,
\[ Z_t = Z_0 + \int_0^t Z_0 e^{X_s}\,dX_s + \frac12 \int_0^t Z_0 e^{X_s} a^2\,ds \]
\[ = Z_0 + \int_0^t aZ_s\,dW_s - \int_0^t \frac{a^2}{2} Z_s\,ds + \int_0^t bZ_s\,ds + \frac12 \int_0^t a^2 Z_s\,ds \]
\[ = Z_0 + \int_0^t aZ_s\,dW_s + \int_0^t bZ_s\,ds. \]
This is the integrated form of the equation we wanted to solve.

There is a connection between SDEs and partial differential equations. Let $f$ be a $C^2$ function. If we apply Ito's formula,
\[ f(X_t) = f(X_0) + \int_0^t f'(X_s)\,dX_s + \frac12 \int_0^t f''(X_s)\,d\langle X\rangle_s. \]
From (15.2) we know $\langle X\rangle_t = \int_0^t \sigma(X_s)^2\,ds$. If we substitute for $dX_s$ and $d\langle X\rangle_s$, we obtain
\[ f(X_t) = f(X_0) + \int_0^t f'(X_s)\sigma(X_s)\,dW_s + \int_0^t f'(X_s)\,b(X_s)\,ds + \frac12 \int_0^t f''(X_s)\sigma(X_s)^2\,ds \]
\[ = f(X_0) + \int_0^t f'(X_s)\sigma(X_s)\,dW_s + \int_0^t Lf(X_s)\,ds, \]
where we write
\[ Lf(x) = \frac12 \sigma(x)^2 f''(x) + b(x) f'(x). \]

$L$ is an example of a differential operator. Since the stochastic integral with respect to a Brownian motion is a martingale, we see from the above that
\[ f(X_t) - f(X_0) - \int_0^t Lf(X_s)\,ds \]
is a martingale. This fact can be exploited to derive results about PDEs from SDEs and vice versa.

Note 1. Let us illustrate the uniqueness part, and for simplicity, assume $b$ is identically 0 and $\sigma$ is bounded.

Proof of uniqueness. If $X$ and $Y$ are two solutions,
\[ X_t - Y_t = \int_0^t [\sigma(X_s) - \sigma(Y_s)]\,dW_s. \]
So
\[ E|X_t - Y_t|^2 = E\int_0^t |\sigma(X_s) - \sigma(Y_s)|^2\,ds \le c\int_0^t E|X_s - Y_s|^2\,ds, \]
using the Lipschitz hypothesis on $\sigma$. If we let $g(t) = E|X_t - Y_t|^2$, we have
\[ g(t) \le c\int_0^t g(s)\,ds. \]
Since we are assuming $\sigma$ is bounded, $E|X_t - x_0|^2 = E\int_0^t (\sigma(X_s))^2\,ds \le ct$ and similarly for $E|Y_t - x_0|^2$, so $g(t) \le ct$. Then
\[ g(t) \le c\int_0^t \Big(c\int_0^s g(r)\,dr\Big)\,ds. \]
Iteration implies
\[ g(t) \le At^n/n! \]
for each $n$, which implies $g$ must be 0.
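The iteration at the end of the uniqueness proof can be written out step by step. The following is a sketch under the same simplifying assumptions ($b \equiv 0$, $\sigma$ bounded), taking for convenience a single constant $c$ that works in both bounds $g(t) \le ct$ and $g(t) \le c\int_0^t g(s)\,ds$; that choice of a common constant is an assumption of this illustration. Substituting each bound into the integral inequality gives the next one:
\[ g(t) \le c\int_0^t cs\,ds = \frac{c^2 t^2}{2}, \qquad g(t) \le c\int_0^t \frac{c^2 s^2}{2}\,ds = \frac{c^3 t^3}{3!}, \qquad \ldots, \qquad g(t) \le \frac{c^n t^n}{n!}. \]
For fixed $t$ the right hand side tends to 0 as $n \to \infty$, because $n!$ grows faster than $(ct)^n$. Since $g(t) \ge 0$, this forces $g(t) = 0$, that is, $X_t = Y_t$ almost surely.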

16. Continuous time financial models.

The most common model by far in finance is one where the security price is based on a Brownian motion. One does not want to say the price is some multiple of Brownian motion for two reasons. First, a Brownian motion can become negative, which doesn't make sense for stock prices. Second, if one invests $1,000 in a stock selling for $1 and it goes up to $2, one has the same profit, namely, $1,000, as if one invests $1,000 in a stock selling for $100 and it goes up to $200. It is the proportional increase one wants.

Therefore one sets $\Delta S_t/S_t$ to be the quantity related to a Brownian motion. Different stocks have different volatilities $\sigma$ (consider a high-tech stock versus a pharmaceutical). In addition, one expects a mean rate of return $\mu$ on one's investment that is positive (otherwise, why not just put the money in the bank?). In fact, one expects the mean rate of return to be higher than the risk-free interest rate $r$ because one expects something in return for undertaking risk.

So the model that is used is to let the stock price be modeled by the SDE
\[ dS_t/S_t = \sigma\,dW_t + \mu\,dt, \]
or what looks better,
\[ dS_t = \sigma S_t\,dW_t + \mu S_t\,dt. \qquad (16.1) \]
Fortunately this SDE is one of those that can be solved explicitly, and in fact we gave the solution in Section 15.

Proposition 16.1. The solution to (16.1) is given by
\[ S_t = S_0\,e^{\sigma W_t + (\mu - \sigma^2/2)t}. \qquad (16.2) \]

Proof. Using Theorem 15.1 there will only be one solution, so we need to verify that $S_t$ as given in (16.2) satisfies (16.1). We already did this, but it is important enough that we will do it again. Let us first assume $S_0 = 1$. Let $X_t = \sigma W_t + (\mu - \sigma^2/2)t$, let $f(x) = e^x$, and apply Ito's formula. We obtain
\[ S_t = e^{X_t} = e^{X_0} + \int_0^t e^{X_s}\,dX_s + \frac12 \int_0^t e^{X_s}\,d\langle X\rangle_s \]
\[ = 1 + \int_0^t S_s\sigma\,dW_s + \int_0^t S_s\big(\mu - \tfrac12\sigma^2\big)\,ds + \frac12 \int_0^t S_s\sigma^2\,ds \]
\[ = 1 + \int_0^t S_s\sigma\,dW_s + \int_0^t S_s\mu\,ds, \]

which is (16.1). If $S_0 \ne 1$, just multiply both sides by $S_0$.

Suppose for the moment that the interest rate $r$ is 0. If one purchases $\Delta_0$ shares (possibly a negative number) at time $t_0$, then changes the investment to $\Delta_1$ shares at time $t_1$, then changes the investment to $\Delta_2$ at time $t_2$, etc., then one's wealth at time $t$ will be
\[ X_{t_0} + \Delta_0(S_{t_1} - S_{t_0}) + \Delta_1(S_{t_2} - S_{t_1}) + \cdots + \Delta_i(S_{t_{i+1}} - S_{t_i}). \qquad (16.3) \]
To see this, at time $t_0$ one has the original wealth $X_{t_0}$. One buys $\Delta_0$ shares and the cost is $\Delta_0 S_{t_0}$. At time $t_1$ one sells the $\Delta_0$ shares for the price of $S_{t_1}$ per share, and so one's wealth is now $X_{t_0} + \Delta_0(S_{t_1} - S_{t_0})$. One now pays $\Delta_1 S_{t_1}$ for $\Delta_1$ shares at time $t_1$ and continues.

The right hand side of (16.3) is the same as
\[ X_{t_0} + \int_{t_0}^t \Delta(s)\,dS_s, \]
where we have $t \ge t_{i+1}$ and $\Delta(s) = \Delta_i$ for $t_i \le s < t_{i+1}$. In other words, our wealth is given by a stochastic integral with respect to the stock price. The requirement that the integrand of a stochastic integral be adapted is very natural: we cannot base the number of shares we own at time $s$ on information that will not be available until the future.

How should we modify this when the interest rate $r$ is not zero? Let $P_t$ be the present value of the stock price. So
\[ P_t = e^{-rt}S_t. \]
Note that $P_0 = S_0$. When we hold $\Delta_i$ shares of stock from $t_i$ to $t_{i+1}$, our profit in present day dollars will be $\Delta_i(P_{t_{i+1}} - P_{t_i})$. The formula for our wealth then becomes
\[ X_{t_0} + \int_{t_0}^t \Delta(s)\,dP_s. \]
By Ito's product formula,
\[ dP_t = e^{-rt}\,dS_t - re^{-rt}S_t\,dt = e^{-rt}\sigma S_t\,dW_t + e^{-rt}\mu S_t\,dt - re^{-rt}S_t\,dt = \sigma P_t\,dW_t + (\mu - r)P_t\,dt. \]
Similarly to (16.2), the solution to this SDE is
\[ P_t = P_0\,e^{\sigma W_t + (\mu - r - \sigma^2/2)t}. \qquad (16.4) \]

The continuous time model of finance is that the security price is given by (16.1) (often called geometric Brownian motion), that there are no transaction costs, but one can trade as many shares as one wants and vary the amount held in a continuous fashion. This clearly is not the way the market actually works (for example, stock prices are discrete), but this model has proved to be a very good one.
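The discrete wealth formula (16.3) and its present-value version are easy to compute directly. The following is a minimal sketch; the function names `wealth` and `present_value_wealth` and the list-based inputs are assumptions made for illustration. Here `shares[i]` plays the role of $\Delta_i$, held from $t_i$ to $t_{i+1}$, and `prices[i]` plays the role of $S_{t_i}$.

```python
import math


def wealth(initial_wealth, shares, prices):
    """Wealth after trading, as in (16.3), with interest rate 0.

    shares[i] is the number of shares held from time t_i to t_{i+1};
    prices[i] is the stock price at time t_i, so prices has one more
    entry than shares.
    """
    if len(prices) != len(shares) + 1:
        raise ValueError("prices must have exactly one more entry than shares")
    total = initial_wealth
    for i, delta in enumerate(shares):
        total += delta * (prices[i + 1] - prices[i])
    return total


def present_value_wealth(initial_wealth, shares, prices, times, r):
    """Same sum, but with each price replaced by P_t = exp(-r t) S_t."""
    discounted = [math.exp(-r * t) * s for t, s in zip(times, prices)]
    return wealth(initial_wealth, shares, discounted)


if __name__ == "__main__":
    # Hold 10 shares over the first period, then short 5 over the second.
    print(wealth(1000.0, [10, -5], [100.0, 110.0, 104.0]))
```

Note that `shares[i]` must be chosen using only information available at time $t_i$; that is the discrete counterpart of the adaptedness requirement on the integrand.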

17. Markov properties of Brownian motion.

Let $W_t$ be a Brownian motion. Because $W_{t+r} - W_t$ is independent of $\sigma(W_s : s \le t)$, knowing the path of $W$ up to time $t$ gives no help in predicting $W_{t+r} - W_t$. In particular, if we want to predict $W_{t+r}$ and we know $W_t$, then knowing the path up to time $t$ gives no additional advantage in predicting $W_{t+r}$. Phrased another way, this says that to predict the future, we only need to know where we are and not how we got there.

Let's try to give a more precise description of this property, which is known as the Markov property. Fix $r$ and let $Z_t = W_{t+r} - W_r$. Clearly the map $t \to Z_t$ is continuous since the same is true for $W$. Since $Z_t - Z_s = W_{t+r} - W_{s+r}$, the distribution of $Z_t - Z_s$ is normal with mean zero and variance $(t + r) - (s + r)$. One can also check the other parts of the definition to show that $Z_t$ is also a Brownian motion.

Recall that a stopping time in the continuous framework is a r.v. $T$ taking values in $[0, \infty)$ such that $(T \le t) \in \mathcal{F}_t$ for all $t$. To make a satisfactory theory, we need that the $\mathcal{F}_t$ be right continuous (see Section 10), but this is fairly technical and we will ignore it.

If $T$ is a stopping time, $\mathcal{F}_T$ is the collection of events $A$ such that $A \cap (T \le t) \in \mathcal{F}_t$ for all $t$.

Let us try to provide some motivation for this definition of $\mathcal{F}_T$. It will be simpler to consider the discrete time case. The analogue of $\mathcal{F}_T$ in the discrete case is the following: if $N$ is a stopping time, let
\[ \mathcal{F}_N = \{A : A \cap (N \le k) \in \mathcal{F}_k \text{ for all } k\}. \]
If $X_k$ is a sequence that is adapted to the σ-fields $\mathcal{F}_k$, that is, $X_k$ is $\mathcal{F}_k$ measurable when $k = 0, 1, 2, \ldots$, then knowing which events in $\mathcal{F}_k$ have occurred allows us to calculate $X_k$ for each $k$. So a reasonable definition of $\mathcal{F}_N$ should allow us to calculate $X_N$ whenever we know which events in $\mathcal{F}_N$ have occurred or not. Or phrased another way, we want $X_N$ to be $\mathcal{F}_N$ measurable. Where did the sequence $X_k$ come from? It could be any adapted sequence. Therefore one definition of the σ-field of events occurring before time $N$ might be:

Consider the collection of random variables $X_N$ where $X_k$ is a sequence adapted to $\mathcal{F}_k$. Let $\mathcal{G}_N$ be the smallest σ-field with respect to which each of these random variables $X_N$ is measurable.

In other words, we want $\mathcal{G}_N$ to be the σ-field generated by the collection of random variables $X_N$ for all sequences $X_k$ that are adapted to $\mathcal{F}_k$.

We show in Note 1 that $\mathcal{F}_N = \mathcal{G}_N$. The σ-field $\mathcal{F}_N$ is just a bit easier to work with.

Now we proceed to the strong Markov property for Brownian motion, the proof of which is given in Note 2.

Proposition 17.1. If $X_t$ is a Brownian motion and $T$ is a bounded stopping time, then $X_{T+t} - X_T$ is a mean 0 variance $t$ random variable and is independent of $\mathcal{F}_T$.

This proposition says: if you want to predict $X_{T+t}$, you could do it knowing all of $\mathcal{F}_T$ or just knowing $X_T$. Since $X_{T+t} - X_T$ is independent of $\mathcal{F}_T$, the extra information given in $\mathcal{F}_T$ does you no good at all.

We need a way of expressing the Markov and strong Markov properties that will generalize to other processes.

Let $W_t$ be a Brownian motion. Consider the process $W_t^x = x + W_t$, which is known as Brownian motion started at $x$. Define $\Omega'$ to be the set of continuous functions on $[0, \infty)$, let $X_t(\omega) = \omega(t)$, and let the σ-field be the one generated by the $X_t$. Define $P^x$ on $(\Omega', \mathcal{F}')$ by
\[ P^x(X_{t_1} \in A_1, \ldots, X_{t_n} \in A_n) = P(W_{t_1}^x \in A_1, \ldots, W_{t_n}^x \in A_n). \]
What we have done is gone from one probability space $\Omega$ with many processes $W_t^x$ to one process $X_t$ with many probability measures $P^x$.

An example in the Markov chain setting might help. No knowledge of Markov chains is necessary to understand this. Suppose we have a Markov chain with 3 states, A, B, and C. Suppose we have a probability $P$ and three different Markov chains. The first, called $X_n^A$, represents the position at time $n$ for the chain started at A. So $X_0^A = A$, and $X_1^A$ can be one of A, B, C, $X_2^A$ can be one of A, B, C, and so on. Similarly we have $X_n^B$, the chain started at B, and $X_n^C$. Define $\Omega' = \{(AAA), (AAB), (ABA), \ldots, (BAA), (BAB), \ldots\}$. So $\Omega'$ denotes the possible sequence of states for time $n = 0, 1, 2$. If $\omega = ABA$, set $Y_0(\omega) = A$, $Y_1(\omega) = B$, $Y_2(\omega) = A$, and similarly for all the other 26 values of $\omega$. Define $P^A(AAA) = P(X_0^A = A, X_1^A = A, X_2^A = A)$. Similarly define $P^A(AAB), \ldots$. Define $P^B(AAA) = P(X_0^B = A, X_1^B = A, X_2^B = A)$ (this will be 0 because we know $X_0^B = B$), and similarly for the other values of $\omega$. We also define $P^C$. So we now have one process, $Y_n$, and three probabilities $P^A$, $P^B$, $P^C$. As you can see, there really isn't all that much going on here.

Here is another formulation of the Markov property.

Proposition 17.2. If $s < t$ and $f$ is bounded or nonnegative, then
\[ E^x[f(X_t) \mid \mathcal{F}_s] = E^{X_s}[f(X_{t-s})], \qquad \text{a.s.} \]

The right hand side is to be interpreted as follows. Define $\varphi(x) = E^x f(X_{t-s})$. Then $E^{X_s} f(X_{t-s})$ means $\varphi(X_s(\omega))$. One often writes $P_t f(x)$ for $E^x f(X_t)$. We prove this in Note 3.

This formula generalizes: If $s < t < u$, then
\[ E^x[f(X_t)g(X_u) \mid \mathcal{F}_s] = E^{X_s}[f(X_{t-s})g(X_{u-s})], \]

and so on for functions of $X$ at more times.

Using Proposition 17.1, the statement and proof of Proposition 17.2 can be extended to stopping times.

Proposition 17.3. If $T$ is a bounded stopping time, then
\[ E^x[f(X_{T+t}) \mid \mathcal{F}_T] = E^{X_T}[f(X_t)]. \]

We can also establish the Markov property and strong Markov property in the context of solutions of stochastic differential equations. If we let $X_t^x$ denote the solution to
\[ X_t^x = x + \int_0^t \sigma(X_s^x)\,dW_s + \int_0^t b(X_s^x)\,ds, \]
so that $X_t^x$ is the solution of the SDE started at $x$, we can define new probabilities by
\[ P^x(X_{t_1} \in A_1, \ldots, X_{t_n} \in A_n) = P(X_{t_1}^x \in A_1, \ldots, X_{t_n}^x \in A_n). \]
This is similar to what we did in defining $P^x$ for Brownian motion, but here we do not have translation invariance. One can show that when there is uniqueness for the solution to the SDE, the family $(P^x, X_t)$ satisfies the Markov and strong Markov property. The statement is precisely the same as the statement of Proposition 17.3.

Note 1. We want to show $\mathcal{G}_N = \mathcal{F}_N$. Since $\mathcal{G}_N$ is the smallest σ-field with respect to which $X_N$ is measurable for all adapted sequences $X_k$ and it is easy to see that $\mathcal{F}_N$ is a σ-field, to show $\mathcal{G}_N \subset \mathcal{F}_N$, it suffices to show that $X_N$ is measurable with respect to $\mathcal{F}_N$ whenever $X_k$ is adapted. Therefore we need to show that for such a sequence $X_k$ and any real number $a$, the event $(X_N > a) \in \mathcal{F}_N$.

Now $(X_N > a) \cap (N = j) = (X_j > a) \cap (N = j)$. The event $(X_j > a) \in \mathcal{F}_j$ since $X$ is an adapted sequence. Since $N$ is a stopping time, then $(N \le j) \in \mathcal{F}_j$ and $(N \le j - 1)^c \in \mathcal{F}_{j-1} \subset \mathcal{F}_j$, and so the event $(N = j) = (N \le j) \cap (N \le j - 1)^c$ is in $\mathcal{F}_j$. If $j \le k$, then $(N = j) \in \mathcal{F}_j \subset \mathcal{F}_k$. Therefore
\[ (X_N > a) \cap (N \le k) = \bigcup_{j=0}^k \big((X_N > a) \cap (N = j)\big) \in \mathcal{F}_k, \]
which proves that $(X_N > a) \in \mathcal{F}_N$.

To show $\mathcal{F}_N \subset \mathcal{G}_N$, we suppose that $A \in \mathcal{F}_N$. Let $X_k = 1_{A \cap (N \le k)}$. Since $A \in \mathcal{F}_N$, then $A \cap (N \le k) \in \mathcal{F}_k$, so $X_k$ is $\mathcal{F}_k$ measurable. But $X_N = 1_{A \cap (N \le N)} = 1_A$, so $A = (X_N > 0) \in \mathcal{G}_N$. We have thus shown that $\mathcal{F}_N \subset \mathcal{G}_N$, and combining with the previous paragraph, we conclude $\mathcal{F}_N = \mathcal{G}_N$.
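The three-state Markov chain example from earlier in this section (one coordinate process $Y_n$ on the 27 three-letter sequences, carrying three probabilities $P^A$, $P^B$, $P^C$) can be made concrete in a few lines. This is an illustrative sketch only: the transition probabilities in `TRANSITION` are hypothetical numbers chosen here, since the notes do not specify any, and the names `OMEGA`, `coordinate` and `prob_started_at` are likewise introduced for this example.

```python
from itertools import product

STATES = "ABC"

# Hypothetical one-step transition probabilities (each row sums to 1).
TRANSITION = {
    "A": {"A": 0.5, "B": 0.25, "C": 0.25},
    "B": {"A": 0.25, "B": 0.5, "C": 0.25},
    "C": {"A": 0.25, "B": 0.25, "C": 0.5},
}

# The sample space: all 27 sequences of states at times n = 0, 1, 2.
OMEGA = ["".join(w) for w in product(STATES, repeat=3)]


def coordinate(n, omega):
    """The single process Y_n: read off the state at time n."""
    return omega[n]


def prob_started_at(start, omega):
    """P^start(omega): probability of the sequence for the chain started at start."""
    if omega[0] != start:
        return 0.0  # the chain started at `start` cannot begin elsewhere
    p = 1.0
    for prev, nxt in zip(omega, omega[1:]):
        p *= TRANSITION[prev][nxt]
    return p


if __name__ == "__main__":
    print(coordinate(1, "ABA"), prob_started_at("A", "ABA"), prob_started_at("B", "AAA"))
```

The process `coordinate` never changes; only the probability attached to each sequence depends on the starting state, which is the point of passing from many processes to many measures.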

Note 2. Let $T_n$ be defined by $T_n(\omega) = (k + 1)/2^n$ if $T(\omega) \in [k/2^n, (k + 1)/2^n)$. It is easy to check that $T_n$ is a stopping time. Let $f$ be continuous and $A \in \mathcal{F}_T$. Then $A \in \mathcal{F}_{T_n}$ as well. We have
\[ E[f(X_{T_n+t} - X_{T_n});\ A] = \sum_k E\big[f(X_{\frac{k}{2^n}+t} - X_{\frac{k}{2^n}});\ A \cap (T_n = k/2^n)\big] \]
\[ = \sum_k E\big[f(X_{\frac{k}{2^n}+t} - X_{\frac{k}{2^n}})\big]\,P(A \cap (T_n = k/2^n)) = E f(X_t)\,P(A). \]
Let $n \to \infty$, so
\[ E[f(X_{T+t} - X_T);\ A] = E f(X_t)\,P(A). \]
Taking limits this equation holds for all bounded $f$.

If we take $A = \Omega$ and $f = 1_B$, we see that $X_{T+t} - X_T$ has the same distribution as $X_t$, which is that of a mean 0 variance $t$ normal random variable. If we let $A \in \mathcal{F}_T$ be arbitrary and $f = 1_B$, we see that
\[ P(X_{T+t} - X_T \in B,\ A) = P(X_t \in B)P(A) = P(X_{T+t} - X_T \in B)P(A), \]
which implies that $X_{T+t} - X_T$ is independent of $\mathcal{F}_T$.

Note 3. Before proving Proposition 17.2, recall from undergraduate analysis that every bounded function is the limit of linear combinations of functions $e^{iux}$, $u \in \mathbb{R}$. This follows from using the inversion formula for Fourier transforms. There are various slightly different formulas for the Fourier transform. We use $\hat f(u) = \int e^{iux} f(x)\,dx$. If $f$ is smooth enough and has compact support, then one can recover $f$ by the formula
\[ f(x) = \frac{1}{2\pi}\int e^{-iux}\hat f(u)\,du. \]
We can first approximate this improper integral by
\[ \frac{1}{2\pi}\int_{-N}^N e^{-iux}\hat f(u)\,du \]
by taking $N$ larger and larger. For each $N$ we can approximate $\frac{1}{2\pi}\int_{-N}^N e^{-iux}\hat f(u)\,du$ by using Riemann sums. Thus we can approximate $f(x)$ by a linear combination of terms of the form $e^{iu_j x}$. Finally, bounded functions can be approximated by smooth functions with compact support.

Proof of Proposition 17.2. Let $f(x) = e^{iux}$. Then
\[ E^x[e^{iuX_t} \mid \mathcal{F}_s] = e^{iuX_s}\,E^x[e^{iu(X_t - X_s)} \mid \mathcal{F}_s] = e^{iuX_s} e^{-u^2(t-s)/2}. \]
On the other hand,
\[ \varphi(y) = E^y[f(X_{t-s})] = E[e^{iu(W_{t-s}+y)}] = e^{iuy} e^{-u^2(t-s)/2}. \]
So $\varphi(X_s) = E^x[e^{iuX_t} \mid \mathcal{F}_s]$. Using linearity and taking limits, we have the proposition for all $f$.

18. Martingale representation theorem.

In this section we want to show that every random variable that is $\mathcal{F}_t$ measurable can be written as a stochastic integral of Brownian motion. In the next section we use this to show that under the model of geometric Brownian motion the market is complete. This means that no matter what option one comes up with, one can exactly replicate the result (no matter what the market does) by buying and selling shares of stock.

In mathematical terms, we let $\mathcal{F}_t$ be the σ-field generated by $W_s$, $s \le t$. From (16.2) we see that $\mathcal{F}_t$ is also the same as the σ-field generated by $S_s$, $s \le t$, so it doesn't matter which one we work with. We want to show that if $V$ is $\mathcal{F}_t$ measurable, then there exists $H_s$ adapted such that
\[ V = V_0 + \int_0^t H_s\,dW_s, \qquad (18.1) \]
where $V_0$ is a constant.

Our goal is to prove

Theorem 18.1. If $V$ is $\mathcal{F}_t$ measurable and $E V^2 < \infty$, then there exists a constant $c$ and an adapted integrand $H_s$ with $E\int_0^t H_s^2\,ds < \infty$ such that
\[ V = c + \int_0^t H_s\,dW_s. \]

Before we prove this, let us explain why this is called a martingale representation theorem. Suppose $M_s$ is a martingale adapted to $\mathcal{F}_s$, where the $\mathcal{F}_s$ are the σ-fields generated by a Brownian motion. Suppose also that $E M_t^2 < \infty$. Set $V = M_t$. By Theorem 18.1, we can write
\[ M_t = V = c + \int_0^t H_s\,dW_s. \]
The stochastic integral is a martingale, so for $r \le t$,
\[ M_r = E[M_t \mid \mathcal{F}_r] = c + E\Big[\int_0^t H_s\,dW_s \,\Big|\, \mathcal{F}_r\Big] = c + \int_0^r H_s\,dW_s. \]
We already knew that stochastic integrals were martingales; what this says is the converse: every martingale can be represented as a stochastic integral. Don't forget that we need $E M_t^2 < \infty$ and $M_s$ adapted to the σ-fields of a Brownian motion.

In Note 1 we show that if every martingale can be represented as a stochastic integral, then every random variable $V$ that is $\mathcal{F}_t$ measurable can, too, provided $E V^2 < \infty$.

There are several proofs of Theorem 18.1. Unfortunately, they are all technical. We outline one proof here, giving details in the notes. We start with the following, proved in Note 2.

Proposition 18.2. Suppose
\[ V^n = c_n + \int_0^t H_s^n\,dW_s, \]
$c_n \to c$, $E|V^n - V|^2 \to 0$, and for each $n$ the process $H^n$ is adapted with $E\int_0^t (H_s^n)^2\,ds < \infty$. Then there exist a constant $c$ and an adapted $H_s$ with $E\int_0^t H_s^2\,ds < \infty$ so that
\[ V = c + \int_0^t H_s\,dW_s. \]

What this proposition says is that if we can represent a sequence of random variables $V^n$ and $V^n \to V$, then we can represent $V$.

Let $\mathcal{R}$ be the collection of random variables that can be represented as stochastic integrals. By this we mean
\[ \mathcal{R} = \Big\{V : E V^2 < \infty,\ V \text{ is } \mathcal{F}_t \text{ measurable},\ V = c + \int_0^t H_s\,dW_s \text{ for some adapted } H \text{ with } E\int_0^t H_s^2\,ds < \infty\Big\}. \]

Next we show $\mathcal{R}$ contains a particular collection of random variables. (The proof is in Note 3.)

Proposition 18.3. If $g$ is bounded, the random variable $g(W_t)$ is in $\mathcal{R}$.

An almost identical proof shows that if $f$ is bounded, then
\[ f(W_t - W_s) = c + \int_s^t H_r\,dW_r \]
for some $c$ and $H_r$.

Proposition 18.4. If $t_0 \le t_1 \le \cdots \le t_n \le t$ and $f_1, \ldots, f_n$ are bounded functions, then $f_1(W_{t_1} - W_{t_0}) f_2(W_{t_2} - W_{t_1}) \cdots f_n(W_{t_n} - W_{t_{n-1}})$ is in $\mathcal{R}$.

See Note 4 for the proof. We have shown that a large class of random variables is contained in $\mathcal{R}$.

We now finish the proof of Theorem 18.1.

Proof of Theorem 18.1. We have shown that random variables of the form
\[ f_1(W_{t_1} - W_{t_0}) f_2(W_{t_2} - W_{t_1}) \cdots f_n(W_{t_n} - W_{t_{n-1}}) \qquad (18.2) \]

are in $\mathcal{R}$. Clearly if $V_i \in \mathcal{R}$ for $i = 1, \ldots, m$, and $a_i$ are constants, then $a_1 V_1 + \cdots + a_m V_m$ is also in $\mathcal{R}$. Finally, from measure theory we know that if $E V^2 < \infty$ and $V$ is $\mathcal{F}_t$ measurable, we can find a sequence $V_k$ such that $E|V_k - V|^2 \to 0$ and each $V_k$ is a linear combination of random variables of the form given in (18.2). Now apply Proposition 18.2.

Note 1. Suppose we know that every martingale $M_s$ adapted to $\mathcal{F}_s$ with $E M_t^2 < \infty$ can be represented as $M_r = c + \int_0^r H_s\,dW_s$ for some suitable $H$. If $V$ is $\mathcal{F}_t$ measurable with $E V^2 < \infty$, let $M_r = E[V \mid \mathcal{F}_r]$. We know this is a martingale, so
\[ M_r = c + \int_0^r H_s\,dW_s \]
for suitable $H$. Applying this with $r = t$,
\[ V = E[V \mid \mathcal{F}_t] = M_t = c + \int_0^t H_s\,dW_s. \]

Note 2. We prove Proposition 18.2. By our assumptions,
\[ E|(V^n - c_n) - (V^m - c_m)|^2 \to 0 \]
as $n, m \to \infty$. So
\[ E\Big|\int_0^t (H_s^n - H_s^m)\,dW_s\Big|^2 \to 0. \]
From our formulas for stochastic integrals, this means
\[ E\int_0^t |H_s^n - H_s^m|^2\,ds \to 0. \]
This says that $H_s^n$ is a Cauchy sequence in the space $L^2$ (with respect to the norm $\|\cdot\|_2$ given by $\|Y\|_2 = \big(E\int_0^t Y_s^2\,ds\big)^{1/2}$). Measure theory tells us that $L^2$ is a complete metric space, so there exists $H_s$ such that
\[ E\int_0^t |H_s^n - H_s|^2\,ds \to 0. \]
In particular $H_s^n \to H_s$, and this implies $H_s$ is adapted. Another consequence, due to Fatou's lemma, is that $E\int_0^t H_s^2\,ds < \infty$.

Let $U_t = \int_0^t H_s\,dW_s$. Then as above,
\[ E|(V^n - c_n) - U_t|^2 = E\int_0^t (H_s^n - H_s)^2\,ds \to 0. \]
Therefore $U_t = V - c$, and $U$ has the desired form.

So if we multiply (18. Set K r = Kr if s ≤ r < t and 0 otherwise. If we multiply both sides by e −u2 t/2 . t t eXt = 1 + 0 eXs (−iu)dWs + 0 t eXs (u2 /2)ds + = 1 − iu 0 1 2 t eXs (−iu)2 ds 0 eXs dWs . then its Fourier transform f will also be very nice.2. which is a constant and hence adapted. g(Wu − Wt ) = d + t Ks dWs . we obtain t e −iuWt = cu + 0 u Hs dWs (18.. Then s X.2 we take limits and obtain the proposition. s = 0 H r K r dr = 0. By Ito’s formula with Xs = −iuWs + u2 s/2 and f (x) = ex . let us do the case n = 2 for clarity. Set H r = Hr if 0 ≤ s < t and 0 otherwise. 82 . because we approximate our integral by Riemann sums.Note 3.3 we now have that t u f (Wt ) = c + 0 Hs dWs . C ∞ with compact support). Y = cd + 0 [Xr K r + Yr H r ]dWr .3) for an appropriate constant cu and integrand H u . Y Then by the Ito product formula. s s Xs Ys = X0 Y0 + 0 s Xr dYr + 0 s Yr dXr + X. Here is the proof of Proposition 18. From Proposition 18. The argument is by induction.3. Let s s Xs = c + 0 H r dWr and Ys = d + 0 K r dWr . If f is a smooth function (e.) Now using Proposition 18.g. we obtain t f (Wt ) = c + 0 Hs dWs for some constant c and some adapted integrand H. (We implicitly used Proposition 18. Note 4. So we suppose V = f (Wt )g(Wu − Wt ). and then take a limit.3) by f (u) and integrate over u from −∞ to ∞.

If we now take $s = u$, that is exactly what we wanted. Note that $X_r \overline{K}_r + Y_r \overline{H}_r$ is 0 if $r > u$; this is needed to do the general induction step.
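A concrete instance of Theorem 18.1 may help fix ideas. The example below is an illustration added here and is not one of the steps of the proof; it takes $V = W_t^2$, which is $\mathcal{F}_t$ measurable with $E V^2 = 3t^2 < \infty$. Ito's formula with $f(x) = x^2$ gives
\[ W_t^2 = \int_0^t 2W_s\,dW_s + \frac12 \int_0^t 2\,ds = t + \int_0^t 2W_s\,dW_s. \]
So the representation holds with $c = t = E V$ and $H_s = 2W_s$, which is adapted and satisfies
\[ E\int_0^t H_s^2\,ds = \int_0^t 4s\,ds = 2t^2 < \infty. \]
As in the general theorem, the constant is the expectation of $V$ and the stochastic integral carries all of the randomness.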

19. Completeness.

Now let $P_t$ be a geometric Brownian motion. As we mentioned in Section 16, if $P_t = P_0 \exp(\sigma W_t + (\mu - r - \sigma^2/2)t)$, then given $P_t$ we can determine $W_t$ and vice versa, so the σ-fields generated by $P_t$ and $W_t$ are the same. Recall $P_t$ satisfies
\[ dP_t = \sigma P_t\,dW_t + (\mu - r)P_t\,dt. \]
Define a new probability $\overline{P}$ by
\[ \frac{d\overline{P}}{dP} = M_t = \exp(aW_t - a^2 t/2). \]
By the Girsanov theorem,
\[ \widetilde{W}_t = W_t - at \]
is a Brownian motion under $\overline{P}$. So
\[ dP_t = \sigma P_t\,d\widetilde{W}_t + \sigma P_t a\,dt + (\mu - r)P_t\,dt. \]
If we choose $a = -(\mu - r)/\sigma$, we then have
\[ dP_t = \sigma P_t\,d\widetilde{W}_t. \qquad (19.1) \]
Since $\widetilde{W}_t$ is a Brownian motion under $\overline{P}$, then $P_t$ must be a martingale, since it is a stochastic integral of a Brownian motion. We can rewrite (19.1) as
\[ d\widetilde{W}_t = \sigma^{-1} P_t^{-1}\,dP_t. \qquad (19.2) \]

Given an $\mathcal{F}_t$ measurable variable $V$, we know by Theorem 18.1 that there exist a constant $c$ and an adapted process $H_s$ such that $\overline{E}\int_0^t H_s^2\,ds < \infty$ and
\[ V = c + \int_0^t H_s\,d\widetilde{W}_s. \]
But then using (19.2) we have
\[ V = c + \int_0^t H_s \sigma^{-1} P_s^{-1}\,dP_s. \]
We have therefore proved

Theorem 19.1. If $P_t$ is a geometric Brownian motion and $V$ is $\mathcal{F}_t$ measurable and square integrable, then there exist a constant $c$ and an adapted process $K_s$ such that
\[ V = c + \int_0^t K_s\,dP_s. \]
Moreover, there is a probability $\overline{P}$ under which $P_t$ is a martingale.

The probability $\overline{P}$ is called the risk-neutral measure. Under $\overline{P}$ the present day value of the stock price is a martingale.

20. Black-Scholes formula, I.

We can now derive the formula for the price of any option. Let $T \ge 0$ be a fixed real. If $V$ is $\mathcal{F}_T$ measurable, we have by Theorem 19.1 that
\[ V = c + \int_0^T K_s\,dP_s, \qquad (20.1) \]
and under $\overline{P}$, the process $P_s$ is a martingale.

Theorem 20.1. The price of $V$ must be $\overline{E} V$.

Proof. This is the "no arbitrage" principle again. Suppose the price of the option $V$ at time 0 is $W_0$. Starting with 0 dollars, we can sell the option $V$ for $W_0$ dollars, and use the $W_0$ dollars to buy and trade shares of the stock. In fact, if we use $c$ of those dollars, and invest according to the strategy of holding $K_s$ shares at time $s$, then at time $T$ we will have
\[ e^{rT}(W_0 - c) + V \]
dollars. At time $T$ the buyer of our option exercises it and we use $V$ dollars to meet that obligation. That leaves us a profit of $e^{rT}(W_0 - c)$ if $W_0 > c$, without any risk. Therefore $W_0$ must be less than or equal to $c$. If $W_0 < c$, we just reverse things: we buy the option instead of sell it, and hold $-K_s$ shares of stock at time $s$. By the same argument, since we can't get a riskless profit, we must have $W_0 \ge c$, or $W_0 = c$.

Finally, under $\overline{P}$ the process $P_t$ is a martingale. So taking expectations in (20.1), we obtain
\[ \overline{E} V = c. \]

The formula in the statement of Theorem 20.1 is amenable to calculation. Suppose we have the standard European option, where
\[ V = e^{-rT}(S_T - K)^+ = (e^{-rT}S_T - e^{-rT}K)^+ = (P_T - e^{-rT}K)^+. \]
Recall that under $\overline{P}$ the stock price satisfies
\[ dP_t = \sigma P_t\,d\widetilde{W}_t, \]
where $\widetilde{W}_t$ is a Brownian motion under $\overline{P}$. So then
\[ P_t = P_0\,e^{\sigma \widetilde{W}_t - \sigma^2 t/2}. \]

Hence
\[ \overline{E} V = \overline{E}[(P_T - e^{-rT}K)^+] = \overline{E}\big[(P_0 e^{\sigma \widetilde{W}_T - (\sigma^2/2)T} - e^{-rT}K)^+\big]. \qquad (20.2) \]
We know the density of $\widetilde{W}_T$ is just $(2\pi T)^{-1/2} e^{-y^2/(2T)}$, so we can do some calculations (see Note 1) and end up with the famous Black-Scholes formula:
\[ W_0 = x\Phi(g(x, T)) - Ke^{-rT}\Phi(h(x, T)), \]
where $\Phi(z) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^z e^{-y^2/2}\,dy$, $x = P_0 = S_0$,
\[ g(x, T) = \frac{\log(x/K) + (r + \sigma^2/2)T}{\sigma\sqrt{T}}, \qquad h(x, T) = g(x, T) - \sigma\sqrt{T}. \]

It is of considerable interest that the final formula depends on $\sigma$ but is completely independent of $\mu$. The reason for that can be explained as follows. Under $\overline{P}$ the process $P_t$ satisfies $dP_t = \sigma P_t\,d\widetilde{W}_t$, where $\widetilde{W}_t$ is a Brownian motion. Therefore, similarly to formulas we have already done,
\[ P_t = P_0\,e^{\sigma \widetilde{W}_t - \sigma^2 t/2}, \]
and there is no $\mu$ present here. (We used the Girsanov formula to get rid of the $\mu$.) The price of the option $V$ is
\[ \overline{E}[P_T - e^{-rT}K]^+, \qquad (20.3) \]
which is independent of $\mu$ since $P_t$ is.

Note 1. We want to calculate
\[ \overline{E}\big[(xe^{\sigma \widetilde{W}_T - \sigma^2 T/2} - e^{-rT}K)^+\big], \qquad (20.4) \]
where $\widetilde{W}_t$ is a Brownian motion under $\overline{P}$ and we write $x$ for $P_0 = S_0$. Since $\widetilde{W}_T$ is a normal random variable with mean 0 and variance $T$, we can write it as $\sqrt{T}Z$, where $Z$ is a standard mean 0 variance 1 normal random variable.

Now
\[ xe^{\sigma\sqrt{T}Z - \sigma^2 T/2} > e^{-rT}K \]
if and only if
\[ \log x + \sigma\sqrt{T}Z - \sigma^2 T/2 > -rT + \log K, \]

So (20. 2 87 . √ This is the Black-Scholes formula if we observe that σ T − z0 = g(x. Recall that 1 − Φ(z) = Φ(−z) for all z by the symmetry of the normal density. We write z0 for the right hand side of the above inequality. T ).4) is equal to ∞ √1 2π (xeσ z0 √ T z−σ 2 T /2 ∞ − e−rT K)+ e−z √ −2σ T z+σ 2 T √ 2 T) 2 /2 dz ∞ = x √1 2π = x √1 2π = x √1 2π e− 2 (z z0 ∞ z0 ∞ √ z0 −σ T 1 1 2 dz − Ke−rT √1 2π e−z z0 2 /2 dz e− 2 (z−σ dz − Ke−rT (1 − Φ(z0 )) e−y /2 dy − Ke−rT Φ(−z0 ) √ = x(1 − Φ(z0 − σ T )) − Ke−rT Φ(−z0 ) √ = xΦ(σ T − z0 ) − Ke−rT Φ(−z0 ). T ) and −z0 = h(x.or if Z > (σ 2 T /2) − r + log K − log x.

21. Hedging strategies.

The previous section allows us to compute the value of any option, but we would also like to know what the hedging strategy is. This means, if we know $V = \overline{E} V + \int_0^T H_s\,dS_s$, what should $H_s$ be? This might be important to know if we wanted to duplicate an option that was not available in the marketplace, or if we worked for a bank and wanted to provide an option for sale.

It is not always possible to compute $H$, but in many cases of interest it is possible. We illustrate one technique with two examples.

First, suppose we want to hedge the standard European call $V = e^{-rT}(S_T - K)^+ = (P_T - e^{-rT}K)^+$. We are working here with the risk-neutral probability only. It turns out it makes no difference: the definition of $\int_0^t H_s\,dX_s$ for a semimartingale $X$ does not depend on the probability $P$, other than worrying about some integrability conditions.

We can rewrite $V$ as
\[ V = \overline{E} V + g(\widetilde{W}_T), \]
where
\[ g(x) = (P_0 e^{\sigma x - \sigma^2 T/2} - e^{-rT}K)^+ - \overline{E} V. \]
Therefore the expectation of $g(\widetilde{W}_T)$ is 0. Recall that under $\overline{P}$, $\widetilde{W}$ is a Brownian motion. If we write $g(\widetilde{W}_T)$ as
\[ \int_0^T H_s\,d\widetilde{W}_s, \qquad (21.1) \]
then since $dP_t = \sigma P_t\,d\widetilde{W}_t$, we have
\[ g(\widetilde{W}_T) = c + \int_0^T \frac{1}{\sigma P_s} H_s\,dP_s. \qquad (21.2) \]
Therefore it suffices to find the representation of the form (21.1).

Recall from the section on the Markov property that
\[ P_t f(x) = E^x f(W_t) = E f(x + W_t) = \int \frac{1}{\sqrt{2\pi t}} e^{-y^2/2t} f(x + y)\,dy. \]
Let $M_t = \overline{E}[g(\widetilde{W}_T) \mid \mathcal{F}_t]$. By Proposition 4.3, we know that $M_t$ is a martingale. By the Markov property Proposition 17.2, we see that
\[ M_t = \overline{E}^{\widetilde{W}_t}[g(\widetilde{W}_{T-t})] = P_{T-t}\,g(\widetilde{W}_t). \qquad (21.3) \]

Now let us apply Ito's formula with the function $f(x_1, x_2) = P_{x_2} g(x_1)$ to the process $X_t = (X_t^1, X_t^2) = (\widetilde{W}_t, T - t)$. So we need to use the multidimensional version of Ito's formula. We have $dX_t^1 = d\widetilde{W}_t$ and $dX_t^2 = -dt$. Since $X_t^2$ is a decreasing process and has

no martingale part, then $d\langle X^2\rangle_t = 0$ and $d\langle X^1, X^2\rangle_t = 0$, while $d\langle X^1\rangle_t = dt$. Ito's formula says that
\[ f(X_t^1, X_t^2) = f(X_0^1, X_0^2) + \int_0^t \sum_{i=1}^2 \frac{\partial f}{\partial x_i}(X_t)\,dX_t^i + \frac12 \int_0^t \sum_{i,j=1}^2 \frac{\partial^2 f}{\partial x_i \partial x_j}(X_t)\,d\langle X^i, X^j\rangle_t \]
\[ = c + \int_0^t \frac{\partial f}{\partial x_1}(X_t)\,d\widetilde{W}_t + \text{some terms with } dt. \]
But we know that $f(X_t) = P_{T-t}\,g(\widetilde{W}_t) = M_t$ is a martingale, so the sum of the terms involving $dt$ must be zero; if not, $f(X_t)$ would have a bounded variation part. We conclude
\[ M_t = \int_0^t \frac{\partial}{\partial x} P_{T-s}\,g(\widetilde{W}_s)\,d\widetilde{W}_s. \]
If we take $t = T$, we then have
\[ g(\widetilde{W}_T) = M_T = \int_0^T \frac{\partial}{\partial x} P_{T-s}\,g(\widetilde{W}_s)\,d\widetilde{W}_s, \]
and we have our representation.

For a second example, let's look at the sell-high option. Here the payoff is $\sup_{s \le T} S_s$, the largest the stock price ever is up to time $T$. This is $\mathcal{F}_T$ measurable, so we can compute its value. How can one get the equivalent outcome without looking into the future?

For simplicity, let us suppose the interest rate $r$ is 0. Let $N_t = \sup_{s \le t} S_s$, the maximum up to time $t$. It is not the case that $N_t$ is a Markov process. Intuitively, the reasoning goes like this: suppose the maximum up to time 1 is $100, and we want to predict the maximum up to time 2. If the stock price at time 1 is close to $100, then we have one prediction, while if the stock price at time 1 is close to $2, we would definitely have another prediction. So the prediction for $N_2$ does not depend just on $N_1$, but also the stock price at time 1. This same intuitive reasoning does suggest, however, that the triple $Z_t = (S_t, N_t, t)$ is a Markov process, and this turns out to be correct. Adding in the information about the current stock price gives a certain amount of evidence to predict the future values of $N_t$; adding in the history of the stock prices up to time $t$ gives no additional information.

Once we believe this, the rest of the argument is very similar to the first example. Let $P_u f(z) = E^z f(Z_u)$, where $z = (s, n, t)$. Let $g(Z_t) = N_t - \overline{E} N_T$. Then
\[ M_t = \overline{E}[g(Z_T) \mid \mathcal{F}_t] = \overline{E}^{Z_t}[g(Z_{T-t})] = P_{T-t}\,g(Z_t). \]

90 . so all the dt and dNt terms must cancel. ∂s where again g(s. t) = PT −t g(s. s)dSs . Therefore we should be left with the martingale term. which is also of bounded variation.We then let f (s. which is t 0 ∂ PT −s g(Ss . and most cases can be done as well by an appropriate use of the Markov property. and hence N t = 0. which is the martingale term. n. which are of bounded variation. and it can be explicitly calculated. and we get a term involving dNt . Ns . n. t) and apply Ito’s formula. n. This gives us our hedging strategy for the sell-high option. we get some terms involving dt. When we apply Ito’s formula. The process Nt is always increasing. so has no martingale part. we get a dSt term. There is another way to calculate hedging strategies. But Mt is a martingale. using what is known as the Clark-Haussmann-Ocone formula. t) = n. This is a more complicated procedure.
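The first example above reduces hedging to computing $\frac{\partial}{\partial x}P_{T-s}\,g$. The following is a minimal numerical sketch of that recipe, not part of the original notes: the function names `heat_semigroup` and `hedge_integrand`, the trapezoid rule, and the finite-difference step are all illustrative assumptions. It is checked on the toy payoff $g(x) = x^2 - T$, for which $P_{T-s}\,g(x) = x^2 - s$ and the integrand should be $2x$.

```python
import math


def heat_semigroup(f, t, x, n=4000, width=10.0):
    """Approximate P_t f(x) = E f(x + W_t) with the trapezoid rule.

    The integration window is +/- `width` standard deviations (an assumption
    that is adequate for payoffs of moderate growth).
    """
    if t == 0:
        return f(x)
    sd = math.sqrt(t)
    lo, hi = -width * sd, width * sd
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        y = lo + i * h
        weight = 0.5 if i in (0, n) else 1.0
        density = math.exp(-y * y / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)
        total += weight * density * f(x + y)
    return total * h


def hedge_integrand(g, T, s, w, dx=1e-4):
    """Central-difference estimate of d/dx P_{T-s} g(x) at x = w."""
    up = heat_semigroup(g, T - s, w + dx)
    down = heat_semigroup(g, T - s, w - dx)
    return (up - down) / (2.0 * dx)


if __name__ == "__main__":
    T = 1.0
    g = lambda x: x * x - T          # toy payoff with mean zero
    print(hedge_integrand(g, T, 0.25, 0.7))  # close to 2 * 0.7
```

For the toy payoff this recovers the familiar identity $W_T^2 - T = \int_0^T 2W_s\,dW_s$; for the call payoff $g$ of the text one would pass that $g$ instead.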

22. Black-Scholes formula, II.

Here is a second approach to the Black-Scholes formula. This approach works for European calls and several other options, but does not work in the generality that the first approach does. On the other hand, it allows one to compute more easily what the equivalent strategy of buying or selling stock should be to duplicate the outcome of the given option. In this section we work with the actual price of the stock instead of the present value.

Let $V_t$ be the value of the portfolio and assume $V_t = f(S_t, T - t)$ for all $t$, where $f$ is some function that is sufficiently smooth. We also want $V_T = (S_T - K)^+$.

Recall Ito's formula. The multivariate version is
$$f(X_t) = f(X_0) + \int_0^t \sum_{i=1}^d f_{x_i}(X_s)\,dX_s^i + \frac12 \int_0^t \sum_{i,j=1}^d f_{x_i x_j}(X_s)\,d\langle X^i, X^j\rangle_s.$$
Here $X_t = (X_t^1, \ldots, X_t^d)$ and $f_{x_i}$ denotes the partial derivative of $f$ in the $x_i$ direction, and similarly for the second partial derivatives.

We apply this with $d = 2$ and $X_t = (S_t, T - t)$. From the SDE that $S_t$ solves, $d\langle X^1\rangle_t = \sigma^2 S_t^2\,dt$, $\langle X^2\rangle_t = 0$ (since $T - t$ is of bounded variation and hence has no martingale part), and $\langle X^1, X^2\rangle_t = 0$. Also, $dX_t^2 = -dt$. Writing $f_x$ and $f_s$ for the partial derivatives of $f(x, s)$ in its first and second variables, we get
$$V_t - V_0 = f(S_t, T - t) - f(S_0, T)$$
$$= \int_0^t f_x(S_u, T - u)\,dS_u - \int_0^t f_s(S_u, T - u)\,du + \frac12 \int_0^t \sigma^2 S_u^2 f_{xx}(S_u, T - u)\,du. \tag{22.1}$$
On the other hand, if $a_u$ and $b_u$ are the number of shares of stock and bonds, respectively, held at time $u$,
$$V_t - V_0 = \int_0^t a_u\,dS_u + \int_0^t b_u\,d\beta_u. \tag{22.2}$$
This formula says that the increase in net worth is given by the profit we obtain by holding $a_u$ shares of stock and $b_u$ bonds at time $u$. Since the value of the portfolio at time $t$ is
$$V_t = a_t S_t + b_t \beta_t,$$
we must have
$$b_t = (V_t - a_t S_t)/\beta_t. \tag{22.3}$$
Also, recall
$$\beta_t = \beta_0 e^{rt}. \tag{22.4}$$
To match up (22.2) with (22.1), we must therefore have
$$a_t = f_x(S_t, T - t) \tag{22.5}$$
and
$$r\,[f(S_t, T - t) - S_t f_x(S_t, T - t)] = -f_s(S_t, T - t) + \tfrac12 \sigma^2 S_t^2 f_{xx}(S_t, T - t) \tag{22.6}$$
for all $t$ and all $S_t$. Equation (22.6) leads to the parabolic PDE
$$f_s = \tfrac12 \sigma^2 x^2 f_{xx} + r x f_x - r f, \qquad (x, s) \in (0, \infty) \times [0, T), \tag{22.7}$$
and
$$f(x, 0) = (x - K)^+. \tag{22.8}$$
Solving this equation for $f$, $f(x, T)$ is what $V_0$ should be, i.e., the cost of setting up the equivalent portfolio. Equation (22.5) shows what the trading strategy should be.
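As a sanity check on (22.5)-(22.8), the sketch below takes the classical closed-form call price as a candidate for $f(x, s)$, with $s$ the time to maturity, and evaluates the PDE residual by finite differences. This is an illustration under stated assumptions rather than part of the notes: the function names, the step size `h`, and the parameter values are arbitrary choices.

```python
import math


def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bs_call(x, s, K, r, sigma):
    """Candidate f(x, s): stock price x, time to maturity s."""
    if s <= 0:
        return max(x - K, 0.0)       # boundary condition (22.8)
    d1 = (math.log(x / K) + (r + 0.5 * sigma ** 2) * s) / (sigma * math.sqrt(s))
    d2 = d1 - sigma * math.sqrt(s)
    return x * norm_cdf(d1) - K * math.exp(-r * s) * norm_cdf(d2)


def bs_delta(x, s, K, r, sigma):
    """Shares held, a_t = f_x(S_t, T - t), as in (22.5)."""
    d1 = (math.log(x / K) + (r + 0.5 * sigma ** 2) * s) / (sigma * math.sqrt(s))
    return norm_cdf(d1)


def pde_residual(x, s, K, r, sigma, h=1e-3):
    """f_s - (sigma^2 x^2 f_xx / 2 + r x f_x - r f), by central differences."""
    f = lambda a, b: bs_call(a, b, K, r, sigma)
    f_s = (f(x, s + h) - f(x, s - h)) / (2.0 * h)
    f_x = (f(x + h, s) - f(x - h, s)) / (2.0 * h)
    f_xx = (f(x + h, s) - 2.0 * f(x, s) + f(x - h, s)) / h ** 2
    return f_s - (0.5 * sigma ** 2 * x ** 2 * f_xx + r * x * f_x - r * f(x, s))


if __name__ == "__main__":
    print(pde_residual(10.0, 1.0, 10.0, 0.05, 0.3))  # should be near zero
    print(bs_delta(10.0, 1.0, 10.0, 0.05, 0.3))      # shares of stock to hold
```

The number of bonds then follows from (22.3) as $b_t = (V_t - a_t S_t)/\beta_t$.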

23. The fundamental theorem of finance.

In Section 19, we showed there was a probability measure under which $P_t = e^{-rt}S_t$ was a martingale. This is true very generally. Let $S_t$ be the price of a security in today's dollars. We will suppose $S_t$ is a continuous semimartingale, and can be written $S_t = M_t + A_t$.

Arbitrage means that there is a trading strategy $H_s$ such that there is no chance that we lose anything and there is a positive profit with positive probability. Mathematically, arbitrage exists if there exists $H_s$ that is adapted and satisfies a suitable integrability condition with
$$\int_0^T H_s\,dS_s \ge 0 \quad \text{a.s.}$$
and
$$P\Big(\int_0^T H_s\,dS_s > b\Big) > \varepsilon$$
for some $b, \varepsilon > 0$. It turns out that to get a necessary and sufficient condition for $S_t$ to be a martingale under an equivalent measure, we need a slightly weaker condition.

The NFLVR condition ("no free lunch with vanishing risk") is that there do not exist a fixed time $T$, $\varepsilon, b > 0$, and $H_n$ (that are adapted and satisfy the appropriate integrability conditions) such that
$$\int_0^t H_n(s)\,dS_s > -\frac{1}{n} \quad \text{a.s.}$$
for all $t \le T$ and
$$P\Big(\int_0^T H_n(s)\,dS_s > b\Big) > \varepsilon.$$
Here $T$, $b$, $\varepsilon$ do not depend on $n$. The condition says that one can with positive probability $\varepsilon$ make a profit of $b$ and with a loss no larger than $1/n$.

Two probabilities $P$ and $Q$ are equivalent if $P(A) = 0$ if and only if $Q(A) = 0$, i.e., the two probabilities have the same collection of sets of probability zero. $Q$ is an equivalent martingale measure if $Q$ is a probability measure, $Q$ is equivalent to $P$, and $S_t$ is a martingale under $Q$.

Theorem 23.1. If $S_t$ is a continuous semimartingale and the NFLVR condition holds, then there exists an equivalent martingale measure $Q$.

The proof is rather technical and involves some heavy-duty measure theory, so we will only examine a part of it. Suppose that we happened to have $S_t = W_t + f(t)$, where $f(t)$ is a deterministic increasing continuous function. To obtain the equivalent martingale measure, we would want to let
$$M_t = \exp\Big(-\int_0^t f'(s)\,dW_s - \frac12 \int_0^t (f'(s))^2\,ds\Big).$$

In order for $M_t$ to make sense, we need $f$ to be differentiable. A result from measure theory says that if $f$ is not differentiable, then we can find a subset $A$ of $[0, \infty)$ such that $\int_0^t 1_A(s)\,ds = 0$ but the amount of increase of $f$ over the set $A$ is positive. This last statement is phrased mathematically by saying
$$\int_0^t 1_A(s)\,df(s) > 0,$$
where the integral is a Riemann-Stieltjes (or better, a Lebesgue-Stieltjes) integral. Then if we hold $H_s = 1_A(s)$ shares at time $s$, our net profit is
$$\int_0^t H_s\,dS_s = \int_0^t 1_A(s)\,dW_s + \int_0^t 1_A(s)\,df(s).$$
The second term would be positive since this is the amount of increase of $f$ over the set $A$. The first term is 0, since $E\big(\int_0^t 1_A(s)\,dW_s\big)^2 = \int_0^t 1_A(s)^2\,ds = 0$. So our net profit is nonrandom and positive, or in other words, we have made a net gain without risk. This contradicts "no arbitrage." See Note 1 for more on this.

Sometimes Theorem 23.1 is called the first fundamental theorem of asset pricing. The second fundamental theorem is the following.

Theorem 23.2. The equivalent martingale measure is unique if and only if the market is complete.

We will not prove this.

Note 1. We will not prove Theorem 23.1, but let us give a few more indications of what is going on. First of all, recall the Cantor set. This is where $E_1 = [0, 1]$, $E_2$ is the set obtained from $E_1$ by removing the open interval $(\frac13, \frac23)$, $E_3$ is the set obtained from $E_2$ by removing the middle third from each of the two intervals making up $E_2$, and so on. The intersection, $E = \cap_{n=1}^\infty E_n$, is the Cantor set, and is closed, nonempty, in fact uncountable, yet it contains no intervals. Also, the Lebesgue measure of $E$ is 0. We set $A = E$. Let $f$ be the Cantor-Lebesgue function. This is the function that is equal to 0 on $(-\infty, 0]$, 1 on $[1, \infty)$, equal to $\frac12$ on the interval $[\frac13, \frac23]$, equal to $\frac14$ on $[\frac19, \frac29]$, equal to $\frac34$ on $[\frac79, \frac89]$, and is defined similarly on each interval making up the complement of $A$. It turns out we can define $f$ on $A$ so that it is continuous, and one can show $\int_0^1 1_A(s)\,df(s) = 1$. So this $A$ and $f$ provide a concrete example of what we were discussing.
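The Cantor-Lebesgue function in Note 1 can be evaluated exactly with rational arithmetic by reading off ternary digits. The sketch below is only an illustration (the names `cantor_function` and `stage_length` and the truncation depth are assumptions, not from the notes); it reproduces the values listed in Note 1 and shows the total length of $E_n$ shrinking to 0, which is why $A = E$ has Lebesgue measure 0 while $f$ still increases from 0 to 1.

```python
from fractions import Fraction


def cantor_function(x, depth=40):
    """Cantor-Lebesgue function at a rational x, via ternary digits.

    Exact whenever x lies in a removed middle third or is an endpoint of
    one; otherwise the result is accurate to within 2**(-depth).
    """
    x = Fraction(x)
    if x <= 0:
        return Fraction(0)
    if x >= 1:
        return Fraction(1)
    value = Fraction(0)
    scale = Fraction(1, 2)
    for _ in range(depth):
        x *= 3
        if x >= 2:            # ternary digit 2: add the current binary weight
            value += scale
            x -= 2
        elif x >= 1:          # inside a removed middle third: f is flat here
            return value + scale
        scale /= 2
    return value


def stage_length(n):
    """Total length of E_n, where E_1 = [0, 1]."""
    return Fraction(2, 3) ** (n - 1)


if __name__ == "__main__":
    for x in (Fraction(1, 3), Fraction(1, 9), Fraction(7, 9)):
        print(x, cantor_function(x))
    print(float(stage_length(30)))
```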


24. American puts.

The proper valuation of American puts is one of the important unsolved problems in mathematical finance. Recall that a European put pays out $(K - S_T)^+$ at time $T$, while an American put allows one to exercise early. If one exercises an American put at time $t < T$, one receives $(K - S_t)^+$. Then during the period $[t, T]$ one receives interest, and the amount one has is $(K - S_t)^+ e^{r(T-t)}$. In today's dollars that is the equivalent of $(K - S_t)^+ e^{-rt}$. One wants to find a rule, known as the exercise policy, for when to exercise the put, and then one wants to see what the value is for that policy. Since one cannot look into the future, one is in fact looking for a stopping time $\tau$ that maximizes
$$E\,e^{-r\tau}(K - S_\tau)^+.$$
There is no good theoretical solution to finding the stopping time $\tau$, although good approximations exist. We will, however, discuss just a bit of the theory of optimal stopping, which reworks the problem into another form.

Let $G_t$ denote the amount you will receive at time $t$. For American puts, we set
$$G_t = e^{-rt}(K - S_t)^+.$$
Our problem is to maximize $E\,G_\tau$ over all stopping times $\tau$.

We first need

Proposition 24.1. If $S$ and $T$ are bounded stopping times with $S \le T$ and $M$ is a martingale, then
$$E[M_T \mid \mathcal{F}_S] = M_S.$$

Proof. Let $A \in \mathcal{F}_S$. Define $U$ by
$$U(\omega) = \begin{cases} S(\omega) & \text{if } \omega \in A, \\ T(\omega) & \text{if } \omega \notin A. \end{cases}$$
It is easy to see that $U$ is a stopping time, so by Doob's optional stopping theorem,
$$E M_0 = E M_U = E[M_S; A] + E[M_T; A^c].$$
Also,
$$E M_0 = E M_T = E[M_T; A] + E[M_T; A^c].$$
Taking the difference, $E[M_T; A] = E[M_S; A]$, which is what we needed to show.

Given two supermartingales $X_t$ and $Y_t$, it is routine to check that $X_t \wedge Y_t$ is also a supermartingale. Also, if $X_t^n$ are supermartingales with $X_t^n \downarrow X_t$, one can check that $X_t$ is again a supermartingale. With these facts, one can show that given a process such as $G_t$, there is a least supermartingale larger than $G_t$.

So we define $W_t$ to be a supermartingale (with respect to $P$, of course) such that $W_t \ge G_t$ a.s. for each $t$ and if $Y_t$ is another supermartingale with $Y_t \ge G_t$ for all $t$, then $W_t \le Y_t$ for all $t$. (In this section $W_t$ denotes this least supermartingale, not a Brownian motion.) We set $\tau = \inf\{t : W_t = G_t\}$. We will show that $\tau$ is the solution to the problem of finding the optimal stopping time. Of course, computing $W_t$ and $\tau$ is another problem entirely.

Let
$$\mathcal{T}_t = \{\tau : \tau \text{ is a stopping time},\ t \le \tau \le T\}.$$
Let
$$V_t = \sup_{\tau \in \mathcal{T}_t} E[G_\tau \mid \mathcal{F}_t].$$

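Proposition 24.2. $V_t$ is a supermartingale and $V_t \ge G_t$ for all $t$.

Proof. The fixed time $t$ is a stopping time in $\mathcal{T}_t$, so $V_t \ge E[G_t \mid \mathcal{F}_t] = G_t$, or $V_t \ge G_t$, so we only need to show that $V_t$ is a supermartingale.

Suppose $s < t$. Let $\pi$ be the stopping time in $\mathcal{T}_t$ for which $V_t = E[G_\pi \mid \mathcal{F}_t]$. Since $\mathcal{T}_t \subset \mathcal{T}_s$, we have $\pi \in \mathcal{T}_s$. Then
$$E[V_t \mid \mathcal{F}_s] = E[G_\pi \mid \mathcal{F}_s] \le \sup_{\tau \in \mathcal{T}_s} E[G_\tau \mid \mathcal{F}_s] = V_s.$$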

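Proposition 24.3. If $Y_t$ is a supermartingale with $Y_t \ge G_t$ for all $t$, then $Y_t \ge V_t$.

Proof. If $\tau \in \mathcal{T}_t$, then since $Y_t$ is a supermartingale, we have
$$E[Y_\tau \mid \mathcal{F}_t] \le Y_t.$$
So
$$V_t = \sup_{\tau \in \mathcal{T}_t} E[G_\tau \mid \mathcal{F}_t] \le \sup_{\tau \in \mathcal{T}_t} E[Y_\tau \mid \mathcal{F}_t] \le Y_t.$$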

What we have shown is that $W_t$ is equal to $V_t$. It remains to show that $\tau$ is optimal. There may in fact be more than one optimal time, but in any case $\tau$ is one of them. Recall we have $\mathcal{F}_0$ is the $\sigma$-field generated by $S_0$, and hence consists of only $\emptyset$ and $\Omega$.

Proposition 24.4. $\tau$ is an optimal stopping time.

Proof. Since $\mathcal{F}_0$ is trivial, $V_0 = \sup_{\tau \in \mathcal{T}_0} E[G_\tau \mid \mathcal{F}_0] = \sup_\tau E[G_\tau]$. Let $\sigma$ be a stopping time where the supremum is attained. Then
$$V_0 \ge E[V_\sigma \mid \mathcal{F}_0] = E[V_\sigma] \ge E[G_\sigma] = V_0.$$
Therefore all the inequalities must be equalities. Since $V_\sigma \ge G_\sigma$, we must have $V_\sigma = G_\sigma$. Since $\tau$ was the first time that $W_t$ equals $G_t$ and $W_t = V_t$, we see that $\tau \le \sigma$. Then
$$E[G_\tau] = E[V_\tau] \ge E\,V_\sigma = E\,G_\sigma.$$
Therefore the expected value of $G_\tau$ is at least as large as the expected value of $G_\sigma$, and hence $\tau$ is also an optimal stopping time.

The above representation of the optimal stopping problem may seem rather bizarre. However, this procedure gives good usable results for some optimal stopping problems. An example is where $G_t$ is a function of just $W_t$.
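In discrete time the least supermartingale dominating the payoff can be computed by backward induction, which gives one of the "good approximations" mentioned at the start of this section. The sketch below is an illustration only: it assumes a Cox-Ross-Rubinstein binomial tree (a discretization not discussed in this section), works in time-$n$ dollars rather than today's dollars, and uses made-up parameter values.

```python
import math


def binomial_put(S0, K, r, sigma, T, steps, american=True):
    """Put value on a CRR binomial tree by backward induction.

    With american=True, each node takes the larger of the immediate payoff
    and the discounted risk-neutral expectation of the next step, a
    discrete analogue of the least supermartingale above the payoff.
    """
    dt = T / steps
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    disc = math.exp(-r * dt)
    p = (math.exp(r * dt) - d) / (u - d)   # risk-neutral up probability
    # Values at maturity; index j counts the number of up moves.
    values = [max(K - S0 * u ** j * d ** (steps - j), 0.0) for j in range(steps + 1)]
    for n in range(steps - 1, -1, -1):
        new_values = []
        for j in range(n + 1):
            v = disc * (p * values[j + 1] + (1.0 - p) * values[j])
            if american:
                payoff = max(K - S0 * u ** j * d ** (n - j), 0.0)
                v = max(v, payoff)         # stop when payoff >= continuation
            new_values.append(v)
        values = new_values
    return values[0]


if __name__ == "__main__":
    am = binomial_put(100.0, 100.0, 0.05, 0.2, 1.0, 200)
    eu = binomial_put(100.0, 100.0, 0.05, 0.2, 1.0, 200, american=False)
    print(am, eu)   # the American value is the larger of the two
```

The first node at which the payoff equals the computed value plays the role of $\tau = \inf\{t : W_t = G_t\}$.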

25. Term structure.

We now want to consider the case where the interest rate is nondeterministic, that is, it has a random component. To do so, we take another look at option pricing.

Accumulation factor. Let $r(t)$ be the (random) interest rate at time $t$. Let
$$\beta(t) = e^{\int_0^t r(u)\,du}$$
be the accumulation factor. One dollar at time $T$ will be worth $1/\beta(T)$ in today's dollars.

Let $V = (S_T - K)^+$ be the payoff on the standard European call option at time $T$ with strike price $K$, where $S_t$ is the stock price. In today's dollars it is worth, as we have seen, $V/\beta(T)$. Therefore the price of the option should be
$$\overline{E}\Big[\frac{V}{\beta(T)}\Big].$$
We can also get an expression for the value of the option at time $t$. The payoff, in terms of dollars at time $t$, should be the payoff at time $T$ discounted by the interest or inflation rate, and so should be
$$e^{-\int_t^T r(u)\,du}(S_T - K)^+.$$
Therefore the value at time $t$ is
$$\overline{E}\Big[e^{-\int_t^T r(u)\,du}(S_T - K)^+ \,\Big|\, \mathcal{F}_t\Big] = \overline{E}\Big[\frac{\beta(t)}{\beta(T)}V \,\Big|\, \mathcal{F}_t\Big] = \beta(t)\,\overline{E}\Big[\frac{V}{\beta(T)} \,\Big|\, \mathcal{F}_t\Big].$$
From now on we assume we have already changed to the risk-neutral measure and we write $P$ instead of $\overline{P}$.

Zero coupon. A zero coupon bond with maturity date $T$ pays \$1 at time $T$ and nothing before. This is equivalent to an option with payoff value $V = 1$. So its price at time $t$, as above, should be
$$B(t, T) = \beta(t)\,E\Big[\frac{1}{\beta(T)} \,\Big|\, \mathcal{F}_t\Big] = E\Big[e^{-\int_t^T r(u)\,du} \,\Big|\, \mathcal{F}_t\Big].$$
Let's derive the SDE satisfied by $B(t, T)$. Here $T$ is fixed. Let $N_t = E[1/\beta(T) \mid \mathcal{F}_t]$. This is a martingale. By the martingale representation theorem,
$$N_t = E[1/\beta(T)] + \int_0^t H_s\,dW_s$$
for some adapted integrand $H_s$. So $B(t, T) = \beta(t)N_t$. By Ito's product formula,
$$dB(t, T) = \beta(t)\,dN_t + N_t\,d\beta(t) = \beta(t)H_t\,dW_t + N_t r(t)\beta(t)\,dt = \beta(t)H_t\,dW_t + B(t, T)r(t)\,dt,$$
and we thus have
$$dB(t, T) = \beta(t)H_t\,dW_t + B(t, T)r(t)\,dt. \tag{25.1}$$

Forward rates. We now discuss forward rates. If one holds $T$ fixed and graphs $B(t, T)$ as a function of $t$, the graph will not clearly show the behavior of $r$. One sometimes specifies interest rates by what are known as forward rates.

Suppose we want to borrow \$1 at time $T$ and repay it with interest at time $T + \varepsilon$. At the present time we are at time $t \le T$. Let us try to accomplish this by buying a zero coupon bond with maturity date $T$ and shorting (i.e., selling) $N$ zero coupon bonds with maturity date $T + \varepsilon$. Our outlay of money at time $t$ is
$$B(t, T) - N B(t, T + \varepsilon).$$
If we set
$$N = B(t, T)/B(t, T + \varepsilon),$$
our outlay at time $t$ is 0. At time $T$ we receive \$1. At time $T + \varepsilon$ we pay $B(t, T)/B(t, T + \varepsilon)$. The effective rate of interest $R$ over the time period $T$ to $T + \varepsilon$ is given by
$$e^{\varepsilon R} = \frac{B(t, T)}{B(t, T + \varepsilon)}.$$
Solving for $R$, we have
$$R = \frac{\log B(t, T) - \log B(t, T + \varepsilon)}{\varepsilon}.$$
We now let $\varepsilon \to 0$. We define the forward rate by
$$f(t, T) = -\frac{\partial}{\partial T}\log B(t, T). \tag{25.2}$$
Sometimes interest rates are specified by giving $f(t, T)$ instead of $B(t, T)$ or $r(t)$.

Recovering $B$ from $f$. Let us see how to recover $B(t, T)$ from $f(t, T)$. Integrating, we have
$$\int_t^T f(t, u)\,du = -\int_t^T \frac{\partial}{\partial u}\log B(t, u)\,du = -\log B(t, u)\Big|_{u=t}^{u=T} = -\log B(t, T) + \log B(t, t).$$
Since $B(t, t)$ is the value of a zero coupon bond at time $t$ which expires at time $t$, it is equal to 1, and its log is 0. Solving for $B(t, T)$, we have
$$B(t, T) = e^{-\int_t^T f(t, u)\,du}. \tag{25.3}$$

Recovering $r$ from $f$. Next, let us show how to recover $r(t)$ from the forward rates. We have
$$B(t, T) = E\Big[e^{-\int_t^T r(u)\,du} \,\Big|\, \mathcal{F}_t\Big].$$
Differentiating,
$$\frac{\partial}{\partial T}B(t, T) = E\Big[-r(T)\,e^{-\int_t^T r(u)\,du} \,\Big|\, \mathcal{F}_t\Big].$$
Evaluating this when $T = t$, we obtain
$$E[-r(t) \mid \mathcal{F}_t] = -r(t). \tag{25.4}$$
On the other hand, from (25.3) we have
$$\frac{\partial}{\partial T}B(t, T) = -f(t, T)\,e^{-\int_t^T f(t, u)\,du}.$$
Setting $T = t$ we obtain $-f(t, t)$. Comparing with (25.4) yields
$$r(t) = f(t, t). \tag{25.5}$$
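The relations (25.2), (25.3), and (25.5) can be checked numerically on a toy forward curve. In the sketch below the curve `forward_rate` is entirely hypothetical (a linear function of maturity, chosen only so that the answers are easy to verify by hand), and the function names and step sizes are illustrative assumptions.

```python
import math


def forward_rate(t, u):
    """Hypothetical forward curve f(t, u); here it depends on u only."""
    return 0.02 + 0.01 * u


def bond_price(t, T, n=1000):
    """B(t, T) = exp(-integral of f(t, u) du over [t, T]), as in (25.3)."""
    h = (T - t) / n
    total = 0.5 * (forward_rate(t, t) + forward_rate(t, T))
    for i in range(1, n):
        total += forward_rate(t, t + i * h)
    return math.exp(-total * h)          # trapezoid rule


def implied_forward(t, T, eps=1e-5):
    """Recover f(t, T) = -d/dT log B(t, T), as in (25.2), by differences."""
    up = math.log(bond_price(t, T + eps))
    down = math.log(bond_price(t, T - eps))
    return -(up - down) / (2.0 * eps)


def short_rate(t):
    """r(t) = f(t, t), as in (25.5)."""
    return forward_rate(t, t)


if __name__ == "__main__":
    print(bond_price(0.0, 2.0))       # exp(-(0.02*2 + 0.005*4)) = exp(-0.06)
    print(implied_forward(0.0, 2.0))  # close to forward_rate(0, 2) = 0.04
    print(short_rate(1.0))
```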

26. Some interest rate models.

Heath-Jarrow-Morton model

Instead of specifying $r$, the Heath-Jarrow-Morton model (HJM) specifies the forward rates:
$$df(t, T) = \sigma(t, T)\,dW_t + \alpha(t, T)\,dt. \tag{26.1}$$
Let us derive the SDE that $B(t, T)$ satisfies. Let
$$\alpha^*(t, T) = \int_t^T \alpha(t, u)\,du, \qquad \sigma^*(t, T) = \int_t^T \sigma(t, u)\,du.$$
Since $B(t, T) = \exp\big(-\int_t^T f(t, u)\,du\big)$, we derive the SDE for $B$ by using Ito's formula with the function $e^x$ and $X_t = -\int_t^T f(t, u)\,du$. We have
$$dX_t = f(t, t)\,dt - \int_t^T df(t, u)\,du$$
$$= r(t)\,dt - \int_t^T [\alpha(t, u)\,dt + \sigma(t, u)\,dW_t]\,du$$
$$= r(t)\,dt - \Big[\int_t^T \alpha(t, u)\,du\Big]dt - \Big[\int_t^T \sigma(t, u)\,du\Big]dW_t$$
$$= r(t)\,dt - \alpha^*(t, T)\,dt - \sigma^*(t, T)\,dW_t.$$
Therefore, using Ito's formula,
$$dB(t, T) = B(t, T)\,dX_t + \tfrac12 B(t, T)(\sigma^*(t, T))^2\,dt$$
$$= B(t, T)\big[r(t) - \alpha^* + \tfrac12(\sigma^*)^2\big]dt - \sigma^* B(t, T)\,dW_t.$$
From (25.1) we know the $dt$ term must be $B(t, T)r(t)\,dt$, hence
$$dB(t, T) = B(t, T)r(t)\,dt - \sigma^* B(t, T)\,dW_t.$$
Comparing this with the expression obtained from Ito's formula, we see that if $P$ is the risk-neutral measure, we have $\alpha^* = \tfrac12(\sigma^*)^2$. See Note 1 for more on this.

Hull and White model

In this model, the interest rate $r$ is specified as the solution to the SDE
$$dr(t) = \sigma(t)\,dW_t + (a(t) - b(t)r(t))\,dt. \tag{26.2}$$
Here $\sigma$, $a$, $b$ are deterministic functions. The stochastic integral term introduces randomness, while the $a - br$ term causes a drift toward $a(t)/b(t)$. (Note that if $\sigma(t) = \sigma$, $a(t) = a$, $b(t) = b$ are constants and $\sigma = 0$, then the solution to (26.2) becomes $r(t) = a/b$.)

(26.2) is one of those SDE's that can be solved explicitly. Let $K(t) = \int_0^t b(u)\,du$. Then
$$d\big[e^{K(t)}r(t)\big] = e^{K(t)}r(t)b(t)\,dt + e^{K(t)}\big[a(t) - b(t)r(t)\big]dt + e^{K(t)}[\sigma(t)\,dW_t]$$
$$= e^{K(t)}a(t)\,dt + e^{K(t)}[\sigma(t)\,dW_t].$$
Integrating both sides,
$$e^{K(t)}r(t) = r(0) + \int_0^t e^{K(u)}a(u)\,du + \int_0^t e^{K(u)}\sigma(u)\,dW_u.$$
Multiplying both sides by $e^{-K(t)}$, we have the explicit solution
$$r(t) = e^{-K(t)}\Big[r(0) + \int_0^t e^{K(u)}a(u)\,du + \int_0^t e^{K(u)}\sigma(u)\,dW_u\Big].$$
If $F(u)$ is deterministic, then
$$\int_0^t F(u)\,dW_u = \lim \sum F(u_i)(W_{u_{i+1}} - W_{u_i}).$$
From undergraduate probability, linear combinations of Gaussian r.v.'s (Gaussian = normal) are Gaussian, and also limits of Gaussian r.v.'s are Gaussian, so we conclude that the r.v. $\int_0^t F(u)\,dW_u$ is Gaussian. We see that the mean at time $t$ is
$$E\,r(t) = e^{-K(t)}\Big[r(0) + \int_0^t e^{K(u)}a(u)\,du\Big].$$
We know how to calculate the second moment of a stochastic integral, so
$$\mathrm{Var}\,r(t) = e^{-2K(t)}\int_0^t e^{2K(u)}\sigma(u)^2\,du.$$
(One can similarly calculate the covariance of $r(s)$ and $r(t)$.) Limits of linear combinations of Gaussians are Gaussian, so we can calculate the mean and variance of $\int_0^T r(t)\,dt$ and get an explicit expression for
$$B(0, T) = E\,e^{-\int_0^T r(u)\,du}.$$

Cox-Ingersoll-Ross model

One drawback of the Hull and White model is that since $r(t)$ is Gaussian, it can take negative values with positive probability, which doesn't make sense. The Cox-Ingersoll-Ross model avoids this by modeling $r$ by the SDE
$$dr(t) = (a - br(t))\,dt + \sigma\sqrt{r(t)}\,dW_t.$$
The difference from the Hull and White model is the square root of $r$ in the stochastic integral term. This square root term implies that when $r(t)$ is small, the fluctuations in $r(t)$ are smaller than they are in the Hull and White model. Provided $a \ge \tfrac12\sigma^2$, it can be shown that $r(t)$ will never hit 0 and will always be positive. Although one cannot solve for $r$ explicitly, one can calculate the distribution of $r$. It turns out to be related to the square of what are known in probability theory as Bessel processes. (The density of $r(t)$, for example, will be given in terms of Bessel functions.)

Note 1. If $P$ is not the risk-neutral measure, it is still possible that one exists. Let $\theta(t)$ be a function of $t$, let $M_t = \exp\big(-\int_0^t \theta(u)\,dW_u - \tfrac12\int_0^t \theta(u)^2\,du\big)$ and define $\overline{P}(A) = E[M_T; A]$ for $A \in \mathcal{F}_T$. By the Girsanov theorem,
$$dB(t, T) = B(t, T)\big[r(t) - \alpha^* + \tfrac12(\sigma^*)^2 + \sigma^*\theta\big]dt - \sigma^* B(t, T)\,d\widetilde{W}_t,$$
where $\widetilde{W}_t$ is a Brownian motion under $\overline{P}$. Again, comparing this with (25.1) we must have
$$\alpha^* = \tfrac12(\sigma^*)^2 + \sigma^*\theta.$$
Differentiating with respect to $T$, we obtain
$$\alpha(t, T) = \sigma(t, T)\sigma^*(t, T) + \sigma(t, T)\theta(t).$$
If we try to solve this equation for $\theta$, there is no reason off-hand that $\theta$ depends only on $t$ and not $T$. However, if $\theta$ does not depend on $T$, $\overline{P}$ will be the risk-neutral measure.
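The Hull and White mean and variance formulas, $E\,r(t) = e^{-K(t)}[r(0) + \int_0^t e^{K(u)}a(u)\,du]$ and $\mathrm{Var}\,r(t) = e^{-2K(t)}\int_0^t e^{2K(u)}\sigma(u)^2\,du$, can be evaluated by quadrature for any deterministic coefficients. The sketch below is illustrative only: the helper names, the use of Simpson's rule, and the constant coefficients in the example are assumptions. With constant $a$, $b$, $\sigma$ the integrals reduce to $e^{-bt}r(0) + (a/b)(1 - e^{-bt})$ and $\sigma^2(1 - e^{-2bt})/(2b)$, which gives a check.

```python
import math


def simpson(func, lo, hi, n=200):
    """Composite Simpson rule on [lo, hi]; n must be even."""
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * func(lo + i * h)
    return total * h / 3.0


def hull_white_moments(t, r0, a, b, sigma):
    """Mean and variance of r(t) for deterministic functions a, b, sigma."""
    K = lambda u: simpson(b, 0.0, u)                      # K(u) = int_0^u b
    mean_integral = simpson(lambda u: math.exp(K(u)) * a(u), 0.0, t)
    var_integral = simpson(lambda u: math.exp(2.0 * K(u)) * sigma(u) ** 2, 0.0, t)
    Kt = K(t)
    mean = math.exp(-Kt) * (r0 + mean_integral)
    var = math.exp(-2.0 * Kt) * var_integral
    return mean, var


if __name__ == "__main__":
    # Hypothetical constant coefficients, chosen only for illustration.
    a0, b0, s0, r0, t = 0.02, 0.5, 0.01, 0.03, 2.0
    mean, var = hull_white_moments(t, r0, lambda u: a0, lambda u: b0, lambda u: s0)
    print(mean, math.exp(-b0 * t) * r0 + (a0 / b0) * (1.0 - math.exp(-b0 * t)))
    print(var, s0 ** 2 * (1.0 - math.exp(-2.0 * b0 * t)) / (2.0 * b0))
```

Since $r(t)$ is Gaussian in this model, the two numbers also give the probability that $r(t)$ is negative, which is the drawback that motivates the Cox-Ingersoll-Ross model.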

Problems

1. Show $E[X\,E[Y \mid \mathcal{G}]] = E[Y\,E[X \mid \mathcal{G}]]$.

2. Prove that $E[aX_1 + bX_2 \mid \mathcal{G}] = aE[X_1 \mid \mathcal{G}] + bE[X_2 \mid \mathcal{G}]$.

3. Suppose $X_1, X_2, \ldots, X_n$ are independent and for each $i$ we have $P(X_i = 1) = P(X_i = -1) = \frac12$. Let $S_n = \sum_{i=1}^n X_i$. Show that $M_n = S_n^3 - 3nS_n$ is a martingale.

4. Let $X_i$ and $S_n$ be as in Problem 3. Let $\varphi(x) = \frac12(e^x + e^{-x})$. Show that $M_n = e^{aS_n}\varphi(a)^{-n}$ is a martingale for each $a$ real.

5. Suppose $M_n$ is a martingale, $N_n = M_n^2$, and $E N_n < \infty$ for each $n$. Show $E[N_{n+1} \mid \mathcal{F}_n] \ge N_n$ for each $n$. Do not use Jensen's inequality.

6. Suppose $M_n$ is a martingale, $N_n = |M_n|$, and $E N_n < \infty$ for each $n$. Show $E[N_{n+1} \mid \mathcal{F}_n] \ge N_n$ for each $n$. Do not use Jensen's inequality.

7. Suppose $X_n$ is a martingale with respect to $\mathcal{G}_n$ and $\mathcal{F}_n = \sigma(X_1, \ldots, X_n)$. Show $X_n$ is a martingale with respect to $\mathcal{F}_n$.

8. Show that if $X_n$ and $Y_n$ are martingales with respect to $\{\mathcal{F}_n\}$ and $Z_n = \max(X_n, Y_n)$, then $E[Z_{n+1} \mid \mathcal{F}_n] \ge Z_n$.

9. Let $X_n$ and $Y_n$ be martingales with $E X_n^2 < \infty$ and $E Y_n^2 < \infty$. Show
$$E X_n Y_n - E X_0 Y_0 = \sum_{m=1}^n E(X_m - X_{m-1})(Y_m - Y_{m-1}).$$

10. Consider the binomial asset pricing model with $n = 3$, $u = 3$, $d = \frac12$, $r = 0.1$, $S_0 = 20$, and $K = 10$. If $V$ is a European call with strike price $K$ and exercise date $n$, compute explicitly the random variables $V_1$ and $V_2$ and calculate the value $V_0$.

11. In the same model as Problem 10, compute the hedging strategy $\Delta_0$, $\Delta_1$, and $\Delta_2$.

12. Show that in the binomial asset pricing model the value of the option $V$ at time $k$ is $V_k$.

13. Suppose $X_n$ is a submartingale. Show there exists a martingale $M_n$ such that if $A_n = X_n - M_n$, then $A_0 \le A_1 \le A_2 \le \cdots$ and $A_n$ is $\mathcal{F}_{n-1}$ measurable for each $n$.

14. Suppose $X_n$ is a submartingale and $X_n = M_n + A_n = M_n' + A_n'$, where both $A_n$ and $A_n'$ are $\mathcal{F}_{n-1}$ measurable for each $n$, both $M$ and $M'$ are martingales, both $A_n$ and $A_n'$ increase in $n$, and $A_0 = A_0'$. Show $M_n = M_n'$ for each $n$.

15. Suppose that $S$ and $T$ are stopping times. Show that $\max(S, T)$ and $\min(S, T)$ are also stopping times.

16. Suppose that $S_n$ is a stopping time for each $n$ and $S_1 \le S_2 \le \cdots$. Show $S = \lim_{n\to\infty} S_n$ is also a stopping time. Show that if instead $S_1 \ge S_2 \ge \cdots$ and $S = \lim_{n\to\infty} S_n$, then $S$ is again a stopping time.

17. Let $W_t$ be Brownian motion. Show that $e^{iuW_t + u^2 t/2}$ can be written in the form $\int_0^t H_s\,dW_s$ and give an explicit formula for $H_s$.

18. Suppose $M_t$ is a continuous bounded martingale for which $\langle M\rangle_\infty$ is also bounded. Show that
$$\sum_{i=0}^{2^n - 1}\big(M_{(i+1)/2^n} - M_{i/2^n}\big)^2$$
converges to $\langle M\rangle_1$ as $n \to \infty$.
[Hint: Show that Ito's formula implies
$$\big(M_{(i+1)/2^n} - M_{i/2^n}\big)^2 = \int_{i/2^n}^{(i+1)/2^n} 2\big(M_s - M_{i/2^n}\big)\,dM_s + \langle M\rangle_{(i+1)/2^n} - \langle M\rangle_{i/2^n}.$$
Then sum over $i$ and show that the stochastic integral term goes to zero as $n \to \infty$.]

19. Let $f_\varepsilon(0) = f_\varepsilon'(0) = 0$ and $f_\varepsilon''(x) = \frac{1}{2\varepsilon}1_{(-\varepsilon,\varepsilon)}(x)$. You may assume that it is valid to use Ito's formula with the function $f_\varepsilon$ (note $f_\varepsilon \notin C^2$). Show that
$$\frac{1}{2\varepsilon}\int_0^t 1_{(-\varepsilon,\varepsilon)}(W_s)\,ds$$
converges as $\varepsilon \to 0$ to a continuous nondecreasing process that is not identically zero and that increases only when $W_t$ is at 0.
[Hint: Use Ito's formula to rewrite $\frac{1}{2\varepsilon}\int_0^t 1_{(-\varepsilon,\varepsilon)}(W_s)\,ds$ in terms of $f_\varepsilon(W_t) - f_\varepsilon(W_0)$ plus a stochastic integral term and take the limit in this formula.]

20. Let $X_t$ be the solution to
$$dX_t = \sigma(X_t)\,dW_t + b(X_t)\,dt, \qquad X_0 = x,$$
where $W_t$ is Brownian motion and $\sigma$ and $b$ are bounded $C^\infty$ functions and $\sigma$ is bounded below by a positive constant. Find a nonconstant function $f$ such that $f(X_t)$ is a martingale. [Hint: Apply Ito's formula to $f(X_t)$ and obtain an ordinary differential equation that $f$ needs to satisfy.]

21. Suppose $X_t = W_t + F(t)$, where $F$ is a twice continuously differentiable function, $F(0) = 0$, and $W_t$ is a Brownian motion under $P$. Find a probability measure $Q$ under which $X_t$ is a Brownian motion and prove your statement. (You will need to use the general Girsanov theorem.)

22. Suppose $X_t = W_t - \int_0^t X_s\,ds$. Show that
$$X_t = \int_0^t e^{s-t}\,dW_s.$$

23. Suppose we have a stock where $\sigma = 2$, $K = 15$, $S_0 = 10$, $r = 0.1$, and $T = 3$. Suppose we are in the continuous time model. Determine the price of the standard European call using the Black-Scholes formula.

24. Let
$$\psi(t, x, y, \mu) = P\big(\sup_{s \le t}(W_s + \mu s) = y,\ W_t = x\big),$$
where $W_t$ is a Brownian motion. More precisely, for each $A, B, C, D$,
$$P\big(A \le \sup_{s \le t}(W_s + \mu s) \le B,\ C \le W_t \le D\big) = \int_C^D \int_A^B \psi(t, x, y, \mu)\,dy\,dx.$$
($\psi$ has an explicit formula, but we don't need that here.) Let the stock price $S_t$ be given by the standard geometric Brownian motion. Let $V$ be the option that pays off $\sup_{s \le T} S_s$ at time $T$. Determine the price at time 0 of $V$ as an expression in terms of $\psi$.

25. Suppose the interest rate is 0 and $S_t$ is the standard geometric Brownian motion stock price. Let $A$ and $B$ be fixed positive reals, and let $V$ be the option that pays off 1 at time $T$ if $A \le S_T \le B$ and 0 otherwise.
(a) Determine the price at time 0 of $V$.
(b) Find the hedging strategy that duplicates the claim $V$.

26. Let $V$ be the standard European call that has strike price $K$ and exercise date $T$. Let $r$ and $\sigma$ be constants, as usual, but let $\mu(t)$ be a deterministic (i.e., nonrandom) function. Suppose the stock price is given by
$$dS_t = \sigma S_t\,dW_t + \mu(t)S_t\,dt,$$
where $W_t$ is a Brownian motion. Find the price at time 0 of $V$.
