
Spring 2003

Richard F. Bass

Department of Mathematics

University of Connecticut

These notes are © 2003 by Richard Bass. They may be used for personal use or class use, but not for commercial purposes. If you find any errors, I would appreciate hearing from you: bass@math.uconn.edu


1. Introduction.

In this course we will study mathematical finance. Mathematical finance is not about predicting the price of a stock. What it is about is figuring out the price of options and derivatives.

The most familiar type of option is the option to buy a stock at a given price at a given time. For example, suppose Microsoft is currently selling today at $40 per share. A European call option is something I can buy that gives me the right to buy a share of Microsoft at some future date. To make up an example, suppose I have an option that allows me to buy a share of Microsoft for $50 in three months time, but does not compel me to do so. If Microsoft happens to be selling at $45 in three months time, the option is worthless. I would be silly to buy a share for $50 when I could call my broker and buy it for $45. So I would choose not to exercise the option. On the other hand, if Microsoft is selling for $60 three months from now, the option would be quite valuable. I could exercise the option and buy a share for $50. I could then turn around and sell the share on the open market for $60 and make a profit of $10 per share. Therefore this stock option I possess has some value. There is some chance it is worthless and some chance that it will lead me to a profit. The basic question is: how much is the option worth today?
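The payoff logic in this example is easy to put in code. Here is a minimal sketch (the function name and the scenario values are ours, chosen to match the $50-strike example in the text):

```python
def call_payoff(stock_price, strike):
    """Value of a European call at expiry: exercise only when the stock
    is above the strike; otherwise the option expires worthless."""
    return max(stock_price - strike, 0.0)

# The two scenarios from the text, with a $50 strike:
worthless = call_payoff(45.0, 50.0)   # stock at $45: do not exercise
profit = call_payoff(60.0, 50.0)      # stock at $60: exercise, $10 per share
```

The hard part of the subject, of course, is not the payoff at expiry but the value of the option today, before the stock price is known.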

The huge impetus in financial derivatives was the seminal paper of Black and Scholes in 1973. Although many researchers had studied this question, Black and Scholes gave a definitive answer, and a great deal of research has been done since. These are not just academic questions; today the market in financial derivatives is larger than the market in stock securities. In other words, more money is invested in options on stocks than in stocks themselves.

Options have been around for a long time. The earliest ones were used by manufacturers and food producers to hedge their risk. A farmer might agree to sell a bushel of wheat at a fixed price six months from now rather than take a chance on the vagaries of market prices. Similarly a steel refinery might want to lock in the price of iron ore at a fixed price.

The sections of these notes can be grouped into five categories. The first is elementary probability. Although someone who has had a course in undergraduate probability will be familiar with some of this, we will talk about a number of topics that are not usually covered in such a course: σ-fields, conditional expectations, martingales. The second category is the binomial asset pricing model. This is just about the simplest model of a stock that one can imagine, and this will provide a case where we can see most of the major ideas of mathematical finance, but in a very simple setting. Then we will turn to advanced probability, that is, ideas such as Brownian motion, stochastic integrals, stochastic differential equations, Girsanov transformation. Although to do this rigorously requires measure theory, we can still learn enough to understand and work with these concepts. We then return to finance and work with the continuous model. We will derive the Black-Scholes formula, see the Fundamental Theorem of Asset Pricing, work with equivalent martingale measures, and the like. The fifth main category is term structure models, which means models of interest rate behavior.

I found some unpublished notes of Steve Shreve extremely useful in preparing these notes. I hope that he has turned them into a book and that this book is now available. The stochastic calculus part of these notes is from my own book: Probabilistic Techniques in Analysis, Springer, New York, 1995.

I would also like to thank Evarist Giné who pointed out a number of errors.


2. Review of elementary probability.

Let’s begin by recalling some of the definitions and basic concepts of elementary probability. We will only work with discrete models at first.

We start with an arbitrary set, called the probability space, which we will denote by Ω, the capital Greek letter “omega.” We are given a class F of subsets of Ω. These are called events. We require F to be a σ-field.

Definition 2.1. A collection F of subsets of Ω is called a σ-field if
(1) ∅ ∈ F,
(2) Ω ∈ F,
(3) A ∈ F implies A^c ∈ F, and
(4) A_1, A_2, . . . ∈ F implies both ∪_{i=1}^∞ A_i ∈ F and ∩_{i=1}^∞ A_i ∈ F.

Here A^c = {ω ∈ Ω : ω ∉ A} denotes the complement of A. ∅ denotes the empty set, that is, the set with no elements. We will use without special comment the usual notations of ∪ (union), ∩ (intersection), ⊂ (contained in), ∈ (is an element of).

Typically, in an elementary probability course, F will consist of all subsets of Ω, but we will later need to distinguish between various σ-fields. Here is an example. Suppose one tosses a coin two times and lets Ω denote all possible outcomes. So Ω = {HH, HT, TH, TT}. A typical σ-field F would be the collection of all subsets of Ω. In this case it is trivial to show that F is a σ-field, since every subset is in F. But if we let G = {∅, Ω, {HH, HT}, {TH, TT}}, then G is also a σ-field. One has to check the definition, but to illustrate, the event {HH, HT} is in G, so we require the complement of that set to be in G as well. But the complement is {TH, TT} and that event is indeed in G.
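On a finite Ω the σ-field axioms can be checked mechanically. Here is a sketch (the checker function is ours, not from the notes; for a finite collection, closure under countable unions reduces to closure under pairwise unions and intersections):

```python
# Events are frozensets of outcomes from the two-toss space.
omega = frozenset({"HH", "HT", "TH", "TT"})
G = {frozenset(), omega, frozenset({"HH", "HT"}), frozenset({"TH", "TT"})}

def is_sigma_field(collection, omega):
    """Check the sigma-field axioms for a finite collection of events."""
    if frozenset() not in collection or omega not in collection:
        return False
    for a in collection:
        if omega - a not in collection:      # closed under complements
            return False
        for b in collection:
            # closed under (pairwise, hence finite) unions and intersections
            if a | b not in collection or a & b not in collection:
                return False
    return True
```

For instance, dropping {TH, TT} from G would break the complement axiom, and the checker would report that.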

One point of view which we will explore much more fully later on is that the σ-field tells you what events you “know.” In this example, F is the σ-field where you “know” everything, while G is the σ-field where you “know” only the result of the first toss but not the second. We won’t try to be precise here, but to try to add to the intuition, suppose one knows whether an event in F has happened or not for a particular outcome. We would then know which of the events {HH}, {HT}, {TH}, or {TT} has happened and so would know what the two tosses of the coin showed. On the other hand, if we know which events in G happened, we would only know whether the event {HH, HT} happened, which means we would know that the first toss was a heads, or we would know whether the event {TH, TT} happened, in which case we would know that the first toss was a tails. But there is no way to tell what happened on the second toss from knowing which events in G happened. Much more on this later.

The third basic ingredient is a probability.


Definition 2.2. A function P on F is a probability if it satisfies
(1) if A ∈ F, then 0 ≤ P(A) ≤ 1,
(2) P(Ω) = 1,
(3) P(∅) = 0, and
(4) if A_1, A_2, . . . ∈ F are pairwise disjoint, then P(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i).

A collection of sets A_i is pairwise disjoint if A_i ∩ A_j = ∅ whenever i ≠ j.

There are a number of conclusions one can draw from this definition. As one example, if A ⊂ B, then P(A) ≤ P(B) and P(A^c) = 1 − P(A). See Note 1 at the end of this section for a proof.

Someone who has had measure theory will realize that a σ-field is the same thing as a σ-algebra and a probability is a measure of total mass one.

A random variable (abbreviated r.v.) is a function X from Ω to R, the reals. To be more precise, to be a r.v. X must also be measurable, which means that {ω : X(ω) ≥ a} ∈ F for all reals a.

The notion of measurability has a simple definition but is a bit subtle. If we take the point of view that we know all the events in G, then if Y is G-measurable, then we know Y. Phrased another way, suppose we know whether or not the event has occurred for each event in G. Then if Y is G-measurable, we can compute the value of Y.

Here is an example. In the example above where we tossed a coin two times, let X be the number of heads in the two tosses. Then X is F measurable but not G measurable.

To see this, let us consider A_a = {ω ∈ Ω : X(ω) ≥ a}. This event will equal

    Ω if a ≤ 0;
    {HH, HT, TH} if 0 < a ≤ 1;
    {HH} if 1 < a ≤ 2;
    ∅ if 2 < a.

For example, if a = 3/2, then the event where the number of heads is 3/2 or greater is the event where we had two heads, namely, {HH}. Now observe that for each a the event A_a is in F because F contains all subsets of Ω. Therefore X is measurable with respect to F. However it is not true that A_a is in G for every value of a – take a = 3/2 as just one example – the subset {HH} is not in G. So X is not measurable with respect to the σ-field G.
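The four cases above can be tabulated directly. A small sketch (the variable names are ours):

```python
omega = ["HH", "HT", "TH", "TT"]
X = {w: w.count("H") for w in omega}   # X = number of heads in two tosses

def level_set(a):
    """The event A_a = {omega in the sample space : X(omega) >= a}."""
    return frozenset(w for w in omega if X[w] >= a)

# G "knows" only the first toss: its four events are listed explicitly.
G = {frozenset(), frozenset(omega),
     frozenset({"HH", "HT"}), frozenset({"TH", "TT"})}
```

Every `level_set(a)` is a subset of Ω, hence an event of F, but `level_set(1.5)` is {HH}, which is not one of the four events of G.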

A discrete r.v. is one where P(ω : X(ω) = a) = 0 for all but countably many a’s, say, a_1, a_2, . . ., and Σ_i P(ω : X(ω) = a_i) = 1. In defining sets one usually omits the ω; thus (X = x) means the same as {ω : X(ω) = x}.

In the discrete case, to check measurability with respect to a σ-field F, it is enough that (X = a) ∈ F for all reals a. The reason for this is that if x_1, x_2, . . . are the values of x for which P(X = x) ≠ 0, then we can write (X ≥ a) = ∪_{x_i ≥ a}(X = x_i) and we have a countable union. So if (X = x_i) ∈ F, then (X ≥ a) ∈ F.

Given a discrete r.v. X, the expectation or mean is defined by

    EX = Σ_x x P(X = x)

provided the sum converges. If X only takes finitely many values, then this is a finite sum and of course it will converge. This is the situation that we will consider for quite some time. However, if X can take an infinite number of values (but countable), convergence needs to be checked. For example, if P(X = 2^n) = 2^{−n} for n = 1, 2, . . ., then EX = Σ_{n=1}^∞ 2^n · 2^{−n} = ∞.

There is an alternate definition of expectation which is equivalent in the discrete setting. Set

    EX = Σ_{ω∈Ω} X(ω) P({ω}).

To see that this is the same, look at Note 2 at the end of the section. The advantage of the second definition is that some properties of expectation, such as E(X + Y) = EX + EY, are immediate, while with the first definition they require quite a bit of proof.
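On the two-toss space the two definitions can be checked against each other with exact arithmetic. A sketch (the fair-coin probabilities are an illustrative choice):

```python
from fractions import Fraction

# Fair two-coin space: each outcome has probability 1/4.
P = {w: Fraction(1, 4) for w in ["HH", "HT", "TH", "TT"]}
X = {w: w.count("H") for w in P}        # X = number of heads

# First definition: sum over the values x of x * P(X = x).
values = set(X.values())
EX_by_values = sum(x * sum(P[w] for w in P if X[w] == x) for x in values)

# Second definition: sum over outcomes of X(omega) * P({omega}).
EX_by_outcomes = sum(X[w] * P[w] for w in P)
```

Both sums come out to 1, the expected number of heads in two fair tosses.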

We say two events A and B are independent if P(A ∩ B) = P(A)P(B). Two random variables X and Y are independent if P(X ∈ A, Y ∈ B) = P(X ∈ A)P(Y ∈ B) for all A and B that are subsets of the reals. The comma in the expression P(X ∈ A, Y ∈ B) means “and.” Thus

    P(X ∈ A, Y ∈ B) = P((X ∈ A) ∩ (Y ∈ B)).

The extension of the definition of independence to the case of more than two events or random variables is not surprising: A_1, . . . , A_n are independent if

    P(A_{i_1} ∩ · · · ∩ A_{i_j}) = P(A_{i_1}) · · · P(A_{i_j})

whenever {i_1, . . . , i_j} is a subset of {1, . . . , n}.

A common misconception is that an event is independent of itself. If A is an event that is independent of itself, then

    P(A) = P(A ∩ A) = P(A)P(A) = (P(A))^2.

The only solutions to the equation x = x^2 are x = 0 and x = 1, so an event is independent of itself only if it has probability 0 or 1.

Two σ-fields F and G are independent if A and B are independent whenever A ∈ F and B ∈ G. A r.v. X and a σ-field G are independent if P((X ∈ A) ∩ B) = P(X ∈ A)P(B) whenever A is a subset of the reals and B ∈ G.


As an example, suppose we toss a coin two times and we define the σ-fields G_1 = {∅, Ω, {HH, HT}, {TH, TT}} and G_2 = {∅, Ω, {HH, TH}, {HT, TT}}. Then G_1 and G_2 are independent if P(HH) = P(HT) = P(TH) = P(TT) = 1/4. (Here we are writing P(HH) when a more accurate way would be to write P({HH}).) An easy way to understand this is that if we look at an event in G_1 that is not ∅ or Ω, then that is the event that the first toss is a heads or it is the event that the first toss is a tails. Similarly, a set other than ∅ or Ω in G_2 will be the event that the second toss is a heads or that the second toss is a tails.

If two r.v.s X and Y are independent, we have the multiplication theorem, which says that E(XY) = (EX)(EY) provided all the expectations are finite. See Note 3 for a proof.

Suppose X_1, . . . , X_n are n independent r.v.s, such that for each one P(X_i = 1) = p, P(X_i = 0) = 1 − p, where p ∈ [0, 1]. The random variable S_n = Σ_{i=1}^n X_i is called a binomial r.v., and represents, for example, the number of successes in n trials, where the probability of a success is p. An important result in probability is that

    P(S_n = k) = [n!/(k!(n − k)!)] p^k (1 − p)^{n−k}.
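This formula translates directly into code; a short sketch (the function name is ours):

```python
from math import comb

def binomial_pmf(n, k, p):
    """P(S_n = k): comb(n, k) counts which k of the n trials succeed,
    and p**k * (1 - p)**(n - k) is the probability of one such arrangement."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Sanity check: the probabilities over k = 0, ..., n sum to 1.
total = sum(binomial_pmf(4, k, 0.3) for k in range(5))
```

For instance, with n = 2 fair trials, one success has probability 2 · (1/2) · (1/2) = 1/2.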

The variance of a random variable is

    Var X = E[(X − EX)^2].

This is also equal to

    E[X^2] − (EX)^2.

It is an easy consequence of the multiplication theorem that if X and Y are independent,

    Var(X + Y) = Var X + Var Y.

The expression E[X^2] is sometimes called the second moment of X.

We close this section with a definition of conditional probability. The probability of A given B, written P(A | B), is defined by

    P(A ∩ B)/P(B),

provided P(B) ≠ 0. The conditional expectation of X given B is defined to be

    E[X; B]/P(B),

provided P(B) ≠ 0. The notation E[X; B] means E[X 1_B], where 1_B(ω) is 1 if ω ∈ B and 0 otherwise. Another way of writing E[X; B] is

    E[X; B] = Σ_{ω∈B} X(ω)P({ω}).

(We will use the notation E[X; B] frequently.)

Note 1. Suppose we have two disjoint sets C and D. Let A_1 = C, A_2 = D, and A_i = ∅ for i ≥ 3. Then the A_i are pairwise disjoint and

    P(C ∪ D) = P(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i) = P(C) + P(D)    (2.1)

by Definition 2.2(3) and (4). Therefore Definition 2.2(4) holds when there are only two sets instead of infinitely many, and a similar argument shows the same is true when there are an arbitrary (but finite) number of sets.

Now suppose A ⊂ B. Let C = A and D = B − A, where B − A is defined to be B ∩ A^c (this is frequently written B \ A as well). Then C and D are disjoint, and by (2.1)

    P(B) = P(C ∪ D) = P(C) + P(D) ≥ P(C) = P(A).

The other equality we mentioned is proved by letting C = A and D = A^c. Then C and D are disjoint, and

    1 = P(Ω) = P(C ∪ D) = P(C) + P(D) = P(A) + P(A^c).

Solving for P(A^c), we have

    P(A^c) = 1 − P(A).

Note 2. Let us show the two definitions of expectation are the same (in the discrete case). Starting with the first definition we have

    EX = Σ_x x P(X = x)
       = Σ_x x Σ_{ω∈Ω : X(ω)=x} P({ω})
       = Σ_x Σ_{ω∈Ω : X(ω)=x} X(ω)P({ω})
       = Σ_{ω∈Ω} X(ω)P({ω}),

and we end up with the second definition.

Note 3. Suppose X can take the values x_1, x_2, . . . and Y can take the values y_1, y_2, . . .. Let A_i = {ω : X(ω) = x_i} and B_j = {ω : Y(ω) = y_j}. Then

    X = Σ_i x_i 1_{A_i},   Y = Σ_j y_j 1_{B_j},

and so

    XY = Σ_i Σ_j x_i y_j 1_{A_i} 1_{B_j}.

Since 1_{A_i} 1_{B_j} = 1_{A_i ∩ B_j}, it follows that

    E[XY] = Σ_i Σ_j x_i y_j P(A_i ∩ B_j),

assuming the double sum converges. Since X and Y are independent, A_i = (X = x_i) is independent of B_j = (Y = y_j) and so

    E[XY] = Σ_i Σ_j x_i y_j P(A_i)P(B_j)
          = Σ_i x_i P(A_i) (Σ_j y_j P(B_j))
          = Σ_i x_i P(A_i) EY
          = (EX)(EY).


3. Conditional expectation.

Suppose we have 200 men and 100 women, 70 of the men are smokers, and 50 of the women are smokers. If a person is chosen at random, then the conditional probability that the person is a smoker given that it is a man is 70 divided by 200, or 35%, while the conditional probability the person is a smoker given that it is a woman is 50 divided by 100, or 50%. We will want to be able to encompass both facts in a single entity.

The way to do that is to make conditional probability a random variable rather than a number. To reiterate, we will make conditional probabilities random. Let M, W be man, woman, respectively, and S, S^c smoker and nonsmoker, respectively. We have

    P(S | M) = .35,   P(S | W) = .50.

We introduce the random variable

    (.35)1_M + (.50)1_W

and use that for our conditional probability. So on the set M its value is .35 and on the set W its value is .50.

We need to give this random variable a name, so what we do is let G be the σ-field consisting of {∅, Ω, M, W} and denote this random variable P(S | G). Thus we are going to talk about the conditional probability of an event given a σ-field.

What is the precise deﬁnition?

Definition 3.1. Suppose there exist finitely (or countably) many sets B_1, B_2, . . ., all having positive probability, such that they are pairwise disjoint, Ω is equal to their union, and G is the σ-field one obtains by taking all finite or countable unions of the B_i. Then the conditional probability of A given G is

    P(A | G) = Σ_i [P(A ∩ B_i)/P(B_i)] 1_{B_i}(ω).

In short, on the set B_i the conditional probability is equal to P(A | B_i).

Not every σ-field can be so represented, so this definition will need to be extended when we get to continuous models. σ-fields that can be represented as in Definition 3.1 are called finitely (or countably) generated and are said to be generated by the sets B_1, B_2, . . ..

Let’s look at another example. Suppose Ω consists of the possible results when we toss a coin three times: HHH, HHT, etc. Let F_3 denote all subsets of Ω. Let F_1 consist of the sets ∅, Ω, {HHH, HHT, HTH, HTT}, and {THH, THT, TTH, TTT}. So F_1 consists of those events that can be determined by knowing the result of the first toss. We want to let F_2 denote those events that can be determined by knowing the first two tosses. This will include the sets ∅, Ω, {HHH, HHT}, {HTH, HTT}, {THH, THT}, {TTH, TTT}. This is not enough to make F_2 a σ-field, so we add to F_2 all sets that can be obtained by taking unions of these sets.

Suppose we tossed the coin independently and suppose that it was fair. Let us calculate P(A | F_1), P(A | F_2), and P(A | F_3) when A is the event {HHH}. First the conditional probability given F_1. Let C_1 = {HHH, HHT, HTH, HTT} and C_2 = {THH, THT, TTH, TTT}. On the set C_1 the conditional probability is P(A ∩ C_1)/P(C_1) = P(HHH)/P(C_1) = (1/8)/(1/2) = 1/4. On the set C_2 the conditional probability is P(A ∩ C_2)/P(C_2) = P(∅)/P(C_2) = 0. Therefore P(A | F_1) = (.25)1_{C_1}. This is plausible – the probability of getting three heads given the first toss is 1/4 if the first toss is a heads and 0 otherwise.

Next let us calculate P(A | F_2). Let D_1 = {HHH, HHT}, D_2 = {HTH, HTT}, D_3 = {THH, THT}, D_4 = {TTH, TTT}. So F_2 is the σ-field consisting of all possible unions of some of the D_i’s. P(A | D_1) = P(HHH)/P(D_1) = (1/8)/(1/4) = 1/2. Also, as above, P(A | D_i) = 0 for i = 2, 3, 4. So P(A | F_2) = (.50)1_{D_1}. This is again plausible – the probability of getting three heads given the first two tosses is 1/2 if the first two tosses were heads and 0 otherwise.
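These computations can be reproduced mechanically by running the partition formula over each block. A sketch with exact arithmetic (function and variable names are ours):

```python
from fractions import Fraction
from itertools import product

omega = ["".join(t) for t in product("HT", repeat=3)]   # fair three-toss space
P = {w: Fraction(1, 8) for w in omega}
A = {"HHH"}

def cond_prob_given_partition(A, blocks):
    """P(A | G) as a random variable: on each block B it equals P(A ∩ B)/P(B)."""
    out = {}
    for B in blocks:
        pB = sum(P[w] for w in B)
        pAB = sum(P[w] for w in B if w in A)
        for w in B:
            out[w] = pAB / pB
    return out

F1 = [[w for w in omega if w[0] == s] for s in "HT"]      # first toss known
F2 = [[w for w in omega if w[:2] == s]
      for s in ("HH", "HT", "TH", "TT")]                  # first two tosses known

p1 = cond_prob_given_partition(A, F1)
p2 = cond_prob_given_partition(A, F2)
```

On outcomes starting with H, `p1` equals 1/4; on outcomes starting with HH, `p2` equals 1/2; elsewhere both vanish, matching the text.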

What about conditional expectation? Recall E[X; B_i] = E[X 1_{B_i}] and also that E[1_B] = 1 · P(1_B = 1) + 0 · P(1_B = 0) = P(B). Given a random variable X, we define

    E[X | G] = Σ_i [E[X; B_i]/P(B_i)] 1_{B_i}.

This is the obvious definition, and it agrees with what we had before because E[1_A | G] should be equal to P(A | G).
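The defining formula can be evaluated block by block on a finite space. A sketch (names are ours; the two-toss space and fair probabilities are illustrative choices):

```python
from fractions import Fraction
from itertools import product

omega = ["".join(t) for t in product("HT", repeat=2)]
P = {w: Fraction(1, 4) for w in omega}
X = {w: w.count("H") for w in omega}           # X = number of heads

def cond_exp(X, blocks):
    """E[X | G] = sum_i (E[X; B_i] / P(B_i)) 1_{B_i}, as a function on omega."""
    out = {}
    for B in blocks:
        pB = sum(P[w] for w in B)
        exB = sum(X[w] * P[w] for w in B)      # E[X; B] = E[X 1_B]
        for w in B:
            out[w] = exB / pB
    return out

G = [["HH", "HT"], ["TH", "TT"]]               # partition: first toss known
Y = cond_exp(X, G)
```

Given a first-toss heads, the expected number of heads is 3/2; given a first-toss tails, it is 1/2. The resulting Y is constant on each block, as Proposition 3.2 below requires.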

We now turn to some properties of conditional expectation. Some of the following propositions may seem a bit technical. In fact, they are! However, these properties are crucial to what follows and there is no choice but to master them.

Proposition 3.2. E[X | G] is G measurable, that is, if Y = E[X | G], then (Y > a) is a set in G for each real a.

Proof. By the definition,

    Y = E[X | G] = Σ_i [E[X; B_i]/P(B_i)] 1_{B_i} = Σ_i b_i 1_{B_i}

if we set b_i = E[X; B_i]/P(B_i). The set (Y ≥ a) is a union of some of the B_i, namely, those B_i for which b_i ≥ a. But the union of any collection of the B_i is in G.

An example might help. Suppose

    Y = 2 · 1_{B_1} + 3 · 1_{B_2} + 6 · 1_{B_3} + 4 · 1_{B_4}

and a = 3.5. Then (Y ≥ a) = B_3 ∪ B_4, which is in G.


Proposition 3.3. If C ∈ G and Y = E[X | G], then E[Y; C] = E[X; C].

Proof. Since Y = Σ_i [E[X; B_i]/P(B_i)] 1_{B_i} and the B_i are disjoint, then

    E[Y; B_j] = [E[X; B_j]/P(B_j)] E[1_{B_j}] = E[X; B_j].

Now if C = B_{j_1} ∪ · · · ∪ B_{j_n} ∪ · · ·, summing the above over the j_k gives E[Y; C] = E[X; C].

Let us look at the above example for this proposition, and let us do the case where C = B_2. Note 1_{B_2} 1_{B_2} = 1_{B_2} because the product is 1 · 1 = 1 if ω is in B_2 and 0 otherwise. On the other hand, it is not possible for an ω to be in more than one of the B_i, so 1_{B_2} 1_{B_i} = 0 if i ≠ 2. Multiplying Y in the above example by 1_{B_2}, we see that

    E[Y; C] = E[Y; B_2] = E[Y 1_{B_2}] = E[3 · 1_{B_2}] = 3E[1_{B_2}] = 3P(B_2).

However the number 3 is not just any number; it is E[X; B_2]/P(B_2). So

    3P(B_2) = [E[X; B_2]/P(B_2)] P(B_2) = E[X; B_2] = E[X; C],

just as we wanted. If C = B_2 ∪ B_4, for example, we then write

    E[X; C] = E[X 1_C] = E[X(1_{B_2} + 1_{B_4})] = E[X 1_{B_2}] + E[X 1_{B_4}] = E[X; B_2] + E[X; B_4].

By the first part, this equals E[Y; B_2] + E[Y; B_4], and we undo the above string of equalities but with Y instead of X to see that this is E[Y; C].

If a r.v. Y is G measurable, then for any a we have (Y = a) ∈ G which means that (Y = a) is the union of one or more of the B_i. Since the B_i are disjoint, it follows that Y must be constant on each B_i.

Again let us look at an example. Suppose Z takes only the values 1, 3, 4, 7. Let D_1 = (Z = 1), D_2 = (Z = 3), D_3 = (Z = 4), D_4 = (Z = 7). Note that we can write

    Z = 1 · 1_{D_1} + 3 · 1_{D_2} + 4 · 1_{D_3} + 7 · 1_{D_4}.

To see this, if ω ∈ D_2, for example, the right hand side will be 0 + 3 · 1 + 0 + 0, which agrees with Z(ω). Now if Z is G measurable, then (Z ≥ a) ∈ G for each a. Take a = 7, and we see D_4 ∈ G. Take a = 4 and we see D_3 ∪ D_4 ∈ G. Taking a = 3 shows D_2 ∪ D_3 ∪ D_4 ∈ G. Now D_3 = (D_3 ∪ D_4) ∩ D_4^c, so since G is a σ-field, D_3 ∈ G. Similarly D_2, D_1 ∈ G. Because sets in G are unions of the B_i’s, we must have Z constant on the B_i’s. For example, if it so happened that D_1 = B_1, D_2 = B_2 ∪ B_4, D_3 = B_3 ∪ B_6 ∪ B_7, and D_4 = B_5, then

    Z = 1 · 1_{B_1} + 3 · 1_{B_2} + 4 · 1_{B_3} + 3 · 1_{B_4} + 7 · 1_{B_5} + 4 · 1_{B_6} + 4 · 1_{B_7}.

We still restrict ourselves to the discrete case. In this context, the properties given in Propositions 3.2 and 3.3 uniquely determine E[X | G].

Proposition 3.4. Suppose Z is G measurable and E[Z; C] = E[X; C] whenever C ∈ G. Then Z = E[X | G].

Proof. Since Z is G measurable, then Z must be constant on each B_i. Let the value of Z on B_i be z_i. So Z = Σ_i z_i 1_{B_i}. Then

    z_i P(B_i) = E[Z; B_i] = E[X; B_i],

or z_i = E[X; B_i]/P(B_i) as required.

The following propositions contain the main facts about this new definition of conditional expectation that we will need.

Proposition 3.5. (1) If X_1 ≥ X_2, then E[X_1 | G] ≥ E[X_2 | G].
(2) E[aX_1 + bX_2 | G] = aE[X_1 | G] + bE[X_2 | G].
(3) If X is G measurable, then E[X | G] = X.
(4) E[E[X | G]] = EX.
(5) If X is independent of G, then E[X | G] = EX.

We will prove Proposition 3.5 in Note 1 at the end of the section. At this point it is more fruitful to understand what the proposition says.

We will see in Proposition 3.8 below that we may think of E[X | G] as the best prediction of X given G. Accepting this for the moment, we can give an interpretation of (1)-(5). (1) says that if X_1 is larger than X_2, then the predicted value of X_1 should be larger than the predicted value of X_2. (2) says that the predicted value of X_1 + X_2 should be the sum of the predicted values. (3) says that if we know G and X is G measurable, then we know X and our best prediction of X is X itself. (4) says that the average of the predicted value of X should be the average value of X. (5) says that if knowing G gives us no additional information on X, then the best prediction for the value of X is just EX.

Proposition 3.6. If Z is G measurable, then E[XZ | G] = ZE[X | G].

We again defer the proof, this time to Note 2.

Proposition 3.6 says that as far as conditional expectations with respect to a σ-field G go, G-measurable random variables act like constants: they can be taken inside or outside the conditional expectation at will.


Proposition 3.7. If H ⊂ G ⊂ F, then

    E[E[X | H] | G] = E[X | H] = E[E[X | G] | H].

Proof. E[X | H] is H measurable, hence G measurable, since H ⊂ G. The left hand equality now follows by Proposition 3.5(3). To get the right hand equality, let W be the right hand expression. It is H measurable, and if C ∈ H ⊂ G, then

    E[W; C] = E[E[X | G]; C] = E[X; C]

as required.

In words, if we are predicting a prediction of X given limited information, this is the same as a single prediction given the least amount of information.

Let us verify that conditional expectation may be viewed as the best predictor of a random variable given a σ-field. If X is a r.v., a predictor Z is just another random variable, and the goodness of the prediction will be measured by E[(X − Z)^2], which is known as the mean square error.

Proposition 3.8. If X is a r.v., the best predictor among the collection of G-measurable random variables is Y = E[X | G].

Proof. Let Z be any G-measurable random variable. We compute, using Proposition 3.5(3) and Proposition 3.6,

    E[(X − Z)^2 | G] = E[X^2 | G] − 2E[XZ | G] + E[Z^2 | G]
                     = E[X^2 | G] − 2ZE[X | G] + Z^2
                     = E[X^2 | G] − 2ZY + Z^2
                     = E[X^2 | G] − Y^2 + (Y − Z)^2
                     = E[X^2 | G] − 2Y E[X | G] + Y^2 + (Y − Z)^2
                     = E[X^2 | G] − 2E[XY | G] + E[Y^2 | G] + (Y − Z)^2
                     = E[(X − Y)^2 | G] + (Y − Z)^2.

We also used the fact that Y is G measurable. Taking expectations and using Proposition 3.5(4),

    E[(X − Z)^2] = E[(X − Y)^2] + E[(Y − Z)^2].

The right hand side is bigger than or equal to E[(X − Y)^2] because (Y − Z)^2 ≥ 0. So the error in predicting X by Z is larger than the error in predicting X by Y, and will be equal if and only if Z = Y. So Y is the best predictor.
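The minimizing property can be observed numerically on a small space. A sketch (the two-toss space, the fair probabilities, and the competing predictor are illustrative choices of ours):

```python
from fractions import Fraction
from itertools import product

omega = ["".join(t) for t in product("HT", repeat=2)]
P = {w: Fraction(1, 4) for w in omega}
X = {w: w.count("H") for w in omega}   # X = number of heads

# G knows the first toss; Y = E[X | G] is the block average on each block.
blocks = [["HH", "HT"], ["TH", "TT"]]
Y = {}
for B in blocks:
    avg = sum(X[w] * P[w] for w in B) / sum(P[w] for w in B)
    for w in B:
        Y[w] = avg

def mse(Z):
    """Mean square error E[(X - Z)^2] of a predictor Z."""
    return sum((X[w] - Z[w]) ** 2 * P[w] for w in omega)

best = mse(Y)
# Another G-measurable predictor (constant on blocks) does strictly worse:
worse = mse({w: (2 if w in blocks[0] else 0) for w in omega})
```

Here `best` comes out to 1/4 while the competing predictor gives 1/2, consistent with the proposition.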


There is one more interpretation of conditional expectation that may be useful. The collection of all random variables is a linear space, and the collection of all G-measurable random variables is clearly a subspace. Given X, the conditional expectation Y = E[X | G] is equal to the projection of X onto the subspace of G-measurable random variables. To see this, we write X = Y + (X − Y), and what we have to check is that the inner product of Y and X − Y is 0, that is, Y and X − Y are orthogonal. In this context, the inner product of X_1 and X_2 is defined to be E[X_1 X_2], so we must show E[Y(X − Y)] = 0. Note

    E[Y(X − Y) | G] = Y E[X − Y | G] = Y(E[X | G] − Y) = Y(Y − Y) = 0.

Taking expectations,

    E[Y(X − Y)] = E[E[Y(X − Y) | G]] = 0,

just as we wished.

If Y is a discrete random variable, that is, it takes only countably many values y_1, y_2, . . ., we let B_i = (Y = y_i). These will be disjoint sets whose union is Ω. If σ(Y) is the collection of all unions of the B_i, then σ(Y) is a σ-field, and is called the σ-field generated by Y. It is easy to see that this is the smallest σ-field with respect to which Y is measurable. We write E[X | Y] for E[X | σ(Y)].

Note 1. We prove Proposition 3.5. (1) and (2) are immediate from the definition. To prove (3), note that if Z = X, then Z is G measurable and E[X; C] = E[Z; C] for any C ∈ G; this is trivial. By Proposition 3.4 it follows that Z = E[X | G]; this proves (3). To prove (4), if we let C = Ω and Y = E[X | G], then EY = E[Y; C] = E[X; C] = EX.

Last is (5). Let Z = EX. Z is constant, so clearly G measurable. By the independence, if C ∈ G, then E[X; C] = E[X 1_C] = (EX)(E 1_C) = (EX)(P(C)). But E[Z; C] = (EX)(P(C)) since Z is constant. By Proposition 3.4 we see Z = E[X | G].

Note 2. We prove Proposition 3.6. Note that ZE[X | G] is G measurable, so by Proposition 3.4 we need to show its expectation over sets C in G is the same as that of XZ. As in the proof of Proposition 3.3, it suffices to consider only the case when C is one of the B_i. Now Z is G measurable, hence it is constant on B_i; let its value be z_i. Then

    E[ZE[X | G]; B_i] = E[z_i E[X | G]; B_i] = z_i E[E[X | G]; B_i] = z_i E[X; B_i] = E[XZ; B_i]

as desired.


4. Martingales.

Suppose we have a sequence of σ-fields F_1 ⊂ F_2 ⊂ F_3 ⊂ · · ·. An example would be repeatedly tossing a coin and letting F_k be the sets that can be determined by the first k tosses. Another example is to let F_k be the events that are determined by the values of a stock at times 1 through k. A third example is to let X_1, X_2, . . . be a sequence of random variables and let F_k be the σ-field generated by X_1, . . . , X_k, the smallest σ-field with respect to which X_1, . . . , X_k are measurable.

Definition 4.1. A r.v. X is integrable if E|X| < ∞. Given an increasing sequence of σ-fields F_n, a sequence of r.v.’s X_n is adapted if X_n is F_n measurable for each n.

Definition 4.2. A martingale M_n is a sequence of random variables such that
(1) M_n is integrable for all n,
(2) M_n is adapted to F_n, and
(3) for all n

    E[M_{n+1} | F_n] = M_n.    (4.1)

Usually (1) and (2) are easy to check, and it is (3) that is the crucial property. If we have (1) and (2), but instead of (3) we have
(3′) for all n

    E[M_{n+1} | F_n] ≥ M_n,

then we say M_n is a submartingale. If we have (1) and (2), but instead of (3) we have
(3″) for all n

    E[M_{n+1} | F_n] ≤ M_n,

then we say M_n is a supermartingale.

Submartingales tend to increase and supermartingales tend to decrease. The nomenclature may seem like it goes the wrong way; Doob defined these terms by analogy with the notions of subharmonic and superharmonic functions in analysis. (Actually, it is more than an analogy: we won’t explore this, but it turns out that the composition of a subharmonic function with Brownian motion yields a submartingale, and similarly for superharmonic functions.)

Note that the definition of martingale depends on the collection of σ-fields. When it is needed for clarity, one can say that (M_n, F_n) is a martingale. To define conditional expectation, one needs a probability, so a martingale depends on the probability as well. When we need to, we will say that M_n is a martingale with respect to the probability P. This is an issue when there is more than one probability around.

We will see that martingales are ubiquitous in financial math. For example, security prices and one’s wealth will turn out to be examples of martingales.


The word “martingale” is also used for the piece of a horse’s bridle that runs from the horse’s head to its chest. It keeps the horse from raising its head too high. It turns out that martingales in probability cannot get too large. The word also refers to a gambling system. I did some searching on the Internet, and there seems to be no consensus on the derivation of the term.

Here is an example of a martingale. Let X_1, X_2, . . . be a sequence of independent r.v.’s with mean 0. (Saying a r.v. X_i has mean 0 is the same as saying EX_i = 0; this presupposes that E|X_i| is finite.) Set F_n = σ(X_1, . . . , X_n), the σ-field generated by X_1, . . . , X_n. Let M_n = Σ_{i=1}^n X_i. Definition 4.2(2) is easy to see. Since E|M_n| ≤ Σ_{i=1}^n E|X_i|, Definition 4.2(1) also holds. We now check

    E[M_{n+1} | F_n] = X_1 + · · · + X_n + E[X_{n+1} | F_n] = M_n + EX_{n+1} = M_n,

where we used the independence.
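For mean-zero steps taking finitely many values, the martingale property can be checked exhaustively. A sketch with a fair ±1 coin (an illustrative choice of mean-zero step; the function name is ours): conditioning on F_n amounts to fixing the first n tosses, so E[M_{n+1} | F_n] is the average of M_{n+1} over the two equally likely values of the next toss.

```python
from itertools import product

def check_martingale(n):
    """Verify E[M_{n+1} | F_n] = M_n for M_n = X_1 + ... + X_n,
    with independent fair +/-1 steps, by enumerating all length-n pasts."""
    for past in product([-1, 1], repeat=n):
        m_n = sum(past)
        # Average M_{n+1} over the two equally likely next steps.
        avg_next = sum(m_n + x for x in (-1, 1)) / 2.0
        if avg_next != m_n:
            return False
    return True
```

The average of the next step is 0, so the conditional expectation of the sum never moves.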

Another example: suppose in the above that the X_k all have variance 1, and let M_n = S_n² − n, where S_n = ∑_{i=1}^n X_i. Again (1) and (2) of Definition 4.2 are easy to check. We compute

E[M_{n+1} | F_n] = E[S_n² + 2X_{n+1}S_n + X_{n+1}² | F_n] − (n + 1).

We have E[S_n² | F_n] = S_n² since S_n is F_n measurable. Also,

E[2X_{n+1}S_n | F_n] = 2S_n E[X_{n+1} | F_n] = 2S_n EX_{n+1} = 0,

and E[X_{n+1}² | F_n] = EX_{n+1}² = 1. Substituting, we obtain E[M_{n+1} | F_n] = M_n, so M_n is a martingale.
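Both computations can be checked numerically. The sketch below is not part of the notes: it assumes the concrete case X_i = ±1 with probability 1/2 each (so EX_i = 0 and EX_i² = 1), enumerates all paths, and verifies E[S_{n+1} | F_n] = S_n and E[M_{n+1} | F_n] = M_n directly, where conditioning on F_n amounts to grouping paths by their first n steps.

```python
from itertools import product

def cond_exp_next(paths_and_probs, value, n):
    """E[value at time n+1 | F_n]: group paths by their first n steps and
    average the time-(n+1) value over each group."""
    groups = {}
    for path, prob in paths_and_probs:
        key = path[:n]
        num, den = groups.get(key, (0.0, 0.0))
        groups[key] = (num + prob * value(path, n + 1), den + prob)
    return {k: num / den for k, (num, den) in groups.items()}

N = 4
paths = [(p, 0.5 ** N) for p in product([1, -1], repeat=N)]

S = lambda path, k: sum(path[:k])        # S_k = X_1 + ... + X_k
M = lambda path, k: S(path, k) ** 2 - k  # M_k = S_k^2 - k

for n in range(N):
    for key, ce in cond_exp_next(paths, S, n).items():
        assert abs(ce - sum(key)) < 1e-12             # E[S_{n+1} | F_n] = S_n
    for key, ce in cond_exp_next(paths, M, n).items():
        assert abs(ce - (sum(key) ** 2 - n)) < 1e-12  # E[M_{n+1} | F_n] = M_n
print("martingale property verified")
```

The same grouping idea works for any discrete filtration generated by finitely many coin tosses.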

A third example: Suppose you start with a dollar and you are tossing a fair coin independently. If it turns up heads you double your fortune, tails you go broke. This is “double or nothing.” Let M_n be your fortune at time n. To formalize this, let X_1, X_2, . . . be independent r.v.’s that are equal to 2 with probability 1/2 and 0 with probability 1/2. Then M_n = X_1 ⋯ X_n. Let F_n be the σ-field generated by X_1, . . . , X_n. Note 0 ≤ M_n ≤ 2^n, and so Definition 4.2(1) is satisfied, while (2) is easy. To compute the conditional expectation, note EX_{n+1} = 1. Then

E[M_{n+1} | F_n] = M_n E[X_{n+1} | F_n] = M_n EX_{n+1} = M_n,

using the independence.

Before we give our fourth example, let us observe that

|E[X | F]| ≤ E[|X| | F].    (4.2)

To see this, we have −|X| ≤ X ≤ |X|, so −E[|X| | F] ≤ E[X | F] ≤ E[|X| | F]. Since E[|X| | F] is nonnegative, (4.2) follows.

Our fourth example will be used many times, so we state it as a proposition.


Proposition 4.3. Let F_1, F_2, . . . be given and let X be a fixed r.v. with E|X| < ∞. Let M_n = E[X | F_n]. Then M_n is a martingale.

Proof. Definition 4.2(2) is clear, while

E|M_n| ≤ E[E[|X| | F_n]] = E|X| < ∞

by (4.2); this shows Definition 4.2(1). We have

E[M_{n+1} | F_n] = E[E[X | F_{n+1}] | F_n] = E[X | F_n] = M_n.


5. Properties of martingales.

When it comes to discussing American options, we will need the concept of stopping times. A mapping τ from Ω into the nonnegative integers is a stopping time if (τ = k) ∈ F_k for each k. One sometimes allows τ to also take on the value ∞.

An example is τ = min{k : S_k ≥ A}. This is a stopping time because (τ = k) = (S_0, S_1, . . . , S_{k−1} < A, S_k ≥ A) ∈ F_k. We can think of a stopping time as the first time something happens. σ = max{k : S_k ≥ A}, the last time, is not a stopping time. (We will use the convention that the minimum of an empty set is +∞; so, for example, with the above definition of τ, on the event that S_k never reaches A, we have τ = ∞.)

Here is an intuitive description of a stopping time. If I tell you to drive to the city

limits and then drive until you come to the second stop light after that, you know when

you get there that you have arrived; you don’t need to have been there before or to look

ahead. But if I tell you to drive until you come to the second stop light before the city

limits, either you must have been there before or else you have to go past where you are

supposed to stop, continue on to the city limits, and then turn around and come back two

stop lights. You don’t know when you ﬁrst get to the second stop light before the city

limits that you get to stop there. The ﬁrst set of instructions forms a stopping time, the

second set does not.

Note (τ ≤ k) = ∪_{j=0}^k (τ = j). Since (τ = j) ∈ F_j ⊂ F_k, then the event (τ ≤ k) ∈ F_k for all k. Conversely, if τ is a r.v. with (τ ≤ k) ∈ F_k for all k, then

(τ = k) = (τ ≤ k) − (τ ≤ k − 1).

Since (τ ≤ k) ∈ F_k and (τ ≤ k − 1) ∈ F_{k−1} ⊂ F_k, then (τ = k) ∈ F_k, and such a τ must be a stopping time.
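As a small illustration (a sketch with made-up data, not from the notes), the first time a path reaches a level A can be computed one step at a time using only the values seen so far, whereas the last such time requires knowing the whole path:

```python
def hitting_time(path, A):
    """tau = min{k : S_k >= A}; decided using only S_0,...,S_k (a stopping time).
    Returns None to stand in for tau = +infinity (level never reached)."""
    for k, s in enumerate(path):
        if s >= A:
            return k
    return None

def last_time(path, A):
    """sigma = max{k : S_k >= A}; needs the entire path, so NOT a stopping time."""
    times = [k for k, s in enumerate(path) if s >= A]
    return max(times) if times else None

path = [0, 1, 2, 1, 2, 3, 2]
print(hitting_time(path, 2), last_time(path, 2))  # prints "2 6"
```

The loop in `hitting_time` mirrors the driving analogy above: you can stop the moment the condition is first met, without looking ahead.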

Our first result is Jensen’s inequality.

Proposition 5.1. If g is convex, then

g(E[X | G]) ≤ E[g(X) | G]

provided all the expectations exist.

For ordinary expectations rather than conditional expectations, this is still true. That is, if g is convex and the expectations exist, then

g(EX) ≤ E[g(X)].

We already know some special cases of this: when g(x) = |x|, this says |EX| ≤ E|X|; when g(x) = x², this says (EX)² ≤ EX², which we know because EX² − (EX)² = E(X − EX)² ≥ 0.


For Proposition 5.1 as well as many of the following propositions, the statement of

the result is more important than the proof, and we relegate the proof to Note 1 below.

One reason we want Jensen’s inequality is to show that a convex function applied

to a martingale yields a submartingale.

Proposition 5.2. If M_n is a martingale and g is convex, then g(M_n) is a submartingale, provided all the expectations exist.

Proof. By Jensen’s inequality,

E[g(M_{n+1}) | F_n] ≥ g(E[M_{n+1} | F_n]) = g(M_n).

If M_n is a martingale, then EM_n = E[E[M_{n+1} | F_n]] = EM_{n+1}. So EM_0 = EM_1 = ⋯ = EM_n. Doob’s optional stopping theorem says the same thing holds when fixed times n are replaced by stopping times.

Theorem 5.3. Suppose K is a positive integer, N is a stopping time such that N ≤ K a.s., and M_n is a martingale. Then

EM_N = EM_K.

Here, to evaluate M_N, one first finds N(ω) and then evaluates M_·(ω) for that value of N.

Proof. We have

EM_N = ∑_{k=0}^K E[M_N; N = k].

If we show that the k-th summand is E[M_K; N = k], then the sum will be

∑_{k=0}^K E[M_K; N = k] = EM_K

as desired. We have

E[M_N; N = k] = E[M_k; N = k]

by the definition of M_N. Now (N = k) is in F_k, so by Proposition 2.2 and the fact that M_k = E[M_{k+1} | F_k],

E[M_k; N = k] = E[M_{k+1}; N = k].

We have (N = k) ∈ F_k ⊂ F_{k+1}. Since M_{k+1} = E[M_{k+2} | F_{k+1}], Proposition 2.2 tells us that

E[M_{k+1}; N = k] = E[M_{k+2}; N = k].

We continue, using (N = k) ∈ F_k ⊂ F_{k+1} ⊂ F_{k+2}, and we obtain

E[M_N; N = k] = E[M_k; N = k] = E[M_{k+1}; N = k] = ⋯ = E[M_K; N = k].

If we change the equalities in the above to inequalities, the same result holds for submartingales.
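Theorem 5.3 can be checked by brute force. The sketch below (hypothetical, not from the notes) takes M_n to be a simple random walk with ±1 steps, K = 6, and N the first time |M_k| ≥ 2, capped at K; it enumerates all 2^K equally likely paths and confirms EM_N = EM_K:

```python
from itertools import product

K = 6

def stopped_value(path):
    """M_N for N = min{k : |M_k| >= 2} ∧ K, a stopping time bounded by K."""
    s = 0
    for k, x in enumerate(path, start=1):
        s += x
        if abs(s) >= 2 or k == K:
            return s

# average M_N and M_K over all 2^K paths, each with probability (1/2)^K
E_MN = sum(0.5 ** K * stopped_value(p) for p in product([1, -1], repeat=K))
E_MK = sum(0.5 ** K * sum(p) for p in product([1, -1], repeat=K))
assert abs(E_MN - E_MK) < 1e-12   # optional stopping: both equal EM_0 = 0
print("EM_N = EM_K verified")
```

Note that N here is decided path-by-path using only the steps seen so far, exactly as a stopping time requires.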

As a corollary we have two of Doob’s inequalities:

Theorem 5.4. If M_n is a nonnegative submartingale,

(a) P(max_{k≤n} M_k ≥ λ) ≤ (1/λ) EM_n.

(b) E(max_{k≤n} M_k²) ≤ 4EM_n².

For the proof, see Note 2 below.
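Both inequalities are easy to test numerically. A sketch (my own, with the assumed choice M_k = |S_k| for a ±1 random walk S, a nonnegative submartingale by Proposition 5.2 with g(x) = |x|):

```python
from itertools import product

n = 8
paths = list(product([1, -1], repeat=n))
prob = 0.5 ** n

def running_abs(path):
    """|S_1|, ..., |S_n| for S_k = X_1 + ... + X_k."""
    out, s = [], 0
    for x in path:
        s += x
        out.append(abs(s))
    return out

E_Mn   = sum(prob * running_abs(p)[-1] for p in paths)
E_Mn2  = sum(prob * running_abs(p)[-1] ** 2 for p in paths)
E_max2 = sum(prob * max(running_abs(p)) ** 2 for p in paths)

for lam in (1, 2, 3):
    p_exceed = sum(prob for p in paths if max(running_abs(p)) >= lam)
    assert p_exceed <= E_Mn / lam + 1e-12   # (a): P(max M_k >= lam) <= EM_n / lam
assert E_max2 <= 4 * E_Mn2 + 1e-12          # (b): E(max M_k)^2 <= 4 EM_n^2
print("Doob inequalities verified")
```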

Note 1. We prove Proposition 5.1. If g is convex, then the graph of g lies above all the tangent lines. Even if g does not have a derivative at x_0, there is a line passing through x_0 which lies beneath the graph of g. So for each x_0 there exists c(x_0) such that

g(x) ≥ g(x_0) + c(x_0)(x − x_0).

Apply this with x = X(ω) and x_0 = E[X | G](ω). We then have

g(X) ≥ g(E[X | G]) + c(E[X | G])(X − E[X | G]).

If g is differentiable, we let c(x_0) = g′(x_0). In the case where g is not differentiable, then we choose c to be the left hand upper derivate, for example. (For those who are not familiar with derivates, this is essentially the left hand derivative.) One can check that if c is so chosen, then c(E[X | G]) is G measurable.

Now take the conditional expectation with respect to G. The first term on the right is G measurable, so remains the same. The second term on the right is equal to

c(E[X | G]) E[X − E[X | G] | G] = 0.

Note 2. We prove Theorem 5.4. Set M_{n+1} = M_n. It is easy to see that the sequence M_1, M_2, . . . , M_{n+1} is also a submartingale. Let N = min{k : M_k ≥ λ} ∧ (n + 1), the first time that M_k is greater than or equal to λ, where a ∧ b = min(a, b). Then

P(max_{k≤n} M_k ≥ λ) = P(N ≤ n)

and if N ≤ n, then M_N ≥ λ. Now

P(max_{k≤n} M_k ≥ λ) = E[1_{(N≤n)}] ≤ E[M_N/λ; N ≤ n]    (5.1)
= (1/λ) E[M_{N∧n}; N ≤ n] ≤ (1/λ) EM_{N∧n}.

Finally, since M_n is a submartingale, EM_{N∧n} ≤ EM_n.

We now look at (b). Let us write M* for max_{k≤n} M_k. If EM_n² = ∞, there is nothing to prove. If it is finite, then by Jensen’s inequality, we have

EM_k² = E[E[M_n | F_k]²] ≤ E[E[M_n² | F_k]] = EM_n² < ∞

for k ≤ n. Then

E(M*)² = E[max_{1≤k≤n} M_k²] ≤ E[∑_{k=1}^n M_k²] < ∞.

We have

E[M_{N∧n}; N ≤ n] = ∑_{k=0}^∞ E[M_{k∧n}; N = k].

Arguing as in the proof of Theorem 5.3,

E[M_{k∧n}; N = k] ≤ E[M_n; N = k],

and so

E[M_{N∧n}; N ≤ n] ≤ ∑_{k=0}^∞ E[M_n; N = k] = E[M_n; N ≤ n].

The last expression is at most E[M_n; M* ≥ λ]. If we multiply (5.1) by 2λ and integrate over λ from 0 to ∞, we obtain

∫_0^∞ 2λP(M* ≥ λ) dλ ≤ 2∫_0^∞ E[M_n; M* ≥ λ] dλ
= 2E[∫_0^∞ M_n 1_{(M*≥λ)} dλ]
= 2E[M_n ∫_0^{M*} dλ]
= 2E[M_n M*].

Using Cauchy-Schwarz, this is bounded by

2(EM_n²)^{1/2} (E(M*)²)^{1/2}.

On the other hand,

∫_0^∞ 2λP(M* ≥ λ) dλ = E[∫_0^∞ 2λ 1_{(M*≥λ)} dλ] = E[∫_0^{M*} 2λ dλ] = E(M*)².

We therefore have

E(M*)² ≤ 2(EM_n²)^{1/2} (E(M*)²)^{1/2}.

Recall we showed E(M*)² < ∞. We divide both sides by (E(M*)²)^{1/2}, square both sides, and obtain (b).

Note 3. We will show that bounded martingales converge. (The hypothesis of boundedness can be weakened; for example, E|M_n| ≤ c < ∞ for some c not depending on n suffices.)

Theorem 5.5. Suppose M_n is a martingale bounded in absolute value by K. That is, |M_n| ≤ K for all n. Then lim_{n→∞} M_n exists a.s.

Proof. Since M_n is bounded, it can’t tend to +∞ or −∞. The only possibility is that it might oscillate. Let a < b be two rationals. What might go wrong is that M_n might be larger than b infinitely often and less than a infinitely often. If we show the probability of this is 0, then taking the union over all pairs of rationals (a, b) shows that almost surely M_n cannot oscillate, and hence must converge.

Fix a < b, let N_n = (M_n − a)^+, and let S_1 = min{k : N_k ≤ 0}, T_1 = min{k > S_1 : N_k ≥ b − a}, S_2 = min{k > T_1 : N_k ≤ 0}, and so on. Let U_n = max{k : T_k ≤ n}. U_n is called the number of upcrossings up to time n. We want to show that max_n U_n < ∞ a.s. Note by Jensen’s inequality N_n is a submartingale. Since S_1 < T_1 < S_2 < ⋯, then S_{n+1} > n. We can write

2K ≥ N_n − N_{S_{n+1}∧n} = ∑_{k=1}^{n+1} (N_{S_{k+1}∧n} − N_{T_k∧n}) + ∑_{k=1}^{n+1} (N_{T_k∧n} − N_{S_k∧n}).

Now take expectations. The expectation of the first sum on the right and the last term are greater than or equal to zero by optional stopping. The middle term is larger than (b − a)U_n, so we conclude

(b − a)EU_n ≤ 2K.

Let n → ∞ to see that E max_n U_n < ∞, which implies max_n U_n < ∞ a.s., which is what we needed.

Note 4. We will state Fatou’s lemma in the following form.

If X_n is a sequence of nonnegative random variables converging to X a.s., then EX ≤ sup_n EX_n.

This formulation is equivalent to the classical one and is better suited for our use.


6. The one step binomial asset pricing model.

Let us begin by giving the simplest possible model of a stock and see how a European

call option should be valued in this context.

Suppose we have a single stock whose price is S_0. Let d and u be two numbers with 0 < d < 1 < u. Here “d” is a mnemonic for “down” and “u” for “up.” After one time unit the stock price will be either uS_0 with probability P or else dS_0 with probability Q, where P + Q = 1. We will assume 0 < P, Q < 1. Instead of purchasing shares in the stock, you can also put your money in the bank where one will earn interest at rate r. Alternatives to the bank are money market funds or bonds; the key point is that these are considered to be risk-free.

A European call option in this context is the option to buy one share of the stock at time 1 at price K. K is called the strike price. Let S_1 be the price of the stock at time 1. If S_1 is less than K, then the option is worthless at time 1. If S_1 is greater than K, you can use the option at time 1 to buy the stock at price K, immediately turn around and sell the stock for price S_1 and make a profit of S_1 − K. So the value of the option at time 1 is

V_1 = (S_1 − K)^+,

where x^+ is max(x, 0). The principal question to be answered is: what is the value V_0 of the option at time 0? In other words, how much should one pay for a European call option with strike price K?

It is possible to buy a negative number of shares of a stock. This is equivalent to

selling shares of a stock you don’t have and is called selling short. If you sell one share

of stock short, then at time 1 you must buy one share at whatever the market price is at

that time and turn it over to the person that you sold the stock short to. Similarly you

can buy a negative number of options, that is, sell an option.

You can also deposit a negative amount of money in the bank, which is the same

as borrowing. We assume that you can borrow at the same interest rate r, not exactly a

totally realistic assumption. One way to make it seem more realistic is to assume you have

a large amount of money on deposit, and when you borrow, you simply withdraw money

from that account.

We are looking at the simplest possible model, so we are going to allow only one

time step: one makes an investment, and looks at it again one day later.

Let’s suppose the price of a European call option is V_0 and see what conditions one can put on V_0. Suppose you start out with V_0 dollars. One thing you could do is buy one option. The other thing you could do is use the money to buy ∆_0 shares of stock. If V_0 > ∆_0 S_0, there will be some money left over and you put that in the bank. If V_0 < ∆_0 S_0, you do not have enough money to buy the stock, and you make up the shortfall by borrowing money from the bank. In either case, at this point you have V_0 − ∆_0 S_0 in the bank and ∆_0 shares of stock.

If the stock goes up, at time 1 you will have

∆_0 uS_0 + (1 + r)(V_0 − ∆_0 S_0),

and if it goes down,

∆_0 dS_0 + (1 + r)(V_0 − ∆_0 S_0).

We have not said what ∆_0 should be. Let us do that now. Let V_1^u = (uS_0 − K)^+ and V_1^d = (dS_0 − K)^+. Note these are deterministic quantities, i.e., not random. Let

∆_0 = (V_1^u − V_1^d)/(uS_0 − dS_0),

and we will also need

W_0 = (1/(1 + r)) [ ((1 + r − d)/(u − d)) V_1^u + ((u − (1 + r))/(u − d)) V_1^d ].

In a moment we will do some algebra and see that if the stock goes up and you had bought stock instead of the option you would now have

V_1^u + (1 + r)(V_0 − W_0),

while if the stock went down, you would now have

V_1^d + (1 + r)(V_0 − W_0).

Let’s check the first of these, the second being similar. We need to show

∆_0 uS_0 + (1 + r)(V_0 − ∆_0 S_0) = V_1^u + (1 + r)(V_0 − W_0).    (6.1)

The left hand side of (6.1) is equal to

∆_0 S_0 (u − (1 + r)) + (1 + r)V_0 = ((V_1^u − V_1^d)/(u − d))(u − (1 + r)) + (1 + r)V_0.    (6.2)

The right hand side of (6.1) is equal to

V_1^u − [ ((1 + r − d)/(u − d)) V_1^u + ((u − (1 + r))/(u − d)) V_1^d ] + (1 + r)V_0.    (6.3)

Now check that the coefficients of V_0, of V_1^u, and of V_1^d agree in (6.2) and (6.3).
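The algebra above can also be double-checked numerically. A minimal sketch (with made-up one-period parameters u = 2, d = 1/2, r = 0.1, S_0 = 10, K = 15, which are assumptions for illustration only) computes ∆_0 and W_0 and verifies that the stock-plus-bank portfolio replicates the option payoff in both states:

```python
def one_step_price(S0, u, d, r, payoff):
    """Replicating-portfolio price of a one-period claim paying payoff(S1)."""
    Vu, Vd = payoff(u * S0), payoff(d * S0)
    delta0 = (Vu - Vd) / (u * S0 - d * S0)       # shares of stock to hold
    p = (1 + r - d) / (u - d)
    q = (u - (1 + r)) / (u - d)
    W0 = (p * Vu + q * Vd) / (1 + r)             # the fair price
    # verify the hedge replicates the payoff whether the stock goes up or down
    for S1, V1 in [(u * S0, Vu), (d * S0, Vd)]:
        assert abs(delta0 * S1 + (1 + r) * (W0 - delta0 * S0) - V1) < 1e-9
    return W0, delta0

S0, u, d, r, K = 10, 2.0, 0.5, 0.1, 15
W0, delta0 = one_step_price(S0, u, d, r, lambda s: max(s - K, 0.0))
print(round(W0, 4), round(delta0, 4))  # prints "1.8182 0.3333"
```

The `payoff` argument can be any function of S_1, anticipating Remark 6.2 below.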

Suppose that V_0 > W_0. What you want to do is come along with no money, sell one option for V_0 dollars, use the money to buy ∆_0 shares, and put the rest in the bank (or borrow if necessary). If the buyer of your option wants to exercise the option, you give him one share of stock and sell the rest. If he doesn’t want to exercise the option, you sell your shares of stock and pocket the money. Remember it is possible to have a negative number of shares. You will have cleared (1 + r)(V_0 − W_0), whether the stock went up or down, with no risk.

If V_0 < W_0, you just do the opposite: sell ∆_0 shares of stock short, buy one option, and deposit or make up the shortfall from the bank. This time, you clear (1 + r)(W_0 − V_0), whether the stock goes up or down.

Now most people believe that you can’t make a profit on the stock market without taking a risk. The name for this is “no free lunch,” or “arbitrage opportunities do not exist.” The only way to avoid this is if V_0 = W_0. In other words, we have shown that the only reasonable price for the European call option is W_0.

The “no arbitrage” condition is not just a reflection of the belief that one cannot get something for nothing. It also represents the belief that the market is freely competitive. The way it works is this: suppose W_0 = $3. Suppose you could sell options at a price V_0 = $5; this is larger than W_0 and you would earn V_0 − W_0 = $2 per option without risk. Then someone else would observe this and decide to sell the same option at a price less than V_0 but larger than W_0, say $4. This person would still make a profit, and customers would go to him and ignore you because they would be getting a better deal. But then a third person would decide to sell the option for less than your competition but more than W_0, say at $3.50. This would continue as long as anyone would try to sell an option above price W_0.

We will examine this problem of pricing options in more complicated contexts, and while doing so, it will become apparent where the formulas for ∆_0 and W_0 came from. At this point, we want to make a few observations.

Remark 6.1. First of all, if 1 + r > u, one would never buy stock, since one can always do better by putting money in the bank. So we may suppose 1 + r < u. We always have 1 + r ≥ 1 > d. If we set

p = (1 + r − d)/(u − d),   q = (u − (1 + r))/(u − d),

then p, q ≥ 0 and p + q = 1. Thus p and q act like probabilities, but they have nothing to do with P and Q. Note also that the price V_0 = W_0 does not depend on P or Q. It does depend on p and q, which seems to suggest that there is an underlying probability which controls the option price and is not the one that governs the stock price.

Remark 6.2. There is nothing special about European call options in our argument above. One could let V_1^u and V_1^d be any two values of any option, which are paid out if the stock goes up or down, respectively. The above analysis shows we can exactly duplicate the result of buying any option V by instead buying some shares of stock. If in some model one can do this for any option, the market is called complete in this model.

Remark 6.3. If we let P be the probability so that S_1 = uS_0 with probability p and S_1 = dS_0 with probability q and we let E be the corresponding expectation, then some algebra shows that

V_0 = (1/(1 + r)) EV_1.

This will be generalized later.

Remark 6.4. If one buys one share of stock at time 0, then one expects at time 1 to have (Pu + Qd)S_0. One then divides by 1 + r to get the value of the stock in today’s dollars. (r, the risk-free interest rate, can also be considered the rate of inflation. A dollar tomorrow is equivalent to 1/(1 + r) dollars today.) Suppose instead of P and Q being the probabilities of going up and down, they were in fact p and q. One would then expect to have (pu + qd)S_0 and then divide by 1 + r. Substituting the values for p and q, this reduces to S_0. In other words, if p and q were the correct probabilities, one would expect to have the same amount of money one started with. When we get to the binomial asset pricing model with more than one step, we will see that the generalization of this fact is that the stock price at time n is a martingale, still with the assumption that p and q are the correct probabilities. This is a special case of the fundamental theorem of finance: there always exists some probability, not necessarily the one you observe, under which the stock price is a martingale.

Remark 6.5. Our model allows after one time step the possibility of the stock going up or going down, but only these two options. What if instead there are 3 (or more) possibilities? Suppose, for example, that the stock goes up a factor u with probability P, down a factor d with probability Q, and remains constant with probability R, where P + Q + R = 1. The corresponding payoff of a European call option would be (uS_0 − K)^+, (dS_0 − K)^+, or (S_0 − K)^+. If one could replicate this outcome by buying and selling shares of the stock, then the “no arbitrage” rule would give the exact value of the call option in this model. But, except in very special circumstances, one cannot do this, and the theory falls apart. One has three equations one wants to satisfy, in terms of V_1^u, V_1^d, and V_1^c. (The “c” is a mnemonic for “constant.”) There are however only two variables, ∆_0 and V_0, at your disposal, and most of the time three equations in two unknowns cannot be solved.

Remark 6.6. In our model we ruled out the cases that P or Q were zero. If Q = 0, that is, we are certain that the stock will go up, then we would always invest in the stock if u > 1 + r, as we would always do better, and we would always put the money in the bank if u ≤ 1 + r. Similar considerations apply when P = 0. It is interesting to note that the cases where P = 0 or Q = 0 are the only ones in which our derivation is not valid. It turns out that in more general models the true probabilities enter only in determining which events have probability 0 or 1 and in no other way.


7. The multi-step binomial asset pricing model.

In this section we will obtain a formula for the pricing of options when there are n

time steps, but each time the stock can only go up by a factor u or down by a factor d.

The “Black-Scholes” formula we will obtain is already a nontrivial result that is useful.

We assume the following.

(1) Unlimited short selling of stock

(2) Unlimited borrowing

(3) No transaction costs

(4) Our buying and selling is on a small enough scale that it does not aﬀect the market.

We need to set up the probability model. Ω will be all sequences of length n of H’s and T’s. S_0 will be a fixed number and we define S_k(ω) = u^j d^{k−j} S_0 if the first k elements of a given ω ∈ Ω have j occurrences of H and k − j occurrences of T. (What we are doing is saying that if the j-th element of the sequence making up ω is an H, then the stock price goes up by a factor u; if T, then down by a factor d.) F_k will be the σ-field generated by S_0, . . . , S_k.

Let

p = ((1 + r) − d)/(u − d),   q = (u − (1 + r))/(u − d)

and define P(ω) = p^j q^{n−j} if ω has j appearances of H and n − j appearances of T. We observe that under P the random variables S_{k+1}/S_k are independent and equal to u with probability p and d with probability q. To see this, let Y_k = S_k/S_{k−1}. Thus Y_k is the factor the stock price goes up or down at time k. Then P(Y_1 = y_1, . . . , Y_n = y_n) = p^j q^{n−j}, where j is the number of the y_k that are equal to u. On the other hand, this is equal to P(Y_1 = y_1) ⋯ P(Y_n = y_n). Let E denote the expectation corresponding to P.

The P we construct may not be the true probabilities of going up or down. That

doesn’t matter - it will turn out that using the principle of “no arbitrage,” it is P that

governs the price.

Our first result is the fundamental theorem of finance in the current context.

Proposition 7.1. Under P the discounted stock price (1 + r)^{−k} S_k is a martingale.

Proof. Since the random variable S_{k+1}/S_k is independent of F_k, we have

E[(1 + r)^{−(k+1)} S_{k+1} | F_k] = (1 + r)^{−k} S_k (1 + r)^{−1} E[S_{k+1}/S_k | F_k].

Using the independence, the conditional expectation on the right is equal to

E[S_{k+1}/S_k] = pu + qd = 1 + r.

Substituting yields the proposition.

Let ∆_k be the number of shares held between times k and k + 1. We require ∆_k to be F_k measurable. ∆_0, ∆_1, . . . is called the portfolio process. Let W_0 be the amount of money you start with and let W_k be the amount of money you have at time k. W_k is the wealth process. If we have ∆_k shares between times k and k + 1, then at time k + 1 those shares will be worth ∆_k S_{k+1}. The amount of cash we hold between time k and k + 1 is W_k minus the amount held in stock, that is, W_k − ∆_k S_k. At time k + 1 this is worth (1 + r)[W_k − ∆_k S_k]. Therefore

W_{k+1} = ∆_k S_{k+1} + (1 + r)[W_k − ∆_k S_k].

Note that in the case where r = 0 we have

W_{k+1} − W_k = ∆_k(S_{k+1} − S_k),

or

W_{k+1} = W_0 + ∑_{i=0}^k ∆_i(S_{i+1} − S_i).

This is a discrete version of a stochastic integral. Since

E[W_{k+1} − W_k | F_k] = ∆_k E[S_{k+1} − S_k | F_k] = 0,

it follows that in the case r = 0, W_k is a martingale. More generally

is a martingale. More generally

Proposition 7.2. Under P the discounted wealth process (1 +r)

−k

W

k

is a martingale.

Proof. We have

(1 +r)

−(k+1)

W

k+1

= (1 +r)

−k

W

k

+ ∆

k

[(1 +r)

−(k+1)

S

k+1

−(1 +r)

−k

S

k

].

Observe that

E[∆

k

[(1 +r)

−(k+1)

S

k+1

−(1 +r)

−k

S

k

[ T

k

]

= ∆

k

E[(1 +r)

−(k+1)

S

k+1

−(1 +r)

−k

S

k

[ T

k

] = 0.

The result follows.

Our next result is that the binomial model is complete. It is easy to lose the idea in the algebra, so first let us try to see why the theorem is true.

For simplicity let us first consider the case r = 0. Let V_k = E[V | F_k]; by Proposition 4.3 we see that V_k is a martingale. We want to construct a portfolio process, i.e., choose ∆_k’s, so that W_n = V. We will do it inductively by arranging matters so that W_k = V_k for all k. Recall that W_k is also a martingale.

Suppose we have W_k = V_k at time k and we want to find ∆_k so that W_{k+1} = V_{k+1}. At the (k + 1)-st step there are only two possible changes for the price of the stock and so, since V_{k+1} is F_{k+1} measurable, there are only two possible values for V_{k+1}. We need to choose ∆_k so that W_{k+1} = V_{k+1} for each of these two possibilities. We only have one parameter, ∆_k, to play with to match up two numbers, which may seem like an overconstrained system of equations. But both V and W are martingales, which is why the system can be solved.

Now let us turn to the details. In the following proof we allow r ≥ 0.

Theorem 7.3. The binomial asset pricing model is complete.

The precise meaning of this is the following. If V is any random variable that is F_n measurable, there exists a constant W_0 and a portfolio process ∆_k so that the wealth process W_k satisfies W_n = V. In other words, starting with W_0 dollars, we can trade shares of stock to exactly duplicate the outcome of any option V.

Proof. Let

V_k = (1 + r)^k E[(1 + r)^{−n} V | F_k].

By Proposition 4.3, (1 + r)^{−k} V_k is a martingale. If ω = (t_1, . . . , t_n), where each t_i is an H or T, let

∆_k(ω) = [V_{k+1}(t_1, . . . , t_k, H, t_{k+2}, . . . , t_n) − V_{k+1}(t_1, . . . , t_k, T, t_{k+2}, . . . , t_n)] / [S_{k+1}(t_1, . . . , t_k, H, t_{k+2}, . . . , t_n) − S_{k+1}(t_1, . . . , t_k, T, t_{k+2}, . . . , t_n)].

Set W_0 = V_0, and we will show by induction that the wealth process at time k equals V_k.

The first thing to show is that ∆_k is F_k measurable. Neither S_{k+1} nor V_{k+1} depends on t_{k+2}, . . . , t_n. So ∆_k depends only on the variables t_1, . . . , t_k, hence is F_k measurable.

Now t_{k+2}, . . . , t_n play no role in the rest of the proof, and t_1, . . . , t_k will be fixed, so we drop the t’s from the notation. If we write V_{k+1}(H), this is an abbreviation for V_{k+1}(t_1, . . . , t_k, H, t_{k+2}, . . . , t_n).

We know (1 + r)^{−k} V_k is a martingale under P so that

V_k = E[(1 + r)^{−1} V_{k+1} | F_k] = (1/(1 + r)) [pV_{k+1}(H) + qV_{k+1}(T)].    (7.1)

(See Note 1.) We now suppose W_k = V_k and want to show W_{k+1}(H) = V_{k+1}(H) and W_{k+1}(T) = V_{k+1}(T). Then using induction we have W_n = V_n = V as required. We show the first equality, the second being similar.


W_{k+1}(H) = ∆_k S_{k+1}(H) + (1 + r)[W_k − ∆_k S_k]
= ∆_k[uS_k − (1 + r)S_k] + (1 + r)V_k
= [(V_{k+1}(H) − V_{k+1}(T))/((u − d)S_k)] S_k[u − (1 + r)] + pV_{k+1}(H) + qV_{k+1}(T)
= V_{k+1}(H).

We are done.
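The induction in the proof translates directly into an algorithm. The following sketch (an illustration of the argument, not code from the notes) builds V_k by backward induction from (7.1), trades ∆_k shares along every possible path, and checks that the wealth at time n equals V path by path; the parameters are made up:

```python
from itertools import product

def replicate(S0, u, d, r, n, payoff):
    """Check W_n = V on every path, trading Delta_k shares as in Theorem 7.3."""
    p = (1 + r - d) / (u - d)
    q = (u - (1 + r)) / (u - d)

    def S(path):                       # stock price after the tosses in `path`
        return S0 * u ** path.count("H") * d ** path.count("T")

    V = {}                             # V_k indexed by the first k tosses
    for path in product("HT", repeat=n):
        V["".join(path)] = payoff(S(path))
    for k in range(n - 1, -1, -1):     # backward induction, eq. (7.1)
        for path in product("HT", repeat=k):
            key = "".join(path)
            V[key] = (p * V[key + "H"] + q * V[key + "T"]) / (1 + r)

    for path in product("HT", repeat=n):
        W = V[""]                      # start with W_0 = V_0
        for k in range(n):
            key = "".join(path[:k])
            delta = (V[key + "H"] - V[key + "T"]) / (S(path[:k]) * (u - d))
            W = delta * S(path[:k + 1]) + (1 + r) * (W - delta * S(path[:k]))
        assert abs(W - V["".join(path)]) < 1e-9   # wealth duplicates the option
    return V[""]

price = replicate(10, 2.0, 0.5, 0.1, 3, lambda s: max(s - 15.0, 0.0))
print(round(price, 4))                 # prints "4.2074"
```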

Finally, we obtain the Black-Scholes formula in this context. Let V be any option that is F_n measurable. The one we have in mind is the European call, for which V = (S_n − K)^+, but the argument is the same for any option whatsoever.

Theorem 7.4. The value of the option V at time 0 is V_0 = (1 + r)^{−n} EV.

Proof. We can construct a portfolio process ∆_k so that if we start with W_0 = (1 + r)^{−n} EV, then the wealth at time n will equal V, no matter what the market does in between. If we could buy or sell the option V at a price other than W_0, we could obtain a riskless profit. That is, if the option V could be sold at a price c_0 larger than W_0, we would sell the option for c_0 dollars, use W_0 to buy and sell stock according to the portfolio process ∆_k, have a net worth of V + (1 + r)^n(c_0 − W_0) at time n, meet our obligation to the buyer of the option by using V dollars, and have a net profit, at no risk, of (1 + r)^n(c_0 − W_0). If c_0 were less than W_0, we would do the same except buy an option, hold −∆_k shares at time k, and again make a riskless profit. By the “no arbitrage” rule, that can’t happen, so the price of the option V must be W_0.

Remark 7.5. Note that the proof of Theorem 7.4 tells you precisely what hedging

strategy (i.e., what portfolio process) to use.

In the binomial asset pricing model, there is no difficulty computing the price of a European call. We have

E(S_n − K)^+ = ∑_x (x − K)^+ P(S_n = x)

and

P(S_n = x) = C(n, k) p^k q^{n−k}

if x = u^k d^{n−k} S_0, where C(n, k) is the binomial coefficient. Therefore the price of the European call is

(1 + r)^{−n} ∑_{k=0}^n (u^k d^{n−k} S_0 − K)^+ C(n, k) p^k q^{n−k}.
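This sum is straightforward to evaluate. A short sketch (mine, not from the notes); with the parameters of the worked example at the end of this section (n = 3, u = 2, d = 1/2, r = 0.1, S_0 = 10, K = 15) it reproduces the value 4.2074:

```python
from math import comb

def euro_call_binomial(S0, u, d, r, n, K):
    """European call price: discounted risk-neutral expectation of (S_n - K)^+."""
    p = (1 + r - d) / (u - d)
    q = (u - (1 + r)) / (u - d)
    payoff_sum = sum(
        max(u ** k * d ** (n - k) * S0 - K, 0.0) * comb(n, k) * p ** k * q ** (n - k)
        for k in range(n + 1)
    )
    return payoff_sum / (1 + r) ** n

print(round(euro_call_binomial(10, 2.0, 0.5, 0.1, 3, 15), 4))  # prints "4.2074"
```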


The formula in Theorem 7.4 holds for exotic options as well. Suppose

V = max_{i=1,...,n} S_i − min_{j=1,...,n} S_j.

In other words, you sell the stock for the maximum value it takes during the first n time steps and you buy at the minimum value the stock takes; you are allowed to wait until time n and look back to see what the maximum and minimum were. You can even do this if the maximum comes before the minimum. This V is still F_n measurable, so the theory applies. Naturally, such a “buy low, sell high” option is very desirable, and the price of such a V will be quite high. It is interesting that even without using options, you can duplicate the operation of buying low and selling high by holding an appropriate number of shares ∆_k at time k, where you do not look into the future to determine ∆_k.
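Theorem 7.4 prices this lookback payoff by simply averaging over all 2^n paths under p and q. A sketch (my own, with parameters chosen to match the example below; they are not dictated by this paragraph):

```python
from itertools import product

def lookback_price(S0, u, d, r, n):
    """Price of V = max_k S_k - min_k S_k by Theorem 7.4: V_0 = (1+r)^{-n} E V."""
    p = (1 + r - d) / (u - d)
    q = (u - (1 + r)) / (u - d)
    total = 0.0
    for path in product("HT", repeat=n):
        prices, s = [], S0
        for toss in path:
            s *= u if toss == "H" else d
            prices.append(s)           # S_1, ..., S_n along this path
        prob = p ** path.count("H") * q ** path.count("T")
        total += prob * (max(prices) - min(prices))
    return total / (1 + r) ** n

print(round(lookback_price(10, 2.0, 0.5, 0.1, 3), 4))
```

Because the payoff depends on the whole path and not just S_n, the single-sum formula above does not apply, but the expectation over paths still does.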

Let us look at an example of a European call so that it is clear how to do the calculations. Consider the binomial asset pricing model with n = 3, u = 2, d = 1/2, r = 0.1, S_0 = 10, and K = 15. If V is a European call with strike price K and exercise date n, let us compute explicitly the random variables V_1 and V_2 and calculate the value V_0. Let us also compute the hedging strategy ∆_0, ∆_1, and ∆_2.

Let

p = ((1 + r) − d)/(u − d) = .4,   q = (u − (1 + r))/(u − d) = .6.

The following table describes the values of the stock, the payoff V, and the probabilities for each possible outcome ω.


ω     S_1    S_2     S_3      V    Probability
HHH   10u    10u²    10u³     65   p³
HHT   10u    10u²    10u²d    5    p²q
HTH   10u    10ud    10u²d    5    p²q
HTT   10u    10ud    10ud²    0    pq²
THH   10d    10ud    10u²d    5    p²q
THT   10d    10ud    10ud²    0    pq²
TTH   10d    10d²    10ud²    0    pq²
TTT   10d    10d²    10d³     0    q³

We then calculate

V_0 = (1 + r)^{−3} EV = (1 + r)^{−3}(65p³ + 15p²q) = 4.2074.

V_1 = (1 + r)^{−2} E[V | F_1], so we have

V_1(H) = (1 + r)^{−2}(65p² + 10pq) = 10.5785,   V_1(T) = (1 + r)^{−2} 5p² = .6612.

V_2 = (1 + r)^{−1} E[V | F_2], so we have

V_2(HH) = (1 + r)^{−1}(65p + 5q) = 26.3636,   V_2(HT) = (1 + r)^{−1} 5p = 1.8182,
V_2(TH) = (1 + r)^{−1} 5p = 1.8182,   V_2(TT) = 0.

The formula for ∆_k is given by

∆_k = (V_{k+1}(H) − V_{k+1}(T)) / (S_{k+1}(H) − S_{k+1}(T)),

so

∆_0 = (V_1(H) − V_1(T)) / (S_1(H) − S_1(T)) = (10.5785 − .6612)/(20 − 5) = .6612,

where V_1 and S_1 are as above.

∆_1(H) = (V_2(HH) − V_2(HT)) / (S_2(HH) − S_2(HT)) = .8182,   ∆_1(T) = (V_2(TH) − V_2(TT)) / (S_2(TH) − S_2(TT)) = .2424.

∆_2(HH) = (V_3(HHH) − V_3(HHT)) / (S_3(HHH) − S_3(HHT)) = 1.0,
∆_2(HT) = (V_3(HTH) − V_3(HTT)) / (S_3(HTH) − S_3(HTT)) = .3333,
∆_2(TH) = (V_3(THH) − V_3(THT)) / (S_3(THH) − S_3(THT)) = .3333,
∆_2(TT) = (V_3(TTH) − V_3(TTT)) / (S_3(TTH) − S_3(TTT)) = 0.0.
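The backward induction used above is easy to mechanize. Here is a short sketch (ours, not part of the notes) that reprices this example; the function name price_call and its default arguments are our own.

```python
def price_call(S0=10.0, u=2.0, d=0.5, r=0.1, K=15.0, n=3):
    # risk-neutral probabilities p and q, as in the text
    p = ((1 + r) - d) / (u - d)
    q = (u - (1 + r)) / (u - d)
    # terminal payoffs (x - K)^+, indexed by the number j of heads
    V = [max(S0 * u**j * d**(n - j) - K, 0.0) for j in range(n + 1)]
    # step back: each V_k averages the two time-(k+1) values and discounts
    for k in range(n - 1, -1, -1):
        V = [(p * V[j + 1] + q * V[j]) / (1 + r) for j in range(k + 1)]
    return V[0]

print(round(price_call(), 4))  # 4.2074
```

Changing K or the payoff line lets one reprice other European claims in the same model.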


Note 1. The second equality in (7.1) is not entirely obvious. Intuitively, it says that one has a heads with probability p, in which case the value of V_{k+1} is V_{k+1}(H), and one has tails with probability q, in which case the value of V_{k+1} is V_{k+1}(T).

Let us give a more rigorous proof of (7.1). The right hand side of (7.1) is F_k measurable, so we need to show that if A ∈ F_k, then

E[V_{k+1}; A] = E[pV_{k+1}(H) + qV_{k+1}(T); A].

By linearity, it suffices to show this for A = {ω = (t_1 t_2 · · · t_n) : t_1 = s_1, . . . , t_k = s_k}, where s_1 s_2 · · · s_k is any sequence of H's and T's. Now

E[V_{k+1}; s_1 · · · s_k] = E[V_{k+1}; s_1 · · · s_k H] + E[V_{k+1}; s_1 · · · s_k T]
= V_{k+1}(s_1 · · · s_k H) P(s_1 · · · s_k H) + V_{k+1}(s_1 · · · s_k T) P(s_1 · · · s_k T).

By independence this is

V_{k+1}(s_1 · · · s_k H) P(s_1 · · · s_k) p + V_{k+1}(s_1 · · · s_k T) P(s_1 · · · s_k) q,

which is what we wanted.


8. American options.

An American option is one where you can exercise the option any time before some

ﬁxed time T. For example, on a European call, one can only use it to buy a share of stock

at the expiration time T, while for an American call, at any time before time T, one can

decide to pay K dollars and obtain a share of stock.

Let us give an informal argument on how to price an American call, giving a more rigorous argument in a moment. One can always wait until time T to exercise an American call, so the value must be at least as great as that of a European call. On the other hand, suppose you decide to exercise early. You pay K dollars, receive one share of stock, and your wealth is S_t − K. You hold onto the stock, and at time T you have one share of stock worth S_T, for which you paid K dollars. So your wealth is S_T − K ≤ (S_T − K)^+. In fact, we have strict inequality, because you lost the interest on your K dollars that you would have received if you had waited to exercise until time T. Therefore an American call is worth no more than a European call, and hence its value must be the same as that of a European call.

This argument does not work for puts, because selling stock gives you some money

on which you will receive interest, so it may be advantageous to exercise early. (A put is

the option to sell a stock at a price K at time T.)

Here is the more rigorous argument. Suppose that if you exercise the option at time k, your payoff is g(S_k). In present day dollars, that is, after correcting for inflation, you have (1 + r)^{−k} g(S_k). You have to make a decision on when to exercise the option, and that decision can only be based on what has already happened, not on what is going to happen in the future. In other words, we have to choose a stopping time τ, and we exercise the option at time τ(ω). Thus our payoff is (1 + r)^{−τ} g(S_τ). This is a random quantity. What we want to do is find the stopping time that maximizes the expected value of this random variable. As usual, we work with P, and thus we are looking for the stopping time τ such that τ ≤ n and

E(1 + r)^{−τ} g(S_τ)

is as large as possible. The problem of finding such a τ is called an optimal stopping problem.

Suppose g(x) is convex with g(0) = 0. Certainly g(x) = (x − K)^+ is such a function. We will show that τ ≡ n is the solution to the above optimal stopping problem: the best time to exercise is as late as possible. We have

g(λx) = g(λx + (1 − λ)·0) ≤ λg(x) + (1 − λ)g(0) = λg(x),   0 ≤ λ ≤ 1.   (8.1)


By Jensen's inequality,

E[(1 + r)^{−(k+1)} g(S_{k+1}) | F_k] = (1 + r)^{−k} E[ (1/(1 + r)) g(S_{k+1}) | F_k ]
≥ (1 + r)^{−k} E[ g(S_{k+1}/(1 + r)) | F_k ]
≥ (1 + r)^{−k} g( E[ S_{k+1}/(1 + r) | F_k ] )
= (1 + r)^{−k} g(S_k).

For the first inequality we used (8.1). So (1 + r)^{−k} g(S_k) is a submartingale. By optional stopping,

E[(1 + r)^{−τ} g(S_τ)] ≤ E[(1 + r)^{−n} g(S_n)],

so τ ≡ n always does best.
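One can see this numerically in the three-period example of Section 7. The sketch below (our own illustration, not from the notes) compares τ ≡ n with the rule "exercise the first time the call is in the money"; the helper names value, wait_until_n, and exercise_asap are ours.

```python
from itertools import product

S0, u, d, r, K, n = 10.0, 2.0, 0.5, 0.1, 15.0, 3
p, q = 0.4, 0.6  # risk-neutral probabilities from Section 7

def value(stop_rule):
    # E[(1+r)^{-tau} g(S_tau)] summed over all 2^n coin-toss paths
    total = 0.0
    for path in product("HT", repeat=n):
        prob, S, prices = 1.0, S0, []
        for toss in path:
            S *= u if toss == "H" else d
            prob *= p if toss == "H" else q
            prices.append(S)
        tau = stop_rule(prices)
        total += prob * max(prices[tau - 1] - K, 0.0) / (1 + r) ** tau
    return total

wait_until_n = lambda prices: n          # tau = n
exercise_asap = lambda prices: next(     # first time the call is in the money
    (k + 1 for k, S in enumerate(prices) if S > K), n)

print(value(wait_until_n), value(exercise_asap))  # waiting does at least as well
```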

For puts, the payoff is g(S_k), where g(x) = (K − x)^+. This is also a convex function, but this time g(0) ≠ 0, and the above argument fails.

Although good approximations are known, an exact solution to the problem of

valuing an American put is unknown, and is one of the major unsolved problems in ﬁnancial

mathematics.


9. Continuous random variables.

We are now going to start working toward continuous times and stocks that can

take any positive number as a value, so we need to prepare by extending some of our

deﬁnitions.

Given any random variable X ≥ 0, we can approximate it by r.v's X_n that are discrete. We let

X_n = Σ_{i=0}^{n2^n} (i/2^n) 1_{(i/2^n ≤ X < (i+1)/2^n)}.

In words, if X(ω) lies between 0 and n, we let X_n(ω) be the closest value i/2^n that is less than or equal to X(ω). For ω where X(ω) > n + 2^{−n} we set X_n(ω) = 0. Clearly the X_n are discrete, and approximate X. In fact, on the set where X ≤ n, we have that |X(ω) − X_n(ω)| ≤ 2^{−n}.

For reasonable X we are going to define EX = lim EX_n. Since the X_n increase with n, the limit must exist, although it could be +∞. If X is not necessarily nonnegative, we define EX = EX^+ − EX^−, provided at least one of EX^+ and EX^− is finite. Here X^+ = max(X, 0) and X^− = max(−X, 0).

There are some things one wants to prove, but all this has been worked out in measure theory and the theory of the Lebesgue integral; see Note 1. Let us confine ourselves here to showing this definition is the same as the usual one when X has a density.

Recall X has a density f_X if

P(X ∈ [a, b]) = ∫_a^b f_X(x) dx

for all a and b. In this case

EX = ∫_{−∞}^{∞} x f_X(x) dx

provided ∫_{−∞}^{∞} |x| f_X(x) dx < ∞. With our definition of X_n we have

P(X_n = i/2^n) = P(X ∈ [i/2^n, (i + 1)/2^n)) = ∫_{i/2^n}^{(i+1)/2^n} f_X(x) dx.

Then

EX_n = Σ_i (i/2^n) P(X_n = i/2^n) = Σ_i ∫_{i/2^n}^{(i+1)/2^n} (i/2^n) f_X(x) dx.

Since x differs from i/2^n by at most 1/2^n when x ∈ [i/2^n, (i + 1)/2^n), this will tend to ∫ x f_X(x) dx, unless the contribution to the integral for |x| ≥ n does not go to 0 as n → ∞. As long as ∫ |x| f_X(x) dx < ∞, one can show that this contribution does indeed go to 0.
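As a quick illustration (ours, not part of the notes), one can carry out this dyadic discretization for an Exponential(1) density, where EX = 1; the values EX_n increase toward 1 as n grows.

```python
import math

def EX_n(n, lam=1.0):
    # E[X_n] for X ~ Exponential(lam): sum of (i/2^n) P(i/2^n <= X < (i+1)/2^n),
    # using the exponential distribution function to get each probability
    total = 0.0
    for i in range(n * 2**n + 1):
        lo, hi = i / 2**n, (i + 1) / 2**n
        total += lo * (math.exp(-lam * lo) - math.exp(-lam * hi))
    return total

for n in (2, 4, 8):
    print(n, EX_n(n))  # increases toward EX = 1/lam = 1
```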


We also need an extension of the definition of conditional probability. A r.v. is G measurable if (X > a) ∈ G for every a. How do we define E[Z | G] when G is not generated by a countable collection of disjoint sets?

Again, there is a completely worked out theory that holds in all cases; see Note 2. Let us give an equivalent definition that works except for a very few cases. Suppose that for each n the σ-field G_n is finitely generated. This means that G_n is generated by finitely many disjoint sets B_{n1}, . . . , B_{nm_n}. So for each n, the number of the B_{ni} is finite but arbitrary, the B_{ni} are disjoint, and their union is Ω. Suppose also that G_1 ⊂ G_2 ⊂ · · ·. Now ∪_n G_n will not in general be a σ-field, but suppose G is the smallest σ-field that contains all the G_n. Finally, define P(A | G) = lim P(A | G_n).

This is a fairly general set-up. For example, let Ω be the real line and let G_n be generated by the sets (−∞, n), [n, ∞), and [i/2^n, (i + 1)/2^n). Then G will contain every interval that is closed on the left and open on the right, hence G must be the σ-field that one works with when one talks about Lebesgue measure on the line.

The question that one might ask is: how does one know the limit exists? Since the G_n increase, we know by Proposition 4.3 that M_n = P(A | G_n) is a martingale with respect to the G_n. It is certainly bounded above by 1 and bounded below by 0, so by the martingale convergence theorem, it must have a limit as n → ∞.
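For a concrete picture (our illustration, not from the notes), take Lebesgue measure on [0, 1], A = [0, 1/3], and G_n generated by the dyadic intervals. Then P(A | G_n)(ω) is the proportion of ω's dyadic interval that lies in A, and the martingale converges to 1_A(ω) for a.e. ω.

```python
def cond_prob(omega, n, a=1/3):
    # P(A | G_n)(omega) for A = [0, a] under Lebesgue measure on [0, 1],
    # where G_n is generated by the dyadic intervals [i/2^n, (i+1)/2^n)
    i = int(omega * 2**n)
    lo, hi = i / 2**n, (i + 1) / 2**n
    overlap = max(0.0, min(hi, a) - lo)
    return overlap / (hi - lo)

for n in (2, 6, 10):
    print(cond_prob(0.25, n), cond_prob(0.9, n))  # tends to 1_A: 1 and 0
```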

Once one has a definition of conditional probability, one defines conditional expectation by what one expects. If X is discrete, one can write X as Σ_j a_j 1_{A_j} and then one defines

E[X | G] = Σ_j a_j P(A_j | G).

If X is not discrete, one writes X = X^+ − X^−, approximates X^+ by discrete random variables and takes a limit, and similarly for X^−. One has to worry about convergence, but everything does go through.

With this extended deﬁnition of conditional expectation, do all the properties of

Section 2 hold? The answer is yes. See Note 2 again.

With continuous random variables, we need to be more cautious about what we mean when we say two random variables are equal. We say X = Y almost surely, abbreviated "a.s.", if

P({ω : X(ω) ≠ Y(ω)}) = 0.

So X = Y except for a set of probability 0. The a.s. terminology is used other places as well: X_n → Y a.s. means that except for a set of ω's of probability zero, X_n(ω) → Y(ω).

Note 1. The best way to define expected value is via the theory of the Lebesgue integral. A probability P is a measure that has total mass 1. So we define

EX = ∫ X(ω) P(dω).


To recall how the definition goes, we say X is simple if X(ω) = Σ_{i=1}^m a_i 1_{A_i}(ω) with each a_i ≥ 0, and for a simple X we define

EX = Σ_{i=1}^m a_i P(A_i).

If X is nonnegative, we define

EX = sup{EY : Y simple, Y ≤ X}.

Finally, provided at least one of EX^+ and EX^− is finite, we define

EX = EX^+ − EX^−.

This is the same definition as described above.

Note 2. The Radon-Nikodym theorem from measure theory says that if Q and P are two finite measures on (Ω, G) and Q(A) = 0 whenever P(A) = 0 and A ∈ G, then there exists an integrable function Y that is G-measurable such that Q(A) = ∫_A Y dP for every measurable set A.

Let us apply the Radon-Nikodym theorem to the following situation. Suppose (Ω, F, P) is a probability space and X ≥ 0 is integrable: EX < ∞. Suppose G ⊂ F. Define two new probabilities on G as follows. Let P′ = P|_G, that is, P′(A) = P(A) if A ∈ G and P′(A) is not defined if A ∈ F − G. Define Q by Q(A) = ∫_A X dP = E[X; A] if A ∈ G. One can show (using the monotone convergence theorem from measure theory) that Q is a finite measure on G. (One can also use this definition to define Q(A) for A ∈ F, but we only want to define Q on G, as we will see in a moment.) So Q and P′ are two finite measures on (Ω, G). If A ∈ G and P′(A) = 0, then P(A) = 0 and so it follows that Q(A) = 0. By the Radon-Nikodym theorem there exists an integrable random variable Y such that Y is G measurable (this is why we worried about which σ-field we were working with) and

Q(A) = ∫_A Y dP′

if A ∈ G. Note
(a) Y is G measurable, and
(b) if A ∈ G, then E[Y ; A] = E[X; A],
because

E[Y ; A] = E[Y 1_A] = ∫_A Y dP = ∫_A Y dP′ = Q(A) = ∫_A X dP = E[X1_A] = E[X; A].


We define E[X | G] to be the random variable Y. If X is integrable but not necessarily nonnegative, then X^+ and X^− will be integrable and we define

E[X | G] = E[X^+ | G] − E[X^− | G].

We define

P(B | G) = E[1_B | G]

if B ∈ F.

Let us show that there is only one r.v., up to almost sure equivalence, that satisfies (a) and (b) above. If Y and Z are G measurable, and E[Y ; A] = E[X; A] = E[Z; A] for A ∈ G, then the set A_n = (Y > Z + 1/n) will be in G, and so

E[Z; A_n] + (1/n)P(A_n) = E[Z + 1/n; A_n] ≤ E[Y ; A_n] = E[Z; A_n].

Consequently P(A_n) = 0. This is true for each positive integer n, so P(Y > Z) = 0. By symmetry, P(Z > Y ) = 0, and therefore P(Y ≠ Z) = 0, as we wished.

If one checks the proofs of Propositions 2.3, 2.4, and 2.5, one sees that only properties

(a) and (b) above were used. So the propositions hold for the new deﬁnition of conditional

expectation as well.

In the case where G is finitely or countably generated, (a) and (b) hold under both the new and old definitions. By the uniqueness result, the new and old definitions agree.


10. Stochastic processes.

We will be talking about stochastic processes. Previously we discussed sequences S_1, S_2, . . . of r.v.'s. Now we want to talk about processes Y_t for t ≥ 0. For example, we can think of S_t being the price of a stock at time t. Any nonnegative time t is allowed.

We typically let F_t be the smallest σ-field with respect to which Y_s is measurable for all s ≤ t. So F_t = σ(Y_s : s ≤ t). As you might imagine, there are a few technicalities one has to worry about. We will try to avoid thinking about them as much as possible, but see Note 1.

We call a collection of σ-fields F_t with F_s ⊂ F_t if s < t a filtration. We say the filtration satisfies the "usual conditions" if the F_t are right continuous and complete (see Note 1); all the filtrations we consider will satisfy the usual conditions.

We say a stochastic process has continuous paths if the following holds. For each ω, the map t → Y_t(ω) defines a function from [0, ∞) to R. If this function is a continuous function for all ω's except for a set of probability zero, we say Y_t has continuous paths.

Definition 10.1. A mapping τ : Ω → [0, ∞) is a stopping time if for each t we have (τ ≤ t) ∈ F_t.

Typically, τ will be a continuous random variable and P(τ = t) = 0 for each t, which

is why we need a deﬁnition just a bit diﬀerent from the discrete case.

Since (τ < t) = ∪_{n=1}^∞ (τ ≤ t − 1/n) and (τ ≤ t − 1/n) ∈ F_{t−1/n} ⊂ F_t, then for a stopping time τ we have (τ < t) ∈ F_t for all t.

Conversely, suppose τ is a nonnegative r.v. for which (τ < t) ∈ F_t for all t. We claim τ is a stopping time. The proof is easy, but we need the right continuity of the F_t here, so we put the proof in Note 2.

A continuous time martingale (or submartingale) is what one expects: each M_t is integrable, each M_t is F_t measurable, and if s < t, then

E[M_t | F_s] = M_s.

(Here we are saying the left hand side and the right hand side are equal almost surely; we

will usually not write the “a.s.” since almost all of our equalities for random variables are

only almost surely.)

The analogues of Doob’s theorems go through. Note 3 has the proofs.

Note 1. For technical reasons, one typically defines F_t as follows. Let F_t^0 = σ(Y_s : s ≤ t). This is what we referred to as F_t above. Next add to F_t^0 all sets N for which P(N) = 0. Such sets are called null sets, and since they have probability 0, they don't affect anything. In fact, one wants to add all sets N that we think of as being null sets, even though they might not be measurable. To be more precise, we say N is a null set if inf{P(A) : A ∈ F, N ⊂ A} = 0. Recall we are starting with a σ-field F and all the F_t^0's are contained in F. Let F_t^{00} be the σ-field generated by F_t^0 and all null sets N, that is, the smallest σ-field containing F_t^0 and every null set. In measure theory terminology, what we have done is to say that F_t^{00} is the completion of F_t^0.

Lastly, we want to make our σ-fields right continuous. We set F_t = ∩_{ε>0} F_{t+ε}^{00}. Although the union of σ-fields is not necessarily a σ-field, the intersection of σ-fields is. F_t contains F_t^{00} but might possibly contain more besides. An example of an event that is in F_t but that may not be in F_t^{00} is

A = {ω : lim_{n→∞} Y_{t+1/n}(ω) ≥ 0}.

A ∈ F_{t+1/m}^{00} for each m, so it is in F_t. There is no reason it needs to be in F_t^{00} if Y is not necessarily continuous at t. It is easy to see that ∩_{ε>0} F_{t+ε} = F_t, which is what we mean when we say F_t is right continuous.

When talking about a stochastic process Y_t, there are various types of measurability one can consider. Saying Y_t is adapted to F_t means Y_t is F_t measurable for each t. However, since Y_t is really a function of two variables, t and ω, there are other notions of measurability that come into play. We will be considering stochastic processes that have continuous paths or that are predictable (the definition will be given later), so these various types of measurability will not be an issue for us.

Note 2. Suppose (τ < t) ∈ F_t for all t. Then for each positive integer n_0,

(τ ≤ t) = ∩_{n=n_0}^∞ (τ < t + 1/n).

For n ≥ n_0 we have (τ < t + 1/n) ∈ F_{t+1/n} ⊂ F_{t+1/n_0}. Therefore (τ ≤ t) ∈ F_{t+1/n_0} for each n_0. Hence the set is in the intersection: ∩_{n_0>1} F_{t+1/n_0} ⊂ ∩_{ε>0} F_{t+ε} = F_t.

Note 3. We want to prove the analogues of Theorems 5.3 and 5.4. The proofs of Doob's inequalities are simpler. We will only need the analogue of Theorem 5.4(b).

Theorem 10.2. Suppose M_t is a martingale with continuous paths and EM_t^2 < ∞ for all t. Then for each t_0,

E[(sup_{s≤t_0} M_s)^2] ≤ 4E[|M_{t_0}|^2].

Proof. By the definition of martingale in continuous time, N_k is a martingale in discrete time with respect to G_k when we set N_k = M_{kt_0/2^n} and G_k = F_{kt_0/2^n}. By Theorem 5.4(b),

E[max_{0≤k≤2^n} M_{kt_0/2^n}^2] = E[max_{0≤k≤2^n} N_k^2] ≤ 4EN_{2^n}^2 = 4EM_{t_0}^2.


(Recall (max_k a_k)^2 = max_k a_k^2 if all the a_k ≥ 0.)

Now let n → ∞. Since M_t has continuous paths, max_{0≤k≤2^n} M_{kt_0/2^n}^2 increases up to sup_{s≤t_0} M_s^2. Our result follows from the monotone convergence theorem from measure theory (see Note 4).

We now prove the analogue of Theorem 5.3. The proof is simpler if we assume that EM_t^2 is finite; the result is still true without this assumption.

Theorem 10.3. Suppose M_t is a martingale with continuous paths, EM_t^2 < ∞ for all t, and τ is a stopping time bounded almost surely by t_0. Then EM_τ = EM_{t_0}.

Proof. We approximate τ by stopping times taking only finitely many values. For n > 0 define

τ_n(ω) = inf{kt_0/2^n : τ(ω) < kt_0/2^n}.

τ_n takes only the values kt_0/2^n for some k ≤ 2^n. The event (τ_n ≤ jt_0/2^n) is equal to (τ < jt_0/2^n), which is in F_{jt_0/2^n} since τ is a stopping time. So (τ_n ≤ s) ∈ F_s if s is of the form jt_0/2^n for some j. A moment's thought, using the fact that τ_n only takes values of the form kt_0/2^n, shows that τ_n is a stopping time.
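Concretely, the dyadic approximation looks like the following sketch (ours, not from the notes); τ_n is the smallest dyadic point kt_0/2^n strictly above τ, and it decreases to τ as n grows.

```python
import math

def tau_n(tau, n, t0=1.0):
    # From the proof of Theorem 10.3: the smallest k*t0/2^n with tau < k*t0/2^n
    k = math.floor(tau * 2**n / t0) + 1
    return k * t0 / 2**n

print(tau_n(0.3, 2), tau_n(0.3, 4), tau_n(0.3, 8))  # decreases toward tau = 0.3
```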

It is clear that τ_n ↓ τ for every ω. Since M_t has continuous paths, M_{τ_n} → M_τ a.s.

Let N_k and G_k be as in the proof of Theorem 10.2. Let σ_n = k if τ_n = kt_0/2^n. By Theorem 5.3,

EN_{σ_n} = EN_{2^n},

which is the same as saying

EM_{τ_n} = EM_{t_0}.

To complete the proof, we need to show EM_{τ_n} converges to EM_τ. This is almost obvious, because we already observed that M_{τ_n} → M_τ a.s. Without the assumption that EM_t^2 < ∞ for all t, this is actually quite a bit of work, but with the assumption it is not too bad.

Either |M_{τ_n} − M_τ| is less than or equal to 1 or greater than 1. If it is greater than 1, it is less than |M_{τ_n} − M_τ|^2. So in either case,

|M_{τ_n} − M_τ| ≤ 1 + |M_{τ_n} − M_τ|^2.   (10.1)

Because both |M_{τ_n}| and |M_τ| are bounded by sup_{s≤t_0} |M_s|, the right hand side of (10.1) is bounded by 1 + 4 sup_{s≤t_0} |M_s|^2, which is integrable by Theorem 10.2. |M_{τ_n} − M_τ| → 0, and so by the dominated convergence theorem from measure theory (Note 4),

E|M_{τ_n} − M_τ| → 0.


Finally,

|EM_{τ_n} − EM_τ| = |E(M_{τ_n} − M_τ)| ≤ E|M_{τ_n} − M_τ| → 0.

Note 4. The dominated convergence theorem says that if X_n → X a.s. and |X_n| ≤ Y a.s. for each n, where EY < ∞, then EX_n → EX.

The monotone convergence theorem says that if X_n ≥ 0 for each n, X_n ≤ X_{n+1} for each n, and X_n → X, then EX_n → EX.


11. Brownian motion.

First, let us review a few facts about normal random variables. We say X is a normal random variable with mean a and variance b^2 if

P(c ≤ X ≤ d) = ∫_c^d (1/√(2πb^2)) e^{−(y−a)^2/2b^2} dy

and we will abbreviate this by saying X is N(a, b^2). If X is N(a, b^2), then EX = a, Var X = b^2, and E|X|^p is finite for every positive integer p. Moreover

Ee^{tX} = e^{at} e^{t^2 b^2/2}.
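A quick Monte Carlo sanity check of the moment generating function formula (our sketch, not from the notes; the parameter values are arbitrary):

```python
import math
import random

rng = random.Random(3)
a, b, t = 0.5, 2.0, 0.3
# estimate E e^{tX} for X ~ N(a, b^2) and compare with e^{at} e^{t^2 b^2 / 2}
samples = (math.exp(t * rng.gauss(a, b)) for _ in range(200_000))
estimate = sum(samples) / 200_000
exact = math.exp(a * t + t * t * b * b / 2)
print(estimate, exact)  # the two agree closely
```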

Let S_n be a simple symmetric random walk. This means that Y_k = S_k − S_{k−1} equals +1 with probability 1/2, equals −1 with probability 1/2, and is independent of Y_j for j < k. We notice that ES_n = 0 while ES_n^2 = Σ_{i=1}^n EY_i^2 + Σ_{i≠j} EY_i Y_j = n, using the fact that E[Y_i Y_j] = (EY_i)(EY_j) = 0.

Define X_t^n = S_{nt}/√n if nt is an integer and by linear interpolation for other t. If nt is an integer, EX_t^n = 0 and E(X_t^n)^2 = t. It turns out X_t^n does not converge for any ω.
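Though X_t^n does not converge pathwise, its distribution settles down; a small simulation (ours, not from the notes) shows the sample mean and variance of X_1^n matching 0 and t = 1:

```python
import random

def scaled_walk_at(t, n, rng):
    # X^n_t = S_{nt} / sqrt(n) for nt an integer: sum of nt fair +-1 steps
    steps = sum(rng.choice((-1, 1)) for _ in range(int(n * t)))
    return steps / n**0.5

rng = random.Random(0)
samples = [scaled_walk_at(1.0, 400, rng) for _ in range(2000)]
mean = sum(samples) / len(samples)
var = sum(x * x for x in samples) / len(samples)
print(mean, var)  # mean near 0, variance near t = 1
```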

However there is another kind of convergence, called weak convergence, that takes place. There exists a process Z_t such that for each k, each t_1 < t_2 < · · · < t_k, and each a_1 < b_1, a_2 < b_2, . . . , a_k < b_k, we have
(1) The paths of Z_t are continuous as a function of t.
(2) P(X_{t_1}^n ∈ [a_1, b_1], . . . , X_{t_k}^n ∈ [a_k, b_k]) → P(Z_{t_1} ∈ [a_1, b_1], . . . , Z_{t_k} ∈ [a_k, b_k]).

See Note 1 for more discussion of weak convergence.

The limit Z_t is called a Brownian motion starting at 0. It has the following properties.
(1) EZ_t = 0.
(2) EZ_t^2 = t.
(3) Z_t − Z_s is independent of F_s = σ(Z_r, r ≤ s).
(4) Z_t − Z_s has the distribution of a normal random variable with mean 0 and variance t − s. This means

P(Z_t − Z_s ∈ [a, b]) = ∫_a^b (1/√(2π(t − s))) e^{−y^2/2(t−s)} dy.

(This result follows from the central limit theorem.)
(5) The map t → Z_t(ω) is continuous for almost all ω.

See Note 2 for a few remarks on this deﬁnition.

It is common to use B_t ("B" for Brownian) or W_t ("W" for Wiener, who was the first person to prove rigorously that Brownian motion exists). We will most often use W_t.


We will use Brownian motion extensively and develop some of its properties. As one might imagine for a limit of a simple random walk, the paths of Brownian motion have a huge number of oscillations. It turns out that the function t → W_t(ω) is continuous, but it is not differentiable; in fact one cannot define a derivative at any value of t. Another bizarre property: if one looks at the set of times at which W_t(ω) is equal to 0, this is a set which is uncountable, but contains no intervals. There is nothing special about 0 – the same is true for the set of times at which W_t(ω) is equal to a for any level a.

In what follows, one of the crucial properties of a Brownian motion is that it is a

martingale with continuous paths. Let us prove this.

Proposition 11.1. W_t is a martingale with respect to F_t and W_t has continuous paths.

Proof. As part of the definition of a Brownian motion, W_t has continuous paths. W_t is F_t measurable by the definition of F_t. Since the distribution of W_t is that of a normal random variable with mean 0 and variance t, E|W_t| < ∞ for all t. (In fact, E|W_t|^n < ∞ for all n.)

The key property is to show E[W_t | F_s] = W_s:

E[W_t | F_s] = E[W_t − W_s | F_s] + E[W_s | F_s] = E[W_t − W_s] + W_s = W_s.

We used here the facts that W_t − W_s is independent of F_s and that E[W_t − W_s] = 0 because W_t and W_s have mean 0.

We will also need

Proposition 11.2. W_t^2 − t is a martingale with continuous paths with respect to F_t.

Proof. That W_t^2 − t is integrable and is F_t measurable is as in the above proof. We calculate

E[W_t^2 − t | F_s] = E[((W_t − W_s) + W_s)^2 | F_s] − t
= E[(W_t − W_s)^2 | F_s] + 2E[(W_t − W_s)W_s | F_s] + E[W_s^2 | F_s] − t
= E[(W_t − W_s)^2] + 2W_s E[W_t − W_s | F_s] + W_s^2 − t.

We used the facts that W_s is F_s measurable and that (W_t − W_s)^2 is independent of F_s because W_t − W_s is. The second term on the last line is equal to W_s E[W_t − W_s] = 0. The first term, because W_t − W_s is normal with mean 0 and variance t − s, is equal to t − s. Substituting, the last line is equal to

(t − s) + 0 + W_s^2 − t = W_s^2 − s,

as required.

Note 1. A sequence of random variables X_n converges weakly to X if P(a < X_n < b) → P(a < X < b) for all a, b ∈ [−∞, ∞] such that P(X = a) = P(X = b) = 0; a and b can be infinite. If X_n converges to a normal random variable, then P(X = a) = P(X = b) = 0 for all a and b. This is the type of convergence that takes place in the central limit theorem. It will not be true in general that X_n converges to X almost surely.

For a sequence of random vectors (X_1^n, . . . , X_k^n) to converge to a random vector (X_1, . . . , X_k), one can give an analogous definition. But saying that the normalized random walks X^n(t) above converge weakly to Z_t actually says more than (2). A result from probability theory says that X_n converges to X weakly if and only if E[f(X_n)] → E[f(X)] whenever f is a bounded continuous function on R. We use this to define weak convergence for stochastic processes. Let C([0, ∞)) be the collection of all continuous functions from [0, ∞) to the reals. This is a metric space, so the notion of a function from C([0, ∞)) to R being continuous makes sense. We say that the processes X^n converge weakly to the process Z, and mean by this that E[F(X^n)] → E[F(Z)] whenever F is a bounded continuous function on C([0, ∞)). One example of such a function F would be F(f) = sup_{0≤t<∞} |f(t)| if f ∈ C([0, ∞)); another would be F(f) = ∫_0^1 f(t) dt.

The reason one wants to show that X^n converges weakly to Z instead of just showing (2) is that weak convergence can be shown to imply that Z has continuous paths.

Note 2. First of all, there is some redundancy in the definition: one can show that parts of the definition are implied by the remaining parts, but we won't worry about this. Second, we actually want to let F_t be the completion of σ(Z_s : s ≤ t), that is, we throw in all the null sets into each F_t. One can prove that the resulting F_t are right continuous, and hence the filtration F_t satisfies the "usual" conditions. Finally, the "almost all" in (5) means that t → Z_t(ω) is continuous for all ω, except for a set of ω of probability zero.


12. Stochastic integrals.

If one wants to consider the (deterministic) integral ∫_0^t f(s) dg(s), where f and g are continuous and g is continuously differentiable, we can define it analogously to the usual Riemann integral as the limit of Riemann sums Σ_{i=1}^n f(s_i)[g(s_i) − g(s_{i−1})], where s_1 < s_2 < · · · < s_n is a partition of [0, t]. This is known as the Riemann-Stieltjes integral. One can show (using the mean value theorem, for example) that

∫_0^t f(s) dg(s) = ∫_0^t f(s)g′(s) ds.

If we were to take f(s) = 1_{[0,a]}(s) (which is not continuous, but that is a minor matter here), one would expect the following:

∫_0^t 1_{[0,a]}(s) dg(s) = ∫_0^t 1_{[0,a]}(s)g′(s) ds = ∫_0^a g′(s) ds = g(a) − g(0).

Note that although we use the fact that g is differentiable in the intermediate stages, the first and last terms make sense for any g.
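A numerical sketch (ours, not from the notes) of the Riemann-Stieltjes sum for a smooth g, checked against ∫ f g′; the function name rs_integral and the choice f(s) = s, g(s) = s^2 are our own.

```python
def rs_integral(f, g, t=1.0, n=10**5):
    # left-point Riemann-Stieltjes sum: sum of f(s_{i-1}) [g(s_i) - g(s_{i-1})]
    h = t / n
    return sum(f(i * h) * (g((i + 1) * h) - g(i * h)) for i in range(n))

val = rs_integral(lambda s: s, lambda s: s * s)
print(val)  # close to the exact value of the integral of s * 2s over [0,1], 2/3
```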

We now want to replace g by a Brownian path and f by a random integrand. The expression ∫ f(s) dW(s) does not make sense as a Riemann-Stieltjes integral because it is a fact that W(s) is not differentiable as a function of s. We need to define the expression by some other means. We will show that it can be defined as the limit in L^2 of Riemann sums. The resulting integral is called a stochastic integral.

Let us consider a very special case first. Suppose f is continuous and deterministic (i.e., does not depend on ω). Suppose we take a Riemann sum approximation

I_n = Σ_{i=0}^{2^n−1} f(i/2^n)[W((i + 1)/2^n) − W(i/2^n)].

Since W_t has zero expectation for each t, EI_n = 0. Let us calculate the second moment:

EI_n^2 = E[ Σ_i f(i/2^n)[W((i + 1)/2^n) − W(i/2^n)] ]^2   (12.1)
= E Σ_{i=0}^{2^n−1} f(i/2^n)^2 [W((i + 1)/2^n) − W(i/2^n)]^2
+ E Σ_{i≠j} f(i/2^n)f(j/2^n)[W((i + 1)/2^n) − W(i/2^n)][W((j + 1)/2^n) − W(j/2^n)].

The first sum is bounded by

Σ_i f(i/2^n)^2 (1/2^n) ≈ ∫_0^1 f(t)^2 dt,

since the second moment of W((i + 1)/2^n) − W(i/2^n) is 1/2^n. Using the independence and the fact that W_t has mean zero,

E[ [W((i + 1)/2^n) − W(i/2^n)][W((j + 1)/2^n) − W(j/2^n)] ]
= E[W((i + 1)/2^n) − W(i/2^n)] E[W((j + 1)/2^n) − W(j/2^n)] = 0,

and so the second sum on the right hand side of (12.1) is zero. This calculation is the key to the stochastic integral.
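One can check the computation by simulation (our sketch, not from the notes): with f(t) = t, the sample second moment of I_n is close to ∫_0^1 t^2 dt = 1/3.

```python
import random

def I_n(n, f, rng):
    # Riemann-sum stochastic integral of a deterministic f against Brownian
    # increments over [0, 1]; each increment is N(0, 1/2^n)
    total, m = 0.0, 2**n
    for i in range(m):
        dW = rng.gauss(0.0, (1.0 / m) ** 0.5)
        total += f(i / m) * dW
    return total

rng = random.Random(1)
vals = [I_n(6, lambda t: t, rng) for _ in range(4000)]
second_moment = sum(v * v for v in vals) / len(vals)
print(second_moment)  # near 1/3, the integral of t^2 over [0, 1]
```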

We now turn to the construction. Let W_t be a Brownian motion. We will only consider integrands H_s such that H_s is F_s measurable for each s (see Note 1). We will construct ∫_0^t H_s dW_s for all H with

E ∫_0^t H_s^2 ds < ∞.   (12.2)

Before we proceed we will need to deﬁne the quadratic variation of a continuous

martingale. We will use the following theorem without proof because in our applications we

can construct the desired increasing process directly. We often say a process is a continuous

process if its paths are continuous, and similarly a continuous martingale is a martingale

with continuous paths.

Theorem 12.1. Suppose M_t is a continuous martingale such that EM_t^2 < ∞ for all t. There exists one and only one increasing process A_t that is adapted to F_t, has continuous paths, and has A_0 = 0, such that M_t^2 − A_t is a martingale.

The simplest example of such a martingale is Brownian motion. If W_t is a Brownian motion, we saw in Proposition 11.2 that W_t^2 − t is a martingale. So in this case A_t = t almost surely, for all t. Hence ⟨W⟩_t = t.

We use the notation ⟨M⟩_t for the increasing process given in Theorem 12.1 and call it the quadratic variation process of M. We will see later that in the case of stochastic integrals, where

N_t = ∫_0^t H_s dW_s,

it turns out that ⟨N⟩_t = ∫_0^t H_s^2 ds.

We will use the following frequently, and in fact, these are the only two properties

of Brownian motion that play a signiﬁcant role in the construction.

Lemma 12.1. (a) E[W_b − W_a | F_a] = 0.
(b) E[W_b^2 − W_a^2 | F_a] = E[(W_b − W_a)^2 | F_a] = b − a.

Proof. (a) This is E[W_b − W_a] = 0 by the independence of W_b − W_a from F_a and the fact that W_b and W_a have mean zero.

(b) (W_b − W_a)^2 is independent of F_a, so the conditional expectation is the same as E[(W_b − W_a)^2]. Since W_b − W_a is a N(0, b − a), the second equality in (b) follows.

To prove the first equality in (b), we write

E[W_b^2 − W_a^2 | F_a] = E[((W_b − W_a) + W_a)^2 | F_a] − E[W_a^2 | F_a]
= E[(W_b − W_a)^2 | F_a] + 2E[W_a(W_b − W_a) | F_a] + E[W_a^2 | F_a] − E[W_a^2 | F_a]
= E[(W_b − W_a)^2 | F_a] + 2W_a E[W_b − W_a | F_a],

and the first equality follows by applying (a).

We construct the stochastic integral in three steps. We say an integrand H_s = H_s(ω) is elementary if

H_s(ω) = G(ω)1_{(a,b]}(s)

where 0 ≤ a < b and G is bounded and F_a measurable. We say H is simple if it is a finite linear combination of elementary processes, that is,

H_s(ω) = Σ_{i=1}^n G_i(ω)1_{(a_i,b_i]}(s).   (12.3)

We ﬁrst construct the stochastic integral for H elementary; the work here is showing the

stochastic integral is a martingale. We next construct the integral for H simple and here

the diﬃculty is calculating the second moment. Finally we consider the case of general H.

First step. If $G$ is bounded and $\mathcal{F}_a$ measurable, let $H_s(\omega) = G(\omega)1_{(a,b]}(s)$, and define the stochastic integral to be the process $N_t$, where $N_t = G(W_{t\wedge b} - W_{t\wedge a})$. Compare this to the first paragraph of this section, where we considered Riemann-Stieltjes integrals.

Proposition 12.2. $N_t$ is a continuous martingale, $EN_\infty^2 = E[G^2(b-a)]$, and
$$\langle N\rangle_t = \int_0^t G^2\, 1_{[a,b]}(s)\, ds.$$

Proof. The continuity is clear. Let us look at $E[N_t \mid \mathcal{F}_s]$. In the case $a < s < t < b$, this is equal to
$$E[G(W_t - W_a) \mid \mathcal{F}_s] = G\, E[(W_t - W_a) \mid \mathcal{F}_s] = G(W_s - W_a) = N_s.$$
In the case $s < a < t < b$, $E[N_t \mid \mathcal{F}_s]$ is equal to
$$E[G(W_t - W_a) \mid \mathcal{F}_s] = E[G\, E[W_t - W_a \mid \mathcal{F}_a] \mid \mathcal{F}_s] = 0 = N_s.$$
The other possibilities are $s < t < a < b$, $a < b < s < t$, $s < a < b < t$, and $a < s < b < t$; these are done similarly.

For $EN_\infty^2$, we have using Lemma 12.1(b)
$$EN_\infty^2 = E[G^2 E[(W_b - W_a)^2 \mid \mathcal{F}_a]] = E[G^2 E[W_b^2 - W_a^2 \mid \mathcal{F}_a]] = E[G^2(b-a)].$$
For $\langle N\rangle_t$, we need to show
$$E[G^2(W_{t\wedge b} - W_{t\wedge a})^2 - G^2(t\wedge b - t\wedge a) \mid \mathcal{F}_s] = G^2(W_{s\wedge b} - W_{s\wedge a})^2 - G^2(s\wedge b - s\wedge a).$$
We do this by checking all six cases for the relative locations of $a$, $b$, $s$, and $t$; we do one of the cases in Note 2.
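The second-moment formula of Proposition 12.2 lends itself to a Monte Carlo sketch. Below we use the illustrative choice $G = W_a$ (which is $\mathcal{F}_a$ measurable; it is not bounded, but it is square integrable, which is enough for this numerical check). The parameter values and seed are arbitrary.

```python
import math
import random

# Elementary integrand H_s = G * 1_{(a,b]}(s) with G = W_a.  The stochastic
# integral is N_inf = G * (W_b - W_a), and Proposition 12.2 predicts
# E[N_inf^2] = E[G^2] * (b - a) = a * (b - a).
random.seed(1)
a, b = 0.5, 1.5
n = 200_000
total = 0.0
for _ in range(n):
    W_a = random.gauss(0.0, math.sqrt(a))        # W at time a
    incr = random.gauss(0.0, math.sqrt(b - a))   # W_b - W_a, independent of F_a
    G = W_a
    N_inf = G * incr
    total += N_inf ** 2
estimate = total / n
predicted = a * (b - a)
print(estimate, predicted)
```

The key point the simulation reflects is that $G$ is fixed before the increment $W_b - W_a$ is drawn, mirroring the measurability condition.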

Second step. Next suppose $H_s$ is simple as in (12.3). In this case define the stochastic integral
$$N_t = \int_0^t H_s\, dW_s = \sum_{i=1}^n G_i(W_{b_i\wedge t} - W_{a_i\wedge t}).$$

Proposition 12.3. $N_t$ is a continuous martingale, $EN_\infty^2 = E\int_0^\infty H_s^2\, ds$, and $\langle N\rangle_t = \int_0^t H_s^2\, ds$.

Proof. We may rewrite $H$ so that the intervals $(a_i, b_i]$ satisfy $a_1 \le b_1 \le a_2 \le b_2 \le \cdots \le b_n$. For example, if we had $a_1 < a_2 < b_1 < b_2$, we could write
$$H_s = G_1 1_{(a_1,a_2]} + (G_1 + G_2)1_{(a_2,b_1]} + G_2 1_{(b_1,b_2]},$$
and then if we set $G_1' = G_1$, $G_2' = G_1 + G_2$, $G_3' = G_2$ and $a_1' = a_1$, $b_1' = a_2$, $a_2' = a_2$, $b_2' = b_1$, $a_3' = b_1$, $b_3' = b_2$, we have written $H$ as
$$\sum_{i=1}^3 G_i' 1_{(a_i', b_i']}.$$
So now we have $H$ simple but with the intervals $(a_i, b_i]$ non-overlapping.

Since the sum of martingales is clearly a martingale, $N_t$ is a martingale. The sum of continuous processes will be continuous, so $N_t$ has continuous paths.

We have
$$EN_\infty^2 = E\Big[\sum_i G_i^2(W_{b_i} - W_{a_i})^2\Big] + 2E\Big[\sum_{i<j} G_i G_j(W_{b_i} - W_{a_i})(W_{b_j} - W_{a_j})\Big].$$
The terms in the second sum vanish, because when we condition on $\mathcal{F}_{a_j}$, we have
$$E[G_i G_j(W_{b_i} - W_{a_i})\, E[W_{b_j} - W_{a_j} \mid \mathcal{F}_{a_j}]] = 0.$$
Taking expectations,
$$E[G_i G_j(W_{b_i} - W_{a_i})(W_{b_j} - W_{a_j})] = 0.$$
For the terms in the first sum, by Lemma 12.1
$$E[G_i^2(W_{b_i} - W_{a_i})^2] = E[G_i^2 E[(W_{b_i} - W_{a_i})^2 \mid \mathcal{F}_{a_i}]] = E[G_i^2(b_i - a_i)].$$
So
$$EN_\infty^2 = \sum_{i=1}^n E[G_i^2(b_i - a_i)],$$
and this is the same as $E\int_0^\infty H_s^2\, ds$.

Third step. Now suppose $H_s$ is adapted and $E\int_0^\infty H_s^2\, ds < \infty$. Using some results from measure theory (Note 3), we can choose $H_s^n$ simple such that $E\int_0^\infty (H_s^n - H_s)^2\, ds \to 0$. The triangle inequality then implies (see Note 3 again)
$$E\int_0^\infty (H_s^n - H_s^m)^2\, ds \to 0.$$

Define $N_t^n = \int_0^t H_s^n\, dW_s$ using Step 2. By Doob's inequality (Theorem 10.3) we have
$$E[\sup_t (N_t^n - N_t^m)^2] = E\Big[\sup_t \Big(\int_0^t (H_s^n - H_s^m)\, dW_s\Big)^2\Big] \le 4E\Big(\int_0^\infty (H_s^n - H_s^m)\, dW_s\Big)^2 = 4E\int_0^\infty (H_s^n - H_s^m)^2\, ds \to 0.$$

This should look reminiscent of the definition of Cauchy sequences, and in fact that is what is going on here; Note 3 has details. In the present context Cauchy sequences converge, and one can show (Note 3) that there exists a process $N_t$ such that
$$E\Big[\sup_t \Big|\int_0^t H_s^n\, dW_s - N_t\Big|^2\Big] \to 0.$$

If $H_s^n$ and $H_s^{n'}$ are two sequences converging to $H$, then $E\big(\int_0^t (H_s^n - H_s^{n'})\, dW_s\big)^2 = E\int_0^t (H_s^n - H_s^{n'})^2\, ds \to 0$, so the limit is independent of which sequence $H^n$ we choose. See Note 4 for the proof that $N_t$ is a martingale, $EN_t^2 = E\int_0^t H_s^2\, ds$, and $\langle N\rangle_t = \int_0^t H_s^2\, ds$.

Because $\sup_t |\int_0^t H_s^n\, dW_s - N_t| \to 0$, one can show there exists a subsequence such that the convergence takes place almost surely, and with probability one, $N_t$ has continuous paths (Note 5).

We write $N_t = \int_0^t H_s\, dW_s$ and call $N_t$ the stochastic integral of $H$ with respect to $W$.
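The isometry $EN_t^2 = E\int_0^t H_s^2\, ds$ survives the limiting procedure, and this too can be illustrated numerically. The sketch below approximates $\int_0^1 W_s\, dW_s$ by left-endpoint sums (the discrete analogue of a simple, adapted integrand); for $H_s = W_s$ the right side is $\int_0^1 E W_s^2\, ds = \int_0^1 s\, ds = 1/2$. The discretization and sample sizes are arbitrary.

```python
import math
import random

# Approximate N_1 = int_0^1 W_s dW_s by a left-endpoint Riemann sum and
# check the isometry E[N_1^2] = E int_0^1 W_s^2 ds = 1/2.
random.seed(2)
paths, steps = 20_000, 200
dt = 1.0 / steps
acc = 0.0
for _ in range(paths):
    W = 0.0
    N = 0.0
    for _ in range(steps):
        dW = random.gauss(0.0, math.sqrt(dt))
        N += W * dW        # integrand evaluated at the left endpoint
        W += dW
    acc += N * N
second_moment = acc / paths
print(second_moment)  # close to 0.5
```

Evaluating the integrand at the left endpoint is essential: it is what makes the discrete sum the analogue of an integral of a predictable process.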

We discuss some extensions of the definition. First of all, if we replace $W_t$ by a continuous martingale $M_t$ and $H_s$ is adapted with $E\int_0^t H_s^2\, d\langle M\rangle_s < \infty$, we can duplicate everything we just did (see Note 6) with $ds$ replaced by $d\langle M\rangle_s$ and get a stochastic integral. In particular, if $d\langle M\rangle_s = K_s^2\, ds$, we replace $ds$ by $K_s^2\, ds$.

There are some other extensions of the definition that are not hard. If the random variable $\int_0^\infty H_s^2\, d\langle M\rangle_s$ is finite but without its expectation being finite, we can define the stochastic integral by defining it for $t \le T_N$ for suitable stopping times $T_N$ and then letting $T_N \to \infty$; look at Note 7.

A process $A_t$ is of bounded variation if the paths of $A_t$ have bounded variation. This means that one can write $A_t = A_t^+ - A_t^-$, where $A_t^+$ and $A_t^-$ have paths that are increasing. $|A|_t$ is then defined to be $A_t^+ + A_t^-$. A semimartingale is the sum of a martingale and a process of bounded variation. If $\int_0^\infty H_s^2\, d\langle M\rangle_s + \int_0^\infty |H_s|\, |dA_s| < \infty$ and $X_t = M_t + A_t$, we define
$$\int_0^t H_s\, dX_s = \int_0^t H_s\, dM_s + \int_0^t H_s\, dA_s,$$
where the first integral on the right is a stochastic integral and the second is a Riemann-Stieltjes or Lebesgue-Stieltjes integral. For a semimartingale, we define $\langle X\rangle_t = \langle M\rangle_t$. Note 7 has more on this.

Given two semimartingales $X$ and $Y$ we define $\langle X, Y\rangle_t$ by what is known as polarization:
$$\langle X, Y\rangle_t = \tfrac{1}{2}[\langle X+Y\rangle_t - \langle X\rangle_t - \langle Y\rangle_t].$$
As an example, if $X_t = \int_0^t H_s\, dW_s$ and $Y_t = \int_0^t K_s\, dW_s$, then $(X+Y)_t = \int_0^t (H_s + K_s)\, dW_s$, so
$$\langle X+Y\rangle_t = \int_0^t (H_s + K_s)^2\, ds = \int_0^t H_s^2\, ds + \int_0^t 2H_s K_s\, ds + \int_0^t K_s^2\, ds.$$
Since $\langle X\rangle_t = \int_0^t H_s^2\, ds$ with a similar formula for $\langle Y\rangle_t$, we conclude
$$\langle X, Y\rangle_t = \int_0^t H_s K_s\, ds.$$

The following holds, which is what one would expect.

Proposition 12.4. Suppose $K_s$ is adapted to $\mathcal{F}_s$ and $E\int_0^\infty K_s^2\, ds < \infty$. Let $N_t = \int_0^t K_s\, dW_s$. Suppose $H_s$ is adapted and $E\int_0^\infty H_s^2\, d\langle N\rangle_s < \infty$. Then $E\int_0^\infty H_s^2 K_s^2\, ds < \infty$ and
$$\int_0^t H_s\, dN_s = \int_0^t H_s K_s\, dW_s.$$

The argument for the proof is given in Note 8.

What does a stochastic integral mean? If one thinks of the derivative of $Z_t$ as being a white noise, then $\int_0^t H_s\, dZ_s$ is like a filter that increases or decreases the volume by a factor $H_s$.

For us, an interpretation is that $Z_t$ represents a stock price. Then $\int_0^t H_s\, dZ_s$ represents our profit (or loss) if we hold $H_s$ shares at time $s$. This can be seen most easily if $H_s = G1_{[a,b]}$. So we buy $G(\omega)$ shares at time $a$ and sell them at time $b$. The stochastic integral represents our profit or loss.

Since we are in continuous time, we are allowed to buy and sell continuously and instantaneously. What we are not allowed to do is look into the future to make our decisions, which is where the $H_s$ adapted condition comes in.
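The role of adaptedness can be illustrated numerically: an adapted integrand earns zero on average, while a hypothetical strategy that peeks at the next increment (which is not an allowed integrand) earns a systematic profit. Both "strategies" below are invented for the illustration, and the parameters are arbitrary.

```python
import math
import random

# An adapted position (chosen from the past of the path) has zero expected
# profit; a look-ahead position (chosen using the coming move) does not.
random.seed(3)
paths, steps = 5_000, 50
dt = 1.0 / steps
adapted_total = 0.0
peeking_total = 0.0
for _ in range(paths):
    W = 0.0
    adapted = 0.0
    peeking = 0.0
    for _ in range(steps):
        dW = random.gauss(0.0, math.sqrt(dt))
        adapted += W * dW                          # decided before the move
        peeking += (1.0 if dW > 0 else -1.0) * dW  # uses the future move
        W += dW
    adapted_total += adapted
    peeking_total += peeking
adapted_mean = adapted_total / paths
peeking_mean = peeking_total / paths
print(adapted_mean, peeking_mean)
```

The adapted average hovers near zero, while the peeking strategy collects roughly $E|dW|$ per step, which is exactly the kind of riskless profit the predictability requirement rules out.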

Note 1. Let us be more precise concerning the measurability of $H$ that is needed. $H$ is a stochastic process, so can be viewed as a map from $[0,\infty) \times \Omega$ to $\mathbb{R}$ by $H: (s,\omega) \to H_s(\omega)$. We define a $\sigma$-field $\mathcal{P}$ on $[0,\infty) \times \Omega$ as follows. Consider the collection of processes of the form $G(\omega)1_{(a,b]}(s)$ where $G$ is bounded and $\mathcal{F}_a$ measurable for some $a < b$. Define $\mathcal{P}$ to be the smallest $\sigma$-field with respect to which every process of this form is measurable. $\mathcal{P}$ is called the predictable or previsible $\sigma$-field, and if a process $H$ is measurable with respect to $\mathcal{P}$, then the process is called predictable. What we require for our integrands $H$ is that they be predictable processes.

If $H_s$ has continuous paths, then approximating continuous functions by step functions shows that such an $H$ can be approximated by linear combinations of processes of the form $G(\omega)1_{(a,b]}(s)$. So continuous processes are predictable. The majority of the integrands we will consider will be continuous.

If one is slightly more careful, one sees that processes whose paths are functions which are continuous from the left at each time point are also predictable. This gives an indication of where the name comes from. If $H_s$ has paths which are left continuous, then $H_t = \lim_{n\to\infty} H_{t - 1/n}$, and we can "predict" the value of $H_t$ from the values at times that come before $t$. If $H_t$ is only right continuous and a path has a jump at time $t$, this is not possible.

Note 2. Let us consider the case $a < s < t < b$; again similar arguments take care of the other five cases. We need to show
$$E[G^2(W_t - W_a)^2 - G^2(t-a) \mid \mathcal{F}_s] = G^2(W_s - W_a)^2 - G^2(s-a). \qquad (12.4)$$
The left hand side is equal to $G^2 E[(W_t - W_a)^2 - (t-a) \mid \mathcal{F}_s]$. We write this as
$$G^2 E[((W_t - W_s) + (W_s - W_a))^2 - (t-a) \mid \mathcal{F}_s]$$
$$= G^2\Big(E[(W_t - W_s)^2 \mid \mathcal{F}_s] + 2E[(W_t - W_s)(W_s - W_a) \mid \mathcal{F}_s] + E[(W_s - W_a)^2 \mid \mathcal{F}_s] - (t-a)\Big)$$
$$= G^2\Big(E[(W_t - W_s)^2] + 2(W_s - W_a)E[W_t - W_s \mid \mathcal{F}_s] + (W_s - W_a)^2 - (t-a)\Big)$$
$$= G^2\Big((t-s) + 0 + (W_s - W_a)^2 - (t-a)\Big).$$
The last expression is equal to the right hand side of (12.4).

Note 3. A definition from measure theory says that if $\mu$ is a measure, $\|f\|_2$, the $L^2$ norm of $f$ with respect to the measure $\mu$, is defined as $\big(\int f(x)^2\, \mu(dx)\big)^{1/2}$. The space $L^2$ is defined to be the set of functions $f$ for which $\|f\|_2 < \infty$. (A technical proviso: one has to identify as equal functions which differ only on a set of measure 0.) If one defines a distance between two functions $f$ and $g$ by $d(f,g) = \|f-g\|_2$, this is a metric on the space $L^2$, and a theorem from measure theory says that $L^2$ is complete with respect to this metric. Another theorem from measure theory says that the collection of simple functions (functions of the form $\sum_{i=1}^n c_i 1_{A_i}$) is dense in $L^2$ with respect to the metric.

Let us define a norm on stochastic processes; this is essentially an $L^2$ norm. Define
$$\|N\| = \Big(E \sup_{0\le t<\infty} N_t^2\Big)^{1/2}.$$
One can show that this is a norm, and hence that the triangle inequality holds. Moreover, the space of processes $N$ such that $\|N\| < \infty$ is complete with respect to this norm. This means that if $N^n$ is a Cauchy sequence, i.e., if given $\varepsilon$ there exists $n_0$ such that $\|N^n - N^m\| < \varepsilon$ whenever $n, m \ge n_0$, then the Cauchy sequence converges, that is, there exists $N$ with $\|N\| < \infty$ such that $\|N^n - N\| \to 0$.

We can define another norm on stochastic processes. Define
$$\|H\|_2 = \Big(E\int_0^\infty H_s^2\, ds\Big)^{1/2}.$$
This can be viewed as a standard $L^2$ norm, namely, the $L^2$ norm with respect to the measure $\mu$ defined on $\mathcal{P}$ by
$$\mu(A) = E\int_0^\infty 1_A(s,\omega)\, ds.$$
Since the set of simple functions with respect to $\mu$ is dense in $L^2$, this says that if $H$ is measurable with respect to $\mathcal{P}$, then there exist simple processes $H_s^n$ that are also measurable with respect to $\mathcal{P}$ such that $\|H^n - H\|_2 \to 0$.

Note 4. We have $\|N^n - N\| \to 0$, where the norm here is the one described in Note 3. Each $N^n$ is a stochastic integral of the type described in Step 2 of the construction, hence each $N_t^n$ is a martingale. Let $s < t$ and $A \in \mathcal{F}_s$. Since $E[N_t^n \mid \mathcal{F}_s] = N_s^n$, then
$$E[N_t^n; A] = E[N_s^n; A]. \qquad (12.5)$$
By Cauchy-Schwarz,
$$|E[N_t^n; A] - E[N_t; A]| \le E[|N_t^n - N_t|; A] \le \big(E[(N_t^n - N_t)^2]\big)^{1/2}\big(E[1_A^2]\big)^{1/2} \le \|N^n - N\| \to 0. \qquad (12.6)$$
We have a similar limit when $t$ is replaced by $s$, so taking the limit in (12.5) yields
$$E[N_t; A] = E[N_s; A].$$
Since $N_s$ is $\mathcal{F}_s$ measurable and has the same expectation over sets $A \in \mathcal{F}_s$ as $N_t$ does, then by Proposition 4.3 $E[N_t \mid \mathcal{F}_s] = N_s$, or $N_t$ is a martingale.

Suppose $\|N^n - N\| \to 0$. Given $\varepsilon > 0$ there exists $n_0$ such that $\|N^n - N\| < \varepsilon$ if $n \ge n_0$. Take $\varepsilon = 1$ and choose $n_0$. By the triangle inequality,
$$\|N\| \le \|N^n\| + \|N^n - N\| \le \|N^n\| + 1 < \infty$$
since $\|N^n\|$ is finite for each $n$.

That $N_t^2 - \langle N\rangle_t$ is a martingale is similar to the proof that $N_t$ is a martingale, but slightly more delicate. We leave the proof to the reader, but note that in place of (12.6) one writes
$$|E[(N_t^n)^2; A] - E[(N_t)^2; A]| \le E[|(N_t^n)^2 - (N_t)^2|] \le E[|N_t^n - N_t|\,|N_t^n + N_t|]. \qquad (12.7)$$
By Cauchy-Schwarz this is less than
$$\|N_t^n - N_t\|\, \|N_t^n + N_t\|.$$
Since $\|N_t^n + N_t\| \le \|N_t^n\| + \|N_t\|$ is bounded independently of $n$, we see that the left hand side of (12.7) tends to 0.

Note 5. We have $\|N^n - N\| \to 0$, where the norm is described in Note 3. This means that
$$E[\sup_t |N_t^n - N_t|^2] \to 0.$$
A result from measure theory implies that there exists a subsequence $n_k$ such that
$$\sup_t |N_t^{n_k} - N_t|^2 \to 0, \quad \text{a.s.}$$
So except for a set of $\omega$'s of probability 0, $N_t^{n_k}(\omega)$ converges to $N_t(\omega)$ uniformly. Each $N_t^{n_k}(\omega)$ is continuous by Step 2, and the uniform limit of continuous functions is continuous, therefore $N_t(\omega)$ is a continuous function of $t$. Incidentally, this is the primary reason we considered Doob's inequalities.

Note 6. If $M_t$ is a continuous martingale, $E[M_b - M_a \mid \mathcal{F}_a] = E[M_b \mid \mathcal{F}_a] - M_a = M_a - M_a = 0$. This is the analogue of Lemma 12.1(a). To show the analogue of Lemma 12.1(b),
$$E[(M_b - M_a)^2 \mid \mathcal{F}_a] = E[M_b^2 \mid \mathcal{F}_a] - 2E[M_b M_a \mid \mathcal{F}_a] + E[M_a^2 \mid \mathcal{F}_a]$$
$$= E[M_b^2 \mid \mathcal{F}_a] - 2M_a E[M_b \mid \mathcal{F}_a] + M_a^2$$
$$= E[M_b^2 - \langle M\rangle_b \mid \mathcal{F}_a] + E[\langle M\rangle_b - \langle M\rangle_a \mid \mathcal{F}_a] - M_a^2 + \langle M\rangle_a$$
$$= E[\langle M\rangle_b - \langle M\rangle_a \mid \mathcal{F}_a],$$
since $M_t^2 - \langle M\rangle_t$ is a martingale. That
$$E[M_b^2 - M_a^2 \mid \mathcal{F}_a] = E[\langle M\rangle_b - \langle M\rangle_a \mid \mathcal{F}_a]$$
is just a rewriting of
$$E[M_b^2 - \langle M\rangle_b \mid \mathcal{F}_a] = M_a^2 - \langle M\rangle_a = E[M_a^2 - \langle M\rangle_a \mid \mathcal{F}_a].$$
With these two properties in place of Lemma 12.1, replacing $W_s$ by $M_s$ and $ds$ by $d\langle M\rangle_s$, the construction of the stochastic integral $\int_0^t H_s\, dM_s$ goes through exactly as above.

Note 7. If we let $T_K = \inf\{t > 0: \int_0^t H_s^2\, d\langle M\rangle_s \ge K\}$, the first time the integral is larger than or equal to $K$, and we let $H_s^K = H_s 1_{(s\le T_K)}$, then $\int_0^\infty (H_s^K)^2\, d\langle M\rangle_s \le K$ and there is no difficulty defining $N_t^K = \int_0^t H_s^K\, dM_s$ for every $t$. One can show that if $t \le T_{K_1}$ and $T_{K_2}$, then $N_t^{K_1} = N_t^{K_2}$ a.s. If $\int_0^t H_s^2\, d\langle M\rangle_s$ is finite for every $t$, then $T_K \to \infty$ as $K \to \infty$. If we call the common value $N_t$, this allows one to define the stochastic integral $N_t$ for each $t$ in the case where the integral $\int_0^t H_s^2\, d\langle M\rangle_s$ is finite for every $t$, even if the expectation of the integral is not.

We can do something similar if $M_t$ is a martingale but where we do not have $E\langle M\rangle_\infty < \infty$. Let $S_K = \inf\{t: |M_t| \ge K\}$, the first time $|M_t|$ is larger than or equal to $K$. If we let $M_t^K = M_{t\wedge S_K}$, where $t\wedge S_K = \min(t, S_K)$, then one can show $M^K$ is a martingale bounded in absolute value by $K$. So we can define $J_t^K = \int_0^t H_s\, dM_s^K$ for every $t$, using the paragraph above to handle the wider class of $H$'s, if necessary. Again, one can show that if $t \le S_{K_1}$ and $t \le S_{K_2}$, then the value of the stochastic integral will be the same no matter whether we use $M^{K_1}$ or $M^{K_2}$ as our martingale. We use the common value as a definition of the stochastic integral $J_t$. We have $S_K \to \infty$ as $K \to \infty$, so we have a definition of $J_t$ for each $t$.

Note 8. We only outline how the proof goes. To show
$$\int_0^t H_s\, dN_s = \int_0^t H_s K_s\, dW_s, \qquad (12.8)$$
one shows that (12.8) holds for $H_s$ simple and then takes limits. To show this, it suffices to look at $H_s$ elementary and use linearity. To show (12.8) for $H_s$ elementary, first prove this in the case when $K_s$ is elementary, use linearity to extend it to the case when $K$ is simple, and then take limits to obtain it for arbitrary $K$. Thus one reduces the proof to showing (12.8) when both $H$ and $K$ are elementary. In this situation, one can explicitly write out both sides of the equation and see that they are equal.

13. Ito's formula.

Suppose $W_t$ is a Brownian motion and $f: \mathbb{R} \to \mathbb{R}$ is a $C^2$ function, that is, $f$ and its first two derivatives are continuous. Ito's formula, which is sometimes known as the change of variables formula, says that
$$f(W_t) - f(W_0) = \int_0^t f'(W_s)\, dW_s + \tfrac{1}{2}\int_0^t f''(W_s)\, ds.$$
Compare this with the fundamental theorem of calculus:
$$f(t) - f(0) = \int_0^t f'(s)\, ds.$$
In Ito's formula we have a second order term to carry along.

The idea behind the proof is quite simple. By Taylor's theorem,
$$f(W_t) - f(W_0) = \sum_{i=0}^{n-1} [f(W_{(i+1)t/n}) - f(W_{it/n})]$$
$$\approx \sum_{i=0}^{n-1} f'(W_{it/n})(W_{(i+1)t/n} - W_{it/n}) + \tfrac{1}{2}\sum_{i=0}^{n-1} f''(W_{it/n})(W_{(i+1)t/n} - W_{it/n})^2.$$
The first sum on the right is approximately the stochastic integral and the second is approximately the quadratic variation.

For a more general semimartingale $X_t = M_t + A_t$, Ito's formula reads

Theorem 13.1. If $f \in C^2$, then
$$f(X_t) - f(X_0) = \int_0^t f'(X_s)\, dX_s + \tfrac{1}{2}\int_0^t f''(X_s)\, d\langle M\rangle_s.$$

Let us look at an example. Let $W_t$ be Brownian motion, $X_t = \sigma W_t - \sigma^2 t/2$, and $f(x) = e^x$. Then $\langle X\rangle_t = \langle \sigma W\rangle_t = \sigma^2 t$, $f'(x) = f''(x) = e^x$, and
$$e^{\sigma W_t - \sigma^2 t/2} = 1 + \int_0^t e^{X_s}\sigma\, dW_s - \int_0^t e^{X_s}\tfrac{1}{2}\sigma^2\, ds + \tfrac{1}{2}\int_0^t e^{X_s}\sigma^2\, ds \qquad (13.1)$$
$$= 1 + \int_0^t e^{X_s}\sigma\, dW_s.$$

This example will be revisited many times later on.
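Since (13.1) exhibits $e^{\sigma W_t - \sigma^2 t/2}$ as 1 plus a stochastic integral, it is a martingale and its expectation stays equal to 1. This is easy to check by simulation; the parameter values below are arbitrary.

```python
import math
import random

# Check E[exp(sigma*W_t - sigma^2 t/2)] = 1 at t = 1 by Monte Carlo.
random.seed(4)
sigma, t = 1.0, 1.0
n = 200_000
acc = 0.0
for _ in range(n):
    W = random.gauss(0.0, math.sqrt(t))          # W_t ~ N(0, t)
    acc += math.exp(sigma * W - sigma ** 2 * t / 2)
mean = acc / n
print(mean)  # close to 1
```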

Let us give another example of the use of Ito's formula. Let $X_t = W_t$ and let $f(x) = x^k$. Then $f'(x) = kx^{k-1}$ and $f''(x) = k(k-1)x^{k-2}$. We then have
$$W_t^k = W_0^k + \int_0^t kW_s^{k-1}\, dW_s + \tfrac{1}{2}\int_0^t k(k-1)W_s^{k-2}\, d\langle W\rangle_s = \int_0^t kW_s^{k-1}\, dW_s + \frac{k(k-1)}{2}\int_0^t W_s^{k-2}\, ds.$$
When $k = 3$, this says $W_t^3 - 3\int_0^t W_s\, ds$ is a stochastic integral with respect to a Brownian motion, and hence a martingale.
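The $k = 3$ case can be checked by simulation: the mean of $W_1^3 - 3\int_0^1 W_s\, ds$ over many paths should stay near 0, the value of this martingale at time 0. A sketch with arbitrary discretization choices:

```python
import math
import random

# M_t = W_t^3 - 3 * int_0^t W_s ds is a martingale started at 0, so its
# mean stays 0; estimate E[M_1] over simulated paths.
random.seed(5)
paths, steps = 20_000, 200
dt = 1.0 / steps
acc = 0.0
for _ in range(paths):
    W = 0.0
    integral = 0.0
    for _ in range(steps):
        integral += W * dt   # left-endpoint approximation of int W ds
        W += random.gauss(0.0, math.sqrt(dt))
    acc += W ** 3 - 3.0 * integral
mean = acc / paths
print(mean)  # close to 0
```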

For a semimartingale $X_t = M_t + A_t$ we set $\langle X\rangle_t = \langle M\rangle_t$. Given two semimartingales $X, Y$, we define
$$\langle X, Y\rangle_t = \tfrac{1}{2}[\langle X+Y\rangle_t - \langle X\rangle_t - \langle Y\rangle_t].$$

The following is known as Ito's product formula. It may also be viewed as an integration by parts formula.

Proposition 13.2. If $X_t$ and $Y_t$ are semimartingales,
$$X_t Y_t = X_0 Y_0 + \int_0^t X_s\, dY_s + \int_0^t Y_s\, dX_s + \langle X, Y\rangle_t.$$

Proof. Applying Ito's formula with $f(x) = x^2$ to $X_t + Y_t$, we obtain
$$(X_t + Y_t)^2 = (X_0 + Y_0)^2 + 2\int_0^t (X_s + Y_s)(dX_s + dY_s) + \langle X+Y\rangle_t.$$
Applying Ito's formula with $f(x) = x^2$ to $X$ and to $Y$, then
$$X_t^2 = X_0^2 + 2\int_0^t X_s\, dX_s + \langle X\rangle_t$$
and
$$Y_t^2 = Y_0^2 + 2\int_0^t Y_s\, dY_s + \langle Y\rangle_t.$$
Then some algebra and the fact that
$$X_t Y_t = \tfrac{1}{2}[(X_t + Y_t)^2 - X_t^2 - Y_t^2]$$
yields the formula.

There is a multidimensional version of Ito's formula: if $X_t = (X_t^1, \ldots, X_t^d)$ is a vector, each component of which is a semimartingale, and $f \in C^2$, then
$$f(X_t) - f(X_0) = \sum_{i=1}^d \int_0^t \frac{\partial f}{\partial x_i}(X_s)\, dX_s^i + \frac{1}{2}\sum_{i,j=1}^d \int_0^t \frac{\partial^2 f}{\partial x_i \partial x_j}(X_s)\, d\langle X^i, X^j\rangle_s.$$

The following application of Ito's formula, known as Lévy's theorem, is important.

Theorem 13.3. Suppose $M_t$ is a continuous martingale with $\langle M\rangle_t = t$. Then $M_t$ is a Brownian motion.

Before proving this, recall from undergraduate probability that the moment generating function of a r.v. $X$ is defined by $m_X(a) = Ee^{aX}$ and that if two random variables have the same moment generating function, they have the same law. This is also true if we replace $a$ by $iu$. In this case we have $\varphi_X(u) = Ee^{iuX}$ and $\varphi_X$ is called the characteristic function of $X$. The reason for looking at the characteristic function is that $\varphi_X$ always exists, whereas $m_X(a)$ might be infinite. The one special case we will need is that if $X$ is a normal r.v. with mean 0 and variance $t$, then $\varphi_X(u) = e^{-u^2 t/2}$. This follows from the formula for $m_X(a)$ with $a$ replaced by $iu$ (this can be justified rigorously).

Proof. We will prove that $M_t$ is a $\mathcal{N}(0, t)$; for the remainder of the proof see Note 1.

We apply Ito's formula with $f(x) = e^{iux}$. Then
$$e^{iuM_t} = 1 + \int_0^t iu\, e^{iuM_s}\, dM_s + \tfrac{1}{2}\int_0^t (-u^2)e^{iuM_s}\, d\langle M\rangle_s.$$
Taking expectations and using $\langle M\rangle_s = s$ and the fact that a stochastic integral is a martingale, hence has 0 expectation, we have
$$Ee^{iuM_t} = 1 - \frac{u^2}{2}\int_0^t Ee^{iuM_s}\, ds.$$
Let $J(t) = Ee^{iuM_t}$. The equation can be rewritten
$$J(t) = 1 - \frac{u^2}{2}\int_0^t J(s)\, ds.$$
So $J'(t) = -\frac{1}{2}u^2 J(t)$ with $J(0) = 1$. The solution to this elementary ODE is $J(t) = e^{-u^2 t/2}$. Since
$$Ee^{iuM_t} = e^{-u^2 t/2},$$
then by our remarks above the law of $M_t$ must be that of a $\mathcal{N}(0, t)$, which shows that $M_t$ is a mean 0 variance $t$ normal r.v.
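The identity $Ee^{iuM_t} = e^{-u^2t/2}$ can be sanity-checked for Brownian motion itself: by symmetry the imaginary part vanishes, so it suffices to compare $E\cos(uW_t)$ with $e^{-u^2t/2}$. A Python sketch, with $u$ and the sample size chosen arbitrarily:

```python
import math
import random

# Estimate the characteristic function of W_t: the real part E[cos(u*W_t)]
# should match exp(-u^2 t / 2); the imaginary part vanishes by symmetry.
random.seed(6)
u, t = 1.0, 1.0
n = 200_000
acc = 0.0
for _ in range(n):
    acc += math.cos(u * random.gauss(0.0, math.sqrt(t)))
char = acc / n
print(char, math.exp(-u * u * t / 2))  # the two numbers agree closely
```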

Note 1. If $A \in \mathcal{F}_s$ and we do the same argument with $M_t$ replaced by $M_{s+t} - M_s$, we have
$$e^{iu(M_{s+t} - M_s)} = 1 + \int_0^t iu\, e^{iu(M_{s+r} - M_s)}\, dM_r + \tfrac{1}{2}\int_0^t (-u^2)e^{iu(M_{s+r} - M_s)}\, d\langle M\rangle_r.$$
Multiply this by $1_A$ and take expectations. Since a stochastic integral is a martingale, the stochastic integral term again has expectation 0. If we let $K(t) = E[e^{iu(M_{t+s} - M_s)}; A]$, we now arrive at $K'(t) = -\frac{1}{2}u^2 K(t)$ with $K(0) = P(A)$, so
$$K(t) = P(A)e^{-u^2 t/2}.$$
Therefore
$$E\big[e^{iu(M_{t+s} - M_s)}; A\big] = Ee^{iu(M_{t+s} - M_s)}\, P(A). \qquad (13.2)$$
If $f$ is a nice function and $\hat{f}$ is its Fourier transform, replace $u$ in the above by $-u$, multiply by $\hat{f}(u)$, and integrate over $u$. (To do the integral, we approximate the integral by a Riemann sum and then take limits.) We then have
$$E[f(M_{s+t} - M_s); A] = E[f(M_{s+t} - M_s)]\, P(A).$$
By taking limits we have this for $f = 1_B$, so
$$P(M_{s+t} - M_s \in B, A) = P(M_{s+t} - M_s \in B)\, P(A).$$
This implies that $M_{s+t} - M_s$ is independent of $\mathcal{F}_s$.

Note $\mathrm{Var}(M_t - M_s) = t - s$; take $A = \Omega$ in (13.2).

14. The Girsanov theorem.

Suppose $P$ is a probability and
$$dX_t = dW_t + \mu(X_t)\, dt,$$
where $W_t$ is a Brownian motion. This is short hand for
$$X_t = X_0 + W_t + \int_0^t \mu(X_s)\, ds. \qquad (14.1)$$
Let
$$M_t = \exp\Big(-\int_0^t \mu(X_s)\, dW_s - \int_0^t \mu(X_s)^2\, ds/2\Big). \qquad (14.2)$$
Then as we have seen before, by Ito's formula, $M_t$ is a martingale. This calculation is reviewed in Note 1. We also observe that $M_0 = 1$.

Now let us define a new probability by setting
$$Q(A) = E[M_t; A] \qquad (14.3)$$
if $A \in \mathcal{F}_t$. We had better be sure this $Q$ is well defined. If $A \in \mathcal{F}_s \subset \mathcal{F}_t$, then $E[M_t; A] = E[M_s; A]$ because $M_t$ is a martingale. We also check that $Q(\Omega) = E[M_t; \Omega] = EM_t$. This is equal to $EM_0 = 1$, since $M_t$ is a martingale.

What the Girsanov theorem says is

Theorem 14.1. Under $Q$, $X_t$ is a Brownian motion.

Under $P$, $W_t$ is a Brownian motion and $X_t$ is not. Under $Q$, the process $W_t$ is no longer a Brownian motion.

In order for a process $X_t$ to be a Brownian motion, we need at a minimum that $X_t$ is mean zero and variance $t$. To define mean and variance, we need a probability. Therefore a process might be a Brownian motion with respect to one probability and not another. Most of the other parts of the definition of being a Brownian motion also depend on the probability.

Similarly, to be a martingale, we need conditional expectations, and the conditional expectation of a random variable depends on what probability is being used.

There is a more general version of the Girsanov theorem.

Theorem 14.2. If $X_t$ is a martingale under $P$, then under $Q$ the process $X_t - D_t$ is a martingale, where
$$D_t = \int_0^t \frac{1}{M_s}\, d\langle X, M\rangle_s.$$
$\langle X\rangle_t$ is the same under both $P$ and $Q$.

Let us see how Theorem 14.1 can be used. Let $S_t$ be the stock price, and suppose
$$dS_t = \sigma S_t\, dW_t + mS_t\, dt.$$
(So in the above formulation, $\mu(x) = m$ for all $x$.) Define
$$M_t = e^{(-m/\sigma)W_t - (m^2/2\sigma^2)t}.$$
Then from (13.1) $M_t$ is a martingale and
$$M_t = 1 + \int_0^t \Big(-\frac{m}{\sigma}\Big)M_s\, dW_s.$$
Let $X_t = W_t$. Then
$$\langle X, M\rangle_t = \int_0^t \Big(-\frac{m}{\sigma}\Big)M_s\, ds = -\int_0^t M_s\frac{m}{\sigma}\, ds.$$
Therefore
$$\int_0^t \frac{1}{M_s}\, d\langle X, M\rangle_s = -\int_0^t \frac{m}{\sigma}\, ds = -(m/\sigma)t.$$
Define $Q$ by (14.3). By Theorem 14.2, under $Q$ the process $\widetilde{W}_t = W_t + (m/\sigma)t$ is a martingale. Hence
$$dS_t = \sigma S_t(dW_t + (m/\sigma)\, dt) = \sigma S_t\, d\widetilde{W}_t,$$
or
$$S_t = S_0 + \int_0^t \sigma S_s\, d\widetilde{W}_s$$
is a martingale. So we have found a probability under which the asset price is a martingale. This means that $Q$ is the risk-neutral probability, which we have been calling P.
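The change of measure can be illustrated by importance weighting: an expectation under $Q$ is the corresponding expectation under $P$ weighted by $M_t$. The sketch below checks that $\widetilde{W}_t = W_t + (m/\sigma)t$ has mean 0 and variance $t$ under $Q$, writing $\theta = m/\sigma$; all numerical choices are arbitrary.

```python
import math
import random

# Girsanov reweighting: with M_t = exp(-theta*W_t - theta^2 t/2) as the
# density dQ/dP on F_t, the shifted process Wtilde = W_t + theta*t should
# have Q-mean 0 and Q-variance t.
random.seed(7)
theta, t = 0.5, 1.0
n = 200_000
w_mean = 0.0
w_var = 0.0
for _ in range(n):
    W = random.gauss(0.0, math.sqrt(t))
    M = math.exp(-theta * W - theta ** 2 * t / 2)   # dQ/dP weight
    Wt = W + theta * t
    w_mean += M * Wt
    w_var += M * Wt ** 2
w_mean /= n
w_var /= n
print(w_mean, w_var)  # close to 0 and to t = 1
```

In other words, sampling under $P$ and multiplying by the likelihood ratio $M_t$ reproduces $Q$-statistics, which is precisely how risk-neutral expectations are often computed in practice.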

Let us give another example of the use of the Girsanov theorem. Suppose $X_t = W_t + \mu t$, where $\mu$ is a constant. We want to compute the probability that $X_t$ exceeds the level $a$ by time $t_0$.

We first need the probability that a Brownian motion crosses a level $a$ by time $t_0$. If $A_t = \sup_{s\le t} W_s$ (note we are not looking at $|W_t|$), we have
$$P(A_t > a,\ c \le W_t \le d) = \int_c^d \varphi(t, a, x)\, dx, \qquad (14.4)$$
where
$$\varphi(t, a, x) = \begin{cases} \dfrac{1}{\sqrt{2\pi t}}e^{-x^2/2t} & x \ge a, \\[1ex] \dfrac{1}{\sqrt{2\pi t}}e^{-(2a-x)^2/2t} & x < a. \end{cases}$$

This is called the reflection principle, and the name is due to the derivation, given in Note 2. Sometimes one says
$$P(W_t = x, A_t > a) = P(W_t = 2a - x), \quad x < a,$$
but this is not precise because $W_t$ is a continuous random variable and both sides of the above equation are zero; (14.4) is the rigorous version of the reflection principle.

Now let $W_t$ be a Brownian motion under $P$. Let $dQ/dP = M_t = e^{\mu W_t - \mu^2 t/2}$. Let $Y_t = W_t - \mu t$. Theorem 14.1 says that under $Q$, $Y_t$ is a Brownian motion. We have $W_t = Y_t + \mu t$.

Let $A = (\sup_{s\le t_0} W_s \ge a)$. We want to calculate
$$P(\sup_{s\le t_0}(W_s + \mu s) \ge a).$$
$W_t$ is a Brownian motion under $P$ while $Y_t$ is a Brownian motion under $Q$. So this probability is equal to
$$Q(\sup_{s\le t_0}(Y_s + \mu s) \ge a).$$
This in turn is equal to
$$Q(\sup_{s\le t_0} W_s \ge a) = Q(A).$$
Now we use the expression for $M_t$:
$$Q(A) = E_P[e^{\mu W_{t_0} - \mu^2 t_0/2}; A]$$
$$= \int_{-\infty}^\infty e^{\mu x - \mu^2 t_0/2}\, P(\sup_{s\le t_0} W_s \ge a,\ W_{t_0} = x)\, dx$$
$$= e^{-\mu^2 t_0/2}\Big[\int_{-\infty}^a \frac{1}{\sqrt{2\pi t_0}}e^{\mu x}e^{-(2a-x)^2/2t_0}\, dx + \int_a^\infty \frac{1}{\sqrt{2\pi t_0}}e^{\mu x}e^{-x^2/2t_0}\, dx\Big].$$
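The key identity used above, $P(\sup_{s\le t_0}(W_s + \mu s) \ge a) = E_P[e^{\mu W_{t_0} - \mu^2 t_0/2};\ \sup_{s\le t_0} W_s \ge a]$, can be checked by estimating both sides from the same simulated Brownian paths. The parameters are arbitrary, and the discrete-time maximum slightly undershoots the continuous one, similarly on both sides.

```python
import math
import random

# Left side: fraction of drifted paths crossing level a.
# Right side: weight e^{mu*W - mu^2 t0/2} on plain paths crossing level a.
random.seed(8)
mu, a, t0 = 0.5, 1.0, 1.0
paths, steps = 10_000, 300
dt = t0 / steps
direct = 0.0
weighted = 0.0
for _ in range(paths):
    W = 0.0
    max_plain = 0.0
    max_drift = 0.0
    for i in range(1, steps + 1):
        W += random.gauss(0.0, math.sqrt(dt))
        max_plain = max(max_plain, W)
        max_drift = max(max_drift, W + mu * i * dt)
    if max_drift >= a:
        direct += 1.0
    if max_plain >= a:
        weighted += math.exp(mu * W - mu ** 2 * t0 / 2)
direct /= paths
weighted /= paths
print(direct, weighted)  # the two estimates agree
```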

Proof of Theorem 14.1. Using Ito's formula with $f(x) = e^x$,
$$M_t = 1 - \int_0^t \mu(X_r)M_r\, dW_r.$$
So
$$\langle W, M\rangle_t = -\int_0^t \mu(X_r)M_r\, dr.$$
Let $A \in \mathcal{F}_s$ with $s < t$. Since $Q(A) = E_P[M_t; A]$, it is not hard to see that
$$E_Q[W_t; A] = E_P[M_t W_t; A].$$
By Ito's product formula this is
$$E_P\Big[\int_0^t M_r\, dW_r; A\Big] + E_P\Big[\int_0^t W_r\, dM_r; A\Big] + E_P\big[\langle W, M\rangle_t; A\big].$$
Since $\int_0^t M_r\, dW_r$ and $\int_0^t W_r\, dM_r$ are stochastic integrals with respect to martingales, they are themselves martingales. Thus the above is equal to
$$E_P\Big[\int_0^s M_r\, dW_r; A\Big] + E_P\Big[\int_0^s W_r\, dM_r; A\Big] + E_P\big[\langle W, M\rangle_t; A\big].$$
Using the product formula again, this is
$$E_P[M_s W_s; A] + E_P[\langle W, M\rangle_t - \langle W, M\rangle_s; A] = E_Q[W_s; A] + E_P[\langle W, M\rangle_t - \langle W, M\rangle_s; A].$$
The last term on the right is equal to
$$E_P\Big[\int_s^t d\langle W, M\rangle_r; A\Big] = E_P\Big[-\int_s^t M_r\mu(X_r)\, dr; A\Big]$$
$$= E_P\Big[-\int_s^t E_P[M_t \mid \mathcal{F}_r]\mu(X_r)\, dr; A\Big] = E_P\Big[-\int_s^t M_t\mu(X_r)\, dr; A\Big]$$
$$= E_Q\Big[-\int_s^t \mu(X_r)\, dr; A\Big] = -E_Q\Big[\int_0^t \mu(X_r)\, dr; A\Big] + E_Q\Big[\int_0^s \mu(X_r)\, dr; A\Big].$$
Therefore
$$E_Q\Big[W_t + \int_0^t \mu(X_r)\, dr; A\Big] = E_Q\Big[W_s + \int_0^s \mu(X_r)\, dr; A\Big],$$
which shows $X_t$ is a martingale with respect to $Q$.

Similarly, $X_t^2 - t$ is a martingale with respect to $Q$. By Lévy's theorem, $X_t$ is a Brownian motion.

In Note 3 we give a proof of Theorem 14.2 and in Note 4 we show how Theorem 14.1 is really a special case of Theorem 14.2.

Note 1. Let
$$Y_t = -\int_0^t \mu(X_s)\, dW_s - \tfrac{1}{2}\int_0^t [\mu(X_s)]^2\, ds.$$
We apply Ito's formula with the function $f(x) = e^x$. Note the martingale part of $Y_t$ is the stochastic integral term and the quadratic variation of $Y$ is the quadratic variation of the martingale part, so
$$\langle Y\rangle_t = \int_0^t [-\mu(X_s)]^2\, ds.$$
Then $f'(x) = e^x$, $f''(x) = e^x$, and hence
$$M_t = e^{Y_t} = e^{Y_0} + \int_0^t e^{Y_s}\, dY_s + \tfrac{1}{2}\int_0^t e^{Y_s}\, d\langle Y\rangle_s$$
$$= 1 + \int_0^t M_s\Big(-\mu(X_s)\, dW_s - \tfrac{1}{2}[\mu(X_s)]^2\, ds\Big) + \tfrac{1}{2}\int_0^t M_s[-\mu(X_s)]^2\, ds$$
$$= 1 - \int_0^t M_s\mu(X_s)\, dW_s.$$
Since stochastic integrals with respect to a Brownian motion are martingales, this completes the argument that $M_t$ is a martingale.

Note 2. Let $S_n$ be a simple random walk. This means that $X_1, X_2, \ldots$ are independent and identically distributed random variables with $P(X_i = 1) = P(X_i = -1) = \frac{1}{2}$; let $S_n = \sum_{i=1}^n X_i$. If you are playing a game where you toss a fair coin and win \$1 if it comes up heads and lose \$1 if it comes up tails, then $S_n$ will be your fortune at time $n$. Let $A_n = \max_{0\le k\le n} S_k$. We will show the analogue of (14.4) for $S_n$, which is
$$P(S_n = x, A_n \ge a) = \begin{cases} P(S_n = x) & x \ge a, \\ P(S_n = 2a - x) & x < a. \end{cases} \qquad (14.5)$$
(14.4) can be derived from this using a weak convergence argument.

To establish (14.5), note that if $x \ge a$ and $S_n = x$, then automatically $A_n \ge a$, so the only case to consider is when $x < a$. Any path that crosses $a$ but is at level $x$ at time $n$ has a corresponding path determined by reflecting across level $a$ at the first time the walk hits $a$; the reflected path will end up at $a + (a - x) = 2a - x$. The probability on the left hand side of (14.5) is the number of paths that hit $a$ and end up at $x$ divided by the total number of paths. Since the number of paths that hit $a$ and end up at $x$ is equal to the number of paths that end up at $2a - x$, then the probability on the left is equal to the number of paths that end up at $2a - x$ divided by the total number of paths; this is $P(S_n = 2a - x)$, which is the right hand side.
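Since (14.5) is a statement about finitely many coin-toss paths, it can be verified exactly by enumeration. A Python sketch for one (arbitrary) choice of $n$ and $a$:

```python
from itertools import product

# Enumerate all 2^n coin-toss paths and check, for every x < a, that
# #{paths: S_n = x and max_{k<=n} S_k >= a} = #{paths: S_n = 2a - x}.
n, a = 8, 3
lhs = {}   # counts of paths with S_n = x and running max >= a
end = {}   # counts of paths with S_n = y
for signs in product((1, -1), repeat=n):
    S = 0
    peak = 0           # A_n includes k = 0, so the max starts at 0
    for step in signs:
        S += step
        peak = max(peak, S)
    end[S] = end.get(S, 0) + 1
    if peak >= a:
        lhs[S] = lhs.get(S, 0) + 1
ok = all(lhs.get(x, 0) == end.get(2 * a - x, 0) for x in range(-n, a))
print(ok)
```

Dividing both counts by $2^n$ gives exactly the two probabilities in (14.5).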

Note 3. To prove Theorem 14.2, we proceed as follows. Assume without loss of generality that $X_0 = 0$. Then if $A \in \mathcal{F}_s$,
$$E_Q[X_t; A] = E_P[M_t X_t; A]$$
$$= E_P\Big[\int_0^t M_r\, dX_r; A\Big] + E_P\Big[\int_0^t X_r\, dM_r; A\Big] + E_P[\langle X, M\rangle_t; A]$$
$$= E_P\Big[\int_0^s M_r\, dX_r; A\Big] + E_P\Big[\int_0^s X_r\, dM_r; A\Big] + E_P[\langle X, M\rangle_t; A]$$
$$= E_Q[X_s; A] + E_P[\langle X, M\rangle_t - \langle X, M\rangle_s; A].$$
Here we used the fact that stochastic integrals with respect to the martingales $X$ and $M$ are again martingales.

On the other hand,
$$E_P[\langle X, M\rangle_t - \langle X, M\rangle_s; A] = E_P\Big[\int_s^t d\langle X, M\rangle_r; A\Big] = E_P\Big[\int_s^t M_r\, dD_r; A\Big]$$
$$= E_P\Big[\int_s^t E_P[M_t \mid \mathcal{F}_r]\, dD_r; A\Big] = E_P\Big[\int_s^t M_t\, dD_r; A\Big]$$
$$= E_P[(D_t - D_s)M_t; A] = E_Q[D_t - D_s; A].$$
Combining the two computations, $E_Q[X_t - D_t; A] = E_Q[X_s - D_s; A]$ for all $A \in \mathcal{F}_s$, so $X_t - D_t$ is a martingale under $Q$. The proof of the quadratic variation assertion is similar.

Note 4. Here is an argument showing how Theorem 14.1 can also be derived from Theorem 14.2.

From our formula for $M$ we have $dM_t = -M_t\,\mu(X_t)\,dW_t$, and therefore $d\langle X, M\rangle_t = -M_t\,\mu(X_t)\,dt$. Hence by Theorem 14.2 we see that under $Q$, $X_t$ is a continuous martingale with $\langle X\rangle_t = t$. By Lévy's theorem, this means that $X$ is a Brownian motion under $Q$.

15. Stochastic differential equations.

Let $W_t$ be a Brownian motion. We are interested in the existence and uniqueness for stochastic differential equations (SDEs) of the form

$$dX_t = \sigma(X_t)\,dW_t + b(X_t)\,dt, \qquad X_0 = x_0. \tag{15.1}$$

This means $X_t$ satisfies

$$X_t = x_0 + \int_0^t \sigma(X_s)\,dW_s + \int_0^t b(X_s)\,ds. \tag{15.2}$$

Here $W_t$ is a Brownian motion, and (15.2) holds for almost every $\omega$.

We have to make some assumptions on $\sigma$ and $b$. We assume they are Lipschitz, which means

$$|\sigma(x) - \sigma(y)| \le c|x - y|, \qquad |b(x) - b(y)| \le c|x - y|$$

for some constant $c$. We also suppose that $\sigma$ and $b$ grow at most linearly, which means

$$|\sigma(x)| \le c(1 + |x|), \qquad |b(x)| \le c(1 + |x|).$$

Theorem 15.1. There exists one and only one solution to (15.2).

The idea of the proof is Picard iteration, which is how existence and uniqueness for ordinary differential equations are proved; see Note 1.
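In practice one rarely runs the Picard scheme by hand; solutions of (15.2) are approximated on a time grid. The sketch below is the standard Euler-Maruyama discretization (an added illustration, not part of the notes; the test equation with $\sigma(x) = 1$, $b(x) = -x$, and the step and sample counts, are arbitrary choices satisfying the Lipschitz and linear growth conditions).

```python
import numpy as np

def euler_maruyama(sigma, b, x0, t, n, rng):
    """One sample of X_t from dX = sigma(X) dW + b(X) dt, X_0 = x0,
    using n equal time steps."""
    dt = t / n
    x = x0
    for _ in range(n):
        dw = rng.normal(0.0, np.sqrt(dt))  # Brownian increment W_{s+dt} - W_s
        x = x + sigma(x) * dw + b(x) * dt
    return x

rng = np.random.default_rng(0)
# Ornstein-Uhlenbeck-type example: the exact solution has E X_1 = x0 * e^{-1}.
samples = np.array([euler_maruyama(lambda x: 1.0, lambda x: -x, 1.0, 1.0, 400, rng)
                    for _ in range(2000)])
```

For this equation $EX_t = x_0e^{-t}$, so the sample mean of `samples` should be near $e^{-1} \approx 0.368$.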

The intuition behind (15.1) is that $X_t$ behaves locally like a multiple of Brownian motion plus a constant drift: locally, $X_{t+h} - X_t \approx \sigma(W_{t+h} - W_t) + b\bigl((t+h) - t\bigr)$. However the constants $\sigma$ and $b$ depend on the current value of $X_t$. When $X_t$ is at different points, the coefficients vary, which is why they are written $\sigma(X_t)$ and $b(X_t)$. $\sigma$ is sometimes called the diffusion coefficient and $b$ is sometimes called the drift coefficient.

The above theorem also works in higher dimensions. We want to solve

$$dX^i_t = \sum_{j=1}^d \sigma_{ij}(X_t)\,dW^j_t + b_i(X_t)\,dt, \qquad i = 1, \ldots, d.$$

This is an abbreviation for the equation

$$X^i_t = x^i_0 + \int_0^t \sum_{j=1}^d \sigma_{ij}(X_s)\,dW^j_s + \int_0^t b_i(X_s)\,ds.$$

Here the initial value is $x_0 = (x^1_0, \ldots, x^d_0)$, the solution process is $X_t = (X^1_t, \ldots, X^d_t)$, and $W^1_t, \ldots, W^d_t$ are $d$ independent Brownian motions. If all of the $\sigma_{ij}$ and $b_i$ are Lipschitz and grow at most linearly, we have existence and uniqueness for the solution.

Suppose one wants to solve

$$dZ_t = aZ_t\,dW_t + bZ_t\,dt.$$

Note that this equation is linear in $Z_t$, and it turns out that linear equations are almost the only ones that have an explicit solution. In this case we can write down the explicit solution and then verify that it satisfies the SDE. The uniqueness result above (Theorem 15.1) shows that we have in fact found the solution.

Let

$$Z_t = Z_0e^{aW_t - a^2t/2 + bt}.$$

We will verify that this is correct by using Ito's formula. Let $X_t = aW_t - a^2t/2 + bt$. Then $X_t$ is a semimartingale with martingale part $aW_t$ and $\langle X\rangle_t = a^2t$, and $Z_t = Z_0e^{X_t}$. By Ito's formula with $f(x) = e^x$,

$$\begin{aligned} Z_t &= Z_0 + \int_0^t Z_s\,dX_s + \frac{1}{2}\int_0^t Z_s\,a^2\,ds\\ &= Z_0 + \int_0^t aZ_s\,dW_s - \int_0^t \frac{a^2}{2}Z_s\,ds + \int_0^t bZ_s\,ds + \frac{1}{2}\int_0^t a^2Z_s\,ds\\ &= Z_0 + \int_0^t aZ_s\,dW_s + \int_0^t bZ_s\,ds.\end{aligned}$$

This is the integrated form of the equation we wanted to solve.

There is a connection between SDEs and partial differential equations. Let $f$ be a $C^2$ function. If we apply Ito's formula,

$$f(X_t) = f(X_0) + \int_0^t f'(X_s)\,dX_s + \frac{1}{2}\int_0^t f''(X_s)\,d\langle X\rangle_s.$$

From (15.2) we know $\langle X\rangle_t = \int_0^t \sigma(X_s)^2\,ds$. If we substitute for $dX_s$ and $d\langle X\rangle_s$, we obtain

$$\begin{aligned} f(X_t) &= f(X_0) + \int_0^t f'(X_s)\sigma(X_s)\,dW_s + \int_0^t f'(X_s)b(X_s)\,ds + \frac{1}{2}\int_0^t f''(X_s)\sigma(X_s)^2\,ds\\ &= f(X_0) + \int_0^t f'(X_s)\sigma(X_s)\,dW_s + \int_0^t Lf(X_s)\,ds,\end{aligned}$$

where we write

$$Lf(x) = \frac{1}{2}\sigma(x)^2f''(x) + b(x)f'(x).$$

$L$ is an example of a differential operator. Since the stochastic integral with respect to a Brownian motion is a martingale, we see from the above that

$$f(X_t) - f(X_0) - \int_0^t Lf(X_s)\,ds$$

is a martingale. This fact can be exploited to derive results about PDEs from SDEs and vice versa.

Note 1. Let us illustrate the uniqueness part, and for simplicity, assume $b$ is identically 0 and $\sigma$ is bounded.

Proof of uniqueness. If $X$ and $Y$ are two solutions,

$$X_t - Y_t = \int_0^t [\sigma(X_s) - \sigma(Y_s)]\,dW_s.$$

So

$$E|X_t - Y_t|^2 = E\int_0^t |\sigma(X_s) - \sigma(Y_s)|^2\,ds \le c\int_0^t E|X_s - Y_s|^2\,ds,$$

using the Lipschitz hypothesis on $\sigma$. If we let $g(t) = E|X_t - Y_t|^2$, we have

$$g(t) \le c\int_0^t g(s)\,ds.$$

Since we are assuming $\sigma$ is bounded, $EX_t^2 = E\int_0^t (\sigma(X_s))^2\,ds \le ct$ and similarly for $EY_t^2$, so $g(t) \le ct$. Then

$$g(t) \le c\int_0^t \Bigl(c\int_0^s g(r)\,dr\Bigr)\,ds.$$

Iteration implies

$$g(t) \le At^n/n!$$

for each $n$, which implies $g$ must be 0.
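The iteration can be spelled out; this is a standard Gronwall-type argument (added here for the reader, not in the original text). Feeding the bound $g(t) \le ct$ back into $g(t) \le c\int_0^t g(s)\,ds$ repeatedly gives

```latex
g(t) \le c \int_0^t cs\,ds = \frac{c^2 t^2}{2}, \qquad
g(t) \le c \int_0^t \frac{c^2 s^2}{2}\,ds = \frac{c^3 t^3}{3!}, \qquad \ldots
```

and inductively, if $g(s) \le c^ns^n/n!$ for all $s \le t$, then $g(t) \le c\int_0^t c^ns^n/n!\,ds = c^{n+1}t^{n+1}/(n+1)!$. For fixed $t$ the right hand side tends to 0 as $n \to \infty$, forcing $g(t) = 0$.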


16. Continuous time financial models.

The most common model by far in finance is one where the security price is based on a Brownian motion. One does not want to say the price is some multiple of Brownian motion, for two reasons. First of all, a Brownian motion can become negative, which doesn't make sense for stock prices. Second, if one invests $1,000 in a stock selling for $1 and it goes up to $2, one has the same profit, namely $1,000, as if one invests $1,000 in a stock selling for $100 and it goes up to $200. It is the proportional increase one wants.

Therefore one sets $\Delta S_t/S_t$ to be the quantity related to a Brownian motion. Different stocks have different volatilities $\sigma$ (consider a high-tech stock versus a pharmaceutical). In addition, one expects a mean rate of return $\mu$ on one's investment that is positive (otherwise, why not just put the money in the bank?). In fact, one expects the mean rate of return to be higher than the risk-free interest rate $r$, because one expects something in return for undertaking risk.

So the model that is used is to let the stock price be modeled by the SDE

$$dS_t/S_t = \sigma\,dW_t + \mu\,dt,$$

or what looks better,

$$dS_t = \sigma S_t\,dW_t + \mu S_t\,dt. \tag{16.1}$$

Fortunately this SDE is one of those that can be solved explicitly, and in fact we gave the solution in Section 15.

Proposition 16.1. The solution to (16.1) is given by

$$S_t = S_0e^{\sigma W_t + (\mu - \sigma^2/2)t}. \tag{16.2}$$

Proof. By Theorem 15.1 there is only one solution, so we need to verify that $S_t$ as given in (16.2) satisfies (16.1). We already did this, but it is important enough that we will do it again. Let us first assume $S_0 = 1$. Let $X_t = \sigma W_t + (\mu - \sigma^2/2)t$, let $f(x) = e^x$, and apply Ito's formula. We obtain

$$\begin{aligned} S_t = e^{X_t} &= e^{X_0} + \int_0^t e^{X_s}\,dX_s + \frac{1}{2}\int_0^t e^{X_s}\,d\langle X\rangle_s\\ &= 1 + \int_0^t S_s\sigma\,dW_s + \int_0^t S_s\bigl(\mu - \tfrac{1}{2}\sigma^2\bigr)\,ds + \frac{1}{2}\int_0^t S_s\sigma^2\,ds\\ &= 1 + \int_0^t S_s\sigma\,dW_s + \int_0^t S_s\mu\,ds,\end{aligned}$$

which is (16.1). If $S_0 \ne 1$, just multiply both sides by $S_0$.
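One useful consequence of (16.2) is $ES_t = S_0e^{\mu t}$: since $Ee^{\sigma W_t} = e^{\sigma^2t/2}$, the $-\sigma^2t/2$ in the exponent cancels exactly. A quick Monte Carlo check of this (an added illustration; the parameter values and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
s0, mu, sigma, t = 1.0, 0.05, 0.3, 1.0
w = rng.normal(0.0, np.sqrt(t), size=200_000)            # samples of W_t
s_t = s0 * np.exp(sigma * w + (mu - sigma**2 / 2) * t)   # formula (16.2)
exact_mean = s0 * np.exp(mu * t)                         # E S_t = S_0 e^{mu t}
```

The sample mean of `s_t` should be close to `exact_mean`, and every sample is positive, unlike a multiple of Brownian motion itself.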

Suppose for the moment that the interest rate $r$ is 0. If one purchases $\Delta_0$ shares (possibly a negative number) at time $t_0$, then changes the investment to $\Delta_1$ shares at time $t_1$, then changes the investment to $\Delta_2$ at time $t_2$, etc., then one's wealth at time $t$ will be

$$X_{t_0} + \Delta_0(S_{t_1} - S_{t_0}) + \Delta_1(S_{t_2} - S_{t_1}) + \cdots + \Delta_i(S_{t_{i+1}} - S_{t_i}). \tag{16.3}$$

To see this, at time $t_0$ one has the original wealth $X_{t_0}$. One buys $\Delta_0$ shares and the cost is $\Delta_0S_{t_0}$. At time $t_1$ one sells the $\Delta_0$ shares for the price of $S_{t_1}$ per share, and so one's wealth is now $X_{t_0} + \Delta_0(S_{t_1} - S_{t_0})$. One now pays $\Delta_1S_{t_1}$ for $\Delta_1$ shares at time $t_1$ and continues. The right hand side of (16.3) is the same as

$$X_{t_0} + \int_{t_0}^t \Delta(s)\,dS_s,$$

where we have $t \ge t_{i+1}$ and $\Delta(s) = \Delta_i$ for $t_i \le s < t_{i+1}$. In other words, our wealth is given by a stochastic integral with respect to the stock price. The requirement that the integrand of a stochastic integral be adapted is very natural: we cannot base the number of shares we own at time $s$ on information that will not be available until the future.
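The claim that the trade-by-trade bookkeeping telescopes into (16.3) can be verified directly. The sketch below (an added illustration; the prices, share counts, and starting wealth are made-up numbers) tracks cash and shares explicitly and compares with the sum in (16.3).

```python
def wealth_by_trading(x0, deltas, prices):
    """Explicit bookkeeping with r = 0: adjust the position at each time,
    then liquidate at the final price."""
    cash, shares = x0, 0.0
    for delta, price in zip(deltas, prices[:-1]):
        cash -= (delta - shares) * price  # buy (or sell) into delta shares
        shares = delta
    return cash + shares * prices[-1]

def wealth_by_sum(x0, deltas, prices):
    """Formula (16.3): X_{t_0} + sum_i Delta_i (S_{t_{i+1}} - S_{t_i})."""
    return x0 + sum(d * (prices[i + 1] - prices[i]) for i, d in enumerate(deltas))

deltas = [2.0, -1.0, 3.5]          # Delta_0, Delta_1, Delta_2 (negative = short)
prices = [40.0, 45.0, 43.0, 50.0]  # S at t_0, t_1, t_2, t_3
```

Both routes give the same terminal wealth, which is the telescoping behind (16.3).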

How should we modify this when the interest rate $r$ is not zero? Let $P_t$ be the present value of the stock price. So

$$P_t = e^{-rt}S_t.$$

Note that $P_0 = S_0$. When we hold $\Delta_i$ shares of stock from $t_i$ to $t_{i+1}$, our profit in present-day dollars will be

$$\Delta_i(P_{t_{i+1}} - P_{t_i}).$$

The formula for our wealth then becomes

$$X_{t_0} + \int_{t_0}^t \Delta(s)\,dP_s.$$

By Ito's product formula,

$$\begin{aligned} dP_t &= e^{-rt}\,dS_t - re^{-rt}S_t\,dt\\ &= e^{-rt}\sigma S_t\,dW_t + e^{-rt}\mu S_t\,dt - re^{-rt}S_t\,dt\\ &= \sigma P_t\,dW_t + (\mu - r)P_t\,dt.\end{aligned}$$

Similarly to (16.2), the solution to this SDE is

$$P_t = P_0e^{\sigma W_t + (\mu - r - \sigma^2/2)t}. \tag{16.4}$$

The continuous time model of finance is that the security price is given by (16.1) (often called geometric Brownian motion), that there are no transaction costs, and that one can trade as many shares as one wants and vary the amount held in a continuous fashion. This is clearly not the way the market actually works (for example, stock prices are discrete), but this model has proved to be a very good one.

17. Markov properties of Brownian motion.

Let $W_t$ be a Brownian motion. Because $W_{t+r} - W_t$ is independent of $\sigma(W_s : s \le t)$, knowing the path of $W$ up to time $t$ gives no help in predicting $W_{t+r} - W_t$. In particular, if we want to predict $W_{t+r}$ and we know $W_t$, then knowing the path up to time $t$ gives no additional advantage in predicting $W_{t+r}$. Phrased another way, this says that to predict the future, we only need to know where we are and not how we got there.

Let's try to give a more precise description of this property, which is known as the Markov property.

Fix $r$ and let $Z_t = W_{t+r} - W_r$. Clearly the map $t \to Z_t$ is continuous since the same is true for $W$. Since $Z_t - Z_s = W_{t+r} - W_{s+r}$, the distribution of $Z_t - Z_s$ is normal with mean zero and variance $(t + r) - (s + r) = t - s$. One can also check the other parts of the definition to show that $Z_t$ is also a Brownian motion.

Recall that a stopping time in the continuous framework is a r.v. $T$ taking values in $[0, \infty)$ such that $(T \le t) \in \mathcal{F}_t$ for all $t$. To make a satisfactory theory, we need the $\mathcal{F}_t$ to be right continuous (see Section 10), but this is fairly technical and we will ignore it. If $T$ is a stopping time, $\mathcal{F}_T$ is the collection of events $A$ such that $A \cap (T > t) \in \mathcal{F}_t$ for all $t$.

Let us try to provide some motivation for this definition of $\mathcal{F}_T$. It will be simpler to consider the discrete time case. The analogue of $\mathcal{F}_T$ in the discrete case is the following: if $N$ is a stopping time, let

$$\mathcal{F}_N = \{A : A \cap (N \le k) \in \mathcal{F}_k \text{ for all } k\}.$$

If $X_k$ is a sequence that is adapted to the $\sigma$-fields $\mathcal{F}_k$, that is, $X_k$ is $\mathcal{F}_k$ measurable for each $k = 0, 1, 2, \ldots$, then knowing which events in $\mathcal{F}_k$ have occurred allows us to calculate $X_k$ for each $k$. So a reasonable definition of $\mathcal{F}_N$ should allow us to calculate $X_N$ whenever we know which events in $\mathcal{F}_N$ have occurred or not. Or phrased another way, we want $X_N$ to be $\mathcal{F}_N$ measurable. Where did the sequence $X_k$ come from? It could be any adapted sequence. Therefore one definition of the $\sigma$-field of events occurring before time $N$ might be:

Consider the collection of random variables $X_N$ where $X_k$ is a sequence adapted to $\mathcal{F}_k$. Let $\mathcal{G}_N$ be the smallest $\sigma$-field with respect to which each of these random variables $X_N$ is measurable.

In other words, we want $\mathcal{G}_N$ to be the $\sigma$-field generated by the collection of random variables $X_N$ for all sequences $X_k$ that are adapted to $\mathcal{F}_k$.

We show in Note 1 that $\mathcal{F}_N = \mathcal{G}_N$. The $\sigma$-field $\mathcal{F}_N$ is just a bit easier to work with.

Now we proceed to the strong Markov property for Brownian motion, the proof of which is given in Note 2.

Proposition 17.1. If $X_t$ is a Brownian motion and $T$ is a bounded stopping time, then $X_{T+t} - X_T$ is a mean 0 variance $t$ random variable and is independent of $\mathcal{F}_T$.

This proposition says: if you want to predict $X_{T+t}$, you could do it knowing all of $\mathcal{F}_T$ or just knowing $X_T$. Since $X_{T+t} - X_T$ is independent of $\mathcal{F}_T$, the extra information given in $\mathcal{F}_T$ does you no good at all.

We need a way of expressing the Markov and strong Markov properties that will generalize to other processes.

Let $W_t$ be a Brownian motion. Consider the process $W^x_t = x + W_t$, which is known as Brownian motion started at $x$. Define $\Omega'$ to be the set of continuous functions on $[0, \infty)$, let $X_t(\omega) = \omega(t)$, and let the $\sigma$-field $\mathcal{F}'$ be the one generated by the $X_t$. Define $P^x$ on $(\Omega', \mathcal{F}')$ by

$$P^x(X_{t_1} \in A_1, \ldots, X_{t_n} \in A_n) = P(W^x_{t_1} \in A_1, \ldots, W^x_{t_n} \in A_n).$$

What we have done is gone from one probability space $\Omega$ with many processes $W^x_t$ to one process $X_t$ with many probability measures $P^x$.

An example in the Markov chain setting might help. No knowledge of Markov chains is necessary to understand this. Suppose we have a Markov chain with 3 states, $A$, $B$, and $C$. Suppose we have a probability $P$ and three different Markov chains. The first, called $X^A_n$, represents the position at time $n$ for the chain started at $A$. So $X^A_0 = A$, and $X^A_1$ can be one of $A, B, C$, $X^A_2$ can be one of $A, B, C$, and so on. Similarly we have $X^B_n$, the chain started at $B$, and $X^C_n$. Define $\Omega' = \{(AAA), (AAB), (ABA), \ldots, (BAA), (BAB), \ldots\}$. So $\Omega'$ denotes the possible sequences of states for times $n = 0, 1, 2$. If $\omega = ABA$, set $Y_0(\omega) = A$, $Y_1(\omega) = B$, $Y_2(\omega) = A$, and similarly for all the other 26 values of $\omega$. Define $P^A(AAA) = P(X^A_0 = A, X^A_1 = A, X^A_2 = A)$. Similarly define $P^A(AAB), \ldots$. Define $P^B(AAA) = P(X^B_0 = A, X^B_1 = A, X^B_2 = A)$ (this will be 0 because we know $X^B_0 = B$), and similarly for the other values of $\omega$. We also define $P^C$. So we now have one process, $Y_n$, and three probabilities $P^A, P^B, P^C$. As you can see, there really isn't all that much going on here.

Here is another formulation of the Markov property.

Proposition 17.2. If $s < t$ and $f$ is bounded or nonnegative, then

$$E^x[f(X_t) \mid \mathcal{F}_s] = E^{X_s}[f(X_{t-s})], \qquad \text{a.s.}$$

The right hand side is to be interpreted as follows. Define $\varphi(x) = E^xf(X_{t-s})$. Then $E^{X_s}f(X_{t-s})$ means $\varphi(X_s(\omega))$. One often writes $P_tf(x)$ for $E^xf(X_t)$. We prove this in Note 3.

This formula generalizes: if $s < t < u$, then

$$E^x[f(X_t)g(X_u) \mid \mathcal{F}_s] = E^{X_s}[f(X_{t-s})g(X_{u-s})],$$

and so on for functions of $X$ at more times.

Using Proposition 17.1, the statement and proof of Proposition 17.2 can be extended to stopping times.

Proposition 17.3. If $T$ is a bounded stopping time, then

$$E^x[f(X_{T+t}) \mid \mathcal{F}_T] = E^{X_T}[f(X_t)].$$

We can also establish the Markov property and strong Markov property in the context of solutions of stochastic differential equations. If we let $X^x_t$ denote the solution to

$$X^x_t = x + \int_0^t \sigma(X^x_s)\,dW_s + \int_0^t b(X^x_s)\,ds,$$

so that $X^x_t$ is the solution of the SDE started at $x$, we can define new probabilities by

$$P^x(X_{t_1} \in A_1, \ldots, X_{t_n} \in A_n) = P(X^x_{t_1} \in A_1, \ldots, X^x_{t_n} \in A_n).$$

This is similar to what we did in defining $P^x$ for Brownian motion, but here we do not have translation invariance. One can show that when there is uniqueness for the solution to the SDE, the family $(P^x, X_t)$ satisfies the Markov and strong Markov properties. The statement is precisely the same as the statement of Proposition 17.3.

Note 1. We want to show $\mathcal{G}_N = \mathcal{F}_N$. Since $\mathcal{G}_N$ is the smallest $\sigma$-field with respect to which $X_N$ is measurable for all adapted sequences $X_k$, and it is easy to see that $\mathcal{F}_N$ is a $\sigma$-field, to show $\mathcal{G}_N \subset \mathcal{F}_N$ it suffices to show that $X_N$ is measurable with respect to $\mathcal{F}_N$ whenever $X_k$ is adapted. Therefore we need to show that for such a sequence $X_k$ and any real number $a$, the event $(X_N > a) \in \mathcal{F}_N$.

Now $(X_N > a) \cap (N = j) = (X_j > a) \cap (N = j)$. The event $(X_j > a) \in \mathcal{F}_j$ since $X$ is an adapted sequence. Since $N$ is a stopping time, $(N \le j) \in \mathcal{F}_j$ and $(N \le j - 1)^c \in \mathcal{F}_{j-1} \subset \mathcal{F}_j$, and so the event $(N = j) = (N \le j) \cap (N \le j - 1)^c$ is in $\mathcal{F}_j$. If $j \le k$, then $(N = j) \in \mathcal{F}_j \subset \mathcal{F}_k$. Therefore

$$(X_N > a) \cap (N \le k) = \cup_{j=0}^k \bigl((X_N > a) \cap (N = j)\bigr) \in \mathcal{F}_k,$$

which proves that $(X_N > a) \in \mathcal{F}_N$.

To show $\mathcal{F}_N \subset \mathcal{G}_N$, we suppose that $A \in \mathcal{F}_N$. Let $X_k = 1_{A \cap (N \le k)}$. Since $A \in \mathcal{F}_N$, then $A \cap (N \le k) \in \mathcal{F}_k$, so $X_k$ is $\mathcal{F}_k$ measurable. But $X_N = 1_{A \cap (N \le N)} = 1_A$, so $A = (X_N > 0) \in \mathcal{G}_N$. We have thus shown that $\mathcal{F}_N \subset \mathcal{G}_N$, and combining with the previous paragraph, we conclude $\mathcal{F}_N = \mathcal{G}_N$.

Note 2. Let $T_n$ be defined by $T_n(\omega) = (k + 1)/2^n$ if $T(\omega) \in [k/2^n, (k + 1)/2^n)$. It is easy to check that $T_n$ is a stopping time. Let $f$ be continuous and $A \in \mathcal{F}_T$. Then $A \in \mathcal{F}_{T_n}$ as well. We have

$$\begin{aligned} E[f(X_{T_n+t} - X_{T_n}); A] &= \sum_k E[f(X_{k/2^n+t} - X_{k/2^n}); A \cap (T_n = k/2^n)]\\ &= \sum_k E[f(X_{k/2^n+t} - X_{k/2^n})]\,P(A \cap (T_n = k/2^n))\\ &= Ef(X_t)\,P(A).\end{aligned}$$

Let $n \to \infty$, so

$$E[f(X_{T+t} - X_T); A] = Ef(X_t)\,P(A).$$

Taking limits, this equation holds for all bounded $f$.

If we take $A = \Omega$ and $f = 1_B$, we see that $X_{T+t} - X_T$ has the same distribution as $X_t$, which is that of a mean 0 variance $t$ normal random variable. If we let $A \in \mathcal{F}_T$ be arbitrary and $f = 1_B$, we see that

$$P(X_{T+t} - X_T \in B, A) = P(X_t \in B)P(A) = P(X_{T+t} - X_T \in B)P(A),$$

which implies that $X_{T+t} - X_T$ is independent of $\mathcal{F}_T$.

Note 3. Before proving Proposition 17.2, recall from undergraduate analysis that every bounded function is the limit of linear combinations of functions $e^{iux}$, $u \in \mathbb{R}$. This follows from using the inversion formula for Fourier transforms. There are various slightly different formulas for the Fourier transform. We use $\hat{f}(u) = \int e^{iux}f(x)\,dx$. If $f$ is smooth enough and has compact support, then one can recover $f$ by the formula

$$f(x) = \frac{1}{2\pi}\int e^{-iux}\hat{f}(u)\,du.$$

We can first approximate this improper integral by

$$\frac{1}{2\pi}\int_{-N}^N e^{-iux}\hat{f}(u)\,du$$

by taking $N$ larger and larger. For each $N$ we can approximate $\frac{1}{2\pi}\int_{-N}^N e^{-iux}\hat{f}(u)\,du$ by using Riemann sums. Thus we can approximate $f(x)$ by a linear combination of terms of the form $e^{iu_jx}$. Finally, bounded functions can be approximated by smooth functions with compact support.

Proof. Let $f(x) = e^{iux}$. Then

$$E^x[e^{iuX_t} \mid \mathcal{F}_s] = e^{iuX_s}E^x[e^{iu(X_t - X_s)} \mid \mathcal{F}_s] = e^{iuX_s}e^{-u^2(t-s)/2}.$$

On the other hand,

$$\varphi(y) = E^y[f(X_{t-s})] = E[e^{iu(W_{t-s}+y)}] = e^{iuy}e^{-u^2(t-s)/2}.$$

So $\varphi(X_s) = E^x[e^{iuX_t} \mid \mathcal{F}_s]$. Using linearity and taking limits, we have the lemma for all $f$.

18. Martingale representation theorem.

In this section we want to show that every random variable that is $\mathcal{F}_t$ measurable can be written as a stochastic integral of Brownian motion. In the next section we use this to show that under the model of geometric Brownian motion the market is complete. This means that no matter what option one comes up with, one can exactly replicate the result (no matter what the market does) by buying and selling shares of stock.

In mathematical terms, we let $\mathcal{F}_t$ be the $\sigma$-field generated by $W_s$, $s \le t$. From (16.2) we see that $\mathcal{F}_t$ is also the same as the $\sigma$-field generated by $S_s$, $s \le t$, so it doesn't matter which one we work with. We want to show that if $V$ is $\mathcal{F}_t$ measurable, then there exists $H_s$ adapted such that

$$V = V_0 + \int H_s\,dW_s, \tag{18.1}$$

where $V_0$ is a constant.

Our goal is to prove

Theorem 18.1. If $V$ is $\mathcal{F}_t$ measurable and $EV^2 < \infty$, then there exist a constant $c$ and an adapted integrand $H_s$ with $E\int_0^t H_s^2\,ds < \infty$ such that

$$V = c + \int_0^t H_s\,dW_s.$$

Before we prove this, let us explain why this is called a martingale representation theorem. Suppose $M_s$ is a martingale adapted to $\mathcal{F}_s$, where the $\mathcal{F}_s$ are the $\sigma$-fields generated by a Brownian motion. Suppose also that $EM_t^2 < \infty$. Set $V = M_t$. By Theorem 18.1, we can write

$$M_t = V = c + \int_0^t H_s\,dW_s.$$

The stochastic integral is a martingale, so for $r \le t$,

$$M_r = E[M_t \mid \mathcal{F}_r] = c + E\Bigl[\int_0^t H_s\,dW_s \;\Big|\; \mathcal{F}_r\Bigr] = c + \int_0^r H_s\,dW_s.$$

We already knew that stochastic integrals are martingales; what this says is the converse: every martingale can be represented as a stochastic integral. Don't forget that we need $EM_t^2 < \infty$ and $M_s$ adapted to the $\sigma$-fields of a Brownian motion.

In Note 1 we show that if every martingale can be represented as a stochastic integral, then every random variable $V$ that is $\mathcal{F}_t$ measurable can be, too, provided $EV^2 < \infty$.

There are several proofs of Theorem 18.1. Unfortunately, they are all technical. We outline one proof here, giving details in the notes. We start with the following, proved in Note 2.

Proposition 18.2. Suppose

$$V_n = c_n + \int_0^t H^n_s\,dW_s, \qquad c_n \to c, \qquad E|V_n - V|^2 \to 0,$$

and for each $n$ the process $H^n$ is adapted with $E\int_0^t (H^n_s)^2\,ds < \infty$. Then there exist a constant $c$ and an adapted $H_s$ with $E\int_0^t H_s^2\,ds < \infty$ so that

$$V = c + \int_0^t H_s\,dW_s.$$

What this proposition says is that if we can represent a sequence of random variables $V_n$ and $V_n \to V$, then we can represent $V$.

Let $\mathcal{R}$ be the collection of random variables that can be represented as stochastic integrals. By this we mean

$$\mathcal{R} = \Bigl\{V : EV^2 < \infty,\ V \text{ is } \mathcal{F}_t \text{ measurable},\ V = c + \int_0^t H_s\,dW_s \text{ for some adapted } H \text{ with } E\int_0^t H_s^2\,ds < \infty\Bigr\}.$$

Next we show $\mathcal{R}$ contains a particular collection of random variables. (The proof is in Note 3.)

Proposition 18.3. If $g$ is bounded, the random variable $g(W_t)$ is in $\mathcal{R}$.

An almost identical proof shows that if $f$ is bounded, then

$$f(W_t - W_s) = c + \int_s^t H_r\,dW_r$$

for some $c$ and $H_r$.

Proposition 18.4. If $t_0 \le t_1 \le \cdots \le t_n \le t$ and $f_1, \ldots, f_n$ are bounded functions, then $f_1(W_{t_1} - W_{t_0})f_2(W_{t_2} - W_{t_1})\cdots f_n(W_{t_n} - W_{t_{n-1}})$ is in $\mathcal{R}$.

See Note 4 for the proof.

We now finish the proof of Theorem 18.1. We have shown that a large class of random variables is contained in $\mathcal{R}$.

Proof of Theorem 18.1. We have shown that random variables of the form

$$f_1(W_{t_1} - W_{t_0})f_2(W_{t_2} - W_{t_1})\cdots f_n(W_{t_n} - W_{t_{n-1}}) \tag{18.2}$$

are in $\mathcal{R}$. Clearly if $V_i \in \mathcal{R}$ for $i = 1, \ldots, m$, and the $a_i$ are constants, then $a_1V_1 + \cdots + a_mV_m$ is also in $\mathcal{R}$. Finally, from measure theory we know that if $EV^2 < \infty$ and $V$ is $\mathcal{F}_t$ measurable, we can find a sequence $V_k$ such that $E|V_k - V|^2 \to 0$ and each $V_k$ is a linear combination of random variables of the form given in (18.2). Now apply Proposition 18.2.

Note 1. Suppose we know that every martingale $M_s$ adapted to $\mathcal{F}_s$ with $EM_t^2 < \infty$ can be represented as $M_r = c + \int_0^r H_s\,dW_s$ for some suitable $H$. If $V$ is $\mathcal{F}_t$ measurable with $EV^2 < \infty$, let $M_r = E[V \mid \mathcal{F}_r]$. We know this is a martingale, so

$$M_r = c + \int_0^r H_s\,dW_s$$

for suitable $H$. Applying this with $r = t$,

$$V = E[V \mid \mathcal{F}_t] = M_t = c + \int_0^t H_s\,dW_s.$$

Note 2. We prove Proposition 18.2. By our assumptions,

$$E|(V_n - c_n) - (V_m - c_m)|^2 \to 0$$

as $n, m \to \infty$. So

$$E\Bigl|\int_0^t (H^n_s - H^m_s)\,dW_s\Bigr|^2 \to 0.$$

From our formulas for stochastic integrals, this means

$$E\int_0^t |H^n_s - H^m_s|^2\,ds \to 0.$$

This says that $H^n_s$ is a Cauchy sequence in the space $L^2$ (with respect to the norm $\|\cdot\|_2$ given by $\|Y\|_2 = \bigl(E\int_0^t Y_s^2\,ds\bigr)^{1/2}$). Measure theory tells us that $L^2$ is a complete metric space, so there exists $H_s$ such that

$$E\int_0^t |H^n_s - H_s|^2\,ds \to 0.$$

In particular $H^n_s \to H_s$, and this implies $H_s$ is adapted. Another consequence, due to Fatou's lemma, is that $E\int_0^t H_s^2\,ds < \infty$.

Let $U_t = \int_0^t H_s\,dW_s$. Then as above,

$$E|(V_n - c_n) - U_t|^2 = E\int_0^t (H^n_s - H_s)^2\,ds \to 0.$$

Therefore $U_t = V - c$, and $U$ has the desired form.

Note 3. Here is the proof of Proposition 18.3. By Ito's formula with $X_s = -iuW_s + u^2s/2$ and $f(x) = e^x$,

$$\begin{aligned} e^{X_t} &= 1 + \int_0^t e^{X_s}(-iu)\,dW_s + \int_0^t e^{X_s}(u^2/2)\,ds + \frac{1}{2}\int_0^t e^{X_s}(-iu)^2\,ds\\ &= 1 - iu\int_0^t e^{X_s}\,dW_s.\end{aligned}$$

If we multiply both sides by $e^{-u^2t/2}$, which is a constant and hence adapted, we obtain

$$e^{-iuW_t} = c_u + \int_0^t H^u_s\,dW_s \tag{18.3}$$

for an appropriate constant $c_u$ and integrand $H^u$.

If $f$ is a smooth function (e.g., $C^\infty$ with compact support), then its Fourier transform $\hat{f}$ will also be very nice. So if we multiply (18.3) by $\hat{f}(u)$ and integrate over $u$ from $-\infty$ to $\infty$, we obtain

$$f(W_t) = c + \int_0^t H_s\,dW_s$$

for some constant $c$ and some adapted integrand $H$. (We implicitly used Proposition 18.2, because we approximate our integral by Riemann sums and then take a limit.) Now using Proposition 18.2 we take limits and obtain the proposition.

Note 4. The argument is by induction; let us do the case $n = 2$ for clarity. So we suppose $V = f(W_t)g(W_u - W_t)$.

From Proposition 18.3 we now have that

$$f(W_t) = c + \int_0^t H_s\,dW_s, \qquad g(W_u - W_t) = d + \int_t^u K_s\,dW_s.$$

Set $\widetilde{H}_r = H_r$ if $0 \le r < t$ and 0 otherwise. Set $\widetilde{K}_r = K_r$ if $t \le r < u$ and 0 otherwise. Let $X_s = c + \int_0^s \widetilde{H}_r\,dW_r$ and $Y_s = d + \int_0^s \widetilde{K}_r\,dW_r$. Then

$$\langle X, Y\rangle_s = \int_0^s \widetilde{H}_r\widetilde{K}_r\,dr = 0.$$

Then by the Ito product formula,

$$\begin{aligned} X_sY_s &= X_0Y_0 + \int_0^s X_r\,dY_r + \int_0^s Y_r\,dX_r + \langle X, Y\rangle_s\\ &= cd + \int_0^s [X_r\widetilde{K}_r + Y_r\widetilde{H}_r]\,dW_r.\end{aligned}$$

If we now take $s = u$, that is exactly what we wanted. Note that $X_r\widetilde{K}_r + Y_r\widetilde{H}_r$ is 0 if $r > u$; this is needed to do the general induction step.

19. Completeness.

Now let $P_t$ be a geometric Brownian motion. As we mentioned in Section 16, if $P_t = P_0\exp(\sigma W_t + (\mu - r - \sigma^2/2)t)$, then given $P_t$ we can determine $W_t$ and vice versa, so the $\sigma$-fields generated by $P_t$ and $W_t$ are the same. Recall $P_t$ satisfies

$$dP_t = \sigma P_t\,dW_t + (\mu - r)P_t\,dt.$$

Define a new probability $\overline{P}$ by

$$\frac{d\overline{P}}{dP} = M_t = \exp(aW_t - a^2t/2).$$

By the Girsanov theorem,

$$\overline{W}_t = W_t - at$$

is a Brownian motion under $\overline{P}$. So

$$dP_t = \sigma P_t\,d\overline{W}_t + \sigma P_ta\,dt + (\mu - r)P_t\,dt.$$

If we choose $a = -(\mu - r)/\sigma$, we then have

$$dP_t = \sigma P_t\,d\overline{W}_t. \tag{19.1}$$

Since $\overline{W}_t$ is a Brownian motion under $\overline{P}$, $P_t$ must be a martingale, since it is a stochastic integral of a Brownian motion. We can rewrite (19.1) as

$$d\overline{W}_t = \sigma^{-1}P_t^{-1}\,dP_t. \tag{19.2}$$

Given an $\mathcal{F}_t$ measurable variable $V$, we know by Theorem 18.1 that there exist a constant and an adapted process $H_s$ such that $E\int_0^t H_s^2\,ds < \infty$ and

$$V = c + \int_0^t H_s\,d\overline{W}_s.$$

But then using (19.2) we have

$$V = c + \int_0^t H_s\sigma^{-1}P_s^{-1}\,dP_s.$$

We have therefore proved

Theorem 19.1. If $P_t$ is a geometric Brownian motion and $V$ is $\mathcal{F}_t$ measurable and square integrable, then there exist a constant $c$ and an adapted process $K_s$ such that

$$V = c + \int_0^t K_s\,dP_s.$$

Moreover, there is a probability $\overline{P}$ under which $P_t$ is a martingale.

The probability $\overline{P}$ is called the risk-neutral measure. Under $\overline{P}$ the present-day value of the stock price is a martingale.
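Martingality of $P_t$ under $\overline{P}$ means in particular that $\overline{E}P_t = P_0$ for every $t$. This is easy to check by simulating the solution $P_t = P_0e^{\sigma\overline{W}_t - \sigma^2t/2}$ of (19.1) (an added sanity check; the seed and parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
p0, sigma, t = 50.0, 0.3, 2.0
wbar = rng.normal(0.0, np.sqrt(t), size=400_000)     # samples of W-bar_t under P-bar
p_t = p0 * np.exp(sigma * wbar - sigma**2 * t / 2)   # solution of dP = sigma P dW-bar
```

The sample mean of `p_t` should sit near $P_0 = 50$ whatever $t$ and $\sigma$ are, reflecting the martingale property.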


20. Black-Scholes formula, I.

We can now derive the formula for the price of any option. Let $T \ge 0$ be a fixed real. If $V$ is $\mathcal{F}_T$ measurable, we have by Theorem 19.1 that

$$V = c + \int_0^T K_s\,dP_s, \tag{20.1}$$

and under $\overline{P}$, the process $P_s$ is a martingale.

Theorem 20.1. The price of $V$ must be $\overline{E}V$.

Proof. This is the "no arbitrage" principle again. Suppose the price of the option $V$ at time 0 is $W_0$. Starting with 0 dollars, we can sell the option $V$ for $W_0$ dollars, and use the $W_0$ dollars to buy and trade shares of the stock. In fact, if we use $c$ of those dollars, and invest according to the strategy of holding $K_s$ shares at time $s$, then at time $T$ we will have

$$e^{rT}(W_0 - c) + V$$

dollars. At time $T$ the buyer of our option exercises it and we use $V$ dollars to meet that obligation. That leaves us a profit of $e^{rT}(W_0 - c)$ if $W_0 > c$, without any risk. Therefore $W_0$ must be less than or equal to $c$. If $W_0 < c$, we just reverse things: we buy the option instead of selling it, and hold $-K_s$ shares of stock at time $s$. By the same argument, since we can't get a riskless profit, we must have $W_0 \ge c$, so $W_0 = c$.

Finally, under $\overline{P}$ the process $P_t$ is a martingale. So taking expectations in (20.1), we obtain

$$\overline{E}V = c.$$

The formula in the statement of Theorem 20.1 is amenable to calculation. Suppose we have the standard European option, where

$$V = e^{-rT}(S_T - K)^+ = (e^{-rT}S_T - e^{-rT}K)^+ = (P_T - e^{-rT}K)^+.$$

Recall that under $\overline{P}$ the stock price satisfies

$$dP_t = \sigma P_t\,d\overline{W}_t,$$

where $\overline{W}_t$ is a Brownian motion under $\overline{P}$. So then

$$P_t = P_0e^{\sigma\overline{W}_t - \sigma^2t/2}.$$

Hence

$$\overline{E}V = \overline{E}[(P_T - e^{-rT}K)^+] = \overline{E}[(P_0e^{\sigma\overline{W}_T - (\sigma^2/2)T} - e^{-rT}K)^+]. \tag{20.2}$$

We know the density of $\overline{W}_T$ is just $(2\pi T)^{-1/2}e^{-y^2/(2T)}$, so we can do some calculations (see Note 1) and end up with the famous Black-Scholes formula:

$$W_0 = x\Phi(g(x, T)) - Ke^{-rT}\Phi(h(x, T)),$$

where $\Phi(z) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^z e^{-y^2/2}\,dy$, $x = P_0 = S_0$,

$$g(x, T) = \frac{\log(x/K) + (r + \sigma^2/2)T}{\sigma\sqrt{T}}, \qquad h(x, T) = g(x, T) - \sigma\sqrt{T}.$$
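The closed form is simple to evaluate. The sketch below is an added illustration (not part of the notes); $\Phi$ is computed from the error function, and the numbers in the example are arbitrary.

```python
from math import erf, exp, log, sqrt

def norm_cdf(z):
    """Phi(z), the standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def black_scholes_call(x, K, r, sigma, T):
    """W_0 = x Phi(g(x,T)) - K e^{-rT} Phi(h(x,T))."""
    g = (log(x / K) + (r + sigma**2 / 2) * T) / (sigma * sqrt(T))
    h = g - sigma * sqrt(T)
    return x * norm_cdf(g) - K * exp(-r * T) * norm_cdf(h)

price = black_scholes_call(x=50.0, K=50.0, r=0.05, sigma=0.2, T=0.25)
```

The price increases with $\sigma$, and, as discussed below, $\mu$ never enters.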

It is of considerable interest that the final formula depends on $\sigma$ but is completely independent of $\mu$. The reason for that can be explained as follows. Under $\overline{P}$ the process $P_t$ satisfies $dP_t = \sigma P_t\,d\overline{W}_t$, where $\overline{W}_t$ is a Brownian motion. Therefore, similarly to formulas we have already done,

$$P_t = P_0e^{\sigma\overline{W}_t - \sigma^2t/2},$$

and there is no $\mu$ present here. (We used the Girsanov formula to get rid of the $\mu$.) The price of the option $V$ is

$$\overline{E}[P_T - e^{-rT}K]^+, \tag{20.3}$$

which is independent of $\mu$ since $P_t$ is.

Note 1. We want to calculate

E

_

(xe

σ ¯ W

T

−σ

2

T/2

−e

−rT

K)

+

_

, (20.4)

where

W

t

is a Brownian motion under P and we write x for P

0

= S

0

. Since

W

T

is a normal

random vairable with mean 0 and variance T, we can write it as

√

TZ, where Z is a standard

mean 0 variance 1 normal random variable.

Now

xe

σ

√

TZ−σ

2

T/2

> e

−rT

K

if and only if

log x +σ

√

TZ −σ

2

T/2 > −r + log K,

86

or if

Z > (σ

2

T/2) −r + log K −log x.

We write z

0

for the right hand side of the above inequality. Recall that 1 −Φ(z) = Φ(−z) for

all z by the symmetry of the normal density. So (20.4) is equal to

1

√

2π

_

∞

z

0

(xe

σ

√

Tz−σ

2

T/2

−e

−rT

K)

+

e

−z

2

/2

dz

= x

1

√

2π

_

∞

z

0

e

−

1

2

(z

2

−2σ

√

Tz+σ

2

T

dz −Ke

−rT 1

√

2π

_

∞

z

0

e

−z

2

/2

dz

= x

1

√

2π

_

∞

z

0

e

−

1

2

(z−σ

√

T)

2

dz −Ke

−rT

(1 −Φ(z

0

))

= x

1

√

2π

_

∞

z

0

−σ

√

T

e

−y

2

/2

dy −Ke

−rT

Φ(−z

0

)

= x(1 −Φ(z

0

−σ

√

T)) −Ke

−rT

Φ(−z

0

)

= xΦ(σ

√

T −z

0

) −Ke

−rT

Φ(−z

0

).

This is the Black-Scholes formula if we observe that σ

√

T −z

0

= g(x, T) and −z

0

= h(x, T).
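The closed-form price can be sanity-checked numerically. The sketch below (function names and parameter values are ours, for illustration only) computes $V_0 = x\Phi(g) - Ke^{-rT}\Phi(h)$ and compares it with a direct Monte Carlo estimate of (20.4):

```python
import math
import random

def bs_call(x, K, r, sigma, T):
    """Black-Scholes price V_0 = x*Phi(g(x,T)) - K*exp(-rT)*Phi(h(x,T))."""
    Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    g = (math.log(x / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    h = g - sigma * math.sqrt(T)
    return x * Phi(g) - K * math.exp(-r * T) * Phi(h)

def mc_call(x, K, r, sigma, T, n=200_000, seed=0):
    """Monte Carlo estimate of (20.4): E[(x e^{sigma sqrt(T) Z - sigma^2 T/2} - e^{-rT} K)^+]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        Z = rng.gauss(0.0, 1.0)
        P_T = x * math.exp(sigma * math.sqrt(T) * Z - 0.5 * sigma ** 2 * T)
        total += max(P_T - math.exp(-r * T) * K, 0.0)
    return total / n
```

With, say, $x = 100$, $K = 100$, $r = 0.05$, $\sigma = 0.2$, $T = 1$, the two estimates agree to within Monte Carlo error.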

21. Hedging strategies.

The previous section allows us to compute the value of any option, but we would also like to know what the hedging strategy is. This means, if we know $V = EV + \int_0^T H_s\,dS_s$, what should $H_s$ be? This might be important to know if we wanted to duplicate an option that was not available in the marketplace, or if we worked for a bank and wanted to provide an option for sale.

It is not always possible to compute $H$, but in many cases of interest it is possible. We illustrate one technique with two examples.

First, suppose we want to hedge the standard European call $V = e^{-rT}(S_T - K)^+ = (P_T - e^{-rT}K)^+$. We are working here with the risk-neutral probability only. It turns out it makes no difference: the definition of $\int_0^t H_s\,dX_s$ for a semimartingale $X$ does not depend on the probability $P$, other than worrying about some integrability conditions.

We can rewrite $V$ as
$$V = EV + g(\overline{W}_T),$$
where
$$g(x) = (P_0 e^{\sigma x - \sigma^2 T/2} - e^{-rT}K)^+ - EV.$$
Therefore the expectation of $g(\overline{W}_T)$ is 0. Recall that under $\overline{P}$, $\overline{W}$ is a Brownian motion. If we write $g(\overline{W}_T)$ as
$$\int_0^T H_s\,d\overline{W}_s, \qquad (21.1)$$
then since $dP_t = \sigma P_t\,d\overline{W}_t$, we have
$$g(\overline{W}_T) = c + \int_0^T \frac{H_s}{\sigma P_s}\,dP_s. \qquad (21.2)$$
Therefore it suffices to find the representation of the form (21.1).

Recall from the section on the Markov property that
$$P_t f(x) = E^x f(\overline{W}_t) = E f(x + \overline{W}_t) = \int \frac{1}{\sqrt{2\pi t}} e^{-y^2/(2t)} f(x + y)\,dy.$$
Let $M_t = E[g(\overline{W}_T) \mid \mathcal{F}_t]$. By Proposition 4.3, we know that $M_t$ is a martingale. By the Markov property, Proposition 17.2, we see that
$$M_t = E^{\overline{W}_t}[g(\overline{W}_{T-t})] = P_{T-t} g(\overline{W}_t). \qquad (21.3)$$
Now let us apply Ito's formula with the function $f(x_1, x_2) = P_{x_2} g(x_1)$ to the process $X_t = (X^1_t, X^2_t) = (\overline{W}_t, T - t)$. So we need to use the multidimensional version of Ito's formula. We have $dX^1_t = d\overline{W}_t$ and $dX^2_t = -dt$. Since $X^2_t$ is a decreasing process and has no martingale part, then $d\langle X^2 \rangle_t = 0$ and $d\langle X^1, X^2 \rangle_t = 0$, while $d\langle X^1 \rangle_t = dt$. Ito's formula says that
$$f(X^1_t, X^2_t) = f(X^1_0, X^2_0) + \int_0^t \sum_{i=1}^2 \frac{\partial f}{\partial x_i}(X_s)\,dX^i_s + \frac12 \int_0^t \sum_{i,j=1}^2 \frac{\partial^2 f}{\partial x_i \partial x_j}(X_s)\,d\langle X^i, X^j \rangle_s$$
$$= c + \int_0^t \frac{\partial f}{\partial x_1}(X_s)\,d\overline{W}_s + \text{some terms with } dt.$$
But we know that $f(X_t) = P_{T-t} g(\overline{W}_t) = M_t$ is a martingale, so the sum of the terms involving $dt$ must be zero; if not, $f(X_t)$ would have a bounded variation part. We conclude
$$M_t = \int_0^t \frac{\partial}{\partial x} P_{T-s} g(\overline{W}_s)\,d\overline{W}_s.$$
If we take $t = T$, we then have
$$g(\overline{W}_T) = M_T = \int_0^T \frac{\partial}{\partial x} P_{T-s} g(\overline{W}_s)\,d\overline{W}_s,$$
and we have our representation.

For a second example, let's look at the sell-high option. Here the payoff is $\sup_{s \le T} S_s$, the largest the stock price ever is up to time $T$. This is $\mathcal{F}_T$ measurable, so we can compute its value. How can one get the equivalent outcome without looking into the future?

For simplicity, let us suppose the interest rate $r$ is 0. Let $N_t = \sup_{s \le t} S_s$, the maximum up to time $t$. It is not the case that $N_t$ is a Markov process. Intuitively, the reasoning goes like this: suppose the maximum up to time 1 is \$100, and we want to predict the maximum up to time 2. If the stock price at time 1 is close to \$100, then we have one prediction, while if the stock price at time 1 is close to \$2, we would definitely have another prediction. So the prediction for $N_2$ does not depend just on $N_1$, but also on the stock price at time 1. This same intuitive reasoning does suggest, however, that the triple $Z_t = (S_t, N_t, t)$ is a Markov process, and this turns out to be correct. Adding in the information about the current stock price gives a certain amount of evidence to predict the future values of $N_t$; adding in the history of the stock prices up to time $t$ gives no additional information.

Once we believe this, the rest of the argument is very similar to the first example. Let $P_u f(z) = E^z f(Z_u)$, where $z = (s, n, t)$. Let $g(Z_t) = N_t - EN_T$. Then
$$M_t = E[g(Z_T) \mid \mathcal{F}_t] = E^{Z_t}[g(Z_{T-t})] = P_{T-t} g(Z_t).$$
We then let $f(s, n, t) = P_{T-t} g(s, n, t)$ and apply Ito's formula. The process $N_t$ is always increasing, so it has no martingale part, and hence $\langle N \rangle_t = 0$. When we apply Ito's formula, we get a $dS_t$ term, which is the martingale term, we get some terms involving $dt$, which are of bounded variation, and we get a term involving $dN_t$, which is also of bounded variation. But $M_t$ is a martingale, so all the $dt$ and $dN_t$ terms must cancel. Therefore we should be left with the martingale term, which is
$$\int_0^t \frac{\partial}{\partial s} P_{T-s} g(S_s, N_s, s)\,dS_s,$$
where as before $g(s, n, t) = n - EN_T$. This gives us our hedging strategy for the sell-high option, and it can be explicitly calculated.

There is another way to calculate hedging strategies, using what is known as the Clark-Haussmann-Ocone formula. This is a more complicated procedure, and most cases can be done as well by an appropriate use of the Markov property.
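The value $EN_T = E\sup_{s\le T}S_s$ that enters the sell-high example can be estimated by simulation. A minimal sketch, assuming $r = 0$ and the geometric Brownian motion of these notes (the function name and parameter values are ours):

```python
import math
import random

def sell_high_price(S0, sigma, T, n_paths=5000, n_steps=100, seed=1):
    """Monte Carlo sketch of E[sup_{s<=T} S_s] when r = 0, so that under the
    risk-neutral measure S_t = S0 * exp(sigma*W_t - sigma^2 t/2) is a martingale."""
    rng = random.Random(seed)
    dt = T / n_steps
    total = 0.0
    for _ in range(n_paths):
        w, t, running_max = 0.0, 0.0, S0
        for _ in range(n_steps):
            w += rng.gauss(0.0, math.sqrt(dt))
            t += dt
            running_max = max(running_max, S0 * math.exp(sigma * w - 0.5 * sigma ** 2 * t))
        total += running_max
    return total / n_paths
```

Since the discrete-time maximum misses excursions between grid points, this slightly underestimates the true value; refining `n_steps` reduces the bias.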

22. Black-Scholes formula, II.

Here is a second approach to the Black-Scholes formula. This approach works for European calls and several other options, but does not work in the generality that the first approach does. On the other hand, it allows one to compute more easily what the equivalent strategy of buying or selling stock should be to duplicate the outcome of the given option. In this section we work with the actual price of the stock instead of the present value.

Let $V_t$ be the value of the portfolio and assume $V_t = f(S_t, T - t)$ for all $t$, where $f$ is some function that is sufficiently smooth. We also want $V_T = (S_T - K)^+$.

Recall Ito's formula. The multivariate version is
$$f(X_t) = f(X_0) + \int_0^t \sum_{i=1}^d f_{x_i}(X_s)\,dX^i_s + \frac12 \int_0^t \sum_{i,j=1}^d f_{x_i x_j}(X_s)\,d\langle X^i, X^j \rangle_s.$$
Here $X_t = (X^1_t, \ldots, X^d_t)$ and $f_{x_i}$ denotes the partial derivative of $f$ in the $x_i$ direction, and similarly for the second partial derivatives.

We apply this with $d = 2$ and $X_t = (S_t, T - t)$. From the SDE that $S_t$ solves, $d\langle X^1 \rangle_t = \sigma^2 S_t^2\,dt$, $\langle X^2 \rangle_t = 0$ (since $T - t$ is of bounded variation and hence has no martingale part), and $\langle X^1, X^2 \rangle_t = 0$. Also, $dX^2_t = -dt$. Then
$$V_t - V_0 = f(S_t, T - t) - f(S_0, T) \qquad (22.1)$$
$$= \int_0^t f_x(S_u, T - u)\,dS_u - \int_0^t f_s(S_u, T - u)\,du + \frac12 \int_0^t \sigma^2 S_u^2 f_{xx}(S_u, T - u)\,du.$$
On the other hand, if $a_u$ and $b_u$ are the number of shares of stock and bonds, respectively, held at time $u$,
$$V_t - V_0 = \int_0^t a_u\,dS_u + \int_0^t b_u\,d\beta_u. \qquad (22.2)$$
This formula says that the increase in net worth is given by the profit we obtain by holding $a_u$ shares of stock and $b_u$ bonds at time $u$. Since the value of the portfolio at time $t$ is
$$V_t = a_t S_t + b_t \beta_t,$$
we must have
$$b_t = (V_t - a_t S_t)/\beta_t. \qquad (22.3)$$
Also, recall
$$\beta_t = \beta_0 e^{rt}. \qquad (22.4)$$
To match up (22.2) with (22.1), we must therefore have
$$a_t = f_x(S_t, T - t) \qquad (22.5)$$
and
$$r[f(S_t, T - t) - S_t f_x(S_t, T - t)] = -f_s(S_t, T - t) + \frac12 \sigma^2 S_t^2 f_{xx}(S_t, T - t) \qquad (22.6)$$
for all $t$ and all $S_t$. (22.6) leads to the parabolic PDE
$$f_s = \frac12 \sigma^2 x^2 f_{xx} + r x f_x - r f, \qquad (x, s) \in (0, \infty) \times [0, T), \qquad (22.7)$$
and
$$f(x, 0) = (x - K)^+. \qquad (22.8)$$
Solving this equation for $f$, $f(x, T)$ with $x = S_0$ is what $V_0$ should be, i.e., the cost of setting up the equivalent portfolio. Equation (22.5) shows what the trading strategy should be.
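The replication recipe (22.2)-(22.5) can be tried out by simulation: hold $a_t = f_x(S_t, T-t)$ shares, keep the rest in bonds via (22.3), and check that the terminal portfolio value is close to $(S_T - K)^+$ no matter what drift $\mu$ the actual stock has. A minimal sketch, assuming the European call so that $f$ and $f_x = \Phi(g(S_t, T-t))$ are known in closed form (function names and parameter values are ours):

```python
import math
import random

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def call_value(x, tau, K, r, sigma):
    """f(x, tau): Black-Scholes call value with time tau = T - t to go."""
    if tau <= 0:
        return max(x - K, 0.0)
    g = (math.log(x / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    return x * Phi(g) - K * math.exp(-r * tau) * Phi(g - sigma * math.sqrt(tau))

def hedge_once(S0, K, r, sigma, mu, T, n_steps, rng):
    """One delta-hedged path: rebalance at n_steps dates, returning
    (terminal portfolio value, option payoff)."""
    dt = T / n_steps
    S = S0
    V = call_value(S0, T, K, r, sigma)          # initial cost of the portfolio
    for i in range(n_steps):
        tau = T - i * dt
        g = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
        a = Phi(g)                              # (22.5): a_t = f_x(S_t, T - t)
        cash = V - a * S                        # (22.3): dollars held in bonds
        # the *actual* stock evolves with drift mu, not r
        S *= math.exp((mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0))
        V = a * S + cash * math.exp(r * dt)
    return V, max(S - K, 0.0)
```

With, say, 100 rebalancing dates, the average replication error is a small fraction of the option price, and it shrinks as the rebalancing frequency grows; only the discreteness of the hedging, not $\mu$, causes the residual error.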

23. The fundamental theorem of finance.

In Section 19, we showed there was a probability measure under which $P_t = e^{-rt}S_t$ was a martingale. This is true very generally. Let $S_t$ be the price of a security in today's dollars. We will suppose $S_t$ is a continuous semimartingale, and can be written $S_t = M_t + A_t$.

Arbitrage means that there is a trading strategy $H_s$ such that there is no chance that we lose anything and there is a positive profit with positive probability. Mathematically, arbitrage exists if there exists $H_s$ that is adapted and satisfies a suitable integrability condition with
$$\int_0^T H_s\,dS_s \ge 0, \quad \text{a.s.}$$
and
$$P\Big(\int_0^T H_s\,dS_s > b\Big) > \varepsilon$$
for some $b, \varepsilon > 0$. It turns out that to get a necessary and sufficient condition for $S_t$ to be a martingale, we need a slightly weaker condition.

The NFLVR condition ("no free lunch with vanishing risk") is that there do not exist a fixed time $T$, $\varepsilon, b > 0$, and $H_n$ (that are adapted and satisfy the appropriate integrability conditions) such that
$$\int_0^t H_n(s)\,dS_s > -\frac1n, \quad \text{a.s.}$$
for all $t \le T$ and
$$P\Big(\int_0^T H_n(s)\,dS_s > b\Big) > \varepsilon.$$
Here $T, b, \varepsilon$ do not depend on $n$. The condition says that one can with positive probability $\varepsilon$ make a profit of $b$ and with a loss no larger than $1/n$.

Two probabilities $P$ and $Q$ are equivalent if $P(A) = 0$ if and only if $Q(A) = 0$, i.e., the two probabilities have the same collection of sets of probability zero. $Q$ is an equivalent martingale measure if $Q$ is a probability measure, $Q$ is equivalent to $P$, and $S_t$ is a martingale under $Q$.

Theorem 23.1. If $S_t$ is a continuous semimartingale and the NFLVR condition holds, then there exists an equivalent martingale measure $Q$.

The proof is rather technical and involves some heavy-duty measure theory, so we will only examine a part of it. Suppose that we happened to have $S_t = W_t + f(t)$, where $f(t)$ is a deterministic increasing continuous function. To obtain the equivalent martingale measure, we would want to let
$$M_t = e^{-\int_0^t f'(s)\,dW_s - \frac12 \int_0^t (f'(s))^2\,ds}.$$
In order for $M_t$ to make sense, we need $f$ to be differentiable. A result from measure theory says that if $f$ is not differentiable, then we can find a subset $A$ of $[0, \infty)$ such that $\int_0^t 1_A(s)\,ds = 0$ but the amount of increase of $f$ over the set $A$ is positive. This last statement is phrased mathematically by saying
$$\int_0^t 1_A(s)\,df(s) > 0,$$
where the integral is a Riemann-Stieltjes (or better, a Lebesgue-Stieltjes) integral. Then if we hold $H_s = 1_A(s)$ shares at time $s$, our net profit is
$$\int_0^t H_s\,dS_s = \int_0^t 1_A(s)\,dW_s + \int_0^t 1_A(s)\,df(s).$$
The second term would be positive since this is the amount of increase of $f$ over the set $A$. The first term is 0, since $E\big(\int_0^t 1_A(s)\,dW_s\big)^2 = \int_0^t 1_A(s)^2\,ds = 0$. So our net profit is nonrandom and positive, or in other words, we have made a net gain without risk. This contradicts "no arbitrage." See Note 1 for more on this.

Sometimes Theorem 23.1 is called the first fundamental theorem of asset pricing. The second fundamental theorem is the following.

Theorem 23.2. The equivalent martingale measure is unique if and only if the market is complete.

We will not prove this.

Note 1. We will not prove Theorem 23.1, but let us give a few more indications of what is going on. First of all, recall the Cantor set. This is where $E_1 = [0, 1]$, $E_2$ is the set obtained from $E_1$ by removing the open interval $(\frac13, \frac23)$, $E_3$ is the set obtained from $E_2$ by removing the middle third from each of the two intervals making up $E_2$, and so on. The intersection, $E = \cap_{n=1}^\infty E_n$, is the Cantor set, and is closed, nonempty, in fact uncountable, yet it contains no intervals. Also, the Lebesgue measure of $E$ is 0. We set $A = E$. Let $f$ be the Cantor-Lebesgue function. This is the function that is equal to 0 on $(-\infty, 0]$, 1 on $[1, \infty)$, equal to $\frac12$ on the interval $[\frac13, \frac23]$, equal to $\frac14$ on $[\frac19, \frac29]$, equal to $\frac34$ on $[\frac79, \frac89]$, and is defined similarly on each interval making up the complement of $A$. It turns out we can define $f$ on $A$ so that it is continuous, and one can show $\int_0^1 1_A(s)\,df(s) = 1$. So this $A$ and $f$ provide a concrete example of what we were discussing.
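The Cantor example in Note 1 can be made concrete with a few lines of exact rational arithmetic (a sketch; the function names are ours). At stage $n$ the set $E_n$ consists of $2^{n-1}$ intervals; their total length shrinks to 0, while the Cantor-Lebesgue function $f$ rises by $2^{-(n-1)}$ across each interval, so the increase of $f$ over $E_n$ — and hence over $A = E$ — stays equal to 1:

```python
from fractions import Fraction

def cantor_stage(n):
    """The 2^(n-1) closed intervals whose union is E_n (with E_1 = [0,1])."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n - 1):
        nxt = []
        for a, b in intervals:
            third = (b - a) / 3
            nxt.append((a, a + third))     # keep the left third
            nxt.append((b - third, b))     # keep the right third
        intervals = nxt
    return intervals

def lebesgue_measure(intervals):
    return sum(b - a for a, b in intervals)

def cantor_increase(n):
    """Total increase of the Cantor-Lebesgue function f over E_n:
    f rises by 2^-(n-1) across each of the 2^(n-1) intervals, total 1."""
    return len(cantor_stage(n)) * Fraction(1, 2 ** (n - 1))
```

At stage 10, for instance, `lebesgue_measure(cantor_stage(10))` is $(2/3)^9 \approx 0.026$ while `cantor_increase(10)` is still exactly 1 — the time spent in $A$ is negligible, yet all of $f$'s increase happens there.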

24. American puts.

The proper valuation of American puts is one of the important unsolved problems in mathematical finance. Recall that a European put pays out $(K - S_T)^+$ at time $T$, while an American put allows one to exercise early. If one exercises an American put at time $t < T$, one receives $(K - S_t)^+$. Then during the period $[t, T]$ one receives interest, and the amount one has is $(K - S_t)^+ e^{r(T-t)}$. In today's dollars that is the equivalent of $(K - S_t)^+ e^{-rt}$. One wants to find a rule, known as the exercise policy, for when to exercise the put, and then one wants to see what the value is for that policy. Since one cannot look into the future, one is in fact looking for a stopping time $\tau$ that maximizes
$$E e^{-r\tau}(K - S_\tau)^+.$$
There is no good theoretical solution to finding the stopping time $\tau$, although good approximations exist. We will, however, discuss just a bit of the theory of optimal stopping, which reworks the problem into another form.

Let $G_t$ denote the amount you will receive at time $t$. For American puts, we set
$$G_t = e^{-rt}(K - S_t)^+.$$
Our problem is to maximize $EG_\tau$ over all stopping times $\tau$.

We first need

Proposition 24.1. If $S$ and $T$ are bounded stopping times with $S \le T$ and $M$ is a martingale, then
$$E[M_T \mid \mathcal{F}_S] = M_S.$$

Proof. Let $A \in \mathcal{F}_S$. Define $U$ by
$$U(\omega) = \begin{cases} S(\omega) & \text{if } \omega \in A, \\ T(\omega) & \text{if } \omega \notin A. \end{cases}$$
It is easy to see that $U$ is a stopping time, so by Doob's optional stopping theorem,
$$EM_0 = EM_U = E[M_S; A] + E[M_T; A^c].$$
Also,
$$EM_0 = EM_T = E[M_T; A] + E[M_T; A^c].$$
Taking the difference, $E[M_T; A] = E[M_S; A]$, which is what we needed to show.

Given two supermartingales $X_t$ and $Y_t$, it is routine to check that $X_t \wedge Y_t$ is also a supermartingale. Also, if $X^n_t$ are supermartingales with $X^n_t \downarrow X_t$, one can check that $X_t$ is again a supermartingale. With these facts, one can show that given a process such as $G_t$, there is a least supermartingale larger than $G_t$.

So we define $W_t$ to be a supermartingale (with respect to $P$, of course) such that $W_t \ge G_t$ a.s. for each $t$ and if $Y_t$ is another supermartingale with $Y_t \ge G_t$ for all $t$, then $W_t \le Y_t$ for all $t$. We set $\tau = \inf\{t : W_t = G_t\}$. We will show that $\tau$ is the solution to the problem of finding the optimal stopping time. Of course, computing $W_t$ and $\tau$ is another problem entirely.

Let
$$\mathcal{T}_t = \{\tau : \tau \text{ is a stopping time}, \ t \le \tau \le T\}.$$
Let
$$V_t = \sup_{\tau \in \mathcal{T}_t} E[G_\tau \mid \mathcal{F}_t].$$

Proposition 24.2. $V_t$ is a supermartingale and $V_t \ge G_t$ for all $t$.

Proof. The fixed time $t$ is a stopping time in $\mathcal{T}_t$, so $V_t \ge E[G_t \mid \mathcal{F}_t] = G_t$, so we only need to show that $V_t$ is a supermartingale.

Suppose $s < t$. Let $\pi$ be the stopping time in $\mathcal{T}_t$ for which $V_t = E[G_\pi \mid \mathcal{F}_t]$. $\pi \in \mathcal{T}_t \subset \mathcal{T}_s$. Then
$$E[V_t \mid \mathcal{F}_s] = E[G_\pi \mid \mathcal{F}_s] \le \sup_{\tau \in \mathcal{T}_s} E[G_\tau \mid \mathcal{F}_s] = V_s.$$

Proposition 24.3. If $Y_t$ is a supermartingale with $Y_t \ge G_t$ for all $t$, then $Y_t \ge V_t$.

Proof. If $\tau \in \mathcal{T}_t$, then since $Y_t$ is a supermartingale, we have
$$E[Y_\tau \mid \mathcal{F}_t] \le Y_t.$$
So
$$V_t = \sup_{\tau \in \mathcal{T}_t} E[G_\tau \mid \mathcal{F}_t] \le \sup_{\tau \in \mathcal{T}_t} E[Y_\tau \mid \mathcal{F}_t] \le Y_t.$$

What we have shown is that $W_t$ is equal to $V_t$. It remains to show that $\tau$ is optimal. There may in fact be more than one optimal time, but in any case $\tau$ is one of them. Recall that $\mathcal{F}_0$ is the $\sigma$-field generated by $S_0$, and hence consists of only $\emptyset$ and $\Omega$.

Proposition 24.4. $\tau$ is an optimal stopping time.

Proof. Since $\mathcal{F}_0$ is trivial, $V_0 = \sup_{\tau \in \mathcal{T}_0} E[G_\tau \mid \mathcal{F}_0] = \sup_\tau E[G_\tau]$. Let $\sigma$ be a stopping time where the supremum is attained. Then
$$V_0 \ge E[V_\sigma \mid \mathcal{F}_0] = E[V_\sigma] \ge E[G_\sigma] = V_0.$$
Therefore all the inequalities must be equalities. Since $V_\sigma \ge G_\sigma$, we must have $V_\sigma = G_\sigma$. Since $\tau$ was the first time that $W_t$ equals $G_t$ and $W_t = V_t$, we see that $\tau \le \sigma$. Then
$$E[G_\tau] = E[V_\tau] \ge EV_\sigma = EG_\sigma.$$
Therefore the expected value of $G_\tau$ is at least as large as the expected value of $G_\sigma$, and hence $\tau$ is also an optimal stopping time.

The above representation of the optimal stopping problem may seem rather bizarre. However, this procedure gives good usable results for some optimal stopping problems. An example is where $G_t$ is a function of just $W_t$.
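In a discrete model the least supermartingale is easy to compute: on a binomial tree, backward induction sets the value at each node to $\max(\text{immediate exercise}, \text{discounted expected continuation})$, which is exactly $W_k = \max(G_k, E[W_{k+1} \mid \mathcal{F}_k])$. A sketch (the function name and parameter values are ours, for illustration):

```python
import math

def put_binomial(S0, K, r, sigma, T, n, american=True):
    """Value a put on an n-step binomial tree by backward induction.
    With american=True this computes the Snell envelope: at each node,
    max(immediate exercise, discounted expected continuation value)."""
    dt = T / n
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    p = (math.exp(r * dt) - d) / (u - d)      # risk-neutral up-probability
    disc = math.exp(-r * dt)
    # terminal payoffs; node j at step n has price S0 * u^j * d^(n-j)
    W = [max(K - S0 * u ** j * d ** (n - j), 0.0) for j in range(n + 1)]
    for k in range(n - 1, -1, -1):
        new = []
        for j in range(k + 1):
            cont = disc * (p * W[j + 1] + (1 - p) * W[j])
            if american:
                exercise = max(K - S0 * u ** j * d ** (k - j), 0.0)
                new.append(max(exercise, cont))
            else:
                new.append(cont)
        W = new
    return W[0]
```

The early-exercise premium shows up directly: with $r > 0$ the American put is worth strictly more than the European put with the same parameters, and the optimal $\tau$ is the first node where exercise beats continuation.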

25. Term structure.

We now want to consider the case where the interest rate is nondeterministic, that is, it has a random component. To do so, we take another look at option pricing.

Accumulation factor. Let $r(t)$ be the (random) interest rate at time $t$. Let
$$\beta(t) = e^{\int_0^t r(u)\,du}$$
be the accumulation factor. One dollar at time $T$ will be worth $1/\beta(T)$ in today's dollars.

Let $V = (S_T - K)^+$ be the payoff on the standard European call option at time $T$ with strike price $K$, where $S_t$ is the stock price. In today's dollars it is worth, as we have seen, $V/\beta(T)$. Therefore the price of the option should be
$$E\Big[\frac{V}{\beta(T)}\Big].$$
We can also get an expression for the value of the option at time $t$. The payoff, in terms of dollars at time $t$, should be the payoff at time $T$ discounted by the interest or inflation rate, and so should be
$$e^{-\int_t^T r(u)\,du}(S_T - K)^+.$$
Therefore the value at time $t$ is
$$E\Big[e^{-\int_t^T r(u)\,du}(S_T - K)^+ \mid \mathcal{F}_t\Big] = E\Big[\frac{\beta(t)}{\beta(T)} V \mid \mathcal{F}_t\Big] = \beta(t) E\Big[\frac{V}{\beta(T)} \mid \mathcal{F}_t\Big].$$
From now on we assume we have already changed to the risk-neutral measure and we write $P$ instead of $\overline{P}$.

Zero coupon. A zero coupon bond with maturity date $T$ pays \$1 at time $T$ and nothing before. This is equivalent to an option with payoff value $V = 1$. So its price at time $t$, as above, should be
$$B(t, T) = \beta(t) E\Big[\frac{1}{\beta(T)} \mid \mathcal{F}_t\Big] = E\Big[e^{-\int_t^T r(u)\,du} \mid \mathcal{F}_t\Big].$$
Let's derive the SDE satisfied by $B(t, T)$. Let $N_t = E[1/\beta(T) \mid \mathcal{F}_t]$. This is a martingale. By the martingale representation theorem,
$$N_t = E[1/\beta(T)] + \int_0^t H_s\,dW_s$$
for some adapted integrand $H_s$. So $B(t, T) = \beta(t) N_t$. Here $T$ is fixed. By Ito's product formula,
$$dB(t, T) = \beta(t)\,dN_t + N_t\,d\beta(t) = \beta(t) H_t\,dW_t + N_t r(t) \beta(t)\,dt = \beta(t) H_t\,dW_t + B(t, T) r(t)\,dt,$$
and we thus have
$$dB(t, T) = \beta(t) H_t\,dW_t + B(t, T) r(t)\,dt. \qquad (25.1)$$

Forward rates. We now discuss forward rates. If one holds $T$ fixed and graphs $B(t, T)$ as a function of $t$, the graph will not clearly show the behavior of $r$. One sometimes specifies interest rates by what are known as forward rates.

Suppose we want to borrow \$1 at time $T$ and repay it with interest at time $T + \varepsilon$. At the present time we are at time $t \le T$. Let us try to accomplish this by buying a zero coupon bond with maturity date $T$ and shorting (i.e., selling) $N$ zero coupon bonds with maturity date $T + \varepsilon$. Our outlay of money at time $t$ is
$$B(t, T) - N B(t, T + \varepsilon).$$
If we set
$$N = B(t, T)/B(t, T + \varepsilon),$$
our outlay at time $t$ is 0. At time $T$ we receive \$1. At time $T + \varepsilon$ we pay $B(t, T)/B(t, T + \varepsilon)$. The effective rate of interest $R$ over the time period $T$ to $T + \varepsilon$ is
$$e^{\varepsilon R} = \frac{B(t, T)}{B(t, T + \varepsilon)}.$$
Solving for $R$, we have
$$R = \frac{\log B(t, T) - \log B(t, T + \varepsilon)}{\varepsilon}.$$
We now let $\varepsilon \to 0$. We define the forward rate by
$$f(t, T) = -\frac{\partial}{\partial T} \log B(t, T). \qquad (25.2)$$
Sometimes interest rates are specified by giving $f(t, T)$ instead of $B(t, T)$ or $r(t)$.

Recovering B from f. Let us see how to recover $B(t, T)$ from $f(t, T)$. Integrating, we have
$$\int_t^T f(t, u)\,du = -\int_t^T \frac{\partial}{\partial u} \log B(t, u)\,du = -\log B(t, u)\Big|_{u=t}^{u=T} = -\log B(t, T) + \log B(t, t).$$
Since $B(t, t)$ is the value of a zero coupon bond at time $t$ which expires at time $t$, it is equal to 1, and its log is 0. Solving for $B(t, T)$, we have
$$B(t, T) = e^{-\int_t^T f(t, u)\,du}. \qquad (25.3)$$

Recovering r from f. Next, let us show how to recover $r(t)$ from the forward rates. We have
$$B(t, T) = E\Big[e^{-\int_t^T r(u)\,du} \mid \mathcal{F}_t\Big].$$
Differentiating,
$$\frac{\partial}{\partial T} B(t, T) = E\Big[-r(T) e^{-\int_t^T r(u)\,du} \mid \mathcal{F}_t\Big].$$
Evaluating this when $T = t$, we obtain
$$E[-r(t) \mid \mathcal{F}_t] = -r(t). \qquad (25.4)$$
On the other hand, from (25.3) we have
$$\frac{\partial}{\partial T} B(t, T) = -f(t, T) e^{-\int_t^T f(t, u)\,du}.$$
Setting $T = t$ we obtain $-f(t, t)$. Comparing with (25.4) yields
$$r(t) = f(t, t). \qquad (25.5)$$
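The two directions of this dictionary — prices from forward rates via (25.3), and forward rates back from prices via (25.2) — are easy to check numerically at $t = 0$. A sketch with a made-up smooth forward curve (the curve and function names are ours, purely illustrative):

```python
import math

def fwd(u):
    """Hypothetical example forward curve f(0, u); any smooth choice works."""
    return 0.03 + 0.01 * u

def bond_price(T, n=10_000):
    """B(0, T) = exp(-int_0^T f(0,u) du), via the trapezoid rule -- (25.3)."""
    h = T / n
    integral = 0.5 * h * (fwd(0.0) + fwd(T)) + h * sum(fwd(i * h) for i in range(1, n))
    return math.exp(-integral)

def recovered_forward(T, eps=1e-4):
    """f(0, T) = -d/dT log B(0, T), via a central difference -- (25.2)."""
    return -(math.log(bond_price(T + eps)) - math.log(bond_price(T - eps))) / (2 * eps)
```

Round-tripping through `bond_price` and `recovered_forward` reproduces the original curve to numerical precision, and $r(0) = f(0, 0)$ falls out as the short end of the recovered curve, matching (25.5).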

26. Some interest rate models.

Heath-Jarrow-Morton model

Instead of specifying $r$, the Heath-Jarrow-Morton model (HJM) specifies the forward rates:
$$df(t, T) = \sigma(t, T)\,dW_t + \alpha(t, T)\,dt. \qquad (26.1)$$
Let us derive the SDE that $B(t, T)$ satisfies. Let
$$\alpha^*(t, T) = \int_t^T \alpha(t, u)\,du, \qquad \sigma^*(t, T) = \int_t^T \sigma(t, u)\,du.$$
Since $B(t, T) = \exp(-\int_t^T f(t, u)\,du)$, we derive the SDE for $B$ by using Ito's formula with the function $e^x$ and $X_t = -\int_t^T f(t, u)\,du$. We have
$$dX_t = f(t, t)\,dt - \int_t^T df(t, u)\,du = r(t)\,dt - \int_t^T [\alpha(t, u)\,dt + \sigma(t, u)\,dW_t]\,du$$
$$= r(t)\,dt - \Big(\int_t^T \alpha(t, u)\,du\Big)\,dt - \Big(\int_t^T \sigma(t, u)\,du\Big)\,dW_t = r(t)\,dt - \alpha^*(t, T)\,dt - \sigma^*(t, T)\,dW_t.$$
Therefore, using Ito's formula,
$$dB(t, T) = B(t, T)\,dX_t + \frac12 B(t, T)(\sigma^*(t, T))^2\,dt = B(t, T)\Big[r(t) - \alpha^* + \frac12 (\sigma^*)^2\Big]\,dt - \sigma^* B(t, T)\,dW_t.$$
From (25.1) we know the $dt$ term must be $B(t, T) r(t)\,dt$, hence
$$dB(t, T) = B(t, T) r(t)\,dt - \sigma^* B(t, T)\,dW_t.$$
Comparing with (26.1), we see that if $P$ is the risk-neutral measure, we have $\alpha^* = \frac12 (\sigma^*)^2$. See Note 1 for more on this.

Hull and White model

In this model, the interest rate $r$ is specified as the solution to the SDE
$$dr(t) = \sigma(t)\,dW_t + (a(t) - b(t) r(t))\,dt. \qquad (26.2)$$
Here $\sigma, a, b$ are deterministic functions. The stochastic integral term introduces randomness, while the $a - br$ term causes a drift toward $a(t)/b(t)$. (Note that if $\sigma(t) = \sigma$, $a(t) = a$, $b(t) = b$ are constants and $\sigma = 0$, then the solution to (26.2) converges to $a/b$ as $t \to \infty$.)

(26.2) is one of those SDEs that can be solved explicitly. Let $K(t) = \int_0^t b(u)\,du$. Then
$$d\big(e^{K(t)} r(t)\big) = e^{K(t)} r(t) b(t)\,dt + e^{K(t)}\big(a(t) - b(t) r(t)\big)\,dt + e^{K(t)}[\sigma(t)\,dW_t]$$
$$= e^{K(t)} a(t)\,dt + e^{K(t)}[\sigma(t)\,dW_t].$$
Integrating both sides,
$$e^{K(t)} r(t) = r(0) + \int_0^t e^{K(u)} a(u)\,du + \int_0^t e^{K(u)} \sigma(u)\,dW_u.$$
Multiplying both sides by $e^{-K(t)}$, we have the explicit solution
$$r(t) = e^{-K(t)}\Big[r(0) + \int_0^t e^{K(u)} a(u)\,du + \int_0^t e^{K(u)} \sigma(u)\,dW_u\Big].$$
If $F(u)$ is deterministic, then
$$\int_0^t F(u)\,dW_u = \lim \sum F(u_i)(W_{u_{i+1}} - W_{u_i}).$$
From undergraduate probability, linear combinations of Gaussian r.v.'s (Gaussian = normal) are Gaussian, and also limits of Gaussian r.v.'s are Gaussian, so we conclude that the r.v. $\int_0^t F(u)\,dW_u$ is Gaussian. We see that the mean at time $t$ is
$$Er(t) = e^{-K(t)}\Big[r(0) + \int_0^t e^{K(u)} a(u)\,du\Big].$$
We know how to calculate the second moment of a stochastic integral, so
$$\operatorname{Var} r(t) = e^{-2K(t)} \int_0^t e^{2K(u)} \sigma(u)^2\,du.$$
(One can similarly calculate the covariance of $r(s)$ and $r(t)$.) Limits of linear combinations of Gaussians are Gaussian, so we can calculate the mean and variance of $\int_0^T r(t)\,dt$ and get an explicit expression for
$$B(0, T) = E e^{-\int_0^T r(u)\,du}.$$
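The mean and variance formulas above can be checked in the constant-coefficient case, where $K(t) = bt$ and the integrals evaluate in closed form to $Er(t) = a/b + (r(0) - a/b)e^{-bt}$ and $\operatorname{Var} r(t) = \sigma^2(1 - e^{-2bt})/(2b)$. A sketch (function name and parameter values are ours) that evaluates the general formulas by quadrature and compares:

```python
import math

def hw_moments(r0, a, b, sigma, t, n=20_000):
    """Mean and variance of r(t) in the Hull and White model with constant
    coefficients, via the trapezoid rule applied to the general formulas
      E r(t)   = e^{-K(t)} [ r0 + int_0^t e^{K(u)} a du ],
      Var r(t) = e^{-2K(t)} int_0^t e^{2K(u)} sigma^2 du,   K(t) = b t."""
    h = t / n
    def trap(g):
        return h * (0.5 * g(0.0) + 0.5 * g(t) + sum(g(i * h) for i in range(1, n)))
    mean = math.exp(-b * t) * (r0 + trap(lambda u: math.exp(b * u) * a))
    var = math.exp(-2 * b * t) * trap(lambda u: math.exp(2 * b * u) * sigma ** 2)
    return mean, var
```

For, say, $r(0) = 0.02$, $a = 0.04$, $b = 0.5$, $\sigma = 0.01$, $t = 2$, the quadrature values agree with the closed forms to high precision, and the mean visibly drifts from $r(0)$ toward $a/b$.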

Cox-Ingersoll-Ross model

One drawback of the Hull and White model is that since $r(t)$ is Gaussian, it can take negative values with positive probability, which doesn't make sense. The Cox-Ingersoll-Ross model avoids this by modeling $r$ by the SDE
$$dr(t) = (a - br(t))\,dt + \sigma\sqrt{r(t)}\,dW_t.$$
The difference from the Hull and White model is the square root of $r$ in the stochastic integral term. This square root term implies that when $r(t)$ is small, the fluctuations in $r(t)$ are smaller than they are in the Hull and White model. Provided $a \ge \frac12 \sigma^2$, it can be shown that $r(t)$ will never hit 0 and will always be positive. Although one cannot solve for $r$ explicitly, one can calculate the distribution of $r$. It turns out to be related to the square of what are known in probability theory as Bessel processes. (The density of $r(t)$, for example, will be given in terms of Bessel functions.)
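Even without the explicit distribution, the model is easy to simulate. A minimal Euler sketch (ours, not from the notes); note that the discretized path can dip below 0 even though the true process cannot, so we guard the square root — one standard "full truncation" workaround, stated here as an assumption rather than part of the model:

```python
import math
import random

def cir_mean(r0, a, b, sigma, t, n_paths=1000, n_steps=500, seed=7):
    """Euler scheme for dr = (a - b r) dt + sigma sqrt(r) dW, returning the
    Monte Carlo mean of r(t).  max(r, 0) guards against the discretization
    artifact of negative r; the true CIR process stays positive when a >= sigma^2/2."""
    rng = random.Random(seed)
    dt = t / n_steps
    total = 0.0
    for _ in range(n_paths):
        r = r0
        for _ in range(n_steps):
            rp = max(r, 0.0)
            r += (a - b * rp) * dt + sigma * math.sqrt(rp * dt) * rng.gauss(0.0, 1.0)
        total += max(r, 0.0)
    return total / n_paths
```

Since the drift is the same linear $a - br$ as in the Hull and White model, the mean of $r(t)$ still relaxes from $r(0)$ toward $a/b$, which gives a simple sanity check on the simulation.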

Note 1. If $P$ is not the risk-neutral measure, it is still possible that one exists. Let $\theta(t)$ be a function of $t$, let $M_t = \exp(-\int_0^t \theta(u)\,dW_u - \frac12 \int_0^t \theta(u)^2\,du)$ and define $\overline{P}(A) = E[M_T; A]$ for $A \in \mathcal{F}_T$. By the Girsanov theorem,
$$dB(t, T) = B(t, T)\Big[r(t) - \alpha^* + \frac12 (\sigma^*)^2 + \sigma^* \theta\Big]\,dt - \sigma^* B(t, T)\,d\overline{W}_t,$$
where $\overline{W}_t$ is a Brownian motion under $\overline{P}$. Again, comparing this with (25.1) we must have
$$\alpha^* = \frac12 (\sigma^*)^2 + \sigma^* \theta.$$
Differentiating with respect to $T$, we obtain
$$\alpha(t, T) = \sigma(t, T) \sigma^*(t, T) + \sigma(t, T) \theta(t).$$
If we try to solve this equation for $\theta$, there is no reason off-hand that $\theta$ depends only on $t$ and not $T$. However, if $\theta$ does not depend on $T$, $\overline{P}$ will be the risk-neutral measure.

Problems

1. Show $E\big[X\,E[Y \mid \mathcal{G}]\big] = E\big[Y\,E[X \mid \mathcal{G}]\big]$.

2. Prove that $E[aX_1 + bX_2 \mid \mathcal{G}] = aE[X_1 \mid \mathcal{G}] + bE[X_2 \mid \mathcal{G}]$.

3. Suppose $X_1, X_2, \ldots, X_n$ are independent and for each $i$ we have $P(X_i = 1) = P(X_i = -1) = \frac12$. Let $S_n = \sum_{i=1}^n X_i$. Show that $M_n = S_n^3 - 3nS_n$ is a martingale.

4. Let $X_i$ and $S_n$ be as in Problem 3. Let $\varphi(x) = \frac12 (e^x + e^{-x})$. Show that $M_n = e^{aS_n} \varphi(a)^{-n}$ is a martingale for each real $a$.

5. Suppose $M_n$ is a martingale, $N_n = M_n^2$, and $EN_n < \infty$ for each $n$. Show $E[N_{n+1} \mid \mathcal{F}_n] \ge N_n$ for each $n$. Do not use Jensen's inequality.

6. Suppose $M_n$ is a martingale, $N_n = |M_n|$, and $EN_n < \infty$ for each $n$. Show $E[N_{n+1} \mid \mathcal{F}_n] \ge N_n$ for each $n$. Do not use Jensen's inequality.

7. Suppose $X_n$ is a martingale with respect to $\mathcal{G}_n$ and $\mathcal{F}_n = \sigma(X_1, \ldots, X_n)$. Show $X_n$ is a martingale with respect to $\mathcal{F}_n$.

8. Show that if $X_n$ and $Y_n$ are martingales with respect to $\{\mathcal{F}_n\}$ and $Z_n = \max(X_n, Y_n)$, then $E[Z_{n+1} \mid \mathcal{F}_n] \ge Z_n$.

9. Let $X_n$ and $Y_n$ be martingales with $EX_n^2 < \infty$ and $EY_n^2 < \infty$. Show
$$EX_nY_n - EX_0Y_0 = \sum_{m=1}^n E(X_m - X_{m-1})(Y_m - Y_{m-1}).$$

10. Consider the binomial asset pricing model with $n = 3$, $u = 3$, $d = \frac12$, $r = 0.1$, $S_0 = 20$, and $K = 10$. If $V$ is a European call with strike price $K$ and exercise date $n$, compute explicitly the random variables $V_1$ and $V_2$ and calculate the value $V_0$.

11. In the same model as Problem 10, compute the hedging strategy $\Delta_0$, $\Delta_1$, and $\Delta_2$.

12. Show that in the binomial asset pricing model the value of the option $V$ at time $k$ is $V_k$.

13. Suppose $X_n$ is a submartingale. Show there exists a martingale $M_n$ such that if $A_n = X_n - M_n$, then $A_0 \le A_1 \le A_2 \le \cdots$ and $A_n$ is $\mathcal{F}_{n-1}$ measurable for each $n$.

14. Suppose $X_n$ is a submartingale and $X_n = M_n + A_n = M'_n + A'_n$, where both $A_n$ and $A'_n$ are $\mathcal{F}_{n-1}$ measurable for each $n$, both $M$ and $M'$ are martingales, both $A_n$ and $A'_n$ increase in $n$, and $A_0 = A'_0$. Show $M_n = M'_n$ for each $n$.

15. Suppose that $S$ and $T$ are stopping times. Show that $\max(S, T)$ and $\min(S, T)$ are also stopping times.

16. Suppose that $S_n$ is a stopping time for each $n$ and $S_1 \le S_2 \le \cdots$. Show $S = \lim_{n\to\infty} S_n$ is also a stopping time. Show that if instead $S_1 \ge S_2 \ge \cdots$ and $S = \lim_{n\to\infty} S_n$, then $S$ is again a stopping time.

17. Let $W_t$ be Brownian motion. Show that $e^{iuW_t + u^2 t/2}$ can be written in the form $1 + \int_0^t H_s\,dW_s$ and give an explicit formula for $H_s$.

18. Suppose $M_t$ is a continuous bounded martingale for which $\langle M \rangle_\infty$ is also bounded. Show that
$$\sum_{i=0}^{2^n - 1} \big(M_{(i+1)/2^n} - M_{i/2^n}\big)^2$$
converges to $\langle M \rangle_1$ as $n \to \infty$.
[Hint: Show that Ito's formula implies
$$\big(M_{(i+1)/2^n} - M_{i/2^n}\big)^2 = 2\int_{i/2^n}^{(i+1)/2^n} \big(M_s - M_{i/2^n}\big)\,dM_s + \langle M \rangle_{(i+1)/2^n} - \langle M \rangle_{i/2^n}.$$
Then sum over $i$ and show that the stochastic integral term goes to zero as $n \to \infty$.]

19. Let $f_\varepsilon(0) = f'_\varepsilon(0) = 0$ and $f''_\varepsilon(x) = \frac{1}{2\varepsilon} 1_{(-\varepsilon, \varepsilon)}(x)$. You may assume that it is valid to use Ito's formula with the function $f_\varepsilon$ (note $f_\varepsilon \notin C^2$). Show that
$$\frac{1}{2\varepsilon} \int_0^t 1_{(-\varepsilon, \varepsilon)}(W_s)\,ds$$
converges as $\varepsilon \to 0$ to a continuous nondecreasing process that is not identically zero and that increases only when $W_t$ is at 0.
[Hint: Use Ito's formula to rewrite $\frac{1}{2\varepsilon} \int_0^t 1_{(-\varepsilon, \varepsilon)}(W_s)\,ds$ in terms of $f_\varepsilon(W_t) - f_\varepsilon(W_0)$ plus a stochastic integral term and take the limit in this formula.]

20. Let $X_t$ be the solution to
$$dX_t = \sigma(X_t)\,dW_t + b(X_t)\,dt, \qquad X_0 = x,$$
where $W_t$ is Brownian motion and $\sigma$ and $b$ are bounded $C^\infty$ functions and $\sigma$ is bounded below by a positive constant. Find a nonconstant function $f$ such that $f(X_t)$ is a martingale.
[Hint: Apply Ito's formula to $f(X_t)$ and obtain an ordinary differential equation that $f$ needs to satisfy.]

21. Suppose $X_t = W_t + F(t)$, where $F$ is a twice continuously differentiable function, $F(0) = 0$, and $W_t$ is a Brownian motion under $P$. Find a probability measure $Q$ under which $X_t$ is a Brownian motion and prove your statement. (You will need to use the general Girsanov theorem.)

22. Suppose $X_t = W_t - \int_0^t X_s\,ds$. Show that
$$X_t = \int_0^t e^{s-t}\,dW_s.$$

23. Suppose we have a stock where $\sigma = 2$, $K = 15$, $S_0 = 10$, $r = 0.1$, and $T = 3$. Suppose we are in the continuous time model. Determine the price of the standard European call using the Black-Scholes formula.

24. Let $\psi(t, x, y, \mu)$ be, informally, $P(\sup_{s \le t}(W_s + \mu s) = y, W_t = x)$, where $W_t$ is a Brownian motion. More precisely, for each $A, B, C, D$,
$$P\Big(A \le \sup_{s \le t}(W_s + \mu s) \le B, \ C \le W_t \le D\Big) = \int_C^D \int_A^B \psi(t, x, y, \mu)\,dy\,dx.$$
($\psi$ has an explicit formula, but we don't need that here.) Let the stock price $S_t$ be given by the standard geometric Brownian motion. Let $V$ be the option that pays off $\sup_{s \le T} S_s$ at time $T$. Determine the price at time 0 of $V$ as an expression in terms of $\psi$.

25. Suppose the interest rate is 0 and $S_t$ is the standard geometric Brownian motion stock price. Let $A$ and $B$ be fixed positive reals, and let $V$ be the option that pays off 1 at time $T$ if $A \le S_T \le B$ and 0 otherwise.
(a) Determine the price at time 0 of $V$.
(b) Find the hedging strategy that duplicates the claim $V$.

26. Let $V$ be the standard European call that has strike price $K$ and exercise date $T$. Let $r$ and $\sigma$ be constants, as usual, but let $\mu(t)$ be a deterministic (i.e., nonrandom) function. Suppose the stock price is given by
$$dS_t = \sigma S_t\,dW_t + \mu(t) S_t\,dt,$$
where $W_t$ is a Brownian motion. Find the price at time 0 of $V$.

1. Introduction. In this course we will study mathematical ﬁnance. Mathematical ﬁnance is not about predicting the price of a stock. What it is about is ﬁguring out the price of options and derivatives. The most familiar type of option is the option to buy a stock at a given price at a given time. For example, suppose Microsoft is currently selling today at $40 per share. A European call option is something I can buy that gives me the right to buy a share of Microsoft at some future date. To make up an example, suppose I have an option that allows me to buy a share of Microsoft for $50 in three months time, but does not compel me to do so. If Microsoft happens to be selling at $45 in three months time, the option is worthless. I would be silly to buy a share for $50 when I could call my broker and buy it for $45. So I would choose not to exercise the option. On the other hand, if Microsoft is selling for $60 three months from now, the option would be quite valuable. I could exercise the option and buy a share for $50. I could then turn around and sell the share on the open market for $60 and make a proﬁt of $10 per share. Therefore this stock option I possess has some value. There is some chance it is worthless and some chance that it will lead me to a proﬁt. The basic question is: how much is the option worth today? The huge impetus in ﬁnancial derivatives was the seminal paper of Black and Scholes in 1973. Although many researchers had studied this question, Black and Scholes gave a deﬁnitive answer, and a great deal of research has been done since. These are not just academic questions; today the market in ﬁnancial derivatives is larger than the market in stock securities. In other words, more money is invested in options on stocks than in stocks themselves. Options have been around for a long time. The earliest ones were used by manufacturers and food producers to hedge their risk. 
A farmer might agree to sell a bushel of wheat at a fixed price six months from now rather than take a chance on the vagaries of market prices. Similarly a steel refinery might want to lock in the price of iron ore at a fixed price.

The sections of these notes can be grouped into five categories. The first is elementary probability. Although someone who has had a course in undergraduate probability will be familiar with some of this, we will talk about a number of topics that are not usually covered in such a course: σ-fields, conditional expectations, martingales. The second category is the binomial asset pricing model. This is just about the simplest model of a stock that one can imagine, and this will provide a case where we can see most of the major ideas of mathematical finance, but in a very simple setting. Then we will turn to advanced probability, that is, ideas such as Brownian motion, stochastic integrals, stochastic differential equations, Girsanov transformation. Although to do this rigorously requires measure theory, we can still learn enough to understand and work with these concepts. We then

return to finance and work with the continuous model. We will derive the Black-Scholes formula, see the Fundamental Theorem of Asset Pricing, work with equivalent martingale measures, and the like. The fifth main category is term structure models, which means models of interest rate behavior.

I found some unpublished notes of Steve Shreve extremely useful in preparing these notes. I hope that he has turned them into a book and that this book is now available. The stochastic calculus part of these notes is from my own book: Probabilistic Techniques in Analysis, Springer, New York, 1995. I would also like to thank Evarist Giné, who pointed out a number of errors.


2. Review of elementary probability.

Let's begin by recalling some of the definitions and basic concepts of elementary probability. We will only work with discrete models at first.

We start with an arbitrary set, which we will denote by Ω, the capital Greek letter "omega." This is called the probability space. We are given a class F of subsets of Ω. These are called events. We require F to be a σ-field.

Definition 2.1. A collection F of subsets of Ω is called a σ-field if
(1) ∅ ∈ F,
(2) Ω ∈ F,
(3) A ∈ F implies A^c ∈ F, and
(4) A1, A2, . . . ∈ F implies both ∪_{i=1}^∞ Ai ∈ F and ∩_{i=1}^∞ Ai ∈ F.

Here A^c = {ω ∈ Ω : ω ∉ A} denotes the complement of A. ∅ denotes the empty set, that is, the set with no elements. We will use without special comment the usual notations of ∪ (union), ∩ (intersection), ⊂ (contained in), ∈ (is an element of).

Typically, in an elementary probability course, F will consist of all subsets of Ω, but we will later need to distinguish between various σ-fields. Here is an example. Suppose one tosses a coin two times and lets Ω denote all possible outcomes. So Ω = {HH, HT, TH, TT}. A typical σ-field F would be the collection of all subsets of Ω. In this case it is trivial to show that F is a σ-field, since every subset is in F. But if we let G = {∅, Ω, {HH, HT}, {TH, TT}}, then G is also a σ-field. One has to check the definition, but to illustrate: the event {HH, HT} is in G, so we require the complement of that set to be in G as well. But the complement is {TH, TT}, and that event is indeed in G.

One point of view, which we will explore much more fully later on, is that the σ-field tells you what events you "know." In this example, F is the σ-field where you "know" everything, while G is the σ-field where you "know" only the result of the first toss but not the second. We won't try to be precise here, but to try to add to the intuition, suppose one knows whether an event in F has happened or not for a particular outcome. We would then know which of the events {HH}, {HT}, {TH}, or {TT} has happened and so would know what the two tosses of the coin showed. On the other hand, if we know which events in G happened, we would only know whether the event {HH, HT} happened, which means we would know that the first toss was a heads, or we would know whether the event {TH, TT} happened, in which case we would know that the first toss was a tails. But there is no way to tell what happened on the second toss from knowing which events in G happened. Much more on this later.

The third basic ingredient is a probability.
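Since Ω here is finite, Definition 2.1 can be checked mechanically. The following sketch (the helper name is mine, not from the notes) verifies that the collection G above is a σ-field while a collection missing a complement is not; for a finite collection, closure under countable unions and intersections reduces to closure under pairwise unions and intersections.

```python
from itertools import product

# Sample space for two coin tosses, as in the example above.
omega = frozenset({"HH", "HT", "TH", "TT"})

# G "knows" only the result of the first toss.
G = {frozenset(), omega,
     frozenset({"HH", "HT"}), frozenset({"TH", "TT"})}

def is_sigma_field(F, omega):
    """Check Definition 2.1 for a finite collection F of subsets of omega."""
    if frozenset() not in F or omega not in F:
        return False
    if any(omega - A not in F for A in F):     # closed under complements
        return False
    # For a finite collection, closure under countable unions/intersections
    # reduces to closure under pairwise unions and intersections.
    return all(A | B in F and A & B in F for A, B in product(F, repeat=2))

# {∅, Ω, {HH}} fails: the complement of {HH} is missing.
bad = {frozenset(), omega, frozenset({"HH"})}
```

Here `is_sigma_field(G, omega)` returns True and `is_sigma_field(bad, omega)` returns False, matching the discussion above.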

Definition 2.2. A function P on F is a probability if it satisfies
(1) if A ∈ F, then 0 ≤ P(A) ≤ 1,
(2) P(Ω) = 1,
(3) P(∅) = 0, and
(4) if A1, A2, . . . ∈ F are pairwise disjoint, then P(∪_{i=1}^∞ Ai) = Σ_{i=1}^∞ P(Ai).

A collection of sets Ai is pairwise disjoint if Ai ∩ Aj = ∅ unless i = j.

There are a number of conclusions one can draw from this definition. As one example, if A ⊂ B, then P(A) ≤ P(B) and P(A^c) = 1 − P(A). See Note 1 at the end of this section for a proof.

Someone who has had measure theory will realize that a σ-field is the same thing as a σ-algebra and a probability is a measure of total mass one.

A random variable (abbreviated r.v.) is a function X from Ω to R, the reals. To be more precise, to be a r.v. X must also be measurable, which means that {ω : X(ω) ≥ a} ∈ F for all reals a.

The notion of measurability has a simple definition but is a bit subtle. If we take the point of view that we know all the events in G, then if Y is G-measurable, we know Y. Phrased another way, suppose we know whether or not the event has occurred for each event in G. Then if Y is G-measurable, we can compute the value of Y.

Here is an example. In the example above where we tossed a coin two times, let X be the number of heads in the two tosses. Then X is F measurable but not G measurable. To see this, let us consider Aa = {ω ∈ Ω : X(ω) ≥ a}. This event will equal

Ω if a ≤ 0;  {HH, HT, TH} if 0 < a ≤ 1;  {HH} if 1 < a ≤ 2;  ∅ if 2 < a.

For example, if a = 3/2, then the event where the number of heads is 3/2 or greater is the event where we had two heads, namely, {HH}. Now observe that for each a the event Aa is in F because F contains all subsets of Ω. Therefore X is measurable with respect to F. However it is not true that Aa is in G for every value of a: take a = 3/2 as just one example; the subset {HH} is not in G. So X is not measurable with respect to the σ-field G.

A discrete r.v. is one where P(ω : X(ω) = a) = 0 for all but countably many a's, say a1, a2, . . ., and Σ_i P(ω : X(ω) = ai) = 1. In defining sets one usually omits the ω; thus (X = x) means the same as {ω : X(ω) = x}.

In the discrete case, to check measurability with respect to a σ-field F, it is enough that (X = a) ∈ F for all reals a. The reason for this is that if x1, x2, . . . are the values of

X for which P(X = x) ≠ 0, then we can write (X ≥ a) = ∪_{xi ≥ a}(X = xi) and we have a countable union. So if (X = xi) ∈ F, then (X ≥ a) ∈ F.

Given a discrete r.v. X, the expectation or mean is defined by

E X = Σ_x x P(X = x)

provided the sum converges. If X only takes finitely many values, then this is a finite sum and of course it will converge. This is the situation that we will consider for quite some time. However, if X can take an infinite number of values (but countably many), convergence needs to be checked. For example, if P(X = 2^n) = 2^{−n} for n = 1, 2, . . ., then E X = Σ_{n=1}^∞ 2^n · 2^{−n} = ∞.

There is an alternate definition of expectation which is equivalent in the discrete setting. Set

E X = Σ_{ω∈Ω} X(ω) P({ω}).

To see that this is the same, look at Note 2 at the end of the section. The advantage of the second definition is that some properties of expectation, such as E (X + Y) = E X + E Y, are immediate, while with the first definition they require quite a bit of proof.

We say two events A and B are independent if P(A ∩ B) = P(A)P(B). The extension of the definition of independence to the case of more than two events or random variables is not surprising: A1, . . ., An are independent if

P(Ai1 ∩ · · · ∩ Aij) = P(Ai1) · · · P(Aij)

whenever {i1, . . ., ij} is a subset of {1, . . ., n}.

Two random variables X and Y are independent if

P(X ∈ A, Y ∈ B) = P(X ∈ A)P(Y ∈ B)

for all A and B that are subsets of the reals. The comma in the expression P(X ∈ A, Y ∈ B) means "and"; thus P(X ∈ A, Y ∈ B) = P((X ∈ A) ∩ (Y ∈ B)).

Two σ-fields F and G are independent if A and B are independent whenever A ∈ F and B ∈ G. A r.v. X and a σ-field G are independent if P((X ∈ A) ∩ B) = P(X ∈ A)P(B) whenever A is a subset of the reals and B ∈ G.

A common misconception is that an event is independent of itself. If A is an event that is independent of itself, then

P(A) = P(A ∩ A) = P(A)P(A) = (P(A))^2.

The only solutions to the equation x = x^2 are x = 0 and x = 1, so an event is independent of itself only if it has probability 0 or 1.
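For a finite Ω the equivalence of the two definitions of expectation can be seen directly. A small sketch (the setup is illustrative and not from the notes): X counts the heads in two fair tosses, and both formulas give the same number.

```python
omega = ["HH", "HT", "TH", "TT"]
P = {w: 0.25 for w in omega}            # two fair, independent tosses
X = {w: w.count("H") for w in omega}    # number of heads

# First definition: E X = sum over values x of x * P(X = x).
E1 = sum(x * sum(P[w] for w in omega if X[w] == x)
         for x in set(X.values()))

# Second definition: E X = sum over outcomes w of X(w) * P({w}).
E2 = sum(X[w] * P[w] for w in omega)
```

Both sums equal 1.0, the expected number of heads in two fair tosses.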

As an example, suppose we toss a coin two times and we define the σ-fields G1 = {∅, Ω, {HH, HT}, {TH, TT}} and G2 = {∅, Ω, {HH, TH}, {HT, TT}}. Then G1 and G2 are independent if P(HH) = P(HT) = P(TH) = P(TT) = 1/4. (Here we are writing P(HH) when a more accurate way would be to write P({HH}).) An easy way to understand this is that if we look at an event in G1 that is not ∅ or Ω, then that is the event that the first toss is a heads or it is the event that the first toss is a tails. Similarly, a set other than ∅ or Ω in G2 will be the event that the second toss is a heads or that the second toss is a tails.

If two r.v.s X and Y are independent, we have the multiplication theorem, which says that E (XY) = (E X)(E Y) provided all the expectations are finite. See Note 3 for a proof.

Suppose X1, . . ., Xn are n independent r.v.s such that for each one P(Xi = 1) = p and P(Xi = 0) = 1 − p, where p ∈ [0, 1]. The random variable Sn = Σ_{i=1}^n Xi is called a binomial r.v., and represents, for example, the number of successes in n trials, where the probability of a success is p. An important result in probability is that

P(Sn = k) = [n!/(k!(n − k)!)] p^k (1 − p)^{n−k}.

The variance of a random variable is

Var X = E [(X − E X)^2].

This is also equal to E [X^2] − (E X)^2. The expression E [X^2] is sometimes called the second moment of X. It is an easy consequence of the multiplication theorem that if X and Y are independent, then

Var (X + Y) = Var X + Var Y.

We close this section with a definition of conditional probability. The probability of A given B, written P(A | B), is defined by

P(A | B) = P(A ∩ B)/P(B),

provided P(B) ≠ 0. The conditional expectation of X given B is defined to be

E [X | B] = E [X; B]/P(B),

provided P(B) ≠ 0. (We will use the notation E [X; B] frequently. E [X; B] means E [X1B], where 1B(ω) is 1 if ω ∈ B and 0 otherwise. Another way of writing E [X; B] is E [X; B] = Σ_{ω∈B} X(ω)P({ω}).)

Note 1. Suppose we have two disjoint sets C and D. Let A1 = C, A2 = D, and Ai = ∅ for i ≥ 3. Then the Ai are pairwise disjoint and

P(C ∪ D) = P(∪_{i=1}^∞ Ai) = Σ_{i=1}^∞ P(Ai) = P(C) + P(D)   (2.1)

by Definition 2.2(3) and (4). Therefore Definition 2.2(4) holds when there are only two sets instead of infinitely many, and a similar argument shows the same is true when there are an arbitrary (but finite) number of sets.

Now suppose A ⊂ B. Let C = A and D = B − A, where B − A is defined to be B ∩ A^c (this is frequently written B \ A as well). Then C and D are disjoint, and by (2.1),

P(B) = P(C ∪ D) = P(C) + P(D) ≥ P(C) = P(A).

The other equality we mentioned is proved by letting C = A and D = A^c. Then C and D are disjoint, and

1 = P(Ω) = P(C ∪ D) = P(C) + P(D) = P(A) + P(A^c).

Solving for P(A^c), we have P(A^c) = 1 − P(A).

Note 2. Let us show the two definitions of expectation are the same (in the discrete case). Starting with the first definition, we have

E X = Σ_x x P(X = x) = Σ_x x Σ_{ω∈Ω:X(ω)=x} P({ω}) = Σ_x Σ_{ω∈Ω:X(ω)=x} X(ω)P({ω}) = Σ_{ω∈Ω} X(ω)P({ω}),

and we end up with the second definition.

Note 3. Suppose X can take the values x1, x2, . . . and Y can take the values y1, y2, . . .. Let Ai = {ω : X(ω) = xi} and Bj = {ω : Y(ω) = yj}. Then

X = Σ_i xi 1Ai,   Y = Σ_j yj 1Bj,

and so XY = Σ_i Σ_j xi yj 1Ai 1Bj. Since 1Ai 1Bj = 1_{Ai ∩ Bj}, it follows that

E [XY] = Σ_i Σ_j xi yj P(Ai ∩ Bj),

assuming the double sum converges. Since X and Y are independent, Ai = (X = xi) is independent of Bj = (Y = yj), and so

E [XY] = Σ_i Σ_j xi yj P(Ai)P(Bj) = Σ_i xi P(Ai) Σ_j yj P(Bj) = Σ_i xi P(Ai) E Y = (E X)(E Y).
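The binomial formula P(Sn = k) given earlier is easy to sanity-check numerically. In this sketch (parameters chosen arbitrarily), the probabilities sum to 1, the mean comes out to np, and the variance to np(1 − p), consistent with Var (X + Y) = Var X + Var Y applied to the independent Xi.

```python
from math import comb

def binom_pmf(n, k, p):
    # P(S_n = k) = n!/(k!(n-k)!) * p^k * (1-p)^(n-k)
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.3
pmf = [binom_pmf(n, k, p) for k in range(n + 1)]
total = sum(pmf)                                              # 1
mean = sum(k * q for k, q in enumerate(pmf))                  # n*p = 3.0
var = sum(k * k * q for k, q in enumerate(pmf)) - mean ** 2   # n*p*(1-p) = 2.1
```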

3. Conditional expectation.

Suppose we have 200 men and 100 women, 70 of the men are smokers, and 50 of the women are smokers. If a person is chosen at random, then the conditional probability that the person is a smoker given that it is a man is 70 divided by 200, or 35%, while the conditional probability that the person is a smoker given that it is a woman is 50 divided by 100, or 50%. We will write P(S | M) = .35, P(S | W) = .50, where M, W are man, woman, respectively, and S, S^c smoker and nonsmoker, respectively. We will want to be able to encompass both facts in a single entity.

The way to do that is to make conditional probability a random variable rather than a number. We introduce the random variable (.35)1M + (.50)1W and use that for our conditional probability. So on the set M its value is .35 and on the set W its value is .50.

We need to give this random variable a name, so what we do is let G be the σ-field consisting of {∅, Ω, M, W} and denote this random variable P(S | G). Thus we are going to talk about the conditional probability of an event given a σ-field. To reiterate, we will make conditional probabilities random. What is the precise definition?

Definition 3.1. Suppose there exist finitely (or countably) many sets B1, B2, . . ., all having positive probability, such that they are pairwise disjoint, Ω is equal to their union, and G is the σ-field one obtains by taking all finite or countable unions of the Bi. Then the conditional probability of A given G is

P(A | G) = Σ_i [P(A ∩ Bi)/P(Bi)] 1Bi(ω).

In short, on the set Bi the conditional probability is equal to P(A | Bi).

Not every σ-field can be so represented; σ-fields that can be represented as in Definition 3.1 are called finitely (or countably) generated and are said to be generated by the sets B1, B2, . . .. So this definition will need to be extended when we get to continuous models.

Let's look at another example. Suppose Ω consists of the possible results when we toss a coin three times: HHH, HHT, . . ., TTT. Let F3 denote all subsets of Ω. Let F1 consist of the sets ∅, Ω, {HHH, HHT, HTH, HTT}, and {THH, THT, TTH, TTT}. So F1 consists of those events that can be determined by knowing the result of the first toss. We want to let F2 denote those events that can be determined by knowing the first two tosses. This will

include the sets ∅, Ω, {HHH, HHT}, {HTH, HTT}, {THH, THT}, {TTH, TTT}. This is not enough to make F2 a σ-field, so we add to F2 all sets that can be obtained by taking unions of these sets. Let D1 = {HHH, HHT}, D2 = {HTH, HTT}, D3 = {THH, THT}, D4 = {TTH, TTT}. So F2 is the σ-field consisting of all possible unions of some of the Di's.

Suppose we tossed the coin independently and suppose that it was fair. Let us calculate P(A | F1), P(A | F2), and P(A | F3) when A is the event {HHH}. First the conditional probability given F1. Let C1 = {HHH, HHT, HTH, HTT} and C2 = {THH, THT, TTH, TTT}. On the set C1 the conditional probability is P(A ∩ C1)/P(C1) = P(HHH)/P(C1) = (1/8)/(1/2) = 1/4. On the set C2 the conditional probability is P(A ∩ C2)/P(C2) = P(∅)/P(C2) = 0. Therefore P(A | F1) = (.25)1C1. This is plausible: the probability of getting three heads given the first toss is 1/4 if the first toss is a heads and 0 otherwise.

Next let us calculate P(A | F2). P(A | D1) = P(HHH)/P(D1) = (1/8)/(1/4) = 1/2, and P(A | Di) = 0 for i = 2, 3, 4. So P(A | F2) = (.50)1D1. This is again plausible: the probability of getting three heads given the first two tosses is 1/2 if the first two tosses were heads and 0 otherwise.

What about conditional expectation? Recall E [X; Bi] = E [X1Bi] and also that E [1B] = 1 · P(1B = 1) + 0 · P(1B = 0) = P(B). Given a random variable X, we define

E [X | G] = Σ_i (E [X; Bi]/P(Bi)) 1Bi.

This is the obvious definition, and it agrees with what we had before because E [1A | G] should be equal to P(A | G).

We now turn to some properties of conditional expectation. Some of the following propositions may seem a bit technical. In fact, they are! However, these properties are crucial to what follows and there is no choice but to master them.

Proposition 3.1. E [X | G] is G measurable; that is, if Y = E [X | G], then (Y > a) is a set in G for each real a.

Proof. By the definition,

Y = E [X | G] = Σ_i (E [X; Bi]/P(Bi)) 1Bi = Σ_i bi 1Bi

if we set bi = E [X; Bi]/P(Bi). The set (Y ≥ a) is a union of some of the Bi, namely, those Bi for which bi ≥ a. But the union of any collection of the Bi is in G.

An example might help. Suppose

Y = 2 · 1B1 + 3 · 1B2 + 6 · 1B3 + 4 · 1B4

and a = 3.5. Then (Y ≥ a) = B3 ∪ B4, which is in G.
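The three-toss computations above can be reproduced by brute force. In this sketch (the helper names are mine, not the notes'), a finitely generated σ-field is represented by its generating partition, and P(A | G) is built block by block as in Definition 3.1.

```python
from itertools import product

omega = ["".join(t) for t in product("HT", repeat=3)]
P = {w: 1 / 8 for w in omega}   # three fair, independent tosses
A = {"HHH"}

def cond_prob(A, partition):
    # Definition 3.1: on each block B, P(A | G) equals P(A ∩ B)/P(B).
    out = {}
    for B in partition:
        pB = sum(P[w] for w in B)
        pAB = sum(P[w] for w in B if w in A)
        for w in B:
            out[w] = pAB / pB
    return out

# Partitions generating F1 (first toss known) and F2 (first two known).
part1 = [[w for w in omega if w[0] == c] for c in "HT"]
part2 = [[w for w in omega if w[:2] == pre]
         for pre in ("HH", "HT", "TH", "TT")]

p1 = cond_prob(A, part1)   # .25 on C1, 0 on C2
p2 = cond_prob(A, part2)   # .50 on D1, 0 elsewhere
```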

Proposition 3.2. If C ∈ G and Y = E [X | G], then E [Y; C] = E [X; C].

Proof. Since Y = Σ_i (E [X; Bi]/P(Bi)) 1Bi and the Bi are disjoint, we have

E [Y; Bj] = E [(E [X; Bj]/P(Bj)) 1Bj] = (E [X; Bj]/P(Bj)) E 1Bj = E [X; Bj].

Now if C = Bj1 ∪ · · · ∪ Bjn ∪ · · ·, summing the above over the jk gives E [Y; C] = E [X; C].

Again let us look at an example for this proposition, and let us do the case where C = B2. It is not possible for an ω to be in more than one of the Bi, so 1B2 1Bi = 0 if i ≠ 2; note 1B2 1B2 = 1B2 because the product is 1 · 1 = 1 if ω is in B2 and 0 otherwise. We have

E [Y; B2] = E [Y 1B2] = E [3 · 1B2] = 3E [1B2] = 3P(B2).

However the number 3 is not just any number; it is E [X; B2]/P(B2). So

3P(B2) = (E [X; B2]/P(B2)) P(B2) = E [X; B2],

just as we wanted. If C = B2 ∪ B4, for example, then

E [X; C] = E [X1C] = E [X(1B2 + 1B4)] = E [X1B2] + E [X1B4] = E [X; B2] + E [X; B4].

By the first part, this equals E [Y; B2] + E [Y; B4], and we undo the above string of equalities, but with Y instead of X, to see that this is E [Y; C].

Also, if Y = E [X | G], then for any a we have (Y = a) ∈ G, which means that (Y = a) is the union of one or more of the Bi. Since the Bi are disjoint, it follows that Y must be constant on each Bi.

Now if Z is G measurable, then (Z ≥ a) ∈ G for each a. Again let us look at an example. Suppose Z takes only the values 1, 3, 4, 7. Let D1 = (Z = 1), D2 = (Z = 3), D3 = (Z = 4), D4 = (Z = 7). Note that we can write

Z = 1 · 1D1 + 3 · 1D2 + 4 · 1D3 + 7 · 1D4.

To see this, if ω ∈ D2, for example, the right hand side will be 0 + 3 · 1 + 0 + 0, which agrees with Z(ω). Take a = 7, and we see D4 ∈ G. Take a = 4 and we see D3 ∪ D4 ∈ G. Now D3 = (D3 ∪ D4) ∩ D4^c, so since G is a σ-field, D3 ∈ G. Taking a = 3 shows D2 ∪ D3 ∪ D4 ∈ G, and similarly D2, D1 ∈ G.

Proposition 3.4. Suppose Z is G measurable and E [Z; C] = E [X; C] whenever C ∈ G. Then Z = E [X | G].

Proof. Since Z is G measurable and sets in G are unions of the Bi's, Z must be constant on each Bi. Let the value of Z on Bi be zi, so Z = Σ_i zi 1Bi. Then zi P(Bi) = E [Z; Bi] = E [X; Bi], or zi = E [X; Bi]/P(Bi), as required.

For example, if it so happened that D1 = B1, D2 = B2 ∪ B4, D3 = B3 ∪ B6 ∪ B7, and D4 = B5, then

Z = 1 · 1B1 + 3 · 1B2 + 4 · 1B3 + 3 · 1B4 + 7 · 1B5 + 4 · 1B6 + 4 · 1B7.

Proposition 3.4 says that the properties given in Propositions 3.2 and 3.3 uniquely determine E [X | G].

The following propositions contain the main facts about this new definition of conditional expectation that we will need. We still restrict ourselves to the discrete case.

Proposition 3.5.
(1) If X1 ≥ X2, then E [X1 | G] ≥ E [X2 | G].
(2) E [aX1 + bX2 | G] = aE [X1 | G] + bE [X2 | G].
(3) If X is G measurable, then E [X | G] = X.
(4) E [E [X | G]] = E X.
(5) If X is independent of G, then E [X | G] = E X.

We will prove Proposition 3.5 in Note 1 at the end of the section. At this point it is more fruitful to understand what the proposition says. We will see in Proposition 3.8 below that we may think of E [X | G] as the best prediction of X given G. Accepting this for the moment, we can give an interpretation of (1)-(5). (1) says that if X1 is larger than X2, then the predicted value of X1 should be larger than the predicted value of X2. (2) says that the predicted value of X1 + X2 should be the sum of the predicted values. (3) says that if we know G and X is G measurable, then we know X and our best prediction of X is X itself. (4) says that the average of the predicted value of X should be the average value of X. (5) says that if knowing G gives us no additional information on X, then the best prediction for the value of X is just E X.

Proposition 3.6. If Z is G measurable, then E [XZ | G] = ZE [X | G].

We again defer the proof, this time to Note 2.

Proposition 3.6 says that as far as conditional expectations with respect to a σ-field G go, G-measurable random variables act like constants: they can be taken inside or outside the conditional expectation at will.

Proposition 3.7. If H ⊂ G ⊂ F, then

E [E [X | H] | G] = E [X | H] = E [E [X | G] | H].

Proof. E [X | H] is H measurable, hence G measurable, since H ⊂ G. The left hand equality now follows by Proposition 3.5(3). To get the right hand equality, let W be the right hand expression. It is H measurable, and if C ∈ H ⊂ G, then

E [W; C] = E [E [X | G]; C] = E [X; C]

as required.

In words, if we are predicting a prediction of X given limited information, this is the same as a single prediction given the least amount of information.

Let us verify that conditional expectation may be viewed as the best predictor of a random variable given a σ-field. If X is a r.v., a predictor Z is just another random variable, and the goodness of the prediction will be measured by E [(X − Z)^2], which is known as the mean square error.

Proposition 3.8. If X is a r.v., the best predictor among the collection of G-measurable random variables is Y = E [X | G].

Proof. Let Z be any G-measurable random variable. We compute, using Proposition 3.5(3) and Proposition 3.6,

E [(X − Z)^2 | G] = E [X^2 | G] − 2E [XZ | G] + E [Z^2 | G]
= E [X^2 | G] − 2ZE [X | G] + Z^2
= E [X^2 | G] − 2ZY + Z^2
= E [X^2 | G] − Y^2 + (Y − Z)^2
= E [X^2 | G] − 2Y E [X | G] + Y^2 + (Y − Z)^2
= E [X^2 | G] − 2E [XY | G] + E [Y^2 | G] + (Y − Z)^2
= E [(X − Y)^2 | G] + (Y − Z)^2.

We also used the fact that Y is G measurable. Taking expectations and using Proposition 3.5(4),

E [(X − Z)^2] = E [(X − Y)^2] + E [(Y − Z)^2].

The right hand side is bigger than or equal to E [(X − Y)^2] because (Y − Z)^2 ≥ 0. So the error in predicting X by Z is larger than the error in predicting X by Y, and will be equal if and only if Z = Y. So Y is the best predictor.
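Proposition 3.8 can be seen concretely on the two-toss example: with G generated by the first toss, E [X | G] is the block-wise average of X, and no other G-measurable predictor has smaller mean square error. The sketch below checks this against a small grid of competing predictors (the grid is my own choice; it is a spot check, not a proof).

```python
from itertools import product

omega = ["".join(t) for t in product("HT", repeat=2)]
P = {w: 0.25 for w in omega}
X = {w: w.count("H") for w in omega}    # number of heads

# G is generated by the partition {first toss H} / {first toss T}.
blocks = [[w for w in omega if w[0] == c] for c in "HT"]

# Y = E[X | G]: on each block, the P-weighted average of X.
Y = {}
for B in blocks:
    pB = sum(P[w] for w in B)
    avg = sum(X[w] * P[w] for w in B) / pB
    for w in B:
        Y[w] = avg

def mse(Z):
    # mean square error E[(X - Z)^2]
    return sum((X[w] - Z[w]) ** 2 * P[w] for w in omega)

best = mse(Y)   # Y is 1.5 on the H-block, 0.5 on the T-block; mse = 0.25
# Any other G-measurable predictor is constant a on one block and b on
# the other; every candidate on this grid does at least as badly.
beaten = all(
    mse({w: (a if w[0] == "H" else b) for w in omega}) >= best - 1e-12
    for a in (0, 0.5, 1, 1.5, 2) for b in (0, 0.5, 1, 1.5, 2))
```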

Note 1. We prove Proposition 3.5. (1) and (2) are immediate from the definition. To prove (3), note that if Z = X, then Z is G measurable and E [X; C] = E [Z; C] for any C ∈ G; by Proposition 3.4 we see Z = E [X | G]. To prove (4), if we let C = Ω and Y = E [X | G], then E Y = E [Y; C] = E [X; C] = E X.

Last is (5). Let Z = E X. Z is constant, so clearly G measurable. By the independence, if C ∈ G, then

E [X; C] = E [X1C] = (E X)(E 1C) = (E X)(P(C)).

But E [Z; C] = (E X)(P(C)) since Z is constant. By Proposition 3.4 it follows that Z = E [X | G].

Note 2. We prove Proposition 3.6. Note that ZE [X | G] is G measurable, so by Proposition 3.4 we need to show its expectation over sets C in G is the same as that of XZ. As in the proof of Proposition 3.4, it suffices to consider only the case when C is one of the Bi. Now Z is G measurable, hence it is constant on Bi; let its value be zi. Then

E [ZE [X | G]; Bi] = E [zi E [X | G]; Bi] = zi E [E [X | G]; Bi] = zi E [X; Bi] = E [XZ; Bi]

as desired.

There is one more interpretation of conditional expectation that may be useful. The collection of all random variables is a linear space, and the collection of all G-measurable random variables is clearly a subspace. Given X, the conditional expectation Y = E [X | G] is equal to the projection of X onto the subspace of G-measurable random variables. To see this, we write X = Y + (X − Y), and what we have to check is that the inner product of Y and X − Y is 0, that is, Y and X − Y are orthogonal. In this context, the inner product of X1 and X2 is defined to be E [X1 X2], so we must show E [Y(X − Y)] = 0. Note

E [Y(X − Y) | G] = Y E [X − Y | G] = Y(E [X | G] − Y) = Y(Y − Y) = 0.

Taking expectations,

E [Y(X − Y)] = E [E [Y(X − Y) | G]] = 0,

just as we wished.

If Y is a discrete random variable, that is, it takes only countably many values y1, y2, . . ., we let Bi = (Y = yi). These will be disjoint sets whose union is Ω. If σ(Y) is the collection of all unions of the Bi, then σ(Y) is a σ-field, and is called the σ-field generated by Y. It is easy to see that this is the smallest σ-field with respect to which Y is measurable. We write E [X | Y] for E [X | σ(Y)].

4. Martingales.

Suppose we have a sequence of σ-fields F1 ⊂ F2 ⊂ F3 ⊂ · · ·. An example would be repeatedly tossing a coin and letting Fk be the sets that can be determined by the first k tosses. Another example is to let Fk be the events that are determined by the values of a stock at times 1 through k. A third example is to let X1, X2, . . . be a sequence of random variables and let Fk be the σ-field generated by X1, . . ., Xk, the smallest σ-field with respect to which X1, . . ., Xk are measurable.

Definition 4.1. A r.v. X is integrable if E |X| < ∞. Given an increasing sequence of σ-fields Fn, a sequence of r.v.'s Xn is adapted if Xn is Fn measurable for each n.

Definition 4.2. A martingale Mn is a sequence of random variables such that
(1) Mn is integrable for all n,
(2) Mn is adapted to Fn, and
(3) for all n,
E [Mn+1 | Fn] = Mn.   (4.1)

Usually (1) and (2) are easy to check, and it is (3) that is the crucial property. Note that the definition of martingale depends on the collection of σ-fields. When it is needed for clarity, one can say that (Mn, Fn) is a martingale. To define conditional expectation, one needs a probability, so a martingale depends on the probability as well. When we need to, we will say that Mn is a martingale with respect to the probability P. This is an issue when there is more than one probability around.

We will see that martingales are ubiquitous in financial math. For example, security prices and one's wealth will turn out to be examples of martingales.

If we have (1) and (2), but instead of (3) we have
(3') for all n, E [Mn+1 | Fn] ≥ Mn,
then we say Mn is a submartingale. If we have (1) and (2), but instead of (3) we have
(3'') for all n, E [Mn+1 | Fn] ≤ Mn,
then we say Mn is a supermartingale.

Submartingales tend to increase and supermartingales tend to decrease. The nomenclature may seem like it goes the wrong way; Doob defined these terms by analogy with the notions of subharmonic and superharmonic functions in analysis. (Actually, it is more than an analogy: we won't explore this, but it turns out that the composition of a subharmonic function with Brownian motion yields a submartingale, and similarly for superharmonic functions.)

The word "martingale" is also used for the piece of a horse's bridle that runs from the horse's head to its chest. It keeps the horse from raising its head too high. The word also refers to a gambling system. I did some searching on the Internet, and there seems to be no consensus on the derivation of the term.

Here is an example of a martingale. Let X1, X2, . . . be a sequence of independent r.v.'s with mean 0. (Saying a r.v. Xi has mean 0 is the same as saying E Xi = 0; this presupposes that E |Xi| is finite.) Set Fn = σ(X1, . . ., Xn), the σ-field generated by X1, . . ., Xn. Let Mn = Σ_{i=1}^n Xi. Definition 4.2(2) is easy to see. Since E |Mn| ≤ Σ_{i=1}^n E |Xi|, Definition 4.2(1) also holds. We now check

E [Mn+1 | Fn] = X1 + · · · + Xn + E [Xn+1 | Fn] = Mn + E Xn+1 = Mn,

where we used the independence.

Another example: suppose in the above that the Xk all have variance 1, and let Mn = Sn^2 − n, where Sn = Σ_{i=1}^n Xi. Again (1) and (2) of Definition 4.2 are easy to check. We compute

E [Mn+1 | Fn] = E [Sn^2 + 2Xn+1 Sn + X^2_{n+1} | Fn] − (n + 1).

We have E [Sn^2 | Fn] = Sn^2 since Sn is Fn measurable. Also, using the independence,

E [2Xn+1 Sn | Fn] = 2Sn E [Xn+1 | Fn] = 2Sn E Xn+1 = 0,

and E [X^2_{n+1} | Fn] = E X^2_{n+1} = 1. Substituting, we obtain E [Mn+1 | Fn] = Mn, or Mn is a martingale.

A third example: suppose you start with a dollar and you are tossing a fair coin independently. If it turns up heads you double your fortune, tails you go broke. This is "double or nothing." Let Mn be your fortune at time n. To formalize this, let X1, X2, . . . be independent r.v.'s that are equal to 2 with probability 1/2 and 0 with probability 1/2. Then Mn = X1 · · · Xn. Note 0 ≤ Mn ≤ 2^n, and so Definition 4.2(1) is satisfied, while (2) is easy. To compute the conditional expectation, note E Xn+1 = 1. Then, using the independence,

E [Mn+1 | Fn] = Mn E [Xn+1 | Fn] = Mn E Xn+1 = Mn.

Our fourth example will be used many times, so we state it as a proposition. Before we give our fourth example, let us observe that

|E [X | F]| ≤ E [|X| | F].   (4.2)

To see this, we have −|X| ≤ X ≤ |X|, so −E [|X| | F] ≤ E [X | F] ≤ E [|X| | F]. Since E [|X| | F] is nonnegative, (4.2) follows. It turns out that martingales in probability cannot get too large.
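The martingale property of Mn = Sn^2 − n can be verified exhaustively for ±1 coin steps (one particular mean-0, variance-1 choice of the Xi, my own choice for the check): conditioning on the first n tosses just means averaging over the two equally likely values of Xn+1.

```python
from itertools import product

# X_i = ±1 with probability 1/2 each; S_n = X_1 + ... + X_n; M_n = S_n^2 - n.
# Check E[M_{n+1} | F_n] = M_n: given the first n tosses, average M_{n+1}
# over the two equally likely values of X_{n+1}.
n = 4
ok = True
for path in product((1, -1), repeat=n):
    Sn = sum(path)
    Mn = Sn ** 2 - n
    avg_next = 0.5 * ((Sn + 1) ** 2 - (n + 1)) + 0.5 * ((Sn - 1) ** 2 - (n + 1))
    ok = ok and abs(avg_next - Mn) < 1e-12
```

Expanding the squares shows avg_next = Sn^2 + 1 − (n + 1) = Mn on every path, so `ok` is True.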

Proposition 4.3. Let F1, F2, . . . be given and let X be a fixed r.v. with E |X| < ∞. Let Mn = E [X | Fn]. Then Mn is a martingale.

Proof. Definition 4.2(2) is clear, while E |Mn| ≤ E [E [|X| | Fn]] = E |X| < ∞ by (4.2); this shows Definition 4.2(1). We have

E [Mn+1 | Fn] = E [E [X | Fn+1] | Fn] = E [X | Fn] = Mn.

5. Properties of martingales.

When it comes to discussing American options, we will need the concept of stopping times. A mapping τ from Ω into the nonnegative integers is a stopping time if (τ = k) ∈ Fk for each k. One sometimes allows τ to also take on the value ∞.

An example is τ = min{k : Sk ≥ A}. This is a stopping time because

(τ = k) = (S0, S1, . . ., Sk−1 < A, Sk ≥ A) ∈ Fk.

We can think of a stopping time as the first time something happens. On the other hand, σ = max{k : Sk ≥ A}, the last time, is not a stopping time. (We will use the convention that the minimum of an empty set is +∞; so, with the above definition of τ, on the event that Sk is never in A, we have τ = ∞.)

Note (τ ≤ k) = ∪_{j=0}^k (τ = j). Since (τ = j) ∈ Fj ⊂ Fk, then the event (τ ≤ k) ∈ Fk for all k. Conversely, if τ is a r.v. with (τ ≤ k) ∈ Fk for all k, then (τ = k) = (τ ≤ k) − (τ ≤ k − 1). Since (τ ≤ k) ∈ Fk and (τ ≤ k − 1) ∈ Fk−1 ⊂ Fk, then (τ = k) ∈ Fk, and such a τ must be a stopping time.

Here is an intuitive description of a stopping time. If I tell you to drive to the city limits and then drive until you come to the second stop light after that, you can decide when to stop using only the information available so far; you don't need to have been there before or to look ahead. But if I tell you to drive until you come to the second stop light before the city limits, either you must have been there before or else you have to go past where you are supposed to stop, continue on to the city limits, and then turn around and come back two stop lights. You don't know, when you first get to the second stop light before the city limits, that you get to stop there. The first set of instructions forms a stopping time; the second set does not.

Our first result is Jensen's inequality.

Proposition 5.1. If g is convex, then

g(E [X | G]) ≤ E [g(X) | G]

provided all the expectations exist.

For ordinary expectations rather than conditional expectations, this is still true: if g is convex and the expectations exist, then g(E X) ≤ E [g(X)]. We already know some special cases of this: when g(x) = |x|, this says |E X| ≤ E |X|; when g(x) = x^2, this says (E X)^2 ≤ E X^2, which we know because E X^2 − (E X)^2 = E (X − E X)^2 ≥ 0.
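Whether a random time is a stopping time can be tested directly on a finite horizon: the event (τ = k) must be decidable from the first k steps alone. The sketch below (helper names are my own) confirms this for the hitting time τ and refutes it for the last time σ, using ±1 walks as the Sk.

```python
from itertools import product

n, A = 5, 2
paths = list(product((1, -1), repeat=n))   # all ±1 walks of length n

def tau(path):
    # tau = min{k : S_k >= A}; None stands in for "never hits" (tau = infinity).
    S = 0
    for k, x in enumerate(path, 1):
        S += x
        if S >= A:
            return k
    return None

def sigma(path):
    # sigma = max{k : S_k >= A}, the last time; None if the walk never hits A.
    S, last = 0, None
    for k, x in enumerate(path, 1):
        S += x
        if S >= A:
            last = k
    return last

def determined_by_prefix(T):
    # Stopping-time test: (T = k) must agree on paths sharing the first k steps.
    return all((T(p) == k) == (T(q) == k)
               for k in range(1, n + 1)
               for p in paths for q in paths if p[:k] == q[:k])

is_tau_stopping = determined_by_prefix(tau)       # True
is_sigma_stopping = determined_by_prefix(sigma)   # False
```

For σ the test fails because two walks can agree up to time k and yet only one of them returns to level A later, which changes whether k was the last visit.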

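The definition can be made concrete in a few lines of code. The sketch below is illustrative and not from the notes: the first hitting time of a level A is computable by looking only at the path seen so far, while the last visit is not.

```python
def first_hitting_time(path, A):
    # tau = min{k : S_k >= A}; at step k we only inspect S_0, ..., S_k
    for k, s in enumerate(path):
        if s >= A:
            return k
    return float("inf")  # convention: the minimum of the empty set is +infinity

S = [4, 5, 7, 6, 8]
print(first_hitting_time(S, 7))   # 2
print(first_hitting_time(S, 9))   # inf
# sigma = max{k : S_k >= A} needs the whole path before it can be evaluated,
# which is exactly why it fails to be a stopping time.
```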
We prove Proposition 5.1, as well as many of the following propositions, in notes at the end of the section; here the statement of the result is more important than the proof, and we relegate the proof to Note 1 below. One reason we want Jensen's inequality is to show that a convex function applied to a martingale yields a submartingale.

Proposition 5.2. If Mn is a martingale and g is convex, then g(Mn) is a submartingale, provided all the expectations exist.

Proof. By Jensen's inequality,
E [g(Mn+1) | Fn] ≥ g(E [Mn+1 | Fn]) = g(Mn).

If Mn is a martingale, then E Mn = E [E [Mn+1 | Fn]] = E Mn+1, so
E M0 = E M1 = · · · = E Mn.
Doob's optional stopping theorem says the same thing holds when fixed times n are replaced by stopping times.

Theorem 5.3. Suppose K is a positive integer, N is a stopping time such that N ≤ K a.s., and Mn is a martingale. Then
E MN = E MK.

Here, to evaluate MN, one first finds N(ω) and then evaluates M·(ω) for that value of N.

Proof. We have
E MN = Σ_{k=0}^{K} E [MN; N = k].
If we show that the k-th summand is E [MK; N = k], then the sum will be
Σ_{k=0}^{K} E [MK; N = k] = E MK,
as desired. We have
E [MN; N = k] = E [Mk; N = k]
by the definition of MN. Now (N = k) is in Fk, so by Proposition 2.2 and the fact that Mk = E [Mk+1 | Fk],
E [Mk; N = k] = E [Mk+1; N = k].
We have (N = k) ∈ Fk ⊂ Fk+1. Since Mk+1 = E [Mk+2 | Fk+1], Proposition 2.2 tells us that
E [Mk+1; N = k] = E [Mk+2; N = k].

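Theorem 5.3 can be verified exactly on a small example by enumerating every path of a symmetric random walk, which is a martingale. The horizon K = 6 and the particular stopping time (first visit to +1, capped at K so that N ≤ K) are illustrative choices, not from the notes; this is a sketch of the statement, not part of the theory.

```python
from itertools import product
from fractions import Fraction

K = 6
e_MN = e_MK = Fraction(0)
for steps in product([1, -1], repeat=K):
    M = [0]
    for s in steps:
        M.append(M[-1] + s)          # symmetric random walk: a martingale
    # N = first time the walk hits +1, capped at K, so N <= K always
    N = next((k for k, m in enumerate(M) if m == 1), K)
    prob = Fraction(1, 2) ** K
    e_MN += M[N] * prob
    e_MK += M[K] * prob
assert e_MN == e_MK == 0             # E M_N = E M_K = E M_0
```

The sum runs over all 2^K equally likely paths, so the two expectations are computed exactly, with no sampling error.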
We continue, using (N = k) ∈ Fk ⊂ Fk+1 ⊂ Fk+2 ⊂ · · ·, and we obtain
E [MN; N = k] = E [Mk; N = k] = E [Mk+1; N = k] = · · · = E [MK; N = k],
which is what we wanted.

If we change the equalities in the above to inequalities, the same result holds for submartingales.

As a corollary we have two of Doob's inequalities:

Theorem 5.4. If Mn is a nonnegative submartingale,
(a) P(max_{k≤n} Mk ≥ λ) ≤ (1/λ) E Mn,
(b) E (max_{k≤n} Mk)^2 ≤ 4 E Mn^2.

For the proof, see Note 2 below.

Note 1. We prove Proposition 5.1. If g is convex, then the graph of g lies above all the tangent lines. Even if g does not have a derivative at x0, there is a line passing through x0 which lies beneath the graph of g. So for each x0 there exists c(x0) such that
g(x) ≥ g(x0) + c(x0)(x − x0).
If g is differentiable, we let c(x0) = g′(x0). In the case where g is not differentiable, we choose c to be the left hand upper derivate. (For those who are not familiar with derivates, this is essentially the left hand derivative.) One can check that if c is so chosen, then c(E [X | G]) is G measurable.

Apply the inequality above with x = X(ω) and x0 = E [X | G](ω). We then have
g(X) ≥ g(E [X | G]) + c(E [X | G])(X − E [X | G]).
Now take the conditional expectation with respect to G. The first term on the right is G measurable, so remains the same. The second term on the right is equal to
c(E [X | G]) E [X − E [X | G] | G] = 0.

Note 2. We prove Theorem 5.4. Set Mn+1 = Mn. It is easy to see that the sequence M1, M2, . . . , Mn+1 is also a submartingale. Let
N = min{k : Mk ≥ λ} ∧ (n + 1),
the first time that Mk is greater than or equal to λ, where a ∧ b = min(a, b).

Then
P(max_{k≤n} Mk ≥ λ) = P(N ≤ n),
and if N ≤ n, then MN ≥ λ. Let us write M* for max_{k≤n} Mk. Now
P(M* ≥ λ) = E [1_(N≤n)] ≤ E [(MN/λ); N ≤ n] = (1/λ) E [MN∧n; N ≤ n].
Arguing as in the proof of Theorem 5.3,
E [MN∧n; N ≤ n] = Σ_{k=0}^{∞} E [Mk∧n; N = k] ≤ Σ_{k=0}^{∞} E [Mn; N = k] = E [Mn; N ≤ n],
since Mn is a submartingale. The last expression is at most E [Mn; M* ≥ λ], and so
P(M* ≥ λ) ≤ (1/λ) E [Mn; M* ≥ λ].   (5.1)
Since E [Mn; M* ≥ λ] ≤ E Mn, this proves (a).

We now look at (b). If E Mn^2 = ∞, there is nothing to prove, so suppose it is finite. Since 0 ≤ Mk ≤ E [Mn | Fk], then by Jensen's inequality,
E Mk^2 ≤ E [E [Mn | Fk]^2] ≤ E [E [Mn^2 | Fk]] = E Mn^2 < ∞
for k ≤ n. Then
E (M*)^2 = E [max_{1≤k≤n} Mk^2] ≤ E Σ_{k=1}^{n} Mk^2 < ∞.
If we multiply (5.1) by 2λ and integrate over λ from 0 to ∞, we obtain
∫_0^∞ 2λ P(M* ≥ λ) dλ ≤ 2 ∫_0^∞ E [Mn; M* ≥ λ] dλ = 2 E ∫_0^∞ Mn 1_(M*≥λ) dλ = 2 E [Mn M*].
Using Cauchy-Schwarz, this is bounded by
2 (E Mn^2)^{1/2} (E (M*)^2)^{1/2}.

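Doob's inequality (a) can also be checked by exhaustive enumeration, using the absolute value of a symmetric random walk as the nonnegative submartingale. The parameters n = 5 and λ = 2 below are illustrative, not from the notes; this is a numerical sketch, not a proof.

```python
from itertools import product
from fractions import Fraction

n, lam = 5, 2
p_max = e_Mn = Fraction(0)
for steps in product([1, -1], repeat=n):
    walk, M = 0, [0]
    for s in steps:
        walk += s
        M.append(abs(walk))           # |walk| is a nonnegative submartingale
    prob = Fraction(1, 2) ** n
    if max(M) >= lam:
        p_max += prob
    e_Mn += M[-1] * prob
assert p_max <= e_Mn / lam            # P(max_k M_k >= lam) <= E M_n / lam
print(p_max, e_Mn / lam)  # 3/4 15/16
```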
On the other hand,
∫_0^∞ 2λ P(M* ≥ λ) dλ = E ∫_0^∞ 2λ 1_(M*≥λ) dλ = E ∫_0^{M*} 2λ dλ = E (M*)^2.
We therefore have
E (M*)^2 ≤ 2 (E Mn^2)^{1/2} (E (M*)^2)^{1/2}.
Recall we showed E (M*)^2 < ∞. We divide both sides by (E (M*)^2)^{1/2}, square both sides, and obtain (b).

We will show that bounded martingales converge.

Theorem 5.5. Suppose Mn is a martingale bounded in absolute value by K, that is, |Mn| ≤ K for all n. Then limn→∞ Mn exists a.s.

Proof. Since Mn is bounded, it can't tend to +∞ or −∞. The only possibility is that it might oscillate. What might go wrong is that Mn might be larger than b infinitely often and less than a infinitely often, where a < b are two rationals. If we show the probability of this is 0, then taking the union over all pairs of rationals (a, b) shows that almost surely Mn cannot oscillate, and hence must converge.

Fix a < b, let Nn = (Mn − a)^+, and let
S1 = min{k : Nk ≤ 0}, T1 = min{k > S1 : Nk ≥ b − a}, S2 = min{k > T1 : Nk ≤ 0},
and so on. Let Un = max{k : Tk ≤ n}. Un is called the number of upcrossings up to time n. We want to show that maxn Un < ∞ a.s. Note by Jensen's inequality Nn is a submartingale. Since S1 < T1 < S2 < · · ·, then Sn+1 > n. We can write
2K ≥ Nn − N_{S1∧n} = Σ_{k=1}^{n+1} (N_{Sk+1∧n} − N_{Tk∧n}) + Σ_{k=1}^{n+1} (N_{Tk∧n} − N_{Sk∧n}).
Now take expectations. The expectation of the first sum on the right is greater than or equal to zero by optional stopping. The second sum is larger than (b − a)Un, so we conclude
(b − a) E Un ≤ 2K.
Let n → ∞ to see that E maxn Un < ∞, which implies maxn Un < ∞ a.s., which is what we needed.

Note 3. The hypothesis of boundedness in Theorem 5.5 can be weakened; for example, E |Mn| ≤ c < ∞ for some c not depending on n suffices.

Note 4. We will state Fatou's lemma in the following form. If Xn is a sequence of nonnegative random variables converging to X a.s., then E X ≤ supn E Xn. This formulation is equivalent to the classical one and is better suited for our use.

6. The one step binomial asset pricing model.

Let us begin by giving the simplest possible model of a stock and see how a European call option should be valued in this context. Suppose we have a single stock whose price is S0. Let d and u be two numbers with 0 < d < 1 < u. Here "d" is a mnemonic for "down" and "u" for "up." After one time unit the stock price will be either uS0 with probability P or else dS0 with probability Q, where P + Q = 1. We will assume 0 < P, Q < 1. Instead of purchasing shares in the stock, you can also put your money in the bank, where one will earn interest at rate r. Alternatives to the bank are money market funds or bonds; the key point is that these are considered to be risk-free.

A European call option in this context is the option to buy one share of the stock at time 1 at price K. K is called the strike price. Let S1 be the price of the stock at time 1. If S1 is less than K, then the option is worthless at time 1. If S1 is greater than K, you can use the option at time 1 to buy the stock at price K, immediately turn around and sell the stock for price S1, and make a profit of S1 − K. So the value of the option at time 1 is
V1 = (S1 − K)^+,
where x^+ is max(x, 0). The principal question to be answered is: what is the value V0 of the option at time 0? In other words, how much should one pay for a European call option with strike price K?

It is possible to buy a negative number of shares of a stock. This is equivalent to selling shares of a stock you don't have and is called selling short. If you sell one share of stock short, then at time 1 you must buy one share at whatever the market price is at that time and turn it over to the person that you sold the stock short to. Similarly you can buy a negative number of options, that is, sell an option.

You can also deposit a negative amount of money in the bank, which is the same as borrowing. We assume that you can borrow at the same interest rate r, not exactly a totally realistic assumption. One way to make it seem more realistic is to assume you have a large amount of money on deposit, and when you borrow, you simply withdraw money from that account.

We are looking at the simplest possible model, so we are going to allow only one time step: one makes an investment, and looks at it again one day later.

Let's suppose the price of a European call option is V0 and see what conditions one can put on V0. Suppose you start out with V0 dollars. One thing you could do is buy one option. The other thing you could do is use the money to buy ∆0 shares of stock. If V0 > ∆0 S0, there will be some money left over and you put that in the bank. If V0 < ∆0 S0, you do not have enough money to buy the stock, and you make up the shortfall by borrowing money from the bank. In either case, at this point you have V0 − ∆0 S0 in

the bank and ∆0 shares of stock. If the stock goes up, at time 1 you will have
∆0 uS0 + (1 + r)(V0 − ∆0 S0),
and if it goes down,
∆0 dS0 + (1 + r)(V0 − ∆0 S0).

We have not said what ∆0 should be. Let V1u = (uS0 − K)^+ and V1d = (dS0 − K)^+; note these are deterministic quantities, i.e., not random. Let
∆0 = (V1u − V1d) / (uS0 − dS0),
and we will also need
W0 = (1/(1 + r)) [ ((1 + r − d)/(u − d)) V1u + ((u − (1 + r))/(u − d)) V1d ].
In a moment we will do some algebra and see that if the stock goes up and you had bought stock instead of the option, you would now have
V1u + (1 + r)(V0 − W0),
while if the stock went down, you would now have
V1d + (1 + r)(V0 − W0).
Let us do that now. We need to show
∆0 uS0 + (1 + r)(V0 − ∆0 S0) = V1u + (1 + r)(V0 − W0).   (6.1)
The left hand side of (6.1) is equal to
∆0 S0 (u − (1 + r)) + (1 + r)V0 = ((V1u − V1d)/(u − d)) (u − (1 + r)) + (1 + r)V0.   (6.2)
The right hand side of (6.1) is equal to
V1u − [ ((1 + r − d)/(u − d)) V1u + ((u − (1 + r))/(u − d)) V1d ] + (1 + r)V0.   (6.3)
Now check that the coefficients of V0, of V1u, and of V1d agree in (6.2) and (6.3). The case where the stock goes down, ∆0 dS0 + (1 + r)(V0 − ∆0 S0) = V1d + (1 + r)(V0 − W0), is similar.

Suppose that V0 > W0. What you want to do is come along with no money, sell one option for V0 dollars, use the money to buy ∆0 shares, and put the rest in the bank

(or borrow if necessary). If the buyer of your option wants to exercise the option, you give him one share of stock and sell the rest. If he doesn't want to exercise the option, you sell your shares of stock and pocket the money. Remember it is possible to have a negative number of shares. You will have cleared (1 + r)(V0 − W0), whether the stock went up or down, with no risk.

If V0 < W0, you just do the opposite: sell ∆0 shares of stock short, buy one option, and deposit or make up the shortfall from the bank. This time, whether the stock goes up or down, you clear (1 + r)(W0 − V0).

Now most people believe that you can't make a profit on the stock market without taking a risk. The name for this is "no free lunch," or "arbitrage opportunities do not exist." The only way to avoid an arbitrage is if V0 = W0. In other words, we have shown that the only reasonable price for the European call option is W0.

The "no arbitrage" condition is not just a reflection of the belief that one cannot get something for nothing. It also represents the belief that the market is freely competitive. The way it works is this: suppose W0 = $3. Suppose you could sell options at a price V0 = $5; this is larger than W0 and you would earn V0 − W0 = $2 per option without risk. Then someone else would observe this and decide to sell the same option at a price less than V0 but larger than W0, say $4. This person would still make a profit, and customers would go to him and ignore you because they would be getting a better deal. But then a third person would decide to sell the option for less than your competition but more than W0, say at $3.50. This would continue as long as anyone would try to sell an option above price W0.

We will examine this problem of pricing options in more complicated contexts, and while doing so, it will become apparent where the formulas for ∆0 and W0 came from. At this point, we want to make a few observations.

Remark 6.1. First of all, if 1 + r ≥ u, one would never buy stock, since one can always do better by putting money in the bank. We always have 1 + r ≥ 1 > d. So we may suppose d < 1 + r < u.

Remark 6.2. Note that the price V0 = W0 does not depend on P or Q. It does depend on p and q, which seems to suggest that there is an underlying probability which controls the option price and is not the one that governs the stock price. If we set
p = (1 + r − d)/(u − d),   q = (u − (1 + r))/(u − d),
then p, q ≥ 0 and p + q = 1. Thus p and q act like probabilities, but they have nothing to do with P and Q.

There is nothing special about European call options in our argument above. One could let V1u and V1d be any two values of any option, which are paid out if the stock goes up or down, respectively.

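The replication argument can be checked numerically. The sketch below uses illustrative parameters (u = 2, d = 1/2, r = 0.1, S0 = 10, K = 15, which are assumptions, not numbers from this section) and exact rational arithmetic to confirm that ∆0 shares plus a bank account of W0 − ∆0 S0 reproduces the option payoff in both states.

```python
from fractions import Fraction as F

u, d, r = F(2), F(1, 2), F(1, 10)   # illustrative parameters, not from the text
S0, K = F(10), F(15)

V1u = max(u * S0 - K, F(0))   # option payoff if the stock goes up
V1d = max(d * S0 - K, F(0))   # option payoff if the stock goes down

delta0 = (V1u - V1d) / (u * S0 - d * S0)
W0 = ((1 + r - d) * V1u + (u - (1 + r)) * V1d) / ((u - d) * (1 + r))

# the stock position plus the bank account replicates the payoff in both states
up = delta0 * u * S0 + (1 + r) * (W0 - delta0 * S0)
down = delta0 * d * S0 + (1 + r) * (W0 - delta0 * S0)
assert up == V1u and down == V1d
print(W0)  # the no-arbitrage price, 20/11
```

Because `Fraction` arithmetic is exact, the two equalities hold with no rounding tolerance, mirroring the algebra in the text.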
Remark 6.3. Suppose instead of P and Q being the probabilities of going up and down, they were in fact p and q. If one buys one share of stock at time 0, then one expects at time 1 to have (Pu + Qd)S0. One then divides by 1 + r to get the value of the stock in today's dollars. (r, the risk-free interest rate, can also be considered the rate of inflation. A dollar tomorrow is equivalent to 1/(1 + r) dollars today.) If p and q were the correct probabilities, one would then expect to have (pu + qd)S0 and then divide by 1 + r. Substituting the values for p and q, this reduces to S0. In other words, if p and q were the correct probabilities, one would expect to have the same amount of money one started with. When we get to the binomial asset pricing model with more than one step, we will see that the generalization of this fact is that the stock price at time n is a martingale, still with the assumption that p and q are the correct probabilities. This is a special case of the fundamental theorem of finance: there always exists some probability, not necessarily the one you observe, under which the stock price is a martingale.

Remark 6.4. If we let P be the probability so that S1 = uS0 with probability p and S1 = dS0 with probability q and we let E be the corresponding expectation, then some algebra shows that
V0 = (1/(1 + r)) E V1.
This will be generalized later.

Remark 6.5. The above analysis shows we can exactly duplicate the result of buying any option V by instead buying and selling shares of the stock. If one could replicate the outcome of any option this way, then the "no arbitrage" rule would give the exact value of the option, and the market is called complete in this model.

Remark 6.6. Our model allows after one time step the possibility of the stock going up or going down, but only these two options. What if instead there are 3 (or more) possibilities? Suppose, for example, that the stock goes up a factor u with probability P, down a factor d with probability Q, and remains constant with probability R, where P + Q + R = 1. (The "c" below is a mnemonic for "constant.") The corresponding payoffs of a European call option would be V1u = (uS0 − K)^+, V1d = (dS0 − K)^+, and V1c = (S0 − K)^+, respectively. One then has three equations one wants to satisfy, in terms of V1u, V1d, and V1c. There are however only two variables, ∆0 and V0, at your disposal, and most of the time three equations in two unknowns cannot be solved. So, except in very special circumstances, one cannot do this, and the theory falls apart.

Remark 6.7. In our model we ruled out the cases that P or Q were zero. If Q = 0, that is, we are certain that the stock will go up, then we would always invest in the stock if u > 1 + r, as we would always do better, and we would always put the money in the bank if u ≤ 1 + r. Similar considerations apply when P = 0. It is interesting to note that

the cases where P = 0 or Q = 0 are the only ones in which our derivation is not valid. It turns out that in more general models the true probabilities enter only in determining which events have probability 0 or 1 and in no other way.

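Remarks 6.3 and 6.4 can be illustrated directly: under p and q the discounted stock price is unchanged on average, and the option price is the discounted expected payoff. The parameters below are illustrative assumptions, the same ones used in the earlier sketch, not values from the text.

```python
from fractions import Fraction as F

u, d, r = F(2), F(1, 2), F(1, 10)   # assumed example parameters
S0, K = F(10), F(15)

p = (1 + r - d) / (u - d)
q = (u - (1 + r)) / (u - d)
assert p + q == 1 and p >= 0 and q >= 0   # p, q act like probabilities

# Remark 6.3: under p, q, the expected discounted stock price is S0
assert (p * u * S0 + q * d * S0) / (1 + r) == S0

# Remark 6.4: the option price is the discounted expectation of the payoff
V1u = max(u * S0 - K, F(0))
V1d = max(d * S0 - K, F(0))
V0 = (p * V1u + q * V1d) / (1 + r)
print(V0)  # 20/11, the same W0 as before
```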
7. The multi-step binomial asset pricing model.

In this section we will obtain a formula for the pricing of options when there are n time steps, but each time the stock can only go up by a factor u or down by a factor d. The "Black-Scholes" formula we will obtain is already a nontrivial result that is useful.

We assume the following.
(1) Unlimited short selling of stock.
(2) Unlimited borrowing.
(3) No transaction costs.
(4) Our buying and selling is on a small enough scale that it does not affect the market.

We need to set up the probability model. Ω will be all sequences of length n of H's and T's. S0 will be a fixed number and we define Sk(ω) = u^j d^(k−j) S0 if the first k elements of a given ω ∈ Ω have j occurrences of H and k − j occurrences of T. (What we are doing is saying that if the j-th element of the sequence making up ω is an H, then the stock price goes up by a factor u; if T, then down by a factor d.) Fk will be the σ-field generated by S0, . . . , Sk.

Let
p = ((1 + r) − d)/(u − d),   q = (u − (1 + r))/(u − d),
and define P(ω) = p^j q^(n−j) if ω has j appearances of H and n − j appearances of T. We observe that under P the random variables Sk+1/Sk are independent and equal to u with probability p and d with probability q. To see this, let Yk = Sk/Sk−1; thus Yk is the factor the stock price goes up or down at time k. Then
P(Y1 = y1, . . . , Yn = yn) = p^j q^(n−j),
where j is the number of the yk that are equal to u. On the other hand, this is equal to P(Y1 = y1) · · · P(Yn = yn). Let E denote the expectation corresponding to P.

The P we construct may not be the true probabilities of going up or down. That doesn't matter; it will turn out that using the principle of "no arbitrage," it is P that governs the price.

Our first result is the fundamental theorem of finance in the current context.

Proposition 7.1. Under P the discounted stock price (1 + r)^(−k) Sk is a martingale.

Proof. We have
E [(1 + r)^(−(k+1)) Sk+1 | Fk] = (1 + r)^(−k) Sk (1 + r)^(−1) E [Sk+1/Sk | Fk].
Since the random variable Sk+1/Sk is independent of Fk, the conditional expectation on the right is equal to
E [Sk+1/Sk] = pu + qd = 1 + r.

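Proposition 7.1 can be checked by brute force for a small n: summing over all 2^k coin sequences, E [(1 + r)^(−k) Sk] is the same for every k. The parameters below are illustrative assumptions, not from the text; this is a sketch of the martingale property at fixed times, not of the full conditional statement.

```python
from fractions import Fraction as F
from itertools import product

u, d, r, S0, n = F(2), F(1, 2), F(1, 10), F(10), 4   # assumed parameters
p = (1 + r - d) / (u - d)
q = (u - (1 + r)) / (u - d)

def expected_discounted(k):
    # E[(1+r)^{-k} S_k] computed exactly by summing over all length-k sequences
    total = F(0)
    for w in product("HT", repeat=k):
        prob, S = F(1), S0
        for t in w:
            prob *= p if t == "H" else q
            S *= u if t == "H" else d
        total += prob * S
    return total / (1 + r) ** k

assert all(expected_discounted(k) == S0 for k in range(n + 1))
```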
Substituting yields the proposition.

Let ∆k be the number of shares held between times k and k + 1. We require ∆k to be Fk measurable. ∆0, ∆1, . . . is called the portfolio process. Let W0 be the amount of money you start with and let Wk be the amount of money you have at time k; Wk is the wealth process. If we have ∆k shares between times k and k + 1, then at time k + 1 those shares will be worth ∆k Sk+1. The amount of cash we hold between time k and k + 1 is Wk minus the amount held in stock, that is, Wk − ∆k Sk. At time k + 1 this is worth (1 + r)[Wk − ∆k Sk]. Therefore
Wk+1 = ∆k Sk+1 + (1 + r)[Wk − ∆k Sk].

Note that in the case where r = 0 we have
Wk+1 − Wk = ∆k (Sk+1 − Sk),
or
Wk+1 = W0 + Σ_{i=0}^{k} ∆i (Si+1 − Si).
This is a discrete version of a stochastic integral. Since
E [Wk+1 − Wk | Fk] = ∆k E [Sk+1 − Sk | Fk] = 0,
it follows that in the case r = 0 that Wk is a martingale. More generally

Proposition 7.2. Under P the discounted wealth process (1 + r)^(−k) Wk is a martingale.

Proof. We have
(1 + r)^(−(k+1)) Wk+1 = (1 + r)^(−k) Wk + ∆k [(1 + r)^(−(k+1)) Sk+1 − (1 + r)^(−k) Sk].
Observe that
E [∆k [(1 + r)^(−(k+1)) Sk+1 − (1 + r)^(−k) Sk] | Fk] = ∆k E [(1 + r)^(−(k+1)) Sk+1 − (1 + r)^(−k) Sk | Fk] = 0.
The result follows.

Our next result is that the binomial model is complete. It is easy to lose the idea in the algebra, so first let us try to see why the theorem is true. For simplicity let us first consider the case r = 0. Let V be any Fn measurable random variable; we want to construct a portfolio process so that Wn = V. Let Vk = E [V | Fk].

Suppose we have Wk = Vk at time k and we want to find ∆k so that Wk+1 = Vk+1. At the (k + 1)-st step there are only two possible changes for the price of the stock, and so, since Vk+1 is Fk+1 measurable, only two possible values for Vk+1. We need to choose ∆k so that Wk+1 = Vk+1 for each of these two possibilities. We only have one parameter, ∆k, to play with to match up two numbers, which may seem like an overconstrained system of equations. But both V and W are martingales, which is why the system can be solved.

Now let us turn to the details. In the following proof we allow r ≥ 0.

Theorem 7.3. The binomial asset pricing model is complete.

The precise meaning of this is the following: if V is any random variable that is Fn measurable, there exists a constant W0 and a portfolio process ∆k so that the wealth process Wk satisfies Wn = V. In other words, starting with W0 dollars, we can trade shares of stock to exactly duplicate the outcome of any option V.

Proof. Let Vk = (1 + r)^k E [(1 + r)^(−n) V | Fk]. By Proposition 4.3, (1 + r)^(−k) Vk is a martingale. If ω = (t1, . . . , tn), where each ti is an H or T, let
∆k(ω) = [Vk+1(t1, . . . , tk, H, tk+2, . . . , tn) − Vk+1(t1, . . . , tk, T, tk+2, . . . , tn)] / [Sk+1(t1, . . . , tk, H, tk+2, . . . , tn) − Sk+1(t1, . . . , tk, T, tk+2, . . . , tn)].
Set W0 = V0, and we will show by induction that the wealth process at time k equals Vk. Then using induction we have Wn = Vn = V as required.

The first thing to show is that ∆k is Fk measurable. Neither Sk+1 nor Vk+1 depends on tk+2, . . . , tn, so ∆k depends only on the variables t1, . . . , tk, hence is Fk measurable.

Now tk+2, . . . , tn play no role in the rest of the proof, and t1, . . . , tk will be fixed, so we drop the t's from the notation. If we write Vk+1(H), this is an abbreviation for Vk+1(t1, . . . , tk, H, tk+2, . . . , tn). We know (1 + r)^(−k) Vk is a martingale under P, so that
Vk = E [(1 + r)^(−1) Vk+1 | Fk] = (1/(1 + r)) [pVk+1(H) + qVk+1(T)].   (7.1)
(See Note 1.)

We now suppose Wk = Vk and want to show Wk+1(H) = Vk+1(H) and Wk+1(T) = Vk+1(T). We show the first equality, the second being similar.

We have
Wk+1(H) = ∆k Sk+1(H) + (1 + r)[Wk − ∆k Sk]
= ∆k [uSk − (1 + r)Sk] + (1 + r)Vk
= ((Vk+1(H) − Vk+1(T))/((u − d)Sk)) Sk [u − (1 + r)] + pVk+1(H) + qVk+1(T)
= Vk+1(H).
We are done.

In the binomial asset pricing model, there is no difficulty computing the price of a European call.

Theorem 7.4. Let V be any option that is Fn-measurable. The value of the option V at time 0 is
V0 = (1 + r)^(−n) E V.

Proof. We can construct a portfolio process ∆k so that if we start with W0 = (1 + r)^(−n) E V, then the wealth at time n will equal V, no matter what the market does in between. If we could buy or sell the option V at a price other than W0, we could obtain a riskless profit. That is, if the option V could be sold at a price c0 larger than W0, we would sell the option for c0 dollars, use W0 to buy and sell stock according to the portfolio process ∆k, have a net worth of V + (1 + r)^n (c0 − W0) at time n, meet our obligation to the buyer of the option by using V dollars, and have a net profit, at no risk, of (1 + r)^n (c0 − W0). If c0 were less than W0, we would do the same except buy an option, hold −∆k shares at time k, and again make a riskless profit. By the "no arbitrage" rule, that can't happen, so the price of the option V must be W0.

Note that the proof of Theorem 7.4 tells you precisely what hedging strategy (i.e., what portfolio process) to use.

Finally, we obtain the Black-Scholes formula in this context. The one we have in mind is the European call, for which V = (Sn − K)^+, but the argument is the same for any option whatsoever.

Theorem 7.5. The price of the European call with strike price K is
(1 + r)^(−n) Σ_{k=0}^{n} (u^k d^(n−k) S0 − K)^+ C(n, k) p^k q^(n−k),
where C(n, k) denotes the binomial coefficient.

Proof. We have
E (Sn − K)^+ = Σ_x (x − K)^+ P(Sn = x)
and
P(Sn = x) = C(n, k) p^k q^(n−k)   if x = u^k d^(n−k) S0.
The formula now follows from Theorem 7.4.

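The closed-form price can be cross-checked against backward induction, that is, repeatedly applying Vk = [pVk+1(H) + qVk+1(T)]/(1 + r) starting from the terminal payoffs. The sketch below uses assumed parameters (n = 3, u = 2, d = 1/2, r = 0.1, S0 = 10, K = 15) and exact rationals so the two methods can be compared with equality.

```python
from fractions import Fraction as F
from math import comb

u, d, r, S0, K, n = F(2), F(1, 2), F(1, 10), F(10), F(15), 3   # assumed example
p = (1 + r - d) / (u - d)
q = (u - (1 + r)) / (u - d)

# closed-form sum, as in the theorem
closed = sum(
    comb(n, k) * p**k * q**(n - k) * max(u**k * d**(n - k) * S0 - K, F(0))
    for k in range(n + 1)
) / (1 + r) ** n

# backward induction: start from terminal payoffs (indexed by number of up moves)
# and discount one step at a time
vals = [max(u**k * d**(n - k) * S0 - K, F(0)) for k in range(n + 1)]
while len(vals) > 1:
    vals = [(q * vals[k] + p * vals[k + 1]) / (1 + r) for k in range(len(vals) - 1)]
assert vals[0] == closed
print(closed)  # 5600/1331, about 4.2074
```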
The formula in Theorem 7.4 holds for exotic options as well. Suppose
V = max_{i=1,...,n} Si − min_{j=1,...,n} Sj.
In other words, you sell the stock for the maximum value it takes during the first n time steps and you buy at the minimum value the stock takes; you are allowed to wait until time n and look back to see what the maximum and minimum were. You can even do this if the maximum comes before the minimum. This V is still Fn measurable, so the theory applies. Naturally, such a "buy low, sell high" option is very desirable, and the price of such a V will be quite high. It is interesting that even without using options, you can duplicate the operation of buying low and selling high by holding an appropriate number of shares ∆k at time k, where you do not look into the future to determine ∆k.

Let us look at an example of a European call so that it is clear how to do the calculations. Consider the binomial asset pricing model with n = 3, u = 2, d = 1/2, r = 0.1, S0 = 10, and K = 15. If V is a European call with strike price K and exercise date n, let us compute explicitly the random variables V1 and V2 and calculate the value V0. Let us also compute the hedging strategy ∆0, ∆1, and ∆2. Let
p = ((1 + r) − d)/(u − d),   q = (u − (1 + r))/(u − d).
The following table describes the values of the stock, the payoff V, and the probabilities for each possible outcome ω.

ω      S1     S2      S3       V     Probability
HHH    10u    10u^2   10u^3    65    p^3
HHT    10u    10u^2   10u^2d   5     p^2 q
HTH    10u    10ud    10u^2d   5     p^2 q
HTT    10u    10ud    10ud^2   0     p q^2
THH    10d    10ud    10u^2d   5     p^2 q
THT    10d    10ud    10ud^2   0     p q^2
TTH    10d    10d^2   10ud^2   0     p q^2
TTT    10d    10d^2   10d^3    0     q^3

We then calculate
V0 = (1 + r)^(−3) E V = (1 + r)^(−3) (65p^3 + 15p^2 q) = 4.2074.
Next, V1 = (1 + r)^(−2) E [V | F1], so we have
V1(H) = (1 + r)^(−2) (65p^2 + 10pq) = 10.5785,
V1(T) = (1 + r)^(−2) 5pq = 0.9917,
and V2 = (1 + r)^(−1) E [V | F2], so we have
V2(HH) = (1 + r)^(−1) (65p + 5q) = 26.3636,
V2(HT) = (1 + r)^(−1) 5p = 1.8182,
V2(TH) = (1 + r)^(−1) 5p = 1.8182,
V2(TT) = 0.

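One can confirm that the hedging strategy in this example really replicates the option: start with W0 = V0, hold ∆k shares between times k and k + 1, and check that W3 equals the payoff on all eight paths. The sketch below recomputes the node values recursively rather than reading them from the table; it is an illustration, not part of the notes.

```python
from fractions import Fraction as F
from itertools import product

u, d, r, S0, K, n = F(2), F(1, 2), F(1, 10), F(10), F(15), 3

p = (1 + r - d) / (u - d)
q = (u - (1 + r)) / (u - d)

def value(S, steps_left):
    # V_k at a node with current price S, via the martingale property (7.1)
    if steps_left == 0:
        return max(S - K, F(0))
    return (p * value(S * u, steps_left - 1) + q * value(S * d, steps_left - 1)) / (1 + r)

for path in product([u, d], repeat=n):
    S, W = S0, value(S0, n)          # start with W_0 = V_0
    for k, move in enumerate(path):
        steps_left = n - k
        # Delta_k = (V_{k+1}(H) - V_{k+1}(T)) / (S_{k+1}(H) - S_{k+1}(T))
        delta = (value(S * u, steps_left - 1) - value(S * d, steps_left - 1)) / (S * u - S * d)
        W = delta * S * move + (1 + r) * (W - delta * S)  # trade; rest in the bank
        S = S * move
    assert W == max(S - K, F(0))     # wealth exactly matches the payoff
print("replication verified on all", 2**n, "paths")
```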
The formula for ∆k is given by
∆k = (Vk+1(H) − Vk+1(T)) / (Sk+1(H) − Sk+1(T)),
where Vk+1 and Sk+1 are as above. So
∆0 = (V1(H) − V1(T)) / (S1(H) − S1(T)) = 0.6391,
∆1(H) = (V2(HH) − V2(HT)) / (S2(HH) − S2(HT)) = 24.5454/30 = 0.8182,
∆1(T) = (V2(TH) − V2(TT)) / (S2(TH) − S2(TT)) = 1.8182/7.5 = 0.2424,
∆2(HH) = (V3(HHH) − V3(HHT)) / (S3(HHH) − S3(HHT)) = 1.0,
∆2(HT) = (V3(HTH) − V3(HTT)) / (S3(HTH) − S3(HTT)) = 0.3333,
∆2(TH) = (V3(THH) − V3(THT)) / (S3(THH) − S3(THT)) = 0.3333,
∆2(TT) = (V3(TTH) − V3(TTT)) / (S3(TTH) − S3(TTT)) = 0.

Note 1. Let us give a more rigorous proof of (7.1). Intuitively, (7.1) says that one has a heads with probability p and the value of Vk+1 is Vk+1(H), and one has tails with probability q and the value of Vk+1 is Vk+1(T). The right hand side of (7.1) is Fk measurable, so we need to show that if A ∈ Fk, then
E [Vk+1; A] = E [pVk+1(H) + qVk+1(T); A].
By linearity, it suffices to show this for A = {ω = (t1 t2 · · · tn) : t1 = s1, . . . , tk = sk}, where s1 s2 · · · sk is any sequence of H's and T's. Now
E [Vk+1; A] = E [Vk+1; s1 · · · sk H] + E [Vk+1; s1 · · · sk T]
= Vk+1(s1 · · · sk H) P(s1 · · · sk H) + Vk+1(s1 · · · sk T) P(s1 · · · sk T).
By independence this is
Vk+1(s1 · · · sk H) P(s1 · · · sk) p + Vk+1(s1 · · · sk T) P(s1 · · · sk) q,
which is what we wanted.

8. American options.

An American option is one where you can exercise the option any time before some fixed time T. For example, on a European call, one can only use it to buy a share of stock at the expiration time T, while for an American call, at any time before time T, one can decide to pay K dollars and obtain a share of stock.

You have to make a decision on when to exercise the option, and that decision can only be based on what has already happened, not on what is going to happen in the future. In other words, we have to choose a stopping time τ, and we exercise the option at time τ(ω). Suppose that if you exercise the option at time k, your payoff is g(Sk). In present day dollars, that is, after correcting for inflation, you have (1 + r)^(−k) g(Sk). Thus our payoff is (1 + r)^(−τ) g(Sτ). This is a random quantity. What we want to do is find the stopping time that maximizes the expected value of this random variable. As usual, we work with P, and thus we are looking for the stopping time τ such that τ ≤ n and
E [(1 + r)^(−τ) g(Sτ)]
is as large as possible. The problem of finding such a τ is called an optimal stopping problem.

Let us first give an informal argument on how to price an American call, giving a more rigorous argument in a moment. One can always wait until time T to exercise an American call, so the value must be at least as great as that of a European call. On the other hand, suppose you decide to exercise early. You pay K dollars, receive one share of stock, and hold onto the stock; at time T you have one share of stock worth ST, and for which you paid K dollars. So your wealth is ST − K ≤ (ST − K)^+. In fact, we have strict inequality, because you lost the interest on your K dollars that you would have received if you had waited to exercise until time T. Therefore an American call is worth no more than a European call, and hence its value must be the same as that of a European call.

This argument does not work for puts. (A put is the option to sell a stock at a price K at time T.) For puts, selling stock early gives you some money on which you will receive interest, so it may be advantageous to exercise early.

We will show that τ ≡ n is the solution to the above optimal stopping problem for the American call: the best time to exercise is as late as possible. Suppose g(x) is convex with g(0) = 0. Certainly g(x) = (x − K)^+ is such a function. We have
g(λx) = g(λx + (1 − λ) · 0) ≤ λg(x) + (1 − λ)g(0) = λg(x),   0 ≤ λ ≤ 1.   (8.1)

By Jensen's inequality,
E [(1 + r)^(−(k+1)) g(Sk+1) | Fk] = (1 + r)^(−k) E [(1/(1 + r)) g(Sk+1) | Fk]
≥ (1 + r)^(−k) E [g((1/(1 + r)) Sk+1) | Fk]
≥ (1 + r)^(−k) g(E [(1/(1 + r)) Sk+1 | Fk])
= (1 + r)^(−k) g(Sk).
For the first inequality we used (8.1). So (1 + r)^(−k) g(Sk) is a submartingale. By optional stopping,
E [(1 + r)^(−τ) g(Sτ)] ≤ E [(1 + r)^(−n) g(Sn)],
so τ ≡ n always does best.

For puts, the payoff is g(Sk), where g(x) = (K − x)^+. This is also a convex function, but this time g(0) = K ≠ 0, and the above argument fails. Although good approximations are known, an exact solution to the problem of valuing an American put is unknown, and is one of the major unsolved problems in financial mathematics.

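Both claims of this section can be checked by backward induction: at each node an American option is worth the maximum of the exercise value and the discounted continuation value. The parameters below are assumed for illustration (the same example values used earlier), and the risk-neutral p, q are those of Section 7; this is a sketch, not a valuation method proposed in the notes.

```python
from fractions import Fraction as F

u, d, r, S0, K, n = F(2), F(1, 2), F(1, 10), F(10), F(15), 3   # assumed parameters
p = (1 + r - d) / (u - d)
q = (u - (1 + r)) / (u - d)

def price(payoff, american):
    # values at time n, indexed by the number of up moves
    vals = [payoff(u**k * d**(n - k) * S0) for k in range(n + 1)]
    for m in range(n - 1, -1, -1):          # step back from time n-1 to time 0
        nxt = []
        for k in range(m + 1):
            cont = (q * vals[k] + p * vals[k + 1]) / (1 + r)
            S = u**k * d**(m - k) * S0
            nxt.append(max(cont, payoff(S)) if american else cont)
        vals = nxt
    return vals[0]

call = lambda S: max(S - K, F(0))
put = lambda S: max(K - S, F(0))
assert price(call, american=True) == price(call, american=False)  # tau = n is optimal
assert price(put, american=True) > price(put, american=False)     # early exercise helps
```

In this example the American and European call prices agree exactly, while the American put is strictly more valuable, matching the text's conclusions.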
9. Continuous random variables.

We are now going to start working toward continuous times and stocks that can take any positive number as a value, so we need to prepare by extending some of our definitions.

Given any random variable X ≥ 0, we can approximate it by r.v's Xn that are discrete. We let
Xn = Σ_{i=0}^{n 2^n} (i/2^n) 1_(i/2^n ≤ X < (i+1)/2^n).
In words, if X(ω) lies between 0 and n, we let Xn(ω) be the closest value i/2^n that is less than or equal to X(ω). For ω where X(ω) > n + 2^(−n) we set Xn(ω) = 0. Clearly the Xn are discrete, and approximate X. In fact, on the set where X ≤ n, we have that |X(ω) − Xn(ω)| ≤ 2^(−n).

For reasonable X we are going to define E X = lim E Xn. Since the Xn increase with n, the limit must exist, although it could be +∞. There are some things one wants to prove, but all this has been worked out in measure theory and the theory of the Lebesgue integral. If X is not necessarily nonnegative, we define E X = E X^+ − E X^−, provided at least one of E X^+ and E X^− is finite. Here X^+ = max(X, 0) and X^− = max(−X, 0).

Let us confine ourselves here to showing this definition is the same as the usual one when X has a density. Recall X has a density fX if
P(X ∈ [a, b]) = ∫_a^b fX(x) dx
for all a and b. In this case,
E X = ∫_{−∞}^{∞} x fX(x) dx,
provided ∫_{−∞}^{∞} |x| fX(x) dx < ∞. With our definition of Xn we have
P(Xn = i/2^n) = P(X ∈ [i/2^n, (i + 1)/2^n)) = ∫_{i/2^n}^{(i+1)/2^n} fX(x) dx.
Then
E Xn = Σ_i (i/2^n) P(Xn = i/2^n) = Σ_i ∫_{i/2^n}^{(i+1)/2^n} (i/2^n) fX(x) dx.
Since x differs from i/2^n by at most 1/2^n when x ∈ [i/2^n, (i + 1)/2^n), this will tend to ∫ x fX(x) dx, unless the contribution to the integral for |x| ≥ n does not go to 0 as n → ∞. As long as ∫ |x| fX(x) dx < ∞, one can show that this contribution does indeed go to 0; see Note 1.

We also need an extension of the definition of conditional probability. A r.v. X is G measurable if (X > a) ∈ G for every a. How do we define E [Z | G] when G is not generated by a countable collection of disjoint sets? There is a completely worked out theory that holds in all cases (see Note 2), but let us give here a definition that is equivalent and that works except for a very few cases.

Suppose that for each n the σ-field G_n is finitely generated. This means that G_n is generated by finitely many disjoint sets B_n1, ..., B_nm_n: for each n, the number of the B_ni is finite but arbitrary, the B_ni are disjoint, and their union is Ω. Suppose also that G_1 ⊂ G_2 ⊂ ···. Now ∪_n G_n will not in general be a σ-field, but suppose G is the smallest σ-field that contains all the G_n. Finally, define

P(A | G) = lim_{n→∞} P(A | G_n).

This is a fairly general set-up. For example, let Ω be the real line and let G_n be generated by the sets (−∞, n), [n, ∞), and [i/2^n, (i+1)/2^n). Then G will contain every interval that is closed on the left and open on the right, hence G must be the σ-field that one works with when one talks about Lebesgue measure on the line.

The question that one might ask is: how does one know the limit exists? Since the G_n increase, we know by Proposition 4.3 that M_n = P(A | G_n) is a martingale with respect to the G_n. It is certainly bounded above by 1 and bounded below by 0, so by the martingale convergence theorem it must have a limit as n → ∞.

Once one has a definition of conditional probability, one defines conditional expectation by what one expects. If X is discrete, one can write X as Σ_j a_j 1_{A_j} and then one defines

E [X | G] = Σ_j a_j P(A_j | G).

If X is not discrete, one writes X = X^+ − X^-, approximates X^+ by discrete random variables, and takes a limit, and similarly for X^-. One has to worry about convergence, but everything does go through. With this extended definition of conditional expectation, do all the properties of Section 2 hold? The answer is yes; see Note 2 again.

With continuous random variables, we need to be more cautious about what we mean when we say two random variables are equal. We say X = Y almost surely, abbreviated "a.s.", if P({ω : X(ω) ≠ Y(ω)}) = 0. So X = Y except for a set of probability 0. The a.s. terminology is used other places as well: X_n → Y a.s. means that except for a set of ω's of probability zero, X_n(ω) → Y(ω).

Note 1. The best way to define expected value is via the theory of the Lebesgue integral. A probability P is a measure that has total mass 1, and we define

E X = ∫ X(ω) P(dω).
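The finitely generated set-up can be made concrete: take Ω = [0, 1] with Lebesgue measure and let G_n be generated by the dyadic intervals [i/2^n, (i+1)/2^n). Then P(A | G_n)(ω) is just the fraction of the dyadic interval containing ω that lies in A, and along a fixed ω the martingale M_n = P(A | G_n) settles down (here to 1_A(ω)). A small illustration; the choices A = [0, 1/3] and ω = 0.2 are arbitrary.

```python
from fractions import Fraction

def cond_prob(A_left, A_right, omega, n):
    """P(A | G_n)(omega) for A = [A_left, A_right], G_n generated by the
    dyadic intervals [i/2^n, (i+1)/2^n), P = Lebesgue measure on [0, 1]."""
    i = int(omega * 2 ** n)                     # index of the interval containing omega
    lo, hi = Fraction(i, 2 ** n), Fraction(i + 1, 2 ** n)
    overlap = max(Fraction(0), min(hi, Fraction(A_right)) - max(lo, Fraction(A_left)))
    return overlap / (hi - lo)                  # P(A ∩ B_ni) / P(B_ni)

# the martingale M_n = P(A | G_n) along one omega; it converges to 1_A(omega)
vals = [cond_prob(0, Fraction(1, 3), 0.2, n) for n in range(1, 12)]
print([float(v) for v in vals])
```

Once the dyadic interval around ω = 0.2 lies entirely inside A, the conditional probability is exactly 1 and stays there, which is the martingale convergence at work.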

Note 2. To recall how the definition of expectation goes in general: we say X is simple if X(ω) = Σ_{i=1}^m a_i 1_{A_i}(ω) with each a_i >= 0, and for a simple X we define

E X = Σ_{i=1}^m a_i P(A_i).

If X is nonnegative, we define E X = sup{E Y : Y simple, Y <= X}. Finally, we define E X = E X^+ − E X^-, provided at least one of E X^+ and E X^- is finite. This is the same definition as described above.

The Radon-Nikodym theorem from measure theory says that if Q and P are two finite measures on (Ω, G) and Q(A) = 0 whenever P(A) = 0 and A ∈ G, then there exists an integrable function Y that is G measurable such that Q(A) = ∫_A Y dP for every measurable set A.

Let us apply the Radon-Nikodym theorem to the following situation. Suppose (Ω, F, P) is a probability space, X >= 0 is integrable (E X < ∞), and G ⊂ F. Define two new measures on G as follows. Let P' = P|_G, that is, P'(A) = P(A) if A ∈ G and P'(A) is not defined if A ∈ F − G. Define Q by

Q(A) = ∫_A X dP = E [X; A] if A ∈ G.

One can show (using the monotone convergence theorem from measure theory) that Q is a finite measure on G. (One can also use this definition to define Q(A) for A ∈ F, but we only want to define Q on G, as we will see in a moment.) So Q and P' are two finite measures on (Ω, G). If A ∈ G and P'(A) = 0, then P(A) = 0 and so it follows that Q(A) = 0. By the Radon-Nikodym theorem there exists an integrable random variable Y such that Y is G measurable (this is why we worried about which σ-field we were working with) and Q(A) = ∫_A Y dP' if A ∈ G. Note (a) Y is G measurable, and (b) if A ∈ G, then E [Y; A] = E [X; A], because

E [Y; A] = E [Y 1_A] = ∫_A Y dP' = ∫_A Y dP = Q(A) = ∫_A X dP = E [X 1_A] = E [X; A].

We define E [X | G] to be the random variable Y. In the case where G is finitely or countably generated, the new and old definitions agree, up to almost sure equivalence. If X is integrable but not necessarily nonnegative, then X^+ and X^- will be integrable and we define E [X | G] = E [X^+ | G] − E [X^- | G]. We define P(B | G) = E [1_B | G] if B ∈ F.

If one checks the proofs of Propositions 2.3, 2.4, and 2.5, one sees that only properties (a) and (b) above were used. Under both the new and old definitions (a) and (b) hold, so the propositions hold for the new definition of conditional expectation as well.

Let us show that there is only one r.v., up to almost sure equivalence, that satisfies (a) and (b) above. Suppose Y and Z are both G measurable with E [Y; A] = E [Z; A] for all A ∈ G. Then the set A_n = (Y > Z + 1/n) will be in G, and so

E [Z; A_n] + (1/n) P(A_n) = E [Z + 1/n; A_n] <= E [Y; A_n] = E [Z; A_n].

Consequently P(A_n) = 0. This is true for each positive integer n, so P(Y > Z) = 0. By symmetry, P(Z > Y) = 0, and therefore P(Y ≠ Z) = 0, as we wished.

10. Stochastic processes.

We will be talking about stochastic processes. Previously we discussed sequences S_1, S_2, ... of r.v.'s; now we want to talk about processes Y_t for t >= 0, where any nonnegative time t is allowed. For example, we can think of S_t being the price of a stock at time t.

Typically, for each ω, the map t → Y_t(ω) defines a function from [0, ∞) to R. We say a stochastic process Y_t has continuous paths if this function is a continuous function of t for all ω's except for a set of probability zero.

We call a collection of σ-fields F_t with F_s ⊂ F_t if s < t a filtration. We typically let F_t be the smallest σ-field with respect to which Y_s is measurable for all s <= t; so F_t = σ(Y_s : s <= t), but see Note 1. As you might imagine, there are a few technicalities one has to worry about; we will try to avoid thinking about them as much as possible. We say the filtration satisfies the "usual conditions" if the F_t are right continuous and complete (see Note 1). Typically, all the filtrations we consider will satisfy the usual conditions.

Definition 10.1. A mapping τ : Ω → [0, ∞) is a stopping time if for each t we have (τ <= t) ∈ F_t.

Typically, τ will be a continuous random variable and P(τ = t) = 0 for each t, which is why we need a definition just a bit different from the discrete case. If (τ <= t) ∈ F_t for all t, then for a stopping time τ we have (τ < t) ∈ F_t for all t. The proof is easy: (τ < t) = ∪_{n=1}^∞ (τ <= t − 1/n) and (τ <= t − 1/n) ∈ F_{t−1/n} ⊂ F_t. Conversely, suppose τ is a nonnegative r.v. for which (τ < t) ∈ F_t for all t. We claim τ is a stopping time; here we need the right continuity of the F_t, so we put the proof in Note 2.

A continuous time martingale (or submartingale) is what one expects: each M_t is integrable, each M_t is F_t measurable, and if s < t, then

E [M_t | F_s] = M_s.

(Here we are saying the left hand side and the right hand side are equal almost surely; since almost all of our equalities for random variables hold only almost surely, we will usually not write the "a.s.") The analogues of Doob's theorems go through; Note 3 has the proofs.

Note 1. For technical reasons, one typically defines F_t as follows. Let F_t^0 = σ(Y_s : s <= t); this is what we referred to as F_t above. Next add to F_t^0 all sets N for which P(N) = 0. Such sets are called null sets, and since they have probability 0, they don't affect anything. In fact, one wants to add all sets N that we think of as being null sets, even though they might not be measurable. To be more precise, we say N is a null set if inf{P(A) : A ∈ F, N ⊂ A} = 0. (Recall we are starting with a σ-field F and all the F_t's are contained in F.) Let F_t^00 be the σ-field generated by F_t^0 and all null sets N, that is, the smallest σ-field containing F_t^0 and every null set. In measure theory terminology, what we have done is to say F_t^00 is the completion of F_t^0.

Lastly, we want to make our σ-fields right continuous. We set F_t = ∩_{ε>0} F_{t+ε}^00. Although the union of σ-fields is not necessarily a σ-field, the intersection of σ-fields is. F_t contains F_t^00 but might possibly contain more besides. An example of an event that is in F_t but that may not be in F_t^00 is

A = {ω : lim_{n→∞} Y_{t+1/n}(ω) >= 0}.

We have A ∈ F_{t+1/m}^00 for each m, so A ∈ F_t; there is no reason A needs to be in F_t^00 if Y is not necessarily continuous at t. It is easy to see that ∩_{ε>0} F_{t+ε} = F_t, which is what we mean when we say F_t is right continuous.

When talking about a stochastic process Y_t, there are various types of measurability one can consider. Saying Y_t is adapted to F_t means Y_t is F_t measurable for each t. However, since Y_t is really a function of two variables, t and ω, there are other notions of measurability that come into play. We will be considering stochastic processes that have continuous paths or that are predictable (the definition will be given later), so these various types of measurability will not be an issue for us.

Note 2. Suppose (τ < t) ∈ F_t for all t. Then for each positive integer n_0,

(τ <= t) = ∩_{n=n_0}^∞ (τ < t + 1/n).

For n >= n_0 we have (τ < t + 1/n) ∈ F_{t+1/n} ⊂ F_{t+1/n_0}. Therefore (τ <= t) ∈ F_{t+1/n_0} for each n_0. Hence the set is in the intersection: (τ <= t) ∈ ∩_{n_0>1} F_{t+1/n_0} ⊂ ∩_{ε>0} F_{t+ε} = F_t.

Note 3. We want to prove the analogues of Theorems 5.3 and 5.4. The proofs of Doob's inequalities are simpler; we will only need the analogue of Theorem 5.4(b).

Theorem 10.2. Suppose M_t is a martingale with continuous paths and E M_t^2 < ∞ for all t. Then for each t_0,

E [(sup_{s<=t_0} M_s)^2] <= 4 E [|M_{t_0}|^2].

Proof. By the definition of martingale in continuous time, N_k is a martingale in discrete time with respect to G_k when we set N_k = M_{k t_0/2^n} and G_k = F_{k t_0/2^n}. By Theorem 5.4(b),

E [max_{0<=k<=2^n} M_{k t_0/2^n}^2] = E [max_{0<=k<=2^n} N_k^2] <= 4 E N_{2^n}^2 = 4 E M_{t_0}^2.

(Recall (max_k a_k)^2 = max_k a_k^2 if all the a_k >= 0.) Now let n → ∞. Since M_t has continuous paths, max_{0<=k<=2^n} M_{k t_0/2^n}^2 increases up to sup_{s<=t_0} M_s^2. Our result follows from the monotone convergence theorem from measure theory (see Note 4).

We now prove the analogue of Theorem 5.3. The proof is simpler if we assume that E M_t^2 is finite; the result is still true without this assumption.

Theorem 10.3. Suppose M_t is a martingale with continuous paths, E M_t^2 < ∞ for all t, and τ is a stopping time bounded almost surely by t_0. Then E M_τ = E M_{t_0}.

Proof. We approximate τ by stopping times taking only finitely many values. For n > 0 define

τ_n(ω) = inf{k t_0/2^n : τ(ω) < k t_0/2^n}.

τ_n takes only the values k t_0/2^n for some k <= 2^n. The event (τ_n <= j t_0/2^n) is equal to (τ < j t_0/2^n), which is in F_{j t_0/2^n} since τ is a stopping time. So (τ_n <= s) ∈ F_s if s is of the form j t_0/2^n for some j. A moment's thought, using the fact that τ_n only takes values of the form k t_0/2^n, shows that τ_n is a stopping time. It is clear that τ_n ↓ τ for every ω. Since M_t has continuous paths, M_{τ_n} → M_τ a.s.

Let N_k and G_k be as in the proof of Theorem 10.2, and let σ_n = k if τ_n = k t_0/2^n. By Theorem 5.3, E N_{σ_n} = E N_{2^n}, which is the same as saying E M_{τ_n} = E M_{t_0}.

To complete the proof, we need to show E M_{τ_n} converges to E M_τ. This is almost obvious, because we already observed that M_{τ_n} → M_τ a.s. Without the assumption that E M_t^2 < ∞ for all t, this is actually quite a bit of work, but with the assumption it is not too bad. Either |M_{τ_n} − M_τ| is less than or equal to 1 or greater than 1; if it is greater than 1, it is less than |M_{τ_n} − M_τ|^2. So in either case,

|M_{τ_n} − M_τ| <= 1 + |M_{τ_n} − M_τ|^2.   (10.1)

Because both |M_{τ_n}| and |M_τ| are bounded by sup_{s<=t_0} |M_s|, the right hand side of (10.1) is bounded by 1 + 4 sup_{s<=t_0} |M_s|^2, which is integrable by Theorem 10.2. |M_{τ_n} − M_τ| → 0 a.s., and so by the dominated convergence theorem from measure theory (Note 4), E |M_{τ_n} − M_τ| → 0. Finally,

|E M_{τ_n} − E M_τ| = |E (M_{τ_n} − M_τ)| <= E |M_{τ_n} − M_τ| → 0.

Note 4. The dominated convergence theorem says that if X_n → X a.s. and |X_n| <= Y a.s. for each n, where E Y < ∞, then E X_n → E X. The monotone convergence theorem says that if X_n >= 0 for each n, X_n <= X_{n+1} for each n, and X_n → X, then E X_n → E X.
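The optional stopping conclusion E M_τ = E M_{t_0} can be spot-checked by simulation: discretize a Brownian motion (a continuous-path martingale), stop each path the first time it reaches some level, capped at t_0, and compare the average stopped value with E M_{t_0} = 0. A rough sketch; the grid size, level, and path count are arbitrary choices.

```python
import random

random.seed(1)
t0, steps, paths, level = 1.0, 200, 5000, 1.0
dt = t0 / steps
total = 0.0
for _ in range(paths):
    w, stopped = 0.0, None
    for _ in range(steps):
        w += random.gauss(0.0, dt ** 0.5)      # Brownian increment, N(0, dt)
        if stopped is None and w >= level:
            stopped = w                         # tau: first time the path reaches 1
    total += stopped if stopped is not None else w   # M_{tau}, with tau capped at t0
mean = total / paths
print(mean)   # close to 0 = E M_{t0}
```

Stopped paths freeze near 1 while unstopped paths end wherever they land, yet the average is still 0: the early gains are exactly offset, as the theorem predicts.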


11. Brownian motion.

First, let us review a few facts about normal random variables. We say X is a normal random variable with mean a and variance b^2 if

P(c <= X <= d) = ∫_c^d (2π b^2)^{-1/2} e^{-(y−a)^2 / 2b^2} dy,

and we will abbreviate this by saying X is N(a, b^2). If X is N(a, b^2), then E X = a, Var X = b^2, and E |X|^p < ∞ for every positive integer p. Moreover

E e^{tX} = e^{at} e^{t^2 b^2 / 2}.

Let S_n be a simple symmetric random walk. This means that Y_k = S_k − S_{k−1} equals +1 with probability 1/2, equals −1 with probability 1/2, and is independent of Y_j for j < k. We notice that E S_n = 0 while

E S_n^2 = Σ_i E Y_i^2 + Σ_{i≠j} E Y_i Y_j = n,

using the fact that E [Y_i Y_j] = (E Y_i)(E Y_j) = 0 when i ≠ j.

Define X_t^n = S_{nt}/√n if nt is an integer, and by linear interpolation for other t. If nt is an integer, E X_t^n = 0 and E (X_t^n)^2 = t. It turns out X_t^n does not converge for any ω; however there is another kind of convergence, called weak convergence, that takes place. See Note 1 for more discussion of weak convergence. There exists a process Z_t such that

(1) the paths of Z_t are continuous as a function of t;
(2) for each k, each t_1 < t_2 < ··· < t_k, and each a_1 < b_1, a_2 < b_2, ..., a_k < b_k,

P(X_{t_1}^n ∈ [a_1, b_1], ..., X_{t_k}^n ∈ [a_k, b_k]) → P(Z_{t_1} ∈ [a_1, b_1], ..., Z_{t_k} ∈ [a_k, b_k]).

(This result follows from the central limit theorem.)

The limit Z_t is called a Brownian motion starting at 0. It has the following properties (see Note 2 for a few remarks on this definition):

(1) E Z_t = 0.
(2) E Z_t^2 = t.
(3) Z_t − Z_s is independent of F_s = σ(Z_r : r <= s).
(4) Z_t − Z_s has the distribution of a normal random variable with mean 0 and variance t − s; this means

P(Z_t − Z_s ∈ [a, b]) = ∫_a^b (2π(t − s))^{-1/2} e^{-y^2 / 2(t−s)} dy.

(5) The map t → Z_t(ω) is continuous for almost all ω.

It is common to use B_t ("B" for Brownian) or W_t ("W" for Wiener, who was the first person to prove rigorously that Brownian motion exists). We will most often use W_t.
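The scaling S_{nt}/√n is easy to simulate directly. The sketch below checks the two moments quoted above, E X_t = 0 and E (X_t)^2 = t, at t = 1; the walk length and replication count are arbitrary.

```python
import random

random.seed(2)

def scaled_walk(n, t):
    """X_t = S_{nt} / sqrt(n) for the simple symmetric random walk S."""
    steps = int(n * t)
    return sum(random.choice((-1, 1)) for _ in range(steps)) / n ** 0.5

n, reps = 400, 5000
xs = [scaled_walk(n, 1.0) for _ in range(reps)]
mean = sum(xs) / reps
var = sum(x * x for x in xs) / reps
print(mean, var)   # near 0 and near t = 1
```

A histogram of the same sample would already look like the N(0, 1) density, which is the central limit theorem behind property (2).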

We will use Brownian motion extensively and develop some of its properties. As one might imagine for a limit of a simple random walk, the paths of Brownian motion have a huge number of oscillations. It turns out that the function t → W_t(ω) is continuous, but it is not differentiable; in fact one cannot define a derivative at any value of t. Another bizarre property: if one looks at the set of times at which W_t(ω) is equal to 0, this is a set which is uncountable, but contains no intervals. There is nothing special about 0; the same is true for the set of times at which W_t(ω) is equal to a for any level a.

In what follows, one of the crucial properties of a Brownian motion is that it is a martingale with continuous paths. Let us prove this.

Proposition 11.1. W_t is a martingale with respect to F_t and W_t has continuous paths.

Proof. As part of the definition of a Brownian motion, W_t has continuous paths, and W_t is F_t measurable by the definition of F_t. Since the distribution of W_t is that of a normal random variable with mean 0 and variance t, E |W_t| < ∞ for all t. (In fact, E |W_t|^n < ∞ for all n.) The key property is to show E [W_t | F_s] = W_s:

E [W_t | F_s] = E [W_t − W_s | F_s] + E [W_s | F_s] = E [W_t − W_s] + W_s = W_s.

We used here the facts that W_t − W_s is independent of F_s, that E [W_t − W_s] = 0 because W_t and W_s have mean 0, and that W_s is F_s measurable.

We will also need

Proposition 11.2. W_t^2 − t is a martingale with continuous paths with respect to F_t.

Proof. That W_t^2 − t is integrable and is F_t measurable is as in the above proof. We calculate

E [W_t^2 − t | F_s] = E [((W_t − W_s) + W_s)^2 | F_s] − t
    = E [(W_t − W_s)^2 | F_s] + 2 E [(W_t − W_s) W_s | F_s] + E [W_s^2 | F_s] − t
    = E [(W_t − W_s)^2] + 2 W_s E [W_t − W_s | F_s] + W_s^2 − t.

We used the facts that W_s is F_s measurable and that (W_t − W_s)^2 is independent of F_s because W_t − W_s is. The first term, because W_t − W_s is normal with mean 0 and variance t − s, is equal to t − s; the second term is equal to 2 W_s E [W_t − W_s] = 0. Substituting, the last line is equal to

(t − s) + 0 + W_s^2 − t = W_s^2 − s,

as required.
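Proposition 11.2 says E [W_t^2 − t | F_s] = W_s^2 − s. Conditionally on F_s, W_t is W_s plus an independent N(0, t − s) increment, so the identity can be spot-checked by Monte Carlo for one fixed conditioning value; the choice W_s = 0.7 below is hypothetical.

```python
import random

random.seed(3)
s, t, ws = 1.0, 2.0, 0.7     # condition on the (hypothetical) value W_s = 0.7
reps = 100_000
acc = 0.0
for _ in range(reps):
    wt = ws + random.gauss(0.0, (t - s) ** 0.5)   # W_t given F_s
    acc += wt * wt - t
print(acc / reps, ws * ws - s)   # the two numbers agree
```

The simulation averages W_t^2 − t over the conditional law of W_t and lands on W_s^2 − s, exactly the martingale identity.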

Note 1. A sequence of random variables X_n converges weakly to X if P(a < X_n < b) → P(a < X < b) for all a, b ∈ [−∞, ∞] such that P(X = a) = P(X = b) = 0. Here a and b can be infinite; if X_n converges to a normal random variable, then P(X = a) = P(X = b) = 0 automatically holds for all a and b. This is the type of convergence that takes place in the central limit theorem. It will not be true in general that X_n converges to X almost surely.

For a sequence of random vectors (X_1^n, ..., X_k^n) to converge to a random vector (X_1, ..., X_k), one can give an analogous definition. A result from probability theory says that X_n converges to X weakly if and only if E [f(X_n)] → E [f(X)] whenever f is a bounded continuous function on R.

We use this to define weak convergence for stochastic processes. Let C([0, ∞)) be the collection of all continuous functions from [0, ∞) to the reals. This is a metric space, so the notion of a function from C([0, ∞)) to R being continuous makes sense. One example of such a function F would be F(f) = sup_{0<=t<∞} |f(t)| for f ∈ C([0, ∞)); another would be F(f) = ∫_0^1 f(t) dt. We say that the processes X^n converge weakly to the process Z, and mean by this that E [F(X^n)] → E [F(Z)] whenever F is a bounded continuous function on C([0, ∞)).

Saying that the normalized random walks X^n(t) above converge weakly to Z_t actually says more than (2). The reason one wants to show that X^n converges weakly to Z instead of just showing (2) is that weak convergence can be shown to imply that Z has continuous paths.

Note 2. First of all, there is some redundancy in the definition of Brownian motion: one can show that parts of the definition are implied by the remaining parts, but we won't worry about this. Second, the "almost all" in (5) means that t → Z_t(ω) is continuous for all ω, except for a set of ω of probability zero. Finally, we actually want to let F_t be the completion of σ(Z_s : s <= t), that is, we throw in all the null sets into each F_t. One can prove that the resulting F_t are right continuous, and hence the filtration F_t satisfies the "usual" conditions.

12. Stochastic integrals.

If one wants to consider the (deterministic) integral ∫_0^t f(s) dg(s), where f and g are continuous and g is continuously differentiable, we can define it analogously to the usual Riemann integral as the limit of Riemann sums Σ_{i=1}^n f(s_i)[g(s_i) − g(s_{i−1})], where s_1 < s_2 < ··· < s_n is a partition of [0, t]. This is known as the Riemann-Stieltjes integral. One can show (using the mean value theorem, for example) that

∫_0^t f(s) dg(s) = ∫_0^t f(s) g'(s) ds.

If we were to take f(s) = 1_[0,a](s) (which is not continuous, but that is a minor matter here), one would expect the following:

∫_0^t 1_[0,a](s) dg(s) = ∫_0^t 1_[0,a](s) g'(s) ds = ∫_0^a g'(s) ds = g(a) − g(0).

Note that although we use the fact that g is differentiable in the intermediate stages, the first and last terms make sense for any g.

We now want to replace g by a Brownian path and f by a random integrand. The expression ∫ f(s) dW(s) does not make sense as a Riemann-Stieltjes integral because it is a fact that W(s) is not differentiable as a function of s. We need to define the expression by some other means; we will show that it can be defined as the limit in L^2 of Riemann sums. The resulting integral is called a stochastic integral.

Let us consider a very special case first. Suppose f is continuous and deterministic (i.e., does not depend on ω), and we take a Riemann sum approximation

I_n = Σ_{i=0}^{2^n − 1} f(i/2^n) [W((i+1)/2^n) − W(i/2^n)].

Since W_t has zero expectation for each t, E I_n = 0. Let us calculate the second moment:

E I_n^2 = E ( Σ_i f(i/2^n) [W((i+1)/2^n) − W(i/2^n)] )^2
    = E Σ_i f(i/2^n)^2 [W((i+1)/2^n) − W(i/2^n)]^2
      + E Σ_{i≠j} f(i/2^n) f(j/2^n) [W((i+1)/2^n) − W(i/2^n)] [W((j+1)/2^n) − W(j/2^n)].   (12.1)

Since the second moment of W((i+1)/2^n) − W(i/2^n) is 1/2^n, the first sum is

Σ_i f(i/2^n)^2 (1/2^n) ≈ ∫_0^1 f(t)^2 dt.
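The computation E I_n^2 ≈ ∫_0^1 f(t)^2 dt for deterministic f can be verified by simulating the Riemann sums. The sketch below takes f(t) = t, for which ∫_0^1 f(t)^2 dt = 1/3; the partition size and number of replications are arbitrary.

```python
import random

random.seed(4)

def f(t):
    return t        # deterministic integrand; integral of f^2 over [0, 1] is 1/3

n, reps = 256, 10_000          # 256 subintervals of [0, 1]
acc = 0.0
for _ in range(reps):
    I = 0.0
    for i in range(n):
        # Brownian increment over [i/n, (i+1)/n] has variance 1/n
        I += f(i / n) * random.gauss(0.0, (1.0 / n) ** 0.5)
    acc += I * I
est = acc / reps
print(est)   # near 1/3
```

Each simulated I is one Riemann sum I_n; averaging I^2 over many Brownian paths reproduces the second-moment identity, the prototype of the Ito isometry.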

For the second sum in (12.1), using the independence of the increments and the fact that they have mean zero, for i ≠ j

E [W((i+1)/2^n) − W(i/2^n)] [W((j+1)/2^n) − W(j/2^n)]
    = E [W((i+1)/2^n) − W(i/2^n)] E [W((j+1)/2^n) − W(j/2^n)] = 0,

and so the second sum on the right hand side of (12.1) is zero. This calculation is the key to the stochastic integral.

Before we proceed we will need to define the quadratic variation of a continuous martingale. We will use the following theorem without proof, because in our applications we can construct the desired increasing process directly.

Theorem 12.1. Suppose M_t is a continuous martingale such that E M_t^2 < ∞ for all t. There exists one and only one increasing process A_t that is adapted to F_t, has continuous paths, and has A_0 = 0 such that M_t^2 − A_t is a martingale.

We use the notation ⟨M⟩_t for the increasing process given in Theorem 12.1 and call it the quadratic variation process of M. The simplest example of such a martingale is Brownian motion: if W_t is a Brownian motion, we saw in Proposition 11.2 that W_t^2 − t is a martingale, so in this case A_t = t almost surely, for all t. Hence ⟨W⟩_t = t. We will see later that in the case of stochastic integrals it turns out that ⟨N⟩_t = ∫_0^t H_s^2 ds, where N_t = ∫_0^t H_s dW_s.

We often say a process is a continuous process if its paths are continuous, and similarly a continuous martingale is a martingale with continuous paths.

We now turn to the construction. Let W_t be a Brownian motion. We will only consider integrands H_s such that H_s is F_s measurable for each s (see Note 1). We will construct ∫_0^t H_s dW_s for all H with

E ∫_0^t H_s^2 ds < ∞ for all t.   (12.2)

We will use the following frequently; in fact, these are the only two properties of Brownian motion that play a significant role in the construction.

Lemma 12.1. (a) E [W_b − W_a | F_a] = 0.
(b) E [W_b^2 − W_a^2 | F_a] = E [(W_b − W_a)^2 | F_a] = b − a.

Proof. (a) This is E [W_b − W_a] = 0 by the independence of W_b − W_a from F_a and the fact that W_b and W_a have mean zero.

(b) To prove the first equality, we write

E [W_b^2 − W_a^2 | F_a] = E [((W_b − W_a) + W_a)^2 | F_a] − E [W_a^2 | F_a]
    = E [(W_b − W_a)^2 | F_a] + 2 E [W_a (W_b − W_a) | F_a] + E [W_a^2 | F_a] − E [W_a^2 | F_a]
    = E [(W_b − W_a)^2 | F_a] + 2 W_a E [W_b − W_a | F_a],

and the first equality follows by applying (a). Since (W_b − W_a)^2 is independent of F_a, the conditional expectation is the same as E [(W_b − W_a)^2]; since W_b − W_a is a N(0, b − a), the second equality in (b) follows.

We construct the stochastic integral in three steps. We first construct it for H elementary; the work here is showing the stochastic integral is a martingale. We next construct the integral for H simple, and here the difficulty is calculating the second moment. Finally we consider the case of general H.

First step. We say an integrand H_s = H_s(ω) is elementary if

H_s(ω) = G(ω) 1_(a,b](s),

where 0 <= a < b and G is bounded and F_a measurable. For such H we define the stochastic integral to be the process N_t, where

N_t = G (W_{t∧b} − W_{t∧a}).

Compare this to the first paragraph of this section, where we considered Riemann-Stieltjes integrals.

Proposition 12.2. N_t is a continuous martingale, E N_∞^2 = E [G^2 (b − a)], and

⟨N⟩_t = ∫_0^t G^2 1_[a,b](s) ds.

Proof. The continuity is clear. Let us look at E [N_t | F_s]. In the case s < a < t < b, E [N_t | F_s] is equal to

E [G (W_t − W_a) | F_s] = E [G E [W_t − W_a | F_a] | F_s] = 0 = N_s.

In the case a < s < t < b, E [N_t | F_s] is equal to

E [G (W_t − W_a) | F_s] = G E [W_t − W_a | F_s] = G (W_s − W_a) = N_s.

The other possibilities (s < t < a < b, s < a < b < t, a < s < b < t, and a < b < s < t) are done similarly, so N_t is a martingale.

For E N_∞^2, we have, using Lemma 12.1(b),

E N_∞^2 = E [G^2 E [(W_b − W_a)^2 | F_a]] = E [G^2 E [W_b^2 − W_a^2 | F_a]] = E [G^2 (b − a)].

For ⟨N⟩_t, we need to show

E [G^2 (W_{t∧b} − W_{t∧a})^2 − G^2 (t∧b − t∧a) | F_s] = G^2 (W_{s∧b} − W_{s∧a})^2 − G^2 (s∧b − s∧a).

We do this by checking all six cases for the relative locations of a, b, s, and t; we do one of the cases in Note 2, and the others are done similarly.
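The identity ⟨W⟩_t = t also shows up pathwise: the sum of squared increments of W over a partition of [0, t] concentrates at t as the partition refines. A quick sketch (partition sizes arbitrary):

```python
import random

random.seed(5)
t = 1.0
qvs = []
for n in (6, 9, 12):                  # refine the partition: 2^n increments
    m = 2 ** n
    # each increment is N(0, t/m); sum the squared increments
    qvs.append(sum(random.gauss(0.0, (t / m) ** 0.5) ** 2 for _ in range(m)))
print(qvs)   # approaches t = 1 as the mesh shrinks
```

Each squared increment has mean t/m, so the sum has mean t, and its fluctuations shrink like 1/sqrt(m); this is the concrete face of the quadratic variation process.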

Second step. We say H is simple if it is a finite linear combination of elementary processes, that is,

H_s(ω) = Σ_{i=1}^n G_i(ω) 1_(a_i, b_i](s).   (12.3)

We may rewrite H so that the intervals (a_i, b_i] satisfy a_1 <= b_1 <= a_2 <= b_2 <= ··· <= b_n. For example, if we had a_1 < a_2 < b_1 < b_2, we could write

H_s = G_1 1_(a_1, a_2] + (G_1 + G_2) 1_(a_2, b_1] + G_2 1_(b_1, b_2],

and then if we set G_1' = G_1, G_2' = G_1 + G_2, G_3' = G_2 and a_1' = a_1, b_1' = a_2, a_2' = a_2, b_2' = b_1, a_3' = b_1, b_3' = b_2, we have written H as Σ_{i=1}^3 G_i' 1_(a_i', b_i']. So now we have H simple but with the intervals (a_i, b_i] non-overlapping.

In this case define the stochastic integral

N_t = ∫_0^t H_s dW_s = Σ_{i=1}^n G_i (W_{b_i ∧ t} − W_{a_i ∧ t}).

Proposition 12.3. N_t is a continuous martingale,

E N_∞^2 = E ∫_0^∞ H_s^2 ds, and ⟨N⟩_t = ∫_0^t H_s^2 ds.

Proof. The sum of continuous processes will be continuous, so N_t has continuous paths; since the sum of martingales is clearly a martingale, N_t is a martingale. For E N_∞^2, we have

E N_∞^2 = E Σ_i G_i^2 (W_{b_i} − W_{a_i})^2 + 2 E Σ_{i<j} G_i G_j (W_{b_i} − W_{a_i})(W_{b_j} − W_{a_j}).

The terms in the second sum vanish, because when we condition on F_{a_j} (for i < j),

E [G_i G_j (W_{b_i} − W_{a_i})(W_{b_j} − W_{a_j})] = E [G_i G_j (W_{b_i} − W_{a_i}) E [W_{b_j} − W_{a_j} | F_{a_j}]] = 0.

For the terms in the first sum, by Lemma 12.1,

E [G_i^2 (W_{b_i} − W_{a_i})^2] = E [G_i^2 E [(W_{b_i} − W_{a_i})^2 | F_{a_i}]] = E [G_i^2 (b_i − a_i)].

So

E N_∞^2 = Σ_{i=1}^n E [G_i^2 (b_i − a_i)],

and this is the same as E ∫_0^∞ H_s^2 ds. The argument for ⟨N⟩_t is similar.

Third step. Now suppose H_s is adapted and E ∫_0^t H_s^2 ds < ∞ for all t. Using some results from measure theory (Note 3), we can choose simple processes H_s^n such that E ∫_0^∞ (H_s^n − H_s)^2 ds → 0. Define N_t^n = ∫_0^t H_s^n dW_s using Step 2. By Doob's inequality (Theorem 10.2) we have

E [sup_t (N_t^n − N_t^m)^2] = E [sup_t (∫_0^t (H_s^n − H_s^m) dW_s)^2]
    <= 4 E [(∫_0^∞ (H_s^n − H_s^m) dW_s)^2] = 4 E ∫_0^∞ (H_s^n − H_s^m)^2 ds,

and the triangle inequality (see Note 3 again) implies

E ∫_0^∞ (H_s^n − H_s^m)^2 ds → 0 as n, m → ∞.

This should look reminiscent of the definition of Cauchy sequences, and in fact that is what is going on here. In the present context Cauchy sequences converge, and one can show (Note 3) that there exists a process N_t such that

E [sup_t (∫_0^t H_s^n dW_s − N_t)^2] → 0.

If H_s^n and H_s'^n are two sequences converging to H, then E (∫_0^t (H_s^n − H_s'^n) dW_s)^2 = E ∫_0^t (H_s^n − H_s'^n)^2 ds → 0, so the limit is independent of which sequence H^n we choose. See Note 4 for the proof that N_t is a martingale, that E N_t^2 = E ∫_0^t H_s^2 ds, and that ⟨N⟩_t = ∫_0^t H_s^2 ds.

Because sup_t [∫_0^t H_s^n dW_s − N_t] → 0 in L^2, one can show there exists a subsequence such that the convergence takes place almost surely, and with probability one N_t has continuous paths (Note 5).

We write N_t = ∫_0^t H_s dW_s and call N_t the stochastic integral of H with respect to W.

We discuss some extensions of the definition. First of all, if we replace W_t by a continuous martingale M_t and H_s is adapted with E ∫_0^t H_s^2 d⟨M⟩_s < ∞, we can duplicate everything we just did (see Note 6) with ds replaced by d⟨M⟩_s and get a stochastic integral. In particular, if d⟨M⟩_s = K_s ds, we replace ds by K_s ds. There are some other extensions of the definition that are not hard. If the random variable ∫_0^∞ H_s^2 d⟨M⟩_s is finite but without its expectation being finite, we can define the stochastic integral by defining it for t <= T_N for suitable stopping times T_N and then letting T_N → ∞; Note 7 has more on this.

A process A_t is of bounded variation if the paths of A_t have bounded variation. This means that one can write A_t = A_t^+ − A_t^-, where A_t^+ and A_t^- have paths that are increasing; |A|_t is then defined to be A_t^+ + A_t^-. A semimartingale is the sum of a martingale and a process of bounded variation. If ∫_0^∞ H_s^2 d⟨M⟩_s + ∫_0^∞ |H_s| |dA_s| < ∞ and X_t = M_t + A_t, we define

∫_0^t H_s dX_s = ∫_0^t H_s dM_s + ∫_0^t H_s dA_s,

where the first integral on the right is a stochastic integral and the second is a Riemann-Stieltjes or Lebesgue-Stieltjes integral. For a semimartingale, we define ⟨X⟩_t = ⟨M⟩_t. Given two semimartingales X and Y we define ⟨X, Y⟩_t by what is known as polarization:

⟨X, Y⟩_t = (1/2) [⟨X + Y⟩_t − ⟨X⟩_t − ⟨Y⟩_t].

As an example, if X_t = ∫_0^t H_s dW_s and Y_t = ∫_0^t K_s dW_s, then (X + Y)_t = ∫_0^t (H_s + K_s) dW_s, so

⟨X + Y⟩_t = ∫_0^t (H_s + K_s)^2 ds = ∫_0^t H_s^2 ds + ∫_0^t 2 H_s K_s ds + ∫_0^t K_s^2 ds.

Since ⟨X⟩_t = ∫_0^t H_s^2 ds, with a similar formula for ⟨Y⟩_t, we conclude

⟨X, Y⟩_t = ∫_0^t H_s K_s ds.

The following holds, which is what one would expect.

Proposition 12.4. Suppose K_s is adapted to F_s and E ∫_0^∞ K_s^2 ds < ∞, and let N_t = ∫_0^t K_s dW_s. Suppose H_s is adapted and E ∫_0^∞ H_s^2 d⟨N⟩_s < ∞. Then E ∫_0^∞ H_s^2 K_s^2 ds < ∞ and

∫_0^t H_s dN_s = ∫_0^t H_s K_s dW_s.

The argument for the proof is given in Note 8.

What does a stochastic integral mean? If one thinks of the derivative of Z_t as being a white noise, then ∫_0^t H_s dZ_s is like a filter that increases or decreases the volume by a factor H_s. This gives an indication of where the name comes from.

For us, an interpretation is that Z_t represents a stock price. Then ∫_0^t H_s dZ_s represents our profit (or loss) if we hold H_s shares at time s. This can be seen most easily if H_s = G 1_(a,b](s): we buy G(ω) shares at time a and sell them at time b, and the stochastic integral G(Z_b − Z_a) is exactly our profit or loss. Since we are in continuous time, we are allowed to buy and sell continuously and instantaneously. What we are not allowed to do is look into the future to make our decisions, which is where the condition that H_s be adapted comes in.

Note 1. Let us be more precise concerning the measurability of H that is needed. H is a stochastic process, so can be viewed as a map from [0, ∞) × Ω to R by H : (s, ω) → H_s(ω). We define a σ-field P on [0, ∞) × Ω as follows. Consider the collection of processes of the form G(ω) 1_(a,b](s) where G is bounded and F_a measurable for some a < b. Define P to be the smallest σ-field with respect to which every process of this form is measurable. P is called the predictable or previsible σ-field, and if a process H is measurable with respect to P, then the process is called predictable.

If H_s has paths which are left continuous, then H_t = lim_{n→∞} H_{t−1/n}, and we can "predict" the value of H_t from the values at times that come before t. If one is slightly more careful, approximating continuous functions by step functions shows that such an H can be approximated by linear combinations of processes of the form G(ω) 1_(a,b](s). So processes whose paths are functions which are continuous from the left at each time point are predictable; in particular, continuous processes are predictable. If H_t is only right continuous and a path has a jump at time t, this is not possible. The majority of the integrands we will consider will be continuous.

Note 2. We need to show

E [G^2 (W_t − W_a)^2 − G^2 (t − a) | F_s] = G^2 (W_s − W_a)^2 − G^2 (s − a).   (12.4)

Let us consider the case a < s < t < b; again, similar arguments take care of the other five cases.
If one is slightly more careful. t For us. Consider the collection of processes of the form G(ω)1(a. Note 2. then approximating continuous functions by step functions shows that such an H can be approximated by linear combinations of processes of the form G(ω)1(a. and if a process H is measurable with respect to P. We need to show E [G2 (Wt − Wa )2 − G2 (t − a) | Fs ] = G2 (Ws − Wa )2 − G2 (s − a). We deﬁne a σ-ﬁeld P on [0.4. Let us be more precise concerning the measurability of H that is needed. then the process is called predictable. which is where the Hs adapted condition comes in.

The left hand side is equal to G^2 E[(W_t - W_a)^2 - (t - a) | F_s]. We write this as

G^2 E[((W_t - W_s) + (W_s - W_a))^2 - (t - a) | F_s]
   = G^2 { E[(W_t - W_s)^2 | F_s] + 2E[(W_t - W_s)(W_s - W_a) | F_s] + E[(W_s - W_a)^2 | F_s] - (t - a) }
   = G^2 { E[(W_t - W_s)^2] + 2(W_s - W_a)E[W_t - W_s | F_s] + (W_s - W_a)^2 - (t - a) }
   = G^2 { (t - s) + 0 + (W_s - W_a)^2 - (t - a) }
   = G^2 { (W_s - W_a)^2 - (s - a) }.

The last expression is equal to the right hand side of (12.4).

Note 3. Let us define a norm on stochastic processes. A definition from measure theory says that if µ is a measure, the L^2 norm of f with respect to the measure µ is defined as

||f||_2 = ( ∫ f(x)^2 µ(dx) )^{1/2}.

The space L^2 is defined to be the set of functions f for which ||f||_2 < ∞. (A technical proviso: one has to identify as equal functions which differ only on a set of measure 0.) One can show that this is a norm, and hence that the triangle inequality holds. If one defines a distance between two functions f and g by d(f, g) = ||f - g||_2, this is a metric on the space L^2, and a theorem from measure theory says that L^2 is complete with respect to this metric. This means that if f_n is a Cauchy sequence, that is, if given ε there exists n_0 such that ||f_n - f_m||_2 < ε whenever n, m ≥ n_0, then the Cauchy sequence converges. Another theorem from measure theory says that the collection of simple functions (functions of the form Σ_{i=1}^n c_i 1_{A_i}) is dense in L^2 with respect to the metric.

For stochastic processes, define

||H||_2 = ( E ∫_0^∞ H_s^2 ds )^{1/2}.

This can be viewed as a standard L^2 norm, namely, the L^2 norm with respect to the measure µ defined on P by

µ(A) = E ∫_0^∞ 1_A(s, ω) ds.

Since the set of simple functions with respect to µ is dense in L^2, this says that if H is measurable with respect to P and ||H||_2 < ∞, then there exist simple processes H^n that are also measurable with respect to P such that ||H^n - H||_2 → 0.

We can define another norm on stochastic processes. Define

||N|| = ( E sup_{0≤t<∞} N_t^2 )^{1/2};

this is essentially an L^2 norm, and the space of processes N such that ||N|| < ∞ is complete with respect to this norm. This means that if N^n is a Cauchy sequence, there exists N with ||N|| < ∞ such that ||N^n - N|| → 0.

Note 4. We have ||N^n - N|| → 0, where the norm is described in Note 3. Each N^n is a stochastic integral of the type described in Step 2 of the construction, hence each N^n_t is a martingale. First, ||N|| is finite: take ε = 1 and choose n_0 such that ||N^n - N|| < 1 if n ≥ n_0; by the triangle inequality,

||N|| ≤ ||N^{n_0}|| + ||N^{n_0} - N|| ≤ ||N^{n_0}|| + 1 < ∞,

since ||N^n|| is finite for each n. Now let s < t and A ∈ F_s. By Cauchy-Schwarz,

|E[N^n_t , A] - E[N_t , A]| ≤ E[ |N^n_t - N_t| 1_A ] ≤ ( E[(N^n_t - N_t)^2] )^{1/2} ( E[1_A^2] )^{1/2} ≤ ||N^n - N|| → 0.     (12.5)

We have a similar limit when t is replaced by s. Since each N^n is a martingale, E[N^n_t , A] = E[N^n_s , A], so taking the limit in (12.5) yields E[N_t , A] = E[N_s , A]. Since N_s is F_s measurable and has the same expectation over sets A ∈ F_s as N_t does, then by Proposition 4.3, E[N_t | F_s] = N_s, or N_t is a martingale.

That N_t^2 - <N>_t is a martingale is similar to the proof that N_t is a martingale, but slightly more delicate: in place of (12.5) one writes

|E[(N^n_t)^2 , A] - E[(N_t)^2 , A]| ≤ E[ |(N^n_t)^2 - (N_t)^2| ] ≤ E[ |N^n_t - N_t| |N^n_t + N_t| ].     (12.6)

By Cauchy-Schwarz this is less than

( E[(N^n_t - N_t)^2] )^{1/2} ( E[(N^n_t + N_t)^2] )^{1/2},     (12.7)

which tends to 0 since the second factor is bounded independently of n. We leave the rest of the proof to the reader.

Note 5. Suppose ||N^n - N|| → 0. This means that E[ sup_t |N^n_t - N_t|^2 ] → 0. A result from measure theory implies that there exists a subsequence n_k such that

sup_t |N^{n_k}_t - N_t|^2 → 0, a.s.

So except for a set of ω's of probability 0, N^{n_k}_t(ω) converges to N_t(ω) uniformly in t. Each N^{n_k}_t(ω) is continuous by Step 2, and the uniform limit of continuous functions is continuous; therefore N_t(ω) is a continuous function of t. Incidentally, this is the primary reason we considered Doob's inequalities.

Note 6. We can do something similar if M_t is a martingale but where we do not have E <M>_∞ < ∞. Let S_K = inf{t : |M_t| ≥ K}, the first time |M_t| is larger than or equal to K, and let M^K_t = M_{t∧S_K}, where t ∧ S_K = min(t, S_K). One can show M^K is a martingale bounded in absolute value by K, so the construction of the stochastic integral ∫_0^t H_s dM^K_s goes through exactly as above. One can show that if t ≤ S_{K_1} and t ≤ S_{K_2}, then the value of the stochastic integral will be the same no matter whether we use M^{K_1} or M^{K_2} as our martingale. We have S_K → ∞ as K → ∞, so we have a definition of J_t = ∫_0^t H_s dM^K_s for each t, and we use the common value as a definition of the stochastic integral J_t.

Similarly, suppose ∫_0^t H_s^2 d<M>_s is finite for every t, even if the expectation of the integral is not. If we let T_K = inf{t > 0 : ∫_0^t H_s^2 d<M>_s ≥ K}, the first time the integral is larger than or equal to K, and we let H^K_s = H_s 1_{(s ≤ T_K)}, then ∫_0^∞ (H^K_s)^2 d<M>_s ≤ K and there is no difficulty defining N^K_t = ∫_0^t H^K_s dM_s for every t. Again, one can show that if t ≤ T_{K_1} and t ≤ T_{K_2}, then N^{K_1}_t = N^{K_2}_t a.s. Since T_K → ∞ as K → ∞, we can use the common value N_t as a definition of the stochastic integral; this allows one to define the stochastic integral N_t for each t in the case where ∫_0^t H_s^2 d<M>_s is finite for every t.

Note 7. If M_t is a continuous martingale, then

E[M_b - M_a | F_a] = E[M_b | F_a] - M_a = M_a - M_a = 0,

and, since M_t^2 - <M>_t is a martingale,

E[(M_b - M_a)^2 | F_a] = E[M_b^2 | F_a] - 2E[M_b M_a | F_a] + E[M_a^2 | F_a]
   = E[M_b^2 | F_a] - 2M_a E[M_b | F_a] + M_a^2
   = E[M_b^2 - <M>_b | F_a] + E[<M>_b - <M>_a | F_a] - M_a^2 + <M>_a
   = E[<M>_b - <M>_a | F_a],

where we used E[M_b^2 - <M>_b | F_a] = M_a^2 - <M>_a. This is the analogue of Lemma 12.1(a). That

E[M_b^2 - M_a^2 | F_a] = E[<M>_b - <M>_a | F_a]

is just a rewriting, and is the analogue of Lemma 12.1(b). With these two properties in place of Lemma 12.1, the construction goes through, replacing W_s by M_s and ds by d<M>_s.

Note 8. We only outline how the proof goes. To show

∫_0^t H_s dN_s = ∫_0^t H_s K_s dW_s,     (12.8)

one shows that (12.8) holds for H_s elementary and then takes limits. To show (12.8) for H_s elementary, it suffices, by linearity, to look at the case where K_s is elementary as well: first prove the equality when K_s is elementary, use linearity to extend it to the case when K is simple, and then take limits to obtain it for arbitrary K. Thus one reduces the proof to showing (12.8) when both H and K are elementary, and in this situation one can explicitly write out both sides of the equation and see that they are equal.

13. Ito's formula.

Suppose W_t is a Brownian motion and f : R → R is a C^2 function, that is, f and its first two derivatives are continuous. Ito's formula, which is sometimes known as the change of variables formula, says that

f(W_t) - f(W_0) = ∫_0^t f'(W_s) dW_s + (1/2) ∫_0^t f''(W_s) ds.

Compare this with the fundamental theorem of calculus:

f(t) - f(0) = ∫_0^t f'(s) ds.

In Ito's formula we have a second order term to carry along.

The idea behind the proof is quite simple. By Taylor's theorem,

f(W_t) - f(W_0) = Σ_{i=0}^{n-1} [f(W_{(i+1)t/n}) - f(W_{it/n})]
   ≈ Σ_{i=0}^{n-1} f'(W_{it/n})(W_{(i+1)t/n} - W_{it/n}) + (1/2) Σ_{i=0}^{n-1} f''(W_{it/n})(W_{(i+1)t/n} - W_{it/n})^2.

The first sum on the right is approximately the stochastic integral and the second is approximately the quadratic variation.

For a more general semimartingale X_t = M_t + A_t, Ito's formula reads

Theorem 13.1. If X_t = M_t + A_t is a semimartingale and f ∈ C^2, then

f(X_t) - f(X_0) = ∫_0^t f'(X_s) dX_s + (1/2) ∫_0^t f''(X_s) d<M>_s.

Let us look at an example. Let W_t be Brownian motion, let X_t = σW_t - σ^2 t/2, and let f(x) = e^x. Then <X>_t = <σW>_t = σ^2 t and f'(x) = f''(x) = e^x, so

e^{σW_t - σ^2 t/2} = 1 + ∫_0^t e^{X_s} σ dW_s - (1/2) ∫_0^t e^{X_s} σ^2 ds + (1/2) ∫_0^t e^{X_s} σ^2 ds     (13.1)
   = 1 + ∫_0^t e^{X_s} σ dW_s.

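The martingale property of the exponential in (13.1) can be sanity-checked numerically. The following sketch is our own illustration, not part of the notes, and the parameter values (σ = 0.3, t = 2) are made up: it samples W_t directly and checks that E[e^{σW_t − σ²t/2}] = 1, as the absence of a ds term in (13.1) predicts.

```python
import numpy as np

# Monte Carlo check: S_t = exp(sigma*W_t - sigma^2*t/2) solves dS = sigma*S dW,
# so it is a martingale and E[S_t] = S_0 = 1 for every t.
rng = np.random.default_rng(0)
sigma, t, n = 0.3, 2.0, 200_000
W_t = rng.normal(0.0, np.sqrt(t), size=n)       # W_t ~ N(0, t)
S_t = np.exp(sigma * W_t - 0.5 * sigma**2 * t)  # exponential martingale
print(S_t.mean())  # close to 1
```

Note that without the compensating term σ²t/2 the mean would instead be e^{σ²t/2}, which is exactly the second-order correction Ito's formula accounts for.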
This example will be revisited many times later on.

Let us give another example of the use of Ito's formula. Let X_t = W_t and let f(x) = x^k. Then f'(x) = kx^{k-1} and f''(x) = k(k-1)x^{k-2}, and we have

W_t^k = W_0^k + ∫_0^t kW_s^{k-1} dW_s + (1/2) ∫_0^t k(k-1)W_s^{k-2} d<W>_s
   = ∫_0^t kW_s^{k-1} dW_s + (k(k-1)/2) ∫_0^t W_s^{k-2} ds.

When k = 3, this says W_t^3 - 3∫_0^t W_s ds is a stochastic integral with respect to a Brownian motion, and hence a martingale.

There is a multidimensional version of Ito's formula: if X_t = (X_t^1, . . . , X_t^d) is a vector, each component of which is a semimartingale, and f ∈ C^2, then

f(X_t) - f(X_0) = Σ_{i=1}^d ∫_0^t (∂f/∂x_i)(X_s) dX_s^i + (1/2) Σ_{i,j=1}^d ∫_0^t (∂^2 f/∂x_i ∂x_j)(X_s) d<X^i, X^j>_s.

For a semimartingale X_t = M_t + A_t we set <X>_t = <M>_t; given two semimartingales X, Y, we define

<X, Y>_t = (1/2)[ <X + Y>_t - <X>_t - <Y>_t ].

The following is known as Ito's product formula. It may also be viewed as an integration by parts formula.

Proposition 13.2. If X_t and Y_t are semimartingales, then

X_t Y_t = X_0 Y_0 + ∫_0^t X_s dY_s + ∫_0^t Y_s dX_s + <X, Y>_t.

Proof. Applying Ito's formula with f(x) = x^2 to X and to Y, then

X_t^2 = X_0^2 + 2∫_0^t X_s dX_s + <X>_t   and   Y_t^2 = Y_0^2 + 2∫_0^t Y_s dY_s + <Y>_t.

Applying Ito's formula with f(x) = x^2 to X_t + Y_t, we obtain

(X_t + Y_t)^2 = (X_0 + Y_0)^2 + 2∫_0^t (X_s + Y_s)(dX_s + dY_s) + <X + Y>_t.

Then some algebra and the fact that X_t Y_t = (1/2)[(X_t + Y_t)^2 - X_t^2 - Y_t^2] yields the formula.

The following application of Ito's formula, known as Lévy's theorem, is important.

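The Taylor-expansion idea behind the proof of Ito's formula can also be seen on a computer. This sketch (our illustration, not from the notes) discretizes the case f(x) = x², where Ito's formula says W_t² = 2∫₀ᵗ W_s dW_s + t; the left-endpoint Riemann sum plays the role of the stochastic integral and the sum of squared increments plays the role of the quadratic variation.

```python
import numpy as np

# Discretized Ito's formula for f(x) = x^2:
#   W_T^2 = 2 * int_0^T W_s dW_s + <W>_T,  with <W>_T = T.
rng = np.random.default_rng(1)
T, n = 1.0, 100_000
dW = rng.normal(0.0, np.sqrt(T / n), size=n)
W = np.concatenate(([0.0], np.cumsum(dW)))
stoch_int = np.sum(W[:-1] * dW)   # left-endpoint (Ito) Riemann sum
quad_var = np.sum(dW**2)          # approximates <W>_T = T
print(W[-1]**2, 2 * stoch_int + quad_var)  # the two sides agree
```

The identity W_T² = 2·Σ W_i ΔW_i + Σ (ΔW_i)² is exact algebraically; the probabilistic content is that Σ (ΔW_i)² concentrates around T as the mesh shrinks.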
Theorem 13.3. Suppose M_t is a continuous martingale with <M>_t = t. Then M_t is a Brownian motion.

Before proving this, recall from undergraduate probability that the moment generating function of a r.v. X is defined by m_X(a) = E e^{aX}, and that if two random variables have the same moment generating function, they have the same law. This is also true if we replace a by iu: in this case we have φ_X(u) = E e^{iuX}, and φ_X is called the characteristic function of X. The reason for looking at the characteristic function is that φ_X always exists, whereas m_X(a) might be infinite. The one special case we will need is that if X is a normal r.v. with mean 0 and variance t, then φ_X(u) = e^{-u^2 t/2}. This follows from the formula for m_X(a) with a replaced by iu (this can be justified rigorously).

Proof. We will prove that M_t is a N(0, t); for the remainder of the proof, see Note 1. We apply Ito's formula with f(x) = e^{iux}. Then

e^{iuM_t} = 1 + ∫_0^t iu e^{iuM_s} dM_s + (1/2) ∫_0^t (-u^2) e^{iuM_s} d<M>_s.

Taking expectations and using <M>_s = s and the fact that a stochastic integral is a martingale, hence has 0 expectation, we have

E e^{iuM_t} = 1 - (u^2/2) ∫_0^t E e^{iuM_s} ds.

Let J(t) = E e^{iuM_t}. The equation can be rewritten

J(t) = 1 - (u^2/2) ∫_0^t J(s) ds,

so J'(t) = -(1/2)u^2 J(t) with J(0) = 1. The solution to this elementary ODE is J(t) = e^{-u^2 t/2}. Since E e^{iuM_t} = e^{-u^2 t/2}, by our remarks above the law of M_t must be that of a N(0, t), which shows that M_t is a mean 0 variance t normal r.v.

Note 1. If A ∈ F_s and we do the same argument with M_t replaced by M_{s+t} - M_s, we have

e^{iu(M_{s+t} - M_s)} = 1 + ∫_0^t iu e^{iu(M_{s+r} - M_s)} dM_r + (1/2) ∫_0^t (-u^2) e^{iu(M_{s+r} - M_s)} d<M>_r.

Multiply this by 1_A and take expectations. Since a stochastic integral is a martingale, the stochastic integral term again has expectation 0. If we let K(t) = E[e^{iu(M_{t+s} - M_s)} , A], we now arrive at K'(t) = -(1/2)u^2 K(t) with K(0) = P(A), so K(t) = P(A)e^{-u^2 t/2}.

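The characteristic function identity used in the proof of Lévy's theorem can be checked by simulation. This sketch is our own illustration (the values of u and t are arbitrary): it compares the empirical average of e^{iuW_t} with e^{−u²t/2}.

```python
import numpy as np

# Monte Carlo check of E[exp(iu W_t)] = exp(-u^2 t / 2) for W_t ~ N(0, t).
rng = np.random.default_rng(2)
u, t, n = 1.5, 0.7, 400_000
W_t = rng.normal(0.0, np.sqrt(t), size=n)
emp = np.exp(1j * u * W_t).mean()          # empirical characteristic function
print(emp, np.exp(-u**2 * t / 2))          # real, and close to each other
```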
Therefore

E[e^{iu(M_{t+s} - M_s)} , A] = e^{-u^2 t/2} P(A).

Taking A = Ω shows E e^{iu(M_{t+s} - M_s)} = e^{-u^2 t/2}, so

E[e^{iu(M_{t+s} - M_s)} , A] = E[e^{iu(M_{t+s} - M_s)}] P(A).     (13.2)

If f is a nice function and f^ is its Fourier transform, replace u in the above by -u, multiply by f^(u), and integrate over u. (To do the integral, we approximate the integral by a Riemann sum and then take limits.) We then have

E[f(M_{s+t} - M_s) , A] = E[f(M_{s+t} - M_s)] P(A).

By taking limits we have this for f = 1_B, so

P(M_{s+t} - M_s ∈ B, A) = P(M_{s+t} - M_s ∈ B) P(A).

This implies that M_{s+t} - M_s is independent of F_s. Note Var(M_t - M_s) = t - s.

14. The Girsanov theorem.

Suppose P is a probability, W_t is a Brownian motion under P, and

dX_t = dW_t + µ(X_t) dt.

This is short hand for

X_t = X_0 + W_t + ∫_0^t µ(X_s) ds.     (14.1)

Under P, W_t is a Brownian motion and X_t is not. Let

M_t = exp( - ∫_0^t µ(X_s) dW_s - ∫_0^t µ(X_s)^2 ds / 2 ).     (14.2)

Then, as we have seen before, by Ito's formula M_t is a martingale. We also observe that M_0 = 1. Now let us define a new probability by setting

Q(A) = E[M_t , A]   if A ∈ F_t.     (14.3)

We had better be sure this Q is well defined. If A ∈ F_s ⊂ F_t, then E[M_t , A] = E[M_s , A] because M_t is a martingale. We also check that Q(Ω) = E[M_t , Ω] = E M_t = E M_0 = 1.

What the Girsanov theorem says is

Theorem 14.1. Under Q, X_t is a Brownian motion.

In order for a process X_t to be a Brownian motion, we need at a minimum that X_t is mean zero and variance t. To define mean and variance, we need a probability, and similarly the conditional expectation of a random variable depends on what probability is being used. Most of the other parts of the definition of being a Brownian motion also depend on the probability. Therefore a process might be a Brownian motion with respect to one probability and not another; in particular, under Q the process W_t is no longer a Brownian motion.

There is a more general version of the Girsanov theorem.

Theorem 14.2. If X_t is a martingale under P, then under Q the process X_t - D_t is a martingale, where

D_t = ∫_0^t (1/M_s) d<X, M>_s.

Moreover, <X>_t is the same under both P and Q.

Let us see how Theorem 14.2 can be used. Let S_t be the stock price, and suppose

dS_t = σS_t dW_t + mS_t dt.

(So in the above formulation, µ(x) = m for all x.) Define

M_t = e^{(-m/σ)W_t - (m^2/2σ^2)t}.

Then from (13.1), M_t is a martingale and

M_t = 1 + ∫_0^t -(m/σ) M_s dW_s.

Let X_t = W_t. Then

<X, M>_t = - ∫_0^t (m/σ) M_s d<W>_s = - ∫_0^t (m/σ) M_s ds.

Therefore

∫_0^t (1/M_s) d<X, M>_s = - ∫_0^t (m/σ) ds = -(m/σ)t.

Define Q by (14.3). By Theorem 14.2, under Q the process W~_t = W_t + (m/σ)t is a martingale. Hence

dS_t = σS_t (dW_t + (m/σ) dt) = σS_t dW~_t,

or

S_t = S_0 + ∫_0^t σS_s dW~_s

is a martingale under Q. So we have found a probability under which the asset price is a martingale. This means that Q is the risk-neutral probability.

Let us give another example of the use of the Girsanov theorem. Suppose X_t = W_t + µt, where µ is a constant. We want to compute the probability that X_t exceeds the level a by time t_0. We first need the probability that a Brownian motion crosses a level a by time t_0 (note we are not looking at |W_t|). If A_t = sup_{s≤t} W_s, we have

P(A_t > a, c ≤ W_t ≤ d) = ∫_c^d φ(t, a, x) dx,     (14.4)

where

φ(t, a, x) = (1/√(2πt)) e^{-x^2/2t}   if x ≥ a,
φ(t, a, x) = (1/√(2πt)) e^{-(2a-x)^2/2t}   if x < a.

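The risk-neutral conclusion above can be tested numerically: under Q the price S_t should be a martingale, which in terms of P means E_P[M_t S_t] = S_0. This sketch is our own illustration (the parameter values S_0 = 100, σ = 0.2, m = 0.1, T = 1 are made up), and it takes the interest rate to be zero as in the example.

```python
import numpy as np

# Girsanov check: with dS = sigma*S dW + m*S dt under P and density
# M_T = exp(-(m/sigma)*W_T - (m/sigma)^2 * T/2), the reweighted price
# satisfies E_P[M_T * S_T] = E_Q[S_T] = S_0.
rng = np.random.default_rng(3)
S0, sigma, m, T, n = 100.0, 0.2, 0.1, 1.0, 500_000
W = rng.normal(0.0, np.sqrt(T), size=n)
S_T = S0 * np.exp(sigma * W + (m - sigma**2 / 2) * T)      # (16.2) with mu = m
M_T = np.exp(-(m / sigma) * W - (m / sigma)**2 * T / 2)    # Girsanov density
print((M_T * S_T).mean())  # close to S0 = 100
```

By contrast, S_T.mean() alone is close to S0·e^{mT}: under P the price is not a martingale, which is why the change of measure is needed.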
This is called the reflection principle, and the name is due to the derivation, given in Note 2. Sometimes one says

P(W_t = x, A_t > a) = P(W_t = 2a - x),   x < a,

but this is not precise, because W_t is a continuous random variable and both sides of the above equation are zero; (14.4) is the rigorous version of the reflection principle.

Now let W_t be a Brownian motion under P, and let dQ/dP = M_t = e^{µW_t - µ^2 t/2}. Let Y_t = W_t - µt. Theorem 14.1 says that under Q, Y_t is a Brownian motion. We have W_t = Y_t + µt.

We want to calculate P(sup_{s≤t_0} (W_s + µs) ≥ a). W_t is a Brownian motion under P while Y_t is a Brownian motion under Q, so this probability is equal to

Q( sup_{s≤t_0} (Y_s + µs) ≥ a ) = Q( sup_{s≤t_0} W_s ≥ a ) = Q(A),

where A = (sup_{s≤t_0} W_s ≥ a). Now we use the expression for M_t:

Q(A) = E_P[ e^{µW_{t_0} - µ^2 t_0/2} , A ]
   = ∫_{-∞}^{∞} e^{µx - µ^2 t_0/2} P( sup_{s≤t_0} W_s ≥ a, W_{t_0} = x ) dx
   = e^{-µ^2 t_0/2} [ ∫_{-∞}^{a} e^{µx} (1/√(2πt_0)) e^{-(2a-x)^2/2t_0} dx + ∫_{a}^{∞} e^{µx} (1/√(2πt_0)) e^{-x^2/2t_0} dx ].

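Evaluating the two Gaussian integrals in the expression for Q(A) gives a closed form for the crossing probability of drifted Brownian motion, namely Φ((µt₀ − a)/√t₀) + e^{2µa} Φ((−a − µt₀)/√t₀), where Φ is the standard normal distribution function. The following sketch (our illustration; the values µ = 0.5, a = 1, t₀ = 1 are made up) compares this with a crude, slightly biased Monte Carlo estimate based on discretized paths.

```python
import numpy as np
from math import erf, exp, sqrt

# Closed form for P(sup_{s<=t0} (W_s + mu*s) >= a), obtained by evaluating
# the two integrals in the Girsanov computation above.
Phi = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))
mu, a, t0 = 0.5, 1.0, 1.0
closed = Phi((mu * t0 - a) / sqrt(t0)) + exp(2 * mu * a) * Phi((-a - mu * t0) / sqrt(t0))

# Monte Carlo with discretized paths; the discrete maximum slightly
# undershoots the continuous one, so a small downward bias is expected.
rng = np.random.default_rng(4)
paths, steps = 10_000, 400
dX = rng.normal(mu * t0 / steps, sqrt(t0 / steps), size=(paths, steps))
X = np.cumsum(dX, axis=1)
mc = (X.max(axis=1) >= a).mean()
print(closed, mc)
```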
Proof of Theorem 14.1. Using Ito's formula with f(x) = e^x (see Note 1), we have

M_t = 1 - ∫_0^t µ(X_r) M_r dW_r,

so

<W, M>_t = - ∫_0^t µ(X_r) M_r dr.

Let A ∈ F_s with s < t. Since Q(A) = E_P[M_t , A], it is not hard to see that E_Q[W_t , A] = E_P[M_t W_t , A]. By Ito's product formula this is

E_P[ ∫_0^t M_r dW_r , A ] + E_P[ ∫_0^t W_r dM_r , A ] + E_P[ <W, M>_t , A ].

Since ∫_0^t M_r dW_r and ∫_0^t W_r dM_r are stochastic integrals with respect to martingales, they are themselves martingales. Thus the above is equal to

E_P[ ∫_0^s M_r dW_r , A ] + E_P[ ∫_0^s W_r dM_r , A ] + E_P[ <W, M>_t , A ].

Using the product formula again, this is

E_P[M_s W_s , A] + E_P[ <W, M>_t - <W, M>_s , A ] = E_Q[W_s , A] + E_P[ <W, M>_t - <W, M>_s , A ].

The last term on the right is equal to

E_P[ ∫_s^t d<W, M>_r , A ] = E_P[ - ∫_s^t µ(X_r) M_r dr , A ]
   = E_P[ - ∫_s^t E_P[M_t | F_r] µ(X_r) dr , A ]
   = E_Q[ - ∫_s^t µ(X_r) dr , A ].

Therefore

E_Q[ W_t + ∫_0^t µ(X_r) dr , A ] = E_Q[ W_s + ∫_0^s µ(X_r) dr , A ],

which shows X_t is a martingale with respect to Q. Similarly, X_t^2 - t is a martingale with respect to Q. By Lévy's theorem, X_t is a Brownian motion under Q.

In Note 3 we give a proof of Theorem 14.2, and in Note 4 we show how Theorem 14.1 is really a special case of Theorem 14.2.

Note 1. Let

Y_t = - ∫_0^t µ(X_s) dW_s - (1/2) ∫_0^t [µ(X_s)]^2 ds,

so that M_t = e^{Y_t}. We apply Ito's formula with the function f(x) = e^x.

Then f'(x) = e^x, f''(x) = e^x, and hence

M_t = e^{Y_t} = e^{Y_0} + ∫_0^t e^{Y_s} dY_s + (1/2) ∫_0^t e^{Y_s} d<Y>_s
   = 1 + ∫_0^t M_s ( -µ(X_s) dW_s - (1/2)[µ(X_s)]^2 ds ) + (1/2) ∫_0^t M_s [-µ(X_s)]^2 ds
   = 1 - ∫_0^t M_s µ(X_s) dW_s.

Note the martingale part of Y_t is the stochastic integral term, and the quadratic variation of Y is the quadratic variation of the martingale part, so <Y>_t = ∫_0^t [-µ(X_s)]^2 ds. Since stochastic integrals with respect to a Brownian motion are martingales, this completes the argument that M_t is a martingale.

Note 2. Let S_n be a simple random walk. This means that X_1, X_2, . . . are independent and identically distributed random variables with P(X_i = 1) = P(X_i = -1) = 1/2, and S_n = Σ_{i=1}^n X_i. If you are playing a game where you toss a fair coin and win $1 if it comes up heads and lose $1 if it comes up tails, then S_n will be your fortune at time n. Let A_n = max_{0≤k≤n} S_k. We will show the analogue of (14.4) for S_n, namely

P(S_n = x, A_n ≥ a) = P(S_n = x)   if x ≥ a,
P(S_n = x, A_n ≥ a) = P(S_n = 2a - x)   if x < a.     (14.5)

(14.4) can be derived from this using a weak convergence argument.

To establish (14.5), note that if x ≥ a and S_n = x, then automatically A_n ≥ a, so the only case to consider is when x < a. Any path that crosses a but is at level x at time n has a corresponding path determined by reflecting across level a at the first time the walk hits a; the reflected path will end up at a + (a - x) = 2a - x. The probability on the left hand side of (14.5) is the number of paths that hit a and end up at x divided by the total number of paths. Since the number of paths that hit a and end up at x is equal to the number of paths that end up at 2a - x, the probability on the left is equal to the number of paths that end up at 2a - x divided by the total number of paths, which is P(S_n = 2a - x), the right hand side.

Note 3. To prove Theorem 14.2, we proceed as follows. Assume without loss of generality that X_0 = 0. Then if A ∈ F_s with s < t,

E_Q[X_t , A] = E_P[M_t X_t , A]
   = E_P[ ∫_0^t M_r dX_r , A ] + E_P[ ∫_0^t X_r dM_r , A ] + E_P[ <X, M>_t , A ]
   = E_P[ ∫_0^s M_r dX_r , A ] + E_P[ ∫_0^s X_r dM_r , A ] + E_P[ <X, M>_t , A ]
   = E_Q[X_s , A] + E_P[ <X, M>_t - <X, M>_s , A ].

Here we used the fact that stochastic integrals with respect to the martingales X and M are again martingales.

M r . Hence by Theorem 14. A s = EP = E P [(Dt − Ds )Mt .Here we used the fact that stochastic integrals with respect to the martingales X and M are again martingales. By L´vy’s theorem.2 we see that under Q. A = EP s t = EP s t E P [Mt | Fr ] dDr . M t = −Mt µ(Xt )dt. The proof of the quadratic variation assertion is similar. A] = E P s t d X. t E P [ X. Xt is a continuous martingale with X t = t.1 can also be derived from Theorem 14. this means that X is a Brownian motion under Q.2. A]. Note 4. e 69 . and therefore d X. A] = E Q [Dt − Ds . From our formula for M we have dMt = −Mt µ(Xt )dWt . Here is an argument showing how Theorem 14. M t − X. A Mt dDr . A Mr dDr . On the other hand. M s .

15. Stochastic differential equations.

Let W_t be a Brownian motion. We are interested in the existence and uniqueness for stochastic differential equations (SDEs) of the form

dX_t = σ(X_t) dW_t + b(X_t) dt,   X_0 = x_0.     (15.1)

This means X_t satisfies

X_t = x_0 + ∫_0^t σ(X_s) dW_s + ∫_0^t b(X_s) ds.     (15.2)

The intuition behind (15.1) is that X_t behaves locally like a multiple of Brownian motion plus a constant drift: locally

X_{t+h} - X_t ≈ σ(W_{t+h} - W_t) + b((t + h) - t).

However the constants σ and b depend on the current value of X_t; when X_t is at different points, the coefficients vary, which is why they are written σ(X_t) and b(X_t). σ is sometimes called the diffusion coefficient and b is sometimes called the drift coefficient.

We have to make some assumptions on σ and b. We assume they are Lipschitz, which means:

|σ(x) - σ(y)| ≤ c|x - y|,   |b(x) - b(y)| ≤ c|x - y|

for some constant c. We also suppose that σ and b grow at most linearly, which means:

|σ(x)| ≤ c(1 + |x|),   |b(x)| ≤ c(1 + |x|).

Theorem 15.1. There exists one and only one solution to (15.2), and (15.2) holds for almost every ω.

The idea of the proof is Picard iteration, which is how existence and uniqueness for ordinary differential equations is proved; see Note 1.

The above theorem also works in higher dimensions. We want to solve

dX_t^i = Σ_{j=1}^d σ_{ij}(X_s) dW_s^j + b_i(X_s) ds,   i = 1, . . . , d.

This is an abbreviation for the equation

X_t^i = x_0^i + ∫_0^t Σ_{j=1}^d σ_{ij}(X_s) dW_s^j + ∫_0^t b_i(X_s) ds.

Here the initial value is x_0 = (x_0^1, . . . , x_0^d), the solution process is X_t = (X_t^1, . . . , X_t^d), and W_t^1, . . . , W_t^d are d independent Brownian motions. If all of the σ_{ij} and b_i are Lipschitz and grow at most linearly, we have existence and uniqueness for the solution.

Suppose one wants to solve

dZ_t = aZ_t dW_t + bZ_t dt.

Note that this equation is linear in Z_t, and it turns out that linear equations are almost the only ones that have an explicit solution. In this case we can write down the explicit solution and then verify that it satisfies the SDE. Let

Z_t = Z_0 e^{aW_t - a^2 t/2 + bt}.

We will verify that this is correct by using Ito's formula. Let X_t = aW_t - a^2 t/2 + bt. Then X_t is a semimartingale with martingale part aW_t and <X>_t = a^2 t, and Z_t = Z_0 e^{X_t}. By Ito's formula with f(x) = e^x,

Z_t = Z_0 + ∫_0^t Z_s dX_s + (1/2) ∫_0^t Z_s d<X>_s
   = Z_0 + ∫_0^t aZ_s dW_s - ∫_0^t (a^2/2) Z_s ds + ∫_0^t bZ_s ds + (1/2) ∫_0^t a^2 Z_s ds
   = Z_0 + ∫_0^t aZ_s dW_s + ∫_0^t bZ_s ds.

This is the integrated form of the equation we wanted to solve. The uniqueness result above (Theorem 15.1) shows that we have in fact found the solution.

There is a connection between SDEs and partial differential equations. Let f be a C^2 function. If we apply Ito's formula,

f(X_t) = f(X_0) + ∫_0^t f'(X_s) dX_s + (1/2) ∫_0^t f''(X_s) d<X>_s.

From (15.2) we know <X>_t = ∫_0^t σ(X_s)^2 ds. If we substitute for dX_s and d<X>_s, we obtain

f(X_t) = f(X_0) + ∫_0^t f'(X_s)σ(X_s) dW_s + ∫_0^t f'(X_s)b(X_s) ds + (1/2) ∫_0^t f''(X_s)σ(X_s)^2 ds
   = f(X_0) + ∫_0^t f'(X_s)σ(X_s) dW_s + ∫_0^t Lf(X_s) ds,

where we write

Lf(x) = (1/2)σ(x)^2 f''(x) + b(x)f'(x).

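In practice, solutions of SDEs like (15.2) are approximated numerically by the Euler-Maruyama scheme, which directly mimics the local picture X_{t+h} − X_t ≈ σ(X_t)(W_{t+h} − W_t) + b(X_t)h. This sketch is our own illustration (parameter values made up): it applies the scheme to the linear equation dZ = aZ dW + bZ dt solved above and compares E[Z_T] with the exact value Z_0 e^{bT} implied by the explicit solution.

```python
import numpy as np

# Euler-Maruyama for dZ = a*Z dW + b*Z dt.
# Exact solution: Z_t = Z_0 * exp(a*W_t + (b - a^2/2)*t), so E[Z_T] = Z_0*e^{bT}.
rng = np.random.default_rng(5)
a, b, Z0, T = 0.2, 0.05, 1.0, 1.0
steps, paths = 500, 100_000
dt = T / steps
Z = np.full(paths, Z0)
for _ in range(steps):
    dW = rng.normal(0.0, np.sqrt(dt), size=paths)
    Z = Z + a * Z * dW + b * Z * dt   # one Euler-Maruyama step per path
print(Z.mean(), Z0 * np.exp(b * T))   # the two are close
```

For non-linear σ and b there is no explicit solution to compare with, but the same one-line update applies with σ(Z) and b(Z) evaluated at the current state.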
L is an example of a differential operator. Since the stochastic integral with respect to a Brownian motion is a martingale, we see from the above that

f(X_t) - f(X_0) - ∫_0^t Lf(X_s) ds

is a martingale. This fact can be exploited to derive results about PDEs from SDEs and vice versa.

Note 1. Let us illustrate the uniqueness part, and for simplicity, assume b is identically 0 and σ is bounded.

Proof of uniqueness. If X and Y are two solutions,

X_t - Y_t = ∫_0^t [σ(X_s) - σ(Y_s)] dW_s.

So, using the Lipschitz hypothesis on σ,

E|X_t - Y_t|^2 = E ∫_0^t |σ(X_s) - σ(Y_s)|^2 ds ≤ c ∫_0^t E|X_s - Y_s|^2 ds.

Since we are assuming σ is bounded,

E X_t^2 = E ∫_0^t (σ(X_s))^2 ds ≤ ct,

and similarly for E Y_t^2; hence if we let g(t) = E|X_t - Y_t|^2, then g is bounded by some constant A. We have

g(t) ≤ c ∫_0^t g(s) ds ≤ c ∫_0^t c ∫_0^s g(r) dr ds.

Iteration implies g(t) ≤ A(ct)^n/n! for each n, which implies g must be 0. That is, X_t = Y_t a.s. for each t, and the solution is unique.

16. Continuous time financial models.

The most common model by far in finance is one where the security price is based on a Brownian motion. One does not want to say the price is some multiple of Brownian motion for two reasons. First of all, a Brownian motion can become negative, which doesn't make sense for stock prices. Second, if one invests $1,000 in a stock selling for $1 and it goes up to $2, one has the same profit as if one invests $1,000 in a stock selling for $100 and it goes up to $200. It is the proportional increase one wants.

Therefore one sets ∆S_t/S_t to be the quantity related to a Brownian motion. In addition, one expects a mean rate of return µ on one's investment that is positive (otherwise, why not just put the money in the bank?). In fact, one expects the mean rate of return to be higher than the risk-free interest rate r because one expects something in return for undertaking risk. Also, different stocks have different volatilities σ (consider a high-tech stock versus a pharmaceutical).

So the model that is used is to let the stock price be modeled by the SDE

dS_t/S_t = σ dW_t + µ dt,

or what looks better,

dS_t = σS_t dW_t + µS_t dt.     (16.1)

This model has proved to be a very good one.

Fortunately this SDE is one of those that can be solved explicitly, and in fact we gave the solution in Section 15.

Proposition 16.1. The solution to (16.1) is given by

S_t = S_0 e^{σW_t + (µ - σ^2/2)t}.     (16.2)

Proof. Using Theorem 15.1, there will only be one solution, so we need to verify that S_t as given in (16.2) satisfies (16.1). We already did this, but it is important enough that we will do it again. Let us first assume S_0 = 1. Let X_t = σW_t + (µ - σ^2/2)t, let f(x) = e^x, and apply Ito's formula. We obtain

S_t = e^{X_t} = e^{X_0} + ∫_0^t e^{X_s} dX_s + (1/2) ∫_0^t e^{X_s} d<X>_s
   = 1 + ∫_0^t S_s σ dW_s + ∫_0^t S_s (µ - σ^2/2) ds + (1/2) ∫_0^t S_s σ^2 ds
   = 1 + ∫_0^t S_s σ dW_s + ∫_0^t S_s µ ds,

which is (16.1). For general S_0, just multiply both sides by S_0.

Suppose for the moment that the interest rate r is 0. If one purchases ∆_0 shares (possibly a negative number) at time t_0, then changes the investment to ∆_1 shares at time t_1, then changes the investment to ∆_2 at time t_2, and so on, then one's wealth at time t will be

X_{t_0} + ∆_0(S_{t_1} - S_{t_0}) + ∆_1(S_{t_2} - S_{t_1}) + · · · + ∆_i(S_{t_{i+1}} - S_{t_i}).     (16.3)

To see this, note that at time t_0 one has the original wealth X_{t_0}. One buys ∆_0 shares and the cost is ∆_0 S_{t_0}. At time t_1 one sells the ∆_0 shares for the price of S_{t_1} per share, and so one's wealth is now X_{t_0} + ∆_0(S_{t_1} - S_{t_0}). One now pays ∆_1 S_{t_1} for ∆_1 shares at time t_1 and continues. The right hand side of (16.3) is the same as

X_{t_0} + ∫_{t_0}^t ∆(s) dS_s,

where we have t ≥ t_{i+1} and ∆(s) = ∆_i for t_i ≤ s < t_{i+1}. In other words, our wealth is given by a stochastic integral with respect to the stock price. The requirement that the integrand of a stochastic integral be adapted is very natural: we cannot base the number of shares we own at time s on information that will not be available until the future.

The continuous time model of finance is that the security price is given by (16.1) (often called geometric Brownian motion), that there are no transaction costs, and that one can trade as many shares as one wants and vary the amount held in a continuous fashion. This clearly is not the way the market actually works; for example, stock prices are discrete.

How should we modify this when the interest rate r is not zero? Let P_t be the present value of the stock price, so P_t = e^{-rt} S_t. Note that P_0 = S_0. By Ito's product formula,

dP_t = e^{-rt} dS_t - re^{-rt} S_t dt
   = e^{-rt} σS_t dW_t + e^{-rt} µS_t dt - re^{-rt} S_t dt
   = σP_t dW_t + (µ - r)P_t dt.     (16.4)

Similarly to (16.2), the solution to this SDE is

P_t = P_0 e^{σW_t + (µ - r - σ^2/2)t}.

When we hold ∆_i shares of stock from t_i to t_{i+1}, our profit in present day dollars will be ∆_i(P_{t_{i+1}} - P_{t_i}). The formula for our wealth then becomes

X_{t_0} + ∫_{t_0}^t ∆(s) dP_s.

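The discrete trading identity (16.3) is easy to verify on simulated prices. This sketch is our own illustration (all parameter values are made up): with the buy-and-hold strategy ∆ ≡ 1, the discrete stochastic integral Σ ∆_i(S_{t_{i+1}} − S_{t_i}) telescopes exactly to S_T − S_0.

```python
import numpy as np

# Wealth gain from holding Delta_i shares on [t_i, t_{i+1}) is the discrete
# stochastic integral sum_i Delta_i * (S_{t_{i+1}} - S_{t_i}).
rng = np.random.default_rng(6)
S0, sigma, mu, T, n = 100.0, 0.2, 0.05, 1.0, 252
dW = rng.normal(0.0, np.sqrt(T / n), size=n)
logret = sigma * dW + (mu - sigma**2 / 2) * (T / n)   # increments of log S, per (16.2)
S = S0 * np.exp(np.concatenate(([0.0], np.cumsum(logret))))
Delta = np.ones(n)                    # buy-and-hold: one share throughout
gain = np.sum(Delta * np.diff(S))     # telescopes to S_T - S_0
print(gain, S[-1] - S[0])
```

The adaptedness requirement corresponds here to Delta[i] being computed only from S[0], ..., S[i]; a strategy that peeked at S[i+1] would not be a legitimate integrand.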
17. Markov properties of Brownian motion.

Let W_t be a Brownian motion. Fix r and let Z_t = W_{t+r} - W_r. Clearly the map t → Z_t is continuous since the same is true for W. Since Z_t - Z_s = W_{t+r} - W_{s+r}, the distribution of Z_t - Z_s is normal with mean zero and variance (t + r) - (s + r). One can also check the other parts of the definition to show that Z_t is also a Brownian motion.

Because W_{t+r} - W_t is independent of σ(W_s : s ≤ t), knowing the path of W up to time s gives no help in predicting W_{t+r} - W_t. In particular, if we want to predict W_{t+r} and we know W_t, then knowing the path up to time t gives no additional advantage in predicting W_{t+r}. Phrased another way, this says that to predict the future, we only need to know where we are and not how we got there. This is known as the Markov property.

Now we proceed to the strong Markov property for Brownian motion. Recall that a stopping time in the continuous framework is a r.v. T taking values in [0, ∞) such that (T ≤ t) ∈ F_t for all t. To make a satisfactory theory, we need that the F_t be right continuous (see Section 10), but this is fairly technical and we will ignore it. If T is a stopping time, F_T is the collection of events A such that A ∩ (T ≤ t) ∈ F_t for all t.

Let us try to provide some motivation for this definition of F_T. It will be simpler to consider the discrete time case. The analogue of F_T in the discrete case is the following: if N is a stopping time, let

F_N = {A : A ∩ (N ≤ k) ∈ F_k for all k}.

If X_k is a sequence that is adapted to the σ-fields F_k, that is, X_k is F_k measurable when k = 0, 1, 2, . . ., then knowing which events in F_k have occurred allows us to calculate X_k for each k. So a reasonable definition of the σ-field of events occurring before time N should allow us to calculate X_N whenever we know which events in it have occurred or not. Therefore one definition might be: consider the collection of random variables X_N, where X_k is a sequence adapted to F_k; let G_N be the smallest σ-field with respect to which each of these random variables X_N is measurable. In other words, we want G_N to be the σ-field generated by the collection of random variables X_N for all sequences X_k that are adapted to F_k. (Where did the sequence X_k come from? It could be any adapted sequence.) We show in Note 1 that F_N = G_N; the σ-field F_N is just a bit easier to work with.

Proposition 17.1. If X_t is a Brownian motion and T is a bounded stopping time, then X_{T+t} - X_T is a mean 0 variance t random variable and is independent of F_T.

This proposition says: if you want to predict X_{T+t}, you could do it knowing all of F_T or just knowing X_T. Since X_{T+t} - X_T is independent of F_T, the extra information given in F_T does you no good at all. We prove this in Note 2.

We need a way of expressing the Markov and strong Markov properties that will generalize to other processes. Let W_t be a Brownian motion, and consider the process W_t^x = x + W_t, which is known as Brownian motion started at x. Define Ω' to be the set of continuous functions on [0, ∞), let X_t(ω) = ω(t), and let the σ-field F' be the one generated by the X_t. Define P^x on (Ω', F') by

P^x(X_{t_1} ∈ A_1, . . . , X_{t_n} ∈ A_n) = P(W_{t_1}^x ∈ A_1, . . . , W_{t_n}^x ∈ A_n).

What we have done is gone from one probability space Ω with many processes W_t^x to one process X_t with many probability measures P^x.

An example in the Markov chain setting might help; no knowledge of Markov chains is necessary to understand this. Suppose we have a probability P and a Markov chain with 3 states, A, B, and C, so that we have three different chains according to the starting state. The first, called X_n^A, represents the position at time n for the chain started at A: X_0^A = A, X_1^A can be one of A, B, C, and X_2^A can be one of A, B, C. Similarly we have X_n^B, the chain started at B, and X_n^C.

Define Ω' = {(AAA), (AAB), (ABA), (BAA), (BAB), . . .}, so that Ω' denotes the possible sequences of states for times n = 0, 1, 2. If ω = ABA, set Y_0(ω) = A, Y_1(ω) = B, Y_2(ω) = A, and similarly for all the other 26 values of ω. Define

P_A(AAA) = P(X_0^A = A, X_1^A = A, X_2^A = A),

and similarly define P_A(AAB) and so on for the other values of ω. Define also

P_B(AAA) = P(X_0^B = A, X_1^B = A, X_2^B = A)

(this will be 0 because we know X_0^B = B), and similarly for the other values of ω. We also define P_C. So we now have one process, Y_n, and three probabilities P_A, P_B, P_C. As you can see, there really isn't all that much going on here.

Here is another formulation of the Markov property.

Proposition 17.2. If s < t and f is bounded or nonnegative, then

E^x[f(X_t) | F_s] = E^{X_s}[f(X_{t-s})],   a.s.

The right hand side is to be interpreted as follows: define φ(x) = E^x f(X_{t-s}); then E^{X_s} f(X_{t-s}) means φ(X_s(ω)). One often writes P_t f(x) for E^x f(X_t). We prove this in Note 3.

This formula generalizes: if s < t < u, then

E^x[f(X_t)g(X_u) | F_s] = E^{X_s}[f(X_{t-s})g(X_{u-s})],

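The change of viewpoint in the Markov chain example, one process with a family of measures P_A, P_B, P_C indexed by the starting state, can be made concrete in a few lines. This sketch is our own illustration; the transition matrix below is made up, and the function Px is a hypothetical helper, not something from the notes.

```python
from itertools import product

# One 3-state transition matrix; the measures P^x differ only in the start state.
states = "ABC"
P = {"A": {"A": 0.5, "B": 0.3, "C": 0.2},
     "B": {"A": 0.2, "B": 0.5, "C": 0.3},
     "C": {"A": 0.3, "B": 0.3, "C": 0.4}}

def Px(start, path):
    """P^start of observing the sequence path = (X_0, X_1, X_2)."""
    if path[0] != start:
        return 0.0          # e.g. P_B(AAA) = 0, since the chain starts at B
    p = 1.0
    for a, b in zip(path, path[1:]):
        p *= P[a][b]
    return p

total = sum(Px("A", w) for w in product(states, repeat=3))
print(Px("A", "AAB"), total)   # 0.5*0.3 = 0.15, and the probabilities sum to 1
```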
and so on for functions of X at more times.

Using Proposition 17.2, the statement and proof of Proposition 17.2 can be extended to stopping times.

Proposition 17.3. If T is a bounded stopping time, then

E^x[f(X_{T+t}) | F_T] = E^{X_T}[f(X_t)].

We can also establish the Markov property and strong Markov property in the context of solutions of stochastic differential equations. If we let X_t^x denote the solution to

X_t^x = x + ∫_0^t σ(X_s^x) dW_s + ∫_0^t b(X_s^x) ds,

so that X_t^x is the solution of the SDE started at x, we can define new probabilities by

P^x(X_{t_1} ∈ A_1, . . . , X_{t_n} ∈ A_n) = P(X_{t_1}^x ∈ A_1, . . . , X_{t_n}^x ∈ A_n).

This is similar to what we did in defining P^x for Brownian motion, but here we do not have translation invariance. One can show that when there is uniqueness for the solution to the SDE, the family (P^x, X_t) satisfies the Markov and strong Markov property; the statement is precisely the same as the statement of Proposition 17.2.

Note 1. We want to show G_N = F_N.

To show F_N ⊂ G_N, suppose that A ∈ F_N. Let X_k = 1_{A∩(N≤k)}. Since A ∈ F_N, then A ∩ (N ≤ k) ∈ F_k, so X_k is F_k measurable; thus the sequence X_k is adapted. But X_N = 1_{A∩(N≤N)} = 1_A, so A = (X_N > 0) ∈ G_N. We have thus shown that F_N ⊂ G_N.

To show G_N ⊂ F_N: since G_N is the smallest σ-field with respect to which X_N is measurable for all adapted sequences X_k, and it is easy to see that F_N is a σ-field, it suffices to show that X_N is measurable with respect to F_N whenever X_k is adapted. Therefore we need to show that for such a sequence X_k and any real number a, the event (X_N > a) ∈ F_N. Since N is a stopping time, (N ≤ j) ∈ F_j and (N ≤ j - 1)^c ∈ F_{j-1} ⊂ F_j, so the event (N = j) = (N ≤ j) ∩ (N ≤ j - 1)^c is in F_j. If j ≤ k, then (N = j) ∈ F_j ⊂ F_k. The event (X_j > a) ∈ F_j since X is an adapted sequence. Now

(X_N > a) ∩ (N = j) = (X_j > a) ∩ (N = j) ∈ F_j,

and therefore

(X_N > a) ∩ (N ≤ k) = ∪_{j=0}^k ((X_N > a) ∩ (N = j)) ∈ F_k,

which proves that (X_N > a) ∈ F_N. Hence G_N ⊂ F_N, and combining with the previous paragraph, we conclude F_N = G_N.

Note 2. Here is the proof of Proposition 17.1. Let f(x) = e^{iux}, u ∈ R. Then

E^x[e^{iuXt} | Fs] = e^{iuXs} E^x[e^{iu(Xt − Xs)} | Fs] = e^{iuXs} e^{−u²(t−s)/2}.

On the other hand,

ϕ(y) = E^y[f(X_{t−s})] = E[e^{iu(W_{t−s} + y)}] = e^{iuy} e^{−u²(t−s)/2}.

So ϕ(Xs) = E^x[e^{iuXt} | Fs], which is the Markov property for this particular f. By Note 3, we can approximate any bounded f by linear combinations of the functions e^{iux}; using linearity and taking limits, we have the result for all bounded f.

Here is the proof of Proposition 17.2. Let f be continuous and A ∈ F_T. Let Tn be defined by Tn(ω) = (k + 1)/2^n if T(ω) ∈ [k/2^n, (k + 1)/2^n). It is easy to check that Tn is a stopping time, and A ∈ F_{Tn} as well. We have

E[f(X_{Tn+t} − X_{Tn}); A] = Σ_k E[f(X_{k/2^n + t} − X_{k/2^n}); A ∩ (Tn = k/2^n)]
                          = Σ_k E[f(X_{k/2^n + t} − X_{k/2^n})] P(A ∩ (Tn = k/2^n))
                          = E[f(Xt)] P(A),

using the independence of the increments of Brownian motion. Let n → ∞; then

E[f(X_{T+t} − X_T); A] = E[f(Xt)] P(A).

Taking limits, this equation holds for all bounded f. If we take A = Ω and f = 1_B, we see that X_{T+t} − X_T has the same distribution as Xt, which is that of a mean 0 variance t normal random variable. If we let A ∈ F_T be arbitrary and f = 1_B, we see that

P(X_{T+t} − X_T ∈ B, A) = P(Xt ∈ B) P(A) = P(X_{T+t} − X_T ∈ B) P(A),

which implies that X_{T+t} − X_T is independent of F_T.

Note 3. Finally, recall from undergraduate analysis that every bounded function is the limit of linear combinations of functions e^{iux}, u ∈ R. This follows from using the inversion formula for Fourier transforms. We use

f̂(u) = ∫ e^{iux} f(x) dx.

If f is smooth enough and has compact support, then one can recover f by the formula

f(x) = (1/2π) ∫ e^{−iux} f̂(u) du.

We can first approximate this improper integral by (1/2π) ∫_{−N}^{N} e^{−iux} f̂(u) du by taking N larger and larger, and for each N we can approximate the integral by using Riemann sums. Thus we can approximate f(x) by a linear combination of terms of the form e^{iu_j x}. Finally, bounded functions can be approximated by smooth functions with compact support. There are various slightly different formulas for the Fourier transform,

so it doesn't matter which one we work with.

18. Martingale representation theorem.

In this section we let Ft be the σ-field generated by Ws, s ≤ t, where W is a Brownian motion. From (16.2) we see that Ft is also the same as the σ-field generated by Ss, s ≤ t. We want to show that every random variable that is Ft measurable can be written as a stochastic integral of Brownian motion: if V is Ft measurable, then there exists Hs adapted such that V = V0 + ∫_0^t Hs dWs, where V0 is a constant. In the next section we use this to show that under the model of geometric Brownian motion the market is complete. This means that no matter what option one comes up with, one can exactly replicate the result (no matter what the market does) by buying and selling shares of stock.

Our goal is to prove

Theorem 18.1. If V is Ft measurable and E V² < ∞, then there exist a constant c and an adapted integrand Hs with E ∫_0^t H²_s ds < ∞ such that

V = c + ∫_0^t Hs dWs.   (18.1)

Before we prove this, let us explain why this is called a martingale representation theorem. Suppose Ms is a martingale adapted to Fs, s ≤ t, and suppose also that E M²_t < ∞. Set V = Mt. Then, since E V² < ∞, we can write

Mt = V = c + ∫_0^t Hs dWs,

so for r ≤ t, because the stochastic integral is a martingale,

Mr = E[Mt | Fr] = c + E[∫_0^t Hs dWs | Fr] = c + ∫_0^r Hs dWs.

We already knew that stochastic integrals were martingales; in mathematical terms, what this says is the converse: every martingale can be represented as a stochastic integral. Don't forget that we need E M²_t < ∞ and Ms adapted to the σ-fields of a Brownian motion. Conversely, in Note 1 we show that if every martingale can be represented as a stochastic integral, then every random variable V that is Ft measurable with E V² < ∞ can be, too.

There are several proofs of Theorem 18.1; unfortunately, they are all technical. We outline one proof here, giving details in the notes.
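To make the theorem concrete, take V = W_t². Ito's formula gives W_t² = t + ∫_0^t 2Ws dWs, so here c = t and Hs = 2Ws. The following simulation (a Python sketch with illustrative parameters, not part of the notes) checks this representation pathwise by discretizing the stochastic integral:

```python
import random
import math

def check_representation(T=1.0, n_steps=2000, n_paths=200, seed=1):
    """For V = W_T^2, Ito's formula gives W_T^2 = T + int_0^T 2 W_s dW_s,
    i.e. c = T and H_s = 2 W_s.  Compare V with the discretized integral."""
    rng = random.Random(seed)
    dt = T / n_steps
    sq_err = 0.0
    for _ in range(n_paths):
        w, integral = 0.0, 0.0
        for _ in range(n_steps):
            dw = rng.gauss(0.0, math.sqrt(dt))
            integral += 2.0 * w * dw      # H_s dW_s with H_s = 2 W_s
            w += dw
        sq_err += (w * w - (T + integral)) ** 2
    return sq_err / n_paths               # mean-square representation error
```

The error is of order 1/n_steps, so it shrinks as the partition of [0, T] is refined.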

We start with the following, proved in Note 2. Let R be the collection of random variables that can be represented as stochastic integrals. By this we mean

R = {V : E V² < ∞, V = c + ∫_0^t Hs dWs for some adapted H with E ∫_0^t H²_s ds < ∞}.

Proposition 18.2. Suppose

V^n = c_n + ∫_0^t H^n_s dWs,

and for each n the process H^n is adapted with E ∫_0^t (H^n_s)² ds < ∞. Suppose c_n → c, E|V^n − V|² → 0, and V is Ft measurable. Then there exists an adapted Hs with E ∫_0^t H²_s ds < ∞ so that

V = c + ∫_0^t Hs dWs.

What this proposition says is that if we can represent a sequence of random variables V^n and V^n → V, then we can represent V. (The proof is in Note 2.)

Next we show R contains a particular collection of random variables.

Proposition 18.3. If g is bounded, the random variable g(Wt) is in R.

See Note 3 for the proof. An almost identical proof shows that if f is bounded, then

f(Wt − Ws) = c + ∫_s^t Hr dWr

for some c and Hr.

Proposition 18.4. If t0 ≤ t1 ≤ · · · ≤ tn ≤ t and f1, . . . , fn are bounded functions, then

f1(W_{t1} − W_{t0}) f2(W_{t2} − W_{t1}) · · · fn(W_{tn} − W_{tn−1})

is in R.

The proof is in Note 4. We now finish the proof of Theorem 18.1.

Proof of Theorem 18.1. We have shown that random variables of the form

f1(W_{t1} − W_{t0}) f2(W_{t2} − W_{t1}) · · · fn(W_{tn} − W_{tn−1})   (18.2)

are in R. Clearly if Vi ∈ R for i = 1, . . . , m and the ai are constants, then a1V1 + · · · + amVm is also in R. Finally, from measure theory we know that if E V² < ∞ and V is Ft measurable, then we can find a sequence Vk such that E|Vk − V|² → 0 and each Vk is a linear combination of random variables of the form given in (18.2). By Proposition 18.2, V ∈ R, which proves the theorem.

Note 1. Suppose we know that every martingale Ms adapted to Fs with E M²_t < ∞ can be represented as Mr = c + ∫_0^r Hs dWs for some suitable H. If V is Ft measurable with E V² < ∞, let Mr = E[V | Fr]. We know this is a martingale, so

Mr = c + ∫_0^r Hs dWs

for suitable H. Applying this with r = t,

V = E[V | Ft] = Mt = c + ∫_0^t Hs dWs.

Note 2. We prove Proposition 18.2. By our assumptions,

E|(V^n − c_n) − (V^m − c_m)|² → 0

as n, m → ∞, so

E (∫_0^t (H^n_s − H^m_s) dWs)² → 0.

From our formulas for stochastic integrals, this means

E ∫_0^t |H^n_s − H^m_s|² ds → 0 as n, m → ∞.

This says that H^n_s is a Cauchy sequence in the space L² (with respect to the norm ‖Y‖₂ = (E ∫_0^t Y²_s ds)^{1/2}). Measure theory tells us that L² is a complete metric space, so there exists Hs such that

E ∫_0^t |H^n_s − Hs|² ds → 0.

In particular H^n_s → Hs, and this implies Hs is adapted. Another consequence, due to Fatou's lemma, is that E ∫_0^t H²_s ds < ∞.

Let Ut = ∫_0^t Hs dWs. Then

E|(V^n − c_n) − Ut|² = E ∫_0^t (H^n_s − Hs)² ds → 0.

Therefore Ut = V − c, and U has the desired form.

Note 3. Here is the proof of Proposition 18.3. By Ito's formula with Xs = −iuWs + u²s/2 and f(x) = e^x,

e^{Xt} = 1 + ∫_0^t e^{Xs}(−iu) dWs + ∫_0^t e^{Xs}(u²/2) ds + (1/2) ∫_0^t e^{Xs}(−iu)² ds
       = 1 − iu ∫_0^t e^{Xs} dWs,

since the two ds integrals cancel. If we multiply both sides by e^{−u²t/2}, which is a constant and hence adapted, we obtain

e^{−iuWt} = c_u + ∫_0^t H^u_s dWs   (18.3)

for an appropriate constant c_u and integrand H^u. If f is a smooth function (e.g., C^∞ with compact support), then its Fourier transform f̂ will also be very nice. So if we multiply (18.3) by f̂(u) and integrate over u from −∞ to ∞, we obtain

f(Wt) = c + ∫_0^t Hs dWs

for some constant c and some adapted integrand H. (We implicitly used Proposition 18.2 here, because we approximate the integral over u by Riemann sums, and then take a limit.) Finally, a bounded g can be approximated by smooth functions with compact support; from Proposition 18.2 we take limits and obtain the proposition.

Note 4. Here is the proof of Proposition 18.4. The argument is by induction; let us do the case n = 2 for clarity. So we suppose V = f(Wt) g(Wu − Wt) with t < u. By Proposition 18.3 and the remark following it, we have

f(Wt) = c + ∫_0^t Hr dWr,    g(Wu − Wt) = d + ∫_t^u Kr dWr.

Set H̄r = Hr if 0 ≤ r < t and 0 otherwise, and set K̄r = Kr if t ≤ r < u and 0 otherwise. Let

Xs = c + ∫_0^s H̄r dWr    and    Ys = d + ∫_0^s K̄r dWr.

Since H̄ and K̄ are never nonzero at the same time,

⟨X, Y⟩_s = ∫_0^s H̄r K̄r dr = 0.

Then by the Ito product formula,

Xs Ys = X0 Y0 + ∫_0^s Xr dYr + ∫_0^s Yr dXr + ⟨X, Y⟩_s = cd + ∫_0^s [Xr K̄r + Yr H̄r] dWr.
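The induction in Note 4 rests on the fact that f(Wt) and g(Wu − Wt) are independent, so that E[Xu Yu] = cd. A small Monte Carlo sketch of this factorization (Python; the bounded test functions and parameters are arbitrary illustrative choices, not from the notes):

```python
import random
import math

def product_factorizes(t=0.5, u=1.0, n=200_000, seed=7):
    """Monte Carlo check that E[f(W_t) g(W_u - W_t)] = E f(W_t) * E g(W_u - W_t)
    for bounded f, g, using the independence of Brownian increments."""
    rng = random.Random(seed)
    f = lambda x: math.cos(x)            # bounded test function (arbitrary choice)
    g = lambda x: 1.0 / (1.0 + x * x)    # another bounded test function
    sfg = sf = sg = 0.0
    for _ in range(n):
        wt = rng.gauss(0.0, math.sqrt(t))
        inc = rng.gauss(0.0, math.sqrt(u - t))  # W_u - W_t, independent of W_t
        sfg += f(wt) * g(inc)
        sf += f(wt)
        sg += g(inc)
    return sfg / n, (sf / n) * (sg / n)

lhs, rhs = product_factorizes()
```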

If we now take s = u, that is exactly what we wanted, since Xu Yu = f(Wt) g(Wu − Wt). Note that Xr K̄r + Yr H̄r is 0 if r > u; this is needed to do the general induction step.

19. Completeness.

Now let Pt be a geometric Brownian motion. As we mentioned in Section 16, if

Pt = P0 exp(σWt + (µ − r − σ²/2)t),

then given Pt we can determine Wt and vice versa, so the σ-fields generated by Pt and Wt are the same. Recall Pt satisfies

dPt = σPt dWt + (µ − r)Pt dt.

Define a new probability P̃ by

dP̃/dP = Mt = exp(aWt − a²t/2).

By the Girsanov theorem, W̃t = Wt − at is a Brownian motion under P̃. So

dPt = σPt dW̃t + σPt a dt + (µ − r)Pt dt.

If we choose a = −(µ − r)/σ, we then have

dPt = σPt dW̃t.   (19.1)

Since W̃t is a Brownian motion under P̃, then Pt must be a martingale under P̃, since it is a stochastic integral of a Brownian motion. We can rewrite (19.1) as

dW̃t = σ^{-1} P_t^{-1} dPt.   (19.2)

Given an Ft measurable variable V, we know by Theorem 18.1 that there exist a constant c and an adapted process Hs with Ẽ ∫_0^t H²_s ds < ∞ such that

V = c + ∫_0^t Hs dW̃s.

But then using (19.2) we have

V = c + ∫_0^t Hs σ^{-1} P_s^{-1} dPs.

We have therefore proved

Theorem 19.1. If Pt is a geometric Brownian motion and V is Ft measurable and square integrable, then there exist a constant c and an adapted process Ks such that

V = c + ∫_0^t Ks dPs.

Moreover, there is a probability P̃ under which Pt is a martingale. The probability P̃ is called the risk-neutral measure. Under P̃ the present day value of the stock price is a martingale.
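The defining property of the risk-neutral measure, that Pt = P0 exp(σW̃t − σ²t/2) is a martingale, so that E Pt = P0 for every t, can be checked by simulation. A Python sketch (the parameter values are illustrative assumptions):

```python
import random
import math

def discounted_price_mean(sigma=0.2, t=1.0, p0=100.0, n=100_000, seed=3):
    """Under the risk-neutral measure, P_t = P_0 exp(sigma W_t - sigma^2 t / 2)
    is a martingale, so the sample mean of P_t should be close to P_0."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        w = rng.gauss(0.0, math.sqrt(t))
        total += p0 * math.exp(sigma * w - 0.5 * sigma * sigma * t)
    return total / n
```

Without the compensating −σ²t/2 term the mean would drift upward like e^{σ²t/2}, which is one way to remember why that term appears.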

20. Black-Scholes formula, I.

We can now derive the formula for the price of any option. Let T ≥ 0 be a fixed real number.

Theorem 20.1. If V is F_T measurable, the price of V must be Ẽ V.

Proof. By Theorem 19.1 we have

V = c + ∫_0^T Ks dPs.   (20.1)

Under P̃ the process Ps is a martingale, and hence so is the stochastic integral in (20.1). So taking expectations in (20.1) under P̃, we obtain Ẽ V = c.

Suppose the price of the option V at time 0 is W0 and W0 > c. Then we can sell the option for W0 dollars, use c of those dollars, and invest according to the strategy of holding Ks shares of stock at time s; by (20.1), starting with c dollars and trading this way yields exactly V dollars at time T. At time T the buyer of our option exercises it and we use those V dollars to meet that obligation. That leaves us a profit of e^{rT}(W0 − c), without any risk. This is the "no arbitrage" principle again: since we can't get a riskless profit, W0 must be less than or equal to c. If W0 < c, we just reverse things: we buy the option instead of sell it, and hold −Ks shares of stock at time s. By the same argument we must have W0 ≥ c, and therefore W0 = c.

The expectation in the statement of Theorem 20.1 is amenable to calculation. Suppose we have the standard European option, where

V = e^{−rT}(S_T − K)^+ = (e^{−rT}S_T − e^{−rT}K)^+ = (P_T − e^{−rT}K)^+.

Recall that under P̃ the stock price satisfies

dPt = σPt dW̃t,

where W̃t is a Brownian motion under P̃. So then

Pt = P0 e^{σW̃t − σ²t/2}.

Hence

Ẽ V = Ẽ[(P_T − e^{−rT}K)^+] = Ẽ[(P0 e^{σW̃_T − σ²T/2} − e^{−rT}K)^+].   (20.2)

This can be computed, similarly to formulas we have already done (see Note 1), and we end up with the famous Black-Scholes formula:

W0 = x Φ(g(x, T)) − K e^{−rT} Φ(h(x, T)),

where Φ(z) = (1/√(2π)) ∫_{−∞}^z e^{−y²/2} dy, x = P0 = S0, and

g(x, T) = (log(x/K) + (r + σ²/2)T) / (σ√T),    h(x, T) = g(x, T) − σ√T.

It is of considerable interest that the final formula depends on σ but is completely independent of µ. The reason for that can be explained as follows. Under P̃ the process Pt satisfies dPt = σPt dW̃t, where W̃t is a Brownian motion under P̃, and there is no µ present here. (We used the Girsanov formula to get rid of the µ.) The price of the option V is

Ẽ[P_T − e^{−rT}K]^+,   (20.3)

which is independent of µ since Pt is.

Note 1. We want to calculate

E(x e^{σW_T − σ²T/2} − e^{−rT}K)^+,   (20.4)

where W_t is a Brownian motion and we write x for P0 = S0. Since W_T is a normal random variable with mean 0 and variance T (its density is just (2πT)^{−1/2} e^{−y²/(2T)}), we can write it as √T Z, where Z is a standard mean 0 variance 1 normal random variable. Now

x e^{σ√T Z − σ²T/2} > e^{−rT}K

if and only if

log x + σ√T Z − σ²T/2 > −rT + log K,

Recall that 1 − Φ(z) = Φ(−z) for all z by the symmetry of the normal density. √ This is the Black-Scholes formula if we observe that σ T − z0 = g(x. So (20.or if Z > (σ 2 T /2) − r + log K − log x. T ). T ) and −z0 = h(x. We write z0 for the right hand side of the above inequality. 2 87 .4) is equal to ∞ √1 2π (xeσ z0 √ T z−σ 2 T /2 ∞ − e−rT K)+ e−z √ −2σ T z+σ 2 T √ 2 T) 2 /2 dz ∞ = x √1 2π = x √1 2π = x √1 2π e− 2 (z z0 ∞ z0 ∞ √ z0 −σ T 1 1 2 dz − Ke−rT √1 2π e−z z0 2 /2 dz e− 2 (z−σ dz − Ke−rT (1 − Φ(z0 )) e−y /2 dy − Ke−rT Φ(−z0 ) √ = x(1 − Φ(z0 − σ T )) − Ke−rT Φ(−z0 ) √ = xΦ(σ T − z0 ) − Ke−rT Φ(−z0 ).

21. Hedging strategies.

The previous section allows us to compute the value of any option, but we would also like to know what the hedging strategy is. That is, if we know

V = Ẽ V + ∫_0^T Hs dSs,   (21.1)

what should Hs be? This might be important to know if we wanted to duplicate an option that was not available in the marketplace, or if we worked for a bank and wanted to provide an option for sale. It is not always possible to compute H, but in many cases of interest it is possible. We illustrate one technique with two examples.

First, suppose we want to hedge the standard European call V = e^{−rT}(S_T − K)^+ = (P_T − e^{−rT}K)^+. We are working here with the risk-neutral probability only. It turns out it makes no difference: the definition of ∫_0^t Hs dXs for a semimartingale X does not depend on the probability P, other than worrying about some integrability conditions.

Recall that under the risk-neutral measure dPt = σPt dWt, where W is a Brownian motion under that measure. We can rewrite V as

V = Ẽ V + g(W_T), where g(x) = (P0 e^{σx − σ²T/2} − e^{−rT}K)^+ − Ẽ V.

Therefore the expectation of g(W_T) is 0, and if we write

g(W_T) = ∫_0^T H^1_s dWs,   (21.2)

then since dWt = (σPt)^{−1} dPt,

V = Ẽ V + ∫_0^T (H^1_s / (σPs)) dPs.

Therefore it suffices to find the integrand H^1 in the representation (21.2).

Recall from the section on the Markov property that

Pt f(x) = E f(x + Wt) = ∫ (2πt)^{−1/2} e^{−y²/(2t)} f(x + y) dy.

Let Mt = E[g(W_T) | Ft]. We know that Mt is a martingale, and by the Markov property, Proposition 17.1,

Mt = E^{Wt}[g(W_{T−t})] = P_{T−t} g(Wt).   (21.3)

Now let us apply Ito's formula with the function f(x1, x2) = P_{x2} g(x1) to the process Xt = (X^1_t, X^2_t) = (Wt, T − t); for this we need to use the multidimensional version of Ito's formula. We have dX^1_t = dWt and dX^2_t = −dt. Since X^2_t is a decreasing process and has

no martingale part, we have d⟨X^1⟩_t = dt, while d⟨X^2⟩_t = 0 and d⟨X^1, X^2⟩_t = 0. Ito's formula says that

f(X^1_t, X^2_t) = f(X^1_0, X^2_0) + Σ_{i=1}^2 ∫_0^t (∂f/∂x_i)(Xs) dX^i_s + (1/2) Σ_{i,j=1}^2 ∫_0^t (∂²f/∂x_i∂x_j)(Xs) d⟨X^i, X^j⟩_s
= c + ∫_0^t (∂f/∂x_1)(Xs) dWs + some terms with dt.

But we know that f(Xt) = P_{T−t} g(Wt) = Mt is a martingale, so the sum of the terms involving dt must be zero; if not, f(Xt) would have a bounded variation part. We conclude

Mt = ∫_0^t (∂/∂x) P_{T−s} g(Ws) dWs.

If we take t = T, we then have

g(W_T) = M_T = ∫_0^T (∂/∂x) P_{T−s} g(Ws) dWs,

and we have our representation.

For a second example, let's look at the sell-high option. Here the payoff is sup_{s≤T} Ss, the largest the stock price ever is up to time T. This is F_T measurable, so we can compute its value. For simplicity, let us suppose the interest rate r is 0. Let Nt = sup_{s≤t} Ss, the maximum up to time t.

It is not the case that Nt by itself is a Markov process. Intuitively, the reasoning goes like this: suppose the maximum up to time 1 is $100, and we want to predict the maximum up to time 2. If the stock price at time 1 is close to $100, then we have one prediction, while if the stock price at time 1 is close to $2, we would definitely have another prediction. So the prediction for N2 does not depend just on N1, but also on the stock price at time 1. Adding in the information about the current stock price gives a certain amount of evidence for predicting the future values of Nt; adding in the history of the stock prices up to time t gives no additional information beyond that. This same intuitive reasoning does suggest, however, that the triple Zt = (St, Nt, t) is a Markov process, and this turns out to be correct. Once we believe this, the rest of the argument is very similar to the first example.

Let P^z_u f = E^z f(Zu), where z = (s, n, t). Let g(Zt) = Nt − E N_T. Then

Mt = E[g(Z_T) | Ft] = E^{Zt}[g(Z_{T−t})] = P_{T−t} g(Zt).

We then let f(s, n, t) = P_{T−t} g(s, n, t) and apply Ito's formula. The process Nt is always increasing, so it is of bounded variation and has no martingale part, and hence ⟨N⟩_t = 0. When we apply Ito's formula, we get a dSt term, which is the martingale term; we get some terms involving dt, which are of bounded variation; and we get a term involving dNt, which is also of bounded variation. But Mt is a martingale, so all the dt and dNt terms must cancel. Therefore we are left with the martingale term, which is

∫_0^t (∂/∂s) P_{T−s} g(Ss, Ns, s) dSs,

where again g(s, n, t) = n − E N_T. This gives us our hedging strategy for the sell-high option, and it can be explicitly calculated.

There is another way to calculate hedging strategies, using what is known as the Clark-Haussmann-Ocone formula. This is a more complicated procedure, and most cases can be done as well by an appropriate use of the Markov property.
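The integrand in the first example can be computed numerically: approximate P_{T−s} g by integrating g against the Gaussian density, and differentiate in x by central differences. In the sketch below (Python; r = 0 and all parameter values are illustrative assumptions, not from the notes) the resulting stock holding H^1_s/(σPs) is compared with the Black-Scholes delta Φ(d1), which is what it should reduce to for the European call with r = 0:

```python
import math

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def P_u(g, x, u, n=4000):
    """P_u g(x) = E g(x + W_u), midpoint rule against the N(0, u) density."""
    sd = math.sqrt(u)
    dt = 16.0 / n                      # integrate the standard normal over [-8, 8]
    total = 0.0
    for i in range(n):
        t = -8.0 + (i + 0.5) * dt
        total += g(x + sd * t) * math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi) * dt
    return total

# European call with r = 0, so P_t = S_t and V = (S_T - K)^+.
S0, K, sigma, T = 100.0, 100.0, 0.2, 1.0
s, w = 0.3, 0.1                        # evaluate the integrand at time s with W_s = w
g = lambda x: max(S0 * math.exp(sigma * x - sigma ** 2 * T / 2.0) - K, 0.0)

h = 1e-3
H1 = (P_u(g, w + h, T - s) - P_u(g, w - h, T - s)) / (2.0 * h)  # d/dx P_{T-s} g(W_s)
Ss = S0 * math.exp(sigma * w - sigma ** 2 * s / 2.0)
shares = H1 / (sigma * Ss)             # the number of shares of stock to hold

# For comparison: with r = 0 this should equal the Black-Scholes delta Phi(d1).
d1 = (math.log(Ss / K) + sigma ** 2 * (T - s) / 2.0) / (sigma * math.sqrt(T - s))
```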

22. Black-Scholes formula, II.

Here is a second approach to the Black-Scholes formula. This approach works for European calls and several other options, but does not work in the generality that the first approach does. On the other hand, it allows one to compute more easily what the equivalent strategy of buying or selling stock should be to duplicate the outcome of the given option. In this section we work with the actual price of the stock instead of the present value.

Let Vt be the value of the portfolio and assume Vt = f(St, T − t) for all t, where f is some function that is sufficiently smooth. We also want V_T = (S_T − K)^+.

Recall Ito's formula. The multivariate version is

f(Xt) = f(X0) + Σ_{i=1}^d ∫_0^t f_{x_i}(Xs) dX^i_s + (1/2) Σ_{i,j=1}^d ∫_0^t f_{x_i x_j}(Xs) d⟨X^i, X^j⟩_s.

Here Xt = (X^1_t, . . . , X^d_t), f_{x_i} denotes the partial derivative of f in the x_i direction, and similarly for the second partial derivatives.

We apply this with d = 2 and Xt = (St, T − t). From the SDE that St solves, d⟨X^1⟩_t = σ²S²_t dt, d⟨X^2⟩_t = 0 (since T − t is of bounded variation and hence has no martingale part), and ⟨X^1, X^2⟩_t = 0. Also, dX^2_t = −dt. Then

Vt − V0 = f(St, T − t) − f(S0, T)
        = ∫_0^t f_x(Su, T − u) dSu − ∫_0^t f_s(Su, T − u) du + (1/2) ∫_0^t σ²S²_u f_{xx}(Su, T − u) du.   (22.1)

On the other hand, if au and bu are the number of shares of stock and bonds, respectively, held at time u,

Vt − V0 = ∫_0^t au dSu + ∫_0^t bu dβu.   (22.2)

This formula says that the increase in net worth is given by the profit we obtain by holding au shares of stock and bu bonds at time u. Recall βt = β0 e^{rt}. Since the value of the portfolio at time t is

Vt = at St + bt βt,   (22.3)

we must have

bt = (Vt − at St)/βt.   (22.4)

To match up (22.2) with (22.1), we must therefore have

at = f_x(St, T − t)   (22.5)

and

r[f(St, T − t) − St f_x(St, T − t)] = −f_s(St, T − t) + (1/2) σ²S²_t f_{xx}(St, T − t)   (22.6)

for all t and all St. Equation (22.5) shows what the trading strategy should be, and (22.6) leads to the parabolic PDE

f_s = (1/2) σ²x² f_{xx} + r x f_x − r f,   (x, s) ∈ (0, ∞) × [0, T),   (22.7)

with

f(x, 0) = (x − K)^+.   (22.8)

Solving this equation for f, f(x, T) is what V0 should be, i.e., the cost of setting up the equivalent portfolio.
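Equation (22.7) with the initial condition (22.8) can be solved numerically by marching an explicit finite-difference scheme forward in s (time to maturity). A Python sketch; the grid sizes, the domain cutoff x_max, and the far boundary condition are illustrative choices, and the explicit scheme needs dt no larger than about dx²/(σ x_max)² to be stable:

```python
import math

def bs_pde_explicit(K=100.0, r=0.05, sigma=0.2, T=1.0,
                    x_max=300.0, nx=150, nt=1000):
    """March f_s = 0.5 sigma^2 x^2 f_xx + r x f_x - r f in s, starting
    from the payoff f(x, 0) = (x - K)^+ (equations (22.7)-(22.8))."""
    dx = x_max / nx
    dt = T / nt                  # here dt = 1e-3 < dx^2 / (sigma * x_max)^2
    f = [max(i * dx - K, 0.0) for i in range(nx + 1)]
    for step in range(nt):
        s = (step + 1) * dt
        new = [0.0] * (nx + 1)
        for i in range(1, nx):
            x = i * dx
            fxx = (f[i + 1] - 2.0 * f[i] + f[i - 1]) / (dx * dx)
            fx = (f[i + 1] - f[i - 1]) / (2.0 * dx)
            new[i] = f[i] + dt * (0.5 * sigma ** 2 * x * x * fxx + r * x * fx - r * f[i])
        new[0] = 0.0                              # a call is worthless at x = 0
        new[nx] = x_max - K * math.exp(-r * s)    # deep in-the-money asymptote
        f = new
    return f

grid = bs_pde_explicit()
v0 = grid[50]    # node 50 * dx = 100, i.e. f(100, T): the cost for S_0 = 100
```

The value v0 should agree with the closed-form price from Section 20 up to discretization error, which is one check that the two approaches are consistent.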

23. The fundamental theorem of finance.

In Section 19, we showed there was a probability measure under which Pt = e^{−rt}St was a martingale. This is true very generally. Let St be the price of a security in today's dollars. We will suppose St is a continuous semimartingale, so that it can be written St = Mt + At.

Arbitrage means that there is a trading strategy Hs such that there is no chance that we lose anything and there is a positive profit with positive probability. Mathematically, arbitrage exists if there exists Hs that is adapted and satisfies a suitable integrability condition with

∫_0^T Hs dSs ≥ 0, a.s., and P(∫_0^T Hs dSs > b) > ε

for some b, ε > 0. It turns out that to get a necessary and sufficient condition for an equivalent measure under which St is a martingale to exist, we need a slightly weaker condition. The NFLVR condition ("no free lunch with vanishing risk") is that there do not exist a fixed time T, constants ε, b > 0, and strategies Hn (that are adapted and satisfy the appropriate integrability conditions) such that

∫_0^t Hn(s) dSs > −1/n, a.s., for all t ≤ T, and P(∫_0^T Hn(s) dSs > b) > ε.

Here T, b, ε do not depend on n. The condition rules out strategies with which one can, with probability at least ε, make a profit of b while risking a loss no larger than 1/n.

Two probabilities P and Q are equivalent if P(A) = 0 if and only if Q(A) = 0, i.e., the two probabilities have the same collection of sets of probability zero. Q is an equivalent martingale measure if Q is a probability measure, Q is equivalent to P, and St is a martingale under Q.

Theorem 23.1. If St is a continuous semimartingale and the NFLVR condition holds, then there exists an equivalent martingale measure Q.

The proof is rather technical and involves some heavy-duty measure theory, so we will only examine a part of it. Suppose that we happened to have St = Wt + f(t), where f(t) is a deterministic increasing continuous function. To obtain the equivalent martingale measure, we would want to let

Mt = e^{−∫_0^t f′(s) dWs − (1/2)∫_0^t (f′(s))² ds}.

In order for Mt to make sense, we need f to be differentiable. A result from measure theory says that if f is not differentiable, then we can find a subset A of [0, ∞) such that ∫_0^t 1_A(s) ds = 0 but the amount of increase of f over the set A is positive. This last statement is phrased mathematically by saying

∫_0^t 1_A(s) df(s) > 0,

where the integral is a Riemann-Stieltjes (or better, a Lebesgue-Stieltjes) integral. Then if we hold Hs = 1_A(s) shares at time s, our net profit is

∫_0^t Hs dSs = ∫_0^t 1_A(s) dWs + ∫_0^t 1_A(s) df(s).

The second term would be positive, since this is the amount of increase of f over the set A. The first term is 0, since E (∫_0^t 1_A(s) dWs)² = ∫_0^t 1_A(s)² ds = 0. So our net profit is nonrandom and positive, or in other words, we have made a net gain without risk. This contradicts "no arbitrage." See Note 1 for more on this.

Sometimes Theorem 23.1 is called the first fundamental theorem of asset pricing. The second fundamental theorem is the following.

Theorem 23.2. The equivalent martingale measure is unique if and only if the market is complete.

We will not prove this.

Note 1. We will not prove Theorem 23.1, but let us give a few more indications of what is going on. First of all, recall the Cantor set. This is where E1 = [0, 1], E2 is the set obtained from E1 by removing the open interval (1/3, 2/3), E3 is the set obtained from E2 by removing the middle third from each of the two intervals making up E2, and so on. The intersection, E = ∩_{n=1}^∞ En, is the Cantor set. It is closed, nonempty, in fact uncountable, yet it contains no intervals, and the Lebesgue measure of E is 0. We set A = E.

Let f be the Cantor-Lebesgue function. This is the function that is equal to 0 on (−∞, 0], equal to 1 on [1, ∞), equal to 1/2 on the interval [1/3, 2/3], equal to 1/4 on [1/9, 2/9], equal to 3/4 on [7/9, 8/9], and is defined similarly on each interval making up the complement of A. It turns out we can define f on A so that it is continuous, and one can show ∫_0^1 1_A(s) df(s) = 1. So this A and f provide a concrete example of what we were discussing.


24. American puts. The proper valuation of American puts is one of the important unsolved problems in mathematical finance. Recall that a European put pays out (K − S_T)^+ at time T, while an American put allows one to exercise early. If one exercises an American put at time t < T, one receives (K − St)^+. Then during the period [t, T] one receives interest, and the amount one has is (K − St)^+ e^{r(T−t)}. In today's dollars that is the equivalent of (K − St)^+ e^{−rt}. One wants to find a rule, known as the exercise policy, for when to exercise the put, and then one wants to see what the value is for that policy. Since one cannot look into the future, one is in fact looking for a stopping time τ that maximizes

E e^{−rτ}(K − Sτ)^+.

There is no good theoretical solution to finding the stopping time τ, although good approximations exist. We will, however, discuss just a bit of the theory of optimal stopping, which reworks the problem into another form. Let Gt denote the amount you will receive at time t. For American puts, we set

Gt = e^{−rt}(K − St)^+.

Our problem is to maximize E Gτ over all stopping times τ. We first need

Proposition 24.1. If S and T are bounded stopping times with S ≤ T and M is a martingale, then E[M_T | F_S] = M_S.

Proof. Let A ∈ F_S. Define U by

U(ω) = S(ω) if ω ∈ A, and U(ω) = T(ω) if ω ∉ A.

It is easy to see that U is a stopping time, so by Doob's optional stopping theorem,

E M0 = E M_U = E[M_S; A] + E[M_T; A^c].

Also,

E M0 = E M_T = E[M_T; A] + E[M_T; A^c].

Taking the difference, E[M_T; A] = E[M_S; A], which is what we needed to show.

Given two supermartingales Xt and Yt, it is routine to check that Xt ∧ Yt is also a supermartingale. Also, if X^n_t are supermartingales with X^n_t ↓ Xt, one can check that Xt

is again a supermartingale. With these facts, one can show that given a process such as Gt, there is a least supermartingale larger than Gt. So we define Wt to be a supermartingale (with respect to P, of course) such that Wt ≥ Gt a.s. for each t, and such that if Yt is another supermartingale with Yt ≥ Gt for all t, then Wt ≤ Yt for all t. We set

τ = inf{t : Wt = Gt}.

We will show that τ is the solution to the problem of finding the optimal stopping time. Of course, computing Wt and τ is another problem entirely.

Let Tt = {τ : τ is a stopping time, t ≤ τ ≤ T}. Let

Vt = sup_{τ ∈ Tt} E[Gτ | Ft].

Proposition 24.2. Vt is a supermartingale and Vt ≥ Gt for all t.

Proof. The fixed time t is a stopping time in Tt, so Vt ≥ E[Gt | Ft] = Gt; thus Vt ≥ Gt, and we only need to show that Vt is a supermartingale. Suppose s < t. Let π be the stopping time in Tt for which Vt = E[Gπ | Ft]. Since π ∈ Tt ⊂ Ts,

E[Vt | Fs] = E[Gπ | Fs] ≤ sup_{τ ∈ Ts} E[Gτ | Fs] = Vs.

Proposition 24.3. If Yt is a supermartingale with Yt ≥ Gt for all t, then Yt ≥ Vt.

Proof. If τ ∈ Tt, then since Yt is a supermartingale, we have E[Yτ | Ft] ≤ Yt. So

Vt = sup_{τ ∈ Tt} E[Gτ | Ft] ≤ sup_{τ ∈ Tt} E[Yτ | Ft] ≤ Yt.

What we have shown is that Wt is equal to Vt. It remains to show that τ is optimal; there may in fact be more than one optimal time, but in any case τ is one of them. Recall that F0 is the σ-field generated by S0, and hence consists of only ∅ and Ω.

Proposition 24.4. τ is an optimal stopping time.

Proof. Since F0 is trivial, V0 = sup_{τ ∈ T0} E[Gτ | F0] = sup_τ E[Gτ]. Let σ be a stopping time where the supremum is attained. Then

V0 ≥ E[Vσ | F0] = E[Vσ] ≥ E[Gσ] = V0.

Therefore all the inequalities must be equalities. Since Vσ ≥ Gσ, we must have Vσ = Gσ. Since τ was the first time that Wt equals Gt and Wt = Vt, we see that τ ≤ σ. Then

E[Gτ] = E[Vτ] ≥ E[Vσ] = E[Gσ].

Therefore the expected value of Gτ is at least as large as the expected value of Gσ, and hence τ is also an optimal stopping time.

The above representation of the optimal stopping problem may seem rather bizarre. However, this procedure gives good usable results for some optimal stopping problems. An example is where Gt is a function of just Wt.
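In discrete time the least supermartingale dominating G can be computed by backward induction: at the final step W equals G, and one step earlier W = max(G, E[W_next | F]). A Python sketch for the American put on a binomial (CRR) approximation of the stock; the CRR tree and all parameter choices are assumptions made for illustration, not part of the notes:

```python
import math

def american_put_binomial(S0, K, r, sigma, T, n):
    """Price an American put by backward induction for the least supermartingale
    (Snell envelope) dominating the payoff, on a CRR binomial tree."""
    dt = T / n
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    p = (math.exp(r * dt) - d) / (u - d)        # risk-neutral up probability
    disc = math.exp(-r * dt)
    # at maturity the envelope equals the payoff
    values = [max(K - S0 * u ** j * d ** (n - j), 0.0) for j in range(n + 1)]
    for i in range(n - 1, -1, -1):
        values = [
            max(max(K - S0 * u ** j * d ** (i - j), 0.0),              # exercise: G
                disc * (p * values[j + 1] + (1.0 - p) * values[j]))    # continue
            for j in range(i + 1)
        ]
    return values[0]
```

The nodes where the payoff achieves the maximum are exactly where W = G, i.e. the exercise region for the stopping time τ.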

25. Term structure.

We now want to consider the case where the interest rate is nondeterministic. To do so, we take another look at option pricing. From now on we assume we have already changed to the risk-neutral measure, and we write P instead of P̃. Let r(t) be the (random) interest rate at time t.

Accumulation factor. Let

β(t) = e^{∫_0^t r(u) du}

be the accumulation factor. One dollar at time T will be worth 1/β(T) in today's dollars.

Let V = (S_T − K)^+ be the payoff on the standard European call option at time T with strike price K, where St is the stock price. In today's dollars it is worth, as we have seen, V/β(T). Therefore the price of the option should be E[V/β(T)]. We can also get an expression for the value of the option at time t. The payoff, in terms of dollars at time t, should be the payoff at time T discounted by the interest or inflation rate, and so should be

e^{−∫_t^T r(u) du}(S_T − K)^+.

Therefore the value at time t is

E[e^{−∫_t^T r(u) du}(S_T − K)^+ | Ft] = E[β(t) V/β(T) | Ft] = β(t) E[V/β(T) | Ft].

Zero coupon. A zero coupon bond with maturity date T pays $1 at time T and nothing before. This is equivalent to an option with payoff value V = 1. So its price at time t, as above, should be

B(t, T) = β(t) E[1/β(T) | Ft] = E[e^{−∫_t^T r(u) du} | Ft].

Let's derive the SDE satisfied by B(t, T). Let Nt = E[1/β(T) | Ft]; here T is fixed. This is a martingale, so by the martingale representation theorem,

Nt = E[1/β(T)] + ∫_0^t Hs dWs

for some adapted integrand Hs. Since B(t, T) = β(t)Nt, by Ito's product formula,

dB(t, T) = β(t) dNt + Nt dβ(t) = β(t)Ht dWt + Nt r(t)β(t) dt = β(t)Ht dWt + B(t, T) r(t) dt,
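When r follows a concrete model, B(0, T) = E e^{−∫_0^T r(u)du} can be estimated by simulation. The sketch below (Python) assumes constant-coefficient mean-reverting Gaussian dynamics dr = (a − b r) dt + σ dW, an illustrative choice taken from Section 26, steps r with the exact Ornstein-Uhlenbeck transition, and checks that the direct estimate agrees with the lognormal formula exp(−mean + variance/2), which is valid because ∫ r du is then Gaussian:

```python
import math
import random

def bond_price_mc(r0=0.05, a=0.06, b=1.0, sigma=0.02, T=1.0,
                  n_steps=100, n_paths=10_000, seed=5):
    """Estimate B(0,T) = E exp(-int_0^T r(u) du) for dr = (a - b r) dt + sigma dW,
    using the exact Ornstein-Uhlenbeck transition for each step."""
    rng = random.Random(seed)
    dt = T / n_steps
    decay = math.exp(-b * dt)
    sd = sigma * math.sqrt((1.0 - math.exp(-2.0 * b * dt)) / (2.0 * b))
    theta = a / b
    integrals = []
    for _ in range(n_paths):
        r, integral = r0, 0.0
        for _ in range(n_steps):
            r_next = theta + (r - theta) * decay + sd * rng.gauss(0.0, 1.0)
            integral += 0.5 * (r + r_next) * dt   # trapezoidal rule for int r du
            r = r_next
        integrals.append(integral)
    n = len(integrals)
    m = sum(integrals) / n
    v = sum((i - m) ** 2 for i in integrals) / (n - 1)
    direct = sum(math.exp(-i) for i in integrals) / n
    gaussian = math.exp(-m + v / 2.0)    # uses that int r du is Gaussian here
    return direct, gaussian
```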

and we thus have

dB(t, T) = β(t)Ht dWt + B(t, T) r(t) dt.   (25.1)

The stochastic integral term introduces randomness into the bond price; if one holds T fixed and graphs B(t, T) as a function of t, the graph will not clearly show the behavior of r.

Forward rates. One sometimes specifies interest rates by what are known as forward rates. Suppose we want to borrow $1 at time T and repay it with interest at time T + ε. At the present time we are at time t ≤ T. Let us try to accomplish this by buying a zero coupon bond with maturity date T and shorting (i.e., selling) N zero coupon bonds with maturity date T + ε. Our outlay of money at time t is B(t, T) − N B(t, T + ε). If we set N = B(t, T)/B(t, T + ε), our outlay at time t is 0. At time T we receive $1. At time T + ε we pay N, that is, B(t, T)/B(t, T + ε). The effective rate of interest R over the time period T to T + ε is given by

e^{εR} = B(t, T)/B(t, T + ε).

Solving for R,

R = (log B(t, T) − log B(t, T + ε)) / ε.

We now let ε → 0, and we define the forward rate by

f(t, T) = −∂ log B(t, T)/∂T.   (25.2)

Sometimes interest rates are specified by giving f(t, T) instead of B(t, T) or r(t).

Recovering B from f. Let us see how to recover B(t, T) from f(t, T). Integrating, we have

∫_t^T f(t, u) du = −∫_t^T (∂/∂u) log B(t, u) du = −log B(t, u) |_{u=t}^{u=T} = −log B(t, T) + log B(t, t).

Since B(t, t) is the value of a zero coupon bond at time t which expires at time t, it is equal to 1, and its log is 0. Solving for B(t, T), we have

B(t, T) = e^{−∫_t^T f(t, u) du}.   (25.3)

Recovering r from f. Next, let us show how to recover r(t) from the forward rates. We have

B(t, T) = E[e^{−∫_t^T r(u) du} | Ft].

Differentiating,

(∂/∂T) B(t, T) = E[−r(T) e^{−∫_t^T r(u) du} | Ft].   (25.4)

Evaluating (25.4) when T = t, we obtain E[−r(t) | Ft] = −r(t). On the other hand, from (25.3),

(∂/∂T) B(t, T) = −f(t, T) e^{−∫_t^T f(t, u) du},

and setting T = t we obtain −f(t, t). Comparing the two yields

r(t) = f(t, t).   (25.5)
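The passage between B, f, and r can be checked numerically. The sketch below (Python) starts from a hypothetical initial bond curve B(0, T), an invented example rather than anything from the text, computes f(0, T) = −∂_T log B(0, T) by central differences, recovers B via (25.3) with a Riemann sum, and reads off r(0) = f(0, 0):

```python
import math

# Hypothetical initial bond curve (an assumption for illustration only):
B = lambda T: math.exp(-0.05 * T - 0.001 * T * T)

def forward_rate(T, h=1e-5):
    """f(0,T) = -d/dT log B(0,T), by central differences (equation (25.2))."""
    return -(math.log(B(T + h)) - math.log(B(T - h))) / (2.0 * h)

def bond_from_forwards(T, n=2000):
    """Recover B(0,T) = exp(-int_0^T f(0,u) du), midpoint Riemann sum ((25.3))."""
    du = T / n
    total = sum(forward_rate((i + 0.5) * du) * du for i in range(n))
    return math.exp(-total)
```

For this curve f(0, T) = 0.05 + 0.002 T, so the short rate is r(0) = f(0, 0) = 0.05, and the round trip through the forward rates reproduces B exactly up to discretization error.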

26. Some interest rate models.

Heath-Jarrow-Morton model. Instead of specifying r, the Heath-Jarrow-Morton model (HJM) specifies the forward rates:

df(t, T) = σ(t, T) dWt + α(t, T) dt.   (26.1)

Let us derive the SDE that B(t, T) satisfies. Since B(t, T) = exp(−∫_t^T f(t, u) du), we derive the SDE for B by using Ito's formula with the function e^x and Xt = −∫_t^T f(t, u) du. We have

dXt = f(t, t) dt − ∫_t^T df(t, u) du
    = r(t) dt − ∫_t^T [α(t, u) dt + σ(t, u) dWt] du
    = r(t) dt − (∫_t^T α(t, u) du) dt − (∫_t^T σ(t, u) du) dWt.

Let

α*(t, T) = ∫_t^T α(t, u) du,    σ*(t, T) = ∫_t^T σ(t, u) du.

Then, using Ito's formula,

dB(t, T) = B(t, T) dXt + (1/2) B(t, T) (σ*(t, T))² dt
         = B(t, T)[r(t) − α* + (1/2)(σ*)²] dt − σ* B(t, T) dWt.

From (25.1) we know the dt term must be B(t, T) r(t) dt, so comparing, we see that if P is the risk-neutral measure, we have α* = (1/2)(σ*)². See Note 1 for more on this.

Hull and White model. In this model, the interest rate r is specified as the solution to the SDE

dr(t) = σ(t) dWt + (a(t) − b(t) r(t)) dt.   (26.2)

Here σ, a, b are deterministic functions. The stochastic integral term introduces randomness, while the a − br term causes a drift toward a(t)/b(t). (Note that if σ(t) = σ, a(t) = a, b(t) = b are constants and σ = 0, then the solution to (26.2) converges to a/b.)

(26.1) is one of those SDE's that can be solved explicitly. Let K(t) = ∫_0^t b(u) du. Then

    d(e^{K(t)} r(t)) = e^{K(t)} r(t)b(t) dt + e^{K(t)} [(a(t) − b(t)r(t)) dt] + e^{K(t)} [σ(t) dW_t]
                     = e^{K(t)} a(t) dt + e^{K(t)} [σ(t) dW_t].

Integrating both sides,

    e^{K(t)} r(t) = r(0) + ∫_0^t e^{K(u)} a(u) du + ∫_0^t e^{K(u)} σ(u) dW_u.

Multiplying both sides by e^{−K(t)}, we have the explicit solution

    r(t) = e^{−K(t)} [r(0) + ∫_0^t e^{K(u)} a(u) du + ∫_0^t e^{K(u)} σ(u) dW_u].

We see that the mean at time t is

    E r(t) = e^{−K(t)} [r(0) + ∫_0^t e^{K(u)} a(u) du].

We know how to calculate the second moment of a stochastic integral, so

    Var r(t) = e^{−2K(t)} ∫_0^t e^{2K(u)} σ(u)² du.

(One can similarly calculate the covariance of r(s) and r(t).)

From undergraduate probability, linear combinations of Gaussian r.v.'s (Gaussian = normal) are Gaussian, and also limits of Gaussian r.v.'s are Gaussian. If F(u) is deterministic, then

    ∫_0^t F(u) dW_u = lim Σ F(u_i)(W_{u_{i+1}} − W_{u_i}).

Limits of linear combinations of Gaussians are Gaussian, so we conclude that the r.v. ∫_0^t F(u) dW_u is Gaussian. In particular, r(t) is Gaussian. We have

    B(0, T) = E[e^{−∫_0^T r(u) du}],

so we can calculate the mean and variance of ∫_0^T r(t) dt and get an explicit expression for B(0, T).

Cox-Ingersoll-Ross model. One drawback of the Hull and White model is that since r(t) is Gaussian, it can take negative values with positive probability, which doesn't make sense. The Cox-Ingersoll-Ross model avoids this by modeling r by the SDE

    dr(t) = (a − br(t)) dt + σ √r(t) dW_t.
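With constant coefficients (so K(t) = bt), the mean and variance formulas reduce to E r(t) = e^{−bt} r(0) + (a/b)(1 − e^{−bt}) and Var r(t) = σ²(1 − e^{−2bt})/(2b), and both can be compared against a simulation. A minimal Euler-Maruyama sketch (all parameter values are arbitrary assumptions):

```python
import math
import random

random.seed(42)
a, b, sig, r0 = 0.1, 1.0, 0.2, 0.05   # assumed constant coefficients
T, n_steps, n_paths = 1.0, 100, 10000
dt = T / n_steps

# Euler-Maruyama simulation of dr = sigma dW + (a - b r) dt.
finals = []
for _ in range(n_paths):
    r = r0
    for _ in range(n_steps):
        r += (a - b * r) * dt + sig * math.sqrt(dt) * random.gauss(0.0, 1.0)
    finals.append(r)

mc_mean = sum(finals) / n_paths
mc_var = sum((x - mc_mean) ** 2 for x in finals) / n_paths

# Closed forms from the text, specialized to K(t) = b t:
mean = math.exp(-b * T) * r0 + (a / b) * (1 - math.exp(-b * T))
var = sig ** 2 * (1 - math.exp(-2 * b * T)) / (2 * b)
print(mc_mean, mean)   # Monte Carlo vs. exact mean
print(mc_var, var)     # Monte Carlo vs. exact variance
```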

The difference from the Hull and White model is the square root of r in the stochastic integral term. This square root implies that when r(t) is small, the fluctuations in r(t) are smaller than they are in the Hull and White model. Provided a ≥ (1/2)σ², it can be shown that r(t) will never hit 0 and will always be positive. Although one cannot solve for r explicitly, one can calculate the distribution of r. It turns out to be related to the square of what are known in probability theory as Bessel processes. (The density of r(t), for example, will be given in terms of Bessel functions.)

Note 1. If P is not the risk-neutral measure, it is still possible that one exists. Let θ(t) be a function of t; let

    M_t = exp(−∫_0^t θ(u) dW_u − (1/2)∫_0^t θ(u)² du)

and define P̃(A) = E[M_T; A] for A ∈ F_T. By the Girsanov theorem,

    dB(t, T) = B(t, T)[r(t) − α* + (1/2)(σ*)² + σ*θ] dt − σ* B(t, T) dW̃_t,

where W̃_t is a Brownian motion under P̃. Again, comparing with (25.1) we must have

    α* = (1/2)(σ*)² + σ*θ.

Differentiating with respect to T, we obtain

    α(t, T) = σ(t, T)σ*(t, T) + σ(t, T)θ(t).

If we try to solve this equation for θ, there is no reason off-hand that θ depends only on t and not T. However, if θ does not depend on T, P̃ will be the risk-neutral measure.
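The behavior of the CIR equation near 0 can be explored by simulation. The sketch below uses a full-truncation Euler scheme (replacing r by max(r, 0) in the coefficients, a standard fix since the discretized path can overshoot below zero even though the true process stays positive); taking expectations in the SDE shows that E r(t) solves dE r/dt = a − b E r, the same linear ODE as in the Hull and White model, which gives the benchmark below. Parameter values are arbitrary assumptions satisfying a ≥ σ²/2.

```python
import math
import random

random.seed(3)
a, b, sig, r0 = 0.04, 1.0, 0.2, 0.05   # assumptions; note a >= sig^2 / 2 = 0.02
T, n_steps, n_paths = 1.0, 100, 10000
dt = T / n_steps

# Full-truncation Euler scheme for dr = (a - b r) dt + sig sqrt(r) dW.
finals = []
for _ in range(n_paths):
    r = r0
    for _ in range(n_steps):
        rp = max(r, 0.0)   # the sqrt(r) factor damps the noise as r approaches 0
        r += (a - b * rp) * dt + sig * math.sqrt(rp * dt) * random.gauss(0.0, 1.0)
    finals.append(r)

mc_mean = sum(finals) / n_paths
# E r(t) = r(0) e^{-bt} + (a/b)(1 - e^{-bt}), from the linear ODE for the mean.
mean = r0 * math.exp(-b * T) + (a / b) * (1 - math.exp(-b * T))
print(mc_mean, mean)
```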

Problems

1. Consider the binomial asset pricing model with n = 3, u = 3, d = 1/2, r = 0.1, S0 = 20, and K = 10. If V is a European call with strike price K and exercise date n, compute explicitly the random variables V1 and V2 and calculate the value V0.

2. In the same model as problem 1, compute the hedging strategy ∆0, ∆1, and ∆2.

3. Suppose X1, X2, . . . , Xn are independent and for each i we have P(Xi = 1) = P(Xi = −1) = 1/2. Let Sn = Σ_{i=1}^n Xi. Show that Mn = Sn³ − 3nSn is a martingale.

4. Let Xi and Sn be as in Problem 3. Let φ(x) = (1/2)(e^x + e^{−x}). Show that Mn = e^{aSn} φ(a)^{−n} is a martingale for each real a.

5. Suppose Xn is a martingale with respect to Gn and Fn = σ(X1, . . . , Xn). Show Xn is a martingale with respect to Fn.

6. Suppose Mn is a martingale, Nn = Mn², and E Nn < ∞ for each n. Show E[Nn+1 | Fn] ≥ Nn for each n. Do not use Jensen's inequality.

7. Suppose Mn is a martingale, Nn = |Mn|, and E Nn < ∞ for each n. Show E[Nn+1 | Fn] ≥ Nn for each n. Do not use Jensen's inequality.

8. Prove that E[aX1 + bX2 | G] = a E[X1 | G] + b E[X2 | G].

9. Show E[X E[Y | G]] = E[Y E[X | G]].

10. Let Xn and Yn be martingales with E Xn² < ∞ and E Yn² < ∞. Show

    E Xn Yn − E X0 Y0 = Σ_{m=1}^n E[(Xm − Xm−1)(Ym − Ym−1)].

11. Show that if Xn and Yn are martingales with respect to {Fn} and Zn = max(Xn, Yn), then E[Zn+1 | Fn] ≥ Zn.

12. Show that in the binomial asset pricing model the value of the option V at time k is Vk.

13. Suppose Xn is a submartingale. Show there exists a martingale Mn such that if An = Xn − Mn, then A0 ≤ A1 ≤ A2 ≤ · · · and An is Fn−1 measurable for each n.

14. Suppose Xn is a submartingale and Xn = Mn + An = M′n + A′n, where both An and A′n are Fn−1 measurable for each n, both Mn and M′n are martingales, both An and A′n increase in n, and A0 = A′0. Show Mn = M′n for each n.

15. Suppose that S and T are stopping times. Show that max(S, T) and min(S, T) are also stopping times.
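The binomial computation in Problems 1 and 2 can be checked with a few lines of code. This sketch (not part of the notes) uses the backward-induction recursion and the risk-neutral probability p = (1 + r − d)/(u − d) from the binomial pricing chapters; the printed number is the value V0 against which a hand computation can be compared.

```python
# Backward-induction pricing in the binomial model of Problem 1.
n, u, d, r, S0, K = 3, 3.0, 0.5, 0.1, 20.0, 10.0
p = (1 + r - d) / (u - d)   # risk-neutral probability of an up move

# Terminal call payoffs, indexed by the number k of up moves.
V = [max(S0 * u**k * d**(n - k) - K, 0.0) for k in range(n + 1)]

# Roll back one period at a time: V_k = E[V_{k+1} | F_k] / (1 + r).
for step in range(n, 0, -1):
    V = [((1 - p) * V[k] + p * V[k + 1]) / (1 + r) for k in range(step)]

print(V[0])   # the time-0 value of the call
```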

16. Suppose that Sn is a stopping time for each n and S1 ≤ S2 ≤ · · ·. Show S = lim_{n→∞} Sn is also a stopping time. Show that if instead S1 ≥ S2 ≥ · · · and S = lim_{n→∞} Sn, then S is again a stopping time.

17. Suppose Mt is a continuous bounded martingale for which ⟨M⟩_∞ is also bounded. Show that

    Σ_{i=0}^{2^n − 1} (M_{(i+1)/2^n} − M_{i/2^n})²

converges to ⟨M⟩_1 as n → ∞. [Hint: Show that Ito's formula implies

    (M_{(i+1)/2^n} − M_{i/2^n})² = ∫_{i/2^n}^{(i+1)/2^n} 2(M_s − M_{i/2^n}) dM_s + ⟨M⟩_{(i+1)/2^n} − ⟨M⟩_{i/2^n}.

Then sum over i and show that the stochastic integral term goes to zero as n → ∞.]

18. Show that e^{iuW_t + u²t/2} can be written in the form 1 + ∫_0^t H_s dW_s and give an explicit formula for H_s.

19. Let Wt be Brownian motion. Let fε be defined by fε(0) = f′ε(0) = 0 and f″ε(x) = (1/(2ε)) 1_{(−ε,ε)}(x). Show that

    (1/(2ε)) ∫_0^t 1_{(−ε,ε)}(W_s) ds

converges as ε → 0 to a continuous nondecreasing process that is not identically zero and that increases only when W_t is at 0. [Hint: Use Ito's formula to rewrite (1/(2ε)) ∫_0^t 1_{(−ε,ε)}(W_s) ds in terms of fε(W_t) − fε(W_0) plus a stochastic integral term and take the limit in this formula. You may assume that it is valid to use Ito's formula with the function fε (note fε ∉ C²).]

20. Let Xt be the solution to

    dX_t = σ(X_t) dW_t + b(X_t) dt,    X_0 = x,

where Wt is Brownian motion and σ and b are bounded C^∞ functions and σ is bounded below by a positive constant. Find a nonconstant function f such that f(Xt) is a martingale. [Hint: Apply Ito's formula to f(Xt) and obtain an ordinary differential equation that f needs to satisfy.]

21. Suppose Xt = Wt + F(t), where F is a twice continuously differentiable function, F(0) = 0, and Wt is a Brownian motion under P. Find a probability measure Q under
which Xt is a Brownian motion and prove your statement. (You will need to use the general Girsanov theorem.)

22. Suppose we are in the continuous time model. Suppose the interest rate is 0 and St is the standard geometric Brownian motion stock price. Let A and B be fixed positive reals, and let V be the option that pays off 1 at time T if A ≤ S_T ≤ B and 0 otherwise. (a) Determine the price at time 0 of V. (b) Find the hedging strategy that duplicates the claim V.

23. Suppose we have a stock where σ = 2, K = 15, r = 0.1, S0 = 10, and T = 3. Determine the price of the standard European call using the Black-Scholes formula.

24. Suppose the stock price is given by

    dS_t = σ S_t dW_t + µ(t) S_t dt,

where Wt is a Brownian motion and µ(t) is a deterministic (i.e., nonrandom) function. Let r and σ be constants, as usual. Let V be the standard European call that has strike price K and exercise date T. Determine the price at time 0 of V.

25. Let ψ(t, x, y, µ) = P(sup_{s≤t}(W_s + µs) = y, W_t = x), where Wt is a Brownian motion; more precisely, for each A, B, C, D,

    P(A ≤ sup_{s≤t}(W_s + µs) ≤ B, C ≤ W_t ≤ D) = ∫_C^D ∫_A^B ψ(t, x, y, µ) dy dx.

(ψ has an explicit formula, but we don't need that here.) Let the stock price St be given by the standard geometric Brownian motion. Let V be the option that pays off sup_{s≤T} S_s at time T. Determine the price at time 0 of V as an expression in terms of ψ.

26. Suppose X_t = W_t − ∫_0^t X_s ds. Show that

    X_t = ∫_0^t e^{s−t} dW_s.
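Problem 26 can be sanity-checked by simulation: discretize the SDE dX_t = dW_t − X_t dt and the proposed representation ∫_0^t e^{s−t} dW_s with the same Brownian increments and compare the two at t = 1. A small sketch (step count and seed are arbitrary choices):

```python
import math
import random

random.seed(7)
T, n = 1.0, 1000
dt = T / n
dW = [math.sqrt(dt) * random.gauss(0.0, 1.0) for _ in range(n)]

# Euler scheme for dX = dW - X dt with X_0 = 0.
x = 0.0
for k in range(n):
    x += dW[k] - x * dt

# Left-endpoint discretization of int_0^T e^{s - T} dW_s.
y = sum(math.exp(k * dt - T) * dW[k] for k in range(n))

print(x, y)   # the two discretizations nearly agree
```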
