
Stochastic processes without measure theory

Byron Schmuland

I returned, and saw under the sun, that the race is not to the swift, nor
the battle to the strong, neither yet bread to the wise, nor yet riches to
men of understanding, nor yet favour to men of skill; but time and chance
happeneth to them all. Ecclesiastes 9:11.

Contents

1. Finite Markov chains
   A. Basic definitions
   B. Calculating probabilities
   C. Invariant probabilities
   D. Classification of states
   E. Hitting times

2. Countable Markov chains
   A. Recurrence and transience
   B. Difference equations and random walks on Z
   C. The general result for random walks
   D. Two types of recurrence
   E. Branching processes

3. Optimal stopping
   A. Strategies for winning
   B. Examples
   C. Algorithms to find optimal strategies
   D. Binomial pricing model

4. Martingales
   A. Conditional expectation
   B. Martingales
   C. Optional sampling theorem
   D. Martingale convergence theorem

5. Brownian motion
   A. Basic properties
   B. Reflection principle
   C. Dirichlet problem

6. Stochastic integration
   A. Integration with respect to random walk
   B. Integration with respect to Brownian motion
   C. Itô's formula


Finite Markov Chains

Basic definitions

Let (Xn)n≥0 be a stochastic process taking values in a state space S that
has N states. To understand the behaviour of this process we will need to
calculate probabilities like

P(X0 = i0, X1 = i1, . . . , Xn = in).    (1.1)

This can be computed by multiplying conditional probabilities as follows:

P(X0 = i0) P(X1 = i1 | X0 = i0) P(X2 = i2 | X1 = i1, X0 = i0)
· · · P(Xn = in | Xn−1 = in−1, . . . , X1 = i1, X0 = i0).

Example A-1. We randomly select playing cards from an ordinary deck.
The state space is S = {Red, Black}. Let's calculate the chance of observing
the sequence RRB using two different sampling methods.

(a) Without replacement:

P(X0 = R, X1 = R, X2 = B)
  = P(X0 = R) P(X1 = R | X0 = R) P(X2 = B | X1 = R, X0 = R)
  = (26/52)(25/51)(26/50)
  ≈ .12745.

(b) With replacement:

P(X0 = R, X1 = R, X2 = B)
  = P(X0 = R) P(X1 = R | X0 = R) P(X2 = B | X1 = R, X0 = R)
  = (26/52)(26/52)(26/52)
  = .12500.


Definition. The process (Xn)n≥0 is called a Markov chain if for any n and
any collection of states i0, i1, . . . , in+1 we have

P(Xn+1 = in+1 | Xn = in, . . . , X1 = i1, X0 = i0) = P(Xn+1 = in+1 | Xn = in).

For a Markov chain the future depends only on the current state and not
on past history.

Exercise. In example A-1, calculate P(X2 = B | X1 = R) and confirm that
only with replacement do we get a Markov chain.

Definition. A Markov chain (Xn)n≥0 is called time homogeneous if, for any
i, j ∈ S, we have

P(Xn+1 = j | Xn = i) = p(i, j),

for some function p : S × S → [0, 1].

From now on, we will assume that all our Markov chains are time
homogeneous. The probabilities (1.1) for a Markov chain are computed using
the initial probabilities π(i) = P(X0 = i) and the transition probabilities
p(i, j):

P(X0 = i0, X1 = i1, . . . , Xn = in) = π(i0) p(i0, i1) p(i1, i2) · · · p(in−1, in).

We will often be interested in probabilities conditional on the starting
position, and will write Pi(A) instead of P(A | X0 = i). In a similar vein, we
will write conditional expected values as Ei(X) instead of E(X | X0 = i).

Definition. The transition matrix for the Markov chain (Xn)n≥0 is the
N × N matrix P whose (i, j)th entry is p(i, j).


Example. Card color with replacement.

         B     R
  B    1/2   1/2
P = R  1/2   1/2

Every transition matrix satisfies 0 ≤ p(i, j) ≤ 1 and Σ_{j∈S} p(i, j) = 1.
Such matrices are also called stochastic.

Example A-2. Two state Markov chain.

Imagine an office where there is only one telephone, so that at any time
the phone is either free (0) or busy (1). During each time period p is the
chance that a call comes in, while q is the chance that a call is completed.
A sample history:

0 0 0 1 1 1 0

The two arrows on top represent incoming calls and the arrow below is
the completed call. Note that the second incoming call was lost since the
phone was already busy when it came in.

          0        1
  0     1 − p      p
P = 1     q      1 − q

Example A-3. A simple queueing system.

Now imagine that the office got a second telephone, so that the number of
busy phones can be 0, 1, or 2. During each time period p is the chance
that a call comes in, while q is the chance that a call is completed. A
sample history:

0 0 0 1 1 2 1

The two arrows on top represent incoming calls and the arrow below is
the completed call. This time the second incoming call is answered on the
second phone.

          0             1                      2
  0     1 − p           p                      0
P = 1  (1 − p)q   (1 − p)(1 − q) + pq      p(1 − q)
  2       0             q                    1 − q

Example A-4. Random walk with reflecting boundaries.

The state space is S = {0, 1, . . . , N}, and at each time the walker jumps to
the right with probability p, or to the left with probability 1 − p.

[Figure: walk on 0, 1, . . . , j−1, j, j+1, . . . , N−1, N, with right arrows
labelled p and left arrows labelled 1 − p.]

That is, p(j, j + 1) = p and p(j, j − 1) = 1 − p, for j = 1, . . . , N − 1. The
boundary conditions are p(0, 1) = 1 and p(N, N − 1) = 1, and p(i, j) = 0
otherwise. In the case p = 1/2, we call this a symmetric random walk.

Example A-5. Random walk with absorbing boundaries.

As above, except with boundary conditions p(0, 0) = 1 and p(N, N) = 1.


Example A-6. Symmetric random walk vs. random walk with drift.

The following pages show some pictures of the first 10000 steps of a gambler
playing red-black on a roulette wheel. The symmetric walk (p = 1/2) is
the idealized situation of a fair game, while the walk with drift (p = 9/19)
shows the real situation where the casino takes advantage of the green spots
on the wheel.

The central limit theorem can help us understand the result after 10000
plays of the game. In the symmetric case, it is a tossup whether you are
ahead or behind. In the real situation, however, the laws of probability have
practically guaranteed a nice profit for the casino.
[Figures: "1d Random Walk" — 10000 steps, 3 sample paths of the symmetric
walk; "Biased 1d Random Walk" (p = 9/19) — 10000 steps, 3 sample paths,
all drifting steadily downward, one ending near −526.31.]

Example A-7. A genetics model.

Imagine a population of fixed size N that has two types of individuals, blue
and red. At each time t, every individual gives birth to one offspring
whose genetic type (blue or red) is randomly chosen according to the empirical
distribution of the current generation.

The old generation dies, and the new generation takes its place, keeping the
population size fixed at N. This process repeats itself in every generation.
The state space of the process is the set of all probability measures on the
type space {blue, red}.

The picture below shows a typical sample path for the process. Here
N = 400 and X0 = (1/2, 1/2). Notice that as time goes on, one of the
genetic types starts to dominate and will eventually become fixed. Once
the process hits one of the absorbing states, it is stuck and the population
stops evolving.



[Figure: snapshots of the genetics model population at Time = 0, 1, 2, 100, 200, 300.]

Calculating probabilities

Consider a two state Markov chain with p = 1/4 and q = 1/6, so that

         0     1
P = 0   3/4   1/4
    1   1/6   5/6

To find the probability that the process follows a certain path, you multiply
the initial probability with conditional probabilities. For example, what is
the chance that the process begins with 01010?

P(X0 = 0, X1 = 1, X2 = 0, X3 = 1, X4 = 0) = π(0) p(0, 1) p(1, 0) p(0, 1) p(1, 0)
                                          = π(0) · 1/576.


As a second example, let's find the chance that the process begins with
00000.

P(X0 = 0, X1 = 0, X2 = 0, X3 = 0, X4 = 0) = π(0) p(0, 0) p(0, 0) p(0, 0) p(0, 0)
                                          = π(0) · 81/256.

If (as in many situations) we were interested in conditional probabilities,
given that X0 = 0, we simply drop π(0), that is,

P(X1 = 1, X2 = 0, X3 = 1, X4 = 0 | X0 = 0) = 1/576 = .00174,    (1.2)

P(X1 = 0, X2 = 0, X3 = 0, X4 = 0 | X0 = 0) = 81/256 = .31641.   (1.3)

Here's a harder problem: suppose we want to find P(X4 = 0 | X0 = 0).
Instead of one path, this includes all possible paths that start in state 0 at
time 0 and find themselves in state 0 again at time 4. Then P(X4 = 0 |
X0 = 0) is the sum of (1.2) and (1.3) above, plus six others. (Why six?)
Luckily there is an easier way to calculate such probabilities.
Luckily there is an easier way to calculate such probabilities.

Theorem. The conditional probability P(Xn = j | X0 = i) is the (i, j)th
entry in the matrix P^n.

In the problem above, matrix multiplication gives

            0             1
P^4 = 0   3245/6912    3667/6912
      1   3667/10368   6701/10368

so that P(X4 = 0 | X0 = 0) = 3245/6912 = .46947.

Probability vectors. Let π0 = (π0(1), . . . , π0(i), . . . , π0(N)) be the 1 × N
row vector whose ith entry is P(X0 = i). This vector gives the distribution
of the random variable X0. If we multiply the N × N matrix P^n on the
left by π0, we get a new row vector whose jth entry is Σ_{i∈S} π0(i) pn(i, j).
But for each j ∈ S,

Σ_{i∈S} π0(i) pn(i, j) = Σ_{i∈S} P(X0 = i) P(Xn = j | X0 = i)
                       = Σ_{i∈S} P(X0 = i, Xn = j)
                       = P(Xn = j).

In other words, the vector πn = π0 P^n gives the distribution of Xn.

Example. If we start in state zero, then π0 = (1, 0) and

π4 = π0 P^4 = (3245/6912, 3667/6912) = (.46947, .53053).

On the other hand, if we flip a coin to choose the starting position, then
π0 = (1/2, 1/2) and π4 = (17069/41472, 24403/41472) = (.41158, .58842).

Invariant Probabilities

Definition. A probability vector π is called invariant for the Markov chain
if π = πP.


Example. Let's try to find invariant probability vectors for some Markov
chains.

1. Suppose that

P = 0  1
    1  0

An invariant probability vector π = (π1, π2) must satisfy

(π1, π2) = (π1, π2) P,

or, multiplying out the right hand side, (π1, π2) = (π2, π1). This equation
gives us π1 = π2, and since π1 + π2 = 1, we conclude that π1 = 1/2 and
π2 = 1/2. The unique invariant probability vector for P is π = (1/2, 1/2).

2. If

P = 1  0
    0  1

is the identity matrix, then any probability vector π satisfies π = πP!

3. Consider a two state Markov chain with

P = 3/4  1/4
    1/6  5/6

The first entry of π = πP gives π1 = 3π1/4 + π2/6, which implies π1 =
2π2/3. Using π1 + π2 = 1, we conclude that π = (2/5, 3/5).
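For larger chains it is easier to let the computer solve π = πP. A minimal
sketch (Python with numpy, assumed here only for illustration): append the
normalization π1 + · · · + πN = 1 to the singular system π(I − P) = 0 and
take a least-squares solution.

    import numpy as np

    def invariant(P):
        """Solve pi = pi P together with sum(pi) = 1 by least squares."""
        n = P.shape[0]
        A = np.vstack([(np.eye(n) - P).T, np.ones(n)])
        b = np.zeros(n + 1)
        b[-1] = 1.0                      # the normalization equation
        pi, *_ = np.linalg.lstsq(A, b, rcond=None)
        return pi

    P = np.array([[3/4, 1/4],
                  [1/6, 5/6]])
    print(invariant(P))                  # [0.4 0.6], i.e. pi = (2/5, 3/5)

When the invariant vector is not unique (as for the identity matrix in
example 2), this recipe simply returns one particular invariant vector.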

Theorem. A probability vector π is invariant if and only if there is a
probability vector v such that π = limn→∞ vP^n.

Proof: (⇒) Suppose that π is invariant, and choose π0 = π. Then π1 =
π0 P = πP = π. Repeating this argument shows that πn = π for all n ≥ 1.
Therefore π = limn→∞ πP^n. (In this case, we say that the Markov chain is
in equilibrium.)

(⇐) Suppose that π = limn→∞ vP^n. Multiply both sides on the right by P
to obtain

πP = (limn→∞ vP^n) P = limn→∞ vP^{n+1} = π.

This shows that π is invariant. □

Let's investigate the general 2 × 2 matrix

P = 1−p    p
     q    1−q

It has eigenvalues 1 and 1 − (p + q). If p + q > 0, then P can be
diagonalized as P = QDQ^{-1}, where

Q = 1   p      D = 1       0            Q^{-1} =  q/(p+q)   p/(p+q)
    1  −q          0  1 − (p + q)                 1/(p+q)  −1/(p+q)

Using these matrices, it is easy to find powers of the matrix P. For
example, P² = (QDQ^{-1})(QDQ^{-1}) = QD²Q^{-1}. In the same way, for every
n ≥ 1 we have

P^n = QD^nQ^{-1}

    = Q   1          0           Q^{-1}
          0   (1 − (p + q))^n

    =  q/(p+q)  p/(p+q)    +  (1 − (p + q))^n    p/(p+q)  −p/(p+q)
       q/(p+q)  p/(p+q)                         −q/(p+q)   q/(p+q)

Now, if 0 < p + q < 2 then (1 − (p + q))^n → 0 as n → ∞ and

P^n →  q/(p+q)  p/(p+q)   =  π
       q/(p+q)  p/(p+q)      π

where π = (q/(p + q), p/(p + q)). For any probability vector v we have

limn→∞ vP^n = v limn→∞ P^n = π.

This means that π is the unique limiting vector for P, and hence the unique
invariant probability vector.
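The convergence of P^n is easy to watch numerically; a small sketch,
continuing with the chain p = 1/4, q = 1/6 from the last section:

    import numpy as np

    p, q = 1/4, 1/6
    P = np.array([[1 - p, p],
                  [q, 1 - q]])

    for n in (1, 5, 20, 100):
        print(n)
        print(np.linalg.matrix_power(P, n))
    # Both rows approach (q/(p+q), p/(p+q)) = (0.4, 0.6),
    # at the geometric rate |1 - (p+q)|^n.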


Definition. A Markov chain is called ergodic if P^n converges to a matrix
Π with identical rows π as n → ∞. In that case, π is the unique invariant
probability vector.

Theorem. If P is a stochastic matrix with P^n > 0 for some n ≥ 1, then
the Markov chain is ergodic.

The next result is valid for any Markov chain, ergodic or not.

Theorem. If P is a stochastic matrix, then (1/n) Σ_{k=1}^n P^k → M. The
set of invariant probability vectors is the set of all convex combinations of
the rows of M.

Proof: We assume the convergence result and prove the second statement.
Note that any vector π that is a convex combination of rows of M can be
written π = vM for some probability vector v.

(⇒) If π is an invariant probability vector, then π = πP^k for every k.
Therefore

π = π ( (1/n) Σ_{k=1}^n P^k ) → πM,

which shows that π is a convex combination of the rows of M.

(⇐) Suppose that π = vM for some probability vector v. Then

πP = vMP = limn→∞ v ( (1/n) Σ_{k=1}^n P^k ) P
   = limn→∞ (1/n) v (P² + · · · + P^{n+1})
   = limn→∞ (1/n) v (P + · · · + P^n) + (1/n) v (P^{n+1} − P)
   = vM + 0
   = π. □


We sketch the argument for the convergence result (1/n) Σ_{k=1}^n P^k → M.
The (i, j) entry of the approximating matrix (1/n) Σ_{k=1}^n P^k can be
expressed in terms of probability as

(1/n) Σ_{k=1}^n (P^k)ij = (1/n) Σ_{k=1}^n Pi(Xk = j)
                        = (1/n) Σ_{k=1}^n Ei(1{Xk = j})
                        = Ei( (1/n) Σ_{k=1}^n 1{Xk = j} ).

This is the expected value of the random variable representing the average
number of visits the Markov chain makes to state j during the first n time
periods. A law of large numbers type result will be used to show why this
average converges.

Define the return time of the state j as Tj := inf{k ≥ 1 | Xk = j}. We
use the convention that the infimum of the empty set is ∞. There are two
possibilities for the sequence 1{Xk = j}: if Tj = ∞, then it is just a
sequence of zeros, and (1/n) Σ_{k=1}^n 1{Xk = j} = 0. On the other hand,
if Tj < ∞, then the history of the process up to Tj is irrelevant and we
may just as well start counting visits to j from time Tj. This leads to the
equation

mij = Pi(Tj < ∞) mjj.

Putting i = j above, we discover that if Pj(Tj < ∞) < 1, then mjj = 0.
Thus mij = 0 for all i ∈ S, and hence πj = 0 for any invariant probability
vector π. If Pj(Tj < ∞) = 1, then in fact Ej(Tj) < ∞ (for a finite state
space!).

The following picture shows the first n + 1 values of the sequence 1{Xk = j},
where we assume that the (ℓ + 1)th visit to state j occurs at time n. The
random variable Tj^s is defined as the time between the (s − 1)th and sth
visits. These are independent, identically distributed random variables with
the same distribution as Tj.

1 | 0 0 0 0 1 | 0 0 0 0 0 1 | 0 0 0 0 0 0 0 0 0 0 0 0 0 1 | · · · | 0 0 0 0 0 0 0 1
   ←─ Tj¹ ─→    ←── Tj² ──→   ←──────────── Tj³ ─────────→          ←─── Tjℓ ───→

(the final 1 occurs at the nth trial)
The average number of visits to state j up to time n can be represented as
the inverse of the average amount of time between visits. The law of large
numbers says that, Pj-almost surely,

(1/n) Σ_{k=1}^n 1{Xk = j} ≈ ℓ / (Tj¹ + · · · + Tjℓ) → 1/Ej(Tj) > 0.

We conclude that (1/n) Σ_{k=1}^n P^k → M, where mij = Pi(Tj < ∞)/Ej(Tj).

Examples of invariant probabilities.

1. The rat. Suppose that a rat wanders aimlessly through the maze
pictured below. If the rat always chooses one of the available doors at
random, regardless of what's happened in the past, then Xn = the rat's
position at time n defines a Markov chain.

[Figure: a four-room maze, rooms labelled 1, 2, 3, 4.]

The transition matrix for this chain is

      0   1/3  1/3  1/3
P =  1/2   0    0   1/2
     1/2   0    0   1/2
     1/3  1/3  1/3   0

To find the invariant probability vector, we rewrite the equation π = πP
as (I − P^t)π^t = 0, where P^t is the transpose of the matrix P and I is
the identity matrix. The usual procedure of row reduction leads to the
answer:

  1   −1/2  −1/2  −1/3        1  −1/2  −1/2  −1/3
−1/3    1     0   −1/3   →    0   5/6  −1/6  −4/9
−1/3    0     1   −1/3        0  −1/6   5/6  −4/9
−1/3  −1/2  −1/2    1         0  −2/3  −2/3   8/9

    1   0  −3/5  −3/5         1  0  0   −1
→   0   1  −1/5  −8/15   →    0  1  0  −2/3
    0   0   4/5  −8/15        0  0  1  −2/3
    0   0  −4/5   8/15        0  0  0    0

This last matrix tells us that π1 − π4 = 0, π2 − 2π4/3 = 0, and
π3 − 2π4/3 = 0; in other words, the invariant vector is (π4, 2π4/3, 2π4/3, π4).
Because π1 + π2 + π3 + π4 = 1, we need π4 = 3/10, so the unique invariant
probability vector is π = (3/10, 2/10, 2/10, 3/10).
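The convergence theorem for (1/n) Σ P^k suggests another check: simulate
the rat and record the fraction of time spent in each room. A minimal
simulation sketch (Python with numpy; the seed and step count are arbitrary
choices):

    import numpy as np

    rng = np.random.default_rng(0)
    P = np.array([[0, 1/3, 1/3, 1/3],
                  [1/2, 0, 0, 1/2],
                  [1/2, 0, 0, 1/2],
                  [1/3, 1/3, 1/3, 0]])

    visits = np.zeros(4)
    state = 0
    for _ in range(200_000):
        state = rng.choice(4, p=P[state])   # walk through a random door
        visits[state] += 1

    print(visits / visits.sum())            # close to (0.3, 0.2, 0.2, 0.3)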
2. Random walk. Here is an example from the final exam in 2004.

[Figure: random walk on 0, 1, . . . , N; interior arrows carry probability 1/2
in each direction, and each boundary state jumps inward with probability α
or sits still with probability 1 − α.]

A particle performs a random walk on {0, 1, . . . , N} as drawn above. On
the interior, it is a symmetric random walk. From either of the boundary
points 0 or N, the particle either jumps to the neighboring state (with
probability α) or sits still (with probability 1 − α). Here, α can take any
value in [0, 1], so this includes both reflecting and absorbing boundaries, as
well as so-called sticky boundaries.

Find all invariant probability measures for every 0 ≤ α ≤ 1.

Solution: The vector equation π = πP gives

π0 = (1 − α)π0 + π1/2
π1 = απ0 + π2/2
πj = πj−1/2 + πj+1/2,  for j = 2, . . . , N − 2
πN−1 = απN + πN−2/2
πN = (1 − α)πN + πN−1/2,

which can be re-written as

2απ0 = π1
π1 − 2απ0 = π2 − π1
πj − πj−1 = πj+1 − πj,  for j = 2, . . . , N − 2
πN−1 − 2απN = πN−2 − πN−1
2απN = πN−1.

Combining the first two equations shows that π2 = π1, and then the middle
set of equations implies that π1 = π2 = π3 = · · · = πN−1. If α > 0, then
both π0 and πN equal π1/2α. From Σ_{j=0}^N πj = 1, we get the unique
solution

π0 = πN = 1/(2((N − 1)α + 1)),    πj = α/((N − 1)α + 1)  for j = 1, . . . , N − 1.

If α = 0, then we find that π1 = π2 = · · · = πN−1 = 0, and π0, πN are any
two non-negative numbers that add to 1.
3. Google search engine. One of the primary reasons why Google is such
an effective search engine is the PageRank algorithm developed by Google's
founders, Larry Page and Sergey Brin, when they were graduate students
at Stanford. A brief explanation of PageRank is available at

http://www.google.com/technology/index.html

Imagine surfing the Web, randomly choosing an outgoing link from one
page to get to the next. This can lead to dead ends at pages with no
outgoing links, or cycles around cliques of interconnected pages. So, a
certain fraction of the time, simply choose a random page. This theoretical
random walk on the Web is a Markov chain. The limiting probability that
a random surfer visits any particular page is its PageRank. A page has
high rank if it has links to and from other pages with high rank.


To illustrate, imagine a miniature Web consisting of six pages labelled A
to F, connected as below.

[Figure: directed graph on the pages A–F, with links A → B, C; B → C, D;
C → D, E, F; D → A; E → F; F → A.]
The connectivity matrix of this miniweb is:

        A  B  C  D  E  F
    A   0  1  1  0  0  0
    B   0  0  1  1  0  0
G = C   0  0  0  1  1  1
    D   1  0  0  0  0  0
    E   0  0  0  0  0  1
    F   1  0  0  0  0  0

The transition probabilities are pij = p gij / Σk gik + (1 − p)/n. With n = 6
and p = .85, we get

              1    18   18    1     1     1
              1     1   18   18     1     1
P = (1/40)    1     1    1  37/3  37/3  37/3
             35     1    1    1     1     1
              1     1    1    1     1    35
             35     1    1    1     1     1

Using software to solve the matrix equation π = πP, we get

π = (.2763, .1424, .2030, .1431, .0825, .1526),

so the pages ranked according to their PageRank are A, C, F, D, B, E.
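The notes just say "software"; here is one way the computation might go (a
sketch in Python with numpy): build P from the connectivity matrix and run
the power iteration π ← πP until it settles down.

    import numpy as np

    # Rows of G list the out-links of pages A, B, C, D, E, F.
    G = np.array([[0, 1, 1, 0, 0, 0],    # A -> B, C
                  [0, 0, 1, 1, 0, 0],    # B -> C, D
                  [0, 0, 0, 1, 1, 1],    # C -> D, E, F
                  [1, 0, 0, 0, 0, 0],    # D -> A
                  [0, 0, 0, 0, 0, 1],    # E -> F
                  [1, 0, 0, 0, 0, 0]])   # F -> A
    n, p = 6, 0.85
    P = p * G / G.sum(axis=1, keepdims=True) + (1 - p) / n

    pi = np.full(n, 1 / n)
    for _ in range(200):      # power iteration
        pi = pi @ P
    print(np.round(pi, 4))    # [0.2763 0.1424 0.203  0.1431 0.0825 0.1526]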


Classification of states

Definition. A state j ∈ S is called transient if Pj(Tj < ∞) < 1, and
recurrent if Pj(Tj < ∞) = 1.

Note. Let Rj^s be the time of the sth return to state j, so Rj^1 = Tj^1 = Tj.
We have Pj(Rj^s < ∞) = Pj(Tj < ∞) Pj(Rj^{s−1} < ∞), and by induction we
prove that Pj(Rj^s < ∞) = [Pj(Tj < ∞)]^s. Taking the intersection over s,
we obtain

Pj( ∩s {Rj^s < ∞} ) = 0 if j is transient, and = 1 if j is recurrent.

The probability of infinitely many visits to state j is either zero or one,
according as the state j is transient or recurrent.

Definition. Two states i, j ∈ S are said to communicate if there exist
m, n ≥ 0 so that pm(i, j) > 0 and pn(j, i) > 0.

By this definition, every state communicates with itself (reflexive). Also, if
i communicates with j, then j communicates with i (symmetric). Finally, if
i and j communicate, and j and k communicate, then i and k communicate
(transitive). Therefore communication is an equivalence relation, and we
can divide the state space into disjoint sets called communication classes.

If i is transient, then mii = 0, and the equation mji = Pj(Ti < ∞) mii
shows that mji = 0 for all j ∈ S. The jth row of the matrix M is invariant
for P, and hence for any power of P, so that

0 = mji = Σ_{k∈S} mjk pn(k, i) ≥ mjj pn(j, i).

If i and j communicate, we can choose n so that pn(j, i) > 0, and conclude
that mjj = 0.

This shows that communicating states i and j are either both recurrent or
both transient. All states within each communicating class are of the same
type, so we will speak of recurrent or transient classes.

Lemma. If j is recurrent and Pj(Ti < ∞) > 0, then Pi(Tj < ∞) = 1.


In particular, j communicates with i, which means you cannot escape a
recurrent class.

Proof. If j is recurrent, the chain makes infinitely many visits to j.
Define T = inf{n > Ti | Xn = j} to be the first time the process hits state
j after hitting state i. Then

Pj(Ti < ∞) = Pj(Ti < ∞, T < ∞) = Pj(Ti < ∞) Pi(Tj < ∞).

The first equation is true because, starting at j, the process hits j at
arbitrarily large times. The second equation comes from applying the
Markov property at time Ti. If Pj(Ti < ∞) > 0, then we can divide it out
to obtain Pi(Tj < ∞) = 1. □
A general stochastic matrix with recurrent classes R1, . . . , Rr, after a
possible reordering of the states, looks like

        P1   0   · · ·   0    0
        0    P2  · · ·   0    0
P =     ..   ..          ..   ..
        0    0   · · ·   Pr   0
        S1   S2  · · ·   Sr   Q

where the last block row corresponds to the transient states. Each recurrent
class Rℓ forms a little Markov chain with transition matrix Pℓ. When you
take powers you get

        P1^n   0     · · ·   0      0
        0      P2^n  · · ·   0      0
P^n =   ..     ..            ..     ..
        0      0     · · ·   Pr^n   0
        Sn                          Q^n

Averaging these from 1 to n, and letting n → ∞, reveals the structure of
the matrix M:

        Π1   0   · · ·   0    0
        0    Π2  · · ·   0    0
M =     ..   ..          ..   ..
        0    0   · · ·   Πr   0
        ∗    ∗   · · ·   ∗    0

If i and j are in the same recurrent class Rℓ, the argument in the Lemma
above shows that Pi(Tj < ∞) = 1 and so mij = mjj. That is, the rows of
Πℓ are identical and give the unique invariant probability vector for Pℓ.

Examples of classification of states.

1. If

P = 0  1
    1  0

then there is only the one recurrent class R1 = {0, 1}. The invariant
probability must be unique and have strictly positive entries.

2. If

P = 1  0
    0  1

then there are two recurrent classes R1 = {0} and R2 = {1}. The invariant
measures are π = a(1, 0) + (1 − a)(0, 1) for 0 ≤ a ≤ 1. That is, all
probability vectors!

3. Suppose we have

      1/2  1/2   0    0    0
      1/6  5/6   0    0    0
P =    0    0   3/4  1/4   0
       0    0   1/6  5/6   0
      1/5  1/5  1/5  1/5  1/5

The classes are R1 = {0, 1}, R2 = {2, 3}, and T1 = {4}. The invariant
measures are π = a(1/4, 3/4, 0, 0, 0) + (1 − a)(0, 0, 2/5, 3/5, 0) for
0 ≤ a ≤ 1. None of these puts mass on the transient state.


4. Take a random walk with absorbing boundaries at 0 and N. We can
reach 0 from any state in the interior, but we can't get back. The interior
states must therefore be transient. Each boundary point is recurrent, so
R1 = {0}, R2 = {N}, and T1 = {1, 2, . . . , N − 1}, and the invariant
probability vectors are

π = a(1, 0, 0, . . . , 0, 0) + (1 − a)(0, 0, . . . , 0, 1) = (a, 0, 0, . . . , 0, 1 − a)  for 0 ≤ a ≤ 1.

Hitting times

Partition the state space S into two pieces D and E. We suppose that for
every starting point in D it is possible to reach the set E. We are interested
in the transition from D to E.

[Figure: the state space S split into two regions D and E, with arrows
indicating transitions from D into E.]

Let Q be the matrix of transition probabilities from the set D into itself,
and S the matrix of transition probabilities from D into E.

The row sums of (I − Q)^{-1} give the expected amount of time spent until
the chain hits E.

The matrix (I − Q)^{-1} S gives the probability distribution of the first
state visited in E.


Examples.

1. The rat. Recall the rat in the maze.

         1    2    3    4
    1    0   1/3  1/3  1/3
P = 2   1/2   0    0   1/2
    3   1/2   0    0   1/2
    4   1/3  1/3  1/3   0

Here's a question we can answer using methods from a previous section:
starting in room 1, how long on average does the rat take to return to
room 1? We are looking for E1(T1), which is 1/π1 for the unique invariant
probability vector. Since π1 = 3/10, the answer is 10/3 = 3.333.

Here's a similar sounding question: starting in room 1, how long on average
does the rat take to enter room 4? For this we need the new results from
this section. Divide the state space into two pieces D = {1, 2, 3} and
E = {4}. The corresponding matrices are

         1    2    3                 4
    1    0   1/3  1/3           1   1/3
Q = 2   1/2   0    0      S =   2   1/2
    3   1/2   0    0            3   1/2

We calculate

            1     2     3                            1    2    3
      1     1   −1/3  −1/3                      1   3/2  1/2  1/2
I−Q = 2   −1/2    1     0       (I − Q)^{-1} =  2   3/4  5/4  1/4
      3   −1/2    0     1                       3   3/4  1/4  5/4

The row sums of (I − Q)^{-1} are

E1(T4)     5/2
E2(T4)  =  9/4
E3(T4)     9/4

and the first entry answers our question: E1(T4) = 5/2.
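The linear algebra takes only a few lines in software; a sketch (Python
with numpy, our assumption of convenience):

    import numpy as np

    Q = np.array([[0, 1/3, 1/3],
                  [1/2, 0, 0],
                  [1/2, 0, 0]])
    S = np.array([[1/3],
                  [1/2],
                  [1/2]])

    N = np.linalg.inv(np.eye(3) - Q)
    print(N.sum(axis=1))   # [2.5  2.25 2.25]: expected times to reach room 4
    print(N @ S)           # all entries 1: room 4 is certain to be reached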


2. $100 or bust.

Consider a random walk on the graph pictured below. You keep moving
until you hit either $100 or ruin. What is the probability that you end up
ruined?

[Figure: a square of five states — start at the bottom left corner, $100 at
the top right, ruin at the bottom right, a fourth corner at the top left, and
a center state joined to all four corners; at each step the walker moves to
a uniformly chosen neighbour.]

E consists of the states $100 and ruin, so the Q and S matrices look like:

       0   1/3  1/3           1/3   0
Q =   1/4   0   1/4      S =  1/4  1/4
      1/3  1/3   0             0   1/3

(the rows index the transient states start, center, and top left corner; the
columns of S index ruin and $100). A bit of linear algebra gives

                11/8  2/3   5/8      1/3   0        5/8  3/8
(I − Q)^{-1}S =  1/2  4/3   1/2      1/4  1/4   =   1/2  1/2
                 5/8  2/3  11/8       0   1/3       3/8  5/8

Starting from the bottom left hand corner there is a 5/8 chance of being
ruined before hitting the money. Hey! Did you notice that if we start in
the center, then getting ruined is a 50-50 proposition? Why doesn't this
surprise me?

3. The spider and the fly.

A spider performs a random walk on the corners of a cube and eventually
will catch and eat the stationary (and rather stupid!) fly. How long on
average does the hunt last?

[Figure: a cube with the fly at one corner and the spider at the opposite corner.]

To begin with, it helps to squash the cube flat and label the corners to see
what is going on. The fly sits at corner F, whose neighbours are corners 1,
2, 3; the spider starts at the opposite corner S, whose neighbours are
corners 4, 5, 6.

[Figure: the flattened cube with corners labelled F, 1, 2, 3, 4, 5, 6, S.]

Here is the transition matrix for the chain.

         F    1    2    3    4    5    6    S
    F    0   1/3  1/3  1/3   0    0    0    0
    1   1/3   0    0    0   1/3  1/3   0    0
    2   1/3   0    0    0   1/3   0   1/3   0
P = 3   1/3   0    0    0    0   1/3  1/3   0
    4    0   1/3  1/3   0    0    0    0   1/3
    5    0   1/3   0   1/3   0    0    0   1/3
    6    0    0   1/3  1/3   0    0    0   1/3
    S    0    0    0    0   1/3  1/3  1/3   0

Using software to handle the linear algebra on the 7 × 7 matrix Q (drop the
row and column belonging to F), we find that the row sums of (I − Q)^{-1}
are (7, 7, 7, 9, 9, 9, 10). The answer to our question is the last one:
ES(TF) = 10.
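There is no need to type the matrix by hand: the cube structure generates
it. A sketch (Python with numpy) that codes the corners as 0/1 triples:

    import numpy as np
    from itertools import product

    # Corners of the cube; two corners are adjacent when they differ in one bit.
    corners = list(product((0, 1), repeat=3))
    fly, spider = (0, 0, 0), (1, 1, 1)
    transient = [c for c in corners if c != fly]
    idx = {c: k for k, c in enumerate(transient)}

    Q = np.zeros((7, 7))
    for c in transient:
        for i in range(3):
            nb = tuple(c[j] ^ (j == i) for j in range(3))   # flip coordinate i
            if nb != fly:
                Q[idx[c], idx[nb]] = 1/3

    expected = np.linalg.inv(np.eye(7) - Q).sum(axis=1)
    print(sorted(expected))        # approximately [7, 7, 7, 9, 9, 9, 10]
    print(expected[idx[spider]])   # 10.0 = E_S(T_F)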

4. Random walk.

Take the random walk on S = {0, 1, 2, 3, 4} with absorbing boundaries.

[Figure: walk on 0, 1, 2, 3, 4; interior states jump right with probability p
and left with probability 1 − p; states 0 and 4 are absorbing.]

Now put D = {1, 2, 3} and E = {0, 4}, so that

         1     2     3                    0    4
    1    0     p     0              1   1−p    0
Q = 2   1−p    0     p        S =   2    0     0
    3    0    1−p    0              3    0     p

and, writing d = (1 − p)² + p²,

                        (1−p)² + p       p          p²
(I − Q)^{-1} = (1/d) ×     1−p           1          p
                         (1−p)²         1−p     p² + (1−p)

The row sums give

E1(TE)              1 + 2p²
E2(TE)  =  (1/d)       2
E3(TE)              1 + 2(1−p)²

Matrix multiplication gives

                            0                      4
                  1   (1 − p + p²)(1 − p)         p³
(I − Q)^{-1}S = (1/d) × 2   (1 − p)²               p²
                  3   (1 − p)³              (1 − p + p²) p

Starting in the middle, at state 2, we find that

E(length of game) = 2/((1 − p)² + p²),    P(ruin) = (1 − p)²/((1 − p)² + p²).

[Figure: graphs of these two quantities as functions of p. The expected
length peaks at 4 when p = 1/2; the ruin probability decreases from 1 to 0
and equals 1/2 at p = 1/2.]

For instance, with an American roulette wheel we have p = 9/19, so that
the expected length of the game is 3.9889 and the chance of ruin is .5525.
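The two roulette numbers are a one-line check each (plain Python):

    p = 9/19          # chance of winning a single red-black bet
    q = 1 - p
    d = p**2 + q**2
    print(2 / d)      # 3.9889...: expected number of plays starting from 2
    print(q**2 / d)   # 0.5525...: probability of ruin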

5. Waiting for patterns.

Suppose you start tossing a fair coin and that you will stop when the
pattern HHH appears. How long on average does this take?

We define a Markov chain where Xn means the number of steps needed to
complete the pattern. The state space is S = {0, 1, 2, 3}; we start in state 3
and the target is state 0. Define D = {1, 2, 3} and E = {0}.

The state of the chain is determined by the number of H's at the end of
the sequence. Here T represents any initial sequence of tosses, including
the empty sequence, provided it doesn't have 3 heads in a row.

   T        State 3
   TH       State 2
   THH      State 1
   THHH     State 0

The corresponding Q matrix and (I − Q)^{-1} are given by

         1    2    3                           1  2  3
    1    0    0   1/2                     1    2  2  4
Q = 2   1/2   0   1/2      (I − Q)^{-1} = 2    2  4  6
    3    0   1/2  1/2                     3    2  4  8

The third row sum answers our question: E(T0) = 2 + 4 + 8 = 14.

We can apply the same idea to other patterns; let's take THT. Now the
states are given as follows:

   HH       State 3
   HHT      State 2
   TT       State 2
   TH       State 1
   THT      State 0

The corresponding Q matrix and (I − Q)^{-1} are a little different:

         1    2    3                           1  2  3
    1    0    0   1/2                     1    2  2  2
Q = 2   1/2  1/2   0       (I − Q)^{-1} = 2    2  4  2
    3    0   1/2  1/2                     3    2  4  4

The third row sum E(T0) = 2 + 4 + 4 = 10 shows that we need on average
10 coin tosses to see THT.
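Both answers are easy to confirm by direct simulation, which is a good
sanity check on the state-space bookkeeping. A sketch in plain Python:

    import random

    def mean_wait(pattern, trials=100_000, seed=1):
        """Monte Carlo estimate of the expected number of tosses."""
        random.seed(seed)
        total = 0
        for _ in range(trials):
            seq = ""
            while not seq.endswith(pattern):
                seq += random.choice("HT")
            total += len(seq)
        return total / trials

    print(mean_wait("HHH"))   # close to 14
    print(mean_wait("THT"))   # close to 10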

6. A Markov chain model of algorithmic efficiency.

Certain algorithms in operations research and computing science act in the
following way. The objective is to find the best of a set of N elements. The
algorithm starts with one of the elements, and then successively moves to a
better element until it reaches the best.

In the worst case, this algorithm will require N − 1 steps. What about the
average case? Let Xn stand for the rank of the element we have at time n.
If the algorithm chooses a better element at random, then Xn is a Markov
chain with transition matrix

         1        0        0        0     · · ·    0
         1        0        0        0     · · ·    0
        1/2      1/2       0        0     · · ·    0
P =     1/3      1/3      1/3       0     · · ·    0
         ..       ..       ..       ..             ..
      1/(N−1)  1/(N−1)  1/(N−1)  1/(N−1)  · · ·    0

We are trying to hit E = {1}, and so

         0        0        0      · · ·   0
        1/2       0        0      · · ·   0
Q =     1/3      1/3       0      · · ·   0
         ..       ..       ..             ..
      1/(N−1)  1/(N−1)  1/(N−1)   · · ·   0

A bit of experimentation with Maple will convince you that

                  1     0     0    · · ·    0
                 1/2    1     0    · · ·    0
(I − Q)^{-1} =   1/2   1/3    1    · · ·    0
                  ..    ..    ..            ..
                 1/2   1/3   1/4   · · ·    1

Taking row totals shows that E(Tj) = 1 + (1/2) + (1/3) + · · · + (1/(j − 1)).
Even if we begin with the worst element, we have E(TN) = 1 + (1/2) +
(1/3) + · · · + (1/(N − 1)) ≈ log(N). It takes an average of log(N) steps to
get the best element. The average case is much faster than the worst case
analysis might lead you to believe.
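Maple is not essential; the experiment is easy to repeat with any linear
algebra package. A sketch (Python with numpy) that builds Q for a chosen
N and compares the last row sum with the harmonic number:

    import numpy as np

    N = 10
    # Transient states are ranks 2, ..., N; from rank j the algorithm jumps
    # to a uniformly chosen better rank in {1, ..., j-1}.
    Q = np.zeros((N - 1, N - 1))
    for j in range(2, N + 1):
        for k in range(2, j):
            Q[j - 2, k - 2] = 1 / (j - 1)

    row_sums = np.linalg.inv(np.eye(N - 1) - Q).sum(axis=1)
    print(row_sums[-1])                      # E(T_N), starting from the worst
    print(sum(1 / k for k in range(1, N)))   # 1 + 1/2 + ... + 1/(N-1): equal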


Countable Markov Chains

We now consider a state space that is countably infinite, say S = {0, 1, . . .}
or S = Z^d = {(i1, . . . , id) : ij ∈ Z, j = 1, . . . , d}. We no longer use
linear algebra, but some of the rules carry over:

Σ_{y∈S} p(x, y) = 1,  for all x ∈ S,

and

pm+n(x, y) = Σ_{z∈S} pm(x, z) pn(z, y).

Recurrence and transience

Suppose that Xn is an irreducible Markov chain on a countable state space S.
We call the chain recurrent if for each x ∈ S, Px(Xn = x infinitely often) = 1,
and transient if Px(Xn = x infinitely often) = 0.

Fix a state x and assume X0 = x. Define the random variable R =
Σ_{n=0}^∞ 1{Xn = x}. From the Markov property,

E(R | R > 1) = 1 + E(R − 1 | R > 1) = 1 + E(R).

By definition,

E(R) = E(R | R > 1) P(R > 1) + E(R | R = 1) P(R = 1)
     = (1 + E(R)) P(R > 1) + P(R = 1)
     = 1 + E(R) P(R > 1),

whence we conclude that

1 = E(R) P(R = 1).    (2.4)

Suppose that P(R = 1) = 0, that is, a return to x is certain. From the
Markov property, a second return is also certain, and a third, etc. In fact
P(R = ∞) = 1, so that x is recurrent and E(R) = ∞.


Now suppose that P(R = 1) > 0, so a return to the initial state is not
certain. Then from (2.4) we have E(R) = 1/P(R = 1) < ∞, so P(R < ∞) = 1,
i.e., x is transient.

Since

E(R) = E( Σ_{n=0}^∞ 1{Xn = x} ) = Σ_{n=0}^∞ P(Xn = x) = Σ_{n=0}^∞ pn(x, x),

the following theorem holds.

Theorem. The state x is recurrent if and only if Σ_{n=0}^∞ pn(x, x) = ∞.

Example. One dimensional random walk.

[Figure: walk on . . . , x−2, x−1, x, x+1, x+2, . . . ; each step goes right
with probability p and left with probability 1 − p.]

Take x = 0 and assume that X0 = 0. Let's find p2n(0, 0). In order that
X2n = 0, there must have been n steps to the left and n steps to the right.
The number of such paths is (2n)!/(n! n!), and the probability of each path
is p^n (1 − p)^n, so

p2n(0, 0) = ((2n)!/(n! n!)) p^n (1 − p)^n.

Stirling's formula says n! ≈ √(2πn) (n/e)^n, so replacing the factorials
gives us an approximate formula

p2n(0, 0) ≈ (√(2π(2n)) (2n/e)^{2n} / (2πn (n/e)^{2n})) p^n (1 − p)^n = [4p(1 − p)]^n / √(πn).

If p = 1/2, then Σn p2n(0, 0) ≈ Σn 1/√(πn) = ∞ and the walk is recurrent.
But if p ≠ 1/2, then p2n(0, 0) → 0 exponentially fast and Σn p2n(0, 0) < ∞,
so the walk is transient.
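Stirling's approximation is already excellent for modest n; a quick
comparison in plain Python:

    from math import comb, pi, sqrt

    def p2n(n, p):
        """Exact probability of returning to 0 at step 2n."""
        return comb(2 * n, n) * p**n * (1 - p)**n

    for n in (10, 100, 500):
        print(n, p2n(n, 1/2), 1 / sqrt(pi * n))   # exact vs. 1/sqrt(pi n)
    # The terms decay like 1/sqrt(n), so their sum diverges: recurrence.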


Example. Symmetric random walk in Z^d.

At each time, the walk moves to one of its 2d neighbors with equal
probability.

[Figure: one step of the walk in Z², to one of the four nearest neighbors.]

Exact calculations are a bit difficult, so we just sketch a rough calculation
that gives the right result. Suppose the walker takes 2n steps. Roughly, we
expect he's taken 2n/d steps in each direction. In order to return to zero,
we need the number of steps in each direction to be even: the chance of
that is (1/2)^{d−1}.

In each direction, the chance that a coordinate ends up back at 0 in 2n/d
steps is roughly 1/√(πn/d). Therefore

p2n(0, 0) ≈ (1/2)^{d−1} (d/(πn))^{d/2}.

Since Σn n^{−a} < ∞ if and only if a > 1, we conclude that the symmetric
random walk in Z^d is recurrent if d ≤ 2, and transient if d ≥ 3.

Difference equations and Markov chains on Z

Let Xn be a random walk on the integers, where p(y, y − 1) = qy,
p(y, y + 1) = py, and p(y, y) = 1 − (qy + py), where we assume py + qy ≤ 1.
For x ≤ y ≤ z, define

a(y) = Py(Xn will hit state x before z).


Note that a(x) = 1 and a(z) = 0. Conditioning on X1, we find that the
function a is harmonic for x < y < z, that is,

a(y) = a(y − 1) qy + a(y)(1 − (qy + py)) + a(y + 1) py,

which can be rewritten as

a(y)(qy + py) = a(y − 1) qy + a(y + 1) py,

or as

py [a(y) − a(y + 1)] = qy [a(y − 1) − a(y)].

Provided the py are positive for x < y < z, we can divide to get

a(y) − a(y + 1) = (qy/py) [a(y − 1) − a(y)],

and iterating gives us

a(y) − a(y + 1) = (qx+1 · · · qy / px+1 · · · py) [a(x) − a(x + 1)].

For convenience, let's define ry = qy/py, so the above equation becomes

a(y) − a(y + 1) = rx+1 · · · ry [a(x) − a(x + 1)].

This is even true for y = x, if we interpret the empty product rx+1 · · · rx
as 1. For any x ≤ w ≤ z we have

a(x) − a(w) = Σ_{y=x}^{w−1} [a(y) − a(y + 1)] = Σ_{y=x}^{w−1} rx+1 · · · ry [a(x) − a(x + 1)].   (1)

In particular, putting w = z gives

1 = 1 − 0 = a(x) − a(z) = Σ_{y=x}^{z−1} rx+1 · · · ry [a(x) − a(x + 1)],

and plugging this back into (1) and solving for a(w) gives

a(w) = ( Σ_{y=w}^{z−1} rx+1 · · · ry ) / ( Σ_{y=x}^{z−1} rx+1 · · · ry ).   (2)


Consequences

1. Let's define the function b by

b(y) = Py(Xn will hit state z before x).

This function is also harmonic, but satisfies the opposite boundary
conditions b(x) = 0 and b(z) = 1. Equation (1) is valid for any harmonic
function, so let's plug in b and multiply by −1 to get

b(w) = b(w) − b(x) = Σ_{y=x}^{w−1} rx+1 · · · ry [b(x + 1) − b(x)].   (3)

Plugging in w = z allows us to solve for b(x + 1) − b(x), and plugging this
back into (3) gives

b(w) = ( Σ_{y=x}^{w−1} rx+1 · · · ry ) / ( Σ_{y=x}^{z−1} rx+1 · · · ry ).   (4)

In particular we see that a(w) + b(w) = 1 for all x ≤ w ≤ z. That is, the
chain must eventually hit one of the boundary points {x, z}, provided all
the py's are non-zero.

2. For w ≥ x, define

α(w) = Pw(ever hit state x) = lim_{z→∞} a(w).

If the denominator of (4) diverges, i.e., Σ_{y=x}^∞ rx+1 · · · ry = ∞, then
lim_{z→∞} b(w) = 0, so lim_{z→∞} a(w) = α(w) = 1 for all w.

On the other hand, if Σ_{y=x}^∞ rx+1 · · · ry < ∞, then

α(w) = ( Σ_{y=w}^∞ rx+1 · · · ry ) / ( Σ_{y=x}^∞ rx+1 · · · ry ).   (5)

This shows that α(w) decreases to zero as w → ∞.


3. In the case where py = p and qy = q don't depend on y,

a(w) = (r^z − r^w)/(r^z − r^x)   if r ≠ 1,
a(w) = (z − w)/(z − x)           if r = 1.

Letting z → ∞ gives the probability that we ever hit x from the right:
α(w) = 1 ∧ r^{w−x}.

4. Notice that α(x) = 1. The process is guaranteed to hit state x when you
start there, for the simple reason that we count the visit at time zero! Let's
work out the chance of a return to state x. Conditioning on the position
X1 at time 1, we have

Px(Tx < ∞) = q Px−1(hit x) + (1 − (p + q)) Px(hit x) + p Px+1(hit x)
           = q (1 ∧ 1/r) + (1 − (p + q)) · 1 + p (1 ∧ r)
           = (q ∧ p) + (1 − (p + q)) + (p ∧ q)
           = 1 − |p − q|.

This shows that the chain is recurrent if and only if p = q.

The general result for random walks

[Figure: a sample path of a random walk on the two-dimensional lattice Z².]

The following general result is proved using harmonic analysis.


Theorem. Suppose Xn is a genuine d-dimensional random walk with
Σx |x| p(0, x) < ∞. The walk is recurrent if d = 1, 2 and Σx x p(0, x) = 0.
Otherwise it is transient.

Two types of recurrence

A Markov chain only makes finitely many visits to a transient state j, so
the asymptotic density of its visits, mjj, must be zero. This is true for both
finite and countable Markov chains.

A recurrent state j may have zero density or positive density, though the
first is impossible in a finite state space S.

Definition. A recurrent state j is called null recurrent if mjj = 0, i.e.,
Ej(Tj) = ∞. It is called positive recurrent if mjj > 0, i.e., Ej(Tj) < ∞.

The argument in section 1.D shows that if states i and j communicate,
and if mii = 0, then mjj = 0 also. The following lemma shows that if i
and j communicate, then they are either both recurrent or both transient.
Putting it all together, communicating states are of the same type: either
transient, null recurrent, or positive recurrent.

Lemma. If i and j communicate, then Σk pk(i, i) < ∞ if and only if
Σk pk(j, j) < ∞.

Proof. Choose n and m so that pm(i, j) > 0 and pn(j, i) > 0. Then for
every k ≥ 0 we have pn+k+m(j, j) ≥ pn(j, i) pk(i, i) pm(i, j), so that

Σk pk(j, j) ≥ Σk pn+k+m(j, j) ≥ pm(i, j) pn(j, i) Σk pk(i, i).

Therefore Σk pk(j, j) < ∞ implies Σk pk(i, i) < ∞. Reversing the roles of
i and j gives the result. □

Example. For the symmetric random walks in d = 1, 2, we have

m00 = lim_{n→∞} (1/2n) Σ_{k=1}^n p2k(0, 0) = 0,

since p2k(0, 0) → 0 as k → ∞ (like 1/√(πk) when d = 1, and like a constant
multiple of 1/k when d = 2). These random walks are null recurrent.

Branching processes

This is a random model for the evolution of a population over several
generations. It has its origins in an 1874 paper by Francis Galton and
Reverend Henry William Watson called "On the probability of extinction
of families". (See http://www.mugu.com/galton/index.html)

To illustrate, here is a piece of the Schmuland family tree:

[Figure: a family tree spanning several generations.]
In this model, Xn is the number of individuals alive at time n. Each
individual produces offspring distributed like the random variable Y:

   y      0    1    2    3    · · ·
   p(y)   p0   p1   p2   p3   · · ·

We also assume that the individuals reproduce independently.

The process (Xn) is a Markov chain with state space S = {0, 1, 2, . . .}. It
is not easy to write explicit formulae for the transition probabilities, but
we can express them as follows:

p(k, j) = P(Y1 + · · · + Yk = j).


Let's find the average size of generation n. Let µ = E(Y) = Σ_{j=0}^∞ j pj.
Then

E(Xn+1 | Xn = k) = E(Y1 + · · · + Yk) = kµ,

so that

E(Xn+1) = Σ_{k=0}^∞ E(Xn+1 | Xn = k) P(Xn = k) = Σ_{k=0}^∞ kµ P(Xn = k) = µ E(Xn).

By induction, we discover that E(Xn) = µ^n E(X0).

If µ < 1, then E(Xn) → 0 as n → ∞. The estimate

E(Xn) = Σ_{k=0}^∞ k P(Xn = k) ≥ Σ_{k=1}^∞ P(Xn = k) = P(Xn ≥ 1)

shows that limn→∞ P(Xn = 0) = 1. Now, for a branching process, the state
0 is absorbing, so we can draw the stronger conclusion that
P(limn→∞ Xn = 0) = 1. In other words, the branching process is guaranteed
to become extinct if µ < 1.
Extinction. Let's define a sequence an = P(Xn = 0 | X0 = 1), that
is, the probability that the population is extinct at the nth generation,
starting with a single individual. Conditioning on X1 and using the Markov
property, we get

P(Xn+1 = 0 | X0 = 1) = Σ_{k=0}^∞ P(Xn+1 = 0 | X1 = k) P(X1 = k | X0 = 1)
                     = Σ_{k=0}^∞ P(Xn = 0 | X0 = k) P(X1 = k | X0 = 1)
                     = Σ_{k=0}^∞ P(Xn = 0 | X0 = 1)^k pk.

If we define φ(s) = Σ_{k=0}^∞ pk s^k, then the equation above can be written
as an+1 = φ(an). Note that φ(0) = P(Y = 0) and φ(1) = 1. Also


φ′(s) = Σ_{k=0}^∞ pk k s^{k−1} ≥ 0, and φ′(1) = E(Y). Finally, note that
φ″(s) = Σ_{k=0}^∞ pk k(k − 1) s^{k−2} ≥ 0, and if p0 + p1 < 1, then
φ″(s) > 0 for s > 0.

The sequence (an) is defined through the equations a0 = 0 and an+1 = φ(an)
for n ≥ 1. Since a0 = 0, we trivially have a0 ≤ a1. Apply φ to both sides
of the inequality to obtain a1 ≤ a2. Continuing in this way, we find that
the sequence (an) is non-decreasing. Since (an) is bounded above by 1, we
conclude that an ↑ a for some constant a.

The value a gives the probability that the population will eventually become
extinct. It is the smallest solution to the equation a = φ(a). The following
pictures sketch the proof that a = 1 (extinction is certain) if and only if
E(Y) ≤ 1.

[Figure: two cobweb diagrams of the iteration an+1 = φ(an) starting from
a0 = 0. Case µ ≤ 1 (a = 1): the iterates a0, a1, a2, a3, a4, . . . climb
along the diagonal to the fixed point 1. Case µ > 1 (a < 1): the iterates
a0, a1, a2, a3, . . . converge to the smallest fixed point a < 1.]

Examples.

1. p0 = 1/4, p1 = 1/4, p2 = 1/2. This gives us µ = 5/4 and φ(s) =
1/4 + s/4 + s²/2. Solving φ(s) = s gives the two solutions {1/2, 1}.
Therefore a = 1/2.

2. p0 = 1/2, p1 = 1/4, p2 = 1/4. In this case, µ = 3/4, so that a = 1.

3. Schmuland family tree. p0 = 7/15, p1 = 3/15, p2 = 1/15, p3 = 4/15.
This gives µ = 1.1333 and a = (√137 − 5)/8 = .83808. There is a 16%
chance that our surname will survive into the future!
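The extinction probability can also be found numerically by simply running
the recursion an+1 = φ(an) from a0 = 0, which converges to the smallest
fixed point. A sketch in plain Python:

    def extinction(pk, tol=1e-12):
        """Iterate a <- phi(a) starting from a = 0."""
        phi = lambda s: sum(p * s**k for k, p in enumerate(pk))
        a = 0.0
        while abs(phi(a) - a) > tol:
            a = phi(a)
        return a

    print(extinction([1/4, 1/4, 1/2]))            # 0.5
    print(extinction([1/2, 1/4, 1/4]))            # -> 1: extinction certain
    print(extinction([7/15, 3/15, 1/15, 4/15]))   # 0.83808... = (sqrt(137)-5)/8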


Optimal Stopping

Strategies for winning

Think of a Markov chain as a gambling game. For example, a random walk
on {0, 1, . . . , N} could represent your winnings at the roulette wheel.

Definition. A payoff function f : S → [0, ∞) assigns a payoff to each
state of the Markov chain. Think of it as the amount you would collect if
you stop playing when the chain is in that state.

Example. In the usual situation, f(x) = x, while in the situation where
you owe the mob N dollars the payoff is f(x) = 1{N}(x).
[Figure: graphs of the two payoff functions on {0, 1, . . . , N}: f(x) = x
increases linearly, while f(x) = 1{N}(x) is zero everywhere except for a
payoff of 1 at x = N.]

Note. Any function g : S → R can also be considered as the column
vector whose coordinates are the numbers g(x) for x ∈ S. Multiplying this
vector on the right of the transition matrix P gives a new column vector
(= function) Pg with coordinates (Pg)(x) = Σ_{y∈S} g(y) p(x, y).

Definition. A stopping time (strategy) is a rule that tells you when to
stop playing. Mathematically, it is a random variable with values in
{0, 1, 2, . . .} ∪ {∞} so that {T ≤ n} ∈ σ(X0, . . . , Xn) for all n ≥ 0.


Examples.

1. T ≡ 0, i.e., don't gamble.
2. T ≡ 1, i.e., play once, then stop.
3. T = wait for 3 reds in a row, then bet on black; repeat until you hit 0 or N.

We will always assume that P(T < ∞) = 1 in this section.

Definition. The value function v : S → [0, ∞) is defined as the largest
expected payoff possible from that starting point:

v(x) = sup_T E(f(XT) | X0 = x).

There is an optimal strategy Topt so that

v(x) = E(f(XTopt) | X0 = x).

Facts about v.

1. Consider the strategy T0 ≡ 0 (don't gamble). Then

v(x) = sup_T E(f(XT) | X0 = x) ≥ E(f(XT0) | X0 = x) = f(x).

That is, v(x) ≥ f(x) for all x ∈ S.

2. Define the strategy T: play once, then follow the optimal strategy. Then

v(x) ≥ E(f(XT) | X0 = x)
     = Σ_{y∈S} E(f(XT) | X1 = y) p(x, y)
     = Σ_{y∈S} v(y) p(x, y)
     = (Pv)(x).

Therefore v(x) ≥ (Pv)(x) for all x ∈ S. Such a function is called
superharmonic.


3. By definition, a superharmonic function u satisfies u(x) ≥ (Pu)(x), or

E(u(X0) | X0 = x) ≥ E(u(X1) | X0 = x).

It turns out that for any two stopping times 0 ≤ S ≤ T < ∞ we have

E(u(XS) | X0 = x) ≥ E(u(XT) | X0 = x).

4. Suppose that u is a superharmonic function and u(x) ≥ f(x) for all
x ∈ S. Then

u(x) = E(u(X0) | X0 = x)
     ≥ E(u(XTopt) | X0 = x)
     ≥ E(f(XTopt) | X0 = x)
     = v(x).

Putting all this together gives the following.

Lemma. The value function v is the smallest superharmonic function that
dominates f.

Here is the main theorem of this section.

Theorem. The optimal strategy is given by T_E := inf{n ≥ 0 : X_n ∈ E}, where E = {x ∈ S : f(x) = v(x)}.

Sketch of proof. First you must show that P(T_E < ∞ | X_0 = x) = 1 for all x ∈ S. Assume this has been done.
Define u(x) = E(f(X_{T_E}) | X_0 = x), and note that, since v is superharmonic,
u(x) = E(v(X_{T_E}) | X_0 = x) ≤ v(x).
Define another strategy T'_E = inf{n ≥ 1 : X_n ∈ E}, so that T'_E ≥ T_E and
(P u)(x) = E(v(X_{T'_E}) | X_0 = x) ≤ E(v(X_{T_E}) | X_0 = x) = u(x),


showing that u is superharmonic.

The last thing is to show that u(x) ≥ f(x). Fix a state y so that f(y) − u(y) = sup_x (f(x) − u(x)). Then
u(x) + f(y) − u(y) ≥ f(x) for all x ∈ S.
Since the left hand side is superharmonic, we get
u(x) + f(y) − u(y) ≥ v(x) for all x ∈ S.
In particular, f(y) ≥ v(y), so that y ∈ E. Then
u(y) = E(f(X_{T_E}) | X_0 = y) = f(y),
so that f(y) − u(y) = 0 and hence u(x) ≥ f(x) for all x ∈ S. ∎

Corollary. v(x) = max{f(x), (P v)(x)}.

Examples

1. Take the random walk on {0, 1, . . . , N} with q ≥ p, and f(x) = x. Then
(P f)(x) = q f(x − 1) + (1 − (q + p)) f(x) + p f(x + 1) = x − (q − p) ≤ x = f(x).
This shows that f is superharmonic. Therefore v = f everywhere and E = S. The optimal strategy is T_0 ≡ 0, i.e., don't gamble!

2. Take the random walk on {0, 1, . . . , N} with absorbing boundaries and f(x) = 1_{N}(x). Let's show that the optimal strategy is to continue until you hit {0, N}.
Certainly {0, N} ⊆ E since absorbing states x always satisfy v(x) = f(x). For 1 ≤ x ≤ N − 1, there is a non-zero probability that the chain will hit N before 0, and so
f(x) = 0 < E(f(X_{T_{{0,N}}}) | X_0 = x) ≤ v(x).
Thus v(x) > f(x), so x does not belong to E.


Note that the value function gives the probability of ending up at N:
v(x) = E(f(X_{T_E}) | X_0 = x) = P(X_{T_E} = N | X_0 = x).
The function v is harmonic (v(x) = (P v)(x)) on {1, 2, . . . , N − 1} so, as for the example on page 33, we calculate directly

v(x) = (1 − (q/p)^x) / (1 − (q/p)^N)   if p ≠ q,
v(x) = x/N                             if p = q.
3. Zarin case. The following excerpt is taken from "What is the Worth of Free Casino Credit?" by Michael Orkin and Richard Kakigi, published in the January 1995 issue of the American Mathematical Monthly.

In 1980, a compulsive gambler named David Zarin used a generous credit line to run up a huge debt playing craps in an Atlantic City casino. When the casino finally cut off Zarin's credit, he owed over $3 million. Due in part to New Jersey's laws protecting compulsive gamblers, the debt was deemed unenforceable by the courts, leading the casino to settle with Zarin for a small fraction of the amount he owed. Later, the Internal Revenue Service tried to collect taxes on the approximately $3 million Zarin didn't repay, claiming that cancellation of the debt made it taxable income. Since Zarin had never actually received any cash (he was always given chips, which he promptly lost at the craps table), an appellate court finally ruled that Zarin had no tax obligation. The courts never asked what Zarin's credit line was actually worth.

Mathematically, the payoff function is the positive part of x − k, where k is the number of units of free credit:
f(x) = (x − k)⁺.
[Figure: graph of the payoff f(x) = (x − k)⁺, which is zero up to k and then increases through k + 1, k + 2, . . . ]


Since the state zero is absorbing, we have v(0) = 0. On the other hand, v(x) > 0 = f(x) for x = 1, . . . , k, so that 1, . . . , k ∉ E. Starting at k, the optimal strategy is to keep playing until you hit 0 or N for some N > k which is to be determined. In fact, N is the smallest element of E greater than k.
We have to eliminate the possibility that N = ∞, that is, E = {0}. But then the strategy T_{{0}} gives a value function that is identically zero. As this is impossible, we know N < ∞.
The optimal strategy is T_{{0,N}} for some N. Using the previous example we can calculate directly that
E(f(X_{T_{{0,N}}}) | X_0 = k) = (N − k) (1 − (q/p)^k) / (1 − (q/p)^N).
For any choice of p and q, we choose N to maximize the right hand side.
In the Zarin case, we may assume he played the pass line bet, which gives the best odds of p = 244/495 and q = 251/495, so that q/p = 251/244. We also assume that he bets boldly, making the maximum bet of $15,000 each time. Then three million dollars equals k = 200 free units, and trial and error gives N = 235 and v(200) = 12.977 units = $194,655.
N     Expected Profit (units)   Expected Profit ($)
232   12.9169                   193754.12
233   12.9486                   194228.91
234   12.9684                   194526.29
235   12.9771                   194655.80
236   12.9751                   194626.58
237   12.9632                   194447.42
238   12.9418                   194126.71
In general, we have the approximate formula N ≈ k + 1/ln(q/p), and the probability of reaching N is approximately 1/e = .36788. Therefore the approximate value of k free units of credit is
v(k) ≈ 1 / (exp(1) ln(q/p)),
which is independent of k!
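The trial-and-error search over N, and the approximation, take only a few lines; a sketch in Python (using the hitting formula above):

    # Zarin's credit line: maximize (N - k)(1 - (q/p)^k)/(1 - (q/p)^N) over N > k.
    from math import exp, log

    p, q, k, unit = 244/495, 251/495, 200, 15000   # pass line odds, $15,000 units
    profit = lambda N: (N - k) * (1 - (q/p)**k) / (1 - (q/p)**N)
    N_best = max(range(k + 1, k + 200), key=profit)
    print(N_best, round(profit(N_best) * unit, 2))   # 235 and about $194,656
    print(k + 1/log(q/p), 1/(exp(1) * log(q/p)))     # approximate N and v(k) in units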


Algorithm to find optimal strategy

The value function v is the smallest superharmonic function that dominates f, but there is no direct formula to find it. Here is an algorithm that gives approximations to v.

Algorithm. Define
u_1(x) = f(x)    if x is absorbing,
u_1(x) = sup f   otherwise.
Then let u_2 = max(P u_1, f), u_3 = max(P u_2, f), etc. The sequence (u_n) decreases to the function v.

Example. How much would you pay for the following financial opportunity? Assume that you follow a random walk on the graph below. There is no payoff except $100 at state 4, and state 5 is absorbing.
[Figure: a graph on the five states; 1 is joined to 2, 3, and 4; 2 to 1, 3, and 5; 3 to 1, 2, 4, and 5; 4 (the $100 bill) to 1, 3, and 5; state 5 is absorbing.]

In vector form, the payoff function is f = (0, 0, 0, 100, 0). (For ease of typesetting, we will render these column vectors as row vectors, OK?) The P operation takes a vector u and gives
P u = ( (u(2) + u(3) + u(4))/3, (u(1) + u(3) + u(5))/3, (u(1) + u(2) + u(4) + u(5))/4, (u(1) + u(3) + u(5))/3, u(5) ).
The initial vector is u_1 = (100, 100, 100, 100, 0) and P u_1 = (100, 200/3, 75, 200/3, 0). Taking the maximum of this with f puts the fourth coordinate back up to 100, giving u_2 = (100, 200/3, 75, 100, 0).
Applying this procedure to u_2 gives u_3 = (725/9, 175/3, 200/3, 100, 0). Putting this on a computer, and repeating 15 times, yields (in decimal format)
u_15 = (62.503, 37.503, 50.002, 100.00, 0.00).
These give the value, or fair price, of the different starting positions on the graph.
We may guess that if the algorithm were continued, the values would converge to v = (62.5, 37.5, 50.0, 100.0, 0.0). We can confirm this guess by checking the equation v = max(P v, f).
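The iteration itself is a short program; here is a sketch in Python for this five-state example:

    # Value iteration u_{m+1} = max(P u_m, f) for the graph above.
    f = [0, 0, 0, 100, 0]                 # payoff: only state 4 pays

    def P(u):                             # random walk on the graph; state 5 absorbs
        return [(u[1] + u[2] + u[3]) / 3,
                (u[0] + u[2] + u[4]) / 3,
                (u[0] + u[1] + u[3] + u[4]) / 4,
                (u[0] + u[2] + u[4]) / 3,
                u[4]]

    u = [100, 100, 100, 100, 0]           # u_1: f at absorbing states, sup f elsewhere
    for _ in range(60):
        u = [max(a, b) for a, b in zip(P(u), f)]
    print([round(x, 3) for x in u])       # [62.5, 37.5, 50.0, 100.0, 0.0]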

The binomial pricing model

In this section, we will look at an example from mathematical finance. It's a little more complicated than our previous models, but it uses the same principles.
We suppose a market with 3 financial instruments: a bond (riskless asset), a stock (risky asset), and an option (to be explained). In one time period, the value of the stock goes up or down, being multiplied by the constant u or d. In contrast, the bond grows at fixed rate r independent of the market. Here r = 1 + interest rate, and we assume that d < r < u to preclude the possibility of arbitrage (risk-free profit).
[One-period tree: at time 0 the prices are S, B, C; at time 1 they are uS, rB, C_u if the stock goes up, and dS, rB, C_d if it goes down.]

How to not make money

Now imagine a portfolio that consists of x units of stock, y units of options, and z units of bonds. The numbers can be positive, zero, or negative; negative holdings represent debt.
Suppose that we can arrange it so that this portfolio is absolutely worthless at time 1, whether stocks go up or down:
x(uS) + y C_u + z rB = 0
x(dS) + y C_d + z rB = 0.
The fair time-zero price for the worthless portfolio is also zero, so that x S + y C + z B = 0. Solving these 3 equations for the unknown C gives


C = (1/r) [ ((r − d)/(u − d)) C_u + ((u − r)/(u − d)) C_d ].
To ease the notation let p = (r − d)/(u − d), so that the price can be written C = (p C_u + (1 − p) C_d)/r. The worthless portfolio device is the same as the usual replicating portfolio device.
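As code, the one-period price is a one-liner; a minimal sketch (Python):

    # One-period risk-neutral price C = (p*Cu + (1-p)*Cd)/r with p = (r-d)/(u-d).
    def price(Cu, Cd, r, u, d):
        p = (r - d) / (u - d)        # requires d < r < u (no arbitrage)
        return (p * Cu + (1 - p) * Cd) / r

    print(price(Cu=50, Cd=0, r=1.05, u=2, d=0.5))   # 17.46..., the call example below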

Call Option

A call option gives the holder the right (but not the obligation) to buy stock at a later time for K dollars. The value K is called the strike price. The value of the option at time 1 is given by
C_u = (uS − K)⁺ and C_d = (dS − K)⁺.

Example. Suppose S = 100, r = 1.05, u = 2, d = 1/2. This gives a p-value of p = 11/30. If the strike price is K = 150, then C_u = 50 and C_d = 0. Therefore the option price is
C = (1/r) [ (11/30) 50 + (19/30) 0 ] = (1/r)(55/3) = 17.46.
Just to double-check: the replicating portfolio consists of 1/3 share of stock and a loan of (1/r)(50/3) dollars. If the stock goes down, it is worth 50/3 dollars, which you use to pay off the loan, leaving a profit of zero. On the other hand, if the stock goes up, it is worth 200/3; you pay off the loan and retain 150/3 = 50 dollars profit.
The time zero value of this portfolio is
(100/3) − (1/r)(50/3) = (1/r)(55/3).


Clearly the price of the call option is non-negative: C ≥ 0. Also
C = (1/r) { p (uS − K)⁺ + (1 − p)(dS − K)⁺ }
  ≥ (1/r) { p (uS − K) + (1 − p)(dS − K) }
  = S − K/r
  ≥ S − K.
Combining these two inequalities shows that C ≥ (S − K)⁺.


Definition. An American option can be exercised at any time; a European option can only be exercised at the terminal time.
For American options, the price is the maximum of the current payoff and the formula calculated earlier. For a call option,
C = max( (S − K)⁺, (1/r){p C_u + (1 − p) C_d} ) = (1/r){p C_u + (1 − p) C_d},
by the inequality C ≥ (S − K)⁺ just derived. A call option is never exercised early.

A put option gives the buyer the right (but not the obligation) to sell stock for K dollars. That is,
P_u = (K − uS)⁺ and P_d = (K − dS)⁺,
and
P = max( (K − S)⁺, (1/r){p P_u + (1 − p) P_d} ).

Example. Suppose again that S = 100, r = 1.05, u = 2, d = 1/2. This gives a p-value of p = 11/30. If the strike price is K = 150, then P_u = 0 and P_d = 100. Therefore the option price is
P = max( 50, (1/1.05) [ (11/30) 0 + (19/30) 100 ] ) = 60.31.


Multiple time periods.

Call Option

[Figure: the three-period binomial tree for the call option, K = 150, r = 1.05, u = 2, d = 1/2, p = 11/30. Stock prices (red) and call values (green) at each node:

Time 0:  S = 100, C = 38.71
Time 1:  S = 200, C = 100.33;   S = 50, C = 6.10
Time 2:  S = 400, C = 257.14;   S = 100, C = 17.46;   S = 25, C = 0
Time 3:  S = 800, C = 650;   S = 200, C = 50;   S = 50, C = 0;   S = 12.50, C = 0]

This tree explains the price of a call option with terminal time n = 3. The red numbers are the possible stock values and the green numbers are the current value of the call option. These are calculated by starting at the right hand side and working left, using our formula. The end result is relatively simple, since a call option is never exercised early:
C = (1/rⁿ) Σ_{j=0}^{n} (n choose j) p^j (1 − p)^{n−j} (u^j d^{n−j} S − K)⁺.
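Both the backward recursion through the tree and the closed-form sum are easy to code; a sketch (Python) that reproduces the numbers above:

    # Three-period call: S = 100, K = 150, u = 2, d = 1/2, r = 1.05, p = 11/30.
    from math import comb

    S, K, u, d, r, n = 100, 150, 2.0, 0.5, 1.05, 3
    p = (r - d) / (u - d)

    # Closed form: C = r^{-n} sum_j (n choose j) p^j (1-p)^{n-j} (u^j d^{n-j} S - K)^+
    C = sum(comb(n, j) * p**j * (1 - p)**(n - j) * max(u**j * d**(n - j) * S - K, 0)
            for j in range(n + 1)) / r**n
    print(C)                                        # about 38.71

    # Backward recursion, right to left, as in the tree:
    vals = [max(u**j * d**(n - j) * S - K, 0) for j in range(n + 1)]
    for _ in range(n):
        vals = [(p * vals[j + 1] + (1 - p) * vals[j]) / r for j in range(len(vals) - 1)]
    print(vals[0])                                  # about 38.71 again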


Put Option

[Figure: the same three-period tree for the put option, K = 150. Stock prices (red) and put values (green); boxes mark the nodes where the option is exercised:

Time 0:  S = 100, P = 73.02
Time 1:  S = 200, P = 36.38;   S = 50, P = 100 (exercise)
Time 2:  S = 400, P = 0;   S = 100, P = 60.32;   S = 25, P = 125 (exercise)
Time 3:  S = 800, P = 0;   S = 200, P = 0;   S = 50, P = 100 (exercise);   S = 12.50, P = 137.50 (exercise)]

This tree explains the price of a put option with terminal time n = 3. The red numbers are the possible stock values and the green numbers are the current value of the put option. These are calculated by starting at the right hand side and working left, using our formula, but always taking the maximum with the result of immediate exercise. There are boxes around the nodes where the option would be exercised; note that two of them are early exercises.
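For the American put the recursion is the same, except that at every node we take the maximum with the value of immediate exercise; a sketch (Python):

    # Three-period American put: S = 100, K = 150, u = 2, d = 1/2, r = 1.05.
    S, K, u, d, r, n = 100, 150, 2.0, 0.5, 1.05, 3
    p = (r - d) / (u - d)
    stock = lambda t, j: u**j * d**(t - j) * S      # price at time t after j up-moves

    vals = [max(K - stock(n, j), 0) for j in range(n + 1)]
    for t in range(n - 1, -1, -1):
        vals = [max(max(K - stock(t, j), 0),                    # exercise now...
                    (p * vals[j + 1] + (1 - p) * vals[j]) / r)  # ...or hold one period
                for j in range(t + 1)]
    print(vals[0])   # 73.02; dropping the exercise term gives the European price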

What's the connection with optimal stopping for Markov chains?

Define a Markov chain on the tree. At each time, the chain moves forward. It goes upward (north-east) with probability p and downward (south-east) with probability 1 − p. The ends are absorbing.
Let f(x) be the value of the option if it is exercised immediately at state x. Note that for a state x at level n, we have u_1(x) = f(x), u_2(x) = max(P u_1(x), f(x)) = f(x), u_3(x) = max(P u_2(x), f(x)) = f(x). Similarly u_k(x) = f(x) for all k, so letting k → ∞ we find v(x) = f(x) for such x.


In other words, u_1(x) = v(x) for x at level n.

Next consider a state x at level n − 1. We have u_2(x) = max(P u_1(x), f(x)). Now u_1 = v at level n, so P u_1 = P v at level n − 1. So u_2(x) = max(P v(x), f(x)) = v(x) for such x.

[Figure: one step of the tree at level n − 1: from a state x the chain moves up to y or down to z, and (P g)(x) = p g(y) + (1 − p) g(z).]

Continuing this way, we can prove that u_{j+1}(x) = v(x) for x at level n − j. In particular, u_k(x) = v(x) for all x when k ≥ n + 1. The algorithm of starting at the far right hand side and working backwards gives us the value function v, which gives the correct price of the option at each state. We exercise the option at any state where v(x) = f(x).


Martingales

Conditional Expectation

What's the best way to estimate the value of a random variable X? If we want to minimize the squared error
E[(X − e)²] = E[X² − 2eX + e²] = E(X²) − 2e E(X) + e²,
differentiate to obtain 2e − 2E(X), which is zero at e = E(X).
Example. Your friend throws a die and you have to estimate its value X. According to the analysis above, your best bet is to guess E(X) = 3.5.
What happens if you have additional information? Suppose that your friend will tell you the parity of the die value, that is, whether it is odd or even. How should we modify our guess to take this new information into account? Let's define the random variable
P = 0 if X is even, P = 1 if X is odd.
Then
E(X | X is even) = Σ_{x=1}^{6} x P(X = x | X is even)
                 = (1)(0) + 2(1/3) + (3)(0) + 4(1/3) + (5)(0) + 6(1/3)
                 = 4.
Similar calculations show that E(X | X is odd) = 3.
We can combine these results as follows. Define a function ψ by
ψ(p) = 4 if p = 0, ψ(p) = 3 if p = 1.
Then our best estimate is the random variable ψ(P).

In an even more extreme case, your friend may tell you the exact result X. In that case your estimate will be X itself.


Information   Best estimate of X
none          E(X | no info) = E(X)
partial       E(X | P) = ψ(P), where ψ(0) = 4 and ψ(1) = 3
complete      E(X | X) = φ(X), where φ(x) = x

Example. Suppose you roll two fair dice and let X be the number on the first die, and Y be the total on both dice. Calculate (a) E(Y | X) and (b) E(X | Y).

(a)
E(Y | X)(x) = Σ_y y P(Y = y | X = x) = Σ_{w=1}^{6} (x + w)(1/6) = x + 3.5,
so that E(Y | X) = X + 3.5. The variable w in the sum above stands for the value on the second die.

(b)
E(X | Y)(y) = Σ_x x P(X = x | Y = y), where
P(X = x | Y = y) = P(X = x, Y = y)/P(Y = y) = P(X = x, Y − X = y − x)/P(Y = y) = P(X = x) P(Y − X = y − x)/P(Y = y).

Now
P(Y = y) = (y − 1)/36 for y = 2, 3, 4, 5, 6, 7,
P(Y = y) = (13 − y)/36 for y = 8, 9, 10, 11, 12,
and
P(Y − X = y − x) = 1/6 for y − 6 ≤ x ≤ y − 1.

For 2 ≤ y ≤ 7 we get
E(X | Y)(y) = Σ_{x=1}^{y−1} x (1/36)/((y − 1)/36) = (1/(y − 1)) Σ_{x=1}^{y−1} x = (1/(y − 1)) ((y − 1)y/2) = y/2.
For 7 ≤ y ≤ 12 we get
E(X | Y)(y) = Σ_{x=y−6}^{6} x (1/36)/((13 − y)/36) = (1/(13 − y)) Σ_{x=y−6}^{6} x = y/2.
Therefore our best estimate is E(X | Y) = Y/2.

If X1, X2, . . . is a sequence of random variables, we will use F_n to denote the information contained in X1, . . . , X_n, and we will write E(Y | F_n) for E(Y | X1, . . . , X_n).

Definition. E(Y | F_n) is the unique random variable satisfying the following two conditions:
1. E(Y | F_n) depends only on the information in F_n. That is, there is some function φ so that
E(Y | F_n) = φ(X1, X2, . . . , X_n).
2. If Z is a random variable that depends only on F_n, then
E( E(Y | F_n) Z ) = E(Y Z).


Properties:
1. E(E(Y | F_n)) = E(Y)
2. E(aY1 + bY2 | F_n) = a E(Y1 | F_n) + b E(Y2 | F_n)
3. If Y is a function of F_n, then E(Y | F_n) = Y
4. For m < n, we have E(E(Y | F_n) | F_m) = E(Y | F_m)
5. If Y is independent of F_n, then E(Y | F_n) = E(Y)

Example 1. Let X1, X2, . . . be independent random variables with mean μ and set S_n = X1 + X2 + · · · + X_n. Let F_n = σ(X1, . . . , X_n) and m < n. Then
E(S_n | F_m) = E(X1 + · · · + X_m | F_m) + E(X_{m+1} + · · · + X_n | F_m)
             = X1 + · · · + X_m + E(X_{m+1} + · · · + X_n)
             = S_m + (n − m)μ.

Example 2. Let X1, X2, . . . be independent random variables with mean μ = 0 and variance σ². Set S_n = X1 + X2 + · · · + X_n. Let F_n = σ(X1, . . . , X_n) and m < n. Then
E(S_n² | F_m) = E((S_m + (S_n − S_m))² | F_m)
              = E(S_m² + 2 S_m (S_n − S_m) + (S_n − S_m)² | F_m)
              = S_m² + 2 S_m E(S_n − S_m | F_m) + E((S_n − S_m)²)
              = S_m² + 0 + Var(S_n − S_m)
              = S_m² + (n − m)σ².



Martingales

Let X0, X1, . . . be a sequence of random variables and define F_n = σ(X0, X1, . . . , X_n) to be the information in X0, X1, . . . , X_n. The family (F_n)_{n=0}^{∞} is called the filtration generated by X0, X1, . . . . A sequence Y0, Y1, . . . is said to be adapted to the filtration if Y_n ∈ F_n for every n ≥ 0, i.e., Y_n = φ_n(X0, . . . , X_n) for some function φ_n.


Definition. The sequence M0, M1, . . . of random variables is called a martingale (with respect to (F_n)_{n=0}^{∞}) if
(a) E(|M_n|) < ∞ for n ≥ 0,
(b) (M_n)_{n=0}^{∞} is adapted to (F_n)_{n=0}^{∞},
(c) E(M_{n+1} | F_n) = M_n for n ≥ 0.


Note that if (M_n) is an (F_n) martingale, then E(M_{n+1} − M_n | F_n) = 0 for all n. Therefore if m < n,
E(M_n − M_m | F_m) = E( Σ_{j=m}^{n−1} (M_{j+1} − M_j) | F_m )
                   = E( Σ_{j=m}^{n−1} E(M_{j+1} − M_j | F_j) | F_m )
                   = 0,
so that E(M_n | F_m) = M_m.
Another note: suppose (M_n) is an (F_n) martingale, and define F_n^M = σ(M0, M1, . . . , M_n). Then M_n ∈ F_n^M for all n, and F_n^M ⊆ F_n. Therefore
E(M_{n+1} | F_n^M) = E(E(M_{n+1} | F_n) | F_n^M) = E(M_n | F_n^M) = M_n,
so (M_n) is an (F_n^M)-martingale.
Example 1. Let X1, X2, . . . be independent random variables with mean μ. Put S_0 = 0 and S_n = X1 + · · · + X_n for n ≥ 1. Then M_n := S_n − nμ is an (F_n) martingale.
Proof.
E(M_{n+1} − M_n | F_n) = E(X_{n+1} − μ | F_n) = E(X_{n+1}) − μ = 0.


Example 2. Martingale betting strategy

Let X1, X2, . . . be independent random variables with P(X = 1) = P(X = −1) = 1/2. These represent the outcomes of a fair game that we will bet on. We start with a one dollar bet, and keep doubling our bet until we win once, then stop.
Let W_0 = 0 and, for n ≥ 1, let W_n be our winnings after n bets: this is either equal to 1 or to −(1 + 2 + · · · + 2^{n−1}) = 1 − 2ⁿ.
If W_n = 1, then W_{n+1} = 1 also, since we've stopped betting. That is, E(W_{n+1} | W_n = 1) = 1 = W_n. On the other hand, if W_n = 1 − 2ⁿ, then we bet 2ⁿ dollars, so that
P(W_{n+1} = 1 | W_n = 1 − 2ⁿ) = 1/2,  P(W_{n+1} = 1 − 2^{n+1} | W_n = 1 − 2ⁿ) = 1/2.
Putting this together we get
E(W_{n+1} | W_n = 1 − 2ⁿ) = (1/2)(1) + (1/2)(1 − 2^{n+1}) = 1 − 2ⁿ = W_n.
Thus E(W_{n+1} | W_n) = W_n in either case, so (W_n) is a martingale.
Example 3. A more complex betting strategy

Let X1, X2, . . . be as above, and suppose you bet B_n dollars on the nth play. We insist that B_n ∈ F_{n−1}, since you can't peek into the future. Such a (B_n) process is called predictable.
Your winnings are given by W_n = Σ_{j=1}^{n} B_j X_j, so that
E(W_{n+1} − W_n | F_n) = E(B_{n+1} X_{n+1} | F_n) = B_{n+1} E(X_{n+1} | F_n) = B_{n+1} E(X_{n+1}) = 0,
so (W_n) is again a martingale. This is a discrete version of stochastic integration with respect to a martingale.
Note that example 2 is the case where we set B_1 = 1 and
B_j = 2^{j−1} if X_1 = X_2 = · · · = X_{j−1} = −1, and B_j = 0 otherwise.
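A simulation makes the point vividly: the doubling strategy wins one dollar with very high probability, yet its expected winnings after any fixed number of plays are exactly zero. A sketch (Python):

    # The doubling strategy of example 2 as a martingale transform W_n = sum B_j X_j.
    import random

    def winnings(n):
        w, bet = 0, 1
        for _ in range(n):
            x = random.choice([-1, 1])
            w += bet * x
            bet = 0 if (x == 1 and bet > 0) else 2 * bet   # stop after the first win
        return w

    random.seed(0)
    sample = [winnings(10) for _ in range(200000)]
    print(sum(sample) / len(sample))   # near 0 = E(W_10), yet P(W_10 = 1) = 1 - 2**-10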


Example 4. Polya's urn

Begin with an urn that holds two balls: one red and the other green. Draw a ball at random, then return it with another of the same color.
Define X_n to be the number of red balls in the urn after n draws. Then X_n is a time inhomogeneous Markov chain with
P(X_{n+1} = k + 1 | X_n = k) = k/(n + 2),  P(X_{n+1} = k | X_n = k) = 1 − k/(n + 2).
This gives
E(X_{n+1} | X_n = k) = (k + 1) k/(n + 2) + k (1 − k/(n + 2)) = k (n + 3)/(n + 2),
so that E(X_{n+1} | X_n) = X_n (n + 3)/(n + 2). From the Markov property we get
E(X_{n+1} | F_n) = X_n (n + 3)/(n + 2),
and dividing by n + 3 we obtain
E( X_{n+1}/((n + 1) + 2) | F_n ) = X_n/(n + 2).
If we define M_n = X_n/(n + 2), then (M_n) is a martingale. Here M_n stands for the proportion of red balls in the urn after the nth draw.
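A short simulation illustrates the martingale property, and previews a fact proved in the martingale convergence section below: the average proportion stays at 1/2, while the individual limits are spread out uniformly. A sketch (Python):

    # Polya's urn: M_n = X_n/(n+2), the proportion of red balls.
    import random

    def proportion_after(draws):
        red, total = 1, 2
        for _ in range(draws):
            if random.random() < red / total:   # draw a red ball
                red += 1
            total += 1
        return red / total

    random.seed(1)
    props = [proportion_after(1000) for _ in range(5000)]
    print(sum(props) / len(props))                      # near 1/2 = M_0
    print(sum(p < 0.25 for p in props) / len(props))    # near 1/4 (uniform limit)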

Optional sampling theorem

Definition. A process (X_n) is called a supermartingale if it is adapted and E(X_{n+1} | F_n) ≤ X_n for n ≥ 0. A process (X_n) is called a submartingale if it is adapted and E(X_{n+1} | F_n) ≥ X_n for n ≥ 0.

Definition. A random variable τ with values in {0, 1, . . .} ∪ {∞} is called a stopping time if {ω : τ(ω) ≤ n} ∈ F_n for n ≥ 0.


Proposition. If (X_n) is a supermartingale and 0 ≤ σ ≤ τ are bounded stopping times, then E(X_σ) ≥ E(X_τ).
Proof. Let k be the bound, i.e., 0 ≤ σ ≤ τ ≤ k. We prove the result by induction on k. If k = 0, then obviously the result is true.
Now suppose the result is true for k − 1. Write
E(X_σ − X_τ) = E(X_{σ∧(k−1)} − X_{τ∧(k−1)}) − E((X_k − X_{k−1}) 1_{{σ≤k−1, τ=k}}).
The first term on the right is non-negative by the induction hypothesis. As for the second term, note that
{σ ≤ k − 1, τ = k} = {σ ≤ k − 1} ∩ {τ ≤ k − 1}ᶜ ∈ F_{k−1}.
Therefore
E((X_k − X_{k−1}) 1_{{σ≤k−1, τ=k}}) = E( E(X_k − X_{k−1} | F_{k−1}) 1_{{σ≤k−1, τ=k}} ) ≤ 0,
which gives the result.

Optional sampling theorem. If (M_n) is a martingale and T a finite stopping time, then under suitable conditions E(M_0) = E(M_T).
Proof. For each k, T_k := T ∧ k is a bounded stopping time, so that E(M_0) = E(M_{T_k}). But
E(M_T) = E(M_{T_k}) + E((M_T − M_k) 1_{{T>k}}),
so to prove the theorem you need to argue that
E((M_T − M_k) 1_{{T>k}}) → 0 as k → ∞.

Warning. The simple symmetric random walk S_n is a martingale, and T := inf{n : S_n = 1} is a stopping time with P(T < ∞) = 1. However E(S_0) = 0 ≠ E(S_T) = 1, so the optional sampling theorem fails.


Analysis of random walk using martingales

Let X1, X2, . . . be independent with P(X = −1) = q, P(X = 1) = p, and P(X = 0) = 1 − (p + q). Note that μ = p − q and σ² = p + q − (p − q)².
Let S_0 = j and S_n = S_0 + X1 + · · · + X_n, and define the stopping time T := inf{n ≥ 0 : S_n = 0 or S_n = N}, where we assume that 0 ≤ j ≤ N.

1. (Case p = q) Since (S_n) is a martingale, we have
E(S_0) = E(S_T)
j = 0 · P(S_T = 0) + N · P(S_T = N),
which implies that P(S_T = N) = j/N.

2. (Case p ≠ q) Now (S_n − nμ) is a martingale, so we have
E(S_0 − 0μ) = E(S_T − Tμ)
j = N P(S_T = N) − μ E(T),
which unfortunately leaves us with two unknowns. To overcome this problem, we introduce another martingale: M_n = (q/p)^{S_n} (check that this really is a martingale!). By optional stopping,
E((q/p)^{S_0}) = E((q/p)^{S_T})
(q/p)^j = (q/p)⁰ P(S_T = 0) + (q/p)^N P(S_T = N)
        = 1 − P(S_T = N) + (q/p)^N P(S_T = N).
A little algebra now shows that
P(S_T = N) = (1 − (q/p)^j) / (1 − (q/p)^N)
and
E(T) = (1/(p − q)) ( N (1 − (q/p)^j)/(1 − (q/p)^N) − j ).


3. (Case p = q) Now (S_n² − nσ²) is a martingale, so we have
E(S_0² − 0σ²) = E(S_T² − Tσ²)
j² = N² P(S_T = N) − E(T)σ².
Substitute P(S_T = N) = j/N and σ² = p + q, and solve to obtain
E(T) = j(N − j)/(p + q).
[Figure: probability of ruin as a function of the starting point j = 0, . . . , 20, for the fair game p = q = 1/2 and for the roulette values p = 9/19, q = 10/19.]

[Figure: average length of game E(T) as a function of the starting point j = 0, . . . , 20, for the same two cases.]


Waiting for patterns: In tossing a fair coin, how long on average until you see the pattern HTH?
Imagine a gambler who wants to see HTH and follows the "play until you lose" strategy: at time 1 he bets one dollar on H; if it is T he loses and quits, otherwise he wins one dollar. Now he has two dollars to bet on T; if it is H he loses and quits, otherwise he wins two more dollars. In that case, he bets his four dollars on H; if it is T he loses and quits, otherwise he wins four dollars and stops.
His winnings W_n^1 form a martingale with W_0^1 = 0.
Now imagine that at each time j ≥ 1 another gambler begins and bets on the same coin tosses using the same strategy. These gamblers' winnings are labelled W_n^2, W_n^3, . . . . Note that W_n^j = 0 for n < j.
Define W_n = Σ_{j=1}^{n} W_n^j, the total winnings, and let T be the first time the pattern is completed. By optional stopping, E(W_T) = E(W_0) = 0. From the casino's point of view this means that the average income equals the average payout.

Coin tosses:  · · · H T H  (the pattern is completed at time T)
Income:  $1 from each of the T gamblers.
Payout:  $8 to the gambler who began at time T − 2 (he saw H, T, H), $0 to the gambler who began at time T − 1, and $2 to the gambler who began at time T; every other gambler went broke ($0).

Examining this diagram, we see that the total income is T dollars, while the total payout is 8 + 2 = 10 dollars, and conclude that E(T) = 10.
Fortunately, you don't need to go through the whole analysis every time you solve one of these problems; just figure out how much the casino has to pay out. For instance, if the desired pattern is HHH, then the casino pays out the final three bettors a total of 8 + 4 + 2 = 14 dollars, thus E(T) = 14.

Example. If a monkey types on a keyboard, randomly choosing letters, how long on average before we see the word MONKEY? Answer: 26⁶ = 308915776.
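A quick simulation confirms the casino accounting; a sketch (Python):

    # Average waiting time for a pattern in fair coin tossing.
    import random

    def waiting_time(pattern):
        tosses = ""
        while not tosses.endswith(pattern):
            tosses += random.choice("HT")
        return len(tosses)

    random.seed(3)
    for pat in ["HTH", "HHH"]:
        times = [waiting_time(pat) for _ in range(100000)]
        print(pat, sum(times) / len(times))   # about 10 and about 14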


Guessing Red: A friend turns over the cards of a well shuffled deck one at a time. You can stop anytime you choose and bet that the next card is red. What is the best strategy?
Solution: Let R_n be the number of red cards left after n cards have been turned over. Then
R_{n+1} = R_n with probability 1 − p, and R_{n+1} = R_n − 1 with probability p,
where p = R_n/(52 − n), the proportion of reds left. Taking expectations we get
E(R_{n+1} | R_n) = R_n − R_n/(52 − n) = R_n (52 − (n + 1))/(52 − n),
so that
E( R_{n+1}/(52 − (n + 1)) | R_n ) = R_n/(52 − n).
This means that M_n := R_n/(52 − n) is a martingale.

Now let T represent your stopping strategy. By the optional stopping theorem,
P(T is successful) = E(M_T) = E(M_0) = 1/2.
Every strategy has a 50% chance of success!

An application to linear algebra: First note that if (X_n) is a Markov chain, and v a superharmonic function, then the process v(X_n) is a supermartingale:
E(v(X_{n+1}) | F_n) = E(v(X_{n+1}) | X_n) = Σ_{y∈S} v(y) p(X_n, y) ≤ v(X_n).

Theorem. Let P be an n × n matrix with p_{ij} > 0 and Σ_j p_{ij} = 1 for all i. Then the eigenvalue 1 is simple.

Proof. Let (X_n) be the Markov chain with transition matrix P on state space S = {1, 2, . . . , n}. A function u : S → R is harmonic if and only if the vector u = (u(1), u(2), . . . , u(n))ᵀ satisfies P u = u, i.e., u is a right eigenvector for the eigenvalue 1. Clearly the constant functions are harmonic; we want to show that they are the only ones.
Suppose u is a right eigenvector, so that u : S → R is harmonic and u(X_n) is a (bounded!) martingale. Let x, y ∈ S and let T_y := inf{n ≥ 0 : X_n = y} be the first time the chain hits state y. Since the chain is communicating, we have P(T_y < ∞ | X_0 = x) = 1 and so
u(y) = E_x(u(X_{T_y})) = E_x(u(X_0)) = u(x).

Martingale convergence theorem

Theorem. If (M_n) is a martingale with sup_n E(|M_n|) < ∞, then there is a random variable M_∞ so that M_n → M_∞.
Proof. It suffices to show that for any −∞ < a < b < ∞, the probability that (M_n) fluctuates infinitely often between a and b is zero. To see this, we define a new martingale (W_n) which is the total winnings for a particular betting strategy.
The strategy is to wait until the process goes below a, then keep betting until the process goes above b, and repeat. The winnings on the jth bet is M_j − M_{j−1}, so that
W_0 = 0,  W_n = Σ_{j=1}^{n} B_j (M_j − M_{j−1}), n ≥ 1.

The following diagram explains the relationship between the two martingales.

[Figure: a path of the M process crossing the levels a < b, and the corresponding path of the W process. Each completed upcrossing of [a, b] increases W by at least b − a, while the final incomplete upcrossing loses at most |M_n − a|.]

Notice that W_n ≥ (b − a)U_n − |M_n − a|, where U_n is the number of times that (M_n) upcrosses the interval [a, b]. Therefore
0 = E(W_0) = E(W_n) ≥ (b − a)E(U_n) − E(|M_n − a|),
so that E(U_n) ≤ E(|M_n − a|)/(b − a). Taking the supremum in n on both sides gives E(U_∞) < ∞, which shows that P(U_∞ = ∞) = 0.
So M_n → M_∞, but M_∞ may take the values −∞ or ∞. Luckily Fatou's lemma comes to the rescue, showing us that E(|M_∞|) ≤ lim inf_n E(|M_n|) < ∞. Thus P(|M_∞| < ∞) = 1 and E(M_∞) exists.
Although E(M_0) = E(M_n) for all n, we cannot always let n → ∞ to conclude that E(M_0) = E(M_∞).


Examples.
1. Polya's urn. Let M_n be the proportion of red balls in Polya's urn at time n. Then (M_n) is a martingale and 0 ≤ M_n ≤ 1, so sup_n E(|M_n|) ≤ 1. Therefore M_n → M_∞ for some random variable M_∞. It turns out that M_∞ has a uniform distribution on (0, 1).

[Figure: three sample paths from Polya's urn over 100 draws; the three proportions settle down to different limiting values.]

2. Branching process. Let X_n be a branching process, and put M_n := X_n/μⁿ, so that (M_n) is a martingale. If μ ≤ 1, then M_n → M_∞ = 0 (extinction). If μ > 1, then (M_n) is uniformly integrable and M_n → M_∞ where E(M_∞) = 1.

3. Random harmonic series. The harmonic series diverges, but not the alternating harmonic series:
1 + 1/2 + 1/3 + 1/4 + · · · + 1/j + · · · = ∞,
1 − 1/2 + 1/3 − 1/4 + · · · + (−1)^{j+1}/j + · · · = ln 2.
Here the positive and negative terms partly cancel, allowing the series to converge.


Let's choose plus and minus signs at random, by tossing a fair coin. Formally, let (ε_j)_{j=1}^{∞} be independent random variables with common distribution P(ε_j = 1) = P(ε_j = −1) = 1/2. Then the martingale convergence theorem shows that the sequence M_n = Σ_{j=1}^{n} ε_j/j converges almost surely (it applies because E(M_n²) = Σ_{j≤n} 1/j² stays bounded). The limit M_∞ := Σ_{j=1}^{∞} ε_j/j has the smooth density pictured below.

[Figure: the density of Σ_{j=1}^{∞} ε_j/j on (−3, 3); it is bell-shaped with maximum value near 0.25.]

4. Recurrent Markov chain. Let (X_n) be an irreducible, recurrent Markov chain on a countable state space S. Suppose that u is a bounded harmonic function. Then M_n := u(X_n) is a bounded martingale and so M_n → M_∞ as n → ∞. But X_n is recurrent and visits every state infinitely often, so u(X_n) can only be convergent if u is constant.


Brownian motion

Basic properties


Brownian motion is our standard model for continuous random movement. We get a Brownian motion (X_t) by assuming
(1) Independent increments: for s1 < t1 < s2 < t2 < · · · < sn < tn the random variables X_{t1} − X_{s1}, . . . , X_{tn} − X_{sn} are independent.
(2) Stationarity: the distribution of X_t − X_s depends only on t − s.
(3) Continuous paths: the sample path t ↦ X_t(ω) is continuous with probability 1.
Here the random variables X_t take values in the state space R^d for d ≥ 1, the starting point X_0 = x ∈ R^d can be anywhere, and E(X_t) = μt for some fixed μ ∈ R^d.
When μ ≠ 0 we have Brownian motion with drift, while if d > 1 we call (X_t) multi-dimensional Brownian motion. Conditions (1)–(3) imply that X_t has a (multivariate) normal distribution for t > 0.

Definition. The process (X_t) is called standard d-dimensional Brownian motion when μ = 0 and the covariance matrix of X_t satisfies
E[(X_t − X_0)(X_t − X_0)'] = tI.
In this case, the coordinates of the vector X_t = (X_t¹, X_t², . . . , X_t^d) are independent 1-dimensional Brownian motions.
Brownian motion is a Markov process with transition kernel
p_t(x, y) = (2πt)^{−d/2} e^{−‖y−x‖²/2t},  y ∈ R^d.
This kernel satisfies the Chapman-Kolmogorov equation
p_{s+t}(x, y) = ∫ p_s(x, z) p_t(z, y) dz.


For standard d-dimensional Brownian motion, we have
E‖X_t − X_0‖² = Σ_{j=1}^{d} E((X_t^j − X_0^j)²) = dt,
so that, on average, d-dimensional Brownian motion is about √(dt) units from its starting position at time t.
In fact, the average speed of Brownian motion over [0, t] is E(‖X_t − X_0‖)/t ≈ √(d/t). For large t, this is near zero, while for small t, it is near ∞.
Proposition. X_t is not differentiable at t = 0, i.e.,
P(ω : X_t'(ω) exists at t = 0) = 0.
Proof:
(ω : X_t'(ω) exists at t = 0) ⊆ (ω : sup_{0<t≤1} ‖X_t(ω) − X_0(ω)‖/t < ∞)
  ⊆ (ω : sup_n 2^{n−1} ‖X_{2^{−n}}(ω) − X_0(ω)‖ < ∞)
  ⊆ (ω : sup_n 2^{n−1} ‖X_{2^{−n}}(ω) − X_{2^{−(n−1)}}(ω)‖ < ∞)
  = ∪_{k=1}^{∞} (ω : sup_n 2^{n−1} ‖X_{2^{−n}}(ω) − X_{2^{−(n−1)}}(ω)‖ < k).
Define A_k = (ω : sup_n 2^{n−1} ‖X_{2^{−n}}(ω) − X_{2^{−(n−1)}}(ω)‖ < k). The random variables Z_n := 2^{n−1}(X_{2^{−n}} − X_{2^{−(n−1)}}) are independent multivariate normal with unbounded variances, so
P(A_k) = Π_n P(‖Z_n‖ < k) = 0, and hence P(ω : X_t'(ω) exists at t = 0) ≤ Σ_k P(A_k) = 0. ∎
A more complicated argument gives
P(ω : t ↦ X_t(ω) is not differentiable at any t ≥ 0) = 1.

[Figure: sample paths of Brownian motion at a large time scale (t up to 100, values between −100 and 100) and at a small time scale (t up to 0.01, values between −0.01 and 0.01).]


The reflection principle

By the three ingredients (1)–(3) that define Brownian motion, we see that for any fixed s ≥ 0, the process (X_{t+s} − X_s) is a Brownian motion, independent of F_s, that starts at the origin. In other words, (X_{t+s}) is a Brownian motion, independent of F_s, with random starting point X_s.
An important generalization says that if T is a finite stopping time, then (X_{T+t}) is independent of F_T, with random starting point X_T.
Suppose X_t is a standard 1-dimensional Brownian motion starting at x, and let x < b. We will prove that
P(X_s ≥ b for some 0 ≤ s ≤ t) = 2 P(X_t ≥ b).
This follows from stopping the process at T_b, the first time (X_t) hits the point {b}, then using symmetry. The picture below will help you to understand the calculation:
P(X_t ≥ b) = P(X_t ≥ b | T_b ≤ t) P(T_b ≤ t) = (1/2) P(T_b ≤ t),
which gives the result.

[Figure: the reflection principle. A Brownian path that hits level b before time t, together with its reflection across b after time T_b; by symmetry the reflected path is equally likely.]

If we now fix a < x < b, we may write explicitly
P_x(T_b ≤ t) = 2 P_x(X_t ≥ b) = 2 P( Z ≥ (b − x)/√t ),
where Z is standard normal. Letting t → ∞ we find P_x(X_t ever hits b) = 2 P(Z ≥ 0) = 1.
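The reflection formula can be checked by simulating the path on a fine grid; a sketch (Python), noting that the discrete maximum slightly underestimates the true one:

    # Check P(max_{0<=s<=t} X_s >= b) = 2 P(X_t >= b) for standard BM from 0.
    import random, math

    random.seed(4)
    t, b, nsteps, trials = 1.0, 1.0, 1000, 20000
    sd = math.sqrt(t / nsteps)
    count = 0
    for _ in range(trials):
        x = m = 0.0
        for _ in range(nsteps):
            x += random.gauss(0, sd)
            m = max(m, x)
        count += (m >= b)
    exact = 2 * (1 - 0.5 * (1 + math.erf(b / math.sqrt(2 * t))))
    print(count / trials, exact)   # both near 0.3173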


This shows that 1-dimensional Brownian motion will eventually hit every value greater than the starting position. Since −X_t is also a Brownian motion, we argue that it will also hit every value less than the starting point:
P_x(X_t hits a) = P_{−x}(−X_t hits −a) = 1.
Now we use the strong Markov property again to show that
P_x(X_t hits b, then hits a) = P_x(X_t hits b) P_b(X_t hits a) = 1.
In particular it must return to its starting point. You can extend this argument to prove
P_x(X_t hits all points infinitely often) = 1.

Now let T be the hitting time of the set {a, b}. Since (X_t) is a martingale, we have
x = E_x(X_0) = E_x(X_T) = a P_x(X_T = a) + b P_x(X_T = b).
Using the fact that P_x(X_T = a) + P_x(X_T = b) = 1, we can conclude that
P_x(X_T = b) = (x − a)/(b − a).
Just like for the symmetric random walk, (X_t² − t) is a martingale, so
E_x(X_0² − 0) = E_x(X_T² − T)
x² = a² P_x(X_T = a) + b² P_x(X_T = b) − E_x(T).
The previous result plus a little algebra shows that
E_x(T) = (b − x)(x − a).
If we let a → −∞, we find that, although P_x(T_b < ∞) = 1, we have E_x(T_b) = ∞.

The Dirichlet problem

Let (X_t) be a d-dimensional Brownian motion and f : R^d → R. We want to study the function u(t, x) := E_x(f(X_t)).


The Taylor series of f about the point z is
f(y) = f(z) + ⟨∇f(z), y − z⟩ + (1/2)⟨y − z, D²f(z)(y − z)⟩ + o(|y − z|²).
Setting y = X_t, z = X_s, and taking expectations we get
E_x(f(X_t)) = E_x(f(X_s)) + Σ_i E_x(∂_i f(X_s)) E_x(X_t^i − X_s^i)
  + (1/2) Σ_{i,j} E_x(∂²_{ij} f(X_s)) E_x[(X_t^i − X_s^i)(X_t^j − X_s^j)] + o(E(‖X_t − X_s‖²))
  = E_x(f(X_s)) + (1/2) E_x( Σ_i ∂²_{ii} f(X_s) ) (t − s) + o(|t − s|).
Therefore we see that u satisfies
(∂u/∂t)(t, x) = (1/2) E_x((Δf)(X_t)).
To find the spatial derivatives of u we use the translation invariance of Brownian motion:
E_y(f(X_t)) = E_x(f(X_t + (y − x)))
  = E_x( f(X_t) + ⟨(∇f)(X_t), y − x⟩ + (1/2)⟨y − x, D²f(X_t)(y − x)⟩ + o(‖y − x‖²) )
  = E_x(f(X_t)) + ⟨E_x((∇f)(X_t)), y − x⟩ + (1/2)⟨y − x, E_x(D²f(X_t))(y − x)⟩ + o(‖y − x‖²).
In particular, we have D²u(t, x) = E_x(D²f(X_t)) and hence Δu(t, x) = E_x((Δf)(X_t)). In other words, u satisfies the heat equation
(∂u/∂t)(t, x) = (1/2)(Δu)(t, x).
Let us explore the connection with the heat equation on a bounded region D of R^d. We fix a temperature distribution g on ∂D (the boundary) for all time, and begin with an initial temperature distribution f in D at time 0. The latter distribution will flow and eventually dissipate completely.


The solution to the heat equation for x ∈ D can be expressed as
u(t, x) = E_x( f(X_t) 1_{{t<T}} + g(X_T) 1_{{t≥T}} ),
where T is the time when the process first hits the boundary.
Letting t → ∞ we arrive at v(x) = E_x(g(X_T)), the solution to the Dirichlet problem. That is, v is harmonic (Δv(x) = 0) inside D, and v = g on ∂D.
Example. Let's apply this result to a problem we already solved using martingales. Let (X_t) be 1-dimensional Brownian motion, D = (a, b), and put g(a) = 0 and g(b) = 1. For d = 1, harmonic means linear, so we solve the problem with a straight line segment with the right boundary conditions. This gives us
v(x) = E_x(g(X_T)) = P_x(X_T = b) = (x − a)/(b − a).


That's very nice, but let me plant the seeds of doubt by asking three questions and looking at a couple of counterexamples.
Questions
1. Do we know that P_x(T < ∞) = 1?
2. Is u continuous at the boundary?
3. Is there more than one solution to the Dirichlet problem?

Example 1. In R², let D = {(x1, x2) : x1 > 0}, the open right half plane. The functions u1(x) ≡ 0 and u2(x) = x1 are both harmonic and equal zero on ∂D.


Theorem 1. If D is a bounded open set, u1, u2 are harmonic on D, continuous on the closure D̄, and u1 = u2 on ∂D, then u1 = u2 on D.

Example 2. Let D = {(x1, x2) : 0 < x1² + x2² < 1} be the punctured disc, and put g(x) = 1 − |x| on ∂D. That is, zero when |x| = 1 and 1 at x = 0. For x ∈ D, we have u(x) = E_x(g(X_T)) = P_x(X_T = 0). It turns out that u(x) = 0 for all x ∈ D.

Theorem 2. If ∂D is smooth, and g is continuous on ∂D, then u(x) → g(y) as x ∈ D → y ∈ ∂D.

Example 3. Let D be bounded with a smooth boundary ∂D. Define u(x) = P_x(T_{∂D} < ∞). Then u is harmonic on D and u ≡ 1 on ∂D. Therefore, Theorems 1 and 2 tell us that u(x) = 1 for all x ∈ D, that is, P_x(T_{∂D} < ∞) = 1 for all x ∈ D.

Example 4. Let D1 be any bounded set in R^d. Let D2 be an open sphere so large that D1 ⊂ D2. Since D2 is a bounded open set with a smooth boundary, we have
P_x(T_{∂D1} < ∞) ≥ P_x(T_{∂D2} < ∞) = 1 for all x ∈ D1.
Now that we've cleared those points up, let's return to the problem of finding probabilities by solving the Dirichlet problem.
[Figure: an annulus centered at the origin with inner radius R1 and outer radius R2, and a starting point x between the two circles.]

The probability that Brownian motion reaches the outer boundary first is given by v(x) = E_x(g(X_T)), where g(x) = 1 if |x| = R2 and g(x) = 0 if |x| = R1. The function v will be harmonic in between. Now, the symmetry of Brownian motion implies that the probability is the same for all x with a common radius. So we can write
v(x) = φ(r), where r = ( Σ_{i=1}^{d} x_i² )^{1/2}.
Taking derivatives of the function r, we find
∂_i r = (1/2) ( Σ_{i=1}^{d} x_i² )^{−1/2} 2x_i = x_i/r,
∂_i[φ(r)] = φ'(r) ∂_i r = φ'(r) x_i/r,
∂_{ii}[φ(r)] = φ''(r) x_i²/r² + φ'(r) ( 1/r − x_i²/r³ ).
Adding over i gives
Δ[φ(r)] = Σ_{i=1}^{d} ∂_{ii}[φ(r)] = φ''(r) r²/r² + φ'(r) ( d/r − r²/r³ ) = φ''(r) + φ'(r)(d − 1)/r.

Solving the one variable equation Δ[φ(r)] = 0 we get the solution
v(x) = (ln|x| − ln R1)/(ln R2 − ln R1)                 if d = 2,
v(x) = (R1^{2−d} − |x|^{2−d})/(R1^{2−d} − R2^{2−d})   if d ≥ 3.
We learn something interesting by taking limits as R2 → ∞. For d = 2,
P_x(X ever hits B(0, R1)) = lim_{R2→∞} 1 − v(x) = 1.
Two dimensional Brownian motion will hit any ball, no matter how small, from any starting point. If we pursue this argument, we can divide R² using a fine grid, and find that 2-d Brownian motion will visit every section infinitely often.
On the other hand, if we leave R2 alone and let R1 → 0, we get
P_x(X_t = 0 before |X_t| = R2) = lim_{R1→0} 1 − v(x) = 0,
and if we now let R2 → ∞ we discover
P_x(X_t ever hits 0) = 0.
Two dimensional Brownian motion will never hit any particular point. The process is neighborhood recurrent but not point recurrent.
For d ≥ 3, if we let R2 → ∞ we get
P_x(X ever hits B(0, R1)) = lim_{R2→∞} 1 − v(x) = (|x|/R1)^{2−d}.
Since this is less than one, we see that Brownian motion is transient when d ≥ 3.
It turns out that whether or not d-dimensional Brownian motion will hit a set depends on its fractional dimension. The process can hit sets of dimension greater than d − 2, but cannot hit sets of dimension less than d − 2. In the dimension d − 2 case, it depends on the particular set.


Stochastic integration

Integration with respect to random walk

Let X1, X2, . . . be independent random variables with P(X_i = 1) = P(X_i = −1) = 1/2. The symmetric random walk can be expressed as S_n = X1 + · · · + X_n, so that X_i = S_i − S_{i−1} = ΔS_i.
Let F_n denote the information in X1, . . . , X_n, and let B_n be the amount bet on the nth game. We require that B_n ∈ F_{n−1}, i.e., the B-process is predictable.
The winnings up to time n can be written
Z_n = Σ_{i=1}^{n} B_i X_i = Σ_{i=1}^{n} B_i ΔS_i,
so we can call Z the integral of B with respect to S.


Recall that Z is a martingale:
E(Z_{n+1} − Z_n | F_n) = E(B_{n+1} X_{n+1} | F_n) = B_{n+1} E(X_{n+1}) = 0.
In particular, E(Z_n) = 0. What about the variance Var(Z_n) = E(Z_n²)? Squaring the sum gives
Z_n² = Σ_i B_i² X_i² + 2 Σ_{i<j} B_i B_j X_i X_j = Σ_i B_i² + 2 Σ_{i<j} B_i B_j X_i X_j,
since X_i² = 1. For i < j we have
E(B_i B_j X_i X_j) = E(E(B_i B_j X_i X_j | F_{j−1})) = E(B_i B_j X_i E(X_j)) = 0,
so E(Z_n²) = Σ_{i=1}^{n} E(B_i²).


Integration with respect to Brownian motion

Many models of random behaviour suppose that a process X satisfies a stochastic differential equation dX_t = a(X_t) dt + b(X_t) dW_t. Here the function a is called the drift coefficient and b the diffusion coefficient. This equation is understood in the integrated sense
X_t = X_0 + ∫_0^t a(X_s) ds + ∫_0^t b(X_s) dW_s.

Let W_t be a standard 1-dimensional Brownian motion, and Y_t the amount bet at time t. We want to define the integrated process Z_t = ∫_0^t Y_s dW_s. We assume that ∫_0^t E(Y_s²) ds < ∞ and that Y_t is F_t measurable.
Simple integrands. Suppose there are a finite number of times 0 = t_0 < t_1 < t_2 < · · · < t_n and that the process (Y_t) can be written
Y_t = Y_0 for 0 ≤ t < t_1,  Y_t = Y_1 for t_1 ≤ t < t_2,  . . . ,  Y_t = Y_n for t_n ≤ t < ∞.
We assume E(Y_i²) < ∞ and Y_i ∈ F_{t_i} for all i. Then it makes sense to define, for t_j < t ≤ t_{j+1},
Z_t = ∫_0^t Y_s dW_s = Σ_{i=1}^{j} Y_{i−1} [W_{t_i} − W_{t_{i−1}}] + Y_j [W_t − W_{t_j}].

[Figure: a simple integrand Y_t, constant on each of the intervals [0, t_1), [t_1, t_2), [t_2, t_3), . . . , with a generic time t between t_j and t_{j+1}.]

Here are some facts about the integral we've defined.


1. Linearity: If X and Y are two simple integrands, then so is aX + bY, and
∫_0^t (aX_s + bY_s) dW_s = a ∫_0^t X_s dW_s + b ∫_0^t Y_s dW_s.

2. Martingale property: Clearly Z_t ∈ F_t and E(Z_t²) < ∞. Now if t_j ≤ s ≤ t ≤ t_{j+1} for some j, then Z_t − Z_s = Y_j [W_t − W_s], so
E(Z_t − Z_s | F_s) = E(Y_j [W_t − W_s] | F_s) = Y_j E(W_t − W_s | F_s) = 0,
which is the martingale equation. Now if s ≤ t_j < · · · < t_k ≤ t, then Z_t − Z_s = (Z_{t_j} − Z_s) + Σ_{i=j}^{k−1} (Z_{t_{i+1}} − Z_{t_i}) + (Z_t − Z_{t_k}), so that
E(Z_t − Z_s | F_s) = E(Z_{t_j} − Z_s | F_s) + Σ_{i=j}^{k−1} E(E(Z_{t_{i+1}} − Z_{t_i} | F_{t_i}) | F_s) + E(E(Z_t − Z_{t_k} | F_{t_k}) | F_s) = 0.

3. Variance formula: E(Z_t²) = ∫_0^t E(Y_s²) ds. This follows exactly as for integration with respect to random walk.
For integrands Y_t that are not simple, we define a simple approximation as follows:
Y_t^{(n)} = Y_{i/n} for i/n ≤ t < (i + 1)/n.
The stochastic integral Z_t = ∫_0^t Y_s dW_s is defined as the limit
Z_t = lim_{n→∞} ∫_0^t Y_s^{(n)} dW_s.
The linearity, martingale property, and variance formula carry over to (Z_t).
An example. Let f be a differentiable non-random function. Then
Z_t = ∫_0^t f(s) dW_s = (W_t f(t) − W_0 f(0)) − ∫_0^t W_s df(s).
Then Z_t is a normal random variable with mean zero and variance ∫_0^t f²(s) ds. We can show that Z_t has independent increments as well, so that Z is just a time changed Brownian motion:
Z_t = B( ∫_0^t f²(s) ds ).

Ito's formula

Let f be a differentiable function and write f(t) as the telescoping sum
f(t) = f(0) + Σ_{j=0}^{n−1} [f((j + 1)t/n) − f(jt/n)]
     = f(0) + Σ_{j=0}^{n−1} f'(jt/n)(t/n) + Σ_{j=0}^{n−1} o(t/n)
     → f(0) + ∫_0^t f'(s) ds + 0.

In a similar vein, let W_t be a Brownian motion and write f(W_t) as a telescoping sum:
f(W_t) = f(W_0) + Σ_{j=0}^{n−1} [f(W_{(j+1)t/n}) − f(W_{jt/n})]
       = f(W_0) + Σ_{j=0}^{n−1} f'(W_{jt/n}) [W_{(j+1)t/n} − W_{jt/n}]
         + (1/2) Σ_{j=0}^{n−1} f''(W_{jt/n}) [W_{(j+1)t/n} − W_{jt/n}]²
         + Σ_{j=0}^{n−1} o( [W_{(j+1)t/n} − W_{jt/n}]² ).
The intuition behind Ito's formula is that you can replace [W_{(j+1)t/n} − W_{jt/n}]² by t/n with only a small amount of error. Therefore
f(W_t) = f(W_0) + Σ_{j=0}^{n−1} f'(W_{jt/n}) [W_{(j+1)t/n} − W_{jt/n}]


         + (1/2) Σ_{j=0}^{n−1} f''(W_{jt/n}) (t/n) + Σ_{j=0}^{n−1} o(t/n) + error.
Letting n → ∞ gives Ito's formula:
f(W_t) = f(W_0) + ∫_0^t f'(W_s) dW_s + (1/2) ∫_0^t f''(W_s) ds.

Example. Suppose we want to calculate ∫_0^t W_s dW_s. The definition gets us nowhere, so we try to apply the usual rules of calculus:
∫_0^t W_s dW_s = W_t² − W_0² − ∫_0^t W_s dW_s,
which implies ∫_0^t W_s dW_s = [W_t² − W_0²]/2. The only problem is that this formula is false! Since W_0 = 0, we can see that it is fishy by taking expectations on both sides: the left hand side gives zero but the right hand side is strictly positive.
The moral of this example is that the usual rules of calculus do not apply to stochastic integrals. So how do we calculate ∫_0^t W_s dW_s correctly? Let f(t) = t², so f'(t) = 2t and f''(t) = 2. From Ito's formula we find
W_t² = W_0² + ∫_0^t 2W_s dW_s + (1/2) ∫_0^t 2 ds = 2 ∫_0^t W_s dW_s + t,
and therefore
∫_0^t W_s dW_s = (1/2)(W_t² − t).
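A left-endpoint Riemann sum along a simulated path shows the extra −t/2 correction term directly; a sketch (Python):

    # Approximate int_0^t W_s dW_s and compare with (W_t^2 - t)/2.
    import random, math

    random.seed(5)
    t, n = 1.0, 100000
    sd = math.sqrt(t / n)
    W = integral = 0.0
    for _ in range(n):
        dW = random.gauss(0, sd)
        integral += W * dW        # left endpoint, as in the definition of the integral
        W += dW
    print(integral, (W * W - t) / 2)   # agree up to a small discretization error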

A more advanced version of Ito's formula can handle functions that depend on t as well as x:
f(t, W_t) = f(0, W_0) + ∫_0^t ∂_s f(s, W_s) ds + ∫_0^t ∂_x f(s, W_s) dW_s + (1/2) ∫_0^t ∂_{xx} f(s, W_s) ds.


A peek at math finance

Imagine an economy with two assets: a bond whose value grows at a fixed rate r and a stock whose price per unit (S_t) is a random variable. If M_t is the cost of one unit of bond at time t, we have dM_t = rM_t dt, which implies M_t = M_0 e^{rt}. For the stock we have
dS_t = S_t (μ dt + σ dW_t).   (*)
How do we solve this equation? Guess! Let X_t = exp(at + bW_t). From Ito's formula with f(t, x) = exp(at + bx) we get
X_t = X_0 + ∫_0^t aX_s ds + ∫_0^t bX_s dW_s + (1/2) ∫_0^t b²X_s ds
    = X_0 + (a + b²/2) ∫_0^t X_s ds + b ∫_0^t X_s dW_s.
To solve (*) we set b = σ and a + b²/2 = μ, i.e., a = μ − σ²/2. The solution to our problem is a geometric Brownian motion:
S_t = S_0 exp( σW_t + (μ − σ²/2) t ).
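To see the formula at work, compare it with an Euler discretization of (*) along the same simulated Brownian path; a sketch (Python, with hypothetical parameter values μ = 0.05 and σ = 0.2):

    # Geometric Brownian motion: exact solution vs Euler scheme for (*).
    import random, math

    random.seed(6)
    S0, mu, sigma, t, n = 100.0, 0.05, 0.2, 1.0, 100000
    dt = t / n
    W, S_euler = 0.0, S0
    for _ in range(n):
        dW = random.gauss(0, math.sqrt(dt))
        S_euler += S_euler * (mu * dt + sigma * dW)   # one Euler step of (*)
        W += dW
    S_exact = S0 * math.exp(sigma * W + (mu - sigma**2 / 2) * t)
    print(S_euler, S_exact)   # close for large n: both follow the same path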