Stochastic Processes
with Applications
Books in the Classics in Applied Mathematics series are monographs and textbooks declared out
of print by their original publishers, though they are of continued importance and interest to the
mathematical community. SIAM publishes this series to ensure that the information presented in these
texts is not lost to today's students and researchers.
Editor-in-Chief
Robert E. O'Malley, Jr., University of Washington
Editorial Board
John Boyd, University of Michigan
Leah Edelstein-Keshet, University of British Columbia
William G. Faris, University of Arizona
Nicholas J. Higham, University of Manchester
Peter Hoff, University of Washington
Mark Kot, University of Washington
Peter Olver, University of Minnesota
Philip Protter, Cornell University
Gerhard Wanner, L'Université de Genève
Petar Kokotovic, Hassan K. Khalil, and John O'Reilly, Singular Perturbation Methods in Control: Analysis
and Design
Jean Dickinson Gibbons, Ingram Olkin, and Milton Sobel, Selecting and Ordering Populations: A New
Statistical Methodology
James A. Murdock, Perturbations: Theory and Methods
Ivar Ekeland and Roger Temam, Convex Analysis and Variational Problems
Ivar Stakgold, Boundary Value Problems of Mathematical Physics, Volumes I and II
J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables
David Kinderlehrer and Guido Stampacchia, An Introduction to Variational Inequalities and Their Applications
F. Natterer, The Mathematics of Computerized Tomography
Avinash C. Kak and Malcolm Slaney, Principles of Computerized Tomographic Imaging
R. Wong, Asymptotic Approximations of Integrals
O. Axelsson and V. A. Barker, Finite Element Solution of Boundary Value Problems: Theory and Computation
David R. Brillinger, Time Series: Data Analysis and Theory
Joel N. Franklin, Methods of Mathematical Economics: Linear and Nonlinear Programming, Fixed-Point
Theorems
Philip Hartman, Ordinary Differential Equations, Second Edition
Michael D. Intriligator, Mathematical Optimization and Economic Theory
Philippe G. Ciarlet, The Finite Element Method for Elliptic Problems
Jane K. Cullum and Ralph A. Willoughby, Lanczos Algorithms for Large Symmetric Eigenvalue
Computations, Vol. I: Theory
M. Vidyasagar, Nonlinear Systems Analysis, Second Edition
Robert Mattheij and Jaap Molenaar, Ordinary Differential Equations in Theory and Practice
Shanti S. Gupta and S. Panchapakesan, Multiple Decision Procedures: Theory and Methodology
of Selecting and Ranking Populations
Eugene L. Allgower and Kurt Georg, Introduction to Numerical Continuation Methods
Leah Edelstein-Keshet, Mathematical Models in Biology
Heinz-Otto Kreiss and Jens Lorenz, Initial-Boundary Value Problems and the Navier-Stokes Equations
J. L. Hodges, Jr. and E. L. Lehmann, Basic Concepts of Probability and Statistics, Second Edition
George F. Carrier, Max Krook, and Carl E. Pearson, Functions of a Complex Variable: Theory and
Technique
Friedrich Pukelsheim, Optimal Design of Experiments
Israel Gohberg, Peter Lancaster, and Leiba Rodman, Invariant Subspaces of Matrices with Applications
Lee A. Segel with G. H. Handelman, Mathematics Applied to Continuum Mechanics
Rajendra Bhatia, Perturbation Bounds for Matrix Eigenvalues
Barry C. Arnold, N. Balakrishnan, and H. N. Nagaraja, A First Course in Order Statistics
Charles A. Desoer and M. Vidyasagar, Feedback Systems: Input-Output Properties
Stephen L. Campbell and Carl D. Meyer, Generalized Inverses of Linear Transformations
Alexander Morgan, Solving Polynomial Systems Using Continuation for Engineering and Scientific Problems
I. Gohberg, P. Lancaster, and L. Rodman, Matrix Polynomials
Galen R. Shorack and Jon A. Wellner, Empirical Processes with Applications to Statistics
Richard W. Cottle, Jong-Shi Pang, and Richard E. Stone, The Linear Complementarity Problem
Rabi N. Bhattacharya and Edward C. Waymire, Stochastic Processes with Applications
Robert J. Adler, The Geometry of Random Fields
Mordecai Avriel, Walter E. Diewert, Siegfried Schaible, and Israel Zang, Generalized Concavity
Rabi N. Bhattacharya and R. Ranga Rao, Normal Approximation and Asymptotic Expansions
Stochastic Processes
with Applications
Rabi N. Bhattacharya
University of Arizona
Tucson, Arizona
Edward C. Waymire
Oregon State University
Corvallis, Oregon
Society for Industrial and Applied Mathematics
Philadelphia
Copyright 2009 by the Society for Industrial and Applied Mathematics
This SIAM edition is an unabridged republication of the work first published by John
Wiley & Sons (SEA) Pte. Ltd., 1992.
10 9 8 7 6 5 4 3 2 1
All rights reserved. Printed in the United States of America. No part of this book may
be reproduced, stored, or transmitted in any manner without the written permission of
the publisher. For information, write to the Society for Industrial and Applied
Mathematics, 3600 Market Street, 6th Floor, Philadelphia, PA 19104-2688 USA.
Preface
m.n, or Corollary m.n, refers to the nth such assertion in section m of the same
chapter. Exercise n, or Example n, refers to the nth Exercise, or nth Example,
of the same section. Exercise m.n (Example m.n) refers to Exercise n (Example
n) of a different section m within the same chapter. When referring to a result
or an example in a different chapter, the chapter number is always mentioned
along with the label m.n to locate it within that chapter.
This book took a long time to write. We gratefully acknowledge research
support from the National Science Foundation and the Army Research Office
during this period. Special thanks are due to Wiley editors Beatrice Shube and
Kate Roach for their encouragement and assistance in seeing this effort through.
RABI N. BHATTACHARYA
EDWARD C. WAYMIRE
Bloomington, Indiana
Corvallis, Oregon
February 1990
Sample Course Outlines
COURSE I
Beginning with the Simple Random Walk, this course leads through Brownian
Motion and Diffusion. It also contains an introduction to discrete/continuous
parameter Markov Chains and Martingales. More emphasis is placed on concepts,
principles, computations, and examples than on complete proofs and technical
details.
Chapter I                                  Chapter II          Chapter III
1-7 (+ Informal Review of Chapter 0, 4)    1-4                 1-3
13 (Up to Proposition 13.5)                5 (By examples)     5
                                           11 (Example 2)
                                           13

Chapter IV                        Chapter V                                             Chapter VI
1-7 (Quick survey by examples)    1                                                     4
                                  2 (Give transience/recurrence from Proposition 2.5)
                                  3 (Informal justification of equation (3.4) only)
                                  5-7
                                  10
                                  11 (Omit proof of Theorem 11.1)
                                  12-14
COURSE 2
The principal topics are the Functional Central Limit Theorem, Martingales,
Diffusions, and Stochastic Differential Equations. To complete proofs and for
supplementary material, the theoretical complements are an essential part of this
course.
Chapter I Chapter V Chapter VI Chapter VII
14 (Quick survey) 13 4 14
610 67
13 11
13 17
COURSE 3
This is a course on Markov Chains that also contains an introduction to
Martingales. Theoretical complements may be used only sparingly.
Denoting by $X_n$ the value of a stock at an $n$th unit of time, one may represent
its (erratic) evolution by a family of random variables $\{X_0, X_1, \ldots\}$ indexed by
the discrete-time parameter $n \in \mathbb{Z}_+$. The number $X_t$ of car accidents in a city
during the time interval $[0, t]$ gives rise to a collection of random variables
$\{X_t : t \ge 0\}$ indexed by the continuous-time parameter $t$. The velocity $X_u$ at a
point $u$ in a turbulent wind field provides a family of random variables
$\{X_u : u \in \mathbb{R}^3\}$ indexed by a multidimensional spatial parameter $u$. More generally
In the above, one may take, respectively: (i) $I = \mathbb{Z}_+$, $S = \mathbb{R}_+$; (ii) $I = [0, \infty)$,
$S = \mathbb{Z}_+$; (iii) $I = \mathbb{R}^3$, $S = \mathbb{R}^3$. For the most part we shall study stochastic
processes indexed by a one-dimensional set of real numbers (e.g., time). Here
the natural ordering of numbers coincides with the sense of evolution of the
process. This order is lost for stochastic processes indexed by a multidimensional
parameter; such processes are usually referred to as random fields. The state
space $S$ will often be a set of real numbers, finite, countable (i.e., discrete), or
uncountable. However, we also allow for the possibility of vector-valued
variables. As a matter of convenience in notation the index set is often suppressed
when the context makes it clear. In particular, we often write $\{X_n\}$ in place of
$\{X_n : n = 0, 1, 2, \ldots\}$ and $\{X_t\}$ in place of $\{X_t : t \ge 0\}$.
For a stochastic process the values of the random variables corresponding
2 RANDOM WALK AND BROWNIAN MOTION
[Figure 1.1, panels (a) and (b)]
THE SIMPLE RANDOM WALK 3
Example 1. The sample space $\Omega$ for repeated (and unending) tosses of a coin
may be represented by the sequence space consisting of sequences of the form
$\omega = (\omega_1, \omega_2, \ldots, \omega_n, \ldots)$ with $\omega_n = 1$ or $\omega_n = 0$. For this choice of $\Omega$, the value
each variable has the same (Bernoulli) distribution. These facts are summarized
by saying that $\{X_1, X_2, \ldots\}$ is a sequence of independent and identically
distributed (i.i.d.) random variables with a common Bernoulli distribution. Let
$F_n$ denote the event that the specific outcomes $\varepsilon_1, \ldots, \varepsilon_n$ occur on the first $n$
tosses respectively. Then, writing $r_n$ for the number of 1's among $\varepsilon_1, \ldots, \varepsilon_n$,
$$0 \le P(G) \le P(F_n) = p^{r_n}(1 - p)^{n - r_n} \quad \text{for each } n = 1, 2, \ldots. \tag{1.2}$$
Now apply a limiting argument to see that, for $0 < p < 1$, $P(G) = 0$. Hence the
probability of every singleton event in $\Omega$ is zero.
Definition 2.1. The stochastic process $\{S_n : n = 0, 1, 2, \ldots\}$ is called the simple
random walk. The related process $S_n^x = S_n + x$, $n = 0, 1, 2, \ldots$ is called the simple
random walk starting at $x$.
$$P(S_n^x = y) = \begin{cases} \dbinom{n}{\frac{n+y-x}{2}}\, p^{(n+y-x)/2} q^{(n-y+x)/2} & \text{if } |y - x| \le n \text{ and } y - x,\ n \text{ have the same parity,} \\ 0 & \text{otherwise.} \end{cases} \tag{2.2}$$
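Formula (2.2) can be checked numerically. The following is a minimal sketch, not part of the original text; the function names `walk_pmf` and `walk_pmf_by_propagation` are hypothetical. It compares the closed form with the distribution obtained by moving the walk one step at a time.

```python
from math import comb

def walk_pmf(n, x, y, p):
    """Closed form (2.2) for P(S_n^x = y)."""
    q = 1.0 - p
    d = y - x
    if abs(d) > n or (n + d) % 2 != 0:
        return 0.0
    k = (n + d) // 2              # number of +1 steps needed
    return comb(n, k) * p**k * q**(n - k)

def walk_pmf_by_propagation(n, x, p):
    """Distribution of S_n^x, built by propagating the mass one step at a time."""
    q = 1.0 - p
    dist = {x: 1.0}
    for _ in range(n):
        new = {}
        for pos, pr in dist.items():
            new[pos + 1] = new.get(pos + 1, 0.0) + pr * p
            new[pos - 1] = new.get(pos - 1, 0.0) + pr * q
        dist = new
    return dist
```

Positions whose parity differs from that of $n$, or that lie more than $n$ steps from $x$, receive probability zero in both computations.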
TRANSIENCE AND RECURRENCE PROPERTIES OF THE SIMPLE RANDOM WALK 5
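Let us first consider the manner in which a particle escapes from an interval.
Let $T_y^x$ denote the first time that the process starting at $x$ reaches $y$, i.e.
$$T_y^x := \min\{n \ge 0 : S_n^x = y\}.$$
To avoid trivialities, assume $0 < p < 1$. For integers $c$ and $d$ with $c < d$, denote
$$\phi(x) := P(T_d^x < T_c^x).$$
In other words, $\phi(x)$ is the probability that the particle starting at $x$ reaches $d$
before it reaches $c$. Since in one step the particle moves to $x + 1$ with probability
$p$, or to $x - 1$ with probability $q$, one has
$$\phi(x) = p\,\phi(x + 1) + q\,\phi(x - 1),$$
so that
$$\phi(x + 1) - \phi(x) = \frac{q}{p}\,[\phi(x) - \phi(x - 1)], \quad c < x < d, \qquad \phi(c) = 0, \quad \phi(d) = 1. \tag{3.4}$$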
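Thus, $\phi(x)$ is the solution to the discrete boundary-value problem (3.4). For
$p \ne q$, Eq. 3.4 yields
$$\phi(x) = \sum_{y=c}^{x-1} [\phi(y + 1) - \phi(y)] = \sum_{y=c}^{x-1} \left(\frac{q}{p}\right)^{y-c} [\phi(c + 1) - \phi(c)] = \phi(c + 1)\, \frac{1 - (q/p)^{x-c}}{1 - q/p}.$$
Taking $x = d$ gives
$$1 = \phi(d) = \phi(c + 1)\, \frac{1 - (q/p)^{d-c}}{1 - q/p}.$$
Then
$$\phi(c + 1) = \frac{1 - q/p}{1 - (q/p)^{d-c}},$$
so that
$$P(T_d^x < T_c^x) = \frac{1 - (q/p)^{x-c}}{1 - (q/p)^{d-c}} \quad \text{for } c \le x \le d,\ p \ne q. \tag{3.6}$$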
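As a sanity check on (3.6), the sketch below evaluates the formula and measures how well it satisfies the one-step relation described above (move to $x + 1$ with probability $p$, to $x - 1$ with probability $q$). It is an illustration under stated assumptions: the function names are hypothetical, and the branch for $p = q$ uses the linear limiting form $(x - c)/(d - c)$.

```python
def prob_reach_d_before_c(x, c, d, p):
    """P(T_d^x < T_c^x): formula (3.6) for p != q, linear limiting form for p = q."""
    q = 1.0 - p
    if p == q:
        return (x - c) / (d - c)
    ratio = q / p
    return (1.0 - ratio**(x - c)) / (1.0 - ratio**(d - c))

def max_one_step_residual(c, d, p):
    """Largest value of |phi(x) - p*phi(x+1) - q*phi(x-1)| over c < x < d (needs d - c >= 2)."""
    q = 1.0 - p
    phi = {x: prob_reach_d_before_c(x, c, d, p) for x in range(c, d + 1)}
    return max(abs(phi[x] - p * phi[x + 1] - q * phi[x - 1])
               for x in range(c + 1, d))
```

A residual at the level of floating-point rounding, together with the boundary values 0 at $c$ and 1 at $d$, is what one would expect if the formula solves the boundary-value problem.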
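Now let $\psi(x) := P(T_c^x < T_d^x)$. Proceeding as above, one gets
$$P(T_c^x < T_d^x) = \frac{1 - (p/q)^{d-x}}{1 - (p/q)^{d-c}} \quad \text{for } c \le x \le d,\ p \ne q. \tag{3.8}$$
Note that $\phi(x) + \psi(x) = 1$, proving that the particle starting in the interior of
$[c, d]$ will eventually reach the boundary (i.e., either $c$ or $d$) with probability
1. Now if $c < x$, then (Exercise 3)
$$P(\{S_n^x\} \text{ will ever reach } c) = P(T_c^x < \infty) = \lim_{d \to \infty} \psi(x) = \begin{cases} \left(\dfrac{q}{p}\right)^{x-c} & \text{if } p > \tfrac{1}{2}, \\ 1 & \text{if } p < \tfrac{1}{2}. \end{cases} \tag{3.9}$$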
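By symmetry, or as above,
$$P(\{S_n^x\} \text{ will ever reach } d) = P(T_d^x < \infty) = \begin{cases} 1 & \text{if } p > \tfrac{1}{2}, \\ \left(\dfrac{p}{q}\right)^{d-x} & \text{if } p < \tfrac{1}{2}. \end{cases} \tag{3.10}$$
Observe that one gets from these calculations the (geometric) distribution
function for the extremes $M^x = \sup_n S_n^x$ and $m^x = \inf_n S_n^x$ (Exercise 7).
Note that, by the strong law of large numbers (Chapter 0),
$$P\left(\frac{S_n^x}{n} = \frac{x + S_n}{n} \to p - q \text{ as } n \to \infty\right) = 1. \tag{3.11}$$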
Hence, if $p > q$, then the random walk drifts to $+\infty$ (i.e., $S_n^x \to +\infty$) with
probability 1. In particular, the process is certain to reach $d > x$ if $p > q$.
Similarly, if $p < q$, then the random walk drifts to $-\infty$ (i.e., $S_n^x \to -\infty$), and
starting at $x > c$ the process is certain to reach $c$ if $p < q$. In either case, no
matter what the integer $y$ is,
$$\frac{S_{n_k}^x}{n_k} = \frac{y}{n_k} \to 0 \quad \text{as } n_k \to \infty,$$
Definition 3.1. A state y for which Eq. 3.12 holds is called transient. If all
states are transient then the stochastic process is said to be a transient process.
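$$P(T_d^x < T_c^x) = \frac{x - c}{d - c}, \quad c \le x \le d,\ p = q = \tfrac{1}{2}. \tag{3.13}$$
Similarly,
$$P(T_c^x < T_d^x) = \frac{d - x}{d - c}, \quad c \le x \le d,\ p = q = \tfrac{1}{2}. \tag{3.14}$$
Again we have
$$P(T_d^x < \infty) = \lim_{c \to -\infty} \frac{x - c}{d - c} = 1. \tag{3.17}$$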
Thus, no matter where the particle may be initially, it will eventually reach any
given state $y$ with probability 1. After having reached $y$ for the first time, it will
move to $y + 1$ or to $y - 1$. From either of these positions the particle is again
bound to reach $y$ with probability 1, and so on. In other words (Exercise 4),
Definition 3.2. A state y for which Eq. 3.18 holds is called recurrent. If all
states are recurrent, then the stochastic process is called a recurrent process.
Consider the random variable $T_y := T_y^0$ representing the first time the simple
random walk starting at zero reaches the level (state) $y$. We will calculate the
distribution of $T_y$ by means of an analysis of the sample paths of the simple
random walk. Let $F_{N,y} = \{T_y = N\}$ denote the event that the particle reaches
state $y$ for the first time at the $N$th step. Then,
Note that "$S_N = y$" means that there are $(N + y)/2$ plus 1's and $(N - y)/2$
minus 1's among $X_1, X_2, \ldots, X_N$ (see Eq. 2.1). Therefore, we assume that
$|y| \le N$ and $N + y$ is even. Now there are as many paths leading from $(0, 0)$
to $(N, y)$ as there are ways of choosing $(N + y)/2$ plus 1's among $X_1, X_2, \ldots, X_N$,
namely
$$\binom{N}{\frac{N+y}{2}}.$$
FIRST PASSAGE TIMES FOR THE SIMPLE RANDOM WALK 9
where $L$ is the number of paths from $(0, 0)$ to $(N, y)$ that do not touch or cross
the level $y$ prior to time $N$. To calculate $L$, consider the complementary number
$L'$ of paths that do reach $y$ prior to time $N$,
$$L' = \binom{N}{\frac{N+y}{2}} - L. \tag{4.3}$$
First consider the case of $y > 0$. If a path from $(0, 0)$ to $(N, y)$ has reached
$y$ prior to time $N$, then either (a) $S_{N-1} = y + 1$ (see Figure 4.1a) or
(b) $S_{N-1} = y - 1$ and the path from $(0, 0)$ to $(N - 1, y - 1)$ has reached $y$ prior
to time $N - 1$ (see Figure 4.1b). The contribution to $L'$ from (a) is
$$\binom{N-1}{\frac{N+y}{2}}.$$
We need to calculate the contribution to $L'$ from (b).
[Figure 4.1, panels (a) and (b)]
Proposition 4.1. (A Reflection Principle). Let $y > 0$. The collection of all paths
from $(0, 0)$ to $(N - 1, y - 1)$ that touch or cross the level $y$ prior to time $N - 1$
is in one-to-one correspondence with the collection of all possible paths from
$(0, 0)$ to $(N - 1, y + 1)$.
It now follows from the reflection principle that the contribution to $L'$ from
(b) is
$$\binom{N-1}{\frac{N+y}{2}}.$$
Hence
$$L' = 2\binom{N-1}{\frac{N+y}{2}}. \tag{4.4}$$
[Figure 4.2]
MULTIDIMENSIONAL RANDOM WALKS 11
$$P(T_y = N) = \frac{|y|}{N}\binom{N}{\frac{N+y}{2}} p^{(N+y)/2} q^{(N-y)/2} \quad \text{for } N \ge y,\ y + N \text{ even},\ y > 0. \tag{4.5}$$
To calculate $P(T_y = N)$ for $y < 0$, simply relabel H as T and T as H (i.e.,
interchange $+1$, $-1$). Using this new code, the desired probability is given by
replacing $y$ by $-y$ and interchanging $p$, $q$ in (4.5), i.e.,
$$P(T_y = N) = \frac{|y|}{N}\binom{N}{\frac{N+y}{2}} q^{(N-y)/2} p^{(N+y)/2}.$$
$$P(T_y = N) = \frac{|y|}{N}\binom{N}{\frac{N+y}{2}} p^{(N+y)/2} q^{(N-y)/2} = \frac{|y|}{N}\, P(S_N = y). \tag{4.6}$$
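Relation (4.6) lends itself to a direct numerical check. The sketch below is an illustration only, with hypothetical function names. It computes the first passage probabilities by propagating the walk from 0 and removing mass as soon as it reaches $y$, then compares the result with $(|y|/N)\,P(S_N = y)$.

```python
from math import comb

def first_passage_pmf(N, y, p):
    """(|y|/N) * P(S_N = y), the right side of (4.6); zero when N, y have opposite parity."""
    if y == 0 or N < abs(y) or (N + y) % 2 != 0:
        return 0.0
    q = 1.0 - p
    k = (N + y) // 2
    return abs(y) / N * comb(N, k) * p**k * q**(N - k)

def first_passage_pmf_by_absorption(n_max, y, p):
    """P(T_y = N) for N = 1, ..., n_max, absorbing the walk when it first hits y."""
    q = 1.0 - p
    alive = {0: 1.0}                  # mass that has not yet reached y
    out = []
    for _ in range(n_max):
        new, hit = {}, 0.0
        for pos, pr in alive.items():
            for step, w in ((1, p), (-1, q)):
                nxt = pos + step
                if nxt == y:
                    hit += pr * w     # first arrival at y on this step
                else:
                    new[nxt] = new.get(nxt, 0.0) + pr * w
        alive = new
        out.append(hit)
    return out
```

The comparison covers both signs of $y$ and an asymmetric walk, which exercises the interchange of $p$ and $q$ described above.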
However, observe that the expected time to reach $y$ is infinite since by Stirling's
formula, $k! = (2\pi k)^{1/2} k^k e^{-k}(1 + o(1))$ as $k \to \infty$, the tail of the p.m.f. of $T_y$ is of
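$$E S_n^x = x, \qquad \operatorname{Cov}\bigl(S_n^{x,i}, S_n^{x,j}\bigr) = \begin{cases} \dfrac{n}{k} & \text{if } i = j, \\ 0 & \text{if } i \ne j. \end{cases} \tag{5.3}$$
Proof. The result has already been obtained for $k = 1$. In general, let $S_n = S_n^0$
and write
$$r_n = P(S_n = 0), \qquad f_n = P(S_n = 0 \text{ for the first time after time 0 at } n), \quad n \ge 1. \tag{5.4}$$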
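Let $\hat{r}(s)$ and $\hat{f}(s)$ denote the respective probability generating functions of $\{r_n\}$
and $\{f_n\}$ defined by
$$\hat{r}(s) = 1 + \sum_{n=1}^{\infty} \sum_{j=0}^{n} f_j r_{n-j}\, s^j s^{n-j} = 1 + \sum_{j=0}^{\infty} \left( \sum_{m=0}^{\infty} r_m s^m \right) f_j s^j = 1 + \hat{r}(s)\hat{f}(s). \tag{5.7}$$
Therefore,
$$\hat{r}(s) = \frac{1}{1 - \hat{f}(s)}. \tag{5.8}$$
Note that by the Monotone Convergence Theorem (Chapter 0), $\hat{r}(s) \uparrow \hat{r}(1)$ and
$\hat{f}(s) \uparrow \hat{f}(1)$ as $s \uparrow 1$. If $\hat{f}(1) < 1$, then $\hat{r}(1) = (1 - \hat{f}(1))^{-1} < \infty$. If $\hat{f}(1) = 1$,
then $\hat{r}(1) = \lim_{s \uparrow 1} (1 - \hat{f}(s))^{-1} = \infty$. Therefore, $\gamma < 1$ (i.e., 0 is transient) if
and only if $\beta := \hat{r}(1) < \infty$.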
This criterion is applied to the case k = 2 as follows. Since a return to 0 is
possible at time 2n if and only if the numbers of steps among the 2n in the
positive horizontal and vertical directions equal the respective numbers of steps
in the negative directions,
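$$r_{2n} = \sum_{j=0}^{n} \frac{(2n)!}{j!\, j!\, (n-j)!\, (n-j)!} \left(\frac{1}{4}\right)^{2n} = \frac{1}{4^{2n}} \binom{2n}{n} \sum_{j=0}^{n} \binom{n}{j}^2 = \frac{1}{4^{2n}} \binom{2n}{n}^2. \tag{5.10}$$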
The combinatorial identity used to get the last line of (5.10) follows by
considering the number of ways of selecting samples of size $n$ from a population
of $n$ objects of type 1 and $n$ objects of type 2 (Exercise 2). Apply Stirling's
formula to (5.10) to get $r_{2n} = O(1/n) > c/n$ for some $c > 0$. Therefore,
$\beta = \hat{r}(1) = +\infty$ and so 0 is recurrent in the case $k = 2$.
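The identity behind (5.10) and the $1/n$ decay can be verified with exact rational arithmetic. This is a minimal sketch with hypothetical function names: it sums over the number $j$ of steps in the positive horizontal direction and compares with the closed form.

```python
from fractions import Fraction
from math import comb, factorial

def return_prob_2d(n):
    """r_{2n} for k = 2, summed directly over j as in the first expression of (5.10)."""
    total = 0
    for j in range(n + 1):
        # multinomial count of paths with j right, j left, n-j up, n-j down steps
        total += factorial(2 * n) // (factorial(j) ** 2 * factorial(n - j) ** 2)
    return Fraction(total, 4 ** (2 * n))

def return_prob_2d_closed(n):
    """Closed form from the last expression of (5.10)."""
    return Fraction(comb(2 * n, n) ** 2, 4 ** (2 * n))
```

Multiplying $r_{2n}$ by $n$ keeps the values in a bounded band away from zero, which is the behavior that makes $\sum_n r_n$ diverge.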
In the case $k = 3$, similar considerations of "coordinate balance" give
$$r_{2n} = \frac{1}{2^{2n}} \binom{2n}{n} \sum_{j+m \le n} \left\{ \frac{n!}{j!\, m!\, (n-j-m)!}\, \frac{1}{3^n} \right\}^2. \tag{5.11}$$
Therefore, writing
$$p_{j,m} = \frac{n!}{j!\, m!\, (n-j-m)!}\, \frac{1}{3^n}$$
and noting that these are the probabilities for the trinomial distribution, we have
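that
$$r_{2n} = \frac{1}{2^{2n}} \binom{2n}{n} \sum_{j,m} (p_{j,m})^2. \tag{5.12}$$
$$r_{2n} \le \frac{1}{2^{2n}} \binom{2n}{n} \sum_{j,m} \Bigl[\max_{j,m} p_{j,m}\Bigr] p_{j,m} = \frac{1}{2^{2n}} \binom{2n}{n} \max_{j,m} p_{j,m}. \tag{5.13}$$
$$r_{2n} \le \frac{1}{2^{2n}} \binom{2n}{n} \frac{1}{3^n} \frac{n!}{[n/3]!\,[n/3]!\,[n/3]!}, \tag{5.14}$$
$$r_{2n} \le \frac{C'}{n^{3/2}} \quad \text{for some } C' > 0. \tag{5.15}$$
In particular,
$$\sum_n r_n < \infty. \tag{5.16}$$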
The general case, $r_{2n} \le c_k n^{-k/2}$ for $k > 3$, is left as an exercise (Exercise 1).
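For $k = 3$ the return probabilities in (5.11) can be evaluated directly. The sketch below (hypothetical function name, written for illustration) sums the squared trinomial coefficients and lets one observe that $n^{3/2} r_{2n}$ stays bounded, in line with (5.15).

```python
from math import comb, factorial

def return_prob_3d(n):
    """r_{2n} for k = 3 via the squared trinomial terms in (5.11)."""
    total = 0
    for j in range(n + 1):
        for m in range(n - j + 1):
            tri = factorial(n) // (factorial(j) * factorial(m) * factorial(n - j - m))
            total += tri * tri
    # (1/2^{2n}) * C(2n, n) * sum (tri / 3^n)^2
    return comb(2 * n, n) * total / (2 ** (2 * n) * 3 ** (2 * n))
```

For $n = 1$ the value is $1/6$: after a first step in any of the six directions, the walk returns at time 2 exactly when the second step reverses the first.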
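The constants appearing in the estimate (5.15) are easily computed from the
monotonicity of the ratio $n!/\{(2\pi n)^{1/2} n^n e^{-n}\}$, whose limit as $n \to \infty$ is 1
according to Stirling's formula. To see that the ratio is monotonically decreasing,
simply observe that
$$\log \frac{n!}{(2\pi n)^{1/2} n^n e^{-n}} = \log n! - \tfrac{1}{2}\log n - n \log n + n - \log (2\pi)^{1/2}$$
$$= \Bigl\{ \sum_{j=1}^{n} \log j - \tfrac{1}{2}\log n \Bigr\} - \{ n \log n - n \} - \log (2\pi)^{1/2}$$
$$= \sum_{j=2}^{n} \frac{\log(j-1) + \log j}{2} - \int_1^n \log x \, dx + 1 - \log (2\pi)^{1/2}, \tag{5.17}$$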
CANONICAL CONSTRUCTION OF STOCHASTIC PROCESSES 15
where the integral term may be checked by integration by parts. The point is
that the term defined by
$$T_n = \sum_{j=2}^{n} \frac{\log(j-1) + \log j}{2}$$
provides the inner trapezoidal approximation to the area under the curve
$y = \log x$, $1 \le x \le n$. Thus, in particular, a simple sketch shows
$$0 \le \int_1^n \log x \, dx - T_n,$$
$$1 < \frac{n!}{(2\pi n)^{1/2} n^n e^{-n}} \le \frac{e}{(2\pi)^{1/2}}, \quad n = 1, 2, \ldots. \tag{5.19}$$
Equivalently,
for all Borel sets $B_1, \ldots, B_n$ in $\mathbb{R}^1$ and $n \ge 1$. Kolmogorov's Existence Theorem
asserts that the consistency condition (6.3) is also sufficient for such a probability
measure $P$ to exist and that there is only one such $P$ on $(\mathbb{R}^\infty, \mathcal{B}^\infty) = (\Omega, \mathcal{F})$
(theoretical complement 1). This holds more generally, for example, when the
state space $S$ is $\mathbb{R}^k$, a countable set, or any Borel subset of $\mathbb{R}^k$. A proof for the
simple case of finite state processes is outlined in Exercise 3.
Since $Q(\mathbb{R}^1) = 1$, the consistency condition (6.3) follows immediately from the
definition (6.4). Now one simply invokes the Kolmogorov Existence Theorem
to get a probability measure $P$ on $(\Omega, \mathcal{F})$ such that
$$P(X_1 \in B_1, \ldots, X_n \in B_n) = Q(B_1) \cdots Q(B_n) = P(X_1 \in B_1) \cdots P(X_n \in B_n). \tag{6.5}$$
The simple random walk can be constructed within the framework of the
canonical probability space $(\Omega, \mathcal{F}, P)$ constructed for coin tossing, although
this is a noncanonical probability space for $\{S_n\}$. Alternatively, a canonical
construction can be made directly for $\{S_n\}$ (Exercise 2(i)). This, on the other
hand, provides a noncanonical probability space for the displacement
(coin-tossing) process defined by the differences $X_n = S_n - S_{n-1}$, $n \ge 1$.
(Nonnegative Definiteness)
7 BROWNIAN MOTION
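$$p = \frac{1}{2} + \frac{\mu}{2\sigma\sqrt{n}} \quad \text{and} \quad \Delta = \frac{\sigma}{\sqrt{n}}.$$
Here $\mu$ and $\sigma$ are two fixed numbers, $\sigma > 0$. Then as $n \to \infty$, the mean
displacement $t\,n\,(p - q)\Delta$ converges to $t\mu$ and the variance converges to $t\sigma^2$. In
the limit, then, the position $X_t$ of the particle at time $t > 0$ is Gaussian with
probability density function (in $y$) given by
$$p(t; x, y) = (2\pi\sigma^2 t)^{-1/2} \exp\left\{ -\frac{(y - x - t\mu)^2}{2\sigma^2 t} \right\}. \tag{7.1}$$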
If $s > 0$ then $X_{t+s} - X_t$ is the sum of displacements during the time interval
$(t, t + s]$. Therefore, by the argument above, $X_{t+s} - X_t$ is Gaussian with mean
$s\mu$ and variance $s\sigma^2$, and it is independent of $\{X_u : 0 \le u \le t\}$. In particular, for
every finite set of time points $0 < t_1 < t_2 < \cdots < t_k$, the random variables
$X_{t_1}, X_{t_2} - X_{t_1}, \ldots, X_{t_k} - X_{t_{k-1}}$ are independent. A stochastic process with this
last property is said to be a process with independent increments. This is the
continuous-time analogue of random walks. From the physical description of
the process $\{X_t\}$ as representing (a coordinate of) the path of a diffusing solute
particle, one would expect that the sample paths of the process (i.e., the
trajectories $t \to X_t(\omega) = \omega_t$) may be taken to be continuous. That this is indeed
the case is an important mathematical result originally due to Norbert Wiener.
For this reason, Brownian motion is also called the Wiener process. A complete
definition of Brownian motion goes as follows.
1. The sample space $\Omega := C[0, \infty)$ is the set of all real-valued continuous
functions on the time interval $[0, \infty)$. This is the set of all possible
trajectories (sample paths) of the process.
2. $X_t(\omega) := \omega_t$ is the value of the sample path $\omega$ at time $t$.
3. $\Omega$ is equipped with the smallest sigma-field $\mathcal{F}$ of subsets of $\Omega$
containing the class $\mathcal{F}_0$ of all finite-dimensional sets of the form
$F = \{\omega \in \Omega : a_i < \omega_{t_i} < b_i,\ i = 1, 2, \ldots, k\}$, where $a_i < b_i$ are constants
and $0 < t_1 < t_2 < \cdots < t_k$ are a finite set of time points. $\mathcal{F}$ is said to be
generated by $\mathcal{F}_0$.
For the set $F$ above, $P(F)$ can be calculated as follows. Definition (7.1) gives
the joint density of $X_{t_1}, X_{t_2} - X_{t_1}, \ldots, X_{t_k} - X_{t_{k-1}}$ as that of $k$ independent
Gaussian random variables with means $t_1\mu, (t_2 - t_1)\mu, \ldots, (t_k - t_{k-1})\mu$,
respectively, and variances $t_1\sigma^2, (t_2 - t_1)\sigma^2, \ldots, (t_k - t_{k-1})\sigma^2$, respectively.
Transforming this (product) joint density, say in variables $z_1, z_2, \ldots, z_k$, by the
change of variables $z_1 = y_1$, $z_2 = y_2 - y_1, \ldots, z_k = y_k - y_{k-1}$ and using the fact
that the Jacobian of this linear transformation is unity, one obtains
$$P(F) = \int_{a_1}^{b_1} \cdots \int_{a_k}^{b_k} \frac{1}{(2\pi\sigma^2 t_1)^{1/2}} \exp\left\{ -\frac{(y_1 - x - t_1\mu)^2}{2\sigma^2 t_1} \right\} \frac{1}{(2\pi\sigma^2 (t_2 - t_1))^{1/2}} \exp\left\{ -\frac{(y_2 - y_1 - (t_2 - t_1)\mu)^2}{2\sigma^2 (t_2 - t_1)} \right\}$$
$$\cdots \frac{1}{(2\pi\sigma^2 (t_k - t_{k-1}))^{1/2}} \exp\left\{ -\frac{(y_k - y_{k-1} - (t_k - t_{k-1})\mu)^2}{2\sigma^2 (t_k - t_{k-1})} \right\} dy_k \, dy_{k-1} \cdots dy_1. \tag{7.2}$$
The joint density of $X_{t_1}, X_{t_2}, \ldots, X_{t_k}$ is the integrand in (7.2) and may be
Define, for each value of the scale parameter $n \ge 1$, the stochastic process
$$X_t^{(n)} = \frac{S_{[nt]}}{\sqrt{n}} \quad (t \ge 0), \tag{8.2}$$
where $[nt]$ is the integer part of $nt$. Figure 8.1 plots the sample path of
$\{X_t^{(n)} : t \ge 0\}$ up to time $t = 13/n$ if the successive displacements take values
$Z_1 = -1$, $Z_2 = +1$, $Z_3 = +1$, $Z_4 = +1$, $Z_5 = -1$, $Z_6 = +1$, $Z_7 = +1$,
$Z_8 = -1$, $Z_9 = +1$, $Z_{10} = +1$, $Z_{11} = +1$, $Z_{12} = -1$.
[Figure 8.1: sample path of $\{X_t^{(n)}\}$ against $t$, with ticks at $1/n, 2/n, \ldots, 13/n$ on the time axis and multiples of $1/\sqrt{n}$ on the vertical axis; $E X_t^{(n)} = 0$, $\operatorname{Var} X_t^{(n)} = [nt]/n \simeq t$]
The process $\{S_{[nt]} : t \ge 0\}$ records the discrete-time random walk
$\{S_m : m = 0, 1, 2, \ldots\}$ on a continuous time scale whose unit is $n$ times that of
the discrete time unit, i.e., $S_m$ is plotted at time $m/n$. The process
$\{X_t^{(n)}\} = \{(1/\sqrt{n})\,S_{[nt]}\}$ also scales distance by measuring distances on a scale
whose unit is $\sqrt{n}$ times the unit of measurement used for the random walk.
This is a convenient normalization, since
$$E X_t^{(n)} = 0, \qquad \operatorname{Var} X_t^{(n)} = \frac{[nt]\sigma^2}{n} \simeq t\sigma^2 \quad \text{for large } n. \tag{8.3}$$
In a time interval $(t_1, t_2]$ the overall "displacement" $X_{t_2}^{(n)} - X_{t_1}^{(n)}$ is the sum of
a large number $[nt_2] - [nt_1] \simeq n(t_2 - t_1)$ of small i.i.d. random variables
In the case $\{Z_m\}$ is i.i.d. Bernoulli, this means reducing the step sizes of the
random variables to $\Delta = 1/\sqrt{n}$. In a physical application, looking at $\{X_t^{(n)}\}$
means the following.
1. The random walk is observed at times $t_1 < t_2 < t_3 < \cdots$ sufficiently far
apart to allow a large number of individual displacements to occur during
each of the time intervals $(t_1, t_2], (t_2, t_3], \ldots$, and
Since the sample paths of $\{X_t^{(n)}\}$ have jumps (though small for large $n$) and
are, therefore, discontinuous, it is technically more convenient to linearly
interpolate the random walk between one jump point and the next, using the
same space-time scales as used for $\{X_t^{(n)}\}$. The polygonal process $\{\tilde{X}_t^{(n)}\}$ is
formally defined by
In this way, just as for the limiting Brownian motion process, the paths of
$\{\tilde{X}_t^{(n)}\}$ are continuous. Figure 8.2 plots the path of $\{\tilde{X}_t^{(n)}\}$ corresponding to the
path of $\{X_t^{(n)}\}$ drawn in Figure 8.1. In a time interval $m/n \le t < (m + 1)/n$, $X_t^{(n)}$
is constant at level $(1/\sqrt{n})\,S_m$ while $\tilde{X}_t^{(n)}$ changes linearly from $(1/\sqrt{n})\,S_m$ at time
$t = m/n$ to
$$\frac{1}{\sqrt{n}}\,S_{m+1} = \frac{S_m}{\sqrt{n}} + \frac{Z_{m+1}}{\sqrt{n}} \quad \text{at time } t = \frac{m+1}{n}.$$
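The step process and its polygonal interpolation can be sketched in a few lines. This is an illustration under stated assumptions: the function names are hypothetical, the interpolation formula $\tilde{X}_t^{(n)} = (S_{[nt]} + (nt - [nt])\,Z_{[nt]+1})/\sqrt{n}$ is inferred from the verbal description above, and the signs of the displacements listed for Figure 8.1 are assumed. Times are passed as `Fraction` values so that the integer part $[nt]$ is computed exactly.

```python
from fractions import Fraction
from math import floor, sqrt

# Displacements from the Figure 8.1 description; the signs are an assumption.
Z = [-1, 1, 1, 1, -1, 1, 1, -1, 1, 1, 1, -1]

def partial_sum(Z, m):
    """S_m = Z_1 + ... + Z_m, with S_0 = 0."""
    return sum(Z[:m])

def step_path(Z, n, t):
    """X_t^(n) = S_[nt] / sqrt(n): constant on each interval [m/n, (m+1)/n)."""
    return partial_sum(Z, floor(n * t)) / sqrt(n)

def polygonal_path(Z, n, t):
    """Linear interpolation between (m/n, S_m/sqrt(n)) and ((m+1)/n, S_{m+1}/sqrt(n))."""
    m = floor(n * t)
    frac = float(n * t - m)              # position inside the current interval, in [0, 1)
    nxt = Z[m] if m < len(Z) else 0      # Z_{m+1}; taken as zero past the end of the data
    return (partial_sum(Z, m) + frac * nxt) / sqrt(n)
```

The two paths agree at the grid points $m/n$ and differ by at most $|Z_{m+1}|/\sqrt{n}$ in between, for example `step_path(Z, 12, Fraction(3, 12))` and `polygonal_path(Z, 12, Fraction(3, 12))` return the same value.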
[Figure 8.2: polygonal path of $\{\tilde{X}_t^{(n)}\}$ against $t$, corresponding to the path in Figure 8.1]
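THE FUNCTIONAL CENTRAL LIMIT THEOREM (FCLT)
Thus, in any given interval $[0, T]$ the maximum difference between the two
processes $\{X_t^{(n)}\}$ and $\{\tilde{X}_t^{(n)}\}$ does not exceed
$$\varepsilon_n(T) = \max_{1 \le m \le [nT]+1} \frac{|Z_m|}{\sqrt{n}}.$$
To see that the difference between $\{X_t^{(n)}\}$ and $\{\tilde{X}_t^{(n)}\}$ is negligible for large $n$,
note that for $\delta > 0$
$$P(\varepsilon_n(T) > \delta) = 1 - P\left( \frac{|Z_m|}{\sqrt{n}} \le \delta \text{ for all } m = 1, 2, \ldots, [nT] + 1 \right)$$
$$= 1 - \bigl(P(|Z_1| \le \delta\sqrt{n})\bigr)^{[nT]+1} = 1 - \bigl(1 - P(|Z_1| > \delta\sqrt{n})\bigr)^{[nT]+1}. \tag{8.5}$$
Assuming for simplicity that $E|Z_1|^3 < \infty$, Chebyshev's inequality yields
$P(|Z_1| > \delta\sqrt{n}) \le E|Z_1|^3 / (\delta^3 n^{3/2})$. Use this in (8.5) to get (Exercise 9)
$$P(\varepsilon_n(T) > \delta) \le 1 - \left( 1 - \frac{E|Z_1|^3}{\delta^3 n^{3/2}} \right)^{[nT]+1} \simeq 0$$
when $n$ is large. Here $\simeq$ indicates that the difference between the two sides
goes to zero. Thus, on any closed and bounded time interval the behaviors of
$\{\tilde{X}_t^{(n)}\}$ and $\{X_t^{(n)}\}$ are the same in the large-$n$ limit.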
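The bound obtained from (8.5) and Chebyshev's inequality is easy to tabulate. The sketch below uses a hypothetical function name and, as an assumption for illustration, a third absolute moment $E|Z_1|^3 = 1$ (which holds for $\pm 1$ steps). It shows the bound shrinking as $n$ grows with $T$ and $\delta$ fixed.

```python
from math import floor

def interpolation_gap_bound(n, T, delta, third_moment=1.0):
    """Upper bound 1 - (1 - E|Z_1|^3 / (delta^3 n^{3/2}))^([nT]+1) on P(eps_n(T) > delta)."""
    # Chebyshev-type tail bound for a single displacement, capped at 1
    tail = min(1.0, third_moment / (delta ** 3 * n ** 1.5))
    return 1.0 - (1.0 - tail) ** (floor(n * T) + 1)
```

The exponent grows like $n$ while the single-step tail decays like $n^{-3/2}$, so the product of the two, of order $n^{-1/2}$, governs the rate at which the bound tends to zero.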
Note that given any finite set of time points $0 < t_1 < t_2 < \cdots < t_k$, the joint
distribution of $(X_{t_1}^{(n)}, X_{t_2}^{(n)}, \ldots, X_{t_k}^{(n)})$ converges to the finite-dimensional
drift and diffusion coefficient $\sigma^2$. To see this, note that $X_{t_1}^{(n)}, X_{t_2}^{(n)} - X_{t_1}^{(n)}, \ldots,$
$X_{t_k}^{(n)} - X_{t_{k-1}}^{(n)}$ are independent random variables that by the classical central limit
therefore, of $\{\tilde{X}_t^{(n)}\}$) to those of the Brownian motion process $\{X_t\}$ (Exercise 1).
Roughly speaking, to establish the full convergence in distribution of $\{\tilde{X}_t^{(n)}\}$
to Brownian motion, one further looks at a finite set of time points comprising
a fine subdivision of a bounded interval $[0, T]$ and shows that the fluctuations
of the process $\{\tilde{X}_t^{(n)}\}$ on $[0, T]$ between successive points of this subdivision
are sufficiently small in probability, a property called the tightness of the process.
This control over fluctuations together with the convergence of $\{\tilde{X}_t^{(n)}\}$ evaluated
at the time points of the subdivision ensures convergence in distribution to a
continuous process whose finite-dimensional distributions are the same as those
of Brownian motion (see theoretical complements for details). Since there is no
process other than Brownian motion with continuous sample paths that has
these limiting finite-dimensional distributions, it follows that the limit must be
Brownian motion.
A precise statement of the functional central limit theorem (FCLT) is the
following.
dimensional events, e.g., the events $\{\max_{a \le t \le b} X_t > y\}$ and $\{\max_{a \le t \le b} X_t \le x\}$
pertaining to extremes of the process. More generally, if $f$ is a continuous
function on $C[0, \infty)$ then the event $\{f(\{X_t\}) \le x\}$ is also a Borel subset of
$C[0, \infty)$ (Exercise 2). With events of this type in mind, a precise meaning of
convergence in distribution (or weak convergence) of the probability measures
$P_n$ to $P$ on this infinite-dimensional space $C[0, \infty)$ is that the probability
distributions of the real-valued (one-dimensional) random variables $f(\{\tilde{X}_t^{(n)}\})$
converge (in distribution as described in Chapter 0) to the distribution of $f(\{X_t\})$
for each real-valued continuous function $f$ defined on $C[0, \infty)$. Since a number
of important infinite-dimensional events can be expressed in terms of continuous
functionals of the processes, this makes calculations of probabilities possible
by taking limits; for examples of infinite-dimensional events whose probabilities
do not converge see Exercise 9.3(iv).
Because the limiting process, namely Brownian motion, is the same for all
increments $\{Z_m\}$ as above, the limit Theorem 8.1 is also referred to as the
Invariance Principle, i.e., invariance with respect to the distribution of the
increment process.
There are two distinct types of applications of Theorem 8.1. In the first type
it is used to calculate probabilities of infinite-dimensional events associated with
Brownian motion by studying simple random walks. In the second type it
(invariance) is used to calculate asymptotics of a large variety of partial-sum
processes by studying simple random walks and Brownian motion. Several such
examples are considered in the next two sections.
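The first problem is to calculate, for a Brownian motion $\{X_t^x\}$ with drift $\mu = 0$
and diffusion coefficient $\sigma^2$, starting at $x$, the probability
$$P(\tau_c^x < \tau_d^x) = P(\{X_t^x\} \text{ reaches } c \text{ before } d) \quad (c < x < d), \tag{9.1}$$
where
$$P(\tau_c^x < \tau_d^x) = P\left( \{B_t\} \text{ reaches } \frac{c - x}{\sigma} \text{ before } \frac{d - x}{\sigma} \right) \tag{9.3}$$
complement 2)
$$P(\tau_c^x < \tau_d^x) = \lim_{n \to \infty} P\left( \{\tilde{X}_t^{(n)}\} \text{ reaches } \frac{c - x}{\sigma} \text{ before } \frac{d - x}{\sigma} \right) = \lim_{n \to \infty} P(\{S_m\} \text{ reaches } c_n \text{ before } d_n),$$
where
$$c_n = \left[ \frac{c - x}{\sigma}\sqrt{n} \right],$$
and
$$d_n = \begin{cases} \dfrac{d - x}{\sigma}\sqrt{n} & \text{if } \dfrac{d - x}{\sigma}\sqrt{n} \text{ is an integer,} \\[2mm] \left[ \dfrac{d - x}{\sigma}\sqrt{n} \right] + 1 & \text{if not.} \end{cases}$$
$$P(\tau_c^x < \tau_d^x) = \lim_{n \to \infty} \frac{d_n}{d_n - c_n} = \frac{d - x}{d - c}. \tag{9.5}$$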
Therefore,
The relations (9.8) mean that a Brownian motion with zero drift is recurrent,
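$$E X_t^{(n)} = \frac{E S_{[nt],n}}{\sqrt{n}} = \frac{[nt]}{\sqrt{n}} \cdot \frac{\mu}{\sigma\sqrt{n}} \simeq \frac{t\mu}{\sigma}, \qquad \operatorname{Var} X_t^{(n)} = \frac{[nt] \operatorname{Var} Z_{1,n}}{n} = \frac{[nt]}{n} \left( 1 - \left( \frac{\mu}{\sigma\sqrt{n}} \right)^2 \right) \simeq t, \tag{9.9}$$
$$P(\tau_c^x < \tau_d^x) = \lim_{n \to \infty} \frac{1 - (p_n/q_n)^{d_n}}{1 - (p_n/q_n)^{d_n - c_n}} = \lim_{n \to \infty} \frac{1 - \left( \dfrac{1 + \mu/(\sigma\sqrt{n})}{1 - \mu/(\sigma\sqrt{n})} \right)^{d_n}}{1 - \left( \dfrac{1 + \mu/(\sigma\sqrt{n})}{1 - \mu/(\sigma\sqrt{n})} \right)^{d_n - c_n}} = \frac{1 - \dfrac{\exp\{(d - x)\mu/\sigma^2\}}{\exp\{-(d - x)\mu/\sigma^2\}}}{1 - \dfrac{\exp\{(d - c)\mu/\sigma^2\}}{\exp\{-(d - c)\mu/\sigma^2\}}}.$$
Therefore,
$$P(\tau_c^x < \tau_d^x) = \frac{1 - \exp\{2(d - x)\mu/\sigma^2\}}{1 - \exp\{2(d - c)\mu/\sigma^2\}} \quad (c < x < d,\ \mu \ne 0). \tag{9.10}$$
$$P(\tau_c^x < \infty) = \exp\left\{ -\frac{2(x - c)\mu}{\sigma^2} \right\} \quad (c < x,\ \mu > 0), \qquad P(\tau_c^x < \infty) = 1 \quad (c < x,\ \mu \le 0). \tag{9.12}$$
FIRST PASSAGE TIME DISTRIBUTIONS FOR BROWNIAN MOTION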
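The passage from the random walk formula (3.8) to the Brownian formula (9.10) can be illustrated numerically. The following is a sketch with hypothetical function names, assuming the scaling $p_n = \tfrac{1}{2} + \mu/(2\sigma\sqrt{n})$ and the rescaled barriers $c_n$, $d_n$ of Section 9, with the walk started at 0.

```python
from math import exp, floor, sqrt

def bm_reach_c_before_d(x, c, d, mu, sigma):
    """Right side of (9.10) for mu != 0, and (d - x)/(d - c) from (9.5) for mu = 0."""
    if mu == 0:
        return (d - x) / (d - c)
    s2 = sigma * sigma
    return (1 - exp(2 * (d - x) * mu / s2)) / (1 - exp(2 * (d - c) * mu / s2))

def walk_reach_c_before_d(x, c, d, mu, sigma, n):
    """Formula (3.8) for the walk started at 0 with barriers c_n < 0 < d_n; requires mu != 0."""
    p = 0.5 + mu / (2 * sigma * sqrt(n))
    q = 1 - p
    cn = floor((c - x) * sqrt(n) / sigma)
    # The integer case of d_n is ignored here; it does not affect the limit.
    dn = floor((d - x) * sqrt(n) / sigma) + 1
    r = p / q
    return (1 - r ** dn) / (1 - r ** (dn - cn))
```

For moderate parameter values and $n$ of the order of $10^6$, the two functions return values that agree to a few decimal places, which is the convergence the limit computation asserts.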
We have seen in Section 4, relation (4.7), that for a simple symmetric random
walk starting at zero, the first passage time $T_y$ to the state $y \ne 0$ has the
distribution
$$P(T_y = N) = \frac{|y|}{N} \binom{N}{\frac{N+y}{2}} \frac{1}{2^N}, \quad N = |y|, |y| + 2, |y| + 4, \ldots. \tag{10.1}$$
Now let $\tau_z$ be the first time a standard Brownian motion starting at the
origin reaches $z$. Let $\{\tilde{X}_t^{(n)}\}$ be the polygonal process corresponding to the simple
symmetric random walk. Considering the first time $\{\tilde{X}_t^{(n)}\}$ reaches $z$, one has
$$P(\tau_z > t) = \lim_{n \to \infty} \sum_{N = [nt]+1}^{\infty} P(T_{[z\sqrt{n}]} = N) = \lim_{n \to \infty} \sum_{\substack{N = [nt]+1 \\ N - y \text{ even}}}^{\infty} \frac{|y|}{N} \binom{N}{\frac{N+y}{2}} \frac{1}{2^N} \quad (y = [z\sqrt{n}]). \tag{10.2}$$
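Now according to Stirling's formula, for large integers $M$, we can write
$$M! = (2\pi)^{1/2} e^{-M} M^{M + \frac{1}{2}} (1 + \delta_M), \tag{10.3}$$
$$\frac{|y|}{N} \binom{N}{\frac{N+y}{2}} 2^{-N} = \frac{|y|}{N} \cdot \frac{(2\pi)^{1/2} e^{-N} N^{N + \frac{1}{2}}\, 2^{-N}}{(2\pi)\, e^{-(N+y)/2} \left( \frac{N+y}{2} \right)^{(N+y)/2 + \frac{1}{2}} e^{-(N-y)/2} \left( \frac{N-y}{2} \right)^{(N-y)/2 + \frac{1}{2}}} \times (1 + o(1))$$
$$= \frac{2|y|}{(2\pi)^{1/2} N^{3/2}} \left( 1 + \frac{y}{N} \right)^{-(N+y)/2 - \frac{1}{2}} \left( 1 - \frac{y}{N} \right)^{-(N-y)/2 - \frac{1}{2}} (1 + o(1))$$
$$= \frac{2|y|}{(2\pi)^{1/2} N^{3/2}} \left( 1 + \frac{y}{N} \right)^{-(N+y)/2} \left( 1 - \frac{y}{N} \right)^{-(N-y)/2} (1 + o(1)), \tag{10.4}$$
where $o(1)$ denotes a quantity whose magnitude is bounded above by a quantity
$\varepsilon_n(t, z)$ that depends only on $n$, $t$, $z$ and which goes to zero as $n \to \infty$. Also,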
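$$\log \left[ \left( 1 + \frac{y}{N} \right)^{(N+y)/2} \left( 1 - \frac{y}{N} \right)^{(N-y)/2} \right] = \frac{N + y}{2} \left[ \frac{y}{N} - \frac{y^2}{2N^2} + O\!\left( \frac{|y|^3}{N^3} \right) \right] + \frac{N - y}{2} \left[ -\frac{y}{N} - \frac{y^2}{2N^2} + O\!\left( \frac{|y|^3}{N^3} \right) \right]$$
$$= \frac{y^2}{2N} + \theta(N, y), \tag{10.5}$$
where $|\theta(N, y)| \le n^{-1/2} c(t, z)$ and $c(t, z)$ is a constant depending only on $t$ and
$z$. Combining (10.4) and (10.5), we have
$$\frac{|y|}{N} \binom{N}{\frac{N+y}{2}} 2^{-N} = \frac{2|y|}{(2\pi)^{1/2} N^{3/2}} \exp\left\{ -\frac{y^2}{2N} \right\} (1 + o(1)) = \sqrt{\frac{2}{\pi}}\, \frac{|y|}{N^{3/2}} \exp\left\{ -\frac{y^2}{2N} \right\} (1 + o(1)), \tag{10.6}$$
where $o(1) \to 0$ as $n \to \infty$, uniformly for $N > [nt]$, $N - [z\sqrt{n}]$ even. Using this
in (10.2), one obtains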
$$P(\tau_z > t) = \sqrt{\frac{2}{\pi}} \int_0^{|z|/\sqrt{t}} e^{-v^2/2} \, dv. \tag{10.9}$$
The first passage time distribution for the more general case of Brownian
motion $\{X_t\}$ with zero drift and diffusion coefficient $\sigma^2 > 0$, starting at the
origin, is now obtained by applying (10.9) to the standard Brownian motion
$\{(1/\sigma)X_t\}$. Therefore,
$$P(\tau_z > t) = \sqrt{\frac{2}{\pi}} \int_0^{|z|/(\sigma\sqrt{t})} e^{-v^2/2} \, dv. \tag{10.10}$$
Note that for large $t$ the tail of the p.d.f. $f_z(t)$ is of the order of $t^{-3/2}$. Therefore,
although $\{X_t\}$ will reach $z$ in a finite time with probability 1, the expected time
is infinite (Exercise 11).
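The tail probability (10.10) equals $\operatorname{erf}\bigl(|z|/(\sigma\sqrt{2t})\bigr)$, which makes it easy to compare with the random walk sum built from (10.1). The sketch below is an illustration with hypothetical function names; it assumes $z > 0$ so that the integer part $[z\sqrt{n}]$ coincides with `floor`.

```python
from math import comb, erf, floor, sqrt

def bm_first_passage_tail(t, z, sigma=1.0):
    """P(tau_z > t) from (10.10), written with the error function."""
    return erf(abs(z) / (sigma * sqrt(2.0 * t)))

def walk_first_passage_tail(n, t, z):
    """P(T_y > [nt]) for the simple symmetric walk with y = [z sqrt(n)], z > 0, using (10.1)."""
    y = floor(z * sqrt(n))
    horizon = floor(n * t)
    total = 0.0
    for N in range(abs(y), horizon + 1, 2):      # N and y must have the same parity
        total += abs(y) / N * (comb(N, (N + y) // 2) / 2 ** N)
    return 1.0 - total
```

With $n = 400$, $t = 1$, $z = 1$ the walk must reach level 20 within 400 steps, and the resulting probability is already close to the Brownian value.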
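Consider now a Brownian motion $\{X_t\}$ with a nonzero drift $\mu$ and diffusion
coefficient $\sigma^2$ that starts at the origin. As in Section 9, the polygonal process
$\{\tilde{X}_t^{(n)}\}$ corresponding to the simple random walk $S_{m,n} = Z_{1,n} + \cdots + Z_{m,n}$,
$S_{0,n} = 0$, with $P(Z_{m,n} = 1) = p_n = \tfrac{1}{2} + \mu/(2\sigma\sqrt{n})$, converges in distribution to
$\{W_t = X_t/\sigma\}$, which is a Brownian motion with drift $\mu/\sigma$ and diffusion coefficient
1. On the other hand, writing $T_{y,n}$ for the first passage time of $\{S_{m,n} : m = 0, 1, \ldots\}$
to $y$, one has, by relation (4.6) of Section 4,
$$P(T_{y,n} = N) = \frac{|y|}{N} \binom{N}{\frac{N+y}{2}} p_n^{(N+y)/2} q_n^{(N-y)/2}$$
$$= \frac{|y|}{N} \binom{N}{\frac{N+y}{2}} 2^{-N} \left( 1 + \frac{\mu}{\sigma\sqrt{n}} \right)^{(N+y)/2} \left( 1 - \frac{\mu}{\sigma\sqrt{n}} \right)^{(N-y)/2}$$
$$= \frac{|y|}{N} \binom{N}{\frac{N+y}{2}} 2^{-N} \left( 1 - \frac{\mu^2}{\sigma^2 n} \right)^{N/2} \left( 1 + \frac{\mu}{\sigma\sqrt{n}} \right)^{y/2} \left( 1 - \frac{\mu}{\sigma\sqrt{n}} \right)^{-y/2}. \tag{10.12}$$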
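For $y = [w\sqrt{n}]$ for some given nonzero $w$, and $N = [nt] + 2r$ for some given
$t > 0$ and all positive integers $r$, one has
$$\left( 1 - \frac{\mu^2}{\sigma^2 n} \right)^{N/2} \left( 1 + \frac{\mu}{\sigma\sqrt{n}} \right)^{y/2} \left( 1 - \frac{\mu}{\sigma\sqrt{n}} \right)^{-y/2} = \left( 1 - \frac{\mu^2}{\sigma^2 n} \right)^{[nt]/2} \left( 1 - \frac{\mu^2}{\sigma^2 n} \right)^{r} \left( 1 + \frac{\mu}{\sigma\sqrt{n}} \right)^{y/2} \left( 1 - \frac{\mu}{\sigma\sqrt{n}} \right)^{-y/2}$$
$$= \exp\left\{ -\frac{t\mu^2}{2\sigma^2} \right\} \exp\left\{ \frac{\mu w}{2\sigma} \right\} \exp\left\{ \frac{\mu w}{2\sigma} \right\} \left( 1 - \frac{\mu^2}{\sigma^2 n} \right)^{r} (1 + o(1))$$
$$= \exp\left\{ -\frac{t\mu^2}{2\sigma^2} + \frac{\mu w}{\sigma} \right\} \exp\left\{ -\frac{r}{n} \left( \frac{\mu^2}{\sigma^2} + \varepsilon_n \right) \right\} (1 + o(1)), \tag{10.13}$$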
where $\varepsilon_n$ does not depend on $r$ and goes to zero as $n \to \infty$, and $o(1)$ represents
a term that goes to zero uniformly for all $r \ge 1$ as $n \to \infty$. The first passage
time $\tau_z$ to $z$ for $\{X_t\}$ is the same as the first passage time to $w = z/\sigma$ for the
process $\{W_t\} = \{X_t/\sigma\}$. It follows from (9.12), (9.13) that if
then there is a positive probability that the process $\{W_t\}$ will never reach $w = z/\sigma$
(i.e., $\tau_z = \infty$). On the other hand, the sum of the probabilities in (10.12) over
$N > [nt]$ only gives the probability that the random walk reaches $[w\sqrt{n}]$ in a
finite time greater than $[nt]$. By the FCLT, (10.6)-(10.8) and (10.12) and (10.13),
we have
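\[
\begin{aligned}
P(t < \tau_z < \infty) &= \frac{|w|}{(2\pi)^{1/2}}\exp\Big\{\frac{\mu w}{\sigma} - \frac{t\mu^{2}}{2\sigma^{2}}\Big\}\int_t^\infty \frac{1}{v^{3/2}}\exp\Big\{-\frac{w^{2}}{2v} - \frac{\mu^{2}}{2\sigma^{2}}(v - t)\Big\}\,dv \\
&= \frac{|w|}{(2\pi)^{1/2}}\exp\Big\{\frac{\mu w}{\sigma}\Big\}\int_t^\infty \frac{1}{v^{3/2}}\exp\Big\{-\frac{w^{2}}{2v} - \frac{\mu^{2}v}{2\sigma^{2}}\Big\}\,dv.
\end{aligned}
\]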
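Differentiating this with respect to $t$ (and changing the sign) the probability density function of $\tau_z$ is given by
\[
f_{\sigma^2,\mu}(t) = \frac{|w|}{(2\pi)^{1/2}t^{3/2}}\exp\Big\{\frac{\mu w}{\sigma} - \frac{w^{2}}{2t} - \frac{\mu^{2}t}{2\sigma^{2}}\Big\}, \qquad w = z/\sigma.
\]
Therefore,
\[
f_{\sigma^2,\mu}(t) = \frac{|z|}{(2\pi\sigma^{2})^{1/2}t^{3/2}}\exp\Big\{-\frac{1}{2\sigma^{2}t}(z - \mu t)^{2}\Big\} \qquad (t > 0). \tag{10.16}
\]
In particular, letting $p(t; 0, y)$ denote the p.d.f. (7.1) of the distribution of the position $X_t$ at time $t$, (10.16) can be expressed as (see 4.6)
\[
f_{\sigma^2,\mu}(t) = \frac{|z|}{t}\,p(t; 0, z). \tag{10.17}
\]
The integral of (10.16) over $(0, \infty)$ is less than 1 if either
(i) $\mu > 0$, $z < 0$ or
(ii) $\mu < 0$, $z > 0$.
In all other cases, (10.16) is a proper probability density function. By putting $\mu = 0$ in (10.16), one gets (10.11).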
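The relation (10.17) and the defective total mass in cases (i) and (ii) can be illustrated numerically. The sketch below makes two assumptions that are not stated in this passage: that the p.d.f. (7.1) is the Gaussian density with mean $\mu t$ and variance $\sigma^2 t$, and that the total mass in the defective case equals $\exp\{2\mu z/\sigma^2\}$, a standard fact about Brownian motion with drift. All function names are illustrative.

```python
import math

def first_passage_density(t, z, mu=0.0, sigma=1.0):
    """Density (10.16) of the first passage time tau_z at t > 0."""
    return (abs(z) / (math.sqrt(2.0 * math.pi * sigma ** 2) * t ** 1.5)
            * math.exp(-(z - mu * t) ** 2 / (2.0 * sigma ** 2 * t)))

def position_density(t, y, mu=0.0, sigma=1.0):
    """Assumed form of (7.1): Gaussian p.d.f. of X_t with mean mu*t, variance sigma^2*t."""
    return (math.exp(-(y - mu * t) ** 2 / (2.0 * sigma ** 2 * t))
            / math.sqrt(2.0 * math.pi * sigma ** 2 * t))

def total_mass(z, mu, sigma=1.0, upper=60.0, steps=60_000):
    """Midpoint-rule estimate of the integral of (10.16) over (0, upper)."""
    h = upper / steps
    return h * sum(first_passage_density((i + 0.5) * h, z, mu, sigma)
                   for i in range(steps))

if __name__ == "__main__":
    print(total_mass(1.0, 1.0))    # drift toward z: close to 1
    print(total_mass(1.0, -1.0))   # drift away from z: close to exp(-2)
```

With drift toward $z$ the numerical mass is close to 1, while with drift away from $z$ it is strictly smaller, in line with the positive probability that $z$ is never reached.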
Consider a simple symmetric random walk $\{S_n\}$ starting at zero. The problem is to calculate the distribution of the last visit to zero by $S_0, S_1, \ldots, S_n$. For this we first calculate the probability that the number of $+1$'s exceeds the number of $-1$'s until time $N$ and with a given positive value of the excess at time $N$.
\[
\begin{aligned}
P(S_1 > 0, S_2 > 0, \ldots, S_{a+b-1} > 0, S_{a+b} = b - a)
&= \Big[\binom{a+b-1}{b-1} - \binom{a+b-1}{b}\Big]\Big(\frac{1}{2}\Big)^{a+b} \\
&= \binom{a+b}{b}\frac{b-a}{a+b}\Big(\frac{1}{2}\Big)^{a+b}.
\end{aligned}
\tag{11.1}
\]
THE ARCSINE LAW 33
\[
M = \binom{a+b-1}{b-1} - \binom{a+b-1}{b},
\]
since there are altogether $\binom{a+b-1}{b-1}$ paths from $(1, 1)$ to $(a + b, b - a)$. Now a straightforward simplification yields
\[
M = \binom{a+b}{b}\frac{b-a}{a+b}.
\]
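Lemma 2. For the simple symmetric random walk starting at zero we have,
\[
P(S_1 \ne 0, S_2 \ne 0, \ldots, S_{2n} \ne 0) = P(S_{2n} = 0).
\]
Proof. By symmetry and (11.1),
\[
\begin{aligned}
P(S_1 \ne 0, S_2 \ne 0, \ldots, S_{2n} \ne 0)
&= 2\sum_{r=1}^{n} P(S_1 > 0, S_2 > 0, \ldots, S_{2n-1} > 0, S_{2n} = 2r) \\
&= 2\sum_{r=1}^{n}\Big[\binom{2n-1}{n+r-1} - \binom{2n-1}{n+r}\Big]\Big(\frac{1}{2}\Big)^{2n} \\
&= 2\binom{2n-1}{n}\Big(\frac{1}{2}\Big)^{2n} = \binom{2n}{n}\Big(\frac{1}{2}\Big)^{2n} = P(S_{2n} = 0).
\end{aligned}
\]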
34 RANDOM WALK AND BROWNIAN MOTION
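Theorem 11.1. Let $\Gamma^{(m)} = \max\{j: 0 \le j \le m, S_j = 0\}$. Then
\[
\begin{aligned}
P(\Gamma^{(2n)} = 2k) &= P(S_{2k} = 0)P(S_{2n-2k} = 0) \\
&= \binom{2k}{k}\Big(\frac{1}{2}\Big)^{2k}\binom{2n-2k}{n-k}\Big(\frac{1}{2}\Big)^{2n-2k} \\
&= \frac{(2k)!\,(2n-2k)!}{(k!)^{2}((n-k)!)^{2}}\Big(\frac{1}{2}\Big)^{2n} \qquad \text{for } k = 0, 1, 2, \ldots, n.
\end{aligned}
\tag{11.3}
\]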
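Formula (11.3) can be verified exactly for small $n$. The following sketch, with illustrative function names that are not from the text, computes the distribution of the last zero from (11.3) using exact rational arithmetic and compares it with a brute-force enumeration of all $2^{2n}$ equally likely paths.

```python
from fractions import Fraction
from itertools import product
from math import comb

def last_zero_pmf(n):
    """P(Gamma^(2n) = 2k), k = 0..n, from formula (11.3), as exact fractions."""
    return [Fraction(comb(2 * k, k) * comb(2 * n - 2 * k, n - k), 4 ** n)
            for k in range(n + 1)]

def last_zero_pmf_bruteforce(n):
    """Same distribution, by enumerating every path of length 2n."""
    counts = [0] * (n + 1)
    for steps in product((-1, 1), repeat=2 * n):
        position, last_zero = 0, 0
        for j, step in enumerate(steps, start=1):
            position += step
            if position == 0:
                last_zero = j          # zeros can occur only at even times
        counts[last_zero // 2] += 1
    return [Fraction(c, 4 ** n) for c in counts]

if __name__ == "__main__":
    print(last_zero_pmf(4))
    print(last_zero_pmf(4) == last_zero_pmf_bruteforce(4))
```

The distribution is symmetric about $k = n/2$ and puts the most mass at the two extremes $k = 0$ and $k = n$, which is the discrete counterpart of the U-shaped arcsine density.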
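Theorem 11.2. (The Arc Sine Law). Let $\{B_t\}$ be a standard Brownian motion starting at zero. Let $\gamma = \sup\{t: 0 \le t \le 1, B_t = 0\}$. Then $\gamma$ has the probability density function $f(x) = 1/\big(\pi (x(1-x))^{1/2}\big)$, $0 < x < 1$, so that
\[
P(\gamma \le x) = \int_0^x f(y)\,dy = \frac{2}{\pi}\sin^{-1}\sqrt{x}. \tag{11.6}
\]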
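\[
\begin{aligned}
\lim_{n\to\infty} P(\Gamma^{(2n)} \le 2nx)
&= \lim_{n\to\infty}\sum_{k=0}^{[nx]}\frac{(2k)!\,(2n-2k)!}{(k!)^{2}((n-k)!)^{2}}\,2^{-2n} \\
&= \lim_{n\to\infty}\sum_{k=0}^{[nx]}\frac{(2\pi)^{1/2}e^{-2k}(2k)^{2k+1/2}\,(2\pi)^{1/2}e^{-2(n-k)}(2(n-k))^{2(n-k)+1/2}\,2^{-2n}}{\big((2\pi)^{1/2}e^{-k}k^{k+1/2}\big)^{2}\big((2\pi)^{1/2}e^{-(n-k)}(n-k)^{n-k+1/2}\big)^{2}} \\
&= \lim_{n\to\infty}\sum_{k=0}^{[nx]}\frac{1}{\pi k^{1/2}(n-k)^{1/2}} \\
&= \frac{1}{\pi}\lim_{n\to\infty}\sum_{k=0}^{[nx]}\frac{1}{n}\,\frac{1}{\big(\frac{k}{n}\big(1-\frac{k}{n}\big)\big)^{1/2}}
= \frac{1}{\pi}\int_0^x\frac{1}{(y(1-y))^{1/2}}\,dy.
\end{aligned}
\]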
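The convergence in the computation above can be observed numerically. This is a minimal sketch with illustrative function names; it evaluates the exact discrete distribution (11.3) through the recursion $u_k = u_{k-1}(2k-1)/(2k)$ for $u_k = P(S_{2k} = 0)$, which avoids large factorials, and compares it with the limit $(2/\pi)\sin^{-1}\sqrt{x}$ of (11.6).

```python
import math

def discrete_arcsine_cdf(n, x):
    """P(Gamma^(2n) <= 2nx), summing (11.3) with u_k = P(S_2k = 0)."""
    u = [1.0] * (n + 1)
    for k in range(1, n + 1):
        u[k] = u[k - 1] * (2 * k - 1) / (2 * k)   # equals C(2k, k) * 2^(-2k)
    k_max = min(n, math.floor(n * x))
    return sum(u[k] * u[n - k] for k in range(k_max + 1))

def arcsine_cdf(x):
    """Limiting distribution function (11.6)."""
    return (2.0 / math.pi) * math.asin(math.sqrt(x))

if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, discrete_arcsine_cdf(n, 0.3), arcsine_cdf(0.3))
```

The gap between the two columns shrinks slowly as $n$ grows, reflecting the singularity of the density at the endpoints.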
Corollary 11.3. Let $\{Z_1, Z_2, \ldots\}$ be a sequence of i.i.d. random variables such that $EZ_1 = 0$, $EZ_1^2 = 1$. Then, defining $\{X_t^{(n)}\}$ as in (8.4) and $\gamma^{(n)}$ as above, one has
From the arcsine law of the time of the last visit to zero it is also possible to get the distribution of the length of time in $[0, 1]$ the standard Brownian motion spends on the positive side of the origin (i.e., an occupation time law), again as an arcsine distribution. This fact is recorded in the following corollary (Exercise 2).
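and
\[
\begin{aligned}
\operatorname{Cov}(B^*_{t_1}, B^*_{t_2}) &= \operatorname{Cov}(B_{t_1}, B_{t_2}) - t_2\operatorname{Cov}(B_{t_1}, B_1) - t_1\operatorname{Cov}(B_{t_2}, B_1) + t_1t_2\operatorname{Cov}(B_1, B_1) \\
&= t_1 - t_2t_1 - t_1t_2 + t_1t_2 = t_1(1 - t_2), \qquad \text{for } t_1 \le t_2.
\end{aligned}
\tag{12.3}
\]
From this one can also write down the joint normal density of $(B^*_{t_1}, B^*_{t_2}, \ldots, B^*_{t_k})$ for arbitrary $0 < t_1 < t_2 < \cdots < t_k < 1$ (Exercise 1).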
The Brownian bridge arises quite naturally in the asymptotic theory of
statistics. To explain this application, let us consider a sequence of real-valued i.i.d. random variables $Y_1, Y_2, \ldots$, having a (common) distribution function $F$. The $n$th empirical distribution is the discrete probability distribution on the line assigning a probability $1/n$ to each of the $n$ values $Y_1, Y_2, \ldots, Y_n$. The corresponding distribution function $F_n$ is called the ($n$th) empirical distribution function,
\[
F_n(t) = \frac{1}{n}\#\{j: 1 \le j \le n, Y_j \le t\}, \qquad -\infty < t < \infty, \tag{12.4}
\]
Figure 12.1 (graph of the empirical distribution function $F_n(t)$)
THE BROWNIAN BRIDGE 37
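is the sum of $n$ i.i.d. Bernoulli random variables each taking the value 1 with probability $F(t) = P(Y_j \le t)$ and the value 0 with probability $1 - F(t)$. Now $E(\mathbf{1}_{\{Y_j \le t\}}) = F(t)$ and, for $t_1 \le t_2$,
\[
\operatorname{Cov}(\mathbf{1}_{\{Y_j \le t_1\}}, \mathbf{1}_{\{Y_k \le t_2\}}) =
\begin{cases}
F(t_1)(1 - F(t_2)), & \text{if } j = k, \\
0, & \text{if } j \ne k.
\end{cases}
\tag{12.6}
\]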
is asymptotically (as $n \to \infty$) Gaussian with mean zero and variance $F(t)(1 - F(t))$. For $t_1 < t_2 < \cdots < t_k$, the multidimensional central limit theorem applied to the i.i.d. sequence of $k$-dimensional random vectors $(\mathbf{1}_{\{Y_j \le t_1\}}, \mathbf{1}_{\{Y_j \le t_2\}}, \ldots, \mathbf{1}_{\{Y_j \le t_k\}})$ shows that $(\sqrt{n}(F_n(t_1) - F(t_1)), \sqrt{n}(F_n(t_2) - F(t_2)), \ldots, \sqrt{n}(F_n(t_k) - F(t_k)))$ is asymptotically ($k$-dimensional) Gaussian with zero mean and dispersion matrix $\Sigma = ((\sigma_{ij}))$, where
In the special case of observations from the uniform distribution on [0, 1],
one has
so the sequence $U_1 = F(Y_1)$, $U_2 = F(Y_2), \ldots$ is i.i.d. uniform on $[0, 1]$. The same is true more generally (Exercise 2). Let $F_n$ be the empirical distribution function of $Y_1, \ldots, Y_n$, and $G_n$ that of $U_1, \ldots, U_n$. Then, since the proportion of $Y_k$'s, $1 \le k \le n$, that do not exceed $t$ coincides with the proportion of $U_k$'s, $1 \le k \le n$, that do not exceed $F(t)$, we have
\[
F_n(t) = G_n(F(t)).
\]
If $a = -\infty$ ($b = +\infty$), the index set $[a, b]$ for the process is to exclude $a$ ($b$). Since $\sqrt{n}(G_n(t) - t)$, $0 \le t \le 1$, converges in distribution to the Brownian bridge, and since $t \to F(t)$ is increasing on $(a, b)$, one derives the following extension of Proposition 12.1.
\[
D_n = \sup_{a \le t \le b}\sqrt{n}\,|F_n(t) - F(t)| = \sup_{a \le t \le b}\sqrt{n}\,|G_n(F(t)) - F(t)| = \sup_{0 \le t \le 1}\sqrt{n}\,|G_n(t) - t|. \tag{12.11}
\]
Thus, the distribution of $D_n$ is the same (namely that obtained under the uniform distribution) for all continuous $F$. This common distribution has been tabulated for small and moderately large values of $n$ (see theoretical complement 2). By Proposition 12.2, for large $n$, the distribution is approximately the same as that of the statistic defined by (also see theoretical complement 1)
\[
D := \sup_{0 \le t \le 1}|B^*_t|. \tag{12.12}
\]
STOPPING TIMES AND MARTINGALES 39
These facts are often used to test the statistical hypothesis that observations $Y_1, Y_2, \ldots, Y_n$ are from a specified distribution with a continuous distribution function $F$. If the observed value, say $d$, of $D_n$ is so large that the probability (approximated by (12.13) for large $n$) is very small for a value of $D_n$ as large as or larger than $d$ to occur (under the assumption that $Y_1, \ldots, Y_n$ do come from $F$), then the hypothesis is rejected.
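The test described above can be sketched in a few lines. The code below is illustrative only: the function names are not from the text, and the tail formula used is the classical Kolmogorov series $P(D > d) = 2\sum_{k \ge 1}(-1)^{k-1}e^{-2k^2d^2}$, which is assumed here to be what (12.13) refers to, since that equation is not shown in this passage.

```python
import math

def ks_statistic(sample, cdf):
    """D_n = sqrt(n) * sup_t |F_n(t) - F(t)| for a continuous cdf F, as in (12.11)."""
    ys = sorted(sample)
    n = len(ys)
    sup = 0.0
    for i, y in enumerate(ys, start=1):
        f = cdf(y)
        # F_n jumps from (i-1)/n to i/n at the i-th order statistic,
        # so the supremum is attained at one side of some jump.
        sup = max(sup, i / n - f, f - (i - 1) / n)
    return math.sqrt(n) * sup

def kolmogorov_tail(d, terms=100):
    """Assumed large-n approximation of P(D_n >= d): the classical alternating series."""
    if d <= 0:
        return 1.0
    return 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * d * d)
                     for k in range(1, terms + 1))

if __name__ == "__main__":
    data = [0.11, 0.35, 0.42, 0.58, 0.63, 0.79, 0.91]   # made-up illustration data
    d = ks_statistic(data, lambda t: min(max(t, 0.0), 1.0))  # uniform[0,1] hypothesis
    print(d, kolmogorov_tail(d))
```

A small value of the returned tail probability would lead to rejecting the hypothesized $F$; for small $n$ the tabulated exact distribution mentioned above is the better reference.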
In closing, note that by the strong law of large numbers, $F_n(t) \to F(t)$ as $n \to \infty$, with probability 1, for each $t$.
Definition 13.1. A stopping time $\tau$ for the process $\{X_n\}$ is a random variable taking nonnegative integer values, including possibly the value $+\infty$, such that
If $X_n$ does not lie in $B$ for any $n$, one takes $\tau_B = \infty$. Sometimes the minimum in (13.3) is taken over $\{n \ge 1, X_n \in B\}$, in which case we call it the first return time to $B$, denoted $\eta_B$.
A less interesting but useful example of a stopping time is a constant time,
\[
\tau := m. \tag{13.4}
\]
Again, if $X_n$ does not lie in $B$ for any $n > \tau_B^{(r-1)}$, take $\tau_B^{(r)} = \infty$. Also note that if $\tau_B^{(r)} = \infty$ for some $r$ then $\tau_B^{(r')} = \infty$ for all $r' \ge r$. It is a simple exercise to check
then
\[
ES_\tau = ES_0. \tag{13.6}
\]
Assumptions (2), (3) ensure that $ES_\tau$ is well defined and finite. Assumption (4) is of a technical nature, but cannot be dispensed with. To demonstrate this, consider a simple symmetric random walk $\{S_n\}$ starting at zero (i.e., $S_0 = 0$). Write $\tau_y$ for $\tau_{\{y\}}$, the first passage time to the state $y$, $y \ne 0$. Then (1), (2), (3) are satisfied. But
\[
ES_{\tau_y} = y \ne 0. \tag{13.13}
\]
The reason (13.6) does not hold in this case is that assumption (4) is violated (see Exercise 4). If, on the other hand,
where $a$ and $b$ are positive integers, then $P(\tau < \infty) = 1$. There are various ways of proving this last assertion. A more general result, namely, Proposition 13.4, is proved later in the section to take care of this condition. To check condition (3) of Theorem 13.1, note that $|S_\tau| \le \max\{a, b\}$, so that $E|S_\tau| \le \max\{a, b\}$. Also, on the set $\{\tau > m\}$ one has $-a < S_m < b$ and therefore
Thus condition (4) is verified. Hence the conclusion (13.6) of Theorem 13.1 holds. This means
\[
P(\tau_{-a} > \tau_b) = \frac{a}{a+b}, \tag{13.17}
\]
a result that was obtained by a different method earlier (see Chapter I, Eq. 3.13).
To deal with the case $EX_n \ne 0$, as is the case with the simple asymmetric random walk, the following corollary to Theorem 13.1 is useful.
Then
which yields (13.18). Note that EISB I < EISLI + (Er)I j! < oo, by (2') and (3'). Also
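\[
E\tau = \frac{(b+a)P(\tau_{-a} > \tau_b) - a}{p - q}, \tag{13.21}
\]
\[
P(\tau_{-a} > \tau_b) = \frac{1 - (q/p)^{a}}{1 - (q/p)^{a+b}}, \qquad
E\tau = \frac{1}{p - q}\Big[(b+a)\frac{1 - (q/p)^{a}}{1 - (q/p)^{a+b}} - a\Big]. \tag{13.22}
\]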
Assumption (2') for this case follows from Proposition 13.4 below, while (3'),
(4'), follow exactly as in the case of the simple symmetric random walk (see
Eq. 13.15).
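The exit probabilities and expected exit times obtained from the optional stopping argument can be evaluated exactly. The sketch below uses rational arithmetic; the function names are illustrative, and the symmetric-case value $E\tau = ab$ is taken from (13.41), which is derived later in the section.

```python
from fractions import Fraction

def prob_hit_b_first(a, b, p):
    """P(tau_{-a} > tau_b) for a simple random walk from 0 with P(step = +1) = p."""
    p = Fraction(p)
    q = 1 - p
    if p == q:
        return Fraction(a, a + b)                    # symmetric case, (13.17)
    r = q / p
    return (1 - r ** a) / (1 - r ** (a + b))         # asymmetric case, (13.22)

def expected_exit_time(a, b, p):
    """E tau for tau = first exit time from (-a, b), starting at 0."""
    p = Fraction(p)
    q = 1 - p
    if p == q:
        return Fraction(a * b)                       # symmetric case, (13.41)
    return ((a + b) * prob_hit_b_first(a, b, p) - a) / (p - q)   # (13.21)

if __name__ == "__main__":
    print(prob_hit_b_first(2, 3, Fraction(1, 2)), expected_exit_time(2, 3, Fraction(1, 2)))
    print(prob_hit_b_first(2, 3, Fraction(3, 5)), expected_exit_time(2, 3, Fraction(3, 5)))
```

As a consistency check, shifting the starting point to $x$ corresponds to replacing $(a, b)$ by $(a + x, b - x)$, and the resulting function $m(x) = E\tau$ satisfies the first-step equation $m(x) = 1 + p\,m(x+1) + q\,m(x-1)$ at interior points.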
In the proof of Theorem 13.1, the only property of the sequence $\{X_n\}$ that is made use of is the property
\[
E(X_{n+1} \mid \{X_0, X_1, \ldots, X_n\}) = 0 \qquad (n = 0, 1, 2, \ldots). \tag{13.23}
\]
\[
E(S_{n+1} \mid \{S_0, \ldots, S_n\}) = S_n \qquad (n = 0, 1, 2, \ldots), \tag{13.24}
\]
since
\[
X_n = S_n - S_{n-1} \quad (n = 1, 2, 3, \ldots), \qquad X_0 = S_0, \tag{13.26}
\]
satisfies (13.23).
Martingales necessarily have constant expected values. Likewise, if $\{X_n\}$ is a martingale difference sequence, then $EX_n = 0$ for each $n \ge 1$. Theorem 13.1,
Corollary 13.2, and Theorem 13.3 below, assert that this constancy of
expectations of a martingale continues to hold at appropriate stopping times.
In the gambling setting, the martingale property (13.24), or (13.23), is often
taken as the definition of a fair game, since whatever be the outcomes of the
first n plays, the expected net gain at the (n + 1)st play is zero. As an example
of a strategy for the gambler, suppose that it is decided not to stop until an
amount a is lost or an amount b is gained, whichever comes first. Under (13.23)
and conditions (2)(4) of Theorem 13.1, the expected gain at the end of the
game is still zero. This conclusion holds for more general stopping times, as
stated in Theorem 13.3 below. Before this result is stated, it would be useful to
extend the definition of a martingale somewhat. To motivate this new definition,
consider a sequence of i.i.d. random variables $\{Y_n: n = 1, 2, \ldots\}$ such that $EY_n = 0$, $EY_n^2 = \operatorname{Var}(Y_n) = 1$. Then $\{S_n^2 - n: n = 0, 1, 2, \ldots\}$ is a martingale, where $S_n = S_0 + Y_1 + \cdots + Y_n$ and $S_0$ is an arbitrary random variable independent of $\{Y_n: n = 1, 2, \ldots\}$ satisfying $ES_0^2 < \infty$.
To see this, form the difference sequence
\[
X_0 = S_0^2. \tag{13.27}
\]
Then, writing $Y_0 = S_0$,
\[
E(X_{n+1} \mid \{X_0, X_1, X_2, \ldots, X_n\})
\]
by (13.28). Thus,
implies that
In general, however, the converse is not true; namely, (13.31) does not imply
(13.30). To understand this better, consider a sequence of random variables
$\{Y_n: n = 0, 1, 2, \ldots\}$. Suppose that $\{X_n: n = 0, 1, 2, \ldots\}$ is another sequence of random variables such that, for every $n$, $X_0, X_1, X_2, \ldots, X_n$ can be expressed as functions of $Y_0, Y_1, Y_2, \ldots, Y_n$. Also assume $EX_n^2 < \infty$ for all $n$. The condition (13.31) implies that $X_{n+1}$ is orthogonal to all square integrable functions of $X_0, X_1, \ldots, X_n$, while (13.30) implies that $X_{n+1}$ is orthogonal to all square integrable functions of $Y_0, Y_1, \ldots, Y_n$ (Chapter 0, Eq. 4.20). The latter class of
functions is larger than the former class. Property (13.30) is therefore stronger
than property (13.31).
One may express (13.30) as
Note that a martingale in this sense is also a martingale in the sense of Definition
13.2, since (13.30) implies (13.31). In order to state an appropriate generalization
of Theorem 13.1 we need to extend the definition of stopping times given earlier.
Then
Let $Y_0 = 0$, $\mathcal{F}_n = \sigma\{Y_0, Y_1, \ldots, Y_n\}$. Then, by (13.27) and (13.28), the sequence
of random variables
and
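\[
E\tau = ES_\tau^2 = a^2P(\tau_{-a} < \tau_b) + b^2P(\tau_{-a} > \tau_b) = \frac{a^2b}{a+b} + \frac{b^2a}{a+b} = ab. \tag{13.41}
\]
Proof. There exists an $\varepsilon > 0$ such that either $P(X_n > \varepsilon) > 0$ or $P(X_n < -\varepsilon) > 0$. Assume first that $\delta := P(X_n > \varepsilon) > 0$. Define
\[
n_0 = \Big[\frac{a+b}{\varepsilon}\Big] + 1, \tag{13.43}
\]
where $[(a + b)/\varepsilon]$ is the integer part of $(a + b)/\varepsilon$. No matter what the starting position $x \in (-a, b)$ of the random walk may be, if $X_n > \varepsilon$ for all $n = 1, 2, \ldots, n_0$, then $S_{n_0} = x + X_1 + \cdots + X_{n_0} > x + n_0\varepsilon > x + a + b \ge b$. Therefore,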
By (13.44),
Next,
Now
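The equality in (13.49) is due to the fact that the distribution of $(X_1, X_2, \ldots, X_{n_0})$ is the same as that of $(X_{n_0+1}, X_{n_0+2}, \ldots, X_{2n_0})$. Note that $S_{n_0} \in (-a, b)$ on the set $A_1$. Hence the last probability in (13.49) is not larger than $1 - \delta_0$ by (13.46). Therefore, (13.47) yields
\[
\begin{aligned}
P(\tau > kn_0) &= E(\mathbf{1}_{A_1}\cdots\mathbf{1}_{A_k}) = E[\mathbf{1}_{A_1}\cdots\mathbf{1}_{A_{k-1}}E(\mathbf{1}_{A_k} \mid \{S_1, \ldots, S_{(k-1)n_0}\})] \\
&\le E[\mathbf{1}_{A_1}\cdots\mathbf{1}_{A_{k-1}}(1 - \delta_0)] = (1 - \delta_0)P(A_1 \cap \cdots \cap A_{k-1})
\end{aligned}
\]
\[
\begin{aligned}
Ee^{z\tau} &\le \sum_{k=1}^{\infty}\sum_{m=(k-1)n_0+1}^{kn_0} e^{|z|m}P(\tau = m)
\le \sum_{k=1}^{\infty} e^{kn_0|z|}P(\tau > (k-1)n_0) \le \sum_{k=1}^{\infty} e^{kn_0|z|}(1 - \delta_0)^{k-1} \\
&= e^{n_0|z|}\sum_{k=1}^{\infty}\big((1 - \delta_0)e^{n_0|z|}\big)^{k-1} < \infty \qquad \text{for } |z| < -\frac{1}{n_0}\log(1 - \delta_0).
\end{aligned}
\tag{13.53}
\]
One may proceed in an entirely analogous manner assuming $P(X_n < -\varepsilon) > 0$. ∎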
Thus, Proposition 13.4 has the following extension, which is useful in studying
processes other than random walks.
Proposition 13.5. The conclusion of Proposition 13.4 holds if, instead of the assumption that $\{X_n\}$ is i.i.d., (13.54) holds for a pair of positive numbers $\varepsilon$, $\delta$.
and
Proof. (a) Write $\mathcal{F}_k := \sigma\{Z_0, \ldots, Z_k\}$, the sigma-field of events determined by $Z_0, \ldots, Z_k$. Consider the events $A_0 := \{|Z_0| \ge \lambda\}$, $A_k := \{|Z_j| < \lambda \text{ for } 0 \le j < k, |Z_k| \ge \lambda\}$ $(k = 1, \ldots, n)$. The events $A_k$ are pairwise disjoint and
\[
\bigcup_{k=0}^{n} A_k = \{M_n \ge \lambda\}.
\]
Therefore,
\[
P(M_n \ge \lambda) = \sum_{k=0}^{n} P(A_k). \tag{13.58}
\]
Now $1 \le |Z_k|/\lambda$ on $A_k$. Using this and the martingale property,
Hence,
r
E( 1 AkZ^) ? E[ 1 AkE(Zk I . )] = E[E(1 Ak Zk I k)] = E( 1 AkZk). (13.63)
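\[
\begin{aligned}
EM_n^2 &= \int_\Omega M_n^2\,dP = 2\int_\Omega\Big(\int_0^{M_n}\lambda\,d\lambda\Big)dP = 2\int_\Omega\Big(\int_0^\infty \lambda\mathbf{1}_{\{M_n \ge \lambda\}}\,d\lambda\Big)dP \\
&= 2\int_0^\infty \lambda\Big(\int_\Omega \mathbf{1}_{\{M_n \ge \lambda\}}\,dP\Big)d\lambda = 2\int_0^\infty \lambda P(M_n \ge \lambda)\,d\lambda.
\end{aligned}
\tag{13.64}
\]
\[
EM_n^2 \le 2\int_0^\infty E(\mathbf{1}_{\{M_n \ge \lambda\}}|Z_n|)\,d\lambda = 2\int_\Omega |Z_n|M_n\,dP = 2E(|Z_n|M_n) \le 2(EZ_n^2)^{1/2}(EM_n^2)^{1/2}, \tag{13.65}
\]
using the Schwarz Inequality. Now divide the extreme left and right sides of (13.65) by $(EM_n^2)^{1/2}$ to get (13.57). ∎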
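The two maximal inequalities can be checked exactly on a small example. The sketch below takes the martingale to be the simple symmetric random walk $Z_k = S_k$ with $S_0 = 0$ and enumerates all $2^n$ paths. It assumes that (13.56) has the form $P(M_n \ge \lambda) \le EZ_n^2/\lambda^2$ and that (13.57) reads $EM_n^2 \le 4EZ_n^2$, with $M_n = \max_{k \le n}|Z_k|$; the function name is illustrative.

```python
from fractions import Fraction
from itertools import product

def maximal_inequality_terms(n, lam):
    """Exact values for the simple symmetric walk over all 2^n paths.

    Returns (P(M_n >= lam), E Z_n^2 / lam^2, E M_n^2, 4 * E Z_n^2).
    """
    total = 2 ** n
    hits = sum_m2 = sum_z2 = 0
    for steps in product((-1, 1), repeat=n):
        s, m = 0, 0
        for x in steps:
            s += x
            m = max(m, abs(s))       # running maximum of |S_k|
        hits += m >= lam
        sum_m2 += m * m
        sum_z2 += s * s
    ez2 = Fraction(sum_z2, total)
    return Fraction(hits, total), ez2 / Fraction(lam) ** 2, Fraction(sum_m2, total), 4 * ez2

if __name__ == "__main__":
    p, bound, em2, four_ez2 = maximal_inequality_terms(10, 4)
    print(p, "<=", bound)
    print(em2, "<=", four_ez2)
```

In this example both bounds hold with room to spare, which is typical: the constants in the maximal inequalities are chosen to work for every square integrable martingale.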
where $\mathcal{F}_t := \sigma\{X_u: 0 \le u \le t\}$ is the sigma-field of events that depend only on the ("past" of the) process up to time $t$. As in the discrete-parameter case, first passage times to finite sets are stopping times. Write for Borel sets $B \subset \mathbb{R}^1$,
\[
\tau_B := \min\{t \ge 0: X_t \in B\}, \tag{13.68}
\]
for the first passage time to the set $B$. If $B$ is a singleton, $B = \{y\}$, $\tau_B$ is written simply as $\tau_y$, the first passage time to the state $y$.
For a Brownian motion $\{X_t\}$ with drift $\mu$, the process $\{Z_t := X_t - t\mu\}$ is easily seen to be a martingale with respect to $\{\mathcal{F}_t\}$. For this $\{X_t\}$ another example is $\{Z_t := (X_t - t\mu)^2 - t\sigma^2\}$, where $\sigma^2$ is the diffusion coefficient of $\{X_t\}$.
The following is the continuous-parameter analogue of Theorem 13.6(b).
and
\[
EM_t^2 \le 4EZ_t^2. \tag{13.71}
\]
Proof. For each $n$ let $0 = t_{1,n} < t_{2,n} < \cdots < t_{n,n} = t$ be such that the sets $I_n := \{t_{j,n}: 1 \le j \le n\}$ are increasing (i.e., $I_n \subset I_{n+1}$) and $\bigcup_n I_n$ is dense in $[0, t]$. Write $M_{t,n} := \max\{|Z_{t_{j,n}}|: 1 \le j \le n\}$. By (13.56),
\[
P(M_{t,n} > \lambda) \le \frac{EZ_t^2}{\lambda^2}.
\]
Letting $n \uparrow \infty$, one obtains (13.70) as the sets $F_n := \{M_{t,n} > \lambda\}$ increase to a set that contains $\{M_t > \lambda\}$ as $n \uparrow \infty$.
Next, from (13.57),
CHAPTER APPLICATION 53
Then
It is simple to check that $\tau^{(n)}$ is a stopping time with respect to the sequence of sigma-fields $\{\mathcal{F}_{k2^{-n}}: k = 0, 1, \ldots\}$, as is $\tau^{(n)} \wedge r$ for every positive integer $r$. Since
\[
EZ_{\tau^{(n)} \wedge r} = EZ_0. \tag{13.74}
\]
Now $\tau^{(n)} \ge \tau$ for all $n$ and $\tau^{(n)} \downarrow \tau$ as $n \uparrow \infty$. Therefore, $\tau^{(n)} \wedge r \downarrow \tau \wedge r$. By the
One may apply (13.72) exactly as in the case of the simple symmetric random walk starting at zero (see (13.16) and (13.17)) to get, in the case $\mu = 0$,
for arbitrary $a > 0$, $b > 0$. Similarly, applying (13.72) to $\{Z_t := (X_t - t\mu)^2 - t\sigma^2\}$, as in the case of the simple asymmetric random walk (see (13.20)-(13.22), and use (9.11)),
\[
E\tau = \frac{(b+a)P(\tau_{-a} > \tau_b) - a}{\mu}
= \frac{(b+a)\dfrac{1 - \exp\{-2a\mu/\sigma^2\}}{1 - \exp\{-2(b+a)\mu/\sigma^2\}} - a}{\mu}. \tag{13.76}
\]
(
sediment deposition, erosion, etc.) constrain the life and capacity of a reservoir.
However, a particular design parameter analyzed extensively by hydrologists,
based on an idealization in which water usage and natural loss would occur at
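an annual rate estimated by $\bar{Y}_N$ units per year, is the (dimensionless) statistic defined by
\[
\frac{R_N}{D_N} = \frac{M_N - m_N}{D_N}, \tag{14.1}
\]
where
\[
M_N := \max\{S_n - n\bar{Y}_N: n = 0, 1, \ldots, N\}, \qquad
m_N := \min\{S_n - n\bar{Y}_N: n = 0, 1, \ldots, N\}, \tag{14.2}
\]
\[
D_N := \Big[\frac{1}{N}\sum_{n=1}^{N}(Y_n - \bar{Y}_N)^2\Big]^{1/2}, \qquad \bar{Y}_N = \frac{1}{N}S_N. \tag{14.3}
\]
First consider that, by the central limit theorem, $S_N = Nd + O(N^{1/2})$ in the sense that $(S_N - Nd)/\sqrt{N}$ is, for large $N$, distributed approximately like a Gaussian random variable with mean zero and variance $\sigma^2$. If one defines
\[
\tilde{M}_N = \max\{S_n - nd: 0 \le n \le N\}, \qquad
\tilde{m}_N = \min\{S_n - nd: 0 \le n \le N\}, \qquad
\tilde{R}_N = \tilde{M}_N - \tilde{m}_N, \tag{14.5}
\]
then by the functional central limit theorem (FCLT)
\[
\Big(\frac{\tilde{M}_N}{\sigma\sqrt{N}}, \frac{\tilde{m}_N}{\sigma\sqrt{N}}\Big) \Rightarrow (\tilde{M}, \tilde{m}), \qquad \frac{\tilde{R}_N}{\sigma\sqrt{N}} \Rightarrow \tilde{R} \qquad \text{as } N \to \infty, \tag{14.6}
\]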
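\[
\tilde{M} := \max\{B_t: 0 \le t \le 1\}, \qquad
\tilde{m} := \min\{B_t: 0 \le t \le 1\}, \qquad
\tilde{R} := \tilde{M} - \tilde{m}, \tag{14.7}
\]
\[
\frac{\tilde{R}_N}{\sqrt{N}D_N} \sim \frac{\tilde{R}_N}{\sqrt{N}\sigma}, \tag{14.9}
\]
where "$\sim$" indicates "asymptotic equality" in the sense that the ratio of the two sides goes to 1 as $N \to \infty$. This implies that the asymptotic distributions of the two sides of (14.9) are the same. Next notice that
\[
\frac{M_N}{\sigma\sqrt{N}} = \max_{0 \le n \le N}\Big\{\frac{S_n - nd}{\sigma\sqrt{N}} - \frac{n}{N}\Big(\frac{S_N - Nd}{\sigma\sqrt{N}}\Big)\Big\} \Rightarrow \max_{0 \le t \le 1}(B_t - tB_1) := M,
\]
and
\[
\frac{m_N}{\sigma\sqrt{N}} = \min_{0 \le t \le 1}\Big\{\frac{S_{[Nt]} - [Nt]d}{\sigma\sqrt{N}} - \frac{[Nt]}{N}\Big(\frac{S_N - Nd}{\sigma\sqrt{N}}\Big)\Big\} \Rightarrow \min_{0 \le t \le 1}(B_t - tB_1) := m,
\]
and
\[
\Big(\frac{M_N}{\sigma\sqrt{N}}, \frac{m_N}{\sigma\sqrt{N}}\Big) \Rightarrow (M, m). \tag{14.11}
\]
Therefore,
\[
\frac{R_N}{D_N\sqrt{N}} \sim \frac{R_N}{\sigma\sqrt{N}} \Rightarrow M - m =: R, \tag{14.12}
\]
where $R$ is a strictly positive random variable. Once again then, $R_N/D_N$, the so-called rescaled adjusted range statistic, is of the order of $O(N^{1/2})$.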
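The statistic $R_N/D_N$ is straightforward to compute from data. The following is a minimal sketch of the definitions (14.1)-(14.3); the function name is illustrative and the observations are assumed not to be all equal, so that $D_N > 0$.

```python
import math

def rescaled_adjusted_range(y):
    """R_N / D_N from (14.1)-(14.3) for observations y_1, ..., y_N."""
    n = len(y)
    mean = sum(y) / n                       # Ybar_N = S_N / N
    partial, hi, lo = 0.0, 0.0, 0.0         # the n = 0 term is 0 for both max and min
    for v in y:
        partial += v - mean                 # S_n - n * Ybar_N
        hi = max(hi, partial)
        lo = min(lo, partial)
    d = math.sqrt(sum((v - mean) ** 2 for v in y) / n)   # D_N, with the 1/N normalization
    return (hi - lo) / d

if __name__ == "__main__":
    import random
    random.seed(0)
    for size in (100, 1000, 10000):
        ys = [random.gauss(0.0, 1.0) for _ in range(size)]
        # For i.i.d. data the ratio (R_N / D_N) / sqrt(N) should stay of order 1.
        print(size, rescaled_adjusted_range(ys) / math.sqrt(size))
```

For the observations $1, 2, 3, 4$ the adjusted partial sums are $0, -1.5, -2, -1.5, 0$, so $R_N = 2$ and $D_N = \sqrt{1.25}$.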
The basic problem raised by Hurst is to identify circumstances under which one may obtain an exponent $H > \frac{1}{2}$. The next major theoretical result following Feller was again somewhat negative, though quite insightful. Specifically, P. A. P. Moran considered the case of i.i.d. random variables $Y_1, Y_2, \ldots$ having "fat tails" in their distribution. In this case the rescaling by
\[
Y_n = X_n + f(n). \tag{14.13}
\]
\[
S_n = S_n^* + \sum_{j=1}^{n} f(j), \qquad S_0 = 0, \tag{14.14}
\]
where
\[
S_n^* = X_1 + \cdots + X_n. \tag{14.15}
\]
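\[
\begin{aligned}
D_N^2 &:= \frac{1}{N}\sum_{n=1}^{N}(Y_n - \bar{Y}_N)^2 \\
&= \frac{1}{N}\sum_{n=1}^{N}(X_n - \bar{X}_N)^2 + \frac{1}{N}\sum_{n=1}^{N}(f(n) - \bar{f}_N)^2 + \frac{2}{N}\sum_{n=1}^{N}(f(n) - \bar{f}_N)(X_n - \bar{X}_N) \\
&= \tilde{D}_N^2 + \frac{1}{N}\sum_{n=1}^{N}(f(n) - \bar{f}_N)^2 + \frac{2}{N}\sum_{n=1}^{N}(f(n) - \bar{f}_N)(X_n - \bar{X}_N).
\end{aligned}
\tag{14.17}
\]
Also write
\[
m_N = \min_{0 \le n \le N}\{S_n - n\bar{Y}_N\} = \min_{0 \le n \le N}\Big\{S_n^* - n\bar{X}_N + \sum_{j=1}^{n}(f(j) - \bar{f}_N)\Big\}, \tag{14.18}
\]
\[
\mu_N(n) := \sum_{j=1}^{n}(f(j) - \bar{f}_N), \quad \mu_N(0) = 0, \qquad
\Delta_N := \max_{0 \le n \le N}\mu_N(n) - \min_{0 \le n \le N}\mu_N(n). \tag{14.19}
\]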
Observe that
and
OSn<N O<n<N
(14.21)
From (14.20) one gets $R_N \le \Delta_N + \tilde{R}_N$, and from (14.21), $R_N \ge \Delta_N - \tilde{R}_N$. In other words,
\[
|R_N - \Delta_N| \le \tilde{R}_N. \tag{14.22}
\]
The second term on the right clearly tends to zero as $N$ increases. Also, by Schwarz inequality,
\[
\Big|\frac{1}{N}\sum_{n=1}^{N}(f(n) - \alpha)(X_n - d)\Big| \le \Big(\frac{1}{N}\sum_{n=1}^{N}(f(n) - \alpha)^2\Big)^{1/2}\Big(\frac{1}{N}\sum_{n=1}^{N}(X_n - d)^2\Big)^{1/2}
\]
Theorem 14.1. If $f(n)$ converges to a finite limit, then for every $H > \frac{1}{2}$,
In particular, the Hurst effect with exponent $H > \frac{1}{2}$ holds if and only if, for some positive number $c'$,
\[
\lim_{N \to \infty}\frac{\Delta_N}{N^{H}} = c'. \tag{14.28}
\]
Example 1. Take
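First let $\beta < 0$. Then $f(n) \to \alpha$, and Theorem 14.1 applies. Recall that
\[
\Delta_N = \max_{0 \le n \le N}\mu_N(n) - \min_{0 \le n \le N}\mu_N(n), \tag{14.30}
\]
where $\mu_N(n)$ is defined in (14.19). The increment
\[
\mu_N(n) - \mu_N(n-1) = c\Big(n^{\beta} - \frac{1}{N}\sum_{j=1}^{N} j^{\beta}\Big) \tag{14.32}
\]
is positive for $n < \big(\frac{1}{N}\sum_{j=1}^{N} j^{\beta}\big)^{1/\beta}$, and negative or zero otherwise. This shows
\[
n_0 = \Big[\Big(\frac{1}{N}\sum_{j=1}^{N} j^{\beta}\Big)^{1/\beta}\Big], \tag{14.33}
\]
where $[x]$ denotes the integer part of $x$. The minimum value of $\mu_N(n)$ is zero,
\[
\Delta_N = \mu_N(n_0) = c\sum_{k=1}^{n_0}\Big(k^{\beta} - \frac{1}{N}\sum_{j=1}^{N} j^{\beta}\Big). \tag{14.34}
\]
\[
\Delta_N \sim
\begin{cases}
\dfrac{c\,n_0}{1+\beta}\big(n_0^{\beta} - N^{\beta}\big) \sim c_1N^{1+\beta}, & \beta > -1, \\[2mm]
c\,n_0\big(n_0^{-1}\log n_0 - N^{-1}\log N\big) \sim c\log N, & \beta = -1, \\[2mm]
c\displaystyle\sum_{j=1}^{\infty} j^{\beta} = c_2, & \beta < -1.
\end{cases}
\tag{14.37}
\]
CASE 1: $-\frac{1}{2} < \beta < 0$. In this case Theorem 14.1 applies with $H(\beta) = 1 + \beta > \frac{1}{2}$. Note that, by Lemma 1, $D_N \to \sigma$ with probability 1. Therefore,
CASE 2: $\beta < -\frac{1}{2}$. Use inequality (14.23), and note from (14.37) that $\Delta_N = o(N^{1/2})$. Dividing both sides of (14.23) by $D_NN^{1/2}$ one gets, in probability as $N \to \infty$,
\[
\frac{R_N}{D_NN^{1/2}} \sim \frac{\tilde{R}_N}{D_NN^{1/2}} \sim \frac{\tilde{R}_N}{\sigma N^{1/2}} \qquad \text{if } \beta < -\tfrac{1}{2}. \tag{14.39}
\]
CASE 3: $\beta = 0$. In this case the $Y_n$ are i.i.d. Therefore, as proved at the outset, the Hurst exponent is $\frac{1}{2}$.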
CASE 4: $\beta > 0$. In this case Lemma 1 does not apply, but a simple computation
yields
that
_nl S nYN
ZN( N) forn= l,2,...,N,
V ' DN
and linearly interpolated between n/N and (n + 1)/N. Then {Z N (s)} converges in
distribution to {BS + 2c,(1 ../s)}, where B'} is the Brownian bridge. In
this case the asymptotic distribution of R N /( DN) is the nondegenerate
distribution of
max B min B,
oss,i osssi
(Exercise 1).
The graph of $H(\beta)$ versus $\beta$ in Figure 14.1 summarizes the results of the preceding cases 1 through 5.
Figure 14.1
In other words, for large $N$ the plot of $\log R_N/D_N$ against $\log N$ should be approximately linear with slope $H = 1 + \beta$, if $-\frac{1}{2} < \beta < 0$.
Under the i.i.d. model one would expect to find a fluctuation between the maximum and the minimum of partial sums, centered around the sample mean, over a period $N$ to be of the order of $N^{1/2}$. One may then try to check the appropriateness of the model, i.e., the presumed i.i.d. nature of the observations, by taking successive (disjoint) blocks of $Y_n$ values each of size $N$, calculating the difference between the maximum and minimum of partial sums in each block, and seeing whether this difference is of the order of $N^{1/2}$. In this regard it is of interest that many other geophysical data sets indicative of climatic patterns have been reported to exhibit the Hurst effect.
EXERCISES
Each integer lattice site of $\mathbb{Z}^d$ is independently colored red or green with probabilities $p$ and $q = 1 - p$, respectively. Let $E_m$ be the event that the number of green sites equals the number of red sites in the block of sites of side lengths $2m$ (sites per side) with a corner at the origin. Calculate $P(E_m \text{ i.o.})$ for $d \ge 3$. [Hint: Use the Borel-Cantelli Lemma, Chapter 0, Lemma 6.1.]
5. A die is repeatedly tossed and the number of spots is recorded at each stage. Fix $j$, $1 \le j \le 6$, and let $p_n$ be the probability that $j$ occurs among the first $n$ tosses. Calculate $p_n$ and the probability that $j$ eventually occurs.
6. (A Fair Coin Simulation) Suppose that you are given a coin for which the probability of a head is $p$ where $0 < p < 1$. At each unit of time toss the coin twice and at the $n$th such double toss record:
$X_n = 1$ if a head followed by a tail occurs,
$X_n = -1$ if a tail followed by a head occurs,
$X_n = 0$ if the outcomes of the double toss coincide.
Let $\tau = \min\{n \ge 1: X_n = 1 \text{ or } -1\}$.
(i) Verify that $P(\tau < \infty) = 1$.
(ii) Calculate the distribution of $Y = X_\tau$.
(iii) Calculate $E\tau$.
7. Show that the two probability distributions for unending independent tosses of a coin, corresponding to distinct probabilities $p_1 \ne p_2$ for a head in a single toss, assign respective total probabilities to mutually disjoint subsets of the coin-tossing sample space $\Omega$. [Hint: Consider the density of 1's in the various possible sequences in $\Omega$ and use the SLLN (Chapter 0, Theorem 6.1). Such distributions are said to be mutually singular.]
8. Suppose that $M$ particles can be in each of $N$ possible states $s_1, s_2, \ldots, s_N$. Construct a probability space and calculate the distribution of $(X_1, \ldots, X_N)$, where $X_i$ is the number of particles in state $s_i$, for each of the following schemes (i)-(iii).
(i) (Maxwell-Boltzmann) The particles are distinguishable, say labeled $m_1, \ldots, m_M$, and are randomly assigned states in such a way that all possible distinct assignments are equally likely to occur. (Imagine putting balls (particles) into boxes (states).)
(ii) (Bose-Einstein) The particles are not distinguishable but are randomly assigned states in such a way that all possible values of the numbers of particles in the various states are equally likely to occur.
(iii) (Fermi-Dirac) The particles are not distinguishable but are randomly assigned states in such a way that there can be at most one particle in any one of the states and all possible values of the numbers of particles in various states under the exclusion principle are equally likely to occur.
(iv) For each of the above distributions calculate the asymptotic distribution of $X_i$ as $M$ and $N \to \infty$ such that $M/N \to \rho$, where $\rho > 0$ is the asymptotic
1/3 "  t . In particular, the probability (under f) that a randomly selected point
belongs to I", i.e., the length of 1", is P(I") = 2" '/3" '. The sets
 
first time at the nth stage. Then F 0 is well defined and has a continuous
extension to a function F on all of [0,1] with F(l) = I and F(0) = 0.]
Figure Ex.I.2 (panel shown for $n = 5$)
test is needed. If the test of the pool is positive then at least one individual has the disease and each of the $m$ persons must be retested individually, resulting in this event in $m + 1$ tests. Let $X_1, X_2, \ldots, X_N$ be an i.i.d. sequence of 0 or 1 valued random variables with $p = P(X_n = 1)$ for $n = 1, 2, \ldots$. Let the event $\{X_n = 1\}$ be used to indicate that the $n$th individual is infected; then the parameter $p$ measures the incidence of the disease in the population. Let $S_n = X_1 + X_2 + \cdots + X_n$ denote the number of infected individuals among the first $n$ individuals tested, $S_0 = 0$. Let $T_k$ denote the number of tests required for the $k$th group of $m$ individuals tested, $k = 1, 2, \ldots, [N/m]$. Thus, for $m \ge 2$, $T_k = m + 1$ if $S_{mk} - S_{m(k-1)} \ne 0$ and $T_k = 1$ if $S_{mk} - S_{m(k-1)} = 0$. The total number of tests (cost) for $N$ individuals tested in groups of size $m$ each is
\[
C_N = \sum_{k=1}^{[N/m]} T_k + (N - m[N/m]), \qquad m \ge 2.
\]
Find $m$ such that, for given $N$ (large) and $p$, the expected number of tests per person is minimal. [Hint: Consider the limit as $N \to \infty$ and show that the optimal $m$, if one exists, is the integer value of $m$ that minimizes the function
\[
c(m) = \lim_{N \to \infty}\frac{EC_N}{N} = 1 + \frac{1}{m} - (1 - p)^{m}, \quad m \ge 2, \qquad c(1) = 1.
\]
Analyze the extreme values of the function $g(x) = (1/x) - (1 - p)^{x}$ for $x > 0$ (see D. W. Turner, F. E. Tidmore, D. M. Young (1988), SIAM Review, 30, pp. 119-122).]
4. Let $\{S_n^x\}$ be the simple symmetric random walk starting at $x$ and let $u(n, y) = P(S_n^x = y)$. Verify that $u(n, y)$ satisfies the following initial value problem
\[
P(T_d^x < T_c^x) = \exp\{-\gamma_p(d - x)\}\frac{\sinh\{\gamma_p(x - c)\}}{\sinh\{\gamma_p(d - c)\}}, \qquad \text{where } \gamma_p = \ln\big((q/p)^{1/2}\big).
\]
3. Justify the use of limits in (3.9) using the continuity properties (Chapter 0, (1.1),
(1.2)) of a probability measure.
4. Verify that $P(S_n^x \ne y \text{ for all } n \ge N) = 0$ for each $N = 1, 2, \ldots$ to establish (3.18).
5. (i) If p < q and x < d, then give the symmetry argument to calculate, using (3.9),
the probability that the simple random walk starting at x will eventually reach
d in (3.10).
(ii) Verify that (3.10) may be expressed as
R. represents the number of distinct states visited by the random walk in time 0 to n.
(i) Show that E(R/n) ' ^p qi as n * oo. [Hint: Write
_ (l ifSk;S;forallj=0,1,...,k1,
Ik Sl0 otherwise.
Then
For the case P(X 1 = 0) = 0, take logarithms and apply Jensen's Inequality (Chapter
0, (2.7)) and the SLLN (Chapter 0, Section 0.6) to show log T.  00 a.s. Note the
strict inequality in Jensen's Inequality by nondegeneracy.]
15. Let $\{S_n\}$ be a simple random walk starting at 0. Show the following.
(i) If $p = \frac{1}{2}$, then $\sum_{n=0}^{\infty} P(S_n = 0)$ diverges.
(ii) If $p \ne \frac{1}{2}$, then $\sum_{n=0}^{\infty} P(S_n = 0) = (1 - 4pq)^{-1/2} = |p - q|^{-1}$. [Hint: Apply the Taylor series generalization of the Binomial theorem to $\sum_{n=0}^{\infty} z^nP(S_n = 0)$ noting that
\[
\binom{2n}{n}(pq)^{n} = (-1)^{n}(4pq)^{n}\binom{-\frac{1}{2}}{n}.]
\]
(iii) Give a proof of the transience of 0 using (ii) for $p \ne \frac{1}{2}$. [Hint: Use the Borel-Cantelli Lemma (Chapter 0).]
16. Define the backward difference operator by
(iv) Verify that if two harmonic functions agree on a, then they must coincide on
all of [c, d].
(v) Give an alternate proof of the fact that a symmetric simple random walk starting
at x in (c, d) must eventually reach the boundary based on the above
maximum/minimum principle for harmonic functions. [Hint: Verify that the sum
of two harmonic functions is harmonic and use the above ideas to determine
the minimum of the escape probability from [c, d] starting at x, c < x < d.]
17. Consider the simple random walk with $p < q$ starting at 0. Let $N_j$ denote the number of visits to $j > 0$ that occur prior to the first return to 0. Give an argument that $EN_j = (p/q)^{j}$. [Hint: The number of excursions to $j$ before returning to 0 has a geometric distribution. Condition on the first displacement.]
Let T" denote the time to reach the boundary for a simple random walk {S}
starting at x in (c, d). Let p = 2p  1, a 2 = 2p.
(i) Verify that ET" < oo. [Hint: Take x = 0. Choose N such that P(ISI > d  c).
Argue that P(T > rN) < (',)', r = 1, 2, ... using the fact that the r sums over
(jN  N, jN], j = 1, ... , r, are i.i.d. and distributed as SN.]
(ii) Show that m(x) = ETx solves the boundary value problem
aZ
2 V m+pVm= 1,
m(c) = m(d) = 0, Vm(x):= m(x) m(x 1).
(iii) Find an analytic expression for the solution to the nonhomogeneous boundary
value problem for the case p = 0. [Hint: m(x) = x 2 is a particular solution
and 1, x solve the homogeneous problem.]
(iv) Repeat (iii) for the case p 96 0. [Hint: m(x) = Iq  p 'x is a particular solution

(i) Y N
N
( I N +y 2 N =1
1 forally#0.
NIIyI
N+yeven 2
Y
N,IyI
J IyIIN
N
N
+ 1
1
p (N+y)lZ q (N y)/2 = / p\ v
f I
fort'>0
for y < 0.
N+ y even 2 \q )
3. (A Reflection Property) For the simple symmetric random walk {S} starting at 0
show that, for y > 0,
MN =max{S:n=0, 1,2,...,N},
m N =min{S:n=0, 1, 2,...,N}.
N
_ P(Tn =n)P(SN _>ba)
n=1
8. What percentage of the particles at y at time N are there for the first time in a dilute
system of many noninteracting (i.e., independent) particles each undergoing a simple
random walk starting at the origin?
*9. Suppose that the points of the state space S = 1 are painted blue with probability
Nn(P) _ 1B(Sk)
k=0
(i) Show that EN(p) = (n + 1)p. [Hint: EI B (Sk ) = E{E[I B (Sk ) I Sk ]}.]
(ii) Verify that
for p = Z
lim Var{ Nn(p) l  cap
l n ) P(I  P)
for p z.
1p  9^
10. Apply Stirling's Formula, (k! _ (27rk)'/ Z k k e k (1 + 0(l)) as k . 00), to show for the
simple symmetric random walks starting at 0 that
(i) P(T N)
'
as N ^ oo.
(2rz)1'^z N3/2
1. (i) Complete the proof of Pólya's Theorem for $k \ge 3$. (See Exercise 5 below.)
(ii) Give an alternative proof of transience for $k \ge 3$ by an application of the Borel-Cantelli Lemma Part 1 (Chapter 0, (6.1)). Why cannot Part 2 of the lemma be directly applied to prove recurrence for $k = 1, 2$?
2. Show that
\[
\sum_{k=0}^{n}\binom{n}{k}^{2} = \binom{2n}{n}.
\]
[Hint: Consider the number of ways in which $n$ balls can be selected from a box of $n$ black and $n$ white balls.]
3. (i) Show that for the 2-dimensional simple symmetric random walk, the probability of a return to $(0, 0)$ at time $2n$ is the same as that for two independent walkers, one along the horizontal and the other along the vertical, to be at $(0, 0)$ at time $2n$. Also verify this by a geometric argument based on two independent walkers with step size $1/\sqrt{2}$ and viewed along the axes rotated by 45°.
(ii) Show that relations (5.5) hold for a general random walk on the integer lattice in any dimension. Use these to compute, for the simple symmetric random walk in dimension two, the probabilities $f_j$ that the random walk returns to the origin at time $j$ for the first time for $j = 1, \ldots, 8$. Similarly compute $f_j$ in dimension three for $1 \le j \le 4$.
4. (i) Show that the method of Exercise 3(i) above does not hold in $k = 3$ dimensions.
(ii) Show that the motion of three independent simple symmetric random walkers starting at $(0, 0, 0)$ in $\mathbb{Z}^3$ is transient.
5. Show that the trinomial coefficient
n n!
( j, k, n j k j!k!(njk)!
n ^ n
( j, k,njk)_ < (J,K,nJK)
2k1 '"
P(S+1,...,S,+mE{So,...,S}`5
2k )
S0=(0,0,...,0).
Show that the configuration in which all switches are off is recurrent in the cases
k = 1, 2. The general case will follow from the methods and theory of Chapter II
when k < oo. The problem when k = cc has an interesting history: see F. Spitzer
(1976), Principles of Random Walk, Springer Verlag, New York, and references
therein.
*12. Use Exercise 11 above and the examples of random walks on Z to arrive at a
general formulation of the notion of a random walk on a group. Describe a random
walk on the unit circle in the complex plane as an illustration of your ideas.
13. Let {X"} denote a recurrent random walk on the 1dimensional integer lattice.
Show that
[Hint: Translate the problem by x and consider that starting from 0, the number
of visits to 0 before hitting x is bounded below by the number of visits to 0
before leaving the (open) interval centered at 0 of length lxi. Use monotonicity to
pass to the limit.]
for an arbitrarily prescribed sequence $\varepsilon_1, \ldots, \varepsilon_n$ of 1's and 0's. [Hint:
$\rho(\omega, \eta) = \sum_{n=1}^{\infty} |\omega_n - \eta_n|/2^n$ metrizes the product topology on $\Omega$. Consider the
open balls of radii of the form $r = 2^{-N}$ centered at sequences which are 0 from
some $n$ onward and use separability.]
(ii) Let $\{P_n\}$ be a consistent family of probability measures, with $P_n$ defined on
$(\Omega_n, \mathcal{F}_n)$, and such that $P_n$ is concentrated on $\Omega_n = \{0, 1\}^n$. Define a set function
for events of the form $F = F(\varepsilon_1, \ldots, \varepsilon_n)$ in (i), by
where $B \subset \{0, 1\}^n$, which agrees with this formula for $F \in \mathcal{F}_n$, $n \ge 1$.
(iii) Show that $\mathcal{F}_0 := \bigcup_{n=1}^{\infty} \mathcal{F}_n$ is a field of subsets of $\Omega$ but not a sigma-field.
(iv) Show that $P$ is a countably additive measure on $\mathcal{F}_0$. [Hint: $\Omega$ is compact and
the cylinder sets are both open and closed for the product topology on $\Omega$.]
(v) Show that $P$ has a unique extension to a probability measure on $\mathcal{F}$. [Hint:
Invoke the Carathéodory Extension Theorem (Chapter 0, Section 1) under (iii),
(iv).]
(vi) Show that the above arguments also apply to any finite-state discrete-parameter
stochastic process.
4. Let $(\Omega, \mathcal{F}, P)$ be a probability space and $(S, \mathcal{L})$ a measurable space. A function $X$ defined on $\Omega$
and taking values in $S$ is called measurable if $X^{-1}(B) \in \mathcal{F}$ for all $B \in \mathcal{L}$, where
$X^{-1}(B) = \{X \in B\} = \{\omega \in \Omega: X(\omega) \in B\}$. This is the meaning of an S-valued random
variable. The distribution of $X$ is the induced probability measure $Q$ on $\mathcal{L}$ defined by
$Q(B) = P(X^{-1}(B))$, $B \in \mathcal{L}$.
Let $(\Omega, \mathcal{F}, P)$ be the canonical model for nonterminating repeated tosses of a coin
and $X_n(\omega) = \omega_n$, $\omega \in \Omega$. Show that $\{X_m, X_{m+1}, \ldots\}$, $m$ an arbitrary positive integer,
is a measurable function on $(\Omega, \mathcal{F}, P)$ taking values in $(\Omega, \mathcal{F})$ with the distribution
$P$; i.e., $\{X_m, X_{m+1}, \ldots\}$ is a noncanonical model for an infinite sequence of coin
tossings.
5. Suppose that $D^{(1)}$ and $D^{(2)}$ are covariance matrices.
(i) Verify that $\alpha D^{(1)} + \beta D^{(2)}$, $\alpha, \beta \ge 0$, is a covariance matrix.
(ii) Let $\{D^{(n)} = ((\sigma_{ij}^{(n)}))\}$ be a sequence of covariance matrices ($k \times k$) such that
$\lim_n \sigma_{ij}^{(n)} = \sigma_{ij}$ exists. Show that $D = ((\sigma_{ij}))$ is a covariance matrix.
*6. Let $\varphi(t) = \int e^{itx}\,\mu(dx)$ be the Fourier transform of a positive finite measure $\mu$.
(Chapter 0, (8.46)).
(i) Show that $((\varphi(t_i - t_j)))$ is a nonnegative definite matrix for any $t_1 < \cdots < t_k$.
(ii) Show that $\varphi(t) = e^{-|t|}$ if $\mu$ is the Cauchy distribution.
*7. (Pólya Criterion for Characteristic Functions) Suppose that $\varphi$ is a real-valued
nonnegative function on $(-\infty, \infty)$ with $\varphi(-t) = \varphi(t)$ and $\varphi(0) = 1$. Show that if $\varphi$
is continuous and convex on $[0, \infty)$, then $\varphi$ is the Fourier transform (characteristic
function) of a probability distribution (in particular, for any $t_1 < t_2 < \cdots < t_k$, $k \ge 1$,
$((\varphi(t_i - t_j)))$ is nonnegative definite by Exercise 6), via the following steps.
(i) Check that
$$\gamma(t) = \begin{cases} 1 - |t|, & |t| \le 1,\ t \in \mathbb{R}^1, \\ 0, & |t| > 1 \end{cases}$$
Y( . ) (X,  mnt)

t
4. Let $\{X_t\}$ be a Brownian motion starting at 0 with diffusion coefficient $\sigma^2 > 0$ and
zero drift.
(i) Show that the process has the following scaling property. For each $\lambda > 0$ the
process $\{Y_t\}$ defined by $Y_t = \lambda^{-1/2} X_{\lambda t}$ is distributed exactly as the process $\{X_t\}$.
(ii) How does (i) extend to k-dimensional Brownian motion?
5. Let $\{X_t\}$ be a stochastic process which has stationary and independent increments.
(i) Show that the distribution of the increments must be infinitely divisible; i.e., for
each integer $n$, the distribution of $X_t - X_s$ ($s < t$) can be expressed as an n-fold
convolution of a probability measure $\mu_n$.
(ii) Suppose that the increment $X_t - X_s$ has the Cauchy distribution with p.d.f.
$(t - s)/\pi[(t - s)^2 + x^2]$ for $s < t$, $x \in \mathbb{R}^1$. Show that the Cauchy process so
described is invariant under the rescaling $\{Y_t\}$ where $Y_t = \lambda^{-1} X_{\lambda t}$ for $\lambda > 0$; i.e.,
$\{Y_t\}$ has the same distribution as $\{X_t\}$. (This process can be constructed by
methods of theoretical complements 1, 2 to Section IV.1.)
6. Let $\{X_t\}$ be a Brownian motion starting at 0 with zero drift and diffusion coefficient
$\sigma^2 > 0$. Define $Y_t = |X_t|$, $t \ge 0$.
(i) Calculate $EY_t$, $\operatorname{Var} Y_t$.
(ii) Is $\{Y_t\}$ a process with independent increments?
7. Let $R_t = X_t^2$, where $\{X_t\}$ is a Brownian motion starting at 0 with zero drift and
diffusion coefficient $\sigma^2 > 0$. Calculate the distribution of $R_t$.
8. Let $\{B_t\}$ be a standard Brownian motion starting at 0. Define
11. Let $\{X_t\}$ be any mean zero Gaussian process. Let $t_1 < t_2 < \cdots < t_n$.
(i) Show that the characteristic function of $(X_{t_1}, \ldots, X_{t_n})$ is of the form $e^{-\frac{1}{2}Q(\xi)}$ for
some quadratic form $Q(\xi) = \langle A\xi, \xi\rangle$.
(ii) Establish the pair-correlation decomposition formula for block correlations:
$$E\{X_{t_1}X_{t_2}\cdots X_{t_n}\} = \begin{cases} 0 & \text{if } n \text{ is odd}, \\ \sum^{*} E\{X_{t_i}X_{t_j}\}\cdots E\{X_{t_m}X_{t_k}\} & \text{if } n \text{ is even}, \end{cases}$$
where $\sum^{*}$ denotes the sum taken over all possible decompositions into all possible
disjoint pairs $\{t_i, t_j\}, \ldots, \{t_m, t_k\}$ obtained from $\{t_1, \ldots, t_n\}$. [Hint: Use induction
on derivatives of the (multivariate) characteristic function at $(0, 0, \ldots, 0)$ by
first observing that $\partial e^{-\frac{1}{2}Q(\xi)}/\partial\xi_i = -x_i e^{-\frac{1}{2}Q(\xi)}$ and $\partial x_i/\partial\xi_j = a_{ij}$, where
$x_i = \sum_j a_{ij}\xi_j$ and $A = ((a_{ij}))$.]
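The pair-decomposition in Exercise 11(ii) can be explored numerically. The following is a minimal sketch, not part of the text: the helper names `pairings` and `gaussian_product_moment` are hypothetical, and the covariance matrix is supplied as a plain nested list indexed by the time points.

```python
def pairings(indices):
    """Yield every decomposition of the list `indices` into disjoint unordered pairs."""
    if not indices:
        yield []
        return
    first, rest = indices[0], indices[1:]
    for pos, partner in enumerate(rest):
        remaining = rest[:pos] + rest[pos + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


def gaussian_product_moment(cov, indices):
    """Sum over pairings of products of covariances; zero when the number of factors is odd."""
    if len(indices) % 2 == 1:
        return 0.0
    total = 0.0
    for pairing in pairings(list(indices)):
        term = 1.0
        for i, j in pairing:
            term *= cov[i][j]
        total += term
    return total
```

For four factors the sum has three terms, and repeating an index four times returns three times the squared variance, which matches the familiar fourth moment of a mean-zero Gaussian variable.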
Figure Ex.I.8
*4. Give an example to demonstrate that it is not the case that the FCLT gives
convergence of probabilities of all infinite-dimensional events in $C[0, \infty)$. [Hint:
The polygonal process has finite total variation over $0 \le t \le 1$ with probability 1.
Compare with Exercise 7.8.]
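5. Verify that the probability density function $p(t; x, y)$ of the position at time $t$ of the
Brownian motion starting at $x$ with drift $\mu$ and diffusion coefficient $\sigma^2$ solves the
so-called Fokker-Planck equation (for fixed $x$) given by
$$\frac{\partial p}{\partial t} = \frac{1}{2}\sigma^2\frac{\partial^2 p}{\partial y^2} - \mu\frac{\partial p}{\partial y}.$$
(i) Check that for fixed $y$, $p$ also satisfies the adjoint equation
$$\frac{\partial p}{\partial t} = \frac{1}{2}\sigma^2\frac{\partial^2 p}{\partial x^2} + \mu\frac{\partial p}{\partial x}.$$
$$\frac{\partial c}{\partial t} = \frac{1}{2}\sigma^2\frac{\partial^2 c}{\partial y^2} - \mu\frac{\partial c}{\partial y}, \qquad c(0, y) = c_0(y).$$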
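A finite-difference check can make the two equations of Exercise 5 concrete. The sketch below assumes the Gaussian transition density of Brownian motion with drift `mu` and diffusion coefficient `sigma2`; the function names and the step size `h` are illustrative choices, not taken from the text.

```python
import math


def density(t, x, y, mu, sigma2):
    """Gaussian density in y with mean x + mu*t and variance sigma2*t."""
    return math.exp(-(y - x - mu * t) ** 2 / (2 * sigma2 * t)) / math.sqrt(2 * math.pi * sigma2 * t)


def forward_residual(t, x, y, mu, sigma2, h=1e-3):
    """dp/dt - (sigma2/2) d2p/dy2 + mu dp/dy, approximated by central differences."""
    p = lambda tt, yy: density(tt, x, yy, mu, sigma2)
    dt = (p(t + h, y) - p(t - h, y)) / (2 * h)
    dy = (p(t, y + h) - p(t, y - h)) / (2 * h)
    dyy = (p(t, y + h) - 2 * p(t, y) + p(t, y - h)) / h ** 2
    return dt - 0.5 * sigma2 * dyy + mu * dy


def backward_residual(t, x, y, mu, sigma2, h=1e-3):
    """dp/dt - (sigma2/2) d2p/dx2 - mu dp/dx, approximated by central differences."""
    p = lambda tt, xx: density(tt, xx, y, mu, sigma2)
    dt = (p(t + h, x) - p(t - h, x)) / (2 * h)
    dx = (p(t, x + h) - p(t, x - h)) / (2 * h)
    dxx = (p(t, x + h) - 2 * p(t, x) + p(t, x - h)) / h ** 2
    return dt - 0.5 * sigma2 * dxx - mu * dx
```

Both residuals should be close to zero up to discretization error; this is a sanity check only and does not replace the direct differentiation the exercise asks for.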
6. (Collective Risk in Actuary Science) Suppose that an insurance company has an
initial reserve (total assets) of $X_0 > 0$ units. Policy holders are charged a (gross)
risk premium rate $\alpha$ per unit time and claims are made at an average rate $\lambda$. The
average claim amount is $\mu$ with variance $\sigma^2$. Discuss modeling the risk reserve
process $\{X_t\}$ as a Brownian motion starting at $x$ with drift coefficient of the form
$\alpha - \mu\lambda$ and diffusion coefficient $\lambda\sigma^2$, on some scale.
7. (Law of Proportionate Effect) A material (e.g., pavement) is subject to a succession
of random impacts or loads in the form of positive random variables $L_1, L_2, \ldots$
(e.g., traffic). It is assumed that the (measure of) material strength $T_k$ after the kth
impact is proportional to the strength $T_{k-1}$ at the preceding stage through the
applied load $L_k$, $k = 1, 2, \ldots$, i.e., $T_k = L_k T_{k-1}$. Assume an initial strength $T_0 = 1$
as normalization, and that $E(\log L_1)^2 < \infty$. Describe conditions under which it is
appropriate to consider the geometric Brownian motion defined by $\{\exp(\mu t + \sigma^2 B_t)\}$,
where $\{B_t\}$ is standard Brownian motion, as a model for the strength process.
8. Let $X_1, X_2, \ldots$ be i.i.d. random variables with $EX_n = 0$, $\operatorname{Var} X_n = \sigma^2 > 0$. Let
$S_n = X_1 + \cdots + X_n$, $n \ge 1$, $S_0 = 0$. Express the limiting distribution of each of the
random variables defined below in terms of the distribution of the appropriate
random variable associated with Brownian motion having drift 0 and diffusion
coefficient $\sigma^2 > 0$.
(i) Fix $\theta > 0$, $Y_n = n^{-\theta/2}\max\{|S_k|^{\theta}: 1 \le k \le n\}$.
(ii) $Y_n = n^{-1/2} S_n$.
(iii) $Y_n = n^{-3/2}\sum_{k=1}^{n} S_k$. [Hint: Consider the integral of $t \to S_{[nt]}$, $0 \le t \le 1$.]
9. (i) Write $R_n(x) = |(1 + x/n)^n - e^x|$. Show that
+ I x en+
R(x) 1 1 1 _ r = 1_ ^x r t e^xl
n n Jj r! (n + 1)!
(ii) Use (i) to prove (8.6). [Hint: Use Taylor's theorem for the inequality, and
Lebesgue's Dominated Convergence Theorem (Chapter 0, Section 0.3).]
1. (i) Use the SLLN to show that the Brownian motion with nonzero drift is transient.
(ii) Extend (i) to the k-dimensional Brownian motion with drift.
2. Let $X_t = X_0 + vt$, $t \ge 0$, where $v$ is a nonrandom constant-rate parameter and $X_0$
is a random variable.
(i) Calculate the conditional distribution of $X_t$, given $X_s = x$, for $s < t$.
(ii) Show that all states are transient if $v \ne 0$.
(iii) Calculate the distribution of $X_t$ if the initial state is normally distributed with
mean $\mu$ and variance $\sigma^2$.
3. Let $\{X_t\}$ be a Brownian motion starting at 0 with diffusion coefficient $\sigma^2 > 0$ and
zero drift.
(i) Define $\{Y_t\}$ by $Y_t = tX_{1/t}$ for $t > 0$ and $Y_0 = 0$. Show that $\{Y_t\}$ is distributed as
Brownian motion starting at 0. [Hint: Use the law of large numbers to prove
sample path continuity at $t = 0$.]
(ii) Show that $\{X_t\}$ has infinitely many zeros in every neighborhood of $t = 0$ with
probability 1.
(iii) Show that the probability that $t \to X_t$ has a right-hand derivative at $t = 0$ is zero.
(iv) Use (iii) to provide another example to Exercise 8.4.
4. Show that the distribution of $\min_{t \ge 0} X_t$ is exponential if $\{X_t\}$ is Brownian motion
starting at 0 with drift $\mu > 0$. Likewise, calculate the distribution of $\max_{t \ge 0} X_t$ when
$\mu < 0$.
*5. Let $\{S_n\}$ denote the simple symmetric random walk starting at 0, and let
$$m_n = \min_{0 \le k \le n} S_k, \qquad M_n = \max_{0 \le k \le n} S_k, \qquad n = 1, 2, \ldots.$$
Let $\{B_t\}$ denote a standard Brownian motion and let $m = \min_{0 \le t \le 1} B_t$,
$M = \max_{0 \le t \le 1} B_t$. Then, by the FCLT, $n^{-1/2}(m_n, M_n, S_n)$ converges in distribution
to $(m, M, B_1)$; for rigorous justification use theoretical complements 1.8, 1.9 noting
that the functional $\omega \to (\min_{0 \le t \le 1}\omega_t, \max_{0 \le t \le 1}\omega_t, \omega_1)$ is a continuous map of the
metric space $C[0, 1]$ into $\mathbb{R}^3$. For notational convenience, let
for integers $u, v, y$ such that $u \le 0 \le v$, $u < v$ and $u \le y \le v$. Also let
$\Phi(a, b) = P(a < Z < b)$, where $Z$ has the standard normal distribution. The following
use of the reflection principle is taken from an exercise in P. Billingsley (1968),
Convergence of Probability Measures, Wiley, New York, p. 86. These results for
Brownian motion are also obtained by other methods in Chapter V.
(i) P(u, v, Y) = P,(Y) n(v, Y) it(u, Y) + n(v, u, y) + n(u, v, Y) t(v, u, v, Y)
n(u, v, u, y) + , where for any fixed sequence of nonnegative integers
Y1, Y2, , Yk, y, k, n, zc(y 1 , Y2, ... , Yk, y) denotes the probability that an nstep
random walk meets y, (at least once), then meets Y 2 , then meets y 3 , ... ,
then meets (1) k 'y k , and ends at y.
( 11 ) R(Y 1, Y2, ... , Yk, Y) = pn( 2 Y 1 + 2Y 2 + ... + 2y k  1 + (1)k + 1 y) if (1)k + 'Y > Yk,
n(Y1, Y2, ... , Yk, Y) = Pn( 2 Y1 + 2Y 2 + ... + 2y k _ (_ 1)k+ l y) if (1)'y < Yk
[Hint: Use Exercise 4.4(ii), the reflection principle, and induction on k. Reflect
through (1)k y k _, the part of the path to the right of the first passage through
that point following successive passages through y1, Y2, ... , ( 1 ) k1 Yk2.]
Y P(2vy 2 +2k(vu)<S<2vy,+2k(vu)).
k=w
(vii) P(u <m < M < v) _ (1)kD(u + 2k(v u), v + 2k(v u)).
k=m
* 1. (i) Show that (10.2) holds at each point z _< 0 (> 0) of continuity of the distribution
function for min o , X. (max o <,,, X,). [Hint: These latter functionals are
continuous.]
(ii) Use (i) and (10.9) to assert (10.2) for all ^.
2. Calculate the probability that a Brownian motion with drift it and diffusion coefficient
a 2 > 0 starting at x will reach y : x in time t or less.
3. Suppose that solute particles are undergoing Brownian motion in the horizontal
direction in a semi-infinite tube whose left end acts as an absorbing boundary in
the sense that when a particle reaches the left end it is taken out of the flow. Assume
that initially a proportion $\psi(x)\,dx$ of the particles are present in the element of
volume between $x$ and $x + dx$ from the left end, so that $\int_0^{\infty}\psi(x)\,dx = 1$. For a given
drift $\mu$ away from the left end and diffusion coefficient $\sigma^2 > 0$, calculate the fraction
of particles eventually absorbed. What if $\mu = 0$?
4. Two independent Brownian motions with drift $\mu_i$ and diffusion coefficient $\sigma_i^2$, $i = 1, 2$,
are found at time $t = 0$ at positions $x_i$, $i = 1, 2$, with $x_1 < x_2$.
(i) Calculate the probability that the two particles will never meet.
(ii) Calculate the probability that the particles will meet before time $s > 0$.
5. (i) Calculate the distribution of the maximum value of the Brownian motion
starting at 0 with drift $\mu$ and diffusion coefficient $\sigma^2$ over the time period $[0, t]$.
*(ii) For the case $\mu = 0$ give a geometric "reflection" argument that $P(\max_{0 \le s \le t} X_s \ge y)$
$= 2P(X_t \ge y)$. Use (i) to verify this.
6. Calculate the distribution of the minimum value of a Brownian motion starting at
0 with drift $\mu$ and diffusion coefficient $\sigma^2$ over the time period $[0, t]$.
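The reflection argument has an exact discrete counterpart for the simple symmetric random walk, which can be checked by brute force. The sketch below is an illustration for the random walk only (it is not the Brownian motion statement itself); the function name is hypothetical. For a level y >= 1 the reflection principle gives #{paths with maximum >= y} = #{S_n >= y} + #{S_n > y}.

```python
from itertools import product


def max_and_end_counts(n, y):
    """Count n-step +/-1 paths by (maximum >= y), (end >= y), (end > y)."""
    hit = end_ge = end_gt = 0
    for steps in product((-1, 1), repeat=n):
        pos = 0
        peak = 0
        for s in steps:
            pos += s
            peak = max(peak, pos)
        hit += peak >= y
        end_ge += pos >= y
        end_gt += pos > y
    return hit, end_ge, end_gt
```

Paths that end above y are counted twice on the right-hand side: once as themselves and once as the mirror image of a path that reached y and ended below it. In the Brownian limit the event of ending exactly at y has probability zero, which is how the factor 2 arises.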
7. Let $\{B_t\}$ be standard Brownian motion starting at 0 and let $a, b > 0$.
(i) Calculate the probability that $at < B_t < bt$ for all sufficiently large $t$.
(ii) Calculate the probability that $\{B_t\}$ last touches the line $y = at$ instead of
$y = bt$. [Hint: Consider the process $\{Z_t\}$ defined by $Z_0 = 0$, $Z_t = tB_{1/t}$ for $t > 0$,
and Exercise 9.3(i).]
8. Let $\{(B_t^{(1)}, B_t^{(2)})\}$ be a two-dimensional standard Brownian motion starting at (0, 0)
(see Section 7). Let $\tau_y = \inf\{t \ge 0: B_t^{(2)} = y\}$, $y > 0$. Calculate the distribution of
$B_{\tau_y}^{(1)}$. [Hint: $\{B_t^{(1)}\}$ and $\{B_t^{(2)}\}$ are independent one-dimensional Brownian motions.
Condition on $\tau_y$. Evaluate the integral by substituting $u = (x^2 + y^2)/t$.]
9. Let $\{B_t\}$ be a standard Brownian motion starting at 0. Describe the geometric
structure of sample paths for each of the following stochastic processes and calculate
$EY_t$.
(i) (Absorption)
$$Y_t = \begin{cases} B_t & \text{if } \max_{0 \le s \le t} B_s < a, \\ a & \text{if } \max_{0 \le s \le t} B_s \ge a, \end{cases}$$
where $a > 0$ is a constant.
(*ii) (Reflection)
$$Y_t = \begin{cases} B_t & \text{if } B_t < a, \\ 2a - B_t & \text{if } B_t > a, \end{cases}$$
$$f_z(t) = \frac{z}{(2\pi)^{1/2}}\,\frac{e^{-z^2/(2t)}}{t^{3/2}} \qquad (t > 0).$$
(ii) Verify that the distribution of $\tau_z$ is a stable law with exponent $\theta = 2$ (index $\frac{1}{2}$)
in the sense that if $T_1, T_2, \ldots, T_n$ are i.i.d. and distributed as $\tau_z$ then
$n^{-\theta}(T_1 + \cdots + T_n)$ is distributed as $\tau_z$ (see Eq. 10.2).
(iii) (Scaling property) $\tau_z$ is distributed as $z^2\tau_1$.
11. Let $\tau_z$ be the first passage time to $z$ for a standard Brownian motion starting at 0
with zero drift.
(i) Verify that $E\tau_z$ is not finite.
(ii) Show that $Ee^{-\lambda\tau_z} = e^{-\sqrt{2\lambda}\,|z|}$, $\lambda > 0$. [Hint: Tedious integration will
work.]
(iii) Use Laplace transforms to check that (1/n)z (,J J converges in distribution to
t z as n > oo.
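Before attempting the integration in Exercise 11(ii), the identity can be checked numerically. The sketch below assumes the standard first passage density z (2 pi t^3)^(-1/2) exp(-z^2/(2t)) for z > 0 and uses a trapezoid rule after the substitution t = e^u; the function names, the integration range, and the step count are illustrative choices.

```python
import math


def first_passage_density(t, z):
    """Density of the first passage time to level z > 0 for standard Brownian motion."""
    return z / math.sqrt(2 * math.pi * t ** 3) * math.exp(-z * z / (2 * t))


def laplace_transform(lam, z, lo=-30.0, hi=30.0, steps=6000):
    """Trapezoid approximation of the integral of exp(-lam*t) f_z(t) dt over t > 0, with t = exp(u)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        u = lo + i * h
        t = math.exp(u)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(-lam * t) * first_passage_density(t, z) * t
    return total * h
```

The returned value can be compared with `math.exp(-math.sqrt(2 * lam) * z)`. Agreement is only numerical evidence; the exercise still requires the analytic computation.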
12. Let $\{B_t\}$ be standard Brownian motion starting at 0. Let $s < t$. Show that the
probability that $\{B_t\}$ has at least one zero in $(s, t)$ is given by $(2/\pi)\cos^{-1}(s/t)^{1/2}$.
[Hint: Let
Likewise for $x < 0$, $p(x) = P(\tau_{-x} < t - s)$. So the desired probability can be obtained
by calculating
$$Ep(|B_s|) = \int_0^{\infty} p(x)\left(\frac{2}{\pi s}\right)^{1/2} e^{-x^2/(2s)}\,dx.$$
Throughout this set of exercises $\{S_n\}$ denotes the simple symmetric random
walk starting at 0.
1. Show the following for $r \ne 0$.
(i) $P(S_1 \ne 0, S_2 \ne 0, \ldots, S_{2n-1} \ne 0, S_{2n} = 2r) = \binom{2n}{n+r}\frac{|r|}{n}\,2^{-2n}$.
(ii) $P(\Gamma^{(2n)} = 2k, S_{2n} = 2r) = \binom{2k}{k}\binom{2n-2k}{n-k+r}\frac{|r|}{n-k}\,2^{-2n}$.
... , S;, = S". This transformation corresponds to a rotation through 180 degrees.
Use (11.2).]
(iii) Show that
$$P(\tau^{(2n)} = 2k) = P(\tau^{(2n)} = 2k + 1) = \frac{1}{2}\binom{2k}{k}2^{-2k}\binom{2(n-k)}{n-k}2^{-2(n-k)}$$
for $k = 1, \ldots, n$ in the first case and $k = 0, \ldots, n - 1$ in the second. [Hint: A
path of length $2n$ with a maximum at $2k$ can be considered in two sections.
Apply (i) and (ii) to each section.]
(iv) $\lim_{n \to \infty} P\left(\frac{\tau^{(n)}}{n} \le t\right) = \frac{2}{\pi}\sin^{-1}\sqrt{t}$, $0 < t < 1$.
*4. Let $\Gamma$, $U_n$ be as defined in Exercise 2. Define $V_n = \#\{k \le \Gamma^{(n)}: S_{k-1} \ge 0, S_k \ge 0\} =
U_{\Gamma^{(n)}}$. Show that $P(V_{2n} = 2r \mid S_{2n} = 0) = 1/(n + 1)$, $r = 0, 1, \ldots, n$. [Hint: Use
induction and Exercise 3(i) to show that $P(V_{2n} = 2r, S_{2n} = 0)$ does not depend on
$r$, $0 \le r \le n$.]
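The uniform law in Exercise 4 can be checked by enumeration for small n. The sketch below rests on one assumption stated plainly: when S_{2n} = 0 the last zero is at time 2n, so the count runs over all 2n steps, and a step k is counted when S_{k-1} >= 0 and S_k >= 0. The function name is hypothetical.

```python
from collections import Counter
from itertools import product


def time_above_counts(n):
    """For 2n-step +/-1 paths ending at 0, tally paths by the number of steps spent on the nonnegative side."""
    counts = Counter()
    for steps in product((-1, 1), repeat=2 * n):
        if sum(steps) != 0:
            continue
        pos = 0
        above = 0
        for s in steps:
            new = pos + s
            if pos >= 0 and new >= 0:
                above += 1
            pos = new
        counts[above] += 1
    return counts
```

Each of the n + 1 possible values 0, 2, ..., 2n should receive the same number of paths, which gives the conditional probability 1/(n + 1).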
1. Show that the finite-dimensional distributions of the Brownian bridge are Gaussian.
2. Suppose that $F$ is an arbitrary distribution function (not necessarily continuous).
Define an inverse to $F$ as $F^{-1}(y) = \inf\{x: F(x) > y\}$. Show that if $Y$ is uniform on
[0, 1] then $X = F^{-1}(Y)$ has distribution function $F$.
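For a purely discrete distribution the inverse in Exercise 2 reduces to a search in the cumulative sums. The following sketch is an illustration under that assumption; `make_inverse` and its arguments are hypothetical names, and the clamp at the last atom only guards against y = 1 or floating-point shortfall in the final cumulative sum.

```python
import bisect


def make_inverse(values, probs):
    """Return y -> inf{x: F(x) > y} for the distribution with P(X = values[i]) = probs[i].

    `values` must be sorted in increasing order.
    """
    cdf = []
    running = 0.0
    for p in probs:
        running += p
        cdf.append(running)

    def inverse(y):
        # bisect_right gives the first index i with cdf[i] > y
        i = bisect.bisect_right(cdf, y)
        return values[min(i, len(values) - 1)]

    return inverse
```

Feeding a uniform variate on [0, 1) into `inverse` then produces a sample from F; note that at a jump point y = F(x) the strict inequality in the definition moves to the next atom.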

3. Let $\{B_t\}$ be standard Brownian motion starting at 0 and let $B_t^* = B_t - tB_1$, $0 \le t \le 1$.
(i) Show that $\{B_t^*\}$ is independent of $B_1$.
(ii) (The Inverse Simulation) Give a construction of standard Brownian motion
from the Brownian bridge. [Hint: Use (i).]
*4. Let $\{B_t\}$ be a standard Brownian motion starting at 0 and let $\{B_t^*\}$ be the Brownian
bridge.
(i) Show that for time points $0 < t_1 < t_2 < \cdots < t_k < 1$,
 exp{2[v+k(vu)]2}.
[Hint: Express as a limit of the ratio of probabilities as in (i) and use Exercise
9.5(v). Also, $\Phi(x, x + \varepsilon) = \varepsilon(2\pi)^{-1/2}\exp(-x^2/2) + o(\varepsilon)$ as $\varepsilon \to 0$.]
(iii) Prove
*6. (Brownian Meander) The Brownian meander $\{B_t^+\}$ is defined as the limiting
distribution of the standard Brownian motion $\{B_t\}$ starting at 0, conditional on
$\{m = \min_{0 \le t \le 1} B_t > -\varepsilon\}$ as $\varepsilon \downarrow 0$ (see theoretical complement 4 for existence). Let
$m^+ = \min_{0 \le t \le 1} B_t^+$, $M^+ = \max_{0 \le t \le 1} B_t^+$. Prove the following:
(i) $P(M^+ < x, B_1^+ < y) = \sum_{k=-\infty}^{\infty}\left[e^{-(2kx)^2/2} - e^{-(2kx+y)^2/2}\right]$, $0 < y \le x$.
[Hint: Express as a limit of ratios of probabilities and use Exercise 9.5(v). Also
$P(m > -\varepsilon) = (2/\pi)^{1/2}\varepsilon + o(\varepsilon)$; see Exercise 10.5(ii) noting $\min(A) = -\max(-A)$
and symmetry. Justify interchange of limits with the Dominated Convergence
Theorem (Chapter 0).]
(ii) $P(M^+ < x) = 1 + 2\sum_{k=1}^{\infty}(-1)^k\exp\{-(kx)^2/2\}$. [Hint: Consider (i) with
$y = x$.]
(iii) $EM^+ = (2\pi)^{1/2}\log 2 = 1.7374\ldots$. [Hint: Compute $\int_0^{\infty} P(M^+ > x)\,dx$ from
(ii).]
(iv) (Rayleigh Distribution) $P(B_1^+ < x) = 1 - e^{-x^2/2}$, $x > 0$. [Hint: Consider (i) in
the limit as $x \to \infty$.]
*7. (Brownian Excursion) The Brownian excursion $\{B_t^{*+}\}$ is defined by the limiting
distribution of $\{B_t^*\}$ conditioned on $\{m^* > -\varepsilon\}$ as $\varepsilon \downarrow 0$ (see theoretical complement
4 for existence). Let $M^{*+} = \max_{0 \le t \le 1} B_t^{*+}$. Prove the following:
(i) $P(M^{*+} < x) = 1 + 2\sum_{k=1}^{\infty}\left[1 - (2kx)^2\right]\exp\{-(2kx)^2/2\}$, $x > 0$.
and note that for k > A the integrand is nonnegative on [A, oc ). So Lebesgue's
monotone convergence can be applied to interchange integral with sum over
k > 1/(20) to get zero for this. Thus, EM* + is the limit as 0  0 of a finite
sum over k <i of an integral that can be evaluated (by parts). Note that this
gives a Riemann sum limit for 2J exp( Zx 2 ) dx = (it/2)' 1 2 .]
(iv/2)'t 2 , if r = 1
(iii)
r
(2(2)li2).4( n )lt2 r 1 (r), if r = 2, 3 , ... ,
Ir
Ii
\ 2 /
where $\zeta(r) = \sum_{k=1}^{\infty} k^{-r}$ is the Riemann Zeta function ($r \ge 2$). [Hint: The case
$r = 1$ is given in (ii) above.] For the case $r \ge 2$, we have
TI k m 1 )^i^T^m^, t<1(m)andt>t(1),
check that
m m
(v) C= U U {{F(t(k/m))+*F(i(k/m))} u {F(z(k/m)  )+* F(z(k/m)  )}
m=1k=1
where $\{S_k^{(2n)*}: k = 0, 1, 2, \ldots, 2n\}$ is the simple symmetric random walk bridge
(starting at 0 and tied down at $k = 2n$) as defined in Exercise 5. [Hint: Arrange
$X_1, \ldots, X_n, Y_1, \ldots, Y_n$ in increasing order as $X_{(1)} < X_{(2)} < \cdots < X_{(2n)}$ and
define the kth displacement of $\{S_k^{(2n)*}\}$ by
(ii) Find the analytic expression for the probability in (i). [Hint: Consider the event
that the simple random walk with absorbing boundaries at $\pm r$ returns to 0 at
time $2n$. First condition on the initial displacement.]
(iii) Calculate the large-sample-theory (i.e., asymptotic as $n \to \infty$) limit distribution
of $\sqrt{n}\,D_{n,n}$. See Exercise 4(iii).
(iv) Show
$$P\left(\sup_x\,(F_n(x) - G_n(x)) < \frac{r}{n}\right) = 1 - \frac{\binom{2n}{n-r}}{\binom{2n}{n}}, \qquad r = 1, \ldots, n.$$
[Hint: Only one absorbing barrier occurs in the random walk approach.]
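The one-barrier count behind part (iv) can be verified by enumeration. The sketch below assumes the usual encoding: in the pooled ordered sample each X contributes a step +1 and each Y a step -1, so that n times the supremum of F_n - G_n equals the maximum of a 2n-step bridge, and all arrangements of the n up-steps are equally likely. The function name is hypothetical.

```python
from itertools import combinations
from math import comb


def count_bridges_with_max_below(n, r):
    """Number of 2n-step +/-1 paths from 0 to 0 whose running maximum stays below r."""
    count = 0
    for ups in combinations(range(2 * n), n):
        up_set = set(ups)
        pos = 0
        peak = 0
        for k in range(2 * n):
            pos += 1 if k in up_set else -1
            peak = max(peak, pos)
        count += peak < r
    return count
```

By the reflection principle the bridges that reach level r correspond to paths ending at 2r, of which there are `comb(2 * n, n - r)`; dividing by `comb(2 * n, n)` gives the probability of reaching the barrier.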
4. For the simple symmetric random walk, starting at $x$, show that
$E\{S_m 1_{\{\tau_y = r\}}\} = yP(\tau_y = r)$ for $r \le m$.
5. Prove that $EZ_n$ is independent of $n$ (i.e., constant) for a martingale $\{Z_n\}$. Show also
that $E(Z_n \mid \{Z_0, \ldots, Z_k\}) = Z_k$ for any $n > k$.
6. Write out a proof of Theorem 13.3 along the lines of that of Theorem 13.1.
7. Let $\{S_n\}$ be a simple random walk with $p \in (\frac{1}{2}, 1)$.
(i) Prove that $\{(q/p)^{S_n}: n = 0, 1, 2, \ldots\}$ is a martingale.
(ii) Let $c < x < d$ be integers, $S_0 = x$, and $\tau = \tau_c \wedge \tau_d := \min(\tau_c, \tau_d)$. Apply Theorem
13.3 to the martingale in (i) and $\tau$ to compute $P(\{S_n\}$ reaches $c$ before $d)$.
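The answer produced by optional stopping in Exercise 7(ii) can be sanity-checked with exact rational arithmetic. The sketch below assumes the formula obtained by equating (q/p)^x with the expected value of (q/p)^{S_tau}; the function name is hypothetical and p must differ from 1/2.

```python
from fractions import Fraction


def prob_c_before_d(x, c, d, p):
    """P(walk started at x reaches c before d), with rho = q/p and p != 1/2."""
    q = 1 - p
    rho = q / p
    return (rho ** x - rho ** d) / (rho ** c - rho ** d)
```

A correct answer must equal 1 at x = c, 0 at x = d, and satisfy the one-step averaging relation h(x) = p h(x + 1) + q h(x - 1) for c < x < d; the checks accompanying this sketch test exactly those three properties.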
8. Write out a proof of Proposition 13.5 along the lines of that of Proposition 13.4.
9. Under the hypothesis that the pth absolute moments are finite for some $p \ge 1$, derive
the Maximal Inequality $P(M_n \ge \lambda) \le E|Z_n|^p/\lambda^p$ in the context of Theorem 13.6.
10. (Submartingales) Let $\{Z_n: n = 0, 1, 2, \ldots\}$ be a finite or infinite sequence of
integrable random variables satisfying $E(Z_{n+1} \mid \{Z_0, \ldots, Z_n\}) \ge Z_n$ for all $n$. Such a
sequence $\{Z_n\}$ is called a submartingale.
(i) Prove that, for any $n > k$, $E(Z_n \mid \{Z_0, \ldots, Z_k\}) \ge Z_k$.
(ii) Let $M_n = \max\{Z_0, \ldots, Z_n\}$. Prove the maximal inequality $P(M_n \ge \lambda) \le EZ_n^2/\lambda^2$
for $\lambda > 0$. [Hint: $E(Z_k 1_{A_k}(Z_n - Z_k)) = E(Z_k 1_{A_k} E(Z_n - Z_k \mid \{Z_0, \ldots, Z_k\})) \ge 0$
for $n > k$, where $A_k := \{Z_0 < \lambda, \ldots, Z_{k-1} < \lambda, Z_k \ge \lambda\}$.]
(iii) Extend the result of Exercise 9 to nonnegative submartingales.
11. Let $\{Z_n\}$ be a martingale. If $E|Z_n|^p < \infty$ then prove that $|Z_n|^p$ is a submartingale,
$p \ge 1$. [Hint: Use Jensen's or Hölder's Inequality, Chapter 0, (2.7), (2.12).]
12. (An Exponential Martingale) Let $\{X_j: j \ge 1\}$ be a sequence of independent random
variables having finite moment-generating functions $\varphi_j(\xi) := E\exp\{\xi X_j\}$ for some
$\xi \ne 0$. Define $S_n := X_1 + \cdots + X_n$, $Z_n = \exp\{\xi S_n\}/\prod_{j=1}^{n}\varphi_j(\xi)$.
(i) Prove that $\{Z_n\}$ is a martingale.
(ii) Write $M_n = \max\{S_1, \ldots, S_n\}$. If $\xi > 0$, prove that
$$P(M_n \ge \lambda) \le \exp\{-\xi\lambda\}\prod_{j=1}^{n}\varphi_j(\xi) \qquad (\lambda > 0).$$
13. Let $\{X_n: n \ge 1\}$ be i.i.d. Gaussian with mean zero and variance $\sigma^2 > 0$. Let
$S_n = X_1 + \cdots + X_n$, $M_n = \max\{S_1, \ldots, S_n\}$. Prove the following for $\lambda > 0$.
(i) $P(M_n \ge \lambda) \le \exp\{-\lambda^2/(2\sigma^2 n)\}$. [Hint: Use Exercise 12(ii) and an appropriate
choice of $\xi$.]
(ii) $P(\max\{|S_j|: 1 \le j \le n\} \ge \lambda\sigma\sqrt{n}) \le 2\exp\{-\lambda^2/2\}$.
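The "appropriate choice" in the hint to Exercise 13(i) can be explored numerically. The sketch below assumes the Gaussian moment-generating function exp(sigma2 * xi^2 / 2), so the bound of Exercise 12(ii) becomes exp(-xi*lam + n*sigma2*xi^2/2); the function names are hypothetical.

```python
import math


def exponential_bound(xi, lam, n, sigma2):
    """exp(-xi*lam) times the n-fold product of the N(0, sigma2) moment-generating function at xi."""
    return math.exp(-xi * lam + n * sigma2 * xi * xi / 2)


def optimal_xi(lam, n, sigma2):
    """Minimizer of the exponent -xi*lam + n*sigma2*xi**2/2 over xi > 0."""
    return lam / (n * sigma2)
```

Evaluating `exponential_bound` at `optimal_xi` should reproduce exp(-lam^2 / (2 * sigma2 * n)), and nearby values of `xi` should give a larger bound, since the exponent is a convex quadratic in `xi`.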
14. Let $\tau_1$, $\tau_2$ be stopping times. Show the following assertions (i)-(v) hold.
(i) $\tau_1 \vee \tau_2 := \max(\tau_1, \tau_2)$ is a stopping time.
17. Let $\{S_n\}$ be the simple symmetric random walk starting at 0. Let $\tau = \inf\{n \ge 0:
S_n = 2 - n\}$.
(i) Calculate $E\tau$ from the distribution of $\tau$.
(ii) Use the martingale stopping theorem to calculate $E\tau$.
(*iii) How does this generalize to the cases $\tau = \inf\{n \ge 0: S_n = b - n\}$, where $b$ is
a positive integer? [Hint: Check that $n + S_n$ is even for $n = 0, 1, 2, \ldots$.]
18. (i) Show that if $X$ is a random variable such that $g(z) = Ee^{zX}$ is finite in a
neighborhood of $z = 0$, then $E|X|^k < \infty$ for all $k = 1, 2, \ldots$.
(ii) For a Brownian motion $\{X_t\}$ with drift $\mu$ and diffusion coefficient $\sigma^2$, prove that
$\exp\{\lambda X_t - \lambda t\mu - \lambda^2\sigma^2 t/2\}$ ($t \ge 0$) is a martingale.
19. Consider an arbitrary Brownian motion with drift $\mu$ and diffusion coefficient $\sigma^2 > 0$.
(i) Let $m(x) = E_x\tau$, where $\tau$ is the time to reach the boundary $\{c, d\}$ starting at
$x \in [c, d]$. Show that $m(x)$ solves the boundary-value problem
$$\frac{1}{2}\sigma^2\frac{d^2 m}{dx^2} + \mu\frac{dm}{dx} = -1, \qquad m(c) = m(d) = 0.$$
(ii) Let $r(x) = P_x(\tau_d < \tau_c)$ for $x \in [c, d]$. Verify that $r(x)$ solves the boundary-value
problem
$$\frac{1}{2}\sigma^2\frac{d^2 r}{dx^2} + \mu\frac{dr}{dx} = 0, \qquad r(c) = 0, \quad r(d) = 1.$$
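Candidate solutions of the two boundary-value problems in Exercise 19 can be checked with finite differences. The closed forms below are assumptions for the case of nonzero drift (they are not quoted from the text), and the function names and step size are illustrative.

```python
import math


def hit_d_first(x, c, d, mu, sigma2):
    """Candidate r(x) for mu != 0: ratio of 1 - exp(-2*mu*(x-c)/sigma2) to its value at x = d."""
    k = 2 * mu / sigma2
    return (1 - math.exp(-k * (x - c))) / (1 - math.exp(-k * (d - c)))


def mean_exit_time(x, c, d, mu, sigma2):
    """Candidate m(x) for mu != 0: ((d - c) * r(x) - (x - c)) / mu."""
    return ((d - c) * hit_d_first(x, c, d, mu, sigma2) - (x - c)) / mu


def generator(f, x, mu, sigma2, h=1e-3):
    """Central-difference approximation of (sigma2/2) f''(x) + mu f'(x)."""
    second = (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2
    first = (f(x + h) - f(x - h)) / (2 * h)
    return 0.5 * sigma2 * second + mu * first
```

Applying `generator` to the candidate for r should give approximately 0, and to the candidate for m approximately -1, with the stated boundary values holding exactly at x = c and x = d.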
THEORETICAL COMPLEMENTS
Proof. To see this one uses the general measure-theoretic fact that the probability
of any event $A$ belonging to the sigma-field $\mathcal{F} = \sigma\{X_1, X_2, \ldots, X_n, \ldots\}$ generated by
$X_1, X_2, \ldots$ can be approximated by events $A_1, A_2, \ldots$ belonging to
the field of events $\mathcal{F}_0 = \bigcup_{n=1}^{\infty}\sigma\{X_1, \ldots, X_n\}$ in the sense that $A_n \in \sigma\{X_1, \ldots, X_n\}$ for
each $n$ and $P(A \Delta A_n) \to 0$ as $n \to \infty$, where $\Delta$ denotes the symmetric difference
$A \Delta A_n = (A \cap A_n^c) \cup (A^c \cap A_n)$. Applying this approximation to a tail event $A$, one
obtains that since $A \in \sigma\{X_{n+1}, X_{n+2}, \ldots\}$ for each $n$, $A$ is independent of each event
$A_n$. Thus, $0 = \lim_{n \to \infty} P(A \Delta A_n) = 2P(A)P(A^c) = 2P(A)(1 - P(A))$. The only solutions
to the equation $x(1 - x) = 0$ are 0 and 1. ∎
2. Let $S_n = X_1 + \cdots + X_n$, $n \ge 1$. Events that depend on the tail of the sums are trivial
(i.e., have probability 1 or 0) whenever the summands $X_1, X_2, \ldots$ are i.i.d. This is a
consequence of the following more general zero-one law for events that symmetrically
depend on the terms $X_1, X_2, \ldots$ of an i.i.d. sequence of random variables (or vectors).
Let $\mathcal{B}^{\infty}$ denote the sigma-field of subsets of $\mathbb{R}^{\infty} = \{(x_1, x_2, \ldots): x_i \in \mathbb{R}^1\}$ generated by
events depending on finitely many coordinates.
Theorem T.1.2. (Hewitt-Savage Zero-One Law). Let $X_1, X_2, \ldots$ be an i.i.d. sequence
of random variables. If an event $A = \{(X_1, X_2, \ldots) \in B\}$, where $B \in \mathcal{B}^{\infty}$, is invariant
under finite permutations $(X_{i_1}, X_{i_2}, \ldots)$ of terms of the sequence $(X_1, X_2, \ldots)$, that
is, $A = \{(X_{i_1}, X_{i_2}, \ldots) \in B\}$ for any finite permutation $(i_1, i_2, \ldots)$ of $(1, 2, \ldots)$, then
$P(A) = 1$ or 0.
Proof. To prove the Hewitt-Savage 0-1 law, proceed as in the Kolmogorov 0-1 law
by selecting finite-dimensional approximants to $A$ of the form $A_n = \{(X_1, \ldots, X_n) \in B_n\}$,
$B_n \in \mathcal{B}^n$, such that $P(A \Delta A_n) \to 0$ as $n \to \infty$. For each fixed $n$, let $(i_1, i_2, \ldots)$ be the
permutation $(2n, 2n - 1, \ldots, 1, 2n + 1, \ldots)$ and define $\tilde{A}_n = \{(X_{i_1}, \ldots, X_{i_n}) \in B_n\}$.
Then $\tilde{A}_n$ and $A_n$ are independent with $P(A_n \cap \tilde{A}_n) = P(A_n)P(\tilde{A}_n) = (P(A_n))^2 \to (P(A))^2$
as $n \to \infty$. On the other hand, $P(A \Delta \tilde{A}_n) = P(A \Delta A_n) \to 0$, so that $P(A_n \Delta \tilde{A}_n) \to 0$ and,
in particular, therefore $P(A_n \cap \tilde{A}_n) \to P(A)$ as $n \to \infty$. Thus $x = P(A)$ satisfies $x = x^2$. ∎
Proof. To prove this, first observe that $P(S_n = 0 \text{ i.o.})$ is 1 or 0 by the Hewitt-Savage
zero-one law (theoretical complement 1.2). If $\sum_{n=1}^{\infty} P(S_n = 0) < \infty$, then $P(S_n = 0$
i.o.$) = 0$ by the Borel-Cantelli Lemma. If $\sum_{n=1}^{\infty} P(S_n = 0)$ is divergent (i.e., the
expected number of visits to 0 is infinite), then we can show that $P(S_n = 0 \text{ i.o.}) = 1$
as follows. Using independence and the property that the shifted sequence
$X_k, X_{k+1}, \ldots$ has the same distribution as $X_1, X_2, \ldots$, one has
=Z
P(Sn =0)P(Sm S, 540,m>n)
Thus,
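$$P(S_n = 0) = \frac{1}{2\pi}\int_{-\pi}^{\pi} Ee^{itS_n}\,dt = \frac{1}{2\pi}\int_{-\pi}^{\pi}\varphi^n(t)\,dt$$
$$g(x) = \frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{dt}{1 - x\varphi(t)}$$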
f n dt f ( l
" Re 1 dt =
f 1  x(p (t) dt a 1  xq,(t)
dt
J n  x0(t)  .L \1  xtP(t)/ J " I 1  x^P(t)Iz j I1  x^V(t)I Z
> a
J a (1 
1 x(p 1 (t)
x(V1(t))2 + x2(t)
1x
dt v
f b
6 (1
1 x
 x + xslti) 2 + x2e2t2
dt
dt> a 1x dt
2(1  x) 2 + 3x 2 r 2 t 2 3[(1 x) 2 + e 2 t 2 ]
2  1^ ES 1 n
=tan /J^ asx' 1.
3e 1  x 3E
l(f)' L
'I "ki
f(x1 .,..., x !k)pI1....f Pik(dxi^ . . .dx^k)s
where
The Borel sigma-field $\mathcal{B}$ of $C[0, 1]$ for the metric $\rho$ is the smallest sigma-field of subsets
of $C[0, 1]$ that contains all finite-dimensional events of the form
min(i,j) / k \2
^yijXiXj=>Xi Xj ^ (Lr trt) = ttrtr1) xi J1 iO.
i,j i,j r=1 r i=r
However, the problem with this construction of a probability space $(\Omega, \mathcal{F}, P)$ for
$\{X_t\}$ is that events in $\mathcal{F}$ can only depend on specifications of values at countably
many time points. Thus, the subset $C[0, 1]$ of $\Omega$ is not measurable; i.e., $C[0, 1] \notin \mathcal{F}$.
This dilemma is resolved in the theoretical complement to Section I.13 by showing
that there is a modification of the process $\{X_t\}$ that yields a process $\{B_t\}$ with sample
paths in $C[0, 1]$ and having the same finite-dimensional distributions as $\{X_t\}$; i.e.,
$\{B_t\}$ is the desired Brownian motion process. The basic idea for this modification is
to show that almost all paths $q \to X_q$, $q \in D$, where $D$ is a countable dense set of time
points, are uniformly continuous. With this, one can then define $\{B_t\}$ by the continuous
extension of these paths given by
$$B_t = \begin{cases} X_q & \text{if } t = q \in D, \\ \lim_{q \to t} X_q & \text{if } t \notin D. \end{cases}$$
Probability and Measure, 2nd ed., Wiley, New York, p. 558). In theory, this is enough
sample path regularity to make manageable most such measurability issues connected
with processes at uncountably many time points. In practice though, one seeks to
explicitly construct models with sufficient sample path regularity that such
considerations are often avoidable. The latter is the approach of this text.
1. Let $\{X_t^{(n)}\}$, $n = 1, 2, \ldots$, and $\{X_t\}$ be stochastic processes whose sample paths
$(S, \mathcal{B})$. Assume $\{X_t^{(n)}\}$ and $\{X_t\}$ are defined on a probability space $(\Omega, \mathcal{F}, Q)$. Then,
Convergence in distribution of $\{X_t^{(n)}\}$ to $\{X_t\}$ has been defined in the text to mean that
the sequence of real-valued random variables $Y_n := f(\{X_t^{(n)}\})$ converges in distribution
To see the equivalence, first observe that for any continuous $f: S \to \mathbb{R}^1$, the
functions $\cos(rf)$ and $\sin(rf)$ are, for each $r \in \mathbb{R}^1$, continuous and bounded functions
on $S$. Therefore, condition (T.8.2) gives the convergence of the
characteristic functions of the $Y_n$ to that of $Y$ for each continuous $f$ on $S$. In particular,
the $Y_n$ must converge in distribution to $Y$. To go the other way, suppose that $f: S \to \mathbb{R}^1$
is continuous and bounded. Assume without loss of generality that $0 < f \le 1$. Then,
for each $N \ge 1$,
t
liminf f dP + N , (T.8.5)
s
by (T.8.4) applied to $P$, and the fact that $\lim \operatorname{Prob}(Y_n > x) = \operatorname{Prob}(Y > x)$ for all
points $x$ of continuity of the d.f. of $Y$ implies $\liminf \operatorname{Prob}(Y_n > y) \ge \operatorname{Prob}(Y > y)$ for
all $y$. Letting $N \to \infty$ gives
Thus, in general,
$$\limsup_n \int_S f\,dP_n \le \int_S f\,dP \le \liminf_n \int_S f\,dP_n \qquad \text{(T.8.6)}$$
which implies
With the above equivalence in mind we make the following general definition.
Definition. A sequence $\{P_n\}$ of probability measures on $(S, \mathcal{B})$ converges weakly (or in
distribution) to a probability measure $P$ on $(S, \mathcal{B})$ provided that $\lim_n \int_S f\,dP_n = \int_S f\,dP$
for all bounded and continuous functions $f: S \to \mathbb{R}^1$.
then $\{P_n\}$ has a subsequence weakly convergent to a probability measure $Q$ on $(S, \mathcal{B})$.
Moreover, if $S$ is complete and separable then the condition (T.8.8) is also necessary.
The condition (ii) refers to the equicontinuity of the functions in $A$ in the sense that
given any $\varepsilon > 0$ there is a common $\delta > 0$ such that for all functions $\omega \in A$ we have
$|\omega_t - \omega_s| < \varepsilon$ if $|t - s| < \delta$. Conditions (i) and (ii) together imply that $A$ is uniformly
bounded in the sense that there is a number $B$ for which
This is because for $N$ sufficiently large we have $\sup_{\omega \in A} v_\omega(1/N) < 1$ and, therefore,
for each $0 \le t \le 1$,
$$|\omega_t| \le |\omega_0| + \sum_{i=1}^{N}|\omega_{it/N} - \omega_{(i-1)t/N}| \le \sup_{\omega \in A}|\omega_0| + N\sup_{\omega \in A} v_\omega\!\left(\frac{1}{N}\right) = B.$$
4. Combining the Prohorov theorem (T.8.2) with the Arzelà-Ascoli theorem (T.8.3)
gives the following criterion for tightness of probability measures $\{P_n\}$ on $S = C[0, 1]$.
Theorem T.8.4. Let $\{P_n\}$ be a sequence of probability measures on $C[0, 1]$. Then
$\{P_n\}$ is tight if and only if the following two conditions hold.
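(ii) For each $\varepsilon > 0$, $\eta > 0$, there is a $0 < \delta < 1$ such that
$P_n(\{\omega \in C[0, 1]: v_\omega(\delta) \ge \varepsilon\}) \le \eta$, $n \ge 1$.
Proof. If $\{P_n\}$ is tight, then given $\eta > 0$ there is a compact $K$ such that $P_n(K) > 1 - \eta$
for all $n$. By the Arzelà-Ascoli theorem, if $B > \sup_{\omega \in K}|\omega_0|$ then
$P_n(\{\omega \in C[0, 1]: v_\omega(\delta) \ge \varepsilon\}) \le P_n(K^c) < \eta$ for all $n \ge 1$.
The converse goes as follows. Given $\eta > 0$, first select $B$ using (i) such that
$P_n(\{\omega: |\omega_0| < B\}) \ge 1 - \frac{1}{2}\eta$, for $n \ge 1$. Select $\delta_r$ using (ii) such that $P_n(\{\omega: v_\omega(\delta_r) < 1/r\})
\ge 1 - 2^{-(r+1)}\eta$ for $n \ge 1$. Now take $K$ to be the closure of
$$\{\omega: |\omega_0| < B\} \cap \bigcap_{r=1}^{\infty}\left\{\omega: v_\omega(\delta_r) < \frac{1}{r}\right\}.$$
Then $P_n(K) \ge 1 - \eta$ for $n \ge 1$, and $K$ is compact by the Arzelà-Ascoli theorem.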
Theorem T.8.5. Let $\{X_t: 0 \le t \le 1\}$ and $\{X_t^{(n)}: 0 \le t \le 1\}$ be stochastic processes on
$(\Omega, \mathcal{F}, P)$ which have a.s. continuous sample paths and suppose that the
finite-dimensional distributions of $\{X_t^{(n)}\}$ converge to those of $\{X_t\}$. Then $\{X_t^{(n)}\}$
converges weakly to $\{X_t\}$ if and only if for each $\varepsilon > 0$
Corollary. For the last limit to hold it is sufficient that there be positive numbers
$\alpha$, $\beta$, $M$ such that
$$E|X_t^{(n)} - X_s^{(n)}|^{\alpha} \le M|t - s|^{1+\beta} \qquad \text{for all } s, t, n.$$
To prove the corollary, let D be the set of all dyadic rationals in [0, 1], i.e., numbers
in [0, 1] of the form j/2" for integers j and m. By sample path continuity, the oscillation
j2  k<i2  ^<(j+l)2  k
r
i2  m=j2 k + 1] 2  m' wherek<m 1 <m 2 < <m,^m,
(n) lI ^`
J)j
X  k X.j2k+al/+) Xj2  k+al 
N=1
Therefore,
Let e > 0 and take = 2 k+ so small (i.e., k so large) that ^=k+1 1/m 2 < E/2. Then
h=0 m
x m 2
m 2a2m M2 m(l+p) = M m
mk+1 mk+12
of all orders, this approach can also be used to give an alternative rigorous construction
of the Wiener measure based on Prohorov's theorem as the limiting distribution of
random walks. (Compare theoretical complement 13.1 for another construction.) Let
$Z_1, Z_2, \ldots$ be i.i.d. random variables on a probability space $(\Omega, \mathcal{F}, P)$ having mean
zero, variance one, and finite fourth moment $m_4 = EZ_1^4$. Define $S_0 = 0$,
$S_n = Z_1 + \cdots + Z_n$, $n \ge 1$, and
$$X_t^{(n)} = n^{-1/2}S_{[nt]} + n^{-1/2}(nt - [nt])Z_{[nt]+1}, \qquad 0 \le t \le 1.$$
We will show that there are positive numbers $\alpha$, $\beta$ and $M$ such that
By our corollary this will prove tightness of the distributions of the process $\{X_t^{(n)}\}$,
$n = 1, 2, \ldots$. This together with the finite-dimensional CLT proves the FCLT under
the assumption of finite fourth moments. One needs to calculate the probabilities of
fluctuations described in the Arzelà-Ascoli theorem more carefully to get the proof
under finite second moments alone.
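The polygonal process defined above is straightforward to evaluate from a finite list of steps. The sketch below is an illustration only; the function name `polygonal_path` is hypothetical, the list `z` holds Z_1, ..., Z_n (so n is its length), and at t = 1 the interpolation term is taken to be zero because its coefficient nt - [nt] vanishes.

```python
import math


def polygonal_path(z, t):
    """Value at time t in [0, 1] of the linearly interpolated, rescaled random walk built from z."""
    n = len(z)
    k = min(int(math.floor(n * t)), n)  # k = [nt]
    s_k = sum(z[:k])                    # S_[nt]
    frac = n * t - k                    # nt - [nt]
    nxt = z[k] if k < n else 0.0        # Z_{[nt]+1}; unused at t = 1
    return (s_k + frac * nxt) / math.sqrt(n)
```

With `z = [1, -1, 1, 1]` the path passes through the rescaled partial sums at the grid points t = k/4 and is linear in between, which is the property used when the increments are bounded in the grid-point and off-grid cases.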
To establish (T.5.1), take $\alpha = 4$. First consider the case in which $s = j/n < k/n = t$ are
at the grid points. Then
$$= n^{-2}\sum_{i_1=j+1}^{k}\sum_{i_2=j+1}^{k}\sum_{i_3=j+1}^{k}\sum_{i_4=j+1}^{k} E\{Z_{i_1}Z_{i_2}Z_{i_3}Z_{i_4}\}$$
1.
Thus, in this case,
Next, consider the more general case $0 \le s, t \le 1$, but for which $|t - s| \ge 1/n$. Then,
for $s < t$,
4
E{X;' X;" ] } 4 = n Z E
t [ntl
j=(ns]+ 1
ZJ + (nt [nt])Z[",1+1 ([ns] ns)Z]"sl+l
[n^] \a
n  2 3 E( ' Z^ ) + (nt [nt]) 4 EZ1", 1+ 1
=[ns]+I
In the above, we used the fact that $(a + b + c)^4 \le 3^4(a^4 + b^4 + c^4)$ to get the first
inequality. The analysis of the first (grid-point) case was then used to get the second
inequality. Finally, if $|t - s| \le 1/n$, then either
(a) $\dfrac{k}{n} \le s < t \le \dfrac{k+1}{n}$ for some $0 \le k \le n - 1$, or
(b) $\dfrac{k}{n} \le s \le \dfrac{k+1}{n}$ and $\dfrac{k+1}{n} \le t \le \dfrac{k+2}{n}$ for some $0 \le k \le n - 1$.
E{X^") _ XS"^}a = n Z E Zk + nl t
k k
\ Zk+i n(s ^Zk+I
la
+i
\ ( n n )
( ( k+l ^ k+11 la
=t1
n Z E n \ t Zk+zn s JZk+i Jj
n n
<24n2(t
nt
k a+(k+
am
1 a
s) m a
+
= 2 a n 2 m a 1(t k 1^ a + I k ns )a }
a )J)
+ I + k + 1 sl = 2 n2m
tk
2an2ma 4 4 (t s)a
n n J
The FCLT (Theorem 8.1) is stated in the text for convergence in $S = C[0, \infty)$,
when $S$ has the topology of uniform convergence on compacts. One may take the metric
to be $\rho(\omega, \omega') = \sum_{k=1}^{\infty} 2^{-k} d_k/(1 + d_k)$, where $d_k = \max\{|\omega(t) - \omega'(t)| : 0 \le t \le k\}$.
Since the above arguments apply to $[0, k]$ in place of $[0, 1]$, the assertion of Theorem
8.1 follows (under the moment condition $m_4 < \infty$).
102 RANDOM WALK AND BROWNIAN MOTION
6. (Measure-Determining Classes) Let $(S, \rho)$ be a metric space, $\mathcal{B}(S)$ its Borel sigma-field.
A class $\mathcal{C} \subset \mathcal{B}(S)$ is measure-determining if, for any two finite measures $\mu$, $\nu$,
$\mu(C) = \nu(C)$ $\forall C \in \mathcal{C}$ implies $\mu = \nu$. An example is the class $\mathcal{F}$ of all closed sets. To
see this, consider the lambda class $\mathcal{A}$ of all sets $A$ for which $\mu(A) = \nu(A)$. If this class
contains $\mathcal{F}$ then by the Pi-Lambda Theorem (Chapter 0, Theorem 4.1)
$\mathcal{A} \supset \sigma(\mathcal{F}) = \mathcal{B}(S)$. Similarly, the class $\mathcal{O}$ of all open sets is measure-determining. A
class $\mathcal{G}$ of real-valued bounded Borel measurable functions on $S$ is measure-determining
if $\int g\, d\mu = \int g\, d\nu$ $\forall g \in \mathcal{G}$ implies $\mu = \nu$. The class $C_b(S)$ of real-valued bounded
continuous functions on $S$ is measure-determining. To prove this, it is enough to
show that for each $F \in \mathcal{F}$ there exists a sequence $\{f_n\} \subset C_b(S)$ such that $f_n \downarrow \mathbf{1}_F$ as
$n \uparrow \infty$. For this, let $h_n(r) = 1 - nr$ for $0 \le r \le 1/n$, $h_n(r) = 0$ for $r > 1/n$. Then take
$f_n(x) = h_n(\rho(x, F))$.
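The approximating functions can be illustrated numerically. The sketch below assumes, purely for illustration, that $S$ is the real line with the usual distance and that the closed set is $F = [0, 1]$; the names `dist_to_interval`, `h`, and `f` are hypothetical.

```python
def dist_to_interval(x, a=0.0, b=1.0):
    """Distance from x to the closed interval F = [a, b] on the real line."""
    return max(a - x, 0.0, x - b)

def h(n, r):
    """h_n(r) = 1 - n r on [0, 1/n] and 0 for r > 1/n (r is assumed nonnegative)."""
    return max(1.0 - n * r, 0.0)

def f(n, x):
    """f_n(x) = h_n(dist(x, F)): continuous, bounded by 1, equal to 1 on F."""
    return h(n, dist_to_interval(x))

inside = [f(n, 0.5) for n in range(1, 6)]    # stays at 1 for a point of F
outside = [f(n, 1.2) for n in range(1, 11)]  # decreases to 0 for a point off F
```

For a point outside $F$ the values are nonincreasing in $n$ and reach 0 as soon as $1/n$ drops below the distance to $F$, which is the pointwise convergence $f_n \downarrow \mathbf{1}_F$ used in the argument.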
$f(\{X_s\}) := \max\{X_s : 0 \le s \le t\} =: M_t$, we get (10.10), for $P(\tau_z > t) = P(M_t < z)$, if $z > 0$.
The case $z < 0$ is similar.
Joint distributions of several functionals may be similarly obtained by looking at
linear combinations of the functionals. Here is the precise statement.
Theorem T.9.1. If $f : C[0, \infty) \to \mathbb{R}^k$ is continuous, say $f = (f_1, \ldots, f_k)$ where
$f_i : C[0, \infty) \to \mathbb{R}^1$, then the random vectors $X_n = f(\{X_t^{(n)}\})$, $n \ge 1$, converge in
distribution to $X = f(\{X_t\})$.
Proof. This can be proved using Alexandrov's Theorem (T.8.1(ii)), since for any
closed set $F$, $f^{-1}(F) \subset \overline{f^{-1}(F)} \subset D_f \cup f^{-1}(F)$, where the overbar denotes the closure
of the set. ∎
A proof of Proposition 12.1 for the special case of infinite-dimensional events that
depend on the empirical process through the functional $\omega \to \sup_{0 \le t \le 1} |\omega_t|$ used to
define the Kolmogorov-Smirnov statistic (12.11) is given below. This proof is based
on a trick of M. D. Donsker (1952), "Justification and Extension of Doob's Heuristic
Approach to the Kolmogorov-Smirnov Theorems," Annals Math. Statist., 23,
pp. 277-281, which allows one to apply the FCLT as given in Section 8 (and proved
in theoretical complements to Section 1.8 under the assumption of finite fourth
moments).
The key to Donsker's proof is the simple observation that the distribution of the
order statistic $(Y_{(1)}, \ldots, Y_{(n)})$ of $n$ i.i.d. random variables $Y_1, Y_2, \ldots, Y_n$ from the uniform
distribution on [0, 1] can also be obtained as the distribution of the ratios
$$\left(\frac{S_1}{S_{n+1}}, \frac{S_2}{S_{n+1}}, \ldots, \frac{S_n}{S_{n+1}}\right),$$
where $S_k = T_1 + \cdots + T_k$, $k \ge 1$, and $T_1, T_2, \ldots$ are i.i.d. exponentially distributed
random variables with mean 1.
Intuitively, if the $T_i$ are regarded as the successive times between occurrence of some
phenomena, then $S_{n+1}$ is the time to the $(n + 1)$st occurrence and, in units of $S_{n+1}$,
the occurrence times should be randomly distributed because of lack of memory and
independence properties. A version of this simple fact is given in Chapter IV
(Proposition 5.6) for the Poisson process. The calculations are essentially the same,
so this is left as an exercise here.
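A quick Monte Carlo comparison can make this observation plausible before the exercise is attempted. The sketch below is an illustration only, not a proof: it assumes mean-one exponential interarrival times, uses hypothetical helper names, and compares the simulated mean of the $k$th coordinate with the known value $k/(n+1)$ for uniform order statistics.

```python
import random

def uniform_order_stats(n, rng):
    """Order statistics of n i.i.d. uniform [0, 1] variables."""
    return sorted(rng.random() for _ in range(n))

def exponential_ratios(n, rng):
    """(S_1/S_{n+1}, ..., S_n/S_{n+1}) for partial sums of i.i.d. exponentials."""
    partial, total = [], 0.0
    for _ in range(n + 1):
        total += rng.expovariate(1.0)
        partial.append(total)
    return [s / partial[-1] for s in partial[:-1]]

def mean_of_kth(sampler, n, k, reps, seed):
    """Monte Carlo estimate of the mean of the kth coordinate (k = 1, ..., n)."""
    rng = random.Random(seed)
    return sum(sampler(n, rng)[k - 1] for _ in range(reps)) / reps

m_uniform = mean_of_kth(uniform_order_stats, 4, 2, 20000, seed=1)
m_ratio = mean_of_kth(exponential_ratios, 4, 2, 20000, seed=2)
```

Both estimates should be close to $2/5$; agreement of means is of course much weaker than equality in distribution, so this is a sanity check rather than a verification.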
The precise result that we will prove here is as follows. The symbol $\stackrel{d}{=}$ below
denotes equality in distribution.
Proposition T.12.1. Let Y1 , Y2 be i.i.d. uniform on [0, 1] and let, for each n _> 1,
, ...
d
='nmax
d 'I
I Sk
k5n S^11
k
 =
n
n Skk kSn+1
max I 
Sn+1 kn n
n
where
and, by the SLLN, $n/S_{n+1} \to 1$ a.s. as $n \to \infty$. The result follows from the FCLT,
(8.6), and the definition of Brownian bridge. ∎
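where $F_\varepsilon := \{\omega \in C[0, 1] : \operatorname{dist}(\omega, F) \le \varepsilon\}$ for $\operatorname{dist}(\omega, A) := \inf\{\rho(\omega, \gamma) : \gamma \in A\}$, $A \subset C[0, 1]$.
But, starting with finite-dimensional sets and then using the monotone class argument
(Chapter 0), one may check that the events $\{\{B_t^*\} \in F_\varepsilon\}$ and $\{0 \le B_1 \le \varepsilon\}$ are
independent. Therefore, $P(\{B_t\} \in F \mid 0 \le B_1 \le \varepsilon) \le P(\{B_t^*\} \in F_\varepsilon)$ for any $\varepsilon > 0$. Since
$F$ is closed, the events $\{\{B_t^*\} \in F_\varepsilon\}$ decrease to $\{\{B_t^*\} \in F\}$ as $\varepsilon \downarrow 0$, and tightness
follows from the continuity of the probability measure $P$ and Alexandrov's Theorem
T.8.1(ii).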
4. A check of tightness is also required for the Brownian meander and Brownian
excursion as described in Exercises 12.6 and 12.7, respectively. For this, consult
R. T. Durrett, D. L. Iglehart, and D. R. Miller (1977), "Weak Convergence to
Brownian Meander and Brownian Excursion," Ann. Probab., 5, pp. 117-129. The
distribution of the extremal functionals outlined in the exercises can also be found
in R. T. Durrett and D. L. Iglehart (1977), "Functionals of Brownian Meander and
Brownian Excursion," Ann. Probab., 5, pp. 130-135; K. L. Chung (1976), "Excursions
in Brownian Motion," Ark. Mat., pp. 155-177; D. P. Kennedy (1976), "Maximum
Brownian Excursion," J. Appl. Probability, 13, pp. 371-376. Durrett, Iglehart, and Miller
(1977) also show that the * and + commute in the sense that the Brownian excursion
can be obtained either by a meander of the Brownian bridge (as done in Exercise
12.7) or as a bridge of the meander (i.e., conditioning the meander in the sense of
theoretical complement 3 above). Brownian meander and Brownian excursion have
been defined in a variety of other ways in work originating in the late 1940's with
Paul Lévy; see P. Lévy (1965), Processus Stochastiques et Mouvement Brownien,
Gauthier-Villars, Paris. The theory was extended and terminology introduced in
K. Itô and H. P. McKean, Jr. (1965), Diffusion Processes and Their Sample Paths,
Springer-Verlag, New York. The general theory is introduced in D. Williams (1979),
Diffusions, Markov Processes, and Martingales, Vol. 1, Wiley, New York. A much
fuller theory is then given in L. C. G. Rogers and D. Williams (1987), Diffusions,
Markov Processes, Martingales, Vol. II, Wiley, New York. Approaches from the point
of view of Markov processes (see theoretical complement 11.2, Chapter V) having
nonstationary transition law are possible. Another very useful approach is from the
point of view of FCLTs for random walks conditioned on a late return to zero; see
W. D. Kaigh (1976), "An Invariance Principle for Random Walk Conditioned by a
Late Return to Zero," Ann. Probab., 4(1), pp. 115-121, and references therein. A
connection with extreme values of branching processes is described in theoretical
complement 11.2, Chapter V.
j P max
n=1
sup Xq  XXI2 "i >
(0_<k1.2 qeJ,,,,, D
n1 < oo . (T.13.1)
By the Borel-Cantelli lemma we will get from this that with probability 1, for all $n$
sufficiently large,
1
max sup IXq  Xkf2 .,I < . (T.13.2)
OEk<n2" geJ,,.kn D n
_>
In particular, it will follow that with probability 1, for every $t > 0$, $q \to X_q$ is uniformly
continuous on $D \cap [0, t]$. Thus, almost all sample paths of $\{X_q : q \in D\}$ have a unique
extension to continuous functions $\{B_t : t \ge 0\}$. That is, letting $C = \{\omega \in \Omega$ : for each
$t > 0$, $q \to X_q(\omega)$ is uniformly continuous on $D \cap [0, t]\}$, define for $\omega \in C$,
$$B_t(\omega) = \begin{cases} X_q(\omega), & \text{if } t = q \in D, \\ \lim_{q \downarrow t} X_q(\omega), & \text{if } t \notin D, \end{cases} \tag{T.13.3}$$
where the limit is over dyadic rational $q$ decreasing to $t$. By construction, $\{B_t : t \ge 0\}$
has continuous paths with probability 1. Moreover, for $0 < t_1 < \cdots < t_k$, with probability
one, $(B_{t_1}, \ldots, B_{t_k}) = \lim_{n \to \infty} (X_{q_1^{(n)}}, \ldots, X_{q_k^{(n)}})$ for dyadic rational $q_1^{(n)}, \ldots, q_k^{(n)}$
decreasing to $t_1, \ldots, t_k$. Also, the random vector $(X_{q_1^{(n)}}, \ldots, X_{q_k^{(n)}})$ has the multivariate
normal distribution with mean vector 0 and variance-covariance matrix
$\sigma_{ij} = \min(t_i, t_j)$, $1 \le i, j \le k$, as a limiting distribution. It follows from these
two facts that this must be the distribution of $(B_{t_1}, \ldots, B_{t_k})$. Thus, $\{B_t\}$ is a standard
Brownian motion process.
To verify the condition (T.13.1) for the Borel-Cantelli lemma, just note that by
the maximal inequality (see Exercises 4.3, 13.11),
P( max IX,+,a2
(s2"
a) 2P(IX,+a X11 % a)
2
4 E(X1+, X,) 4
a
66'
a
(T.13.4)
since the increments of {X,} are independent and Gaussian with mean 0. Now since
the events {max 142 .IX, +1a2 , X, a} increase with m, we have, letting m * oo,
Thus,
/ 1
Pl max sup Xq Xk12 .4 >  Pf sup (Xq Xk,2"I >
O^kan2"qeJ".knD n k=0 \qeJ",k,D n
n2"
6^ri2 ^
( n)2
)4 =
6
5
(T.13.6)
which is summable.
1 MARKOV DEPENDENCE
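Definition 1.1. A stochastic process $\{X_0, X_1, \ldots, X_n, \ldots\}$ has the Markov
property if, for each $n$ and $m$, the conditional distribution of $X_{n+1}, \ldots, X_{n+m}$
given $X_0, X_1, \ldots, X_n$ is the same as its conditional distribution given $X_n$ alone.
Proposition 1.1. A stochastic process $X_0, X_1, X_2, \ldots$ has the Markov property if and
only if, for each $n$, the conditional distribution of $X_{n+1}$ given $X_0, X_1, \ldots, X_n$
is a function only of $X_n$.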
Proof. For simplicity, take the state space S to be countable. The necessity of
the condition is obvious. For sufficiency, observe that
110 DISCRETEPARAMETER MARKOV CHAINS
The last equality follows from the hypothesis of the proposition. Thus the
conditional distribution of the future as a function of the past and present states
$i_0, i_1, \ldots, i_n$ depends only on the present state $i_n$. This is, therefore, the
conditional distribution given $X_n = i_n$ (Exercise 1). ∎
An i.i.d. sequence and a random walk are merely two examples of Markov
chains. To define a general Markov chain, it is convenient to introduce a matrix
p to describe the probabilities of transition between successive states in the
evolution of the process.
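A transition probability matrix, or stochastic matrix, is a square
matrix $p = ((p_{ij}))$, where $i$ and $j$ vary over a finite or denumerable set $S$, satisfying
(i) $p_{ij} \ge 0$ for all $i$ and $j$,
(ii) $\sum_{j \in S} p_{ij} = 1$ for all $i$.
The set $S$ is called the state space and its elements are states.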
Think of a particle that moves from point to point in the state space according
to the following scheme. At time $n = 0$ the particle is set in motion either by
starting it at a fixed state $i_0$, called the initial state, or by randomly locating it
in the state space according to a probability distribution $\pi$ on $S$, called the
initial distribution. In the former case, $\pi$ is the distribution concentrated at the
state $i_0$, i.e., $\pi_j = 1$ if $j = i_0$, $\pi_j = 0$ if $j \ne i_0$. In the latter case, the probability
is $\pi_i$ that at time zero the particle will be found in state $i$, where $0 \le \pi_i \le 1$ and
$\sum_i \pi_i = 1$. Given that the particle is in state $i_0$ at time $n = 0$, a random trial is
performed, assigning probability $p_{i_0 j'}$ to the respective states $j' \in S$. If the
outcome of the trial is the state $i_1$, then the particle moves to state $i_1$ at time
$n = 1$. A second trial is performed with probabilities $p_{i_1 j'}$ of states $j' \in S$. If the
outcome of the second trial is $i_2$, then the particle moves to state $i_2$ at time
$n = 2$, and so on.
A typical sample point of this experiment is a sequence of states, say
$(i_0, i_1, i_2, \ldots, i_n, \ldots)$, representing a sample path. The set of all such sample
paths is the sample space $\Omega$. The position $X_n$ at time $n$ is a random variable
whose value is given by $X_n = i_n$ if the sample path is $(i_0, i_1, \ldots, i_n, \ldots)$. The
precise specification of the probability $P_\pi$ on $\Omega$ for the above experiment is
given by
$$P_\pi(X_0 = i_0, X_1 = i_1, \ldots, X_n = i_n) = \pi_{i_0}\, p_{i_0 i_1}\, p_{i_1 i_2} \cdots p_{i_{n-1} i_n}. \tag{2.1}$$
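Formula (2.1) is a product along the path, which the following sketch evaluates exactly with rational arithmetic. The two-state chain, the initial distribution, and the function name `path_probability` are hypothetical choices made only for illustration.

```python
from fractions import Fraction as Fr
from itertools import product

def path_probability(pi, p, path):
    """P_pi(X_0 = i_0, ..., X_n = i_n) computed as in (2.1)."""
    prob = pi[path[0]]
    for a, b in zip(path, path[1:]):
        prob *= p[a][b]   # one factor p_{ab} per transition a -> b
    return prob

pi = [Fr(1, 2), Fr(1, 2)]        # hypothetical initial distribution on S = {0, 1}
p = [[Fr(1, 4), Fr(3, 4)],       # hypothetical transition probability matrix
     [Fr(1, 2), Fr(1, 2)]]

example = path_probability(pi, p, (0, 1, 1, 0))
total = sum(path_probability(pi, p, path) for path in product(range(2), repeat=4))
```

Summing (2.1) over all paths of a fixed length gives 1, which is a useful consistency check on any transition matrix entered by hand.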
nj ) _ P kcS
ik Pkj (2.4)
The elements of the matrix p" are defined recursively by p" = p" 'p so that the 
It is easily checked by induction on n that the expression for p;n is given directly )
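Now let us check the Markov property of this probability model. Using (2.1)
and summing over unrestricted coordinates, the joint distribution of
$X_0, X_{n_1}, X_{n_2}, \ldots, X_{n_k}$, with $0 = n_0 < n_1 < n_2 < \cdots < n_k$, is given by
$$P_\pi(X_0 = i, X_{n_1} = j_1, X_{n_2} = j_2, \ldots, X_{n_k} = j_k) = \sum_1 \sum_2 \cdots \sum_k \big(\pi_i\, p_{i i_1} p_{i_1 i_2} \cdots p_{i_{n_1 - 1} j_1}\big)\big(p_{j_1 i_{n_1+1}} p_{i_{n_1+1} i_{n_1+2}} \cdots p_{i_{n_2 - 1} j_2}\big) \cdots \big(p_{j_{k-1} i_{n_{k-1}+1}} \cdots p_{i_{n_k - 1} j_k}\big),$$
where $\sum_r$ is the sum over the $r$th block of indices $i_{n_{r-1}+1}, i_{n_{r-1}+2}, \ldots, i_{n_r - 1}$
($r = 1, 2, \ldots, k$). The sum $\sum_k$, keeping indices in all other blocks fixed, yields
the factor $p_{j_{k-1} j_k}^{(n_k - n_{k-1})}$ using (2.6) for the last group of terms. Next sum successively
over the $(k - 1)$st, ..., second, and first blocks of factors to get
$$P_\pi(X_0 = i, X_{n_1} = j_1, X_{n_2} = j_2, \ldots, X_{n_k} = j_k) = \pi_i\, p_{i j_1}^{(n_1)}\, p_{j_1 j_2}^{(n_2 - n_1)} \cdots p_{j_{k-1} j_k}^{(n_k - n_{k-1})}. \tag{2.8}$$
Summing (2.8) over $i \in S$ gives
$$P_\pi(X_{n_1} = j_1, X_{n_2} = j_2, \ldots, X_{n_k} = j_k) = \Big(\sum_{i \in S} \pi_i\, p_{i j_1}^{(n_1)}\Big)\, p_{j_1 j_2}^{(n_2 - n_1)} \cdots p_{j_{k-1} j_k}^{(n_k - n_{k-1})}. \tag{2.9}$$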
Although by Proposition 1.1 the case $m = 1$ would have been sufficient to prove
the Markov property, (2.10) justifies the terminology that $p^m := ((p_{ij}^{(m)}))$ is the
$m$-step transition probability matrix. Note that $p^m$ is a stochastic matrix for all
$m \ge 1$.
The calculation of the distribution of $X_m$ follows from (2.10). We have,
$$P_\pi(X_m = j) = \sum_{i \in S} \pi_i\, p_{ij}^{(m)} = (\pi' p^m)_j,$$
where $\pi'$ is the transpose of the column vector $\pi$, and $(\pi' p^m)_j$ is the $j$th element
of the row vector $\pi' p^m$.
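The row-vector formula for the distribution of $X_m$ can be sketched as follows. The chain and the helper names (`mat_mul`, `mat_pow`, `distribution_at`) are hypothetical; exact fractions are used so that the stochastic-matrix property of $p^m$ can be checked without rounding.

```python
from fractions import Fraction as Fr

def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(p, m):
    """m-step transition matrix p^m (p^0 is the identity)."""
    n = len(p)
    result = [[Fr(int(i == j)) for j in range(n)] for i in range(n)]
    for _ in range(m):
        result = mat_mul(result, p)
    return result

def distribution_at(pi, p, m):
    """Row vector pi' p^m, i.e., the distribution of X_m under P_pi."""
    pm = mat_pow(p, m)
    return [sum(pi[i] * pm[i][j] for i in range(len(p))) for j in range(len(p))]

p = [[Fr(1, 4), Fr(3, 4)],       # hypothetical two-state transition matrix
     [Fr(1, 2), Fr(1, 2)]]
two_step = mat_pow(p, 2)
dist_2 = distribution_at([Fr(1), Fr(0)], p, 2)   # start in state 0
```

Starting from a point mass at state 0, the distribution at time 2 is simply the first row of $p^2$.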
3 SOME EXAMPLES
The transition probabilities for some familiar Markov chains are given in the
examples of this section. Although they are excluded from the general
development of this chapter, examples of a non-Markov process and a Markov
process having a nonhomogeneous transition law are both supplied under
Example 8 below.
This means that if the process is now in state $i$ it must be in state $h(i)$ at the
next instant. In this case, if the initial state $X_0$ is known then one knows the
entire future. Thus, if $X_0 = i$, then
$$X_1 = h(i), \quad X_2 = h(h(i)) := h^{(2)}(i), \ \ldots, \quad X_n = h(h^{(n-1)}(i)) = h^{(n)}(i), \ \ldots.$$
Hence $p_{ij}^{(n)} = 1$ if $j = h^{(n)}(i)$ and $p_{ij}^{(n)} = 0$ if $j \ne h^{(n)}(i)$. Pseudorandom number
$$p_{ij} = p_j \qquad (i \in S,\ j \in S). \tag{3.2}$$
$$p_{ij} = \begin{cases} p & \text{if } j = i + 1, \\ q & \text{if } j = i - 1, \\ 0 & \text{if } |j - i| > 1. \end{cases} \tag{3.3}$$
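$$p_{ij} = \begin{cases} p & \text{if } j = i + 1 \text{ and } c < i < d, \\ q & \text{if } j = i - 1 \text{ and } c < i < d, \end{cases} \qquad p_{c,c+1} = 1, \quad p_{d,d-1} = 1. \tag{3.4}$$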
In this case, if at any point of time the particle finds itself in state $c$, then at
the next instant of time it moves with probability 1 to $c + 1$. Similarly, if it is
at $d$ at any point of time it will move to $d - 1$ at the next instant. Otherwise
(i.e., in the interior of $[c, d]$), its motion is like that of a simple random walk.
For $c < i < d$, $p_{ij}$ is as defined by (3.3) or (3.4). In this case, once the particle
reaches $c$ (or $d$) it stays there forever.
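The two boundary behaviors differ only in the first and last rows of the transition matrix, as the sketch below shows. The function name `random_walk_matrix` and the string flag `boundary` are hypothetical; state $c + k$ is stored at index $k$.

```python
def random_walk_matrix(c, d, p, boundary):
    """Transition matrix of a simple random walk on {c, ..., d} (requires d > c).

    boundary = "reflecting": p_{c,c+1} = 1 and p_{d,d-1} = 1, as in (3.4).
    boundary = "absorbing":  p_{c,c} = 1 and p_{d,d} = 1.
    """
    q = 1.0 - p
    n = d - c + 1
    m = [[0.0] * n for _ in range(n)]
    for k in range(1, n - 1):        # interior states move as in (3.3)
        m[k][k + 1] = p
        m[k][k - 1] = q
    if boundary == "reflecting":
        m[0][1] = 1.0
        m[n - 1][n - 2] = 1.0
    elif boundary == "absorbing":
        m[0][0] = 1.0
        m[n - 1][n - 1] = 1.0
    else:
        raise ValueError("boundary must be 'reflecting' or 'absorbing'")
    return m

reflecting = random_walk_matrix(0, 3, 0.6, "reflecting")
absorbing = random_walk_matrix(0, 3, 0.6, "absorbing")
```

In both cases every row sums to 1, so each matrix is a transition probability matrix on $\{c, \ldots, d\}$.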
One may think of this Markov chain as a partial-sum process as follows. Let
$X_0$ have a distribution $\pi$. Let $Z_1, Z_2, \ldots$ be a sequence of i.i.d. random variables
with common distribution $Q$ and independent of $X_0$. Then,
$$X_n = X_0 + Z_1 + \cdots + Z_n \qquad (n \ge 1)$$
is a Markov chain with the transition probability (3.6). Also note that Example
3 is a special case of Example 6, with $Q(-1) = q$, $Q(1) = p$, and $Q(i) = 0$ for
$i \ne \pm 1$.
The last row says that "zero" is an absorbing state, i.e., if at any point of time
$X_n = 0$, then $X_m = 0$ for all $m \ge n$, and extinction occurs.
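$$P(X_1 = \varepsilon_1, \ldots, X_n = \varepsilon_n) = \frac{[r + (s-1)c][r + (s-2)c] \cdots r\, [b + (\tau - 1)c] \cdots b}{[r + b + (n-1)c][r + b + (n-2)c] \cdots [r + b]}, \tag{3.9}$$
where
$$s = \sum_{k=1}^{n} \varepsilon_k, \qquad \tau = n - s. \tag{3.10}$$
$$P(X_1 = 1, \ldots, X_n = 1) = \frac{[r + (n-1)c] \cdots r}{[r + b + (n-1)c] \cdots [r + b]}, \tag{3.11}$$
$$P(X_1 = 0, \ldots, X_n = 0) = \frac{[b + (n-1)c] \cdots b}{[r + b + (n-1)c] \cdots [r + b]}. \tag{3.12}$$
In particular,
$$P(X_n = \varepsilon_n \mid X_1 = \varepsilon_1, \ldots, X_{n-1} = \varepsilon_{n-1}) = \frac{P(X_1 = \varepsilon_1, \ldots, X_n = \varepsilon_n)}{P(X_1 = \varepsilon_1, \ldots, X_{n-1} = \varepsilon_{n-1})} = \begin{cases} \dfrac{r + s_{n-1} c}{r + b + (n-1)c} & \text{if } \varepsilon_n = 1, \\[2ex] \dfrac{b + \tau_{n-1} c}{r + b + (n-1)c} & \text{if } \varepsilon_n = 0. \end{cases} \tag{3.13}$$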
It follows that $\{X_n\}$ is non-Markov unless $c = 0$ (in which case $\{X_n\}$ is i.i.d.).
Note, however, that $\{X_n\}$ does have a distinctive symmetry property reflected
in (3.9). Namely, the joint distribution is a function of $s = \sum_{k=1}^{n} \varepsilon_k$ only, and
is therefore invariant under permutations of $\varepsilon_1, \ldots, \varepsilon_n$. Such a stochastic process
is called exchangeable (or symmetrically dependent). The Pólya urn model was
originally introduced to illustrate a notion of "contagious disease" or "accident
proneness" for actuarial mathematics. Although $\{X_n\}$ is non-Markov for $c \ne 0$,
it is interesting to note that the partial-sum process $\{S_n\}$, representing the
evolution of accumulated numbers of red balls sampled, does have the Markov
property. From (3.13) one can also get that
STOPPING TIMES AND THE STRONG MARKOV PROPERTY 117
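$$P(S_n = s \mid S_1 = s_1, \ldots, S_{n-1} = s_{n-1}) = \begin{cases} \dfrac{r + c\, s_{n-1}}{r + b + (n-1)c} & \text{if } s = 1 + s_{n-1}, \\[2ex] \dfrac{b + (n - 1 - s_{n-1})c}{r + b + (n-1)c} & \text{if } s = s_{n-1}. \end{cases} \tag{3.14}$$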
Observe that the transition law (3.14) depends explicitly on the time point $n$.
In other words, the partial-sum process $\{S_n\}$ is a Markov process with a
nonhomogeneous transition law. A related continuous-time version of this
Markov process, again usually called the Pólya process, is described in Exercise
1 of Chapter IV, Section 4.1. An alternative model for contagion is also given
in Example 1 of Chapter IV, Section 4, and that one has a homogeneous
transition law.
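Exchangeability of the Pólya urn sequence is easy to check exactly for small cases. The sketch below assumes the usual urn scheme (start with $r$ red and $b$ black balls, and after each draw return the ball together with $c$ more of the same color); the function names are hypothetical. It compares a draw-by-draw computation with the closed form (3.9).

```python
from fractions import Fraction as Fr
from itertools import permutations

def sequential_probability(eps, r, b, c):
    """P(X_1 = eps_1, ..., X_n = eps_n), multiplying draw-by-draw probabilities."""
    prob, red, black = Fr(1), r, b
    for e in eps:
        total = red + black
        if e == 1:
            prob *= Fr(red, total)
            red += c          # red drawn: c more red balls are added
        else:
            prob *= Fr(black, total)
            black += c        # black drawn: c more black balls are added
    return prob

def closed_form(eps, r, b, c):
    """Right side of (3.9); it depends on eps only through s = sum(eps)."""
    n, s = len(eps), sum(eps)
    tau = n - s
    num = Fr(1)
    for k in range(s):
        num *= r + k * c
    for k in range(tau):
        num *= b + k * c
    den = Fr(1)
    for k in range(n):
        den *= r + b + k * c
    return num / den

value = sequential_probability((1, 0, 1), 2, 3, 1)
perm_values = {sequential_probability(e, 2, 3, 1) for e in permutations((1, 1, 0, 0))}
```

All rearrangements of a given 0-1 pattern receive the same probability, which is the symmetry property noted in the text.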
One of the most useful general properties of a Markov chain is that the Markov
property holds even when the "past" is given up to certain types of random
times. Indeed, we have tacitly used it in proving that the simple symmetric
random walk reaches every state infinitely often with probability 1 (see
Eq. 3.18 of Chapter 1). These special random times are called stopping times
or (less appropriately) Markov times.
If $\omega$ is such that $Y_n(\omega) \ne y$ whatever be $n$ (i.e., if the process never reaches $y$),
then take $\tau_y(\omega) = \infty$. Observe that
Hence $\tau_y$ is a stopping time. The $r$th return times $\tau_y^{(r)}$ of $y$ are defined recursively by
Once again, the infimum over an empty set is to be taken as $\infty$. Now whether
or not the process has reached (or hit) the state $y$ at least $r$ times by the time
$m$ depends entirely on the values of $Y_1, \ldots, Y_m$. Indeed, $\{\tau_y^{(r)} \le m\}$ is precisely
the event that at least $r$ of the variables $Y_1, \ldots, Y_m$ equal $y$. Hence $\tau_y^{(r)}$ is a
stopping time. On the other hand, if $\eta_y$ denotes the last time the process reaches
the state $y$, then $\eta_y$ is not a stopping time; for whether or not $\eta_y \le m$ cannot
in general be determined without observing the entire process $\{Y_n\}$.
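The defining feature of a stopping time, that the event $\{\tau \le m\}$ can be decided from the path up to time $m$, can be illustrated on a finite observed path. In this sketch the function names are hypothetical, visits are counted at times $n \ge 1$ (matching the use of $Y_1, \ldots, Y_m$ above), and `None` stands for "not reached within the observed horizon".

```python
def rth_return(path, y, r):
    """Time of the r-th visit to y at a time n >= 1, or None if not observed."""
    count = 0
    for n in range(1, len(path)):
        if path[n] == y:
            count += 1
            if count == r:
                return n
    return None

def hit_by(prefix, y, r, m):
    """Decide {tau_y^(r) <= m} using only Y_1, ..., Y_m."""
    return sum(1 for v in prefix[1:m + 1] if v == y) >= r

def last_visit(path, y):
    """Last time y is visited on the observed path (needs the whole path)."""
    times = [n for n in range(1, len(path)) if path[n] == y]
    return times[-1] if times else None

path = [0, 1, 0, 2, 0, 1]   # hypothetical observed values Y_0, ..., Y_5
```

Note that `hit_by` never looks beyond index `m`, whereas `last_visit` changes if the path is extended, which mirrors the reason the last visit time fails to be a stopping time.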
Let $S$ be a countable state space and $p$ a transition probability matrix on $S$,
and let $P_\pi$ denote the distribution of the Markov process with transition
probability $p$ and initial distribution $\pi$. It will be useful to identify the events
that depend on the process up to time $n$. For this, let $\Omega$ denote the set of all
sequences $\omega = (i_0, i_1, i_2, \ldots)$ of states, and let $Y_n(\omega)$ be the $n$th coordinate of $\omega$
(if $\omega = (i_0, i_1, \ldots, i_n, \ldots)$, then $Y_n(\omega) = i_n$). Let $\mathcal{F}_n$ denote the class of all events
that depend only on $Y_0, Y_1, \ldots, Y_n$. Then the $\mathcal{F}_n$ form an increasing sequence
of sigma-fields of finite-dimensional events. The Markov property says that given
the "past" $Y_0, Y_1, \ldots, Y_m$ up to time $m$, or given $\mathcal{F}_m$, the conditional distribution
of the "after-$m$" stochastic process $Y_m^+ = \{(Y_m^+)_n\} := \{Y_{m+n} : n = 0, 1, \ldots\}$ is $P_{Y_m}$.
In other words, if the process is reindexed after time $m$ with $m + n$ being
regarded as time $n$, then this stochastic process is conditionally distributed as
a Markov chain having transition probability $p$ and initial state $Y_m$.
Suppose now that $\tau$ is a stopping time. "Given the past up to time $\tau$" means
given the values of $\tau$ and $Y_0, Y_1, \ldots, Y_\tau$. By the "after-$\tau$" process we now mean
the stochastic process
$$Y_\tau^+ = \{Y_{\tau + n} : n = 0, 1, 2, \ldots\},$$
which is well defined on the set $\{\tau < \infty\}$.
Theorem 4.1. Every Markov chain $\{Y_n : n = 0, 1, 2, \ldots\}$ has the strong Markov
property; that is, for every stopping time $\tau$, the conditional distribution of the
after-$\tau$ process $Y_\tau^+ = \{Y_{\tau+n} : n = 0, 1, 2, \ldots\}$, given the past up to time $\tau$, is $P_{Y_\tau}$
on the set $\{\tau < \infty\}$.
Proof. Choose and fix a nonnegative integer m and a positive integer k along
with k time points 0 _< m, < m 2 < < m k , and states i o , i l , ... 'im'
Now all the steps in (4.7) remain valid if one replaces Ty l) by i;,' 1) and T;,2) by
zy'r and assumes that < oo almost surely. Hence, by induction,
P (') < oo) = 1 for all positive integers r. This is equivalent to asserting
The unrestricted simple random walk $\{S_n\}$ is an example in which any state
$i \in S$ can be reached from every state $j$ in a finite number of steps with positive
probability. If $p$ denotes its transition probability matrix, then $p^2$ is the transition
probability matrix of $\{Y_n\} := \{S_{2n} : n = 0, 1, 2, \ldots\}$. However, for the Markov
chain $\{Y_n\}$, transitions in a finite number of steps are possible from odd to odd
integers and from even to even, but not otherwise. For $\{S_n\}$ one says that there
is one class of "essential" states and for $\{Y_n\}$ that there are two classes of
essential states.
A different situation occurs when the random walk has two absorbing
boundaries on $S = \{c, c + 1, \ldots, d - 1, d\}$. The states $c$, $d$ can be reached (with
positive probability) from $c + 1, \ldots, d - 1$. However, $c + 1, \ldots, d - 1$ cannot
A CLASSIFICATION OF STATES OF A MARKOV CHAIN 121
>,
behavior of the process. If a chain has several essential classes, the process
restricted to each class can be analyzed separately.
Definition 5.1. Write $i \to j$ and read it as either "$j$ is accessible from $i$" or "the
process can go from $i$ to $j$" if $p_{ij}^{(n)} > 0$ for some $n \ge 1$.
Since
$$p_{ij}^{(n)} = \sum_{i_1, i_2, \ldots, i_{n-1} \in S} p_{i i_1}\, p_{i_1 i_2} \cdots p_{i_{n-1} j}, \tag{5.1}$$
$i \to j$ if and only if there exists one chain $(i, i_1, i_2, \ldots, i_{n-1}, j)$ such that
$p_{i i_1}, p_{i_1 i_2}, \ldots, p_{i_{n-1} j}$ are strictly positive.
Proposition 5.1
(a) For every $i$ there exists (at least one) $j$ such that $i \to j$.
Proof. (a) For each $i$, $\sum_{j \in S} p_{ij} = 1$. Hence there exists at least one $j$ for which
$p_{ij} > 0$; for this $j$ one has $i \to j$.
(b) $i \to j$, $j \to k$ means that there exist $m \ge 1$, $n \ge 1$ such that $p_{ij}^{(m)} > 0$, $p_{jk}^{(n)} > 0$.
pi
l
m+1)
= li Pu
i
(m) = pif Pi (m) + pilp it ) > 0. (5.3)
Ics )3E.%
i.e., there exists $m' \ge 1$ such that $p_{jk}^{(m')} > 0$. Then, by (b), $i \to k$. Since $i$ is essential,
one must have $k \to i$. Together with $i \to j$ this implies (again by (b)) $k \to j$.
Thus, if any state $k$ is accessible from $j$, then $j$ is accessible from that state $k$,
proving that $j$ is essential.
(e) If 60 is empty (which is possible, as for example in the case p 1 = 1,
and $j \leftrightarrow k$. Hence $i \to k$ (by (b)). Also, $k \to j$ and $j \to i$ imply $k \to i$ (again by
(b)). Hence $i \leftrightarrow k$. This shows that "$\leftrightarrow$" is transitive (on $\mathcal{E}$ as well as on $S$).
From the proof of (e) the relation "$\leftrightarrow$" is seen to be symmetric and transitive
on all of $S$ (and not merely $\mathcal{E}$). However, it is not generally true that $i \leftrightarrow i$ (or,
$i \to i$) for all $i \in S$. In other words, reflexivity may break down on $S$.
Definition 5.3. A transition probability matrix p having one essential class and
no inessential states is called irreducible.
Now fix attention on $\mathcal{E}$. Distinct subsets of essential states can be identified
according to the following considerations. Let $i \in \mathcal{E}$. Consider the set
$\mathcal{E}(i) = \{j \in \mathcal{E} : i \to j\}$. Then, by (d), $i \leftrightarrow j$ for all $j \in \mathcal{E}(i)$. Indeed, if $j, k \in \mathcal{E}(i)$, then
$j \leftrightarrow k$ (for $j \to i$, $i \to k$ imply $j \to k$; similarly, $k \to j$). Thus, all members of $\mathcal{E}(i)$
communicate with each other. Let $r \in \mathcal{E}$, $r \notin \mathcal{E}(i)$. Then $r$ is not accessible from
a state in $\mathcal{E}(i)$ (for, if $j \in \mathcal{E}(i)$ and $j \to r$, then $i \to j$, $j \to r$ will imply $i \to r$ so
that $r \in \mathcal{E}(i)$, a contradiction). Define $\mathcal{E}(r) = \{j \in \mathcal{E} : r \to j\}$. Then, as before, all
states in $\mathcal{E}(r)$ communicate with each other. Also, no state in $\mathcal{E}(r)$ is accessible
from any state in $\mathcal{E}(i)$ (for if $l \in \mathcal{E}(r)$, and $j \in \mathcal{E}(i)$ and $j \to l$, then $i \to l$; but $r \leftrightarrow l$,
so that $i \to l$, $l \to r$ implying $i \to r$, a contradiction). In this manner, one
decomposes $\mathcal{E}$ into a number of disjoint classes, each class being a maximal set
of communicating states. No member of one class is accessible from any member
of a different class. Also note that if $k \in \mathcal{E}(i)$, then $\mathcal{E}(i) = \mathcal{E}(k)$. For if $j \in \mathcal{E}(i)$,
then $j \to i$, $i \to k$ imply $j \to k$; and since $j$ is essential one has $k \to j$. Hence
$j \in \mathcal{E}(k)$. The classes into which $\mathcal{E}$ decomposes are called equivalence classes.
In the case of the unrestricted simple random walk $\{S_n\}$, we have
$\mathcal{E} = S = \{0, \pm 1, \pm 2, \ldots\}$, and all states in $\mathcal{E}$ communicate with each other;
only one equivalence class. While for $\{X_n\} = \{S_{2n}\}$, $\mathcal{E} = S$ consists of two disjoint
equivalence classes, the odd integers and the even integers.
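For a finite state space the decomposition into equivalence classes of essential states can be computed by a graph search on the positive entries of $p$. The sketch below takes a state $i$ to be essential when $i \to j$ implies $j \to i$ (the characterization used in the argument above); the function names and the four-state absorbing random walk are hypothetical illustrations.

```python
def reachable(p, i):
    """Set of states j with i -> j, i.e., p_ij^(n) > 0 for some n >= 1."""
    seen, stack = set(), [i]
    while stack:
        a = stack.pop()
        for b, prob in enumerate(p[a]):
            if prob > 0 and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen

def essential_classes(p):
    """Return (list of essential states, list of their equivalence classes)."""
    n = len(p)
    reach = [reachable(p, i) for i in range(n)]
    essential = [i for i in range(n) if all(i in reach[j] for j in reach[i])]
    classes = []
    for i in essential:
        cls = frozenset(reach[i])   # for essential i this is the class E(i)
        if cls not in classes:
            classes.append(cls)
    return essential, classes

# Random walk on {0, 1, 2, 3} with absorbing boundaries at 0 and 3.
p = [[1.0, 0.0, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5],
     [0.0, 0.0, 0.0, 1.0]]
essential, classes = essential_classes(p)
```

Here the interior states 1 and 2 are inessential, and each absorbing boundary forms its own equivalence class.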
Our last item of bookkeeping concerns the role of possible cyclic motions
within an essential class. In the unrestricted simple random walk example, note
that $p_{ii} = 0$ for all $i = 0, \pm 1, \pm 2, \ldots$, but $p_{ii}^{(2)} = 2pq > 0$. In fact $p_{ii}^{(n)} = 0$ for
all odd $n$, and $p_{ii}^{(n)} > 0$ for all even $n$. In this case, we say that the period of $i$
is 2.
Proposition 5.2
(a) If $i \leftrightarrow j$ then $i$ and $j$ possess the same period. In particular "period" is
constant on each equivalence class.
(b) Let $i \in \mathcal{E}$ have a period $d = d_i$. For each $j \in \mathcal{E}(i)$ there exists a unique
integer $r_j$, $0 \le r_j \le d - 1$, such that $p_{ij}^{(n)} > 0$ implies $n \equiv r_j \pmod d$ (i.e.,
either $n = r_j$ or $n = sd + r_j$ for some integer $s \ge 1$).
for all positive integers $a$, $m$, $b$. Choose $a$ and $b$ such that $p_{ij}^{(a)} > 0$ and $p_{ji}^{(b)} > 0$.
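$$p = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & \tfrac{1}{2} & 0 & \tfrac{1}{2} \\ 1 & 0 & 0 & 0 \end{pmatrix}$$
[Transition diagram for the states 1, 2, 3, 4.]
Thus $p_{11}^{(2)} = 0$, $p_{11}^{(4)} > 0$, $p_{11}^{(6)} > 0$, etc., and $p_{11}^{(n)} = 0$ for all odd $n$. The states
communicate with each other and their common period is 2, although
$\min\{n : p_{11}^{(n)} > 0\} = 4$. Note that $\min\{n \ge 1 : p_{ii}^{(n)} > 0\}$ is a multiple of $d_i$, since $d_i$
divides all $n$ for which $p_{ii}^{(n)} > 0$. Thus, $d_i \le \min\{n \ge 1 : p_{ii}^{(n)} > 0\}$.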
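The distinction between the period and the first possible return time can be checked numerically. The sketch below takes the period of $i$ to be the greatest common divisor of $\{n \ge 1 : p_{ii}^{(n)} > 0\}$ (the standard definition, assumed here), and scans only a finite horizon, so it is a heuristic rather than a proof. The four-state matrix is one chain with the stated properties, with states 1 to 4 stored at indices 0 to 3; the function names are hypothetical.

```python
from math import gcd

def return_times(p, i, horizon):
    """All n in 1..horizon with p_ii^(n) > 0, tracking only which states are reachable."""
    support, times = {i}, []
    for n in range(1, horizon + 1):
        support = {b for a in support for b in range(len(p)) if p[a][b] > 0}
        if i in support:
            times.append(n)
    return times

def period(p, i, horizon=50):
    """gcd of the possible return times to i found within the horizon."""
    d = 0
    for n in return_times(p, i, horizon):
        d = gcd(d, n)
    return d

p = [[0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.5, 0.0, 0.5],
     [1.0, 0.0, 0.0, 0.0]]

times_from_first_state = return_times(p, 0, 8)
periods = [period(p, i) for i in range(4)]
```

The first state can return only at times 4, 6, 8, ..., so its period is 2 even though the earliest possible return takes 4 steps.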
Proposition 5.3. Let $i \in \mathcal{E}$ have period $d > 1$. Let $C_r$ be the set of $j \in \mathcal{E}(i)$ such
that $r_j = r$, where $r_j$ is the remainder term as defined in Proposition 5.2(b). Then
(a) $C_0, C_1, \ldots, C_{d-1}$ are disjoint, $\bigcup_{r=0}^{d-1} C_r = \mathcal{E}(i)$.
(b) If $j \in C_r$, then $p_{jk} > 0$ implies $k \in C_{r+1}$, where we take $r + 1 = 0$ if
$r = d - 1$.
Here is what Proposition 5.3 means. Suppose $i$ is an essential state and has
a period $d > 1$. In one step (i.e., one time unit) the process can go from $i \in C_0$
only to some state in $C_1$ (i.e., $p_{ij} > 0$ only if $j \in C_1$). From states in $C_1$, in one
step the process can go only to states in $C_2$. This means that in two steps the
process can go from $i$ only to states in $C_2$ (i.e., $p_{ij}^{(2)} > 0$ only if $j \in C_2$), and so
on. In $d$ steps the process can go from $i$ only to states in $C_d = C_0$, completing
one cycle (of $d$ steps). Again in $d + 1$ steps the process can go from $i$ only to
states in $C_1$, and so on. In general, in $sd + r$ steps the process can go from $i$
only to states in $C_r$. Schematically, one has the picture in Figure 5.1 for the
case $d = 4$ and a fixed state $i \in C_0$ of period 4.
Example 5.5. In the case of the unrestricted simple random walk, the period
is 2 and all states are essential and communicate with each other. Fix $i = 0$.
Then $C_0 = \{0, \pm 2, \pm 4, \ldots\}$, $C_1 = \{\pm 1, \pm 3, \pm 5, \ldots\}$. If we take $i$ to be any
even integer, then $C_0$, $C_1$ are as above. If, however, we start with $i$ odd, then
$C_0 = \{\pm 1, \pm 3, \pm 5, \ldots\}$, $C_1 = \{0, \pm 2, \pm 4, \ldots\}$.
[Figure 5.1. Cyclic motion through the classes $C_0, C_1, C_2, C_3$ for $d = 4$.]
Proposition 6.1. Suppose $S$ is finite and $p_{ij} > 0$ for all $i, j$. Then there exists a
unique probability distribution $\pi = \{\pi_j : j \in S\}$ such that
and
Proof. Let $M_j^{(n)}$, $m_j^{(n)}$ denote the maximum and the minimum, respectively, of
the elements $\{p_{ij}^{(n)} : i \in S\}$ of the $j$th column of $p^n$. Since $p_{ij} \ge \delta$ and
$p_{ij} = 1 - \sum_{k \ne j} p_{ik} \le 1 - (N - 1)\delta$ for all $i$, one has
J JE)' JE)
so that
and
I (Pij P) _ y Pij
jE jEJ jEJ
Therefore,
(n + 1)(n + 1) (n) (n) (n)
nj Pi'j = Pik Pkj Pi'k Pj
k = (Pik Pi'k)Pj
k
k k k
/ (n) (n)
\Pik Pi'k)Pkj + (Pik Pi'k)Pkj
kcJ kEJ'
kJ kJ'
min)). (6.6)
_ (Mj mj )` ") ") y (Pik Pik)) (1 NS)(M
`keJ
Letting $i$, $i'$ be such that $p_{ij}^{(n+1)} = M_j^{(n+1)}$, $p_{i'j}^{(n+1)} = m_j^{(n+1)}$, one gets from (6.6),
$$M_j^{(n+1)} - m_j^{(n+1)} \le (1 - N\delta)\big(M_j^{(n)} - m_j^{(n)}\big). \tag{6.7}$$
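The contraction of column spreads expressed by (6.7) can be observed numerically. The sketch below uses a hypothetical strictly positive $3 \times 3$ transition matrix, takes $\delta$ to be its smallest entry and $N$ the number of states, and records $M_j^{(n)} - m_j^{(n)}$ for the first few powers; the helper names are illustrative only.

```python
def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def column_spreads(m):
    """M_j - m_j for each column j: largest minus smallest entry of the column."""
    n = len(m)
    return [max(row[j] for row in m) - min(row[j] for row in m) for j in range(n)]

p = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]
N = len(p)
delta = min(min(row) for row in p)   # smallest one-step transition probability
factor = 1 - N * delta               # contraction factor appearing in (6.7)

spreads, power = [], p
for _ in range(10):                  # spreads[n-1] holds the spreads of p^n
    spreads.append(column_spreads(power))
    power = mat_mul(power, p)
```

Each column spread shrinks by at least the factor $1 - N\delta$ from one power to the next, so all rows of $p^n$ approach a common limit at a geometric rate.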
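Now
$$M_j^{(n+1)} = \max_i p_{ij}^{(n+1)} = \max_i \Big(\sum_k p_{ik}\, p_{kj}^{(n)}\Big) \le \max_i \Big(\sum_k p_{ik}\, M_j^{(n)}\Big) = M_j^{(n)},$$
$$m_j^{(n+1)} = \min_i p_{ij}^{(n+1)} = \min_i \Big(\sum_k p_{ik}\, p_{kj}^{(n)}\Big) \ge \min_i \Big(\sum_k p_{ik}\, m_j^{(n)}\Big) = m_j^{(n)},$$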
bounded above by 1, (6.7) now implies that both sequences have the same limit,
say $\pi_j$. Also, $\delta \le m_j^{(1)} \le m_j^{(n)} \le \pi_j \le M_j^{(n)}$ for all $n$, so that $\pi_j \ge \delta$ for all $j$ and
one gets $\pi_j = \sum_k \pi_k p_{kj}$, proving (6.1). Since $\sum_j p_{ij}^{(n)} = 1$, taking limits, as $n \to \infty$,
IP(jv+r") 1Cjl =  v)
Pik ) (Pkj x j ) ) 1< L, Pik )(1 N(5 ' ) " = (1 Nb ' ) ,
" in i 1,
k k
(6.13)
to obtain
where [x] is the integer part of x. From here one obtains the following corollary
to Proposition 6.1.
Also,
$$P_\pi(X_m = i_0, X_{m+1} = i_1, \ldots, X_{m+n} = i_n) = (\pi' p^m)_{i_0}\, p_{i_0 i_1}\, p_{i_1 i_2} \cdots p_{i_{n-1} i_n}.$$
Now let
In a manner analogous to the proof of Proposition 6.1, one can also obtain
the following result (Exercise 9).
and
p( )(x, y) n(y)I '< [1 6(d c)]' '0 for all x, ye (c, d) (6.24)
where
Here $p^{(n)}(x, y)$ is the $n$-step transition probability density function of $X_n$ given
$X_0 = x$.
Proof. Define $\Lambda^+ = \{\lambda > 0 : Ax \ge \lambda x$ for some nonnegative nonzero vector $x\}$;
here inequalities are to be interpreted componentwise. Observe that the set $\Lambda^+$
is nonempty and bounded above by $\|A\| := \sum_{i=1}^{N} \sum_{j=1}^{N} a_{ij}$. Let $\lambda_0$ be the least
upper bound of $\Lambda^+$. There is a sequence $\{\lambda_n : n \ge 1\}$ in $\Lambda^+$ with limit $\lambda_0$ as
$n \to \infty$. Let $\{x_n : n \ge 1\}$ be corresponding nonnegative vectors, normalized so
that $\|x_n\| := \sum_i (x_n)_i = 1$, $n \ge 1$, for which $Ax_n \ge \lambda_n x_n$. Then, since $\|x_n\| = 1$,
$n = 1, 2, \ldots$, $\{x_n\}$ must have a convergent subsequence, with limit denoted $x_0$,
say. Therefore $Ax_0 \ge \lambda_0 x_0$ and hence $\lambda_0 \in \Lambda^+$. In fact, it follows from the least
upper bound property of $\lambda_0$ that $Ax_0 = \lambda_0 x_0$. For otherwise there must be a
component with strict inequality, say $\sum_{j=1}^{N} a_{1j} x_j - \lambda_0 x_1 = \delta > 0$, where
$x_0 = (x_1, \ldots, x_N)'$, and $\sum_{j=1}^{N} a_{kj} x_j - \lambda_0 x_k \ge 0$, $k = 2, \ldots, N$. But then taking
$y = (x_1 + \delta/(2\lambda_0), x_2, \ldots, x_N)'$ we get $Ay > \lambda_0 y$ with strict inequality in each
component. This contradicts the maximality of $\lambda_0$. To prove that if $\lambda$ is any
other eigenvalue then $|\lambda| \le \lambda_0$, let $z$ be an eigenvector corresponding to $\lambda$ and
define $|z| = (|z_1|, \ldots, |z_N|)'$. Then $Az = \lambda z$ implies $A|z| \ge |\lambda||z|$. Therefore, by
definition of $\lambda_0$, we have $|\lambda| \le \lambda_0$. To prove part (ii) of the Theorem we can
apply Proposition 6.1 to the transition probability matrix
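$$p_{ij} = \frac{a_{ij}\, x_j}{\lambda_0\, x_i} \qquad (1 \le i, j \le N),$$
$$p_{ij}^{(2)} = \sum_{k=1}^{N} p_{ik}\, p_{kj} = \sum_{k=1}^{N} \frac{a_{ik} x_k}{\lambda_0 x_i} \cdot \frac{a_{kj} x_j}{\lambda_0 x_k} = \frac{a_{ij}^{(2)}\, x_j}{\lambda_0^2\, x_i},$$
and inductively
$$p_{ij}^{(n)} = \frac{a_{ij}^{(n)}\, x_j}{\lambda_0^n\, x_i}.$$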
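The change of variables that turns a positive matrix into a transition probability matrix can be sketched numerically. The code below is an illustration under stated assumptions: the leading eigenpair $(\lambda_0, x)$ is approximated by plain power iteration (a swapped-in numerical technique, not the argument of the proof), and the $2 \times 2$ matrix and function names are hypothetical.

```python
def power_iteration(a, iters=200):
    """Approximate the largest eigenvalue and a positive eigenvector with sum 1."""
    n = len(a)
    x = [1.0 / n] * n
    lam = 0.0
    for _ in range(iters):
        y = [sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        lam = sum(y)               # since sum(x) = 1, this estimates lambda_0
        x = [v / lam for v in y]   # renormalize so the entries sum to 1
    return lam, x

def perron_transform(a):
    """Matrix with entries a_ij x_j / (lambda_0 x_i)."""
    lam, x = power_iteration(a)
    n = len(a)
    return [[a[i][j] * x[j] / (lam * x[i]) for j in range(n)] for i in range(n)]

a = [[2.0, 1.0],
     [1.0, 3.0]]                   # hypothetical strictly positive matrix
lam0, x0 = power_iteration(a)
p = perron_transform(a)
```

Row $i$ of the transformed matrix sums to $(Ax)_i/(\lambda_0 x_i)$, which equals 1 exactly when $x$ is an eigenvector for $\lambda_0$; this is why Proposition 6.1 becomes applicable.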
$$\sum_{j=1}^{N} a_{ij} x_j = \lambda_0 x_i, \qquad i = 1, \ldots, N, \quad x_i > 0,$$
we have
$$(Bx)_i = \sum_{j=1}^{N-1} a_{ij} x_j = \lambda_0 x_i - a_{iN} x_N < \lambda_0 x_i, \qquad i = 1, \ldots, N - 1.$$
Corollary 6.6. Let $A = ((a_{ij}))$ be a matrix of strictly positive elements and let
$\lambda_0$ be the positive eigenvalue of maximum magnitude (spectral radius). Then
$\lambda_0$ is a simple eigenvalue.
In general, the transition law $p$ may admit several essential classes, periodicities,
or inessential states. In this section, we consider the asymptotics of $p^n$ for such
cases.
First suppose that $S$ is finite and is, under $p$, a single class of periodic essential
states of period $d > 1$. Then the matrix $p^d$, regarded as a (one-step) transition
probability matrix of the process viewed every $d$ time steps, admits $d$
(equivalence) classes $C_0, C_1, \ldots, C_{d-1}$, each of which is aperiodic. Applying
(6.16) to $C_r$ and $p^d$ (instead of $S$ and $p$) one gets, writing $N_r$ for the number of
elements in $C_r$,
$$|p_{ij}^{(nd)} - \pi_j| \le (1 - N_r \delta_r)^{[n/\nu_r]} \quad \text{for } i, j \in C_r, \tag{7.1}$$
where $\pi_j = \lim_{n \to \infty} p_{ij}^{(nd)}$, $\nu_r$ is the smallest positive integer such that $p_{ij}^{(\nu_r d)} > 0$
STEADYSTATE DISTRIBUTIONS 133
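for all $i, j \in C_r$, and $\delta_r = \min\{p_{ij}^{(\nu_r d)} : i, j \in C_r\}$. Let $\delta = \min\{\delta_r : r = 0, 1, \ldots, d - 1\}$.
Then one has, writing $L = \min\{N_r : 0 \le r \le d - 1\}$, $\nu = \max\{\nu_r : 0 \le r \le d - 1\}$,
If $i \in C_r$ and $j \in C_s$ with $s = r + m \pmod d$, then one gets, using the facts that
$p_{kj}^{(nd)} = 0$ if $k$ is not in $C_s$, and that $\sum_{k \in C_s} p_{ik}^{(m)} = 1$,
$$|p_{ij}^{(nd+m)} - \pi_j| = \Big|\sum_{k \in C_s} p_{ik}^{(m)} \big(p_{kj}^{(nd)} - \pi_j\big)\Big| \le (1 - L\delta)^{[n/\nu]} \quad \text{for } i \in C_r,\ j \in C_s. \tag{7.3}$$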
Of course,
$$p_{ij}^{(nd+m')} = 0 \quad \text{if } i \in C_r,\ j \in C_s, \text{ and } m' \not\equiv s - r \pmod d. \tag{7.4}$$
Note that $\{\pi_j : j \in C_r\}$ is the unique invariant initial distribution on $C_r$ for the
restriction of $p^d$ to $C_r$.
Now, in view of (7.4),
$$\sum_{t=1}^{nd} p_{ij}^{(t)} = \sum_{t=0}^{n-1} p_{ij}^{(td+m)} \quad \text{if } i \in C_r,\ j \in C_s, \text{ and } m = s - r \pmod d. \tag{7.5}$$
If $r = s$ then $m = 0$ and the index of the second sum in (7.5) ranges from 1 to
$n$. By (7.3) one then has
$$\lim_{n \to \infty} \frac{1}{n} \sum_{t=1}^{nd} p_{ij}^{(t)} = \pi_j \qquad (i, j \in S). \tag{7.6}$$
$$\lim_{n \to \infty} \frac{1}{n} \sum_{t=1}^{n} p_{ij}^{(t)} = \frac{\pi_j}{d} \qquad (i, j \in S). \tag{7.7}$$
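For a periodic chain the powers $p^n$ do not converge, but their time averages do, as (7.7) states. The sketch below illustrates this on a hypothetical irreducible four-state chain of period $d = 2$. For this chain the values $\pi_j/d$ work out by hand to $(1/6, 1/3, 1/3, 1/6)$, which is the invariant distribution of $p$ itself; the helper names are illustrative.

```python
def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def cesaro_average(p, n):
    """(1/n) * (p + p^2 + ... + p^n), computed entrywise."""
    size = len(p)
    total = [[0.0] * size for _ in range(size)]
    power = p
    for _ in range(n):
        for i in range(size):
            for j in range(size):
                total[i][j] += power[i][j]
        power = mat_mul(power, p)
    return [[v / n for v in row] for row in total]

p = [[0.0, 1.0, 0.0, 0.0],   # period-2 chain: cycles of lengths 2 and 4
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.5, 0.0, 0.5],
     [1.0, 0.0, 0.0, 0.0]]
target = [1 / 6, 1 / 3, 1 / 3, 1 / 6]   # pi_j / d, computed by hand for this chain
avg = cesaro_average(p, 2000)
odd_power = mat_mul(mat_mul(p, p), p)   # p^3: no return to the start is possible
```

Every row of the averaged matrix is close to the same vector, while individual powers keep oscillating between the two cyclic classes.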
k 1 n
keS d
>i
, lim
kes noo

n [=I
P ik ) Pkj
n
7C
= lim  Y Pi;+ I) = (j e S). (7.8)
nxnI d
na
fj = 1 Y_ Y ^k Pkj =
=1 kES keS
nk ^ 1
n=1
Pk(
i
(7.10)
and zero probabilities to states not in 4 Then it is easy to check that ^6 = I a ; t" )
Let $\{X_n\}$ be a Markov chain with countable state space $S$ and transition
probability law $p = ((p_{ij}))$. As in the case of random walks, the frequency of
returns to a state is an important feature of the evolution of the process.
$$P_j(X_n = j \text{ i.o.}) = 1, \tag{8.1}$$
and transient if
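$$\tau_j^{(0)} = 0, \qquad \tau_j^{(1)} = \min\{n > 0 : X_n = j\}, \qquad \tau_j^{(r)} = \min\{n > \tau_j^{(r-1)} : X_n = j\} \quad (r = 1, 2, \ldots), \tag{8.3}$$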
Pj(T! ') < 00) = Pj (ii'  ' < oo and Xt,, + = i for some n i 1)
}
Therefore, by iteration,
$$P_j(\tau_i^{(r)} < \infty) = P_j(\tau_i^{(1)} < \infty)\, \rho_{ii}^{\,r-1} = \rho_{ji}\, \rho_{ii}^{\,r-1} \qquad (r = 2, 3, \ldots). \tag{8.6}$$
Now
= ^Pii=1
1 i
= Jim P i (T;r ^ < oo) f (8.8)
r .. 0 if pii < 1.
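Further, write $N(j)$ for the number of visits to the state $j$ by the Markov chain
$\{X_n\}$, and denote its expected value by $G(i, j) = E_i N(j)$. One has
$$E_i N(j) = \sum_{r=0}^{\infty} P_i(N(j) > r) = \sum_{r=0}^{\infty} P_i(\tau_j^{(r+1)} < \infty) = \sum_{r=0}^{\infty} \rho_{ij}\,\rho_{jj}^{\,r}, \qquad (8.10)$$
so that, if $i \ne j$,
$$G(i, j) = \begin{cases} 0 & \text{if } i \nrightarrow j, \text{ i.e., if } \rho_{ij} = 0, \\ \rho_{ij}/(1 - \rho_{jj}) & \text{if } i \to j \text{ and } \rho_{jj} < 1, \\ \infty & \text{if } i \to j \text{ and } \rho_{jj} = 1. \end{cases} \qquad (8.11)$$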
Proposition 8.1
(a) Every state is either recurrent or transient. A state $j$ is recurrent iff $\rho_{jj} = 1$
iff $G(j, j) = \infty$, and transient iff $\rho_{jj} < 1$ iff $G(j, j) < \infty$.
(b) If $j$ is recurrent, $j \to i$, then $i$ is recurrent, and $\rho_{ij} = \rho_{ji} = 1$. Thus,
recurrence (or transience) is a class property. In particular, if all states
communicate with each other, then either they are all recurrent, or they
are all transient.
(c) Let $j$ be recurrent, and $S(j) := \{i \in S : j \leftrightarrow i\}$ be the class of states which
communicate with $j$. Let $\pi$ be a probability distribution on $S(j)$. Then
Proof. Part (a) follows from (8.8), (8.11). For part (b), suppose $j$ is recurrent
and $j \to i$ ($i \ne j$). Let $A_r$ denote the event that the Markov chain visits $i$ between
the $r$th and $(r+1)$st visits to state $j$. Then under $P_j$, $A_r$ ($r \ge 0$) are independent
events and have the same probability $\theta$, say. Now $\theta > 0$. For if $\theta = 0$, then
$P_j(X_n = i \text{ for some } n \ge 1) = P_j(\bigcup_{r\ge 0} A_r) = 0$, contradicting $j \to i$. It now
follows from the second half of the Borel-Cantelli Lemma (Chapter 0, Lemma
6.1) that $P_j(A_r \text{ i.o.}) = 1$. This implies $G(j, i) = \infty$. Interchanging $i$ and $j$ in (8.11)
one then obtains $\rho_{ii} = 1$. Hence $i$ is recurrent. Also, $\rho_{ji} \ge P_j(A_r \text{ i.o.}) = 1$. By
the same argument, $\rho_{ij} = 1$.
Hence
n
Note that $G(i, i) = 1/(1 - \rho_{ii})$, i.e., replace $\rho_{ij}$ by 1 in (8.10).
For the simple random walk with $p > \tfrac{1}{2}$, one has (see (3.9), (3.10) of
Chapter I),
j
)'
Proposition 8.1 shows that the difference between recurrence and transience
is quite dramatic. If $j$ is recurrent, then $P_j(N(j) = \infty) = 1$. If $j$ is transient, not
only is it true that $P_j(N(j) < \infty) = 1$, but also $E_j(N(j)) < \infty$. Note also that
every inessential state is transient (Exercise 7).
events have positive probability. If $ax \in \mathbb{Z}$, then $|[ax + u]| = |ax|$. Since
0 is absorbing, all states $x \ne 0$ must, therefore, be inessential, and thus transient.
To obtain convergence, simply note that the state space of the process started
at $x$ is finite since, with probability 1, the process remains in the interval
$[-|x|, |x|]$. The finiteness is critical for this argument, as one can readily see
by considering for contrast the case of the simple asymmetric ($p > \tfrac{1}{2}$) random
walk on the nonnegative integers with absorbing boundary at 0.
Consider now the $k$-dimensional problem $X_{n+1} = [AX_n + U_{n+1}]$, $n = 0, 1, 2,
\ldots$, where $A$ is a $k \times k$ real matrix and $\{U_n\}$ is an i.i.d. sequence of random
vectors uniformly distributed over the $k$-dimensional cube $[0, 1)^k$, and $[\,\cdot\,]$ is
defined componentwise, i.e., $[x] = ([x_1], \ldots, [x_k])$, $x \in \mathbb{R}^k$. It is convenient to
use the norm $\|\cdot\|_0$ defined by $\|x\|_0 := \max\{|x_1|, \ldots, |x_k|\}$. The ball
$B_r(0) := \{x : \|x\|_0 < r\}$ (of radius $r$ centered at 0) for this norm is the square of
side length $2r$ centered at 0. Assume $\|Ax\|_0 < \|x\|_0$ for all $x \ne 0$ (i.e.,
$\|A\| := \sup_{x\ne 0} \|Ax\|_0/\|x\|_0 < 1$) as our stability condition. Once again we wish
to show that $X_n \to 0$ as $n \to \infty$ with probability 1. As in the one-dimensional
case, 0 is an (absorbing) essential state that is accessible from every other state
$x$ because there is a subset $N(x)$ of $[0, 1)^k$ having positive volume such that for
each $u \in N(x)$, one has $\|[Ax + u]\|_0 \le \|Ax\|_0 < \|x\|_0$. So each state $x \ne 0$ is
inessential and thus transient. The result follows since the process starting at
$x \in \mathbb{Z}^k$ does not escape $B_{\|x\|_0}(0)$, since $\|[Ax + u]\|_0 \le \|Ax\|_0 + 1 < \|x\|_0 + 1$
and, since $\|[Ax + u]\|_0$ and $\|x\|_0$ are both integers, $\|[Ax + u]\|_0 \le \|x\|_0$.
Linear models $X_{n+1} = AX_n + \varepsilon_{n+1}$, with $\{\varepsilon_n\}$ a sequence of i.i.d. random
vectors, are systematically analyzed in Section 13.
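The integer-valued recursion above can be explored numerically. The following is only a sketch under stated assumptions: the bracket $[\,\cdot\,]$ is taken to be the componentwise floor, and the matrix `A`, the starting point, and the helper name `simulate_integer_recursion` are illustrative choices, not taken from the text. The maximum row sum of `A` is 0.7, so the stability condition holds for the max-norm.

```python
import numpy as np

def simulate_integer_recursion(A, x0, n_steps, seed=0):
    """Simulate X_{n+1} = floor(A X_n + U_{n+1}) with U uniform on [0, 1)^k."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=np.int64)
    path = [x.copy()]
    for _ in range(n_steps):
        u = rng.random(x.shape[0])                 # uniform on [0, 1)^k
        x = np.floor(A @ x + u).astype(np.int64)   # componentwise integer part
        path.append(x.copy())
    return np.array(path)

# Hypothetical example: max row sum of |A| is 0.7 < 1.
A = np.array([[0.5, 0.2],
              [-0.1, 0.4]])
path = simulate_integer_recursion(A, x0=[20, -15], n_steps=300)
norms = np.abs(path).max(axis=1)                   # max-norm along the path
print("initial norm:", norms[0], "final state:", path[-1])
```

Along any simulated path the max-norm never increases, and once the state 0 is reached the path stays there, in line with the argument that 0 is absorbing and every other state is transient.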
$$\lim_{r\to\infty} \frac{1}{r}\sum_{s=1}^{r} Z_s = EZ_1, \qquad (9.3)$$
provided that EIZ 1 I < oc. In what follows we will make the stronger assumption
provided
that
T 2,
$$N_n = \max\{r \ge 0 : \tau_j^{(r)} \le n\}. \qquad (9.5)$$
n 1 J( m)
m=0 r=1 n+I
S Al)
T
Figure 9.1
1 T 11)
The last sum on the right side of (9.6) has at most $\tau_j^{(N_n+1)} - \tau_j^{(N_n)}$ summands,
this number being the time between the last visit to $j$ by time $n$ and the next
visit to $j$. Although this sum depends on $n$, under the condition (9.4) we still
have that (Exercise 1)
1T '" a l l 1 T'N' 1 1
Therefore,
$$\frac{S_n}{n} = \frac{1}{n}\sum_{r=1}^{N_n} Z_r + R_n = \frac{N_n}{n}\cdot\frac{1}{N_n}\sum_{r=1}^{N_n} Z_r + R_n, \qquad (9.9)$$
where $R_n \to 0$ as $n \to \infty$ with probability 1 under (9.4). Also, for each sample
path outside a set of probability 0, $N_n \to \infty$ as $n \to \infty$ and therefore by (9.3)
(taking limit over a subsequence)
$$\lim_{n\to\infty} \frac{1}{N_n}\sum_{r=1}^{N_n} Z_r = EZ_1 \qquad (9.10)$$
if (9.4) holds. Now, replacing $f$ by the constant function $f \equiv 1$ in (9.10), we have
_' = E(tj 2) T i

lim L_ (9.11)
nx Nn
T (N,J
nT (N " ) < T(N"+ 1)
J J J
Note that the right side, $E(\tau_j^{(2)} - \tau_j^{(1)})$, is the average recurrence time of
$j$ ($= E_j\tau_j^{(1)}$), and the left side is the reciprocal of the asymptotic proportion of
time spent at $j$.
Combining (9.9)-(9.11) gives the following result. Note that positive recurrence
is a class property (see Theorem 9.4 and Exercise 4).
(b) If $S$ comprises a single class of essential states (irreducible) that are all
positive recurrent, then (9.15) holds with probability 1 regardless of the
initial distribution.
$$\lim_{n\to\infty} \frac{1}{n}\sum_{m=1}^{n} p_{ij}^{(m)} = \frac{1}{E_j\tau_j^{(1)}} \qquad (9.16)$$
$$f(j) = 1, \qquad f(k) = 0 \quad \text{for all } k \ne j. \qquad (9.17)$$
Then $Z_r \equiv 1$ for all $r = 0, 1, 2, \ldots$ since there is only one visit to state $j$ in
$(\tau_j^{(r)}, \tau_j^{(r+1)}]$. Hence, taking expectation on both sides of (9.15) under the initial
state $i$, one gets (9.16) after interchanging the order of the expectation and the
limit. This is permissible by Lebesgue's Dominated Convergence Theorem since
$|(n + 1)^{-1}\sum_{m=0}^{n} f(X_m)| \le 1$.
constitute an invariant distribution if the states are all positive recurrent and
communicate with each other. Let us first show that $\sum_i \pi_i = 1$ in this case. For
this, introduce the random variables
i.e., $T_i^{(r)}$ is the amount of time the Markov chain spends in the state $i$ between
the $r$th and $(r+1)$th passages to $j$. The sequence $\{T_i^{(r)} : r = 1, 2, \ldots\}$ is i.i.d. by
the strong Markov property. Write
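Then,
$$\sum_{i\in S} \theta_j(i) = \sum_{i\in S} E_j T_i^{(1)} = E_j\Bigl(\sum_{i\in S} T_i^{(1)}\Bigr) = E_j\bigl(\tau_j^{(2)} - \tau_j^{(1)}\bigr) = \frac{1}{\pi_j}. \qquad (9.21)$$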
r = 1, 2, ...}, and by (9.12) and (9.18), the limit on the left side also equals
N
V T(r)
nJ IN, T1'
lim h = lim " N = n 1 Bj (i). (9.23)
n oc n x n \r1 n
Hence,
$$\pi_i = \pi_j\,\theta_j(i). \qquad (9.24)$$
(9.25)
71 f
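By Scheffé's Theorem (Theorem 3.7 of Chapter 0) and (9.16),
$$\sum_{i\in S} \Bigl|\frac{1}{n}\sum_{m=1}^{n} p_{ji}^{(m)} - \pi_i\Bigr| \to 0 \quad \text{as } n \to \infty. \qquad (9.26)$$
Hence,
$$\Bigl|\sum_{i\in S} \Bigl(\frac{1}{n}\sum_{m=1}^{n} p_{ji}^{(m)}\Bigr) p_{ij} - \sum_{i\in S} \pi_i\, p_{ij}\Bigr| \le \sum_{i\in S} \Bigl|\pi_i - \frac{1}{n}\sum_{m=1}^{n} p_{ji}^{(m)}\Bigr| \to 0 \quad \text{as } n \to \infty. \qquad (9.27)$$
Therefore,
$$\sum_{i\in S} \pi_i\, p_{ij} = \lim_{n\to\infty} \frac{1}{n}\sum_{m=1}^{n}\sum_{i\in S} p_{ji}^{(m)} p_{ij} = \lim_{n\to\infty} \frac{1}{n}\sum_{m=1}^{n} p_{jj}^{(m+1)} = \pi_j. \qquad (9.28)$$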
$$\pi_j = \sum_{i\in S} \pi_i\, p_{ij}^{(m)} \quad (m = 1, 2, \ldots). \qquad (9.29)$$
$$\pi_j = \sum_{i\in S} \pi_i\, \frac{1}{n}\sum_{m=1}^{n} p_{ij}^{(m)}. \qquad (9.30)$$
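If $j$ is a null recurrent state and $i \leftrightarrow j$, then for the Markov chain having initial
state $i$ the sequence $\{Z_r : r = 1, 2, \ldots\}$ defined by (9.2) with $f \equiv 1$ is still an i.i.d.
sequence of random variables, but the common mean is infinity. It follows
(Exercise 3) that, with $P_i$-probability 1,
$$\lim_{n\to\infty} \frac{\tau_j^{(N_n)}}{N_n} = \infty. \qquad (9.31)$$
Since $\tau_j^{(N_n)} \le n$, we have
$$\lim_{n\to\infty} \frac{n+1}{N_n} = \infty,$$
and, therefore,
$$\lim_{n\to\infty} \frac{N_n}{n+1} = 0 \quad \text{with } P_i\text{-probability 1}. \qquad (9.32)$$
Since $0 \le N_n/(n + 1) \le 1$ for all $n$, Lebesgue's Dominated Convergence
Theorem applied to (9.32) yields
$$\lim_{n\to\infty} E_i\Bigl(\frac{N_n}{n+1}\Bigr) = 0. \qquad (9.33)$$
But
$$E_i N_n = \sum_{m=1}^{n} p_{ij}^{(m)}, \qquad (9.34)$$
so that
$$\lim_{n\to\infty} \frac{1}{n+1}\sum_{m=1}^{n} p_{ij}^{(m)} = 0 \qquad (9.35)$$
if $j$ is null recurrent. If $j$ is transient, then by (8.11)
$$\sum_{m=0}^{\infty} p_{ij}^{(m)} < \infty.$$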
In particular, therefore,
if j is a transient state.
The main results of this section may be summarized as follows.
Theorem 9.4. Assume that all states communicate with each other. Then one
has the following results.
(a) Either all states are recurrent, or all states are transient.
(b) If all states are recurrent, then they are either all positive recurrent, or
all null recurrent.
(c) There exists an invariant distribution if and only if all states are positive
recurrent. Moreover, in the positive recurrent case, the invariant
distribution $\pi$ is unique and is given by
$$\pi_j = \frac{1}{E_j\tau_j^{(1)}} \quad (j \in S). \qquad (9.37)$$
(d) In case the states are positive recurrent, no matter what the initial
distribution $\mu$, if $E_\pi|f(X_1)| < \infty$, then
$$\lim_{n\to\infty} \frac{1}{n}\sum_{m=1}^{n} f(X_m) = \sum_{i\in S} \pi_i f(i) = E_\pi f(X_1) \qquad (9.38)$$
with $P_\mu$-probability 1.
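Part (c) of the theorem ties the invariant distribution to mean return times. As a hedged numerical illustration (the three-state matrix `P` and the helper names `invariant_distribution` and `mean_return_time` are assumptions made for this sketch, not objects from the text), one can solve $\pi p = \pi$ directly, compute $E_j\tau_j^{(1)}$ by first-step analysis, and compare.

```python
import numpy as np

def invariant_distribution(P):
    """Solve pi P = pi with sum(pi) = 1 (least squares on the stacked system)."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def mean_return_time(P, j):
    """E_j(first return time to j), via expected hitting times of j from k != j."""
    n = P.shape[0]
    others = [k for k in range(n) if k != j]
    Q = P[np.ix_(others, others)]                     # transitions avoiding j
    h = np.linalg.solve(np.eye(n - 1) - Q, np.ones(n - 1))
    return 1.0 + P[j, others] @ h

# Hypothetical irreducible, aperiodic chain on three states.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
pi = invariant_distribution(P)
returns = np.array([mean_return_time(P, j) for j in range(3)])
print("pi =", pi)
print("pi_j * E_j(tau_j) =", pi * returns)
```

For this chain the products $\pi_j E_j\tau_j^{(1)}$ all come out equal to 1, as the theorem predicts.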
in the latter case). Thus (9.24) holds. Now $\theta_j(i) > 0$; for otherwise $T_i^{(r)} = 0$ with
probability 1 for all $r \ge 0$, implying $\rho_{ji} = 0$, which is ruled out by the assumption.
The relation (9.24) implies $\pi_i > 0$, since $\pi_j > 0$ and $\theta_j(i) > 0$. Therefore, all states
are positive recurrent.
For part (c), it has been proved above that there exists a unique invariant
probability distribution $\pi$ given by (9.35) if all states are positive recurrent.
Conversely, suppose $\pi$ is an invariant probability distribution. We need to show
that all states are positive recurrent. This is done by elimination of other
possibilities. If the states are transient, then (9.36) holds; using this in (9.29)
(or (9.30)) one would get $\pi_j = 0$ for all $j$, which is a contradiction. Similarly,
null recurrence implies, by (9.30) and (9.35), that $\pi_j = 0$ for all $j$. Therefore, the
states are all positive recurrent. Part (d) will follow from Theorem 9.2 if:
(i) The hypothesis (9.14) holds whenever
and
=)
T!
Since $E_j T_i^{(1)} = \theta_j(i) = \pi_i/\pi_j$ (see Eq. 9.24), taking expectations in (9.41) yields
Ta
The last equality follows upon interchanging the orders of summation and
expectation, which by Fubini's theorem is always permissible if the summands
are nonnegative. Therefore (9.14) follows from (9.39). Now as in (9.42),
T cn
where this time the interchange of the orders of summation and expectation is
justified again using Fubini's theorem by finiteness of the double "integral."
n
If the assumption that "all states communicate with each other" in Theorem
9.4 is dropped, then $S$ can be decomposed into a set $\mathcal{I}$ of inessential states and
(disjoint) classes $S_1, S_2, \ldots, S_l$ of essential states. The transition probability
matrix $p$ may be restricted to each one of the classes $S_1, \ldots, S_l$ and the
conclusions of Theorem 9.2 will hold individually for each class. If more than
one of these classes is positive recurrent, then more than one invariant
distribution exists, and they are supported on disjoint sets. Since any convex
combination of invariant distributions is again invariant, an infinity of invariant
distributions exist in this case. The following result takes care of the set $\mathcal{I}$ of
inessential states in this connection (also see Exercise 4).
Proposition 9.5
(a) If j is inessential then it is transient.
(b) Every invariant distribution assigns zero probability to inessential,
transient, and null recurrent states.
Proof. (a) If $j$ is inessential then there exist $i \in S$ and $m \ge 1$ such that
$$p_{ji}^{(m)} > 0 \quad \text{and} \quad p_{ij}^{(n)} = 0 \quad \text{for all } n \ge 1. \qquad (9.44)$$
Hence
Corollary 9.6. If $S$ is finite, then there exists at least one positive recurrent
state, and therefore at least one invariant distribution $\pi$. This invariant
distribution is unique if and only if all positive recurrent states communicate.
Proof. Suppose, if possible, that all states are either transient or null recurrent.
Then
$$\lim_{n\to\infty} \frac{1}{n+1}\sum_{m=1}^{n} p_{ij}^{(m)} = 0 \quad \text{for all } i, j \in S. \qquad (9.46)$$
Since $(n + 1)^{-1}\sum_{m=1}^{n} p_{ij}^{(m)} \le 1$ for all $j$, and there are only finitely many states,
$$\sum_{j\in S} \lim_{n\to\infty} \frac{1}{n+1}\sum_{m=1}^{n} p_{ij}^{(m)} = \lim_{n\to\infty} \sum_{j\in S} \frac{1}{n+1}\sum_{m=0}^{n} p_{ij}^{(m)} = \lim_{n\to\infty} \frac{1}{n+1}\sum_{m=0}^{n}\sum_{j\in S} p_{ij}^{(m)} = \lim_{n\to\infty} \frac{1}{n+1}\sum_{m=0}^{n} 1 = \lim_{n\to\infty} \frac{n+1}{n+1} = 1. \qquad (9.47)$$
But the first term in (9.47) is zero by (9.46). We have reached a contradiction.
Thus, there exists at least one positive recurrent state. The rest follows from
Theorem 9.2 and the remark following its proof.
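The non-uniqueness described in the corollary is easy to see on a small reducible chain. The sketch below is illustrative only: the five-state matrix `P` (two closed classes $\{0,1\}$ and $\{2,3\}$ plus one inessential state 4) and the helper `invariant_on_class` are hypothetical choices, not taken from the text.

```python
import numpy as np

def invariant_on_class(P, states):
    """Invariant distribution of P restricted to a closed class, padded with zeros."""
    sub = P[np.ix_(states, states)]
    k = len(states)
    A = np.vstack([sub.T - np.eye(k), np.ones(k)])
    rhs = np.zeros(k + 1)
    rhs[-1] = 1.0
    sol, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    pi = np.zeros(P.shape[0])
    pi[states] = sol
    return pi

# Hypothetical chain: classes {0,1} and {2,3} are closed; state 4 is inessential.
P = np.array([[0.50, 0.50, 0.00, 0.00, 0.00],
              [0.20, 0.80, 0.00, 0.00, 0.00],
              [0.00, 0.00, 0.30, 0.70, 0.00],
              [0.00, 0.00, 0.60, 0.40, 0.00],
              [0.25, 0.00, 0.25, 0.00, 0.50]])
pi_a = invariant_on_class(P, [0, 1])
pi_b = invariant_on_class(P, [2, 3])
mix = 0.3 * pi_a + 0.7 * pi_b          # any convex combination is again invariant
print(pi_a, pi_b, mix @ P - mix)
```

Both class-wise distributions and every convex combination of them satisfy $\pi p = \pi$, and each assigns probability zero to the inessential state.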
The same method as used in Section 9 to obtain the law of large numbers may
be used to derive a central limit theorem for $S_n = \sum_{m=0}^{n} f(X_m)$, where $f$ is a
real-valued function on the state space $S$. Write
$$\bar S_n = \sum_{m=0}^{n} \bar f(X_m) = \sum_{m=0}^{n} (f(X_m) - \mu), \qquad \bar Z_r = \sum_{m=\tau_j^{(r)}+1}^{\tau_j^{(r+1)}} \bar f(X_m) \quad (r = 0, 1, 2, \ldots). \qquad (10.3)$$
Then by (9.40),
Thus $\{\bar Z_r : r = 1, 2, \ldots\}$ is an i.i.d. sequence with mean zero and finite variance
$$\sigma^2 = E_j \bar Z_1^2. \qquad (10.5)$$
Now apply the classical central limit theorem to this sequence. As $r \to \infty$,
$(1/\sqrt{r})\sum_{k=1}^{r} \bar Z_k$ converges in distribution to the Gaussian law with mean zero
and variance $\sigma^2$. Now express $\bar S_n$ as in (9.6) with $f$ replaced by $\bar f$, $S_n$ by $\bar S_n$, and
$Z_r$ by $\bar Z_r$, to see that the limiting distribution of $(1/\sqrt{n})\bar S_n$ is the same as that
of (Exercise 1)
$$\frac{1}{\sqrt{n}}\sum_{r=1}^{N_n} \bar Z_r = \Bigl(\frac{N_n}{n}\Bigr)^{1/2} \frac{1}{\sqrt{N_n}}\sum_{r=1}^{N_n} \bar Z_r. \qquad (10.6)$$
We shall need an extension of the central limit theorem that applies to sums
of random numbers of i.i.d. random variables. We can get such a result as an
extension of Corollary 7.2 in Chapter 0 as follows.
Proposition 10.1. Let $\{X_j : j \ge 1\}$ be i.i.d., $EX_j = 0$, $0 < \sigma^2 := EX_j^2 < \infty$. Let
$\{\nu_n : n \ge 1\}$ be a sequence of nonnegative integer-valued random variables with
$$\lim_{n\to\infty} \frac{\nu_n}{n} = \alpha \quad \text{in probability} \qquad (10.7)$$
+
P( max
(mI (na1I <L 3 Ina17
ISm (n]1 % c([na]) 1J2 )
(10.8)
The first term on the right goes to zero as $n \to \infty$, by (10.7). The second term
is estimated by Kolmogorov's Maximal Inequality (Chapter I, Corollary 13.7),
as being no more than
$$\frac{S_{\nu_n} - S_{[n\alpha]}}{([n\alpha])^{1/2}} \to 0 \quad \text{in probability}. \qquad (10.10)$$
Since $S_{[n\alpha]}/([n\alpha])^{1/2}$ converges in distribution to $N(0, 1)$, it follows from (10.10)
that so does $S_{\nu_n}/([n\alpha])^{1/2}$. The desired convergence now follows from (10.7). ∎
By Proposition 10.1, $N_n^{-1/2}\sum_{r=1}^{N_n} \bar Z_r$ is asymptotically Gaussian with mean
zero and variance $\sigma^2$. Since $N_n/n$ converges to $(E_j\tau_j^{(1)})^{-1}$, it follows that the
expression in (10.6) is asymptotically Gaussian with mean zero and variance
$$W_n(t) = \frac{\bar S_{[nt]}}{\sqrt{n+1}}, \qquad \tilde W_n(t) = W_n(t) + (nt - [nt])\,\frac{\bar f(X_{[nt]+1})}{\sqrt{n+1}} \quad (t \ge 0), \qquad (10.11)$$
(Exercise 2). In fact, convergence of the full distribution may also be obtained
by consideration of the above renewal argument. The precise form of the
functional central limit theorem (FCLT) for Markov chains goes as follows (see
theoretical complement 1).
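Then,
$$\frac{E_\pi \bar S_n^2}{n+1} = \frac{1}{n+1}\sum_{m=0}^{n} E_\pi \bar f^2(X_m) + \frac{2}{n+1}\sum_{m=1}^{n}\sum_{m'=0}^{m-1} E_\pi[\bar f(X_{m'})\bar f(X_m)]$$
$$= E_\pi \bar f^2(X_0) + \frac{2}{n+1}\sum_{m=1}^{n}\sum_{m'=0}^{m-1} E_\pi[\bar f(X_{m'})(P^{m-m'}\bar f)(X_{m'})]$$
$$= E_\pi \bar f^2(X_0) + \frac{2}{n+1}\sum_{m=1}^{n}\sum_{m'=0}^{m-1} E_\pi[\bar f(X_0)(P^{m-m'}\bar f)(X_0)]$$
$$= E_\pi \bar f^2(X_0) + \frac{2}{n+1}\sum_{m=1}^{n}\sum_{m'=0}^{m-1} \langle \bar f, P^{m-m'}\bar f\rangle_\pi$$
$$= E_\pi \bar f^2(X_0) + \frac{2}{n+1}\sum_{m=1}^{n}\sum_{k=1}^{m} \langle \bar f, P^{k}\bar f\rangle_\pi \quad (k = m - m'). \qquad (10.14)$$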
Now assume that the limit
$$\gamma = \lim_{m\to\infty} \sum_{k=1}^{m} \langle \bar f, P^k \bar f\rangle_\pi \qquad (10.15)$$
Note that
and
$$\sum_{k=1}^{m} \langle \bar f, P^k \bar f\rangle_\pi = \sum_{k=1}^{m} \mathrm{Cov}_\pi\{f(X_0), f(X_k)\}.$$
The condition (10.15) that $\gamma$ exists and is finite is the condition that the
correlation decays to zero at a sufficiently rapid rate for time points $k$ units
apart as $k \to \infty$.
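For a concrete feel for (10.14) and (10.15), consider a two-state chain, where the lagged inner products decay geometrically. The sketch below is an illustration under assumptions: the switching probabilities `a`, `b`, the function `f`, and the helper names are hypothetical, and the limiting value $E_\pi\bar f^2 + 2\gamma$ is obtained here by letting $n \to \infty$ in the last line of (10.14) (a Cesàro average of the partial sums in (10.15)).

```python
import numpy as np

a, b = 0.3, 0.2                                # illustrative switching probabilities
P = np.array([[1 - a, a], [b, 1 - b]])
pi = np.array([b, a]) / (a + b)                # invariant distribution of P
f = np.array([0.0, 1.0])
fbar = f - pi @ f                              # centered function f - mu

def lagged_inner_products(n):
    """Return the array [<fbar, P^k fbar>_pi for k = 1, ..., n]."""
    out, v = [], fbar.copy()
    for _ in range(n):
        v = P @ v                              # v = P^k fbar
        out.append(float(pi @ (fbar * v)))
    return np.array(out)

def rhs_10_14(n):
    """Last line of (10.14) for a given n."""
    c = lagged_inner_products(n)
    return float(pi @ fbar**2) + 2.0 * np.cumsum(c).sum() / (n + 1)

gamma = lagged_inner_products(200).sum()       # truncated series for gamma
limit_variance = float(pi @ fbar**2) + 2.0 * gamma
lam = 1 - a - b                                # second eigenvalue of P
closed_form = (a * b / (a + b) ** 2) * (1 + lam) / (1 - lam)
print(limit_variance, closed_form, rhs_10_14(2000))
```

In this example $\bar f$ is an eigenvector of $P$ with eigenvalue $1 - a - b$, so the series in (10.15) is geometric and the three printed numbers agree to within $O(1/n)$.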
11 ABSORPTION PROBABILITIES
where $\sum^*$ denotes summation over all $m$-tuples $(i_1, i_2, \ldots, i_m)$ of elements from
$S \setminus \{j\}$. Now let $p^0$ denote the matrix obtained by deleting the $j$th row and $j$th
column from $p$,
and, therefore,
Observe that the above idea can be applied to the calculation of the first
passage time to any state $j \in S$ or, for that matter, to any nonempty set $A$ of
states such that $i \notin A$. Moreover, it follows from Theorem 6.4 and its corollaries
that the rate of absorption is, therefore, furnished by the spectral radius of $p^0$.
This will be amply demonstrated in examples of this section.
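A small numerical sketch may help fix the idea of deleting rows and columns. It assumes a gambler's-ruin chain on $\{0, 1, 2, 3, 4\}$ with $A = \{0, 4\}$ and an up-probability of 0.4; these choices and the function names are hypothetical. The tail probabilities $P_i(\tau_A > m)$ are computed as row sums of the $m$th power of the reduced matrix and cross-checked against the chain in which the states of $A$ are made absorbing.

```python
import numpy as np

p_up = 0.4                                    # illustrative step-up probability
P = np.zeros((5, 5))
P[0, 0] = P[4, 4] = 1.0                       # states of A = {0, 4} made absorbing
for i in (1, 2, 3):
    P[i, i + 1] = p_up
    P[i, i - 1] = 1.0 - p_up

A = [0, 4]
interior = [1, 2, 3]
P0 = P[np.ix_(interior, interior)]            # rows and columns of A deleted

def survival(m):
    """P_i(tau_A > m) for interior i: row sums of the m-th power of P0."""
    return np.linalg.matrix_power(P0, m).sum(axis=1)

def survival_via_absorbing(m):
    """Same quantity as 1 - P_i(chain with A absorbing is in A at time m)."""
    Pm = np.linalg.matrix_power(P, m)
    return 1.0 - Pm[np.ix_(interior, A)].sum(axis=1)

spectral_radius = max(abs(np.linalg.eigvals(P0)))
print(survival(10), spectral_radius)
```

Here the reduced matrix has eigenvalues $0$ and $\pm\sqrt{2 \cdot 0.4 \cdot 0.6}$, so the tail probabilities decay geometrically at rate about 0.693.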
where $p^0$ is the matrix obtained by deleting the rows and columns of $p$
corresponding to the states in $A$.
In general, the matrix $p^0$ is not a proper transition probability matrix since the
row sums may be strictly less than 1 upon the removal of certain columns from
$p$. However, if each of the rows in $p$ corresponding to states $j \in A$ is replaced
by $e_j$ having 1 in the $j$th place and 0 elsewhere, then the resulting matrix $\hat p$,
say, is a transition probability matrix and
The reason (11.8) holds is that up to the first passage time $\tau_A$ the distribution
of Markov chains having transition probability matrices $p$ and $\hat p$ (starting at $i$)
are the same. In particular,
In the case that there is more than one state in $A$, an important problem is
to determine the distribution of $X_{\tau_A}$, starting from $i \notin A$. Of course, if
$P_i(\tau_A < \infty) < 1$ then $X_{\tau_A}$ is a defective random variable under $P_i$, being defined
on the set $\{\tau_A < \infty\}$ of $P_i$-probability less than 1.
Write
Denoting by a j the vector (a(i): i E S), viewed as a column vector, one may
express (11.10) as
(11.13)
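$$a_j(i) = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \in A, \text{ but } i \ne j \end{cases} \qquad (j \in A).$$
$$a_j(i) = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \in A,\ i \ne j. \end{cases} \qquad (11.15)$$
if $i \in A^c$, $k \in A$, and
for all $i \in A^c$, since $a_j(k) = 0$ for $k \in A \setminus \{j\}$ and $a_j(j) = 1$. Hence $a_j$ is the unique
solution of (11.13).
Conversely, if $P_i(\tau_A < \infty) < 1$ for some $i$ in $A^c$, then the function
$h = (h(i) : i \in S)$ defined by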
$$P(X_{n+1} = k \mid X_0, X_1, \ldots, X_n) = \binom{2N}{k}\theta_n^{k}(1 - \theta_n)^{2N-k}. \qquad (11.23)$$
Notice that $\{X_n\}$ is an aperiodic Markov chain. The "boundary" states $\{0\}$
and $\{2N\}$ form closed classes of essential states. The set of states
$\{1, 2, \ldots, 2N - 1\}$ constitutes an inessential class. The model has a special
conservation property of the form of the following martingale property,
for $n = 0, 1, 2, \ldots$. However, since $S$ is finite we know that in the long run
$\{X_n\}$ is certain to be absorbed in state 0 or $2N$, i.e., the population is certain
to eventually come to a unanimous opinion, be it pro or con. It is of interest to
calculate the absorption probabilities as well as the rate of absorption. Here,
with $A = \{0, 2N\}$, one has $\hat p = p$.
Let $a_j(i)$ denote the probability of ultimate absorption at $j = 0$ or at $j = 2N$
starting from state $i \in S$. Then,
$$a_{2N}(i) = \frac{i}{2N}, \quad i = 0, 1, 2, \ldots, 2N, \qquad (11.28)$$
$$a_0(i) = \frac{2N - i}{2N}, \quad i = 0, 1, 2, \ldots, 2N. \qquad (11.29)$$
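The absorption probabilities (11.28) and (11.29) can be checked numerically. The sketch below assumes that the one-step law from state $i$ is binomial with $2N$ trials and success probability $i/2N$, which is the reading of (11.23) used here; the function name `binomial_chain` and the choice $N = 4$ are illustrative.

```python
import numpy as np
from math import comb

def binomial_chain(N):
    """Transition matrix on {0, ..., 2N}: row i is Binomial(2N, i/2N)."""
    M = 2 * N
    P = np.zeros((M + 1, M + 1))
    for i in range(M + 1):
        theta = i / M
        for k in range(M + 1):
            P[i, k] = comb(M, k) * theta**k * (1 - theta) ** (M - k)
    return P

N = 4
M = 2 * N
P = binomial_chain(N)
states = np.arange(M + 1)

# Conservation (martingale) property: E(X_{n+1} | X_n = i) = i.
drift = P @ states - states

# Absorption at 2N: solve (I - P0) a = P[interior, 2N] on the interior states.
interior = np.arange(1, M)
P0 = P[np.ix_(interior, interior)]
a_top = np.linalg.solve(np.eye(M - 1) - P0, P[interior, M])
print(a_top, interior / M)
```

The computed probabilities of absorption at $2N$ agree with $i/2N$, and the drift vanishes in every state.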
$$\sum_{j=0}^{2N} p_{ij} v_j = \lambda v_i, \quad i = 0, 1, \ldots, 2N. \qquad (11.30)$$
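The $r$th factorial moment of the binomial distribution $(p_{ij} : 0 \le j \le 2N)$ is
$$\sum_{j=0}^{2N} j(j-1)\cdots(j-r+1)\,p_{ij} = \Bigl(\frac{i}{2N}\Bigr)^{r}(2N)(2N-1)\cdots(2N-r+1) \sum_{j=r}^{2N} \binom{2N-r}{j-r}\Bigl(\frac{i}{2N}\Bigr)^{j-r}\Bigl(1 - \frac{i}{2N}\Bigr)^{2N-j} = \Bigl(\frac{i}{2N}\Bigr)^{r}(2N)_r \qquad (11.31)$$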
for $r = 1, 2, \ldots, 2N$. Equation (11.31) contains a transformation between
"factorial powers" and "ordinary powers" that deserves to be examined for
connections with (11.30). The "factorial powers" $(j)_r := j(j-1)\cdots(j-r+1)$
are simply polynomials in the "ordinary powers" and can be expressed as
$$j(j-1)\cdots(j-r+1) = \sum_{k=1}^{r} s_k^{(r)} j^k. \qquad (11.32)$$
Likewise, "ordinary powers" $j^r$ can be expressed as polynomials in the "factorial
powers" as
$$j^r = \sum_{k=0}^{r} S_k^{(r)} (j)_k, \qquad (11.33)$$
with the convention $(j)_0 = 1$. Note that $S_r^{(r)} = 1$ for all $r \ge 0$. The coefficients
$\{s_k^{(r)}\}$, $\{S_k^{(r)}\}$ are commonly referred to as Stirling coefficients of the first and second
kinds, respectively.
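The two change-of-basis formulas (11.32) and (11.33) can be verified with exact integer arithmetic. The following sketch generates both families of coefficients from their standard recurrences; the function names and the table size are illustrative, and the first-kind coefficients are taken with sign so that (11.32) holds as written.

```python
def stirling_tables(R):
    """Tables s[r][k] (first kind, signed) and S[r][k] (second kind), 0 <= k <= r <= R."""
    s = [[0] * (R + 1) for _ in range(R + 1)]
    S = [[0] * (R + 1) for _ in range(R + 1)]
    s[0][0] = S[0][0] = 1
    for r in range(R):
        for k in range(1, r + 2):
            s[r + 1][k] = s[r][k - 1] - r * s[r][k]   # (j)_{r+1} = (j)_r * (j - r)
            S[r + 1][k] = S[r][k - 1] + k * S[r][k]   # j * (j)_k = (j)_{k+1} + k (j)_k
    return s, S

def falling(j, r):
    """Factorial power (j)_r = j (j-1) ... (j-r+1), with (j)_0 = 1."""
    out = 1
    for t in range(r):
        out *= j - t
    return out

s, S = stirling_tables(6)
print(s[4])   # coefficients of (j)_4 in powers of j
print(S[4])   # coefficients of j^4 in factorial powers
```

For example, $(j)_3 = j^3 - 3j^2 + 2j$ and $j^3 = (j)_3 + 3(j)_2 + (j)_1$.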
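Now every vector $v = (v_0, \ldots, v_{2N})'$ may be represented as the successive
values of a unique (factorial) polynomial of degree $2N$ evaluated at $0, 1, \ldots, 2N$
(Exercise 7), i.e.,
$$v_j = \sum_{r=0}^{2N} \alpha_r (j)_r \quad \text{for } j = 0, 1, \ldots, 2N. \qquad (11.34)$$
$$\sum_{j=0}^{2N} p_{ij} v_j = \sum_{r=0}^{2N} \alpha_r \sum_{j=0}^{2N} p_{ij}(j)_r = \sum_{r=0}^{2N} \alpha_r (2N)_r \Bigl(\frac{i}{2N}\Bigr)^{r} = \sum_{r=0}^{2N} \alpha_r \frac{(2N)_r}{(2N)^r} \sum_{n=0}^{r} S_n^{(r)} (i)_n = \sum_{n=0}^{2N} \Bigl(\sum_{r=n}^{2N} \alpha_r \frac{(2N)_r}{(2N)^r} S_n^{(r)}\Bigr)(i)_n. \qquad (11.35)$$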
_ _ (2 N )2N (2N)
^`'2N (2N)2N + a2N I,
\ 1 1.37 )
a2n
and $\alpha_r = \alpha_r^{(2N)}$, $r = 2N - 1, \ldots, 0$, are solved recursively from (11.36). Next
take $\alpha_{2N}^{(2N-1)} = 0$, $\alpha_{2N-1}^{(2N-1)} = 1$ and solve for
$$\lambda_{2N-1} = \frac{(2N)_{2N-1}}{(2N)^{2N-1}},$$
etc.
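The triangular structure in (11.35) suggests that the eigenvalues are the ratios $(2N)_r/(2N)^r$, $r = 0, 1, \ldots, 2N$. As a hedged numerical check (again assuming binomial rows with parameters $2N$ and $i/2N$; the helper names and $N = 3$ are illustrative), one can compare these ratios with the eigenvalues returned by a general-purpose solver.

```python
import numpy as np
from math import comb, prod

def binomial_chain(N):
    """Transition matrix on {0, ..., 2N}: row i is Binomial(2N, i/2N)."""
    M = 2 * N
    return np.array([[comb(M, k) * (i / M) ** k * (1 - i / M) ** (M - k)
                      for k in range(M + 1)] for i in range(M + 1)])

def falling(M, r):
    """Factorial power (M)_r = M (M-1) ... (M-r+1)."""
    return prod(M - t for t in range(r))

N = 3
M = 2 * N
P = binomial_chain(N)
predicted = sorted((falling(M, r) / M**r for r in range(M + 1)), reverse=True)
computed = np.sort(np.linalg.eigvals(P).real)[::-1]
print(np.round(predicted, 6))
print(np.round(computed, 6))
```

The value 1 appears twice (for $r = 0$ and $r = 1$), matching the two absorbing states, and the largest eigenvalue below 1 is $1 - 1/2N$, which governs the rate of absorption.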
Then,
pV = VD, (11.39)
or
$p = VDV^{-1}$, (11.40)
so that
Since the left side of (11.42) must go to zero as $m \to \infty$, the coefficients of
$\lambda_0^m \equiv 1$ and $\lambda_1^m \equiv 1$ must be zero. Thus,
2N 2N1
Pi(T(0.2N) > m) _ Y, Y, Vikv k ^^k
k=O j=1
_ ^2 vi2v 2i ) + kY 1
Lt , E1 1 , Y1 1
Vik Uk,/ \^ 2/ m J
$$p_{ij} = \begin{cases} f^{*i}(j) & \text{if } i \ge 1,\ j \ge 0, \\ 1 & \text{if } i = 0,\ j = 0, \\ 0 & \text{if } i = 0,\ j \ne 0. \end{cases} \qquad (11.44)$$
Since a power series can be differentiated term by term within its radius of
convergence, one has
$$\phi'(z) = \sum_{j=1}^{\infty} j f(j) z^{j-1} \quad (|z| < 1).$$
If the mean $\mu$ of the number of particles generated by a single particle is finite,
i.e., if
$$\mu := \sum_{j=1}^{\infty} j f(j) < \infty, \qquad (11.49)$$
then the same formula holds at $z = 1$, with $\phi'(1) = \mu$.
Since $\phi'(z) > 0$ for $0 < z < 1$, $\phi$ is strictly increasing. Also, since $\phi''(z)$ (which
exists and is finite for $0 \le z < 1$) satisfies
$$\frac{d^2}{dz^2}\phi(z) = \sum_{j=2}^{\infty} j(j-1) f(j) z^{j-2} > 0 \quad \text{for } 0 < z < 1, \qquad (11.50)$$
the function $\phi$ is strictly convex on $[0, 1]$. In other words, the line segment
joining any two points on the curve $y = \phi(z)$ lies strictly above the curve (except
at the two points joined). Because $\phi(0) = f(0) > 0$ and $\phi(1) = \sum_j f(j) = 1$,
the graph of $\phi$ looks like that of Figure 11.1 (curve a or b).
The maximum of $\phi'(z)$ is $\mu$, which is attained at $z = 1$. Hence, in the case
$\mu > 1$, the graph of $y = \phi(z)$ must lie below that of $y = z$ near $z = 1$ and, because
$\phi(0) = f(0) > 0$, must cross the line $y = z$ at a point $z_0$, $0 < z_0 < 1$. Since the
slope of the curve $y = \phi(z)$ continuously increases as $z$ increases in $(0, 1)$, $z_0$ is
the unique solution of the equation $z = \phi(z)$ that is smaller than 1.
Figure 11.1
In case $\mu \le 1$, $y = \phi(z)$ must lie strictly above the line $y = z$, except at $z = 1$.
For if it meets the line $y = z$ at a point $z_0 < 1$, then it must go under the line
in the immediate vicinity to the right of $z_0$, since its slope falls below that of
the line (i.e., unity). In order to reach the height $\phi(1) = 1$ (also reached by the
line at the same value $z = 1$) its slope then must exceed 1 somewhere in $(z_0, 1]$;
this is impossible since $\phi'(z) \le \phi'(1) = \mu \le 1$ for all $z$ in $[0, 1]$. Thus, the only
solution of the equation $z = \phi(z)$ is $z = 1$.
Now observe
thus if $\mu \le 1$, then $\rho = 1$ and extinction is certain. On the other hand, suppose
$\mu > 1$. Then $\rho$ is either $z_0$ or 1. We shall now show that $\rho = z_0$ ($< 1$). For this,
consider the quantities
That is, $q_n$ is the probability that the sequence of generations originating from
a single particle is extinct at time $n$. As $n$ increases, $q_n \uparrow \rho$; for clearly,
$\{X_n = 0\} \subset \{X_m = 0\}$ for all $m \ge n$, so that $q_n \le q_m$ if $n < m$. Also
Since $q_1 = f(0) = \phi(0) < \phi(z_0) = z_0$ (recall that $\phi(z)$ is strictly increasing in $z$
for $0 < z < 1$), one has using (11.53) with $n = 1$, $q_2 = \phi(q_1) < \phi(z_0) = z_0$, and
so on. Hence, $q_n < z_0$ for all $n$. Therefore, $\rho = \lim_{n\to\infty} q_n \le z_0$. This proves
$\rho = z_0$.
If $f(0) + f(1) = 1$ and $0 < f(0) < 1$, then $\phi''(z) = 0$ for all $z$, and the graph
of $\phi(z)$ is the line segment joining $(0, f(0))$ and $(1, 1)$. Hence, $\rho = 1$ in this case.
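The monotone iteration $q_{n+1} = \phi(q_n)$ used in the argument above doubles as a practical way to compute the extinction probability. The sketch below assumes an offspring distribution with finite support, given as a list `f` with `f[j]` the probability of `j` offspring; the two distributions and the function names are illustrative.

```python
def pgf(f, z):
    """phi(z) = sum_j f(j) z^j for an offspring distribution with finite support."""
    return sum(fj * z**j for j, fj in enumerate(f))

def extinction_probability(f, n_iter=500):
    """Iterate q_{n+1} = phi(q_n) from q_0 = 0; q_n increases to the extinction probability."""
    q = 0.0
    for _ in range(n_iter):
        q = pgf(f, q)
    return q

supercritical = [0.2, 0.3, 0.5]    # mean 1.3 > 1
subcritical = [0.5, 0.3, 0.2]      # mean 0.7 < 1
rho_super = extinction_probability(supercritical)
rho_sub = extinction_probability(subcritical)
print(rho_super, rho_sub)
```

For the first distribution the equation $z = \phi(z)$ reads $0.5z^2 - 0.7z + 0.2 = 0$, with roots 0.4 and 1, and the iteration settles on the smaller root; for the second the only root in $[0, 1]$ is 1.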
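Let us now compute the average size of the $n$th generation. One has
$$E(X_{n+1} \mid X_0 = 1) = \sum_{k=1}^{\infty} k\, p_{1k}^{(n+1)} = \sum_{k=1}^{\infty} k \sum_{j=1}^{\infty} p_{1j}^{(n)} p_{jk} = \sum_{j=1}^{\infty} p_{1j}^{(n)} \sum_{k=1}^{\infty} k\, p_{jk}$$
$$= \sum_{j=1}^{\infty} p_{1j}^{(n)} E(X_1 \mid X_0 = j) = \sum_{j=1}^{\infty} p_{1j}^{(n)}\, j\, E(X_1 \mid X_0 = 1)$$
$$= \sum_{j=1}^{\infty} p_{1j}^{(n)}\, j\mu = \mu \sum_{j=1}^{\infty} j\, p_{1j}^{(n)} = \mu\, E(X_n \mid X_0 = 1). \qquad (11.54)$$
It follows that
$$E(X_n \mid X_0 = 1) = \mu^n.$$
Thus, in the case $\mu < 1$, the expected size of the population at time $n$ decreases
to zero exponentially fast as $n \to \infty$. If $\mu = 1$, then the expected size at time $n$
does not depend on $n$ (i.e., it is the same as the initial size). If $\mu > 1$, then the
expected size of the population increases to infinity exponentially fast.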
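The growth rate of the mean can be confirmed exactly for an offspring law with finite support, because the distribution of the $n$th generation is then given by the coefficients of the $n$-fold composition of the generating function. This is a sketch under that assumption; the helper names `compose` and `generation_distribution` and the offspring law are illustrative.

```python
import numpy as np

def compose(outer, inner):
    """Coefficients of outer(inner(z)) via Horner's rule; index = power of z."""
    result = np.array([outer[-1]], dtype=float)
    for c in outer[-2::-1]:
        result = np.convolve(result, inner)   # multiply running polynomial by inner(z)
        result[0] += c                        # add the next coefficient of outer
    return result

def generation_distribution(f, n):
    """Distribution of X_n given X_0 = 1 for offspring distribution f."""
    f = np.asarray(f, dtype=float)
    dist = np.array([0.0, 1.0])               # X_0 = 1 has generating function z
    for _ in range(n):
        dist = compose(dist, f)               # E z^{X_{n+1}} = phi_n(phi(z))
    return dist

f = [0.2, 0.3, 0.5]                           # mean mu = 1.3
mu = sum(j * fj for j, fj in enumerate(f))
dist5 = generation_distribution(f, 5)
mean5 = float(np.arange(dist5.size) @ dist5)
print(mean5, mu**5, dist5[0])
```

The first entry of the returned distribution is the probability of extinction by generation $n$, so it increases with $n$ toward the extinction probability.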
regards the total potential energy as a sum of energies at individual sites plus
the sum of interaction energies between pairs of sites plus the sum of energies
between triples, etc., and the probability distribution is specified for various
types of such interactions. For example, if U(w) = > fEA q1(6 n ) for
Y_
C, except for the independent case it is not quite obvious how to do this starting
with energy considerations. First consider then the independent case. For
single-site energies $\varphi_1(s, n)$, $s \in S$, at the respective sites $n \in \mathbb{Z}$, the probabilities
of cylinder events can be specified according to the formula
That is, the state at $n$ depends on the given states at sites in $D_n \setminus \{n\}$ only through
the neighboring values at $n - 1$, $n + 1$. One would like to know that there is
a probability distribution having these conditional probabilities. For the
one-dimensional case at hand, we have the following basic result.
Theorem 12.1. Let $S$ be an arbitrary finite set, $\Omega = S^{\mathbb{Z}}$, and let $\mathcal{F}$ be the
sigma-field of subsets of $\Omega$ generated by finite-dimensional cylinder sets. Let
$\{X_n\}$ be the coordinate projections on $\Omega$. Suppose that $P$ is a probability measure
on $(\Omega, \mathcal{F})$ with the following properties.
(i) $P(C) > 0$ for every finite-dimensional cylinder set $C \in \mathcal{F}$.
(ii) For arbitrary $n \in \mathbb{Z}$ let $\partial(n) = \{n - 1, n + 1\}$ denote the boundary of $\{n\}$
in $\mathbb{Z}$. If $f$ is any $S$-valued function defined on a finite subset $D_n$ of $\mathbb{Z}$ that
contains $\{n\} \cup \partial(n)$ and if $a \in S$ is arbitrary, then the conditional
probability
$$P(X_n = a \mid X_m = f(m),\ m \in D_n \setminus \{n\})$$
Proof. Let
$$g_{b,c}(a) = P(X_n = a \mid X_{n-1} = b,\ X_{n+1} = c). \qquad (12.7)$$
...
$$P(X_n = a_0, X_{n+1} = a_1, \ldots, X_{n+h} = a_h) = \pi(a_0)\,p(a_0, a_1)\cdots p(a_{h-1}, a_h). \qquad (12.8)$$
So, in particular, the condition (i) is satisfied; also see (2.9), (2.6), Prop. 6.1.
For $m$ and $n \ge 1$ it is a straightforward computation to verify
Therefore, the condition (ii) holds for $P$; condition (iii) also holds because
$P$ is the distribution of a stationary Markov chain. Next suppose that $P$ is a
probability distribution satisfying (i), (ii), and (iii). We must show that $P$ is the
distribution of a stationary Markov chain. Fix an arbitrary element of $S$, denoted
as $\theta$, say. Let the local structure of $P$ be as defined in (12.7). Observe that for
each $b, c \in S$, $g_{b,c}(\cdot)$ is a probability measure on $S$. Outlined in Exercise 1 are
the steps required to show that
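$$g_{b,c}(a) = \frac{q(b, a)\,q(a, c)}{q^{(2)}(b, c)}, \qquad (12.10)$$
where
$$q(b, c) = \frac{g_{\theta,c}(b)}{g_{\theta,c}(\theta)} \qquad (12.11)$$
and
$$q^{(n+1)}(b, c) = \sum_{a\in S} q^{(n)}(b, a)\,q(a, c). \qquad (12.12)$$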
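The factorization (12.10) with $q$ defined by (12.11) can be checked numerically in the Markov case. The sketch below rests on an assumption: the local structure is computed from a strictly positive transition matrix via $g_{b,c}(a) = p(b,a)p(a,c)/p^{(2)}(b,c)$, which is the conditional law of the middle site for a Markov chain. The random matrix, the reference element (index 0), and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
k = 4
P = rng.random((k, k)) + 0.1                    # strictly positive entries
P /= P.sum(axis=1, keepdims=True)               # normalize rows
P2 = P @ P

# Local structure g[b, c, a] = p(b, a) p(a, c) / p^(2)(b, c).
g = np.einsum("ba,ac->bca", P, P) / P2[:, :, None]

theta = 0                                       # fixed reference element of S
# q(b, c) = g_{theta, c}(b) / g_{theta, c}(theta), stored as q[b, c].
q = (g[theta] / g[theta][:, [theta]]).T
q2 = q @ q                                      # two-step version, as in (12.12)

# Right side of (12.10): q(b, a) q(a, c) / q^(2)(b, c).
rhs = np.einsum("ba,ac->bca", q, q) / q2[:, :, None]
print(np.abs(g - rhs).max())
```

The matrix `q` is generally not stochastic, yet it reproduces the local structure exactly, which is the point of the construction.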
) (b, c) (c)
9(b, c) = q^" (12.15)
2 "u(b)
Let $Q$ be the probability distribution of this Markov chain. It is enough to
show that $P$ and $Q$ agree on the cylinder events. Using (12.13) we have for any
$n \ge 1$,
(n)(a,, b)
(n) (a , ao)9(ao, a,) ... q(ar,, ar)q
_ P(Xn = a, Xn+r = b) q q(r+2n)(a, b)
aES bES
random variables defined on some probability space $(\Omega, \mathcal{F}, P)$. Given an initial
random variable $X_0$ independent of $\{\varepsilon_n\}$, define recursively the sequence of
random variables $\{X_n : n \ge 0\}$ as follows:
and $|\varepsilon_n| \le c$ with probability 1 for some constant $c$. Then it follows from (13.5)
that
This implies that, with probability 1, $|b^n \varepsilon_{n+1}| \le c(|b|\delta)^n$ for all but finitely many
$n$. Since $|b|\delta < 1$, the series on the right side of (13.7) is convergent and is the
limit of $Y_n$.
It is simple to check that (13.8) holds if $|b| < 1$ and (Exercise 3)
The conditions (13.6) and (13.8) (or (13.9)) are therefore sufficient for the
existence of a unique invariant probability $\pi$ and for the convergence of $X_n$ in
distribution to $\pi$.
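The convergence of the series can be illustrated in the bounded-noise case. The sketch below makes assumptions that should be read as such: the recursion is taken to be $X_{n+1} = bX_n + \varepsilon_{n+1}$, the partial sums are taken to be $Y_n = \sum_{k=0}^{n-1} b^k \varepsilon_{k+1}$, and the noise is uniform on $[-c, c)$; the values of `b`, `c`, and the sample size are illustrative.

```python
import numpy as np

b, c = 0.8, 1.0                                   # |b| < 1 and noise bound c
N = 400
rng = np.random.default_rng(1)
eps = rng.uniform(-c, c, size=N)                  # eps[k] plays the role of eps_{k+1}

# Partial sums: Y[n-1] = sum_{k=0}^{n-1} b^k eps_{k+1}.
Y = np.cumsum(b ** np.arange(N) * eps)

# Geometric tail bound: |Y_N - Y_n| <= c |b|^n / (1 - |b|).
gaps = np.abs(Y[-1] - Y)
tail_bound = c * abs(b) ** np.arange(1, N + 1) / (1 - abs(b))

# Two copies of the recursion driven by the same noise forget their starting points.
x, y = 50.0, -50.0
for e in eps:
    x, y = b * x + e, b * y + e
print(Y[-1], gaps[9], tail_bound[9], abs(x - y))
```

The difference between two copies driven by the same noise is exactly $b^n(x - y)$, which is why the limit distribution does not depend on the initial state.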
As in (13.2), (13.3), $\{X_n\}$ is a Markov process with state space $\mathbb{R}^m$ and transition
probability
Assume that
where $|x|$ denotes the Euclidean length of $x$ in $\mathbb{R}^m$. For a positive integer $n > n_0$
write $n = jn_0 + j'$, where $0 \le j' < n_0$. Then using the fact $\|B_1B_2\| \le \|B_1\|\,\|B_2\|$
for arbitrary $m \times m$ matrices $B_1$, $B_2$ (Exercise 2), one gets
$$\sum_{n=1}^{\infty} P(|\varepsilon_1| > c\delta^n) < \infty \quad \text{for some } \delta < \frac{1}{\|B^{n_0}\|^{1/n_0}}. \qquad (13.15)$$
It also follows, as in Example 1 (see Exercise 1), that no matter what the initial
distribution (i.e., the distribution of $X_0$) is, $X_n$ converges in distribution to the
distribution $\pi$ of $Y$. Therefore, $\pi$ is the unique invariant distribution for $p(x, dy)$.
For purposes of application it is useful to know that the assumption (13.12)
holds if the maximum modulus of eigenvalues of $B$, also known as the spectral
radius $r(B)$ of $B$, is less than 1. This fact is implied by the following result from
linear algebra.
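Proof. Let $\lambda_1, \ldots, \lambda_m$ be the eigenvalues of $B$. This means $\det(B - \lambda I) =
(\lambda_1 - \lambda)(\lambda_2 - \lambda)\cdots(\lambda_m - \lambda)$, where det is shorthand for determinant and $I$ is
the identity matrix. Let $\lambda_m$ have the maximum modulus among the $\lambda_i$, i.e.,
$|\lambda_m| = r(B)$. If $|\lambda| > |\lambda_m|$ then $B - \lambda I$ is invertible, since $\det(B - \lambda I) \ne 0$. Indeed,
by the definition of the inverse, each element of the inverse of $B - \lambda I$ is a
polynomial in $\lambda$ (of degree $m - 1$ or $m - 2$) divided by $\det(B - \lambda I)$. Therefore,
one may write
$$(B - \lambda I)^{-1} = (\lambda_1 - \lambda)^{-1}\cdots(\lambda_m - \lambda)^{-1} \sum_{j=0}^{m-1} \lambda^j B_j, \qquad (13.18)$$
where $B_j$ ($0 \le j \le m - 1$) are $m \times m$ matrices that do not involve $\lambda$. Writing
$z = 1/\lambda$, one may express (13.18) as
$$(B - \lambda I)^{-1} = (-\lambda)^{-m}\Bigl(1 - \frac{\lambda_1}{\lambda}\Bigr)^{-1}\cdots\Bigl(1 - \frac{\lambda_m}{\lambda}\Bigr)^{-1} \lambda^{m-1} \sum_{j=0}^{m-1} \Bigl(\frac{1}{\lambda}\Bigr)^{m-1-j} B_j$$
$$= (-1)^m z\,(1 - \lambda_1 z)^{-1}\cdots(1 - \lambda_m z)^{-1} \sum_{j=0}^{m-1} z^{m-1-j} B_j$$
$$= z\Bigl(\sum_{n=0}^{\infty} a_n z^n\Bigr) \sum_{j=0}^{m-1} z^{m-1-j} B_j \quad (|z| < |\lambda_m|^{-1}). \qquad (13.19)$$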
To see this, first note that the series on the right is convergent in norm for
$|z| < 1/\|B\|$, and then check that term-by-term multiplication of the series $\sum z^k B^k$
by $I - zB$ yields the identity $I$ after all cancellations. In particular, writing $b_{ij}^{(k)}$
for the $(i, j)$ element of $B^k$, the series
$$-z \sum_{k=0}^{\infty} z^k b_{ij}^{(k)} \qquad (13.21)$$
converges absolutely for $|z| < 1/\|B\|$. Since (13.21) is the same as the $(i, j)$ element
of the series (13.19), at least for $|z| < 1/\|B\|$, their coefficients coincide (Exercise
4) and, therefore, the series in (13.21) is absolutely convergent for $|z| < |\lambda_m|^{-1}$
(as (13.19) is).
This implies that, for each $\varepsilon > 0$,
$$|b_{ij}^{(k)}| < (|\lambda_m| + \varepsilon)^k \quad \text{for all sufficiently large } k. \qquad (13.22)$$
For if (13.22) is violated, one may choose $|z|$ sufficiently close to (but less than)
$1/|\lambda_m|$ such that $|z^{k'} b_{ij}^{(k')}| \to \infty$ for a subsequence $\{k'\}$, contradicting the
requirement that the terms of the convergent series (13.21) must go to zero for
$|z| < 1/|\lambda_m|$.
Now $\|B^k\| \le m \max\{|b_{ij}^{(k)}| : 1 \le i, j \le m\}$ (Exercise 2). Since $m^{1/k} \to 1$ as
$k \to \infty$, (13.22) implies (13.17). ∎
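The gap between the norm of $B$ and its spectral radius is easy to see on a non-normal matrix. The sketch below uses an illustrative $2 \times 2$ matrix (not from the text) whose operator norm exceeds 2 while both eigenvalues equal 0.5, finds the first power whose norm drops below 1, and tracks the $k$th roots of $\|B^k\|$.

```python
import numpy as np

B = np.array([[0.5, 2.0],
              [0.0, 0.5]])                       # illustrative: r(B) = 0.5, ||B|| > 2
r = max(abs(np.linalg.eigvals(B)))

def op_norm(M):
    """Operator norm induced by the Euclidean length (largest singular value)."""
    return np.linalg.norm(M, 2)

norms = np.array([op_norm(np.linalg.matrix_power(B, k)) for k in range(1, 61)])
n0 = int(np.argmax(norms < 1.0)) + 1             # first k with ||B^k|| < 1
roots = norms ** (1.0 / np.arange(1, 61))        # ||B^k||^(1/k)
print("r(B) =", r, " n0 =", n0, " ||B^60||^(1/60) =", roots[-1])
```

The roots decrease toward $r(B)$ rather slowly here, but a power with norm below 1 already appears at $k = 5$, which is all the stability assumption requires.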
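Two well-known time series models will now be treated as special cases of
Example 2. These are the $p$th order autoregressive (or AR($p$)) model, and the
autoregressive moving-average model ARMA($p$, $q$).
$$U_{n+p} := \sum_{i=0}^{p-1} \beta_i U_{n+i} + \eta_{n+p} \quad (n \ge 0). \qquad (13.23)$$
$$B := \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & \cdots & 0 & 0 \\ \vdots & & & & & \vdots \\ 0 & 0 & 0 & \cdots & 0 & 1 \\ \beta_0 & \beta_1 & \beta_2 & \cdots & \beta_{p-2} & \beta_{p-1} \end{pmatrix}. \qquad (13.27)$$
$$B - \lambda I = \begin{pmatrix} -\lambda & 1 & 0 & \cdots & 0 & 0 \\ 0 & -\lambda & 1 & \cdots & 0 & 0 \\ \vdots & & & & & \vdots \\ 0 & 0 & 0 & \cdots & -\lambda & 1 \\ \beta_0 & \beta_1 & \beta_2 & \cdots & \beta_{p-2} & \beta_{p-1} - \lambda \end{pmatrix}.$$
Expanding $\det(B - \lambda I)$ by its last row, and using the fact that the determinant
of a matrix in triangular form (i.e., with all zero off-diagonal elements on one
side of the diagonal) is the product of its diagonal elements (Exercise 5), one gets
$$\det(B - \lambda I) = (-1)^{p+1}(\beta_0 + \beta_1\lambda + \cdots + \beta_{p-1}\lambda^{p-1} - \lambda^p). \qquad (13.28)$$
$$\beta_0 + \beta_1\lambda + \cdots + \beta_{p-1}\lambda^{p-1} - \lambda^p = 0. \qquad (13.29)$$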
Finally, in view of (13.17), the following proposition holds (see (13.15) and
Exercise 3).
Proposition 13.1. Suppose that the roots of the polynomial equation (13.29)
are all strictly inside the unit circle in the complex plane, and that the common
distribution $G$ of $\{\eta_n\}$ satisfies
where $|\lambda_m|$ is the maximum modulus of the roots of (13.29). Then (i) there exists
a unique invariant distribution $\pi$ for the Markov process $\{X_n\}$, and (ii) no
matter what the initial distribution, $X_n$ converges in distribution to $\pi$.
Once again it is simple to check that (13.30) holds if $G$ has a finite absolute
moment of some order $r > 0$ (Exercise 3).
An immediate consequence of Proposition 13.1 is that the time series
$\{U_n : n \ge 0\}$ converges in distribution to a steady state $\pi_U$ given, for all Borel
sets $C \subset \mathbb{R}^1$, by
To see this, simply note that $U_n$ is the first coordinate of $X_n$, so that $X_n$ converges
to $\pi$ in distribution implies $U_n$ converges to $\pi_U$ in distribution.
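The root condition of the proposition is straightforward to test numerically for a given set of autoregressive coefficients. The sketch below assumes the state vector stacks $p$ consecutive values of the series, so that the matrix has ones on the superdiagonal and the coefficients in its last row; the function name `companion` and the coefficient values are illustrative.

```python
import numpy as np

def companion(beta):
    """Matrix with ones on the superdiagonal and (beta_0, ..., beta_{p-1}) as last row."""
    p = len(beta)
    B = np.zeros((p, p))
    B[:-1, 1:] = np.eye(p - 1)
    B[-1, :] = beta
    return B

beta = [0.1, -0.2, 0.5]                      # beta_0, beta_1, beta_2 (hypothetical AR(3))
B = companion(beta)

# Characteristic polynomial det(lambda I - B), highest power first:
# lambda^3 - beta_2 lambda^2 - beta_1 lambda - beta_0.
char_poly = np.poly(B)
expected = np.concatenate(([1.0], -np.array(beta)[::-1]))

radius = max(abs(np.linalg.eigvals(B)))      # largest modulus among the roots
print(char_poly, expected, radius)
```

Since the absolute values of the coefficients sum to 0.8, all roots lie strictly inside the unit circle, so this AR(3) example satisfies the root condition.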
$$U_{n+p} := \sum_{i=0}^{p-1} \beta_i U_{n+i} + \sum_{j=1}^{q} \delta_j \eta_{n+p-j} + \eta_{n+p} \quad (n \ge 0), \qquad (13.32)$$
where $p$, $q$ are positive integers, $\beta_i$ ($0 \le i \le p - 1$) and $\delta_j$ ($1 \le j \le q$) are real
constants, $\{\eta_n : n \ge p - q\}$ is an i.i.d. sequence of real-valued random variables,
and $U_i$ ($0 \le i \le p - 1$) are arbitrary initial random variables independent of
$\{\eta_n\}$. Consider the sequence $\{X_n\}$, $\{\varepsilon_n\}$ of $(p + q)$-dimensional vectors
$$X_{n+1} = HX_n + \varepsilon_{n+1} \quad (n \ge 0), \qquad (13.34)$$
b ll . ... b 1 0 . ... 0 0
h nl h ... b 2 d1
d91
0 00 1 0 .. 00
H :=
0 00 0 1 00
0 0 ... 0 0 01
0 0 . 0 0 ... 00
Therefore, the eigenvalues of $H$ are $q$ zeros and the roots of (13.29). Thus, one
has the following proposition.
no matter what the distribution of $(U_0, U_1, \ldots, U_{p-1})$ is, provided the
hypothesis of Proposition 13.2 is satisfied.
In the case that $\varepsilon_n$ is Gaussian, it is simple to check that under the hypothesis
(13.12) in Example 2 the random vector $Y$ in (13.16) is Gaussian. Therefore, $\pi$
is Gaussian, so that the stationary vector-valued process $\{X_n\}$ with initial
distribution $\pi$ is Gaussian (Exercise 6). In particular, if $\eta_n$ are Gaussian in
Example 2(a), and the roots of the polynomial equation (13.29) lie inside the
unit circle in the complex plane, then the stationary process $\{U_n\}$, obtained
174 DISCRETEPARAMETER MARKOV CHAINS
It will be shown now that the following condition guarantees the existence
of a unique invariant probability $\pi$ as well as stability, i.e., convergence of $X_n(x)$
in distribution to $\pi$ for every initial state x. Assume
$$\delta_1 := P(X_{n_0}(x) \le z_0\ \forall x) > 0 \quad \text{and} \quad \delta_2 := P(X_{n_0}(x) \ge z_0\ \forall x) > 0 \tag{14.4}$$
for some $z_0 \in J$ and some integer $n_0$.
Define
$$\Delta_n := \sup_{x, y, z \in J} |P(X_n(x) \le z) - P(X_n(y) \le z)|. \tag{14.5}$$
For this fix $x, y \in J$ and first take $z < z_0$. On the set $F_2 := \{X_{n_0}(x) \ge z_0\ \forall x\}$ the
events $\{X_{n_0}(x) \le z\}$, $\{X_{n_0}(y) \le z\}$ are both empty. Hence, by the second
condition in (14.4),
$$|P(X_{n_0}(x) \le z) - P(X_{n_0}(y) \le z)| = |E(\mathbf{1}_{\{X_{n_0}(x) \le z\}} - \mathbf{1}_{\{X_{n_0}(y) \le z\}})| \le P(F_2^c) = 1 - \delta_2, \tag{14.7}$$
since the difference between the two indicator functions in (14.7) is zero on $F_2$.
Similarly, if $z \ge z_0$ then on the set $F_1 := \{X_{n_0}(x) \le z_0\ \forall x\}$ the two indicator
functions both equal 1, so that their difference vanishes and one gets
$$|P(X_{n_0}(x) \le z) - P(X_{n_0}(y) \le z)| \le P(F_1^c) = 1 - \delta_1. \tag{14.8}$$
For,
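$$= |E(\mathbf{1}_{\{\alpha_{(j+1)n_0} \cdots \alpha_{jn_0+1} X_{jn_0}(x) \le z\}} - \mathbf{1}_{\{\alpha_{(j+1)n_0} \cdots \alpha_{jn_0+1} X_{jn_0}(y) \le z\}})|$$
$$= |E(\mathbf{1}_{\{X_{jn_0}(x) \in (\alpha_{(j+1)n_0} \cdots \alpha_{jn_0+1})^{-1}(-\infty, z]\}} - \mathbf{1}_{\{X_{jn_0}(y) \in (\alpha_{(j+1)n_0} \cdots \alpha_{jn_0+1})^{-1}(-\infty, z]\}})|. \tag{14.13}$$
Let
$$F_3 := \{\alpha_{(j+1)n_0} \cdots \alpha_{jn_0+1} x \le z_0\ \forall x\}, \qquad F_4 := \{\alpha_{(j+1)n_0} \cdots \alpha_{jn_0+1} x \ge z_0\ \forall x\}.$$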
MARKOV PROCESSES GENERATED BY ITERATIONS OF I.I.D. MAPS 177
Take $z < z_0$ first. Then the inverse image of $(-\infty, z]$ in (14.13) is empty on
$F_4$, so that the difference between the two indicator functions vanishes on $F_4$.
On the complement of $F_4$, the inverse image of $(-\infty, z]$ under the continuous
increasing map $\alpha_{(j+1)n_0} \cdots \alpha_{jn_0+1}$ is an interval $(-\infty, Z'] \cap J$, where $Z'$ is a
random variable. Therefore, (14.13) leads to
As $F_4$ and $Z'$ are determined by $\alpha_{(j+1)n_0}, \ldots, \alpha_{jn_0+1}$ and the latter are
independent of $X_{jn_0}(x)$, $X_{jn_0}(y)$ one gets, by taking conditional expectation given
$\alpha_{(j+1)n_0}, \ldots, \alpha_{jn_0+1}$
Y0(x) = x, }(x):=12. (n
(n 1). (14.16)
and the $\varphi_n$ ($n \ge 1$) are independent. Thus the Markov process $\{X_n(x): n \ge 0\}$
on the state space $(0, \infty)$ may be represented as
$$X_n(x) = \alpha_n \cdots \alpha_1 x,$$
where, writing
one has
i.e., lim x10 g,(x) > I for all r. As lim x ^ g(x) = 0 + (1 /3) lim x . f(x) = 0 < 1,
it follows from the strictly increasing and strict concavity properties of $g_r$ that
each $g_r$ has a unique fixed point $a_r$ (see Figure 14.1).
Note that by property (iii) of $f_r$, $a_1 < a_2 < \cdots < a_N$. If $y \ge a_1$, then
$g_r(y) \ge g_r(a_1) \ge g_1(a_1) = a_1$, so that $X_n(x) \ge a_1$ for all $n \ge 0$ if $x \ge a_1$. Similarly,
if $y \le a_N$ then $g_r(y) \le g_r(a_N) \le g_N(a_N) = a_N$, so that $X_n(x) \le a_N$ for all $n \ge 0$ if
$x \le a_N$. As a consequence, if the initial state x is in $[a_1, a_N]$, then the process
$\{X_n(x): n \ge 0\}$ remains in $[a_1, a_N]$ forever. In this case, one may take
$J = [a_1, a_N]$ to be the effective state space. Also, if $x \ge a_1$ then the nth iterate
of $g_1$, namely $g_1^{(n)}(x)$, decreases as n increases. For if $x \ge a_1$, then $g_1(x) \le x$,
$g_1^{(2)}(x) = g_1(g_1(x)) \le g_1(x)$, etc. The limit of this decreasing sequence is a fixed
point of $g_1$ (Exercise 3) and, therefore, must be $a_1$. Similarly, if $x \le a_N$ then
$g_N^{(n)}(x)$ increases, as n increases, to $a_N$. In particular,
$$g_1^{(n_0)}(a_N) < g_N^{(n_0)}(a_1). \tag{14.22}$$
Figure 14.1
$$P(X_{n_0}(x) \le z_0\ \forall x \in [a_1, a_N]) \ge P(\alpha_n = g_1 \text{ for } 1 \le n \le n_0) = p_1^{n_0} > 0,$$
$$P(X_{n_0}(x) \ge z_0\ \forall x \in [a_1, a_N]) \ge P(\alpha_n = g_N \text{ for } 1 \le n \le n_0) = p_N^{n_0} > 0.$$
Hence, the condition (14.4) of Example 1 holds, and there exists a unique
invariant probability $\pi$, if the state space is taken to be $[a_1, a_N]$.
Next fix the initial state x in $(0, a_1)$. Then $g_1^{(n)}(x)$ increases, as n increases.
The limit must be a fixed point and, therefore, $a_1$. Since $g_r(a_1) > a_1$ for
$r = 2, \ldots, N$, there exists $\varepsilon > 0$ such that $g_r(y) > a_1$ ($2 \le r \le N$) if
$y \in [a_1 - \varepsilon, a_1]$. Now find $n_\varepsilon$ such that $g_1^{(n_\varepsilon)}(x) \ge a_1 - \varepsilon$. If $\tau_1 := \inf\{n \ge 1:
X_n(x) \ge a_1\}$, then it follows from the above that
because $\tau_1 > n_\varepsilon + k$ implies that the last k among the first $n_\varepsilon + k$ functions $\alpha_n$
are $g_1$. Since $p_1^k$ goes to zero as $k \to \infty$, it follows from this that $\tau_1$ is a.s. finite.
Also $X_{\tau_1}(x) \le a_N$ as $g_r(y) \le g_r(a_N) \le g_N(a_N) = a_N$ ($1 \le r \le N$) for $y \le a_1$, so that
in a single step it is not possible to go from a state less than $a_1$ to a state larger
than $a_N$. By the strong Markov property, and the result in the preceding
paragraph on the existence of a unique invariant distribution and stability on
$[a_1, a_N]$, it follows that $X_{\tau_1 + m}(x)$ converges in distribution to $\pi$, as $m \to \infty$
(Exercise 5). From this, one may show that $p^{(n)}(x, dy)$ converges weakly to $\pi(dy)$
for all x, as $n \to \infty$, so that $\pi$ is the unique invariant probability on $(0, \infty)$
(Exercise 5).
In the same manner it may be checked that $X_n(x)$ converges in distribution
to $\pi$ if $x > a_N$. Thus, no matter what the initial state x is, $X_n(x)$ converges in
distribution to $\pi$. Therefore, on the state space $(0, \infty)$ there exists a unique
invariant distribution $\pi$ (assigning probability 1 to $[a_1, a_N]$), and stability holds.
In analogy with the case of Markov chains, one may call the set of states
$\{x: 0 < x < a_1 \text{ or } x > a_N\}$ inessential.
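
The absorption of the process into $[a_1, a_N]$ can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: the maps $g_r(x) = c_r\sqrt{x}$, the constants in `C`, and the probabilities in `P` are hypothetical stand-ins for the $g_r$ of the example (they are strictly increasing and strictly concave, with fixed points $a_r = c_r^2$), not the functions used in the text.

```python
import math
import random

# Hypothetical maps g_r(x) = c_r * sqrt(x); the fixed point of g_r is a_r = c_r**2.
C = [1.0, 1.5, 2.0]
P = [0.3, 0.4, 0.3]                 # hypothetical probabilities p_1, ..., p_N
A1, AN = C[0] ** 2, C[-1] ** 2      # a_1 and a_N

def simulate(x0, n_steps, seed=0):
    """Return the path X_0, ..., X_n obtained by applying i.i.d. random maps."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(n_steps):
        c = rng.choices(C, weights=P)[0]
        path.append(c * math.sqrt(path[-1]))
    return path

def first_entry(path):
    """Index of the first state in [a_1, a_N], or None if the path never enters."""
    for n, x in enumerate(path):
        if A1 <= x <= AN:
            return n
    return None
```

Starting below $a_1$ or above $a_N$, a simulated path enters $[a_1, a_N]$ after finitely many steps and never leaves, which is the behavior that makes the outside states inessential.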
The study of the existence of unique invariant probabilities and stability is
relatively simpler for those cases in which the transition probabilities p(x, dy)
have a density p(x, y), say, with respect to some reference measure $\mu(dy)$ on the
state space. In the case of Markov chains this measure may be taken to be the
counting measure, assigning mass 1 to each singleton in the state space. For a
class of simple examples with an uncountable state space, let $S = \mathbb{R}^1$ and f a
bounded measurable function on $\mathbb{R}^1$, $a \le f(x) \le b$. Let $\{\varepsilon_n\}$ be an i.i.d. sequence
of real-valued random variables whose common distribution has a strictly
positive continuous density $\varphi$ with respect to Lebesgue measure on $\mathbb{R}^1$. Consider
the Markov process
with $X_0$ arbitrary (independent of $\{\varepsilon_n\}$). Then the transition probability p(x, dy)
Note that
where
Then (see theoretical complement 6.1) it follows that this Markov process has
a unique invariant probability with a density $\pi(y)$ and that the distribution of
$X_n$ converges to $\pi(y)\,dy$, whatever the initial state.
The following example illustrates the dramatic difference between the cases
when a density exists and when it does not.
$$f(x) = \begin{cases} x + 1 & \text{if } -2 \le x \le 0, \\ x - 1 & \text{if } 0 < x \le 2. \end{cases} \tag{14.26}$$
First let $\varepsilon_n$ be Bernoulli, $P(\varepsilon_n = 1) = \tfrac{1}{2} = P(\varepsilon_n = -1)$. Then, with $X_0 \equiv x \in (0, 2]$,
$$X_1(x) = \begin{cases} x - 2 & \text{with probability } \tfrac{1}{2}, \\ x & \text{with probability } \tfrac{1}{2}. \end{cases}$$
In other words, $X_1(x)$ and $X_2(x)$ are independent and have the same two-point
distribution $\pi_x$. It follows that $\{X_n(x): n \ge 1\}$ is i.i.d. with common distribution
$\pi_x$. In particular, $\pi_x$ is an invariant initial distribution. If $x \in [-2, 0]$, then
$\{X_n(x): n \ge 1\}$ is i.i.d. with common distribution $\pi_{x+2}$, assigning probabilities
$\tfrac{1}{2}$ and $\tfrac{1}{2}$ to $\{x + 2\}$ and $\{x\}$. Thus, there is an uncountable family
of invariant initial distributions $\{\pi_x: 0 < x \le 1\} \cup \{\pi_{x+2}: -1 \le x \le 0\}$.
On the other hand, suppose $\varepsilon_n$ is uniform on $[-1, 1]$, i.e., has the density $\tfrac{1}{2}$
on $[-1, 1]$ and zero outside. Check that (Exercise 6) $\{X_{2n}(x): n \ge 1\}$ is an i.i.d.
sequence whose common distribution does not depend on x and has a density
$$\pi(y) = \frac{2 - |y|}{4}, \qquad -2 \le y \le 2. \tag{14.27}$$
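
The contrast between the two noise laws can be seen in a short simulation. This is a sketch under an assumption: the recursion is taken to be $X_{n+1} = f(X_n) + \varepsilon_{n+1}$ with f as in (14.26), since the defining equation of the process is not reproduced in this passage. The helper names `path`, `bernoulli`, and `uniform` are hypothetical.

```python
import random

def f(x):
    """The map of (14.26): x + 1 on [-2, 0], x - 1 on (0, 2]."""
    return x + 1.0 if x <= 0.0 else x - 1.0

def path(x0, n, noise, seed=0):
    """Return X_1, ..., X_n for the assumed recursion X_{k+1} = f(X_k) + eps_{k+1}."""
    rng = random.Random(seed)
    xs, x = [], x0
    for _ in range(n):
        x = f(x) + noise(rng)
        xs.append(x)
    return xs

def bernoulli(rng):
    """Noise taking the values -1 and +1 with probability 1/2 each."""
    return rng.choice((-1.0, 1.0))

def uniform(rng):
    """Noise uniformly distributed on [-1, 1]."""
    return rng.uniform(-1.0, 1.0)

# Bernoulli noise: the chain started at x only ever visits x and x - 2.
two_point = set(path(0.5, 1000, bernoulli))

# Uniform noise: the even-indexed states X_2, X_4, ... are the i.i.d. sample.
even_states = path(0.3, 40000, uniform)[1::2]
mean = sum(even_states) / len(even_states)
var = sum((y - mean) ** 2 for y in even_states) / len(even_states)
```

The triangular density (14.27) has mean 0 and variance 2/3, so the sample moments of `even_states` give a quick sanity check on the uniform case, while `two_point` shows how the Bernoulli chain remembers its starting point forever.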
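iff
$$X_0 = x > c + \frac{c}{\varepsilon_1} + \frac{c}{\varepsilon_1 \varepsilon_2} + \cdots + \frac{c}{\varepsilon_1 \varepsilon_2 \cdots \varepsilon_{n+1}}.$$
$$\{X_n > c \text{ for all } n\} = \left\{x > c + \frac{c}{\varepsilon_1} + \frac{c}{\varepsilon_1 \varepsilon_2} + \cdots + \frac{c}{\varepsilon_1 \varepsilon_2 \cdots \varepsilon_n} \text{ for all } n\right\}$$
$$= \left\{x \ge c + c \sum_{n=1}^{\infty} \frac{1}{\varepsilon_1 \varepsilon_2 \cdots \varepsilon_n}\right\} = \left\{\sum_{n=1}^{\infty} \frac{1}{\varepsilon_1 \varepsilon_2 \cdots \varepsilon_n} \le \frac{x}{c} - 1\right\}.$$
In other words,
$$p(x) = P\left(\sum_{n=1}^{\infty} \frac{1}{\varepsilon_1 \varepsilon_2 \cdots \varepsilon_n} \le \frac{x}{c} - 1\right). \tag{14.31}$$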
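so that $\log \varepsilon_1 \varepsilon_2 \cdots \varepsilon_n \to -\infty$ a.s., or $\varepsilon_1 \varepsilon_2 \cdots \varepsilon_n \to 0$ a.s. This implies that the
infinite series in (14.31) diverges a.s., that is,
$$p(x) = 0 \text{ for all } x, \text{ if } E \log \varepsilon_1 < 0. \tag{14.32}$$
Now by Jensen's Inequality (Chapter 0, Section 2), $E \log \varepsilon_1 \le \log E\varepsilon_1$, with
strict inequality unless $\varepsilon_1$ is degenerate. Therefore, if $E\varepsilon_1 < 1$, or $E\varepsilon_1 = 1$ and
$\varepsilon_1$ is nondegenerate, then $E \log \varepsilon_1 < 0$. If $\varepsilon_1$ is degenerate and $E\varepsilon_1 = 1$, then
$P(\varepsilon_1 = 1) = 1$, and the infinite series in (14.31) diverges. Therefore, (14.32)
implies
$$p(x) = 0 \text{ for all } x, \text{ if } E\varepsilon_1 \le 1. \tag{14.33}$$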
It is not true, however, that $E \log \varepsilon_1 > 0$ implies $p(x) = 1$ for large x. To see
this and for some different criteria, define
This is possible, as $\prod_r (1 + 1/r^2) < \exp\{\sum_r 1/r^2\} < \infty$. If $m \le 1$ then
$P(\varepsilon_1 < 1 + 1/r^2) > 0$ for all $r \ge 1$. Hence,
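$$p(x) \begin{cases} < 1 & \text{if } x < c\,\dfrac{m}{m-1}, \\[2mm] = 1 & \text{if } x \ge c\,\dfrac{m}{m-1} \end{cases} \qquad (m > 1). \tag{14.37}$$
$$\sum_{n=1}^{\infty} \frac{1}{\varepsilon_1 \cdots \varepsilon_n} \le \sum_{n=1}^{\infty} \frac{1}{m^n} = \frac{1}{m-1}$$
with probability 1 (if m > 1). Therefore, (14.31) implies the second relation in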
n(S) 1 1 n(S) 1 1 a
r 1 (m +b )...(
i m+8 r ) > = 1 mr
2. (14.39)
Then
r.l m
Y
5P1 1 >> 1 a1 =P
\ r =1 ElE2...Er r =1 m
r )
1 > 1 SI.
(r11r m )
Y
If $\delta > 0$ is small enough, the last probability is smaller than $P(\sum 1/(\varepsilon_1 \cdots \varepsilon_r) >
x/c - 1)$, provided $x/c - 1 < 1/(m - 1)$, i.e., if $x < cm/(m - 1)$. Thus for such
x one has $1 - p(x) > 0$, proving the first relation in (14.37).
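
The two regimes, certain ruin when $E\varepsilon_1 \le 1$ as in (14.33) and certain survival above the threshold $cm/(m-1)$ as in (14.37), can be illustrated by simulation. This sketch assumes the recursion $X_{n+1} = \varepsilon_{n+1}(X_n - c)$, which is consistent with the inequalities leading to (14.31) but is not stated in this passage. It also treats m as the smallest value that $\varepsilon_1$ can take. The function names and the noise laws are hypothetical.

```python
import random

def survives(x, c, draw_eps, horizon, rng):
    """True if X_n > c for n = 0, ..., horizon under X_{n+1} = eps_{n+1} * (X_n - c)."""
    for _ in range(horizon):
        if x <= c:
            return False
        x = draw_eps(rng) * (x - c)
    return x > c

def survival_fraction(x, c, draw_eps, horizon=2000, n_paths=200, seed=0):
    """Fraction of simulated paths that stay above c up to the horizon."""
    rng = random.Random(seed)
    alive = sum(survives(x, c, draw_eps, horizon, rng) for _ in range(n_paths))
    return alive / n_paths

def shrinking(rng):
    """Hypothetical law with E eps = 0.85 < 1, so (14.33) predicts certain ruin."""
    return rng.uniform(0.5, 1.2)

def growing(rng):
    """Hypothetical law supported on [2, 3]: m = 2, so c*m/(m-1) = 2c."""
    return rng.uniform(2.0, 3.0)
```

With `growing` and c = 1, a start at x = 2 survives on every path, while a start at x = 1.4 is ruined on every path because the series in (14.31) is at least 1/2 when $\varepsilon_1 \le 3$. A finite horizon can only overestimate survival, so the simulated fraction is an upper estimate of p(x).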
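$$\mu = \limsup_{t \to \infty} \frac{\mu_t}{t}. \tag{15.2}$$
$$\frac{H}{\log M} = \frac{-\sum_i \sum_j \pi_i p_{ij} \log p_{ij}}{\log M}, \tag{15.3}$$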
in the sense that the compression coefficient of a code is never smaller than
this, although there are codes whose coefficient is arbitrarily close to it.
The parameter $H = -\sum_i \sum_j \pi_i p_{ij} \log p_{ij}$ is referred to as the entropy of the
that, given the transition law of the Markov chain, the optimal compression
coefficient may easily be computed from (15.3) once the invariant initial
distribution is determined.
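
As a concrete illustration of that computation, the sketch below finds the invariant distribution of a finite transition matrix, evaluates $H = -\sum_i \sum_j \pi_i p_{ij} \log p_{ij}$, and divides by $\log M$. The function names are hypothetical, and the code assumes an irreducible chain so that the invariant distribution is unique.

```python
import numpy as np

def stationary_distribution(P):
    """Left eigenvector of P for eigenvalue 1, normalized to sum to 1."""
    w, v = np.linalg.eig(P.T)
    k = np.argmin(np.abs(w - 1.0))
    pi = np.real(v[:, k])
    return pi / pi.sum()

def entropy_rate(P):
    """H = -sum_i sum_j pi_i p_ij log p_ij, with the convention 0 log 0 = 0."""
    pi = stationary_distribution(P)
    logP = np.where(P > 0, np.log(np.where(P > 0, P, 1.0)), 0.0)
    return float(-np.sum(pi[:, None] * P * logP))

def compression_coefficient(P, M):
    """The bound H / log M for a code over an alphabet of size M."""
    return entropy_rate(P) / np.log(M)
```

For example, a two-state chain whose rows are both (1/2, 1/2) is a sequence of fair coin tosses; its entropy is log 2, so with M = 2 the coefficient is 1 and no compression is possible.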
For a word a of length t let $p_t(a) = P((X_0, \ldots, X_{t-1}) = a)$. Then,
as $t \to \infty$ with probability 1 (Exercise 4); i.e., for almost all sample realizations,
for large t the probability of the sequence $X_0, X_1, \ldots, X_{t-1}$ is approximately
$\exp\{-tH\}$. The result (15.5) is quite remarkable. It has a natural generalization
that applies to a large class of stationary processes so long as a law of large
numbers applies (Exercise 4).
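
The approximation of the path probability by $\exp\{-tH\}$ can be observed numerically. The following sketch uses a hypothetical two-state chain (the matrix `P` and its invariant distribution `PI` are illustrative choices, not taken from the text), simulates one long path started from the invariant distribution, and returns $-\tfrac{1}{t}\log p_t(X_0, \ldots, X_{t-1})$.

```python
import math
import random

P = [[0.9, 0.1], [0.2, 0.8]]        # hypothetical transition matrix
PI = [2.0 / 3.0, 1.0 / 3.0]         # its invariant distribution

# Entropy H = -sum_i sum_j pi_i p_ij log p_ij of this chain.
H = -sum(PI[i] * P[i][j] * math.log(P[i][j]) for i in range(2) for j in range(2))

def minus_log_prob_rate(t, seed=0):
    """Simulate X_0, ..., X_{t-1} and return -(1/t) log p_t(X_0, ..., X_{t-1})."""
    rng = random.Random(seed)
    x = 0 if rng.random() < PI[0] else 1
    logp = math.log(PI[x])
    for _ in range(t - 1):
        y = 0 if rng.random() < P[x][0] else 1
        logp += math.log(P[x][y])
        x = y
    return -logp / t
```

For large t the returned value is close to H on a typical run, which is the sense in which almost every realization has probability of order $\exp\{-tH\}$.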
An important consequence of (15.5) that will be used below is obtained by
considering for each t the $M^t$ words of length t arranged as $a^{(1)}, a^{(2)}, \ldots$ in order
of decreasing probability. For any positive number $\varepsilon < 1$, let
$$\lim_{t \to \infty} \frac{\log N_t(\varepsilon)}{t} = H. \tag{15.7}$$
$$\left| -\frac{\log p_t(X_0, \ldots, X_{t-1})}{t} - H \right| < \gamma$$
with probability at least $1 - \delta$. Let $R_t$ denote the set consisting of all words a
of length t such that $e^{-t(H+\gamma)} < p_t(a) < e^{-t(H-\gamma)}$. Fix t larger than T. Let
$S_t = \{a^{(1)}, a^{(2)}, \ldots, a^{(N_t(\varepsilon))}\}$. The sum of the probabilities of the $M_t(\varepsilon)$, say, words
a of length t in $R_t$ that are counted among the $N_t(\varepsilon)$ words in $S_t$ equals
$\sum_{a \in S_t \cap R_t} p_t(a) > \varepsilon - \delta$ by definition of $N_t(\varepsilon)$. Therefore,
Also, none of the elements of $S_t$ has probability less than $\exp\{-t(H + \gamma)\}$, since
the set of all a with $p_t(a) > \exp\{-t(H + \gamma)\}$ contains $R_t$ and has total probability
larger than $1 - \delta > \varepsilon$. Therefore,
$$\frac{\log N_t(\varepsilon)}{t} < H + \gamma.$$
On the other hand, by (15.8),
Again taking logarithms and now combining this with (15.9), we get
log N,(E)
t
Then
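$$\#J_t \le M + M^2 + \cdots + M^{[tH'/\log M]} \le M^{tH'/\log M}\{1 + 1/M + \cdots\} = \frac{e^{tH'} M}{M - 1}, \tag{15.12}$$
since the number of codewords of length k is $M^k$. Now observe that
$$\mu_t = E c(X_1, \ldots, X_t) \ge \frac{tH'}{\log M} P\{(X_1, \ldots, X_t) \notin J_t\}.$$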
Therefore,
Now observe that for any positive number $\varepsilon < 1$, for the probability
$P(\{(X_1, \ldots, X_t) \in J_t\})$ to exceed $\varepsilon$ requires that $N_t(\varepsilon)$ be smaller than $\#J_t$. In
view of (15.12) this means that
Now by Proposition 15.1 for any given $\varepsilon$ this can hold for at most finitely many
values of t. In other words, we must have that the probability
$P(\{(X_1, \ldots, X_t) \in J_t\})$ tends to 0 in the limit as t grows without bound. Therefore,
(15.14) becomes
H' H26
(15.17)
log M log M
for all sufficiently large t. That is, the number of (relatively) high-probability
words of length t, the sum of whose probabilities exceeds $1 - \varepsilon$, is no greater
than the number $M^{t(H+\gamma)/\log M}$ of words of length $t(H + \gamma)/\log M$. Therefore,
there are enough distinct sequences of length $t(H + \gamma)/\log M$ to code the
$N_t(1 - \varepsilon)$ words "most likely to occur." For the lower-probability words, the
sum of whose probabilities does not exceed $1 - (1 - \varepsilon) = \varepsilon$, just code each one
as itself. To ensure uniqueness for decoding, one may put one of the previously
unused sequences of length $t(H + \gamma)/\log M$ in front of each of the self-coded
terms. The length $c(X_0, X_1, \ldots, X_{t-1})$ of codewords for such a code is then
either $t(H + \gamma)/\log M$ or $t + t(H + \gamma)/\log M$, the latter occurring with
probability at most $\varepsilon$. Therefore,
$$E c(X_0, X_1, \ldots, X_{t-1}) \le \frac{t(H + \gamma)}{\log M} + \left[t + \frac{t(H + \gamma)}{\log M}\right] \varepsilon \le \frac{t(H + \delta)}{\log M}. \tag{15.19}$$
EXERCISES
5. (i) Let $\{X_n\}$ be a sequence of random variables with denumerable state space S.
Call $\{X_n\}$ rth order Markov-dependent if
3. (Random Walks on a Group) Let G be a finite group with group operation denoted
by $\oplus$. That is, G is a nonempty set and $\oplus$ is a well-defined binary operation for G
such that (i) if $x, y \in G$ then $x \oplus y \in G$; (ii) if $x, y, z \in G$ then $x \oplus (y \oplus z) = (x \oplus y) \oplus z$;
(iii) there is an $e \in G$ such that $x \oplus e = e \oplus x = x$ for all $x \in G$; (iv) for each $x \in G$
there is an element in G, denoted $-x$, such that $x \oplus (-x) = (-x) \oplus x = e$. If $\oplus$ is
commutative, i.e., $x \oplus y = y \oplus x$ for all $x, y \in G$, then G is called abelian. Let
$X_1, X_2, \ldots$ be i.i.d. random variables taking values in G and having the common
probability distribution $Q(g) = P(X_n = g)$, $g \in G$.
(i) Show that the random walk on G defined by $S_n = X_0 \oplus X_1 \oplus \cdots \oplus X_n$, $n \ge 0$, is
a Markov chain and calculate its transition probability matrix. Note that it is
not necessary for G to be abelian for $\{S_n\}$ to be Markov.
(ii) (Top-In Card Shuffles) Construct a model for card shuffling as a Markov chain
on a (nonabelian) permutation group on N symbols in which the top card of
the deck is inserted at a randomly selected location in the deck at each shuffle.
(iii) Calculate the transition probability matrix for N = 3. [Hint: Shuffles are of the
form $(c_1, c_2, c_3) \to (c_2, c_1, c_3)$ or $(c_2, c_3, c_1)$ only.] Also see Exercise 4.5.
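
One way to set up part (iii) computationally is sketched below. It rests on a modeling assumption that the exercise leaves open: the top card is reinserted at one of the N positions chosen uniformly, including the top position itself, so each move has probability 1/N. If the top position is excluded, the same code applies with N - 1 equally likely moves. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def top_in_moves(deck):
    """All decks reachable by removing the top card and reinserting it at position k."""
    top, rest = deck[0], deck[1:]
    return [rest[:k] + (top,) + rest[k:] for k in range(len(deck))]

def transition_matrix(n):
    """States are the n! orderings of the deck; each of the n moves has probability 1/n."""
    states = list(permutations(range(1, n + 1)))
    index = {s: i for i, s in enumerate(states)}
    P = [[Fraction(0)] * len(states) for _ in states]
    for s in states:
        for t in top_in_moves(s):
            P[index[s]][index[t]] += Fraction(1, n)
    return states, P

states, P = transition_matrix(3)
```

Because each move is a fixed permutation applied to the current deck, the resulting matrix is doubly stochastic, as expected for a random walk on a group.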
4. An individual with a highly contagious disease enters a population. During each
subsequent period, either the carrier will infect a new person or be discovered and
removed by public health officials. A carrier is discovered and removed with
probability $q = 1 - p$ at each unit of time. An unremoved infected individual is sure
to infect someone in each time unit. The time evolution of the number of infected
individuals in the population is assumed to be a Markov chain $\{X_n: n = 0, 1, 2, \ldots\}$.
What are its transition probabilities?
5. The price of a certain commodity varies over the values 1, 2, 3, 4, 5 units depending
on supply and demand. The price $X_n$ at time n determines the demand $D_n$ at time n
through the relation $D_n = N - X_n$, where N is a constant larger than 5. The supply
$C_n$ at time n is given by $C_n = N - 3 + \varepsilon_n$, where $\{\varepsilon_n\}$ is an i.i.d. sequence of equally
likely $\pm 1$-valued Bernoulli random variables. Price changes are made according to
the following policy:
(i) Fix $X_0 = i_0$. Show that $\{X_n\}$ is a Markov chain with state space S = {1, 2, 3, 4, 5}.
(ii) Compute the transition probability matrix of $\{X_n\}$.
(iii) Calculate the two-step transition probabilities.
6. A reservoir has finite capacity of h units, where h is a positive integer. The daily
inputs are i.i.d. integer-valued random variables $\{J_n: n = 1, 2, \ldots\}$ with the common
p.m.f. $\{g_j = P(J_n = j),\ j = 0, 1, 2, \ldots\}$. One unit of water is released through the dam
at the end of each day provided that the reservoir is not empty or does not exceed
its capacity. If it is empty, there is no release. If it exceeds capacity, then the excess
water is released. Let $X_n$ denote the amount of water left in the reservoir on the nth
day after release of water. Compute the transition matrix for $\{X_n\}$.
7. Suppose that at each unit of time each particle located in a fixed region of space has
probability p, independently of the other particles present, of leaving the region. Also,
at each unit of time a random number of new particles having Poisson distribution
with parameter $\lambda$ enter the region independently of the number of particles already
present at time n. Let $X_n$ denote the number of particles in the region at time n.
Calculate the transition matrix of the Markov chain $\{X_n\}$.
8. We are given two boxes A and B containing a total of N labeled balls. A ball is
selected at random (all selections being equally likely) at time n from among the N
balls and then a box is selected at random. Box A is selected with probability p and
B with probability $q = 1 - p$ independently of the ball selected. The selected ball is
moved to the selected box, unless the ball is already in it. Consider the Markov
evolution of the number $X_n$ of balls in box A. Calculate its transition matrix.
9. Each cell of a certain organism contains N particles, some of which are of type A
and the others type B. The cell is said to be in state j if it contains exactly j particles
of type A. Daughter cells are formed by cell division as follows: Each particle replicates
itself and a daughter cell inherits N particles chosen at random from the 2j particles
of type A and the $2N - 2j$ particles of type B present in the parental cell. Calculate
the transition matrix of this Markov chain.
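
A possible way to tabulate the matrix in Exercise 9 is sketched below. It assumes that "chosen at random" means a uniformly random subset of N of the 2N particles, so that the number of type-A particles inherited is hypergeometric: $p_{jk} = \binom{2j}{k}\binom{2N-2j}{N-k} / \binom{2N}{N}$. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb

def cell_transition_matrix(N):
    """p_jk = C(2j, k) * C(2N - 2j, N - k) / C(2N, N) for j, k = 0, ..., N."""
    total = comb(2 * N, N)
    return [
        [Fraction(comb(2 * j, k) * comb(2 * N - 2 * j, N - k), total)
         for k in range(N + 1)]
        for j in range(N + 1)
    ]
```

Under this assumption the states 0 and N are absorbing, since a parental cell with only one particle type can only produce daughters of that type.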
3. Let $p = ((p_{ij}))$ denote the transition matrix for the unrestricted general random walk
of Example 6.
(i) Calculate $p_{ij}$ in terms of the increment distribution Q.
(ii) Show that $p_{ij}^{(n)} = Q^{*n}(j - i)$, where the n-fold convolution is defined recursively by
4. Verify each of the following for the Pólya urn model in Example 8.
(i) $P(X_n = 1) = r/(r + b)$ for each n = 1, 2, 3, ... .
(ii) $P(X_1 = \varepsilon_1, \ldots, X_n = \varepsilon_n) = P(X_{1+h} = \varepsilon_1, \ldots, X_{n+h} = \varepsilon_n)$, for any h = 0, 1, 2, ... .
(*iii) $\{X_n\}$ is a martingale (see Definition 13.2, Chapter I).
5. Describe the motion represented by a Markov chain having transition matrix of the
following forms:
(i) $p = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$,
(ii) $p = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$,
I
5 S
(iii) p =
ss
(iv) Use the probabilistic description to write down $p^n$ without algebraically
performing the matrix multiplications. Generalize these to m-state Markov
chains.
6. (Length of a Queue) Suppose that items arrive at a shop for repair on a daily basis
but that it takes one day to repair each item. New arrivals are put on a waiting list
for repair. Let $A_n$ denote the number of arrivals during the nth day. Let $X_n$ be the
length of the waiting list at the end of the nth day. Assume that $A_1, A_2, \ldots$ is an i.i.d.
nonnegative integer-valued sequence of random variables with $a(x) = P(A_n = x)$,
x = 0, 1, 2, .... Assume that $A_{n+1}$ is independent of $X_0, \ldots, X_n$ ($n \ge 0$). Calculate
7. (Pseudo Random Number Generator) The linear congruential method of generating
integer values in the range 0 to $N - 1$ is to calculate $h(x) = (ax + c) \bmod N$ for
some choice of integer coefficients $0 \le a, c < N$ and an initial seed value of x.
More generally, polynomials with integer coefficients can be used in place of ax + c.
Note that these methods cycle after N iterations.
(i) Show that the iterations may be represented by a Markov chain on a circle.
(ii) Calculate the transition probabilities in the case N = 5, a = 1, c = 2.
(iii) Calculate the transition probabilities in the case $h(x) = (x^2 + 2) \bmod 5$.
[See D. Knuth (1981), The Art of Computer Programming, Vol. II, 2nd ed.,
Addison-Wesley, Menlo Park, for extensive treatments.]
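
Since each state has exactly one successor, the transition matrix of such a generator has a single 1 in every row. The sketch below builds it for the two cases in parts (ii) and (iii); the helper name `lcg_transition_matrix` is hypothetical.

```python
def lcg_transition_matrix(h, N):
    """0-1 matrix with P[x][h(x) mod N] = 1: the chain moves deterministically."""
    P = [[0] * N for _ in range(N)]
    for x in range(N):
        P[x][h(x) % N] = 1
    return P

# Part (ii): N = 5, a = 1, c = 2.
P_linear = lcg_transition_matrix(lambda x: (1 * x + 2) % 5, 5)

# Part (iii): h(x) = (x^2 + 2) mod 5.
P_quadratic = lcg_transition_matrix(lambda x: (x * x + 2) % 5, 5)
```

In the linear case the matrix is a permutation matrix, so every state lies on one cycle, while in the quadratic case some states are never revisited once left.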
8. (A Renewal Process) A system requires a certain device for its operation that is
subject to failure. Inspections for failure are made at regular points in time, so that
an item that fails during the nth period of time between $n - 1$ and n is replaced at
time n by a device of the same type having an independent service life. Let $p_n$ denote
the probability that a device will fail during the nth period of its use. Let $X_n$ be the
age (in number of periods) of the item in use at time n. A new item is started at time
n = 0, and $X_n = 0$ if an item has just been replaced at time n. Calculate the transition
matrix of the Markov chain $\{X_n\}$.
A balanced six-sided die is rolled repeatedly. Let Z denote the smallest number of
rolls for the occurrence of all six possible faces. Let $Z_1 = 1$, $Z_j$ = smallest number of
tosses to obtain the jth new face after $j - 1$ distinct faces have occurred. Then
$Z = Z_1 + \cdots + Z_6$.
(i) Give a direct proof that $Z_1, \ldots, Z_6$ are independent random variables.
(ii) Give a proof of (i) using the strong Markov property. [Hint: Define stopping
times $\tau_j$ denoting the first time after $\tau_{j-1}$ that $X_n$ is not among $X_1, \ldots, X_{\tau_{j-1}}$
N )(
Ik
(iv) P(T > m) = k ^^ ( I)k+l(
where $\langle i, i - 1, \ldots, 1 \rangle$ is the permutation in which the ith value moves to $i - 1$,
$i - 1$ to $i - 2$, ..., 2 to 1, and 1 to i. Let $S_0$ be the identity permutation and let
$S_n = X_1 \cdots X_n$, where the group operation is being expressed multiplicatively. Let T
denote the first time the original bottom card arrives at the top and is inserted back
into the deck (cf. Exercise 2.3). Then
(i) T is a stopping time.
(ii) T has the additional property that $P(T = k, S_k = g)$ does not depend on $g \in G$.
[Hint: Show by induction on N that at time $T - 1$ the $(N - 1)!$ arrangements
of the cards beneath the top card are equally likely; see Exercise 2.3(iii).]
(iii) Property (ii) is equivalent to $P(S_k = g \mid T = k) = 1/|G|$; i.e., the deck is mixed
at time T. This property is referred to as the strong uniform time property by
D. Aldous and P. Diaconis (1986), "Shuffling Cards and Stopping Times," Amer.
Math. Monthly, 93, pp. 333-348, who introduced this example and approach.
(iv) Show that
$$P(S_n \in A) = P(S_n \in A, T \le n) + P(S_n \in A, T > n)$$
$$= \frac{|A|}{|G|} P(T \le n) + P(S_n \in A \mid T > n) P(T > n) = \frac{|A|}{|G|} + r P(T > n),$$
where $r = P(S_n \in A \mid T > n) - |A|/|G|$.
(ii) Show that {1'} is a Markov process and calculate its transition probabilities.
(iii) Extend (ii) to the case when the distribution function of Xk is continuous.
j 0 0 0 j 0 0
0 0 0; 0 0 2
I 1 1 1 1 1
6 6 6 6 6 6
0 1 0 0 0 1 0
5 0 0 0 3 00
0 0 0 6 0 0 6`
0 * 0 0 0 3 0
4. Suppose that S comprises a single essential class of aperiodic states. Show that there
is an integer $\nu$ such that $p_{ij}^{(\nu)} > 0$ for all $i, j \in S$ by filling in the details of the following
steps.
(i) For a fixed (i, j), let $B_{ij} = \{\nu \ge 1: p_{ij}^{(\nu)} > 0\}$. Then for each state j, $B_{jj}$ is closed
under addition.
(ii) (Basic Number Theory Lemma) If B is a set of positive integers having greatest
common divisor 1 and if B is closed under addition, then there is an integer b
such that $n \in B$ for all $n \ge b$. [Hints:
(a) Let G be the smallest additive subgroup of Z that contains B. Then argue
that G = Z since if d is the smallest positive integer in G it will follow that
if $n \in B$, then, since $n = qd + r$, $0 \le r < d$, one obtains $r = n - qd \in G$ and
hence r = 0, i.e., d divides each $n \in B$ and thus d = 1.
(b) If $1 \in B$, then each $n = 1 + 1 + \cdots + 1 \in B$. If $1 \notin B$, then by (a), $1 = \alpha - \beta$
for $\alpha, \beta \in B$. Check $b = (\alpha + \beta)^2 + 1$ suffices; for if $n > (\alpha + \beta)^2$, then, writing
(iii) For each (i, j) there is an integer $b_{ij}$ such that $\nu \ge b_{ij}$ implies $\nu \in B_{ij}$. [Hint:
Obtain $b_{jj}$ from (ii) applied to (i) and then choose k such that $p_{ij}^{(k)} > 0$. Check
that $b_{ij} = k + b_{jj}$ suffices.]
(iv) Check that $\nu = \max\{b_{ij}: i, j \in S\}$ suffices for the statement of the exercise.
5. Classify the states in Exercises 2.4, 2.5, 2.6, 2.8 as essential and inessential states.
Decompose the essential states into their respective equivalence classes.
6. Let p be the transition matrix on S = {0, 1, 2, 3} defined below.
2 0 ' 0 '2
20 Z 0
2 0 1 0 '2
2 0 2 0
Show that S is a single class of essential states of period 2 and calculate p" for all n.
7. Use the strong Markov property to prove that if j is inessential then $P_i(X_n = j$ for
infinitely many n) = 0.
8. Show by induction on N that all states communicate in the Top-In Card Shuffling
example of Exercises 2.3(ii) and 4.5.
solution $z = (z_1, \ldots, z_N)$; recall that det(B) = det(B') for any N x N matrix B.]
(ii) Show that $\lambda = 1$ must be a simple eigenvalue of A (i.e., geometric multiplicity
1). [Hint: Suppose z is any (left) eigenvector corresponding to $\lambda = 1$. By the
results of this section there must be an invariant distribution (positive
eigenvector) $\pi$. For t sufficiently large $z + t\pi$ is also positive (and normalizable).]
*8. Let $A = ((a_{ij}))$ be an N x N matrix with positive entries. Show that the spectral radius
is also given by $\min\{\lambda > 0: Ax \le \lambda x \text{ for some positive } x\}$. [Hint: A and its transpose
A' have the same eigenvalues (why?) and therefore the same spectral radius. A' is
adjoint to A with respect to the usual (dot) inner product in the sense
(Ax, y) = (x, A'y) for all x, y, where $(u, v) = \sum_{i=1}^{N} u_i v_i$. Apply the maximal property
to the spectral radius of A'.]
9. Let p(x, y) be a continuous function on [c, d] x [c, d] with c < d. Assume that
p(x, y) > 0 and $\int_c^d p(x, y)\,dy = 1$. Let $\Omega$ denote the space of all sequences
$\omega = (x_0, x_1, \ldots)$ of numbers $x_i \in [c, d]$. Let $\mathcal{F}_0$ denote the class of all
finite-dimensional sets A of the form $A = \{\omega = (x_0, x_1, \ldots) \in \Omega: a_i < x_i < b_i,\
i = 0, 1, \ldots, n\}$, where $c \le a_i < b_i \le d$ for each i. Define $P_x(A)$ for such a set A by
$$P_x(A) = \int_{a_1}^{b_1} \cdots \int_{a_n}^{b_n} p(x, y_1) p(y_1, y_2) \cdots p(y_{n-1}, y_n)\,dy_n \cdots dy_1 \quad \text{for } x \in [a_0, b_0].$$
us that $P_x$ has a unique extension to a probability measure defined for all events in
the smallest sigma-field $\mathcal{F}$ of subsets of $\Omega$ that contains $\mathcal{F}_0$. For any nonnegative
integrable function $\gamma$ with integral 1, define
Let $X_n$ denote the nth coordinate projection mapping on $\Omega$. Then $\{X_n\}$ is said to
be a Markov chain on the state space S = [c, d] with transition density p(x, y) and
initial density $\gamma$ under $P_\gamma$. Under $P_x$ the process is said to have initial state x.
(i) Prove the Markov property for $\{X_n\}$; i.e., the conditional distribution of $X_{n+1}$
given $X_0, \ldots, X_n$ is $p(X_n, y)\,dy$.
(ii) Compute the distribution of $X_n$ under $P_\gamma$.
(iii) Show that under $P_\gamma$ the conditional distribution of $X_n$ given $X_0 = x_0$ is
$p^{(n)}(x_0, y)\,dy$, where
by breaking the integral into two terms involving y such that p(x, y) > p(z, y)
and those y such that p(x, y) < p(z, y).
(v) Show that there is a continuous strictly positive function $\pi(y)$ such that
max {Ip " (x, y) m(y)I: c < x, y < d} < [1 b(d c)]" 'p
( )
S"=x+X,++X" mod1,
the process is called time-reversible if $\pi_i p_{ij} = \pi_j p_{ji}$ for all $i, j \in S$ [it is often said
to be time-reversible (with respect to $\pi$) as well]. Show that if S is finite and p
is doubly stochastic, then the (discrete) uniform distribution makes the process
time-reversible if and only if p is symmetric.
(*ii) Suppose that $\{X_n\}$ is a Markov chain with invariant distribution $\pi$ and started
in $\pi$. Then $\{X_n\}$ is a stationary process and therefore has an extension backward
in time to $n = 0, \pm 1, \pm 2, \ldots$. [Use Kolmogorov's Extension Theorem.] Define
the time-reversed process by $Y_n = X_{-n}$. Show that the reversed process $\{Y_n\}$ is
a Markov chain with 1-step transition probabilities $q_{ij} = \pi_j p_{ji} / \pi_i$.
(iii) Show that under the time-reversibility condition (i), the processes in (ii), $\{Y_n\}$
and $\{X_n\}$, have the same distribution; i.e., in equilibrium a movie of the evolution
looks the same statistically whether run forward or backward in time.
(iv) Show that an irreducible Markov chain on a state space S with an invariant
initial distribution $\pi$ is time-reversible if and only if (Kolmogorov Condition):
$$p_{i i_1} p_{i_1 i_2} \cdots p_{i_k i} = p_{i i_k} p_{i_k i_{k-1}} \cdots p_{i_1 i} \quad \text{for all } i, i_1, \ldots, i_k \in S,\ k \ge 1.$$
(v) If there is a $j \in S$ such that $p_{ij} > 0$ for all $i \ne j$ in (iv), then for time-reversibility
it is both necessary and sufficient that $p_{ij} p_{jk} p_{ki} = p_{ik} p_{kj} p_{ji}$ for all i, j, k.
5. (Random Walk on a Tree) A tree graph on r vertices $v_1, v_2, \ldots, v_r$ is a connected
graph that contains no cycles. [That is, there is given a collection of unordered pairs
of distinct vertices (called edges) with the following property: Any two distinct vertices
$u, v \in S$ are uniquely connected in the sense that there is a unique sequence
$e_1, e_2, \ldots, e_n$ of edges $e_i = \{v_{k_i}, v_{m_i}\}$ such that $u \in e_1$, $v \in e_n$, $e_i \cap e_{i+1} \ne \emptyset$,
$i = 1, \ldots, n - 1$.] The degree $\nu_i$ of the vertex $v_i$ represents the number
of vertices adjacent to $v_i$, where $u, v \in S$ are called adjacent if there is an edge {u, v}.
By a tree random walk on a given tree graph we mean a Markov chain on the state
space $S = \{v_1, v_2, \ldots, v_r\}$ that at each time step n changes its state $v_i$ to one of its $\nu_i$
randomly selected adjacent states, with equal probabilities and independently of its
states prior to time n.
(i) Explain that such a Markov chain must have a unique invariant distribution.
(ii) Calculate the invariant distribution in terms of the vertex degrees $\nu_i$, $i = 1, \ldots, r$.
(iii) Show that the invariant distribution makes the tree random walk time-reversible.
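
A small numerical check can guide parts (ii) and (iii). The sketch below uses a hypothetical tree on five vertices and tests the candidate $\pi_i = \nu_i / (2(r - 1))$, i.e., degree divided by twice the number of edges. This candidate is an assumption to be verified, not a formula quoted from the text, and the names `EDGES` and `tree_walk` are illustrative.

```python
from fractions import Fraction

EDGES = [(0, 1), (1, 2), (1, 3), (3, 4)]   # hypothetical tree on r = 5 vertices

def tree_walk(edges, r):
    """Return the transition matrix of the tree random walk and the candidate pi."""
    nbrs = {v: [] for v in range(r)}
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    P = [[Fraction(0)] * r for _ in range(r)]
    for u in range(r):
        for v in nbrs[u]:
            P[u][v] = Fraction(1, len(nbrs[u]))   # uniform over adjacent vertices
    # Candidate invariant distribution: degree / (2 * number of edges).
    pi = [Fraction(len(nbrs[u]), 2 * len(edges)) for u in range(r)]
    return P, pi

P, pi = tree_walk(EDGES, 5)
```

For this example the candidate satisfies both $\pi p = \pi$ and the detailed-balance relation $\pi_i p_{ij} = \pi_j p_{ji}$, which suggests the general argument: each edge contributes the same flux in both directions.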
6. Let $\{X_n\}$ be a Markov chain on S and define $Y_n = (X_n, X_{n+1})$, n = 0, 1, 2, ... .
(i) Show that $\{Y_n\}$ is a Markov chain on $S' = \{(i, j) \in S \times S: p_{ij} > 0\}$.
(ii) Show that if $\{X_n\}$ is irreducible and aperiodic then so is $\{Y_n\}$.
(iii) Show that if $\{X_n\}$ has invariant distribution $\pi = (\pi_i)$ then $\{Y_n\}$ has invariant
distribution $(\pi_i p_{ij})$.
7. Let $\{X_n\}$ be an irreducible Markov chain on a finite state space S. Define a graph G
having states of S as vertices with edges joining i and j if and only if either $p_{ij} > 0$
or $p_{ji} > 0$.
(i) Show that G is connected; i.e., for any two sites i and j there is a path of edges
from i to j.
(ii) Show that if $\{X_n\}$ has an invariant distribution $\pi$ then for any $A \subset S$,
(i.e., the net probability flux across a cut of S into complementary subsets A, S\A
is in balance).
(iii) Show that if G contains no cycles (i.e., is a tree graph in the sense of Exercise
5), then the process is time-reversible started with $\pi$.
$$\sum_{r=0}^{\infty} P(Z > r) = \sum_{r=0}^{\infty} \sum_{n=r+1}^{\infty} P(Z = n).$$
as $n \to \infty$. [Hint: Represent N(j) as a sum of indicator variables and use (8.11).]
4. Classify the states for the models in Exercises 2.6, 2.7, 2.8, 2.9 as transient or recurrent.
5. Classify the states for $\{R_n\} = \{|S_n|\}$, where $S_n$ is the simple symmetric random walk
starting at 0 (see Exercise 1.8).
6. Show that inessential states are transient.
7. (A Birth or Collapse Model) Let
$$p_{i,i+1} = \frac{1}{i+1}, \qquad p_{i,0} = \frac{i}{i+1}, \qquad i = 0, 1, 2, \ldots.$$
$$p_{i,0} = \frac{1}{i+1}, \qquad p_{i,i+1} = \frac{i}{i+1}, \quad i \ge 1, \qquad p_{0,1} = 1.$$
(i) Prove (see (5.5) of Chapter I) that $p_{ij}^{(n)} = \sum_{m=1}^{n} f_{ij}^{(m)} p_{jj}^{(n-m)}$ ($n \ge 1$).
(ii) Sum (i) over n to give an alternative proof of (8.11).
(iii) Use (i) to indicate how one may compute the distribution of the first visit to
state j (after time zero), starting in state i, in terms of $p_{ij}^{(n)}$ ($n \ge 1$).
11. (i) Show that if $\|\cdot\|$ and $\|\cdot\|_0$ are any two norms on $\mathbb{R}^k$, then there are positive
constants $c_1$, $c_2$ such that
(ii) Show that the stability condition given in Example 1 implies that X. . 0 in
every norm.
n
lim = E(T; 2) t) =
Jim ' = o0
n w Nn
distribution. [Hint: Use the coupling method described in Exercise 6.6 for finite
state space.]
k x
iff pj < b .
k=1j=1
8. Calculate the invariant distribution for the Renewal Model of Exercise 3.8, in the
case that p"=pn 1 (1 p),n=0, 1,2,. . . where 0 <p < 1.
9. (One-Dimensional Nearest-Neighbor Ising Model) The one-dimensional nearest-
neighbor Ising model of magnetism consists of a random distribution of $\pm 1$-valued
random variables (spins) at the sites of the integers $n = 0, \pm 1, \pm 2, \ldots$. The
parameters of the model are the inverse temperature $\beta = \frac{1}{kT} > 0$, where T is
temperature and k is a universal constant called Boltzmann's constant, an external
field parameter H, and an interaction parameter (coupling constant) J. The spin
variables $X_n$, $n = 0, \pm 1, \pm 2, \pm 3, \ldots$, are distributed according to a stochastic
process on {-1, 1} indexed by Z with the Markov property and having stationary
transition law given by
$$P(X_{n+1} = \eta \mid X_n = \sigma) = \frac{\exp\{J\sigma\eta + H\eta\}}{2\cosh(H + J\sigma)}$$
(iv) Determine when the process (in equilibrium) is reversible for the invariant
distribution; modify Exercise 7.4 accordingly.
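
The transition law of the Ising exercise is easy to tabulate, which helps when exploring the parts of the exercise numerically. The sketch below takes the law as $\exp\{J\sigma\eta + H\eta\} / (2\cosh(H + J\sigma))$ with $\sigma, \eta \in \{-1, 1\}$; the function name and the parameter values used to try it out are hypothetical.

```python
import math

def ising_transition(J, H):
    """Nested dict P[sigma][eta] = exp(J*sigma*eta + H*eta) / (2*cosh(H + J*sigma))."""
    spins = (-1, 1)
    return {
        s: {e: math.exp(J * s * e + H * e) / (2.0 * math.cosh(H + J * s)) for e in spins}
        for s in spins
    }

P = ising_transition(J=0.7, H=0.2)    # hypothetical parameter values
P0 = ising_transition(J=0.7, H=0.0)   # zero external field
```

The denominator $2\cosh(H + J\sigma)$ is exactly the sum of the two numerators, so each row is a probability distribution; with H = 0 the two spin values play symmetric roles.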
10. Show that if $\{X_n\}$ is an irreducible positive-recurrent Markov chain then the
condition (iv) of Exercise 7.4 is necessary and sufficient for time-reversibility of the
stationary process started with distribution $\pi$.
*11. An invariant measure for a transition matrix $((p_{ij}))$ is a sequence of nonnegative
numbers $(m_i)$ such that $\sum_i m_i p_{ij} = m_j$ for all $j \in S$. An invariant measure may or
may not be normalizable to a probability distribution on S.
(i) Let $p_{i,i+1} = p_i$ and $p_{i,0} = 1 - p_i$ for i = 0, 1, 2, .... Show that there is a unique
invariant measure (up to multiples) if and only if $\lim_{n \to \infty} \prod_{k=1}^{n} p_k = 0$; i.e., if
and only if the chain is recurrent, since the product is the probability of no
return to the origin.
(ii) Show that invariant measures exist for the unrestricted simple random walk
but are not unique in the transient case, and are unique (up to multiples) in
the (null) recurrent case.
(iii) Let $p_{00} = p_{01} = \tfrac{1}{2}$ and $p_{i,i-1} = p_{i,i} = 2^{-i-2}$, and $p_{i,i+1} = 1 - 2^{-i-1}$, i = 1, 2, 3,
.... Show that the probability of not returning to 0 is positive (i.e., transience),
but that there is a unique invariant measure.
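12. Let $\{Y_n\}$ be any sequence of random variables having finite second moments and
let $\gamma_{n,m} = \mathrm{Cov}(Y_n, Y_m)$, $\mu_n = EY_n$, $\sigma_n^2 = \mathrm{Var}(Y_n) = \gamma_{n,n}$, and $\rho_{n,m} = \gamma_{n,m} / (\sigma_n \sigma_m)$.
(i) Verify that $-1 \le \rho_{n,m} \le 1$ for all n and m. [Hint: Use the Schwarz Inequality.]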
(ii) Show that if p . , _< f(In ml), where f is a nonnegative function such that
n_ 2 Yk=f(k)Y n =1 Qk *0 as n * oo, then the WLLN holds for {}}.
(iii) Verify that if = p o ^_^ = p(ln ml) > 0, then it is sufficient that
p(k) *0 as n + oc for the WLLN.
(iv) Show that in the case of nonpositive correlations it is sufficient that
n 1 Ik= 1 ak +0 as n , oo for the WLLN.
13. Let $p$ be the transition probability matrix for the asymmetric random walk on
$S = \{0, 1, 2, \ldots\}$ with 0 absorbing and $p_{i,i+1} = p > \tfrac{1}{2}$ for $i \ge 1$. Explain why for
fixed $i > 0$,
\[ \frac{1}{n} \sum_{m=1}^{n} p_{ij}^{(m)}, \qquad j \in S, \]
does not converge to the invariant distribution $\delta_0(\{j\})$ (as $n \to \infty$). How can this
be modified to get convergence?
14. (Iterated Averaging)
(i) Let $a_1, a_2, a_3$ be three numbers. Define $a_4 = (a_1 + a_2 + a_3)/3$,
$a_5 = (a_2 + a_3 + a_4)/3, \ldots$. Show that $\lim_{n\to\infty} a_n = (a_1 + 2a_2 + 3a_3)/6$.
(ii) Let $p$ be an irreducible positive recurrent transition law and let $a_1, a_2, \ldots$ be
any bounded sequence of numbers. Show that
\[ \lim_{n\to\infty} \sum_j p_{ij}^{(n)} a_j = \sum_j a_j \pi_j, \]
where $(\pi_j)$ is the invariant distribution of $p$. Show that the result of (i) is a
special case.
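The limit claimed in part (i) of the iterated-averaging exercise can be checked numerically before attempting a proof. The sketch below is only an illustration under assumptions of our own: the starting values 1, 5, -2 and the function names are hypothetical, and exact rational arithmetic is used so that no rounding enters. It also checks that the quantity a_n + 2a_{n+1} + 3a_{n+2} never changes along the sequence, which is one way to see where the weights 1, 2, 3 come from.

```python
from fractions import Fraction

def iterated_average(a1, a2, a3, steps):
    """Return a_1, ..., a_{3+steps}; each new term is the mean of the previous three."""
    seq = [Fraction(a1), Fraction(a2), Fraction(a3)]
    for _ in range(steps):
        seq.append(sum(seq[-3:]) / 3)
    return seq

def claimed_limit(a1, a2, a3):
    """The limit (a_1 + 2 a_2 + 3 a_3) / 6 stated in the exercise."""
    return (Fraction(a1) + 2 * Fraction(a2) + 3 * Fraction(a3)) / 6

# Hypothetical starting values chosen only for illustration.
seq = iterated_average(1, 5, -2, 200)
gap = abs(seq[-1] - claimed_limit(1, 5, -2))

# a_n + 2 a_{n+1} + 3 a_{n+2} is conserved by the recursion.
conserved = [seq[i] + 2 * seq[i + 1] + 3 * seq[i + 2] for i in range(len(seq) - 2)]
```

A numerical check of this kind does not replace the argument asked for in the exercise; it only confirms that the stated limit is plausible for the chosen inputs.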
EXERCISES 205
1. (i) Let $Y_1, Y_2, \ldots$ be i.i.d. with $EY_1^2 < \infty$. Show that $\max(Y_1, \ldots, Y_n)/\sqrt{n} \to 0$ a.s.
as $n \to \infty$. [Hint: Show that $P(Y_n^2 > n\varepsilon \text{ i.o.}) = 0$ for every $\varepsilon > 0$.]
(ii) Verify that $n^{-1/2} S_n$ has the same limiting distribution as (10.6).

2. Let $\{W_n(t): t \ge 0\}$ be the path process defined in (10.7). Let $t_1 < t_2 < \cdots < t_k$, $k \ge 1$,
be an arbitrary finite set of time points. Show that $(W_n(t_1), \ldots, W_n(t_k))$ converges in
distribution as $n \to \infty$ to the multivariate Gaussian distribution with mean zero and
variance-covariance matrix $((D \min\{t_i, t_j\}))$, where $D$ is defined by (10.12).
3. Suppose that $\{X_n\}$ is a Markov chain with state space $S = \{1, 2, \ldots, r\}$ having unique
invariant distribution $(\pi_i)$. Let
Show that
`(N"(1) Na(r)
n 7C 1 ...
n n
4. For the one-dimensional nearest-neighbor Ising model of Exercise 9.9 calculate the
following:
(i) The pair correlations $\rho_{n,m} = \operatorname{Cov}(X_n, X_m)$.
(ii) The large-scale variance (magnetic susceptibility) parameter $\operatorname{Var}(X_0)$.
(iii) Describe the distribution of the fluctuations in the (bulk limit) magnetization
(cf. Exercise 9.9(i)).
5. Let $\{X_n\}$ be a Markov chain on $S$ and define $Y_n = (X_n, X_{n+1})$, $n = 0, 1, 2, \ldots$. Let
$p = ((p_{ij}))$ be the transition matrix for $\{X_n\}$.
(i) Show that $\{Y_n\}$ is a Markov chain on the state space defined by
$S' = \{(i, j) \in S \times S : p_{ij} > 0\}$.
(ii) Show that if $\{X_n\}$ is irreducible and aperiodic then so is $\{Y_n\}$.
(iii) Suppose that $\{X_n\}$ has invariant distribution $\pi = (\pi_i)$. Calculate the invariant
distribution of $\{Y_n\}$.
(iv) Let $(i, j) \in S'$ and let $T_n$ be the number of one-step transitions from $i$ to $j$ by
$X_0, X_1, \ldots, X_n$ started with the invariant distribution of (iii). Calculate
$\lim_{n\to\infty} (T_n/n)$ and describe the fluctuations about the limit for large $n$.
6. (Large-Sample Consistency in Statistical Parameter Estimation) Let $X_n = 1$ or 0
according to whether the $n$th day at a specified location is wet (rain) or dry. Assume
$\{X_n\}$ is a two-state Markov chain with parameters $\beta = P(X_{n+1} = 1 \mid X_n = 0)$ and
$\delta = P(X_{n+1} = 0 \mid X_n = 1)$, $n = 0, 1, 2, \ldots$, $0 < \beta < 1$, $0 < \delta < 1$. Suppose that $\{X_n\}$
is in equilibrium with the invariant initial distribution $\pi = (\pi_1, \pi_0)$. Define statistics
based on the sample X 0 , X 1 , ... , X" to estimate /3, it , respectively, by t" = S/(n + 1)
206 DISCRETEPARAMETER MARKOV CHAINS
7. Use the result of Exercise 1.5 to describe an extension of the SLLN and the CLT to
certain $r$th-order dependent Markov chains.
1. Let $\{X_n\}$ be a two-state Markov chain on $S = \{0, 1\}$ and let $\tau_0$ be the first time
$\{X_n\}$ reaches 0. Calculate $P_1(\tau_0 = n)$, $n \ge 1$, in terms of the parameters $p_{10}$ and $p_{01}$.
2. Let $\{X_n\}$ be a three-state Markov chain on $S = \{0, 1, 2\}$ where 0, 1, 2 are arranged
counterclockwise on a circle, and at each time a transition occurs one unit clockwise
with probability $p$ or one unit counterclockwise with probability $1 - p$. Let $\tau_0$ denote
the time of the first return to 0. Calculate $P(\tau_0 > n)$, $n \ge 1$.
3. Let $\tau_0$ denote the first time starting in state 2 that the Markov chain in Exercise
5.6 reaches state 0. Calculate $P_2(\tau_0 > n)$.
4. Verify that the Markov chains starting at $i$ having transition probabilities $p$ and $\hat{p}$,
and viewed up to time $\tau_A$ have the same distribution by calculating the probabilities
of the event $\{X_0 = i, X_1 = i_1, \ldots, X_m = i_m, \tau_A = m\}$ under each of $p$ and $\hat{p}$.
5. Write out a detailed explanation of (11.22).
6. Explain the calculation of (11.28) and (11.29) as given in the text using earlier results
on the long-term behavior of transition probabilities.
7. (Collocation) Show that there is a unique polynomial $p(x)$ of degree at most $k$ that takes
prescribed (given) values $v_0, v_1, \ldots, v_k$ at any prescribed (given) distinct points
$x_0, x_1, \ldots, x_k$, respectively; such a polynomial is called a collocation polynomial.
[Hint: Write down a linear system with the coefficients $a_0, a_1, \ldots, a_k$ of $p(x)$ as the
unknowns. To show the system is nonsingular, view the determinant as a polynomial
and identify all of its zeros.]
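The linear system mentioned in the hint to the collocation exercise can be written out and solved directly. The following is a minimal sketch, not the argument the exercise asks for: it assumes exact rational data, uses plain Gauss-Jordan elimination on the Vandermonde system, and the function names and the sample points (taken from the polynomial 1 + x + x^3) are hypothetical choices made here for illustration.

```python
from fractions import Fraction

def collocation_coefficients(xs, vs):
    """Solve sum_m a_m * x_i**m = v_i for a_0, ..., a_k by Gauss-Jordan elimination."""
    n = len(xs)
    if len(set(xs)) != n:
        raise ValueError("points must be distinct")
    # Augmented Vandermonde matrix [V | v] with exact rational entries.
    rows = [[Fraction(x) ** m for m in range(n)] + [Fraction(v)] for x, v in zip(xs, vs)]
    for col in range(n):
        # Distinct points make the matrix nonsingular, so a nonzero pivot exists.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_value = rows[col][col]
        rows[col] = [entry / pivot_value for entry in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]

def evaluate(coeffs, x):
    """Evaluate a_0 + a_1 x + ... + a_k x^k by Horner's rule."""
    result = Fraction(0)
    for a in reversed(coeffs):
        result = result * x + a
    return result

# Hypothetical data: values of 1 + x + x^3 at four distinct points.
xs = [0, 1, 2, 4]
vs = [1, 3, 11, 69]
coeffs = collocation_coefficients(xs, vs)
```

Because four values at four distinct points determine a polynomial of degree at most three, the solver should return the coefficients of 1 + x + x^3 in increasing order of degree.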
*8. (Absorption Rates and the Spectral Radius) Let $p$ be a transition probability matrix
for a finite-state Markov chain and let $\tau_j$ be the time of the first visit to $j$. Use (11.4)
and the results of the Perron-Frobenius Theorem 6.4 and its corollary to show that
exponential rates of convergence (as obtained in (11.43)) can be anticipated more
generally.
ifi>0,j=0,1,2,...,i1
ifi<0,j=0,1,2,..., i +I
1 ifi=0,j=0
0 ifi=0,j960.
(ii) Show that the mean time to absorption starting at i > 0 is given by =, (i /k).
10. Let {X} be the simple branching process on S = {0, 1, 2, ...} with offspring
distribution { fj }, f j a jfj 1.
(i) Show that all nonzero states in $S$ are transient and that $\lim_{n\to\infty} P_1(X_n = k) = 0$,
$k = 1, 2, \ldots$.
(ii) Describe the unique invariant probability distribution for $\{X_n\}$.
11. (i) Suppose that in a certain society each parent has exactly two children, and
both males and females are equally likely to occur. Show that passage of the
family surname to descendants of males eventually stops.
(ii) Calculate the extinction probability for the male lineage as in (i) if each parent
has exactly three children.
(iii) Prompted by an interest in the survival of family surnames, A. J. Lotka (1939),
"Théorie Analytique des Associations Biologiques II," Actualités Scientifiques
et Industrielles (N.780), Hermann et Cie, Paris, used data for white males in
the United States in 1920 to estimate the probability function $f$ for the
number of male children of a white male. He estimated $f(0) = 0.4825$,
$f(j) = (0.2126)(0.5893)^{j-1}$ $(j = 1, 2, \ldots)$.
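For an offspring law of this geometric type, one natural computation is the extinction probability of the male line. The sketch below relies on the standard fact that, starting from one individual, the extinction probability is the smallest fixed point in [0, 1] of the offspring generating function, and that iterating the generating function from 0 increases to that fixed point. The function names are hypothetical, and the estimated probabilities sum to slightly more than 1 (about 1.0002), so the output is only an approximation.

```python
def offspring_pgf(s, f0=0.4825, b=0.2126, c=0.5893):
    """Generating function sum_j f(j) s^j for f(0) = f0, f(j) = b * c**(j-1), j >= 1.

    The geometric series is summed in closed form, valid for |c * s| < 1.
    """
    return f0 + b * s / (1.0 - c * s)

def extinction_probability(pgf, iterations=2000):
    """Iterate q <- pgf(q) from q = 0; the iterates increase to the smallest fixed point."""
    q = 0.0
    for _ in range(iterations):
        q = pgf(q)
    return q

# Mean number of male children: sum_j j * b * c**(j-1) = b / (1 - c)**2.
mean_offspring = 0.2126 / (1.0 - 0.5893) ** 2
q = extinction_probability(offspring_pgf)
```

The mean exceeds 1, so the process is supercritical and the computed extinction probability is strictly less than 1.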

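12. Let $f$ be the offspring distribution function for a simple branching process having
finite second moment. Let $\mu = \sum_k k f(k)$, $\sigma^2 = \sum_k (k - \mu)^2 f(k)$. Show that, given
$X_0 = 1$,
\[ \operatorname{Var} X_n = \begin{cases} \sigma^2 \mu^{n-1}(\mu^n - 1)/(\mu - 1) & \text{if } \mu \ne 1, \\ n\sigma^2 & \text{if } \mu = 1. \end{cases} \]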
13. Each of the following distributions below depends on a single parameter. Construct
graphs of the nonextinction probability and the expected sizes of the successive
generations as a function of the parameter.
(i) $f(j) = \begin{cases} p & \text{if } j = 2, \\ q & \text{if } j = 0, \\ 0 & \text{otherwise;} \end{cases}$
(iii) f(j)=^ i j

9,(a) _
_ q(b, a)q(a, c)
a, b, cc S. [Hint: Y_ g(a) = 1, b, c e S.]
9,(0) q(b, O)q(O, c)' a
(ii) Use (12.11), (12.12) to show that this condition can be expressed as
and, therefore,
[a, , a, b] _ g()
[a, /3', a, b] g,,,(')
[a, , a, b] g (a)
[a, , a', b] g.b(a')
Use the "substitution scheme" of (iii) and (iv) to verify (12.10) by checking (ii).
2. (i) Verify (12.13) for the case n = 1, r = 2. [Hints: Without loss of generality, take
h = 0, and note,
[x, , y b]
,
[a, u, v, b]
for all x, and x * p(x, dy) is weakly continuous (i.e., f s f (y) p(x, dy) is a continuous
function of x for every bounded continuous function f on S), then n is the unique
invariant probability for p(x, dy), i.e., j 13 p(x, B)rc(dx) = n(B) for all Be .V V. [Hint:
Let f be bounded and continuous. Then
1lr(dz). ]
J
(iii) Extend (i) and (ii) to arbitrary metric space S, and note that it suffices to require
convergence of n  ' Jm If (Y)pl'")(x, dy) to If (y)ir(y) dy for all bounded
continuous f on S.
2. (i) Let $B_1, B_2$ be $m \times m$ matrices (with real or complex coefficients). Define $\|B\|$ as
in (13.13), with the supremum over unit vectors in $\mathbb{R}^m$ or $\mathbb{C}^m$. Show that
(iii) If $B$ is an $m \times m$ matrix and $\|B\|$ is defined to be the supremum over unit vectors
in $\mathbb{C}^m$, show that $\|B^n\| \ge r^n(B)$. Use this together with (13.17) to prove that
$\lim \|B^n\|^{1/n}$ exists and equals $r(B)$. [Hint: Let $\lambda$ be an eigenvalue such that
$|\lambda| = r(B)$. Then there exists $x \in \mathbb{C}^m$, $\|x\| = 1$, such that $Bx = \lambda x$.]
_<
3. Suppose E I is a random vector with values in 1.
(i) Prove that if b > 1 and c > 0, then
log c
P(Ie 1 I > cb") EEIZI 1, where Z = logI ,I
" =1 log (5
_< _<
n =1 n=1
IIS"B"II _<
(ii) Show that if (13.15) holds then (13.16) converges. [Hint:
(iii) Show that (13.15) holds, if it holds for some S < 1 /r(B). [Hint: Use the Lemma.]
r n o }. ]
4. Suppose $\sum a_n z^n$ and $\sum b_n z^n$ are absolutely convergent and are equal for $|z| < r$, where
$r$ is some positive number. Show that $a_n = b_n$ for all $n$. [Hint: Within its radius of
(i) Prove that p " (x, dz) converges weakly to n(dz). [Hint: p '` (x, J) . I as k i c0,
( ) ( )
(k+r) .) (k)
f f(y)p (x dy) = f ($f(z)p^ (y, dz))p (x dy).]
(ii) Assume the hypothesis above for all $x \in J$ (with $J$ not depending on $x$). Prove
that $\pi$ is the unique invariant probability.
6. In Example 2, if $\varepsilon_n$ are i.i.d. uniform on $[-1, 1]$, prove that $\{X_{2n}(x): n \ge 1\}$ is i.i.d.
with common p.d.f. given by (14.27) if $x \in [-2, 2]$.
7. In Example 2, modify f as follows. Let 0 < <. Define fo (x): _f(x) for
2 _< x < 6, and 6 _< x _< 2, and linearly interpolate between (,), so that f,
is continuous.
(i) Show that, for x e [6, 1] (or, x c [1, b]) {X"(x): n >, 1} is i.i.d. with common
distribution it (or, nx+2).
(ii) For x e (1, 2] (or, [2, 1)) {X"(x): n >_ 2} is i.i.d. with common distribution
ix (or, ix +2).
(iii) For x e (8, 6) {X"(x): n >_ 1} is i.i.d. with common distribution it_ X+ ,.
8. In Example 3, assume $P(\varepsilon_1 = 0) > 0$ and prove that $P(\varepsilon_n = 0 \text{ for some } n \ge 0) = 1$.
9. In Example 3, suppose E log s > 0. ;
(i) Prove that E, , {1/(E 1 ...e" )} converges a.s. to a (finite) nonnegative random
variable Z.
(ii) Let d, := inf{z > 0: P(Z < z) > 0}, d 2 == sup{z > 0: P(Z > z) > 0}. Show that
=0 if x < c(d, + l ),
p(x) e (0, 1) if c(d, + l) < x < c(d 2 + 1),
=1 if x > c(d 2 + 1).
_ 1
(a) d, = > m " if m > 1, = oo if m 5 1 , and
n =, mI
Intuitively, condition (d) says that the total uncertainty in the joint occurrence of
two independent events is the cumulative uncertainty for each of the events. Verify
that $h$ must be of the form $h(p) = -c \log_2 p$ where $c = h(\tfrac{1}{2}) > h(1) = 0$ is a positive
constant. Standardizing, one may take
\[ h(p) = -\log_2 p. \]
2. Let $f = (f_i)$ be a probability distribution on $S = \{1, 2, \ldots, M\}$. Define the entropy
in $f$ by the "average uncertainty," i.e.,
\[ H(f) = -\sum_{i=1}^{M} f_i \log_2 f_i. \]
(i) Show that $H(f)$ is maximized by the uniform distribution on $S$.
(ii) If $g = (g_i)$ is another probability distribution on $S$ then
\[ H(f) \le -\sum_i f_i \log_2 g_i. \]
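The two claims of the entropy exercise can be tried on a small example. This is a minimal sketch assuming the usual definition of entropy as the average of the uncertainty -log2 of the probabilities, with the convention that terms of zero probability contribute nothing; the function names and the two distributions are hypothetical illustrations.

```python
import math

def entropy(f):
    """H(f) = -sum_i f_i log2 f_i, with the convention 0 * log2(0) = 0."""
    return -sum(p * math.log2(p) for p in f if p > 0)

def cross_entropy(f, g):
    """-sum_i f_i log2 g_i; infinite if g_i = 0 at some i with f_i > 0."""
    total = 0.0
    for p, q in zip(f, g):
        if p > 0:
            if q == 0:
                return math.inf
            total -= p * math.log2(q)
    return total

# Hypothetical distributions on M = 4 points.
f = [0.5, 0.25, 0.125, 0.125]
g = [0.25, 0.25, 0.25, 0.25]   # uniform distribution
h_f = entropy(f)
h_uniform = entropy(g)
```

Here the entropy of f is 1.75 bits, below the 2 bits of the uniform distribution, and the inequality of part (ii) with g uniform reduces to exactly that comparison.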
3. Suppose that $X$ is a random variable taking values $a_1, \ldots, a_M$ with respective
probabilities $p(a_1), \ldots, p(a_M)$. Consider an arbitrary binary coding of the respective
symbols $a_i$, $1 \le i \le M$, by a string $\phi(a_i) = (\varepsilon_1^{(i)}, \ldots, \varepsilon_{n_i}^{(i)})$ of 0's and 1's, such that no
string $(\varepsilon_1^{(i)}, \ldots, \varepsilon_{n_i}^{(i)})$ can be obtained from a shorter code $(\varepsilon_1^{(j)}, \ldots, \varepsilon_{n_j}^{(j)})$, $n_j \le n_i$, by
adding more terms; such codes will be called admissible. The number $n_i$ of bits is
called the length of the codeword $\phi(a_i)$.
(i) Show that an admissible binary code $\phi$ having respective lengths $n_i$ exists if and
only if
\[ \sum_{i=1}^{M} 2^{-n_i} \le 1. \]
(ii) (Noiseless Coding Theorem) For any admissible binary code $\phi$ of $a_1, \ldots, a_M$
having respective lengths $n_1, \ldots, n_M$, the average length of codewords cannot
be made smaller than the entropy of the distribution $p$ of $a_1, \ldots, a_M$, i.e.,
\[ \sum_{i=1}^{M} n_i p(a_i) \ge H(p) = -\sum_{i=1}^{M} p(a_i) \log_2 p(a_i). \]
[Hint: Use Exercise 2(ii) with $f_i = p(a_i)$, $g_i = 2^{-n_i}$ to show
\[ H(p) \le \sum_{i=1}^{M} n_i p(a_i) + \log_2 \Bigl( \sum_{k=1}^{M} 2^{-n_k} \Bigr). \]
]
\[ H(p) \le \sum_{i=1}^{M} n_i p(a_i) \le H(p) + 1. \]
log t p(a 1 )
(iv) Verify that there is not a more efficient (admissible) encoding (i.e., minimal
average number of bits) of the symbols $a_1, a_2, a_3$ for the distribution $p(a_1) = \tfrac{1}{2}$,
$p(a_2) = p(a_3) = \tfrac{1}{4}$, than the code $\phi(a_1) = (0)$, $\phi(a_2) = (1, 0)$, $\phi(a_3) = (1, 1)$.
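The three-symbol code of part (iv) can be checked mechanically against the conditions of parts (i) and (ii). The sketch below is an illustration only: it represents codewords as strings of "0" and "1", takes the probabilities to be 1/2, 1/4, 1/4 (the only reading under which the three values sum to 1), and uses hypothetical names throughout.

```python
import math
from fractions import Fraction

# Codewords written as bit strings; symbol names are labels chosen here.
code = {"a1": "0", "a2": "10", "a3": "11"}
prob = {"a1": Fraction(1, 2), "a2": Fraction(1, 4), "a3": Fraction(1, 4)}

def is_admissible(codewords):
    """True if no codeword can be obtained from another by adding more bits."""
    words = list(codewords)
    return not any(i != j and words[j].startswith(words[i])
                   for i in range(len(words)) for j in range(len(words)))

# Sum of 2^(-n_i) over the codeword lengths, as in part (i).
kraft_sum = sum(Fraction(1, 2 ** len(w)) for w in code.values())

# Average codeword length and entropy in bits, as in part (ii).
average_length = sum(prob[a] * len(code[a]) for a in code)
entropy_bits = -sum(float(p) * math.log2(float(p)) for p in prob.values())
```

For this distribution the average length equals the entropy, 3/2 bits, so the lower bound of the Noiseless Coding Theorem is attained and no admissible code can do better.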
4. (i) Show that $Y_1, Y_2, \ldots$ in (15.4) satisfies the law of large numbers.
(ii) Show that for (15.5) to hold it is sufficient that $Y_1, Y_2, \ldots$ satisfy the law of large
numbers.
THEORETICAL COMPLEMENTS
Px(A)
=L ...
9,
P(x,Y1)P(Y1,Y2)...P(yn_1,Y(dyn)...u(dy1) (T.6.1)
Let $X_n$ denote the $n$th coordinate projection mapping on $\Omega$. Then $\{X_n\}$ is said to be
a Markov chain on the state space $S$ with transition density $p(x, y)$ with respect to $\mu$
and initial distribution $\gamma$ under $P_\gamma$. The results of Exercise 6.9 can be extended to this
setting as follows. Suppose that there is a positive integer $r$ and a $\mu$-integrable function
$\rho$ on $S$ such that $\int_S \rho(x)\,\mu(dx) > 0$ and $p^{(r)}(x, y) \ge \rho(y)$ for all $x, y$ in $S$. Then there is
M(B) := uES ^
JB
Then
2 , s
pick+ n.>
(x, y)p(dy) p k+ '(z, y)p(dy)1 ( 1 e)[Mk.(B) mk.(B)]
(ii) Ifil
e'  x ify<x
P(x, Y) _
,r,(1_y)m
P(L m) =
fo
P(L m I Y = Y)P(Y E dY) =
0
( m 1)! J P(Y E dY),
and therefore,
ri
y
P(L >, m):= lim P(L >_ m) = J ^ 1
0 )1 2(1 y) dy.
W o (m 1)!
i
lim p(Jd) = pijd(E;i^ii)
n^m
To obtain these from the general renewal theory described below, take as the
delay $Z_0$ the time to reach $j$ for the first time starting at $i$. The durations of the
subsequent replacements $Z_1, Z_2, \ldots$ represent the lengths of times between returns
to $j$.
\[ G(x) = \frac{F(x + a) - F(a)}{1 - F(a)}. \]
Let
\[ S_n = Z_0 + Z_1 + \cdots + Z_n, \qquad n \ge 0, \tag{T.9.1} \]
and let
We will use the notation $S_n^0$, $N_t^0$ sometimes to identify cases when $Z_0 = 0$ a.s. Then
$S_n$ is the time of the $(n + 1)$st renewal and $N_t$ counts the number of renewals up to
and including time $t$. In the case that $Z_0 = 0$ a.s., the stochastic (counting) process
$\{N_t\} \equiv \{N_t^0\}$ is called the (ordinary) renewal process. Otherwise $\{N_t\}$ is called a delayed
renewal process.
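For simplicity, first restrict attention to the case of the ordinary renewal process.
Let $\mu = EZ_1 < \infty$. Then $1/\mu$ is called the renewal rate. The interpretation as an
average rate of renewals is reasonable since
\[ \frac{S_{N_t - 1}}{N_t} \le \frac{t}{N_t} \le \frac{S_{N_t}}{N_t} \tag{T.9.3} \]
and therefore, by the SLLN,
\[ \frac{N_t}{t} \to \frac{1}{\mu} \quad \text{a.s. as } t \to \infty. \tag{T.9.4} \]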
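Since $N_t$ is a stopping time for $\{S_n\}$ (for fixed $t \ge 0$), it follows from Wald's Equation
(Chapter I, Corollary 13.2) that
\[ ES_{N_t} = \mu EN_t. \tag{T.9.5} \]
The elementary renewal theorem asserts that
\[ \frac{EN_t}{t} \to \frac{1}{\mu} \quad \text{as } t \to \infty. \tag{T.9.6} \]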
To deduce this from the above, simply observe that $\mu EN_t = ES_{N_t} > t$ and therefore
\[ \liminf_{t\to\infty} \frac{EN_t}{t} \ge \frac{1}{\mu}. \]
On the other hand, assuming first that $Z_n \le C$ a.s. for each $n \ge 1$, where $C$ is a
positive constant, gives $\mu EN_t \le t + C$ and therefore for this case, $\limsup_{t\to\infty}(EN_t/t) \le 1/\mu$.
More generally, since truncations of the $Z_n$ at the level $C$ would at most decrease
the $S_n$ and therefore at most increase $N_t$ and $EN_t$, this last inequality applied to the
truncated process yields
\[ \limsup_{t\to\infty} \frac{EN_t}{t} \le \frac{1}{C P(Z_1 \ge C) + \int_0^C x\,F(dx)} \to \frac{1}{\mu} \quad \text{as } C \to \infty. \tag{T.9.7} \]
The above limits (T.9.4) and (T.9.6) also hold in the case that $\mu = \infty$, under the
convention that $1/\infty$ is 0, by the SLLN and (T.9.7). Moreover, these asymptotics
can now be applied to the delayed renewal process to get precisely the same conclusions
for any given (initial) distribution G of delay.
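The almost sure limit (T.9.4) is easy to observe in simulation. The sketch below is an illustration under assumptions of our own: lifetimes are taken uniform on (0, 2), so that the mean lifetime is 1, the delay is zero, and the renewal count is defined as the first index n at which the partial sum S_n exceeds t, which counts the renewals up to and including time t. The seed, the horizon, and the function name are hypothetical.

```python
import random

def count_renewals(t, draw_lifetime, rng):
    """Return N_t = inf{n >= 0 : S_n > t} for an ordinary renewal process (Z_0 = 0, S_0 = 0)."""
    s, n = 0.0, 0
    while s <= t:
        n += 1
        s += draw_lifetime(rng)
    return n

rng = random.Random(2024)      # fixed seed so the run is reproducible
mu = 1.0                       # mean of the uniform(0, 2) lifetimes assumed here
t = 20000.0
n_t = count_renewals(t, lambda r: r.uniform(0.0, 2.0 * mu), rng)
rate_estimate = n_t / t        # should be close to the renewal rate 1 / mu
```

Replacing the uniform lifetimes by any other positive distribution with finite mean changes only the lambda passed to the counter; the ratio should then settle near the reciprocal of that mean.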
With the special choice of $G = F_\infty$ defined by
\[ F_\infty(x) = \frac{1}{\mu} \int_0^x P(Z_1 > u)\,du, \qquad x \ge 0, \tag{T.9.8} \]
the corresponding delayed renewal process N, called the equilibrium renewal process,
has the property that
where
NI(t,t+h]=N+hN,
N. (T.9.10)
To prove this, define the renewal function $m(t) = EN_t$, $t \ge 0$, for $\{N_t\}$. Then for the
general (delayed) process we have
Observe that $g(t) = t/\mu$, $t \ge 0$, solves the renewal equation (T.9.11) with $G = F_\infty$;
i.e.,
t
=
u o
1`
(1 F(u)) du +
o
t
F(u) du = F(t) + 
N o 0
F(ds) du JJ
= F^(t) + 1
u J J r duo
' s f EP
F(ds) = F (t) + t s F(ds).
x
.'
(T.9.13)
To finish the proof of (T.9.9), observe that g(t) = m`(t):= EN;, t > 0, uniquely
solves (T.9.11), with G = F., among functions that are bounded on finite intervals.
For if r(t) is another such function, then by iterating we have
= F(t) + f { F^(t u) +
.'
(
J t
o
u r(t u s)F(ds)jF(du)
(T.9.14)
Thus,
r(t) _ P(Nj 3 n) = m W (t)
n =i
since
P(S, < t) = P(N >_ n) + 0 as n  oo since P(N >, n) = EN < cc.
n =i
Let $d$ be a positive real number and let $L_d = \{0, d, 2d, 3d, \ldots\}$. The common
distribution $F$ of the durations $Z_1, Z_2, \ldots$ is said to be a lattice distribution if there
is a number $d > 0$ such that $P(Z_1 \in L_d) = 1$. The largest such $d$ is called the period of $F$.
where
Nc") _ (T.9.17)
lsk=nd)
k=0
Note that assuming that the limit exists for each $h > 0$, the value of the limit in
(i), likewise (ii), of Blackwell's theorem can easily be identified from the elementary
renewal theorem (T.9.6) by noting that $\varphi(h) := \lim_{t\to\infty} EN(t, t + h]$ must then be
linear in $h$, $\varphi(0) = 0$, and
\[ \varphi(1) = \lim_{n\to\infty} \frac{EN_n}{n} = \frac{1}{\mu}. \]
Proof. To make the coupling idea precise for the case of Blackwell's theorem with
$\mu < \infty$, let $\{Z_n: n \ge 1\}$ and $\{\tilde{Z}_n: n \ge 1\}$ denote two independent sequences of renewal
lifetime random variables with common distribution $F$, and let $Z_0$ and $\tilde{Z}_0$ be
independent delays for the two sequences having distributions $G$ and $\tilde{G} = F_\infty$,
respectively. The tilde (~) will be used in reference to quantities associated with the
latter (equilibrium) process. Let $\varepsilon > 0$ and define,
Suppose we have established that ($\varepsilon$-recurrence) $P(\nu(\varepsilon) < \infty) = 1$ (i.e., the coupling will
occur). Since the event $\{\nu(\varepsilon) = n, \tilde{\nu}(\varepsilon) = \tilde{n}\}$ is determined by $Z_0, Z_1, \ldots, Z_n$ and
$\tilde{Z}_0, \ldots, \tilde{Z}_{\tilde{n}}$, the sequence of lifetimes $\{\tilde{Z}_{\tilde{n}+k}: k \ge 1\}$ may be replaced by the
sequence $\{Z_{n+k}: k \ge 1\}$ without changing the distributions of $\{\tilde{S}_n\}$, $\{\tilde{N}_t\}$, etc. Then,
after such a modification for $\varepsilon < h/2$, observe with the aid of a simple figure that
(T.9.20)
Therefore,
N(t+e,t+hs]N(t,t+h]1 {s , ( , > , t
N(t + s, t + h E] 1 ^ s ,, E
N(t, t + h]I (s , (,) ,, I
N(t, t + h]1 Is , II + N(t, t + h]l {s )(= N(t, t + h])
Taking expected values and noting the first, fifth, and seventh lines, we have the
following coupling inequality,
Using (T.9.9),
h+2e
EN(t+s, t+h e]= h  2s and EN(t E, t+h+e]=
p
Therefore,
E(N(t, t + h]1 (s ,^ >,^) < E o Nh P(SVO > t) = m(h)P(S V(C) > t), (T.9.24)
where $E_0$ denotes expected value for the process $N_h$ under zero-delay. More precisely,
because $(t, t + h] \subset (t, S_{N_t} + h]$ and there are no renewals in $(t, S_{N_t})$, we have
$N(t, t + h] \le \inf\{k \ge 0: Z_{N_t + k} > h\}$. In particular, noting (T.9.2), this upper bound
by an ordinary (zero-delay) renewal process with renewal distribution $F$, is
independent of the event $A$, and furnishes the desired estimate (T.9.24).
Now from (T.9.23) and (T.9.24) we have the estimate
which is enough, since $\varepsilon > 0$ is arbitrary, provided that the initial $\varepsilon$-recurrence
assumption, $P(\nu(\varepsilon) < \infty) = 1$, can be established. So, the bulk of the proof rests on
showing that the coupling will eventually occur. The probability $P(\nu(\varepsilon) < \infty)$ can be
analyzed separately for each of the two cases (i) and (ii) of Theorem
T.9.1.
First take the lattice case (ii) with lattice spacing (period) $d$. Note that for $\varepsilon < d$,
For case (i), observe by the Hewitt-Savage zero-one law (theoretical complement
1.2 of Chapter I) applied to the i.i.d. sequence $(Z_1, \tilde{Z}_1), (Z_2, \tilde{Z}_2), (Z_3, \tilde{Z}_3), \ldots$, that
where R"=
min{ S,fS":5;,S">0,n_>0}=SS,^S "
Now, the distribution of {SR, +; t} ; does not depend on t (Exercise 7.5, Chapter
>
IV). This, independence of {Z' j } and {S}, and the fact that {Sk + Sk : n 0} does
not depend on k, make {R,, +k } also have distribution independent of k. Therefore,
the probability P(R, < e for some n > k), does not depend on k, and thus
implies P(R < r i.o.) = P(R < e for some n >, 0) < P(v(e) < oo). Now,
The proof that $P(R_n < \varepsilon \text{ for some } n \ge 0) > 0$ (and therefore is 1) in (T.9.29)
follows from a final technical lemma given below on "points of increase" of distribution
functions of sums of i.i.d. nonlattice positive random variables; a point $x$ is called a
point of increase of a distribution function $F$ if $F(b) - F(a) > 0$ whenever $a < x < b$.
Lemma. Let $F$ be a nonlattice distribution function on $(0, \infty)$. The set $E$ of points
of increase of the functions $F, F^{*2}, F^{*3}, \ldots$ is "asymptotically dense at $\infty$" in the
sense that for any $\varepsilon > 0$ and $x$ sufficiently large, $E \cap (x, x + \varepsilon) \ne \emptyset$, i.e., the interval
$(x, x + \varepsilon)$ meets $E$ for $x$ sufficiently large.
Proof. Let $a, b \in E$, $0 < a < b$, such that $b - a < \varepsilon$. Let $I_n = (na, nb]$. For
$a < n(b - a)$, the interval $I_n$ properly contains $(na, (n + 1)a)$, and therefore each
$x > a^2/(b - a)$ belongs to some $I_n$, $n \ge 1$. Since $E$ is easily checked to be closed under
addition, the $n + 1$ points $na + k(b - a)$, $k = 0, 1, \ldots, n$, belong to $E$ and partition
$I_n$ into $n$ subintervals of length $b - a < \varepsilon$. Thus each $x > a^2/(b - a)$ is at a distance
$\le (b - a)/2 < \varepsilon/2$ of $E$. If for some $\varepsilon > 0$, $b - a \ge \varepsilon$ for all $a, b \in E$ then $F$ must be
a lattice distribution. To see this say, without loss of generality, $\varepsilon \le b - a < 2\varepsilon$ for
some $a, b \in E$. Then $E \cap I_n \subset \{na + k(b - a): k = 0, 1, \ldots, n\}$. Since $(n + 1)a \in E \cap I_n$
for $a < n(b - a)$, $E \cap I_n$ must consist of multiples of $(b - a)$. Thus, if $c \in E$ then
$c + k(b - a) \in I_n \cap E$ for $n$ sufficiently large. Thus $c$ is a multiple of $(b - a)$.
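The first half of the proof of the lemma is a statement about the points na + k(b - a), and it can be checked on an example. The sketch below is an illustration only: the values a = 1 and b = 13/10 are hypothetical, exact rational arithmetic is used, and the check covers only the sample of x values listed, all of which lie well inside the range covered by the generated points.

```python
from fractions import Fraction

def grid_points(a, b, n_max):
    """All points n*a + k*(b - a) with 1 <= n <= n_max and 0 <= k <= n."""
    return sorted({n * a + k * (b - a) for n in range(1, n_max + 1) for k in range(n + 1)})

def distance_to_grid(x, points):
    """Distance from x to the nearest of the given points."""
    return min(abs(x - p) for p in points)

# Hypothetical choice of a < b; in the lemma these would be two points of increase.
a, b = Fraction(1), Fraction(13, 10)
threshold = a * a / (b - a)                      # a^2 / (b - a) = 10/3
points = grid_points(a, b, n_max=20)             # reaches up to 20 * b = 26
test_xs = [threshold + Fraction(j, 100) for j in range(1, 1001)]
worst = max(distance_to_grid(x, points) for x in test_xs)
```

For these values every sampled x beyond the threshold lies within (b - a)/2 of one of the points, in line with the distance bound used in the proof.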
Coupling approaches to the renewal theorem on which the preceding is based can
be found in the papers of H. Thorisson (1987), "A Complete Coupling Proof of
Blackwell's Renewal Theorem," Stoch. Proc. Appl., 26, pp. 87-97; K. Athreya, D.
McDonald, P. Ney (1978), "Coupling and the Renewal Theorem," Amer. Math.
Monthly, 85, pp. 809-814; T. Lindvall (1977), "A Probabilistic Proof of Blackwell's
Renewal Theorem," Ann. Probab., 5, pp. 482-485.
3. (Birkhoff's Ergodic Theorem) Suppose $\{X_n: n \ge 0\}$ is a stochastic process on
$(\Omega, \mathcal{F}, P)$ with values in $(S, \mathcal{S})$. The process $\{X_n\}$ is (strictly) stationary if for every
pair of integers $m \ge 0$, $r \ge 1$, the distribution of $(X_0, X_1, \ldots, X_m)$ is the same as that
of $(X_r, X_{1+r}, \ldots, X_{m+r})$. An equivalent definition is: $\{X_n\}$ is stationary if the distribution
$\mu$, say, of $X := (X_0, X_1, X_2, \ldots)$ is the same as that of $T^r X := (X_r, X_{1+r}, X_{2+r}, \ldots)$
for all $r \ge 0$. Recall that the distribution of $(X_r, X_{r+1}, \ldots)$ is the probability measure
and
Definition T.9.1. The process $\{X_n: n \ge 0\}$ and the shift transformation $T$ are said
to be ergodic if $\mathcal{J}$ is trivial.
0) be a stationary
sequence on the state space $S$ (having sigma-field $\mathcal{S}$). Let $f(X)$ be a real-valued
measurable function such that $E|f(X)| < \infty$. Then
(a) $n^{-1} \sum_{r=0}^{n-1} f(T^r X)$ converges a.s. and in $L^1$ to an invariant random variable $g(X)$,
and
(b) $g(X) = Ef(X)$ a.s. if $\mathcal{J}$ is trivial.
(T.9.30)
Proof. Note that $f(X) + M_n(f \circ T) = M_{n+1}(f)$ on the set $\{M_{n+1}(f) > 0\}$. Since
$M_{n+1}(f) \ge M_n(f)$ and $\{M_n(f) > 0\} \subset \{M_{n+1}(f) > 0\}$, it follows that $f(X) \ge
M_n(f) - M_n(f \circ T)$ on $\{M_n(f) > 0\}$. Also, $M_n(f) \ge 0$, $M_n(f \circ T) \ge 0$. Therefore,
M
f(X)dP>  (M^(f)M^(fT))dP
J(
M(f)>OV G {M(f)>OInG
J J
= M(f) dP M(f o T) dP
O
J G
J
M^(f) dP M(f o T) dP
G
= 0,
where the last equality follows from the invariance of G and the stationarity of
Thus, (T.9.31) holds with {M(f) > 0} in place of {M(f) > 0}. Now let n j cc.
1 n 1
f(A(fc)n G
f(X) dP > cP({A(f ) > c} n G) VG E 5. (T.9.32)
f (M(f^0)nG
f(X) dP > cP({M(f  c) > 0} n G).
But {M(f  c) > 0} = {A(f  c) > 0} = {A(f) > c}, and {M(f  c) > 0} _
{A(f)>c}. n
I  ' I^'
7(X):= i > f(T'X), f(X):= lim  Y f(T'X),
p_Qn.=o n,n.=o (T.9.33)
fG"d(f)
f(X)dP _> dP(GC , d (f )),
i.e.,
Now if $c > d$, then (T.9.34) and (T.9.35) cannot both be true unless $P(G_{c,d}(f)) = 0$.
Thus, if $c > d$, then $P(G_{c,d}(f)) = 0$. Apply this to all pairs of rationals $c > d$ to get
$P(\bar{f}(X) > \underline{f}(X)) = 0$. In other words, $(1/n) \sum_{r=0}^{n-1} f(T^r X)$ converges a.s. to
$g(X) := \bar{f}(X)$.
To complete the proof of part (a), it is enough to assume f >, 0, since
n  ' j 1 f + (T'X)  7(X) a.s. and n  ' 2]o  'f(TX)  r(X) a.s., where
f + = max{ f, 0},  f  = min{ f, 0}. Assume then f > 0. First, by Fatou's Lemma
and stationarity of {X},
'
1
E7(X) < lim E  f(T'(X)) = Ef(X) < oo.
nao n r=o
nonnegative and integrable, given s > 0 there exists a constant N E such that
S e + N,Ef(X)/.l. (T.9.36)
It follows that the left side of (T.9.36) goes to zero as $\lambda \to \infty$, uniformly for all $n$.
Part (b) is an immediate consequence of part (a).
Notice that part (a) of Theorem T.9.2 also implies that $g(X) = E(f(X) \mid \mathcal{J})$.
Theorem T.9.2 is generally stated for any transformation $T$ on a probability space
$(\Omega, \mathcal{G}, \mu)$ satisfying $\mu(T^{-1}G) = \mu(G)$ for all $G \in \mathcal{G}$. Such a transformation is called

X= + +Z
Q^
Since Z i , . are i.i.d. with finite second moment, the FCLT of Chapter I provides
that {X;"} converges in distribution to standard Brownian motion. The corresponding
result for {W"(t)} follows by an application of the Maximal Inequality to show
There are specifications of local structure that are defined in a natural manner but
for which there are no Gibbs states having the given structure when, for example,
A = Z, but S is not finite. As an example, one can take q to be the transition matrix
of a (general) random walk on S = Z such that q = q111 > 0 for all i, j. In this case
;;
of zero $\pi$-measure. Now the invariance of $\pi'$ implies $\int (1/n) \sum_{m=0}^{n-1} p^{(m)}(x; A)\,\pi'(dx) = \pi'(A)$
for all $n$. Therefore, $\pi'(A) = \pi(A)$. Thus $\pi' = \pi$, completing the proof.
As a very special case, the following strong law of large numbers (SLLN) for
Markov processes on general state spaces is obtained: If $p(x; dy)$ admits a unique
invariant probability $\pi$, and $\{X_n: n \ge 0\}$ is a Markov process with transition probability
$p$ and initial distribution $\pi$, then $(1/n) \sum_{m=0}^{n-1} f(X_m)$ converges to $\int f(x)\,\pi(dx)$ a.s. provided
that $\int |f(x)|\,\pi(dx) < \infty$. This also implies, by conditioning on $X_0$, that this almost
sure convergence holds under all initial states $x$ outside a set of zero $\pi$-measure.
5. (Ergodic Decomposition of a Compact State Space) Suppose $S$ is a compact metric
space and $\mathcal{S} = \mathcal{B}(S)$ its Borel sigma-field. Let $p(x; dy)$ be a transition probability on
$(S, \mathcal{B}(S))$ having the Feller property: $x \to p(x; dy)$ is weakly continuous on $S$ into
$\mathcal{P}(S)$, the set of all probability measures on $(S, \mathcal{B}(S))$. Let $T^*$ denote the map on
$\mathcal{P}(S)$ into $\mathcal{P}(S)$ defined by: $(T^*\mu)(B) = \int p(x; B)\,\mu(dx)$ $(B \in \mathcal{B}(S))$. Then $T^*$ is weakly
continuous. For if probability measures $\mu_n$ converge weakly to $\mu$ then, for every
real-valued bounded continuous $f$ on $S$, $\int f\,d(T^*\mu_n) = \int (\int f(y)\,p(x; dy))\,\mu_n(dx)$
converges to $\int (\int f(y)\,p(x; dy))\,\mu(dx) = \int f\,d(T^*\mu)$, since $x \to \int f(y)\,p(x; dy)$ is continuous
by the Feller property of $p$.
Let us show that under the above hypothesis there exists at least one invariant
probability for $p$. Fix $\mu \in \mathcal{P}(S)$. Consider the sequence of probability measures
1^ '

where
J f d^ .  J f d(T *u )I =4 f
nJ
f dy  f f d(T*" p)) < (sup{f(x)I: x e S })(2/n') + 0,
as $n' \to \infty$. Therefore, $\{\mu_{n'}\}$ and $\{T^*\mu_{n'}\}$ converge to the same limit. In other words,
$\pi = T^*\pi$, or $\pi$ is invariant. This also shows that on a compact metric space, and with
$p$ having the Feller property, if there exists a unique invariant probability $\pi$ then
$(1/n) \sum_{r=0}^{n-1} T^{*r}\mu$ converges weakly to $\pi$, no matter what (the initial
distribution) $\mu$ is.
Next, consider the set $\mathcal{I} = \mathcal{I}_p$ of all invariant probabilities for $p$. This is a convex
and (weakly) compact subset of $\mathcal{P}(S)$. Convexity is obvious. Weak compactness follows
from the facts (i) $\mathcal{P}(S)$ is weakly compact (by Prohorov's Theorem), and (ii) $T^*$ is
continuous for the weak topology on $\mathcal{P}(S)$. For, if $\mu_n \in \mathcal{I}$ and $\mu_n$ converges weakly
to $\mu$, then $\mu_n = T^*\mu_n$ converges weakly to $T^*\mu$. Therefore, $T^*\mu = \mu$. Also, $\mathcal{P}(S)$ is a
metric space (see, e.g., K. R. Parthasarathy (1967), Probability Measures on Metric
Spaces, Academic Press, New York, p. 43). It now follows from the Krein-Milman
Theorem (see H. L. Royden (1968), Real Analysis, 2nd ed., Macmillan, New York,
p. 207) that $\mathcal{I}$ is the closed convex hull of its extreme points. Now if $\{X_n\}$ is not
ergodic under an invariant initial distribution $\pi$, then, by the construction given in
theoretical complement 4 above, there exists $B \in \mathcal{B}(S)$ such that $0 < \pi(B) < 1$ and
$\pi = \pi(B)\pi_B + \pi(B^c)\pi_{B^c}$, with $\pi_B$ and $\pi_{B^c}$ mutually singular invariant probabilities. In
other words, the set $K$, say, of extreme points of $\mathcal{I}$ comprises those $\pi$ such that $\{X_n\}$
with initial distribution $\pi$ is ergodic. Every $\pi \in \mathcal{I}$ is a (weak) limit of convex
combinations of the form $\sum_i \lambda_i^{(n)} \mu_i^{(n)}$ $(n \to \infty)$, where $0 \le \lambda_i^{(n)} \le 1$, $\sum_i \lambda_i^{(n)} = 1$, $\mu_i^{(n)} \in K$.
Each of the simple random walk examples described in Section 1.3 has the
special property that it does not skip states in its evolution. In this vein, we
shall study time-homogeneous Markov chains called birth-death chains whose
transition law takes the form
\[ p_{ij} = \begin{cases} \beta_i & \text{if } j = i + 1, \\ \delta_i & \text{if } j = i - 1, \\ \alpha_i & \text{if } j = i, \\ 0 & \text{otherwise,} \end{cases} \tag{1.1} \]
for I i<,2r1,
234 BIRTHDEATH MARKOV CHAINS
r
(w+4i)(2r i)
P1,i+ 1 =
(w + r) 2
CC.V^:)
(1.3)
2r
Pol = P2r,2r  1 = w + r
Just as the simple random walk is the discrete analogue of Brownian motion,
the birth-death chains are the discrete analogues of the diffusions studied in
Chapter V.
Most of this chapter may be read independently of Chapter II.
CASE I. Let $\{X_n\}$ be an unrestricted birth-death chain on $S = \{0, \pm 1, \pm 2, \ldots\} = \mathbb{Z}$.
The transition probabilities are
with
or equivalently,
Rewrite (2.4) as
xbx1
. :. 61
0(d) 0(y) = d  1 S +1 (0(c + 1) 0(c)). (2.8)
x=yxl'x1 ... Nc+l
d1 Sxax1..'Sc+1
Let p y, denote the probability that starting at y the process eventually reaches
c after time 0, i.e.,
if dx x _1_+l = 0
p = lim ^i(y)
y ^
= 1
di x x=c+l !'xl'x I " ' f'c+ l
R IS'
2 ax = cc
i=l for ally > c iff Y a
x=1 12 '
0
x x+ = 0
p yd = 1 for all y < d iff F
x=oo axax+l"'SO
0
<1 for all y < d iff Y xx+l'  < oo (2.15)
x=m Sxsx+1" 60
A state $y \in S$ satisfying (2.18) is called a transient state; since (2.18) holds for
all $y \in S$, the birth-death chain is transient. Just as in the case of a simple
asymmetric random walk, the strong Markov property may be applied to see
that with probability 1 each state occurs at most finitely often in a transient
birth-death Markov chain.
CASE II. The next case is that of two reflecting boundaries. For this take
$S = \{0, 1, 2, \ldots, N\}$, $p_{00} = 1 - \beta_0$, $p_{01} = \beta_0$, $p_{N,N-1} = \delta_N$, $p_{N,N} = 1 - \delta_N$, and
$p_{i,i+1} = \beta_i$, $p_{i,i-1} = \delta_i$, $p_{i,i} = 1 - \beta_i - \delta_i$ for $1 \le i \le N - 1$. If one takes $c = 0$,
$d = N$ in (2.3), then $\psi(y)$ gives the probability that the process starting at $y$
reaches 0 before reaching $N$. The probability $\varphi(y)$, for the process to reach $N$
before 0 starting at $y$, may be obtained in the same fashion by changing the
boundary conditions (2.5) to $\varphi(0) = 0$, $\varphi(N) = 1$ to get that $\varphi(y) = 1 - \psi(y)$.
Alternatively, check that $\varphi(y) \equiv 1 - \psi(y)$ satisfies the equation (2.6) (with $\varphi$
replacing $\psi$) and the boundary conditions $\varphi(0) = 0$, $\varphi(N) = 1$, and then argue
that such a solution is necessarily unique (Exercise 4). All states are recurrent,
by Corollary 9.6 (see Exercise 5 for an alternative proof).
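The probability of reaching 0 before N can be computed and checked numerically. The sketch below is an assumption-laden illustration: it uses the standard closed form for birth-death chains, in which the probability of reaching c before d from y is a ratio of partial sums of products of down-probabilities over up-probabilities. That expression is supplied here as an assumption, since the displayed formula in the text is not legible in this copy, and the function name and the constant rates 1/2 and 1/3 are hypothetical.

```python
from fractions import Fraction

def reach_c_before_d(beta, delta, c, d):
    """Return {y: psi(y)} for c <= y <= d, where psi(y) = P_y(reach c before d).

    beta[x] and delta[x] are the up and down probabilities at interior states c < x < d.
    """
    # gamma[x] = (delta_{c+1} ... delta_x) / (beta_{c+1} ... beta_x), with gamma[c] = 1.
    gamma = {c: Fraction(1)}
    for x in range(c + 1, d):
        gamma[x] = gamma[x - 1] * Fraction(delta[x]) / Fraction(beta[x])
    total = sum(gamma.values())
    return {y: sum(gamma[x] for x in range(y, d)) / total for y in range(c, d + 1)}

# Hypothetical chain on {0, ..., 5} with constant interior rates.
N = 5
beta = {x: Fraction(1, 2) for x in range(1, N)}
delta = {x: Fraction(1, 3) for x in range(1, N)}
psi = reach_c_before_d(beta, delta, 0, N)
phi = {y: 1 - psi[y] for y in psi}   # probability of reaching N before 0
```

The returned values satisfy the boundary conditions psi(0) = 1 and psi(N) = 0 and the one-step averaging equation at every interior state, which is the characterization used in the text.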
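CASE III. For the case of one absorbing boundary, say at 0, take
$S = \{0, 1, 2, \ldots\}$, $p_{00} = 1$, $p_{i,i+1} = \beta_i$, $p_{i,i-1} = \delta_i$, $p_{i,i} = 1 - \beta_i - \delta_i$ for $i > 0$;
$\beta_i, \delta_i > 0$ for $i > 0$, $\beta_i + \delta_i \le 1$. For $c, d \in S$, the probability $\psi(y)$ is given by
(2.10) and the probability $\rho_{y0}$, which is also interpreted as the probability of
eventual absorption starting at $y > 0$, is given by
\[ \rho_{y0} = \lim_{d\to\infty} \frac{\sum_{x=y}^{d-1} \dfrac{\delta_1 \delta_2 \cdots \delta_x}{\beta_1 \beta_2 \cdots \beta_x}}{1 + \sum_{x=1}^{d-1} \dfrac{\delta_1 \delta_2 \cdots \delta_x}{\beta_1 \beta_2 \cdots \beta_x}} = 1 \quad \text{iff} \quad \sum_{x=1}^{\infty} \frac{\delta_1 \delta_2 \cdots \delta_x}{\beta_1 \beta_2 \cdots \beta_x} = \infty \quad (\text{for } y > 0). \tag{2.19} \]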
so that
\[ \rho_{00} = 1. \tag{2.24} \]
On the other hand, if the series in (2.19) converges then $\rho_{y0} < 1$ for all $y > 0$.
In particular, from (2.23), we see $\rho_{00} < 1$. Convergence of the series in (2.19)
also gives $\rho_{yc} < 1$ for all $c < y$ by (2.12). Now apply (2.16) to get
whenever the series in (2.19) converges. That is, the birth-death chain is transient.
The various remaining cases, for example, two absorbing, or one absorbing
and one reflecting boundary, are left to the Exercises.
k>,1,
Pn(xo = io, ... , Xm Im) = Pn (Xk = 1p, ... > Xm+k = I m ). (3.2)
\[ \pi_0(1 - \beta_0) + \pi_1\delta_1 = \pi_0, \]
\[ \pi_{j-1}\beta_{j-1} + \pi_j(1 - \beta_j - \delta_j) + \pi_{j+1}\delta_{j+1} = \pi_j \quad (j = 1, 2, \ldots, N - 1), \tag{3.3} \]
or
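$$\pi_j = \frac{\beta_0 \beta_1 \cdots \beta_{j-1}}{\delta_1 \delta_2 \cdots \delta_j}\,\pi_0 \quad (1 \le j \le N), \qquad \pi_0 = \left(1 + \sum_{j=1}^{N} \frac{\beta_0 \beta_1 \cdots \beta_{j-1}}{\delta_1 \delta_2 \cdots \delta_j}\right)^{-1}. \tag{3.5}$$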
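As a quick check of the product formula (3.5), the sketch below builds $\pi$ for a small chain with two reflecting boundaries and verifies invariance in exact rational arithmetic. The particular rates and the helper names are illustrative assumptions.

```python
from fractions import Fraction


def invariant_distribution(beta, delta):
    """Product formula: beta[j] = p_{j,j+1} for j = 0..N-1,
    delta[j] = p_{j,j-1} for j = 1..N (delta[0] is an unused placeholder)."""
    weights = [Fraction(1)]
    for j in range(1, len(delta)):
        weights.append(weights[-1] * beta[j - 1] / delta[j])
    total = sum(weights)
    return [w / total for w in weights]


def step(pi, beta, delta):
    """One application of pi -> pi p for the reflecting chain."""
    n_states = len(pi)
    up = beta + [Fraction(0)]      # no upward move from N
    down = delta                   # delta[0] = 0: no downward move from 0
    out = []
    for j in range(n_states):
        stay = pi[j] * (1 - up[j] - down[j])
        from_below = pi[j - 1] * up[j - 1] if j > 0 else 0
        from_above = pi[j + 1] * down[j + 1] if j < n_states - 1 else 0
        out.append(stay + from_below + from_above)
    return out


# Illustrative rates on S = {0, 1, 2, 3}.
beta = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)]
delta = [Fraction(0), Fraction(1, 5), Fraction(1, 2), Fraction(2, 3)]
pi = invariant_distribution(beta, delta)
```

Invariance follows from the balance relation $\pi_j \beta_j = \pi_{j+1} \delta_{j+1}$, which the product formula satisfies by construction.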
$$\pi_0(1 - \beta_0) + \pi_1 \delta_1 = \pi_0,$$
$$\pi_{j-1}\beta_{j-1} + \pi_j(1 - \beta_j - \delta_j) + \pi_{j+1}\delta_{j+1} = \pi_j \quad (j \ge 1). \tag{3.6}$$
$$\sum_{j=1}^{\infty} \frac{\beta_0 \beta_1 \cdots \beta_{j-1}}{\delta_1 \delta_2 \cdots \delta_j} < \infty. \tag{3.8}$$
$$\pi_0 = \left(1 + \sum_{j=1}^{\infty} \frac{\beta_0 \beta_1 \cdots \beta_{j-1}}{\delta_1 \delta_2 \cdots \delta_j}\right)^{-1}. \tag{3.9}$$
240 BIRTHDEATH MARKOV CHAINS
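$$\pi_j = \frac{\beta_0 \beta_1 \cdots \beta_{j-1}}{\delta_1 \delta_2 \cdots \delta_j}\,\pi_0 \quad (j \ge 1), \qquad \pi_j = \frac{\delta_{j+1} \delta_{j+2} \cdots \delta_0}{\beta_j \beta_{j+1} \cdots \beta_{-1}}\,\pi_0 \quad (j \le -1). \tag{3.11}$$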
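$$\sum_{j \le -1} \frac{\delta_{j+1} \delta_{j+2} \cdots \delta_0}{\beta_j \beta_{j+1} \cdots \beta_{-1}} + \sum_{j \ge 1} \frac{\beta_0 \beta_1 \cdots \beta_{j-1}}{\delta_1 \delta_2 \cdots \delta_j} < \infty, \tag{3.12}$$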
in which case
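$$\pi_0 = \left(1 + \sum_{j \le -1} \frac{\delta_{j+1} \delta_{j+2} \cdots \delta_0}{\beta_j \beta_{j+1} \cdots \beta_{-1}} + \sum_{j \ge 1} \frac{\beta_0 \beta_1 \cdots \beta_{j-1}}{\delta_1 \delta_2 \cdots \delta_j}\right)^{-1}. \tag{3.13}$$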
Notice that the convergence of the series in (3.12) implies the divergence of
the series in (2.14), (2.15). In other words, the existence of an equilibrium
distribution for the chain implies its recurrence. The same remark applies to
the birth-death chain with one or two reflecting boundaries.
2r 2w
_ o 1
j_ 1 2r(w + r) i (w + r i)(2r i) (j X W + j
...
r
^j S 1 b j "o j(2r+j) j i(wr +i) 2w+2r
(w+r)
(3.14)
The assertions concerning positive recurrence contained in Theorem 3.1
below rely on the material in Section 2.9 and may be omitted on first reading.
Recall from Theorem 9.2(c) of Chapter I that in the case that all states
communicate with each other, existence of an invariant distribution is equivalent
to positive recurrence of all states.
CALCULATION OF TRANSITION PROBABILITIES BY SPECTRAL METHODS 241
Theorem 3.1
c` 66
12
[J1 1 Nx
diverges or converges. All states are positive recurrent if and only if the
series (3.8) converges. In the case that (3.8) converges, the unique
invariant distribution is given by (3.7), (3.9).
(c) For an unrestricted birth-death chain on $S = \{0, \pm 1, \pm 2, \ldots\}$ all states
are transient if and only if at least one of the series in (2.14) and (2.15)
is convergent. All states are positive recurrent if and only if (3.12) holds;
if (3.12) holds, then the unique invariant distribution is given by (3.11),
(3.13).
We will apply the spectral theorem to calculate $p^n$, for $n = 1, 2, \ldots$, in the case
that $p$ is the transition law for a birth-death chain.
First consider the case of a birth-death chain on $S = \{0, 1, \ldots, N\}$ with
reflecting boundaries at 0 and $N$. Then the invariant distribution $\pi$ is given by
(3.5) as
$$\pi_1 = \frac{\beta_0}{\delta_1}\,\pi_0, \qquad \pi_j = \frac{\beta_0 \beta_1 \cdots \beta_{j-1}}{\delta_1 \delta_2 \cdots \delta_j}\,\pi_0 \quad (2 \le j \le N). \tag{4.1}$$
In the applied sciences the symmetry property (4.3) is often referred to as detailed
balance or time reversibility. Introduce the following inner product $(\cdot, \cdot)_\pi$ in the
vector space $\mathbb{R}^{N+1}$:
$$(x, y)_\pi = \sum_{i=0}^{N} x_i y_i \pi_i, \qquad x = (x_0, x_1, \ldots, x_N)', \quad y = (y_0, y_1, \ldots, y_N)'.$$
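$$(px, y)_\pi = \sum_{i=0}^{N} \Big(\sum_{j=0}^{N} p_{ij} x_j\Big) y_i \pi_i = \sum_{j=0}^{N} \Big(\sum_{i=0}^{N} p_{ji} y_i\Big) x_j \pi_j = (x, py)_\pi. \tag{4.5}$$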
Therefore, by the spectral theorem, $p$ has $N + 1$ real eigenvalues $\alpha_0, \alpha_1, \ldots, \alpha_N$
(not necessarily distinct) and corresponding eigenvectors $\phi_0, \phi_1, \ldots, \phi_N$, which
are of unit length and mutually orthogonal with respect to $(\cdot, \cdot)_\pi$. Therefore,
the linear transformation $x \to px$ has the spectral representation
N
P= akEk ,
k=0
(4.7)
N
x= E ak(4>k, x)aek ,
k=O
Letting x = e j denote the vector with 1 in the jth coordinate and zeros elsewhere,
one gets
$$\pi_j = \frac{1}{N} \quad (1 \le j \le N - 1), \qquad \pi_0 = \pi_N = \frac{1}{2N}. \tag{4.10}$$
$$\theta^2 - 2\alpha\theta + 1 = 0, \tag{4.13}$$
The equation (4.11) is linear in x, i.e., if x and y are both solutions of (4.11)
then so is ax + by for arbitrary numbers a and b. Therefore, every linear
combination
satisfies (4.11). We now apply the boundary conditions (4.12) to fix $A(\alpha)$, $B(\alpha)$,
up to a constant multiplier. Since every scalar multiple of a solution of (4.11)
and (4.12) is also a solution, let us fix $x_0 = 1$. Note that $x_0 = 0$ implies $x_j = 0$
for all $j$. Letting $j = 0$ in (4.15), one has
$$A(\alpha)(\theta_1 - \theta_2) + \theta_2 = \alpha, \tag{4.17}$$
or,
Now write $\theta_1 = e^{i\phi}$, $\theta_2 = e^{-i\phi}$, where $\phi$ is the unique angle in $[0, \pi]$ such that
$\cos\phi = \alpha$. Note that cosine is strictly decreasing in $[0, \pi]$ and assumes its entire
range of values $[-1, 1]$ on $[0, \pi]$. Note also that this is consistent with the
requirement $\sin\phi = \sqrt{1 - \alpha^2} \ge 0$. Then (4.19) becomes
i.e.,
Now,
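$$\|x^{(k)}\|_\pi^2 = \sum_{j=0}^{N} \pi_j \cos^2\Big(\frac{k\pi j}{N}\Big) = \frac{1}{2N} + \frac{1}{N}\sum_{j=1}^{N-1} \cos^2\Big(\frac{k\pi j}{N}\Big) + \frac{1}{2N}$$
$$= \frac{1}{N}\sum_{j=0}^{N-1} \cos^2\Big(\frac{k\pi j}{N}\Big) = \frac{1}{N}\sum_{j=0}^{N-1} \frac{1 + \cos(2k\pi j/N)}{2} = \begin{cases} 1 & \text{if } k = 0 \text{ or } N, \\ \tfrac{1}{2} & \text{if } k = 1, 2, \ldots, N - 1. \end{cases} \tag{4.25}$$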
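Now use (4.9), (4.23), and (4.26) to get, for $0 \le i, j \le N$,
$$p_{ij}^{(n)} = \sum_{k=0}^{N} \alpha_k^n \phi_{ki}\phi_{kj}\pi_j = \pi_j + 2\pi_j \sum_{k=1}^{N-1} \cos^n\Big(\frac{k\pi}{N}\Big)\cos\Big(\frac{k\pi i}{N}\Big)\cos\Big(\frac{k\pi j}{N}\Big) + (-1)^{n+j-i}\pi_j. \tag{4.27}$$
In particular, for $0 \le i \le N$ and $1 \le j \le N - 1$,
$$p_{ij}^{(n)} = \frac{1}{N}\big[1 + (-1)^{n+j-i}\big] + \frac{2}{N}\sum_{k=1}^{N-1} \cos^n\Big(\frac{k\pi}{N}\Big)\cos\Big(\frac{k\pi i}{N}\Big)\cos\Big(\frac{k\pi j}{N}\Big).$$
For $0 \le i \le N$,
$$p_{i0}^{(n)} = \frac{1}{2N} + \frac{1}{N}\sum_{k=1}^{N-1} \cos^n\Big(\frac{k\pi}{N}\Big)\cos\Big(\frac{k\pi i}{N}\Big) + \frac{(-1)^{n-i}}{2N}.$$
For $0 \le i \le N$,
$$p_{iN}^{(n)} = \frac{1}{2N} + \frac{1}{N}\sum_{k=1}^{N-1} \cos^n\Big(\frac{k\pi}{N}\Big)\cos\Big(\frac{k\pi i}{N}\Big)\cos(k\pi) + \frac{(-1)^{n+N-i}}{2N}. \tag{4.28}$$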
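The spectral formula (4.27) can be compared against direct matrix powers. This is a minimal sketch, assuming the simple symmetric random walk on $\{0, \ldots, N\}$ with $p_{01} = p_{N,N-1} = 1$ and $p_{i,i\pm 1} = 1/2$ in the interior (the walk whose invariant distribution is (4.10)); the function names are illustrative.

```python
import math


def spectral_p(n, i, j, N):
    """n-step transition probability from the cosine expansion (4.27)."""
    pi_j = 1.0 / (2 * N) if j in (0, N) else 1.0 / N
    middle = sum(
        math.cos(k * math.pi / N) ** n
        * math.cos(k * math.pi * i / N)
        * math.cos(k * math.pi * j / N)
        for k in range(1, N)
    )
    sign = -1.0 if (n + j - i) % 2 else 1.0      # (-1)^(n + j - i)
    return pi_j + 2 * pi_j * middle + sign * pi_j


def matrix_power_p(n, N):
    """n-step transition matrix by repeated multiplication."""
    p = [[0.0] * (N + 1) for _ in range(N + 1)]
    p[0][1] = 1.0
    p[N][N - 1] = 1.0
    for i in range(1, N):
        p[i][i - 1] = p[i][i + 1] = 0.5
    result = [[float(r == c) for c in range(N + 1)] for r in range(N + 1)]
    for _ in range(n):
        result = [[sum(result[r][k] * p[k][c] for k in range(N + 1))
                   for c in range(N + 1)] for r in range(N + 1)]
    return result


N, n = 5, 7
direct = matrix_power_p(n, N)
max_gap = max(abs(direct[i][j] - spectral_p(n, i, j, N))
              for i in range(N + 1) for j in range(N + 1))
```

The first and last terms of (4.27) come from the eigenvalues $\alpha_0 = 1$ and $\alpha_N = -1$; the chain has period 2, which is why the result vanishes when $n$ and $j - i$ have opposite parity.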
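Note that when $n$ and $j - i$ have the same parity, say $n = 2m$ and $j - i$ is
even, then
$$p_{ij}^{(2m)} - \frac{2}{N} = \frac{4}{N}\cos^{2m}\Big(\frac{\pi}{N}\Big)\cos\Big(\frac{\pi i}{N}\Big)\cos\Big(\frac{\pi j}{N}\Big)\,[1 + o(1)] \tag{4.29}$$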
for all $i \ge 1$. Note that $p_{ij}^{(n)}$ is the same as in the case of a random walk with
two reflecting boundaries 0 and $N$, provided $N > n + i$, since the random walk
cannot reach $N$ in $n$ steps (or fewer) starting from $i$ if $N > n + i$. Hence for all
$i$, $j$, $n$, $p_{ij}^{(n)}$ is obtained by taking the limit in (4.28) as $N \to \infty$, i.e.,
p=2
f
2 "
o
, cos"(nO) cos(iirO) cos( jn6) dO
are $i$ balls in box I, then there are $2d - i$ balls in box II. Thus there is no overall
heat loss or gain. Let $X_n$ denote the number of balls in box I after the $n$th trial.
Then $\{X_n : n = 0, 1, \ldots\}$ is a Markov chain with state space $S = \{0, 1, 2, \ldots, 2d\}$
and transition probabilities
$p_{ij} = 0$, otherwise.
This is a birth-death chain with two reflecting boundaries at 0 and $2d$. The
transition probabilities are such that the mean change in temperature, in box
I, say, at each step is proportional to the negative of the existing temperature
gradient, or temperature difference, between the two bodies. We will first see
that the model yields Newton's law of cooling at the level of the evolution of
the averages. Assume that initially there are $i$ balls in box I. Let $Y_n = X_n - d$,
the excess of the number of balls in box I over $d$. Writing $e_n = E_i(Y_n)$, the
expected value of $Y_n$ given $X_0 = i$, one has
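$$e_n = E_i(X_n - d) = E_i[X_{n-1} - d + (X_n - X_{n-1})]$$
$$= E_i(X_{n-1} - d) + E_i(X_n - X_{n-1}) = e_{n-1} + E_i\Big[\frac{2d - X_{n-1}}{2d} - \frac{X_{n-1}}{2d}\Big]$$
$$= e_{n-1} + E_i\Big[\frac{d - X_{n-1}}{d}\Big] = e_{n-1} - \frac{e_{n-1}}{d} = \Big(1 - \frac{1}{d}\Big)e_{n-1}.$$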
Suppose in the physical model the frequency of transitions is $\tau$ per second. Then
in time $t$ there are $n = t\tau$ transitions. Write $\nu = -\log[(1 - (1/d))]\tau$. Then
$$e_n = (i - d)e^{-\nu t}, \tag{5.3}$$
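The geometric decay of the mean excess can be checked by pushing the exact distribution through the chain. The sketch below assumes the standard Ehrenfest transition probabilities $p_{x,x+1} = (2d - x)/2d$ and $p_{x,x-1} = x/2d$ (the display giving them is missing here, so this is an assumption consistent with the recursion for $e_n$); the function names are illustrative.

```python
from fractions import Fraction


def ehrenfest_step(dist, d):
    """Push a distribution on {0, ..., 2d} through one Ehrenfest transition."""
    new = [Fraction(0)] * (2 * d + 1)
    for x, mass in enumerate(dist):
        if x < 2 * d:
            new[x + 1] += mass * Fraction(2 * d - x, 2 * d)  # ball moves II -> I
        if x > 0:
            new[x - 1] += mass * Fraction(x, 2 * d)          # ball moves I -> II
    return new


def mean_excess(i, d, n):
    """E_i(X_n - d), computed exactly from the n-step distribution."""
    dist = [Fraction(0)] * (2 * d + 1)
    dist[i] = Fraction(1)
    for _ in range(n):
        dist = ehrenfest_step(dist, d)
    return sum((x - d) * mass for x, mass in enumerate(dist))


d, i, n = 5, 9, 12
exact = mean_excess(i, d, n)
closed_form = (i - d) * (1 - Fraction(1, d)) ** n
```

Rewriting $(1 - 1/d)^n$ as $e^{-\nu t}$ with $n = t\tau$ gives the exponential form of Newton's law of cooling stated in (5.3).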
$$\pi_j = \binom{2d}{j} 2^{-2d}, \quad j = 0, 1, \ldots, 2d. \tag{5.4}$$
$$m_i = 1 + \beta_i m_{i+1} + \delta_i m_{i-1} \quad (1 \le i \le N - 1), \qquad m_0 = 0, \quad m_N = 1 + m_{N-1}. \tag{5.5}$$
$$u_0 = 0, \quad u_1 = 1, \tag{5.6}$$
In other words, in this new scale the probability of reaching the relabeled
boundary $u_0 = 0$, before $u_N$, starting from $u_x$ (inside), is proportional to the
distance from the boundary $u_N$. This scale is called the natural scale. The
difference equations (5.5) when written in this scale assume a simple form, as
will presently be shown. First let us determine $u_x$ from (5.6) and (5.7) and the
difference equation
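$$u_{x+1} - u_x = \frac{\delta_1 \delta_2 \cdots \delta_x}{\beta_1 \beta_2 \cdots \beta_x}(u_1 - u_0) = \frac{\delta_1 \delta_2 \cdots \delta_x}{\beta_1 \beta_2 \cdots \beta_x} \quad (1 \le x \le N - 1), \tag{5.10}$$
or
$$u_{x+1} = 1 + \sum_{i=1}^{x} \frac{\delta_1 \delta_2 \cdots \delta_i}{\beta_1 \beta_2 \cdots \beta_i} \quad (1 \le x \le N - 1). \tag{5.11}$$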
Now write
$$m(u_x) \equiv m_x. \tag{5.12}$$
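$$\frac{1}{u_N - u_{N-1}} - \frac{m(u_x) - m(u_{x-1})}{u_x - u_{x-1}} = -\sum_{i=x}^{N-1} \frac{\beta_1 \beta_2 \cdots \beta_{i-1}}{\delta_1 \delta_2 \cdots \delta_i} \quad (1 \le x \le N - 1). \tag{5.14}$$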
Relations (5.10) and (5.14) lead to
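$$m(u_x) - m(u_{x-1}) = \frac{\beta_x \beta_{x+1} \cdots \beta_{N-1}}{\delta_x \delta_{x+1} \cdots \delta_{N-1}} + \sum_{i=x}^{N-1} \frac{\beta_x \cdots \beta_{i-1}\beta_i}{\delta_x \cdots \delta_{i-1}\delta_i\,\beta_i} \quad (1 \le x \le N - 1). \tag{5.15}$$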
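The factor $\beta_i/\beta_i$ is introduced in the last summands to take care of the summand
corresponding to $i = x$ (this summand is actually $1/\delta_x$). Sum (5.15) over
$x = 1, 2, \ldots, y$ to finally get, using $m(u_0) = 0$,
$$m(u_y) = \sum_{x=1}^{y} \frac{\beta_x \beta_{x+1} \cdots \beta_{N-1}}{\delta_x \delta_{x+1} \cdots \delta_{N-1}} + \sum_{x=1}^{y} \sum_{i=x}^{N-1} \frac{\beta_x \cdots \beta_{i-1}\beta_i}{\delta_x \cdots \delta_{i-1}\delta_i\,\beta_i} \quad (1 \le y \le N - 1). \tag{5.16}$$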
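A double-sum formula of this kind is easy to get wrong by one index, so a small exact check against the difference equations (5.5) is useful. The sketch below assumes $\beta_x + \delta_x = 1$ in the interior and a reflecting state $N$ that steps to $N - 1$ with probability 1, and uses Ehrenfest-type rates as an illustration; the function names are hypothetical.

```python
from fractions import Fraction


def mean_times_to_zero(beta, delta, N):
    """m[y] = expected number of steps to reach 0 from y, via the double sum.

    beta[x], delta[x] are given for x = 1..N-1 with beta[x] + delta[x] = 1.
    """
    def ratio(x, i):
        # (beta_x ... beta_i) / (delta_x ... delta_i)
        out = Fraction(1)
        for k in range(x, i + 1):
            out *= beta[k] / delta[k]
        return out

    m = [Fraction(0)] * (N + 1)
    for y in range(1, N):
        increment = ratio(y, N - 1) + sum(ratio(y, i) / beta[i] for i in range(y, N))
        m[y] = m[y - 1] + increment          # m(u_y) - m(u_{y-1}), as in (5.15)
    m[N] = 1 + m[N - 1]                      # boundary condition at N
    return m


# Illustrative Ehrenfest-type rates with N = 2d.
d = 4
N = 2 * d
beta = {x: Fraction(N - x, N) for x in range(1, N)}
delta = {x: Fraction(x, N) for x in range(1, N)}
m = mean_times_to_zero(beta, delta, N)
residuals = [m[x] - (1 + beta[x] * m[x + 1] + delta[x] * m[x - 1]) for x in range(1, N)]
```

For the symmetric walk with $N = 3$ the same routine returns $m_1 = 5$, $m_2 = 8$, $m_3 = 9$, which can be confirmed by solving (5.5) by hand.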
In particular, for the Ehrenfest model one gets
= 2d 22d(1 (5.17)
+Q)).
Next let us calculate
m =_ Ei T (0<i<d),
i 4 (5.18)
where
Writing m(u i ) = Ph i , one obtains the same equations as (5.13) for 1 < i < d 1,
and boundary conditions
...
m(ux+l)  m(ux) tn(ul)  tn(uo) x o1 _I
ux+I U x U1 UQ i =1 6 1 62...^i
o1
...'
_ 1 Y
_ 6 6 ...6.
1 1 2
(5.21)
where 0 = 1. Therefore,
x
m(u x+, )  m(u) _ ' z
x  Y r+ I x x+ I
(5 22)
x ,z
x i 1 , x6x+I '
m l+
d1
X= I
d1
x!
(2d1)(2dx)
x!
+ x
+ dI
=
dI x ((x + 1)x (i + 2)(i + 1)
i; (2d_i)...(2d_x)(x+1) )
2d x x x i
<1 +
x=1 (2d 1)(2dx) x = 1 2dx 1= 2dx
d1 x! dI
2d
,1+Y +I
x= I (2d  1)...(2d  x) x= I 2(d  x)
1+
dIl xl + d(log d + 1). (5.24)
x= I (2d  1) (2d  x)
For d = 10 000 balls and rate of transition one ball per second, it follows that
It takes only about a day on the average for the system to reach equilibrium from
a state farthest from equilibrium, but takes an average time inconceivably large,
even compared to cosmological scales, for the system to go back to that state
from equilibrium.
For d = 10 000 one gets, using Stirling's approximation for the second estimate,
EXERCISES
2. Suppose that balls labeled 1, ... , N are initially distributed between two boxes labeled
I and II. The state of the system represents the number of balls in box I. Determine
the one-step transition probabilities for each of the following rules of motion in the
state space.
(i) At each time step a ball is randomly (uniformly) selected from the numbers
1, 2, ... , N. Independently of the ball selected, box I or II is selected with
respective probabilities $p_1$ and $p_2 = 1 - p_1$. The ball selected is placed in the
box selected.
(ii) At each time step a ball is randomly (uniformly) selected from the numbers in
box I with probability $p_1$ or from those in II with probability $p_2 = 1 - p_1$. A
box is then selected with respective probabilities in proportion to current box
sizes. The ball selected is placed in the box selected.
(iii) At each time step a ball is randomly (uniformly) selected from the numbers in
box I with probability proportional to the current size of I or from those in II
with the complementary probability. A box is also selected with probabilities in
proportion to current box size. The ball selected is placed in the box selected.
3. Prove (2.4), (2.16), and (2.23) by conditioning on $X_1$ and using the Markov property.
4. Suppose that $\varphi(i)$ ($c \le i \le d$) satisfy the equations (2.4) and the boundary conditions
$\varphi(c) = 0$, $\varphi(d) = 1$. Prove that such a $\varphi$ is unique.
5. Consider a birth-death chain on $S = \{0, 1, \ldots, N\}$ with both boundaries reflecting.
(i) Prove that $P_i(T_j > mN) \le (1 - \delta_N \delta_{N-1} \cdots \delta_1)^m$ if $i > j$, and $\le (1 - \beta_0 \beta_1 \cdots \beta_{N-1})^m$
if $i < j$. Here $T_j = \inf\{n \ge 1 : X_n = j\}$.
(ii) Use (i) to prove that $\rho_{ij} = P_i(T_j < \infty) = 1$ for all $i, j$.
6. Consider a birthdeath chain on S = {0, 1, ...} with 0 reflecting. Argue as in Exercise
5 to show that p 1 = I for all y.
7. Consider a birthdeath chain on S = {0, I, ... , N} with 0, N absorbing. Calculate
Derive the necessary and sufficient condition for recurrence.
9. If 0 is absorbing, and $N$ reflecting, for a birth-death chain on $S = \{0, 1, \ldots, N\}$,
then show that 0 is recurrent and all other states are transient.
10. Let p be the transition probability matrix of a birthdeath chain on S = {0, 1, 2, ...}
with
.= j= 0.1,2,....
2 (j +21 ) Si 2 (j+ 1),
N i
N p l , ifj =i +1,
Ni (P 1N i)
+ N p 2 , ifj=i,i=0,1 ' ..., N,
Pij=
3. Calculate the transition probabilities $p^n$ for $n \ge 1$ by the spectral method in the case
of Exercise 1.2(i) and $p_1 = p_2 = \tfrac{1}{2}$ according to the following steps.
(i) Consider the eigenvalue problem for the transpose $p'$. Write out difference
equations for $p'x = \alpha x$.
(ii) Replace the system of equations in (i) by the infinite system
1 1 N i 1 i +2
 x0+ X I = ax0, 2N Xi +  xi+) + x^+2 = ax;+
2 2N 2 2N
N(2a 1 z)
q (Z) =
' q(Z), q(0) = xo.
1 ZZ
[Hint: Multiply both sides of the second equation in (ii) by z' and sum over
i >0.]
(iv) Show that (iii) has the unique solution
N I' a)(I + )Na
(P(Z) = X0( 1 z) Z
(v) Show that for aj = j/N, j = 0, 1, ... , N, cp(z) is a polynomial of degree N and
therefore, by (ii) and (iii), a j = j/N, j = 0, 1, ... , N, are the eigenvalues of p' and,
therefore, of p.
(vi) Show that the eigenvector x (' ) = (x ) , ... , xN ) )' corresponding to aj = j/N is
given with xo' ) = 1, by xk ) = coefficient of z' in (1 z)"  '(1 + z) .
(vii) Write B for the matrix with columns x^ 0) , .. , x (N) . Then,
where
no n
(B') ' B diag ...
(IX' 0) lirz ' IIX IN) IIa2 )
4. (Relaxation and Correlation Length) Let p be the transition matrix for a finite state
stationary birthdeath chain {X n } on S = {0, 1, ... , N} with reflecting boundaries at
0 and N. Show that
Use the inequality ab _< (a' + b 2 )/2 to show (Corr, ( f (X.), g(X ))I < e z ".]
o
5. (i) (Simple Random Walk with Periodic Boundary) States 0, 1, 2, ... , N 1 are
arranged clockwise in a circle. A transition occurs either one unit clockwise or
one unit counterclockwise with respective probabilities p and q = 1 p. Show
that
1 N1
N r_o1
where 0 = e(znptN is an Nth root of unity (all Nth roots of unity being
1,0,0 ,...,0 ).
2 N1
(*ii) (General Random Walk with Periodic Boundary) Suppose that for the
arrangement in (i), a transition k units clockwise (equivalently, N k units
counterclockwise) occurs with probability p k , k = 0,1, ... , N 1. Show that
NI NI
(n) = 1 0rU'k) I Br5 "
Pjk P,
N r=o s=o
THEORETICAL COMPLEMENTS
aT, x
f(T,x), t > 0,
at (T.5.1)
To x=x,
such that f = (f,, ... , f") . I^" R" uniquely determines the solution at all timest >0
for each initial state x by (T.5.1).
cos(yt) my sin(yt) k
A(t)= 1 t_>0, where y= > 0.
sin(yt) cos(yt)
my
Notice that areas (2dimensional phasespace volume) are preserved under T, since
det A(t) = 1. The motion is obviously periodic in this case.
OH
dq;_aH dp;_
i =1,...,k, (T.5.2)
dt ap ' ; dt aq; '
where H  ll(q,, . .. , qk, p,, ... , Pk) is the Hamiltonian function representing the
total energy (kinetic energy plus potential energy) of the system. Example I is of this
form with k = 1, H(q, p) = p 2 /2m + kg 2 . Writing n = 2k, x, = q,, ... , X k = qk>
Xk+ 1 = Pi, , X2k = Pk, this is also of the form (T.5.1) with
p / OH OH OH aH
f(x) _ (fl (x), ..... 2k(x)) _ , ... ,  (T.5.3)
GXk+ 1 aX2, ax, OXk
Observe that for H sufficiently smooth, the flow in phase space is generally
incompressible. That is,
Liouville first noticed the important fact that incompressibility gives the volume
preserving property of the flow in phase space.
Liouville Theorem T.5.1. Suppose that $f(x)$ in (T.5.1) is such that $\operatorname{div} f(x) = 0$ for
all $x$. Then for each bounded (measurable) set $D \subset \mathbb{R}^n$, $|T_t D| = |D|$ for all $t \ge 0$, where
$|\cdot|$ denotes $n$-dimensional volume (Lebesgue measure).
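Proof. By the uniqueness condition stated at the outset we have $T_{t+h} = T_t T_h$ for all
$t, h \ge 0$. So, by the change of variable formula,
$$|T_{t+h} D| = \int_{T_t D} \det\Big(\frac{\partial T_h x}{\partial x}\Big)\,dx.$$
Now
$$\frac{\partial T_h x}{\partial x} = I + \frac{\partial f}{\partial x}\,h + O(h^2) \quad \text{as } h \to 0.$$
But, expanding the determinant and collecting terms, one sees for any matrix $M$ that
$\det(I + hM) = 1 + h\,\operatorname{trace}(M) + O(h^2)$. Since $\operatorname{trace}(\partial f/\partial x) = \operatorname{div} f = 0$,
$$\det\Big(\frac{\partial T_h x}{\partial x}\Big) = 1 + O(h^2) \quad \text{as } h \to 0.$$
It follows that for each $t \ge 0$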
Proof: Consider A, T", T  2 "A, .... Then there are distinct times i, j such that
IT . "A n T'01 ^ 0; for otherwise
It follows that
IO n T  "li  'IAI ^ 0.
Continuous-Parameter Markov
Chains
In other words, for any sequence of time points $0 \le t_0 < t_1 < \cdots$, the
discrete-parameter process $Y_0 := X_{t_0}$, $Y_1 := X_{t_1}, \ldots$ is a Markov chain as described in
Chapter II. The conditional probabilities $p_{ij}(s, t) = P(X_t = j \mid X_s = i)$, $0 \le s < t$,
are collectively referred to as the transition probability law for the process. In
the case $p_{ij}(s, t)$ is a function of $t - s$, the transition law is called
time-homogeneous, and we write $p_{ij}(s, t) = p_{ij}(t - s)$.
Simple examples of continuous-parameter Markov chains are the
continuous-time random walks, or processes with independent increments on
countable state space. Some others are described in the examples below.
Example 1. (The Poisson Process). The Poisson process with intensity function
$\rho$ is a process with state space $S = \{0, 1, 2, \ldots\}$ having independent increments
distributed as
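$$p_{ij}(s, t) = P(X_t = j \mid X_s = i) = \frac{P(X_t = j, X_s = i)}{P(X_s = i)} = \begin{cases} \dfrac{\big(\int_s^t \rho(u)\,du\big)^{j-i}}{(j - i)!} \exp\Big(-\displaystyle\int_s^t \rho(u)\,du\Big) & \text{for } j \ge i, \\[2ex] 0 & \text{if } j < i. \end{cases} \tag{1.3}$$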
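$$p_{ij}(s, t) = \begin{cases} \dfrac{[\lambda(t - s)]^{j-i}}{(j - i)!}\,e^{-\lambda(t - s)}, & j \ge i, \\[2ex] 0, & j < i, \end{cases} \tag{1.4}$$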
Therefore,
$$p_{ij}(s, t) = E\{P(X_t - X_s = j - i \mid N_t - N_s)\} = \sum_{k=0}^{\infty} f^{*k}(j - i)\,\frac{[\lambda(t - s)]^k}{k!}\,e^{-\lambda(t - s)}. \tag{1.7}$$
Here $p(t_0)$ takes the place of $p$, and $t_0$ is treated as the unit of time. Events that
depend on the process at time points that are not multiples of $t_0$ are excluded.
Likewise, specifying transition matrices $p(t_0), p(t_1), \ldots, p(t_n)$ for an arbitrary
finite set of time points $t_0, t_1, \ldots, t_n$ will not be enough.
On the other hand, if one specifies all transition matrices $p(t)$ of a
time-homogeneous Markov chain for values of $t$ in a time interval $0 < t \le t_0$
for some $t_0 > 0$, then, regardless of how small $t_0 > 0$ may be, all other transition
probabilities may be constructed from these. To understand this basic fact, first
assume transition matrices $p(t)$ to be given for all $t > 0$, together with an initial
distribution $\pi$. Then for any finite set of time points $0 < t_1 < t_2 < \cdots < t_n$, the
joint distribution of $X_0, X_{t_1}, \ldots, X_{t_n}$ is given by
and
$$P_i(X_{t+s} = k) = \sum_{j \in S} P_i(X_t = j, X_{t+s} = k). \tag{2.4}$$
$$p_{ik}(t + s) = \sum_{j \in S} p_{ij}(t)\,p_{jk}(s) \quad (i, k \in S;\ s > 0,\ t > 0), \tag{2.5}$$
Therefore, the transition matrices $p(t)$ cannot be chosen arbitrarily. They must
be so chosen as to satisfy the Chapman-Kolmogorov equations.
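As an illustration of the constraint (2.5), the sketch below checks it numerically for the time-homogeneous Poisson transition probabilities $p_{ij}(t) = e^{-\lambda t}(\lambda t)^{j-i}/(j-i)!$ for $j \ge i$. The function names and parameter values are illustrative assumptions.

```python
import math


def poisson_p(i, j, t, lam):
    """Transition probability of a homogeneous Poisson process with rate lam."""
    if j < i:
        return 0.0
    return math.exp(-lam * t) * (lam * t) ** (j - i) / math.factorial(j - i)


def chapman_kolmogorov_rhs(i, k, t, s, lam):
    """Sum over intermediate states j; only i <= j <= k contribute."""
    return sum(poisson_p(i, j, t, lam) * poisson_p(j, k, s, lam)
               for j in range(i, k + 1))


lhs = poisson_p(2, 7, 0.8 + 1.3, 2.5)
rhs = chapman_kolmogorov_rhs(2, 7, 0.8, 1.3, 2.5)
```

Here the identity reduces to the binomial theorem applied to $(\lambda t + \lambda s)^{k-i}$, so the infinite sum over $j \in S$ has only finitely many nonzero terms.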
It turns out that (2.5) is the only restriction required for consistency in the
sense of prescribing finite-dimensional distributions as in Section 6, Chapter I.
To see this, take an arbitrary initial distribution $\pi$ and time points
$0 < t_1 < t_2 < t_3$. For arbitrary states $i_0, i_1, i_2, i_3$, one has from (2.2) that
as well as
But consistency requires that (2.8) be obtained from (2.7) by summing over $i_2$.
This sum is
showing that the right sides of (2.8) and (2.9) are indeed equal. Thus, if (2.5)
holds, then (2.2) defines joint distributions consistently, i.e., the joint distribution
at any finite set of points as specified by (2.2) equals the probability obtained
by summing successive probabilities of a joint distribution (like (2.2)) involving
a larger set of time points, over states belonging to the additional time points.
Suppose now that $p(t)$ is given for $0 < t \le t_0$, for some $t_0 > 0$, and the
transition probability matrices satisfy (2.6). Since any $t > t_0$ may be expressed
uniquely as $t = rt_0 + s$, where $r$ is a positive integer and $0 \le s < t_0$, by (2.6)
we have
Thus, it is enough to specify $p(t)$ on any interval $0 < t \le t_0$, however small
$t_0 > 0$ may be. In fact, we will see that under certain further conditions $p(t)$ is
determined by its values for infinitesimal times; i.e., in the limit as $t_0 \downarrow 0$.
From now on we shall assume that
where I is the identity matrix, with 1's along the diagonal and 0's elsewhere.
$$p(0) = I. \tag{2.12}$$
Then (2.11) expresses the fact that $p(t)$, $0 \le t < \infty$, is (componentwise)
continuous at $t = 0$ as a function of $t$. It may actually be shown that owing to
the rich additional structure reflected in (2.6), continuity implies that $p(t)$ is in
fact differentiable in $t$, i.e., $p'_{ij}(t) = d(p_{ij}(t))/dt$ exist for all pairs $(i, j)$ of states,
and all $t \ge 0$. At $t = 0$, of course, "derivative" refers to the right-hand derivative.
In particular, the parameters $q_{ij}$ given by
are well defined. Instead of proving differentiability from continuity for transition
probabilities, which is nontrivial, we shall assume from now on that $p_{ij}(t)$ has a
finite derivative for all $(i, j)$ as part of the required structure. Also, we shall write
$$Q = ((q_{ij})), \tag{2.14}$$
Suppose for the time being that S is finite. Since the derivative of a finite
sum equals the sum of the derivatives, it follows by differentiating both sides
of (2.5) with respect to t and setting t = 0 that
$$\sum_{j \in S} q_{ij} = 0. \tag{2.21}$$
Note that
in view of the fact $p_{ij}(t) \ge 0 = p_{ij}(0)$ for $i \ne j$, and $p_{ii}(t) \le 1 = p_{ii}(0)$.
In the general case of a countable state space S, the term-by-term
differentiation used to derive Kolmogorov's equations may not always be
justified. Conditions are given in the next two sections for the validity of these
equations for transition probabilities on denumerable state spaces. However,
regardless of whether or not the differential equations are valid for given
transition probabilities p(t), we shall refer to the equations in general as
Kolmogorov's backward and forward equations, respectively.
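$$p_{ij}(t) = \sum_{k=0}^{\infty} f^{*k}(j - i)\,\frac{\lambda^k t^k}{k!}\,e^{-\lambda t} = \delta_{ij}e^{-\lambda t} + f(j - i)\,\lambda t\,e^{-\lambda t} + o(t), \quad \text{as } t \downarrow 0. \tag{2.23}$$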
Therefore,
$$p'_{ij}(t) = \sum_{k} q_{ik}\,p_{kj}(t), \quad i, j \in S,\ t \ge 0, \tag{3.1}$$
In the case that S is finite it is known from the theory of ordinary differential
equations that, subject to the initial condition p(0) = I, the unique solution to
(3.1) is given by
$$p(t) = e^{tQ}, \quad t \ge 0, \tag{3.4}$$
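Example 1. Consider the case $S = \{0, 1\}$ for a general two-state Markov chain
with rates
$$-q_{00} = q_{01} = \beta, \qquad -q_{11} = q_{10} = \delta.$$
Since $Q^2 = -(\beta + \delta)Q$, one has $Q^n = (-(\beta + \delta))^{n-1}Q$ for $n \ge 1$.
Therefore,
$$p(t) = e^{tQ} = I - \frac{1}{\beta + \delta}\big(e^{-t(\beta + \delta)} - 1\big)Q = \frac{1}{\beta + \delta}\begin{pmatrix} \delta + \beta e^{-(\beta + \delta)t} & \beta - \beta e^{-(\beta + \delta)t} \\ \delta - \delta e^{-(\beta + \delta)t} & \beta + \delta e^{-(\beta + \delta)t} \end{pmatrix}.$$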
It is also simple, however, to solve the (forward) equations directly in this case
(Exercise 3).
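The closed form for the two-state chain can be compared with a truncated power series for $e^{tQ}$. The sketch below assumes the generator $Q$ with rows $(-\beta, \beta)$ and $(\delta, -\delta)$; the number of series terms and the parameter values are arbitrary illustrative choices.

```python
import math


def closed_form(t, beta, delta):
    """Two-state transition matrix from the explicit formula."""
    total = beta + delta
    decay = math.exp(-total * t)
    return [[(delta + beta * decay) / total, (beta - beta * decay) / total],
            [(delta - delta * decay) / total, (beta + delta * decay) / total]]


def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]


def exp_series(t, beta, delta, terms=60):
    """Partial sum of I + tQ + (tQ)^2/2! + ... for the 2x2 generator."""
    q = [[-beta, beta], [delta, -delta]]
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        # term becomes (tQ)^n / n!
        term = matmul(term, [[t * x / n for x in row] for row in q])
        result = [[result[r][c] + term[r][c] for c in range(2)] for r in range(2)]
    return result


t, s, beta, delta = 0.7, 0.4, 1.5, 0.5
series = exp_series(t, beta, delta)
exact = closed_form(t, beta, delta)
product = matmul(closed_form(t, beta, delta), closed_form(s, beta, delta))
combined = closed_form(t + s, beta, delta)
```

The last two matrices illustrate the semigroup property $p(t)p(s) = p(t + s)$ for this example.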
In the case that S is countably infinite, results analogous to those for the
finite case can be obtained under the following fairly restrictive condition,
$$p_{ij}(t) = \delta_{ij} + tq_{ij} + \frac{t^2}{2!}q_{ij}^{(2)} + \cdots + \frac{t^n}{n!}q_{ij}^{(n)} + \cdots \tag{3.11}$$
so that the series on the right in (3.11) converges to a function $r_{ij}(t)$, say,
absolutely for all $t$. By term-by-term differentiation of this series for $r_{ij}(t)$, which
is an analytic function of $t$, one verifies that $r_{ij}(t)$ satisfies the Kolmogorov
backward equation and the correct initial condition. Uniqueness under (3.9)
follows by the same estimates typically used in the finite case (Exercise 2).
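To verify the Chapman-Kolmogorov equations (2.6), note that
$$p(t + s) = e^{(t+s)Q} = I + (t + s)Q + \frac{(t + s)^2}{2!}Q^2 + \cdots$$
$$= \Big[I + tQ + \frac{t^2}{2!}Q^2 + \cdots + \frac{t^m}{m!}Q^m + \cdots\Big] \times \Big[I + sQ + \frac{s^2}{2!}Q^2 + \cdots + \frac{s^n}{n!}Q^n + \cdots\Big]$$
$$= e^{tQ}e^{sQ} = p(t)p(s). \tag{3.13}$$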
with initial conditions $H_i(0) = \sum_{j \in S} \delta_{ij} = 1$ ($i \in S$). Since $H_i(t) := 1$ (for all $t \ge 0$,
all $i \in S$) clearly satisfy these equations, one has $H_i(t) = 1$ for all $t$ by uniqueness
of such solutions. Thus, the solutions (3.11) have been shown to satisfy all
conditions for being transition probabilities except for nonnegativity (Exercise
5). Nonnegativity will also follow as a consequence of a more general method
of construction of solutions given in the next section. When it applies, the
exponential form (3.4) (equivalently, (3.11)) is especially suitable for calculations
of transition probabilities by spectral methods as will be seen in Section 9.
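Example 2. (Poisson Process). The Poisson process with parameter $\lambda > 0$ was
introduced in Example 1.1. Alternatively, the process may be regarded as a
Markov process on the state space $S = \{0, 1, 2, \ldots\}$ with prescribed infinitesimal
transition rates of the form
$$q_{i,i+1} = \lambda, \qquad q_{ii} = -\lambda, \qquad q_{ij} = 0 \ \text{otherwise}.$$
The entries $q_{ij}^{(n)}$ of $Q^n$ are then
$$q_{ij}^{(n)} = \begin{cases} (-1)^{n+j-i}\dbinom{n}{j - i}\lambda^n, & \text{if } 0 \le j - i \le n, \\[1.5ex] 0, & \text{otherwise.} \end{cases} \tag{3.16}$$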
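Formula (3.16) for the entries of $Q^n$ can be confirmed with integer matrix powers. The sketch below assumes the Poisson generator $q_{ii} = -\lambda$, $q_{i,i+1} = \lambda$; because $Q$ is upper triangular, powers of a finite top-left block agree with the corresponding entries of the infinite matrix. The function names are illustrative.

```python
from math import comb


def q_power_entry(n, i, j, lam):
    """Entry (i, j) of Q^n from the binomial formula (3.16)."""
    diff = j - i
    if 0 <= diff <= n:
        return (-1) ** (n + diff) * comb(n, diff) * lam ** n
    return 0


def truncated_q_power(n, size, lam):
    """n-th power of the size x size top-left block of the Poisson generator."""
    q = [[0] * size for _ in range(size)]
    for i in range(size):
        q[i][i] = -lam
        if i + 1 < size:
            q[i][i + 1] = lam
    result = [[int(r == c) for c in range(size)] for r in range(size)]
    for _ in range(n):
        result = [[sum(result[r][k] * q[k][c] for k in range(size))
                   for c in range(size)] for r in range(size)]
    return result


n, size, lam = 4, 7, 3
power = truncated_q_power(n, size, lam)
agree = all(power[i][j] == q_power_entry(n, i, j, lam)
            for i in range(size) for j in range(size))
```

Substituting (3.16) into the exponential series (3.11) and summing over $n$ recovers the Poisson probabilities $e^{-\lambda t}(\lambda t)^{j-i}/(j-i)!$.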
/ t2
!)ijlt) = (S ij + tqi; + Cji^ + .. .
t ;i+k oo tji+k i + k
Likewise,
For the general problem of constructing transition probabilities $((p_{ij}(t)))$ having
a prescribed set of infinitesimal transition rates given by $Q = ((q_{ij}))$, where
the method of successive approximations will be used in this section. The main
result provides a solution to the backward equations
Theorem 4.1. Given any $Q$ satisfying (4.1) there exists a smallest nonnegative
solution $\bar{p}(t)$ of the backward equations (4.2) satisfying (4.3). This solution
satisfies
In case equality holds in (4.4) for all $i \in S$ and $t \ge 0$, there does not exist any
other nonnegative solution of (4.2) that satisfies (4.3) and (4.4).
or
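$$\frac{d}{ds}\big(e^{\lambda_i s}p_{ik}(s)\big) = \lambda_i e^{\lambda_i s}p_{ik}(s) + e^{\lambda_i s}\sum_{j \in S} q_{ij}\,p_{jk}(s) = \sum_{j \ne i} e^{\lambda_i s}q_{ij}\,p_{jk}(s)$$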
or
Reversing the steps shows that (4.2) together with (4.3) follow from (4.5). Thus
(4.2) and (4.3) are equivalent to the system of integral equations (4.5). To solve
the system (4.5) start with the first approximation
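$$p_{ik}^{(0)}(t) = \delta_{ik}e^{-\lambda_i t} \quad (i, k \in S,\ t \ge 0) \tag{4.6}$$
Since $q_{ij} \ge 0$ for $i \ne j$, it is clear that $p_{ik}^{(1)}(t) \ge p_{ik}^{(0)}(t)$. It then follows from (4.7)
by induction that $p_{ik}^{(n+1)}(t) \ge p_{ik}^{(n)}(t)$ for all $n \ge 0$. Thus, $\bar{p}_{ik}(t) = \lim_{n \to \infty} p_{ik}^{(n)}(t)$
exists. Taking limits on both sides of (4.7) yields
$$\bar{p}_{ik}(t) = \delta_{ik}e^{-\lambda_i t} + \sum_{j \ne i} \int_0^t e^{-\lambda_i(t - s)}q_{ij}\,\bar{p}_{jk}(s)\,ds. \tag{4.8}$$
Hence, $\bar{p}_{ik}(t)$ satisfy (4.5). Also, $\bar{p}_{ik}(t) \ge p_{ik}^{(0)}(t) \ge 0$. Further, $\sum_{k \in S} p_{ik}^{(0)}(t) \le 1$ for
all $t \ge 0$ and all $i$. Assuming, as induction hypothesis, that $\sum_{k \in S} p_{ik}^{(n-1)}(t) \le 1$
for all $t \ge 0$ and all $i$, it follows from (4.7) that
$$\sum_{k \in S} p_{ik}^{(n)}(t) \le e^{-\lambda_i t} + \sum_{j \ne i} q_{ij}\int_0^t e^{-\lambda_i(t - s)}\,ds = e^{-\lambda_i t} + \lambda_i\int_0^t e^{-\lambda_i s}\,ds = 1.$$
Hence, $\sum_{k \in S} p_{ik}^{(n)}(t) \le 1$ for all $n$, all $t \ge 0$, and the same must be true for
$$\sum_{k \in S} \bar{p}_{ik}(t) = \lim_{n \uparrow \infty} \sum_{k \in S} p_{ik}^{(n)}(t).$$
We now show that $\bar{p}(t)$ is the smallest nonnegative solution of (4.5). Suppose
$p(t)$ is any other nonnegative solution. Then obviously $p_{ik}(t) \ge \delta_{ik}e^{-\lambda_i t} = p_{ik}^{(0)}(t)$
for all $i, k, t$. Assuming, as induction hypothesis, $p_{ik}(t) \ge p_{ik}^{(n-1)}(t)$ for all $i, k \in S$,
$t \ge 0$, it follows from the fact that $p_{ik}(t)$ satisfies (4.5) that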
SOLUTIONS TO KOLMOGOROV'S EQUATIONS BY SUCCESSIVE APPROXIMATION 273
$$p_{ik}(t) \ge \delta_{ik}e^{-\lambda_i t} + \sum_{j \ne i} \int_0^t e^{-\lambda_i(t - s)}q_{ij}\,p_{jk}^{(n-1)}(s)\,ds = p_{ik}^{(n)}(t).$$
Hence, $p_{ik}(t) \ge p_{ik}^{(n)}(t)$ for all $n \ge 0$ and, therefore, $p_{ik}(t) \ge \bar{p}_{ik}(t)$ for all $i, k \in S$
and all $t \ge 0$. The last assertion of the theorem is almost obvious. For if equality
holds in (4.4) for $\bar{p}(t)$, for all $i$ and all $t \ge 0$, and $p(t)$ is another transition
probability matrix, then, by the above $p_{ik}(t) \ge \bar{p}_{ik}(t)$ for all $i, k$, and $t \ge 0$. If
strict inequality holds for some $t = t_0$ and $i = i_0$ then summing over $k$ one gets
Note that we have not proved that $\bar{p}(t)$ satisfies the Chapman-Kolmogorov
equation (2.6). This may be proved by using Laplace transforms (Exercise 6).
It is also the case that the forward equations (2.18) (or (2.19)) always hold for
the minimal solution $\bar{p}(t)$ (Exercise 5).
In the case that (3.9) holds, i.e., the bounded rates condition, there is only
one solution satisfying the backward equations and the initial conditions and,
therefore, $\bar{p}_{ik}(t)$ is given by exponential representation on the right side of (3.11).
Of course, the solution may be unique even otherwise. We will come back to
this question and the probabilistic implications of nonuniqueness in the next
section.
Finally, the upshot of all this is that the Markov process is under certain
circumstances specified by an initial distribution $\pi$ and a matrix $Q$, satisfying
(4.1). In any case, the minimal solution always exists, although the total mass
may be less than 1.
Clearly,
$$p_{00}(t) = e^{-\nu t},$$
$$p'_{01}(t) = -(\nu + \lambda)p_{01}(t) + \nu p_{00}(t) = -(\nu + \lambda)p_{01}(t) + \nu e^{-\nu t}. \tag{4.10}$$
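This equation can be solved with the aid of an integrating factor as follows.
Let $g(t) = e^{(\nu + \lambda)t}p_{01}(t)$. Then (4.10) may be expressed as
$$\frac{dg(t)}{dt} = \nu e^{\lambda t},$$
or
$$g(t) = \frac{\nu}{\lambda}\big(e^{\lambda t} - 1\big).$$
Therefore,
$$p_{01}(t) = \frac{\nu}{\lambda}\,e^{-\nu t}\big(1 - e^{-\lambda t}\big).$$
Next
$$p_{02}(t) = e^{-(\nu + 2\lambda)t}\Big[\frac{\nu(\nu + \lambda)}{\lambda}\int_0^t \big(e^{2\lambda u} - e^{\lambda u}\big)\,du\Big]$$
$$= e^{-(\nu + 2\lambda)t}\Big[\frac{\nu(\nu + \lambda)}{\lambda}\Big(\frac{e^{2\lambda t} - 1}{2\lambda} - \frac{e^{\lambda t} - 1}{\lambda}\Big)\Big]$$
$$= \frac{\nu(\nu + \lambda)}{2\lambda^2}\big[e^{-\nu t} - 2e^{-(\nu + \lambda)t} + e^{-(\nu + 2\lambda)t}\big]$$
$$= \frac{\nu(\nu + \lambda)}{2\lambda^2}\,e^{-\nu t}\big[1 - e^{-\lambda t}\big]^2.$$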
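The closed forms for $p_{00}$, $p_{01}$, $p_{02}$ can be checked by integrating the differential equations numerically. The sketch below assumes the pure birth rates $\nu + n\lambda$ out of state $n$, so that $p'_{0n} = -(\nu + n\lambda)p_{0n} + (\nu + (n-1)\lambda)p_{0,n-1}$, and uses a classical fourth-order Runge-Kutta step; the step count and parameter values are arbitrary.

```python
import math


def closed_forms(t, nu, lam):
    """p_00, p_01, p_02 from the integrating-factor computation."""
    p00 = math.exp(-nu * t)
    p01 = (nu / lam) * math.exp(-nu * t) * (1 - math.exp(-lam * t))
    p02 = (nu * (nu + lam) / (2 * lam ** 2)
           * math.exp(-nu * t) * (1 - math.exp(-lam * t)) ** 2)
    return [p00, p01, p02]


def rhs(p, nu, lam):
    """Right side of the system for (p_00, p_01, p_02)."""
    return [-(nu + n * lam) * p[n]
            + ((nu + (n - 1) * lam) * p[n - 1] if n > 0 else 0.0)
            for n in range(len(p))]


def integrate(t, nu, lam, steps=2000):
    """Runge-Kutta integration from p(0) = (1, 0, 0)."""
    p = [1.0, 0.0, 0.0]
    h = t / steps
    for _ in range(steps):
        k1 = rhs(p, nu, lam)
        k2 = rhs([x + 0.5 * h * k for x, k in zip(p, k1)], nu, lam)
        k3 = rhs([x + 0.5 * h * k for x, k in zip(p, k2)], nu, lam)
        k4 = rhs([x + h * k for x, k in zip(p, k3)], nu, lam)
        p = [x + h * (a + 2 * b + 2 * c + e) / 6
             for x, a, b, c, e in zip(p, k1, k2, k3, k4)]
    return p


t, nu, lam = 1.5, 0.7, 1.2
numeric = integrate(t, nu, lam)
exact = closed_forms(t, nu, lam)
max_error = max(abs(a - b) for a, b in zip(numeric, exact))
```

Because the system is triangular, truncating at $n = 2$ does not affect the first three components.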
SAMPLE PATH ANALYSIS AND THE STRONG MARKOV PROPERTY 275
yields
$$p_{0,n+1}(t) = e^{-(\nu + (n+1)\lambda)t}\int_0^t e^{(\nu + (n+1)\lambda)u}(\nu + n\lambda)\,p_{0n}(u)\,du$$
x/ f
2 l
1 (x 1)"+1 e ,.^ 1 ( e z^ 1)"+1
= L n+1 ] 1 n+l
Hence,
Let $Q = ((q_{ij}))$ be transition rates satisfying (4.1) and such that the
corresponding Kolmogorov backward equation admits a unique (transition
probability semigroup) solution $p(t) = ((p_{ij}(t)))$. Given an initial distribution $\pi$
on S there is a Markov process $\{X_t\}$ with transition probabilities $p(t)$, $t \ge 0$,
and initial distribution $\pi$ having right-continuous sample paths. Indeed, the
process $\{X_t\}$ may be constructed as coordinate projections on the space $\Omega$ of
right-continuous step functions on $[0, \infty)$ with values in S (theoretical
complement 5.3).
Our purpose in the present section is to analyze the probabilistic nature of
the process $\{X_t\}$. First we consider the distribution of the time spent in the
initial state.
Proposition 5.1. Let the Markov chain $\{X_t : 0 \le t < \infty\}$ have the initial state
$i$ and let $T_0 = \inf\{t > 0 : X_t \ne i\}$. Then $T_0$ has an exponential distribution with
parameter $-q_{ii}$. In the case $q_{ii} = 0$, the degeneracy of the exponential
Proof. Choose and fix $t > 0$. For each integer $n \ge 1$ define the finite-dimensional
event
$$A_n = \{X_{(m/2^n)t} = i \ \text{for } m = 0, 1, \ldots, 2^n\}.$$
$$A = \lim_{n \to \infty} A_n := \bigcap_{n=1}^{\infty} A_n$$
To see why the last equality holds, first note that $\{T_0 > t\} = \{X_u = i$ for all $u$
in $[0, t]\} \subset A$. On the other hand, since the sample paths are step functions, if
a sample path is not in $\{T_0 > t\}$ then there occurs a jump to state $j$, different
from $i$, at some time $t_0$ ($0 < t_0 \le t$). The case $t_0 = t$ may be excluded, since it
is not in $A$. Because each sample path is a right-continuous step function, there
is a time point $t_1 > t_0$ such that $X_u = j$ for $t_0 \le u < t_1$. Since there is some $u$
of the form $u = (m/2^n)t \le t$ in every nondegenerate interval, it follows that
$X_u = j$ for some $u$ of the form $u = (m/2^n)t \le t$; this implies that this sample path
is not in $A_n$ and, hence, not in $A$. Therefore, $\{T_0 > t\} \supset A$. Now note by (2.2)
$$P_i(T_0 > t) = P_i(A) = \lim_{n \uparrow \infty} P_i(A_n) = \lim_{n \uparrow \infty} \Big[p_{ii}\Big(\frac{t}{2^n}\Big)\Big]^{2^n} = \lim_{n \uparrow \infty} \Big[1 + \frac{t}{2^n}q_{ii} + o\Big(\frac{t}{2^n}\Big)\Big]^{2^n} = e^{tq_{ii}}, \tag{5.1}$$
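The limit in (5.1) can be seen numerically. The sketch below drops the $o(t/2^n)$ term and evaluates $[1 + (t/2^n)q_{ii}]^{2^n}$ for increasing $n$; the values of $t$ and $q_{ii}$ are arbitrary illustrative choices.

```python
import math


def no_jump_probability(t, q_ii, n):
    """[1 + (t / 2^n) q_ii]^(2^n), the dyadic approximation without the o(.) term."""
    m = 2 ** n
    return (1 + t * q_ii / m) ** m


t, q_ii = 2.0, -1.5
approximations = [no_jump_probability(t, q_ii, n) for n in (2, 6, 12, 20)]
limit = math.exp(t * q_ii)
errors = [abs(a - limit) for a in approximations]
```

The errors shrink roughly in proportion to $2^{-n}$, consistent with an exponential holding time of parameter $-q_{ii}$.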
The following random times are basic to the description of the evolution of
continuous time Markov chains.
Thus, $T_0$ is the holding time in the initial state, $T_1$ is the holding time in the
state to which the process jumps first time, and so on. Generally, $T_n$ is the
holding time in the state to which the process jumps at its $n$th transition.
As usual, $P_i$ denotes the distribution of the process $\{X_t\}$ under $X_0 = i$. As
might be guessed, given the past up to and including time $T_0$, the process evolves
from time $T_0$ onwards as the original Markov process would with initial state
$X_{T_0}$. More generally, given the sample path of the process up to time
$\tau_n = T_0 + T_1 + \cdots + T_{n-1}$, the conditional distribution of $\{X_{\tau_n + t} : t \ge 0\}$ is $P_{X_{\tau_n}}$, i.e.,
depends only on the (present) state $X_{\tau_n}$ and on nothing else in the past.
Although this seems intuitively clear from the Markov property, the time $T_0$ is
not a constant, as in the Markov property, but a random variable.
The italicized statement above is a case of an extension of the Markov
property known as the strong Markov property. To state this property we
introduce a class of random times called stopping times or Markov times. A
stopping time $\tau$ is a random time, i.e., a random variable with values in $[0, \infty]$,
with the property that for every fixed time $s$, the occurrence or nonoccurrence
of the event $\{\tau \le s\}$ can be determined by a knowledge of $\{X_u : 0 \le u \le s\}$. For
example, if one cannot decide whether or not the event $\{\tau \le 10\}$ has happened
by observing only $\{X_u : 0 \le u \le 10\}$, then $\tau$ is not a stopping time. The random
variables $\tau_1 = T_0$, $\tau_2 = T_0 + T_1, \ldots, \tau_n$ are stopping times, but $T_1$, $T_0 + T_2$ are
not (Exercise 1).
Proposition 5.2. On the set $\{\omega \in \Omega : \tau(\omega) < \infty\}$, the conditional distribution of
$\{X_{\tau + t} : t \ge 0\}$ given the past up to time $\tau$ is $P_{X_\tau}$ if $\tau$ is a stopping time.
Stated another way, Proposition 5.2 says that on $\{\tau < \infty\}$, given the past of
the process $\{X_t\}$ up to time $\tau$, the future is (conditionally) distributed as the
Markov chain starting at $X_\tau$ and having the same transition probabilities as
those of $\{X_t\}$. This is the strong Markov property. The proof of Proposition 5.3
in the case that $\tau$ is discrete is similar to that already given in Section 4, Chapter
II. The proof in the general case follows by approximating $\tau$ by discrete stopping
times. For a detailed proof see Theorem 11.1 in Chapter V.
The strong Markov property will now be used to obtain a vivid probabilistic
description of the Markov chain {X,}. For this, let us write
k..=0
^r = qjif q1j : 0,
= q " for i J (5.3)
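$$P_i(s < T_0 \le s + \Delta,\ X_{s+\Delta} = j) = P_i(X_u = i \ \text{for } 0 \le u \le s,\ X_{s+\Delta} = j)$$
$$= P_i(X_u = i \ \text{for } 0 \le u \le s)\,P_i(X_{s+\Delta} = j \mid X_u = i \ \text{for } 0 \le u \le s)$$
$$= P_i(T_0 > s)\,p_{ij}(\Delta) = e^{-\lambda_i s}p_{ij}(\Delta). \tag{5.5}$$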
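Dividing the first and last expressions of (5.5) by $\Delta$ and letting $\Delta \downarrow 0$ one gets
the joint density-mass function of $T_0$ and $X_{T_0}$ at $s$ and $j$, respectively (Exercise 2),
$$f_{T_0, X_{T_0}}(s, j) = e^{-\lambda_i s}p'_{ij}(0) = e^{-\lambda_i s}q_{ij} = \lambda_i e^{-\lambda_i s}\,\frac{q_{ij}}{\lambda_i}, \tag{5.6}$$
where $\lambda_i = -q_{ii}$.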
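The factorization in (5.6) suggests a simulation recipe: hold in state $i$ for an exponential time with parameter $\lambda_i$, then jump to $j$ with probability $q_{ij}/\lambda_i$. The sketch below applies this to a two-state chain and compares a Monte Carlo estimate of $p_{00}(t)$ with the closed form $(\delta + \beta e^{-(\beta+\delta)t})/(\beta+\delta)$. The function name, seed, and sample size are illustrative assumptions, and states with $\lambda_i = 0$ are not handled.

```python
import math
import random


def simulate_state_at(t, start, rates, jump_probs, rng):
    """Run the chain from `start` and return its state at time t.

    rates[i] = lambda_i = -q_ii (assumed positive);
    jump_probs[i] = {j: q_ij / lambda_i}.
    """
    state, clock = start, 0.0
    while True:
        clock += rng.expovariate(rates[state])      # exponential holding time
        if clock > t:
            return state
        targets, weights = zip(*jump_probs[state].items())
        state = rng.choices(targets, weights=weights)[0]


beta, delta = 1.5, 0.5                               # illustrative two-state rates
rates = {0: beta, 1: delta}
jump_probs = {0: {1: 1.0}, 1: {0: 1.0}}
rng = random.Random(12345)
t, runs = 0.7, 20000
estimate = sum(simulate_state_at(t, 0, rates, jump_probs, rng) == 0
               for _ in range(runs)) / runs
exact = (delta + beta * math.exp(-(beta + delta) * t)) / (beta + delta)
```

With this many runs the standard error of the estimate is about 0.0035, so agreement to within a couple of percent is what one should expect rather than exact equality.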
Now use Propositions 5.1 and 5.3 and the strong Markov property to check
the following computation.
= Pio(TI ' S1, Xti = l2, ... , Tn ' Sng Xt.^ = in+l I TO = S, XTo = i 1)^io
s=0
x e liosk ;oi , ds

so
1 (T <1 S1 i XT o = i2, ... , TnI < sn , Xs. = in+1)2ioe^ioskioi, ds
= s=0
so \
ds)kio`,
_Aioexios RieitsdS)ki,i2
:=o ( fs"=0
X Pie(TO'S2, XTo= 13,...,Tn2'Sn,Xtn_2 =in+J
Note that
n
fl
j=0
= Pi (Y1 = il, Y2 = i2, ... , Yn+1 = in+1) ,
Theorem 5.4
(a) Let $\{X_t\}$ be a Markov chain having the infinitesimal generator $Q$ and
initial state $i_0$. Then $\{Y_n := X_{\tau_n} : n = 0, 1, 2, \ldots\}$ is a discrete parameter