Markov Chains

J. R. Norris
University of Cambridge
PUBLISHED BY THE PRESS SYNDICATE OF THE UNIVERSITY OF CAMBRIDGE
The Pitt Building, Trumpington Street, Cambridge CB2 1RP, United Kingdom
CAMBRIDGE UNIVERSITY PRESS
The Edinburgh Building, Cambridge CB2 2RU, United Kingdom
40 West 20th Street, New York, NY 10011-4211, USA
10 Stamford Road, Oakleigh, Melbourne 3166, Australia
© Cambridge University Press 1997
This book is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without
the written permission of Cambridge University Press.
First published 1997
Reprinted 1998
First paperback edition 1998
Printed in the United States of America
Typeset in Computer Modern
A catalogue record for this book is available from the British Library
Library of Congress Cataloguing-in-Publication Data is available
ISBN 0-521-48181-3 hardback
ISBN 0-521-63396-6 paperback
For my parents
Contents
Preface ix
Introduction xiii
1. Discrete-time Markov chains 1
1.1 Definition and basic properties 1
1.2 Class structure 10
1.3 Hitting times and absorption probabilities 12
1.4 Strong Markov property 19
1.5 Recurrence and transience 24
1.6 Recurrence and transience of random walks 29
1.7 Invariant distributions 33
1.8 Convergence to equilibrium 40
1.9 Time reversal 47
1.10 Ergodic theorem 52
1.11 Appendix: recurrence relations 57
1.12 Appendix: asymptotics for n! 58
2. Continuous-time Markov chains I 60
2.1 Q-matrices and their exponentials 60
2.2 Continuous-time random processes 67
2.3 Some properties of the exponential distribution 70
2.4 Poisson processes 73
2.5 Birth processes 81
2.6 Jump chain and holding times 87
2.7 Explosion 90
2.8 Forward and backward equations 93
2.9 Non-minimal chains 103
2.10 Appendix: matrix exponentials 105
3. Continuous-time Markov chains II 108
3.1 Basic properties 108
3.2 Class structure 111
3.3 Hitting times and absorption probabilities 112
3.4 Recurrence and transience 114
3.5 Invariant distributions 117
3.6 Convergence to equilibrium 121
3.7 Time reversal 123
3.8 Ergodic theorem 125
4. Further theory 128
4.1 Martingales 128
4.2 Potential theory 134
4.3 Electrical networks 151
4.4 Brownian motion 159
5. Applications 170
5.1 Markov chains in biology 170
5.2 Queues and queueing networks 179
5.3 Markov chains in resource management 192
5.4 Markov decision processes 197
5.5 Markov chain Monte Carlo 206
6. Appendix: probability and measure 217
6.1 Countable sets and countable sums 217
6.2 Basic facts of measure theory 220
6.3 Probability spaces and expectation 222
6.4 Monotone convergence and Fubini's theorem 223
6.5 Stopping times and the strong Markov property 224
6.6 Uniqueness of probabilities and independence of a-algebras 228
Further reading 232
Index 234
Preface
Markov chains are the simplest mathematical models for random phenom-
ena evolving in time. Their simple structure makes it possible to say a great
deal about their behaviour. At the same time, the class of Markov chains
is rich enough to serve in many applications. This makes Markov chains
the first and most important examples of random processes. Indeed, the
whole of the mathematical study of random processes can be regarded as a
generalization in one way or another of the theory of Markov chains.
This book is an account of the elementary theory of Markov chains,
with applications. It was conceived as a text for advanced undergraduates
or master's level students, and is developed from a course taught to un-
dergraduates for several years. There are no strict prerequisites but it is
envisaged that the reader will have taken a course in elementary probability.
In particular, measure theory is not a prerequisite.
The first half of the book is based on lecture notes for the undergradu-
ate course. Illustrative examples introduce many of the key ideas. Careful
proofs are given throughout. There is a selection of exercises, which forms
the basis of classwork done by the students, and which has been tested
over several years. Chapter 1 deals with the theory of discrete-time Markov
chains, and is the basis of all that follows. You must begin here. The
material is quite straightforward and the ideas introduced permeate the
whole book. The basic pattern of Chapter 1 is repeated in Chapter 3 for
continuous-time chains, making it easy to follow the development by anal-
ogy. In between, Chapter 2 explains how to set up the theory of continuous-
time chains, beginning with simple examples such as the Poisson process
and chains with finite state space.
The second half of the book comprises three independent chapters in-
tended to complement the first half. In some sections the style is a lit-
tle more demanding. Chapter 4 introduces, in the context of elementary
Markov chains, some of the ideas crucial to the advanced study of Markov
processes, such as martingales, potentials, electrical networks and Brownian
motion. Chapter 5 is devoted to applications, for example to population
growth, mathematical genetics, queues and networks of queues, Markov de-
cision processes and Monte Carlo simulation. Chapter 6 is an appendix to
the main text, where we explain some of the basic notions of probability
and measure used in the rest of the book and give careful proofs of the few
points where measure theory is really needed.
The following paragraph is directed primarily at an instructor and as-
sumes some familiarity with the subject. Overall, the book is more focused
on the Markovian context than most other books dealing with the elemen-
tary theory of stochastic processes. I believe that this restriction in scope
is desirable for the greater coherence and depth it allows. The treatment
of discrete-time chains in Chapter 1 includes the calculation of transition
probabilities, hitting probabilities, expected hitting times and invariant dis-
tributions. Also treated are recurrence and transience, convergence to equi-
librium, reversibility, and the ergodic theorem for long-run averages. All
the results are proved, exploiting to the full the probabilistic viewpoint.
For example, we use excursions and the strong Markov property to obtain
conditions for recurrence and transience, and convergence to equilibrium is
proved by the coupling method. In Chapters 2 and 3 we proceed via the
jump chain/holding time construction to treat all right-continuous, mini-
mal continuous-time chains, and establish analogues of all the main results
obtained for discrete time. No conditions of uniformly bounded rates are
needed. The student has the option to take Chapter 3 first, to study the
properties of continuous-time chains before the technically more demand-
ing construction. We have left measure theory in the background, but
the proofs are intended to be rigorous, or very easily made rigorous, when
considered in measure-theoretic terms. Some further details are given in
Chapter 6.
It is a pleasure to acknowledge the work of colleagues from which I have
benefitted in preparing this book. The course on which it is based has
evolved over many years and under many hands - I inherited parts of it
from Martin Barlow and Chris Rogers. In recent years it has been given
by Doug Kennedy and Colin Sparrow. Richard Gibbens, Geoffrey Grim-
mett, Frank Kelly and Gareth Roberts gave expert advice at various stages.
Meena Lakshmanan, Violet Lo and David Rose pointed out many typos and
ambiguities. Brian Ripley and David Williams made constructive sugges-
tions for improvement of an early version.
I am especially grateful to David Tranah at Cambridge University Press
for his suggestion to write the book and for his continuing support, and to
Sarah Shea-Simonds who typeset the whole book with efficiency, precision
and good humour.
Cambridge, 1996 James Norris
Introduction
This book is about a certain sort of random process. The characteristic
property of this sort of process is that it retains no memory of where it has
been in the past. This means that only the current state of the process can
influence where it goes next. Such a process is called a Markov process. We
shall be concerned exclusively with the case where the process can assume
only a finite or countable set of states, when it is usual to refer to it as a
Markov chain.
Examples of Markov chains abound, as you will see throughout the book.
What makes them important is that not only do Markov chains model
many phenomena of interest, but also the lack of memory property makes
it possible to predict how a Markov chain may behave, and to compute
probabilities and expected values which quantify that behaviour. In this
book we shall present general techniques for the analysis of Markov chains,
together with many examples and applications. In this introduction we
shall discuss a few very simple examples and preview some of the questions
which the general theory will answer.
We shall consider chains both in discrete time
$$n \in \mathbb{Z}^+ = \{0, 1, 2, \dots\}$$
and continuous time
$$t \in \mathbb{R}^+ = [0, \infty).$$
The letters $n$, $m$, $k$ will always denote integers, whereas $t$ and $s$ will refer to real numbers. Thus we write $(X_n)_{n\ge 0}$ for a discrete-time process and $(X_t)_{t\ge 0}$ for a continuous-time process.
Markov chains are often best described by diagrams, of which we now
give some simple examples:
(i) (Discrete time)
[diagram: three states 1, 2, 3 with the transition probabilities described below]
You move from state 1 to state 2 with probability 1. From state 3 you
move either to 1 or to 2 with equal probability 1/2, and from 2 you jump
to 3 with probability 1/3, otherwise stay at 2. We might have drawn a loop
from 2 to itself with label 2/3. But since the total probability on jumping
from 2 must equal 1, this does not convey any more information and we
prefer to leave the loops out.
(ii) (Continuous time)
[diagram: states 0 and 1, with a jump from 0 to 1 at rate $\lambda$]
When in state 0 you wait for a random time with exponential distribution
of parameter $\lambda \in (0, \infty)$, then jump to 1. Thus the density function of the waiting time $T$ is given by
$$f_T(t) = \lambda e^{-\lambda t} \quad\text{for } t \ge 0.$$
We write $T \sim E(\lambda)$ for short.
(iii) (Continuous time)
[diagram: states 0, 1, 2, 3, 4, ... with a jump from each state to the next at rate $\lambda$]
Here, when you get to 1 you do not stop but after another independent exponential time of parameter $\lambda$ jump to 2, and so on. The resulting process is called the Poisson process of rate $\lambda$.
(iv) (Continuous time)
[diagram: states 1, 2, 3, 4 with exponential jump rates as described below]
In state 3 you take two independent exponential times $T_1 \sim E(2)$ and $T_2 \sim E(4)$; if $T_1$ is the smaller you go to 1 after time $T_1$, and if $T_2$ is the smaller you go to 2 after time $T_2$. The rules for states 1 and 2 are as given in examples (ii) and (iii). It is a simple matter to show that the time spent in 3 is exponential of parameter $2 + 4 = 6$, and that the probability of jumping from 3 to 1 is $2/(2+4) = 1/3$. The details are given later.
(v) (Discrete time)
[diagram: seven states 0, 1, ..., 6]
We use this example to anticipate some of the ideas discussed in detail
in Chapter 1. The states may be partitioned into communicating classes,
namely {0}, {1, 2, 3} and {4, 5, 6}. Two of these classes are closed, meaning that you cannot escape. The closed classes here are recurrent, meaning that you return again and again to every state. The class {0} is transient. The class {4, 5, 6} is periodic, but {1, 2, 3} is not. We shall show how to establish the following facts by solving some simple linear equations. You
might like to try from first principles.
(a) Starting from 0, the probability of hitting 6 is 1/4.
(b) Starting from 1, the probability of hitting 3 is 1.
(c) Starting from 1, it takes on average three steps to hit 3.
(d) Starting from 1, the long-run proportion of time spent in 2 is 3/8.
Let us write $p_{ij}^{(n)}$ for the probability, starting from $i$, of being in state $j$ after $n$ steps. Then we have:
(e) $\lim_{n\to\infty} p_{01}^{(n)} = 9/32$;
(f) $p_{04}^{(n)}$ does not converge as $n \to \infty$;
(g) $\lim_{n\to\infty} p_{04}^{(6n)} = 1/12$.
1
Discrete-time Markov chains
This chapter is the foundation for all that follows. Discrete-time Markov
chains are defined and their behaviour is investigated. For better orien-
tation we now list the key theorems: these are Theorems 1.3.2 and 1.3.5
on hitting times, Theorem 1.4.2 on the strong Markov property, Theorem
1.5.3 characterizing recurrence and transience, Theorem 1.7.7 on invariant
distributions and positive recurrence, Theorem 1.8.3 on convergence to
equilibrium, Theorem 1.9.3 on reversibility, and Theorem 1.10.2 on long-
run averages. Once you understand these you will understand the basic
theory. Part of that understanding will come from familiarity with exam-
ples, so a large number are worked out in the text. Exercises at the end of
each section are an important part of the exposition.
1.1 Definition and basic properties
Let I be a countable set. Each i E I is called a state and I is called the
state-space. We say that $\lambda = (\lambda_i : i \in I)$ is a measure on $I$ if $0 \le \lambda_i < \infty$ for all $i \in I$. If in addition the total mass $\sum_{i\in I}\lambda_i$ equals 1, then we call $\lambda$ a distribution. We work throughout with a probability space $(\Omega, \mathcal{F}, \mathbb{P})$. Recall that a random variable $X$ with values in $I$ is a function $X : \Omega \to I$. Suppose we set
$$\lambda_i = \mathbb{P}(X = i) = \mathbb{P}(\{\omega : X(\omega) = i\}).$$
Then $\lambda$ defines a distribution, the distribution of $X$. We think of $X$ as modelling a random state which takes the value $i$ with probability $\lambda_i$. There
is a brief review of some basic facts about countable sets and probability
spaces in Chapter 6.
We say that a matrix $P = (p_{ij} : i, j \in I)$ is stochastic if every row $(p_{ij} : j \in I)$ is a distribution. There is a one-to-one correspondence between stochastic matrices $P$ and the sort of diagrams described in the Introduction. Here are two examples:
$$P = \begin{pmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta \end{pmatrix}, \qquad P = \begin{pmatrix} 0 & 1 & 0 \\ 0 & \frac12 & \frac12 \\ \frac12 & 0 & \frac12 \end{pmatrix}$$
[diagrams: a two-state chain on {1, 2} and a three-state chain on {1, 2, 3} corresponding to these matrices]
We shall now formalize the rules for a Markov chain by a definition in terms of the corresponding matrix $P$. We say that $(X_n)_{n\ge 0}$ is a Markov chain with initial distribution $\lambda$ and transition matrix $P$ if
(i) $X_0$ has distribution $\lambda$;
(ii) for $n \ge 0$, conditional on $X_n = i$, $X_{n+1}$ has distribution $(p_{ij} : j \in I)$ and is independent of $X_0, \dots, X_{n-1}$.
More explicitly, these conditions state that, for $n \ge 0$ and $i_0, \dots, i_{n+1} \in I$,
(i) $\mathbb{P}(X_0 = i_0) = \lambda_{i_0}$;
(ii) $\mathbb{P}(X_{n+1} = i_{n+1} \mid X_0 = i_0, \dots, X_n = i_n) = p_{i_n i_{n+1}}$.
We say that $(X_n)_{n\ge 0}$ is Markov$(\lambda, P)$ for short. If $(X_n)_{0\le n\le N}$ is a finite sequence of random variables satisfying (i) and (ii) for $n = 0, \dots, N-1$, then we again say $(X_n)_{0\le n\le N}$ is Markov$(\lambda, P)$.
It is in terms of properties (i) and (ii) that most real-world examples are seen to be Markov chains. But mathematically the following result appears to give a more comprehensive description, and it is the key to some later calculations.
Theorem 1.1.1. A discrete-time random process $(X_n)_{0\le n\le N}$ is Markov$(\lambda, P)$ if and only if for all $i_0, \dots, i_N \in I$
$$\mathbb{P}(X_0 = i_0, X_1 = i_1, \dots, X_N = i_N) = \lambda_{i_0} p_{i_0 i_1} p_{i_1 i_2}\cdots p_{i_{N-1} i_N}. \tag{1.1}$$
Proof. Suppose $(X_n)_{0\le n\le N}$ is Markov$(\lambda, P)$, then
$$\mathbb{P}(X_0 = i_0, X_1 = i_1, \dots, X_N = i_N)$$
$$= \mathbb{P}(X_0 = i_0)\,\mathbb{P}(X_1 = i_1 \mid X_0 = i_0)\cdots\mathbb{P}(X_N = i_N \mid X_0 = i_0, \dots, X_{N-1} = i_{N-1})$$
$$= \lambda_{i_0} p_{i_0 i_1}\cdots p_{i_{N-1} i_N}.$$
On the other hand, if (1.1) holds for $N$, then by summing both sides over $i_N \in I$ and using $\sum_{j\in I} p_{ij} = 1$ we see that (1.1) holds for $N-1$ and, by induction, for all $n = 0, 1, \dots, N$. In particular, $\mathbb{P}(X_0 = i_0) = \lambda_{i_0}$ and, for $n = 0, 1, \dots, N-1$,
$$\mathbb{P}(X_{n+1} = i_{n+1} \mid X_0 = i_0, \dots, X_n = i_n) = \mathbb{P}(X_0 = i_0, \dots, X_n = i_n, X_{n+1} = i_{n+1})/\mathbb{P}(X_0 = i_0, \dots, X_n = i_n) = p_{i_n i_{n+1}}.$$
So $(X_n)_{0\le n\le N}$ is Markov$(\lambda, P)$. $\square$
The next result reinforces the idea that Markov chains have no memory.
We write $\delta_i = (\delta_{ij} : j \in I)$ for the unit mass at $i$, where
$$\delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{otherwise.}\end{cases}$$
Theorem 1.1.2 (Markov property). Let $(X_n)_{n\ge 0}$ be Markov$(\lambda, P)$. Then, conditional on $X_m = i$, $(X_{m+n})_{n\ge 0}$ is Markov$(\delta_i, P)$ and is independent of the random variables $X_0, \dots, X_m$.
Proof. We have to show that for any event $A$ determined by $X_0, \dots, X_m$ we have
$$\mathbb{P}(\{X_m = i_m, \dots, X_{m+n} = i_{m+n}\}\cap A \mid X_m = i) = \delta_{i i_m} p_{i_m i_{m+1}}\cdots p_{i_{m+n-1}i_{m+n}}\,\mathbb{P}(A \mid X_m = i) \tag{1.2}$$
then the result follows by Theorem 1.1.1. First consider the case of elementary events
$$A = \{X_0 = i_0, \dots, X_m = i_m\}.$$
In that case we have to show
$$\mathbb{P}(X_0 = i_0, \dots, X_{m+n} = i_{m+n} \text{ and } i = i_m)/\mathbb{P}(X_m = i)$$
$$= \delta_{i i_m} p_{i_m i_{m+1}}\cdots p_{i_{m+n-1}i_{m+n}}\times\mathbb{P}(X_0 = i_0, \dots, X_m = i_m \text{ and } i = i_m)/\mathbb{P}(X_m = i)$$
which is true by Theorem 1.1.1. In general, any event $A$ determined by $X_0, \dots, X_m$ may be written as a countable disjoint union of elementary events
$$A = \bigcup_{k=1}^{\infty} A_k.$$
Then the desired identity (1.2) for $A$ follows by summing up the corresponding identities for $A_k$. $\square$
The remainder of this section addresses the following problem: what is
the probability that after n steps our Markov chain is in a given state? First
we shall see how the problem reduces to calculating entries in the nth power
of the transition matrix. Then we shall look at some examples where this
may be done explicitly.
We regard distributions and measures $\lambda$ as row vectors whose components are indexed by $I$, just as $P$ is a matrix whose entries are indexed by $I \times I$. When $I$ is finite we will often label the states $1, 2, \dots, N$; then $\lambda$ will be an $N$-vector and $P$ an $N \times N$-matrix. For these objects, matrix multiplication is a familiar operation. We extend matrix multiplication to the general case in the obvious way, defining a new measure $\lambda P$ and a new matrix $P^2$ by
$$(\lambda P)_j = \sum_{i\in I}\lambda_i p_{ij}, \qquad (P^2)_{ik} = \sum_{j\in I} p_{ij}p_{jk}.$$
We define $P^n$ similarly for any $n$. We agree that $P^0$ is the identity matrix $I$, where $(I)_{ij} = \delta_{ij}$. The context will make it clear when $I$ refers to the state-space and when to the identity matrix. We write $p_{ij}^{(n)} = (P^n)_{ij}$ for the $(i, j)$ entry in $P^n$.
In the case where $\lambda_i > 0$ we shall write $\mathbb{P}_i(A)$ for the conditional probability $\mathbb{P}(A \mid X_0 = i)$. By the Markov property at time $m = 0$, under $\mathbb{P}_i$, $(X_n)_{n\ge 0}$ is Markov$(\delta_i, P)$. So the behaviour of $(X_n)_{n\ge 0}$ under $\mathbb{P}_i$ does not depend on $\lambda$.
Theorem 1.1.3. Let $(X_n)_{n\ge 0}$ be Markov$(\lambda, P)$. Then, for all $n, m \ge 0$,
(i) $\mathbb{P}(X_n = j) = (\lambda P^n)_j$;
(ii) $\mathbb{P}_i(X_n = j) = \mathbb{P}(X_{n+m} = j \mid X_m = i) = p_{ij}^{(n)}$.
Proof. (i) By Theorem 1.1.1
$$\mathbb{P}(X_n = j) = \sum_{i_0\in I}\cdots\sum_{i_{n-1}\in I}\mathbb{P}(X_0 = i_0, \dots, X_{n-1} = i_{n-1}, X_n = j) = \sum_{i_0\in I}\cdots\sum_{i_{n-1}\in I}\lambda_{i_0}p_{i_0 i_1}\cdots p_{i_{n-1}j} = (\lambda P^n)_j.$$
(ii) By the Markov property, conditional on $X_m = i$, $(X_{m+n})_{n\ge 0}$ is Markov$(\delta_i, P)$, so we just take $\lambda = \delta_i$ in (i). $\square$
In light of this theorem we call $p_{ij}^{(n)}$ the $n$-step transition probability from $i$ to $j$. The following examples give some methods for calculating $p_{ij}^{(n)}$.
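As a quick illustration, Theorem 1.1.3(i) translates into a one-line computation for finite chains: the distribution of $X_n$ is the row vector $\lambda P^n$. The following minimal Python sketch (assuming NumPy is available, and using the three-state matrix that appears in Example 1.1.6 below, with states relabelled 0, 1, 2) is one way to carry this out; it is an illustration, not a prescribed implementation.

```python
import numpy as np

# Transition matrix of the three-state chain of Example 1.1.6,
# with states relabelled 0, 1, 2 instead of 1, 2, 3.
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])

lam = np.array([1.0, 0.0, 0.0])   # initial distribution: start in the first state

n = 10
dist_n = lam @ np.linalg.matrix_power(P, n)   # P(X_n = j) = (lambda P^n)_j
print(dist_n)          # distribution of X_n
print(dist_n.sum())    # equals 1.0 up to rounding
```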
Example 1.1.4
The most general two-state chain has transition matrix of the form
$$P = \begin{pmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta\end{pmatrix}$$
and is represented by the following diagram:
[diagram: two states 1 and 2, with jump probability $\alpha$ from 1 to 2 and $\beta$ from 2 to 1]
We exploit the relation $P^{n+1} = P^nP$ to write
$$p_{11}^{(n+1)} = p_{12}^{(n)}\beta + p_{11}^{(n)}(1-\alpha).$$
We also know that $p_{11}^{(n)} + p_{12}^{(n)} = \mathbb{P}_1(X_n = 1 \text{ or } 2) = 1$, so by eliminating $p_{12}^{(n)}$ we get a recurrence relation for $p_{11}^{(n)}$:
$$p_{11}^{(n+1)} = (1-\alpha-\beta)p_{11}^{(n)} + \beta, \qquad p_{11}^{(0)} = 1.$$
This has a unique solution (see Section 1.11):
$$p_{11}^{(n)} = \begin{cases} \dfrac{\beta}{\alpha+\beta} + \dfrac{\alpha}{\alpha+\beta}(1-\alpha-\beta)^n & \text{for } \alpha+\beta > 0 \\ 1 & \text{for } \alpha+\beta = 0.\end{cases}$$
Example 1.1.5 (Virus mutation)
Suppose a virus can exist in N different strains and in each generation
either stays the same, or with probability a mutates to another strain,
which is chosen at random. What is the probability that the strain in the
nth generation is the same as that in the 0th?
We could model this process as an N-state chain, with N x N transition
matrix P given by
$$p_{ii} = 1-\alpha, \qquad p_{ij} = \alpha/(N-1) \quad\text{for } i \ne j.$$
Then the answer we want would be found by computing $p_{ii}^{(n)}$. In fact, in this example there is a much simpler approach, which relies on exploiting the symmetry present in the mutation rules.
At any time a transition is made from the initial state to another with probability $\alpha$, and a transition from another state to the initial state with probability $\alpha/(N-1)$. Thus we have a two-state chain with diagram
[diagram: two states, the initial strain and the others, with transition probabilities $\alpha$ and $\alpha/(N-1)$]
and by putting $\beta = \alpha/(N-1)$ in Example 1.1.4 we find that the desired probability is
$$\frac1N + \left(1 - \frac1N\right)\left(1 - \frac{\alpha N}{N-1}\right)^n.$$
Beware that in examples having less symmetry, this sort of lumping together
of states may not produce a Markov chain.
Example 1.1.6
Consider the three-state chain with diagram
[diagram: three states 1, 2, 3]
and transition matrix
$$P = \begin{pmatrix} 0 & 1 & 0 \\ 0 & \frac12 & \frac12 \\ \frac12 & 0 & \frac12 \end{pmatrix}.$$
The problem is to find a general formula for $p_{11}^{(n)}$.
First we compute the eigenvalues of $P$ by writing down its characteristic equation
$$0 = \det(x - P) = x\left(x - \tfrac12\right)^2 - \tfrac14 = \tfrac14(x-1)(4x^2+1).$$
The eigenvalues are $1, i/2, -i/2$ and from this we deduce that $p_{11}^{(n)}$ has the form
$$p_{11}^{(n)} = a + b\left(\frac{i}{2}\right)^n + c\left(-\frac{i}{2}\right)^n$$
for some constants $a$, $b$ and $c$. (The justification comes from linear algebra: having distinct eigenvalues, $P$ is diagonalizable, that is, for some invertible matrix $U$ we have
$$P = U\begin{pmatrix} 1 & 0 & 0 \\ 0 & i/2 & 0 \\ 0 & 0 & -i/2 \end{pmatrix}U^{-1}$$
and hence
$$P^n = U\begin{pmatrix} 1 & 0 & 0 \\ 0 & (i/2)^n & 0 \\ 0 & 0 & (-i/2)^n \end{pmatrix}U^{-1}$$
which forces $p_{11}^{(n)}$ to have the form claimed.) The answer we want is real
and
$$\left(\pm\frac{i}{2}\right)^n = \left(\frac12\right)^n\left(\cos\frac{n\pi}{2} \pm i\sin\frac{n\pi}{2}\right)$$
so it makes sense to rewrite $p_{11}^{(n)}$ in the form
$$p_{11}^{(n)} = \alpha + \left(\frac12\right)^n\left(\beta\cos\frac{n\pi}{2} + \gamma\sin\frac{n\pi}{2}\right)$$
for constants $\alpha$, $\beta$ and $\gamma$. The first few values of $p_{11}^{(n)}$ are easy to write down, so we get equations to solve for $\alpha$, $\beta$ and $\gamma$:
$$1 = p_{11}^{(0)} = \alpha + \beta$$
$$0 = p_{11}^{(1)} = \alpha + \tfrac12\gamma$$
$$0 = p_{11}^{(2)} = \alpha - \tfrac14\beta.$$
So $\alpha = 1/5$, $\beta = 4/5$, $\gamma = -2/5$ and
$$p_{11}^{(n)} = \frac15 + \left(\frac12\right)^n\left(\frac45\cos\frac{n\pi}{2} - \frac25\sin\frac{n\pi}{2}\right).$$
More generally, the following method may in principle be used to find a formula for $p_{ij}^{(n)}$ for any $M$-state chain and any states $i$ and $j$.
(i) Compute the eigenvalues $\lambda_1, \dots, \lambda_M$ of $P$ by solving the characteristic equation.
(ii) If the eigenvalues are distinct then $p_{ij}^{(n)}$ has the form
$$p_{ij}^{(n)} = a_1\lambda_1^n + \cdots + a_M\lambda_M^n$$
for some constants $a_1, \dots, a_M$ (depending on $i$ and $j$). If an eigenvalue $\lambda$ is repeated (once, say) then the general form includes the term $(an+b)\lambda^n$.
(iii) As roots of a polynomial with real coefficients, complex eigenvalues will come in conjugate pairs and these are best written using sine and cosine, as in the example.
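The recipe above is easy to sanity-check numerically. The following sketch (an illustration assuming NumPy, not part of the text) compares the closed-form expression for $p_{11}^{(n)}$ found in Example 1.1.6 with the corresponding entry of $P^n$ computed directly.

```python
import numpy as np

P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])

def p11_closed_form(n):
    # Formula derived in Example 1.1.6 (states labelled 1, 2, 3 in the text)
    return 0.2 + (0.5 ** n) * (0.8 * np.cos(n * np.pi / 2)
                               - 0.4 * np.sin(n * np.pi / 2))

for n in range(8):
    exact = np.linalg.matrix_power(P, n)[0, 0]   # (1,1) entry of P^n
    print(n, exact, p11_closed_form(n))          # the two columns agree
```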
Exercises
1.1.1 Let $B_1, B_2, \dots$ be disjoint events with $\bigcup_{n=1}^{\infty} B_n = \Omega$. Show that if $A$ is another event and $\mathbb{P}(A \mid B_n) = p$ for all $n$ then $\mathbb{P}(A) = p$.
Deduce that if $X$ and $Y$ are discrete random variables then the following are equivalent:
(a) $X$ and $Y$ are independent;
(b) the conditional distribution of $X$ given $Y = y$ is independent of $y$.
1.1.2 Suppose that $(X_n)_{n\ge 0}$ is Markov$(\lambda, P)$. If $Y_n = X_{kn}$, show that $(Y_n)_{n\ge 0}$ is Markov$(\lambda, P^k)$.
1.1.3 Let $X_0$ be a random variable with values in a countable set $I$. Let $Y_1, Y_2, \dots$ be a sequence of independent random variables, uniformly distributed on $[0, 1]$. Suppose we are given a function
$$G : I \times [0,1] \to I$$
and define inductively
$$X_{n+1} = G(X_n, Y_{n+1}).$$
Show that $(X_n)_{n\ge 0}$ is a Markov chain and express its transition matrix $P$ in terms of $G$. Can all Markov chains be realized in this way? How would you simulate a Markov chain using a computer?
Suppose now that $Z_0, Z_1, \dots$ are independent, identically distributed random variables such that $Z_i = 1$ with probability $p$ and $Z_i = 0$ with probability $1-p$. Set $S_0 = 0$, $S_n = Z_1 + \cdots + Z_n$. In each of the following cases determine whether $(X_n)_{n\ge 0}$ is a Markov chain:
(a) $X_n = Z_n$,  (b) $X_n = S_n$,  (c) $X_n = S_0 + \cdots + S_n$,  (d) $X_n = (S_n, S_0 + \cdots + S_n)$.
In the cases where $(X_n)_{n\ge 0}$ is a Markov chain find its state-space and transition matrix, and in the cases where it is not a Markov chain give an example where $\mathbb{P}(X_{n+1} = i \mid X_n = j, X_{n-1} = k)$ is not independent of $k$.
1.1.4 A flea hops about at random on the vertices of a triangle, with all
jumps equally likely. Find the probability that after n hops the flea is back
where it started.
A second flea also hops about on the vertices of a triangle, but this flea is
twice as likely to jump clockwise as anticlockwise. What is the probability
that after n hops this second flea is back where it started? [Recall that $e^{\pm i\pi/6} = \sqrt3/2 \pm i/2$.]
1.1.5 A die is 'fixed' so that each time it is rolled the score cannot be the
same as the preceding score, all other scores having probability 1/5. If the
first score is 6, what is the probability $p$ that the $n$th score is 6? What is the probability that the $n$th score is 1?
Suppose now that a new die is produced which cannot score one greater
(mod 6) than the preceding score, all other scores having equal probability.
By considering the relationship between the two dice find the value of p for
the new die.
1.1.6 An octopus is trained to choose object A from a pair of objects A, B
by being given repeated trials in which it is shown both and is rewarded
with food if it chooses A. The octopus may be in one of three states of mind:
in state 1 it cannot remember which object is rewarded and is equally likely
to choose either; in state 2 it remembers and chooses A but may forget
again; in state 3 it remembers and chooses A and never forgets. After each
trial it may change its state of mind according to the transition matrix
State 1 ~ ~ 0
State 2 ~ l2 1
5
2
State 3 0 0 1.
It is in state 1 before the first trial. What is the probability that it is in state 1 just before the $(n+1)$th trial? What is the probability $P_{n+1}(A)$ that it chooses A on the $(n+1)$th trial?
Someone suggests that the record of successive choices (a sequence of As
and Bs) might arise from a two-state Markov chain with constant transition
probabilities. Discuss, with reference to the value of $P_{n+1}(A)$ that you have found, whether this is possible.
1.1.7 Let $(X_n)_{n\ge 0}$ be a Markov chain on $\{1,2,3\}$ with transition matrix
$$P = \begin{pmatrix} 0 & 1 & 0 \\ 0 & \frac23 & \frac13 \\ p & 1-p & 0\end{pmatrix}.$$
Calculate $\mathbb{P}(X_n = 1 \mid X_0 = 1)$ in each of the following cases: (a) $p = 1/16$, (b) $p = 1/6$, (c) $p = 1/12$.
1.2 Class structure
It is sometimes possible to break a Markov chain into smaller pieces, each
of which is relatively easy to understand, and which together give an un-
derstanding of the whole. This is done by identifying the communicating
classes of the chain.
We say that $i$ leads to $j$ and write $i \to j$ if
$$\mathbb{P}_i(X_n = j \text{ for some } n \ge 0) > 0.$$
We say $i$ communicates with $j$ and write $i \leftrightarrow j$ if both $i \to j$ and $j \to i$.
Theorem 1.2.1. For distinct states $i$ and $j$ the following are equivalent:
(i) $i \to j$;
(ii) $p_{i_0 i_1} p_{i_1 i_2}\cdots p_{i_{n-1} i_n} > 0$ for some states $i_0, i_1, \dots, i_n$ with $i_0 = i$ and $i_n = j$;
(iii) $p_{ij}^{(n)} > 0$ for some $n \ge 0$.
Proof. Observe that
$$p_{ij}^{(n)} \le \mathbb{P}_i(X_n = j \text{ for some } n \ge 0) \le \sum_{n=0}^{\infty} p_{ij}^{(n)}$$
which proves the equivalence of (i) and (iii). Also
$$p_{ij}^{(n)} = \sum_{i_1, \dots, i_{n-1}} p_{i i_1} p_{i_1 i_2}\cdots p_{i_{n-1} j}$$
so that (ii) and (iii) are equivalent. $\square$
It is clear from (ii) that $i \to j$ and $j \to k$ imply $i \to k$. Also $i \to i$ for any state $i$. So $\leftrightarrow$ satisfies the conditions for an equivalence relation on $I$, and thus partitions $I$ into communicating classes. We say that a class $C$ is closed if
$$i \in C,\ i \to j \quad\text{imply}\quad j \in C.$$
Thus a closed class is one from which there is no escape. A state i is
absorbing if {i} is a closed class. The smaller pieces referred to above are
these communicating classes. A chain or transition matrix P where I is a
single class is called irreducible.
As the following example makes clear, when one can draw the diagram,
the class structure of a chain is very easy to find.
Example 1.2.2
Find the communicating classes associated to the stochastic matrix
$$P = \begin{pmatrix} \frac12 & \frac12 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ \frac13 & 0 & 0 & \frac13 & \frac13 & 0 \\ 0 & 0 & 0 & \frac12 & \frac12 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & 0 \end{pmatrix}$$
The solution is obvious from the diagram
[diagram: states 1, 2, 3, 4, 5, 6 with arrows given by the positive entries of P]
the classes being {1,2,3}, {4} and {5,6}, with only {5,6} being closed.
Exercises
1.2.1 Identify the communicating classes of the following transition matrix:
$$P = \begin{pmatrix} \frac12 & 0 & 0 & 0 & \frac12 \\ 0 & \frac12 & 0 & \frac12 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & \frac14 & \frac14 & \frac14 & \frac14 \\ \frac12 & 0 & 0 & 0 & \frac12 \end{pmatrix}$$
Which classes are closed?
1.2.2 Show that every transition matrix on a finite state-space has at least
one closed communicating class. Find an example of a transition matrix
with no closed communicating class.
1.3 Hitting times and absorption probabilities
Let $(X_n)_{n\ge 0}$ be a Markov chain with transition matrix $P$. The hitting time of a subset $A$ of $I$ is the random variable $H^A : \Omega \to \{0, 1, 2, \dots\}\cup\{\infty\}$ given by
$$H^A(\omega) = \inf\{n \ge 0 : X_n(\omega) \in A\}$$
where we agree that the infimum of the empty set $\emptyset$ is $\infty$. The probability starting from $i$ that $(X_n)_{n\ge 0}$ ever hits $A$ is then
$$h_i^A = \mathbb{P}_i(H^A < \infty).$$
When $A$ is a closed class, $h_i^A$ is called the absorption probability. The mean time taken for $(X_n)_{n\ge 0}$ to reach $A$ is given by
$$k_i^A = \mathbb{E}_i(H^A) = \sum_{n<\infty} n\,\mathbb{P}_i(H^A = n) + \infty\,\mathbb{P}_i(H^A = \infty).$$
We shall often write less formally
$$h_i^A = \mathbb{P}_i(\text{hit } A), \qquad k_i^A = \mathbb{E}_i(\text{time to hit } A).$$
Remarkably, these quantities can be calculated explicitly by means of cer-
tain linear equations associated with the transition matrix P. Before we
give the general theory, here is a simple example.
Example 1.3.1
Consider the chain with the following diagram:
[diagram: states 1, 2, 3, 4 in a line; 1 and 4 are absorbing; from 2 jump to 1 or 3 with probability 1/2 each, and from 3 jump to 2 or 4 with probability 1/2 each]
Starting from 2, what is the probability of absorption in 4? How long does
it take until the chain is absorbed in 1 or 4?
Introduce
$$h_i = \mathbb{P}_i(\text{hit } 4), \qquad k_i = \mathbb{E}_i(\text{time to hit } \{1, 4\}).$$
Clearly, $h_1 = 0$, $h_4 = 1$ and $k_1 = k_4 = 0$. Suppose now that we start at 2, and consider the situation after making one step. With probability 1/2 we jump to 1 and with probability 1/2 we jump to 3. So
$$h_2 = \tfrac12 h_1 + \tfrac12 h_3, \qquad k_2 = 1 + \tfrac12 k_1 + \tfrac12 k_3.$$
The 1 appears in the second formula because we count the time for the first step. Similarly,
$$h_3 = \tfrac12 h_2 + \tfrac12 h_4, \qquad k_3 = 1 + \tfrac12 k_2 + \tfrac12 k_4.$$
Hence
$$h_2 = \tfrac12 h_3 = \tfrac12\left(\tfrac12 h_2 + \tfrac12\right), \qquad k_2 = 1 + \tfrac12 k_3 = 1 + \tfrac12\left(1 + \tfrac12 k_2\right).$$
So, starting from 2, the probability of hitting 4 is 1/3 and the mean time to absorption is 2. Note that in writing down the first equations for $h_2$ and $k_2$ we made implicit use of the Markov property, in assuming that the chain begins afresh from its new position after the first jump. Here is a general
result for hitting probabilities.
Theorem 1.3.2. The vector of hitting probabilities $h^A = (h_i^A : i \in I)$ is the minimal non-negative solution to the system of linear equations
$$\begin{cases} h_i^A = 1 & \text{for } i \in A \\ h_i^A = \sum_{j\in I} p_{ij}h_j^A & \text{for } i \notin A.\end{cases} \tag{1.3}$$
(Minimality means that if $x = (x_i : i \in I)$ is another solution with $x_i \ge 0$ for all $i$, then $x_i \ge h_i^A$ for all $i$.)
Proof. First we show that $h^A$ satisfies (1.3). If $X_0 = i \in A$, then $H^A = 0$, so $h_i^A = 1$. If $X_0 = i \notin A$, then $H^A \ge 1$, so by the Markov property
$$\mathbb{P}_i(H^A < \infty \mid X_1 = j) = \mathbb{P}_j(H^A < \infty) = h_j^A$$
and
$$h_i^A = \mathbb{P}_i(H^A < \infty) = \sum_{j\in I}\mathbb{P}_i(H^A < \infty, X_1 = j) = \sum_{j\in I}\mathbb{P}_i(H^A < \infty \mid X_1 = j)\,\mathbb{P}_i(X_1 = j) = \sum_{j\in I} p_{ij}h_j^A.$$
Suppose now that $x = (x_i : i \in I)$ is any solution to (1.3). Then $h_i^A = x_i = 1$ for $i \in A$. Suppose $i \notin A$, then
$$x_i = \sum_{j\in I} p_{ij}x_j = \sum_{j\in A} p_{ij} + \sum_{j\notin A} p_{ij}x_j.$$
Substitute for $x_j$ to obtain
$$x_i = \sum_{j\in A} p_{ij} + \sum_{j\notin A} p_{ij}\Big(\sum_{k\in A} p_{jk} + \sum_{k\notin A} p_{jk}x_k\Big) = \mathbb{P}_i(X_1\in A) + \mathbb{P}_i(X_1\notin A, X_2\in A) + \sum_{j\notin A}\sum_{k\notin A} p_{ij}p_{jk}x_k.$$
By repeated substitution for $x$ in the final term we obtain after $n$ steps
$$x_i = \mathbb{P}_i(X_1\in A) + \cdots + \mathbb{P}_i(X_1\notin A, \dots, X_{n-1}\notin A, X_n\in A) + \sum_{j_1\notin A}\cdots\sum_{j_n\notin A} p_{ij_1}p_{j_1j_2}\cdots p_{j_{n-1}j_n}x_{j_n}.$$
Now if $x$ is non-negative, so is the last term on the right, and the remaining terms sum to $\mathbb{P}_i(H^A \le n)$. So $x_i \ge \mathbb{P}_i(H^A \le n)$ for all $n$ and then
$$x_i \ge \lim_{n\to\infty}\mathbb{P}_i(H^A \le n) = \mathbb{P}_i(H^A < \infty) = h_i^A. \qquad\square$$
Example 1.3.1 (continued)
The system of linear equations (1.3) for $h = h^{\{4\}}$ are given here by
$$h_4 = 1, \qquad h_2 = \tfrac12 h_1 + \tfrac12 h_3, \qquad h_3 = \tfrac12 h_2 + \tfrac12 h_4$$
so that
$$h_2 = \tfrac12 h_1 + \tfrac12\left(\tfrac12 h_2 + \tfrac12\right)$$
and
$$h_2 = \tfrac13 + \tfrac23 h_1.$$
The value of $h_1$ is not determined by the system (1.3), but the minimality condition now makes us take $h_1 = 0$, so we recover $h_2 = 1/3$ as before. Of course, the extra boundary condition $h_1 = 0$ was obvious from the beginning
so we built it into our system of equations and did not have to worry about
minimal non-negative solutions.
In cases where the state-space is infinite it may not be possible to write
down a corresponding extra boundary condition. Then, as we shall see in
the next examples, the minimality condition is essential.
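For a finite chain the system (1.3), and the analogous system (1.4) for mean hitting times in Theorem 1.3.5 below, can be solved as ordinary linear equations once the boundary values implied by minimality are fixed. The following Python sketch does this for the four-state chain of Example 1.3.1; it is an illustration only and assumes NumPy.

```python
import numpy as np

# Example 1.3.1: states 1, 2, 3, 4; 1 and 4 absorbing;
# from 2 jump to 1 or 3 with prob 1/2 each; from 3 jump to 2 or 4.
P = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 0.0, 1.0]])

transient = [1, 2]                     # 0-based indices of states 2 and 3
Q = P[np.ix_(transient, transient)]

# h_i = P_i(hit state 4): boundary values h_4 = 1 and, by minimality, h_1 = 0.
b_h = P[np.ix_(transient, [3])].sum(axis=1)   # one-step probability into {4}
h = np.linalg.solve(np.eye(2) - Q, b_h)
print(h)   # [1/3, 2/3] for states 2 and 3

# k_i = E_i(time to hit {1, 4}): k = 1 + Q k on the transient states.
k = np.linalg.solve(np.eye(2) - Q, np.ones(2))
print(k)   # [2, 2]
```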
Example 1.3.3 (Gamblers' ruin)
Consider the Markov chain with diagram

[diagram: states 0, 1, 2, ...; 0 is absorbing; from each $i \ge 1$ jump down to $i-1$ with probability $q$ and up to $i+1$ with probability $p$]
where $0 < p = 1-q < 1$. The transition probabilities are
$$p_{00} = 1, \qquad p_{i,i-1} = q, \quad p_{i,i+1} = p \quad\text{for } i = 1, 2, \dots.$$
Imagine that you enter a casino with a fortune of £i and gamble, £1 at a
time, with probability P of doubling your stake and probability q of losing
it. The resources of the casino are regarded as infinite, so there is no upper
limit to your fortune. But what is the probability that you leave broke?
Set $h_i = \mathbb{P}_i(\text{hit } 0)$, then $h$ is the minimal non-negative solution to
$$h_0 = 1, \qquad h_i = p h_{i+1} + q h_{i-1} \quad\text{for } i = 1, 2, \dots.$$
If $p \ne q$ this recurrence relation has a general solution
$$h_i = A + B\left(\frac{q}{p}\right)^i.$$
(See Section 1.11.) If $p < q$, which is the case in most successful casinos, then the restriction $0 \le h_i \le 1$ forces $B = 0$, so $h_i = 1$ for all $i$. If $p > q$, then since $h_0 = 1$ we get a family of solutions
$$h_i = \left(\frac{q}{p}\right)^i + A\left(1 - \left(\frac{q}{p}\right)^i\right);$$
for a non-negative solution we must have $A \ge 0$, so the minimal non-negative solution is $h_i = (q/p)^i$. Finally, if $p = q$ the recurrence relation has a general solution
$$h_i = A + Bi$$
and again the restriction $0 \le h_i \le 1$ forces $B = 0$, so $h_i = 1$ for all $i$.
Thus, even if you find a fair casino, you are certain to end up broke. This
apparent paradox is called gamblers' ruin.
Example 1.3.4 (Birth-and-death chain)
Consider the Markov chain with diagram
[diagram: states 0, 1, 2, ...; from each $i \ge 1$ jump down to $i-1$ with probability $q_i$ and up to $i+1$ with probability $p_i$]
where, for $i = 1, 2, \dots$, we have $0 < p_i = 1 - q_i < 1$. As in the preceding example, 0 is an absorbing state and we wish to calculate the absorption probability starting from $i$. But here we allow $p_i$ and $q_i$ to depend on $i$. Such a chain may serve as a model for the size of a population, recorded each time it changes, $p_i$ being the probability that we get a birth before a death in a population of size $i$. Then $h_i = \mathbb{P}_i(\text{hit } 0)$ is the extinction probability starting from $i$.
We write down the usual system of equations
$$h_0 = 1, \qquad h_i = p_i h_{i+1} + q_i h_{i-1} \quad\text{for } i = 1, 2, \dots.$$
This recurrence relation has variable coefficients so the usual technique fails. But consider $u_i = h_{i-1} - h_i$, then $p_i u_{i+1} = q_i u_i$, so
$$u_{i+1} = \left(\frac{q_i}{p_i}\right)u_i = \left(\frac{q_i q_{i-1}\cdots q_1}{p_i p_{i-1}\cdots p_1}\right)u_1 = \gamma_i u_1$$
where the final equality defines $\gamma_i$. Then
$$u_1 + \cdots + u_i = h_0 - h_i$$
so
$$h_i = 1 - A(\gamma_0 + \cdots + \gamma_{i-1})$$
where $A = u_1$ and $\gamma_0 = 1$. At this point $A$ remains to be determined. In the case $\sum_{i=0}^{\infty}\gamma_i = \infty$, the restriction $0 \le h_i \le 1$ forces $A = 0$ and $h_i = 1$ for all $i$. But if $\sum_{i=0}^{\infty}\gamma_i < \infty$ then we can take $A > 0$ so long as
$$1 - A(\gamma_0 + \cdots + \gamma_{i-1}) \ge 0 \quad\text{for all } i.$$
Thus the minimal non-negative solution occurs when $A = \left(\sum_{i=0}^{\infty}\gamma_i\right)^{-1}$ and then
$$h_i = \sum_{j=i}^{\infty}\gamma_j \Big/ \sum_{j=0}^{\infty}\gamma_j.$$
In this case, for $i = 1, 2, \dots$, we have $h_i < 1$, so the population survives with positive probability.
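The formula $h_i = \sum_{j\ge i}\gamma_j / \sum_{j\ge 0}\gamma_j$ is easy to evaluate numerically once a concrete family $(p_i)$ is chosen. The sketch below truncates the infinite sums; the particular choice $p_i = i^2/(i^2+1)$, which gives $q_i/p_i = 1/i^2$ and hence $\sum\gamma_i < \infty$, is an assumption made here for illustration, not something taken from the text.

```python
def gammas(p, n_terms):
    """gamma_0 = 1, gamma_i = (q_1 ... q_i)/(p_1 ... p_i)."""
    g = [1.0]
    for i in range(1, n_terms):
        pi = p(i)
        g.append(g[-1] * (1.0 - pi) / pi)
    return g

p = lambda i: i * i / (i * i + 1.0)   # assumed birth probabilities: q_i/p_i = 1/i^2
g = gammas(p, 10000)                  # truncate the infinite sums
total = sum(g)

def h(i):
    # h_i = sum_{j >= i} gamma_j / sum_{j >= 0} gamma_j  (extinction probability)
    return sum(g[i:]) / total

print([round(h(i), 4) for i in range(1, 6)])   # strictly less than 1
```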
Here is the general result on mean hitting times. Recall that $k_i^A = \mathbb{E}_i(H^A)$, where $H^A$ is the first time $(X_n)_{n\ge 0}$ hits $A$. We use the notation $1_B$ for the indicator function of $B$, so, for example, $1_{X_1 = j}$ is the random variable equal to 1 if $X_1 = j$ and equal to 0 otherwise.
Theorem 1.3.5. The vector of mean hitting times $k^A = (k_i^A : i \in I)$ is the minimal non-negative solution to the system of linear equations
$$\begin{cases} k_i^A = 0 & \text{for } i \in A \\ k_i^A = 1 + \sum_{j\notin A} p_{ij} k_j^A & \text{for } i \notin A.\end{cases} \tag{1.4}$$
Proof. First we show that $k^A$ satisfies (1.4). If $X_0 = i \in A$, then $H^A = 0$, so $k_i^A = 0$. If $X_0 = i \notin A$, then $H^A \ge 1$, so, by the Markov property,
$$\mathbb{E}_i(H^A \mid X_1 = j) = 1 + \mathbb{E}_j(H^A)$$
and
$$k_i^A = \mathbb{E}_i(H^A) = \sum_{j\in I}\mathbb{E}_i(H^A 1_{X_1=j}) = \sum_{j\in I}\mathbb{E}_i(H^A \mid X_1 = j)\,\mathbb{P}_i(X_1 = j) = 1 + \sum_{j\notin A} p_{ij} k_j^A.$$
Suppose now that $y = (y_i : i \in I)$ is any solution to (1.4). Then $k_i^A = y_i = 0$ for $i \in A$. If $i \notin A$, then
$$y_i = 1 + \sum_{j\notin A} p_{ij} y_j = 1 + \sum_{j\notin A} p_{ij}\Big(1 + \sum_{k\notin A} p_{jk} y_k\Big) = \mathbb{P}_i(H^A \ge 1) + \mathbb{P}_i(H^A \ge 2) + \sum_{j\notin A}\sum_{k\notin A} p_{ij} p_{jk} y_k.$$
By repeated substitution for $y$ in the final term we obtain after $n$ steps
$$y_i = \mathbb{P}_i(H^A \ge 1) + \cdots + \mathbb{P}_i(H^A \ge n) + \sum_{j_1\notin A}\cdots\sum_{j_n\notin A} p_{ij_1} p_{j_1 j_2}\cdots p_{j_{n-1} j_n} y_{j_n}.$$
So, if $y$ is non-negative,
$$y_i \ge \mathbb{P}_i(H^A \ge 1) + \cdots + \mathbb{P}_i(H^A \ge n)$$
and, letting $n \to \infty$,
$$y_i \ge \sum_{n=1}^{\infty}\mathbb{P}_i(H^A \ge n) = \mathbb{E}_i(H^A) = k_i^A. \qquad\square$$
Exercises
1.3.1 Prove the claims (a), (b) and (c) made in example (v) of the Intro-
duction.
1.3.2 A gambler has £2 and needs to increase it to £10 in a hurry. He
can play a game with the following rules: a fair coin is tossed; if a player
bets on the right side, he wins a sum equal to his stake, and his stake is
returned; otherwise he loses his stake. The gambler decides to use a bold
strategy in which he stakes all his money if he has £5 or less, and otherwise stakes just enough to increase his capital, if he wins, to £10.
Let $X_0 = 2$ and let $X_n$ be his capital after $n$ throws. Prove that the gambler will achieve his aim with probability 1/5.
What is the expected number of tosses until the gambler either achieves
his aim or loses his capital?
1.3.3 A simple game of 'snakes and ladders' is played on a board of nine
squares.
At each turn a player tosses a fair coin and advances one or two places
according to whether the coin lands heads or tails. If you land at the foot
of a ladder you climb to the top, but if you land at the head of a snake you
slide down to the tail. How many turns on average does it take to complete
the game?
What is the probability that a player who has reached the middle square
will complete the game without slipping back to square 1?
1.3.4 Let $(X_n)_{n\ge 0}$ be a Markov chain on $\{0, 1, \dots\}$ with transition probabilities given by
$$p_{01} = 1, \qquad p_{i,i+1} + p_{i,i-1} = 1, \qquad p_{i,i+1} = \left(\frac{i+1}{i}\right)^2 p_{i,i-1}, \quad i \ge 1.$$
Show that if $X_0 = 0$ then the probability that $X_n \ge 1$ for all $n \ge 1$ is $6/\pi^2$.
1.4 Strong Markov property
In Section 1.1 we proved the Markov property. This says that for each time $m$, conditional on $X_m = i$, the process after time $m$ begins afresh from $i$. Suppose, instead of conditioning on $X_m = i$, we simply waited for the process to hit state $i$, at some random time $H$. What can one say about the process after time $H$? What if we replaced $H$ by a more general random time, for example $H - 1$? In this section we shall identify a class of random times at which a version of the Markov property does hold. This class will include $H$ but not $H - 1$; after all, the process after time $H - 1$ jumps straight to $i$, so it does not simply begin afresh.
A random variable $T : \Omega \to \{0, 1, 2, \dots\}\cup\{\infty\}$ is called a stopping time if the event $\{T = n\}$ depends only on $X_0, X_1, \dots, X_n$ for $n = 0, 1, 2, \dots$.
Intuitively, by watching the process, you know at the time when T occurs.
If asked to stop at T, you know when to stop.
Examples 1.4.1
(a) The first passage time
$$T_j = \inf\{n \ge 1 : X_n = j\}$$
is a stopping time because
$$\{T_j = n\} = \{X_1 \ne j, \dots, X_{n-1} \ne j, X_n = j\}.$$
(b) The first hitting time $H^A$ of Section 1.3 is a stopping time because
$$\{H^A = n\} = \{X_0 \notin A, \dots, X_{n-1} \notin A, X_n \in A\}.$$
(c) The last exit time
$$L^A = \sup\{n \ge 0 : X_n \in A\}$$
is not in general a stopping time because the event $\{L^A = n\}$ depends on whether $(X_{n+m})_{m\ge 1}$ visits $A$ or not.
We shall show that the Markov property holds at stopping times. The crucial point is that, if $T$ is a stopping time and $B$ is an event determined by $X_0, X_1, \dots, X_T$, then $B \cap \{T = m\}$ is determined by $X_0, X_1, \dots, X_m$, for all $m = 0, 1, 2, \dots$.
Theorem 1.4.2 (Strong Markov property). Let $(X_n)_{n\ge 0}$ be Markov$(\lambda, P)$ and let $T$ be a stopping time of $(X_n)_{n\ge 0}$. Then, conditional on $T < \infty$ and $X_T = i$, $(X_{T+n})_{n\ge 0}$ is Markov$(\delta_i, P)$ and independent of $X_0, X_1, \dots, X_T$.
Proof. If $B$ is an event determined by $X_0, X_1, \dots, X_T$, then $B \cap \{T = m\}$ is determined by $X_0, X_1, \dots, X_m$, so, by the Markov property at time $m$
$$\mathbb{P}(\{X_T = j_0, X_{T+1} = j_1, \dots, X_{T+n} = j_n\}\cap B\cap\{T = m\}\cap\{X_T = i\})$$
$$= \mathbb{P}_i(X_0 = j_0, X_1 = j_1, \dots, X_n = j_n)\,\mathbb{P}(B\cap\{T = m\}\cap\{X_T = i\})$$
where we have used the condition $T = m$ to replace $m$ by $T$. Now sum over $m = 0, 1, 2, \dots$ and divide by $\mathbb{P}(T < \infty, X_T = i)$ to obtain
$$\mathbb{P}(\{X_T = j_0, X_{T+1} = j_1, \dots, X_{T+n} = j_n\}\cap B \mid T < \infty, X_T = i)$$
$$= \mathbb{P}_i(X_0 = j_0, X_1 = j_1, \dots, X_n = j_n)\,\mathbb{P}(B \mid T < \infty, X_T = i). \qquad\square$$
The following example uses the strong Markov property to get more
information on the hitting times of the chain considered in Example 1.3.3.
Example 1.4.3
Consider the Markov chain with diagram

[diagram: states 0, 1, 2, ...; 0 is absorbing; from each $i \ge 1$ jump down to $i-1$ with probability $q$ and up to $i+1$ with probability $p$]
where 0 < p = 1 - q < 1. We know from Example 1.3.3 the probability of
hitting 0 starting from 1. Here we obtain the complete distribution of the
time to hit 0 starting from 1 in terms of its probability generating function.
Set
$$H_j = \inf\{n \ge 0 : X_n = j\}$$
and, for $0 \le s < 1$,
$$\phi(s) = \mathbb{E}_1(s^{H_0}) = \sum_{n<\infty} s^n\,\mathbb{P}_1(H_0 = n).$$
Suppose we start at 2. Apply the strong Markov property at $H_1$ to see that under $\mathbb{P}_2$, conditional on $H_1 < \infty$, we have $H_0 = H_1 + \widetilde H_0$, where $\widetilde H_0$, the time taken after $H_1$ to get to 0, is independent of $H_1$ and has the (unconditioned) distribution of $H_1$. So
$$\mathbb{E}_2(s^{H_0}) = \mathbb{E}_2(s^{H_1}\mid H_1 < \infty)\,\mathbb{E}_2(s^{\widetilde H_0}\mid H_1 < \infty)\,\mathbb{P}_2(H_1 < \infty) = \mathbb{E}_2(s^{H_1}1_{H_1<\infty})\,\mathbb{E}_2(s^{\widetilde H_0}\mid H_1 < \infty) = \mathbb{E}_2(s^{H_1})^2 = \phi(s)^2.$$
Then, by the Markov property at time 1, conditional on $X_1 = 2$, we have $H_0 = 1 + \widetilde H_0$, where $\widetilde H_0$, the time taken after time 1 to get to 0, has the same distribution as $H_0$ does under $\mathbb{P}_2$. So
$$\phi(s) = \mathbb{E}_1(s^{H_0}) = p\,\mathbb{E}_1(s^{H_0}\mid X_1 = 2) + q\,\mathbb{E}_1(s^{H_0}\mid X_1 = 0) = p\,\mathbb{E}_1(s^{1+\widetilde H_0}\mid X_1 = 2) + qs = ps\,\mathbb{E}_2(s^{H_0}) + qs = ps\,\phi(s)^2 + qs.$$
Thus $\phi = \phi(s)$ satisfies
$$ps\phi^2 - \phi + qs = 0 \tag{1.5}$$
and
$$\phi = \left(1 \pm \sqrt{1 - 4pqs^2}\right)\big/2ps.$$
Since $\phi(0) \le 1$ and $\phi$ is continuous we are forced to take the negative root at $s = 0$ and stick with it for all $0 \le s < 1$.
To recover the distribution of $H_0$ we expand the square-root as a power series:
$$\phi(s) = \frac{1}{2ps}\left\{1 - \left(1 + \tfrac12(-4pqs^2) + \tfrac12\left(-\tfrac12\right)(-4pqs^2)^2/2! + \cdots\right)\right\} = qs + pq^2s^3 + \cdots = s\,\mathbb{P}_1(H_0 = 1) + s^2\,\mathbb{P}_1(H_0 = 2) + s^3\,\mathbb{P}_1(H_0 = 3) + \cdots.$$
The first few probabilities $\mathbb{P}_1(H_0 = 1), \mathbb{P}_1(H_0 = 2), \dots$ are readily checked from first principles.
On letting $s \uparrow 1$ we have $\phi(s) \to \mathbb{P}_1(H_0 < \infty)$, so
$$\mathbb{P}_1(H_0 < \infty) = \frac{1 - \sqrt{1-4pq}}{2p} = \begin{cases} 1 & \text{if } p \le q \\ q/p & \text{if } p > q.\end{cases}$$
(Remember that $q = 1 - p$, so
$$\sqrt{1-4pq} = \sqrt{1 - 4p + 4p^2} = |1-2p| = |2q-1|.)$$
We can also find the mean hitting time using
$$\mathbb{E}_1(H_0) = \lim_{s\uparrow 1}\phi'(s).$$
It is only worth considering the case $p \le q$, where the mean hitting time has a chance of being finite. Differentiate (1.5) to obtain
$$2ps\phi\phi' + p\phi^2 - \phi' + q = 0$$
so
$$\phi'(s) = (p\phi(s)^2 + q)/(1 - 2ps\phi(s)) \to 1/(1-2p) = 1/(q-p) \quad\text{as } s \uparrow 1.$$
See Example 5.1.1 for a connection with branching processes.
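A rough Monte Carlo check of these conclusions is straightforward: simulate the walk started from 1 and record the time to reach 0. In the minimal Python sketch below, the parameter values and the cap on path length are arbitrary choices made here for illustration.

```python
import random

p, q = 0.3, 0.7          # p < q, so P_1(H_0 < infinity) = 1 and E_1(H_0) = 1/(q - p)
trials, cap = 20000, 10000
times = []
for _ in range(trials):
    x, n = 1, 0
    while x > 0 and n < cap:
        x += 1 if random.random() < p else -1
        n += 1
    times.append(n)

print(sum(times) / trials)   # close to 1/(q - p) = 2.5
```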
Example 1.4.4
We now consider an application of the strong Markov property to a Markov chain $(X_n)_{n\ge 0}$ observed only at certain times. In the first instance suppose that $J$ is some subset of the state-space $I$ and that we observe the chain only when it takes values in $J$. The resulting process $(Y_m)_{m\ge 0}$ may be obtained formally by setting $Y_m = X_{T_m}$, where
$$T_0 = \inf\{n \ge 0 : X_n \in J\}$$
and, for $m = 0, 1, 2, \dots$
$$T_{m+1} = \inf\{n > T_m : X_n \in J\}.$$
Let us assume that $\mathbb{P}(T_m < \infty) = 1$ for all $m$. For each $m$ we can check easily that $T_m$, the time of the $m$th visit to $J$, is a stopping time. So the strong Markov property applies to show, for $i_0, \dots, i_{m+1} \in J$, that
$$\mathbb{P}(Y_{m+1} = i_{m+1} \mid Y_0 = i_0, \dots, Y_m = i_m) = \mathbb{P}(X_{T_{m+1}} = i_{m+1} \mid X_{T_0} = i_0, \dots, X_{T_m} = i_m) = \mathbb{P}_{i_m}(X_{T_1} = i_{m+1}) = \overline p_{i_m i_{m+1}}$$
where, for $i, j \in J$,
$$\overline p_{ij} = h_i^j \tag{1.6}$$
and where, for $j \in J$, the vector $(h_i^j : i \in I)$ is the minimal non-negative solution to
$$h_i^j = p_{ij} + \sum_{k\notin J} p_{ik} h_k^j.$$
Thus $(Y_m)_{m\ge 0}$ is a Markov chain on $J$ with transition matrix $\overline P$.
A second example of a similar type arises if we observe the original chain only when it moves. The resulting process $(Z_m)_{m\ge 0}$ is given by $Z_m = X_{S_m}$, where $S_0 = 0$ and, for $m = 0, 1, 2, \dots$
$$S_{m+1} = \inf\{n \ge S_m : X_n \ne X_{S_m}\}.$$
Let us assume there are no absorbing states. Again the random times $S_m$ for $m \ge 0$ are stopping times and, by the strong Markov property
$$\mathbb{P}(Z_{m+1} = i_{m+1} \mid Z_0 = i_0, \dots, Z_m = i_m) = \mathbb{P}(X_{S_{m+1}} = i_{m+1} \mid X_{S_0} = i_0, \dots, X_{S_m} = i_m) = \mathbb{P}_{i_m}(X_{S_1} = i_{m+1}) = \widetilde p_{i_m i_{m+1}}$$
where $\widetilde p_{ii} = 0$ and, for $i \ne j$
$$\widetilde p_{ij} = p_{ij}\Big/\sum_{k\ne i} p_{ik}.$$
Thus $(Z_m)_{m\ge 0}$ is a Markov chain on $I$ with transition matrix $\widetilde P$.
Exercises
1.4.1 Let $Y_1, Y_2, \dots$ be independent identically distributed random variables with $\mathbb{P}(Y_1 = 1) = \mathbb{P}(Y_1 = -1) = 1/2$ and set $X_0 = 1$, $X_n = X_0 + Y_1 + \cdots + Y_n$ for $n \ge 1$. Define
$$H_0 = \inf\{n \ge 0 : X_n = 0\}.$$
Find the probability generating function $\phi(s) = \mathbb{E}(s^{H_0})$.
Suppose the distribution of $Y_1, Y_2, \dots$ is changed to $\mathbb{P}(Y_1 = 2) = 1/2$, $\mathbb{P}(Y_1 = -1) = 1/2$. Show that $\phi$ now satisfies
$$s\phi^3 - 2\phi + s = 0.$$
1.4.2 Deduce carefully from Theorem 1.3.2 the claim made at (1.6).
1.5 Recurrence and transience
Let $(X_n)_{n\ge 0}$ be a Markov chain with transition matrix $P$. We say that a state $i$ is recurrent if
$$\mathbb{P}_i(X_n = i \text{ for infinitely many } n) = 1.$$
We say that $i$ is transient if
$$\mathbb{P}_i(X_n = i \text{ for infinitely many } n) = 0.$$
Thus a recurrent state is one to which you keep coming back and a transient state is one which you eventually leave for ever. We shall show that every state is either recurrent or transient.
Recall that the first passage time to state $i$ is the random variable $T_i$ defined by
$$T_i(\omega) = \inf\{n \ge 1 : X_n(\omega) = i\}$$
where $\inf\emptyset = \infty$. We now define inductively the $r$th passage time $T_i^{(r)}$ to state $i$ by
$$T_i^{(0)} = 0, \qquad T_i^{(1)} = T_i$$
and, for $r = 0, 1, 2, \dots,$
$$T_i^{(r+1)} = \inf\{n > T_i^{(r)} : X_n = i\}.$$
The length of the $r$th excursion to $i$ is then
$$S_i^{(r)} = \begin{cases} T_i^{(r)} - T_i^{(r-1)} & \text{if } T_i^{(r-1)} < \infty \\ 0 & \text{otherwise.}\end{cases}$$
The following diagram illustrates these definitions:
[diagram: a sample path of $(X_n)_{n\ge 0}$ with the successive passage times to $i$ marked on the time axis]
Our analysis of recurrence and transience will rest on finding the joint distribution of these excursion lengths.
Lemma 1.5.1. For $r = 2, 3, \dots$, conditional on $T_i^{(r-1)} < \infty$, $S_i^{(r)}$ is independent of $\{X_m : m \le T_i^{(r-1)}\}$ and
$$\mathbb{P}(S_i^{(r)} = n \mid T_i^{(r-1)} < \infty) = \mathbb{P}_i(T_i = n).$$
Proof. Apply the strong Markov property at the stopping time $T = T_i^{(r-1)}$. It is automatic that $X_T = i$ on $T < \infty$. So, conditional on $T < \infty$, $(X_{T+n})_{n\ge 0}$ is Markov$(\delta_i, P)$ and independent of $X_0, X_1, \dots, X_T$. But
$$S_i^{(r)} = \inf\{n \ge 1 : X_{T+n} = i\},$$
so $S_i^{(r)}$ is the first passage time of $(X_{T+n})_{n\ge 0}$ to state $i$. $\square$
Recall that the indicator function $1_{\{X_1 = j\}}$ is the random variable equal to 1 if $X_1 = j$ and 0 otherwise. Let us introduce the number of visits $V_i$ to $i$, which may be written in terms of indicator functions as
$$V_i = \sum_{n=0}^{\infty} 1_{\{X_n = i\}}$$
and note that
$$\mathbb{E}_i(V_i) = \mathbb{E}_i\sum_{n=0}^{\infty} 1_{\{X_n = i\}} = \sum_{n=0}^{\infty}\mathbb{E}_i(1_{\{X_n = i\}}) = \sum_{n=0}^{\infty}\mathbb{P}_i(X_n = i) = \sum_{n=0}^{\infty} p_{ii}^{(n)}.$$
Also, we can compute the distribution of $V_i$ under $\mathbb{P}_i$ in terms of the return probability
$$f_i = \mathbb{P}_i(T_i < \infty).$$
Lemma 1.5.2. For $r = 0, 1, 2, \dots$, we have $\mathbb{P}_i(V_i > r) = f_i^r$.
Proof. Observe that if $X_0 = i$ then $\{V_i > r\} = \{T_i^{(r)} < \infty\}$. When $r = 0$ the result is true. Suppose inductively that it is true for $r$, then
$$\mathbb{P}_i(V_i > r+1) = \mathbb{P}_i(T_i^{(r+1)} < \infty) = \mathbb{P}_i(T_i^{(r)} < \infty \text{ and } S_i^{(r+1)} < \infty) = \mathbb{P}_i(S_i^{(r+1)} < \infty \mid T_i^{(r)} < \infty)\,\mathbb{P}_i(T_i^{(r)} < \infty) = f_i f_i^r = f_i^{r+1}$$
by Lemma 1.5.1, so by induction the result is true for all $r$. $\square$
Recall that one can compute the expectation of a non-negative integer-valued random variable as follows:
$$\sum_{r=0}^{\infty}\mathbb{P}(V > r) = \sum_{r=0}^{\infty}\sum_{v=r+1}^{\infty}\mathbb{P}(V = v) = \sum_{v=1}^{\infty}\sum_{r=0}^{v-1}\mathbb{P}(V = v) = \sum_{v=1}^{\infty} v\,\mathbb{P}(V = v) = \mathbb{E}(V).$$
The next theorem is the means by which we establish recurrence or transience for a given state. Note that it provides two criteria for this, one in terms of the return probability, the other in terms of the $n$-step transition probabilities. Both are useful.
Theorem 1.5.3. The following dichotomy holds:
(i) if $\mathbb{P}_i(T_i < \infty) = 1$, then $i$ is recurrent and $\sum_{n=0}^{\infty} p_{ii}^{(n)} = \infty$;
(ii) if $\mathbb{P}_i(T_i < \infty) < 1$, then $i$ is transient and $\sum_{n=0}^{\infty} p_{ii}^{(n)} < \infty$.
In particular, every state is either transient or recurrent.
Proof. If $\mathbb{P}_i(T_i < \infty) = 1$, then, by Lemma 1.5.2,
$$\mathbb{P}_i(V_i = \infty) = \lim_{r\to\infty}\mathbb{P}_i(V_i > r) = 1$$
so $i$ is recurrent and
$$\sum_{n=0}^{\infty} p_{ii}^{(n)} = \mathbb{E}_i(V_i) = \infty.$$
On the other hand, if $f_i = \mathbb{P}_i(T_i < \infty) < 1$, then by Lemma 1.5.2
$$\sum_{n=0}^{\infty} p_{ii}^{(n)} = \mathbb{E}_i(V_i) = \sum_{r=0}^{\infty}\mathbb{P}_i(V_i > r) = \sum_{r=0}^{\infty} f_i^r = \frac{1}{1-f_i} < \infty$$
so $\mathbb{P}_i(V_i = \infty) = 0$ and $i$ is transient. $\square$
From this theorem we can go on to solve completely the problem of
recurrence or transience for Markov chains with finite state-space. Some
cases of infinite state-space are dealt with in the following chapter. First
we show that recurrence and transience are class properties.
Theorem 1.5.4. Let C be a communicating class. Then either all states
in C are transient or all are recurrent.
Proof. Take any pair of states i, j E C and suppose that i is transient.
There exist $n, m \ge 0$ with $p_{ij}^{(n)} > 0$ and $p_{ji}^{(m)} > 0$, and, for all $r \ge 0$
$$p_{ii}^{(n+r+m)} \ge p_{ij}^{(n)} p_{jj}^{(r)} p_{ji}^{(m)}$$
so
$$\sum_{r=0}^{\infty} p_{jj}^{(r)} \le \frac{1}{p_{ij}^{(n)} p_{ji}^{(m)}}\sum_{r=0}^{\infty} p_{ii}^{(n+r+m)} < \infty$$
by Theorem 1.5.3. Hence $j$ is also transient by Theorem 1.5.3. $\square$
In the light of this theorem it is natural to speak of a recurrent or transient
class.
Theorem 1.5.5. Every recurrent class is closed.
Proof. Let $C$ be a class which is not closed. Then there exist $i \in C$, $j \notin C$ and $m \ge 1$ with
$$\mathbb{P}_i(X_m = j) > 0.$$
Since we have
$$\mathbb{P}_i(\{X_m = j\}\cap\{X_n = i \text{ for infinitely many } n\}) = 0$$
this implies that
$$\mathbb{P}_i(X_n = i \text{ for infinitely many } n) < 1$$
so $i$ is not recurrent, and so neither is $C$. $\square$
Theorem 1.5.6. Every finite closed class is recurrent.
Proof. Suppose $C$ is closed and finite and that $(X_n)_{n\ge 0}$ starts in $C$. Then for some $i \in C$ we have
$$0 < \mathbb{P}(X_n = i \text{ for infinitely many } n) = \mathbb{P}(X_n = i \text{ for some } n)\,\mathbb{P}_i(X_n = i \text{ for infinitely many } n)$$
by the strong Markov property. This shows that $i$ is not transient, so $C$ is recurrent by Theorems 1.5.3 and 1.5.4. $\square$
It is easy to spot closed classes, so the transience or recurrence of finite
classes is easy to determine. For example, the only recurrent class in Ex-
ample 1.2.2 is {5, 6}, the others being transient. On the other hand, infinite
closed classes may be transient: see Examples 1.3.3 and 1.6.3.
We shall need the following result in Section 1.8. Remember that irre-
ducibility means that the chain can get from any state to any other, with
positive probability.
Theorem 1.5.7. Suppose $P$ is irreducible and recurrent. Then for all $j \in I$ we have $\mathbb{P}(T_j < \infty) = 1$.
Proof. By the Markov property we have
$$\mathbb{P}(T_j < \infty) = \sum_{i\in I}\mathbb{P}(X_0 = i)\,\mathbb{P}_i(T_j < \infty)$$
so it suffices to show $\mathbb{P}_i(T_j < \infty) = 1$ for all $i \in I$. Choose $m$ with $p_{ji}^{(m)} > 0$. By Theorem 1.5.3, we have
$$1 = \mathbb{P}_j(X_n = j \text{ for infinitely many } n) = \mathbb{P}_j(X_n = j \text{ for some } n \ge m+1) = \sum_{k\in I}\mathbb{P}_j(X_n = j \text{ for some } n \ge m+1 \mid X_m = k)\,\mathbb{P}_j(X_m = k) = \sum_{k\in I}\mathbb{P}_k(T_j < \infty)\,p_{jk}^{(m)}$$
where the final equality uses the Markov property. But $\sum_{k\in I} p_{jk}^{(m)} = 1$ so we must have $\mathbb{P}_i(T_j < \infty) = 1$. $\square$
Exercises
1.5.1 In Exercise 1.2.1, which states are recurrent and which are transient?
1.5.2 Show that, for the Markov chain $(X_n)_{n\ge 0}$ in Exercise 1.3.4, we have
$$\mathbb{P}(X_n \to \infty \text{ as } n \to \infty) = 1.$$
Suppose, instead, the transition probabilities satisfy
$$p_{i,i+1} = \left(\frac{i+1}{i}\right)^{\alpha} p_{i,i-1}.$$
For each $\alpha \in (0, \infty)$ find the value of $\mathbb{P}(X_n \to \infty \text{ as } n \to \infty)$.
1.5.3 (First passage decomposition). Denote by $T_j$ the first passage time to state $j$ and set
$$f_{ij}^{(n)} = \mathbb{P}_i(T_j = n).$$
Justify the identity
$$p_{ij}^{(n)} = \sum_{k=1}^{n} f_{ij}^{(k)} p_{jj}^{(n-k)} \quad\text{for } n \ge 1$$
and deduce that
$$P_{ij}(s) = \delta_{ij} + F_{ij}(s)P_{jj}(s)$$
where
$$P_{ij}(s) = \sum_{n=0}^{\infty} p_{ij}^{(n)} s^n, \qquad F_{ij}(s) = \sum_{n=0}^{\infty} f_{ij}^{(n)} s^n.$$
Hence show that $\mathbb{P}_i(T_i < \infty) = 1$ if and only if
$$\sum_{n=0}^{\infty} p_{ii}^{(n)} = \infty$$
without using Theorem 1.5.3.
1.5.4 A random sequence of non-negative integers $(F_n)_{n\ge 0}$ is obtained by setting $F_0 = 0$ and $F_1 = 1$ and, once $F_0, \dots, F_n$ are known, taking $F_{n+1}$ to be either the sum or the difference of $F_{n-1}$ and $F_n$, each with probability 1/2. Is $(F_n)_{n\ge 0}$ a Markov chain?
By considering the Markov chain $X_n = (F_{n-1}, F_n)$, find the probability that $(F_n)_{n\ge 0}$ reaches 3 before first returning to 0.
Draw enough of the flow diagram for $(X_n)_{n\ge 0}$ to establish a general pattern. Hence, using the strong Markov property, show that the hitting probability for $(1,1)$, starting from $(1,2)$, is $(3-\sqrt5)/2$.
Deduce that $(X_n)_{n\ge 0}$ is transient. Show that, moreover, with probability 1, $F_n \to \infty$ as $n \to \infty$.
1.6 Recurrence and transience of random walks
In the last section we showed that recurrence was a class property, that all
recurrent classes were closed and that all finite closed classes were recurrent.
So the only chains for which the question of recurrence remains interesting
are irreducible with infinite state-space. Here we shall study some simple
and fundamental examples of this type, making use of the following criterion
for recurrence from Theorem 1.5.3: a state $i$ is recurrent if and only if $\sum_{n=0}^{\infty} p_{ii}^{(n)} = \infty$.
Example 1.6.1 (Simple random walk on Z)
The simple random walk on Z has diagram
[diagram: states $\dots, i-1, i, i+1, \dots$ on $\mathbb{Z}$; from each $i$ jump to $i-1$ with probability $q$ and to $i+1$ with probability $p$]
where $0 < p = 1 - q < 1$. Suppose we start at 0. It is clear that we cannot return to 0 after an odd number of steps, so $p_{00}^{(2n+1)} = 0$ for all $n$. Any given sequence of steps of length $2n$ from 0 to 0 occurs with probability $p^nq^n$, there being $n$ steps up and $n$ steps down, and the number of such sequences is the number of ways of choosing the $n$ steps up from $2n$. Thus
$$p_{00}^{(2n)} = \binom{2n}{n} p^n q^n.$$
Stirling's formula provides a good approximation to $n!$ for large $n$: it is known that
$$n! \sim \sqrt{2\pi n}\,(n/e)^n \quad\text{as } n \to \infty$$
where $a_n \sim b_n$ means $a_n/b_n \to 1$. For a proof see W. Feller, An Introduction to Probability Theory and its Applications, Vol I (Wiley, New York, 3rd edition, 1968). At the end of this chapter we reproduce the argument used by Feller to show that
$$n! \sim A\sqrt{n}\,(n/e)^n$$
for some $A \in [1, \infty)$. The additional work needed to show $A = \sqrt{2\pi}$ is omitted, as this fact is unnecessary to our applications.
For the $n$-step transition probabilities we obtain
$$p_{00}^{(2n)} = \frac{(2n)!}{(n!)^2}(pq)^n \sim \frac{(4pq)^n}{A\sqrt{n/2}} \quad\text{as } n \to \infty.$$
In the symmetric case $p = q = 1/2$, so $4pq = 1$; then for some $N$ and all $n \ge N$ we have
$$p_{00}^{(2n)} \ge \frac{1}{2A\sqrt n}$$
so
$$\sum_{n=N}^{\infty} p_{00}^{(2n)} \ge \frac{1}{2A}\sum_{n=N}^{\infty}\frac{1}{\sqrt n} = \infty$$
which shows that the random walk is recurrent. On the other hand, if $p \ne q$ then $4pq = r < 1$, so by a similar argument, for some $N$
$$\sum_{n=N}^{\infty} p_{00}^{(n)} \le \frac{1}{A}\sum_{n=N}^{\infty} r^n < \infty$$
showing that the random walk is transient.
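The dichotomy is visible numerically: the partial sums of $p_{00}^{(2n)} = \binom{2n}{n}(pq)^n$ keep growing when $p = 1/2$ but settle down when $p \ne 1/2$. The following Python sketch, an illustration rather than part of the text, evaluates the terms on a log scale to avoid overflow.

```python
from math import exp, lgamma, log

def p00(n, p):
    """p_00^(2n) = C(2n, n) p^n q^n for the simple random walk on Z."""
    if n == 0:
        return 1.0
    q = 1.0 - p
    log_term = lgamma(2 * n + 1) - 2 * lgamma(n + 1) + n * log(p * q)
    return exp(log_term)

def partial_sum(p, N):
    return sum(p00(n, p) for n in range(N + 1))

# Partial sums grow without bound when p = 1/2 (recurrence) and
# level off when p != 1/2 (transience).
for N in (10, 100, 1000, 10000):
    print(N, round(partial_sum(0.5, N), 3), round(partial_sum(0.6, N), 3))
```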
Example 1.6.2 (Simple symmetric random walk on Z2)
The simple symmetric random walk on Z2 has diagram
[diagram: the walk jumps from each point of $\mathbb{Z}^2$ to each of its four nearest neighbours with probability 1/4]
and transition probabilities
$$p_{ij} = \begin{cases} 1/4 & \text{if } |i-j| = 1 \\ 0 & \text{otherwise.}\end{cases}$$
Suppose we start at 0. Let us call the walk $X_n$ and write $X_n^+$ and $X_n^-$ for the orthogonal projections of $X_n$ on the diagonal lines $y = \pm x$:
[diagram: the projections $X_n^+$ and $X_n^-$ of $X_n$ on the diagonals $y = x$ and $y = -x$]
Then $X_n^+$ and $X_n^-$ are independent simple symmetric random walks on $2^{-1/2}\mathbb{Z}$, and $X_n = 0$ if and only if $X_n^+ = 0 = X_n^-$. This makes it clear that for $X_n$ we have
$$p_{00}^{(2n)} = \left(\binom{2n}{n}\left(\frac12\right)^{2n}\right)^2 \sim \frac{2}{A^2 n} \quad\text{as } n \to \infty$$
by Stirling's formula. Then $\sum_{n=0}^{\infty} p_{00}^{(2n)} = \infty$ by comparison with $\sum_{n=0}^{\infty} 1/n$ and the walk is recurrent.
Example 1.6.3 (Simple symmetric random walk on Z3)
The transition probabilities of the simple symmetric random walk on Z3
are given by
$$p_{ij} = \begin{cases} 1/6 & \text{if } |i-j| = 1 \\ 0 & \text{otherwise.}\end{cases}$$
Thus the chain jumps to each of its nearest neighbours with equal probabil-
ity. Suppose we start at 0. We can only return to 0 after an even number $2n$ of steps. Of these $2n$ steps there must be $i$ up, $i$ down, $j$ north, $j$ south, $k$ east and $k$ west for some $i, j, k \ge 0$, with $i+j+k = n$. By counting the ways in which this can be done, we obtain
$$p_{00}^{(2n)} = \sum_{i+j+k=n}\frac{(2n)!}{(i!\,j!\,k!)^2}\left(\frac16\right)^{2n}.$$
Now
$$\sum_{i+j+k=n}\frac{n!}{i!\,j!\,k!}\left(\frac13\right)^n = 1$$
the left-hand side being the total probability of all the ways of placing $n$ balls randomly into three boxes. For the case where $n = 3m$, we have
$$\binom{n}{i\ j\ k} = \frac{n!}{i!\,j!\,k!} \le \frac{n!}{m!\,m!\,m!} = \binom{n}{m\ m\ m}$$
for all $i, j, k$, so
$$p_{00}^{(2n)} \le \binom{2n}{n}\left(\frac12\right)^{2n}\binom{n}{m\ m\ m}\left(\frac13\right)^n \sim \frac{1}{2A^3}\left(\frac{6}{n}\right)^{3/2} \quad\text{as } n \to \infty$$
by Stirling's formula. Hence, $\sum_{m=0}^{\infty} p_{00}^{(6m)} < \infty$ by comparison with $\sum_{n=0}^{\infty} n^{-3/2}$. But $p_{00}^{(6m)} \ge (1/6)^2 p_{00}^{(6m-2)}$ and $p_{00}^{(6m)} \ge (1/6)^4 p_{00}^{(6m-4)}$ for all $m$, so we must have
$$\sum_{n=0}^{\infty} p_{00}^{(n)} < \infty$$
and the walk is transient.
Exercises
1.6.1 The rooted binary tree is an infinite graph T with one distinguished
vertex R from which comes a single edge; at every other vertex there are
three edges and there are no closed loops. The random walk on T jumps
from a vertex along each available edge with equal probability. Show that
the random walk is transient.
1.6.2 Show that the simple symmetric random walk in Z4 is transient.
1.7 Invariant distributions
Many of the long-time properties of Markov chains are connected with the
notion of an invariant distribution or measure. Remember that a measure
$\lambda$ is any row vector $(\lambda_i : i \in I)$ with non-negative entries. We say $\lambda$ is invariant if
$$\lambda P = \lambda.$$
The terms equilibrium and stationary are also used to mean the same. The
first result explains the term stationary.
Theorem 1.7.1. Let $(X_n)_{n\ge 0}$ be Markov$(\lambda, P)$ and suppose that $\lambda$ is invariant for $P$. Then $(X_{m+n})_{n\ge 0}$ is also Markov$(\lambda, P)$.
Proof. By Theorem 1.1.3, $\mathbb{P}(X_m = i) = (\lambda P^m)_i = \lambda_i$ for all $i$ and, clearly, conditional on $X_{m+n} = i$, $X_{m+n+1}$ is independent of $X_m, X_{m+1}, \dots, X_{m+n}$ and has distribution $(p_{ij} : j \in I)$. $\square$
The next result explains the term equilibrium.
Theorem 1.7.2. Let $I$ be finite. Suppose for some $i \in I$ that
$$p_{ij}^{(n)} \to \pi_j \quad\text{as } n \to \infty \text{ for all } j \in I.$$
Then $\pi = (\pi_j : j \in I)$ is an invariant distribution.
Proof. We have
$$\sum_{j\in I}\pi_j = \sum_{j\in I}\lim_{n\to\infty} p_{ij}^{(n)} = \lim_{n\to\infty}\sum_{j\in I} p_{ij}^{(n)} = 1$$
and
$$\pi_j = \lim_{n\to\infty} p_{ij}^{(n)} = \lim_{n\to\infty}\sum_{k\in I} p_{ik}^{(n)} p_{kj} = \sum_{k\in I}\lim_{n\to\infty} p_{ik}^{(n)} p_{kj} = \sum_{k\in I}\pi_k p_{kj}$$
where we have used finiteness of $I$ to justify interchange of summation and limit operations. Hence $\pi$ is an invariant distribution. $\square$
Notice that for any of the random walks discussed in Section 1.6 we have $p_{ij}^{(n)} \to 0$ as $n \to \infty$ for all $i, j \in I$. The limit is certainly invariant, but it is not a distribution!
Theorem 1.7.2 is not a very useful result but it serves to indicate a rela-
tionship between invariant distributions and n-step transition probabilities.
In Theorem 1.8.3 we shall prove a sort of converse, which is much more
useful.
Example 1.7.3
Consider the two-state Markov chain with transition matrix
$$P = \begin{pmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta\end{pmatrix}.$$
Ignore the trivial cases $\alpha = \beta = 0$ and $\alpha = \beta = 1$. Then, by Example 1.1.4,
$$P^n \to \begin{pmatrix} \beta/(\alpha+\beta) & \alpha/(\alpha+\beta) \\ \beta/(\alpha+\beta) & \alpha/(\alpha+\beta)\end{pmatrix} \quad\text{as } n \to \infty,$$
so, by Theorem 1.7.2, the distribution $(\beta/(\alpha+\beta), \alpha/(\alpha+\beta))$ must be invariant. There are of course easier ways to discover this.
Example 1.7.4
Consider the Markov chain with diagram
[diagram: the three-state chain of Example 1.1.6]
To find an invariant distribution we write down the components of the vector equation $\pi P = \pi$:
$$\pi_1 = \tfrac12\pi_3, \qquad \pi_2 = \pi_1 + \tfrac12\pi_2, \qquad \pi_3 = \tfrac12\pi_2 + \tfrac12\pi_3.$$
In terms of the chain, the right-hand sides give the probabilities for $X_1$, when $X_0$ has distribution $\pi$, and the equations require $X_1$ also to have distribution $\pi$. The equations are homogeneous so one of them is redundant, and another equation is required to fix $\pi$ uniquely. That equation is
$$\pi_1 + \pi_2 + \pi_3 = 1$$
and we find that $\pi = (1/5, 2/5, 2/5)$.
According to Example 1.1.6,

p_{11}^{(n)} → 1/5 as n → ∞,

so this confirms Theorem 1.7.2. Alternatively, knowing that p_{11}^{(n)} had the
form

p_{11}^{(n)} = a + (1/2)^n ( b cos(nπ/2) + c sin(nπ/2) ),

we could have used Theorem 1.7.2 and knowledge of π_1 to identify a = 1/5,
instead of working out p_{11}^{(n)} in Example 1.1.6.
In the next two results we shall show that every irreducible and recurrent
stochastic matrix P has an essentially unique positive invariant measure.
The proofs rely heavily on the probabilistic interpretation so it is worth
noting at the outset that, for a finite state-space I, the existence of an
invariant row vector is a simple piece of linear algebra: the row sums of P
are all 1, so the column vector of ones is an eigenvector with eigenvalue 1,
so P must have a row eigenvector with eigenvalue 1.
For a fixed state k, consider for each i the expected time spent in i between
visits to k:
γ_i^k = E_k Σ_{n=0}^{T_k−1} 1_{{X_n = i}}.

Here the sum of indicator functions serves to count the number of times n
at which X_n = i before the first passage time T_k.
Theorem 1.7.5. Let P be irreducible and recurrent. Then
(i) γ_k^k = 1;
(ii) γ^k = (γ_i^k : i ∈ I) satisfies γ^k P = γ^k;
(iii) 0 < γ_i^k < ∞ for all i ∈ I.

Proof. (i) This is obvious. (ii) For n = 1, 2, ... the event {n ≤ T_k} depends
only on X_0, X_1, ..., X_{n−1}, so, by the Markov property at n − 1,

P_k(X_{n−1} = i, X_n = j and n ≤ T_k) = p_{ij} P_k(X_{n−1} = i and n ≤ T_k).

Since P is recurrent, under P_k we have T_k < ∞ and X_0 = X_{T_k} = k with
probability one. Therefore

γ_j^k = E_k Σ_{n=1}^{T_k} 1_{{X_n = j}} = E_k Σ_{n=1}^{∞} 1_{{X_n = j and n ≤ T_k}}
     = Σ_{n=1}^{∞} P_k(X_n = j and n ≤ T_k)
     = Σ_{i∈I} Σ_{n=1}^{∞} P_k(X_{n−1} = i, X_n = j and n ≤ T_k)
     = Σ_{i∈I} p_{ij} Σ_{n=1}^{∞} P_k(X_{n−1} = i and n ≤ T_k)
     = Σ_{i∈I} p_{ij} E_k Σ_{m=0}^{∞} 1_{{X_m = i and m ≤ T_k − 1}}
     = Σ_{i∈I} p_{ij} E_k Σ_{m=0}^{T_k−1} 1_{{X_m = i}} = Σ_{i∈I} γ_i^k p_{ij}.

(iii) Since P is irreducible, for each state i there exist n, m ≥ 0 with
p_{ik}^{(n)}, p_{ki}^{(m)} > 0. Then γ_i^k ≥ γ_k^k p_{ki}^{(m)} > 0 and γ_i^k p_{ik}^{(n)} ≤ γ_k^k = 1, by (i) and
(ii). □
Theorem 1.7.6. Let P be irreducible and let λ be an invariant measure
for P with λ_k = 1. Then λ ≥ γ^k. If in addition P is recurrent, then λ = γ^k.

Proof. For each j ∈ I we have

λ_j = Σ_{i_0∈I} λ_{i_0} p_{i_0 j} = Σ_{i_0≠k} λ_{i_0} p_{i_0 j} + p_{kj}
    = Σ_{i_0,i_1≠k} λ_{i_1} p_{i_1 i_0} p_{i_0 j} + ( p_{kj} + Σ_{i_0≠k} p_{k i_0} p_{i_0 j} )
    = ···
    = Σ_{i_0,...,i_n≠k} λ_{i_n} p_{i_n i_{n−1}} ··· p_{i_0 j}
      + ( p_{kj} + Σ_{i_0≠k} p_{k i_0} p_{i_0 j} + ··· + Σ_{i_0,...,i_{n−1}≠k} p_{k i_{n−1}} ··· p_{i_1 i_0} p_{i_0 j} )
    ≥ P_k(X_1 = j and T_k ≥ 1) + P_k(X_2 = j and T_k ≥ 2)
      + ··· + P_k(X_n = j and T_k ≥ n)
    → γ_j^k as n → ∞.

So λ ≥ γ^k. If P is recurrent, then γ^k is invariant by Theorem 1.7.5, so
μ = λ − γ^k is also invariant and μ ≥ 0. Since P is irreducible, given i ∈ I,
we have p_{ik}^{(n)} > 0 for some n, and 0 = μ_k = Σ_{j∈I} μ_j p_{jk}^{(n)} ≥ μ_i p_{ik}^{(n)}, so
μ_i = 0. □
Recall that a state i is recurrent if
P_i(X_n = i for infinitely many n) = 1

and we showed in Theorem 1.5.3 that this is equivalent to

Σ_{n=0}^{∞} p_{ii}^{(n)} = ∞.

If in addition the expected return time

m_i = E_i(T_i)

is finite, then we say i is positive recurrent. A recurrent state which fails to
have this stronger property is called null recurrent.
Theorem 1.7.7. Let P be irreducible. Then the following are equivalent:
(i) every state is positive recurrent;
(ii) some state i is positive recurrent;
(iii) P has an invariant distribution, π say.
Moreover, when (iii) holds we have m_i = 1/π_i for all i.
Proof. (i) ⇒ (ii) This is obvious.
(ii) ⇒ (iii) If i is positive recurrent, it is certainly recurrent, so P is recurrent. By Theorem 1.7.5, γ^i is then invariant. But

Σ_{j∈I} γ_j^i = m_i < ∞

so π_j = γ_j^i / m_i defines an invariant distribution.
(iii) ⇒ (i) Take any state k. Since P is irreducible and Σ_{i∈I} π_i = 1 we have
π_k = Σ_{i∈I} π_i p_{ik}^{(n)} > 0 for some n. Set λ_i = π_i/π_k. Then λ is an invariant
measure with λ_k = 1. So by Theorem 1.7.6, λ ≥ γ^k. Hence

m_k = Σ_{i∈I} γ_i^k ≤ Σ_{i∈I} π_i/π_k = 1/π_k < ∞          (1.7)

and k is positive recurrent.
To complete the proof we return to the argument for (iii) ⇒ (i) armed
with the knowledge that P is recurrent, so λ = γ^k and the inequality (1.7)
is in fact an equality. □
Example 1.7.8 (Simple symmetric random walk on Z)
The simple symmetric random walk on Z is clearly irreducible and, by
Example 1.6.1, it is also recurrent. Consider the measure
π_i = 1 for all i.

Then

(πP)_i = π_{i−1} p_{i−1,i} + π_{i+1} p_{i+1,i} = 1/2 + 1/2 = 1 = π_i,

so π is invariant. Now Theorem 1.7.6 forces any invariant measure to be
a scalar multiple of π. Since Σ_{i∈Z} π_i = ∞, there can be no invariant
distribution and the walk is therefore null recurrent, by Theorem 1.7.7.
Example 1.7.9
The existence of an invariant measure does not guarantee recurrence: con-
sider, for example, the simple symmetric random walk on Z3, which is
transient by Example 1.6.3, but has invariant measure π given by π_i = 1
for all i.
Example 1.7.10
Consider the asymmetric random walk on Z with transition probabilities
p_{i,i−1} = q < p = p_{i,i+1}. In components the invariant measure equation
πP = π reads

π_i = p π_{i−1} + q π_{i+1}.

This is a recurrence relation for π with general solution

π_i = A + B (p/q)^i.

So, in this case, there is a two-parameter family of invariant measures -
uniqueness up to scalar multiples does not hold.
Example 1.7.11
Consider a success-run chain on Z+ , whose transition probabilities are given
by
p_{i,i+1} = p_i,   p_{i0} = q_i = 1 − p_i.

Then the components of the invariant measure equation πP = π read

π_0 = Σ_{i=0}^{∞} q_i π_i,
π_i = p_{i−1} π_{i−1}   for i ≥ 1.

Suppose we choose p_i converging sufficiently rapidly to 1 so that

P = Π_{i=0}^{∞} p_i > 0,

which is equivalent to

Σ_{i=0}^{∞} q_i < ∞.

Then for any solution of πP = π we have

π_i = p_{i−1} ··· p_0 π_0   for all i ≥ 1,

and so

π_0 = Σ_{i=0}^{∞} q_i p_0 ··· p_{i−1} π_0 = (1 − P) π_0.

Since P > 0, this last equation forces π_0 = 0, and hence π_i = 0 for all i, so
there is no invariant measure.
Exercises
1.7.1 Find all invariant distributions of the transition matrix in Exercise
1.2.1.
1.7.2 Gas molecules move about randomly in a box which is divided into two
halves symmetrically by a partition. A hole is made in the partition. Sup-
pose there are N molecules in the box. Show that the number of molecules
on one side of the partition just after a molecule has passed through the hole
evolves as a Markov chain. What are the transition probabilities? What is
the invariant distribution of this chain?
1.7.3 A particle moves on the eight vertices of a cube in the following
way: at each step the particle is equally likely to move to each of the three
adjacent vertices, independently of its past motion. Let i be the initial
vertex occupied by the particle, 0 the vertex opposite i. Calculate each of
the following quantities:
(i) the expected number of steps until the particle returns to i;
(ii) the expected number of visits to 0 until the first return to i;
(iii) the expected number of steps until the first visit to o.
1.7.4 Let (X_n)_{n≥0} be a simple random walk on Z with p_{i,i−1} = q < p =
p_{i,i+1}. Find

γ_i^0 = E_0 ( Σ_{n=0}^{T_0−1} 1_{{X_n = i}} )

and verify that

γ_i^0 = inf_λ λ_i for all i,

where the infimum is taken over all invariant measures λ with λ_0 = 1.
(Compare with Theorem 1.7.6 and Example 1.7.10.)
1.7.5 Let P be a stochastic matrix on a finite set I. Show that a distribution
π is invariant for P if and only if π(I − P + A) = a, where A = (a_{ij} : i, j ∈ I)
with a_{ij} = 1 for all i and j, and a = (a_i : i ∈ I) with a_i = 1 for all i. Deduce
that if P is irreducible then I − P + A is invertible. Note that this enables one
to compute the invariant distribution by any standard method of inverting
a matrix.
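This recipe is easy to carry out numerically. A minimal sketch (assuming Python with numpy, and applied here to the three-state chain of Example 1.7.4 for illustration):

    import numpy as np

    P = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.5, 0.5],
                  [0.5, 0.0, 0.5]])   # the chain of Example 1.7.4

    n = P.shape[0]
    A = np.ones((n, n))
    a = np.ones(n)
    pi = np.linalg.solve((np.eye(n) - P + A).T, a)   # solves pi (I - P + A) = a
    print(pi)            # (0.2, 0.4, 0.4)
    print(pi @ P - pi)   # ~ 0, confirming invariance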
1.8 Convergence to equilibrium
We shall investigate the limiting behaviour of the n-step transition probabilities p_{ij}^{(n)} as n → ∞. As we saw in Theorem 1.7.2, if the state-space is
finite and if for some i the limit lim_{n→∞} p_{ij}^{(n)} exists for all j, then it must be an invariant
distribution. But, as the following example shows, the limit does not always
exist.
Example 1.8.1
Consider the two-state chain with transition matrix

P = ( 0  1 )
    ( 1  0 ).

Then P^2 = I, so P^{2n} = I and P^{2n+1} = P for all n. Thus p_{ij}^{(n)} fails to
converge for all i, j.
Let us call a state i aperiodic if p_{ii}^{(n)} > 0 for all sufficiently large n. We
leave it as an exercise to show that i is aperiodic if and only if the set
{n ≥ 0 : p_{ii}^{(n)} > 0} has no common divisor other than 1. This is also
a consequence of Theorem 1.8.4. The behaviour of the chain in Example
1.8.1 is connected with its periodicity.
Lemma 1.8.2. Suppose P is irreducible and has an aperiodic state i.
Then, for all states j and k, p_{jk}^{(n)} > 0 for all sufficiently large n. In particular,
all states are aperiodic.

Proof. There exist r, s ≥ 0 with p_{ji}^{(r)}, p_{ik}^{(s)} > 0. Then

p_{jk}^{(r+n+s)} ≥ p_{ji}^{(r)} p_{ii}^{(n)} p_{ik}^{(s)} > 0

for all sufficiently large n. □
Here is the main result of this section. The method of proof, by coupling
two Markov chains, is ingenious.
Theorem 1.8.3 (Convergence to equilibrium). Let P be irreducible
and aperiodic, and suppose that P has an invariant distribution π. Let λ
be any distribution. Suppose that (X_n)_{n≥0} is Markov(λ, P). Then

P(X_n = j) → π_j as n → ∞ for all j.

In particular,

p_{ij}^{(n)} → π_j as n → ∞ for all i, j.

Proof. We use a coupling argument. Let (Y_n)_{n≥0} be Markov(π, P) and
independent of (X_n)_{n≥0}. Fix a reference state b and set

T = inf{n ≥ 1 : X_n = Y_n = b}.
Step 1. We show P(T < ∞) = 1. The process W_n = (X_n, Y_n) is a Markov
chain on I × I with transition probabilities

p̃_{(i,k)(j,l)} = p_{ij} p_{kl}

and initial distribution

μ_{(i,k)} = λ_i π_k.

Since P is aperiodic, for all states i, j, k, l we have

p̃_{(i,k)(j,l)}^{(n)} = p_{ij}^{(n)} p_{kl}^{(n)} > 0

for all sufficiently large n; so P̃ is irreducible. Also, P̃ has an invariant
distribution given by

π̃_{(i,k)} = π_i π_k

so, by Theorem 1.7.7, P̃ is positive recurrent. But T is the first passage
time of W_n to (b, b) so P(T < ∞) = 1, by Theorem 1.5.7.
Step 2. Set

Z_n = { X_n   if n < T
      { Y_n   if n ≥ T.

The diagram below illustrates the idea. We show that (Z_n)_{n≥0} is
Markov(λ, P).

[diagram: the paths of (X_n) and (Y_n), which are exchanged at the coupling time T]

The strong Markov property applies to W_n = (X_n, Y_n) at time T, so
(X_{T+n}, Y_{T+n})_{n≥0} is Markov(δ_{(b,b)}, P̃) and independent of (X_0, Y_0),
(X_1, Y_1), ..., (X_T, Y_T). By symmetry, we can replace the process
(X_{T+n}, Y_{T+n})_{n≥0} by (Y_{T+n}, X_{T+n})_{n≥0}, which is also Markov(δ_{(b,b)}, P̃) and
remains independent of (X_0, Y_0), (X_1, Y_1), ..., (X_T, Y_T). Hence W'_n =
(Z_n, Z'_n) is Markov(μ, P̃) where

Z'_n = { Y_n   if n < T
       { X_n   if n ≥ T.

In particular, (Z_n)_{n≥0} is Markov(λ, P).
Step 3. We have

P(Z_n = j) = P(X_n = j and n < T) + P(Y_n = j and n ≥ T)

so

|P(X_n = j) − π_j| = |P(Z_n = j) − P(Y_n = j)|
                   = |P(X_n = j and n < T) − P(Y_n = j and n < T)|
                   ≤ P(n < T)

and P(n < T) → 0 as n → ∞. □
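The coupling time T is easy to simulate. The following rough sketch (assuming Python with numpy; the two-state chain of Example 1.7.3 with the illustrative values α = 0.3, β = 0.1 and reference state b = 0 are assumptions made here) runs two independent copies until they meet at b, and the observed mean of T is finite, as Step 1 guarantees.

    import numpy as np

    rng = np.random.default_rng(1)
    alpha, beta = 0.3, 0.1                 # illustrative values
    P = np.array([[1 - alpha, alpha],
                  [beta, 1 - beta]])

    def coupling_time(x0, y0, b=0):
        """First n >= 1 with X_n = Y_n = b for independent chains started at x0, y0."""
        x, y, n = x0, y0, 0
        while True:
            x = rng.choice(2, p=P[x])
            y = rng.choice(2, p=P[y])
            n += 1
            if x == y == b:
                return n

    times = [coupling_time(0, 1) for _ in range(5000)]
    print(np.mean(times))   # finite mean coupling time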
To understand this proof one should see what goes wrong when P is
not aperiodic. Consider the two-state chain of Example 1.8.1, which has
(1/2, 1/2) as its unique invariant distribution. We start (X_n)_{n≥0} from 0
and (Y_n)_{n≥0} with equal probability from 0 or 1. However, if Y_0 = 1, then,
because of periodicity, (X_n)_{n≥0} and (Y_n)_{n≥0} will never meet, and the proof
fails. We move on now to the cases that were excluded in the last theorem,
where (X_n)_{n≥0} is periodic or transient or null recurrent. The remainder of
this section might be omitted on a first reading.
Theorem 1.8.4. Let P be irreducible. There is an integer d ≥ 1 and a
partition

I = C_0 ∪ C_1 ∪ ··· ∪ C_{d−1}

such that (setting C_{nd+r} = C_r)
(i) p_{ij}^{(n)} > 0 only if i ∈ C_r and j ∈ C_{r+n} for some r;
(ii) p_{ij}^{(nd)} > 0 for all sufficiently large n, for all i, j ∈ C_r, for all r.

Proof. Fix a state k and consider S = {n ≥ 0 : p_{kk}^{(n)} > 0}. Choose n_1, n_2 ∈ S
with n_1 < n_2 and such that d := n_2 − n_1 is as small as possible. (Here and
throughout we use the symbol := to mean 'defined to equal'.) Define for
r = 0, ..., d−1

C_r = { i ∈ I : p_{ki}^{(nd+r)} > 0 for some n ≥ 0 }.

Then C_0 ∪ ··· ∪ C_{d−1} = I, by irreducibility. Moreover, if p_{ki}^{(nd+r)} > 0 and
p_{ki}^{(n'd+s)} > 0 for some r, s ∈ {0, 1, ..., d−1}, then, choosing m ≥ 0 so that
p_{ik}^{(m)} > 0, we have nd + r + m ∈ S and n'd + s + m ∈ S, so r = s by minimality
of d. Hence we have a partition.

To prove (i) suppose p_{ij}^{(n)} > 0 and i ∈ C_r. Choose m so that p_{ki}^{(md+r)} > 0,
then p_{kj}^{(md+r+n)} > 0, so j ∈ C_{r+n} as required. By taking i = j = k we now
see that d must divide every element of S, in particular n_1.

Now for nd ≥ n_1^2 we can write nd = q n_1 + r for integers q ≥ n_1 and
0 ≤ r ≤ n_1 − 1. Since d divides n_1 we then have r = md for some integer
m and then

nd = (q − m) n_1 + m n_2.

Hence nd ∈ S. To prove (ii) for i, j ∈ C_r choose m_1 and m_2 so that
p_{ik}^{(m_1)} > 0 and p_{kj}^{(m_2)} > 0, then

p_{ij}^{(m_1 + nd + m_2)} ≥ p_{ik}^{(m_1)} p_{kk}^{(nd)} p_{kj}^{(m_2)} > 0

whenever nd ≥ n_1^2. Since m_1 + m_2 is then necessarily a multiple of d, we
are done. □
We call d the period of P. The theorem just proved shows in particular for
all i ∈ I that d is the greatest common divisor of the set {n ≥ 0 : p_{ii}^{(n)} > 0}.
This is sometimes useful in identifying d.
Finally, here is a complete description of limiting behaviour for irre-
ducible chains. This generalizes Theorem 1.8.3 in two respects since we
require neither aperiodicity nor the existence of an invariant distribution.
The argument we use for the null recurrent case was discovered recently by
B. Fristedt and L. Gray.
Theorem 1.8.5. Let P be irreducible of period d and let C_0, C_1, ..., C_{d−1}
be the partition obtained in Theorem 1.8.4. Let λ be a distribution with
Σ_{i∈C_0} λ_i = 1. Suppose that (X_n)_{n≥0} is Markov(λ, P). Then for r =
0, 1, ..., d − 1 and j ∈ C_r we have

P(X_{nd+r} = j) → d/m_j as n → ∞,

where m_j is the expected return time to j. In particular, for i ∈ C_0 and
j ∈ C_r we have

p_{ij}^{(nd+r)} → d/m_j as n → ∞.

Proof.
Step 1. We reduce to the aperiodic case. Set ν = λP^r; then by Theorem
1.8.4 we have

Σ_{i∈C_r} ν_i = 1.

Set Y_n = X_{nd+r}; then (Y_n)_{n≥0} is Markov(ν, P^d) and, by Theorem 1.8.4, P^d
is irreducible and aperiodic on C_r. For j ∈ C_r the expected return time of
(Y_n)_{n≥0} to j is m_j/d. So if the theorem holds in the aperiodic case, then

P(X_{nd+r} = j) = P(Y_n = j) → d/m_j as n → ∞

so the theorem holds in general.
Step 2. Assume that P is aperiodic. If P is positive recurrent then 1/m_j =
π_j, where π is the unique invariant distribution, so the result follows from
Theorem 1.8.3. Otherwise m_j = ∞ and we have to show that

P(X_n = j) → 0 as n → ∞.
If P is transient this is easy and we are left with the null recurrent
case.
Step 3. Assume that P is aperiodic and null recurrent. Then

Σ_{k=0}^{∞} P_j(T_j > k) = E_j(T_j) = ∞.

Given ε > 0 choose K so that

Σ_{k=0}^{K−1} P_j(T_j > k) ≥ 2/ε.

Then, for n ≥ K − 1,

1 ≥ Σ_{k=n−K+1}^{n} P(X_k = j and X_m ≠ j for m = k+1, ..., n)
  = Σ_{k=n−K+1}^{n} P(X_k = j) P_j(T_j > n − k)
  = Σ_{k=0}^{K−1} P(X_{n−k} = j) P_j(T_j > k),

so we must have P(X_{n−k} = j) ≤ ε/2 for some k ∈ {0, 1, ..., K − 1}.
Return now to the coupling argument used in Theorem 1.8.3, only now let
(Y_n)_{n≥0} be Markov(μ, P), where μ is to be chosen later. Set W_n = (X_n, Y_n).
As before, aperiodicity of (X_n)_{n≥0} ensures irreducibility of (W_n)_{n≥0}. If
(W_n)_{n≥0} is transient then, on taking μ = λ, we obtain

P(X_n = j)^2 = P(W_n = (j, j)) → 0

as required. Assume then that (W_n)_{n≥0} is recurrent. Then, in the notation
of Theorem 1.8.3, we have P(T < ∞) = 1 and the coupling argument shows
that

|P(X_n = j) − P(Y_n = j)| → 0 as n → ∞.

We exploit this convergence by taking μ = λP^k for k = 1, ..., K − 1, so
that P(Y_n = j) = P(X_{n+k} = j). We can find N such that for n ≥ N and
k = 1, ..., K − 1,

|P(X_n = j) − P(X_{n+k} = j)| ≤ ε/2.
But for any n we can find k ∈ {0, 1, ..., K − 1} such that P(X_{n+k} = j) ≤
ε/2. Hence, for n ≥ N,

P(X_n = j) ≤ ε.

Since ε > 0 was arbitrary, this shows that P(X_n = j) → 0 as n → ∞, as
required. □
Exercises
1.8.1 Prove the claims (e), (f) and (g) made in example (v) of the Intro-
duction.
1.8.2 Find the invariant distributions of the transition matrices in Exercise
1.1.7, parts (a), (b) and (c), and compare them with your answers there.
1.8.3 A fair die is thrown repeatedly. Let X_n denote the sum of the first n
throws. Find

lim_{n→∞} P(X_n is a multiple of 13),

quoting carefully any general theorems that you use.
1.8.4 Each morning a student takes one of the three books he owns from
his shelf. The probability that he chooses book i is α_i, where 0 < α_i < 1 for
i = 1, 2, 3, and choices on successive days are independent. In the evening
he replaces the book at the left-hand end of the shelf. If p_n denotes the
probability that on day n the student finds the books in the order 1, 2, 3,
from left to right, show that, irrespective of the initial arrangement of the
books, p_n converges as n → ∞, and determine the limit.
1.8.5 (Renewal theorem). Let Y_1, Y_2, ... be independent, identically
distributed random variables with values in {1, 2, ...}. Suppose that the
set of integers

{n : P(Y_1 = n) > 0}

has greatest common divisor 1. Set μ = E(Y_1). Show that the following
process is a Markov chain:

X_n = inf{m ≥ n : m = Y_1 + ··· + Y_k for some k ≥ 0} − n.

Determine

lim_{n→∞} P(X_n = 0)

and hence show that as n → ∞

P(n = Y_1 + ··· + Y_k for some k ≥ 0) → 1/μ.
(Think of Y_1, Y_2, ... as light-bulb lifetimes. A bulb is replaced when it fails.
Thus the limiting probability that a bulb is replaced at time n is 1/μ. Although this appears to be a very special case of convergence to equilibrium,
one can actually recover the full result by applying the renewal theorem to
the excursion lengths S_i^{(1)}, S_i^{(2)}, ... from state i.)
1.9 Time reversal
For Markov chains, the past and future are independent given the present.
This property is symmetrical in time and suggests looking at Markov chains
with time running backwards. On the other hand, convergence to equilib-
rium shows behaviour which is asymmetrical in time: a highly organised
state such as a point mass decays to a disorganised one, the invariant dis-
tribution. This is an example of entropy increasing. It suggests that if
we want complete time-symmetry we must begin in equilibrium. The next
result shows that a Markov chain in equilibrium, run backwards, is again a
Markov chain. The transition matrix may however be different.
Theorem 1.9.1. Let P be irreducible and have an invariant distribution
π. Suppose that (X_n)_{0≤n≤N} is Markov(π, P) and set Y_n = X_{N−n}. Then
(Y_n)_{0≤n≤N} is Markov(π, P̂), where P̂ = (p̂_{ij}) is given by

π_j p̂_{ji} = π_i p_{ij} for all i, j,

and P̂ is also irreducible with invariant distribution π.

Proof. First we check that P̂ is a stochastic matrix:

Σ_{i∈I} p̂_{ji} = (1/π_j) Σ_{i∈I} π_i p_{ij} = 1

since π is invariant for P. Next we check that π is invariant for P̂:

Σ_{j∈I} π_j p̂_{ji} = Σ_{j∈I} π_i p_{ij} = π_i

since P is a stochastic matrix.
We have

P(Y_0 = i_0, Y_1 = i_1, ..., Y_N = i_N) = P(X_0 = i_N, X_1 = i_{N−1}, ..., X_N = i_0)
  = π_{i_N} p_{i_N i_{N−1}} ··· p_{i_1 i_0} = π_{i_0} p̂_{i_0 i_1} ··· p̂_{i_{N−1} i_N}

so, by Theorem 1.1.1, (Y_n)_{0≤n≤N} is Markov(π, P̂). Finally, since P is
irreducible, for each pair of states i, j there is a chain of states i_0 =
i, i_1, ..., i_{n−1}, i_n = j with p_{i_0 i_1} ··· p_{i_{n−1} i_n} > 0. Then

p̂_{i_n i_{n−1}} ··· p̂_{i_1 i_0} = π_{i_0} p_{i_0 i_1} ··· p_{i_{n−1} i_n} / π_{i_n} > 0

so P̂ is also irreducible. □
The chain (Y_n)_{0≤n≤N} is called the time-reversal of (X_n)_{0≤n≤N}.
A stochastic matrix P and a measure λ are said to be in detailed balance
if

λ_i p_{ij} = λ_j p_{ji} for all i, j.

Though obvious, the following result is worth remembering because, when
a solution λ to the detailed balance equations exists, it is often easier to
find by the detailed balance equations than by the equation λ = λP.
Lemma 1.9.2. If P and λ are in detailed balance, then λ is invariant for
P.

Proof. We have (λP)_i = Σ_{j∈I} λ_j p_{ji} = Σ_{j∈I} λ_i p_{ij} = λ_i. □
Let (X_n)_{n≥0} be Markov(λ, P), with P irreducible. We say that (X_n)_{n≥0}
is reversible if, for all N ≥ 1, (X_{N−n})_{0≤n≤N} is also Markov(λ, P).
Theorem 1.9.3. Let P be an irreducible stochastic matrix and let λ be
a distribution. Suppose that (X_n)_{n≥0} is Markov(λ, P). Then the following
are equivalent:
(a) (X_n)_{n≥0} is reversible;
(b) P and λ are in detailed balance.

Proof. Both (a) and (b) imply that λ is invariant for P. Then both (a) and
(b) are equivalent to the statement that P̂ = P in Theorem 1.9.1. □
We begin a collection of examples with a chain which is not reversible.
Example 1.9.4
Consider the Markov chain with diagram:
[diagram: states 1, 2, 3 arranged in a cycle, each sending probability 2/3 to the next state clockwise and 1/3 to the next state anticlockwise]

The transition matrix is

P = (  0   2/3  1/3 )
    ( 1/3   0   2/3 )
    ( 2/3  1/3   0  )

and π = (1/3, 1/3, 1/3) is invariant. Hence P̂ = P^T, the transpose of P.
But P is not symmetric, so P̂ ≠ P and this chain is not reversible. A
patient observer would see the chain move clockwise in the long run: under
time-reversal the clock would run backwards!
Example 1.9.5
Consider the Markov chain with diagram:
[diagram: a birth-and-death chain on {0, 1, ..., M}, moving up one step with probability p and down one step with probability q]

where 0 < p = 1 − q < 1. The non-zero detailed balance equations read

λ_i p_{i,i+1} = λ_{i+1} p_{i+1,i} for i = 0, 1, ..., M − 1.

So a solution is given by

λ = ((p/q)^i : i = 0, 1, ..., M)

and this may be normalised to give a distribution in detailed balance with
P. Hence this chain is reversible.
If p were much larger than q, one might argue that the chain would tend
to move to the right and its time-reversal to the left. However, this ignores
the fact that we reverse the chain in equilibrium, which in this case would
be heavily concentrated near M. An observer would see the chain spending
most of its time near M and making occasional brief forays to the left,
which behaviour is symmetrical in time.
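It is easy to verify the detailed balance solution numerically. A minimal sketch (assuming Python with numpy; the values p = 0.7 and M = 10 are illustrative, and the boundary behaviour is modelled here, as an assumption, by self-loops at 0 and M):

    import numpy as np

    p, M = 0.7, 10
    q = 1 - p
    P = np.zeros((M + 1, M + 1))
    for i in range(M):
        P[i, i + 1] = p        # up-steps
        P[i + 1, i] = q        # down-steps
    P[0, 0] = q                # mass that cannot move left at 0
    P[M, M] = p                # mass that cannot move right at M

    lam = (p / q) ** np.arange(M + 1)
    print(np.allclose(lam @ P, lam))                            # invariance
    D = lam[:, None] * P
    print(np.allclose(D, D.T))                                  # detailed balance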
Example 1.9.6 (Random walk on a graph)
A graph G is a countable collection of states, usually called vertices, some
of which are joined by edges, for example:

[diagram: a graph on vertices 1, 2, 3, 4]
Thus a graph is a partially drawn Markov chain diagram. There is a natural
way to complete the diagram which gives rise to the random walk on G.
The valency Vi of vertex i is the number of edges at i. We have to assume
that every vertex has finite valency. The random walk on G picks edges
with equal probability:
[diagram: the same graph, with the edges at each vertex i given equal probability 1/v_i]

Thus the transition probabilities are given by

p_{ij} = 1/v_i   if (i, j) is an edge,
p_{ij} = 0       otherwise.
We assume G is connected, so that P is irreducible. It is easy to see that
P is in detailed balance with v = (v_i : i ∈ G). So, if the total valency
σ = Σ_{i∈G} v_i is finite, then π = v/σ is invariant and P is reversible.
Example 1.9.7 (Random chessboard knight)
A random knight makes each permissible move with equal probability. If it
starts in a corner, how long on average will it take to return?
This is an example of a random walk on a graph: the vertices are the
squares of the chessboard and the edges are the moves that the knight can
take:

[diagram: part of the knight-move graph on the chessboard]

The diagram shows a part of the graph. We know by Theorem 1.7.7 and
the preceding example that

E_c(T_c) = 1/π_c = σ/v_c,

so all we have to do is identify valencies. The four corner squares have
valency 2, and the eight squares adjacent to the corners have valency 3.
There are 20 squares of valency 4, 16 of valency 6, and the 16 central
squares have valency 8. Hence

E_c(T_c) = (8 + 24 + 80 + 96 + 128)/2 = 168.
Alternatively, if you enjoy solving sets of 64 simultaneous linear equations,
you might try finding π from πP = π, or calculating E_c(T_c) using Theorem
1.3.5!
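The valency count itself is easy to automate. A short sketch (assuming Python; everything here is plain arithmetic on the standard 8 × 8 board) builds the knight-move graph and recovers E_c(T_c) = 168:

    # Knight-move graph on the 8 x 8 board: compute valencies and sigma / v_corner.
    moves = [(1, 2), (2, 1), (-1, 2), (-2, 1), (1, -2), (2, -1), (-1, -2), (-2, -1)]

    def valency(square):
        r, c = square
        return sum(0 <= r + dr < 8 and 0 <= c + dc < 8 for dr, dc in moves)

    valencies = {(r, c): valency((r, c)) for r in range(8) for c in range(8)}
    sigma = sum(valencies.values())      # total valency, 336
    corner = valencies[(0, 0)]           # 2
    print(sigma, sigma / corner)         # 336, 168.0 = E_c(T_c)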
Exercises
1.9.1 In each of the following cases determine whether the stochastic matrix
P, which you may assume is irreducible, is reversible:
(a)
P = ( 1−p    p  )
    (  q    1−q );

(b)

(c) I = {0, 1, ..., N} and p_{ij} = 0 if |j − i| ≥ 2;
(d) I = {0, 1, 2, ...} and p_{01} = 1, p_{i,i+1} = p, p_{i,i−1} = 1 − p for i ≥ 1;
(e) p_{ij} = p_{ji} for all i, j ∈ S.
1.9.2 Two particles X and Y perform independent random walks on the
graph shown in the diagram. So, for example, a particle at A jumps to B,
C or D with equal probability 1/3.
[diagram: a graph on vertices A, B, C, D, E]
Find the probability that X and Y ever meet at a vertex in the following
cases:
(a) X starts at A and Y starts at B;
(b) X starts at A and Y starts at E. For I = B, D let M_I denote the
expected time, when both X and Y start at I, until they are once
again both at I. Show that 9M_D = 16M_B.
1.10 Ergodic theorem
Ergodic theorems concern the limiting behaviour of averages over time.
We shall prove a theorem which identifies for Markov chains the long-run
proportion of time spent in each state. An essential tool is the following
ergodic theorem for independent random variables which is a version of the
strong law of large numbers.
Theorem 1.10.1 (Strong law of large numbers). Let Y_1, Y_2, ... be
a sequence of independent, identically distributed, non-negative random
variables with E(Y_1) = μ. Then

P( (Y_1 + ··· + Y_n)/n → μ as n → ∞ ) = 1.

Proof. A proof for the case μ < ∞ may be found, for example, in Probability
with Martingales by David Williams (Cambridge University Press, 1991).
The case where μ = ∞ is a simple deduction. Fix N < ∞ and set Y_n^{(N)} =
Y_n ∧ N. Then

(Y_1 + ··· + Y_n)/n ≥ (Y_1^{(N)} + ··· + Y_n^{(N)})/n → E(Y_1 ∧ N)

with probability one. As N ↑ ∞ we have E(Y_1 ∧ N) ↑ μ by monotone
convergence (see Section 6.4). So we must have, with probability 1,

(Y_1 + ··· + Y_n)/n → ∞ as n → ∞.

□
We denote by Vi (n) the number of visits to i before n:
V_i(n) = Σ_{k=0}^{n−1} 1_{{X_k = i}}.
Then Vi (n )/ n is the proportion of time before n spent in state i. The
following result gives the long-run proportion of time spent by a Markov
chain in each state.
Theorem 1.10.2 (Ergodic theorem). Let P be irreducible and let λ
be any distribution. If (X_n)_{n≥0} is Markov(λ, P) then

P( V_i(n)/n → 1/m_i as n → ∞ ) = 1,

where m_i = E_i(T_i) is the expected return time to state i. Moreover, in the
positive recurrent case, for any bounded function f : I → R we have

P( (1/n) Σ_{k=0}^{n−1} f(X_k) → f̄ as n → ∞ ) = 1,

where

f̄ = Σ_{i∈I} π_i f_i

and where (π_i : i ∈ I) is the unique invariant distribution.
Proof. If P is transient, then, with probability 1, the total number V_i of
visits to i is finite, so

V_i(n)/n ≤ V_i/n → 0 = 1/m_i.

Suppose then that P is recurrent and fix a state i. For T = T_i we have
P(T < ∞) = 1 by Theorem 1.5.7 and (X_{T+n})_{n≥0} is Markov(δ_i, P) and
independent of X_0, X_1, ..., X_T by the strong Markov property. The long-run proportion of time spent in i is the same for (X_{T+n})_{n≥0} and (X_n)_{n≥0},
so it suffices to consider the case λ = δ_i.
Write S_i^{(r)} for the length of the rth excursion to i, as in Section 1.5. By
Lemma 1.5.1, the non-negative random variables S_i^{(1)}, S_i^{(2)}, ... are independent and identically distributed with E_i(S_i^{(r)}) = m_i. Now

S_i^{(1)} + ··· + S_i^{(V_i(n)−1)} ≤ n − 1,

the left-hand side being the time of the last visit to i before n. Also

S_i^{(1)} + ··· + S_i^{(V_i(n))} ≥ n,

the left-hand side being the time of the first visit to i after n − 1. Hence

( S_i^{(1)} + ··· + S_i^{(V_i(n)−1)} ) / V_i(n)  ≤  n / V_i(n)  ≤  ( S_i^{(1)} + ··· + S_i^{(V_i(n))} ) / V_i(n).   (1.8)

By the strong law of large numbers

P( (S_i^{(1)} + ··· + S_i^{(n)})/n → m_i as n → ∞ ) = 1

and, since P is recurrent,

P( V_i(n) → ∞ as n → ∞ ) = 1.

So, letting n → ∞ in (1.8), we get

P( n/V_i(n) → m_i as n → ∞ ) = 1,

which implies

P( V_i(n)/n → 1/m_i as n → ∞ ) = 1.
Assume now that (X_n)_{n≥0} has an invariant distribution (π_i : i ∈ I). Let
f : I → R be a bounded function and assume without loss of generality that
|f| ≤ 1. For any J ⊆ I we have

| (1/n) Σ_{k=0}^{n−1} f(X_k) − f̄ | = | Σ_{i∈I} ( V_i(n)/n − π_i ) f_i |
  ≤ Σ_{i∈J} | V_i(n)/n − π_i | + Σ_{i∉J} | V_i(n)/n − π_i |
  ≤ Σ_{i∈J} | V_i(n)/n − π_i | + Σ_{i∉J} ( V_i(n)/n + π_i )
  ≤ 2 Σ_{i∈J} | V_i(n)/n − π_i | + 2 Σ_{i∉J} π_i.

We proved above that

P( V_i(n)/n → π_i as n → ∞ for all i ) = 1.

Given ε > 0, choose J finite so that

Σ_{i∉J} π_i < ε/4

and then N = N(ω) so that, for n ≥ N(ω),

Σ_{i∈J} | V_i(n)/n − π_i | < ε/4.

Then, for n ≥ N(ω), we have

| (1/n) Σ_{k=0}^{n−1} f(X_k) − f̄ | < ε,

which establishes the desired convergence. □
We consider now the statistical problem of estimating an unknown tran-
sition matrix P on the basis of observations of the corresponding Markov
chain. Consider, to begin, the case where we have N + 1 observations
(X_n)_{0≤n≤N}. The log-likelihood function is given by

l(P) = log( λ_{X_0} p_{X_0 X_1} ··· p_{X_{N−1} X_N} ) = Σ_{i,j∈I} N_{ij} log p_{ij}

up to a constant independent of P, where N_{ij} is the number of transitions
from i to j. A standard statistical procedure is to find the maximum likelihood estimate P̂, which is the choice of P maximizing l(P). Since P must
satisfy the linear constraint Σ_j p_{ij} = 1 for each i, we first try to maximize

l(P) + Σ_{i,j∈I} μ_i p_{ij}

and then choose (μ_i : i ∈ I) to fit the constraints. This is the method of
Lagrange multipliers. Thus we find

p̂_{ij} = Σ_{n=0}^{N−1} 1_{{X_n = i, X_{n+1} = j}} / Σ_{n=0}^{N−1} 1_{{X_n = i}},

which is the proportion of jumps from i which go to j.
We now turn to consider the consistency of this sort of estimate, that is
to say whether p̂_{ij} → p_{ij} with probability 1 as N → ∞. Since this is clearly
false when i is transient, we shall slightly modify our approach. Note that
to find p̂_{ij} we simply have to maximize

Σ_{j∈I} N_{ij} log p_{ij}

subject to Σ_j p_{ij} = 1: the other terms and constraints are irrelevant. Suppose then that instead of N + 1 observations we make enough observations
to ensure the chain leaves state i a total of N times. In the transient case
this may involve restarting the chain several times. Denote again by N_{ij}
the number of transitions from i to j.
To maximize the likelihood for (p_{ij} : j ∈ I) we still maximize

Σ_{j∈I} N_{ij} log p_{ij}

subject to Σ_j p_{ij} = 1, which leads to the maximum likelihood estimate

p̂_{ij} = N_{ij}/N.

But N_{ij} = Y_1 + ··· + Y_N, where Y_n = 1 if the nth transition from i is to
j, and Y_n = 0 otherwise. By the strong Markov property Y_1, ..., Y_N are
independent and identically distributed random variables with mean p_{ij}.
So, by the strong law of large numbers,

P( p̂_{ij} → p_{ij} as N → ∞ ) = 1,

which shows that p̂_{ij} is consistent.
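The consistency is easy to see empirically. A minimal simulation sketch (assuming Python with numpy; the chain of Example 1.7.4 and the path length N = 100,000 are illustrative choices) counts transitions along a simulated path and compares the estimate with the true matrix.

    import numpy as np

    rng = np.random.default_rng(2)
    P = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.5, 0.5],
                  [0.5, 0.0, 0.5]])   # chain of Example 1.7.4

    N = 100_000
    path = np.zeros(N + 1, dtype=int)
    for n in range(N):
        path[n + 1] = rng.choice(3, p=P[path[n]])

    counts = np.zeros((3, 3))
    np.add.at(counts, (path[:-1], path[1:]), 1)          # N_ij = transition counts
    P_hat = counts / counts.sum(axis=1, keepdims=True)   # maximum likelihood estimate
    print(np.round(P_hat, 3))                            # close to P for large N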
Exercises
1.10.1 Prove the claim (d) made in example (v) of the Introduction.
1.10.2 A professor has N umbrellas. He walks to the office in the morning
and walks home in the evening. If it is raining he likes to carry an um-
brella and if it is fine he does not. Suppose that it rains on each journey
with probability p, independently of past weather. What is the long-run
proportion of journeys on which the professor gets wet?
1.10.3 Let (X_n)_{n≥0} be an irreducible Markov chain on I having an invariant
distribution π. For J ⊆ I let (Y_m)_{m≥0} be the Markov chain on J obtained
by observing (X_n)_{n≥0} whilst in J. (See Example 1.4.4.) Show that (Y_m)_{m≥0}
is positive recurrent and find its invariant distribution.
1.10.4 An opera singer is due to perform a long series of concerts. Hav-
ing a fine artistic temperament, she is liable to pull out each night with
probability 1/2. Once this has happened she will not sing again until the
promoter convinces her of his high regard. This he does by sending flowers
every day until she returns. Flowers costing x thousand pounds, 0 ≤ x ≤ 1,
bring about a reconciliation with probability √x. The promoter stands to
make £750 from each successful concert. How much should he spend on
flowers?
1.11 Appendix: recurrence relations
Recurrence relations often arise in the linear equations associated to Markov
chains. Here is an account of the simplest cases. A more specialized case
was dealt with in Example 1.3.4. In Example 1.1.4 we found a recurrence
relation of the form
x_{n+1} = a x_n + b.

We look first for a constant solution x_n = x; then x = ax + b, so provided
a ≠ 1 we must have x = b/(1 − a). Now y_n = x_n − b/(1 − a) satisfies
y_{n+1} = a y_n, so y_n = a^n y_0. Thus the general solution when a ≠ 1 is given
by

x_n = A a^n + b/(1 − a)

where A is a constant. When a = 1 the general solution is obviously

x_n = x_0 + nb.
In Example 1.3.3 we found a recurrence relation of the form

a x_{n+1} + b x_n + c x_{n−1} = 0

where a and c were both non-zero. Let us try a solution of the form x_n = λ^n;
then aλ^2 + bλ + c = 0. Denote by α and β the roots of this quadratic. Then

y_n = A α^n + B β^n

is a solution. If α ≠ β then we can solve the equations

x_0 = A + B,   x_1 = Aα + Bβ

so that y_0 = x_0 and y_1 = x_1; but

a y_{n+1} + b y_n + c y_{n−1} = 0

for all n, so by induction y_n = x_n for all n. If α = β ≠ 0, then

y_n = (A + nB) α^n

is a solution and we can solve

x_0 = A,   x_1 = (A + B)α

so that y_0 = x_0 and y_1 = x_1; then, by the same argument, y_n = x_n for all
n. The case α = β = 0 does not arise. Hence the general solution is given
by

x_n = { A α^n + B β^n    if α ≠ β
      { (A + nB) α^n     if α = β.
1.12 Appendix: asymptotics for n!
Our analysis of recurrence and transience for random walks in Section 1.6
rested heavily on the use of the asymptotic relation

n! ∼ A √n (n/e)^n as n → ∞

for some A ∈ [1, ∞). Here is a derivation.
We make use of the power series expansions, for |t| < 1,

log(1 + t) = t − t^2/2 + t^3/3 − ···,
log(1 − t) = −t − t^2/2 − t^3/3 − ···.

By subtraction we obtain

(1/2) log( (1+t)/(1−t) ) = t + t^3/3 + t^5/5 + ···.

Set A_n = n!/(n^{n+1/2} e^{−n}) and a_n = log A_n. Then, by a straightforward
calculation,

a_n − a_{n+1} = (2n + 1) (1/2) log( (1 + (2n+1)^{−1}) / (1 − (2n+1)^{−1}) ) − 1.

By the series expansion written above we have

a_n − a_{n+1} = (2n+1){ (2n+1)^{−1} + (1/3)(2n+1)^{−3} + (1/5)(2n+1)^{−5} + ··· } − 1
             = (1/3)(2n+1)^{−2} + (1/5)(2n+1)^{−4} + ···
             ≤ (1/3){ (2n+1)^{−2} + (2n+1)^{−4} + ··· }
             = (1/3) · 1/((2n+1)^2 − 1) = 1/(12n) − 1/(12(n+1)).

It follows that a_n decreases and a_n − 1/(12n) increases as n → ∞. Hence
a_n → a for some a ∈ [0, ∞) and hence A_n → A, as n → ∞, where A = e^a.
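The convergence of A_n is fast enough to observe directly; in fact A = √(2π), although the argument above only shows A ∈ [1, ∞). A quick numerical check (assuming Python; plain standard-library arithmetic):

    import math

    for n in (5, 10, 50, 100):
        A_n = math.factorial(n) / (n ** (n + 0.5) * math.exp(-n))
        print(n, A_n)                  # decreases towards the limit A
    print(math.sqrt(2 * math.pi))      # 2.5066..., the value of A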
2
Continuous-time Markov chains I
The material on continuous-time Markov chains is divided between this
chapter and the next. The theory takes some time to set up, but once up
and running it follows a very similar pattern to the discrete-time case. To
emphasise this we have put the setting-up in this chapter and the rest in the
next. If you wish, you can begin with Chapter 3, provided you take certain
basic properties on trust, which are reviewed in Section 3.1. The first three
sections of Chapter 2 fill in some necessary background information and are
independent of each other. Section 2.4 on the Poisson process and Section
2.5 on birth processes provide a gentle warm-up for general continuous-
time Markov chains. These processes are simple and particularly important
examples of continuous-time chains. Sections 2.6-2.8, especially 2.8, deal
with the heart of the continuous-time theory. There is an irreducible level
of difficulty at this point, so we advise that Sections 2.7 and 2.8 are read
selectively at first. Some examples of more general processes are given in
Section 2.9. As in Chapter 1 the exercises form an important part of the
text.
2.1 Q-matrices and their exponentials
In this section we shall discuss some of the basic properties of Q-matrices
and explain their connection with continuous-time Markov chains.
Let I be a countable set. A Q-matrix on I is a matrix Q = (q_{ij} : i, j ∈ I)
satisfying the following conditions:
(i) 0 ≤ −q_{ii} < ∞ for all i;
(ii) q_{ij} ≥ 0 for all i ≠ j;
(iii) Σ_{j∈I} q_{ij} = 0 for all i.
Thus in each row of Q we can choose the off-diagonal entries to be any non-
negative real numbers, subject only to the constraint that the off-diagonal
row sum is finite:
q_i = Σ_{j≠i} q_{ij} < ∞.
The diagonal entry qii is then -qi, making the total row sum zero.
A convenient way to present the data for a continuous-time Markov chain
is by means of a diagram, for example:
[diagram: three states 1, 2, 3 with arrows labelled by the off-diagonal rates of the Q-matrix below]

Each diagram then corresponds to a unique Q-matrix, in this case

Q = ( −2   1   1 )
    (  1  −1   0 )
    (  2   1  −3 ).
Thus each off-diagonal entry qij gives the value we attach to the (i, j) arrow
on the diagram, which we shall interpret later as the rate of going from i to
j. The numbers qi are not shown on the diagram, but you can work them
out from the other information given. We shall later interpret qi as the rate
of leaving i.
We may think of the discrete parameter space {O, 1,2, ... } as embedded
in the continuous parameter space [0, ∞). For p ∈ (0, ∞) a natural way to
interpolate the discrete sequence (p^n : n = 0, 1, 2, ...) is by the function
(e^{tq} : t ≥ 0), where q = log p. Consider now a finite set I and a matrix
P = (p_{ij} : i, j ∈ I). Is there a natural way to fill in the gaps in the discrete
sequence (P^n : n = 0, 1, 2, ...)?
For any matrix Q = (q_{ij} : i, j ∈ I), the series

Σ_{k=0}^{∞} Q^k / k!

converges componentwise and we denote its limit by e^Q. Moreover, if two
matrices Q_1 and Q_2 commute, then

e^{Q_1 + Q_2} = e^{Q_1} e^{Q_2}.

The proofs of these assertions follow the scalar case closely and are given
in Section 2.10. Suppose then that we can find a matrix Q with e^Q = P.
Then

e^{nQ} = (e^Q)^n = P^n

so (e^{tQ} : t ≥ 0) fills in the gaps in the discrete sequence.
Theorem 2.1.1. Let Q be a matrix on a finite set I. Set P(t) = e^{tQ}. Then
(P(t) : t ≥ 0) has the following properties:
(i) P(s + t) = P(s)P(t) for all s, t (semigroup property);
(ii) (P(t) : t ≥ 0) is the unique solution to the forward equation

(d/dt) P(t) = P(t)Q,   P(0) = I;

(iii) (P(t) : t ≥ 0) is the unique solution to the backward equation

(d/dt) P(t) = QP(t),   P(0) = I;

(iv) for k = 0, 1, 2, ..., we have

(d/dt)^k |_{t=0} P(t) = Q^k.
Proof. For any s, t ∈ R, sQ and tQ commute, so

e^{sQ} e^{tQ} = e^{(s+t)Q},

proving the semigroup property. The matrix-valued power series

P(t) = Σ_{k=0}^{∞} (tQ)^k / k!

has infinite radius of convergence (see Section 2.10). So each component is
differentiable with derivative given by term-by-term differentiation:

P'(t) = Σ_{k=1}^{∞} t^{k−1} Q^k / (k − 1)! = P(t)Q = QP(t).
Hence P(t) satisfies the forward and backward equations. Moreover by
repeated term-by-term differentiation we obtain (iv). It remains to show
that P(t) is the only solution of the forward and backward equations. But
if M(t) satisfies the forward equation, then
(d/dt)( M(t) e^{−tQ} ) = ( (d/dt) M(t) ) e^{−tQ} + M(t) ( (d/dt) e^{−tQ} )
                       = M(t) Q e^{−tQ} + M(t)(−Q) e^{−tQ} = 0,

so M(t) e^{−tQ} is constant, and so M(t) = P(t). A similar argument proves
uniqueness for the backward equation. □
The last result was about matrix exponentials in general. Now let us see
what happens to Q-matrices. Recall that a matrix P = (Pij : i, j E I) is
stochastic if it satisfies
(i) 0 ≤ p_{ij} < ∞ for all i, j;
(ii) Σ_{j∈I} p_{ij} = 1 for all i.

We recall the convention that in the limit t → 0 the statement f(t) = O(t)
means that f(t)/t ≤ C for all sufficiently small t, for some C < ∞. Later
we shall also use the convention that f(t) = o(t) means f(t)/t → 0 as t → 0.
Theorem 2.1.2. A matrix Q on a finite set I is a Q-matrix if and only if
P(t) = e^{tQ} is a stochastic matrix for all t ≥ 0.

Proof. As t ↓ 0 we have

P(t) = I + tQ + O(t^2),

so q_{ij} ≥ 0 for i ≠ j if and only if p_{ij}(t) ≥ 0 for all i, j and t ≥ 0 sufficiently
small. Since P(t) = P(t/n)^n for all n, it follows that q_{ij} ≥ 0 for i ≠ j if
and only if p_{ij}(t) ≥ 0 for all i, j and all t ≥ 0.
If Q has zero row sums then so does Q^n for every n:

Σ_{k∈I} q_{ik}^{(n)} = Σ_{k∈I} Σ_{j∈I} q_{ij}^{(n−1)} q_{jk} = Σ_{j∈I} q_{ij}^{(n−1)} Σ_{k∈I} q_{jk} = 0.

So

Σ_{j∈I} p_{ij}(t) = 1 + Σ_{n=1}^{∞} (t^n/n!) Σ_{j∈I} q_{ij}^{(n)} = 1.

On the other hand, if Σ_{j∈I} p_{ij}(t) = 1 for all t ≥ 0, then

Σ_{j∈I} q_{ij} = (d/dt)|_{t=0} Σ_{j∈I} p_{ij}(t) = 0.

□
Now, if P is a stochastic matrix of the form e
Q
for some Q-matrix, we
can do some sort of filling-in of gaps at the level of processes. Fix some
large integer m and let be discrete-time Markov(.x, e
Q
/
m
). We
define a process indexed by {n/m : n = 0,1,2, ... } by
Then (X
n
: n = 0,1,2, ... ) is discrete-time Markov(.x, (eQ/m)m) (see Exer-
cise 1.1.2) and
Thus we can find discrete-time Markov chains with arbitrarily fine
grids {n/m : n = 0,1,2, ... } as time-parameter sets which give rise to
Markov(.x, P) when sampled at integer times. It should not then be too
surprising that there is, as we shall see in Section 2.8, a continuous-time
process which also has this property.
To anticipate a little, we shall see in Section 2.8 that a continuous-time
Markov chain (X_t)_{t≥0} with Q-matrix Q satisfies

P(X_{t_{n+1}} = i_{n+1} | X_{t_0} = i_0, ..., X_{t_n} = i_n) = p_{i_n i_{n+1}}(t_{n+1} − t_n)

for all n = 0, 1, 2, ..., all times 0 ≤ t_0 ≤ ··· ≤ t_{n+1} and all states
i_0, ..., i_{n+1}, where p_{ij}(t) is the (i, j) entry in e^{tQ}. In particular, the transition probability from i to j in time t is given by

p_{ij}(t) := P_i(X_t = j).

(Recall that := means 'defined to equal'.) You should compare this with
the defining property of a discrete-time Markov chain given in Section 1.1.
We shall now give some examples where the transition probabilities Pij(t)
may be calculated explicitly.
Example 2.1.3
We calculate p_{11}(t) for the continuous-time Markov chain with Q-matrix

Q = ( −2   1   1 )
    (  1  −1   0 )
    (  2   1  −3 ).

The method is similar to that of Example 1.1.6. We begin by writing down
the characteristic equation for Q:

0 = det(x − Q) = x(x + 2)(x + 4).

This shows that Q has distinct eigenvalues 0, −2, −4. Then p_{11}(t) has the
form

p_{11}(t) = a + b e^{−2t} + c e^{−4t}

for some constants a, b and c. (This is because we could diagonalize Q by
an invertible matrix U:

Q = U diag(0, −2, −4) U^{−1}.

Then

e^{tQ} = U diag(1, e^{−2t}, e^{−4t}) U^{−1},

so p_{11}(t) must have the form claimed.) To determine the constants we use

1 = p_{11}(0) = a + b + c,
−2 = q_{11} = p'_{11}(0) = −2b − 4c,
7 = q_{11}^{(2)} = p''_{11}(0) = 4b + 16c,

so a = 3/8, b = 1/4, c = 3/8 and

p_{11}(t) = 3/8 + (1/4) e^{−2t} + (3/8) e^{−4t}.
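This formula can be checked against a numerical matrix exponential. A minimal sketch (assuming Python with numpy and scipy.linalg.expm, applied to the Q-matrix above):

    import numpy as np
    from scipy.linalg import expm

    Q = np.array([[-2.0, 1.0, 1.0],
                  [1.0, -1.0, 0.0],
                  [2.0, 1.0, -3.0]])

    for t in (0.1, 0.5, 1.0, 5.0):
        p11_numeric = expm(t * Q)[0, 0]                          # (1,1) entry of e^{tQ}
        p11_formula = 3/8 + np.exp(-2*t)/4 + 3*np.exp(-4*t)/8    # formula above
        print(t, p11_numeric, p11_formula)                       # the two agree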
Example 2.1.4
We calculate p_{ij}(t) for the continuous-time Markov chain with diagram

      λ       λ             λ
0 ──────→ 1 ──────→ 2  ···  N−1 ──────→ N

The Q-matrix is

Q = ( −λ   λ                 )
    (      −λ   λ            )
    (            ⋱    ⋱      )
    (                 −λ   λ )
    (                      0 )
where entries off the diagonal and super-diagonal are all zero. The expo-
nential of an upper-triangular matrix is upper-triangular, so Pij(t) = 0 for
i > j. In components the forward equation P'(t) = P(t)Q reads
p'_{ii}(t) = −λ p_{ii}(t),                      p_{ii}(0) = 1,   for i < N,
p'_{ij}(t) = −λ p_{ij}(t) + λ p_{i,j−1}(t),      p_{ij}(0) = 0,   for i < j < N,
p'_{iN}(t) = λ p_{i,N−1}(t),                     p_{iN}(0) = 0,   for i < N.

We can solve these equations. First, p_{ii}(t) = e^{−λt} for i < N. Then, for
i < j < N,

(d/dt)( e^{λt} p_{ij}(t) ) = λ e^{λt} p_{i,j−1}(t),

so, by induction,

p_{ij}(t) = e^{−λt} (λt)^{j−i} / (j − i)!.

If i = 0, these are the Poisson probabilities of parameter λt. So, starting from 0, the distribution of the Markov chain at time t is the same as
the distribution of min{Y_t, N}, where Y_t is a Poisson random variable of
parameter λt.
Exercises
2.1.1 Compute p_{11}(t) for P(t) = e^{tQ}, where
Q = !4
2 1 -3
2.1.2 Which of the following matrices is the exponential of a Q-matrix?
(a) ( ~ ~ ) (b) ( ~ ~ ) (c) ( ~ ~ ) .
What consequences do your answers have for the discrete-time Markov
chains with these transition matrices?
2.2 Continuous-time random processes
Let I be a countable set. A continuous-time random process
with values in I is a family of random variables X_t : Ω → I. We are going
to consider ways in which we might specify the probabilistic behaviour (or
law) of (X_t)_{t≥0}. These should enable us to find, at least in principle,
any probability connected with the process, such as P(X_t = i) or
P(X_{t_0} = i_0, ..., X_{t_n} = i_n), or P(X_t = i for some t). There are subtleties in
this problem not present in the discrete-time case. They arise because, for
a countable disjoint union,

P( ⋃_n A_n ) = Σ_n P(A_n),

whereas for an uncountable union ⋃_{t≥0} A_t there is no such rule. To avoid
these subtleties as far as possible we shall restrict our attention to processes
(X_t)_{t≥0} which are right-continuous. This means in this context that for all
ω ∈ Ω and t ≥ 0 there exists ε > 0 such that

X_s(ω) = X_t(ω) for t ≤ s ≤ t + ε.

By a standard result of measure theory, which is proved in Section 6.6,
the probability of any event depending on a right-continuous process can
be determined from its finite-dimensional distributions, that is, from the
probabilities

P(X_{t_0} = i_0, X_{t_1} = i_1, ..., X_{t_n} = i_n)

for n ≥ 0, 0 ≤ t_0 ≤ t_1 ≤ ··· ≤ t_n and i_0, ..., i_n ∈ I. For example,

P(X_t = i for some t ∈ [0, ∞)) = 1 − lim_{n→∞} Σ_{j_1,...,j_n ≠ i} P(X_{q_1} = j_1, ..., X_{q_n} = j_n),

where q_1, q_2, ... is an enumeration of the rationals.
Every path t ↦ X_t(ω) of a right-continuous process must remain constant for a while in each new state, so there are three possibilities for the
sorts of path we get. In the first case the path makes infinitely many jumps,
but only finitely many in any interval [0, t]:

[diagram: a path with jump times J_0 = 0, J_1, J_2, ..., making only finitely many jumps in each bounded interval]
The second case is where the path makes finitely many jumps and then
becomes stuck in some state forever:

[diagram: a path making finitely many jumps and then remaining in some state forever]
In the third case the process makes infinitely many jumps in a finite interval;
this is illustrated below. In this case, after the explosion time ζ the process
starts up again; it may explode again, maybe infinitely often, or it may
not.

[diagram: a path making infinitely many jumps in a finite time interval, with explosion time ζ]
We call J_0, J_1, ... the jump times of (X_t)_{t≥0} and S_1, S_2, ... the holding
times. They are obtained from (X_t)_{t≥0} by

J_0 = 0,   J_{n+1} = inf{ t ≥ J_n : X_t ≠ X_{J_n} }

for n = 0, 1, ..., where inf ∅ = ∞, and, for n = 1, 2, ...,

S_n = { J_n − J_{n−1}   if J_{n−1} < ∞,
      { ∞               otherwise.
Note that right-continuity forces S_n > 0 for all n. If J_{n+1} = ∞ for some
n, we define X_∞ = X_{J_n}, the final value, otherwise X_∞ is undefined. The
(first) explosion time ζ is defined by

ζ = sup_n J_n = Σ_{n=1}^{∞} S_n.

The discrete-time process (Y_n)_{n≥0} given by Y_n = X_{J_n} is called the jump
process of (X_t)_{t≥0}, or the jump chain if it is a discrete-time Markov chain.
This is simply the sequence of values taken by (X_t)_{t≥0} up to explosion.
We shall not consider what happens to a process after explosion. So it
is convenient to adjoin to I a new state, ∞ say, and require that X_t = ∞
if t ≥ ζ. Any process satisfying this requirement is called minimal. The
terminology 'minimal' does not refer to the state of the process but to the
interval of time over which the process is active. Note that a minimal
process may be reconstructed from its holding times and jump process.
Thus by specifying the joint distribution of S_1, S_2, ... and (Y_n)_{n≥0} we have
another 'countable' specification of the probabilistic behaviour of (X_t)_{t≥0}.
For example, the probability that X_t = i is given by

P(X_t = i) = Σ_{n=0}^{∞} P(Y_n = i and J_n ≤ t < J_{n+1})

and

P(X_t = i for some t ∈ [0, ∞)) = P(Y_n = i for some n ≥ 0).
2.3 Some properties of the exponential distribution
A random variable T : Ω → [0, ∞] has exponential distribution of parameter
λ (0 ≤ λ < ∞) if

P(T > t) = e^{−λt} for all t ≥ 0.

We write T ∼ E(λ) for short. If λ > 0, then T has density function

f_T(t) = λ e^{−λt} 1_{{t ≥ 0}}.

The mean of T is given by

E(T) = ∫_0^∞ P(T > t) dt = λ^{−1}.
The exponential distribution plays a fundamental role in continuous-time
Markov chains because of the following results.
Theorem 2.3.1 (Memoryless property). A random variable T : Ω →
(0, ∞] has an exponential distribution if and only if it has the following
memoryless property:

P(T > s + t | T > s) = P(T > t) for all s, t ≥ 0.

Proof. Suppose T ∼ E(λ); then

P(T > s + t | T > s) = P(T > s + t)/P(T > s) = e^{−λ(s+t)}/e^{−λs} = e^{−λt} = P(T > t).

On the other hand, suppose T has the memoryless property whenever
P(T > s) > 0. Then g(t) = P(T > t) satisfies

g(s + t) = g(s)g(t) for all s, t ≥ 0.

We assumed T > 0 so that g(1/n) > 0 for some n. Then, by induction,

g(1) = g(1/n)^n > 0,

so g(1) = e^{−λ} for some 0 ≤ λ < ∞. By the same argument, for integers
p, q ≥ 1,

g(p/q) = g(1/q)^p = g(1)^{p/q}

so g(r) = e^{−λr} for all rationals r > 0. For real t > 0, choose rationals
r, s > 0 with r ≤ t ≤ s. Since g is decreasing,

e^{−λr} = g(r) ≥ g(t) ≥ g(s) = e^{−λs}

and, since we can choose r and s arbitrarily close to t, this forces g(t) = e^{−λt},
so T ∼ E(λ). □
The next result shows that a sum of independent exponential random
variables is either certain to be finite or certain to be infinite, and gives a criterion for deciding which is true. This will be used to determine whether or
not certain continuous-time Markov chains can take infinitely many jumps
in a finite time.

Theorem 2.3.2. Let S_1, S_2, ... be a sequence of independent random variables with S_n ∼ E(λ_n) and 0 < λ_n < ∞ for all n.

(i) If Σ_{n=1}^{∞} 1/λ_n < ∞, then P( Σ_{n=1}^{∞} S_n < ∞ ) = 1.

(ii) If Σ_{n=1}^{∞} 1/λ_n = ∞, then P( Σ_{n=1}^{∞} S_n = ∞ ) = 1.

Proof. (i) Suppose Σ_{n=1}^{∞} 1/λ_n < ∞. Then, by monotone convergence,

E( Σ_{n=1}^{∞} S_n ) = Σ_{n=1}^{∞} 1/λ_n < ∞,

so

P( Σ_{n=1}^{∞} S_n < ∞ ) = 1.

(ii) Suppose instead that Σ_{n=1}^{∞} 1/λ_n = ∞. Then Π_{n=1}^{∞} (1 + 1/λ_n) = ∞.
By monotone convergence and independence,

E( exp{ −Σ_{n=1}^{∞} S_n } ) = Π_{n=1}^{∞} E( e^{−S_n} ) = Π_{n=1}^{∞} (1 + 1/λ_n)^{−1} = 0,

so

P( Σ_{n=1}^{∞} S_n = ∞ ) = 1.

□
The following result is fundamental to continuous-time Markov chains.

Theorem 2.3.3. Let I be a countable set and let T_k, k ∈ I, be independent
random variables with T_k ∼ E(q_k) and 0 < q := Σ_{k∈I} q_k < ∞. Set
T = inf_k T_k. Then this infimum is attained at a unique random value K of
k, with probability 1. Moreover, T and K are independent, with T ∼ E(q)
and P(K = k) = q_k/q.

Proof. Set K = k if T_k < T_j for all j ≠ k, otherwise let K be undefined.
Then

P(K = k and T ≥ t)
  = P(T_k ≥ t and T_j > T_k for all j ≠ k)
  = ∫_t^∞ q_k e^{−q_k s} P(T_j > s for all j ≠ k) ds
  = ∫_t^∞ q_k e^{−q_k s} Π_{j≠k} e^{−q_j s} ds
  = ∫_t^∞ q_k e^{−q s} ds = (q_k/q) e^{−q t}.

Hence P(K = k for some k) = 1 and T and K have the claimed joint
distribution. □
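Both conclusions are easy to check by simulation. A minimal sketch (assuming Python with numpy; the rates q = (1, 2, 3) are illustrative values, not from the text):

    import numpy as np

    rng = np.random.default_rng(3)
    rates = np.array([1.0, 2.0, 3.0])     # illustrative q_k
    q = rates.sum()

    samples = rng.exponential(1 / rates, size=(100_000, 3))   # T_k ~ E(q_k)
    T = samples.min(axis=1)
    K = samples.argmin(axis=1)

    print(T.mean(), 1 / q)                        # E(T) = 1/q
    print(np.bincount(K) / len(K), rates / q)     # P(K = k) = q_k / q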
The following identity is the simplest case of an identity used in Section
2.8 in proving the forward equations for a continuous-time Markov chain.
Theorem 2.3.4. For independent random variables S ∼ E(λ) and R ∼
E(μ) and for t ≥ 0, we have

μ P(S ≤ t < S + R) = λ P(R ≤ t < R + S).

Proof. We have

μ P(S ≤ t < S + R) = μ ∫_0^t λ e^{−λs} P(R > t − s) ds = ∫_0^t λ μ e^{−λs} e^{−μ(t−s)} ds,

from which the identity follows by symmetry. □
Exercises
2.3.1 Suppose S and T are independent exponential random variables of
parameters α and β respectively. What is the distribution of min{S, T}?
What is the probability that S ≤ T? Show that the two events {S < T}
and {min{S, T} ≥ t} are independent.

2.3.2 Let T_1, T_2, ... be independent exponential random variables of parameter λ and let N be an independent geometric random variable with

P(N = n) = β(1 − β)^{n−1},   n = 1, 2, ....

Show that T = Σ_{i=1}^{N} T_i has exponential distribution of parameter λβ.

2.3.3 Let S_1, S_2, ... be independent exponential random variables with
parameters λ_1, λ_2, ... respectively. Show that λ_1 S_1 is exponential of parameter 1.
Use the strong law of large numbers to show, first in the special case
λ_n = 1 for all n, and then subject only to the condition sup_n λ_n < ∞, that

P( Σ_{n=1}^{∞} S_n = ∞ ) = 1.

Is the condition sup_n λ_n < ∞ absolutely necessary?
2.4 Poisson processes
Poisson processes are some of the simplest examples of continuous-time
Markov chains. We shall also see that they may serve as building blocks
for the most general continuous-time Markov chain. Moreover, a Poisson
process is the natural probabilistic model for any uncoordinated stream of
discrete events in continuous time. So we shall study Poisson processes
first, both as a gentle warm-up for the general theory and because they
are useful in themselves. The key result is Theorem 2.4.3, which provides
three different descriptions of a Poisson process. The reader might well
begin with the statement of this result and then see how it is used in the
theorems and examples that follow. We shall begin with a definition in
terms of jump chain and holding times (see Section 2.2). A right-continuous
process (X_t)_{t≥0} with values in {0, 1, 2, ...} is a Poisson process of rate λ
(0 < λ < ∞) if its holding times S_1, S_2, ... are independent exponential
random variables of parameter λ and its jump chain is given by Y_n = n.
Here is the diagram:

      λ       λ       λ       λ
0 ──────→ 1 ──────→ 2 ──────→ 3 ──────→ 4  ···

The associated Q-matrix is given by

Q = ( −λ   λ              )
    (      −λ   λ         )
    (           −λ   λ    )
    (                 ⋱   )
By Theorem 2.3.2 (or the strong law of large numbers) we have
P(J_n → ∞) = 1, so there is no explosion and the law of (X_t)_{t≥0} is uniquely
determined. A simple way to construct a Poisson process of rate λ is to
take a sequence S_1, S_2, ... of independent exponential random variables of
parameter λ, to set J_0 = 0, J_n = S_1 + ··· + S_n and then set

X_t = n if J_n ≤ t < J_{n+1}.

[diagram: a typical path of the Poisson process, increasing by 1 at each of the jump times J_1, J_2, J_3, ...]
The diagram illustrates a typical path. We now show how the memory-
less property of the exponential holding times, Theorem 2.3.1, leads to a
memoryless property of the Poisson process.
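The jump chain/holding time construction translates directly into code. A minimal sketch (assuming Python with numpy; the rate 2.0, time 3.0 and the safety cap on the number of jumps are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(4)

    def poisson_count(lam, t, max_jumps=100_000):
        """X_t for a Poisson process of rate lam, built from exponential holding times."""
        J, n = 0.0, 0
        while n < max_jumps:
            J += rng.exponential(1 / lam)   # S_{n+1} ~ E(lam)
            if J > t:
                return n                    # X_t = n on {J_n <= t < J_{n+1}}
            n += 1
        raise RuntimeError("max_jumps exceeded")

    samples = [poisson_count(2.0, 3.0) for _ in range(20_000)]
    print(np.mean(samples), np.var(samples))   # both close to lam * t = 6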
Theorem 2.4.1 (Markov property). Let (X_t)_{t≥0} be a Poisson process
of rate λ. Then, for any s ≥ 0, (X_{s+t} − X_s)_{t≥0} is also a Poisson process of
rate λ, independent of (X_r : r ≤ s).

Proof. It suffices to prove the claim conditional on the event X_s = i, for
each i ≥ 0. Set X̃_t = X_{s+t} − X_s. We have

{X_s = i} = {J_i ≤ s < J_{i+1}}.
On this event,

X_r = Σ_{j=1}^{i} 1_{{J_j ≤ r}} for r ≤ s,

and the holding times S̃_1, S̃_2, ... of (X̃_t)_{t≥0} are given by

S̃_1 = S_{i+1} − (s − J_i),   S̃_n = S_{i+n} for n ≥ 2,

as shown in the diagram.

[diagram: the path of (X_t)_{t≥0} near time s, showing the residual holding time S̃_1 = S_{i+1} − (s − J_i)]
Recall that the holding times S_1, S_2, ... are independent E(λ). Condition
on S_1, ..., S_i and {X_s = i}; then, by the memoryless property of S_{i+1}
and independence, S̃_1, S̃_2, ... are themselves independent E(λ). Hence,
conditional on {X_s = i}, S̃_1, S̃_2, ... are independent E(λ), and independent
of S_1, ..., S_i. Hence, conditional on {X_s = i}, (X̃_t)_{t≥0} is a Poisson process
of rate λ and independent of (X_r : r ≤ s). □
In fact, we shall see in Section 6.5, by an argument in essentially the
same spirit, that the result also holds with s replaced by any stopping time
T of (X_t)_{t≥0}.

Theorem 2.4.2 (Strong Markov property). Let (X_t)_{t≥0} be a Poisson
process of rate λ and let T be a stopping time of (X_t)_{t≥0}. Then, conditional
on T < ∞, (X_{T+t} − X_T)_{t≥0} is also a Poisson process of rate λ, independent
of (X_r : r ≤ T).
Here is some standard terminology. If (X_t)_{t≥0} is a real-valued process,
we can consider its increment X_t − X_s over any interval (s, t]. We say that
(X_t)_{t≥0} has stationary increments if the distribution of X_{s+t} − X_s depends
only on t ≥ 0. We say that (X_t)_{t≥0} has independent increments if its
increments over any finite collection of disjoint intervals are independent.
We come to the key result for the Poisson process, which gives two conditions equivalent to the jump chain/holding time characterization which we
took as our original definition. Thus we have three alternative definitions
of the same process.

Theorem 2.4.3. Let (X_t)_{t≥0} be an increasing, right-continuous integer-valued process starting from 0. Let 0 < λ < ∞. Then the following three
conditions are equivalent:
(a) (jump chain/holding time definition) the holding times S_1, S_2, ... of
(X_t)_{t≥0} are independent exponential random variables of parameter
λ and the jump chain is given by Y_n = n for all n;
(b) (infinitesimal definition) (X_t)_{t≥0} has independent increments and, as
h ↓ 0, uniformly in t,

P(X_{t+h} − X_t = 0) = 1 − λh + o(h),   P(X_{t+h} − X_t = 1) = λh + o(h);

(c) (transition probability definition) (X_t)_{t≥0} has stationary independent
increments and, for each t, X_t has Poisson distribution of parameter
λt.

If (X_t)_{t≥0} satisfies any of these conditions then it is called a Poisson process
of rate λ.
Proof. (a) ⇒ (b) If (a) holds, then, by the Markov property, for any t, h ≥ 0,
the increment X_{t+h} − X_t has the same distribution as X_h and is independent
of (X_s : s ≤ t). So (X_t)_{t≥0} has independent increments and, as h ↓ 0,

P(X_{t+h} − X_t ≥ 1) = P(X_h ≥ 1) = P(J_1 ≤ h) = 1 − e^{−λh} = λh + o(h),
P(X_{t+h} − X_t ≥ 2) = P(X_h ≥ 2) = P(J_2 ≤ h)
                    ≤ P(S_1 ≤ h and S_2 ≤ h) = (1 − e^{−λh})^2 = o(h),

which implies (b).
(b) ⇒ (c) If (b) holds, then, for i = 2, 3, ..., we have P(X_{t+h} − X_t = i) =
o(h) as h ↓ 0, uniformly in t. Set p_j(t) = P(X_t = j). Then, for j = 1, 2, ...,

p_j(t + h) = P(X_{t+h} = j) = Σ_{i=0}^{j} P(X_{t+h} − X_t = i) P(X_t = j − i)
           = (1 − λh + o(h)) p_j(t) + (λh + o(h)) p_{j−1}(t) + o(h)

so

( p_j(t + h) − p_j(t) ) / h = −λ p_j(t) + λ p_{j−1}(t) + O(h).

Since this estimate is uniform in t we can put t = s − h to obtain for all
s ≥ h

( p_j(s) − p_j(s − h) ) / h = −λ p_j(s − h) + λ p_{j−1}(s − h) + O(h).

Now let h ↓ 0 to see that p_j(t) is first continuous and then differentiable
and satisfies the differential equation

p'_j(t) = −λ p_j(t) + λ p_{j−1}(t).

By a simpler argument we also find

p'_0(t) = −λ p_0(t).

Since X_0 = 0 we have initial conditions

p_0(0) = 1,   p_j(0) = 0 for j = 1, 2, ....

As we saw in Example 2.1.4, this system of equations has a unique solution
given by

p_j(t) = e^{−λt} (λt)^j / j!,   j = 0, 1, 2, ....

Hence X_t ∼ P(λt). If (X_t)_{t≥0} satisfies (b), then certainly (X_t)_{t≥0} has
independent increments, but also (X_{s+t} − X_s)_{t≥0} satisfies (b), so the above
argument shows X_{s+t} − X_s ∼ P(λt), for any s, which implies (c).

(c) ⇒ (a) There is a process satisfying (a) and we have shown that it must
then satisfy (c). But condition (c) determines the finite-dimensional distributions of (X_t)_{t≥0} and hence the distribution of jump chain and holding
times. So if one process satisfying (c) also satisfies (a), so must every process
satisfying (c). □
The differential equations which appeared in the proof are really the
forward equations for the Poisson process. To make this clear, consider the
possibility of starting the process from i at time 0, writing P_i as a reminder,
and set

p_{ij}(t) = P_i(X_t = j).

Then, by spatial homogeneity p_{ij}(t) = p_{j−i}(t), and we could rewrite the
differential equations as

p'_{i0}(t) = −λ p_{i0}(t),                      p_{i0}(0) = δ_{i0},
p'_{ij}(t) = λ p_{i,j−1}(t) − λ p_{ij}(t),       p_{ij}(0) = δ_{ij},

or, in matrix form, for Q as above,

P'(t) = P(t)Q,   P(0) = I.
Theorem 2.4.3 contains a great deal of information about the Poisson
process of rate A. It can be useful when trying to decide whether a given
process is a Poisson process as it gives you three alternative conditions to
check, and it is likely that one will be easier to check than another. On the
other hand it can also be useful when answering a question about a given
Poisson process as this question may be more closely connected to one defi-
nition than another. For example, you might like to consider the difficulties
in approaching the next result using the jump chain/holding time definition.
Theorem 2.4.4. If (X_t)_{t≥0} and (Y_t)_{t≥0} are independent Poisson processes
of rates λ and μ, respectively, then (X_t + Y_t)_{t≥0} is a Poisson process of rate
λ + μ.
Proof. We shall use the infinitesimal definition, according to which (X_t)_{t≥0}
and (Y_t)_{t≥0} have independent increments and, as h ↓ 0, uniformly in t,

P(X_{t+h} − X_t = 0) = 1 − λh + o(h),   P(X_{t+h} − X_t = 1) = λh + o(h),
P(Y_{t+h} − Y_t = 0) = 1 − μh + o(h),   P(Y_{t+h} − Y_t = 1) = μh + o(h).

Set Z_t = X_t + Y_t. Then, since (X_t)_{t≥0} and (Y_t)_{t≥0} are independent,
(Z_t)_{t≥0} has independent increments and, as h ↓ 0, uniformly in t,

P(Z_{t+h} − Z_t = 0) = P(X_{t+h} − X_t = 0) P(Y_{t+h} − Y_t = 0)
  = (1 − λh + o(h))(1 − μh + o(h)) = 1 − (λ + μ)h + o(h),

P(Z_{t+h} − Z_t = 1) = P(X_{t+h} − X_t = 1) P(Y_{t+h} − Y_t = 0)
  + P(X_{t+h} − X_t = 0) P(Y_{t+h} − Y_t = 1)
  = (λh + o(h))(1 − μh + o(h)) + (1 − λh + o(h))(μh + o(h))
  = (λ + μ)h + o(h).

Hence (Z_t)_{t≥0} is a Poisson process of rate λ + μ. □
2.4 Poisson processes 79
Next we establish some relations between Poisson processes and the uni-
form distribution. Notice that the conclusions are independent of the rate
of the process considered. The results say in effect that the jumps of a
Poisson process are as randomly distributed as possible.
Theorem 2.4.5. Let be a Poisson process. Then, conditional on
having exactly one jump in the interval [s, s +t], the time at which
that jump occurs is uniformly distributed on [s, s +t].
Proof. We shall use the finite-dimensional distribution definition. By sta-
tionarity of increments, it suffices to consider the case s = O. Then, for
o u t,
IF(J1 u I X
t
= 1) = IF(J
1
u and X
t
= l)/lF(X
t
= 1)
= IF(X
u
= 1 and X
t
- Xu = O)/lF(X
t
= 1)
= Aue-'xue-,X(t-u) /(Ate-'xt) = u/t. D
Theorem 2.4.6. Let be a Poisson process. Then, conditional on
the event {X
t
= n}, the jump times J
1
, ... ,I
n
have joint density function
f(t1, . .. ,tn) = n! ...
Thus, conditional on {X
t
= n}, the jump times J
1
, ... ,I
n
have the same
distribution as an ordered sample of size n from the uniform distribution
on [O,t].
Proof. The holding times S1, . .. ,Sn+1 have joint density function
An+1 e-'x(Sl +...+Sn+l) 1
{Sl ,... ,Sn+l
so the jump times J1, ... ,I
n
+
1
have joint density function
So for A jRn we have
IF((J
1
, ... ,I
n
) E A and X
t
= n) = IF((J
1
, ... ,I
n
) E A and I
n
t < I
n
+1)
= e-,Xt An 1 ... ... dtn
(tl ,... ,tn)EA
and since IF(X
t
= n) = e-,Xt AnIn! we obtain
as required. 0
80 2. Continuous-time Markov chains I
We finish with a simple example typical of many problems making use
of a range of properties of the Poisson process.
Example 2.4.7
Robins and blackbirds make brief visits to my birdtable. The probability
that in any small interval of duration h a robin will arrive is found to be
ph+o(h), whereas the corresponding probability for blackbirds is (3h+o(h).
What is the probability that the first two birds I see are both robins? What
is the distribution of the total number of birds seen in time t? Given that
this number is n, what is the distribution of the number of blackbirds seen
in time t?
By the infinitesimal characterization, the number of robins seen by time
t is a Poisson process of rate p, and the number of blackbirds is
a Poisson process of rate (3. The times spent waiting for the first
robin or blackbird are independent exponential random variables 8
1
and T
1
of parameters p and (3 respectively. So a robin arrives first with probability
p/(p + (3) and, by the memoryless property of T
1
, the probability that
the first two birds are robins is p2 / (p +(3)2. By Theorem 2.4.4 the total
number of birds seen in an interval of duration t has Poisson distribution
of parameter (p +(3)t. Finally
JP>(B
t
= k I R
t
+B
t
= n) = JP>(B
t
= k and R
t
= n - k)/JP>(R
t
+B
t
= n)
= (e-
f3
(3k) (e-
ppn
-
k
)/ (e-
CP
+
f3
)(p+(3)n)
k! (n - k)! n!
=
so if n birds are seen in time t, then the distribution of the number of
blackbirds is binomial of parameters nand {3/ (p +(3).
Exercises
2.4.1 State the transition probability definition of a Poisson process. Show
directly from this definition that the first jump time J1 of a Poisson process
of rate A is exponential of parameter A.
Show also (from the same definition and without assuming the strong
Markov property) that
and hence that J
2
- J1 is also exponential of parameter A and independent
of J1.
2.5 Birth processes 81
2.4.2 Show directly from the infinitesimal definition that the first jump time
J
1
of a Poisson process of rate A has exponential distribution of parameter
A.
2.4.3 Arrivals of the Number 1 bus form a Poisson process of rate one bus
per hour, and arrivals of the Number 7 bus form an independent Poisson
process of rate seven buses per hour.
(a) What is the probability that exactly three buses pass by in one hour?
(b) What is the probability that exactly three Number 7 buses pass by
while I am waiting for a Number I?
(c) When the maintenance depot goes on strike half the buses break down
before they reach my stop. What, then, is the probability that I wait
for 30 minutes without seeing a single bus?
2.4.4 A radioactive source emits particles in a Poisson process of rate A.
The particles are each emitted in an independent random direction. A
Geiger counter placed near the source records a fraction p of the particles
emitted. What is the distribution of the number of particles recorded in
time t?
2.4.5 A pedestrian wishes to cross a single lane of fast-moving traffic. Sup-
pose the number of vehicles that have passed by time t is a Poisson process
of rate A, and suppose it takes time a to walk across the lane. Assuming
that the pedestrian can foresee correctly the times at which vehicles will
pass by, how long on average does it take to cross over safely? [Consider
the time at which the first car passes.]
How long on average does it take to cross two similar lanes (a) when one
must walk straight across (assuming that the pedestrian will not cross if,
at any time whilst crossing, a car would pass in either direction), (b) when
an island in the middle of the road makes it safe to stop half-way?
2.5 Birth processes
A birth process is a generalization of a Poisson process in which the param-
eter A is allowed to depend on the current state of the process. The data
for a birth process consist of birth rates 0 ~ qj < 00, where j == 0,1,2, ....
We begin with a definition in terms of jump chain and holding times. A
minimal right-continuous process ( X t ) ( ~ O with values in {O, 1,2, ... } U{oo}
is a birth process of rates (qj : j ~ 0) if, conditional on X
o
== i, its holding
times 8
1
,8
2
, are independent exponential random variables of param-
eters qi, qi+1, , respectively, and its jump chain is given by Y
n
== i +n.
82 2. Continuous-time Markov chains I
ql q2 q3
~ . ~ . .. .
o 1 2 3 4
The flow diagram is shown above and the Q-matrix is given by:
Q=
Example 2.5.1 (Simple birth process)
Consider a population in which each individual gives birth after an expo-
nential time of parameter -X, all independently. If i individuals are present
then the first birth will occur after an exponential time of parameter iA.
Then we have i + 1 individuals and, by the memoryless property, the pro-
cess begins afresh. Thus the size of the population performs a birth process
with rates qi = i-X. Let X
t
denote the number of individuals at time t and
suppose X
o
= 1. Write T for the time of the first birth. Then
E(X
t
) = E ( X t l T ~ t ) +E(X
t
lT>t)
= it ,xe-ASlE(X
t
I T = s )ds + e-
At

Put J-l(t) = E(X
t
), then E(X
t
IT = s) = 2J-l(t - s), so
p,(t) = it 2,xe-
AS
p,(t - s)ds + e-
At
and setting r = t - s
By differentiating we obtain
so the mean population size grows exponentially:
2.5 Birth processes 83
t
Much of the theory associated with the Poisson process goes through
for birth processes with little change, except that some calculations can no
longer be made so explicitly. The most interesting new phenomenon present
in birth processes is the possibility of explosion. For certain choices of birth
rates, a typical path will make infinitely many jumps in a finite time, as
shown in the diagram. The convention of setting the process to equal 00
after explosion is particularly appropriate for birth processes!
· ..
8 : : :.. .:•..................
7 ; : : :.. : .•..................
· .
6 ; : : :.. •...................
· ..
5 : : : :.. .
· "
4 : : .
· ..
3 : : ...-...0. : .:.: .
2 ..............• .: .. : .:.:: .
1 ....................•• : :.. :.:: .

J
o
== 0
In fact, Theorem 2.3.2 tells us exactly when explosion will occur.
Theorem 2.5.2. Let be a birth process of rates (qj : j > 0),
starting from o.
00 1
(i) IfI: - < 00, then P(( < 00) = 1.
j==O qj
00 1
(ii) If:E - = 00, then P(( = 00) = 1.
j==O qj
Proof. Apply Theorem 2.3.2 to the sequence of holding times 8
1
,8
2
,. . .. D
The proof of the Markov property for- the Poisson process is easily
adapted to give the following generalization.
84 2. Continuous-time Markov chains I
Theorem 2.5.3 (Markov property). Let be a birth process of
rates (qj : j 2:: 0). Then, conditional on X
s
= i, is a birth process
of rates (qj : j 0) starting from i and independent of (X
r
: r s).
We shall shortly prove a theorem on birth processes which generalizes the
key theorem on Poisson processes. First we must see what will replace the
Poisson probabilities. In Theorem 2.4.3 these arose as the unique solution
of a system of differential equations, which we showed were essentially the
forward equations. Now we can still write down the forward equation
P'(t) = P(t)Q, P(O) = I.
or, in components
and, for j = 1, 2, ...
Moreover, these equations still have a unique solution; it is just not as
explicit as before. For we must have
which can be substituted in the equation
1( t) = PiO(t )qo - Pi 1( t )q1 , Pi 1( 0) = 8
i
1
and this equation solved to give
Now we can substitute for Pi1(t) in the next equation up the hierarchy and
find an explicit expression for Pi2(t), and so on.
Theorem 2.5.4. Let be an increasing, right-continuous process
with values in {O, 1,2, ... } U {oo}. Let 0 qj < 00 for all j o. Then the
following three conditions are equivalent:
(a) (jump chain/holding time definition) conditional on X
o
= i, the hold-
ing times 8
1
,8
2
, ... are independent exponential random variables of
parameters qi, qi+1, . .. respectively and the jump chain is given by
Y
n
= i +n for all n;
2.5 Birth processes 85
(b) (infinitesimal definition) for all t, h ~ 0, conditional on X
t
= i, X
t
+
h
is independent of (X
s
: s ~ t) and, as h ! 0, uniformly in t,
IF(X
t
+
h
= i I X
t
= i) = 1 - qih +o(h),
lP(X
t
+
h
= i +1 I X
t
= i) = qih +o(h);
(c) (transition probability definition) for all n = 0,1,2, ... , all times °~
to ~ ... ~ tn+l and all states io, ... ,in+l
where (pij(t) : i,j = 0,1,2, ... ) is the unique solution of the forward
equations.
If ( X t ) t ~ O satisfies any of these conditions then it is called a birth process
of rates (qj : j ~ 0).
Proof. (a) => (b) If (a) holds, then, by the Markov property for any t, h ~ 0,
conditional on X
t
= i, X
t
+
h
is independent of (X
s
: s ~ t) and, as h ! 0,
uniformly in t,
lP(X
t
+
h
~ i +1 I X
t
= i) = lP(X
h
~ i +1 I X
o
= i)
= IF(Jl ~ h I X
o
= i) = 1 - e-
qih
= qih +o(h),
and
IF(X
t
+
h
~ i +2 I Xt = i) = IF(X
h
~ i +2 I X
o
= i)
= IF( J
2
~ h I X
o
= i) ~ IF(Sl ~ hand S2 ~ h I X
o
= i)
= (1 - e-
qih
)(l - e-
qi
+
1h
) = o(h),
which implies (b).
(b) => (c) If (b) holds, then certainly for k = i +2, i +3, ...
IF(X
t
+
h
= k I X
t
= i) = o(h) as h ! 0, uniformly in t.
Set Pij(t) = IF(X
t
= j I X
o
= i). Then, for j = 1,2, ...
Pij(t +h) = IF(X
t
+
h
= j I X o = i)
j
= LJP>(X
t
= k I X
o
= i)JP>(X
Hh
= j I X
t
= k)
k==i
=Pij(t)(l - qjh +o(h)) +Pi,j-l(t)(qj-lh +o(h)) +o(h)
86
so
2. Continuous-time Markov chains I
As in the proof of Theorem 2.4.3, we can deduce that Pij(t) is differentiable
and satisfies the differential equation
By a simpler argument we also find
Thus (Pij(t) : i,j = 0,1,2, ... ) must be the unique solution to the forward
equations. If satisfies (b), then certainly
but also satisfies (b), so
by uniqueness for the forward equations. Hence satisfies (c).
(c) =* (a) See the proof of Theorem 2.4.3. D
Exercise
2.5.1 Each bacterium in a colony splits into two identical bacteria after
an exponential time of parameter -X, which then split in the same way but
independently. Let X
t
denote the size of the colony at time t, and suppose
X
o
= 1. Show that the probability generating function ¢(t) = E(zX
t
)
satisfies
Make a change of variables u = t - s in the integral and deduce that
d¢/dt = -X¢(¢ - 1). Hence deduce that, for q = 1 - e-
At
and n = 1,2, ...
2.6 Jump chain and holding times
2.6 Jump chain and holding times
87
This section begins the theory of continuous-time Markov chains proper,
which will occupy the remainder of this chapter and the whole of the next.
The approach we have chosen is to introduce continuous-time chains in
terms of the joint distribution of their jump chain and holding times. This
provides the most direct mathematical description. It also makes possible
a number of constructive realizations of a given Markov chain, which we
shall describe, and which underlie many applications.
Let I be a countable set. The basic data for a continuous-time Markov
chain on I are given in the form of a Q-matrix. Recall that a Q-matrix on
I is any matrix Q = (qij : i, j E I) which satisfies the following conditions:
(i) 0 ~ -qii < 00 for all i;
(ii) qij ~ 0 for all i =I j;
(iii) L qij = 0 for all i.
JEI
We will sometimes find it convenient to write qi or q(i) as an alternative
notation for -qii.
We are going to describe a simple procedure for obtaining from a Q-
matrix Q a stochastic matrix IT. The jump matrix IT = (7rij : i, j E I) of Q
is defined by
{
qij / qi if j =I i and qi =I 0
7rij = 0 if j =I i and qi = 0,
7rii = {O ~ f qi 1= 0
1 If qi = O.
This procedure is best thought of row by row. For each i E I we take,
where possible, the off-diagonal entries in the ith rovJ of Q and scale them
so they add up to 1, putting a 0 on the diagonal. This is only impossible
when the off-diagonal entries are all 0, then we leave them alone and put a
1 on the diagonal. As you will see in the following example, the associated
diagram transforms into a discrete-time Markov chain diagram simply by
rescaling all the numbers on any arrows leaving a state so they add up to
1.
Example 2.6.1
The Q-matrix
Q = ( ~ 2 ~ 1 ~ )
2 1 -3
88
has diagram:
2. Continuous-time Markov chains I
1
3
1
2
The jump matrix IT of Q is given by
II = ( ~
2/3
and has diagram:
1/2
o
1/3
1
3
1
3
2
Here is the definition of a continuous-time Markov chain in terms of its
jump chain and holding times. Recall that a minimal process is one which
is set equal to 00 after any explosion - see Section 2.2. A minimal right-
continuous process ( X t ) t ~ O on I is a Markov chain with initial distribution
.x and generator matrix Q if its jump chain ( Y n ) n ~ O is discrete-time Mar-
kov(.x, IT) and if for each n ~ 1, conditional on yo, ... ,Y
n
-
1
, its holding
times Sl, ,Sn are independent exponential random variables of param-
eters q(Y
o
), ,q(Yn-1) respectively. We say ( X t ) t ~ O is Markov(.x, Q) for
short. We can construct such a process as follows: let ( Y n ) n ~ O be discrete-
time Markov(.x, IT) and let T
1
, T
2
, ••• be independent exponential random
2.6 Jump chain and holding times 89
variables of parameter 1, independent of Set 8
n
= T
n
/q(Y
n
-
1
),
I
n
= 8
1
+... +8
n
and
{
Yn if I
n
t < I
n
+
1
for some n
X
t
= 00
otherwise.
Then has the required properties.
We shall now describe two further constructions. You will need to un-
derstand these constructions in order to identify processes in applications
which can be modelled as Markov chains. Both constructions make direct
use of the entries in the Q-matrix, rather than proceeding first via the jump
matrix. Here is the second construction.
We begin with an initial state X
o
= Yo with distribution .x, and with an
array : n 1, j E I) of independent exponential random variables of
parameter 1. Then, inductively for n = 0,1,2, ... , if Y
n
= i we set
= for j =I i,
Sn+l =
J-r-'I,
y: _ {j if = 8n +1 < 00
n+1 - . ·f S
'l 1 n+1 = 00.
Then, conditional on Y
n
= i, the random variables are independent
exponentials of parameter qij for all j -=I i. So, conditional on Y
n
= i,
by Theorem 2.3.3, 8
n
+
1
is exponential of parameter qi = Ej:j=i qij, Y
n
+
1
has distribution (7rij : j E I), and 8
n
+
1
and Y
n
+
1
are independent, and
independent of yo, . .. ,Y
n
and 8
1
, ... ,8
n
, as required. This construction
shows why we call qi the rate of leaving i and qij the rate of going from i
to j.
Our third and final construction of a Markov chain with generator matrix
Q and initial distribution .x is based on the Poisson process. Imagine the
state-space I as a labyrinth of chambers and passages, each passage shut
off by a single door which opens briefly from time to time to allow you
through in one direction only. Suppose the door giving access to chamber
j from chamber i opens at the jump times of a Poisson process of rate qij
and you take every chance to move that you can, then you will perform
a Markov chain with Q-matrix Q. In more mathematical terms, we begin
with an initial state X
o
= Yo with distribution .x, and with a family of
independent Poisson processes : i,j E I,i =I j}, having
rate qij. Then set J
o
= °and define inductively for n = 0,1,2, ...
I
n
+
1
= inf{t > I
n
: Nr
nj
=I Nj:
j
for some j #ly
n
}
{
j if I
n
+
1
< 00 and Njnj =I Njnj
Y
n
+
1
= . . n+l n
'l If I
n
+
1
= 00.
90 2. Continuous-time Markov chains I
The first jump time of is exponential of parameter qij. So, by
Theorem 2.3.3, conditional on Yo = i, J
1
is exponential of parameter qi =
Lj#i qij, Y
1
has distribution (7rij : j E I), and J
1
and Y
1
are independent.
Now suppose T is a stopping time of If we condition on X
o
and
on the processes for (k,l) =1= (i,j), which are independent of N;j,
then {T t} depends only on (N;j : s t). So, by the strong Markov
property of the Poisson process N;j := N!}+t - N!} is a Poisson process of
rate qij independent of (N;j : s T), and independent of X
o
and
for (k,l) =1= (i,j). Hence, conditional on T < 00 and XT = i,
has the same distribution as and is independent of (X
s
: s T).
In particular, we can take T = I
n
to see that, conditional on I
n
< 00
and Y
n
= i, 8
n
+
1
is exponential of parameter qi, Y
n
+
1
has distribution
(7rij : j E I), and 8
n
+
1
and Y
n
+
1
are independent, and independent of
yo, ... ,Y
n
and 8
1
, ... ,8
n
· Hence is Markov(-X, Q) and, more-
over, has the strong Markov property. The conditioning on which
this argument relies requires some further justification, especially when the
state-space is infinite, so we shall not rely on this third construction in the
development of the theory.
2.7 Explosion
We saw in the special case of birth processes that, although each holding
time is strictly positive, one can run through a sequence of states with
shorter and shorter holding times and end up taking infinitely many jumps
in a finite time. This phenomenon is called explosion. Recall the notation
of Section 2.2: for a process with jump times Jo, J
1
, J2, ... and holding
times 8
1
, 8
2
, ... , the explosion time ( is given by
00
(= supJ
n
= 2: Sn o
n n==l
Theorem 2.7.1. Let be Markov(-X, Q). Then does not
explode if anyone of the following conditions holds:
(i) I is finite;
(ii) sup qi < 00;
iEI
(iii) X
o
= i, and i is recurrent for the jump chain.
Proof. Set Tn = q(Y
n
-
1
)Sn, then T
1
, T
2
, . .. are independent E(l) and in-
dependent of In cases (i) and (ii), q = SUPi qi < 00 and
00
2. 7 Explosion 91
with probability 1. In case (iii), we know that visits i infinitely
often, at times N
1
, N
2
, • •• , say. Then
00
qi( L TN",+! = 00
m==l
with probability 1. D
We say that a Q-matrix Q is explosive if, for the associated Markov chain
Pi (( < 00) > 0 for some i E I.
Otherwise Q is non-explosive. Here as in Chapter 1 we denote by Pi the
conditional probability Pi(A) = P(AIX
o
= i). It is a simple consequence
of the Markov property for (Yn)n>O that under Pi the process (Xt)t>o is
- -
Markov(8
i
, Q). The result just proved gives simple conditions for non-
explosion and covers many cases of interest. As a corollary to the next
result we shall obtain necessary and sufficient conditions for Q to be explo-
sive, but these are not as easy to apply as Theorem 2.7.1.
Theorem 2.7.2. Let be a continuous-time Markov chain with
generator matrix Q and write ( for the explosion time of (Xt)t>o. Fix
() > 0 and set Zi = Ei(e-(}(). Then Z = (Zi : i E I) satisfies: -
(i) IZil 1 for all i;
(ii) Qz = Oz.
Moreover, ifz also satisfies (i) and (ii), then Zi Zi for all i.
Proof. Condition on X
o
= i. The time and place of the first jump are
independent, J
1
is E(qi) and
Moreover, by the Markov property of the jump chain at time n = 1, con-
ditional on XJ
1
= k, is Markov(8k,Q) and independent of J
1

So
and
92 2. Continuous-time Markov chains I
Recall that qi = -qii and q(Jrik = qik. Then
(0 - qii)Zi = L qikZk
k#i
so
OZi = Lqikzk
kEI
and so z satisfies (i) and (ii). Note that the same argument also shows that
Suppose that zalso satisfies (i) and (ii), then, in particular
for all i. Suppose inductively that
z. < E·(e-
9Jn
)
~ - ~
then, since zsatisfies (ii)
Hence Zi ~ E
i
(e-
9Jn
) for all n. By monotone convergence
as n ~ 00, so Zi ~ Zi for all i. D
Corollary 2.7.3. For each (J > 0 the following are equivalent:
(a) Q is non-explosive;
(b) Qz = (Jz and IZil ~ 1 for all i imply z = o.
Proof. If (a) holds then JP>i(( = 00) = 1 so E
i
(e-
9
() = o. By the theorem,
Qz = (Jz and Izl ~ 1 imply Zi ~ E
i
(e-
9
(), hence z ~ 0, by symmetry z ~ 0,
and hence (b) holds. On the other hand, if (b) holds, then by the theorem
E
i
(e-
9
() = 0 for all i, so JP>i(( = 00) = 1 and (a) holds. D
2.8 Forward and backward equations 93
Exercise
2.7.1 Let (X
t
be a Markov chain on the integers with transition rates
qi,i+l = Aqi, qi,i-l = J-lqi
and qij = °if Ij - il 2, where A + J-l = 1 and qi > °for all i. Find for all
integers i:
(a) the probability, starting from 0, that X
t
hits i;
(b) the expected total time spent in state i, starting from 0.
In the case where J-l = 0, write down a necessary and sufficient condition
for to be explosive. Why is this condition necessary for to
be explosive for all J-l E [0,1/2)?
Show that, in general, is non-explosive if and only if one of the
following conditions holds:
(i) A = J-l;
(ii) A > J-l and 1/qi = 00;
(iii) A < J-l and = 00.
2.8 Forward and backward equations
Although the definition of a continuous-time Markov chain in terms of its
jump chain and holding times provides a clear picture of the process, it does
not answer some basic questions. For example, we might wish to calculate
IPi(X
t
= j). In this section we shall obtain two more ways of characterizing
a continuous-time Markov chain, which will in particular give us a means
to find IPi(X
t
= j). As for Poisson processes and birth processes, the
first step is to deduce the Markov property from the jump chain/holding
time definition. In fact, we shall give the strong Markov property as this
is a fundamental result and the proof is not much harder. However, the
proof of both results really requires the precision of measure theory, so we
have deferred it to Section 6.5. If you want to understand what happens,
Theorem 2.4.1 on the Poisson process gives the main idea in a simpler
context.
Recall that a random variable T with values in [0,00] is a stopping time of
if for each t E [0,00) the event {T t} depends only on (X
s
: s t).
Theorem 2.8.1 (Strong Markov property). Let be
Markov(A, Q) and let T be a stopping time of Then, conditional
on T < 00 and XT = i, is Markov(8
i
, Q) and independent of
(X
s
: s T).
We come to the key result for continuous-time Markov chains. We shall
present first a version for the case of finite state-space, where there is a
94 2. Continuous-time Markov chains I
simpler proof. In this case there are three alternative definitions, just as for
the Poisson process.
Theorem 2.8.2. Let be a right-continuous process with values in
a finite set I. Let Q be a Q-matrix on I with jump matrix II. Then the
following three conditions are equivalent:
(a) (jump chain/holding time definition) conditional on X
o
= i, the
jump chain of is discrete-time Markov(8
i
, IT) and for
each n 1, conditional on yo, . .. ,Y
n
-
1
, the holding times 8
1
, ... ,8
n
are independent exponential random variables of parameters
q(Y
o
), ... ,q(Y
n
-
1
) respectively;
(b) (infinitesimal definition) for all t, h 0, conditional on X
t
= i, Xt+h
is independent of (X
s
: s t) and, as h ! 0, uniformly in t, for all j
(c) (transition probability definition) for all n = 0, 1, 2, ... , all times °::;
to t
1
... t
n
+
1
and all states io, ... ,i
n
+
1
where (Pij (t) : i, j E I, t 0) is the solution of the forward equation
P'(t) = P(t)Q, P(o) = I.
If satisfies any of these conditions then it is called a Markov chain
with generator matrix Q. We say that is Markov(>.., Q) for short,
where>.. is the distribution of X
o
.
Proof. (a) =* (b) Suppose (a) holds, then, as h ! 0,
and for j =I i we have
JP>i(Xh = j) JP>(J1 h, Y1 = j, 8
2
> h)
= (1 - e-qih)7rije-qjh = qijh + o(h).
Thus for every state j there is an inequality
2.8 Forward and backward equations 95
and by taking the finite sum over j we see that these must in fact be
equalities. Then by the Markov property, for any t, h 0, conditional on
X
t
= i, X
t
+
h
is independent of (X
s
: s t) and, as h ! 0, uniformly in t
(b) :::} (c) Set Pij(t) = lFi(X
t
= j) = IF(X
t
= j I X
o
= i). If (b) holds, then
for all t, h 0, as h ! 0, uniformly in t
Pij(t + h) = LlPi(X
t
= k)IP(XHh = j I X
t
= k)
kEI
= LPik(t)(8kj + qkjh + o(h)).
kEI
Since I is finite we have
Pij(t + - Pij(t) = LPik(t)qkj + O(h)
kEI
so, letting h ! 0, we see that Pij(t) is differentiable on the right. Then by
uniformity we can replace t by t - h in the above and let h ! °to see first
that Pij(t) is continuous on the left, then differentiable on the left, hence
differentiable, and satisfies the forward equations
= LPik(t)qkj, Pij(O) = 8ij .
kEI
Since I is finite, Pij(t) is then the unique solution by Theorem 2.1.1. Also,
if (b) holds, then
and, moreover, (b) holds for so, by the above argument,
proving (c).
(c) :::} (a) See the proof of Theorem 2.4.3. D
We know from Theorem 2.1.1 that for I finite the forward and backward
equations have the same solution. So in condition (c) of the result just
proved we could replace the forward equation with the backward equation.
Indeed, there is a slight variation of the argument from (b) to (c) which
leads directly to the backward equation.
96 2. Continuous-time Markov chains I
The deduction of (c) from (b) above can be seen as the matrix version
of the following result: for q E lR we have
(1 + + o( )n -4 e
q
as n -4 00.
Suppose (b) holds and set
Pij(t, t +h) = P(X
t
+
h
= j I X
t
= i);
then P(t, t +h) = (Pij(t, t +h) : i,j E I) satisfies
P(t, t +h) = I +Qh +o(h)
and
... =
Some care is needed in making this precise, since the o(h) terms, though
uniform in t, are not a priori identical. On the other hand, in (c) we see
that
P(O, t) = e
tQ
.
We turn now to the case of infinite state-space. The backward equation
may still be written in the form
P'(t) = QP(t), P(O) = I
only now we have an infinite system of differential equations
= L qikPkj(t), Pij(O) = 15ij
kEI
and the results on matrix exponentials given in Section 2.1 no longer apply.
A solution to the backward equation is any matrix (pij(t) : i,j E I) of
differentiable functions satisfying this system of differential equations.
Theorem 2.8.3. Let Q be a Q-matrix. Then the backward equation
P'(t) = QP(t), P(O) = I
has a minimal non-negative solution (P(t) : t 2: 0). This solution forms a
matrix semigroup
P(s)P(t) = P(s +t) for all s, t O.
We shall prove this result by a probabilistic method in combination with
Theorem 2.8.4. Note that if I is finite we must have P(t) = e
tQ
by Theorem
2.1.1. We call (P(t) : t 0) the minimal non-negative semigroup associated
to Q, or simply the semigroup of Q, the qualifications minimal and non-
negative being understood.
Here is the key result for Markov chains with infinite state-space. There
are just two alternative definitions now as the infinitesimal characterization
become problematic for infinite state-space.
2.8 Forward and backward equations 97
(2.2)
Theorem 2.8.4. Let be a minimal right-continuous process with
values in I. Let Q be a Q-matrix on I with jump matrix IT and semigroup
(P(t) : t 0). Then the following conditions are equivalent:
(a) (jump chain/holding time definition) conditional on X
o
= i, the
jump chain of is discrete-time Markov(8i, II) and for
each n 1, conditional on yo, . .. ,Y
n
-
1
, the holding times 8
1
, ... ,8
n
are independent exponential random variables of parameters
q(Y
o
), ... ,q(Y
n
-
1
) respectively;
(b) (transition probability definition) for all n = 0,1,2, ... , all times 0
to tl ... t
n
+l and all states io, il, ... ,i
n
+l
If satisfies any of these conditions then it is called a Markov chain
with generator matrix Q. We say that is Markov (A, Q) for short,
where A is the distribution of X
o
.
Proof of Theorems 2.8.3 and 2.8.4. We know that there exists a process
satisfying (a). So let us define P(t) by
Pij(t) =JP>i(X
t
=j).
Step 1. We show that P(t) satisfies the backward equation.
Conditional on X
o
= i we have J
1
rv E(ql) and X
J1
rv (7rik : k E I).
Then conditional on J
1
= sand X
J1
= k we have rv
Markov(8k, Q). So
and
JP>i(J1 ::; t, XJI = k, X
t
= j) = I
t
qie-QiS7rikPkj(t - s)ds.
Therefore
Pij(t) = JP>i(X
t
= j, t < J1) + LJP>i(J1 ::; t, XJl = k, X
t
= j)
k#i
= e-Qit8ij + 2: t qie-QiS7rikPkj(t - s)ds. (2.1)
k#i Jo
Make a change of variable u = t - s in each of the integrals, interchange
sum and integral by monotone convergence and multiply by e
qit
to obtain
eQitpij(t) = 8
ij
+ t L qieQiU7rikPkj(u)du.
Jo k#i
98 2. Continuous-time Markov chains I
This equation shows, firstly, that Pij (t) is continuous in t for all i, j.
Secondly, the integrand is then a uniformly converging sum of continuous
functions, hence continuous, and hence Pij (t) is differentiable in t and sat-
isfies
eqit(qiPij(t) + P ~ j ( t ) ) = L qieqit7rikPkj(t).
k#i
Recall that qi = -qii and qik = qi'lrik for k =I i. Then, on rearranging, we
obtain
P ~ j ( t ) = L qikPkj(t)
kEI
(2.3)
so P(t) satisfies the backward equation.
The integral equation (2.1) is called the integral form of the backward
equation.
Step 2. We show that if P(t) is another non-negative solution of the back-
ward equation, then P(t) ~ P(t), hence P(t) is the minimal non-negative
solution.
The argument used to prove (2.1) also shows that
JP>i(X
t
=j,t < I
n
+l)
= e-qit15ij + L rt qie-QiS7rikJP>k(Xt-s = j, t - s < In)ds.
k=li Jo (2.4)
On the other hand, if P(t) satisfies the backward equation, then, by revers-
ing the steps from (2.1) to (2.3), it also satisfies the integral form:
If P(t) 2 0, then
for all i, j and t.
Let us suppose inductively that
for all i, j and t,
then by comparing (2.4) and (2.5) we have
for all i, j and t,
2.8 Forward and backward equations
and the induction proceeds. Hence
99
for all i, j and t.
Step 3. Since does not return from 00 we have
Pij(S + t) = lPi(Xs+t = j) = LlPi(XS+
t
= j I X s = k)lPi(Xs = k)
kEI
= :ElPi(X
s
= k)lPk(X
t
= j) = :EPik(S)Pkj(t)
kEI kEI
by the Markov property. Hence (P(t) : t 0) is a matrix semigroup. This
completes the proof of Theorem 2.8.3.
Step 4. Suppose, as we have throughout, that satisfies (a). Then,
by the Markov property
P(X
tn
+
1
= i
n
+
1
IX
to
= io, ... ,X
tn
= in)
=Pin (Xtn+l-tn = i n+1) = Pi
n
i
n
+l (t n+l - tn)
so satisfies (b). We complete the proof of Theorem 2.8.4 by the
usual argument that (b) must now imply (a) (see the proof of Theorem
2.4.3, (c) :::} (a)). D
So far we have said nothing about the forward equation in the case of
infinite state-space. Remember that the finite state-space results of Section
2.1 are no longer valid. The forward equation may still be written
P'(t) = P(t)Q, P(O) = I,
now understood as an infinite system of differential equations
= LPik(t)qkj, Pij(O) = 8ij .
kEI
A solution is then any matrix (pij(t) : i,j E I) of differentiable functions
satisfying this system of equations. We shall show that the
(P(t) : t 0) of Q does satisfy the forward equations, by a probabilistic
argument resembling Step 1 of the proof of Theorems 2.8.3 and 2.8.4. This
time, instead of conditioning on the first event, we condition on the last
event before time t. The argument is a little longer because there is no
reverse-time Markov property to give the conditional distribution. We need
the following time-reversal identity, a simple version of which was given in
Theorem 2.3.4.
100 2. Continuous-time Markov chains I
Lemma 2.8.5. We have
qinJP(J
n
~ t < I
n
+
1
I Yo = io, Y
1
= i
1
, ... ,Y
n
= in)
= qioJP(Jn ~ t < I n+
1
I Yo = in,· .. , Y
n
-
1
= iI, Y
n
= io).
Proof· Conditional on Yo = i
o
, ... ,Y
n
= in, the holding times 8
1
, ... ,8
n
+
1
are independent with 8k rv E(qik_l). So the left-hand side is given by
{ qi
n
exp{-qi
n
(t - Sl - · · · - sn)} IT qik-l exp{-qik-l Sk}dsk
Jt:!..(t) k=l
where Ll(t) = {(SI, ... ,sn) : SI + ... + Sn ~ t and SI, ... ,8
n
~ O}. On
making the substitutions Ul = t - 81 - ... - Sn and Uk = 8
n
-k+2, for
k = 2, ... ,n, we obtain
qinJP(Jn ~ t < I n+
1
I Yo = i
o
,· .. ,Y
n
= in)
= ( qio exp{-qio(t - U1 - · .. - Un)} IT qin-k+l exp{-qin-k+l Uk}duk
Jt:!..(t) k=l
=qioJP(J
n
~ t < I
n
+
1
I Yo = in, . .. ,Y
n
-
1
= iI, Y
n
= io). D
Theorem 2.8.6. The minimal non-negative solution (P(t) : t 2: 0) of the
backward equation is also the minimal non-negative solution of the forward
equation
P'(t) = P(t)Q, P(O) = I.
Proof. Let ( X t ) t ~ O denote the minimal Markov chain with generator matrix
Q. By Theorem 2.8.4
00
= L LJP>i(Jn ::; t < In+i, Y
n
-
1
= k, Y
n
= j).
n==O k=j:j
Now by Lemma 2.8.5, for n ~ 1, we have
JPi(J
n
~ t < I
n
+
1
I Y
n
-
1
= k, Y
n
= j)
= (qi/qj)JPj(J
n
~ t < I
n
+
1
I Y
1
= k, Y
n
= i)
= (qi/Qj) it qje-QjBJP>k(Jn_1 ::; t - S < I
n
I Yn-1 = i)ds
= qi I
t
e-
QjB
(qk/qi)JP>i (In-1 ::; t - s < I
n
I Y
n
-
1
= k)ds
2.8 Forward and backward equations 101
where we have used the Markov property of for the second equality.
Hence
Pij(t) = 8ije-qit + f '2: it JP>i(Jn-l ::; t - s < I
n
I Y
n
-
1
= k)
n=lki=i 0
X IPi(Y
n
-
1
= k, Y
n
= j)Qke-qj8ds
00 ft
=8ije-qit + L L io JP>i(Jn-l ::; t - s< I
n
, Y
n
-
1
= k)qk'lrkje-qjSds
n=lki=i 0
= 8ije-qit + t '2:Pik(t - s)qkje-qjSds (2.6)
io k#j
where we have used monotone convergence to interchange the sum and
integral at the last step. This is the integral form of the forward equation.
Now make a change of variable u = t - s in the integral and multiply by
e
qjt
to obtain
(2.7)
We know by equation (2.2) that eqitpik(t) is increasing for all i, k. Hence
either
'2:Pik(U)qkj converges uniformly for u E [0, t]
ki=i
or
LPik(U)qkj = 00 for all u t.
ki=j
The latter would contradict (2.7) since the left-hand side is finite for all t,
so it is the former which holds. We know from the backward equation that
Pii (t) is continuous for all i, j; hence by uniform convergence the integrand
in (2.7) is continuous and we may differentiate to obtain
+ Pij(t)qj = '2:Pik(t)qkj.
ki=i
Hence P(t) solves the forward equation.
To establish minimality let us suppose that Pij(t) is another solution of
the forward equation; then we also have
102 2. Continuous-time Markov chains I
A small variation of the argument leading to (2.6) shows that, for n ~ 0
Pi(X
t
= j, t < I
n
+
1
)
= 8ije-qit + L (t IP\(X
t
= j, t < In)qkje-QjBds. (2.8)
k#jJo
If P(t) ~ 0, then
P(X
t
= j, t < J
o
) = 0 ~ Pij (t) for all i, j and t.
Let us suppose inductively that
then by comparing (2.7) and (2.8) we obtain
and the induction proceeds. Hence
Exercises
2.8.1 Two fleas are bound together to take part in a nine-legged race on the
vertices A, B, C of a triangle. Flea 1 hops at random times in the clockwise
direction; each hop takes the pair from one vertex to the next and the times
between successive hops of Flea 1 are independent random variables, each
with with exponential distribution, mean 1/A. Flea 2 behaves similarly,
but hops in the anticlockwise direction, the times between his hops having
mean 1/Il. Show that the probability that they are at A at a given time
t > 0 (starting from A at time t = 0) is
2.8.2 Let ( X t ) t ; ~ o be a birth-and-death process with rates An = nA and
Iln = nil, and assume that X
o
= 1. Show that h(t) = P(X
t
= 0) satisfies
2.9 Non-minimal chains
and deduce that if A =I J-l then
2.9 Non-minimal chains
103
This book concentrates entirely on processes which are right-continuous
and minimal. These are the simplest sorts of process and, overwhelmingly,
the ones of greatest practical application. We have seen in this chapter
that we can associate to each distribution A and Q-matrix Q a unique
such process, the Markov chain with initial distribution A and generator
matrix Q. Indeed we have taken the liberty of defining Markov chains to be
those processes which arise in this way. However, these processes do not by
any means exhaust the class of memoryless continuous-time processes with
values in a countable set I. There are many more exotic possibilities, the
general theory of which goes very much deeper than the account given in
this book. It is in the nature of things that these exotic cases have received
the greater attention among mathematicians. Here are some examples to
help you imagine the possibilities.
Example 2.9.1
Consider a birth process ( X t ) t ~ O starting from 0 with rates qi = 2
i
for i ~ o.
We have chosen these rates so that
00 00
Lq:;l = L2-i < 00
i=O i=O
which shows that the process explodes (see Theorems 2.3.2 and 2.5.2). We
have until now insisted that X
t
= 00 for all t 2 (, where ( is the explosion
time. But another obvious possibility is to start the process off again from
oat time (, and do the same for all subsequent explosions. An argument
based on the memoryless property of the exponential distribution shows
that for 0 ~ to ~ . . . ~ t
n
+1 this process satisfies
for a semigroup of stochastic matrices (P(t) : t ~ 0) on I. This is the
defining property for a more general class of Markov chains. Note that
the chain is no longer determined by A and Q alone; the rule for bringing
( X t ) t ~ O back into I after explosion also has to be given.
104 2. Continuous-time Markov chains I
Example 2.9.2
We make a variation on the preceding example. Suppose now that the jump
chain of is the Markov chain on Z which moves one step away from
owith probability 2/3 and one step towards 0 with probability 1/3, and
that Yo = O. Let the transition rates for be qi = 2
1il
for i E Z. Then
is again explosive. (A simple way to see this using some results of
Chapter 3 is to check that is transient but has an invariant
distribution - by solution of the detailed balance equations. Then Theorem
3.5.3 makes explosion inevitable.) Now there are two ways in which
can explode, either X
t
-00 or X
t
00.
The process may again be restarted at 0 after explosion. Alternatively,
we may choose the restart randomly, and according to the way that explo-
sion occurred. For example
if X
t
-00 as t i (
if X
t
00 as t i (
where Z takes values ±1 with probability 1/2.
Example 2.9.3
The processes in the preceding two examples, though no longer minimal,
were at least right-continuous. Here is an altogether more exotic example,
due to P. Levy, which is not even right-continuous. Consider
for n 0
and set I = UnD
n
. With each i E D
n
\D
n
-
1
we associate an independent
exponential random variable Si of parameter (2
n
)2. There are 2
n
-
1
states
in (Dn\Dn-l) n [0,1), so, for all i E I
and
Now define
if L Sj :::; t < L Sj for some i E I
j<i
otherwise.
2.10 Appendix: matrix exponentials 105
This process runs through all the dyadic rationals i E I in the usual order.
It remains in i E D
n
\D
n
-
1
for an exponential time of parameter 1. Between
any two distinct states i and j it makes infinitely many visits to 00. The
Lebesgue measure of the set of times t when X
t
= 00 is zero. There is a
semigroup of stochastic matrices (P(t) : t ~ 0) on I such that, for 0 ~ to ~
... ~ tn+l
In particular, P(X
t
= 00) = 0 for all t ~ O. The details may be found in
Markov Chains by D. Freedman (Holden-Day, San Francisco, 1971).
We hope these three examples will serve to suggest some of the possibil-
ities for more general continuous-time Markov chains. For further reading,
see Freedman's book, or else Markov Chains with Stationary Transition
Probabilities by K.-L. Chung (Springer, Berlin, 2nd edition, 1967), or Dif-
fusions, Markov Processes and Martingales, Vol 1: Foundations by L. C. G.
Rogers and D. Williams (Wiley, Chichester, 2nd edition, 1994).
2.10 Appendix: matrix exponentials
Define two norms on the space of real-valued N x N -matrices
IQI = sup IQvl/lvl, IIQlloo = sup Iqijl·
v#O i,j
Obviously, IIQlloo is finite for all Q and controls the size of the entries in
Q. We shall show that the two norms are equivalent and that IQI is well
adapted to sums and products of matrices, which IIQlloo is not.
Lemma 2.10.1. We have
(a) IIQlloo ~ IQI ~ NIIQlloo;
(b) IQl +Q21 ~ IQll + IQ21 and IQIQ21 ~ IQIIIQ21·
Proof. (a) For any vector v we have IQvl ~ IQllvl. In particular, for the
vector Cj = (0, ... ,1, ... ,0), with 1 in the jth place, we have IQcjl ~ IQI.
The supremum defining IIQlloo is achieved, at j say, so
I I Q I I ~ ~ L(qij)2 = IQcjl2 ~ IQI
2
.
i
106
On the other hand
2. Continuous-time Markov chains I
IQvI
2
= 2
( IIQllooIVjl) 2
= ( IVjl) 2
and, by the Cauchy-Schwarz inequality
(b) For any vector v we have
I(QI +Q2)vl IQI
V
I+ IQ2
V
l (IQll + IQ21)1vl,
IQIQ2
V
l IQI11Q2
V
l IQIIIQ21Ivl· D
Now for n = 0,1,2, ... , consider the finite sum
n Qk
E(n) = L kf'
k=O
For each i and j, and m n, we have
I(E(n) - E(m))iil IIE(n) - E(m)lloo IE(n) - E(m)1
n Qk
Lkf
k=m+l
n IQl
k
< '" -
- LJ k!·
k=m+l
Since IQI NllQlloo < 00, IQl
k
/k! converges by the ratio test, so
n IQl
k
'" -
LJ k!
k=m+l
as m,n 00.
2.10 Appendix: matrix exponentials 107
Hence each component of E(n) forms a Cauchy sequence, which therefore
converges, proving that
CX) Qk
e
Q
= L kf
k=O
is well defined and, indeed, that the power series
(
t
Q
) .. _ ~ (tQ)fj
e '1,) - L.-J k!
k=O
has infinite radius of convergence for all i,j. Finally, for two commuting
matrices Q1 and Q2 we have
3
Continuous-time Markov chains II
This chapter brings together the discrete-time and continuous-time theo-
ries, allowing us to deduce analogues, for continuous-time chains, of all the
results given in Chapter 1. All the facts from Chapter 2 that are necessary
to understand this synthesis are reviewed in Section 3.1. You will require
a reasonable understanding of Chapter 1 here, but, given such an under-
standing, this chapter should look reassuringly familiar. Exercises remain
an important part of the text.
3.1 Basic properties
Let I be a countable set. Recall that a Q-matrix on I is a matrix Q =
(qij : i,j E I) satisfying the following conditions:
(i) 0 ~ -qii < 00 for all i;
(ii) qij 2 0 for all i =I j;
(iii) L qij = 0 for all i.
jEI
We set qi = q(i) = -qii. Associated to any Q-matrix is a jump matrix
IT = (7rij : i,j E I) given by
1r.. _ { qij / qi if j =I i and qi =I 0
~ J - 0 if j =I i and qi = 0,
7r.. _ {o if qi =I 0
~ ~ - 1 if qi = o.
Note that II is a stochastic matrix.
3.1 Basic properties 109
A sub-stochastic matrix on I is a matrix P = (Pij : i, j E I) with non-
negative entries and such that
LPij ::; 1 for all i.
jEI
Associated to any Q-matrix is a semigroup (P(t) : t 2 0) of sub-stochastic
matrices P(t) = (Pij(t) : i,j E I). As the name implies we have
P(s)P(t) = P(s +t) for all s, t 2 o.
You will need to be familiar with the following terms introduced in Sec-
tion 2.2: minimal right-continuous random process, jump times, holding
times, jump chain and explosion. Briefly, a right-continuous process ( X t ) t ~ O
runs through a sequence of states Yo, Y
1
, Y
2
, . .. , being held in these states
for times 8
1
,8
2
,8
3
, ... respectively and jumping to the next state at times
J
1
, J
2
, J
3
,· ... Thus I
n
= 8
1
+... +8
n
. The discrete-time process ( Y n ) n ~ O
is the jump chain, (8
n
)n>1 are the holding times and (I
n
)n>1 are the jump
- -
times. The explosion time ( is given by
00
( = '""" 8n = lim I n .
L.J n--+-oo
n=l
For a minimal process we take a new state 00 and insist that X
t
= 00 for
all t 2 (. An important point is that a minimal right-continuous process is
determined by its jump chain and holding times.
The data for a continuous-time Markov chain ( X t ) t ~ O are a distribution
-X and a Q-matrix Q. The distribution -X gives the initial distribution, the
distribution of X
o
. The Q-matrix is known as the generator matrix of
( X t ) t ~ O and determines how the process evolves from its initial state. We
established in Section 2.8 that there are two different, but equivalent, ways
to describe how the process evolves.
The first, in terms of jump chain and holding times, states that
(a) ( Y n ) n ~ O is Markov(-X, IT);
(b) conditional on Yo = i
o
, ... ,Y
n
-
1
= in-I, the holding times 8
1
, ... ,Sn
are independent exponential random variables of parameters
qi
o
' . .. ,qi
n
-1 .
Put more simply, given that the chain starts at i, it waits there for an
exponential time of parameter qi and then jumps to a new state, choosing
state j with probability 1rij. It then starts afresh, forgetting what has gone
before.
110 3. Continuous-time Markov chains II
The second description, in terms of the semigroup, states that the finite-
dimensional distributions of the process are given by
(c) for all n = 0,1,2, ... , all times 0 to tl ... tn+l and all states
i
o
, iI, . .. ,i
n
+1
Again, put more simply, given that the chain starts at i, by time t it is
found in state j with probability Pij (t). It then starts afresh, forgetting
what has gone before. In the case where
Pioo(t) := 1 - :EPij(t) > 0
jEI
the chain is found at 00 with probability Pi 00 (t). The semigroup P(t) is re-
ferred to as the transition matrix of the chain and its entries Pij (t) are the
transition probabilities. This description implies that for all h > 0 the dis-
crete skeleton is Markov(.x, P(h)). Strictly, in the explosive case,
that is, when P(t) is strictly sub-stochastic, we should say P(h)),
where and P(h) are defined on IU{oo}, extending.x and P(h) by = 0
and Pooj(h) = O. But there is no danger of confusion in using the simpler
notation.
The information coming from these two descriptions is sufficient for most
of the analysis of continuous-time chains done in this chapter. Note that
we have not yet said how the semigroup P(t) is associated to the Q-matrix
Q, except via the process! This extra information will be required when
we discuss reversibility in Section 3.7. So we recall from Section 2.8 that
the semigroup is characterized as the minimal non-negative solution of the
backward equation
P'(t) = QP(t), P(O) = I
which reads in components
= L qikPkj(t), Pij(O) = 8ij .
kEI
The semigroup is also the minimal non-negative solution of the forward
equation
P'(t) = P(t)Q, P(O) = I.
In the case where I is finite, P(t) is simply the matrix exponential e
tQ
, and
is the unique solution of the backward and forward equations.
3.2 Class structure
3.2 Class structure
111
A first step in the analysis of a continuous-time Markov chain is to
identify its class structure. We emphasise that we deal only with minimal
chains, those that die after explosion. Then the class structure is simply
the discrete-time class structure of the jump chain as discussed in
Section 1.2.
We say that i leads to j and write i j if
lPi(X
t
= j for some t 0) > o.
We say i communicates with j and write i j if both i j and j
i. The notions of communicating class, closed class, absorbing state and
irreducibility are inherited from the jump chain.
Theorem 3.2.1. For distinct states i and j the following are equivalent:
(i) i
(ii) i j for the jump chain;
(iii) Q
i
oilqili2 ... Qin-li
n
> 0 for some states io,iI, ... ,in with i
o
= i,
in = j;
(iv) Pij(t) > 0 for all t > 0;
(v) Pij(t) > 0 for some t > o.
Proof. Implications (iv) => (v) => (i) => (ii) are clear. If (ii) holds, then
by Theorem 1.2.1, there are states io, i
I
, ... ,in with io = i, in = j and
'1rioi
l
'1rili2 ... '1ri
n
-li
n
> 0, which implies (iii). If qij > 0, then
for all t > 0, so if (iii) holds, then
Pij (t) Pioil (tin) ... Pin-lin (tin) > 0
for all t > 0, and (iv) holds. D
Condition (iv) of Theorem 3.2.1 shows that the situation is simpler than
in discrete-time, where it may be possible to reach a state, but only after a
certain length of time, and then only periodically.
3.3 Hitting times and absorption probabilities
Let (Xt)t>o be a Markov chain with generator matrix Q. The hitting time
of a subset A of I is the random variable D
A
defined by
112 3. Continuous-time Markov chains II
with the usual convention that inf 0 = 00. We emphasise that (Xt)t>o is
minimal. So if H A is the hitting time of A for the jump chain, then -
{H
A
< oo} = {D
A
< oo}
and on this set we have
D
A
= JHA.
The probability, starting from i, that ( X t ) t ~ O ever hits A is then
When A is a closed class, hf is called the absorption probability. Since the
hitting probabilities are those of the jump chain we can calculate them as
in Section 1.3.
Theorem 3.3.1. The vector of hitting probabilities h
A
= (hf : i E I) is
the minimal non-negative solution to the system of linear equations
Proof. Apply Theorem 1.3.2 to the jump chain and rewrite (1.3) in terms
ofQ. D
The average time taken, starting from i, for ( X t ) t ~ O to reach A is given
by
In calculating kf we have to take account of the holding times so the rela-
tionship to the discrete-time case is not quite as simple.
1
1 2
2
1
3
3
3
2
4
3.3 Hitting times and absorption probabilities 113
(3.1)
Example 3.3.2
Consider the Markov chain with the diagram given on the preceding
page. How long on average does it take to get from 1 to 4?
Set k
i
= E
i
(time to get to 4). On starting in 1 we spend an average time
ql
1
= 1/2 in 1, then jump with equal probability to 2 or 3. Thus
k 1 = + k2 + k3
and similarly
k 2 = + k 1 + k 3 , k 3 = + k 1 + k 2 .
On solving these linear equations we find k
1
= 17/12.
Here is the general result. The proof follows the same lines as Theorem
1.3.5.
Theorem 3.3.3. Assume that qi > 0 for all i fj. A. The vector of expected
hitting times k
A
= (kt : i E I) is the minimal non-negative solution to the
system of linear equations
{
kt = 0 for i E A
- LjEI qijkf = 1 for i fj. A.
Proof. First we show that k
A
satisfies (3.1). If X
o
= i E A, then D
A
= 0,
so kt = o. If X
o
= i fj. A, then D
A
J
1
, so by the Markov property of the
jump chain
so
kt = Ei(D
A
) = Ei(J
1
)+LE(DA-J
1
I Y
1
= j)IP\(Y
l
= j) = q;l+L 'lrij
k
1
j#i j#i
and so
- Lqijk1 = 1.
JEI
Suppose now that Y = (Yi : i E I) is another solution to (3.1). Then
kf = Yi = 0 for i E A. Suppose i fj. A, then
-1 '"' -1 '"' (-1 '"' ) Yi = qi + 'lrijYj = qi + 'lrij qj + 'lrjkYk

= Ei(Sl) + Ei (S2
1
{HA:2:2}) + L L'lrij'lrjkYk.

114 3. Continuous-time Markov chains II
By repeated substitution for y in the final term we obtain after n steps
Yi = lEi(Sd + ... + lEi (Sn
1
{HA:2:n}) + L ... L 1riil .. ·
1r
in-dnYin·

So, if y is non-negative
where we use the notation HA /\ n for the minimum of H
A
and n. Now
so, by monotone convergence, Yi E
i
(DA) = kf, as required. D
Exercise
3.3.1 Consider the Markov chain on {I, 2, 3, 4} with generator matrix
Q=
1/6
o
1/2
-1/2
o
o
1/2
o
-1/3
o

1/6
o.
Calculate (a) the probability of hitting 3 starting from 1, (b) the expected
time to hit 4 starting from 1.
3.4 Recurrence and transience
Let be Markov chain with generator matrix Q. Recall that we insist
be minimal. We say a state i is recurrent if
JP>i({t 0: X
t
= i} is unbounded) = 1.
We say that i is transient if
JP>i( {t 0 : X
t
= i} is unbounded) = O.
Note that if can explode starting from i then i is certainly not
recurrent. The next result shows that, like class structure, recurrence and
transience are determined by the jump chain.
3.4 Recurrence and transience 115
Theorem 3.4.1. We have:
(i) if i is recurrent for the jump chain then i is recurrent for

(ii) if i is transient for the jump chain, then i is transient for
(iii) every state is either recurrent or transient;
(iv) recurrence and transience are class properties.
Proof. (i) Suppose i is recurrent for If X
o
= i then does
not explode and I
n
00 by Theorem 2.7.1. Also X(J
n
) = Y
n
= i infinitely
often, so {t 0 : X
t
= i} is unbounded, with probability 1.
(ii) Suppose i is transient for If X
o
= i then
N = sup{n 0 : Y
n
= i} < 00,
so {t 0: X
t
= i} is bounded by J(N +1), which is finite, with probability
1, because (Y
n
: n N) cannot include an absorbing state.
(iii) Apply Theorem 1.5.3 to the jump chain.
(iv) Apply Theorem 1.5.4 to the jump chain. D
The next result gives continuous-time analogues of the conditions for
recurrence and transience found in Theorem 1.5.3. We denote by T
i
the
first passage time of to state i, defined by
Ti(w) = inf{t J
1
(w) : Xt(w) = i}.
Theorem 3.4.2. The following dichotomy holds:
(i) if qi = 0 or IPi(T
i
< 00) = 1, then i is recurrent and Jo
oo
Pii(t)dt = 00;
(ii) ifqi > 0 and IPi(T
i
< 00) < 1, then i is transient and Jo
oo
Pii(t)dt < 00.
Proof. If qi = 0, then cannot leave i, so i is recurrent, Pii(t) = 1
for all t, and Jo
oo
Pii(t)dt = 00. Suppose then that qi > o. Let N
i
denote
the first passage time of the jump chain to state i. Then
IPi(N
i
< 00) = IPi(T
i
< 00)
so i is recurrent if and only if IPi(T
i
< 00) = 1, by Theorem 3.4.1 and the
corresponding result for the jump chain.
Write 1r}j) for the (i,j) entry in rrn. We shall show that
1
00 1 00
_ (n)
pii(t)dt - --:- L 1r
ii
o q1, n=O
(3.2)
so that i is recurrent if and only if Jo
oo
Pii(t)dt = 00, by Theorem 3.4.1 and
the corresponding result for the jump chain.
116 3. Continuous-time Markov chains II
To establish (3.2) we use Fubini's theorem (see Section 6.4):
00
= lEi L Sn+l
1
{Yn=i}
n=O
00 1 00
= L lEi (Sn+l I Yn = i)JP>i(Yn = i) = --:- L 0
n=O n=O
Finally, we show that recurrence and transience are determined by any
discrete-time sampling of
Theorem 3.4.3. Let h > 0 be given and set Zn = X
nh
.
(i) If i is recurrent for then i is recurrent for
(ii) If i is transient for then i is transient for
Proof. Claim (ii) is obvious. To prove (i) we use for nh t < (n + l)h the
estimate
which follows from the Markov property. Then, by monotone convergence
and the result follows by Theorems 1.5.3 and 3.4.2. D
Exercise
3.4.1 Customers arrive at a certain queue in a Poisson process of rate A.
The single 'server' has two states A and B, state A signifying that he is 'in
attendance' and state B that he is having a tea-break. Independently of
how many customers are in the queue, he fluctuates between these states
as a Markov chain Y on {A, B} with Q-matrix
(
-a a)
(3 -(3 .
The total service time for any customer is exponentially distributed with
parameter J-l and is independent of the chain Y and of the service times of
other customers.
3.5 Invariant distributions
Describe the system as a Markov chain X with state-space
117
An signifying that the server is in state A and there are n people in the
queue (including anyone being served) and B
n
signifying that the server is
in state B and there are n people in the queue.
Explain why, for some fJ in (0,1], and k = 0,1,2, ... ,
Show that (fJ - 1) f (fJ) = 0, where
By considering f(l) or otherwise, prove that X is transient if 1l(3 < A(a+(3),
and explain why this is intuitively obvious.
3.5 Invariant distributions
Just as in the discrete-time theory, the notions of invariant distribution
and measure play an important role in the study of continuous-time Markov
chains. We say that A is invariant if
AQ=O.
Theorem 3.5.1. Let Q be a Q-matrix with jump matrix IT and let A be
a measure. The following are equivalent:
(i) A is invariant;
(ii) Il IT = Il where Ili = Aiqi.
Proof· We have qi ('lrij - bij) = qij for all i, j, so
(Jl(II - I))j = :EJli(1rij - 8ij ) = :E,xiqij = (,xQk D
iEI iEI
This tie-up with measures invariant for the jump matrix means that we
can use the existence and uniqueness results of Section 1.7 to obtain the
following result.
118 3. Continuous-time Markov chains II
Theorem 3.5.2. Suppose that Q is irreducible and recurrent. Then Q has
an invariant measure A which is unique up to scalar multiples.
Proof. Let us exclude the trivial case I = {i}; then irreducibility forces
qi > 0 for all i. By Theorems 3.2.1 and 3.4.1, II is irreducible and recurrent.
Then, by Theorems 1.7.5 and 1.7.6, II has an invariant measure jj, which is
unique up to scalar multiples. So, by Theorem 3.5.1, we can take Ai = jji/qi
to obtain an invariant measure unique up to scalar multiples. D
Recall that a state i is recurrent if qi = 0 or lPi (Ti < 00) = 1. If qi = 0
or the expected return time mi = Ei(Ti) is finite then we say i is positive
recurrent. Otherwise a recurrent state i is called null recurrent. As in the
discrete-time case positive recurrence is tied up with the existence of an
invariant distribution.
Theorem 3.5.3. Let $Q$ be an irreducible Q-matrix. Then the following
are equivalent:
(i) every state is positive recurrent;
(ii) some state $i$ is positive recurrent;
(iii) $Q$ is non-explosive and has an invariant distribution $\lambda$.
Moreover, when (iii) holds we have $m_i = 1/(\lambda_i q_i)$ for all $i$.
Proof. Let us exclude the trivial case $I = \{i\}$; then irreducibility forces
$q_i > 0$ for all $i$. It is obvious that (i) implies (ii). Define $\mu^i = (\mu^i_j : j \in I)$
by
$$\mu^i_j = \mathbb{E}_i\int_0^{T_i\wedge\zeta} 1_{\{X_s=j\}}\,ds,$$
where $T_i\wedge\zeta$ denotes the minimum of $T_i$ and $\zeta$. By monotone convergence,
$$\sum_{j\in I}\mu^i_j = \mathbb{E}_i(T_i\wedge\zeta).$$
Denote by $N_i$ the first passage time of the jump chain to state $i$. By Fubini's
theorem
$$\mu^i_j = \mathbb{E}_i\sum_{n=0}^\infty S_{n+1}1_{\{Y_n=j,\,n<N_i\}}
= \sum_{n=0}^\infty \mathbb{E}_i(S_{n+1}\mid Y_n = j)\,\mathbb{E}_i\bigl(1_{\{Y_n=j,\,n<N_i\}}\bigr)
= \frac{1}{q_j}\,\mathbb{E}_i\sum_{n=0}^\infty 1_{\{Y_n=j,\,n<N_i\}}
= q_j^{-1}\,\mathbb{E}_i\sum_{n=0}^{N_i-1} 1_{\{Y_n=j\}} = \gamma^i_j/q_j,$$
where, in the notation of Section 1.7, $\gamma^i_j$ is the expected time spent in $j$ between
visits to $i$ for the jump chain.
Suppose (ii) holds; then $i$ is certainly recurrent, so the jump chain is
recurrent, and $Q$ is non-explosive, by Theorem 2.7.1. We know that $\gamma^i\Pi =
\gamma^i$ by Theorem 1.7.5, so $\mu^i Q = 0$ by Theorem 3.5.1. But $\mu^i$ has finite total
mass
$$\sum_{j\in I}\mu^i_j = \mathbb{E}_i(T_i) = m_i,$$
so we obtain an invariant distribution $\lambda$ by setting $\lambda_j = \mu^i_j/m_i$.

On the other hand, suppose (iii) holds. Fix $i \in I$ and set $\nu_j =
\lambda_j q_j/(\lambda_i q_i)$; then $\nu_i = 1$ and $\nu\Pi = \nu$ by Theorem 3.5.1, so $\nu_j \ge \gamma^i_j$ for
all $j$ by Theorem 1.7.6. So
$$m_i = \sum_{j\in I}\mu^i_j = \sum_{j\in I}\gamma^i_j/q_j \le \sum_{j\in I}\nu_j/q_j
= \sum_{j\in I}\lambda_j/(\lambda_i q_i) = 1/(\lambda_i q_i) < \infty,$$
showing that $i$ is positive recurrent.

To complete the proof we return to the preceding calculation armed
with the knowledge that $Q$ is recurrent, hence $\Pi$ is recurrent, $\nu_j = \gamma^i_j$ and
$m_i = 1/(\lambda_i q_i)$ for all $i$. $\square$
The following example is a caution that the existence of an invariant
distribution for a continuous-time Markov chain is not enough to guarantee
positive recurrence, or even recurrence.
Example 3.5.4
Consider the Markov chain $(X_t)_{t\ge 0}$ on $\mathbb{Z}^+$ with the following diagram, where
$q_i > 0$ for all $i$ and where $0 < \lambda = 1 - \mu < 1$: for $i \ge 1$ the chain jumps from
$i$ to $i+1$ at rate $\lambda q_i$ and from $i$ to $i-1$ at rate $\mu q_i$, and it jumps from $0$ to $1$
at rate $\lambda q_0$.

The jump chain behaves as a simple random walk away from 0, so $(X_t)_{t\ge 0}$
is recurrent if $\lambda \le \mu$ and transient if $\lambda > \mu$. To compute an invariant
measure $\nu$ it is convenient to use the detailed balance equations
$$\nu_i q_{ij} = \nu_j q_{ji} \quad\text{for all } i, j.$$
Look ahead to Lemma 3.7.2 to see that any solution is invariant. In this
case the non-zero equations read
$$\nu_i\,\lambda q_i = \nu_{i+1}\,\mu q_{i+1} \quad\text{for all } i.$$
So a solution is given by $\nu_i = q_i^{-1}(\lambda/\mu)^i$. If the jump rates $q_i$ are constant
then $\nu$ can be normalized to produce an invariant distribution precisely
when $\lambda < \mu$.

Consider, on the other hand, the case where $q_i = 2^i$ for all $i$ and
$1 < \lambda/\mu < 2$. Then $\nu$ has finite total mass, so $(X_t)_{t\ge 0}$ has an invariant
distribution, but $(X_t)_{t\ge 0}$ is also transient. Given Theorem 3.5.3, the only
possibility is that $(X_t)_{t\ge 0}$ is explosive.
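A quick numerical sanity check of this example (not from the text, assuming NumPy): with $q_i = 2^i$ and $\lambda/\mu = 1.5$ the measure $\nu_i = q_i^{-1}(\lambda/\mu)^i$ is summable, while a simulated path makes very many jumps in a bounded amount of time, which is the signature of explosion.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, mu = 0.6, 0.4                       # lambda/mu = 1.5, between 1 and 2
q = lambda i: 2.0 ** i                   # jump rates q_i = 2^i

# Total mass of nu_i = (lam/mu)^i / q_i is a convergent geometric series.
mass = sum((lam / mu) ** i / q(i) for i in range(200))
print("total mass of nu:", mass)         # finite: an invariant distribution exists

def time_used(steps=10_000):
    """Time consumed by the first `steps` jumps of one simulated path."""
    i, t = 0, 0.0
    for _ in range(steps):
        rate = lam * q(i) if i == 0 else q(i)   # total jump rate out of state i
        t += rng.exponential(1.0 / rate)
        if i == 0 or rng.random() < lam:        # jump up w.p. lambda (w.p. 1 at 0)
            i += 1
        else:
            i -= 1
    return t

print("time used by 10000 jumps:", time_used())  # stays small: explosion
```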
The next result justifies calling measures $\lambda$ with $\lambda Q = 0$ invariant.

Theorem 3.5.5. Let $Q$ be irreducible and recurrent, and let $\lambda$ be a mea-
sure. Let $s > 0$ be given. The following are equivalent:
(i) $\lambda Q = 0$;
(ii) $\lambda P(s) = \lambda$.
Proof. There is a very simple proof in the case of finite state-space: by the
backward equation
$$\frac{d}{ds}\lambda P(s) = \lambda P'(s) = \lambda QP(s),$$
so $\lambda Q = 0$ implies $\lambda P(s) = \lambda P(0) = \lambda$ for all $s$; $P(s)$ is also recurrent, so
$\mu P(s) = \mu$ implies that $\mu$ is proportional to $\lambda$, so $\mu Q = 0$.

For infinite state-space, the interchange of differentiation with the sum-
mation involved in multiplication by $\lambda$ is not justified and an entirely dif-
ferent proof is needed.

Since $Q$ is recurrent, it is non-explosive by Theorem 2.7.1, and $P(s)$ is
recurrent by Theorem 3.4.3. Hence any $\lambda$ satisfying (i) or (ii) is unique up
to scalar multiples; and from the proof of Theorem 3.5.3, if we fix $i$ and set
$$\mu_j = \mathbb{E}_i\int_0^{T_i} 1_{\{X_s=j\}}\,ds,$$
then $\mu Q = 0$. Thus it suffices to show $\mu P(s) = \mu$. By the strong Markov
property at $T_i$ (which is a simple consequence of the strong Markov property
of the jump chain)
$$\mathbb{E}_i\int_0^s 1_{\{X_t=j\}}\,dt = \mathbb{E}_i\int_{T_i}^{T_i+s} 1_{\{X_t=j\}}\,dt.$$
Hence, using Fubini's theorem,
$$\mu_j = \mathbb{E}_i\int_s^{s+T_i} 1_{\{X_t=j\}}\,dt
= \int_0^\infty \mathbb{P}_i(X_{s+t} = j,\, t < T_i)\,dt
= \int_0^\infty \sum_{k\in I}\mathbb{P}_i(X_t = k,\, t < T_i)\,p_{kj}(s)\,dt
= \sum_{k\in I}\Bigl(\mathbb{E}_i\int_0^{T_i} 1_{\{X_t=k\}}\,dt\Bigr)p_{kj}(s)
= \sum_{k\in I}\mu_k\,p_{kj}(s),$$
as required. $\square$
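For a finite state-space, Theorem 3.5.5 is easy to illustrate numerically. A hedged sketch (not from the text), assuming SciPy is available, which checks that a measure with $\lambda Q = 0$ is fixed by $P(s) = e^{sQ}$:

```python
import numpy as np
from scipy.linalg import expm

Q = np.array([[-2.0, 1.0, 1.0],
              [ 1.0, -1.0, 0.0],
              [ 2.0, 1.0, -3.0]])

# Solve lambda Q = 0, normalized to a distribution.
A = np.vstack([Q.T, np.ones(3)])
lam, *_ = np.linalg.lstsq(A, np.array([0.0, 0.0, 0.0, 1.0]), rcond=None)

for s in (0.1, 1.0, 5.0):
    P_s = expm(s * Q)                       # semigroup P(s) = e^{sQ}
    print(s, np.allclose(lam @ P_s, lam))   # True for every s
```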
Theorem 3.5.6. Let $Q$ be an irreducible non-explosive Q-matrix having
an invariant distribution $\lambda$. If $(X_t)_{t\ge 0}$ is Markov($\lambda$, $Q$) then so is $(X_{s+t})_{t\ge 0}$
for any $s \ge 0$.

Proof. By Theorem 3.5.5, for all $i$,
$$\mathbb{P}(X_s = i) = (\lambda P(s))_i = \lambda_i,$$
so, by the Markov property, conditional on $X_s = i$, $(X_{s+t})_{t\ge 0}$ is
Markov($\delta_i$, $Q$). $\square$
3.6 Convergence to equilibrium
We now investigate the limiting behaviour of $p_{ij}(t)$ as $t \to \infty$ and its relation
to invariant distributions. You will see that the situation is analogous to the
case of discrete-time, only there is no longer any possibility of periodicity.
We shall need the following estimate of uniform continuity for the tran-
sition probabilities.
Lemma 3.6.1. Let $Q$ be a Q-matrix with semigroup $P(t)$. Then, for all
$t, h \ge 0$,
$$|p_{ij}(t+h) - p_{ij}(t)| \le 1 - e^{-q_i h}.$$

Proof. We have
$$|p_{ij}(t+h) - p_{ij}(t)| = \Bigl|\sum_{k\in I} p_{ik}(h)p_{kj}(t) - p_{ij}(t)\Bigr|
= \Bigl|\sum_{k\ne i} p_{ik}(h)p_{kj}(t) - (1 - p_{ii}(h))p_{ij}(t)\Bigr|
\le 1 - p_{ii}(h) \le \mathbb{P}_i(J_1 \le h) = 1 - e^{-q_i h}. \qquad\square$$
Theorem 3.6.2 (Convergence to equilibrium). Let $Q$ be an irre-
ducible non-explosive Q-matrix with semigroup $P(t)$, and having an in-
variant distribution $\lambda$. Then for all states $i, j$ we have
$$p_{ij}(t) \to \lambda_j \quad\text{as } t \to \infty.$$

Proof. Let $(X_t)_{t\ge 0}$ be Markov($\delta_i$, $Q$). Fix $h > 0$ and consider the $h$-skeleton
$Z_n = X_{nh}$. By Theorem 2.8.4
$$\mathbb{P}(Z_{n+1} = i_{n+1} \mid Z_0 = i_0, \ldots, Z_n = i_n) = p_{i_n i_{n+1}}(h),$$
so $(Z_n)_{n\ge 0}$ is discrete-time Markov($\delta_i$, $P(h)$). By Theorem 3.2.1 irreducibil-
ity implies $p_{ij}(h) > 0$ for all $i, j$, so $P(h)$ is irreducible and aperiodic. By
Theorem 3.5.5, $\lambda$ is invariant for $P(h)$. So, by discrete-time convergence to
equilibrium, for all $i, j$,
$$p_{ij}(nh) \to \lambda_j \quad\text{as } n \to \infty.$$
Thus we have a lattice of points along which the desired limit holds; we fill
in the gaps using uniform continuity. Fix a state $i$. Given $\varepsilon > 0$ we can
find $h > 0$ so that
$$1 - e^{-q_i s} \le \varepsilon/2 \quad\text{for } 0 \le s \le h,$$
and then find $N$, so that
$$|p_{ij}(nh) - \lambda_j| \le \varepsilon/2 \quad\text{for } n \ge N.$$
For $t \ge Nh$ we have $nh \le t < (n+1)h$ for some $n \ge N$ and
$$|p_{ij}(t) - p_{ij}(nh)| \le 1 - e^{-q_i h} \le \varepsilon/2$$
by Lemma 3.6.1. Hence
$$|p_{ij}(t) - \lambda_j| \le \varepsilon \quad\text{for all } t \ge Nh,$$
so $p_{ij}(t) \to \lambda_j$ as $t \to \infty$. $\square$
The complete description of limiting behaviour for irreducible chains in
continuous time is provided by the following result. It follows from Theorem
1.8.5 by the same argument we used in the preceding result. We do not
give the details.
Theorem 3.6.3. Let $Q$ be an irreducible Q-matrix and let $\nu$ be any dis-
tribution. Suppose that $(X_t)_{t\ge 0}$ is Markov($\nu$, $Q$). Then
$$\mathbb{P}(X_t = j) \to \frac{1}{q_j m_j} \quad\text{as } t \to \infty, \text{ for all } j \in I,$$
where $m_j$ is the expected return time to state $j$.
Exercises
3.6.1 Find an invariant distribution $\lambda$ for the Q-matrix
Q = !4
2 1 -3
and verify that $\lambda P(t) = \lambda$, using your answer to Exercise 2.1.1.
3.6.2 In each of the following cases, compute $\mathbb{P}(X_t = 2 \mid X_0 = 1)$ for
the Markov chain with the given Q-matrix on $\{1,2,3,4\}$:
(a)
(c)
(1
2 1 1
11)
-1 1
0 -1
0 0
(1
1 1 0
]2)
-1 0
0 -2
0 2
(b)
(d)
(1
2 1 1

-1 1
0 -1
0 0
(1
2 1 0

-2 2
1 -1
0 0
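Whatever the particular Q-matrix, the computation asked for above amounts to evaluating an entry of the matrix exponential $P(t) = e^{tQ}$. A hedged sketch (assuming SciPy; the Q-matrix below is illustrative only, not one of (a)–(d)):

```python
import numpy as np
from scipy.linalg import expm

# Illustrative 4-state Q-matrix (rows sum to zero); not one of (a)-(d).
Q = np.array([[-3.0, 1.0, 1.0, 1.0],
              [ 0.0, -1.0, 1.0, 0.0],
              [ 0.0, 0.0, -1.0, 1.0],
              [ 1.0, 0.0, 0.0, -1.0]])

t = 2.5
P_t = expm(t * Q)          # P(t) = e^{tQ}
print(P_t[0, 1])           # P(X_t = 2 | X_0 = 1), with states indexed from 0
```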
3.6.3 Customers arrive at a single-server queue in a Poisson stream of rate
$\lambda$. Each customer has a service requirement distributed as the sum of two
independent exponential random variables of parameter $\mu$. Service require-
ments are independent of one another and of the arrival process. Write
down the generator matrix $Q$ of a continuous-time Markov chain which
models this, explaining what the states of the chain represent. Calculate
the essentially unique invariant measure for $Q$, and deduce that the chain
is positive recurrent if and only if $\lambda/\mu < 1/2$.
3.7 Time reversal
Time reversal of continuous-time chains has the same features found in the
discrete-time case. Reversibility provides a powerful tool in the analysis
of Markov chains, as we shall see in Section 5.2. Note in the following
result how time reversal interchanges the roles of backward and forward
equations. This echoes our proof of the forward equation, which rested on
the time reversal identity of Lemma 2.8.5.
A small technical point arises in time reversal: right-continuous processes
become left-continuous processes. For the processes we consider, this is
unimportant. We could if we wished redefine the time-reversed process
to equal its right limit at the jump times, thus obtaining again a right-
continuous process. We shall suppose implicitly that this is done, and
forget about the problem.
Theorem 3.7.1. Let $Q$ be irreducible and non-explosive and suppose
that $Q$ has an invariant distribution $\lambda$. Let $T \in (0,\infty)$ be given and let
$(X_t)_{0\le t\le T}$ be Markov($\lambda$, $Q$). Set $\hat X_t = X_{T-t}$. Then the process $(\hat X_t)_{0\le t\le T}$
is Markov($\lambda$, $\hat Q$), where $\hat Q = (\hat q_{ij} : i,j \in I)$ is given by $\lambda_j\hat q_{ji} = \lambda_i q_{ij}$. More-
over, $\hat Q$ is also irreducible and non-explosive with invariant distribution $\lambda$.
Proof. By Theorem 2.8.6, the semigroup $(P(t) : t \ge 0)$ of $Q$ is the minimal
non-negative solution of the forward equation
$$P'(t) = P(t)Q, \qquad P(0) = I.$$
Also, for all $t > 0$, $P(t)$ is an irreducible stochastic matrix with invariant
distribution $\lambda$. Define $\hat P(t)$ by
$$\lambda_j\hat p_{ji}(t) = \lambda_i p_{ij}(t);$$
then $\hat P(t)$ is an irreducible stochastic matrix with invariant distribution $\lambda$,
and we can rewrite the forward equation transposed as
$$\hat P'(t) = \hat Q\hat P(t).$$
But this is the backward equation for $\hat Q$, which is itself a Q-matrix, and
$\hat P(t)$ is then its minimal non-negative solution. Hence $\hat Q$ is irreducible and
non-explosive and has invariant distribution $\lambda$.

Finally, for $0 = t_0 < \cdots < t_n = T$ and $s_k = t_k - t_{k-1}$, by Theorem 2.8.4
we have
$$\mathbb{P}(\hat X_{t_0} = i_0, \ldots, \hat X_{t_n} = i_n) = \mathbb{P}(X_{T-t_0} = i_0, \ldots, X_{T-t_n} = i_n)
= \lambda_{i_n}p_{i_n i_{n-1}}(s_n)\cdots p_{i_1 i_0}(s_1)
= \lambda_{i_0}\hat p_{i_0 i_1}(s_1)\cdots\hat p_{i_{n-1}i_n}(s_n),$$
so, by Theorem 2.8.4 again, $(\hat X_t)_{0\le t\le T}$ is Markov($\lambda$, $\hat Q$). $\square$
The chain $(\hat X_t)_{0\le t\le T}$ is called the time-reversal of $(X_t)_{0\le t\le T}$.

A Q-matrix $Q$ and a measure $\lambda$ are said to be in detailed balance if
$$\lambda_i q_{ij} = \lambda_j q_{ji} \quad\text{for all } i, j.$$
Lemma 3.7.2. If $Q$ and $\lambda$ are in detailed balance then $\lambda$ is invariant for
$Q$.

Proof. We have $(\lambda Q)_i = \sum_{j\in I}\lambda_j q_{ji} = \sum_{j\in I}\lambda_i q_{ij} = 0$. $\square$
Let $(X_t)_{t\ge 0}$ be Markov($\lambda$, $Q$), with $Q$ irreducible and non-explosive.
We say that $(X_t)_{t\ge 0}$ is reversible if, for all $T > 0$, $(X_{T-t})_{0\le t\le T}$ is also
Markov($\lambda$, $Q$).

Theorem 3.7.3. Let $Q$ be an irreducible and non-explosive Q-matrix and
let $\lambda$ be a distribution. Suppose that $(X_t)_{t\ge 0}$ is Markov($\lambda$, $Q$). Then the
following are equivalent:
(a) $(X_t)_{t\ge 0}$ is reversible;
(b) $Q$ and $\lambda$ are in detailed balance.

Proof. Both (a) and (b) imply that $\lambda$ is invariant for $Q$. Then both (a) and
(b) are equivalent to the statement that $\hat Q = Q$ in Theorem 3.7.1. $\square$
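Theorem 3.7.1 and Theorem 3.7.3 are easy to test numerically. A minimal sketch (not from the text, assuming NumPy): build $\hat q_{ij} = \lambda_j q_{ji}/\lambda_i$ for a birth–death Q-matrix, and check that $\hat Q$ is a Q-matrix with the same invariant distribution and that $\hat Q = Q$, as detailed balance predicts.

```python
import numpy as np

# A birth-death chain on {0,1,2,3}; rates chosen only for illustration.
Q = np.array([[-1.0, 1.0, 0.0, 0.0],
              [ 2.0, -5.0, 3.0, 0.0],
              [ 0.0, 1.0, -3.0, 2.0],
              [ 0.0, 0.0, 4.0, -4.0]])

# Invariant distribution via lambda Q = 0, normalized.
A = np.vstack([Q.T, np.ones(4)])
lam, *_ = np.linalg.lstsq(A, np.array([0, 0, 0, 0, 1.0]), rcond=None)

# Time reversal: qhat_ij = lam_j q_ji / lam_i  (Theorem 3.7.1).
Q_hat = lam[None, :] * Q.T / lam[:, None]

print(np.allclose(Q_hat.sum(axis=1), 0))   # Q-hat is again a Q-matrix
print(np.allclose(lam @ Q_hat, 0))         # with the same invariant distribution
print(np.allclose(Q_hat, Q))               # True here: detailed balance holds
```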
Exercises

3.7.1 Consider a fleet of $N$ buses. Each bus breaks down independently
at rate $\mu$, when it is sent to the depot for repair. The repair shop can
only repair one bus at a time and each bus takes an exponential time of
parameter $\lambda$ to repair. Find the equilibrium distribution of the number of
buses in service.

3.7.2 Calls arrive at a telephone exchange as a Poisson process of rate $\lambda$,
and the lengths of calls are independent exponential random variables of
parameter $\mu$. Assuming that infinitely many telephone lines are available,
set up a Markov chain model for this process.
Show that for large $t$ the distribution of the number of lines in use at
time $t$ is approximately Poisson with mean $\lambda/\mu$.
Find the mean length of the busy periods during which at least one line
is in use.
Show that the expected number of lines in use at time $t$, given that $n$
are in use at time 0, is $ne^{-\mu t} + \lambda(1 - e^{-\mu t})/\mu$.
Show that, in equilibrium, the number $N_t$ of calls finishing in the time
interval $[0,t]$ has Poisson distribution of mean $\lambda t$.
Is $(N_t)_{t\ge 0}$ a Poisson process?
3.8 Ergodic theorem
Long-run averages for continuous-time chains display the same sort of be-
haviour as in the discrete-time case, and for similar reasons. Here is the
result.
Theorem 3.8.1 (Ergodic theorem). Let $Q$ be irreducible and let $\nu$ be
any distribution. If $(X_t)_{t\ge 0}$ is Markov($\nu$, $Q$), then
$$\mathbb{P}\Bigl(\frac{1}{t}\int_0^t 1_{\{X_s=i\}}\,ds \to \frac{1}{m_i q_i} \text{ as } t \to \infty\Bigr) = 1,$$
where $m_i = \mathbb{E}_i(T_i)$ is the expected return time to state $i$. Moreover, in the
positive recurrent case, for any bounded function $f : I \to \mathbb{R}$ we have
$$\mathbb{P}\Bigl(\frac{1}{t}\int_0^t f(X_s)\,ds \to \bar f \text{ as } t \to \infty\Bigr) = 1,$$
where
$$\bar f = \sum_{i\in I}\lambda_i f_i$$
and where $(\lambda_i : i \in I)$ is the unique invariant distribution.
Proof. If $Q$ is transient then the total time spent in any state $i$ is finite, so
$$\frac{1}{t}\int_0^t 1_{\{X_s=i\}}\,ds \le \frac{1}{t}\int_0^\infty 1_{\{X_s=i\}}\,ds \to 0 = \frac{1}{m_i q_i}.$$
Suppose then that $Q$ is recurrent and fix a state $i$. Then $(X_t)_{t\ge 0}$ hits $i$
with probability 1 and the long-run proportion of time in $i$ equals the long-
run proportion of time in $i$ after first hitting $i$. So, by the strong Markov
property (of the jump chain), it suffices to consider the case $\nu = \delta_i$.
Denote by $M^n_i$ the length of the $n$th visit to $i$, by $T^n_i$ the time of the
$n$th return to $i$ and by $L^n_i$ the length of the $n$th excursion to $i$. Thus for
$n = 0,1,2,\ldots\,$, setting $T^0_i = 0$, we have
$$M^{n+1}_i = \inf\{t > T^n_i : X_t \ne i\} - T^n_i,$$
$$T^{n+1}_i = \inf\{t > T^n_i + M^{n+1}_i : X_t = i\},$$
$$L^{n+1}_i = T^{n+1}_i - T^n_i.$$
By the strong Markov property (of the jump chain) at the stopping times
$T^n_i$ for $n \ge 0$ we find that $L^1_i, L^2_i, \ldots$ are independent and identically dis-
tributed with mean $m_i$, and that $M^1_i, M^2_i, \ldots$ are independent and identi-
cally distributed with mean $1/q_i$. Hence, by the strong law of large numbers
(see Theorem 1.10.1)
$$\frac{L^1_i + \cdots + L^n_i}{n} \to m_i, \qquad \frac{M^1_i + \cdots + M^n_i}{n} \to \frac{1}{q_i},$$
and hence
$$\frac{M^1_i + \cdots + M^n_i}{L^1_i + \cdots + L^n_i} \to \frac{1}{m_i q_i}$$
with probability 1. In particular, we note that $T^n_i/T^{n+1}_i \to 1$ as $n \to \infty$
with probability 1. Now, for $T^n_i \le t < T^{n+1}_i$ we have
$$\frac{M^1_i + \cdots + M^n_i}{L^1_i + \cdots + L^{n+1}_i}
\le \frac{1}{t}\int_0^t 1_{\{X_s=i\}}\,ds
\le \frac{M^1_i + \cdots + M^{n+1}_i}{L^1_i + \cdots + L^n_i},$$
so on letting $t \to \infty$ we have, with probability 1,
$$\frac{1}{t}\int_0^t 1_{\{X_s=i\}}\,ds \to \frac{1}{m_i q_i}.$$
In the positive recurrent case we can write
$$\frac{1}{t}\int_0^t f(X_s)\,ds - \bar f
= \sum_{i\in I} f_i\Bigl(\frac{1}{t}\int_0^t 1_{\{X_s=i\}}\,ds - \lambda_i\Bigr),$$
where $\lambda_i = 1/(m_i q_i)$. We conclude that
$$\frac{1}{t}\int_0^t f(X_s)\,ds \to \bar f$$
with probability 1, by the same argument as was used in the proof of The-
orem 1.10.2. $\square$
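The ergodic theorem is easy to see in simulation. A hedged sketch (not from the text, assuming NumPy): simulate a small chain and compare the long-run fraction of time spent in one state with $\lambda_i = 1/(m_i q_i)$ obtained from $\lambda Q = 0$.

```python
import numpy as np

rng = np.random.default_rng(1)

Q = np.array([[-2.0, 1.0, 1.0],
              [ 1.0, -1.0, 0.0],
              [ 2.0, 1.0, -3.0]])
q = -np.diag(Q)
Pi = Q / q[:, None]
np.fill_diagonal(Pi, 0.0)                  # jump matrix

# Simulate up to time T and record the time spent in state 0.
T, t, i, time_in_0 = 10_000.0, 0.0, 0, 0.0
while t < T:
    hold = rng.exponential(1.0 / q[i])
    if i == 0:
        time_in_0 += min(hold, T - t)
    t += hold
    i = rng.choice(3, p=Pi[i])

# Compare with lambda_0 from lambda Q = 0.
A = np.vstack([Q.T, np.ones(3)])
lam, *_ = np.linalg.lstsq(A, np.array([0, 0, 0, 1.0]), rcond=None)
print(time_in_0 / T, lam[0])               # the two numbers should be close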
4
Further theory
In the first three chapters we have given an account of the elementary theory
of Markov chains. This already covers a great many applications, but is
just the beginning of the theory of Markov processes. The further theory
inevitably involves more sophisticated techniques which, although having
their own interest, can obscure the overall structure. On the other hand,
the overall structure is, to a large extent, already present in the elementary
theory. We therefore thought it worth while to discuss some features of the
further theory in the context of simple Markov chains, namely, martingales,
potential theory, electrical networks and Brownian motion. The idea is that
the Markov chain case serves as a guiding metaphor for more complicated
processes. So the reader familiar with Markov chains may find this chapter
helpful alongside more general higher-level texts. At the same time, further
insight is gained into Markov chains themselves.
4.1 Martingales
A martingale is a process whose average value remains constant in a par-
ticular strong sense, which we shall make precise shortly. This is a sort of
balancing property. Often, the identification of martingales is a crucial step
in understanding the evolution of a stochastic process.
We begin with a simple example. Consider the simple symmetric random
walk $(X_n)_{n\ge 0}$ on $\mathbb{Z}$, which is a Markov chain moving from each state $i$ to
$i-1$ or to $i+1$, each with probability $\tfrac12$.
The average value of the walk is constant; indeed it has the stronger prop-
erty that the average value of the walk at some future time is always simply
the current value. In precise terms we have
$$\mathbb{E}(X_n) = \mathbb{E}(X_0) \quad\text{for all } n,$$
and the stronger property says that, for $n \ge m$,
$$\mathbb{E}(X_n - X_m \mid X_0 = i_0, \ldots, X_m = i_m) = 0.$$
This stronger property says that $(X_n)_{n\ge 0}$ is in fact a martingale.
Here is the general definition. Let us fix for definiteness a Markov chain
$(X_n)_{n\ge 0}$ and write $\mathcal{F}_n$ for the collection of all sets depending only on
$X_0, \ldots, X_n$. The sequence $(\mathcal{F}_n)_{n\ge 0}$ is called the filtration of $(X_n)_{n\ge 0}$, and
we think of $\mathcal{F}_n$ as representing the state of knowledge, or history, of the
chain up to time $n$. A process $(M_n)_{n\ge 0}$ is called adapted if $M_n$ depends
only on $X_0, \ldots, X_n$. A process $(M_n)_{n\ge 0}$ is called integrable if $\mathbb{E}|M_n| < \infty$
for all $n$. An adapted integrable process $(M_n)_{n\ge 0}$ is called a martingale if
$$\mathbb{E}(M_{n+1} - M_n \mid A) = 0$$
for all $A \in \mathcal{F}_n$ and all $n$. Since the collection $\mathcal{F}_n$ consists of countable
unions of elementary events such as
$$A = \{X_0 = i_0, \ldots, X_n = i_n\},$$
this martingale property is equivalent to saying that
$$\mathbb{E}(M_{n+1} - M_n \mid X_0 = i_0, \ldots, X_n = i_n) = 0$$
for all $i_0, \ldots, i_n$ and all $n$.

A third formulation of the martingale property involves another notion of
conditional expectation. Given an integrable random variable $Y$, we define
$$\mathbb{E}(Y \mid \mathcal{F}_n) = \sum_{i_0,\ldots,i_n}\mathbb{E}(Y \mid X_0 = i_0, \ldots, X_n = i_n)\,1_{\{X_0=i_0,\ldots,X_n=i_n\}}.$$
The random variable $\mathbb{E}(Y \mid \mathcal{F}_n)$ is called the conditional expectation of $Y$
given $\mathcal{F}_n$. In passing from $Y$ to $\mathbb{E}(Y \mid \mathcal{F}_n)$, what we do is to replace, on
each elementary event $A \in \mathcal{F}_n$, the random variable $Y$ by its average value
$\mathbb{E}(Y \mid A)$. It is easy to check that an adapted integrable process $(M_n)_{n\ge 0}$
is a martingale if and only if
$$\mathbb{E}(M_{n+1} \mid \mathcal{F}_n) = M_n \quad\text{for all } n.$$
Conditional expectation is a partial averaging, so if we complete the process
and average the conditional expectation we should get the full expectation
$$\mathbb{E}\bigl(\mathbb{E}(Y \mid \mathcal{F}_n)\bigr) = \mathbb{E}(Y).$$
It is easy to check that this formula holds.
In particular, for a martingale
$$\mathbb{E}(M_{n+1}) = \mathbb{E}\bigl(\mathbb{E}(M_{n+1} \mid \mathcal{F}_n)\bigr) = \mathbb{E}(M_n),$$
so, by induction,
$$\mathbb{E}(M_n) = \mathbb{E}(M_0) \quad\text{for all } n.$$
This was already clear on taking $A = \Omega$ in our original definition of a
martingale.
We shall prove one general result about martingales, then see how it
explains some things we know about the simple symmetric random walk.
Recall that a random variable
$$T : \Omega \to \{0, 1, 2, \ldots\}\cup\{\infty\}$$
is a stopping time if $\{T = n\} \in \mathcal{F}_n$ for all $n < \infty$. An equivalent condition
is that $\{T \le n\} \in \mathcal{F}_n$ for all $n < \infty$. Recall from Section 1.4 that all sorts
of hitting times are stopping times.

Theorem 4.1.1 (Optional stopping theorem). Let $(M_n)_{n\ge 0}$ be a
martingale and let $T$ be a stopping time. Suppose that at least one of the
following conditions holds:
(i) $T \le n$ for some $n$;
(ii) $T < \infty$ and $|M_n| \le C$ whenever $n \le T$.
Then $\mathbb{E}M_T = \mathbb{E}M_0$.
Proof. Assume that (i) holds. Then
$$M_T - M_0 = (M_T - M_{T-1}) + \cdots + (M_1 - M_0)
= \sum_{k=0}^{n-1}(M_{k+1} - M_k)1_{k<T}.$$
Now $\{k < T\} = \{T \le k\}^c \in \mathcal{F}_k$ since $T$ is a stopping time, and so
$$\mathbb{E}\bigl[(M_{k+1} - M_k)1_{k<T}\bigr] = 0,$$
since $(M_n)_{n\ge 0}$ is a martingale. Hence
$$\mathbb{E}M_T - \mathbb{E}M_0 = \sum_{k=0}^{n-1}\mathbb{E}\bigl[(M_{k+1} - M_k)1_{k<T}\bigr] = 0.$$
If we do not assume (i) but (ii), then the preceding argument applies to the
stopping time $T\wedge n$, so that $\mathbb{E}M_{T\wedge n} = \mathbb{E}M_0$. Then
$$|\mathbb{E}M_T - \mathbb{E}M_0| = |\mathbb{E}M_T - \mathbb{E}M_{T\wedge n}|
\le \mathbb{E}|M_T - M_{T\wedge n}| \le 2C\,\mathbb{P}(T > n)$$
for all $n$. But $\mathbb{P}(T > n) \to 0$ as $n \to \infty$, so $\mathbb{E}M_T = \mathbb{E}M_0$. $\square$
Returning to the simple symmetric random walk $(X_n)_{n\ge 0}$, suppose that
$X_0 = 0$ and we take
$$T = \inf\{n \ge 0 : X_n = -a \text{ or } X_n = b\},$$
where $a, b \in \mathbb{N}$ are given. Then $T$ is a stopping time and $T < \infty$ by
recurrence of finite closed classes. Thus condition (ii) of the optional stop-
ping theorem applies with $M_n = X_n$ and $C = a\vee b$. We deduce that
$\mathbb{E}X_T = \mathbb{E}X_0 = 0$. So what? Well, now we can compute
$$p = \mathbb{P}(X_n \text{ hits } -a \text{ before } b).$$
We have $X_T = -a$ with probability $p$ and $X_T = b$ with probability $1 - p$,
so
$$0 = \mathbb{E}X_T = p(-a) + (1-p)b,$$
giving
$$p = b/(a+b).$$
There is an entirely different, Markovian, way to compute $p$, using the
methods of Section 1.4. But the intuition behind the result $\mathbb{E}X_T = 0$ is
very clear: a gambler, playing a fair game, leaves the casino once losses
reach $a$ or winnings reach $b$, whichever is sooner; since the game is fair, the
average gain should be zero.
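A short simulation (not from the text, assuming NumPy) makes the conclusion $p = b/(a+b)$ concrete:

```python
import numpy as np

rng = np.random.default_rng(2)

def hits_minus_a_first(a, b):
    """Run the simple symmetric walk from 0 until it reaches -a or b."""
    x = 0
    while -a < x < b:
        x += rng.choice([-1, 1])
    return x == -a

a, b, trials = 3, 7, 20_000
p_hat = np.mean([hits_minus_a_first(a, b) for _ in range(trials)])
print(p_hat, b / (a + b))    # estimate vs. the exact value p = 0.7
```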
We discussed in Section 1.3 the counter-intuitive case of a gambler who
keeps on playing a fair game against an infinitely rich casino, with the
certain outcome of ruin. This game ends at the finite stopping time
$$T = \inf\{n \ge 0 : X_n = -a\},$$
where $a$ is the gambler's initial fortune. Since $X_T = -a$ we have
$$\mathbb{E}X_T = -a \ne 0 = \mathbb{E}X_0,$$
but this does not contradict the optional stopping theorem because neither
condition (i) nor condition (ii) is satisfied. Thus, while intuition might
suggest that $\mathbb{E}X_T = \mathbb{E}X_0$ is rather obvious, some care is needed as it is not
always true.
The example just discussed was rather special in that the chain
itself was a martingale. Obviously, this is not true in general; indeed a
martingale is necessarily real-valued and we do not in general insist that
the state-space $I$ is contained in $\mathbb{R}$. Nevertheless, to every Markov chain is
associated a whole collection of martingales, and these martingales charac-
terize the chain. This is the basis of a deep connection between martingales
and Markov chains.
We recall that, given a function $f : I \to \mathbb{R}$ and a Markov chain $(X_n)_{n\ge 0}$
with transition matrix $P$, we have
$$(P^n f)(i) = \sum_{j\in I} p^{(n)}_{ij} f_j = \mathbb{E}_i\bigl(f(X_n)\bigr).$$
Theorem 4.1.2. Let $(X_n)_{n\ge 0}$ be a random process with values in $I$ and
let $P$ be a stochastic matrix. Write $(\mathcal{F}_n)_{n\ge 0}$ for the filtration of $(X_n)_{n\ge 0}$.
Then the following are equivalent:
(i) $(X_n)_{n\ge 0}$ is a Markov chain with transition matrix $P$;
(ii) for all bounded functions $f : I \to \mathbb{R}$, the following process is a mar-
tingale:
$$M^f_n = f(X_n) - f(X_0) - \sum_{m=0}^{n-1}(P - I)f(X_m).$$
Proof. Suppose (i) holds. Let $f$ be a bounded function. Then
$$|(Pf)(i)| = \Bigl|\sum_{j\in I} p_{ij} f_j\Bigr| \le \sup_j|f_j|,$$
so
$$|M^f_n| \le 2(n+1)\sup_j|f_j| < \infty,$$
showing that $M^f_n$ is integrable for all $n$.
Let $A = \{X_0 = i_0, \ldots, X_n = i_n\}$. By the Markov property
$$\mathbb{E}\bigl(f(X_{n+1}) \mid A\bigr) = \mathbb{E}_{i_n}\bigl(f(X_1)\bigr) = (Pf)(i_n),$$
so
$$\mathbb{E}(M^f_{n+1} - M^f_n \mid A) = \mathbb{E}\bigl[f(X_{n+1}) - (Pf)(X_n) \mid A\bigr] = 0,$$
and so $(M^f_n)_{n\ge 0}$ is a martingale.
On the other hand, if (ii) holds, then
$$\mathbb{E}\bigl(f(X_{n+1}) - (Pf)(X_n) \mid X_0 = i_0, \ldots, X_n = i_n\bigr) = 0$$
for all bounded functions $f$. On taking $f = 1_{\{i_{n+1}\}}$ we obtain
$$\mathbb{P}(X_{n+1} = i_{n+1} \mid X_0 = i_0, \ldots, X_n = i_n) = p_{i_n i_{n+1}},$$
so $(X_n)_{n\ge 0}$ is Markov with transition matrix $P$. $\square$
Some more martingales associated to a Markov chain are described in
the next result. Notice that we drop the requirement that $f$ be bounded.

Theorem 4.1.3. Let $(X_n)_{n\ge 0}$ be a Markov chain with transition matrix
$P$. Suppose that a function $f : \mathbb{N}\times I \to \mathbb{R}$ satisfies, for all $n \ge 0$, both
$$\mathbb{E}\bigl|f(n, X_n)\bigr| < \infty$$
and
$$(Pf)(n+1, i) = \sum_{j\in I} p_{ij} f(n+1, j) = f(n, i).$$
Then $M_n = f(n, X_n)$ is a martingale.
Proof. We have assumed that $M_n$ is integrable for all $n$. Then, by the
Markov property,
$$\mathbb{E}(M_{n+1} - M_n \mid X_0 = i_0, \ldots, X_n = i_n)
= \mathbb{E}_{i_n}\bigl[f(n+1, X_1) - f(n, X_0)\bigr]
= (Pf)(n+1, i_n) - f(n, i_n) = 0.$$
So $(M_n)_{n\ge 0}$ is a martingale. $\square$
Let us see how this theorem works in the case where $(X_n)_{n\ge 0}$ is a simple
symmetric random walk on $\mathbb{Z}$, starting from 0. We consider $f(i) = i$ and $g(n,i) =
i^2 - n$. Since $|X_n| \le n$ for all $n$, we have
$$\mathbb{E}|f(X_n)| \le n < \infty, \qquad \mathbb{E}|g(n, X_n)| \le n^2 + n < \infty.$$
Also
$$(Pf)(i) = (i-1)/2 + (i+1)/2 = i = f(i),$$
$$(Pg)(n+1, i) = (i-1)^2/2 + (i+1)^2/2 - (n+1) = i^2 - n = g(n, i).$$
Hence both $X_n = f(X_n)$ and $Y_n = g(n, X_n)$ are martingales.
In order to put this to some use, consider again the stopping time
$$T = \inf\{n \ge 0 : X_n = -a \text{ or } X_n = b\},$$
where $a, b \in \mathbb{N}$. By the optional stopping theorem, applied to $Y_n$ and the
bounded stopping time $T\wedge n$,
$$\mathbb{E}\bigl(X^2_{T\wedge n} - (T\wedge n)\bigr) = 0.$$
Hence
$$\mathbb{E}(T\wedge n) = \mathbb{E}\bigl(X^2_{T\wedge n}\bigr).$$
On letting $n \to \infty$, the left side converges to $\mathbb{E}(T)$, by monotone conver-
gence, and the right side to $\mathbb{E}(X^2_T)$ by bounded convergence. So we obtain
$$\mathbb{E}(T) = \mathbb{E}(X^2_T) = a^2\,\frac{b}{a+b} + b^2\,\frac{a}{a+b} = ab.$$
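The identity $\mathbb{E}(T) = ab$ is again easy to check by simulation. A minimal sketch (not from the text, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(3)

def exit_time(a, b):
    """Steps taken by the simple symmetric walk from 0 to leave (-a, b)."""
    x, n = 0, 0
    while -a < x < b:
        x += rng.choice([-1, 1])
        n += 1
    return n

a, b, trials = 4, 6, 5_000
print(np.mean([exit_time(a, b) for _ in range(trials)]), a * b)  # both near 24
```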
We have given only the simplest examples of the use of martingales in
studying Markov chains. Some more will appear in later sections. For
an excellent introduction to martingales and their applications we recom-
mend Probability with Martingales by David Williams (Cambridge Univer-
sity Press, 1991).
Exercise
4.1.1 Let $(X_n)_{n\ge 0}$ be a Markov chain on $I$ and let $A$ be an absorbing set
in $I$. Set
$$T = \inf\{n \ge 0 : X_n \in A\}$$
and
$$h_i = \mathbb{P}_i(X_n \in A \text{ for some } n \ge 0) = \mathbb{P}_i(T < \infty).$$
Show that $M_n = h(X_n)$ is a martingale.
4.2 Potential theory
Several physical theories share a common mathematical framework, which is
known as potential theory. One example is Newton's theory of gravity, but
potential theory is also relevant to electrostatics, fluid flow and the diffusion
of heat. In gravity, a distribution of mass, of density $\rho$ say, gives rise to a
gravitational potential $\phi$, which in suitable units satisfies the equation
$$-\Delta\phi = \rho,$$
where $\Delta = \partial^2/\partial x^2 + \partial^2/\partial y^2 + \partial^2/\partial z^2$. The potential $\phi$ is felt physically
through its gradient
$$\nabla\phi = \Bigl(\frac{\partial\phi}{\partial x}, \frac{\partial\phi}{\partial y}, \frac{\partial\phi}{\partial z}\Bigr),$$
which gives the force of gravity acting on a particle of unit mass. Markov
chains, where space is discrete, obviously have no direct link with this the-
ory, in which space is a continuum. An indirect link is provided by Brownian
motion, which we shall discuss in Section 4.4.
In this section we are going to consider potential theory for a count-
able state-space, which has much of the structure of the continuum version.
This discrete theory amounts to doing Markov chains without the proba-
bility, which has the disadvantage that one loses the intuitive picture of the
process, but the advantage of wider applicability. We shall begin by intro-
ducing the idea of potentials associated to a Markov chain, and by showing
how to calculate these potentials. This is a unifying idea, containing within
it other notions previously considered such as hitting probabilities and ex-
pected hitting times. It also finds application when one associates costs to
Markov chains in modelling economic activity: see Section 5.4.
Once we have established the basic link between a Markov chain and its
associated potentials, we shall briefly run through some of the main features
of potential theory, explaining their significance in terms of Markov chains.
This is the easiest way to appreciate the general structure of potential
theory, unobscured by technical difficulties. The basic ideas of boundary
theory for Markov chains will also be introduced.
Before we embark on a general discussion of potentials associated to
a Markov chain, here are two simple examples. In these examples the
potential $\phi$ has the interpretation of expected total cost.
(Figure: a directed graph on the five states 1, 2, 3, 4, 5.)
Example 4.2.1
Consider the discrete-time random walk on the directed graph shown above,
which at each step chooses among the allowable transitions with equal proba-
bility. Suppose that on each visit to states $i = 1,2,3,4$ a cost $c_i$ is incurred,
where $c_i = i$. What is the fair price to move from state 3 to state 4?
The fair price is always the difference in the expected total cost. We
denote by $\phi_i$ the expected total cost starting from $i$. Obviously, $\phi_5 = 0$ and
by considering the effect of a single step we see that
$$\phi_1 = 1 + \phi_2, \quad \phi_2 = 2 + \phi_3, \quad
\phi_3 = 3 + \tfrac13(\phi_1 + \phi_4 + \phi_5), \quad \phi_4 = 4.$$
Hence $\phi_3 = 8$ and the fair price to move from 3 to 4 is 4.

We shall now consider two variations on this problem. First suppose
our process is, instead, the continuous-time random walk $(X_t)_{t\ge 0}$ on the
same directed graph which makes each allowable transition at rate 1, and
suppose cost is incurred at rate $c_i = i$ in state $i$ for $i = 1,2,3,4$. Thus the
total cost is now
$$\int_0^\infty c(X_s)\,ds.$$
What now is the fair price to move from 3 to 4? The expected cost incurred
on each visit to $i$ is given by $c_i/q_i$ and $q_1 = 1$, $q_2 = 1$, $q_3 = 3$, $q_4 = 1$. So we
see, as before,
$$\phi_1 = 1 + \phi_2, \quad \phi_2 = 2 + \phi_3, \quad
\phi_3 = 1 + \tfrac13(\phi_1 + \phi_4 + \phi_5), \quad \phi_4 = 4.$$
Hence $\phi_3 = 5$ and the fair price to move from 3 to 4 is 1.
(Figure: a modified graph on the states 1, 2, 3, 4, 5.)
In the second variation we consider the discrete-time random walk $(X_n)_{n\ge 0}$
on the modified graph shown above. Where there is no arrow,
transitions are allowed in both directions. Obviously, states 1 and 5 are
absorbing. We impose a cost $c_i = i$ on each visit to $i$ for $i = 2,3,4$, and a
final cost $f_i$ on arrival at $i = 1$ or 5, where $f_i = i$. Thus the total cost is
now
$$\sum_{n=0}^{T-1} c(X_n) + f(X_T),$$
where $T$ is the hitting time of $\{1,5\}$. Write, as before, $\phi_i$ for the expected
total cost starting from $i$. Then $\phi_1 = 1$, $\phi_5 = 5$ and
$$\phi_2 = 2 + \tfrac12(\phi_1 + \phi_3),$$
$$\phi_3 = 3 + \tfrac14(\phi_1 + \phi_2 + \phi_4 + \phi_5),$$
$$\phi_4 = 4 + \tfrac12(\phi_3 + \phi_5).$$
On solving these equations we obtain $\phi_2 = 7$, $\phi_3 = 9$ and $\phi_4 = 11$. So in
this case the fair price to move from 3 to 4 is $-2$.
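These small systems can be solved by hand, but the same pattern ($\phi = P\phi + c$ on $D$, $\phi = f$ on $\partial D$) is a linear system in general. A hedged NumPy sketch implementing the equations displayed above for the second variation:

```python
import numpy as np

# Second variation: phi = P phi + c on D = {2,3,4}, boundary values phi_1 = 1, phi_5 = 5.
# Unknowns ordered (phi_2, phi_3, phi_4).
P = np.array([[0.0, 0.5, 0.0],     # from 2: half to 3 (half to boundary 1)
              [0.25, 0.0, 0.25],   # from 3: quarter each to 2 and 4 (rest to 1 and 5)
              [0.0, 0.5, 0.0]])    # from 4: half to 3 (half to boundary 5)
c = np.array([2.0, 3.0, 4.0])
boundary = np.array([0.5 * 1.0,               # from 2: prob 1/2 of hitting 1, final cost 1
                     0.25 * 1.0 + 0.25 * 5.0, # from 3: prob 1/4 each for 1 and 5
                     0.5 * 5.0])              # from 4: prob 1/2 of hitting 5, final cost 5

phi = np.linalg.solve(np.eye(3) - P, c + boundary)
print(phi)        # [7. 9. 11.], matching the values found above
```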
Example 4.2.2
Consider the simple discrete-time random walk on $\mathbb{Z}$ with transition prob-
abilities $p_{i,i-1} = q < p = p_{i,i+1}$. Let $c > 0$ and suppose that a cost $c^i$ is
incurred every time the walk visits state $i$. What is the expected total cost
$\phi_0$ incurred by the walk starting from 0?

We must be prepared to find that $\phi_0 = \infty$ for some values of $c$, as the
total cost is a sum over infinitely many times. Indeed, we know that the
walk $X_n \to \infty$ with probability 1, so for $c \ge 1$ we shall certainly have
$\phi_0 = \infty$.

Let $\phi_i$ denote the expected total cost starting from $i$. On moving one
step to the right, all costs are multiplied by $c$, so we must have
$$\phi_i = c^i\phi_0.$$
By considering what happens on the first step, we see
$$\phi_0 = 1 + p\phi_1 + q\phi_{-1} = 1 + (cp + q/c)\phi_0.$$
Note that $\phi_0 = \infty$ always satisfies this equation. We shall see in the general
theory that $\phi_0$ is the minimal non-negative solution. Let us look for a finite
solution: then
$$\phi_0(1 - cp - q/c) = 1,$$
so
$$\phi_0 = \frac{c}{c - c^2 p - q}.$$
The quadratic $c^2 p - c + q$ has roots at $q/p$ and 1, and takes negative values
in between. Hence the expected total cost is given by
$$\phi_0 = \begin{cases} c/(c - c^2 p - q) & \text{if } c \in (q/p, 1) \\ \infty & \text{otherwise.}\end{cases}$$
It was clear at the outset that $\phi_0 = \infty$ when $c \ge 1$. It is interesting that
$\phi_0 = \infty$ also when $c$ is too small: in this case the costs rapidly become large
to the left of 0, and although the walk eventually drifts away to the right,
the expected cost incurred to the left of 0 is infinite.
In the examples just discussed we were able to calculate potentials by
writing down and solving a system of linear equations. This situation is
familiar from hitting probabilities and expected hitting times. Indeed, these
are simple examples of potentials for Markov chains. As the examples show,
one does not really need a general theory to write down the linear equations.
Nevertheless, we are now going to give some general results on potentials.
These will help to reveal the scope of the ideas used in the examples, and
will reveal also what happens when the linear equations do not have a
unique solution. We shall discuss the cases of discrete and continuous time
side-by-side. Throughout, we shall write $(X_n)_{n\ge 0}$ for a discrete-time chain
with transition matrix $P$, and $(X_t)_{t\ge 0}$ for a continuous-time chain with
generator matrix $Q$. As usual, we insist that $(X_t)_{t\ge 0}$ be minimal.

Let us partition the state-space $I$ into two disjoint sets $D$ and $\partial D$; we call
$\partial D$ the boundary. We suppose that functions $(c_i : i \in D)$ and $(f_i : i \in \partial D)$
are given. We shall consider the associated potential, defined by
$$\phi_i = \mathbb{E}_i\Bigl(\sum_{n<T} c(X_n) + f(X_T)1_{T<\infty}\Bigr)$$
in discrete time, and in continuous time
$$\phi_i = \mathbb{E}_i\Bigl(\int_0^T c(X_t)\,dt + f(X_T)1_{T<\infty}\Bigr),$$
where $T$ denotes the hitting time of $\partial D$. To be sure that the sums and
integrals here are well defined, we shall assume for the most part that $c$
and $f$ are non-negative, that is, $c_i \ge 0$ for all $i \in D$ and $f_i \ge 0$ for all
$i \in \partial D$. More generally, $\phi$ is the difference of the potentials associated with
the positive and negative parts of $c$ and $f$, so this assumption is not too
restrictive. In the explosive case we always set $c(\infty) = 0$, so no further
costs are incurred after explosion.
The most obvious interpretation of these potentials is in terms of cost:
the chain wanders around in D until it hits the boundary: whilst in D, at
state i say, it incurs a cost Ci per unit time; when and if it hits the boundary,
at j say, a final cost Ij is incurred. Note that we do not assume the chain
will hit the boundary, or even that the boundary is non-empty.
Theorem 4.2.3. Suppose that $(c_i : i \in D)$ and $(f_i : i \in \partial D)$ are non-
negative. Set
$$\phi_i = \mathbb{E}_i\Bigl(\sum_{n<T} c(X_n) + f(X_T)1_{T<\infty}\Bigr),$$
where $T$ denotes the hitting time of $\partial D$. Then
(i) the potential $\phi = (\phi_i : i \in I)$ satisfies
$$\begin{cases} \phi = P\phi + c & \text{in } D \\ \phi = f & \text{in } \partial D; \end{cases} \qquad (4.1)$$
(ii) if $\psi = (\psi_i : i \in I)$ satisfies
$$\begin{cases} \psi \ge P\psi + c & \text{in } D \\ \psi \ge f & \text{in } \partial D \end{cases} \qquad (4.2)$$
and $\psi_i \ge 0$ for all $i$, then $\psi_i \ge \phi_i$ for all $i$;
(iii) if $\mathbb{P}_i(T < \infty) = 1$ for all $i$, then (4.1) has at most one bounded
solution.

Proof. (i) Obviously, $\phi = f$ on $\partial D$. For $i \in D$, by the Markov property,
$$\mathbb{E}_i\Bigl(\sum_{1\le n<T} c(X_n) + f(X_T)1_{T<\infty} \,\Big|\, X_1 = j\Bigr)
= \mathbb{E}_j\Bigl(\sum_{n<T} c(X_n) + f(X_T)1_{T<\infty}\Bigr) = \phi_j,$$
so we have
$$\phi_i = c_i + \sum_{j\in I}p_{ij}\,\mathbb{E}_i\Bigl(\sum_{1\le n<T} c(X_n) + f(X_T)1_{T<\infty} \,\Big|\, X_1 = j\Bigr)
= c_i + \sum_{j\in I}p_{ij}\phi_j,$$
as required.
(ii) Consider the expected cost up to time $n$:
$$\phi_i(n) = \mathbb{E}_i\Bigl(\sum_{k<n\wedge T} c(X_k) + f(X_T)1_{T\le n}\Bigr).$$
By monotone convergence, $\phi_i(n) \uparrow \phi_i$ as $n \to \infty$. Also, by the argument
used in part (i), we find
$$\begin{cases} \phi(n+1) = c + P\phi(n) & \text{in } D \\ \phi(n+1) = f & \text{in } \partial D. \end{cases}$$
Suppose that $\psi$ satisfies (4.2) and $\psi \ge 0$; then $\psi \ge \phi(0)$. Then $\psi \ge P\psi + c \ge
P\phi(0) + c = \phi(1)$ in $D$ and $\psi \ge f = \phi(1)$ in $\partial D$, so $\psi \ge \phi(1)$. Similarly
and by induction, $\psi \ge \phi(n)$ for all $n$, and hence $\psi \ge \phi$.

(iii) We shall show that if $\psi$ satisfies (4.2) then
$$\psi_i \ge \phi_i(n) + \mathbb{E}_i\bigl(\psi(X_n)1_{T>n}\bigr),$$
with equality if equality holds in (4.2). This is another proof of (ii). But
also, in the case of equality, if $|\psi_i| \le M$ and $\mathbb{P}_i(T < \infty) = 1$ for all $i$, then
$$|\psi_i - \phi_i(n)| = \bigl|\mathbb{E}_i\bigl(\psi(X_n)1_{T>n}\bigr)\bigr| \le M\,\mathbb{P}_i(T > n) \to 0,$$
so $\psi = \lim_n\phi(n) = \phi$, proving (iii).

For $i \in D$ we have
$$\psi_i \ge c_i + \sum_{j\in\partial D}p_{ij}f_j + \sum_{j\in D}p_{ij}\psi_j$$
and, by repeated substitution for $\psi$ on the right,
$$\psi_i \ge c_i + \sum_{j\in\partial D}p_{ij}f_j + \sum_{j\in D}p_{ij}c_j
+ \cdots + \sum_{j_1\in D}\cdots\sum_{j_{n-1}\in D}p_{ij_1}\cdots p_{j_{n-2}j_{n-1}}c_{j_{n-1}}
+ \sum_{j_1\in D}\cdots\sum_{j_{n-1}\in D}\sum_{j_n\in\partial D}p_{ij_1}\cdots p_{j_{n-1}j_n}f_{j_n}
+ \sum_{j_1\in D}\cdots\sum_{j_n\in D}p_{ij_1}\cdots p_{j_{n-1}j_n}\psi_{j_n}$$
$$= \mathbb{E}_i\bigl(c(X_0)1_{T>0} + f(X_1)1_{T=1} + c(X_1)1_{T>1}
+ \cdots + c(X_{n-1})1_{T>n-1} + f(X_n)1_{T=n} + \psi(X_n)1_{T>n}\bigr)
= \phi_i(n) + \mathbb{E}_i\bigl(\psi(X_n)1_{T>n}\bigr),$$
as required, with equality when equality holds in (4.2). $\square$
It is illuminating to think of the calculation we have just done in terms
of martingales. Consider
$$M_n = \sum_{k=0}^{n-1} c(X_k)1_{k<T} + f(X_T)1_{T<n} + \psi(X_n)1_{n\le T}.$$
Then
$$\mathbb{E}(M_{n+1} \mid \mathcal{F}_n) = \sum_{k=0}^{n-1} c(X_k)1_{k<T} + f(X_T)1_{T<n}
+ (P\psi + c)(X_n)1_{T>n} + f(X_n)1_{T=n} \le M_n,$$
with equality if equality holds in (4.2). We note that $M_n$ is not necessarily
integrable. Nevertheless, it still follows that
$$\mathbb{E}_i(M_n) \le \mathbb{E}_i(M_0) = \psi_i,$$
with equality if equality holds in (4.2).
For continuous-time chains there is a result analogous to Theorem 4.2.3.
We have to state it slightly differently because when $\phi$ takes infinite val-
ues the equations (4.3) may involve subtraction of infinities, and therefore
not make sense. Although the conclusion then appears to depend on the
finiteness of $\phi$, which is a priori unknown, we can still use the result to
determine $\phi_i$ in all cases. To do this we restrict our attention to the set of
states $J$ accessible from $i$. If the linear equations have a finite non-negative
solution on $J$, then $(\phi_j : j \in J)$ is the minimal such solution. If not, then
$\phi_j = \infty$ for some $j \in J$, which forces $\phi_i = \infty$, since $i$ leads to $j$.

Theorem 4.2.4. Assume that $(X_t)_{t\ge 0}$ is minimal, and that $(c_i : i \in D)$
and $(f_i : i \in \partial D)$ are non-negative. Set
$$\phi_i = \mathbb{E}_i\Bigl(\int_0^T c(X_t)\,dt + f(X_T)1_{T<\infty}\Bigr),$$
where $T$ is the hitting time of $\partial D$. Then $\phi = (\phi_i : i \in I)$, if finite, is the
minimal non-negative solution to
$$\begin{cases} -Q\phi = c & \text{in } D \\ \phi = f & \text{in } \partial D. \end{cases} \qquad (4.3)$$
If $\phi_i = \infty$ for some $i$, then (4.3) has no finite non-negative solution. More-
over, if $\mathbb{P}_i(T < \infty) = 1$ for all $i$, then (4.3) has at most one bounded
solution.
Proof. Denote by $(Y_n)_{n\ge 0}$ and $S_1, S_2, \ldots$ the jump chain and holding times
of $(X_t)_{t\ge 0}$, and by $\Pi$ the jump matrix. Then
$$\int_0^T c(X_t)\,dt + f(X_T)1_{T<\infty}
= \sum_{n<N} c(Y_n)S_{n+1} + f(Y_N)1_{N<\infty},$$
where $N$ is the first time $(Y_n)_{n\ge 0}$ hits $\partial D$, and where we use the convention
$0\times\infty = 0$ on the right. We have
$$\mathbb{E}\bigl(c(Y_n)S_{n+1} \mid Y_n = j\bigr)
= \bar c_j = \begin{cases} c_j/q_j & \text{if } c_j > 0 \\ 0 & \text{if } c_j = 0, \end{cases}$$
so, by Fubini's theorem,
$$\phi_i = \mathbb{E}_i\Bigl(\sum_{n<N}\bar c(Y_n) + f(Y_N)1_{N<\infty}\Bigr).$$
By Theorem 4.2.3, $\phi$ is therefore the minimal non-negative solution to
$$\begin{cases} \phi = \Pi\phi + \bar c & \text{in } D \\ \phi = f & \text{in } \partial D, \end{cases} \qquad (4.4)$$
which equations have at most one bounded solution if $\mathbb{P}_i(N < \infty) = 1$ for
all $i$. Since the finite solutions of (4.4) are exactly the finite solutions of
(4.3), and since $N$ is finite whenever $T$ is finite, this proves the result. $\square$
It is natural in some economic applications to apply to future costs a
discount factor $\alpha \in (0,1)$ or rate $\lambda \in (0,\infty)$, corresponding to an interest
rate. Potentials with discounted costs may also be calculated by linear
equations; indeed the discounting actually makes the analysis easier.
Theorem 4.2.5. Suppose that $(c_i : i \in I)$ is bounded. Set
$$\phi_i = \mathbb{E}_i\sum_{n=0}^\infty \alpha^n c(X_n);$$
then $\phi = (\phi_i : i \in I)$ is the unique bounded solution to
$$\phi = \alpha P\phi + c.$$
Proof. Suppose that $|c_i| \le C$ for all $i$; then
$$|\phi_i| \le C\sum_{n=0}^\infty \alpha^n = C/(1-\alpha),$$
so $\phi$ is bounded. By the Markov property
$$\mathbb{E}_i\Bigl(\sum_{n=1}^\infty \alpha^{n-1}c(X_n) \,\Big|\, X_1 = j\Bigr) = \phi_j.$$
Then
$$\phi_i = \mathbb{E}_i\sum_{n=0}^\infty \alpha^n c(X_n)
= c_i + \alpha\sum_{j\in I}p_{ij}\,\mathbb{E}\Bigl(\sum_{n=1}^\infty \alpha^{n-1}c(X_n) \,\Big|\, X_1 = j\Bigr)
= c_i + \alpha\sum_{j\in I}p_{ij}\phi_j,$$
so
$$\phi = c + \alpha P\phi.$$
On the other hand, suppose that $\psi$ is bounded and also that $\psi = c + \alpha P\psi$.
Set $M = \sup_i|\psi_i - \phi_i|$; then $M < \infty$. But
$$\psi - \phi = \alpha P(\psi - \phi),$$
so
$$|\psi_i - \phi_i| \le \alpha\sum_{j\in I}p_{ij}|\psi_j - \phi_j| \le \alpha M.$$
Hence $M \le \alpha M$, which forces $M = 0$ and $\psi = \phi$. $\square$
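For a finite chain, Theorem 4.2.5 says the discounted potential is $\phi = (I - \alpha P)^{-1}c$. A hedged NumPy sketch (illustrative data, not from the text) that solves the linear system and compares with a truncated Monte Carlo estimate:

```python
import numpy as np

rng = np.random.default_rng(4)

P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 0.4, 0.6]])
c = np.array([1.0, 0.0, 2.0])
alpha = 0.9

phi = np.linalg.solve(np.eye(3) - alpha * P, c)    # unique bounded solution
print(np.allclose(phi, alpha * P @ phi + c))        # True

def discounted_cost(i, steps=200):
    """Truncated sum of alpha^n c(X_n) along one simulated path from i."""
    total = 0.0
    for n in range(steps):
        total += alpha ** n * c[i]
        i = rng.choice(3, p=P[i])
    return total

print(phi[0], np.mean([discounted_cost(0) for _ in range(2000)]))  # close
```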
We have a similar looking result for continuous time, which however lies
a little deeper, because it really corresponds to a version of the discrete-time
result where the discount factor may depend on the current state.
Theorem 4.2.6. Assume that $(X_t)_{t\ge 0}$ is non-explosive. Suppose that
$(c_i : i \in I)$ is bounded. Set
$$\phi_i = \mathbb{E}_i\int_0^\infty e^{-\lambda t}c(X_t)\,dt;$$
then $\phi = (\phi_i : i \in I)$ is the unique bounded solution to
$$(\lambda - Q)\phi = c. \qquad (4.5)$$
Proof. Assume for now that $c$ is non-negative. Introduce a new state $\partial$ with
$c_\partial = 0$. Let $T$ be an independent $E(\lambda)$ random variable and define
$$\tilde X_t = \begin{cases} X_t & \text{for } t < T \\ \partial & \text{for } t \ge T. \end{cases}$$
Then $(\tilde X_t)_{t\ge 0}$ is a Markov chain on $I\cup\{\partial\}$ with modified transition rates
$$\tilde q_i = q_i + \lambda, \qquad \tilde q_{i\partial} = \lambda, \qquad \tilde q_\partial = 0.$$
Also $T$ is the hitting time of $\partial$, and is finite with probability 1. By Fubini's
theorem
$$\phi_i = \mathbb{E}_i\int_0^T c(\tilde X_t)\,dt.$$
Suppose $c_i \le C$ for all $i$; then
$$\phi_i \le C\,\mathbb{E}_i(T) = C/\lambda,$$
so $\phi$ is bounded. Hence, by Theorem 4.2.4, $\phi$ is the unique bounded solution
to
$$-\tilde Q\phi = c \quad\text{in } I, \qquad \phi_\partial = 0,$$
which is the same as (4.5).

When $c$ takes negative values we can apply the preceding argument to
the potentials
$$\phi_i^{\pm} = \mathbb{E}_i\int_0^\infty e^{-\lambda t}c^{\pm}(X_t)\,dt,$$
where $c^{\pm} = (\pm c)\vee 0$. Then $\phi = \phi^+ - \phi^-$, so $\phi$ is bounded. We have
$$(\lambda - Q)\phi^{\pm} = c^{\pm},$$
so, subtracting,
$$(\lambda - Q)\phi = c.$$
Finally, if $\psi$ is bounded and $(\lambda - Q)\psi = c$, then $(\lambda - Q)(\psi - \phi) = 0$, so $\psi - \phi$
is the unique bounded solution for the case when $c = 0$, which is 0. $\square$
The point of view underlying the last four theorems was that we were
interested in a given potential associated to a Markov chain, and wished to
calculate it. We shall now take a brief look at some structural aspects of
the set of all potentials of a given Markov chain. What we describe is just
the simplest case of a structure of great generality. First we shall look at
the Green matrix, and then at the role of the boundary.
Let us consider potentials with non-negative costs $c$, and without bound-
ary. The potential is defined by
$$\phi_i = \mathbb{E}_i\sum_{n=0}^\infty c(X_n)$$
in discrete time, and in continuous time
$$\phi_i = \mathbb{E}_i\int_0^\infty c(X_t)\,dt.$$
By Fubini's theorem we have
$$\phi_i = \sum_{n=0}^\infty\mathbb{E}_i c(X_n) = \sum_{n=0}^\infty(P^n c)_i = (Gc)_i,$$
where $G = (g_{ij} : i,j \in I)$ is the Green matrix
$$G = \sum_{n=0}^\infty P^n.$$
Similarly, in continuous time $\phi = Gc$, with
$$G = \int_0^\infty P(t)\,dt.$$
Thus, once we know the Green matrix, we have explicit expressions for
all potentials of the Markov chain. The Green matrix is also called the
fundamental solution of the linear equations (4.1) and (4.3). The $j$th column
$(g_{ij} : i \in I)$ is itself a potential. We have
$$g_{ij} = \mathbb{E}_i\sum_{n=0}^\infty 1_{\{X_n=j\}}$$
in discrete time, and in continuous time
$$g_{ij} = \mathbb{E}_i\int_0^\infty 1_{\{X_t=j\}}\,dt.$$
Thus $g_{ij}$ is the expected total time in $j$ starting from $i$. These quantities
have already appeared in our discussions of transience and recurrence in
Sections 1.5 and 2.11: we know that $g_{ij} = \infty$ if and only if $i$ leads to $j$ and
$j$ is recurrent. Indeed, in discrete time
$$g_{ij} = h_{ij}/(1 - f_j),$$
where $h_{ij}$ is the probability of hitting $j$ from $i$, and $f_j$ is the return proba-
bility for $j$. The formula for continuous time is
$$g_{ij} = h_{ij}/\bigl(q_j(1 - f_j)\bigr).$$
time
00 00
<Pi = lEi L anc(X
n
) = L anlEic(X
n
) = (RaC)i
n=O n=O
where
and in continuous time
where
R>.. = 1
00
e->..t P(t)dt.
We call (R
Q
: Q: E (0,1)) and (R>.. : ,\ E (0,00)) the resolvent of the Markov
chain. Unlike the Green matrix the resolvent is always finite. Indeed, for
finite state-space we have
and
We return to the general case, with boundary $\partial D$. Any bounded function
$(\phi_i : i \in I)$ for which
$$\phi = P\phi \quad\text{in } D$$
is called harmonic in $D$. Our object now is to examine the relation between
non-negative functions, harmonic in $D$, and the boundary $\partial D$. Here are
two examples.
(Figure: a graph with two absorbing states $a$ and $b$.)
Example 4.2.7
Consider the random walk $(X_n)_{n\ge 0}$ on the above graph, where each allow-
able transition is made with equal probability. States $a$ and $b$ are absorbing.
We set $\partial D = \{a,b\}$. Let $h^a_i$ denote the absorption probability for $a$, starting
from $i$. By the method of Section 1.3 we find, in $D$,
$$h^a = \begin{pmatrix} 1/2 & 5/12 \\ 1/2 & 1/3 \\ 1/2 & 0 \end{pmatrix},$$
where we have written the vector $h^a$ as a matrix, corresponding in an ob-
vious way to the state-space. The linear equations for the vector $h^a$ read
$$\begin{cases} h^a = Ph^a & \text{in } D \\ h^a_a = 1, \quad h^a_b = 0. \end{cases}$$
Thus we can find two non-negative functions $h^a$ and $h^b$, harmonic in $D$,
but with different boundary values. In fact, the most general non-negative
harmonic function $\phi$ in $D$ satisfies
$$\begin{cases} \phi = P\phi & \text{in } D \\ \phi = f & \text{in } \partial D, \end{cases}$$
where $f_a, f_b \ge 0$, and this implies
$$\phi = f_a h^a + f_b h^b.$$
Thus the boundary points $a$ and $b$ give us extremal generators $h^a$ and $h^b$
of the set of all non-negative harmonic functions.
Example 4.2.8
Consider the random walk $(X_n)_{n\ge 0}$ on $\mathbb{Z}$ which jumps towards 0 with prob-
ability $q$ and jumps away from 0 with probability $p = 1 - q$, except that at
0 it jumps to $-1$ or $1$ with probability $1/2$. We choose $p > q$ so that the
walk is transient. In fact, starting from 0, we can show that $(X_n)_{n\ge 0}$ is
equally likely to end up drifting to the left or to the right, at speed $p - q$.

Let us consider the problem of determining the set $\mathcal{C}$ of all
non-negative harmonic functions $\phi$. We must have
$$\phi_i = p\phi_{i+1} + q\phi_{i-1} \quad\text{for } i = 1,2,\ldots\,,$$
$$\phi_0 = \tfrac12(\phi_1 + \phi_{-1}),$$
$$\phi_i = q\phi_{i+1} + p\phi_{i-1} \quad\text{for } i = -1,-2,\ldots\,.$$
The first equation has general solution
$$\phi_i = A + B\bigl(1 - (q/p)^i\bigr) \quad\text{for } i = 0,1,2,\ldots\,,$$
which is non-negative provided $A + B \ge 0$. Similarly, the third equation
has general solution
$$\phi_i = A' + B'\bigl(1 - (q/p)^{-i}\bigr) \quad\text{for } i = 0,-1,-2,\ldots\,,$$
non-negative provided $A' + B' \ge 0$. To obtain a general harmonic function
we must match the values $\phi_0$ and satisfy $\phi_0 = (\phi_1 + \phi_{-1})/2$. This forces
$A = A'$ and $B + B' = 0$. It follows that all non-negative harmonic functions
have the form
$$\phi = f^- h^- + f^+ h^+,$$
where $f^-, f^+ \ge 0$ and where $h^-_i = 1 - h^+_i$ and
$$h^+_i = \begin{cases} \tfrac12 + \tfrac12\bigl(1 - (q/p)^i\bigr) & \text{for } i = 0,1,2,\ldots\,, \\
\tfrac12 - \tfrac12\bigl(1 - (q/p)^{-i}\bigr) & \text{for } i = -1,-2,\ldots\,. \end{cases}$$
In the preceding example the generators of $\mathcal{C}$ were in one-to-one corre-
spondence with the points of the boundary - the possible places for the
chain to end up. In this example there is no boundary, but the generators
of $\mathcal{C}$ still correspond to the two possibilities for the long-time behaviour of
the chain. For we have
$$h^{\pm}_i = \mathbb{P}_i(X_n \to \pm\infty \text{ as } n \to \infty).$$
The suggestion of this example, which is fully developed in other works, is
that the set of non-negative harmonic functions may be used to identify a
generalized notion of boundary for Markov chains, which sometimes just
consists of points in the state-space, but more generally corresponds to the
varieties of possible limiting behaviour for $X_n$ as $n \to \infty$. See, for example,
Markov Chains by D. Revuz (North-Holland, Amsterdam, 1984).
We cannot begin to give the general theory corresponding to Example
4.2.8, but we can draw some general conclusions from Theorem 4.2.3 when
the situation is more like Example 4.2.7. Suppose we have a Markov chain
$(X_n)_{n\ge 0}$ with absorbing boundary $\partial D$. Set
$$h^\partial_i = \mathbb{P}_i(T < \infty),$$
where $T$ is the hitting time of $\partial D$. Then, by the methods of Section 1.3,
$$\begin{cases} h^\partial = Ph^\partial & \text{in } D \\ h^\partial = 1 & \text{in } \partial D. \end{cases} \qquad (4.6)$$
Note that $h^\partial_i = 1$ for all $i$ always gives a possible solution. Hence if (4.6) has
a unique bounded solution then $h^\partial_i = \mathbb{P}_i(T < \infty) = 1$ for all $i$. Conversely,
if $\mathbb{P}_i(T < \infty) = 1$ for all $i$, then, as we showed in Theorem 4.2.3, (4.6) has
a unique bounded solution. Indeed, we showed more generally that this
condition implies that
$$\begin{cases} \phi = P\phi + c & \text{in } D \\ \phi = f & \text{in } \partial D \end{cases}$$
has at most one bounded solution, and since
$$\phi_i = \mathbb{E}_i\Bigl(\sum_{n<T} c(X_n) + f(X_T)1_{T<\infty}\Bigr) \qquad (4.7)$$
is the minimal solution, any bounded solution is given by (4.7). Suppose
from now on that $\mathbb{P}_i(T < \infty) = 1$ for all $i$. Let $\phi$ be a bounded non-negative
function, harmonic in $D$, with boundary values $\phi_i = f_i$ for $i \in \partial D$. Then,
by monotone convergence,
$$\phi_i = \mathbb{E}_i\bigl(f(X_T)\bigr) = \sum_{j\in\partial D} f_j\,\mathbb{P}_i(X_T = j).$$
Hence every bounded harmonic function is determined by its boundary
values and, indeed,
$$\phi = \sum_{j\in\partial D} f_j h^j,$$
where
$$h^j_i = \mathbb{P}_i(X_T = j).$$
Just as in Example 4.2.7, the hitting probabilities for boundary states form
a set of extremal generators for the set of all bounded non-negative harmonic
functions.
Exercises

4.2.1 Consider a discrete-time Markov chain $(X_n)_{n\ge 0}$ and the potential $\phi$
with costs $(c_i : i \in D)$ and boundary values $(f_i : i \in \partial D)$. Set
$$\tilde X_n = \begin{cases} X_n & \text{if } n \le T \\ \partial & \text{if } n > T, \end{cases}$$
where $T$ is the hitting time of $\partial D$ and $\partial$ is a new state. Show that $(\tilde X_n)_{n\ge 0}$
is a Markov chain and determine its transition matrix.
Check that
$$\phi_i = \mathbb{E}_i\sum_{n<\tilde T} c(X_n) = \mathbb{E}_i\sum_{n=0}^\infty c(\tilde X_n),$$
where $\tilde T = T + 1$ and where we set $c_i = f_i$ on $\partial D$ and $c_\partial = 0$. This
shows that a general potential may always be considered as a potential
with boundary value zero or, indeed, without boundary at all.
Can you find a similar reduction for continuous-time chains?

4.2.2 Prove the fact claimed in Example 4.2.8 that
$$h^{\pm}_i = \mathbb{P}_i(X_n \to \pm\infty \text{ as } n \to \infty).$$

4.2.3 Let $(c_i : i \in I)$ be a non-negative function. Partition $I$ as $D\cup\partial D$ and
suppose that the linear equations
$$\begin{cases} \phi = P\phi + c & \text{in } D \\ \phi = 0 & \text{in } \partial D \end{cases}$$
have a unique bounded solution. Show that the Markov chain $(X_n)_{n\ge 0}$
with transition matrix $P$ is certain to hit $\partial D$.

Consider now a new partition $\tilde D\cup\partial\tilde D$, where $\tilde D \subseteq D$. Show that the
linear equations
$$\begin{cases} \psi = P\psi + c & \text{in } \tilde D \\ \psi = 0 & \text{in } \partial\tilde D \end{cases}$$
also have a unique bounded solution, and that
$$\psi_i \le \phi_i \quad\text{for all } i \in I.$$
4.3 Electrical networks
An electrical network has a countable set $I$ of nodes, each node $i$ having a
capacity $\pi_i > 0$. Some nodes are joined by wires, the wire between $i$ and
$j$ having conductivity $a_{ij} = a_{ji} \ge 0$. Where no wire joins $i$ to $j$ we take
$a_{ij} = 0$. In practice, each 'wire' contains a resistor, which determines the
conductivity as the reciprocal of its resistance. Each node $i$ holds a certain
charge $\chi_i$, which determines its potential $\phi_i$ by
$$\chi_i = \pi_i\phi_i.$$
A current or flow of charge is any matrix $(\gamma_{ij} : i,j \in I)$ with $\gamma_{ij} = -\gamma_{ji}$.
Physically it is found that the current $\gamma_{ij}$ from $i$ to $j$ obeys Ohm's law:
$$\gamma_{ij} = a_{ij}(\phi_i - \phi_j).$$
Thus charge flows from nodes of high potential to nodes of low potential.
The first problem in electrical networks is to determine equilibrium flows
and potentials, subject to given external conditions. The nodes are parti-
tioned into two sets $D$ and $\partial D$. External connections are made at the nodes
in $\partial D$ and possibly at some of the nodes in $D$. These have the effect that
each node $i \in \partial D$ is held at a given potential $f_i$, and that a given current $g_i$
enters the network at each node $i \in D$. The case where $g_i = 0$ corresponds
to a node with no external connection. In equilibrium, current may also
enter or leave the network through $\partial D$, but here it is not the current but
the potential which is determined externally.

Given a flow $(\gamma_{ij} : i,j \in I)$ we shall write $\gamma_i$ for the total flow from $i$ to
the network:
$$\gamma_i = \sum_{j\in I}\gamma_{ij}.$$
In equilibrium the charge at each node is constant, so
$$\gamma_i = g_i \quad\text{for } i \in D.$$
Therefore, by Ohm's law, any equilibrium potential $\phi = (\phi_i : i \in I)$ must
satisfy
$$\begin{cases} \sum_{j\in I}a_{ij}(\phi_i - \phi_j) = g_i & \text{for } i \in D \\ \phi_i = f_i & \text{for } i \in \partial D. \end{cases} \qquad (4.8)$$
There is a simple correspondence between electrical networks and reversible
Markov chains in continuous time, given by
$$\pi_i q_{ij} = a_{ij} \quad\text{for } i \ne j.$$
We shall assume that the total conductivity at each node is finite:
$$a_i = \sum_{j\ne i}a_{ij} < \infty.$$
Then $a_i = \pi_i q_i = -\pi_i q_{ii}$. The capacities $\pi_i$ are the components of an
invariant measure, and the symmetry of $a_{ij}$ corresponds to the detailed
balance equations. The equations for an equilibrium potential may now be
written in a form familiar from the preceding section:
$$\begin{cases} -Q\phi = c & \text{in } D \\ \phi = f & \text{in } \partial D, \end{cases} \qquad (4.9)$$
where $c_i = g_i/\pi_i$. It is natural that $c$ appears here and not $g$, because $c$
and $f$ have the same physical dimensions. We know that these equations
may fail to have a unique solution, indicating the interesting possibility
that there may be more than one equilibrium potential. However, to keep
matters simple here, we shall assume that $I$ is finite, that the network is
connected, and that $\partial D$ is non-empty. This is enough to ensure uniqueness
of potentials. Then, by Theorem 4.2.4, the equilibrium potential is given
by
$$\phi_i = \mathbb{E}_i\Bigl(\int_0^T c(X_t)\,dt + f(X_T)\Bigr), \qquad (4.10)$$
where $T$ is the hitting time of $\partial D$.

In fact, the case where $\partial D$ is empty may be dealt with as follows: we
must have
$$\sum_{i\in I}g_i = 0$$
or there is no possibility of equilibrium; pick one node $k$, set $\partial D = \{k\}$, and
replace the condition $\gamma_k = g_k$ by $\phi_k = 0$. The new problem is equivalent to
the old, but now $\partial D$ is non-empty.

(Figure: a network on the nodes A, B, C, D, E, F, with conductivities 1 and 2
on its wires.)
Example 4.3.1
Determine the equilibrium current in the network shown on the preceding
page when unit current enters at A and leaves at F. The conductivities are
shown on the diagram. Let us set $\phi_A = 1$ and $\phi_F = 0$. This will result in
some flow from A to F, which we can scale to get a unit flow. By symmetry,
$\phi_E = 1 - \phi_B$ and $\phi_D = 1 - \phi_C$. Then, by Ohm's law, since the total current
leaving B and C must vanish,
$$(\phi_B - \phi_A) + (\phi_B - \phi_E) + 2(\phi_B - \phi_C) = 0,$$
$$2(\phi_C - \phi_F) + 2(\phi_C - \phi_B) = 0.$$
Hence $\phi_B = 1/2$ and $\phi_C = 1/4$, and the associated flow takes the values
$1/2$, $1/2$, $1/2$ and $0$ on the corresponding wires. In fact, we were lucky - no
scaling was necessary.

Note that the node capacities do not affect the problem we considered.
Let us arbitrarily assign to each node a capacity 1. Then there is an asso-
ciated Markov chain $(X_t)_{t\ge 0}$ and, according to (4.10), the equilibrium
potential is given by
$$\phi_i = \mathbb{E}_i(1_{\{X_T=A\}}) = \mathbb{P}_i(X_T = A),$$
where $T$ is the hitting time of $\{A, F\}$. Different node capacities result
in different Markov chains, but the same jump chain and hence the same
hitting probabilities.
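The general computation in (4.8) is just a linear solve. A hedged NumPy sketch (the small symmetric conductivity matrix below is illustrative, not the network in the figure): fix the boundary potentials and solve Kirchhoff's equations for the interior nodes.

```python
import numpy as np

# Illustrative symmetric conductivities on nodes 0..3 (not the figure's network).
a = np.array([[0.0, 1.0, 2.0, 0.0],
              [1.0, 0.0, 1.0, 2.0],
              [2.0, 1.0, 0.0, 1.0],
              [0.0, 2.0, 1.0, 0.0]])
boundary = {0: 1.0, 3: 0.0}      # nodes held at fixed potential
interior = [1, 2]                # nodes with no external current (g_i = 0)

# Kirchhoff: sum_j a_ij (phi_i - phi_j) = 0 for each interior node i.
A = np.zeros((2, 2)); b = np.zeros(2)
for r, i in enumerate(interior):
    A[r, r] = a[i].sum()
    for j in range(4):
        if j in boundary:
            b[r] += a[i, j] * boundary[j]
        elif j in interior and j != i:
            A[r, interior.index(j)] -= a[i, j]

phi = np.linalg.solve(A, b)
print(dict(zip(interior, phi)))  # equilibrium potentials at the interior nodes
```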
Here is a general result expressing equilibrium potentials, flows and
charges in terms of the associated Markov chain.

Theorem 4.3.2. Consider a finite network with external connections at
two nodes A and B, and the associated Markov chain $(X_t)_{t\ge 0}$.
(a) The unique equilibrium potential $\phi$ with $\phi_A = 1$ and $\phi_B = 0$ is given
by
$$\phi_i = \mathbb{P}_i(T_A < T_B),$$
where $T_A$ and $T_B$ are the hitting times of A and B.
(b) The unique equilibrium flow $\gamma$ with $\gamma_A = 1$ and $\gamma_B = -1$ is given by
$$\gamma_{ij} = \mathbb{E}_A(\Gamma_{ij} - \Gamma_{ji}),$$
where $\Gamma_{ij}$ is the number of times that $(X_t)_{t\ge 0}$ jumps from $i$ to $j$ before
hitting B.
(c) The charge $\chi$ associated with $\gamma$, subject to $\chi_B = 0$, is given by
$$\chi_i = \mathbb{E}_A\int_0^{T_B} 1_{\{X_t=i\}}\,dt.$$
Proof. The formula for $\phi$ is a special case of (4.10), where $c = 0$ and $f =
1_{\{A\}}$. We shall prove (b) and (c) together. Observe that if $X_0 = A$ then
$$\sum_{j\in I}(\Gamma_{ij} - \Gamma_{ji}) = \begin{cases} 1 & \text{if } i = A \\ 0 & \text{if } i \notin \{A,B\} \\ -1 & \text{if } i = B, \end{cases}$$
so if $\gamma_{ij} = \mathbb{E}_A(\Gamma_{ij} - \Gamma_{ji})$ then $\gamma$ is a unit flow from A to B. We have
$$\Gamma_{ij} = \sum_{n=0}^\infty 1_{\{Y_n=i,\,Y_{n+1}=j,\,n<N_B\}},$$
where $N_B$ is the hitting time of B for the jump chain $(Y_n)_{n\ge 0}$. So, by the
Markov property of the jump chain,
$$\mathbb{E}_A(\Gamma_{ij}) = \sum_{n=0}^\infty\mathbb{P}_A(Y_n = i, Y_{n+1} = j, n < N_B)
= \sum_{n=0}^\infty\mathbb{P}_A(Y_n = i, n < N_B)\,\pi_{ij}.$$
Set
$$\chi_i = \mathbb{E}_A\int_0^{T_B} 1_{\{X_t=i\}}\,dt$$
and consider the associated potential $\psi_i = \chi_i/\pi_i$. Then
$$\chi_i q_{ij} = \chi_i q_i\pi_{ij}
= \sum_{n=0}^\infty\mathbb{P}_A(Y_n = i, n < N_B)\,\pi_{ij} = \mathbb{E}_A(\Gamma_{ij}),$$
so
$$(\psi_i - \psi_j)a_{ij} = \chi_i q_{ij} - \chi_j q_{ji} = \gamma_{ij}.$$
Hence $\psi$ and $\gamma$ obey Ohm's law, so $\gamma$ is the equilibrium unit flow and $\chi$ the
associated charge, as required. $\square$

The interpretation of potential theory in terms of electrical networks
makes it natural to consider notions of energy. We define, for a potential
$\phi = (\phi_i : i \in I)$ and a flow $\gamma = (\gamma_{ij} : i,j \in I)$,
$$E(\phi) = \tfrac12\sum_{i,j\in I}(\phi_i - \phi_j)^2 a_{ij}, \qquad
I(\gamma) = \tfrac12\sum_{i,j\in I}\gamma_{ij}^2\,a_{ij}^{-1}.$$
The $1/2$ means that each wire is counted once. When $\phi$ and $\gamma$ are related
by Ohm's law we have
$$E(\phi) = \tfrac12\sum_{i,j\in I}(\phi_i - \phi_j)\gamma_{ij} = I(\gamma),$$
and $E(\phi)$ is found physically to give the rate of dissipation of energy, as heat,
by the network. Moreover, we shall see that certain equilibrium potentials
and flows determined by Ohm's law minimize these energy functions. This
characteristic of energy minimization can indeed replace Ohm's law as the
fundamental physical principle.
Theorem 4.3.3. The equilibrium potential and flow may be determined
as follows.
(a) The equilibrium potential $\phi = (\phi_i : i \in I)$ with boundary values
$\phi_i = f_i$ for $i \in \partial D$ and no current sources in $D$ is the unique solution
to
$$\text{minimize } E(\phi) \quad\text{subject to } \phi_i = f_i \text{ for } i \in \partial D.$$
(b) The equilibrium flow $\gamma = (\gamma_{ij} : i,j \in I)$ with current sources $\gamma_i = g_i$
for $i \in D$ and boundary potential zero is the unique solution to
$$\text{minimize } I(\gamma) \quad\text{subject to } \gamma_i = g_i \text{ for } i \in D.$$

Proof. For any potential $\phi = (\phi_i : i \in I)$ and any flow $\gamma = (\gamma_{ij} : i,j \in I)$
we have
$$\sum_{i,j\in I}(\phi_i - \phi_j)\gamma_{ij} = 2\sum_{i\in I}\phi_i\gamma_i.$$
(a) Denote by $\phi = (\phi_i : i \in I)$ and by $\gamma = (\gamma_{ij} : i,j \in I)$ the equilibrium
potential and flow. We have $\gamma_i = 0$ for $i \in D$. We can write any potential
in the minimization problem in the form $\phi + \varepsilon$, where $\varepsilon = (\varepsilon_i : i \in I)$ with
$\varepsilon_i = 0$ for $i \in \partial D$. Then
$$\sum_{i,j\in I}(\varepsilon_i - \varepsilon_j)(\phi_i - \phi_j)a_{ij}
= \sum_{i,j\in I}(\varepsilon_i - \varepsilon_j)\gamma_{ij} = 2\sum_{i\in I}\varepsilon_i\gamma_i = 0,$$
so
$$E(\phi + \varepsilon) = E(\phi) + E(\varepsilon) \ge E(\phi),$$
with equality only if $\varepsilon = 0$.
(b) Denote by $\phi = (\phi_i : i \in I)$ and by $\gamma = (\gamma_{ij} : i,j \in I)$ the equilibrium
potential and flow. We have $\phi_i = 0$ for $i \in \partial D$. We can write any flow in
the minimization problem in the form $\gamma + \delta$, where $\delta = (\delta_{ij} : i,j \in I)$ is a
flow with $\delta_i = 0$ for $i \in D$. Then
$$\sum_{i,j\in I}\gamma_{ij}\delta_{ij}a_{ij}^{-1} = \sum_{i,j\in I}(\phi_i - \phi_j)\delta_{ij}
= 2\sum_{i\in I}\phi_i\delta_i = 0,$$
so
$$I(\gamma + \delta) = I(\gamma) + I(\delta) \ge I(\gamma),$$
with equality only if $\delta = 0$. $\square$
The following reformulation of part (a) of the preceding result states that
harmonic functions minimize energy.
Corollary 4.3.4. Suppose that $\phi = (\phi_i : i \in I)$ satisfies
$$\begin{cases} Q\phi = 0 & \text{in } D \\ \phi = f & \text{in } \partial D. \end{cases}$$
Then $\phi$ is the unique solution to
$$\text{minimize } E(\phi) \quad\text{subject to } \phi = f \text{ in } \partial D.$$
An important feature of electrical networks is that networks with a small
number of external connections look like networks with a small number
of nodes altogether. In fact, given any network, there is always another
network of wires joining the externally connected nodes alone, equivalent
in its response to external flows and potentials.
Let $J \subseteq I$. We say that $\bar a = (\bar a_{ij} : i,j \in J)$ is an effective conductivity on
$J$ if, for all potentials $f = (f_i : i \in J)$, the external currents into $J$ when $J$
is held at potential $f$ are the same for $(J,\bar a)$ as for $(I,a)$. We know that $f$
determines an equilibrium potential $\phi = (\phi_i : i \in I)$ by
$$\begin{cases} \sum_{j\in I}(\phi_i - \phi_j)a_{ij} = 0 & \text{for } i \notin J \\ \phi_i = f_i & \text{for } i \in J. \end{cases}$$
Then $\bar a$ is an effective conductivity if, for all $f$, for $i \in J$ we have
$$\sum_{j\in I}(\phi_i - \phi_j)a_{ij} = \sum_{j\in J}(f_i - f_j)\bar a_{ij}.$$
For a conductivity matrix $\bar a$ on $J$, for a potential $f = (f_i : i \in J)$ and a flow
$\delta = (\delta_{ij} : i,j \in J)$ we set
$$\bar E(f) = \tfrac12\sum_{i,j\in J}(f_i - f_j)^2\,\bar a_{ij}, \qquad
\bar I(\delta) = \tfrac12\sum_{i,j\in J}\delta_{ij}^2\,\bar a_{ij}^{-1}.$$
Theorem 4.3.5. There is a unique effective conductivity $\bar a$, given by
$$\bar a_{ij} = a_{ij} + \sum_{k\notin J}a_{ik}\phi^j_k,$$
where, for each $j \in J$, $\phi^j = (\phi^j_i : i \in I)$ is the potential defined by
$$\begin{cases} \sum_{k\in I}(\phi^j_i - \phi^j_k)a_{ik} = 0 & \text{for } i \notin J \\ \phi^j_i = \delta_{ij} & \text{for } i \in J. \end{cases} \qquad (4.11)$$
Moreover, $\bar a$ is characterized by the Dirichlet variational principle
$$\bar E(f) = \inf_{\phi_i = f_i \text{ on } J} E(\phi),$$
and also by the Thompson variational principle
$$\inf_{\delta_i = g_i \text{ on } J}\bar I(\delta)
= \inf_{\substack{\gamma_i = g_i \text{ on } J\\ \gamma_i = 0 \text{ off } J}} I(\gamma).$$

Proof. Given $f = (f_i : i \in J)$, define $\phi = (\phi_i : i \in I)$ by
$$\phi_i = \sum_{j\in J}f_j\,\phi^j_i;$$
then $\phi$ is the equilibrium potential given by
$$\begin{cases} \sum_{k\in I}a_{ik}(\phi_i - \phi_k) = 0 & \text{for } i \notin J \\ \phi_i = f_i & \text{for } i \in J, \end{cases}$$
and, by Corollary 4.3.4, $\phi$ solves
$$\text{minimize } E(\phi) \quad\text{subject to } \phi_i = f_i \text{ for } i \in J.$$
We have, for $i \in J$,
$$\sum_{j\in I}a_{ij}\phi_j = \sum_{j\in J}a_{ij}f_j + \sum_{j\in J}\sum_{k\notin J}a_{ik}\phi^j_k f_j
= \sum_{j\in J}\bar a_{ij}f_j.$$
In particular, taking $f \equiv 1$ we obtain
$$\sum_{j\in I}a_{ij} = \sum_{j\in J}\bar a_{ij}.$$
Hence we have equality of external currents:
$$\sum_{j\in I}(\phi_i - \phi_j)a_{ij} = \sum_{j\in J}(f_i - f_j)\bar a_{ij}.$$
Moreover, we also have equality of energies:
$$\sum_{i,j\in I}(\phi_i - \phi_j)^2 a_{ij}
= 2\sum_{i\in I}\phi_i\sum_{j\in I}(\phi_i - \phi_j)a_{ij}
= 2\sum_{i\in J}f_i\sum_{j\in J}(f_i - f_j)\bar a_{ij}
= \sum_{i,j\in J}(f_i - f_j)^2\,\bar a_{ij}.$$
Finally, if $\bar\gamma_{ij} = (f_i - f_j)\bar a_{ij}$ and $\gamma_{ij} = (\phi_i - \phi_j)a_{ij}$, then
$$\sum_{i,j\in I}\gamma_{ij}^2\,a_{ij}^{-1} = \sum_{i,j\in I}(\phi_i - \phi_j)^2 a_{ij}
= \sum_{i,j\in J}(f_i - f_j)^2\,\bar a_{ij} = \sum_{i,j\in J}\bar\gamma_{ij}^2\,\bar a_{ij}^{-1},$$
so, by Theorem 4.3.3, for any flow $\delta = (\delta_{ij} : i,j \in I)$ with $\delta_i = g_i$ for $i \in J$
and $\delta_i = 0$ for $i \notin J$, we have
$$\sum_{i,j\in I}\delta_{ij}^2\,a_{ij}^{-1} \ge \sum_{i,j\in J}\bar\gamma_{ij}^2\,\bar a_{ij}^{-1}. \qquad\square$$
Effective conductivity is also related to the associated Markov chain
$(X_t)_{t\ge 0}$ in an interesting way. Define the time spent in $J$,
$$A_t = \int_0^t 1_{\{X_s\in J\}}\,ds,$$
and a time-changed process $(\tilde X_t)_{t\ge 0}$ by
$$\tilde X_t = X_{\tau(t)}, \quad\text{where}\quad \tau(t) = \inf\{s \ge 0 : A_s > t\}.$$
We obtain $(\tilde X_t)_{t\ge 0}$ by observing $(X_t)_{t\ge 0}$ whilst in $J$, and stopping the clock
whilst $(X_t)_{t\ge 0}$ makes excursions outside $J$. This is really a transformation
of the jump chain. By applying the strong Markov property to the jump
chain we find that $(\tilde X_t)_{t\ge 0}$ is itself a Markov chain, with jump matrix $\tilde\Pi$
given by
$$\tilde\pi_{ij} = \pi_{ij} + \sum_{k\notin J}\pi_{ik}\phi^j_k \quad\text{for } i,j \in J,$$
where
$$\phi^j_k = \mathbb{P}_k(X_T = j)$$
and $T$ denotes the hitting time of $J$. See Example 1.4.4. Hence $(\tilde X_t)_{t\ge 0}$
has Q-matrix given by
$$\tilde q_{ij} = q_{ij} + \sum_{k\notin J}q_{ik}\phi^j_k.$$
Since $\phi^j = (\phi^j_k : k \in I)$ is the unique solution to (4.11), this shows that
$$\pi_i\tilde q_{ij} = \bar a_{ij},$$
so $(\tilde X_t)_{t\ge 0}$ is the Markov chain on $J$ associated with the effective conduc-
tivity $\bar a$.

There is much more that one can say, for example in tying up the non-
equilibrium behaviour of Markov chains and electrical networks. More-
over, methods coming from one theory can provide insights into the other.
For an entertaining and illuminating account of the subject, you should
see Random Walks and Electrical Networks by P. G. Doyle and J. L. Snell
(Carus Mathematical Monographs 22, Mathematical Association of Amer-
ica, 1984).
4.4 Brownian motion
Imagine a symmetric random walk in Euclidean space which takes infinitesi-
mal jumps with infinite frequency and you will have some idea of Brownian
motion. It is named after a botanist who observed such a motion when
looking at pollen grains under a microscope. The mathematical object now
called Brownian motion was actually discovered by Wiener, and is also
called the Wiener process.

A discrete approximation to Euclidean space $\mathbb{R}^d$ is provided by
$$c^{-1/2}\mathbb{Z}^d = \{c^{-1/2}z : z \in \mathbb{Z}^d\},$$
where $c$ is a large positive number. The simple symmetric random walk
$(X_n)_{n\ge 0}$ on $\mathbb{Z}^d$ is a Markov chain which is by now quite familiar. We shall
show that the scaled-down and speeded-up process
$$X^{(c)}_t = c^{-1/2}X_{ct}$$
is a good approximation to Brownian motion. This provides an elementary
way of thinking about Brownian motion. Also, it makes it reasonable to
suppose that some properties of the random walk carry over to Brownian
motion. At the end of this section we state some results which confirm that
this is true to a remarkable extent.

Why is space rescaled by the square-root of the time-scaling? Well, if
we hope that $X^{(c)}_t$ converges in some sense as $c \to \infty$ to a non-degenerate
limit, we will at least want $\mathbb{E}\bigl[|X^{(c)}_t|^2\bigr]$ to converge to a non-degenerate limit.
For $ct \in \mathbb{Z}^+$ we have
$$\mathbb{E}\bigl[|X_{ct}|^2\bigr] = ct,$$
so the square-root scaling gives
$$\mathbb{E}\bigl[|X^{(c)}_t|^2\bigr] = t,$$
which is independent of $c$.
We begin by defining Brownian motion, and then show that this is not
an empty definition; that is to say, Brownian motions exist.
A real-valued random variable is said to have Gaussian distribution with
mean 0 and variance t if it has density function
$$\phi_t(x) = (2\pi t)^{-1/2}\exp\{-x^2/2t\}.$$
The fundamental role of Gaussian distributions in probability derives from
the following result.
Theorem 4.4.1 (Central limit theorem). Let X_1, X_2, ... be a se-
quence of independent and identically distributed real-valued random vari-
ables with mean 0 and variance t ∈ (0, ∞). Then, for all bounded continu-
ous functions f, as n → ∞ we have
$$\mathbb{E}\Big[f\Big(\frac{X_1 + \dots + X_n}{\sqrt{n}}\Big)\Big] \to \int_{-\infty}^{\infty} f(x)\phi_t(x)\,dx.$$
We shall take this result and a few other standard properties of the
Gaussian distribution for granted in this section. There are many introduc-
tory texts on probability which give the full details.
A real-valued process (X_t)_{t≥0} is said to be continuous if
$$\mathbb{P}(\{\omega : t \mapsto X_t(\omega)\ \text{is continuous}\}) = 1.$$
A continuous real-valued process (B_t)_{t≥0} is called a Brownian motion if
B_0 = 0 and for all 0 = t_0 < t_1 < ... < t_n the increments
$$B_{t_1} - B_{t_0},\ \dots,\ B_{t_n} - B_{t_{n-1}}$$
are independent Gaussian random variables of mean 0 and variance
$$t_1 - t_0,\ \dots,\ t_n - t_{n-1}.$$
The conditions made on (B_t)_{t≥0} are enough to determine all the probabil-
ities associated with the process. To put it properly, the law of Brownian
motion, which is a measure on the set of continuous paths, is uniquely de-
termined. However, it is not obvious that there is any such process. We
need the following result.
Theorem 4.4.2 (Wiener's theorem). Brownian motion exists.
Proof. For N = 0, 1, 2, ..., denote by D_N the set of integer multiples of
2^{-N} in [0, ∞), and denote by D the union of these sets. Let us say that
(B_t : t ∈ D_N) is a Brownian motion indexed by D_N if B_0 = 0 and for all
0 = t_0 < t_1 < ... < t_n in D_N the increments
$$B_{t_1} - B_{t_0},\ \dots,\ B_{t_n} - B_{t_{n-1}}$$
are independent Gaussian random variables of mean 0 and variance
$$t_1 - t_0,\ \dots,\ t_n - t_{n-1}.$$
We suppose given, for each t ∈ D, an independent Gaussian random variable
Y_t of mean 0 and variance 1. For t ∈ D_0 = ℤ^+ set
$$B_t = Y_1 + \dots + Y_t;$$
then (B_t : t ∈ D_0) is a Brownian motion indexed by D_0. We shall show
how to extend this process successively to Brownian motions (B_t : t ∈ D_N)
indexed by D_N. Then (B_t : t ∈ D) is a Brownian motion indexed by D.
Next we shall show that (B_t : t ∈ D) extends continuously to t ∈ [0, ∞),
and finally check that the extension is a Brownian motion.
Suppose we have constructed (B_t : t ∈ D_{N-1}), a Brownian motion in-
dexed by D_{N-1}. For t ∈ D_N \ D_{N-1} set r = t - 2^{-N}, s = t + 2^{-N}, so that
r, s ∈ D_{N-1}, and define
$$Z_t = 2^{-(N+1)/2}Y_t, \qquad B_t = \tfrac{1}{2}(B_r + B_s) + Z_t.$$
We obtain two new increments:
$$B_t - B_r = \tfrac{1}{2}(B_s - B_r) + Z_t, \qquad B_s - B_t = \tfrac{1}{2}(B_s - B_r) - Z_t.$$
We compute
$$\mathbb{E}\big[(B_t - B_r)^2\big] = \mathbb{E}\big[(B_s - B_t)^2\big] = 2^{-(N+1)} + 2^{-(N+1)} = 2^{-N},$$
$$\mathbb{E}\big[(B_t - B_r)(B_s - B_t)\big] = 2^{-(N+1)} - 2^{-(N+1)} = 0.$$
The two new increments, being Gaussian, are therefore independent and of
the required variance. Moreover, being constructed from B_s - B_r and Y_t,
they are certainly independent of increments over intervals disjoint from
(r, s). Hence (B_t : t ∈ D_N) is a Brownian motion indexed by D_N, as
required. Hence, by induction, we obtain a Brownian motion (B_t : t ∈ D).
For each N denote by (B^{(N)}_t)_{t≥0} the continuous process obtained by
linear interpolation from (B_t : t ∈ D_N). Also, set Z^{(N)}_t = B^{(N)}_t - B^{(N-1)}_t.
For t ∈ D_{N-1} we have Z^{(N)}_t = 0. For t ∈ D_N \ D_{N-1}, by our construction
we have
$$Z^{(N)}_t = Z_t = 2^{-(N+1)/2}Y_t,$$
with Y_t Gaussian of mean 0 and variance 1. Set
$$M_N = \sup_{t\in[0,1]}|Z^{(N)}_t|.$$
Then, since (Z^{(N)}_t)_{t≥0} interpolates linearly between its values on D_N, we
obtain
$$M_N = \sup_{t\in(D_N\setminus D_{N-1})\cap[0,1]} 2^{-(N+1)/2}|Y_t|.$$
There are 2^{N-1} points in (D_N \ D_{N-1}) ∩ [0,1]. So for λ > 0 we have
$$\mathbb{P}\big(2^{(N+1)/2}M_N > \lambda\big) \le 2^{N-1}\mathbb{P}(|Y_1| > \lambda).$$
For a random variable X ≥ 0 and p > 0 we have the formula
$$\mathbb{E}(X^p) = \int_0^\infty p\lambda^{p-1}\mathbb{P}(X > \lambda)\,d\lambda.$$
Hence
$$\mathbb{E}\big[(2^{(N+1)/2}M_N)^p\big] = \int_0^\infty p\lambda^{p-1}\mathbb{P}\big(2^{(N+1)/2}M_N > \lambda\big)\,d\lambda
\le 2^{N-1}\int_0^\infty p\lambda^{p-1}\mathbb{P}(|Y_1| > \lambda)\,d\lambda = 2^{N-1}\mathbb{E}(|Y_1|^p)$$
and hence, for any p > 2,
$$\mathbb{E}\Big(\sum_{N=0}^{\infty} M_N\Big) = \sum_{N=0}^{\infty}\mathbb{E}(M_N)
\le \sum_{N=0}^{\infty}\mathbb{E}(M_N^p)^{1/p} \le \mathbb{E}(|Y_1|^p)^{1/p}\sum_{N=0}^{\infty}\big(2^{(p-2)/2p}\big)^{-N} < \infty.$$
It follows that, with probability 1, as N → ∞,
$$B^{(N)}_t = B^{(0)}_t + Z^{(1)}_t + \dots + Z^{(N)}_t$$
converges uniformly in t ∈ [0,1], and by a similar argument uniformly for
t in any bounded interval. Now B^{(N)}_t eventually equals B_t for any t ∈ D
and the uniform limit of continuous functions is continuous. Therefore,
(B_t : t ∈ D) has a continuous extension (B_t)_{t≥0}, as claimed.
It remains to show that the increments of (B_t)_{t≥0} have the required joint
distribution. But given 0 < t_1 < ... < t_n we can find sequences (t^m_k)_{m∈ℕ}
in D such that 0 < t^m_1 < ... < t^m_n for all m and t^m_k → t_k for all k. Set
t^m_0 = t_0 = 0. We know that the increments
$$B_{t^m_1} - B_{t^m_0},\ \dots,\ B_{t^m_n} - B_{t^m_{n-1}}$$
are Gaussian of mean 0 and variance
$$t^m_1 - t^m_0,\ \dots,\ t^m_n - t^m_{n-1}.$$
Hence, using continuity of (B_t)_{t≥0}, we can let m → ∞ to obtain the desired
distribution for the increments
$$B_{t_1} - B_{t_0},\ \dots,\ B_{t_n} - B_{t_{n-1}}. \qquad\Box$$
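The dyadic refinement used in this proof translates directly into a simulation recipe. The sketch below (ours, in Python with NumPy; the function name is not from the text) builds an approximate Brownian path on [0,1] by exactly the midpoint construction above: each new dyadic point t at level N receives the value ½(B_r + B_s) + 2^{-(N+1)/2}Y_t.

```python
import numpy as np

def brownian_levy(levels, rng=np.random.default_rng()):
    """Approximate Brownian motion on [0,1] by dyadic midpoint refinement.

    After `levels` refinements the path is known at k * 2**(-levels), k = 0, ..., 2**levels.
    """
    B = np.array([0.0, rng.standard_normal()])       # values at D_0 = {0, 1}
    for N in range(1, levels + 1):
        mid = 0.5 * (B[:-1] + B[1:])                  # (B_r + B_s)/2 at the new points
        Z = 2 ** (-(N + 1) / 2) * rng.standard_normal(mid.size)
        new = np.empty(2 * B.size - 1)
        new[0::2] = B                                 # old points D_{N-1}
        new[1::2] = mid + Z                           # new points D_N \ D_{N-1}
        B = new
    return np.linspace(0.0, 1.0, B.size), B

t, B = brownian_levy(10)
# Increment variances should be close to the grid spacing 2^{-10}.
print(np.var(np.diff(B)), 2.0 ** -10)
```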
Having shown that Brownian motion exists, we now want to show how
it appears as a universal scaling limit of random walks, very much as the
Gaussian distribution does for sums of independent random variables.
Theorem 4.4.3. Let (X_n)_{n≥0} be a discrete-time real-valued random walk
with steps of mean 0 and variance σ² ∈ (0, ∞). For c > 0 consider the
rescaled process
$$X^{(c)}_t = c^{-1/2}X_{ct},$$
where the value of X_{ct} when ct is not an integer is found by linear interpo-
lation. Then, for all m, for all bounded continuous functions f : ℝ^m → ℝ
and all 0 ≤ t_1 < ... < t_m, we have
$$\mathbb{E}\big[f(X^{(c)}_{t_1}, \dots, X^{(c)}_{t_m})\big] \to \mathbb{E}\big[f(\sigma B_{t_1}, \dots, \sigma B_{t_m})\big]$$
as c → ∞, where (B_t)_{t≥0} is a Brownian motion.
Proof. The claim is that (X^{(c)}_{t_1}, ..., X^{(c)}_{t_m}) converges weakly to
(σB_{t_1}, ..., σB_{t_m}) as c → ∞. In the proof we shall take for granted some
basic properties of weak convergence. First define
$$\bar{X}^{(c)}_t = c^{-1/2}X_{[ct]},$$
where [ct] denotes the integer part of ct. Then
$$\big|(X^{(c)}_{t_1}, \dots, X^{(c)}_{t_m}) - (\bar{X}^{(c)}_{t_1}, \dots, \bar{X}^{(c)}_{t_m})\big|
\le c^{-1/2}\big|(Y_{[ct_1]+1}, \dots, Y_{[ct_m]+1})\big|,$$
where Y_n denotes the nth step of (X_n)_{n≥0}. The right side converges weakly
to 0, so it suffices to prove the claim with \bar{X}^{(c)}_t replacing X^{(c)}_t.
Consider now the increments
$$U^{(c)}_k = \bar{X}^{(c)}_{t_k} - \bar{X}^{(c)}_{t_{k-1}}, \qquad Z_k = \sigma(B_{t_k} - B_{t_{k-1}})$$
for k = 1, ..., m. Since \bar{X}^{(c)}_0 = B_0 = 0 it suffices to show that
(U^{(c)}_1, ..., U^{(c)}_m) converges weakly to (Z_1, ..., Z_m). Then since both sets of
increments are independent, it suffices to show that U^{(c)}_k converges weakly
to Z_k for each k. But
$$U^{(c)}_k = c^{-1/2}\sum_{n=[ct_{k-1}]+1}^{[ct_k]} Y_n
\sim \big(c^{-1/2}N_k(c)^{1/2}\big)N_k(c)^{-1/2}\big(Y_1 + \dots + Y_{N_k(c)}\big),$$
where ∼ denotes identity of distribution and N_k(c) = [ct_k] - [ct_{k-1}]. By
the central limit theorem N_k(c)^{-1/2}(Y_1 + ... + Y_{N_k(c)}) converges weakly
to (t_k - t_{k-1})^{-1/2}Z_k, and (c^{-1/2}N_k(c)^{1/2}) → (t_k - t_{k-1})^{1/2}. Hence U^{(c)}_k
converges weakly to Z_k, as required. □
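A quick numerical illustration of Theorem 4.4.3 (our own sketch, not part of the text) compares E[f(X^{(c)}_1)] for a rescaled simple ±1 random walk with the Gaussian limit, for the bounded continuous test function f(x) = cos x; for this walk σ² = 1, and E[cos(B_1)] = e^{-1/2} is known in closed form.

```python
import numpy as np

rng = np.random.default_rng(1)

def rescaled_walk_marginal(c, samples=50_000):
    """Sample X^{(c)}_1 = c^{-1/2} X_c for a simple +/-1 random walk."""
    steps = rng.choice([-1.0, 1.0], size=(samples, int(c)))
    return steps.sum(axis=1) / np.sqrt(c)

for c in [4, 16, 64, 256]:
    x = rescaled_walk_marginal(c)
    print(c, np.cos(x).mean(), np.exp(-0.5))   # should approach E[cos(B_1)] = e^{-1/2}
```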
To summarize the last two results, we have shown, using special proper-
ties of the Gaussian distribution, that there is a continuous process (B_t)_{t≥0}
with stationary independent increments and such that B_t is Gaussian of
mean 0 and variance t, for each t ≥ 0. That was Wiener's theorem. Then,
using the central limit theorem applied to the increments of a rescaled ran-
dom walk, we established a sort of convergence to Brownian motion. There
now follows a series of related remarks.
Note the similarity to the definition of a Poisson process as a right-
continuous integer-valued process (X_t)_{t≥0} starting from 0, having station-
ary independent increments and such that X_t is Poisson of parameter λt
for each t ≥ 0.
Given d independent Brownian motions (B^1_t)_{t≥0}, ..., (B^d_t)_{t≥0}, let us con-
sider the ℝ^d-valued process B_t = (B^1_t, ..., B^d_t). We call (B_t)_{t≥0} a Brownian
motion in ℝ^d. There is a multidimensional version of the central limit the-
orem which leads to a multidimensional version of Theorem 4.4.3, with no
essential change in the proof. Thus if (X_n)_{n≥0} is a random walk in ℝ^d with
steps of mean 0 and covariance matrix
$$V_{ij} = \mathbb{E}(Y^i_1 Y^j_1),$$
and if V is finite, then for all bounded continuous functions f : (ℝ^d)^m → ℝ,
as c → ∞ we have
$$\mathbb{E}\big[f(X^{(c)}_{t_1}, \dots, X^{(c)}_{t_m})\big] \to \mathbb{E}\big[f(V^{1/2}B_{t_1}, \dots, V^{1/2}B_{t_m})\big].$$
Here are two examples. We might take (X_n)_{n≥0} to be the simple symmetric
random walk in ℤ³, then V = ⅓I. Alternatively, we might take the compo-
nents of (X_n)_{n≥0} to be three independent simple symmetric random walks
in ℤ, in which case V = I. Although these are different random walks, once
the difference in variance is taken out, the result shows that in the scaling
limit they behave asymptotically the same. More generally, given a random
walk with a complicated step distribution, it is useful to know that on large
scales all one needs to calculate is the variance (or covariance matrix). All
other aspects of the step distribution become irrelevant as c → ∞.
The scaling used in Theorem 4.4.3 suggests the following scaling invari-
ance property of Brownian motion (B_t)_{t≥0}, which is also easy to check from
the definition. For any c > 0 the process (B^{(c)}_t)_{t≥0} defined by
$$B^{(c)}_t = c^{-1/2}B_{ct}$$
is a Brownian motion. Thus Brownian motion appears as a fixed point of the
scaling transformation, which attracts all other finite variance symmetric
random walks as c → ∞.
The sense in which we have shown that (X^{(c)}_t)_{t≥0} converges to Brownian
motion is very weak, and one can with effort prove stronger forms of con-
vergence. However, what we have proved is strong enough to ensure that
(X^{(c)}_t)_{t≥0} does not converge, in the same sense, to anything else.
The discussion to this point has not really been about the Markov prop-
erty, but rather about processes with independent increments. To remedy
this we must first define Brownian motion starting from x: this is simply
any process (B_t)_{t≥0} such that B_0 = x and (B_t - B_0)_{t≥0} is a Brownian mo-
tion (starting from 0). As a limit of Markov chains it is natural to look in
Brownian motion for the structure of a Markov process. By analogy with
continuous-time Markov chains we look for a transition semigroup (P_t)_{t≥0}
and a generator G. For any bounded measurable function f : ℝ^d → ℝ we
have
$$\mathbb{E}_x[f(B_t)] = \mathbb{E}_0[f(x + B_t)] = \int_{\mathbb{R}^d} f(x + y)\phi_t(y_1)\cdots\phi_t(y_d)\,dy_1\cdots dy_d
= \int_{\mathbb{R}^d} p(t, x, y)f(y)\,dy,$$
where
$$p(t, x, y) = (2\pi t)^{-d/2}\exp\{-|y - x|^2/2t\}.$$
This is the transition density for Brownian motion and the transition semi-
group is given by
$$(P_t f)(x) = \int_{\mathbb{R}^d} p(t, x, y)f(y)\,dy = \mathbb{E}_x[f(B_t)].$$
To check the semigroup property P_sP_t = P_{s+t} we note that
$$\mathbb{E}_x[f(B_{s+t})] = \mathbb{E}_x\big[f(B_s + (B_{s+t} - B_s))\big] = \mathbb{E}_x[(P_t f)(B_s)] = (P_sP_t f)(x),$$
where we first took the expectation over the independent increment
B_{s+t} - B_s. For t > 0 it is easy to check that
$$\frac{\partial}{\partial t}p(t, x, y) = \tfrac{1}{2}\Delta p(t, x, y),$$
where
$$\Delta = \frac{\partial^2}{\partial x_1^2} + \dots + \frac{\partial^2}{\partial x_d^2}.$$
Hence, if f has two bounded derivatives, we have
$$\frac{\partial}{\partial t}(P_t f)(x) = \int_{\mathbb{R}^d}\frac{\partial}{\partial t}p(t, x, y)f(y)\,dy
= \int_{\mathbb{R}^d}\tfrac{1}{2}\Delta_x p(t, x, y)f(y)\,dy
= \int_{\mathbb{R}^d}p(t, x, y)\big(\tfrac{1}{2}\Delta f\big)(y)\,dy = \big(P_t\tfrac{1}{2}\Delta f\big)(x) \to \tfrac{1}{2}\Delta f(x)$$
as t ↓ 0. This suggests, by analogy with continuous-time chains, that the
generator, a term we have not defined precisely, should be given by
$$G = \tfrac{1}{2}\Delta.$$
Where formerly we considered vectors (f_i : i ∈ I), now there are functions
f : ℝ^d → ℝ, required to have various degrees of local regularity, such as
measurability and differentiability. Where formerly we considered matrices
P_t and Q, now we have linear operators on functions: P_t is an integral
operator, G is a differential operator.
We would like to explain the appearance of the Laplacian by refer-
ence to the random walk approximation. Denote by (X_n)_{n≥0} the simple
symmetric random walk in ℤ^d and consider for N = 1, 2, ... the rescaled
process
$$X^{(N)}_t = N^{-1/2}X_{Nt}, \qquad t = 0, 1/N, 2/N, \dots.$$
For a bounded continuous function f : ℝ^d → ℝ, set
$$(P^{(N)}_t f)(x) = \mathbb{E}_x\big[f(X^{(N)}_t)\big], \qquad x \in N^{-1/2}\mathbb{Z}^d.$$
The closest thing we have to a derivative in t at 0 for (P^{(N)}_t)_{t=0,1/N,2/N,\dots}
is (taking d = 1 to keep the notation simple)
$$N\big((P^{(N)}_{1/N}f) - f\big)(x) = N\,\mathbb{E}_x\big[f(X^{(N)}_{1/N}) - f(X^{(N)}_0)\big]
= N\,\mathbb{E}_{N^{1/2}x}\big[f(N^{-1/2}X_1) - f(N^{-1/2}X_0)\big]
= (N/2)\big\{f(x - N^{-1/2}) - 2f(x) + f(x + N^{-1/2})\big\}.$$
If we assume that f has two bounded derivatives then, by Taylor's theorem,
as N → ∞,
$$f(x - N^{-1/2}) - 2f(x) + f(x + N^{-1/2}) = N^{-1}\big(f''(x) + o(1)\big),$$
so
$$N\big(P^{(N)}_{1/N}f - f\big)(x) \to \tfrac{1}{2}f''(x) = \tfrac{1}{2}\Delta f(x).$$
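This limit is easy to check numerically (our own one-dimensional sketch): for f(x) = cos x the scaled second difference (N/2){f(x - N^{-1/2}) - 2f(x) + f(x + N^{-1/2})} should approach ½f''(x) = -½cos x as N grows.

```python
import math

def scaled_second_difference(f, x, N):
    """N/2 times the second difference of f at x over the grid of spacing N**-0.5."""
    h = N ** -0.5
    return (N / 2) * (f(x - h) - 2 * f(x) + f(x + h))

f, x = math.cos, 0.7
for N in [10, 100, 1000, 10000]:
    print(N, scaled_second_difference(f, x, N), -0.5 * math.cos(x))
```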
We finish by stating some results about Brownian motion which empha-
sise how much of the structure of Markov chains carries over. You will notice
some weasel words creeping in, such as measurable, continuous and differ-
entiable. These are various sorts of local regularity for functions defined on
the state-space jRd. They did not appear for Markov chains because a dis-
crete state-space has no local structure. You might correctly guess that the
proofs would require additional real analysis, relative to the corresponding
results for chains, and a proper measure-theoretic basis for the probabil-
ity. But, this aside, the main ideas are very similar. For further details
see, for example, Probability Theory - an analytic view by D. W. Stroock
(Cambridge University Press, 1993), or Diffusions, Markov Processes and
Martingales, Volume 1: Foundations by L. C. G. Rogers and David Williams
(Wiley, Chichester, 2nd edition 1994).
First, here is a result on recurrence and transience.
Theorem 4.4.4. Let (B_t)_{t≥0} be a Brownian motion in ℝ^d.
(i) If d = 1, then
$$\mathbb{P}(\{t \ge 0 : B_t = 0\}\ \text{is unbounded}) = 1.$$
(ii) If d = 2, then
$$\mathbb{P}(B_t = 0\ \text{for some}\ t > 0) = 0$$
but, for any ε > 0,
$$\mathbb{P}(\{t \ge 0 : |B_t| < \varepsilon\}\ \text{is unbounded}) = 1.$$
(iii) If d = 3, then
$$\mathbb{P}(|B_t| \to \infty\ \text{as}\ t \to \infty) = 1.$$
It is natural to compare this result with the facts proved in Section 1.6,
that in Z and Z2 the simple symmetric random walk is recurrent, whereas
in Z3 it is transient. The results correspond exactly in dimensions one and
three. In dimension two we see the fact that for continuous state-space
it makes a difference to demand returns to a point or to arbitrarily small
neighbourhoods of a point. If we accept this latter notion of recurrence the
correspondence extends to dimension two.
The invariant measure for Brownian motion is Lebesgue measure dx.
This has infinite total mass so in dimensions one and two Brownian motion
is only null recurrent. So that we can state some results for the positive
recurrent case, we shall consider Brownian motion in ℝ^d projected onto the
torus 𝕋^d = ℝ^d/ℤ^d. In dimension one this just means wrapping the line
round a circle of circumference 1. The invariant measure remains Lebesgue
measure but this now has total mass 1. So the projected process is positive
recurrent and we can expect convergence to equilibrium and ergodic results
corresponding to Theorems 1.8.3 and 1.10.2.
Theorem 4.4.5. Let (B_t)_{t≥0} be a Brownian motion in ℝ^d and let
f : ℝ^d → ℝ be a continuous periodic function, so that
$$f(x + z) = f(x) \quad\text{for all } z \in \mathbb{Z}^d.$$
Then for all x ∈ ℝ^d, as t → ∞, we have
$$\mathbb{E}_x[f(B_t)] \to \bar{f} = \int_{[0,1]^d} f(z)\,dz$$
and, moreover,
$$\mathbb{P}_x\Big(\frac{1}{t}\int_0^t f(B_s)\,ds \to \bar{f}\ \text{as}\ t \to \infty\Big) = 1.$$
The generator ½Δ of Brownian motion in ℝ^d reappears as it should in
the following martingale characterization of Brownian motion.
Theorem 4.4.6. Let (X_t)_{t≥0} be a continuous ℝ^d-valued random pro-
cess. Write (F_t)_{t≥0} for the filtration of (X_t)_{t≥0}. Then the following are
equivalent:
(i) (X_t)_{t≥0} is a Brownian motion;
(ii) for all bounded functions f which are twice differentiable with
bounded second derivative, the following process is a martingale:
$$M_t = f(X_t) - f(X_0) - \int_0^t \tfrac{1}{2}\Delta f(X_s)\,ds.$$
This result obviously corresponds to Theorem 4.1.2. In case you are unsure,
a continuous-time process (M_t)_{t≥0} is a martingale if it is adapted to the
given filtration (F_t)_{t≥0}, if E|M_t| < ∞ for all t, and
$$\mathbb{E}(M_t 1_A) = \mathbb{E}(M_s 1_A)$$
whenever s ≤ t and A ∈ F_s.
We end with a result on the potentials associated with Brownian motion,
corresponding very closely to Theorem 4.2.3 for Markov chains. These
potentials are identical to those appearing in Newton's theory of gravity,
as we remarked in Section 4.2.
Theorem 4.4.7. Let D be an open set in ℝ^d with smooth boundary ∂D.
Let c : D → [0, ∞) be measurable and let f : ∂D → [0, ∞) be continuous.
Set
$$\phi(x) = \mathbb{E}_x\Big[\int_0^T c(B_t)\,dt + f(B_T)1_{T<\infty}\Big],$$
where T is the hitting time of ∂D. Then
(i) φ, if finite, belongs to C²(D) ∩ C(D̄) and satisfies
$$\begin{cases} -\tfrac{1}{2}\Delta\phi = c & \text{in } D\\ \phi = f & \text{on } \partial D; \end{cases} \tag{4.12}$$
(ii) if ψ ∈ C²(D) ∩ C(D̄) satisfies
$$\begin{cases} -\tfrac{1}{2}\Delta\psi \ge c & \text{in } D\\ \psi \ge f & \text{on } \partial D \end{cases}$$
and ψ ≥ 0, then ψ ≥ φ;
(iii) if φ(x) = ∞ for some x, then (4.12) has no finite solution;
(iv) if ℙ_x(T < ∞) = 1 for all x, then (4.12) has at most one bounded
solution in C²(D) ∩ C(D̄).
5
Applications
Applications of Markov chains arise in many different areas. Some have
already appeared to illustrate the theory, from games of chance to the
evolution of populations, from calculating the fair price for a random reward
to calculating the probability that an absent-minded professor is caught
without an umbrella. In a real-world problem involving random processes
you should always look for Markov chains. They are often easy to spot.
Once a Markov chain is identified, there is a qualitative theory which limits
the sorts of behaviour that can occur - we know, for example, that every
state is either recurrent or transient. There are also good computational
methods - for hitting probabilities and expected rewards, and for long-run
behaviour via invariant distributions.
In this chapter we shall look at five areas of application in detail: bi-
ological models, queueing models, resource management models, Markov
decision processes and Markov chain Monte Carlo. In each case our aim is
to provide an introduction rather than a systematic account or survey of
the field. References to books for further reading are given in each section.
5.1 Markov chains in biology
Randomness is often an appropriate model for systems of high complex-
ity, such as are often found in biology. We have already illustrated some
aspects of the theory by simple models with a biological interpretation.
See Example 1.1.5 (virus), Exercise 1.1.6 (octopus), Example 1.3.4 (birth-
and-death chain) and Exercise 2.5.1 (bacteria). We are now going to give
some more examples where Markov chains have been used to model bio-
logical processes, in the study of population growth, epidemics and genetic
inheritance. It should be recognised from the start that these models are
simplified and somewhat stylized in order to make them mathematically
tractable. Nevertheless, by providing quantitative understanding of various
phenomena they can provide a useful contribution to science.
Example 5.1.1 (Branching processes)
The original branching process was considered by Galton and Watson in the
1870s while seeking a quantitative explanation for the phenomenon of the
disappearance of family names, even in a growing population. Under the
assumption that each male in a given family had a probability Pk of having
k sons, they wished to determine the probability that after n generations
an individual had no male descendants. The solution to this problem is
explained below.
The basic branching process model has many applications to problems
of population growth, and also to the study of chain reactions in chemistry
and nuclear fission. Suppose at time n = 0 there is one individual, who dies
and is replaced at time n = 1 by a random number of offspring N. Suppose,
next, that these offspring also die and are themselves replaced at time n = 2,
each independently, by a random number of further offspring, having the
same distribution as N, and so on. We can construct the process by taking
for each n E N a sequence of independent random variables (N;:)kEN, each
with the same distribution as N, by setting X_0 = 1 and defining inductively,
for n ≥ 1,
$$X_n = N^n_1 + \dots + N^n_{X_{n-1}}.$$
Then X_n gives the size of the population in the nth generation. The process
(X_n)_{n≥0} is a Markov chain on I = {0, 1, 2, ...} with absorbing state 0. The
case where P(N = 1) = 1 is trivial so we exclude it. We have
$$\mathbb{P}(X_n = 0 \mid X_{n-1} = i) = \mathbb{P}(N = 0)^i,$$
so if P(N = 0) > 0 then i leads to 0, and every state i ≥ 1 is transient. If
P(N = 0) = 0 then P(N ≥ 2) > 0, so for i ≥ 1, i leads to j for some j > i,
and j does not lead to i, hence i is transient in any case. We deduce that
with probability 1 either X_n = 0 for some n or X_n → ∞ as n → ∞.
Further information on (X_n)_{n≥0} is obtained by exploiting the branching
structure. Consider the probability generating function
$$\phi(t) = \mathbb{E}(t^N) = \sum_{k=0}^{\infty} t^k\mathbb{P}(N = k),$$
defined for 0 ≤ t ≤ 1. Conditional on X_{n-1} = k we have
$$X_n = N^n_1 + \dots + N^n_k,$$
so
$$\mathbb{E}\big(t^{X_n}\mid X_{n-1} = k\big) = \phi(t)^k,$$
and so
$$\mathbb{E}(t^{X_n}) = \sum_{k=0}^{\infty}\mathbb{E}\big(t^{X_n}\mid X_{n-1} = k\big)\mathbb{P}(X_{n-1} = k) = \mathbb{E}\big(\phi(t)^{X_{n-1}}\big).$$
Hence, by induction, we find that E(t^{X_n}) = φ^{(n)}(t), where φ^{(n)} is the n-fold
composition φ ∘ ⋯ ∘ φ. In principle, this gives the entire distribution of
X_n, though φ^{(n)} may be a rather complicated function. Some quantities
are easily deduced: we have
$$\mathbb{E}(X_n) = \lim_{t\uparrow 1}\frac{d}{dt}\mathbb{E}(t^{X_n}) = \lim_{t\uparrow 1}\frac{d}{dt}\phi^{(n)}(t) = \Big(\lim_{t\uparrow 1}\phi'(t)\Big)^n = \mu^n,$$
where μ = E(N); also
$$\mathbb{P}(X_n = 0) = \phi^{(n)}(0),$$
so, since 0 is absorbing, we have
$$q = \mathbb{P}(X_n = 0\ \text{for some}\ n) = \lim_{n\to\infty}\phi^{(n)}(0).$$
Now φ(t) is a convex function with φ(1) = 1. Let us set r = inf{t ∈ [0,1] :
φ(t) = t}; then φ(r) = r by continuity. Since φ is increasing and 0 ≤ r, we
have φ(0) ≤ r and, by induction, φ^{(n)}(0) ≤ r for all n, hence q ≤ r. On the
other hand
$$q = \lim_{n\to\infty}\phi^{(n+1)}(0) = \lim_{n\to\infty}\phi\big(\phi^{(n)}(0)\big) = \phi(q),$$
so also q ≥ r. Hence q = r. If φ'(1) > 1 then we must have q < 1, and if
φ'(1) ≤ 1 then, since either φ'' = 0 or φ'' > 0 everywhere in [0,1), we must
have q = 1. We have shown that the population survives with positive
probability if and only if μ > 1, where μ is the mean of the offspring
distribution.
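As a small numerical illustration (ours, not the book's), the extinction probability q = lim_n φ^{(n)}(0) can be computed by simply iterating the generating function from 0; here for a Poisson(μ) offspring distribution, whose generating function φ(t) = e^{μ(t-1)} is known in closed form.

```python
import math

def extinction_probability(phi, tol=1e-12, max_iter=100_000):
    """Iterate q <- phi(q) from q = 0; the limit is the extinction probability."""
    q = 0.0
    for _ in range(max_iter):
        q_new = phi(q)
        if abs(q_new - q) < tol:
            return q_new
        q = q_new
    return q

for mu in [0.8, 1.0, 1.5, 2.0]:
    phi = lambda t, mu=mu: math.exp(mu * (t - 1))   # Poisson(mu) offspring distribution
    print(mu, extinction_probability(phi))           # q = 1 exactly when mu <= 1
```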
There is a nice connection between branching processes and random
walks. Suppose that in each generation we replace individuals by their off-
spring one at a time, so if X_n = k then it takes k steps to obtain X_{n+1}.
The population size then performs a random walk (Y_m)_{m≥0} with step dis-
tribution N - 1. Define stopping times T_0 = 0 and, for n ≥ 0,
$$T_{n+1} = T_n + X_n.$$
Observe that X_n = Y_{T_n} for all n, and since (Y_m)_{m≥0} jumps down by at
most 1 each time, (X_n)_{n≥0} hits 0 if and only if (Y_m)_{m≥0} hits 0. Moreover,
we can use the strong Markov property and a variation of the argument of
Example 1.4.3 to see that, if
$$q_i = \mathbb{P}(Y_m = 0\ \text{for some}\ m \mid Y_0 = i),$$
then q_i = q_1^i for all i and so
$$q_1 = \mathbb{P}(N = 0) + \sum_{i=1}^{\infty} q_1^i\mathbb{P}(N = i) = \phi(q_1).$$
Now each non-negative solution of this equation provides a non-negative
solution of the hitting probability equations, so we deduce that q_1 is the
smallest non-negative root of the equation q = φ(q), in agreement with the
generating function approach.
The classic work in this area is The Theory of Branching Processes by
T. E. Harris (Dover, New York, 1989).
Example 5.1.2 (Epidemics)
Many infectious diseases persist at a low intensity in a population for long
periods. Occasionally a large number of cases arise together and form an
epidemic. This behaviour is to some extent explained by the observation
that the presence of a large number of infected individuals increases the
risk to the rest of the population. The decline of an epidemic can also be
explained by the eventual decline in the number of individuals susceptible
to infection, as infectives either die or recover and are then resistant to fur-
ther infection. However, these naive explanations leave unanswered many
quantitative questions that are important in predicting the behaviour of
epidemics.
In an idealized population we might suppose that all pairs of individu-
als make contact randomly and independently at a common rate, whether
infected or not. For an idealized disease we might suppose that on contact
with an infective, individuals themselves become infective and remain so for
an exponential random time, after which they either die or recover. These
two possibilities have identical consequences for the progress of the epi-
demic. This idealized model is obviously unrealistic, but it is the simplest
mathematical model to incorporate the basic features of an epidemic.
We denote the number of susceptibles by S_t and the number of infectives
by I_t. In the idealized model, X_t = (S_t, I_t) performs a Markov chain on
(ℤ^+)² with transition rates
$$q_{(s,i),(s-1,i+1)} = \lambda si, \qquad q_{(s,i),(s,i-1)} = \mu i$$
for some λ, μ ∈ (0, ∞). Since S_t + I_t does not increase, we effectively
have a finite state-space. The states (s, 0) for s ∈ ℤ^+ are all absorbing
and all the other states are transient; indeed all the communicating classes
are singletons. The epidemic must therefore eventually die out, and the
absorption probabilities give the distribution of the number of susceptibles
who escape infection. We can calculate these probabilities explicitly when
S_0 + I_0 is small.
Of greater concern is the behaviour of an epidemic in a large population,
of size N, say. Let us consider the proportions s^N_t = S_t/N and i^N_t = I_t/N
and suppose that λ = ν/N, where ν is independent of N. Consider now a
sequence of models as N → ∞ and choose s^N_0 → s_0 and i^N_0 → i_0. It can
be shown that as N → ∞ the process (s^N_t, i^N_t) converges to the solution
(s_t, i_t) of the differential equations
$$(d/dt)s_t = -\nu s_t i_t,$$
$$(d/dt)i_t = \nu s_t i_t - \mu i_t,$$
starting from (s_0, i_0). Here convergence means that E[|(s^N_t, i^N_t) - (s_t, i_t)|] →
0 for all t ≥ 0. We will not prove this result, but will give an example of
another easier asymptotic calculation.
Consider the case where S_0 = N - 1, I_0 = 1, λ = 1/N and μ = 0. This has
the following interpretation: a rumour is begun by a single individual who
the following interpretation: a rumour is begun by a single individual who
tells it to everyone she meets; they in turn pass the rumour on to everyone
they meet. We assume that each individual meets another randomly at
the jump times of a Poisson process of rate 1. How long does it take until
everyone knows the rumour? If i people know the rumour, then N - i do
not, and the rate at which the rumour is passed on is
qi = i(N - i)/N.
The expected time until everyone knows the rumour is then
$$\sum_{i=1}^{N-1}\frac{1}{q_i} = \sum_{i=1}^{N-1}\frac{N}{i(N-i)} = \sum_{i=1}^{N-1}\Big(\frac{1}{i} + \frac{1}{N-i}\Big) = 2\sum_{i=1}^{N-1}\frac{1}{i} \sim 2\log N$$
as N → ∞. This is not a limit as above but, rather, an asymptotic equiva-
lence. The fact that the expected time grows with N is related to the fact
that we do not scale I_0 with N: when the rumour is known by very few or
by almost all, the proportion of 'infectives' changes very slowly.
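A quick check of this asymptotic equivalence (our own sketch) just evaluates the exact sum and compares it with 2 log N:

```python
import math

def expected_rumour_time(N):
    """Exact expected time for the rumour to reach all N individuals."""
    return sum(N / (i * (N - i)) for i in range(1, N))

for N in [10, 100, 1000, 10000]:
    print(N, expected_rumour_time(N), 2 * math.log(N))
```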
The final two examples come from population genetics. They represent
an attempt to understand quantitatively the consequences of randomness
in genetic inheritance. The randomness here might derive from the choice
of reproducing individual, in sexual reproduction the choice of partner, or
the choice of parents' alleles retained by their offspring. (The word gene
refers to a particular chromosomal locus; the varieties of genetic material
that can be present at such a locus are known as alleles.) This sort of
study was motivated in the first place by a desire to find mathematical
models of natural selection, and thereby to discriminate between various
competing accounts of the process of evolution. More recently, as scientists
have gained access to the genetic material itself, many more questions of a
statistical nature have arisen. We emphasise that we present only the very
simplest examples in a rich theory, for which we refer the interested reader
to Mathematical Population Genetics by W. J. Ewens (Springer, Berlin,
1979).
Example 5.1.3 (Wright-Fisher model)
This is the discrete-time Markov chain on {0, 1, ..., m} with transition
probabilities
$$p_{ij} = \binom{m}{j}\Big(\frac{i}{m}\Big)^j\Big(1 - \frac{i}{m}\Big)^{m-j}.$$
In each generation there are m alleles, some of type A and some of type a.
The types of alleles in generation n+1 are found by choosing randomly (with
replacement) from the types in generation n. If X_n denotes the number of
alleles of type A in generation n, then (X_n)_{n≥0} is a Markov chain with the
above transition probabilities.
This can be viewed as a model of inheritance for a particular gene with
two alleles A and a. We suppose that each individual has two genes, so the
possibilities are AA, Aa and aa. Let us take m to be even with m = 2k.
Suppose that individuals in the next generation are obtained by mating
randomly chosen individuals from the current generation and that offspring
inherit one allele from each parent. We have to allow that both parents
may be the same, and in particular make no requirement that parents be
of opposite sexes. Then if generation n is, for example,
AA aA AA AA aa,
then each gene in generation n + 1 is A with probability 7/10 and a with
probability 3/10, all independent. We might, for example, get
aa aA Aa AA AA.
The structure of pairs of genes is irrelevant to the Markov chain (X_n)_{n≥0},
which simply counts the number of alleles of type A.
The communicating classes of (X_n)_{n≥0} are {0}, {1, ..., m - 1}, {m}.
States 0 and m are absorbing and {1, ..., m - 1} is transient. The hit-
ting probabilities for state m (pure AA) are given by
$$h_i = i/m.$$
This is obvious when one notes that (X_n)_{n≥0} is a martingale; alternatively
one can check that
$$h_i = \sum_{j=0}^{m} p_{ij}h_j.$$
According to this model, genetic diversity eventually disappears. It is
known, however, that, for p ∈ (0,1), as m → ∞,
$$\mathbb{E}_{pm}(T) \sim -2m\{(1 - p)\log(1 - p) + p\log p\},$$
where T is the hitting time of {0, m}, so in a large population diversity does
not disappear quickly.
Some modifications are possible which model other aspects of genetic
theory. Firstly, it may be that the three genetic types AA, Aa, aa have a
relative selective advantage given by α, β, γ > 0 respectively. This means
that the probability of choosing allele A when X_n = i is given by
$$\psi_i = \frac{\alpha(i/m)^2 + \tfrac{1}{2}\beta i(m-i)/m^2}{\alpha(i/m)^2 + \beta i(m-i)/m^2 + \gamma((m-i)/m)^2}$$
and the transition probabilities are
$$p_{ij} = \binom{m}{j}\psi_i^j(1 - \psi_i)^{m-j}.$$
Secondly, we may allow genes to mutate. Suppose A mutates to a with
probability u and a mutates to A with probability v. Then the probability
of choosing A when X_n = i is given by
$$\phi_i = \{i(1 - u) + (m - i)v\}/m$$
and
$$p_{ij} = \binom{m}{j}\phi_i^j(1 - \phi_i)^{m-j}.$$
With u, v > 0, the states 0 and m are no longer absorbing; in fact the chain
is irreducible, so attention shifts from hitting probabilities to the invariant
distribution π. There is an exact calculation for the mean of π: we have
$$\mu = \sum_{i=0}^{m} i\pi_i = \mathbb{E}_\pi(X_1) = \sum_{i=0}^{m}\pi_i\mathbb{E}_i(X_1)
= \sum_{i=0}^{m} m\pi_i\phi_i = \sum_{i=0}^{m}\{i(1-u) + (m-i)v\}\pi_i = (1-u)\mu + mv - v\mu,$$
so that
$$\mu = mv/(u + v).$$
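The identity μ = mv/(u + v) is easy to check by simulation (a sketch of ours, using NumPy; the parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def wright_fisher_mean(m, u, v, steps=200_000):
    """Long-run average of X_n for the Wright-Fisher chain with mutation."""
    x, total = m // 2, 0
    for _ in range(steps):
        phi = (x * (1 - u) + (m - x) * v) / m   # probability of choosing allele A
        x = rng.binomial(m, phi)
        total += x
    return total / steps

m, u, v = 20, 0.02, 0.03
print(wright_fisher_mean(m, u, v), m * v / (u + v))   # both should be close to 12
```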
Example 5.1.4 (Moran model)
The Moran model is the birth-and-death chain on {0, 1, ..., m} with tran-
sition probabilities
$$p_{i,i-1} = i(m-i)/m^2, \qquad p_{ii} = \big(i^2 + (m-i)^2\big)/m^2, \qquad p_{i,i+1} = i(m-i)/m^2.$$
Here is the genetic interpretation: a population consists of individuals of
two types, a and A; we choose randomly one individual from the population
at time n, and add a new individual of the same type; then we choose, again
randomly, one individual from the population at time n and remove it; so
we obtain the population at time n +1. The same individual may be chosen
each time, both to give birth and to die, in which case there is no change
in the make-up of the population. Now, if X_n denotes the number of type
A individuals in the population at time n, then (X_n)_{n≥0} is a Markov chain
with transition matrix P.
There are some obvious differences from the Wright-Fisher model: firstly,
the Moran model cannot be interpreted in terms of a species where genes
come in pairs, or where individuals have more than one parent; secondly in
the Moran model we only change one individual at a time, not the whole
population. However, the basic Markov chain structure is the same, with
communicating classes {0}, {1, ..., m - 1}, {m}, absorbing states 0 and m
and transient class {1, ..., m - 1}. The Moran model is reversible, and,
like the Wright-Fisher model, is a martingale. The hitting probabilities are
given by
$$\mathbb{P}_i(X_n = m\ \text{for some}\ n) = i/m.$$
We can also calculate explicitly the mean time to absorption
$$k_i = \mathbb{E}_i(T),$$
where T is the hitting time of {0, m}. The simplest method is first to fix j
and write down equations for the mean time k^j_i spent in j, starting from i,
before absorption:
$$k^j_i = \delta_{ij} + \big(p_{i,i-1}k^j_{i-1} + p_{ii}k^j_i + p_{i,i+1}k^j_{i+1}\big) \quad\text{for } i = 1, \dots, m-1,$$
$$k^j_0 = k^j_m = 0.$$
Then, for i = 1, ..., m - 1,
$$k^j_i = \tfrac{1}{2}\big(k^j_{i-1} + k^j_{i+1}\big) + \delta_{ij}\,m^2/2j(m-j),$$
so that
$$k^j_i = \begin{cases} (i/j)k^j_j & \text{for } i \le j\\ \big((m-i)/(m-j)\big)k^j_j & \text{for } i \ge j, \end{cases}$$
where k^j_j is determined by
$$\Big(\frac{m-j-1}{m-j} - 2 + \frac{j-1}{j}\Big)k^j_j = -\frac{m^2}{j(m-j)},$$
which gives k^j_j = m. Hence
$$k_i = \sum_{j=1}^{m-1} k^j_i = m\Big\{\sum_{j=1}^{i}\frac{m-i}{m-j} + \sum_{j=i+1}^{m-1}\frac{i}{j}\Big\}.$$
As in the Wright-Fisher model, one is really interested in the case where
m is large, and i = pm for some p ∈ (0,1). Then
$$m^{-2}k_{pm} = (1-p)\sum_{j=1}^{mp}\frac{1}{m-j} + p\sum_{j=mp+1}^{m-1}\frac{1}{j} \to -(1-p)\log(1-p) - p\log p$$
as m → ∞. So, as m → ∞,
$$\mathbb{E}_{pm}(T) \sim -m^2\{(1-p)\log(1-p) + p\log p\}.$$
For the Wright-Fisher model we claimed that
$$\mathbb{E}_{pm}(T) \sim -2m\{(1-p)\log(1-p) + p\log p\},$$
which has the same functional form in p and differs by a factor of m/2.
This factor is partially explained by the fact that the Moran model deals
with one individual at a time, whereas the Wright-Fisher model changes
all m at once.
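The exact formula for k_i and its large-m asymptotic can be compared numerically (a sketch of ours):

```python
import math

def moran_absorption_time(m, i):
    """Exact mean absorption time k_i for the Moran model on {0, ..., m}."""
    return m * (sum((m - i) / (m - j) for j in range(1, i + 1))
                + sum(i / j for j in range(i + 1, m)))

p = 0.3
for m in [20, 200, 2000]:
    i = int(p * m)
    exact = moran_absorption_time(m, i)
    approx = -m ** 2 * ((1 - p) * math.log(1 - p) + p * math.log(p))
    print(m, exact, approx)
```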
Exercises
5.1.1 Consider a branching process with immigration. This is defined, in
the notation of Example 5.1.1, by
$$X_n = N^n_1 + \dots + N^n_{X_{n-1}} + I_n,$$
where (I_n)_{n≥0} is a sequence of independent ℤ^+-valued random variables
with common generating function ψ(t) = E(t^{I_n}). Show that, if X_0 = 1,
then
$$\mathbb{E}(t^{X_n}) = \phi^{(n)}(t)\prod_{k=0}^{n-1}\psi\big(\phi^{(k)}(t)\big).$$
In the case where the number of immigrants in each generation is Poisson
of parameter λ, and where P(N = 0) = 1 - p and P(N = 1) = p, find the
long-run proportion of time during which the population is zero.
5.1.2 A species of plant comes in three genotypes AA, Aa and aa. A
single plant of genotype Aa is crossed with itself, so that the offspring has
genotype AA, Aa or aa with probabilities 1/4, 1/2 and 1/4. How long on
average does it take to achieve a pure strain, that is, AA or aa? Suppose it
is desired to breed an AA plant. What should you do? How many crosses
would your procedure require, on average?
5.1.3 In the Moran model we may introduce a selective bias by making
it twice as likely that a type a individual is chosen to die, as compared
to a type A individual. Thus in a population of size m containing i type
A individuals, the probability that some type A is chosen to die is now
i/(i + 2(m - i)). Suppose we begin with just one type A. What is the
probability that eventually the whole population is of type A?
5.2 Queues and queueing networks
Queues form in many circumstances and it is important to be able to pre-
dict their behaviour. The basic mathematical model for queues runs as
follows: there is a succession of customers wanting service; on arrival each
customer must wait until a server is free, giving priority to earlier arrivals;
it is assumed that the times between arrivals are independent random vari-
ables of the same distribution, and the times taken to serve customers are
also independent random variables, of some other distribution. The main
quantity of interest is the random process (X_t)_{t≥0} recording the number
of customers in the queue at time t. This is always taken to include both
those being served and those waiting to be served.
In cases where inter-arrival times and service times have exponential
distributions, (X_t)_{t≥0} turns out to be a continuous-time Markov chain, so
we can answer many questions about the queue. This is the context of
our first six examples. Some further variations on queues of this type have
already appeared in Exercises 3.4.1, 3.6.3, 3.7.1 and 3.7.2.
If the inter-arrival times only are exponential, an analysis is still pos-
sible, by exploiting the memorylessness of the Poisson process of arrivals,
and a certain discrete-time Markov chain embedded in the queue. This is
explained in the final two examples.
In each example we shall aim to describe some salient features of the
queue in terms of the given data of arrival-time and service-time distribu-
tions. We shall find conditions for the stability of the queue, and in the
stable case find means to compute the equilibrium distribution of queue
length. We shall also look at the random times that customers spend wait-
ing and the length of time that servers are continuously busy.
Example 5.2.1 (M/M/1 queue)
This is the simplest queue of all. The code means: memoryless inter-arrival
times/memoryless service times/one server. Let us suppose that the inter-
arrival times are exponential of parameter λ, and the service times are
exponential of parameter μ. Then the number of customers in the queue
(X_t)_{t≥0} evolves as a Markov chain with the following diagram:
[Diagram: birth-and-death chain on 0, 1, 2, ... with each arrow i → i+1 at rate λ and i+1 → i at rate μ.]
To see this, suppose at time 0 there are i customers in the queue, where
i > 0. Denote by T the time taken to serve the first customer and by A
the time of the next arrival. Then the first jump time J_1 is A ∧ T, which is
exponential of parameter λ + μ, and X_{J_1} = i - 1 if T < A, X_{J_1} = i + 1 if
T > A, which events are independent of J_1, with probabilities μ/(λ+μ) and
λ/(λ+μ) respectively. If we condition on J_1 = T, then A - J_1 is exponential
of parameter λ and independent of J_1: the time already spent waiting for an
arrival is forgotten. Similarly, conditional on J_1 = A, T - J_1 is exponential
of parameter μ and independent of J_1. The case where i = 0 is simpler
as there is no serving going on. Hence, conditional on X_{J_1} = j, (X_t)_{t≥0}
begins afresh from j at time J_1. It follows that (X_t)_{t≥0} is the claimed
Markov chain. This sort of argument should by now be very familiar and
we shall not spell out the details like this in later examples.
The M/M/1 queue thus evolves like a random walk, except that it does
not take jumps below 0. We deduce that if λ > μ then (X_t)_{t≥0} is transient,
that is X_t → ∞ as t → ∞. Thus if λ > μ the queue grows without limit
in the long term. When λ < μ, (X_t)_{t≥0} is positive recurrent with invariant
distribution
$$\pi_i = (1 - \lambda/\mu)(\lambda/\mu)^i.$$
So when λ < μ the average number of customers in the queue in equilibrium
is given by
$$\mathbb{E}_\pi(X_t) = \sum_{i=1}^{\infty}\mathbb{P}_\pi(X_t \ge i) = \sum_{i=1}^{\infty}(\lambda/\mu)^i = \lambda/(\mu - \lambda).$$
Also, the mean time to return to 0 is given by
$$m_0 = 1/(q_0\pi_0) = \mu/\big(\lambda(\mu - \lambda)\big),$$
so the mean length of time that the server is continuously busy is given by
$$m_0 - (1/q_0) = 1/(\mu - \lambda).$$
Another quantity of interest is the mean waiting time for a typical customer,
when λ < μ and the queue is in equilibrium. Conditional on finding a queue
of length i on arrival, this is (i + 1)/μ, so the overall mean waiting time is
$$\sum_{i=0}^{\infty}\pi_i(i + 1)/\mu = \big(\mathbb{E}_\pi(X_t) + 1\big)/\mu = 1/(\mu - \lambda).$$
A rough check is available here as we can calculate in two ways the expected
total time spent in the queue over an interval of length t: either we multiply
the average queue length by t, or we multiply the mean waiting time by the
expected number of customers λt. Either way we get λt/(μ - λ). The first
calculation is exact but we have not fully justified the second.
Thus, once the queue size is identified as a Markov chain, its behaviour
is largely understood. Even in more complicated examples where exact
calculation is limited, once the Markovian character of the queue is noted
we know what sort of features to look for - transience and recurrence,
convergence to equilibrium, long-run averages, and so on.
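These equilibrium formulas are easy to check by simulating the chain directly (a sketch of ours; the values of λ and μ are arbitrary, with λ < μ):

```python
import random

def mm1_mean_queue_length(lam, mu, T=100_000.0, seed=1):
    """Time-average queue length of an M/M/1 queue simulated up to time T."""
    rng = random.Random(seed)
    t, x, area = 0.0, 0, 0.0
    while t < T:
        rate = lam + (mu if x > 0 else 0.0)      # total jump rate from state x
        hold = rng.expovariate(rate)
        area += x * hold
        t += hold
        x += 1 if rng.random() < lam / rate else -1
    return area / t

lam, mu = 1.0, 2.0
print(mm1_mean_queue_length(lam, mu), lam / (mu - lam))   # both near 1.0
```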
Example 5.2.2 (M/M/s queue)
This is a variation on the last example where there is one queue but there
are s servers. Let us assume that the arrival rate is λ and the service
rate by each server is μ. Then if i servers are occupied, the first service is
completed at the minimum of i independent exponential times of parameter
μ. The first service time is therefore exponential of parameter iμ. The total
service rate increases to a maximum sμ when all servers are working. We
emphasise that the queue size includes those customers who are currently
being served. By an argument similar to the preceding example, the queue
size performs a Markov chain with the following diagram:
[Diagram: birth-and-death chain on 0, 1, 2, ..., s, s+1, ... with arrival rate λ between consecutive states and service rates μ, 2μ, ..., sμ, sμ, ... downwards.]
So this time we obtain a birth-and-death chain. It is transient in the
case λ > sμ and otherwise recurrent. To find an invariant measure we look
at the detailed balance equations
$$\pi_{i-1}\lambda = \pi_i(i\wedge s)\mu.$$
Hence
$$\pi_i = \begin{cases} (\lambda/\mu)^i\pi_0/i! & \text{for } i = 0, 1, \dots, s\\ (\lambda/\mu)^i\pi_0/(s!\,s^{i-s}) & \text{for } i = s+1, s+2, \dots. \end{cases}$$
The queue is therefore positive recurrent when λ < sμ. There are two cases
when the invariant distribution has a particularly nice form: when s = 1
we are back to Example 5.2.1 and the invariant distribution is geometric of
parameter λ/μ:
$$\pi_i = (1 - \lambda/\mu)(\lambda/\mu)^i.$$
When s = ∞ we normalize π by taking π_0 = e^{-λ/μ} so that
$$\pi_i = e^{-\lambda/\mu}(\lambda/\mu)^i/i!$$
and the invariant distribution is Poisson of parameter λ/μ.
The number of arrivals by time t is a Poisson process of rate λ. Each
arrival corresponds to an increase in X_t, and each departure to a decrease.
Let us suppose that λ < sμ, so there is an invariant distribution, and
consider the queue in equilibrium. The detailed balance equations hold and
(X_t)_{t≥0} is non-explosive, so by Theorem 3.7.3, for any T > 0, (X_t)_{0≤t≤T}
and (X_{T-t})_{0≤t≤T} have the same law. It follows that, in equilibrium, the
number of departures by time t is also a Poisson process of rate λ. This is
slightly counter-intuitive, as one might imagine that the departure process
runs in fits and starts depending on the number of servers working. Instead,
it turns out that the process of departures, in equilibrium, is just as regular
as the process of arrivals.
Example 5.2.3 (Telephone exchange)
A variation on the M/M/s queue is to turn away customers who cannot
be served immediately. This might serve as a simple model for a telephone
exchange, where the maximum number of calls that can be connected at
once is s: when the exchange is full, additional calls are lost. The maximum
queue size or buffer size is s and we get the following modified Markov chain
diagram:
[Diagram: birth-and-death chain on 0, 1, ..., s with arrival rate λ between consecutive states and service rates μ, 2μ, ..., sμ downwards.]
We can find the invariant distribution of this finite Markov chain by solving
the detailed balance equations, as in the last example. This time we get a
truncated Poisson distribution
$$\pi_i = \left.\frac{(\lambda/\mu)^i}{i!}\right/\sum_{j=0}^{s}\frac{(\lambda/\mu)^j}{j!}, \qquad i = 0, 1, \dots, s.$$
By the ergodic theorem, the long-run proportion of time that the exchange
is full, and hence the long-run proportion of calls that are lost, is given by
$$\pi_s = \left.\frac{(\lambda/\mu)^s}{s!}\right/\sum_{j=0}^{s}\frac{(\lambda/\mu)^j}{j!}.$$
This is known as Erlang's formula. Compare this example with the bus
maintenance problem in Exercise 3.7.1.
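Erlang's formula is simple to evaluate numerically; the sketch below (ours) computes the lost-call proportion π_s for a given offered load λ/μ, using a standard recursion that is algebraically equivalent to the displayed ratio but avoids large factorials.

```python
def erlang_b(load, s):
    """Blocking probability pi_s for an M/M/s loss system, load = lambda/mu.

    Standard recursion: B(0) = 1, B(k) = load*B(k-1)/(k + load*B(k-1)).
    """
    b = 1.0
    for k in range(1, s + 1):
        b = load * b / (k + load * b)
    return b

print(erlang_b(load=5.0, s=10))   # long-run proportion of calls lost when lambda/mu = 5, s = 10
```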
Example 5.2.4 (Queues in series)
Suppose that customers have two service requirements: they arrive as a
Poisson process of rate λ to be seen first by server A, and then by server
B. For simplicity we shall assume that the service times are independent
exponentials of parameters α and β respectively. What is the average queue
length at B?
Let us denote the queue length at A by (X_t)_{t≥0} and that at B by (Y_t)_{t≥0}.
Then (X_t)_{t≥0} is simply an M/M/1 queue. If λ > α, then (X_t)_{t≥0} is transient
so there is eventually always a queue at A and departures form a Poisson
process of rate α. If λ < α, then, by the reversibility argument of Example
5.2.2, the process of departures from A is Poisson of rate λ, provided queue
A is in equilibrium. The question about queue length at B is not precisely
formulated: it does not specify that the queues should be in equilibrium;
indeed if λ ≥ α there is no equilibrium. Nevertheless, we hope you will agree
to treat arrivals at B as a Poisson process of rate α ∧ λ. Then, by Example
5.2.1, the average queue length at B when α ∧ λ < β, in equilibrium, is
given by (α ∧ λ)/(β - (α ∧ λ)). If, on the other hand, α ∧ λ > β, then (Y_t)_{t≥0}
is transient so the queue at B grows without limit.
There is an equilibrium for both queues if λ < α and λ < β. The
fact that in equilibrium the output from A is Poisson greatly simplifies the
analysis of the two queues in series. For example, the average time taken
by one customer to obtain both services is given by
$$1/(\alpha - \lambda) + 1/(\beta - \lambda).$$
Example 5.2.5 (Closed migration process)
Consider, first, a single particle in a finite state-space I which performs
a Markov chain with irreducible Q-matrix Q. We know there is a unique
invariant distribution π. We may think of the holding times of this chain
as service times, by a single server at each node i ∈ I.
Let us suppose now that there are N particles in the state-space, which
move as before except that they must queue for service at every node. If
we do not care to distinguish between the particles, we can regard this as
a new process (X_t)_{t≥0} with state-space Ī = ℕ^I, where X_t = (n_i : i ∈ I) if
at time t there are n_i particles at state i. In fact, this new process is again
a Markov chain. To describe its Q-matrix Q̄ we define a function δ_i : Ī → Ī
by
$$(\delta_i n)_j = n_j + \delta_{ij}.$$
Thus δ_i adds a particle at i. Then for i ≠ j the non-zero transition rates
are given by
$$\bar{q}(\delta_i n, \delta_j n) = q_{ij}, \qquad n \in \bar{I},\ i, j \in I.$$
Observe that we can write the invariant measure equation πQ = 0 in the
form
$$\pi_i\sum_{j\ne i} q_{ij} = \sum_{j\ne i}\pi_j q_{ji}.$$
For n = (n_i : i ∈ I) we set
$$\bar{\pi}(n) = \prod_{i\in I}\pi_i^{n_i}.$$
Then
$$\bar{\pi}(\delta_i n)\sum_{j\ne i}\bar{q}(\delta_i n, \delta_j n)
= \Big(\prod_{k\in I}\pi_k^{n_k}\Big)\Big(\pi_i\sum_{j\ne i} q_{ij}\Big)
= \Big(\prod_{k\in I}\pi_k^{n_k}\Big)\Big(\sum_{j\ne i}\pi_j q_{ji}\Big)
= \sum_{j\ne i}\bar{\pi}(\delta_j n)\bar{q}(\delta_j n, \delta_i n). \tag{5.1}$$
Given m ∈ Ī we can put m = δ_i n in the last identity whenever m_i ≥ 1. On
summing the resulting equations we obtain
$$\bar{\pi}(m)\sum_{n\ne m}\bar{q}(m, n) = \sum_{n\ne m}\bar{\pi}(n)\bar{q}(n, m),$$
so π̄ is an invariant measure for Q̄. The total number of particles is con-
served so Q̄ has communicating classes
$$C_N = \Big\{n \in \bar{I} : \sum_{i\in I} n_i = N\Big\}, \qquad N = 0, 1, 2, \dots,$$
and the unique invariant distribution for the N-particle system is given by
normalizing π̄ restricted to C_N.
Example 5.2.6 (Open migration process)
We consider a modification of the last example where new customers, or
particles, arrive at each node i E I at rate Ai. We suppose also that
customers receiving service at node i leave the network at rate /-li. Thus
customers enter the network, move from queue to queue according to a
Markov chain and eventually leave, rather like a shopping centre. This
model includes the closed system of the last example and also the queues
186 5. Applications
in series of Example 5.2.4. Let X
t
= (X; : i E I), where X; denotes the
number of customers at node i at time t. Then (Xt)t>o is a Markov chain
in Y= N
I
and the non-zero transition rates are given by
q(n, bin) = Ai, q(bi n , bjn) = qij, q(bjn, n) = /-lj
for n E I and distinct states i, LEI. We shall ass':,me that Ai > 0 for some
i and /-lj > 0 for some j; then Q is irreducible on I.
The system of equations (5.1) for an invariant measure is replaced here
by
$$\pi_i\Big(\mu_i + \sum_{j\ne i} q_{ij}\Big) = \lambda_i + \sum_{j\ne i}\pi_j q_{ji}.$$
This system has a unique solution, with π_i > 0 for all i. This may be seen
by considering the invariant distribution for the extended Q-matrix Q̂ on
I ∪ {∂} with off-diagonal entries
$$\hat{q}_{\partial j} = \lambda_j, \qquad \hat{q}_{ij} = q_{ij}, \qquad \hat{q}_{i\partial} = \mu_i.$$
On summing the system over i ∈ I we find
$$\sum_{i\in I}\pi_i\mu_i = \sum_{i\in I}\lambda_i.$$
As in the last example, for n = (n_i : i ∈ I) we set
$$\bar{\pi}(n) = \prod_{i\in I}\pi_i^{n_i}.$$
Transitions from m ∈ Ī may be divided into those where a new particle is
added and, for each i ∈ I with m_i ≥ 1, those where a particle is moved
from i to somewhere else. We have, for the first sort of transition,
$$\bar{\pi}(m)\sum_{j\in I}\bar{q}(m, \delta_j m) = \bar{\pi}(m)\sum_{j\in I}\lambda_j
= \bar{\pi}(m)\sum_{j\in I}\pi_j\mu_j = \sum_{j\in I}\bar{\pi}(\delta_j m)\bar{q}(\delta_j m, m),$$
and for the second sort
$$\bar{\pi}(\delta_i n)\Big(\bar{q}(\delta_i n, n) + \sum_{j\ne i}\bar{q}(\delta_i n, \delta_j n)\Big)
= \Big(\prod_{k\in I}\pi_k^{n_k}\Big)\Big(\pi_i\Big(\mu_i + \sum_{j\ne i} q_{ij}\Big)\Big)
= \Big(\prod_{k\in I}\pi_k^{n_k}\Big)\Big(\lambda_i + \sum_{j\ne i}\pi_j q_{ji}\Big)
= \bar{\pi}(n)\bar{q}(n, \delta_i n) + \sum_{j\ne i}\bar{\pi}(\delta_j n)\bar{q}(\delta_j n, \delta_i n).$$
On summing these equations we obtain
$$\bar{\pi}(m)\sum_{n\ne m}\bar{q}(m, n) = \sum_{n\ne m}\bar{\pi}(n)\bar{q}(n, m),$$
so π̄ is an invariant measure for Q̄. If π_i < 1 for all i then π̄ has finite total
mass Π_{i∈I}(1 - π_i)^{-1}; otherwise the total mass is infinite. Hence, Q̄ is positive
recurrent if and only if π_i < 1 for all i, and in that case, in equilibrium, the
individual queue lengths (X^i_t : i ∈ I) are independent geometric random
variables with
$$\mathbb{P}(X^i_t = n) = (1 - \pi_i)\pi_i^n, \qquad n = 0, 1, 2, \dots.$$
Example 5.2.7 (M/G/1 queue)
As we argued in Section 2.4, the Poisson process is the natural probabilistic
model for any uncoordinated stream of discrete events. So we are often justi-
fied in assuming that arrivals to a queue form a Poisson process. In the pre-
ceding examples we also assumed an exponential service-time distribution.
This is desirable because it makes the queue size into a continuous-time
Markov chain, but it is obviously inappropriate in many real-world exam-
ples. The service requirements of customers and the duration of telephone
calls have observable distributions which are generally not exponential. A
better model in this case is the M/G/1 queue, where G indicates that the
service-time distribution is general.
We can characterize the distribution of a service time T by its distribu-
tion function
$$F(t) = \mathbb{P}(T \le t),$$
or by its Laplace transform
$$L(w) = \mathbb{E}(e^{-wT}) = \int_0^\infty e^{-wt}\,dF(t).$$
(The integral written here is the Lebesgue-Stieltjes integral: when T has
a density function f(t) we can replace dF(t) by f(t)dt.) Then the mean
service time μ is given by
$$\mu = \mathbb{E}(T) = -L'(0+).$$
To analyse the M/G/1 queue, we consider the queue size X_n immediately
following the nth departure. Then
$$X_{n+1} = X_n + Y_{n+1} - 1 \qquad\text{if } X_n \ge 1, \tag{5.2}$$
where Y_n denotes the number of arrivals during the nth service time. The
case where X_n = 0 is different because then we get an extra arrival before
the (n + 1)th service time begins. By the Markov property of the Poisson
process, Y_1, Y_2, ... are independent and identically distributed, so (X_n)_{n≥0}
is a discrete-time Markov chain. Indeed, except for visits to 0, (X_n)_{n≥0}
behaves as a random walk with jumps Y_n - 1.
Let T_n denote the nth service time. Then, conditional on T_n = t, Y_n is
Poisson of parameter λt. So
$$\mathbb{P}(Y_n = k) = \int_0^\infty e^{-\lambda t}\frac{(\lambda t)^k}{k!}\,dF(t)$$
and, indeed, we can compute the probability generating function
$$A(z) = \mathbb{E}(z^{Y_n}) = \int_0^\infty\mathbb{E}(z^{Y_n}\mid T_n = t)\,dF(t)
= \int_0^\infty e^{-\lambda t(1-z)}\,dF(t) = L\big(\lambda(1 - z)\big).$$
Set ρ = E(Y_n) = λμ. We call ρ the service intensity. Let us suppose
that ρ < 1. We have
$$X_n = X_0 + (Y_1 + \dots + Y_n) - n + Z_n,$$
where Z_n denotes the number of visits of (X_n)_{n≥0} to 0 before time n. So
$$\mathbb{E}(X_n) = \mathbb{E}(X_0) - n(1 - \rho) + \mathbb{E}(Z_n).$$
Take X_0 = 0; then, since X_n ≥ 0, we have for all n
$$0 < 1 - \rho \le \mathbb{E}(Z_n/n).$$
By the ergodic theorem we know that, as n → ∞,
$$\mathbb{E}(Z_n/n) \to 1/m_0,$$
where m_0 is the mean return time to 0. Hence
$$m_0 \le 1/(1 - \rho) < \infty,$$
showing that (X_n)_{n≥0} is positive recurrent.
Suppose now that we start (X_n)_{n≥0} with its equilibrium distribution π.
Set
$$G(z) = \mathbb{E}(z^{X_n}) = \sum_{i=0}^{\infty}\pi_i z^i;$$
then
$$zG(z) = z\,\mathbb{E}(z^{X_{n+1}}) = \mathbb{E}(z^{Y_{n+1}})\Big(\pi_0 z + \sum_{i=1}^{\infty}\pi_i z^i\Big) = A(z)\big(\pi_0 z + G(z) - \pi_0\big),$$
so
$$(A(z) - z)G(z) = \pi_0 A(z)(1 - z). \tag{5.3}$$
By l'Hôpital's rule, as z ↑ 1,
$$(A(z) - z)/(1 - z) \to 1 - A'(1-) = 1 - \rho.$$
Since G(1) = 1 = A(1), we must therefore have π_0 = 1 - ρ, m_0 = 1/(1 - ρ)
and
$$G(z) = (1 - \rho)(1 - z)A(z)/(A(z) - z).$$
Since A is given explicitly in terms of the service-time distribution, we
can now obtain, in principle, the full equilibrium distribution. The fact
that generating functions work well here is due to the additive structure of
(5.2).
To obtain the mean queue length we differentiate (5.3):
$$(A(z) - z)G'(z) + (A'(z) - 1)G(z) = (1 - \rho)\{A'(z)(1 - z) - A(z)\},$$
then substitute for G(z) to obtain
$$G'(z) = \frac{(1-\rho)A'(z)(1-z)}{A(z) - z} - \frac{(1-\rho)A(z)\{(A'(z)-1)(1-z) + A(z) - z\}}{(A(z) - z)^2}.$$
By l'Hôpital's rule:
$$\lim_{z\uparrow 1}\frac{(A'(z)-1)(1-z) + A(z) - z}{(A(z)-z)^2}
= \lim_{z\uparrow 1}\frac{A''(z)(1-z)}{2(A'(z)-1)(A(z)-z)} = \frac{-A''(1-)}{2(1-\rho)^2}.$$
Hence
$$\mathbb{E}(X_n) = G'(1-) = \rho + A''(1-)/2(1-\rho)
= \rho + \lambda^2 L''(0+)/2(1-\rho) = \rho + \lambda^2\mathbb{E}(T^2)/2(1-\rho).$$
In the case of the M/M/1 queue ρ = λ/μ, E(T²) = 2/μ² and E(X_n) =
ρ/(1 - ρ) = λ/(μ - λ), as we found in Example 5.2.1.
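As a check on the mean queue length formula (our own sketch), one can simulate the embedded chain at departure times for a particular service distribution, here deterministic service times of length μ, and compare the long-run average with ρ + λ²E(T²)/2(1 - ρ):

```python
import numpy as np

rng = np.random.default_rng(0)

def mg1_embedded_mean(lam, service, n=200_000):
    """Long-run average of the queue size just after departures in an M/G/1 queue.

    `service` draws one service time; arrivals during a service are Poisson(lam * T).
    """
    x, total = 0, 0
    for _ in range(n):
        y = rng.poisson(lam * service())
        x = (x - 1 if x >= 1 else 0) + y
        total += x
    return total / n

lam, mu = 0.5, 1.0                      # arrival rate, (deterministic) service time
rho, ET2 = lam * mu, mu ** 2
print(mg1_embedded_mean(lam, lambda: mu))
print(rho + lam ** 2 * ET2 / (2 * (1 - rho)))   # formula gives 0.5 + 0.25 = 0.75
```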
We shall use generating functions to study two more quantities of interest
- the queueing time of a typical customer and the busy periods of the server.
Consider the queue (Xn)nEZ in equilibrium. Suppose that the customer
who leaves at time 0 has spent time Q queueing to be served, and time T
being served. Then, conditional on Q + T = t, X
o
is Poisson of parameter
>..t, since the customers in the queue at time 0 are precisely those who
arrived during the queueing and service times of the departing customer.
Hence
$$G(z) = \mathbb{E}\big(e^{-\lambda(Q+T)(1-z)}\big) = M\big(\lambda(1-z)\big)L\big(\lambda(1-z)\big),$$
where M is the Laplace transform
$$M(w) = \mathbb{E}(e^{-wQ}).$$
On substituting for G(z) we obtain the formula
$$M(w) = (1 - \rho)w/\big(w - \lambda(1 - L(w))\big).$$
Differentiation and l'Hôpital's rule, as above, lead to a formula for the mean
queueing time
$$\mathbb{E}(Q) = -M'(0+) = \frac{\lambda L''(0+)}{2\big(1 + \lambda L'(0+)\big)}.$$
We now turn to the busy period S. Consider the Laplace transform
$$B(w) = \mathbb{E}(e^{-wS}).$$
Let T denote the service time of the first customer in the busy period. Then,
conditional on T = t, we have
$$S = t + S_1 + \dots + S_N,$$
where N is the number of customers arriving while the first customer is
served, which is Poisson of parameter λt, and where S_1, S_2, ... are inde-
pendent, with the same distribution as S. Hence
$$B(w) = \int_0^\infty\mathbb{E}(e^{-wS}\mid T = t)\,dF(t)
= \int_0^\infty e^{-wt}e^{-\lambda t(1 - B(w))}\,dF(t) = L\big(w + \lambda(1 - B(w))\big).$$
Although this is an implicit relation for B(w), we can obtain moments by
differentiation:
$$\mathbb{E}(S) = -B'(0+) = -L'(0+)\big(1 - \lambda B'(0+)\big) = \mu\big(1 + \lambda\mathbb{E}(S)\big),$$
so the mean length of the busy period is given by
$$\mathbb{E}(S) = \mu/(1 - \rho).$$
Example 5.2.8 (M/G/∞ queue)
Arrivals at this queue form a Poisson process, of rate λ, say. Service times
are independent, with a common distribution function F(t) = ℙ(T ≤ t).
There are infinitely many servers, so all customers in fact receive service
at once. The analysis here is simpler than in the last example because
customers do not interact. Suppose there are no customers at time O.
What, then, is the distribution of the number X_t being served at time t?
The number N_t of arrivals by time t is a Poisson random variable of
parameter λt. We condition on N_t = n and label the times of the n arrivals
randomly by A_1, ..., A_n. Then, by Theorem 2.4.6, A_1, ..., A_n are inde-
pendent and uniformly distributed on the interval [0, t]. For each of these
customers, service is incomplete at time t with probability
$$p = \frac{1}{t}\int_0^t\mathbb{P}(T > s)\,ds = \frac{1}{t}\int_0^t(1 - F(s))\,ds.$$
Hence, conditional on N_t = n, X_t is binomial of parameters n and p. Then
$$\mathbb{P}(X_t = k) = \sum_{n=0}^{\infty}\mathbb{P}(X_t = k\mid N_t = n)\,\mathbb{P}(N_t = n)
= e^{-\lambda t}(\lambda pt)^k/k!\sum_{n=k}^{\infty}\big(\lambda(1-p)t\big)^{n-k}/(n-k)!
= e^{-\lambda pt}(\lambda pt)^k/k!.$$
So we have shown that X_t is Poisson of parameter
$$\lambda\int_0^t(1 - F(s))\,ds.$$
Recall that
$$\mathbb{E}(T) = \int_0^\infty(1 - F(s))\,ds.$$
Hence if E(T) < ∞, the queue size has a limiting distribution, which is
Poisson of parameter λE(T).
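This Poisson law is easy to confirm by direct simulation (a sketch of ours; uniform service times are an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(2)

def mginf_in_service(lam, t, draw_service, trials=100_000):
    """Sample the number in service at time t for an M/G/infinity queue, empty at time 0."""
    counts = np.empty(trials, dtype=int)
    for k in range(trials):
        n = rng.poisson(lam * t)
        arrivals = rng.uniform(0.0, t, size=n)        # Theorem 2.4.6: uniform given N_t = n
        counts[k] = int(np.sum(arrivals + draw_service(n) > t))
    return counts

lam, t = 3.0, 2.0
counts = mginf_in_service(lam, t, lambda n: rng.uniform(0.0, 1.0, size=n))
# Here E(T) = 0.5 and t exceeds every service time, so the parameter is lam*E(T) = 1.5.
print(counts.mean(), counts.var())                    # mean and variance both near 1.5
```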
For further reading see Reversibility and Stochastic Networks by F. P.
Kelly (Wiley, Chichester, 1978).
5.3 Markov chains in resource management
Management decisions are always subject to risk because of the uncertainty
of future events. If one can quantify that risk, perhaps on the basis of past
experience, then the determination of the best action will rest on the cal-
culation of probabilities, often involving a Markov chain. Here we present
some examples involving the management of a resource: either the stock
in a warehouse, or the water in a reservoir, or the reserves of an insurance
company. See also Exercise 3.7.1 on the maintenance of unreliable equip-
ment. The statistical problem of estimating transition rates for Markov
chains has already been discussed in Section 1.10.
Example 5.3.1 (Restocking a warehouse)
A warehouse has a capacity of c units of stock. In each time period n, there
is a demand for D_n units of stock, which is met if possible. We denote
the residual stock at the end of period n by X_n. The warehouse manager
restocks to capacity for the beginning of period n + 1 whenever X_n ≤ m,
for some threshold m. Thus (X_n)_{n≥0} satisfies
$$X_{n+1} = \begin{cases} (c - D_{n+1})^+ & \text{if } X_n \le m\\ (X_n - D_{n+1})^+ & \text{if } m < X_n \le c. \end{cases}$$
Let us assume that D_1, D_2, ... are independent and identically distributed;
then (X_n)_{n≥0} is a Markov chain, and, excepting some peculiar demand
structures, is irreducible on {0, 1, ..., c}. Hence (X_n)_{n≥0} has a unique
invariant distribution π which determines the long-run proportion of time
in each state. Given that X_n = i, the expected unmet demand in period
n + 1 is given by
$$u_i = \begin{cases} \mathbb{E}\big((D - c)^+\big) & \text{if } i \le m\\ \mathbb{E}\big((D - i)^+\big) & \text{if } m < i \le c. \end{cases}$$
Hence the long-run proportion of demand that is unmet is
$$u(m) = \sum_{i=0}^{c}\pi_i u_i.$$
The long-run frequency of restocking is given by
$$r(m) = \sum_{i=0}^{m}\pi_i.$$
Now as m increases, u(m) decreases and r(m) increases. The warehouse
manager may want to compute these quantities in order to optimize the
long-run cost
ar(m) +bu(m)
where a is the cost of restocking and b is the profit per unit.
There is no general formula for π, but once the distribution of the demand
is known, it is a relatively simple matter to write down the (c + 1) × (c + 1)
transition matrix P for (X_n)_{n≥0} and solve πP = π subject to Σ_i π_i = 1.
We shall discuss in detail a special case where the calculations work out
nicely.
Suppose that the capacity c = 3, so possible threshold values are m =
0,1,2. Suppose that the profit per unit b = 1, and that the demand satisfies
$$\mathbb{P}(D \ge i) = 2^{-i} \quad\text{for } i = 0, 1, 2, \dots.$$
Then
$$\mathbb{E}\big((D - i)^+\big) = \sum_{k=1}^{\infty}\mathbb{P}\big((D - i)^+ \ge k\big) = \sum_{k=1}^{\infty}\mathbb{P}(D \ge i + k) = 2^{-i}.$$
The transition matrices for m = 0, 1, 2 are given, respectively, by
$$\begin{pmatrix} 1/8 & 1/8 & 1/4 & 1/2\\ 1/2 & 1/2 & 0 & 0\\ 1/4 & 1/4 & 1/2 & 0\\ 1/8 & 1/8 & 1/4 & 1/2 \end{pmatrix},\qquad
\begin{pmatrix} 1/8 & 1/8 & 1/4 & 1/2\\ 1/8 & 1/8 & 1/4 & 1/2\\ 1/4 & 1/4 & 1/2 & 0\\ 1/8 & 1/8 & 1/4 & 1/2 \end{pmatrix},\qquad
\begin{pmatrix} 1/8 & 1/8 & 1/4 & 1/2\\ 1/8 & 1/8 & 1/4 & 1/2\\ 1/8 & 1/8 & 1/4 & 1/2\\ 1/8 & 1/8 & 1/4 & 1/2 \end{pmatrix}$$
with invariant distributions
(1/4,1/4,1/4,1/4), (1/6,1/6,1/3,1/3), (1/8,1/8,1/4,1/2).
Hence
u(O) = 1/4, u(l) = 1/6, u(2) = 1/8
194
and
5. Applications
r(O) = 1/4, r(l) = 1/3, r(2) = 1/2.
Therefore, to minimize the long-run cost ar(m) +u(m) we should take
{
2 if a ~ 1/4
m = 1 if 1/4 < a ~ 1
o if 1 < a.
Example 5.3.2 (Reservoir model - discrete time)
We are concerned here with a storage facility, for example a reservoir, of
finite capacity c. In each time period n, An units of resource are available
to enter the facility and B
n
units are drawn off. When the reservoir is
full, surplus water is lost. When the reservoir is empty, no water can be
supplied. We assume that newly available resources cannot be used in the
current time period. Then the quantity of water X
n
in the reservoir at the
end of period n satisfies
X
n
+
1
= ((X
n
-Bn+I)+ +A
n
+
l
) Ac.
If we assume that An, B
n
and c are integer-valued and that AI, A
2
, ••• are
independent and identically distributed, likewise B
I
, B
2
, ••• , then ( X n ) n ~ O
is a Markov chain on {O, 1, ... ,c}, whose transition probabilities may be
deduced from the distributions of An and B
n
. Hence we know that the long-
run behaviour of ( X n ) n ~ O is controlled by its unique invariant distribution
7r, assuming irreducibility. For example, the long-run proportion of time
that the reservoir is empty is simply 7ro. So we would like to calculate 7r.
A simplifying assumption which makes some calculations possible is to
assume that consumption in each period is constant, and that our units are
chosen to make this constant 1. Then the infinite capacity model satisfies
a recursion similar to the M/G/l queue:
X
n
+
1
= (X
n
- 1)+ + A
n
+
l
.
Hence, by the argument used in Example 5.2.7, if E(A
n
) < 1, then ( X n ) n ~ O
is positive recurrent and the invariant distribution 7r satisfies
00
L 7riZi = (1 - EA
n
)(1 - z)A(z)j(A(z) - z)
i=O
where A(z) = E(zA
n
). In fact, whether or not E(A
n
) < 1, the equation
00
L lIiZi = (1 - z)A(z)j(A(z) - z)
i=O
5.3 Markov chains in resource management 195
serves to define a positive invariant measure v for To see this,
multiply by A(z) - z and equate powers of z: the resulting equations are
the equilibrium equations for
Vo = lIo(ao +a1) +V1
a
O
i
l/i = l/i+l
a
O + L l/j
a
i-j+1, for i 1
j=O
where ai = P(A
n
= i).
Note that can only enter {O, 1, ... ,c} through c. Hence, by the
strong Markov property, observed whilst in {O, 1, ... ,c} is simply
the finite-capacity model. In the case where IE(A
n
) < 1, we can deduce for
the finite-capacity model that the long-run proportion of time in state i is
given by vi/(vo +... +v
e
). In fact, this is true in general as the equilibrium
equations for the finite-capacity model coincide with those for v up to level
c - 1, and the level c equation is redundant.
In reality, it is to be hoped that, in the long run, supply will exceed
demand, which is true if E(A
n
) > 1. Then is transient, so II must
have infinite total mass. The problem faced by the water company is to
keep the long-run proportion of time 1ro(c) that the reservoir is empty below
a certain acceptable fraction, € > °say. Hence c should be chosen large
enough to make
lIO/(lIo +... +lie) < c
which is always possible in the transient case.
Example 5.3.3 (Reservoir model - continuous time)
Consider a reservoir model where fresh water arrives at the times of
a Poisson process of rate A. The quantities of water 8
1
,8
2
, ... arriving
each time are assumed independent and identically distributed. We assume
that there is a continuous demand for water of rate 1. For a reservoir of
infinite capacity, the quantity of water held is just the stored
work in an M/G/I queue with the same arrival times and service times
8
1
,8
2
,. ... The periods when the reservoir is empty correspond to idle
periods of the queue. Hence in the positive recurrent case where AE(8
n
) <
1, the long-run proportion of time that the reservoir is empty is given by
IE(Sn)/(l - ,xIE(Sn)). Note that can enter [0, c] only through c.
As in the preceding example we can obtain the finite capacity model by
observing whilst in [0, c], but we shall not pursue this here.
The next example is included, in part, because it illustrates a surprising
and powerful connection between reflected random walks and the maxima
196 5. Applications
of random walks, which we now explain. Let Xl, X
2
, • •• denote a sequence
of independent, identically distributed random variables. Set Sn = Xl +
... + X
n
and define ( Z n ) n ~ O by Zo = 0 and
Zn+l = (Zn + X
n
+
l
)+.
Then, by induction, we have
so Zn has the same distribution as M
n
where
Example 5.3.4 (Ruin of an insurance company)
An insurance company receives premiums continuously at a constant rate.
We choose units making this rate 1. The company pays claims at the times
of a Poisson process of rate -X, the claims YI, Y
2
, ••• being independent and
identically distributed. Set p = -XE(Yi) and assume that p < 1. Then in
the long run the company can expect to make a profit of 1 - P per unit
time. However, there is a danger that large claims early on will ruin the
company even though the long-term trend is good.
Denote by Sn the cumulative net loss following the nth claim. Thus
Sn = Xl + .. · + X
n
, where X
n
= Y
n
- Tn and Tn is the nth inter-arrival
time. By the strong law of large numbers
as n ~ 00. The maximum loss that the company will have to sustain is
M= lim M
n
n--+oo
where
By the argument given above, M
n
has the same distribution as Zn, where
Zo = 0 and
Zn+l = (Zn +Y
n
- T
n
)+.
But Zn is the queueing time of the nth customer in the M/G/1 queue with
inter-arrival times Tn and service times Y
n
. We know by Example 5.2.7 that
the queue-length distribution converges to equilibrium. Hence, so does the
5.4 Markov decision processes 197
queueing-time distribution. Also by Example 5.2.7, we know the Laplace
transform of the equilibrium queueing-time distribution. Hence
The probability of eventual bankruptcy is P(M > a), where a denotes the
initial value of the company's assets. In principle, this may now be obtained
by inverting the Laplace transform.
5.4 Markov decision processes
In many contexts costs are incurred at a rate determined by some process
which may best be modelled as a Markov chain. We have seen in Section
1.10 and Section 4.2 how to calculate in these circumstances the long-run
average cost or the expected total cost. Suppose now that we are able to
choose the transition probabilities for each state from a given class and that
our choice determines the cost incurred. The question arises as to how best
to do this to minimize our expected costs.
Example 5.4.1
A random walker on {O, 1, 2, ... } jumps one step to the right with proba-
bility p and one step to the left with probability q = 1 - p. Any value of
p E (0,1] may be chosen, but incurs a cost
c(p) = l/p.
The walker on reaching 0 stays there, incurring no further costs.
If we are only concerned with minimizing costs over the first few time
steps, then the choice p = 1 may be best. However, in the long run the only
way to avoid an infinite total cost is to get to O. Starting from i we must
first hit i-I, then i - 2, and so on. Given the lack of memory in the model,
this makes it reasonable to pick the same value of p throughout, and seek
to minimize ¢(p), the expected total cost starting from 1. The expected
total cost starting from 2 is 2¢(p) since we must first hit 1. Hence
¢(p) = c(p) + 2p¢(p)
so that
¢(p) = { c(p)/(l - 2p) for p < 1/2
00 for p ~ 1/2.
Thus for c(p) = lip the choice p = 1/4 is optimal, with expected cost 8.
The general discussion which follows will make rigorous what we claimed
was reasonable.
198 5. Applications
Generally, let us suppose given some distribution A = (Ai: i E I) and,
for each action a E A, a transition matrix P(a) = (Pij(a) : i,j E I) and
a cost function c(a) = (Ci (a) : i E I). These are the data for a Markov
decision process, though so far we have no process and when we do it will
not in general be Markov. To get a process we must choose a policy, that
is, a way of determining actions by our current knowledge of the process.
Formally, a policy U is a sequence of functions
n=O,1,2, ....
Each policy u determines a probability law pu for a process ( X n ) n ~ O with
values in I by
(i) PU(X
o
= io) = Ai
o
;
(ii) PU(X
n
+
1
=in+11 X
o
=io, ... ,X
n
=i
n
) =Pi
n
i
n
+l(u
n
(i
o
, ... ,in)).
A stationary policy u is a function u : I ~ A. We abuse notation and write
u also for the associated policy given by
Under a stationary policy u, the probability law pu makes (Xn)n>O Markov,
with transition probabilities P"tj = Pij (u(i))· -
We suppose that a cost c(i, a) = ci(a) is incurred when action a is chosen
in state i. Then we associate to a policy u an expected total cost starting
from i, given by
00
VU(i) =E
U
LC(Xn,Un(X
o
, ... ,X
n
)).
n=O
So that this sum is well defined, we assume that c(i, a) ~ 0 for all i and a.
Define also the value function
V*(i) = infVU(i)
U
which is the minimal expected total cost starting from i.
The basic problem of Markov decision theory is how to minimize expected
costs by our choice of policy. The minimum expected cost incurred before
time n = 1 is given by
VI (i) = inf c(i, a).
a
Then the minimum expected cost incurred before time n = 2 is
V2 (i) = i ~ { c(i, a) + LPij(a)V
1
(j)}.
jE!
5.4 Markov decision processes
Define inductively
Vn+l (i) = i ~ f { c(i, a) +LPij(a)VnU)}.
jEI
199
(5.4)
It is easy to see by induction that V
n
(i) ~ V
n
+l (i) for all i, so V
n
(i) increases
to a limit Voo(i), possibly infinite. We have
Vn+l(i) :::; c(i,a) +LPij(a)VnU)
jEI
so, letting n ~ 00 and then minimizing over a,
for all a
Voo(i) :::; i ~ { c(i, a) +LPij(a)VOOU)}.
jEI
(5.5)
It is a reasonable guess that Voo(i), being the limit of minimal expected
costs over finite time intervals, is in fact the value function V*(i). This is
not always true, unless we can show that the inequality (5.5) is actually an
equality. We make three technical assumptions to ensure this. We assume
that
(i) for all i, j the functions Ci : A ~ [0,00) and Pij : A ~ [0,00) are
continuous;
(ii) for all i and all B < 00 the set {a : ci(a) ~ B} is compact;
(iii) for each i, for all but finitely many j, for all a E A we have Pij (a) = 0.
A simple case where (i) and (ii) hold is when A is a finite set. It is easy
to check that the assumptions are valid in Example 5.4.1, with A = (0,1],
ci(a) = l/a and
{
a ifj=i+1
Pij (a) = 0
1
- a if j = i-I
otherwise,
with obvious exceptions at i = O.
Lemma 5.4.2. There is a stationary policy u such that
Voo(i) = c(i, u(i)) + LPij (u(i))VooU)·
jEI
(5.6)
Proof. If Voo(i) = 00 there is nothing to prove, so let us assume that
Voo(i) ~ B < 00. Then
Vn+l(i) = ! ~ {C(i,a) +LPij(a)Vn(j)}
jEJ
200 5. Applications
where K is the compact set {a : c(i, a) ~ B} and where J is the finite set
{j : Pij ¢. O}. Hence, by continuity, the infimum is attained and
V
n
+l(i) = c(i,un(i)) + :EPij(un(i))Vn(j)
jEJ
(5.7)
for some un(i) E K. By compactness there is a convergent subsequence
u
nk
(i) ~ u(i), say, and, on passing to the limit nk ~ 00 in (5.7), we obtain
(5.6). D
Theorem 5.4.3. We have
(i) Vn(i) i V*(i) as n ~ 00 for all i;
(ii) if u * is any stationary policy such that a = u*(i) minimizes
c(i,a) + :EPij (a)V* (j)
jEI
for all i, then u* is optimal, in the sense that
v
u
* (i) = V*(i)
Proof. For any policy u we have
00
for all i.
VU(i) = Ei L c(X
n
, Un (X
o
, · " ,X
n
))
n=O
= c(i,uo(i)) + :EPij(UO(i))vu[i](j)
jEI
where u[i] is the policy given by
Hence we obtain
VU(i) ~ i ~ { c(i, a) + LPij (a)V* (j) }
jEI
and, on taking the infimum over u
V*(i) ~ i ~ { c(i, a) +LPij (a)V* (j) }.
jEI
(5.8)
Certainly, Vo(i) = 0 ~ V*(i). Let us suppose inductively that Vn(i) ~ V*(i)
for all i. Then by substitution in the right sides of (5.4) and (5.8) we find
V
n
+
1
(i) ~ V*(i) and the induction proceeds. Hence Voo(i) ~ V*(i) for all i.
5.4 Markov decision processes
Let u* be any stationary policy for which
Voo(i) ~ c(i,u*(i)) +LPij(U*(i))Voo(j).
jEI
201
We know such a policy exists by Lemma 5.4.2. Then by Theorem 4.2.3 we
have vu* (i) ~ Voo(i) for all i. But V*(i) ~ vu* (i) for all i, so
Voo(i) = V*(i) = v
u
* (i)
and we are done. D
for all i
The theorem just proved shows that the problem of finding a good policy
is much simpler than we might have supposed. For it was not clear at the
outset that there would be a single policy which was optimal for all i, even
less that this policy would be stationary. Moreover, part (i) gives an explicit
way of obtaining the value function V* and, once this is known, part (ii)
identifies an optimal stationary policy.
In practice we may know only an approximation to V*, for example V
n
for n large. We may then hope that, by choosing a = u(i) to minimize
c(i, a) + :EPij(a)Vn(j)
jEI
we get a nearly optimal policy. An alternative means of constructing nearly
optimal policies is sometimes provided by the method of policy improve-
ment. Given one stationary policy u we may define another Ou by the
requirement that a = (Ou)(i) minimizes
c(i,a) +:Epij(a)Vu(j).
jEI
Theorem 5.4.4 (Policy improvement). We have
(i) VeU(i) ~ VU(i) for all i;
(ii) VenU(i) ! V*(i) as n ~ 00 for all i, provided that
for all i. (5.9)
Proof. (i) We have, by Theorem 4.2.3
VU(i) = c(i,u(i)) + LPij(u(i))VU(j)
jEI
~ c(i,Ou(i)) + :EPij(Ou(i))vu(j)
jEI
so VU(i) ~ VeU(i) for all i, by Theorem 4.2.3.
202 5. Applications
(ii) We note from part (i) that
VOU(i) :::; c(i,a) +LPii(a)Vu(j)
jEI
for all i and a. (5.10)
Fix N 0 and consider for n = 0, 1, . .. ,N the process
n-l
M
n
= VON-nU(Xn) +LC(Xk,U*(Xk))'
k=O
Recall the notation for conditional expectation introduced in Section 4.1.
We have
E
U
* (M
n
+
1
I:F
n
) = LPXni (U*(Xn))VoN-n-lu(j) + c(X
n
, u*(X
n
))
jEI
n-l
+ LC(Xk,U*(Xk))
k=O

where we used (5.10) with u replaced by (IN-n-l u, i = X
n
and a = u*(X
n
).
It follows that EU· (M
n
+
1
) EU· (M
n
) for all n. Hence if we assume (5.9),
then
V
9N
U(i) = IEy· (M
o
) IEY· (M
N
)
=Ef(VU(XN)) +E
u
*
D
We have been discussing the minimization of expected total cost, which is
only relevant to the transient case. This is because we will have V* (i) = 00
unless for some stationary policy u, the only states j with positive cost
c(j, u(j)) > 0, accessible from i, are transient. The recurrent case is also
of practical importance and one way to deal with this is to discount costs
at future times by a fixed factor Q E (0,1). We now seek to minimize the
expected total discounted cost
00
V:(i) = EiLOnC(Xn,Un(X
o
, ... ,X
n
)).
n=O
Define the discounted value function
V;(i) = infV:(i).
U
5.4 Markov decision processes 203
In fact, the discounted case reduces to the undiscounted case by intro-
ducing a new absorbing state 8 and defining a new Markov decision process
by
Pij (a) = apij (a),
ci(a) = ci(a),
Pia(a) = 1 - a,
ca(a) = O.
Thus the new process follows the old until, at some geometric time of pa-
rameter a, it jumps to 8 and stays there, incurring no further costs.
Introduce VO,a (i) = 0 and, inductively
Vn+l,o:(i) = i ~ f {c(i, a) + a 2:Pij (a)Vn,o: (in
jEJ
and, given a stationary policy u, define another Oau by the requirement
that a = (Oau)(i) minimizes
c(i, a) + a 2:Pij(a)V
U
U).
jEJ
Theorem 5.4.5. Suppose that the cost function c(i, a) is uniformly
bounded.
(i) We have Vn,a(i) i VC:(i) as n ~ 00 for all i.
(ii) The value function VC: is the unique bounded solution to
V;(i) = i ~ f { c(i, a) + a 2: Pij(a)V;U) }.
jEI
(iii) Let u* be a stationary policy such that a = u*(i) minimizes
c(i,a) +a 2:Pij(a)V;U)
jEI
for all i. Then u* is optimal in the sense that
(5.11)
V:* (i) = VC:(i)
(iv) For all stationary policies u we have
for all i.
as n ~ 00 for all i.
Proof. With obvious notation we have
204 5. Applications
so parts (i), (ii) and (iii) follow directly from Theorems 5.4.3 and 5.4.4,
except for the uniqueness claim in (ii). But given any bounded solution V
to (5.11), there is a stationary policy u such that
V(i) = c(i,u(i)) +0: :Epij(u(i))V(j).
JEI
Then V = V ~ , by Theorem 4.2.5. Then (}au = U so (iv) will show that u is
optimal and V = VC:.
We have c(i, a) ~ B for some B < 00. So for any stationary policy u we
have
00
V:(i) = Ei:E o:nc(x
n
, u(X
n
)) :::; B/(l - 0:)
n=O
and so
iEf* (VU(X
n
)) = anEf* ( V ~ ( X n ) ) ~ Ba
n
/(l- a) ~ 0
as n ~ 00. Hence (iv) also follows from Theorem 5.4.4. D
We finish with a discussion of long-run average costs. Here we are con-
cerned with the limiting behaviour, as n ~ 00, of
We assume that
Ic(i, a)1 ~ B < 00
for all i and a.
This forces IV:(i) I ~ B for all n, but in general the sequence ~ (i) may
fail to converge as n ~ 00. In the case of a stationary strategy u for which
(Xn)n>O has a unique invariant distribution 1r
u
, we know by the ergodic
theorem that
1 n-l
;;: LC(Xk,U(Xk)) ~ L
7r
j
c
U,u(j))
k=O JEI
as n ~ 00, Pi-almost surely, for all i. So ~ (i) does converge in this
case by bounded convergence, with the same limit. This suggests that one
approach to minimizing long-run costs might be to minimize
L 7rjcU, u(j)).
JEI
But, although this is sometimes valid, we do not know in general that the
optimal policy is positive recurrent, or even stationary. Instead, we use a
martingale approach, which is more general.
5.4 Markov decision processes 205
Theorem 5.4.6. Suppose we can find a constant V* and a bounded func-
tion W(i) such that
V* +W(i) = +LPii(a)W(j)}
jEI
for all i. (5.12)
Let u* be any stationary strategy such that a = u*(i) achieves the infimum
in (5.12) for each i. Then
(i) v::* (i) -t V* as n -t 00 for all i;
(ii) v:: (i) V* for all i, for all u.
Proof. Fix a strategy u and set Un = un(Xo, ... ,X
n
). Consider
n-l
M
n
= W(X
n
) - nV* + L C(Xk' Uk).
k=O
Then
EU(Mn+l I F
n
)
= M
n
+ {C(X
n
, Un) +LPXni(Un)W(j)} - (V* +W(X
n
))
jEI
with equality if u = u*. Therefore
So we obtain
v* +2 sup IW(i)l/n.
i
This implies (ii) on letting n 00. When u = u* we also have
v::* (i) V* + 2 sup IW(i)l/n
i
and hence (i). D
The most obvious point of this theorem is that it identifies an optimal
stationary policy when the hypothesis is met. Two further aspects also
deserve comment. Firstly, if u is a stationary policy for which has
an invariant distribution 1r
u
, then
2: 1I"f (V* + W(i)) 2:1I"f (C(i'U(i)) + 2:Pii(U(i))W(j))
iEI iEI jEI
= 2:1I"fc(i,u(i)) + L1I"jW(j)
iEI jEI
206
so
5. Applications
v* ~ L 1rfc(i, u(i))
iEI
with equality if we can take u = u* .
Secondly, there is a connection with the case of discounted costs. Assume
that I is finite and that P(a) is irreducible for all a. Then we can show
that as a i 1 we have
V;(i) = V* /(1 - a) +W(i) +0(1 - a).
On substituting this into (5.11) we find
v* /(1 - a) +W(i) + 0(1 - a)
= i ~ f {C(i' a) + a LPij(a)(V*/(1- a) + W(j) + 0(1 - a))}
jEI
so
V* + W(i) = i ~ f {C(i' a) + a LPij(a)W(j)} + 0(1 - a)
jEI
which brings us back to (5.12) on letting a i 1.
The interested reader is referred to S.11. Ross, Applied Probability Mod-
els with Optimization Applications (Holden-Day, San Francisco, 1970) and
to H. C. Tijms, Stochastic Models - an algorithmic approach (Wiley, Chich-
ester, 1994) for more examples, results and references.
5.5 Markov chain Monte Carlo
Most computers may be instructed to provide a sequence of numbers
Ul = O.Ull U12
U
13 Ulm
U2 = O. U21 U22
U
23 U2m
U3 = 0.U31U32
U
33 U3m
written as decimal expansions of a certain length, which for many purposes
may be regarded as sample values of a sequence of independent random
variables, uniformly distributed on [0,1]:
5.5 Markov chain Monte Carlo 207
We are cautious in our language because, of course, Ul, U2, U3, ... are actu-
ally all integer multiples of 10-
m
and, more seriously, they are usually de-
rived sequentially by some entirely deterministic algorithm in the computer.
Nevertheless, the generators of such pseudo-random numbers are in general
as reliable an imitation as one could wish of U1(w), U
2
(w), U
3
(w), .... This
makes it worth while considering how one might construct Markov chains
from a given sequence of independent uniform random variables, and then
might exploit the observed properties of such processes.
We shall now describe one procedure to simulate a Markov chain ( X n ) n ~ O
with initial distribution A and transition matrix P. Since EiEI Ai = 1 we
can partition [0,1] into disjoint subintervals (Ai: i E I) with lengths
Similarly for each i E I, we can partition [0,1] into disjoint subintervals
(A
ij
: j E I) such that
Now define functions
Go : [0, 1] ~ I,
G : I x [0, 1] ~ I
by
Go(U) = i
G(i,u)=j
if u E Ai,
if u E A
ij
.
Suppose that U
o
, U
1
, U
2
, • •• is a sequence of independent random variables,
uniformly distributed on [0,1], and set
X
o
= Go(U
o
),
X
n
+
1
= G(X
n
, U
n
+
1
) for n ~ 0.
Then
lP(X
o
= i) = lP(U
o
E Ai) = Ai,
lP(Xn +1 = i n +1 I Xo = io, ... ,Xn = in) = JP>(Un +1 E Ainin+l) = Pi
n
i
n
+l
so ( X n ) n ~ O is Markov(A, P).
This simple procedure may be used to investigate empirically those as-
pects of the behaviour of a Markov chain where theoretical calculations
become infeasible.
208 5. Applications
The remainder of this section is devoted to one application of the simu-
lation of Markov chains. It is the application which finds greatest practical
use, especially in statistics, statistical physics and computer science, known
as Markov chain Monte Carlo. Monte Carlo is another name for computer
simulation so this sounds no different from the procedure just discussed.
But what is really meant is simulation by means of Markov chains, the
object of primary interest being the invariant distribution of the Markov
chain and not the chain itself. After a general discussion we shall give two
examples.
The context for Markov chain Monte Carlo is a state-space in product
form
I = II 8m
mEA
where A is a finite set. For the purposes of this discussion we shall also
assume that each component 8
m
is a finite set. A random variable X with
values in I is then a family of component random variables (X(m) : mEA),
where, for each site mEA, X(m) takes values in 8
m
.
We are given a distribution 7r = (7ri : i E I), perhaps up to an unknown
constant multiple, and it is desired to compute the number
(5.13)
for some given function I = (Ii : i E I). The essential point to understand is
that A is typically a large set, making the state-space I very large indeed.
Then certain operations are computationally infeasible - performing the
sum (5.13) state by state for a start.
An alternative approach would be to simulate a large number of inde-
pendent random variables Xl, ... ,X
n
in I, each with distribution 7r, and
to approximate (5.13) by
1 n
;;, L!(Xk).
k=l
The strong law of large numbers guarantees that this is a good approxi-
mation as n ~ 00 and, moreover, one can obtain error estimates which
indicate how large to make n in practice. However, simulation from the
distribution 7r is also difficult, unless 7r has product form
1r(X) = II 1r
m
{x(m)).
mEA
For recall that a computer just simulates sequences of independent U[O, 1]
random variables. When 7r does not have product form, Markov chain
Monte Carlo is sometimes the only way to simulate samples from 7r.
5.5 Markov chain Monte Carlo 209
The basic idea is to simulate a Markov chain ( X n ) n ~ O , which is con-
structed to have invariant distribution Jr. Then, assuming aperiodicity and
irreducibility, we know, by Theorem 1.8.3, that as n ~ 00 the distribution
of X
n
converges to Jr. Indeed, assuming only irreducibility, Theorem 1.10.2
shows that
with probability 1. But why should simulating an entire Markov chain be
easier than simulating a simple distribution Jr? The answer lies in the fact
that the state-space is a product.
Each component Xo(m) of the initial state X
o
is a random variable in 8
m
.
It does not matter crucially what distribution X
o
is given, but we might,
for example, make all components independent. The process ( X n ) n ~ O is
made to evolve by changing components one site at a time. When the
chosen site is m, we simulate a new random variable X
n
+
1
(m) with values
in 8
m
according to a distribution determined by X
n
, and for k =I m we set
Xn+1 ( k) = Xn ( k) . Thus at each step we have only to simulate a random
variable in 8
m
, not one in the much larger space I.
Let us write i ~ j if i and j agree, except possibly at site m. The law for
simulating a new value at site m is described by a transition matrix P(m),
where
pij(m) = 0 unless i ~ j.
We would like Jr to be invariant for P(m). A sufficient condition is that the
detailed balance equations hold: thus for all i, j we want
There are many possible choices for P(m) satisfying these equations. In-
deed, given any stochastic matrix R(m) with
rij(m) = 0 unless i ~ j
we can determine such a P(m) by
for i =I j, and then
pii(m) = 1- 2:Pij(m) ~ O.
j#i
210 5. Applications
This has the following interpretation: if X
n
= i we simulate a new random
variable Y
n
so that Y
n
= j with probability rij(m), then if Y
n
= j we set
with probability (7ririj(m)/7rjrji(m)) 1\ 1
otherwise.
This is called a Hastings algorithm.
There are two commonly used special cases. On taking
for i ~ j
we also find
for i ~ j.
So we simply resample X
n
(m) according to the conditional distribution
under 7r, given the other components. This is called the Gibbs sampler. It
is particularly useful in Bayesian statistics.
On taking rij(m) = rji(m) for all i and j we find
for i ~ j, i =1= j.
This is called a Metropolis algorithm. A particularly simple case would be
to take
rij(m) == l/(N
m
- 1) for i ~ j, i =1= j
where N
m
== ISml. This amounts to choosing another value jm at site m
uniformly at random; if 7rj > 7ri, then we adopt the new value, whereas if
7rj ~ 7ri we adopt the new value with probability 7rj/7ri.
We have not yet specified a rule for deciding which site to visit when.
In practice this may not matter much, provided we keep returning to every
site. For definiteness we mention two possibilities. We might choose to visit
every site once and then repeat, generating a sequence of sites ( m n ) n ~ O .
Then (m
n
, X n ) n ~ O is a Markov chain in A x I. Alternatively, we might
choose a site randomly at each step. Then ( X n ) n ~ O is itself a Markov chain
with transition matrix
P = IAI-
1
:E P(m).
mEA
We shall stick with this second choice, where the analysis is simpler to
present. Let us assume that P is irreducible, which is easy to ensure in the
examples. We know that
5.5 Markov chain Monte Carlo
for all m and all i, j, so also
211
7riPij = 7rjPji
and so 7r is the unique invariant measure for P. Hence, by Theorem 1.10.2,
we have
1 n-1
- L f(Xk) -t L 7rdi
n k=O iEI
as n ~ 00 with probability 1. Thus the algorithm works eventually. In
practice one is concerned with how fast it works, but useful information
of this type cannot be gained in the present general context. Given more
information on the structure of 8
m
and the distribution 7r to be simulated,
much more can be said. We shall not pursue the matter here. It should
also be emphasised that there is an empirical side to simulation: with due
caution informed by the theory, the computer output gives a good idea
of how well we are doing. For further reading we recommend Stochastic
Simulation by B. D. Ripley (Wiley, Chichester, 1987), and Markov Chain
Monte Carlo in practice by W. R. Gilks, S. Richardson and D. J. Spiegelhal-
ter (Chapman and Hall, London, 1996). The recent survey article Bayesian
computation and stochastic systems by J. Besag, P. Green, D. Higdon and
K. Mengersen (Statistical Science, 10 (1), pp. 3-40, 1995) contains many
interesting references. We finish with two examples.
Example 5.5.1 (Bayesian statistics)
In a statistical problem one may be presented with a set of independent
observations Y
1
, . .. ,Y
n
, which it is reasonable to assume are normally dis-
tributed, but with unknown mean /-l and variance 7-
1
. One then seeks
to draw conclusions about /-l and 7 on the basis of the observations. The
Bayesian approach to this problem is to assume that /-l and 7 are themselves
random variables, with a given prior distribution. For example, we might
assume that
/-l rv N(0
0
, cPo
1
), 7 rv r(0:0, ,80),
that is to say, /-l is normal of mean 0
0
and variance cPo
1
, and 7 has gamma
distribution of parameters 0:0 and ,80. The parameters 0
0
, cPo, 0:0 and ,80
are known. Then the prior density for (/-l, 7) is given by
7r(/-l, 7) ex exp{-cPO(/-l- O
o
)2/2}7
ao
-
1
exp{-,807}.
The posterior density for (/-l, 7), which is the conditional density given
the observations, is then given by Bayes' formula
7r(/-l, 7 I y) ex 7r(/-l, 7) f (y I /-l, 7)
ex exp{-¢o(p- (
0
)2/2} exp { -Tt.(Yi -p)2/2} T
ao
-l+
n
/
2
exp{-{jOT}.
212 5. Applications
Note that the posterior density is no longer in product form: the condition-
ing has introduced a dependence between J-l and T. Nevertheless, the full
conditional distributions still have a simple form
1r(pl Y, r) ex exp{-¢o(p- (
0
)2/2} exp { -r t.(Yi - p)2/2} I'.J N(On, ¢;;1),
1r(r I Y, p) ex r
CYo
-l+
n
/
2
exp { -r (,80 +t,(Yi - p)2/2) }I'.J r(a
n
, ,8n)
where
n
an = ao + n/2, ,8n = ,80 + 2)Yi - p)2/2.
i==l
Our final belief about J-l and T is regarded as measured by the posterior
density. We may wish to compute probabilities and expectations. Here the
Gibbs sampler provides a particularly simple approach. Of course, numeri-
cal integration would also be feasible as the dimension is only two. To make
the connection with our general discussion we set
1=8
1
X 8
2
= JR x [0,00).
We wish to simulate X = (J-l, T) with density 7r(J-l, T I y). The fact that JR
and [0, 00) are not finite sets does not affect the basic idea. In any case the
computer will work with finite approximations to JR and [0,00). First we
simulate X
o
, say from the product form density 7r(J-l, T). At the kth stage,
given X
k
= (J-lk, Tk), we first simulate J-lk+1 from 7r(J-l I y, Tk) and then Tk+1
from 7r(T I y,J-lk+1), then set Xk+1 = (J-lk+1,Tk+1). Then ( X k ) k ~ O is a
Markov chain in I with invariant measure 7r(J-l, T I y), and one can show
that
k-1
~ ~ f(Xj ) -t 1f(x)1r(x I y)dx as k -t 00
with probability 1, for all bounded continuous functions f : I ---+ JR. This
is not an immediate consequence of the ergodic theorem for discrete state-
space, but you may find it reasonable at an intuitive level, with a rate of
convergence depending on the smoothness of 7r and f.
We now turn to an elaboration of this example where the Gibbs sampler
is indispensible. The model consists of m copies of the preceding one, with
5.5 Markov chain Monte Carlo 213
different means but a common variance. Thus there are mn independent
observations }!ij, where i = 1, ... n, and j = 1, ... ,m, normally distributed,
with means jjj and common variance 7-
1
. We take these parameters to be
independent random variables as before, with
Let us write jj = (jj1, . .. ,jjn). The prior density is given by
and the posterior density is given by
1r(J-l, T I y) ex exp { -¢o ~ ( J - l j - ( 0 )2/2}
x exp { -Tt.~ ( Y i j - J-lj)2/2} TQo-Hmn/2 exp{-,8oT}.
Hence the full conditional distributions are
where
n m
On = 00 + mn/2, ,8n = f30 + L L(Yij - J-lj)2/2.
i=1 j=1
We can construct approximate samples from 7r(jj, 7 I y), just as in the case
m = 1 discussed above, by a Gibbs sampler method. Note that, conditional
on 7, the means jjj, for j = 1, ... ,m, remain independent. Thus one can
update all the means simultaneously in the Gibbs sampler. This has the.
effect of speeding convergence to the equilibrium distribution. In cases
where m is large, numerical integration of 7r(jj,7 I y) is infeasible, as is
direct simulation from the distribution, so the Markov chain approach is
the only one available.
214 5. Applications
Example 5.5.2 (Ising model and image analysis)
Consider a large box A = AN in 71
2
A = {-N, ... ,-1,0,1, ... ,N}2
with boundary 8A = AN\A
N
-
1
, and the configuration space
For x E A define
H(x) = ! I::(x(m) - x(m,))2
where the sum is taken over all pairs {m, m
/
} ~ A with 1m - m'l = 1. Note
that H(x) is small when the values taken by x at neighbouring sites are
predominantly the same. We write
I+ = {x E I : x(m) = 1 for all m E 8A}
and for each (3 > °define a probability distribution (rr(x) : x E I+) by
1t"(x) ex e-(3H(x).
As (3 ! °the weighting becomes uniform, whereas, as (3 i 00 the mass
concentrates on configurations x where H(x) is small. This is one of the
fundamental models of statistical physics, called the Ising model. A famous
and deep result of Onsager says that if X has distribution 1t", then
In particular, if sinh 2(3 ~ 1, the fact that X is forced to take boundary
values 1 does not significantly affect the distribution of X(O) when N is
large, whereas if sinh 2{3 > 1 there is a residual effect of the boundary
values on X(O), uniformly in N.
Here we consider the problem of simulating the Ising model. Simulations
may sometimes be used to guide further developments in the theory, or even
to detect phenomena quite out of reach of the current theory. In fact, the
Ising model is rather well understood theoretically; but there are many
related models which are not, where simulation is still possible by simple
modifications of the methods presented here.
First we describe a Gibbs sampler. Consider the sets of even and odd
sites
A+ = {(ml' m2) E A : ml +m2 is even},
A- = {(ml' m2) E A : ml +m2 is odd}
and for x E I set
5.5 Markov chain Monte Carlo 215
x± = (x(m) : m E A±).
We can exploit the fact that the conditional distribution 1r(x+ I x-) has
product form
1r(X+ I x-) ex II e/3x(m)s(m)
mEA+\8A
where, for mEA+\8A
s(m) = L x-(m').
Im'-ml=l
Therefore, it is easy to simulate from 1r(x+ I x-) and likewise from
1r(x- I x+). Choose now some simple initial configuration X
o
in 1+. Then
inductively, given X;; = x-, simulate firstly X ~ + l with distribution
1r(. I x-) and then given X ~ + l = x+, simulate X;+l with distribution
1r(. I x+). Then according to our general discussion, for large n, the distri-
bution of X
n
is approximately 1r. Note that we did not use the value of the
normalizing constant
Z = L e-/3H(x)
xEI+
wllich is hard to compute by elementary means when N is large.
An alternative approach is to use a Metropolis algorithm. We can again
exploit the even/odd partition. Given that X
n
= x, independently for each
m E A+\8A, we change the sign of Xt(m) with probability
p(m,x) = (1r(x)/1r(x)) 1\ 1 = e
2
,Bx(m)s(m) 1\ 1
where x ~ x with x(m) = -x(m). Let us call the resulting configuration
Y
n
. Next we apply the corresponding transformation to Y
n
- ( m) for the odd
sites m E A-\8A, to obtain X
n
+
1
. The process ( X n ) n ~ O is then a Markov
chain in 1+ with invariant distribution 1r.
Both methods we have described serve to simulate samples from 1r; there
is little to choose between them. Convergence is fast in the subcritical case
sinh 2,8 < 1, where 1r has an approximate product structure on large scales.
In a Bayesian analysis of two-dimensional images, the Ising model is
sometimes used as a prior. We may encode a digitized image on a two-
dimensional grid as a particular configuration (x(m) : mEA) E I, where
x(m) = 1 for a white pixel and x(m) = -1 for a black pixel. By varying
the parameter ,8 in the Ising model, we vary the tendency of black pixels
216 5. Applications
to clump together; the same for white pixels. Thus (3 is a sort of texture
parameter, which we choose according to the sort of image we expect, thus
obtaining a prior 7r(x). Observations are now made at each site which record
the true pixel, black or white, with probability p E (0,1). The posterior
distribution for X given observations Y is then given by
7r(x I y) ex 7r(x)f(y I x) ex e-
f3H
(x)pa(x,y) (1 - p)d(x,y)
where a(x, y) and d(x, y) are the numbers of sites at which x and y agree and
disagree respectively. 'Cleaned-up' versions of the observed image Y may
now be obtained by simulating from the posterior distribution. Although
this is not exactly the Ising model, the same methods work. We describe
the appropriate Metropolis algorithm: given that X
n
= x, independently
for each m E A+\8A, change the sign of X:(m) with probability
p(m, x, y) = (7r(x I Y)/7r(x I y)) 1\ 1
= e-
2
,6x(m)s(m)((1_ p)/pt(m)y(m)
where x x with x(m) = -x(m). Call the resulting configuration X
n
+
1
/
2
.
Next apply the corresponding transformation to for the odd sites
to obtain X
n
+
1
. Then is a Markov chain in /+ with invariant
distribution 7r(. I y).
6
Appendix: probability and measure
Section 6.1 contains some reminders about countable sets and the discrete
version of measure theory. For much of the book we can do without explicit
mention of more general aspects of measure theory, except an elementary
understanding of Riemann integration or Lebesgue measure. This is because
the state-space is at worst countable. The proofs we have given may be read
on two levels, with or without a measure-theoretic background. When in-
terpreted in terms of measure theory, the proofs are intended to be rigorous.
The basic framework of measure and probability is reviewed in Sections 6.2
and 6.3. Two important results of measure theory, the monotone conver-
gence theorem and Fubini's theorem, are needed a number of times: these
are discussed in Section 6.4. One crucial result which we found impossi-
ble to discuss convincingly without measure theory is the strong Markov
property for continuous-time chains. This is proved in Section 6.5. Finally,
in Section 6.6, we discuss a general technique for determining probability
measures and independence in terms of 1r-systems, which are often more
convenient than a-algebras.
6.1 Countable sets and countable sums
A set I is countable if there is a bijection f : {I, . .. ,n} ~ I for some n E N,-
or a bijection f : N ~ I. In either case we can enumerate all the elements
of I
218 6. Appendix: probability and measure
where in one case the sequence terminates and in the other it does not.
There would have been no loss in generality had we insisted that all our
Markov chains had state-space N or {I, ... ,n} for some n E N: this just
corresponds to a particular choice of the bijection f.
Any subset of a countable set is countable. Any finite cartesian product
of countable sets is countable, for example tl
n
for any n. Any countable
union of countable sets is countable. The set of all subsets of N is uncount-
able and so is the set of real numbers JR.
We need the following basic fact.
Lemma 6.1.1. Let I be a countably infinite set and let Ai ~ 0 for all i E I.
Then, for any two enumerations of I
~ 1 , ~ 2 , ~ 3 , ••• ,
we have
00 00
LAin = LAin'
n=l n=l
Proof. Given any N E N we can find M ~ Nand N' ~ M such that
Then
N M N'
""" A· < """ A· < """ A·
L.-J 't n - L.-J In - L.-J 't
n
n=l n=l n=l
and the result follows on letting N ~ 00. D
Since the value of the sum does not depend on the enumeration we are
justified in using a notation which does not specify an enumeration and
write simply
More generally, if we allow Ai to take negative values, then we can set
where
6.1 Countable sets and countable sums 219
allowing that the sum over I is undefined when the sums over I+ and I-
are both infinite. There is no difficulty in showing for Ai, jji 2 0 that
I)Ai + Pi) = LAi + LPi.
iEI iEI iEI
By induction, for any finite set J and for Aij 2 0, we have
L (LAi
j
) = L (LAij).
iEI jEJ jEJ iEI
The following two results on sums are simple versions of fundamental
results for integrals. We take the opportunity to prove these simple versions
in order to convey some intuition relevant to the general case.
Lemma 6.1.2 (Fubini's theorem - discrete case). Let I and J be
countable sets and let Aij 2 0 for all i E I and j E J. Then
L (LAi
j
) = L (LAij).
iEI jEJ jEJ iEI
Proof. Let jl,j2,j3, ... be an enumeration of J. Then
as n ~ 00. Hence
and the result follows by symmetry. D
Lemma 6.1.3 (Monotone convergence - discrete case). Suppose for
each i E I we are given an increasing sequence ( A i ( n ) ) n ~ O with limit Ai, .
and that Ai ( n) 2 0 for all i and n. Then
LAi(n) i LAi as n-t 00.
iEI iEI
220 6. Appendix: probability and measure
Proof. Set 8
i
(1) = Ai(l) and for n ~ 2 set
Then 8
i
(n) ~ 0 for all i and n, so as n ~ 00, by Fubini's theorem
~ A i ( n ) = ~ (t,8
i
(k))
= t. ( ~ 8 i ( k ) ) i t. ( ~ 8 i ( k ) )
= L(f
8i
(k)) = LAi
o
iEI k=l iEI
6.2 Basic facts of measure theory
D
We state here for easy reference the basic definitions and results of measure
theory. Let E be a set. A a-algebra £ on E is a set of subsets of E satisfying
(i) 0 E £;
(ii) A E £ =* AC E £;
(iii) (An E £,n E N) =* Un An E £.
Here AC denotes the complement E\A of A in E. Thus £ is closed under
countable set operations. The pair (E, £) is called a measurable space. A
measure J-l on (E, £) is a function J-l : £ ~ [0,00] which has the following
countable additivity property:
The triple (E, £, J-l) is called a measure space. If there exist sets En E £,
n E N with Un En = E and J-l(E
n
) < 00 for all n, then we say J-l is a-finite.
Example 6.2.1
Let I be a countable set and denote by I the set of all subsets of I. Recall
that A = (Ai: i E I) is a measure in the sense of Section 1.1 if Ai E [0,00)
for all i. For such A we obtain a measure on the measurable space (I,I) by
setting
In fact, we obtain in this way all a-finite measures J-l on (I,I).
6.2 Basic facts of measure theory 221
Example 6.2.2
Let A be any set of subsets of E. The set of all subsets of E is a a-
algebra containing A. The intersection of any collection of a-algebras is
again a a-algebra. The collection of a-algebras containing A is therefore
non-empty and its intersection is a a-algebra a(A), which is called the
a-algebra generated by A.
Example 6.2.3
In the preceding example take E = JR and
A = {(a,b): a,b E JR,a < b}.
The a-algebra B generated by A is called the Borel a-algebra of JR. It can
be shown that there is a unique measure J.-t on (JR, B) such that
J.-t(a, b) = b - a for all a, b.
This measure J.-t is called Lebesgue measure.
Let (E
1
, £1) and (E
2
, £2) be measurable spaces. A function f : E
1
~ E
2
is measurable if f-1(A) E £1 whenever A E £2. When the range E
2
= JR we
take £2 = B by default. When the range E
2
is a countable set I we take £2
to be the set of all subsets I by default.
Let (E, £) be a measllrable space. We denote by m£ the set of measurable
functions f : E ~ JR. Then m£ is a vector space. We denote by m£+ the
set of measurable functions f : E ~ [0, 00], where we take on [0, 00] the
a-algebra generated by the open intervals (a, b). Then m£+ is a cone
(f, 9 E m£+ ,0:, (3 ~ 0) ~ o:f +{3g E m£+.
Also, m£+ is closed under countable suprema:
(fi E m£+,i E I) ~ SUpfi E m£+.
i
It follows that, for a sequence of functions f n E m£+, both limsUPn f nand
liminf
n
fn are in m£+, and so is limn fn when this exists. It can be shown
that there is a unique map ji : m£+ ~ [0,00] such that
(i) ji(lA) = J.-t(A) for all A E £;
(ii) ji(o:f +(3g) = o:ji(f) +(3ji(f) for all f,g E m£+, 0:, {3 ~ 0;
(iii) (fn E m£+, n E N) ~ ji(En fn) = En ji(fn).
222 6. Appendix: probability and measure
For f E mE, set f± = (±f) V 0, then f+,f- E m£+, f = f+ - f- and
IfI = f+ + f-· If jt(lfl) < 00 then f is said to be integrable and we set
We call ji(f) the integral of f. It is conventional to drop the tilde and
denote the integral by one of the following alternative notations:
p(J) = r fdp = r f(x)p(dx).
lE lXEE
In the case of Lebesgue measure jj, one usually writes simply
r f(x)dx.
lXEJR
6.3 Probability spaces and expectation
The basic apparatus for modelling randomness is a probability space
(0, F, P). This is simply a measure space with total mass P(O) = 1. Thus
F is a a-algebra of subsets of 0 and P : F ~ [0,1] satisfies
(i) P(O) = 1;
(ii) P(AI n A
2
) = P(A
I
) +P(A
2
) for AI, A
2
disjoint;
(iii) P(A
n
) i P(A) whenever An i A.
In (iii) we write An i A to mean Al ~ An ~ ... with Un An = A. A
measurable function X defined on (0, F) is called a random variable. We
use random variables Y : 0 ~ lR to model random quantities, where for a
Borel set B ~ lR the probability that Y E B is given by
P(Y E B) = P({w: Y(w) E B}).
Similarly, given a countable state-space I, a random variable X : 0 ~ I
models a random state, with distribution
Ai = P(X = i) = p({w : X(w) = i}).
To every non-negative or integrable real-valued random variable Y is asso-
ciated an average value or expectation E(Y), which is the integral of Y with
respect to P. Thus we have
(i) E(IA) = P(A) for A E F;
(ii) E(oX +(3Y) = oE(X) +(3E(Y) for X, Y E mF+, o,{3 ~ 0;
6.4 Monotone convergence and Fubini's theorem 223
(iii) (Y
n
E mF+, n E N, Y
n
i Y) :::} IE(Y
n
) i IE(Y).
When X is a random variable with values in I and f : I [0,00] the
expectation of Y = f(X) = foX is given explicitly by
E(J(X)) = LAdi
iEI
where A is the distribution of X. For a real-valued random variable Y the
probabilities are sometimes given by a measurable density function p in
terms of Lebesgue measure:
P(Y E B) = Lp(y)dy.
Then for any measurable function f : lR [0,00] there is an explicit formula
E(J(Y)) = Lf(y)p(y)dy.
6.4 Monotone convergence and Fubini's theorem
Here are the two theorems from measure theory that come into play in
the main text. First we shall state the theorems, then we shall discuss
some places where they are used. Proofs may be found, for example, in
Probability with Martingales by D. Williams (Cambridge University Press,
1991).
Theorem 6.4.1 (Monotone convergence). Let (E, £, J-t) be a measure
space and let be a sequence of non-negative measurable functions.
Then, as n 00
(fn(x) i f(x) for all x E E) :::} J-t(fn) i J-t(f)·
Theorem 6.4.2 (Fubini's theorem). Let (E
1
, £1, J-l1) and (E
2
, £2, J-l2) be
two a-finite measure spaces. Suppose that f : E
1
x E
2
[0, 00] satisfies
(i) x f(x, y) : E
1
[0,00] is £1 measurable for all Y E E
2
;
(ii) Y IXEE
1
f(x, y)J-t1(dx) : E
2
[0,00] is £2 measurable.
Then
(a) y f(x, y) : E
2
[0,00] is £2 measurable for all x E E
1
;
(b) x f
yE
E
2
f(x, Y)J-l2(dy) : E
1
[0,00] is £1 measurable;
(c) r (1 (r
JxEE
l
yEE2 ') yEE2 JxEE
l
')
224 6. Appendix: probability and measure
The measurability conditions in the above theorems rarely need much
consideration. They are powerful results and very easy to use. There is
an equivalent formulation of monotone convergence in terms of sums: for
non-negative measurable functions 9n we have
To see this just take .fn = 91 +... +9n. This form of monotone convergence
has already appeared in Section 6.2 as a defining property of the integral.
This is also a special case of Fubini's theorem, provided that (E, £, J-t) is
a-finite: just take E
2
= {I, 2, 3, ... } and J-t2( {n}) = 1 for all n.
We used monotone convergence in Theorem 1.10.1 to see that for a non-
negative random variable Y we have
IE(Y) = lim IE(Y /\ N).
N--+oo
We used monotone convergence in Theorem 2.3.2 to see that for random
variables Sn ~ 0 we have
E(LSn) = LE(Sn)
n n
and
E(ex
p
{- LSn}) = E ( J ~ = ex
p
{- L Sn})
n n ~ N
=J ~ = E(ex
p
{ - L Sn}).
n ~ N
In the last application convergence is not monotone increasing but mono-
tone decreasing. But if 0 ~ X
n
~ Y and X
n
! X then Y - X
n
i Y - X.
So IE(Y - X
n
) i IE(Y - X) and if IE(Y) < 00 we can deduce IE(X
n
) ! IE(X).
Fubini's theorem is used in Theorem 3.4.2 to see that
Thus we have taken (E
1
, £1, J-t1) to be [0,00) with Lebesgue measure and
(E
2
, £2, J-t2) to be the probability space with the measure Pi.
6.5 Stopping times and the strong Markov property
The strong Markov property for continuous-time Markov chains cannot
properly be understood without measure theory. The problem lies with the
6.5 Stopping times and the strong Markov property 225
notion of 'depending only on', which in measure theory is made precise as
measurability with respect to some a-algebra. Without measure theory the
statement that a set A depends only on (X
s
: s t) does not have a precise
meaning. Of course, if the dependence is reasonably explicit we can exhibit
it, but then, in general, in what terms would you require the dependence
to be exhibited? So in this section we shall give a precise measure-theoretic
account of the strong Markov property.
Let be a right-continuous process with values in a countable set
I. Denote by F
t
the a-algebra generated by {X
s
: s t}, that is to say,
by all sets {X
s
= i} for s t and i E I. We say that a random variable T
with values in [0,00] is a stopping time of if {T t} E F
t
for all
t O. Note that this certainly implies
{T<t}=U{T::;t-l/n}EF
t
forall
n
We define for stopping times T
F
T
= {A E F : A n {T t} E F
t
for all t O}.
This turns out to be the correct way to make precise the notion of sets
which 'depend only on {X
t
: t T}'.
Lemma 6.5.1. Let Sand T be stopping times of Then both X
T
and {S T} are FT-measurable.
Proof. Since is right-continuous, on {T < t} there exists an n 0
such that for all m n, for some k 1, (k - 1)2-
m
T < k2-
m
t and
X
k2
-rn = X
T
. Hence
so X
T
is FT-measurable.
We have
U

so {S > T} E Fr, and so {S T} E Fr· D
226 6. Appendix: probability and measure
Lemma 6.5.2. For all m 0, the jump time J
m
is a stopping time of

Proof. Obviously, Jo = 0 is a stopping time. Assume inductively that J
m
is a stopping time. Then
{J
m
+! t} = U {J
m
s} n {X
s
=1= X
J71
J E F
t

for all t 0, so J
m
+
1
is a stopping time and the induction proceeds. D
We denote by Qm the a-algebra generated by yo,··· ,Y
m
and 8
1
, ... ,8
m
,
that is, by events of the form {Y
k
= i} for k m and i E I or of the form
{8k > s} for k m and s > o.
Lemma 6.5.3. Let T be a stopping time of and let A EFT.
Then for all m 0 there exist a random variable Tm and a set Am, both
measurable with respect to Qm, such that T = T
m
and lA = lA
Tn
on
{T < J
m
+
1
}.
Proof. Fix t 0 and consider
Since Qm is a a-algebra, so is At. For s t we have
{X
s
= i} n {t < J
m
+1}
= (D\Yk =i,Jk s < Jk+l}U{Ym =i,Jm s}) n{t< Jm+d
k=O
so {X
s
= i} E At. Since these sets generate F
t
, this implies that At = Ft·
For T a stopping time and A E FT we have B(t) := {T :s; t} E F
t
and
A(t) := An {T t} E F
t
for all t O. So we can find Bm(t), Am(t) E Qm
such that
B(t) n {T < J
m
+
1
} = Bm(t) n {T < J
m
+1},
A(t) n {T < J
m
+
1
} = Am(t) n {T < J
m
+
1
}.
Set
Am = UAm(t)
tEQ
then T
m
and Am are Qm-measurable and
Tml{T<J
Tn
+l} = suptlB
Tn
(t)n{T<J
Tn
+l}
tEQ
= (sup l{T<J
Tn
+l} = Tl{T<J
Tn
+l}
tEQ
6.5 Stopping times and the strong Markov property 227
and
Am n {T < Jm+d = UAm(t) n {T < Jm+d
tEQ
= U(A n {T t}) n {T < Jm+d = An {T < Jm+d
tEQ
as required. D
Theorem 6.5.4 (Strong Markov property). Let be
Markov(A, Q) and let T be a stopping time of Then, conditional
on T < ( and X
T
= i, is Markov(8
i
, Q) and independent of FT.
Proof On {T < (} set X
t
= X
T
+
t
and denote by the jump chain
and by the holding times of We have to show that, for all
A EFT, all io, ... ,in E I and all S1, ... ,Sn 0
IF({Yo = io, ... , Y
n
= in, 8
1
> S1, ... ,8
n
> sn} nAn {T < (} n{X
T
= i})
= lFi(Y
O
= io, ... , Y
n
= in, 8
1
> S1, ... ,8
n
> sn)
X IF(A n {T < (} n {X
T
= i}).
It suffices to prove this with {T < (} replaced by {J
m
T < J
m
+
1
}
for all m 0 and then sum over m. By Lemmas 6.5.1 and 6.5.2,
{J
m
T} n {X
T
= i} E FT so we may assume without loss of generality
that A {J
m
T} n {X
T
= i}. By Lemma 6.5.3 we can write T = T
m
and 1A = 1A
Tn
on {T < J
m
+
1
}, where T
m
and Am are Qm-measurable.
. 8
1
: 82
:c: .:c: .:
o
8m +1 :
i
T
On {J
m
T < J
m
+
1
} we have, as shown in the diagram
228 6. Appendix: probability and measure
Now, conditional on Y
m
= i, 8
m
+
1
is independent of gm and hence of
T
m
- J
m
and Am and, by the memoryless property of the exponential
Hence, by the Markov property of the jump chain
P({Yo = io, ,Y
n
= in,
8
1
> 81, ,8
n
> 8
n
} nAn {J
m
~ T < J
m
+
1
} n {X
T
= i})
= p({Y
m
= io,· .. , Y
m
+
n
= in, 8
m
+
1
> 81 +(T
m
- J
m
),
8
m
+
2
> 82, ,8
m
+
n
> 8 n} n Am n {8m+
1
> T
m
- J
m
})
= Pi(Y
o
= io, ,Y
n
= in,
8
1
> 81, ,8
n
> 8
n
)P(A n {J
m
~ T < J
m
+
1
} n {X
T
= i})
as required. D
6.6 Uniqueness of probabilities and independence of a-algebras
For both discrete-time and continuous-time Markov chains we have given
definitions which specify the probabilities of certain events determined by
the process. From these specified probabilities we have often deduced ex-
plicitly the values of other probabilities, for example hitting probabilities.
In this section we shall show, in measure-theoretic terms, that our defini-
tions determine the probabilities of all events depending on the process.
The constructive approach we have taken should make this seem obvious,
but it is illuminating to see what has to be done.
Let 0 be a set. A 7r-system A on 0 is a collection of subsets of 0 which
is closed under finite intersections; thus
We denote as usual by a(A) the a-algebra generated by A. If a(A) = :F
we say that A generates :F.
Theorem 6.6.1. Let (O,:F) be a measurable space. Let PI and P
2
be
probability measures on (0, F) which agree on a 7r-system A generating :F.
Then PI = P
2
·
Proof. Consider
6.6 Uniqueness of probabilities and independence of a-algebras 229
We have assumed that A ~ V. Moreover, since PI and P
2
are probability
measures, V has the following properties:
(i) 0 E V;
(ii) (A, B E V and A ~ B) :::} B\A E V;
(iii) (An E V, An i A) :::} A E V.
Any collection of subsets having these properties is called a d-system. Since
A generates F, the result now follows from the following lemma. D
Lemma 6.6.2 (Dynkin's 1r-system lemma). Let A be a 1r-system and
let V be a d-system. Suppose A ~ V. Then a(A) ~ V.
Proof. Any intersection of d-systems is again a d-system, so we may without
loss assume that V is the smallest d-system containing A. You may easily
check that any d-system which is also a 1r-system is necessarily a a-algebra,
so it suffices to show V is a 1r-system. This we do in two stages.
Consider first
VI = {A E V : A n B E V for all B E A}.
Since A is a 1r-system, A ~ VI. You may easily check that VI is ad-system
- because V is a d-system. Since V is the smallest d-system containing A,
this shows VI = V.
Next consider
V
2
= {A E V : A n B E V for all B E V}.
Since VI = V, A ~ V
2
. You can easily check that V
2
is also ad-system.
Hence also V
2
= V. But this shows V is a 1r-system. D
The notion of independence used in advanced probability is the indepen-
dence of a-algebras. Suppose that (0, F, P) is a probability space and F
1
and F
2
are sub-a-algebras of F. We say that Fl and F
2
are independent if
The usual means of establishing such independence is the following corollary
of Theorem 6.6.1.
Theorem 6.6.3. Let Al be a 1r-system generating F
1
and let A2 be a
1r-system generating F2. Suppose that
Then Fl and F
2
are independent.
230 6. Appendix: probability and measure
Proof. There are two steps. First fix A
2
E A
2
with P(A
2
) > 0 and consider
the probability measure
We have assumed that P(A) = P(A) for all A E AI, so, by Theorem 6.6.1,
jp> = P on :Fl. Next fix Al E :F1 with P(A
1
) > 0 and consider the probability
measure
We showed in first step that P(A) = P(A) for all A E A2, so, by
Theorem 6.6.1, P = P on :F
2
. Hence:F1 and :F
2
are independent. D
We now review some points in the main text where Theorems 6.6.1 and
6.6.3 are relevant.
In Theorem 1.1.1 we showed that our definition of a discrete-time Markov
chain with initial distribution ,\ and transition matrix P deter-
mines the probabilities of all events of the form
But subsequently we made explicit calculations for probabilities of events
which were not of this form - such as the event that visits a set
of states A. We note now that the events {X
o
= io, ... ,X
n
= in} form a
1r-system which generates the a-algebra a(X
n
: n 0). Hence, by Theorem
6.6.1, our definition determines (in principle) the probabilities of all events
in this a-algebra.
In our general discussion of continuous-time random processes in Section
2.2 we claimed that for a right-continuous process the probabilities
of events of the form
for all n 0 determined the probabilities of all events depending on
Now events of the form {X
to
= io, ... ,X
tn
= in} form a 1r-system which
generates the a-algebra a(X
t
: t 0). So Theorem 6.6.1 justifies (a precise
version) of this claim. The point about right-continuity is that without
such an assumption an event such as
{X
t
= i for some t > O}
which might reasonably be considered to depend on is not nec-
essarily measurable with respect to a(X
t
: t 0). An argument given in
6.6 Uniqueness of probabilities and independence of a-algebras 231
Section 2.2 shows that this event is measurable in the right-continuous case.
We conclude that, without some assumption like right-continuity, general
continuous-time processes are unreasonable.
Consider now the method of describing a minimal right-continuous process
(X_t)_{t≥0} via its jump process (Y_n)_{n≥0} and holding times (S_n)_{n≥1}. Let
us take F = σ(X_t : t ≥ 0). Then Lemmas 6.5.1 and 6.5.2 show that (Y_n)_{n≥0}
and (S_n)_{n≥1} are F-measurable. Thus G ⊆ F, where

    G = σ((Y_n)_{n≥0}, (S_n)_{n≥1}).

On the other hand, for all i ∈ I,

    {X_t = i} = ⋃_{n≥0} ({J_n ≤ t < J_{n+1}} ∩ {Y_n = i}) ∈ G,

so also F ⊆ G.
A useful π-system generating G is given by sets of the form

    B = {Y_0 = i_0, ..., Y_n = i_n, S_1 > s_1, ..., S_n > s_n}.

Our jump chain/holding time definition of the continuous-time chain
(X_t)_{t≥0} with initial distribution λ and generator matrix Q may be read
as stating that, for such events,

    P(B) = λ_{i_0} π_{i_0 i_1} ⋯ π_{i_{n-1} i_n} e^{-q_{i_0} s_1} ⋯ e^{-q_{i_{n-1}} s_n}.

Then, by Theorem 6.6.1, this definition determines P on G and hence on F.
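The product formula for P(B) can be checked empirically by simulating the jump chain and holding times. The Python sketch below does this for a small made-up Q-matrix, comparing a Monte Carlo estimate of P(Y_0 = i_0, Y_1 = i_1, S_1 > s_1, S_2 > s_2) with the product λ_{i_0} π_{i_0 i_1} e^{-q_{i_0} s_1} e^{-q_{i_1} s_2}; the Q-matrix, initial distribution, event and sample size are all illustrative choices, not taken from the text.

    import numpy as np

    rng = np.random.default_rng(1)

    # Hypothetical generator matrix Q on three states (rows sum to zero).
    Q = np.array([[-2.0,  1.0,  1.0],
                  [ 3.0, -5.0,  2.0],
                  [ 1.0,  4.0, -5.0]])
    q = -np.diag(Q)                      # q_i, the rate of leaving state i
    Pi = Q / q[:, None]                  # jump matrix: pi_ij = q_ij / q_i off the diagonal
    np.fill_diagonal(Pi, 0.0)
    lam = np.array([0.5, 0.3, 0.2])      # initial distribution

    i0, i1, s1, s2 = 0, 1, 0.4, 0.1      # the event B = {Y_0=i0, Y_1=i1, S_1>s1, S_2>s2}

    # Exact value given by the jump chain / holding time definition.
    exact = lam[i0] * Pi[i0, i1] * np.exp(-q[i0] * s1) * np.exp(-q[i1] * s2)

    # Monte Carlo: Y_0 ~ lam, S_1 ~ E(q_{Y_0}), Y_1 ~ Pi[Y_0, .], S_2 ~ E(q_{Y_1}).
    N = 200_000
    hits = 0
    for _ in range(N):
        y0 = rng.choice(3, p=lam)
        t1 = rng.exponential(1.0 / q[y0])
        y1 = rng.choice(3, p=Pi[y0])
        t2 = rng.exponential(1.0 / q[y1])
        hits += (y0 == i0 and y1 == i1 and t1 > s1 and t2 > s2)

    print(exact, hits / N)               # the two values agree up to Monte Carlo error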
Finally, we consider the strong Markov property, Theorem 6.5.4. Assume
that (X_t)_{t≥0} is Markov(λ, Q) and that T is a stopping time of (X_t)_{t≥0}. On
the set Ω' = {T < ζ} define X'_t = X_{T+t} and let F' = σ(X'_t : t ≥ 0); write
(Y'_n)_{n≥0} and (S'_n)_{n≥1} for the jump chain and holding times of (X'_t)_{t≥0},
and set

    G' = σ((Y'_n)_{n≥0}, (S'_n)_{n≥1}).

Thus F' and G' are σ-algebras on Ω', and coincide by the same argument as
for F = G. Set

    B' = {Y'_0 = i_0, ..., Y'_n = i_n, S'_1 > s_1, ..., S'_n > s_n}.

Then the conclusion of the strong Markov property states that

    P(B' | T < ζ, X_T = i) = P_i(B),

with B as above, and that

    P(C ∩ A | T < ζ, X_T = i) = P(C | T < ζ, X_T = i) P(A | T < ζ, X_T = i)

for all C ∈ F' and A ∈ F_T. By Theorem 6.6.3 it suffices to prove the
independence assertion for the case C = B', which is what we did in the
proof of Theorem 6.5.4.
Further reading
We gather here the references for further reading which have appeared in
the text. This may provide a number of starting points for your exploration
of the vast literature on Markov processes and their applications.
J. Besag, P. Green, D. Higdon and K. Mengersen, Bayesian computation
and stochastic systems, Statistical Science 10 (1) (1995), 3-40.
K.L. Chung, Markov Chains with Stationary Transition Probabilities,
Springer, Berlin, 2nd edition, 1967.
P.G. Doyle and J.L. Snell, Random Walks and Electrical Networks, Carus
Mathematical Monographs 22, Mathematical Association of America,
1984.
W.J. Ewens, Mathematical Population Genetics, Springer, Berlin, 1979.
D. Freedman, Markov Chains, Holden-Day, San Francisco, 1971.
W.R. Gilks, S. Richardson and D.J. Spiegelhalter, Markov Chain Monte
Carlo in Practice, Chapman and Hall, London, 1996.
T.E. Harris, The Theory of Branching Processes, Dover, New York, 1989.
F.P. Kelly, Reversibility and Stochastic Networks, Wiley, Chichester, 1978.
D. Revuz, Markov Chains, North-Holland, Amsterdam, 1984.
B.D. Ripley, Stochastic Simulation, Wiley, Chichester, 1987.
L.C.G. Rogers and D. Williams, Diffusions, Markov Processes and Martingales, Vol 1: Foundations, Wiley, Chichester, 2nd edition, 1994.
S.M. Ross, Applied Probability Models with Optimization Applications,
Holden-Day, San Francisco, 1970.
D.W. Stroock, Probability Theory - An Analytic View, Cambridge University Press, 1993.
H.C. Tijms, Stochastic Models - an Algorithmic Approach, Wiley, Chichester, 1994.
D. Williams, Probability with Martingales, Cambridge University Press,
1991.
Index
absorbing state 11, 111
absorption probability 12, 112
action 198
adapted 129
alleles 175
aperiodicity 40
average number of customers 181
backward equation 62, 96
Bayesian statistics 211
biological models 6, 9, 16, 82, 170
birth process 81
infinitesimal definition 85
jump chain/holding time definition
85
transition probability definition 85
birth-and-death chain 16
boundary 138
boundary theory for Markov chains
147
branching process 171
with immigration 179
Brownian motion 159
as limit of random walks 164
existence 161
in R^d 165
scaling invariance 165
starting from x 165
transition density 166
busy period 181
capacity 151
central limit theorem 160
charge 151
closed class 11, 111
closed migration process 184
communicating class 11, 111
conditional expectation 129
conductivity 151
continuous-time Markov chains 87
construction 89
infinitesimal definition 94
jump chain/holding time definition
94, 97
transition probability definition
94, 97
continuous-time random process 67
convergence to equilibrium 41, 121,
168
countable set 217
coupling method 41
current 151
detailed balance 48, 124
discounted value function 202
distribution 1
E, expectation 222
E(λ), exponential distribution of parameter λ 70
e^Q, exponential of Q 62
effective conductivity 156
electrical network 151
energy 154
epidemic model 173
equilibrium distribution 33, 117
ergodic theorem 53, 126, 168
Erlang's formula 183
excursions 24
expectation 222
expected hitting time 12, 113
expected return time 37, 118
explosion
for birth processes 83
for continuous-time chains 90
explosion time 69
explosive Q-matrix 91
exponential distribution 70
F_n, filtration 129
fair price 135
filtration 129
finite-dimensional distributions 67
first passage decomposition 28
first passage time 19, 115
flow 151
forward equation 62, 100
for birth processes 84
for Poisson process 78
Fubini's theorem 223
discrete case 219
full conditional distributions 212
fundamental solution 145
γ_i^j, expected time in i between visits to j 35
Galton-Watson process 171
gambling 131
Gaussian distribution 160
generator 166
generator matrix 94, 97
Gibbs sampler 210
gravity 134, 169
Green matrix 144, 145
harmonic function 146
Hastings algorithm 210
hitting probability 12
hitting time 12, 111
holding times 69
I, state-space 2
infective 173
integrable 129, 222
integral form of the backward
equation 98
integral form of the forward equation
101
inter-arrival times 180
invariant distribution 33, 117
computation of 40
irreducibility 11, 111
Ising model 214
jump chain 69
jump matrix 87
jump times 69
last exit time 20
long-run proportion of time 53, 126
m_i, expected return time 37, 118
μ_i^j, expected time in i between visits to j 118
Markov(λ, P) 2
Markov(λ, Q) 94, 97
Markov chain
continuous-time 88
discrete-time 2
Markov chain Monte Carlo 206, 208
Markov decision process 197
expected total cost 198
expected total discounted cost 202
long-run average costs 204
Markov property 3
for birth processes 84
for continuous-time chains 93
for Poisson process 75
martingale 129, 141, 176, 204
associated to a Markov chain 132
associated to Brownian motion
169
matrix exponentials 105
maximum likelihood estimate 56
measure 1
memoryless property 70
Metropolis algorithm 210
minimal non-negative solution 13
minimal process 69
monotone convergence 223
discrete case 219
Moran model 177
mutation 6, 176
non-minimal chains 103
null recurrence 37, 118
o(t),O(t), order notation 63
Ohm's law 151
open migration process 185
optional stopping theorem 130
P, probability 222
P, transition matrix 2
P̂, transition matrix of reversed chain 47
P(t), transition semigroup 96
P̂(t), semigroup of reversed chain 124
p_ij^(n), n-step transition probability 5
Π, jump matrix 87
π-system 228
Poisson process 74
infinitesimal definition 76
jump chain/holding time definition
76
transition probability definition 76
policy 198
policy improvement 201
population genetics 175
population growth 171
positive recurrence 37, 118
potential 138
associated to a Markov chain 138
associated to Brownian motion
169
gravitational 134
in electrical networks 151
with discounted costs 142
potential theory 134
probability generating function 171
probability measure 222
probability space 222
Q-matrix 60
Q̂, generator matrix of reversed chain 124
q_i, rate of leaving i 61
q_ij, rate of going from i to j 61
queue 179
M/G/1 187
M/G/∞ 191
M/M/1 180
M/M/s 182
queueing network 183-185
queues in series 183
random chessboard knight 50
random walk
on Z^d 29
on a graph 49
recurrence 24, 114, 167
recurrence relations 57
reflected random walks 195
reservoir model 194, 195
resolvent 146
resource management 192
restocking a warehouse 192
return probability 25
reversibility 48, 125
right-continuous process 67
ruin
gambler 15
insurance company 196
selective advantage 176
semigroup 96
semigroup property 62
service times 180
shopping centre 185
simple birth process 82
simulation 206
skeleton 122
state-space 1
stationary distribution 33, 117
stationary increments 76
stationary policy 198
statistics 55, 211, 215
stochastic matrix 2
stopping time 19
strong law of large numbers 52
strong Markov property 19, 93, 227
success-run chain 38
susceptible 173
telephone exchange 183
texture parameter 216
time reversal 47, 123
transience 24, 114, 167
transition matrix 2
irreducible 11
maximum likelihood estimate 56
transition semigroup 165
truncated Poisson distribution 183
unit mass 3
V_i(n), number of visits to i before n 53
valency 50
value function 198
weak convergence 164
Wiener process 159
Wiener's theorem 161
Wright-Fisher model 175
ζ, explosion time 69

. Theorem 1.2 and 1. Theorem 1. Suppose we set Ai = IF(X = i) = IF( {w : X(w) = i}).5. Part of that understanding will come from familiarity with examples. We work'throughout with a probability space (0.4. lP). If in addition the total mass EiEI Ai equals 1. Theorem 1.3. so a large number are worked out in the text. F. Theorem 1. Theorem 1.10.2 on longrun averages.9.3 characterizing recurrence and transience.2 on the strong Markov property.8. Once you understand these you will understand the basic theory.3 on reversibility. Recall that a random variable X with values in I is a function X : 0 --+ I. and Theorem 1. Each i E I is called a state and I is called the state-space.3. Discrete-time Markov chains are defined and their behaviour is investigated. Exercises at the end of each section are an important part of the exposition.1 Discrete-time Markov chains This chapter is the foundation for all that follows. For better orientation we now list the key theorems: these are Theorems 1. then we call A a distribution. 1.7 on invariant distributions and positive recurrence.1 Definition and basic properties Let I be a countable set.3 on convergence to equilibrium.5 on hitting times.7. We say that A = (Ai: i E I) is a measure on I if 0 ~ Ai < 00 for all i E I.

There is a brief review of some basic facts about countable sets and probability spaces in Chapter 6. .. We say that a matrix P == (Pij : i.. for n 2: 0 and io.. and it is the key to some later calculations. .. A discrete-time random process Markov(A. . then we again say (Xn)O~n~N is Markov (A.Xn . conditional on X n == i. P) if and only if for all io. We say that (Xn)n~O is a Markov chain wit'h initial distribution A and transition matrix P if (i) X o has distribution A.. P) for short.. We think of X as modelling a random state which takes the value i with probability Ai. . X n + 1 has distribution (Pij : j E I) and is independent of X o. Discrete-time Markov chains Then A defines a distribution.2 1.in+l E I. Here are two examples: P- (I-a (3 1 ~ /3) 1 p= ( ~ 1/2 1 1/2 0 1/2 1~2 ) 3 1 2 2 We shall now formalize the rules for a Markov chain by a definition in terms of the corresponding matrix P. There is a one-to-one correspondence between stochastic matrices P and the sort of diagrams described in the Introduction. (i) P (X 0 == io) == Aio.1. following result appears to give a more comprehensive description. j E I) is stochastic if every row (Pij : j E I) is a distribution. . . It is in terms of properties (i) and (ii) that most real-world examples are seen to be Markov chains.N .1 .iN E I (Xn)O<n<N is . (ii) for n ~ 0..Xn == in) == Pi n i n +1 • We say that (Xn)n~O is Markov (A. More explicitly. If (Xn)O~n~N is a finite sequence of random variables satisfying (i) and (ii) for n == 0.1. . the distribution of X.. Theorem 1. P)..·· . (ii) P(Xn + 1 == i n + 1 I X o == io. .1. But mathematically the. these conditions state that.

.N .1. ·Pirn+n-lirn+nlP(A I Xm = i) (1. .· . P). . . ... P(Xo = io) = Aio and.1 and. . for n 0..Xm+n = im+n } n A I X m = i) Xo~. lP(XN = iN I X o = io.XN = iN) = P(Xo = iO)P(XI = il I X o = io) . P).Xn = in) = P(Xo = i o.. Theorem 1. by induction for all n = 0.... . if (1. .Xn = in) So (Xn)O~n~N is Markov(A.··· .Xm .. Then. ..Xm = i m }. .Xn = in. P(Xn+ 1 = i n+ 1 I X o = io..1.XN. then by summing both sides over iN E I and using EjEI Pij = 1 we see that (1.1) holds for N . . First consider the case of elementaryevents A = {X o = i o. We have to show that for any event A determined by we have lP({Xm = im. X n+1 = in+I)/P(XO = i o. .. conditional on X m = i.. P) and is independent of the random variables X o. .1 Definition and basic properties 3 Proof.. . ..1.1.2 (Markov property). . .... where 8ij ={ ° I ifi=j otherwise. D The next result reinforces the idea that Markov chains have no memory. . P).1. In particular.. . Let (Xn)n~O be Markov(A. (Xm+n)n~O is Markov(8 i . .1.XI = i l .1.N..· . Proof.1) holds for N..I = iN-I) On the other hand. Suppose (Xn)O~n~N is Markov(A. then P(Xo = iO. We write 8i = (8 ij : j E I) for the unit mass at i.Xm = 8iirnPirnirn+l .2) then the result follows by Theorem 1.

.· . k=l 00 Then the desired identity (1. In general. P). In the case where Ai > we shall write Pi(A) for the conditional probability P(A I X o = i). . . We regard distributions and measures A as row vectors whose components are indexed by I.1.2) for A follows by summing up the corresponding identities for A k . (i) P(Xn = j) = (Apn)j. under lPi.. ..2. P). just as P is a matrix whose entries are indexed by I x I. Theorem 1. .1.Xm may be written as a countable disjoint union of elementary events A= U Ak. for all n. D The remainder of this section addresses the following problem: what is the probability that after n steps our Markov chain is in a given state~ First we shall see how the problem reduces to calculating entries in the nth power of the transition matrix. So the behaviour of (Xn)n~O under lPi does not depend on A. We agree that pO is the identity matrix I. any event A determined by X o. jEI We define pn similarly for any n.Xm+n = i m+n and i = im)/P(Xm = i) = biirnPi rn irn+l X Pirn+n-l i rn + n lP(Xo = io. where (I)ij = 8ij .. Then.N. m ~ 0. (ii) lP\(Xn = j) = JP(Xn + m = j I X m = i) = p~j) · . . By the Markov property at time m = 0.4 1. matrix multiplication is a familiar operation.1. The context will make it clear when I refers to the state-space and when to the identity matrix. (Xn)n~O is Markov(8i .Xm = i m and i = im)/lP(Xm = i) which is true by Theorem 1. We write p~j) = (pn)ij for the (i. (p2)ik = LPijPjk.. Let (Xn)n~O ° be Markov(A. . For these objects. defining a new measure AP and a new matrix p 2 by (AP)j = L iEI AiPij.3. j) entry in pn. Then we shall look at some examples where this may be done explicitly. then A will be an N-vector and P an N x N-matrix. When I is finite we will often label the states 1. Discrete-time Markov chains In that case we have to show P(Xo = i o. We extend matrix multiplication to the general case in the obvious way.

L L .4 The most general two-state chain has transition matrix of the form p = ( I-a {3 and is represented by the following diagram: {3 We exploit the relation p n+ I = pn P to write PII (n+I) _ - PI2 (n){3 + PII (1 _ a ) . D = i.. .1.. X n = j) AioPioil" · Pin-Ii = (Apnk in-lEI ioEI in-lEI (ii) By the Markov property.11): (n) PII = {_(3_ + a + {3 1 _a_(l_ a a + {3 (3)n for a for a + {3 > 0 + {3 = O. (n) We also know that pi~) + pi~) = IP\ (Xn = 1 or 2) = 1. Example 1.a ..I = in-I.··· . conditional on X m (8i . (Xm+n)n~O is Markov In light of this theorem we call p~j) the n-step transition probability from i to j. The following examples give some methods for calculating p~j). P).1 Definition and basic properties Proof. (n) This has a unique solution (see Section 1.Xn. L P(Xo = i o. so we just take A = 8i in (i).(3) PII + {3 . (i) By Theorem 1. so by eliminating pi~) we get a recurrence relation for pi~): PII (n+I) - (1 .1 5 P(Xn = j) = = ioEI L ·.1.1.

which is chosen at random.~)n N N N-1· Beware that in examples having less symmetry.a. At any time a transition is made from the initial state to another with probability a.) (1.1. this sort of lumping together of states may not produce a Markov chain. Pij = a / (N - 1) for i =I j.1). in this example there is a much simpler approach.1) and by putting (3 = a/(N .4 we find that the desired probability is -!.6 Consider the three-state chain with diagram 1 3 2 1 2 .1. Then the answer we want would be found by computing pi~). or with probability a mutates to another strain. + (1. In fact. with N x N transition matrix P given by Pii = 1 .1.1) in Example 1.6 1. Example 1. Thus we have a two-state chain with diagram a/(N . and a transition from another state to the initial state with probability a/(N . What is the probability that the strain in the nth generation is the same as that in the Oth? We could model this process as an N-state chain.5 (Virus mutation) Suppose a virus can exist in N different strains and in each generation either stays the same.-!. which relies on exploiting the symmetry present in the mutation rules. Discrete-time Markov chains Example 1.

2 ( . -i/2 and from this we deduce that pi~) has the form Pu (n) =a+b 2 . - 1(3 4 . band c.1 Definition and basic properties and transition matrix 7 p= (~ ~ I)· = ~(x . that is.)n + c (")n ~ ~ for some constants a. The eigenvalues are 1. (3 and ~: 1= pi~) = Q p(2) 11 - o = pii) O- + (3 = + ~l' Q r\J L. First we compute the eigenvalues of P by writing down its characteristic equation o = det (x - P) = x(x - ~)2 .~ + 1). (3 aDd~.) The answer we want is real and (±~)n = (~)ne±in~/2= (~)n (cosn 7r ±isin n 7r ) 2 so it makes sense to rewrite pi~) in the form for constants Q. so we get equations to solve for Q.1)(4x 2 The problem is to find a general formula for pi~).1. i/2. The first few values of pi~) are easy to write down. (The justification comes from linear algebra: having distinct eigenvalues. P is diagonalizable.(. for some invertible matrix U we have -i/2 and hence ~ ) U- 1 p n = U 1 (00 (i/~t o ~) U(-i/2)n 2 1 which forces pi~) to have the form claimed.

1. Exercises 1. 1. (iii) As roots of a polynomial with real coefficients. as in the example. + aMAM for some constants al. Let YI . If Y n = Xkn.) has the form Pij (n) _ - alAI \n \n + . ..1. . (3 = 4/5. .8 SO Q: 1. uniformly distributed on [0. Y2 . ~ = -2/5 and More generally.. Can all Markov chains be realized in this way? How would you simulate a Markov chain using a computer? .. show that (Yn)n~O is Markov (A.1 Let B I .3 Let X o be a random variable with values in a countable set I. Suppose we are given a function G : I x [0.) for any M-state chain and any states i and j. • •• be disjoint events with U~l B n = O. Deduce that if X and Yare discrete random variables then the following are equivalent: (a) X and Yare independent. be a sequence of independent random variables.. 1. say) then the general form includes the term (an+b)A n .. B 2 . . (ii) If the eigenvalues are distinct then p~. 1]. the following method may in principle be used to find a formula for p~.1. (i) Compute the eigenvalues AI.2 Suppose that (Xn)n~O is Markov (A. 1] and define inductively --+ I Show that (Xn)n~O is a Markov chain and express its transition matrix P in terms of G. If an eigenvalue A is repeated (once. Discrete-time Markov chains = 1/5. (b) the conditional distribution of X given Y = y is independent of y. p k ). .. Show that if A is another event and P(AIBn ) = P for all n then P(A) = p. . P). .AM of P by solving the characteristic equation.aM (depending on i and j). complex eigenvalues will come in conjugate pairs and these are best written using sine and cosine.

all other scores having probability 1/5. are independent. What is the probablity that it is in state 1 just before the (n+l)th trial? What is the probability Pn+1 (A) that it chooses A on the (n + 1)th trial ? . In each of the following cases determine whether (Xn)n~O is a Markov chain: (a) X n = Zn.6 An octopus is trained to choose object A from a pair of objects A. X n.1.. By considering the relationship between the two dice find the value of p for the new die.. all other scores having equal probability.1. (d)X n = (Sn. (b) X n = Sn. in state 3 it remembers and chooses A and never forgets.1. What is the probability that after n hops this second flea is back where it started? [Recall that e±i7r/6 = V3/2 ± i/2. + Sn. 1. A second flea also hops about on the vertices of a triangle.p. and in the cases where it is not a Markov chain give an example where P(Xn+1 = ilXn = j.. So + .] 1. Sn = Z1 + . . Set So = 0. B by being given repeated trials in which it is shown both and is rewarded with food if it chooses A. If the first score fS 6. 1. with all jumps equally likely. . what is the probability p that the nth score is 6? What is the probability that the nth score is I? Suppose now that a new die is produced which cannot score one greater (mod 6) than the preceding score. in state 2 it remembers and chooses A but may forget again. Find the probability that after n hops the flea is back where it started.. After each tr·ial it may change its state of mind according to the transition matrix State 1 State 2 State 3 ~ ~ 0 5 12 ~ 0 l2 0 1...1. but this flea is twice as likely to jump clockwise as anticlockwise. + Zn.. (c) X n = So + . Z1. + Sn).1 Definition and basic properties 9 Suppose now that Zo. It is in state 1 before the first trial. identically distributed random variables such that Zi = 1 with probability p and Zi = 0 with probability 1 .4 A flea hops about at random on the vertices of a triangle. The octopus may be in one of three states of mind: in state 1 it cannot remember which object is rewarded and is equally likely to choose either. In the cases where (Xn)n~O is a Markov chain find its state-space and transition matrix.1 = k) is not independent of k.5 A die is 'fixed' so that each time it is rolled the score cannot be the same as the preceding score.

in with io = i and in = j. 1.7 Let (Xn)n~O be a Markov chain on {1.in-l Pii 1 P i li2 . It is sometimes possible to break a Markov chain into smaller pieces. each of which is relatively easy to understand. . Calculate P(Xn = 11Xo = 1) in each of the following cases: (a) p (b) p = 1/6. and which together give an understanding of the whole.. L p~j) n=O 00 which proves the equivalence of (i) and (iii). We say that i leads to j and write i ~ j if ~ Pi(Xn = j for some n We say i communicates with j and write i 0) > o. lPi(Xn =j for some n ~ 0) ::.P 0 ) 1/3 0 . (iii) p~j) > 0 for some n ~ O. P i oi l P i li2 . (c) p = 1/12. Observe that p~j) ::. 0 .1. This is done by identifying the communicating classes of the chain. Discrete-time Markov chains Someone suggests that the record of successive choices (a sequence of As and Bs) might arise from a two-state Markov chain with constant transition probabilities..2 Class structure = 1/16. Also p~j) = L il .. ~ ~ j if both i j and j ~ i.3} with transition matrix 1 2/3 1. .. Discuss. 1.. Pin-li n > 0 for some states io.2.10 1. Theorem 1.1..il. whether this is possible.. For distinct states i and j the following are equivalent: (i) i (ii) ~ j. ·Pin-d so that (ii) and (iii) are equivalent.. with reference to the value of Pn+ 1 (A) that you have found.2. Proof. .

Thus a closed class is one from which there is no escape.1 Identify the communicating classes of the following transition matrix: 2 0 0 0 2 1 1 0 2 0 :4 0 1 1 P= 0 0 1 :4 0 1 0 2 0 :4 0 1 1 2 0 0 :4 1 2 1 1 Which classes are closed? .1. the class structure of a chain is very easy to find. Exercises 1. and thus partitions I into communicating classes. We say that a class C is closed if i E C. So ~ satisfies the conditions for an equivalence relation on I. As the following example makes clear. when one can draw the diagram.2. {4} and {5. with only {5.2. Also i ~ i for any state i.2 Find the communicating classes associated to the stochastic matrix 2 0 1 P= 3 1 0 0 0 2 0 0 0 0 0 1 0 1 0 0 0 0 0 0 3 1 1 0 0 3 1 1 2 0 0 2 0 1 0 0 0 0 1 0 The solution is obvious from the diagram 1 4 2 6 the classes being {1. The smaller pieces referred to above are these communicating classes. A chain or transition matrix P where I is a single class is called irreducible. i ~ j imply j E C.3 Hitting times and absorption probabilities 11 It is clear from (ii) that i ~ j and j ~ k imply i ~ k.3}. A state i is absorbing if {i} is a closed class.6} being closed.2.6}. Example 1.

12 1.3 Hitting times and absorption probabilities Let (Xn)n>O be a Markov chain with transition matrix P. The hitting time of a subset A of I is the random variable H A : n ~ {O. hf is called the absorption probability.2. Example 1. . } U {oo} given by HA(w) = inf{n ~ 0 : Xn(w) E A} where we agree that the infimum of the empty set 0 is starting from i that (Xn)n~O ever hits A is then 00. these quantities can be calculated explicitly by means of certain linear equations associated with the transition matrix P. Before we give the general theory.. The mean time taken for (Xn)n~O to reach A is given by kt = lEi(H A ) = 2: nJP>(H n<oo A = n) + ooJP>(H A = (0).2 Show that every transition matrix on a finite state-space has at least one closed communicating class.1.1 Consider the chain with the following diagram: 1 1 1 • 1 2 E 2 • • 2 41( 2 1 3 • 2 ~ 4 • Starting from 2.3.2. Find an example of a transition matrix with no closed communicating class.. what is the probability of absorption in 4? How long does it take until the chain is absorbed in 1 or 4? Introduce . here is a simple example. We shall often write less formally Remarkably. Discrete-time Markov chains 1. The probability When A is a closed class. 1.

(1. k2 = 1 + ~k3 = 1 + ~(1 + ~k2). Suppose now that we start at 2. A.2. in assuming that the chain begins afresh from its new position after the first jump. hI = 0.3. then H A so hf = 1. If X o = i E A. Note that in writing down the first equations for h2 and k2 we made implicit use of the Markov property.1. So The 1 appears in the second formula because we count the time for the first step. Here is a general result for hitting probabilities. Theorem 1. jEI jEI A . starting from 2. the probability of hitting 4 is 1/3 and the mean time to absorption is 2. and hf = lP\(H A < 00) = LJPi(H A < oo. then H A ~ 1. If X o = i fj. Hence h2 = ~ h 3 = ~ ( ~ h 2 + ~).3 Hitting times and absorption probabilities 13 Clearly. A. So.) i E I) is another solution with 0 Proof. The vector of hitting probabilities h A = (hf : i E I) is the minimal non-negative solution to the system of linear equations hf = 1 { hf = EjEI Pij h (Xi: for i E A 1 for i fj. and consider the situation after making one step.XI = j) jEI = LJPi(H < 00 I Xl = j)JPi(XI = j) = LPijhf. First we show that h A satisfies (1. then Xi ~ hi for all i. h4 = 1 and k l = k4 = o.3) Xi ~ (Minimality means that if x = for all i. With probability 1/2 we jump to 1 and with probability 1/2 we jump to 3. Similarly.3). so by the Markov property = 0.

. Suppose i ¢ A.. Of course.14 1. A. .1 (continued) The system of linear equations (1.. X n E A) + L jl~A . Discrete-time Markov chains Suppose now that X = (Xi: i E I) is any solution to (1. X 2 X E A) + '2: '2: PijPjkXk· j~A k~A By repeated substitution for Xi = JP>i(XI E A) in the final term we obtain after n steps fj. so is the last term on the right. + JP>i(XI jn~A fj. and the remaining terms sum to JP>i(H A ~ n).. then Xi Substitute for Xi Xj = Xi = 1 = '2:PijXj jEI = '2:Pij jEA + '2:PijXj.. h4 so that h3 = ~h2 + ~h4 and The value of hI is not determined by the system (1. so we recover h 2 = 1/3 as before.. So Xi ~ JP>i(H A ~ n) for all n and then Xi ~ lim JP>i(H A ~ n) n--+-oo = JP>i(H A < 00) = hi. L PiiIPjli2 · · · Pjn-lin Xjn' Now if X is non-negative.3) for h = h{4} are given here by = 1. h2 = ~hl + ~h3.3).. Then hf for i E A.Xn I + .3). j~A to obtain = LPij jEA + LPij(LPjk + j~A LPjkXk) k~A kEA = JPli(X1 E A) + JPli(X1 ¢ A.3. D Example 1. but the minimality condition now makes us take hI = 0. A. the extra boundary condition hI = 0 was obvious from the beginning .

Pi. so the minimal nonq the recurrence relation .3. (See Section 1.2. Pi. In cases where the state-space is infinite it may not be possible to write down a corresponding extra boundary condition. Then.. £1 at a time. But what is the probability that you leave broke? Set hi = IPi(hit 0). . The transition probabilities are Poo = 1.) If P < q. so there is no upper limit to your fortune.q < 1.. then h is the minimal non-negative solution to h o = 1.i-l = q.. then the restriction 0 ~ hi ~ 1 forces B = 0. the minimality condition is essential...11. q 1( p o • I( 1 i i +1 •• where 0 < P = 1 . as we shall see in the next examples. if p has a general solution hi = A+ Bi ~ = 0.. q I( p . If p > q.1. for i = 1. with probability P of doubling your stake and probability q of losing it.i+l = P Imagine that you enter a casino with a fortune of £i and gamble. hi = ph i+1 + qhi . then since h o = 1 we get a family of solutions for a non-negative solution we must have A negative solution is hi = (q / p) i. The resources of the casino are regarded as infinite. for i = 1. Example 1.1 . so hi = 1 for all i..3 Hitting times and absorption probabilities 15 so we built it into our system of equations and did not have to worry about minimal non-negative solutions.. Finally.. which is the case in most successful casinos.3 (Gamblers' ruin) Consider the Markov chain with diagram q p • . If p =I q this recurrence relation has a general solution hi = A +B (~) i .2. .

. Pi being the probability that we get a birth before a death in a population of size i..3....• t i+l qi+1 Pi+1 where. +~i-I)~O for all i. (I---.. Thus. + Ui so where A = UI and ~o = 1. -. This recurrence relation has variable coefficients so the usual technique fails. recorded each time it changes. 0 is an absorbing state and we wish to calculate the absorption probability starting from i... In the case L::o ~i = 00.t(-. you are certain to end up broke.. . Such a chain may serve as a model for the size of a population..... even if you find a fair casino.I . hi = Pihi+1 + qihi-I. . the restriction 0 ~ hi ~ 1 forces A = 0 and hi = 1 for all i. As in the preceding example.~ ql PI qi Pi ---.... . we have 0 < Pi = 1 . But consider Ui = h i .. At this point A remains to be determined. for i = 1. for i = 1..4 (Birth-and-death chain) Consider the Markov chain with diagram o ..t(~--- 1 i ..hi + . But here we allow Pi and qi to depend on i..2...hi. This apparent paradox is called gamblers' ruin. Then hi = IPi(hit 0) is the extinction probability starting from i.. then PiUi+1 = qiUi.16 1.2. so hi = 1 for all i.. But if L::o ~i < 00 then we can take A > 0 so long as l-A(~o+ ..qi < 1.. so where the final equality defines UI ~i........ Then = h o .. Discrete-time Markov chains and again the restriction 0 ~ hi ~ 1 forces B = 0. Example 1.---.-••.. We write down the usual system of equations ho = 1. . ..

= 0. First we show that k A satisfies (1. then H A ~ 1.3 Hitting times and absorption probabilities 17 Thus the minimal non-negative solution occurs when A = then (E:o Ii) -1 and In this case. by the Markov property.. where H A is the first time (Xn)n~O hits A.4). we have hi < 1. + lP'i(H A ~ n) + L jl~A .1.4). 'Pjn-dnYjn' . If X o = i E A.4) Proof. then Yi kf = Yi = 0 = 1 + LPijYj j~A = 1 + LPij j~A (l + LPjkYk) k~A = lP'i(H A ~ 1) + lP'i(H A ~ 2) + L L PijPjkYk· j~A k~A By repeated substitution for Y in the final term we obtain after n steps Yi = lP'i(H A ~ 1) + . so. Recall that kf = Ei(H A ).3. Then for i E A. (1. so the population survives with positive probability. and kt = Ei(H A ) = = LEi(H JEI A L JEI E i(H A 1X1 =j) I Xl = j)JP\(XI = j) = 1 + LPijkf· j~A Suppose now that Y = (Yi : i E I) is any solution to (1. The vector of mean hitting times k A = (k A : i E I) is the minimal non-negative solution to the system of linear equations kf { kf = 0 = 1 for i E A + Ej~A pijkf for i f/: A. then HA so kf = o. for example.. Here is the general result on mean hitting times. so.. We use the notation 1B for the indicator function of B.2... for i = 1. . If X o = i f/: A. If i f/: A. 1x1 ==j is the random variable equal to 1 if Xl = j and equal to 0 otherwise. Theorem 1. .. L jn~A PiilPilh .5..

and his stake is returned. Yi 2: Wi(H A 2: 1) and. The gambler decides to use a bold strategy in which he stakes all his money if he has £5 or less.. + .3 A simple game of 'snakes and ladders' is played on a board of nine squares. letting n ----* 00. + Wi(H A = 2: n) Yi 2': Exercises L IP\(H n==l A 2': n) JEi(H A ) = Xi- D 1. Pro~e that the What is the expected number of tosses until the gambler either achieves his aim or loses his capital? 1. to £10. he wins a sum equal to his stake.3.3.2 A gambler has £2 and needs to increase it to £10 in a hurry.3. 1.1 Prove the claims (a).18 1. otherwise he loses his stake. Discrete-time Markov chains So. if he wins. if y is non-negative. and otllerwise stakes just enough to increase his capital. Let X o == 2 and let X n be his capital after n throws. (b) and (c) made in example (v) of the Introduction. if a player bets on the right side. He can play a game with the following rules: a fair coin is tossed.. gambler will achieve his aim with probability 1/5. .

i+l = i + ( -i- 1) 2 Pi..Xn for n = 0. 1. . but if you land at the head of a snake you slide down to the tail.. .1.. Xl. ..i-l. Pi. 1. } U {(X)} is called a stopping time if the event {T = n} depends only on X o .1 jumps straight to i.. A random variable T : n ~ {O. Pi. If you land at the foot of a ladder you climb to the top.2.3. the process after time m begins afresh from i. Intuitively.4. If asked to stop at T.. by watching the process. the process after time H .1 (a) The first passage time Tj = inf{n ~ 1 : Xn = j} is a stopping time because (b) The first hitting time H A of Section 1. instead of conditioning on X m = i.I? In this section we shall identify a class of random times at which a version of the Markov property does hold. This class will include H but not H -. This says that for each time m. we simply waited for the process to hit state i. you know when to stop. for example H . .. Show that if X o = 0 then the probability that X n ~ 1 for all n ~ 1 is 6/1T 2 • 1. at some random time H. .4 Strong Markov property 19 At each turn a player tosses a fair coin and advances one or two places according to whether the coin lands heads or tails.Xn l ~ A. conditional on X m = i.i-l = 1.Xn E A}. i ~ 1..i+l + Pi. Suppose.1 we proved the Markov property.4 Strong Markov property In Section 1.1.4 Let (Xn)n~O be a Markov chain on {O.3 is a stopping time because {H A = n} = {X o ~ A. . Examples 1. } with transition probabilities given by POI = 1. ... after all. so it does not simply begin afresh. you know at the time when T occurs. How many turns on average does it take to complete the game? What is the probability that a player who has reached the middle square will complete the game without slipping back to square I? 1..1. . What can one say about the process after time H? What if we replaced H by a more general random time.2.

The crucial point is that.XT+n = jn} n BIT < 00.. X T = i).... X T = i) to obtain JP>( {XT = jo. . D The following example uses the strong Markov property to get more information on the hitting times of the chain considered in Example 1. Xl = jl. . then B n {T = m} is determined by X o. so.Xm .. by the Markov property at time m lP( {XT = io. Xl. .3 Consider the Markov chain (Xn)n~O with diagram q p • ~ q I( • p ~ q i p o • I( I(. Theorem 1. conditional on T < 00 and X T = i. Discrete-time Markov chains (c) The last exit time LA = sup{n ~ 0 : X n E A} is not in general a stopping time because the event {LA whether (Xn+m)m~l visits A or not. .. Then. Example 1.4..2 (Strong Markov property).XT ..X I = jl. . P) and let T be a stopping time of (Xn)n~O. . .XI . and divide by lP(T < 00.Xn = jn)JP>(B I T < 00.. X T = i) = JP>i(XO = jo. XT+I = il. .. . . Xl.. .Xn = jn)JP>(B n {T = m} n {XT = i}) where we have used the condition T = m to replace m by T. Xl.3. 1..3.. then B n {T == m} is determined by X o. P) and independent of XO. Proof. for all m = 0.XT . X T+I = jl.20 1. if T is a stopping time and B ~ n is determined by X o. Xl.Xm .. Let (Xn)n~O be Markov(A. 2. (XT+n)n~O is Markov(8 i . .~ 1 i +1 . . . 2.XT+n = in} n B n {T = m} n {XT = i}) = JP>i(XO = io.XT.·. If B is an event determined by X o. == n} depends on We sllall show that the Markov property holds at stopping times. 1.4. . Now sum over m == 0.

So E 2(SHO) = E 2(sHl I HI < 00)E 2(sHO I HI < 00)JP>2(Hl < 00) = E2(sHI1Hl<OO)E2(sHO I HI < 00) = E 2(sHl)2 = ¢(s)2.4 Strong Markov property 21 where 0 < p = 1 . for 0 ~ s < 1 </>(s) = lEI (sHo) = 2: snIP (Ho = n). is independent of HI and has the (unconditioned) distribution of HI. conditional on Xl = 2.4pqs2)/2ps. we have H o = HI + H o. by the Markov property at time 1. 2 3 . So ¢(s) = El(sHo) = pEl (sHo I Xl = = pEl (sl+Ho I Xl = psE 2(sHo ) + qEl(sHo I Xl = 0) = 2) + qEl(s I Xl = 0) 2) + qs = ps¢(s)2 + qs.q < 1.. slPl(Ho = 1) + s2lP l (Ho = 2) + s3lP l (Ho = 3) + . 1 n<oo Suppose we start at 2. Then..1. where iio. the time taken after HI to get to 0.. Set H j = inf{n ~ 0 : X n = j} and.. the time taken after time 1 to get to 0.5) ¢ = (1 ± VI . Thus ¢ = ¢( s) satisfies and pS¢2 . ) } = qs +pq s = + . To recover the distribution of H o we expand the square-root as a power series: </>(s) = 2~S { 1 - (1 + !( -4pqs 2) + !(-lH _4pqs 2)2 /2! + .¢ + qs =0 (1. Since ¢(O) ~ 1 and ¢ is continuous we are forced to take the negative root at s = 0 and stick with it for all 0 ~ s < 1. we have H o = 1 + H o.. Here we obtain the complete distribution of the time to hit 0 starting from 1 in terms of its probability generating function.. has the same distribution as H o does under lP 2 . where H o. conditional on HI < 00. We know from Example 1.3 the probability of hitting 0 starting from 1. Apply the strong Markov property at !II to see that under lP 2.3..

··· .4pq = VI .. Let us assume that lP(Tm < 00) = 1 for all easily that T m. that P(Ym +1 = i m +1 I Yo = i o.5) to obtain 2ps¢¢' so + p¢2 - ¢' + q = 0 ¢'(s) = (p¢(S)2 + q)/(l - 2ps¢(s)) ~ 1/(1 .rn +l 't 't . See Example 5.1 for a connection with branching processes. 2.4 We now consider an application of the strong Markov property to a Markov chain (Xn)n~O observed only at certain times. For each m we can check J. for m = 0. On letting s lI]) i 1 we have ¢(s) ~ P1(Ho < 0 00). (Remember that q = 1- p.4p + 4p2 = 11 We can also find the mean hitting time using 2pl = 12q .. Ym = i m ) = P(XTrn + 1 = i m + 1 I X To = i o. So the i o. . so VI .) It is only worth considering the case p ~ q.2p) = l/(q .··· .11.. so .1. where the mean hitting time has a chance of being finite. Example 1.im+l E J.4. the time of the mth visit to strong Markov property applies to show.XTrn = i m) = Pi rn (XT 1 = i m +1 ) = p-... where To = inf{n ~ 0 : X n E J} and. for m. Discrete-time Markov chains The first few probabilities P1(Ho = 1). In the first instance suppose that J is some subset of the state-space I and that we observe the chain only when it takes values in J.22 1. 1. The resulting process (Ym)m~O may be obtained formally by setting Ym = X Trn . . T m + 1 = inf{n > Tm : X n E J}. rn . .P 1(Ho = 2). is a stopping time.p) as s i 1.. are readily checked from first principles.1rl (Ii < 00 ) _ 1- VI - 4pq _ { 1 if p ~ q 2p q/p if p > q. Differentiate (1. .

1. Y2 .. by the strong Markov property lP(Zm+l = im+l I Zo = io. Again the random times 8 m for m ~ 0 are stopping times and... for i =I j Pij = Pij/ LPik.. for i. ••• be independent identically distributed random variables with lP(Y1 = 1) = lP(Y1 = -1) = 1/2 and set X o = 1.1.. 1. Let us assume there are no absorbing states.··· . X n = X o + Y1 + . Find the probability generating function ¢(s) = E(sHO). . .Xsrn = im ) = lPirn (XS1 = i m +1 ) = Pirnirn+l where Pii = 0 and..2 Deduce carefully from Theorem 1. Define H o = inf{n ~ 0 : X n = O}..1. The resulting process (Zm)m~O is given by Zm = XS rn where 8 0 = 0 and for m = 0. Suppose the distribution of Y1 . . k=j:i Thus (Zm)m~O is a Markov chain on I with transition matrix Exercises P. the vector (h{ : i E I) is the minimal non-negative solution to (1. "Y2.2 the claim made at (1.1 Let Y1 .3. is changed to lP(Y1 lP(Y1 = -1) = 1/2. A second example of a similar type arises if we observe the original chain (Xn)n~O only when it moves.4.2¢ + s = 0 .Zm = i m) = lP(XSrn + 1 = im+l I XS o = io. = 2) = 1/2..4 Strong Markov property 23 where.2. Show that ¢ now satisfies s¢3 . .4. + Yn for n ~ 1. .6) h{ = Pij + LPik h 1· k~J Thus (Ym)m~O is a Markov chain on J with transition matrix P. j E J Pii - = hi i and where. for j E J.6).

(O) ~ n .2. .24 1. . Discrete-time Markov chains 1. We shall show that every state is either recurrent or transient. The length of the rth excursion to i is then S~r) ~ = { T(r) _ T(r-I) 0 i i 1 ·f T(r-I) < i otherwise.. We IIOW define inductively the rth passage time Ti(r) to and.. Recall that the first passage time to state i is the random variable T i defined by Ti(W) = inf{n ~ 1 : Xn(w) = i} where inf 0 = state i by 00. We say that a state i is recurrent if lPi(Xn =i for infinitely many n) = 1.5 Recurrence and transience Let (Xn)n~O be a Markov chain with transition matrix P. for r = 0.1. 00 The following diagram illustrates these definitions: Xn i T. Thus a recurrent state is one to which you keep coming back and a transient state is one which you eventually leave for ever. We say that i is transient if IPi(Xn =i for infinitely many n) = O.

(XT+n)n~O is Markov(8i .5 Recurrence and transience 25 Our analysis of recurrence and transience will rest on finding the joint distribution of these excursion lengths. It is automatic that X T = i on T < 00. Xl. .3.. But sir) so = inf{n ~ 1 : X T +n = i}. P) and independent of X o. D sir) is the first passage time of (XT+n)n~O to state i.. Recall that the indicator function l{xl==j} is the random variable equal to 1 if Xl = j and 0 otherwise. then Pi ( Vi > r + 1) = Pi ( T i(r+I) < 00 00 ) 00) = Pi(Ti(r) < = Pi (Sfr+I) = fif.. so by induction the result is true for all r. .2. . . conditional on Ti(r-I) < pendent of {Xm : m ~ Ti(r-I)} and 00..XT .1. . When r the result is true.2.5. .5.. and si r+ I) < < 00 I Ti(r) < oo)Pi(Ti(r) < 00) = f[+1 by Lemma 1. conditional on T < 00.1. Observe that if X o = i then {Vi > r} = {Ti(r) < oo}.. which may be written in terms of indicator functions as Vi and note that 00 = L n==O 00 l{x n =i} lEi(Vi) = lEi L n==O l{X n =i} =L n==O 00 00 00 lEi(l{Xn =i}) = LlPi(Xn = i) = LP~7). So. =0 Proof.1. D .5.1. we can compute the distribution of Vi under Pi in terms of the return probability Lemma 1. Lemma 1. For r = 2. n==O n==O Also. Let us introduce the number of visits Vi to i. Suppose inductively that it is true for r. Apply the strong Markov property at the stopping time T = Ti(r-I). sir) is inde- Proof. we have Pi (Vi > r) = fi. For r = 0.

26

1. Discrete-time Markov chains

Recall that one can compute the expectation of a non-negative integervalued random variable as follows:
00 00 00

LP(V > r)
r=O

=

L

L

P(V

=

v)
00

r=Ov=r+1 00 v-I

=

L LP(V
v=lr=O

-="'

v) = L vP(V = v) = E(V).
v=1

The next theorem is the means by which we establish recurrence or transience for a given state. Note that it provides two criteria for this, one in terms of the return probability, ttle other in terms of the n-step transition probabilities. Both are useful.

Theorem 1.5.3. The following dichotomy holds:

(i) ifIPi(Ti < 00) = 1, then i is recurrent and E~op~~) = 00; (ii) ifIPi(Ti < 00) < 1, then i is transient and E~op~~) < 00. In particular, every state is either transient or recurrent.
Proof. If IPi(Ti < 00) = 1, then, by Lemma 1.5.2, IPi(Vi
so i is recurrent and

= 00) = r--+-oo IPi(Vi > r) = 1 lim
00

LP~~) = Ei(Vi)
n=O

=

00.

On the other hand, if Ii = IPi(Ti < 00) < 1, then by Lemma 1.5.2

n=O

fp~~) = Ei(Vi) = fpi(Vi > r) = fl[
r=O r=O

=

1

~

. < f'l,

00

so IPi (Vi

= 00) = 0 and i

is transient.

D

From this theorem we can go on to solve completely the problem of recurrence or transience for Markov chains with finite state-space. Some cases of infinite state-space are dealt with in the following chapter. First we show that recurrence and transience are class properties.

Theorem 1.5.4. Let C be a communicating class. Then either all states in C are transient or all are recurrent.

Proof. Take any pair of states i, j E C and suppose that i is transient. There exist n, m ~ 0 with p~j) > 0 and PJ":) > 0, and, for all r ~ 0 Pii
(n+r+m)

- Pij Pjj Pji

>

(n)

(r) (m)

1.5 Recurrence and transience
so

27

~
r==O

(r) LJPjj -

<

1
Pij Pji

(n) (m) LJPii
r==O

~

(n+r+m)

<

00

by Theorem 1.5.3. Hence j is also transient by Theorem 1.5.3.

D

In the light of this theorem it is natural to speak of a recurrent or transient class. Theorem 1.5.5. Every recurrent class is closed.
Proof. Let C be a class which is not closed. Then there exist i E C, j fj. C and m ~ 1 with

Since we have lPi ( {Xm = j} n {X n = i for infinitely many n}) = 0 this implies that

lPi(Xn = i for infinitely many n) < 1
so i is not recurrent, and so neither is C. D

Theorem 1.5.6. Every finite closed class is recurrent.
Proof. Suppose C is closed and finite and that for some i E C we have
(Xn)n~O

starts in C. Then

o < P(Xn = i
= lP(Xn

for infinitely many n) for some n)Pi(Xn

=i

=i

for infinitely many n)

by the strong Markov property. This shows that i is not transient, so C is recurrent by Theorems 1.5.3 and 1.5.4. D It is easy to spot closed classes, so the transience or recurrence of finite classes is easy to determine. For example, the only recurrent class in Example 1.2.2 is {5, 6}, the others being transient. On the other hand, infinite closed classes may be transient: see Examples 1.3.3 and 1.6.3. We shall need the following result in Section 1.8. Remember that irreducibility means that the chain can get from any state to any other, with positive probability.

Theorem 1.5.7. Suppose P is irreducible and recurrent. Then for all j ∈ I we have P(T_j < ∞) = 1.

Proof. By the Markov property we have

  P(T_j < ∞) = Σ_{i∈I} P(X_0 = i) P_i(T_j < ∞)

so it suffices to show P_i(T_j < ∞) = 1 for all i ∈ I. Choose m with p_ji^(m) > 0. By Theorem 1.5.3, we have

  1 = P_j(X_n = j for infinitely many n)
    = P_j(X_n = j for some n ≥ m + 1)
    = Σ_{k∈I} P_j(X_n = j for some n ≥ m + 1 | X_m = k) P_j(X_m = k)
    = Σ_{k∈I} P_k(T_j < ∞) p_jk^(m)

where the final equality uses the Markov property. But Σ_{k∈I} p_jk^(m) = 1, so we must have P_i(T_j < ∞) = 1. □

Exercises

1.5.1 In Exercise 1.2.1, which states are recurrent and which are transient?

1.5.2 Show that, for the Markov chain (X_n)_{n≥0} in Exercise 1.3.4, we have

  P(X_n → ∞ as n → ∞) = 1.

Suppose, instead, the transition probabilities satisfy

  p_{i,i+1} = ((i + 1)/i)^α p_{i,i−1}.

For each α ∈ (0, ∞) find the value of P(X_n → ∞ as n → ∞).

1.5.3 (First passage decomposition). Denote by T_j the first passage time to state j and set

  f_ij^(n) = P_i(T_j = n).

Justify the identity

  p_ij^(n) = Σ_{k=1}^n f_ij^(k) p_jj^(n−k)   for n ≥ 1

and deduce that

  P_ij(s) = δ_ij + F_ij(s) P_jj(s)

where

  P_ij(s) = Σ_{n=0}^∞ p_ij^(n) s^n,   F_ij(s) = Σ_{n=0}^∞ f_ij^(n) s^n.

Hence show that P_i(T_i < ∞) = 1 if and only if

  Σ_{n=0}^∞ p_ii^(n) = ∞

without using Theorem 1.5.3.
1.5.4 A random sequence of non-negative integers (F_n)_{n≥0} is obtained by setting F_0 = 0 and F_1 = 1 and, once F_0, ..., F_n are known, taking F_{n+1} to be either the sum or the difference of F_{n−1} and F_n, each with probability 1/2. Is (F_n)_{n≥0} a Markov chain?

By considering the Markov chain X_n = (F_{n−1}, F_n), find the probability that (F_n)_{n≥0} reaches 3 before first returning to 0. Draw enough of the flow diagram for (X_n)_{n≥0} to establish a general pattern. Hence, using the strong Markov property, show that the hitting probability for (1, 1), starting from (1, 2), is (3 − √5)/2. Deduce that (X_n)_{n≥0} is transient. Show that, moreover, with probability 1, F_n → ∞ as n → ∞.

1.6 Recurrence and transience of random walks
In the last section we showed that recurrence was a class property, that all recurrent classes were closed and that all finite closed classes were recurrent. So the only chains for which the question of recurrence remains interesting are irreducible with infinite state-space. Here we shall study some simple and fundamental examples of this type, making use of the following criterion for recurrence from Theorem 1.5.3: a state i is recurrent if and only if

  Σ_{n=0}^∞ p_ii^(n) = ∞.

Example 1.6.1 (Simple random walk on Z)
The simple random walk on Z has diagram

  [diagram: from each state i, an arrow to i − 1 labelled q and an arrow to i + 1 labelled p]

where 0 < p = 1 − q < 1. Suppose we start at 0. It is clear that we cannot return to 0 after an odd number of steps, so p_00^(2n+1) = 0 for all n. Any given sequence of steps of length 2n from 0 to 0 occurs with probability p^n q^n, there being n steps up and n steps down, and the number of such sequences is the number of ways of choosing the n steps up from 2n. Thus

  p_00^(2n) = (2n choose n) p^n q^n.

Stirling's formula provides a good approximation to n! for large n: it is known that

  n! ∼ √(2πn) (n/e)^n   as n → ∞

where a_n ∼ b_n means a_n/b_n → 1. For a proof see W. Feller, An Introduction to Probability Theory and its Applications, Vol I (Wiley, New York, 3rd edition, 1968). At the end of this chapter we reproduce the argument used by Feller to show that

  n! ∼ A√n (n/e)^n   as n → ∞

for some A ∈ [1, ∞). The additional work needed to show A = √(2π) is omitted, as this fact is unnecessary to our applications. For the n-step transition probabilities we obtain

  p_00^(2n) = ((2n)! / (n!)^2) (pq)^n ∼ (4pq)^n / (A√(n/2))   as n → ∞.

In the symmetric case p = q = 1/2, so 4pq = 1; then for some N and all n ≥ N we have

  p_00^(2n) ≥ 1/(2A√n)

so

  Σ_{n≥N} p_00^(2n) ≥ (1/(2A)) Σ_{n≥N} 1/√n = ∞

which shows that the random walk is recurrent. On the other hand, if p ≠ q then 4pq = r < 1, so by a similar argument, for some N,

  Σ_{n≥N} p_00^(n) ≤ (1/A) Σ_{n≥N} r^n < ∞

showing that the random walk is transient.
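The dichotomy of Theorem 1.5.3 can be seen numerically in this example. The following quick illustration is not part of the text (plain Python); it computes the partial sums of p_00^(2n) iteratively, once for the symmetric walk and once with p = 0.6 chosen purely for the demonstration.

    def partial_sum(p, N):
        q = 1.0 - p
        term, total = 1.0, 1.0                              # term = p_00^(0) = 1
        for n in range(1, N + 1):
            term *= (2 * n) * (2 * n - 1) / (n * n) * p * q  # p_00^(2n) from p_00^(2n-2)
            total += term
        return total

    for N in (10, 100, 1000, 10000):
        print(N, round(partial_sum(0.5, N), 3), round(partial_sum(0.6, N), 3))
    # The symmetric sums keep growing (like sqrt(N)), consistent with recurrence,
    # while the p = 0.6 sums level off near a finite limit, consistent with transience.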

Example 1.6.2 (Simple symmetric random walk on Z²)
The simple symmetric random walk on Z² has diagram

  [diagram: from each state, arrows to its four nearest neighbours, each labelled 1/4]

and transition probabilities

  p_ij = 1/4 if |i − j| = 1, and 0 otherwise.

Thus the chain jumps to each of its nearest neighbours with equal probability. Suppose we start at 0. Let us call the walk X_n and write X_n^+ and X_n^− for the orthogonal projections of X_n on the diagonal lines y = ±x:

  [diagram: the walk X_n together with its projections X_n^+ and X_n^− on the diagonals]

Then X_n^+ and X_n^− are independent simple symmetric random walks on 2^(−1/2) Z, and X_n = 0 if and only if X_n^+ = 0 = X_n^−. This makes it clear that for X_n we have

  p_00^(2n) = ( (2n choose n) (1/2)^(2n) )^2 ∼ 2/(A^2 n)   as n → ∞

by Stirling's formula. Hence Σ_{n=0}^∞ p_00^(2n) = ∞ by comparison with Σ_{n=0}^∞ 1/n, and the walk is recurrent.

Example 1.6.3 (Simple symmetric random walk on Z³)
The transition probabilities of the simple symmetric random walk on Z³ are given by

  p_ij = 1/6 if |i − j| = 1, and 0 otherwise.

Thus the chain jumps to each of its nearest neighbours with equal probability. Suppose we start at 0. We can only return to 0 after an even number 2n of steps. Of these 2n steps there must be i up, i down, j north, j south, k east and k west for some i, j, k ≥ 0, with i + j + k = n. By counting the ways in which this can be done, we obtain

  p_00^(2n) = Σ_{i,j,k ≥ 0, i+j+k=n} (2n)! / (i! j! k!)^2 (1/6)^(2n) = (2n choose n) (1/2)^(2n) Σ_{i+j+k=n} ( (n!/(i! j! k!)) (1/3)^n )^2.

Now

  Σ_{i+j+k=n} (n!/(i! j! k!)) (1/3)^n = 1,

the left-hand side being the total probability of all the ways of placing n balls randomly into three boxes. For the case where n = 3m, we have

  n!/(i! j! k!) ≤ n!/(m! m! m!)   for all i, j, k with i + j + k = n,

so

  p_00^(6m) ≤ (6m choose 3m) (1/2)^(6m) ((3m)!/(m!)^3) (1/3)^(3m) ∼ C m^(−3/2)   as m → ∞

by Stirling's formula, for a constant C depending only on A. Hence Σ_{m=0}^∞ p_00^(6m) < ∞ by comparison with Σ_{m=1}^∞ m^(−3/2). But p_00^(6m) ≥ (1/6)^2 p_00^(6m−2) and p_00^(6m) ≥ (1/6)^4 p_00^(6m−4) for all m, so we must have

  Σ_{n=0}^∞ p_00^(n) < ∞

and the walk is transient.
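The different decay rates n^(−1/2), n^(−1) and n^(−3/2) behind Examples 1.6.1–1.6.3 can be checked directly. The short sketch below is not from the book (plain Python, exact arithmetic via the standard library); it evaluates p_00^(2n) in dimensions 1, 2 and 3, using the projection identity for d = 2 and the multinomial formula for d = 3.

    from math import comb, factorial

    def p00(n, dim):
        one_d = comb(2 * n, n) / 4 ** n                    # (2n choose n) 2^{-2n}
        if dim == 1:
            return one_d
        if dim == 2:
            return one_d ** 2                              # projection onto the diagonals
        # dim == 3: multinomial formula from Example 1.6.3
        s = sum((factorial(n) // (factorial(i) * factorial(j) * factorial(n - i - j))) ** 2
                for i in range(n + 1) for j in range(n - i + 1)) / 9 ** n
        return one_d * s

    for n in (5, 10, 20, 40):
        print(n, p00(n, 1) * n ** 0.5, p00(n, 2) * n, p00(n, 3) * n ** 1.5)
    # Each scaled column settles towards a constant, so the series over n diverges
    # in dimensions 1 and 2 and converges in dimension 3.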

Exercises

1.6.1 The rooted binary tree is an infinite graph T with one distinguished vertex R from which comes a single edge; at every other vertex there are three edges and there are no closed loops. The random walk on T jumps from a vertex along each available edge with equal probability. Show that the random walk is transient.

1.6.2 Show that the simple symmetric random walk in Z⁴ is transient.

1.7 Invariant distributions

Many of the long-time properties of Markov chains are connected with the notion of an invariant distribution or measure. Remember that a measure λ is any row vector (λ_i : i ∈ I) with non-negative entries. We say λ is invariant if

  λP = λ.

The terms equilibrium and stationary are also used to mean the same. The first result explains the term stationary.

Theorem 1.7.1. Let (X_n)_{n≥0} be Markov(λ, P) and suppose that λ is invariant for P. Then (X_{m+n})_{n≥0} is also Markov(λ, P).

Proof. By Theorem 1.1.3, P(X_m = i) = (λP^m)_i = λ_i for all i and, clearly, conditional on X_{m+n} = i, X_{m+n+1} is independent of X_m, X_{m+1}, ..., X_{m+n} and has distribution (p_ij : j ∈ I). □

The next result explains the term equilibrium.

Theorem 1.7.2. Let I be finite. Suppose for some i ∈ I that

  p_ij^(n) → π_j as n → ∞ for all j ∈ I.

Then π = (π_j : j ∈ I) is an invariant distribution.

Proof. We have

  Σ_{j∈I} π_j = Σ_{j∈I} lim_{n→∞} p_ij^(n) = lim_{n→∞} Σ_{j∈I} p_ij^(n) = 1

and

  π_j = lim_{n→∞} p_ij^(n) = lim_{n→∞} Σ_{k∈I} p_ik^(n) p_kj = Σ_{k∈I} lim_{n→∞} p_ik^(n) p_kj = Σ_{k∈I} π_k p_kj

where we have used finiteness of I to justify interchange of summation and limit operations. Hence π is an invariant distribution. □

Notice that for any of the random walks discussed in Section 1.6 we have p_ij^(n) → 0 as n → ∞ for all i, j ∈ I. The limit is certainly invariant, but it is not a distribution! Theorem 1.7.2 is not a very useful result but it serves to indicate a relationship between invariant distributions and n-step transition probabilities. In Theorem 1.8.3 we shall prove a sort of converse, which is much more useful.

Example 1.7.3
Consider the two-state Markov chain with transition matrix

  P = ( 1−α   α  )
      (  β   1−β ).

Ignore the trivial cases α = β = 0 and α = β = 1. Then, by Example 1.1.4,

  P^n → ( β/(α+β)  α/(α+β) )
        ( β/(α+β)  α/(α+β) )   as n → ∞,

so, by Theorem 1.7.2, the distribution (β/(α + β), α/(α + β)) must be invariant. There are of course easier ways to discover this.

Example 1.7.4
Consider the Markov chain (X_n)_{n≥0} with diagram

  [diagram: three states 1, 2, 3; from 1 the chain moves to 2 with probability 1, from 2 it stays at 2 or moves to 3 with probability 1/2 each, from 3 it stays at 3 or moves to 1 with probability 1/2 each]

To find an invariant distribution we write down the components of the vector equation πP = π:

  π_1 = (1/2) π_3
  π_2 = π_1 + (1/2) π_2
  π_3 = (1/2) π_2 + (1/2) π_3.

knowing that p~~) had the form PII (n) = a + (l)n (bcos 2 2 n7r . and the equations require Xl also to have distribution 7r.1. . the event {n ::. by the Markov property at n . (i) This is obvious.1. and another equation is required to fix 7r uniquely.6 pi~) ~ 1/5 as n ~ 00 so this confirms Theorem 1. Xl. consider for each i the expected time spent in i between visits to k: Tk. Tk} depends only on X o.) in Example 1. so the column vector of ones is an eigenvector with eigenvalue 1. In the next two results we shall show that every irreducible and recurrent stochastic matrix P has an essentially unique positive invariant measure.1 . .2. According to Example 1.5.2/5. so P must have a row eigenvector with eigenvalue 1. The equations are homogeneous so one of them is redundant.6.l ')'f = lEk L l{x n =i}' n=O Here the sum of indicator functions serves to count the number of times n at which X n = i before the first passage time T k . Let P be irreducible and recurrent. The proofs rely heavily on the probabilistic interpretation so it is worth noting at the outset that.1. .7.. Theorem 1. For a fixed state k. Alternatively.7.. so. 7 Invariant distributions 35 In terms of the chain.2 and knowledge of 7r1 to identify a = 1/5.7.. (ii) For n = 1. the existence of an invariant row vector is a simple piece of linear algebra: the row sums of P are alII. That equation is and we find that 7r = (1/5. Then (i) l'~ = 1. for a finite state-space I. when X o has distribution 7r. n7r) + CSln 2 we could have used Theorem 1. for all i E I. instead of working out p~. (ii) l'k = (l'f : i (iii) 0 < E Tf < 00 I) satisfies l'k P = l'k . Proof. the right-hand sides give the probabilities for Xl.Xn .2. .l .2/5).
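For a finite state-space the invariant distribution can also be found by routine linear algebra, as the examples above do by hand. A minimal numerical sketch, not from the book (Python with numpy assumed available; the values of α and β are chosen only for the demonstration), checked against the two-state answer of Example 1.7.3:

    import numpy as np

    alpha, beta = 0.3, 0.1                      # illustrative values only
    P = np.array([[1 - alpha, alpha],
                  [beta, 1 - beta]])
    # Solve pi (P - I) = 0 together with the normalisation sum(pi) = 1.
    A = np.vstack([(P - np.eye(2)).T, np.ones((1, 2))])
    b = np.array([0.0, 0.0, 1.0])
    pi = np.linalg.lstsq(A, b, rcond=None)[0]
    print(pi)                                   # [0.25 0.75] = (beta, alpha)/(alpha + beta)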

+IPk(Xn ~ ~j as =j and n ~ 00.j = lEk L n=l 00 Tk l{x n =j} = lEk L 00 l{Xn =j and nSTd n=l = L n=l JP>k(Xn 00 =j and n ~ Tk) i. .Pki >.i n -l#k Pkin-l · · · PilioPioj ) ~ IPk(X1 = j and Tk ~ 1) + IPk(X2 n) = j and Tk ~ 2) + . m ~ 0 with (n) (m) 0 k (m) (n) k Pik .1 = = LPij L JP>k(Xn .in=j:k AinPinin-l · · · Pioj L io#k + (Pk j + PkioPioj + ···+ Tk ~ L io. Then A ~ ~k. for each state i there exist n.1 = LPijlEk iEI L m=O l{Xtn =i} = L iEI . Therefore < 00 and X o = X Tk = k with .7. (iii) Since P is irreducible. If in addition P is recurrent. Let P be irreducible and let A be an invariant measure for P with Ak = 1. under IPk we have probability one...6.il#k Ai1Pi 1 ioPioj + (Pk j PkioPiOj ) L i o . D Theorem 1.. . then A = ~k .1 } Tk.. Proof. .fpij. ..36 1. For each j E I we have Aj = = L ioEI AioPioj = L io#k AioPioj + Pkj + L io#k L io.1 = iEI n=l 00 = LpijlEk L iEI m=O l{Xtn =i and msTk. Then ~ik ~ ~kPki > 0 and ~ik Pik ~ ~k = 1 bY () and i (ii). X n = j and n ~ Tk) i and n ~ Tk) = L L JP>k(X iEI n=l 00 n.. Discrete-time Markov chains Tk Since P is recurrent..

By Theorem 1.7. so P is recurrent. But L~j jEI = mi < 00 so 7rj = ~. 0 Recall that a state i is recurrent if IPi(Xn = i for infinitely many n) = 1 and we showed in Theorem 1. (iii) P has an invariant distribution. and O ' " or = /-lk = LJjEI /-ljPjk ~ /-liPik . Hence mk =L k ~. Moreover. (ii) some state i is positive recurrent. . 7r say. (i) => (ii) This is obvious.7.7) and k is positive recurrent. Since P is irreducible and EiEI 7ri = 1 we have 7rk = EiEI 7riP~~) > 0 for some n. / mi defines an invariant distribution.1. (ii) => (iii) If i is positive recurrent. given i E I. Let P be irreducible. So by Theorem 1. Proof. so J-l = A . then ~k is invariant by Theorem 1.~k is also invariant and /-l ~ O. it is certainly recurrent.6.5. Then A is an invariant measure with Ak = 1. ~ ~ L iEI iEI 7ri . Since P is irreducible. 7 Invariant distributions 37 So A ~ ~k. ~i is then invariant.7.3 that this is equivalent to If in addition the expected return time is finite.7.5. A recurrent state which fails to have this stronger property is called null recurrent.5. A ~ ~k. Then the following are equivalent: (i) every state is positive recurrent. when (iii) holds we have mi = 1/7ri for all i. If P is recurrent.7. so J-li = O. Theorem 1. (iii) => (i) Take any state k. then we say i is positive recurrent. (n) (n) (n) we have Pik > 0 £ some n.= -1 < 7r 7r k k 00 (1. Set Ai = 7ri/7rk.

Now Theorem 1. the simple symmetric random walk on Z3. Example 1. whose transition probabilities are given by Pi.7.i+l = Pi. Example 1.7.11 Consider a success-run chain on Z+ . so A = ~k and the inequality (1.7. PiO = qi = 1 . but has invariant measure 7r given by 7ri = 1 for all i.7. so 7r is invariant.6. Example 1.9 The existence of an invariant measure does not guarantee recurrence: consider. Discrete-time Markov chains To complete the proof we return to the argument for (iii) => (i) armed with the knowledge that P is recurrent. Since LiEZ 7ri = 00. which is transient by Example 1.Pi· .i-l = q < P = Pi. there can be no invariant distribution and the walk is therefore null recurrent.1.7. by Theorem 1. there is a two-parameter family of invariant measures uniqueness up to scalar multiples does not hold. Consider the measure 7ri Then =1 for all i.7.6 forces any invariant measure to be a scalar multiple of 7r. by Example 1.i+l· In components the invariant measure equation 7r P = 7r reads This is a recurrence relation for 7r with general solution So. for example. in this case. it is also recurrent.7.3. 0 Example 1.6.10 Consider the asymmetric random walk on Z with transition probabilities Pi.38 1.7) is in fact an equality.8 (Simple symmetric random walk on Z) The simple symmetric random walk on Z is clearly irreducible and.

1.3 A particle moves on the eight vertices of a cube in the following way: at each step the particle is equally likely to move to each of the three adjacent vertices. Suppose we choose Pi converging sufficiently rapidly to 1 so that P = IIpi > 0 i=O 00 which is equivalent to 00 Lqi = i=O 00. 0 the vertex opposite i.1.7. Suppose there are N molecules in the box.7.1. 7 Invariant distributions 39 7r P Then the components of the invariant measure equation 00 = 7r read 1ro = L i=O qi 1ri.2. Show that the number of molecules on one side of the partition just after a molecule has passed through the hole evolves as a Markov chain.2 Gas molecules move about randomly in a box which is divided into two halves symmetrically by a partition. for i ~ 7ri = Pi-l7ri-l.1 Find all invariant distributions of the transition matrix in Exercise 1.7. independently of its past motion. A hole is made in the partition. Then for any solution of 7r P = 7r we have and so 1ro ~ JJ7ro 7ro L qi· i=O 00 This last equation forces either measure. Let i be the initial vertex occupied by the particle. 1. so there is no invariant Exercises 1. What are the transition probabilities? What is the invariant distribution of this chain? 1. Calculate each of the following quantities: . = 0 or 7ro = 00.

j E I) with aij = 1 for all i and j.8.5 Let P be a stochastic matrix on a finite set I.8 Convergence to equilibrium We shall investigate the limiting behaviour of the n-step transition probabilities p~. if the state-space is finite and if for some i the limit exists for all j.4. as the following example shows.10. This is also a consequence of Theorem 1.1 is connected with its periodicity. The behaviour of the chain in Example 1. Deduce that if P is irreducible then I -P+A is invertible. the limit does not always exist.1 Consider the two-state chain with transition matrix p=(~ ~). and a = (ai : i E I) with ai = 1 for all i. (iii) the expected number of steps until the first visit to o. Find .7. Note that this enables one to compute the invariant distribution by any standard method of inverting a matrix.) 1.7.f = and verify that T'~ ~ TO-l lEo ( n=O L ) l{x n =i} = inf Ai for all i A 1.2.7.8.8. Discrete-time Markov chains (i) the expected number of steps until the particle returns to i. j. (ii) the expected number of visits to 0 until the first return to i. We leave it as an exercise to show that i is aperiodic if and only if the set {n 2:: 0 : p~~) > O} has no common divisor other than 1. where the infimum is taken over all invariant measures A with Ao (Compare with Theorem 1. But. then it must be an invariant distribution.4 Let (Xn)n~O be a simple random walk on Z with Pi. where A = (aij : i. Thus p~j) fails to Let us call a state i aperiodic if p~~) > 0 for all sufficiently large n. Then p2 = I. As we saw in Theorem 1.i+l. Example 1.7.) as n ~ 00. =I and p 2n+l = P for all n. so p2n converge for all i. 1.6 and Example 1.7. Show that a distribution 7r is invariant for P if and only if 7r(I -P+A) = a. 1.i-l = q < P = Pi. .40 1.

j.k)(j. by Theorem 1. PJ~) > 0 for all sufflciently large n. P) and independent of (Xn)n~O. But T is the first passage time of W n to (b. Suppose P is irreducible and has an aperiodic state i. Then · (r) (s) 0 (r+n+s) (r) (n) (s) Pjk . (X n .8. S _ > 0 WIt h Pji' Pik >. We use a coupling argument. In particular. . 0 Here is the main result of this section.l) = Pij Pkl > 0 for all sufficiently large n.k) = Ai 7rk.k)(j. The process W n = chain on I x I with transition probabilities P(i. Let A be any distribution.5. l we have ~n) (n) (n) P(i.8 Convergence to equilibrium 41 Lemma 1. Since P is aperiodic.8. p~.1.I · P rooJ. We show P(T < 00) = 1. P). Let (Yn)n~O be Markov(7r.l) is a Markov = PijPkl and initial distribution jj(i. k. so P is irreducible. all states are aperiodic.7. In particular. for all states j and k. Suppose that (Xn)n~O is Markov(A. b) so P(T < 00) = 1. Then P(Xn = j) ~ 7rj as n ~ 00 for all j.Pji Pii Pik > >0 for all sufficiently large n. P has an invariant distribution given by 7r(i. by coupling two Markov chains. Fix a reference state b and set T = inf{ n 2:: 1 : X n = Yn = b}. Let P be irreducible and aperiodic. Proof. P is positive recurrent. Y n ) Step 1. There eXIst r.7. The method of proof. j. for all states i.3 (Convergence to equilibrium). Theorem 1. is ingenious. Also. .) ---t 'lrj as n ---t 00 for all i.k) = 7ri7rk so. Then. and suppose that P has an invariant distribution 7r.7. by Theorem 1.2.

Yo). XT+n)n~O which is also Markov(8(b. (Xl.b). so (XT+n. P) and remains independent of (X o.7rjl = IlP(Zn = j) -lP(Yn = j)1 = IlP(Xn = j and n < T) -lP(Yn = j and n < T)I lP(n < T) 0 as n ~ 00.(XT . . Markov(.\. Set ifn < T if n The diagram below illustrates the idea. YI). . Step 3. we can replace the process (XT+n. We have lP(Zn = j) = lP(Xn = j and n < T) +lP(Yn = j and n 2 T) so IlP(Xn ~ = j) . P). P) where n . In particular.. . . (XT.t. I ~ T. . (X I. Z~) is Markov(j. P).\. By symmetry. YT+n)n~O is Markov(8(b. Y T ) .42 1. We show that (Zn)n~O is n The strong Markov property applies to (Wn)n~O at time T. P) and independent of (X o.. Discrete-time Markov chains Step 2. Yo). YI).YT).. (Zn)n~O is Markov(. Hence W~ = (Zn. YT+n)n~O by (YT+n .b). Zn = { Y Xn ifn < T if n 2 T. ~ and P(n < T) D .

We start (Xn)n~O from 0 and (Yn)n~O with equal probability from 0 or 1. 1/2) as its unique invariant distribution. Hence we have a partition. The remainder of this section might be omitted on a first reading.8.1.I}. then . o~ r Now for nd ~ n~. Proof. There is an integer d partition I = Co U C 1 U .. if p~~d+r) > 0 and p~~d+8) > 0 for some r. then p~7d+r+k) > 0 so j E C r+n as required.4. for all i.. (Xn)n~O and (Yn)n~O will never meet. . Choose n1. if Yo = 1.m)n1 + mn2. Since d divides n1 we then have r = md for some integer m and then nd = (q . . we can write nd = qn1 + r for integers q ~ n1 and ~ n1 ... in particular n1. (ii) p~jd) > 0 for all sufficiently large n. where (Xt)~o is periodic or transient or null recurrent. Hence and hence nd E S.8 Convergence to equilibrium 43 To understand this proof one should see what goes wrong when P is not aperiodic.1 which has (1/2. We move on now to the cases that were excluded in the last theorem. E I (nd+r) : Pki > 0 £or some n ~ O} .d . (Here and throughout we use the symbol := to mean 'defined to equal'. then. j E Cr choose m1 and m2 so that p~:l) > 0 and p~72) > 0.1.) > 0. n2 E S with n1 < n2 and such that d := n2 . Moreover. Then Co U . Theorem 1. we have p~~d+r+m) > 0 and p~~d+8+m) > 0 so r = s by minimality of d. and the proof fails. By taking i = j = k we now see that d must divide every element of S.. To prove (ii) for i. U Cd-1 ~ 1 and a such that (setting Cnd+r = Cr) (i) p~j) > 0 only if i E Cr and j E C r+ n for some r.. then. .n1 is as small as possible. Consider the two-state chain of Example 1. choosing m ~ 0 so that p~". To prove (i) suppose p~j) > 0 and i E Cr. for all r. by irreducibility. S E {O.) Define for r = 0. Let P be irreducible.8.. Fix a state k and consider S = {n ~ 0 : P~~ > O}.d-1 Cr = {~. However.. U Cd-1 = I.j E C r . Choose m so that p~7d+r) > 0. 1. because of periodicity. .
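Both the convergence of Theorem 1.8.3 and the failure caused by periodicity in Example 1.8.1 are easy to see numerically. A small illustration, not from the book (Python with numpy assumed; the aperiodic matrix is made up for the demonstration):

    import numpy as np

    P_aper = np.array([[0.9, 0.1],
                       [0.2, 0.8]])             # irreducible and aperiodic
    P_per = np.array([[0.0, 1.0],
                      [1.0, 0.0]])              # the period-2 chain of Example 1.8.1
    print(np.linalg.matrix_power(P_aper, 50))   # both rows close to (2/3, 1/3)
    print(np.linalg.matrix_power(P_per, 50))    # = I
    print(np.linalg.matrix_power(P_per, 51))    # = P_per, so p_ij^(n) does not converge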

so the result follows from Theorem 1.4 we have = Apr. For j E Cr the expected return time of (Yn)n~O to j is mjld. Let A be a distribution with LiECo Ai = 1.3 in two respects since we require neither aperiodicity nor the existence of an invariant distribution.8. Let P be irreducible of period d and let Co. The argument we use for the null recurrent case was discovered recently by B. Gray. for i E Co and j E C r we have (nd+r) ~ dl mj as n ~ 00. . . . Set v 1. This generalizes Theorem 1. Otherwise mj = 00 and we have to show that P(Xn = j) ~ 0 as n ~ 00.44 1.8.. here is a complete description of limiting behaviour for irreducible chains. Theorem 1.. pd) and.8.8. Then for r = 0. then P(Xnd+r = j) = P(Yn = j) so the theorem holds in general. Assume that P is aperiodic. In particular. Finally. Pij Proof Step 1.1.1 and j E C r we have P(Xnd +r = j) ~ dlmj as n ~ 00 where mj is the expected return time to j. by Theorem 1. The theorem just proved shows in particular for all i E I that d is the greatest common divisor of the set {n ~ 0 : p~~) > O}.4. Suppose that (Xn)n~O is Markov(A.. We reduce to the aperiodic case.Cd-l be the partition obtained in Theorem 1. then (Yn)n~O is Markov(v. . Fristedt and L. we We call d the period of P. ~ dlmj as n ~ 00 Step 2.8. C 1 . Discrete-time Markov chains ml whenever nd ~ n~. This is sometimes useful in identifying d. then by Theorem Set Y n = Xnd+r.8.d . D + m2 is then necessarily a multiple of d. If P is positive recurrent then 1/mj = 1rj. So if the theorem holds in the aperiodic case. . P). Since are done. where 1r is the unique invariant distribution.4.3.. pd is irreducible and aperiodic on Cr.5.

8.K .k = j) ~ €/2 for some k E {O..3.n) L k=n-K+l K-l n P(Xk = j)Pj(Tj > n .1 :E k=n-K+l n P(Xk = j and X m 1= j for m = k + 1.. on taking j. We can find N such that for n 2:: Nand k = 1.8 Convergence to equilibrium 45 If P is transient this is easy and we are left with the null recurrent case. IP(Xn = j) .1. so that P(Yn = j) = P(Xn +k = j).Yn ).. aperiodicity of (Xn)n~O ensures irreducibility of (Wn)n~O. . . Assume that P is aperiodic and null recurrent.1.. where j. for n 1~ ~ K . If (Wn)n~O is transient then.P(Xn+k = j)1 ::. Step 3.t = Ap k for k = 1. only now let (Yn)n~O be Markov(j.I}. . .t = A. we obtain P(Xn = j)2 = P(Wn = (j.k) = :E P(Xn-k = j)Pj(Tj > k) k=O so we must have P(Xn . 1. Then 00 :EPj(Tj > k) = lEj(Tj) = k=O 00.8. in the notation of Theorem 1. As before.K -1. Set W n = (Xn . ~· . Then. . .. Assume then that (Wn)n~O is recurrent. . j)) ~0 as required.t is to be chosen later. . IP(Xn = j) .. Return now to the coupling argument used in Theorem 1.t.P(Yn = j)1 ~ 0 We exploit this convergence by taking j.K . we have P(T < 00) = 1 and the coupling argument shows that as n ~ 00.3. Given € > 0 choose K so that Then. P)...

and compare them with your answers there. Set J-l process is a Markov chain: Xn = E(Y1 ). identically distributed random variables with values in {I..46 1.3. In the evening he replaces the book at the left-hand end of the shelf. and determine the limit. Discrete-time Markov chains ~ But for any n we can find k E {O. 2..3. (f) and (g) made in example (v) of the Introduction.I} such that P(Xn + k = j) c/2. Hence. Pn converges as n ~ 00. . Suppose that the set of integers {n : P(Y1 = n) > I} has greatest common divisor 1. this shows that lP(Xn = j) required.. Let X n denote the sum of the first n throws.8. for n ~ N P(Xn = j) ~ c.K .. ••• be independent. Let Y1 . 1. + Yk lim lP(Xn = 0) for some k ~ O} . 1.8. where 0 < Qi < 1 for i = 1. . Since c > 0 was arbitrary. (b) and (c). 1.n.4 Each morning a student takes one of the three books he owns from his shelf. . and choices on successive days are independent. . show that. 1.8.8. }.3 A fair die is thrown repeatedly.8.1. If Pn denotes the probability that on day n the student finds the books in the order 1. 1. from left to right.5 (Renewal theorem)..2.. Find lim P(Xn is a multiple of 13) n--+-oo quoting carefully any general theorems that you use. Y 2 .2.7.2 Find the invariant distributions of the transition matrices in Exercise 1.. irrespective of the initial arrangement of the books.. + Yk for some k ~ 0) ~ 1/ J-l. D Exercises ~ 0 as n ~ 00. parts (a).1 Prove the claims (e). as 1. Show that the following = inf{m ~ n: m = Y1 + . The probability that he chooses book i is Qi. Determine n--+-oo and hence show that as n P(n ~ 00 = Y1 + .
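The limit 1/μ appearing in the renewal theorem of Exercise 1.8.5 can be observed by simulation. The sketch below is not from the book (plain Python); it uses fair die throws, so μ = 3.5, and estimates the probability that a fixed large time is hit by some partial sum.

    import random
    random.seed(0)
    trials, target = 20_000, 500
    hits = 0
    for _ in range(trials):
        s = 0
        while s < target:
            s += random.randint(1, 6)            # Y_k = one die throw, mean mu = 3.5
        hits += (s == target)
    print(hits / trials, 1 / 3.5)                # both about 0.2857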

··· .YN = iN) = P(Xo = iN. P). run backwards. j and P is also irreducible with invariant distribution 1r. from state i. S~2). PiN -1 iN . the invariant distribution.9 Time reversal 47 (Think of YI .. Xl = iN-I. Suppose that (Xn)O~n~N is Markov(1T'. P i 1io = 1rioPioi1 . Thus the limiting probability that a bulb is replaced at time n is 1/ J-l. one can actually recover the full result by applying the renewal theorem to the excursion lengths S~l).9. P) and set Y n = X N . This is an example of entropy increasing.1. Then (Yn)O~n~N is MarkovCrr. The next result shows that a Markov chain in equilibrium. First we check that P is a stochastic matrix: since 1r is invariant for P..1. is again a Markov chain. This property is symmetrical in time and suggests looking at Markov chains with time running backwards. The transition matrix may however be different.XN = io) = 1riN PiNiN -1 . • •• as light-bulb lifetimes.9 Time reversal For Markov chains. where P = (Pij) is given by 1rjPji = 1riPij for all i. Although this appears to be a very special case of convergence to equilibrium. Let P be irreducible and have an invariant distribution 1r. On the other hand. A bulb is replaced when it fails. Proof. the past and future are independent given the present.. It suggests that if we want complete time-symmetry we must begin in equilibrium. Y 2 . convergence to equilibrium shows behaviour which is asymmetrical in time: a highly organised state such as a point mass decays to a disorganised one..·. We have P(YO= io. . YI = i l .· .. Next we check that 1r is invariant for P: L JEI 1rjPji = L JEI 1riPij = 1ri since P is a stochastic matrix..) 1. Theorem 1.n .

We say that (Xn)n~O is reversible if.9. Pin-li n > O. with P irreducible. Discrete-time Markov chains so.9. Proof. it is often easier to find by the detailed balance equations than by the equation A = AP. Theorem 1. Though obvious. P).1. the following result is worth remembering because. Suppose that (Xn)n~O is Markov(A. j.. P). Lemma 1.1. (Yn)O~n~N is Markov(7r. since P is irreducible.9.in-1.4 Consider the Markov chain with diagram: 1 3 3 2 2 . j there is a chain of states io = i.3.. Finally. Both (a) and (b) imply that A is invariant for P. (XN-n)O~n~N is also Markov(A.. Let P be an irreducible stochastic matrix and let A be a distribution. Then Pin in-l . A stochastic matrix P and a measure A are said to be in detailed balance if AiPij = AjPji for all i. Then the following are equivalent: (a) (Xn)n~O is reversible..2. . Proof· We have (AP)i = LjE! AjPji = LjE! AiPij = Ai· D Let (Xn)n~O be Markov(A. D We begin a collection of examples with a chain which is not reversible. The chain (Yn)O~n~N is called the time-reversal of (Xn)O~n~N.in =j withpioil .. for all N ~ 1. P). when a solution A to the detailed balance equations exists. by Theorem 1.48 1.. . Pil io D = 7rioPioil .. for each pair of states i.1. If P and A are in detailed balance. Pin-l in / 7ri n >0 so P is also irreducible. Then both (a) and (b) are equivalent to the statement that P = P in Theorem 1..il.9. Example 1. then A is invariant for P. P). (b) P and A are in detailed balance.

1/3. which in this case would be heavily concentrated near M. some of which are joined by edges.9. Example 1. The non-zero detailed balance equations read AiPi. A patient observer would see the chain move clockwise in the long run: under time-reversal the clock would run backwards! Example 1. . 1. However.5 Consider the Markov chain with diagram: P q 1 P • q II( • o where 0 . one might argue that the chain would tend to move to the right and its time-reversal to the left. 1/3) is invariant.9 Time reversal The transition matrix is 0 1/3 ( 2/3 49 p= 2/3 0 1/3 1/3) 2/3 0 and 1r = (1/3. the transpose of P.6 (Random walk on a graph) A gr. usually called vertices. .aph G is a countable collection of states.M) and this may be normalised to give a distribution in detailed balance with P.9. so P =I P and this chain is not reversible.. this ignores the fact that we reverse the chain in equilibrium. .~ i-I i i+l M-l M <P= 1- q < 1. If P were much larger than q. So a solution is given by A = ((p/q)i : i = 0. 1.1. . But p is not symmetric. An observer would see the chain spending most of its time near M and making occasional brief forays to the left.i for i = 0.~ II(. which behaviour is symmetrical in time.i+l = Ai+lPi+l.. Hence this chain is reversible..M ..1. for example: . Hence P = pT.

so that P is irreducible. We have to assume that every vertex has finite valency. then 1r == V / a is invariant and P is reversible. how long on average will it take to return? This is an example of a random walk on a graph: the vertices are the squares of the chessboard and the edges are the moves that the knight can take: . j) is an edge otherwise. There is a natural way to complete the diagram which gives rise to the random walk on G. Discrete-time Markov chains 1 2 4 3 Thus a graph is a partially drawn Markov chain diagram. The valency Vi of vertex i is the number of edges at i. Example 1. if the total valency a == LiEG Vi is finite. So.7 (Random chessboard knight) A random knight makes each permissible move with equal probability. It is easy to see that P is in detailed balance with V == (Vi : i E G). The random walk on G picks edges with equal probability: 1 1 2 1 3 1 2 1 2 1 3 1 3 4 1 1 2 3 2 3 Thus the transition probabilities are given by if (i. If it starts in a corner.9.50 1. We assume G is connected.

. if you enjoy solving sets of 64 simultaneous linear equations. . you might try finding 1r from 1r P == 1r. The four corner squares have valency 2. and the eight squares adjacent to the corners have valency 3. is reversible: (a) l-q p ) . There are 20 squares of valency 4.1 In each of the following cases determine whether the stochastic matrix P. .l.7 and the preceding example that so all we have to do is identify valencies.7.9 Time reversal 51 The diagram shows a part of the graph.N} andp~J == 0 if Ij -il2:: 2. 2 Alternatively.1. (b) (c) 1== {O. We know by Theorem 1. .3. or calculating lEe (Te ) using Theorem 1. .5! Exercises 1.9. and the 16 central squares have valency 8.. Hence lEc(Tc ) = 8 + 24 + 80 + 96 + 128 = 168. which you may assume is irreducible. 16 of valency 6.
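The valency count behind E_c(T_c) = 168 can be checked mechanically. The following short sketch is not from the book (plain Python); it counts the valencies of the knight's-move graph on the 8 × 8 board and applies E_c(T_c) = (Σ_i v_i)/v_c.

    moves = [(1, 2), (2, 1), (-1, 2), (-2, 1), (1, -2), (2, -1), (-1, -2), (-2, -1)]

    def valency(x, y):
        return sum(0 <= x + dx < 8 and 0 <= y + dy < 8 for dx, dy in moves)

    total = sum(valency(x, y) for x in range(8) for y in range(8))
    corner = valency(0, 0)
    print(total, corner, total / corner)   # 336 2 168.0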

identically distributed. Pi. Discrete-time Markov chains (d) (e) = {a. B C ~---"II E Find the probability that X and Y ever meet at a vertex in the following cases: (a) X starts at A and Y starts at B... (b) X starts at A and Y starts at E.. a particle at A jumps to B.. An essential tool is the following ergodic theorem for independent random variables which is a version of the strong law of large numbers. Let YI .. when both X and Y start at I. non-negative random . D p . DIet MI denote the expected time.._.10..2.i+l = P. Theorem 1.2 Two particles X and Y perform independent random walks on the graph shown in the diagram. I P for i 2:: 1. 1.. Show that 9MD = 16MB. 1. So.10 Ergodic theorem Ergodic theorems concern the limiting behaviour of averages over time.. Y 2 . We shall prove a theorem which identifies for Markov chains the long-run proportion of time spent in each state. 1. C or D with equal probability 1/3..52 1. . for example. For I = B. } and POI = 1. until they are once again both at I.i-l = 1 Pij = Pji for all i. Pi. be a sequence of independent.9.j E S.1 (Strong law of large numbers). ..

Fix N < 00 and set yJN) = YnAN. As N i 00 we have E(YI A N) i J_l by monotone convergence (see Section 6.as n mi ---t 00) = 1 where mi = Ei(Ti ) is the expected return time to state i.1. Proof.x. Moreover. for example. The case where J_l = 00 is a simple deduction. 1991)..2 (Ergodic theorem). If (Xn)n?O is Markov(.. + Yn n as n ~ 00. in the positive recurrent case.. The following result gives the long-run proportion of time spent by a Markov chain in each state.. with probability 1 -----~oo Y1 + . D We denote by Vi (n) the number of visits to i before n: n-l Vi(n) = L l{xk=i}' k=O Then Vi (n )/ n is the proportion of time before n spent in state i. + Yn n -----~J_lasn~oo ) = 1.!. Theorem 1..1 0 Ergodic theorem 53 variables with JE(Y1 ) = 11']) jj. So we must have. Then Y1 + ···+ Yn > Y1 n (N) + ···+ Yn n (N) _ ---t E(Y1 /\ N) asn~oo with probability one. Let P be irreducible and let be any distribution.x (Vi (n) n ---t . A proof for the case J_l < 00 may be found. for any bounded function f : I ~ lR we have where and where (7ri : i E I) is the unique invariant distribution. P) then ]p> . in Probability with Martingales by David Williams (Cambridge University Press. ..4).10. Then c (Yl + .
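The long-run proportions V_i(n)/n appearing in the ergodic theorem are easy to watch converge in a simulation. A minimal sketch, not from the book (Python with numpy assumed; the three-state matrix below is just an example whose invariant distribution is (1/5, 2/5, 2/5)):

    import numpy as np
    rng = np.random.default_rng(3)
    P = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.5, 0.5],
                  [0.5, 0.0, 0.5]])             # invariant distribution (1/5, 2/5, 2/5)
    n, x = 100_000, 0
    visits = np.zeros(3)
    for _ in range(n):
        visits[x] += 1                          # V_i(n): time spent in each state
        x = rng.choice(3, p=P[x])
    print(visits / n)                           # close to [0.2, 0.4, 0.4]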

+ S..l). 8. 't the left-hand side being the time of the last visit to i before n. letting n ~ ~ 00 as n ~ 00) = 1. For T = T i we have P(T < 00) = 1 by Theorem 1. P) and independent of X o. Discrete-time Markov chains Proof. + S~n) ~ -t mi as n ) . mi Suppose then that P is recurrent and fix a state i. + S.1. the total number Vi of visits to i is finite. as in Section 1.!-. so it suffices to consider the case A = bi... . Now S't~l) + ••• + S~V~(n)-l) < n 1 _ . If P is transient..V~(n)) Vi(n) (1. . Write sir) for the length of the rth excursion to i.7 and (XT+n)n~O is Markov(8i . ..l) + . with probability 1.. Also the left-hand side being the time of the first visit to i after n .n -t 0= .5..8).!.1.t 00 = 1 and.5. Vi(n) < + .8) By the strong law of large numbers JP> (S~l) + • ~ ~. . are independent and identically distributed with Ei(S. we get JP> (Vi7n) -t mi as n . . The longrun proportion of time spent in i is the same for (XT+n)n>O and (Xn)n>O.V~(n)-l) Vi(n) n S.. which implies P (Vi(n) n -t .XT by the strong Markov property. since P is recurrent P(Vi(n) So. Hence S. so Vi(n) < Vi n .2). By Lemma 1.r)) = mi..l) :::. the non-negative random variables S. Xl.as n mi -t (0) = 1. 00 in (1.54 1.5.. then..t 00) = 1..

- 1 L - Vi(n) 1ri ) Ii k=O 'tEl ~ L IVi~n) iEJ iEJ . to begin.- I < c/4.1 0 Ergodic theorem 55 Assume now that (Xn)n~O has an invariant distribution (1ri : i E I).7ril + L IVi~n) .1..\xoPxoxl •• .PXN-1XN) = L i. The log-likelihood function is given by I(P) = log(. Given c > 0.7ril i~J ~ L I Vi~n) ~ 2~ " iEJ 7ril + L i~J (Vi~n) + 7ri) i~J I-nVi(n) - 1ri I +2~1ri. For any J ~ I we have I ~ n-l !(Xk) -! I = ~ ( -n. Let I : I ~ lR be a bounded function and assume without loss of generality that III ~ 1. for n ~ N(w) 1ri "I iEJ Vi(n) ~ -n. " We proved above that JP> (Vi ~n) ---t 7ri as n ---t 00 for all i) = 1. we have < c. choose J finite so that and then N = N(w) so that. for n ~ N(w). Consider. the case where we have N + 1 observations (Xn)O~n~N. which establishes the desired convergence.jEl N ij logpij . D We consider now the statistical problem of estimating an unknown transition matrix P on the basis of observations of the corresponding Markov chain. Then.

Discrete-time Markov chains up to a constant independent of P.. Since this is clearly false when i is transient. Thus we find N-l Pij = L N-l 1{Xn =i. where Y n = 1 if the nth transition from i is to j. By the strong Markov property Yl. .Y are N independent and identically distributed random variables with mean Pij. . A standard statistical procedure is to find the maximum likelihood estimate P. that is to say whether Pij ~ Pij with probability 1 as N ~ 00. we first try to maximize l (P) + 2: i. + Y N . . where N ij is the number of transitions from i to j.jEI /-LiPij and then choose (/-li : i E I) to fit the constraints. which shows that Pij is consistent. We now turn to consider the consistency of this sort of estimate. In the transient case this may involve restarting the chain several times. This is the method of Lagrange multipliers. Note that to find Pij we simply have to maximize 2: N jEI ij log Pij subject to E j Pij = 1: the other terms and constraints are irrelevant.56 1. by the strong law of large numbers P(Pij ~ Pij as N ~ 00) = 1. and Yn = 0 otherwise. But Nij = Y1 + . . Denote again by N ij the number of transitions from i to j. which is the choice of P maximizing l(P)..X n +1 =j}/ L l{x n =i} n=O n=O which is the proportion of jumps from i which go to j. To maximize the likelihood for (Pij : ij j E I) we still maximize 2: N jEI log Pij subject to Ej Pij = 1. which leads to the maximum likelihood estimate Pij = Nij/N. Since P must satisfy the linear constraint E j Pij = 1 for each i.. Suppose then that instead of N + 1 observations we make enough observations to ensure the chain leaves state i a total of N times. we shall slightly modify our approach. So.
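The maximum likelihood estimate described above, the proportion of observed jumps from i that go to j, is straightforward to compute from a simulated path. A sketch, not from the book (Python with numpy assumed; the generating matrix is invented for the demonstration):

    import numpy as np
    rng = np.random.default_rng(0)
    P = np.array([[0.2, 0.8, 0.0],
                  [0.1, 0.4, 0.5],
                  [0.3, 0.3, 0.4]])             # 'unknown' matrix used to generate data
    N = 50_000
    path = np.empty(N + 1, dtype=int)
    path[0] = 0
    for n in range(N):
        path[n + 1] = rng.choice(3, p=P[path[n]])
    counts = np.zeros((3, 3))
    for a, b in zip(path[:-1], path[1:]):
        counts[a, b] += 1                       # N_ij = number of transitions i -> j
    P_hat = counts / counts.sum(axis=1, keepdims=True)
    print(np.round(P_hat, 2))                   # close to P for a long observed path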

so Yn = anyo.4.b/(l .3 we found a recurrence relation of the form aXn+l + bXn + CXn-l = 0 . Now Yn = Xn .10.) Show that (Ym)m~O is positive recurrent and find its invariant distribution.3 Let (Xn)n~O be an irreducible Markov chain on I having an invariant distribution Jr. (See Example 1. In Example 1. Having a fine artistic temperament.10.11 Appendix: recurrence relations 57 Exercises 1. He walks to the office in the morning and walks home in the evening.2 A professor has N umbrellas. Flowers costing x thousand pounds. This he does by sending flowers every day until she returns. then x = ax + b.10. Here is an account of the simplest cases. 1.4 we found a recurrence relation of the form Xn+l = aX n + b.3.11 Appendix: recurrence relations Recurrence relations often arise in the linear equations associated to Markov chains.a) satisfies Yn+l = aYn. so provided a =I 1 we must have x = b/(l . When a = 1 the general solution is obviously Xn = Xo +nb.4.4 An opera singer is due to perform a long series of concerts.1 Prove the claim (d) made in example (v) of the Introduction. bring about a reconciliation with probability yIX.a). The promoter stands to make £750 from each successful concert.1. independently of past weather. We look first for a constant solution X n = x.4. For J ~ I let (Ym)m~O be the Markov chain on J obtained by observing (Xn)n~O whilst in J. she is liable to pullout each night with probability 1/2. How much should he spend on flowers? 1. A more specialized case was dealt with in Example 1. 0 ~ x ~ 1. Suppose that it rains on each journey with probability p.3. Thus the general solution when a =I 1 is given by Xn = Aa n + b/ (1 - a) where A is a constant. 1. What is the long-run proportion of journeys on which the professor gets wet? 1.10. If it is raining he likes to carry an umbrella and if it is fine he does not. Once this has happened she will not sing again until the promoter convinces her of his high regard.1. In Example 1.

. so by induction Yn = Xn for all n. Hence the general solution is given by Ao:n + B{3n if 0: # {3 n { (A + nB)an if 0: = {3. Yn = Xn for all n. The case 0: = (3 = 0 does not arise. then a-X 2 + b-X + c = O. Then Yn = Ao: n + B{3n is a solution. We make use of the power series expansions for 10g(1 It I < . Denote by 0: and {3 the roots of this quadratic. then Yn = (A + nB)o:n is a solution and we can solve so that Yo = Xo and Yl = Xl. Here is a derivation.t) = -t .. so that Yo = Xo and Yl xl=Ao:+B{3 = Xl.58 1. by the same argument. then. If 0: =1= (3 then we can solve the equations xo=A+B. 1 + t) = t - ~t2 + ~t3 - 10g(1 . Let us try a solution of the form X n = -X n. X = 1.~t2 . but for all n.~t3 - By subtraction we obtain 2 log 1 (l+t) t 1_ = 13 15 t + 3 t +:5 t +. Discrete-time Markov chains where a and c were both non-zero. If 0: = {3 # 0.. 00 ).12 Appendix: asymptotics for n! Our analysis of recurrence and transience for random walks in Section 1.6 rested heavily on the use of the asymptotic relation n! rv Ayri(nje)n as n ~ 00 for some A E [1. ..

Hence an ~ a for some a E [0. . ::. (2n+1)5 -1 1 1 = 3 (2n + 1)2 + 5 (2n + 1)4 + · .12(n + 1) ... 1 3 (2n + 1)2 .an+! = (2n = logAn. 3" (2n (2n 1)4 1 + II} { + + + ··· 1 1 1 12n .12 Appendix: asymptotics for n! 59 Set An = n!/(nn+l/2 e-n) and an calculation an .1/(12n) increases as n ~ 00. Then.1. by a straightforward + 1)2" log 1 (1+(2n+l)-1) 1 _ (2n + 1)-1 . where A = ea.1.1 It follows that an decreases and an . 00) and hence An ~ A.. By the series expansion written above we have an-an+! = (2n+1) { Ill 1 1 1)2 (2n+1) +3"(2n+1)3 +5 II} + . as n ~ 00.

j E I) . Some examples of more general processes are given in Section 2.9. These processes are simple and particularly important examples of continuous-time chains. but once up and running it follows a very similar pattern to the discrete-time case. The theory takes some time to set up. If you wish. provided you take certain basic properties on trust.1.5 on birth processes provide a gentle warm-up for general continuoustime Markov chains. deal with the heart of the continuous-time theory. which are reviewed in Section 3.4 on the Poisson process and Section 2. A Q-matrix on I is a matrix Q satisfying the following conditions: = (qij : i. The first three sections of Chapter 2 fill in some necessary background information and are independent of each other.7 and 2. especially 2.8.6-2. As in Chapter 1 the exercises form an important part of the text. 2. Sections 2. To emphasise this we have put the setting-up in this chapter and the rest in the next. you can begin with Chapter 3. There is an irreducible level of difficulty at this point.8.2 Continuous-time Markov chains I The material on continuous-time Markov chains is divided between this chapter and the next. so we advise that Sections 2.1 Q-matrices and their exponentials In this section we shall discuss some of the basic properties of Q-matrices and explain their connection with continuous-time Markov chains. Section 2.8 are read selectively at first. Let I be a countable set.

. The numbers qi are not shown on the diagram. We may think of the discrete parameter space {O. ) is by the function (e tq : t ~ 0). but you can work them out from the other information given. j) arrow on the diagram. We shall later interpret qi as the rate of leaving i. for all i qij ~ 0 =I j.. Consider now a finite set I and a matrix .00)... } as embedded in the continuous parameter space [0. 1.00) a natural way to interpolate the discrete sequence (pn : n = 0.1.2. in this case Q= (~2 2 !1 1 -3 ~) Thus each off-diagonal entry qij gives the value we attach to the (i.1 Q-matrices and their exponentials (i) 0 ~ (ii) -qii 61 < 00 for all i. . subject only to the constraint that the off-diagonal row sum is finite: qi = Lqij j#i < 00. for example: 1 3 1 2 Each diagram then corresponds to a unique Q-matrix.2. For P E (0. which we shall interpret later as the rate of going from i to j. . i. making the total row sum zero. (iii) L jEI qij = 0 for all Thus in each row of Q we can choose the off-diagonal entries to be any nonnegative real numbers.2. where q = logp. A convenient way to present the data for a continuous-time Markov chain is by means of a diagram. The diagonal entry qii is then -qi.

.1. the series converges componentwise and we denote its limit by eQ . Continuous-time Markov chains I P = (Pij : i. Suppose then that we can find a matrix Q with eQ = P.10.1.2. Is there a natural way to fill in the gaps in the discrete sequence (p n : n = 0. . j E I).62 2. Then (P(t) : t ~ 0) has the following properties: (i) P(s + t) = P(s)P(t) for all s. if two matrices Q1 and Q2 commute..2.. so e sQ e tQ = e(s+t)Q proving the semigroup property. Theorem 2.1. (ii) (P(t) : t ~ 0) is the unique solution to the forward equation ~ P(t) = P(t)Q. Set P(t) = e tQ . (iii) (P(t) : t ~ P(O) = I. Let Q be matrix on a finite set I. 0) is the unique solution to the backward equation P(O) = I. sQ and tQ commute.1.. The matrix-valued power series P(t) = f: (t~)k k=O . Then e nQ = (eQ)n = p n so (e tQ : t ~ 0) fills in the gaps in the discrete sequence. t E lR.j E I). .. )? For any matrix Q = (qij : i. then The proofs of these assertions follow the scalar case closely and are given in Section 2. we have Proof. For any s. (iv) for k = 0. Moreover. t (semigroup property).

A similar argument proves uniqueness for the backward equation. (ii) LPij = 1 for all i.!!.tQ ) dt dt dt = M(t)Qe. Later we shall also use the convention that f(t) = o(t) means f(t)/t ~ 0 as t ~ o.-e.-M(t)) e.tQ is constant. LJ (n) LJ LJ (n LJ ( n LJ kEI kEI jEI jEI kEI .2. Moreover by repeated term-by-term differentiation we obtain (iv).!!.2. P (t) = L (k _ k=1 00 t k. j and t ~ 0 sufficiently small. D The last result was about matrix exponentials in general. Theorem 2.1 Qk 1)! = P(t)Q = QP(t).-(M(t)e-tQ ) = (. It remains to show that P(t) is the only solution of the forward and backward equations. for some C < 00.tQ = 0 so M(t)e. then .j and all t ~ o.10).tQ + M(t) (. jEI We recall the convention that in the limit t ~ 0 the statement f(t) = O(t) means that f(t)/t ~ C for all sufficiently small t.j. But if M(t) satisfies the forward equation. Since P(t) = P(t/n)n for all n.1.tQ + M(t)( -Q)e. Proof. So each component is differentiable with derivative given by term-by-term differentiation: . Recall that a matrix P = (Pij : i. As t ! 0 we have so qij ~ 0 for i =1= j if and only if Pij (t) ~ 0 for all i. A matrix Q on a finite set I is a Q-matrix if and only if P(t) = etQ is a stochastic matrix for all t ~ o.1 Q-matrices and their exponentials 63 has infinite radius of convergence (see Section 2. If Q has zero row sums then so does Qn for every n: """ qik = """ """ qij -1) qjk = """ qij -1) """ qjk = 0 .!!. Now let us see what happens to Q-matrices. and so M(t) = P(t). j E I) is stochastic if it satisfies (i) 0 ~ Pij < 00 for all i. Hence P(t) satisfies the forward and backward equations. it follows that qij ~ 0 for i =1= j if and only if Pij(t) ~ 0 for all i.

In particular.. P) when sampled at integer times.1..1. where Pij(t) is the (i.. a continuous-time process (Xt)t. all times io.2. we can do some sort of filling-in of gaps at the level of processes.j) entry in etQ . . Continuous-time Markov chains I tn LPij(t) jEI = 1 + L I" Lqij n.. We shall now give some examples where the transition probabilities Pij(t) may be calculated explicitly. . as we shall see in Section 2. It should not then be too surprising that there is. the transition probability from i to j in time t is given by ° (Recall that := means 'defined to equal'. we shall see in Section 2..) You should compare this with the defining property of a discrete-time Markov chain given in Section 1. To anticipate a little. . . ) is discrete-time Markov(. .1. eQ / m ).. We define a process indexed by {n/m : n = 0.i n + 1 ...2. then L qij jEI ~I LPij(t) jEI = O.1. if P is a stochastic matrix of the form eQ for some Q-matrix. } by Then (Xn : n = 0.8. } as time-parameter sets which give rise to Markov(. On the other hand.8 that a continuous-time Markov chain (Xt)t~O with Q-matrix Q satisfies ~ to ~ .x.1. ~ t n +l and all states for all n = 0.x. n=l jEI 00 (n) = 1.64 So 2. Fix some large integer m and let (X~)n~O be discrete-time Markov(.... (eQ/m)m) (see Exercise 1.~o which also has this property. . D t=O Now. if LjE! Pij(t) = 1 for all t = ~ 0.1..2. .x.2.2) and Thus we can find discrete-time Markov chains with arbitrarily fine grids {n/m : n = 0. .

2. We begin by writing down the characteristic equation for Q: o = det (x - Q) = x (x + 2) (x + 4).I 0 o (-4t)k ~ e- ) U- I . This shows that Q has distinct eigenvalues 0.) To determine the constants we use 1 = PII (0) = a + b + c. 7= so qii) = P~I (0) = 4b + 16c.6. (This is because we could diagonalize Q by an invertible matrix U: Then 0 (-2t)k 0 ) U. 4t so PII(t) must have the form claimed. band c. -4. -2. Then PII(t) has the form PII(t) = a + be.1.1 Q-matrices and their exponentials Example 2. -2 = qII = P~I (0) = -2b .1.3 65 We calculate PII(t) for the continuous-time Markov chain with Q-matrix Q= (~2 2 !1 1 -3 ~) The method is similar to that of Example 1.4c.4t for some constants a. .2t + ce.

N}..1 Compute Pll(t) for P(t) = etQ . these are the Poisson probabilities of parameter At.At for i < N. Pii (0) == 1.. by induction = e.~~.A. Then.1. pii(t) i <j < N so.. A ~. . for Pij(t) = e->.. P~N(t) = APiN-l(t). 1 -----------. for i < j < N. We can solve these equations. where yt is a Poisson random variable of parameter At. for i < N.4 We calculate Pij(t) for the continuous-time Markov chain with diagram given above.. PiN(O) = 0.. So. where Q= (~2 2 !4 1 -3 ~).t (At)j-i (j _ i)!' If i = 0. The exponential of an upper-triangular matrix is upper-triangular. 2 N-l N Example 2.. Pij (0) == 0.1. so Pij(t) = 0 for i > j.. P~j(t) = -APij(t) + APi. for i < N. Continuous-time Markov chains I A . In components the forward equation P'(t) = P(t)Q reads P~i(t) = -APii(t).66 2. First.. Exercises 2..j-l (t). starting from 0. the distribution of the Markov chain at time t is the same as the distribution of min{yt..o. ---41.. The Q-matrix is -A A -A A A -A Q= o A where entries off the diagonal and super-diagonal are all zero.

at least in principle.6. X t1 = il. . which is proved in Section 6.2 Which of the following matrices is the exponential of a Q-matrix? 67 (a) (~ ~) (b) (~ ~) (c) (~ ~). These should enable us to find..in =l=i where ql....1. the probability of any event depending on a right-continuous process can be determined from its finite-dimensional distributions. This means in this context that for all wEn and t ~ 0 there exists € > 0 such that for t ~ s ~t + €.Xtn = in) for n ~ 0. . By a standard result of measure theory.. We are going to consider ways in which we might specify the probabilistic behaviour (or law) of (Xt)t~o. or P(Xt = i for some t). To avoid these subtleties as far as possible we shall restrict our attention to processes (Xt)t~O which are right-continuous. . . .00)) = 1-n--+-oo lim = ji.. What consequences do your answers have for the discrete-time Markov chains with these transition matrices? 2. .Xtn = in).2 Continuous-time random processes Let I be a countable set.2. .in E I. 0 ~ to ~ tl ~ . that is. They arise because. for a countable disjoint union whereas for an uncountable union Ut>o At there is no such rule. . For example '""" L. any probability connected with the process. There are subtleties in this problem not present in the discrete-time case. ..J P(Xq1 P(Xt = i for some t E [0...2 Continuous-time random processes 2. . is an enumeration of the rationals. ~ t n and i o.. from the probabilities lP(Xto = io. such as lP(Xt = i) or lP(Xto = io. q2. A continuous-time random process with values in I is a family of random variables X t : n ~ I.Xqn = jn) jl .

this is illustrated below. it may explode again.1 t The second case is where the path makes finitely many jumps and then becomes stuck in some state forever: • Jo = 0 J. t]: • o • • o ~- Jo = 0 1. but only finitely many in any interval [0. Continuous-time Markov chains I Every path t ~ Xt(w) of a right-continuous process must remain constant for a while in each new state. or it may not.68 2. maybe infinitely often. . so there are three possibilities for the sorts of path we get. In the first case the path makes infinitely many jumps. In this case.2 t In the third case the process makes infinitely many jumps in a finite interval. after the explosion time ( the process starts up again.

the holding times... the final value.. and. The (first) explosion time ( is defined by (= supJn = n I:Sn.2. .....------e.. They are obtained from (Xt)t~O by for n = 0. the jump times of (Xt)t~O and 8 1 . - We call Jo.2. 00 say. ... or the jump chain if it is a discrete-time Markov chain....... This is simply the sequence of values taken by (Xt)t~O up to explosion..2 Continuous-time random processes 69 :. .1.. .. where inf 0 = 00._----. Note that right-continuity forces 8 n > 0 for all n. and require that X t = 00 if t ~ (. If I n + 1 = 00 for some n.82 . .- • .-- . • " . ..... n==1 00 The discrete-time process (Yn)n~O given by Y n = X Jn is called the jump process of (Xt)t~O. The terminology 'minimal' does not refer to the state of the process but to the . Any process satisfying this requirement is called minimal. if I n -1 < 00 otherwise.. .. for n = 1.. we define X oo = XJn .. . So it is convenient to adjoin to I a new state...-. J 1 . We shall not consider what happens to a process after explosion... Jo = 0 t 14----. otherwise X oo is undefined..... ~~~~~ . • ...

Theorem 2. If -X > 0. .1 (Memoryless property). .00] has exponential distribution of parameter -X (0 ~ -X < 00) if lP(T > t) = e.00] has an exponential distribution if and only if it has the following memoryless property: lP(T> s Proof. t < I n +1) = i for some t E [0. The exponential distribution plays a fundamental role in continuous-time Markov chains because of the following results.At for all t ~ O. the probability that X t = i is given by IP(Xt = i) = and lP(Xt n=O L IP(Y 00 n = i and I n :::. A random variable T: n ~ (0. then T has density function The mean of T is given by E(T) = 1 00 IP(T > t)dt = A-I.3 Some properties of the exponential distribution A random variable T : n ~ [0.82. We write T rv E(-X) for short. rv IP(T> s + tiT> s) = P(T> s + t) IP(T> s) = e-A(s+t) e->'s = -At e = IP(T > t). and (Yn)n~O we have another 'countable' specification of the probabilistic behaviour of (Xt)t~o.3. Note that a minimal process may be reconstructed from its holding times and jump process. Suppose T + tiT> s) = E(-X). Continuous-time Markov chains I interval of time over which the process is active. Thus by specifying the joint distribution of 8 1 ..00)) = lP(Yn = i for some n ~ 0). t ~ O.. then lP(T > t) for all s.70 2. For example. 2.

. Proof. by monotone convergence so . and.At . . 0 The next result shows that a sum of independent exponential random variables is either certain to be finite or certain to be infinite. This will be used to determine whether or not certain continuous-time Markov chains can take infinitely many jumps in a finite time. since we can choose rand s arbitrarily close to t. and gives a criterion for deciding which is true. for integers p. 1 (i) If~ An < 00 00.3 Some properties of the exponential distribution 71 On the other hand. A < 00.AT for all rationals r > O. l. (ii) If ~ An 00 1 = 00. be a sequence of independent random variables with 8 n rv E(A n ) and 0 < An < 00 for all n.q ~ 1 g(p/q) = g(l/q)P = g(l)pjq so g(r) = e. this forces g(t) so T rv E(A).3. We assumed T > 0 so that g(l/n) > 0 for some n.A for some 0 ::. Then g(t) = JP>(T > t) satisfies g(s + t) = g(s)g(t) for all s. choose rationals e. . suppose T has the memoryless property whenever JP>(T> s) > O. by induction so g(l) = e. then then JP> (00) = ~ Sn < 00 l.AS = e. Then. For real t r. s > 0 with r ::. (i) Suppose E~1 1/ An < Then.2. t ::.2. Let 8 1 . > 0. Theorem 2. JP> (00) = ~ Sn = 00 00. 8 2 . Since 9 is decreasing. t ~ O. By the same argument. s.AT = g(r) ~ g(t) ~ g(s) = e.

Theorem 2. .3. D The following result is fundamental to continuous-time Markov chains. lP(K = k and T ~ t) ~ = lP(Tk = 1 it t t and Tj > Tk for all j =I k) 00 qke-qkBP(Tj > s for all j 1= k)ds = (OO qk e . we have rv E(A) and R rv jjJP>(S ~ t < S + R) = AJP>(R ~ t < R+ S). with T rv E(q) and lP(K = k) = qk/q.3. D = 1 and T and K have the claimed joint The following identity is the simplest case of an identity used in Section 2.qkB 00 IT e-qjBds j# = 1 qke - q 8 qk ds = -e . Continuous-time Markov chains I 00.3. Theorem 2. otherwise let K be undefined. Then rr~o(1 + 1/ An) = By monotone convergence and independence so JP> (f n==l Sn = 00) = 1. Proof. Let I be a countable set and let Tk' k E I. Moreover. with probability 1. q Hence lP(K = k for some k) distribution.4. (ii) Suppose instead that E~11/An = 00. For independent random variables S E(jj) and for t ~ 0. T and K are independent. Set T = inf k T k .72 2. be independent random variables with Tk rv E(qk) and 0 < q := EkE! qk < 00. Then this infimum is attained at a unique random value K of k. Set K Then = k if T k < T j for all j =I k.q t .8 in proving the forward equations for a continuous-time Markov chain.

The following identity is the simplest case of an identity used in Section 2.8 in proving the forward equations for a continuous-time Markov chain.

Theorem 2.3.4. For independent random variables S ~ E(λ) and R ~ E(μ) and for t ≥ 0, we have

    μ P(S ≤ t < S + R) = λ P(R ≤ t < R + S).

Proof. We have

    μ P(S ≤ t < S + R) = μ ∫_0^t λ e^{-λs} e^{-μ(t-s)} ds = ∫_0^t λμ e^{-λs} e^{-μ(t-s)} ds

from which the identity follows by symmetry. □

Exercises

2.3.1 Suppose S and T are independent exponential random variables of parameters α and β respectively. What is the distribution of min{S, T}? What is the probability that S ≤ T? Show that the two events {S < T} and {min{S, T} ≥ t} are independent.

2.3.2 Let T_1, T_2, ... be independent exponential random variables of parameter λ and let N be an independent geometric random variable with

    P(N = n) = β(1 − β)^{n−1},  n = 1, 2, ....

Show that T = Σ_{i=1}^N T_i has exponential distribution of parameter λβ.

2.3.3 Let S_1, S_2, ... be independent exponential random variables with parameters λ_1, λ_2, ... respectively. Show that λ_1 S_1 is exponential of parameter 1. Use the strong law of large numbers to show, first in the special case λ_n = 1 for all n, and then subject only to the condition sup_n λ_n < ∞, that

    P(Σ_{n=1}^∞ S_n = ∞) = 1.

Is the condition sup_n λ_n < ∞ absolutely necessary?

2.4 Poisson processes

Poisson processes are some of the simplest examples of continuous-time Markov chains. We shall also see that they may serve as building blocks for the most general continuous-time Markov chain. Moreover, a Poisson process is the natural probabilistic model for any uncoordinated stream of discrete events in continuous time. So we shall study Poisson processes first, both as a gentle warm-up for the general theory and because they are useful in themselves. The key result is Theorem 2.4.3, which provides three different descriptions of a Poisson process. The reader might well begin with the statement of this result and then see how it is used in the

theorems and examples that follow.

We shall begin with a definition in terms of jump chain and holding times (see Section 2.2). A right-continuous process (X_t)_{t≥0} with values in {0, 1, 2, ...} is a Poisson process of rate λ (0 < λ < ∞) if its holding times S_1, S_2, ... are independent exponential random variables of parameter λ and its jump chain is given by Y_n = n. A simple way to construct a Poisson process of rate λ is to take a sequence S_1, S_2, ... of independent exponential random variables of parameter λ, to set J_0 = 0, J_n = S_1 + ... + S_n and then set

    X_t = n  if  J_n ≤ t < J_{n+1}.

[Figure: a typical sample path of a Poisson process, a step function starting at 0 which increases by 1 at each of the jump times J_0 = 0 < J_1 < J_2 < ..., with holding times S_1, S_2, ... between jumps.]

Here is the diagram:

    0 --λ--> 1 --λ--> 2 --λ--> 3 --λ--> 4 --λ--> ...

The associated Q-matrix is given by

    Q = ( -λ   λ
               -λ   λ
                    -λ   λ
                         ·  ·  · ).

By Theorem 2.3.2 (or the strong law of large numbers) we have P(J_n → ∞) = 1, so there is no explosion and the law of (X_t)_{t≥0} is uniquely determined.
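The jump chain/holding time description translates directly into a way of simulating a Poisson process. The following sketch is an illustration added here (not from the book), assuming Python with NumPy: it sums independent E(λ) holding times to get the jump times J_n and reads off X_t as the number of jumps up to time t; the sample mean and variance of X_t should both be close to λt.

```python
import numpy as np

def poisson_jump_times(lam, t_max, rng):
    """Jump times J_1 < J_2 < ... <= t_max built from E(lam) holding times."""
    jumps, t = [], 0.0
    while True:
        t += rng.exponential(1.0 / lam)      # holding time S_n ~ E(lam)
        if t > t_max:
            return np.array(jumps)
        jumps.append(t)

rng = np.random.default_rng(1)
lam, t_max = 2.0, 5.0
counts = [len(poisson_jump_times(lam, t_max, rng)) for _ in range(20_000)]
print(np.mean(counts), np.var(counts))       # both should be close to lam * t_max = 10
```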

The diagram illustrates a typical path. We now show how the memoryless property of the exponential holding times leads to a memoryless property of the Poisson process.

Theorem 2.4.1 (Markov property). Let (X_t)_{t≥0} be a Poisson process of rate λ. Then, for any s ≥ 0, (X_{s+t} − X_s)_{t≥0} is also a Poisson process of rate λ, independent of (X_r : r ≤ s).

Proof. It suffices to prove the claim conditional on the event {X_s = i}, for each i ≥ 0. On this event X_r = Σ_{j=1}^i 1_{J_j ≤ r} for r ≤ s, and the holding times of (X_{s+t} − X_s)_{t≥0} are

    S_{i+1} − (s − J_i),  S_{i+2},  S_{i+3}, ...

as shown in the diagram. Recall that the holding times S_1, S_2, ... are independent E(λ). Condition on S_1, ..., S_i and {X_s = i}; then, by the memoryless property of S_{i+1} and independence, the holding times of the new process are themselves independent E(λ), and independent of S_1, ..., S_i. Hence, conditional on {X_s = i}, (X_{s+t} − X_s)_{t≥0} is a Poisson process of rate λ and independent of (X_r : r ≤ s). □

In fact, we shall see in Section 6.5, by an argument in essentially the same spirit, that the result also holds with s replaced by any stopping time T of (X_t)_{t≥0}.

Theorem 2.4.2 (Strong Markov property). Let (X_t)_{t≥0} be a Poisson process of rate λ and let T be a stopping time of (X_t)_{t≥0}. Then, conditional on T < ∞, (X_{T+t} − X_T)_{t≥0} is also a Poisson process of rate λ, independent of (X_s : s ≤ T).

Here is some standard terminology. If (X_t)_{t≥0} is a real-valued process, we can consider its increment X_t − X_s over any interval (s, t]. We say that (X_t)_{t≥0} has stationary increments if the distribution of X_{s+t} − X_s depends only on t ≥ 0. We say that (X_t)_{t≥0} has independent increments if its increments over any finite collection of disjoint intervals are independent.

We come to the key result for the Poisson process, which gives two conditions equivalent to the jump chain/holding time characterization which we took as our original definition. Thus we have three alternative definitions of the same process.

Theorem 2.4.3. Let (X_t)_{t≥0} be an increasing, right-continuous integer-valued process starting from 0. Let 0 < λ < ∞. Then the following three conditions are equivalent:

(a) (jump chain/holding time definition) the holding times S_1, S_2, ... of (X_t)_{t≥0} are independent exponential random variables of parameter λ and the jump chain is given by Y_n = n for all n;

(b) (infinitesimal definition) (X_t)_{t≥0} has independent increments and, as h ↓ 0, uniformly in t,

    P(X_{t+h} − X_t = 0) = 1 − λh + o(h),  P(X_{t+h} − X_t = 1) = λh + o(h);

(c) (transition probability definition) (X_t)_{t≥0} has stationary independent increments and, for each t, X_t has Poisson distribution of parameter λt.

If (X_t)_{t≥0} satisfies any of these conditions then it is called a Poisson process of rate λ.

Proof. (a) ⇒ (b) If (a) holds, then, by the Markov property, for any t, h ≥ 0, the increment X_{t+h} − X_t has the same distribution as X_h and is independent of (X_s : s ≤ t). So (X_t)_{t≥0} has independent increments and, as h ↓ 0, uniformly in t,

    P(X_{t+h} − X_t ≥ 1) = P(X_h ≥ 1) = P(J_1 ≤ h) = 1 − e^{−λh} = λh + o(h),
    P(X_{t+h} − X_t ≥ 2) = P(X_h ≥ 2) = P(J_2 ≤ h) ≤ P(S_1 ≤ h and S_2 ≤ h) = (1 − e^{−λh})² = o(h),

which implies (b).

(b) ⇒ (c) If (b) holds, then P(X_{t+h} − X_t = i) = o(h) as h ↓ 0, uniformly in t, for i = 2, 3, .... Set p_j(t) = P(X_t = j). Then, for j = 1, 2, ...,

    p_j(t + h) = P(X_{t+h} = j) = Σ_{i=0}^j P(X_{t+h} − X_t = i) P(X_t = j − i)
        = (1 − λh + o(h)) p_j(t) + (λh + o(h)) p_{j−1}(t) + o(h)

so

    (p_j(t + h) − p_j(t))/h = −λ p_j(t) + λ p_{j−1}(t) + O(h).

Since this estimate is uniform in t we can put t = s − h to obtain, for all s ≥ h,

    (p_j(s) − p_j(s − h))/h = −λ p_j(s − h) + λ p_{j−1}(s − h) + O(h).

Now let h ↓ 0 to see that p_j(t) is first continuous and then differentiable and satisfies the differential equation

    p_j'(t) = −λ p_j(t) + λ p_{j−1}(t).

By a simpler argument we also find p_0'(t) = −λ p_0(t). Since X_0 = 0 we have initial conditions p_0(0) = 1, p_j(0) = 0 for j = 1, 2, .... This system of equations has a unique solution, given by

    p_j(t) = e^{−λt} (λt)^j / j!,  j = 0, 1, 2, ....

Hence X_t ~ P(λt). If (X_t)_{t≥0} satisfies (b), then so does (X_{s+t} − X_s)_{t≥0} for any s, so the above argument shows X_{s+t} − X_s ~ P(λt), and (X_t)_{t≥0} has stationary independent increments, which implies (c).

(c) ⇒ (a) There is a process satisfying (a) and we have shown that it must then satisfy (c). But condition (c) determines the finite-dimensional distributions of (X_t)_{t≥0} and hence the distribution of jump chain and holding times. So if one process satisfying (c) also satisfies (a), then so must every process satisfying (c). □

The differential equations which appeared in the proof are really the forward equations for the Poisson process. To make this clear, consider the

possibility of starting the process from i at time 0, writing P_i as a reminder, and set p_{ij}(t) = P_i(X_t = j). Then, by spatial homogeneity, p_{ij}(t) = p_{j−i}(t), and we could rewrite the differential equations as

    p_{i0}'(t) = −λ p_{i0}(t),   p_{i0}(0) = δ_{i0},
    p_{ij}'(t) = λ p_{i,j−1}(t) − λ p_{ij}(t),   p_{ij}(0) = δ_{ij}

or, in matrix form, for Q as above,

    P'(t) = P(t)Q,  P(0) = I.

Theorem 2.4.3 contains a great deal of information about the Poisson process of rate λ. It can be useful when trying to decide whether a given process is a Poisson process, as it gives you three alternative conditions to check, and it is likely that one will be easier to check than another. On the other hand it can also be useful when answering a question about a given Poisson process, as this question may be more closely connected to one definition than another. For example, you might like to consider the difficulties in approaching the next result using the jump chain/holding time definition.

Theorem 2.4.4. If (X_t)_{t≥0} and (Y_t)_{t≥0} are independent Poisson processes of rates λ and μ, respectively, then (X_t + Y_t)_{t≥0} is a Poisson process of rate λ + μ.

Proof. We shall use the infinitesimal definition, according to which (X_t)_{t≥0} and (Y_t)_{t≥0} have independent increments and, as h ↓ 0, uniformly in t,

    P(X_{t+h} − X_t = 0) = 1 − λh + o(h),  P(X_{t+h} − X_t = 1) = λh + o(h),
    P(Y_{t+h} − Y_t = 0) = 1 − μh + o(h),  P(Y_{t+h} − Y_t = 1) = μh + o(h).

Set Z_t = X_t + Y_t. Then (Z_t)_{t≥0} has independent increments and, as h ↓ 0, uniformly in t, since (X_t)_{t≥0} and (Y_t)_{t≥0} are independent,

    P(Z_{t+h} − Z_t = 0) = P(X_{t+h} − X_t = 0) P(Y_{t+h} − Y_t = 0)
        = (1 − λh + o(h))(1 − μh + o(h)) = 1 − (λ + μ)h + o(h),

    P(Z_{t+h} − Z_t = 1) = P(X_{t+h} − X_t = 1) P(Y_{t+h} − Y_t = 0) + P(X_{t+h} − X_t = 0) P(Y_{t+h} − Y_t = 1)
        = (λh + o(h))(1 − μh + o(h)) + (1 − λh + o(h))(μh + o(h)) = (λ + μ)h + o(h).

Hence (Z_t)_{t≥0} is a Poisson process of rate λ + μ. □
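As a quick numerical illustration of Theorem 2.4.4 (added here, not part of the original text; Python with NumPy, rates chosen arbitrarily), one can at least check the one-dimensional marginal: the increment X_t + Y_t over [0, t] should be Poisson of parameter (λ + μ)t.

```python
import numpy as np
from math import exp, factorial

rng = np.random.default_rng(2)
lam, mu, t = 1.5, 0.7, 4.0
n = 200_000

Z_t = rng.poisson(lam * t, size=n) + rng.poisson(mu * t, size=n)   # X_t + Y_t

for k in range(5):
    theory = exp(-(lam + mu) * t) * ((lam + mu) * t) ** k / factorial(k)
    print(k, round((Z_t == k).mean(), 4), round(theory, 4))
```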

Next we establish some relations between Poisson processes and the uniform distribution. The results say in effect that the jumps of a Poisson process are as randomly distributed as possible. Notice that the conclusions are independent of the rate of the process considered.

Theorem 2.4.5. Let (X_t)_{t≥0} be a Poisson process. Then, conditional on (X_t)_{t≥0} having exactly one jump in the interval [s, s + t], the time at which that jump occurs is uniformly distributed on [s, s + t].

Proof. By stationarity of increments, it suffices to consider the case s = 0. Then, for 0 ≤ u ≤ t,

    P(J_1 ≤ u | X_t = 1) = P(J_1 ≤ u and X_t = 1)/P(X_t = 1)
        = P(X_u = 1 and X_t − X_u = 0)/P(X_t = 1)
        = λu e^{−λu} e^{−λ(t−u)} / (λt e^{−λt}) = u/t. □

Theorem 2.4.6. Let (X_t)_{t≥0} be a Poisson process. Then, conditional on the event {X_t = n}, the jump times J_1, ..., J_n have joint density function

    f(t_1, ..., t_n) = n! t^{−n} 1_{0 ≤ t_1 ≤ ... ≤ t_n ≤ t}.

Thus, conditional on {X_t = n}, the jump times J_1, ..., J_n have the same distribution as an ordered sample of size n from the uniform distribution on [0, t].

Proof. The holding times S_1, ..., S_{n+1} have joint density function

    λ^{n+1} e^{−λ(s_1 + ... + s_{n+1})} 1_{s_1, ..., s_{n+1} ≥ 0}

so the jump times J_1, ..., J_{n+1} have joint density function

    λ^{n+1} e^{−λ t_{n+1}} 1_{0 ≤ t_1 ≤ ... ≤ t_{n+1}}.

So for A ⊆ R^n we have

    P((J_1, ..., J_n) ∈ A and X_t = n) = P((J_1, ..., J_n) ∈ A and J_n ≤ t < J_{n+1})
        = ∫_{A ∩ {0 ≤ t_1 ≤ ... ≤ t_n ≤ t}} λ^n e^{−λt} dt_1 ... dt_n

and, since P(X_t = n) = e^{−λt}(λt)^n/n!, we obtain

    P((J_1, ..., J_n) ∈ A | X_t = n) = ∫_{(t_1, ..., t_n) ∈ A} n! t^{−n} 1_{0 ≤ t_1 ≤ ... ≤ t_n ≤ t} dt_1 ... dt_n

as required. □
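Theorem 2.4.6 underlies a second, widely used way of simulating a Poisson process on a fixed interval [0, t]: draw the number of jumps from a Poisson distribution of parameter λt and then place the jump times as a sorted sample of independent uniforms. The sketch below is an added illustration (not from the book), assuming Python with NumPy.

```python
import numpy as np

def poisson_points(lam, t, rng):
    """Jump times on [0, t] via Theorem 2.4.6: N ~ Poisson(lam*t), then N sorted uniforms."""
    n = rng.poisson(lam * t)
    return np.sort(rng.uniform(0.0, t, size=n))

rng = np.random.default_rng(3)
points = poisson_points(3.0, 10.0, rng)
print(len(points), points[:5])
```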

We finish with a simple example typical of many problems making use of a range of properties of the Poisson process.

Example 2.4.7. Robins and blackbirds make brief visits to my birdtable. The probability that in any small interval of duration h a robin will arrive is found to be ρh + o(h), whereas the corresponding probability for blackbirds is βh + o(h). What is the probability that the first two birds I see are both robins? What is the distribution of the total number of birds seen in time t? Given that this number is n, what is the distribution of the number of blackbirds seen in time t?

By the infinitesimal characterization, the number of robins seen by time t is a Poisson process (R_t)_{t≥0} of rate ρ, and the number of blackbirds is a Poisson process (B_t)_{t≥0} of rate β. The times spent waiting for the first robin or blackbird are independent exponential random variables S_1 and T_1 of parameters ρ and β respectively. So a robin arrives first with probability ρ/(ρ + β) and, by the memoryless property of T_1, the probability that the first two birds are robins is ρ²/(ρ + β)². By Theorem 2.4.4 the total number of birds seen in an interval of duration t has Poisson distribution of parameter (ρ + β)t. Finally,

    P(B_t = k | R_t + B_t = n) = P(B_t = k and R_t = n − k)/P(R_t + B_t = n)
        = (e^{−βt}(βt)^k/k!)(e^{−ρt}(ρt)^{n−k}/(n − k)!) / (e^{−(ρ+β)t}((ρ + β)t)^n/n!)
        = (n choose k) (β/(ρ + β))^k (ρ/(ρ + β))^{n−k}

so if n birds are seen in time t, then the distribution of the number of blackbirds is binomial of parameters n and β/(ρ + β).
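The last conclusion of Example 2.4.7 can be checked numerically. The sketch below is an added illustration (not from the text); it assumes Python with NumPy and SciPy, and the rates ρ, β and the values of t and n are arbitrary. It conditions simulated robin and blackbird counts on a total of n birds and compares with the binomial law of parameters n and β/(ρ + β).

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(4)
rho, beta, t, n = 2.0, 1.0, 3.0, 6
trials = 200_000

robins = rng.poisson(rho * t, size=trials)
blackbirds = rng.poisson(beta * t, size=trials)
cond = blackbirds[robins + blackbirds == n]          # condition on n birds in time t

empirical = np.bincount(cond, minlength=n + 1) / len(cond)
print(np.round(empirical, 3))
print(np.round(binom.pmf(np.arange(n + 1), n, beta / (rho + beta)), 3))
```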

Exercises

2.4.1 State the transition probability definition of a Poisson process. Show directly from this definition that the first jump time J_1 of a Poisson process of rate λ is exponential of parameter λ. Show also (from the same definition and without assuming the strong Markov property) that J_2 − J_1 is also exponential of parameter λ and independent of J_1.

2.4.2 Show directly from the infinitesimal definition that the first jump time J_1 of a Poisson process of rate λ has exponential distribution of parameter λ.

2.4.3 Arrivals of the Number 1 bus form a Poisson process of rate one bus per hour, and arrivals of the Number 7 bus form an independent Poisson process of rate seven buses per hour.
(a) What is the probability that exactly three buses pass by in one hour?
(b) What is the probability that exactly three Number 7 buses pass by while I am waiting for a Number 1?
(c) When the maintenance depot goes on strike half the buses break down before they reach my stop. What, then, is the probability that I wait for 30 minutes without seeing a single bus?

2.4.4 A radioactive source emits particles in a Poisson process of rate λ. The particles are each emitted in an independent random direction. A Geiger counter placed near the source records a fraction p of the particles emitted. What is the distribution of the number of particles recorded in time t?

2.4.5 A pedestrian wishes to cross a single lane of fast-moving traffic. Suppose the number of vehicles that have passed by time t is a Poisson process of rate λ, and suppose it takes time a to walk across the lane. Assuming that the pedestrian can foresee correctly the times at which vehicles will pass by, how long on average does it take to cross over safely? [Consider the time at which the first car passes.] How long on average does it take to cross two similar lanes (a) when one must walk straight across (assuming that the pedestrian will not cross if, at any time whilst crossing, a car would pass in either direction), (b) when an island in the middle of the road makes it safe to stop half-way?

2.5 Birth processes

A birth process is a generalization of a Poisson process in which the parameter λ is allowed to depend on the current state of the process. The data for a birth process consist of birth rates 0 ≤ q_j < ∞, where j = 0, 1, 2, .... We begin with a definition in terms of jump chain and holding times. A minimal right-continuous process (X_t)_{t≥0} with values in {0, 1, 2, ...} ∪ {∞} is a birth process of rates (q_j : j ≥ 0) if, conditional on X_0 = i, its holding times S_1, S_2, ... are independent exponential random variables of parameters q_i, q_{i+1}, ..., respectively, and its jump chain is given by Y_n = i + n.

[Flow diagram: states 0, 1, 2, 3, 4, ... with an arrow from each state j to j + 1 labelled q_j, that is, 0 --q_0--> 1 --q_1--> 2 --q_2--> 3 --q_3--> 4 --> ...]

The flow diagram is shown above and the Q-matrix is given by:

    Q = ( -q_0   q_0
                 -q_1   q_1
                        -q_2   q_2
                               ·  ·  · ).

Example 2.5.1 (Simple birth process). Consider a population in which each individual gives birth after an exponential time of parameter λ, all independently. If i individuals are present then the first birth will occur after an exponential time of parameter iλ. Then we have i + 1 individuals and, by the memoryless property, the process begins afresh. Thus the size of the population performs a birth process with rates q_i = iλ. Let X_t denote the number of individuals at time t and suppose X_0 = 1. Write T for the time of the first birth. Then

    E(X_t) = E(X_t 1_{T ≤ t}) + E(X_t 1_{T > t}) = ∫_0^t λe^{−λs} E(X_t | T = s) ds + e^{−λt}.

Put μ(t) = E(X_t); then E(X_t | T = s) = 2μ(t − s), so

    μ(t) = ∫_0^t 2λ e^{−λs} μ(t − s) ds + e^{−λt}

and setting r = t − s

    e^{λt} μ(t) = 1 + ∫_0^t 2λ e^{λr} μ(r) dr.

By differentiating we obtain

    μ'(t) = λ μ(t)

so the mean population size grows exponentially:

    E(X_t) = e^{λt}.
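The conclusion E(X_t) = e^{λt} can be checked by simulating the birth process directly from its jump chain/holding time description. The sketch below is an added illustration, not from the book; it assumes Python with NumPy.

```python
import numpy as np

def simple_birth(lam, t_max, rng, x0=1):
    """Simulate a birth process with rates q_i = i*lam up to time t_max; return X_{t_max}."""
    x, t = x0, 0.0
    while True:
        t += rng.exponential(1.0 / (lam * x))    # holding time in state x is E(x*lam)
        if t > t_max:
            return x
        x += 1

rng = np.random.default_rng(5)
lam, t_max = 1.0, 2.0
samples = [simple_birth(lam, t_max, rng) for _ in range(20_000)]
print(np.mean(samples), np.exp(lam * t_max))     # both should be close to e^{lam*t} ~ 7.39
```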


Much of the theory associated with the Poisson process goes through for birth processes with little change, except that some calculations can no longer be made so explicitly. The most interesting new phenomenon present in birth processes is the possibility of explosion. For certain choices of birth rates, a typical path will make infinitely many jumps in a finite time, as shown in the diagram. The convention of setting the process to equal 00 after explosion is particularly appropriate for birth processes!

[Figure: a typical path of a birth process which explodes. The path climbs through states 1, 2, 3, ..., 8, ... with jump times J_0 = 0 < J_1 < J_2 < ... accumulating at a finite explosion time.]

In fact, Theorem 2.3.2 tells us exactly when explosion will occur.

Theorem 2.5.2. Let (X_t)_{t≥0} be a birth process of rates (q_j : j ≥ 0), starting from 0.

(i) If Σ_{j=0}^∞ 1/q_j < ∞, then P(ζ < ∞) = 1.

(ii) If Σ_{j=0}^∞ 1/q_j = ∞, then P(ζ = ∞) = 1.

Proof. Apply Theorem 2.3.2 to the sequence of holding times S_1, S_2, .... □
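When Σ 1/q_j < ∞ the explosion time ζ = S_1 + S_2 + ... is finite with probability one, and E(ζ) = Σ_j 1/q_j. The sketch below is an added numerical illustration (not from the book; Python with NumPy) for the rates q_j = 2^j; the infinite sum is truncated, but the neglected terms are negligible.

```python
import numpy as np

rng = np.random.default_rng(6)
levels = 60
q = 2.0 ** np.arange(levels)                     # q_j = 2^j, so sum of 1/q_j is about 2

# zeta = S_1 + S_2 + ... with S_{j+1} ~ E(q_j), truncated after `levels` terms
zeta = rng.exponential(1.0 / q, size=(100_000, levels)).sum(axis=1)
print("mean explosion time:", zeta.mean(), " theory:", (1.0 / q).sum())
```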

The proof of the Markov property for the Poisson process is easily adapted to give the following generalization.


Theorem 2.5.3 (Markov property). Let (X_t)_{t≥0} be a birth process of rates (q_j : j ≥ 0). Then, conditional on X_s = i, (X_{s+t})_{t≥0} is a birth process of rates (q_j : j ≥ 0) starting from i and independent of (X_r : r ≤ s).
We shall shortly prove a theorem on birth processes which generalizes the key theorem on Poisson processes. First we must see what will replace the Poisson probabilities. In Theorem 2.4.3 these arose as the unique solution of a system of differential equations, which we showed were essentially the forward equations. Now we can still write down the forward equation

    P'(t) = P(t)Q,   P(0) = I

or, in components,

    p_{i0}'(t) = −p_{i0}(t) q_0,   p_{i0}(0) = δ_{i0}

and, for j = 1, 2, ...,

    p_{ij}'(t) = p_{i,j−1}(t) q_{j−1} − p_{ij}(t) q_j,   p_{ij}(0) = δ_{ij}.

Moreover, these equations still have a unique solution; it is just not as explicit as before. For we must have

    p_{i0}(t) = δ_{i0} e^{−q_0 t}

which can be substituted in the equation

    p_{i1}'(t) = p_{i0}(t) q_0 − p_{i1}(t) q_1,   p_{i1}(0) = δ_{i1}

and this equation solved to give

    p_{i1}(t) = δ_{i1} e^{−q_1 t} + ∫_0^t e^{−q_1(t−s)} p_{i0}(s) q_0 ds.

Now we can substitute for p_{i1}(t) in the next equation up the hierarchy and find an explicit expression for p_{i2}(t), and so on.
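Although the solution is no longer explicit in general, it is easy to compute numerically after truncating the state space. The sketch below is an added illustration (not from the book); it assumes Python with NumPy and SciPy, takes the simple birth process q_j = jλ of Example 2.5.1 started from 1, integrates the forward equations, and recovers the mean E(X_t) = e^{λt}.

```python
import numpy as np
from scipy.integrate import solve_ivp

lam, t_max, n_max = 1.0, 2.0, 200                # truncate the state space at n_max

def forward(t, p):
    """p'_j(t) = q_{j-1} p_{j-1}(t) - q_j p_j(t) with q_j = j*lam."""
    j = np.arange(len(p))
    dp = -j * lam * p
    dp[1:] += j[:-1] * lam * p[:-1]
    return dp

p0 = np.zeros(n_max + 1)
p0[1] = 1.0                                      # start from state 1
sol = solve_ivp(forward, (0.0, t_max), p0, rtol=1e-8, atol=1e-10)
p_t = sol.y[:, -1]

print((np.arange(n_max + 1) * p_t).sum(), np.exp(lam * t_max))   # both close to e^{lam*t}
```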

Theorem 2.5.4. Let (X_t)_{t≥0} be an increasing, right-continuous process with values in {0, 1, 2, ...} ∪ {∞}. Let 0 ≤ q_j < ∞ for all j ≥ 0. Then the following three conditions are equivalent:

(a) (jump chain/holding time definition) conditional on X_0 = i, the holding times S_1, S_2, ... are independent exponential random variables of parameters q_i, q_{i+1}, ... respectively and the jump chain is given by Y_n = i + n for all n;

(b) (infinitesimal definition) for all t, h ≥ 0, conditional on X_t = i, X_{t+h} is independent of (X_s : s ≤ t) and, as h ↓ 0, uniformly in t,

    P(X_{t+h} = i | X_t = i) = 1 − q_i h + o(h),
    P(X_{t+h} = i + 1 | X_t = i) = q_i h + o(h);

(c) (transition probability definition) for all n = 0, 1, 2, ..., all times 0 ≤ t_0 ≤ ... ≤ t_{n+1} and all states i_0, ..., i_{n+1},

    P(X_{t_{n+1}} = i_{n+1} | X_{t_0} = i_0, ..., X_{t_n} = i_n) = p_{i_n i_{n+1}}(t_{n+1} − t_n)

where (p_{ij}(t) : i, j = 0, 1, 2, ...) is the unique solution of the forward equations.

If (X_t)_{t≥0} satisfies any of these conditions then it is called a birth process of rates (q_j : j ≥ 0).
Proof. (a) ⇒ (b) If (a) holds, then, by the Markov property, for any t, h ≥ 0, conditional on X_t = i, X_{t+h} is independent of (X_s : s ≤ t) and, as h ↓ 0, uniformly in t,

    P(X_{t+h} ≥ i + 1 | X_t = i) = P(X_h ≥ i + 1 | X_0 = i) = P(J_1 ≤ h | X_0 = i) = 1 − e^{−q_i h} = q_i h + o(h)

and

    P(X_{t+h} ≥ i + 2 | X_t = i) = P(X_h ≥ i + 2 | X_0 = i) = P(J_2 ≤ h | X_0 = i)
        ≤ P(S_1 ≤ h and S_2 ≤ h | X_0 = i) = (1 − e^{−q_i h})(1 − e^{−q_{i+1} h}) = o(h),

which implies (b).

(b) ⇒ (c) If (b) holds, then certainly, for k = i + 2, i + 3, ...,

    P(X_{t+h} = k | X_t = i) = o(h)  as h ↓ 0, uniformly in t.

Set p_{ij}(t) = P(X_t = j | X_0 = i). Then, for j = 1, 2, ...,

    p_{ij}(t + h) = P(X_{t+h} = j | X_0 = i) = Σ_{k=i}^j P(X_t = k | X_0 = i) P(X_{t+h} = j | X_t = k)
        = p_{ij}(t)(1 − q_j h + o(h)) + p_{i,j−1}(t)(q_{j−1} h + o(h)) + o(h)

so

    (p_{ij}(t + h) − p_{ij}(t))/h = −p_{ij}(t) q_j + p_{i,j−1}(t) q_{j−1} + O(h).

As in the proof of Theorem 2.4.3, we can deduce that p_{ij}(t) is differentiable and satisfies the differential equation

    p_{ij}'(t) = p_{i,j−1}(t) q_{j−1} − p_{ij}(t) q_j.

By a simpler argument we also find

    p_{i0}'(t) = −q_0 p_{i0}(t).

Thus (p_{ij}(t) : i, j = 0, 1, 2, ...) must be the unique solution to the forward equations. If (X_t)_{t≥0} satisfies (b), then certainly

    P(X_{t_{n+1}} = i_{n+1} | X_{t_0} = i_0, ..., X_{t_n} = i_n) = P(X_{t_{n+1}} = i_{n+1} | X_{t_n} = i_n),

but also (X_{t_n + t})_{t≥0} satisfies (b), so

    P(X_{t_{n+1}} = i_{n+1} | X_{t_n} = i_n) = p_{i_n i_{n+1}}(t_{n+1} − t_n)

by uniqueness for the forward equations. Hence (X_t)_{t≥0} satisfies (c).

(c) ⇒ (a) See the proof of Theorem 2.4.3. □

Exercise

2.5.1 Each bacterium in a colony splits into two identical bacteria after an exponential time of parameter λ, which then split in the same way but independently. Let X_t denote the size of the colony at time t, and suppose X_0 = 1. Show that the probability generating function φ(t) = E(z^{X_t}) satisfies

    φ(t) = z e^{−λt} + ∫_0^t λ e^{−λs} φ(t − s)² ds.

Make a change of variables u = t − s in the integral and deduce that dφ/dt = λφ(φ − 1). Hence deduce that, for q = 1 − e^{−λt} and n = 1, 2, ...,

    P(X_t = n) = (1 − q) q^{n−1}.

2.6 Jump chain and holding times

This section begins the theory of continuous-time Markov chains proper, which will occupy the remainder of this chapter and the whole of the next. The approach we have chosen is to introduce continuous-time chains in terms of the joint distribution of their jump chain and holding times. This provides the most direct mathematical description. It also makes possible a number of constructive realizations of a given Markov chain, which we shall describe, and which underlie many applications.

Let I be a countable set. The basic data for a continuous-time Markov chain on I are given in the form of a Q-matrix. Recall that a Q-matrix on I is any matrix Q = (q_{ij} : i, j ∈ I) which satisfies the following conditions:

(i) 0 ≤ −q_{ii} < ∞ for all i;
(ii) q_{ij} ≥ 0 for all i ≠ j;
(iii) Σ_{j∈I} q_{ij} = 0 for all i.

We will sometimes find it convenient to write q_i or q(i) as an alternative notation for −q_{ii}.

We are going to describe a simple procedure for obtaining from a Q-matrix Q a stochastic matrix Π. The jump matrix Π = (π_{ij} : i, j ∈ I) of Q is defined by

    π_{ij} = q_{ij}/q_i  if j ≠ i and q_i ≠ 0,    π_{ij} = 0  if j ≠ i and q_i = 0,
    π_{ii} = 0  if q_i ≠ 0,    π_{ii} = 1  if q_i = 0.

This procedure is best thought of row by row. For each i ∈ I we take, where possible, the off-diagonal entries in the ith row of Q and scale them so they add up to 1, putting a 0 on the diagonal. This is only impossible when the off-diagonal entries are all 0; then we leave them alone and put a 1 on the diagonal. As you will see in the following example, the associated diagram transforms into a discrete-time Markov chain diagram simply by rescaling all the numbers on any arrows leaving a state so they add up to 1.

Example 2.6.1
The Q-matrix

    Q = ( −2   1   1
           1  −1   0
           2   1  −3 )
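The row-by-row recipe for passing from Q to its jump matrix Π is easy to mechanize. The sketch below is an added illustration, not part of the book; it assumes Python with NumPy and applies the recipe to the Q-matrix of Example 2.6.1.

```python
import numpy as np

def jump_matrix(Q):
    """Jump matrix of a Q-matrix: scale each row's off-diagonal entries to sum to 1;
    a row whose off-diagonal entries are all 0 gets a 1 on the diagonal."""
    Q = np.asarray(Q, dtype=float)
    Pi = np.zeros_like(Q)
    for i, row in enumerate(Q):
        qi = -row[i]
        if qi > 0:
            Pi[i] = row / qi
            Pi[i, i] = 0.0
        else:
            Pi[i, i] = 1.0
    return Pi

Q = np.array([[-2.0, 1.0, 1.0],
              [ 1.0, -1.0, 0.0],
              [ 2.0, 1.0, -3.0]])
print(jump_matrix(Q))    # rows (0, 1/2, 1/2), (1, 0, 0), (2/3, 1/3, 0)
```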

2.x. its holding n times Sl.88 has diagram: 2. A minimal rightcontinuous process (Xt)t~O on I is a Markov chain with initial distribution .x.q(Yn-1) respectively.Y .1 . ••• be independent exponential random .x. IT) and let T 1 .Sn are independent exponential random variables of parameters q(Yo). Recall that a minimal process is one which is set equal to 00 after any explosion .x and generator matrix Q if its jump chain (Yn)n~O is discrete-time Markov(. IT) and if for each n ~ 1. . T 2 .. conditional on yo.see Section 2.. Continuous-time Markov chains I 1 3 1 2 The jump matrix IT of Q is given by II = ( ~ 2/3 1/2 o 1/3 and has diagram: 1 3 3 1 2 Here is the definition of a continuous-time Markov chain in terms of its jump chain and holding times. We say (Xt)t~O is Markov(. . . We can construct such a process as follows: let (Yn)n~O be discretetime Markov(. . Q) for short.

3. by Theorem 2.6 Jump chain and holding times variables of parameter 1.8n .x.. Suppose the door giving access to chamber j from chamber i opens at the jump times of a Poisson process of rate qij and you take every chance to move that you can.. .. Then set J o = and define inductively for n = 0. So. 00 ~~ S~+l' n+1 - _ {j 'l .. Imagine the state-space I as a labyrinth of chambers and passages.. . 89 = T n /q(Yn. and with a family of independent Poisson processes {(N.3. Then (Xt)t~O has the required properties.j)t~o having rate qij. we begin with an initial state X o = Yo with distribution . + 8 n and Yn if I n ~ t < I n+1 for some n X t = { 00 otherwise. Here is the second construction. Sn+l = y: J-r-'I. . Set 8 n I n = 8 1 + .. for j =I i.i =I j}. if Y n = i we set 8~+1 = T~+l/qij.2. and Njnj n+l =I Njnj n 'l If I n + 1 = .Y and 8 1 . and with an array (T~ : n ~ 1. conditional on Yn = i. Then.. conditional on Y n = i. 8 n+1 is exponential of parameter qi = Ej:j=i qij. inductively for n = 0.1.j E I. rather than proceeding first via the jump matrix. the random variables 8~+1 are independent exponentials of parameter qij for all j -=I i.1 ). . In more mathematical terms.. 1 if 8~+1 = 8 n + 1 < ·f S n+1 = 00. . (N.1. Our third and final construction of a Markov chain with generator matrix Q and initial distribution .. nj I n+1 = inf{t > I n : =I Nj: j for some j #lyn } ° Nr Yn + 1 = j { if I + < . then you will perform a Markov chain with Q-matrix Q. j E I) of independent exponential random variables of parameter 1. n 1 00 00. You will need to understand these constructions in order to identify processes in applications which can be modelled as Markov chains. . each passage shut off by a single door which opens briefly from time to time to allow you through in one direction only. independent of (Yn)n~O. We begin with an initial state X o = Yo with distribution .x is based on the Poisson process.j)t~o : i. This construction n shows why we call qi the rate of leaving i and qij the rate of going from i to j. Then.2.x. . as required. and 8 n + 1 and Y n + 1 are independent. Both constructions make direct use of the entries in the Q-matrix. . and independent of yo. . We shall now describe two further constructions. Y n+1 has distribution (7rij : j E I).2.

(Xt)t~O has the strong Markov property.90 2. In particular. Then explode if anyone of the following conditions holds: (i) I is finite. Recall the notation of Section 2. Y1 has distribution (7rij : j E I). one can run through a sequence of states with shorter and shorter holding times and end up taking infinitely many jumps in a finite time. then T 1 .l) =1= (i.j)t~o is exponential of parameter qij. Q) and.. 8 2 .N!} is a Poisson process of rate qij independent of (N.3. especially when the state-space is infinite. and 8 n +1 and Y n +1 are independent. 2. (ii) sup qi < 00. conditional on Yo = i.8n · Hence (Xt)t~O is Markov(-X.. so we shall not rely on this third construction in the development of the theory. Set Tn = q(Yn. o (= supJn = n 2: 00 (Xt)t~O does not iEI (iii) X o = i.2: for a process with jump times Jo.j := N!}+t . . although each holding time is strictly positive.Y and 8 1 . T 2 . (XT+t)t~O has the same distribution as (Xt)t~O and is independent of (X s : s ~ T). which are independent of N. and J 1 and Y1 are independent. .. and i is recurrent for the jump chain. by Theorem 2. . conditional on I n < 00 and Y n = i. and independent of X o and (Ntkl)t~O for (k. Y n+1 has distribution (7rij : j E I). conditional on T < 00 and XT = i.1 )Sn. .l) =1= (i.3. Q).j : s ~ t). If we condition on X o and on the processes (Ntkl)t~O for (k.7. J 1.. the explosion time ( is given by Sn n==l Theorem 2.7 Explosion We saw in the special case of birth processes that. Proof. by the strong Markov property of the Poisson process N. . then {T ~ t} depends only on (N. So. Hence. Let (Xt)t~O be Markov(-X. and independent of yo. . . So.. 8 n+1 is exponential of parameter qi. and holding times 8 1 . Continuous-time Markov chains I The first jump time of (N...j. ..j : s ~ T). In cases (i) and (ii). are independent E(l) and independent of (Yn)n~O.j). . Now suppose T is a stopping time of (Xt)t~o.1. q = SUPi qi < 00 and 00 . J2.j). moren over.. J 1 is exponential of parameter qi = Lj#i qij. This phenomenon is called explosion. The conditioning on which this argument relies requires some further justification. we can take T = I n to see that.

then for all i. As a corollary to the next result we shall obtain necessary and sufficient conditions for Q to be explosive.Q) and independent of J 1 • So and . D (Yn)n~O 91 visits i infinitely L m==l 00 TN". Moreover. Let (Xt)t~O be a continuous-time Markov chain with generator matrix Q and write ( for the explosion time of (Xt)t>o. J 1 is E(qi) and Moreover. (XJl+t)t~O is Markov(8k. • •• . ifz also satisfies (i) and (ii).+! = 00 We say that a Q-matrix Q is explosive if. say. Fix () > 0 and set Zi = Ei(e-(}(). Proof. The time and place of the first jump are independent. Here as in Chapter 1 we denote by Pi the conditional probability Pi(A) = P(AIXo = i). we know that often. 7 Explosion with probability 1. Then qi( ~ with probability 1.7.7. It is a simple consequence of the Markov property for (Yn)n>O that under Pi the process (Xt)t>o is Markov(8i . Then Z = (Zi : i E I) satisfies: - (i) IZil ~ 1 for all i. by the Markov property of the jump chain at time n = 1. Q). Zi ~ Zi (ii) Qz = Oz. conditional on XJ 1 = k.2. Otherwise Q is non-explosive. but these are not as easy to apply as Theorem 2. at times N 1 . The result just proved gives simple conditions for nonexplosion and covers many cases of interest. Theorem 2.2. In case (iii). for the associated Markov chain Pi (( < 00) > 0 for some i E I. N 2 . Condition on X o = i.1.

On the other hand.7. Suppose inductively that z. By monotone convergence as n ~ 00. D . (b) Qz = (Jz and IZil ~ 1 for all i imply z = o. then by the theorem E i (e. by symmetry z ~ 0. since z satisfies (ii) Hence Zi ~ E i (e.9() = 0 for all i. so Zi ~ Zi for all i. Then qikZk (0 . so JP>i(( = 00) = 1 and (a) holds.9Jn ) ~ ~ then.9 ().3. Continuous-time Markov chains I qi Recall that = -qii and q(Jrik = qik.9Jn ) for all n. hence z ~ 0.9 () = o. For each (J > 0 the following are equivalent: (a) Q is non-explosive. and hence (b) holds. if (b) holds. then. By the theorem. -< E·(e. Qz = (Jz and Izl ~ 1 imply Zi ~ E i (e. in particular for all i. If (a) holds then JP>i(( = 00) = 1 so E i (e. Proof. D Corollary 2.92 2. Note that the same argument also shows that Suppose that z also satisfies (i) and (ii).qii)Zi =L k#i so OZi = Lqikzk kEI and so z satisfies (i) and (ii).

Theorem 2. In this section we shall obtain two more ways of characterizing a continuous-time Markov chain. conditional on T < 00 and XT = i. Q) and independent of (Xs : s ~ T). so we have deferred it to Section 6.1 on the Poisson process gives the main idea in a simpler context.00) the event {T ~ t} depends only on (X s : s ~ t). Let (Xt)t~O be Markov(A.~o be a Markov chain on the integers with transition rates and qij = if Ij .1 (Strong Markov property). starting from 0. write down a necessary and sufficient condition for (Xt)t. (Xt)t~O is non-explosive if and only if one of the following conditions holds: ° qi. Then. Recall that a random variable T with values in [0. we shall give the strong Markov property as this is a fundamental result and the proof is not much harder. In the case where J-l = 0.8. (b) the expected total time spent in state i.4. Theorem 2. A > J-l and A < J-l and E~l 1/ qi = 00. As for Poisson processes and birth processes.il ~ 2. starting from 0.1/2)? Show that. E~ll/q-i = 00.00] is a stopping time of (Xt)t~O if for each t E [0. the proof of both results really requires the precision of measure theory. For example. We come to the key result for continuous-time Markov chains. However.i-l = J-lqi ° (i) (ii) (iii) A = J-l. that X t hits i. we might wish to calculate IPi(Xt = j). where A + J-l = 1 and qi > for all i. If you want to understand what happens.i+l = Aqi.2. it does not answer some basic questions.5. in general.8 Forward and backward equations Although the definition of a continuous-time Markov chain in terms of its jump chain and holding times provides a clear picture of the process. Why is this condition necessary for (Xt)t~O to be explosive for all J-l E [0. Q) and let T be a stopping time of (Xt)t~o.8 Forward and backward equations Exercise 93 2. where there is a . In fact. We shall present first a version for the case of finite state-space. qi.~o to be explosive. the first step is to deduce the Markov property from the jump chain/holding time definition. 2. Find for all integers i: (a) the probability.1 Let (Xt )t. (XT+t)t~O is Markov(8i .7. which will in particular give us a means to find IPi(Xt = j).

Thus for every state j there is an inequality . If (Xt)t. P(o) = I..~o be a right-continuous process with values in a finite set I. where>. Let (Xt)t. .in +1 °::. Xt+h is independent of (X s : s ~ t) and.1 . . the holding times 8 1 .. and for j =I i we have JP>i(Xh = j) ~ JP>(J1 ~ h.. . Proof. . . (b) (infinitesimal definition) for all t.8. Let Q be a Q-matrix on I with jump matrix II. . 1. all times to ~ t 1 ~ . . Theorem 2.e-qih)7rije-qjh = qijh + o(h). t ~ 0) is the solution of the forward equation P'(t) = P(t)Q.. 2. In this case there are three alternative definitions. j E I.Y .1 ) respectively. Continuous-time Markov chains I simpler proof.8n n are independent exponential random variables of parameters q(Yo).. ~ t n +1 and all states io. Q) for short.. (a) =* (b) Suppose (a) holds. for all j (c) (transition probability definition) for all n = 0.q(Yn .. then. conditional on X t = i..94 2...~o satisfies any of these conditions then it is called a Markov chain with generator matrix Q. Then the following three conditions are equivalent: (a) (jump chain/holding time definition) conditional on X o = i. h ~ 0.2. IT) and for each n ~ 1. . . the jump chain (Yn)n~O of (Xt)t~O is discrete-time Markov(8 i . Y1 = j. uniformly in t. as h ! 0. We say that (Xt)(~O is Markov(>. .... is the distribution of X o. . just as for the Poisson process. 8 2 > h) = (1 .. where (Pij (t) : i. conditional on yo. as h ! 0.

h ~ 0. proving (c). and satisfies the forward equations ° p~/t) = LPik(t)qkj. D We know from Theorem 2. uniformly in t (b) :::} (c) Set Pij(t) for all t.1. as h ! 0.8 Forward and backward equations 95 and by taking the finite sum over j we see that these must in fact be equalities. Since I is finite. . conditional on X t = i. kEI Pij(O) = 8ij . if (b) holds.1. h ~ 0. X t + h is independent of (X s : s ~ t) and. uniformly in t kEI If (b) holds. we see that Pij(t) is differentiable on the right.1 that for I finite the forward and backward equations have the same solution.2. by the above argument. Then by uniformity we can replace t by t . moreover. Indeed.3. letting h ! 0. then and. for any t. Also. there is a slight variation of the argument from (b) to (c) which leads directly to the backward equation. Then by the Markov property. (b) holds for (Xtn+t)(~O so.Pij(t) = LPik(t)qkj + O(h) kEI so. as h = lFi(Xt = j) = IF(Xt = j I X o = i). (c) :::} (a) See the proof of Theorem 2. ! 0. Pij(t) is then the unique solution by Theorem 2.4. then differentiable on the left. then Pij(t + h) = LlPi(Xt = k)IP(XHh = j = I X t = k) LPik(t)(8kj + qkjh + o(h)).h in the above and let h ! to see first that Pij(t) is continuous on the left. kEI Since I is finite we have Pij(t + h~ . So in condition (c) of the result just proved we could replace the forward equation with the backward equation. hence differentiable.1.

t and Pij(t. Theorem 2.j E I) of differentiable functions satisfying this system of differential equations.j E I) satisfies P(t. kEI Pij(O) = 15ij and the results on matrix exponentials given in Section 2. t ~ O.1 no longer apply. We turn now to the case of infinite state-space.. This solution forms a matrix semigroup P(s)P(t) = P(s + t) for all s.1. the qualifications minimal and nonnegative being understood. We shall prove this result by a probabilistic method in combination with Theorem 2. t + h) : i. are not a priori identical. The backward equation may still be written in the form P'(t) = QP(t).1. Note that if I is finite we must have P(t) = etQ by Theorem 2..~). Then the backward equation P'(t) = QP(t).t) (I+t~ +o(~))n. t n -4 eq as n -4 00. Here is the key result for Markov chains with infinite state-space.96 2.3. A solution to the backward equation is any matrix (pij(t) : i. Continuous-time Markov chains I The deduction of (c) from (b) above can be seen as the matrix version of the following result: for q E lR we have (1 + ~ + o( ~) ) Suppose (b) holds and set then P(t. t + h) = I + Qh + o(h) = P(O. P(O) =I only now we have an infinite system of differential equations P~j(t) = L qikPkj(t).8. Let Q be a Q-matrix. t) = etQ . We call (P(t) : t ~ 0) the minimal non-negative semigroup associated to Q. . P(O) =I has a minimal non-negative solution (P(t) : t 2: 0). On the other hand.4. p((n~l)t. though uniform in t. = (Pij(t. or simply the semigroup of Q. Some care is needed in making this precise. + h) + h) = P(Xt +h = j I X t = i).8.~)p(~.t)=p(o. There are just two alternative definitions now as the infinitesimal characterization become problematic for infinite state-space. since the o(h) terms. in (c) we see that P(O.

.s in each of the integrals. Make a change of variable u = t . the holding times 8 1 ...i n+l If (Xt)t~O satisfies any of these conditions then it is called a Markov chain with generator matrix Q.8.q(Yn. t. Q). Let (Xt)(~O be a minimal right-continuous process with values in I. t..8..1 ) respectively. Then conditional on J 1 = sand X J1 = k we have (X8+t)t~O rv Markov(8k. . .8. all times 0 ~ to ~ tl ~ . Conditional on X o = i we have J 1 rv E(ql) and X J1 rv (7rik : k E I). JP>i(Xt = j. il. XJI = k..2) . Proof of Theorems 2.1 . .s)ds.4.. Then the following conditions are equivalent: (a) (jump chain/holding time definition) conditional on X o = i. conditional on yo. Step 1.8n n are independent exponential random variables of parameters q(Yo)... We say that (Xt)t~O is Markov (A. ..s)ds. So and JP>i(J1 ::. (2. We know that there exists a process (Xt)t~O satisfying (a). . .Y .4. Q) for short.1) = e-Qit8ij + 2: k#i Jo t qie-QiS7rikPkj(t . (b) (transition probability definition) for all n = 0.2. . . t < J1) + LJP>i(J1 ::.8 Forward and backward equations 97 Theorem 2. X t = j) (2. II) and for each n ~ 1.3 and 2. Let Q be a Q-matrix on I with jump matrix IT and semigroup (P(t) : t ~ 0). interchange sum and integral by monotone convergence and multiply by eqit to obtain eQitpij(t) = 8ij + Jo tL k#i qieQiU7rikPkj(u)du. .. ~ t n+l and all states io. . We show that P(t) satisfies the backward equation. where A is the distribution of X o..1. So let us define P(t) by Pij(t) =JP>i(Xt =j).2. the jump chain (Yn)n~O of (Xt)t~O is discrete-time Markov(8i. X t = j) = Therefore Pij(t) = I t qie-QiS7rikPkj(t . XJl k#i = k.

1) also shows that JP>i(Xt =j. j and t. j.3) P~j(t) = L qikPkj(t) kEI so P(t) satisfies the backward equation. Let us suppose inductively that for all i. The integral equation (2.1) to (2. We show that if P(t) is another non-negative solution of the backward equation. j and t. j and t. then for all i.t < I n +l) = e-qit15ij +L k=li Jo rt qie-QiS7rikJP>k(Xt-s = j. we (2. then P(t) ~ P(t). if P(t) satisfies the backward equation. then.98 2. that Pij (t) is continuous in t for all i. it also satisfies the integral form: If P(t) 2 0. hence P(t) is the minimal non-negative solution. Secondly. Continuous-time Markov chains I This equation shows.5) we have for all i. on rearranging. hence continuous. Then. the integrand is then a uniformly converging sum of continuous functions. t - s < In)ds.1) is called the integral form of the backward equation. . by reversing the steps from (2. Step 2.4) On the other hand. L k#i Recall that obtain qi = -qii and qik = qi'lrik for k =I i. then by comparing (2.4) and (2. The argument used to prove (2. firstly.3). and hence Pij (t) is differentiable in t and satisfies eqit(qiPij(t) + P~j(t)) = qieqit7rikPkj(t). (2.

j E I) of differentiable functions satisfying this system of equations. P(Xtn +1 = i n+1 IXto = io.3. Suppose. (c) :::} (a)). .4 by the usual argument that (b) must now imply (a) (see the proof of Theorem 2. that by the Markov property (Xt)t~O satisfies (a). Hence for all i.3 and 2. instead of conditioning on the first event. Since (Xt)t~O does not return from 00 we have Pij(S + t) = lPi(Xs +t = j) = LlPi(XS +t = j kEI I X s = k)lPi(Xs = k) = :ElPi(Xs = k)lPk(Xt = j) = :EPik(S)Pkj(t) kEI kEI by the Markov property. D So far we have said nothing about the forward equation in the case of infinite state-space. by a probabilistic argument resembling Step 1 of the proof of Theorems 2. .. kEI Pij(O) = 8ij .8. Step 4.. 99 Step 3.1 are no longer valid.8.8. We shall show that the semigro~p (P( t) : t ~ 0) of Q does satisfy the forward equations.2. This time.3.4.Xtn = in) = Pin (Xtn+l-tn = in+1) = Pi nin+l (tn+l .8. This completes the proof of Theorem 2. The forward equation may still be written P'(t) = P(t)Q. j and t. . we condition on the last event before time t.tn) so (Xt)t~O satisfies (b). A solution is then any matrix (pij(t) : i. as we have throughout. a simple version of which was given in Theorem 2.4. P(O) = I.4. Then.8 Forward and backward equations and the induction proceeds. Hence (P(t) : t ~ 0) is a matrix semigroup. Remember that the finite state-space results of Section 2. We need the following time-reversal identity. We complete the proof of Theorem 2.3. The argument is a little longer because there is no reverse-time Markov property to give the conditional distribution. now understood as an infinite system of differential equations p~/t) = LPik(t)qkj.

100

2. Continuous-time Markov chains I

Lemma 2.8.5. We have

qinJP(Jn ~ t < I n+ 1 I Yo = io, Y 1 = i 1, ... ,Y = in) n = qioJP(Jn ~ t < I n+1 I Yo = in,· .. , Y n - 1 = iI, Y n

= io).

Proof· Conditional on Yo = i o, ... ,Y = in, the holding times 8 1 , ... ,8n+ 1 n are independent with 8k rv E(qik_l). So the left-hand side is given by

Jt:!..(t)

{

qi n exp{ -qi n (t -

Sl -

· ·· -

sn)}

k=l

IT

qik-l exp{ -qik-l Sk}dsk

where Ll(t) = {(SI, ... ,sn) : SI + ... + Sn ~ t and SI, ... ,8 n ~ O}. On making the substitutions Ul = t - 81 - ... - Sn and Uk = 8 n -k+2, for k = 2, ... ,n, we obtain

qinJP(Jn ~ t < I n+ 1 I Yo
=

= i o,·

.. ,Y = in) n

Jt:!..(t)

(

qio exp{ -qio(t - U1 - · .. - Un)}
~

k=l

IT

qin-k+l exp{ -qin-k+l Uk}duk
= io).

= qioJP(Jn

t < I n+ 1 I Yo

= in, . ..

,Y - 1 = iI, Y n n

D

Theorem 2.8.6. The minimal non-negative solution (P(t) : t 2: 0) of the backward equation is also the minimal non-negative solution of the forward equation P'(t) = P(t)Q, P(O) = I.
Proof. Let (Xt)t~O denote the minimal Markov chain with generator matrix Q. By Theorem 2.8.4

00

= L

LJP>i(Jn ::; t < In+i, Y n- 1 = k, Y n = j).
~

n==O k=j:j

Now by Lemma 2.8.5, for n

1, we have

JPi(Jn ~ t < I n+ 1 I Y n - 1 = k, Y n = j) = (qi/qj)JPj(Jn ~ t < I n+ 1 I Y1 = k, Y n = i)
= (qi/Qj)

= qi

I

it

qje-QjBJP>k(Jn_1 ::; t -

S

< I n I Yn-1 = i)ds

t

e- QjB (qk/qi)JP>i (In-1 ::; t - s < I n I Yn - 1 = k)ds

2.8 Forward and backward equations

101

where we have used the Markov property of (Yn)n~O for the second equality. Hence
Pij(t) = 8ije-qit

+

f
00

'2: i t JP>i(Jn-l ::; t 0
X

s
1

< I n I Yn -

1

= k)

n=lki=i

IPi(Yn -

= k, Yn = j)Qke-qj8ds

= 8ije-qit + L
= 8ije-qit

L

io
0

ft
JP>i(Jn-l ::; t -

s < I n , Yn - 1 =

k)qk'lrkje-qjSds

n=lki=i

+

io

t '2:Pik(t k#j

s)qkje-qjSds

(2.6)

where we have used monotone convergence to interchange the sum and integral at the last step. This is the integral form of the forward equation. Now make a change of variable u = t - s in the integral and multiply by eqjt to obtain (2.7) We know by equation (2.2) that eqitpik(t) is increasing for all i, k. Hence either '2:Pik(U)qkj converges uniformly for u E [0, t]
ki=i

or
LPik(U)qkj =
ki=j
00

for all u ~ t.

The latter would contradict (2.7) since the left-hand side is finite for all t, so it is the former which holds. We know from the backward equation that Pii (t) is continuous for all i, j; hence by uniform convergence the integrand in (2.7) is continuous and we may differentiate to obtain

P~j(t) + Pij(t)qj

= '2:Pik(t)qkj. ki=i

Hence P(t) solves the forward equation. To establish minimality let us suppose that Pij(t) is another solution of the forward equation; then we also have

102

2. Continuous-time Markov chains I
~

A small variation of the argument leading to (2.6) shows that, for n

0

Pi(Xt = j, t < I n + 1 )

= 8ije-qit +
If P(t) ~ 0, then

L

(t IP\(Xt = j, t < In)qkje-QjBds.

(2.8)

k#jJo

P(Xt = j, t < J o) = 0
Let us suppose inductively that

~ Pij (t)

for all i, j and t.

then by comparing (2.7) and (2.8) we obtain

and the induction proceeds. Hence

Exercises 2.8.1 Two fleas are bound together to take part in a nine-legged race on the vertices A, B, C of a triangle. Flea 1 hops at random times in the clockwise direction; each hop takes the pair from one vertex to the next and the times between successive hops of Flea 1 are independent random variables, each with with exponential distribution, mean 1/ A. Flea 2 behaves similarly, but hops in the anticlockwise direction, the times between his hops having mean 1/ Il. Show that the probability that they are at A at a given time t > 0 (starting from A at time t = 0) is

2.8.2 Let

(Xt)t;~o

Iln = nil, and assume that X o = 1. Show that h(t) =

be a birth-and-death process with rates An = nA and P(Xt = 0) satisfies

2.9 Non-minimal chains
and deduce that if A =I J-l then

103

2.9 Non-minimal chains
This book concentrates entirely on processes which are right-continuous and minimal. These are the simplest sorts of process and, overwhelmingly, the ones of greatest practical application. We have seen in this chapter that we can associate to each distribution A and Q-matrix Q a unique such process, the Markov chain with initial distribution A and generator matrix Q. Indeed we have taken the liberty of defining Markov chains to be those processes which arise in this way. However, these processes do not by any means exhaust the class of memoryless continuous-time processes with values in a countable set I. There are many more exotic possibilities, the general theory of which goes very much deeper than the account given in this book. It is in the nature of things that these exotic cases have received the greater attention among mathematicians. Here are some examples to help you imagine the possibilities.

Example 2.9.1
Consider a birth process (Xt)t~O starting from 0 with rates qi = 2i for i ~ We have chosen these rates so that
00 00

o.

Lq:;l = L2-i < 00
i=O i=O

which shows that the process explodes (see Theorems 2.3.2 and 2.5.2). We have until now insisted that X t = 00 for all t 2 (, where ( is the explosion time. But another obvious possibility is to start the process off again from o at time (, and do the same for all subsequent explosions. An argument based on the memoryless property of the exponential distribution shows that for 0 ~ to ~ . . . ~ t n + 1 this process satisfies

for a semigroup of stochastic matrices (P(t) : t ~ 0) on I. This is the defining property for a more general class of Markov chains. Note that the chain is no longer determined by A and Q alone; the rule for bringing (Xt)t~O back into I after explosion also has to be given.

104 Example 2.9.2

2. Continuous-time Markov chains I

We make a variation on the preceding example. Suppose now that the jump chain of (Xt)t~O is the Markov chain on Z which moves one step away from o with probability 2/3 and one step towards 0 with probability 1/3, and that Yo = O. Let the transition rates for (Xt)t~O be qi = 21il for i E Z. Then (Xt)t~O is again explosive. (A simple way to see this using some results of Chapter 3 is to check that (Yn)n~O is transient but (Xt)t~O has an invariant distribution - by solution of the detailed balance equations. Then Theorem 3.5.3 makes explosion inevitable.) Now there are two ways in which (Xt)t~O can explode, either X t ~ -00 or X t ~ 00. The process may again be restarted at 0 after explosion. Alternatively, we may choose the restart randomly, and according to the way that explosion occurred. For example if X t if X t
~ -00 ~ 00

i( as t i (
as t

where Z takes values ±1 with probability 1/2. Example 2.9.3 The processes in the preceding two examples, though no longer minimal, were at least right-continuous. Here is an altogether more exotic example, due to P. Levy, which is not even right-continuous. Consider for n
~

0

and set I = UnD n . With each i E D n\Dn- 1 we associate an independent exponential random variable Si of parameter (2 n )2. There are 2n - 1 states in (Dn\Dn-l) n [0,1), so, for all i E I

and

Now define if

L Sj :::; t < L Sj for some i E I
j<i
j~i

otherwise.

. Chung (Springer.10 Appendix: matrix exponentials 105 This process runs through all the dyadic rationals i E I in the usual order. The details may be found in Markov Chains by D. which IIQlloo is not. see Freedman's book.2.1. The supremum defining IIQlloo is achieved. P(Xt = 00) = 0 for all t ~ O. for 0 ~ to ~ . The Lebesgue measure of the set of times t when X t = 00 is zero. IQl + Q21 ~ IQll + IQ21 and IQIQ21 ~ IQIIIQ21· Proof. There is a semigroup of stochastic matrices (P(t) : t ~ 0) on I such that. Berlin..-L. for the vector Cj = (0. Freedman (Holden-Day. We have (a) (b) IIQlloo ~ IQI ~ NIIQlloo.j Obviously. or else Markov Chains with Stationary Transition Probabilities by K. It remains in i E D n \Dn .1. we have IQcjl ~ IQI. San Francisco. For further reading. ~ tn+l In particular. v#O IIQlloo = sup Iqijl· i. We hope these three examples will serve to suggest some of the possibilities for more general continuous-time Markov chains. or Diffusions.1 for an exponential time of parameter 1. Rogers and D. G. 1994). at j say. . Lemma 2.10 Appendix: matrix exponentials Define two norms on the space of real-valued N x N -matrices IQI = sup IQvl/lvl. We shall show that the two norms are equivalent and that IQI is well adapted to sums and products of matrices. so IIQII~ ~ L(qij)2 = IQcjl2 i ~ IQI 2. C. Markov Processes and Martingales. with 1 in the jth place. Chichester. 2. Vol 1: Foundations by L. 1971). IIQlloo is finite for all Q and controls the size of the entries in Q. ..0). Between any two distinct states i and j it makes infinitely many visits to 00. .10. Williams (Wiley. . (a) For any vector v we have IQvl ~ IQllvl... 2nd edition. .. In particular. 2nd edition. 1967).

. n k=m+l ' " .E(m)lloo ~ IE(n) . by the Cauchy-Schwarz inequality (b) For any vector v we have I(QI + Q2)vl ~ IQI V I + IQ2 V l ~ (IQll + IQ21)1vl.E(m))iil ~ IIE(n) .. consider the finite sum E(n) For each i and j. so Qk < Since '" ..n ~ 00. .k! LJ IQl k ~O as m.2. . we have I(E(n) .k!· LJ k=m+l IQI ~ NllQlloo < 00. IQIQ2 V ~ IQI11Q2 V l ~ IQIIIQ21Ivl· l D Now for n = 0. and m ~ = L k=O n Qk kf' n.106 On the other hand 2.E(m)1 n k=m+l n Lkf IQl k E~o IQl k /k! converges by the ratio test.1. Continuous-time Markov chains I IQvI 2 = ~ (~qijVj) 2 ~ ~ ( ~ IIQllooIVjl) = 2 NIIQII~ ( ~ IVjl) 2 and.

for two commuting matrices Q1 and Q2 we have .) - ~ (tQ)fj L.-J k! k=O has infinite radius of convergence for all i. which therefore converges. proving that Qk e Q = L kf k=O CX) is well defined and. Finally.j. that the power series (e t Q ) .2.10 Appendix: matrix exponentials 107 Hence each component of E(n) forms a Cauchy sequence. _ '1.. indeed.

You will require a reasonable understanding of Chapter 1 here.1. 3.. Associated to any Q-matrix is a jump matrix i. if qi if qi =I 0 = o.j E I) given by 1r. (iii) L qij = 0 for all i. Exercises remain an important part of the text.3 Continuous-time Markov chains II This chapter brings together the discrete-time and continuous-time theories. Recall that a Q-matrix on I is a matrix Q = (qij : i. jEI We set qi IT = (7rij : = q(i) = -qii. _ ~~ - {o qij / qi 0 1 if j if j =I i =I i and qi =I 0 and qi = 0. _ { ~J 7r. . (ii) qij 2 0 for all i =I j. but. given such an understanding. Note that II is a stochastic matrix.. allowing us to deduce analogues. All the facts from Chapter 2 that are necessary to understand this synthesis are reviewed in Section 3. for continuous-time chains. this chapter should look reassuringly familiar. of all the results given in Chapter 1.1 Basic properties Let I be a countable set.j E I) satisfying the following conditions: (i) 0 ~ -qii < 00 for all i.

it waits there for an exponential time of parameter qi and then jumps to a new state. Briefly.1 Basic properties A sub-stochastic matrix on I is a matrix P negative entries and such that 109 = (Pij : i.83 .1 = in-I.. The discrete-time process (Yn)n~O is the jump chain. J 2 .qin -1 . forgetting what has gone before..J n=l For a minimal process we take a new state 00 and insist that X t = 00 for all t 2 (. .· . . lim L. E I) with non- LPij ::. jump times. states that (a) (Yn)n~O is Markov(-X. . We established in Section 2. . Y1 . . in terms of jump chain and holding times. Thus I n = 8 1 + . a right-continuous process (Xt)t~O runs through a sequence of states Yo. but equivalent. .2: minimal right-continuous random process..8 that there are two different.. J 3 . ways to describe how the process evolves. 1 jEI for all Associated to any Q-matrix is a semigroup (P(t) : t 2 0) of sub-stochastic matrices P(t) = (Pij(t) : i. (8n )n>1 are the holding times and (In )n>1 are the jump times. The Q-matrix is known as the generator matrix of (Xt)t~O and determines how the process evolves from its initial state. t 2 o. .. The explosion time ( is given by 00 ( = '""" 8 n = n--+-oo I n . respectively and jumping to the next state at times J1. the holding times 8 1 . being held in these states for times 8 1 ..Sn n are independent exponential random variables of parameters qi o ' . The distribution -X gives the initial distribution.. . As the name implies we have P(s)P(t) = P(s + t) for all s. Y2 . Put more simply. You will need to be familiar with the following terms introduced in Section 2.Y .j E I)..3.. (b) conditional on Yo = i o. j i. choosing state j with probability 1rij. IT). given that the chain starts at i. + 8 n . It then starts afresh.. .82 . An important point is that a minimal right-continuous process is determined by its jump chain and holding times.. The data for a continuous-time Markov chain (Xt)t~O are a distribution -X and a Q-matrix Q.. holding times. The first. the distribution of X o. . . jump chain and explosion.

But there is no danger of confusion in using the simpler notation. in the explosive case. Note that we have not yet said how the semigroup P(t) is associated to the Q-matrix Q. where ~ and P(h) are defined on IU{oo}. Continuous-time Markov chains II The second description. . states that the finitedimensional distributions of the process are given by (c) for all n = 0. . when P(t) is strictly sub-stochastic. Strictly. extending.110 3.i n + 1 ~ to ~ tl ~ .. It then starts afresh. put more simply. .:EPij(t) > 0 jEI the chain is found at 00 with probability Pi 00 (t). The information coming from these two descriptions is sufficient for most of the analysis of continuous-time chains done in this chapter. forgetting what has gone before. we should say Markov(~.7.. The semigroup P(t) is referred to as the transition matrix of the chain and its entries Pij (t) are the transition probabilities. in terms of the semigroup. except via the process! This extra information will be required when we discuss reversibility in Section 3. iI..8 that the semigroup is characterized as the minimal non-negative solution of the backward equation P'(t) = QP(t). P(h)). In the case where Pioo(t) := 1 .x and P(h) by ~oo = 0 and Pooj(h) = O. This description implies that for all h > 0 the discrete skeleton (Xnh)n~O is Markov(. ~ tn+l and all states Again. P(t) is simply the matrix exponential etQ . The semigroup is also the minimal non-negative solution of the forward equation P'(t) = P(t)Q. P(O) = I. by time t it is found in state j with probability Pij (t).x. . P(h)). P(O) = I which reads in components P~j(t) = L qikPkj(t). . In the case where I is finite. So we recall from Section 2. that is. kEI Pij(O) = 8ij . .2. given that the chain starts at i.1... and is the unique solution of the backward and forward equations. all times 0 i o.

3.2 Class structure

A first step in the analysis of a continuous-time Markov chain (X_t)_{t≥0} is to identify its class structure. We emphasise that we deal only with minimal chains, those that die after explosion. Then the class structure is simply the discrete-time class structure of the jump chain (Y_n)_{n≥0}, as discussed in Section 1.2.

We say that i leads to j and write i → j if

    P_i(X_t = j for some t ≥ 0) > 0.

We say i communicates with j and write i ↔ j if both i → j and j → i. The notions of communicating class, closed class, absorbing state and irreducibility are inherited from the jump chain.

Theorem 3.2.1. For distinct states i and j the following are equivalent:
(i) i → j;
(ii) i → j for the jump chain;
(iii) q_{i_0 i_1} q_{i_1 i_2} ... q_{i_{n-1} i_n} > 0 for some states i_0, i_1, ..., i_n with i_0 = i, i_n = j;
(iv) p_ij(t) > 0 for all t > 0;
(v) p_ij(t) > 0 for some t > 0.

Proof. Implications (iv) ⇒ (v) ⇒ (i) ⇒ (ii) are clear. If (ii) holds, then by Theorem 1.2.1 there are states i_0, i_1, ..., i_n with i_0 = i, i_n = j and π_{i_0 i_1} π_{i_1 i_2} ... π_{i_{n-1} i_n} > 0, which implies (iii). If q_ij > 0, then p_ij(t) > 0 for all t > 0, so if (iii) holds, then

    p_ij(t) ≥ p_{i i_1}(t/n) p_{i_1 i_2}(t/n) ... p_{i_{n-1} j}(t/n) > 0

for all t > 0, and (iv) holds. □

Condition (iv) of Theorem 3.2.1 shows that the situation is simpler than in discrete time, where it may be possible to reach a state, but only after a certain length of time, and then only periodically.
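Since the class structure is that of the jump chain, and condition (iii) above only involves which off-diagonal entries of Q are positive, reachability can be read off from the sparsity pattern of Q. A small sketch (the generator below is an arbitrary illustration, not from the text) checks whether i leads to j by a breadth-first search:

    from collections import deque

    # Hypothetical generator on states {0, 1, 2, 3}; state 3 is absorbing.
    Q = [[-2.0, 1.0, 1.0, 0.0],
         [ 0.0, -1.0, 0.0, 1.0],
         [ 1.0, 0.0, -1.0, 0.0],
         [ 0.0, 0.0, 0.0, 0.0]]

    def leads_to(Q, i, j):
        """True if i -> j, i.e. some path i = i0, i1, ..., in = j has
        q_{i0 i1} ... q_{i_{n-1} i_n} > 0 (condition (iii) of Theorem 3.2.1)."""
        if i == j:
            return True
        seen, queue = {i}, deque([i])
        while queue:
            k = queue.popleft()
            for m, rate in enumerate(Q[k]):
                if m != k and rate > 0 and m not in seen:
                    if m == j:
                        return True
                    seen.add(m)
                    queue.append(m)
        return False

    assert leads_to(Q, 0, 3) and not leads_to(Q, 3, 0)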

3.3 Hitting times and absorption probabilities

Let (X_t)_{t≥0} be a Markov chain with generator matrix Q. The hitting time of a subset A of I is the random variable D^A defined by

    D^A(ω) = inf{t ≥ 0 : X_t(ω) ∈ A}

with the usual convention that inf ∅ = ∞. We emphasise that (X_t)_{t≥0} is minimal. The probability, starting from i, that (X_t)_{t≥0} ever hits A is then

    h_i^A = P_i(D^A < ∞).

When A is a closed class, h_i^A is called the absorption probability. Since the hitting probabilities are those of the jump chain, we can calculate them as in Section 1.3. So if H^A is the hitting time of A for the jump chain, then

    {H^A < ∞} = {D^A < ∞}

and on this set we have D^A = J_{H^A}.

Theorem 3.3.1. The vector of hitting probabilities h^A = (h_i^A : i ∈ I) is the minimal non-negative solution to the system of linear equations

    h_i^A = 1                     for i ∈ A,
    Σ_{j∈I} q_ij h_j^A = 0        for i ∉ A.

Proof. Apply Theorem 1.3.2 to the jump chain and rewrite (1.3) in terms of Q. □

The average time taken, starting from i, for (X_t)_{t≥0} to reach A is given by

    k_i^A = E_i(D^A).

In calculating k_i^A we have to take account of the holding times, so the relationship to the discrete-time case is not quite as simple.

[Diagram: transition rates for the chain on {1, 2, 3, 4} used in Example 3.3.2.]

Example 3.3.2

Consider the Markov chain (X_t)_{t≥0} with the diagram given on the preceding page. How long on average does it take to get from 1 to 4?

Set k_i = E_i(time to get to 4). On starting in 1 we spend an average time q_1^{−1} = 1/2 in 1, then jump with equal probability to 2 or 3. Thus

    k_1 = 1/2 + (1/2)k_2 + (1/2)k_3,

and similarly k_2 and k_3 satisfy linear equations read off from the diagram in the same way. On solving these linear equations we find k_1 = 17/12.

Here is the general result.

Theorem 3.3.3. Assume that q_i > 0 for all i ∉ A. The vector of expected hitting times k^A = (k_i^A : i ∈ I) is the minimal non-negative solution to the system of linear equations

    k_i^A = 0                         for i ∈ A,
    −Σ_{j∈I} q_ij k_j^A = 1           for i ∉ A.        (3.1)

Proof. The proof follows the same lines as that of Theorem 1.3.5. First we show that k^A satisfies (3.1). If X_0 = i ∈ A, then D^A = 0, so k_i^A = 0. If X_0 = i ∉ A, then D^A ≥ J_1, so by the Markov property of the jump chain

    E_i(D^A − J_1 | Y_1 = j) = k_j^A,

so

    k_i^A = E_i(D^A) = E_i(J_1) + Σ_{j≠i} E(D^A − J_1 | Y_1 = j) P_i(Y_1 = j)
          = q_i^{−1} + Σ_{j≠i} π_ij k_j^A

and so

    −Σ_{j∈I} q_ij k_j^A = 1.

Suppose now that y = (y_i : i ∈ I) is another solution to (3.1). Then k_i^A = y_i = 0 for i ∈ A. Suppose i ∉ A; then

    y_i = q_i^{−1} + Σ_{j∉A} π_ij y_j
        = q_i^{−1} + Σ_{j∉A} π_ij ( q_j^{−1} + Σ_{k∉A} π_jk y_k )
        = E_i(S_1) + E_i(S_2 1_{H^A ≥ 2}) + Σ_{j∉A} Σ_{k∉A} π_ij π_jk y_k.

Now so. Continuous-time Markov chains II By repeated substitution for y in the final term we obtain after n steps Yi = lEi(Sd + . if y is non-negative where we use the notation HA /\ n for the minimum of H A and n...4 Recurrence and transience Let (Xt)t~O (Xt)t~O be Markov chain with generator matrix Q. 3.3. . (b) the expected time to hit 4 starting from 1. + lEi (Sn 1{HA:2:n}) + L . D 3. We say that i is transient if JP>i( {t ~ 0 : Xt = i} is unbounded) = O.114 3. Recall that we insist be minimal. L jl~A jn~A 1riil .1 Consider the Markov chain on {I. We say a state i is recurrent if JP>i({t ~ 0: X t = i} is unbounded) = 1. like class structure. Yi ~ E i (D A) = Exercise kf. recurrence and transience are determined by the jump chain. Note that if (Xt)t~O can explode starting from i then i is certainly not recurrent. · 1rin-dnYin· So. The next result shows that.. Calculate (a) the probability of hitting 3 starting from 1.. as required.. 2. by monotone convergence. 3. 4} with generator matrix Q= (0~ 1/6 o 1/2 -1/2 1/2 o o o o -1/3 1/6 1~4) o.
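Both Theorem 3.3.1 and Theorem 3.3.3 reduce such exercises to small linear systems when every state outside A leads to A. A minimal sketch (the generator matrix below is an arbitrary stand-in, not the one printed in Exercise 3.3.1) solves for the hitting probabilities and expected hitting times of a target set:

    import numpy as np

    # Hypothetical generator on {0, 1, 2, 3} with state 3 absorbing.
    Q = np.array([[-2.0, 1.0, 1.0, 0.0],
                  [ 1.0, -3.0, 1.0, 1.0],
                  [ 1.0, 1.0, -3.0, 1.0],
                  [ 0.0, 0.0, 0.0, 0.0]])

    def expected_hitting_times(Q, A):
        """Solve k_i = 0 on A and -sum_j q_ij k_j = 1 off A (Theorem 3.3.3)."""
        n = Q.shape[0]
        D = [i for i in range(n) if i not in A]
        k = np.zeros(n)
        k[D] = np.linalg.solve(-Q[np.ix_(D, D)], np.ones(len(D)))
        return k

    def hitting_probabilities(Q, A):
        """Solve h_i = 1 on A and sum_j q_ij h_j = 0 off A (Theorem 3.3.1)."""
        n = Q.shape[0]
        D = [i for i in range(n) if i not in A]
        h = np.zeros(n)
        h[list(A)] = 1.0
        # move the known boundary terms to the right-hand side
        rhs = -Q[np.ix_(D, list(A))] @ h[list(A)]
        h[D] = np.linalg.solve(Q[np.ix_(D, D)], rhs)
        return h

    print(expected_hitting_times(Q, A={3}))
    print(hitting_probabilities(Q, A={3}))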

3. so i is recurrent. then i is transient and Jo Jo oo oo Pii(t)dt = 00. Write 1r}j) for the (i. We have: (i) if i is recurrent for the jump chain (Xt)t~O. If X o = i then (Xt)t~O does not explode and I n ~ 00 by Theorem 2.4. Proof. We shall show that 1 o 00 1 pii(t)dt .7. (iv) Apply Theorem 1. 115 then i is recurrent for (ii) if i is transient for the jump chain. Suppose then that qi > o. Jo oo 00. (iii) Apply Theorem 1.--:_ L q1. then (Xt)t~O cannot leave i. (ii) Suppose i is transient for (Yn)n~O. Also X(Jn ) = Y n = i infinitely often. which is finite.2.1 and so that i is recurrent if and only if Pii(t)dt = the corresponding result for the jump chain.1 and the corresponding result for the jump chain.5. (iii) every state is either recurrent or transient. Then Jo IPi(Ni < 00) = IPi(Ti < 00) so i is recurrent if and only if IPi(Ti < 00) = 1. (i) Suppose i is recurrent for (Yn)n~O. because (Yn : n ~ N) cannot include an absorbing state.4.3 to the jump chain. by Theorem 3. Pii(t) = 1 oo Pii(t)dt = 00.1. Proof. with probability 1. D The next result gives continuous-time analogues of the conditions for recurrence and transience found in Theorem 1.1. then i is transient for (Xt)t~O.3. defined by Ti(w) = inf{t ~ J 1 (w) : Xt(w) = i}.4. Theorem 3.4. Pii(t)dt < 00. Let N i denote for all t. n=O 00 (n) 1rii (3.4 to the jump chain. (iv) recurrence and transience are class properties.4 Recurrence and transience Theorem 3. N = sup{ n ~ 0 : Yn = i} < so {t ~ 0: X t = i} is bounded by J(N +1). (Yn)n~O. so {t ~ 0 : X t = i} is unbounded. If X o = i then 00.2) by Theorem 3. We denote by T i the first passage time of (Xt)t~O to state i. If qi = 0.j) entry in rrn. then i is recurrent and (ii) ifqi > 0 and IPi(Ti < 00) < 1. .5. The following dichotomy holds: (i) if qi = 0 or IPi(Ti < 00) = 1. and the first passage time of the jump chain (Yn)n~O to state i.5. with probability 1.

B} with Q-matrix (-a a) (3 -(3 .1 Customers arrive at a certain queue in a Poisson process of rate A.4. Then. (i) If i is recurrent for (Xt)(~O then i is recurrent for (ii) If i is transient for (Xt)t~O then i is transient for (Zn)n~O. Claim (ii) is obvious. we show that recurrence and transience are determined by any discrete-time sampling of (Xt)(~o. The total service time for any customer is exponentially distributed with parameter J-l and is independent of the chain Y and of the service times of other customers. Independently of how many customers are in the queue.2.3.4): = lEi 00 n=O L 00 Sn+l 1{Yn=i} 1 = L lEi (Sn+l I Y n = i)JP>i(Yn = i) = --:n=O ~ n=O L 00 1I"i~)· 0 Finally.116 3. (Zn)n~O. he fluctuates between these states as a Markov chain Y on {A. Theorem 3. Let h > 0 be given and set Zn = X nh . Continuous-time Markov chains II To establish (3. by monotone convergence and the result follows by Theorems 1.2) we use Fubini's theorem (see Section 6. state A signifying that he is 'in attendance' and state B that he is having a tea-break. To prove (i) we use for nh estimate ~ t < (n + l)h the which follows from the Markov property.3 and 3.4. Proof. The single 'server' has two states A and B.4. .5. D Exercise 3.

Show that (fJ . Explain why.3.8ij ) = :E.5 Invariant distributions Just as in the discrete-time theory.5. j.. and explain why this is intuitively obvious. The following are equivalent: (i) A is invariant. so (Jl(II . We say that A is invariant if AQ=O.bij) = qij for all i.. and k = 0. 3.I))j = :EJli(1rij . (ii) Il IT = Il where Ili = Aiqi. . for some fJ in (0.1.1]. . Theorem 3. .5 Invariant distributions Describe the system as a Markov chain X with state-space 117 An signifying that the server is in state A and there are n people in the queue (including anyone being served) and B n signifying that the server is in state B and there are n people in the queue. Proof· We have qi ('lrij . prove that X is transient if 1l(3 < A(a+(3).7 to obtain the following result.xiqij iEI iEI = (.2. Let Q be a Q-matrix with jump matrix IT and let A be a measure. the notions of invariant distribution and measure play an important role in the study of continuous-time Markov chains.1) f (fJ) = 0. where By considering f(l) or otherwise.xQk D This tie-up with measures invariant for the jump matrix means that we can use the existence and uniqueness results of Section 1.1.

Suppose that Q is irreducible and recurrent. By Theorems 3. As in the discrete-time case positive recurrence is tied up with the existence of an invariant distribution. (ii) some state i is positive recurrent.1 and 3.l Ei L n=O l{Yn=j} = . by Theorem 3.118 3. By monotone convergence.1. So.n<Ni } . Proof. we can take Ai = jji/qi to obtain an invariant measure unique up to scalar multiples. Proof.L~ = Ei 10 where T i /\ ( fTil\( 1 {xs=j}ds. Then the following are equivalent: (i) every state is positive recurrent.2.n<Ni } ) n=O 00 = qJ ~lE· '""" 1{Yn=J. Define J-li = (J-l. Moreover. Theorem 3.7. then irreducibility forces qi > 0 for all i.5. Then Q has an invariant measure A which is unique up to scalar multiples. By Fubini's theorem 00 f. L jEI f. = Ei L n=O 00 Sn+l 1{Yn=j.L.4. If qi = 0 or the expected return time mi = Ei(Ti) is finite then we say i is positive recurrent. (iii) Q is non-explosive and has an invariant distribution A. Then. Continuous-time Markov chains II Theorem 3. Denote by N i the first passage time of the jump chain to state i. : j E I) by f. then irreducibility forces qi > 0 for all i. Otherwise a recurrent state i is called null recurrent.J n=O Ni-l = q. by Theorems 1. when (iii) holds we have mi = l/(Aiqi) for all i. D Recall that a state i is recurrent if qi = 0 or lPi (Ti < 00) = 1.1.5. ~ L.5 and 1.L.2. II is irreducible and recurrent. II has an invariant measure jj.3. Let us exclude the trivial case I = {i}. Let Q be an irreducible Q-matrix. Let us exclude the trivial case I = {i}. which is unique up to scalar multiples. It is obvious that (i) implies (ii).j/qj .n<Nd = LEi(Sn+l I Y n =j)Ei (l{Yn=j.7.6.5. = Ei(Ti 1\ (). denotes the minimum of T i and (.

Vj = .. for all j by Theorem 1.7. so J-liQ = 0 by Theorem 3.4 Consider the Markov chain (Xt)(~O on Z+ with the following diagram. where qi > 0 for all i and where 0 < A = 1 .5. then Vi = 1 and vll = v by Theorem 3. But J-li has finite total mass L Jl.---~-- .7. D The following example is a caution that the existence of an invariant distribution for a continuous-time Markov chain is not enough to guarantee positive recurrence... On the other hand.1. Fix i E I and set Vj = Ajqj/(Aiqi). Suppose (ii) holds.5. in the notation of Section 1.1. and mi = l/(Aiqi) for all i...6. = L . then i is certainly recurrent.7./mi.. ... so the jump chain is recurrent.1 i-I i i+1 The jump chain behaves as a simple random walk away from 0. is the expected time in j between visits to i for the jump chain. So mi = LJl. or even recurrence... We know that "ill = by Theorem 1.5 Invariant distributions 119 "i where. and Q is non-explosive.1.. suppose (iii) holds.3.... . so Vj ~ . To compute an invariant measure v it is convenient to use the detailed balance equations for all i..7.5. so (Xt)(~O is recurrent if A ~ J-l and transient if A > J-l.. by Theorem 2./qj ~ L Vj/Qj jEI jEI jEI = 2: Aj/(Aiqi) = l/(Aiqi) < 00 jEI showing that i is positive recurrent..J-l < 1: AqO o .. j.5.. = Ei(Ti) = jEI mi so we obtain an invariant distribution A by setting Aj = J-l. Example 3. To complete the proof we return to the preceding calculation armed with the knowledge that Q is recurrent.. hence 11 is recurrent.

The following are equivalent: (i) AQ = 0.l(A/Il)i. and P(s) is recurrent by Theorem 3.7.3.P'(s) = >.5. If the jump rates qi are constant then v can be normalized to produce an invariant distribution precisely when A < Il.7. so = Il implies that Il is proportional to A. the only possibility is that (Xt)t~O is explosive.120 3. and from the proof of Theorem 3. Consider. Continuous-time Markov chains II Look ahead to Lemma 3. Theorem 3. There is a very simple proof in the case of finite state-space: by the backward equation ds>'P(s) = >. Let s > 0 be given. In this case the non-zero equations read for all i.5.1. The next result justifies calling measures A with AQ = 0 invariant. Given Theorem 3. Thus it suffices to show IlP( s) = Il. the case where qi = 2i for all i and 1 < AI Il < 2.4.5. and let A be a measure.2 to see that any solution is invariant.3. For infinite state-space. Since Q is recurrent. Let Q be irreducible and recurrent. it is non-explosive by Theorem 2. So a solution is given by Vi = q:.QP(s) so AQ Il P (s) d = 0 implies AP(S) = AP(O) = A for all s. P(s) is also recurrent. the interchange of differentiation with the summation involved in multiplication by A is not justified and an entirely different proof is needed. By the strong Markov property at T i (which is a simple consequence of the strong Markov property of the jump chain) . on the other hand. Hence any A satisfying (i) or (ii) is unique up to scalar multiples. (ii) AP(S) = A.3. so IlQ = O. if we fix i and set then IlQ = o. Proof. Then v has finite total mass so (Xt)t~O has an invariant distribution.5. but (Xt)t~O is also transient.

You will see that the situation is analogous to the case of discrete-time.5. If (Xt)(~O is Markov(A.6. using Fubini's theorem. Let Q be a Q-matrix with semigroup P(t). D . By Theorem 3. Proof.3. s +Ti /-£j = = 121 1 Jo kEI lEi l 00 s l{X t=j}dt = j.1. D 3.6. 1 .»i(X kEI i = k. Lemma 3. We shall need the following estimate of uniform continuity for the transition probabilities. for all t.5.5. IF(Xs = i) = (AP(S))i = Ai so. for all i.e.6 Convergence to equilibrium We now investigate the limiting behaviour of Pij (t) as t ~ 00 and its relation to invariant distributions. D Theorem 3. conditional on X s Markov(8i . h 2: 0 Proof.Pij(t)1 kEI = !:EPik(h)Pkj(t) . lF i (J1 :s.pii(h))pij(t) I k#i :s. Q) then so is (Xs+t)(~O for any S 2: o. by the Markov property. h) = 1 . only there is no longer any possibility of periodicity.6 Convergence to equilibrium Hence.qih .pii(h) :s. We have Ipij(t + h)-Pij(t)1 = I:EPik(h)Pkj(t) .(1 . Q). t t JP\(Xs +t < Ti)dt < Ti)Pkj(s)dt = = = roo '2:)J. Then. Let Q be an irreducible non-explosive Q-matrix having an invariant distribution A. t L(lE kEI Jo (Ti l{Xt=k}dt)Pk j (s) :E /-£kPkj(S) as required.

Given € > 0 we can find h > 0 so that for 0 and then find N.5.1. Fix a state i. by discrete-time convergence to equilibrium. we fill in the gaps using uniform continuity. and having an invariant distribution A.6. Continuous-time Markov chains II Theorem 3. A is invariant for P(h). It follows from Theorem 1.6. Fix h > 0 and consider the h-skeleton Zn = Xnh.2. .5. So. j Thus we have a lattice of points along which the desired limit holds.122 3.1 irreducibility implies pij(h) > 0 for all i.2 (Convergence to equilibrium). P(h)). By Theorem 3. By Theorem 2.4 so (Zn)n~O is discrete-time Markov(8i . so that for n 2: N.5 by the same argument we used in the preceding result. For t 2: Nh we have nh ~ ~ s ~ h t < (n + l)h for some n 2: Nand by Lemma 3. We do not give the details.8. Hence Pij(t)~Aj as n~oo.8. D The complete description of limiting behaviour for irreducible chains in continuous time is provided by the following result. By Theorem 3. j we have Proof Let (Xt)(~O be Markov(8i . Then for all states i.j so P(h) is irreducible and aperiodic. for all i. Q). Let Q be an irreducible non-explosive Q-matrix with semigroup P(t).

Let Q be an irreducible Q-matrix and let v be any distribution. Q).7 Time reversal Time reversal of continuous-time chains has the same features found in the discrete-time case. 7 Time reversal 123 Theorem 3.6.3 Customers arrive at a single-server queue in a Poisson stream of rate A. Service requirements are independent of one another and of the arrival process.2.3. and deduce that the chain is positive recurrent if and only if AI Il < 1/2. Calculate the essentially unique invariant measure for Q.6.2 In each of the following cases.3. as we shall see in Section 5.3.1 Find an invariant distribution A for the Q-matrix Q = (~2 2 !4 1 -3 ~) and verify that limt~ooPII(t) = Al using your answer to Exercise 2. Reversibility provides a powerful tool in the analysis of Markov chains. Exercises 3.1. Suppose that (Xt)(~O is Markov(v. Write down the generator matrix Q of a continuous-time Markov chain which models this. compute limt~oo P(Xt = 21Xo = 1) for the Markov chain (Xt)t~O with the given Q-matrix on {1.6. Then as t ~ 00 for all j E I where mj is the expected return time to state j.6.2.1.4}: (a) (c) (11 (1 2 1 -1 0 0 1 -1 0 0 1 1 -1 0 0 0 -2 2 11) ]2) (b) (d) (1 (1 2 1 -1 0 0 2 1 1 1 -1 0 0 -2 1 0 2 -1 0 ~) ~) 3. Note in the following . Each customer has a service requirement distributed as the sum of two independent exponential random variables of parameter J-l. 3. explaining what the states of the chain represent. 3.

.6. .j E 1) is given by Aj~i = Aiqijo over.. For the processes we consider.8. (Xt)O~t~T is Markov(A... ..8. < t n = T and Sk = tk . Mo. Also. Define P(t) by AjPJi(t) = AiPij(t). Set Xt = X T .tn = in) = AinPinin-l (sn) . then P(t) is an irreducible stochastic matrix with invariant distribution A. Let T E (0. and P(t) is then its minimal non-negative solution. Pin-lin (Sn) SO. D The chain (Xt)O~t~T is called the time-reversal of (Xt)O~t~T. by Theorem 2. .7. We could if we wished redefine the time-reversed process to equal its right limit at the jump times. which is itself a Q-matrix. We shall suppose implicitly that this is done..5.1. This echoes our proof of the forward equation.Xtn = in) = P(XT-to = io. j.4 we have .e- Proof..00) be given and let (Xt)O<t<T be Markov(A.8.t .- P(Xto = io. Hence Q is irreducible and non-explosive and has invariant distribution A.124 3. for all t > 0.4 again. and forget about the problem.k~v(A. Continuous-time Markov chains II result how time reversal interchanges the roles of backward and forward equations. Pi l io (Sl) = AioPioi l (Sl) . . But this is the backward equation for Q. and we can rewrite the forward equation transposed as P'(t) = QP(t) . where Q = (~j : i. A Q-matrix Q and a measure A are said to be in detailed balance if for all i. Theorem 3. Q). which rested on the time reversal identity of Lemma 2.tk-1. P(t) is an irreducible stochastic matrix with invariant distribution A.8.XT . Then the process (Xt)O<t<T is Ma. Q is also irreducible and non-explosive with invariant distribution A. A small technical point arises in time reversal: right-continuous processes become left-continuous processes. Finally. by Theorem 2.. thus obtaining again a rightcontinuous process. the semigroup (P(t) : t 2: 0) of Q is the minimal non-negative solution of the forward equation P'(t) = P(t)Q. Q). P(O) = I. for 0 = to < .. . By Theorem 2.. this is unimportant. Let Q be irreducible and non-explosive and suppose that Q has an invariant distribution A. Q).

Then the following are equivalent: (a) (Xt)(~O is reversible. Q). and the lengths of calls are independent exponential random variables of parameter J-l. Is (Nt)t~o a Poisson process? 3.2. Q). Let Q be an irreducible and non-explosive Q-matrix and let A be a distribution. Show that. Then both (a) and (b) are equivalent to the statement that Q = Q in Theorem 3. We say that (Xt)(~O is reversible if. Show that for large t the distribution of the number of lines in use at time t is approximately Poisson with mean AI J-l. Q). Proof· We have (AQ)i = EjEI Ajqji = EjEI Aiqij = 0.1 Consider a fleet of N buses. given that n are in use at time 0. (XT-t)OstsT is also Markov(A. . Show that the expected number of lines in use at time t. Both (a) and (b) imply that A is invariant for Q.7. Each bus breaks down independently at rate J-l. for all T > 0. IfQ and A are in detailed balance then A is invariant for Q. Suppose that (Xt)(~O is Markov(A. and for similar reasons.3. Theorem 3.7.8 Ergodic theorem 125 Lemma 3. Assuming that infinitely many telephone lines are available. when it is sent to the depot for repair.3. D Exercise 3. non-explosive. with Q irreducible and.2 Calls arrive at a telephone exchange as a Poisson process of rate A.7.1. Proof. Here is the result. Find the mean length of the busy periods during which at least one line is in use.7. 3. The repair shop can only repair one bus at a time and each bus takes an exponential time of parameter A to repair. t] has Poisson distribution of mean At. is ne-J-Lt + A(l . Find the equilibrium distribution of the number of buses in service.8 Ergodic theorem Long-run averages for continuous-time chains display the same sort of behaviour as in the discrete-time case.7. in equilibrium. (b) Q and A are in detailed balance.e-J-Lt)1 J-l. the number Nt of calls finishing in the time interval [0. D Let (Xt)(~O be Markov(A. set up a Markov chain model for this process.

.1) L~ ~ + .. ~ ~ ~ By the strong Markov property (of the jump chain) at the stopping times Tin for n ~ 0 we find that L}.. Thus for n = 0.. Then (Xt)(~O hits i with probability 1 and the long-run proportion of time in i equals the longrun proportion of time in i after first hitting i. If (Xt)(~O is Markov(v. in the positive recurrent case.126 3.. are independent and identically distributed with mean l/qi.1.. by the strong law of large numbers (see Theorem 1. L~ .+l : X t = i} L ~+l = T?1'+l . by Tin the time of the nth return to i and by Li the length of the nth excursion to i. t t 1 {~ Jo l{x s =i}ds ~ 0 = 1 mi · Suppose then that Q is recurrent and fix a state i.T!" .. for any bounded function f : I ~ lR we have where and where (Ai: i E I) is the unique invariant distribution. Moreover. . So. Continuous-time Markov chains II Theorem 3.Tin T in + 1 = inf{t > Tin + M. . + Lr:t ~ n n ~ mi asn~oo M~ ~ + . so t 1 Jo l{x s =i}ds::. Denote by Mi the length of the nth visit to i..8.2. it suffices to consider the case v = bi.+l = inf{t > Tin: X t =1= i} . we have M. and that Ml. by the strong Markov property (of the jump chain). Proof. . M?. Let Q be irreducible and let v be any distribution. +M!L 1 ~~qi asn~oo . then ]p> (~ t t io l{x s =i}ds ~ _1_ as t miqi ~ 00) = 1 where mi = IEi(Ti ) is the expected return time to state i. Q). are independent and identically distributed with mean mi.10. Hence. setting Tp = 0. If Q is transient then the total time spent in any state i is finite..1 (Ergodic theorem). . .

. Now... for Tin ~ t < T in+ 1 we have T!L M~ ~ T!"+l L~ ~ ~ -~- + ···+ M!L <1 ~ + ···+ Lr:t .10. In particular. + L.. + M!L ~ L} + ···+ Li ~-- 1 asn~oo miqi 00 with probability 1.:+l ~ so on letting t ~ 00 we have. D . we note that Tin /Tin+ 1 ~ 1 as n ~ with probability 1.2.8 Ergodic theorem and hence M~ ~ 127 + . with probability 1 In the positive recurrent case we can write where Ai = 1/(miqi). by the same argument as was used in the proof of Theorem 1.t ~ it 0 1 {Xs=~} ·ds<-~- T!"+l M~ ~ T!L L~ ~ ~ + ···+ M!"+l ~ + .3.. We conclude that 1 t it 0 f(Xs)ds ~ 1 ast~oo with probability 1.
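The ergodic theorem lends itself to a direct numerical check on a small chain: simulate for a long time, record the fraction of time spent in each state, and compare with the invariant distribution. A rough sketch, using an arbitrary two-state generator that is not an example from the text:

    import random

    alpha, beta = 1.5, 0.5          # hypothetical rates: 0 -> 1 at alpha, 1 -> 0 at beta
    q = [alpha, beta]
    invariant = [beta / (alpha + beta), alpha / (alpha + beta)]   # solves lambda Q = 0

    random.seed(1)
    t_max, t, state = 10_000.0, 0.0, 0
    time_in = [0.0, 0.0]
    while t < t_max:
        hold = random.expovariate(q[state])       # Exp(q_i) holding time
        time_in[state] += min(hold, t_max - t)
        t += hold
        state = 1 - state                          # two states: always jump to the other

    print([x / t_max for x in time_in], invariant)  # long-run proportions vs invariant law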

4. Consider the simple symmetric random walk (Xn)n~O on Z. The further theory inevitably involves more sophisticated techniques which. We begin with a simple example. the identification of martingales is a crucial step in understanding the evolution of a stochastic process. namely. This already covers a great many applications. although having their own interest. At the same time. The idea is that the Markov chain case serves as a guiding metaphor for more complicated processes. further insight is gained into Markov chains themselves. On the other hand. So the reader familiar with Markov chains may find this chapter helpful alongside more general higher-level texts. the overall structure is. already present in the elementary theory. electrical networks and Brownian motion. which is a Markov chain with the following diagram . We therefore thought it worth while to discuss some features of the further theory in the context of simple Markov chains. potential theory. which we shall make precise shortly. martingales. This is a sort of balancing property. but is just the beginning of the theory of Markov processes. Often. to a large extent.4 Further theory In the first three chapters we have given an account of the elementary theory of Markov chains. can obscure the overall structure.1 Martingales A martingale is a process whose average value remains constant in a particular strong sense.

. A third formulation of the martingale property involves another notion of conditional expectation.in lE(Y I X o = i o. .Xn=in }' .4. Since the collection F n consists of countable unions of elementary events such as this martingale property is equivalent to saying that for all i o. X m I X o = io. .··· . .. Let us fix for definiteness a Markov chain and write F n for the collection of all sets depending only on X o. . Here is the general definition.. . for n E(Xn - ~ m. we define lE(Y I F n ) = L io. A process (Mn)n~o is called adapted if M n depends only on X o.Xm (Xn)n~O = i m ) = o. The sequence (Fn)n~o is called the filtration of (Xn)n~O and we think of F n as representing the state of knowledge.Xn .Xn = i n )l{xo=io. A process (Mn)n~o is called integmble if EIMnl < 00 for all n. . .. . . . .Xn .1 Martingales 1 1 129 2 i-I 2 i I( • • i+l The average value of the walk is constant. An adapted integrable process (Mn)n~o is called a martingale if for all A E F n and all n...in and all n. In precise terms we have and the stronger property says that. This stronger property says that (Xn)n~O is in fact a martingale. of the chain up to time n. or history. ..· ... . Given an integrable random variable Y. indeed it has the stronger property that the average value of the walk at some future time is always simply the current value.

Let (Mn)n~o be a martingale and let T be a stopping time. what we do is to replace on each elementary event A E F n . so if we complete the process and average the conditional expectation we should get the full expectation E(E(Y I F n )) = E(Y). n} E F n for all n < 00. then see how it explains some things we know about the simple symmetric random walk. n for some n. It is easy to check that this formula holds.1 (Optional stopping theorem). Proof. Further theory The random variable E(Y I F n ) is called the conditional expectation of Y given F n .M r n-l 1) + ···+ (M1 - Mo) = :2) Mk +l k=O Mk)lk<T. Recall that a random variable T :n ~ {O.130 4. Assume that (i) holds. . Recall from Section 1. In particular. We shall prove one general result about martingales.4 that all sorts of hitting times are stopping times.M o = (Mr .. An equivalent condition is that {T ::. It is easy to check that an adapted integrable process (Mn)n~o is a martingale if and only if Conditional expectation is a partial averaging. . In passing from Y to E(Y I F n ). 1. the random variable Y by its average value E(Y I A). by induction This was already clear on taking A = n in our original definition of a martingale..1. for a martingale so. Theorem 4. 2. ~ T. (ii) T < 00 and IMnl ~ C whenever n Then lEMr = lEMo. Suppose that at least one of the following conditions holds: (i) T::. Then M r . } u {(X)} is a stopping time if {T = n} E F n for all n < 00.

But JP>(T > n) ~ 0 as n ~ 00. now we can compute p = JP>(Xn hits -a before b).3 the counter-intuitive case of a gambler who keeps on playing a fair game against an infinitely rich casino. So what? Well. way to compute p. Thus condition (ii) of the optional stopping theorem applies with M n = X n and C = a V b.IEMTAnl ~ IEIMT . playing a fair game. Then IIEMT .MTAnl ~ 2CJP>(T > n) for all n. so that IEMTAn = IEMo.IEMol = IIEMT . with the certain outcome of ruin.1 Now {k. Markovian. We discussed in Section 1.4. This game ends at the finite stopping time T = inf{ n ~ 0 : Xn = -a} .EMo = L E[(Mk+l k=O Mk)lk<T] = O. and so since (Mk)k~O is a martingale. since the game is fair.p. using the methods of Section 1. But the intuition behind the result EXT = 0 is very clear: a gambler. bEN are given. then the preceding argument applies to the stopping time T 1\ n.< T} = {T ~ Martingales 131 k}C E Fk since T is a stopping time. If we do not assume (i) but (ii). We have X T = -a with probability p and X T = b with probability 1 . D (Xn)n~O. We deduce that IEXT = EXo = O.p)b giving p = b/(a + b). leaves the casino once losses reach a or winnings reach b. whichever is sooner. Hence n-l EMT . so o = EXT = p( -a) + (1 . Then T is a stopping time and T < 00 by recurrence of finite closed classes. so EMT = EMo. Returning to the simple symmetric random walk X o = 0 and we take T suppose that = inf {n ~ 0: Xn = -a or Xn = b} where a. There is an entirely different.4. the average gain should be zero.

given a function f : I ~ JR and a Markov chain (Xn)n~O with transition matrix P. some care is needed as it is not always true. By the Markov property . This is the basis of a deep connection between martingales and Markov chains. Thus. Let A = {X o = i o.I)f(X m ). jEI Theorem 4. m=O Proof. Then the following are equivalent: is a Markov chain with transition matrix P.. Obviously. Then I(PJ)(i)1 = so ILPij1i1 ::. .Xn = in}. and these martingales characterize the chain. s~p llil jEI J IM~I ~ 2(n+1)suplfjl < j 00 showing that M~ is integrable for all n. Let (Xn)n~O be a random process with values in I and let P be a stochastic matrix.1..2. indeed a martingale is necessarily real-valued and we do not in general insist that the state-space I is contained in JR. . to every Markov chain is associated a whole collection of martingales. Since XT = -a we have EXT = -a =I 0 = EXo but this does not contradict the optional stopping theorem because neither condition (i) nor condition (ii) is satisfied.132 4. we have (pn J)(i) = Lp~j) Ii = lEi (J(Xn )). while intuition might suggest that EXT = EXo is rather obvious. Let f be a bounded function. The example just discussed was rather special in that the chain (Xn)n~O itself was a martingale. Write (Fn)n~o for the filtration of (Xn)n~O. Suppose (i) holds. Nevertheless. Further theory where a is the gambler's initial fortune. (ii) for all bounded functions f : I ~ JR. We recall that. the following process is a martingale: n-l (i) (Xn)n~O M~ = f(Xn ) - f(Xo) - L (P . this is not true in general.

So (Mn)n~o is a martingale.i).f(n. Suppose that a function f : N x I ~ lR satisfies. by the Markov property E(Mn +1 - M n I X o = i o.3. i) = i 2 .1)2/2 + (i .1 so Martingales 133 (Pf)(Xn ) E(M~+1 . Xl) . if (ii) holds. Theorem 4. i) = (i . both and (Pf)(n+ 1. i).j) JEI = f(n. We consider f (i) = i and g( n.1)2/2 . Proof.· .f(n. Since IXnl ~ n for all n.Xn D = in) = = E in [f(n + 1. starting from O. then for all bounded functions f.1)/2 + (i + 1)/2 = i = f(i)..M! I A) = E[f(Xn + l ) - I A] = 0 and so (Ml)n~o is a martingale. (Pg)(n + 1.4. On the other hand. Let (Xn)n~O be a Markov chain with transition matrix P. = f(n. Notice that we drop the requirement that f be bounded. Hence both X n = f(X n ) and Yn = g(n.n = g(n. Then. Let us see how this theorem works in the case where (Xn)n~O is a simple random walk on Z.i) Then M n = LPijf(n+ 1. for all n ~ 0.1. We have assumed that M n is integrable for all n. . we have Also (P f)(i) = (i . X n ) is a martingale. On taking f = l{i n +l} we obtain so (Xn)n~O is Markov with transition matrix P.(n + 1) = i2 . in) = O. .n. D Some more martingales associated to a Markov chain are described in the next result. X n ) are martingales. X o)] (Pf)(n + 1. in) .

The potential ¢ is felt physically '\l¢ = (8¢ 8¢ 8¢) 8x' 8y' 8z . which is known as potential theory. So we obtain We have given only the simplest examples of the use of martingales in studying Markov chains. bEN. which in suitable units satisfies the equation -Da¢ = p. a distribution of mass. Some more will appear in later sections. the left side converges to E(T). Show that M n = h(Xn ) is a martingale. 1991). where ~ = 8 2 /8x 2 + 8 2 /8 y 2 through its gradient + 82 / 8z 2 . 4.134 4. For an excellent introduction to martingales and their applications we recommend Probability with Martingales by David Williams (Cambridge University Press. fluid flow and the diffusion of heat. by monotone convergence. but potential theory is also relevant to electrostatics. of density p say. One example is Newton's theory of gravity. gives rise to a gravitational potential ¢. Further theory In order to put this to some use.1. and the right side to E(Xf) by bounded convergence.1 Let (Xn)n~O be a Markov chain on I and let A be an absorbing set in I.2 Potential theory Several physical theories share a common mathematical framework. consider again the stopping time T = inf{n ~ 0 : Xn = -a or Xn = b} where a. Set T = inf{n ~ 0: X n E A} and hi = IPi(Xn E A for some n ~ 0) = IPi(T < 00). By the optional stopping theorem Hence On letting n ---+ 00. Exercise 4. In gravity.

1 4 2 5 Example 4. An indirect link is provided by Brownian motion. Obviously. cPs = 0 and . Suppose that on each visit to states i = 1. This discrete theory amounts to doing Markov chains without the probability.4. which has much of the structure of the continuum version. The basic ideas of boundary theory for Markov chains will also be introduced. We denote by cPi the expected total cost starting from i. we shall briefly run through some of the main features of potential theory.4. containing within it other notions previously considered such as hitting probabilities and expected hitting times. where space is discrete.4.2. This is the easiest way to appreciate the general structure of potential theory. Once we have established the basic link between a Markov chain and its associated potentials.2. obviously have no direct link with this theory. which at each step choses among the allowable transitions with equal probability. but the advantage of wider applicability. It also finds application when one associates costs to Markov chains in modelling economic activity: see Section 5. This is a unifying idea. In this section we are going to consider potential theory for a countable state-space.4 a cost Ci is incurred. What is the fair price to move from state 3 to state 4? The fair price is always the difference in the expected total cost. here are two simple examples. In these examples the potential cP has the interpretation of expected total cost. Markov chains. which has the disadvantage that one loses the intuitive picture of the process. unobscured by technical difficulties. where Ci = i.1 Consider the discrete-time random walk on the directed graph shown above.2 Potential theory 135 which gives the force of gravity acting on a particle of unit mass. explaining their significance in terms of Markov chains.3. which we shall discuss in Section 4. Before we embark on a general discussion of potentials associated to a Markov chain. and by showing how to calculate these potentials. We shall begin by introducing the idea of potentials associated to a Markov chain. in which space is a continuum.

Further theory by considering the effect of a single step we see that ¢l = 1 + ¢2. and suppose cost is incurred at rate Ci = i in state i for i = 1. instead. q2 = 1. q3 = 3. q4 = 1. First suppose our process is. where Ii = i. 4. ¢3 = + ~¢l ¢4 = 4. 3. ¢2 = 2 + ¢3. So we see. Hence ¢3 = 8 and the fair price to move from 3 to 4 is 4. 1 4 2 5 In the second variation we consider the discrete-time random walk (Xn)n~O on the modified graph shown above. What now is the fair price to move from 3 to 4? The expected cost incurred on each visit to i is given by Ci/ qi and ql = 1. Thus the total cost is now T-l n=O L c(X n ) + f(X T ) . Hence ¢3 i + ~¢4' = 5 and the fair price to move from 3 to 4 is 1. We impose a cost Ci = i on each visit to i for i = 2.2. Where there is no arrow. and a final cost Ii on arrival at i = 1 or 5. We shall now consider two variations on this problem. Obviously. transitions are allowed in both directions.136 4. the continuous-time random walk (Xt)t~O on the same directed graph which makes each allowable transition at rate 1. as before ¢l = 1 + ¢2. ¢3 = 3 + ~¢l + ~¢4' ¢4 = 4. Thus the total cost is now 00 1 c(Xs)ds.3. ¢2 = 2 + ¢3.4. states 1 and 5 are absorbing.

The quadratic c2p . all costs are multiplied by c. . We shall see in the general theory that cPo is the minimal non-negative solution. What is the expected total cost cPo incurred by the walk starting from O? We must be prepared to find that cPo = 00 for some values of c. as before.4.2. so we must have By considering what happens on the first step. Note that cPo = 00 always satisfies this equation. Write. On solving these equations we obtain cP2 = 7. we know that the walk X n ~ 00 with probability 1. Let c > 0 and suppose that a cost ci is incurred every time the walk visits state i.2 Potential theory 137 where T is the hitting time of {I. =9 and cP4 cP4 = 4 + ~(cP3 + cPs). cPi for the expected total cost starting from i. Then cPl = 1. Let cPi denote the expected total cost starting from i. Hence the expected total cost is given by cPo c/(c. and takes negative values in between.i-l = q < P = Pi.c + q has roots at q/p and 1. So in cPo = 00.2 Consider the simple discrete-time random walk on Z with transition probabilities Pi. Example 4.i+l. cP3 = 3 + ~(cPl + cP2 + cP4 + cPs). 5}. cPs = 5 and cP2 = 2 + ~(cPl + cP3). cP3 this case the fair price to move from 3 to 4 is -2.q . l) otherwise. as the total cost is a sum over infinitely many times. we see cPo = 1 + PcPl + qcP-l = 1 + (cp + q/c)cPo.c2p-q) = { 00 if c E (q/p. so for c ~ 1 we shall certainly have = 11. On moving one step to the right. Let us look for a finite solution: then so cPo = c cc2 p . Indeed.

This situation is familiar from hitting probabilities and expected hitting times. The most obvious interpretation of these potentials is in terms of cost: the chain wanders around in D until it hits the boundary: whilst in D.138 4. one does not really need a general theory to write down the linear equations. so no further costs are incurred after explosion. at j say. It is interesting that cPo = 00 also when c is too small: in this case the costs rapidly become large to the left of 0. ¢ is the difference of the potentials associated with the positive and negative parts of c and I. . a final cost Ij is incurred. the expected cost incurred to the left of 0 is infinite. To be sure that the sums and integrals here are well defined. We shall discuss the cases of discrete and continuous time side-by-side. we shall write (Xn)n~O for a discrete-time chain with transition matrix P. at state i say. As usual. In the examples just discussed we were able to calculate potentials by writing down and solving a system of linear equations. Ci ~ 0 for all i E D and Ii ~ 0 for all i E aD. More generally. Indeed. we call aD the boundary. Nevertheless. As the examples show. and in continuous time where T denotes the hitting time of aD. these are simple examples of potentials for Markov chains. or even that the boundary is non-empty. and although the walk eventually drifts away to the right. Note that we do not assume the chain will hit the boundary. we are now going to give some general results on potentials. Let us partition the state-space I into two disjoint sets D and aD. that is. when and if it hits the boundary. Throughout. and will reveal also what happens when the linear equations do not have a unique solution. we shall assume for the most part that c and I are non-negative. Further theory It was clear at the outset that cPo = 00 when c ~ 1. In the explosive case we always set c( 00) = 0. We shall consider the associated potential. so this assumption is not too restrictive. We suppose that functions (Ci : i E D) and (Ii: i E aD) are given. we insist that (Xt)(~O be minimal. and (Xt)t~O for a continuous-time chain with generator matrix Q. defined by in discrete time. These will help to reveal the scope of the ideas used in the examples. it incurs a cost Ci per unit time.

2) and 'l/Ji ~ 0 for all i.2. (ii) Consider the expected cost up to time n: .1) ¢ = P¢+ C in D { ¢=I in aD.3. ¢ = f on aD.1) has at most one bounded Proof. then (4.2 Potential theory 139 Theorem 4. then 'l/Ji ~ ¢i for all i. Set where T denotes the hitting time of aD. For i E D by the Markov property IEi ( L l~n<T c(Xn ) + f(X T )IT<oo Xl c(Xn ) = j) j) = IEj so we have (L + f(XT )lT <OO) = ¢j n<T <Pi = Ci + LpijlE ( L jEI l~n<T c(Xn ) + f(X T )l T <oo Xl = =Ci+ LPWPj jEI as required. = 1 for all i. (iii) if IPi(T < 00) solution. (ii) if'l/J = ('l/Ji : i E I) satisfies 'l/J { 'l/J ~ ~ P'l/J+c in D in aD I (4. Then (i) the potential ¢ = (¢i : i E I) satisfies (4. (i) Obviously.4. Suppose that (Ci : i E D) and (Ii: i E aD) are nonnegative.

by the argument ¢(n + 1) = c + P¢(n) in D in aD.140 4. 2: jn-IED Piil'" Pjn-2jn-l Cjn_l PijI'" Pjn-lin !jn + L jlED '" L L jn-IEDjnEaD + = 2: .2). Also.. For i E D we have 'ljJi ~ Ci + L jEaD Pij!j + LPij'ljJj JED and. But also. proving (iii). we find 00. in the case of equality. with equality when equality holds in (4. Then 1/J ~ P1/J + c ~ P¢(O) + c = ¢(1) in D and 1/J ~ f = ¢(1) in aD. and hence 1/J ~ ¢. Further theory i ¢i as n ~ By monotone convergence. ¢i(n) used in part (i).2).. so 1/J ~ ¢(1). L jlED Pijl · · · Pjn-lin 'ljJjn jnED E i (c(XO)lT>O + !(XdlT=l + C(Xdlr>l + ···+ C(Xn-1)lT>n-l + !(Xn)lT=n + 'ljJ(Xn)lT>n) = ¢i(n - 1) + IEi (1/J(Xn)lT~n) D as required. if l1/Ji I ~ M and Pi (T < 00) = 1 for all i. This is another proof of (ii)... { ¢(n + 1) = f Suppose that 1/J satisfies (4.2) then with equality if equality holds in (4. . Similarly and by induction. (iii) We shall show that if 1/J satisfies (4. then asn~oo so 1/J = limn~oo ¢(n) = ¢. by repeated substitution for 'ljJi 1/J on the right ~ Ci + L jEaD Pijli + LPijCj JED + ···+ 2: jlED . 1/J ~ ¢(n) for all n.2) and 1/J ~ 0 = ¢(O).

is the { -QcP = C cP=1 in D in aD. Assume that (Xt)t~O is minimal.T' n-l k==O Then lE(Mn+l I F n ) = L c(Xk)lk<T + f(XT )lT<n k==O + (P1jJ + c)(Xn )lT>n + I(Xn )lT==n :::. If the linear equations have a finite non-negative solution on J. Nevertheless. To do this we restrict our attention to the set of states J accessible from i. if finite. Theorem 4.2 Potential theory 141 It is illuminating to think of the calculation we have just done in terms of martingales.3) . which is a priori unknown.4. We have to state it slightly differently because when cP takes infinite values the equations (4.3.2. Consider n-l Mn = L c(Xk)lk<T + f(XT )IT<n + 'ljJ(Xn )ln::.2). Although the conclusion then appears to depend on the finiteness of cP. For continuous-time chains there is a result analogous to Theorem 4. since i leads to j.Mn with equality if equality holds in (4. (4. it still follows that with equality if equality holds in (4. we can still use the result to determine cPi in all cases.3) may involve subtraction of infinities.4. and that (Ci : i E D) and (Ii: i E aD) are non-negative.2). then cPj = 00 for some j E J. which forces cPi = 00. If not. Set where T is the hitting time of aD. We note that M n is not necessarily integrable.2. and therefore not make sense. Then minimal non-negative solution to cP = (cPi : i E I). then (cPj : j E J) is the minimal such solution.

..2. Then iT c(Xt)dt + !(XT)lT<oo = o L n<N c(Yn)Sn+l + !(YN)lN<oo where N is the first time (Yn)n~O hits aD.3. then (4. Since the finite solutions of (4. ¢ is therefore the minimal non-negative solution to ¢ { ¢ = IT¢ + c == f in D in aD. ¢i = lEi(L c(Y ) + !(YN)lN<oo).. Denote by (Yn)n>O and 8 1 . n n<N By Theorem 4. We have lE ( ) Sn+l I Yn c(Yn so. Further theory If ¢i = 00 for some i. Potentials with discounted costs may also be calculated by linear equations. 1) or rate A E (0. this proves the result. Theorem 4. and where we use the convention o x 00 = 0 on the right. and by IT the jump matrix. if IPi(T < 00) == 1 for all i. indeed the discounting actually makes the analysis easier.4) are exactly the finite solutions of (4. Set ¢i = lEi then ¢ n==O L anc(Xn ) 00 = (¢i :i E I) is the unique bounded solution to ¢ = aP¢+c.142 4. Suppose that (Ci : i E I) is bounded.4) which equations have at most one bounded solution if IPi(N < 00) = 1 for all i.82 . the jump chain and holding times of (Xt)t~O. then (4. by Fubini's theorem = J") = __ = Cj { 0 Cj / qj if Cj >0 if Cj = 0. (4. corresponding to an interest rate.5. and since N is finite whenever T is finite. Proof.3). . D It is natural in some economic applications to apply to future costs a discount factor a E (0. 00 ).3) has no finite non-negative solution. Moreover.2.3) has at most one bounded solution.

then M < 00. ¢ = so c+aP¢. Set (Xt)t~O is non-explosive. then 00 143 I<Pi I :::.a) so ¢ is bounded. aM. By the Markov property Then 00 <Pi = lEi L anc(Xn ) n==O = ci + a LPijlE jEI jEI (f n==l an-Ic(Xn ) Xl = j) = Ci + a LPij<Pj. On the other hand.. because it really corresponds to a version of the discrete-time result where the discount factor may depend on the current state.6. a LPijl'l/Jj jEI <pjl :::.¢ so = c + aP1/J. C for all i. (4.Q)¢ = c. D Hence M ::.2 Potential theory Proof. Suppose that <Pi = lEi then ¢ 1 00 e->. Suppose that ICil ::. aM. which forces M = 0 and 1/J = ¢. suppose that 1/J is bounded and also that 1/J Set M = sUPi l1/Ji . Assume that (Ci : i E I) is bounded.2.4. Theorem 4. = aP(1/J - ¢) I'l/Ji - <Pi I :::. which however lies a little deeper. But 1/J .¢il. CLan = n==O C/(l.tc(Xt)dt. We have a similar looking result for continuous time.5) . = (¢i : i E I) is the unique bounded solution to (A .

By Fubini's theorem <Pi Suppose Ci :::.so ¢ is bounded. What we describe is just the simplest case of a structure of great generality. = Ei l T c(Xt)dt. T qa = O. When C takes negative values we can apply the preceding argument to the potentials <P. ¢ is the unique bounded solution to -Q¢ = c. which is the same as (4. Hence. Then (Xt)t~O is a Markov chain on I U {a} with modified transition rates Qi = qi + -X.2. so 1jJ-¢ is the unique bounded solution for the case when c = 0. We shall now take a brief look at some structural aspects of the set of all potentials of a given Markov chain. D The point of view underlying the last four theorems was that we were interested in a given potential associated to a Markov chain. where ct = Ei 1 00 e->. and wished to calculate it. Also T is the hitting time of a. then (-X-Q)(1jJ-¢) = 0. and then at the role of the boundary. . subtracting (-X .tc±(Xt)dt = (±c) V o. C for all i. Then ¢ = ¢+ . if 1jJ is bounded and (-X-Q)1jJ = c.5). Finally. We have so.¢.4. then so ¢ is bounded. First we shall look at the Green matrix. Qia = -X. by Theorem 4.Q)¢ = c. Further theory Proof. Let T be an independent E(-X) random variable and define Xt = . and is finite with probability 1. which is O.{Xt a for t < for t ~ T.144 4. Assume for now that c is non-negative. Introduce a new state a with Ca = O.

1) and (4. The potential is defined by ¢i = lEi L c(X n==O 00 n) in discrete time. in discrete time . Indeed.5 and 2.11: we know that 9ij = 00 if and only if i leads to j and j is recurrent.2 Potential theory 145 Let us consider potentials with non-negative costs c. once we know the Green matrix. The Green matrix is also called the fundamental solution of the linear equations (4. and in continuous time Thus 9ij is the expected total time in j starting from i. in continuous time ¢ = Gc.4. The jth column (9ij : i E I) is itself a potential. we have explicit expressions for all potentials of the Markov chain.3). j E I) is the Green matrix 00 Similarly. These quantities have already appeared in our discussions of transience and recurrence in Sections 1. and without boundary. with G= 1 00 P(t)dt. and in continuous time By Fubini's theorem we have 00 00 ¢i = L n==O lEic(Xn ) = L(pnC)i = (GC)i n==O where G = (9ij : i. We have 9ij = lEi L n==O 00 lXn=j in discrete time. Thus.

and the boundary aD. .. Here are two examples. Unlike the Green matrix the resolvent is always finite.1)) and (R>. Our object now is to examine the relation between non-negative functions. Indeed.\ E (0. : . The formula for continuous time is hi /j is the return proba- For potentials with discounted costs the situation is similar: in discrete time <Pi where = lEi L anc(Xn ) = L anlEic(Xn ) = n=O n=O 00 00 (RaC)i and in continuous time where R>. for finite state-space we have Q : and We return to the general case. and bility for j. with boundary aD.t P(t)dt..00)) the resolvent of the Markov chain..146 4. harmonic in D. We call (R Q: E (0. Any bounded function (¢i : i E I) for which ¢ = P¢ in D is called harmonic in D. Further theory where is the probability of hitting j from i. = 1 00 e->.

Let hi denote the absorption probability for a.~ a .4. but with different boundary values. States a and b are absorbing. Thus we can find two non-negative functions ha and hb . By the method of Section 1... hg = o. starting from i...~ . In fact.. We set aD = {a. the most general non-negative harmonic function ¢ in D satisfies ¢=P¢ { ¢=f in D in aD where fa' fb 2:: 0. -- b Example 4.. . where each allowable transition is made with equal probability.7 Consider the random walk (Xn)n~O on the above graph. harmonic in D. . The linear equations for the vector h a read ha = Ph a { h~ in D = 1... b}..2. and this implies Thus the boundary points a and b give us extremal generators h a and h b of the set of all non-negative harmonic functions.3 we find 1/2 5/12) 1/2 1/3 1/2 0 where we have written the vector h a as a matrix. corresponding in an obvious way to the state-space.2 Potential theory 147 .

2.. 1.2.148 Example 4.. except that at o it jumps to -lor 1 with probability 1/2. ..8 4. . Further theory Consider the random walk (Xn)n~O on Z which jumps towards 0 with probability q and jumps away from 0 with probability p = 1 . cPi = qcPi+l + PcPi-l The first equation has general solution for i = 1.(q/p)i) which is non-negative provided A has general solution for i = 0.. In the preceding example the generators of C were in one-to-one correspondence with the points of the boundary . at speed p . .2..q. . In this example there is no boundary. for i = -1. .q.the possible places for the chain to end up.2 1 . -2. Let us consider the problem of determining for (Xn)n~O the set C of all non-negative harmonic functions cP. We must have cPi = PcPi+l + qcPi-l cPo = ~cPl + ~cP-l. the third equation cPi = A' + B' (1 . It follows that all non-negative harmonic functions have the form where f-. . . which is fully developed i~ other works. ~ +B O. f+ ~ 0 and where hi = h~i and +_ { ~ hi 1 + ~ (1 . is that the set of non-negative harmonic functions may be used to identify a . . 2 . . . . Similarly. . . . . 2.2. We choose p > q so that the walk is transient. For we have The suggestion of this example. -1. non-negative provided A' + B' ~ O. cPi = A + B(l . . To obtain a general harmonic function we must match the values cPo and satisfy cPo = (cPl + cP-l)/2.. .(q/p)-i) for i = 0.. but the generators of C still correspond to the two possibilities for the long-time behaviour of the chain. we can show that (Xn)n~O is equally likely to end up drifting to the left or to the right. 2..(q/p)-t for i = -1..1. In fact. starting from 0. This forces A = A' and B + B' = O.(q/p)i) 1( ") for i = 0.

Indeed. by monotone convergence <Pi = lEi (J(XT )) = L jE8D hJI»i(XT=j).2. Hence every bounded harmonic function is determined by its boundary values and. Amsterdam.6) { h8 = 1 in aD. Then.7.2.7) is the minimal solution.8. 1984).6) has a unique bounded solution then hf = JP>i(T < 00) = 1 for all i.2.3 when the situation is more like Example 4. which sometimes just consists of points in the state-space. Then by the methods of Section 1. See. if Pi (T < 00) = 1 for all i. then. Markov Chains by D. Set h? = JP>i(T < 00) where T is the hitting time of aD. Revuz (North-Holland. Hence if (4. We cannot begin to give the general theory corresponding to Example 4. but we can draw some general conclusions from Theorem 4. Suppose we have a Markov chain (Xn)n~O with absorbing boundary aD. and since (4. Suppose from now on that JP>i(T < 00) = 1 for all i. we showed more generally that this condition implies that ¢ = P¢ + c in D { ¢= f in aD has at most one bounded solution. Note that hf = 1 for all i always gives a possible solution. (4.3 we have h8 = Ph 8 in D (4. L jE8D where .3. with boundary values ¢i = Ii for i E aD. any bounded solution is given by (4. Conversely.7). indeed <P = !i hi .6) has a unique bounded solution. harmonic in D. but more generally corresponds to the varieties of possible limiting behaviour for X n as n ~ 00. for example. Let ¢ be a bounded non-negative function.2.4. as we showed in Theorem 4.2 Potential theory 149 generalized notion of boundary for Markov chains.

h.3 Let (Ci : i E I) be a non-negative function.7.2 Prove the fact claimed in Example 4. Can you find a similar reduction for continuous-time chains? 4.2. - < 0/'1.1 Consider a discrete-time Markov chain (Xn)n~O and the potential ¢ with costs (Ci : i E D) and boundary values (Ii: i E aD). where T is the hitting time of aD and a is a new state. Check that <Pi = E i L n<T c(Xn ) = Ei L c(X n=O 00 n) where T = T + 1 and where we set Ci = Ii on aD and Ca = O. . Show that (Xn)n~o is a Markov chain and determine its transition matrix. Consider now a new partition linear equations (Xn)n~O D u aD.2. Exercises 4.2. where D~ D. and that 11/" 0/'1. Partition I as D U aD and suppose that the linear equations ¢ = P¢ +c in D { ¢=0 in aD have a unique bounded solution. without boundary at all. the hitting probabilities for boundary states form a set of extremal generators for the set of all bounded non-negative harmonic functions. indeed. Further theory Just as in Example 4.8 that 4. Show that the Markov chain with transition matrix P is certain to hit aD. Set ifn if n ~ T > T. This shows that a general potential may always be considered as a potential with boundary value zero or.=0 in aD also have a unique bounded solution.2.2. . Show that the 'l/J = P'l/J + c in 15_ { 1f. for all i E I.150 4.

which determines the conductivity as the reciprocal of its resistance. and that a given current 9i enters the network at each node i ED. These have the effect that each node i E aD is held at a given potential Ii. each 'wire' contains a resistor. External connections are made at the nodes in aD and possibly at some of the nodes in D. The nodes are partitioned into two sets D and aD. current may also enter or leave the network through aD . Where no wire joins i to j we take aij = O. In equilibrium. but here it is not the current but the potential which is determined externally. each node i having a capacity 1ri > O. The first problem in electrical networks is to determine equilibrium flows and potentials. subject to given external conditions. The case where gi = 0 corresponds to a node with no external connection. There is a simple correspondence between electrical networks and reversible Markov chains in continuous-time. by Ohm's law.3 Electrical networks 4. which determines its potential ¢i by A current or flow of charge is any matrix (T'ij : i. Each node i holds a certain charge Xi. any equilibrium potential ¢ = (¢i : i E I) must satisfy for i E D LjEIaij(¢i . Some nodes are joined by wires.3 Electrical networks 151 An electrical network has a countable set I of nodes.¢j) = 9i. so T'i = 9i for i E D. . In practice. Therefore.8) { ¢i = fi' for i E aD. j E I) with T'ij = -T'ji. the wire between i and j having conductivity aij = aji 2:: o.4. j E I) we shall write 'Yi T'i for the total flow from i to = 2: JEI 'Yij' In equilibrium the charge at each node is constant. (4. given by for i =I j. Physically it is found that the current T'ij from i to j obeys Ohm's law: Thus charge flows from nodes of high potential to nodes of low potential. Given a flow the network: (T'ij : i.

pick one node k.9) where Ci = 9i/7ri. by Theorem 4. the equilibrium potential is given by ¢i = Ei (iT c(Xt)dt + f(XT )) L9i=O iEI (4. to keep matters simple here. and replace the condition ~k = 9k by ¢i = O. This is enough to ensure uniqueness of potentials. but now aD is non-empty. The capacities 7ri are the components of an invariant measure.2. The new problem is equivalent to the old. indicating the interesting possibility that there may be more than one equilibrium potential. However.10) where T is the hitting time of aD. Further theory We shall assume that the total conductivity at each node is finite: ai = Laij < 00. The equations for an equilibrium potential may now be written in a form familiar from the preceding section: -Q¢ { =c ¢= f in D in aD. the case where aD is empty may be dealt with as follows: we must have or there is no possibility of equilibrium. and the symmetry of aij corresponds to the detailed balance equations. (4. we shall assume that I is finite. We know that these equations may fail to have a unique solution. and that aD is non-empty. Then. set aD = {k}.4. A 1 2 2 1 B 2 2 1 c D E F . because ct and f have the same physical dimensions. In fact. j#i Then ai = 7riqi = -7riqii. It is natural that c appears here and not 9. that the network is connected.152 4.

the equilibrium potential is given by cPi = Ei (lxT=A) = JP>i(XT = A) where T is the hitting time of {A. cPE = 1. flows and charges in terms of the associated Markov chain. = 1/2 and cPe = 1/4. since the total current leaving Band C must vanish (cPB . Here is a general result expressing equilibrium potentials.cPA) + (cPB .cPe) 2(cPe .cPe. subject to XB = 0. but the same jump chain and hence the same hitting probabilities. = O.1 153 Determine the equilibrium current in the network shown on the preceding page when unit current enters at A and leaves at F.cPB) Hence. By symmetry. ~eF = 1/2. Then. cPB = 0.3. This will result in some flow from A to F. and the associated flow is given by ~AB = 1/2. and the associated Markov chain (Xt)t~o. The conductivities are shown on the diagram.10). Note that the node capacities do not affect the problem we considered. (c) The charge X associated with ~. Then there is an associated Markov chain and.3 Electrical networks Example 4. Let us arbitrarily assign to each node a capacity 1.cPB and cPD = 1. which we can scale to get a unit flow. Let us set cPA = 1 and cPF = O.cPF) + 2(cPe . ~Be = 1/2. (a) The unique equilibrium potential cP with cPA = 1 and cPB = 0 is given by where T A and TB are the hitting times of A and B. by Ohm's law.4. F}. In fact. Consider a finite network with external connections at two nodes A and B.2. Different node capacities result in different Markov chains. Theorem 4.3. according to (4. is given by . (b) The unique equilibrium flow ~ with ~A = 1 and ~B = -1 is given by where r ij is the number of times that (Xt)t~O jumps from i to j before hitting B. we were lucky .no scaling was necessary.cPE) + 2(cPB . ~BE = O.

Further theory Proof.f aijl." L-J ~n-'I.B} if i = B so if ~ij = EA(r ij - = r ji ) then ~ is a unit flow from A to B.n = LIPA(Y n n=O < NB)1rij.n< N} B '1. j E I) E(<p) = lL i.-. D ~ is the equilibrium unit flow and X the associated charge. So. as The interpretation of potential theory in terms of electrical networks makes it natural to consider notions of energy. n < NB) = i.jEI (<Pi . Observe that if X o = A then if i = A if i ~ {A.-. and consider the associated potential'l/Ji = 00 Then Xiqij so = XiQi 1rij = LIPA(Yn = i.10). .) _0. We shall prove (b) and (c) together. We define for a potential ¢ = (¢i : i E I) and a flow ~ = (~ij : i.<pj)2aij .~n+l=). where c = 0 and f l{A}.. required.154 4. by the lEA(rij) = LIPA(Y n=O 00 00 n = i. Yn +! = j. j I(r) = l L ." 0 n=O where N B is the hitting time of B for the jump chain Markov property of the jump chain (Yn)n~O. The formula for ¢ is a special case of (4. 00 We have roo = '""'1{.jEI i. Set Xi = lEA 10 fTB l{Xt=i}dt Xi/1ri.n < NB) 1rij = lEA(rij) n=O ('l/Ji -'l/Jj)aij = Xiqij - Xjqij = ~ij· Hence'l/J = ¢.

¢j)aij = L (Ei . The equilibrium potential and flow may be determined as follows.3.Ej)rij = 2 LEi'Yi = 0 i. we shall see that certain equilibrium potentials and flows determined by Ohm's law minimize these energy functions. by the network. where c = (ci : i E I) with Ci = 0 for i E aD.3. We can write any potential in the minimization problem in the form ¢ + c.3 Electrical networks The 1/2 means that each wire is counted once.jEI (a) Denote by ¢ = (¢i : i E I) and by ~ = (~ij : i. We have ~i = 0 for i E D.j E I) the equilibrium potential and flow. When ¢ and by Ohm's law we have E(¢) = ~ 155 are related l L (¢i i.j EI (Ei .4.j EI iEI so E(¢ + c) = E(¢) + E(c) ~ E(¢) with equality only if c = o. j E I) with current sources ri for i E D and boundary potential zero is the unique solution to minimize subject to Proof. Moreover. For any potential ¢ we have I (~ ) ~i = 9i for i E D. as heat. (a) The equilibrium potential ¢ = (¢i : i E I) with boundary values ¢i = Ii for i E 8D and no current sources in D is the unique solution to minimize E (¢ ) subject to ¢i = fi for i E aD.¢j)rij = 2 L iEI ¢i'Yi. = 9i (b) The equilibrium flow r = (rij : i. This characteristic of energy minimization can indeed replace Ohm's law as the fundamental physical principle.j E I) L (¢i . Then L i. i. Theorem 4.Ej)(¢i . = (¢i : i E I) and any flow ~ = (~ij : i.jEI ¢j)rij = I(r) and E( ¢) is found physically to give the rate of dissipation of energy. .

for all potentials I = (Ii : i E J). for all I. i. 2 1(8) = ~ L i.¢j )8ij = 2 L ¢i 8i = 0.Ji) aij' jEJ For a conductivity matrix a on J.J i.j E J) is an effective conductivity on J if.4.¢j)aij : 0 { L(¢i . a).jEI 'Yij8ijai/ = L (¢i . a) as for (I. Then cP is the unique solution to minimize subject to E (¢ ) ¢ = f in aD.3. Then L i. for a potential I 8 = (8ij : i.j E I) is a flow with 8i = 0 for i E D. Then a is an effective conductivity if. Let J ~ I. for i E J we have L. the external currents into J when J is held at potential I are the same for (J. In fact.Ii for 'I. We can write any flow in the minimization problem in the form ~ + 8.jEI(¢i .jEJ . Corollary 4.¢j)aij JEI = :E(Ji . where 8 = (8 ij : i. The following reformulation of part (a) of the preceding result states that harmonic functions minimize energy. We have cPi = 0 for i E aD. j E I) the equilibrium potential and flow. j E J) we set - = (Ii : i E J) and a flow E(/) = 1 " (Ii 2 L. Further theory (b) Denote by cP = (cPi : i E I) and by ~ = (~ij : i. equivalent in its response to external flows and potentials.Ij) aij. given any network.jEJ 8fj u i?· . We know that I determines an equilibrium potential ¢ = (cPi : i E I) by for ~ ~ J cPi . there is always another network of wires joining the externally connected nodes alone.jEI iEI I(~) so I(~ + 8) = D + 1(8) ~ 1(8) with equality only if 8 = o.156 4. Suppose that ¢ = (¢i : i E I) satisfies Q¢ = 0 in D { cP = I in aD. An important feature of electrical networks is that networks with a small number of external connections look like networks with a small number of nodes altogether. We say that a = (aij : i. E J.

There is a unique effective conductivity a given by 157 aij where for each j E J.3 Electrical networks Theorem 4. for i E J E (¢ ) ¢i = Ii for i E J.J { Moreover. q) = aij + L aik<P{ k~J = (¢~ : i E I) is the potential defined by = 0 ~kEI(<p1 . ¢ solves minimize subject to We have.3.5. Given I = (Ii: i E J).4. by Corollary 4.3.<p{)ai~ for i tf.4. o off J Proof. J for i E J. and also by the Thompson variational principle inf Oi==gi on J1(8) = "Yi== inf {J gi on I(~). Z:=aij<Pj = z:=aijfj + z:=z:=aik<p{fj = z:=aijfjo JEI jEJ k~JjEJ jEJ In particular. (4.11) a is characterized by the Dirichlet variational principle <Pi==fi on J E(/) = inf E(¢). taking I == 1 we obtain Laij JEI = Laijo jEJ . define ¢ = (¢i : i E I) by <Pi = L 1i<p1 jEJ then ¢ is the equilibrium potential given by ~jEI aij(¢i { ¢j) =0 ¢i = Ii for i fj. ¢~ = 8ij for i E J. and.

jEI = = L (cPi . Define the time spent in J At = it l{X s EJ}ds and a time-changed process (X t)t~O by X t = Xr(t) where r(t) = inf{s ~ 0 : As > t}.jEJ i. Further theory Hence we have equality of external currents: 2)cPi JEI cPj )aij = 'r)1i . then L "Yljai/ i.j E J. for any flow 8 and bi = 0 for i ¢.158 4.cPj)2 aij i. we also have equality of energies: L (cPi .jEI L (Ii . by Theorem 4. J.Ii )2 Uij = L gfjUi/. we have = (8ij : i.j E I) with 8i = gi for i E J i.3.jEI = 2L iEI cPi L(cPi .jEI L ~2 -1 v··a·· ~J ~J > L i.Ii )Uij · jEJ Moreover. We obtain (Xt)t~O by observing (Xt)t~O whilst in J. with jump matrix IT given by for i. iEJ jEJ i.¢j )aij. By applying the strong Markov property to the jump chain we find that (X t)t~O is itself a Markov chain.jEJ so.3.jEJ Finally. if gij = (Ii - Ij )aij and ~ij = (¢i .jEJ 2 --1 g··a·· • ~J ~J o Effective conductivity is also related to the associated Markov chain (Xt)t~O in an interesting way. 1'fij = 'lrij + L'lrik¢k k~J . and stopping the clock whilst (Xt)t~O makes excursions outside J. This is really a transformation of the jump chain.cPj)2aij i. i.cPj)aij JEI = 2 L Ii LUi -!i)Uij = L Ui _!i)2Uij .

4. L. methods coming from one theory one provide insights into the other. and is also called the Wiener process. There is much more that one can say. it makes it reasonable to . Snell (earus Mathematical Monographs 22. Also.4 Brownian motion Imagine a symmetric ran'dom walk in Euclidean space which takes infinitesimal jumps with infinite frequency and you will have some idea of Brownian motion. 1984).4. Hence has Q-matrix given by Qij = qij (Xt)t~O + L qik¢{ k~J Since ¢-i = (¢{ : k E I) is the unique solution to (4. It is named after a botanist who observed such a motion when looking at pollen grains under a microscope. this shows that so (Xt)t~O is the Markov chain on J associated with the effective conductivitya. Moreover. See Example 1. for example in tying up the nonequilibrium behaviour of Markov chains and electrical networks. you should see Random Walks and Electrical Networks by P. The mathematical object now called Brownian motion was actually discovered by Wiener. For an entertaining and illuminating account of the subject. The simple symmetric random walk (Xn)n~O on Zd is a Markov chain which is by now quite familiar.4. This provides an elementary way of thinking about Brownian motion.4 Brownian motion where 159 ¢{ = JP>k(XT = j) and T denotes the hitting time of J. 4. Doyle and J. G. Mathematical Association of America. A discrete approximation to Euclidean space ]Rd is provided by where c is a large positive number.11). We shall show that the scaled-down and speeded-up process X t(c) - c -1/2Xct is a good approximation to Brownian motion.

. For ct E Z+ we have xi so the square-root scaling gives which is independent of c.4. Then.:::o is said to be continuous if lP({w : t. we will at least want IE[lxi ) 12 ] to converge to a non-degenerate limit. and then show that this is not an empty definition. Theorem 4.:::o is called a Brownian motion if B o = 0 and for all 0 = to < tl < . The fundamental role of Gaussian distributions in probability derives from the following result. A real-valued random variable is said to have Gaussian distribution with mean 0 and variance t if it has density function cPt(X) = (27T"t)-1/2 exp{ _x 2/2t}. Brownian motions exist. for all bounded continuous functions f. X 2 . that is to say. At the end of this section we state some results which confirm that this is true to a remarkable extent. Let Xl..-+ Xt(w) is continuous}) = 1. Why is space rescaled by the square-root of the time-scaling? Well. ••• be a sequence of independent and identically distributed real-valued random variables with mean 0 and variance t E (0. We begin by defining Brownian motion.00). if c ) converges in some sense as c ~ 00 to a non-degenerate we hope that c limit.1 (Central limit theorem). < t n the increments .160 4. A continuous real-valued process (Bt)t. Further theory suppose that some properties of the random walk carryover to Brownian motion. A real-valued process (Xt)t. There are many introductory texts on probability which give the full details. as n ~ 00 we have We shall take this result and a few other standard properties of the Gaussian distribution for granted in this section..

We shall show how to extend this process successively to Brownian motions (Bt : tEDN ) indexed by D N .1 ). B s .. We need the following result. B t = ~(Br We obtain two new increments: + B s) + Zt· Br ) Bt .Bt .Br = ~ (B s = ~(Bs - + Zt.. which is a measure on the set of continuous paths.00). Then (B t : tED) is a Brownian motion indexed by D. and finally check that the extension is a Brownian motion.4. for each tED.N in [0.2. and denote by D the union of these sets.00). Proof. To put it properly. denote by D N the set of integer multiples of 2.1 set r = t . . an independent Gaussian random variable yt of mean 0 and variance 1. is uniquely determined...1 and define Zt = 2-(N+l)/2yt. .Zt. . t E D N . the law of Brownian motion.1. However. For t E Do = Z+ set then (B t : t E Do) is a Brownian motion indexed by Do. it is not obvious that there is any such process. Theorem 4.4. < t n in D N the increments are independent Gaussian random variables of mean 0 and variance We suppose given.N so that s. For N = 0.2.4 Brownian motion are independent Gaussian random variables of mean 0 and variance 161 The conditions made on (Bt)t~o are enough to determine all the probabilities associated with the process.1 • For t E DN\D N . Suppose we have constructed (Bt : t E D N .2 (Wiener's theorem). Let us say that (B t : t E D N ) is a Brownian motion indexed by D N if B o = 0 and for all o = to < tl < . Brownian motion exists. Next we shall show that (Bt : tED) extends continuously to t E [0.N S = t + 2. a Brownian motion indexed by D N .B r ) .

~2-(N-I) The two new increments.1 IP(2(N+l)/2 M N > 'x)d'x ~ 2N .162 We compute E[(Bt .Xp.B t )] = = + 2-(N+I) = 2.I .Xp-1IP(IY11 > 'x)d'x = 2N . by our construction we have with yt Gaussian of mean 0 and variance 1. Moreover. being Gaussian.B r )2] 4.I ) n [0. s). we obtain a Brownian motion (B t : tED). For each N denote by (BiN))t~O the continuous process obtained by linear interpolation from (Bt : t E D N ). Hence. ~2-(N-I) . For t E DN\DN . tE[O.B r and yt.I we have zi ) = O. N For t E D N .1 1 00 p. we obtain MN = sup 2-(N+I)/2Iytl.l] There are 2 N - I points in (DN\D N .1E(IY1IP) . being constructed from B s . since (ZiN))t~O interpolates linearly between its values on D N .1]. Hence (Bt : tEDN) is a Brownian motion indexed by D N. set zi N ) = BiN) .N . tE(DN \DN -l)n[o. Also. they are certainly independent of increments over intervals disjoint from (r.I] Then. are therefore independent and of the required variance.2-(N+I) = o. by induction. So for A > 0 we have For a random variable X ~ 0 and p > 0 we have the formula Hence 2P(N+l)/2E(M~) = 1 00 p.BiN-I). Further theory = E[(B s - - B t )2] E[(Bt Br)(B s . Set MN = sup IziN)I. as required.

3..?:o we can let m distribution for the increments to obtain the desired D Having shown that Brownian motion exists.4.. we now want to show how it appears as a universal scaling limit of random walks. L 00 E(MKr)l/P :::.· . < t~ for all m and tf: ~ tk for all k. < t m .4. Now BiN) eventually equals B t for any tED and the uniform limit of continuous functions is continuous..c -1/2Xct where the value of X ct when ct is not an integer is found by linear interpolation.?:o. Set to = to = o.aBtm )] where (Bt). It remains to show that the increments of (Bt)t. But given 0 < t 1 < . · · · ...1]. For c > 0 consider the rescaled process X t(c) .00). + zi N) converges uniformly in t E [0. with probability 1. (B t : tED) has a continuous extension (Bt)t. using continuity of (Bt)t... Theorem 4. for any p > 2 00 00 163 ELMn= LE(MN ) N=O N=O : :.··· .t.t(f.?:o is a Brownian motion.Xi~)] -tE[f(aBh as c ~ 00. .t~ - t~-l· ~ 00 Hence. .4 Brownian motion and hence. E(IY1IP)1/p ~ 00 L 00 (2(p-2)/2 p )-N < 00. Therefore. and by a similar argument uniformly for t in any bounded interval. we have - E[f(Xi~). as N N=O BiN) = Bi O + zi 1 ) ) + . .?:o have the required joint distribution.. as claimed. < t n we can find sequences (t''k)mEN in D such that 0 < t 1 < .. for all bounded continuous functions f : jRm ~ jR and all 0 ~ tl < .?:O be a discrete-time real-valued random walk with steps of mean 0 and variance 0'2 E (0. N=O It follows that. Let (Xn)n. for all m. We know that the increments are Gaussian of mean 0 and variance tr . very/much as the Gaussian distribution does for sums of independent random variables. Then.

L where Y n denotes the nth step of (Xn)n~O. Given d independent Brownian motions (Bi )t~o. D To summarize the last two results. let us consider the JRd-valued process B t = (Bi. Note the similarity to the definition of a Poisson process as a rightcontinuous integer-valued process (Xt)t~O starting from 0.i~e) replacing xi ). .X-(e) tk tk-l' for k Z k = a (B tk .Bt). There now follows a series of related remarks. In the proof we shall take for granted some basic properties of weak convergence.. having stationary independent increments and such that X t is Poisson of parameter At for each t ~ O.[etk-I]. . . . m.164 4..Zm). Since .. + YN(e)) converges weakly to (tk . . .ide) = Bo = 0 it suffices to show that U~c) = 12 C.U~)) converges weakly to (ZI. First define -(e) Xt - e -I/2X[et] where (et] denotes the integer part of ct.(Bt)t~o.aB trn ) as c ~ 00.t (tk . we established a sort of convergence to Brownian motion.. . That was Wiener's theorem. The right side converges weakly e to 0.. Then. Consider now the increments (e) Uk .. using the central limit theorem applied to the increments of a rescaled random walk. . . . We call (Bt)t~o a Brownian ...tk_l)1/2.1 / 2 Nk(C)1/2) . Then \(xt(e) 1 ...... as required.. so it suffices to prove the claim with . · .. e -I/2\('V:[et 1 ] +1. But e (Ui ). it suffices to show that U~e) converges weakly to Zk for each k..(X-(e) . By the central limit theorem Nk(e)-I/2(YI + . .1/ 2Nk(C)1/2)Nk(C)-1/2(Yi + .tk_l)-1/2 Zk.. . . we have shown. .. using special properties of the Gaussian distribution. X-(e))\ :::. ...X-(e) .B tk-l ) increments are independent./ [etk] L Yn rv (C. . Further theory Proof The claim is that (xi~). that there is a continuous process (Bt)t~o with stationary independent increments and such that B t is Gaussian of mean 0 and variance t. X(e)) .L [et n] +1 'V:)\ t rn t1 t rn . Hence U~c) converges weakly to Zk.. + YN(c») n=[etk_-l]+1 where rv denotes identity of distribution and Nk(e) = [etk] . . . and (C. Then since both sets of = 1.. . xi~) converges weakly to (a Btl. . for each t ~ o. .

what we have proved is strong enough to ensure that (Xie))t~o does not converge. which attracts all other finite variance symmetric random walks as c ~ 00.3. the result shows that in the scaling limit they behave asymptotically the same. to anything else. By analogy with continuous-time Markov chains we look for a transition semigroup (Pt)t~O .BO)t~o is a Brownian motion (starting from 0). in which case V = I. then for all bounded continuous functions as c ~ 00 we have f : (]Rd)m ~ ]R. Here are two examples. The scaling used in Theorem 4.4. and one can with effort prove stronger forms of convergence.3 suggests the following scaling invariance property of Brownian motion (Bt)t>o. once the difference in variance is taken out. For any c > 0 the proces. More generally. which is also easy to check from the definition. it is useful to know that on large scales all one needs to calculate is the variance (or covariance matrix). Thus Brownian motion appears as a fixed point of the scaling transformation.4 Brownian motion motion in ]Rd.4. All other aspects of the step distribution become irrelevant as c ~ 00. we might take the components of (Xn)n~O to be three independent simple symmetric random walks in Z. (Bic)k. then V = ~I. As a limit of Markov chains it is natural to look in Brownian motion for the structure of a Markov process. but rather about processes with independent increments.4. in the same sense. The discussion to this point has not really been about the Markov property. Although these are different random walks.~o defined by B t(e) - C -1/2Bet is a Brownian motion. To remedy this we must first define Brownian motion starting from x: this is simply any process (Bt)t~o such that B o = x and (B t . We might take (Xn)n~O to be the simple symmetric random walk in Z3. given a random walk with a complicated step distribution. The sense in which we have shown that (Xie))t~o converges to Brownian motion is very weak. with no essential change in the proof. 165 There is a multidimensional version of the central limit theorem which leads to a multidimensional version of Theorem 4. However. Thus if (Xn)n~O is a random walk in ]Rd with steps of mean 0 and covariance matrix and if V is finite. Alternatively.

y)f(y)dy = lEx [f(Bt )]. x. Further theory and a generator G. x. dYd JRd = ( p(t. y)f(y)dy JRd = ( JRd p(t. x. by analogy with continuous-time chains. y) a2 ~x = Hence.166 4. x. JRd To check the semigroup property PsPt = Ps+t we note that Ex[f(B s+t )] = Ex [f(B s + (B s+t . x. y)f(y)dy JRd where p(t. that the generator. y) where = !~xp(t. This is the transition density for Brownian motion and the transition semigroup is given by (Ptf)(x) = ( p(t. y) = (27rt)-d/2 exp{ -Iy . For t > 0 it is easy to check that :t p(t. we have 8 at (Ptf)(x) = { !~xp(t.B s ))] = Ex [Ptf(B s )] = (PsPtf)(x) where we first took the expectation over the independent increment B s +t . y)f(y)dy = { !~yp(t. . should be given by G=!~. if aXl2 + · · · + ax d 2· JRd a2 f has two bounded derivatives. x. x..xI 2/2t}. x. This suggests. . a term we have not defined precisely. y)( !~J)(y)dy = Ex[(~~f)(Bt)] ~ ~~f(x) as t ! O. For any bounded measurable function have f : ]Rd ~ ]R we lEx[J(Bt )] = lEo[J(x + B t )] = { f(x + Y)¢t(Yd··· ¢t(Yd)dYl .B s .

t == 0. Chichester.I / 2Xl) .4 Brownian motion 167 Where formerly we considered vectors (fi : i E I). f ((N))] Xo NIE N l/2 x [f(N. the rescaled process N xi ) == N. the main ideas are very similar.Jf .I / 2 ) .I / 271 d • The closest thing we have to a derivative in t at 0 for (pt(N))t=O. For further details see. 2nd edition 1994).. E N. Roger. such as measurability and differentiability. required to have various degrees of local regularity. These are various sorts of local regularity for functions defined on the state-space jRd.4.I / 2 ) so - 2f(x) + f(x + N. . For a bounded continuous function f : jRd ~ jR. 2/N. now we have linear operators on functions: Pt is an integral operator.NIE x [f (X I(N) ) (x _ == If we assume that as N ~ 00. Markov Processes and Martingales. But. such as measurable. First. is (N) /N N ( PI/Nf . They did not appear for Markov chains because a discrete state-space has no local structure.f(N. G is a differential operator. Denote by (Xn)n~O the simple symmetric random walk in tl d and consider for N == 1. here is a result on recurrence and transience. Where formerly we considered matrices Pt and Q. f has two bounded derivatives then. You will notice some weasel words creeping in.f ) ) . . this aside. -t N(pi. Probability Theory .I/N.2..2/N. G.I / 2 ) == N-I(~f(x) + o(N)).f)(x) ~~f(x).I / 2X O)] - = (N/2){f(x . .I / 2 )}. Volume 1: Foundations by L. and a proper measure-theoretic basis for the probability...I / 2 X Nt . for example. relative to the corresponding results for chains. W. f(x . set X (pt(N) f)(x) == IEx[f(Xi N ))]. You might correctly guess that the proofs would require additional real analysis. We would like to explain the appearance of the Laplacian ~ by reference to the random walk approximation. by Taylor's theorem. C.N.. Stroock (Cambridge University Press.an analytic view by D.. . 1993). or Diffusions. continuous and differentiable. We finish by stating some results about Brownian motion which emphasise how much of the structure of Markov chains carries over. now there are functions f : jRd ~ jR.N.2f(x) + f(x + N. liN. .s and David Williams (Wiley.

Further theory jRd. Let (Bt)t~o be a Brownian motion in (i) If d = 1. The invariant measure for Brownian motion is Lebesgue measure dx. then JP>(Bt = 0 for some t but. So that we can state some results for the positive recurrent case. Let (Bt)t>o be a Brownian motion in f : jRd ~ jR be a continuous periodic function. So the projected process is positive recurrent and we can expect convergence to equilibrium and ergodic results corresponding to Theorems 1. = 2. f(x) for all z E Zd. If we accept this latter notion of recurrence the correspondence extends to dimension two.6. for any € > 0) = 0 >0 JP>( {t ~ 0 : IBt I < €} is unbounded) = 1. that in Z and Z2 the simple symmetric random walk is recurrent. moreover 1= [ J[O. This has infinite total mass so in dimensions one and two Brownian motion is only null recurrent.l]d f(z)dz JI»x (~ it f(Bs)ds -t 1 ast -t 00) = 1. whereas in Z3 it is transient. Theorem 4. we shall consider Brownian motion in jRd projected onto the torus T d = jRd /Zd.4. then for any N < 00 00 00) JP>(I B t I~ = 1.10. + z) = 00. The generator ~ of Brownian motion in jRd reappears as it should in the following martingale characterization of Brownian motion. so that jRd and let f(x Then for all x E jRd. Theorem 4.5.4.2. ! . The results correspond exactly in dimensions one and three. It is natural to compare this result with the facts proved in Section 1. In dimension two we see the fact that for continuous state-space it makes a difference to demand returns to a point or to arbitrarily small neighbourhoods of a point.168 4. The invariant measure remains Lebesgue measure but this now has total mass 1. as t ~ (iii) If d = 3. In dimension one this just means wrapping the line round a circle of circumference 1. then JP>({t ~ 0: B t (ii) If d = O} is unbounded) = 1.4.3 and 1.8. as t ~ we have -t lEx[f(Bt )] and.

1. These potentials are identical to those appearing in Newton's theory of gravity. Write (Ft )t2::o for the filtration of (Xt)t~o. Then (i) ¢ if finite belongs to C 2 (D) n C(D) and satisfies in D in aD. a continuous time process (Mt )t2::o is a martingale if it is adapted to the given filtration (Ft )t2::o. and whenever s :::.2. if JEIMtl < 00 for all t. then 'l/J ~ ¢. the following process is a martingale: This result obviously corresponds to Theorem 4. . (iv) if JPx(T < 00) = 1 for all x.12) (ii) if 1/J E C 2 (D) n C(D) and satisfies { -~~1/J ~ c in D in aD 1/J ~ f and 'l/J ~ 0.12) has at most one bounded solution in C 2 (D) n C(D). (4.2. Then the following are equivalent: (i) (Xt )t2::o is a Brownian motion. corresponding very closely to Theorem 4. then (4.4. Let D be an open set in ]Rd with smooth boundary aD.4.00) be continuous.4 Brownian motion 169 Theorem 4. Let (Xt)t~O be a continuous ]Rd-valued random process. (iii) if ¢(x) = 00 for some x. as we remarked in Section 4.7. In case you are unsure.6. Theorem 4. Let c : D ~ [0. (ii) for all bounded functions f which are twice differentiable with bounded second derivative. 00) be measurable and let f : aD ~ [0.4. t and A E F 8 • We end with a result on the potentials associated with Brownian motion.3 for Markov chains. Set <jJ(x) = JEx [I T c(Bt)dt + f(XT )IT<oo] where T is the hitting time of aD. then (4.12) has no finite solution.2.

Markov decision processes and Markov chain Monte Carlo. See Example 1. for example. that every state is either recurrent or transient. Once a Markov chain is identified.5 (virus).1. Example 1.for hitting probabilities and expected rewards.1 Markov chains in biology Randomness is often an appropriate model for systems of high complexity. In a real-world problem involving random processes you should always look for Markov chains.3.we know. We have already illustrated some aspects of the theory by simple models with a biological interpretation. We are now going to give . and for long-run behaviour via invariant distributions.1 (bacteria).5. Some have already appeared to illustrate the theory. from calculating the fair price for a random reward to calculating the probability that an absent-minded professor is caught without an umbrella. Exercise 1. queueing models. there is a qualitative theory which limits the sorts of behaviour that can occur . 5. such as are often found in biology. In this chapter we shall look at five areas of application in detail: biological models. from games of chance to the evolution of populations. References to books for further reading are given in each section.5 Applications Applications of Markov chains arise in many different areas.1. In each case our aim is to provide an introduction rather than a systematic account or survey of the field. resource management models. There are also good computational methods . They are often easy to spot.6 (octopus).4 (birthand-death chain) and Exercise 2.

and so on.5. epidemics and genetic inheritance. } with absorbing state o.1 Markov chains in biology 171 some more examples where Markov chains have been used to model biological processes. We deduce that with probability 1 either X n = 0 for some n or X n ~ 00 as n ~ 00. by setting X o = 1 and defining inductively. Further information on (Xn)n~O is obtained by exploiting "the branching structure. + N Xn _ • 1 Then X n gives the size of the population in the nth generation. Nevertheless. Suppose at time n = 0 there is one individual. The case where P(N = 1) = 1 is trivial so we exclude it.. having the same distribution as N. If P(N = 0) = 0 then P(N 2: 2) > 0. The basic branching process model has many applications to problems of population growth. . Suppose. ..1 (Branching processes) The original branching process was considered by Galton and Watson in the 1870s while seeking a quantitative explanation for the phenomenon of the disappearance of family names. each with the same distribution as N. next. that these offspring also die and are themselves replaced at time n = 2. even in a growing population. The process (Xn)n~O is a Markov chain on I = {O. 1. We can construct the process by taking for each n E N a sequence of independent random variables (N. Under the assumption that each male in a given family had a probability Pk of having k sons. they wished to determine the probability that after n generations an individual had no male descendents. in the study of population growth.:)kEN. Example 5. who dies and is replaced at time n = 1 by a random number of offspring N. and j does not lead to i. hence i is transient in any case. so for i 2: 1. by a random number of further offspring. It should be recognised from the start that these models are simplified and somewhat stylized in order to make them mathematically tractable. each independently. i leads to j for some j > i. for n 2: 1 X n = Nf + . by providing quantitative understanding of various phenomena they can provide a useful contribution to science. and also to the study of chain reactions in chemistry and nuclear fission... Consider the probability generating function </J(t) = E(t N ) = :E tkJP(N = k==O 00 k). and every state i 2: 1 is transient. The solution to this problem is explained below.2.1. We have P(Xn = 0 I X n - 1 = i) = P(N = O)i so if P(N = 0) > 0 then i leads to 0.

this gives the entire distribution of X n . We have shown that the population survives with positive probability if and only if J-l > 1. In principle. til t til t til where J-l = E(N).00 = lim ¢( ¢(n) (0)) n --+. by induction. Applications t ~ 1.00 = ¢( q) so also q 2:: r. we have q = P(Xn = 0 for some n) = lim ¢(n)(o). though ¢(n) may be a rather complicated function. 0 ¢. n--+-oo Now ¢(t) is a convex function with ¢(1) = 1. Since ¢ is increasing and 0 ~ r.1) we must have q = 1.1 ) . so if X n = k then it takes k steps to obtain X n + l . where ¢(n) is the n-fold composition ¢ 0 . . . Some quantities are easily deduced: we have E(Xn ) = lim dd lE(t Xn ) = lim dd </>(n)(t) = (lim </>'(t)r = p. Conditional on X n - l = k we have Xn so =Nf+···+Nr and so 00 lE(t Xn ) = 'LlE(t Xn I X n k==O 1 = k)JP>(Xn . Let us set r = inf{t E [0. since 0 is absorbing. hence q ~ r. we find that E(t Xn ) = ¢(n)(t).n. Suppose that in each generation we replace individuals by their offspring one at a time. . There is a nice connection between branching processes and random walks. On the other hand q = lim ¢(n+l) (0) n --+. and if ¢'(1) ~ 1 then since either ¢" = 0 or ¢" > 0 everywhere in [0. where J-l is the mean of the offspring distribution. Hence. by induction.1] : ¢(t) = t}. ¢(n)(o) ~ r for all n.172 defined for 0 ~ 5. we have ¢(O) ~ r and. Hence q = r. If ¢'(1) > 1 then we must have q < 1. also so. then ¢(r) = r by continuity.1 = k) = lE(</>(t)xn .

but it is the simplest mathematical model to incorporate the basic features of an epidemic.1 Markov chains in biology 173 The population size then performs a random walk (Ym)m~O with step distribution N .5. as infectives either die or recover and are then resistant to further infection.1. so we deduce that ql is the smallest non-negative root of the equation q = ¢(q). whether infected or not. (Xn)n>O hits 0 if and only if (Ym)m>O hits O. The classic work in this area is The Theory of Branching Processes by T. In an idealized population we might suppose that all pairs of individuals make contact randomly and independently at a common rate. These two possibilities have identical consequences for the progress of the epidemic.2 (Epidemics) Many infectious diseases persist at a low intensity in a population for long periods. after which they either die or recover. However. for n 2: 0 Observe that X n = YTn for all n. individuals themselves become infective and remain so for an exponential random time. E. and since (Ym)m~O jumps down by at most 1 each time. Moreover we can use the strong Markov property and a variation of the argument of Example 1. Define stopping times To = 0 and. . This behaviour is to some extent explained by the observation that the presence of a large number of infected individuals increases the risk to the rest of the population. if qi then qi = P(Ym = 0 for some m I Yo = i) = qf for all i and so ql = P(N = 0) + L qfP(N = i) = ¢(ql)' k==l 00 Now each non-negative solution of this equation provides a non-negative solution of the hitting probability equations.4. Occasionally a large number of cases arise together and form an epidemic. in agreement with the generating function approach.3 to see that. The decline of an epidemic can also be explained by the eventual decline in the number of individuals susceptible to infection. Harris (Dover. Example 5. New York. For an idealized disease we might suppose th~t on contact with an infective.1. these naive explanations leave unanswered many quantitative questions that are important in predicting the behaviour of epidemics. 1989). This idealized model is obviously unrealistic.

Consider now a sequence of models as N ~ 00 and choose s~ ~ So and i~ ~ i o. X t = (St.~) = '" . Here convergence means that E[I( sf'.10 = 1. Consider the case where So = N -1. if') . Of greater concern is the behaviour of an epidemic in a large population.174 5.O) for s E Z+ are all absorbing and all the other states are transient. The states (s.i+l) = Asi. N . rather. This has the following interpretation: a rumour is begun by a single individual who tells it to everyone she meets. ~ -=~ 1) = +-. 00 ). In the idealized model. and the absorption probabilities give the distribution of the number of susceptibles who escape infection. it) of the differential equations (d/dt)st (d/dt)i t = -vstit = vStit . but will give an example of another easier asymptotic calculation.i)(s. /-l E (0.x = l/N and /-l = O. How long does it take until everyone knows the rumour? If i people know the rumour.x = v/N. Let us consider the proportions sf = St/N and if' = It/N and suppose that . It can be shown that as N ~ 00 the process (sf'. if') converges to the solution (St. Applications We denote the number of susceptibles by St and the number of infectives by It. This is not a limit as above but. .~ N-l i=l 2 ' " -=~ ~ 1 rv 210gN as N ~ 00.(St.i)(s-i. q(s.x. ~ ~ . of size N. and the rate at which the rumour is passed on is qi = i(N .i-l) = /-li for some . We assume that each individual meets another randomly at the jump times of a Poisson process of rate 1. indeed all the communicating classes are singletons. where v is independent of N. The expected time until everyone knows the rumour is then N-l i=l 1 N-l i=l N N-l(l i=l '" qi = '" ~(N . say. We can calculate these probabilities explicitly when So + 10 is small. The epidemic must therefore eventually die out. It) performs a Markov chain on (Z+)2 with transition rates q(s. it) I] ~ o for all t 2:: o. then N ./-lit starting from (so. we effectively have a finite state-space. The fact that the expected time grows with N is related to the fact . We will not prove this result. i o).i)/N. Since St + It does not increase. they in turn pass the rumour on to everyone they meet.i do not. an asymptotic equivalence.

The final two examples come from population genetics. or the choice of parents' alleles retained by their offspring.1 Markov chains in biology 175 that we do not scale /0 with N: when the rumour is known by very few or by almost all. (The word gene refers to a particular chromosomal locus. then (Xn)n~O is a Markov chain with the above transition probabilities. get aa aA Aa AA AA. .m} with transition probabilities In each generation there are m alleles. the varieties of genetic material that can be present at such a locus are known as alleles. Let us take m to be even with m = 2k. for example. The types of alleles in generation n+ 1 are found by choosing randomly (with replacement) from the types in generation n. all independent.) This sort of study was motivated in the first place by a desire to find mathematical models of natural selection.1. then each gene in generation n + 1 is A with probability 7/10 and a with probability 3/10. We suppose that each individual has two genes. Berlin. If X n denotes the number of alleles of type A in generation n. We might.5. and in particular make no requirement that parents be of opposite sexes. The randomness here might derive from the choice of reproducing individual. and thereby to discriminate between various competing accounts of the process of evolution. as scientists have gained access to the genetic material itself. 1979). for which we refer the interested reader to Mathematical Population Genetics by W.. Example 5. This can be viewed as a model of inheritance for a particular gene with two alleles A and a. They represent an attempt to understand quantitatively the consequences of randomness in genetic inheritance. 1. in sexual reproduction the choice of partner.3 (Wright-Fisher model) This is the discrete-time Markov chain on {O. the proportion of 'infectives' changes very slowly. some of type A and some of type a. Aa and aa. Then if the generation n is. so the possibilities are AA. many more questions of a statistical nature have arisen. for example. Suppose that individuals in the next generation are obtained by mating randomly chosen individuals from the current generation and that offspring inherit one allele from each parent. We emphasise that we present only the very simplest examples in a rich theory. Ewens (Springer. More recently. AA aA AA AA aa. J. .. We have to allow that both parents may be the same. .

1). genetic diversity eventually disappears. Firstly.p) log(l .. Then the probability of choosing A when X n = i is given by ¢i = {i(l . m}.u) and + (m - i)v}/m . ~ > 0 respectively.m . alternatively hi = LPijh j . The communicating classes of (Xn)n~O are {O}. . ~- + (lj2)(3i(m .. Aa. it may be that the three genetic types AA. Applications (Xn)n~O. States 0 and m are absorbing and {I. Suppose A mutates to a with probability u and a mutates to A with probability v.m . The hitting probabilities for state m (pure AA) are given by This is obvious when one notes that one can check that (Xn)n~O m is a martingale. for p E (0. (3. {I. as m ~ 00 lEpm(T) ~ -2m{(1 . known.i)jm)2 a(ijm)2 and the transition probabilities are Secondly. aa have a relative selective advantage given by lX. Some modifications are possible which model other aspects of genetic theory..1} is transient.176 5. The structure of pairs of genes is irrelevant to the Markov chain which simply counts the number of alleles of type A.I}. so in a large population diversity does not disappear quickly. .. we may allow genes to mutate.i)jm 2 + . j==o According to this model.i)jm 2 a(ijm)2 + (3i(m . that. .p) It is + plogp} where T is the hitting time of {O. This means that the probability of choosing allele A when X n = i is given by 'l/J. . however. {m}.((m .

like the Wright-Fisher model. + mv - vp.m . the Moran model cannot be interpreted in terms of a species where genes come in pairs. one individual from the population at time n and remove it. and add a new individual of the same type. we choose randomly one individual from the population at time n.. Here is the genetic interpretation: a population consists of individuals of two types.. The hitting probabilities are given by IPi(Xn = m for some n) = i/m. a and A. both to give birth and to die.5. v > 0. with communicating classes {O}. However. in which case there is no change in the make-up of the population. There is an exact calculation for the mean of 7r: we have p. or where individuals have more than one parent. absorbing states 0 and m and transient class {I. so attention shifts from hitting probabilities to the invariant distribution 7r. Now. . Example 5. {I. .1. The same individual may be chosen each time. Pi. then (Xn)n~O is a Markov chain with transition matrix P.. We can also calculate explicitly the mean time to absorption .i)V} 7ri = (1 - u)p.m . so that J-L = mv/(u + v). then we choose. 1.I}.1 Markov chains in biology 177 With u.I}. if X n denotes the number of type A individuals in the population at time n. . and. is a martingale.m} with transition probabilities Pi. Pii = (i 2 + (m - i)2)/m 2. not the whole population.. again randomly. . . the basic Markov chain structure is the same. There are some obvious differences from the Wright-Fisher model: firstly.i-l = i(m - i)/m 2. secondly in the Moran model we only change one individual at a time. {m }. . . the states 0 and m are no longer absorbing.i+l = i(m - i)/m2. in fact the chain is irreducible. .4 (Moran model) The Moran model is the birth-and-death chain on {O. The Moran model is reversible. so we obtain the population at time n + 1. = L i7ri = E 7r (XI) = L 7riEi(X1) i=O m m m m i=O = 2: m7ri<Pi = 2:{i(l i=O i=O u) + (m .

p) .plogp as m ~ 00. is determined by m.i+1 k t+1) for i = 1..p) + plogp} which has the same functional form in p and differs by a factor of m/2. For the Wright-Fisher model we claimed that Epm(T) ~ -2m{(1 . before absorption: kf kf = bij + (Pi.1 so that . + L m J .p) + plogp}. } j=1 j=i+1 J ~ · As in the Wright-Fisher model.l j-l). . .. {(i/j)k.k~ = . kf where = for i ~ j ((m-i)/(m-j))k. for i = 1.j .) m-1. and i = pm for some P E (0.1)..m . = m.p) 10g(1 . one is really interested in the case where m is large. Hence ki=Lkf=m L j=1 m-1 { i ( m=z..i-1 k t-1 + Piikt + Pi. m}.. for i ~j k..( m-j j j j(m-j) which gives k.2 + . The simplest method is first to fix j and write down equations for the mean time spent in j. m2 . This factor is partially explained by the fact that the Moran model deals .178 5.1 kg = kin = 0. So.. J +P j==mp+1 J L ~ 1 -t -(1. . Then. Applications where T is the hitting time of {O.p) j==1 L m_ ~ 00 mp 1 m-1 . Then 2 m.... .p) log(1. starting from i.k pm = (1. as m Epm(T) ~ -m 2 {(1 .p) 10g(1 ..m .

then n-l k==O lE(t X n) = ¢(n) ( t) II 1/J (¢ (k) ( t) ) . Show that. of some other distribution.5. by n X n = N 1 + . and the times taken to serve customers are also independent random variables.2 A species of plant comes in three genotypes AA. giving priority to earlier arrivals.1.1. in the notation of Example 5. The main . and where P(N = 0) = 1 . Suppose we begin with just one type A.queues runs as follows: there is a succession of customers wanting service.1. The basic mathematical model for. Thus in a population of size m containing i type A individuals. What is the probability that eventually the whole population is of type A? 5. that is. A single plant of genotype Aa is crossed with itself. n + Nx n-l + In where (In)n'?:o is a sequence of independent Z+-valued random variables with common generating function 'ljJ(t) = E(t 1n ). Aa or aa with probabilities 1/4.1. as compared to a type A individual. on arrival each customer must wait until a server is free.i)).1 Consider a branching process with immigration. 5.3 In the Moran model we may introduce a selective bias by making it twice as likely that a type a individual is chosen to die. it is assumed that the times between arrivals are independent random variables of the same distribution.2 Queues and queueing networks 179 with one individual at a time. AA or aa? Suppose it is desired to breed an AA plant.. 1/2 and 1/4. How long on average does it take to achieve a pure strain. In the case where the number of immigrants in each generation is Poisson of parameter A..2 Queues and queueing networks Queues form in many circumstances and it is important to be able to predict their behaviour. Exercises 5. the probability that some type A is chosen to die is now i/(i + 2(m . if X o = 1. whereas the Wright-Fisher model changes all m at once. This is defined. find the long-run proportion of time during which the population is zero.1. What should you do? How many crosses would your procedure require.P and P(N = 1) = p. Aa and aa. on average? 5. so that the offspring has genotype AA.

where > O. Denote by T the time taken to serve the first customer and by A the time of the next arrival.2. Some further variations on queues of this type have already appeared in Exercises 3. and XJl = i . (Xt)t~O .J 1 is exponential of parameter JL and independent of J 1 . We shall also look at the random times that customers spend waiting and the length of time that servers are continuously busy. 3.7. Then the number of customers in the queue (Xt)t~O evolves as a Markov chain with the following diagram: J-l III( A J-l •• 111( A i i +1 •• i To see this. If the inter-arrival times only are exponential. and in the stable case find means to compute the equilibrium distribution of queue length. which is exponential of parameter A + JL. The code means: memoryless inter-arrival times/memoryless service times/one server.1.1 (M/M/I queue) This is the simplest queue of all. suppose at time 0 there are i customers in the queue.180 5.4. and a certain discrete-time Markov chain embedded in the queue. T . This is always taken to include both those being served and those waiting to be served. XJl = i + 1 if T > A. The case where i = 0 is simpler as there is no serving going on. then A-J1 is exponential of parameter A and independent of J1: the time already spent waiting for an arrival is forgotten.1 and 3.I if T < A. by exploiting the memorylessness of the Poisson process of arrivals. conditional on J 1 = A. Hence. an analysis is still possible. In each example we shall aim to describe some salient features of the queue in terms of the given data of arrival-time and service-time distributions. In cases where inter-arrival times and service times have exponential distributions. Similarly. Let us suppose that the interarrival times are exponential of parameter A. If we condition on J 1 = T. which events are independent of J 1 . and the service times are exponential of parameter J-l.2. This is explained in the final two examples. Applications quantity of interest is the random process (Xt)t~O recording the number of customers in the queue at time t.3. Example 5. with probabilities J-l/(A+J-l) and A/(A+J-l) respectively. conditional on XJl = j. (Xt)t~O turns out to be a continuous-time Markov chain.6. Then the first jump time J 1 is A 1\ T. We shall find conditions for the stability of the queue. 3. This is the context of our first six examples.7. so we can answer many questions about the queue.

(Xt)t~O is positive recurrent with invariant distribution So when A < J-l the average number of customers in the queue in equilibrium is given by 00 00 lE7r {Xt ) = LJP>7r{Xt i=l . Conditional on finding a queue of length i on arrival. or we multiply the mean waiting time by the expected number of customers At. The first calculation is exact but we have not fully justified the sec9nd. this is (i + 1) I J-l.5. i=l Also. so the overall mean waiting time is A rough check is available here as we can calculate in two ways the expected total time spent in the queue over an interval of length t: either we multiply the average queue length by t. when A < J-l and the queue is in equilibrium.2 Queues and queueing networks 181 begins afresh from j at time Jl. the mean time to return to 0 is given by so the mean length of time that the server is continuously busy is given by rnO - (llqo) = 1/(J-l . . We deduce that if .\ > J-l then (Xt)t~O is transient. except that it does not take jumps below o. and so on.A). Either way we get Atl J-l .\/{JL . its behaviour is largely understood. convergence to equilibrium..\/JL)i = .\). that is X t ~ 00 as t ~ 00. Thus. Another quantity of interest is the mean waiting time for a typical customer.~o is the claimed Markov chain.::: i) = L{. Even in more complicated examples where exact calculation is limited. Thus if A > J-l the queue grows without limit in the long term. It follows that (Xt)t.A. long-run averages. The MIMll queue thus evolves like a random walk. When A < J-l. once the Markovian character of the queue is noted we know what sort of features to look for .transience and recurrence. This sort of argument should by now be very familiar and we shall not spell out the details like this in later examples. once the queue size is identified as a Markov chain.

To find an invariant measure we look at the detailed balance equations Hence for i for i = 0. . the first service is completed at the minimum of i independent exponential times of parameter J-l. so by Theorem 3. The number of arrivals by time t is a Poisson process of rate A. Let us assume that the arrival rate is A and the service rate by each server is J-l.2.. Applications Example 5.. .. . Then if i servers are occupied. The total service rate increases to a maximum SJ-l when all servers are working.3 for any T > 0.S = s + 1. There are two cases when the invariant distribution has a particularly nice form: when S = 1 we are back to Example 5. It is transient in the case A > SJ-l and otherwise recurrent. and consider the queue in equilibrium.1./J-L so that and the invariant distribution is Poisson of parameter AI J-l.7. Let us suppose that A < SJ-l. We emphasise that the queue size includes those customers who are currently being served.2. By an argument similar to the preceding example.182 5. The queue is therefore positive recurrent when A < SJ-l. The detailed balance equations hold and (Xt)t~O is non-explosive.~o performs a Markov chain with the following diagram: •• ••• ••• 012 A J-l A 2J-l A SJ-l • s•• ••• s+l A SJ-l A So this time we obtain a birth-and-death chain.. S + 2. (Xt)O~t~T . so there is an invariant distribution. the queue size (Xt)t.. Each arrival corresponds to an increase in X t ..2 (M/M/s queue) This is a variation on the last example where there is one queue but there are S servers.1 and the invariant distribution is geometric of parameter AI J-l: When S = 00 we normalize 1r by taking 1ro = e-). The first service time is therefore exponential of parameter iJ-l. and each departure to a decrease.

additional calls are lost. Compare this example with the bus maintenance problem in Exercise 3. Example 5. it turns out that the process of departures.7.5. Instead. where the maximum number of calls that can be connected at once is s: when the exchange is full. and hence the long-run proportion of calls that are lost. the number of departures by time t is also a Poisson process of rate A. in equilibrium. Example 5.4 (Queues in series) Suppose that customers have two service requirements: they arrive as a Poisson process of rate A to be seen first by server A. as in the last example. the long-run proportion of time that the exchange is full. It follows that. and then by server .2 Queues and queueing networks 183 and (XT-t)O~t~T have the same law.2.3 (Telephone exchange) A variation on the M/M/s queue is to turn away customers who cannot be served immediately. This time we get a truncated Poisson distribution By the ergodic theorem. The maximum queue size or buffer size is s and we get the following modified Markov chain diagram: A • J-L I( A • 2J-L I( A • 012 s-l s We can find the invariant distribution of this finite Markov chain by solving the detailed balance equations.2. This might serve as a simple model for a telephone exchange. is given by This is known as Erlang's formula.1. in equilibrium. as one might imagine that the departure process runs in fits and starts depending on the number of servers working. This is slightly counter-intuitive. is just as regular as the process of arrivals.

184

5. Applications

B. For simplicity we shall assume that the service times are independent exponentials of parameters Q and j3 respectively. What is the average queue length at B?
Let us denote the queue length at A by (Xt)t>o and that by B by (¥t)t>o. Then (Xt)(~O is simply an MIMll queue. If A > Q, then (Xt)(~O is transient so there is eventually always a queue at A and departures form a Poisson process of rate a. If A < a, then, by the reversibility argument of Example 5.2.2, the process of departures from A is Poisson of rate A, provided queue A is in equilibrium. The question about queue length at B is not precisely formulated: it does not specify that the queues should be in equilibrium; indeed if A ~ Q there is no equilibrium. Nevertheless, we hope you will agree to treat arrivals at B as a Poisson process of rate Q /\ A. Then, by Example 5.2.1, the average queue length at B when Q /\ A < j3, in equilibrium, is given by (Q /\ A) 1((3 - (Q /\ A)). If, on the other hand, Q /\ A > (3, then (¥t)t~O is transient so the queue at B grows without limit. There is an equilibrium for both queues if A < Q and A < j3. The fact that in equilibrium the output from A is Poisson greatly simplifies the analysis of the two queues in series. For example, the average time taken by one customer to obtain both services is given by

1/(a - A)

+ 1/(j3 -

A).

Example 5.2.5 (Closed migration process)
Consider, first, a single particle in a finite state-space I which performs a Markov chain with irreducible Q-matrix Q. We know there is a unique invariant distribution Jr. We may think of the holding times of this chain as service times, by a single server at each node i E I. Let us suppose now that there are N particles in the state-space, which move as before except that they must queue for service at every node. If we do not care to distinguish between the particles, we can regard this as a new process (Xt)t~O with state-space Y= N1 , where X t = (ni : i E I) if at time t there are ni particles at state i. !n fact, this new process is ~lso ~ Markov chain. To describe its Q-matrix Q we define a function bi : I --+ I by

Thus bi adds a particle at i. Then for i -=I j the non-zero transition rates are given by

n

E

I,

i,j E I.

5.2 Queues and queueing networks
Observe that we can write the invariant measure equation 1rQ form
1ri

185

= 0 in the
(5.1)

L qij = L 1rjqji·
j#i j#i

For n

=

(ni : i E I) we set
1f(n)
=

II
iEI

1rfi.

Then

1f(8i n)

L q(8 n, 8 n) II 1r~k (1riLqji)
i
j

=

j#i

kEI

j#i

(L:: 1rjqji)
j#i

=

L 1f(8 n)q(8 n, 8m).
j j

j#i

Given mEl we can put m = 8i n in the last identity whenever mi summing the resulting equations we obtain

~

1. On

1f(m)

L
n#m

q(m, n) =

L 1f(n)q(n, m)
n#m

so 7r is an invariant measure for Q. The total number of particles is conserved so Q has communicating classes

and the unique invariant distribution for the N-particle system is given by normalizing 7f restricted to eN.
Example 5.2.6 (Open migration process)

We consider a modification of the last example where new customers, or particles, arrive at each node i E I at rate Ai. We suppose also that customers receiving service at node i leave the network at rate /-li. Thus customers enter the network, move from queue to queue according to a Markov chain and eventually leave, rather like a shopping centre. This model includes the closed system of the last example and also the queues

186

5. Applications

in series of Example 5.2.4. Let X t = (X; : i E I), where X; denotes the number of customers at node i at time t. Then (Xt)t>o is a Markov chain in Y= N I and the non-zero transition rates are given by
q(n, bin)

=

for n E I and distinct i and /-lj > 0 for some j; then Q is irreducible on I. The system of equations (5.1) for an invariant measure is replaced here by
1ri

= /-lj states i, LEI. We shall ass':,me that Ai > 0 for some
Ai, q(bi n , bjn) q(bjn, n)

= qij,

(f.1,i

+ L qi j )
j#i

= Ai

+ L 1rjqji·
j#i

This system has a unique solution, with 1ri > 0 for all i. This may be seen by considering the invariant distribution for the extended Q-matrix Q on I U {8} with off-diagonal entries
q8j

=

Aj,

qij

= qij,

qi8

= /-li·

On summing the system over i E I we find

L
iEI

1rif.1,i = L
iEI

Ai.

As in the last example, for n = (n_i : i ∈ I) we set

    π̄(n) = ∏_{i∈I} π_i^{n_i}.

Transitions from m ∈ Ī may be divided into those where a new particle is added and, for each i ∈ I with m_i ≥ 1, those where a particle is moved from i to somewhere else. We have, for the first sort of transition,

    π̄(m) ∑_{j∈I} q̄(m, δ_j m) = π̄(m) ∑_{j∈I} λ_j = π̄(m) ∑_{j∈I} π_j μ_j = ∑_{j∈I} π̄(δ_j m) q̄(δ_j m, m)

and for the second sort

    π̄(δ_i n) (q̄(δ_i n, n) + ∑_{j≠i} q̄(δ_i n, δ_j n)) = ∏_{k∈I} π_k^{n_k} (π_i (μ_i + ∑_{j≠i} q_{ij}))
        = ∏_{k∈I} π_k^{n_k} (λ_i + ∑_{j≠i} π_j q_{ji})
        = π̄(n) q̄(n, δ_i n) + ∑_{j≠i} π̄(δ_j n) q̄(δ_j n, δ_i n).

On summing these equations we obtain

    π̄(m) ∑_{n≠m} q̄(m, n) = ∑_{n≠m} π̄(n) q̄(n, m)

so π̄ is an invariant measure for Q̄. If π_i < 1 for all i then π̄ has finite total mass ∏_{i∈I} (1 − π_i)^{-1}; otherwise the total mass is infinite. Hence, Q̄ is positive recurrent if and only if π_i < 1 for all i, and in that case, in equilibrium, the individual queue lengths (X^i : i ∈ I) are independent geometric random variables with

    ℙ(X^i = n) = (1 − π_i) π_i^n,    n = 0, 1, 2, ... .
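The linear system above, sometimes called the traffic equations, is easy to solve numerically for a given network. The sketch below is not from the text and uses invented rates; it computes π and checks the positive recurrence condition π_i < 1.

    import numpy as np

    # Solve pi_i (mu_i + sum_{j!=i} q_ij) = lambda_i + sum_{j!=i} pi_j q_ji
    # for a three-node open migration network (rates are illustrative only).
    Q = np.array([[0.0, 1.0, 0.5],
                  [0.2, 0.0, 0.8],
                  [0.5, 0.3, 0.0]])          # internal routing rates q_ij
    lam = np.array([0.3, 0.1, 0.2])          # external arrival rates lambda_i
    mu = np.array([1.5, 1.0, 2.0])           # departure rates mu_i

    # In matrix form: (diag(mu_i + sum_j q_ij) - Q^T) pi = lambda.
    A = np.diag(mu + Q.sum(axis=1)) - Q.T
    pi = np.linalg.solve(A, lam)
    print("pi =", pi)
    print("positive recurrent:", bool((pi < 1).all()))
    # In equilibrium, queue i is then geometric: P(X^i = n) = (1 - pi_i) pi_i^n.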

Example 5.2.7 (M/G/1 queue)
As we argued in Section 2.4, the Poisson process is the natural probabilistic model for any uncoordinated stream of discrete events. So we are often justified in assuming that arrivals to a queue form a Poisson process. In the preceding examples we also assumed an exponential service-time distribution. This is desirable because it makes the queue size into a continuous-time Markov chain, but it is obviously inappropriate in many real-world examples. The service requirements of customers and the duration of telephone calls have observable distributions which are generally not exponential. A better model in this case is the M/G/1 queue, where G indicates that the service-time distribution is general. We can characterize the distribution of a service time T by its distribution function

    F(t) = ℙ(T ≤ t),

or by its Laplace transform

    L(w) = E(e^{−wT}) = ∫_0^∞ e^{−wt} dF(t).

(The integral written here is the Lebesgue-Stieltjes integral: when T has a density function f(t) we can replace dF(t) by f(t)dt.) Then the mean service time μ is given by

    μ = E(T) = −L′(0+).

To analyse the M/G/1 queue, we consider the queue size X_n immediately following the nth departure. Then

    X_{n+1} = X_n + Y_{n+1} − 1_{{X_n ≥ 1}}    (5.2)


where Y_n denotes the number of arrivals during the nth service time. The case where X_n = 0 is different because then we get an extra arrival before the (n + 1)th service time begins. By the Markov property of the Poisson process, Y_1, Y_2, ... are independent and identically distributed, so (X_n)_{n≥0} is a discrete-time Markov chain. Indeed, except for visits to 0, (X_n)_{n≥0} behaves as a random walk with jumps Y_n − 1. Let T_n denote the nth service time. Then, conditional on T_n = t, Y_n is Poisson of parameter λt. So

    ℙ(Y_n = k) = ∫_0^∞ e^{−λt} (λt)^k/k! dF(t)

and, indeed, we can compute the probability generating function

    A(z) = E(z^{Y_n}) = ∫_0^∞ E(z^{Y_n} | T_n = t) dF(t) = ∫_0^∞ e^{−λt(1−z)} dF(t) = L(λ(1 − z)).

Set ρ = E(Y_n) = λμ. We call ρ the service intensity. Let us suppose that ρ < 1. We have

    X_n = X_0 + (Y_1 + ··· + Y_n) − n + Z_n

where Z_n denotes the number of visits of (X_n)_{n≥0} to 0 before time n. So

    E(X_n) = E(X_0) − n(1 − ρ) + E(Z_n).

Take X_0 = 0; then, since X_n ≥ 0, we have for all n

    0 < 1 − ρ ≤ E(Z_n/n).

By the ergodic theorem we know that, as n → ∞,

    E(Z_n/n) → 1/m_0,

where m_0 is the mean return time to 0. Hence

    m_0 ≤ 1/(1 − ρ) < ∞,

showing that (X_n)_{n≥0} is positive recurrent. Suppose now that we start (X_n)_{n≥0} with its equilibrium distribution π. Set

    G(z) = E(z^{X_n}) = ∑_{i=0}^∞ π_i z^i.

Then, since X_{n+1} also has distribution π,

    G(z) = E(z^{X_{n+1}}) = E(z^{Y_{n+1}}) (π_0 z + ∑_{i=1}^∞ π_i z^i)/z = A(z)(π_0 z + G(z) − π_0)/z,

so

    (A(z) − z)G(z) = π_0 A(z)(1 − z).

Since G(1) = 1 = A(1) and, by l'Hôpital's rule, (A(z) − z)/(1 − z) → 1 − A′(1−) = 1 − ρ as z ↑ 1, we must therefore have π_0 = 1 − ρ; in particular, m_0 = 1/π_0 = 1/(1 − ρ). Hence

    (A(z) − z)G(z) = (1 − ρ)(1 − z)A(z).    (5.3)

Since A is given explicitly in terms of the service-time distribution, we can now obtain, in principle, the full equilibrium distribution. The fact that generating functions work well here is due to the additive structure of (5.2). To obtain the mean queue length we differentiate (5.3):

    (A′(z) − 1)G(z) + (A(z) − z)G′(z) = (1 − ρ){A′(z)(1 − z) − A(z)},

then substitute for G(z) to obtain

    G′(z) = (1 − ρ)A′(z)(1 − z)/(A(z) − z) − (1 − ρ)A(z){(A′(z) − 1)(1 − z) + A(z) − z}/(A(z) − z)².

By l'Hôpital's rule,

    lim_{z↑1} {(A′(z) − 1)(1 − z) + A(z) − z}/(A(z) − z)²
        = lim_{z↑1} A″(z)(1 − z)/{2(A′(z) − 1)(A(z) − z)} = −A″(1−)/(2(1 − ρ)²).

Hence

    E(X_n) = G′(1−) = ρ + A″(1−)/(2(1 − ρ)) = ρ + λ²L″(0+)/(2(1 − ρ)) = ρ + λ²E(T²)/(2(1 − ρ)).

In the case of the M/M/1 queue, ρ = λ/μ, E(T²) = 2/μ² and E(X_n) = ρ/(1 − ρ), as we found in Example 5.2.1.
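The mean queue length formula can be checked by simulating the chain embedded at departure times. The sketch below is not part of the text; the arrival rate and the gamma service-time distribution are illustrative choices, and the recursion used simply restates (5.2).

    import numpy as np

    # Simulation sketch (illustrative parameters, not from the text): check
    # E(X) = rho + lam^2 E(T^2) / (2 (1 - rho)) for the M/G/1 queue observed
    # at departures, where Y_{n+1} ~ Poisson(lam * T_{n+1}).
    rng = np.random.default_rng(0)
    lam = 0.5
    shape, scale = 2.0, 0.8                  # gamma service times
    mu = shape * scale                       # E(T) = 1.6
    ET2 = shape * (shape + 1) * scale ** 2   # E(T^2)
    rho = lam * mu                           # service intensity, here 0.8

    X, total, n = 0, 0.0, 200_000
    for _ in range(n):
        T = rng.gamma(shape, scale)
        Y = rng.poisson(lam * T)             # arrivals during this service
        X = max(X, 1) - 1 + Y                # queue size after next departure
        total += X

    print("simulated mean queue length :", total / n)
    print("Pollaczek-Khinchine formula :", rho + lam**2 * ET2 / (2 * (1 - rho)))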

We shall use generating functions to study two more quantities of interest: the queueing time of a typical customer and the busy periods of the server.

Consider the queue (X_n)_{n∈ℤ} in equilibrium. Suppose that the customer who leaves at time 0 has spent time Q queueing to be served, and time T being served. Then, conditional on Q + T = t, X_0 is Poisson of parameter λt, since the customers in the queue at time 0 are precisely those who arrived during the queueing and service times of the departing customer. Hence

    G(z) = E(e^{−λ(Q+T)(1−z)}) = M(λ(1 − z))L(λ(1 − z)),

where M is the Laplace transform

    M(w) = E(e^{−wQ}).

On substituting for G(z) we obtain the formula

    M(w) = (1 − ρ)w/(w − λ(1 − L(w))).

Differentiation and l'Hôpital's rule, as above, lead to a formula for the mean queueing time:

    E(Q) = −M′(0+) = λL″(0+)/(2(1 + λL′(0+))).

We now turn to the busy period S. Consider the Laplace transform

    B(w) = E(e^{−wS}).

Let T denote the service time of the first customer in the busy period. Then, conditional on T = t, we have

    S = t + S_1 + ··· + S_N,

where N is the number of customers arriving while the first customer is served, which is Poisson of parameter λt, and where S_1, S_2, ... are independent, with the same distribution as S. Hence

    B(w) = ∫_0^∞ E(e^{−wS} | T = t) dF(t) = ∫_0^∞ e^{−wt} e^{−λt(1−B(w))} dF(t) = L(w + λ(1 − B(w))).

Although this is an implicit relation for B(w), we can obtain moments by differentiation:

    E(S) = −B′(0+) = −L′(0+)(1 − λB′(0+)) = μ(1 + λE(S)),

so the mean length of the busy period is given by

    E(S) = μ/(1 − ρ).

Example 5.2.8 (M/G/∞ queue)

Arrivals at this queue form a Poisson process, of rate λ. Service times are independent, with a common distribution function F(t) = ℙ(T ≤ t). There are infinitely many servers, so all customers in fact receive service at once. The analysis here is simpler than in the last example because customers do not interact. Suppose there are no customers at time 0. What, say, is the distribution of the number X_t being served at time t? The number N_t of arrivals by time t is a Poisson random variable of parameter λt. We condition on N_t = n and label the times of the n arrivals randomly by A_1, ..., A_n. Then, by Theorem 2.4.6, A_1, ..., A_n are independent and uniformly distributed on the interval [0, t]. For each of these customers, service is incomplete at time t with probability

    p = (1/t) ∫_0^t ℙ(T > s) ds = (1/t) ∫_0^t (1 − F(s)) ds.

Hence, conditional on N_t = n, X_t is binomial of parameters n and p. Then

    ℙ(X_t = k) = ∑_{n=k}^∞ ℙ(X_t = k | N_t = n) ℙ(N_t = n)
               = ∑_{n=k}^∞ (n!/(k!(n−k)!)) p^k (1 − p)^{n−k} e^{−λt} (λt)^n/n!
               = e^{−λpt} (λpt)^k/k!.

So we have shown that X_t is Poisson of parameter λ ∫_0^t (1 − F(s)) ds.
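This Poisson form is easy to confirm numerically, using the conditional uniformity of the arrival times. The sketch below is not from the text; the arrival rate and the uniform service-time distribution are illustrative.

    import numpy as np

    # Simulation sketch (illustrative parameters, not from the text): with
    # Poisson(lam) arrivals and Uniform(0, b) service times, the number in
    # service at time t should be Poisson with mean lam * int_0^t (1 - F(s)) ds.
    rng = np.random.default_rng(2)
    lam, b, t, reps = 2.0, 3.0, 5.0, 20_000

    counts = np.empty(reps)
    for r in range(reps):
        N = rng.poisson(lam * t)            # arrivals by time t
        A = rng.uniform(0.0, t, N)          # arrival times, uniform given N
        S = rng.uniform(0.0, b, N)          # service times
        counts[r] = np.sum(A + S > t)       # still being served at time t

    m = min(t, b)
    integral = m - m ** 2 / (2 * b)         # int_0^t (1 - F(s)) ds, F uniform
    print("simulated mean / variance:", counts.mean(), counts.var())
    print("theoretical Poisson mean :", lam * integral)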

Recall that ∫_0^∞ (1 − F(s)) ds = E(T). Hence if E(T) < ∞, the queue size has a limiting distribution, which is Poisson of parameter λE(T).

For further reading on queues and queueing networks see Reversibility and Stochastic Networks by F. P. Kelly (Wiley, Chichester, 1978).

5.3 Markov chains in resource management

Here we present some examples involving the management of a resource: either the stock in a warehouse, or the water in a reservoir, or the reserves of an insurance company. Management decisions are always subject to risk because of the uncertainty of future events. If one can quantify that risk, perhaps on the basis of past experience, then the determination of the best action will rest on the calculation of probabilities, often involving a Markov chain. The statistical problem of estimating transition rates for Markov chains has already been discussed in Section 1.10. See also the exercise in Chapter 3 on the maintenance of unreliable equipment.

Example 5.3.1 (Restocking a warehouse)

A warehouse has a capacity of c units of stock. In each time period n, there is a demand for D_n units of stock, which is met if possible. We denote the residual stock at the end of period n by X_n. The warehouse manager restocks to capacity for the beginning of period n + 1 whenever X_n ≤ m, for some threshold m. Thus (X_n)_{n≥0} satisfies

    X_{n+1} = (c − D_{n+1})^+      if X_n ≤ m,
    X_{n+1} = (X_n − D_{n+1})^+    if m < X_n ≤ c.

Let us assume that D_1, D_2, ... are independent and identically distributed; then (X_n)_{n≥0} is a Markov chain and, excepting some peculiar demand structures, is irreducible on {0, 1, ..., c}. Hence (X_n)_{n≥0} has a unique invariant distribution π which determines the long-run proportion of time in each state. Given that X_n = i, the expected unmet demand in period n + 1 is given by

    u_i = E((D − c)^+)    if i ≤ m,
    u_i = E((D − i)^+)    if m < i ≤ c.

Hence the long-run proportion of demand that is unmet is

    u(m) = ∑_{i=0}^c π_i u_i.

The long-run frequency of restocking is given by

    r(m) = ∑_{i=0}^m π_i.

Now as m increases, u(m) decreases and r(m) increases. The warehouse manager may want to compute these quantities in order to optimize the long-run cost

    a r(m) + b u(m),

where a is the cost of restocking and b is the profit per unit. There is no general formula for π, but once the distribution of the demand is known, it is a relatively simple matter to write down the (c + 1) × (c + 1) transition matrix P for (X_n)_{n≥0} and solve πP = π subject to ∑_{i=0}^c π_i = 1.

We shall discuss in detail a special case where the calculations work out nicely. Suppose that the capacity c = 3, so possible threshold values are m = 0, 1, 2, and that the demand satisfies ℙ(D ≥ i) = 2^{−i} for i = 0, 1, 2, .... Then

    E((D − i)^+) = ∑_{k=1}^∞ ℙ((D − i)^+ ≥ k) = ∑_{k=1}^∞ ℙ(D ≥ i + k) = 2^{−i}.

The transition matrices for m = 0, 1, 2 are given, respectively, by

    ( 1/8  1/8  1/4  1/2 )    ( 1/8  1/8  1/4  1/2 )    ( 1/8  1/8  1/4  1/2 )
    ( 1/2  1/2   0    0  )    ( 1/8  1/8  1/4  1/2 )    ( 1/8  1/8  1/4  1/2 )
    ( 1/4  1/4  1/2   0  )    ( 1/4  1/4  1/2   0  )    ( 1/8  1/8  1/4  1/2 )
    ( 1/8  1/8  1/4  1/2 )    ( 1/8  1/8  1/4  1/2 )    ( 1/8  1/8  1/4  1/2 )

with invariant distributions

    (1/4, 1/4, 1/4, 1/4),    (1/6, 1/6, 1/3, 1/3),    (1/8, 1/8, 1/4, 1/2),

respectively. Suppose that the profit per unit b = 1. Then

    u(0) = 1/4,    u(1) = 1/6,    u(2) = 1/8

and

    r(0) = 1/4,    r(1) = 1/3,    r(2) = 1/2.

Hence, to minimize the long-run cost a r(m) + u(m), we should take

    m = 2    if a ≤ 1/4,
    m = 1    if 1/4 < a ≤ 1,
    m = 0    if 1 < a.
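These numbers are easy to reproduce by machine. The sketch below is not part of the text and the helper names are ours; it builds each transition matrix, solves πP = π and recomputes u(m) and r(m).

    import numpy as np

    # Numerical check (not from the text) of the special case c = 3 with
    # P(D >= i) = 2^{-i}: build P for each threshold m, solve pi P = pi,
    # and compute the unmet demand u(m) and restocking frequency r(m).
    c = 3
    p_D = [1/2, 1/4, 1/8]                    # P(D = 0), P(D = 1), P(D = 2)
    P_ge = lambda i: 2.0 ** (-i)             # P(D >= i)
    unmet = lambda i: 2.0 ** (-i)            # E((D - i)^+) = 2^{-i}

    def row(stock):
        out = np.zeros(c + 1)
        for d in range(stock):
            out[stock - d] += p_D[d]         # demand d leaves stock - d units
        out[0] += P_ge(stock)                # demand >= stock empties the store
        return out

    for m in range(c):
        P = np.array([row(c) if i <= m else row(i) for i in range(c + 1)])
        w, v = np.linalg.eig(P.T)            # invariant distribution pi P = pi
        pi = np.real(v[:, np.argmin(np.abs(w - 1))])
        pi /= pi.sum()
        u = sum(pi[i] * unmet(c if i <= m else i) for i in range(c + 1))
        r = pi[: m + 1].sum()
        print(f"m={m}: pi={np.round(pi, 4)}, u(m)={u:.4f}, r(m)={r:.4f}")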

Example 5.3.2 (Reservoir model - discrete time)

We are concerned here with a storage facility, for example a reservoir, of finite capacity c. In each time period n, A_n units of resource are available to enter the facility and B_n units are drawn off. When the reservoir is full, surplus water is lost. When the reservoir is empty, no water can be supplied. We assume that newly available resources cannot be used in the current time period. Then the quantity of water X_n in the reservoir at the end of period n satisfies

    X_{n+1} = ((X_n − B_{n+1})^+ + A_{n+1}) ∧ c.

If we assume that A_n, B_n and c are integer-valued and that A_1, A_2, ... are independent and identically distributed, likewise B_1, B_2, ..., then (X_n)_{n≥0} is a Markov chain on {0, 1, ..., c}, whose transition probabilities may be deduced from the distributions of A_n and B_n. Hence, assuming irreducibility, we know that the long-run behaviour of (X_n)_{n≥0} is controlled by its unique invariant distribution π. So we would like to calculate π. For example, the long-run proportion of time that the reservoir is empty is simply π_0.

A simplifying assumption which makes some calculations possible is to assume that consumption in each period is constant, and that our units are chosen to make this constant 1. Then the infinite capacity model satisfies a recursion similar to the M/G/1 queue:

    X_{n+1} = (X_n − 1)^+ + A_{n+1}.

Therefore, by the argument used in Example 5.2.7, if E(A_n) < 1, then (X_n)_{n≥0} is positive recurrent and the invariant distribution π satisfies

    ∑_{i=0}^∞ π_i z^i = (1 − E(A_n))(1 − z)A(z)/(A(z) − z),

where A(z) = E(z^{A_n}). In fact, whether or not E(A_n) < 1, the equation

    ∑_{i=0}^∞ ν_i z^i = (1 − z)A(z)/(A(z) − z)

serves to define a positive invariant measure ν for (X_n)_{n≥0}. To see this, multiply by A(z) − z and equate powers of z: the resulting equations are the equilibrium equations for (X_n)_{n≥0}:

    ν_0 = ν_0 a_0 + ν_1 a_0,
    ν_i = ν_0 a_i + ν_{i+1} a_0 + ∑_{j=1}^{i} ν_j a_{i−j+1}    for i ≥ 1,

where a_i = ℙ(A_n = i). Note that (X_n)_{n≥0} can only enter {0, 1, ..., c} through c. Hence, by the strong Markov property, (X_n)_{n≥0} observed whilst in {0, 1, ..., c} is simply the finite-capacity model. In the case where E(A_n) < 1, we can deduce for the finite-capacity model that the long-run proportion of time in state i is given by ν_i/(ν_0 + ··· + ν_c). In fact, this is true in general, as the equilibrium equations for the finite-capacity model coincide with those for ν up to level c − 1, and the level c equation is redundant.

The problem faced by the water company is to keep the long-run proportion of time π_0(c) that the reservoir is empty below a certain acceptable fraction ε > 0, say. In reality, it is to be hoped that, in the long run, supply will exceed demand, which is true if E(A_n) > 1. Then (X_n)_{n≥0} is transient, so ν must have infinite total mass. Hence c should be chosen large enough to make

    ν_0/(ν_0 + ··· + ν_c) < ε,

which is always possible in the transient case.

Example 5.3.3 (Reservoir model - continuous time)

Consider a reservoir model where fresh water arrives at the times of a Poisson process of rate λ. The quantities of water S_1, S_2, ... arriving each time are assumed independent and identically distributed. We assume that there is a continuous demand for water of rate 1. For a reservoir of infinite capacity, the quantity of water held (W_t)_{t≥0} is just the stored work in an M/G/1 queue with the same arrival times and service times S_1, S_2, .... The periods when the reservoir is empty correspond to idle periods of the queue. Hence, in the positive recurrent case where λE(S_n) < 1, the long-run proportion of time that the reservoir is empty is given by 1 − λE(S_n).

Note that (W_t)_{t≥0} can enter [0, c] only through c. As in the preceding example, we can obtain the finite-capacity model by observing (W_t)_{t≥0} whilst in [0, c], but we shall not pursue this here.

The next example is included, in part, because it illustrates a surprising and powerful connection between reflected random walks and the maxima of random walks.

Example 5.3.4 (Ruin of an insurance company)

An insurance company receives premiums continuously at a constant rate; we choose units making this rate 1. The company pays claims at the times of a Poisson process of rate λ, the claims Y_1, Y_2, ... being independent and identically distributed. Set ρ = λE(Y_1) and assume that ρ < 1. Then in the long run the company can expect to make a profit of 1 − ρ per unit time. However, there is a danger that large claims early on will ruin the company even though the long-term trend is good.

Let X_1, X_2, ... denote a sequence of independent, identically distributed random variables. Set S_n = X_1 + ··· + X_n and define (Z_n)_{n≥0} by Z_0 = 0 and

    Z_{n+1} = (Z_n + X_{n+1})^+.

Then, by induction, Z_n has the same distribution as M_n, where

    M_n = max_{0≤k≤n} S_k.

Now denote by S_n the cumulative net loss following the nth claim. Thus S_n = X_1 + ··· + X_n, where X_n = Y_n − T_n and T_n is the nth inter-arrival time. By the strong law of large numbers,

    S_n/n → E(Y_1) − 1/λ < 0    as n → ∞.

The maximum loss that the company will have to sustain is

    M = lim_{n→∞} M_n.

By the argument given above, M_n has the same distribution as Z_n, where Z_0 = 0 and Z_{n+1} = (Z_n + Y_{n+1} − T_{n+1})^+. But Z_n is the queueing time of the nth customer in the M/G/1 queue with inter-arrival times T_n and service times Y_n. We know by Example 5.2.7 that the queue-length distribution converges to equilibrium; so does the queueing-time distribution.

Also, by Example 5.2.7, we know the Laplace transform of the equilibrium queueing-time distribution. Hence

    E(e^{−wM}) = (1 − ρ)w/(w − λ(1 − L(w))),

where L(w) = E(e^{−wY_1}). The probability of eventual bankruptcy is ℙ(M > a), where a denotes the initial value of the company's assets. In principle, this may now be obtained by inverting the Laplace transform.

5.4 Markov decision processes

In many contexts costs are incurred at a rate determined by some process which may best be modelled as a Markov chain. We have seen in Section 1.10 and Section 4.2 how to calculate in these circumstances the long-run average cost or the expected total cost. Suppose now that we are able to choose the transition probabilities for each state from a given class and that our choice determines the cost incurred. The question arises as to how best to do this to minimize our expected costs.

Example 5.4.1

A random walker on {0, 1, 2, ...} jumps one step to the right with probability p and one step to the left with probability q = 1 − p. The walker on reaching 0 stays there, incurring no further costs. Any value of p ∈ (0, 1] may be chosen, but each jump incurs a cost c(p) = 1/p. If we are only concerned with minimizing costs over the first few time steps, then the choice p = 1 may be best. However, in the long run the only way to avoid an infinite total cost is to get to 0. Let φ(p) denote the expected total cost starting from 1, and seek to minimize φ(p). Given the lack of memory in the model, it is reasonable to pick the same value of p throughout; the general discussion which follows will make rigorous what we claimed was reasonable. Starting from i we must first hit i − 1, with expected cost φ(p), then i − 2, and so on; in particular, the expected total cost starting from 2 is 2φ(p) since we must first hit 1. Hence

    φ(p) = c(p) + 2pφ(p),

so that

    φ(p) = c(p)/(1 − 2p)    for p < 1/2,
    φ(p) = ∞                for p ≥ 1/2.

Thus for c(p) = 1/p the choice p = 1/4 is optimal.
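A quick numerical check, not part of the text: evaluating φ(p) = (1/p)/(1 − 2p) over a grid of p values confirms that the minimum is attained at p = 1/4, where the expected cost is 8.

    import numpy as np

    # Check of Example 5.4.1: phi(p) = c(p)/(1 - 2p) with c(p) = 1/p, p < 1/2.
    p = np.linspace(0.01, 0.49, 4801)
    phi = (1 / p) / (1 - 2 * p)
    k = np.argmin(phi)
    print("optimal p ~", round(p[k], 3), " minimal expected cost ~", round(phi[k], 3))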

j E I) and a cost function c( a) = (Ci (a) : i E I). . Define also the value function V*(i) = infVU(i) U which is the minimal expected total cost starting from i. These are the data for a Markov decision process. A stationary policy u is a function u : I ~ A. with transition probabilities P"tj = Pij (u(i))· We suppose that a cost c(i. a policy U is a sequence of functions n=O. jE! . though so far we have no process and when we do it will not in general be Markov. . let us suppose given some distribution A = (Ai: i E I) and. a) = ci(a) is incurred when action a is chosen in state i.. . . The minimum expected cost incurred before time n = 1 is given by VI (i) = inf c(i. . Then we associate to a policy u an expected total cost starting from i.2..1. for each action a E A. To get a process we must choose a policy.Un(Xo. (ii) PU(Xn+ 1 =in+11 X o =io..Xn =i n ) =Pi n in +l(u n (i o . a). The basic problem of Markov decision theory is how to minimize expected costs by our choice of policy.. .Xn )). the probability law pu makes (Xn)n>O Markov.... . We abuse notation and write u also for the associated policy given by Under a stationary policy u.198 5. given by 00 VU(i) =E U LC(Xn. we assume that c(i. a) + LPij(a)V1 (j)}. that is. Formally.. Each policy u determines a probability law pu for a process (Xn)n~O with values in I by (i) PU(Xo = io) = Aio . Applications Generally. n=O So that this sum is well defined.in)). a transition matrix P(a) = (Pij(a) : i.. a Then the minimum expected cost incurred before time n = 2 is V2 (i) = i~{ c(i. a way of determining actions by our current knowledge of the process. a) ~ 0 for all i and a.

1]. so Vn (i) increases to a limit Voo(i).5. Then Vn+l(i) = !~ {C(i.00) and Pij : A ~ [0.4 Define inductively Markov decision processes 199 Vn+l (i) = i~f{ c(i.4) It is easy to see by induction that Vn (i) ~ Vn +l (i) for all i. ci(a) = l/a and a ifj=i+1 1 Pij ( a) = 0 . There is a stationary policy u such that Voo(i) = c(i.6) Proof. with obvious exceptions at i = O. We make three technical assumptions to ensure this. for all but finitely many j. jEI (5. being the limit of minimal expected costs over finite time intervals. u(i)) + LPij (u(i))VooU)· jEI (5. Voo(i) :::. so let us assume that Voo(i) ~ B < 00. for all a E A we have Pij (a) = 0. i~{ c(i.4. (ii) for all i and all B < 00 the set {a : ci(a) ~ B} is compact. letting n ~ 00 and then minimizing over a.a if j = i-I { otherwise. A simple case where (i) and (ii) hold is when A is a finite set. We assume that (i) for all i. is in fact the value function V*(i). unless we can show that the inequality (5. c(i.4. a) + LPij(a)VOOU)}. It is easy to check that the assumptions are valid in Example 5. This is not always true. If Voo(i) = 00 there is nothing to prove.a) + LPij(a)VnU) jEI for all a so. possibly infinite.2.1. with A = (0.00) are continuous.5) is actually an equality. Lemma 5. j the functions Ci : A ~ [0. jEI (5. We have Vn+l(i) :::. a) + LPij(a)VnU)}.a) + LPij(a)Vn(j)} jEJ .5) It is a reasonable guess that Voo(i). (iii) for each i.

. a) ~ B} and where J is the finite set {j : Pij ¢.un(i)) + :EPij(un(i))Vn(j) jEJ (5. on taking the infimum over u V*(i) ~ i~{ c(i. O}. Applications where K is the compact set {a : c( i.200 5.7).4. by continuity. Let us suppose inductively that Vn(i) ~ V*(i) for all i. D Theorem 5.6). we obtain (5. VU(i) = = Ei L n=O c(Xn . a) + LPij (a)V* (j) } jEI and.8) Certainly. the infimum is attained and Vn+l(i) = c(i. By compactness there is a convergent subsequence u nk (i) ~ u(i).Un (Xo. Hence Voo(i) ~ V*(i) for all i.8) we find Vn+ 1 (i) ~ V*(i) and the induction proceeds. For any policy u we have 00 V*(i) for all i. Hence. · " . Then by substitution in the right sides of (5.uo(i)) + :EPij(UO(i))vu[i](j) jEI where u[i] is the policy given by Hence we obtain VU(i) ~ i~{ c(i. Vo(i) = 0 ~ V*(i).3. We have for all i.a) + :EPij (a)V* (j) jEI for all i. and.4) and (5. say. jEI (5. a) + LPij (a)V* (j) }. on passing to the limit nk ~ 00 in (5. then u* is optimal.Xn )) c(i.7) for some un(i) E K. in the sense that v u * (i) = Proof. (ii) if u * is any stationary policy such that a (i) Vn(i) i V*(i) as n ~ 00 = u * (i) minimizes c(i.

5. jEI Theorem 5.2. for example Vn for n large. Then by Theorem 4. (i) We have.Ou(i)) + :EPij(Ou(i))vu(j) jEI so VU(i) ~ VeU(i) for all i.2. Given one stationary policy u we may define another Ou by the requirement that a = (Ou)(i) minimizes c(i. (ii) VenU(i) ! V*(i) as n ~ 00 for all i.9) Proof. (5. even less that this policy would be stationary.3 we have vu* (i) ~ Voo(i) for all i.2. by Theorem 4. Moreover.u(i)) + LPij(u(i))VU(j) jEI ~ c(i.2. . once this is known. part (i) gives an explicit way of obtaining the value function V* and.4 Markov decision processes Let u* be any stationary policy for which 201 Voo(i) ~ c(i. We may then hope that. For it was not clear at the outset that there would be a single policy which was optimal for all i. But V*(i) ~ vu* (i) for all i. In practice we may know only an approximation to V*. provided that for all i.3 VU(i) = c(i. D v u * (i) for all i The theorem just proved shows that the problem of finding a good policy is much simpler than we might have supposed. by Theorem 4.u*(i)) + LPij(U*(i))Voo(j). An alternative means of constructing nearly optimal policies is sometimes provided by the method of policy improvement. so Voo(i) = V*(i) = and we are done. We have (i) VeU(i) ~ VU(i) for all i.a) + :Epij(a)Vu(j). a) + :EPij(a)Vn(j) jEI we get a nearly optimal policy.4 (Policy improvement).4. jEI We know such a policy exists by Lemma 5.3. by choosing a = u(i) to minimize c(i. part (ii) identifies an optimal stationary policy.4.

which is only relevant to the transient case. .Un(Xo.a) + LPii(a)Vu(j) jEI for all i and a.U*(Xk)) k=O ~Mn where we used (5.U*(Xk))' k=O Recall the notation for conditional expectation introduced in Section 4. n=O Define the discounted value function V.1. 1. u*(Xn )) * jEI n-l + LC(Xk.9). c(i.10) Fix N ~ 0 and consider for n = 0. This is because we will have V* (i) = 00 unless for some stationary policy u. i = X n and a = u*(Xn ). (5. . We now seek to minimize the expected total discounted cost 00 V:(i) = EiLOnC(Xn.. We have EU (Mn+ 1 I :Fn ) = LPXni (U*(Xn))VoN-n-lu(j) + c(Xn . . .1). Applications (ii) We note from part (i) that VOU(i) :::.Xn )). Hence if we assume (5.u*(Xn))) D asN~oo. It follows that EU· (Mn +1) ~ EU· (Mn ) for all n.. U . The recurrent case is also of practical importance and one way to deal with this is to discount costs at future times by a fixed factor Q E (0. then V 9N U(i) = IEy· (Mo) ~ IEY· (MN ) =Ef(VU(XN)) +Eu* ~V*(i) (~c(Xn.. are transient.10) with u replaced by (IN-n-l u .(i) = infV:(i).N the process n-l M n = VON-nU(Xn) + LC(Xk. We have been discussing the minimization of expected total cost. accessible from i. u(j)) > 0. the only states j with positive cost c(j. .202 5.

at some geometric time of parameter a.a(i) i VC:(i) as n ~ V. incurring no further costs. define another Oau by the requirement that a = (Oau)(i) minimizes c(i. it jumps to 8 and stays there.11) (iii) Let u* be a stationary policy such that a = u*(i) minimizes c(i. jEI (5.o: (in jEJ and. inductively Vn+l.a) +a 2:Pij(a)V. Theorem 5.5. given a stationary policy u. for all i.4 Markov decision processes 203 In fact.5. Then u* is optimal in the sense that V:* (i) = VC:(i) for all i. .U) jEI for all i. a) + a 2:Pij(a)V jEJ U U). a) + a 2: Pij(a)V. ci(a) = ci(a). Pia(a) = 1 . a) is uniformly bounded. (ii) The value function VC: is the unique bounded solution to 00 (i) We have Vn. With obvious notation we have ~ 00 for all i.a (i) = 0 and. Thus the new process follows the old until. a) + a 2:Pij (a)Vn. ca(a) = O. Introduce VO.4. Suppose that the cost function c(i. the discounted case reduces to the undiscounted case by introducing a new absorbing state 8 and defining a new Markov decision process by Pij (a) = apij (a).a. (iv) For all stationary policies u we have as n Proof.o:(i) = i~f {c(i.(i) = i~f{ c(i.U) }.

204 5. Instead. there is a stationary policy u such that V(i) = c(i. although this is sometimes valid. So for any stationary policy u we have 00 V:(i) and so = Ei:E o:nc(xn . In the case of a stationary strategy u for which (Xn)n>O has a unique invariant distribution 1ru .a) ~ D 0 Hence (iv) also follows from Theorem 5. We have c(i. JEI Then V = V~. we do not know in general that the optimal policy is positive recurrent. with the same limit. u(j)). a) ~ B for some B < 00. u(Xn )) :::. except for the uniqueness claim in (ii). for all i. B/(l . Then (}au = U so (iv) will show that u is optimal and V = VC:.4. (ii) and (iii) follow directly from Theorems 5. Applications so parts (i).u(i)) +0: :Epij(u(i))V(j). So ~ (i) does converge in this case by bounded convergence.2. Here we are concerned with the limiting behaviour.11).3 and 5. by Theorem 4.U(Xk)) ~ L k=O 7r j c U.4. JEI But. a)1 ~ B < 00 for all i and a.4. This suggests that one approach to minimizing long-run costs might be to minimize L 7rjcU. of We assume that Ic(i. which is more general. but in general the sequence ~ (i) may fail to converge as n ~ 00. = anEf* (V~(Xn)) ~ Ba n /(l. or even stationary.: LC(Xk.5. . we use a martingale approach. as n ~ 00.u(j)) JEI as n ~ 00. Pi -almost surely.4. But given any bounded solution V to (5.0:) n=O iEf* (VU(Xn )) as n ~ 00. we know by the ergodic theorem that 1 n-l .. We finish with a discussion of long-run average costs.4. This forces IV: (i) I ~ B for all n.

5.nV* Then +L k=O C(Xk' Uk). Two further aspects also deserve comment. D IW(i)l/n The most obvious point of this theorem is that it identifies an optimal stationary policy when the hypothesis is met. (ii) liminfn~oo (i) ~ V* for all i. (5. for all u. n-l Consider M n = W(X n ) . i This implies (ii) on letting n ~ 00.t 00 for all i. .. if u is a stationary policy for which (Xn)n~O has an invariant distribution 1r u .12) for each i. Suppose we can find a constant V* and a bounded function W(i) such that V* + W(i) = i~f{c(i. v::* v:: Proof.u(i)) iEI + L1I"jW(j) jEI . then 2: 1I"f (V* + W(i)) iEI ~ 2:1I"f (C(i'U(i)) + 2:Pii(U(i))W(j)) iEI jEI = 2:1I"fc(i.4 Markov decision processes 205 Theorem 5.6. Un) + LPXni(Un)W(j)} jEI (V* + W(Xn )) with equality if u = u*. EU(Mn+l I F n ) = M n + {C(Xn . . Then (i) (i) ..t V* as n .4.a) + LPii(a)W(j)} jEI for all i.12) Let u* be any stationary strategy such that a = u*(i) achieves the infimum in (5. Fix a strategy u and set Un = un(Xo. Firstly.Xn ). Therefore So we obtain v* ~ V~(i) + 2 sup IW(i)l/n. When u 2 sup i = u* we also have v::* (i) ~ V* + and hence (i).

1994) for more examples.U31 U32 U 33 written as decimal expansions of a certain length. U21 U22 U 23 = 0.a). Tijms. there is a connection with the case of discounted costs. 1970) and to H. On substituting this into (5.an algorithmic approach (Wiley. 5.11) we find v* /(1 = so a) + W(i) + 0(1 jEI a) i~f {C(i' a) + a LPij(a)(V* /(1. uniformly distributed on [0. u(i)) iEI with equality if we can take u = u* .5 Markov chain Monte Carlo Most computers may be instructed to provide a sequence of numbers Ul = O.(i) = V* /(1 .Ull U12 U 13 Ulm U2m U3m U2 = U3 O. Stochastic Models . results and references. The interested reader is referred to S.12) on letting a i 1. Applied Probability Models with Optimization Applications (Holden-Day. which for many purposes may be regarded as sample values of a sequence of independent random variables. C.a) + W(j) + 0(1 V* + W(i) = a))} i~f {C(i' a) + a LPij(a)W(j)} + 0(1 jEI a) which brings us back to (5. Secondly. Chichester.1]: .11. Assume that I is finite and that P( a) is irreducible for all a. San Francisco. Ross.206 so 5.a) + W(i) + 0(1 . Then we can show that as a i 1 we have V. Applications v* ~ L 1rfc(i.

We are cautious in our language because, of course, these numbers are actually all integer multiples of 10^{−m} and, more seriously, they are usually derived sequentially by some entirely deterministic algorithm in the computer. Nevertheless, the generators of such pseudo-random numbers are in general as reliable an imitation as one could wish of U_1(ω), U_2(ω), U_3(ω), .... This makes it worth while considering how one might construct Markov chains from a given sequence of independent uniform random variables, and one might then exploit the observed properties of such processes.

We shall now describe one procedure to simulate a Markov chain (X_n)_{n≥0} with initial distribution λ and transition matrix P. Since ∑_{i∈I} λ_i = 1 we can partition [0, 1] into disjoint subintervals (A_i : i ∈ I) with lengths

    |A_i| = λ_i.

Similarly, for each i ∈ I, we can partition [0, 1] into disjoint subintervals (A_{ij} : j ∈ I) such that

    |A_{ij}| = p_{ij}.

Now define functions G_0 : [0, 1] → I and G : I × [0, 1] → I by

    G_0(u) = i    if u ∈ A_i,
    G(i, u) = j   if u ∈ A_{ij}.

Suppose that U_0, U_1, U_2, ... is a sequence of independent random variables, uniformly distributed on [0, 1], and set

    X_0 = G_0(U_0),    X_{n+1} = G(X_n, U_{n+1})    for n ≥ 0.

Then

    ℙ(X_0 = i) = ℙ(U_0 ∈ A_i) = λ_i,
    ℙ(X_{n+1} = i_{n+1} | X_0 = i_0, ..., X_n = i_n) = ℙ(U_{n+1} ∈ A_{i_n i_{n+1}}) = p_{i_n i_{n+1}},

so (X_n)_{n≥0} is Markov(λ, P). This simple procedure may be used to investigate empirically those aspects of the behaviour of a Markov chain where theoretical calculations become infeasible.
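The functions G_0 and G translate directly into code, with the subintervals realised by cumulative sums. The sketch below is not part of the text; the two-state chain is chosen purely for illustration.

    import numpy as np

    # Sketch (illustrative chain, not from the text) of the construction above.
    rng = np.random.default_rng(3)
    lam = np.array([0.5, 0.5])               # initial distribution lambda
    P = np.array([[0.9, 0.1],
                  [0.2, 0.8]])               # transition matrix

    def G0(u):                               # u in A_i  =>  i
        return int(np.searchsorted(np.cumsum(lam), u))

    def G(i, u):                             # u in A_ij =>  j
        return int(np.searchsorted(np.cumsum(P[i]), u))

    n = 100_000
    U = rng.uniform(size=n + 1)
    X = np.empty(n + 1, dtype=int)
    X[0] = G0(U[0])
    for k in range(n):
        X[k + 1] = G(X[k], U[k + 1])

    # long-run occupation should approach the invariant distribution (2/3, 1/3)
    print("empirical occupation:", np.bincount(X) / len(X))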

known as Markov chain Monte Carlo. Markov chain Monte Carlo is sometimes the only way to simulate samples from 7r.Xn in I.. X(m) takes values in 8 m .. especially in statistics. However. The essential point to understand is that A is typically a large set. .13) by 1 n . When 7r does not have product form. making the state-space I very large indeed. For recall that a computer just simulates sequences of independent U[O. the object of primary interest being the invariant distribution of the Markov chain and not the chain itself. The context for Markov chain Monte Carlo is a state-space in product form I = 8m II mEA where A is a finite set. 1] random variables. for each site mEA. moreover. statistical physics and computer science.13) for some given function I = (Ii : i E I).208 5. Monte Carlo is another name for computer simulation so this sounds no different from the procedure just discussed. For the purposes of this discussion we shall also assume that each component 8 m is a finite set..performing the sum (5.13) state by state for a start. It is the application which finds greatest practical use. one can obtain error estimates which indicate how large to make n in practice. An alternative approach would be to simulate a large number of independent random variables Xl. where. simulation from the distribution 7r is also difficult. unless 7r has product form 1r(X) = II 1r mEA m {x(m)).. and it is desired to compute the number (5. each with distribution 7r. k=l The strong law of large numbers guarantees that this is a good approximation as n ~ 00 and. . . But what is really meant is simulation by means of Markov chains. L!(Xk). and to approximate (5. After a general discussion we shall give two examples. We are given a distribution 7r = (7ri : i E I). perhaps up to an unknown constant multiple. Then certain operations are computationally infeasible . Applications The remainder of this section is devoted to one application of the simulation of Markov chains. A random variable X with values in I is then a family of component random variables (X (m) : mEA).

assuming only irreducibility. by Theorem 1. Each component Xo(m) of the initial state X o is a random variable in 8 m . we know. and then pii(m) = 1. we simulate a new random variable X n + 1 (m) with values in 8 m according to a distribution determined by X n . Then.5. It does not matter crucially what distribution X o is given. Indeed.8. except possibly at site m. Let us write i ~ j if i and j agree.2:Pij(m) ~ O. and for k =I m we set X n+ 1 ( k) = X n ( k) . Thus at each step we have only to simulate a random variable in 8 m .2 shows that with probability 1. not one in the much larger space I. given any stochastic matrix R( m) with rij(m) =0 unless i ~ j we can determine such a P(m) by for i =I j.3. When the chosen site is m.5 Markov chain Monte Carlo 209 The basic idea is to simulate a Markov chain (Xn)n~O. The process (Xn)n~O is made to evolve by changing components one site at a time. j we want There are many possible choices for P(m) satisfying these equations. We would like Jr to be invariant for P( m).10. where pij(m) = 0 unless i ~ j. The law for simulating a new value at site m is described by a transition matrix P( m). which is constructed to have invariant distribution Jr. for example. Indeed. that as n ~ 00 the distribution of X n converges to Jr. assuming aperiodicity and irreducibility. but we might. make all components independent. But why should simulating an entire Markov chain be easier than simulating a simple distribution Jr? The answer lies in the fact that the state-space is a product. A sufficient condition is that the detailed balance equations hold: thus for all i. j#i . Theorem 1.

For definiteness we mention two possibilities. So we simply resample X n (m) according to the conditional distribution under 7r. then we adopt the new value. It is particularly useful in Bayesian statistics. Then (Xn)n~O is itself a Markov chain with transition matrix P= IAI. We have not yet specified a rule for deciding which site to visit when. This is called the Gibbs sampler. we might choose a site randomly at each step. On taking for i ~ j we also find for i ~ j. We know that . A particularly simple case would be to take for i ~ j. Xn)n~O is a Markov chain in A x I. i =1= j rij(m) == l/(Nm . This amounts to choosing another value jm at site m uniformly at random. given the other components. provided we keep returning to every site. Alternatively.210 5. generating a sequence of sites (mn)n~O. We might choose to visit every site once and then repeat. On taking rij(m) = rji(m) for all i and j we find for i ~ j. There are two commonly used special cases. Let us assume that P is irreducible. Applications This has the following interpretation: if X n = i we simulate a new random variable Y n so that Y n = j with probability rij(m). In practice this may not matter much. which is easy to ensure in the examples.1 :E P(m). i =1= j. This is called a Hastings algorithm. if 7rj > 7ri.1) where N m == ISml. then if Y n = j we set with probability (7ririj(m)/7rjrji(m)) 1\ 1 otherwise. whereas if 7rj ~ 7ri we adopt the new value with probability 7rj/7ri. where the analysis is simpler to present. Then (m n . mEA We shall stick with this second choice. This is called a Metropolis algorithm.

Spiegelhalter (Chapman and Hall. Example 5. The parameters 00 . For further reading we recommend Stochastic Simulation by B.O )2/2}7 ao .5. cPo 1 ). 1995) contains many interesting references. . j.807}. 10 (1). For example. Chichester.80). 7 I y) ex 7r(/-l. 7). and 7 has gamma distribution of parameters 0:0 and . we might assume that /-l rv N( 00.80.2. 3-40. is then given by Bayes' formula 7r(/-l. Higdon and K.80 are known. D. and Markov Chain Monte Carlo in practice by W.( 0 )2 /2} exp { -T t. that is to say. pp. 7) is given by 7r(/-l.Y . Green.. 7) f (y I /-l. Richardson and D. 1987). but useful information of this type cannot be gained in the present general context.1 .5 Markov chain Monte Carlo for all m and all i. cPo. 0:0 and .10. with a given prior distribution. London. Besag. Ripley (Wiley. 7) ex exp{ -¢o(p. Given more information on the structure of 8 m and the distribution 7r to be simulated. Then the prior density for (/-l. The Bayesian approach to this problem is to assume that /-l and 7 are themselves random variables. . In practice one is concerned with how fast it works. 7 rv r( 0:0. Mengersen (Statistical Science. 7) ex exp{ -cPO(/-l. P. S. by Theorem 1. -L 1 n-1 L o The posterior density for (/-l. much more can be said. Thus the algorithm works eventually. 1996). Gilks. we have f(Xk) .1 (Bayesian statistics) In a statistical problem one may be presented with a set of independent observations Y1 . D. .1 exp{ -. /-l is normal of mean 00 and variance cPo 1 . but with unknown mean /-l and variance 7. We finish with two examples.5. .t 7rdi n k=O iEI as n ~ 00 with probability 1.(Yi - p)2 /2 } T ao -l+ n / 2 exp{ -{jOT}. the computer output gives a good idea of how well we are doing. which is the conditional density given the observations. One then seeks to draw conclusions about /-l and 7 on the basis of the observations. It should also be emphasised that there is an empirical side to simulation: with due caution informed by the theory. . We shall not pursue the matter here. so also 7riPij 211 = 7rjPji and so 7r is the unique invariant measure for P. Hence. which it is reasonable to assume are normally disn tributed. R. The recent survey article Bayesian computation and stochastic systems by J. J.

To make the connection with our general discussion we set 1=81 X 8 2 = JR x [0. Our final belief about J-l and T is regarded as measured by the posterior density. Applications Note that the posterior density is no longer in product form: the conditioning has introduced a dependence between J-l and T. numerical integration would also be feasible as the dimension is only two. we first simulate J-lk+1 from 7r(J-l I y.80 + 2)Yi i==l - p)2/2.8n) where n an = ao + n/2.80 p)2 /2 } I'.. The model consists of m copies of the preceding one. p) + t.1). In any case the computer will work with finite approximations to JR and [0.(Yi - p)2 /2) } I'.00). Nevertheless.t f(x)1r(x I y)dx as k . with . T I y). 00) are not finite sets does not affect the basic idea. then set Xk+1 = (J-lk+1. Tk).J N(On. ¢.Tk+1). given X k = (J-lk. T) with density 7r(J-l. We may wish to compute probabilities and expectations. for all bounded continuous functions f : I ---+ JR.212 5. and one can show that k-1 ~ ~ f(X j ) . Tk) and then Tk+1 from 7r(T I y. r) ex exp{ -¢o(p. the full conditional distributions still have a simple form 1r(pl 1r(r Y. Of course. First we simulate X o. T). I Y. The fact that JR and [0.8n = . We now turn to an elaboration of this example where the Gibbs sampler is indispensible. say from the product form density 7r(J-l.( 0)2/2} exp { -r t. This is not an immediate consequence of the ergodic theorem for discrete statespace.00). Then (Xk)k~O is a Markov chain in I with invariant measure 7r(J-l. . but you may find it reasonable at an intuitive level.J r(a n . with a rate of convergence depending on the smoothness of 7r and f.(Yi ex r CYo -l+ n / 2 exp { -r (. Here the Gibbs sampler provides a particularly simple approach. We wish to simulate X = (J-l.t 00 1 with probability 1. At the kth stage. . T I y).J-lk+1).

x exp { -T t. . by a Gibbs sampler method. with means jjj and common variance 7. .jjn). for j = 1.. . with Let us write jj = (jj1.J-lj)2 /2} TQo-Hmn/2 exp{ -. so the Markov chain approach is the only one available.8n = f30 + L L(Yij . and j = 1. normally distributed.. . conditional on 7. . The prior density is given by and the posterior density is given by 1r(J-l.. the means jjj. T I y) ex exp { -¢o ~(J-lj . ~(Yij Hence the full conditional distributions are where n m On = 00 + mn/2.1 . Note that. . n.. . This has the. . 7 I y). Thus there are mn independent observations }!ij.m.5. Thus one can update all the means simultaneously in the Gibbs sampler.( 0 )2 /2} .. .8oT}.m. just as in the case m = 1 discussed above. We take these parameters to be independent random variables as before.5 Markov chain Monte Carlo 213 different means but a common variance. as is direct simulation from the distribution. i=1 j=1 We can construct approximate samples from 7r(jj. . where i = 1. In cases where m is large.. effect of speeding convergence to the equilibrium distribution.J-lj)2/2. remain independent. numerical integration of 7r(jj.7 I y) is infeasible..

the fact that X is forced to take boundary values 1 does not significantly affect the distribution of X(O) when N is large. This is one of the fundamental models of statistical physics. . but there are many related models which are not. Here we consider the problem of simulating the Ising model.0. m / } ~ A with 1m . First we describe a Gibbs sampler...x(m.= {(ml' m2) + m2 E A : ml + m2 E A : ml is even}..m'l = 1. We write I+ and for each (3 > ° = {x E I : x(m) = 1 for all m E 8A} define a probability distribution (rr(x) : x E I+) by 1t"(x) ex e-(3H(x). as (3 i 00 the mass concentrates on configurations x where H(x) is small. . For x E A define H(x) = ! I:: (x(m) . Note that H(x) is small when the values taken by x at neighbouring sites are predominantly the same. where simulation is still possible by simple modifications of the methods presented here. A famous and deep result of Onsager says that if X has distribution 1t".2 (Ising model and image analysis) Consider a large box A = AN in 71 2 A with boundary 8A = {-N. if sinh 2(3 ~ 1. In fact. is odd} .214 5.N}2 and the configuration space = AN\A N. Consider the sets of even and odd sites A+ = {(ml' m2) A. .5. then ° In particular.1.-1. uniformly in N. or even to detect phenomena quite out of reach of the current theory. Simulations may sometimes be used to guide further developments in the theory. the Ising model is rather well understood theoretically. whereas if sinh 2{3 > 1 there is a residual effect of the boundary values on X(O). whereas. As (3 ! the weighting becomes uniform. called the Ising model.))2 where the sum is taken over all pairs {m.1 .. Applications Example 5. .

5.I x+). to obtain X n+1 .x) = (1r(x)/1r(x)) 1\ 1 = e2 . we change the sign of Xt(m) with probability p(m. In a Bayesian analysis of two-dimensional images. We may encode a digitized image on a twodimensional grid as a particular configuration (x(m) : mEA) E I..+l with distribution 1r(. Then inductively. for large n.8 in the Ising model. simulate X. Then according to our general discussion. Note that we did not use the value of the normalizing constant Z = e-/3H(x) L xEI+ wllich is hard to compute by elementary means when N is large. where 1r has an approximate product structure on large scales. I x+). An alternative approach is to use a Metropolis algorithm. The process (Xn)n~O is then a Markov chain in 1+ with invariant distribution 1r. We can exploit the fact that the conditional distribution 1r(x+ product form 1r(X+ I x-) ex e/3x(m)s(m) I x-) has II mEA+\8A where. By varying the parameter . for mEA+\8A s(m) = L Im'-ml=l x-(m'). Given that X n = x. we vary the tendency of black pixels . Let us call the resulting configuration Y n . I x-) and then given X~+l = x+. there is little to choose between them. Next we apply the corresponding transformation to Y n. We can again exploit the even/odd partition. independently for each m E A+\8A. Choose now some simple initial configuration X o in 1+. = x-. the distribution of X n is approximately 1r. it is easy to simulate from 1r(x+ I x-) and likewise from 1r(x.5 Markov chain Monte Carlo and for x E I set 215 x± = (x(m) : m E A±). Convergence is fast in the subcritical case sinh 2. simulate firstly X~+l with distribution 1r(. Therefore.8 < 1. given X. Both methods we have described serve to simulate samples from 1r. the Ising model is sometimes used as a prior.Bx(m)s(m) 1\ 1 where x ~ x with x(m) = -x(m). where x(m) = 1 for a white pixel and x(m) = -1 for a black pixel.( m) for the odd sites m E A-\8A.

black or white. Observations are now made at each site which record the true pixel. with probability p E (0. Then (Xn)n~O is a Markov chain in /+ with invariant distribution 7r(.6x(m)s(m)((1_ p)/pt(m)y(m) where x ~ x with x(m) = -x(m).y) (1 .p)d(x. y) = (7r(x I Y)/7r(x I y)) = 1\ 1 e. Next apply the corresponding transformation to X~+1/2 for the odd sites to obtain X n+ 1 . Although this is not exactly the Ising model.2 . Applications to clump together. . which we choose according to the sort of image we expect. The posterior distribution for X given observations Y is then given by 7r(x I y) ex 7r(x)f(y I x) ex e. I y). the same methods work. Thus (3 is a sort of texture parameter. thus obtaining a prior 7r(x).1). y) are the numbers of sites at which x and y agree and disagree respectively. independently for each m E A+\8A. y) and d(x. x. We describe the appropriate Metropolis algorithm: given that X n = x. 'Cleaned-up' versions of the observed image Y may now be obtained by simulating from the posterior distribution. change the sign of X:(m) with probability p(m. the same for white pixels.f3H (x)pa(x.y) where a(x. Call the resulting configuration X n + 1 / 2 .216 5.

6 Appendix: probability and measure Section 6.1 contains some reminders about countable sets and the discrete version of measure theory. The basic framework of measure and probability is reviewed in Sections 6. For much of the book we can do without explicit mention of more general aspects of measure theory.5.n} ~ I for some n E N. . 6. One crucial result which we found impossible to discuss convincingly without measure theory is the strong Markov property for continuous-time chains. This is proved in Section 6. In either case we can enumerate all the elements of I .4. Two important results of measure theory. are needed a number of times: these are discussed in Section 6. which are often more convenient than a-algebras. This is because the state-space is at worst countable. Finally. with or without a measure-theoretic background. When interpreted in terms of measure theory. except an elementary understanding of Riemann integration or Lebesgue measure. The proofs we have given may be read on two levels. . the monotone convergence theorem and Fubini's theorem. . in Section 6.or a bijection f : N ~ I.2 and 6..6. the proofs are intended to be rigorous.1 Countable sets and countable sums A set I is countable if there is a bijection f : {I.3. we discuss a general technique for determining probability measures and independence in terms of 1r-systems.

The set of all subsets of N is uncountable and so is the set of real numbers JR. Lemma 6. ~ 0 for all i E I. There would have been no loss in generality had we insisted that all our Markov chains had state-space N or {I. Any countable union of countable sets is countable..1. ••• . for example tl n for any n. for any two enumerations of I ~1.1. Any subset of a countable set is countable. n=l and the result follows on letting N ~ D Since the value of the sum does not depend on the enumeration we are justified in using a notation which does not specify an enumeration and write simply More generally.. then we can set where .n} for some n E N: this just corresponds to a particular choice of the bijection f. if we allow Ai to take negative values.-J 't n=l n=l 00. Let I be a countably infinite set and let Ai Then. ~3. ~2. Appendix: probability and measure where in one case the sequence terminates and in the other it does not. Any finite cartesian product of countable sets is countable.-J In L. . We need the following basic fact. .218 6. we have 00 00 LAin n=l = LAin' n=l Proof. Given any N E N we can find M ~ Nand N' ~ M such that Then N M N' """ A· n -< """ A· -< """ A· n L.-J 't L.

We take the opportunity to prove these simple versions in order to convey some intuition relevant to the general case..1. Then LAi(n) i LAi as n iEI iEI -t 00. . There is no difficulty in showing for Ai.1 Countable sets and countable sums 219 allowing that the sum over I is undefined when the sums over I+ and Iare both infinite. Suppose for each i E I we are given an increasing sequence (Ai(n))n~O with limit Ai. for any finite set J and for 2 0. and that Ai ( n) 2 0 for all i and n.j3. be an enumeration of J.discrete case). we have L (LAi j) L (LAij).. Then L (LAi j) = L (LAij). . Lemma 6.j2.2 (Fubini's theorem .6. Then as n ~ 00. D Lemma 6.3 (Monotone convergence . Let I and J be countable sets and let Aij 2 0 for all i E I and j E J. = iEI iEI Aij iEI By induction. iEI jEJ jEJ iEI Proof. = iEI jEJ jEJ iEI The following two results on sums are simple versions of fundamental results for integrals. Let jl. Hence and the result follows by symmetry. .discrete case). jji 2 0 that I)Ai + Pi) L Ai + LPi.1.

220 6. A a-algebra £ on E is a set of subsets of E satisfying (i) 0 E £. (iii) (An E £. (~8i(k)) t. £.8i(k)) = t. The pair (E. £) is a function J-l : £ ~ [0. then we say J-l is a-finite. . J-l) is called a measure space. (ii) A E £ =* AC E £. For such A we obtain a measure on the measurable space (I.I).2 Basic facts of measure theory We state here for easy reference the basic definitions and results of measure theory. n E N with Un En = E and J-l(En ) < 00 for all n.n E N) =* Un An E £.2.I) by setting In fact.1 if Ai E [0. Example 6. Set 8i (1) = Ai(l) and for n ~ 2 set Then 8i (n) ~ 0 for all i and n. we obtain in this way all a-finite measures J-l on (I. A measure J-l on (E. (~8i(k)) i = L (f 8i (k)) LAi = iEI k=l iEI o D 6.00] which has the following countable additivity property: The triple (E.1 Let I be a countable set and denote by I the set of all subsets of I. Appendix: probability and measure Proof. so as n ~ 00. £) is called a measurable space.00) for all i. Here AC denotes the complement E\A of A in E. Thus £ is closed under countable set operations. Recall that A = (Ai: i E I) is a measure in the sense of Section 1. If there exist sets En E £. by Fubini's theorem ~Ai(n) = ~ (t. Let E be a set.

When the range E 2 = JR we take £2 = B by default. We denote by m£+ the set of measurable functions f : E ~ [0.-t(a.00] such that (i) ji(lA) = J. We denote by m£ the set of measurable functions f : E ~ JR. for a sequence of functions f n E m£+. It can be shown that there is a unique measure J. where we take on [0. Then m£ is a vector space. i It follows that. The intersection of any collection of a-algebras is again a a-algebra. (ii) ji(o:f + (3g) = o:ji(f) + (3ji(f) for all f. 0:. Let (E. When the range E 2 is a countable set I we take £2 to be the set of all subsets I by default.6.b): a.3 In the preceding example take E = JR and A = {(a. n E N) ~ ji(En fn) = En ji(fn). £2) be measurable spaces. The collection of a-algebras containing A is therefore non-empty and its intersection is a a-algebra a(A). b.b E JR. . £) be a measllrable space.2.-t is called Lebesgue measure. b) = b .2 221 Let A be any set of subsets of E.-t(A) for all A E £. 00]. and so is limn fn when this exists.-t on (JR. Example 6. (iii) (fn E m£+.0:. The set of all subsets of E is a aalgebra containing A.2. A function f : E 1 ~ E 2 is measurable if f-1(A) E £1 whenever A E £2. The a-algebra B generated by A is called the Borel a-algebra of JR. (3 ~ 0) ~ o:f + {3g E m£+. both lim sUPn f nand lim inf n fn are in m£+. which is called the a-algebra generated by A. This measure J. £1) and (E2 . {3 ~ 0.a for all a. It can be shown that there is a unique map ji : m£+ ~ [0. 00] the a-algebra generated by the open intervals (a. b).a < b}. B) such that J. Then m£+ is a cone (f. Let (E 1 . 9 E m£+ .i E I) ~ SUpfi E m£+.2 Basic facts of measure theory Example 6. m£+ is closed under countable suprema: (fi E m£+.g E m£+. Also.

1] satisfies (i) P(O) = 1. r jj. (ii) P(AI n A 2 ) = P(A I ) + P(A 2 ) for AI. Appendix: probability and measure For f E mE. + (3Y) = oE(X) + (3E(Y) for X.E m£+.and IfI = f+ + f-· If jt(lfl) < 00 then f is said to be integrable and we set We call ji(f) the integral of f. It is conventional to drop the tilde and denote the integral by one of the following alternative notations: p(J) = lE r fdp = lXEE f(x)p(dx). o. We use random variables Y : 0 ~ lR to model random quantities. To every non-negative or integrable real-valued random variable Y is associated an average value or expectation E(Y).{3 ~ 0. set f± = (±f) V 0. where for a Borel set B ~ lR the probability that Y E B is given by P(Y E B) = P({w: Y(w) E B}). A 2 disjoint. Thus F is a a-algebra of subsets of 0 and P : F ~ [0. In (iii) we write An i A to mean Al ~ An ~ .3 Probability spaces and expectation The basic apparatus for modelling randomness is a probability space (0.. A measurable function X defined on (0. which is the integral of Y with respect to P. with Un An = A. then f+. Y E mF+. ~ Similarly.. Thus we have (i) E(IA) (ii) E(oX = P(A) for A E F. F) is called a random variable. . This is simply a measure space with total mass P(O) = 1.f. a random variable X : 0 models a random state. (iii) P(A n ) i P(A) whenever An i A. with distribution I Ai = P(X = i) = p({w : X(w) = i}). f = f+ . given a countable state-space I. P).f. In the case of Lebesgue measure one usually writes simply lXEJR r f(x)dx. 6. F.222 6.

as n ~ 00 (fn(x) i f(x) for all x E E) :::} J-t(fn) i J-t(f)· Theorem 6. y) : E 2 ~ [0. Then (a) y ~ f(x. Let (E 1 . n E N. 1991).00] the LAdi iEI where A is the distribution of X. 6. (b) x ~ f yE E2 f(x. (ii) Y ~ IXEE 1 f(x. When X is a random variable with values in I and f expectation of Y = f(X) = foX is given explicitly by E(J(X)) = 223 : I ~ [0. in Probability with Martingales by D.00] is £1 measurable. J-l1) and (E2.00] is £2 measurable. £. For a real-valued random variable Y the probabilities are sometimes given by a measurable density function p in terms of Lebesgue measure: P(Y E B) = Then for any measurable function L p(y)dy. J-t) be a measure space and let (fn)n~l be a sequence of non-negative measurable functions. Yn i Y) :::} IE(Yn ) i IE(Y). Williams (Cambridge University Press. 00] satisfies (i) x ~ f(x.6. (c) r yEE2 JxEE (1 f(x. Then.2 (Fubini's theorem). y)J-t1(dx) : E 2 ~ [0. ') J l l ') . Suppose that f : E 1 x E 2 ~ [0. Y)J-l2(dy) : E 1 ~ [0. f : lR ~ [0.00] is £1 measurable for all Y E E 2.4 Monotone convergence and Fubini's theorem Here are the two theorems from measure theory that come into play in the main text. for example. then we shall discuss some places where they are used. Let (E. Theorem 6.4 Monotone convergence and Fubini's theorem (iii) (Yn E mF+.4. First we shall state the theorems.1 (Monotone convergence). J-l2) be two a-finite measure spaces.4. y) : E 1 ~ [0. Proofs may be found.00] is £2 measurable for all x E E 1. £1.00] there is an explicit formula E(J(Y)) = L f(y)p(y)dy. £2.y)J-t2(dy~J-tl(dx)=1yEE2 (rxEE f(X.Y)J-tl(dx~J-t2(dY).

} and J-t2( {n}) = 1 for all n.2 to see that for random variables Sn ~ 0 we have E(LSn) = LE(Sn) n n and E(exp { . 2.4. £1.X) and if IE(Y) < 00 we can deduce IE(Xn ) ! IE(X). £2. n~N In the last application convergence is not monotone increasing but monotone decreasing.00) with Lebesgue measure and (E2. We used monotone convergence in Theorem 1.2 as a defining property of the integral. Appendix: probability and measure The measurability conditions in the above theorems rarely need much consideration.5 Stopping times and the strong Markov property The strong Markov property for continuous-time Markov chains cannot properly be understood without measure theory. The problem lies with the .2 to see that Thus we have taken (E 1. + 9n.. J-t) is a-finite: just take E 2 = {I.X n ) i IE(Y .X n i Y . . We used monotone convergence in Theorem 2. 6.. provided that (E.L Sn}). 3. There is an equivalent formulation of monotone convergence in terms of sums: for non-negative measurable functions 9n we have To see this just take . They are powerful results and very easy to use. So IE(Y .X.LSn}) =E(J~= exp { . But if 0 ~ X n ~ Y and X n ! X then Y .10..L Sn}) n n~N =J~= E(exp { . Fubini's theorem is used in Theorem 3.3. This is also a special case of Fubini's theorem. J-t1) to be [0. This form of monotone convergence has already appeared in Section 6. J-t2) to be the probability space with the measure Pi.224 6.. £.1 to see that for a nonnegative random variable Y we have IE(Y) = N--+oo lim IE(Y /\ N).fn = 91 +.

6.5 Stopping times and the strong Markov property

The strong Markov property for continuous-time Markov chains cannot properly be understood without measure theory. The problem lies with the notion of 'depending only on', which in measure theory is made precise as measurability with respect to some σ-algebra. Without measure theory the statement that a set A depends only on (X_s : s ≤ t) does not have a precise meaning. Of course, if the dependence is reasonably explicit we can exhibit it, but, in general, in what terms would you require the dependence to be exhibited? So in this section we shall give a precise measure-theoretic account of the strong Markov property.

Let (X_t)_{t≥0} be a right-continuous process with values in a countable set I. Denote by F_t the σ-algebra generated by {X_s : s ≤ t}, that is to say, by all sets {X_s = i} for s ≤ t and i ∈ I. We say that a random variable T with values in [0, ∞] is a stopping time of (X_t)_{t≥0} if {T ≤ t} ∈ F_t for all t ≥ 0. Note that this certainly implies

    {T < t} = ∪_n {T ≤ t − 1/n} ∈ F_t for all t ≥ 0.

We define for stopping times T

    F_T = {A ∈ F : A ∩ {T ≤ t} ∈ F_t for all t ≥ 0}.

This turns out to be the correct way to make precise the notion of sets which 'depend only on {X_t : t ≤ T}'.

Lemma 6.5.1. Let S and T be stopping times of (X_t)_{t≥0}. Then both X_T and {S ≤ T} are F_T-measurable.

Proof. Since (X_t)_{t≥0} is right-continuous, on {T < t} there exists an n ≥ 0 such that for all m ≥ n, for some k ≥ 1, (k − 1)2^{−m} ≤ T < k2^{−m} ≤ t and X_{k2^{−m}} = X_T. Hence

    {X_T = i} ∩ {T ≤ t} = ({T = t} ∩ {X_t = i}) ∪ ∪_{n≥0} ∩_{m≥n} ∪_{k≥1} ({(k − 1)2^{−m} ≤ T < k2^{−m} ≤ t} ∩ {X_{k2^{−m}} = i}) ∈ F_t,

so X_T is F_T-measurable. We have

    {S > T} ∩ {T ≤ t} = ∪_{s∈ℚ, s≤t} ({T ≤ s} ∩ {S > s}) ∈ F_t

so {S > T} ∈ F_T, and so {S ≤ T} ∈ F_T. □
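The defining property {T ≤ t} ∈ F_t is easiest to see in a discrete-time analogue: whether a hitting time has occurred by time n can be decided from X_0, …, X_n alone. The three-state chain in the following sketch is an arbitrary illustrative choice.

import random

# hypothetical chain: transition probabilities as (target, probability) pairs
P = {0: [(0, 0.5), (1, 0.5)], 1: [(0, 0.3), (2, 0.7)], 2: [(2, 1.0)]}

def step(i):
    r, acc = random.random(), 0.0
    for j, p in P[i]:
        acc += p
        if r < acc:
            return j
    return P[i][-1][0]

def hit_by(path, j):
    """Indicator of {T_j <= n}, computed from the finite segment path = (X_0,...,X_n) only."""
    return any(x == j for x in path)

path = [0]
for _ in range(20):
    path.append(step(path[-1]))
print(path, hit_by(path, 2))

# By contrast, the time of the last visit to a state is not a stopping time: whether it
# has already occurred by time n cannot be read off from (X_0,...,X_n) alone.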

Lemma 6.5.2. For all m ≥ 0, the jump time J_m is a stopping time of (X_t)_{t≥0}.

Proof. Obviously, J_0 = 0 is a stopping time. Assume inductively that J_m is a stopping time. Then

    {J_{m+1} ≤ t} = ∪_{s∈ℚ, s≤t} ({J_m ≤ s} ∩ {X_s ≠ X_{J_m}}) ∈ F_t

for all t ≥ 0, so J_{m+1} is a stopping time and the induction proceeds. □

Denote by (Y_n)_{n≥0} the jump chain and by (S_n)_{n≥1} the holding times of (X_t)_{t≥0}. We denote by G_m the σ-algebra generated by Y_0, …, Y_m and S_1, …, S_m, that is, by events of the form {Y_k = i} for k ≤ m and i ∈ I or of the form {S_k > s} for k ≤ m and s > 0.

Lemma 6.5.3. Let T be a stopping time of (X_t)_{t≥0} and let A ∈ F_T. Then for all m ≥ 0 there exist a random variable T_m and a set A_m, both measurable with respect to G_m, such that T = T_m and 1_A = 1_{A_m} on {T < J_{m+1}}.

Proof. Fix t ≥ 0 and consider

    A_t = {C ∈ F_t : C ∩ {t < J_{m+1}} = B ∩ {t < J_{m+1}} for some B ∈ G_m}.

Since G_m is a σ-algebra, so is A_t. For s ≤ t we have

    {X_s = i} ∩ {t < J_{m+1}} = ((∪_{k=0}^{m−1} {Y_k = i, J_k ≤ s < J_{k+1}}) ∪ {Y_m = i, J_m ≤ s}) ∩ {t < J_{m+1}}

so {X_s = i} ∈ A_t. Since these sets generate F_t, this implies that A_t = F_t.

For T a stopping time and A ∈ F_T we have B(t) := {T ≥ t} ∈ F_t and A(t) := A ∩ {T ≤ t} ∈ F_t for all t ≥ 0. So we can find B_m(t), A_m(t) ∈ G_m such that

    B(t) ∩ {t < J_{m+1}} = B_m(t) ∩ {t < J_{m+1}},    A(t) ∩ {t < J_{m+1}} = A_m(t) ∩ {t < J_{m+1}}.

Set

    T_m = sup_{t∈ℚ} t 1_{B_m(t)},    A_m = ∪_{t∈ℚ} A_m(t);

then T_m and A_m are G_m-measurable, and

    T_m 1_{{T < J_{m+1}}} = sup_{t∈ℚ} t 1_{B_m(t) ∩ {T < J_{m+1}}} = (sup_{t∈ℚ} t 1_{{T ≥ t}}) 1_{{T < J_{m+1}}} = T 1_{{T < J_{m+1}}}

and

    A_m ∩ {T < J_{m+1}} = ∪_{t∈ℚ} A_m(t) ∩ {T < J_{m+1}} = ∪_{t∈ℚ} (A ∩ {T ≤ t}) ∩ {T < J_{m+1}} = A ∩ {T < J_{m+1}}

as required. □
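The objects Y_n, S_n, J_n, and the information G_m they carry, can be made concrete by simulating the jump chain/holding time construction; the Q-matrix in the following sketch is an arbitrary three-state example.

import random

# Minimal sketch of the jump chain / holding time construction for an assumed Q-matrix:
# in state i the chain holds for an Exp(q_i) time, then jumps according to the jump matrix.
Q = {0: {0: -2.0, 1: 1.0, 2: 1.0},
     1: {0: 3.0, 1: -4.0, 2: 1.0},
     2: {0: 1.0, 1: 1.0, 2: -2.0}}

def simulate(i0, m_max):
    Y, S, J = [i0], [], [0.0]
    for _ in range(m_max):
        i = Y[-1]
        q_i = -Q[i][i]
        hold = random.expovariate(q_i)
        targets = [(j, Q[i][j] / q_i) for j in Q[i] if j != i]   # one row of the jump matrix
        r, acc, nxt = random.random(), 0.0, targets[-1][0]
        for j, p in targets:
            acc += p
            if r < acc:
                nxt = j
                break
        Y.append(nxt); S.append(hold); J.append(J[-1] + hold)
    return Y, S, J

Y, S, J = simulate(0, 5)
print("jump chain Y:", Y)
print("holding times S:", [round(s, 3) for s in S])
print("jump times J:", [round(t, 3) for t in J])
# The sigma-algebra G_m of the lemma is exactly the information in Y_0,...,Y_m and S_1,...,S_m.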

Theorem 6.5.4 (Strong Markov property). Let (X_t)_{t≥0} be Markov(λ, Q) and let T be a stopping time of (X_t)_{t≥0}. Then, conditional on T < ζ and X_T = i, (X_{T+t})_{t≥0} is Markov(δ_i, Q) and independent of F_T.
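Before the proof, here is a rough Monte Carlo check of the statement in a special case: taking T to be a fixed time t_0 (a stopping time), the residual time from t_0 to the next jump, conditional on X_{t_0} = i, should be exponential of parameter q_i, whatever happened before t_0. The Q-matrix, t_0 and threshold below are arbitrary illustrative choices.

import random
from math import exp

Q = {0: {1: 1.0, 2: 1.0}, 1: {0: 3.0, 2: 1.0}, 2: {0: 1.0, 1: 1.0}}   # off-diagonal rates
q = {i: sum(Q[i].values()) for i in Q}

def run_until(t0, i0=0):
    t, i = 0.0, i0
    while True:
        hold = random.expovariate(q[i])
        if t + hold > t0:
            return i, t + hold - t0            # state at t0 and residual time to next jump
        t += hold
        r, acc = random.random() * q[i], 0.0
        for j, rate in Q[i].items():
            acc += rate
            if r < acc:
                i = j
                break

random.seed(7)
results = [run_until(2.0) for _ in range(100_000)]
for i in Q:
    sample = [s for (state, s) in results if state == i]
    frac = sum(1 for s in sample if s > 0.5) / len(sample)
    print(i, round(frac, 3), round(exp(-q[i] * 0.5), 3))   # empirical vs exp(-q_i * 0.5)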

Proof. On {T < ζ} set X̃_t = X_{T+t} and denote by (Ỹ_n)_{n≥0} the jump chain and by (S̃_n)_{n≥1} the holding times of (X̃_t)_{t≥0}. We have to show that, for all A ∈ F_T, all i_0, …, i_n ∈ I and all s_1, …, s_n ≥ 0,

    P({Ỹ_0 = i_0, …, Ỹ_n = i_n, S̃_1 > s_1, …, S̃_n > s_n} ∩ A ∩ {T < ζ} ∩ {X_T = i})
        = P_i(Y_0 = i_0, …, Y_n = i_n, S_1 > s_1, …, S_n > s_n) P(A ∩ {T < ζ} ∩ {X_T = i}).

It suffices to prove this with {T < ζ} replaced by {J_m ≤ T < J_{m+1}} for all m ≥ 0, and then sum over m. By Lemma 6.5.1,

    {J_m ≤ T} ∩ {X_T = i} ∈ F_T

so we may assume without loss of generality that A ⊆ {J_m ≤ T} ∩ {X_T = i}. By Lemmas 6.5.1 and 6.5.3 we can write T = T_m and 1_A = 1_{A_m} on {T < J_{m+1}}, where T_m and A_m are G_m-measurable. On {J_m ≤ T < J_{m+1}} we have, as shown in the diagram of the holding times about time T,

    Ỹ_0 = Y_m,  Ỹ_n = Y_{m+n},  S̃_1 = S_{m+1} − (T − J_m),  S̃_n = S_{m+n} for n ≥ 2.

Now, conditional on Y_m = i, S_{m+1} is independent of G_m, and hence of T_m − J_m and A_m, and, by the memoryless property of the exponential distribution, conditional on Y_m = i,

    P(S_{m+1} > r + s | S_{m+1} > r) = e^{−q_i s} for all r, s ≥ 0.

Hence, by the Markov property of the jump chain,

    P({Ỹ_0 = i_0, …, Ỹ_n = i_n, S̃_1 > s_1, …, S̃_n > s_n} ∩ A ∩ {J_m ≤ T < J_{m+1}} ∩ {X_T = i})
        = P({Y_m = i_0, …, Y_{m+n} = i_n, S_{m+1} > s_1 + (T_m − J_m), S_{m+2} > s_2, …, S_{m+n} > s_n} ∩ A_m ∩ {S_{m+1} > T_m − J_m})
        = P_i(Y_0 = i_0, …, Y_n = i_n, S_1 > s_1, …, S_n > s_n) P(A ∩ {J_m ≤ T < J_{m+1}} ∩ {X_T = i})

as required. □

6.6 Uniqueness of probabilities and independence of σ-algebras

For both discrete-time and continuous-time Markov chains we have given definitions which specify the probabilities of certain events determined by the process. From these specified probabilities we have often deduced explicitly the values of other probabilities, for example hitting probabilities. In this section we shall show, in measure-theoretic terms, that our definitions determine the probabilities of all events depending on the process. The constructive approach we have taken should make this seem obvious, but it is illuminating to see what has to be done.

Let Ω be a set. A π-system 𝒜 on Ω is a collection of subsets of Ω which is closed under finite intersections. We denote as usual by σ(𝒜) the σ-algebra generated by 𝒜. If σ(𝒜) = F we say that 𝒜 generates F.
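The π-system machinery can be explored on a small finite example. The sketch below brute-forces the σ-algebra generated by a two-set π-system on a four-point Ω (both chosen arbitrarily) and checks that two probability measures which agree on the π-system agree on every set of the generated σ-algebra, which is exactly the phenomenon formalized in the next theorem.

from itertools import chain, combinations

Omega = frozenset({1, 2, 3, 4})
A = [frozenset({1}), frozenset({1, 2})]          # a pi-system (closed under intersection)

def sigma(generators):
    """Brute-force the generated sigma-algebra by closing under complement and union."""
    sets = {frozenset(), Omega} | set(generators)
    while True:
        new = set(sets)
        new |= {Omega - s for s in sets}
        new |= {s | t for s in sets for t in sets}
        if new == sets:
            return sets
        sets = new

def prob(weights, s):
    return sum(weights[w] for w in s)

p1 = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}
p2 = {1: 0.1, 2: 0.2, 3: 0.5, 4: 0.2}            # differs from p1 only in how {3,4} is split
F = sigma(A)
print(sorted(tuple(sorted(s)) for s in F))
print(all(abs(prob(p1, s) - prob(p2, s)) < 1e-12 for s in F))   # True: agreement on sigma(A)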

Theorem 6.6.1. Let (Ω, F) be a measurable space. Let P_1 and P_2 be probability measures on (Ω, F) which agree on a π-system 𝒜 generating F. Then P_1 = P_2.

Proof. Consider

    D = {A ∈ F : P_1(A) = P_2(A)}.

Since P_1 and P_2 are probability measures, D has the following properties:

(i) Ω ∈ D;
(ii) (A, B ∈ D and A ⊆ B) ⇒ B \ A ∈ D;
(iii) (A_n ∈ D, A_n ↑ A) ⇒ A ∈ D.

Any collection of subsets having these properties is called a d-system. We have assumed that 𝒜 ⊆ D. Since 𝒜 generates F, the result now follows from the following lemma. □

Lemma 6.6.2 (Dynkin's π-system lemma). Let 𝒜 be a π-system and let D be a d-system. Suppose 𝒜 ⊆ D. Then σ(𝒜) ⊆ D.

Proof. Any intersection of d-systems is again a d-system, so we may without loss assume that D is the smallest d-system containing 𝒜. You may easily check that any d-system which is also a π-system is necessarily a σ-algebra, so it suffices to show D is a π-system. This we do in two stages. Consider first

    D_1 = {A ∈ D : A ∩ B ∈ D for all B ∈ 𝒜}.

Since 𝒜 is a π-system, 𝒜 ⊆ D_1. You may easily check that D_1 is a d-system, because D is a d-system. Since D is the smallest d-system containing 𝒜, this shows D_1 = D. Next consider

    D_2 = {A ∈ D : A ∩ B ∈ D for all B ∈ D}.

Since D_1 = D, we have 𝒜 ⊆ D_2. You can easily check that D_2 is also a d-system. Hence also D_2 = D. But this shows D is a π-system. □

The notion of independence used in advanced probability is the independence of σ-algebras. Suppose that (Ω, F, P) is a probability space and F_1 and F_2 are sub-σ-algebras of F. We say that F_1 and F_2 are independent if

    P(A_1 ∩ A_2) = P(A_1) P(A_2) for all A_1 ∈ F_1, A_2 ∈ F_2.

The usual means of establishing such independence is the following corollary of Theorem 6.6.1.

Theorem 6.6.3. Let 𝒜_1 be a π-system generating F_1 and let 𝒜_2 be a π-system generating F_2. Suppose that

    P(A_1 ∩ A_2) = P(A_1) P(A_2) for all A_1 ∈ 𝒜_1, A_2 ∈ 𝒜_2.

Then F_1 and F_2 are independent.
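Before turning to the proof, here is a toy numerical check of the statement: on Ω = {0,1} × {0,1} with a product measure, the coordinate σ-algebras are generated by the one-set π-systems {first coordinate = 1} and {second coordinate = 1}; verifying the product formula on those two sets is enough, and the sketch confirms it then holds for every pair of sets in the generated σ-algebras. All numbers below are arbitrary.

from itertools import product

px = {0: 0.3, 1: 0.7}                       # arbitrary marginal for the first coordinate
py = {0: 0.6, 1: 0.4}                       # arbitrary marginal for the second coordinate
Omega = set(product(px, py))
P = lambda S: sum(px[a] * py[b] for (a, b) in S)

A1 = {w for w in Omega if w[0] == 1}        # pi-system generating F1 = sigma(first coordinate)
A2 = {w for w in Omega if w[1] == 1}        # pi-system generating F2 = sigma(second coordinate)
print(abs(P(A1 & A2) - P(A1) * P(A2)) < 1e-12)           # product formula on the generators

F1 = [set(), A1, Omega - A1, Omega]         # sigma({A1}) listed explicitly
F2 = [set(), A2, Omega - A2, Omega]         # sigma({A2}) listed explicitly
print(all(abs(P(C1 & C2) - P(C1) * P(C2)) < 1e-12 for C1 in F1 for C2 in F2))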

Proof. There are two steps. First fix A_2 ∈ 𝒜_2 with P(A_2) > 0 and consider the probability measure

    P̃(A) = P(A ∩ A_2) / P(A_2).

We have assumed that P̃(A) = P(A) for all A ∈ 𝒜_1, so, by Theorem 6.6.1, P̃ = P on F_1. Next fix A_1 ∈ F_1 with P(A_1) > 0 and consider the probability measure

    P̂(A) = P(A_1 ∩ A) / P(A_1).

We showed in the first step that P̂(A) = P(A) for all A ∈ 𝒜_2, so, by Theorem 6.6.1, P̂ = P on F_2. Hence F_1 and F_2 are independent. □

We now review some points in the main text where Theorems 6.6.1 and 6.6.3 are relevant.

In Theorem 1.1.1 we showed that our definition of a discrete-time Markov chain (X_n)_{n≥0} with initial distribution λ and transition matrix P determines the probabilities of all events of the form

    {X_0 = i_0, X_1 = i_1, …, X_n = i_n}.

But subsequently we made explicit calculations for probabilities of events which were not of this form, such as the event that (X_n)_{n≥0} visits a set of states A. We note now that the events {X_0 = i_0, X_1 = i_1, …, X_n = i_n} form a π-system which generates the σ-algebra σ(X_n : n ≥ 0). Hence, by Theorem 6.6.1, our definition determines (in principle) the probabilities of all events in this σ-algebra.

In our general discussion of continuous-time random processes in Section 2.2 we claimed that for a right-continuous process (X_t)_{t≥0} the probabilities of events of the form

    {X_{t_0} = i_0, X_{t_1} = i_1, …, X_{t_n} = i_n}

for all n ≥ 0 determined the probabilities of all events depending on (X_t)_{t≥0}. Now events of this form constitute a π-system which generates the σ-algebra σ(X_t : t ≥ 0). So Theorem 6.6.1 justifies (a precise version of) this claim. The point about right-continuity is that, without such an assumption, an event such as

    {X_t = i for some t > 0},

which might reasonably be considered to depend on (X_t)_{t≥0}, is not necessarily measurable with respect to σ(X_t : t ≥ 0).
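The role of the cylinder events can be illustrated for a small discrete-time chain: from the probabilities the definition assigns to the events {X_0 = i_0, …, X_n = i_n} one obtains, by summation, the probability of visiting a given state by time n, and these sums increase towards the hitting probability. The transition matrix below is an arbitrary example in which state 2 is absorbing.

from itertools import product

lam = {0: 1.0, 1: 0.0, 2: 0.0}
P = {0: {0: 0.6, 1: 0.3, 2: 0.1},
     1: {0: 0.2, 1: 0.5, 2: 0.3},
     2: {0: 0.0, 1: 0.0, 2: 1.0}}

def cylinder(path):
    """P(X_0 = path[0], ..., X_n = path[n]) = lambda_{i_0} p_{i_0 i_1} ... p_{i_{n-1} i_n}."""
    pr = lam[path[0]]
    for a, b in zip(path, path[1:]):
        pr *= P[a][b]
    return pr

for n in (1, 2, 4, 8, 12):
    hit = sum(cylinder((0,) + path) for path in product(P, repeat=n) if 2 in path)
    print(n, round(hit, 4))          # increases towards the probability of ever visiting 2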

An argument given in Section 2.2 shows that this event is measurable in the right-continuous case. We conclude that, without some assumption like right-continuity, general continuous-time processes are unreasonable.

Consider now the method of describing a minimal right-continuous process (X_t)_{t≥0} via its jump process (Y_n)_{n≥0} and holding times (S_n)_{n≥1}. Let us take F = σ(X_t : t ≥ 0). Then Lemmas 6.5.1 and 6.5.2 show that (Y_n)_{n≥0} and (S_n)_{n≥1} are F-measurable. Thus G ⊆ F, where G = σ((Y_n)_{n≥0}, (S_n)_{n≥1}). On the other hand, for all i ∈ I,

    {X_t = i} = ∪_{n≥0} ({J_n ≤ t < J_{n+1}} ∩ {Y_n = i}) ∈ G,

so also F ⊆ G, and hence F = G. A useful π-system generating G is given by sets of the form

    B = {Y_0 = i_0, …, Y_n = i_n, S_1 > s_1, …, S_n > s_n}.

Our jump chain/holding time definition of the continuous-time chain (X_t)_{t≥0} with initial distribution λ and generator matrix Q may be read as stating that, for such events,

    P(B) = λ_{i_0} π_{i_0 i_1} ⋯ π_{i_{n−1} i_n} e^{−q_{i_0} s_1} ⋯ e^{−q_{i_{n−1}} s_n}.

By Theorem 6.6.1, this definition determines P on G and hence on F.

Finally, we consider the strong Markov property. Assume that (X_t)_{t≥0} is Markov(λ, Q) and that T is a stopping time of (X_t)_{t≥0}. On the set Ω̃ = {T < ζ} define X̃_t = X_{T+t}, and let F̃ = σ(X̃_t : t ≥ 0); write (Ỹ_n)_{n≥0} and (S̃_n)_{n≥1} for the jump chain and holding times of (X̃_t)_{t≥0}, and set G̃ = σ((Ỹ_n)_{n≥0}, (S̃_n)_{n≥1}). Thus F̃ and G̃ are σ-algebras on Ω̃, and F̃ = G̃ by the same argument as for F = G. Then the conclusion of the strong Markov property states that

    P(B̃ | T < ζ, X_T = i) = P_i(B)

for events of the form

    B̃ = {Ỹ_0 = i_0, …, Ỹ_n = i_n, S̃_1 > s_1, …, S̃_n > s_n},

and that

    P(C ∩ A | T < ζ, X_T = i) = P(C | T < ζ, X_T = i) P(A | T < ζ, X_T = i)

for all C ∈ F̃ and A ∈ F_T. By Theorem 6.6.3 it suffices to prove the independence assertion for the case C = B̃, with B̃ as above, which is what we did in the proof of Theorem 6.5.4.
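The displayed formula for P(B) can be checked directly by simulation. In the following sketch the Q-matrix, the path i_0, i_1, i_2 and the thresholds s_1, s_2 are arbitrary choices, and the initial distribution is taken to be the unit mass at i_0.

import random
from math import exp

Q = {0: {1: 2.0, 2: 1.0}, 1: {0: 1.0, 2: 3.0}, 2: {0: 2.0, 1: 2.0}}   # off-diagonal rates
q = {i: sum(Q[i].values()) for i in Q}
pi = {i: {j: r / q[i] for j, r in Q[i].items()} for i in Q}

path, s = [0, 1, 2], [0.3, 0.2]           # i_0, i_1, i_2 and thresholds s_1, s_2
formula = 1.0                              # lambda = delta_0, so lambda_{i_0} = 1
for k in range(len(s)):
    formula *= pi[path[k]][path[k + 1]] * exp(-q[path[k]] * s[k])
print(round(formula, 4))

def sample_event():
    i = 0
    for k in range(len(s)):
        if random.expovariate(q[i]) <= s[k]:                 # need S_{k+1} > s_{k+1}
            return False
        j = random.choices(list(pi[i]), weights=list(pi[i].values()))[0]
        if j != path[k + 1]:                                  # need Y_{k+1} = i_{k+1}
            return False
        i = j
    return True

random.seed(3)
hits = sum(sample_event() for _ in range(300_000))
print(round(hits / 300_000, 4))            # agrees with the product formula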

Further reading

We gather here the references for further reading which have appeared in the text. This may provide a number of starting points for your exploration of the vast literature on Markov processes and their applications.

J. Besag, P. Green, D. Higdon and K. Mengersen, Bayesian computation and stochastic systems, Statistical Science 10 (1) (1995), 3-40.

K.L. Chung, Markov Chains with Stationary Transition Probabilities, Springer, Berlin, 2nd edition, 1967.

P.G. Doyle and J.L. Snell, Random Walks and Electrical Networks, Carus Mathematical Monographs 22, Mathematical Association of America, 1984.

W.J. Ewens, Mathematical Population Genetics, Springer, Berlin, 1979.

D. Freedman, Markov Chains, Holden-Day, San Francisco, 1971.

W.R. Gilks, S. Richardson and D.J. Spiegelhalter, Markov Chain Monte Carlo in Practice, Chapman and Hall, London, 1996.

T.E. Harris, The Theory of Branching Processes, Dover, New York, 1989.

F.P. Kelly, Reversibility and Stochastic Networks, Wiley, Chichester, 1979.

D. Revuz, Markov Chains, North-Holland, Amsterdam, 2nd edition, 1984.

B.D. Ripley, Stochastic Simulation, Wiley, Chichester, 1987.

L.C.G. Rogers and D. Williams, Diffusions, Markov Processes and Martingales, Vol 1: Foundations, Wiley, Chichester, 2nd edition, 1994.

S.M. Ross, Applied Probability Models with Optimization Applications, Holden-Day, San Francisco, 1970.

D.W. Stroock, Probability Theory - An Analytic View, Cambridge University Press, 1993.

H.C. Tijms, Stochastic Models - an Algorithmic Approach, Wiley, Chichester, 1994.

D. Williams, Probability with Martingales, Cambridge University Press, 1991.

Index

absorbing state 11, 111
absorption probability 12, 112
action 198
adapted 129
alleles 175
aperiodicity 40
average number of customers 181
backward equation 62, 82, 96
Bayesian statistics 211
biological models 6, 170
birth process 81
  infinitesimal definition 85
  jump chain/holding time definition 85
  transition probability definition 85
birth-and-death chain 16
boundary 138
boundary theory for Markov chains 147
branching process 171
  with immigration 179
Brownian motion 159
  as limit of random walks 164
  existence 161
  in R^d 165
  scaling invariance 165
  starting from x 165
  transition density 166
busy period 181
capacity 151
central limit theorem 160
charge 151
closed class 11, 111
closed migration process 184
communicating class 11, 111
conditional expectation 129
conductivity 151
continuous-time Markov chains 87
  construction 89
  infinitesimal definition 94
  jump chain/holding time definition 94, 97
  transition probability definition 94, 97
continuous-time random process 67
convergence to equilibrium 41, 121, 168
countable set 217
coupling method 41
current 151

detailed balance 48, 124
discounted value function 202
distribution 1
E, expectation 222
E(λ), exponential distribution of parameter λ 70
e^Q, exponential of Q 62
effective conductivity 156
electrical network 151
energy 154
epidemic model 173
equilibrium distribution 33, 117
ergodic theorem 53, 126, 168
Erlang's formula 183
excursions 24
expectation 222
expected hitting time 12, 113
expected return time 37, 118
explosion
  for birth processes 83
  for continuous-time chains 90
explosion time 69
explosive Q-matrix 91
exponential distribution 70
F_n, filtration 129
fair price 135
filtration 129
finite-dimensional distributions 67
first passage decomposition 28
first passage time 19, 115
flow 151
forward equation 62, 100
  for birth processes 84
  for Poisson process 78
Fubini's theorem 223
  discrete case 219
full conditional distributions 212
fundamental solution 145
Galton-Watson process 171
gambling 131
γ_i^k, expected time in i between visits to j 35
Gaussian distribution 160
generator 166
generator matrix 94, 97
Gibbs sampler 210
gravity 134, 169
Green matrix 144, 145
harmonic function 146
Hastings algorithm 210
hitting probability 12
hitting time 12, 112
holding times 69
I, state-space 2
infective 173
integrable 129, 222
integral form of the backward equation 98
integral form of the forward equation 101
inter-arrival times 180
invariant distribution 33, 117
  computation of 40
irreducibility 11, 111
Ising model 214
jump chain 69
jump matrix 87
jump times 69
last exit time 20
long-run proportion of time 53, 118
m_i, expected return time 37, 118
Markov(λ, P) 2
Markov(λ, Q) 94, 97
Markov chain
  continuous-time 88
  discrete-time 2
Markov chain Monte Carlo 206, 208
Markov decision process 197
  expected total cost 198
  expected total discounted cost 202
  long-run average costs 204
Markov property 3
  for birth processes 84
  for continuous-time chains 93
  for Poisson process 75
martingale 129

matrix exponentials 105
maximum likelihood estimate 56
measure 1
memoryless property 70
Metropolis algorithm 210
minimal non-negative solution 13
minimal process 69
monotone convergence 223
  discrete case 219
Moran model 177
mutation 6, 176
non-minimal chains 103
null recurrence 37, 118
o(t), O(t), order notation 63
Ohm's law 151
open migration process 185
optional stopping theorem 130
P, probability 222
P, transition matrix 2
P̂, transition matrix of reversed chain 47
P(t), transition semigroup 96
P̂(t), semigroup of reversed chain 124
p_ij^(n), n-step transition probability 5
Π, jump matrix 87
π-system 228
Poisson process 74
  infinitesimal definition 76
  jump chain/holding time definition 76
  transition probability definition 76
policy 198
policy improvement 201
population genetics 175
population growth 171
positive recurrence 37, 118
potential 138
  associated to a Markov chain 138
  associated to Brownian motion 169
  gravitational 134
  in electrical networks 151
  with discounted costs 142
potential theory 134
probability generating function 171
probability measure 222
probability space 222
Q-matrix 60
  associated to a Markov chain 132
  associated to Brownian motion 169
Q̂, generator matrix of reversed chain 124
q_i, rate of leaving i 61
q_ij, rate of going from i to j 61
queue 179
  M/G/1 187
  M/G/∞ 191
  M/M/1 180
  M/M/s 182
queueing network 183-185
queues in series 183
random chessboard knight 50
random walk
  on Z^d 29
  on a graph 49
recurrence 24, 114, 167
recurrence relations 57
reflected random walks 195
reservoir model 194, 195
resolvent 146
resource management 192
restocking a warehouse 192
return probability 25
reversibility 48, 125
right-continuous process 67
ruin
  gambler 15
  insurance company 196
selective advantage 176
semigroup 96
semigroup property 62
service times 180
shopping centre 185
simple birth process 82
simulation 206
skeleton 122
state-space 1

stationary distribution 33, 117
stationary increments 76
stationary policy 198
statistics 55, 211, 215
stochastic matrix 2
stopping time 19
strong law of large numbers 52
strong Markov property 19, 93, 227
success-run chain 38
susceptible 173
telephone exchange 183
texture parameter 216
time reversal 47, 123
transience 24, 114, 167
transition matrix 2
  irreducible 11
  maximum likelihood estimate 56
transition semigroup 165
truncated Poisson distribution 183
unit mass 3
V_i(n), number of visits to i before n 53
valency 50
value function 198
weak convergence 164
Wiener process 159
Wiener's theorem 161
Wright-Fisher model 175
ζ, explosion time 69
