5 Markov Chains
In various applications one considers collections of random variables which
evolve in time in some random but prescribed manner (think, eg., about
consecutive flips of a coin combined with counting the number of heads observed).
Such collections are called random (or stochastic) processes. A typical random
process X is a family {Xt : t ∈ T } of random variables indexed by elements of
some set T . When T = {0, 1, 2, . . . } one speaks about a ‘discrete-time’ process;
alternatively, for T = R or T = [0, ∞) one has a ‘continuous-time’ process. In
what follows we shall only consider discrete-time processes.
O.H. Probability II (MATH 2647) M15
A Markov chain is called homogeneous if

P(Xn+1 = j | Xn = i) ≡ P(X1 = j | X0 = i)

for all n, i, j. The transition matrix P = (pij) is the |S| × |S| matrix of transition
probabilities

pij = P(Xn+1 = j | Xn = i) .
In what follows we shall only consider homogeneous Markov chains.
The next claim characterizes transition matrices.
b Theorem 5.3. P is a stochastic matrix, which is to say that pij ≥ 0 for all
i, j ∈ S and Σj pij = 1 for every i ∈ S (ie., every row of P is a probability
distribution).

Example 5.4. Let S = {0, 1, 2, . . . } and let the chain Y satisfy Y0 = 0 and

P(Yn+1 = s + 1 | Yn = s) = p ,   P(Yn+1 = s | Yn = s) = 1 − p ,

for all n ≥ 0, where 0 < p < 1. You may think of Yn as the number of heads
thrown in n tosses of a coin.
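This chain is easy to simulate; the following sketch (numpy and the value p = 0.3 are my own illustrative choices, not from the notes) checks that Yn/n settles near p:

```python
import numpy as np

# Simulate the heads-counting chain: Y increases by 1 with probability p
# (a head) and stays put otherwise, so Y_n has the Binomial(n, p) law.
rng = np.random.default_rng(0)
p = 0.3
n = 10_000

Y = np.cumsum(rng.random(n) < p)   # Y[k-1] = number of heads in first k tosses

print(Y[-1] / n)   # law of large numbers: close to p = 0.3
```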
Example 5.5. [Simple random walk] Let S = {0, ±1, ±2, . . . } and define the
Markov chain X by X0 = 0 and

        ⎧ p ,          if j = i + 1,
  pij = ⎨ q = 1 − p ,  if j = i − 1,
        ⎩ 0 ,          otherwise.
Example 5.6. [Ehrenfest chain] Let S = {0, 1, . . . , r} and

pk,k+1 = (r − k)/r ,   pk,k−1 = k/r ,   pij = 0 otherwise .
In words, there is a total of r balls in two urns; k in the first and r − k in the
second. We pick one of the r balls at random and move it to the other urn.
Ehrenfest used this to model the division of air molecules between two chambers
(of equal size and shape) which are connected by a small hole.
Example 5.7. [Birth and death chains] Let S = {0, 1, 2, . . . }. These chains are
defined by the restriction pij = 0 when |i − j| > 1 and, say,

pk,k+1 = pk ,   pk,k−1 = qk ,   pkk = rk = 1 − pk − qk ,

with q0 = 0. The fact that these processes cannot jump over any integer makes
the computations particularly simple.
Definition 5.8. The n-step transition matrix Pn = (pij (n)) is the matrix of
n-step transition probabilities

pij (n) ≡ pij^(n) def= P(Xm+n = j | Xm = i) .

Of course, P1 = P.
b Theorem 5.9 (Chapman-Kolmogorov equations). We have

pij (m + n) = Σk pik (m) pkj (n) .    (5.2)

Proof. By the definition of conditional probabilities and the Markov property,

pij (m + n) = P(Xm+n = j | X0 = i)
           = Σk P(Xm+n = j, Xm = k | X0 = i)
           = Σk P(Xm+n = j | Xm = k, X0 = i) P(Xm = k | X0 = i)
           = Σk P(Xm+n = j | Xm = k) P(Xm = k | X0 = i)
           = Σk pkj (n) pik (m) = Σk pik (m) pkj (n) .
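In matrix form, (5.2) says Pm+n = Pm Pn, ie., the n-step matrix is simply the n-th power of P. A quick numerical sanity check (the particular 3 × 3 matrix is an arbitrary illustration):

```python
import numpy as np

# Chapman-Kolmogorov in matrix form: the n-step transition matrix is P^n,
# so P^(m+n) = P^m P^n for any m, n >= 0.
P = np.array([[0.0,  1.0,   0.0],
              [0.0,  2/3,   1/3],
              [1/16, 15/16, 0.0]])

m, n = 3, 4
lhs = np.linalg.matrix_power(P, m + n)
rhs = np.linalg.matrix_power(P, m) @ np.linalg.matrix_power(P, n)

assert np.allclose(lhs, rhs)            # equation (5.2), entrywise
assert np.allclose(lhs.sum(axis=1), 1)  # P^n is again a stochastic matrix
```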
Let µi^(n) def= P(Xn = i), i ∈ S, be the mass function of Xn ; we write µ^(n) for
the row vector with entries (µi^(n) : i ∈ S).
Example 5.11. Consider a three-state Markov chain with the transition matrix

    ⎛ 0     1      0   ⎞
P = ⎜ 0     2/3    1/3 ⎟ .
    ⎝ 1/16  15/16  0   ⎠

Z Find a general formula for p11^(n) .
41 The justification comes from linear algebra: having distinct eigenvalues, P is diagonalizable,
that is for some invertible matrix U we have

        ⎛ 1   0      0     ⎞
P = U   ⎜ 0   −1/4   0     ⎟ U^{−1}
        ⎝ 0   0      −1/12 ⎠

and hence

          ⎛ 1   0          0         ⎞
P^n = U   ⎜ 0   (−1/4)^n   0         ⎟ U^{−1}
          ⎝ 0   0          (−1/12)^n ⎠

which forces p11^(n) = a + b(−1/4)^n + c(−1/12)^n with some constants a, b, c.
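The footnote is easy to check numerically; the sketch below verifies the stated eigenvalues of the matrix from Example 5.11 and that the three-term formula, fitted from p11(0), p11(1), p11(2), also predicts p11(3):

```python
import numpy as np

# P from Example 5.11; its eigenvalues should be 1, -1/4, -1/12.
P = np.array([[0.0,  1.0,   0.0],
              [0.0,  2/3,   1/3],
              [1/16, 15/16, 0.0]])

eig = np.sort(np.linalg.eigvals(P).real)
# eig is approximately [-1/4, -1/12, 1]

# Fit a, b, c in p11(n) = a + b(-1/4)^n + c(-1/12)^n from n = 0, 1, 2 ...
V = np.array([[1.0, (-1/4) ** k, (-1/12) ** k] for k in range(3)])
p11 = [np.linalg.matrix_power(P, k)[0, 0] for k in range(4)]
a, b, c = np.linalg.solve(V, p11[:3])

# ... and check that the fitted formula then predicts p11(3) as well.
assert abs(a + b * (-1/4) ** 3 + c * (-1/12) ** 3 - p11[3]) < 1e-9
```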
In other words, a closed class is one from which there is no escape. A state i is
absorbing if {i} is a closed class.
Exercise 5.15. Show that every transition matrix on a finite state space has at
least one closed communicating class. Find an example of a transition matrix with
no closed communicating classes.
b Definition 5.16. A Markov chain or its transition matrix P is called irreducible
if its state space S forms a single communicating class.
Example 5.17. Find the communicating classes associated with the stochastic
matrix

    ⎛ 1/2  1/2  0    0    0    0 ⎞
    ⎜ 0    0    1    0    0    0 ⎟
P = ⎜ 1/3  0    0    1/3  1/3  0 ⎟ .
    ⎜ 0    0    0    1/2  1/2  0 ⎟
    ⎜ 0    0    0    0    0    1 ⎟
    ⎝ 0    0    0    0    1    0 ⎠
Solution. The classes are {1, 2, 3}, {4}, {5, 6}, with only {5, 6} being closed. (Draw
the diagram!)
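The classes can also be found mechanically: i and j communicate iff each is reachable from the other in the directed graph with an edge i → j whenever pij > 0. A sketch (plain reachability by repeated squaring; the code is illustrative):

```python
import numpy as np

# Transition matrix of Example 5.17 (states 1..6).
P = np.array([[1/2, 1/2, 0,   0,   0,   0],
              [0,   0,   1,   0,   0,   0],
              [1/3, 0,   0,   1/3, 1/3, 0],
              [0,   0,   0,   1/2, 1/2, 0],
              [0,   0,   0,   0,   0,   1],
              [0,   0,   0,   0,   1,   0]])
m = len(P)

reach = (P > 0) | np.eye(m, dtype=bool)   # one-step reachability (and i -> i)
for _ in range(m):                        # transitive closure by squaring
    reach = (reach.astype(int) @ reach.astype(int)) > 0

comm = reach & reach.T                    # i <-> j: reachable both ways
classes = {frozenset(map(int, np.flatnonzero(comm[i]) + 1)) for i in range(m)}
closed = {c for c in classes
          if all(P[i - 1, j - 1] == 0 for i in c
                 for j in range(1, m + 1) if j not in c)}

print(sorted(sorted(c) for c in classes))  # [[1, 2, 3], [4], [5, 6]]
print(sorted(sorted(c) for c in closed))   # [[5, 6]]
```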
so that b) ⇐⇒ c).
Lemma 5.19. If states i and j are communicating, then i and j have the same
period.
Example 5.20. It is easy to see that both the simple random walk (Exam-
ple 5.5) and the Ehrenfest chain (Example 5.6) have period 2. On the other
hand, the birth and death process (Example 5.7) with all pk ≡ pk,k+1 > 0, all
qk ≡ pk,k−1 > 0 and at least one rk ≡ pkk positive is aperiodic (however, if all
rk vanish, the birth and death chain has period 2).
Proof of Lemma 5.19. Since i ↔ j, there exist m, n > 0 such that pij^(m) pji^(n) > 0. By
the Chapman-Kolmogorov equations,

pii^(m+r+n) ≥ pij^(m) pjj^(r) pji^(n) > 0

so that d(i) divides d(j). In a similar way one deduces that d(j) divides d(i).
Example 5.23 (continued) Answer the same question by solving Eqns. (5.5).
Solution. Strictly speaking, the system (5.5) reads

h4 = 1 ,   h3 = (h4 + h2)/2 ,   h2 = (h3 + h1)/2 ,   h1 = h1

so that

h2 = 1/3 + (2/3) h1 ,   h3 = 2/3 + (1/3) h1 .
The value of h1 is not determined by the system, but the minimality condition requires
h1 = 0, so we recover h2 = 1/3 as before. Of course, the extra boundary condition
h1 = 0 was obvious from the beginning so we built it into our system of equations and
did not have to worry about minimal non-negative solutions.
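The computation can also be automated as a linear solve for the unknown entries (a minimal sketch, assuming, as the equations above indicate, that Example 5.23 is the chain on {1, 2, 3, 4} in which 1 and 4 are absorbing and states 2, 3 jump to either neighbour with probability 1/2; the boundary values h1 = 0, h4 = 1 are substituted in):

```python
import numpy as np

# h2 = (h3 + h1)/2 and h3 = (h4 + h2)/2, with h1 = 0 and h4 = 1:
#    h2 - h3/2 = 0
#   -h2/2 + h3 = 1/2
A = np.array([[1.0, -0.5],
              [-0.5, 1.0]])
b = np.array([0.0, 0.5])
h2, h3 = np.linalg.solve(A, b)

print(h2, h3)   # 1/3 and 2/3
```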
Example 5.25. (Gamblers’ ruin) Imagine that you enter a casino with a
fortune of £i and gamble, £1 at a time, with probability p of doubling your
stake and probability q of losing it. The resources of the casino are regarded as
infinite, so there is no upper limit to your fortune. What is the probability that
you leave broke?
In other words, consider a Markov chain on {0, 1, 2, . . . } with transition proba-
bilities

p00 = 1 ,   pk,k+1 = p ,   pk,k−1 = q   (k ≥ 1)

where 0 < p = 1 − q < 1. Find hi = Pi (H{0} < ∞).
Solution. The vector (hi : i ≥ 0) is the minimal non-negative solution to the system
h0 = 1 , hi = p hi+1 + q hi−1 , (i ≥ 1) .
As we shall see below, a general solution to this system is given by (check that the
following expressions solve this system!)

hi = A + B (q/p)^i  if p ≠ q ,   and   hi = A + B i  if p = q ,

where A and B are some real numbers. If p ≤ q, the restriction 0 ≤ hi ≤ 1 forces
B = 0 so that hi ≡ A = 1 for all i ≥ 0. If p > q, we get (recall that h0 = A + B = 1)

hi = A + (1 − A) (q/p)^i = (q/p)^i + A (1 − (q/p)^i) .

Since a non-negative solution must have A ≥ 0, the minimal solution now reads
hi = (q/p)^i .
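Both claims are easy to check numerically (the parameter values are my own choices): the formula hi = (q/p)^i satisfies the recurrence exactly, and a capped simulation from i = 1 lands near h1 = q/p.

```python
import numpy as np

p, q = 0.6, 0.4                      # p > q: ruin is not certain
h = lambda i: (q / p) ** i

# h_0 = 1 and h_i = p h_{i+1} + q h_{i-1} for i >= 1:
assert h(0) == 1.0
for i in range(1, 50):
    assert abs(h(i) - (p * h(i + 1) + q * h(i - 1))) < 1e-12

# Monte Carlo from i = 1: walks capped at 500 steps (with the upward drift,
# ruin after that is very unlikely), so the hit-zero frequency estimates h_1.
rng = np.random.default_rng(1)
trials, cap = 5_000, 500
steps = rng.choice([1, -1], size=(trials, cap), p=[p, q])
paths = 1 + np.cumsum(steps, axis=1)
est = (paths <= 0).any(axis=1).mean()
print(est)   # close to q/p = 2/3
```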
Example 5.23 (continued) Assuming that X0 = 2, find the mean time until
the chain is absorbed in states 1 or 4.
Solution. Put ki = Ei (H{1,4}). Clearly, k1 = k4 = 0, and thanks to the Markov
property, we have

k2 = 1 + (1/2) k1 + (1/2) k3 ,   k3 = 1 + (1/2) k2 + (1/2) k4

(The 1 appears in the formulae above because we count the time for the first step).
As a result, k2 = 1 + (1/2) k3 = 1 + (1/2)(1 + (1/2) k2), that is k2 = 2.
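The same arithmetic as a linear solve (again assuming the chain on {1, 2, 3, 4} described in Example 5.23, with k1 = k4 = 0 substituted in):

```python
import numpy as np

# k2 = 1 + (k1 + k3)/2 and k3 = 1 + (k2 + k4)/2 with k1 = k4 = 0:
#    k2 - k3/2 = 1
#   -k2/2 + k3 = 1
A = np.array([[1.0, -0.5],
              [-0.5, 1.0]])
b = np.array([1.0, 1.0])
k2, k3 = np.linalg.solve(A, b)

print(k2, k3)   # both equal 2
```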
be the probability of the event “the first future visit to state j, starting from i,
takes place at the nth step”. Then

fij = Σ_{n=1}^∞ fij^(n) ≡ Pi (Tj < ∞)
is the probability that the chain ever visits j, starting from i. Of course, state j
is recurrent iff

fjj = Σ_{n=1}^∞ fjj^(n) = Pj (Tj < ∞) = 1 .    (5.9)

In this case

Pj (Xn = j for some n ≥ 1) ≡ Pj (Xn returns to state j at least once) = 1 .
Remark 5.27.2. By (5.9), state j is recurrent if and only if Tj is a proper
random variable w.r.t. the probability distribution Pj ( · ), ie., Pj (Tj < ∞) = 1.
A recurrent state j is positive recurrent if the expectation

Ej Tj ≡ E(Tj | X0 = j) = Σ_{n=1}^∞ n fjj^(n)

is finite; otherwise it is null recurrent.
Of course, (5.10) is nothing else than the first passage decomposition: every
trajectory leading from i to j in n steps, visits j for the first time in k steps
(1 ≤ k ≤ n) and then comes back to j in the remaining n − k steps.
b Corollary 5.29. The following dichotomy holds:
a) if Σn pjj^(n) = ∞, then the state j is recurrent; in this case Σn pij^(n) = ∞
for all i such that fij > 0;
b) if Σn pjj^(n) < ∞, then the state j is transient; in this case Σn pij^(n) < ∞
for all i.
Remark 5.29.1. Notice that Σn pjj^(n) is just the expected number of visits to
state j starting from j.
Example 5.30. Let (Xn)n≥0 be a Markov chain in S with X0 = k ∈ S.
Consider the event An = {ω : Xn (ω) = k} that the chain returns to state
k after n steps. Clearly, P(An) = pkk^(n), the n-step transition probability. By
Corollary 5.29, state k is transient iff Σn pkk^(n) < ∞. On the other hand, by
the first Borel-Cantelli lemma, Lemma 1.6a), finiteness of the sum Σn P(An) ≡
Σn pkk^(n) implies P(An i.o.) = 0, so that, with probability one, the chain
visits state k only a finite number of times.
Z Corollary 5.31. If j ∈ S is transient, then pij^(n) → 0 as n → ∞ for all i ∈ S.
Example 5.32. Determine recurrent and transient states for a Markov chain
on {1, 2, 3} with the following transition matrix:

    ⎛ 1/3  2/3  0 ⎞
P = ⎜ 0    0    1 ⎟ .
    ⎝ 0    1    0 ⎠

Solution. We obviously have p11^(n) = (1/3)^n with Σn p11^(n) < ∞, so state 1 is transient.
On the other hand, for i ∈ {2, 3}, we have pii^(n) = 1 for even n and pii^(n) = 0 otherwise.
As Σn pii^(n) = ∞ for i ∈ {2, 3}, both states 2 and 3 are recurrent.
Alternatively, f11 ≡ f11^(1) = 1/3, f22 ≡ f22^(2) = 1, and f33 ≡ f33^(2) = 1, so that state 1 is
transient and states 2, 3 are recurrent.
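A numeric cross-check of this solution via matrix powers:

```python
import numpy as np

# P from Example 5.32.
P = np.array([[1/3, 2/3, 0],
              [0,   0,   1],
              [0,   1,   0]])

for n in range(1, 12):
    Pn = np.linalg.matrix_power(P, n)
    assert abs(Pn[0, 0] - (1/3) ** n) < 1e-12        # p11(n) = (1/3)^n
    assert abs(Pn[1, 1] - (n % 2 == 0)) < 1e-12      # p22(n): 1 iff n is even

# sum_n p11(n) is a convergent geometric series (state 1 is transient):
s11 = sum((1/3) ** n for n in range(1, 60))
print(s11)   # ~ 0.5
```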
Proof of Lemma 5.28. The event An = {Xn = j} can be decomposed into a disjoint
union of “the first visit to j events”

Bk = {Xm ≠ j for 1 ≤ m < k, Xk = j} ,

ie., An = ∪_{k=1}^n Bk , so that

Pi (An) = Σ_{k=1}^n Pi (An ∩ Bk) .
On the other hand, by the “tower property” and the Markov property
Define the generating functions

Pij (s) def= Σn pij^(n) s^n ,   Fij (s) def= Σn fij^(n) s^n

with the convention that pij^(0) = δij , the Kronecker delta, and fij^(0) = 0 for all i
and j. By Lemma 5.28, we obviously have

Pij (s) = δij + Fij (s) Pjj (s) .    (5.11)

Notice that this equation allows us to derive Fij (s), the generating function of
the first passage time from i to j.
Proof of Corollary 5.29. To show that j is recurrent iff Σn pjj^(n) = ∞, we first observe
that (5.11) with i = j is equivalent to Pjj (s) = (1 − Fjj (s))^{−1} for all |s| < 1 and thus,
as s ↑ 1,

Pjj (s) → ∞  ⇐⇒  Fjj (s) → Fjj (1) ≡ fjj = 1 .

It remains to observe that the Abel theorem implies Pjj (s) → Σn pjj^(n) as s ↑ 1. The
rest follows directly from (5.11) with i ≠ j.
Using Corollary 5.29, we can easily classify states of finite state Markov
chains into transient and recurrent. Notice that transience and recurrence are
class properties:
b Lemma 5.33. Let C be a communicating class. Then either all states in C are
transient or all are recurrent.
Thus it is natural to speak of recurrent and transient classes.
Proof. Fix i, j ∈ C and suppose that state i is transient, so that Σ_{l=0}^∞ pii^(l) < ∞.
There exist k, m ≥ 0 with pij^(k) > 0 and pji^(m) > 0, so that for all l ≥ 0, by the Chapman-
Kolmogorov equations,

pii^(k+l+m) ≥ pij^(k) pjj^(l) pji^(m)

and thus

Σ_{l=0}^∞ pjj^(l) ≤ (pij^(k) pji^(m))^{−1} Σ_{l=0}^∞ pii^(k+l+m) ≡ (pij^(k) pji^(m))^{−1} Σ_{l=k+m}^∞ pii^(l) < ∞ ,

ie., state j is transient as well.
Proof. Let C be a recurrent class which is not closed. Then there exist i ∈ C, j ∉ C
and m ≥ 1 such that the event Am def= {Xm = j} satisfies Pi (Am) > 0. Denote

B def= {Xn = i for infinitely many n} .
Remark 5.34.1. In fact, one can prove that every finite closed class is recurrent.
Sketch of the proof. Let class C be closed and finite. Start Xn in C and consider the
event Bi = {Xn = i for infinitely many n}. By finiteness of C, for some i ∈ C
and all j ∈ C, we have 0 < Pj (Bi) ≡ fji Pi (Bi), and therefore Pi (Bi) > 0. Since
Pi (Bi) ∈ {0, 1}, we deduce Pi (Bi) = 1, i.e., state i (and the whole class C) is recurrent.
P(Xn = j) → πj as n → ∞.
π1 = a π1 ,   π2 = b π1 + π2 ,   π3 = c π1 + π3 ,
By Lemma 5.42, the exponential moments E e^{cTj} exist for all c > 0 small
enough; in particular, all polynomial moments E (Tj)^p with p > 0 are finite.
b Lemma 5.43. Let Xn be an irreducible recurrent Markov chain with stationary
distribution π. Then for all states j, the expected return time Ej (Tj ) satisfies
πj Ej (Tj ) = 1 .
If Vk (n) = Σ_{l=1}^{n−1} 1{Xl = k} denotes the number of visits 46 to k before time n,
then Vk (n)/n is the proportion of time before n spent in state k.
The following consequence of Theorem 5.39 and Lemma 5.43 gives the long-
run proportion of time spent by a Markov chain in each state. It is often referred
to as the Ergodic theorem.
b Theorem 5.44. If (Xn)n≥0 is an irreducible Markov chain, then

P( Vk (n)/n → 1/Ek (Tk) as n → ∞ ) = 1 ,

where Ek (Tk) is the expected return time to state k.
Moreover, if Ek (Tk) < ∞, then for any bounded function f : S → R we have

P( (1/n) Σ_{k=0}^{n−1} f (Xk) → f̄ as n → ∞ ) = 1 ,

where f̄ def= Σ_{k∈S} πk f (k) and π = (πk : k ∈ S) is the unique stationary
distribution.
In other words, the second claim implies that if a real function f on S is
bounded, then the average of f along every typical trajectory of a “nice” Markov
chain (Xk )k≥0 is close to the space average of f w.r.t. the stationary distribution
of this Markov chain.
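A quick simulation illustrating both claims (the 3-state matrix, the seed, and the run length are arbitrary choices of ours):

```python
import numpy as np

P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4],
              [0.5, 0.3, 0.2]])

# Stationary distribution: normalised left eigenvector for eigenvalue 1.
w, v = np.linalg.eig(P.T)
pi = v[:, np.argmax(w.real)].real
pi = pi / pi.sum()

# Run the chain and record the occupation frequencies V_k(n)/n.
rng = np.random.default_rng(2)
n = 20_000
visits = np.zeros(3)
x = 0
for _ in range(n):
    visits[x] += 1
    x = rng.choice(3, p=P[x])

freq = visits / n
print(freq, pi)   # the two vectors nearly agree
# By Lemma 5.43, pi_j E_j(T_j) = 1, so n / visits[j] estimates E_j(T_j).
```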
Proof of Lemma 5.43. Let X0 have the distribution π so that P(X0 = j) = πj . Then,

πj Ej (Tj) ≡ Σ_{n=1}^∞ P(Tj ≥ n | X0 = j) P(X0 = j) = Σ_{n=1}^∞ P(Tj ≥ n, X0 = j) .

Denote an def= P(Xm ≠ j for 0 ≤ m ≤ n). We then have

P(Tj ≥ 1, X0 = j) ≡ P(X0 = j) = 1 − a0

and for n ≥ 2

P(Tj ≥ n, X0 = j) = P(X0 = j, Xm ≠ j for 1 ≤ m ≤ n − 1)
  = P(Xm ≠ j for 1 ≤ m ≤ n − 1) − P(Xm ≠ j for 0 ≤ m ≤ n − 1) = an−2 − an−1 .
46 Notice that Ej Vk (n) ≡ E(Vk (n) | X0 = j) = Σ_{l=1}^{n−1} pjk^(l) .
Consequently,

πj Ej (Tj) = P(X0 = j) + Σ_{n=2}^∞ (an−2 − an−1) = (1 − a0) + a0 = 1 ,

since

lim_n an = P(Xm ≠ j for all m) = P(Tj = ∞) = 0 ,
Remark 5.43.1. Observe that the argument above shows that even in the
countable state space the stationary distribution is unique, provided it exists.
In what follows we shall rely upon the following property of γ-positive matrices.

d(µ′ Q, µ′′ Q) ≤ (1 − γ) d(µ′, µ′′) .    (5.15)

The estimate (5.15) means that the mapping µ ↦ µQ from the “probability sim-
plex” {x ∈ R^m : xj ≥ 0, Σj xj = 1} to itself is a contraction; therefore, it has a unique
fixed point x = xQ in this simplex. Results of this type are central to various areas
of mathematics.
We postpone the proof of Lemma 5.46 and first verify a particular case of Theorem 5.39.
Proof of Theorem 5.39 for γ-positive matrices. Let µ^(0) be a fixed initial distribution,
P be a γ-positive stochastic matrix, and let µ^(n) = µ^(0) P^n denote the distribution at
time n. By (5.15),

d(µ^(n), µ^(n+k)) = d(µ^(0) P^n, µ^(0) P^{n+k}) ≤ (1 − γ)^n d(µ^(0), µ^(0) P^k) ≤ (1 − γ)^n ,

so that (µ^(n))n≥0 is a Cauchy sequence; it therefore converges to some probability
distribution π satisfying π = π P.
Moreover, this limit is unique: indeed, assuming the contrary, let π′ and π′′ be two
distinct stationary probability distributions for P; then, by (5.15),

d(π′, π′′) = d(π′ P, π′′ P) ≤ (1 − γ) d(π′, π′′) ,

which forces d(π′, π′′) = 0, a contradiction. Finally, taking µ^(0) concentrated at state i,
we have pij^(n) ≡ (µ^(0) P^n)j → πj for all j = 1, 2, . . . , m. The proof is finished.
b Definition 5.51. Let (Xn)n≥0 be an irreducible Markov chain with state space S
and transition matrix P. A probability measure π on S is said to be reversible for
the chain (or for the matrix P) if π and P are in detailed balance, ie.,

πi pij = πj pji   for all i, j ∈ S .    (5.16)
has stationary distribution π = (1/3, 1/3, 1/3). For the latter to be reversible
for this Markov chain, we need π1 p12 = π2 p21 , ie., p = q. If p = q = 1/2, then
the DBE (5.16) hold for all pairs of states, ie., the chain is reversible. Otherwise
the chain is not reversible.
Exercise 5.53. Let P be irreducible and have an invariant distribution π. Sup-
pose that (Xn)0≤n≤N is a Markov chain with transition matrix P and the initial
distribution π, and set Yn def= XN−n . Show that (Yn)0≤n≤N is a Markov chain
with the same initial distribution π and with transition matrix P̂ = (p̂ij) given
by πj p̂ji = πi pij for all i, j. The chain (Yn)0≤n≤N is called the time-reversal of
(Xn)0≤n≤N .
Exercise 5.54. Find the transition matrix P̂ of the time-reversal for the Markov
chain from Example 5.52.
Z Example 5.55. [Random walk on a graph] A graph G is a countable collection
of states, usually called vertices, some of which are joined by edges. The valency
vj of vertex j is the number of edges at j, and we assume that every vertex in
G has finite valency. The random walk on G is a Markov chain with transition
probabilities (
1/vj , if (j, k) is an edge
pjk =
0, otherwise.
We assume that G is connected, so that P is irreducible. It is easy to see
that P is in detailed balance with v = (vj : j ∈ G). As a result, if the total
valency V def= Σj vj is finite, then π = v/V is a stationary distribution and P
is reversible.
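A concrete check on a small graph (the graph is my own example: the path 0–1–2–3 plus the extra edge 1–3):

```python
import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (1, 3)]
m = 4
A = np.zeros((m, m))
for i, j in edges:
    A[i, j] = A[j, i] = 1          # adjacency matrix of the graph

v = A.sum(axis=1)                  # valencies: [1, 3, 2, 2]
P = A / v[:, None]                 # p_jk = 1/v_j on edges, 0 otherwise
pi = v / v.sum()                   # candidate stationary distribution pi = v/V

flow = pi[:, None] * P             # flow[j, k] = pi_j p_jk
assert np.allclose(flow, flow.T)   # detailed balance (5.16)
assert np.allclose(pi @ P, pi)     # hence pi is stationary

print(pi)   # 1/8, 3/8, 1/4, 1/4
```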
such that if v1 and v2 are adjacent in G, then ξv1 ≠ ξv2 . The following Markov chain
in S^V randomly selects q-colourings of G:
Let a colouring C ∈ S V be given.
1. Pick v ∈ V uniformly at random;
2. Re-colour v in any admissible colour (ie., not in use by any of the
neighbours of v) uniformly at random.
It is easy to see that for q large enough, this Markov chain is irreducible, aperiodic,
and thus has a unique stationary distribution. Consequently, after sufficiently many
steps, the distribution of this Markov chain will be almost uniform in the collection
of admissible colourings of G. The smallest possible value of q for which the theory
works depends on the structure of G; in many cases this optimal value is unknown.
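A sketch of this re-colouring chain on a toy graph (a 4-cycle with q = 4 colours; the graph, q, and the run length are illustrative choices, and q = 4 = Δ + 2 is large enough for irreducibility here):

```python
import itertools
import random

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]               # the 4-cycle
nbrs = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
q = 4
rng = random.Random(3)

def proper(col):
    """A colouring is proper if adjacent vertices get different colours."""
    return all(col[i] != col[j] for i, j in edges)

def step(col):
    """One move: pick a vertex uniformly, re-colour it with a uniformly
    chosen admissible colour (one not used by its neighbours)."""
    v = rng.randrange(4)
    used = {col[u] for u in nbrs[v]}
    col = list(col)
    col[v] = rng.choice([c for c in range(q) if c not in used])
    return tuple(col)

# All proper 4-colourings of the 4-cycle: (q-1)^4 + (q-1) = 84 of them.
all_proper = [c for c in itertools.product(range(q), repeat=4) if proper(c)]
print(len(all_proper))   # 84

# Run the chain: every visited state stays a proper colouring.
col, visited = (0, 1, 0, 1), set()
for _ in range(10_000):
    col = step(col)
    visited.add(col)
assert visited <= set(all_proper)
```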
Z Example 5.58. (Gibbs sampler) More generally, if one needs to sample a random
element from a set S according to a probability distribution π, one attempts to
construct an irreducible aperiodic Markov chain in S, which is reversible w.r.t. π. Many
such examples can be found in the literature.
Z For various applications, it is essential to construct Markov chains with fast
convergence to equilibrium. This is an area of active research with a lot of interesting
open problems.