Summary of Results on Markov Chains

Enrico Scalas∗
Laboratory on Complex Systems, Dipartimento di Scienze e Tecnologie Avanzate,
Università del Piemonte Orientale “Amedeo Avogadro”,
Via Bellini 25 G, 15100 Alessandria, Italy
(Dated: August 30, 2007)

Abstract
These short lecture notes contain a summary of results on the elementary theory of Markov
chains. The purpose of these notes is to let the reader understand as quickly as possible the
concept of statistical equilibrium, based on the stationary distribution of homogeneous Markov
chains. Some exercises related to these notes can be found in a separate document.

PACS numbers: 02.50.-r, 02.50.Ey, 05.40.-a, 05.40.Jc


∗Electronic address: enrico.scalas@mfn.unipmn.it; URL: www.mfn.unipmn.it/~scalas

I. INTRODUCTION

Many models used in Economics, in Physics or in other sciences are instances of Markov
chains. This is the case of Schelling’s model [1] or of the closely related Ising’s model [2]
with the usual Monte Carlo dynamics [3]. Economists will find further motivation to study
Markov chains in a recent book by Aoki and Yoshikawa [4].
Markov chains have the advantage that their theory can be introduced and many results
can be proven in the framework of the elementary theory of probability, without extensively
using measure theoretical tools. In order to compile the present summary, the books by Hoel
et al., by Kemeny and Laurie Snell, by Durrett and by Çinlar [5–8] have been consulted.
These notes can be considered as a summary of the first two chapters of Hoel et al. [5].
In this summary, random variables will be denoted by capital letters X, Y, . . . and their
values by small letters x, y, . . . . In order to define a Markov chain, a random variable Xn will
be considered that can assume values in a finite or at most denumerable set of states S at
instants denoted by the subscript n = 0, 1, 2, . . .. This subscript will always be interpreted
as a discrete-time index.
It will be further assumed that

P (Xn+1 = xn+1 |X0 = x0 , . . . , Xn = xn ) = P (Xn+1 = xn+1 |Xn = xn ), (1)

for every choice of the non-negative integer n and of the values x0 , . . . , xn which belong to
S. P (·|·) is a conditional probability. The meaning of equation (1) is that the probability of
Xn+1 does not depend on the past history, but only on the value of Xn ; this equation, the so-
called Markov property, can be used to define Markov chains. The conditional probabilities
P (Xn+1 = xn+1 |Xn = xn ) are called transition probabilities. If they do not depend on n,
they are stationary (or homogeneous) transition probabilities and the corresponding Markov
chains are stationary (or homogeneous) Markov chains.

II. PROPERTIES OF MARKOV CHAINS

A. Transitions and initial distribution

The transition function P (x, y) of a Markov chain Xn is defined as

P (x, y) = P (X1 = y|X0 = x), x, y ∈ S. (2)

The values of P (x, y) are non-negative and the sum over the final states y of P (x, y) is 1. In
the finite case with M states, this function can be represented as a square M × M matrix
with non-negative matrix elements and with rows summing up to 1.
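As a concrete illustration, here is a minimal sketch in Python/NumPy (the three-state matrix is a made-up example, not taken from these notes) that stores a transition function as a row-stochastic matrix and checks the two defining properties.

```python
import numpy as np

# Hypothetical transition matrix of a 3-state chain; states are labelled 0, 1, 2.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

# Defining properties of a transition function: non-negative entries, rows summing to 1.
assert np.all(P >= 0)
assert np.allclose(P.sum(axis=1), 1.0)
```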
For a stationary Markov chain, one has that

P (Xn+1 = y|Xn = x) = P (x, y), n ≥ 1, (3)

the initial distribution is


π_0(x) = P(X_0 = x),    (4)

and the joint probability distribution P(X_0 = x_0, X_1 = x_1, . . . , X_n = x_n) can be expressed as a product of π_0(x) and P(x, y)’s in the following way

P(X_0 = x_0, X_1 = x_1, . . . , X_n = x_n) = π_0(x_0) P(x_0, x_1) · · · P(x_{n−1}, x_n).    (5)

The m-step transition function P^m(x, y) is the probability of going from state x to state y in m steps. It is given by

P^m(x, y) = Σ_{y_1} · · · Σ_{y_{m−1}} P(x, y_1) P(y_1, y_2) · · · P(y_{m−2}, y_{m−1}) P(y_{m−1}, y)    (6)

for m ≥ 2; for m = 1, it coincides with P(x, y) and for m = 0, it is 1 if x = y and 0 elsewhere.


The following three formulae involving P^m(x, y) are useful in the theory of Markov chains:

P^{n+m}(x, y) = Σ_z P^n(x, z) P^m(z, y)    (7)

P(X_n = y) = Σ_x π_0(x) P^n(x, y)    (8)

P(X_{n+1} = y) = Σ_x P(X_n = x) P(x, y).    (9)
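For a finite chain, formulae (7)–(9) become simple matrix operations: P^m is the m-th power of the transition matrix and the distribution of X_n is the row vector π_0 P^n. A sketch, reusing the hypothetical matrix P introduced above:

```python
import numpy as np
from numpy.linalg import matrix_power

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

n, m = 3, 4
# Equation (7): P^{n+m} = P^n P^m (Chapman-Kolmogorov).
assert np.allclose(matrix_power(P, n + m), matrix_power(P, n) @ matrix_power(P, m))

pi0 = np.array([1.0, 0.0, 0.0])       # chain started in state 0
pi_n = pi0 @ matrix_power(P, n)       # distribution of X_n, eq. (8)
pi_n1 = pi_n @ P                      # distribution of X_{n+1}, eq. (9)
print(pi_n, pi_n1)
```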

B. Hitting times and classification of states

Given a subset of states A, the hitting time TA is defined as

TA = min{n > 0 : Xn ∈ A}. (10)

Thanks to the concept of hitting time, it is possible to classify the states of Markov chains in
a very useful way. Let Px (·) denote the probability of an event for a Markov chain starting
at state x. Then one has the following formula for the n-step transition function:
P^n(x, y) = Σ_{m=1}^{n} P_x(T_y = m) P^{n−m}(y, y).    (11)

An absorbing state of a Markov chain is a state a for which P(a, a) = 1 or, equivalently, P(a, y) = 0 for any state y ≠ a. If the chain reaches such a state, it is trapped there and will never leave. For an absorbing state, it turns out that P^n(x, a) = P_x(T_a ≤ n) for n ≥ 1.
The quantity
ρxy = Px (Ty < ∞) (12)

can be used to introduce two classes of states. ρyy is the probability that a chain starting
at y will ever return to y. A state y is recurrent if ρyy = 1 and transient if ρyy < 1. For
a transient state, there is a positive probability of never returning. An absorbing state is recurrent. The indicator function Iy (z), which is 1 if z = y and 0 otherwise, helps in defining the counting random variable N (y):

N(y) = Σ_{n=1}^{∞} I_y(X_n)    (13)

counts the number of times in which the chain reaches state y. The event {N (y) ≥ 1}
coincides with the event {Ty < ∞}. Therefore, one can write

Px (N (y) ≥ 1) = Px (Ty < ∞) = ρxy . (14)

By induction, one can prove that for m ≥ 1

P_x(N(y) ≥ m) = ρ_{xy} ρ_{yy}^{m−1},    (15)

hence

P_x(N(y) = m) = ρ_{xy} ρ_{yy}^{m−1} (1 − ρ_{yy}),    (16)

and finally
Px (N (y) = 0) = 1 − Px (N (y) ≥ 1) = 1 − ρxy . (17)

One can define G(x, y) = Ex (N (y)), the average number of visits to state y for a Markov
chain that started at x. It turns out that

G(x, y) = E_x(N(y)) = Σ_{n=1}^{∞} P^n(x, y).    (18)

It is now possible to state the following

Theorem 1. 1. Let y be a transient state. Then P_x(N(y) < ∞) = 1 and

G(x, y) = ρ_{xy} / (1 − ρ_{yy}),    x ∈ S,    (19)

which is finite for all states x.

2. Let y be a recurrent state. Then Py (N (y) = ∞) = 1 and G(y, y) = ∞ and one also
has
Px (N (y) = ∞) = Px (Ty < ∞) = ρxy , x ∈ S. (20)

Finally, if ρxy = 0, then G(x, y) = 0, else if ρxy > 0, then G(x, y) = ∞.

This theorem tells that the Markov chain pays only a finite number of visits to a transient
state, whereas if it starts from a recurrent state it will come back there an infinite number
of times. If the Markov chain starts at any state x, it may well be that it will never visit
the recurrent state y, but if it gets there, it will come back infinitely many times. A Markov
chain is called transient if it has only transient states and recurrent if all of its states are
recurrent. A finite Markov chain has at least one recurrent state and cannot be transient.
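As a numerical illustration of equation (19), consider a hypothetical chain with an absorbing state, so that the remaining states are transient. The sketch below truncates the infinite sums; the first-passage recursion is the one sketched after equation (11).

```python
import numpy as np
from numpy.linalg import matrix_power

# Hypothetical chain: state 2 is absorbing, states 0 and 1 are transient.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.0, 0.0, 1.0]])

N = 300                                          # truncation of the infinite sums
powers = [matrix_power(P, n) for n in range(N + 1)]

def rho(x, y):
    """rho_{xy} = P_x(T_y < infinity), via the recursion behind eq. (11)."""
    f = np.zeros(N + 1)
    for n in range(1, N + 1):
        f[n] = powers[n][x, y] - sum(f[m] * powers[n - m][y, y] for m in range(1, n))
    return f.sum()

x, y = 0, 1
G = sum(powers[n][x, y] for n in range(1, N + 1))     # eq. (18), truncated
print(G, rho(x, y) / (1.0 - rho(y, y)))               # the two numbers agree, by eq. (19)
```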

C. The decomposition of the state space

A state x leads to another state y if ρxy > 0 or, equivalently, if there exists a positive
integer n for which P n (x, y) > 0. If x leads to y and y leads to z, then x leads to z. Based
on this concept, there is the following

Theorem 2. Let x be a recurrent state and suppose that x leads to y. Then y is recurrent
and ρxy = ρyx = 1.

A set of states C is said to be closed if no state in C leads to a state outside C. An absorbing state a defines the closed set {a}. There are several characterizations of closed sets, but they will not be included here. A closed set C is irreducible (or ergodic) if, for any choice of two states x and y in C, x leads to y. It is a consequence of Theorem (2) that if C is an irreducible closed set, either every state in C is transient or every state in C is recurrent.
Another consequence of Theorems (1,2) is the following

Corollary 1. For an irreducible closed set of recurrent states C, one has ρxy = 1, Px (N (y) = ∞) = 1, and G(x, y) = ∞ for all choices of x and y in C.

Finally, one has the following important result as a direct consequence of the above
theorems and corollaries

Theorem 3. If C is a finite irreducible closed set, then any state in C is recurrent.

If we are given a finite Markov chain, it is often possible to directly verify if the process
is irreducible (or ergodic) by using the transition function (matrix) and checking whether
any state leads to any other state. Finally, one can prove the following decomposition into
irreducible (ergodic) components

Theorem 4. A non-empty set SR of recurrent states is the union of a finite or countably infinite number of disjoint irreducible closed sets C1, C2, . . . .

If the initial state of the Markov chain is within one of the sets Ci , the time evolution will
take place within this set and the chain will visit any of these states an infinite number of
times. If the chain starts within the set of transient states ST , either it will stay in this set
visiting any transient state only a finite number of times, or, if it reaches one of the Ci , it
will stay there and will visit any state of the irreducible closed set infinitely many times.
The problem then arises of determining the hitting-time distribution of the various ergodic components for a chain that starts in a transient state, as well as the absorption probability ρC (x) = Px (TC < ∞) for x ∈ ST . The latter problem has the following solution when ST is
finite.

Theorem 5. Let the set ST be finite and let C be a closed irreducible set of recurrent states.
Then the system of equations
f(x) = Σ_{y∈C} P(x, y) + Σ_{y∈ST} P(x, y) f(y),    x ∈ ST    (21)

has the unique solution f (x) = ρC (x), x ∈ ST .
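In matrix form, equation (21) is a linear system (I − Q)f = b, where Q collects the one-step transitions among the transient states and b the one-step probabilities of entering C. A sketch with a hypothetical four-state chain having two absorbing states:

```python
import numpy as np

# Hypothetical chain: states 0, 1 are transient; states 2 and 3 are absorbing.
P = np.array([[0.4, 0.3, 0.2, 0.1],
              [0.2, 0.4, 0.1, 0.3],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])

ST = [0, 1]      # transient states
C = [2]          # closed irreducible set whose absorption probability we want

Q = P[np.ix_(ST, ST)]                         # one-step transitions within S_T
b = P[np.ix_(ST, C)].sum(axis=1)              # one-step probability of entering C
f = np.linalg.solve(np.eye(len(ST)) - Q, b)   # eq. (21): f = b + Q f
print(dict(zip(ST, f)))                       # rho_C(x) for each transient state x
```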

III. THE PATH TO STATISTICAL EQUILIBRIUM

A. The stationary distribution

The stationary distribution, π(x), is a function of the Markov chain state space such that
its values are non-negative, its sum over the state space is 1, and
Σ_x π(x) P(x, y) = π(y),    y ∈ S.    (22)

It is interesting to notice that, for all n


Σ_x π(x) P^n(x, y) = π(y),    y ∈ S.    (23)

Moreover, if X0 follows the stationary distribution, then, for all n, the distribution of
Xn also follows the stationary distribution. Indeed, the distribution of Xn does not
depend on n if and only if π0 (x) = π(x). If π(x) is a stationary distribution and
limn→∞ P n (x, y) = π(y) holds for every initial state x and for every state y then one can
conclude that limn→∞ P (Xn = y) = π(y) irrespective of the initial distribution. This means
that, after a transient period, the distribution of chain states reaches a stationary distribu-
tion, which can then be interpreted as an equilibrium distribution in the statistical sense.
For the reasons discussed above, it is important to see under which conditions π(x) exists and is unique, and to study the convergence properties of P n (x, y).
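For a finite chain, equation (22) together with the normalization condition is an (over-determined but consistent) linear system that can be solved directly. A sketch for the hypothetical three-state matrix used in the earlier examples:

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

M = len(P)
# Stack the equations pi (P - I) = 0 with the normalization sum(pi) = 1.
A = np.vstack([P.T - np.eye(M), np.ones(M)])
b = np.r_[np.zeros(M), 1.0]
pi = np.linalg.lstsq(A, b, rcond=None)[0]

print(pi)                           # stationary distribution
print(np.allclose(pi @ P, pi))      # check of eq. (22)
```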

B. How many times is a recurrent state visited on average?

Let Nn (y) denote the number of visits to state y up to time step n. This random variable is defined as

N_n(y) = Σ_{m=1}^{n} I_y(X_m).    (24)

One can also define the average number of visits to state y, starting from x up to step n:
G_n(x, y) = E_x(N_n(y)) = Σ_{m=1}^{n} P^m(x, y).    (25)

If my = Ey (Ty ) denotes the mean return (recurrence) time to y for a chain starting at y, then, as an application of the strong law of large numbers, one has

Theorem 6. Let y be a recurrent state. Then

lim_{n→∞} N_n(y)/n = I_{T_y<∞} / m_y    (26)

with probability one, and

lim_{n→∞} G_n(x, y)/n = ρ_{xy} / m_y,    x ∈ S.    (27)
The meaning of this theorem is that if a chain reaches a recurrent state y, then it returns
there with frequency 1/my . Note that the quantity Nn (y)/n is immediately accessible from
Monte Carlo simulation of Markov chains. A corollary is of immediate relevance to finite
Markov chains:

Corollary 2. Let x, y be two generic states in an irreducible closed set of recurrent states C. Then

lim_{n→∞} G_n(x, y)/n = 1/m_y,    (28)

and, if P(X_0 ∈ C) = 1, then with probability one, for any state y in C,

lim_{n→∞} N_n(y)/n = 1/m_y.    (29)

If my = ∞, the right-hand sides are both 0.
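Theorem 6 and Corollary 2 can be checked by straightforward simulation: run one long trajectory, record the visits to a chosen state y, and compare the visit frequency Nn (y)/n with the reciprocal of the empirical mean return time. A sketch, again on the hypothetical three-state chain:

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

def simulate(P, x0, n_steps, rng):
    """Draw a trajectory X_0, ..., X_{n_steps} from the transition matrix P."""
    path = np.empty(n_steps + 1, dtype=int)
    path[0] = x0
    for t in range(n_steps):
        path[t + 1] = rng.choice(len(P), p=P[path[t]])
    return path

y, n = 2, 100_000
path = simulate(P, x0=0, n_steps=n, rng=rng)

visit_times = np.flatnonzero(path[1:] == y) + 1   # steps at which the chain is in y
print(len(visit_times) / n)                       # N_n(y)/n, the visit frequency
print(1.0 / np.diff(visit_times).mean())          # 1 / (empirical mean return time m_y)
```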

A null recurrent state y is a recurrent state for which my = ∞. A positive recurrent state y is a recurrent state for which my < ∞. The following result characterizes positive recurrent states.

Theorem 7. If x is a positive recurrent state and x leads to y, then y is also positive recurrent.

In a finite irreducible closed set of states there is no null recurrent state:

Theorem 8. If C is a finite irreducible closed set of states, every state in C is positive recurrent.

The following corollaries are immediate consequences of the above theorems and corollaries:

Corollary 3. An irreducible Markov chain having a finite number of states is positive re-
current.

Corollary 4. A Markov chain having a finite number of states has no null recurrent states.

As a final remark of this subsection, note that Theorem (6) and Corollary (2) connect
“time” averages defined by Nn (y)/n to “ensemble” averages defined by Gn (x, y)/n and they
can be called ergodic theorems. Ergodic theorems are related to the so-called strong law of
large numbers, one of the important results of probability theory.

Theorem 9. Let ξ1 , ξ2 , . . . be independent and identically distributed random variables with finite mean µ. Then, with probability one,

lim_{n→∞} (ξ_1 + ξ_2 + · · · + ξ_n)/n = µ.
If these random variables are positive with infinite mean, the theorem still holds with
µ = +∞.

C. Existence, uniqueness and convergence to the stationary distribution

Finally, the main results on the existence and uniqueness of π(x) and the limiting
behaviour of P n (x, y) can be stated. The ergodic theorems discussed in the previous sub-
section do provide a rule for the Monte Carlo “approximation” of π(x) that can be used to
prove its existence and uniqueness.
First of all, the stationary weight of both transient states and null recurrent states is
zero.

Theorem 10. If π(x) is a stationary distribution and x is a transient state or a null recur-
rent state then π(x) = 0.

This means that a Markov chain without positive recurrent states cannot have a station-
ary probability distribution. However,

Theorem 11. An irreducible positive recurrent Markov chain has a unique stationary distribution π(x) given by

π(x) = 1/m_x.    (30)
This theorem provides the ultimate justification for the use of Markov chain Monte Carlo simulations to sample the stationary distribution if the hypotheses of the theorem are fulfilled. In order to get an approximate value for π(y), one “lets the system equilibrate” (and, to fully justify this step, the convergence theorem will be necessary) and then counts the number of occurrences of state y, Nn (y), in a “long enough” simulation of the Markov chain and divides it by the number of Monte Carlo steps n. This program can be carried out when the state space is not too large. In a typical Monte Carlo simulation of the Ising model with K sites, the number of states is 2^K and soon grows to become intractable. In a simulation, many states will never be sampled even if the Markov chain is irreducible. For this reason, Metropolis et al. introduced the importance sampling trick, whose explanation is outside the scope of the present notes [3, 9]. The next corollary provides a nice characterization of
positive recurrent Markov chains.

Corollary 5. An irreducible Markov chain is positive recurrent if and only if it has a sta-
tionary distribution.

For chains with a finite number of states, the existence and uniqueness of the stationary distribution is guaranteed if they are irreducible.

Corollary 6. If a Markov chain having a finite number of states is irreducible, it has a
unique stationary distribution,

and, finally, the corollary discussed above, where the recipe was given to estimate π(x)
from Monte Carlo simulations:

Corollary 7. For an irreducible positive recurrent Markov chain having stationary distri-
bution π, one has with probability one
lim_{n→∞} N_n(x)/n = π(x).    (31)
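Corollary 7 is exactly the Monte Carlo recipe described after Theorem 11: count occupation frequencies along one long run and compare them with the stationary distribution obtained by linear algebra (the sketch in Sec. III A). A minimal illustration on the same hypothetical chain:

```python
import numpy as np

rng = np.random.default_rng(1)
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

n, x = 200_000, 0
counts = np.zeros(len(P))
for _ in range(n):                    # one long run of the chain
    x = rng.choice(len(P), p=P[x])
    counts[x] += 1
print(counts / n)                     # N_n(x)/n for each state x, eq. (31)

# Comparison: stationary distribution from pi P = pi and normalization.
A = np.vstack([P.T - np.eye(len(P)), np.ones(len(P))])
b = np.r_[np.zeros(len(P)), 1.0]
print(np.linalg.lstsq(A, b, rcond=None)[0])
```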
For reducible Markov chains, the following results hold:

Theorem 12. Let SP denote the set of positive recurrent states of a Markov chain.

1. if SP is empty, the stationary distribution does not exist;

2. if SP is not empty and irreducible, the chain has a unique stationary distribution;

3. if SP is non-empty and reducible, the chain has an infinite number of stationary distributions.

Case 3 arises because the chain reaches one of the closed irreducible sets and then stays there forever. It is a subtle case, in which Monte Carlo simulations may not give proper results if the reducibility of the chain is not studied first.
If x is a state of a Markov chain such that P n (x, x) > 0 for some n ≥ 1, its period dx
can be defined as the greatest common divisor of the set {n ≥ 1 : P n (x, x) > 0}. For two
states x and y leading to each other, dx = dy . States in an irreducible Markov chain have a
common period d. The chain is called periodic of period d if d > 1 and aperiodic if d = 1.
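The period of a state can be computed directly from its definition, taking the greatest common divisor of the return lengths found up to some cutoff (the cutoff makes this a heuristic sketch rather than a complete algorithm):

```python
import numpy as np
from math import gcd
from functools import reduce
from numpy.linalg import matrix_power

def period(P, x, n_max=50):
    """gcd of {n >= 1 : P^n(x, x) > 0}, with the set truncated at n_max."""
    lengths = [n for n in range(1, n_max + 1) if matrix_power(P, n)[x, x] > 0]
    return reduce(gcd, lengths) if lengths else None

# A deterministic 3-cycle: every state has period 3.
C3 = np.array([[0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0],
               [1.0, 0.0, 0.0]])
print(period(C3, 0))    # 3

# The hypothetical chain used throughout has P(x, x) > 0, hence period 1 (aperiodic).
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
print(period(P, 0))     # 1
```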
The following theorem gives the conditions for the convergence of P n (x, y) to the stationary
distribution:

Theorem 13. For an aperiodic irreducible positive recurrent Markov chain with stationary
probability π(x)
lim_{n→∞} P^n(x, y) = π(y),    x, y ∈ S.    (32)

For a periodic chain with the same properties and with period d, for each pair of states in
S, there is an integer r, 0 ≤ r < d, such that P n (x, y) = 0 unless n = md + r for some
non-negative integer m, and

lim_{m→∞} P^{md+r}(x, y) = d π(y),    x, y ∈ S.    (33)

This theorem is the only one in the list that needs (mild) number-theoretic tools to be
proven.

Acknowledgements

These notes were written during a visit to Marburg University supported by an Erasmus
fellowship. The author wishes to thank Prof. Guido Germano and his group for their warm
hospitality.

[1] T.C. Schelling (1971) Dynamic Models of Segregation, Journal of Mathematical Sociology 1,
143-186.
[2] E. Ising (1924) Beitrag zur Theorie des Ferro- und Paramagnetismus, Dissertation,
Mathematisch-Naturwissenschaftliche Fakultät der Hamburgischen Universität, Hamburg.
[3] D. Landau and K. Binder (1995) A Guide to Monte Carlo Simulations in Statistical Physics,
Cambridge University Press.
[4] M. Aoki and H. Yoshikawa (2007) Reconstructing Macroeconomics. A Perspective from Statis-
tical Physics and Combinatorial Stochastic Processes, Cambridge University Press.
[5] P.G. Hoel, S.C. Port, and C.J. Stone (1972) Introduction to Stochastic Processes, Houghton
Mifflin, Boston.
[6] J.G. Kemeny and J. Laurie Snell (1976) Finite Markov Chains, Springer, New York.
[7] R. Durrett (1999) Essentials of Stochastic Processes, Springer, New York.
[8] E. Çinlar (1975) Introduction to Stochastic Processes, Prentice Hall, Englewood Cliffs.
[9] N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller and E. Teller (1953) Equation of
State Calculations by Fast Computing Machines, Journal of Chemical Physics, 21, 1087–1092.

