
**MARKOV CHAINS AND RANDOM WALKS**

Takis Konstantopoulos∗ Autumn 2009

∗ © Takis Konstantopoulos 2006–2009

Contents

1 Introduction
2 Examples of Markov chains
  2.1 A mouse in a cage
  2.2 Bank account
  2.3 Simple random walk (drunkard's walk)
  2.4 Simple random walk (drunkard's walk) in a city
  2.5 Actuarial chains
3 The Markov property
  3.1 Definition of the Markov property
  3.2 Transition probability and initial distribution
  3.3 Time-homogeneous chains
4 Stationarity
  4.1 Finding a stationary distribution
5 Topological structure
  5.1 The graph of a chain
  5.2 The relation "leads to"
  5.3 The relation "communicates with"
  5.4 Period
6 Hitting times and first-step analysis
7 Gambler's ruin
8 Stopping times and the strong Markov property
9 Regenerative structure of a Markov chain
10 Recurrence and transience
  10.1 First hitting time decomposition
  10.2 Communicating classes and recurrence/transience
11 Positive recurrence
12 Law (=Theorem) of Large Numbers in Probability Theory
13 Law of Large Numbers for Markov chains
  13.1 Functions of excursions
  13.2 Average reward
14 Construction of a stationary distribution
15 Positive recurrence is a class property
16 Uniqueness of the stationary distribution
17 Structure of a stationary distribution
18 Coupling and stability
  18.1 Definition of stability
  18.2 The fundamental stability theorem for Markov chains
  18.3 Coupling
  18.4 Proof of the fundamental stability theorem
  18.5 Terminology
  18.6 Periodic chains
19 Limiting theory for Markov chains*
  19.1 Limiting probabilities for null recurrent states
  19.2 The general case
20 Ergodic theorems
21 Finite chains
22 Time reversibility
23 New Markov chains from old ones*
  23.1 Subsequences
  23.2 Watching a Markov chain when it visits a set
  23.3 Subordination
  23.4 Deterministic function of a Markov chain
24 Applications
  24.1 Branching processes: a population growth application
  24.2 The ALOHA protocol: a computer communications application
  24.3 PageRank (trademark of Google): A World Wide Web application
  24.4 The Wright-Fisher model: an example from biology
  24.5 A storage or queueing application
25 Random walk
26 The simple symmetric random walk in dimension 1: path counting
  26.1 Paths that start and end at specific points
  26.2 Paths that start and end at specific points and do not touch zero at all
27 The simple symmetric random walk in dimension 1: simple probability calculations
  27.1 Distribution after n steps
  27.2 The ballot theorem
  27.3 Some conditional probabilities
  27.4 Some remarkable identities
  27.5 First return to zero
28 The reflection principle for a simple random walk in dimension 1
  28.1 Distribution of hitting time and maximum
29 Urns and the ballot theorem
30 The asymmetric simple random walk in dimension 1
  30.1 First hitting time analysis
  30.2 First return to the origin
  30.3 The distribution of the first return to the origin
31 Recurrence properties of simple random walks
  31.1 Asymptotic distribution of the maximum
  31.2 Return probabilities
  31.3 Total number of visits to a state
32 Duality
33 Amount of time that a SSRW is positive*
34 Simple symmetric random walk in two dimensions
35 Skorokhod embedding*

Preface

These notes were written based on a number of courses I taught over the years in the U.S., Greece and the U.K. They form the core material for an undergraduate course on Markov chains in discrete time. There are, of course, dozens of good books on the topic. The only new thing here is that I give emphasis to probabilistic methods as soon as possible. Also, I introduce stationarity before even talking about state classification; one can argue, of course, that random walk calculations should be done before the student is exposed to the Markov chain theory. I have tried both and prefer the current ordering. The first part is about Markov chains and some applications. The second one is specifically for simple random walks. I tried to make everything as rigorous as possible while maintaining each step as accessible as possible. A few starred sections should be considered "advanced" and can be omitted at first reading. The notes should be readable by someone who has taken a course in introductory (non-measure-theoretic) probability. At the end, I have a little mathematical appendix. Also, "important" formulae are placed inside a coloured box, and I tried to put all terms in blue small capitals whenever they are first encountered.

These notes are still incomplete. I plan to add a few more sections:
– On algorithms and simulation
– On criteria for positive recurrence
– On doing stuff with matrices
– On finance applications
– On a few more delicate computations for simple random walks
– Reshape the appendix
– Add more examples
These are things to come.

PART I: MARKOV CHAINS

1 Introduction

A Markov chain is a mathematical model of a random phenomenon evolving with time in a way that the past affects the future only through the present. The "time" can be discrete (i.e. the integers), continuous (i.e. the real numbers), or, more generally, a totally ordered set. We are herein constrained by the Syllabus to discuss only discrete-time Markov chains. In the module following this one you will study continuous-time chains.

In Mathematics, a phenomenon which evolves with time in a way that only the present affects the future is called a dynamical system.

An example from Physics is provided by Newton's laws of motion. Suppose that a particle of mass m is moving under the action of a force F in a straight line. Here, x = x(t) denotes the position of the particle at time t. Its velocity is ẋ = dx/dt, and its acceleration is ẍ = d²x/dt². Assume that the force F depends only on the particle's position and velocity: F = f(x, ẋ). Newton's second law says that the acceleration is proportional to the force:

m ẍ = F.

For instance, if the particle is suspended at the end of a spring, then force is produced by extending the spring and letting the system oscillate (Hooke's law). The state of the particle at time t is described by the pair

X(t) = (x(t), ẋ(t)).

It is not hard to see that, for any t, the future trajectory (X(s), s ≥ t) can be completely specified by the current state X(t).

An example from Arithmetic of a dynamical system in discrete time is the one which finds the greatest common divisor gcd(a, b) between two positive integers a and b. Recall (from elementary school maths) that gcd(a, b) = gcd(r, b), where r = rem(a, b) is the remainder of the division of a by b. By repeating the procedure, we end up with two numbers, one of which is 0, and the other the greatest common divisor. Formally, we let our "state" be a pair of integers (xn, yn), where xn ≥ yn, initialised by

x0 = max(a, b),  y0 = min(a, b),

and evolving as

xn+1 = yn
yn+1 = rem(xn, yn),  n = 0, 1, 2, ...

In other words, there is a function F that takes the pair Xn = (xn, yn) into Xn+1 = (xn+1, yn+1). The sequence of pairs thus produced, X0, X1, X2, ..., has the property that, for any n, the future pairs Xn+1, Xn+2, ... depend only on the pair Xn.
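As a concrete illustration, here is a minimal sketch of this dynamical system (Python; the function name `gcd_chain` and the loop structure are ours, chosen for illustration only, and are not part of the original notes):

```python
def gcd_chain(a, b):
    """Run the deterministic dynamical system (x_n, y_n) for the gcd.

    State: a pair (x, y) with x >= y, initialised by
    x0 = max(a, b), y0 = min(a, b), and evolving as
    x_{n+1} = y_n, y_{n+1} = rem(x_n, y_n).
    """
    x, y = max(a, b), min(a, b)
    trajectory = [(x, y)]
    while y > 0:                     # stop when one of the numbers becomes 0
        x, y = y, x % y              # (x_{n+1}, y_{n+1}) depends on (x_n, y_n) only
        trajectory.append((x, y))
    return trajectory                # the last state is (gcd(a, b), 0)

print(gcd_chain(84, 30))             # [(84, 30), (30, 24), (24, 6), (6, 0)]
```

Note that the next state is a deterministic function of the current state alone, which is exactly the "only the present affects the future" property described above.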

In other words, past states are not needed: the current state suffices.

Markov chains generalise this concept of dependence of the future only on the present. The generalisation takes us into the realm of Randomness: we will be dealing with random variables, instead of deterministic objects. We will develop a theory that tells us how to describe, analyse, and use those mathematical models which are called Markov chains, via the use of the tools of Probability. We will also see why they are useful and discuss how they are applied; indeed, we will see what kind of questions we can ask and what kind of answers we can hope for.

Other examples of dynamical systems are the algorithms run, say, by the software in your computer. Some of these algorithms are deterministic, but some are stochastic.¹ They are stochastic either because the problem they are solving is stochastic, or because the problem is deterministic but "very large", such as finding the determinant of a matrix with 10,000 rows and 10,000 columns, or computing the integral of a complicated function of a large number of variables. Indeed, an effective way for dealing with large problems is via the use of randomness. An example of a random algorithm is the Monte Carlo method for the approximate computation of the integral ∫_a^b f(x) dx of a complicated function f: suppose that f is positive and bounded above by B. Choose a pair (X0, Y0) of random numbers, uniformly, i.e. a ≤ X0 ≤ b, 0 ≤ Y0 ≤ B. Count 1 if Y0 < f(X0) and 0 otherwise. Repeat with (Xn, Yn) for steps n = 1, 2, ..., performing this a large number N of times. Count the number of 1's and divide by N. The resulting ratio approximates the fraction of the rectangle [a, b] × [0, B] lying below the graph of f; multiplying it by the area B(b − a) of the rectangle gives an approximation of ∫_a^b f(x) dx. Try this on the computer!

¹ Stochastic means random.
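A minimal sketch of this Monte Carlo scheme (Python; the function name is ours, and we assume 0 ≤ f ≤ B on [a, b]):

```python
import random

def monte_carlo_integral(f, a, b, B, N=100_000):
    """Estimate the integral of f over [a, b], assuming 0 <= f <= B.

    Throw N uniform points (X, Y) in the rectangle [a, b] x [0, B];
    the fraction falling below the graph of f, times the rectangle's
    area B*(b - a), approximates the integral.
    """
    hits = 0
    for _ in range(N):
        x = random.uniform(a, b)
        y = random.uniform(0, B)
        if y < f(x):
            hits += 1
    return (hits / N) * B * (b - a)

# Example: the integral of x^2 over [0, 1] is 1/3.
print(monte_carlo_integral(lambda x: x * x, 0.0, 1.0, 1.0))
```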

2 Examples of Markov chains

2.1 A mouse in a cage

A mouse is in a cage with two cells, 1 and 2, containing fresh and stinky cheese, respectively. A scientist's job is to record the position of the mouse every minute. When the mouse is in cell 1 at time n (minutes) then, at time n + 1, it is either still in 1 or has moved to 2. Statistical observations lead the scientist to believe that the mouse moves from cell 1 to cell 2 with probability α = 0.05; in addition, it moves from 2 to 1 with probability β = 0.99, regardless of where it was at earlier times. We can summarise this information by the transition diagram:

[transition diagram: states 1 and 2; arrow 1 → 2 labelled α, loop at 1 labelled 1 − α; arrow 2 → 1 labelled β, loop at 2 labelled 1 − β]

Another way to summarise the information is by the 2×2 transition probability matrix

P = ( 1−α  α ; β  1−β ) = ( 0.95  0.05 ; 0.99  0.01 ).

Questions of interest:
1. How long does it take for the mouse, on the average, to move from cell 1 to cell 2?
2. How often is the mouse in room 1?

Question 1 has an easy, intuitive, answer: Since the mouse really tosses a coin to decide whether to move or stay in a cell, the first time that the mouse will move from 1 to 2 will have mean 1/α = 1/0.05 ≈ 20 minutes. (This is the mean of the geometric distribution with parameter α.)

2.2 Bank account

The amount of money in Mr Baxter's bank account evolves, from month to month, as follows:

Xn+1 = max{Xn + Dn − Sn, 0},

where Xn is the money (in pounds) at the beginning of the month, Dn is the money he deposits, and Sn is the money he wants to spend. So if he wishes to spend more than what is available, he can't. Assume that D0, D1, D2, ... and S0, S1, S2, ... are independent random variables with distributions FD, FS, respectively, and assume also that X0 is independent of them. The information described above by the evolution of the account, together with the distributions for Dn and Sn, leads to the (one-step) transition probability, which is defined by:

px,y := P(Xn+1 = y | Xn = x),  x, y = 0, 1, 2, ...

We easily see that if y > 0 then

px,y = P(Dn − Sn = y − x) = Σ_{z=0}^∞ P(Sn = z, Dn = z + y − x) = Σ_{z=0}^∞ P(Sn = z) P(Dn = z + y − x) = Σ_{z=0}^∞ FS(z) FD(z + y − x),

something that, in principle, can be computed from FS and FD. Of course, if y = 0, we have

px,0 = 1 − Σ_{y=1}^∞ px,y.
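To make this concrete, here is a small sketch (Python; the function name and the numerical mass functions below are our own illustrative assumptions, with FD and FS supported on a finite range) computing px,y for y > 0:

```python
def transition_prob(x, y, f_D, f_S):
    """Compute p_{x,y} = P(X_{n+1} = y | X_n = x) for y > 0, where
    f_D[k] = P(D_n = k) and f_S[k] = P(S_n = k) are dicts with
    finite support.

    p_{x,y} = sum over z of P(S_n = z) * P(D_n = z + y - x).
    """
    return sum(p_s * f_D.get(z + y - x, 0.0) for z, p_s in f_S.items())

# Hypothetical example: deposits of 0, 1 or 2 pounds; spending 0 or 1.
f_D = {0: 0.2, 1: 0.5, 2: 0.3}
f_S = {0: 0.4, 1: 0.6}

# P(X_{n+1} = 3 | X_n = 2) = P(D - S = 1) = 0.5*0.4 + 0.3*0.6 = 0.38
print(transition_prob(2, 3, f_D, f_S))
```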

The transition probability matrix is

P = ( p0,0  p0,1  p0,2  ··· ; p1,0  p1,1  p1,2  ··· ; p2,0  p2,1  p2,2  ··· ; ··· )

and is an infinite matrix.

Questions of interest:
1. Will Mr Baxter's account grow, and if so, how fast?
2. How long will it take (if ever) for the bank account to empty?

We shall develop methods for answering these and other questions.

2.3 Simple random walk (drunkard's walk)

Consider a completely drunk person who walks along a street. Being drunk, he has no sense of direction, so he may move forwards with equal probability that he moves backwards.

[diagram: the integers ..., −2, −1, 0, 1, 2, 3, ..., with an arrow of probability 1/2 from each integer to each of its two neighbours]

Questions:
1. If he starts from position 0, how often will he be visiting 0?
2. Are there any places which he will never visit?
3. Are there any places which he will visit infinitely many times?
4. If the pub is located in position 0 and his home in position 100, how long will it take him, on the average, to go from the pub to home?

You might object to the assumption that the person is totally drunk and has no sense of direction. We may want to model such a case by assigning probability p for the drunkard to go one step to the right (and probability 1 − p for him to go one step to the left).
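A quick simulation sketch for question 4 (Python; the function name is ours and not from the notes), estimating the mean time for a walk with right-step probability p to go from the pub at 0 to home at 100:

```python
import random

def hitting_time(target=100, p=0.6):
    """Number of steps a biased +/-1 walk takes from 0 until it hits `target`."""
    position, steps = 0, 0
    while position != target:
        position += 1 if random.random() < p else -1
        steps += 1
    return steps

# For p > 1/2 the mean is target/(2p - 1), i.e. 500 here.
# (For p = 1/2 the walker still reaches home eventually, but the mean
# time is infinite, so this experiment would not converge.)
N = 1000
print(sum(hitting_time() for _ in range(N)) / N)
```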

2.4 Simple random walk (drunkard's walk) in a city

Suppose that the drunkard is allowed to move in a city whose streets are laid down in a square pattern:

[figure: a square street grid; Salvador Dali (1922), The Drunkard]

Suppose that the drunkard moves from corner to corner, and let's say he's completely drunk. So he can move east, west, north or south, and we therefore assign probability 1/4 for each move. We may again want to ask similar questions. But, observe, that since there are more degrees of freedom now, there is clearly a higher chance that our man will be lost. These things are, for now, mathematically imprecise, but they will become more precise in the sequel.

2.5 Actuarial chains

A life insurance company wants to find out how much money to charge its clients. Clearly, the company must have an idea on how long the clients will live, and the answer to this is crucial for the policy determination. It proposes the following model summarising the state of health of an individual on a monthly basis: an individual can be healthy (H), sick (S) or dead (D), and, for example, there is probability pH,S = 0.3 that the person will become sick, given that he is currently healthy.

[transition diagram: H → S with probability 0.3, S → H with probability 0.8, S → D with probability 0.1, H → D with probability 0.01]

Note that the diagram omits pH,H, pS,S and pD,D. Clearly, pH,H = 1 − pH,S − pH,D, pS,S = 1 − pS,H − pS,D, and pD,D = 1, the reason for the latter being that the company does not believe that its clients are subject to resurrection.

Question: What is the distribution of the lifetime (in months) of a currently healthy individual?
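A simulation sketch for this question (Python; the self-loop probabilities 0.69 and 0.10 are derived from the diagram values as 1 minus the outgoing probabilities, and the function name is ours):

```python
import random

# One-step transition probabilities of the health chain (H, S, D).
P = {
    "H": [("S", 0.30), ("D", 0.01), ("H", 0.69)],
    "S": [("H", 0.80), ("D", 0.10), ("S", 0.10)],
    "D": [("D", 1.00)],
}

def lifetime(start="H"):
    """Months until absorption in state D, starting from `start`."""
    state, months = start, 0
    while state != "D":
        u, acc = random.random(), 0.0
        for nxt, p in P[state]:
            acc += p
            if u < acc:
                state = nxt
                break
        months += 1
    return months

samples = [lifetime() for _ in range(10_000)]
print(sum(samples) / len(samples))   # estimated mean lifetime in months
```

An empirical histogram of `samples` then approximates the lifetime distribution asked for above.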

3 The Markov property

3.1 Definition of the Markov property

Consider a (finite or infinite) sequence of random variables {Xn, n ∈ T}, where T is a subset of the integers. We say that this sequence has the Markov property if, for any n ∈ T, the future process (Xm, m ∈ T, m > n) is independent of the past process (Xm, m ∈ T, m < n), conditionally on Xn. Usually, T = Z+ = {0, 1, 2, 3, ...} or N = {1, 2, 3, ...} or the set Z of all integers (positive, zero, or negative). We focus almost exclusively on the first choice, so we will omit referring to T explicitly, unless needed. We will also assume, almost exclusively (unless otherwise said explicitly), that the Xn take values in some countable set S, called the state space. The elements of S are frequently called states.

Sometimes we say that {Xn, n ∈ T} is Markov instead of saying that it has the Markov property; it is customary to call (Xn) a Markov chain.

Let us now write the Markov property in an equivalent way.

Lemma 1. (Xn, n ∈ Z+) is Markov if and only if, for all n ∈ Z+ and all i0, ..., in+1 ∈ S,

P(Xn+1 = in+1 | Xn = in, ..., X0 = i0) = P(Xn+1 = in+1 | Xn = in).

Proof. If (Xn) is Markov then the claim holds by the conditional independence of Xn+1 and (X0, ..., Xn−1) given Xn. On the other hand, if the claim holds, we can see that by applying it several times we obtain

P(∀ k ∈ [n + 1, n + m] Xk = ik | Xn = in, ..., X0 = i0) = P(∀ k ∈ [n + 1, n + m] Xk = ik | Xn = in),

for any n, m and any choice of states i0, ..., in+m, which precisely means that (Xn+1, ..., Xn+m) and (X0, ..., Xn−1) are independent conditionally on Xn. This is true for any n and m, and so the future is independent of the past given the present. □

3.2 Transition probability and initial distribution

We therefore see that joint probabilities can be expressed as

P(X0 = i0, X1 = i1, X2 = i2, ..., Xn = in) = P(X0 = i0) P(X1 = i1 | X0 = i0) P(X2 = i2 | X1 = i1) · · · P(Xn = in | Xn−1 = in−1),   (1)

which means that the two functions

µ(i) := P(X0 = i), i ∈ S,   and   pi,j(n, n + 1) := P(Xn+1 = j | Xn = i), i, j ∈ S, n ≥ 0,

will specify all joint distributions²:

P(X0 = i0, X1 = i1, X2 = i2, ..., Xn = in) = µ(i0) pi0,i1(0, 1) pi1,i2(1, 2) · · · pin−1,in(n − 1, n).

The function µ(i) is called initial distribution. The function pi,j(n, n + 1) is called transition probability from state i at time n to state j at time n + 1. More generally,

pi,j(m, n) := P(Xn = j | Xm = i)

is the transition probability from state i at time m to state j at time n ≥ m. Of course, pi,j(n, n) = 1(i = j) and, by (1),

pi,j(m, n) = Σ_{i1∈S} · · · Σ_{in−1∈S} pi,i1(m, m + 1) pi1,i2(m + 1, m + 2) · · · pin−1,j(n − 1, n),   (2)

which is reminiscent of matrix multiplication. Indeed, remembering a bit of Algebra, we define, for each m ≤ n, the matrix

P(m, n) := [pi,j(m, n)]i,j∈S,

viz. a square matrix whose rows and columns are indexed by S (and this, of course, requires some ordering on S). If |S| = d then we have d × d matrices, but if S is an infinite set, then the matrices are infinitely large.³ In this new notation, we can write (2) as

P(m, n) = P(m, m + 1) P(m + 1, m + 2) · · · P(n − 1, n),

or as

P(m, n) = P(m, ℓ) P(ℓ, n)   if m ≤ ℓ ≤ n,

something that can be called semigroup property or Chapman-Kolmogorov equation.

² In other words, all probabilities of events associated with the Markov chain.
³ This causes some analytical difficulties, but we will circumvent them by using Probability only.

3.3 Time-homogeneous chains

We will specialise to the case of time-homogeneous chains, which means that the transition probabilities pi,j(n, n + 1) do not depend on n,

and therefore we can simply write

pi,j := pi,j(n, n + 1) = P(Xn+1 = j | Xn = i),

and call pi,j the (one-step) transition probability from state i to state j, while

P = [pi,j]i,j∈S

is called the (one-step) transition probability matrix or, simply, transition matrix. Then

P(m, n) = P × P × · · · × P = P^{n−m}   (n − m times),

for all m ≤ n, and the semigroup property is now even more obvious because

P^{a+b} = P^a P^b, for all integers a, b ≥ 0.

From now on, unless otherwise specified, when we say "Markov chain" we will mean "time-homogeneous Markov chain".

Notational conventions

Unit mass. The distribution that assigns value 1 to a state i and 0 to all other states is denoted by δi:

δi(j) := 1, if j = i; 0, otherwise.   (3)

We call δi the unit mass at i. It corresponds to picking state i with probability 1. Notice that any probability distribution on S can be written as a linear combination of unit masses. For example, if µ assigns probability p to state i1 and 1 − p to state i2, then µ = p δi1 + (1 − p) δi2.

Starting from a specific state i. A useful convention used for time-homogeneous chains is to write

Pi(A) := P(A | X0 = i).

Thus Pi is the law of the chain when the initial distribution is δi. Similarly, if Y is a real random variable, we write

Ei Y := E(Y | X0 = i) = Σ_y y P(Y = y | X0 = i) = Σ_y y Pi(Y = y).

Also, let

µn := [µn(i) = P(Xn = i)]i∈S

be the distribution of Xn, arranged in a row vector. Then we have

µn = µ0 P^n,

as follows from (1) and the definition of P. The matrix notation is useful.
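For instance, here is a short numerical sketch (Python with numpy; the two-state matrix is the mouse chain from Section 2.1, and the rest is our own illustration) of the relation µn = µ0 Pⁿ and of the semigroup property:

```python
import numpy as np

# Transition matrix of the mouse chain (alpha = 0.05, beta = 0.99).
P = np.array([[0.95, 0.05],
              [0.99, 0.01]])

mu = np.array([1.0, 0.0])            # mu_0 = delta_1: start in cell 1
for n in range(1, 6):
    mu = mu @ P                      # mu_n = mu_{n-1} P = mu_0 P^n
    print(n, mu)

# Semigroup / Chapman-Kolmogorov check: P^5 = P^2 P^3.
assert np.allclose(np.linalg.matrix_power(P, 5),
                   np.linalg.matrix_power(P, 2) @ np.linalg.matrix_power(P, 3))
```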

4 Stationarity

We say that the process (Xn, n = 0, 1, ...) is stationary if it has the same law as (Xn, n = m, m + 1, ...), for any m ∈ Z+. In other words, the law of a stationary process does not depend on the origin of time. Clearly, a sequence of i.i.d. random variables provides an example of a stationary process. We now investigate when a Markov chain is stationary.

To provide motivation and intuition, let us look at the following example:

Example: Motion on a polygon. Consider a canonical heptagon and let a bug move on its vertices. If at time n the bug is on a vertex, at time n + 1 it moves to one of the adjacent vertices with equal probability. It should be clear that if we initially place the bug at random on one of the vertices, selecting a vertex with probability 1/7, then, at any point of time n, the distribution of the bug will still be the same. Hence the uniform probability distribution is stationary.

Example: the Ehrenfest chain. This is a model of gas distributed between two identical communicating chambers. Imagine there are N molecules of gas (where N is a large number, say N = 10²⁵) in a metallic chamber with a separator in the middle, dividing it into two identical rooms. The gas is placed partly in room 1 and partly in room 2. Then a hole is opened in the separator and we watch what happens as the molecules diffuse between the two rooms. It is clear that if room 1 contains more gas than room 2 then there is a tendency to observe net motion from 1 to 2, and vice versa. We can model this by saying that the chance that a molecule moves from room 1 to room 2 is proportional to the number of molecules in room 1, and vice versa. Let Xn be the number of molecules in room 1 at time n. Then

pi,i+1 = P(Xn+1 = i + 1 | Xn = i) = (N − i)/N,
pi,i−1 = P(Xn+1 = i − 1 | Xn = i) = i/N.

If we start the chain with X0 = N (all molecules in room 1) then the process (Xn, n ≥ 0) will not be stationary: indeed, for a while we will be able to tell how long ago the process started, as it takes some time before the molecules diffuse from room to room. The question then is: can we distribute the initial number of molecules in a way that the origin of time plays no role? One guess might be to take X0 = N/2; on the average, we indeed expect that we have N/2 molecules in each room. But this will not work, because it is impossible for the number of molecules to remain constant at all times. Another guess then is: Place

each molecule at random either in room 1 or in room 2. If we do so, then the number of molecules in room 1 will be i with probability

π(i) = (N choose i) 2^{−N},  0 ≤ i ≤ N,

(a binomial distribution). We can now verify that if P(X0 = i) = π(i) then P(X1 = i) = π(i) and, hence, P(Xn = i) = π(i) for all n. Indeed,

P(X1 = i) = Σ_j P(X0 = j) pj,i = P(X0 = i − 1) pi−1,i + P(X0 = i + 1) pi+1,i
= π(i − 1) pi−1,i + π(i + 1) pi+1,i
= (N choose i−1) 2^{−N} (N − i + 1)/N + (N choose i+1) 2^{−N} (i + 1)/N = · · · = π(i).

(The dots signify some missing algebra, which you should go through by yourselves.)

The example above is typical: We found (by guessing!) some distribution π with the property that if X0 has distribution π then every other Xn also has distribution π. If we can do so, then we have created a stationary Markov chain. But we need to prove this.

Lemma 2. A Markov chain (Xn, n ∈ Z+) is stationary if and only if it is time-homogeneous and Xn has the same distribution as Xℓ for all n and ℓ.

Proof. The only if part is obvious. For the if part, use (1) to see that

P(X0 = i0, X1 = i1, ..., Xn = in) = P(Xm = i0, Xm+1 = i1, ..., Xm+n = in),

for all m ≥ 0 and i0, ..., in ∈ S. □

In other words, a Markov chain is stationary if and only if the distribution of Xn does not change with n. Since the distribution µn at time n is related to the distribution at time 0 by µn = µ0 P^n, we see that the Markov chain is stationary if and only if the distribution µ0 is chosen so that

µ0 = µ0 P.

Changing notation, we are led to consider:

The stationarity problem: Given [pij], find those distributions π such that

πP = π.   (4)

Such a π is called a stationary or invariant distribution. The equations (4) are called balance equations.
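A numerical sanity check of the Ehrenfest guess against (4) (Python with numpy; we take a small N, since the point is only to verify that the balance equations hold):

```python
import numpy as np
from math import comb

N = 10
# Ehrenfest transition matrix on the states 0, 1, ..., N.
P = np.zeros((N + 1, N + 1))
for i in range(N + 1):
    if i < N:
        P[i, i + 1] = (N - i) / N
    if i > 0:
        P[i, i - 1] = i / N

# Candidate stationary distribution: Binomial(N, 1/2).
pi = np.array([comb(N, i) * 2.0**(-N) for i in range(N + 1)])

print(np.allclose(pi @ P, pi))   # True: pi P = pi
```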

Eigenvector interpretation: The equation π = πP says that π is a left eigenvector of the matrix P with eigenvalue 1. The matrix P always has the eigenvalue 1, because Σ_{j∈S} pi,j = 1, which, in matrix notation, can be written as P1 = 1, where 1 is a column whose entries are all 1; and this means that 1 is a (right) eigenvector of P corresponding to the eigenvalue 1. In addition to the balance equations, π must be a probability:

Σ_{i∈S} π(i) = 1.

Example: waiting for a bus. After a bus arrives at a bus stop, the next one will arrive in i minutes with probability p(i), i = 1, 2, .... The times between successive buses are i.i.d. random variables. Consider the following process: At time n (measured in minutes) let Xn be the time till the arrival of the next bus. Then (Xn, n ≥ 0) is a Markov chain on the state space S = N = {1, 2, ...} with transition probabilities

pi+1,i = 1, i ≥ 1,
p1,i = p(i), i ≥ 1.

To find the stationary distribution, we write the equations (4):

π(i) = Σ_j π(j) pj,i = π(1) p(i) + π(i + 1), i ≥ 1.

These can easily be solved:

π(i) = π(1) Σ_{j≥i} p(j), i ≥ 1.

To compute π(1), we use the normalisation condition Σ_{i≥1} π(i) = 1, which gives

π(1) = ( Σ_{i≥1} Σ_{j≥i} p(j) )^{−1}.

But here, we must be careful. Who tells us that the sum inside the parentheses is finite? If the sum is infinite, then we run into trouble: indeed, π(1) will be zero, and so each π(i) will be zero. This means that the solution to the balance equations is identically zero, which in turn means that the normalisation condition cannot be satisfied. So we must make an:

ASSUMPTION:  Σ_{i≥1} Σ_{j≥i} p(j) < ∞.

Example: card shuffling. Consider a deck of d = 52 cards and shuffle them in the following (stupid) way: pick two cards at random and swap their positions. Keep doing it continuously. We can describe this by a Markov chain which takes values in the set Sd of d! permutations of {1, ..., d}; each x ∈ Sd is of the form x = (x1, ..., xd), where all the xi are distinct. It is easy to see that the stationary distribution is uniform:

π(x) = 1/d!,  x ∈ Sd.

Let us work out the sum in the bus example to see if the assumption can be expressed in more "physical" terms:

Σ_{i≥1} Σ_{j≥i} p(j) = Σ_{j≥1} Σ_{i≥1} 1(j ≥ i) p(j) = Σ_{j≥1} j p(j),

and the latter sum is the expected time between two successive bus arrivals. So our assumption really means:

ASSUMPTION ⇐⇒ the expected time between successive bus arrivals is finite.

In fact, what we have really shown is that this assumption is a necessary and sufficient condition for the existence and uniqueness of a stationary distribution.

Example: bread kneading.* (This is slightly beyond the scope of these lectures and can be omitted.) Consider an infinite binary vector ξ = (ξ1, ξ2, ...), ξi ∈ {0, 1} for all i, and transform it according to the following rule: if ξ1 = 0 then shift to the left to obtain (ξ2, ξ3, ...); if ξ1 = 1 then shift to the left and flip all bits to obtain (1 − ξ2, 1 − ξ3, ...). We have thus defined a mapping ϕ from binary vectors into binary vectors. The Markov chain is obtained by applying the mapping ϕ again and again:

ξ(n + 1) = ϕ(ξ(n)).

Here ξ(n) = (ξ1(n), ξ2(n), ...) is the state at time n. Note that there is nothing random in the transition mechanism of this chain! The only way to instill randomness is by choosing the initial state at random. So let us do so. We can show that if we start with the ξr(0), r = 1, 2, ..., i.i.d. with

P(ξr(0) = 1) = P(ξr(0) = 0) = 1/2,

then the same will be true for all n:

P(ξr(n) = 1) = P(ξr(n) = 0) = 1/2, for any r = 1, 2, ... and any n = 0, 1, 2, ...

The proof is by induction: if the ξ1(n), ξ2(n), ... are i.i.d. with P(ξr(n) = 1) = P(ξr(n) = 0) = 1/2, then the same is true at n + 1. Indeed, for any ε1, ..., εr ∈ {0, 1},

P(ξ1(n + 1) = ε1, ..., ξr(n + 1) = εr) = P(ξ1(n) = 0, ξ2(n) = ε1, ..., ξr+1(n) = εr) + P(ξ1(n) = 1, ξ2(n) = 1 − ε1, ..., ξr+1(n) = 1 − εr),   (5)

and, from (5), the claim at n + 1 follows. Hence choosing the initial state at random in this manner results in a stationary Markov chain.

Remarks: This is not a Markov chain with countably many states, so it goes beyond our theory. But to even think about it will give you some strength. As for the name of the chain (bread kneading) it can be justified as follows: to each ξ there corresponds a number x between 0 and 1, because ξ can be seen as the binary representation of the real number x. If ξ1 = 0 then x < 1/2 and our shifting rule maps x into 2x. If ξ1 = 1 then x ≥ 1/2, and the rule maps x into 2(1 − x). But the function f(x) = min(2x, 2(1 − x)), applied to the interval [0, 1] (think of [0, 1] as a flexible bread dough arranged in a long rod), kneads the bread, for it stretches it till it doubles in length and then folds it back in the middle.

Note on terminology: When a Markov chain is stationary we refer to it as being in steady-state.

4.1 Finding a stationary distribution

As explained above, a stationary distribution π satisfies

π = πP (balance equations)
π1 = 1 (normalisation condition).

In other words,

π(i) = Σ_{j∈S} π(j) pj,i, i ∈ S,
Σ_{j∈S} π(j) = 1.

If the state space has d states, then the above are d + 1 equations. But we only have d unknowns. This is OK, because the first d equations are linearly dependent: the sum of both sides equals 1, therefore one of them is always redundant.

Writing the equations in this form is not always the best thing to do. Instead of eliminating equations, we introduce more, expecting to be able to choose, amongst them, a set of d linearly independent equations which can be more easily solved. The convenience is often suggested by the topological structure of the chain, as we will see in examples.

Let us introduce the notion of "probability flow" from a set A of states to its complement under a distribution π:

F(A, Ac) := Σ_{i∈A} Σ_{j∈Ac} π(i) pi,j.

Think of π(i) pi,j as a "current" flowing from i to j. So F(A, Ac) is the total current from A to Ac. We have:

Proposition 1. π is a stationary distribution if and only if π1 = 1 and

F(A, Ac) = F(Ac, A), for all A ⊂ S.

Proof. If the latter condition holds, then choose A = {i} and see that you obtain the balance equations. Conversely, suppose that the balance equations hold. Fix a set A ⊂ S and write

π(i) = Σ_{j∈S} π(j) pj,i = Σ_{j∈A} π(j) pj,i + Σ_{j∈Ac} π(j) pj,i.

Multiply π(i) by Σ_{j∈S} pi,j (which equals 1):

π(i) Σ_{j∈S} pi,j = Σ_{j∈A} π(j) pj,i + Σ_{j∈Ac} π(j) pj,i.

Now split the left sum also:

Σ_{j∈A} π(i) pi,j + Σ_{j∈Ac} π(i) pi,j = Σ_{j∈A} π(j) pj,i + Σ_{j∈Ac} π(j) pj,i.

Summing over i ∈ A, we see that the first term on the left cancels with the first on the right, while the remaining two terms give the equality we seek: F(A, Ac) = F(Ac, A). □

[schematic figure: a set A and its complement Ac, with the two opposite currents F(A, Ac) and F(Ac, A); the flow balance relations state that they are equal]

The extra degree of freedom provided by the last result gives us flexibility. One can ask how to choose a minimal complete set of linearly independent equations, but we shall not do this here. Instead, here is an example:

Example: Consider the 2-state Markov chain with p12 = α, p21 = β, p11 = 1 − α, p22 = 1 − β. (Assume 0 < α, β < 1.) Choosing A = {1} in Proposition 1, the flow balance relation is

π(1)α = π(2)β,

which immediately tells us that π(1) is proportional to β and π(2) proportional to α:

π(1) = cβ, π(2) = cα.

The constant c is found from π(1) + π(2) = 1. Thus,

π(1) = β/(α + β), π(2) = α/(α + β).

If we take α = 0.05, β = 0.99 (as in the mouse example), we find π(1) = 0.99/1.04 ≈ 0.952 and π(2) = 0.05/1.04 ≈ 0.048. That is, in steady state, the mouse spends 95.2% of the time in room 1 and only 4.8% of the time in room 2.
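In practice, for a finite chain one can also solve the balance equations numerically. A sketch (Python with numpy; the function name is ours), replacing one redundant balance equation by the normalisation condition, as discussed above:

```python
import numpy as np

def stationary_distribution(P):
    """Solve pi P = pi together with sum(pi) = 1 for a finite matrix P.

    We solve (P^T - I) pi = 0 with the last equation replaced by the
    normalisation condition sum_i pi(i) = 1 (one balance equation is
    always redundant).
    """
    d = P.shape[0]
    A = P.T - np.eye(d)
    A[-1, :] = 1.0
    b = np.zeros(d)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

# The two-state mouse chain: pi = (beta, alpha) / (alpha + beta).
alpha, beta = 0.05, 0.99
P = np.array([[1 - alpha, alpha],
              [beta, 1 - beta]])
print(stationary_distribution(P))    # approx [0.952, 0.048]
```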

Example: a walk with a barrier. Consider the following chain, where p + q = 1:

[diagram: states 0, 1, 2, 3, 4, ...; from each state i an arrow to i + 1 with probability p, and from each i ≥ 1 an arrow to i − 1 with probability q]

The balance equations are:

π(i) = p π(i − 1) + q π(i + 1), i = 1, 2, 3, ...

But we can make them simpler by choosing A = {0, 1, ..., i − 1}. If i ≥ 1 we have

F(A, Ac) = π(i − 1)p = π(i)q = F(Ac, A).

These are much simpler than the previous ones; in fact, we can solve them immediately. For i ≥ 1,

π(i) = (p/q)^i π(0).

Does this provide a stationary distribution? Yes, provided that Σ_{i=0}^∞ π(i) = 1. This is possible if and only if p < q, i.e. p < 1/2. (If p ≥ 1/2 there is no stationary distribution.)

5 Topological structure

5.1 The graph of a chain

This refers to the structure of the process that does not depend on the exact values of the transition probabilities but only on which of them are positive. This is often cast in terms of a digraph (directed graph), i.e. a pair (S, E) where S is a set of vertices and E a set of directed edges. In graph-theoretic terminology, S is the state space and E ⊂ S × S is defined by

(i, j) ∈ E ⇐⇒ pi,j > 0.

We can frequently (but not always) draw the graph (as we did in examples) and visualise the set E as the set of all edges with arrows.

5.2 The relation "leads to"

We say that a state i leads to state j if, starting from i, the chain will visit j at some finite time. In other words,

i leads to j (written as i ⇝ j) ⇐⇒ Pi(∃n ≥ 0 Xn = j) > 0.

Notice that this relation is reflexive: ∀i ∈ S, i ⇝ i. Furthermore:

Theorem 1. The following are equivalent:
(i) i ⇝ j;
(ii) pi,i1 pi1,i2 · · · pin−1,j > 0 for some n ∈ N and some states i1, ..., in−1;
(iii) Pi(Xn = j) > 0 for some n ≥ 0.

Proof. If i = j there is nothing to prove. So assume i ≠ j.
• Suppose that i ⇝ j. Since

0 < Pi(∃n ≥ 0 Xn = j) ≤ Σ_{n≥0} Pi(Xn = j),

the sum on the right must have a nonzero term: Pi(Xn = j) > 0 for some n, and so (iii) holds.
• Suppose that (iii) holds. We have

Pi(Xn = j) = Σ_{i1,...,in−1} pi,i1 pi1,i2 · · · pin−1,j,   (6)

where the sum extends over all possible choices of states in S. By assumption, the sum is positive and so one of its terms is positive, meaning that (ii) holds.
• Suppose now that (ii) holds. Look at the last display again: the sum in (6) contains a nonzero term, so Pi(Xn = j) > 0. Hence Pi(∃m ≥ 0 Xm = j) ≥ Pi(Xn = j) > 0, i.e. i ⇝ j, and (i) holds. □

Corollary 1. The relation ⇝ is reflexive and transitive: if i ⇝ j and j ⇝ k then i ⇝ k.

5.3 The relation "communicates with"

Define next the relation

i communicates with j (written as i ⇄ j) ⇐⇒ i ⇝ j and j ⇝ i.

Obviously, this relation is symmetric (i ⇄ j ⇐⇒ j ⇄ i). We just observed it is reflexive. It is also transitive, because ⇝ is so. Hence:

Corollary 2. The relation ⇄ is an equivalence relation.

(Equivalence relation means that it is reflexive, symmetric and transitive.) Just as any equivalence relation, it partitions S into equivalence classes, known as communicating classes. The communicating class corresponding to the state i is, by definition, the set of all states that communicate with i:

[i] := {j ∈ S : j ⇄ i}.

So, [i] = [j] if and only if i ⇄ j. Two communicating classes are either identical or completely disjoint.

[figure: a chain on the six states 1, 2, 3, 4, 5, 6]

In the example of the figure, we see that there are four communicating classes: {1}, {2, 3}, {4, 5}, {6}. The first two classes differ from the last two in character: the class {4, 5} is closed — if the chain goes in there then it never moves out of it — whereas the class {2, 3} is not closed.
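Communicating classes can be computed mechanically from the graph of the chain. A sketch (Python; the function names are ours, and the edge set below is a hypothetical one, chosen only so that it produces the four classes just discussed):

```python
def communicating_classes(edges, states):
    """Partition `states` into communicating classes: i and j are in the
    same class iff each is reachable from the other (depth-first search)."""
    def reachable(i):
        seen, stack = {i}, [i]
        while stack:
            u = stack.pop()
            for v in edges.get(u, []):
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return seen

    reach = {i: reachable(i) for i in states}
    classes = []
    for i in states:
        cls = {j for j in states if j in reach[i] and i in reach[j]}
        if cls not in classes:
            classes.append(cls)
    return classes

# A hypothetical edge set consistent with the classes of the figure.
edges = {1: [2], 2: [3], 3: [2, 4], 4: [5], 5: [4], 6: [5]}
print(communicating_classes(edges, range(1, 7)))
# [{1}, {2, 3}, {4, 5}, {6}]
```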

More generally, we say that a set of states C ⊂ S is closed if

Σ_{j∈C} pi,j = 1, for all i ∈ C.

In other words, starting from any state in C, the chain never leaves C. If a state i is such that pi,i = 1 then we say that i is an absorbing state; in other words, the single-element set {i} is a closed communicating class.

Lemma 3. A communicating class C is closed if and only if the following holds: If i ∈ C and if i ⇝ j then j ∈ C.

Proof. Suppose first that C is a closed communicating class, that i ∈ C, and that i ⇝ j. We can then find states i1, ..., in−1 such that pi,i1 pi1,i2 · · · pin−1,j > 0. Since i ∈ C, pi,i1 > 0 and C is closed, we have i1 ∈ C. Since pi1,i2 > 0, and so on, we conclude that j ∈ C. The converse is left as an exercise. □

A state i is essential if it belongs to a closed communicating class; otherwise, the state is inessential. Note: An inessential state i is such that there exists a state j such that i ⇝ j but j ̸⇝ i. (Why?)

If all states communicate with all others, then the chain (or, more precisely, its transition matrix P) is called irreducible. In graph terms: starting from any state, we can reach any other state. Thus, the whole state space S is then a single closed communicating class.

Note: If we consider a Markov chain with transition matrix P and fix n ∈ N, then the Markov chain with transition matrix P^n has exactly the same closed communicating classes.

[figure: the general structure of a Markov chain, indicated by communicating classes C1, C2, C3, C4, C5 and arrows between some of them]

The general structure of a Markov chain is indicated in this figure. There can be arrows between two non-closed communicating classes, but they are always in the same direction, and there can be no arrows between closed communicating classes. In the figure, the classes C4, C5 are closed communicating classes, while C1, C2, C3 are not closed. If we remove the non-closed communicating classes, then we obtain a collection of disjoint closed communicating classes with no arrows between them. Closed communicating classes are particularly important because they decompose the chain into smaller, more manageable, parts.

5.4 Period

Consider the following chain:

[figure: a cycle on the four states 1, 2, 3, 4]

The chain is irreducible, i.e. all states form a single closed communicating class. Notice, however, that if the chain is in the set C1 = {1, 3} at some point of time, then it will move to the set C2 = {2, 4} at the next step, and vice versa. The chain alternates between the two sets. So if the chain starts with X0 ∈ C1, then we know that X1 ∈ C2, X2 ∈ C1, X3 ∈ C2, X4 ∈ C1, and so on. We say that the chain has period 2. (Can you find an example of a chain with period 3?)

We will now give the definition of the period of a state of an arbitrary chain. Let us denote by p^(n)_ij the entries of P^n. The period of an essential state i is defined as the greatest common divisor of all natural numbers n such that it is possible to return to i in n steps:

d(i) := gcd{n ∈ N : Pi(Xn = i) > 0} = gcd Di,   where Di := {n ∈ N : p^(n)_ii > 0}.

The period of an inessential state is not defined. A state i is called aperiodic if d(i) = 1.

Note that whether it is possible to return to i in m steps can be decided by one of the integer divisors of m. Another way to say this is: if the integer n divides m (denote this by: n | m) and if p^(n)_ii > 0, then p^(m)_ii > 0. Indeed, since P^{m+n} = P^m P^n, we have

p^(m+n)_ij ≥ p^(m)_ik p^(k,n)... more precisely, p^(m+n)_ij ≥ p^(m)_ik p^(n)_kj for all m, n and all states k; in particular, p^(2n)_ii ≥ (p^(n)_ii)², and, more generally, p^(ℓn)_ii ≥ (p^(n)_ii)^ℓ for all ℓ ∈ N. Therefore if p^(n)_ii > 0 then p^(ℓn)_ii > 0 for all ℓ ∈ N.

Theorem 2 (the period is a class property). If i ⇄ j then d(i) = d(j).

Proof. Consider two distinct essential states i, j with i ⇄ j. Let Di = {n ∈ N : p^(n)_ii > 0}, Dj = {n ∈ N : p^(n)_jj > 0}, d(i) = gcd Di, d(j) = gcd Dj. Since i ⇄ j, there is α ∈ N with p^(α)_ij > 0 and

there is β ∈ N with p^(β)_ji > 0. Then

p^(α+β)_jj ≥ p^(β)_ji p^(α)_ij > 0,

showing that α + β ∈ Dj. Now let n ∈ Di. Then

p^(α+n+β)_jj ≥ p^(β)_ji p^(n)_ii p^(α)_ij > 0,

showing that α + n + β ∈ Dj. Therefore d(j) | α + β and d(j) | α + β + n for all n ∈ Di. If a number divides two other numbers then it also divides their difference. From the last two displays we get d(j) | n for all n ∈ Di. So d(j) is a divisor of all elements of Di. Since d(i) is the greatest common divisor, we have d(i) ≥ d(j) (in fact, d(j) | d(i)). Arguing symmetrically, we obtain d(i) ≤ d(j) as well. So d(i) = d(j). □

An irreducible chain has period d if one (and hence all) of its states have period d. In particular, if d = 1, the chain is called aperiodic.

Theorem 3. If i is an aperiodic state then there exists n0 such that p^(n)_ii > 0 for all n ≥ n0.

Proof. Pick n2, n1 ∈ Di such that n2 − n1 = 1 (possible since d(i) = 1). Let n be sufficiently large. Divide n by n1 to obtain n = qn1 + r, where the remainder r ≤ n1 − 1. Therefore

n = qn1 + r(n2 − n1) = (q − r)n1 + rn2.

Because n is sufficiently large, q − r > 0. Since n1, n2 ∈ Di, we have n ∈ Di as well. □

Corollary 3. An irreducible chain with finitely many states is aperiodic if and only if there exists an n such that p^(n)_ij > 0 for all i, j ∈ S.

Of particular interest are aperiodic chains, i.e. chains where, as defined earlier, all states communicate with one another and d = 1. More generally, if an irreducible chain has period d then we can decompose the state space into d sets C0, C1, ..., Cd−1 such that the chain moves cyclically between them. Formally, this is the content of the following theorem.

Theorem 4. Consider an irreducible chain with period d. Then we can uniquely partition the state space S into d disjoint subsets C0, C1, ..., Cd−1 such that

Σ_{j∈Cr+1} pij = 1, i ∈ Cr, r = 0, 1, ..., d − 1.

(Here Cd := C0.)

[figure: the internal structure of a closed communicating class with period d = 4, the chain moving cyclically through C0, C1, C2, C3]

Proof. Assume d > 1. Define the relation

i ∼d j ⇐⇒ p^(nd)_ij > 0 for some n ≥ 0.

Notice that this is an equivalence relation. Hence it partitions the state space S into disjoint equivalence classes. We show that these classes satisfy what we need. Pick a state i0 and let C0 be its equivalence class (i.e. the set of states j such that i0 ∼d j). Then pick i1 such that pi0,i1 > 0, and let C1 be the equivalence class of i1. Continue in this manner and define states i0, i1, ..., id−1 with pi0,i1 > 0, pi1,i2 > 0, ..., pid−2,id−1 > 0, with corresponding classes C0, C1, ..., Cd−1. It is easy to see that if we continue and pick a further state id with pid−1,id > 0 then, necessarily, id ∈ C0. We now show that if i belongs to one of these classes and if pij > 0 then, necessarily, j belongs to the next class. Take, for example, i ∈ C0, and suppose pij > 0 but j ∉ C1, say j ∈ C2 (the other cases being similar). Consider the path

i0 → i → j → i2 → i3 → · · · → id → i0.

Such a path is possible because of the choice of the i0, i1, ..., id, and by the assumptions that i0 ∼d i and j ∼d i2. The existence of such a path implies that it is possible to go from i0 back to i0 in a number of steps which is not an integer multiple of d (why?), which contradicts the definition of d. □

6 Hitting times and first-step analysis

Consider a Markov chain with transition probability matrix P. We define the hitting time⁴ of a set of states A by

TA := inf{n ≥ 0 : Xn ∈ A}.

We are interested in deriving formulae for the probability that this time is finite, as well as for the expectation of this time. We use a method that is based upon considering what the Markov chain does at time 1, i.e. after it takes one step from its current position; that is why we call it "first-step analysis". As an example, consider the chain:

[figure: states 0, 1, 2; from state 1: to 0 with probability 1/2, to 2 with probability 1/6, to itself with probability 1/3; states 0 and 2 are absorbing (probability 1 loops)]

⁴ We shall later consider the time inf{n ≥ 1 : Xn ∈ A}, which differs from TA simply by considering n ≥ 1 instead of n ≥ 0. If X0 ∉ A the two times coincide. The reader is warned to be alert as to which of the two variables we are considering at each time. We avoid excessive notation by using the same letter for both.

It is clear that P1(T0 < ∞) < 1, because the chain may end up in state 2. But what exactly is this probability equal to? To answer this in its generality, fix a set of states A, and define

ϕ(i) := Pi(TA < ∞).

We then have:

Theorem 5. The function ϕ(i) satisfies

ϕ(i) = 1, i ∈ A,
ϕ(i) = Σ_{j∈S} pij ϕ(j), i ∉ A.

Furthermore, if ϕ′(i) is any other solution of these equations then ϕ′(i) ≥ ϕ(i) for all i ∈ S.

Proof. If i ∈ A then TA = 0, and so ϕ(i) = 1. If i ∉ A, then TA ≥ 1. So TA = 1 + T′A, where T′A is the remaining time until A is hit. We first have

Pi(TA < ∞) = Σ_{j∈S} Pi(1 + T′A < ∞ | X1 = j) Pi(X1 = j) = Σ_{j∈S} Pi(T′A < ∞ | X1 = j) pij.

But observe that the random variable T′A is a function of the future after time 1 (i.e. a function of X1, X2, ...). Hence, conditionally on X1, the event {T′A < ∞} is independent of X0. But the Markov chain is homogeneous, which implies that

P(T′A < ∞ | X1 = j, X0 = i) = P(TA < ∞ | X0 = j) = Pj(TA < ∞) = ϕ(j).

Combining the above, we have

Pi(TA < ∞) = Σ_{j∈S} ϕ(j) pij,

as needed. For the second part of the proof, let ϕ′(i) be another solution. Then

ϕ′(i) = Σ_{j∈A} pij + Σ_{j∉A} pij ϕ′(j).

By self-feeding this equation, we have

ϕ′(i) = Σ_{j∈A} pij + Σ_{j∉A} pij ( Σ_{k∈A} pjk + Σ_{k∉A} pjk ϕ′(k) )
= Σ_{j∈A} pij + Σ_{j∉A} Σ_{k∈A} pij pjk + Σ_{j∉A} Σ_{k∉A} pij pjk ϕ′(k).

We recognise that the first term equals Pi(TA = 1) and the second equals Pi(TA = 2), so the first two terms together equal Pi(TA ≤ 2). By omitting the last term we obtain ϕ′(i) ≥ Pi(TA ≤ 2). By continuing to self-feed the above equation n times, we obtain

ϕ′(i) ≥ Pi(TA ≤ n).

Letting n → ∞, we obtain ϕ′(i) ≥ Pi(TA < ∞) = ϕ(i). □

Example: In the example of the previous figure, we choose A = {0}, the set that contains only state 0, and let ϕ(i) = Pi(T0 < ∞). We immediately have ϕ(0) = 1 (it takes no time to hit A if the chain starts from A), and ϕ(2) = 0 (if the chain starts from 2 it will never hit 0). As for ϕ(1), we have

ϕ(1) = (1/3) ϕ(1) + (1/2) ϕ(0),

so ϕ(1) = 3/4.

∼◦∼◦∼◦∼◦∼◦∼◦∼◦∼

Next consider the mean time (mean number of steps) until the set A is hit for the first time,

ψ(i) := Ei TA.

We have:

Theorem 6. The function ψ(i) satisfies

ψ(i) = 0, i ∈ A,
ψ(i) = 1 + Σ_{j∉A} pij ψ(j), i ∉ A.

Furthermore, if ψ′(i) is any other solution of these equations then ψ′(i) ≥ ψ(i) for all i ∈ S.

Proof. If i ∈ A then, obviously, TA = 0 and so ψ(i) := Ei TA = 0. If i ∉ A, then, as above, TA = 1 + T′A, where T′A is the remaining time. Therefore,

Ei TA = 1 + Ei T′A = 1 + Σ_{j∈S} pij Ej TA = 1 + Σ_{j∉A} pij ψ(j),

which is the second of the equations. The second part of the proof is omitted. □

Example: Continuing the previous example, let A = {0, 2} and ψ(i) = Ei TA. Clearly, ψ(0) = ψ(2) = 0, while

ψ(1) = 1 + (1/3) ψ(1),

which gives ψ(1) = 3/2. (This should have been obvious from the start, because the underlying experiment is the flipping of a coin with "success" probability 2/3 — the chain leaves state 1 at each step with probability 1/2 + 1/6 = 2/3 — therefore the mean number of flips until the first success is 3/2.)

∼◦∼◦∼◦∼◦∼◦∼◦∼◦∼

Now let us consider two hitting times TA, TB for two disjoint sets of states A, B. We may ask to find the probabilities

ϕAB(i) = Pi(TA < TB).

It should be clear that the equations satisfied are:

ϕAB(i) = 1, i ∈ A,
ϕAB(i) = 0, i ∈ B,
ϕAB(i) = Σ_j pij ϕAB(j), otherwise.

Furthermore, ϕAB is the minimal solution to these equations.

∼◦∼◦∼◦∼◦∼◦∼◦∼◦∼

As yet another application of the first-step analysis method, consider the following situation: Every time the state is x a reward f(x) is earned. This happens up to the first hitting time TA of the set A. The total reward is f(X0) + f(X1) + · · · + f(XTA). We are interested in the mean total reward

h(x) := Ex Σ_{n=0}^{TA} f(Xn).

Clearly, h(x) = f(x) for x ∈ A, because when X0 ∈ A then TA = 0 and so the total reward is f(X0). Next, if x ∉ A, then, as argued earlier, TA = 1 + T′A, where T′A is the remaining time until A is hit, and so

h(x) = Ex Σ_{n=0}^{1+T′A} f(Xn) = f(x) + Ex Σ_{n=1}^{1+T′A} f(Xn) = f(x) + Ex Σ_{n=0}^{T′A} f(Xn+1),

where, in the last sum, we just changed index from n to n + 1. Now the last sum is a function of the future after time 1, so, by the Markov property,

Ex Σ_{n=0}^{T′A} f(Xn+1) = Σ_{y∈S} pxy h(y).

Thus, the set of equations satisfied by h is:

h(x) = f(x), x ∈ A,
h(x) = f(x) + Σ_{y∈S} pxy h(y), x ∉ A.

You can remember the latter ones by thinking that the expected total reward equals the immediate reward f(x) plus the expected remaining reward.

Example: A thief sneaks into the following house and moves around the rooms, collecting "rewards": when in room i, he collects i pounds — unless he is in room 4, in which case he dies immediately. (Sooner or later, he will die; why?)

[figure: a house with four rooms 1, 2, 3, 4; room 2 communicates with rooms 1 and 3, while rooms 1 and 3 each communicate with rooms 2 and 4]

If we let h(i) be the average total reward if he starts from room i, the above equations read:

h(4) = 0,
h(1) = 1 + (1/2) h(2),
h(3) = 3 + (1/2) h(2),
h(2) = 2 + (1/2) h(1) + (1/2) h(3).

We solve and find h(2) = 8, h(3) = 7, h(1) = 5. So, if he starts from room 2, he will collect 8 pounds on average, until he dies in room 4.
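First-step analysis thus reduces such questions to linear systems, which can be solved mechanically. A sketch (Python with numpy; our own encoding of the thief example just given, with the rooms indexed 0–3 and zero reward in room 4):

```python
import numpy as np

# Rooms 1-4 as indices 0-3; room 4 (index 3) is where the thief dies.
P = np.array([[0.0, 0.5, 0.0, 0.5],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 0.0, 1.0]])
f = np.array([1.0, 2.0, 3.0, 0.0])    # reward per room; nothing in room 4
A = [3]                               # the "death" set

# Outside A: h(x) = f(x) + sum_y p_{xy} h(y); on A: h = f (= 0 here).
# Rearranged over the states outside A: (I - Q) h = f, where Q is the
# transition matrix restricted to those states.
rest = [i for i in range(4) if i not in A]
Q = P[np.ix_(rest, rest)]
h = np.linalg.solve(np.eye(len(rest)) - Q, f[rest])
print(dict(zip([1, 2, 3], h)))        # {1: 5.0, 2: 8.0, 3: 7.0}
```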

7 Gambler's ruin

Consider the simplest game of chance: tossing a (possibly unfair) coin repeatedly and independently. Let p be the probability of heads and q = 1 − p that of tails. If heads come up then you win £1 per pound of stake: if just before the k-th toss you decide to bet Yk pounds, then after the realisation of the toss you will have earned Yk pounds if heads came up, or lost Yk pounds if tails appeared. If ξk is the outcome of the k-th toss (where ξk = +1 means heads and ξk = −1 means tails), your fortune after the n-th toss is

Xn = X0 + Σ_{k=1}^n Yk ξk.

The successive stakes (Yk, k ∈ N) form the strategy of the gambler. Clearly, no gambler is assumed to have any information about the outcome of the upcoming toss, so if the gambler wants to have a strategy at all, Yk cannot depend on ξk. It can only depend on ξ1, ..., ξk−1 and — as is often the case in practice — on other considerations of, e.g., astrological nature. In other words, she must base it on whatever outcomes have appeared thus far.

We wish to consider the problem of finding the probability that the gambler will never lose her money. This has profound actuarial applications: An insurance company must make sure that the strategies it employs are such that — at least — the company will not go bankrupt (and, in addition, that it will make enormous profits). The difference between the gambler's ruin problem for a company and that for a mere mortal is that, in the former case, the company has control over the probability p.

Let us consider a concrete problem. The gambler starts with X0 = x pounds and has a target: To make b pounds (b > x). She will stop playing if her money falls below a (a < x). In the simplest case, Yk = 1 for all k. We are then clearly dealing with a Markov chain in the state space

{a, a + 1, a + 2, ..., b − 1, b}

with transition probabilities

pi,i+1 = p, pi,i−1 = q = 1 − p, a < i < b,

and where the states a, b are absorbing. Finding the probability that the target is reached before ruin is our next goal.

Theorem 7. Consider a gambler playing a simple coin tossing game, as described above, betting £1 at each integer time against a coin which has P(heads) = p, P(tails) = 1 − p. Let £x be the initial fortune of the gambler. Then the probability that the gambler's fortune will reach level b > x before reaching level a < x equals:

Px(Tb < Ta) = ((q/p)^{x−a} − 1)/((q/p)^{b−a} − 1), if p ≠ q;   (x − a)/(b − a), if p = q = 1/2.

Proof. We use the notation Px (respectively, Ex) for the probability (respectively, expectation) under the initial condition X0 = x. Consider the hitting time of state y:

Ty := inf{n ≥ 0 : Xn = y}.

We first solve the problem when a = 0, b = ℓ > 0, and let

ϕ(x, ℓ) := Px(Tℓ < T0), 0 ≤ x ≤ ℓ.

The general solution will then be obtained from

Px(Tb < Ta) = ϕ(x − a, b − a).

(Think why!) To simplify things, write ϕ(x) instead of ϕ(x, ℓ). We obviously have

ϕ(0) = 0, ϕ(ℓ) = 1.

By first-step analysis,

ϕ(x) = p ϕ(x + 1) + q ϕ(x − 1), 0 < x < ℓ.

Since p + q = 1, we can write this as

p[ϕ(x + 1) − ϕ(x)] = q[ϕ(x) − ϕ(x − 1)].

Let δ(x) := ϕ(x + 1) − ϕ(x). We have

δ(x) = (q/p) δ(x − 1) = (q/p)² δ(x − 2) = · · · = (q/p)^x δ(0).

Assuming that λ := q/p ≠ 1, we find

ϕ(x) = δ(x − 1) + δ(x − 2) + · · · + δ(0) = [λ^{x−1} + · · · + λ + 1] δ(0) = ((λ^x − 1)/(λ − 1)) δ(0).

Since ϕ(ℓ) = 1, we have ((λ^ℓ − 1)/(λ − 1)) δ(0) = 1, which gives δ(0) = (λ − 1)/(λ^ℓ − 1). We finally find

ϕ(x) = (λ^x − 1)/(λ^ℓ − 1), 0 ≤ x ≤ ℓ, λ ≠ 1.

If λ = 1 then q = p and so ϕ(x + 1) − ϕ(x) = ϕ(x) − ϕ(x − 1). This gives δ(x) = δ(0) for all x and ϕ(x) = x δ(0). Since ϕ(ℓ) = 1, we find δ(0) = 1/ℓ and so ϕ(x) = x/ℓ. Summarising, we have obtained

ϕ(x, ℓ) = (λ^x − 1)/(λ^ℓ − 1), if λ ≠ 1;   x/ℓ, if λ = 1.

The general case is obtained by replacing ℓ by b − a and x by x − a. □

Corollary 4. The ruin probability of a gambler starting with £x is

Px(T0 < ∞) = (q/p)^x, if p > 1/2;   1, if p ≤ 1/2.

Proof. We have

Px(T0 < ∞) = lim_{ℓ→∞} Px(T0 < Tℓ) = 1 − lim_{ℓ→∞} Px(Tℓ < T0).

If p > 1/2, then λ = q/p < 1, and so

Px(T0 < ∞) = 1 − lim_{ℓ→∞} (λ^x − 1)/(λ^ℓ − 1) = 1 − (λ^x − 1)/(0 − 1) = λ^x = (q/p)^x.

If p < 1/2, then λ = q/p > 1, and so

Px(T0 < ∞) = 1 − lim_{ℓ→∞} (λ^x − 1)/(λ^ℓ − 1) = 1 − 0 = 1.

Finally, if p = q = 1/2, then

Px(T0 < ∞) = 1 − lim_{ℓ→∞} x/ℓ = 1 − 0 = 1. □
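The formula of Theorem 7 is easy to check by simulation. A sketch (Python; the function names are ours):

```python
import random

def prob_reach_b_first(x, a, b, p, trials=100_000):
    """Monte Carlo estimate of P_x(T_b < T_a) for the +/-1 coin walk."""
    wins = 0
    for _ in range(trials):
        pos = x
        while a < pos < b:
            pos += 1 if random.random() < p else -1
        wins += (pos == b)
    return wins / trials

def exact(x, a, b, p):
    """Theorem 7: ((q/p)^(x-a) - 1)/((q/p)^(b-a) - 1), or (x-a)/(b-a)."""
    q = 1 - p
    if p == q:
        return (x - a) / (b - a)
    lam = q / p
    return (lam**(x - a) - 1) / (lam**(b - a) - 1)

print(prob_reach_b_first(5, 0, 10, 0.45), exact(5, 0, 10, 0.45))
```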

8 Stopping times and the strong Markov property

A random time is a random variable taking values in the time-set, including, possibly, the value +∞. A stopping time τ is a random variable with values in Z+ ∪ {+∞} such that 1(τ = n) is a deterministic function of (X0, ..., Xn) for all n ∈ Z+. In other words, whether τ = n can be decided by looking at (X0, ..., Xn) only. An example of a stopping time is the time TA, the first hitting time of the set A, considered earlier. An example of a random time which is not a stopping time is the last visit of a specific set.

Recall that the Markov property states that, for any n, the future process (Xn, Xn+1, ...) is independent of the past process (X0, ..., Xn) if we know the present Xn. We will now prove that this remains true if we replace the deterministic n by a random stopping time τ. This stronger property is referred to as the strong Markov property.

Theorem 8 (strong Markov property). Suppose (Xn, n ≥ 0) is a time-homogeneous Markov chain and let τ be a stopping time. Then, conditional on τ < ∞ and Xτ = i, the future process (Xτ+n, n ≥ 0) is independent of the past process (X0, ..., Xτ) and has the same law as the original process started from i.

Proof. Let A be any event determined by the past (X0, ..., Xτ) if τ < ∞. Since τ is a stopping time, any such A must have the property that, for any n, A ∩ {τ = n} is determined by (X0, ..., Xn). We are going to show that, for any such A and any states j1, j2, ...,

P(Xτ+1 = j1, Xτ+2 = j2, ... | Xτ = i, A, τ < ∞) = pi,j1 pj1,j2 · · ·

and that

P(Xτ+1 = j1, Xτ+2 = j2, ..., A | Xτ = i, τ < ∞) = pi,j1 pj1,j2 · · · P(A | Xτ = i, τ < ∞).

We have:

P(Xτ+1 = j1, Xτ+2 = j2, ..., A, Xτ = i, τ = n)
(a) = P(Xn+1 = j1, Xn+2 = j2, ..., A, Xn = i, τ = n)
(b) = P(Xn+1 = j1, Xn+2 = j2, ... | Xn = i, A, τ = n) P(Xn = i, A, τ = n)
(c) = P(Xn+1 = j1, Xn+2 = j2, ... | Xn = i) P(Xn = i, A, τ = n)
(d) = pi,j1 pj1,j2 · · · P(Xn = i, A, τ = n),

where (a) is just logic, (b) is by the definition of conditional probability, (c) is due to the fact that A ∩ {τ = n} is determined by (X0, ..., Xn) and the ordinary splitting property at n, and (d) is due to the fact that (Xn) is a time-homogeneous Markov chain with transition probabilities pij. Our assertions are now easily obtained from this by first summing over all n (whence the event {τ < ∞} appears) and then by dividing both sides by P(Xτ = i, τ < ∞). □

9 Regenerative structure of a Markov chain

A sequence of i.i.d. random variables is a (very) particular example of a Markov chain. The random variables Xn comprising a general Markov chain are not independent. However, by using the strong Markov property, we will show that we can split the sequence into i.i.d. "chunks". The idea is that if we can find a state i that is visited again and again, then the behaviour of the chain between successive visits cannot be influenced by the past or the future (and that is due to the strong Markov property).

Fix a state i ∈ S and let Ti be the first time that the chain will visit this state:

Ti := inf{n ≥ 1 : Xn = i}.

Note the subtle difference between this Ti and the Ti of Section 6: the Ti of Section 6 was allowed to take the value 0; our Ti here is always positive. If X0 ≠ i the two definitions coincide. We then let Ti^(2) be the time of the second visit, Ti^(3) the time of the third visit, and so on. (And we set Ti^(1) := Ti.) Formally, we recursively define

Ti^(r) := inf{n > Ti^(r−1) : Xn = i}, r = 2, 3, ...

All these random times are stopping times. (Why?) State i may or may not be visited again. Regardless, consider the "trajectory" of the chain between two successive visits to the state i:

Xi^(r) := (Xn, Ti^(r) ≤ n < Ti^(r+1)), r = 1, 2, 3, ...

Let also

Xi^(0) := (Xn, 0 ≤ n < Ti^(1)).

We think of them as "random variables", in a generalised sense: they are, in fact, random trajectories. We refer to them as excursions of the process. So Xi^(r) is the r-th excursion: the chain visits

An immediate consequence of the strong Markov property is:

Theorem 9. The excursions $\mathcal{X}_i^{(0)},\mathcal{X}_i^{(1)},\mathcal{X}_i^{(2)},\ldots$ are independent random variables. Moreover, for time-homogeneous Markov chains, the excursions $\mathcal{X}_i^{(1)},\mathcal{X}_i^{(2)},\ldots$ have the same distribution.

Proof. Apply the Strong Markov Property (SMP) at $T_i^{(r)}$: SMP says that, given the state at the stopping time $T_i^{(r)}$, the future after $T_i^{(r)}$ is independent of the past before $T_i^{(r)}$. But the state at $T_i^{(r)}$ is, by definition, equal to $i$. Therefore the future is independent of the past, and this proves the first claim. The claim that $\mathcal{X}_i^{(1)},\mathcal{X}_i^{(2)},\ldots$ have the same distribution follows from the fact that the Markov chain is time-homogeneous: if it starts from state $i$ at some point of time, it will behave in the same way as if it starts from the same state at another point of time.

A trivial, but important, corollary of the observation of this section is that:

Corollary 5. (Numerical) functions $g(\mathcal{X}_i^{(1)}),\ g(\mathcal{X}_i^{(2)}),\ \ldots$ of the excursions are independent random variables. Moreover, they are also identical in law (i.i.d.).

Example: Define
\[
\Lambda_r:=\sum_{T_i^{(r)}\le n<T_i^{(r+1)}}1(X_n=j).
\]
The meaning of $\Lambda_r$ is this: it expresses the number of visits to a state $j$ between two successive visits to the state $i$. The last corollary ensures us that $\Lambda_1,\Lambda_2,\ldots$ are i.i.d. random variables.

10 Recurrence and transience

When a Markov chain has finitely many states, at least one state must be visited infinitely many times. Perhaps some states (inessential ones) will be visited finitely many times, but there are always states which will be visited again and again, ad infinitum. We abstract this trivial observation and give a couple of definitions.

First, we say that a state $i$ is recurrent if, starting from $i$, the chain returns to $i$ infinitely many times:
\[
P_i(X_n=i\ \text{for infinitely many}\ n)=1.
\]
Second, we say that a state $i$ is transient if, starting from $i$, the chain returns to $i$ only finitely many times:
\[
P_i(X_n=i\ \text{for infinitely many}\ n)=0.
\]

Important note: Notice that the two statements above are not negations of one another. Indeed, in principle, there could be yet another possibility:
\[
0<P_i(X_n=i\ \text{for infinitely many}\ n)<1.
\]
We will show that such a possibility does not exist, therefore concluding that every state is either recurrent or transient.

Suppose that the Markov chain starts from state $i$. Consider the total number of visits to state $i$ (excluding the possibility that $X_0=i$):
\[
J_i=\sum_{n=1}^{\infty}1(X_n=i)=\sup\{r\ge 0:\ T_i^{(r)}<\infty\}.
\]
Notice that: state $i$ is visited infinitely many times $\iff J_i=\infty$. Notice also that
\[
f_{ii}:=P_i(J_i\ge 1)=P_i(T_i<\infty).
\]
We claim that:

Lemma 4. Starting from $X_0=i$, the random variable $J_i$ is geometric:
\[
P_i(J_i\ge k)=f_{ii}^{\,k},\qquad k\ge 0.
\]
Proof. Indeed, the event $\{J_i\ge k\}$ implies that $T_i<\infty$ and that, after time $T_i$, the chain visits $i$ at least $k-1$ more times. By the strong Markov property at time $T_i^{(k-1)}$,
\[
P_i(J_i\ge k)=P_i(J_i\ge k-1)\,P_i(J_i\ge 1);
\]
the past before $T_i^{(k-1)}$ is independent of the future, and that is why we get the product. Therefore
\[
P_i(J_i\ge k)=f_{ii}\,P_i(J_i\ge k-1)=f_{ii}^{2}\,P_i(J_i\ge k-2)=\cdots=f_{ii}^{\,k}.
\]
Notice that $f_{ii}=P_i(T_i<\infty)$ can either be equal to 1 or smaller than 1. If $f_{ii}=1$ we have that $P_i(J_i\ge k)=1$ for all $k$, i.e. $P_i(J_i=\infty)=1$: this means that the state $i$ is recurrent. If $f_{ii}<1$, then $P_i(J_i<\infty)=1$: this means that the state $i$ is transient. Because either $f_{ii}=1$ or $f_{ii}<1$, we conclude what we asserted earlier, namely, that every state is either recurrent or transient.

We summarise this together with another useful characterisation of recurrence & transience in the following two lemmas.

Lemma 5. The following are equivalent:
1. State $i$ is recurrent.
2. $f_{ii}=P_i(T_i<\infty)=1$.
3. $P_i(J_i=\infty)=1$.
4. $E_iJ_i=\infty$.

Proof. The equivalence of the first two statements was proved above, and the equivalence of 2 and 3 follows from Lemma 4. If $P_i(J_i=\infty)=1$ then, clearly, $E_iJ_i=\infty$. Conversely, since $J_i$ is a geometric random variable, its expectation cannot be infinite without $J_i$ itself being infinite with probability one.
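A hedged simulation sketch of Lemma 4 (an illustration, not from the notes): we take a made-up two-state chain in which, from state 0, the chain stays at 0 with probability 1/2 and otherwise jumps to an absorbing state. Then $f_{00}=P_0(T_0<\infty)=1/2$, so 0 is transient, and $J_0$ should be geometric:

```python
import numpy as np

rng = np.random.default_rng(1)
samples = []
for _ in range(100000):
    J, x = 0, 0
    while x == 0:
        x = 0 if rng.random() < 0.5 else 1   # stay at 0 w.p. 1/2, else absorb
        J += (x == 0)                        # J_0 counts visits to 0 at times n >= 1
    samples.append(J)

samples = np.array(samples)
for k in range(5):                           # Lemma 4: P_0(J_0 >= k) = f_00^k
    print(k, (samples >= k).mean(), 0.5 ** k)
```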

Lemma 6. The following are equivalent:
1. State $i$ is transient.
2. $f_{ii}=P_i(T_i<\infty)<1$.
3. $P_i(J_i<\infty)=1$.
4. $E_iJ_i<\infty$.

Proof. The equivalence of the first two statements was proved before the lemma. If $i$ is transient then, owing to the fact that $J_i$ is a geometric random variable with parameter $f_{ii}<1$, we have $E_iJ_i<\infty$ and, clearly, $P_i(J_i<\infty)=1$. On the other hand, if we assume $E_iJ_i<\infty$, then $J_i$ cannot take the value $+\infty$ with positive probability, therefore $P_i(J_i<\infty)=1$, which, by the definition of $J_i$, means that $i$ is visited finitely many times with probability 1, i.e. $i$ is transient.

From Elementary Probability,
\[
E_iJ_i=E_i\sum_{n=1}^{\infty}1(X_n=i)=\sum_{n=1}^{\infty}P_i(X_n=i)=\sum_{n=1}^{\infty}p_{ii}^{(n)}.
\]
Hence: state $i$ is recurrent if and only if $\sum_{n\ge 1}p_{ii}^{(n)}=\infty$, and transient if and only if this series is finite.

Corollary 6. If $i$ is a transient state then $p_{ii}^{(n)}\to 0$, as $n\to\infty$.

Proof. If $i$ is transient then $E_iJ_i<\infty$, i.e. the series $\sum_n p_{ii}^{(n)}$ is finite. But if a series is finite then the summands go to 0.

10.1 First hitting time decomposition

Define
\[
f_{ij}^{(n)}:=P_i(T_j=n),\qquad n\ge 1,
\]
the probability to hit state $j$ for the first time at time $n\ge 1$, assuming that the chain starts from state $i$. Notice the difference with $p_{ij}^{(n)}$: the latter is the probability to be in state $j$ at time $n$ (not necessarily for the first time), starting from $i$. Clearly, $f_{ij}^{(n)}\le p_{ij}^{(n)}$, but the two sequences are related in an exact way:

Lemma 7.
\[
p_{ij}^{(n)}=\sum_{m=1}^{n}f_{ij}^{(m)}\,p_{jj}^{(n-m)},\qquad n\ge 1.
\]
Proof. We have
\[
p_{ij}^{(n)}=P_i(X_n=j)=\sum_{m=1}^{n}P_i(T_j=m,\ X_n=j)=\sum_{m=1}^{n}P_i(T_j=m)\,P_i(X_n=j\mid T_j=m),
\]

i. 32 . In other words.d. denoted by i j: it means that. for all states i. So the random variables δj. Corollary 8. . . and gij := Pi (Tj < Ti ) > 0. Now. starting from i. j i. 1. we obtain (after interchanging the order of the two sums in the right hand side) ∞ n=1 (n) pij ∞ (m) fij ∞ n=m (n) (n−m) pjj ∞ (m) (n) (m) = m=1 = (1 + Ej Jj ) m=1 fij = (1 + Ej Jj )Pi (Tj < ∞). Xi . r = 0.) Theorem 10. Indeed. Start with X0 = i. if we let fij := Pi (Tj < ∞) .The ﬁrst term equals fij . .d.i. ∞ Proof. by deﬁnition. ( Remember: another property that was a class property was the property that the state have a certain period. If i is recurrent and if i j then j is recurrent and fij = Pi (Tj < ∞) = 1. j i. Proof.2 Communicating classes and recurrence/transience Recall the deﬁnition of the relation “state i leads to state j”. we have n=1 pjj = Ej Jj < ∞. and since they take the value 1 with positive probability. and so the probability that j will appear in a speciﬁc excursion is positive. and hence pij as n → ∞. Corollary 7. which is ﬁnite. . Hence. inﬁnitely many of them will be 1 (with probability 1). In a communicating class. For the second term. are i. but also fij = 1. If j is a transient state then pij → 0. the probability to go to j in some ﬁnite time is positive. We now show that recurrence is a class property. not only fij > 0. showing that j will appear in inﬁnitely many of the excursions for sure. The last statement is simply an expression in symbols of what we said above. the probability that j will appear in a speciﬁc excursion equals gij = Pi (Tj < Ti ). as n → ∞. . . If j is transient. we have i j ⇐⇒ fij > 0. excursions Xi . Hence.. either all states are recurrent or all states are transient. by summing over n the relation of Lemma 7.r := 1 j appears in excursion Xi (r) . If i is recurrent and i j then there is a positive probability (0) (1) (= fij ) that j will appear in one of the i. In particular. 10. we use the Strong Markov Property: (n−m) Pi (Xn = j | Tj = m) = Pi (Xn = j | Xm = j) = pjj .

Equivalently: if a communicating class contains a state which is transient then, necessarily, all other states in the class are also transient.

Theorem 11. If $i$ is inessential then it is transient.

Proof. Suppose that $i$ is inessential, i.e. there is a state $j$ such that $i\leadsto j$ but $j\not\leadsto i$. Then there is $m$ such that $P_i(A)=P_i(X_m=j)>0$, where $A:=\{X_m=j\}$. Let $B:=\{X_n=i\ \text{for infinitely many}\ n\}$. Since $j\not\leadsto i$, we have $P_i(A\cap B)=P_i(X_m=j,\ X_n=i\ \text{for infinitely many}\ n)=0$. Hence
\[
0<P_i(A)=P_i(A\cap B)+P_i(A\cap B^c)=P_i(A\cap B^c)\le P_i(B^c),
\]
i.e. $P_i(B)<1$; by the recurrent/transient dichotomy, $P_i(B)=0$, and so $i$ is a transient state. So if a communicating class is recurrent then it contains no inessential states and so it is closed.

Theorem 12. Every finite closed communicating class is recurrent.

Proof. Let $C$ be the class. By closedness, if started in $C$, $X$ remains in $C$. By finiteness, at least one state $i\in C$ must be visited infinitely many times with positive probability; since every state is either recurrent or transient, this state is recurrent. Hence, by Theorem 10, every other $j$ in the same communicating class is also recurrent. Hence all states are recurrent.

11 Positive recurrence

We now take a closer look at recurrence. Recall that if we fix a state $i$ and let $T_i$ be the first return time to $i$, then either $P_i(T_i<\infty)=1$ (in which case we say that $i$ is recurrent) or $P_i(T_i<\infty)<1$ (transient state). Recall also that a finite random variable may not necessarily have finite expectation.

Example: Let $U_0,U_1,U_2,\ldots$ be i.i.d. random variables, uniformly distributed in the unit interval $[0,1]$. Let
\[
T:=\inf\{n\ge 1:\ U_n>U_0\}.
\]
Thus $T$ represents the number of variables we need to draw to get a value larger than the initial one. (It should be clear that this situation can appear, for example, in a certain game.) What is the expectation of $T$? First, observe that $T$ is finite:
\[
P(T>n)=P(U_1\le U_0,\ldots,U_n\le U_0)=\int_0^1 P(U_1\le x,\ldots,U_n\le x\mid U_0=x)\,dx=\int_0^1 x^n\,dx=\frac{1}{n+1}.
\]
Therefore
\[
P(T<\infty)=\lim_{n\to\infty}P(T\le n)=\lim_{n\to\infty}\Big(1-\frac{1}{n+1}\Big)=1.
\]
But
\[
ET=\sum_{n=0}^{\infty}P(T>n)=\sum_{n=0}^{\infty}\frac{1}{n+1}=\infty.
\]
You are invited to think about the meaning of this: if the initial value $U_0$ is unknown, then it takes, on the average, an infinite number of steps to exceed it!
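A Monte Carlo check of this example (an illustration, not from the notes): it verifies $P(T>n)=1/(n+1)$ and shows the sample mean drifting upwards as the sample grows, as one expects when $ET=\infty$.

```python
import numpy as np

rng = np.random.default_rng(2)

def draw_T(rng):
    """T = inf{n >= 1 : U_n > U_0} for i.i.d. uniform U_0, U_1, ..."""
    u0, n = rng.random(), 1
    while rng.random() <= u0:
        n += 1
    return n

T = np.array([draw_T(rng) for _ in range(100000)])
for n in (1, 2, 5, 10):
    print(n, (T > n).mean(), 1 / (n + 1))      # empirical vs. P(T > n) = 1/(n+1)

# The sample mean keeps growing with the sample size, reflecting ET = infinity:
print(T[:100].mean(), T[:10000].mean(), T.mean())
```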

We now classify a recurrent state $i$ as follows. We say that $i$ is positive recurrent if
\[
E_iT_i<\infty.
\]
We say that $i$ is null recurrent if
\[
E_iT_i=\infty.
\]
Our goal now is to show that a chain possessing a positive recurrent state also possesses a stationary distribution. Before doing that, we shall discover what the form of this stationary distribution is by using the Law of Large Numbers.

12 Law (=Theorem) of Large Numbers in Probability Theory

The following result is, arguably, the Fundamental Theorem of Probability Theory. It is traditional to refer to it as a "Law". But this is misleading: it is not a Law; it is a Theorem that can be derived from the axioms of Probability Theory. We state it here without proof.

Theorem 13. Let $Z_1,Z_2,\ldots$ be i.i.d. random variables.
(i) Suppose $E|Z_1|<\infty$ and let $\mu=EZ_1$. Then
\[
P\Big(\lim_{n\to\infty}\frac{Z_1+\cdots+Z_n}{n}=\mu\Big)=1.
\]
(ii) Suppose $E\max(Z_1,0)=\infty$ but $E\min(Z_1,0)>-\infty$. Then
\[
P\Big(\lim_{n\to\infty}\frac{Z_1+\cdots+Z_n}{n}=\infty\Big)=1.
\]

13 Law of Large Numbers for Markov chains

The law of large numbers stated above requires independence. When we have a Markov chain, the sequence $X_0,X_1,X_2,\ldots$ of successive states is not an independent sequence. To apply the law of large numbers we must create some kind of independence. To our rescue comes the regenerative structure of a Markov chain identified at an earlier section.

13.1 Functions of excursions

Namely, given a recurrent state $a\in S$ (which will be referred to as the ground state, for no good reason other than that it has been fixed once and for all), the sequence of excursions
\[
\mathcal{X}_a^{(1)},\ \mathcal{X}_a^{(2)},\ \mathcal{X}_a^{(3)},\ \ldots
\]
forms an independent sequence of (generalised) random variables. Since functions of independent random variables are also independent, for reasonable choices of functions $G$ taking values in $\mathbb{R}$, the quantities
\[
G(\mathcal{X}_a^{(1)}),\ G(\mathcal{X}_a^{(2)}),\ G(\mathcal{X}_a^{(3)}),\ \ldots
\]
are i.i.d. random variables.

Example 1: To each state $x\in S$, assign a reward $f(x)$, which we can think of as a reward received when the chain is in state $x$. For an excursion $\mathcal{X}$, define $G(\mathcal{X})$ to be the total reward received over the excursion:
\[
G(\mathcal{X}_a^{(r)})=\sum_{T_a^{(r)}\le t<T_a^{(r+1)}}f(X_t).\qquad (7)
\]
Example 2: Let $f(x)$ be as in Example 1, but define $G(\mathcal{X})$ to be the maximum reward received during the excursion:
\[
G(\mathcal{X}_a^{(r)})=\max\{f(X_t):\ T_a^{(r)}\le t<T_a^{(r+1)}\}.
\]
Whichever the choice of $G$, the law of large numbers tells us that
\[
\frac{G(\mathcal{X}_a^{(1)})+\cdots+G(\mathcal{X}_a^{(n)})}{n}\ \to\ EG(\mathcal{X}_a^{(1)}),\qquad (8)
\]
with probability 1, as $n\to\infty$. Note that
\[
EG(\mathcal{X}_a^{(1)})=EG(\mathcal{X}_a^{(2)})=\cdots=E\big[G(\mathcal{X}_a^{(0)})\mid X_0=a\big]=E_aG(\mathcal{X}_a^{(0)}),
\]
because, if we start with $X_0=a$, the initial excursion $\mathcal{X}_a^{(0)}$ also has the same law as the rest of them. The final equality in the last display is merely our notation for the expectation of a random variable conditional on $X_0=a$.

13.2 Average reward

Consider now a function $f:S\to\mathbb{R}$. We are interested in the existence of the limit
\[
\text{time-average reward}=\lim_{t\to\infty}\frac{1}{t}\sum_{k=0}^{t}f(X_k),
\]
which represents the "time-average reward" received. Specifically, we shall assume that $f$ is bounded and shall examine those conditions which yield a nontrivial limit.

Theorem 14. Suppose that a Markov chain possesses a positive recurrent state $a\in S$ (i.e. $E_aT_a<\infty$) and assume that the distribution of $X_0$ is such that $P(T_a<\infty)=1$. Let $f:S\to\mathbb{R}_+$ be a bounded "reward" function. Then, with probability one, the time-average reward $\bar f$ exists and is a deterministic quantity:
\[
P\Big(\lim_{t\to\infty}\frac{1}{t}\sum_{n=0}^{t}f(X_n)=\bar f\Big)=1,
\]
where
\[
\bar f=\frac{E_a\sum_{n=0}^{T_a-1}f(X_n)}{E_aT_a}=\sum_{x\in S}f(x)\pi(x).
\]
Proof. We shall present the proof in a way that the constant $\bar f$ will reveal itself: it will be discovered. The idea is this. Suppose that $t$ is large enough. We are looking for the total reward received between 0 and $t$. We break the sum into the total reward received over the initial excursion, plus the total reward received over the next excursion, and so on. Thus we need to keep track of the number, $N_t$, of complete excursions from the ground state $a$ that occur between 0 and $t$:
\[
N_t:=\max\{r\ge 1:\ T_a^{(r)}\le t\}.
\]
For example, in the figure below, $N_t=5$.

[Figure: the visit times $0,\ T_a^{(1)},\ T_a^{(2)},\ T_a^{(3)},\ T_a^{(4)},\ T_a^{(5)}$ and the horizon $t$ on the time axis.]

We write the total reward as a sum of rewards over complete excursions plus a first and a last term:
\[
\sum_{n=0}^{t}f(X_n)=\sum_{r=1}^{N_t-1}G_r+G^{\mathrm{first}}+G^{\mathrm{last}},
\]
where $G_r$ is given by (7), $G^{\mathrm{first}}$ is the reward up to time $T_a=T_a^{(1)}$ or $t$ (whichever is smaller), and $G^{\mathrm{last}}$ is the reward over the last incomplete cycle. We assumed that $f$ is bounded, say $|f(x)|\le B$. Hence $|G^{\mathrm{first}}|\le BT_a$. Since $P(T_a<\infty)=1$, we have $BT_a/t\to 0$, and so
\[
\frac{1}{t}G^{\mathrm{first}}\to 0,\qquad\text{as } t\to\infty,
\]
with probability one. A similar argument shows that $\frac{1}{t}G^{\mathrm{last}}\to 0$, as $t\to\infty$, with probability one.

So we are left with the sum of the rewards over complete cycles. Write now
\[
\frac{1}{t}\sum_{r=1}^{N_t-1}G_r=\frac{N_t}{t}\cdot\frac{1}{N_t}\sum_{r=1}^{N_t-1}G_r.\qquad (9)
\]
Look again at what the Law of Large Numbers (8) tells us:
\[
\lim_{n\to\infty}\frac{1}{n}\sum_{r=1}^{n}G_r=EG_1,\qquad (10)
\]
with probability one. Since $T_a^{(r)}\to\infty$ as $r\to\infty$, it follows that the number $N_t$ of complete cycles before time $t$ will also be tending to $\infty$; hence, replacing $n$ by $N_t$ in (10),
\[
\lim_{t\to\infty}\frac{1}{N_t}\sum_{r=1}^{N_t-1}G_r=EG_1.
\]
Now look at the sequence
\[
T_a^{(r)}=T_a^{(1)}+\sum_{k=2}^{r}\big(T_a^{(k)}-T_a^{(k-1)}\big)
\]
and see what the Law of Large Numbers again tells us: the differences $T_a^{(k)}-T_a^{(k-1)}$, $k=2,3,\ldots$, are i.i.d. random variables with common expectation $E_aT_a$, and $T_a^{(1)}/r\to 0$; hence
\[
\lim_{r\to\infty}\frac{T_a^{(r)}}{r}=E_aT_a.\qquad (11)
\]
By definition, the visit with index $r=N_t$ to the ground state $a$ is the last one before $t$ (see also the figure above):
\[
T_a^{(N_t)}\le t<T_a^{(N_t+1)}.
\]
Hence
\[
\frac{T_a^{(N_t)}}{N_t}\le\frac{t}{N_t}<\frac{T_a^{(N_t+1)}}{N_t}.
\]
But (11) tells us that
\[
\lim_{t\to\infty}\frac{T_a^{(N_t)}}{N_t}=\lim_{t\to\infty}\frac{T_a^{(N_t+1)}}{N_t}=E_aT_a.
\]
Hence
\[
\lim_{t\to\infty}\frac{t}{N_t}=E_aT_a,\qquad\text{i.e.}\qquad\lim_{t\to\infty}\frac{N_t}{t}=\frac{1}{E_aT_a}.\qquad (12)
\]
Using (10) and (12) in (9) we have
\[
\lim_{t\to\infty}\frac{1}{t}\sum_{r=1}^{N_t-1}G_r=\frac{EG_1}{E_aT_a}.
\]
Thus
\[
\lim_{t\to\infty}\frac{1}{t}\sum_{n=0}^{t}f(X_n)=\frac{EG_1}{E_aT_a}=:\bar f.
\]

To see that $\bar f$ is as claimed, just observe that
\[
EG_1=E\sum_{T_a^{(1)}\le t<T_a^{(2)}}f(X_t)=E_a\sum_{0\le t<T_a}f(X_t),
\]
from the Strong Markov Property.

Comment 1: It is important to understand what the formula says. The constant $\bar f$ is a sort of average over an excursion: the numerator of $\bar f$ is computed by summing up the rewards over the excursion, and the denominator is the duration of the excursion. Note that we include only one endpoint of the excursion, not both: making sure that not both endpoints are included, we may write
\[
EG_1=E_a\sum_{0\le t<T_a}f(X_t)\qquad\text{or}\qquad EG_1=E_a\sum_{0<t\le T_a}f(X_t).
\]
[Figure: the rewards $f(X_0)=f(a),\ f(X_1),\ f(X_2),\ \ldots,\ f(X_{T_a-1})$ collected over one excursion from $a$, between times 0 and $T_a$.]

Comment 2: A very important particular case occurs when we choose
\[
f(x):=1(x=b),
\]
i.e. we let $f$ be 1 at the state $b$ and 0 everywhere else. The theorem above then says that, with probability 1,
\[
\lim_{t\to\infty}\frac{1}{t}\sum_{n=1}^{t}1(X_n=b)=\frac{E_a\sum_{k=0}^{T_a-1}1(X_k=b)}{E_aT_a}.\qquad (13)
\]
In words: the long-run fraction of visits to $b$ equals the average number of times state $b$ is visited between two successive visits to state $a$, divided by $E_aT_a$.

Comment 3: If in the above we let $b=a$, the ground state, then we see that the numerator equals 1. Hence
\[
\lim_{t\to\infty}\frac{1}{t}\sum_{n=1}^{t}1(X_n=a)=\frac{1}{E_aT_a},\qquad (14)
\]
with probability 1. The physical interpretation of the latter should be clear: if, on the average, it takes $E_aT_a$ units of time between successive visits to the state $a$, then the long-run proportion of visits to the state $a$ should equal $1/E_aT_a$.

Comment 4: Suppose that two states, $a$, $b$, communicate and are positive recurrent. We can use $b$ as a ground state instead of $a$. Applying formula (14) with $b$ in place of $a$ we find that the limit in (13) also equals $1/E_bT_b$. The equality of these two limits gives:
\[
\frac{E_a\sum_{k=0}^{T_a-1}1(X_k=b)}{E_aT_a}=\frac{1}{E_bT_b}.
\]
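The following sketch (a made-up three-state ergodic chain and reward, not from the notes) illustrates Theorem 14 and Comment 1: the time-average reward along one long trajectory agrees with the ratio of the mean per-cycle reward to the mean cycle length, both estimated from the excursions away from a ground state $a$.

```python
import numpy as np

rng = np.random.default_rng(3)
P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4],
              [0.5, 0.3, 0.2]])      # hypothetical ergodic chain
f = np.array([1.0, -2.0, 5.0])       # bounded reward, one value per state

a, t = 0, 200000
x, path = a, [a]
for _ in range(t):
    x = rng.choice(3, p=P[x])
    path.append(x)
path = np.array(path)

time_avg = f[path].mean()

# Excursion estimate: split at the returns to the ground state a and compare
# mean per-cycle reward / mean cycle length with the time average (Theorem 14).
T = np.flatnonzero(path == a)                         # visit times to a
rewards = [f[path[s:e]].sum() for s, e in zip(T, T[1:])]
lengths = np.diff(T)
print(time_avg, np.mean(rewards) / np.mean(lengths))  # the two should agree
```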

14 Construction of a stationary distribution

Our discussions on the law of large numbers for Markov chains revealed that the quantity
\[
\nu^{[a]}(x):=E_a\sum_{n=0}^{T_a-1}1(X_n=x),\qquad x\in S,\qquad (15)
\]
should play an important role. Indeed, we saw (Comment 2 above) that if $a$ is a positive recurrent state then $\nu^{[a]}(x)/E_aT_a$ is the long-run fraction of times that state $x$ is visited. In view of this, this quantity should have something to do with the concept of stationary distribution discussed a few sections ago.

Note: Although $\nu^{[a]}$ depends on the choice of the "ground" state $a$ (notice that the choice of notation $\nu^{[a]}$ is justifiable through the notation (6) for the communicating class of the state $a$), we will soon realise that this dependence vanishes when the chain is irreducible. Hence the superscript $[a]$ will soon be dropped.

Proposition 2. Fix a ground state $a\in S$ and assume that it is recurrent.⁵ Consider the function $\nu^{[a]}$ defined in (15). Then $\nu^{[a]}$ satisfies the balance equations
\[
\nu^{[a]}=\nu^{[a]}P,\qquad\text{i.e.}\qquad\nu^{[a]}(x)=\sum_{y\in S}\nu^{[a]}(y)\,p_{yx},\qquad x\in S.
\]
Moreover, for any state $x$ such that $a\leadsto x$ ($a$ leads to $x$), we have
\[
0<\nu^{[a]}(x)<\infty.
\]
Proof. We start the Markov chain with $X_0=a$, where $a$ is the fixed recurrent ground state whose existence was assumed. Recurrence tells us that $T_a<\infty$ with probability one. Since $a=X_{T_a}$, we can write
\[
\nu^{[a]}(x)=E_a\sum_{n=1}^{T_a}1(X_n=x).\qquad (16)
\]
This was a trivial but important step: we replaced the $n=0$ term with the $n=T_a$ term. Both of them equal $1(x=a)$, so there is nothing lost in doing so. The real reason for doing so is because we wanted to replace a function of $X_0,X_1,X_2,\ldots$ by a function of $X_1,X_2,X_3,\ldots$,

⁵We stress that we do not need the stronger assumption of positive recurrence here.

and thus be in a position to apply first-step analysis, which is precisely what we do next:
\[
\nu^{[a]}(x)=E_a\sum_{n=1}^{\infty}1(X_n=x,\ n\le T_a)
=\sum_{n=1}^{\infty}P_a(X_n=x,\ n\le T_a)
=\sum_{n=1}^{\infty}\sum_{y\in S}P_a(X_n=x,\ X_{n-1}=y,\ n\le T_a).
\]
Since the event $\{n\le T_a\}$ is determined by $X_0,\ldots,X_{n-1}$, the Markov property gives
\[
P_a(X_n=x,\ X_{n-1}=y,\ n\le T_a)=p_{yx}\,P_a(X_{n-1}=y,\ n\le T_a),
\]
and therefore
\[
\nu^{[a]}(x)=\sum_{y\in S}p_{yx}\,E_a\sum_{n=1}^{T_a}1(X_{n-1}=y)=\sum_{y\in S}p_{yx}\,\nu^{[a]}(y),
\]
where, in the last step, we used (16). So we have shown $\nu^{[a]}(x)=\sum_{y\in S}p_{yx}\,\nu^{[a]}(y)$, which are precisely the balance equations.

Next, note that, by definition, $\nu^{[a]}(a)=1$. We just showed that $\nu^{[a]}=\nu^{[a]}P$, so, by induction, $\nu^{[a]}=\nu^{[a]}P^n$ for all $n\in\mathbb{N}$. Let $x$ be such that $a\leadsto x$, so there exists $n\ge 1$ such that $p_{ax}^{(n)}>0$. Hence
\[
\nu^{[a]}(x)\ge\nu^{[a]}(a)\,p_{ax}^{(n)}=p_{ax}^{(n)}>0.
\]
Since $a$ is recurrent, we also have $x\leadsto a$, so there exists $m\ge 1$ such that $p_{xa}^{(m)}>0$. Hence
\[
1=\nu^{[a]}(a)\ge\nu^{[a]}(x)\,p_{xa}^{(m)},
\]
so, indeed, $\nu^{[a]}(x)<\infty$.

Note: The last result was shown under the assumption that $a$ is a recurrent state. We are now going to assume more, namely, that $a$ is positive recurrent. Then, from (16),
\[
\sum_{x\in S}\nu^{[a]}(x)=E_a\sum_{n=1}^{T_a}\sum_{x\in S}1(X_n=x)=E_a\sum_{n=1}^{T_a}1=E_aT_a,
\]
which is finite when $a$ is positive recurrent. We now have:

Corollary 9. Suppose that the Markov chain possesses a positive recurrent state $a$. Define
\[
\pi^{[a]}(x):=\frac{\nu^{[a]}(x)}{E_aT_a}=\frac{E_a\sum_{n=0}^{T_a-1}1(X_n=x)}{E_aT_a},\qquad x\in S.\qquad (17)
\]
This $\pi^{[a]}$ is a stationary distribution.
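Here is a simulation sketch of the construction (the transition matrix is the same made-up example as in earlier snippets): it estimates $\nu^{[a]}$ by averaging visit counts over many excursions from $a$, then checks the balance equations and the normalisation (17) numerically.

```python
import numpy as np

rng = np.random.default_rng(4)
P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4],
              [0.5, 0.3, 0.2]])      # hypothetical ergodic chain
a = 0

# Estimate nu^[a](x) = E_a sum_{n=0}^{T_a - 1} 1(X_n = x) over many cycles.
cycles, nu = 50000, np.zeros(3)
for _ in range(cycles):
    nu[a] += 1                       # the visit at n = 0
    x = rng.choice(3, p=P[a])
    while x != a:                    # count visits strictly before the return
        nu[x] += 1
        x = rng.choice(3, p=P[x])
nu /= cycles

print(nu, nu @ P)                    # nu should (approximately) solve nu = nu P
print(nu / nu.sum())                 # nu / E_a T_a: the stationary distribution (17)
```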

Proof. We have already shown, in Proposition 2, that $\nu^{[a]}P=\nu^{[a]}$. But $\pi^{[a]}$ is simply a multiple of $\nu^{[a]}$. Therefore
\[
\pi^{[a]}P=\pi^{[a]}.
\]
Since $a$ is positive recurrent, we have $E_aT_a<\infty$, and so $\pi^{[a]}$ is not trivially equal to zero. It only remains to show that $\pi^{[a]}$ is a probability, i.e. that it sums up to one. But
\[
\sum_{x\in S}\pi^{[a]}(x)=\frac{1}{E_aT_a}\sum_{x\in S}\nu^{[a]}(x)=1.
\]
Pause for thought: We have achieved something VERY IMPORTANT: if we have one positive recurrent state, no matter which one, then the set of linear equations
\[
\pi(x)=\sum_{y\in S}\pi(y)p_{yx},\qquad\sum_{y\in S}\pi(y)=1,
\]
has at least one solution. In other words: if there is a positive recurrent state, then we immediately have a stationary (probability) distribution! We DID NOT claim that this distribution is, in general, unique. This is what we will try to do in the sequel.

15 Positive recurrence is a class property

We now show that (just like recurrence) positive recurrence is also a class property.

Theorem 15. Let $i$ be a positive recurrent state. Suppose that $i\leadsto j$. Then $j$ is positive recurrent.

Proof. Start with $X_0=i$. We already know that $j$ is recurrent. Consider the excursions $\mathcal{X}_i^{(r)}$, $r=0,1,2,\ldots$, and recall that the probability $g_{ij}=P_i(T_j<T_i)$ that $j$ occurs in a specific excursion is positive. Since the excursions are independent, the event that $j$ occurs in a specific excursion is independent from the event that $j$ occurs in the next excursion. In words: to decide whether $j$ will occur in a specific excursion, we toss a coin with probability of heads $g_{ij}>0$; the coins are i.i.d. If heads show up at the $r$-th excursion, then $j$ occurs in the $r$-th excursion; if tails show up, then the $r$-th excursion does not contain state $j$. Hence $j$ will occur in infinitely many excursions. Let $R_1$ (respectively, $R_2$) be the index of the first (respectively, second) excursion that contains $j$.

[Figure: the excursions with indices $R_1,\ R_1+1,\ R_1+2,\ \ldots,\ R_2$; the time $\zeta$ between the first and the second visit to $j$ is contained in these excursions and has the same law as $T_j$ under $P_j$.]

We just need to show that
\[
\zeta:=\sum_{r=1}^{R_2-R_1+1}Z_r
\]
has finite expectation, where the $Z_r$ are i.i.d. with the same law as $T_i$: the sum of the durations of the excursions with indices $R_1,\ldots,R_2$ dominates the time between two successive visits to $j$. Now, $R_2-R_1$ is a geometric random variable:
\[
P_i(R_2-R_1=r)=(1-g_{ij})^{r-1}g_{ij},\qquad r=1,2,\ldots,
\]
and so, by Wald's lemma, the expectation of $\zeta$ equals
\[
E_i\zeta=(E_iT_i)\times E_i(R_2-R_1+1)=(E_iT_i)\times\big(g_{ij}^{-1}+1\big).
\]
Since $g_{ij}>0$ and $E_iT_i<\infty$, we have that $E_i\zeta<\infty$. Hence $E_jT_j<\infty$, i.e. $j$ is positive recurrent.

Corollary 10. Let $C$ be a communicating class. Then either all states in $C$ are positive recurrent, or all states are null recurrent, or all states are transient.

Accordingly, we may classify an irreducible chain as a whole. An irreducible Markov chain is called recurrent if one (and hence all) states are recurrent; transient if one (and hence all) states are transient; positive recurrent if one (and hence all) states are positive recurrent; and null recurrent if one (and hence all) states are null recurrent.

[Diagram: an IRREDUCIBLE CHAIN is either RECURRENT or TRANSIENT; a recurrent chain is either POSITIVE RECURRENT or NULL RECURRENT.]

16 Uniqueness of the stationary distribution

At the risk of being too pedantic, we stress that a stationary distribution may not be unique.

Example: Consider the chain we have seen before:

[Diagram: states 0, 1, 2; from state 0 the chain stays at 0 with probability 1/3, moves to 1 with probability 1/2 and to 2 with probability 1/6; states 1 and 2 are absorbing.]

Clearly, state 1 (and state 2) is positive recurrent, for totally trivial reasons (it is, after all, an absorbing state). But observe that there are many (infinitely many) stationary distributions: for each $0\le\alpha\le 1$, the $\pi$ defined by
\[
\pi(0)=0,\qquad\pi(1)=\alpha,\qquad\pi(2)=1-\alpha,
\]
is a stationary distribution. Notice that states 1 and 2 do not communicate. This lack of communication is, in fact, what prohibits uniqueness. (You should think why this is OBVIOUS!)

Theorem 16. Consider a positive recurrent irreducible Markov chain. Then there is a unique stationary distribution.

Proof. Fix a ground state $a$. By the results of Section 14, there is a stationary distribution, namely $\pi^{[a]}$. Let $\pi$ be any stationary distribution. Since the chain is irreducible and positive recurrent, we have
\[
P_i(T_j<\infty)=1,\qquad\text{for all states } i \text{ and } j.\qquad (18)
\]
Consider the Markov chain $(X_n,\ n\ge 0)$ and suppose that $P(X_0=x)=\pi(x)$, $x\in S$, i.e. we start the Markov chain from the stationary distribution $\pi$. Then, for all $n$, we have $P(X_n=x)=\pi(x)$, for all $x\in S$. Moreover, by (18),
\[
P(T_a<\infty)=\sum_{x\in S}\pi(x)P_x(T_a<\infty)=\sum_{x\in S}\pi(x)=1,
\]
so the conditions of Theorem 14 hold. By (13), with probability one,
\[
\frac{1}{t}\sum_{n=1}^{t}1(X_n=x)\ \to\ \frac{E_a\sum_{k=0}^{T_a-1}1(X_k=x)}{E_aT_a}=\pi^{[a]}(x),\qquad\text{as } t\to\infty.
\]
Recognising that the random variables on the left are nonnegative and bounded by 1, we can take expectations of both sides, by a theorem of Analysis (Bounded Convergence Theorem):
\[
E\Big[\frac{1}{t}\sum_{n=1}^{t}1(X_n=x)\Big]\ \to\ \pi^{[a]}(x),\qquad\text{as } t\to\infty.
\]
But
\[
E\Big[\frac{1}{t}\sum_{n=1}^{t}1(X_n=x)\Big]=\frac{1}{t}\sum_{n=1}^{t}P(X_n=x)=\pi(x).
\]

Therefore $\pi(x)=\pi^{[a]}(x)$, for all $x\in S$. Hence $\pi=\pi^{[a]}$, i.e. there is a unique stationary distribution. In particular, from the definition of $\pi^{[a]}$,
\[
\pi(a)=\pi^{[a]}(a)=\frac{1}{E_aT_a},\qquad\text{for all } a\in S.
\]
We record this:

Corollary 11. Consider a positive recurrent irreducible Markov chain with stationary distribution $\pi$. Then
\[
\pi(a)=\frac{1}{E_aT_a},\qquad\text{for all } a\in S.
\]
Corollary 12. Consider a positive recurrent irreducible Markov chain and two states $a,b\in S$. Then $\pi^{[a]}=\pi^{[b]}$.

Proof. Both $\pi^{[a]}$ and $\pi^{[b]}$ are stationary distributions; there is only one stationary distribution, so they are equal. In particular, we obtain the cycle formula:
\[
\frac{E_a\sum_{k=0}^{T_a-1}1(X_k=x)}{E_aT_a}=\frac{E_b\sum_{k=0}^{T_b-1}1(X_k=x)}{E_bT_b},\qquad x\in S.
\]
The cycle formula merely re-expresses this equality.

The following is a sort of converse, which is useful because it is often used as a criterion for positive recurrence:

Corollary 13. Consider an arbitrary Markov chain. Assume that a stationary distribution $\pi$ exists. Then any state $i$ such that $\pi(i)>0$ is positive recurrent.

Proof. Let $i$ be such that $\pi(i)>0$. Let $C$ be the communicating class of $i$. Consider the restriction of the chain to $C$: delete all transition probabilities $p_{xy}$ with $x\in C$, $y\notin C$. Let $\pi(\cdot|C)$ be the distribution defined by restricting $\pi$ to $C$:
\[
\pi(x|C):=\frac{\pi(x)}{\pi(C)},\qquad x\in C,
\]
where $\pi(C)=\sum_{y\in C}\pi(y)$. We can easily see that we thus obtain a Markov chain with state space $C$ and stationary distribution $\pi(\cdot|C)$. This is, in fact, the only stationary distribution for the restricted chain, because $C$ is irreducible. By the above corollary,
\[
\pi(i|C)=\frac{1}{E_iT_i}.
\]
But $\pi(i|C)>0$. Therefore $E_iT_i<\infty$, i.e. $i$ is positive recurrent.

17 Structure of a stationary distribution

It is not hard to show now what an arbitrary stationary distribution looks like. Consider an arbitrary Markov chain and decompose its state space into communicating classes. In Section 14 we assumed the existence of a positive recurrent state $a$ and proved the existence of a stationary distribution. In the same way, to each positive recurrent communicating class $C$ there corresponds a unique stationary distribution $\pi^C$

which assigns positive probability to each state in $C$. This is defined by picking a state (any state!) $a\in C$ and letting $\pi^C:=\pi^{[a]}$.

Proposition 3. An arbitrary stationary distribution $\pi$ is necessarily of the form
\[
\pi=\sum_{C:\ C\ \text{positive recurrent communicating class}}\alpha_C\,\pi^C,
\]
where $\alpha_C\ge 0$ and $\sum_C\alpha_C=1$.

Proof. Let $\pi$ be an arbitrary stationary distribution. If $\pi(x)>0$ then $x$ belongs to some positive recurrent class $C$ (Corollary 13). For each such $C$ define the conditional distribution
\[
\pi(x|C):=\frac{\pi(x)}{\pi(C)},\qquad\text{where}\quad\pi(C):=\sum_{x\in C}\pi(x).
\]
Then $(\pi(x|C),\ x\in C)$ is a stationary distribution for the chain restricted to $C$. By our uniqueness result, $\pi(x|C)\equiv\pi^C(x)$. So we have that the above decomposition holds with $\alpha_C=\pi(C)$.

18 Coupling and stability

18.1 Definition of stability

Stability refers to the convergence of the probabilities $P(X_n=x)$ as $n\to\infty$ or, more generally, of the so-called Cesaro averages $\frac{1}{n}\sum_{k=1}^{n}P(X_k=x)$, to $\pi(x)$. (A sequence of real numbers $a_n$ may not converge, but its Cesaro averages $\frac{1}{n}(a_1+\cdots+a_n)$ may converge; for example, take $a_n=(-1)^n$.)

Why stability? There are physical reasons why we are interested in the stability of a Markov chain. A Markov chain is a simple model of a stochastic dynamical system. Stability of a deterministic dynamical system means that the system eventually converges to a fixed point (or, more generally, that it remains in a small region). For example, consider the motion of a pendulum under the action of gravity. We all know, from physical experience, that the pendulum will swing for a while and, eventually, it will settle to the vertical position. This is the stable position. There is dissipation of energy due to friction, and that causes the settling down. In an ideal pendulum, without friction, the pendulum will perform an oscillatory motion ad infinitum.

Still, we may say it is stable, because it does not (it cannot!) swing too far away.

The simplest differential equation

As another example, consider the following differential equation:
\[
\dot x=-ax+b,\qquad x\in\mathbb{R},\quad a>0.
\]
There are myriads of applications giving physical interpretation to this. The thing to observe here is that there is an equilibrium point, the one found by setting the right-hand side equal to zero:
\[
x^*=b/a.
\]
If we start with $x(t=0)=x^*$, then $x(t)=x^*$ for all $t>0$. Moreover, regardless of the starting position, we have $x(t)\to x^*$, as $t\to\infty$. We say that $x^*$ is a stable equilibrium.

[Figure: several trajectories $x(t)$, started from different values at times $0,t_1,t_2,t_3,t_4$, all converging to the equilibrium level $x^*$.]

A linear stochastic system

We now pass on to stochastic systems. Since we are prohibited by the syllabus to speak about continuous time stochastic systems, let us consider a stochastic analogue of the above in discrete time:
\[
X_{n+1}=\rho X_n+\xi_n.
\]
There are applications of this in physics, in finance, and elsewhere, so I'll leave it upon the reader to decide whether he or she wants to consider this as a financial example, or an example in physics, or whatever. Here we assume that $X_0,\xi_0,\xi_1,\xi_2,\ldots$ are given, mutually independent, random variables, and we define a stochastic sequence $X_1,X_2,\ldots$ by means of the recursion above. We shall assume that the $\xi_n$ have the same law; the system is simple because the $\xi_n$'s depend on the time parameter $n$ only through their index.

But what does stability mean here? In general, it cannot mean convergence to a fixed equilibrium point. However, notice what happens when we solve the recursion. And we are in a happy situation here, for we can do it:
\[
X_1=\rho X_0+\xi_0
\]
\[
X_2=\rho X_1+\xi_1=\rho(\rho X_0+\xi_0)+\xi_1=\rho^2X_0+\rho\xi_0+\xi_1
\]
\[
X_3=\rho X_2+\xi_2=\rho^3X_0+\rho^2\xi_0+\rho\xi_1+\xi_2
\]
\[
\cdots\cdots
\]
\[
X_n=\rho^nX_0+\rho^{n-1}\xi_0+\rho^{n-2}\xi_1+\cdots+\rho\xi_{n-2}+\xi_{n-1}.
\]
It follows that $|\rho|<1$ is the condition for stability.
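A short simulation sketch of this recursion (parameters are illustrative, not from the notes): two copies driven by the same bounded noise but started from different initial conditions. The trajectories merge (their difference is $\rho^n$ times the initial difference), and their empirical distributions agree, which is exactly the distributional notion of stability derived next. Sharing the noise between the two copies is itself a simple instance of the coupling idea of §18.3.

```python
import numpy as np

rng = np.random.default_rng(5)
rho, n, paths = 0.8, 60, 100000       # |rho| < 1: the stable case

X = np.full(paths, 10.0)              # one copy started far from equilibrium
Y = np.zeros(paths)                   # another copy started at 0
for _ in range(n):
    xi = rng.uniform(-1, 1, paths)    # bounded i.i.d. noise, shared by X and Y
    X, Y = rho * X + xi, rho * Y + xi

print(np.abs(X - Y).max())            # = rho^n * 10: the trajectories have merged
print(X.mean(), X.std(), Y.mean(), Y.std())   # empirical laws agree
```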

Indeed, since $|\rho|<1$, the first term, $\rho^nX_0$, goes to 0 as $n\to\infty$. The second term,
\[
\widetilde X_n:=\rho^{n-1}\xi_0+\rho^{n-2}\xi_1+\cdots+\rho\xi_{n-2}+\xi_{n-1},
\]
which is a linear combination of $\xi_0,\ldots,\xi_{n-1}$, does not converge. But notice that if we let
\[
\widehat X_n:=\rho^{n-1}\xi_{n-1}+\rho^{n-2}\xi_{n-2}+\cdots+\rho\xi_1+\xi_0,
\]
then, since the $\xi$'s are i.i.d., we have
\[
P(\widetilde X_n\le x)=P(\widehat X_n\le x).
\]
The nice thing is that, under nice conditions (e.g. assume that the $\xi_n$ are bounded), $\widehat X_n$ does converge to
\[
\widehat X_\infty=\sum_{k=0}^{\infty}\rho^k\xi_k.
\]
In particular, while the limit of $\widetilde X_n$ does not exist, the limit of its distribution does:
\[
\lim_{n\to\infty}P(\widetilde X_n\le x)=\lim_{n\to\infty}P(\widehat X_n\le x)=P(\widehat X_\infty\le x).
\]
This is precisely the reason why we cannot, for such systems, define stability in a way other than convergence of the distributions of the random variables.

18.2 The fundamental stability theorem for Markov chains

Theorem 17. If a Markov chain is irreducible, positive recurrent and aperiodic, with (unique) stationary distribution $\pi$, then, for any initial distribution,
\[
\lim_{n\to\infty}P(X_n=x)=\pi(x).
\]
In particular, for time-homogeneous Markov chains, the $n$-step transition matrix $\mathsf{P}^n$ converges, as $n\to\infty$, to a matrix with rows all equal to $\pi$:
\[
\lim_{n\to\infty}p_{ij}^{(n)}=\pi(j),\qquad i,j\in S.
\]

18.3 Coupling

To understand why the fundamental stability theorem works, we need to understand the notion of coupling. Starting at an elementary level, suppose you are given the laws of two random variables $X$, $Y$, i.e. we know
\[
f(x)=P(X=x),\qquad g(y)=P(Y=y),
\]
but we are not given what $P(X=x,\ Y=y)$ is. Coupling refers to the various methods for constructing this joint law. A straightforward (often not so useful) construction is to assume that the random variables are independent and define
\[
P(X=x,\ Y=y)=P(X=x)P(Y=y).
\]
But suppose that we have some further requirement, for example, to try to make the coincidence probability $P(X=Y)$ as large as possible. How should we then define what $P(X=x,\ Y=y)$ is?

Exercise: Let $X$, $Y$ be random variables with values in $\mathbb{Z}$ and laws $f(x)=P(X=x)$, $g(y)=P(Y=y)$, respectively. Find the joint law $h(x,y)=P(X=x,\ Y=y)$ which respects the marginals, i.e.
\[
f(x)=\sum_y h(x,y),\qquad g(y)=\sum_x h(x,y),
\]
and which maximises $P(X=Y)$.

The coupling we are interested in is a coupling of processes. We are given the law of a stochastic process $X=(X_n,\ n\ge 0)$ and the law of a stochastic process $Y=(Y_n,\ n\ge 0)$, but we are not told what the joint probabilities are. A coupling of them refers to a joint construction on the same probability space.

We say that a random time $T$ is a meeting time of the two processes if
\[
X_n=Y_n\qquad\text{for all } n\ge T.
\]
This $T$ is a random variable that takes values in the set of times, or it may take the value $+\infty$, in particular, on the event that the two processes never meet. Note that it makes sense to speak of a meeting time precisely because of the assumption that the two processes have been constructed together.

The following result is the so-called coupling inequality:

Proposition 4. Let $T$ be a meeting time of two coupled stochastic processes $X$, $Y$. Then, for all $n\ge 0$ and all $x\in S$,
\[
|P(X_n=x)-P(Y_n=x)|\le P(T>n).
\]
If, in particular, $T$ is finite with probability one, i.e. $P(T<\infty)=1$, then
\[
\sup_{x\in S}|P(X_n=x)-P(Y_n=x)|\to 0,\qquad\text{as } n\to\infty.\qquad (19)
\]
Proof. Suppose that we have two stochastic processes as above and suppose that they have been coupled. We have
\[
P(X_n=x)=P(X_n=x,\ n<T)+P(X_n=x,\ n\ge T)=P(X_n=x,\ n<T)+P(Y_n=x,\ n\ge T)\le P(n<T)+P(Y_n=x),
\]
and hence $P(X_n=x)-P(Y_n=x)\le P(T>n)$. Repeating the argument with the roles of $X$ and $Y$ interchanged, we obtain $P(Y_n=x)-P(X_n=x)\le P(T>n)$. Combining the two inequalities we obtain the first assertion:
\[
|P(X_n=x)-P(Y_n=x)|\le P(T>n).
\]
Since this holds for all $x\in S$ and since the right-hand side does not depend on $x$, we can write it as
\[
\sup_{x\in S}|P(X_n=x)-P(Y_n=x)|\le P(T>n),
\]
and (19) follows because $P(T>n)\to P(T=\infty)=0$ when $P(T<\infty)=1$.

18.4 Proof of the fundamental stability theorem

First, review the assumptions of the theorem: the chain is irreducible, positive recurrent and aperiodic. Let $p_{ij}$ denote, as usual, the transition probabilities. We know that there is a unique stationary distribution, denoted by $\pi$, and that $\pi(x)>0$ for all $x\in S$.

We shall consider the laws of two processes. The first process, denoted by $X=(X_n,\ n\ge 0)$, has the law of a Markov chain with transition probabilities $p_{ij}$ and $X_0$ distributed according to some arbitrary distribution $\mu$. The second process, denoted by $Y=(Y_n,\ n\ge 0)$, has the law of a Markov chain with transition probabilities $p_{ij}$ and $Y_0$ distributed according to the stationary distribution $\pi$. So both processes are positive recurrent aperiodic Markov chains with the same transition probabilities; they differ only in how the initial states are chosen.

It is very important to note that we have only defined the law of $X$ and the law of $Y$, but we have not defined a joint law; in other words, we have not defined joint probabilities such as $P(X_n=x,\ X_m=y)$. We now couple them by doing the straightforward thing: we assume that $X=(X_n,\ n\ge 0)$ is independent of $Y=(Y_n,\ n\ge 0)$. Having coupled them, we can define joint probabilities, and it also makes sense to consider their first meeting time:
\[
T:=\inf\{n\ge 0:\ X_n=Y_n\}.
\]
Consider the process
\[
W_n:=(X_n,Y_n).
\]
Then $(W_n,\ n\ge 0)$ is a Markov chain with state space $S\times S$. Its initial state $W_0=(X_0,Y_0)$ has distribution $P(W_0=(x,y))=\mu(x)\pi(y)$. Its (1-step) transition probabilities are
\[
q_{(x,y),(x',y')}:=P\big(W_{n+1}=(x',y')\mid W_n=(x,y)\big)=P(X_{n+1}=x'\mid X_n=x)\,P(Y_{n+1}=y'\mid Y_n=y)=p_{xx'}\,p_{yy'},
\]
and its $n$-step transition probabilities are
\[
q_{(x,y),(x',y')}^{(n)}=p_{xx'}^{(n)}\,p_{yy'}^{(n)}.
\]
From Theorem 3 and the aperiodicity assumption, we have that $p_{xx'}^{(n)}>0$ and $p_{yy'}^{(n)}>0$ for all large $n$, implying that $q_{(x,y),(x',y')}^{(n)}>0$ for all large $n$. Therefore $(W_n,\ n\ge 0)$ is an irreducible chain.

Notice that
\[
\sigma(x,y):=\pi(x)\pi(y),\qquad(x,y)\in S\times S,
\]
is a stationary distribution for $(W_n,\ n\ge 0)$. (Indeed, had we started with both $X_0$ and $Y_0$ independent and distributed according to $\pi$, then, for all $n\ge 1$, $X_n$ and $Y_n$ would be independent and distributed according to $\pi$.)

Since $\pi(x)>0$ for all $x$, we have $\sigma(x,y)>0$ for all $(x,y)\in S\times S$, and so, by Corollary 13, $(W_n,\ n\ge 0)$ is positive recurrent. In particular,
\[
P(T<\infty)=1.
\]
Define now a third process by:
\[
Z_n:=X_n1(n<T)+Y_n1(n\ge T).
\]
In other words, $Z_n$ equals $X_n$ before the meeting time of $X$ and $Y$ and, after the meeting time, $Z_n=Y_n$.

[Figure: $X$ starts from an arbitrary distribution, $Y$ from the stationary distribution; $Z$ follows $X$ up to the meeting time $T$ and follows $Y$ thereafter.]

Obviously, $Z_0=X_0$, which has the arbitrary distribution $\mu$. Moreover, $(Z_n,\ n\ge 0)$ is a Markov chain with transition probabilities $p_{ij}$. Hence $(Z_n,\ n\ge 0)$ is identical in law to $(X_n,\ n\ge 0)$. In particular,
\[
P(X_n=x)=P(Z_n=x),\qquad\text{for all } n\ \text{and}\ x.
\]
Clearly, $T$ is a meeting time between $Z$ and $Y$. By the coupling inequality,
\[
|P(Z_n=x)-P(Y_n=x)|\le P(T>n),
\]
uniformly in $x$, and the right-hand side converges to zero, as $n\to\infty$, precisely because $P(T<\infty)=1$. Since $P(Z_n=x)=P(X_n=x)$, and since $P(Y_n=x)=\pi(x)$ by assumption, we have
\[
|P(X_n=x)-\pi(x)|\le P(T>n),
\]
uniformly in $x$. This proves the theorem.

Example showing that the aperiodicity assumption is essential for the method used in the proof: If $p_{ij}$ are the transition probabilities of an irreducible chain $X$ and if $Y$ is an independent copy of $X$, then the process $W=(X,Y)$ may fail to be irreducible. For example, suppose that $S=\{0,1\}$ and $p_{01}=p_{10}=1$. Then the product chain has state space $S\times S=\{(0,0),(0,1),(1,0),(1,1)\}$ and transition probabilities
\[
q_{(0,0),(1,1)}=q_{(1,1),(0,0)}=q_{(0,1),(1,0)}=q_{(1,0),(0,1)}=1.
\]
This is not irreducible. However, if we modify the original chain and make it aperiodic by, say, $p_{00}=0.01$, $p_{01}=0.99$, $p_{10}=1$, then the product chain becomes irreducible.
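The proof can be watched numerically. The sketch below (the same made-up ergodic chain as in earlier snippets) runs an independent copy $Y$ started from $\pi$ next to $X$ started from a fixed state, records the first meeting time $T$, and checks the coupling inequality $\sup_x|P(X_n=x)-\pi(x)|\le P(T>n)$ up to Monte Carlo error.

```python
import numpy as np

rng = np.random.default_rng(6)
P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4],
              [0.5, 0.3, 0.2]])      # hypothetical ergodic, aperiodic chain
A = np.vstack([P.T - np.eye(3), np.ones(3)])
pi = np.linalg.lstsq(A, np.r_[np.zeros(3), 1.0], rcond=None)[0]   # pi = pi P

n_max, runs = 21, 20000
not_met = np.zeros(n_max)            # counts of the event {T > n}
hits = np.zeros((n_max, 3))          # empirical law of X_n
for _ in range(runs):
    x, y = 0, rng.choice(3, p=pi)    # X_0 = 0 (arbitrary), Y_0 ~ pi
    met = (x == y)
    for n in range(1, n_max):
        x, y = rng.choice(3, p=P[x]), rng.choice(3, p=P[y])   # independent steps
        met = met or (x == y)
        not_met[n] += (not met)
        hits[n, x] += 1

for n in (1, 5, 10, 20):             # coupling inequality, up to Monte Carlo noise
    tv = np.abs(hits[n] / runs - pi).max()
    print(n, tv, not_met[n] / runs)  # sup_x |P(X_n = x) - pi(x)|  vs  P(T > n)
```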

Example showing that the aperiodicity assumption is essential for the limiting result: Consider the same chain as above: $p_{01}=p_{10}=1$. Suppose $X_0=0$. Then $X_n=0$ for even $n$ and $X_n=1$ for odd $n$. Therefore $p_{00}^{(n)}=1$ for even $n$ and $=0$ for odd $n$, so $\lim_{n\to\infty}p_{00}^{(n)}$ does not exist. However, if we modify the chain by $p_{00}=0.01$, $p_{01}=0.99$, $p_{10}=1$, then the chain becomes irreducible, aperiodic and positive recurrent, with a unique stationary distribution $\pi$, where $\pi(0)$, $\pi(1)$ satisfy
\[
0.99\,\pi(0)=\pi(1),\qquad\pi(0)+\pi(1)=1,
\]
i.e. $\pi(0)\approx 0.503$, $\pi(1)\approx 0.497$, and, by Theorem 17,
\[
\lim_{n\to\infty}p_{00}^{(n)}=\lim_{n\to\infty}p_{10}^{(n)}=\pi(0),\qquad\lim_{n\to\infty}p_{11}^{(n)}=\lim_{n\to\infty}p_{01}^{(n)}=\pi(1).
\]
In matrix notation,
\[
\lim_{n\to\infty}\begin{pmatrix}p_{00}^{(n)}&p_{01}^{(n)}\\ p_{10}^{(n)}&p_{11}^{(n)}\end{pmatrix}=\begin{pmatrix}0.503&0.497\\ 0.503&0.497\end{pmatrix}.
\]

18.5 Terminology

A Markov chain which is irreducible and positive recurrent will be called ergodic. A Markov chain which is irreducible and positive recurrent and aperiodic will be called mixing. The reader is asked to pay attention to this terminology, which is different from the one used in the Markov chain literature: indeed, most people use the term "ergodic" for an irreducible, positive recurrent and aperiodic Markov chain. But this is "wrong".⁶ It is, however, customary to have a consistent terminology, and this is expressed by our terminology.

⁶We put "wrong" in quotes. Terminology cannot be, per se, wrong. The reason is that the term "ergodic" means something specific in Mathematics (cf. Parry, 1981).

18.6 Periodic chains

Suppose now that we have a positive recurrent irreducible chain, and suppose that the period of the chain is $d$. By the results of §5.4, we can decompose the state space into $d$ classes $C_0,C_1,\ldots,C_{d-1}$, such that the chain moves cyclically between them: from $C_0$ to $C_1$, from $C_1$ to $C_2$, and so on, and from $C_{d-1}$ back to $C_0$. Since the chain is irreducible and positive recurrent, it has a unique stationary distribution $\pi$, and $\pi(x)>0$ for all states $x$. It is not difficult to see that:

Lemma 8. The chain $(X_{nd},\ n\ge 0)$, i.e. the original chain sampled every $d$ steps, consists of $d$ closed communicating classes, namely $C_0,\ldots,C_{d-1}$, and all its states have period 1. Hence if we restrict the chain $(X_{nd},\ n\ge 0)$ to any of the sets $C_r$ we obtain a positive recurrent irreducible aperiodic chain, and the previous results apply.

We shall show that:

Lemma 9. The (unique) stationary distribution of $(X_{nd},\ n\ge 0)$, restricted to $C_r$, is
\[
d\,\pi(i),\qquad i\in C_r.
\]
Proof. Let
\[
\alpha_r:=\sum_{i\in C_r}\pi(i),\qquad r=0,1,\ldots,d-1.
\]
We have that $\pi$ satisfies the balance equations: $\pi(i)=\sum_{j\in S}\pi(j)p_{ji}$, $i\in S$. Suppose $i\in C_r$. Then $p_{ji}=0$ unless $j\in C_{r-1}$, so
\[
\pi(i)=\sum_{j\in C_{r-1}}\pi(j)p_{ji},\qquad i\in C_r.
\]
Summing up over $i\in C_r$ we obtain
\[
\alpha_r=\sum_{i\in C_r}\pi(i)=\sum_{j\in C_{r-1}}\pi(j)\sum_{i\in C_r}p_{ji}=\sum_{j\in C_{r-1}}\pi(j)=\alpha_{r-1}.
\]
Since the $\alpha_r$ add up to 1, each one must be equal to $1/d$. Hence the restriction of $\pi$ to $C_r$, normalised, is $\pi(i)/\alpha_r=d\,\pi(i)$, $i\in C_r$, and this is the stationary distribution of the restricted chain.

This means that:

Theorem 18. Suppose that $p_{ij}$ are the transition probabilities of a positive recurrent irreducible chain with period $d$. Let $i\in C_k$, $j\in C_\ell$, and let $r=\ell-k$ (modulo $d$). Then
\[
p_{ij}^{(r+nd)}\to d\,\pi(j),\qquad\text{as } n\to\infty.
\]
Proof. If $i$ and $j$ belong to the same class $C_r$, then, since $(X_{nd},\ n\ge 0)$ restricted to $C_r$ is a positive recurrent irreducible aperiodic chain with stationary distribution $d\,\pi(\cdot)$,
\[
p_{ij}^{(nd)}=P_i(X_{nd}=j)\to d\,\pi(j),\qquad\text{as } n\to\infty.
\]
Can we also find the limit when $i$ and $j$ do not belong to the same class? Suppose that $i\in C_k$, $j\in C_\ell$. Then, starting from $i$, the original chain enters $C_\ell$ in $r=\ell-k$ steps (where $r$ is taken to be a number between 0 and $d-1$, after reduction modulo $d$) and, thereafter, if sampled every $d$ steps, it remains in $C_\ell$. Hence $p_{ij}^{(r+nd)}\to d\,\pi(j)$.
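A numerical sketch of Theorem 18 with a made-up chain of period $d=2$ (classes $C_0=\{0,1\}$, $C_1=\{2,3\}$; the matrix is an illustration, not from the notes): along even times, the transition probabilities converge to $d\,\pi(j)$ on the class of the starting state, and vanish off it.

```python
import numpy as np

# Hypothetical chain with period d = 2: classes C0 = {0, 1} and C1 = {2, 3}.
P = np.array([[0.0, 0.0, 0.3, 0.7],
              [0.0, 0.0, 0.6, 0.4],
              [0.5, 0.5, 0.0, 0.0],
              [0.2, 0.8, 0.0, 0.0]])

A = np.vstack([P.T - np.eye(4), np.ones(4)])
pi = np.linalg.lstsq(A, np.r_[np.zeros(4), 1.0], rcond=None)[0]   # pi = pi P

Pn = np.linalg.matrix_power(P, 1000)  # an even number of steps: n d with d = 2
print(Pn[0, :2], 2 * pi[:2])          # p_0j^(nd) -> d pi(j) for j in C0
print(Pn[0, 2:])                      # exactly 0: the chain is never in C1 at even times
```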

19 Limiting theory for Markov chains*

We wish to complete our discussion about the possible limits of the $n$-step transition probabilities $p_{ij}^{(n)}$, as $n\to\infty$. Recall that we know what happens when $j$ is positive recurrent and $i$ is in the same communicating class as $j$: if the period is 1, then $p_{ij}^{(n)}\to\pi(j)$ (Theorem 17); if the period is $d$, then the limit exists only along a selected subsequence (Theorem 18). We also saw in Corollary 7 that, if $j$ is transient, then $p_{ij}^{(n)}\to 0$ as $n\to\infty$.

19.1 Limiting probabilities for null recurrent states

Theorem 19. If $j$ is a null recurrent state then
\[
p_{ij}^{(n)}\to 0,\qquad\text{as } n\to\infty,
\]
for all $i\in S$.

The proof will be based on two results from Analysis, the first of which is known as the Helly-Bray Lemma:

Lemma 10. Consider, for each $n=1,2,\ldots$, a probability distribution $p^{(n)}=(p_1^{(n)},p_2^{(n)},p_3^{(n)},\ldots)$ on $\mathbb{N}$, i.e. $\sum_{i=1}^{\infty}p_i^{(n)}=1$ and $p_i^{(n)}\ge 0$ for all $i$. Then we can find a sequence $n_1<n_2<n_3<\cdots$ of integers, and numbers $p_i\ge 0$, such that
\[
\lim_{k\to\infty}p_i^{(n_k)}=p_i,\qquad\text{for each } i.
\]
Proof. Since the numbers $p_1^{(n)}$, $n=1,2,\ldots$, are contained between 0 and 1, there must be a subsequence $n_k^1$, $k=1,2,\ldots$, such that $\lim_{k\to\infty}p_1^{(n_k^1)}$ exists and equals, say, $p_1$. Now consider the numbers $p_2^{(n_k^1)}$, $k=1,2,\ldots$. Since they are also contained between 0 and 1, there must be a subsequence $n_k^2$ of $n_k^1$ such that $\lim_{k\to\infty}p_2^{(n_k^2)}$ exists and equals, say, $p_2$. It is clear how we continue: for each $i$, the sequence $n_k^i$ of the $i$-th row is a subsequence of the sequence $n_k^{i-1}$ of the previous row, chosen so that $\lim_{k\to\infty}p_i^{(n_k^i)}$ exists and equals, say, $p_i$; and so on. Now pick the numbers on the diagonal, $n_k:=n_k^k$. By construction, $(n_k^k,\ k\ge i)$ is a subsequence of each $(n_k^i,\ k\ge 1)$. Hence
\[
p_i^{(n_k)}\to p_i,\qquad\text{as } k\to\infty,\ \text{for each } i.
\]
The second result is Fatou's lemma, whose proof we omit⁷:

⁷See Brémaud (1999).

Lemma 11. If $(y_i,\ i\in\mathbb{N})$ and $(x_i^{(n)},\ i\in\mathbb{N})$, $n=1,2,\ldots$, are sequences of nonnegative numbers then
\[
\sum_{i=1}^{\infty}y_i\liminf_{n\to\infty}x_i^{(n)}\ \le\ \liminf_{n\to\infty}\sum_{i=1}^{\infty}y_ix_i^{(n)}.
\]
Proof of Theorem 19. Suppose that, for some $i$, the sequence $p_{ij}^{(n)}$ does not converge to 0. Then we can find a subsequence $n_k$ such that
\[
\lim_{k\to\infty}p_{ij}^{(n_k)}=\alpha_j>0.
\]
Consider the probability vector $(p_{ix}^{(n_k)},\ x\in S)$. By Lemma 10, we can pick a subsequence $(n_k')$ of $(n_k)$ such that
\[
\lim_{k\to\infty}p_{ix}^{(n_k')}=\alpha_x,\qquad\text{for all } x\in S.
\]
Since $\sum_{x\in S}p_{ix}^{(n_k')}=1$, Lemma 11 gives $\sum_{x\in S}\alpha_x\le 1$. But $\mathsf{P}^{n+1}=\mathsf{P}^n\mathsf{P}$, i.e.
\[
p_{ix}^{(n_k'+1)}=\sum_{y\in S}p_{iy}^{(n_k')}\,p_{yx},
\]
so, by Lemma 11 again,
\[
\lim_{k\to\infty}p_{ix}^{(n_k'+1)}\ \ge\ \sum_{y\in S}\lim_{k\to\infty}p_{iy}^{(n_k')}\,p_{yx},
\]
whence
\[
\alpha_x\ge\sum_{y\in S}\alpha_y\,p_{yx},\qquad x\in S.
\]
We wish to show that these inequalities are, actually, equalities. Suppose that one of them is not, i.e. that, for some $x_0\in S$, it is a strict inequality: $\alpha_{x_0}>\sum_{y\in S}\alpha_y\,p_{yx_0}$. This would imply that
\[
\sum_{x\in S}\alpha_x>\sum_{x\in S}\sum_{y\in S}\alpha_y\,p_{yx}=\sum_{y\in S}\alpha_y\sum_{x\in S}p_{yx}=\sum_{y\in S}\alpha_y,
\]
and this is a contradiction (the number $\sum_{x\in S}\alpha_x$ cannot be strictly larger than itself). Thus
\[
\alpha_x=\sum_{y\in S}\alpha_y\,p_{yx},\qquad\text{for all } x\in S.
\]
Since $\alpha_j>0$, we have $0<\sum_{x\in S}\alpha_x\le 1$, and so
\[
\pi(x):=\frac{\alpha_x}{\sum_{y\in S}\alpha_y},\qquad x\in S,
\]
satisfies the balance equations and has $\sum_{x\in S}\pi(x)=1$, i.e. $\pi$ is a stationary distribution. Now $\pi(j)>0$ because $\alpha_j>0$. By Corollary 13, state $j$ is positive recurrent: Contradiction.

19.2 The general case

It remains to see what the limit of $p_{ij}^{(n)}$ is when $j$ is a positive recurrent state, but $i$ and $j$ do not necessarily belong to the same communicating class. The result when the period of $j$ is larger than 1 is a bit awkward to formulate, so we will only look at the aperiodic case.

Theorem 20. Let $j$ be a positive recurrent state with period 1. Let $i$ be an arbitrary state. Let $C$ be the communicating class of $j$ and let $\pi^C$ be the stationary distribution associated with $C$. Let
\[
H_C:=\inf\{n\ge 0:\ X_n\in C\}\qquad\text{and}\qquad\varphi_C(i):=P_i(H_C<\infty).
\]
Then
\[
\lim_{n\to\infty}p_{ij}^{(n)}=\varphi_C(i)\,\pi^C(j).
\]
Proof. As in §10.1, we have
\[
p_{ij}^{(n)}=P_i(X_n=j)=P_i(X_n=j,\ H_C\le n)=\sum_{m=1}^{n}P_i(H_C=m)\,P_i(X_n=j\mid H_C=m).
\]
Let $\mu$ be the distribution of $X_{H_C}$ given that $X_0=i$ and that $\{H_C<\infty\}$. From the Strong Markov Property, we have
\[
P_i(X_n=j\mid H_C=m)=P_\mu(X_{n-m}=j).
\]
But Theorem 17 tells us that the limit exists regardless of the initial distribution:
\[
P_\mu(X_{n-m}=j)\to\pi^C(j),\qquad\text{as } n\to\infty.
\]
Hence
\[
\lim_{n\to\infty}p_{ij}^{(n)}=\pi^C(j)\sum_{m=1}^{\infty}P_i(H_C=m)=\pi^C(j)\,P_i(H_C<\infty),
\]
which is what we need.

Remark: We can write the result of Theorem 20 as
\[
\lim_{n\to\infty}p_{ij}^{(n)}=\frac{f_{ij}}{E_jT_j},
\]
where $T_j=\inf\{n\ge 1:\ X_n=j\}$ and $f_{ij}=P_i(T_j<\infty)$. Indeed, $\pi^C(j)=1/E_jT_j$, and $f_{ij}=\varphi_C(i)$ because, once the chain enters the recurrent class $C$, it hits $j$ eventually with probability one. In this form, the result can also be obtained by the so-called Abel's Lemma of Analysis, starting with the first hitting time decomposition relation of Lemma 7; this is a more old-fashioned way of getting the same result.

Example: Consider the following chain ($0<\alpha,\beta,\gamma<1$):

[Diagram: states 1, 2, 3, 4, 5. State 1 moves to 2 with probability 1; state 2 moves to 1 with probability $\gamma$ and stays at 2 with probability $1-\gamma$; state 3 moves to 2 with probability $1-\alpha$ and to 4 with probability $\alpha$; state 4 moves to 3 with probability $\beta$ and to 5 with probability $1-\beta$; state 5 is absorbing.]

The communicating classes are $I=\{3,4\}$ (inessential states), $C_1=\{1,2\}$ (closed class), $C_2=\{5\}$ (absorbing state). Both $C_1$ and $C_2$ are aperiodic classes. The stationary distributions associated with the two closed classes are:
\[
\pi^{C_1}(1)=\frac{\gamma}{1+\gamma},\qquad\pi^{C_1}(2)=\frac{1}{1+\gamma},\qquad\pi^{C_2}(5)=1.
\]
Let $\varphi(i)$ be the probability that class $C_1$ will be reached starting from state $i$. We have $\varphi(1)=\varphi(2)=1$, $\varphi(5)=0$, and, from first-step analysis,
\[
\varphi(3)=(1-\alpha)\varphi(2)+\alpha\varphi(4),\qquad\varphi(4)=(1-\beta)\varphi(5)+\beta\varphi(3),
\]
whence
\[
\varphi(3)=\frac{1-\alpha}{1-\alpha\beta},\qquad\varphi(4)=\frac{\beta(1-\alpha)}{1-\alpha\beta}.
\]
Hence, as $n\to\infty$,
\[
p_{31}^{(n)}\to\varphi(3)\pi^{C_1}(1),\quad p_{32}^{(n)}\to\varphi(3)\pi^{C_1}(2),\quad p_{33}^{(n)}\to 0,\quad p_{34}^{(n)}\to 0,\quad p_{35}^{(n)}\to(1-\varphi(3))\pi^{C_2}(5),
\]
\[
p_{41}^{(n)}\to\varphi(4)\pi^{C_1}(1),\quad p_{42}^{(n)}\to\varphi(4)\pi^{C_1}(2),\quad p_{43}^{(n)}\to 0,\quad p_{44}^{(n)}\to 0,\quad p_{45}^{(n)}\to(1-\varphi(4))\pi^{C_2}(5).
\]

20 Ergodic theorems

An ergodic theorem is a result about time-averages of random processes. The subject of Ergodic Theory has its origins in Physics. The ergodic hypothesis in physics says that the proportion of time that a particle spends in a particular region of its "phase space" is proportional to the volume of this region. (The phase, or state, space of a particle is not just a physical space; it includes, for example, information about its velocity.) Ergodic Theory is nowadays an area that combines many fields of Mathematics and Science, ranging from Mathematical Physics and Dynamical Systems to Probability and Statistics. Pioneers in the area were, among others, the 19th-century French mathematicians Henri Poincaré and Joseph Liouville.

We will avoid giving a precise definition of the term "ergodic", but only mention what constitutes a result in the area. Theorem 14 is a type of ergodic theorem. And so is the Law of Large Numbers itself (Theorem 13). We now generalise Theorem 14. Recall that, in Theorem 14, we assumed the existence of a positive recurrent state; here we assume recurrence only.

Theorem 21. Suppose that a Markov chain possesses a recurrent state $a\in S$ and starts from an initial state $X_0$ such that $P(T_a<\infty)=1$. Consider the quantity $\nu^{[a]}(x)$ defined in (15), i.e. the average number of visits to state $x$ between two successive visits to state $a$. Let $f$ be a reward function such that
\[
\sum_{x\in S}|f(x)|\,\nu^{[a]}(x)<\infty.\qquad (20)
\]

Then
\[
P\Bigg(\lim_{t\to\infty}\frac{\sum_{n=0}^{t}f(X_n)}{\sum_{n=0}^{t}1(X_n=a)}=\bar f\Bigg)=1,\qquad (21)
\]
where
\[
\bar f:=\sum_{x\in S}f(x)\,\nu^{[a]}(x).
\]
Proof. As in the proof of Theorem 14, we break the total reward as follows:
\[
\sum_{n=0}^{t}f(X_n)=\sum_{r=1}^{N_t-1}G_r+G^{\mathrm{first}}+G^{\mathrm{last}},
\]
where
\[
G_r=\sum_{T_a^{(r)}\le s<T_a^{(r+1)}}f(X_s),
\]
while $G^{\mathrm{first}}$, $G^{\mathrm{last}}$ are the total rewards over the first and last incomplete cycles, respectively; as in the proof of Theorem 14, these two terms do not contribute in the limit. The Law of Large Numbers tells us that
\[
\lim_{n\to\infty}\frac{1}{n}\sum_{r=1}^{n}G_r=EG_1,
\]
with probability 1, as long as $E|G_1|<\infty$; but this is precisely the condition (20). Note that the denominator $\sum_{n=0}^{t}1(X_n=a)$ equals the number $N_t$ of visits to state $a$ up to time $t$. Replacing $n$ by the subsequence $N_t$, and dividing the numerator and denominator of the fraction in (21) by $N_t$, we obtain the result. We only need to check that $EG_1=\bar f$, and this is left as an exercise.

Remark 1: The difference between Theorem 14 and Theorem 21 is that, in the former, we assumed positive recurrence of the ground state $a$; here, recurrence suffices, at the price of the integrability condition (20) and of normalising by the number of visits to $a$ rather than by $t$.

Remark 2: If the chain is irreducible and positive recurrent then the Ergodic Theorem of Theorem 14 follows from Theorem 21. Indeed, divide by $t$ the numerator and denominator of the fraction in (21): as we saw in the proof of Theorem 14, the denominator then equals $N_t/t$ and converges to $1/E_aT_a$, so the numerator converges to $\bar f/E_aT_a$, which is the quantity denoted by $\bar f$ in Theorem 14. The reader is asked to revisit the proof of Theorem 14. This also justifies the terminology of §18.5.

21 Finite chains

We take a brief look at some things that are specific to chains with finitely many states. Since the number of states is finite, at least one of the states must be visited infinitely often. So there are always recurrent states. From Proposition 2 we know that the balance equations
\[
\nu=\nu P
\]
have at least one solution, namely $\nu=\nu^{[a]}$ for a recurrent state $a$. Since the state space is finite,
\[
C:=\sum_{i\in S}\nu(i)<\infty.
\]

Hence
\[
\pi(i):=\frac{1}{C}\,\nu(i),\qquad i\in S,
\]
satisfies the balance equations and the normalisation condition $\sum_{i\in S}\pi(i)=1$. Therefore $\pi$ is a stationary distribution. At least some state $i$ must have $\pi(i)>0$; by Corollary 13, this state must be positive recurrent. In particular, a finite Markov chain always has positive recurrent states. If, in addition, the chain is irreducible, then all states are positive recurrent, because positive recurrence is a class property (Theorem 15), and, by Theorem 16, the stationary distribution is unique. We summarise the above discussion in:

Theorem 22. Consider an irreducible Markov chain with finitely many states. Then the chain is positive recurrent with a unique stationary distribution $\pi$. If, in addition, the chain is aperiodic, we have
\[
\mathsf{P}^n\to\Pi,\qquad\text{as } n\to\infty,
\]
where $\mathsf{P}$ is the transition matrix and $\Pi$ is a matrix all the rows of which are equal to $\pi$.

22 Time reversibility

Consider a Markov chain $(X_n,\ n\ge 0)$. We say that it is time-reversible, or, simply, reversible, if it has the same distribution when the arrow of time is reversed. More precisely, reversibility means that
\[
(X_0,X_1,\ldots,X_n)\ \text{has the same distribution as}\ (X_n,X_{n-1},\ldots,X_0),\qquad\text{for all } n.
\]
In other words, running the process backwards in time is, in a sense, like playing a film backwards. Another way to put this is by saying that observing any part of the process will not reveal any information about whether the process runs forwards or backwards in time. For example, if we film a standing man who raises his arms up and down successively, and run the film backwards, it will look the same, without our being able to tell that it is, actually, running backwards. But if we film a man who walks, and run the film backwards, then we instantly know that the film is played in reverse.

Lemma 12. A reversible Markov chain is stationary.

Proof. Taking $n=1$ in the definition of reversibility, we see that $(X_0,X_1)$ has the same distribution as $(X_1,X_0)$. In particular, the first components of the two vectors must have the same distribution, i.e. $X_1$ has the same distribution as $X_0$. Because the process is Markov, it follows that $X_n$ has the same distribution as $X_0$ for all $n$, and this means that it is stationary.

Theorem 23. Consider an irreducible Markov chain with transition probabilities $p_{ij}$ and stationary distribution $\pi$. Then the chain is reversible if and only if the detailed balance equations hold:
\[
\pi(i)p_{ij}=\pi(j)p_{ji},\qquad i,j\in S.
\]
Proof. Suppose first that the chain is reversible. Then it is stationary (Lemma 12), and $(X_0,X_1)$ has the same distribution as $(X_1,X_0)$, i.e.
\[
P(X_0=i,\ X_1=j)=P(X_0=j,\ X_1=i).
\]

But $P(X_0=i,\ X_1=j)=\pi(i)p_{ij}$ and $P(X_0=j,\ X_1=i)=\pi(j)p_{ji}$. So the detailed balance equations (DBE) hold.

Conversely, suppose the DBE hold. Since the chain is irreducible, we have $\pi(i)>0$ for all $i$. The DBE imply immediately that $(X_0,X_1)$ has the same distribution as $(X_1,X_0)$. We must show that $(X_0,X_1,\ldots,X_n)$ has the same distribution as $(X_n,X_{n-1},\ldots,X_0)$, for all $n$. To show, e.g., that $(X_0,X_1,X_2)$ has the same distribution as $(X_2,X_1,X_0)$, i.e. that
\[
P(X_0=i,\ X_1=j,\ X_2=k)=P(X_0=k,\ X_1=j,\ X_2=i)
\]
for all $i,j,k$, we write, using the DBE twice,
\[
\pi(i)p_{ij}p_{jk}=[\pi(j)p_{ji}]p_{jk}=[\pi(j)p_{jk}]p_{ji}=\pi(k)p_{kj}p_{ji}.
\]
The same logic works for any $n$ and can be worked out by induction. So the chain is reversible.

Theorem 24. Consider an aperiodic irreducible Markov chain with transition probabilities $p_{ij}$ and stationary distribution $\pi$. Then the chain is reversible if and only if Kolmogorov's loop criterion (KLC) holds: for all $i_1,i_2,\ldots,i_m\in S$ and all $m\ge 3$,
\[
p_{i_1i_2}p_{i_2i_3}\cdots p_{i_{m-1}i_m}p_{i_mi_1}=p_{i_1i_m}p_{i_mi_{m-1}}\cdots p_{i_3i_2}p_{i_2i_1}.\qquad (22)
\]
Proof. We will show the Kolmogorov loop criterion. Suppose first that the chain is reversible. Then the DBE hold. Pick 3 states $i,j,k$ and write, using the DBE,
\[
\pi(i)p_{ij}p_{jk}p_{ki}=[\pi(j)p_{ji}]p_{jk}p_{ki}=[\pi(j)p_{jk}]p_{ji}p_{ki}=[\pi(k)p_{kj}]p_{ji}p_{ki}=[\pi(k)p_{ki}]p_{ji}p_{kj}=[\pi(i)p_{ik}]p_{ji}p_{kj}=\pi(i)p_{ik}p_{kj}p_{ji}.
\]
Since the chain is irreducible, $\pi(i)>0$; hence, cancelling the $\pi(i)$ from the above display, we obtain the KLC for $m=3$. The same logic works for any $m$ and can be worked out by induction.

Conversely, suppose that the KLC (22) holds. Fix two states $i_1$, $i_2$ and sum (22) over all choices of the intermediate states $i_3,i_4,\ldots,i_m$; we obtain
\[
p_{i_1i_2}\,p_{i_2i_1}^{(m-1)}=p_{i_1i_2}^{(m-1)}\,p_{i_2i_1}.
\]
If the chain is aperiodic then, letting $m\to\infty$, we have
\[
p_{i_2i_1}^{(m-1)}\to\pi(i_1)\qquad\text{and}\qquad p_{i_1i_2}^{(m-1)}\to\pi(i_2),
\]
implying that $\pi(i_1)p_{i_1i_2}=\pi(i_2)p_{i_2i_1}$. These are the DBE, and the chain is reversible.

Notes: 1) A necessary condition for reversibility is that $p_{ij}>0$ if and only if $p_{ji}>0$; if there is an arrow from $i$ to $j$ in the graph of the chain but no arrow from $j$ to $i$, then the chain cannot be reversible. 2) The DBE are equivalent to saying that the ratio $p_{ij}/p_{ji}$ can be written as a product of a function of $i$ and a function of $j$; it is easy to show that, if the DBE hold, we have separation of variables:
\[
\frac{p_{ij}}{p_{ji}}=f(i)g(j).
\]

(Convention: we form this ratio only when $p_{ij}>0$.) Necessarily, $f(i)g(i)$ must be 1 (why?), so we have
\[
\frac{p_{ij}}{p_{ji}}=\frac{g(j)}{g(i)},\qquad\text{i.e.}\qquad g(i)p_{ij}=g(j)p_{ji}.
\]
Multiplying by $g(i)$ and summing over $j$ we find
\[
g(i)=\sum_{j\in S}g(j)p_{ji}.
\]
So $g$ satisfies the balance equations, the DBE hold with $\pi$ proportional to $g$, and hence $\pi$ can be found very easily.

This gives the following method for showing that a finite irreducible chain is reversible:

• Form the ratio $p_{ij}/p_{ji}$ and show that the variables separate.

• If the chain has infinitely many states, then we need, in addition, to guarantee, using another method, that the chain is positive recurrent.

Example 1: Chain with a tree-structure. Consider a positive recurrent irreducible Markov chain such that if $p_{ij}>0$ then $p_{ji}>0$. Form the graph by deleting all self-loops and by replacing, for each pair of distinct states $i,j$ for which $p_{ij}>0$, the two oriented edges from $i$ to $j$ and from $j$ to $i$ by a single edge without arrow. If the graph thus obtained is a tree, meaning that it has no loops, then the chain is reversible.

[Figure: an example chain; after deleting self-loops and merging opposite arrows into undirected edges, the resulting graph is a tree.]

The reason is as follows. Pick two distinct states $i,j$ for which $p_{ij}>0$. If we remove the link between them, the graph becomes disconnected. (If it didn't, there would be a loop.) Let $A$, $A^c$ be the sets in the two connected components. There is obviously only one arrow from $A$ to $A^c$, namely the arrow from $i$ to $j$, and the arrow from $j$ to $i$ is the only one from $A^c$ to $A$. The balance equations are written in the form $F(A,A^c)=F(A^c,A)$ (see §4.1), which gives
\[
\pi(i)p_{ij}=\pi(j)p_{ji},
\]
i.e. the detailed balance equations. Hence the original chain is reversible.
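A birth-death chain (only nearest-neighbour transitions) is a standard instance of Example 1, since its graph is a line, hence a tree. The sketch below (made-up transition probabilities, not from the notes) solves the balance equations numerically and confirms that the detailed balance equations hold, i.e. that the flow matrix $\pi(i)p_{ij}$ is symmetric.

```python
import numpy as np

# Hypothetical birth-death chain on {0,...,4}: its graph is a line (a tree).
P = np.array([[0.3, 0.7, 0.0, 0.0, 0.0],
              [0.4, 0.0, 0.6, 0.0, 0.0],
              [0.0, 0.5, 0.0, 0.5, 0.0],
              [0.0, 0.0, 0.6, 0.0, 0.4],
              [0.0, 0.0, 0.0, 0.8, 0.2]])

A = np.vstack([P.T - np.eye(5), np.ones(5)])
pi = np.linalg.lstsq(A, np.r_[np.zeros(5), 1.0], rcond=None)[0]   # pi = pi P

F = pi[:, None] * P              # flow matrix F(i, j) = pi(i) p_ij
print(np.allclose(F, F.T))       # symmetric <=> detailed balance <=> reversible
```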

Example 2: Random walk on a graph. Consider an undirected graph G with vertices S, a finite set, and a set E of undirected edges. Each edge connects a pair of distinct vertices (states) without orientation. States i and j are called neighbours if they are connected by an edge. For each state i define its degree

    δ(i) = number of neighbours of i.

Now define the following transition probabilities:

    pij = 1/δ(i), if j is a neighbour of i; pij = 0, otherwise.

The Markov chain with these transition probabilities is called a random walk on a graph (RWG). For example, in the graph shown: [figure: a graph on the states 0, 1, 2, 3, 4] here we have p12 = p10 = p14 = 1/3, p02 = 1/4, etc.

Suppose the graph is connected. The stationary distribution of a RWG is very easy to find. We claim that π(i) = Cδ(i), where C is a constant. To see this, let us verify that the detailed balance equations π(i)pij = π(j)pji hold. There are two cases. If i, j are neighbours, then π(i)pij = Cδ(i)·(1/δ(i)) = C and π(j)pji = Cδ(j)·(1/δ(j)) = C, so both members are equal to C. If i, j are not neighbours, then both sides are 0. In all cases, the DBE hold. The constant C is found by normalisation: Σ_{i∈S} π(i) = 1. Since Σ_{i∈S} δ(i) = 2|E|, where |E| is the total number of edges, we have C = 1/2|E|. Hence we conclude that:
1. A RWG is a reversible chain.
2. The stationary distribution assigns to a state a probability which is proportional to its degree.
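A quick numerical sanity check (a sketch; the small graph below is arbitrary and not the figure in the notes) that π(i) = δ(i)/2|E| does satisfy the balance equations:

    # Random walk on a small undirected graph: vertices 0..4.
    edges = [(0, 1), (0, 2), (1, 2), (1, 4), (2, 3)]
    S = range(5)
    nbrs = {i: set() for i in S}
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    deg = {i: len(nbrs[i]) for i in S}

    pi = {i: deg[i] / (2 * len(edges)) for i in S}   # claimed stationary law

    # Check the balance equations pi(j) = sum_i pi(i) p_ij with p_ij = 1/deg(i).
    for j in S:
        rhs = sum(pi[i] / deg[i] for i in S if j in nbrs[i])
        assert abs(pi[j] - rhs) < 1e-12
    print(pi)   # probabilities proportional to the degrees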

23 New Markov chains from old ones*

We now deal with the question: if we have a Markov chain (Xn), how can we create new ones?

23.1 Subsequences

The most obvious way to create a new Markov chain is by picking a subsequence, i.e. a sequence of integers (nk)_{k≥0} such that 0 ≤ n0 < n1 < n2 < · · ·, and letting Yk := X_{nk}, k = 0, 1, 2, .... Due to the Markov property of (Xn), the sequence (Yk, k = 0, 1, 2, ...) has the Markov property, but it is not, in general, time-homogeneous. If, however, the subsequence forms an arithmetic progression, i.e. if nk = n0 + km, k = 0, 1, 2, ..., then (Yk) has time-homogeneous transition probabilities:

    P(Yk+1 = j | Yk = i) = p^{(m)}_{ij},

where p^{(m)}_{ij} are the m-step transition probabilities of the original chain. In this case, if π is a stationary distribution for the original chain, then it is a stationary distribution for the new chain as well.

23.2 Watching a Markov chain when it visits a set

Suppose that the original chain is ergodic (i.e. irreducible and positive recurrent). Let A be a set of states and let T_A^{(r)} be the r-th visit to A. Define Yr := X_{T_A^{(r)}}, r = 1, 2, .... Due to the strong Markov property, this process is also a Markov chain, and it does have time-homogeneous transition probabilities. The state space of (Yr) is A. If π is the stationary distribution of the original chain, then π/π(A) is the stationary distribution of (Yr).

23.3 Subordination

Let (Xn, n ≥ 0) be Markov with transition probabilities pij. Let (St, t ≥ 0) be independent from (Xn) with S0 = 0 and St+1 = St + ξt, t ≥ 0, where the ξt are i.i.d. random variables with values in N and distribution λ(n) = P(ξ0 = n), n ∈ N.

Then Yt := X_{St}, t ≥ 0, is a Markov chain with

    P(Yt+1 = j | Yt = i) = Σ_{n=1}^∞ λ(n) p^{(n)}_{ij}.

23.4 Deterministic function of a Markov chain

If (Xn) is a Markov chain with state space S and if f : S → S′ is some function from S to some countable set S′, then the process Yn = f(Xn), n ≥ 0, is NOT, in general, a Markov chain. If, however, f is a one-to-one function, then (Yn) is a Markov chain: knowledge of f(Xn) implies knowledge of Xn, and so, conditional on Yn, the past is independent of the future. (Notation: we will denote the elements of S by i, j, ..., and the elements of S′ by α, β, ....)

More generally, suppose that P(Yn+1 = β | Xn = i) depends on i only through f(i), i.e. that there is a function Q : S′ × S′ → R such that

    P(Yn+1 = β | Xn = i) = Q(f(i), β).

Then we have:

Lemma 13. Under the above assumption, (Yn) is a Markov chain with transition probabilities Q(α, β).

Proof. We need to show that

    P(Yn+1 = β | Yn = α, Y_{n-1}) = P(Yn+1 = β | Yn = α),

for all choices of the variables involved. Fix α0, α1, ..., α_{n-1} ∈ S′ and let Y_{n-1} denote the event Y_{n-1} := {Y0 = α0, ..., Y_{n-1} = α_{n-1}}.

But

    P(Yn+1 = β | Yn = α, Y_{n-1})
      = Σ_{i∈S} P(Yn+1 = β | Xn = i, Yn = α, Y_{n-1}) P(Xn = i | Yn = α, Y_{n-1})
      = Σ_{i∈S} Q(f(i), β) P(Xn = i | Yn = α, Y_{n-1})
      = Σ_{i∈S} Q(f(i), β) P(Yn = α | Xn = i, Y_{n-1}) P(Xn = i | Y_{n-1}) / P(Yn = α | Y_{n-1})
      = Σ_{i∈S} Q(f(i), β) 1(α = f(i)) P(Xn = i | Y_{n-1}) / P(Yn = α | Y_{n-1})
      = Q(α, β) Σ_{i∈S} P(Yn = α | Xn = i, Y_{n-1}) P(Xn = i | Y_{n-1}) / P(Yn = α | Y_{n-1})
      = Q(α, β) P(Yn = α | Y_{n-1}) / P(Yn = α | Y_{n-1}) = Q(α, β),

and so (Yn) is, indeed, a Markov chain with transition probabilities Q(α, β).

Example: Consider the symmetric drunkard's random walk, P(Xn+1 = i ± 1 | Xn = i) = 1/2, i ∈ Z, and define Yn := |Xn|. Then (Yn) is a Markov chain. To see this, observe first that the function |·| is not one-to-one, so Lemma 13 is needed: we will show that P(Yn+1 = β | Xn = i), i ∈ Z, β ∈ Z+, depends on i only through |i|. Indeed, conditional on {Xn = i} with i ≠ 0, the absolute value of the next state will either increase by 1 or decrease by 1, each with probability 1/2, regardless of whether i > 0 or i < 0, because the probability of going up is the same as the probability of going down. And if i = 0, then |Xn+1| = 1 for sure. In other words, if we let

    Q(α, β) := 1/2, if β = α ± 1 and α > 0;  Q(0, 1) := 1;  Q(α, β) := 0, otherwise,

then P(Yn+1 = β | Xn = i) = Q(|i|, β), for all i ∈ Z and all β ∈ Z+. Thus (Yn) is Markov with transition probabilities Q(α, β).

24 Applications

24.1 Branching processes: a population growth application

We describe a basic model for population growth known as the Galton-Watson process. The idea is this: at each point of time n we have a number Xn of individuals, each of which gives birth to a random number of offspring, independently from one another and following the same distribution (and independently of the initial population size X0, a further assumption). The parent dies immediately upon giving birth. We let pk be the probability that an individual will bear k offspring, k = 0, 1, 2, .... Let ξk^{(n)} be the number of offspring of the k-th individual in the n-th generation. Then the size of the (n+1)-st generation is

    X_{n+1} = ξ1^{(n)} + ξ2^{(n)} + · · · + ξ_{Xn}^{(n)};

we find the population at time n + 1 by adding the numbers of offspring of all the individuals. If we let ξ^{(n)} = (ξ1^{(n)}, ξ2^{(n)}, ...), then the above equation is of the form X_{n+1} = f(Xn, ξ^{(n)}), and, since ξ^{(0)}, ξ^{(1)}, ... are i.i.d., (Xn, n ≥ 0) has the Markov property: it is a Markov chain with time-homogeneous transitions. Computing the transition probabilities pij = P(X_{n+1} = j | Xn = i) is, in general, a hard problem, so we will find a different method to proceed; we will not attempt to compute the pij nor draw the transition diagram, because both are futile.

But here is an interesting special case. Example: Suppose that p1 = p, p0 = 1 − p: each individual gives birth to one child with probability p, or has no children with probability 1 − p. If we know that Xn = i, then X_{n+1} is the number of successful births amongst i individuals, which has a binomial distribution:

    pij = C(i, j) p^j (1 − p)^{i-j}, 0 ≤ j ≤ i,

where C(i, j) denotes the binomial coefficient "i choose j". In this case, the population cannot grow: it will eventually reach 0, with probability 1. Hence all states i ≥ 1 are inessential, therefore transient, and 0 is an absorbing state.

Back to the general case. If p1 = P(ξ = 1) = 1, then Xn = X0 for all n, and this is a trivial case. So assume p1 ≠ 1; then P(ξ = 0) + P(ξ ≥ 2) > 0. One question of interest is whether the process will become extinct or not. We distinguish two cases.

Case I: If p0 = P(ξ = 0) = 0, then the population can only grow; the state 0 is irrelevant because it cannot be reached from anywhere.

Case II: If p0 = P(ξ = 0) > 0, then state 0 can be reached from any other state: indeed, if the current population is i, there is probability p0^i that everybody gives birth to zero offspring. But 0 is an absorbing state. Hence all states i ≥ 1 are transient. Therefore, with probability 1, if 0 is never reached then Xn → ∞; in other words, if the population does not become extinct, then it must grow to infinity.

We wish to compute the extinction probability ε(i) := Pi(D), where D is the event of 'death' or extinction:

    D := {Xn = 0 for some n ≥ 0}.

Obviously, ε(0) = 1. For i ≥ 1, when we start with X0 = i initial ancestors, each one behaves independently of the others. For each k = 1, ..., i, let Dk be the event that the branching process corresponding to the k-th initial ancestor dies. We then have D = D1 ∩ · · · ∩ Di, and P(Dk) = ε, where ε = P1(D), so P(D | X0 = i) = ε^i, i.e. ε(i) = ε^i. Therefore the problem reduces to the study of the branching process starting with X0 = 1.

Consider, then, a Galton-Watson process starting with X0 = 1 initial ancestor, and let ε be its extinction probability. Let

    φ(z) := E z^ξ = Σ_{k=0}^∞ z^k pk

be the probability generating function of the offspring random variable ξ, and let

    μ = Eξ = Σ_{k=1}^∞ k pk

be the mean number of offspring of an individual. We also let

    φn(z) := E z^{Xn} = Σ_{k=0}^∞ z^k P(Xn = k)

be the probability generating function of Xn. This function is of interest because it gives information about the extinction probability ε. Indeed, φn(0) = P(Xn = 0), and, since Xn = 0 implies Xm = 0 for all m ≥ n, we have

    lim_{n→∞} φn(0) = lim_{n→∞} P(Xm = 0 for all m ≥ n)
                    = P(there exists n such that Xm = 0 for all m ≥ n) = P(D) = ε.

The result is this:

Proposition 5. If μ ≤ 1, then ε = 1. If μ > 1, then ε < 1 is the unique solution of φ(z) = z in [0, 1).

Proof.

Note first that φ0(z) = E z^{X0} = z (recall X0 = 1) and φ1(z) = φ(z). We then have

    φ_{n+1}(z) = E z^{X_{n+1}} = E[ z^{ξ1^{(n)}} z^{ξ2^{(n)}} · · · z^{ξ_{Xn}^{(n)}} ]
               = Σ_{i=0}^∞ E[ z^{ξ1^{(n)}} · · · z^{ξi^{(n)}} 1(Xn = i) ]
               = Σ_{i=0}^∞ E[z^{ξ1^{(n)}}] · · · E[z^{ξi^{(n)}}] E[1(Xn = i)]
               = Σ_{i=0}^∞ φ(z)^i P(Xn = i) = E φ(z)^{Xn} = φn(φ(z)).

Hence φ2(z) = φ1(φ(z)) = φ(φ(z)), φ3(z) = φ2(φ(z)) = φ(φ(φ(z))) = φ(φ2(z)), and so on; by induction, φ_{n+1}(z) = φ(φn(z)). In particular,

    φ_{n+1}(0) = φ(φn(0)).

But φn(0) → ε as n → ∞, and, since φ is a continuous function, φ(φn(0)) → φ(ε); also φ_{n+1}(0) → ε. Therefore

    ε = φ(ε).

Now recall that φ is a convex increasing function on [0, 1] with φ(1) = 1, and notice that

    μ = dφ(z)/dz at z = 1

is the slope of φ at z = 1. If the slope μ at z = 1 is ≤ 1, then the graph of φ(z) lies above the diagonal (the function z), and so the equation z = φ(z) has no solutions other than z = 1. This means that ε = 1 (see the left figure below). On the other hand, if μ > 1, then the graph of φ(z) does intersect the diagonal at some point ε* < 1. All we need to show is that ε = ε*, i.e. that ε < 1. But 0 ≤ ε*, so φ(0) ≤ φ(ε*) = ε*; by induction, φn(0) ≤ ε* for all n, and since φn(0) → ε, we get ε ≤ ε* < 1. Since both ε and ε* satisfy z = φ(z), and since the latter has exactly two solutions in [0, 1] (namely z = ε* and z = 1), we conclude that ε = ε*. (See the right figure below.)

[Figure: the graph of φ(z) against the diagonal z on [0, 1]. Left panel: case μ ≤ 1, the only intersection is at z = 1, so ε = 1. Right panel: case μ > 1, the graph crosses the diagonal at ε < 1.]
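The relation φ_{n+1}(0) = φ(φn(0)) suggests a direct numerical method for ε: iterate φ starting from 0. A minimal sketch (the offspring distribution below is an invented example with μ = 1.25 > 1):

    # Offspring distribution p_k (hypothetical): p0 = 0.25, p1 = 0.25, p2 = 0.5.
    p = [0.25, 0.25, 0.5]

    def phi(z):
        """Probability generating function of the offspring distribution."""
        return sum(pk * z**k for k, pk in enumerate(p))

    eps = 0.0                 # phi_0(0) = P(X_0 = 0) = 0 when X_0 = 1
    for _ in range(200):      # eps <- phi(eps) converges monotonically to epsilon
        eps = phi(eps)
    print(eps)                # -> 0.5, the root of z = 0.25 + 0.25 z + 0.5 z^2 in [0,1)

For this example the quadratic z = φ(z) can be solved by hand, and its roots are 0.5 and 1, which confirms the iteration.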

24.2 The ALOHA protocol: a computer communications application

This is a protocol used in the early days of computer networking for having computer hosts communicate over a shared medium (a radio channel). The problem of communication over a common channel, in a simplified model, is that if more than one host attempts to transmit a packet at the same time, there is a collision and the transmission is unsuccessful; the transmissions must then be attempted anew. Since things happen fast in a computer network, there is no time to make a prior query whether a host has something to transmit or not. So, for a packet to go through, only one of the hosts should be transmitting at a given time. One obvious solution is to allocate time slots to hosts in a specific order, periodically: if there are, say, 3 hosts, then time slots 0, 1, 2, 3, 4, 5, 6, ... are allocated to hosts 1, 2, 3, 1, 2, 3, .... But this is not a good method: it is a waste of time to allocate a slot to a host that has nothing to transmit. Abandoning this idea, we let hosts transmit whenever they like, using the same frequency. (Nowadays, Ethernet has replaced Aloha.)

So, in a simplified model, assume that on a time slot there are x packets amongst all hosts that are waiting for transmission (they are 'backlogged'). During this time slot, assume there are a new packets that need to be transmitted, so there are x + a packets in total. Each of the backlogged or newly arrived packets is selected for transmission with (a small) probability p, independently. Let s be the number of selected packets. If s = 1, there is a successful transmission and, on the next time slot, there are x + a − 1 packets. If s = 0 or s ≥ 2 (a collision), the transmission is unsuccessful and, on the next time slot, there are x + a packets. So if we let Xn be the number of backlogged packets at the beginning of the n-th time slot, An the number of new packets in the same slot, and Sn the number of selected packets, then

    Xn+1 = max(Xn + An − 1, 0), if Sn = 1;  Xn+1 = Xn + An, otherwise.

We assume that the An are i.i.d. random variables with some common distribution:

    P(An = k) = λk, k ≥ 0,

where λk ≥ 0 and Σ_{k≥0} λk = 1. We also let μ := Σ_{k≥1} k λk be the expectation of An. The reason we take the maximum with 0 in the above equation is that, if Xn + An = 0, we should not subtract 1.

The process (Xn, n ≥ 0) is a Markov chain. Conditional on Xn and An (and all past states and arrivals), selections are made amongst the Xn + An packets, so Sn has a Binomial distribution with parameters Xn + An and p:

    P(Sn = k | Xn = x, An = a, X_{n-1}, A_{n-1}, ...) = C(x + a, k) p^k q^{x+a-k}, 0 ≤ k ≤ x + a,

where q = 1 − p. Let us find the transition probabilities. To compute p_{x,x-1} for x > 0, we think as follows: to have a reduction of x by 1, we must have no new packets arriving, and exactly one selection:

    p_{x,x-1} = P(An = 0, Sn = 1 | Xn = x) = λ0 x p q^{x-1}.

To compute p_{x,x+k} for k ≥ 1, we think like this: to have an increase by k, we either need exactly k new packets and no successful transmission (i.e. Sn should be 0 or ≥ 2), or we need k + 1 new packets and one selection (Sn = 1):

    p_{x,x+k} = λk (1 − (x + k) p q^{x+k-1}) + λ_{k+1} (x + k + 1) p q^{x+k}.

The probabilities p_{x,x} are computed by subtracting the sum of the above from 1.

The main result is:

Proposition 6. The Markov chain (Xn) is transient; the Aloha protocol is unstable.

Idea of proof. Proving this is beyond the scope of the lectures, but we mention that it is based on the so-called drift criterion, which says that if the expected change of Xn, conditional on Xn = x, is bounded below by a positive constant for all large x, then the chain is transient. We here restrict ourselves to the computation of this expected change, defined by

    Δ(x) := E[Xn+1 − Xn | Xn = x].

We have

    Δ(x) = (−1) p_{x,x-1} + Σ_{k≥1} k p_{x,x+k}
         = −λ0 x p q^{x-1} + Σ_{k≥1} k λk − Σ_{k≥1} k λk (x + k) p q^{x+k-1} + Σ_{k≥1} k λ_{k+1} (x + k + 1) p q^{x+k}
         = μ − q^x G(x),

where G(x) is a function of the form α + βx. Unless μ = 0 (which is a trivial case: no arrivals!), we have μ > 0. Since p > 0, we have q < 1, so q^x tends to 0 much faster than G(x) tends to ∞, and q^x G(x) → 0 as x → ∞. So we can make q^x G(x) as small as we like, for example ≤ μ/2, by choosing x sufficiently large. Therefore Δ(x) ≥ μ/2 for all large x, i.e. the drift is positive for all large x, and this proves transience.
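The positive drift for large x can be seen empirically. A Monte Carlo sketch (the Poisson arrival law and the parameter values are invented for illustration):

    import math, random

    def drift(x, p=0.1, lam=0.3, trials=100_000):
        """Estimate Delta(x) = E[X_{n+1} - X_n | X_n = x] for the ALOHA chain,
        with A_n ~ Poisson(lam)."""
        total = 0
        for _ in range(trials):
            # sample A_n ~ Poisson(lam) by inversion of the cdf
            u, k, cum = random.random(), 0, math.exp(-lam)
            while u > cum:
                k += 1
                cum += math.exp(-lam) * lam**k / math.factorial(k)
            a = k
            s = sum(random.random() < p for _ in range(x + a))  # selected packets
            x_next = max(x + a - 1, 0) if s == 1 else x + a
            total += x_next - x
        return total / trials

    for x in (0, 5, 20, 80):
        print(x, round(drift(x), 3))   # approaches mu = lam = 0.3 as x grows

For moderate x the chance of a lone selection is still appreciable and the drift can be negative, but for large x the success probability (x + a) p q^{x+a-1} is negligible and the estimate settles near μ, as the computation above predicts.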

24.3 PageRank (trademark of Google): A World Wide Web application

PageRank is an algorithm used by the popular search engine Google in order to assign importance to a web page. The web graph is a graph with a set of vertices V consisting of web pages, and edges representing hyperlinks: if page i has a link to page j, then we put an arrow from i to j and consider it a directed edge of the graph. For each page i, let L(i) be the set of pages it links to; if L(i) = ∅, then i is a 'dangling page'. Define transition probabilities as follows:

    pij = 1/|L(i)|, if j ∈ L(i);  pij = 1/|V|, if L(i) = ∅;  pij = 0, otherwise.

So if Xn = i is the current location of the chain, the next location Xn+1 is picked uniformly at random amongst all pages that i links to, unless i is a dangling page, in which case the chain moves to a random page.

It is not clear that this chain is irreducible, nor that it is aperiodic. So we make the following modification: we pick α between 0 and 1 (typically, α ≈ 0.2) and let

    p'ij := (1 − α) pij + α/|V|.

These are new transition probabilities. The interpretation is that of a 'bored surfer': a surfer moves according to the original Markov chain, unless he gets bored (this happens with probability α), in which case he clicks on a random page.

So this is how Google assigns ranks to pages. It uses an algorithm, PageRank, to compute the stationary distribution π corresponding to the Markov chain with transition matrix P = [p'ij]:

    π = πP.

Then it says: page i has higher rank than page j if π(i) > π(j). The algorithm starts with an initial guess π^(0) for π and updates it by

    π^(k) = π^(k-1) P.

Comments:
1. Its implementation, because of the size of the problem, is one of the largest matrix computations in the world; it uses advanced techniques from linear algebra to speed up computations.
2. In an Internet-dominated society, the rank of a page may be very important for a company's profits. Therefore, there are companies that sell high-rank pages: all you need to do to get high rank is pay (a lot of money). There is also a technique, known as '302 Google jacking', that allows someone to manipulate the PageRank of a web page and make it much higher.
3. A new use of PageRank is to rank academic doctoral programs based on their records of finding jobs for their graduates. Academic departments "link" to each other by hiring their
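A toy version of the update π^(k) = π^(k-1) P is easy to write down. The sketch below (the 4-page link structure is invented) applies the power method to the modified matrix P':

    # Power-method sketch of PageRank on a toy 4-page web.
    links = {0: [1, 2], 1: [2], 2: [0], 3: []}   # page 3 is a dangling page
    V = len(links)
    alpha = 0.2

    def step(pi):
        """One update pi <- pi P', with P' = (1 - alpha) P + alpha / |V|."""
        new = [alpha / V] * V                     # 'bored surfer' part (sum(pi) = 1)
        for i, pi_i in enumerate(pi):
            targets = links[i] if links[i] else range(V)   # dangling -> uniform
            w = (1 - alpha) * pi_i / len(targets)
            for j in targets:
                new[j] += w
        return new

    pi = [1.0 / V] * V
    for _ in range(100):
        pi = step(pi)
    print([round(x, 4) for x in pi])              # the ranks; they sum to 1

Each step preserves the total mass 1, and the iterates converge to the stationary distribution of the bored-surfer chain, which is exactly the vector of ranks.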

faculty from each other (and from themselves).
4. Lazy university administrators use PageRank-like systems for assigning a single number to a researcher's output. This number could be independent of the research quality, but, from the point of view of an administrator, it makes the evaluation procedure easy: what is important is that the evaluation can be done without ever looking at the content of one's research.
5. Needless to say, these uses of PageRank and PageRank-like systems represent corruption at the deepest level. But it is a kind of corruption that is hard to trace, since it is unanimously equated with modern use of Information Technology.

24.4 The Wright-Fisher model: an example from biology

In an organism, a gene controlling a specific characteristic (e.g. colour of eyes) appears in two forms, a dominant (say A) and a recessive (say a). The alleles in an individual come in pairs and express the individual's genotype for that gene. An individual can be of type AA (meaning that all its alleles are A), or aa, or Aa. This means that the external appearance of a characteristic is a function of the pair of alleles. When two individuals mate, their offspring inherits an allele from each parent. Thus two AA individuals will produce an AA child, but an AA mating with an Aa may produce a child of type AA or one of type Aa.

Imagine that we have a population of N individuals who mate at random, producing a number of offspring. We will assume that a child chooses an allele from each parent at random, and that the parents are replaced by the children. We would like to study the genetic diversity, meaning how many alleles of each type we have, generation after generation. Since we don't care to keep track of the individuals' identities, we will assume that, in each generation, the genetic diversity depends only on the total number of alleles of each type. So let x be the number of alleles of type A (so 2N − x is the number of those of type a). Then, at the next generation, the number of alleles of type A can be found as follows: pick an allele at random; this allele is of type A with probability x/2N; maintain this allele for the next generation; do the same thing 2N times. The process repeats.

[Figure: in this figure there are 4 alleles of type A (circles) and 6 of type a (squares). One of the circle alleles is selected twice; one of the square alleles is selected 4 times; three of the square alleles are never selected; and so on. Alleles that are never picked may belong to individuals who never mate, or to individuals who mate but whose offspring did not inherit this allele. All we need is the information outlined in this figure.]

We are now ready to define the Markov chain Xn representing the number of alleles of type A in generation n. We have a transition from x to y if y alleles of type A were produced.

Each allele of type A is selected with probability x/2N. Since selections are independent, we see that

    p_{xy} = C(2N, y) (x/2N)^y (1 − x/2N)^{2N-y}, 0 ≤ y ≤ 2N.

Notice that there is a chance (= (1 − x/2N)^{2N}) that no allele of type A is selected at all, in which case the population becomes one consisting of alleles of type a only; similarly, there is a chance (= (x/2N)^{2N}) that only alleles of type A are selected, in which case the population becomes one consisting of alleles of type A only. Clearly, p00 = p_{2N,2N} = 1, so both 0 and 2N are absorbing states. Since p_{xy} > 0 for all 1 ≤ x, y ≤ 2N − 1, the states {1, ..., 2N − 1} form a communicating class of inessential states. Therefore

    P(Xn = 0 or Xn = 2N for all large n) = 1,

i.e. the chain gets absorbed by either 0 or 2N. Let M = 2N. (The fact that M is even is irrelevant for the mathematical model.) With, as usual, Tx := inf{n ≥ 1 : Xn = x}, we would like to know the probability

    φ(x) = Px(T0 > TM) = P(T0 > TM | X0 = x)

that the chain will eventually be absorbed by state M.

Another way to represent the chain is as follows:

    Xn+1 = ξ1^n + · · · + ξM^n,

where, conditionally on (X0, ..., Xn), the random variables ξ1^n, ..., ξM^n are i.i.d. with

    P(ξk^n = 1 | Xn, ..., X0) = Xn/M,  P(ξk^n = 0 | Xn, ..., X0) = 1 − Xn/M.

Hence

    E(Xn+1 | Xn, ..., X0) = E(Xn+1 | Xn) = M (Xn/M) = Xn,

which implies EXn+1 = EXn for all n ≥ 0: on the average, the expectation doesn't change. One can argue that, since, at "the end of time", the chain (started from x) will be either M with probability φ(x) or 0 with probability 1 − φ(x), and since, on the average, the value stays x, we should have

    M φ(x) + 0 (1 − φ(x)) = x, and thus φ(x) = x/M.

The result is correct, but the argument needs further justification, because "the end of time" is not a deterministic time. We show that our guess is correct by proceeding via the mundane way of verifying that it satisfies the first-step analysis equations. These are:

    φ(0) = 0, φ(M) = 1, φ(x) = Σ_{y=0}^M p_{xy} φ(y).

The first two are obvious. For the third, the right-hand side, with φ(y) = y/M, equals

    Σ_{y=0}^M C(M, y) (x/M)^y (1 − x/M)^{M-y} (y/M),

which is 1/M times the mean of a Binomial(M, x/M) random variable, i.e. (1/M) M (x/M) = x/M, as needed. (The same conclusion can be reached by shifting the summation index, writing y C(M, y) = M C(M − 1, y − 1).)
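A Monte Carlo sketch (parameters invented; not from the notes) that checks the absorption probability φ(x) = x/M against simulation:

    import random

    def absorbed_at_M(x, M=20, sims=10_000):
        """Estimate phi(x) = P_x(chain absorbed at M) for the Wright-Fisher
        chain with M = 2N alleles."""
        hits = 0
        for _ in range(sims):
            X = x
            while 0 < X < M:
                # X_{n+1} ~ Binomial(M, X/M)
                X = sum(random.random() < X / M for _ in range(M))
            hits += (X == M)
        return hits / sims

    M = 20
    for x in (1, 5, 10, 15):
        print(x, absorbed_at_M(x, M), x / M)   # estimate vs. exact x/M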

24.5 A storage or queueing application

A storage facility stores, say, Xn electronic components at time n. A number An of new components are ordered and added to the stock, while a number of them are sold to the market. If the demand is for Dn components, the demand is met instantaneously, provided that enough components are available; otherwise, the demand is only partially satisfied, and the stock reaches level zero at time n + 1. We can express these requirements by the recursion

    Xn+1 = (Xn + An − Dn) ∨ 0,

where a ∨ b := max(a, b). Here are our assumptions: the pairs of random variables ((An, Dn), n ≥ 0) are i.i.d., and

    E(An − Dn) = −μ < 0,

i.e., on the average, the demand is larger than the supply per unit of time. Let ξn := An − Dn, so that Xn+1 = (Xn + ξn) ∨ 0, for all n ≥ 0. We have that (Xn, n ≥ 0) is a Markov chain. We will show that this Markov chain is positive recurrent, applying the so-called Loynes' method.

Since the Markov chain is represented by this very simple recursion, we will try to solve the recursion. Working for n = 1, 2, ..., we find

    X1 = (X0 + ξ0) ∨ 0
    X2 = (X1 + ξ1) ∨ 0 = (X0 + ξ0 + ξ1) ∨ ξ1 ∨ 0
    X3 = (X2 + ξ2) ∨ 0 = (X0 + ξ0 + ξ1 + ξ2) ∨ (ξ1 + ξ2) ∨ ξ2 ∨ 0
    · · ·
    Xn = (X0 + ξ0 + · · · + ξn-1) ∨ (ξ1 + · · · + ξn-1) ∨ · · · ∨ ξn-1 ∨ 0.

The correctness of the last formula can formally be proved by showing that it satisfies the recursion. Let us consider

    g_{xy}^{(n)} := Px(Xn > y) = P( (x + ξ0 + · · · + ξn-1) ∨ (ξ1 + · · · + ξn-1) ∨ · · · ∨ ξn-1 ∨ 0 > y ).

Notice that the event whose probability we want to compute is a function of (ξ0, ..., ξn-1). The random variables (ξn) are i.i.d., so we can shuffle them in any way we like without altering their joint distribution; in particular, instead of considering them in their original order, we may reverse it:

    (ξ0, ..., ξn-1) has the same distribution as (ξn-1, ..., ξ0).

Therefore,

    g_{xy}^{(n)} = P( (x + ξn-1 + · · · + ξ0) ∨ (ξn-2 + · · · + ξ0) ∨ · · · ∨ ξ0 ∨ 0 > y ).

Let

    Zn := ξ0 + · · · + ξn-1,  Mn := Zn ∨ Zn-1 ∨ · · · ∨ Z1 ∨ 0.

Then, for x = 0, g_{0y}^{(n)} = P(Mn > y). Since Mn+1 ≥ Mn, we have g_{0y}^{(n)} ≤ g_{0y}^{(n+1)}, and the limit of a bounded increasing sequence certainly exists:

    g(y) := lim_{n→∞} g_{0y}^{(n)} = lim_{n→∞} P0(Xn > y) = lim_{n→∞} P(Mn > y).

We claim that π(y) := g(y − 1) − g(y) (with g(−1) := 1) satisfies the balance equations. Indeed,

    Mn+1 = 0 ∨ max_{1≤j≤n+1} Zj = 0 ∨ ( max_{1≤j≤n+1} (Zj − Z1) + ξ0 ),

which has the same distribution as 0 ∨ (Mn + ξ0), where the latter follows from the i.i.d. property again. Hence, for y ≥ 0,

    P(Mn+1 = y) = P(0 ∨ (Mn + ξ0) = y) = Σ_{x≥0} P(Mn = x) P(0 ∨ (x + ξ0) = y) = Σ_{x≥0} P(Mn = x) p_{xy}.

Since the n-dependent terms are bounded, we may take the limit of both sides as n → ∞, using π(y) = lim_{n→∞} P(Mn = y), and we find

    π(y) = Σ_{x≥0} π(x) p_{xy}.

So π satisfies the balance equations. We must also make sure that it satisfies Σ_y π(y) = 1; this will be the case if g(y) is not identically equal to 1. Here we use the assumption that Eξn = −μ < 0. From the Law of Large Numbers we have

    P( lim_{n→∞} Zn/n = −μ ) = 1,

which implies that P( lim_{n→∞} Zn = −∞ ) = 1. But then

    P( lim_{n→∞} Mn = 0 ∨ max{Z1, Z2, ...} < ∞ ) = 1.

Hence g(y) = P(0 ∨ max{Z1, Z2, ...} > y) is not identically equal to 1, as required, and the chain is positive recurrent with stationary distribution π.

The case Eξn = 0 is delicate: depending on the actual distribution of ξn, the Markov chain may be transient or null recurrent. It can also be shown (left as an exercise) that if Eξn > 0, then the Markov chain is transient.
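Loynes' construction is easy to simulate. A sketch (the distribution of ξn below is an invented example with Eξ = −0.5): the law of Mn = 0 ∨ max(Z1, ..., Zn) stabilises for large n, and its limit is the stationary distribution of the stock chain.

    import random

    def xi():
        """xi_n = A_n - D_n: A_n uniform on {0,1,2}, D_n uniform on {0,...,3},
        so E xi = 1 - 1.5 = -0.5 < 0 (a hypothetical choice)."""
        return random.randint(0, 2) - random.randint(0, 3)

    n, sims = 200, 10_000
    counts = {}
    for _ in range(sims):
        Z, M = 0, 0
        for _ in range(n):
            Z += xi()
            M = max(M, Z)               # M_n = 0 v max(Z_1, ..., Z_n)
        counts[M] = counts.get(M, 0) + 1

    # empirical estimate of pi(y) = g(y-1) - g(y) for small y
    print({y: counts.get(y, 0) / sims for y in range(6)})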

PART II: RANDOM WALKS

25 Random walk

The purpose of this part of the notes is to take a closer look at a special kind of Markov chain known as random walk. Our Markov chains, so far, have been processes with time-homogeneous transition probabilities. A random walk possesses an additional property, besides time-homogeneity, namely spatial homogeneity: the transition probability pxy should depend on x and y only through their relative positions in space. To be able to say this, we need to give more structure to the state space S, a structure that enables us to say that p_{x,y} = p_{x+z,y+z} for any translation z. More general spaces (e.g. groups) are allowed, but, here, we won't go beyond Zd, the set of d-dimensional vectors with integer coordinates. So, a random walk is a Markov chain with time- and space-homogeneous transition probabilities given by

    p_{x,y} = p(y − x), x, y ∈ Zd,

for a given function p(x) ≥ 0 with Σ_x p(x) = 1.

Example 1: Let p(x) = C 2^{-|x1|-···-|xd|}, where C is such that Σ_x p(x) = 1.

Example 2: Define

    p(x) = pj, if x = ej;  p(x) = qj, if x = −ej (j = 1, ..., d);  p(x) = 0, otherwise,

where ej is the vector that has 1 in the j-th position and 0 everywhere else, and p1 + · · · + pd + q1 + · · · + qd = 1. A random walk of this type is called simple random walk or nearest neighbour random walk. It has the skip-free property, namely that to go from state x to state y it must pass through all intermediate states, because its value can change by at most 1 (in one coordinate) at each step. If d = 1, then p(1) = p, p(−1) = q, p(x) = 0 otherwise, where p + q = 1. This is the drunkard's walk we've seen before.

Example 3: Define

    p(x) = 1/2d, if x ∈ {±e1, ..., ±ed};  p(x) = 0, otherwise.

We shall resist the temptation to compute and look at other 77 . Let us denote by Sn the state of the walk at time n. x ∈ Zd is deﬁned as a new probability p ∗ p′ given by (p ∗ p′ )(x) = The reader can easily check that p ∗ p′ = p′ ∗ p. then p(1) = p(−1) = 1/2. It is clear that Sn can be represented as Sn = S0 + ξ1 + · · · + ξn . . we will mean a sum over all possible values. Indeed.i. has equal probabilities to move from one state x to one of its “neighbouring states” x ± e1 .) Now the operation in the last term is known as convolution. we can reduce several questions to questions about this random walk. in general. p(x) = 0. in addition. .) In this notation. We refer to ξn as the increments of the random walk. because all we have done is that we dressed the original problem in more fancy clothes. p(n) = Px (Sn = y) = P (x + ξ1 + · · · + ξn = y) = P (ξ1 + · · · + ξn = y − x). So. p′ (x). (When we write a sum without indicating explicitly the range of the summation variable. If d = 1. p ∗ (p′ ∗ p′′ ) = (p ∗ p′ ) ∗ p′′ . p ∗ δz = p. Furthermore. the initial state S0 is independent of the increments. . where ξ1 . x ∈ Zd . Furthermore. . The special structure of a random walk enables us to give more detailed formulae for its behaviour. The convolution of two probabilities p(x). and this is the symmetric drunkard’s walk. x ± ed . We will let p∗n be the convolution of p with itself n times. Obviously. computing the n-th order convolution for all n is a notoriously hard problem. notice that P (ξ1 + ξ2 = x) = y P (ξ1 = x.d.This is a simple random walk. otherwise. y p(x)p′ (y − x). But this would be nonsense. We call it simple symmetric random walk. ξ2 . (Recall that δz is a probability distribution that assigns probability 1 to the point z. One could stop here and say that we have a formula for the n-step transition probability. x + ξ2 = y) = y p(x)p(y − x). we have P (ξ1 + ξ2 = x) = (p ∗ p)(x). xy The random walk ξ1 + · · · + ξn is a random walk starting from 0. are i. We easily see that P (ξ1 + · · · + ξn = x) = p∗n (x). which. . random variables in Zd with common distribution P (ξn = x) = p(x). . .

methods. After all, who cares about the exact value of p∗n(x) for all n and x? Perhaps it is the case (a) that different questions need to be asked, or (b) that approximate methods are more informative than exact ones. Here are a couple of meaningful questions:

A. Will the random walk ever return to where it started from?
B. If yes, how long will that take?
C. If not, where will it go?

26 The simple symmetric random walk in dimension 1: path counting

This is a random walk with values in Z and transition probabilities p_{i,i+1} = p_{i,i-1} = 1/2 for all i ∈ Z. When the walk starts from 0 we can write

    Sn = ξ1 + · · · + ξn,

where the ξn are i.i.d. random variables (random signs) with P(ξn = ±1) = 1/2.

In the sequel, we shall learn some counting methods. First, as a warm-up, look at the distribution of (ξ1, ..., ξn), a random vector with values in {−1, +1}^n. Clearly, all possible values of the vector are equally likely: for fixed n,

    P(ξ1 = ε1, ..., ξn = εn) = 2^{-n}, for every (ε1, ..., εn) ∈ {−1, +1}^n.

In other words, we are dealing with a uniform distribution on the sample space {−1, +1}^n. Events A that depend only on these first n random signs are subsets of this sample space, A ⊂ {−1, +1}^n, and, since there are 2^n elements in {−1, +1}^n,

    P(A) = #A / 2^n,

where #A is the number of elements of A. So, if we can count the number of elements of A, we can compute its probability. The principle is trivial, but its practice may be hard.

We can thus think of the elements of the sample space as paths: to each initial position and to each sequence of n signs we assign a path of length n. For example, if S0 = 0 and the sequence of signs is (+1, +1, −1, +1, −1), we have a path in the plane that joins the points

    (0, 0) → (1, 1) → (2, 2) → (3, 1) → (4, 2) → (5, 1).

[Figure: the path described above.]

As a warm-up, consider the following:

Example: Assume that S0 = 0, and compute the probability of the event

    A = {S2 ≥ 0, S4 = −1, S6 < 0, S7 < 0}.

The paths comprising this event are shown below. [Figure: the six paths of length 7 belonging to A.] We can easily count that there are 6 paths in A. So

    P(A) = 6/2^7 = 6/128 = 3/64 ≈ 0.047.

26.1 Paths that start and end at specific points

Now consider paths that start from a specific point and end up at a specific point. We use the notation

    {(m, x) → (n, y)} := {all paths that start at (m, x) and end up at (n, y)}.

[Figure: the lightly shaded region contains all paths of length n that start at (0, 0) (there are 2^n of them); the darker region shows all paths that start at (0, 0) and end up at the specific point (n, i).]

Let us first show:

Lemma 14. The number of paths that start from (0, 0) and end up at (n, i) is given by

    #{(0, 0) → (n, i)} = C(n, (n + i)/2),

where C(n, k) denotes the binomial coefficient "n choose k". More generally, the number of paths that start from (m, x) and end up at (n, y) is given by

    #{(m, x) → (n, y)} = C(n − m, (n − m + y − x)/2).    (23)

Remark. If n + i is not an even number, then there is no path that starts from (0, 0) and ends at (n, i), and the binomial coefficient is interpreted as 0.

Proof.

Consider a path from (0, 0) to (n, i). Let u be the number of +'s in this path (the number of upward segments) and d the number of −'s (the number of downward segments). Then

    u + d = n, u − d = i.

Hence u = (n + i)/2, d = (n − i)/2. So if we know where the path starts from and where it ends, we know how many times it has gone up and how many down. The question then becomes: given a sequence of n signs out of which u are + and the rest are −, in how many ways can we arrange them? Well, in exactly C(n, u) ways, and this proves the formula. Notice that u = (n + i)/2 must be an integer. This is true if n + i is even or, equivalently, if n and i are simultaneously even or simultaneously odd. If this is not the case, then there is no path that can go from (0, 0) to (n, i). For example (look at the figure), there is no path that goes from (0, 0) to (3, 2). We thus define C(n, u) to be zero if u is not an integer. More generally, the number of paths that start from (m, x) and end up at (n, y) equals the number of paths that start from (0, 0) and end up at (n − m, y − x). So, applying the previous formula, we obtain (23).
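Formula (23) can be verified by exhaustive enumeration for small n. A brute-force sketch (not from the notes):

    from itertools import product
    from math import comb

    def count_paths(m, x, n, y):
        """Brute-force count of +-1 paths from (m, x) to (n, y)."""
        return sum(x + sum(eps) == y for eps in product((-1, 1), repeat=n - m))

    def formula23(m, x, n, y):
        """C(n-m, (n-m+y-x)/2), read as 0 when the index is not a valid integer."""
        t = n - m + y - x
        return comb(n - m, t // 2) if t % 2 == 0 and 0 <= t // 2 <= n - m else 0

    for (m, x, n, y) in [(0, 0, 6, 2), (1, 1, 7, 3), (2, -1, 8, 1), (0, 0, 3, 2)]:
        assert count_paths(m, x, n, y) == formula23(m, x, n, y)
    print("formula (23) agrees with brute force")

The last test case, (0, 0) to (3, 2), exercises the parity convention: both sides are 0.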

26.2 Paths that start and end at specific points and do not touch zero at all

Lemma 15. Suppose that x, y > 0. The number of paths (m, x) → (n, y) that never become zero (i.e. they do not touch the horizontal axis) is given by:

    #{(m, x) → (n, y); remain > 0} = C(n − m, (n − m + y − x)/2) − C(n − m, (n − m − y − x)/2).    (24)

Proof. [Figure: the shaded region contains all possible paths that start at (m, x) and end up at the specific point (n, y) and, in addition, never touch the horizontal axis.]

Consider, instead, a path that starts at (m, x) and ends up at point (n, y) which DOES touch the horizontal axis, like the path indicated by the solid line below. There is a trick, called reflection, which is useful here. Note that to each such path there corresponds another path obtained by following it up to the first time it hits zero and then reflecting it around the horizontal axis. In the figure below, follow the solid path starting at (m, x) up to the first time it hits zero and then follow the dotted path: this is the reflected path, ending at (n, −y).

[Figure: a path from (m, x) to (n, y) and its reflection around a horizontal line, ending at (n, −y). Notice that the reflection starts only at the point where the path touches the horizontal line.]

Note that if we know the reflected path we can recover the original path. Hence we may as well count the number of reflected paths. But each reflected path starts from (m, x) and ends at (n, −y), and so it necessarily crosses the horizontal axis. We thus apply (23) with −y in place of y:

    #{(m, x) → (n, y); touch zero} = #{(m, x) → (n, −y)} = C(n − m, (n − m − y − x)/2).

So, for x > 0, y > 0,

    #{(m, x) → (n, y); remain > 0} = #{(m, x) → (n, y)} − #{(m, x) → (n, y); touch zero}
                                   = C(n − m, (n − m + y − x)/2) − C(n − m, (n − m − y − x)/2).
Corollary 14. For any y > 0,

    #{(1, 1) → (n, y); remain > 0} = (y/n) #{(0, 0) → (n, y)}.    (25)

Proof. Let m = 1, x = 1 in (24):

    #{(1, 1) → (n, y); remain > 0} = C(n − 1, (n + y)/2 − 1) − C(n − 1, (n − y)/2 − 1)
        = (n − 1)! / [ ((n+y)/2 − 1)! ((n−y)/2)! ] − (n − 1)! / [ ((n+y)/2)! ((n−y)/2 − 1)! ]
        = ( (n+y)/2n ) C(n, (n+y)/2) − ( (n−y)/2n ) C(n, (n+y)/2)
        = (y/n) C(n, (n+y)/2),

and the latter binomial coefficient equals the total number of paths from (0, 0) to (n, y).
Corollary 15. The number of paths from (m, x) to (n, y) that do not touch level z < min(x, y) is given by:

    #{(m, x) → (n, y); remain > z} = C(n − m, (n − m + y − x)/2) − C(n − m, (n − m − y − x)/2 + z).    (26)

Proof. Because it is only the relative position of x and y with respect to z that plays any role, we have

    #{(m, x) → (n, y); remain > z} = #{(m, x − z) → (n, y − z); remain > 0},

and so the formula is obtained by an application of (24).

Corollary 16. The number of paths from (0, 0) to (n, y), where y ≥ 0, that remain ≥ 0 is given by:

    #{(0, 0) → (n, y); remain ≥ 0} = C(n, (n + y)/2) − C(n, (n + y)/2 + 1).    (27)

Proof.

    #{(0, 0) → (n, y); remain ≥ 0} = #{(0, 0) → (n, y); remain > −1}
        = #{(0, 1) → (n, y + 1); remain > 0}
        = C(n, (n + y)/2) − C(n, (n − y)/2 − 1)
        = C(n, (n + y)/2) − C(n, (n + y)/2 + 1),

where we used (26) with z = −1 and, in the last step, the symmetry C(n, k) = C(n, n − k).

Corollary 17 (Catalan numbers). The number of paths from (0, 0) to (n, 0) that remain ≥ 0 is given by:

    #{(0, 0) → (n, 0); remain ≥ 0} = (1/(n + 1)) C(n + 1, n/2).

Proof. Apply (27) with y = 0:

    #{(0, 0) → (n, 0); remain ≥ 0} = C(n, n/2) − C(n, n/2 + 1) = (1/(n + 1)) C(n + 1, n/2),

where the latter calculation is as in the proof of (25).

Corollary 18. The number of paths that start from (0, 0), have length n, and remain nonnegative is given by:

    #{(0, 0) → (n, •); remain ≥ 0} = C(n, ⌈n/2⌉),    (28)

where ⌈x⌉ denotes the smallest integer that is ≥ x.

Proof. To see this, just use (27) and sum over all y ≥ 0. If n = 2m, then we need to set y = 2z in (27) (otherwise (27) is just zero), and so we have

    #{(0, 0) → (2m, •); remain ≥ 0} = Σ_{z≥0} [ C(2m, m + z) − C(2m, m + z + 1) ]
        = C(2m, m) − C(2m, m+1) + C(2m, m+1) − C(2m, m+2) + · · · + C(2m, 2m−1) − C(2m, 2m) + C(2m, 2m)
        = C(2m, m) = C(n, n/2),

because the sum telescopes: all the intermediate terms cancel with one another, and only the first one remains. If n = 2m + 1, then we need to set y = 2z + 1 in (27) (otherwise (27) is just zero), and so, by the same telescoping,

    #{(0, 0) → (2m + 1, •); remain ≥ 0} = Σ_{z≥0} [ C(2m+1, m + z + 1) − C(2m+1, m + z + 2) ]
        = C(2m + 1, m + 1) = C(n, ⌈n/2⌉).

We have shown that the two shaded regions (paths of length 2m that remain nonnegative, and paths of length 2m that end at 0) contain exactly the same number of paths. Was this, a posteriori, obvious?

[Figure: two shaded regions of paths of length 2m, illustrating the equality of the two counts.]
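Before moving to probabilities, a brute-force check of (24) and (28) for small n (a sketch, not from the notes):

    from itertools import product
    from math import comb

    def count_positive(m, x, n, y):
        """Paths (m,x) -> (n,y) that never touch the horizontal axis."""
        cnt = 0
        for eps in product((-1, 1), repeat=n - m):
            pos, ok = x, True
            for e in eps:
                pos += e
                if pos <= 0:
                    ok = False
                    break
            cnt += ok and pos == y
        return cnt

    def c(N, t):   # C(N, t/2), read as 0 for invalid indices
        return comb(N, t // 2) if t % 2 == 0 and 0 <= t // 2 <= N else 0

    for (m, x, n, y) in [(0, 1, 7, 2), (0, 2, 8, 2), (3, 1, 9, 4)]:
        assert count_positive(m, x, n, y) == \
               c(n - m, n - m + y - x) - c(n - m, n - m - y - x)   # formula (24)

    def count_nonneg(n):
        """Paths of length n from 0 whose partial sums all remain >= 0."""
        cnt = 0
        for eps in product((-1, 1), repeat=n):
            s, ok = 0, True
            for e in eps:
                s += e
                if s < 0:
                    ok = False
                    break
            cnt += ok
        return cnt

    for n in (4, 5, 8):
        assert count_nonneg(n) == comb(n, (n + 1) // 2)            # formula (28)
    print("formulae (24) and (28) agree with brute force")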

27 The simple symmetric random walk in dimension 1: simple probability calculations

Since all paths of a certain length have the same probability, we can translate the path counting formulae into corresponding probability formulae.

27.1 Distribution after n steps

Lemma 16.

    P0(Sn = y) = C(n, (n + y)/2) 2^{-n}.

In particular,

    u(n) := P0(Sn = 0) = C(n, n/2) 2^{-n}, if n is even.    (29)

More generally,

    P(Sn = y | Sm = x) = C(n − m, (n − m + y − x)/2) 2^{-(n-m)}.

Proof. We have

    P0(Sn = y) = #{(0, 0) → (n, y)} × 2^{-n},  P(Sn = y | Sm = x) = #{(m, x) → (n, y)} × 2^{-(n-m)},

and we apply (23).

27.2 The ballot theorem

Theorem 25. The probability that, given S0 = 0 and Sn = y > 0, the random walk never becomes zero up to time n, is given by

    P0(S1 > 0, S2 > 0, ..., Sn > 0 | Sn = y) = y/n.    (30)

This is called the ballot theorem, and the reason will be given in Section 29. Such a simple answer begs for a physical/intuitive explanation; but, for now, we shall just compute it.

Proof. A path contributing to the conditional event must take its first step upwards, i.e. pass through (1, 1), and then remain strictly positive. So the left side of (30) equals

    [ #{(1, 1) → (n, y); remain > 0} × 2^{-n} ] / [ #{(0, 0) → (n, y)} × 2^{-n} ] = y/n,

by Corollary 14, i.e. (25).

27.3 Some conditional probabilities

Lemma 17. If n, y ≥ 0 are both even or both odd, then

    P0(S1 ≥ 0, ..., Sn ≥ 0 | Sn = y) = 1 − P0(Sn = y + 2)/P0(Sn = y) = (y + 1)/((n + y)/2 + 1).

In particular, if n is even,

    P0(S1 ≥ 0, ..., Sn ≥ 0 | Sn = 0) = 1/(n/2 + 1).

Proof. We use (27):

    P0(S1 ≥ 0, ..., Sn ≥ 0 | Sn = y) = P0(S1 ≥ 0, ..., Sn ≥ 0, Sn = y)/P0(Sn = y)
        = [ C(n, (n+y)/2) − C(n, (n+y)/2 + 1) ] / C(n, (n+y)/2)
        = 1 − C(n, (n+y)/2 + 1)/C(n, (n+y)/2)
        = 1 − P0(Sn = y + 2)/P0(Sn = y).

This is further equal to

    1 − ((n − y)/2) / ((n + y)/2 + 1) = (y + 1)/((n + y)/2 + 1),

using C(n, k + 1)/C(n, k) = (n − k)/(k + 1) with k = (n + y)/2.

27.4 Some remarkable identities

Theorem 26. If n is even, then

    P0(S1 ≥ 0, ..., Sn ≥ 0) = P0(S1 ≠ 0, ..., Sn ≠ 0) = P0(Sn = 0).

Proof. For the first quantity, we use (28) and (29):

    P0(S1 ≥ 0, ..., Sn ≥ 0) = #{(0, 0) → (n, •); remain ≥ 0} × 2^{-n} = C(n, n/2) 2^{-n} = P0(Sn = 0).

We next have

    P0(S1 ≠ 0, ..., Sn ≠ 0)
    (a) = P0(S1 > 0, ..., Sn > 0) + P0(S1 < 0, ..., Sn < 0)
    (b) = 2 P0(S1 > 0, ..., Sn > 0)
    (c) = 2 P0(ξ1 = 1) P0(S1 > 0, ..., Sn > 0 | ξ1 = 1)
    (d) = P0(1 + ξ2 > 0, 1 + ξ2 + ξ3 > 0, ..., 1 + ξ2 + · · · + ξn > 0)
    (e) = P0(ξ2 ≥ 0, ξ2 + ξ3 ≥ 0, ..., ξ2 + · · · + ξn ≥ 0)
    (f) = P0(S1 ≥ 0, S2 ≥ 0, ..., Sn-1 ≥ 0)
    (g) = P0(S1 ≥ 0, ..., Sn ≥ 0) = P0(Sn = 0),

where (a) is obvious, (b) follows from symmetry, (c) is just conditioning on {ξ1 = 1} and using the fact that S1 = ξ1, (d) follows from P0(ξ1 = 1) = 1/2 and from substituting the value of ξ1 on the left of the conditioning, (e) follows from the fact that, for integer m, 1 + m > 0 is equivalent to m ≥ 0, (f) follows from the replacement of (ξ2, ξ3, ...) by (ξ1, ξ2, ...), which is legitimate because ξ1 is independent of ξ2, ξ3, ... and the ξ's are i.i.d., and (g) is trivial: since n − 1 is odd, Sn-1 ≥ 0 means Sn-1 > 0 (Sn-1 cannot take the value 0), and Sn-1 > 0 implies Sn ≥ 0.

27.5 First return to zero

In (29) we computed, for even n, the probability u(n) = P0(Sn = 0) that the walk (started at 0) attains value 0 in n steps. We now want to find the probability f(n) that the walk will return to 0 for the first time in n steps:

    f(n) := P0(S1 ≠ 0, ..., Sn-1 ≠ 0, Sn = 0).

Theorem 27. Let n be even. Then

    f(n) = u(n)/(n − 1).

Proof. Since the random walk cannot change sign before becoming zero,

    f(n) = P0(S1 > 0, ..., Sn-1 > 0, Sn = 0) + P0(S1 < 0, ..., Sn-1 < 0, Sn = 0)
         = 2 P0(S1 > 0, ..., Sn-1 > 0, Sn = 0),

by symmetry. On the event {S1 > 0, ..., Sn-1 > 0, Sn = 0} we necessarily have Sn-1 = 1, so, by the Markov property at time n − 1 (given Sn-1 = 1, the event {Sn = 0} has probability 1/2 and is independent of the past),

    f(n) = 2 · (1/2) · P0(S1 > 0, ..., Sn-2 > 0, Sn-1 = 1)
         = P0(Sn-1 = 1) P0(S1 > 0, ..., Sn-2 > 0 | Sn-1 = 1).

The last conditional probability can be computed from the ballot theorem (see (30), applied with n − 1 in place of n and y = 1): it is equal to 1/(n − 1). Finally, since, given Sn = 0, we either have Sn-1 = 1 or −1 with equal probability, we get P0(Sn-1 = 1) = C(n − 1, n/2) 2^{-(n-1)} = C(n, n/2) 2^{-n} = u(n), using Pascal's rule and symmetry: C(n, n/2) = 2 C(n − 1, n/2). So

    f(n) = u(n)/(n − 1).

28 The reflection principle for a simple random walk in dimension 1

The reflection principle says the following. Fix a state a and put a two-sided mirror at a. If a particle is to the right of a, then you see its image on the left, and if the particle is to the left of a, then its mirror image is on the right. So if the particle is in position s, then its mirror image is in position s* = 2a − s, because s − a = a − s*. It is clear that if a particle starts at 0 and performs a simple symmetric random walk, then its mirror image starts at a and performs a simple symmetric random walk. This is not so interesting.

What is more interesting is this: run the particle till it hits a (it will, for sure; see Theorem 35) and after that consider its mirror image. In other words, letting

    Ta = inf{n ≥ 0 : Sn = a}

be the first hitting time of a, define

    S̃n = Sn, if n < Ta;  S̃n = 2a − Sn, if n ≥ Ta.

Theorem 28. If (Sn, n ∈ Z+) is a simple symmetric random walk started at 0, then so is (S̃n, n ∈ Z+).

Proof. Trivially, S̃n = Sn for n < Ta. By the strong Markov property, the process (S_{Ta+m} − a, m ≥ 0) is a simple symmetric random walk, independent of the past before Ta. But a symmetric random walk has the same law as its negative, so (a − S_{Ta+m}, m ≥ 0) is also a simple symmetric random walk. So Sn − a can be replaced by a − Sn after time Ta, i.e. S̃n = a + (a − Sn) = 2a − Sn, and the resulting process (S̃n) has the same law as (Sn).

28.1 Distribution of hitting time and maximum

We can take advantage of this principle to establish the distribution function of Ta:

Theorem 29. Let (Sn, n ≥ 0) be a simple symmetric random walk and a > 0. Then

    P0(Ta ≤ n) = 2 P0(Sn ≥ a) − P0(Sn = a) = P0(|Sn| ≥ a) − (1/2) P0(|Sn| = a).

Proof. If Sn ≥ a, then Ta ≤ n. So

    P0(Sn ≥ a) = P0(Ta ≤ n, Sn ≥ a).

In the last event we have n ≥ Ta, so we may replace Sn by 2a − S̃n (Theorem 28):

    P0(Ta ≤ n, Sn ≥ a) = P0(Ta ≤ n, 2a − S̃n ≥ a) = P0(Ta ≤ n, S̃n ≤ a) = P0(Ta ≤ n, Sn ≤ a),

the last equality because (S̃n) is again a simple symmetric random walk with the same hitting time of a. But

    P0(Ta ≤ n) = P0(Ta ≤ n, Sn ≤ a) + P0(Ta ≤ n, Sn > a) = P0(Ta ≤ n, Sn ≤ a) + P0(Sn > a),

where the last equality follows from the fact that Sn > a implies Ta ≤ n. Combining the displays gives

    P0(Ta ≤ n) = P0(Sn ≥ a) + P0(Sn > a) = 2 P0(Sn ≥ a) − P0(Sn = a).

Finally, observing that Sn has the same distribution as −Sn, we get that this is also equal to P0(|Sn| ≥ a) − (1/2) P0(|Sn| = a).

Define now the running maximum

    Mn = max(S0, S1, ..., Sn).

Then, as a corollary of the above result, we obtain:

Corollary 19. For x ≥ 0,

    P0(Mn ≥ x) = P0(Tx ≤ n) = 2 P0(Sn ≥ x) − P0(Sn = x) = P0(|Sn| ≥ x) − (1/2) P0(|Sn| = x).

Proof. Observe that Mn ≥ x if and only if Sk ≥ x for some k ≤ n, which is equivalent to Tx ≤ n, and use Theorem 29.

We can do more: we can derive the joint distribution of Mn and Sn.

Theorem 30. For x > y,

    P0(Mn < x, Sn = y) = P0(Sn = y) − P0(Sn = 2x − y).    (31)

Proof. Since Mn < x is equivalent to Tx > n, we have

    P0(Mn < x, Sn = y) = P0(Tx > n, Sn = y) = P0(Sn = y) − P0(Tx ≤ n, Sn = y).

If Tx ≤ n, then, by applying the reflection principle (Theorem 28), we can replace Sn by 2x − Sn in the last part of the last display and get

    P0(Tx ≤ n, Sn = y) = P0(Tx ≤ n, Sn = 2x − y).

But if x > y, then 2x − y > x, and so {Sn = 2x − y} ⊂ {Tx ≤ n}. Hence P0(Tx ≤ n, Sn = y) = P0(Sn = 2x − y). This, combined with the first display, gives the result.

29 Urns and the ballot theorem

Suppose that an urn [8] contains n items, of which a are coloured azure and b black; n = a + b. A sample from the urn, during sampling without replacement, is a sequence η = (η1, ..., ηn), where ηi indicates the colour of the i-th item picked. The set of values of η contains C(n, a) elements, and η is uniformly distributed in this set. The original ballot problem, posed in 1887 [9], asks the following: if we start picking the items from the urn one by one without replacement, what is the probability that the number of azure items is constantly ahead of the number of black ones? The problem was solved [10] in the same year. The answer is very simple:

Theorem 31. If an urn contains a azure items and b = n − a black items, then the probability that, during sampling without replacement, the number of selected azure items is always ahead of the black ones equals (a − b)/n, as long as a ≥ b.

Footnotes:
[8] An urn is a vessel, usually a vase furnished with a foot or pedestal, employed for different purposes, as for holding liquids, ornamental objects, the ashes of the dead after cremation, and anciently for holding ballots to be drawn. It is the latter use we are interested in in Probability and Statistics.
[9] Bertrand, J. (1887). Solution d'un problème. Compt. Rend. Acad. Sci. Paris, 105, 369.
[10] André, D. (1887). Solution directe du problème résolu par M. Bertrand. Compt. Rend. Acad. Sci. Paris, 105, 436-437.

We shall call (η1, ..., ηn) an urn process with parameters n, a. An urn can be realised quite easily by a random walk:

Theorem 32. Consider a simple symmetric random walk (Sn, n ≥ 0) with values in Z and let y be a positive integer. Then, conditional on the event {Sn = y}, the sequence (ξ1, ..., ξn) of the increments is an urn process with parameters n and a = (n + y)/2.

Proof. The values of each ξi are ±1. If Sn = y, then, letting a, b be defined by a + b = n, a − b = y, a out of the n increments will be +1 and b will be −1. So the number of possible values of (ξ1, ..., ξn) is C(n, a). It remains to show that, conditional on {Sn = y}, all values are equally likely. Indeed, let ε1, ..., εn be elements of {−1, +1} such that ε1 + · · · + εn = y. Then, using the definition of conditional probability and Lemma 16,

    P0(ξ1 = ε1, ..., ξn = εn | Sn = y) = P0(ξ1 = ε1, ..., ξn = εn) / P0(Sn = y)
        = 2^{-n} / [ C(n, (n + y)/2) 2^{-n} ] = 1/C(n, a).

Thus, conditional on {Sn = y}, the sequence (ξ1, ..., ξn) is, indeed, uniformly distributed in a set with C(n, a) elements, i.e. it is an urn process.

Proof of Theorem 31. Since the 'azure' items correspond to + signs and the 'black' items to − signs, we can translate the original question into a question about the random walk: the event that the azure items are always ahead of the black ones during sampling is precisely the event {S1 > 0, ..., Sn > 0} that the random walk remains positive, conditional on {Sn = y}, where y = a − (n − a) = a − b. But in (30) we computed that

    P0(S1 > 0, ..., Sn > 0 | Sn = y) = y/n.

Since y = a − b, the result follows.
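The ballot theorem is easy to test by simulation (a sketch, not from the notes; the values of a, b are arbitrary):

    import random

    def ballot_mc(a, b, sims=200_000):
        """Monte Carlo check of Theorem 31: azure (+1) always strictly ahead
        of black (-1) when sampling a azure and b black items, a > b."""
        items = [1] * a + [-1] * b
        wins = 0
        for _ in range(sims):
            random.shuffle(items)       # a uniform sampling order
            lead, ok = 0, True
            for v in items:
                lead += v
                if lead <= 0:
                    ok = False
                    break
            wins += ok
        return wins / sims

    a, b = 7, 4
    print(ballot_mc(a, b), (a - b) / (a + b))   # both approximately 0.2727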

30 The asymmetric simple random walk in dimension 1

Consider now a simple random walk with values in Z, but which is not necessarily symmetric:

    p_{i,i+1} = p, p_{i,i-1} = q = 1 − p, i ∈ Z.

(The word "asymmetric" will always mean "not necessarily symmetric".) We have

    P0(Sn = k) = C(n, (n + k)/2) p^{(n+k)/2} q^{(n-k)/2}, −n ≤ k ≤ n.

In view of the spatial homogeneity, it is no loss of generality to consider the random walk starting from 0; we tacitly assume that S0 = 0.

30.1 First hitting time analysis

We will use probabilistic and analytical methods to compute the distribution of the hitting times

    Tx = inf{n ≥ 0 : Sn = x}, x ∈ Z.

This random variable takes values in {0, 1, 2, ...} ∪ {∞}.

Lemma 18. For all x, y ∈ Z and all t ≥ 0,

    Px(Ty = t) = P0(T_{y-x} = t).

Proof. This follows from the spatial homogeneity of the transition probabilities: to reach y starting from x, only the relative position of y with respect to x is needed.

Lemma 19. Consider a simple random walk starting from 0. If x > 0, then the summands in

    Tx = T1 + (T2 − T1) + · · · + (Tx − T_{x-1})

are i.i.d. random variables; in other words, the process (Tx, x ≥ 0) is itself a random walk.

Proof. To prove this, we make essential use of the skip-free property, namely the obvious fact that a simple random walk starting from 0 must pass through levels 1, 2, ..., x − 1 in order to reach x. The rest is a consequence of the strong Markov property.

Corollary 20. Consider a simple random walk starting from 0. Then, for x > 0,

    E0 s^{Tx} = φ(s)^x, where φ(s) := E0 s^{T1}.

In view of this, we need to know the distribution of T1. We will approach it using probability generating functions.

Theorem 33. Consider a simple random walk starting from 0. Then the probability generating function of the first time that level 1 is reached is given by

    φ(s) = E0 s^{T1} = ( 1 − √(1 − 4pqs²) ) / (2qs).

Proof. Start with S0 = 0 and watch the process until the first time n = T1 such that Sn = 1. If S1 = 1, then T1 = 1. If S1 = −1, then T1 is the sum of three times: one unit of time spent to go from 0 to −1, plus a time T1′ required to go from −1 to 0 for the first time, plus a further time T1″ required to go from 0 to +1 for the first time. See the figure below.

[Figure: a sample path with S1 = −1, indicating the times T1′ and T1″.]

Hence

    T1 = 1, if S1 = 1;  T1 = 1 + T1′ + T1″, if S1 = −1.

This can be used as follows:

    φ(s) = E0[ s 1(S1 = 1) + s^{1+T1′+T1″} 1(S1 = −1) ]
         = s P(S1 = 1) + s E0[ s^{T1′} s^{T1″} | S1 = −1 ] P(S1 = −1)
         = ps + qs E0[ s^{T1′} s^{T1″} | S1 = −1 ].

But, from the strong Markov property, T1′ and T1″ are, conditional on S1 = −1, independent, each with the distribution of T1. We thus have

    φ(s) = ps + qs φ(s)².

The formula is obtained by solving the quadratic and keeping the root which is a probability generating function (the one that vanishes at s = 0).

Corollary 21. Consider a simple random walk starting from 0. Then the probability generating function of the first time that level −1 is reached is given by

    E0 s^{T-1} = ( 1 − √(1 − 4pqs²) ) / (2ps).

Proof. The first time n such that Sn = −1 is the first time n such that −Sn = 1. But −Sn is a simple random walk with upward transition probability q and downward transition probability p. Hence the previous formula applies with the roles of p and q interchanged.

Corollary 22. Consider a simple random walk starting from 0. Then the probability generating function of the first time that level x ∈ Z is reached is given by

    E0 s^{Tx} = [ (1 − √(1 − 4pqs²)) / (2qs) ]^x, if x > 0;
    E0 s^{Tx} = [ (1 − √(1 − 4pqs²)) / (2ps) ]^{|x|}, if x < 0.    (32)

Proof. Use Lemma 19 and the previous formulae of Theorem 33 and Corollary 21.

30.2 First return to the origin

Now let us ask: if the particle starts at 0, how long will it take till it first returns to 0? We are talking about the stopping time

    T0′ := inf{n ≥ 1 : Sn = 0}.

(Pay attention: it is important to write n ≥ 1 inside the set!)

Theorem 34. Consider a simple random walk starting from 0. Then the probability generating function of the first return time to 0 is given by

    E0 s^{T0′} = 1 − √(1 − 4pqs²).

Proof. We again do first-step analysis:

    E0 s^{T0′} = E0[ s^{T0′} 1(S1 = −1) ] + E0[ s^{T0′} 1(S1 = 1) ]
              = q E0[ s^{T0′} | S1 = −1 ] + p E0[ s^{T0′} | S1 = 1 ].

If S1 = −1, then T0′ equals 1 plus the time required for the walk to hit 0 starting from −1; the latter is, in distribution, the same as the time required for the walk to hit 1 starting from 0. Similarly, if S1 = 1, then T0′ equals 1 plus a time whose distribution is that of the time required for the walk to hit −1 starting from 0. Hence

    E0 s^{T0′} = q E0(s^{1+T1}) + p E0(s^{1+T-1})
              = qs (1 − √(1 − 4pqs²))/(2qs) + ps (1 − √(1 − 4pqs²))/(2ps)
              = 1 − √(1 − 4pqs²).
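A quick numerical check of the first-step equation φ(s) = ps + qsφ(s)² and of the limit φ(s) → P0(T1 < ∞) as s ↑ 1 (a sketch; the value of p is arbitrary):

    from math import sqrt

    p = 0.3
    q = 1 - p

    def phi(s):
        """E_0 s^{T_1} = (1 - sqrt(1 - 4 p q s^2)) / (2 q s)."""
        return (1 - sqrt(1 - 4 * p * q * s * s)) / (2 * q * s)

    # phi solves phi(s) = p s + q s phi(s)^2:
    for s in (0.2, 0.5, 0.9, 0.99):
        assert abs(phi(s) - (p * s + q * s * phi(s) ** 2)) < 1e-12

    # and phi(1) = P_0(T_1 < infinity), which for p < q equals p/q (Theorem 35):
    print(phi(1.0), p / q)   # both 0.42857...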


30.3 The distribution of the first return to the origin

We wish to find the exact distribution of the first return time to the origin. For a symmetric random walk, this was found in §27.5: P0(T0′ = n) = f(n) = u(n)/(n − 1), if n is even. For the general case, we use the probability generating function E0 s^{T0′} of Theorem 34. The tool we need here is Taylor's theorem for the function f(x) = (1 + x)^α, where α is a real number. If α = n is a positive integer, then, by the binomial theorem,

    (1 + x)^n = Σ_{k=0}^n C(n, k) x^k,

and, if we define C(n, k) to be 0 for k > n, we can omit the upper limit in the sum and simply write (1 + x)^n = Σ_{k≥0} C(n, k) x^k. Recall that

    C(n, k) = n(n − 1) · · · (n − k + 1)/k! = n!/(k!(n − k)!).

It turns out that the formula is formally true for general α, and it looks exactly the same:

    (1 + x)^α = Σ_{k≥0} C(α, k) x^k, as long as we define C(α, k) = α(α − 1) · · · (α − k + 1)/k!.

The adverb 'formally' is supposed to mean that this formula holds for some x around 0, but we won't worry about which ones. For example, let us compute (1 − x)^{-1}. We have C(−1, k) = (−1)(−2) · · · (−k)/k! = (−1)^k, so

    (1 − x)^{-1} = Σ_{k≥0} C(−1, k)(−x)^k = Σ_{k≥0} (−1)^k (−1)^k x^k = Σ_{k≥0} x^k,

which is of course the all too familiar geometric series. As another example, consider √(1 − x) = (1 − x)^{1/2}. By direct computation,

    C(1/2, k)(−1)^k = − (1/(2k − 1)) C(2k, k) 4^{-k},

so

    √(1 − x) = 1 − Σ_{k≥1} (1/(2k − 1)) C(2k, k) (x/4)^k.

We apply this to E0 s^{T0′} = 1 − √(1 − 4pqs²):

    E0 s^{T0′} = Σ_{k≥1} (1/(2k − 1)) C(2k, k) (pq)^k s^{2k}.

But, by definition,

    E0 s^{T0′} = Σ_{n≥0} P0(T0′ = n) s^n.

Comparing the two 'infinite polynomials', we see that we must have n = 2k (we knew that: the random walk can return to 0 only in an even number of steps), and then

    P0(T0′ = 2k) = (1/(2k − 1)) C(2k, k) p^k q^k = (1/(2k − 1)) P0(S_{2k} = 0).

31 Recurrence properties of simple random walks

Consider again a simple random walk Sn with values in Z, with upward probability p and downward probability q = 1 − p. By applying the law of large numbers we have that

    P0( lim_{n→∞} Sn/n = p − q ) = 1.

• If p > 1/2, then p − q > 0 and so Sn tends to +∞ with probability one. Hence the walk is transient.
• If p < 1/2, then p − q < 0 and so Sn tends to −∞ with probability one. Hence it is again transient.
• If p = 1/2, then p − q = 0, and the law of large numbers is inconclusive: it only tells us that Sn/n → 0. We will see below that the walk is recurrent; that it is actually null-recurrent follows from Markov chain theory.

We will now see how the results about the generating functions of the hitting and return times help us answer these questions directly, without appeal to the Markov chain theory. The key fact needed here is

    P0(Tx < ∞) = lim_{s↑1} E0 s^{Tx},

which is true for any probability generating function. We apply this to (32). The quantity inside the square root becomes 1 − 4pq. But remember that q = 1 − p, so

    1 − 4pq = 1 − 4p + 4p² = (1 − 2p)², and √(1 − 4pq) = |1 − 2p| = |p − q|.

So

    P0(Tx < ∞) = [ (1 − |p − q|)/(2q) ]^x, if x > 0;  [ (1 − |p − q|)/(2p) ]^{|x|}, if x < 0.    (33)

This formula can be written more neatly as follows:

Theorem 35.

    P0(Tx < ∞) = ( (p/q) ∧ 1 )^x, if x > 0;  ( (q/p) ∧ 1 )^{|x|}, if x < 0.

In particular, if p = q, then P0(Tx < ∞) = 1 for all x ∈ Z.

Corollary 23. The simple symmetric random walk with values in $\mathbb{Z}$ is null-recurrent.

Proof. The last part of Theorem 35 tells us that it is recurrent. We know that it cannot be positive recurrent, and so it is null-recurrent.

31.1 Asymptotic distribution of the maximum

Suppose now that $p < 1/2$. Then $\lim_{n \to \infty} S_n = -\infty$ with probability 1. This means that there is a largest value that will ever be attained by the random walk. This value is random and is denoted by
$$M_\infty = \sup(S_0, S_1, S_2, \ldots).$$
What is its distribution?

Theorem 36. Consider a simple random walk with values in $\mathbb{Z}$. Assume $p < 1/2$. Then the overall maximum $M_\infty$ has a geometric distribution:
$$P_0(M_\infty \ge x) = (p/q)^x, \qquad x \ge 0.$$

Proof. If we let $M_n = \max(S_0, \ldots, S_n)$, then the sequence $M_n$ is nondecreasing, and every nondecreasing sequence has a limit, except that the limit may be $\infty$; so $M_\infty = \lim_{n \to \infty} M_n$, but here it is $< \infty$, with probability 1, because $p < 1/2$. Indeed,
$$P_0(M_\infty \ge x) = \lim_{n \to \infty} P_0(M_n \ge x) = \lim_{n \to \infty} P_0(T_x \le n) = P_0(T_x < \infty) = (p/q)^x, \qquad x \ge 0.$$
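Since the walk drifts to $-\infty$ when $p < 1/2$, the maximum over a long but finite horizon is, with high probability, the overall maximum, so Theorem 36 can be checked as in the following sketch; the horizon of 2,000 steps and the choice $p = 0.3$ are arbitrary.

    import random

    p, q, n_steps, trials = 0.3, 0.7, 2_000, 50_000
    counts = [0] * 8                       # counts[x] will estimate P(M >= x)
    for _ in range(trials):
        s = m = 0
        for _ in range(n_steps):
            s += 1 if random.random() < p else -1
            m = max(m, s)
        for x in range(min(m, 7) + 1):
            counts[x] += 1
    for x in range(6):
        print(f"P(M >= {x}): simulated {counts[x]/trials:.4f}, exact {(p/q)**x:.4f}")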

31.2 Return probabilities

Only in the symmetric case $p = q = 1/2$ will the random walk return, with probability 1, to the point where it started from. Otherwise, with some positive probability, it will never ever return to the point where it started from; unless $p = 1/2$, this return probability is always strictly less than 1. What is this probability?

Theorem 37. Consider a simple random walk with values in $\mathbb{Z}$, and let $T_0' = \inf\{n \ge 1 : S_n = 0\}$ be the first return time to 0. Then
$$P_0(T_0' < \infty) = 2(p \wedge q).$$

Proof. The probability generating function of $T_0'$ was obtained in Theorem 34. But
$$P_0(T_0' < \infty) = \lim_{s \uparrow 1} E s^{T_0'} = 1 - \sqrt{1 - 4pq} = 1 - |p - q|.$$
If $p = q$ this gives 1, as already known. If $p > q$ this gives $1 - (p - q) = 1 - (1 - q - q) = 2q$. And if $p < q$ this gives $1 - (q - p) = 1 - (1 - p - p) = 2p$. In all cases, $P_0(T_0' < \infty) = 2(p \wedge q)$.

31.3 Total number of visits to a state

If the random walk is recurrent then it visits any state infinitely many times, meaning that $P_i(J_i = \infty) = 1$, i.e. $P_i(J_i \ge k) = 1$ for all $k$. But if it is transient, the total number of visits to a state will be finite. Here the total number of visits to state $i$ is given by
$$J_i = \sum_{n=1}^{\infty} 1(S_n = i), \qquad (34)$$
a random variable first considered in Section 10, where it was also shown to be geometric: in Lemma 4 we showed that, if a Markov chain starts from a state $i$, then $J_i$ is geometric. We now compute its exact distribution.

Theorem 38. Consider a simple random walk with values in $\mathbb{Z}$. Then, starting from zero, the total number $J_0$ of visits to state 0 is geometrically distributed:
$$P_0(J_0 \ge k) = (2(p \wedge q))^k, \qquad k = 0, 1, 2, \ldots,$$
$$E_0 J_0 = \frac{2(p \wedge q)}{1 - 2(p \wedge q)}.$$

Proof. But $J_0 \ge 1$ if and only if the first return time to zero is finite. So
$$P_0(J_0 \ge 1) = P_0(T_0' < \infty) = 2(p \wedge q),$$
where the last probability was computed in Theorem 37. And so, since each return to 0 starts the walk afresh,
$$P_0(J_0 \ge k) = P_0(J_0 \ge 1)^k = (2(p \wedge q))^k, \qquad k = 0, 1, 2, \ldots.$$
This proves the first formula. For the second formula we have
$$E_0 J_0 = \sum_{k=1}^{\infty} P_0(J_0 \ge k) = \sum_{k=1}^{\infty} (2(p \wedge q))^k = \frac{2(p \wedge q)}{1 - 2(p \wedge q)},$$
this being a geometric series.

Remark 1: By spatial homogeneity, we obviously have that the same formulae apply for any $i$:
$$P_i(J_i \ge k) = (2(p \wedge q))^k, \qquad E_i J_i = \frac{2(p \wedge q)}{1 - 2(p \wedge q)}.$$

Remark 2: The formulae work for all values of $p \in [0, 1]$, even for $p = 1/2$ (in which case $P_i(J_i \ge k) = 1$ for all $k$).
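Theorem 38 can be checked by counting visits over a long horizon, as in the sketch below; for $p \ne 1/2$ visits after a long time are rare, so the truncation bias is small. The horizon of 2,000 steps and the choice $p = 0.3$ (so that $2(p \wedge q) = 0.6$) are arbitrary.

    import random

    p, n_steps, trials = 0.3, 2_000, 20_000
    lam = 2 * min(p, 1 - p)               # the geometric parameter of Theorem 38
    visits = []
    for _ in range(trials):
        s, j = 0, 0
        for _ in range(n_steps):
            s += 1 if random.random() < p else -1
            j += (s == 0)
        visits.append(j)
    for k in range(1, 5):
        sim = sum(v >= k for v in visits) / trials
        print(f"P(J0 >= {k}): simulated {sim:.4f}, exact {lam**k:.4f}")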

We now look at the number of visits to some state $x$ when we start from a different state.

Theorem 39. Consider a random walk with values in $\mathbb{Z}$. If $p \le 1/2$ then
$$P_0(J_x \ge k) = (p/q)^{x \vee 0}\,(2p)^{k-1}, \quad k \ge 1, \qquad E_0 J_x = \frac{(p/q)^{x \vee 0}}{1 - 2p}.$$
If $p \ge 1/2$ then
$$P_0(J_x \ge k) = (q/p)^{(-x) \vee 0}\,(2q)^{k-1}, \quad k \ge 1, \qquad E_0 J_x = \frac{(q/p)^{(-x) \vee 0}}{1 - 2q}.$$

Proof. Suppose $x > 0$. Then
$$P_0(J_x \ge k) = P_0(T_x < \infty,\ J_x \ge k),$$
simply because if $J_x \ge k \ge 1$ then $T_x < \infty$. Apply now the strong Markov property at $T_x$:
$$P_0(T_x < \infty,\ J_x \ge k) = P_0(T_x < \infty)\,P_x(J_x \ge k - 1).$$
The reason that we replaced $k$ by $k - 1$ is because, given that $T_x < \infty$, there has already been one visit to state $x$ and so we need at least $k - 1$ remaining visits in order to have at least $k$ visits in total. The first term was computed in Theorem 35, and the second in Theorem 38. So we obtain
$$P_0(J_x \ge k) = \Big(\frac{p}{q} \wedge 1\Big)^{x} (2(p \wedge q))^{k-1}, \qquad k \ge 1.$$
If $p \le 1/2$ then this gives $P_0(J_x \ge k) = (p/q)^x (2p)^{k-1}$. If $p \ge 1/2$ then this gives $P_0(J_x \ge k) = (2q)^{k-1}$.
We repeat the argument for $x < 0$ and find
$$P_0(J_x \ge k) = \Big(\frac{q}{p} \wedge 1\Big)^{|x|} (2(p \wedge q))^{k-1}, \qquad k \ge 1.$$
If $p \le 1/2$ then this gives $P_0(J_x \ge k) = (2p)^{k-1}$. If $p \ge 1/2$ then this gives $P_0(J_x \ge k) = (q/p)^{|x|} (2q)^{k-1}$.
As for the expectation, we apply $E_0 J_x = \sum_{k \ge 1} P_0(J_x \ge k)$ and perform the sum of a geometric series.

Remark 1: The expectation $E_0 J_x$ is finite if $p \ne 1/2$. Consider the case $p < 1/2$. For $x > 0$, $E_0 J_x$ drops down to zero geometrically fast as $x$ tends to infinity. But for $x < 0$, we have $E_0 J_x = 1/(1 - 2p)$ regardless of $x$. In other words, if the random walk "drifts down" (i.e. $p < 1/2$) then, starting from 0, it will visit any point below 0 exactly the same number of times on the average.

Remark 2: If the initial state is not 0, then use spatial homogeneity to obtain the analogous formulae, i.e. $P_y(J_x \ge k) = P_0(J_{x-y} \ge k)$ and $E_y J_x = E_0 J_{x-y}$.

32 Duality

Consider a random walk $S_k = \sum_{i=1}^{k} \xi_i$, a sum of i.i.d. random variables. Fix an integer $n$. Then the dual $(\hat S_k,\ 0 \le k \le n)$ of $(S_k,\ 0 \le k \le n)$ is defined by
$$\hat S_k := S_n - S_{n-k}, \qquad 0 \le k \le n.$$

Theorem 40. $(\hat S_k,\ 0 \le k \le n)$ has the same distribution as $(S_k,\ 0 \le k \le n)$.

Proof. Use the obvious fact that $(\xi_1, \xi_2, \ldots, \xi_n)$ has the same distribution as $(\xi_n, \xi_{n-1}, \ldots, \xi_1)$.

Thus, every time we have a probability for $S$ we can translate it into a probability for $\hat S$, which is basically another probability for $S$, because the two have the same distribution. Here is an application of this:

Theorem 41. Consider a simple random walk, let $x$ be a positive integer, and let $T_x = \inf\{n \ge 1 : S_n = x\}$ be the first time that $x$ will be visited. Then
$$P_0(T_x = n) = \frac{x}{n}\, P_0(S_n = x), \qquad x > 0.$$

Proof. Fix an integer $n$. Rewrite the ballot theorem in terms of the dual:
$$P_0(\hat S_1 > 0, \ldots, \hat S_n > 0 \mid \hat S_n = x) = \frac{x}{n}.$$
But the left-hand side is
$$P_0(S_n - S_{n-k} > 0,\ 1 \le k \le n \mid S_n = x) = P_0(S_k < x,\ 0 \le k \le n-1 \mid S_n = x) = \frac{P_0(T_x = n)}{P_0(S_n = x)}.$$

Pause for reflection: We have been dealing with the random times $T_x$ for simple random walks (symmetric or asymmetric) from the beginning of the lectures. In Lemma 19 we observed that $T_x$ is the sum of $x$ i.i.d. random variables, each distributed as $T_1$, and thus derived the generating function of $T_x$:
$$E_0 s^{T_x} = \left(\frac{1 - \sqrt{1 - 4pqs^2}}{2qs}\right)^x. \qquad (35)$$
Specialising to the symmetric case we have
$$E_0 s^{T_x} = \left(\frac{1 - \sqrt{1 - s^2}}{s}\right)^x.$$
In Theorem 29 we used the reflection principle and derived the distribution function of $T_x$ for a simple symmetric random walk:
$$P_0(T_x \le n) = P_0(|S_n| \ge x) - \tfrac{1}{2} P_0(|S_n| = x). \qquad (36)$$
Lastly, in Theorem 41, we derived, using duality, for a simple random walk and for $x > 0$, the probabilities
$$P_0(T_x = n) = \frac{x}{n}\, P_0(S_n = x). \qquad (37)$$

We remark here something methodological: It was completely diﬀerent techniques that led to results about, essentially, the same thing: the distribution of Tx . This is an important lesson to learn. Of course, the three formulae should be compatible: the number (37) should be the coeﬃcient of sn in (35); summing up (37) should give (36), or taking diﬀerences in (36) should give (37). But on trying to verify these things algebraically, the reader will realise that it is not easy.
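A numerical verification is, however, easy. The following sketch checks, with exact rational arithmetic and for the symmetric walk, that summing (37) up to a horizon $N$ reproduces (36) at $N$; the choices $x = 3$, $N = 25$ are arbitrary, and the coefficients of (35) could be checked against (37) in the same spirit by expanding its power series.

    from fractions import Fraction
    from math import comb

    def prob_S(n, x):
        # P0(S_n = x) for the simple symmetric random walk
        if (n + x) % 2 or abs(x) > n:
            return Fraction(0)
        return Fraction(comb(n, (n + x) // 2), 2**n)

    x, N = 3, 25
    lhs = sum(Fraction(x, n) * prob_S(n, x) for n in range(x, N + 1))      # sum of (37)
    rhs = (sum(prob_S(N, y) for y in range(-N, N + 1) if abs(y) >= x)
           - Fraction(1, 2) * (prob_S(N, x) + prob_S(N, -x)))              # formula (36)
    print(lhs == rhs)   # True: the two formulae agree exactly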

33 Amount of time that a SSRW is positive*

Consider a SSRW $(S_n)$ starting from 0. Define its trajectory as the polygonal line obtained by joining the points $(n, S_n)$ in the plane, as described earlier. Watch this trajectory up to time $2n$. Let $T_+(2n)$ be the number of sides of the polygonal line that are above the horizontal axis and $T_-(2n) = 2n - T_+(2n)$ the number of sides that are below. Clearly, $T_\pm(2n)$ are both even numbers because, if the RW is at 0, it takes an even number of steps to return to 0. We can think of $T_+(2n)$ as the number of leads in $2n$ coin tosses, where we have a lead at some time if the number of heads till that time exceeds the number of tails.

Figure 1: Part of the time the SSRW is positive and part negative. After it returns to 0 at time $T_0'$, it behaves again like a SSRW and is independent of the past.

We are dealing with a fair coin which is tossed $2n$ times, so the event that the number of heads equals $k$ has maximum probability when $k = n$ (heads = tails). Most people take this statement and translate it into a statement about $T_+(2n)$, arguing that $P(T_+(2n) = m)$ is maximised when $m = n$, i.e. when $T_+(2n) = T_-(2n)$. This is wrong. In fact, as the theorem below shows, the most likely values of $T_+(2n)$ are $m = 0$ and $m = 2n$.

Theorem 42 (distribution of the total time the random walk is positive).
$$P(T_+(2n) = 2k) = P(S_{2k} = 0)\,P(S_{2n-2k} = 0), \qquad 0 \le k \le n.$$

Proof. The idea is to condition at the first time $T_0'$ that the RW returns to 0:
$$P(T_+(2n) = 2k) = \sum_{r=1}^{n} P(T_+(2n) = 2k,\ T_0' = 2r).$$
Between 0 and $T_0'$ the random walk is either above the horizontal axis or below; call the first event $A_+$ and the second $A_-$ and write
$$P(T_+(2n) = 2k) = \sum_{r=1}^{n} P(A_+, T_0' = 2r)\,P(T_+(2n) = 2k \mid A_+, T_0' = 2r) + \sum_{r=1}^{n} P(A_-, T_0' = 2r)\,P(T_+(2n) = 2k \mid A_-, T_0' = 2r).$$
In the first case,
$$P(T_+(2n) = 2k \mid A_+, T_0' = 2r) = P(T_+(2n - 2r) = 2k - 2r),$$
because, if $T_0' = 2r$ and $A_+$ occurs, then the RW has already spent $2r$ units of time above the axis and so $T_+(2n)$ is reduced by $2r$. Furthermore, we may remove the conditioning because the future evolution of the RW after $T_0'$ is independent of the past. In the second case, a similar argument shows
$$P(T_+(2n) = 2k \mid A_-, T_0' = 2r) = P(T_+(2n - 2r) = 2k),$$
since the first $2r$ steps contribute nothing to the time spent above the axis. We also observe that, by symmetry,
$$P(A_+, T_0' = 2r) = P(A_-, T_0' = 2r) = \tfrac{1}{2} P(T_0' = 2r) = \tfrac{1}{2} f_{2r}.$$
Letting $p_{2k,2n} = P(T_+(2n) = 2k)$, we have shown that
$$p_{2k,2n} = \tfrac{1}{2} \sum_{r=1}^{k} f_{2r}\, p_{2k-2r,\,2n-2r} + \tfrac{1}{2} \sum_{r=1}^{n-k} f_{2r}\, p_{2k,\,2n-2r}.$$
We want to show that $p_{2k,2n} = u_{2k}\, u_{2n-2k}$, where $u_{2k} = P(S_{2k} = 0)$. With this hypothesis, we can start playing with the recursion of the display above and, using induction on $n$, we will realise that
$$p_{2k,2n} = \tfrac{1}{2}\, u_{2n-2k} \sum_{r=1}^{k} f_{2r}\, u_{2k-2r} + \tfrac{1}{2}\, u_{2k} \sum_{r=1}^{n-k} f_{2r}\, u_{2n-2k-2r}.$$
But the first-return decomposition $u_{2m} = \sum_{r=1}^{m} f_{2r}\, u_{2m-2r}$ collapses each sum, giving
$$p_{2k,2n} = \tfrac{1}{2}\, u_{2n-2k}\, u_{2k} + \tfrac{1}{2}\, u_{2k}\, u_{2n-2k} = u_{2k}\, u_{2n-2k},$$
which completes the induction.

Explicitly, we have
$$P(T_+(2n) = 2k) = \binom{2k}{k} \binom{2n-2k}{n-k}\, 2^{-2n}, \qquad 0 \le k \le n. \qquad (38)$$

With $2n = 100$, we plot the function $P(T_+(100) = 2k)$ for $0 \le k \le 50$ below. Notice that it is maximised at the ends of the interval.

[Plot of $P(T_+(100) = 2k)$ against $k$, $0 \le k \le 50$: a U-shaped curve, with maxima of about 0.08 at $k = 0$ and $k = 50$ and a minimum of about 0.013 at $k = 25$.]
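The plotted values come straight from (38); a minimal Python sketch reproducing them:

    from math import comb

    n = 50                                 # 2n = 100, as in the plot
    probs = [comb(2 * k, k) * comb(2 * (n - k), n - k) / 4**n for k in range(n + 1)]
    assert abs(sum(probs) - 1) < 1e-12     # (38) is a probability distribution
    print(probs[0], probs[25], probs[50])  # about 0.0796, 0.0126, 0.0796: U-shaped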

The arcsine law*

Use of Stirling's approximation in formula (38) gives
$$P(T_+(2n) = 2k) \sim \frac{1}{\pi \sqrt{k(n-k)}}, \qquad \text{as } k \to \infty \text{ and } n - k \to \infty.$$

Now consider the calculation of the following probability:
$$P\Big(\frac{T_+(2n)}{2n} \in [a, b]\Big) = \sum_{na \le k \le nb} P\Big(\frac{T_+(2n)}{2n} = \frac{k}{n}\Big) \sim \sum_{a \le k/n \le b} \frac{1}{\pi \sqrt{k(n-k)}} = \frac{1}{n} \sum_{a \le k/n \le b} \frac{1}{\pi \sqrt{\frac{k}{n}\big(1 - \frac{k}{n}\big)}}.$$
This is easily seen to have a limit as $n \to \infty$, the limit being
$$\int_a^b \frac{dt}{\pi \sqrt{t(1-t)}},$$
because the last sum is simply a Riemann approximation to the last integral. This shows the celebrated arcsine law:
$$\lim_{n \to \infty} P\Big(\frac{T_+(2n)}{2n} \le x\Big) = \int_0^x \frac{dt}{\pi \sqrt{t(1-t)}} = \frac{2}{\pi} \arcsin \sqrt{x}.$$
We thus have that the limiting distribution of $T_+(2n)/2n$ (the fraction of time that the RW is positive), as $n \to \infty$, is the distribution function $\frac{2}{\pi} \arcsin \sqrt{x}$, $0 < x < 1$. This is known as the arcsine law. This distribution function has density
$$\frac{1}{\pi \sqrt{x(1-x)}}, \qquad 0 < x < 1,$$
the plot of which is amazingly close to the plot of the figure above.
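The arcsine law can also be seen by simulation, as in the sketch below; the horizon $2n = 1000$ and the sample size are arbitrary choices. A side of the trajectory from $S_{k-1}$ to $S_k$ lies above the axis exactly when $S_{k-1} + S_k > 0$ (the two values differ by 1, so their sum is never 0), which is what the code counts.

    import math
    import random

    def frac_positive(two_n, rng=random):
        s_prev, pos = 0, 0
        for _ in range(two_n):
            s = s_prev + (1 if rng.random() < 0.5 else -1)
            pos += (s_prev + s > 0)        # side above the horizontal axis
            s_prev = s
        return pos / two_n

    two_n, trials = 1_000, 20_000
    samples = [frac_positive(two_n) for _ in range(trials)]
    for x in (0.1, 0.25, 0.5, 0.75, 0.9):
        empirical = sum(v <= x for v in samples) / trials
        limit = 2 / math.pi * math.asin(math.sqrt(x))
        print(f"P(T+/2n <= {x}): simulated {empirical:.3f}, arcsine law {limit:.3f}")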

Now consider again a SSRW and consider the hitting times $(T_x,\ x \ge 0)$. This is another random walk because the summands in $T_x = \sum_{y=1}^{x} (T_y - T_{y-1})$ are i.i.d., all distributed as $T_1$:
$$P(T_1 \ge 2n) = P(S_{2n} = 0) = \binom{2n}{n} 2^{-2n} \sim \frac{1}{\sqrt{\pi n}}.$$
From this, we immediately get that the expectation of $T_1$ is infinite: $ET_1 = \infty$. Many other quantities can be approximated in a similar manner.

34 Simple symmetric random walk in two dimensions

Consider a drunkard in a city whose roads are arranged in a rectangular grid. The corners are described by coordinates $(x, y)$ where $x, y \in \mathbb{Z}$. He starts at $(0, 0)$ and, being totally drunk, he moves east, west, north or south, with equal probability ($1/4$). When he reaches a corner he moves again in the same manner, going at random east, west, north or south. We describe this motion as a random walk in $\mathbb{Z}^2 = \mathbb{Z} \times \mathbb{Z}$: Let $\xi_1, \xi_2, \ldots$ be i.i.d. random vectors such that
$$P(\xi_n = (0, 1)) = P(\xi_n = (0, -1)) = P(\xi_n = (1, 0)) = P(\xi_n = (-1, 0)) = 1/4.$$
We then let $S_0 = (0, 0)$ and
$$S_n = \sum_{i=1}^{n} \xi_i, \qquad n \ge 1.$$
If $x \in \mathbb{Z}^2$, we let $x^1$ be its horizontal coordinate and $x^2$ its vertical. So $S_n = (S_n^1, S_n^2)$, where $S_n^1 = \sum_{i=1}^{n} \xi_i^1$ and $S_n^2 = \sum_{i=1}^{n} \xi_i^2$.

Lemma 20. Each of $(S_n^1,\ n \in \mathbb{Z}_+)$ and $(S_n^2,\ n \in \mathbb{Z}_+)$ is a random walk. The two random walks are not independent.

Proof. Since $\xi_1, \xi_2, \ldots$ are independent, so are $\xi_1^1, \xi_2^1, \ldots$, and $(S_n^1)$ is a sum of i.i.d. random variables and hence a random walk. They are also identically distributed, with
$$P(\xi_n^1 = 0) = P(\xi_n = (0, 1)) + P(\xi_n = (0, -1)) = 1/2,$$
$$P(\xi_n^1 = 1) = P(\xi_n = (1, 0)) = 1/4, \qquad P(\xi_n^1 = -1) = P(\xi_n = (-1, 0)) = 1/4.$$
(It is not a simple random walk because there is a possibility that the increment takes the value 0.) The same argument applies to $(S_n^2,\ n \in \mathbb{Z}_+)$, showing that it, too, is a random walk. However, the two stochastic processes are not independent: for each $n$, the random variables $\xi_n^1, \xi_n^2$ are not independent, simply because if one takes value 0 the other is nonzero.

However, consider a change of coordinates: rotate the axes by $45°$ (and scale them), i.e. map $(x^1, x^2)$ into $(x^1 + x^2,\ x^1 - x^2)$. So let
$$R_n = (R_n^1, R_n^2) := (S_n^1 + S_n^2,\ S_n^1 - S_n^2).$$
We have:

Lemma 21. $(R_n,\ n \in \mathbb{Z}_+)$ is a random walk in two dimensions. Each of $(R_n^1,\ n \in \mathbb{Z}_+)$ and $(R_n^2,\ n \in \mathbb{Z}_+)$ is a random walk in one dimension, and the last two are independent.

Proof. We have $R_n^1 = \sum_{i=1}^{n} (\xi_i^1 + \xi_i^2)$ and $R_n^2 = \sum_{i=1}^{n} (\xi_i^1 - \xi_i^2)$. So $R_n = \sum_{i=1}^{n} \eta_i$, where the vector $\eta_n = (\eta_n^1, \eta_n^2) := (\xi_n^1 + \xi_n^2,\ \xi_n^1 - \xi_n^2)$. The sequence of vectors $\eta_1, \eta_2, \ldots$ is i.i.d., being obtained by taking the same function on the elements of the i.i.d. sequence $\xi_1, \xi_2, \ldots$. Hence $(R_n,\ n \in \mathbb{Z}_+)$ is a random walk in two dimensions, and a similar argument shows that each of $(R_n^1)$, $(R_n^2)$ is a random walk in one dimension. The possible values of each of $\eta_n^1, \eta_n^2$ are $\pm 1$. We have
$$P(\eta_n^1 = 1,\ \eta_n^2 = 1) = P(\xi_n = (1, 0)) = 1/4,$$
$$P(\eta_n^1 = -1,\ \eta_n^2 = -1) = P(\xi_n = (-1, 0)) = 1/4,$$
$$P(\eta_n^1 = 1,\ \eta_n^2 = -1) = P(\xi_n = (0, 1)) = 1/4,$$
$$P(\eta_n^1 = -1,\ \eta_n^2 = 1) = P(\xi_n = (0, -1)) = 1/4.$$
Hence $P(\eta_n^1 = 1) = P(\eta_n^1 = -1) = 1/2$ and $P(\eta_n^2 = 1) = P(\eta_n^2 = -1) = 1/2$. And so
$$P(\eta_n^1 = \varepsilon,\ \eta_n^2 = \varepsilon') = P(\eta_n^1 = \varepsilon)\, P(\eta_n^2 = \varepsilon') \qquad \text{for all choices of } \varepsilon, \varepsilon' \in \{-1, +1\},$$
showing that $\eta_n^1, \eta_n^2$ are independent for each $n$; hence the two one-dimensional random walks are independent.

Lemma 22.
$$P(S_{2n} = 0) = \Big(\binom{2n}{n} 2^{-2n}\Big)^2.$$

Proof. Since $S_n = 0$ if and only if $R_n = 0$, we have
$$P(S_{2n} = 0) = P(R_{2n} = 0) = P(R_{2n}^1 = 0,\ R_{2n}^2 = 0) = P(R_{2n}^1 = 0)\, P(R_{2n}^2 = 0),$$
where the last equality follows from the independence of the two random walks. But $(R_n^1)$ is a simple symmetric random walk, so $P(R_{2n}^1 = 0) = \binom{2n}{n} 2^{-2n}$; the same is true for $(R_n^2)$.

Corollary 24. The two-dimensional simple symmetric random walk is recurrent.

Proof. By Lemma 5, we need to show that $\sum_{n=0}^{\infty} P(S_n = 0) = \infty$. But, by the previous lemma and by Stirling's approximation,
$$P(S_{2n} = 0) \sim \frac{c}{n}, \qquad \text{as } n \to \infty,$$
where $c$ is a positive constant. Since $\sum_n 1/n = \infty$, the claim follows.
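Lemma 22 is easy to test numerically, as in the sketch below; the exact values come from the product formula of the lemma, and the sample size is an arbitrary choice.

    import random
    from math import comb

    def prob_return_2d(n_steps, trials, rng=random):
        moves = [(0, 1), (0, -1), (1, 0), (-1, 0)]
        hits = 0
        for _ in range(trials):
            x = y = 0
            for _ in range(n_steps):
                dx, dy = rng.choice(moves)
                x, y = x + dx, y + dy
            hits += (x == 0 and y == 0)
        return hits / trials

    for n in (1, 2, 3):
        exact = (comb(2 * n, n) / 4**n) ** 2
        print(f"2n = {2*n}: simulated {prob_return_2d(2*n, 200_000):.4f}, exact {exact:.4f}")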

35 Skorokhod embedding*

We take a closer look at the gambler's ruin problem for a simple symmetric random walk in one dimension. Recall, from Theorem 7, that for a simple symmetric random walk (started from $S_0 = 0$) and $a < 0 < b$,
$$P(T_a < T_b) = \frac{b}{b-a}, \qquad P(T_b < T_a) = \frac{-a}{b-a}.$$
We can read this as follows:
$$P(S_{T_a \wedge T_b} = a) = \frac{b}{b-a}, \qquad P(S_{T_a \wedge T_b} = b) = \frac{-a}{b-a}.$$
This last display tells us the distribution of $S_{T_a \wedge T_b}$. Note, in particular, that $E\, S_{T_a \wedge T_b} = 0$.

Conversely, given a 2-point probability distribution $(p, 1-p)$, we can always find integers $a, b$, $a < 0 < b$, such that $pa + (1-p)b = 0$. This means that we can simulate the distribution by generating a simple symmetric random walk starting from 0 and watching it till the first time it exits the interval $[a, b]$: the probability it exits it from the left point is $p$, and the probability it exits from the right point is $1 - p$. Indeed, if $p = M/N$, $M < N$, $M, N \in \mathbb{N}$, i.e. such that $p$ is rational, choose, e.g., $a = M - N$, $b = M$.

How about more complicated distributions? Let us suppose we are given a probability distribution $(p_i,\ i \in \mathbb{Z})$ with zero mean: $\sum_{i \in \mathbb{Z}} i p_i = 0$. Can we find a stopping time $T$ such that $S_T$ has the given distribution? The Skorokhod embedding theorem tells us how to do this.

Theorem 43. Let $(S_n)$ be a simple symmetric random walk and $(p_i,\ i \in \mathbb{Z})$ a given probability distribution on $\mathbb{Z}$ with zero mean: $\sum_i i p_i = 0$. Then there exists a stopping time $T$ such that
$$P(S_T = i) = p_i, \qquad i \in \mathbb{Z}.$$

Proof. Define a pair of random variables $(A, B)$ taking values in $\{(0, 0)\} \cup \{(i, j) : i < 0 < j\}$ with the following distribution:
$$P(A = i,\ B = j) = c^{-1} (j - i)\, p_i p_j, \qquad i < 0 < j,$$
$$P(A = 0,\ B = 0) = p_0,$$
where $c^{-1}$ is chosen by normalisation:
$$P(A = 0,\ B = 0) + \sum_{i < 0} \sum_{j > 0} P(A = i,\ B = j) = 1.$$
This leads to
$$c^{-1} \sum_{j > 0} j p_j \sum_{i < 0} p_i + c^{-1} \sum_{i < 0} (-i) p_i \sum_{j > 0} p_j = 1 - p_0.$$
Since $\sum_{i \in \mathbb{Z}} i p_i = 0$, we have
$$\sum_{i < 0} (-i) p_i = \sum_{i > 0} i p_i,$$
and, using this common value, we get that $c$ is precisely
$$c = \sum_{i < 0} (-i) p_i = \sum_{i > 0} i p_i.$$
Letting $T_i = \inf\{n \ge 0 : S_n = i\}$, $i \in \mathbb{Z}$, and assuming that $(A, B)$ are independent of $(S_n)$, we consider the random variable
$$Z = S_{T_A \wedge T_B}.$$
The claim is that $P(Z = k) = p_k$, $k \in \mathbb{Z}$. To see this, suppose first $k = 0$. Then $Z = 0$ if and only if $A = B = 0$, and this has probability $p_0$, by definition.

Suppose next $k > 0$. We have
$$P(Z = k) = P(S_{T_A \wedge T_B} = k) = \sum_{i < 0} \sum_{j > 0} P(S_{T_i \wedge T_j} = k)\, P(A = i,\ B = j) = \sum_{i < 0} P(S_{T_i \wedge T_k} = k)\, P(A = i,\ B = k),$$
because $S_{T_i \wedge T_j} \in \{i, j\}$, so only the terms with $j = k$ contribute. Hence
$$P(Z = k) = \sum_{i < 0} \frac{-i}{k - i}\, c^{-1} (k - i)\, p_i p_k = c^{-1} \sum_{i < 0} (-i) p_i\; p_k = p_k.$$
Similarly, $P(Z = k) = p_k$ for $k < 0$.
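The proof is constructive, so it can be run. The sketch below embeds the zero-mean target distribution $p_{-1} = 1/2$, $p_0 = 1/4$, $p_2 = 1/4$ (an arbitrary choice for illustration) by sampling $(A, B)$ as in the proof and then running the walk until it exits $(A, B)$.

    import random

    p = {-1: 0.5, 0: 0.25, 2: 0.25}               # target distribution; mean is 0
    c = sum(i * pi for i, pi in p.items() if i > 0)   # normalising constant
    pairs, weights = [(0, 0)], [p.get(0, 0.0)]
    for i, pi in p.items():
        if i < 0:
            for j, pj in p.items():
                if j > 0:
                    pairs.append((i, j))
                    weights.append((j - i) * pi * pj / c)

    def sample_Z(rng=random):
        a, b = rng.choices(pairs, weights=weights)[0]
        if a == b == 0:
            return 0
        s = 0
        while a < s < b:                          # run the SSRW until it exits (a, b)
            s += 1 if rng.random() < 0.5 else -1
        return s

    trials = 100_000
    counts = {}
    for _ in range(trials):
        z = sample_Z()
        counts[z] = counts.get(z, 0) + 1
    for k in sorted(p):
        print(f"P(Z = {k}): simulated {counts.get(k, 0)/trials:.4f}, target {p[k]:.4f}")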

PART III: APPENDICES

This section contains some sine qua non elements from standard Mathematics, as taught in high schools or universities. They are not supposed to replace textbooks or basic courses; they merely serve as reminders.

Arithmetic

Arithmetic deals with integers. The set $\mathbb{N}$ of natural numbers is the set of positive integers:
$$\mathbb{N} = \{1, 2, 3, \ldots\}.$$
The set $\mathbb{Z}$ of integers consists of $\mathbb{N}$, their negatives, and zero; zero is denoted by 0:
$$\mathbb{Z} = \{\cdots, -3, -2, -1, 0, 1, 2, 3, \cdots\}.$$
Given integers $a$, $b$, with $b \ne 0$, we say that $b$ divides $a$ if there is an integer $k$ such that $a = kb$. We write this by $b \mid a$. If $a \mid b$ and $b \mid a$ then $a = b$. If $c \mid b$ and $b \mid a$ then $c \mid a$.

Given integers $a$, $b$, with at least one of them nonzero, we can define their greatest common divisor $d = \gcd(a, b)$ as the unique positive integer such that (i) $d \mid a$, $d \mid b$, and (ii) if $c \mid a$ and $c \mid b$, then $c \mid d$. (That it is unique is obvious from the fact that if $d_1$, $d_2$ are two such numbers then $d_1$ would divide $d_2$ and vice-versa, leading to $d_1 = d_2$.) Notice that:
$$\gcd(a, b) = \gcd(a, b - a).$$
Indeed, let $d = \gcd(a, b)$. Then $d \mid a$ and $d \mid b$, so $d \mid b - a$; thus (i) holds for the pair $(a, b - a)$. Now suppose that $c \mid a$ and $c \mid b - a$. Then $c \mid a + (b - a)$, i.e. $c \mid b$. Because $c \mid a$ and $c \mid b$, and $d = \gcd(a, b)$, we have $c \mid d$. So (ii) holds, and we're done.

This gives us an algorithm for finding the gcd between two numbers: Suppose $0 < a < b$ are integers. Replace $b$ by $b - a$. If $b - a > a$, replace $b - a$ by $b - 2a$. Keep doing this until you obtain $b - qa < a$. Then interchange the roles of the numbers and continue. For instance,
$$\gcd(120, 38) = \gcd(82, 38) = \gcd(44, 38) = \gcd(6, 38) = \gcd(6, 32) = \gcd(6, 26) = \gcd(6, 20)$$
$$= \gcd(6, 14) = \gcd(6, 8) = \gcd(6, 2) = \gcd(4, 2) = \gcd(2, 2) = 2.$$
In the first line we subtracted 38 exactly 3 times; 3 is the maximum number of times that 38 "fits" into 120 (indeed, $4 \times 38 > 120$). So we work faster if we also invoke Euclid's algorithm (i.e. the process of long division that one learns in elementary school): we summarise each such run by replacing the largest of the two numbers by the remainder of the division of the largest with the smallest:
$$\gcd(120, 38) = \gcd(6, 38) = \gcd(6, 2) = 2.$$
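The procedure just described is exactly Euclid's algorithm; as a minimal sketch:

    def gcd(a, b):
        # Euclid's algorithm: repeatedly replace the pair by (smaller, remainder)
        while b:
            a, b = b, a % b
        return a

    print(gcd(120, 38))   # 2, as computed in the text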

The theorem of division says the following: Given two positive integers $a$, $b$, with $a > b$, we can find an integer $q$ and an integer $r \in \{0, 1, \ldots, b - 1\}$ such that
$$a = qb + r.$$
The long division algorithm is a process for finding the quotient $q$ and the remainder $r$. For example, to divide 3450 by 26 we perform the long division and obtain
$$3450 = 132 \times 26 + 18,$$
i.e. $q = 132$, $r = 18$. The same logic works in general.

Here are some other facts about the greatest common divisor. First, we have the fact that if $d = \gcd(a, b)$ then there are integers $x$, $y$ such that
$$d = xa + yb.$$
To see why this is true, let us follow the previous example again: $\gcd(120, 38) = \gcd(6, 38) = \gcd(6, 2) = 2$. In the process, we used the divisions
$$120 = 3 \times 38 + 6, \qquad 38 = 6 \times 6 + 2.$$
Working backwards, we have
$$2 = 38 - 6 \times 6 = 38 - 6 \times (120 - 3 \times 38) = 19 \times 38 - 6 \times 120.$$
Thus, $\gcd(38, 120) = 38x + 120y$, with $x = 19$, $y = -6$.

Second, we can show that
$$\gcd(a, b) = \min\{xa + yb : x, y \in \mathbb{Z},\ xa + yb > 0\}.$$
To see this, let $d$ be the right-hand side. Then $d = xa + yb$, for some $x, y \in \mathbb{Z}$, not both zero. Suppose that $c \mid a$ and $c \mid b$. Then $c$ divides any linear combination of $a$, $b$; in particular, it divides $d$. This shows (ii) of the definition of a greatest common divisor. We need to show (i), i.e. that $d \mid a$ and $d \mid b$. Suppose this is not true, i.e. that $d$ does not divide both numbers; say that $d$ does not divide $b$. Since $d > 0$, we can divide $b$ by $d$ to find
$$b = qd + r,$$
for some $1 \le r \le d - 1$. But then $b = q(xa + yb) + r$, yielding
$$r = (-qx) a + (1 - qy) b.$$
So $r$ is expressible as a positive linear combination of $a$ and $b$ and is strictly smaller than $d$ (being a remainder), contradicting the minimality of $d$. Hence $r = 0$, so $d$ divides both $a$ and $b$.

The greatest common divisor of 3 integers can now be defined in a similar manner; indeed, the greatest common divisor of any subset of the integers also makes sense.
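The back-substitution above can be mechanised; the following sketch returns $d = \gcd(a, b)$ together with integers $x$, $y$ such that $d = xa + yb$:

    def extended_gcd(a, b):
        # returns (d, x, y) with d = gcd(a, b) = x*a + y*b
        if b == 0:
            return a, 1, 0
        d, x, y = extended_gcd(b, a % b)
        return d, y, x - (a // b) * y

    d, x, y = extended_gcd(38, 120)
    print(d, x, y)   # 2 19 -6, matching 2 = 19*38 - 6*120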

Relations

A relation on a set $A$ is a mathematical notion that establishes a way of pairing elements of $A$. In general, a relation can be thought of as a set of pairs of elements of the set $A$. For example, the relation $<$ between real numbers can be thought of as consisting of pairs of numbers $(x, y)$ such that $x$ is smaller than $y$.

A relation is called reflexive if it contains the pairs $(x, x)$. The relation $<$ is not reflexive; the equality relation $=$ is reflexive. A relation is called symmetric if it cannot contain a pair $(x, y)$ without containing its symmetric $(y, x)$. The relation $<$ is not symmetric; the equality relation is symmetric. A relation is transitive if, when it contains $(x, y)$ and $(y, z)$, it also contains $(x, z)$. Both $<$ and $=$ are transitive. If we take $A$ to be the set of students in a classroom, then the relation "x is a friend of y" is symmetric; it can be thought of as reflexive (unless we do not want to accept the fact that one may not be a friend of oneself), but is not necessarily transitive.

A relation which possesses all three properties is called an equivalence relation. Any equivalence relation splits a set into equivalence classes: the equivalence class of $x$ contains all $y$ that are related (are equivalent) to $x$. We encountered the equivalence relation "$i$ communicates with $j$" in Markov chain theory.

Functions

A function $f$ from a set $X$ into a set $Y$ is denoted by $f : X \to Y$. The range of $f$ is the set of its values, i.e. the set $\{f(x) : x \in X\}$. A function is called onto (or surjective) if its range is $Y$. A function is called one-to-one (or injective) if two different $x_1, x_2 \in X$ have different values $f(x_1)$, $f(x_2)$. If $B \subset Y$ then $f^{-1}(B)$ consists of all $x \in X$ with $f(x) \in B$. The set of functions from $X$ to $Y$ is denoted by $Y^X$.

Counting

Let $A$, $B$ be finite sets with $n$, $m$ elements, respectively.

The number of functions from $A$ to $B$ (i.e. the cardinality of the set $B^A$) is $m^n$. Indeed, we may think of the elements of $A$ as animals and those of $B$ as names. A function is then an assignment of a name to each animal. (The same name may be used for different animals.) So the first animal can be assigned any of the $m$ available names; so does the second, and so on. Hence the total number of assignments is
$$\underbrace{m \times m \times \cdots \times m}_{n \text{ times}} = m^n.$$
Suppose now we are not allowed to give the same name to different animals. To be able to do this, we need more names than animals: $n \le m$. Since the name of the first animal can be chosen in $m$ ways, that of the second in $m - 1$ ways, and so on, the total number of one-to-one functions from $A$ to $B$ is
$$(m)_n := m(m-1)(m-2) \cdots (m-n+1).$$
In the special case $m = n$, we denote $(n)_n$ also by $n!$; a one-to-one function from the set $A$ into $A$ is called a permutation of $A$, so $n!$ is the total number of permutations of a set of $n$ elements. Incidentally, $n!$ stands for the product of the first $n$ natural numbers:
$$n! = 1 \times 2 \times 3 \times \cdots \times n.$$
Any set $A$ contains several subsets. If $A$ has $n$ elements, the number of subsets with $k$ elements each can be found by designating which of the $n$ elements of $A$ will be chosen. The first element can be chosen in $n$ ways, the second in $n - 1$ ways, and so on, the $k$-th in $n - k + 1$ ways. So we have $(n)_k = n(n-1)(n-2) \cdots (n-k+1)$ ways to pick $k$ elements if we care about their order. If we don't, then we divide by the number $k!$ of permutations of the $k$ elements, concluding that the number of subsets with $k$ elements of a set with $n$ elements equals
$$\frac{(n)_k}{k!} =: \binom{n}{k},$$
where the symbol $\binom{n}{k}$ is merely a name for the number computed on the left side of this equation. Clearly,
$$\binom{n}{k} = \frac{n!}{k!(n-k)!} = \binom{n}{n-k}.$$
Since there is only one subset with no elements (it is the empty set $\emptyset$), we have $\binom{n}{0} = 1$. The total number of subsets of $A$ (regardless of their number of elements) is thus equal to
$$\sum_{k=0}^{n} \binom{n}{k} = (1+1)^n = 2^n,$$
and this follows from the binomial theorem. We thus observe that there are as many subsets in $A$ as the number of elements in the set $\{0, 1\}^n$ of sequences of length $n$ with elements 0 or 1.

110 . b. and we don’t care to put parantheses because the order of ﬁnding a maximum is irrelevant. say. a∨b∨c refers to the maximum of the three numbers a. b). If the pig is not included in the sample. Thus a ∧ b = a if a ≤ b or = b if b ≤ a. n n . we add the two numbers together in order to ﬁnd the total number of ways to pick k animals from n + 1 animals. then there are k − 1 remaining k n animals to be chosen and this can be done in k−1 ways. we deﬁne x m = x(x − 1) · · · (x − m + 1) . b is denoted by a ∧ b = min(a. In particular. by convention. the pig. Since the sentences “the pig is included” and “the pig is not included” cannot be simultaneously true. a+ is called the positive part of a. m! n y If y is not an integer then. The notation extends to more than one numbers. and a− is called the negative part of a. we choose k animals from the remaining n ones and this can be done in n ways. For example. in choosing k animals from a set of n + 1 ones. + k−1 k Indeed. The maximum is denoted by a ∨ b = max(a. Note that −(a ∨ b) = (−a) ∧ (−b) and (a ∧ b) + c = (a + c) ∧ (b + c). Both quantities are nonnegative. a− = − min(a. Note that it is always true that a = a+ − a− . c. consider a speciﬁc animal. we use the following notations: a+ = max(a. 0). Maximum and minimum The minimum of two numbers a. b). k Also. 0). by convention. The absolute value of a is deﬁned by |a| = a+ + a− . we let The Pascal triangle relation states that n+1 k = = 0. If the pig is included.We shall let n = 0 if n < k. if x is a real number.

Indicator function

$1_A$ stands for the indicator function of a set $A$, meaning a function that takes value 1 on $A$ and 0 on the complement $A^c$ of $A$. We also use the notation $1(\text{clause})$ for a number that is 1 whenever the clause is true and 0 otherwise. It is worth remembering that
$$1_{A \cap B} = 1_A 1_B, \qquad 1_{A \cup B} = 1_A + 1_B - 1_{A \cap B}, \qquad 1 - 1_A = 1_{A^c}.$$
This is very useful notation because conjunction of clauses is transformed into multiplication.

Several quantities of interest are expressed in terms of indicators. For instance, if $A_1, A_2, \ldots$ are sets or clauses, then the number $\sum_{n \ge 1} 1_{A_n}$ is the number of true clauses. Thus, if I go to a bank $N$ times and $A_i$ is the clause "I am robbed the $i$-th time", then $\sum_{i=1}^{N} 1_{A_i}$ is the total number of times I am robbed.

If $A$ is an event then $1_A$ is a random variable. It takes value 1 with probability $P(A)$ and 0 with probability $P(A^c)$. Therefore its expectation is
$$E 1_A = 1 \times P(A) + 0 \times P(A^c) = P(A).$$
An example of its application: Since
$$1_{A \cup B} = 1 - 1_{(A \cup B)^c} = 1 - 1_{A^c \cap B^c} = 1 - 1_{A^c} 1_{B^c} = 1 - (1 - 1_A)(1 - 1_B) = 1_A + 1_B - 1_A 1_B = 1_A + 1_B - 1_{A \cap B},$$
we have
$$E[1_{A \cup B}] = E[1_A] + E[1_B] - E[1_{A \cap B}],$$
which means
$$P(A \cup B) = P(A) + P(B) - P(A \cap B).$$
Another example: Let $X$ be a nonnegative random variable with values in $\mathbb{N} = \{1, 2, \ldots\}$. Then
$$X = \sum_{n=1}^{X} 1 \qquad \text{(the sum of $X$ ones is $X$)},$$
so
$$X = \sum_{n \ge 1} 1(X \ge n),$$
and so
$$E[X] = \sum_{n \ge 1} E\, 1(X \ge n) = \sum_{n \ge 1} P(X \ge n).$$

Matrices

Let $\mathbb{R}^d$ be the set of vectors with $d$ components. A linear function $\varphi$ from $\mathbb{R}^n$ to $\mathbb{R}^m$ is such that
$$\varphi(\alpha x + \beta y) = \alpha \varphi(x) + \beta \varphi(y), \qquad \text{for all } x, y \in \mathbb{R}^n \text{ and all } \alpha, \beta \in \mathbb{R}.$$

Let $u_1, \ldots, u_n$ be a basis for $\mathbb{R}^n$. Then any $x \in \mathbb{R}^n$ can be uniquely written as
$$x = \sum_{j=1}^{n} x_j u_j,$$
where $x_j \in \mathbb{R}$. The column $(x_1, \ldots, x_n)^{\mathsf T}$ is a column representation of $x$ with respect to the given basis. Then, because $\varphi$ is linear,
$$y := \varphi(x) = \sum_{j=1}^{n} x_j \varphi(u_j).$$
But the $\varphi(u_j)$ are vectors in $\mathbb{R}^m$. So if we choose a basis $v_1, \ldots, v_m$ in $\mathbb{R}^m$ we can express each $\varphi(u_j)$ as a linear combination of the basis vectors:
$$\varphi(u_j) = \sum_{i=1}^{m} a_{ij} v_i.$$
Combining, we have
$$\varphi(x) = \sum_{i=1}^{m} v_i \sum_{j=1}^{n} a_{ij} x_j.$$
Let
$$y_i := \sum_{j=1}^{n} a_{ij} x_j, \qquad i = 1, \ldots, m.$$
Then the column $(y_1, \ldots, y_m)^{\mathsf T}$ is a column representation of $y = \varphi(x)$ with respect to the basis $v_1, \ldots, v_m$. The matrix $A$ of $\varphi$ with respect to the choice of the two bases is defined by
$$A := \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \cdots & \cdots & \cdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix};$$
it is an $m \times n$ matrix. The relation between the column representation of $y$ and the column representation of $x$ is written as
$$\begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix} = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \cdots & \cdots & \cdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.$$
Suppose that $\psi$, $\varphi$ are linear functions:
$$\mathbb{R}^n \xrightarrow{\ \psi\ } \mathbb{R}^m \xrightarrow{\ \varphi\ } \mathbb{R}^\ell.$$
Then $\varphi \circ \psi$ is a linear function:
$$\mathbb{R}^n \xrightarrow{\ \varphi \circ \psi\ } \mathbb{R}^\ell.$$

Fix three sets of bases on $\mathbb{R}^n$, $\mathbb{R}^m$ and $\mathbb{R}^\ell$, and let $A$, $B$ be the matrix representations of $\varphi$, $\psi$, respectively, with respect to these bases. Then the matrix representation of $\varphi \circ \psi$ is $AB$, where
$$(AB)_{ij} = \sum_{k=1}^{m} A_{ik} B_{kj}, \qquad 1 \le i \le \ell,\ 1 \le j \le n.$$
So $A$ is a $\ell \times m$ matrix, $B$ is a $m \times n$ matrix, and $AB$ is a $\ell \times n$ matrix.

Usually, the choice of the basis is the so-called standard basis. The standard basis in $\mathbb{R}^d$ consists of the vectors
$$e_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix},\quad e_2 = \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix},\quad \ldots,\quad e_d = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix};$$
in other words, the $j$-th component of $e_i$ is $1(i = j)$. A matrix $A$ is called square if it is a $n \times n$ matrix. If we fix a basis in $\mathbb{R}^n$, a square matrix represents a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^n$, i.e. it is its matrix representation with respect to the fixed basis.

Supremum and infimum

When we have a set of numbers $I$ (for instance, a sequence $a_1, a_2, \ldots$ of numbers), the minimum of $I$ is denoted by $\min I$ and refers to a number $c \in I$ such that $c \le x$ for all the numbers $x \in I$. Similarly, we define $\max I$. This minimum (or maximum) may not exist: for instance, $I = \{1, 1/2, 1/3, \ldots\}$ has no minimum. A property of real numbers says that if the set $I$ is bounded from below, then the maximum of all lower bounds exists, a lower bound being a number $y$ with the property $y \le x$ for all $x \in I$; it is this that we call the infimum of $I$. In these cases we use the notation $\inf I$, standing for the "infimum" of $I$. Likewise, we define the supremum $\sup I$ to be the minimum of all upper bounds. If $I$ has no lower bound then $\inf I = -\infty$. If $I$ has no upper bound then $\sup I = +\infty$. If $I$ is empty then, by convention, $\inf \emptyset = +\infty$, $\sup \emptyset = -\infty$. $\inf I$ is characterised by: $\inf I \le x$ for all $x \in I$ and, for any $\varepsilon > 0$, there is $x \in I$ with $x \le \varepsilon + \inf I$. Likewise, $\sup I$ is characterised by: $\sup I \ge x$ for all $x \in I$ and, for any $\varepsilon > 0$, there is $x \in I$ with $x \ge \sup I - \varepsilon$.

Limsup and liminf

If $x_1, x_2, \ldots$ is a sequence of numbers then
$$\inf\{x_1, x_2, \ldots\} \le \inf\{x_2, x_3, \ldots\} \le \inf\{x_3, x_4, \ldots\} \le \cdots$$
and, since any nondecreasing sequence has a limit (possibly $+\infty$), we take this limit and call it $\liminf x_n$ (limit inferior) of the sequence:
$$\liminf_{n \to \infty} x_n = \lim_{n \to \infty} \inf\{x_n, x_{n+1}, x_{n+2}, \ldots\}.$$
Similarly, we define $\limsup x_n$ (limit superior) as the limit of the nonincreasing sequence $\sup\{x_n, x_{n+1}, \ldots\}$. We note that
$$\liminf_n (-x_n) = -\limsup_n x_n.$$
The sequence $x_n$ has a limit $\ell$ if and only if $\limsup x_n = \liminf x_n = \ell$.

Probability Theory

I do not require knowledge of axiomatic Probability Theory (with Measure Theory) in these notes; I do sometimes "cheat" from a strict mathematical point of view, but not too much. But I would like the reader to keep in mind one thing: Every mathematical system has a set of axioms from which we can prove (useful) theorems, derive formulae, and apply them. Probability Theory is exceptionally simple in terms of its axioms, for (besides $P(\Omega) = 1$) there is essentially one axiom, namely: if $A_1, A_2, \ldots$ are mutually disjoint events, then
$$P\Big(\bigcup_{n \ge 1} A_n\Big) = \sum_{n \ge 1} P(A_n).$$
Everything else follows from this axiom. That is why Probability Theory is very rich; in contrast, Linear Algebra, which has many axioms, is more 'restrictive'.

If the events $A$, $B$ are disjoint then $P(A \cup B) = P(A) + P(B)$. If $A$, $B$ are not disjoint then
$$P(A \cup B) = P(A) + P(B) - P(A \cap B).$$
This is the inclusion-exclusion formula for two events; it can be generalised to any finite number of events.

If the events $A_1, A_2, \ldots$ are increasing with $A = \cup_n A_n$, then $P(A) = \lim_{n \to \infty} P(A_n)$. An application of this is:
$$P(X < \infty) = \lim_{x \to \infty} P(X \le x).$$
In Markov chain theory, we encounter situations where $P(X = \infty)$ may be positive, so it is useful to keep this in mind. Similarly, if $A_1, A_2, \ldots$ are decreasing with $A = \cap_n A_n$, then $P(A) = \lim_{n \to \infty} P(A_n)$. If $A_n$ are events with $P(A_n) = 0$ then $P(\cup_n A_n) = 0$; indeed, $P(\cup_n A_n) \le \sum_n P(A_n)$. Similarly, if $A_n$ are events with $P(A_n) = 1$ then $P(\cap_n A_n) = 1$.

Quite often, to compute $P(A)$, it is useful to find disjoint events $B_1, B_2, \ldots$ such that $\cup_n B_n = \Omega$, in which case we write
$$P(A) = \sum_n P(A \cap B_n) = \sum_n P(A \mid B_n)\, P(B_n).$$
In Markov chain theory, we work a lot with conditional probabilities. Recall that $P(B \mid A)$ is defined by $P(A \cap B)/P(A)$ and is a very motivated definition. Often, to compute $P(A \cap B)$, we read the definition as $P(A \cap B) = P(A)\, P(B \mid A)$. This is generalisable for a number of events:
$$P(A_1 \cap A_2 \cap A_3) = P(A_1)\, P(A_2 \mid A_1)\, P(A_3 \mid A_1 \cap A_2).$$
If we interchange the roles of $A$, $B$ in the definition, we get $P(A \cap B) = P(A)\, P(B \mid A) = P(B)\, P(A \mid B)$, and this is called Bayes' theorem.

If $A$ is an event then $1_A$ or $1(A)$ is the indicator random variable, i.e. the function that takes value 1 on $A$ and 0 on $A^c$. Note that $1_A 1_B = 1_{A \cap B}$ and $1_{A^c} = 1 - 1_A$; also $E 1_A = P(A)$.

Quite often, showing that $P(A) \le P(B)$ is more of a logical problem than a numerical one: we need to show that $A$ implies $B$ (i.e. $A$ is a subset of $B$). Similarly, to show that $EX \le EY$, we often need to argue that $X \le Y$.

Markov's inequality for a positive random variable $X$ states that
$$P(X > x) \le \frac{EX}{x}.$$
This follows from the logical statement $X\,1(X > x) \le X$, which holds since $X$ is positive: taking expectations, $x\, P(X > x) \le E[X\,1(X > x)] \le EX$.

The expectation of a nonnegative random variable $X$ may be defined by
$$EX = \int_0^\infty P(X > x)\, dx.$$
The formula holds for any positive random variable. For a random variable which takes positive and negative values, we define $X^+ = \max(X, 0)$ and $X^- = -\min(X, 0)$, which are both nonnegative, and then define
$$EX = EX^+ - EX^-,$$
which is motivated from the algebraic identity $X = X^+ - X^-$. This definition makes sense only if $EX^+ < \infty$ or if $EX^- < \infty$. In case the random variable is discrete, the formula reduces to the known one: $EX = \sum_x x\, P(X = x)$. In case the random variable is absolutely continuous with density $f$, we have $EX = \int_{-\infty}^{\infty} x f(x)\, dx$. For a random variable that takes values in $\mathbb{N}$ we can write $EX = \sum_{n=1}^{\infty} P(X \ge n)$; this is due to the fact that $P(X \ge n) = E\,1(X \ge n)$.

Generating functions

A generating function of a sequence $a_0, a_1, \ldots$ is an infinite polynomial having the $a_i$ as coefficients:
$$G(s) = a_0 + a_1 s + a_2 s^2 + \cdots.$$
We do not a priori know that this exists for any $s$ other than, obviously, $s = 0$, for which $G(0) = a_0$. Note that $|G(s)| \le \sum_n |a_n| |s|^n$, and this is $\le \sum_n |a_n|$ if $s \in [-1, 1]$. So if $\sum_n |a_n| < \infty$ then $G(s)$ exists for all $s \in [-1, 1]$. If it exists for some $s_1$ then it exists uniformly for all $s$ with $|s| < |s_1|$ (and this needs a proof, taught in calculus courses). For example, the generating function of the geometric sequence $a_n = \rho^n$, $n \ge 0$, is
$$\sum_{n=0}^{\infty} \rho^n s^n = \frac{1}{1 - \rho s}, \qquad \text{as long as } |\rho s| < 1; \qquad (39)$$
note that it exists for $|s| < 1/|\rho|$.

Suppose now that $X$ is a random variable with values in $\mathbb{Z}_+ \cup \{+\infty\}$. The generating function of the sequence
$$p_n = P(X = n), \qquad n = 0, 1, \ldots,$$
is called the probability generating function of $X$:
$$\varphi(s) = E s^X = \sum_{n=0}^{\infty} s^n P(X = n).$$

This is defined for all $-1 < s < 1$. Note that $\varphi(0) = P(X = 0)$, while
$$\varphi(1) = \sum_{n=0}^{\infty} P(X = n) = P(X < +\infty),$$
and this is why $\varphi$ is called a probability generating function. We do allow $X$ to, possibly, take the value $+\infty$, so
$$p_\infty := P(X = \infty) = 1 - \sum_{n=0}^{\infty} p_n;$$
note that $\sum_{n=0}^{\infty} p_n \le 1$. The term $s^\infty p_\infty$ could formally appear in the sum defining $\varphi(s)$; but, since $|s| < 1$, we have $s^\infty = 0$, so it contributes nothing. We can recover the probabilities $p_n$ from the formula
$$p_n = \frac{\varphi^{(n)}(0)}{n!},$$
as follows from Taylor's theorem. Also, we can recover the expectation of $X$ from
$$EX = \lim_{s \uparrow 1} \varphi'(s),$$
something that can be formally seen from
$$\frac{d}{ds} E s^X = E\, \frac{d}{ds} s^X = E X s^{X-1},$$
and by letting $s = 1$.

Generating functions are useful for solving linear recursions. As an example, consider the Fibonacci sequence defined by the recursion
$$x_{n+2} = x_{n+1} + x_n, \qquad n \ge 0,$$
with $x_0 = x_1 = 1$. If we let $X(s) = \sum_{n \ge 0} x_n s^n$ be the generating function of $(x_n,\ n \ge 0)$, then the generating function of $(x_{n+1},\ n \ge 0)$ is
$$\sum_{n \ge 0} x_{n+1} s^n = s^{-1} (X(s) - x_0),$$
and the generating function of $(x_{n+2},\ n \ge 0)$ is
$$\sum_{n \ge 0} x_{n+2} s^n = s^{-2} (X(s) - x_0 - s x_1).$$

From the recursion $x_{n+2} = x_{n+1} + x_n$ we have (and this is where linearity is used) that the generating function of $(x_{n+2},\ n \ge 0)$ equals the sum of the generating functions of $(x_{n+1},\ n \ge 0)$ and $(x_n,\ n \ge 0)$:
$$s^{-2} (X(s) - x_0 - s x_1) = s^{-1} (X(s) - x_0) + X(s).$$
Since $x_0 = x_1 = 1$, we can solve for $X(s)$ and find
$$X(s) = \frac{-1}{s^2 + s - 1}.$$
But the polynomial $s^2 + s - 1$ has two roots:
$$a = \frac{\sqrt 5 - 1}{2}, \qquad b = -\frac{\sqrt 5 + 1}{2}.$$
Hence $s^2 + s - 1 = (s - a)(s - b)$, and so
$$X(s) = \frac{-1}{(s-a)(s-b)} = \frac{1}{a-b}\Big(\frac{1}{s-b} - \frac{1}{s-a}\Big).$$
But (39) tells us that $\rho^n$ has generating function $\frac{1}{1 - \rho s} = \frac{1/\rho}{(1/\rho) - s}$; thus $\frac{1}{(1/\rho) - s}$ is the generating function of $\rho^{n+1}$. This tells us that $\frac{1}{s-b}$ is the generating function of $-(1/b)^{n+1}$ and that $\frac{1}{s-a}$ is the generating function of $-(1/a)^{n+1}$. Hence $X(s)$ is the generating function of
$$x_n = \frac{1}{b-a}\Big((1/b)^{n+1} - (1/a)^{n+1}\Big).$$
Noting that $ab = -1$, so that $1/b = -a$ and $1/a = -b$, we can also write this as
$$x_n = \frac{(-a)^{n+1} - (-b)^{n+1}}{b - a}.$$
This is the solution to the recursion, i.e. the $n$-th term of the Fibonacci sequence. Despite the square roots in the formula, this number is always an integer (it has to be!).
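The closed form can be checked directly against the recursion, as in the following sketch (floating-point arithmetic is used, so the values are rounded to the nearest integer):

    from math import sqrt

    a = (sqrt(5) - 1) / 2
    b = -(sqrt(5) + 1) / 2

    def fib_closed(n):
        # x_n = ((1/b)^{n+1} - (1/a)^{n+1}) / (b - a), as derived above
        return ((1 / b) ** (n + 1) - (1 / a) ** (n + 1)) / (b - a)

    x = [1, 1]
    for n in range(2, 15):
        x.append(x[n - 1] + x[n - 2])
    print(all(round(fib_closed(n)) == x[n] for n in range(15)))   # True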

Distribution or law of a random object

When we speak of the "distribution" or "law" of a random variable $X$ with values in $\mathbb{Z}$ or $\mathbb{R}$ or $\mathbb{R}^n$ or even in a much larger space, we speak of some function which enables us to specify the probabilities of the form $P(X \in B)$, where $B$ is a set of values of $X$. For example, if $X$ is a random variable with values in $\mathbb{R}$, then the so-called distribution function $P(X \le x)$, $x \in \mathbb{R}$, is such an example; but I could very well use the function $P(X > x)$, or the 2-parameter function $P(a < X < b)$. If $X$ is absolutely continuous, I could use its density
$$f(x) = \frac{d}{dx} P(X \le x).$$
All these objects can be referred to as the "distribution" or "law" of $X$. Warning: the verb "specify" should not be interpreted as "compute" in the sense of existence of an analytical formula or algorithm.

We frequently encounter random objects which belong to larger spaces; for example, they could take values which are themselves functions. In Markov chain theory, we encountered the concept of an excursion of a Markov chain from a state $i$. Such an excursion is a random object which should be thought of as a random function. Even the generating function $E s^X$ of a $\mathbb{Z}_+$-valued random variable $X$ is an example, for it does allow us to specify the probabilities $P(X = n)$, $n \in \mathbb{Z}_+$. Specifying the distribution of such an object may be hard, but using or merely talking about such an object presents no difficulty other than overcoming a psychological barrier.

Large products: Stirling's formula

If we want to compute $n! = 1 \times 2 \times \cdots \times n$ for large $n$ then, since the number is so big, it's better to use logarithms:
$$n! = e^{\log n!} = \exp \sum_{k=1}^{n} \log k.$$
Now, since $\log x$ does not vary too much between two successive positive integers $k$ and $k + 1$, it follows that
$$\int_k^{k+1} \log x\, dx \approx \log k,$$
and hence
$$\sum_{k=1}^{n} \log k \approx \int_1^{n} \log x\, dx.$$
Since $\frac{d}{dx}(x \log x - x) = \log x$, it follows that
$$\int_1^n \log x\, dx = n \log n - n + 1 \approx n \log n - n.$$
Hence we have
$$n! \approx \exp(n \log n - n) = n^n e^{-n}.$$
We call this the rough Stirling approximation. It turns out to be a good one. But there are two problems: (a) We do not know in what sense the symbol $\approx$ should be interpreted. (b) The approximation is not good when we consider ratios of products. For example (try it!), the rough Stirling approximation gives that the probability that, in $2n$ fair coin tosses, we have exactly $n$ heads equals $\binom{2n}{n} 2^{-2n} \approx 1$, and this is terrible, because we know that this probability goes to zero as $n \to \infty$.
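Both problems are visible numerically. In the sketch below, the ratio $n!/(n^n e^{-n})$ keeps growing (it behaves like $\sqrt{2\pi n}$), which is exactly the defect that the refined approximation of the next paragraph repairs.

    from math import exp, factorial, log

    def rough_stirling(n):
        # n! ≈ exp(n log n - n) = n^n e^{-n}
        return exp(n * log(n) - n)

    for n in (5, 10, 20, 40):
        print(n, factorial(n) / rough_stirling(n))   # grows like sqrt(2*pi*n)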

To remedy this, we shall prove Stirling's approximation:
$$n! \sim n^n e^{-n} \sqrt{2\pi n},$$
where the symbol $a_n \sim b_n$ means that $a_n / b_n \to 1$, as $n \to \infty$.

The geometric distribution

We say that the random variable $X$ with values in $\mathbb{Z}_+$ is geometric if it has the memoryless property:
$$P(X - n \ge m \mid X \ge n) = P(X \ge m), \qquad m, n \ge 0.$$
If we think of $X$ as the time of occurrence of a random phenomenon, then $X$ is such that knowledge that $X$ has exceeded $n$ does not make the distribution of the remaining time $X - n$ any different from that of $X$. The memoryless property shows that a geometric random variable has a distribution of the form
$$P(X \ge n) = \lambda^n, \qquad n \in \mathbb{Z}_+,$$
where $\lambda = P(X \ge 1)$. We baptise this the geometric($\lambda$) distribution. We have already seen a geometric random variable: the random variable
$$J_0 = \sum_{n=1}^{\infty} 1(S_n = 0),$$
defined in (34). It was shown in Theorem 38 that $J_0$ satisfies the memoryless property; in fact, it was shown there that $P(J_0 \ge k) = P(J_0 \ge 1)^k$, a geometric function of $k$.

Markov's inequality

Arguably this is the most useful inequality in Probability. Surely, you have seen it before, so this little section is a reminder. It is based on the concept of monotonicity. Consider an increasing and nonnegative function $g : \mathbb{R} \to \mathbb{R}_+$. We can capture both properties of $g$ neatly by writing:
$$\text{For all } x, y \in \mathbb{R}, \qquad g(y) \ge g(x)\, 1(y \ge x).$$
Indeed, if $y \ge x$ then $1(y \ge x) = 1$ and the display tells us $g(y) \ge g(x)$, expressing monotonicity. And if $y < x$ then $1(y \ge x) = 0$ and the display reads $g(y) \ge 0$, expressing non-negativity. Let $X$ be a random variable. Then, for all $x$,
$$g(X) \ge g(x)\, 1(X \ge x),$$
and so, by taking expectations of both sides (expectation is an increasing operator),
$$E g(X) \ge g(x)\, P(X \ge x).$$
This completely trivial thought gives the very useful Markov's inequality.

