
**MARKOV CHAINS AND RANDOM WALKS**

Takis Konstantopoulos

Autumn 2009

© Takis Konstantopoulos 2006-2009

Contents

1 Introduction
2 Examples of Markov chains
   2.1 A mouse in a cage
   2.2 Bank account
   2.3 Simple random walk (drunkard's walk)
   2.4 Simple random walk (drunkard's walk) in a city
   2.5 Actuarial chains
3 The Markov property
   3.1 Definition of the Markov property
   3.2 Transition probability and initial distribution
   3.3 Time-homogeneous chains
4 Stationarity
   4.1 Finding a stationary distribution
5 Topological structure
   5.1 The graph of a chain
   5.2 The relation "leads to"
   5.3 The relation "communicates with"
   5.4 Period
6 Hitting times and first-step analysis
7 Gambler's ruin
8 Stopping times and the strong Markov property
9 Regenerative structure of a Markov chain
10 Recurrence and transience
   10.1 First hitting time decomposition
   10.2 Communicating classes and recurrence/transience
11 Positive recurrence
12 Law (=Theorem) of Large Numbers in Probability Theory
13 Law of Large Numbers for Markov chains
   13.1 Functions of excursions
   13.2 Average reward
14 Construction of a stationary distribution
15 Positive recurrence is a class property
16 Uniqueness of the stationary distribution
17 Structure of a stationary distribution
18 Coupling and stability
   18.1 Definition of stability
   18.2 The fundamental stability theorem for Markov chains
   18.3 Coupling
   18.4 Proof of the fundamental stability theorem
   18.5 Terminology
   18.6 Periodic chains
19 Limiting theory for Markov chains*
   19.1 Limiting probabilities for null recurrent states
   19.2 The general case
20 Ergodic theorems
21 Finite chains
22 Time reversibility
23 New Markov chains from old ones*
   23.1 Subsequences
   23.2 Watching a Markov chain when it visits a set
   23.3 Subordination
   23.4 Deterministic function of a Markov chain
24 Applications
   24.1 Branching processes: a population growth application
   24.2 The ALOHA protocol: a computer communications application
   24.3 PageRank (trademark of Google): A World Wide Web application
   24.4 The Wright-Fisher model: an example from biology
   24.5 A storage or queueing application
25 Random walk
26 The simple symmetric random walk in dimension 1: path counting
   26.1 Paths that start and end at specific points
   26.2 Paths that start and end at specific points and do not touch zero at all
27 The simple symmetric random walk in dimension 1: simple probability calculations
   27.1 Distribution after n steps
   27.2 The ballot theorem
   27.3 Some conditional probabilities
   27.4 Some remarkable identities
   27.5 First return to zero
28 The reflection principle for a simple random walk in dimension 1
   28.1 Distribution of hitting time and maximum
29 Urns and the ballot theorem
30 The asymmetric simple random walk in dimension 1
   30.1 First hitting time analysis
   30.2 First return to the origin
   30.3 The distribution of the first return to the origin
31 Recurrence properties of simple random walks
   31.1 Asymptotic distribution of the maximum
   31.2 Return probabilities
   31.3 Total number of visits to a state
32 Duality
33 Amount of time that a SSRW is positive*
34 Simple symmetric random walk in two dimensions
35 Skorokhod embedding*

Preface

These notes were written based on a number of courses I taught over the years in the U.S., Greece and the U.K. They form the core material for an undergraduate course on Markov chains in discrete time. These notes are still incomplete. I plan to add a few more sections:
– On algorithms and simulation
– On criteria for positive recurrence
– On doing stuff with matrices
– On finance applications
– On a few more delicate computations for simple random walks
– Reshape the appendix
– Add more examples
These are things to come.

The notes should be readable by someone who has taken a course in introductory (non-measure-theoretic) probability. I tried to make everything as rigorous as possible while maintaining each step as accessible as possible. A few starred sections should be considered "advanced" and can be omitted at first reading.

The first part is about Markov chains and some applications. The second one is specifically for simple random walks. Of course, one can argue that random walk calculations should be done before the student is exposed to the Markov chain theory. I have tried both and prefer the current ordering. At the end, I have a little mathematical appendix.

There are, of course, dozens of good books on the topic. The only new thing here is that I give emphasis to probabilistic methods as soon as possible. Also, I introduce stationarity before even talking about state classification. I tried to put all terms in blue small capitals whenever they are first encountered. Also, "important" formulae are placed inside a coloured box.

PART I: MARKOV CHAINS

1 Introduction

A Markov chain is a mathematical model of a random phenomenon evolving with time in a way that the past affects the future only through the present. The "time" can be discrete (i.e. the integers), continuous (i.e. the real numbers), or, more generally, a totally ordered set. We are herein constrained by the Syllabus to discuss only discrete-time Markov chains. In the module following this one you will study continuous-time chains.

In Mathematics, a phenomenon which evolves with time in a way that only the present affects the future is called a dynamical system.

An example from Physics is provided by Newton's laws of motion. Suppose that a particle of mass m is moving under the action of a force F in a straight line. For instance, the particle is suspended at the end of a spring and force is produced by extending the spring and letting the system oscillate (Hooke's law). Newton's second law says that the acceleration is proportional to the force: mẍ = F. Here, x = x(t) denotes the position of the particle at time t. Its velocity is ẋ = dx/dt, and its acceleration is ẍ = d²x/dt². Assume that the force F depends only on the particle's position and velocity, i.e. F = f(x, ẋ). The state of the particle at time t is described by the pair X(t) = (x(t), ẋ(t)). It is not hard to see that, for any t, the future trajectory (X(s), s ≥ t) can be completely specified by the current state X(t). In other words, past states are not needed.

An example from Arithmetic of a dynamical system in discrete time is the one which finds the greatest common divisor gcd(a, b) between two positive integers a and b. Recall (from elementary school maths) that gcd(a, b) = gcd(r, b), where r = rem(a, b) is the remainder of the division of a by b. By repeating the procedure, we end up with two numbers, one of which is 0, and the other the greatest common divisor. Formally, we let our "state" be a pair of integers (xn, yn), where xn ≥ yn, initialised by x0 = max(a, b), y0 = min(a, b), and evolving as

xn+1 = yn
yn+1 = rem(xn, yn),    n = 0, 1, 2, . . .

In other words, there is a function F that takes the pair Xn = (xn, yn) into Xn+1 = (xn+1, yn+1). The sequence of pairs thus produced, X0, X1, X2, . . ., has the property that, for any n, the future pairs Xn+1, Xn+2, . . . depend only on the pair Xn.
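To make the arithmetic example concrete, here is a minimal sketch (in Python; my own illustration, not part of the notes) of the gcd dynamical system. The function name and the use of Python's % operator for rem(·, ·) are my choices.

```python
def gcd_chain(a, b):
    """Iterate the deterministic dynamical system X_{n+1} = F(X_n),
    where X_n = (x_n, y_n), x_{n+1} = y_n and y_{n+1} = rem(x_n, y_n)."""
    x, y = max(a, b), min(a, b)
    states = [(x, y)]
    while y != 0:
        x, y = y, x % y          # one step of the dynamical system
        states.append((x, y))
    return states                # the last pair is (gcd(a, b), 0)

print(gcd_chain(1071, 462))      # ends with (21, 0), since gcd(1071, 462) = 21
```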

Other examples of dynamical systems are the algorithms run, say, by the software in your computer. Some of these algorithms are deterministic, but some are stochastic.¹ They are stochastic either because the problem they are solving is stochastic or because the problem is deterministic but "very large", such as finding the determinant of a matrix with 10,000 rows and 10,000 columns or computing the integral of a complicated function of a large number of variables. Indeed, an effective way for dealing with large problems is via the use of randomness. An example of a random algorithm is the Monte Carlo method for the approximate computation of the integral ∫ from a to b of f(x)dx of a complicated function f: suppose that f is positive and bounded above by B. Choose a pair (X0, Y0) of random numbers, a ≤ X0 ≤ b, 0 ≤ Y0 ≤ B, uniformly. Each time, count 1 if Yn < f(Xn) and 0 otherwise. Repeat with (Xn, Yn), for steps n = 1, 2, . . .. Perform this a large number N of times. Count the number of 1's and divide by N. The resulting ratio, multiplied by B(b − a), is an approximation of the integral. Try this on the computer!

¹ Stochastic means random.

Markov chains generalise this concept of dependence of the future only on the present. The generalisation takes us into the realm of Randomness. We will be dealing with random variables, instead of deterministic objects. We will develop a theory that tells us how to describe, analyse, and use those mathematical models which are called Markov chains, via the use of the tools of Probability. We will also see why they are useful and discuss how they are applied. In addition, we will see what kind of questions we can ask and what kind of answers we can hope for.
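The Monte Carlo recipe described above is easy to try on a computer. Below is a minimal sketch (mine, not from the notes), assuming a bound B that dominates f on [a, b]; it returns the hit ratio rescaled by the rectangle area B(b − a).

```python
import random

def monte_carlo_integral(f, a, b, B, N=100_000):
    """Estimate the integral of a positive function f on [a, b],
    assuming 0 <= f(x) <= B there, by uniform sampling in the rectangle."""
    hits = 0
    for _ in range(N):
        x = random.uniform(a, b)      # X_n uniform on [a, b]
        y = random.uniform(0, B)      # Y_n uniform on [0, B]
        if y < f(x):                  # count 1 if the point falls under the graph
            hits += 1
    return (hits / N) * B * (b - a)   # rescale the ratio by the rectangle area

# Example: the integral of x^2 on [0, 1] is 1/3.
print(monte_carlo_integral(lambda x: x * x, 0.0, 1.0, 1.0))
```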

2 Examples of Markov chains

2.1 A mouse in a cage

A mouse is in a cage with two cells, 1 and 2, containing fresh and stinky cheese, respectively.

[Figure: the cage, with two cells labelled 1 and 2.]

A mouse lives in the cage. A scientist's job is to record the position of the mouse every minute. When the mouse is in cell 1 at time n (minutes) then, at time n + 1, it is either still in 1 or has moved to 2. Statistical observations lead the scientist to believe that the mouse moves from cell 1 to cell 2 with probability α = 0.05; it does so regardless of where it was at earlier times. Similarly, it moves from 2 to 1 with probability β = 0.99. We can summarise this information by the transition diagram:

[Transition diagram: states 1 and 2; an arrow from 1 to 2 labelled α, an arrow from 2 to 1 labelled β, and self-loops labelled 1 − α and 1 − β.]

Another way to summarise the information is by the 2×2 transition probability matrix

P = [ 1−α   α  ]  =  [ 0.95  0.05 ]
    [  β   1−β ]     [ 0.99  0.01 ]

Questions of interest:
1. How long does it take for the mouse, on the average, to move from cell 1 to cell 2?
2. How often is the mouse in room 1?

Question 1 has an easy, intuitive, answer: Since the mouse really tosses a coin to decide whether to move or stay in a cell, the first time that the mouse will move from 1 to 2 will have mean 1/α = 1/0.05 ≈ 20 minutes. (This is the mean of the geometric distribution with parameter α.)
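The 20-minute answer can be sanity-checked by simulation. The sketch below (mine, not part of the notes) averages the first passage time from cell 1 to cell 2 over many runs; the parameter names are arbitrary.

```python
import random

def mean_time_1_to_2(alpha=0.05, trials=20_000):
    """Average number of minutes until the mouse first moves from cell 1 to cell 2."""
    total = 0
    for _ in range(trials):
        t = 0
        while True:
            t += 1
            if random.random() < alpha:   # each minute: move 1 -> 2 with probability alpha
                break
        total += t
    return total / trials

print(mean_time_1_to_2())   # close to 1/0.05 = 20
```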

2.2 Bank account

The amount of money in Mr Baxter's bank account evolves, from month to month, as follows:

Xn+1 = max{Xn + Dn − Sn, 0}.

Here, Xn is the money (in pounds) at the beginning of the month, Dn is the money he deposits, and Sn is the money he wants to spend. So if he wishes to spend more than what is available, he can't. Assume that Dn, Sn, n = 1, 2, . . ., are random variables with distributions FD, FS, respectively. Assume also that X0, D0, D1, D2, . . ., S0, S1, S2, . . ., are independent random variables. The information described above by the evolution of the account together with the distributions for Dn and Sn leads to the (one-step) transition probability which is defined by:

px,y := P(Xn+1 = y | Xn = x).

We easily see that if y > 0 then

px,y = P(Dn − Sn = y − x)
     = ∑_{z=0}^∞ P(Sn = z, Dn − Sn = y − x)
     = ∑_{z=0}^∞ P(Sn = z, Dn = z + y − x)
     = ∑_{z=0}^∞ P(Sn = z) P(Dn = z + y − x)
     = ∑_{z=0}^∞ FS(z) FD(z + y − x),

while, if y = 0, we have

px,0 = 1 − ∑_{y=1}^∞ px,y,

something that, of course, in principle, can be computed from FS and FD. The transition probability matrix is

P = [ p0,0  p0,1  p0,2  · · · ]
    [ p1,0  p1,1  p1,2  · · · ]
    [ p2,0  p2,1  p2,2  · · · ]
    [  · · ·         · · ·    ]

and is an infinite matrix.

Questions of interest:
1. Will Mr Baxter's account grow, and if so, how fast?
2. How long will it take (if ever) for the bank account to empty?

We shall develop methods for answering these and other questions.

2.3 Simple random walk (drunkard's walk)

Consider a completely drunk person who walks along a street. Being drunk, he has no sense of direction. So he may move forwards with equal probability that he moves backwards.

[Diagram: the integer positions . . . , −2, −1, 0, 1, 2, 3, . . . with arrows of probability 1/2 to each neighbouring position.]

Questions:
1. If he starts from position 0, how often will he be visiting 0?
2. Are there any places which he will never visit?
3. Are there any places which he will visit infinitely many times?
4. If the pub is located in position 0 and his home in position 100, how long will it take him, on the average, to go from the pub to home?

You might object to the assumption that the person is totally drunk and has no sense of direction. We may want to model such a case by assigning probability p for the drunkard to go one step to the right (and probability 1 − p for him to go one step to the left).
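A small simulation sketch for the walk just described (my own illustration, not from the notes); p is the probability of a step to the right, p = 1/2 being the symmetric case.

```python
import random

def walk_path(p=0.5, steps=1000, start=0):
    """Simulate a simple random walk: step +1 with probability p, else -1."""
    x, path = start, [start]
    for _ in range(steps):
        x += 1 if random.random() < p else -1
        path.append(x)
    return path

path = walk_path()
print("final position:", path[-1], "visits to 0:", path.count(0))
```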

2.4 Simple random walk (drunkard's walk) in a city

Suppose that the drunkard is allowed to move in a city whose streets are laid down in a square pattern:

[Figure: a square street grid; inset: Salvador Dali (1922), "The Drunkard".]

Suppose that the drunkard moves from corner to corner, and let's say he's completely drunk. So he can move east, west, north or south, so we assign probability 1/4 for each move. We may again want to ask similar questions. But, observe, that since there are more degrees of freedom now, there is clearly a higher chance that our man will be lost. These things are, therefore, for now, mathematically imprecise, but they will become more precise in the sequel.

2.5 Actuarial chains

A life insurance company wants to find out how much money to charge its clients. Clearly, the company must have an idea on how long the clients will live. It proposes the following model summarising the state of health of an individual on a monthly basis:

[Transition diagram on the states H (healthy), S (sick), D (dead), with arrows labelled 0.3 (H to S), 0.01 (H to D), 0.8 (S to H) and 0.1 (S to D).]

Thus, there is probability pH,S = 0.3 that the person will become sick, given that he is currently healthy, etc. Note that the diagram omits the remaining probabilities pH,H, pS,S and pD,D. Clearly, pH,H = 1 − pH,S − pH,D, pS,S = 1 − pS,H − pS,D, and pD,D = 1, the reason being that the company does not believe that its clients are subject to resurrection.

Question: What is the distribution of the lifetime (in months) of a currently healthy individual? Clearly, the answer to this is crucial for the policy determination.
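The lifetime question can be explored numerically once the model is written as a matrix. A minimal sketch (mine, not from the notes); the diagonal entries are the ones implied by the diagram as read above, so they are part of my assumptions.

```python
import numpy as np

# Health chain on the states H, S, D, with the numbers from the diagram;
# the self-loop probabilities are filled in so that each row sums to 1.
P = np.array([[0.69, 0.30, 0.01],   # from H
              [0.80, 0.10, 0.10],   # from S
              [0.00, 0.00, 1.00]])  # D is absorbing

Pn = np.eye(3)
for n in range(1, 13):
    Pn = Pn @ P
    # P(lifetime <= n months | currently healthy) = (P^n)[H, D]
    print(n, round(Pn[0, 2], 4))
```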

3 The Markov property

3.1 Definition of the Markov property

Consider a (finite or infinite) sequence of random variables {Xn, n ∈ T}, where T is a subset of the integers. We say that this sequence has the Markov property if, for any n ∈ T, the future process (Xm, m > n, m ∈ T) is independent of the past process (Xm, m < n, m ∈ T), conditionally on Xn. Sometimes we say that {Xn, n ∈ T} is Markov instead of saying that it has the Markov property.

Usually, T = Z+ = {0, 1, 2, 3, . . .} or N = {1, 2, 3, . . .} or the set Z of all integers (positive, zero, or negative). We focus almost exclusively on the first choice. So we will omit referring to T explicitly, unless needed. We will also assume, almost exclusively (unless otherwise said explicitly), that the Xn take values in some countable set S, called the state space. The elements of S are frequently called states. Since S is countable, it is customary to call (Xn) a Markov chain.

Let us now write the Markov property in an equivalent way.

Lemma 1. (Xn, n ∈ Z+) is Markov if and only if, for all n ∈ Z+ and all i0, . . . , in+1 ∈ S,

P(Xn+1 = in+1 | Xn = in, . . . , X0 = i0) = P(Xn+1 = in+1 | Xn = in).

Proof. If (Xn) is Markov then the claim holds by the conditional independence of Xn+1 and (X0, . . . , Xn−1) given Xn. On the other hand, if the claim holds, we can see that by applying it several times we obtain

P(∀ k ∈ [n + 1, n + m] Xk = ik | Xn = in, . . . , X0 = i0) = P(∀ k ∈ [n + 1, n + m] Xk = ik | Xn = in),

for any n, m and any choice of states i0, . . . , in+m, which precisely means that (Xn+1, . . . , Xn+m) and (X0, . . . , Xn−1) are independent conditionally on Xn. This is true for any n and m, and so future is independent of past given the present, viz. the Markov property.

3.2 Transition probability and initial distribution

We therefore see that joint probabilities can be expressed as

P(X0 = i0, X1 = i1, X2 = i2, . . . , Xn = in)
   = P(X0 = i0) P(X1 = i1 | X0 = i0) P(X2 = i2 | X1 = i1) · · · P(Xn = in | Xn−1 = in−1),    (1)

which means that the two functions

μ(i) := P(X0 = i), i ∈ S,    and    pi,j(n, n + 1) := P(Xn+1 = j | Xn = i), i, j ∈ S, n ≥ 0,

will specify all joint distributions²:

P(X0 = i0, X1 = i1, X2 = i2, . . . , Xn = in) = μ(i0) pi0,i1(0, 1) pi1,i2(1, 2) · · · pin−1,in(n − 1, n).

The function μ(i) is called initial distribution. The function pi,j(n, n + 1) is called transition probability from state i at time n to state j at time n + 1. More generally, pi,j(m, n) := P(Xn = j | Xm = i) is the transition probability from state i at time m to state j at time n ≥ m. Of course, pi,j(n, n) = 1(i = j) and, by a result in Probability,

pi,j(m, n) = ∑_{i1∈S} · · · ∑_{in−1∈S} pi,i1(m, m + 1) · · · pin−1,j(n − 1, n),    (2)

which is reminiscent of matrix multiplication. Therefore, remembering a bit of Algebra, we define, for each m ≤ n, the matrix

P(m, n) := [pi,j(m, n)]i,j∈S,

viz. a square matrix whose rows and columns are indexed by S (and this, of course, requires some ordering on S) and whose entries are pi,j(m, n). If |S| = d then we have d × d matrices, but if S is an infinite set, then the matrices are infinitely large.³ In this new notation, we can write (2) as

P(m, n) = P(m, m + 1) P(m + 1, m + 2) · · · P(n − 1, n),

or as

P(m, n) = P(m, ℓ) P(ℓ, n)    if m ≤ ℓ ≤ n,

something that can be called semigroup property or Chapman-Kolmogorov equation.

² And, of course, all probabilities of events associated with the Markov chain.
³ This causes some analytical difficulties. But we will circumvent them by using Probability only.
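For a finite state space the semigroup property is just matrix multiplication, so n-step transition probabilities can be computed by raising the one-step matrix to a power. A small sketch (mine, not from the notes), using the two-state mouse chain as data:

```python
import numpy as np

# One-step transition matrix of the mouse chain (states 1 and 2).
P = np.array([[0.95, 0.05],
              [0.99, 0.01]])

def n_step(P, n):
    """Chapman-Kolmogorov: the n-step transition matrix is the n-th power of P."""
    return np.linalg.matrix_power(P, n)

print(n_step(P, 3))                 # the 3-step transition probabilities
print(n_step(P, 3).sum(axis=1))     # each row still sums to 1
```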

3.3 Time-homogeneous chains

We will specialise to the case of time-homogeneous chains, which, by definition, means that the transition probabilities pi,j(n, n + 1) do not depend on n, and therefore we can simply write

pi,j(n, n + 1) =: pi,j,

and call pi,j the (one-step) transition probability from state i to state j, while

P = [pi,j]i,j∈S

is called the (one-step) transition probability matrix or, simply, transition matrix. Then

P(m, n) = P × P × · · · × P = P^(n−m)    (n − m times),

for all m ≤ n. From now on, unless otherwise specified, when we say "Markov chain" we will mean "time-homogeneous Markov chain".

The matrix notation is useful. For example, let μn := [μn(i) = P(Xn = i)]i∈S be the distribution of Xn, arranged in a row vector. Then we have

μn = μ0 P^n,    (3)

as follows from (1) and the definition of P, and the semigroup property is now even more obvious because

P^(a+b) = P^a P^b, for all integers a, b ≥ 0.

Notational conventions

Unit mass. The distribution that assigns value 1 to a state i and 0 to all other states is denoted by δi. In other words,

δi(j) := 1, if j = i;  0, otherwise.

We call δi the unit mass at i. It corresponds, of course, to picking state i with probability 1. Notice that any probability distribution on S can be written as a linear combination of unit masses. For example, if μ assigns probability p to state i1 and 1 − p to state i2, then μ = pδi1 + (1 − p)δi2.

Starting from a specific state i. A useful convention used for time-homogeneous chains is to write

Pi(A) := P(A | X0 = i).

Thus Pi is the law of the chain when the initial distribution is δi. Similarly, if Y is a real random variable, we write

Ei Y := E(Y | X0 = i) = ∑_y y P(Y = y | X0 = i) = ∑_y y Pi(Y = y).

4 Stationarity

We say that the process (Xn, n = 0, 1, . . .) is stationary if it has the same law as (Xn, n = m, m + 1, . . .), for any m ∈ Z+. In other words, the law of a stationary process does not depend on the origin of time. Clearly, a sequence of i.i.d. random variables provides an example of a stationary process. We now investigate when a Markov chain is stationary. To provide motivation and intuition, let us look at the following example:

Example: Motion on a polygon. Consider a canonical heptagon and let a bug move on its vertices. If at time n the bug is on a vertex, at time n + 1 it moves to one of the adjacent vertices with equal probability. It should be clear that if we initially place the bug at random on one of the vertices, selecting a vertex with probability 1/7, then, at any point of time n, the distribution of the bug will still be the same. Hence the uniform probability distribution is stationary.

Example: the Ehrenfest chain. This is a model of gas distributed between two identical communicating chambers. Imagine there are N molecules of gas (where N is a large number, say N = 10^25) in a metallic chamber with a separator in the middle, dividing it into two identical rooms. The gas is placed partly in room 1 and partly in room 2. Then a hole is opened in the separator and we watch what happens as the molecules diffuse between the two rooms. It is clear that if room 1 contains more gas than room 2 then there is a tendency to observe net motion from 1 to 2, and vice versa. We can model this by saying that the chance that a molecule moves from room 1 to room 2 is proportional to the number of molecules in room 1. Let Xn be the number of molecules in room 1 at time n. Then

pi,i+1 = P(Xn+1 = i + 1 | Xn = i) = (N − i)/N,
pi,i−1 = P(Xn+1 = i − 1 | Xn = i) = i/N.

The question then is: can we distribute the initial number of molecules in a way that the origin of time plays no role? One guess might be to take X0 = N/2. But this will not work, because it is impossible for the number of molecules to remain constant at all times. If we start the chain with X0 = N (all molecules in room 1) then the process (Xn, n ≥ 0) will not be stationary: indeed, as it takes some time before the molecules diffuse from room to room, for a while we will be able to tell how long ago the process started. On the average, we indeed expect that we have N/2 molecules in each room.

Another guess then is: Place each molecule at random either in room 1 or in room 2. If we do so, then the number of molecules in room 1 will be i with probability

π(i) = (N choose i) 2^(−N),    0 ≤ i ≤ N,

(a binomial distribution). We can now verify that if P(X0 = i) = π(i) then P(X1 = i) = π(i) and, hence, P(Xn = i) = π(i) for all n. Indeed,

P(X1 = i) = ∑_j P(X0 = j) pj,i = P(X0 = i − 1) pi−1,i + P(X0 = i + 1) pi+1,i
          = π(i − 1) pi−1,i + π(i + 1) pi+1,i
          = (N choose i−1) 2^(−N) (N − i + 1)/N + (N choose i+1) 2^(−N) (i + 1)/N = · · · = π(i).

(The dots signify some missing algebra, which you should go through by yourselves.)

The example above is typical: We found (by guessing!) some distribution π with the property that if X0 has distribution π then every other Xn also has distribution π. If we can do so, then we have created a stationary Markov chain. But we need to prove this.

Lemma 2. A Markov chain (Xn, n ∈ Z+) is stationary if and only if it is time-homogeneous and Xn has the same distribution as Xℓ for all n and ℓ.

Proof. The only if part is obvious. For the if part, use (1) to see that

P(X0 = i0, X1 = i1, X2 = i2, . . . , Xn = in) = P(Xm = i0, Xm+1 = i1, Xm+2 = i2, . . . , Xm+n = in),

for all m ≥ 0 and i0, . . . , in ∈ S.

In other words, a Markov chain is stationary if and only if the distribution of Xn does not change with n. Since, by (3), the distribution μn at time n is related to the distribution at time 0 by μn = μ0 P^n, we see that the Markov chain is stationary if and only if the distribution μ0 is chosen so that μ0 = μ0 P. Changing notation, we are led to consider:

The stationarity problem: Given [pij], find those distributions π such that

πP = π.    (4)

Such a π is called a stationary or invariant distribution. The equations (4) are called balance equations.
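As a numerical illustration of the balance equations (4), one can check, for a small N, that the binomial distribution found above for the Ehrenfest chain satisfies πP = π. The sketch below is mine, not part of the notes; the value of N is arbitrary.

```python
import numpy as np
from math import comb

N = 10  # a small number of molecules, for the check only
# Ehrenfest transition matrix: p(i, i+1) = (N - i)/N, p(i, i-1) = i/N.
P = np.zeros((N + 1, N + 1))
for i in range(N + 1):
    if i < N:
        P[i, i + 1] = (N - i) / N
    if i > 0:
        P[i, i - 1] = i / N

# Candidate stationary distribution: Binomial(N, 1/2).
pi = np.array([comb(N, i) * 2.0 ** (-N) for i in range(N + 1)])
print(np.allclose(pi @ P, pi))   # True: pi satisfies the balance equations pi P = pi
```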

Eigenvector interpretation: The equation π = πP says that π is a left eigenvector of the matrix P with eigenvalue 1. The matrix P has the eigenvalue 1 always, because ∑_{j∈S} pi,j = 1, which, in matrix notation, can be written as P1 = 1, where 1 is a column whose entries are all 1; and this means that 1 is a (right) eigenvector of P corresponding to the eigenvalue 1. In addition to that, π must be a probability:

∑_{i∈S} π(i) = 1.

Example: card shuffling. Consider a deck of d = 52 cards and shuffle them in the following (stupid) way: pick two cards at random and swap their positions. Keep doing it continuously. We can describe this by a Markov chain which takes values in the set Sd of d! permutations of {1, . . . , d}, where each x ∈ Sd is of the form x = (x1, . . . , xd), where all the xi are distinct. It is easy to see that the stationary distribution is uniform:

π(x) = 1/d!,    x ∈ Sd.

Example: waiting for a bus. After a bus arrives at a bus stop, the next one will arrive in i minutes with probability p(i), i = 1, 2, . . .. The times between successive buses are i.i.d. random variables. Consider the following process: At time n (measured in minutes) let Xn be the time till the arrival of the next bus. Then (Xn, n ≥ 0) is a Markov chain with transition probabilities

pi+1,i = 1,    i ≥ 1,
p1,i = p(i),   i ≥ 1.

The state space is S = N = {1, 2, . . .}. To find the stationary distribution, we write the equations (4):

π(i) = ∑_{j≥1} π(j) pj,i = π(1) p(i) + π(i + 1),    i ≥ 1.

These can easily be solved:

π(i) = π(1) ∑_{j≥i} p(j),    i ≥ 1.

To compute π(1), we use the normalisation condition ∑_{i≥1} π(i) = 1, which gives

π(1) = ( ∑_{i≥1} ∑_{j≥i} p(j) )^(−1).

But here, we must be careful. Who tells us that the sum inside the parentheses is finite? If the sum is infinite, then we run into trouble: indeed, π(1) will be zero, and so each π(i) will be zero. This means that the solution to the balance equations is identically zero, which in turn means that the normalisation condition cannot be satisfied. So we must make an:

ASSUMPTION: ∑_{i≥1} ∑_{j≥i} p(j) < ∞.

Let us work out the sum to see if this assumption can be expressed in more "physical" terms:

∑_{i≥1} ∑_{j≥i} p(j) = ∑_{j≥1} ∑_{i≥1} 1(j ≥ i) p(j) = ∑_{j≥1} j p(j),

and the latter sum is the expected time between two successive bus arrivals. So our assumption really means:

ASSUMPTION ⇐⇒ the expected time between successive bus arrivals is finite.

So, what we have really shown is that this assumption is a necessary and sufficient condition for the existence and uniqueness of a stationary distribution.

Example: bread kneading.* (This is slightly beyond the scope of these lectures and can be omitted.)

Consider an infinite binary vector ξ = (ξ1, ξ2, . . .), i.e. ξi ∈ {0, 1} for all i, and transform it according to the following rule: if ξ1 = 0 then shift to the left to obtain (ξ2, ξ3, . . .); if ξ1 = 1 then shift to the left and flip all bits to obtain (1 − ξ2, 1 − ξ3, . . .). We have thus defined a mapping ϕ from binary vectors into binary vectors. The Markov chain is obtained by applying the mapping ϕ again and again:

ξ(n + 1) = ϕ(ξ(n)).

Here ξ(n) = (ξ1(n), ξ2(n), . . .) is the state at time n. Note that there is nothing random in the transition mechanism of this chain! The only way to instill randomness is by choosing the initial state at random. So let us do so. We can show that if we start with the ξr(0), r = 1, 2, . . ., i.i.d. with

P(ξr(0) = 1) = P(ξr(0) = 0) = 1/2,

then the same will be true for all n:

P(ξr(n) = 1) = P(ξr(n) = 0) = 1/2,    for any r = 1, 2, . . . and any n = 0, 1, 2, . . ..

Hence choosing the initial state at random in this manner results in a stationary Markov chain. The proof is by induction. You can check that, for any n = 0, 1, 2, . . ., any r = 1, 2, . . ., and any ε1, . . . , εr ∈ {0, 1},

P(ξ1(n + 1) = ε1, . . . , ξr(n + 1) = εr)
   = P(ξ1(n) = 0, ξ2(n) = ε1, . . . , ξr+1(n) = εr) + P(ξ1(n) = 1, ξ2(n) = 1 − ε1, . . . , ξr+1(n) = 1 − εr).    (5)

So, if the ξ1(n), ξ2(n), . . . are i.i.d. with P(ξr(n) = 1) = P(ξr(n) = 0) = 1/2, then, from (5), the same is true at n + 1.

Remarks: This is not a Markov chain with countably many states. So it goes beyond our theory. But to even think about it will give you some strength. As for the name of the chain (bread kneading), it can be justified as follows: to each ξ there corresponds a number x between 0 and 1, because ξ can be seen as the binary representation of the real number x. If ξ1 = 0 then x < 1/2 and our shifting rule maps x into 2x. If ξ1 = 1 then x ≥ 1/2, and the rule maps x into 2(1 − x). But the function f(x) = min(2x, 2(1 − x)), applied to the interval [0, 1] (think of [0, 1] as a flexible bread dough arranged in a long rod), kneads the bread, for it stretches it till it doubles in length and then folds it back in the middle.
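The stationarity claim for the kneading map can be probed empirically by truncating the infinite bit vector. This is a rough sketch of mine (not from the notes); the truncation length and number of steps are arbitrary, chosen so that enough bits survive the repeated shifts.

```python
import random

def knead(bits):
    """One step of the map: shift left; if the first bit was 1, also flip all bits."""
    first, rest = bits[0], bits[1:]
    return [1 - b for b in rest] if first == 1 else rest

trials, steps, length = 20_000, 20, 40
count_one = 0
for _ in range(trials):
    bits = [random.randint(0, 1) for _ in range(length)]   # i.i.d. fair bits
    for _ in range(steps):
        bits = knead(bits)
    count_one += bits[0]
print(count_one / trials)   # close to 1/2, as stationarity predicts
```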

Note on terminology: steady-state. When a Markov chain is stationary we refer to it as being in steady-state.

4.1 Finding a stationary distribution

As explained above, a stationary distribution π satisfies

π = πP    (balance equations)
π1 = 1    (normalisation condition).

In other words,

π(i) = ∑_{j∈S} π(j) pj,i,    i ∈ S,
∑_{j∈S} π(j) = 1.

If the state space has d states, then the above are d + 1 equations. But we only have d unknowns. This is OK, because the first d equations are linearly dependent: the sum of both sides equals 1, therefore one of them is always redundant.

Writing the equations in this form is not always the best thing to do. Instead of eliminating equations, we introduce more, expecting to be able to choose, amongst them, a set of d linearly independent equations which can be more easily solved. The convenience is often suggested by the topological structure of the chain, as we will see in examples.

Let us introduce the notion of "probability flow" from a set A of states to its complement under a distribution π:

F(A, Ac) := ∑_{i∈A} ∑_{j∈Ac} π(i) pi,j.

Think of π(i)pi,j as a "current" flowing from i to j. So F(A, Ac) is the total current from A to Ac. We have:

Proposition 1. π is a stationary distribution if and only if π1 = 1 and

F(A, Ac) = F(Ac, A),    for all A ⊂ S.

Proof. If the latter condition holds, then choose A = {i} and see that you obtain the balance equations. Conversely, suppose that the balance equations hold. Fix a set A ⊂ S and write

π(i) = ∑_{j∈S} π(j) pj,i = ∑_{j∈A} π(j) pj,i + ∑_{j∈Ac} π(j) pj,i.

Multiply π(i) by ∑_{j∈S} pi,j (which equals 1):

π(i) ∑_{j∈S} pi,j = ∑_{j∈A} π(j) pj,i + ∑_{j∈Ac} π(j) pj,i.

Now split the left sum also:

∑_{j∈A} π(i) pi,j + ∑_{j∈Ac} π(i) pi,j = ∑_{j∈A} π(j) pj,i + ∑_{j∈Ac} π(j) pj,i.

This is true for all i ∈ S. Summing over i ∈ A, we see that the first term on the left cancels with the first on the right, while the remaining two terms give the equality we seek.

[Schematic representation of the flow balance relations: the current F(A, Ac) out of a set A equals the current F(Ac, A) into it.]

The extra degree of freedom provided by the last result gives us flexibility. One can ask how to choose a minimal complete set of linearly independent equations, but we shall not do this here. Instead, here is an example:

Example: Consider the 2-state Markov chain with p12 = α, p21 = β. (Consequently, p11 = 1 − α, p22 = 1 − β.) Assume 0 < α, β < 1. We have

π(1)α = π(2)β,

which immediately tells us that π(1) is proportional to β and π(2) proportional to α:

π(1) = cβ,    π(2) = cα.

The constant c is found from π(1) + π(2) = 1. That is,

π(1) = β/(α + β),    π(2) = α/(α + β).

If we take α = 0.05, β = 0.99 (as in the mouse example), we find π(1) = 0.99/1.04 ≈ 0.952, π(2) = 0.05/1.04 ≈ 0.048. Thus, in steady state, the mouse spends only 95.2% of the time in room 1, and 4.8% of the time in room 2.

Example: a walk with a barrier. Consider the following chain, where p + q = 1:

[Diagram: states 0, 1, 2, 3, 4, . . .; from each state i ≥ 1 there is an arrow to i + 1 with probability p and to i − 1 with probability q; from 0 there is an arrow to 1 with probability p, otherwise the chain stays at 0.]

The balance equations are:

π(i) = p π(i − 1) + q π(i + 1),    i = 1, 2, . . ..

But we can make them simpler by choosing A = {0, 1, . . . , i − 1}. If i ≥ 1 we have

F(A, Ac) = π(i − 1)p = π(i)q = F(Ac, A).

These are much simpler than the previous ones. In fact, we can solve them immediately. For i ≥ 1,

π(i) = (p/q)^i π(0).

Does this provide a stationary distribution? Yes, provided that ∑_{i=0}^∞ π(i) = 1. This is possible if and only if p < q, i.e. p < 1/2. (If p ≥ 1/2 there is no stationary distribution.)

5 Topological structure

5.1 The graph of a chain

This refers to the structure of the process that does not depend on the exact values of the transition probabilities but only on which of them are positive. This is often cast in terms of a digraph (directed graph), i.e. a pair (S, E) where S is the state space and E ⊂ S × S is defined by

(i, j) ∈ E ⇐⇒ pi,j > 0.

In graph-theoretic terminology, S is a set of vertices, and E a set of directed edges. We can frequently (but not always) draw the graph (as we did in examples) and visualise the set E as the set of all edges with arrows.

5.2 The relation "leads to"

We say that a state i leads to state j if, starting from i, the chain will visit j at some finite time. In other words,

i leads to j (written as i ⇝ j) ⇐⇒ Pi(∃n ≥ 0 Xn = j) > 0.

Notice that this relation is reflexive:

∀i ∈ S,  i ⇝ i.

Furthermore:

Theorem 1. The following are equivalent:
(i) i ⇝ j;
(ii) pi,i1 pi1,i2 · · · pin−1,j > 0 for some n ∈ N and some states i1, . . . , in−1;
(iii) Pi(Xn = j) > 0 for some n ≥ 0.

Proof. If i = j there is nothing to prove. So assume i ≠ j.

• Suppose that i ⇝ j. Since

0 < Pi(∃n ≥ 0 Xn = j) ≤ ∑_{n≥0} Pi(Xn = j),

the sum on the right must have a nonzero term. Hence Pi(Xn = j) > 0 for some n, and so (iii) holds.

• Suppose that (ii) holds. We have

Pi(Xn = j) = ∑_{i1,...,in−1} pi,i1 pi1,i2 · · · pin−1,j,    (6)

where the sum extends over all possible choices of states in S. By assumption, the sum contains a nonzero term. Hence Pi(Xn = j) > 0, and so (iii) holds.

• Suppose now that (iii) holds. Look at the last display again. Since Pi(Xn = j) > 0, we have that the sum is also positive and so one of its terms is positive, meaning that (ii) holds. In addition, (i) holds, because Pi(Xn = j) ≤ Pi(∃m ≥ 0 Xm = j).

Corollary 1. The relation "leads to" is transitive: If i ⇝ j and j ⇝ k then i ⇝ k.

5.3 The relation "communicates with"

Define next the relation

i communicates with j (written as i ⇄ j) ⇐⇒ i ⇝ j and j ⇝ i.

Obviously, this is symmetric (i ⇄ j ⇐⇒ j ⇄ i).

Corollary 2. The relation ⇄ is an equivalence relation.

Proof. Equivalence relation means that it is symmetric, reflexive and transitive. We just observed it is symmetric. It is reflexive and transitive because ⇝ is so.

Just as any equivalence relation, it partitions S into equivalence classes known as communicating classes. The communicating class corresponding to the state i is, by definition, the set of all states that communicate with i:

[i] := {j ∈ S : j ⇄ i}.

So, [i] = [j] if and only if i ⇄ j. Two communicating classes are either identical or completely disjoint.

[Figure: a six-state chain, with states 1, 2, 3, 4, 5, 6.]

In the example of the figure, we see that there are four communicating classes: {1}, {2, 3}, {4, 5}, {6}. The first two classes differ from the last two in character. The class {4, 5} is closed: if the chain goes in there then it never moves out of it. However, the class {2, 3} is not closed.
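For a finite chain the communicating classes can be computed mechanically from the graph of the chain: i ⇄ j exactly when each state can reach the other. A small sketch of mine (not from the notes), using a plain reachability search; the example matrix at the end is hypothetical.

```python
def reachable(P, i):
    """Set of states j reachable from i (including i), following edges with p_xy > 0."""
    seen, stack = {i}, [i]
    while stack:
        x = stack.pop()
        for y, pxy in enumerate(P[x]):
            if pxy > 0 and y not in seen:
                seen.add(y)
                stack.append(y)
    return seen

def communicating_classes(P):
    """Partition the states of a finite chain into communicating classes."""
    n = len(P)
    reach = [reachable(P, i) for i in range(n)]
    classes = []
    for i in range(n):
        cls = {j for j in range(n) if j in reach[i] and i in reach[j]}
        if cls not in classes:
            classes.append(cls)
    return classes

# A 4-state example: 0 and 1 communicate, 2 is absorbing, 3 leads to 2.
P = [[0.5, 0.5, 0.0, 0.0],
     [0.9, 0.0, 0.1, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.5, 0.5]]
print(communicating_classes(P))   # [{0, 1}, {2}, {3}]
```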

. To put it otherwise. C1 C2 C3 C4 C5 The general structure of a Markov chain is indicated in this ﬁgure. its transition matrix P) is called irreducible. We then can ﬁnd states i1 . j∈C for all i ∈ C. The classes C4 . Since i ∈ C. There can be arrows between two nonclosed communicating classes but they are always in the same direction. more manageable.i2 · · · pin−1 . we have i1 ∈ C. Note: An inessential state i is such that there exists a state j such that i j but j i. starting from any state. In graph terms. Suppose ﬁrst that C is a closed communicating class. In other words still. . A general Markov chain can have an inﬁnite number of communicating classes.i = 1 then we say that i is an absorbing state. If a state i is such that pi. The communicating classes C1 . . Note that there can be no arrows between closed communicating classes. pi1 . The converse is left as an exercise.j > 0. A communicating class C is closed if and only if the following holds: If i ∈ C and if i j then j ∈ C. Similarly. the single-element set {i} is a closed communicating class.i1 > 0. 17 . Closed communicating classes are particularly important because they decompose the chain into smaller. If we remove the non-closed communicating classes then we obtain a collection of disjoint closed communicating classes with no arrows between them. If all states communicate with all others. Thus.i1 pi1 . and so i2 ∈ C. Otherwise. in−1 such that pi. then the chain (or. parts. Proof. and so on. the state is inessential. we say that a set of states C ⊂ S is closed if pi. Lemma 3. pi. . the state space S is a closed communicating class. we can reach any other state. Note: If we consider a Markov chain with transition matrix P and ﬁx n ∈ N then the Markov chain with transition matrix Pn has exactly the same closed communicating classes. we conclude that j ∈ C. more precisely. Why? A state i is essential if it belongs to a closed communicating class. C5 in the ﬁgure above are closed communicating classes. C3 are not closed. C2 . and C is closed.More generally. Suppose that i j.j = 1.i2 > 0.

pii (ℓn) ≥ pii (n) ℓ . X3 ∈ C2 . all states form a single closed communicating class. Another way to say this is: (n) (m) If the integer n divides m (denote this by: n | m) and if pii > 0 then pii > 0. If i j then d(i) = d(j). Since Pm+n = Pm Pn . So. Dj = {n ∈ (n) (α) N : pjj > 0}. whether it is possible to return to i in m steps can be decided by one of the integer divisors of m. . d(i) = gcd Di . 3} at some point of time then. If i j there is α ∈ N with pij > 0 and if 18 . . Theorem 2 (the period is a class property). (m) (n) for all m. So if the chain starts with X0 ∈ C1 then we know that X1 ∈ C2 . j. Notice however. (Can you ﬁnd an example of a chain with period 3?) We will now give the deﬁnition of the period of a state of an arbitrary chain. . that we can further simplify the behaviour of the chain by noticing that if the chain is in the set C1 = {1. Therefore if pii > 0 then pii (n) (ℓn) > 0 for all ℓ ∈ N. X4 ∈ C1 . i. The chain alternates between the two sets.5. A state i is called aperiodic if d(i) = 1. We say that the chain has period 2. The period of an essential state is deﬁned as the greatest common divisor of all natural numbers n with such that it is possible to return to i in n steps: d(i) := gcd{n ∈ N : Pi (Xn = i) > 0} . we have So pii (2n) (n) pij ≥ pii (n) 2 (m+n) ≥ pik pkj . . n ≥ 0. i. ℓ ∈ N. d(j) = gcd Dj . 4} at the next step and vice versa. j ∈ S.4 Period Consider the following chain: 1 2 4 3 The chain is irreducible. it will move to the set C2 = {2. X2 ∈ C1 . The period of an inessential state is not deﬁned.e. Consider two distinct essential states i. Let us denote by pij the entries of Pn . more generally. (n) Proof. and. Let Di = {n ∈ N : pii > 0}.

If i is an aperiodic state then there exists n0 such that pii > 0 for all n ≥ n0 . C1 . . . (n) (n) (α+n+β) (α) (n) (β) (β) (α+β) (α) (β) This ﬁgure shows the internal structure of a closed communicating class with period d = 4. are irreducible chains. . this is the content of the following theorem. Then pii > 0 and so pjj ≥ pij pii pji > 0. i. if d = 1. Therefore n = qn1 + r(n2 − n1 ) = (q − r)n1 + rn2 . d − 1. . Then we can uniquely partition the state space S into d disjoint subsets C0 . . So d(i) = d(j). . Consider an irreducible chain with period d. Therefore d(j) | α + β + n for all n ∈ Di . showing that α + n + β ∈ Dj . j∈Cr+1 i ∈ Cr . More generally. From the last two displays we get d(j) | n for all n ∈ Di . j ∈ S. all states communicate with one another. An irreducible chain has period d if one (and hence all) of the states have period d. r = 0. Therefore d(j) | α + β. we have n ∈ Di as well. Divide n by n1 to obtain n = qn1 + r. In particular. if an irreducible chain has period d then we can decompose the state space into d sets C0 . Theorem 4. (Here Cd := C0 . Arguing symmetrically. Proof. Of particular interest. Cd−1 such that the chain moves cyclically between them.) 19 . Pick n2 . Now let n ∈ Di . the chain is called aperiodic. . Formally. Because n is suﬃciently large. where the remainder r ≤ n1 − 1. .j i there is β ∈ N with pji > 0. showing that α + β ∈ Dj . 1. So pjj ≥ pij pji > 0. Since n1 . chains where.e. An irreducible chain with ﬁnitely many states is aperiodic if and only if there (n) exists an n such that pij > 0 for all i. we obtain d(i) ≤ d(j) as well. So d(j) is a divisor of all elements of Di . . . Theorem 3. d(j) | d(i)). n1 ∈ Di such that n2 − n1 = 1. Cd−1 such that pij = 1. Let n be suﬃciently large. . C1 . Corollary 3. n2 ∈ Di . q − r > 0. Since d(i) is the greatest common divisor we have d(i) ≥ d(j) (in fact. If a number divides two other numbers then it also divides their diﬀerence. as deﬁned earlier. .

Deﬁne the relation i d j ⇐⇒ pij (nd) > 0 for some n ≥ 0. We now show that if i belongs to one of these classes and if pij > 0 then.. j belongs to the next class. Such a path is possible because of the choice of the i0 . We use a method that is based upon considering what the Markov chain does at time 1. and by the assumptions that d d j. Hence it partitions the state space S into d disjoint equivalence classes. i ∈ C0 . Take. i1 . We show that these classes satisfy what we need. . . i1 . with corresponding classes C0 . Then pick i1 such that pi0 i1 > 0. Continue in this manner and deﬁne states i0 . say. Pick a state i0 and let C0 be its equivalence class (i. . i2 i0 i0 in a number of steps which is an integer multiple of d − 1 (why?). The existence of such a path implies that it is possible to go from i0 to i.id > 0. Cd−1 . which contradicts the deﬁnition of d. necessarily..e. It is easy to see that if we continue and pick a further state id with pid−1 . id−1 d 6 Hitting times and ﬁrst-step analysis Consider a Markov chain with transition probability matrix P. The reader is warned to be alert as to which of the two variables we are considering at each time.e the set of states j such that i j).. consider the chain 1 1/3 1 0 4 1/2 1 1/6 2 We shall later consider the time inf{n ≥ 1 : Xn ∈ A} which diﬀers from TA simply by considering n ≥ 1 instead of n ≥ 0. necessarily. If X0 ∈ A the two times coincide. As an example. . id ∈ C0 . j ∈ C2 . for example. Consider the path i0 → i → j → i2 → i3 → · · · → id → i0 .. Assume d > 1. i. Suppose pij > 0 but j ∈ C1 but. Notice that this is an equivalence relation... after it takes one step from its current position. that is why we call it “first-step analysis”. We deﬁne the hitting time4 of a set of states A by TA := inf{n ≥ 0 : Xn ∈ A}. C1 . 20 .Proof. We avoid excessive notation by using the same letter for both. Let C1 be the equivalence class of i1 . .. We are interested in deriving formulae for the probability that this time is ﬁnite as well as the expectation of this time. then.

′ Proof. then TA ≥ 1. if ϕ′ (i) is any other solution of these equations then ϕ′ (i) ≥ ϕ(i) for all i ∈ S. ﬁx a set of states A. let ϕ′ (i) be another solution. But the Markov chain is homogeneous. conditionally on X1 . . Furthermore.).It is clear that P1 (T0 < ∞) < 1. The function ϕ(i) satisﬁes ϕ(i) = 1. Therefore. we have ϕ′ (i) = j∈A pij + j∈A = j∈A pij + j∈A pij pij pjk + k∈A k∈A pjk + j∈A pij pjk ϕ′ (k) k∈A pjk ϕ′ (k) k∈A 21 . because the chain may end up in state 2. By self-feeding this equation. Then ϕ′ (i) = j∈A pij + j∈A pij ϕ′ (j). Hence: ′ ′ P (TA < ∞|X1 = j. and deﬁne ϕ(i) := Pi (TA < ∞). For the second part of the proof. But what exactly is this probability equal to? To answer this in its generality. If i ∈ A. and so ϕ(i) = 1. So TA = 1 + TA . . We then have: Theorem 5. X2 . i ∈ A.e. ϕ(i) = j∈S i∈A pij ϕ(j). ′ is the remaining time until A is hit). a ′ function of X1 . We ﬁrst have (where TA Pi (TA < ∞) = j∈S ′ Pi (1 + TA < ∞|X1 = j)Pi (X1 = j) = j∈S ′ Pi (TA < ∞|X1 = j)pij ′ But observe that the random variable TA is a function of the future after time 1 (i. . we have Pi (TA < ∞) = ϕ(j)pij . X0 = i) = P (TA < ∞|X1 = j). j∈S as needed. If i ∈ A then TA = 0. which implies that ′ P (TA < ∞|X1 = j) = P (TA < ∞|X0 = j) = Pj (TA < ∞) = ϕ(j). Combining the above. the event TA is independent of X0 .

2. (If the chain starts from 0 it takes no time to hit 0. 2}. The function ψ(i) satisﬁes ψ(i) = 0. which is the second of the equations. the second equals Pi (TA = 2). The second part of the proof is omitted. ′ E i TA = 1 + E i TA = 1 + j∈S pij Ej TA = 1 + j∈A pij ψ(j). and ϕ(2) = 0. Clearly. By omitting the last term we obtain ϕ′ (i) ≥ Pi (TA ≤ 2). Therefore. We immediately have ϕ(0) = 1. Example: Continuing the previous example. By continuing self-feeding the above equation n times. 3 2 so ϕ(1) = 3/4. ψ(i) = 1 + i∈A pij ψ(j). Example: In the example of the previous ﬁgure. On the other hand. the set that contains only state 0. ∼◦∼◦∼◦∼◦∼◦∼◦∼◦∼ Next consider the mean time (mean number of steps) until the set A is hit for the ﬁrst time. Furthermore. obviously.) As for ϕ(1). We let ϕ(i) = Pi (T0 < ∞). We have: Theorem 6. If ′ i ∈ A. let A = {0. If i ∈ A then. i = 0. we obtain ϕ′ (i) ≥ Pi (TA < ∞) = ϕ(i). and let ψ(i) = Ei TA . because it takes no time to hit A if the chain starts from A. Start the chain from i. so the ﬁrst two terms together equal Pi (TA ≤ 2). as above. Proof. if the chain starts from 2 it will never hit 0. if ψ ′ (i) is any other solution of these equations then ψ ′ (i) ≥ ψ(i) for all i ∈ S. 3 22 . j∈A i ∈ A. we obtain ϕ′ (i) ≥ Pi (TA ≤ n). ψ(i) := Ei TA . we choose A = {0}.We recognise that the ﬁrst term equals Pi (TA = 1). then TA = 1 + TA . ψ(0) = ψ(2) = 0. 1 ψ(1) = 1 + ψ(1). TA = 0 and so ψ(i) := Ei TA = 0. we have 1 1 ϕ(1) = ϕ(1) + ϕ(0). Letting n → ∞. 1.

. ϕAB (i) = 0. . TB for two disjoint sets of states A. in the last sum. consider the following situation: Every time the state is x a reward f (x) is earned. therefore the mean number of ﬂips until the ﬁrst success is 3/2. ϕAB is the minimal solution to these equations. Clearly. The total reward is f (X0 ) + f (X1 ) + · · · + f (XTA ). . we just changed index from n to n + 1. where. until set A is reached. i∈A Furthermore. Next. of the random variables X1 . Hence. ∼◦∼◦∼◦∼◦∼◦∼◦∼◦∼ As yet another application of the ﬁrst-step analysis method.) ∼◦∼◦∼◦∼◦∼◦∼◦∼◦∼ Now let us consider two hitting times TA . This happens up to the ﬁrst hitting time TA ﬁrst hitting time of the set A. as argued earlier. (This should have been obvious from the start. 23 . ′ TA Ex n=0 f (Xn+1 ) = y∈S pxy h(y). because the underlying experiment is the ﬂipping of a coin with “success” probability 2/3. h(x) = Ex n=0 f (Xn ) ′ 1+TA = f (x) + Ex n=1 ′ TA f (Xn ) = f (x) + Ex n=0 f (Xn+1 ). We may ask to ﬁnd the probabilities ϕAB (i) = Pi (TA < TB ). .e. Now the last sum is a function of the future after time 1. ′ where TA is the remaining time. because when X0 ∈ A then T0 = 0 and so the total reward is f (X0 ). i. after one step. It should be clear that the equations satisﬁed are: ϕAB (i) = 1. h(x) = f (x). X2 . Then ′ 1+TA x ∈ A. otherwise. ′ TA = 1 + TA .which gives ψ(1) = 3/2. if x ∈ A. by the Markov property. B. ϕAB (i) = j∈A i∈B pij ϕAB (j). We are interested in the mean total reward TA h(x) := Ex n=0 f (Xn ).

h(1) = 5. in which case he dies immediately. if he starts from room 2. are h(x) = 0. why?) If we let h(i) be the average total reward if he starts from room i. collecting “rewards”: when in room i. x∈A pxy h(y). until he dies in room 4. he will die. unless he is in room 4. k ∈ N) form the strategy of the gambler. If heads come up then you win £1 per pound of stake. Example: A thief enters sneaks in the following house and moves around the rooms. no gambler is assumed to have any information about the outcome of the upcoming toss. (Also. your fortune after the n-th toss is n Xn = X0 + k=1 Y k ξk . h(3) = 7. y∈S h(x) = f (x) + x ∈ A. 1 2 4 3 The question is to compute the average total reward. (Sooner or later. 2 2 We solve and ﬁnd h(2) = 8. Let p the probability of heads and q = 1 − p that of tails. 2. 3. If ξk is the outcome of the k-th toss (where ξk = +1 means heads and ξk = −1 means tails). if just before the k-th toss you decide to bet Yk pounds then after the realisation of the toss you will have earned Yk pounds if heads came up or lost Yk pounds if tails appeared. we have h(4) = 0 1 h(1) = 1 + h(2) 2 1 h(3) = 3 + h(2) 2 1 1 h(2) = 2 + h(1) + h(3). You can remember the latter ones by thinking that the expectecd total reward equals the immediate reward f (x) plus the expected remaining reward. for i = 1. So. he collects i pounds. Clearly.Thus. so 24 .) 7 Gambler’s ruin Consider the simplest game of chance of tossing an (possibly unfair) coin repeatedly and independently. the set of equations satisﬁed by h. The successive stakes (Yk .

is our next goal. Consider a gambler playing a simple coin tossing game. betting £1 at each integer time against a coin which has P (heads) = p. it will make enormous proﬁts). ℓ) := Px (Tℓ < T0 ). in the ﬁrst case. . Ex ) for the probability (respectively. in simplest case Yk = 1 for all k. We are clearly dealing with a Markov chain in the state space {a. a ≤ x ≤ b. . Consider the hitting time of state y: Ty := inf{n ≥ 0 : Xn = y}. a + 1. ξk−1 and–as is often the case in practice–on other considerations of e. . write ϕ(x) instead of ϕ(x. Then the probability that the gambler’s fortune will reach level b > x before reaching level a < x equals: x−a − 1 (q/p) (q/p)b−a − 1 . It can only depend on ξ1 . . ℓ). pi. in addition. 0 ≤ x ≤ ℓ. In other words. Let £x be the initial fortune of the gambler. b − 1. the company has control over the probability p. if p = q = 1/2 b−a Proof.g. b are absorbing. . We obviously have ϕ(0) = 0. b} with transition probabilities pi. if p = q Px (Tb < Ta ) = . We use the notation Px (respectively. Yk cannot depend on ξk . x − a .i−1 = q = 1 − p. . b = ℓ > 0 and let ϕ(x. b − a). Let us consider a concrete problem. . a + 2. . She will stop playing if her money fall below a (a < x). . astrological nature. expectation) under the initial condition X0 = x. To solve the problem. P (tails) = 1 − p. she must base it on whatever outcomes have appeared thus far. (Think why!) To simplify things. We ﬁrst solve the problem when a = 0. 25 ϕ(ℓ) = 1. We wish to consider the problem of ﬁnding the probability that the gambler will never lose her money.i+1 = p. The gambler starts with X0 = x pounds and has a target: To make b pounds (b > x). This has profound actuarial applications: An insurance company must make sure that the strategies it employs are such that–at least–the company will not go bankrupt (and.if the gambler wants to have a strategy at all. The diﬀerence between the gambler’s ruin problem for a company and that for a mere mortal is that. and where the states a. Theorem 7. as described above. a < i < b. The general solution will then obtained from Px (Tb < Ta ) = ϕ(x − a.

We ﬁnally ﬁnd λx − 1 . if p = q. Since ϕ(ℓ) = 1. then λ = q/p > 1. If p > 1/2. we ﬁnd δ(0) = 1/ℓ and so x ϕ(x) = . if p > 1/2 if p ≤ 1/2. λℓ − 1 0 ≤ x ≤ ℓ. and so Px (T0 < ∞) = 1 − lim If p < 1/2. λ = 1. We have δ(x) = q δ(x − 1) = p q p 2 0 < x < ℓ. δ(x − 2) = · · · = q p x δ(0). λ−1 ϕ(x) = If λ = 1 then q = p and so ϕ(x + 1) − ϕ(x) = ϕ(x) − ϕ(x − 1). we ﬁnd ϕ(x) = δ(x − 1) + δ(x − 2) + · · · + δ(0) = [λx−1 + · · · + λ + 1]δ(0) = Since ϕ(ℓ) = 1. We have Px (T0 < ∞) = lim Px (T0 < Tℓ ) = 1 − lim Px (Tℓ < T0 ). This gives δ(x) = δ(0) for all x and ϕ(x) = xδ(0). The ruin probability of a gambler starting with £x is Px (T0 < ∞) = Proof. Corollary 4. ℓ→∞ λℓ − 1 0−1 λx − 1 = 1 − 0 = 1. if λ = 1 if λ = 1. Px (T0 < ∞) = 1 − lim 26 . The general case is obtained by replacing ℓ by b − a and x by x − a.and. λx − 1 δ(0). ℓ→∞ ℓ Px (T0 < ∞) = 1 − lim Finally. λℓ −1 x ℓ. Since p + q = 1. 1. ℓ Summarising. ℓ→∞ λℓ − 1 x = 1 − 0 = 1. ℓ→∞ ℓ→∞ (q/p)x . then λ = q/p < 1. we have obtained ϕ(x. ϕ(x) = pϕ(x + 1) + qϕ(x − 1). we can write this as p[ϕ(x + 1) − ϕ(x)] = q[ϕ(x) − ϕ(x − 1)]. we have λℓ −1 λ−1 δ(0) = 1 which gives δ(0) = (λ − 1)/(λℓ − 1). 0 ≤ x ≤ ℓ. by ﬁrst-step analysis. Let δ(x) := ϕ(x + 1) − ϕ(x). λ = 1. Assuming that λ := q/p = 1. ℓ) = λx −1 . and so λx − 1 λx − 1 =1− = λx .

. τ = n). τ = n)P (Xn = i. . . . 27 . . n ≥ 0) is a time-homogeneous Markov chain. Our assertions are now easily obtained from this by ﬁrst summing over all n (whence the event {τ < ∞} appears) and then by dividing both sides by P (Xτ = i. This stronger property is referred to as the strong Markov property. τ = n) = pi. Let A be any event determined by the past (X0 . Xn+1 . . . . τ < ∞)P (A | Xτ . . . . (b) is by the deﬁnition of conditional probability. . τ = n) (c) = P (Xn+1 = j1 . n ≥ 0) is independent of the past process (X0 . . Xn+2 = j2 . Let τ be a stopping time. Xn+2 = j2 . . . τ < ∞) = pi. . τ = n) = P (Xn+1 = j1 .8 Stopping times and the strong Markov property A random time is a random variable taking values in the time-set. . considered earlier. for all n ∈ Z+ . . An example of a random time which is not a stopping time is the last visit of a speciﬁc set. . . . . . . Recall that the Markov property states that. . .j2 · · · P (Xn = i. In particular. A. Xn = i. the future process (Xτ +n . Since (X0 . . | Xτ = i. τ < ∞) = P (Xn+1 = j1 . A. . A. . A. . . (d) where (a) is just logic. Theorem 8 (strong Markov property). . . . . Xτ +2 = j2 . Xn )1(τ = n). Xτ +2 = j2 . the value +∞. . . . Xτ ) if τ < ∞. . An example of a stopping time is the time TA . . | Xτ . . Xn ) and the ordinary splitting property at n. a stopping time τ is a random variable with values in Z+ ∪{+∞} such that 1(τ = n) is a deterministic function of (X0 . Xn ) if we know the present Xn . (c) is due to the fact that A ∩ {τ = n} is determined by (X0 . and (d) is due to the fact that (Xn ) is time-homogeneous Markov chain with transition probabilities pij . Xn ) for all n ∈ Z+ . including. any such A must have the property that. . . for any n. A ∩ {τ = n} is determined by (X0 . Let τ be a stopping time. . Xτ )1(τ < ∞) = n∈Z+ (X0 . . the ﬁrst hitting time of the set A.Xτ +2 = j2 . Proof. Then. .) is independent of the past process (X0 . | Xn = i.j2 · · · We have: P (Xτ +1 = j1 . We will now prove that this remains true if we replace the deterministic n by a random stopping time τ . P (Xτ +1 = j1 . . .j1 pj1 . . Suppose (Xn .j1 pj1 . A. Xτ = i. A. possibly. Xτ +2 = j2 . . τ < ∞) and that P (Xτ +1 = j1 . . the future process (Xn . Xn ). . τ = n) (a) (b) = P (Xτ +1 = j1 . . conditional on τ < ∞ and Xτ = i. . . We are going to show that for any such A. . τ < ∞). . . . A | Xτ . Xn+2 = j2 . | Xn = i)P (Xn = i. Xτ ) and has the same law as the original process started from i.

. in a generalised sense.. . The idea is that if we can ﬁnd a state i that is visited again and again. These excursions are independent! 11111111111 00000000000 1111111111 0000000000 11111111111 00000000000 11111111111111 00000000000000 1111111111 0000000000 11111111111 00000000000 11111 00000 11111111111111 00000000000000 1111111111 0000000000 11111111111 00000000000 11111 00000 11111111111111 00000000000000 1111111111 0000000000 11111 00000 11111111111111 00000000000000 1111111111 0000000000 1111111111111111 0000000000000000 11111 11111111111111 00000000000000 1111111111 0000000000 i 00000 11111111111111111111111111111111111111111111 00000000000000000000000000000000000000000000 1111111111111111 0000000000000000 11111 00000 11111 00000 Ti = Ti (1) Ti (2) Ti (3) Ti (4) Fix a state i ∈ S and let Ti be the ﬁrst time that the chain will visit this state: Ti := inf{n ≥ 1 : Xn = i}. The random variables Xn comprising a general Markov chain are not independent. we will show that we can split the sequence into i. r = 2. random trajectories. Xi (1) . If X0 = i then the two deﬁnitions coincide. .i. we let Ti be the time of the second (1) (3) visit. in fact.i. Regardless. (1) r = 1. Ti Xi (0) (r) ≤ n < Ti (r+1) .d. Xi (2) . Xi (0) . State i may or may not be visited again. Our Ti here is always positive. then the behaviour of the chain between successive visits cannot be inﬂuenced by the past or the future (and that is due to the strong Markov property). (And we set Ti := Ti . 2.9 Regenerative structure of a Markov chain A sequence of i. (Why?) Consider the “trajectory” of the chain between two successive visits to the state i: Xi Let also We think of (r) := Xn . So Xi is the r-th excursion: the chain visits 28 . . by using the strong Markov property. Schematically. we recursively deﬁne Ti (r) (2) := inf{n > Ti (r−1) : Xn = i}. := Xn ..) We let Ti be the time of the third visit. . The Ti of Section 6 was allowed to take the value 0. However. as “random variables”. 3. “chunks”. Formally. .d. All these random times are stopping times. We (r) refer to them as excursions of the process. random variables is a (very) particular example of a Markov chain. . They are. 0 ≤ n < Ti . and so on. Note the subtle diﬀerence between this Ti and the Ti of Section 6.

An immediate consequence of the strong Markov property is:

Theorem 9. The excursions Xi^(0), Xi^(1), Xi^(2), . . . are independent random variables. Moreover, for time-homogeneous Markov chains, all of them, except, possibly, Xi^(0), have the same distribution.

Proof. Apply the Strong Markov Property (SMP) at Ti^(r): SMP says that, given the state at the stopping time Ti^(r), the future after Ti^(r) is independent of the past before Ti^(r). But the state at Ti^(r) is, by definition, equal to i. Therefore, the future is independent of the past, and this proves the first claim. The claim that Xi^(1), Xi^(2), . . . have the same distribution follows from the fact that the Markov chain is time-homogeneous: if it starts from state i at some point of time, it will behave in the same way as if it starts from the same state at another point of time.

A trivial, but important corollary of the observation of this section is that:

Corollary 5. (Numerical) functions g(Xi^(0)), g(Xi^(1)), g(Xi^(2)), . . . of the excursions are independent random variables; moreover, except, possibly, for the first one, they are also identical in law (i.i.d.).

Example: Define
Λr := Σ_{Ti^(r) ≤ n < Ti^(r+1)} 1(Xn = j).
The meaning of Λr is this: it expresses the number of visits to a state j between two successive visits to the state i. The last corollary ensures us that Λ1, Λ2, . . . are independent random variables.

10 Recurrence and transience

When a Markov chain has finitely many states, at least one state must be visited infinitely many times. Perhaps some states (inessential ones) will be visited finitely many times, but there are always states which will be visited again and again, ad infinitum. We abstract this trivial observation and give a couple of definitions.

First, we say that a state i is recurrent if, starting from i, the chain returns to i infinitely many times:
Pi(Xn = i for infinitely many n) = 1.
Second, we say that a state i is transient if, starting from i, the chain returns to i only finitely many times:
Pi(Xn = i for infinitely many n) = 0.

Important note: Notice that the two statements above are not negations of one another. Indeed, there could be, in principle, yet another possibility:
0 < Pi(Xn = i for infinitely many n) < 1.
We will show that such a possibility does not exist, therefore concluding that every state is either recurrent or transient.

Suppose that the Markov chain starts from state i. Consider the total number of visits to state i (excluding the possibility that X0 = i):
Ji = Σ_{n=1}^∞ 1(Xn = i) = sup{r ≥ 0 : Ti^(r) < ∞}.
Notice that: state i is visited infinitely many times ⟺ Ji = ∞. Notice also that
fii := Pi(Ji ≥ 1) = Pi(Ti < ∞).
We claim that:

Lemma 4. Starting from X0 = i, the random variable Ji is geometric:
Pi(Ji ≥ k) = fii^k, k ≥ 0.

Proof. Indeed, the event {Ji ≥ k} implies that Ti^(k−1) < ∞ and that Ji ≥ k − 1. By the strong Markov property at time Ti^(k−1),
Pi(Ji ≥ k) = Pi(Ji ≥ k − 1) Pi(Ji ≥ 1).
But the past before Ti^(k−1) is independent of the future, and that is why we get the product. Therefore,
Pi(Ji ≥ k) = fii Pi(Ji ≥ k − 1) = fii^2 Pi(Ji ≥ k − 2) = · · · = fii^k.

Notice that fii = Pi(Ti < ∞) can either be equal to 1 or smaller than 1. If fii = 1 we have that Pi(Ji ≥ k) = 1 for all k, i.e. Pi(Ji = ∞) = 1. This means that the state i is recurrent. If fii < 1, then Pi(Ji < ∞) = 1. This means that the state i is transient. Because either fii = 1 or fii < 1, we conclude what we asserted earlier, namely, that every state is either recurrent or transient.

We summarise this together with another useful characterisation of recurrence & transience in the following two lemmas.

Lemma 5. The following are equivalent:
1. State i is recurrent.
2. fii = Pi(Ti < ∞) = 1.
3. Pi(Ji = ∞) = 1.
4. Ei Ji = ∞.
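The geometric behaviour of Ji is easy to see numerically. The following sketch is not from the notes: it uses a biased random walk on the integers (an assumption made for the illustration; state 0 is transient there) and checks that the empirical tail of the number of returns decays like fii^k.

import random

random.seed(2)
p, horizon, runs = 0.7, 500, 10_000

def returns_to_zero():
    x, count = 0, 0
    for _ in range(horizon):            # a long finite horizon approximates "forever"
        x += 1 if random.random() < p else -1
        if x == 0:
            count += 1
    return count

J = [returns_to_zero() for _ in range(runs)]
f = sum(1 for j in J if j >= 1) / runs  # estimate of f_00 = P_0(T_0 < infinity)
for k in range(1, 5):
    tail = sum(1 for j in J if j >= k) / runs
    print(k, round(tail, 3), round(f ** k, 3))   # tail should be close to f^k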

Proof. The equivalence of the first two statements was proved before the lemma. Statement 1 says that, starting from i, state i is visited infinitely many times, which, by definition of Ji, is equivalent to Pi(Ji = ∞) = 1; this is the equivalence of 1 and 3. If Pi(Ji = ∞) = 1 then, clearly, Ei Ji = ∞ also. On the other hand, since Ji is a geometric random variable, we see that its expectation cannot be infinity without Ji itself being infinity with probability one; hence 4 implies 3.

Lemma 6. The following are equivalent:
1. State i is transient.
2. fii = Pi(Ti < ∞) < 1.
3. Pi(Ji < ∞) = 1.
4. Ei Ji < ∞.

Proof. The equivalence of the first two statements was proved above. State i is transient if, by definition, it is visited finitely many times with probability 1, which, by definition of Ji, is equivalent to Pi(Ji < ∞) = 1. If we assume Ei Ji < ∞, then, clearly, Ji cannot take the value +∞ with positive probability, therefore Pi(Ji < ∞) = 1. Conversely, if Pi(Ji < ∞) = 1 then fii < 1 and, from Elementary Probability, a geometric random variable with parameter smaller than 1 has finite expectation; hence Ei Ji < ∞.

Corollary 6. If i is a transient state then pii^(n) → 0, as n → ∞.

Proof. If i is transient then Ei Ji < ∞. But
Ei Ji = Ei Σ_{n=1}^∞ 1(Xn = i) = Σ_{n=1}^∞ Pi(Xn = i) = Σ_{n=1}^∞ pii^(n).
But if a series is finite then the summands go to 0, i.e. pii^(n) → 0, as n → ∞.

10.1 First hitting time decomposition

Define
fij^(n) := Pi(Tj = n), n ≥ 1,
the probability to hit state j for the first time at time n ≥ 1, assuming that the chain starts from state i. Notice the difference with pij^(n): the latter is the probability to be in state j at time n (not necessarily for the first time) starting from i. Clearly, fij^(n) ≤ pij^(n), but the two sequences are related in an exact way:

Lemma 7.
pij^(n) = Σ_{m=1}^n fij^(m) pjj^(n−m), n ≥ 1.

Proof. We have
pij^(n) = Pi(Xn = j) = Σ_{m=1}^n Pi(Tj = m, Xn = j) = Σ_{m=1}^n Pi(Tj = m) Pi(Xn = j | Tj = m).

The first factor in each summand equals fij^(m). For the second, we use the Strong Markov Property:
Pi(Xn = j | Tj = m) = Pi(Xn = j | Xm = j) = pjj^(n−m),
and hence
pij^(n) = Σ_{m=1}^n fij^(m) pjj^(n−m),
as claimed.

Corollary 7. If j is a transient state then pij^(n) → 0, as n → ∞, for all states i.

Proof. If j is transient, we have Σ_{n=1}^∞ pjj^(n) = Ej Jj < ∞. By summing over n the relation of Lemma 7, we obtain (after interchanging the order of the two sums in the right hand side)
Σ_{n=1}^∞ pij^(n) = Σ_{m=1}^∞ fij^(m) Σ_{n=m}^∞ pjj^(n−m) = (1 + Ej Jj) Σ_{m=1}^∞ fij^(m) = (1 + Ej Jj) Pi(Tj < ∞),
which is finite. But if a series is finite then its summands go to 0, and hence pij^(n) → 0 as n → ∞.

10.2 Communicating classes and recurrence/transience

Recall the definition of the relation "state i leads to state j", denoted by i ⇝ j: it means that, starting from i, the probability to go to j in some finite time is positive. In other words, if we let
fij := Pi(Tj < ∞),
we have i ⇝ j ⟺ fij > 0. The last statement is simply an expression in symbols of what we said above.

We now show that recurrence is a class property. (Remember: another property that was a class property was the property that the states have a certain period.)

Theorem 10. If i is recurrent and i ⇝ j then j is recurrent and
fij = Pi(Tj < ∞) = 1.

Proof. Start with X0 = i. Since i is recurrent, the chain makes infinitely many excursions Xi^(0), Xi^(1), . . . from i, and these are i.i.d. Since i ⇝ j, there is a positive probability that j will appear in one of them; indeed, the probability that j will appear in a specific excursion equals
gij := Pi(Tj < Ti) > 0.
So the random variables
δ_{j,r} := 1(j appears in excursion Xi^(r)), r = 0, 1, . . . ,
are i.i.d., and since they take the value 1 with positive probability, infinitely many of them will be 1 (with probability 1), showing that j will appear in infinitely many of the excursions for sure. Hence, not only fij > 0, but also fij = 1. In particular, starting from i, state j is visited infinitely many times; applying the strong Markov property at Tj, the chain started from j also visits j infinitely often, i.e. j is recurrent.

Corollary 8. In a communicating class, either all states are recurrent or all states are transient.

Theorem 11. If i is inessential then it is transient.

Proof. Suppose that i is inessential, i.e. there is a state j such that i ⇝ j but j ̸⇝ i. Then there is m such that Pi(A) := Pi(Xm = j) > 0. Let B := {Xn = i for infinitely many n}. Since j ̸⇝ i, we have Pi(A ∩ B) = Pi(Xm = j, Xn = i for infinitely many n) = 0. Hence
0 < Pi(A) = Pi(A ∩ B) + Pi(A ∩ B^c) = Pi(A ∩ B^c) ≤ Pi(B^c),
i.e. Pi(B) = Pi(Xn = i for infinitely many n) < 1. Since every state is either recurrent or transient, this means that Pi(B) = 0, and so i is a transient state.

If a communicating class contains a state which is transient then, necessarily, all its states are transient (Corollary 8). So if a communicating class is recurrent then it contains no inessential states and so it is closed.

Theorem 12. Every finite closed communicating class is recurrent.

Proof. Let C be the class. By closedness, if started in C, X remains in C. By finiteness, some state i must be visited infinitely many times with positive probability, i.e. Pi(Xn = i for infinitely many n) > 0. This state is recurrent. Every other j in the same communicating class satisfies i ⇝ j, and hence, by the above theorem (Theorem 10), every other j is also recurrent. Hence all states are recurrent.

11 Positive recurrence

Recall that if we fix a state i and let Ti be the first return time to i then either Pi(Ti < ∞) = 1 (in which case we say that i is recurrent) or Pi(Ti < ∞) < 1 (transient state). We now take a closer look at recurrence. Recall that a finite random variable may not necessarily have finite expectation.

Example: Let U0, U1, U2, . . . be i.i.d. random variables, uniformly distributed in the unit interval [0, 1]. Let
T := inf{n ≥ 1 : Un > U0}.
Thus T represents the number of variables we need to draw to get a value larger than the initial one. (It should be clear that this can, for example, appear in a certain game.) What is the expectation of T? First, observe that T is finite. Indeed,
P(T > n) = P(U1 ≤ U0, . . . , Un ≤ U0) = ∫_0^1 P(U1 ≤ x, . . . , Un ≤ x | U0 = x) dx = ∫_0^1 x^n dx = 1/(n+1).
Therefore,
P(T < ∞) = lim_{n→∞} P(T ≤ n) = lim_{n→∞} (1 − 1/(n+1)) = 1.
But
ET = Σ_{n=0}^∞ P(T > n) = Σ_{n=0}^∞ 1/(n+1) = ∞.
You are invited to think about the meaning of this: if the initial value U0 is unknown, then it takes, on the average, an infinite number of steps to exceed it!
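A quick simulation sketch (not part of the notes) makes this example tangible: T is finite on every run, its tail matches 1/(n+1), yet its sample mean keeps growing as more samples are taken. The sample sizes below are arbitrary.

import random

random.seed(3)

def draw_T():
    u0, n = random.random(), 0
    while True:
        n += 1
        if random.random() > u0:
            return n

samples = [draw_T() for _ in range(100_000)]
print(sum(1 for t in samples if t > 10) / len(samples), 1 / 11)  # P(T > 10) = 1/11
print(sum(samples) / len(samples))   # the sample mean is large and grows with the sample size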

We now classify a recurrent state i as follows. We say that i is positive recurrent if
Ei Ti < ∞.
We say that i is null recurrent if
Ei Ti = ∞.
Our goal now is to show that a chain possessing a positive recurrent state also possesses a stationary distribution. Before doing that, we shall discover what the form of this stationary distribution is by using the Law of Large Numbers.

12 Law (=Theorem) of Large Numbers in Probability Theory

The following result is, arguably, the Fundamental Theorem of Probability Theory. It is traditional to refer to it as "Law". But this is misleading: it is not a Law, it is a Theorem that can be derived from the axioms of Probability Theory. We state it here without proof.

Theorem 13. Let Z1, Z2, . . . be i.i.d. random variables.
(i) Suppose E|Z1| < ∞ and let µ = EZ1. Then
P( lim_{n→∞} (Z1 + · · · + Zn)/n = µ ) = 1.
(ii) Suppose E max(Z1, 0) = ∞, but E min(Z1, 0) > −∞. Then
P( lim_{n→∞} (Z1 + · · · + Zn)/n = ∞ ) = 1.

13 Law of Large Numbers for Markov chains

The law of large numbers stated above requires independence. When we have a Markov chain, the sequence X0, X1, X2, . . . of successive states is not an independent sequence. To apply the law of large numbers we must create some kind of independence. To our rescue comes the regenerative structure of a Markov chain identified at an earlier section.

13.1 Functions of excursions

Namely, given a recurrent state a ∈ S (which will be referred to as the ground state, for no good reason other than that it has been fixed once and for all), the sequence of excursions
Xa^(1), Xa^(2), Xa^(3), . . .
forms an independent sequence of (generalised) random variables. Since functions of independent random variables are also independent, we have that
G(Xa^(1)), G(Xa^(2)), G(Xa^(3)), . . .
are i.i.d., for reasonable choices of the functions G taking values in R.

Example 1: To each state x ∈ S, assign a reward f(x), which we can think of as a reward received when the chain is in state x. For an excursion X, define G(X) to be the maximum reward received during this excursion. Specifically,
G(Xa^(r)) = max{f(Xt) : Ta^(r) ≤ t < Ta^(r+1)}.

Example 2: Let f(x) be as in Example 1, but define G(X) to be the total reward received over the excursion X:
G(Xa^(r)) = Σ_{Ta^(r) ≤ t < Ta^(r+1)} f(Xt).    (7)

Whichever the choice of G, the law of large numbers tells us that
(G(Xa^(1)) + · · · + G(Xa^(n)))/n → EG(Xa^(1)),    (8)
with probability 1, as n → ∞. Note that
EG(Xa^(1)) = EG(Xa^(2)) = · · · = E[G(Xa^(0)) | X0 = a] = Ea G(Xa^(0)),
because the excursions Xa^(1), Xa^(2), . . . are i.i.d., and because if we start with X0 = a, the initial excursion Xa^(0) also has the same law as the rest of them. The final equality in the last display is merely our notation for the expectation of a random variable conditional on X0 = a.

13.2 Average reward

Consider now a function f : S → R, which, as in Example 2, we can think of as a reward received when the chain is in state x. We are interested in the existence of the limit
time-average reward = lim_{t→∞} (1/t) Σ_{k=0}^{t} f(Xk),
which represents the 'time-average reward' received. In this example, we shall assume that f is bounded and shall examine those conditions which yield a nontrivial limit.

Theorem 14. Suppose that a Markov chain possesses a positive recurrent state a ∈ S (i.e. Ea Ta < ∞) and assume that the distribution of X0 is such that P(Ta < ∞) = 1. Let f : S → R+ be a bounded 'reward' function. Then, with probability one, the 'time-average reward' f̄ exists and is a deterministic quantity:
P( lim_{t→∞} (1/t) Σ_{n=0}^{t} f(Xn) = f̄ ) = 1,
where
f̄ = Ea[ Σ_{n=0}^{Ta−1} f(Xn) ] / Ea Ta = Σ_{x∈S} f(x) π(x).

Proof. We shall present the proof in a way that the constant f̄ will reveal itself; it will be discovered. The idea is this: Suppose that t is large enough. We are looking for the total reward received between 0 and t. We break the sum into the total reward received over the initial excursion, plus the total reward received over the next excursion, and so on. Thus we need to keep track of the number, Nt, of complete excursions from the ground state a that occur between 0 and t:
Nt := max{r ≥ 1 : Ta^(r) ≤ t}.
For example, in the figure below, Nt = 5.

[Figure: the successive visit times Ta^(1), Ta^(2), Ta^(3), Ta^(4), Ta^(5) to the ground state a, all occurring before time t.]

We write the total reward as a sum of rewards over complete excursions plus a first and a last term:
Σ_{n=0}^{t} f(Xn) = Σ_{r=1}^{Nt−1} Gr + Gfirst + Glast,
where Gr is given by (7), Gfirst is the reward up to time Ta = Ta^(1) or t (whichever is smaller), and Glast is the reward over the last incomplete cycle. We assumed that f is bounded, say |f(x)| ≤ B. Hence |Gfirst| ≤ B Ta. Since P(Ta < ∞) = 1, we have B Ta / t → 0, and so
(1/t) Gfirst → 0, with probability one, as t → ∞.
A similar argument shows that
(1/t) Glast → 0, with probability one, as t → ∞.
In other words, the first and last terms do not contribute to the limit.

So we are left with the sum of the rewards over complete cycles. Write now
(1/t) Σ_{r=1}^{Nt−1} Gr = (Nt/t) · (1/Nt) Σ_{r=1}^{Nt−1} Gr.    (9)
Look again at what the Law of Large Numbers (8) tells us: that
lim_{n→∞} (1/n) Σ_{r=1}^{n} Gr = EG1,    (10)
with probability one. Now look at the sequence
Ta^(r) = Ta^(1) + Σ_{k=2}^{r} (Ta^(k) − Ta^(k−1)),
and see what the Law of Large Numbers says again: the increments Ta^(k) − Ta^(k−1), k = 2, 3, . . ., are i.i.d. random variables with common expectation Ea Ta, and, since Ta^(1)/r → 0, we have
lim_{r→∞} Ta^(r)/r = Ea Ta.    (11)
Since Ta^(r) → ∞, as r → ∞, it follows that the number Nt of complete cycles before time t will also be tending to ∞, as t → ∞. By definition, the visit with index r = Nt to the ground state a is the last one before t (see also the figure above):
Ta^(Nt) ≤ t < Ta^(Nt+1).
Hence
Ta^(Nt)/Nt ≤ t/Nt < Ta^(Nt+1)/Nt.
But (11) tells us that
lim_{t→∞} Ta^(Nt)/Nt = lim_{t→∞} Ta^(Nt+1)/Nt = Ea Ta.
Hence
lim_{t→∞} Nt/t = 1/Ea Ta.    (12)
Also, replacing n by the subsequence Nt in (10) (which is allowed, since Nt → ∞), we have
lim_{t→∞} (1/Nt) Σ_{r=1}^{Nt−1} Gr = EG1,
with probability one. Using (10) and (12) in (9) we have
lim_{t→∞} (1/t) Σ_{r=1}^{Nt−1} Gr = EG1 / Ea Ta.
Thus
lim_{t→∞} (1/t) Σ_{n=0}^{t} f(Xn) = EG1 / Ea Ta =: f̄.

To see that f̄ is as claimed, just observe that
EG1 = E[ Σ_{Ta^(1) ≤ t < Ta^(2)} f(Xt) ] = Ea[ Σ_{0 ≤ t < Ta} f(Xt) ],
from the Strong Markov Property.

Comment 1: It is important to understand what the formula says. The constant f̄ is a sort of average over an excursion. The numerator of f̄ is computed by summing up the rewards over the excursion (making sure that not both endpoints are included); we may write
EG1 = Ea Σ_{0 ≤ t < Ta} f(Xt)   or   EG1 = Ea Σ_{0 < t ≤ Ta} f(Xt).
Note that we include only one endpoint of the excursion, not both. The denominator is the duration of the excursion.

[Figure: the rewards f(X0) = f(a), f(X1), f(X2), . . . , f(X_{Ta−1}) collected over one excursion from a.]

Comment 2: A very important particular case occurs when we choose
f(x) := 1(x = b),
i.e. we let f be 1 at the state b and 0 everywhere else. The theorem above then says that, for all b ∈ S,
lim_{t→∞} (1/t) Σ_{n=1}^{t} 1(Xn = b) = Ea[ Σ_{k=0}^{Ta−1} 1(Xk = b) ] / Ea Ta,    (13)
with probability 1. The numerator is the average number of times state b is visited between two successive visits to state a.

Comment 3: If in the above we let b = a, the ground state, then we see that the numerator equals 1. Hence
lim_{t→∞} (1/t) Σ_{n=1}^{t} 1(Xn = a) = 1/Ea Ta,    (14)
with probability 1. The physical interpretation of the latter should be clear: if, on the average, it takes Ea Ta units of time between successive visits to the state a, then the long-run proportion of visits to the state a should equal 1/Ea Ta.

Comment 4: Suppose that two states, a, b, communicate and are positive recurrent. We can use b as a ground state instead of a. Applying formula (14) with b in place of a, we find that the limit in (13) also equals 1/Eb Tb. The equality of these two limits gives:
Ea[ average no. of times state b is visited between two successive visits to state a ] / Ea Ta = 1/Eb Tb.
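The identities in Comments 2 to 4 are easy to check by simulation. The following sketch is not from the notes; the three-state matrix and the run length are arbitrary choices made for the illustration.

import numpy as np

rng = np.random.default_rng(4)
P = np.array([[0.1, 0.6, 0.3],
              [0.5, 0.2, 0.3],
              [0.3, 0.3, 0.4]])
a, b, steps = 0, 2, 200_000

x, path = a, [a]
for _ in range(steps):
    x = rng.choice(3, p=P[x])
    path.append(x)
path = np.array(path)

frac_b = np.mean(path[1:] == b)          # long-run fraction of time spent at b

visits_a = np.flatnonzero(path == a)     # successive visit times to the ground state a
cycle_lengths = np.diff(visits_a)
visits_b_per_cycle = [np.sum(path[s:t] == b) for s, t in zip(visits_a, visits_a[1:])]
print(frac_b, np.mean(visits_b_per_cycle) / np.mean(cycle_lengths))   # formula (13)

visits_b = np.flatnonzero(path == b)
print(1 / np.mean(np.diff(visits_b)))    # 1 / E_b T_b, should agree (Comment 4)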

14 Construction of a stationary distribution

Our discussions on the law of large numbers for Markov chains revealed that the quantity
ν[a](x) := Ea Σ_{n=0}^{Ta−1} 1(Xn = x),  x ∈ S,    (15)
should play an important role. Here, a is the fixed recurrent ground state whose existence was assumed. Indeed, we saw (Comment 2 above) that if a is a positive recurrent state then ν[a](x)/Ea Ta is the long-run fraction of times that state x is visited. In view of this, this quantity should have something to do with the concept of stationary distribution discussed a few sections ago.

Note: Although ν[a] depends on the choice of the 'ground' state a, we will soon realise that this dependence vanishes when the chain is irreducible; hence the superscript [a] will soon be dropped. Notice, moreover, that the choice of notation ν[a] is justifiable through the notation (6) for the communicating class of the state a.

Proposition 2. Fix a ground state a ∈ S and assume that it is recurrent. (We stress that we do not need the stronger assumption of positive recurrence here.) Consider the function ν[a] defined in (15). Then ν[a] satisfies the balance equations:
ν[a] = ν[a] P,
i.e.
ν[a](x) = Σ_{y∈S} ν[a](y) pyx,  x ∈ S.
Moreover, for any state x such that a ⇝ x (a leads to x), we have 0 < ν[a](x) < ∞.

Proof. We start the Markov chain with X0 = a. Recurrence tells us that Ta < ∞ with probability one. Since a = X_{Ta}, we can write
ν[a](x) = Ea Σ_{n=1}^{Ta} 1(Xn = x).    (16)
This was a trivial but important step: we replaced the n = 0 term with the n = Ta term. Both of them equal 1(x = a), so there is nothing lost in doing so. The real reason for doing so is because we wanted to replace a function of X0, X1, . . . by a function of X1, X2, X3, . . .

and thus be in a position to apply first-step analysis, which is precisely what we do next:
ν[a](x) = Ea Σ_{n=1}^{∞} 1(Xn = x, n ≤ Ta)
= Σ_{n=1}^{∞} Pa(Xn = x, n ≤ Ta)
= Σ_{n=1}^{∞} Σ_{y∈S} Pa(Xn = x, Xn−1 = y, n ≤ Ta)
= Σ_{n=1}^{∞} Σ_{y∈S} pyx Pa(Xn−1 = y, n ≤ Ta)
= Σ_{y∈S} pyx Ea Σ_{n=1}^{Ta} 1(Xn−1 = y)
= Σ_{y∈S} pyx ν[a](y),
where, in this last step, we used (16). So we have shown
ν[a](x) = Σ_{y∈S} pyx ν[a](y),  x ∈ S,
which are precisely the balance equations. We just showed that ν[a] = ν[a] P; therefore,
ν[a] = ν[a] P^n, for all n ∈ N.
Next, first note that, by definition, ν[a](a) = 1. Let x be such that a ⇝ x, so there exists n ≥ 1 such that pax^(n) > 0. Hence
ν[a](x) ≥ ν[a](a) pax^(n) = pax^(n) > 0.
From Theorem 10, x ⇝ a, so there exists m ≥ 1 such that pxa^(m) > 0. Hence
1 = ν[a](a) ≥ ν[a](x) pxa^(m),
and therefore ν[a](x) < ∞.

Note: The last result was shown under the assumption that a is a recurrent state. We are now going to assume more, namely, that a is positive recurrent. Observe that
Σ_{x∈S} ν[a](x) = Ea Σ_{n=1}^{Ta} Σ_{x∈S} 1(Xn = x) = Ea Σ_{n=1}^{Ta} 1 = Ea Ta,
which is finite when a is positive recurrent.

Corollary 9. Suppose that the Markov chain possesses a positive recurrent state a. Define
π[a](x) := ν[a](x)/Ea Ta = Ea[ Σ_{n=0}^{Ta−1} 1(Xn = x) ] / Ea Ta,  x ∈ S.    (17)
Then π[a] is a stationary distribution.

Proof. We have already shown, in Proposition 2, that ν[a] P = ν[a]. But π[a] is simply a multiple of ν[a], so π[a] P = π[a], i.e. π[a] satisfies the balance equations. Since a is positive recurrent, we have Ea Ta < ∞, and so π[a] is not trivially equal to zero. It only remains to show that π[a] is a probability, i.e. that it sums up to one. But
Σ_{x∈S} π[a](x) = (1/Ea Ta) Σ_{x∈S} ν[a](x) = 1.
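A small numerical sketch (not in the notes) of the construction in (17): solve the balance equations directly and compare with ν[a]/Ea Ta estimated from simulated excursions. The matrix and the number of excursions below are arbitrary.

import numpy as np

rng = np.random.default_rng(5)
P = np.array([[0.2, 0.8, 0.0],
              [0.3, 0.4, 0.3],
              [0.5, 0.0, 0.5]])

# stationary distribution from pi = pi P together with sum(pi) = 1
A = np.vstack([P.T - np.eye(3), np.ones(3)])
b = np.array([0., 0., 0., 1.])
pi = np.linalg.lstsq(A, b, rcond=None)[0]

# Monte Carlo estimate of nu^[a](x) = E_a (visits to x during one excursion from a)
a, excursions, visits, total_length = 0, 20_000, np.zeros(3), 0
for _ in range(excursions):
    x = a
    visits[a] += 1            # the n = 0 term of the excursion
    length = 1
    while True:
        x = rng.choice(3, p=P[x])
        if x == a:
            break
        visits[x] += 1
        length += 1
    total_length += length    # length equals T_a for this excursion
nu = visits / excursions
print(pi, nu / (total_length / excursions))   # the two vectors should be close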

In other words: If there is a positive recurrent state, then we immediately have a stationary (probability) distribution! We DID NOT claim that this distribution is, in general, unique; this is what we will try to do in the sequel.

Pause for thought: We have achieved something VERY IMPORTANT: if we have one positive recurrent state, no matter which one, then the set of linear equations
π(x) = Σ_{y∈S} π(y) pyx, x ∈ S,   Σ_{y∈S} π(y) = 1,
has at least one solution.

15 Positive recurrence is a class property

We now show that (just like recurrence) positive recurrence is also a class property.

Theorem 15. Let i be a positive recurrent state. Suppose that i ⇝ j. Then j is positive recurrent.

Proof. We already know that j is recurrent. Start with X0 = i. Consider the excursions Xi^(r), r = 0, 1, 2, . . .. Since i ⇝ j, the probability gij that j occurs in a specific excursion is positive. To decide whether j will occur in a specific excursion, we toss a coin with probability of heads gij > 0. The coins are i.i.d.: if heads show up at the r-th excursion, then j occurs in the r-th excursion; otherwise, if tails show up, then the r-th excursion does not contain state j. In words, the event that j occurs in a specific excursion is independent from the event that j occurs in the next excursion, since the excursions are independent. Hence j will occur in infinitely many excursions. Let R1 (respectively, R2) be the index of the first (respectively, second) excursion that contains j.

[Figure: the excursions from i with indices R1, R1 + 1, R1 + 2, . . . , R2, and the total duration ζ of these excursions; each excursion duration has the same law as Ti under Pi.]

We just need to show that the sum of the durations of the excursions with indices R1, R1 + 1, . . . , R2 has finite expectation, i.e. that
ζ := Σ_{r=1}^{R2−R1+1} Zr
has finite expectation, where the Zr are i.i.d. with the same law as Ti. Also, R2 has finite expectation. Indeed, R2 − R1 is a geometric random variable:
Pi(R2 − R1 = r) = (1 − gij)^{r−1} gij, r = 1, 2, . . . ,
and so, by Wald's lemma, the expectation of ζ equals
Ei ζ = (Ei Ti) × Ei(R2 − R1 + 1) = (Ei Ti) × (gij^{−1} + 1).
Since gij > 0 and Ei Ti < ∞, we have that Ei ζ < ∞. But the time elapsed between two successive appearances of j is at most ζ and, by the strong Markov property, has the same law as Tj under Pj; hence Ej Tj < ∞, i.e. j is positive recurrent.

Corollary 10. Let C be a communicating class. Then either all states in C are positive recurrent, or all states are null recurrent, or all states are transient.

An irreducible Markov chain is called recurrent if one (and hence all) states are recurrent. An irreducible Markov chain is called transient if one (and hence all) states are transient. An irreducible Markov chain is called positive recurrent if one (and hence all) states are positive recurrent. An irreducible Markov chain is called null recurrent if one (and hence all) states are null recurrent. Schematically: an IRREDUCIBLE CHAIN is either RECURRENT (and then either POSITIVE RECURRENT or NULL RECURRENT) or TRANSIENT.

16 Uniqueness of the stationary distribution

At the risk of being too pedantic, we stress that a stationary distribution may not be unique.

Example: Consider the chain we've seen before:

[Figure: a chain on the states {0, 1, 2} with p00 = 1/2, p01 = 1/3, p02 = 1/6, and p11 = p22 = 1.]

Clearly, state 1 (and state 2) is positive recurrent, for totally trivial reasons (it is, after all, an absorbing state). But observe that there are many, infinitely many, stationary distributions: for each 0 ≤ α ≤ 1, the π defined by
π(0) = 0, π(1) = α, π(2) = 1 − α,
is a stationary distribution. (You should think why this is OBVIOUS!) Notice that states 1 and 2 do not communicate; this lack of communication is, indeed, what prohibits uniqueness.

Theorem 16. Consider a positive recurrent irreducible Markov chain. Then there is a unique stationary distribution.

Proof. Let π be a stationary distribution. Consider the Markov chain (Xn, n ≥ 0) and suppose that
P(X0 = x) = π(x), x ∈ S,
i.e. we start the Markov chain from the stationary distribution π. Then, for all n, we have P(Xn = x) = π(x), for all x ∈ S. Fix a ground state a. Since the chain is irreducible and positive recurrent, we have Pi(Tj < ∞) = 1 for all states i and j. Then
P(Ta < ∞) = Σ_{x∈S} π(x) Px(Ta < ∞) = Σ_{x∈S} π(x) = 1.    (18)
Because of (18), Theorem 14 applies and, by (13), we have, with probability one,
(1/t) Σ_{n=1}^{t} 1(Xn = x) → Ea[ Σ_{k=0}^{Ta−1} 1(Xk = x) ] / Ea Ta = π[a](x),
as t → ∞, for all x ∈ S. By a theorem of Analysis (Bounded Convergence Theorem), recognising that the random variables on the left are nonnegative and bounded by 1, we can take expectations of both sides:
E[ (1/t) Σ_{n=1}^{t} 1(Xn = x) ] → π[a](x),
as t → ∞. But, since the chain starts from the stationary distribution,
E[ (1/t) Σ_{n=1}^{t} 1(Xn = x) ] = (1/t) Σ_{n=1}^{t} P(Xn = x) = π(x).

b ∈ S. y ∈ C.e. π(a) = π [a] (a) = 1/Ea Ta . There is a unique stationary distribution. which is useful because it is often used as a criterion for positive recurrence: Corollary 13. π(C) x ∈ C. delete all transition probabilities pxy with x ∈ C.Therefore π(x) = π [a] (x). The following is a sort of converse to that. so they are equal. Consider an arbitrary Markov chain. This is in fact. Consider the restriction of the chain to C. we obtain the cycle formula: Ta −1 Tb −1 Ea k=0 1(Xk = x) E a Ta = Eb k=0 1(Xk = x) E b Tb . Then any state i such that π(i) > 0 is positive recurrent. Proof. Proof. By the 1 above corollary. for all x ∈ S. Consider a positive recurrent irreducible Markov chain and two states a. Hence there is only one stationary distribution. Decompose it into communicating classes. the only stationary distribution for the restricted chain because C is irreducible. where π(C) = y∈C π(y). Then. i is positive recurrent. In particular. The cycle formula merely re-expresses this equality. We can easily see that we thus obtain a Markov chain with state space C with stationary distribution π(·|C). Hence π = π [a] . Let C be the communicating class of i. Therefore Ei Ti < ∞. i. In Section 14 we assumed the existence of a positive recurrent state a and proved the existence of a stationary distribution π. 17 Structure of a stationary distribution It is not hard to show now what an arbitrary stationary distribution looks like. Consider a positive recurrent irreducible Markov chain with stationary distribution π. Corollary 11. from the deﬁnition of π [a] . Then for each positive recurrent communicating class C there corresponds a unique stationary distribution π C 44 . But π(i|C) > 0. Thus. Consider a Markov chain. an arbitrary stationary distribution π must be equal to the speciﬁc one π [a] . There is only one stationary distribution. Corollary 12. In particular. Let i be such that π(i) > 0. E a Ta Proof. Then π [a] = π [b] . Both π [a] and π[b] are stationary distributions. for all a ∈ S. Let π(·|C) be the distribution deﬁned by restricting π to C: π(x|C) := π(x) . Assume that a stationary distribution π exists. π(i|C) = Ei Ti . Namely. x ∈ S. 1 π(a) = .

So we have that the above decomposition holds with αC = π(C). For each such C deﬁne the conditional distribution π(x|C) := π(x) . such that C αC = 1. under irreducibility and positive recurrence conditions. A Markov chain is a simple model of a stochastic dynamical system. Why stability? There are physical reasons why we are interested in the stability of a Markov chain. the so-called Cesaro 1 averages n n P (Xk = x) converge to π(x). it will settle to the vertical position. To each positive recurrent communicating class C there corresponds a stationary distribution π C . Proposition 3. x∈C Then (π(x|C). In an ideal pendulum. This is deﬁned by picking a state (any state!) a ∈ C and letting π C := π [a] . An arbitrary stationary distribution π must be a linear combination of these π C . For example.which assigns positive probability to each state in C. Let π be an arbitrary stationary distribution. Any stationary distribution π is necessarily of the form π= C: C positive recurrent communicating class αC π C . eventually. A sequence of real numbers an k=1 1 may not converge but its Cesaro averages n (a1 + · · · + an ) may converge. where π(C) := π(C) π(x). π(x|C) ≡ π C (x). from physical experience. So far we have seen. 18 18. for example. more generally. that it remains in a small region). This is the stable position. where αC ≥ 0. consider the motion of a pendulum under the action of gravity. We all know. There is dissipation of energy due to friction and that causes the settling down. x ∈ C) is a stationary distribution. as n → ∞. that the pendulum will swing for a while and. without friction. Stability of a deterministic dynamical system means that the system eventually converges to a ﬁxed point (or. the pendulum will perform an 45 . Proof.1 Coupling and stability Deﬁnition of stability Stability refers to the convergence of the probabilities P (Xn = x) as n → ∞. that. By our uniqueness result. take an = (−1)n . If π(x) > 0 then x belongs to some positive recurrent class C.

oscillatory motion ad infinitum. Still, we may say that it is stable because it does not (it cannot!) swing too far away. But what does stability mean? In general, it cannot mean convergence to a fixed equilibrium point.

The simplest differential equation. As another example, consider the following differential equation:
ẋ = −ax + b, x ∈ R, a > 0.
The thing to observe here is that there is an equilibrium point, the one found by setting the right-hand side equal to zero:
x* = b/a.
If we start with x(t = 0) = x*, then x(t) = x* for all t > 0. If, moreover, we start from any other position, then, regardless of the starting position, we have x(t) → x*, as t → ∞. We say that x* is a stable equilibrium.

[Figure: solutions x(t) of the differential equation, started from different initial values, all converging to the equilibrium x* as t grows.]

There are myriads of applications giving physical interpretation to this, so I'll leave it upon the reader to decide whether he or she wants to consider this as a financial example, or an example in physics, or whatever.

A linear stochastic system. We now pass on to stochastic systems. Since we are prohibited by the syllabus to speak about continuous time stochastic systems, let us consider a stochastic analogue of the above in discrete time:
Xn+1 = ρXn + ξn.
We shall assume that the ξn have the same law and are mutually independent random variables; moreover, we here assume that X0, ξ0, ξ1, ξ2, . . . are mutually independent, and we define a stochastic sequence X1, X2, . . ., by means of the recursion above. And we are in a happy situation here, for we can do it: notice what happens when we solve the recursion:
X1 = ρX0 + ξ0
X2 = ρX1 + ξ1 = ρ(ρX0 + ξ0) + ξ1 = ρ^2 X0 + ρξ0 + ξ1
X3 = ρX2 + ξ2 = ρ^3 X0 + ρ^2 ξ0 + ρξ1 + ξ2
· · · · · ·
Xn = ρ^n X0 + ρ^{n−1} ξ0 + ρ^{n−2} ξ1 + · · · + ρξ_{n−2} + ξ_{n−1}.
It follows that |ρ| < 1 is the condition for stability.
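The sense in which this system is stable can be seen in a small simulation sketch (not part of the notes): the sequence Xn itself keeps fluctuating, but its distribution settles down when |ρ| < 1. The Gaussian noise and the parameters below are assumptions made for the example.

import numpy as np

rng = np.random.default_rng(6)
rho, n_steps, n_paths = 0.8, 200, 100_000

X = np.full(n_paths, 10.0)            # start all paths at the same (arbitrary) point
for _ in range(n_steps):
    X = rho * X + rng.normal(size=n_paths)

# With standard normal noise the limiting law is N(0, 1/(1 - rho^2)).
print(X.mean(), X.std())              # approximately 0 and 1/sqrt(1 - rho^2)
print(1 / np.sqrt(1 - rho**2))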

but we are not given what P (X = x.Since |ρ| < 1. suppose you are given the laws of two random variables X. The second term. In particular. Starting at an elementary level. . In other words. the n-step transition matrix Pn . under nice conditions. the limit of its distribution does: ˜ ˆ ˆ lim P (Xn ≤ x) = lim P (Xn ≤ x) = lim P (Xn ≤ x) = P (X∞ ≤ x) n→∞ n→∞ n→∞ This is precisely the reason why we cannot. What this means is that. . i. Y but not their joint law. while the limit of Xn does not exists. positive recurrent and aperiodic with (unique) stationary distribution π then n→∞ lim P (Xn = x) = π(x). as n → ∞. which is a linear combination of ξ0 . to a matrix with rows all equal to π: (n) lim p n→∞ ij = π(j). But suppose that we have some further requirement.2 The fundamental stability theorem for Markov chains Theorem 17. ξn−1 does not converge. j ∈ S. . we know f (x) = P (X = x). deﬁne stability in a way other than convergence of the distributions of the random variables. for instance.3 Coupling To understand why the fundamental stability theorem works.g. Y = y) is? 47 . g(y) = P (Y = y). ∞ ˆ The nice thing is that. Coupling refers to the various methods for constructing this joint law. (e. for any initial distribution. How should we then deﬁne what P (X = x. for time-homogeneous Markov chains. If a Markov chain is irreducible. we have ˜ ˆ P (Xn ≤ x) = P (Xn ≤ x). converges. we have that the ﬁrst term goes to 0 as n → ∞. assume that the ξn are bounded) Xn does converge to ˆ X∞ = k=0 ρk ξk . to try to make the coincidence probability P (X = Y ) as large as possible. we need to understand the notion of coupling. Y = y) is. . A straightforward (often not so useful) construction is to assume that the random variables are independent and deﬁne P (X = x. but notice that if we let ˆ Xn := ρn−1 ξn−1 + ρn−2 ξn−2 + · · · + ρξ1 + ξ0 . Y = y) = P (X = x)P (Y = y). 18. 18. ˜ Xn := ρn−1 ξ0 + ρn−2 ξ1 + · · · + ρξn−2 + ξn−1 . uniformly in x.

Exercise: Let X, Y be random variables with values in Z and laws f(x) = P(X = x), g(y) = P(Y = y), respectively, but we are not told what the joint probabilities are. Find the joint law h(x, y) = P(X = x, Y = y) which respects the marginals, i.e.
f(x) = Σ_y h(x, y),  g(y) = Σ_x h(x, y),
and which maximises P(X = Y).

The coupling we are interested in is a coupling of processes. We are given the law of a stochastic process X = (Xn, n ≥ 0) and the law of a stochastic process Y = (Yn, n ≥ 0), but we are not told what the joint probabilities are. A coupling of them refers to a joint construction on the same probability space. We say that a random time T is a meeting time of the two processes if
Xn = Yn for all n ≥ T.
This T is a random variable that takes values in the set of times or it may take the value +∞, in particular, on the event that the two processes never meet. Note that it makes sense to speak of a meeting time precisely because of the assumption that the two processes have been constructed together.

The following result is the so-called coupling inequality:

Proposition 4. Let T be a meeting time of two coupled stochastic processes X, Y. Then, for all n ≥ 0, and all x ∈ S,
|P(Xn = x) − P(Yn = x)| ≤ P(T > n).
In particular, if P(T < ∞) = 1, then
sup_{x∈S} |P(Xn = x) − P(Yn = x)| → 0, as n → ∞.

Proof. Suppose that we have two stochastic processes as above and suppose that they have been coupled. We have
P(Xn = x) = P(Xn = x, n < T) + P(Xn = x, n ≥ T)
= P(Xn = x, n < T) + P(Yn = x, n ≥ T)
≤ P(n < T) + P(Yn = x),
and hence
P(Xn = x) − P(Yn = x) ≤ P(T > n).    (19)
Repeating (19), but with the roles of X and Y interchanged, we obtain
P(Yn = x) − P(Xn = x) ≤ P(T > n).
Combining the two inequalities we obtain the first assertion:
|P(Xn = x) − P(Yn = x)| ≤ P(T > n), n ≥ 0, x ∈ S.
Since this holds for all x ∈ S and since the right hand side does not depend on x, we can write it as
sup_{x∈S} |P(Xn = x) − P(Yn = x)| ≤ P(T > n).
If, in particular, T is finite with probability one, i.e. P(T < ∞) = 1, then P(T > n) → 0 as n → ∞, and so |P(Xn = x) − P(Yn = x)| → 0, uniformly in x ∈ S.
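The following sketch (not in the notes) couples two copies of a small chain, one started from a fixed state and one started from the stationary distribution, by running them independently until they meet and together afterwards; the coupling inequality then bounds the distance of P(Xn = x) from π(x). The chain and its stationary distribution below are an assumed example.

import numpy as np

rng = np.random.default_rng(7)
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
pi = np.array([0.25, 0.5, 0.25])      # stationary distribution of this particular P

def meeting_time():
    x, y = 0, rng.choice(3, p=pi)
    n = 0
    while x != y:                     # run independently until the two copies meet
        x = rng.choice(3, p=P[x])
        y = rng.choice(3, p=P[y])
        n += 1
    return n

T = np.array([meeting_time() for _ in range(20_000)])
for n in [1, 3, 5, 10]:
    Pn = np.linalg.matrix_power(P, n)
    dist = np.abs(Pn[0] - pi).max()   # sup_x |P(X_n = x) - pi(x)|, X_0 = 0
    print(n, round(dist, 4), round(float(np.mean(T > n)), 4))   # dist <= P(T > n), up to Monte Carlo error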

n ≥ 0) is an irreducible chain. in other words. It is very important to note that we have only deﬁned the law of X and the law of Y but we have not deﬁned a joint law. Then (Wn .y′ . denoted by Y = (Yn . Notice that σ(x. and so (Wn . positive recurrent and aperiodic. Yn ). 49 (n) (n) (n) (n) (n) (n) = P (Xn+1 = x′ | Xn = x) P (Yn+1 = y ′ | Yn = y) = px. Having coupled them. implying that q(x.y′ ) = px.x′ py. review the assumptions of the theorem: the chain is irreducible. y) = µ(x)π(y). then. as usual. they diﬀer only in how the initial states are chosen. The ﬁrst process.4 Proof of the fundamental stability theorem First. (Indeed. π(x) > 0 for all x ∈ S.(y. n ≥ 0) is independent of Y = (Yn . From Theorem 3 and the aperiodicity assumption. . n ≥ 0). We now couple them by doing the straightforward thing: we assume that X = (Xn .y′ . n ≥ 0) has the law of a Markov chain with transition probabilities pij and Y0 distributed according to the stationary distribution µ.(y. n ≥ 0) is a Markov chain with state space S × S. we can deﬁne joint probabilities. and it also makes sense to consider their ﬁrst meeting time: T := inf{n ≥ 0 : Xn = Yn }. had we started with both X0 and Y0 independent and distributed according to π.(y. Then n→∞ lim P (T > n) = P (T = ∞) = 0. Therefore. 18.x′ py. Its (1-step) transition probabilities are q(x. Its initial state W0 = (X0 . the transition probabilities. y) := π(x)π(y). for all n ≥ 1. Y0 ) has distribution P (W0 = (x. We know that there is a unique stationary distribution. Xm = y). we have that px.x′ ). Let pij denote. So both processes are positive recurrent aperiodic Markov chains with the same transition probabilities. n ≥ 0).y′ ) := P Wn+1 = (x′ . The second process. sup |P (Xn = x) − P (Yn = x)| → 0.) By positive recurrence.x′ ). denoted by π. y ′ ) | Wn = (x. denoted by X = (Xn . Consider the process Wn := (Xn . x∈S as n → ∞.x′ ). is a stationary distribution for (Wn . n ≥ 0) has the law of a Markov chain with transition probabilities pij and X0 distributed according to some arbitrary distribution µ. y) Its n-step transition probabilities are q(x. (x.y′ for all large n. Xn and Yn would be independent and distributed according to π. We shall consider the laws of two processes.Assume now that P (T < ∞) = 1.y′ ) for all large n. we have not deﬁned joint probabilities such as P (Xn = x.x′ > 0 and py. y) ∈ S × S.

suppose that S = {0. which converges to zero. (0. Y ) may fail to be irreducible. 1). By the coupling inequality. Example showing that the aperiodicity assumption is essential for the method used in the proof: If pij are the transition probabilities of an irreducible chain X and if Y is an independent copy of X.(1. if we modify the original chain and make it aperiodic by. P (Xn = x) = P (Zn = x). Deﬁne now a third process by: Zn := Xn 1(n < T ) + Yn 1(n ≥ T ). n ≥ 0). p10 = 1. In particular. n ≥ 0) is positive recurrent. T is a meeting time between Z and Y . as n → ∞. we have |P (Xn = x) − π(x)| ≤ P (T > n). Z0 = X0 which has an arbitrary distribution µ. 0).(0.1). after the meeting time. Moreover. 1} and the pij are p01 = p10 = 1. 1)} and the transition probabilities are q(0.(0. ≥ 0) is a Markov chain with transition probabilities pij . However.99. Hence (Zn . (Zn . p01 = 0. In particular. 50 .Therefore σ(x. Hence (Wn .0). by assumption. Zn equals Xn before the meeting time of X and Y . Since P (Zn = x) = P (Xn = x). Then the product chain has state space S ×S = {(0. (1. y) ∈ S × S. uniformly in x.(1. Clearly. for all n and x. p00 = 0. y) > 0 for all (x.0) = q(1. n ≥ 0) is identical in law to (Xn . This is not irreducible.01. |P (Zn = x) − P (Yn = x)| ≤ P (T > n).0). (1. Zn = Yn . and since P (Yn = x) = π(x). say. X arbitrary distribution Z stationary distribution Y T Thus. Obviously. 0). precisely because P (T < ∞) = 1. For example.1).1) = q(1.1) = 1.0) = q(0. then the process W = (X. P (T < ∞) = 1.

then the product chain becomes irreducible.

Example showing that the aperiodicity assumption is essential for the limiting result: Consider the same chain as above: p01 = p10 = 1. Suppose X0 = 0. Then Xn = 0 for even n and Xn = 1 for odd n. Therefore p00^(n) = 1 for even n and p00^(n) = 0 for odd n, and so lim_{n→∞} p00^(n) does not exist. However, if we modify the chain by
p00 = 0.01, p01 = 0.99, p10 = 1,
then the chain becomes aperiodic (and it is, clearly, irreducible and positive recurrent). By the results of Section 14, we know we have a unique stationary distribution π and that π(x) > 0 for all states x. Hence
lim_{n→∞} p00^(n) = lim_{n→∞} p10^(n) = π(0),  lim_{n→∞} p11^(n) = lim_{n→∞} p01^(n) = π(1),
where π(0), π(1) satisfy the balance equations
0.99 π(0) = π(1), π(0) + π(1) = 1,
so that π(0) ≈ 0.503, π(1) ≈ 0.497. In matrix notation,
lim_{n→∞} [ p00^(n) p01^(n) ; p10^(n) p11^(n) ] = [ 0.503 0.497 ; 0.503 0.497 ].

18.5 Terminology

A Markov chain which is irreducible and positive recurrent will be called ergodic. The reader is asked to pay attention to this terminology, which is different from the one used in the Markov chain literature. Indeed, most people use the term "ergodic" for an irreducible, positive recurrent and aperiodic Markov chain. But this is "wrong". (We put "wrong" in quotes: terminology cannot be, per se, wrong; it is, however, customary to have a consistent terminology, and the term "ergodic" means something specific in Mathematics (c.f., e.g., Parry, 1981), which is expressed by our terminology.) A Markov chain which is irreducible and positive recurrent and aperiodic will be called mixing.

18.6 Periodic chains

Suppose now that we have a positive recurrent irreducible chain and suppose that the period of the chain is d. By the results of §5.4 and, in particular, Theorem 4, we can decompose the state space into d classes C0, C1, . . . , Cd−1, such that the chain moves cyclically between them: from C0 to C1, from C1 to C2, and so on, and from Cd−1 back to C0. (It is not difficult to see that such a decomposition exists.)
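A quick numerical check (not in the notes) of the two-state periodic example above: the powers P^n oscillate, while the chain sampled every d = 2 steps converges, in agreement with the results of this subsection. For this chain π = (1/2, 1/2), so the sampled limit is 2π(0) = 1; note that P^2 is the identity here, so the sampled chain is trivially convergent.

import numpy as np

P = np.array([[0.0, 1.0],
              [1.0, 0.0]])            # period-2 chain with p01 = p10 = 1
for n in [10, 11]:
    print(n, np.linalg.matrix_power(P, n)[0])    # alternates between (1, 0) and (0, 1)

P2 = np.linalg.matrix_power(P, 2)     # the chain watched every 2 steps
print(np.linalg.matrix_power(P2, 50)[0])         # p00^(2n) equals 1 = 2*pi(0) for all n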

after reduction by d) and. Can we also ﬁnd the limit when i and j do not belong to the same class? Suppose that i ∈ Ck . i ∈ Cr . as n → ∞. j ∈ Cℓ . n ≥ 0) is dπ(i). i ∈ S. π(i) = 1/d. Let αr := i∈Cr π(i). as n → ∞. starting from i. Suppose that X0 ∈ Cr . . . Suppose that pij are the transition probabilities of a positive recurrent irreducible chain with period d. Hence if we restrict the chain (Xnd .Lemma 8. . and let r = ℓ − k (modulo d). . Then pji = 0 unless j ∈ Cr−1 . 1. Then. the original chain enters Cℓ in r = ℓ − k steps (where r is taken to be a number between 0 and d − 1. We shall show that Lemma 9. Then pij (r+nd) (nd) = Pi (Xnd = j) → dπ(j). i ∈ Cr . Since the αr add up to 1. So π(i) = j∈Cr−1 π(j)pji . we have pij for all i. n ≥ 0) to any of the sets Cr we obtain a positive recurrent irreducible aperiodic chain. if sampled every d steps. The chain (Xnd . and the previous results apply. . j ∈ Cℓ . Proof. Summing up over i ∈ Cr we obtain αr = i∈Cr π(i) = j∈Cr−1 π(j) i∈Cr pji = j∈Cr−1 π(j) = αr−1 . n ≥ 0) is aperiodic. namely C0 . it remains in Cℓ . Since (Xnd . All states have period 1 for this chain. Suppose i ∈ Cr . . each one must be equal to 1/d. . Let i ∈ Ck . n ≥ 0) consists of d closed communicating classes. d − 1. j ∈ Cr . This means that: Theorem 18. We have that π satisﬁes the balance equations: π(i) = j∈S π(j)pji . 52 . the (unique) stationary distribution of (Xnd . where π is the stationary distribution. Cd−1 . . thereafter. → dπ(j). i∈Cr for all r = 0. Then.

). . schematically. . k = 1. Since they are (n2 ) subsequence n2 of n1 such that limk→∞ p1 k k k (nk ) (n) as n → ∞. . . for all i ∈ S. a probability distribution on N. 19. k = 1. p2 . 2. Since the numbers p1 n1 1 n1 2 (n) (n1 ) < < n1 < · · · . then pij → π(j) (Theorems 17). .e. i.19 Limiting theory for Markov chains* We wish to complete our discussion about the possible limits of the n-step transition prob(n) abilities pij . . If the period is d then the limit exists only along a selected subsequence (Theorem 18). Now consider the contained between 0 and 1 there must be a exists and equals. p3 . 2. By construction. It is clear how we (ni ) k continue. say. such that limk→∞ p1 k 3 (n1 ) numbers p2 k .1 Limiting probabilities for null recurrent states Theorem 19. Then we can ﬁnd a i=1 pi sequence n1 < n2 < n3 < · · · of integers such that limk→∞ pi Proof. p1 . for each n = 1. where = 1 and pi ≥ 0 for all i.. . p(n) = (n) (n) (n) (n) (n) ∞ p1 . k = 1. then pij → 0 as n → ∞. for all i ∈ S. p2 . (nk ) k → pi . So the diagonal subsequence (nk . . . if j is transient. are contained between 0 and 1. k = 1. such that limk→∞ pi k exists and equals. For each i we have found subsequence ni . as k → ∞. pi . say. . for each The second result is Fatou’s lemma. We also saw in Corollary (n) 7 that. say. . . Hence pi k i. . . the sequence n2 of the second row k is a subsequence of n1 of the ﬁrst row. the ﬁrst of which is known as HellyBray Lemma: Lemma 10. exists for all i ∈ N. Consider. If j is a null recurrent state then pij → 0. . The proof will be based on two results from Analysis. 2. . 2. for each i. . p1 p1 (n1 ) 1 p1 (n1 ) 2 (n2 ) 2 p1 (n1 ) 3 (n2 ) 3 (n3 ) 3 (n2 ) 1 (n3 ) 1 ··· ··· ··· ··· → → → ··· p1 p2 p3 ··· p1 p1 ··· p1 p1 ··· (n3 ) 2 p1 ··· Now pick the numbers in the diagonal. as n → ∞. .) is a k k subsequence of each (ni . 2. Recall that we know what happens when j is positive recurrent and i is in the same commu(n) nicating class as j If the period is 1. .. and so on. We have. e 53 . the sequence n3 of the third row is a subsequence k k of n2 of the second row. there must be a subsequence exists and equals. whose proof we omit7 : 7 See Br´maud (1999).

x∈S (n′ ) Since aj > 0. we have that π(x) := αx y∈S αy satisﬁes the balance equations and has x∈S π(x) = 1. 54 . (n′ ) Hence αx ≥ αy pyx . actually. by assumption. αy pyx . (n ) (n ) Consider the probability vector By Lemma 10. 2. By Corollary 13. k→∞ (n′ +1) = y∈S piy k pyx .Lemma 11. pix k . it is a strict inequality: αx0 > y∈S αy pyx0 . Let j be a null recurrent state. . i=1 (n) Proof of Theorem 19. equalities. . This would imply that αx > x∈S x∈S y∈S x∈S αy pyx = y∈S αy x∈S pyx = y∈S αy . i ∈ N). for some x0 ∈ S. i. the sequence (n) pij does not converge to 0. pix k = 1. (n′ ) lim pix k (n′ +1) ≥ y∈S k→∞ lim piy k pyx . But Pn+1 = Pn P. are sequences of nonnegative numbers then ∞ (n) yi lim inf xi i=1 n→∞ (n) ∞ ≤ lim inf n→∞ yi x i . i ∈ N). If (yi . i. y∈S Since x∈S αx ≤ 1.e. (xi . Now π(j) > 0 because αj > 0.e. αx ≤ lim inf k→∞ (n′ ) for all x ∈ S. Suppose that for some i. and the last equality is obvious since we have a probability vector. we have 0< x∈S The inequality follows from Lemma 11. x ∈ S . π is a stationary distribution. pix k Therefore. Suppose that one of them is not. We wish to show that these inequalities are. i. state j is positive recurrent: Contradiction. Hence we can ﬁnd a subsequence nk such that k→∞ lim pij k = αj > 0. y∈S x ∈ S. . n = 1. and this is a contradiction (the number αx = αx cannot be strictly larger than itself). by Lemma 11. we can pick subsequence (n′ ) of (nk ) such that k k→∞ lim pix k = αx .e. x ∈ S. Thus.

Let i be an arbitrary state. Let HC := inf{n ≥ 0 : Xn ∈ C}. Pµ (Xn−m = j) → π C (j). Indeed. Let C be the communicating class of j and let π C be the stationary distribution associated with C. as in §10. Theorem 20. we have pij = Pi (Xn = j) = Pi (Xn = j. From the Strong Markov Property. Hence (n) lim p n→∞ ij ∞ = π (j) m=1 C Pi (HC = m) = π C (j)Pi (HC < ∞). so we will only look that the aperiodic case. Let j be a positive recurrent state with period 1. In general. β. starting with the ﬁrst hitting time decomposition relation of Lemma 7. we have Pi (Xn = j | HC = m) = Pµ (Xn−m = j). The result when the period j is larger than 1 is a bit awkward to formulate. In this form. and fij = Pi (Tj < ∞). Let µ be the distribution of XHC given that X0 = i and that {HC < ∞}. and fij = ϕC (j). that is. as n → ∞.2. Then (n) lim p n→∞ ij = ϕC (i)π C (j). Proof. the result can be obtained by the so-called Abel’s Lemma of Analysis. γ < 1): γ 1−γ 1 1 2 1−α α 3 β 4 1−β 5 1 55 . E j Tj where Tj = inf{n ≥ 1 : Xn = j}. which is what we need. π C (j) = 1/Ej Tj . Remark: We can write the result of Theorem 20 as (n) lim p n→∞ ij = fij . Example: Consider the following chain (0 < α. But Theorem 17 tells us that the limit exists regardless of the initial distribution. as n → ∞.2 The general case (n) It remains to see what the limit of pij is. This is a more old-fashioned way of getting the same result. when j is a positive recurrent state but i and j do not necessarily belong to the same communicating class. If i also belongs to C then ϕC (i) = 1 and so the result follows from Theorem 17.19. HC ≤ n) n (n) = m=1 Pi (HC = m)Pi (Xn = j | HC = m). Let ϕC (i) := Pi (HC < ∞).

π C2 (5) = 1. Suppose that a Markov chain possesses a recurrent state a ∈ S and starts from an initial state X0 such that P (Ta < ∞) = 1. The phase (or state) space of a particle is not just a physical space but it includes. Pioneers in the are were. And so is the Law of Large Numbers itself (Theorem 13). p45 → (1 − ϕ(4))π C2 (5). information about its velocity. The ergodic hypothesis in physics says that the proportion of time that a particle spends in a particular region of its “phase space” is proportional to the volume of this region. (n) (n) p32 → ϕ(3)π C1 (2). among others. (n) (n) p34 → 0. ranging from Mathematical Physics and Dynamical Systems to Probability and Statistics. ϕ(4) = . Hence. Recall that. Theorem 14 is a type of ergodic theorem. Let f be a reward function such that |f (x)|ν [a] (x) < ∞. Consider the quantity ν [a] (x) deﬁned as the average number of visits to state x between two successive visits to state a–see 16. as n → ∞. 1+γ 1+γ Let ϕ(i) be the probability that class C1 will be reached starting from state i. 56 (20) x∈S . p44 → 0. from ﬁrst-step analysis. 1 − αβ 1 − αβ Both C1 . (n) (n) p43 → 0. p41 → ϕ(4)π C1 (1). ϕ(4) = (1 − β)ϕ(5) + βϕ(3). p42 → ϕ(4)π C1 (2). ϕ(5) = 0. ϕ(3) = p31 → ϕ(3)π C1 (1). 2} (closed class). in Theorem 14. π C1 (2) = . We have ϕ(1) = ϕ(2) = 1. We will avoid to give a precise deﬁnition of the term ‘ergodic’ but only mention what constitutes a result in the area. we assumed the existence of a positive recurrent state. 4} (inessential states). Ergodic Theory is nowadays an area that combines many ﬁelds of Mathematics and Science. 20 Ergodic theorems An ergodic theorem is a result about time-averages of random processes. We now generalise this. C2 = {5} (absorbing state). (n) (n) p35 → (1 − ϕ(3))π C2 (5). The subject of Ergodic Theory has its origins in Physics. C1 = {1. ϕ(3) = (1 − α)ϕ(2) + αϕ(4) . (n) (n) p33 → 0. C2 are aperiodic classes. whence 1−α β(1 − α) . The stationary distributions associated with the two closed classes are: 1 γ π C1 (1) = . For example. while. for example. Similarly.The communicating classes are I = {3. Theorem 21. the 19-th century French mathematicians Henri Poincar´ and e Joseph Liouville.

t t→∞ t t t with probability one. Glast are the total rewards over the ﬁrst and t t a a last incomplete cycles. Note that the denominator t n=0 1(Xn = a) equals the number Nt of visits to state a up to time t. in the former. as we saw in the proof of Theorem 14. The reader is asked to revisit the proof of Theorem 14. 57 . t t where Gr = T (r) ≤t<T (r+1) f (Xt ). Proof. Remark 2: If a chain is irreducible and positive recurrent then the Ergodic Theorem Theorem 14 holds. we break the total reward as follows: t Nt −1 f (Xn ) = n=0 r=1 Gr + Gﬁrst + Glast . The Law of Large Numbers tells us that 1 lim n→∞ n n−1 Gr = EG1 . (21) f (x)ν [a] (x). then. so the numerator converges to f /Ea Ta . Remark 1: The diﬀerence between Theorem 14 and 21 is that. Then at least one of the states must be visited inﬁnitely often. and this justiﬁes the terminology of §18. We only need to check that EG1 = f . since the number of states is ﬁnite. we have Nt /t → 1/Ea Ta .5. we assumed positive recurrence of the ground state a. So if we divide by t the numerator and denominator of the fraction in (21). As in the proof of Theorem 14. while Gﬁrst . we have C := i∈S ν(i) < ∞. r=0 with probability 1 as long as E|G1 | < ∞.Then P where f := x∈S t→∞ lim t n=0 t n=0 f (Xn ) 1(Xn = a) =f =1 . we obtain the result. Now. But this is precisely the condition (20). So there are always recurrent states. As in Theorem 14. Replacing n by the subsequence Nt . respectively. and this is left as an exercise. 21 Finite chains We take a brief look at some things that are speciﬁc to chains with ﬁnitely many states. If so. From proposition 2 we know that the balance equations ν = νP have at least one solution. we have that the denominator equals Nt /t and converges to 1/Ea Ta . we have t→∞ lim 1 last 1 G = lim Gﬁrst = 0. which is the quantity denoted by f in Theorem 14.

For example. In addition. then we instantly know that the ﬁlm is played in reverse. for all n. More precisely. 58 . . We summarise the above discussion in: Theorem 22. π(i) := Hence. . Xn has the same distribution as X0 for all n. X0 ). Reversibility means that (X0 . If. X1 = j) = P (X1 = i. . j ∈ S. (X0 . by Theorem 16. the chain is aperiodic. X0 ) for all n. By Corollary 13. if we ﬁlm a standing man who raises his arms up and down successively. a ﬁnite Markov chain always has positive recurrent states. in addition. it will look the same. like playing a ﬁlm backwards. i ∈ S. where P is the transition matrix and Π is a matrix all the rows of which are equal to π. Taking n = 1 in the deﬁnition of reversibility. we have Pn → Π. . this state must be positive recurrent. 22 Time reversibility Consider a Markov chain (Xn . X1 . actually. Then the chain is positive recurrent with a unique stationary distribution π. . that is. Xn ) has the same distribution as (Xn . . because positive recurrence is a class property (Theorem 15). In particular. n ≥ 0).Hence 1 ν(i). It is. Another way to put this is by saying that observing any part of the process will not reveal any information about whether the process runs forwards or backwards in time. then all states are positive recurrent. We say that it is time-reversible. . we see that (X0 . Lemma 12. Proof.e. . without being able to tell that it is. Xn−1 . . Theorem 23. Xn−1 . this means that it is stationary. i. C satisﬁes the balance equations and the normalisation condition i∈S π(i) = 1. A reversible Markov chain is stationary. running backwards. i. . the ﬁrst components of the two vectors must have the same distribution. Then the chain is reversible if and only if the detailed balance equations hold: π(i)pij = π(j)pji . X0 ). the stationary distribution is unique. Proof. At least some state i must have π(i) > 0. Consider an irreducible Markov chain with transition probabilities pij and stationary distribution π. X1 . X1 ) has the same distribution as (X1 . in a sense. Therefore π is a stationary distribution. . X0 = j). or. as n → ∞. Suppose ﬁrst that the chain is reversible. Consider an irreducible Markov chain with ﬁnitely many states. in addition. Because the process is Markov. and run the ﬁlm backwards. . P (X0 = i. Xn ) has the same distribution as (Xn . . But if we ﬁlm a man who walks and run the ﬁlm backwards. . reversible if it has the same distribution when the arrow of time is reversed. . . If. the chain is irreducible. simply.

X1 = j) = π(i)pij . So the chain is reversible. . (22) Proof. and hence (X0 . The DBE imply immediately that (X0 . In other words. it is easy to show that. π(i)pij pjk pki = [π(j)pji ]pjk pki = [π(j)pjk ]pji pki = [π(k)pkj ]pji pki = [π(k)pki ]pji pkj = [π(i)pik ]pji pkj = π(i)pik pkj pji Since the chain is irreducible. X1 . k. X0 ). (X0 . im ∈ S and all m ≥ 3. . . using DBE. . . suppose the DBE hold. . X0 ). suppose that the KLC (22) holds. X1 . j. . These are the DBE. X1 = j. The same logic works for any m and can be worked out by induction. . Summing up over all choices of states i3 . Then the chain is reversible if and only if the Kolmogorov’s loop criterion holds: for all i1 . . X2 ) has the same distribution as (X2 . . X2 = k) = P (X0 = k. X1 = i) = π(j)pji . we have π(i) > 0 for all i. Xn ) has the same distribution as (Xn . . for all i. using the DBE. Consider an aperiodic irreducible Markov chain with transition probabilities pij and stationary distribution π. Conversely. P (X0 = j. pi1 i2 pi2 i3 · · · pim−1 im = pi1 im pim im−1 · · · pi2 i1 . Pick 3 states i. then. X1 ) has the same distribution as (X1 . By induction. X1 . we have separation of variables: pij = f (i)g(j). . j. and write. X1 = j. k. Xn−1 . i4 . we have pi2 i1 implying that π(i1 )pi1 i2 = π(i2 )pi2 i1 . i2 . But P (X0 = i. Conversely. if there is an arrow from i to j in the graph of the chain but no arrow from j to i. and this means that P (X0 = i. Notes: 1) A necessary condition for reversibility is that pij > 0 if and only if pji > 0. Theorem 24. X0 ). Now pick a state k and write. Hence cancelling the π(i) from the above display we obtain the KLC for m = 3. 2) The DBE are equivalent to saying that the ratio pij /pji can be written as a product of a function of i and a function of j. π(i)pij pjk = [π(j)pji ]pjk = [π(j)pjk ]pji = π(k)pkj pji . So the detailed balance equations (DBE) hold. . . Hence the DBE hold.for all i and j. and the chain is reversible. and pi1 i2 (m−3) → π(i2 ). then the chain cannot be reversible. im−1 we obtain (m−3) (m−3) pi1 i2 pi2 i1 = pi1 i2 pi2 i1 If the chain is aperiodic. . letting m → ∞. . Suppose ﬁrst that the chain is reversible. for all n. . We will show the Kolmogorov’s loop criterion (KLC). pji 59 (m−3) → π(i1 ). X2 = i).

pji g(i) Multiplying by g(i) and summing over j we ﬁnd g(i) = j∈S g(j)pji . and. There is obviously only one arrow from AtoAc . and by replacing. the graph becomes disconnected. Hence the original chain is reversible.1) which gives π(i)pij = π(j)pji . This gives the following method for showing that a ﬁnite irreducible chain is reversible: • Form the ratio pij /pji and show that the variables separate. The reason is as follows. For example. to guarantee.e. the detailed balance equations. consider the chain: Replace it by: This is a tree. meaning that it has no loops. then the chain is reversible. in addition. Form the graph by deleting all self-loops. the two oriented edges from i to j and from j to i by a single edge without arrow. (If it didn’t. Hence the chain is reversible. by assumption. and the arrow from j to j is the only one from Ac to A. there would be a loop. using another method. i. Pick two distinct states i. for each pair of distinct states i. j for which pij > 0. then we need. there isn’t any. Ac ) = F (Ac .) Let A. If we remove the link between them. A) (see §4. so we have pij g(j) = . j for which pij > 0.) Necessarily. If the graph thus obtained is a tree. f (i)g(i) must be 1 (why?). namely the arrow from i to j. that the chain is positive recurrent.(Convention: we form this ratio only when pij > 0. Ac be the sets in the two connected components. Example 1: Chain with a tree-structure. • If the chain has inﬁnitely many states. Consider a positive recurrent irreducible Markov chain such that if pij > 0 then pji > 0. And hence π can be found very easily. 60 . So g satisﬁes the balance equations. The balance equations are written in the form F (A.

Example 2: Random walk on a graph. Consider an undirected graph G with vertices S, a finite set, and a set E of undirected edges. Each edge connects a pair of distinct vertices (states) without orientation. States i and j are called neighbours if they are connected by an edge. For each state i define its degree

δ(i) = number of neighbours of i.

Now define the following transition probabilities:

pij = 1/δ(i), if j is a neighbour of i;   pij = 0, otherwise.

The Markov chain with these transition probabilities is called a random walk on a graph (RWG).

[Figure: an example graph on the vertices 0, 1, 2, 3, 4.] Here we have p12 = p10 = p14 = 1/3, p02 = 1/4, etc.

A RWG is a reversible chain. To see this let us verify that the detailed balance equations hold:

π(i)pij = π(j)pji.

We claim that π(i) = Cδ(i), where C is a constant. There are two cases. If i, j are neighbours then π(i)pij = Cδ(i) · (1/δ(i)) = C and π(j)pji = Cδ(j) · (1/δ(j)) = C, so both members are equal to C. If i, j are not neighbours then both sides are 0. In all cases, π(i)pij = π(j)pji, i.e. the DBE hold. The constant C is found by normalisation, i.e. Σ_{i∈S} π(i) = 1. Since Σ_{i∈S} δ(i) = 2|E|, where |E| is the total number of edges, we have that C = 1/2|E|.

Hence we conclude that:
1. A RWG is a reversible chain.
2. The stationary distribution of a RWG is very easy to find: it assigns to a state a probability which is proportional to its degree.

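As a quick numerical sanity check of these two conclusions, the following sketch builds a random walk on a small graph (the 5-vertex edge list is an arbitrary choice made for the example, not the graph in the figure) and verifies that π(i) = δ(i)/2|E| satisfies both the detailed balance equations and π = πP.

```python
import numpy as np

# Hypothetical undirected graph on vertices 0..4, given as an edge list.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 4), (3, 4)]
S = 5

deg = np.zeros(S)
P = np.zeros((S, S))
for i, j in edges:
    deg[i] += 1
    deg[j] += 1
    P[i, j] = P[j, i] = 1.0          # mark neighbours
P = P / deg[:, None]                  # p_ij = 1/delta(i) for neighbours j

pi = deg / (2 * len(edges))           # claimed stationary distribution, C = 1/(2|E|)

print(np.allclose(pi @ P, pi))        # pi P = pi
flux = pi[:, None] * P                # flux[i, j] = pi(i) p_ij
print(np.allclose(flux, flux.T))      # detailed balance equations
```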
where ξt are i. . 62 n ∈ N. . . St+1 = St + ξt . Let (St . . . where pij are the m-step transition probabilities of the original chain. in addition. The sequence (Yk . . 2. . 2. If π is the stationary distribution of the original chain then π/π(A) is the stationary distribution of (Yr ). . random variables with values in N and distribution λ(n) = P (ξ0 = n). t ≥ 0. k = 0. then it is a stationary distribution for the new chain. If the subsequence forms an arithmetic progression. if nk = n0 + km. π is a stationary distribution for the original chain. Let A be (r) a set of states and let TA be the r-th visit to A. in general.) has the Markov property but it is not time-homogeneous. Deﬁne Yr := XT (r) .e.) has time-homogeneous transition probabilities: P (Yk+1 = j|Yk = i) = pij . k = 0. 1.3 Subordination Let (Xn . k = 0. . 2. . Due to the Strong Markov Property. . (m) (m) 23. a sequence of integers (nk )k≥0 such that 0 ≤ n0 < n1 < n2 < · · · and by letting Yk := Xnk .2 Watching a Markov chain when it visits a set Suppose that the original chain is ergodic (i. n ≥ 0) be Markov with transition probabilities pij . then (Yk .e.d.i. 1.e. . k = 0. i. . i. A r = 1.23 New Markov chains from old ones* We now deal with the question: if we have a Markov chain (Xn ). t ≥ 0) be independent from (Xn ) with S0 = 0. 1. 1. . . 2. . irreducible and positive recurrent). . this process is also a Markov chain and it does have time-homogeneous transition probabilities. The state space of (Yr ) is A. 2.1 Subsequences The most obvious way to create a new Markov chain is by picking a subsequence. In this case. 23. if. how can we create new ones? 23.

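The second construction above (watching the chain only when it visits a set A) is easy to experiment with. A minimal sketch, where the three-state chain and the set A = {0, 1} are arbitrary choices made for illustration:

```python
import random
from collections import Counter

# A hypothetical 3-state chain; row i lists P(next = j | current = i).
P = [[0.2, 0.5, 0.3],
     [0.4, 0.1, 0.5],
     [0.3, 0.3, 0.4]]
A = [0, 1]                      # the set we watch the chain on

x, Y = 0, []
for _ in range(100000):
    x = random.choices([0, 1, 2], weights=P[x])[0]
    if x in A:
        Y.append(x)             # record the chain only at visits to A

# (Y_r) should again be a Markov chain; estimate its transition probabilities.
counts = Counter(zip(Y, Y[1:]))
for i in A:
    row_total = sum(counts[(i, j)] for j in A)
    print(i, [round(counts[(i, j)] / row_total, 3) for j in A])
```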
.4 Deterministic function of a Markov chain If (Xn ) is a Markov chain with state space S and if f : S → S ′ is some function from S to some countable set S ′ . . Suppose that P (Yn+1 = β | Xn = i) depends on i only through f (i). however. β). We need to show that P (Yn+1 = β | Yn = α. β. knowledge of f (Xn ) implies knowledge of Xn and so. that there is a function Q : S ′ × S ′ → R such that P (Yn+1 = β | Xn = i) = Q(f (i).Then Yt := XSt .e. we have: Lemma 13.. i. Yn−1 = αn−1 }. is a Markov chain with P (Yt+1 = j|Yt = i) = n=1 t ≥ 0. . a Markov chain. . n≥0 63 . and the elements of S ′ by α. . Then (Yn ) has the Markov property. α1 . ∞ λ(n)pij . . . conditional on Yn the past is independent of the future. . . αn−1 ∈ S ′ and let Yn−1 be the event Yn−1 := {Y0 = α0 ..) More generally. . . . then the process Yn = f (Xn ). Indeed. . . (Notation: We will denote the elements of S by i. Proof. Fix α0 . If. is NOT. (n) 23. in general. j. Yn−1 ) = P (Yn+1 = β | Yn = α). f is a one-to-one function then (Yn ) is a Markov chain.

if β = 1. if i = 0. β) := 1. Yn−1 ) Q(f (i). β) 1(α = f (i))P (Xn = i|Yn−1 ) P (Yn = α|Yn−1 ) = Q(α. β). But P (Yn+1 = β | Yn = α. β). In other words. indeed. To see this. and if i = 0. conditional on {Xn = i}. Yn−1 )P (Xn = i|Yn−1 ) P (Yn = α|Yn−1 ) = i∈S = i∈S = i∈S Q(f (i). regardless of whether i > 0 or i < 0. Yn−1 ) = i∈S P (Yn+1 = β | Xn = i. i ∈ Z. Indeed.e. then |Xn+1 | = 1 for sure. α > 0 Q(α. the absolute value of next state will either increase by 1 or decrease by 1 with probability 1/2. if β = α ± 1. a Markov chain with transition probabilities Q(α. β). Xn = i|Yn−1 )P (Xn = i|Yn−1 ) P (Yn = α|Yn−1 ) P (Yn = α|Yn−1 ) = Q(α. Yn−1 )P (Xn = i | Yn = α. for all i ∈ Z and all β ∈ Z+ . β ∈ Z+ . i. Thus (Yn ) is Markov with transition probabilities Q(α. But we will show that P (Yn+1 = β|Xn = i). Deﬁne Yn := |Xn |. if we let 1/2. β) P (Yn = α|Yn−1 ) i∈S = Q(α. 64 . Then (Yn ) is a Markov chain. Example: Consider the symmetric drunkard’s random walk. Yn = α. β)P (Xn = i | Yn = α. Yn−1 )P (Xn = i|Yn−1 ) P (Yn = α|Yn−1 ) = Q(α. because the probability of going up is the same as the probability of going down. β) i∈S P (Yn = α|Xn = i. P (Xn+1 = i ± 1|Xn = i) = 1/2. α = 0. On the other hand. observe that the function | · | is not one-to-one. Yn−1 ) Q(f (i). β) P (Yn = α|Xn = i. We have that P (Yn+1 = β|Xn = i) = Q(|i|. depends on i only through |i|. and so (Yn ) is. β). β) P (Yn = α.for all choices of the variables involved. i ∈ Z.

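A short simulation supports the claim: the empirical transition frequencies of Yn = |Xn| for the symmetric drunkard's walk match Q(α, α ± 1) = 1/2 for α > 0 and Q(0, 1) = 1. (The run length below is an arbitrary choice.)

```python
import random
from collections import Counter

# Symmetric drunkard's walk and its absolute value Y_n = |X_n|.
X, Y = 0, [0]
for _ in range(200000):
    X += random.choice([-1, 1])
    Y.append(abs(X))

counts = Counter(zip(Y, Y[1:]))
for a in range(4):
    total = sum(c for (i, j), c in counts.items() if i == a)
    print(a, round(counts[(a, a + 1)] / total, 3))   # expect 1.0 for a = 0, 0.5 otherwise
```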
In other words. . Then P (ξ = 0) + P (ξ ≥ 2) > 0. The parent dies immediately upon giving birth.24 24. (and independent of the initial population size X0 –a further assumption).i. n ≥ 0) has the Markov property. . . Computing the transition probabilities pij = P (Xn+1 = j|Xn = i) is. therefore transient. if 0 is never reached then Xn → ∞. k = 0. and. p0 = 1 − p. ξ2 . we have that (Xn . Example: Suppose that p1 = p. If p1 = P (ξ = 1) = 1 then Xn = X0 + n and this is a trivial case. If we know that Xn = i then Xn+1 is the number of successful births amongst i individuals. since ξ (0) . Case II: If p0 = P (ξ = 0) > 0 then state 0 can be reached from any other state: indeed.1 Applications Branching processes: a population growth application We describe a basic model for population growth known as the Galton-Watson process. The state 0 is irrelevant because it cannot be reached from anywhere. It is a Markov chain with time-homogeneous transitions. each individual gives birth to at one child with probability p or has children with probability 1 − p. Hence all states i ≥ 1 are inessential. with probability 1. We let pk be the probability that an individual will bear k oﬀspring.d. ξ (1) . 0 ≤ j ≤ i. in general. The idea is this: At each point of time n we have a number Xn of individuals each of which gives birth to a random number of oﬀspring. independently from one another. if the 65 (n) (n) . 1. Therefore. if the current population is i there is probability pi that everybody gives birth to zero oﬀspring. . then we see that the above equation is of the form Xn+1 = f (Xn . the population cannot grow: it will eventually reach 0. and 0 is always an absorbing state. and following the same distribution. ξ (n) ). one question of interest is whether the process will become extinct or not. We distinguish two cases: Case I: If p0 = P (ξ = 0) = 0 then the population can only grow. . 2. are i. 0 But 0 is an absorbing state. Back to the general case. . So assume that p1 = 1. j In this case. If we let ξ (n) = ξ1 . Thus. a hard problem. . Let ξk be the number of oﬀspring of the k-th individual in the n-th generation. We ﬁnd the population (n) at time n + 1 by adding the number of oﬀspring of each individual. . We will not attempt to compute the pij neither draw the transition diagram because they are both futile. But here is an interesting special case. Then the size of the (n + 1)-st generation is: (n) (n) (n) Xn+1 = ξ1 + ξ2 + · · · + ξXn .. which has a binomial distribution: i j pij = p (1 − p)i−j . so we will ﬁnd some diﬀerent method to proceed. . Hence all states i ≥ 1 are transient.

we have n→∞ lim ϕn (0) = lim P (Xm = 0 for all m ≥ n) n→∞ = P (there exists n such that Xm = 0 for all m ≥ n) = P (D) = ε. If µ > 1 then ε < 1 is the unique solution of ϕ(z) = z. when we start with X0 = i. ε(0) = 1. so P (D|X0 = i) = εi . Proof. Consider a Galton-Watson process starting with X0 = 1 initial ancestor. Therefore the problem reduces to the study of the branching process starting with X0 = 1. ϕn (0) = P (Xn = 0). each one behaves independently of one another. Let ∞ µ = Eξ = k=1 kpk be the mean number of oﬀspring of an individual. This function is of interest because it gives information about the extinction probability ε. Let ε be the extinction probability of the process. and P (Dk ) = ε. The result is this: Proposition 5. . We let ϕn (z) := Ez Xn ∞ = k=0 z k P (Xn = k) be the probability generating function of Xn . and. . i. Indeed. We then have that D = D1 ∩ · · · ∩ Di . 66 . Obviously. . let Dk be the event that the branching process corresponding to the k-th initial ancestor dies. if we start with X0 = i in ital ancestors. . since Xn = 0 implies Xm = 0 for all m ≥ n. If µ ≤ 1 then the ε = 1. For each k = 1. Let ∞ ϕ(z) := Ez ξ = k=0 z k pk be the probability generating function of ξ. we can argue that ε(i) = εi . where ε = P1 (D). For i ≥ 1. We wish to compute the extinction probability ε(i) := Pi (D).population does not become extinct then it must grow to inﬁnity. Let D be the event of ‘death’ or extinction: D := {Xn = 0 for some n ≥ 0}. Indeed.

But ϕn+1 (0) → ε as n → ∞. In particular. ε = ε∗ . Therefore ε = ε∗ . But 0 ≤ ε∗ . Therefore. We then have ϕn+1 (z) = Ez Xn+1 = E z ξ1 z ξ2 · · · z ξXn ∞ (n) (n) (n) ϕ1 (z) = ϕ(z). recall that ϕ is a convex function with ϕ(1) = 1.) 67 . ϕ3 (z) = ϕ2 (ϕ(z)) = ϕ(ϕ(ϕ(z))) = ϕ(ϕ2 (z)). ϕn+1 (z) = ϕ(ϕn (z)). (See right ﬁgure below. ϕ(ϕn (0)) → ϕ(ε). Let us show that. But ϕn (0) → ε. since ϕ is a continuous function. ϕn (0) ≤ ε∗ . Therefore ε = ϕ(ε). Also ϕn (0) → ε. if µ > 1. Hence ϕ2 (z) = ϕ1 (ϕ(z)) = ϕ(ϕ(z)). the graph of ϕ(z) lies above the diagonal (the function z) and so the equation z = ϕ(z) has no solutions other than z = 1. This means that ε = 1 (see left ﬁgure below). all we need to show is that ε < 1. On the other hand. ϕn+1 (0) = ϕ(ϕn (0)). Hence ϕ(0) ≤ ϕ(ε∗ ) = ε∗ . and. So ε ≤ ε∗ < 1. and so on. if the slope µ at z = 1 is ≤ 1. By induction. z=1 Also. and since the latter has only two solutions (z = 0 and z = ε∗ ). Since both ε and ε∗ satisfy z = ϕ(z). = i=0 ∞ E z ξ 1 z ξ 2 · · · z ξ Xn E z ξ1 z ξ2 · · · z ξi (n) (n) (n) (n) (n) (n) (n) (n) 1(Xn = i) 1(Xn = i) (n) = i=0 ∞ = i=0 ∞ E[z ξ1 ] E[z ξ2 ] · · · E[z ξi ] E[1(Xn = i)] ϕ(z)i P (Xn = i) = i=0 = E ϕ(z)Xn = ϕn (ϕ(z)). in fact. Notice now that µ= d ϕ(z) dz .Note also that ϕ0 (z) = 1. then the graph of ϕ(z) does intersect the diagonal at some point ε∗ < 1.

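The relation ϕn+1(0) = ϕ(ϕn(0)) also gives a practical way to compute the extinction probability: iterate ϕ starting from 0. A minimal sketch, with a hypothetical offspring distribution (p0, p1, p2) chosen so that µ > 1:

```python
# Hypothetical offspring distribution: P(xi=0)=0.2, P(xi=1)=0.3, P(xi=2)=0.5.
p = [0.2, 0.3, 0.5]
mu = sum(k * pk for k, pk in enumerate(p))        # mean offspring, here 1.3 > 1

def phi(z):
    """Probability generating function of the offspring distribution."""
    return sum(pk * z**k for k, pk in enumerate(p))

eps = 0.0
for _ in range(1000):
    eps = phi(eps)                                # phi_{n+1}(0) = phi(phi_n(0)) -> epsilon

print(mu, eps)   # eps should be the smallest root of phi(z) = z, here 0.4 < 1
```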
But this is not a good method it is a waste of time to allocate a slot to a host if that host has nothing to transmit. An the number of new packets on the same time slot. is that if more than one hosts attempt to transmit a packet. Then ε = 1. in a simpliﬁed model. . on the next times slot. we let hosts transmit whenever they like. 3. 24.. there is collision and the transmission is unsuccessful. then the transmission is not successful and the transmissions must be attempted anew. The problem of communication over a common channel. 2. 1. We assume that the An are i. If s = 1 there is a successful transmission and. Each of the backlogged or newly arrived packets is selected for transmission with (a small) probability p. So if we let Xn be the number of backlogged packets at the beg ginning of the n-th time slot. are allocated to hosts 1. there are x + a packets. 6. random variables with some common distribution: P (An = k) = λk . using the same frequency. Case: µ > 1. One obvious solution is to allocate time slots to hosts in a speciﬁc order. there is a collision and. So if there are. k ≥ 0. . and Sn the number of selected packets. if Sn = 1. If s ≥ 2. Thus. If collision happens. Nowadays. 3. Abandoning this idea. 2. So. Since things happen fast in a computer network. assume that on a time slot there are x packets amongst all hosts that are waiting for transmission (they are ‘backlogged’).i. 5. for a packet to go through. 3. periodically. 0). Xn+1 = Xn + An . 2. 1.2 The ALOHA protocol: a computer communications application This is a protocol used in the early days of computer networking for having computer hosts communicate over a shared medium (radio channel). there is no time to make a prior query whether a host has something to transmit or not. During this time slot assume there are a new packets that need to be transmitted. 3 hosts. 1. otherwise. . on the next times slot. say. there are x + a − 1 packets. Then ε < 1.1 1 (z) (z) ε 0 1 0 ε 1 z z Case: µ ≤ 1. . Ethernet has replaced Aloha. . then time slots 0. 68 . 4.d. Let s be the number of selected packets. then max(Xn + An − 1. . only one of the hosts should be transmitting at a given time.

69 . we have q < 1.x+k = λk (1 − (x + k)pq x+k−1 ) + λk+1 (x + k + 1)pq x+k . and exactly one selection: px.e.x+k for k ≥ 1. where G(x) is a function of the form α + βx . Xn−1 . n ≥ 0) is a Markov chain. we must have no new packets arriving. Since p > 0. To compute px. Unless µ = 0 (which is a trivial case: no arrivals!).x−1 = P (An = 0. we should not subtract 1. Idea of proof.x+k kλk − kλk (x + k)pq x+k−1 + k≥1 x k≥1 = −λ0 xpq x−1 + = −λ0 xpq x x−1 kλk+1 (x + k + 1)pq x+k . An−1 . we have that the drift is positive for all large x and this proves transience. Sn = 1|Xn = x) = λ0 xpq x−1 . Indeed. the Markov chain (Xn ) is transient. Therefore ∆(x) ≥ µ/2 for all large x. where q = 1 − p. k ≥ 1. Proving this is beyond the scope of the lectures. The main result is: Proposition 6. and so q x tends to 0 much faster than G(x) tends to ∞. So: px. .e. for example we can make it ≤ µ/2. . conditional on Xn and An (and all past states and arrivals) a distribution which is Binomial with parameters Xn + An . Let us ﬁnd its transition probabilities. selections are made amongst the Xn + An packets. . The reason we take the maximum with 0 in the above equation is because. we think like this: to have an increase by k we either need exactly k new packets and no successful transmissions (i. The random variable Sn has. i. but we mention that it is based on the so-called drift criterion which says that if the expected change Xn+1 − Xn conditional on Xn = x is bounded below from a positive constant for all large x.x−1 when x > 0. So P (Sn = k|Xn = x. For x > 0.) = x + a k x−k p q . then the chain is transient.x−1 + k≥1 kpx. and p. An = a.where λk ≥ 0 and k≥0 kλk = 1. if Xn + An = 0. Sn should be 0 or ≥ 2) or we need k + 1 new packets and one selection (Sn = 1). deﬁned by‘ ∆(x) := E[Xn+1 − Xn |Xn = x]. k 0 ≤ k ≤ x + a. so q x G(x) tends to 0 as x → ∞. We here only restrict ourselves to the computation of this expected change. So we can make q x G(x) as small as we like by choosing x suﬃciently large. x+k−1 k≥1 + µ − λ1 (x + 1)pq − λk (x + k)pq k≥2 = µ − q G(x). The process (Xn . The probabilities px. we have ∆(x) = (−1)px. we think as follows: to have a reduction of x by 1. The Aloha protocol is unstable.x are computed by subtracting the sum of the above from 1. We also let µ := k≥1 kλk be the expectation of An . To compute px.

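The drift ∆(x) can also be checked numerically. The sketch below assumes Poisson arrivals (the argument only needs a general distribution λk with mean µ) and arbitrary parameter values; it estimates E[Xn+1 − Xn | Xn = x] by simulation and shows it approaching µ for large backlogs x, in line with the transience result.

```python
import numpy as np

rng = np.random.default_rng(0)
p, lam = 0.1, 0.5        # selection probability and (assumed) Poisson arrival rate

def drift(x, trials=200000):
    """Monte-Carlo estimate of E[X_{n+1} - X_n | X_n = x] for the ALOHA chain."""
    A = rng.poisson(lam, trials)                   # new packets in the slot
    S = rng.binomial(x + A, p)                     # packets selected for transmission
    X_next = np.where(S == 1, x + A - 1, x + A)    # a packet leaves only if exactly one is selected
    return (X_next - x).mean()

for x in [0, 2, 5, 10, 20, 50]:
    print(x, round(drift(x), 3))   # the drift approaches mu = 0.5 as x grows
```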
in the latter case. because of the size of the problem. the chain moves to a random page. Comments: 1. For each page i let L(i) be the set of pages it links to. there are companies that sell high-rank pages. A new use of PageRank is to rank academic doctoral programs based on their records of ﬁnding jobs for their graduates. unless i is a dangling page. In an Internet-dominated society.3 PageRank (trademark of Google): A World Wide Web application PageRank is an algorithm used by the popular search engine Google in order to assign importance to a web page. We pick α between 0 and 1 (typically. So if Xn = i is the current location of the chain. The interpretation is that of a ‘bored surfer’: a surfer moves according the the original Markov chain. if L(i) = ∅ 0. The way it does that is by computing the stationary distribution of a Markov chain that we now describe. neither that it is aperiodic. There is a technique. The algorithm used (PageRank) starts with an initial guess π(0) for π and updates it by π (k) = π (k−1) P. unless he gets bored–and this happens with probability α–in which case he clicks on a random page. otherwise. |V | These are new transition probabilities. then i is a ‘dangling page’. α ≈ 0.2) and let α pij := (1 − α)pij + . Deﬁne transition probabilities as follows: 1/|L(i)|. The web graph is a a graph set of vertices V consisting of web pages and edges representing hyperlinks: thus. It uses advanced techniques from linear algebra to speed up computations. PageRank. if j ∈ L(i) pij = 1/|V |. known as ‘302 Google jacking’. to compute the stationary distribution π corresponding to the Markov chain with transition matrix P = [pij ]: π = πP. All you need to do to get high rank is pay (a lot of money). if page i has a link to page j then we put an arrow from i to j and consider it as a directed edge of the graph. If L(i) = ∅. the rank of a page may be very important for a company’s proﬁts. The idea is simple. Its implementation though is one of the largest matrix computations in the world. So this is how Google assigns ranks to pages: It uses an algorithm. that allows someone to manipulate the PageRank of a web page and make it much higher. Then it says: page i has higher rank than j if π(i) > π(j). So we make the following modiﬁcation. 3.24. 2. the next location Xn+1 is picked uniformly at random amongst all pages that i links to. It is not clear that the chain is irreducible. Academic departments link to each other by hiring their 70 . Therefore.

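A minimal sketch of this computation on a tiny, made-up link graph (real PageRank operates on billions of pages): form the modified transition matrix with damping parameter α and iterate π(k) = π(k−1)P. The value of α below is an assumed choice for the example.

```python
import numpy as np

# Hypothetical link structure: links[i] = pages that page i links to; page 3 is dangling.
links = {0: [1, 2], 1: [2], 2: [0]}
V, alpha = 4, 0.15                      # alpha is an assumed damping value

P = np.zeros((V, V))
for i in range(V):
    if links.get(i):
        for j in links[i]:
            P[i, j] = 1.0 / len(links[i])
    else:
        P[i, :] = 1.0 / V               # dangling page: jump to a uniformly random page
P = (1 - alpha) * P + alpha / V         # the 'bored surfer' modification

pi = np.full(V, 1.0 / V)                # initial guess pi^(0)
for _ in range(100):
    pi = pi @ P                         # pi^(k) = pi^(k-1) P

print(pi, pi.sum())                     # page i outranks page j if pi[i] > pi[j]
```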
in each generation. the genetic diversity depends only on the total number of alleles of each type. generation after generation. Imagine that we have a population of N individuals who mate at random producing a number of oﬀspring. 5. this allele is of type A with probability x/2N . maintain this allele for the next generation. The alleles in an individual come in pairs and express the individual’s genotype for that gene. meaning how many characteristics we have. Since we don’t care to keep track of the individuals’ identities. But it is a kind of corruption that is hard to trace since it is unanimously equated with modern use of Information Technology. If so. When two individuals mate. To simplify. So let x be the number of alleles of type A (so and 2N − x is the number of those of type a). their oﬀspring inherits an allele from each parent. Thus. then. Three of the square alleles are never selected. colour of eyes) appears in two forms. We will assume that a child chooses an allele from each parent at random. We will also assume that the parents are replaced by the children. What is important is that the evaluation can be done without ever looking at the content of one’s research. Do the same thing 2N times. The process repeats. it makes the evaluation procedure easy. all we need is the information outlined in this ﬁgure. One of the square alleles is selected 4 times. One of the circle alleles is selected twice. we will assume that. Lazy university administrators use PageRank-like systems for assigning a single number to a researcher’s output. a gene controlling a speciﬁc characteristic (e. at the next generation the number of alleles of type A can be found as follows: Pick an allele at random. 4. 71 .g. And so on. Alleles that are never picked up may belong to individuals who never mate or may belong to individuals who mate but their oﬀspring did not inherit this allele.4 The Wright-Fisher model: an example from biology In an organism. We are now ready to deﬁne the Markov chain Xn representing the number of alleles of type A in generation n. But an AA mating with a Aa may produce a child of type AA or one of type Aa. This number could be independent of the research quality but. a dominant (say A) and a recessive (say a). An individual can be of type AA (meaning that all its alleles are A) or aa or Aa. We would like to study the genetic diversity. two AA individuals will produce and AA child. This means that the external appearance of a characteristic is a function of the pair of alleles. from the point of view of an administrator. 24. Needless to say that these uses of PageRank and PageRank-like systems represent corruption at the deepest level. In this ﬁgure there are 4 alleles of type A (circles) and 6 of type a (squares). We have a transition from x to y if y alleles of type A were produced.faculty from each other (and from themselves).

72 . at “the end of time” the chain (started from x) will be either M with probability ϕ(x) or 0 with probability 1 − ϕ(x). so both 0 and 2N are absorbing states. p00 = p2N. on the average. conditionally on (X0 .e. the expectation doesn’t change and. the chain gets absorbed by either 0 or 2N . 2N − 1} form a communicating class of inessential states. X0 ) = 1 − Xn . . These are: M ϕ(0) = 0. .Each allele of type A is selected with probability x/2N . . . Another way to represent the chain is as follows: n n Xn+1 = ξ1 + · · · + ξM . Let M = 2N . . the number of alleles of type A won’t change. . X0 ) = E(Xn+1 |Xn ) = M This implies that. E(Xn+1 |Xn . Tx := inf{n ≥ 1 : Xn = x}. there is a x chance (= ( 2N )2N ) that only alleles of type A are selected. since. 1− y 2N 2N x Notice that there is a chance (= (1 − 2N )2N ) that no allele of type A is selected at all.i. we see that 2N x 2N −y x y pxy = . 0 ≤ y ≤ 2N. the states {1. . Therefore P (Xn = 0 or Xn = 2N for all large n) = 1. . since. . The result is correct. . i. . as usual. ϕ(x) = y=0 pxy ϕ(y). . . M and thus. we should have M × ϕ(x) + 0 × (1 − ϕ(x)) = x. i. . M n P (ξk = 0|Xn . ξM are i. Since selections are independent. Similarly. EXn+1 = EXn Xn = Xn . . . y ≤ 2N − 1. the random variables ξ1 . . . We show that our guess is correct by proceeding via the mundane way of verifying that it satisﬁes the ﬁrst-step analysis equations. . but the argument needs further justiﬁcation because “the end of time” is not a deterministic time.e. ϕ(x) = x/M. for all n ≥ 0. Xn ). n n where. we would like to know the probability ϕ(x) = Px (T0 > TM )P (T0 > TM |X0 = x) that the chain will eventually be absorbed by state M . X0 ) = Xn . ϕ(M ) = 1. Clearly. (Here. . The fact that M is even is irrelevant for the mathematical model. M Therefore. . on the average. . . with n P (ξk = 1|Xn . and the population becomes one consisting of alleles of type A only.2N = 1.) One can argue that. and the population becomes one consisting of alleles of type a only. Clearly. . Since pxy > 0 for all 1 ≤ x.d.

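Both the martingale property E(Xn+1 | Xn) = Xn and the absorption probability ϕ(x) = x/M can be checked against the binomial transition probabilities by simulation. A sketch with an arbitrary small population size:

```python
import numpy as np
rng = np.random.default_rng(1)

N = 10
M = 2 * N        # total number of alleles (arbitrary small population)

def absorb_at_M(x, runs=20000):
    """Fraction of runs absorbed at M when starting from x alleles of type A."""
    hits = 0
    for _ in range(runs):
        X = x
        while 0 < X < M:
            X = rng.binomial(M, X / M)   # next generation: Binomial(2N, x/2N)
        hits += (X == M)
    return hits / runs

for x in [1, 5, 10, 15]:
    print(x, absorb_at_M(x), x / M)      # simulation vs. phi(x) = x/M
```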
i. Let ξn := An − Dn . We will apply the so-called Loynes’ method. Let us consider (n) gxy := Px (Xn > y) = P (x + ξ0 + · · · + ξn−1 ) ∨ (ξ1 + · · · + ξn−1 ) ∨ · · · ∨ ξn−1 ∨ 0 > y . 73 .The ﬁrst two are obvious. n ≥ 0) is a Markov chain. Thus.5 A storage or queueing application A storage facility stores. Here are our assumptions: The pairs of random variables ((An . the demand is. the demand is met instantaneously. A number An of new components are ordered and added to the stock. at time n + 1. the demand is only partially satisﬁed. We will show that this Markov chain is positive recurrent. M −1 y−1 x M y−1 1− x . If the demand is for Dn components. while a number of them are sold to the market. and E(An − Dn ) = −µ < 0. b). Dn ). . provided that enough components are available. on the average. there are Xn + An − Dn components in the stock. We can express these requirements by the recursion Xn+1 = (Xn + An − Dn ) ∨ 0 where a ∨ b := max(a. we will try to solve the recursion. Xn electronic components at time n. The right-hand side of the last equals M y=0 M y x M y x 1− M M −y y = M = M y=1 M y=1 M! x y!(M − y)! M y 1− x M M −y y M x M M −y−1+1 x (M − 1)! (y − 1)!(M − y)! M M y=1 y−1+1 1− x = M x = M as needed. . we ﬁnd X1 = (X0 + ξ0 ) ∨ 0 X2 = (X1 + ξ1 ) ∨ 0 = (X0 + ξ0 + ξ1 ) ∨ ξ1 ∨ 0 ··· X3 = (X2 + ξ2 ) ∨ 0 = (X0 + ξ0 + ξ1 + ξ2 ) ∨ (ξ1 + ξ2 ) ∨ ξ2 ∨ 0 Xn = (X0 + ξ0 + · · · + ξn−1 ) ∨ (ξ1 + · · · + ξn−1 ) ∨ · · · ∨ ξn−1 ∨ 0. 2. . We have that (Xn . The correctness of the latter can formally be proved by showing that it does satisfy the recursion. say.d. so that Xn+1 = (Xn + ξn ) ∨ 0. larger than the supply per unit of time. Since the Markov chain is represented by this very simple recursion. for all n ≥ 0. and so. n ≥ 0) are i. Otherwise. M x M (M −1)−(y−1) x x +1− M M M −1 = 24. Working for n = 1. and the stock reached level zero at time n + 1.

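The recursion Xn+1 = (Xn + ξn) ∨ 0 is straightforward to simulate. The sketch below uses an assumed ±1 distribution for ξn = An − Dn with negative mean, so the chain should be positive recurrent, and estimates its stationary distribution empirically.

```python
import random

# Assumed increments: xi = A - D equals +1 w.p. 0.3 and -1 w.p. 0.7, so E[xi] = -0.4 < 0.
def xi():
    return 1 if random.random() < 0.3 else -1

X, counts, burn, n = 0, {}, 1000, 200000
for t in range(burn + n):
    X = max(X + xi(), 0)                 # X_{n+1} = (X_n + xi_n) v 0
    if t >= burn:
        counts[X] = counts.get(X, 0) + 1

for y in sorted(counts):
    print(y, round(counts[y] / n, 4))    # for this example the tail decays like (0.3/0.7)**y
```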
. n→∞ n→∞ We claim that π(y) := g(y) − g(y − 1) satisﬁes the balance equations. the event whose probability we want to compute is a function of (ξ0 . so we can shuﬄe them in any way we like without altering their joint distribution. Hence. In particular. . . . . instead of considering them in their original order. . we ﬁnd π(y) = x≥0 π(x)pxy . Let Zn := ξ0 + · · · + ξn−1 Mn := Zn ∨ Zn−1 ∨ · · · ∨ Z1 ∨ 0. . . . Indeed. we reverse it. . Then Since Mn+1 ≥ Mn we have g0y ≤ g0y (n) (n) (n+1) g0y = P (Mn > y). we may take the limit of both sides as n → ∞. P (Mn+1 = y) = P (0 ∨ (Mn + ξ0 ) = y) = x≥0 P (Mn = x)P (0 ∨ (x + ξ0 ) = y) P (Mn = x) pxy . where the latter equality follows from the fact that. using π(y) = limn→∞ P (Mn = y). . ξn−1 . Therefore. . (n) . The limit of a bounded and increasing sequence certainly exists: g(y) := lim g0y = lim P0 (Xn > y). . . we may replace the latter random variables by ξ1 . in computing the distribution of Mn which depends on ξ0 . . for all n. (n) gxy = P (x + ξn−1 + · · · + ξ0 ) ∨ (ξn−2 + · · · + ξ0 ) ∨ · · · ∨ ξ0 ∨ 0 > y . ξn−1 ) = (ξn−1 . . and. . for y ≥ 0. ξn−1 ). 74 . . = x≥0 Since the n-dependent terms are bounded.d. ξ0 ). ξn .i. . Notice. But Mn+1 = 0 ∨ max Zj 1≤j≤n+1 =0∨ d 1≤j≤n+1 max (Zj − Z1 ) + ξ0 = 0 ∨ (Mn + ξ0 ). that d (ξ0 . the random variables (ξn ) are i. however.Thus. .

Z2 . n→∞ But then P ( lim Mn = 0 ∨ max{Z1 . the Markov chain may be transient or null recurrent. . . 75 . From the Law of Large Numbers we have P ( lim Zn /n = −µ) = 1. .} < ∞) = 1. We must make sure that it also satisﬁes y π(y) = 1. This will be the case if g(y) is not identically equal to 1. . . It can also be shown (left as an exercise) that if Eξn > 0 then the Markov chain is transient. Depending on the actual distribution of ξn . . as required. we will use the assumption that Eξn = −µ < 0. Z2 . n→∞ This implies that P ( lim Zn = −∞) = 1.So π satisﬁes the balance equations.} > y) is not identically equal to 1. n→∞ Hence g(y) = P (0 ∨ max{Z1 . The case Eξn = 0 is delicate. Here.

p(−1) = q. so far. here. ±ed } otherwise 76 . Example 1: Let p(x) = C2−|x1 |−···−|xd | . . we need to give more structure to the state space S. that of spatial homogeneity. the skip-free property. For example. a random walk is a Markov chain with time. a structure that enables us to say that px. Deﬁne pj . besides time. if x ∈ {±e1 .PART II: RANDOM WALKS 25 Random walk The purpose of this part of the notes is to take a closer look at a special kind of Markov chain known as random walk.and space-homogeneity. then p(1) = p. this walk enjoys. groups) are allowed. . . d p(x) = qj . x ∈ Zd . . . More general spaces (e. . we can let S = Zd . In dimension d = 1. if x = ej . Our Markov chains. have been processes with timehomogeneous transition probabilities. . given a function p(x). d 0. Example 3: Deﬁne p(x) = 1 2d . but.y = px+z.and space-homogeneous transition probabilities given by px. This means that the transition probability pxy should depend on x and y only through their relative positions in space. where p1 + · · · + pd + q1 + · · · + qd = 1. otherwise. A random walk possesses an additional property. To be able to say this. namely that to go from state x to state y it must pass through all intermediate states because its value can change by at most 1 at each step. j = 1. So.g. we won’t go beyond Zd . j = 1. A random walk of this type is called simple random walk or nearest neighbour random walk. namely. the set of d-dimensional vectors with integer coordinates. Example 2: Let where ej be the vector that has 1 in the j-th position and 0 everywhere else. if x = −ej . where C is such that x p(x) = 1. . otherwise where p + q = 1. This is the drunkard’s walk we’ve seen before. p(x) = 0. If d = 1. . 0. .y = p(y − x). .y+z for any translation z. .

x ∈ Zd . has equal probabilities to move from one state x to one of its “neighbouring states” x ± e1 . computing the n-th order convolution for all n is a notoriously hard problem. We call it simple symmetric random walk. We refer to ξn as the increments of the random walk. x ± ed .This is a simple random walk. where ξ1 . . (Recall that δz is a probability distribution that assigns probability 1 to the point z. ξ2 . . Furthermore. otherwise. But this would be nonsense. (When we write a sum without indicating explicitly the range of the summation variable. Indeed. Let us denote by Sn the state of the walk at time n. x ∈ Zd is deﬁned as a new probability p ∗ p′ given by (p ∗ p′ )(x) = The reader can easily check that p ∗ p′ = p′ ∗ p. We easily see that P (ξ1 + · · · + ξn = x) = p∗n (x). y p(x)p′ (y − x).) In this notation. random variables in Zd with common distribution P (ξn = x) = p(x).d. x + ξ2 = y) = y p(x)p(y − x). Furthermore. xy The random walk ξ1 + · · · + ξn is a random walk starting from 0. p ∗ δz = p. we will mean a sum over all possible values. we have P (ξ1 + ξ2 = x) = (p ∗ p)(x). We shall resist the temptation to compute and look at other 77 . . the initial state S0 is independent of the increments. p(n) = Px (Sn = y) = P (x + ξ1 + · · · + ξn = y) = P (ξ1 + · · · + ξn = y − x). If d = 1. which. p ∗ (p′ ∗ p′′ ) = (p ∗ p′ ) ∗ p′′ . p′ (x). . The special structure of a random walk enables us to give more detailed formulae for its behaviour. and this is the symmetric drunkard’s walk. in general. . p(x) = 0. We will let p∗n be the convolution of p with itself n times. So. Obviously. . The convolution of two probabilities p(x).) Now the operation in the last term is known as convolution. in addition. because all we have done is that we dressed the original problem in more fancy clothes. notice that P (ξ1 + ξ2 = x) = y P (ξ1 = x. One could stop here and say that we have a formula for the n-step transition probability. we can reduce several questions to questions about this random walk. It is clear that Sn can be represented as Sn = S0 + ξ1 + · · · + ξn . then p(1) = p(−1) = 1/2. are i. .i.

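For a concrete check of the convolution formula, the following sketch computes p∗n for the simple symmetric random walk by repeated convolution and compares P0(Sn = x) with the binomial expression (the value n = 6 is arbitrary):

```python
from math import comb

p_step = {1: 0.5, -1: 0.5}     # increment distribution of the simple symmetric walk

def convolve(p, q):
    r = {}
    for x, px in p.items():
        for y, qy in q.items():
            r[x + y] = r.get(x + y, 0.0) + px * qy
    return r

n, pn = 6, {0: 1.0}            # p^{*0} = delta_0
for _ in range(n):
    pn = convolve(pn, p_step)  # p^{*k} = p^{*(k-1)} * p

for x in range(-n, n + 1, 2):  # compare with P_0(S_n = x) = C(n, (n+x)/2) 2^{-n}
    print(x, round(pn[x], 6), comb(n, (n + x) // 2) * 2.0**-n)
```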
So. . .methods. + (1. 1) → (2. If not. a path of length n. for ﬁxed n. but its practice may be hard. 2) → (3.0) We can thus think of the elements of the sample space as paths. −1. if we can count the number of elements of A we can compute its probability. So if the initial position is S0 = 0. First. ξn ).d.1) − (3. and the sequence of signs is {+1.2) (4. In the sequel. Therefore. . random variables (random signs) with P (ξn = ±1) = 1/2. we shall learn some counting methods. . .2) where the ξn are i. . 1): (2. Since there are 2n elements in {0. Clearly. A ⊂ {−1. As a warm-up. +1}n . After all. If yes. 0) → (1. When the walk starts from 0 we can write Sn = ξ1 + · · · + ξn .1) + (0. −1} then we have a path in the plane that joins the points (0. Will the random walk ever return to where it started from? B. we are dealing with a uniform distribution on the sample space {−1. 1) → (4. to each initial position and to each sequence of n signs. and so #A P (A) = n . ξn = εn ) = 2−n . . we have P (ξ1 = ε1 . +1}n . where will it go? 26 The simple symmetric random walk in dimension 1: path counting This is a random walk with values in Z and transition probabilities pi.i+1 = pi. The principle is trivial.i−1 = 1/2 for all i ∈ Z. we should assign. how long will that take? C. a random vector with values in {−1. 1}n . +1. 2) → (5. all possible values of the vector are equally likely. Look at the distribution of (ξ1 .i. +1}n . Here are a couple of meaningful questions: A. Events A that depend only on these the ﬁrst n random signs are subsets of this sample space. consider the following: 78 . . who cares about the exact value of p∗n (x) for all n and x? Perhaps it is the case (a) that diﬀerent questions we need to ask or (b) that approximate methods are more informative than exact ones. +1. 2 where #A is the number of elements of A.1) + − (5.

Consider a path that starts from (0. and compute the probability of the event A = {S2 ≥ 0. (n − m + y − x)/2 (23) Remark. Proof. 0) and n ends at (n. So P (A) = 6/26 = 6/128 = 3/64 ≈ 0. (n. Let us ﬁrst show that Lemma 14.i) (0.1 Paths that start and end at speciﬁc points Now consider a path that starts from a speciﬁc point and ends up at a speciﬁc point.) The darker region shows all paths that start at (0. 0) and end up at (n. 0) and ends at (n. We can easily count that there are 6 paths in A. We use the notation {(m. i). S6 < 0. x) and end up at (n. n 79 . i). y)} =: {all paths that start at (m. (There are 2n of them. The paths comprising this event are shown below.0) The lightly shaded region contains all paths of length n that start at (0. i)} = n n+i 2 More generally. 0) and end up at the speciﬁc point (n. x) (n. 0)).Example: Assume that S0 = 0. the number of paths that start from (m. 0) (n. S7 < 0}. i). y) is given by: #{(m.047. y) is given by: #{(0. 2 1 3 0 1 2 4 5 6 7 −1 −2 −3 − −4 −7 26. y)} = n−m . In if n + i is not an even number then there is no path that starts from (0. S4 = −1. y)}. x) and end up at (n. in this case. x) (n. Hence (n+i)/2 = 0. The number of paths that start from (0.

Let u be the number of +'s in this path (the number of upward segments) and d the number of −'s (the number of downward segments). Then u + d = n, u − d = i. Hence u = (n + i)/2, d = (n − i)/2. So if we know where the path starts from and where it ends at, we know how many times it has gone up and how many down. The question then becomes: given a sequence of n signs out of which u are + and the rest are −, in how many ways can we arrange them? Well, in exactly \binom{n}{u} ways, and this proves the formula. Notice that u = (n + i)/2 must be an integer. This is true if n + i is even or, equivalently, if n and i are simultaneously even or simultaneously odd. If this is not the case, then there is no path that can go from (0, 0) to (n, i). For example (look at the figure), there is no path that goes from (0, 0) to (3, 2). We shall thus define \binom{n}{u} to be zero if u is not an integer. More generally, the number of paths that start from (m, x) and end up at (n, y) equals the number of paths that start from (0, 0) and end up at (n − m, y − x). So, applying the previous formula, we obtain (23).

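The counting formula of Lemma 14 (and hence (23)) is easy to test by brute force, enumerating all sign sequences of a given length. A small sketch:

```python
from itertools import product
from math import comb

def count_paths(n, i):
    """Number of paths of length n from (0, 0) to (n, i), by enumeration."""
    return sum(1 for signs in product([-1, 1], repeat=n) if sum(signs) == i)

n = 8
for i in range(-n, n + 1):
    formula = comb(n, (n + i) // 2) if (n + i) % 2 == 0 else 0
    assert count_paths(n, i) == formula
print("formula verified for all endpoints with n =", n)
```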
26.2  Paths that start and end at specific points and do not touch zero at all

Lemma 15. Suppose that x, y > 0. The number of paths (m, x) → (n, y) that never become zero (i.e. they do not touch the horizontal axis) is given by:

#\{(m, x) \to (n, y);\ \text{remain} > 0\} = \binom{n-m}{\frac{1}{2}(n-m+y-x)} - \binom{n-m}{\frac{1}{2}(n-m-y-x)}    (24)

Proof.

[Figure] The shaded region contains all possible paths of length n − m that start at (m, x) and end up at the specific point (n, y) which, in addition, never touch the horizontal axis.

Consider, instead, a path that starts at (m, x) and ends up at point (n, y) which DOES touch the horizontal axis, like the path indicated by the solid line below. There is a trick, 80

called reflection, which is useful here. Note that to each such path there corresponds another path obtained by following it up to the ﬁrst time it hits zero and then reﬂecting it around the horizontal axis. In the ﬁgure below, follow the solid path starting at (m, x) up to the ﬁrst time it hits zero and then follow the dotted path: this is the reﬂected path.

(n, y)

(m, x)

(n, −y)

A path and its reﬂection around a horizontal line. Notice that the reﬂection starts only at the point where the path touches the horizontal line.

Note that if we know the reflected path we can recover the original path. Hence we may as well count the number of reflected paths. But each reflected path starts from (m, x) and ends at (n, −y) and so it necessarily crosses the horizontal axis. We thus apply (23) with −y in place of y:

#\{(m, x) \to (n, y);\ \text{touch zero}\} = #\{(m, x) \to (n, -y)\} = \binom{n-m}{\frac{1}{2}(n-m-y-x)}.

So, for x > 0, y > 0,

#\{(m, x) \to (n, y);\ \text{remain} > 0\} = #\{(m, x) \to (n, y)\} - #\{(m, x) \to (n, y);\ \text{touch zero}\}
= \binom{n-m}{\frac{1}{2}(n-m+y-x)} - \binom{n-m}{\frac{1}{2}(n-m-y-x)}.

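Formula (24) can likewise be verified by direct enumeration for small parameters. The sketch below takes m = 0, which is no loss of generality, and counts paths that stay strictly positive:

```python
from itertools import product
from math import comb

def count_positive(n, x, y):
    """Paths of length n from height x to height y that never touch zero."""
    total = 0
    for signs in product([-1, 1], repeat=n):
        h, ok = x, True
        for s in signs:
            h += s
            ok = ok and h > 0
        total += ok and h == y
    return total

def binom_half(a, twice_b):        # C(a, twice_b/2), zero unless the index is a valid integer
    return comb(a, twice_b // 2) if 0 <= twice_b <= 2 * a and twice_b % 2 == 0 else 0

def formula24(n, x, y):            # right-hand side of (24) with m = 0
    return binom_half(n, n + y - x) - binom_half(n, n - y - x)

for x, y, n in [(1, 1, 4), (1, 3, 6), (2, 2, 8), (3, 1, 6)]:
    print((x, y, n), count_positive(n, x, y), formula24(n, x, y))
```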
Corollary 14. For any y > 0,

#\{(1, 1) \to (n, y);\ \text{remain} > 0\} = \frac{y}{n}\, #\{(0, 0) \to (n, y)\}.    (25)

Proof. Let m = 1, x = 1 in (24):

#\{(1, 1) \to (n, y);\ \text{remain} > 0\} = \binom{n-1}{\frac{n+y}{2}-1} - \binom{n-1}{\frac{n-y}{2}-1}
= \frac{(n-1)!}{\left(\frac{n+y}{2}-1\right)!\,\left(\frac{n-y}{2}\right)!} - \frac{(n-1)!}{\left(\frac{n+y}{2}\right)!\,\left(\frac{n-y}{2}-1\right)!}
= \left(\frac{1}{n}\,\frac{n+y}{2} - \frac{1}{n}\,\frac{n-y}{2}\right) \frac{n!}{\left(\frac{n+y}{2}\right)!\,\left(\frac{n-y}{2}\right)!}
= \frac{y}{n} \binom{n}{\frac{n+y}{2}},

and the latter term equals the total number of paths from (0, 0) to (n, y).

81

Corollary 15. The number of paths from (m, x) to (n, y) that do not touch level z < min(x, y) is given by: #{(m, x) (n, y); remain > z} = n−m − − m + y − x) n−m . − m − y − x) − z (26)

1 2 (n

1 2 (n

Proof. Because it is only the relative position of x and y with respect to z that plays any role, we have #{(m, x) (n, y); remain > z} = #{(m, x + z) (n, y + z); remain > 0}

and so the formula is obtained by an application of (24). Corollary 16. The number of paths from (0, 0) to (n, y), where y ≥ 0, that remain ≥ 0 is given by: n n #{(0, 0) (n, y); remain ≥ 0} = n+y − n+y (27) 2 2 +1 Proof. #{(0, 0) (n, y); remain ≥ 0} = #{(0, 0) = = where we used (26) with z = −1. Corollary 17 (Catalan numbers). The number of paths from (0, 0) to (n, 0) that remain ≥ 0 is given by: #{(0, 0) Proof. Apply (27) with y = 0: #{(0, 0) (n, 0); remain ≥ 0} = n

n 2

= #{(0, 1) n

n+y 2

(n, y); remain > −1} n −1 n , n+y 2 +1

n−y 2

(n, y + 1); remain > 0}

− −

n

n+y 2

(n, 0); remain ≥ 0} =

1 n+1 . n + 1 n/2

−

n 2

n −1

=

1 n+1 , n + 1 n/2

where the latter calculation is as in the proof of (25). Corollary 18. The number of paths that start from (0, 0), have length n, and remain nonnegative is given by: #{(0, 0) (n, •); remain ≥ 0} = n . ⌈n/2⌉ (28)

where ⌈x⌉ denotes the smallest integer that is ≥ x. 82

Proof. To see, this, just use (27) and sum over all y ≥ 0. If n = 2m then we need to set y = 2z in (27) (otherwise (27) is just zero) and so we have #{(0, 0) = (2m, •); remain ≥ 0} = 2m m+z − 2m m+z+1

z≥0

2m 2m 2m 2m 2m 2m 2m ± ··· + − + ··· + − + − 2m 2m 2m − 1 m+2 m+1 m+1 m n 2m , = = n/2 m

because all the terms that are denoted by · · · are equal to 0, while the rest cancel with one another, and only the ﬁrst one remains. If n = 2m + 1 then we need to set y = 2z + 1 in (27) (otherwise (27) is just zero) and so we have #{(0, 0) = (2m + 1, •); remain ≥ 0} = 2m + 1 2m + 1 − m+z+1 m+z+2

z≥0

2m + 1 2m + 1 2m + 1 2m + 1 2m + 1 2m + 1 2m + 1 ±· · · + − +· · ·+ − + − 2m + 1 2m + 1 2m m+3 m+2 m+2 m+1 n 2m + 1 . = = ⌈n/2⌉ m+1

We have shown that the shaded regions contain exactly the same number of paths. Was this, a posteriori, obvious?

2m

2m

27

The simple symmetric random walk in dimension 1: simple probability calculations

Since all paths of certain length have the same probability, we can translate the path counting formulae into corresponding probability formulae.

27.1

**Distribution after n steps
**

n 2−n . (n + y)/2 83

Lemma 16. P0 (Sn = y) =

if n is even. 1) (n. P0 (S1 ≥ 0 . y ≥ 0 are both even or both odd then P0 (S1 ≥ 0 . n−m 2−n−m . 0) (n. n/2 (29) #{(0. y)} . the random walk never becomes zero up to time n is given by P0 (S1 > 0. 27. if n is even. y)} . Sn > 0 | Sn = y) = y . Such a simple answer begs for a physical/intuitive explanation.P (Sn = y|Sm = x) = In particular. S2 > 0. The probability that. y)} × 2−n n−1 n−1 − 1 1 (n − 1 + y − 1) 2 (n − 1 − y − 1) = 2 n 1 (n + y) 2 = y . Sn ≥ 0 | Sn = y) = 1 − In particular. . (n/2) + 1 y+1 P0 (Sn = y + 2) = 1 . Sn ≥ 0 | Sn = 0) = 84 1 . . n 2−n . n This is called the ballot theorem and the reason will be given in Section 29. x) 2n−m (n. . But. . P0 (Sn = y) 2 (n + y) + 1 . . The left side of (30) equals #{paths (1.2 The ballot theorem Theorem 25. . for now. (n − m + y − x)/2 u(n) := P0 (Sn = 0) = Proof. We have P0 (Sn = y) = and P (Sn = y|Sm = x) = and we apply (23). n 2 #{paths (m. If n. 0) (n.3 Some conditional probabilities Lemma 17. . y) that do not touch zero} × 2−n #{paths (0. given that S0 = 0 and Sn = y > 0. . n (30) Proof. . . 27. we shall just compute it.

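The ballot theorem lends itself to a quick brute-force check: among all paths of length n with Sn = y > 0, the proportion that stay strictly positive should be exactly y/n. A small sketch (the pairs (n, y) are arbitrary):

```python
from itertools import product
from fractions import Fraction

def positive_fraction(n, y):
    """Among paths with S_n = y, the fraction with S_1 > 0, ..., S_n > 0."""
    good = total = 0
    for signs in product([-1, 1], repeat=n):
        if sum(signs) != y:
            continue
        total += 1
        s, stays_positive = 0, True
        for step in signs:
            s += step
            if s <= 0:
                stays_positive = False
                break
        good += stays_positive
    return Fraction(good, total)

for n, y in [(6, 2), (8, 4), (9, 3)]:
    print(n, y, positive_fraction(n, y), Fraction(y, n))   # both columns should agree
```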
ξ3 . Sn ≥ 0) = P0 (S1 = 0. . ξ3 . . . and ﬁnally (g) is trivial because Sn−1 ≥ 0 means Sn−1 > 0 because Sn−1 cannot take the value 0. . . . . . . . . (c) is just conditioning on {ξ1 = 1} and using the fact that S1 = ξ1 . . . We use (27): P0 (Sn ≥ 0 . . . .). (b) follows from symmetry. +1 27. •). 1 + ξ2 + · · · + ξn > 0 | ξ1 = 1) = P0 (S1 ≥ 0. . Sn ≥ 0 | Sn = y) = n = n+y 2 P0 (Sn ≥ 0 . . . .Proof. and Sn−1 > 0 implies that Sn ≥ 0. . P0 (S1 ≥ 0. . . . . . . Proof. Sn < 0) (b) (a) = 2P0 (S1 > 0. . . Sn = 0) = P0 (Sn = 0). . . Sn > 0 | ξ1 = 1) = P0 (1 + ξ2 > 0. . Sn = 0) = P0 (S1 > 0. Sn > 0) = 2P0 (ξ1 = 1)P0 (S2 > 0. . . . If n is even. and from the fact that if m is integer then 1 + m > 0 is equivalent to m ≥ 0. For even n we have P0 (S1 ≥ 0. Sn > 0) + P0 (S1 < 0. (e) follows from the fact that ξ1 is independent of ξ2 . . . . . (d) follows from P0 (ξ1 = 1) = 1/2 and from the substitution of the value of ξ1 on the left of the conditioning. We next have P0 (S1 = 0. Sn ≥ 0. Sn−1 ≥ 0) = P0 (ξ2 ≥ 0. . 1 + ξ2 + ξ3 > 0. ξ2 + · · · + ξn ≥ 0) = P0 (S1 ≥ 0. . .) by (ξ1 . . . . Sn = y) P0 (Sn = y) =1− P0 (Sn = y + 2) . remain ≥ 0} = 2n n 2−n = P0 (Sn = 0). . . 0) (n. . . . P0 (Sn = y) − n n+y 2 n +1 n+y 2 This is further equal to 1− ( n+y 2 ( n! n−y + 1)! ( 2 − 1)! n+y 2 )! ( n−y )! 2 =1− n! n−y 2 n+y 2 + 1 = n+y 2 y+1 . . . . 85 . . ξ2 + ξ3 ≥ 0. . (c) (d) (e) (f ) (g) where (a) is obvious. . . . n/2 where we used (28) and (29). (f ) follows from the replacement of (ξ2 . . Sn ≥ 0). . . .4 Some remarkable identities Theorem 26. ξ2 . . Sn ≥ 0) = #{(0.

2 and the last probability can be computed from the ballot theorem–see (30): it is equal to 1/(n − 1). we either have Sn−1 = 1 or −1. 86 . . we can omit the last event. Sn−1 < 0. f (n) = P0 (S1 > 0. P0 (Sn−1 > 0. .27. So f (n) = 2P0 (S1 > 0. So u(n) f (n) = . n−1 Proof. Sn = 0) + P0 (S1 < 0. We now want to ﬁnd the probability f (n) that the walk will return to 0 for the ﬁrst time in n steps. Sn = 0) = P0 (Sn = 0) P (Sn−1 > 0|Sn = 0). Sn−1 = 0. . Sn = 0) = 2P0 (Sn−1 > 0. To take care of the last term in the last expression for f (n). . for even n. given that Sn = 0. Sn = 0) = u(n) . Sn = 0) P0 (S1 > 0. n We know that P0 (Sn = 0) = u(n) = n/2 2−n . . Theorem 27. If a particle is to the right of a then you see its image on the left.5 First return to zero In (29). and if the particle is to the left of a then its mirror image is on the right. . . Then f (n) := P0 (S1 = 0. Fix a state a. we see that Sn−1 > 0. So if the particle is in position s then its mirror image is in position s = 2a − s because s − a = a − s. the probability u(n) that the walk (started at 0) attains value 0 in n steps. By symmetry. Sn−2 > 0|Sn−1 = 1). . . So the last term is 1/2. . This is not so interesting. By the Markov property at time n − 1. Sn−1 > 0. Put a two-sided mirror at a. Sn = 0. . . Sn = 0). with equal probability. . Sn−2 > 0|Sn−1 > 0. . the two terms must be equal. Sn = 0) Now. Sn−1 > 0. Sn = 0 means Sn−1 = 1. So 1 f (n) = 2u(n) P0 (S1 > 0. and u(n) = P0 (Sn = 0). . . . . Since the random walk cannot change sign before becoming zero. . It is clear that if a particle starts at 0 and performs a simple symmetric random walk then its mirror image starts at a and performs a simple symmetric random walk. . . Also. Let n be even. . n−1 28 The reﬂection principle for a simple random walk in dimension 1 The reflection principle says the following. . we computed. .

S1 . If Sn ≥ a then Ta ≤ n. n ≥ Ta . the process (STa +m − a. n ≥ Ta By the strong Markov property. Sn ≥ a). Sn > a) = P0 (Ta ≤ n. Combining the last two displays gives P0 (Ta ≤ n) = P0 (Sn ≥ a) + P0 (Sn > a) = P0 (Sn ≥ a) + P0 (Sn ≥ a) − P0 (Sn = a). 2 Proof. . Trivially. So: P0 (Sn ≥ a) = P0 (Ta ≤ n. n < Ta . Then 1 P0 (Ta ≤ n) = 2P0 (Sn ≥ a) − P0 (Sn = a) = P0 (|Sn | ≥ a) − P0 (|Sn | = a). So Sn − a can be replaced by a − Sn in the last display and the resulting process. as a corollary to the above result. so we may replace Sn by 2a − Sn (Theorem 28). n ≥ 0) be a simple symmetric random walk and a > 0. m ≥ 0) is a simple symmetric random walk. Proof. In the last event.1 Distribution of hitting time and maximum We can take advantage of this principle to establish the distribution function of Ta : Theorem 29. Theorem 28. Finally. But P0 (Ta ≤ n) = P0 (Ta ≤ n. 28. deﬁne Sn = where is the ﬁrst hitting time of a. we have n ≥ Ta . Let (Sn . . called (Sn . Thus. Then. Sn ). we obtain 87 . If (Sn ) is a simple symmetric random walk started at 0 then so is (Sn ). n < Ta .What is more interesting is this: Run the particle till it hits a (it will. 2a − Sn ≥ a) = P0 (Ta ≤ n. In other words. observing that Sn has the same distribution as −Sn we get that this is also equal to P0 (|Sn | ≥ a) − 1 P0 (|Sn | = a). . Then Ta = inf{n ≥ 0 : Sn = a} Sn . The last equality follows from the fact that Sn > a implies Ta ≤ n. Sn ≤ a) + P0 (Sn > a). n ∈ Z+ ). n ∈ Z+ ) has the same law as (Sn . 2 Deﬁne now the running maximum Mn = max(S0 . independent of the past before Ta . P0 (Sn ≥ a) = P0 (Ta ≤ n. Sn ≤ a) + P0 (Ta ≤ n. But a symmetric random walk has the same law as its negative. . for sure–Theorem 35) and after that consider its mirror image Sn = 2a − Sn . Sn = Sn . a + (Sn − a). 2a − Sn . Sn ≤ a).

Rend. and anciently for holding ballots to be drawn. then the probability that. (1887). We can do more: we can derive the joint distribution of Mn and Sn : Theorem 30. the number of selected azure items is always ahead of the black ones equals (a − b)/n. where ηi indicates the colour of the i-th item picked.Corollary 19. what is the probability that the number of azure items is constantly ahead of the number of black ones? The problem was solved10 in the same year. Observe that Mn ≥ x if and only if Sk ≥ x for some k ≤ n which is equivalent to Tx ≤ n and use Theorem 29. Compt. we have P0 (Mn < x. posed in 18879 asks the following: if we start picking the items from the urn one by one without replacement. n = a + b. The set of values of η contains n elements and η is uniformly distributed a in this set. Sn = y) = P0 (Sn = y) − P0 (Sn = 2x − y) . e e e Paris. 9 Bertrand. J. Compt. the ashes of the dead after cremation. Bertrand. gives the result. 105. Sn = y) = P0 (Sn = y) − P0 (Tx ≤ n. Sn = 2x − y). 369. Sn = y) = P0 (Tx > n. Acad. 105. If an urn contains a azure items and b = n−a black items. e 10 Andr´. Solution directe du probl`me r´solu par M. 8 88 . 1 P0 (Mn ≥ x) = P0 (Tx ≤ n) = 2P0 (Sn ≥ x) − P0 (Sn = x) = P0 (|Sn | ≥ x) − P0 (|Sn | = x). then. (1887). . Sci. 29 Urns and the ballot theorem Suppose that an urn8 contains n items. combined with (31). 436-437. Sn = y). . Sn = y) = P0 (Tx ≤ n. But if x > y then 2x − y > x and so {Sn = 2x − y} ⊂ {Tx ≤ n}. employed for diﬀerent purposes. P0 (Mn < x. Solution d’un probl`me. Acad. Proof. as long as a ≥ b. The original ballot problem. Sci. by applying the reﬂection principle (Theorem 28). during sampling without replacement. as for holding liquids. ornamental objects. Rend. (31) x > y. 2 Proof. . usually a vase furnished with a foot or pedestal. If Tx ≤ n. Sn = y) = P0 (Sn = 2x − y). we can replace Sn by 2x − Sn in the last part of the last display and get P0 (Tx ≤ n. Since Mn < x is equivalent to Tx > n. . of which a are coloured azure and b black. ηn ). which results into P0 (Tx ≤ n. Paris. A sample from the urn is a sequence η = (η1 . It is the latter use we are interested in in Probability and Statistics. An urn is a vessel. This. The answer is very simple: Theorem 31. D.

We have P0 (Sn = k) = n n+k 2 pi. . using the deﬁnition of conditional probability and Lemma 16. 89 . Since the ‘azure’ items correspond to + signs (respectively. Sn > 0} that the random walk remains positive. we have: P0 (ξ1 = ε1 . . . but which is not necessarily symmetric. ξn ) of the increments is an urn process with parameters n and a = (n + y)/2. But if Sn = y then. We need to show that. εn be elements of {−1. always assuming (without any loss of generality) that a ≥ b = n − a. . . ηn ) an urn process with parameters n. . so ‘azure’ items are +1 and ‘black’ items are −1. b be deﬁned by a + b = n. . conditional on the event {Sn = y}. . . Consider a simple symmetric random walk (Sn . (ξ1 . . . conditional on {Sn = y}. . . .) So pi. a can be realised by a simple symmetric random walk starting from 0 and ending at Sn = y. . 30 The asymmetric simple random walk in dimension 1 Consider now a simple random walk with values in Z. ξn ) is. letting a. n Since y = a − (n − a) = a − b. the sequence (ξ1 . . (The word “asymmetric” will always mean “not necessarily symmetric”. i ∈ Z. −n ≤ k ≤ n. a − b = y. But in (30) we computed that y P0 (S1 > 0. . . where y = a − (n − a) = 2a − n. . the sequence (ξ1 . we can translate the original question into a question about the random walk. . . the event that the azure are always ahead of the black ones during sampling is precisely the event {S1 > 0. Sn > 0 | Sn = y) = . ξn = εn | Sn = y) = P0 (ξ1 = ε1 .i−1 = q = 1 − p. .i+1 = p. . Proof. . . ξn = εn ) P0 (Sn = y) 2−n 1 = = . +1} such that ε1 + · · · + εn = y.We shall call (η1 . . the result follows. . n . . indeed. Then. ξn ) is uniformly distributed in a set with n elements. . . a. p(n+k)/2 q (n−k)/2 . Since an urn with parameters n. An urn can be realised quite easily be a random walk: Theorem 32. the ‘black’ items correspond to − signs). . n n −n 2 (n + y)/2 a Thus. . The values of each ξi are ±1. . Then. . So the number of possible values of (ξ1 . . Proof of Theorem 31. . n ≥ 0) with values in Z and let y be a positive integer. Let ε1 . . . It remains to show that all values are equally a likely. a we have that a out of the n increments will be +1 and b will be −1. conditional on {Sn = y}. . ξn ) has a uniform distribution. .

. Then the probability generating function of the ﬁrst time that level 1 is reached is given by ϕ(s) = E0 sT1 = 1− 1 − 4pqs2 . In view of this. . This random variable takes values in {0. Consider a simple random walk starting from 0. E0 sTx = ϕ(s)x . . Consider a simple random walk starting from 0. . Lemma 19. If x > 0.i. . plus a ′′ further time T1 required to go from 0 to +1 for the ﬁrst time. plus a time T1 required to go from −1 to 0 for the ﬁrst time. x − 1 in order to reach x. Theorem 33. Let ϕ(s) := E0 sT1 . We will approach this using probability generating functions.} ∪ {∞}. Start with S0 = 0 and watch the process until the ﬁrst n = T1 such that Sn = 1. The rest is a consequence of the strong Markov property. Lemma 18. For all x. The process (Tx . it is no loss of generality to consider the random walk starting from 0. Px (Ty = t) = P0 (Ty−x = t). . random variables. If S1 = 1 then T1 = 1. If S1 = −1 then T1 is the sum of three times: one unit of time spent ′ to go from 0 to −1. we make essential use of the skip-free property.1 First hitting time analysis We will use probabilistic + analytical methods to compute the distribution of the hitting times Tx = inf{n ≥ 0 : Sn = x}. In view of Lemma 19 we need to know the distribution of T1 .30. x ≥ 0) is also a random walk.) Then. for x > 0. 2. y ∈ Z. Proof. then the summands in Tx = T1 + (T2 − T1 ) + · · · + (Tx − Tx−1 ) are i. Corollary 20. only the relative position of y with respect to x is needed. (We tacitly assume that S0 = 0. 90 . and all t ≥ 0. 2.d. namely the obvious fact that a simple random walk starting from 0 must pass through levels 1. 1. 2qs Proof. Proof. To prove this. Consider a simple random walk starting from 0. . This follows from the spatial homogeneity of the transition probabilities: to reach y starting from x. x ∈ Z. See ﬁgure below.

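The closed form for E0 s^{T1} can be checked numerically against a direct computation of the distribution of T1, and against the quadratic equation ϕ(s) = ps + qsϕ(s)² used in its derivation. A sketch with an arbitrary value of p:

```python
from math import sqrt

p = 0.6            # arbitrary choice of the upward probability
q = 1 - p

def phi(s):
    """Closed form for E_0[s^{T_1}]."""
    return (1 - sqrt(1 - 4 * p * q * s * s)) / (2 * q * s)

def phi_from_distribution(s, nmax=500):
    """E_0[s^{T_1}] computed from P_0(T_1 = n), by propagating the walk killed at level 1."""
    probs = {0: 1.0}           # mass at states < 1, not yet having hit level 1
    total = 0.0
    for n in range(1, nmax + 1):
        new = {}
        for x, px in probs.items():
            new[x + 1] = new.get(x + 1, 0.0) + px * p
            new[x - 1] = new.get(x - 1, 0.0) + px * q
        total += new.pop(1, 0.0) * s**n     # probability of first hitting 1 at time n
        probs = new
    return total

for s in [0.3, 0.7, 0.95]:
    print(s, round(phi(s), 6), round(phi_from_distribution(s), 6))
print(abs(phi(0.7) - (p * 0.7 + q * 0.7 * phi(0.7) ** 2)) < 1e-12)   # phi = ps + qs*phi^2
```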
Sn 1 0 −1 n T1’ T1’’ If S1 = −1 then. T1 is independent of T1 conditional on S1 = −1 and each with distribution that of T1 . Corollary 21. Consider a simple random walk starting from 0. Then the probability generating function of the ﬁrst time that level x ∈ Z is reached is given by √ x E sT1 x = 1− 1−4pqs2 . Hence the previous formula applies with the roles of p and q interchanged. The ﬁrst time n such that Sn = −1 is the ﬁrst time n such that −Sn = 1. 91 . We thus have ′ ′′ ′ ′′ ′ ′′ 1 ′ ′′ 1 + T1 + T 1 if S1 = 1 if S1 = −1 ϕ(s) = ps + qsϕ(s)2 . Then the probability generating function of the ﬁrst time that level −1 is reached is given by E0 sT−1 = 1− 1 − 4pqs2 . 2ps Proof. Use Lemma 19 and the previous formulae of Theorem 33 and Corollary 21. from the strong Markov property. if x < 0 2ps Proof. 0 if x > 0 2qs Tx (32) E0 s = √ |x| E sT−1 |x| = 1− 1−4pqs2 0 . Consider a simple random walk starting from 0. Corollary 22. in order to hit 1. Hence T1 = This can be used as follows: ϕ(s) = E0 [s1(S1 = 1) + s1+T1 +T1 1(S1 = −1)] = sP (S1 = 1) + sE0 [sT1 sT1 |S1 = −1]P (S1 = −1) = sp + sE0 [sT1 sT1 |S1 = −1]q ′′ ′ But. But note that −Sn is a simple random walk with upward transition probability q and downward transition probability p. the RW must ﬁrst hit 0 (it takes time ′ ′′ T1 to do this) and then 1 (it takes a further time of T1 ). The formula is obtained by solving the quadratic.

2ps ′ =1− 30. We again do ﬁrst-step analysis. in distribution. then T0 equals 1 plus a time whose distribution that of the time required for the walk to hit −1 starting from 0. how long will it take till it ﬁrst returns to 0? We are talking about the stopping time ′ T0 := inf{n ≥ 1 : Sn = 0}.30. ′ But if S1 = −1 then T0 equals 1 plus the time required for the walk to hit 0 starting from −1. Consider a simple random walk starting from 0. 1(S1 = −1) ′ + E0 s T 0 ′ ′ 1(S1 = 1) = qE0 sT0 | S1 = −1 + pE0 sT0 | S1 = 1 . k by the binomial theorem. and simply write (1 + x)n = k≥0 n k x . (Pay attention: it is important to write n ≥ 1 inside the set!) Theorem 34. For the general case. The latter is. Similarly. this was found in §30.2 First return to the origin Now let us ask: If the particle starts at 0.3 The distribution of the ﬁrst return to the origin We wish to ﬁnd the exact distribution of the ﬁrst return time to the origin. k 92 .2: P0 (T0 = n) = f (n) = u(n)/(n − 1) is n is even. then we can omit the k upper limit in the sum. For a symmetric ′ random walk. Then the probability generating function of the ﬁrst return time to 0 is given by E0 s T 0 = 1 − Proof. E0 sT0 = qE0 (s1+T1 ) + pE0 (s1+T−1 ) = qs 1− 1 − 4pqs2 1− + ps 2qs 1 − 4pqs2 . where α is a real number. we have computed the probability generating function E0 sT0 –see Theorem 34 The tool we need here is Taylor’s theorem for the function f (x) = (1 + x)α . If α = n is a positive integer then n (1 + x)n = k=0 n k x . 1 − 4pqs2 . if S1 = 1. E0 s T 0 = E0 s T 0 ′ ′ ′ 1 − 4pqs2 . the same as the time required for the walk to hit 1 starting ′ from 0. If we deﬁne n to be equal to 0 for k > n.

2k − 1 k 93 . √ 1 − x.Recall that n k = It turns out that the formula is formally true for general α. We have k≥0 1 − x = (1 − x)1/2 = 1/2 (−1)k xk . For example. and it looks exactly the same: (1 + x)α = k≥0 n(n − 1) · · · (n − k + 1) n! = . k! k! (−1)2k xk = k≥0 xk . (−1)k = 2k − 1 k k and this is shown by direct computation. k We can write this in more familiar symbols if we observe that 2k −k 1 1/2 4 . So √ 1−x= ′ k≥0 1 2k k x =1+ 2k − 1 k 1 − 4pqs2 : ′ k≥1 2k k 1 x 2k − 1 k We apply this to E0 sT0 = 1 − E0 s T 0 = k≥1 1 2k (pq)k s2k . −1 k so (1 − x)−1 = as needed. As another example. but that we won’t worry about which ones. k as long as we deﬁne α k = α(α − 1) · · · (α − k + 1) . Indeed. k! The adverb ‘formally’ is supposed to mean that this formula holds for some x around 0. (1 − x)−1 = But. by deﬁnition. k!(n − k)! k! α k x . let us compute √ (−1)k (−x)k = k≥0 k≥0 k≥0 −1 (−x)k k = (−1)k k! (−1)(−2) · · · (−1 − k + 1) = = (−1)k . we will show that (1 − x)−1 = 1 + x + x2 + x3 + · · · which is of course the all too-familiar geometric series.

by the deﬁnition of E0 sT0 . By applying the law of large numbers we have that P0 Sn =p−q n→∞ n lim = 1. 94 . Comparing the two ‘inﬁnite polynomials’ we see that we must have n = 2k (we knew that– the random walk can return to 0 only in an even number of steps) and then ′ P0 (T0 = 2k) = 1 1 2k k k p q = P0 (S2k = 0). • If p = 1/2 then p − q = 0 and the law of large numbers only tells us that the walk is not positive recurrent. • If p > 1/2 then p − q > 0 and so Sn tends to +∞ with probability one. That it is actually null-recurrent follows from Markov chain theory. P0 (Tx < ∞) = p q q p ∧1 ∧1 . Hence it is again transient. The key fact needed here is: P0 (Tx < ∞) = lim E0 sTx . Hence it is transient. if p = q then P0 (Tx < ∞) = 1 for all x ∈ Z. x if x > 0 . We will see how the results about the generating functions of the hitting and return times help us answer these questions directly. The quantity inside the square root becomes 1 − 4pq. if x < 0 This formula can be written more neatly as follows: Theorem 35. if x < 0 |x| In particular. 2k − 1 k 2k − 1 31 Recurrence properties of simple random walks Consider again a simple random walk Sn with values in Z with upward probability p and downward probability q = 1 − p. |x| if x > 0 (33) . x 1−|p−q| 2q 1−|p−q| 2p . so 1 − 4pq = So P0 (Tx < ∞) = 1 − 4p + 4p2 = (1 − 2p)2 = |1 − 2p| = |p − q|.But. We apply this to (32). without appeal to the Markov chain theory. s↑1 which is true for any probability generating function. But remember that q = 1 − p. E0 s T 0 = n≥0 ′ ′ ′ P0 (T0 = n)sn . • If p < 1/2 then p − q < 0 and so Sn tends to −∞ with probability one.

Proof. If p > q then |p − q| = p − q, and so (33) gives P0(Tx < ∞) = 1 if x > 0, and (q/p)^{|x|} if x < 0, which is what we want. If p < q then |p − q| = q − p and, again, we obtain what we are looking for.

Corollary 23. The simple symmetric random walk with values in Z is null recurrent.

Proof. The last part of Theorem 35 tells us that it is recurrent. We know that it cannot be positive recurrent, and so it is null recurrent.

31.1 Asymptotic distribution of the maximum

Suppose now that p < 1/2. Then limn→∞ Sn = −∞ with probability 1. This means that there is a largest value that will ever be attained by the random walk. This value is random and is denoted by

M∞ = sup(S0, S1, S2, ...).

Indeed, if we let Mn = max(S0, ..., Sn), then the sequence Mn is nondecreasing; every nondecreasing sequence has a limit, except that the limit may be ∞, so M∞ = limn→∞ Mn, but here it is < ∞ with probability 1 because p < 1/2.

Theorem 36. Consider a simple random walk with values in Z and assume p < 1/2. Then the overall maximum M∞ has a geometric distribution:

P0(M∞ ≥ x) = (p/q)^x,  x ≥ 0.

Proof. Since Mn is nondecreasing,

P0(M∞ ≥ x) = lim_{n→∞} P0(Mn ≥ x) = lim_{n→∞} P0(Tx ≤ n) = P0(Tx < ∞) = (p/q)^x,  x ≥ 0.
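Theorem 36 lends itself to a quick Monte Carlo check; a sketch (Python, not part of the original notes; p = 0.3 and the horizon and trial counts are arbitrary, the finite horizon being harmless because the walk drifts to −∞):

```python
import random

def max_of_walk(p, horizon=2000):
    """Running maximum of a simple random walk with P(step = +1) = p."""
    s, m = 0, 0
    for _ in range(horizon):
        s += 1 if random.random() < p else -1
        m = max(m, s)
    return m

p, q, trials = 0.3, 0.7, 5000
maxima = [max_of_walk(p) for _ in range(trials)]
for x in range(5):
    est = sum(m >= x for m in maxima) / trials
    print(x, round(est, 3), round((p / q) ** x, 3))   # empirical vs (p/q)^x
```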

31.2 Return probabilities

Only in the symmetric case p = q = 1/2 will the random walk return to the point where it started from. Otherwise, with some positive probability, it will never return to its starting point. What is this probability?

Theorem 37. Consider a random walk with values in Z and let T0′ = inf{n ≥ 1 : Sn = 0} be the first return time to 0. Then

P0(T0′ < ∞) = 2(p ∧ q).

Unless p = 1/2, this probability is always strictly less than 1.

Proof. The probability generating function of T0′ was obtained in Theorem 34. But

P0(T0′ < ∞) = lim_{s↑1} E0 s^{T0′} = 1 − √(1 − 4pq) = 1 − |p − q|.

If p = q this gives 1. If p > q this gives 1 − (p − q) = 1 − (1 − q − q) = 2q. And if p < q this gives 1 − (q − p) = 1 − (1 − p − p) = 2p.

31.3 Total number of visits to a state

If the random walk is recurrent then it visits any state infinitely many times. But if it is transient, the total number of visits to a state will be finite. We now compute its exact distribution. The total number of visits to state i is given by

Ji = Σ_{n=1}^{∞} 1(Sn = i),  (34)

a random variable first considered in Section 10, where it was also shown to be geometric: in Lemma 4 we showed that, if a Markov chain starts from a state i, then Ji is geometric.

Theorem 38. Consider a simple random walk with values in Z. Then, starting from zero, the total number J0 of visits to state 0 is geometrically distributed:

P0(J0 ≥ k) = (2(p ∧ q))^k,  k = 0, 1, 2, ...,   E0 J0 = 2(p ∧ q)/(1 − 2(p ∧ q)).

Proof. J0 ≥ 1 if and only if the first return time to zero is finite, so P0(J0 ≥ 1) = P0(T0′ < ∞) = 2(p ∧ q), where the last probability was computed in Theorem 37. And so P0(J0 ≥ k) = P0(J0 ≥ 1)^k = (2(p ∧ q))^k. This proves the first formula. For the second formula we have

E0 J0 = Σ_{k=1}^{∞} P0(J0 ≥ k) = Σ_{k=1}^{∞} (2(p ∧ q))^k = 2(p ∧ q)/(1 − 2(p ∧ q)),

this being a geometric series.

Remark 1: By spatial homogeneity, the same formulae apply for any i: Pi(Ji ≥ k) = (2(p ∧ q))^k and Ei Ji = 2(p ∧ q)/(1 − 2(p ∧ q)).

Remark 2: The formulae work for all values of p ∈ [0, 1]. In the recurrent case p = 1/2 we get Pi(Ji ≥ k) = 1 for all k, meaning that Pi(Ji = ∞) = 1, as already known.

We now look at the number of visits to some state when we start from a different state.

Theorem 39. Consider a random walk with values in Z, starting from 0.
If p ≥ 1/2 then P0(Jx ≥ k) = (q/p)^{(−x)∨0} (2q)^{k−1}, k ≥ 1, and E0 Jx = (q/p)^{(−x)∨0}/(1 − 2q).
If p ≤ 1/2 then P0(Jx ≥ k) = (p/q)^{x∨0} (2p)^{k−1}, k ≥ 1, and E0 Jx = (p/q)^{x∨0}/(1 − 2p).

Proof. Suppose x > 0. Then P0(Jx ≥ k) = P0(Tx < ∞, Jx ≥ k), simply because if Jx ≥ k ≥ 1 then Tx < ∞. Apply now the strong Markov property at Tx:

P0(Tx < ∞, Jx ≥ k) = P0(Tx < ∞) Px(Jx ≥ k − 1).

The reason that we replaced k by k − 1 is that, given Tx < ∞, there has already been one visit to state x, and so we need at least k − 1 remaining visits in order to have at least k visits in total. The first factor was computed in Theorem 35, and the second in Theorem 38 (with Remark 1). So we obtain

P0(Jx ≥ k) = ((p/q) ∧ 1)^x (2(p ∧ q))^{k−1},

i.e. (2q)^{k−1} if p ≥ 1/2 and (p/q)^x (2p)^{k−1} if p ≤ 1/2. We repeat the argument for x < 0 and find

P0(Jx ≥ k) = ((q/p) ∧ 1)^{|x|} (2(p ∧ q))^{k−1},

i.e. (q/p)^{|x|} (2q)^{k−1} if p ≥ 1/2 and (2p)^{k−1} if p ≤ 1/2. As for the expectation, we apply E0 Jx = Σ_{k=1}^{∞} P0(Jx ≥ k) and perform the sum of a geometric series.

Remark 1: The expectation E0 Jx is finite if p ≠ 1/2. In other words, if the random walk "drifts down" (i.e. p < 1/2) then E0 Jx drops to zero geometrically fast as x tends to +∞, while for x < 0 we have E0 Jx = 1/(1 − 2p) regardless of x: on the average, the walk visits any point below 0 exactly the same number of times.

Remark 2: If the initial state is not 0 then use spatial homogeneity to obtain the analogous formulae: Py(Jx ≥ k) = P0(Jx−y ≥ k) and Ey Jx = E0 Jx−y.

32 Duality

Consider a random walk Sk = Σ_{i=1}^{k} ξi, i.e. a sum of i.i.d. random variables. Fix an integer n. Then the dual (Ŝk, 0 ≤ k ≤ n) of (Sk, 0 ≤ k ≤ n) is defined by

Ŝk := Sn − Sn−k,  0 ≤ k ≤ n.

Theorem 40. (Ŝk, 0 ≤ k ≤ n) has the same distribution as (Sk, 0 ≤ k ≤ n).

Proof. Use the obvious fact that (ξ1, ξ2, ..., ξn) has the same distribution as (ξn, ξn−1, ..., ξ1), because the ξi are i.i.d.

Thus, every time we have a probability for S we can translate it into a probability for Ŝ, which is basically another probability for S. Here is an application of this:

Theorem 41. Let (Sk) be a simple random walk and x a positive integer. Let Tx = inf{n ≥ 1 : Sn = x} be the first time that x will be visited. Then

P0(Tx = n) = (x/n) P0(Sn = x),  x > 0.

Proof. In Lemma 19 we observed (the ballot theorem) that, for a simple random walk and x > 0, P0(S1 > 0, ..., Sn > 0 | Sn = x) = x/n. Rewrite the ballot theorem in terms of the dual:

P0(Ŝ1 > 0, ..., Ŝn > 0 | Ŝn = x) = x/n.

But the left-hand side is

P0(Sn − Sn−k > 0, 1 ≤ k ≤ n | Sn = x) = P0(Sk < x, 0 ≤ k ≤ n − 1, Sn = x)/P0(Sn = x) = P0(Tx = n)/P0(Sn = x).

Pause for reflection: we have been dealing with the random times Tx for simple random walks (symmetric or asymmetric) from the beginning of the lectures. Tx is the sum of x i.i.d. random variables, each distributed as T1, and from this we derived the generating function of Tx,

E0 s^{Tx} = ((1 − √(1 − 4pqs²))/(2qs))^x,  (35)

which, specialising to the symmetric case, reads E0 s^{Tx} = ((1 − √(1 − s²))/s)^x. In Theorem 29 we used the reflection principle and derived the distribution function of Tx for a simple symmetric random walk,

P0(Tx ≤ n) = P0(|Sn| ≥ x) − (1/2) P0(|Sn| = x).  (36)

Lastly, we derived, using duality, in Theorem 41, the probabilities

P0(Tx = n) = (x/n) P0(Sn = x),  x > 0.  (37)
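Formula (37) can be verified directly for small n by summing over all 2^n paths of a simple random walk; a brute-force sketch (Python, not in the original notes; the values of p and x are arbitrary):

```python
from itertools import product, accumulate

def check(p, x, n):
    """Compare P0(Tx = n) with (x/n) P0(Sn = x) by enumerating all 2^n paths."""
    q = 1 - p
    p_tx_n = p_sn_x = 0.0
    for steps in product((1, -1), repeat=n):
        pr = 1.0
        for step in steps:
            pr *= p if step == 1 else q
        walk = list(accumulate(steps))
        if walk[-1] == x:
            p_sn_x += pr
            if max(walk[:-1], default=0) < x:   # x not visited before time n
                p_tx_n += pr
    return p_tx_n, x / n * p_sn_x

for n in range(1, 11):
    lhs, rhs = check(p=0.6, x=2, n=n)
    print(n, round(lhs, 6), round(rhs, 6))      # the two columns agree
```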

We remark here something methodological: It was completely diﬀerent techniques that led to results about, essentially, the same thing: the distribution of Tx . This is an important lesson to learn. Of course, the three formulae should be compatible: the number (37) should be the coeﬃcient of sn in (35); summing up (37) should give (36), or taking diﬀerences in (36) should give (37). But on trying to verify these things algebraically, the reader will realise that it is not easy.

33 Amount of time that a SSRW is positive*

Consider a SSRW (Sn ) starting from 0. Deﬁne its trajectory as the polygonal line obtained by joining the points (n, Sn ) in the plane, as described earlier. Watch this trajectory up to time 2n. Let T+ (2n) be the number of sides of the polygonal line that are above the horizontal axis and T− (2n) = 2n − T+ (2n) the number of sides that are below. Clearly, T± (2n) are both even numbers because, if the RW is at 0, it takes an even number of steps to return to 0. We can think of T+ (2n) as the number of leads in 2n coin tosses, where we

[Figure 1: Part of the time the SSRW is positive and part negative. After it returns to 0 at time T0′, it behaves again like a SSRW and is independent of the past.]

have a lead at some time if the number of heads till that time exceeds the number of tails. We are dealing with a fair coin which is tossed 2n times, so the event that the number of heads equals k has maximum probability when k = n (heads = tails). Most people take this statement and translate it into a statement about T+(2n), arguing that P(T+(2n) = m) is maximised when m = n, i.e. when T+(2n) = T−(2n). This is wrong. In fact, as the theorem below shows, the most likely values of T+(2n) are m = 0 and m = 2n.

Theorem 42 (distribution of the total time the random walk is positive).

P(T+(2n) = 2k) = P(S_{2k} = 0) P(S_{2n−2k} = 0),  0 ≤ k ≤ n.

Proof. The idea is to condition at the first time T0′ that the RW returns to 0:

P(T+(2n) = 2k) = Σ_{r=1}^{n} P(T+(2n) = 2k, T0′ = 2r).

Between 0 and T0′ the random walk is either above the horizontal axis or below; call the first event A+ and the second A−, and write

P(T+(2n) = 2k) = Σ_{r=1}^{n} P(A+, T0′ = 2r) P(T+(2n) = 2k | A+, T0′ = 2r) + Σ_{r=1}^{n} P(A−, T0′ = 2r) P(T+(2n) = 2k | A−, T0′ = 2r).

In the first case,

P(T+(2n) = 2k | A+, T0′ = 2r) = P(T+(2n) − 2r = 2k − 2r | A+, T0′ = 2r) = P(T+(2n − 2r) = 2k − 2r),

because, if T0′ = 2r and A+ occurs, then the RW has already spent 2r units of time above the axis and so T+(2n) is reduced by 2r; furthermore, we may remove the conditioning because the future evolution of the RW after T0′ is independent of the past. In the second case, a similar argument (the first 2r units of time are spent below the axis and contribute nothing to T+(2n)) shows

P(T+(2n) = 2k | A−, T0′ = 2r) = P(T+(2n − 2r) = 2k).

We also observe that, by symmetry,

P(A+, T0′ = 2r) = P(A−, T0′ = 2r) = (1/2) P(T0′ = 2r) = (1/2) f_{2r}.

Letting p_{2k,2n} = P(T+(2n) = 2k), we have shown that

p_{2k,2n} = (1/2) Σ_{r=1}^{k} f_{2r} p_{2k−2r,2n−2r} + (1/2) Σ_{r=1}^{n−k} f_{2r} p_{2k,2n−2r}.

We want to show that p_{2k,2n} = u_{2k} u_{2n−2k}, where u_{2k} = P(S_{2k} = 0). With this hypothesis, we can start playing with the recursion of the display above and, using induction on n, we realise that

p_{2k,2n} = (1/2) u_{2n−2k} Σ_{r=1}^{k} f_{2r} u_{2k−2r} + (1/2) u_{2k} Σ_{r=1}^{n−k} f_{2r} u_{2n−2k−2r};

the two sums equal u_{2k} and u_{2n−2k}, respectively (decomposition of a return to 0 over the time of the first return), so the right-hand side is indeed u_{2k} u_{2n−2k}, completing the induction. Explicitly, we have

P(T+(2n) = 2k) = C(2k, k) C(2n − 2k, n − k) 2^{−2n},  0 ≤ k ≤ n.  (38)

With 2n = 100, we plot the function P(T+(100) = 2k) for 0 ≤ k ≤ 50 below. Notice that it is maximised at the ends of the interval.

[Plot of P(T+(100) = 2k) against k, 0 ≤ k ≤ 50: a U-shaped curve, approximately 0.08 at the two endpoints and smallest in the middle.]
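Formula (38) is straightforward to evaluate; a minimal sketch (Python, not in the original notes) that reproduces the plotted values for 2n = 100 and confirms that the mass is largest at the two endpoints:

```python
from math import comb

def p_time_positive(n, k):
    """P(T+(2n) = 2k) from formula (38)."""
    return comb(2 * k, k) * comb(2 * n - 2 * k, n - k) * 2.0 ** (-2 * n)

n = 50                                    # i.e. 2n = 100 tosses
probs = [p_time_positive(n, k) for k in range(n + 1)]
print(round(sum(probs), 10))              # the probabilities sum to 1
print(round(probs[0], 4), round(probs[n // 2], 4), round(probs[n], 4))
# endpoints k = 0 and k = n carry the largest mass, the middle the smallest
```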

The arcsine law*

Use of Stirling's approximation in formula (38) gives

P(T+(2n) = 2k) ∼ 1/(π √(k(n − k))),  as k → ∞ and n − k → ∞.

Now consider the calculation of the following probability:

P(T+(2n)/2n ∈ [a, b]) = Σ_{na ≤ k ≤ nb} P(T+(2n)/2n = k/n) ∼ Σ_{a ≤ k/n ≤ b} 1/(π √(k(n − k))) = Σ_{a ≤ k/n ≤ b} (1/n) · 1/(π √((k/n)(1 − k/n))).

This is easily seen to have a limit as n → ∞, the limit being

∫_a^b dt/(π √(t(1 − t))),

because the last sum is simply a Riemann approximation to the last integral. This shows the celebrated arcsine law:

lim_{n→∞} P(T+(2n)/2n ≤ x) = ∫_0^x dt/(π √(t(1 − t))) = (2/π) arcsin √x.

We thus have that the limiting distribution of T+(2n)/2n (the fraction of time that the RW is positive), as n → ∞, is the distribution function (2/π) arcsin √x, 0 < x < 1. This is known as the arcsine law. This distribution function has density 1/(π √(x(1 − x))), 0 < x < 1, the plot of which is amazingly close to the plot of the figure above.
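The arcsine limit can be illustrated by simulation; a sketch (Python, not part of the original notes; the walk length, number of trials and the point x are arbitrary choices, and convergence is slow):

```python
import math, random

def fraction_positive(two_n):
    """Fraction of the 2n sides of a SSRW trajectory that lie above the axis."""
    s, pos = 0, 0
    for _ in range(two_n):
        prev = s
        s += random.choice((1, -1))
        if s > 0 or (s == 0 and prev > 0):   # this side is above the axis
            pos += 1
    return pos / two_n

two_n, trials, x = 1000, 5000, 0.25
fracs = [fraction_positive(two_n) for _ in range(trials)]
empirical = sum(f <= x for f in fracs) / trials
print(round(empirical, 3), round(2 / math.pi * math.asin(math.sqrt(x)), 3))
```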

Now consider again a SSRW and consider the hitting times (Tx, x ≥ 0). This is another random walk, because the summands in Tx = Σ_{y=1}^{x} (Ty − Ty−1) are i.i.d., all distributed as T1, and

P(T1 ≥ 2n) = P(S_{2n} = 0) = C(2n, n) 2^{−2n} ∼ 1/√(πn).

From this, we immediately get that the expectation of T1 is infinity: ET1 = ∞. Many other quantities can be approximated in a similar manner.

34 Simple symmetric random walk in two dimensions

Consider a drunkard in a city whose roads are arranged in a rectangular grid. The corners are described by coordinates (x, y) where x, y ∈ Z. He starts at (0, 0) and, being totally drunk, he moves east, west, north or south with equal probability (1/4). When he reaches a corner he moves again in the same manner. We describe this motion as a random walk in Z² = Z × Z: let ξ1, ξ2, ... be i.i.d. random vectors such that

P(ξn = (0, 1)) = P(ξn = (0, −1)) = P(ξn = (1, 0)) = P(ξn = (−1, 0)) = 1/4,

and let S0 = (0, 0), Sn = Σ_{i=1}^{n} ξi. If x ∈ Z², we let x¹ be its horizontal coordinate and x² its vertical one, so Sn = (Sn¹, Sn²) with Sn¹ = Σ_{i=1}^{n} ξi¹ and Sn² = Σ_{i=1}^{n} ξi². Since ξ1¹, ξ2¹, ... are i.i.d. random variables, (Sn¹, n ∈ Z+) is a random walk; it is not a simple random walk because there is a possibility that the increment takes the value 0. Indeed, P(ξn¹ = 0) = P(ξn = (0, 1)) + P(ξn = (0, −1)) = 1/2, while P(ξn¹ = 1) = P(ξn = (1, 0)) = 1/4 and P(ξn¹ = −1) = P(ξn = (−1, 0)) = 1/4. The same argument applies to (Sn², n ∈ Z+), showing that it, too, is a random walk. However, the two random walks are not independent: for each n, the random variables ξn¹, ξn² are not independent, simply because if one takes the value 0 the other is nonzero. Consider a change of coordinates: rotate the axes by 45° (and scale them), i.e. map (x¹, x²) into (x¹ + x², x¹ − x²), and let

Rn = (Rn¹, Rn²) := (Sn¹ + Sn², Sn¹ − Sn²).

Lemma 20. (Rn¹, n ∈ Z+) and (Rn², n ∈ Z+) are independent simple symmetric random walks in one dimension (so (Rn, n ∈ Z+) is a random walk in two dimensions).

Proof. We have Rn¹ = Σ_{i=1}^{n} (ξi¹ + ξi²) and Rn² = Σ_{i=1}^{n} (ξi¹ − ξi²), so each is a sum of i.i.d. random variables: the vectors ηn := (ξn¹ + ξn², ξn¹ − ξn²) are i.i.d., being obtained by taking the same function on the elements of the i.i.d. sequence ξ1, ξ2, .... The possible values of each of ηn¹, ηn² are ±1, and

P(ηn¹ = 1, ηn² = 1) = P(ξn = (1, 0)) = 1/4,  P(ηn¹ = 1, ηn² = −1) = P(ξn = (0, 1)) = 1/4,
P(ηn¹ = −1, ηn² = 1) = P(ξn = (0, −1)) = 1/4,  P(ηn¹ = −1, ηn² = −1) = P(ξn = (−1, 0)) = 1/4.

Hence P(ηn¹ = 1) = P(ηn¹ = −1) = 1/2, and the same for ηn². Moreover P(ηn¹ = ε, ηn² = ε′) = P(ηn¹ = ε) P(ηn² = ε′) for all choices of ε, ε′ ∈ {−1, +1}, so the two coordinates are independent, and each is a simple symmetric random walk.
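A quick empirical check of the rotation trick in Lemma 20 (Python, not in the original notes; the sample size is arbitrary): the increments of (R¹, R²) should look like independent fair ±1 coin flips, i.e. each of the four sign pairs should have frequency about 1/4.

```python
import random

steps = [(0, 1), (0, -1), (1, 0), (-1, 0)]    # the four moves of the 2D walk
n = 200000
counts = {}
for _ in range(n):
    dx, dy = random.choice(steps)
    eta = (dx + dy, dx - dy)                  # increment of (R^1, R^2)
    counts[eta] = counts.get(eta, 0) + 1

for eta, c in sorted(counts.items()):
    print(eta, round(c / n, 3))               # each (+-1, +-1) pair has frequency ~ 0.25
```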

Lemma 21. For the two-dimensional simple symmetric random walk,

P(S_{2n} = 0) = (C(2n, n) 2^{−2n})².

Proof. S_{2n} = 0 if and only if R_{2n}¹ = R_{2n}² = 0 and, by Lemma 20, the two coordinates of R are independent simple symmetric random walks, so

P(S_{2n} = 0) = P(R_{2n}¹ = 0, R_{2n}² = 0) = P(R_{2n}¹ = 0) P(R_{2n}² = 0) = (C(2n, n) 2^{−2n})².

Corollary 24. The two-dimensional simple symmetric random walk is recurrent.

Proof. By Lemma 5 and Theorem 7, we need to show that Σ_{n≥0} P(Sn = 0) = ∞. But, by the previous lemma and Stirling's approximation, P(S_{2n} = 0) ∼ c/n as n → ∞, where c is a positive constant (in the sense that the ratio of the two sides goes to 1), and Σ_n 1/n = ∞.

35 Skorokhod embedding*

We take a closer look at the gambler's ruin problem for a simple symmetric random walk in one dimension. Let a < 0 < b. Recall that

P(Ta < Tb) = b/(b − a),   P(Tb < Ta) = −a/(b − a).

We can read this as follows:

P(S_{Ta∧Tb} = a) = b/(b − a),   P(S_{Ta∧Tb} = b) = −a/(b − a).

This last display tells us the distribution of S_{Ta∧Tb}. Note, in particular, that E S_{Ta∧Tb} = 0. Conversely, given a 2-point probability distribution (p, 1 − p) such that p is rational, we can always find integers a, b with a < 0 < b such that pa + (1 − p)b = 0: if p = M/N with M, N ∈ N and M < N, choose a = M − N and b = M. This means that we can simulate the distribution by generating a simple symmetric random walk starting from 0 and watching it till the first time it exits the interval [a, b]: the probability that it exits from the left point is p, and the probability that it exits from the right point is 1 − p. How about more complicated distributions? Suppose we are given a probability distribution (pi, i ∈ Z) with zero mean, Σ_{i∈Z} i pi = 0. Can we find a stopping time T such that ST has the given distribution? The Skorokhod embedding theorem tells us how to do this.

Theorem 43. Let (Sn) be a simple symmetric random walk and (pi, i ∈ Z) a given probability distribution on Z with zero mean: Σ_{i∈Z} i pi = 0. Then there exists a stopping time T such that P(ST = i) = pi, i ∈ Z.

Proof. Define a pair of random variables (A, B), independent of (Sn), such that either (A, B) = (0, 0) or A < 0 < B, with distribution

P(A = i, B = j) = c^{−1} (j − i) pi pj,  i < 0 < j,   P(A = 0, B = 0) = p0,

where c is chosen by normalisation: P(A = 0, B = 0) + Σ_{i<0} Σ_{j>0} P(A = i, B = j) = 1. Since Σ_{i∈Z} i pi = 0, the two quantities Σ_{j>0} j pj and Σ_{i<0} (−i) pi are equal and, writing

Σ_{i<0} Σ_{j>0} (j − i) pi pj = Σ_{j>0} j pj Σ_{i<0} pi + Σ_{i<0} (−i) pi Σ_{j>0} pj,

we get that c is precisely this common value:

c = Σ_{i<0} (−i) pi = Σ_{j>0} j pj.

Letting Ti = inf{n ≥ 0 : Sn = i}, we consider the random variable Z := S_{T_A ∧ T_B} and take T = T_A ∧ T_B. The claim is that P(Z = k) = pk, k ∈ Z. To see this, suppose first k = 0: then Z = 0 if and only if A = B = 0, and this has probability p0, by definition.

Suppose next k > 0. We have

P(Z = k) = P(S_{T_A∧T_B} = k) = Σ_{i<0} Σ_{j>0} P(S_{T_i∧T_j} = k) P(A = i, B = j) = Σ_{i<0} P(S_{T_i∧T_k} = k) P(A = i, B = k),

since only the terms with j = k contribute. By the gambler's ruin probabilities, P(S_{T_i∧T_k} = k) = (−i)/(k − i), so

P(Z = k) = Σ_{i<0} ((−i)/(k − i)) c^{−1} (k − i) pi pk = c^{−1} Σ_{i<0} (−i) pi pk = pk.

Similarly, P(Z = k) = pk for k < 0.
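The construction in the proof of Theorem 43 can be turned into a small simulation; a sketch (Python, not in the original notes; the law used below and the sample size are arbitrary, and skorokhod_sample is a hypothetical helper name):

```python
import random

def skorokhod_sample(p):
    """One draw of S_T for the Skorokhod embedding of a zero-mean law p (dict i -> p_i):
    sample (A, B) as in the proof of Theorem 43, then run a SSRW until it hits A or B."""
    c = sum(j * pj for j, pj in p.items() if j > 0)      # = sum over i < 0 of (-i) p_i
    pairs, weights = [(0, 0)], [p.get(0, 0.0)]
    for i, pi in p.items():
        for j, pj in p.items():
            if i < 0 < j:
                pairs.append((i, j))
                weights.append((j - i) * pi * pj / c)
    a, b = random.choices(pairs, weights)[0]
    if (a, b) == (0, 0):
        return 0
    s = 0
    while s != a and s != b:
        s += random.choice((1, -1))
    return s

law = {-2: 0.15, -1: 0.2, 0: 0.3, 1: 0.2, 2: 0.15}       # an arbitrary zero-mean law
samples = [skorokhod_sample(law) for _ in range(20000)]
for i in sorted(law):
    print(i, law[i], round(samples.count(i) / len(samples), 3))   # target vs empirical
```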

PART III: APPENDICES

This section contains some sine qua non elements from standard Mathematics, as taught in high schools or universities. They are not supposed to replace textbooks or basic courses; they merely serve as reminders.

Arithmetic

Arithmetic deals with integers. The set N of natural numbers is the set of positive integers, N = {1, 2, 3, ...}. The set Z of integers consists of N, their negatives and zero: Z = {..., −3, −2, −1, 0, 1, 2, 3, ...}. Given integers a, b, with b ≠ 0, we say that b divides a if there is an integer k such that a = kb; we write this by b | a. If a | b and b | a then a = b. If c | b and b | a then c | a. Given integers a, b, with at least one of them nonzero, we can define their greatest common divisor d = gcd(a, b) as the unique positive integer such that (i) d | a and d | b, and (ii) if c | a and c | b, then c | d. (That it is unique is obvious from the fact that if d1, d2 are two such numbers then d1 would divide d2 and vice-versa, leading to d1 = d2.) Notice that

gcd(a, b) = gcd(a, b − a).

Indeed, let d = gcd(a, b). But d | a and d | b, therefore d | b − a; so (i) holds for the pair (a, b − a). Now suppose that c | a and c | b − a. Then c | a + (b − a), i.e. c | b. Because c | a and c | b, we have c | d. So (ii) holds, and we're done. This gives us an algorithm for finding the gcd of two numbers: suppose 0 < a < b are integers; replace b by b − a; if b − a > a, replace b − a by b − 2a; keep doing this until you obtain b − qa < a; then interchange the roles of the numbers and continue. In other words, if we also invoke Euclid's algorithm (i.e. the process of long division that one learns in elementary school), we may summarise the first line below by saying that we replace the largest of the two numbers by the remainder of the division of the largest with the smallest:

gcd(120, 38) = gcd(82, 38) = gcd(44, 38) = gcd(6, 38) = gcd(6, 32) = gcd(6, 26) = gcd(6, 20) = gcd(6, 14) = gcd(6, 8) = gcd(6, 2) = gcd(4, 2) = gcd(2, 2) = 2.

But 3 is the maximum number of times that 38 "fits" into 120 (indeed 4 × 38 > 120), and in the first line we subtracted 38 exactly 3 times. So we work faster:

gcd(120, 38) = gcd(6, 38) = gcd(6, 2) = 2.
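The fast version of the procedure (always replace the larger number by the remainder of the division) is Euclid's algorithm; a minimal sketch (Python, not in the original notes):

```python
def gcd(a, b):
    """Euclid's algorithm: replace the larger number by the remainder of the division."""
    while b != 0:
        a, b = b, a % b
    return a

print(gcd(120, 38))   # 2, as in the worked example above
```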

The theorem of division says the following: given two positive integers a, b, we can find an integer q (the quotient) and an integer r ∈ {0, 1, ..., b − 1} (the remainder) such that a = qb + r. The long division algorithm is a process for finding q and r; for instance, dividing 3450 by 26 by long division gives q = 132 and r = 18, i.e. 3450 = 132 × 26 + 18.

Here are some other facts about the greatest common divisor. First, if d = gcd(a, b) then there are integers x, y such that d = xa + yb. To see this, let us follow the previous example: in computing gcd(120, 38) we used the divisions 120 = 3 × 38 + 6 and 38 = 6 × 6 + 2; working backwards, we have

2 = 38 − 6 × 6 = 38 − 6 × (120 − 3 × 38) = 19 × 38 − 6 × 120,

i.e. gcd(38, 120) = 38x + 120y with x = 19, y = −6. The same logic works in general. Second, for integers a, b, not both zero, we have

gcd(a, b) = min{xa + yb : x, y ∈ Z, xa + yb > 0}.

To see this, let d be the right-hand side. Suppose that c | a and c | b; then c divides any linear combination of a, b, so, in particular, it divides d. This shows (ii) of the definition of a greatest common divisor. We need to show (i), that d | a and d | b.

Suppose this is not true, say that d does not divide b. Since d > 0, we can divide b by d to find b = qd + r, for some 1 ≤ r ≤ d − 1. Writing d = xa + yb for suitable x, y ∈ Z, we get b = q(xa + yb) + r, yielding r = (−qx)a + (1 − qy)b. So r is expressible as a linear combination of a and b, is positive, and is strictly smaller than d (being a remainder); so d is not minimal as claimed, and this is a contradiction. Hence d | b and, similarly, d | a. In the same way, the greatest common divisor of any subset of the integers also makes sense.

Relations

A relation on a set A is a mathematical notion that establishes a way of pairing elements of A. In general, a relation can be thought of as a set of pairs of elements of the set A. For example, the relation < between real numbers can be thought of as consisting of pairs of numbers (x, y) such that x is smaller than y. A relation is called reflexive if it contains all pairs (x, x). The equality relation = is reflexive; the relation < is not reflexive. A relation is called symmetric if it cannot contain a pair (x, y) without containing its symmetric (y, x). The equality relation is symmetric; the relation < is not symmetric. If we take A to be the set of students in a classroom, then the relation "x is a friend of y" is symmetric, and it can be thought of as reflexive (unless we do not want to accept that one may be a friend of oneself), but it is not necessarily transitive. A relation is transitive if, when it contains (x, y) and (y, z), it also contains (x, z). Both < and = are transitive. A relation which possesses all three properties is called an equivalence relation. Any equivalence relation splits a set into equivalence classes: the equivalence class of x contains all y that are related (are equivalent) to x. We encountered the equivalence relation "i communicates with j" in Markov chain theory.

Functions

A function f from a set X into a set Y is denoted by f : X → Y. The range of f is the set of its values, i.e. the set {f(x) : x ∈ X}. A function is called onto (or surjective) if its range is Y. A function is called one-to-one (or injective) if two different x1, x2 ∈ X have different values f(x1), f(x2). If B ⊂ Y then f^{−1}(B) consists of all x ∈ X with f(x) ∈ B. The set of functions from X to Y is denoted by Y^X.

Counting

Let A, B be finite sets with n, m elements, respectively. The number of functions from A to B (i.e. the cardinality of the set B^A) is m^n. To see this, we may think of the elements of A as animals and those of B as names; a function is then an assignment of a name to each animal. (The same name may be used for different animals.) So the first animal can be assigned any of the m available names; so can the second, and the third, and so on. Hence the total number of assignments is m × m × ··· × m (n times), i.e. m^n. Suppose now we are not allowed to give the same name to different animals; each assignment of this type is a one-to-one function from A to B. To be able to do this, we need more names than animals: n ≤ m. Since the name of the first animal can be chosen in m ways, that of the second in m − 1 ways, and so on, the total number of one-to-one functions from A to B is

(m)_n := m(m − 1)(m − 2)···(m − n + 1).

In the special case m = n, a one-to-one function from the set A into A is called a permutation of A. Incidentally, we denote (n)_n also by n!, so n! is the total number of permutations of a set of n elements; n! stands for the product of the first n natural numbers, n! = 1 × 2 × 3 × ··· × n.

Any set A contains several subsets. If A has n elements, the number of subsets with k elements each can be found as follows. First designate which of the n elements of A will be chosen: the first element can be chosen in n ways, the second in n − 1 ways, and so on, the k-th in n − k + 1 ways; so we have (n)_k = n(n − 1)···(n − k + 1) ways to pick k elements if we care about their order. If we don't, then we divide by the number k! of permutations of the k elements, concluding that the number of subsets with k elements of a set with n elements equals

(n)_k / k! = n!/(k!(n − k)!) =: C(n, k),

where the symbol C(n, k) ("n choose k") is merely a name for the number computed on the left side of this equation. Clearly, C(n, k) = C(n, n − k). Since there is only one subset with no elements (it is the empty set ∅), we have C(n, 0) = 1. The total number of subsets of A (regardless of their number of elements) is thus equal to

Σ_{k=0}^{n} C(n, k) = (1 + 1)^n = 2^n,

and this follows from the binomial theorem. We thus observe that there are as many subsets of A as elements of the set {0, 1}^n of sequences of length n with elements 0 or 1.

We shall let C(n, k) = 0 if n < k. The Pascal triangle relation states that

C(n + 1, k) = C(n, k) + C(n, k − 1).

Indeed, in choosing k animals from a set of n + 1 ones, consider a specific animal, say the pig. If the pig is not included in the sample, we choose k animals from the remaining n ones and this can be done in C(n, k) ways; if the pig is included, then there are k − 1 remaining animals to be chosen and this can be done in C(n, k − 1) ways. Since the sentences "the pig is included" and "the pig is not included" cannot be simultaneously true, we add the two numbers together in order to find the total number of ways to pick k animals from n + 1 animals. More generally, if x is a real number, we define

C(x, m) = x(x − 1)···(x − m + 1)/m!,

and if y is not an integer then, by convention, we let C(n, y) = 0.

Maximum and minimum

The minimum of two numbers a, b is denoted by a ∧ b = min(a, b); thus a ∧ b = a if a ≤ b, and = b if b ≤ a. The maximum is denoted by a ∨ b = max(a, b). The notation extends to more than two numbers; for example, a ∨ b ∨ c refers to the maximum of the three numbers a, b, c, and we don't care to put parentheses because the order of finding a maximum is irrelevant. Note that −(a ∨ b) = (−a) ∧ (−b) and (a ∧ b) + c = (a + c) ∧ (b + c). In particular, we use the following notation:

a+ = max(a, 0),  a− = − min(a, 0).

a+ is called the positive part of a, and a− is called the negative part of a; both quantities are nonnegative. Note that it is always true that a = a+ − a−, and the absolute value of a is |a| = a+ + a−.

Indicator function

1_A stands for the indicator function of a set A, meaning a function that takes value 1 on A and 0 on the complement A^c of A. We also use the notation 1(clause) for a number that is 1 whenever the clause is true and 0 otherwise. If A is an event then 1_A is a random variable: it takes value 1 with probability P(A) and 0 with probability P(A^c), so its expectation is

E 1_A = 1 × P(A) + 0 × P(A^c) = P(A).

It is easy to see that 1_A 1_B = 1_{A∩B} and 1 − 1_A = 1_{A^c}. An example of its application: since

1_{A∪B} = 1 − 1_{(A∪B)^c} = 1 − 1_{A^c∩B^c} = 1 − 1_{A^c} 1_{B^c} = 1 − (1 − 1_A)(1 − 1_B) = 1_A + 1_B − 1_A 1_B = 1_A + 1_B − 1_{A∩B},

we have, by taking expectations, E[1_{A∪B}] = E[1_A] + E[1_B] − E[1_{A∩B}], which means P(A ∪ B) = P(A) + P(B) − P(A ∩ B). This is very useful notation because conjunction of clauses is transformed into multiplication. Several quantities of interest are expressed in terms of indicators. For instance, if I go to a bank N times and Ai is the clause "I am robbed the i-th time", then Σ_{i=1}^{N} 1_{Ai} is the total number of times I am robbed. If A1, A2, ... are sets or clauses then Σ_{n≥1} 1_{An} is the number of true clauses. Another example: let X be a nonnegative random variable with values in N = {1, 2, ...}. Then X = Σ_{n=1}^{X} 1 (the sum of X ones is X), so

X = Σ_{n≥1} 1(X ≥ n),

and therefore E[X] = Σ_{n≥1} E 1(X ≥ n) = Σ_{n≥1} P(X ≥ n). It is worth remembering this.

Matrices

Let R^d be the set of vectors with d components. A linear function ϕ from R^n to R^m is such that

ϕ(αx + βy) = αϕ(x) + βϕ(y),  for all x, y ∈ R^n and all α, β ∈ R.

Let u1, ..., un be a basis for R^n. Then any x ∈ R^n can be uniquely written as x = Σ_{j=1}^{n} xj uj and, because ϕ is linear, y := ϕ(x) = Σ_{j=1}^{n} xj ϕ(uj). But the ϕ(uj) are vectors in R^m, so if we choose a basis v1, ..., vm in R^m we can express each ϕ(uj) as a linear combination of the basis vectors:

ϕ(uj) = Σ_{i=1}^{m} aij vi.

Combining, we have ϕ(x) = Σ_{i=1}^{m} vi Σ_{j=1}^{n} aij xj. The matrix A of ϕ with respect to the choice of the two bases is defined as the m × n array A = (aij, 1 ≤ i ≤ m, 1 ≤ j ≤ n). The column (x1, ..., xn) is a column representation of x with respect to the basis u1, ..., un, and the column (y1, ..., ym), where yi := Σ_{j=1}^{n} aij xj, i = 1, ..., m, is a column representation of y = ϕ(x) with respect to the basis v1, ..., vm; the relation between the two column representations is y = Ax. Suppose now that ψ : R^n → R^m and ϕ : R^m → R^ℓ are linear functions. Then ϕ ∘ ψ : R^n → R^ℓ is also a linear function.

Fix three sets of bases on R^n, R^m and R^ℓ, and let A, B be the matrix representations of ϕ, ψ, respectively, with respect to these bases. Then the matrix representation of ϕ ∘ ψ is AB, where

(AB)ij = Σ_{k=1}^{m} Aik Bkj,  1 ≤ i ≤ ℓ, 1 ≤ j ≤ n.

So A is an ℓ × m matrix, B is an m × n matrix, and AB is an ℓ × n matrix. Usually, the choice of the basis is the so-called standard basis: the standard basis in R^d consists of the vectors e1 = (1, 0, ..., 0), e2 = (0, 1, 0, ..., 0), ..., ed = (0, ..., 0, 1); in other words, the j-th component of ei is 1(i = j). A matrix A is called square if it is an n × n matrix; a square matrix represents a linear transformation from R^n to R^n and, if we fix a basis in R^n, it is its matrix representation with respect to the fixed basis.

Supremum and infimum

When we have a set I of numbers (for instance, a sequence a1, a2, ... of numbers), the minimum of I is denoted by min I and refers to a number c ∈ I such that c ≤ x for all x ∈ I. Similarly we define max I. This minimum (or maximum) may not exist; for instance, I = {1, 1/2, 1/3, ...} has no minimum. In these cases we use the notation inf I, standing for the infimum of I, defined as the maximum of all the numbers y with the property y ≤ x for all x ∈ I. A property of real numbers says that if the set I is bounded from below then this maximum of all lower bounds exists, and it is this that we call the infimum. Likewise, we define the supremum sup I to be the minimum of all upper bounds. inf I is characterised by: inf I ≤ x for all x ∈ I and, for any ε > 0, there is x ∈ I with x ≤ ε + inf I; similarly, sup I is characterised by: sup I ≥ x for all x ∈ I and, for any ε > 0, there is x ∈ I with x ≥ sup I − ε. If I has no lower bound then inf I = −∞; if I has no upper bound then sup I = +∞. If I is empty then, by convention, inf ∅ = +∞ and sup ∅ = −∞.

Limsup and liminf

If x1, x2, ... is a sequence of numbers then inf{x1, x2, ...} ≤ inf{x2, x3, ...} ≤ inf{x3, x4, ...} ≤ ··· and, since any increasing sequence has a limit (possibly +∞), we take this limit and call it lim inf xn (limit inferior) of the sequence. So lim inf xn = lim_{n→∞} inf{xn, xn+1, xn+2, ...}. Similarly, we define lim sup xn (limit superior), using suprema in place of infima. We note that lim inf(−xn) = − lim sup xn. The sequence xn has a limit ℓ if and only if lim sup xn = lim inf xn = ℓ.

Probability Theory

I do not require knowledge of axiomatic Probability Theory (with Measure Theory), but I would like the reader to keep in mind one thing: every mathematical system has a set of axioms from which we can prove (useful) theorems, derive formulae, and apply them. Probability Theory is exceptionally simple in terms of its axioms, for (besides P(Ω) = 1) there is essentially one axiom, namely: if A1, A2, ... are mutually disjoint events, then

P(∪_{n≥1} An) = Σ_{n≥1} P(An).

Everything else follows from this. (That is why Probability Theory is very rich. In contrast, Linear Algebra, which has many axioms, is more 'restrictive'.) Of course, in these notes I do sometimes "cheat" from a strict mathematical point of view, but not too much. Some frequently used consequences: if A, B are disjoint events then P(A ∪ B) = P(A) + P(B); if they are not disjoint then P(A ∪ B) = P(A) + P(B) − P(A ∩ B), the inclusion-exclusion formula for two events, which can be generalised to any finite number of events. Similarly, P(∪_n An) ≤ Σ_n P(An). If the events A1, A2, ... are increasing with A = ∪_n An, then P(A) = lim_{n→∞} P(An); if they are decreasing with A = ∩_n An, then P(A) = lim_{n→∞} P(An). If An are events with P(An) = 0 then P(∪_n An) = 0; if An are events with P(An) = 1 then P(∩_n An) = 1. Quite often, showing that P(A) ≤ P(B) is more of a logical problem than a numerical one: we need to show that A implies B (i.e. A is a subset of B). Similarly, to show that EX ≤ EY, we often need to argue that X ≤ Y. In Markov chain theory, and everywhere else, we work a lot with conditional probabilities. Recall that P(B | A) is defined by P(A ∩ B)/P(A), and this is a very motivated definition. To compute P(A ∩ B) we read the definition as P(A ∩ B) = P(A)P(B | A). This is generalisable to any finite number of events:

P(A1 ∩ A2 ∩ A3) = P(A1) P(A2 | A1) P(A3 | A1 ∩ A2).

If we interchange the roles of A and B we get P(A ∩ B) = P(A)P(B | A) = P(B)P(A | B), and this is called Bayes' theorem. Often, to compute P(A), it is useful to find disjoint events B1, B2, ... with ∪_n Bn = Ω, in which case we write

P(A) = Σ_n P(A ∩ Bn) = Σ_n P(A | Bn) P(Bn).

If A is an event then 1_A or 1(A) is the indicator random variable, i.e. the function that takes value 1 on A and 0 on A^c. Note that 1_A 1_B = 1_{A∩B}, 1_{A^c} = 1 − 1_A, and E 1_A = P(A). Markov's inequality for a positive random variable X states that

P(X > x) ≤ EX/x.

This follows from x 1(X > x) ≤ X 1(X > x) ≤ X (which holds since X is positive), by taking expectations.

An application of the continuity property above is: P(X < ∞) = lim_{x→∞} P(X ≤ x). The expectation of a nonnegative random variable X may be defined by

EX = ∫_0^∞ P(X > x) dx.

The formula holds for any positive random variable. In case the random variable is discrete, it reduces to the known one, EX = Σ_x x P(X = x); in case it is absolutely continuous with density f, we have EX = ∫_{−∞}^{∞} x f(x) dx. For a random variable that takes values in N we can write EX = Σ_{n=1}^{∞} P(X ≥ n); this is due to the fact that Σ_{n=1}^{∞} P(X ≥ n) = Σ_{n=1}^{∞} E 1(X ≥ n) = E Σ_{n=1}^{∞} 1(X ≥ n) = EX. For a random variable which takes both positive and negative values, we define X+ = max(X, 0) and X− = − min(X, 0), which are both nonnegative, and then define EX = EX+ − EX−, motivated by the algebraic identity X = X+ − X−. This definition makes sense only if EX+ < ∞ or EX− < ∞. In Markov chain theory we encounter situations where P(X = ∞) may be positive, so it is useful to keep this in mind.

Generating functions

A generating function of a sequence a0, a1, ... is an infinite polynomial having the ai as coefficients:

G(s) = a0 + a1 s + a2 s² + ···.

We do not a priori know that this exists for any s other than, obviously, s = 0, for which G(0) = a0. If it exists for some s1 then it exists for all s with |s| < |s1| (and this needs a proof, taught in calculus courses). Note that |G(s)| ≤ Σ_n |an||s|^n, and this is ≤ Σ_n |an| if s ∈ [−1, 1]; so if Σ_n |an| < ∞ then G(s) exists for all s ∈ [−1, 1], making it a useful object when a0, a1, ... is a probability distribution on Z+ = {0, 1, 2, ...}, because then Σ_n an = 1. As an example, the generating function of the geometric sequence an = ρ^n, n ≥ 0, is

Σ_{n=0}^{∞} ρ^n s^n = 1/(1 − ρs),  as long as |ρs| < 1.  (39)

The generating function of the sequence pn = P(X = n), n = 0, 1, ..., is called the probability generating function of X:

ϕ(s) = E s^X = Σ_{n=0}^{∞} s^n P(X = n).

This is defined for all −1 < s < 1. Note that Σ_{n=0}^{∞} pn ≤ 1: we do allow X to, possibly, take the value +∞, and we set p∞ := P(X = ∞) = 1 − Σ_{n=0}^{∞} pn. The term s^∞ p∞ could formally appear in the sum defining ϕ(s), but, since |s| < 1, we have s^∞ = 0. Note that

ϕ(0) = P(X = 0),  ϕ(1) = Σ_{n=0}^{∞} P(X = n) = P(X < +∞),

and this is why ϕ is called probability generating function. We can recover the probabilities pn from the formula

pn = ϕ^{(n)}(0)/n!,

as follows from Taylor's theorem. Also, we can recover the expectation of X from

EX = lim_{s↑1} ϕ′(s),

something that can be formally seen from (d/ds) E s^X = E (d/ds) s^X = E X s^{X−1}, by letting s = 1. Generating functions are useful for solving linear recursions. As an example, consider the Fibonacci sequence defined by the recursion

x_{n+2} = x_{n+1} + x_n,  n ≥ 0,  with x0 = x1 = 1.

If we let X(s) = Σ_{n≥0} xn s^n be the generating function of (xn, n ≥ 0), then the generating function of (x_{n+1}, n ≥ 0) is Σ_{n≥0} x_{n+1} s^n = s^{−1}(X(s) − x0), and the generating function of (x_{n+2}, n ≥ 0) is Σ_{n≥0} x_{n+2} s^n = s^{−2}(X(s) − x0 − sx1).

From the recursion x_{n+2} = x_{n+1} + x_n we have (and this is where linearity is used) that the generating function of (x_{n+2}, n ≥ 0) equals the sum of the generating functions of (x_{n+1}, n ≥ 0) and (xn, n ≥ 0):

s^{−2}(X(s) − x0 − sx1) = s^{−1}(X(s) − x0) + X(s).

Since x0 = x1 = 1, we can solve for X(s) and find

X(s) = −1/(s² + s − 1).

But the polynomial s² + s − 1 has two roots, a = (√5 − 1)/2 and b = −(√5 + 1)/2; hence s² + s − 1 = (s − a)(s − b), and so

X(s) = −1/((s − a)(s − b)) = (1/(b − a)) (1/(b − s) − 1/(a − s)).

But (39) tells us that ρ^n has generating function 1/(1 − ρs) = (1/ρ)/((1/ρ) − s); thus 1/(b − s) is the generating function of (1/b)^{n+1} and 1/(a − s) is the generating function of (1/a)^{n+1}. Hence X(s) is the generating function of

xn = ((1/b)^{n+1} − (1/a)^{n+1})/(b − a),

the n-th term of the Fibonacci sequence. Noting that ab = −1, we can also write this as

xn = ((−a)^{n+1} − (−b)^{n+1})/(b − a),

which is the solution to the recursion. Despite the square roots in the formula, this number is always an integer (it has to be!).
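The closed form obtained from the generating function can be checked against the recursion; a minimal sketch (Python, not in the original notes):

```python
from math import sqrt

def fib_closed_form(n):
    """x_n from the generating-function solution ((-a)^(n+1) - (-b)^(n+1)) / (b - a)."""
    a = (sqrt(5) - 1) / 2
    b = -(sqrt(5) + 1) / 2
    return ((-a) ** (n + 1) - (-b) ** (n + 1)) / (b - a)

x = [1, 1]
for _ in range(10):
    x.append(x[-1] + x[-2])                        # the recursion x_{n+2} = x_{n+1} + x_n
print([round(fib_closed_form(n)) for n in range(12)])
print(x)                                           # the two lists coincide: 1, 1, 2, 3, 5, ...
```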

Distribution or law of a random object

When we speak of the "distribution" or "law" of a random variable X with values in Z or R or R^n or even in a much larger space, we speak of some function which enables us to specify the probabilities of the form P(X ∈ B), where B is a set of values of X. For example, if X is a random variable with values in R, the so-called distribution function, i.e. the function P(X ≤ x), x ∈ R, is such an example. But I could very well use the function P(X > x), or I could use its density f(x) = (d/dx) P(X ≤ x), or the 2-parameter function P(a < X < b). All these objects can be referred to as "distribution" or "law" of X. Even the generating function E s^X of a Z+-valued random variable X is an example, for it does allow us to specify the probabilities P(X = n), n ∈ Z+. We frequently encounter random objects which belong to larger spaces, for example objects taking values which are themselves functions. For instance, we encountered the concept of an excursion of a Markov chain from a state i: such an excursion is a random object which should be thought of as a random function. Specifying the distribution of such an object may be hard, but using or merely talking about such an object presents no difficulty other than overcoming a psychological barrier. Warning: the verb "specify" should not be interpreted as "compute" in the sense of existence of an analytical formula or algorithm.

Large products: Stirling's formula

If we want to compute n! = 1 × 2 × ··· × n for large n then, since the number is so big, it's better to use logarithms:

n! = e^{log n!} = exp( Σ_{k=1}^{n} log k ).

Now, to compute a large sum we can, instead, compute an integral and obtain an approximation. Since log x does not vary too much between two successive positive integers k and k + 1, it follows that ∫_k^{k+1} log x dx ≈ log k, hence Σ_{k=1}^{n} log k ≈ ∫_1^n log x dx. Since (d/dx)(x log x − x) = log x, it follows that ∫_1^n log x dx = n log n − n + 1 ≈ n log n − n. Hence we have

n! ≈ exp(n log n − n) = n^n e^{−n}.

We call this the rough Stirling approximation. But there are two problems: (a) we do not know in what sense the symbol ≈ should be interpreted; (b) the approximation is not good when we consider ratios of products. For example (try it!), the rough Stirling approximation gives that the probability that, in 2n fair coin tosses, we have exactly n heads equals C(2n, n) 2^{−2n} ≈ 1, and this is terrible, because we know that this probability goes to zero as n → ∞. To remedy this, we shall prove Stirling's approximation:

n! ∼ n^n e^{−n} √(2πn),

where the symbol an ∼ bn means that an/bn → 1, as n → ∞.
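A numerical comparison of the rough and the refined approximations (Python, not in the original notes):

```python
import math

def rough(n):        # n^n e^{-n}
    return n ** n * math.exp(-n)

def stirling(n):     # n^n e^{-n} sqrt(2 pi n)
    return rough(n) * math.sqrt(2 * math.pi * n)

for n in (5, 10, 20):
    print(n, round(math.factorial(n) / rough(n), 3), round(math.factorial(n) / stirling(n), 5))
# n!/rough(n) keeps growing (like sqrt(2 pi n)), while n!/stirling(n) tends to 1
```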

The geometric distribution

We say that a random variable X with values in Z+ is geometric if it has the memoryless property:

P(X − n ≥ m | X ≥ n) = P(X ≥ m),  0 ≤ n ≤ m.

If we think of X as the time of occurrence of a random phenomenon, then X is such that knowledge that X has exceeded n does not make the distribution of the remaining time X − n any different from that of X. The memoryless property forces the distribution to be of the form

P(X ≥ n) = λ^n,  n ∈ Z+,  where λ = P(X ≥ 1),

a geometric function of n; we baptise this the geometric(λ) distribution. We have already seen a geometric random variable: the random variable J0 = Σ_{n=1}^{∞} 1(Sn = 0), defined in (34). It was shown in Theorem 38 that J0 satisfies the memoryless property; in fact it was shown there that P(J0 ≥ k) = P(J0 ≥ 1)^k, a geometric function of k. Surely, you have seen all this before, so this little section is a reminder.

Markov's inequality

Arguably this is the most useful inequality in Probability, and it is based on the concept of monotonicity. Let X be a random variable and consider an increasing and nonnegative function g : R → R+. Then, for all x, g(X) ≥ g(x) 1(X ≥ x). Indeed, if y ≥ x then 1(y ≥ x) = 1 and the display tells us g(y) ≥ g(x), expressing monotonicity; and if y < x then 1(y ≥ x) = 0 and the display reads g(y) ≥ 0, expressing non-negativity. We can express both properties of g neatly by writing: for all x, y ∈ R, g(y) ≥ g(x) 1(y ≥ x). This completely trivial thought gives the very useful Markov's inequality: by taking expectations of both sides (expectation is an increasing operator),

E g(X) ≥ g(x) P(X ≥ x).

Bibliography

1. Brémaud, P (1999) Markov Chains. Springer.
2. Brémaud, P (1997) An Introduction to Probabilistic Modeling. Springer.
3. Chung, K L and Aitsahlia, F (2003) Elementary Probability Theory (With Stochastic Processes and an Introduction to Mathematical Finance). Springer.
4. Feller, W (1968) An Introduction to Probability Theory and its Applications. Wiley.
5. Grinstead, C M and Snell, J L (1997) Introduction to Probability. Amer. Math. Soc. Also available at: http://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/amsbook.pdf
6. Norris, J R (1998) Markov Chains. Cambridge University Press.
7. Parry, W (1981) Topics in Ergodic Theory. Cambridge University Press.

