This action might not be possible to undo. Are you sure you want to continue?

**MARKOV CHAINS AND RANDOM WALKS
**

Takis Konstantopoulos∗ Autumn 2009

∗

c Takis Konstantopoulos 2006-2009

Contents

1 Introduction 2 Examples of Markov chains 2.1 2.2 2.3 2.4 2.5 A mouse in a cage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Bank account . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Simple random walk (drunkard’s walk) . . . . . . . . . . . . . . . . . . . . . . Simple random walk (drunkard’s walk) in a city . . . . . . . . . . . . . . . . . Actuarial chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 2 2 3 4 4 5 6 6 6 7 9

3 The Markov property 3.1 3.2 3.3 Deﬁnition of the Markov property . . . . . . . . . . . . . . . . . . . . . . . . Transition probability and initial distribution . . . . . . . . . . . . . . . . . . Time-homogeneous chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

4 Stationarity 4.1

Finding a stationary distribution . . . . . . . . . . . . . . . . . . . . . . . . . 13 15

5 Topological structure 5.1 5.2 5.3 5.4

The graph of a chain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 The relation “leads to” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 The relation “communicates with” . . . . . . . . . . . . . . . . . . . . . . . . 16 Period . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 20 24 27 28 29

6 Hitting times and ﬁrst-step analysis 7 Gambler’s ruin 8 Stopping times and the strong Markov property 9 Regenerative structure of a Markov chain 10 Recurrence and transience

10.1 First hitting time decomposition . . . . . . . . . . . . . . . . . . . . . . . . . 31 10.2 Communicating classes and recurrence/transience . . . . . . . . . . . . . . . . 32 11 Positive recurrence 33

i

12 Law (=Theorem) of Large Numbers in Probability Theory 13 Law of Large Numbers for Markov chains 13.1 Functions of excursions

34 34

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

13.2 Average reward . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 14 Construction of a stationary distribution 15 Positive recurrence is a class property 16 Uniqueness of the stationary distribution 17 Structure of a stationary distribution 18 Coupling and stability 39 41 43 44 45

18.1 Deﬁnition of stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 18.2 The fundamental stability theorem for Markov chains . . . . . . . . . . . . . 47 18.3 Coupling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 18.4 Proof of the fundamental stability theorem . . . . . . . . . . . . . . . . . . . 49 18.5 Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 18.6 Periodic chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 19 Limiting theory for Markov chains* 53

19.1 Limiting probabilities for null recurrent states . . . . . . . . . . . . . . . . . . 53 19.2 The general case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 20 Ergodic theorems 21 Finite chains 22 Time reversibility 23 New Markov chains from old ones* 56 57 58 62

23.1 Subsequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 23.2 Watching a Markov chain when it visits a set . . . . . . . . . . . . . . . . . . 62 23.3 Subordination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 23.4 Deterministic function of a Markov chain . . . . . . . . . . . . . . . . . . . . 63 24 Applications 65

ii

. . .1 Asymptotic distribution of the maximum . . . . . . . . . .4 The Wright-Fisher model: an example from biology . . . . . 80 27 The simple symmetric random walk in dimension 1: simple probability calculations 83 27. .5 A storage or queueing application . . . . . . . . . . . . 85 27. . . .1 Paths that start and end at speciﬁc points .1 Branching processes: a population growth application . . . . . . . . . . .4 Some remarkable identities . . .3 Total number of visits to a state . . . . . 68 24. . . . . . . . . . . . . 92 30. . . . . . . . . . . 70 24. . . . . . . 83 27. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 31. . . . . . . . . . . . . . . 79 26. . . . . . 95 31. . . .2 Return probabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1 Distribution after n steps . . . . . . . . . . . . . .3 PageRank (trademark of Google): A World Wide Web application .1 Distribution of hitting time and maximum . . . . . . . . . . . . . . . . . .24. . . . . . . . . . . . . 84 27. . . . . . . . . .3 Some conditional probabilities . . . . . . . . . . . . . . 92 31 Recurrence properties of simple random walks 94 31. . . . . . . . . . . . . . . . .3 The distribution of the ﬁrst return to the origin . . . . . . . . . . . 86 28 The reﬂection principle for a simple random walk in dimension 1 86 28. . . . . . . . . . . . . 65 24. . . . . . . . . . . . . . . . . 84 27. . . 90 30. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2 The ALOHA protocol: a computer communications application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2 The ballot theorem . . . . .2 First return to the origin . . . . 71 24. . . 87 29 Urns and the ballot theorem 30 The asymmetric simple random walk in dimension 1 88 89 30. . . . . . . . . . . . . . 73 25 Random walk 26 The simple symmetric random walk in dimension 1: path counting 76 78 26. . . 96 32 Duality 33 Amount of time that a SSRW is positive* 98 99 iii . . . . . . .1 First hitting time analysis . .2 Paths that start and end at speciﬁc points and do not touch zero at all . . . . . .5 First return to zero . .

34 Simple symmetric random walk in two dimensions 35 Skorokhod embedding* 102 103 iv .

I have a little mathematical appendix. Also. I plan to add a few more sections: – On algorithms and simulation – On criteria for positive recurrence – On doing stuﬀ with matrices – On ﬁnance applications – On a few more delicate computations for simple random walks – Reshape the appendix – Add more examples These are things to come. I introduce stationarity before even talking about state classiﬁcation.Preface These notes were written based on a number of courses I taught over the years in the U. one can argue that random walk calculations should be done before the student is exposed to the Markov chain theory. dozens of good books on the topic. I tried to make everything as rigorous as possible while maintaining each step as accessible as possible. The only new thing here is that I give emphasis to probabilistic methods as soon as possible. I have tried both and prefer the current ordering. of course. “important” formulae are placed inside a coloured box. Greece and the U. The notes should be readable by someone who has taken a course in introductory (non-measuretheoretic) probability. There are.S. There notes are still incomplete.. At the end. The ﬁrst part is about Markov chains and some applications. v . I tried to put all terms in blue small capitals whenever they are ﬁrst encountered.. The second one is speciﬁcally for simple random walks. Also. They form the core material for an undergraduate course on Markov chains in discrete time..K. A few starred sections should be considered “advanced” and can be omitted at ﬁrst reading. Of course.

It is not hard to see that. An example from Physics is provided by Newton’s laws of motion.e. 1. x(t)). the integers). has the property that. yn ). we let our “state” be a pair of integers (xn . where r = rem(a. Its velocity is x = dx . X1 . 2. F = f (x. . initialised by x0 = max(a. . b). . . i. we end up with two numbers. Xn+2 . b). for any t.e. Suppose that a particle of mass m is moving under the action of a force F in a straight line. yn ) into Xn+1 = (xn+1 .PART I: MARKOV CHAINS 1 Introduction A Markov chain is a mathematical model of a random phenomenon evolving with time in a way that the past aﬀects the future only through the present. and the other the greatest common divisor. b) between two positive integers a and b. X0 . We are herein constrained by the Syllabus to discuss only discrete-time Markov chains. for any n. and evolving as xn+1 = yn yn+1 = rem(xn . x). or. Recall (from elementary school maths). For instance. a phenomenon which evolves with time in a way that only the present aﬀects the future is called a dynamical system. The state of the particle at time t is described by the pair ˙ X(t) = (x(t). b) is the remainder of the division of a by b. where xn ≥ yn . yn+1 ). b) = gcd(r. x = x(t) denotes the position of the particle at time t. s ≥ t) ˙ 1 . the particle is suspended at the end of a spring and force is produced by extending the spring and letting the system oscillate (Hooke’s law). The “time” can be discrete (i. one of which is 0. . In the module following this one you will study continuous-time chains. Formally. In other words. . b). The sequence of pairs thus produced.e. Newton’s second law says that the acceleration is proportional to the force: m¨ = F. the real numbers). X2 . yn ). In Mathematics. that gcd(a. n = 0. the future trajectory (X(s). more generally. . a totally ordered set. continuous (i. x Here. An example from Arithmetic of a dynamical system in discrete time is the one which ﬁnds the greatest common divisor gcd(a. . y0 = min(a. depend only on the pair Xn . . the future pairs Xn+1 . there is a function F that takes the pair Xn = (xn . and its ˙ dt 2x acceleration is x = d 2 . Assume that the force F depends only on the particle’s position ¨ dt and velocity. By repeating the procedure.

. A scientist’s job is to record the position of the mouse every minute. We will be dealing with random variables. Repeat with (Xn . 1 2 Statistical observations lead the scientist to believe that the mouse moves from cell 1 to cell 2 with probability α = 0. An example of a random algorithm is the Monte Carlo method for the b approximate computation of the integral a f (x)dx of a complicated function f : suppose that f is positive and bounded below B. we will see what kind of questions we can ask and what kind of answers we can hope for. and use those mathematical models which are called Markov chains. In other words. Other examples of dynamical systems are the algorithms run. The generalisation takes us into the realm of Randomness. via the use of the tools of Probability. Count the number of 1’s and divide by N . We will also see why they are useful and discuss how they are applied. 2. at time n + 1 it is either still in 1 or has moved to 2.000 rows and 10. When the mouse is in cell 1 at time n (minutes) then. Indeed. instead of deterministic objects. 0 ≤ Y0 ≤ B. but some are stochastic. 1 and 2. We can summarise this information by the transition diagram: 1 Stochastic means random. by the software in your computer. respectively.can be completely speciﬁed by the current state X(t). Y0 ) of random numbers. . i. for steps n = 1. Try this on the computer! 2 . say. In addition. Choose a pair (X0 . a ≤ X0 ≤ b. it moves from 2 to 1 with probability β = 0.000 columns or computing the integral of a complicated function of a large number of variables.1 Examples of Markov chains A mouse in a cage A mouse is in a cage with two cells.1 They are stochastic either because the problem they are solving is stochastic or because the problem is deterministic but “very large” such as ﬁnding the determinant of a matrix with 10. past states are not needed. Each time count 1 if Yn < f (Xn ) and 0 otherwise. regardless of where it was at earlier times. Yn ). it does so. We will develop a theory that tells us how to describe.05. .. A mouse lives in the cage. 2 2. Perform this a large number N of times. containing fresh and stinky cheese. Similarly. Some of these algorithms are deterministic. The resulting ratio is an approximation b of a f (x)dx. analyse. Markov chains generalise this concept of dependence of the future only on the present. uniformly. an eﬀective way for dealing with large problems is via the use of randomness.e.99.

intuitive.y = P (Dn − Sn = y − x) = = z=0 ∞ z=0 ∞ P (Sn = z.05 0.01 Questions of interest: 1. Here. S2 . 1. S1 .2 Bank account The amount of money in Mr Baxter’s bank account evolves. Sn . Dn = z + y − x) P (Sn = z)P (Dn = z + y − x) FS (z)FD (z + y − x). D0 . 2. . D1 .y := P (Xn+1 = y|Xn = x). S0 . as follows: Xn+1 = max{Xn + Dn − Sn . to move from cell 1 to cell 2? 2. on the average. . . Assume also that X0 . How long does it take for the mouse. . are random variables with distributions FD . Dn − Sn = y − x) P (Sn = z. FS . The information described above by the evolution of the account together with the distributions for Dn and Sn leads to the (one-step) transition probability which is deﬁned by: px.95 0. (This is the mean of the binomial distribution with parameter α. How often is the mouse in room 1 ? Question 1 has an easy. . Assume that Dn . So if he wishes to spend more than what is available. We easily see that if y > 0 then ∞ x. = z=0 ∞ = z=0 3 .) 2. 0}. and Sn is the money he wants to spend.99 0. he can’t. are independent random variables. . respectively. the ﬁrst time that the mouse will move from 1 to 2 will have mean 1/α = 1/0. px. from month to month. answer: Since the mouse really tosses a coin to decide whether to move or stay in a cell. D2 .05 ≈ 20 minutes. Dn is the money he deposits. Xn is the money (in pounds) at the beginning of the month.α 1−α 1 β 2 1−β Another way to summarise the information is by the 2×2 transition probability matrix P= 1−α α β 1−β = 0. y = 0.

2. How long will it take (if ever) for the bank account to empty? We shall develop methods for answering these and other questions. Of course.1 p0.0 p0.0 p2. can be computed from FS and FD . The transition probability matrix is y=1 p0. If the pub is located in position 0 and his home in position 100. in principle. 1/2 1/2 1/2 1/2 1/2 −2 1/2 −1 1/2 0 1/2 1 1/2 2 1/2 3 Questions: 1.1 p1. So he may move forwards with equal probability that he moves backwards. on the average.3 Simple random walk (drunkard’s walk) Consider a completely drunk person who walks along a street.1 p2. to go from the pub to home? You might object to the assumption that the person is totally drunk and has no sense of direction.2 p1.y .2 P= p2. being drunk.0 p1. We may want to model such a case by assigning probability p for the drunkard to go one step to the right (and probability 1 − p for him to go one step to the left).4 Simple random walk (drunkard’s walk) in a city Suppose that the drunkard is allowed to move in a city whose streets are laid down in a square pattern: 4 . Are there any places which he will visit inﬁnitely many times? 4.2 ··· ··· ··· ··· · · · · · · ··· and is an inﬁnite matrix. Questions of interest: 1. how often will he be visiting 0? 2. how long will it take him. if y = 0. 2. he has no sense of direction.something that. Will Mr Baxter’s account grow. how fast? 2.0 = 1 − ∞ px. Are there any places which he will never visit? 3. we have px. If he starts from position 0. and if so.

Question: What is the distribution of the lifetime (in months) of a currently healthy individual? Clearly.S = 0.H − pS. Note that the diagram omits pH.3 H 0.1 D Thus. and pS. It proposes the following model summarising the state of health of an individual on a monthly basis: 0. so we assign probability 1/4 for each move. there is clearly a higher chance that our man will be lost. there is probability pH.D .Salvador Dali (1922) The Drunkard Suppose that the drunkard moves from corner to corner. north or south.D is omitted.01 S 0. Also.D = 1. the reason being that the company does not believe that its clients are subject to resurrection. and pS. but they will become more precise in the sequel. therefore. observe. etc. We may again want to ask similar questions. Clearly. and let’s say he’s completely drunk. These things are.S = 1 − pS. But. 5 .5 Actuarial chains A life insurance company wants to ﬁnd out how much money to charge its clients.S because pH. west.8 0. 2.3 that the person will become sick. given that he is currently healthy. that since there are more degrees of freedom now. the company must have an idea on how long the clients will live. pD. mathematically imprecise.D .H = 1 − pH. So he can move east. pD. the answer to this is crucial for the policy determination. for now.H .S − pH.

if the claim holds. .} or N = {1. it is customary to call (Xn ) a Markov chain. So we will omit referring to T explicitly. 3. . n ∈ T} is Markov instead of saying that it has the Markov property. . We say that this sequence has the Markov property if. Since S is countable. . . n + m] Xk = ik | Xn = in ). in+1 ∈ S. . . n ∈ T}. (Xn . the Markov property. . n + m] Xk = ik | Xn = in . 3. almost exclusively (unless otherwise said explicitly).3 3. . or negative). m > n. . X2 = i2 . The elements of S are frequently called states. X1 = i1 . Xn = in ) = P (X0 = i0 )P (X1 = i1 | X0 = i0 )P (X2 = i2 | X1 = i1 ) · · · P (Xn = in | Xn−1 = in−1 ). 3. Let us now write the Markov property in an equivalent way. . and so future is independent of past given the present. . 2. P (Xn+1 = in+1 | Xn = in . (1) 6 . . . . m ∈ T) is independent of the past process (Xm . for any n ∈ T. T = Z+ = {0. which precisely means that (Xn+1 . that the Xn take values in some countable set S. . Lemma 1. X0 = i0 ) = P (∀ k ∈ [n + 1. . Xn−1 ) given Xn . . m and any choice of states i0 . . . . Xn+m ) and (X0 . . . Sometimes we say that {Xn . unless needed. Xn−1 ) are independent conditionally on Xn . . called the state space.} or the set Z of all integers (positive. m < n. .2 Transition probability and initial distribution We therefore see that joint probabilities can be expressed as P (X0 = i0 . in+m . 1. . the future process (Xm . . On the other hand. . m ∈ T). for any n. zero. . n ∈ Z+ ) is Markov if and only if. We focus almost exclusively to the ﬁrst choice. for all n ∈ Z+ and all i0 . viz. . . . where T is a subset of the integers. . We will also assume. 2. . . . conditionally on Xn . Usually. we can see that by applying it several times we obtain P (∀ k ∈ [n + 1. This is true for any n and m. Proof. .1 The Markov property Deﬁnition of the Markov property Consider a (ﬁnite or inﬁnite) sequence of random variables {Xn . If (Xn ) is Markov then the claim holds by the conditional independence of Xn+1 and (X0 . X0 = i0 ) = P (Xn+1 = in+1 | Xn = in ). .

j∈S . n). means that the transition probabilities pi. pi.j (n. by deﬁnition. by a result in Probability. If |S| = d then we have d × d matrices. pi. viz. But we will circumvent them by using Probability only. we can write (2) as P(m. a square matrix whose rows and columns are indexed by S (and this. The function pi. j ∈ S. m + 2) · · · P(n − 1. . but if S is an inﬁnite set. something that can be called semigroup property or Chapman-Kolmogorov equation.j (m. 3. n). n) := [pi. we deﬁne.j (m. n). 2) · · · pin−1 . .3 Time-homogeneous chains We will specialise to the case of time-homogeneous chains.j (n.in (n − 1. Xn = in ) = µ(i0 ) pi0 . n) = i1 ∈S ··· in−1 ∈S pi.j (n. n)]i. pi. remembering a bit of Algebra.j (m. for each m ≤ n the matrix P(m. (2) which is reminiscent of matrix multiplication. . n). n) := P (Xn = j | Xm = i) is the transition probability from state i at time m to state j time n ≥ m.i (n − 1. m + 1) · · · pin−1 . 1) pi1 . n ≥ 0. n) = 1(i = j) and. n + 1) := P (Xn+1 = j | Xn = i) . which.i1 (0. n + 1) do not depend on n . by (1). ℓ) P(ℓ.j (m. n) = P(m. n) if m ≤ ℓ ≤ n. X1 = i1 . requires some ordering on S) and whose entries are pi. m + 1)P(m + 1. 7 .i2 (1..3 In this new notation. or as P(m. pi. i∈S i. X2 = i2 . then the matrices are inﬁnitely large. . will specify all joint distributions2 : P (X0 = i0 . n + 1) is called transition probability from state i at time n to state j at time n + 1. n) = P(m.which means that the two functions µ(i) := P (X0 = i) . The function µ(i) is called initial distribution. More generally.j (n. Of course. of course.i1 (m. all probabilities of events associated with the Markov chain This causes some analytical diﬃculties. Therefore. 2 3 And.

Notational conventions Unit mass. for all integers a. while P = [pi. unless otherwise speciﬁed. if j = i otherwise. y y 8 . to picking state i with probability 1. and call pi. n−m times for all m ≤ n. arranged in a row vector. b ≥ 0. In other words. as follows from (1) and the deﬁnition of P. if Y is a real random variable. simply. and the semigroup property is now even more obvious because Pa+b = Pa Pb . δi (j) := 1. of course. (3) We call δi the unit mass at i. It corresponds. The distribution that assigns value 1 to a state i and 0 to all other states is denoted by δi . 0. For example. Similarly. Then P(m. n) = P × P × · · · · · · × P = Pn−m .and therefore we can simply write pi. Thus Pi is the law of the chain when the initial distribution is δi . Then we have µ n = µ 0 Pn .j∈S is called the (one-step) transition probability matrix or. From now on.j . Notice that any probability distribution on S can be written as a linear combination of unit masses. let µn := [µn (i) = P (Xn = i)]i∈S be the distribution of Xn .j ]i. Starting from a speciﬁc state i.j the (one-step) transition probability from state i to state j. A useful convention used for time-homogeneous chains is to write Pi (A) := P (A | X0 = i).j (n. n + 1) =: pi. if µ assigns probability p to state i1 and 1 − p to state i2 . For example. when we say “Markov chain” we will mean “timehomogeneous Markov chain”. transition matrix. then µ = pδi1 + (1 − p)δi2 . we write Ei Y := E(Y | X0 = i) = yP (Y = y | X0 = i) = yPi (Y = y). The matrix notation is useful.

Then a hole is opened in the separator and we watch what happens as the molecules diﬀuse between the two rooms. let us look at the following example: Example: Motion on a polygon.i−1 = P (Xn+1 N −i N i = i − 1 | Xn = i) = . . we indeed expect that we have N/2 molecules in each room. then. say N = 1025 ) in a metallic chamber with a separator in the middle. It should be clear that if initially place the bug at random on one of the vertices. But this will not work. the law of a stationary process does not depend on the origin of time. Hence the uniform probability distribution is stationary. selecting a vertex with probability 1/7. Let Xn be the number of molecules in room 1 at time n. . then. and vice versa. The gas is placed partly in room 1 and partly in room 2.i+1 = P (Xn+1 = i + 1 | Xn = i) = pi. at time n + 1 it moves to one of the adjacent vertices with equal probability. 1. The question then is: can we distribute the initial number of molecules in a way that the origin of time plays no role? One guess might be to take X0 = N/2. Clearly. at any point of time n. m + 1. . because it is impossible for the number of molecules to remain constant at all times. It is clear. . for a while we will be able to tell how long ago the process started. Consider a canonical heptagon and let a bug move on its vertices. Then pi. Imagine there are N molecules of gas (where N is a large number. a sequence of i. random variables provides an example of a stationary process. To provide motivation and intuition.e.) is stationary if it has the same law as (Xn . that if room 1 contains more gas than room 2 then there is a tendency to observe net motion from 1 to 2. We now investigate when a Markov chain is stationary. . Example: the Ehrenfest chain. .4 Stationarity We say that the process (Xn .). the distribution of the bug will still be the same. as it takes some time before the molecules diﬀuse from room to room.i. On the average. Another guess then is: Place 9 . dividing it into two identical rooms. We can model this by saying that the chance that a molecule move from room 1 to room 2 is proportional to the number of molecules in room 1. This is a model of gas distributed between two identical communicating chambers.d. n = m. for any m ∈ Z+ . n ≥ 0) will not be stationary: indeed. If at time n the bug is on a vertex. i. n = 0. N If we start the chain with X0 = N (all molecules in room 1) then the process (Xn . In other words.

P (X1 = i) = j P (X0 = j)pj. (a binomial distribution). If we can do so. Xm+n = in ). in ∈ S. the distribution µn at time n is related to the distribution at time 0 by µn = µ0 Pn . Xm+1 = i1 . . we see that the Markov chain is stationary if and only if the distribution µ0 is chosen so that µ0 = µ0 P. X2 = i2 .i + P (X0 = i + 1)pi+1. we are led to consider: The stationarity problem: Given [pij ].i = π(i − 1)pi−1.) The example above is typical: We found (by guessing!) some distribution π with the property that if X0 has distribution π then every other Xn also has distribution π. X1 = i1 .i = N N −i+1 N i+1 2−N + = · · · = π(i).i = P (X0 = i − 1)pi−1. We can now verify that if P (X0 = i) = π(i) then P (X1 = i) = π(i) and. . use (1) to see that P (X0 = i0 . . then the number of molecules in room 1 will be i with probability π(i) = N −N 2 . then we have created a stationary Markov chain. For the if part. Lemma 2. .each molecule at random either in room 1 or in room 2. . Changing notation.i + π(i + 1)pi+1. Indeed. Xn = in ) = P (Xm = i0 . . Since. P (Xn = i) = π(i) for all n. . a Markov chain is stationary if and only if the distribution of Xn does not change with n. In other words. A Markov chain (Xn . . ﬁnd those distributions π such that πP = π. . Xm+2 = i2 . which you should go through by yourselves. If we do so. i 0 ≤ i ≤ N. . by (3). 10 . The equations (4) are called balance equations. for all m ≥ 0 and i0 . . But we need to prove this. . (4) Such a π is called . 2−N i−1 i+1 N N (The dots signify some missing algebra. hence. Proof. stationary or invariant distribution. n ∈ Z+ ) is stationary if and only if it is time-homogeneous and Xn has the same distribution as Xℓ for all n and ℓ. The only if part is obvious.

. 2 . write the equations (4): π(i) = π(j)pj. Consider the following process: At time n (measured in minutes) let Xn be the time till the arrival of the next bus. Consider a deck of d = 52 cards and shuﬄe them in the following (stupid) way: pick two cards at random and swap their positions. . xd ). random variables. which.Eigenvector interpretation: The equation π = πP says that π is a left eigenvector of the matrix P with eigenvalue 1. This means that the solution to the balance equations is identically zero. and so each π(i) will be zero.}. Then (Xn . . d}. So we must make an: ASSUMPTION: i≥1 j≥i p(j) < ∞. d! x ∈ Sd . It is easy to see that the stationary distribution is uniform: π(x) = 1 . which in turn means that the normalisation condition cannot be satisﬁed. the next one will arrive in i minutes with probability p(i). i ≥ 0. Thus. where 1 is a column whose entries are all 1. p1.i = p(i). We can describe this by a Markov chain which takes values in the set Sd of d! permutations of {1.. i = 1. Who tells us that the sum inside the parentheses is ﬁnite? If the sum is inﬁnite. The state space is S = N = {1. i∈S The matrix P has the eigenvalue 1 always because j∈S pi. . Example: waiting for a bus.i = 1. then we run into trouble: indeed. π must be a probability: π(i) = 1. Keep doing it continuously. 2. Example: card shuﬄing. To ﬁnd the stationary distribution. i ≥ 1. . and this means that 1 is a (right) eigenvector of P corresponding to the eigenvalue 1. .i = π(0)p(i) + π(i + 1). where all the xi are distinct. can be written as P1 = 1. we use the normalisation condition π(1) = i≥1 j≥i = 1. . The times between successive buses are i. each x ∈ Sd is of the form x = (x1 . in matrix notation. But here. π(1) will be zero.j = 1. . if i > 0 i ∈ N. . . After bus arrives at a bus stop. . we must be careful.d. 11 . . which gives p(j) . In addition to that. . j≥1 These can easily be solved: π(i) = π(1) j≥i p(j). i≥1 π(i) −1 To compute π(1). n ≥ 0) is a Markov chain with transition probabilities pi+1.i.

. . 2. . with P (ξr (n) = 1) = P (ξr (n) = 0) = 1/2 then. . 1}. i≥1 j≥i j≥1 i≥1 j≥1 and the latter sum is the expected time between two successive bus arrivals. ξ2 (n). . The Markov chain is obtained by applying the mapping ϕ again and again. what we have really shown is that this assumption is a necessary and suﬃcient condition for the existence and uniqueness of a stationary distribution. . 1 − ξ3 . applied to the interval [0. . So. 1. So let us do so. 2. . We can show that if we start with the ξr (0). P (ξ1 (n + 1) = ε1 . ξ3 . . ξr (n + 1) = εr ) = P (ξ1 (n) = 0. . . . Hence choosing the initial state at random in this manner results in a stationary Markov chain.. for any n = 0. 1] as a ﬂexible bread dough arranged in a long rod) kneads the bread for it stretches it till it doubles in length and then folds it back in the middle. .). are i. εr ∈ {0. . Example: bread kneading. if ξ1 = 1 then shift to the left and ﬂip all bits to obtain (1 − ξ2 . . . . As for the name of the chain (bread kneading) it can be justiﬁed as follows: to each ξ there corresponds a number x between 0 and 1 because ξ can be seen as the binary representation of the real number x. If ξ1 = 1 then x ≥ 1/2. 2. r = 1. . .i. ξ(n + 1) = ϕ(ξ(n)).. . . ξ2 (n). 1] (think of [0. . the same is true at n + 1.d.Let us work out the sum to see if this assumption can be expressed in more “physical” terms: p(j) = 1(j ≥ i)p(j) = jp(j).) is the state at time n. But to even think about it will give you some strength. . i. . .). Note that there is nothing random in the transition mechanism of this chain! The only way to instill randomness is by choosing the initial state at random. 1} for all i. ξ2 (n) = ε1 . P (ξr (n) = 1) = P (ξr (n) = 0) = 1/2 The proof is by induction: if the ξ1 (n). We have thus deﬁned a mapping ϕ from binary vectors into binary vectors. ξr+1 (n) = 1 − εr ).d..). If ξ1 = 0 then x < 1/2 and our shifting rule maps x into 2x. You can check that . ξr+1 (n) = εr ) + P (ξ1 (n) = 1. . . ξi ∈ {0.* (This is slightly beyond the scope of these lectures and can be omitted. . any r = 1. with P (ξr (0) = 1) = P (ξr (0) = 0) = 1/2 then the same will be true for all n. 2(1 − x)). (5) 12 .) Consider an inﬁnite binary vector ξ = (ξ1 . . Here ξ(n) = (ξ1 (n). . So it goes beyond our theory. and any ε1 . . ξ2 . and transform it according to the following rule: if ξ1 = 0 then shift to the left to obtain (ξ2 . . . . . . and the rule maps x into 2(1 − x).e. But the function f (x) = min(2x. .i. ξ2 (n) = 1 − ε1 . So our assumption really means: ASSUMPTION ⇐⇒ the expected time between successive bus arrivals is ﬁnite. i. Remarks: This is not a Markov chain with countably many states. . . . from (5).

amongst them. because the ﬁrst d equations are linearly dependent: the sum of both sides equals 1. π(j) = 1. This is OK.j .Note on terminology: steady-state. then the above are d + 1 equations. Proof. j∈S i ∈ S. If the latter condition holds.i = j∈A π(j)pj. Ac ) = F (Ac . The convenience is often suggested by the topological structure of the chain. When a Markov chain is stationary we refer to it as being in 4. π(i) = j∈S (balance equations) (normalisation condition). Ac ) is the total current from A to Ac . i∈A j∈Ac Think of π(i)pi.1 Finding a stationary distribution As explained above. Conversely. therefore one of them is always redundant.i . a stationary distribution π satisﬁes π = πP π1 = 1 In other words.i Multiply π(i) by j∈S pi. Fix a set A ⊂ S and write π(i) = j∈S π(j)pj. We have: Proposition 1. suppose that the balance equations hold. A). But we only have d unknowns. π is a stationary distribution if and only if π1 = 1 and F (A. Writing the equations in this form is not always the best thing to do.j = j∈A π(j)pj. Instead of eliminating equations.i 13 . Ac ) := π(i)pi. as we will see in examples.i + j∈Ac π(j)pj. If the state space has d states.j as a “current” ﬂowing from i to j.j (which equals 1): π(i) j∈S pi. a set of d linearly independent equations which can be more easily solved. Let us introduce the notion of “probability flow” from a set A of states to its complement under a distribution π: F (A. then choose A = {i} and see that you obtain the balance equations. So F (A. for all A ⊂ S. we introduce more.i + j∈Ac π(j)pj. π(j)pj. expecting to be able to choose.

p22 = 1 − β.i This is true for all i ∈ S. The constant c is found from π(1) + π(2) = 1.) Assume 0 < α. If we take α = 0. p21 = β. but we shall not do this here. . 2. A ) A c c F( A . we see that the ﬁrst term on the left cancels with the ﬁrst on the right. p11 = 1 − α. p q Consider the following chain.j = j∈A π(j)pj.05 ≈ 0. and 4. (Consequently. The balance equations are: π(i) = pπ(i − 1) + qπ(i + 1).99 ≈ 0. here is an example: Example: Consider the 2-state Markov chain with p12 = α. We have π(1)α = π(2)β. 1. which immediately tells us that π(1) is proportional to β and π(2) proportional to α: π(1) = cβ.8% of the time in room 2. A c F(A .952. .99 (as in the mouse example). .j + j∈A j∈Ac π(i)pi.04 room 1. The extra degree of freedom provided by the last result. α+β π(2) = α . That is. the mouse spends only 95. π(1) = β .048. Example: a walk with a barrier.. Summing over i ∈ A. while the remaining two terms give the equality we seek.2% of the time in 1. . gives us ﬂexibility. we ﬁnd π(1) = 0. β = 0.. A) Schematic representation of the ﬂow balance relations. Thus. 14 i = 1. in steady state. 3.i + j∈Ac π(j)pj.05. α+β π(2) = cα.04 π(2) = 0.Now split the left sum also: π(i)pi. One can ask how to choose a minimal complete set of linearly independent equations. Instead. where p + q = 1: p p p 0 q 1 q 2 q 3 q 4 . β < 1.

. . and E a set of directed edges. i leads to j (written as i Notice that this relation is reflexive): ∀i ∈ S Furthermore: Theorem 1. Since 0 < Pi (∃n ≥ 0 Xn = j) ≤ 15 Pi (Xn = j). • Suppose that i j.i1 pi1 . Does this provide a stationary distribution? Yes. Proof. a pair (S. (If p ≥ 1/2 there is no stationary distribution. i − 1}. .2 The relation “leads to” We say that a state i leads to state j if. . For i ≥ 1.i2 · · · pin−1 .j > 0. 1. j) ∈ E ⇐⇒ pi.But we can make them simpler by choosing A = {0. This is i=0 possible if and only if p < q. E) where S is the state space and E ⊂ S × S deﬁned by (i.e.) 5 5. (ii) pi. the chain will visit j at some ﬁnite time. S is a set of vertices. This is often cast in terms of a digraph (directed graph). we can solve them immediately. If i ≥ 1 we have F (A. . i.j > 0 for some n ∈ N and some states i1 . .e. If i = j there is nothing to prove. In fact. 5. π(i) = (p/q)i π(0). In graph-theoretic terminology. In other words. i. The following are equivalent: (i) i j i i j) ⇐⇒ Pi (∃n ≥ 0 Xn = j) > 0. in−1 . These are much simpler than the previous ones. . So assume i = j. (iii) Pi (Xn = j) > 0 for some n ≥ 0. provided that ∞ π(i) = 1. We can frequently (but not always) draw the graph (as we did in examples) and visualise the set E as the set of all edges with arrows. starting from i. p < 1/2. n≥0 .1 Topological structure The graph of a chain This refers to the structure of the process that does not depend on the exact values of the transition probabilities but only on which of them are positive. Ac ) = π(i − 1)p = π(i)q = F (Ac . . A).

{6}. Equivalence relation means that it is symmetric. by deﬁnition. the set of all states that communicate with i: [i] := {j ∈ S : j So. {4. The ﬁrst two classes diﬀer from the last two in character. meaning that (ii) holds. By assumption. Just as any equivalence relation.i1 pi1 . Proof. It is reﬂexive and transitive because is so. the sum contains a nonzero term. Look at the last display again.j .the sum on the right must have a nonzero term. is an equivalence relation. j) ⇐⇒ i j and j i. 3}. reﬂexive and transitive. Hence Pi (Xn = j) > 0.. Corollary 1. (6) j. it partitions S into equivalence classes known as communicating classes.3 The relation “communicates with” Deﬁne next the relation i communicates with j (written as i Obviously. (iii) holds. [i] = [j] if and only if i identical or completely disjoint. {2. We have Pi (Xn = j) = i1 . • Suppose that (ii) holds.. i}. However. • Suppose now that (iii) holds.. 5}. Since Pi (Xn = j) > 0. In other words. The class {4. the class {2. by deﬁnition. (i) holds. The relation j ⇐⇒ j i). In addition.. 5. We just observed it is symmetric. and so (iii) holds.in−1 pi. 5} is closed: if the chain goes in there then it never moves out of it. The relation is transitive: If i j and j k then i k. because Pi (Xn = j) ≤ Pi (∃m ≥ 0 Xm = j). Two communicating classes are either 1 2 3 4 5 6 In the example of the ﬁgure. this is symmetric (i Corollary 2. where the sum extends over all possible choices of states in S. we see that there are four communicating classes: {1}. we have that the sum is also positive and so one of its terms is positive. 16 .i2 · · · pin−1 . The communicating class corresponding to the state i is. 3} is not closed.

Note: An inessential state i is such that there exists a state j such that i j but j i. If we remove the non-closed communicating classes then we obtain a collection of disjoint closed communicating classes with no arrows between them. In other words still. We then can ﬁnd states i1 . and so on. In graph terms.i = 1 then we say that i is an absorbing state. Why? A state i is essential if it belongs to a closed communicating class. more precisely. 17 . Suppose ﬁrst that C is a closed communicating class. . Lemma 3. and so i2 ∈ C.i1 > 0. pi1 . . parts. in−1 such that pi.j = 1.i1 pi1 . we can reach any other state.More generally. Since i ∈ C. the state space S is a closed communicating class. C2 . starting from any state. pi. the state is inessential. j∈C for all i ∈ C.j > 0. If a state i is such that pi. If all states communicate with all others. Proof. To put it otherwise. the single-element set {i} is a closed communicating class. Otherwise. and C is closed. The communicating classes C1 . its transition matrix P) is called irreducible. Closed communicating classes are particularly important because they decompose the chain into smaller. C5 in the ﬁgure above are closed communicating classes. . Note: If we consider a Markov chain with transition matrix P and ﬁx n ∈ N then the Markov chain with transition matrix Pn has exactly the same closed communicating classes. The converse is left as an exercise. we have i1 ∈ C. A general Markov chain can have an inﬁnite number of communicating classes. Note that there can be no arrows between closed communicating classes. The classes C4 . we conclude that j ∈ C. There can be arrows between two nonclosed communicating classes but they are always in the same direction. C3 are not closed. C1 C2 C3 C4 C5 The general structure of a Markov chain is indicated in this ﬁgure. Similarly. more manageable.i2 > 0.i2 · · · pin−1 . A communicating class C is closed if and only if the following holds: If i ∈ C and if i j then j ∈ C. we say that a set of states C ⊂ S is closed if pi. Thus. then the chain (or. Suppose that i j. .

5. n ≥ 0. ℓ ∈ N. Consider two distinct essential states i. more generally. If i j there is α ∈ N with pij > 0 and if 18 . The period of an inessential state is not deﬁned. X2 ∈ C1 . A state i is called aperiodic if d(i) = 1. Dj = {n ∈ (n) (α) N : pjj > 0}.e. Let us denote by pij the entries of Pn . i. . X3 ∈ C2 . Since Pm+n = Pm Pn . we have So pii (2n) (n) pij ≥ pii (n) 2 (m+n) ≥ pik pkj . (n) Proof. 3} at some point of time then. . Notice however. The chain alternates between the two sets. Let Di = {n ∈ N : pii > 0}. So. and. all states form a single closed communicating class. X4 ∈ C1 . If i j then d(i) = d(j). . i. d(i) = gcd Di . pii (ℓn) ≥ pii (n) ℓ . (Can you ﬁnd an example of a chain with period 3?) We will now give the deﬁnition of the period of a state of an arbitrary chain. (m) (n) for all m.4 Period Consider the following chain: 1 2 4 3 The chain is irreducible. So if the chain starts with X0 ∈ C1 then we know that X1 ∈ C2 . d(j) = gcd Dj . We say that the chain has period 2. it will move to the set C2 = {2. j ∈ S. Theorem 2 (the period is a class property). 4} at the next step and vice versa. j. whether it is possible to return to i in m steps can be decided by one of the integer divisors of m. that we can further simplify the behaviour of the chain by noticing that if the chain is in the set C1 = {1. Therefore if pii > 0 then pii (n) (ℓn) > 0 for all ℓ ∈ N. Another way to say this is: (n) (m) If the integer n divides m (denote this by: n | m) and if pii > 0 then pii > 0. The period of an essential state is deﬁned as the greatest common divisor of all natural numbers n with such that it is possible to return to i in n steps: d(i) := gcd{n ∈ N : Pi (Xn = i) > 0} . .

we obtain d(i) ≤ d(j) as well. Divide n by n1 to obtain n = qn1 + r. Then pii > 0 and so pjj ≥ pij pii pji > 0. d(j) | d(i)). Cd−1 such that the chain moves cyclically between them. n1 ∈ Di such that n2 − n1 = 1. Therefore d(j) | α + β. Arguing symmetrically. the chain is called aperiodic. Therefore d(j) | α + β + n for all n ∈ Di . . C1 . showing that α + β ∈ Dj . C1 . Then we can uniquely partition the state space S into d disjoint subsets C0 . Theorem 4. Pick n2 . chains where. Because n is suﬃciently large. So pjj ≥ pij pji > 0. i. are irreducible chains. Formally. Theorem 3. . Therefore n = qn1 + r(n2 − n1 ) = (q − r)n1 + rn2 . showing that α + n + β ∈ Dj . j∈Cr+1 i ∈ Cr . Since d(i) is the greatest common divisor we have d(i) ≥ d(j) (in fact. . Since n1 . Proof. we have n ∈ Di as well. An irreducible chain has period d if one (and hence all) of the states have period d. (n) (n) (α+n+β) (α) (n) (β) (β) (α+β) (α) (β) This ﬁgure shows the internal structure of a closed communicating class with period d = 4. r = 0. If i is an aperiodic state then there exists n0 such that pii > 0 for all n ≥ n0 . d − 1. In particular. . 1. if d = 1. If a number divides two other numbers then it also divides their diﬀerence. .j i there is β ∈ N with pji > 0. (Here Cd := C0 . From the last two displays we get d(j) | n for all n ∈ Di .e. Corollary 3. Of particular interest. . j ∈ S. More generally. So d(j) is a divisor of all elements of Di . . Now let n ∈ Di . as deﬁned earlier.) 19 . . Consider an irreducible chain with period d. Let n be suﬃciently large. An irreducible chain with ﬁnitely many states is aperiodic if and only if there (n) exists an n such that pij > 0 for all i. all states communicate with one another. . So d(i) = d(j). where the remainder r ≤ n1 − 1. n2 ∈ Di . Cd−1 such that pij = 1. . q − r > 0. if an irreducible chain has period d then we can decompose the state space into d sets C0 . . this is the content of the following theorem. .

after it takes one step from its current position. j belongs to the next class. necessarily. It is easy to see that if we continue and pick a further state id with pid−1 .id > 0. Take. necessarily. Let C1 be the equivalence class of i1 . Notice that this is an equivalence relation. As an example. . . Suppose pij > 0 but j ∈ C1 but. with corresponding classes C0 .. If X0 ∈ A the two times coincide. C1 . and by the assumptions that d d j. Assume d > 1. We use a method that is based upon considering what the Markov chain does at time 1. i.Proof. We now show that if i belongs to one of these classes and if pij > 0 then. We avoid excessive notation by using the same letter for both. then.. . i2 i0 i0 in a number of steps which is an integer multiple of d − 1 (why?). We are interested in deriving formulae for the probability that this time is ﬁnite as well as the expectation of this time. Pick a state i0 and let C0 be its equivalence class (i. for example. The reader is warned to be alert as to which of the two variables we are considering at each time. Then pick i1 such that pi0 i1 > 0.. Cd−1 .e. Hence it partitions the state space S into d disjoint equivalence classes. Continue in this manner and deﬁne states i0 . i1 . Such a path is possible because of the choice of the i0 . id−1 d 6 Hitting times and ﬁrst-step analysis Consider a Markov chain with transition probability matrix P.. which contradicts the deﬁnition of d. id ∈ C0 . The existence of such a path implies that it is possible to go from i0 to i. consider the chain 1 1/3 1 0 4 1/2 1 1/6 2 We shall later consider the time inf{n ≥ 1 : Xn ∈ A} which diﬀers from TA simply by considering n ≥ 1 instead of n ≥ 0.e the set of states j such that i j). Deﬁne the relation i d j ⇐⇒ pij (nd) > 0 for some n ≥ 0.. j ∈ C2 . We show that these classes satisfy what we need.. say. . that is why we call it “first-step analysis”. . We deﬁne the hitting time4 of a set of states A by TA := inf{n ≥ 0 : Xn ∈ A}.. Consider the path i0 → i → j → i2 → i3 → · · · → id → i0 . i1 . i ∈ C0 . 20 .

We ﬁrst have (where TA Pi (TA < ∞) = j∈S ′ Pi (1 + TA < ∞|X1 = j)Pi (X1 = j) = j∈S ′ Pi (TA < ∞|X1 = j)pij ′ But observe that the random variable TA is a function of the future after time 1 (i. because the chain may end up in state 2. Furthermore. j∈S as needed. For the second part of the proof. we have Pi (TA < ∞) = ϕ(j)pij . Hence: ′ ′ P (TA < ∞|X1 = j. X2 . The function ϕ(i) satisﬁes ϕ(i) = 1. the event TA is independent of X0 . i ∈ A.e.It is clear that P1 (T0 < ∞) < 1. ﬁx a set of states A. . But the Markov chain is homogeneous. . which implies that ′ P (TA < ∞|X1 = j) = P (TA < ∞|X0 = j) = Pj (TA < ∞) = ϕ(j). we have ϕ′ (i) = j∈A pij + j∈A = j∈A pij + j∈A pij pij pjk + k∈A k∈A pjk + j∈A pij pjk ϕ′ (k) k∈A pjk ϕ′ (k) k∈A 21 . But what exactly is this probability equal to? To answer this in its generality. By self-feeding this equation. ϕ(i) = j∈S i∈A pij ϕ(j). and deﬁne ϕ(i) := Pi (TA < ∞). If i ∈ A then TA = 0. So TA = 1 + TA . If i ∈ A. let ϕ′ (i) be another solution. and so ϕ(i) = 1. .). We then have: Theorem 5. Combining the above. X0 = i) = P (TA < ∞|X1 = j). conditionally on X1 . if ϕ′ (i) is any other solution of these equations then ϕ′ (i) ≥ ϕ(i) for all i ∈ S. Then ϕ′ (i) = j∈A pij + j∈A pij ϕ′ (j). Therefore. ′ is the remaining time until A is hit). ′ Proof. a ′ function of X1 . then TA ≥ 1.

We immediately have ϕ(0) = 1. ′ E i TA = 1 + E i TA = 1 + j∈S pij Ej TA = 1 + j∈A pij ψ(j). Furthermore. 3 22 . j∈A i ∈ A. If i ∈ A then. then TA = 1 + TA . Letting n → ∞. as above. Start the chain from i. we have 1 1 ϕ(1) = ϕ(1) + ϕ(0). On the other hand. 2}. ∼◦∼◦∼◦∼◦∼◦∼◦∼◦∼ Next consider the mean time (mean number of steps) until the set A is hit for the ﬁrst time. the second equals Pi (TA = 2). ψ(0) = ψ(2) = 0. 1 ψ(1) = 1 + ψ(1). The second part of the proof is omitted.We recognise that the ﬁrst term equals Pi (TA = 1). obviously.) As for ϕ(1). Proof. Clearly. if the chain starts from 2 it will never hit 0. ψ(i) := Ei TA . and let ψ(i) = Ei TA . if ψ ′ (i) is any other solution of these equations then ψ ′ (i) ≥ ψ(i) for all i ∈ S. i = 0. ψ(i) = 1 + i∈A pij ψ(j). the set that contains only state 0. so the ﬁrst two terms together equal Pi (TA ≤ 2). we obtain ϕ′ (i) ≥ Pi (TA < ∞) = ϕ(i). Therefore. 3 2 so ϕ(1) = 3/4. we choose A = {0}. because it takes no time to hit A if the chain starts from A. we obtain ϕ′ (i) ≥ Pi (TA ≤ n). By omitting the last term we obtain ϕ′ (i) ≥ Pi (TA ≤ 2). The function ψ(i) satisﬁes ψ(i) = 0. and ϕ(2) = 0. 1. (If the chain starts from 0 it takes no time to hit 0. Example: Continuing the previous example. which is the second of the equations. We let ϕ(i) = Pi (T0 < ∞). Example: In the example of the previous ﬁgure. TA = 0 and so ψ(i) := Ei TA = 0. By continuing self-feeding the above equation n times. let A = {0. We have: Theorem 6. If ′ i ∈ A. 2.

. of the random variables X1 . therefore the mean number of ﬂips until the ﬁrst success is 3/2. we just changed index from n to n + 1. Next. This happens up to the ﬁrst hitting time TA ﬁrst hitting time of the set A. . h(x) = Ex n=0 f (Xn ) ′ 1+TA = f (x) + Ex n=1 ′ TA f (Xn ) = f (x) + Ex n=0 f (Xn+1 ). It should be clear that the equations satisﬁed are: ϕAB (i) = 1. (This should have been obvious from the start. where. by the Markov property. if x ∈ A. because when X0 ∈ A then T0 = 0 and so the total reward is f (X0 ). until set A is reached. consider the following situation: Every time the state is x a reward f (x) is earned.which gives ψ(1) = 3/2. otherwise. ′ where TA is the remaining time.e. The total reward is f (X0 ) + f (X1 ) + · · · + f (XTA ). Now the last sum is a function of the future after time 1. B. We may ask to ﬁnd the probabilities ϕAB (i) = Pi (TA < TB ).) ∼◦∼◦∼◦∼◦∼◦∼◦∼◦∼ Now let us consider two hitting times TA . in the last sum. X2 . h(x) = f (x).. We are interested in the mean total reward TA h(x) := Ex n=0 f (Xn ). . ′ TA = 1 + TA . ϕAB (i) = j∈A i∈B pij ϕAB (j). ϕAB (i) = 0. Clearly. ′ TA Ex n=0 f (Xn+1 ) = y∈S pxy h(y). i. Then ′ 1+TA x ∈ A. Hence. ϕAB is the minimal solution to these equations. as argued earlier. 23 . i∈A Furthermore. because the underlying experiment is the ﬂipping of a coin with “success” probability 2/3. TB for two disjoint sets of states A. after one step. ∼◦∼◦∼◦∼◦∼◦∼◦∼◦∼ As yet another application of the ﬁrst-step analysis method.

If heads come up then you win £1 per pound of stake. x∈A pxy h(y). (Sooner or later. The successive stakes (Yk . So. You can remember the latter ones by thinking that the expectecd total reward equals the immediate reward f (x) plus the expected remaining reward. k ∈ N) form the strategy of the gambler. If ξk is the outcome of the k-th toss (where ξk = +1 means heads and ξk = −1 means tails). (Also. why?) If we let h(i) be the average total reward if he starts from room i. 2. if just before the k-th toss you decide to bet Yk pounds then after the realisation of the toss you will have earned Yk pounds if heads came up or lost Yk pounds if tails appeared. so 24 . he will die. h(3) = 7. we have h(4) = 0 1 h(1) = 1 + h(2) 2 1 h(3) = 3 + h(2) 2 1 1 h(2) = 2 + h(1) + h(3). the set of equations satisﬁed by h. Example: A thief enters sneaks in the following house and moves around the rooms. for i = 1. until he dies in room 4. in which case he dies immediately. collecting “rewards”: when in room i.Thus. he collects i pounds. your fortune after the n-th toss is n Xn = X0 + k=1 Y k ξk .) 7 Gambler’s ruin Consider the simplest game of chance of tossing an (possibly unfair) coin repeatedly and independently. Clearly. are h(x) = 0. 1 2 4 3 The question is to compute the average total reward. h(1) = 5. if he starts from room 2. y∈S h(x) = f (x) + x ∈ A. 3. no gambler is assumed to have any information about the outcome of the upcoming toss. unless he is in room 4. 2 2 We solve and ﬁnd h(2) = 8. Let p the probability of heads and q = 1 − p that of tails.

In other words.if the gambler wants to have a strategy at all. is our next goal. We wish to consider the problem of ﬁnding the probability that the gambler will never lose her money.i−1 = q = 1 − p. Let us consider a concrete problem. the company has control over the probability p. . expectation) under the initial condition X0 = x. She will stop playing if her money fall below a (a < x).g. Ex ) for the probability (respectively. Consider the hitting time of state y: Ty := inf{n ≥ 0 : Xn = y}. x − a . We use the notation Px (respectively. 25 ϕ(ℓ) = 1. b are absorbing. write ϕ(x) instead of ϕ(x. Consider a gambler playing a simple coin tossing game. To solve the problem. We ﬁrst solve the problem when a = 0. It can only depend on ξ1 . 0 ≤ x ≤ ℓ. . .i+1 = p. Let £x be the initial fortune of the gambler. in the ﬁrst case. Yk cannot depend on ξk . P (tails) = 1 − p. a + 2. in addition. . The general solution will then obtained from Px (Tb < Ta ) = ϕ(x − a. This has profound actuarial applications: An insurance company must make sure that the strategies it employs are such that–at least–the company will not go bankrupt (and. if p = q = 1/2 b−a Proof. in simplest case Yk = 1 for all k. as described above. The gambler starts with X0 = x pounds and has a target: To make b pounds (b > x). ℓ) := Px (Tℓ < T0 ). (Think why!) To simplify things. Then the probability that the gambler’s fortune will reach level b > x before reaching level a < x equals: x−a − 1 (q/p) (q/p)b−a − 1 . a < i < b. . We obviously have ϕ(0) = 0. b = ℓ > 0 and let ϕ(x. . b} with transition probabilities pi. she must base it on whatever outcomes have appeared thus far. . a + 1. ℓ). it will make enormous proﬁts). . pi. betting £1 at each integer time against a coin which has P (heads) = p. a ≤ x ≤ b. and where the states a. Theorem 7. We are clearly dealing with a Markov chain in the state space {a. The diﬀerence between the gambler’s ruin problem for a company and that for a mere mortal is that. b − a). astrological nature. if p = q Px (Tb < Ta ) = . . b − 1. ξk−1 and–as is often the case in practice–on other considerations of e.

and so λx − 1 λx − 1 =1− = λx . we ﬁnd δ(0) = 1/ℓ and so x ϕ(x) = . Since p + q = 1. we can write this as p[ϕ(x + 1) − ϕ(x)] = q[ϕ(x) − ϕ(x − 1)]. ℓ) = λx −1 . Assuming that λ := q/p = 1. We ﬁnally ﬁnd λx − 1 . Since ϕ(ℓ) = 1. ℓ Summarising. we have λℓ −1 λ−1 δ(0) = 1 which gives δ(0) = (λ − 1)/(λℓ − 1). We have Px (T0 < ∞) = lim Px (T0 < Tℓ ) = 1 − lim Px (Tℓ < T0 ). λ−1 ϕ(x) = If λ = 1 then q = p and so ϕ(x + 1) − ϕ(x) = ϕ(x) − ϕ(x − 1). This gives δ(x) = δ(0) for all x and ϕ(x) = xδ(0). we have obtained ϕ(x. λx − 1 δ(0). ϕ(x) = pϕ(x + 1) + qϕ(x − 1). 0 ≤ x ≤ ℓ. if λ = 1 if λ = 1. if p = q. Corollary 4. If p > 1/2. Let δ(x) := ϕ(x + 1) − ϕ(x). Px (T0 < ∞) = 1 − lim 26 . then λ = q/p < 1. 1. we ﬁnd ϕ(x) = δ(x − 1) + δ(x − 2) + · · · + δ(0) = [λx−1 + · · · + λ + 1]δ(0) = Since ϕ(ℓ) = 1. λ = 1. We have δ(x) = q δ(x − 1) = p q p 2 0 < x < ℓ. then λ = q/p > 1. and so Px (T0 < ∞) = 1 − lim If p < 1/2. by ﬁrst-step analysis. The general case is obtained by replacing ℓ by b − a and x by x − a. λ = 1. ℓ→∞ ℓ Px (T0 < ∞) = 1 − lim Finally. The ruin probability of a gambler starting with £x is Px (T0 < ∞) = Proof. ℓ→∞ λℓ − 1 x = 1 − 0 = 1. λℓ − 1 0 ≤ x ≤ ℓ.and. ℓ→∞ λℓ − 1 0−1 λx − 1 = 1 − 0 = 1. λℓ −1 x ℓ. if p > 1/2 if p ≤ 1/2. ℓ→∞ ℓ→∞ (q/p)x . δ(x − 2) = · · · = q p x δ(0).

Xn+1 . τ < ∞) and that P (Xτ +1 = j1 . Then. . . A ∩ {τ = n} is determined by (X0 . A | Xτ . Xτ = i. . . (c) is due to the fact that A ∩ {τ = n} is determined by (X0 .j1 pj1 . . τ < ∞) = P (Xn+1 = j1 . Theorem 8 (strong Markov property). τ < ∞)P (A | Xτ . . | Xτ . any such A must have the property that. Xn ) if we know the present Xn . . considered earlier. . Proof. A. Xn ) for all n ∈ Z+ . . n ≥ 0) is independent of the past process (X0 . conditional on τ < ∞ and Xτ = i. . Since (X0 . This stronger property is referred to as the strong Markov property. . . . for all n ∈ Z+ . for any n. . . . . the ﬁrst hitting time of the set A. In particular. the future process (Xτ +n . Our assertions are now easily obtained from this by ﬁrst summing over all n (whence the event {τ < ∞} appears) and then by dividing both sides by P (Xτ = i. An example of a random time which is not a stopping time is the last visit of a speciﬁc set. . P (Xτ +1 = j1 . . . . . τ = n) (c) = P (Xn+1 = j1 . τ < ∞) = pi. . | Xn = i. . τ < ∞). . A.j2 · · · We have: P (Xτ +1 = j1 . . . . Let τ be a stopping time. A.j1 pj1 . | Xτ = i. τ = n) = pi. . . 27 .8 Stopping times and the strong Markov property A random time is a random variable taking values in the time-set. . A. Xn = i. Xτ +2 = j2 . . Suppose (Xn . An example of a stopping time is the time TA . the future process (Xn . . . possibly. n ≥ 0) is a time-homogeneous Markov chain. . Xn+2 = j2 . . Let A be any event determined by the past (X0 . τ = n) = P (Xn+1 = j1 . Xn )1(τ = n). . A. . a stopping time τ is a random variable with values in Z+ ∪{+∞} such that 1(τ = n) is a deterministic function of (X0 . . . Recall that the Markov property states that. . . | Xn = i)P (Xn = i. including. Xτ ) if τ < ∞. . Let τ be a stopping time. Xτ )1(τ < ∞) = n∈Z+ (X0 . .j2 · · · P (Xn = i.Xτ +2 = j2 . τ = n) (a) (b) = P (Xτ +1 = j1 . . . Xn+2 = j2 . . Xn ) and the ordinary splitting property at n. . the value +∞. . . Xn ). . . Xτ +2 = j2 . . Xτ ) and has the same law as the original process started from i.) is independent of the past process (X0 . We will now prove that this remains true if we replace the deterministic n by a random stopping time τ . τ = n). We are going to show that for any such A. . Xn+2 = j2 . . . . and (d) is due to the fact that (Xn ) is time-homogeneous Markov chain with transition probabilities pij . A. τ = n)P (Xn = i. . (b) is by the deﬁnition of conditional probability. (d) where (a) is just logic. . Xτ +2 = j2 .

we let Ti be the time of the second (1) (3) visit. “chunks”. (And we set Ti := Ti . So Xi is the r-th excursion: the chain visits 28 ..d.. 0 ≤ n < Ti . by using the strong Markov property. Xi (2) . Ti Xi (0) (r) ≤ n < Ti (r+1) .9 Regenerative structure of a Markov chain A sequence of i. Note the subtle diﬀerence between this Ti and the Ti of Section 6. Schematically. The idea is that if we can ﬁnd a state i that is visited again and again. . in fact. as “random variables”. Regardless.d. 3. Formally. The random variables Xn comprising a general Markov chain are not independent.) We let Ti be the time of the third visit. If X0 = i then the two deﬁnitions coincide. . . random variables is a (very) particular example of a Markov chain. . Xi (1) . . we recursively deﬁne Ti (r) (2) := inf{n > Ti (r−1) : Xn = i}. Xi (0) . (Why?) Consider the “trajectory” of the chain between two successive visits to the state i: Xi Let also We think of (r) := Xn . All these random times are stopping times. and so on. (1) r = 1. However.i. in a generalised sense. . Our Ti here is always positive. then the behaviour of the chain between successive visits cannot be inﬂuenced by the past or the future (and that is due to the strong Markov property). := Xn . State i may or may not be visited again. . we will show that we can split the sequence into i.i. random trajectories. 2. r = 2. They are. We (r) refer to them as excursions of the process. The Ti of Section 6 was allowed to take the value 0. These excursions are independent! 11111111111 00000000000 1111111111 0000000000 11111111111 00000000000 11111111111111 00000000000000 1111111111 0000000000 11111111111 00000000000 11111 00000 11111111111111 00000000000000 1111111111 0000000000 11111111111 00000000000 11111 00000 11111111111111 00000000000000 1111111111 0000000000 11111 00000 11111111111111 00000000000000 1111111111 0000000000 1111111111111111 0000000000000000 11111 11111111111111 00000000000000 1111111111 0000000000 i 00000 11111111111111111111111111111111111111111111 00000000000000000000000000000000000000000000 1111111111111111 0000000000000000 11111 00000 11111 00000 Ti = Ti (1) Ti (2) Ti (3) Ti (4) Fix a state i ∈ S and let Ti be the ﬁrst time that the chain will visit this state: Ti := inf{n ≥ 1 : Xn = i}.

the chain returns to i only ﬁnitely many times: Pi (Xn = i for inﬁnitely many n) = 0. if it starts from state i at some point of time. Xi have the same distribution follows from the fact that the Markov chain is time-homogeneous. . ad inﬁnitum.. at least one state must be visited inﬁnitely many times. but important corollary of the observation of this section is that: Corollary 5. equal to i. But (r) the state at Ti is. starting from i. and this proves the ﬁrst claim. A trivial.d. (0) are independent random variables. Xi distribution. . We abstract this trivial observation and give a couple of deﬁnitions: First. Therefore. possibly. have the same Proof. are independent random variables. for time-homogeneous Markov chains. Xi (2) . Moreover. Perhaps some states (inessential ones) will be visited ﬁnitely many times. we say that a state i is transient if. . . In particular. The claim that Xi . The last corollary ensures us that Λ1 . . we say that a state i is recurrent if. . g(Xi (1) ). Xi (1) .state i for the r-th time. g(Xi (2) ). except. Second. Therefore. An immediate consequence of the strong Markov property is: Theorem 9. (r) . 29 . the chain returns to i inﬁnitely many times: Pi (Xn = i for inﬁnitely many n) = 1.). Λ2 . by deﬁnition. and “goes for an excursion”. of the excursions are Λr := (r) n=Ti 1(Xn = j) The meaning of Λr is this: it expresses the number of visits to a state j between two successive visits to the state i. starting from i. the future after Ti is independent of the past before Ti . Example: Deﬁne Ti (r+1) (0) ). but there are always states which will be visited again and again. it will behave in the same way as if it starts from the same state at another point of time. we “watch” it up until the next time it returns to state i. they are also identical in law (i. all. (Numerical) functions g(Xi independent random variables. 10 Recurrence and transience When a Markov chain has ﬁnitely many states. . Apply Strong Markov Property (SMP) at Ti : SMP says that. given the state at (r) (r) (r) the stopping time Ti . the future is independent of the (1) (2) past.i. The excursions Xi (0) ..

there could be yet another possibility: 0 < Pi (Xn = i for inﬁnitely many n) < 1. fii = Pi (Ti < ∞) = 1. We claim that: Lemma 4. i. State i is recurrent. Pi (Ji = ∞) = 1. Because either fii = 1 or fii < 1. 2. (k−1) Notice that fii = Pi (Ti < ∞) can either be equal to 1 or smaller than 1.Important note: Notice that the two statements above are not negations of one another. Indeed. Notice also that fii := Pi (Ji ≥ 1) = Pi (Ti < ∞).e. . namely. Suppose that the Markov chain starts from state i. We will show that such a possibility does not exist. Therefore. Lemma 5. 30 . Notice that: state i is visited inﬁnitely many times ⇐⇒ Ji = ∞. By the strong Markov property at time Ti (k−1) Pi (Ji ≥ k) = Pi (Ji ≥ k − 1)Pi (Ji ≥ 1). every state is either recurrent or transient. 3. k ≥ 0. Ei Ji = ∞. we conclude what we asserted earlier. Consider the total number of visits to state i (excluding the possibility that X0 = i): ∞ Ji = n=1 1(Xn = i) = sup{r ≥ 0 : Ti(r) < ∞} . Indeed. 2 k Pi (Ji ≥ k) = fii Pi (Ji ≥ k − 1) = fii Pi (Ji ≥ k − 2) = · · · = fii . we have Proof. Pi (Ji = ∞) = 1. Starting from X0 = i. 4. This means that the state i is recurrent. This means that the state i is transient. therefore concluding that every state is either recurrent or transient. and that is why we get the product. the event {Ji ≥ k} implies that Ti < ∞ and that Ji ≥ k − 1. We summarise this together with another useful characterisation of recurrence & transience in the following two lemmas. But the past (k−1) before Ti is independent of the future. in principle. If fii < 1. the random variable Ji is geometric: k Pi (Ji ≥ k) = fii . If fii = 1 we have that Pi (Ji ≥ k) = 1 for all k. The following are equivalent: 1. then Pi (Ji < ∞) = 1.

On the other hand. Ji cannot take value +∞ with positive probability. (n) i. If Ei Ji = ∞. then.1 Deﬁne First hitting time decomposition fij := Pi (Tj = n). Lemma 6. pii → 0. The following are equivalent: 1. by deﬁnition. 2. fii = Pi (Ti < ∞) < 1. State i is transient. Clearly. then. by the deﬁnition of Ji . The latter is the probability to be in state j at time n (not necessarily for the ﬁrst time) starting from i.e. Pi (Ji < ∞) = 1. Proof. owing to the fact that Ji is a geometric random variable. we see that its expectation cannot be inﬁnity without Ji itself being inﬁnity with probability one. Ei Ji < ∞. This. (n) But if a series is ﬁnite then the summands go to 0. fij ≤ pij . Hence Ei Ji = ∞ also. n ≥ 1. Proof. if we assume Ei Ji < ∞. Proof. it is visited inﬁnitely many times. Corollary 6. (n) pij n = Pi (Xn = j) = m=1 n Pi (Tj = m. which. Notice the diﬀerence with pij . From Elementary Probability. by deﬁnition of Ji is equivalent to Pi (Ji = ∞) = 1. since Ji is a geometric random variable. as n → ∞. assuming that the chain (n) starts from state i. clearly. The equivalence of the ﬁrst two statements was proved before the lemma. The equivalence of the ﬁrst two statements was proved above. But ∞ ∞ ∞ (n) Ei Ji = Ei n=1 1(Xn = i) = n=1 Pi (Xn = i) = n=1 (n) pii . 4. the probability to hit state j for the ﬁrst time at time n ≥ 1. this implies that Ei Ji < ∞. (n) pij n (m) (n−m) (n) (n) = m=1 fij pjj . i. by deﬁnition. If i is transient then Ei Ji < ∞. State i is transient if.e. it is visited ﬁnitely many times with probability 1. 10. 3. but the two sequences are related in an exact way: Lemma 7. and. as n → ∞. is equivalent to Pi (Ji < ∞) = 1.Proof. Xn = j) Pi (Tj = m)Pi (Xn = j | Tj = m) 31 = m=1 . therefore Pi (Ji < ∞) = 1. If i is a transient state then pii → 0. State i is recurrent if.

excursions Xi . r = 0. So the random variables δj. . the probability to go to j in some ﬁnite time is positive. not only fij > 0. . j i. if we let fij := Pi (Tj < ∞) .i. We now show that recurrence is a class property. inﬁnitely many of them will be 1 (with probability 1). showing that j will appear in inﬁnitely many of the excursions for sure. by summing over n the relation of Lemma 7. and so the probability that j will appear in a speciﬁc excursion is positive. we have n=1 pjj = Ej Jj < ∞. If j is transient.d. . .) Theorem 10. but also fij = 1. 1.. Xi . either all states are recurrent or all states are transient. Hence. 32 . If j is a transient state then pij → 0. and since they take the value 1 with positive probability. ( Remember: another property that was a class property was the property that the state have a certain period. In other words. for all states i. . ∞ Proof. are i. by deﬁnition.i. If i is recurrent and i j then there is a positive probability (0) (1) (= fij ) that j will appear in one of the i. . Proof.r := 1 j appears in excursion Xi (r) . For the second term. starting from i. Corollary 7.d. and hence pij as n → ∞. The last statement is simply an expression in symbols of what we said above. In a communicating class. the probability that j will appear in a speciﬁc excursion equals gij = Pi (Tj < Ti ). as n → ∞. Now. we obtain (after interchanging the order of the two sums in the right hand side) ∞ n=1 (n) pij ∞ (m) fij ∞ n=m (n) (n−m) pjj ∞ (m) (n) (m) = m=1 = (1 + Ej Jj ) m=1 fij = (1 + Ej Jj )Pi (Tj < ∞).2 Communicating classes and recurrence/transience Recall the deﬁnition of the relation “state i leads to state j”.The ﬁrst term equals fij . denoted by i j: it means that. Start with X0 = i. Hence. Indeed. In particular. If i is recurrent and if i j then j is recurrent and fij = Pi (Tj < ∞) = 1. and gij := Pi (Tj < Ti ) > 0. we use the Strong Markov Property: (n−m) Pi (Xn = j | Tj = m) = Pi (Xn = j | Xm = j) = pjj . which is ﬁnite. 10. Corollary 8. j i. we have i j ⇐⇒ fij > 0.

Indeed. Proof. P (B) = Pi (Xn = i for inﬁnitely many n) < 1. random variables. some state i must be visited inﬁnitely many times. Recall that a ﬁnite random variable may not necessarily have ﬁnite expectation.Proof. Un ≤ x | U0 = x)dx xn dx = 1 . 1]. Hence. P (T > n) = P (U1 ≤ U0 . . Theorem 12. Theorem 11. . . If a communicating class contains a state which is transient then. Xn = i for inﬁnitely many n) = 0. necessarily. all other states are also transient.e. .i. We now take a closer look at recurrence. Every ﬁnite closed communicating class is recurrent. U1 . observe that T is ﬁnite. But then.e. uniformly distributed in the unit interval [0. X remains in C. and so i is a transient state. Hence all states are recurrent. . . Example: Let U0 . by the above theorem. . . Pi (B) = 0. Un ≤ U0 ) 1 = 0 1 P (U1 ≤ x. . . Hence 11 Positive recurrence Recall that if we ﬁx a state i and let Ti be the ﬁrst return time to i then either Pi (Ti < ∞) = 1 (in which case we say that i is recurrent) or Pi (Ti < ∞) < 1 (transient state). By ﬁniteness. U2 . appear in a certain game. Let C be the class. n+1 33 = 0 . and hence. What is the expectation of T ? First. This state is recurrent. If i is inessential then it is transient. Proof. i. every other j is also recurrent. be i. for example. j but j i. It should be clear that this can. Thus T represents the number of variables we need to draw to get a value larger than the initial one. Suppose that i is inessential. 0 < Pi (A) = Pi (A ∩ B) + Pi (A ∩ B c ) = Pi (A ∩ B c ) ≤ Pi (B c ).d. By closedness. if started in C. If i is recurrent then every other j in the same communicating class satisﬁes i j. i. but Pi (A ∩ B) = Pi (Xm = j. So if a communicating class is recurrent then it contains no inessential states and so it is closed. . Let T := inf{n ≥ 1 : Un > U0 }. there is a state j such that i there is m such that Pi (A) = Pi (Xm = j) > 0.

. Our goal now is to show that a chain possessing a positive recurrent state also possesses a stationary distribution. (ii) Suppose E max(Z1 . Then P Z1 + · · · + Zn =µ n→∞ n lim = 1.d. n+1 But ET = ∞ P (T > n) = n=1 n=1 1 = ∞. It is traditional to refer to it as “Law”. X2 . we shall discover what the form of this stationary distribution is by using the Law of Large Numbers 12 Law (=Theorem) of Large Numbers in Probability Theory The following result is. n+1 You are invited to think about the meaning of this: If the initial value U0 is unknown. P (T < ∞) = lim P (T ≤ n) = lim (1 − n→∞ ∞ n→∞ 1 ) = 1. an inﬁnite number of steps to exceed it! We now classify a recurrent state i as follows: We say that i is positive recurrent if Ei Ti < ∞. . Theorem 13. 34 . then it takes. It is not a Law. be i.i. To our rescue comes the regenerative structure of a Markov chain identiﬁed at an earlier section. 0) = ∞. When we have a Markov chain. the Fundamental Theorem of Probability Theory. the sequence X0 . Let Z1 . . Z2 .Therefore. But this is misleading. X1 . of successive states is not an independent sequence. but E min(Z1 . it is a Theorem that can be derived from the axioms of Probability Theory. 0) > −∞. . arguably. We say that i is null recurrent if Ei Ti = ∞. 13 Law of Large Numbers for Markov chains The law of large numbers stated above requires independence. We state it here without proof. (i) Suppose E|Z1 | < ∞ and let µ = EZ1 . . To apply the law of large numbers we must create some kind of independence. random variables. on the average. Before doing that. Then P Z1 + · · · + Zn =∞ n→∞ n lim = 1. .

Then. . which. . . Xa(2) .d. as in Example 2.13. k=0 which represents the ‘time-average reward’ received. G(Xa(3) ). The ﬁnal equality in the last display is merely our notation for the expectation of a random variable conditional on X0 = a.i. (r) (r+1) G(Xa(r) ) = max{f (Xt ) : Ta ≤ t < Ta }. but deﬁne G(X ) to be the total reward received over the excursion X . G(Xa(r) ) = (r) (r+1) Ta ≤t<Ta f (Xt ). given a recurrent state a ∈ S (which will be referred to as the ground state for no good reason other than that it has been ﬁxed once and for all) the sequence of excursions Xa(1) . n as n → ∞. with probability 1. in this example.1 Functions of excursions Namely. (1) (2) (1) (n) (8) 13. are i. . we can think of as a reward received when the chain is in state x. forms an independent sequence of (generalised) random variables. (7) Whichever the choice of G.. and because if we start with X0 = a. In particular. are i.i. . for reasonable choices of the functions G taking values in R. because the excursions Xa . deﬁne G(X ) to be the maximum reward received during this excursion. Example 2: Let f (x) be as in example 1. assign a reward f (x).2 Average reward Consider now a function f : S → R. the (0) initial excursion Xa also has the same law as the rest of them. Example 1: To each state x ∈ S. We are interested in the existence of the limit time-average reward = lim 1 t→∞ t t f (Xk ). the law of large numbers tells us that G(Xa ) + · · · + G(Xa ) → EG(Xa(1) ). Xa(3) . Thus. . Speciﬁcally. for an excursion X . . Note that EG(Xa(1) ) = EG(Xa(2) ) = · · · = E G(Xa(0) ) | X0 = a = Ea G(Xa(0) ). random variables. G(Xa(2) ).d. Since functions of independent random variables are also independent. . we shall assume that f is bounded and shall examine those conditions which yield a nontrivial limit. . Xa . we have that G(Xa(1) ). 35 .

Theorem 14. (r) Nt := max{r ≥ 1 : Ta ≤ t}. with probability one. t t 36 . The idea is this: Suppose that t is large enough. We break the sum into the total reward received over the initial excursion. and Glast is the reward over the last incomplete cycle. Nt . Let f : S → R+ be a bounded ‘reward’ function. Since P (Ta < ∞) = 1. Thus we need to keep track of the number. Ea Ta < ∞) and assume that the distribution of X0 is such that P (Ta < ∞) = 1. with probability one. Nt = 5. Proof. plus the total reward received over the next excursion. Ea f= n=0 f (Xn ) E a Ta = x∈S f (x)π(x). Gr is given by (7). We are looking for the total reward received between 0 and t. t as t → ∞. For example. Gﬁrst is the reward up to time Ta = Ta or t (whichever t is smaller). 0 Ta (1) Ta (2) T (3) a Ta(4) Ta (5) t We write the total reward as a sum of rewards over complete excursions plus a ﬁrst and a last term: t Nt −1 r=1 (1) f (Xn ) = n=0 Gr + Gﬁrst + Glast . Then. in the ﬁgure below. we have BTa /t → 0. say |f (x)| ≤ B. t t as t → ∞. and so 1 ﬁrst G → 0. say.e. Suppose that a Markov chain possesses a positive recurrent state a ∈ S (i. the ‘time-average reward’ f exists and is a deterministic quantity: P where Ta −1 1 t→∞ t lim t f (Xn ) = f n=0 =1 . and so on. t t In other words. Hence |Gﬁrst | ≤ BTa . We shall present the proof in a way that the constant f will reveal itself–it will be discovered. A similar argument shows that 1 last G → 0. of complete excursions from the ground state a that occur between 0 and t. We assumed that f is t bounded.

random variables with common expectation Ea Ta . (11) By deﬁnition. r=1 (10) Now look at the sequence r (r) (1) Ta = Ta + k=1 (k) (k−1) ) (Ta − Ta (1) (k) (k−1) and see what the Law of Large Numbers again.d. = E a Ta . So we are left with the sum of the rewards over complete cycles.also. . E a Ta f (Xn ) = n=0 EG1 =: f . Write now 1 t Nt −1 Gr = r=1 Nt 1 t Nt Nt −1 Gr . t→∞ t E a Ta (12) Using (10) and (12) in (9) we have 1 lim t→∞ t Thus 1 t→∞ t lim t Nt −1 Gr = r=1 EG1 . lim t∞ 1 Nt Nt −1 Gr = EG1 . we have that the visit with index r = Nt to the ground state a is the last before t (see also ﬁgure above): (N (N Ta t ) ≤ t < Ta t +1) . 2. Since Ta → ∞. Hence Ta t t Ta t ≤ < Nt Nt Nt But (11) tells us that Ta t Ta t = lim t→∞ Nt t→∞ Nt lim Hence lim (N ) (N +1) (N ) (N +1) . as t → ∞.. as r → ∞. Hence. are i. it follows that the number Nt of complete cycles before time t will also be tending to ∞. 1 Nt = . = E a Ta . Since Ta /r → 0. . . and since Ta − Ta k = 1. E a Ta 37 . r=1 with probability one.i. r=1 (9) Look again what the Law of Large Numbers (8) tells us: that 1 n→∞ n lim (r) n Gr = EG1 . we have Ta r→∞ r lim (r) .

then we see that the numerator equals 1. of times state b is visited = Ea between two successive visits to state a Ta −1 1(Xk = b) = k=0 E a Ta . the ground state. i. The physical interpretation of the latter should be clear: If. b. We can use b as a ground state instead of a. Ta −1 k=0 1 lim t→∞ t with probability 1. which communicate and are positive recurrent. Comment 4: Suppose that two states. for all b ∈ S. The constant f is a sort of average over an excursion. (13) 1(Xn = b) = n=1 Comment 3: If in the above we let b = a. We may write EG1 = Ea 0≤t<Ta f (Xt ) or EG1 = Ea 0<t≤Ta f (Xt ). a. The denominator is the duration of the excursion. The equality of these two limits gives: average no. The numerator of f is computed by summing up the rewards over the excursion (making sure that not both endpoints are included). from the Strong Markov Property. on the average. Note that we include only one endpoint of the excursion. we let f to be 1 at the state b and 0 everywhere else. Comment 1: It is important to understand what the formula says. f(X2 ) f(X1 ) f(X )= f(a) 0 f(T −1) a 0 T a Comment 2: A very important particular case occurs when we choose f (x) := 1(x = b). then the long-run proportion of visits to the state a should equal 1/Ea Ta . t Ea 1(Xk = b) E a Ta .To see that f is as claimed. E a Ta (14) with probability 1. it takes Ea Ta units of time between successive visits to the state a. The theorem above then says that. Hence t→∞ lim 1 t t 1(Xn = a) = n=1 1 . E b Tb 38 .e. not both. Applying formula (14) with b in place of a we ﬁnd that the limit in (13) also equals 1/Eb Tb . just observe that EG1 = E (1) (1+1) Ta ≤t<Ta f (Xt ) = Ea 0≤t<Ta f (Xt ).

Note: Although ν [a] depends on the choice of the ‘ground’ state a. Recurrence tells us that Ta < ∞ with probability one. Indeed. We start the Markov chain with X0 = a. this should have something to do with the concept of stationary distribution discussed a few sections ago. . (15) should play an important role. for any state b such that a → x (a leads to x). we will soon realise that this dependence vanishes when the chain is irreducible. so there is nothing lost in doing so. 5 We stress that we do not need the stronger assumption of positive recurrence here. X2 . Since a = XT a . ν [a] (x) = y∈S ν [a] (y)pyx . . we have 0 < ν [a] (x) < ∞. 39 . . notice that the choice of notation ν [a] is justiﬁable through the notation (6) for the communicating class of the state a. Proposition 2. Then ν [a] satisﬁes the balance equations: ν [a] = ν [a] P. Moreover. Hence the superscript [a] will soon be dropped. Clearly. Proof. X1 . X2 . by a function of X1 . (16) This was a trivial but important step: We replaced the n = 0 term with the n = Ta term.14 Construction of a stationary distribution Our discussions on the law of large numbers for Markov chains revealed that the quantity Ta −1 ν (x) := Ea n=0 [a] 1(Xn = x) . i. X3 . Both of them equal 1. Fix a ground state a ∈ S and assume that it is recurrent5 . The real reason for doing so is because we wanted to replace a function of X0 . x ∈ S. . . . x ∈ S. we can write ν (x) = Ea k=1 [a] Ta 1(Xn = x). In view of this.e. we saw (Comment 2 above) that if a is a positive recurrent state then ν [a] (x) E a Ta is the long-run fraction of times that state x is visited. Consider the function ν [a] deﬁned in (15). where a is the ﬁxed recurrent ground state whose existence was assumed.

n ≤ Ta ) pyx Pa (Xn−1 = y. First note that. Ta Ta (m) ν [a] (x) = Ea x∈S n=1 x∈S 1(Xn = x) = Ea n=1 1 = E a Ta . x ∈ S. such that pxa > 0. We are now going to assume more. such that pax > 0. in this last step. by deﬁnition. We just showed that ν [a] = ν [a] P. let x be such that a x. n ≤ Ta ) = n=1 ∞ Pa (Xn = x. for all x ∈ S. We now have: Corollary 9. we used (16). which is precisely what we do next: ∞ ν (x) = Ea n=1 ∞ [a] 1(Xn = x. is ﬁnite when a is positive recurrent. Hence ν [a] (x) ≥ ν [a] (a)p(n) = p(n) > 0. from (16). Deﬁne Ta −1 ν [a] (x) = π (x) := E a Ta [a] Ea n=0 1(Xn = x) E a Ta . xa so. which. ax ax From Theorem 10. Suppose that the Markov chain possesses a positive recurrent state a. Hence 1 = ν [a] (a) ≥ ν [a] (x)p(m) > ν [a] (x). ν [a] (x) < ∞. n ≤ Ta ) Pa (Xn = x. that a is positive recurrent. Xn−1 = y. x a. (17) This π is a stationary distribution. n ≤ Ta ) Ta = n=1 y∈S ∞ = n=1 y∈S = y∈S pyx Ea n=1 [a] 1(Xn−1 = y) = y∈S pyx ν (y). Note: The last result was shown under the assumption that a is a recurrent state. 40 . (n) Next. where. ν [a] = ν [a] Pn . So we have shown ν [a] (x) = y∈S pyx ν [a] (y). namely. so there exists m ≥ 1. indeed. which are precisely the balance equations. Therefore. so there exists n ≥ 1. for all n ∈ N.and thus be in a position to apply ﬁrst-step analysis.

. to decide whether j will occur in a speciﬁc excursion. Suppose that i recurrent. in Proposition 2. in general. Theorem 15. the event that j occurs in a speciﬁc excursion is independent from the event that j occurs in the next excursion. But π [a] is simply a multiple of ν [a] . The coins are i. second) excursion that contains j. then j occurs in the r-th excursion.e. . 2. since the excursions are independent. and the probability gij = Pi (Tj < ∞) that j occurs in a speciﬁc excursion is positive. 41 . Therefore. Since a is positive recurrent. π(y) = 1 y∈S x∈S have at least one solution. positive recurrent state. 15 Positive recurrence is a class property We now show that (just like recurrence) positive recurrence is also a class property.In other words: If there is a positive recurrent state.i. Proof. if tails show up. and so π [a] is not trivially equal to zero. It only remains to show that π [a] is a probability. unique. Start with X0 = i. We already know that j is recurrent. In words. then the set of linear equations π(x) = y∈S π(y)pyx . we have Ea Ta < ∞. no matter which one. Let R1 (respectively. x∈S Pause for thought: We have achieved something VERY IMPORTANT: if we have one. We have already shown. Hence j will occur in inﬁnitely many excursions. R2 ) be the index of the ﬁrst (respectively. r = 0. π [a] P = π [a] . . i. Also. Let i be a positive recurrent state. This is what we will try to do in the sequel. But π [a] (x) = x∈S 1 E a Ta ν [a] (x) = 1. Then j is positive Proof. 1. If heads show up at the r-th excursion. Consider the excursions Xi . Otherwise. then we immediately have a stationary (probability) distribution! We DID NOT claim that this distribution is.d. (r) j. that ν [a] P = ν [a] . that it sums up to one. then the r-th excursion does not contain state j. we toss a coin with probability of heads gij > 0.

we just need to show that R2 −R1 +1 ζ := r=1 Zr has ﬁnite expectation. Corollary 10. . IRREDUCIBLE CHAIN r = 1. Since gij > 0 and Ei Ti < ∞ we have that Ei ζ < ∞. with the same law as Ti . Then either all states in C are positive recurrent.i. An irreducible Markov chain is called positive recurrent if one (and hence all) states are positive recurrent. RECURRENT TRANSIENT POSITIVE RECURRENT NULL RECURRENT 42 . An irreducible Markov chain is called null recurrent if one (and hence all) states are null recurrent. Let C be a communicating class. .R2 R1 j i R1+1 R1+2 This random variable has the same law as T under P i i ζ We just need to show that the sum of the durations of excursions with indices R1 . But. . An irreducible Markov chain is called transient if one (and hence all) states are transient. .d. by Wald’s lemma. In other words. R2 has ﬁnite expectation. 2. . . where the Zr are i. or all states are null recurrent or all states are transient. the expectation of this random variable equals −1 Ei ζ = (Ei Ti ) × Ei (R2 − R1 + 1) = (Ei Ti ) × (gij + 1) because R2 − R1 is a geometric random variable: Pi (R2 − R1 = r) = (1 − gij )r−1 gij . An irreducible Markov chain is called recurrent if one (and hence all) states are recurrent. .

we stress that a stationary distribution may not be unique. Then there is a unique stationary distribution. the π deﬁned by π(0) = α. Ta −1 1 t t Ea 1(Xk = x) k=0 1(Xn = x) n=1 → E a Ta = π [a] (x). (18) π(x)Px (Ta < ∞) = π(x) = 1. But. Since the chain is irreducible and positive recurrent we have Pi (Tj < ∞) = 1. n ≥ 0) and suppose that P (X0 = x) = π(x). for totally trivial reasons (it is. what prohibits uniqueness. for all states i and j. By a theorem of Analysis (Bounded Convergence Theorem). π(1) = 0. Proof. there is a stationary distribution. Theorem 16. E 1 t t 1 t t 1(Xn = x) n=1 → π [a] (x). for all n. we can take expectations of both sides: E as t → ∞. after all. by (18). indeed. Example: Consider the chain we’ve seen before: 1 1/3 1 0 1/2 1 1/6 2 Clearly. π(2) = 1 − α 2. state 1 (and state 2) is positive recurrent. Fix a ground state a. But observe that there are many–inﬁnitely many–stationary distributions: For each 0 ≤ α ≤ 1. x∈S By (13). as t → ∞. for all x ∈ S. Consider the Markov chain (Xn . (You should think why this is OBVIOUS!) Notice that 1 This lack of communication is. we have P (Xn = x) = π(x). recognising that the random variables on the left are nonnegative and bounded below 1. following Theorem 14. x ∈ S. Consider positive recurrent irreducible Markov chain. x ∈ S. Then. is a stationary distribution.16 Uniqueness of the stationary distribution At the risk of being too pedantic. with probability one. we have. Hence. an absorbing state). n=1 43 . Then P (Ta < ∞) = x∈S In other words. 1(Xn = x) n=1 = 1 t t P (Xn = x) = π(x). we start the Markov chain from the stationary distribution π. Let π be a stationary distribution.

which is useful because it is often used as a criterion for positive recurrence: Corollary 13. where π(C) = y∈C π(y). Consider a positive recurrent irreducible Markov chain with stationary distribution π. The following is a sort of converse to that. Then for each positive recurrent communicating class C there corresponds a unique stationary distribution π C 44 .Therefore π(x) = π [a] (x). Therefore Ei Ti < ∞. x ∈ S. E a Ta Proof. for all a ∈ S. π(a) = π [a] (a) = 1/Ea Ta . 1 π(a) = . We can easily see that we thus obtain a Markov chain with state space C with stationary distribution π(·|C). But π(i|C) > 0. There is a unique stationary distribution.e. Let i be such that π(i) > 0. Then. This is in fact. so they are equal. the only stationary distribution for the restricted chain because C is irreducible. i is positive recurrent. Proof. In particular. 17 Structure of a stationary distribution It is not hard to show now what an arbitrary stationary distribution looks like. There is only one stationary distribution. In Section 14 we assumed the existence of a positive recurrent state a and proved the existence of a stationary distribution π. Hence there is only one stationary distribution. Both π [a] and π[b] are stationary distributions. Proof. we obtain the cycle formula: Ta −1 Tb −1 Ea k=0 1(Xk = x) E a Ta = Eb k=0 1(Xk = x) E b Tb . Then π [a] = π [b] . an arbitrary stationary distribution π must be equal to the speciﬁc one π [a] . Consider a Markov chain. Consider the restriction of the chain to C. By the 1 above corollary. Namely. Let C be the communicating class of i. Hence π = π [a] . from the deﬁnition of π [a] . Then any state i such that π(i) > 0 is positive recurrent. Thus. Decompose it into communicating classes. In particular. Consider an arbitrary Markov chain. The cycle formula merely re-expresses this equality. delete all transition probabilities pxy with x ∈ C. Assume that a stationary distribution π exists. Let π(·|C) be the distribution deﬁned by restricting π to C: π(x|C) := π(x) . y ∈ C. Corollary 12. Consider a positive recurrent irreducible Markov chain and two states a. Corollary 11. i. π(C) x ∈ C. for all x ∈ S. π(i|C) = Ei Ti . b ∈ S.

There is dissipation of energy due to friction and that causes the settling down. This is deﬁned by picking a state (any state!) a ∈ C and letting π C := π [a] . For each such C deﬁne the conditional distribution π(x|C) := π(x) . from physical experience. 18 18. where αC ≥ 0. So far we have seen. π(x|C) ≡ π C (x). If π(x) > 0 then x belongs to some positive recurrent class C. take an = (−1)n . Let π be an arbitrary stationary distribution. that it remains in a small region).which assigns positive probability to each state in C. Any stationary distribution π is necessarily of the form π= C: C positive recurrent communicating class αC π C . To each positive recurrent communicating class C there corresponds a stationary distribution π C . for example. the so-called Cesaro 1 averages n n P (Xk = x) converge to π(x). This is the stable position. it will settle to the vertical position. Why stability? There are physical reasons why we are interested in the stability of a Markov chain. Proposition 3. such that C αC = 1. A Markov chain is a simple model of a stochastic dynamical system. An arbitrary stationary distribution π must be a linear combination of these π C . A sequence of real numbers an k=1 1 may not converge but its Cesaro averages n (a1 + · · · + an ) may converge. We all know. So we have that the above decomposition holds with αC = π(C). By our uniqueness result. that. x∈C Then (π(x|C). For example. as n → ∞. In an ideal pendulum. eventually. consider the motion of a pendulum under the action of gravity. x ∈ C) is a stationary distribution. the pendulum will perform an 45 . more generally.1 Coupling and stability Deﬁnition of stability Stability refers to the convergence of the probabilities P (Xn = x) as n → ∞. where π(C) := π(C) π(x). that the pendulum will swing for a while and. Stability of a deterministic dynamical system means that the system eventually converges to a ﬁxed point (or. Proof. under irreducibility and positive recurrence conditions. without friction.

Still. let us consider a stochastic analogue of the above in discrete time: Xn+1 = ρXn + ξn . 4 3 2 x* 1 0 0 1 2 t 3 t4 If we start with x(t = 0) = x∗ . it cannot mean convergence to a ﬁxed equilibrium point. It follows that |ρ| < 1 is the condition for stability. random variables. . consider the following diﬀerential equation x = −ax + b. as t → ∞. We say that x∗ is a stable equilibrium. the one found by setting the right-hand side equal to zero: x∗ = b/a. The simplest diﬀerential equation As another example. But what does stability mean? In general. by means of the recursion above. are given. And we are in a happy situation here. . then x(t) = x∗ for all t > 0. . ξ1 . for we can do it: X1 = ρX0 + ξ0 X2 = ρX1 + ξ1 = ρ(ρX0 + ξ0 ) + ξ1 = ρ2 X0 + ρξ0 + ξ1 X3 = ρX2 + ξ2 = ρ3 X0 + ρ2 ξ0 + ρξ1 + ξ2 Xn = ρn X0 + ρn−1 ξ0 + ρn−2 ξ1 + · · · + ρξn−2 + ξn−1 . we may it is stable because it does not (it cannot!) swing too far away. The thing to observe here is that there is an equilibrium point i. we here assume that X0 . . Speciﬁcally. ξ0 . . ˙ There are myriads of applications giving physical interpretation to this. We shall assume that the ξn have the same law. A linear stochastic system We now pass on to stochastic systems. and deﬁne a stochastic sequence X1 . a > 0. ······ 46 . mutually independent. x ∈ R. so I’ll leave it upon the reader to decide whether he or she wants to consider this as an ﬁnancial example.oscillatory motion ad inﬁnitum. X2 . simple because the ξn ’s depend on the time parameter n. we have x(t) → x∗ . However notice what happens when we solve the recursion. or an example in physics. Since we are prohibited by the syllabus to speak about continuous time stochastic systems. . ξ2 . then regardless of the starting position. or whatever. moreover.e. If.

under nice conditions. the limit of its distribution does: ˜ ˆ ˆ lim P (Xn ≤ x) = lim P (Xn ≤ x) = lim P (Xn ≤ x) = P (X∞ ≤ x) n→∞ n→∞ n→∞ This is precisely the reason why we cannot. Starting at an elementary level. for any initial distribution. converges. we have ˜ ˆ P (Xn ≤ x) = P (Xn ≤ x). positive recurrent and aperiodic with (unique) stationary distribution π then n→∞ lim P (Xn = x) = π(x). which is a linear combination of ξ0 . . but we are not given what P (X = x. (e. . but notice that if we let ˆ Xn := ρn−1 ξn−1 + ρn−2 ξn−2 + · · · + ρξ1 + ξ0 . ξn−1 does not converge. to a matrix with rows all equal to π: (n) lim p n→∞ ij = π(j). we have that the ﬁrst term goes to 0 as n → ∞. uniformly in x. to try to make the coincidence probability P (X = Y ) as large as possible. Y = y) is. the n-step transition matrix Pn . How should we then deﬁne what P (X = x. suppose you are given the laws of two random variables X. . Y = y) is? 47 . The second term. Coupling refers to the various methods for constructing this joint law.3 Coupling To understand why the fundamental stability theorem works.2 The fundamental stability theorem for Markov chains Theorem 17.Since |ρ| < 1. Y = y) = P (X = x)P (Y = y). j ∈ S. while the limit of Xn does not exists. 18. . A straightforward (often not so useful) construction is to assume that the random variables are independent and deﬁne P (X = x. ˜ Xn := ρn−1 ξ0 + ρn−2 ξ1 + · · · + ρξn−2 + ξn−1 . g(y) = P (Y = y). In other words. deﬁne stability in a way other than convergence of the distributions of the random variables. for instance. But suppose that we have some further requirement. If a Markov chain is irreducible. as n → ∞. 18. In particular. for time-homogeneous Markov chains. we know f (x) = P (X = x).g. we need to understand the notion of coupling. i. Y but not their joint law. What this means is that. ∞ ˆ The nice thing is that. assume that the ξn are bounded) Xn does converge to ˆ X∞ = k=0 ρk ξk .

Since this holds for all x ∈ S and since the right hand side does not depend on x. but with the roles of X and Y interchanged. uniformly in x ∈ S. T is ﬁnite with probability one. A coupling of them refers to a joint construction on the same probability space. and all x ∈ S. Find the joint law h(x. n < T ) + P (Xn = x. Proof. This T is a random variables that takes values in the set of times or it may take values +∞. Y . i.e. |P (Xn = x) − P (Yn = x)| ≤ P (T > n). n ≥ T ) ≤ P (n < T ) + P (Yn = x). g(y) = x h(x. (19) as n → ∞. n ≥ T ) = P (Xn = x. The following result is the so-called coupling inequality: Proposition 4. precisely because of the assumption that the two processes have been constructed together. P (T < ∞) = 1. then |P (Xn = x) − P (Yn = x)| → 0. Combining the two inequalities we obtain the ﬁrst assertion: |P (Xn = x) − P (Yn = x)| ≤ P (T > n). x∈S 48 . y). x ∈ S. Y be random variables with values in Z and laws f (x) = P (X = x). for all n ≥ 0. The coupling we are interested in is a coupling of processes. Then. and which maximises P (X = Y ). in particular. but we are not told what the joint probabilities are. Suppose that we have two stochastic processes as above and suppose that they have been coupled. we can write it as sup |P (Xn = x) − P (Yn = x)| ≤ P (T > n). n ≥ 0. n ≥ 0). n ≥ 0) and the law of a stochastic process Y = (Yn . we obtain P (Yn = x) − P (Xn = x) ≤ P (T > n). We say that ransom time T is meeting time of the two processes if Xn = Yn for all n ≥ T. on the event that the two processes never meet. n < T ) + P (Yn = x. y) = P (X = x. i. f (x) = y h(x. n ≥ 0. g(y) = P (Y = y). respectively. Note that it makes sense to speak of a meeting time.e. Repeating (19). Y = y) which respects the marginals. We are given the law of a stochastic process X = (Xn . If. Let T be a meeting of two coupled stochastic processes X.Exercise: Let X. and hence P (Xn = x) − P (Yn = x) ≤ P (T > n). y). We have P (Xn = x) = P (Xn = x.

The ﬁrst process. 49 (n) (n) (n) (n) (n) (n) = P (Xn+1 = x′ | Xn = x) P (Yn+1 = y ′ | Yn = y) = px. denoted by X = (Xn .x′ py. . Y0 ) has distribution P (W0 = (x.y′ for all large n.x′ ). Yn ). y) = µ(x)π(y). y ′ ) | Wn = (x.) By positive recurrence. Having coupled them. n ≥ 0) has the law of a Markov chain with transition probabilities pij and X0 distributed according to some arbitrary distribution µ. sup |P (Xn = x) − P (Yn = x)| → 0. then.(y. Its (1-step) transition probabilities are q(x. π(x) > 0 for all x ∈ S.x′ ). Xm = y). n ≥ 0) is an irreducible chain. n ≥ 0) is independent of Y = (Yn . 18. Therefore. n ≥ 0) has the law of a Markov chain with transition probabilities pij and Y0 distributed according to the stationary distribution µ.(y. The second process.y′ . as usual. positive recurrent and aperiodic. denoted by Y = (Yn . We now couple them by doing the straightforward thing: we assume that X = (Xn . x∈S as n → ∞. we can deﬁne joint probabilities. implying that q(x. y) Its n-step transition probabilities are q(x. Then (Wn . We shall consider the laws of two processes. Xn and Yn would be independent and distributed according to π. for all n ≥ 1. and it also makes sense to consider their ﬁrst meeting time: T := inf{n ≥ 0 : Xn = Yn }. we have not deﬁned joint probabilities such as P (Xn = x. (Indeed. y) := π(x)π(y). We know that there is a unique stationary distribution.4 Proof of the fundamental stability theorem First. denoted by π.y′ .x′ ). had we started with both X0 and Y0 independent and distributed according to π. n ≥ 0).(y. So both processes are positive recurrent aperiodic Markov chains with the same transition probabilities. the transition probabilities.y′ ) := P Wn+1 = (x′ . y) ∈ S × S. and so (Wn . review the assumptions of the theorem: the chain is irreducible. in other words. Notice that σ(x. It is very important to note that we have only deﬁned the law of X and the law of Y but we have not deﬁned a joint law. we have that px. n ≥ 0). Then n→∞ lim P (T > n) = P (T = ∞) = 0.x′ > 0 and py. Consider the process Wn := (Xn . is a stationary distribution for (Wn .x′ py.y′ ) = px. (x. n ≥ 0) is a Markov chain with state space S × S. they diﬀer only in how the initial states are chosen.Assume now that P (T < ∞) = 1. Let pij denote.y′ ) for all large n. From Theorem 3 and the aperiodicity assumption. Its initial state W0 = (X0 .

This is not irreducible.0). X arbitrary distribution Z stationary distribution Y T Thus. y) ∈ S × S. Hence (Wn . |P (Zn = x) − P (Yn = x)| ≤ P (T > n). 50 . p10 = 1. p00 = 0. after the meeting time. which converges to zero. Since P (Zn = x) = P (Xn = x). suppose that S = {0. say.1) = q(1. 1)} and the transition probabilities are q(0. Example showing that the aperiodicity assumption is essential for the method used in the proof: If pij are the transition probabilities of an irreducible chain X and if Y is an independent copy of X. By the coupling inequality. ≥ 0) is a Markov chain with transition probabilities pij . (Zn . Clearly. Hence (Zn . 0). Obviously. In particular. (1. uniformly in x. Deﬁne now a third process by: Zn := Xn 1(n < T ) + Yn 1(n ≥ T ). P (Xn = x) = P (Zn = x). For example. However.(0. 1} and the pij are p01 = p10 = 1. 0). n ≥ 0). n ≥ 0) is identical in law to (Xn . if we modify the original chain and make it aperiodic by.1) = 1. precisely because P (T < ∞) = 1.1).01.1). n ≥ 0) is positive recurrent. as n → ∞. (1. and since P (Yn = x) = π(x). then the process W = (X. (0. p01 = 0. for all n and x. Z0 = X0 which has an arbitrary distribution µ. Then the product chain has state space S ×S = {(0.(0. Zn = Yn . y) > 0 for all (x. we have |P (Xn = x) − π(x)| ≤ P (T > n).0). Zn equals Xn before the meeting time of X and Y . by assumption.Therefore σ(x. Moreover.99. 1).0) = q(1.0) = q(0. Y ) may fail to be irreducible.(1. T is a meeting time between Z and Y . In particular.(1. P (T < ∞) = 1.

p10 = 1. 18. customary to have a consistent terminology. .01. The reader is asked to pay attention to this terminology which is diﬀerent from the one used in the Markov chain literature.e. It is not diﬃcult to see that 6 We put “wrong” in quotes. Parry. most people use the term “ergodic” for an irreducible.6 Periodic chains Suppose now that we have a positive recurrent irreducible chain. 0. 51 . per se wrong. π(1) ≈ 0. then (n) lim p n→∞ 00 (n) = lim p10 = π(0). 1981). limn→∞ p00 does not exist. n→∞ lim p00 p01 (n) (n) p10 p11 (n) (n) = 0. i.f. Indeed. and so on. p01 = 0. however.99. positive recurrent and aperiodic Markov chain. C1 .497. 6 The reason is that the term “ergodic” means something speciﬁc in Mathematics (c. Suppose X0 = 0. n→∞ (n) where π(0).503. Therefore p00 = 1 for even n (n) and = 0 for odd n. . in particular. Terminology cannot be. π(0) ≈ 0. π(0) + π(1) = 1. However. It is. π(1) satisfy 0.4 and. But this is “wrong”. By the results of §5.503 0. from Cd back to C0 .503 0. if we modify the chain by p00 = 0. Then Xn = 0 for even n and = 1 for odd n. Example showing that the aperiodicity assumption is essential for the limiting result: Consider the same chain as above: p01 = p10 = 1.497 . Suppose that the period of the chain is d. . A Markov chain which is irreducible and positive recurrent and aperiodic will be called mixing. we can decompose the state space into d classes C0 .5 Terminology A Markov chain which is irreducible and positive recurrent will be called ergodic. n→∞ (n) (n) lim p n→∞ 11 = lim p01 = π(1).then the product chain becomes irreducible. we know we have a unique stationary distribution π and that π(x) > 0 for all states x. Cd−1 . from C1 to C2 . and this is expressed by our terminology.99π(0) = π(1).497 18. . In matrix notation. Theorem 4. By the results of Section 14. such that the chain moves cyclically between them: from C0 to C1 . Thus.

. n ≥ 0) is aperiodic. . n ≥ 0) to any of the sets Cr we obtain a positive recurrent irreducible aperiodic chain. This means that: Theorem 18. as n → ∞. . Cd−1 . Summing up over i ∈ Cr we obtain αr = i∈Cr π(i) = j∈Cr−1 π(j) i∈Cr pji = j∈Cr−1 π(j) = αr−1 . after reduction by d) and. Then. The chain (Xnd . i ∈ S. . All states have period 1 for this chain. We shall show that Lemma 9. i ∈ Cr . . we have pij for all i. namely C0 .Lemma 8. starting from i. i ∈ Cr . as n → ∞. → dπ(j). Then pij (r+nd) (nd) = Pi (Xnd = j) → dπ(j). d − 1. Then pji = 0 unless j ∈ Cr−1 . . Hence if we restrict the chain (Xnd . the original chain enters Cℓ in r = ℓ − k steps (where r is taken to be a number between 0 and d − 1. j ∈ Cℓ . and the previous results apply. the (unique) stationary distribution of (Xnd . 52 . Suppose that X0 ∈ Cr . and let r = ℓ − k (modulo d). j ∈ Cℓ . Let αr := i∈Cr π(i). . Then. Can we also ﬁnd the limit when i and j do not belong to the same class? Suppose that i ∈ Ck . . π(i) = 1/d. Suppose that pij are the transition probabilities of a positive recurrent irreducible chain with period d. n ≥ 0) consists of d closed communicating classes. if sampled every d steps. So π(i) = j∈Cr−1 π(j)pji . 1. Since the αr add up to 1. each one must be equal to 1/d. j ∈ Cr . Suppose i ∈ Cr . Let i ∈ Ck . where π is the stationary distribution. i∈Cr for all r = 0. Since (Xnd . it remains in Cℓ . n ≥ 0) is dπ(i). We have that π satisﬁes the balance equations: π(i) = j∈S π(j)pji . thereafter. Proof.

Hence pi k i. p3 . We also saw in Corollary (n) 7 that. (nk ) k → pi . . By construction. . then pij → π(j) (Theorems 17). .e. It is clear how we (ni ) k continue. For each i we have found subsequence ni . and so on. for all i ∈ S. whose proof we omit7 : 7 See Br´maud (1999). The proof will be based on two results from Analysis. . . for each The second result is Fatou’s lemma.. the ﬁrst of which is known as HellyBray Lemma: Lemma 10. as k → ∞. k = 1. Now consider the contained between 0 and 1 there must be a exists and equals. p(n) = (n) (n) (n) (n) (n) ∞ p1 . say. for each i. Then we can ﬁnd a i=1 pi sequence n1 < n2 < n3 < · · · of integers such that limk→∞ pi Proof.) is a k k subsequence of each (ni . If the period is d then the limit exists only along a selected subsequence (Theorem 18). i. . a probability distribution on N. are contained between 0 and 1. the sequence n2 of the second row k is a subsequence of n1 of the ﬁrst row. 19. as n → ∞. pi .. 2. exists for all i ∈ N. such that limk→∞ p1 k 3 (n1 ) numbers p2 k . If j is a null recurrent state then pij → 0. Recall that we know what happens when j is positive recurrent and i is in the same commu(n) nicating class as j If the period is 1. k = 1. say. p2 . . p1 . . . p2 . . 2. for all i ∈ S. schematically. . . then pij → 0 as n → ∞. k = 1. p1 p1 (n1 ) 1 p1 (n1 ) 2 (n2 ) 2 p1 (n1 ) 3 (n2 ) 3 (n3 ) 3 (n2 ) 1 (n3 ) 1 ··· ··· ··· ··· → → → ··· p1 p2 p3 ··· p1 p1 ··· p1 p1 ··· (n3 ) 2 p1 ··· Now pick the numbers in the diagonal. Since they are (n2 ) subsequence n2 of n1 such that limk→∞ p1 k k k (nk ) (n) as n → ∞. . Since the numbers p1 n1 1 n1 2 (n) (n1 ) < < n1 < · · · .1 Limiting probabilities for null recurrent states Theorem 19.). . the sequence n3 of the third row is a subsequence k k of n2 of the second row. . there must be a subsequence exists and equals. So the diagonal subsequence (nk . 2. We have. . 2. .19 Limiting theory for Markov chains* We wish to complete our discussion about the possible limits of the n-step transition prob(n) abilities pij . . e 53 . if j is transient. k = 1. where = 1 and pi ≥ 0 for all i. 2. . Consider. for each n = 1. such that limk→∞ pi k exists and equals. say.

Suppose that for some i. x∈S (n′ ) Since aj > 0. for some x0 ∈ S. equalities. are sequences of nonnegative numbers then ∞ (n) yi lim inf xi i=1 n→∞ (n) ∞ ≤ lim inf n→∞ yi x i . We wish to show that these inequalities are. y∈S x ∈ S. i. and this is a contradiction (the number αx = αx cannot be strictly larger than itself). αy pyx . actually. pix k = 1. i. This would imply that αx > x∈S x∈S y∈S x∈S αy pyx = y∈S αy x∈S pyx = y∈S αy . If (yi . . y∈S Since x∈S αx ≤ 1. π is a stationary distribution. by assumption. αx ≤ lim inf k→∞ (n′ ) for all x ∈ S. Suppose that one of them is not. we have that π(x) := αx y∈S αy satisﬁes the balance equations and has x∈S π(x) = 1. 2. Hence we can ﬁnd a subsequence nk such that k→∞ lim pij k = αj > 0. k→∞ (n′ +1) = y∈S piy k pyx . and the last equality is obvious since we have a probability vector. i ∈ N).e.e. But Pn+1 = Pn P. (n′ ) lim pix k (n′ +1) ≥ y∈S k→∞ lim piy k pyx . Now π(j) > 0 because αj > 0. i. we can pick subsequence (n′ ) of (nk ) such that k k→∞ lim pix k = αx . pix k Therefore. n = 1. (xi . i ∈ N). x ∈ S. state j is positive recurrent: Contradiction. x ∈ S .e. it is a strict inequality: αx0 > y∈S αy pyx0 . the sequence (n) pij does not converge to 0. i=1 (n) Proof of Theorem 19. Let j be a null recurrent state. 54 . by Lemma 11. .Lemma 11. Thus. . (n ) (n ) Consider the probability vector By Lemma 10. (n′ ) Hence αx ≥ αy pyx . By Corollary 13. pix k . we have 0< x∈S The inequality follows from Lemma 11.

Pµ (Xn−m = j) → π C (j). From the Strong Markov Property. Let ϕC (i) := Pi (HC < ∞). Proof. The result when the period j is larger than 1 is a bit awkward to formulate. γ < 1): γ 1−γ 1 1 2 1−α α 3 β 4 1−β 5 1 55 . HC ≤ n) n (n) = m=1 Pi (HC = m)Pi (Xn = j | HC = m). Let C be the communicating class of j and let π C be the stationary distribution associated with C. when j is a positive recurrent state but i and j do not necessarily belong to the same communicating class. But Theorem 17 tells us that the limit exists regardless of the initial distribution.2. In general. Example: Consider the following chain (0 < α. E j Tj where Tj = inf{n ≥ 1 : Xn = j}. Remark: We can write the result of Theorem 20 as (n) lim p n→∞ ij = fij . Let i be an arbitrary state. This is a more old-fashioned way of getting the same result. as n → ∞. so we will only look that the aperiodic case. Indeed. as n → ∞. β. If i also belongs to C then ϕC (i) = 1 and so the result follows from Theorem 17.2 The general case (n) It remains to see what the limit of pij is. Let j be a positive recurrent state with period 1. we have Pi (Xn = j | HC = m) = Pµ (Xn−m = j). as in §10. the result can be obtained by the so-called Abel’s Lemma of Analysis. Hence (n) lim p n→∞ ij ∞ = π (j) m=1 C Pi (HC = m) = π C (j)Pi (HC < ∞). and fij = ϕC (j).19. In this form. that is. and fij = Pi (Tj < ∞). which is what we need. we have pij = Pi (Xn = j) = Pi (Xn = j. π C (j) = 1/Ej Tj . Then (n) lim p n→∞ ij = ϕC (i)π C (j). Let HC := inf{n ≥ 0 : Xn ∈ C}. starting with the ﬁrst hitting time decomposition relation of Lemma 7. Theorem 20. Let µ be the distribution of XHC given that X0 = i and that {HC < ∞}.

Let f be a reward function such that |f (x)|ν [a] (x) < ∞. π C2 (5) = 1. 4} (inessential states). ranging from Mathematical Physics and Dynamical Systems to Probability and Statistics. p44 → 0. The ergodic hypothesis in physics says that the proportion of time that a particle spends in a particular region of its “phase space” is proportional to the volume of this region. We have ϕ(1) = ϕ(2) = 1. we assumed the existence of a positive recurrent state. 2} (closed class). p41 → ϕ(4)π C1 (1). while. information about its velocity. Theorem 21. 56 (20) x∈S . Ergodic Theory is nowadays an area that combines many ﬁelds of Mathematics and Science. 1 − αβ 1 − αβ Both C1 . in Theorem 14. ϕ(5) = 0. C2 = {5} (absorbing state). Theorem 14 is a type of ergodic theorem. The phase (or state) space of a particle is not just a physical space but it includes. Pioneers in the are were. ϕ(3) = p31 → ϕ(3)π C1 (1). π C1 (2) = . Recall that. the 19-th century French mathematicians Henri Poincar´ and e Joseph Liouville. We will avoid to give a precise deﬁnition of the term ‘ergodic’ but only mention what constitutes a result in the area. from ﬁrst-step analysis. Consider the quantity ν [a] (x) deﬁned as the average number of visits to state x between two successive visits to state a–see 16. 1+γ 1+γ Let ϕ(i) be the probability that class C1 will be reached starting from state i. C2 are aperiodic classes. (n) (n) p35 → (1 − ϕ(3))π C2 (5). For example. 20 Ergodic theorems An ergodic theorem is a result about time-averages of random processes. whence 1−α β(1 − α) . We now generalise this. (n) (n) p43 → 0. as n → ∞. for example.The communicating classes are I = {3. Hence. And so is the Law of Large Numbers itself (Theorem 13). ϕ(4) = (1 − β)ϕ(5) + βϕ(3). p42 → ϕ(4)π C1 (2). (n) (n) p34 → 0. Suppose that a Markov chain possesses a recurrent state a ∈ S and starts from an initial state X0 such that P (Ta < ∞) = 1. The subject of Ergodic Theory has its origins in Physics. The stationary distributions associated with the two closed classes are: 1 γ π C1 (1) = . ϕ(3) = (1 − α)ϕ(2) + αϕ(4) . p45 → (1 − ϕ(4))π C2 (5). (n) (n) p32 → ϕ(3)π C1 (2). Similarly. (n) (n) p33 → 0. ϕ(4) = . C1 = {1. among others.

57 . so the numerator converges to f /Ea Ta . we have that the denominator equals Nt /t and converges to 1/Ea Ta . respectively. We only need to check that EG1 = f . Now. then. t t→∞ t t t with probability one. But this is precisely the condition (20). we have C := i∈S ν(i) < ∞. we have t→∞ lim 1 last 1 G = lim Gﬁrst = 0. 21 Finite chains We take a brief look at some things that are speciﬁc to chains with ﬁnitely many states. So there are always recurrent states. and this is left as an exercise. which is the quantity denoted by f in Theorem 14. we have Nt /t → 1/Ea Ta .Then P where f := x∈S t→∞ lim t n=0 t n=0 f (Xn ) 1(Xn = a) =f =1 . As in Theorem 14. Proof. If so. From proposition 2 we know that the balance equations ν = νP have at least one solution. As in the proof of Theorem 14. Glast are the total rewards over the ﬁrst and t t a a last incomplete cycles. So if we divide by t the numerator and denominator of the fraction in (21). The reader is asked to revisit the proof of Theorem 14. as we saw in the proof of Theorem 14. Replacing n by the subsequence Nt . and this justiﬁes the terminology of §18. r=0 with probability 1 as long as E|G1 | < ∞. since the number of states is ﬁnite. (21) f (x)ν [a] (x). Remark 1: The diﬀerence between Theorem 14 and 21 is that. The Law of Large Numbers tells us that 1 lim n→∞ n n−1 Gr = EG1 .5. Remark 2: If a chain is irreducible and positive recurrent then the Ergodic Theorem Theorem 14 holds. we obtain the result. we assumed positive recurrence of the ground state a. in the former. Note that the denominator t n=0 1(Xn = a) equals the number Nt of visits to state a up to time t. we break the total reward as follows: t Nt −1 f (Xn ) = n=0 r=1 Gr + Gﬁrst + Glast . while Gﬁrst . t t where Gr = T (r) ≤t<T (r+1) f (Xt ). Then at least one of the states must be visited inﬁnitely often.

for all n. Then the chain is reversible if and only if the detailed balance equations hold: π(i)pij = π(j)pji . We summarise the above discussion in: Theorem 22. where P is the transition matrix and Π is a matrix all the rows of which are equal to π. In addition. we have Pn → Π. We say that it is time-reversible. X1 = j) = P (X1 = i. Taking n = 1 in the deﬁnition of reversibility. simply. it will look the same. this state must be positive recurrent. this means that it is stationary. Consider an irreducible Markov chain with ﬁnitely many states. X0 ). Xn has the same distribution as X0 for all n. At least some state i must have π(i) > 0. Proof. Consider an irreducible Markov chain with transition probabilities pij and stationary distribution π. . like playing a ﬁlm backwards. the chain is irreducible. 22 Time reversibility Consider a Markov chain (Xn . j ∈ S. Theorem 23. In particular. a ﬁnite Markov chain always has positive recurrent states. reversible if it has the same distribution when the arrow of time is reversed. Suppose ﬁrst that the chain is reversible. 58 . . i ∈ S. Lemma 12. the stationary distribution is unique. . or. A reversible Markov chain is stationary. as n → ∞. Because the process is Markov. then we instantly know that the ﬁlm is played in reverse. actually. i. .Hence 1 ν(i). because positive recurrence is a class property (Theorem 15). that is. Therefore π is a stationary distribution. More precisely. running backwards. X0 ). and run the ﬁlm backwards. But if we ﬁlm a man who walks and run the ﬁlm backwards. X1 ) has the same distribution as (X1 . in a sense. π(i) := Hence. Xn ) has the same distribution as (Xn . . Xn−1 . Xn ) has the same distribution as (Xn . It is. Reversibility means that (X0 . . . in addition. if we ﬁlm a standing man who raises his arms up and down successively. X0 ) for all n. Another way to put this is by saying that observing any part of the process will not reveal any information about whether the process runs forwards or backwards in time. If. By Corollary 13. Proof. . Then the chain is positive recurrent with a unique stationary distribution π. . without being able to tell that it is. If. the chain is aperiodic. by Theorem 16. . (X0 . i. X0 = j). . n ≥ 0). . . P (X0 = i. . . For example. we see that (X0 . Xn−1 . C satisﬁes the balance equations and the normalisation condition i∈S π(i) = 1. the ﬁrst components of the two vectors must have the same distribution. then all states are positive recurrent. . in addition.e. X1 . X1 .

using the DBE. X1 . But P (X0 = i. . π(i)pij pjk pki = [π(j)pji ]pjk pki = [π(j)pjk ]pji pki = [π(k)pkj ]pji pki = [π(k)pki ]pji pkj = [π(i)pik ]pji pkj = π(i)pik pkj pji Since the chain is irreducible. . Notes: 1) A necessary condition for reversibility is that pij > 0 if and only if pji > 0. it is easy to show that. we have π(i) > 0 for all i. and pi1 i2 (m−3) → π(i2 ). pji 59 (m−3) → π(i1 ). Xn ) has the same distribution as (Xn . . j. and write. . k. . letting m → ∞. . . we have separation of variables: pij = f (i)g(j). Hence the DBE hold. pi1 i2 pi2 i3 · · · pim−1 im = pi1 im pim im−1 · · · pi2 i1 . im ∈ S and all m ≥ 3. By induction. . im−1 we obtain (m−3) (m−3) pi1 i2 pi2 i1 = pi1 i2 pi2 i1 If the chain is aperiodic. k. . Suppose ﬁrst that the chain is reversible. Hence cancelling the π(i) from the above display we obtain the KLC for m = 3. Now pick a state k and write. for all n. Pick 3 states i. i2 .for all i and j. . X2 = i). The DBE imply immediately that (X0 . So the detailed balance equations (DBE) hold. X1 = i) = π(j)pji . . . The same logic works for any m and can be worked out by induction. if there is an arrow from i to j in the graph of the chain but no arrow from j to i. i4 . (X0 . Conversely. So the chain is reversible. . suppose the DBE hold. X1 = j) = π(i)pij . Xn−1 . then the chain cannot be reversible. Summing up over all choices of states i3 . suppose that the KLC (22) holds. X2 = k) = P (X0 = k. and this means that P (X0 = i. (22) Proof. . X1 = j. . X2 ) has the same distribution as (X2 . for all i. X1 = j. X1 . X1 . we have pi2 i1 implying that π(i1 )pi1 i2 = π(i2 )pi2 i1 . and hence (X0 . Then the chain is reversible if and only if the Kolmogorov’s loop criterion holds: for all i1 . X0 ). In other words. X0 ). Theorem 24. j. π(i)pij pjk = [π(j)pji ]pjk = [π(j)pjk ]pji = π(k)pkj pji . Conversely. We will show the Kolmogorov’s loop criterion (KLC). . P (X0 = j. then. X1 ) has the same distribution as (X1 . X0 ). . These are the DBE. using DBE. and the chain is reversible. 2) The DBE are equivalent to saying that the ratio pij /pji can be written as a product of a function of i and a function of j. Consider an aperiodic irreducible Markov chain with transition probabilities pij and stationary distribution π.

1) which gives π(i)pij = π(j)pji . meaning that it has no loops. j for which pij > 0. Hence the original chain is reversible. The reason is as follows. If we remove the link between them. then we need. And hence π can be found very easily.(Convention: we form this ratio only when pij > 0. so we have pij g(j) = . to guarantee.e. This gives the following method for showing that a ﬁnite irreducible chain is reversible: • Form the ratio pij /pji and show that the variables separate. i. For example. that the chain is positive recurrent. the detailed balance equations. Example 1: Chain with a tree-structure. Consider a positive recurrent irreducible Markov chain such that if pij > 0 then pji > 0. by assumption. • If the chain has inﬁnitely many states. So g satisﬁes the balance equations. the two oriented edges from i to j and from j to i by a single edge without arrow. pji g(i) Multiplying by g(i) and summing over j we ﬁnd g(i) = j∈S g(j)pji . Ac be the sets in the two connected components. Form the graph by deleting all self-loops. A) (see §4. the graph becomes disconnected. Ac ) = F (Ac .) Let A. using another method. Pick two distinct states i. and the arrow from j to j is the only one from Ac to A. If the graph thus obtained is a tree. namely the arrow from i to j. and by replacing. f (i)g(i) must be 1 (why?). there isn’t any. then the chain is reversible. Hence the chain is reversible. for each pair of distinct states i. (If it didn’t. and. The balance equations are written in the form F (A. in addition. consider the chain: Replace it by: This is a tree. there would be a loop.) Necessarily. j for which pij > 0. 60 . There is obviously only one arrow from AtoAc .

In all cases. The stationary distribution assigns to a state probability which is proportional to its degree. Consider an undirected graph G with vertices S. Hence we conclude that: 1. There are two cases: If i. i∈S Since δ(i) = 2|E|. π(i) = 1.Example 2: Random walk on a graph. For each state i deﬁne its degree δ(i) = number of neighbours of i. and a set E of undirected edges. If i. Each edge connects a pair of distinct vertices (states) without orientation. 2. To see this let us verify that the detailed balance equations hold. π(i)pij = π(j)pji . if j is a neighbour of i otherwise. The Markov chain with these transition probabilities is called a random walk on a graph (RWG). the DBE hold. States j are i are called neighbours if they are connected by an edge. 0. j are neighbours 1 1 then π(i)pij = Cδ(i) δ(i) = C and π(j)pji = Cδ(j) δ(j) = C. so both members are equal to C. etc. 61 . i. 1 4 0 2 3 Here we have p12 = p10 = p14 = 1/3. For example. j are not neighbours then both sides are 0. Suppose the graph is connected. i∈S where |E| is the total number of edges. The constant C is found by normalisation. We claim that π(i) = Cδ(i). we have that C = 1/2|E|.e. Now deﬁne the the following transition probabilities: pij = 1/δ(i). where C is a constant. The stationary distribution of a RWG is very easy to ﬁnd. a ﬁnite set. p02 = 1/4. A RWG is a reversible chain.

. i.23 New Markov chains from old ones* We now deal with the question: if we have a Markov chain (Xn ).3 Subordination Let (Xn . if. . . St+1 = St + ξt .) has time-homogeneous transition probabilities: P (Yk+1 = j|Yk = i) = pij . irreducible and positive recurrent). 1. If the subsequence forms an arithmetic progression. If π is the stationary distribution of the original chain then π/π(A) is the stationary distribution of (Yr ). 23. . . if nk = n0 + km. In this case. A r = 1. 2. how can we create new ones? 23. . 62 n ∈ N. .e. a sequence of integers (nk )k≥0 such that 0 ≤ n0 < n1 < n2 < · · · and by letting Yk := Xnk . k = 0. Let (St . 2. . . 2.i. in addition. this process is also a Markov chain and it does have time-homogeneous transition probabilities. π is a stationary distribution for the original chain.2 Watching a Markov chain when it visits a set Suppose that the original chain is ergodic (i. Let A be (r) a set of states and let TA be the r-th visit to A. 1. where pij are the m-step transition probabilities of the original chain. k = 0. random variables with values in N and distribution λ(n) = P (ξ0 = n). . . t ≥ 0. i. t ≥ 0) be independent from (Xn ) with S0 = 0. 1. 2. . . k = 0. . where ξt are i. Deﬁne Yr := XT (r) . . (m) (m) 23. in general.) has the Markov property but it is not time-homogeneous. The sequence (Yk .d. then (Yk . then it is a stationary distribution for the new chain.e. Due to the Strong Markov Property. 2. The state space of (Yr ) is A. 1. . n ≥ 0) be Markov with transition probabilities pij . . k = 0.1 Subsequences The most obvious way to create a new Markov chain is by picking a subsequence.e.

. j. a Markov chain. that there is a function Q : S ′ × S ′ → R such that P (Yn+1 = β | Xn = i) = Q(f (i). Yn−1 = αn−1 }. Then (Yn ) has the Markov property. . αn−1 ∈ S ′ and let Yn−1 be the event Yn−1 := {Y0 = α0 .4 Deterministic function of a Markov chain If (Xn ) is a Markov chain with state space S and if f : S → S ′ is some function from S to some countable set S ′ . ∞ λ(n)pij . β. however. α1 . we have: Lemma 13. then the process Yn = f (Xn ).e. .Then Yt := XSt . Indeed. n≥0 63 . . is a Markov chain with P (Yt+1 = j|Yt = i) = n=1 t ≥ 0. is NOT. If. We need to show that P (Yn+1 = β | Yn = α. . . . conditional on Yn the past is independent of the future. . (Notation: We will denote the elements of S by i.) More generally.. Suppose that P (Yn+1 = β | Xn = i) depends on i only through f (i). f is a one-to-one function then (Yn ) is a Markov chain. i. knowledge of f (Xn ) implies knowledge of Xn and so. . (n) 23. . in general. β). . Proof. . . Fix α0 .. Yn−1 ) = P (Yn+1 = β | Yn = α). and the elements of S ′ by α. .

We have that P (Yn+1 = β|Xn = i) = Q(|i|. β) i∈S P (Yn = α|Xn = i. Yn−1 )P (Xn = i|Yn−1 ) P (Yn = α|Yn−1 ) = Q(α. the absolute value of next state will either increase by 1 or decrease by 1 with probability 1/2. then |Xn+1 | = 1 for sure. β). β) P (Yn = α|Xn = i. if β = 1. Yn = α. depends on i only through |i|. Deﬁne Yn := |Xn |.for all choices of the variables involved. β)P (Xn = i | Yn = α. β) P (Yn = α. i.e. On the other hand. Yn−1 ) Q(f (i). observe that the function | · | is not one-to-one. for all i ∈ Z and all β ∈ Z+ . In other words. conditional on {Xn = i}. β) := 1. because the probability of going up is the same as the probability of going down. Xn = i|Yn−1 )P (Xn = i|Yn−1 ) P (Yn = α|Yn−1 ) P (Yn = α|Yn−1 ) = Q(α. Then (Yn ) is a Markov chain. a Markov chain with transition probabilities Q(α. if β = α ± 1. But we will show that P (Yn+1 = β|Xn = i). Example: Consider the symmetric drunkard’s random walk. But P (Yn+1 = β | Yn = α. Yn−1 ) = i∈S P (Yn+1 = β | Xn = i. indeed. and so (Yn ) is. P (Xn+1 = i ± 1|Xn = i) = 1/2. Yn−1 )P (Xn = i|Yn−1 ) P (Yn = α|Yn−1 ) = i∈S = i∈S = i∈S Q(f (i). β). α = 0. regardless of whether i > 0 or i < 0. 64 . Thus (Yn ) is Markov with transition probabilities Q(α. β). To see this. Yn−1 )P (Xn = i | Yn = α. Yn−1 ) Q(f (i). if i = 0. Indeed. α > 0 Q(α. β) 1(α = f (i))P (Xn = i|Yn−1 ) P (Yn = α|Yn−1 ) = Q(α. β ∈ Z+ . if we let 1/2. i ∈ Z. i ∈ Z. β) P (Yn = α|Yn−1 ) i∈S = Q(α. and if i = 0. β).

and. (and independent of the initial population size X0 –a further assumption). so we will ﬁnd some diﬀerent method to proceed. But here is an interesting special case. and following the same distribution. If we know that Xn = i then Xn+1 is the number of successful births amongst i individuals. . . . 2. . which has a binomial distribution: i j pij = p (1 − p)i−j . It is a Markov chain with time-homogeneous transitions. 1. .. we have that (Xn . We distinguish two cases: Case I: If p0 = P (ξ = 0) = 0 then the population can only grow. 0 But 0 is an absorbing state. Hence all states i ≥ 1 are inessential. j In this case. Then P (ξ = 0) + P (ξ ≥ 2) > 0. .d. Example: Suppose that p1 = p. with probability 1.24 24. . Case II: If p0 = P (ξ = 0) > 0 then state 0 can be reached from any other state: indeed. k = 0. .1 Applications Branching processes: a population growth application We describe a basic model for population growth known as the Galton-Watson process. Therefore. independently from one another. if the current population is i there is probability pi that everybody gives birth to zero oﬀspring. since ξ (0) . ξ (n) ). a hard problem. Thus. ξ (1) . each individual gives birth to at one child with probability p or has children with probability 1 − p. if the 65 (n) (n) . If we let ξ (n) = ξ1 . Hence all states i ≥ 1 are transient. Then the size of the (n + 1)-st generation is: (n) (n) (n) Xn+1 = ξ1 + ξ2 + · · · + ξXn . If p1 = P (ξ = 1) = 1 then Xn = X0 + n and this is a trivial case. . The idea is this: At each point of time n we have a number Xn of individuals each of which gives birth to a random number of oﬀspring. are i. in general. Back to the general case. Computing the transition probabilities pij = P (Xn+1 = j|Xn = i) is. the population cannot grow: it will eventually reach 0. 0 ≤ j ≤ i. So assume that p1 = 1. We ﬁnd the population (n) at time n + 1 by adding the number of oﬀspring of each individual.i. Let ξk be the number of oﬀspring of the k-th individual in the n-th generation. We let pk be the probability that an individual will bear k oﬀspring. one question of interest is whether the process will become extinct or not. ξ2 . The state 0 is irrelevant because it cannot be reached from anywhere. if 0 is never reached then Xn → ∞. n ≥ 0) has the Markov property. In other words. then we see that the above equation is of the form Xn+1 = f (Xn . and 0 is always an absorbing state. therefore transient. The parent dies immediately upon giving birth. p0 = 1 − p. We will not attempt to compute the pij neither draw the transition diagram because they are both futile.

We then have that D = D1 ∩ · · · ∩ Di . Let ε be the extinction probability of the process. and. . . we have n→∞ lim ϕn (0) = lim P (Xm = 0 for all m ≥ n) n→∞ = P (there exists n such that Xm = 0 for all m ≥ n) = P (D) = ε. The result is this: Proposition 5. Proof. and P (Dk ) = ε. when we start with X0 = i. We wish to compute the extinction probability ε(i) := Pi (D). if we start with X0 = i in ital ancestors. Obviously. 66 . ϕn (0) = P (Xn = 0). . If µ ≤ 1 then the ε = 1. Consider a Galton-Watson process starting with X0 = 1 initial ancestor. If µ > 1 then ε < 1 is the unique solution of ϕ(z) = z. . ε(0) = 1. let Dk be the event that the branching process corresponding to the k-th initial ancestor dies. This function is of interest because it gives information about the extinction probability ε. Indeed. each one behaves independently of one another.population does not become extinct then it must grow to inﬁnity. i. For each k = 1. where ε = P1 (D). Let ∞ µ = Eξ = k=1 kpk be the mean number of oﬀspring of an individual. since Xn = 0 implies Xm = 0 for all m ≥ n. Indeed. we can argue that ε(i) = εi . Let D be the event of ‘death’ or extinction: D := {Xn = 0 for some n ≥ 0}. We let ϕn (z) := Ez Xn ∞ = k=0 z k P (Xn = k) be the probability generating function of Xn . Therefore the problem reduces to the study of the branching process starting with X0 = 1. Let ∞ ϕ(z) := Ez ξ = k=0 z k pk be the probability generating function of ξ. so P (D|X0 = i) = εi . For i ≥ 1.

This means that ε = 1 (see left ﬁgure below). In particular.) 67 . = i=0 ∞ E z ξ 1 z ξ 2 · · · z ξ Xn E z ξ1 z ξ2 · · · z ξi (n) (n) (n) (n) (n) (n) (n) (n) 1(Xn = i) 1(Xn = i) (n) = i=0 ∞ = i=0 ∞ E[z ξ1 ] E[z ξ2 ] · · · E[z ξi ] E[1(Xn = i)] ϕ(z)i P (Xn = i) = i=0 = E ϕ(z)Xn = ϕn (ϕ(z)). Since both ε and ε∗ satisfy z = ϕ(z). ϕ(ϕn (0)) → ϕ(ε). ϕn (0) ≤ ε∗ . recall that ϕ is a convex function with ϕ(1) = 1. and. By induction. then the graph of ϕ(z) does intersect the diagonal at some point ε∗ < 1. (See right ﬁgure below. Hence ϕ2 (z) = ϕ1 (ϕ(z)) = ϕ(ϕ(z)). Therefore ε = ε∗ . and so on. all we need to show is that ε < 1. in fact. Notice now that µ= d ϕ(z) dz . and since the latter has only two solutions (z = 0 and z = ε∗ ).Note also that ϕ0 (z) = 1. But ϕn+1 (0) → ε as n → ∞. z=1 Also. Also ϕn (0) → ε. But 0 ≤ ε∗ . Hence ϕ(0) ≤ ϕ(ε∗ ) = ε∗ . We then have ϕn+1 (z) = Ez Xn+1 = E z ξ1 z ξ2 · · · z ξXn ∞ (n) (n) (n) ϕ1 (z) = ϕ(z). ϕ3 (z) = ϕ2 (ϕ(z)) = ϕ(ϕ(ϕ(z))) = ϕ(ϕ2 (z)). if the slope µ at z = 1 is ≤ 1. Therefore. But ϕn (0) → ε. since ϕ is a continuous function. the graph of ϕ(z) lies above the diagonal (the function z) and so the equation z = ϕ(z) has no solutions other than z = 1. Therefore ε = ϕ(ε). ϕn+1 (0) = ϕ(ϕn (0)). On the other hand. if µ > 1. ϕn+1 (z) = ϕ(ϕn (z)). ε = ε∗ . Let us show that. So ε ≤ ε∗ < 1.

d. there is no time to make a prior query whether a host has something to transmit or not. 5. we let hosts transmit whenever they like. . So if there are. then the transmission is not successful and the transmissions must be attempted anew. . 2. 1. in a simpliﬁed model. then max(Xn + An − 1. 1. otherwise.2 The ALOHA protocol: a computer communications application This is a protocol used in the early days of computer networking for having computer hosts communicate over a shared medium (radio channel). . on the next times slot. 3 hosts. 3. The problem of communication over a common channel. 2. Since things happen fast in a computer network. 1. Nowadays. Then ε = 1. . periodically. If collision happens. assume that on a time slot there are x packets amongst all hosts that are waiting for transmission (they are ‘backlogged’). say. 2. So. If s = 1 there is a successful transmission and. Ethernet has replaced Aloha. One obvious solution is to allocate time slots to hosts in a speciﬁc order. 68 . During this time slot assume there are a new packets that need to be transmitted. But this is not a good method it is a waste of time to allocate a slot to a host if that host has nothing to transmit. 3. there are x + a packets. are allocated to hosts 1. 3. Let s be the number of selected packets. there is a collision and. on the next times slot. 6. for a packet to go through. if Sn = 1. . only one of the hosts should be transmitting at a given time. k ≥ 0. Case: µ > 1. there is collision and the transmission is unsuccessful. So if we let Xn be the number of backlogged packets at the beg ginning of the n-th time slot. Each of the backlogged or newly arrived packets is selected for transmission with (a small) probability p. Abandoning this idea. We assume that the An are i. using the same frequency. An the number of new packets on the same time slot. If s ≥ 2. there are x + a − 1 packets. then time slots 0.i. 4. random variables with some common distribution: P (An = k) = λk . Xn+1 = Xn + An .. is that if more than one hosts attempt to transmit a packet.1 1 (z) (z) ε 0 1 0 ε 1 z z Case: µ ≤ 1. . Thus. and Sn the number of selected packets. Then ε < 1. 0). 24.

if Xn + An = 0. and p. Idea of proof. x+k−1 k≥1 + µ − λ1 (x + 1)pq − λk (x + k)pq k≥2 = µ − q G(x). An−1 . To compute px.e. Unless µ = 0 (which is a trivial case: no arrivals!).where λk ≥ 0 and k≥0 kλk = 1. . The reason we take the maximum with 0 in the above equation is because. Proving this is beyond the scope of the lectures. the Markov chain (Xn ) is transient.x+k for k ≥ 1.x−1 = P (An = 0. To compute px. then the chain is transient.) = x + a k x−k p q . . The Aloha protocol is unstable. deﬁned by‘ ∆(x) := E[Xn+1 − Xn |Xn = x]. where G(x) is a function of the form α + βx . The probabilities px. So we can make q x G(x) as small as we like by choosing x suﬃciently large. i. we have ∆(x) = (−1)px. we think as follows: to have a reduction of x by 1. . we have q < 1.x+k = λk (1 − (x + k)pq x+k−1 ) + λk+1 (x + k + 1)pq x+k .x−1 when x > 0. An = a. Since p > 0.e. and so q x tends to 0 much faster than G(x) tends to ∞. conditional on Xn and An (and all past states and arrivals) a distribution which is Binomial with parameters Xn + An .x−1 + k≥1 kpx. The process (Xn . Indeed. and exactly one selection: px. but we mention that it is based on the so-called drift criterion which says that if the expected change Xn+1 − Xn conditional on Xn = x is bounded below from a positive constant for all large x. we should not subtract 1. So P (Sn = k|Xn = x. selections are made amongst the Xn + An packets. k ≥ 1. we must have no new packets arriving. The main result is: Proposition 6. we think like this: to have an increase by k we either need exactly k new packets and no successful transmissions (i. The random variable Sn has. So: px.x are computed by subtracting the sum of the above from 1. Let us ﬁnd its transition probabilities. We also let µ := k≥1 kλk be the expectation of An . k 0 ≤ k ≤ x + a. Therefore ∆(x) ≥ µ/2 for all large x. Xn−1 . 69 . We here only restrict ourselves to the computation of this expected change.x+k kλk − kλk (x + k)pq x+k−1 + k≥1 x k≥1 = −λ0 xpq x−1 + = −λ0 xpq x x−1 kλk+1 (x + k + 1)pq x+k . so q x G(x) tends to 0 as x → ∞. for example we can make it ≤ µ/2. For x > 0. Sn = 1|Xn = x) = λ0 xpq x−1 . we have that the drift is positive for all large x and this proves transience. Sn should be 0 or ≥ 2) or we need k + 1 new packets and one selection (Sn = 1). n ≥ 0) is a Markov chain. where q = 1 − p.

Comments: 1.24. It is not clear that the chain is irreducible. if j ∈ L(i) pij = 1/|V |.2) and let α pij := (1 − α)pij + . The algorithm used (PageRank) starts with an initial guess π(0) for π and updates it by π (k) = π (k−1) P. The way it does that is by computing the stationary distribution of a Markov chain that we now describe. the rank of a page may be very important for a company’s proﬁts. In an Internet-dominated society. All you need to do to get high rank is pay (a lot of money). Academic departments link to each other by hiring their 70 . then i is a ‘dangling page’.3 PageRank (trademark of Google): A World Wide Web application PageRank is an algorithm used by the popular search engine Google in order to assign importance to a web page. unless i is a dangling page. So if Xn = i is the current location of the chain. known as ‘302 Google jacking’. unless he gets bored–and this happens with probability α–in which case he clicks on a random page. For each page i let L(i) be the set of pages it links to. if page i has a link to page j then we put an arrow from i to j and consider it as a directed edge of the graph. the chain moves to a random page. to compute the stationary distribution π corresponding to the Markov chain with transition matrix P = [pij ]: π = πP. PageRank. because of the size of the problem. there are companies that sell high-rank pages. So this is how Google assigns ranks to pages: It uses an algorithm. Deﬁne transition probabilities as follows: 1/|L(i)|. The web graph is a a graph set of vertices V consisting of web pages and edges representing hyperlinks: thus. if L(i) = ∅ 0. It uses advanced techniques from linear algebra to speed up computations. We pick α between 0 and 1 (typically. in the latter case. Its implementation though is one of the largest matrix computations in the world. A new use of PageRank is to rank academic doctoral programs based on their records of ﬁnding jobs for their graduates. α ≈ 0. So we make the following modiﬁcation. The interpretation is that of a ‘bored surfer’: a surfer moves according the the original Markov chain. that allows someone to manipulate the PageRank of a web page and make it much higher. There is a technique. 3. 2. If L(i) = ∅. The idea is simple. otherwise. Then it says: page i has higher rank than j if π(i) > π(j). the next location Xn+1 is picked uniformly at random amongst all pages that i links to. |V | These are new transition probabilities. Therefore. neither that it is aperiodic.

maintain this allele for the next generation. We will also assume that the parents are replaced by the children. all we need is the information outlined in this ﬁgure. If so. a dominant (say A) and a recessive (say a). Needless to say that these uses of PageRank and PageRank-like systems represent corruption at the deepest level. But an AA mating with a Aa may produce a child of type AA or one of type Aa. We would like to study the genetic diversity. 4. The alleles in an individual come in pairs and express the individual’s genotype for that gene. To simplify. Three of the square alleles are never selected.g. Lazy university administrators use PageRank-like systems for assigning a single number to a researcher’s output. meaning how many characteristics we have. a gene controlling a speciﬁc characteristic (e. at the next generation the number of alleles of type A can be found as follows: Pick an allele at random. in each generation. Alleles that are never picked up may belong to individuals who never mate or may belong to individuals who mate but their oﬀspring did not inherit this allele. we will assume that. their oﬀspring inherits an allele from each parent. So let x be the number of alleles of type A (so and 2N − x is the number of those of type a). The process repeats. 71 . We will assume that a child chooses an allele from each parent at random. We have a transition from x to y if y alleles of type A were produced. In this ﬁgure there are 4 alleles of type A (circles) and 6 of type a (squares). This means that the external appearance of a characteristic is a function of the pair of alleles. When two individuals mate. And so on. But it is a kind of corruption that is hard to trace since it is unanimously equated with modern use of Information Technology. then.4 The Wright-Fisher model: an example from biology In an organism. Imagine that we have a population of N individuals who mate at random producing a number of oﬀspring. it makes the evaluation procedure easy. One of the circle alleles is selected twice. two AA individuals will produce and AA child. 24. Since we don’t care to keep track of the individuals’ identities. Thus. colour of eyes) appears in two forms. Do the same thing 2N times. One of the square alleles is selected 4 times. 5. generation after generation. An individual can be of type AA (meaning that all its alleles are A) or aa or Aa. We are now ready to deﬁne the Markov chain Xn representing the number of alleles of type A in generation n. This number could be independent of the research quality but. this allele is of type A with probability x/2N . from the point of view of an administrator. the genetic diversity depends only on the total number of alleles of each type. What is important is that the evaluation can be done without ever looking at the content of one’s research.faculty from each other (and from themselves).

but the argument needs further justiﬁcation because “the end of time” is not a deterministic time. . . Clearly. E(Xn+1 |Xn . Xn ). M n P (ξk = 0|Xn . Let M = 2N . (Here. . . n n where. since. These are: M ϕ(0) = 0.2N = 1. and the population becomes one consisting of alleles of type a only. on the average. . the expectation doesn’t change and. . EXn+1 = EXn Xn = Xn . . . and the population becomes one consisting of alleles of type A only. Tx := inf{n ≥ 1 : Xn = x}. Similarly. on the average. X0 ) = Xn . the number of alleles of type A won’t change. Another way to represent the chain is as follows: n n Xn+1 = ξ1 + · · · + ξM . .Each allele of type A is selected with probability x/2N .d. The fact that M is even is irrelevant for the mathematical model. . . we see that 2N x 2N −y x y pxy = . . i. ϕ(x) = y=0 pxy ϕ(y). X0 ) = E(Xn+1 |Xn ) = M This implies that. .e. ϕ(x) = x/M. we should have M × ϕ(x) + 0 × (1 − ϕ(x)) = x. . The result is correct. . ϕ(M ) = 1. . since.) One can argue that. . for all n ≥ 0. M Therefore. the random variables ξ1 . y ≤ 2N − 1. ξM are i. . 0 ≤ y ≤ 2N. at “the end of time” the chain (started from x) will be either M with probability ϕ(x) or 0 with probability 1 − ϕ(x). with n P (ξk = 1|Xn . the chain gets absorbed by either 0 or 2N . 2N − 1} form a communicating class of inessential states. . the states {1. . . Since selections are independent. as usual. so both 0 and 2N are absorbing states. We show that our guess is correct by proceeding via the mundane way of verifying that it satisﬁes the ﬁrst-step analysis equations. M and thus. i. 1− y 2N 2N x Notice that there is a chance (= (1 − 2N )2N ) that no allele of type A is selected at all. . p00 = p2N. . there is a x chance (= ( 2N )2N ) that only alleles of type A are selected.e. Clearly. . Therefore P (Xn = 0 or Xn = 2N for all large n) = 1.i. Since pxy > 0 for all 1 ≤ x. conditionally on (X0 . we would like to know the probability ϕ(x) = Px (T0 > TM )P (T0 > TM |X0 = x) that the chain will eventually be absorbed by state M . 72 . X0 ) = 1 − Xn .

Let us consider (n) gxy := Px (Xn > y) = P (x + ξ0 + · · · + ξn−1 ) ∨ (ξ1 + · · · + ξn−1 ) ∨ · · · ∨ ξn−1 ∨ 0 > y .The ﬁrst two are obvious. We will apply the so-called Loynes’ method. provided that enough components are available. . on the average. while a number of them are sold to the market. we will try to solve the recursion. we ﬁnd X1 = (X0 + ξ0 ) ∨ 0 X2 = (X1 + ξ1 ) ∨ 0 = (X0 + ξ0 + ξ1 ) ∨ ξ1 ∨ 0 ··· X3 = (X2 + ξ2 ) ∨ 0 = (X0 + ξ0 + ξ1 + ξ2 ) ∨ (ξ1 + ξ2 ) ∨ ξ2 ∨ 0 Xn = (X0 + ξ0 + · · · + ξn−1 ) ∨ (ξ1 + · · · + ξn−1 ) ∨ · · · ∨ ξn−1 ∨ 0.d. If the demand is for Dn components. Let ξn := An − Dn . for all n ≥ 0. 73 . and the stock reached level zero at time n + 1. . We have that (Xn . larger than the supply per unit of time. say. The right-hand side of the last equals M y=0 M y x M y x 1− M M −y y = M = M y=1 M y=1 M! x y!(M − y)! M y 1− x M M −y y M x M M −y−1+1 x (M − 1)! (y − 1)!(M − y)! M M y=1 y−1+1 1− x = M x = M as needed. and so. Here are our assumptions: The pairs of random variables ((An . The correctness of the latter can formally be proved by showing that it does satisfy the recursion. We will show that this Markov chain is positive recurrent. We can express these requirements by the recursion Xn+1 = (Xn + An − Dn ) ∨ 0 where a ∨ b := max(a. n ≥ 0) are i. Dn ). M x M (M −1)−(y−1) x x +1− M M M −1 = 24. and E(An − Dn ) = −µ < 0. the demand is. n ≥ 0) is a Markov chain. so that Xn+1 = (Xn + ξn ) ∨ 0.i.5 A storage or queueing application A storage facility stores. the demand is only partially satisﬁed. M −1 y−1 x M y−1 1− x . there are Xn + An − Dn components in the stock. at time n + 1. Since the Markov chain is represented by this very simple recursion. Working for n = 1. Xn electronic components at time n. A number An of new components are ordered and added to the stock. Thus. b). . the demand is met instantaneously. Otherwise. 2.

. 74 . and. . using π(y) = limn→∞ P (Mn = y). . in computing the distribution of Mn which depends on ξ0 . for all n. for y ≥ 0. In particular. ξn−1 . Let Zn := ξ0 + · · · + ξn−1 Mn := Zn ∨ Zn−1 ∨ · · · ∨ Z1 ∨ 0. . instead of considering them in their original order. . the event whose probability we want to compute is a function of (ξ0 . (n) gxy = P (x + ξn−1 + · · · + ξ0 ) ∨ (ξn−2 + · · · + ξ0 ) ∨ · · · ∨ ξ0 ∨ 0 > y . . But Mn+1 = 0 ∨ max Zj 1≤j≤n+1 =0∨ d 1≤j≤n+1 max (Zj − Z1 ) + ξ0 = 0 ∨ (Mn + ξ0 ). . we ﬁnd π(y) = x≥0 π(x)pxy . . . . Then Since Mn+1 ≥ Mn we have g0y ≤ g0y (n) (n) (n+1) g0y = P (Mn > y). so we can shuﬄe them in any way we like without altering their joint distribution. . . ξn−1 ). . we may take the limit of both sides as n → ∞. the random variables (ξn ) are i. Hence. P (Mn+1 = y) = P (0 ∨ (Mn + ξ0 ) = y) = x≥0 P (Mn = x)P (0 ∨ (x + ξ0 ) = y) P (Mn = x) pxy . ξ0 ). we reverse it. however.d. . .Thus. . Indeed. ξn . . = x≥0 Since the n-dependent terms are bounded. Therefore. Notice. . we may replace the latter random variables by ξ1 . . that d (ξ0 . n→∞ n→∞ We claim that π(y) := g(y) − g(y − 1) satisﬁes the balance equations. The limit of a bounded and increasing sequence certainly exists: g(y) := lim g0y = lim P0 (Xn > y). . ξn−1 ) = (ξn−1 . (n) .i. where the latter equality follows from the fact that.

It can also be shown (left as an exercise) that if Eξn > 0 then the Markov chain is transient. as required. n→∞ Hence g(y) = P (0 ∨ max{Z1 . . 75 . . . the Markov chain may be transient or null recurrent. The case Eξn = 0 is delicate. From the Law of Large Numbers we have P ( lim Zn /n = −µ) = 1. Z2 . we will use the assumption that Eξn = −µ < 0. . n→∞ But then P ( lim Mn = 0 ∨ max{Z1 . We must make sure that it also satisﬁes y π(y) = 1. n→∞ This implies that P ( lim Zn = −∞) = 1. . Depending on the actual distribution of ξn . .} > y) is not identically equal to 1. Z2 .} < ∞) = 1.So π satisﬁes the balance equations. This will be the case if g(y) is not identically equal to 1. Here.

d 0. ±ed } otherwise 76 . namely that to go from state x to state y it must pass through all intermediate states because its value can change by at most 1 at each step. the set of d-dimensional vectors with integer coordinates. . To be able to say this. so far. This means that the transition probability pxy should depend on x and y only through their relative positions in space. . 0.g. A random walk of this type is called simple random walk or nearest neighbour random walk. here. Example 3: Deﬁne p(x) = 1 2d . .y+z for any translation z. where p1 + · · · + pd + q1 + · · · + qd = 1. .and space-homogeneity. if x = −ej . we need to give more structure to the state space S. have been processes with timehomogeneous transition probabilities. . Deﬁne pj . a random walk is a Markov chain with time. . but. . besides time. Our Markov chains.and space-homogeneous transition probabilities given by px. Example 2: Let where ej be the vector that has 1 in the j-th position and 0 everywhere else. . . a structure that enables us to say that px. More general spaces (e. the skip-free property. groups) are allowed. that of spatial homogeneity. A random walk possesses an additional property. j = 1. p(−1) = q.PART II: RANDOM WALKS 25 Random walk The purpose of this part of the notes is to take a closer look at a special kind of Markov chain known as random walk. x ∈ Zd . given a function p(x). we can let S = Zd . In dimension d = 1. p(x) = 0. j = 1. where C is such that x p(x) = 1. otherwise. otherwise where p + q = 1.y = px+z. For example. if x ∈ {±e1 . . d p(x) = qj . if x = ej . . Example 1: Let p(x) = C2−|x1 |−···−|xd | . .y = p(y − x). namely. this walk enjoys. This is the drunkard’s walk we’ve seen before. So. then p(1) = p. we won’t go beyond Zd . If d = 1.

has equal probabilities to move from one state x to one of its “neighbouring states” x ± e1 . Obviously. we can reduce several questions to questions about this random walk. and this is the symmetric drunkard’s walk. x + ξ2 = y) = y p(x)p(y − x). random variables in Zd with common distribution P (ξn = x) = p(x). Indeed. x ∈ Zd . (When we write a sum without indicating explicitly the range of the summation variable. notice that P (ξ1 + ξ2 = x) = y P (ξ1 = x. p(n) = Px (Sn = y) = P (x + ξ1 + · · · + ξn = y) = P (ξ1 + · · · + ξn = y − x). where ξ1 . . we have P (ξ1 + ξ2 = x) = (p ∗ p)(x). . .) Now the operation in the last term is known as convolution. Furthermore. . the initial state S0 is independent of the increments. xy The random walk ξ1 + · · · + ξn is a random walk starting from 0. (Recall that δz is a probability distribution that assigns probability 1 to the point z.d. But this would be nonsense. We shall resist the temptation to compute and look at other 77 . ξ2 . The convolution of two probabilities p(x). . Let us denote by Sn the state of the walk at time n. because all we have done is that we dressed the original problem in more fancy clothes. in addition. We call it simple symmetric random walk. then p(1) = p(−1) = 1/2.) In this notation. If d = 1. . x ± ed . in general. y p(x)p′ (y − x). which. Furthermore. computing the n-th order convolution for all n is a notoriously hard problem. otherwise. It is clear that Sn can be represented as Sn = S0 + ξ1 + · · · + ξn . We will let p∗n be the convolution of p with itself n times.i. p(x) = 0. We easily see that P (ξ1 + · · · + ξn = x) = p∗n (x). are i. we will mean a sum over all possible values. p′ (x).This is a simple random walk. x ∈ Zd is deﬁned as a new probability p ∗ p′ given by (p ∗ p′ )(x) = The reader can easily check that p ∗ p′ = p′ ∗ p. . So. p ∗ (p′ ∗ p′′ ) = (p ∗ p′ ) ∗ p′′ . The special structure of a random walk enables us to give more detailed formulae for its behaviour. p ∗ δz = p. One could stop here and say that we have a formula for the n-step transition probability. We refer to ξn as the increments of the random walk.

ξn ). ξn = εn ) = 2−n . . 2) → (5.d. +1. how long will that take? C. + (1. If not. 0) → (1. −1} then we have a path in the plane that joins the points (0. +1}n .methods.0) We can thus think of the elements of the sample space as paths. First. Will the random walk ever return to where it started from? B. 2 where #A is the number of elements of A. . we are dealing with a uniform distribution on the sample space {−1. . 1) → (2.i. . we shall learn some counting methods. 2) → (3. So if the initial position is S0 = 0. As a warm-up.2) where the ξn are i. 1}n . 1): (2. Therefore.2) (4. After all. A ⊂ {−1.1) + (0. In the sequel. who cares about the exact value of p∗n (x) for all n and x? Perhaps it is the case (a) that diﬀerent questions we need to ask or (b) that approximate methods are more informative than exact ones. If yes. Clearly. . where will it go? 26 The simple symmetric random walk in dimension 1: path counting This is a random walk with values in Z and transition probabilities pi. and so #A P (A) = n . When the walk starts from 0 we can write Sn = ξ1 + · · · + ξn . but its practice may be hard. if we can count the number of elements of A we can compute its probability. Here are a couple of meaningful questions: A. Events A that depend only on these the ﬁrst n random signs are subsets of this sample space. all possible values of the vector are equally likely. Look at the distribution of (ξ1 . and the sequence of signs is {+1. for ﬁxed n. random variables (random signs) with P (ξn = ±1) = 1/2. −1. . to each initial position and to each sequence of n signs.1) − (3. consider the following: 78 .i−1 = 1/2 for all i ∈ Z. a random vector with values in {−1. +1.1) + − (5. The principle is trivial. +1}n . . Since there are 2n elements in {0. . +1}n . 1) → (4. So. we should assign. we have P (ξ1 = ε1 . a path of length n.i+1 = pi.

Example: Assume that S0 = 0. 0)). x) and end up at (n. (There are 2n of them. i). and compute the probability of the event A = {S2 ≥ 0.i) (0.) The darker region shows all paths that start at (0. the number of paths that start from (m. n 79 . (n. 0) and ends at (n. The paths comprising this event are shown below. 0) and end up at (n. i). Let us ﬁrst show that Lemma 14. (n − m + y − x)/2 (23) Remark. y) is given by: #{(0. We use the notation {(m. In if n + i is not an even number then there is no path that starts from (0.0) The lightly shaded region contains all paths of length n that start at (0. Hence (n+i)/2 = 0. Consider a path that starts from (0. S7 < 0}.047. y)} =: {all paths that start at (m. 0) and end up at the speciﬁc point (n. y)} = n−m . We can easily count that there are 6 paths in A. in this case. 0) (n. 2 1 3 0 1 2 4 5 6 7 −1 −2 −3 − −4 −7 26. 0) and n ends at (n. The number of paths that start from (0. i). x) (n. x) and end up at (n. S6 < 0.1 Paths that start and end at speciﬁc points Now consider a path that starts from a speciﬁc point and ends up at a speciﬁc point. Proof. So P (A) = 6/26 = 6/128 = 3/64 ≈ 0. S4 = −1. y) is given by: #{(m. x) (n. y)}. i)} = n n+i 2 More generally.

Let u be the number of +’s in this path (the number of upward segments) and d the number of −’s (the number of downward segments). Then u + d = n, Hence u = (n + i)/2, d = (n − i)/2. So if we know where the path starts from and where it ends at we know how many times it has gone up and how many down. The question then becomes: given a sequence of n signs out of which u are + and the rest are −, in how many ways can we arrange them? Well, in exactly n ways, and this proves the formula. u Notice that u = (n + i)/2 must be an integer. This is true if n + i is even or, equivalently if n and i are simultaneously even or simultaneously odd. If this is not the case, then there is no path that can go from (0, 0) to (n, i). For example (look at the ﬁgure), there is no path that goes from (0, 0) to (3, 2). We shall thus deﬁne n to be zero if u is not an integer. u More generally, the number of paths that start from (m, x) and end up at (n, y) equals the number of paths that start from (0, 0) and end up at (n − m, y − x). So, applying the previous formula, we obtain (23). u − d = i.

26.2

**Paths that start and end at speciﬁc points and do not touch zero at all
**

(n, y) that never become n−m . − m − y − x)

Lemma 15. Suppose that x, y > 0. The number of paths (m, x) zero (i.e. they do not touch the horizontal axis) is given by: #{(m, x) (n, y); remain > 0} =

1 2 (n

n−m − − m + y − x)

1 2 (n

(24)

Proof.

(m,x) (n,y)

The shaded region contains all possible paths that start at (m, x) and end up at the speciﬁc point (n, y) which, in addition, never touch the horizontal axis.

n−m

Consider, instead, a path that starts at (m, x) and ends up at point (n, y) which DOES touch the horizontal axis, like the path indicated by the solid line below. There is a trick, 80

called reflection, which is useful here. Note that to each such path there corresponds another path obtained by following it up to the ﬁrst time it hits zero and then reﬂecting it around the horizontal axis. In the ﬁgure below, follow the solid path starting at (m, x) up to the ﬁrst time it hits zero and then follow the dotted path: this is the reﬂected path.

(n, y)

(m, x)

(n, −y)

A path and its reﬂection around a horizontal line. Notice that the reﬂection starts only at the point where the path touches the horizontal line.

Note that if we know the reﬂected path we can recover the original path. Hence we may as well count the number of reﬂected paths. But each reﬂected path starts from (m, x) and ends at (n, −y) and so it necessarily crosses the horizontal axis. We thus apply (23) with −y in place of y: #{(m, x) (n, y); touch zero} = #{(m, x) = So, for x > 0, y > 0, #{(m, x) (n, y); remain > 0} = #{(m, x) =

1 2 (n

n−m . (n − m − y − x)/2

(n, −y)}

n−m − − m + y − x)

(n, y)} − #{(m, x)

1 2 (n

(n, y); touch zero}

n−m . − m − y − x)

**Corollary 14. For any y > 0, #{(1, 1) Let m = 1, x = 1 in (24): #{(1, 1) (n, y); remain > 0} = n−1 n−1 − 1 + y) − 1 2 (n − y) − 1 (n − 1)! (n − 1)! − n−y = n+y n−y n−y ! ! 2 −1 ! 2 2 −1 ! 2 1n+y 1n−y n! = − n+y n 2 n 2 ! n−y ! 2 2
**

1 2 (n

(n, y); remain > 0} =

y n

#{(0, 0)

(n, y)}.

(25)

=

y n

n

n+y 2

,

and the latter term equals the total number of paths from (0, 0) to (n, y).

81

Corollary 15. The number of paths from (m, x) to (n, y) that do not touch level z < min(x, y) is given by: #{(m, x) (n, y); remain > z} = n−m − − m + y − x) n−m . − m − y − x) − z (26)

1 2 (n

1 2 (n

Proof. Because it is only the relative position of x and y with respect to z that plays any role, we have #{(m, x) (n, y); remain > z} = #{(m, x + z) (n, y + z); remain > 0}

and so the formula is obtained by an application of (24). Corollary 16. The number of paths from (0, 0) to (n, y), where y ≥ 0, that remain ≥ 0 is given by: n n #{(0, 0) (n, y); remain ≥ 0} = n+y − n+y (27) 2 2 +1 Proof. #{(0, 0) (n, y); remain ≥ 0} = #{(0, 0) = = where we used (26) with z = −1. Corollary 17 (Catalan numbers). The number of paths from (0, 0) to (n, 0) that remain ≥ 0 is given by: #{(0, 0) Proof. Apply (27) with y = 0: #{(0, 0) (n, 0); remain ≥ 0} = n

n 2

= #{(0, 1) n

n+y 2

(n, y); remain > −1} n −1 n , n+y 2 +1

n−y 2

(n, y + 1); remain > 0}

− −

n

n+y 2

(n, 0); remain ≥ 0} =

1 n+1 . n + 1 n/2

−

n 2

n −1

=

1 n+1 , n + 1 n/2

where the latter calculation is as in the proof of (25). Corollary 18. The number of paths that start from (0, 0), have length n, and remain nonnegative is given by: #{(0, 0) (n, •); remain ≥ 0} = n . ⌈n/2⌉ (28)

where ⌈x⌉ denotes the smallest integer that is ≥ x. 82

Proof. To see, this, just use (27) and sum over all y ≥ 0. If n = 2m then we need to set y = 2z in (27) (otherwise (27) is just zero) and so we have #{(0, 0) = (2m, •); remain ≥ 0} = 2m m+z − 2m m+z+1

z≥0

2m 2m 2m 2m 2m 2m 2m ± ··· + − + ··· + − + − 2m 2m 2m − 1 m+2 m+1 m+1 m n 2m , = = n/2 m

because all the terms that are denoted by · · · are equal to 0, while the rest cancel with one another, and only the ﬁrst one remains. If n = 2m + 1 then we need to set y = 2z + 1 in (27) (otherwise (27) is just zero) and so we have #{(0, 0) = (2m + 1, •); remain ≥ 0} = 2m + 1 2m + 1 − m+z+1 m+z+2

z≥0

2m + 1 2m + 1 2m + 1 2m + 1 2m + 1 2m + 1 2m + 1 ±· · · + − +· · ·+ − + − 2m + 1 2m + 1 2m m+3 m+2 m+2 m+1 n 2m + 1 . = = ⌈n/2⌉ m+1

We have shown that the shaded regions contain exactly the same number of paths. Was this, a posteriori, obvious?

2m

2m

27

The simple symmetric random walk in dimension 1: simple probability calculations

Since all paths of certain length have the same probability, we can translate the path counting formulae into corresponding probability formulae.

27.1

**Distribution after n steps
**

n 2−n . (n + y)/2 83

Lemma 16. P0 (Sn = y) =

If n. . . . S2 > 0. if n is even. . if n is even. 27. But. n−m 2−n−m . Sn ≥ 0 | Sn = y) = 1 − In particular. we shall just compute it. The probability that. n This is called the ballot theorem and the reason will be given in Section 29. . y ≥ 0 are both even or both odd then P0 (S1 ≥ 0 .2 The ballot theorem Theorem 25. (n − m + y − x)/2 u(n) := P0 (Sn = 0) = Proof. y)} . n 2−n . x) 2n−m (n. 0) (n. . P0 (Sn = y) 2 (n + y) + 1 . We have P0 (Sn = y) = and P (Sn = y|Sm = x) = and we apply (23). . Sn ≥ 0 | Sn = 0) = 84 1 . 1) (n. n 2 #{paths (m. n (30) Proof. The left side of (30) equals #{paths (1. y)} × 2−n n−1 n−1 − 1 1 (n − 1 + y − 1) 2 (n − 1 − y − 1) = 2 n 1 (n + y) 2 = y . Such a simple answer begs for a physical/intuitive explanation. 0) (n. P0 (S1 ≥ 0 . y) that do not touch zero} × 2−n #{paths (0. given that S0 = 0 and Sn = y > 0. n/2 (29) #{(0. .3 Some conditional probabilities Lemma 17. y)} . Sn > 0 | Sn = y) = y . the random walk never becomes zero up to time n is given by P0 (S1 > 0. 27. (n/2) + 1 y+1 P0 (Sn = y + 2) = 1 .P (Sn = y|Sm = x) = In particular. . for now. .

. For even n we have P0 (S1 ≥ 0.). . Sn > 0) + P0 (S1 < 0. . . . Sn ≥ 0) = #{(0. . •). . . 85 . . . . . Sn−1 ≥ 0) = P0 (ξ2 ≥ 0. . ξ2 . (c) is just conditioning on {ξ1 = 1} and using the fact that S1 = ξ1 . . . remain ≥ 0} = 2n n 2−n = P0 (Sn = 0). Sn = 0) = P0 (Sn = 0). . . . . 1 + ξ2 + ξ3 > 0. n/2 where we used (28) and (29). ξ3 . . (c) (d) (e) (f ) (g) where (a) is obvious.Proof. . . .4 Some remarkable identities Theorem 26. Proof.) by (ξ1 . . . . If n is even. P0 (S1 ≥ 0. . . (d) follows from P0 (ξ1 = 1) = 1/2 and from the substitution of the value of ξ1 on the left of the conditioning. . P0 (Sn = y) − n n+y 2 n +1 n+y 2 This is further equal to 1− ( n+y 2 ( n! n−y + 1)! ( 2 − 1)! n+y 2 )! ( n−y )! 2 =1− n! n−y 2 n+y 2 + 1 = n+y 2 y+1 . and Sn−1 > 0 implies that Sn ≥ 0. . . Sn ≥ 0 | Sn = y) = n = n+y 2 P0 (Sn ≥ 0 . and ﬁnally (g) is trivial because Sn−1 ≥ 0 means Sn−1 > 0 because Sn−1 cannot take the value 0. ξ2 + ξ3 ≥ 0. . . . (e) follows from the fact that ξ1 is independent of ξ2 . . Sn > 0 | ξ1 = 1) = P0 (1 + ξ2 > 0. . ξ3 . . . (f ) follows from the replacement of (ξ2 . Sn ≥ 0. . (b) follows from symmetry. . . . We use (27): P0 (Sn ≥ 0 . . Sn > 0) = 2P0 (ξ1 = 1)P0 (S2 > 0. . and from the fact that if m is integer then 1 + m > 0 is equivalent to m ≥ 0. Sn = 0) = P0 (S1 > 0. . . . 1 + ξ2 + · · · + ξn > 0 | ξ1 = 1) = P0 (S1 ≥ 0. . . . We next have P0 (S1 = 0. . 0) (n. . . Sn ≥ 0). Sn < 0) (b) (a) = 2P0 (S1 > 0. . . Sn ≥ 0) = P0 (S1 = 0. +1 27. . ξ2 + · · · + ξn ≥ 0) = P0 (S1 ≥ 0. . . . . Sn = y) P0 (Sn = y) =1− P0 (Sn = y + 2) . . . .

Put a two-sided mirror at a. we computed. So 1 f (n) = 2u(n) P0 (S1 > 0. Sn = 0). Sn−1 > 0. . So the last term is 1/2. . . . and u(n) = P0 (Sn = 0). n We know that P0 (Sn = 0) = u(n) = n/2 2−n . . Fix a state a. n−1 Proof. This is not so interesting. we can omit the last event. Also. . Sn−1 < 0. Then f (n) := P0 (S1 = 0. Sn = 0. . . 2 and the last probability can be computed from the ballot theorem–see (30): it is equal to 1/(n − 1). Sn = 0) + P0 (S1 < 0. . Sn = 0) = P0 (Sn = 0) P (Sn−1 > 0|Sn = 0). Sn = 0) Now. Sn−2 > 0|Sn−1 = 1). If a particle is to the right of a then you see its image on the left. we see that Sn−1 > 0. . By symmetry. with equal probability. . f (n) = P0 (S1 > 0. .27. . and if the particle is to the left of a then its mirror image is on the right. . Sn = 0) = 2P0 (Sn−1 > 0. Sn = 0) = u(n) .5 First return to zero In (29). the two terms must be equal. By the Markov property at time n − 1. . . Sn = 0) P0 (S1 > 0. To take care of the last term in the last expression for f (n). . for even n. We now want to ﬁnd the probability f (n) that the walk will return to 0 for the ﬁrst time in n steps. the probability u(n) that the walk (started at 0) attains value 0 in n steps. given that Sn = 0. . So if the particle is in position s then its mirror image is in position s = 2a − s because s − a = a − s. Theorem 27. It is clear that if a particle starts at 0 and performs a simple symmetric random walk then its mirror image starts at a and performs a simple symmetric random walk. Since the random walk cannot change sign before becoming zero. . Sn−1 = 0. . Sn = 0 means Sn−1 = 1. . Let n be even. Sn−2 > 0|Sn−1 > 0. . So u(n) f (n) = . Sn−1 > 0. . P0 (Sn−1 > 0. . we either have Sn−1 = 1 or −1. n−1 28 The reﬂection principle for a simple random walk in dimension 1 The reflection principle says the following. 86 . So f (n) = 2P0 (S1 > 0.

n < Ta . Sn = Sn . n < Ta . called (Sn . 2 Deﬁne now the running maximum Mn = max(S0 . In the last event. . n ∈ Z+ ). 2a − Sn ≥ a) = P0 (Ta ≤ n. If Sn ≥ a then Ta ≤ n. Sn ). Then. Sn ≤ a). m ≥ 0) is a simple symmetric random walk. P0 (Sn ≥ a) = P0 (Ta ≤ n. So Sn − a can be replaced by a − Sn in the last display and the resulting process. so we may replace Sn by 2a − Sn (Theorem 28). deﬁne Sn = where is the ﬁrst hitting time of a. Sn ≤ a) + P0 (Ta ≤ n. as a corollary to the above result. Combining the last two displays gives P0 (Ta ≤ n) = P0 (Sn ≥ a) + P0 (Sn > a) = P0 (Sn ≥ a) + P0 (Sn ≥ a) − P0 (Sn = a). Finally. So: P0 (Sn ≥ a) = P0 (Ta ≤ n. But a symmetric random walk has the same law as its negative. 28. Thus. Theorem 28. n ≥ Ta By the strong Markov property.1 Distribution of hitting time and maximum We can take advantage of this principle to establish the distribution function of Ta : Theorem 29. Then Ta = inf{n ≥ 0 : Sn = a} Sn . Then 1 P0 (Ta ≤ n) = 2P0 (Sn ≥ a) − P0 (Sn = a) = P0 (|Sn | ≥ a) − P0 (|Sn | = a). . 2 Proof. Sn > a) = P0 (Ta ≤ n. . independent of the past before Ta . n ∈ Z+ ) has the same law as (Sn . S1 . Trivially. observing that Sn has the same distribution as −Sn we get that this is also equal to P0 (|Sn | ≥ a) − 1 P0 (|Sn | = a). the process (STa +m − a. n ≥ 0) be a simple symmetric random walk and a > 0. But P0 (Ta ≤ n) = P0 (Ta ≤ n. Proof. for sure–Theorem 35) and after that consider its mirror image Sn = 2a − Sn . Let (Sn . Sn ≥ a). If (Sn ) is a simple symmetric random walk started at 0 then so is (Sn ). n ≥ Ta . . we obtain 87 . In other words. The last equality follows from the fact that Sn > a implies Ta ≤ n. a + (Sn − a).What is more interesting is this: Run the particle till it hits a (it will. 2a − Sn . Sn ≤ a) + P0 (Sn > a). we have n ≥ Ta .

Sn = y) = P0 (Sn = y) − P0 (Tx ≤ n. The answer is very simple: Theorem 31. . n = a + b. Proof. employed for diﬀerent purposes. 9 Bertrand. Observe that Mn ≥ x if and only if Sk ≥ x for some k ≤ n which is equivalent to Tx ≤ n and use Theorem 29. We can do more: we can derive the joint distribution of Mn and Sn : Theorem 30. then. Solution d’un probl`me. 29 Urns and the ballot theorem Suppose that an urn8 contains n items. 105. we have P0 (Mn < x. Sn = y) = P0 (Sn = y) − P0 (Sn = 2x − y) . 8 88 . of which a are coloured azure and b black. Since Mn < x is equivalent to Tx > n. where ηi indicates the colour of the i-th item picked. (1887). (31) x > y. Compt. If Tx ≤ n. . which results into P0 (Tx ≤ n. as for holding liquids. If an urn contains a azure items and b = n−a black items. 105. usually a vase furnished with a foot or pedestal. 2 Proof. Sn = y) = P0 (Sn = 2x − y). The original ballot problem. e e e Paris. 369. . Sn = y) = P0 (Tx ≤ n. Solution directe du probl`me r´solu par M. (1887). gives the result. the ashes of the dead after cremation. and anciently for holding ballots to be drawn. P0 (Mn < x. An urn is a vessel. the number of selected azure items is always ahead of the black ones equals (a − b)/n. Paris. It is the latter use we are interested in in Probability and Statistics. D. Sci. ornamental objects. then the probability that. e 10 Andr´. what is the probability that the number of azure items is constantly ahead of the number of black ones? The problem was solved10 in the same year. J. posed in 18879 asks the following: if we start picking the items from the urn one by one without replacement. Sn = y). Compt. Bertrand. combined with (31). Rend. But if x > y then 2x − y > x and so {Sn = 2x − y} ⊂ {Tx ≤ n}. Sci.Corollary 19. Rend. ηn ). Acad. during sampling without replacement. as long as a ≥ b. by applying the reﬂection principle (Theorem 28). Sn = y) = P0 (Tx > n. Sn = 2x − y). This. 1 P0 (Mn ≥ x) = P0 (Tx ≤ n) = 2P0 (Sn ≥ x) − P0 (Sn = x) = P0 (|Sn | ≥ x) − P0 (|Sn | = x). The set of values of η contains n elements and η is uniformly distributed a in this set. 436-437. we can replace Sn by 2x − Sn in the last part of the last display and get P0 (Tx ≤ n. Acad. . A sample from the urn is a sequence η = (η1 .

conditional on the event {Sn = y}. . . But if Sn = y then. We have P0 (Sn = k) = n n+k 2 pi. indeed. 30 The asymmetric simple random walk in dimension 1 Consider now a simple random walk with values in Z. ξn = εn ) P0 (Sn = y) 2−n 1 = = . . . . n ≥ 0) with values in Z and let y be a positive integer. −n ≤ k ≤ n. We need to show that. . . . n n −n 2 (n + y)/2 a Thus. . . Then. . . So the number of possible values of (ξ1 . . . . p(n+k)/2 q (n−k)/2 . . where y = a − (n − a) = 2a − n. Sn > 0 | Sn = y) = . using the deﬁnition of conditional probability and Lemma 16. It remains to show that all values are equally a likely. n . . . letting a. ξn = εn | Sn = y) = P0 (ξ1 = ε1 . n Since y = a − (n − a) = a − b. . But in (30) we computed that y P0 (S1 > 0. conditional on {Sn = y}. the sequence (ξ1 . Proof. . . . Then. . . . always assuming (without any loss of generality) that a ≥ b = n − a. the event that the azure are always ahead of the black ones during sampling is precisely the event {S1 > 0. Proof of Theorem 31. ξn ) of the increments is an urn process with parameters n and a = (n + y)/2. . . . (ξ1 . Since the ‘azure’ items correspond to + signs (respectively. . An urn can be realised quite easily be a random walk: Theorem 32. but which is not necessarily symmetric. we have: P0 (ξ1 = ε1 . Sn > 0} that the random walk remains positive. .i−1 = q = 1 − p. . . Let ε1 . . +1} such that ε1 + · · · + εn = y. a − b = y. 89 . the sequence (ξ1 . (The word “asymmetric” will always mean “not necessarily symmetric”. a we have that a out of the n increments will be +1 and b will be −1. the ‘black’ items correspond to − signs). The values of each ξi are ±1. a. ηn ) an urn process with parameters n. . . . ξn ) has a uniform distribution. we can translate the original question into a question about the random walk. .) So pi. so ‘azure’ items are +1 and ‘black’ items are −1. i ∈ Z. conditional on {Sn = y}. ξn ) is uniformly distributed in a set with n elements. ξn ) is. . the result follows. Consider a simple symmetric random walk (Sn .i+1 = p.We shall call (η1 . εn be elements of {−1. a can be realised by a simple symmetric random walk starting from 0 and ending at Sn = y. Since an urn with parameters n. . . b be deﬁned by a + b = n.

x ≥ 0) is also a random walk. The process (Tx .) Then. we make essential use of the skip-free property. Then the probability generating function of the ﬁrst time that level 1 is reached is given by ϕ(s) = E0 sT1 = 1− 1 − 4pqs2 . If S1 = −1 then T1 is the sum of three times: one unit of time spent ′ to go from 0 to −1. random variables. Lemma 18. 2. E0 sTx = ϕ(s)x . If x > 0. and all t ≥ 0. .i. 2qs Proof. Proof. 90 . 2. . y ∈ Z. In view of this. .30. In view of Lemma 19 we need to know the distribution of T1 . Px (Ty = t) = P0 (Ty−x = t). 1. The rest is a consequence of the strong Markov property. Consider a simple random walk starting from 0. We will approach this using probability generating functions.1 First hitting time analysis We will use probabilistic + analytical methods to compute the distribution of the hitting times Tx = inf{n ≥ 0 : Sn = x}.} ∪ {∞}. only the relative position of y with respect to x is needed.d. it is no loss of generality to consider the random walk starting from 0. . Lemma 19. See ﬁgure below. Theorem 33. Start with S0 = 0 and watch the process until the ﬁrst n = T1 such that Sn = 1. (We tacitly assume that S0 = 0. . This random variable takes values in {0. plus a ′′ further time T1 required to go from 0 to +1 for the ﬁrst time. Proof. . x − 1 in order to reach x. Consider a simple random walk starting from 0. Corollary 20. For all x. for x > 0. namely the obvious fact that a simple random walk starting from 0 must pass through levels 1. Consider a simple random walk starting from 0. plus a time T1 required to go from −1 to 0 for the ﬁrst time. x ∈ Z. . This follows from the spatial homogeneity of the transition probabilities: to reach y starting from x. If S1 = 1 then T1 = 1. Let ϕ(s) := E0 sT1 . To prove this. then the summands in Tx = T1 + (T2 − T1 ) + · · · + (Tx − Tx−1 ) are i.

T1 is independent of T1 conditional on S1 = −1 and each with distribution that of T1 . Then the probability generating function of the ﬁrst time that level −1 is reached is given by E0 sT−1 = 1− 1 − 4pqs2 . Hence the previous formula applies with the roles of p and q interchanged. the RW must ﬁrst hit 0 (it takes time ′ ′′ T1 to do this) and then 1 (it takes a further time of T1 ). in order to hit 1. Corollary 21. Corollary 22. Consider a simple random walk starting from 0. Then the probability generating function of the ﬁrst time that level x ∈ Z is reached is given by √ x E sT1 x = 1− 1−4pqs2 . Use Lemma 19 and the previous formulae of Theorem 33 and Corollary 21. We thus have ′ ′′ ′ ′′ ′ ′′ 1 ′ ′′ 1 + T1 + T 1 if S1 = 1 if S1 = −1 ϕ(s) = ps + qsϕ(s)2 . But note that −Sn is a simple random walk with upward transition probability q and downward transition probability p. from the strong Markov property. 0 if x > 0 2qs Tx (32) E0 s = √ |x| E sT−1 |x| = 1− 1−4pqs2 0 . The ﬁrst time n such that Sn = −1 is the ﬁrst time n such that −Sn = 1. if x < 0 2ps Proof. 2ps Proof.Sn 1 0 −1 n T1’ T1’’ If S1 = −1 then. 91 . Consider a simple random walk starting from 0. The formula is obtained by solving the quadratic. Hence T1 = This can be used as follows: ϕ(s) = E0 [s1(S1 = 1) + s1+T1 +T1 1(S1 = −1)] = sP (S1 = 1) + sE0 [sT1 sT1 |S1 = −1]P (S1 = −1) = sp + sE0 [sT1 sT1 |S1 = −1]q ′′ ′ But.

Consider a simple random walk starting from 0. For a symmetric ′ random walk. (Pay attention: it is important to write n ≥ 1 inside the set!) Theorem 34. then T0 equals 1 plus a time whose distribution that of the time required for the walk to hit −1 starting from 0. k 92 . ′ But if S1 = −1 then T0 equals 1 plus the time required for the walk to hit 0 starting from −1. Similarly. in distribution.3 The distribution of the ﬁrst return to the origin We wish to ﬁnd the exact distribution of the ﬁrst return time to the origin. E0 s T 0 = E0 s T 0 ′ ′ ′ 1 − 4pqs2 . Then the probability generating function of the ﬁrst return time to 0 is given by E0 s T 0 = 1 − Proof. We again do ﬁrst-step analysis.2: P0 (T0 = n) = f (n) = u(n)/(n − 1) is n is even. how long will it take till it ﬁrst returns to 0? We are talking about the stopping time ′ T0 := inf{n ≥ 1 : Sn = 0}. 1(S1 = −1) ′ + E0 s T 0 ′ ′ 1(S1 = 1) = qE0 sT0 | S1 = −1 + pE0 sT0 | S1 = 1 . if S1 = 1. If α = n is a positive integer then n (1 + x)n = k=0 n k x . and simply write (1 + x)n = k≥0 n k x . The latter is.2 First return to the origin Now let us ask: If the particle starts at 0. k by the binomial theorem. 1 − 4pqs2 . For the general case. this was found in §30.30. where α is a real number. E0 sT0 = qE0 (s1+T1 ) + pE0 (s1+T−1 ) = qs 1− 1 − 4pqs2 1− + ps 2qs 1 − 4pqs2 . we have computed the probability generating function E0 sT0 –see Theorem 34 The tool we need here is Taylor’s theorem for the function f (x) = (1 + x)α . 2ps ′ =1− 30. then we can omit the k upper limit in the sum. the same as the time required for the walk to hit 1 starting ′ from 0. If we deﬁne n to be equal to 0 for k > n.

k! The adverb ‘formally’ is supposed to mean that this formula holds for some x around 0. √ 1 − x.Recall that n k = It turns out that the formula is formally true for general α. We have k≥0 1 − x = (1 − x)1/2 = 1/2 (−1)k xk . Indeed. by deﬁnition. but that we won’t worry about which ones. So √ 1−x= ′ k≥0 1 2k k x =1+ 2k − 1 k 1 − 4pqs2 : ′ k≥1 2k k 1 x 2k − 1 k We apply this to E0 sT0 = 1 − E0 s T 0 = k≥1 1 2k (pq)k s2k . k!(n − k)! k! α k x . let us compute √ (−1)k (−x)k = k≥0 k≥0 k≥0 −1 (−x)k k = (−1)k k! (−1)(−2) · · · (−1 − k + 1) = = (−1)k . we will show that (1 − x)−1 = 1 + x + x2 + x3 + · · · which is of course the all too-familiar geometric series. For example. k as long as we deﬁne α k = α(α − 1) · · · (α − k + 1) . 2k − 1 k 93 . (−1)k = 2k − 1 k k and this is shown by direct computation. As another example. k We can write this in more familiar symbols if we observe that 2k −k 1 1/2 4 . and it looks exactly the same: (1 + x)α = k≥0 n(n − 1) · · · (n − k + 1) n! = . (1 − x)−1 = But. k! k! (−1)2k xk = k≥0 xk . −1 k so (1 − x)−1 = as needed.

The key fact needed here is: P0 (Tx < ∞) = lim E0 sTx . by the deﬁnition of E0 sT0 . if x < 0 |x| In particular. P0 (Tx < ∞) = p q q p ∧1 ∧1 . 94 . |x| if x > 0 (33) . Hence it is transient. Comparing the two ‘inﬁnite polynomials’ we see that we must have n = 2k (we knew that– the random walk can return to 0 only in an even number of steps) and then ′ P0 (T0 = 2k) = 1 1 2k k k p q = P0 (S2k = 0). We will see how the results about the generating functions of the hitting and return times help us answer these questions directly. 2k − 1 k 2k − 1 31 Recurrence properties of simple random walks Consider again a simple random walk Sn with values in Z with upward probability p and downward probability q = 1 − p. That it is actually null-recurrent follows from Markov chain theory. if p = q then P0 (Tx < ∞) = 1 for all x ∈ Z. x 1−|p−q| 2q 1−|p−q| 2p . x if x > 0 . so 1 − 4pq = So P0 (Tx < ∞) = 1 − 4p + 4p2 = (1 − 2p)2 = |1 − 2p| = |p − q|. • If p < 1/2 then p − q < 0 and so Sn tends to −∞ with probability one. We apply this to (32). But remember that q = 1 − p. Hence it is again transient. • If p > 1/2 then p − q > 0 and so Sn tends to +∞ with probability one. s↑1 which is true for any probability generating function. E0 s T 0 = n≥0 ′ ′ ′ P0 (T0 = n)sn . without appeal to the Markov chain theory. • If p = 1/2 then p − q = 0 and the law of large numbers only tells us that the walk is not positive recurrent. if x < 0 This formula can be written more neatly as follows: Theorem 35. By applying the law of large numbers we have that P0 Sn =p−q n→∞ n lim = 1.But. The quantity inside the square root becomes 1 − 4pq.

Then limn→∞ Sn = −∞ with probability 1. Consider a random walk with values in Z. Proof. if x < 0 which is what we want. Then the overall maximum M∞ has a geometric distribution: P0 (M∞ ≥ x) = (p/q)x . S1 . . Consider a simple random walk with values in Z. P0 (M∞ ≥ x) = lim P0 (Mn ≥ x) = lim P0 (Tx ≤ n) = P0 (Tx < ∞) = (p/q)x . q p. x ≥ 0. Theorem 36. Sn ). The simple symmetric random walk with values in Z is null-recurrent. but here it is < ∞. 31. Otherwise. with probability 1 because p < 1/2.2 Return probabilities Only in the symmetric case p = q = 1/2 will the random walk return to the point where it started from. if x > 0 . then we have M∞ = limn→∞ Mn . This value is random and is denoted by M∞ = sup(S0 .Proof. Let ′ T0 = inf{n ≥ 1 : Sn = 0} be the ﬁrst return time to 0.) If we let Mn = max(S0 . This means that there is a largest value that will be ever attained by the random walk. . . What is this probability? Theorem 37. . Assume p < 1/2. Proof. S2 . this probability is always strictly less than 1. We know that it cannot be positive recurrent. Indeed. Then ′ P0 (T0 < ∞) = 2(p ∧ q). 31. Corollary 23. it will never ever return to the point where it started from. Unless p = 1/2. . 95 . The last part of Theorem 35 tells us that it is recurrent. If p > q then |p − q| = p − q and so (33) gives P0 (Tx < ∞) = 1.1 Asymptotic distribution of the maximum Suppose now that p < 1/2. If p < q then |p − q| = q − p and. every nondecreasing sequence has a limit. except that the limit may be ∞. n→∞ n→∞ x ≥ 0. . and so it is null-recurrent. the sequence Mn is nondecreasing. again. . with some positive probability. we obtain what we are looking for.

So ′ P0 (J0 ≥ 1) = P0 (T0 < ∞) = 2(p ∧ q). k = 0. Ei Ji = 1 − 2(p ∧ q) Remark 2: The formulae work for all values of p ∈ [0.′ Proof. For the second formula we have ∞ ∞ E0 J0 = k=1 P0 (J0 ≥ k) = k=1 (2(p ∧ q))k = 2(p ∧ q) . But if it is transient. 1 − 2(p ∧ q) this being a geometric series. Theorem 38. (34) a random variable ﬁrst considered in Section 10 where it was also shown to be geometric. If p = q this gives 1. This proves the ﬁrst formula. . 96 . . The probability generating function of T0 was obtained in Theorem 34. But J0 ≥ 1 if and only if the ﬁrst return time to zero is ﬁnite. 31. starting from zero. Pi (Ji ≥ k) = 1 for all k. Remark 1: By spatial homogeneity. 1. We now look at the the number of visits to some state but if we start from a diﬀerent state. the total number of visits to a state will be ﬁnite. Then. 2. . . . 2. as already known.3 Total number of visits to a state If the random walk is recurrent then it visits any state inﬁnitely many times. k = 0. In the latter case. 1. Consider a simple random walk with values in Z. if a Markov chain starts from a state i then Ji is geometric. We now compute its exact distribution. . 2(p ∧ q) . 2(p ∧ q) E0 J0 = . we obviously have that the same formulae apply for any i: Pi (Ji ≥ k) = (2(p ∧ q))k . the total number J0 of visits to state 0 is geometrically distributed with P0 (J0 ≥ k) = (2(p ∧ q))k . . k = 0. meaning that Pi (Ji = ∞) = 1. And if p < q this gives 1 − (q − p) = 1 − (1 − p − p) = 2p. even for p = 1/2. In Lemma 4 we showed that. 1]. where the last probability was computed in Theorem 37. . If p > q this gives 1 − (p − q) = 1 − (1 − q − q) = 2q. 1. 1 − 2(p ∧ q) Proof. . But ′ P0 (T0 < ∞) = lim EsT0 = 1 − s↑1 ′ 1 − 4pq = 1 − |p − q|. The total number of visits to state i is given by ∞ Ji = n=1 1(Sn = i). And so P0 (J0 ≥ k) = P0 (J0 ≥ 1)k . 2. .

In other words. there has already been one visit to state x and so we need at least k − 1 remaining visits in order to have at least k visits in total. simply because if Jx ≥ k ≥ 1 then Tx < ∞. then E0 Jx drops down to zero geometrically fast as x tends to inﬁnity. E0 Jx = Proof. If p ≥ 1/2 then this gives P0 (Jx ≥ k) = (q/p)|x| (2q)k−1 . given that Tx < ∞. We repeat the argument for x < 0 and ﬁnd P0 (Jx ≥ k) = q ∧1 p |x| (2(p ∧ q))k−1 . Then P0 (Tx < ∞. we apply E0 Jx = k=1 P0 (Jx ≥ k) and perform the sum of a geometric series. If p ≤ 1/2 then this gives P0 (Jx ≥ k) = (p/q)x (2p)k−1 . Apply now the strong Markov property at Tx . Suppose x > 0. If p ≥ 1/2 then this gives P0 (Jx ≥ k) = (2q)k−1 . k ≥ 1. i. 97 . E0 Jx = If p ≥ 1/2 then P0 (Jx ≥ k) = (q/p)(−x)∨0 (2q)k−1 . Consider the case p < 1/2.Theorem 39. Consider a random walk with values in Z. Then P0 (Jx ≥ k) = P0 (Tx < ∞. If p ≤ 1/2 then this gives P0 (Jx ≥ k) = (2p)k−1 . p < 1/2) then. it will visit any point below 0 exactly the same number of times on the average.e. But for x < 0. starting from 0. ∞ As for the expectation. we have E0 Jx = 1/(1−2p) regardless of x. Jx ≥ k). Jx ≥ k) = P0 (Tx < ∞)Px (Jx ≥ k − 1). and the second in Theorem 38. The reason that we replaced k by k − 1 is because. if the random walk “drifts down” (i.e. (q/p)(−x)∨0 1 − 2q k≥1 . (2(p ∧ q))k−1 . Remark 1: The expectation E0 Jx is ﬁnite if p = 1/2. If p ≤ 1/2 then P0 (Jx ≥ k) = (p/q)x∨0 (2p)k−1 . For x > 0. Remark 2: If the initial state is not 0 then use spatial homogeneity to obtain the analogous formulae. The ﬁrst term was computed in Theorem 35. So we obtain P0 (Jx ≥ k) = p ∧1 q x (p/q)x∨0 1 − 2p k≥1 . Py (Jx ≥ k) = P0 (Jx−y ≥ k) and Ey Jx = E0 Jx−y .

for a simple random walk and for x > 0. which is basically another probability for S. . ξn−1 .i. a sum of i. Let Tx = inf{n ≥ 1 : Sn = x} be the ﬁrst time that x will be visited. we derived using duality. . in Theorem 41. random variables. and thus derived the generating function of Tx : E0 sTx = Specialising to the symmetric case we have E0 s Tx 1− 1 − 4pqs2 2qs x . 0 ≤ k ≤ n) is deﬁned by Sk := Sn − Sn−k . for a simple symmetric random walk. random variables. .32 Duality Consider random walk Sk = k ξi . i. Here is an application of this: Theorem 41. . = 1− √ 1 − s2 s x . In Lemma 19 we observed that. every time we have a probability for S we can translate it into a probability for S. Sn = x) P0 (Tx = n) = . Let (Sk ) be a simple symmetric random walk and x a positive integer. 0 ≤ k ≤ n) of (Sk . . ξ1 ). 0 ≤ k ≤ n) has the same distribution as (Sk . P0 (Sn = x) P0 (Sn = x) x . Rewrite the ballot theorem terms of the dual: P0 (S1 > 0. (36) 2 Lastly.d. .i. 0 ≤ k ≤ n). Proof. Pause for reﬂection: We have been dealing with the random times Tx for simple random walks (symmetric or asymmetric) from the beginning of the lectures. Sn > 0 | Sn = x) = But the left-hand side is P0 (Sn −Sn−k > 0. 1 ≤ k ≤ n | Sn = x) = P0 (Sk < x. Thus. (Sk . each distributed as T1 . . . . . Tx is the sum of x i. the probabilities P0 (Tx = n) = x P0 (Sn = x). 0 ≤ k ≤ n − 1. Theorem 40. i=1 Then the dual (Sk . Fix an integer n. because the two have the same distribution. . n Proof. (35) In Theorem 29 we used the reﬂection principle and derived the distribution function of Tx for a simple symmetric random walk: 1 P0 (Tx ≤ n) = P0 (|Sn | ≥ x) − P0 (|Sn | = x).e. ξ2 . n x > 0. Then P0 (Tx = n) = x P0 (Sn = x) . Use the obvious fact that (ξ1 . .d. n (37) 98 . ξn ) has the same distribution as (ξn .

We remark here something methodological: It was completely diﬀerent techniques that led to results about, essentially, the same thing: the distribution of Tx . This is an important lesson to learn. Of course, the three formulae should be compatible: the number (37) should be the coeﬃcient of sn in (35); summing up (37) should give (36), or taking diﬀerences in (36) should give (37). But on trying to verify these things algebraically, the reader will realise that it is not easy.

33

Amount of time that a SSRW is positive*

Consider a SSRW (Sn ) starting from 0. Deﬁne its trajectory as the polygonal line obtained by joining the points (n, Sn ) in the plane, as described earlier. Watch this trajectory up to time 2n. Let T+ (2n) be the number of sides of the polygonal line that are above the horizontal axis and T− (2n) = 2n − T+ (2n) the number of sides that are below. Clearly, T± (2n) are both even numbers because, if the RW is at 0, it takes an even number of steps to return to 0. We can think of T+ (2n) as the number of leads in 2n coin tosses, where we

0

T 0’

′ Figure 1: Part of the time the SSRW is positive and part negative. After it returns to 0 at time T0 ,

it behaves again like a SSRW and is independent of the past.

have a lead at some time if the number of heads till that time exceeds the number of tails. We are dealing with a fair coin which is tossed 2n times, so the event that the number of heads equals k has maximum probability when k = n (heads=tails). Most people take this statement and translate it into a statement about T+ (2n) arguing that P (T+ (2n) = m) is maximised when m = n, i.e. when T+ (2n) = T− (2n). This is wrong. In fact, as the theorem below shows, the most likely value of P (T+ (2n) = m) is when m = 0 or m = 2n. Theorem 42 ( distribution of the total time the random walk is positive ). P (T+ (2n) = 2k) = P (S2k = 0)P (S2n−2k = 0) , 0 ≤ k ≤ n.

′ Proof. The idea is to condition at the ﬁrst time T0 that the RW returns to 0: n ′ P (T+ (2n) = 2k, T0 = 2r). r=1

P (T+ (2n) = 2k) =

99

′ Between 0 and T0 the random walk is either above the horizontal axis or below; call the ﬁrst event A+ and the second A− and write n ′ ′ P (A+ , T0 = 2r)P (T+ (2n) = 2k | A+ , T0 = 2r) n ′ ′ P (A− , T0 = 2r)P (T+ (2n) = 2k | A− , T0 = 2r)

P (T+ (2n) = 2k) =

r=1

+

r=1

**In the ﬁrst case,
**

′ ′ P (T+ (2n) = 2k | A+ , T0 = 2r) = P (T+ (2n) = 2k−2r | A+ , T0 = 2r) = P (T+ (2n−2r) = 2k−2r), ′ because, if T0 = 2r and A+ occurs then the RW has already spent 2r units of time above the axis and so T+ (2n) is reduced by 2r. Furthermore, we may remove the conditioning ′ because the future evolution of the RW after T0 is independent of the past. In the second case, a similar argument shows ′ ′ P (T− (2n) = 2k | A− , T0 = 2r) = P (T− (2n) = 2k | A− , T0 = 2r) = P (T− (2n − 2r) = 2k).

**We also observe that, by symmetry, 1 ′ ′ ′ P (A+ , T0 = 2r) = P (A− , T0 = 2r) = P (T0 = 2r) = f2r . 2 Letting p2k,2n = P (T+ (2n) = 2k), we have shown that p2k,2n = 1 2
**

k

**f2r p2k−2r,2n−2r +
**

r=1

1 2

n−k

**f2r p2k,2n−2r
**

r=1

We want to show that p2k,2n = u2k u2n−2k , where u2k = P (S2k = 0). With this hypothesis, we can start playing with the recursion of the display above and using induction on n we will realise that p2k,2n 1 = u2n−2k 2

k r=1

1 f2r u2k−2r + u2k 2

n−k

f2r u2n−2k−2r .

r=1

Explicitly, we have P (T+ (2n) = 2k) = 2k k 2n − 2k −2n 2 , n−k 0 ≤ k ≤ n. (38)

With 2n = 100, we plot the function P (T100 = 2k) for 0 ≤ k ≤ 50 below. Notice that it is maximised at the ends of the interval.

100

0.08

0.07

0.06

0.05

0.04

0.03

0.02

0

10

20 k

30

40

50

**The arcsine law*
**

Use of Stirling’s approximation in formula (38) gives P (T+ (2n) = 2k) ∼ 1/π k(n − k) , as k → ∞ and n − k → ∞.

**Now consider the calculation of the following probability: P T+ (2n) ∈ [a, b] 2n =
**

na≤k≤nb

P

T+ (2n) k = 2n n 1/π k(n − k) =

a≤k/n≤b

∼

a≤k/n≤b

1 . k k (n − n ) n n

1/π

**This is easily seen to have a limit as n → ∞, the limit being
**

b

1/π t(1 − t)

dt,

a

because the last sum is simply a Riemann approximation to the last integral. This shows the celebrated arcsine law: x √ T+ (2n) 2 1/π lim P dt = arcsin x. ≤x = n→∞ 2n π t(1 − t) 0 We thus have that the limiting distribution of T+ (2n) (the fraction of time that the RW is √ 2 positive), as n → ∞, is the distribution function π arcsin x, 0 < x < 1. This is known as the arcsine law. This distribution function has density √ 1/π , 0 < x < 1, the plot of

x(1−x)

which is amazingly close to the plot of the ﬁgure above. Now consider again a SSRW and consider the hitting times (Tx , x ≥ 0). This is another random walk because the summands in Tx = x (Ty − Ty−1 ) are i.i.d. all distributed as y=1 T1 : 2n −2n 1 P (T1 ≥ 2n) = P (S2n = 0) = 2 ∼√ . n πn 101

0). ξ2 . . we immediately get that the expectation of T1 is inﬁnity: ET1 = ∞. When he reaches a corner he moves again in the same manner.d. i.d. the random 1 2 variables ξn . (Sn . north or south. random vectors such that P (ξn = (0.e. 1 1 Proof. We have: 102 . are independent. n ∈ Z+ ) is a sum of i. x1 − x2 ). . being totally drunk. n ∈ Z+ ) showing that it. n ≥ 1. We describe this motion as a random walk in Z2 = Z × Z: Let ξ1 . map (x1 . west. y ∈ Z. ξ2 . 34 Simple symmetric random walk in two dimensions Consider a drunkard in a city whose roads are arranged in a rectangular grid. 0)) = 1/4 1 P (ξn = −1) = P (ξn = (−1. n ∈ Z+ ) is a random walk. The two random walks are not independent. Sn = i=1 n ξi . . Rn ) := (Sn + Sn .i. . 0)) = 1/4. 0)) = 1/4. . going at random east. x2 ) into (x1 + x2 . north or south. −1)) = 1/2. It is not a simple random walk because there is a possibility that the increment takes the value 0. Since ξ1 . . 0)) = P (ξn = (−1. we let x1 be its horizontal coordinate and x2 its vertical. They are also identically distributed with 1 P (ξn = 0) = P (ξn = (0. ξn are not independent simply because if one takes value 0 the other is nonzero. . Sn ) = n n 2 . Consider a change of coordinates: rotate the axes by 45o (and scale them). 1)) = P (ξn = (−1. The corners are described by coordinates (x. (Sn .From this. Sn − Sn ). So let 1 2 1 2 1 2 Rn = (Rn . He starts at (0. Indeed. he moves east. with equal probability (1/4). the two stochastic processes are not independent. 1)) + P (ξn = (0. . 1 Hence (Sn . 2 Same argument applies to (Sn . Many other quantities can be approximated in a similar manner. We then let S0 = (0.. . is a random walk. ξ2 . 0)) = P (ξn = (1. We have 1. However. i=1 ξi i=1 ξi 2 1 Lemma 20. i. for each n. random variables and hence a random walk. the latter being functions of the former. So Sn = (Sn . y) where x. 1 P (ξn = 1) = P (ξn = (1. 0) and. too. so are ξ1 . n ∈ Z+ ) is also a random walk.i. 2 1 If x ∈ Z2 . west.

To show that the last two are independent. 35 Skorokhod embedding* We take a closer look at the gambler’s ruin problem for a simple symmetric random walk in one dimension. . ηn = ε′ ) = P (ηn = ε)P (ηn = ε′ ) for all choices of ε. n ∈ Z ) is a random walk in one dimension. .1 Lemma 21. (Rn + independent. . ηn = −1) = P (ξn = (0. 0)) = 1/4 1 2 P (ηn = −1. ηn = −1) = P (ξn = (−1. Rn = n (ξi − ξi ). n ∈ Z+ ). Hence P (ηn = 1) = P (ηn = −1) = 1/2. = 1)) = 1/4.d. So Rn = n ηn where ηn is i=1 i=1 i=1 1 + ξ 2 . . n Corollary 24. as n → ∞. n ∈ Z+ ) is a random 2 . The same is true for (Rn ). b−a P (STa ∧Tb = a) = b . 2 . . The last two are walk in one dimension. n ∈ Z+ ) is a random walk in two dimensions. 2n n 2 2−4n . So P (R2n = 0) = 2n 2−2n . The possible values 1 . we need to show that ∞ P (Sn = 0) = ∞. b−a 103 . 2 1 2 2 1 1 Proof. (Rn . b−a P (Ta < Tb ) = b . P (S2n = 0) ∼ n . is i. Recall. But by the previous n=0 c lemma. 1 1 2 2 Hence P (ηn = 1) = P (ηn = −1) = 1/2. n ∈ Z ) is a random walk in one dimension. By Lemma 5. that for a simple symmetric random walk (started from S0 = 0). where c is a positive constant. η 2 are ±1. Lemma 22. (Rn + 2 2 1 2 1 1 we check that ηn = ξn + ξn . b−a −a . P (Tb < Ta ) = We can read this as follows: P (STa ∧Tb = b) = −a . η . ξ 1 − ξ 2 ). 1)) = 1/4 1 2 P (ηn = −1. We have of each of the ηn n 1 2 P (ηn = 1. ηn = 1) = P (ξn = (0. from Theorem 7.. +1}.i. Let a < 0 < b. And so 2 1 2 1 P (ηn = ε. 1 where the last equality follows from the independence of the two random walks. ε′ ∈ {−1. 1 2 1 2 P (S2n = 0) = P (R2n = 0) = P (R2n = 0. The two-dimensional simple symmetric random walk is recurrent. Proof. and by Stirling’s approximation. (Rn . R2n = 0) = P (R2n = 0)(R2n = 0). P (S2n = 0) = Proof. Hence (Rn . ηn = 1) = P (ξn = (1. The sequence of vectors η . . Similar argument shows that each of (Rn . being obtained the vector ηn = (ξn 1 2 n n n by taking the same function on the elements of the sequence ξ1 . ξ2 . 0)) = 1/4 1 2 P (ηn = 1. ηn = ξn − ξn are independent for each n. We have Rn = n (ξi + ξi ). n ∈ Z+ ) 1 is a random walk in two dimensions. in the sense that the ratio of the two sides goes to 1. But (Rn ) 1 2 is a simple symmetric random walk.

Can we ﬁnd a stopping time T such that ST has the given distribution? The Skorokhod embedding theorem tells us how to do this. P (A = 0. This means that we can simulate the distribution by generating a simple symmetric random walk starting from 0 and watching it till the ﬁrst time it exits the interval [a. Let (Sn ) be a simple symmetric random walk and (pi . such that p is rational. c= i<0 (−i)pi = i>0 Letting Ti = inf{n ≥ 0 : Sn = i}. i ∈ Z. B = 0) + P (A = i. Then Z = 0 if and only if A = B = 0 and this has 104 . and assuming that (A. 0)} ∪ (N × N) with the following distribution: P (A = i. we get that c is precisely ipi . i<0 j>0 This leads to: c−1 j>0 jpj i<0 pi + c−1 i<0 (−i)pi j>0 i>0 ipi pj = 1 − p0 Since i∈Z ipi = 0. The claim is that P (Z = k) = pk . e. How about more complicated distributions? Let suppose we are given a probability distribution (pi . Deﬁne a pair of random variables (A. B) taking values in {(0. i ∈ Z). k ∈ Z. M. given a 2-point probability distribution (p. Theorem 43. Note. if p = M/N . M < N . B = 0) = p0 . i ∈ Z. b. and the probability it exits from the right point is 1 − p. N ∈ N. B = j) = c−1 (j − i)pi pj .g. using this. Then there exists a stopping time T such that P (ST = i) = pi . Indeed. b]. we consider the random variable Z = STA ∧TB . that ESTa ∧Tb = 0. To see this. where c−1 is chosen by normalisation: P (A = 0. Conversely. B) are independent of (Sn ). suppose ﬁrst k = 0.This last display tells us the distribution of STa ∧Tb . in particular. 1 − p). i ∈ Z) a given probability distribution on Z with zero mean: i∈Z ipi = 0. choose. a < 0 < b such that pa + (1 − p)b = 0. i < 0 < j. we have this common value: i<0 (−i)pi = and. The probability it exits it from the left point is p. b = N . a = M − N . B = j) = 1. we can always ﬁnd integers a. Proof.

We have P (Z = k) = P (STA ∧TB = k) = i<0 j>0 P (STi ∧Tj = k)P (A = i. i<0 = c−1 Similarly. P (Z = k) = pk for k < 0. B = j) P (STi ∧Tk = k)P (A = i. Suppose next k > 0.probability p0 . 105 . by deﬁnition. B = k) i<0 = = i<0 −i −1 c (k − i)pi pk k−i (−i)pi pk = pk .

replace b − a by b − 2a. But d | a and d | b. · · · }.e. 38) = gcd(6.e. 38) = gcd(6. (That it is unique is obvious from the fact that if d1 . i. We write this by b | a. we may summarise the ﬁrst line by saying that we replace the largest of the two numbers by the remainder of the division of the largest with the smallest. and we’re done. . (ii) if c | a and c | b. let d = gcd(a. with the second line. and d = gcd(a. −3. Then c | a + (b − a). 106 . We will show that d = gcd(a. They are not supposed to replace textbooks or basic courses.}. But 3 is the maximum number of times that 38 “ﬁts” into 120. b are arbitrary integers. 0. 26) = gcd(6. Now. we have c | d. b − a). Similarly. we can deﬁne their greatest common divisor d = gcd(a. they merely serve as reminders. . Because c | a and c | b. 2) = gcd(2. b) as the unique positive integer such that (i) d | a. 2. their negatives and zero. But in the ﬁrst line we subtracted 38 exactly 3 times.) Notice that: gcd(a. Now suppose that c | a and c | b − a. If a | b and b | a then a = b. In other words. as taught in high schools or universities. 14) = gcd(6. b). then d | c. For instance. The set N of natural numbers is the set of positive integers: N = {1. we say that b divides a if there is an integer k such that a = kb. the process of long division that one learns in elementary school). . Indeed. −2. 2) = gcd(4. So we work faster. c | b. The set Z of integers consists of N. If b − a > a. So (i) holds. d2 are two such numbers then d1 would divide d2 and vice-versa. Keep doing this until you obtain b − qa < a. b) = gcd(a. −1.PART III: APPENDICES This section contains some sine qua non elements from standard Mathematics. Then interchange the roles of the numbers and continue. if a. 8) = gcd(6. with at least one of them nonzero. 6. 2. 38) = gcd(82. If c | b and b | a then c | a. 1. Replace b by b − a. if we also invoke Euclid’s algorithm (i. 20) = gcd(6. Given integers a. with b = 0. gcd(120. Therefore d | b − a. Zero is denoted by 0. 3. Arithmetic Arithmetic deals with integers. 32) = gcd(6. 2) = 2. leading to d1 = d2 . Z = {· · · . 3. d | b. 4. This gives us an algorithm for ﬁnding the gcd between two numbers: Suppose 0 < a < b are integers. 5. 38) = gcd(44. b. Indeed 4 × 38 > 120. So (ii) holds. b). b − a).

120) = 38x + 120y. xa + yb > 0}. let us follow the previous example again: gcd(120. Then d = xa + yb. 38) = gcd(6. The long division algorithm is a process for ﬁnding the quotient q and the remainder r. . Its proof is obvious. b) then there are integers x. Working backwards. 107 . y such that d = xa + yb. This shows (ii) of the deﬁnition of a greatest common divisor. y ∈ Z. not both zero. let d be the right hand side. 2) = 2. we have 2 = 38 − 6 × 6 = 38 − 6 × (120 − 3 × 38) = 19 × 38 − 6 × 120. to divide 3450 with 26 we do 132 —— 3450 26 — 85 78 — 70 52 — 18 26 | and obtain q = 26. a − 1}. we can show that. we used the divisions 120 = 3 × 38 + 6 38 = 6 × 6 + 2. For example. gcd(38. such that a = qb + r. r = 18. with a > b. in particular. b ∈ Z. 1. . . 38) = gcd(6. a. Thus. for some x. Here are some other facts about the greatest common divisor: First. To see why this is true. Then c divides any linear combination of a. for integers a.e. b) = min{xa + yb : x. it divides d. we can ﬁnd an integer q and an integer r ∈ {0. with x = 19. . We need to show (i). y ∈ Z. The same logic works in general. In the process. To see this. b.The theorem of division says the following: Given two positive integers b. Suppose that c | a and c | b. y = 6. Second. we have the fact that if d = gcd(a. we have gcd(a. i.

y) such that x is smaller than y. The range of f is the set of its values. i. Relations A relation on a set A is a mathematical notion that establishes a way of pairing elements of A. The greatest common divisor of 3 integers can now be deﬁned in a similar manner. A function is called onto (or surjective) if its range is Y . x2 ∈ X have diﬀerent values f (x1 ). for some 1 ≤ r ≤ d − 1.that d | a and d | b. say. but is not necessarily transitive. Suppose this is not true.. If B ⊂ Y then f −1 (B) consists of all x ∈ X with f (x) ∈ B. A relation is called reflexive if it contains pairs (x.e. 108 . Since d > 0. So r is expressible as a linear combination of a and b and is strictly smaller than d (being a remainder). In general.e. But then b = q(xa + yb) + r. x). that d does not divide both numbers. A relation is transitive if when it contains (x. A relation is called symmetric if it cannot contain a pair (x. So d is not minimum as claimed. respectively. e. that d does not divide b. y) and (y. Both < and = are transitive. the relation < between real numbers can be thought of as consisting of pairs of numbers (x. B be ﬁnite sets with n. A relation which possesses all the three properties is called equivalence relation.g. Any equivalence relation splits a set into equivalence classes: the equivalence class of x contains all y that are related (are equivalent) to x. the greatest common divisor of any subset of the integers also makes sense. For example. the set {f (x) : x ∈ X}. A function is called one-to-one (or injective) if two diﬀerent x1 . then the relation “x is a friend of y” is symmetric. yielding r = (−qx)a + (1 − qy)b. x). The equality relation is symmetric. j (i communicates with j) in Markov chain Functions We a function f from a set X into a set Y is denoted by f : X → Y . a relation can be thought of as a set of pairs of elements of the set A. so r = 0. If we take A to be the set of students in a classroom. y) without contain its symmetric (y. so d divides both a and b. Finally. i. The equality relation = is reﬂexive. m elements. Counting Let A. f (x2 ). it can be thought as reﬂexive (unless we do not want to accept the fact that one may not be afriend of oneself). and this is a contradiction. z) it also contains (x. The relation < is not symmetric. we can divide b by d to ﬁnd b = qd + r. z). The set of functions from X to Y is denoted by Y X . We encountered the equivalence relation i theory. The relation < is not reﬂexive.

and so on. So we have (n)k = n(n − 1)(n − 2) · · · (n − k + 1) ways to pick k elements if we care about their order. 109 . and so on.) So the ﬁrst animal can be assigned any of the m available names. Incidentally. (The same name may be used for diﬀerent animals. and this follows from the binomial theorem. so does the second. the number of subsets with k elements each can be found by designating which of the n elements of A will be chosen. we have = 1. A function is then an assignment of a name to each animal. n! stands for the product of the ﬁrst n natural numbers: n! = 1 × 2 × 3 × · · · × n. Thus. we may think of the elements of A as animals and those of B as names. Each assignment of this type is a one-to-one function from A to B. We thus observe that there are as many subsets in A as number of elements in the set {0. 1}n of sequences of length n with elements 0 or 1. we denote (n)n also by n!. a one-to-one function from the set A into A is called a permutation of A. the cardinality of the set B A ) is mn . The ﬁrst element can be chosen in n ways. and the third. Hence the total number of assignments is m × m × ··· × m n times Suppose now we are not allowed to give the same name to diﬀerent animals. The total number of subsets of A (regardless of their number of elements) is thus equal to n k=0 n k = (1 + 1)n = 2n . Any set A contains several subsets. the total number of one-to-one functions from A to B is (m)n := m(m − 1)(m − 2) · · · (m − n + 1).The number of functions from A to B (i. n! is the total number of permutations of a set of n elements. In the special case m = n. Since the name of the ﬁrst animal can be chosen in m ways. the k-th in n − k + 1 ways. Indeed. To be able to do this.e. in other words. k! k where the symbol equation. k!(n − k)! Since there is only one subset with no elements (it is the empty set ∅). n k is merely a name for the number computed on the left side of this n k = n n−k n 0 = n! . and so on. the second in n − 1. If A has n elements. we need more names than animals: n ≤ m. that of the second in m − 1 ways. Clearly. If we don’t then we divide by the number k! of permutations of the k elements concluding that the number of subsets with k elements of a set with n elements equals (n)k n =: .

the pig. a+ is called the positive part of a. Note that it is always true that a = a+ − a− . + k−1 k Indeed. we add the two numbers together in order to ﬁnd the total number of ways to pick k animals from n + 1 animals. For example. 0). and a− is called the negative part of a. b). The absolute value of a is deﬁned by |a| = a+ + a− . we let The Pascal triangle relation states that n+1 k = = 0. m! n y If y is not an integer then. Both quantities are nonnegative. a− = − min(a. if x is a real number. If the pig is not included in the sample. say. If the pig is included. The maximum is denoted by a ∨ b = max(a. n n . 110 . in choosing k animals from a set of n + 1 ones. Note that −(a ∨ b) = (−a) ∧ (−b) and (a ∧ b) + c = (a + c) ∧ (b + c). The notation extends to more than one numbers. by convention. and we don’t care to put parantheses because the order of ﬁnding a maximum is irrelevant. we deﬁne x m = x(x − 1) · · · (x − m + 1) . we choose k animals from the remaining n ones and this can be done in n ways. b is denoted by a ∧ b = min(a. by convention. consider a speciﬁc animal. 0). b). b. Maximum and minimum The minimum of two numbers a. then there are k − 1 remaining k n animals to be chosen and this can be done in k−1 ways. Since the sentences “the pig is included” and “the pig is not included” cannot be simultaneously true. k Also. In particular. we use the following notations: a+ = max(a. c. a∨b∨c refers to the maximum of the three numbers a.We shall let n = 0 if n < k. Thus a ∧ b = a if a ≤ b or = b if b ≤ a.

Indicator function stands for the indicator function of a set A. Thus. . It takes value 1 with probability P (A) and 0 with probability P (Ac ). P (X ≥ n). . . we have E[1A∪B ] = E[1A ] + E[1B ] − E[1A∩B ]. which means P (A ∪ B) = P (A) + P (B) − P (A ∩ B). Then X X= n=1 1 (the sum of X ones is X). For instance. 2. This is very useful notation because conjunction of clauses is transformed into multiplication. So E[X] = n≥1 E 1(X ≥ n) = n≥1 Matrices Let Rd be the set of vectors with d components. Therefore its expectation is E 1A = 1 × P (A) + 0 × P (Ac ) = P (A). meaning a function that takes value 1 on A and 0 on the complement Ac of A. 1 − 1A = 1 A c . if A1 . An example of its application: Since 1A∪B = 1 − 1(A∪B)c = 1 − 1Ac ∩Bc = 1 − 1Ac 1Bc = 1 − (1 − 1A )(1 − 1B ) = 1A + 1B − 1A 1B . . are sets or clauses then the number n≥1 1An is the number true clauses. 111 . We also use the notation 1(clause) for a number that is 1 whenever the clause is true or 0 otherwise. A2 . .}. It is easy to see that 1A 1A∩B = 1A 1B . So X= n≥1 1(X ≥ n). A linear function ϕ from Rn to Rm is such that ϕ(αx + βy) = αϕ(x) + βϕ(y). Several quantities of interest are expressed in terms of indicators. It is worth remembering that. . = 1A + 1B − 1A∩B . Another example: Let X be a nonnegative random variable with values in N = {1. If A is an event then 1A is a random variable. if I go to a bank N times and Ai is the clause “I am robbed the i-th time” then N n=1 1Ai is the total number of times I am robbed.

Then. Let u1 . . = ··· ··· ··· . . The matrix A of ϕ with respect to the choice of the two basis is deﬁned by a11 · · · a1n A := · · · · · · · · · am1 · · · amn y1 Then ϕ ◦ ψ is a linear function: Rn − → Rℓ − 112 ϕ◦ψ . . . ym v1 . . Let y i := n aij xj . un be a basis for Rn . x . . . x1 . xn the given basis. am1 · · · amn xn ym Rn − Rm − Rℓ → → ψ ϕ . The column . . is a column representation of y := ϕ(x) with respect to the given basis . . . where ∈ R. and all α. xn n 1 y := ϕ(x) = j=1 xj ϕ(uj ). vm in Rn we can express each ϕ(uj ) as a linear combination of the basis vectors: m ϕ(uj ) = i=1 aij vi . ϕ are linear functions: It is a m × n matrix. . Suppose that ψ. . . we have ϕ(x) = m n vi i=1 j=1 aij xj . β ∈ R. . . Combining. The column . m. . . . . But the ϕ(uj ) are vectors in Rm . because ϕ is linear. The relation between the column representation of y and the column representation of x is written as 1 1 x y a11 · · · a1n . So if we choose a basis v1 . . j=1 i = 1. . vm . . .for all x. . is a column representation of x with respect to . Then any x ∈ Rn can be uniquely written as n x= j=1 x j uj . y ∈ Rn .

we deﬁne max I. If I is empty then inf ∅ = +∞. respectively. since any increasing sequence has a limit (possibly +∞) we take this limit and call it lim sup xn (limit superior) of the sequence. . The standard basis in Rd consists of the vectors 1 0 0 0 1 0 e 1 = . · · · }. . e2 = . 1 ≤ i ≤ ℓ. xn+1 . Similarly. ψ. . In these cases we use the notation inf I. . i. sup I is characterised by: sup I ≥ x for all x ∈ I and. Likewise. there is x ∈ I with x ≥ ε + inf I. . x2 . sup ∅ = −∞. 1 ≤ j ≤ n. ei = 1(i = j). . for any ε > 0.e. . We note that lim inf(−xn ) = − lim sup xn . . . Then the matrix representation of ϕ ◦ ψ is AB. j So A is a ℓ × m matrix. x3 . B is a m × n matrix. and AB is a ℓ × n matrix.} has no minimum. Similarly. of numbers) then the minimum of I is denoted by min I and refers to a number c ∈ I such that c ≤ x for all the numbers x ∈ I. . Limsup and liminf If x1 . A matrix A is called square if it is a n × n matrix. . . it is its matrix representation with respect to the ﬁxed basis. a sequence a1 .Fix three sets of basis on Rn . ed = . the choice of the basis is the so-called standard basis. . The sequence xn has a limit ℓ if and only if lim sup xn = lim inf xn = ℓ. . deﬁned as the maximum of all the numbers y with the property y ≤ x for all x ∈ I. B be the matrix representations of ϕ. for any ε > 0. . . If I has no lower bound then inf I = −∞. standing for the “infimum” of I. . x4 . This minimum (or maximum) may not exist.} ≤ inf{x3 . . Supremum and inﬁmum When we have a set of numbers I (for instance. . there is x ∈ I with x ≤ ε + inf I. . . x2 . 1/3. 1/2. . 0 0 1 In other words. So lim sup xn = limn→∞ inf{xn . a2 . If I has no upper bound then sup I = +∞. with respect to these bases. where m (AB)ij = k=1 Aik Bkj . Usually. For instance I = {1. we deﬁne lim sup xn . . . . Rm and Rℓ and let A. . . . is a sequence of numbers then inf{x1 . a square matrix represents a linear transformation from Rn to Rn . 113 . and it is this that we called inﬁmum. we deﬁne the supremum sup I to be the minimum of all upper bounds. If we ﬁx a basis in Rn . xn+2 . . A property of real numbers says that if the set I is bounded from below then this maximum of all lower bounds exists. . inf I is characterised by: inf I ≤ x for all x ∈ I and. .} ≤ · · · and.} ≤ inf{x2 .

by not requiring this. to compute P (A ∩ B) we read the deﬁnition as P (A ∩ B) = P (A)P (B|A). A is a subset of B). then P n≥1 An = P (An ).) If A. and everywhere else. to compute P (A). . to show that EX ≤ EY . If A is an event then 1A or 1(A) is the indicator random variable. B2 . Linear Algebra that has many axioms is more ‘restrictive’. we get P (A ∩ B) = P (A)P (B|A) = P (B)P (A|B) and this is called Bayes’ theorem. B are disjoint events then P (A ∪ B) = P (A) + P (B). This is the inclusion-exclusion formula for two events. in which case we write P (A) = n P (A ∩ Bn ) = P (A|Bn )P (Bn ). . B are not disjoint then P (A ∪ B) = P (A) + P (B) − P (A ∩ B). and apply them. Indeed. are increasing with A = ∪n An then P (A) = limn→∞ P (An ). in these notes. i. Therefore. Quite often. . This follows from the logical statement X 1(X > x) ≤ X. . if An are events with P (An ) = 1 then P (∩n An ) = 1. . A2 . In Markov chain theory. are mutually disjoint events. which holds since X is positive. I do sometimes “cheat” from a strict mathematical point of view. the function that takes value 1 on A and 0 on Ac . Similarly. and 1 Ac = 1 − 1 A .. A2 . 11 114 . . are decreasing with A = ∩n An then P (A) = limn→∞ P (An ). if he events A1 . E 1A = P (A). If the events A. In contrast. for (besides P (Ω) = 1) there is essentially one axiom . Recall that P (B|A) is deﬁned by P (A ∩ B)/P (A) and is a very motivated deﬁnition. Of course. (That is why Probability Theory is very rich. it is useful to ﬁnd disjoint events B1 .e. For example. . This is generalisable for a number of events: P (A1 ∩ A2 ∩ A3 ) = P (A1 )P (A2 |A1 )P (A3 |A1 ∩ A2 ) If we interchange the roles of A. This can be generalised to any ﬁnite number of events. we often need to argue that X ≤ Y . In contrast.Probability Theory I do not require11 knowledge of axiomatic Probability Theory (with Measure Theory). P (∪n An ) ≤ Similarly. follows from this axiom. but not too much. Linear Algebra that has many axioms is more ‘restrictive’. . derive formulae. showing that P (A) ≤ P (B) is more of a logical problem than a numerical one: we need to show that A implies B (i. If An are events with P (An ) = 0 then P (∪n An ) = 0.e. . Often. we work a lot with conditional probabilities. n P (An ). Markov’s inequality for a positive random variable X states that P (X > x) ≤ EX/x. namely: Everything else. But I would like the reader to keep in mind one thing: Every mathematical system has a set of axioms from which we can prove (useful) theorems. . n≥1 Usually. n If the events A1 . . such that ∪n Bn = Ω. . Probability Theory is exceptionally simple in terms of its axioms. Similarly. A2 .. If A1 . Note that 1A 1B = 1A∩B . B in the deﬁnition. That is why Probability Theory is very rich.

making it a useful object when the a0 . As an example. If it exists for some s1 then it exists uniformly for all s with |s| < |s1 | (and this needs a proof. a1 . 1]. n ≥ 0 is ∞ ρn s n = n=0 1 .An application of this is: P (X < ∞) = lim P (X ≤ x). So if n an < ∞ then G(s) exists for all s ∈ [−1. . 1].. viz. 1. . In case the random variable is ∞ absoultely continuous with density f . . a1 . 0). taught in calculus courses). . . X − = − min(X. which is motivated from the algebraic identity X = X + − X − . The expectation of a nonnegative random variable X may be deﬁned by ∞ EX = 0 P (X > x)dx. For a random variable which takes positive and negative values. for s = 0 for which G(0) = 0. ϕ(s) = EsX = n=0 sn P (X = n) 115 . we deﬁne X + = max(X. we encounter situations where P (X = ∞) may be positive. which are both nonnegative and then deﬁne EX = EX + − EX − . 0). . . The formula holds for any positive random variable. |s| < 1/|ρ|. In case the random variable is discrete. .e. is called probability generating function of X: ∞ as long as |ρs| < 1. . The generating function of the sequence pn = P (X = n). n = 0. Generating functions A generating function of a sequence a0 . the formula reduces to the known one. 1 − ρs (39) Note that |G(s)| ≤ n |an ||s|n and this is ≤ n |an | if s ∈ [−1. is a probability distribution on Z+ = {0. is an inﬁnite polynomial having the ai as coeﬃcients: G(s) = a0 + a1 s + a2 s2 + · · · . For a random variable that takes values in N we can write EX = ∞ P (X ≥ n). 2. .} because then n an = 1. This is due to the fact that n=1 P (X ≥ n) = E 1(X ≥ n). . EX = x xP (X = x). 1. x→∞ In Markov chain theory. This deﬁnition makes sense only if EX + < ∞ or if EX − < ∞. we have EX = −∞ xf (x)dx. i. Suppose now that X is a random variable with values in Z+ ∪ {+∞}. so it is useful to keep this in mind. obviously. . the generating function of the geometric sequence an = ρn . We do not a priori know that this exists for any s other than.

we can recover the expectation of X from EX = lim ϕ′ (s). As an example. we have s∞ = 0. n ≥ 0) then the generating function of (xn+1 . take the value +∞. but. possibly. ∞ and p∞ := P (X = ∞) = 1 − pn . while ∞ ϕ(1) = n=0 P (X = n) = P (X < +∞). Generating functions are useful for solving linear recursions. If we let X(s) = n≥0 n ≥ 0. Also. consider the Fibonacci sequence deﬁned by the recursion xn+2 = xn+1 + xn . since |s| < 1. s→1 something that can be formally seen from d d EsX = E sX = EXsX−1 . Since the term s∞ p∞ could formally appear in the sum deﬁning ϕ(s). with x0 = x1 = 1. 116 . n ≥ 0) is n≥0 xn+1 sn = s−1 (X(s) − x0 ) and the generating function of (xn+2 . n! as follows from Taylor’s theorem.This is deﬁned for all −1 < s < 1. and this is why ϕ is called probability generating function. n=0 We do allow X to. ds ds and by letting s = 1. Note that ∞ n=0 pn ≤ 1. Note that ϕ(0) = P (X = 0). n ≥ 0) is n≥0 xn+2 sn = s−2 (X(s) − x0 − sx1 ). We can recover the probabilities pn from the formula pn = ϕ(n) (0) . xn sn be the generating of (xn .

and so X(s) = 1 1 1 1 1 1 = . i. Hence X(s) is the generating function of 1 xn = ((1/b)n+1 − (1/a)n+1 ). the n-th term of the Fibonacci sequence. the function P (X ≤ x). b = −( 5 + 1)/2. n ≥ 0) equals the sum of the generating functions of (xn+1 . Thus (1/ρ)−s is the 1 generating function of ρn+1 .e. This tells us that s−b is the generating function of (1/b)n+1 1 and that s−a is the generating function of (1/a)n+1 . Despite the square roots in the formula. b−a Noting that ab = −1.e.From the recursion xn+2 = xn+1 + xn we have (and this is were linearity is used) that the generating function of (xn+2 . is such an example. n ≥ 0) and (xn . b−a is the solution to the recursion. 117 . namely. x ∈ R. i. − − a−b s−b s−a b−a b−s a−s 1/ρ 1 1 But (39) tells us that ρn has generating function 1−ρs = (1/ρ)−s . for example. So. we speak of some function which enables us to specify the probabilities of the form P (X ∈ B) where B is a set of values of X. this number is always an integer (it has to be!) Distribution or law of a random object When we speak of the “distribution” or “law” of a random variable X with values in Z or R or Rn or even in a much larger space. But I could very well use the function P (X > x). we can solve for X(s) and ﬁnd X(s) = s2 −1 . Since x0 = x1 = 1. then the so-called distribution function. if X is a random variable with values in R. Hence s2 + s − 1 = (s − a)(s − b). +s−1 But the polynomial s2 + s − 1 has two roots: √ √ a = ( 5 − 1)/2. we can also write this as xn = (−a)n+1 − (−b)n+1 . n ≥ 0). s−2 (X(s) − x0 − sx1 ) = s−1 (X(s) − x0 ) + X(s).

n Hence n k=1 log k ≈ log x dx. We frequently encounter random objects which belong to larger spaces. dx All these objects can be referred to as “distribution” or “law” of X. it follows that n 1 log x dx = n log n − n + 1 ≈ n log n − n. Now. But there are two problems: (a) We do not know in what sense the symbol ≈ should be interpreted. to compute a large sum we can. I could use its density f (x) = d P (X ≤ x). 1 Since d dx (x log x − x) = log x. If X is absolutely continuous. Since log x does not vary too much between two successive positive integers k and k + 1 it follows that k+1 k log x dx ≈ log k. For example (try it!) the rough Stirling approximation give that the probability that. but using or merely talking about such an object presents no diﬃculty other than overcoming a psychological barrier. in 2n fair coin tosses 118 . since the number is so big. for it does allow us to specify the probabilities P (X = n). Even the generating function EsX of a Z+ -valued random variable X is an example. Hence we have We call this the rough Stirling approximation. instead. n! ≈ exp(n log n − n) = nn e−n . it’s better to use logarithms: n n! = e log n = exp k=1 log k. n ∈ Z+ . for example they could take values which are themselves functions. Such an excursion is a random object which should be thought of as a random function. It turns out to be a good one. For instance. (b) The approximation is not good when we consider ratios of products. Specifying the distribution of such an object may be hard. we encountered the concept of an excursion of a Markov chain from a state i. Warning: the verb “specify” should not be interpreted as “compute” in the sense of existence of an analytical formula or algorithm.or the 2-parameter function P (a < X < b). compute an integral and obtain an approximation. Large products: Stirling’s formula If we want to compute n! = 1 × 2 × · · · × n for large n then.

Eg(X) ≥ g(x)P (X ≥ x) . expressing monotonicity. because we know that the n probability goes to zero as n → ∞. if y ≥ x then 1(y ≥ x) = 1 and the display tells us g(y) ≥ g(x). And if y < x then 1(y < x) = 0 and the display reads g(y) ≥ 0. so this little section is a reminder. That was the random variable J0 = ∞ n=1 1(Sn = 0). To remedy this. In fact it was shown there that P (J0 ≥ k) = P (J0 ≥ 1)k . Let X be a random variable. as n → ∞. where λ = P (X ≥ 1). for all x. The geometric distribution We say that the random variable X with values in Z+ is geometric if it has the memoryless property: P (X − n ≥ m | X ≥ n) = P (X ≥ m). we shall prove Strirling’s approximation: √ n! ∼ nn e−n 2πn where the symbol an ∼ bn means that an /bn → 0. and so. by taking expectations of both sides (expectation is an increasing operator). Then. 119 . then X is such that knowledge that X has exceeded n does not make the distribution of the remaining time X − n any diﬀerent from X. We baptise this geometric(λ) distribution. We have already seen a geometric random variable. Consider an increasing and nonnegative function g : R → R+ . Indeed. y ∈ R g(y) ≥ g(x)1(y ≥ x). Markov’s inequality Arguably this is the most useful inequality in Probability. The same process shows that a geometric random variable has a distribution of the form P (X ≥ n) = λn . a geometric function of k. n ∈ Z+ . and this is terrible. g(X) ≥ g(x)1(X ≥ x).we have exactly n heads equals 2n 2−2n ≈ 1. We can case both properties of g neatly by writing For all x. you have seen it before. expressing non-negativity. If we think of X as the time of occurrence of a random phenomenon. This completely trivial thought gives the very useful Markov’s inequality. deﬁned in (34). and is based on the concept of monotonicity. Surely. 0 ≤ n ≤ m. It was shown in Theorem 38 that J0 satisﬁes the memoryless property.

J L (1997) Introduction to Probability. K L and Aitsahlia F (2003) Elementary Probability Theory (With Stochastic Processes and an Introduction to Mathematical Finance). Soc. Bremaud P (1997) An Introduction to Probabilistic Modeling. Norris.edu/~chance/teaching_aids/books_articles/ /probability_book/amsbook. Math. Springer. C M and Snell.dartmouth. Cambridge University Press. Grinstead. Springer. 5. Feller. Chung. 3. Bremaud P (1999) Markov Chains. W (1981) Topics in Ergodic Theory. Springer.pdf 6. Also available at: http://www. ´ 2. Parry. W (1968) An Introduction to Probability Theory and its Applications. Cambridge University Press. 7. Wiley. 120 . 4. Amer.Bibliography ´ 1.mac. J R (1998) Markov Chains.

63 Galton-Watson process. 18 permutation. 101 axiom of Probability Theory.Index absorbing. 111 inessential state. 77 coupling. 7 invariant distribution. 65 generating function. 61 recurrent. 2 nearest neighbour random walk. 111 maximum. 82 Cesaro averages. 34 PageRank. 17 Ethernet. 108 ergodic. 110 meeting time. 59 law. 3 branching process. 115 random walk on a graph. 108 relation. 17 Kolmogorov’s loop criterion. 110 mixing. 17 communicates with. 109 positive recurrent. 6 Markov’s inequality. 61 detailed balance equations. 70 period. 84 bank account. 16. 119 Google. 51 Monte Carlo. 53 hitting time. 119 minimum. 16 communicating classes. 53 Fibonacci sequence. 6 Markov property. 15 distribution. 119 matrix. 20 function of a Markov chain. 45 Chapman-Kolmogorov equation. 114 balance equations. 68 Fatou’s lemma. 116 ﬁrst-step analysis. 73 Markov chain. 10 irreducible. 1 equivalence classes. 13 probability generating function. 115 geometric distribution. 113 initial distribution. 44 degree. 18 arcsine law. 86 reﬂexive. 47 coupling inequality. 48 cycle formula. 34 probability ﬂow. 106 ground state. 77 indicator. 70 greatest common divisor. 16 convolution. 58 digraph (directed graph). 108 ruin probability. 26 running maximum. 17 Aloha. 51 essential state. 87 121 . 61 null recurrent. 117 leads to. 7 closed. 15 Loynes’ method. 81 reﬂection principle. 76 neighbours. 29 reﬂection. 117 division. 98 dynamical system. 18 dual. 68 aperiodic. 17 inﬁmum. 48 memoryless property. 65 Catalan numbers. 15. 20 increments. 16 equivalence relation. 35 Helly-Bray Lemma. 10 ballot theorem.

27 subordination. 6 statespace. 29 transition diagram:. 104 states.semigroup property. 16. 3. 113 symmetric. 119 strong Markov property. 60 urn. 10 steady-state. 16. 108 tree. 7. 88 watching a chain in a set. 24 Strirling’s approximation. 8 transition probability matrix. 13 stopping time. 58 transient. 108 time-homogeneous. 77 skip-free property. 9 stationary distribution. 62 subsequence of a Markov chain. 6 stationary. 71 122 . 7 time-reversible. 7 simple random walk. 73 strategy of the gambler. 2 transition probability. 62 Wright-Fisher model. 76 Skorokhod embedding. 62 supremum. 27 storage chain. 76 simple symmetric random walk. 8 transitive.

Sign up to vote on this title

UsefulNot useful- markov-chains
- Markov Chain
- 20752_0412055511MarkovChain
- (Ebook) Hidden Markov Models (Theory & Methods) Markov Chains Particle Filter Monte Carlo Hmm - Cappemoulinesryden
- MarkovChains
- Markov Chains
- Bayesian Econometric Methods
- Markov Coding
- ProblemsMarkovChains
- A First Course in Linear Algebra
- An Introduction to Statistics
- Statistics
- Exercises for Stochastic Calculus
- Introduction to Markov Chain Monte Carlo Simulation and Their Statistical Analysis
- Markov Chain
- Markov
- Exponential Distribution Theory and Methods
- markov chain chapter 6
- Stochastic Processes
- Computational Economic Analysis for Engineering and Industry Industrial Innovation
- A Course in Linear Algebra .pdf
- Solved Problems
- Ash Basic Probability Theory (Dover)
- Usig R for Introductory Statistics
- Basic Topology Armstrong
- Grimmett g.r., Stirzaker d.r. Probability and Random Processes (3ed., Oxford, 2001)(1)
- Weather Wax Ross Solutions
- MIR publisher
- Math Olympiad
- Markov Chain Random Walk

Are you sure?

This action might not be possible to undo. Are you sure you want to continue?

We've moved you to where you read on your other device.

Get the full title to continue

Get the full title to continue reading from where you left off, or restart the preview.