This action might not be possible to undo. Are you sure you want to continue?

Welcome to Scribd! Start your free trial and access books, documents and more.Find out more

**MARKOV CHAINS AND RANDOM WALKS
**

Takis Konstantopoulos∗ Autumn 2009

∗

c Takis Konstantopoulos 2006-2009

Contents

1 Introduction 2 Examples of Markov chains 2.1 2.2 2.3 2.4 2.5 A mouse in a cage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Bank account . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Simple random walk (drunkard’s walk) . . . . . . . . . . . . . . . . . . . . . . Simple random walk (drunkard’s walk) in a city . . . . . . . . . . . . . . . . . Actuarial chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 2 2 3 4 4 5 6 6 6 7 9

3 The Markov property 3.1 3.2 3.3 Deﬁnition of the Markov property . . . . . . . . . . . . . . . . . . . . . . . . Transition probability and initial distribution . . . . . . . . . . . . . . . . . . Time-homogeneous chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

4 Stationarity 4.1

Finding a stationary distribution . . . . . . . . . . . . . . . . . . . . . . . . . 13 15

5 Topological structure 5.1 5.2 5.3 5.4

The graph of a chain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 The relation “leads to” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 The relation “communicates with” . . . . . . . . . . . . . . . . . . . . . . . . 16 Period . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 20 24 27 28 29

6 Hitting times and ﬁrst-step analysis 7 Gambler’s ruin 8 Stopping times and the strong Markov property 9 Regenerative structure of a Markov chain 10 Recurrence and transience

10.1 First hitting time decomposition . . . . . . . . . . . . . . . . . . . . . . . . . 31 10.2 Communicating classes and recurrence/transience . . . . . . . . . . . . . . . . 32 11 Positive recurrence 33

i

12 Law (=Theorem) of Large Numbers in Probability Theory 13 Law of Large Numbers for Markov chains 13.1 Functions of excursions

34 34

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

13.2 Average reward . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 14 Construction of a stationary distribution 15 Positive recurrence is a class property 16 Uniqueness of the stationary distribution 17 Structure of a stationary distribution 18 Coupling and stability 39 41 43 44 45

18.1 Deﬁnition of stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 18.2 The fundamental stability theorem for Markov chains . . . . . . . . . . . . . 47 18.3 Coupling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 18.4 Proof of the fundamental stability theorem . . . . . . . . . . . . . . . . . . . 49 18.5 Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 18.6 Periodic chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 19 Limiting theory for Markov chains* 53

19.1 Limiting probabilities for null recurrent states . . . . . . . . . . . . . . . . . . 53 19.2 The general case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 20 Ergodic theorems 21 Finite chains 22 Time reversibility 23 New Markov chains from old ones* 56 57 58 62

23.1 Subsequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 23.2 Watching a Markov chain when it visits a set . . . . . . . . . . . . . . . . . . 62 23.3 Subordination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 23.4 Deterministic function of a Markov chain . . . . . . . . . . . . . . . . . . . . 63 24 Applications 65

ii

.3 PageRank (trademark of Google): A World Wide Web application . 73 25 Random walk 26 The simple symmetric random walk in dimension 1: path counting 76 78 26. . 95 31. . . . . . . 84 27. . . . . . . . . . . . . . . . . . . . . . .1 Distribution of hitting time and maximum . .2 Paths that start and end at speciﬁc points and do not touch zero at all . . . .2 The ballot theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92 30. . . . . . . . . . . . . 65 24.2 First return to the origin . . . 85 27. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 27. . . . . . . . . . . . . . .1 First hitting time analysis . . .4 The Wright-Fisher model: an example from biology .1 Distribution after n steps . . . . . . . . . . . . .3 The distribution of the ﬁrst return to the origin . . . . . 71 24. . 80 27 The simple symmetric random walk in dimension 1: simple probability calculations 83 27. 90 30. . . . . . . . .5 First return to zero . . . . . . . . . . . . . . . . .3 Some conditional probabilities . . . . . . . . . . . . . . . . . . . . . . . . . .2 The ALOHA protocol: a computer communications application . . . . . . . . . . . . .1 Asymptotic distribution of the maximum . . . 79 26. . . . . . .4 Some remarkable identities . . . . . . . . . . . . . . . . . . . . . . 92 31 Recurrence properties of simple random walks 94 31. . . . . . . . . . . . . . . . . . .1 Paths that start and end at speciﬁc points . . . . . . . . . . .2 Return probabilities . . . . 70 24. . . . . . . . . . . . . . . . . . . . 96 32 Duality 33 Amount of time that a SSRW is positive* 98 99 iii . . . . . . . . . . . . . . . . . . . . . . . . . . . 84 27. . . . . . . . . . . . . 87 29 Urns and the ballot theorem 30 The asymmetric simple random walk in dimension 1 88 89 30. . . . . 86 28 The reﬂection principle for a simple random walk in dimension 1 86 28. . . .1 Branching processes: a population growth application . . . . . . . . . . . . . . . . . . . . .5 A storage or queueing application . . . . . . . . . . . . . . . . . 95 31. . . . . . . . . . . . . . . . . .24. . .3 Total number of visits to a state . . . . 68 24. . . . . . . . . .

34 Simple symmetric random walk in two dimensions 35 Skorokhod embedding* 102 103 iv .

A few starred sections should be considered “advanced” and can be omitted at ﬁrst reading.. The ﬁrst part is about Markov chains and some applications.K. I tried to make everything as rigorous as possible while maintaining each step as accessible as possible. of course. Also. Also.. one can argue that random walk calculations should be done before the student is exposed to the Markov chain theory.S. I tried to put all terms in blue small capitals whenever they are ﬁrst encountered. There are.. I have a little mathematical appendix.Preface These notes were written based on a number of courses I taught over the years in the U. Greece and the U. At the end. I introduce stationarity before even talking about state classiﬁcation. I have tried both and prefer the current ordering. The second one is speciﬁcally for simple random walks. Of course. They form the core material for an undergraduate course on Markov chains in discrete time. “important” formulae are placed inside a coloured box. There notes are still incomplete. v . I plan to add a few more sections: – On algorithms and simulation – On criteria for positive recurrence – On doing stuﬀ with matrices – On ﬁnance applications – On a few more delicate computations for simple random walks – Reshape the appendix – Add more examples These are things to come. The notes should be readable by someone who has taken a course in introductory (non-measuretheoretic) probability. The only new thing here is that I give emphasis to probabilistic methods as soon as possible. dozens of good books on the topic.

It is not hard to see that. a totally ordered set. b). An example from Physics is provided by Newton’s laws of motion. the real numbers). X2 . and the other the greatest common divisor. An example from Arithmetic of a dynamical system in discrete time is the one which ﬁnds the greatest common divisor gcd(a. we let our “state” be a pair of integers (xn . The “time” can be discrete (i. By repeating the procedure. b). yn ). x = x(t) denotes the position of the particle at time t. x Here. b). for any n. X1 . In Mathematics. s ≥ t) ˙ 1 . . and evolving as xn+1 = yn yn+1 = rem(xn . that gcd(a. we end up with two numbers. . . yn+1 ). a phenomenon which evolves with time in a way that only the present aﬀects the future is called a dynamical system. For instance. one of which is 0. Assume that the force F depends only on the particle’s position ¨ dt and velocity. or. continuous (i. yn ). Recall (from elementary school maths). the integers). In the module following this one you will study continuous-time chains. y0 = min(a. The sequence of pairs thus produced. . depend only on the pair Xn . The state of the particle at time t is described by the pair ˙ X(t) = (x(t). Suppose that a particle of mass m is moving under the action of a force F in a straight line. more generally. X0 . for any t.e. n = 0. the future pairs Xn+1 . has the property that.e. . . 1. F = f (x. We are herein constrained by the Syllabus to discuss only discrete-time Markov chains. b) = gcd(r. x(t)). there is a function F that takes the pair Xn = (xn . where r = rem(a. Its velocity is x = dx . yn ) into Xn+1 = (xn+1 . Xn+2 . b) between two positive integers a and b.e. 2. where xn ≥ yn . b) is the remainder of the division of a by b. i. Formally. . x). and its ˙ dt 2x acceleration is x = d 2 . . In other words.PART I: MARKOV CHAINS 1 Introduction A Markov chain is a mathematical model of a random phenomenon evolving with time in a way that the past aﬀects the future only through the present. the particle is suspended at the end of a spring and force is produced by extending the spring and letting the system oscillate (Hooke’s law). the future trajectory (X(s). initialised by x0 = max(a. Newton’s second law says that the acceleration is proportional to the force: m¨ = F. .

uniformly. Each time count 1 if Yn < f (Xn ) and 0 otherwise.99. Indeed. analyse.1 They are stochastic either because the problem they are solving is stochastic or because the problem is deterministic but “very large” such as ﬁnding the determinant of a matrix with 10. We will develop a theory that tells us how to describe. instead of deterministic objects. We will also see why they are useful and discuss how they are applied. we will see what kind of questions we can ask and what kind of answers we can hope for.can be completely speciﬁed by the current state X(t). containing fresh and stinky cheese. regardless of where it was at earlier times. In addition. In other words. We can summarise this information by the transition diagram: 1 Stochastic means random. A scientist’s job is to record the position of the mouse every minute. The generalisation takes us into the realm of Randomness. a ≤ X0 ≤ b. Other examples of dynamical systems are the algorithms run. 2. Repeat with (Xn . say. Similarly. at time n + 1 it is either still in 1 or has moved to 2. Try this on the computer! 2 . Markov chains generalise this concept of dependence of the future only on the present. Choose a pair (X0 . via the use of the tools of Probability.000 columns or computing the integral of a complicated function of a large number of variables. The resulting ratio is an approximation b of a f (x)dx.e.. for steps n = 1. Yn ). and use those mathematical models which are called Markov chains. by the software in your computer. Y0 ) of random numbers. A mouse lives in the cage. We will be dealing with random variables.05. . . 1 2 Statistical observations lead the scientist to believe that the mouse moves from cell 1 to cell 2 with probability α = 0. past states are not needed. but some are stochastic.1 Examples of Markov chains A mouse in a cage A mouse is in a cage with two cells. it does so. Count the number of 1’s and divide by N . Some of these algorithms are deterministic. it moves from 2 to 1 with probability β = 0. respectively. . an eﬀective way for dealing with large problems is via the use of randomness. An example of a random algorithm is the Monte Carlo method for the b approximate computation of the integral a f (x)dx of a complicated function f : suppose that f is positive and bounded below B. Perform this a large number N of times. 0 ≤ Y0 ≤ B.000 rows and 10. 2 2. 1 and 2. i. When the mouse is in cell 1 at time n (minutes) then.

FS . D2 . Assume that Dn . respectively.) 2. he can’t. the ﬁrst time that the mouse will move from 1 to 2 will have mean 1/α = 1/0. answer: Since the mouse really tosses a coin to decide whether to move or stay in a cell. S1 . on the average.95 0. 0}. Sn . . How long does it take for the mouse. D1 . The information described above by the evolution of the account together with the distributions for Dn and Sn leads to the (one-step) transition probability which is deﬁned by: px. We easily see that if y > 0 then ∞ x. to move from cell 1 to cell 2? 2. as follows: Xn+1 = max{Xn + Dn − Sn . px. 1. . .2 Bank account The amount of money in Mr Baxter’s bank account evolves. and Sn is the money he wants to spend. are independent random variables. from month to month.α 1−α 1 β 2 1−β Another way to summarise the information is by the 2×2 transition probability matrix P= 1−α α β 1−β = 0.y := P (Xn+1 = y|Xn = x). Here. 2. intuitive.01 Questions of interest: 1. . So if he wishes to spend more than what is available. Dn = z + y − x) P (Sn = z)P (Dn = z + y − x) FS (z)FD (z + y − x). . = z=0 ∞ = z=0 3 . Dn is the money he deposits. D0 . are random variables with distributions FD . How often is the mouse in room 1 ? Question 1 has an easy. y = 0. Xn is the money (in pounds) at the beginning of the month. S2 .05 ≈ 20 minutes. .05 0.99 0. Dn − Sn = y − x) P (Sn = z.y = P (Dn − Sn = y − x) = = z=0 ∞ z=0 ∞ P (Sn = z. S0 . (This is the mean of the binomial distribution with parameter α. Assume also that X0 .

1/2 1/2 1/2 1/2 1/2 −2 1/2 −1 1/2 0 1/2 1 1/2 2 1/2 3 Questions: 1.0 p2. Are there any places which he will never visit? 3. we have px. and if so. Will Mr Baxter’s account grow.1 p2. in principle. he has no sense of direction. Are there any places which he will visit inﬁnitely many times? 4. how often will he be visiting 0? 2.4 Simple random walk (drunkard’s walk) in a city Suppose that the drunkard is allowed to move in a city whose streets are laid down in a square pattern: 4 . Of course. We may want to model such a case by assigning probability p for the drunkard to go one step to the right (and probability 1 − p for him to go one step to the left). can be computed from FS and FD . if y = 0. The transition probability matrix is y=1 p0. 2.0 = 1 − ∞ px.2 p1. Questions of interest: 1. 2. How long will it take (if ever) for the bank account to empty? We shall develop methods for answering these and other questions. If the pub is located in position 0 and his home in position 100.3 Simple random walk (drunkard’s walk) Consider a completely drunk person who walks along a street.1 p1. how fast? 2.2 P= p2. being drunk. So he may move forwards with equal probability that he moves backwards. how long will it take him.0 p1.2 ··· ··· ··· ··· · · · · · · ··· and is an inﬁnite matrix.y . on the average.0 p0. If he starts from position 0.something that.1 p0. to go from the pub to home? You might object to the assumption that the person is totally drunk and has no sense of direction.

observe. that since there are more degrees of freedom now.01 S 0.5 Actuarial chains A life insurance company wants to ﬁnd out how much money to charge its clients. pD. But.D . So he can move east. These things are. there is probability pH. and pS.Salvador Dali (1922) The Drunkard Suppose that the drunkard moves from corner to corner. given that he is currently healthy.H = 1 − pH.S − pH. pD. and let’s say he’s completely drunk. It proposes the following model summarising the state of health of an individual on a monthly basis: 0.D is omitted. Clearly.S = 1 − pS.3 H 0. We may again want to ask similar questions. Also. the company must have an idea on how long the clients will live. 2.8 0.S because pH. therefore. Note that the diagram omits pH. but they will become more precise in the sequel. the answer to this is crucial for the policy determination. west. for now. and pS. so we assign probability 1/4 for each move. the reason being that the company does not believe that its clients are subject to resurrection.D . mathematically imprecise. there is clearly a higher chance that our man will be lost.H − pS. Question: What is the distribution of the lifetime (in months) of a currently healthy individual? Clearly. north or south. 5 .S = 0.D = 1. etc.1 D Thus.3 that the person will become sick.H .

the future process (Xm . Xn−1 ) are independent conditionally on Xn . Lemma 1. n + m] Xk = ik | Xn = in . . . . 3. . almost exclusively (unless otherwise said explicitly). (1) 6 . in+m . in+1 ∈ S. for any n ∈ T. . X0 = i0 ) = P (Xn+1 = in+1 | Xn = in ). The elements of S are frequently called states. . 2. . . . n ∈ T}. . P (Xn+1 = in+1 | Xn = in . . that the Xn take values in some countable set S. zero. for any n. 2. . . Since S is countable. We will also assume. unless needed. . 3. Let us now write the Markov property in an equivalent way. 3. or negative). m ∈ T) is independent of the past process (Xm . . So we will omit referring to T explicitly.1 The Markov property Deﬁnition of the Markov property Consider a (ﬁnite or inﬁnite) sequence of random variables {Xn .} or the set Z of all integers (positive. . We focus almost exclusively to the ﬁrst choice. n ∈ T} is Markov instead of saying that it has the Markov property. . . . Usually. . m > n. . m < n. n ∈ Z+ ) is Markov if and only if. X0 = i0 ) = P (∀ k ∈ [n + 1. X2 = i2 . . where T is a subset of the integers. (Xn . Proof. n + m] Xk = ik | Xn = in ). . . we can see that by applying it several times we obtain P (∀ k ∈ [n + 1. it is customary to call (Xn ) a Markov chain. 1. .} or N = {1. called the state space.2 Transition probability and initial distribution We therefore see that joint probabilities can be expressed as P (X0 = i0 . viz. and so future is independent of past given the present. conditionally on Xn . . . We say that this sequence has the Markov property if. for all n ∈ Z+ and all i0 . Sometimes we say that {Xn .3 3. On the other hand. . . m and any choice of states i0 . if the claim holds. This is true for any n and m. . If (Xn ) is Markov then the claim holds by the conditional independence of Xn+1 and (X0 . which precisely means that (Xn+1 . . m ∈ T). . . . . the Markov property. Xn−1 ) given Xn . T = Z+ = {0. . . . Xn = in ) = P (X0 = i0 )P (X1 = i1 | X0 = i0 )P (X2 = i2 | X1 = i1 ) · · · P (Xn = in | Xn−1 = in−1 ). Xn+m ) and (X0 . X1 = i1 .

j (m. j ∈ S. Xn = in ) = µ(i0 ) pi0 . m + 1) · · · pin−1 . . a square matrix whose rows and columns are indexed by S (and this. n + 1) := P (Xn+1 = j | Xn = i) . But we will circumvent them by using Probability only.j (n. The function µ(i) is called initial distribution. X1 = i1 . will specify all joint distributions2 : P (X0 = i0 .i2 (1. n) if m ≤ ℓ ≤ n.which means that the two functions µ(i) := P (X0 = i) .j (n.in (n − 1.j (m. all probabilities of events associated with the Markov chain This causes some analytical diﬃculties. we deﬁne. X2 = i2 . (2) which is reminiscent of matrix multiplication. remembering a bit of Algebra. pi. n ≥ 0.3 Time-homogeneous chains We will specialise to the case of time-homogeneous chains. n) := [pi. of course. 1) pi1 . pi. or as P(m. ℓ) P(ℓ. n). Therefore. then the matrices are inﬁnitely large. m + 1)P(m + 1. Of course. n). . n) = P(m. but if S is an inﬁnite set. we can write (2) as P(m.j (m. n) := P (Xn = j | Xm = i) is the transition probability from state i at time m to state j time n ≥ m. for each m ≤ n the matrix P(m.i1 (0. n). 7 .j (m. n + 1) do not depend on n . m + 2) · · · P(n − 1.j (n. pi. viz. 2) · · · pin−1 . More generally. n) = i1 ∈S ··· in−1 ∈S pi. by a result in Probability. by deﬁnition.i1 (m. n) = P(m.j∈S .. which. i∈S i. . something that can be called semigroup property or Chapman-Kolmogorov equation. If |S| = d then we have d × d matrices.j (n. requires some ordering on S) and whose entries are pi.i (n − 1.3 In this new notation. 3. n + 1) is called transition probability from state i at time n to state j at time n + 1. . pi. n)]i. 2 3 And. by (1). n). n) = 1(i = j) and. means that the transition probabilities pi. The function pi.

j . for all integers a. if Y is a real random variable. A useful convention used for time-homogeneous chains is to write Pi (A) := P (A | X0 = i). (3) We call δi the unit mass at i. Then P(m. For example.j the (one-step) transition probability from state i to state j.j (n. when we say “Markov chain” we will mean “timehomogeneous Markov chain”. For example. δi (j) := 1. Similarly. Then we have µ n = µ 0 Pn .and therefore we can simply write pi. if j = i otherwise.j ]i. y y 8 . Thus Pi is the law of the chain when the initial distribution is δi . simply. n−m times for all m ≤ n. arranged in a row vector. It corresponds. and call pi. unless otherwise speciﬁed. 0. Starting from a speciﬁc state i. b ≥ 0. of course. we write Ei Y := E(Y | X0 = i) = yP (Y = y | X0 = i) = yPi (Y = y). if µ assigns probability p to state i1 and 1 − p to state i2 . then µ = pδi1 + (1 − p)δi2 . In other words. The distribution that assigns value 1 to a state i and 0 to all other states is denoted by δi . Notational conventions Unit mass. n + 1) =: pi. and the semigroup property is now even more obvious because Pa+b = Pa Pb . let µn := [µn (i) = P (Xn = i)]i∈S be the distribution of Xn . n) = P × P × · · · · · · × P = Pn−m . transition matrix.j∈S is called the (one-step) transition probability matrix or. Notice that any probability distribution on S can be written as a linear combination of unit masses. The matrix notation is useful. as follows from (1) and the deﬁnition of P. From now on. while P = [pi. to picking state i with probability 1.

i+1 = P (Xn+1 = i + 1 | Xn = i) = pi. It should be clear that if initially place the bug at random on one of the vertices. n = m. we indeed expect that we have N/2 molecules in each room. . the law of a stationary process does not depend on the origin of time. We can model this by saying that the chance that a molecule move from room 1 to room 2 is proportional to the number of molecules in room 1. On the average. say N = 1025 ) in a metallic chamber with a separator in the middle. let us look at the following example: Example: Motion on a polygon. Hence the uniform probability distribution is stationary. a sequence of i.i−1 = P (Xn+1 N −i N i = i − 1 | Xn = i) = .d. To provide motivation and intuition. 1. Consider a canonical heptagon and let a bug move on its vertices. Let Xn be the number of molecules in room 1 at time n. If at time n the bug is on a vertex. for any m ∈ Z+ . But this will not work. dividing it into two identical rooms. Then pi. N If we start the chain with X0 = N (all molecules in room 1) then the process (Xn . Another guess then is: Place 9 . Example: the Ehrenfest chain.) is stationary if it has the same law as (Xn . . The question then is: can we distribute the initial number of molecules in a way that the origin of time plays no role? One guess might be to take X0 = N/2. This is a model of gas distributed between two identical communicating chambers. at time n + 1 it moves to one of the adjacent vertices with equal probability. Then a hole is opened in the separator and we watch what happens as the molecules diﬀuse between the two rooms. at any point of time n. In other words. . the distribution of the bug will still be the same. Imagine there are N molecules of gas (where N is a large number. It is clear. m + 1. i. . Clearly. . random variables provides an example of a stationary process. n = 0.4 Stationarity We say that the process (Xn . n ≥ 0) will not be stationary: indeed.i.). and vice versa. selecting a vertex with probability 1/7. We now investigate when a Markov chain is stationary. then. The gas is placed partly in room 1 and partly in room 2.e. that if room 1 contains more gas than room 2 then there is a tendency to observe net motion from 1 to 2. as it takes some time before the molecules diﬀuse from room to room. because it is impossible for the number of molecules to remain constant at all times. . then. for a while we will be able to tell how long ago the process started.

A Markov chain (Xn .i = π(i − 1)pi−1. . . . In other words. . .i = P (X0 = i − 1)pi−1. use (1) to see that P (X0 = i0 . a Markov chain is stationary if and only if the distribution of Xn does not change with n.) The example above is typical: We found (by guessing!) some distribution π with the property that if X0 has distribution π then every other Xn also has distribution π. X2 = i2 . by (3). ﬁnd those distributions π such that πP = π. The only if part is obvious. Xm+2 = i2 .i + π(i + 1)pi+1. stationary or invariant distribution. which you should go through by yourselves. . X1 = i1 . (a binomial distribution). . then we have created a stationary Markov chain. 10 . the distribution µn at time n is related to the distribution at time 0 by µn = µ0 Pn . . For the if part. But we need to prove this. Changing notation.i + P (X0 = i + 1)pi+1. we see that the Markov chain is stationary if and only if the distribution µ0 is chosen so that µ0 = µ0 P. i 0 ≤ i ≤ N. If we can do so. for all m ≥ 0 and i0 .i = N N −i+1 N i+1 2−N + = · · · = π(i). We can now verify that if P (X0 = i) = π(i) then P (X1 = i) = π(i) and. Xm+n = in ). we are led to consider: The stationarity problem: Given [pij ]. Since. then the number of molecules in room 1 will be i with probability π(i) = N −N 2 . (4) Such a π is called . Proof. If we do so. in ∈ S. Xm+1 = i1 .each molecule at random either in room 1 or in room 2. The equations (4) are called balance equations. n ∈ Z+ ) is stationary if and only if it is time-homogeneous and Xn has the same distribution as Xℓ for all n and ℓ. . 2−N i−1 i+1 N N (The dots signify some missing algebra. Indeed. . . P (X1 = i) = j P (X0 = j)pj. Xn = in ) = P (Xm = i0 . . Lemma 2. hence. P (Xn = i) = π(i) for all n.

i. then we run into trouble: indeed. in matrix notation.i = π(0)p(i) + π(i + 1). . i≥1 π(i) −1 To compute π(1). Example: waiting for a bus. Thus. . So we must make an: ASSUMPTION: i≥1 j≥i p(j) < ∞. 2. We can describe this by a Markov chain which takes values in the set Sd of d! permutations of {1. Consider a deck of d = 52 cards and shuﬄe them in the following (stupid) way: pick two cards at random and swap their positions. .. which. each x ∈ Sd is of the form x = (x1 . π(1) will be zero. p1. xd ). π must be a probability: π(i) = 1. n ≥ 0) is a Markov chain with transition probabilities pi+1. can be written as P1 = 1. Who tells us that the sum inside the parentheses is ﬁnite? If the sum is inﬁnite. .i = p(i). . 11 . This means that the solution to the balance equations is identically zero. if i > 0 i ∈ N. d! x ∈ Sd . To ﬁnd the stationary distribution. j≥1 These can easily be solved: π(i) = π(1) j≥i p(j). Consider the following process: At time n (measured in minutes) let Xn be the time till the arrival of the next bus. which in turn means that the normalisation condition cannot be satisﬁed. . we must be careful. i∈S The matrix P has the eigenvalue 1 always because j∈S pi. where 1 is a column whose entries are all 1.d. the next one will arrive in i minutes with probability p(i). i = 1. i ≥ 1. and this means that 1 is a (right) eigenvector of P corresponding to the eigenvalue 1. After bus arrives at a bus stop.i = 1. random variables. But here. Then (Xn . . we use the normalisation condition π(1) = i≥1 j≥i = 1.Eigenvector interpretation: The equation π = πP says that π is a left eigenvector of the matrix P with eigenvalue 1. . write the equations (4): π(i) = π(j)pj. It is easy to see that the stationary distribution is uniform: π(x) = 1 . .}. i ≥ 0. which gives p(j) . The times between successive buses are i. where all the xi are distinct. Keep doing it continuously. In addition to that. d}. and so each π(i) will be zero.j = 1. 2 . . . Example: card shuﬄing. . . The state space is S = N = {1.

). ξ2 (n) = 1 − ε1 . . So let us do so. . and any ε1 . applied to the interval [0. 1. with P (ξr (0) = 1) = P (ξr (0) = 0) = 1/2 then the same will be true for all n. i. .). We can show that if we start with the ξr (0). 1] as a ﬂexible bread dough arranged in a long rod) kneads the bread for it stretches it till it doubles in length and then folds it back in the middle. Hence choosing the initial state at random in this manner results in a stationary Markov chain. .).. . r = 1. Example: bread kneading. ξ(n + 1) = ϕ(ξ(n)). . . and transform it according to the following rule: if ξ1 = 0 then shift to the left to obtain (ξ2 . . if ξ1 = 1 then shift to the left and ﬂip all bits to obtain (1 − ξ2 . ξi ∈ {0. for any n = 0. ξ3 . . 2.d.) is the state at time n. (5) 12 . . Note that there is nothing random in the transition mechanism of this chain! The only way to instill randomness is by choosing the initial state at random. P (ξ1 (n + 1) = ε1 . 2. .d. . . . ξ2 (n) = ε1 . ξr+1 (n) = 1 − εr ). So. The Markov chain is obtained by applying the mapping ϕ again and again. 1}. You can check that . .e. are i. ξ2 . ξ2 (n). We have thus deﬁned a mapping ϕ from binary vectors into binary vectors. . 1} for all i. So our assumption really means: ASSUMPTION ⇐⇒ the expected time between successive bus arrivals is ﬁnite. ξr+1 (n) = εr ) + P (ξ1 (n) = 1. 2. the same is true at n + 1. .i. 1 − ξ3 . . If ξ1 = 1 then x ≥ 1/2. . . i≥1 j≥i j≥1 i≥1 j≥1 and the latter sum is the expected time between two successive bus arrivals. what we have really shown is that this assumption is a necessary and suﬃcient condition for the existence and uniqueness of a stationary distribution. . . . εr ∈ {0. But to even think about it will give you some strength. 1] (think of [0. P (ξr (n) = 1) = P (ξr (n) = 0) = 1/2 The proof is by induction: if the ξ1 (n). . .Let us work out the sum to see if this assumption can be expressed in more “physical” terms: p(j) = 1(j ≥ i)p(j) = jp(j). . with P (ξr (n) = 1) = P (ξr (n) = 0) = 1/2 then. .) Consider an inﬁnite binary vector ξ = (ξ1 . .i. 2(1 − x)). . . Here ξ(n) = (ξ1 (n). i. . . . So it goes beyond our theory. But the function f (x) = min(2x. . and the rule maps x into 2(1 − x). If ξ1 = 0 then x < 1/2 and our shifting rule maps x into 2x. ξ2 (n). .* (This is slightly beyond the scope of these lectures and can be omitted. . ξr (n + 1) = εr ) = P (ξ1 (n) = 0. any r = 1. from (5).. As for the name of the chain (bread kneading) it can be justiﬁed as follows: to each ξ there corresponds a number x between 0 and 1 because ξ can be seen as the binary representation of the real number x. . . . .. Remarks: This is not a Markov chain with countably many states.

Instead of eliminating equations.i + j∈Ac π(j)pj. amongst them.j . When a Markov chain is stationary we refer to it as being in 4. Writing the equations in this form is not always the best thing to do. This is OK. Ac ) = F (Ac . π(j)pj.j (which equals 1): π(i) j∈S pi. as we will see in examples. Ac ) := π(i)pi. If the state space has d states.1 Finding a stationary distribution As explained above. then choose A = {i} and see that you obtain the balance equations. Conversely.i 13 .j = j∈A π(j)pj. π(j) = 1. i∈A j∈Ac Think of π(i)pi.i = j∈A π(j)pj.i . a stationary distribution π satisﬁes π = πP π1 = 1 In other words. because the ﬁrst d equations are linearly dependent: the sum of both sides equals 1. We have: Proposition 1. expecting to be able to choose. Proof. π is a stationary distribution if and only if π1 = 1 and F (A. If the latter condition holds.Note on terminology: steady-state. π(i) = j∈S (balance equations) (normalisation condition). Let us introduce the notion of “probability flow” from a set A of states to its complement under a distribution π: F (A. But we only have d unknowns. Fix a set A ⊂ S and write π(i) = j∈S π(j)pj. for all A ⊂ S.j as a “current” ﬂowing from i to j. A). we introduce more.i Multiply π(i) by j∈S pi. Ac ) is the total current from A to Ac . suppose that the balance equations hold. then the above are d + 1 equations. So F (A. a set of d linearly independent equations which can be more easily solved.i + j∈Ac π(j)pj. j∈S i ∈ S. therefore one of them is always redundant. The convenience is often suggested by the topological structure of the chain.

the mouse spends only 95. in steady state. (Consequently. p11 = 1 − α. and 4.2% of the time in 1. p22 = 1 − β. π(1) = β .99 ≈ 0. here is an example: Example: Consider the 2-state Markov chain with p12 = α. . 2.j = j∈A π(j)pj.Now split the left sum also: π(i)pi. Summing over i ∈ A. Thus. The extra degree of freedom provided by the last result. β < 1. we ﬁnd π(1) = 0. β = 0. p q Consider the following chain. 1.952. where p + q = 1: p p p 0 q 1 q 2 q 3 q 4 . A) Schematic representation of the ﬂow balance relations. while the remaining two terms give the equality we seek.99 (as in the mouse example). A c F(A . The constant c is found from π(1) + π(2) = 1. That is. We have π(1)α = π(2)β. . Instead. we see that the ﬁrst term on the left cancels with the ﬁrst on the right.) Assume 0 < α.i + j∈Ac π(j)pj. α+β π(2) = α .. If we take α = 0. One can ask how to choose a minimal complete set of linearly independent equations. . gives us ﬂexibility.05.. .04 π(2) = 0. The balance equations are: π(i) = pπ(i − 1) + qπ(i + 1). but we shall not do this here. 14 i = 1.04 room 1. 3.05 ≈ 0. α+β π(2) = cα.8% of the time in room 2. Example: a walk with a barrier.j + j∈A j∈Ac π(i)pi.i This is true for all i ∈ S. which immediately tells us that π(1) is proportional to β and π(2) proportional to α: π(1) = cβ. p21 = β. A ) A c c F( A .048.

the chain will visit j at some ﬁnite time.) 5 5. i. . • Suppose that i j. If i ≥ 1 we have F (A. Proof. . in−1 . 1.j > 0 for some n ∈ N and some states i1 .But we can make them simpler by choosing A = {0. . a pair (S.e. j) ∈ E ⇐⇒ pi. We can frequently (but not always) draw the graph (as we did in examples) and visualise the set E as the set of all edges with arrows.i2 · · · pin−1 . If i = j there is nothing to prove. n≥0 . . . . In graph-theoretic terminology. In fact. . S is a set of vertices. . (iii) Pi (Xn = j) > 0 for some n ≥ 0.e. These are much simpler than the previous ones. E) where S is the state space and E ⊂ S × S deﬁned by (i. i leads to j (written as i Notice that this relation is reflexive): ∀i ∈ S Furthermore: Theorem 1. 5. Since 0 < Pi (∃n ≥ 0 Xn = j) ≤ 15 Pi (Xn = j). (ii) pi. i.2 The relation “leads to” We say that a state i leads to state j if. Does this provide a stationary distribution? Yes. A). This is often cast in terms of a digraph (directed graph). For i ≥ 1. provided that ∞ π(i) = 1. p < 1/2. (If p ≥ 1/2 there is no stationary distribution. This is i=0 possible if and only if p < q. π(i) = (p/q)i π(0). In other words. and E a set of directed edges. i − 1}.i1 pi1 . we can solve them immediately.1 Topological structure The graph of a chain This refers to the structure of the process that does not depend on the exact values of the transition probabilities but only on which of them are positive. So assume i = j. The following are equivalent: (i) i j i i j) ⇐⇒ Pi (∃n ≥ 0 Xn = j) > 0.j > 0. starting from i. Ac ) = π(i − 1)p = π(i)q = F (Ac .

by deﬁnition. In other words. i}. meaning that (ii) holds. We have Pi (Xn = j) = i1 .. the set of all states that communicate with i: [i] := {j ∈ S : j So. Proof. • Suppose that (ii) holds. The communicating class corresponding to the state i is. it partitions S into equivalence classes known as communicating classes.. 5} is closed: if the chain goes in there then it never moves out of it. 5. {4. by deﬁnition. the class {2.the sum on the right must have a nonzero term. The relation is transitive: If i j and j k then i k. where the sum extends over all possible choices of states in S. (i) holds. we see that there are four communicating classes: {1}. this is symmetric (i Corollary 2. 16 . (iii) holds. {2.i1 pi1 . because Pi (Xn = j) ≤ Pi (∃m ≥ 0 Xm = j). [i] = [j] if and only if i identical or completely disjoint. By assumption.3 The relation “communicates with” Deﬁne next the relation i communicates with j (written as i Obviously.j . is an equivalence relation. Equivalence relation means that it is symmetric. 3} is not closed. and so (iii) holds.i2 · · · pin−1 . However. The ﬁrst two classes diﬀer from the last two in character. (6) j.. {6}. 3}. The relation j ⇐⇒ j i). Since Pi (Xn = j) > 0. We just observed it is symmetric. Corollary 1. The class {4. j) ⇐⇒ i j and j i. 5}. the sum contains a nonzero term. reﬂexive and transitive. we have that the sum is also positive and so one of its terms is positive.. Two communicating classes are either 1 2 3 4 5 6 In the example of the ﬁgure. Hence Pi (Xn = j) > 0.in−1 pi. • Suppose now that (iii) holds. It is reﬂexive and transitive because is so. In addition. Look at the last display again. Just as any equivalence relation.

i2 > 0. Thus. we have i1 ∈ C. If we remove the non-closed communicating classes then we obtain a collection of disjoint closed communicating classes with no arrows between them. To put it otherwise. Since i ∈ C. C5 in the ﬁgure above are closed communicating classes. starting from any state. and so on.More generally. Note: An inessential state i is such that there exists a state j such that i j but j i. the single-element set {i} is a closed communicating class. In graph terms. Note that there can be no arrows between closed communicating classes. and C is closed. . Closed communicating classes are particularly important because they decompose the chain into smaller. C3 are not closed. parts. If a state i is such that pi. more manageable.i = 1 then we say that i is an absorbing state. pi. . we conclude that j ∈ C. The converse is left as an exercise. If all states communicate with all others. .i2 · · · pin−1 . In other words still. pi1 . and so i2 ∈ C.i1 > 0. The communicating classes C1 . Similarly.i1 pi1 . the state space S is a closed communicating class.j = 1.j > 0. j∈C for all i ∈ C. Suppose that i j. Suppose ﬁrst that C is a closed communicating class. we say that a set of states C ⊂ S is closed if pi. its transition matrix P) is called irreducible. Proof. We then can ﬁnd states i1 . we can reach any other state. the state is inessential. 17 . A communicating class C is closed if and only if the following holds: If i ∈ C and if i j then j ∈ C. Note: If we consider a Markov chain with transition matrix P and ﬁx n ∈ N then the Markov chain with transition matrix Pn has exactly the same closed communicating classes. Lemma 3. . There can be arrows between two nonclosed communicating classes but they are always in the same direction. The classes C4 . more precisely. C2 . in−1 such that pi. Otherwise. A general Markov chain can have an inﬁnite number of communicating classes. then the chain (or. Why? A state i is essential if it belongs to a closed communicating class. C1 C2 C3 C4 C5 The general structure of a Markov chain is indicated in this ﬁgure.

The chain alternates between the two sets. whether it is possible to return to i in m steps can be decided by one of the integer divisors of m. A state i is called aperiodic if d(i) = 1. Dj = {n ∈ (n) (α) N : pjj > 0}. and. Since Pm+n = Pm Pn .e. that we can further simplify the behaviour of the chain by noticing that if the chain is in the set C1 = {1. . j ∈ S.5. So if the chain starts with X0 ∈ C1 then we know that X1 ∈ C2 . If i j then d(i) = d(j). X3 ∈ C2 . . d(i) = gcd Di . . Let us denote by pij the entries of Pn . (Can you ﬁnd an example of a chain with period 3?) We will now give the deﬁnition of the period of a state of an arbitrary chain. 3} at some point of time then.4 Period Consider the following chain: 1 2 4 3 The chain is irreducible. (n) Proof. we have So pii (2n) (n) pij ≥ pii (n) 2 (m+n) ≥ pik pkj . it will move to the set C2 = {2. So. Let Di = {n ∈ N : pii > 0}. The period of an inessential state is not deﬁned. d(j) = gcd Dj . Theorem 2 (the period is a class property). . 4} at the next step and vice versa. If i j there is α ∈ N with pij > 0 and if 18 . The period of an essential state is deﬁned as the greatest common divisor of all natural numbers n with such that it is possible to return to i in n steps: d(i) := gcd{n ∈ N : Pi (Xn = i) > 0} . (m) (n) for all m. n ≥ 0. X4 ∈ C1 . i. Notice however. ℓ ∈ N. X2 ∈ C1 . Therefore if pii > 0 then pii (n) (ℓn) > 0 for all ℓ ∈ N. Another way to say this is: (n) (m) If the integer n divides m (denote this by: n | m) and if pii > 0 then pii > 0. Consider two distinct essential states i. j. all states form a single closed communicating class. pii (ℓn) ≥ pii (n) ℓ . more generally. i. We say that the chain has period 2.

Of particular interest. Then pii > 0 and so pjj ≥ pij pii pji > 0. So d(j) is a divisor of all elements of Di . j ∈ S. Cd−1 such that pij = 1. Then we can uniquely partition the state space S into d disjoint subsets C0 . Cd−1 such that the chain moves cyclically between them. . the chain is called aperiodic. . In particular. .j i there is β ∈ N with pji > 0. q − r > 0. Theorem 4. 1. all states communicate with one another. j∈Cr+1 i ∈ Cr . where the remainder r ≤ n1 − 1. So pjj ≥ pij pji > 0. if d = 1. r = 0. Since d(i) is the greatest common divisor we have d(i) ≥ d(j) (in fact. . we have n ∈ Di as well. Therefore n = qn1 + r(n2 − n1 ) = (q − r)n1 + rn2 . Let n be suﬃciently large. d(j) | d(i)). as deﬁned earlier. From the last two displays we get d(j) | n for all n ∈ Di . if an irreducible chain has period d then we can decompose the state space into d sets C0 . we obtain d(i) ≤ d(j) as well.) 19 . Formally. Arguing symmetrically. i. (n) (n) (α+n+β) (α) (n) (β) (β) (α+β) (α) (β) This ﬁgure shows the internal structure of a closed communicating class with period d = 4. Divide n by n1 to obtain n = qn1 + r.e. . . Pick n2 . Proof. . . An irreducible chain has period d if one (and hence all) of the states have period d. n2 ∈ Di . C1 . Consider an irreducible chain with period d. (Here Cd := C0 . are irreducible chains. this is the content of the following theorem. . . More generally. C1 . Corollary 3. Because n is suﬃciently large. . An irreducible chain with ﬁnitely many states is aperiodic if and only if there (n) exists an n such that pij > 0 for all i. n1 ∈ Di such that n2 − n1 = 1. chains where. Now let n ∈ Di . . If a number divides two other numbers then it also divides their diﬀerence. Theorem 3. showing that α + n + β ∈ Dj . Therefore d(j) | α + β. Therefore d(j) | α + β + n for all n ∈ Di . Since n1 . d − 1. If i is an aperiodic state then there exists n0 such that pii > 0 for all n ≥ n0 . So d(i) = d(j). showing that α + β ∈ Dj .

. Notice that this is an equivalence relation.e. We now show that if i belongs to one of these classes and if pij > 0 then.Proof.. We are interested in deriving formulae for the probability that this time is ﬁnite as well as the expectation of this time. Consider the path i0 → i → j → i2 → i3 → · · · → id → i0 . i ∈ C0 . Such a path is possible because of the choice of the i0 . id ∈ C0 . i2 i0 i0 in a number of steps which is an integer multiple of d − 1 (why?). Suppose pij > 0 but j ∈ C1 but. with corresponding classes C0 . Cd−1 . C1 .e the set of states j such that i j). after it takes one step from its current position. We show that these classes satisfy what we need. 20 . consider the chain 1 1/3 1 0 4 1/2 1 1/6 2 We shall later consider the time inf{n ≥ 1 : Xn ∈ A} which diﬀers from TA simply by considering n ≥ 1 instead of n ≥ 0. . j belongs to the next class. Take. .id > 0.. Deﬁne the relation i d j ⇐⇒ pij (nd) > 0 for some n ≥ 0. We avoid excessive notation by using the same letter for both. We use a method that is based upon considering what the Markov chain does at time 1. Hence it partitions the state space S into d disjoint equivalence classes. j ∈ C2 . id−1 d 6 Hitting times and ﬁrst-step analysis Consider a Markov chain with transition probability matrix P. i1 . If X0 ∈ A the two times coincide. necessarily. We deﬁne the hitting time4 of a set of states A by TA := inf{n ≥ 0 : Xn ∈ A}. for example. . Assume d > 1. necessarily. i. Pick a state i0 and let C0 be its equivalence class (i. say.. It is easy to see that if we continue and pick a further state id with pid−1 . which contradicts the deﬁnition of d. The reader is warned to be alert as to which of the two variables we are considering at each time. then.. As an example.. i1 . and by the assumptions that d d j. The existence of such a path implies that it is possible to go from i0 to i. . Let C1 be the equivalence class of i1 .. that is why we call it “first-step analysis”. Continue in this manner and deﬁne states i0 . . Then pick i1 such that pi0 i1 > 0.

ϕ(i) = j∈S i∈A pij ϕ(j). X2 . because the chain may end up in state 2. Then ϕ′ (i) = j∈A pij + j∈A pij ϕ′ (j). i ∈ A. and deﬁne ϕ(i) := Pi (TA < ∞). . if ϕ′ (i) is any other solution of these equations then ϕ′ (i) ≥ ϕ(i) for all i ∈ S. ﬁx a set of states A. the event TA is independent of X0 . conditionally on X1 .It is clear that P1 (T0 < ∞) < 1. By self-feeding this equation. which implies that ′ P (TA < ∞|X1 = j) = P (TA < ∞|X0 = j) = Pj (TA < ∞) = ϕ(j). a ′ function of X1 . We then have: Theorem 5.e. j∈S as needed. X0 = i) = P (TA < ∞|X1 = j). let ϕ′ (i) be another solution. we have Pi (TA < ∞) = ϕ(j)pij . But the Markov chain is homogeneous. Hence: ′ ′ P (TA < ∞|X1 = j. If i ∈ A then TA = 0. But what exactly is this probability equal to? To answer this in its generality. ′ is the remaining time until A is hit). we have ϕ′ (i) = j∈A pij + j∈A = j∈A pij + j∈A pij pij pjk + k∈A k∈A pjk + j∈A pij pjk ϕ′ (k) k∈A pjk ϕ′ (k) k∈A 21 . . So TA = 1 + TA . and so ϕ(i) = 1. If i ∈ A. Combining the above. The function ϕ(i) satisﬁes ϕ(i) = 1. ′ Proof. . Furthermore. For the second part of the proof. then TA ≥ 1.). Therefore. We ﬁrst have (where TA Pi (TA < ∞) = j∈S ′ Pi (1 + TA < ∞|X1 = j)Pi (X1 = j) = j∈S ′ Pi (TA < ∞|X1 = j)pij ′ But observe that the random variable TA is a function of the future after time 1 (i.

We recognise that the ﬁrst term equals Pi (TA = 1). if the chain starts from 2 it will never hit 0. We have: Theorem 6. If ′ i ∈ A. the second equals Pi (TA = 2). obviously. Start the chain from i. We immediately have ϕ(0) = 1. 2. the set that contains only state 0. we obtain ϕ′ (i) ≥ Pi (TA < ∞) = ϕ(i). 1 ψ(1) = 1 + ψ(1).) As for ϕ(1). j∈A i ∈ A. we choose A = {0}. ′ E i TA = 1 + E i TA = 1 + j∈S pij Ej TA = 1 + j∈A pij ψ(j). By continuing self-feeding the above equation n times. On the other hand. we have 1 1 ϕ(1) = ϕ(1) + ϕ(0). and let ψ(i) = Ei TA . By omitting the last term we obtain ϕ′ (i) ≥ Pi (TA ≤ 2). If i ∈ A then. Clearly. i = 0. and ϕ(2) = 0. if ψ ′ (i) is any other solution of these equations then ψ ′ (i) ≥ ψ(i) for all i ∈ S. 3 2 so ϕ(1) = 3/4. 2}. we obtain ϕ′ (i) ≥ Pi (TA ≤ n). ψ(i) = 1 + i∈A pij ψ(j). Example: In the example of the previous ﬁgure. because it takes no time to hit A if the chain starts from A. The second part of the proof is omitted. which is the second of the equations. Example: Continuing the previous example. let A = {0. Letting n → ∞. Proof. TA = 0 and so ψ(i) := Ei TA = 0. Therefore. so the ﬁrst two terms together equal Pi (TA ≤ 2). ∼◦∼◦∼◦∼◦∼◦∼◦∼◦∼ Next consider the mean time (mean number of steps) until the set A is hit for the ﬁrst time. then TA = 1 + TA . Furthermore. The function ψ(i) satisﬁes ψ(i) = 0. 3 22 . 1. ψ(i) := Ei TA . ψ(0) = ψ(2) = 0. (If the chain starts from 0 it takes no time to hit 0. as above. We let ϕ(i) = Pi (T0 < ∞).

. Then ′ 1+TA x ∈ A. otherwise. (This should have been obvious from the start. therefore the mean number of ﬂips until the ﬁrst success is 3/2. X2 . ′ where TA is the remaining time. . It should be clear that the equations satisﬁed are: ϕAB (i) = 1. if x ∈ A. in the last sum. we just changed index from n to n + 1. consider the following situation: Every time the state is x a reward f (x) is earned.) ∼◦∼◦∼◦∼◦∼◦∼◦∼◦∼ Now let us consider two hitting times TA . by the Markov property. i. i∈A Furthermore. The total reward is f (X0 ) + f (X1 ) + · · · + f (XTA ). because the underlying experiment is the ﬂipping of a coin with “success” probability 2/3. ϕAB is the minimal solution to these equations. Clearly. . ∼◦∼◦∼◦∼◦∼◦∼◦∼◦∼ As yet another application of the ﬁrst-step analysis method. where. h(x) = Ex n=0 f (Xn ) ′ 1+TA = f (x) + Ex n=1 ′ TA f (Xn ) = f (x) + Ex n=0 f (Xn+1 ). Next. until set A is reached. ′ TA = 1 + TA . We are interested in the mean total reward TA h(x) := Ex n=0 f (Xn ). because when X0 ∈ A then T0 = 0 and so the total reward is f (X0 ). Now the last sum is a function of the future after time 1. . This happens up to the ﬁrst hitting time TA ﬁrst hitting time of the set A. as argued earlier. ϕAB (i) = j∈A i∈B pij ϕAB (j). B. TB for two disjoint sets of states A. ϕAB (i) = 0.e. We may ask to ﬁnd the probabilities ϕAB (i) = Pi (TA < TB ). Hence.which gives ψ(1) = 3/2. h(x) = f (x). of the random variables X1 . ′ TA Ex n=0 f (Xn+1 ) = y∈S pxy h(y). 23 . after one step.

2 2 We solve and ﬁnd h(2) = 8. h(3) = 7. 3. k ∈ N) form the strategy of the gambler. You can remember the latter ones by thinking that the expectecd total reward equals the immediate reward f (x) plus the expected remaining reward. (Also. 2. y∈S h(x) = f (x) + x ∈ A. The successive stakes (Yk . the set of equations satisﬁed by h. unless he is in room 4. we have h(4) = 0 1 h(1) = 1 + h(2) 2 1 h(3) = 3 + h(2) 2 1 1 h(2) = 2 + h(1) + h(3). (Sooner or later. So. are h(x) = 0. so 24 .) 7 Gambler’s ruin Consider the simplest game of chance of tossing an (possibly unfair) coin repeatedly and independently. for i = 1. he collects i pounds. h(1) = 5. Clearly. Example: A thief enters sneaks in the following house and moves around the rooms. If heads come up then you win £1 per pound of stake. why?) If we let h(i) be the average total reward if he starts from room i. he will die. If ξk is the outcome of the k-th toss (where ξk = +1 means heads and ξk = −1 means tails). in which case he dies immediately. x∈A pxy h(y).Thus. if he starts from room 2. 1 2 4 3 The question is to compute the average total reward. Let p the probability of heads and q = 1 − p that of tails. if just before the k-th toss you decide to bet Yk pounds then after the realisation of the toss you will have earned Yk pounds if heads came up or lost Yk pounds if tails appeared. your fortune after the n-th toss is n Xn = X0 + k=1 Y k ξk . until he dies in room 4. collecting “rewards”: when in room i. no gambler is assumed to have any information about the outcome of the upcoming toss.

b = ℓ > 0 and let ϕ(x. . . We ﬁrst solve the problem when a = 0. a + 1. if p = q = 1/2 b−a Proof. a < i < b. . astrological nature. if p = q Px (Tb < Ta ) = .if the gambler wants to have a strategy at all.i−1 = q = 1 − p. betting £1 at each integer time against a coin which has P (heads) = p. she must base it on whatever outcomes have appeared thus far. Let £x be the initial fortune of the gambler. We are clearly dealing with a Markov chain in the state space {a. is our next goal. in addition. 0 ≤ x ≤ ℓ. and where the states a. We wish to consider the problem of ﬁnding the probability that the gambler will never lose her money.i+1 = p. a ≤ x ≤ b. x − a . in the ﬁrst case. In other words. . The gambler starts with X0 = x pounds and has a target: To make b pounds (b > x). ℓ). We use the notation Px (respectively. pi. Theorem 7. Let us consider a concrete problem. Then the probability that the gambler’s fortune will reach level b > x before reaching level a < x equals: x−a − 1 (q/p) (q/p)b−a − 1 . (Think why!) To simplify things. . The diﬀerence between the gambler’s ruin problem for a company and that for a mere mortal is that. Ex ) for the probability (respectively. in simplest case Yk = 1 for all k. a + 2. . This has profound actuarial applications: An insurance company must make sure that the strategies it employs are such that–at least–the company will not go bankrupt (and. Yk cannot depend on ξk . To solve the problem. ℓ) := Px (Tℓ < T0 ). write ϕ(x) instead of ϕ(x.g. We obviously have ϕ(0) = 0. Consider a gambler playing a simple coin tossing game. P (tails) = 1 − p. the company has control over the probability p. . as described above. expectation) under the initial condition X0 = x. it will make enormous proﬁts). b} with transition probabilities pi. b are absorbing. . The general solution will then obtained from Px (Tb < Ta ) = ϕ(x − a. Consider the hitting time of state y: Ty := inf{n ≥ 0 : Xn = y}. . b − a). It can only depend on ξ1 . She will stop playing if her money fall below a (a < x). b − 1. 25 ϕ(ℓ) = 1. ξk−1 and–as is often the case in practice–on other considerations of e.

λℓ −1 x ℓ. by ﬁrst-step analysis. we have obtained ϕ(x. ℓ→∞ λℓ − 1 0−1 λx − 1 = 1 − 0 = 1. Assuming that λ := q/p = 1. Since ϕ(ℓ) = 1. ℓ→∞ ℓ Px (T0 < ∞) = 1 − lim Finally. λℓ − 1 0 ≤ x ≤ ℓ. ℓ→∞ λℓ − 1 x = 1 − 0 = 1. We ﬁnally ﬁnd λx − 1 . δ(x − 2) = · · · = q p x δ(0). we ﬁnd δ(0) = 1/ℓ and so x ϕ(x) = . We have δ(x) = q δ(x − 1) = p q p 2 0 < x < ℓ. The general case is obtained by replacing ℓ by b − a and x by x − a. we have λℓ −1 λ−1 δ(0) = 1 which gives δ(0) = (λ − 1)/(λℓ − 1). λ = 1. Since p + q = 1. ℓ→∞ ℓ→∞ (q/p)x . if p > 1/2 if p ≤ 1/2. ℓ Summarising. λ−1 ϕ(x) = If λ = 1 then q = p and so ϕ(x + 1) − ϕ(x) = ϕ(x) − ϕ(x − 1). If p > 1/2. then λ = q/p > 1. we ﬁnd ϕ(x) = δ(x − 1) + δ(x − 2) + · · · + δ(0) = [λx−1 + · · · + λ + 1]δ(0) = Since ϕ(ℓ) = 1.and. if p = q. λ = 1. ℓ) = λx −1 . 1. Corollary 4. and so Px (T0 < ∞) = 1 − lim If p < 1/2. Px (T0 < ∞) = 1 − lim 26 . 0 ≤ x ≤ ℓ. λx − 1 δ(0). Let δ(x) := ϕ(x + 1) − ϕ(x). We have Px (T0 < ∞) = lim Px (T0 < Tℓ ) = 1 − lim Px (Tℓ < T0 ). ϕ(x) = pϕ(x + 1) + qϕ(x − 1). if λ = 1 if λ = 1. then λ = q/p < 1. and so λx − 1 λx − 1 =1− = λx . we can write this as p[ϕ(x + 1) − ϕ(x)] = q[ϕ(x) − ϕ(x − 1)]. The ruin probability of a gambler starting with £x is Px (T0 < ∞) = Proof. This gives δ(x) = δ(0) for all x and ϕ(x) = xδ(0).

A | Xτ . τ < ∞) = pi. Since (X0 . Recall that the Markov property states that. . In particular. . . .Xτ +2 = j2 . . . Xn+2 = j2 . τ = n) = pi. . . Xn+2 = j2 . Let τ be a stopping time. | Xn = i)P (Xn = i. .8 Stopping times and the strong Markov property A random time is a random variable taking values in the time-set. Xn ) and the ordinary splitting property at n. . for any n. the ﬁrst hitting time of the set A. τ = n)P (Xn = i. . . . . Xn )1(τ = n). . (b) is by the deﬁnition of conditional probability. . Xn+2 = j2 . . the future process (Xn . . τ = n) (c) = P (Xn+1 = j1 . Xτ )1(τ < ∞) = n∈Z+ (X0 . . . . Xn ). . 27 . the value +∞. . . . τ < ∞) and that P (Xτ +1 = j1 . possibly. τ < ∞)P (A | Xτ . We are going to show that for any such A. including. P (Xτ +1 = j1 . . Xτ +2 = j2 . Xn+1 . .j2 · · · P (Xn = i. . . Let τ be a stopping time. . Xτ = i. . Xτ +2 = j2 . A. This stronger property is referred to as the strong Markov property. τ < ∞) = P (Xn+1 = j1 . An example of a random time which is not a stopping time is the last visit of a speciﬁc set. (d) where (a) is just logic. We will now prove that this remains true if we replace the deterministic n by a random stopping time τ . A. . A. . . . a stopping time τ is a random variable with values in Z+ ∪{+∞} such that 1(τ = n) is a deterministic function of (X0 . Xn ) for all n ∈ Z+ . Theorem 8 (strong Markov property). A. Our assertions are now easily obtained from this by ﬁrst summing over all n (whence the event {τ < ∞} appears) and then by dividing both sides by P (Xτ = i. . (c) is due to the fact that A ∩ {τ = n} is determined by (X0 .) is independent of the past process (X0 . Xn = i.j1 pj1 . . n ≥ 0) is a time-homogeneous Markov chain.j2 · · · We have: P (Xτ +1 = j1 . A ∩ {τ = n} is determined by (X0 . Xn ) if we know the present Xn . A. . .j1 pj1 . any such A must have the property that. . A. . . . τ = n) = P (Xn+1 = j1 . . . | Xτ . . . the future process (Xτ +n . . Proof. . An example of a stopping time is the time TA . . Then. . considered earlier. and (d) is due to the fact that (Xn ) is time-homogeneous Markov chain with transition probabilities pij . | Xn = i. τ = n) (a) (b) = P (Xτ +1 = j1 . for all n ∈ Z+ . . Xτ ) and has the same law as the original process started from i. Xτ ) if τ < ∞. τ < ∞). n ≥ 0) is independent of the past process (X0 . . . . τ = n). Suppose (Xn . | Xτ = i. Xτ +2 = j2 . Let A be any event determined by the past (X0 . . . . conditional on τ < ∞ and Xτ = i. .

in fact. State i may or may not be visited again. If X0 = i then the two deﬁnitions coincide. in a generalised sense.) We let Ti be the time of the third visit. “chunks”. Regardless. All these random times are stopping times. Xi (2) .d. We (r) refer to them as excursions of the process. .d. Our Ti here is always positive. So Xi is the r-th excursion: the chain visits 28 . and so on. (And we set Ti := Ti . Note the subtle diﬀerence between this Ti and the Ti of Section 6. we recursively deﬁne Ti (r) (2) := inf{n > Ti (r−1) : Xn = i}. These excursions are independent! 11111111111 00000000000 1111111111 0000000000 11111111111 00000000000 11111111111111 00000000000000 1111111111 0000000000 11111111111 00000000000 11111 00000 11111111111111 00000000000000 1111111111 0000000000 11111111111 00000000000 11111 00000 11111111111111 00000000000000 1111111111 0000000000 11111 00000 11111111111111 00000000000000 1111111111 0000000000 1111111111111111 0000000000000000 11111 11111111111111 00000000000000 1111111111 0000000000 i 00000 11111111111111111111111111111111111111111111 00000000000000000000000000000000000000000000 1111111111111111 0000000000000000 11111 00000 11111 00000 Ti = Ti (1) Ti (2) Ti (3) Ti (4) Fix a state i ∈ S and let Ti be the ﬁrst time that the chain will visit this state: Ti := inf{n ≥ 1 : Xn = i}. . we let Ti be the time of the second (1) (3) visit.. Schematically. we will show that we can split the sequence into i.i. The Ti of Section 6 was allowed to take the value 0. Ti Xi (0) (r) ≤ n < Ti (r+1) . However. random variables is a (very) particular example of a Markov chain. 0 ≤ n < Ti . 2. . Formally. then the behaviour of the chain between successive visits cannot be inﬂuenced by the past or the future (and that is due to the strong Markov property). (Why?) Consider the “trajectory” of the chain between two successive visits to the state i: Xi Let also We think of (r) := Xn . The idea is that if we can ﬁnd a state i that is visited again and again. The random variables Xn comprising a general Markov chain are not independent. . Xi (1) . 3. Xi (0) . . . ..9 Regenerative structure of a Markov chain A sequence of i. random trajectories. := Xn . (1) r = 1. by using the strong Markov property.i. They are. as “random variables”. r = 2.

Perhaps some states (inessential ones) will be visited ﬁnitely many times. are independent random variables. . A trivial. g(Xi (1) ). (r) . g(Xi (2) ). Xi have the same distribution follows from the fact that the Markov chain is time-homogeneous. An immediate consequence of the strong Markov property is: Theorem 9. we “watch” it up until the next time it returns to state i. 29 . if it starts from state i at some point of time. Xi distribution. of the excursions are Λr := (r) n=Ti 1(Xn = j) The meaning of Λr is this: it expresses the number of visits to a state j between two successive visits to the state i. Apply Strong Markov Property (SMP) at Ti : SMP says that. but important corollary of the observation of this section is that: Corollary 5. the chain returns to i only ﬁnitely many times: Pi (Xn = i for inﬁnitely many n) = 0. . the future is independent of the (1) (2) past. . by deﬁnition. starting from i. equal to i. . Example: Deﬁne Ti (r+1) (0) ). they are also identical in law (i..d. .i. except. it will behave in the same way as if it starts from the same state at another point of time. for time-homogeneous Markov chains. the future after Ti is independent of the past before Ti . and “goes for an excursion”.state i for the r-th time. Xi (1) . The excursions Xi (0) . . possibly. Second. we say that a state i is transient if.). Λ2 . (0) are independent random variables. Therefore. The claim that Xi . and this proves the ﬁrst claim. Moreover. But (r) the state at Ti is. have the same Proof. 10 Recurrence and transience When a Markov chain has ﬁnitely many states. The last corollary ensures us that Λ1 . but there are always states which will be visited again and again. Therefore. ad inﬁnitum. . given the state at (r) (r) (r) the stopping time Ti . all. (Numerical) functions g(Xi independent random variables. the chain returns to i inﬁnitely many times: Pi (Xn = i for inﬁnitely many n) = 1. at least one state must be visited inﬁnitely many times. we say that a state i is recurrent if. In particular. Xi (2) .. starting from i. We abstract this trivial observation and give a couple of deﬁnitions: First.

Consider the total number of visits to state i (excluding the possibility that X0 = i): ∞ Ji = n=1 1(Xn = i) = sup{r ≥ 0 : Ti(r) < ∞} . State i is recurrent. Therefore. Indeed. Indeed. then Pi (Ji < ∞) = 1. i. every state is either recurrent or transient. the random variable Ji is geometric: k Pi (Ji ≥ k) = fii . we have Proof. therefore concluding that every state is either recurrent or transient.Important note: Notice that the two statements above are not negations of one another. 3. Notice that: state i is visited inﬁnitely many times ⇐⇒ Ji = ∞. We will show that such a possibility does not exist. Suppose that the Markov chain starts from state i. Notice also that fii := Pi (Ji ≥ 1) = Pi (Ti < ∞). This means that the state i is recurrent. . If fii < 1. there could be yet another possibility: 0 < Pi (Xn = i for inﬁnitely many n) < 1. in principle. 30 . namely. We summarise this together with another useful characterisation of recurrence & transience in the following two lemmas. Pi (Ji = ∞) = 1. Ei Ji = ∞. the event {Ji ≥ k} implies that Ti < ∞ and that Ji ≥ k − 1. But the past (k−1) before Ti is independent of the future. and that is why we get the product. 2. By the strong Markov property at time Ti (k−1) Pi (Ji ≥ k) = Pi (Ji ≥ k − 1)Pi (Ji ≥ 1). Pi (Ji = ∞) = 1.e. Lemma 5. We claim that: Lemma 4. we conclude what we asserted earlier. 2 k Pi (Ji ≥ k) = fii Pi (Ji ≥ k − 1) = fii Pi (Ji ≥ k − 2) = · · · = fii . k ≥ 0. Because either fii = 1 or fii < 1. (k−1) Notice that fii = Pi (Ti < ∞) can either be equal to 1 or smaller than 1. Starting from X0 = i. 4. This means that the state i is transient. fii = Pi (Ti < ∞) = 1. If fii = 1 we have that Pi (Ji ≥ k) = 1 for all k. The following are equivalent: 1.

Notice the diﬀerence with pij . it is visited ﬁnitely many times with probability 1. Pi (Ji < ∞) = 1. From Elementary Probability. by the deﬁnition of Ji . Proof. Ei Ji < ∞. State i is transient if. assuming that the chain (n) starts from state i.e. the probability to hit state j for the ﬁrst time at time n ≥ 1. then. it is visited inﬁnitely many times. Proof. This. and.1 Deﬁne First hitting time decomposition fij := Pi (Tj = n). fij ≤ pij . by deﬁnition of Ji is equivalent to Pi (Ji = ∞) = 1. If Ei Ji = ∞. (n) pij n (m) (n−m) (n) (n) = m=1 fij pjj . fii = Pi (Ti < ∞) < 1. State i is transient. by deﬁnition. is equivalent to Pi (Ji < ∞) = 1. Lemma 6. then. clearly. The following are equivalent: 1. if we assume Ei Ji < ∞. On the other hand. The equivalence of the ﬁrst two statements was proved before the lemma. by deﬁnition. as n → ∞. 3. Clearly. therefore Pi (Ji < ∞) = 1. (n) But if a series is ﬁnite then the summands go to 0. (n) pij n = Pi (Xn = j) = m=1 n Pi (Tj = m.Proof. Corollary 6. The equivalence of the ﬁrst two statements was proved above.e. as n → ∞. pii → 0. If i is transient then Ei Ji < ∞. Xn = j) Pi (Tj = m)Pi (Xn = j | Tj = m) 31 = m=1 . Proof. (n) i. But ∞ ∞ ∞ (n) Ei Ji = Ei n=1 1(Xn = i) = n=1 Pi (Xn = i) = n=1 (n) pii . Ji cannot take value +∞ with positive probability. The latter is the probability to be in state j at time n (not necessarily for the ﬁrst time) starting from i. n ≥ 1. this implies that Ei Ji < ∞. which. 10. If i is a transient state then pii → 0. we see that its expectation cannot be inﬁnity without Ji itself being inﬁnity with probability one. 4. owing to the fact that Ji is a geometric random variable. since Ji is a geometric random variable. i. but the two sequences are related in an exact way: Lemma 7. Hence Ei Ji = ∞ also. 2. State i is recurrent if.

( Remember: another property that was a class property was the property that the state have a certain period. Xi . but also fij = 1. showing that j will appear in inﬁnitely many of the excursions for sure.. Corollary 7. and hence pij as n → ∞. j i.r := 1 j appears in excursion Xi (r) . and gij := Pi (Tj < Ti ) > 0. Hence. . as n → ∞. The last statement is simply an expression in symbols of what we said above. 10.i. r = 0. So the random variables δj. we have n=1 pjj = Ej Jj < ∞. if we let fij := Pi (Tj < ∞) . 1. inﬁnitely many of them will be 1 (with probability 1). Corollary 8. . . and since they take the value 1 with positive probability. . the probability to go to j in some ﬁnite time is positive. In other words. ∞ Proof. not only fij > 0. Now. starting from i. In a communicating class. j i. by summing over n the relation of Lemma 7. for all states i. We now show that recurrence is a class property. by deﬁnition.) Theorem 10.The ﬁrst term equals fij . If i is recurrent and i j then there is a positive probability (0) (1) (= fij ) that j will appear in one of the i. and so the probability that j will appear in a speciﬁc excursion is positive. which is ﬁnite. are i. For the second term. we use the Strong Markov Property: (n−m) Pi (Xn = j | Tj = m) = Pi (Xn = j | Xm = j) = pjj . either all states are recurrent or all states are transient. If j is a transient state then pij → 0.d. In particular. we have i j ⇐⇒ fij > 0.2 Communicating classes and recurrence/transience Recall the deﬁnition of the relation “state i leads to state j”. Proof. the probability that j will appear in a speciﬁc excursion equals gij = Pi (Tj < Ti ). excursions Xi .d. 32 .i. denoted by i j: it means that. If i is recurrent and if i j then j is recurrent and fij = Pi (Tj < ∞) = 1. If j is transient. Hence. Indeed. Start with X0 = i. . . we obtain (after interchanging the order of the two sums in the right hand side) ∞ n=1 (n) pij ∞ (m) fij ∞ n=m (n) (n−m) pjj ∞ (m) (n) (m) = m=1 = (1 + Ej Jj ) m=1 fij = (1 + Ej Jj )Pi (Tj < ∞).

. Recall that a ﬁnite random variable may not necessarily have ﬁnite expectation. Every ﬁnite closed communicating class is recurrent. . and so i is a transient state. By ﬁniteness. . . observe that T is ﬁnite. . all other states are also transient. 0 < Pi (A) = Pi (A ∩ B) + Pi (A ∩ B c ) = Pi (A ∩ B c ) ≤ Pi (B c ). necessarily. If i is recurrent then every other j in the same communicating class satisﬁes i j. Suppose that i is inessential. there is a state j such that i there is m such that Pi (A) = Pi (Xm = j) > 0. but Pi (A ∩ B) = Pi (Xm = j. Theorem 11. every other j is also recurrent. Proof. So if a communicating class is recurrent then it contains no inessential states and so it is closed. random variables. What is the expectation of T ? First. Xn = i for inﬁnitely many n) = 0. Hence all states are recurrent. Thus T represents the number of variables we need to draw to get a value larger than the initial one. Un ≤ U0 ) 1 = 0 1 P (U1 ≤ x. some state i must be visited inﬁnitely many times. Hence. Proof.i. n+1 33 = 0 . if started in C. We now take a closer look at recurrence. . i. If a communicating class contains a state which is transient then. be i. This state is recurrent.e. P (T > n) = P (U1 ≤ U0 . Hence 11 Positive recurrence Recall that if we ﬁx a state i and let Ti be the ﬁrst return time to i then either Pi (Ti < ∞) = 1 (in which case we say that i is recurrent) or Pi (Ti < ∞) < 1 (transient state). U1 .e. appear in a certain game. by the above theorem. P (B) = Pi (Xn = i for inﬁnitely many n) < 1. Example: Let U0 . uniformly distributed in the unit interval [0. i. and hence. .d. By closedness. j but j i. U2 . . Pi (B) = 0. Let C be the class. . But then. Let T := inf{n ≥ 1 : Un > U0 }. for example. It should be clear that this can. Theorem 12. . . X remains in C.Proof. Un ≤ x | U0 = x)dx xn dx = 1 . If i is inessential then it is transient. 1]. Indeed.

It is traditional to refer to it as “Law”. Z2 . X2 . P (T < ∞) = lim P (T ≤ n) = lim (1 − n→∞ ∞ n→∞ 1 ) = 1. be i. . Our goal now is to show that a chain possessing a positive recurrent state also possesses a stationary distribution. . random variables. arguably. 34 . an inﬁnite number of steps to exceed it! We now classify a recurrent state i as follows: We say that i is positive recurrent if Ei Ti < ∞. . . But this is misleading.Therefore. n+1 You are invited to think about the meaning of this: If the initial value U0 is unknown. then it takes. Before doing that. we shall discover what the form of this stationary distribution is by using the Law of Large Numbers 12 Law (=Theorem) of Large Numbers in Probability Theory The following result is. of successive states is not an independent sequence. but E min(Z1 . Then P Z1 + · · · + Zn =µ n→∞ n lim = 1. (i) Suppose E|Z1 | < ∞ and let µ = EZ1 . n+1 But ET = ∞ P (T > n) = n=1 n=1 1 = ∞. When we have a Markov chain. (ii) Suppose E max(Z1 . We say that i is null recurrent if Ei Ti = ∞.i. . 0) > −∞. To our rescue comes the regenerative structure of a Markov chain identiﬁed at an earlier section. To apply the law of large numbers we must create some kind of independence. it is a Theorem that can be derived from the axioms of Probability Theory. Let Z1 . . on the average. It is not a Law. Then P Z1 + · · · + Zn =∞ n→∞ n lim = 1. We state it here without proof. the Fundamental Theorem of Probability Theory. Theorem 13. X1 . the sequence X0 . 0) = ∞.d. 13 Law of Large Numbers for Markov chains The law of large numbers stated above requires independence.

we have that G(Xa(1) ). Then. . (1) (2) (1) (n) (8) 13. The ﬁnal equality in the last display is merely our notation for the expectation of a random variable conditional on X0 = a. . . . . the law of large numbers tells us that G(Xa ) + · · · + G(Xa ) → EG(Xa(1) ). . we can think of as a reward received when the chain is in state x.i. Speciﬁcally. G(Xa(r) ) = (r) (r+1) Ta ≤t<Ta f (Xt ). which. we shall assume that f is bounded and shall examine those conditions which yield a nontrivial limit. are i. (r) (r+1) G(Xa(r) ) = max{f (Xt ) : Ta ≤ t < Ta }. (7) Whichever the choice of G. but deﬁne G(X ) to be the total reward received over the excursion X .1 Functions of excursions Namely.d.2 Average reward Consider now a function f : S → R. G(Xa(2) ). Xa(2) . are i. Note that EG(Xa(1) ) = EG(Xa(2) ) = · · · = E G(Xa(0) ) | X0 = a = Ea G(Xa(0) ). the (0) initial excursion Xa also has the same law as the rest of them.d. Xa . . n as n → ∞. because the excursions Xa . Xa(3) . Example 2: Let f (x) be as in example 1.. for an excursion X . with probability 1. We are interested in the existence of the limit time-average reward = lim 1 t→∞ t t f (Xk ). . in this example. In particular. random variables. deﬁne G(X ) to be the maximum reward received during this excursion. G(Xa(3) ). 35 . assign a reward f (x). as in Example 2. Thus. Since functions of independent random variables are also independent. k=0 which represents the ‘time-average reward’ received.13.i. forms an independent sequence of (generalised) random variables. given a recurrent state a ∈ S (which will be referred to as the ground state for no good reason other than that it has been ﬁxed once and for all) the sequence of excursions Xa(1) . Example 1: To each state x ∈ S. and because if we start with X0 = a. for reasonable choices of the functions G taking values in R. .

say. say |f (x)| ≤ B. We shall present the proof in a way that the constant f will reveal itself–it will be discovered. t t In other words. and so on. Suppose that a Markov chain possesses a positive recurrent state a ∈ S (i.Theorem 14. t t as t → ∞. Proof. Ea Ta < ∞) and assume that the distribution of X0 is such that P (Ta < ∞) = 1. with probability one. and Glast is the reward over the last incomplete cycle. Hence |Gﬁrst | ≤ BTa .e. Since P (Ta < ∞) = 1. 0 Ta (1) Ta (2) T (3) a Ta(4) Ta (5) t We write the total reward as a sum of rewards over complete excursions plus a ﬁrst and a last term: t Nt −1 r=1 (1) f (Xn ) = n=0 Gr + Gﬁrst + Glast . t t 36 . (r) Nt := max{r ≥ 1 : Ta ≤ t}. and so 1 ﬁrst G → 0. with probability one. Nt . For example. we have BTa /t → 0. Let f : S → R+ be a bounded ‘reward’ function. of complete excursions from the ground state a that occur between 0 and t. A similar argument shows that 1 last G → 0. We break the sum into the total reward received over the initial excursion. t as t → ∞. Gr is given by (7). Then. Nt = 5. Thus we need to keep track of the number. in the ﬁgure below. Ea f= n=0 f (Xn ) E a Ta = x∈S f (x)π(x). The idea is this: Suppose that t is large enough. the ‘time-average reward’ f exists and is a deterministic quantity: P where Ta −1 1 t→∞ t lim t f (Xn ) = f n=0 =1 . We assumed that f is t bounded. Gﬁrst is the reward up to time Ta = Ta or t (whichever t is smaller). plus the total reward received over the next excursion. We are looking for the total reward received between 0 and t.

r=1 (10) Now look at the sequence r (r) (1) Ta = Ta + k=1 (k) (k−1) ) (Ta − Ta (1) (k) (k−1) and see what the Law of Large Numbers again.d. . Since Ta → ∞. t→∞ t E a Ta (12) Using (10) and (12) in (9) we have 1 lim t→∞ t Thus 1 t→∞ t lim t Nt −1 Gr = r=1 EG1 . and since Ta − Ta k = 1. r=1 (9) Look again what the Law of Large Numbers (8) tells us: that 1 n→∞ n lim (r) n Gr = EG1 . 2.also. Since Ta /r → 0. we have Ta r→∞ r lim (r) . So we are left with the sum of the rewards over complete cycles.i.. it follows that the number Nt of complete cycles before time t will also be tending to ∞. Hence Ta t t Ta t ≤ < Nt Nt Nt But (11) tells us that Ta t Ta t = lim t→∞ Nt t→∞ Nt lim Hence lim (N ) (N +1) (N ) (N +1) . we have that the visit with index r = Nt to the ground state a is the last before t (see also ﬁgure above): (N (N Ta t ) ≤ t < Ta t +1) . are i. Hence. as r → ∞. = E a Ta . . E a Ta 37 . as t → ∞. E a Ta f (Xn ) = n=0 EG1 =: f . random variables with common expectation Ea Ta . r=1 with probability one. . = E a Ta . (11) By deﬁnition. lim t∞ 1 Nt Nt −1 Gr = EG1 . Write now 1 t Nt −1 Gr = r=1 Nt 1 t Nt Nt −1 Gr . 1 Nt = .

Comment 1: It is important to understand what the formula says. we let f to be 1 at the state b and 0 everywhere else. not both. Hence t→∞ lim 1 t t 1(Xn = a) = n=1 1 . The denominator is the duration of the excursion. for all b ∈ S. t Ea 1(Xk = b) E a Ta . i. then we see that the numerator equals 1.e. The equality of these two limits gives: average no. The theorem above then says that. Applying formula (14) with b in place of a we ﬁnd that the limit in (13) also equals 1/Eb Tb . just observe that EG1 = E (1) (1+1) Ta ≤t<Ta f (Xt ) = Ea 0≤t<Ta f (Xt ). f(X2 ) f(X1 ) f(X )= f(a) 0 f(T −1) a 0 T a Comment 2: A very important particular case occurs when we choose f (x) := 1(x = b). We may write EG1 = Ea 0≤t<Ta f (Xt ) or EG1 = Ea 0<t≤Ta f (Xt ). Note that we include only one endpoint of the excursion. (13) 1(Xn = b) = n=1 Comment 3: If in the above we let b = a. The physical interpretation of the latter should be clear: If. The numerator of f is computed by summing up the rewards over the excursion (making sure that not both endpoints are included). then the long-run proportion of visits to the state a should equal 1/Ea Ta . The constant f is a sort of average over an excursion. Comment 4: Suppose that two states. a. Ta −1 k=0 1 lim t→∞ t with probability 1. the ground state. it takes Ea Ta units of time between successive visits to the state a. b. of times state b is visited = Ea between two successive visits to state a Ta −1 1(Xk = b) = k=0 E a Ta . We can use b as a ground state instead of a. E b Tb 38 .To see that f is as claimed. E a Ta (14) with probability 1. which communicate and are positive recurrent. on the average. from the Strong Markov Property.

X3 . where a is the ﬁxed recurrent ground state whose existence was assumed. i. (16) This was a trivial but important step: We replaced the n = 0 term with the n = Ta term. we saw (Comment 2 above) that if a is a positive recurrent state then ν [a] (x) E a Ta is the long-run fraction of times that state x is visited. The real reason for doing so is because we wanted to replace a function of X0 . . We start the Markov chain with X0 = a. by a function of X1 . X1 . Note: Although ν [a] depends on the choice of the ‘ground’ state a. for any state b such that a → x (a leads to x). notice that the choice of notation ν [a] is justiﬁable through the notation (6) for the communicating class of the state a. In view of this.e. X2 . x ∈ S. Moreover. . Then ν [a] satisﬁes the balance equations: ν [a] = ν [a] P. Hence the superscript [a] will soon be dropped. 5 We stress that we do not need the stronger assumption of positive recurrence here. . Clearly. x ∈ S. we can write ν (x) = Ea k=1 [a] Ta 1(Xn = x). (15) should play an important role. Since a = XT a . this should have something to do with the concept of stationary distribution discussed a few sections ago. Both of them equal 1. 39 . . . .14 Construction of a stationary distribution Our discussions on the law of large numbers for Markov chains revealed that the quantity Ta −1 ν (x) := Ea n=0 [a] 1(Xn = x) . Consider the function ν [a] deﬁned in (15). we have 0 < ν [a] (x) < ∞. Recurrence tells us that Ta < ∞ with probability one. Proposition 2. so there is nothing lost in doing so. we will soon realise that this dependence vanishes when the chain is irreducible. ν [a] (x) = y∈S ν [a] (y)pyx . Proof. Indeed. Fix a ground state a ∈ S and assume that it is recurrent5 . X2 .

n ≤ Ta ) Pa (Xn = x. that a is positive recurrent. such that pxa > 0. which. We just showed that ν [a] = ν [a] P. Deﬁne Ta −1 ν [a] (x) = π (x) := E a Ta [a] Ea n=0 1(Xn = x) E a Ta . We now have: Corollary 9. Ta Ta (m) ν [a] (x) = Ea x∈S n=1 x∈S 1(Xn = x) = Ea n=1 1 = E a Ta . which are precisely the balance equations. ν [a] = ν [a] Pn . so there exists n ≥ 1. where. We are now going to assume more. 40 . we used (16). let x be such that a x. is ﬁnite when a is positive recurrent. such that pax > 0. xa so. Xn−1 = y. x a. (n) Next. for all x ∈ S. ν [a] (x) < ∞. (17) This π is a stationary distribution. Suppose that the Markov chain possesses a positive recurrent state a.and thus be in a position to apply ﬁrst-step analysis. Hence 1 = ν [a] (a) ≥ ν [a] (x)p(m) > ν [a] (x). from (16). Note: The last result was shown under the assumption that a is a recurrent state. Hence ν [a] (x) ≥ ν [a] (a)p(n) = p(n) > 0. So we have shown ν [a] (x) = y∈S pyx ν [a] (y). Therefore. x ∈ S. n ≤ Ta ) Ta = n=1 y∈S ∞ = n=1 y∈S = y∈S pyx Ea n=1 [a] 1(Xn−1 = y) = y∈S pyx ν (y). First note that. for all n ∈ N. ax ax From Theorem 10. by deﬁnition. n ≤ Ta ) pyx Pa (Xn−1 = y. which is precisely what we do next: ∞ ν (x) = Ea n=1 ∞ [a] 1(Xn = x. namely. so there exists m ≥ 1. n ≤ Ta ) = n=1 ∞ Pa (Xn = x. indeed. in this last step.

We already know that j is recurrent. . R2 ) be the index of the ﬁrst (respectively. Also. if tails show up. that it sums up to one. But π [a] (x) = x∈S 1 E a Ta ν [a] (x) = 1. (r) j. 1. and the probability gij = Pi (Tj < ∞) that j occurs in a speciﬁc excursion is positive. But π [a] is simply a multiple of ν [a] . . Let R1 (respectively. in general. and so π [a] is not trivially equal to zero. 41 . 2.i. that ν [a] P = ν [a] . since the excursions are independent.e. the event that j occurs in a speciﬁc excursion is independent from the event that j occurs in the next excursion. to decide whether j will occur in a speciﬁc excursion. we toss a coin with probability of heads gij > 0. then the set of linear equations π(x) = y∈S π(y)pyx . In words. second) excursion that contains j. The coins are i. in Proposition 2.d. It only remains to show that π [a] is a probability. Let i be a positive recurrent state. Proof. 15 Positive recurrence is a class property We now show that (just like recurrence) positive recurrence is also a class property. i. Start with X0 = i. Consider the excursions Xi . . This is what we will try to do in the sequel. x∈S Pause for thought: We have achieved something VERY IMPORTANT: if we have one. no matter which one. Then j is positive Proof. We have already shown. positive recurrent state. Suppose that i recurrent. then j occurs in the r-th excursion. then the r-th excursion does not contain state j. we have Ea Ta < ∞. Hence j will occur in inﬁnitely many excursions. Otherwise. π(y) = 1 y∈S x∈S have at least one solution. If heads show up at the r-th excursion. unique. π [a] P = π [a] . Since a is positive recurrent. r = 0. Theorem 15. Therefore.In other words: If there is a positive recurrent state. then we immediately have a stationary (probability) distribution! We DID NOT claim that this distribution is.

. Then either all states in C are positive recurrent. An irreducible Markov chain is called transient if one (and hence all) states are transient. with the same law as Ti . . An irreducible Markov chain is called positive recurrent if one (and hence all) states are positive recurrent.R2 R1 j i R1+1 R1+2 This random variable has the same law as T under P i i ζ We just need to show that the sum of the durations of excursions with indices R1 . An irreducible Markov chain is called null recurrent if one (and hence all) states are null recurrent. An irreducible Markov chain is called recurrent if one (and hence all) states are recurrent. In other words. . R2 has ﬁnite expectation. . . where the Zr are i.i. we just need to show that R2 −R1 +1 ζ := r=1 Zr has ﬁnite expectation. But. . Since gij > 0 and Ei Ti < ∞ we have that Ei ζ < ∞. IRREDUCIBLE CHAIN r = 1. 2. Let C be a communicating class. the expectation of this random variable equals −1 Ei ζ = (Ei Ti ) × Ei (R2 − R1 + 1) = (Ei Ti ) × (gij + 1) because R2 − R1 is a geometric random variable: Pi (R2 − R1 = r) = (1 − gij )r−1 gij .d. or all states are null recurrent or all states are transient. . Corollary 10. by Wald’s lemma. RECURRENT TRANSIENT POSITIVE RECURRENT NULL RECURRENT 42 .

Then P (Ta < ∞) = x∈S In other words. we start the Markov chain from the stationary distribution π. E 1 t t 1 t t 1(Xn = x) n=1 → π [a] (x). x ∈ S. after all. for totally trivial reasons (it is. the π deﬁned by π(0) = α. Let π be a stationary distribution. as t → ∞. 1(Xn = x) n=1 = 1 t t P (Xn = x) = π(x). what prohibits uniqueness. we stress that a stationary distribution may not be unique. π(1) = 0. By a theorem of Analysis (Bounded Convergence Theorem). x ∈ S. following Theorem 14. x∈S By (13). for all states i and j. we have P (Xn = x) = π(x). Fix a ground state a. Example: Consider the chain we’ve seen before: 1 1/3 1 0 1/2 1 1/6 2 Clearly. Proof. Consider positive recurrent irreducible Markov chain. state 1 (and state 2) is positive recurrent. for all x ∈ S. Theorem 16. But.16 Uniqueness of the stationary distribution At the risk of being too pedantic. indeed. Then there is a unique stationary distribution. recognising that the random variables on the left are nonnegative and bounded below 1. Since the chain is irreducible and positive recurrent we have Pi (Tj < ∞) = 1. n=1 43 . Then. Consider the Markov chain (Xn . (You should think why this is OBVIOUS!) Notice that 1 This lack of communication is. n ≥ 0) and suppose that P (X0 = x) = π(x). is a stationary distribution. (18) π(x)Px (Ta < ∞) = π(x) = 1. Ta −1 1 t t Ea 1(Xk = x) k=0 1(Xn = x) n=1 → E a Ta = π [a] (x). we can take expectations of both sides: E as t → ∞. an absorbing state). we have. π(2) = 1 − α 2. Hence. by (18). for all n. there is a stationary distribution. But observe that there are many–inﬁnitely many–stationary distributions: For each 0 ≤ α ≤ 1. with probability one.

17 Structure of a stationary distribution It is not hard to show now what an arbitrary stationary distribution looks like. Hence there is only one stationary distribution. i is positive recurrent. which is useful because it is often used as a criterion for positive recurrence: Corollary 13. π(C) x ∈ C. Thus. from the deﬁnition of π [a] . Corollary 11. for all x ∈ S. 1 π(a) = . Hence π = π [a] . Both π [a] and π[b] are stationary distributions. Consider a Markov chain. Decompose it into communicating classes.Therefore π(x) = π [a] (x). π(i|C) = Ei Ti . E a Ta Proof. delete all transition probabilities pxy with x ∈ C. Assume that a stationary distribution π exists. Consider the restriction of the chain to C. π(a) = π [a] (a) = 1/Ea Ta . Let π(·|C) be the distribution deﬁned by restricting π to C: π(x|C) := π(x) . Let C be the communicating class of i. This is in fact. an arbitrary stationary distribution π must be equal to the speciﬁc one π [a] . so they are equal. where π(C) = y∈C π(y). In particular. There is only one stationary distribution. we obtain the cycle formula: Ta −1 Tb −1 Ea k=0 1(Xk = x) E a Ta = Eb k=0 1(Xk = x) E b Tb . The cycle formula merely re-expresses this equality. There is a unique stationary distribution. But π(i|C) > 0. Consider a positive recurrent irreducible Markov chain and two states a. By the 1 above corollary. Proof. Consider an arbitrary Markov chain. for all a ∈ S. In particular. Then any state i such that π(i) > 0 is positive recurrent. Then for each positive recurrent communicating class C there corresponds a unique stationary distribution π C 44 . Consider a positive recurrent irreducible Markov chain with stationary distribution π. Then.e. The following is a sort of converse to that. In Section 14 we assumed the existence of a positive recurrent state a and proved the existence of a stationary distribution π. y ∈ C. i. Let i be such that π(i) > 0. b ∈ S. Proof. Therefore Ei Ti < ∞. Then π [a] = π [b] . Corollary 12. Namely. We can easily see that we thus obtain a Markov chain with state space C with stationary distribution π(·|C). the only stationary distribution for the restricted chain because C is irreducible. x ∈ S.

take an = (−1)n . from physical experience. π(x|C) ≡ π C (x). This is deﬁned by picking a state (any state!) a ∈ C and letting π C := π [a] . There is dissipation of energy due to friction and that causes the settling down. This is the stable position. that the pendulum will swing for a while and. it will settle to the vertical position. x∈C Then (π(x|C). For each such C deﬁne the conditional distribution π(x|C) := π(x) . without friction. such that C αC = 1. consider the motion of a pendulum under the action of gravity. 18 18. By our uniqueness result. that it remains in a small region). Stability of a deterministic dynamical system means that the system eventually converges to a ﬁxed point (or. We all know. For example. the pendulum will perform an 45 . Let π be an arbitrary stationary distribution. A sequence of real numbers an k=1 1 may not converge but its Cesaro averages n (a1 + · · · + an ) may converge. So we have that the above decomposition holds with αC = π(C). more generally.which assigns positive probability to each state in C. Proposition 3. An arbitrary stationary distribution π must be a linear combination of these π C . for example. under irreducibility and positive recurrence conditions. So far we have seen. the so-called Cesaro 1 averages n n P (Xk = x) converge to π(x). In an ideal pendulum.1 Coupling and stability Deﬁnition of stability Stability refers to the convergence of the probabilities P (Xn = x) as n → ∞. A Markov chain is a simple model of a stochastic dynamical system. where αC ≥ 0. where π(C) := π(C) π(x). eventually. as n → ∞. If π(x) > 0 then x belongs to some positive recurrent class C. Any stationary distribution π is necessarily of the form π= C: C positive recurrent communicating class αC π C . x ∈ C) is a stationary distribution. Proof. that. Why stability? There are physical reasons why we are interested in the stability of a Markov chain. To each positive recurrent communicating class C there corresponds a stationary distribution π C .

are given. . . ξ0 . . However notice what happens when we solve the recursion. ξ2 . a > 0. we may it is stable because it does not (it cannot!) swing too far away. moreover. A linear stochastic system We now pass on to stochastic systems. by means of the recursion above. random variables. mutually independent. Since we are prohibited by the syllabus to speak about continuous time stochastic systems. we here assume that X0 . for we can do it: X1 = ρX0 + ξ0 X2 = ρX1 + ξ1 = ρ(ρX0 + ξ0 ) + ξ1 = ρ2 X0 + ρξ0 + ξ1 X3 = ρX2 + ξ2 = ρ3 X0 + ρ2 ξ0 + ρξ1 + ξ2 Xn = ρn X0 + ρn−1 ξ0 + ρn−2 ξ1 + · · · + ρξn−2 + ξn−1 .e. as t → ∞. If. it cannot mean convergence to a ﬁxed equilibrium point. ······ 46 . and deﬁne a stochastic sequence X1 . or whatever. It follows that |ρ| < 1 is the condition for stability. .oscillatory motion ad inﬁnitum. ˙ There are myriads of applications giving physical interpretation to this. then regardless of the starting position. The thing to observe here is that there is an equilibrium point i. X2 . Speciﬁcally. we have x(t) → x∗ . . ξ1 . simple because the ξn ’s depend on the time parameter n. x ∈ R. 4 3 2 x* 1 0 0 1 2 t 3 t4 If we start with x(t = 0) = x∗ . The simplest diﬀerential equation As another example. . And we are in a happy situation here. then x(t) = x∗ for all t > 0. We say that x∗ is a stable equilibrium. But what does stability mean? In general. let us consider a stochastic analogue of the above in discrete time: Xn+1 = ρXn + ξn . the one found by setting the right-hand side equal to zero: x∗ = b/a. We shall assume that the ξn have the same law. Still. so I’ll leave it upon the reader to decide whether he or she wants to consider this as an ﬁnancial example. consider the following diﬀerential equation x = −ax + b. or an example in physics.

for time-homogeneous Markov chains. 18. uniformly in x. Coupling refers to the various methods for constructing this joint law.3 Coupling To understand why the fundamental stability theorem works. as n → ∞. under nice conditions. we have that the ﬁrst term goes to 0 as n → ∞. How should we then deﬁne what P (X = x. . positive recurrent and aperiodic with (unique) stationary distribution π then n→∞ lim P (Xn = x) = π(x). which is a linear combination of ξ0 . In other words. Starting at an elementary level. we need to understand the notion of coupling. Y but not their joint law. we know f (x) = P (X = x). but notice that if we let ˆ Xn := ρn−1 ξn−1 + ρn−2 ξn−2 + · · · + ρξ1 + ξ0 . ˜ Xn := ρn−1 ξ0 + ρn−2 ξ1 + · · · + ρξn−2 + ξn−1 . Y = y) is? 47 . converges.Since |ρ| < 1. but we are not given what P (X = x. (e. to try to make the coincidence probability P (X = Y ) as large as possible. Y = y) is. If a Markov chain is irreducible. 18. ∞ ˆ The nice thing is that. while the limit of Xn does not exists. . Y = y) = P (X = x)P (Y = y). A straightforward (often not so useful) construction is to assume that the random variables are independent and deﬁne P (X = x. to a matrix with rows all equal to π: (n) lim p n→∞ ij = π(j). assume that the ξn are bounded) Xn does converge to ˆ X∞ = k=0 ρk ξk . for any initial distribution. . . j ∈ S. i.2 The fundamental stability theorem for Markov chains Theorem 17. for instance. deﬁne stability in a way other than convergence of the distributions of the random variables. In particular. But suppose that we have some further requirement. ξn−1 does not converge. we have ˜ ˆ P (Xn ≤ x) = P (Xn ≤ x). g(y) = P (Y = y). the n-step transition matrix Pn . suppose you are given the laws of two random variables X. What this means is that. The second term. the limit of its distribution does: ˜ ˆ ˆ lim P (Xn ≤ x) = lim P (Xn ≤ x) = lim P (Xn ≤ x) = P (X∞ ≤ x) n→∞ n→∞ n→∞ This is precisely the reason why we cannot.g.

n ≥ 0. y).e. |P (Xn = x) − P (Yn = x)| ≤ P (T > n). then |P (Xn = x) − P (Yn = x)| → 0. x ∈ S. x∈S 48 . in particular. but with the roles of X and Y interchanged. (19) as n → ∞. respectively. y) = P (X = x. Repeating (19). g(y) = P (Y = y).e. Proof. we obtain P (Yn = x) − P (Xn = x) ≤ P (T > n). and all x ∈ S. precisely because of the assumption that the two processes have been constructed together. If. i. for all n ≥ 0. n ≥ T ) = P (Xn = x. and which maximises P (X = Y ). n ≥ 0) and the law of a stochastic process Y = (Yn . i. Then. A coupling of them refers to a joint construction on the same probability space. we can write it as sup |P (Xn = x) − P (Yn = x)| ≤ P (T > n). Y = y) which respects the marginals. uniformly in x ∈ S. Since this holds for all x ∈ S and since the right hand side does not depend on x. n ≥ 0). n < T ) + P (Yn = x. Note that it makes sense to speak of a meeting time. Find the joint law h(x. n < T ) + P (Xn = x. Combining the two inequalities we obtain the ﬁrst assertion: |P (Xn = x) − P (Yn = x)| ≤ P (T > n). on the event that the two processes never meet. This T is a random variables that takes values in the set of times or it may take values +∞. y). n ≥ 0. f (x) = y h(x. and hence P (Xn = x) − P (Yn = x) ≤ P (T > n). P (T < ∞) = 1. T is ﬁnite with probability one. The following result is the so-called coupling inequality: Proposition 4. We say that ransom time T is meeting time of the two processes if Xn = Yn for all n ≥ T. Suppose that we have two stochastic processes as above and suppose that they have been coupled. The coupling we are interested in is a coupling of processes. but we are not told what the joint probabilities are. Y be random variables with values in Z and laws f (x) = P (X = x). n ≥ T ) ≤ P (n < T ) + P (Yn = x).Exercise: Let X. We have P (Xn = x) = P (Xn = x. g(y) = x h(x. Y . We are given the law of a stochastic process X = (Xn . Let T be a meeting of two coupled stochastic processes X.

Consider the process Wn := (Xn . So both processes are positive recurrent aperiodic Markov chains with the same transition probabilities. Let pij denote. Yn ). 18.x′ py. Therefore. n ≥ 0) has the law of a Markov chain with transition probabilities pij and Y0 distributed according to the stationary distribution µ.y′ ) = px.(y.4 Proof of the fundamental stability theorem First. denoted by π. and it also makes sense to consider their ﬁrst meeting time: T := inf{n ≥ 0 : Xn = Yn }. Xm = y). n ≥ 0) is a Markov chain with state space S × S. We know that there is a unique stationary distribution. sup |P (Xn = x) − P (Yn = x)| → 0. Xn and Yn would be independent and distributed according to π. in other words.y′ ) := P Wn+1 = (x′ . positive recurrent and aperiodic.(y. then. and so (Wn . (x. We shall consider the laws of two processes. x∈S as n → ∞. y) := π(x)π(y). n ≥ 0) is an irreducible chain. .) By positive recurrence.(y. (Indeed.x′ > 0 and py. We now couple them by doing the straightforward thing: we assume that X = (Xn . n ≥ 0) has the law of a Markov chain with transition probabilities pij and X0 distributed according to some arbitrary distribution µ.x′ ). n ≥ 0). had we started with both X0 and Y0 independent and distributed according to π. we can deﬁne joint probabilities.y′ ) for all large n. Having coupled them.x′ ). Notice that σ(x. n ≥ 0). is a stationary distribution for (Wn . y ′ ) | Wn = (x. we have not deﬁned joint probabilities such as P (Xn = x.y′ . implying that q(x. 49 (n) (n) (n) (n) (n) (n) = P (Xn+1 = x′ | Xn = x) P (Yn+1 = y ′ | Yn = y) = px. The second process.Assume now that P (T < ∞) = 1. they diﬀer only in how the initial states are chosen. denoted by X = (Xn .y′ . The ﬁrst process.x′ ). Its initial state W0 = (X0 .x′ py. π(x) > 0 for all x ∈ S. review the assumptions of the theorem: the chain is irreducible.y′ for all large n. denoted by Y = (Yn . Then n→∞ lim P (T > n) = P (T = ∞) = 0. y) = µ(x)π(y). n ≥ 0) is independent of Y = (Yn . Its (1-step) transition probabilities are q(x. we have that px. the transition probabilities. Y0 ) has distribution P (W0 = (x. Then (Wn . From Theorem 3 and the aperiodicity assumption. y) ∈ S × S. as usual. for all n ≥ 1. y) Its n-step transition probabilities are q(x. It is very important to note that we have only deﬁned the law of X and the law of Y but we have not deﬁned a joint law.

Y ) may fail to be irreducible. 50 . Hence (Wn . Then the product chain has state space S ×S = {(0. after the meeting time. In particular. ≥ 0) is a Markov chain with transition probabilities pij . This is not irreducible. (1. 1} and the pij are p01 = p10 = 1. 1)} and the transition probabilities are q(0. which converges to zero. n ≥ 0) is positive recurrent.(1.1). uniformly in x.1) = 1.0) = q(0. precisely because P (T < ∞) = 1.(0.(1. P (T < ∞) = 1.01. then the process W = (X. (0. (Zn . y) > 0 for all (x. By the coupling inequality.0). y) ∈ S × S. Deﬁne now a third process by: Zn := Xn 1(n < T ) + Yn 1(n ≥ T ).99. p01 = 0. and since P (Yn = x) = π(x). Clearly.1). T is a meeting time between Z and Y . Obviously.1) = q(1. Example showing that the aperiodicity assumption is essential for the method used in the proof: If pij are the transition probabilities of an irreducible chain X and if Y is an independent copy of X. as n → ∞. However. we have |P (Xn = x) − π(x)| ≤ P (T > n). 0). Moreover. X arbitrary distribution Z stationary distribution Y T Thus. In particular. by assumption.(0. Z0 = X0 which has an arbitrary distribution µ. Since P (Zn = x) = P (Xn = x). Zn equals Xn before the meeting time of X and Y . For example. Hence (Zn . suppose that S = {0. if we modify the original chain and make it aperiodic by. p10 = 1.Therefore σ(x. 1). n ≥ 0) is identical in law to (Xn . for all n and x. n ≥ 0). (1. P (Xn = x) = P (Zn = x). p00 = 0. say.0). 0). |P (Zn = x) − P (Yn = x)| ≤ P (T > n).0) = q(1. Zn = Yn .

0. from C1 to C2 . Theorem 4. .497 . π(1) satisfy 0.503. p01 = 0.503 0. Then Xn = 0 for even n and = 1 for odd n. we can decompose the state space into d classes C0 . . p10 = 1. By the results of §5. It is. per se wrong. 1981). n→∞ lim p00 p01 (n) (n) p10 p11 (n) (n) = 0. 18. . Indeed.99π(0) = π(1).503 0. π(0) ≈ 0. C1 . π(1) ≈ 0. In matrix notation. The reader is asked to pay attention to this terminology which is diﬀerent from the one used in the Markov chain literature.497 18. however. 51 . positive recurrent and aperiodic Markov chain. It is not diﬃcult to see that 6 We put “wrong” in quotes. 6 The reason is that the term “ergodic” means something speciﬁc in Mathematics (c. Parry.6 Periodic chains Suppose now that we have a positive recurrent irreducible chain.497. in particular. limn→∞ p00 does not exist. Cd−1 . By the results of Section 14. Terminology cannot be. But this is “wrong”. However. and this is expressed by our terminology. customary to have a consistent terminology. and so on. n→∞ (n) (n) lim p n→∞ 11 = lim p01 = π(1).f. Suppose that the period of the chain is d. i. from Cd back to C0 .99. such that the chain moves cyclically between them: from C0 to C1 . . π(0) + π(1) = 1. Suppose X0 = 0. most people use the term “ergodic” for an irreducible.4 and. we know we have a unique stationary distribution π and that π(x) > 0 for all states x. Thus. then (n) lim p n→∞ 00 (n) = lim p10 = π(0). n→∞ (n) where π(0). A Markov chain which is irreducible and positive recurrent and aperiodic will be called mixing.then the product chain becomes irreducible. Therefore p00 = 1 for even n (n) and = 0 for odd n.e. if we modify the chain by p00 = 0.01. Example showing that the aperiodicity assumption is essential for the limiting result: Consider the same chain as above: p01 = p10 = 1.5 Terminology A Markov chain which is irreducible and positive recurrent will be called ergodic.

starting from i. and let r = ℓ − k (modulo d). Then. j ∈ Cℓ . n ≥ 0) is aperiodic. we have pij for all i. This means that: Theorem 18. All states have period 1 for this chain. Let i ∈ Ck . π(i) = 1/d. where π is the stationary distribution. Since (Xnd . So π(i) = j∈Cr−1 π(j)pji . namely C0 . Then pji = 0 unless j ∈ Cr−1 . We shall show that Lemma 9.Lemma 8. We have that π satisﬁes the balance equations: π(i) = j∈S π(j)pji . n ≥ 0) to any of the sets Cr we obtain a positive recurrent irreducible aperiodic chain. Suppose that X0 ∈ Cr . Suppose i ∈ Cr . The chain (Xnd . n ≥ 0) consists of d closed communicating classes. . i ∈ S. it remains in Cℓ . each one must be equal to 1/d. Proof. i ∈ Cr . 1. n ≥ 0) is dπ(i). d − 1. as n → ∞. . Then pij (r+nd) (nd) = Pi (Xnd = j) → dπ(j). Cd−1 . the (unique) stationary distribution of (Xnd . Since the αr add up to 1. . . i∈Cr for all r = 0. . the original chain enters Cℓ in r = ℓ − k steps (where r is taken to be a number between 0 and d − 1. . j ∈ Cℓ . Hence if we restrict the chain (Xnd . Can we also ﬁnd the limit when i and j do not belong to the same class? Suppose that i ∈ Ck . . j ∈ Cr . and the previous results apply. 52 . as n → ∞. Then. i ∈ Cr . Let αr := i∈Cr π(i). after reduction by d) and. thereafter. if sampled every d steps. Suppose that pij are the transition probabilities of a positive recurrent irreducible chain with period d. → dπ(j). . Summing up over i ∈ Cr we obtain αr = i∈Cr π(i) = j∈Cr−1 π(j) i∈Cr pji = j∈Cr−1 π(j) = αr−1 .

say. Hence pi k i.. . p(n) = (n) (n) (n) (n) (n) ∞ p1 . the sequence n3 of the third row is a subsequence k k of n2 of the second row. whose proof we omit7 : 7 See Br´maud (1999). if j is transient. Recall that we know what happens when j is positive recurrent and i is in the same commu(n) nicating class as j If the period is 1. (nk ) k → pi . i. . . It is clear how we (ni ) k continue. For each i we have found subsequence ni . We also saw in Corollary (n) 7 that. the sequence n2 of the second row k is a subsequence of n1 of the ﬁrst row. such that limk→∞ p1 k 3 (n1 ) numbers p2 k . p2 . then pij → 0 as n → ∞. . then pij → π(j) (Theorems 17). If the period is d then the limit exists only along a selected subsequence (Theorem 18). Since they are (n2 ) subsequence n2 of n1 such that limk→∞ p1 k k k (nk ) (n) as n → ∞. and so on. .e. the ﬁrst of which is known as HellyBray Lemma: Lemma 10. k = 1. So the diagonal subsequence (nk .) is a k k subsequence of each (ni . The proof will be based on two results from Analysis. exists for all i ∈ N. . say. . as n → ∞. 19. . schematically. a probability distribution on N. . for all i ∈ S. . 2. p1 . p1 p1 (n1 ) 1 p1 (n1 ) 2 (n2 ) 2 p1 (n1 ) 3 (n2 ) 3 (n3 ) 3 (n2 ) 1 (n3 ) 1 ··· ··· ··· ··· → → → ··· p1 p2 p3 ··· p1 p1 ··· p1 p1 ··· (n3 ) 2 p1 ··· Now pick the numbers in the diagonal. p2 .. such that limk→∞ pi k exists and equals. e 53 . k = 1. Then we can ﬁnd a i=1 pi sequence n1 < n2 < n3 < · · · of integers such that limk→∞ pi Proof. . Now consider the contained between 0 and 1 there must be a exists and equals. We have.19 Limiting theory for Markov chains* We wish to complete our discussion about the possible limits of the n-step transition prob(n) abilities pij .1 Limiting probabilities for null recurrent states Theorem 19. . . 2. say. . for each i. k = 1. . for each n = 1. Consider. 2. . pi . where = 1 and pi ≥ 0 for all i. 2. as k → ∞. Since the numbers p1 n1 1 n1 2 (n) (n1 ) < < n1 < · · · . 2. are contained between 0 and 1. .). for all i ∈ S. . for each The second result is Fatou’s lemma. k = 1. . If j is a null recurrent state then pij → 0. there must be a subsequence exists and equals. By construction. p3 .

y∈S x ∈ S.e. equalities. and this is a contradiction (the number αx = αx cannot be strictly larger than itself). for some x0 ∈ S. If (yi . it is a strict inequality: αx0 > y∈S αy pyx0 . y∈S Since x∈S αx ≤ 1.e. x ∈ S . αx ≤ lim inf k→∞ (n′ ) for all x ∈ S. pix k Therefore. (n′ ) Hence αx ≥ αy pyx . by assumption. actually. the sequence (n) pij does not converge to 0. pix k = 1. . Suppose that one of them is not. (n′ ) lim pix k (n′ +1) ≥ y∈S k→∞ lim piy k pyx . x∈S (n′ ) Since aj > 0. (n ) (n ) Consider the probability vector By Lemma 10. pix k . k→∞ (n′ +1) = y∈S piy k pyx . i ∈ N). Let j be a null recurrent state. and the last equality is obvious since we have a probability vector. are sequences of nonnegative numbers then ∞ (n) yi lim inf xi i=1 n→∞ (n) ∞ ≤ lim inf n→∞ yi x i . we have that π(x) := αx y∈S αy satisﬁes the balance equations and has x∈S π(x) = 1. (xi . i. 2. . . This would imply that αx > x∈S x∈S y∈S x∈S αy pyx = y∈S αy x∈S pyx = y∈S αy . x ∈ S. Hence we can ﬁnd a subsequence nk such that k→∞ lim pij k = αj > 0. we can pick subsequence (n′ ) of (nk ) such that k k→∞ lim pix k = αx . by Lemma 11. i. n = 1. i ∈ N). Suppose that for some i. i=1 (n) Proof of Theorem 19. we have 0< x∈S The inequality follows from Lemma 11.Lemma 11.e. Now π(j) > 0 because αj > 0. We wish to show that these inequalities are. 54 . Thus. But Pn+1 = Pn P. αy pyx . π is a stationary distribution. i. state j is positive recurrent: Contradiction. By Corollary 13.

as n → ∞. Let HC := inf{n ≥ 0 : Xn ∈ C}. HC ≤ n) n (n) = m=1 Pi (HC = m)Pi (Xn = j | HC = m). In general. so we will only look that the aperiodic case. Then (n) lim p n→∞ ij = ϕC (i)π C (j). This is a more old-fashioned way of getting the same result. E j Tj where Tj = inf{n ≥ 1 : Xn = j}. and fij = Pi (Tj < ∞). Let ϕC (i) := Pi (HC < ∞). that is. Proof. Pµ (Xn−m = j) → π C (j).19. Hence (n) lim p n→∞ ij ∞ = π (j) m=1 C Pi (HC = m) = π C (j)Pi (HC < ∞). as in §10. when j is a positive recurrent state but i and j do not necessarily belong to the same communicating class. Let µ be the distribution of XHC given that X0 = i and that {HC < ∞}. Let j be a positive recurrent state with period 1.2 The general case (n) It remains to see what the limit of pij is. which is what we need. Example: Consider the following chain (0 < α. Remark: We can write the result of Theorem 20 as (n) lim p n→∞ ij = fij . From the Strong Markov Property. we have Pi (Xn = j | HC = m) = Pµ (Xn−m = j). and fij = ϕC (j). π C (j) = 1/Ej Tj . as n → ∞. Indeed. The result when the period j is larger than 1 is a bit awkward to formulate. starting with the ﬁrst hitting time decomposition relation of Lemma 7. Let i be an arbitrary state. Let C be the communicating class of j and let π C be the stationary distribution associated with C. the result can be obtained by the so-called Abel’s Lemma of Analysis.2. we have pij = Pi (Xn = j) = Pi (Xn = j. If i also belongs to C then ϕC (i) = 1 and so the result follows from Theorem 17. In this form. But Theorem 17 tells us that the limit exists regardless of the initial distribution. γ < 1): γ 1−γ 1 1 2 1−α α 3 β 4 1−β 5 1 55 . Theorem 20. β.

while. For example. C1 = {1. ϕ(4) = . from ﬁrst-step analysis. (n) (n) p32 → ϕ(3)π C1 (2). The subject of Ergodic Theory has its origins in Physics. (n) (n) p33 → 0. C2 are aperiodic classes. (n) (n) p34 → 0. p45 → (1 − ϕ(4))π C2 (5). We have ϕ(1) = ϕ(2) = 1. information about its velocity. Theorem 14 is a type of ergodic theorem. And so is the Law of Large Numbers itself (Theorem 13). ranging from Mathematical Physics and Dynamical Systems to Probability and Statistics. π C2 (5) = 1. Theorem 21. Consider the quantity ν [a] (x) deﬁned as the average number of visits to state x between two successive visits to state a–see 16. among others. 1 − αβ 1 − αβ Both C1 . We will avoid to give a precise deﬁnition of the term ‘ergodic’ but only mention what constitutes a result in the area. ϕ(4) = (1 − β)ϕ(5) + βϕ(3). 56 (20) x∈S . 4} (inessential states). 20 Ergodic theorems An ergodic theorem is a result about time-averages of random processes. the 19-th century French mathematicians Henri Poincar´ and e Joseph Liouville. ϕ(3) = p31 → ϕ(3)π C1 (1). (n) (n) p35 → (1 − ϕ(3))π C2 (5). p42 → ϕ(4)π C1 (2). for example. (n) (n) p43 → 0. as n → ∞. The ergodic hypothesis in physics says that the proportion of time that a particle spends in a particular region of its “phase space” is proportional to the volume of this region. Pioneers in the are were. ϕ(5) = 0. C2 = {5} (absorbing state). ϕ(3) = (1 − α)ϕ(2) + αϕ(4) . we assumed the existence of a positive recurrent state. p41 → ϕ(4)π C1 (1). Ergodic Theory is nowadays an area that combines many ﬁelds of Mathematics and Science. π C1 (2) = . 1+γ 1+γ Let ϕ(i) be the probability that class C1 will be reached starting from state i. in Theorem 14. The stationary distributions associated with the two closed classes are: 1 γ π C1 (1) = .The communicating classes are I = {3. The phase (or state) space of a particle is not just a physical space but it includes. Suppose that a Markov chain possesses a recurrent state a ∈ S and starts from an initial state X0 such that P (Ta < ∞) = 1. Recall that. We now generalise this. 2} (closed class). Similarly. whence 1−α β(1 − α) . Hence. p44 → 0. Let f be a reward function such that |f (x)|ν [a] (x) < ∞.

which is the quantity denoted by f in Theorem 14. So there are always recurrent states. Replacing n by the subsequence Nt . then. Then at least one of the states must be visited inﬁnitely often. As in the proof of Theorem 14. The Law of Large Numbers tells us that 1 lim n→∞ n n−1 Gr = EG1 . we have t→∞ lim 1 last 1 G = lim Gﬁrst = 0. t t→∞ t t t with probability one. and this is left as an exercise. since the number of states is ﬁnite. so the numerator converges to f /Ea Ta . Remark 2: If a chain is irreducible and positive recurrent then the Ergodic Theorem Theorem 14 holds. But this is precisely the condition (20). we have C := i∈S ν(i) < ∞. we break the total reward as follows: t Nt −1 f (Xn ) = n=0 r=1 Gr + Gﬁrst + Glast . and this justiﬁes the terminology of §18. r=0 with probability 1 as long as E|G1 | < ∞. we have that the denominator equals Nt /t and converges to 1/Ea Ta . Proof. we assumed positive recurrence of the ground state a. respectively. As in Theorem 14. From proposition 2 we know that the balance equations ν = νP have at least one solution. So if we divide by t the numerator and denominator of the fraction in (21). 21 Finite chains We take a brief look at some things that are speciﬁc to chains with ﬁnitely many states. Remark 1: The diﬀerence between Theorem 14 and 21 is that. If so.Then P where f := x∈S t→∞ lim t n=0 t n=0 f (Xn ) 1(Xn = a) =f =1 . 57 . t t where Gr = T (r) ≤t<T (r+1) f (Xt ). we obtain the result. The reader is asked to revisit the proof of Theorem 14. we have Nt /t → 1/Ea Ta . Note that the denominator t n=0 1(Xn = a) equals the number Nt of visits to state a up to time t. in the former. (21) f (x)ν [a] (x). We only need to check that EG1 = f . Glast are the total rewards over the ﬁrst and t t a a last incomplete cycles. while Gﬁrst .5. as we saw in the proof of Theorem 14. Now.

Reversibility means that (X0 . X0 ). . because positive recurrence is a class property (Theorem 15). a ﬁnite Markov chain always has positive recurrent states. simply. Another way to put this is by saying that observing any part of the process will not reveal any information about whether the process runs forwards or backwards in time. Therefore π is a stationary distribution. If. . Proof. P (X0 = i. . If. and run the ﬁlm backwards. like playing a ﬁlm backwards. Xn ) has the same distribution as (Xn . in a sense. then we instantly know that the ﬁlm is played in reverse. for all n. where P is the transition matrix and Π is a matrix all the rows of which are equal to π. . the ﬁrst components of the two vectors must have the same distribution. n ≥ 0). i. this state must be positive recurrent. . In particular. as n → ∞. in addition. Proof.e. It is. A reversible Markov chain is stationary. . For example. . . . Then the chain is positive recurrent with a unique stationary distribution π. running backwards. At least some state i must have π(i) > 0. if we ﬁlm a standing man who raises his arms up and down successively. this means that it is stationary. X0 ) for all n. In addition. (X0 . the chain is aperiodic. i ∈ S. Taking n = 1 in the deﬁnition of reversibility. Lemma 12. X1 ) has the same distribution as (X1 . Suppose ﬁrst that the chain is reversible. we have Pn → Π. C satisﬁes the balance equations and the normalisation condition i∈S π(i) = 1. reversible if it has the same distribution when the arrow of time is reversed. X1 . we see that (X0 . Xn−1 . the stationary distribution is unique. By Corollary 13. actually. . Xn ) has the same distribution as (Xn . . 58 . X1 . without being able to tell that it is. 22 Time reversibility Consider a Markov chain (Xn . then all states are positive recurrent. the chain is irreducible. j ∈ S. Consider an irreducible Markov chain with transition probabilities pij and stationary distribution π. or. . . by Theorem 16. that is. We summarise the above discussion in: Theorem 22. i. X0 = j). X0 ). Xn has the same distribution as X0 for all n. X1 = j) = P (X1 = i. .Hence 1 ν(i). . We say that it is time-reversible. But if we ﬁlm a man who walks and run the ﬁlm backwards. π(i) := Hence. it will look the same. Xn−1 . Theorem 23. More precisely. . Because the process is Markov. Consider an irreducible Markov chain with ﬁnitely many states. in addition. Then the chain is reversible if and only if the detailed balance equations hold: π(i)pij = π(j)pji .

for all n. Conversely. By induction. Theorem 24. we have separation of variables: pij = f (i)g(j). (X0 . So the chain is reversible. using the DBE. pi1 i2 pi2 i3 · · · pim−1 im = pi1 im pim im−1 · · · pi2 i1 . . and write. and this means that P (X0 = i. Conversely. 2) The DBE are equivalent to saying that the ratio pij /pji can be written as a product of a function of i and a function of j. X1 = i) = π(j)pji . . then. . X1 = j. X1 = j) = π(i)pij . Consider an aperiodic irreducible Markov chain with transition probabilities pij and stationary distribution π. . . We will show the Kolmogorov’s loop criterion (KLC). P (X0 = j.for all i and j. Pick 3 states i. . i4 . But P (X0 = i. pji 59 (m−3) → π(i1 ). X1 . and the chain is reversible. In other words. if there is an arrow from i to j in the graph of the chain but no arrow from j to i. X0 ). X1 = j. . X0 ). The DBE imply immediately that (X0 . X1 . . and pi1 i2 (m−3) → π(i2 ). . . then the chain cannot be reversible. Then the chain is reversible if and only if the Kolmogorov’s loop criterion holds: for all i1 . π(i)pij pjk pki = [π(j)pji ]pjk pki = [π(j)pjk ]pji pki = [π(k)pkj ]pji pki = [π(k)pki ]pji pkj = [π(i)pik ]pji pkj = π(i)pik pkj pji Since the chain is irreducible. im−1 we obtain (m−3) (m−3) pi1 i2 pi2 i1 = pi1 i2 pi2 i1 If the chain is aperiodic. . Notes: 1) A necessary condition for reversibility is that pij > 0 if and only if pji > 0. k. X2 = k) = P (X0 = k. Xn ) has the same distribution as (Xn . These are the DBE. X0 ). Hence the DBE hold. . X2 = i). X2 ) has the same distribution as (X2 . π(i)pij pjk = [π(j)pji ]pjk = [π(j)pjk ]pji = π(k)pkj pji . i2 . Summing up over all choices of states i3 . letting m → ∞. using DBE. j. j. So the detailed balance equations (DBE) hold. Now pick a state k and write. (22) Proof. it is easy to show that. Suppose ﬁrst that the chain is reversible. . The same logic works for any m and can be worked out by induction. Hence cancelling the π(i) from the above display we obtain the KLC for m = 3. and hence (X0 . for all i. . X1 ) has the same distribution as (X1 . we have pi2 i1 implying that π(i1 )pi1 i2 = π(i2 )pi2 i1 . suppose the DBE hold. . im ∈ S and all m ≥ 3. suppose that the KLC (22) holds. Xn−1 . . X1 . . we have π(i) > 0 for all i. k.

to guarantee. and by replacing. there isn’t any. Example 1: Chain with a tree-structure. Consider a positive recurrent irreducible Markov chain such that if pij > 0 then pji > 0. the graph becomes disconnected. If the graph thus obtained is a tree. that the chain is positive recurrent. Form the graph by deleting all self-loops. 60 . If we remove the link between them. by assumption.(Convention: we form this ratio only when pij > 0. meaning that it has no loops. there would be a loop. (If it didn’t. namely the arrow from i to j. using another method. A) (see §4. Ac be the sets in the two connected components. pji g(i) Multiplying by g(i) and summing over j we ﬁnd g(i) = j∈S g(j)pji .e. Hence the original chain is reversible. the two oriented edges from i to j and from j to i by a single edge without arrow. the detailed balance equations. So g satisﬁes the balance equations. consider the chain: Replace it by: This is a tree. f (i)g(i) must be 1 (why?).) Necessarily. Ac ) = F (Ac . in addition. i. This gives the following method for showing that a ﬁnite irreducible chain is reversible: • Form the ratio pij /pji and show that the variables separate. • If the chain has inﬁnitely many states. and. The balance equations are written in the form F (A. then we need. j for which pij > 0. and the arrow from j to j is the only one from Ac to A. so we have pij g(j) = . And hence π can be found very easily. The reason is as follows. for each pair of distinct states i. Pick two distinct states i.1) which gives π(i)pij = π(j)pji . then the chain is reversible. j for which pij > 0. There is obviously only one arrow from AtoAc . For example. Hence the chain is reversible.) Let A.

Consider an undirected graph G with vertices S. Now deﬁne the the following transition probabilities: pij = 1/δ(i). Suppose the graph is connected. where C is a constant. etc. and a set E of undirected edges. j are not neighbours then both sides are 0. Each edge connects a pair of distinct vertices (states) without orientation. In all cases. For example.e. j are neighbours 1 1 then π(i)pij = Cδ(i) δ(i) = C and π(j)pji = Cδ(j) δ(j) = C. The constant C is found by normalisation. A RWG is a reversible chain. States j are i are called neighbours if they are connected by an edge. i∈S where |E| is the total number of edges. We claim that π(i) = Cδ(i). Hence we conclude that: 1. If i. 1 4 0 2 3 Here we have p12 = p10 = p14 = 1/3. i∈S Since δ(i) = 2|E|. π(i)pij = π(j)pji . 0. we have that C = 1/2|E|. 61 . The Markov chain with these transition probabilities is called a random walk on a graph (RWG). the DBE hold. a ﬁnite set. To see this let us verify that the detailed balance equations hold. The stationary distribution assigns to a state probability which is proportional to its degree. if j is a neighbour of i otherwise. The stationary distribution of a RWG is very easy to ﬁnd. i. There are two cases: If i. so both members are equal to C. π(i) = 1. p02 = 1/4. 2. For each state i deﬁne its degree δ(i) = number of neighbours of i.Example 2: Random walk on a graph.

If π is the stationary distribution of the original chain then π/π(A) is the stationary distribution of (Yr ).i. then (Yk . 1. . .23 New Markov chains from old ones* We now deal with the question: if we have a Markov chain (Xn ). . k = 0. . a sequence of integers (nk )k≥0 such that 0 ≤ n0 < n1 < n2 < · · · and by letting Yk := Xnk . . . 2. if. . . i. 1. in addition. k = 0. where pij are the m-step transition probabilities of the original chain. then it is a stationary distribution for the new chain. The state space of (Yr ) is A. The sequence (Yk . In this case. 2.e. random variables with values in N and distribution λ(n) = P (ξ0 = n). how can we create new ones? 23. 62 n ∈ N. . k = 0. 2. . π is a stationary distribution for the original chain.2 Watching a Markov chain when it visits a set Suppose that the original chain is ergodic (i.d. Due to the Strong Markov Property. 2. i.e. Let (St . 1. Deﬁne Yr := XT (r) . 1.1 Subsequences The most obvious way to create a new Markov chain is by picking a subsequence. 23.e.) has time-homogeneous transition probabilities: P (Yk+1 = j|Yk = i) = pij . this process is also a Markov chain and it does have time-homogeneous transition probabilities. t ≥ 0) be independent from (Xn ) with S0 = 0. k = 0. in general. . if nk = n0 + km. A r = 1. t ≥ 0. . St+1 = St + ξt . where ξt are i. 2. . irreducible and positive recurrent).) has the Markov property but it is not time-homogeneous. (m) (m) 23. Let A be (r) a set of states and let TA be the r-th visit to A. .3 Subordination Let (Xn . . n ≥ 0) be Markov with transition probabilities pij . . If the subsequence forms an arithmetic progression. .

then the process Yn = f (Xn ). . is a Markov chain with P (Yt+1 = j|Yt = i) = n=1 t ≥ 0. . n≥0 63 . Yn−1 = αn−1 }. (Notation: We will denote the elements of S by i. If. Indeed. . that there is a function Q : S ′ × S ′ → R such that P (Yn+1 = β | Xn = i) = Q(f (i). j. however. α1 . Yn−1 ) = P (Yn+1 = β | Yn = α). .) More generally. . a Markov chain. We need to show that P (Yn+1 = β | Yn = α. (n) 23. . . is NOT.Then Yt := XSt . and the elements of S ′ by α. . i. conditional on Yn the past is independent of the future. β.4 Deterministic function of a Markov chain If (Xn ) is a Markov chain with state space S and if f : S → S ′ is some function from S to some countable set S ′ . . we have: Lemma 13. Proof. f is a one-to-one function then (Yn ) is a Markov chain. Fix α0 . αn−1 ∈ S ′ and let Yn−1 be the event Yn−1 := {Y0 = α0 ... . ∞ λ(n)pij . . Then (Yn ) has the Markov property. . knowledge of f (Xn ) implies knowledge of Xn and so. β). in general. . Suppose that P (Yn+1 = β | Xn = i) depends on i only through f (i). .e.

because the probability of going up is the same as the probability of going down. and if i = 0. for all i ∈ Z and all β ∈ Z+ . β).for all choices of the variables involved. Thus (Yn ) is Markov with transition probabilities Q(α. if β = 1. β). depends on i only through |i|. Deﬁne Yn := |Xn |. if we let 1/2. i ∈ Z. β) P (Yn = α. Xn = i|Yn−1 )P (Xn = i|Yn−1 ) P (Yn = α|Yn−1 ) P (Yn = α|Yn−1 ) = Q(α. Yn−1 ) = i∈S P (Yn+1 = β | Xn = i.e. observe that the function | · | is not one-to-one. β). P (Xn+1 = i ± 1|Xn = i) = 1/2. But we will show that P (Yn+1 = β|Xn = i). To see this. β)P (Xn = i | Yn = α. β). β) P (Yn = α|Yn−1 ) i∈S = Q(α. indeed. conditional on {Xn = i}. β) P (Yn = α|Xn = i. i. i ∈ Z. Yn−1 )P (Xn = i | Yn = α. But P (Yn+1 = β | Yn = α. Yn = α. Yn−1 ) Q(f (i). We have that P (Yn+1 = β|Xn = i) = Q(|i|. Then (Yn ) is a Markov chain. if β = α ± 1. 64 . regardless of whether i > 0 or i < 0. α = 0. the absolute value of next state will either increase by 1 or decrease by 1 with probability 1/2. then |Xn+1 | = 1 for sure. β ∈ Z+ . β) 1(α = f (i))P (Xn = i|Yn−1 ) P (Yn = α|Yn−1 ) = Q(α. Indeed. Yn−1 )P (Xn = i|Yn−1 ) P (Yn = α|Yn−1 ) = Q(α. Yn−1 ) Q(f (i). β) := 1. Example: Consider the symmetric drunkard’s random walk. In other words. β) i∈S P (Yn = α|Xn = i. if i = 0. and so (Yn ) is. Yn−1 )P (Xn = i|Yn−1 ) P (Yn = α|Yn−1 ) = i∈S = i∈S = i∈S Q(f (i). a Markov chain with transition probabilities Q(α. On the other hand. α > 0 Q(α.

Then the size of the (n + 1)-st generation is: (n) (n) (n) Xn+1 = ξ1 + ξ2 + · · · + ξXn . Let ξk be the number of oﬀspring of the k-th individual in the n-th generation.i. . . Hence all states i ≥ 1 are inessential. since ξ (0) . . 2. But here is an interesting special case. So assume that p1 = 1. The idea is this: At each point of time n we have a number Xn of individuals each of which gives birth to a random number of oﬀspring. 0 ≤ j ≤ i. 1. Thus. Hence all states i ≥ 1 are transient. k = 0. Computing the transition probabilities pij = P (Xn+1 = j|Xn = i) is. if 0 is never reached then Xn → ∞. ξ2 . .. are i. We let pk be the probability that an individual will bear k oﬀspring. We ﬁnd the population (n) at time n + 1 by adding the number of oﬀspring of each individual. if the current population is i there is probability pi that everybody gives birth to zero oﬀspring. If p1 = P (ξ = 1) = 1 then Xn = X0 + n and this is a trivial case. We distinguish two cases: Case I: If p0 = P (ξ = 0) = 0 then the population can only grow. . j In this case. We will not attempt to compute the pij neither draw the transition diagram because they are both futile. In other words. Case II: If p0 = P (ξ = 0) > 0 then state 0 can be reached from any other state: indeed. therefore transient. .1 Applications Branching processes: a population growth application We describe a basic model for population growth known as the Galton-Watson process. The state 0 is irrelevant because it cannot be reached from anywhere. Example: Suppose that p1 = p. Back to the general case. we have that (Xn . n ≥ 0) has the Markov property. so we will ﬁnd some diﬀerent method to proceed. If we know that Xn = i then Xn+1 is the number of successful births amongst i individuals. the population cannot grow: it will eventually reach 0. in general. . a hard problem.24 24. each individual gives birth to at one child with probability p or has children with probability 1 − p. and 0 is always an absorbing state. ξ (n) ). and. one question of interest is whether the process will become extinct or not. If we let ξ (n) = ξ1 . if the 65 (n) (n) . . independently from one another.d. (and independent of the initial population size X0 –a further assumption). p0 = 1 − p. Then P (ξ = 0) + P (ξ ≥ 2) > 0. Therefore. It is a Markov chain with time-homogeneous transitions. . The parent dies immediately upon giving birth. 0 But 0 is an absorbing state. and following the same distribution. which has a binomial distribution: i j pij = p (1 − p)i−j . then we see that the above equation is of the form Xn+1 = f (Xn . ξ (1) . with probability 1.

Let ε be the extinction probability of the process. We then have that D = D1 ∩ · · · ∩ Di . 66 . We let ϕn (z) := Ez Xn ∞ = k=0 z k P (Xn = k) be the probability generating function of Xn . If µ ≤ 1 then the ε = 1. we have n→∞ lim ϕn (0) = lim P (Xm = 0 for all m ≥ n) n→∞ = P (there exists n such that Xm = 0 for all m ≥ n) = P (D) = ε. We wish to compute the extinction probability ε(i) := Pi (D). we can argue that ε(i) = εi .population does not become extinct then it must grow to inﬁnity. ϕn (0) = P (Xn = 0). This function is of interest because it gives information about the extinction probability ε. Indeed. If µ > 1 then ε < 1 is the unique solution of ϕ(z) = z. i. Let ∞ µ = Eξ = k=1 kpk be the mean number of oﬀspring of an individual. let Dk be the event that the branching process corresponding to the k-th initial ancestor dies. Obviously. For each k = 1. and. ε(0) = 1. Indeed. when we start with X0 = i. since Xn = 0 implies Xm = 0 for all m ≥ n. . Consider a Galton-Watson process starting with X0 = 1 initial ancestor. . Proof. so P (D|X0 = i) = εi . Let D be the event of ‘death’ or extinction: D := {Xn = 0 for some n ≥ 0}. and P (Dk ) = ε. where ε = P1 (D). if we start with X0 = i in ital ancestors. The result is this: Proposition 5. . For i ≥ 1. each one behaves independently of one another. . Let ∞ ϕ(z) := Ez ξ = k=0 z k pk be the probability generating function of ξ. Therefore the problem reduces to the study of the branching process starting with X0 = 1.

ϕn (0) ≤ ε∗ . But ϕn+1 (0) → ε as n → ∞. and since the latter has only two solutions (z = 0 and z = ε∗ ). This means that ε = 1 (see left ﬁgure below). and. Also ϕn (0) → ε. ε = ε∗ . (See right ﬁgure below. Let us show that.) 67 . Therefore ε = ε∗ . ϕn+1 (z) = ϕ(ϕn (z)). recall that ϕ is a convex function with ϕ(1) = 1.Note also that ϕ0 (z) = 1. z=1 Also. So ε ≤ ε∗ < 1. We then have ϕn+1 (z) = Ez Xn+1 = E z ξ1 z ξ2 · · · z ξXn ∞ (n) (n) (n) ϕ1 (z) = ϕ(z). ϕ(ϕn (0)) → ϕ(ε). and so on. Hence ϕ(0) ≤ ϕ(ε∗ ) = ε∗ . On the other hand. = i=0 ∞ E z ξ 1 z ξ 2 · · · z ξ Xn E z ξ1 z ξ2 · · · z ξi (n) (n) (n) (n) (n) (n) (n) (n) 1(Xn = i) 1(Xn = i) (n) = i=0 ∞ = i=0 ∞ E[z ξ1 ] E[z ξ2 ] · · · E[z ξi ] E[1(Xn = i)] ϕ(z)i P (Xn = i) = i=0 = E ϕ(z)Xn = ϕn (ϕ(z)). in fact. all we need to show is that ε < 1. Therefore. ϕ3 (z) = ϕ2 (ϕ(z)) = ϕ(ϕ(ϕ(z))) = ϕ(ϕ2 (z)). if the slope µ at z = 1 is ≤ 1. ϕn+1 (0) = ϕ(ϕn (0)). In particular. But 0 ≤ ε∗ . Since both ε and ε∗ satisfy z = ϕ(z). But ϕn (0) → ε. the graph of ϕ(z) lies above the diagonal (the function z) and so the equation z = ϕ(z) has no solutions other than z = 1. Therefore ε = ϕ(ε). since ϕ is a continuous function. if µ > 1. then the graph of ϕ(z) does intersect the diagonal at some point ε∗ < 1. By induction. Notice now that µ= d ϕ(z) dz . Hence ϕ2 (z) = ϕ1 (ϕ(z)) = ϕ(ϕ(z)).

2. Case: µ > 1. if Sn = 1. Then ε = 1. 3. . If collision happens. then max(Xn + An − 1. are allocated to hosts 1. 4. 5. . assume that on a time slot there are x packets amongst all hosts that are waiting for transmission (they are ‘backlogged’).d. Let s be the number of selected packets. . otherwise. 1. Since things happen fast in a computer network. on the next times slot. Xn+1 = Xn + An .i. there is no time to make a prior query whether a host has something to transmit or not. Thus. in a simpliﬁed model. on the next times slot. Each of the backlogged or newly arrived packets is selected for transmission with (a small) probability p. there are x + a − 1 packets. say. periodically.2 The ALOHA protocol: a computer communications application This is a protocol used in the early days of computer networking for having computer hosts communicate over a shared medium (radio channel). there is collision and the transmission is unsuccessful. 2. there are x + a packets. 3 hosts. So if we let Xn be the number of backlogged packets at the beg ginning of the n-th time slot. If s ≥ 2. using the same frequency. then the transmission is not successful and the transmissions must be attempted anew. If s = 1 there is a successful transmission and. there is a collision and. So if there are. . . 24. k ≥ 0. 0). But this is not a good method it is a waste of time to allocate a slot to a host if that host has nothing to transmit. is that if more than one hosts attempt to transmit a packet.1 1 (z) (z) ε 0 1 0 ε 1 z z Case: µ ≤ 1. Nowadays. we let hosts transmit whenever they like. The problem of communication over a common channel. During this time slot assume there are a new packets that need to be transmitted. random variables with some common distribution: P (An = k) = λk . then time slots 0. for a packet to go through. 68 .. 6. . We assume that the An are i. 1. So. One obvious solution is to allocate time slots to hosts in a speciﬁc order. and Sn the number of selected packets. 2. Abandoning this idea. Ethernet has replaced Aloha. Then ε < 1. 3. 3. 1. An the number of new packets on the same time slot. only one of the hosts should be transmitting at a given time.

An−1 .e.x−1 + k≥1 kpx. n ≥ 0) is a Markov chain. . we think as follows: to have a reduction of x by 1. x+k−1 k≥1 + µ − λ1 (x + 1)pq − λk (x + k)pq k≥2 = µ − q G(x). Since p > 0. An = a. k ≥ 1. Xn−1 . We also let µ := k≥1 kλk be the expectation of An . Let us ﬁnd its transition probabilities. 69 . Unless µ = 0 (which is a trivial case: no arrivals!). for example we can make it ≤ µ/2. To compute px. So we can make q x G(x) as small as we like by choosing x suﬃciently large.e. So: px. Sn should be 0 or ≥ 2) or we need k + 1 new packets and one selection (Sn = 1). so q x G(x) tends to 0 as x → ∞. The reason we take the maximum with 0 in the above equation is because.x+k kλk − kλk (x + k)pq x+k−1 + k≥1 x k≥1 = −λ0 xpq x−1 + = −λ0 xpq x x−1 kλk+1 (x + k + 1)pq x+k .where λk ≥ 0 and k≥0 kλk = 1. we think like this: to have an increase by k we either need exactly k new packets and no successful transmissions (i. . and exactly one selection: px. The process (Xn .x−1 = P (An = 0. To compute px. and so q x tends to 0 much faster than G(x) tends to ∞.x+k for k ≥ 1. the Markov chain (Xn ) is transient. but we mention that it is based on the so-called drift criterion which says that if the expected change Xn+1 − Xn conditional on Xn = x is bounded below from a positive constant for all large x. we have ∆(x) = (−1)px. selections are made amongst the Xn + An packets. deﬁned by‘ ∆(x) := E[Xn+1 − Xn |Xn = x]. we must have no new packets arriving.) = x + a k x−k p q . k 0 ≤ k ≤ x + a. Proving this is beyond the scope of the lectures. then the chain is transient. The probabilities px.x+k = λk (1 − (x + k)pq x+k−1 ) + λk+1 (x + k + 1)pq x+k . i. we have q < 1. So P (Sn = k|Xn = x.x are computed by subtracting the sum of the above from 1. The main result is: Proposition 6. we have that the drift is positive for all large x and this proves transience. we should not subtract 1. Idea of proof. The random variable Sn has. and p. conditional on Xn and An (and all past states and arrivals) a distribution which is Binomial with parameters Xn + An .x−1 when x > 0. We here only restrict ourselves to the computation of this expected change. The Aloha protocol is unstable. Indeed. where G(x) is a function of the form α + βx . if Xn + An = 0. where q = 1 − p. For x > 0. Sn = 1|Xn = x) = λ0 xpq x−1 . . Therefore ∆(x) ≥ µ/2 for all large x.

2. Academic departments link to each other by hiring their 70 . the rank of a page may be very important for a company’s proﬁts. if page i has a link to page j then we put an arrow from i to j and consider it as a directed edge of the graph. All you need to do to get high rank is pay (a lot of money). unless he gets bored–and this happens with probability α–in which case he clicks on a random page. if j ∈ L(i) pij = 1/|V |. to compute the stationary distribution π corresponding to the Markov chain with transition matrix P = [pij ]: π = πP. For each page i let L(i) be the set of pages it links to. Comments: 1. |V | These are new transition probabilities. The web graph is a a graph set of vertices V consisting of web pages and edges representing hyperlinks: thus. So if Xn = i is the current location of the chain. In an Internet-dominated society. Then it says: page i has higher rank than j if π(i) > π(j). So this is how Google assigns ranks to pages: It uses an algorithm. We pick α between 0 and 1 (typically. the next location Xn+1 is picked uniformly at random amongst all pages that i links to. So we make the following modiﬁcation.24. The interpretation is that of a ‘bored surfer’: a surfer moves according the the original Markov chain.3 PageRank (trademark of Google): A World Wide Web application PageRank is an algorithm used by the popular search engine Google in order to assign importance to a web page. The way it does that is by computing the stationary distribution of a Markov chain that we now describe. 3. α ≈ 0. otherwise. There is a technique. if L(i) = ∅ 0. If L(i) = ∅. It uses advanced techniques from linear algebra to speed up computations. there are companies that sell high-rank pages. Therefore. because of the size of the problem. in the latter case. The idea is simple. that allows someone to manipulate the PageRank of a web page and make it much higher. Its implementation though is one of the largest matrix computations in the world.2) and let α pij := (1 − α)pij + . then i is a ‘dangling page’. known as ‘302 Google jacking’. the chain moves to a random page. It is not clear that the chain is irreducible. neither that it is aperiodic. A new use of PageRank is to rank academic doctoral programs based on their records of ﬁnding jobs for their graduates. unless i is a dangling page. Deﬁne transition probabilities as follows: 1/|L(i)|. The algorithm used (PageRank) starts with an initial guess π(0) for π and updates it by π (k) = π (k−1) P. PageRank.

But an AA mating with a Aa may produce a child of type AA or one of type Aa. We have a transition from x to y if y alleles of type A were produced. To simplify. We are now ready to deﬁne the Markov chain Xn representing the number of alleles of type A in generation n. One of the square alleles is selected 4 times. it makes the evaluation procedure easy. Imagine that we have a population of N individuals who mate at random producing a number of oﬀspring. Three of the square alleles are never selected. 4. generation after generation. What is important is that the evaluation can be done without ever looking at the content of one’s research. then. two AA individuals will produce and AA child.g. Alleles that are never picked up may belong to individuals who never mate or may belong to individuals who mate but their oﬀspring did not inherit this allele. So let x be the number of alleles of type A (so and 2N − x is the number of those of type a). Do the same thing 2N times. maintain this allele for the next generation. 5. their oﬀspring inherits an allele from each parent. this allele is of type A with probability x/2N . The process repeats.faculty from each other (and from themselves). a dominant (say A) and a recessive (say a). An individual can be of type AA (meaning that all its alleles are A) or aa or Aa. the genetic diversity depends only on the total number of alleles of each type. all we need is the information outlined in this ﬁgure. at the next generation the number of alleles of type A can be found as follows: Pick an allele at random. We would like to study the genetic diversity. in each generation. a gene controlling a speciﬁc characteristic (e. We will assume that a child chooses an allele from each parent at random. This number could be independent of the research quality but. Needless to say that these uses of PageRank and PageRank-like systems represent corruption at the deepest level. Thus. One of the circle alleles is selected twice. If so. 71 . we will assume that.4 The Wright-Fisher model: an example from biology In an organism. We will also assume that the parents are replaced by the children. This means that the external appearance of a characteristic is a function of the pair of alleles. The alleles in an individual come in pairs and express the individual’s genotype for that gene. When two individuals mate. meaning how many characteristics we have. from the point of view of an administrator. colour of eyes) appears in two forms. Lazy university administrators use PageRank-like systems for assigning a single number to a researcher’s output. But it is a kind of corruption that is hard to trace since it is unanimously equated with modern use of Information Technology. In this ﬁgure there are 4 alleles of type A (circles) and 6 of type a (squares). Since we don’t care to keep track of the individuals’ identities. 24. And so on.

) One can argue that. 1− y 2N 2N x Notice that there is a chance (= (1 − 2N )2N ) that no allele of type A is selected at all. y ≤ 2N − 1. E(Xn+1 |Xn . . at “the end of time” the chain (started from x) will be either M with probability ϕ(x) or 0 with probability 1 − ϕ(x). M and thus. n n where. Similarly. ϕ(x) = x/M. ϕ(M ) = 1. . . . i. Clearly. the expectation doesn’t change and. . . since. ξM are i. X0 ) = E(Xn+1 |Xn ) = M This implies that. conditionally on (X0 . Xn ). on the average. we would like to know the probability ϕ(x) = Px (T0 > TM )P (T0 > TM |X0 = x) that the chain will eventually be absorbed by state M . M Therefore. The fact that M is even is irrelevant for the mathematical model. the states {1. Since pxy > 0 for all 1 ≤ x. . ϕ(x) = y=0 pxy ϕ(y). Since selections are independent. we see that 2N x 2N −y x y pxy = . . 0 ≤ y ≤ 2N. Clearly.e. . . . . X0 ) = Xn . (Here. i. . as usual.2N = 1. for all n ≥ 0. The result is correct. the random variables ξ1 . .e. . with n P (ξk = 1|Xn . since. and the population becomes one consisting of alleles of type a only. Therefore P (Xn = 0 or Xn = 2N for all large n) = 1. so both 0 and 2N are absorbing states. Tx := inf{n ≥ 1 : Xn = x}. . Let M = 2N . Another way to represent the chain is as follows: n n Xn+1 = ξ1 + · · · + ξM . and the population becomes one consisting of alleles of type A only. . We show that our guess is correct by proceeding via the mundane way of verifying that it satisﬁes the ﬁrst-step analysis equations. there is a x chance (= ( 2N )2N ) that only alleles of type A are selected. . . 72 . X0 ) = 1 − Xn . EXn+1 = EXn Xn = Xn . the number of alleles of type A won’t change. 2N − 1} form a communicating class of inessential states. the chain gets absorbed by either 0 or 2N . .d. . p00 = p2N.Each allele of type A is selected with probability x/2N . . M n P (ξk = 0|Xn . These are: M ϕ(0) = 0. but the argument needs further justiﬁcation because “the end of time” is not a deterministic time. we should have M × ϕ(x) + 0 × (1 − ϕ(x)) = x. on the average.i. . .

at time n + 1. the demand is.i. We can express these requirements by the recursion Xn+1 = (Xn + An − Dn ) ∨ 0 where a ∨ b := max(a. larger than the supply per unit of time. M x M (M −1)−(y−1) x x +1− M M M −1 = 24. A number An of new components are ordered and added to the stock. . so that Xn+1 = (Xn + ξn ) ∨ 0. the demand is only partially satisﬁed. n ≥ 0) is a Markov chain. . The correctness of the latter can formally be proved by showing that it does satisfy the recursion. say. 2. Since the Markov chain is represented by this very simple recursion. Xn electronic components at time n. there are Xn + An − Dn components in the stock. We will show that this Markov chain is positive recurrent. on the average. we will try to solve the recursion. we ﬁnd X1 = (X0 + ξ0 ) ∨ 0 X2 = (X1 + ξ1 ) ∨ 0 = (X0 + ξ0 + ξ1 ) ∨ ξ1 ∨ 0 ··· X3 = (X2 + ξ2 ) ∨ 0 = (X0 + ξ0 + ξ1 + ξ2 ) ∨ (ξ1 + ξ2 ) ∨ ξ2 ∨ 0 Xn = (X0 + ξ0 + · · · + ξn−1 ) ∨ (ξ1 + · · · + ξn−1 ) ∨ · · · ∨ ξn−1 ∨ 0. 73 . If the demand is for Dn components. and so. We will apply the so-called Loynes’ method. We have that (Xn . Working for n = 1. Dn ). and the stock reached level zero at time n + 1.d. for all n ≥ 0. The right-hand side of the last equals M y=0 M y x M y x 1− M M −y y = M = M y=1 M y=1 M! x y!(M − y)! M y 1− x M M −y y M x M M −y−1+1 x (M − 1)! (y − 1)!(M − y)! M M y=1 y−1+1 1− x = M x = M as needed. provided that enough components are available. n ≥ 0) are i. M −1 y−1 x M y−1 1− x . and E(An − Dn ) = −µ < 0. Otherwise. Thus. Let ξn := An − Dn . the demand is met instantaneously.The ﬁrst two are obvious.5 A storage or queueing application A storage facility stores. Here are our assumptions: The pairs of random variables ((An . while a number of them are sold to the market. . b). Let us consider (n) gxy := Px (Xn > y) = P (x + ξ0 + · · · + ξn−1 ) ∨ (ξ1 + · · · + ξn−1 ) ∨ · · · ∨ ξn−1 ∨ 0 > y .

. . for y ≥ 0. we may take the limit of both sides as n → ∞. . . ξn−1 ) = (ξn−1 . in computing the distribution of Mn which depends on ξ0 . and. instead of considering them in their original order. the random variables (ξn ) are i. n→∞ n→∞ We claim that π(y) := g(y) − g(y − 1) satisﬁes the balance equations. . . . 74 . The limit of a bounded and increasing sequence certainly exists: g(y) := lim g0y = lim P0 (Xn > y). where the latter equality follows from the fact that.i. . for all n. (n) . In particular.d. . . we may replace the latter random variables by ξ1 . (n) gxy = P (x + ξn−1 + · · · + ξ0 ) ∨ (ξn−2 + · · · + ξ0 ) ∨ · · · ∨ ξ0 ∨ 0 > y . ξn . Let Zn := ξ0 + · · · + ξn−1 Mn := Zn ∨ Zn−1 ∨ · · · ∨ Z1 ∨ 0. using π(y) = limn→∞ P (Mn = y). we ﬁnd π(y) = x≥0 π(x)pxy . . . . we reverse it. the event whose probability we want to compute is a function of (ξ0 . . . however. But Mn+1 = 0 ∨ max Zj 1≤j≤n+1 =0∨ d 1≤j≤n+1 max (Zj − Z1 ) + ξ0 = 0 ∨ (Mn + ξ0 ). . ξ0 ). Then Since Mn+1 ≥ Mn we have g0y ≤ g0y (n) (n) (n+1) g0y = P (Mn > y). that d (ξ0 . ξn−1 . so we can shuﬄe them in any way we like without altering their joint distribution. ξn−1 ). . Therefore. = x≥0 Since the n-dependent terms are bounded. Indeed. P (Mn+1 = y) = P (0 ∨ (Mn + ξ0 ) = y) = x≥0 P (Mn = x)P (0 ∨ (x + ξ0 ) = y) P (Mn = x) pxy . Hence. .Thus. Notice. . .

So π satisﬁes the balance equations. . . This will be the case if g(y) is not identically equal to 1. .} > y) is not identically equal to 1.} < ∞) = 1. Depending on the actual distribution of ξn . . 75 . From the Law of Large Numbers we have P ( lim Zn /n = −µ) = 1. n→∞ Hence g(y) = P (0 ∨ max{Z1 . It can also be shown (left as an exercise) that if Eξn > 0 then the Markov chain is transient. we will use the assumption that Eξn = −µ < 0. . Z2 . The case Eξn = 0 is delicate. the Markov chain may be transient or null recurrent. . Here. Z2 . n→∞ But then P ( lim Mn = 0 ∨ max{Z1 . We must make sure that it also satisﬁes y π(y) = 1. n→∞ This implies that P ( lim Zn = −∞) = 1. as required.

±ed } otherwise 76 . . where C is such that x p(x) = 1. if x = ej . x ∈ Zd . we need to give more structure to the state space S. Example 1: Let p(x) = C2−|x1 |−···−|xd | . we won’t go beyond Zd . if x ∈ {±e1 . the set of d-dimensional vectors with integer coordinates. . To be able to say this.g. so far. d 0.and space-homogeneous transition probabilities given by px. d p(x) = qj . groups) are allowed. p(−1) = q. . this walk enjoys. that of spatial homogeneity. namely that to go from state x to state y it must pass through all intermediate states because its value can change by at most 1 at each step. . we can let S = Zd . given a function p(x). otherwise where p + q = 1. Deﬁne pj . For example. besides time. if x = −ej .y+z for any translation z. . This means that the transition probability pxy should depend on x and y only through their relative positions in space. . . In dimension d = 1. More general spaces (e. 0. the skip-free property.y = p(y − x). then p(1) = p. namely. j = 1. . Our Markov chains. have been processes with timehomogeneous transition probabilities. Example 2: Let where ej be the vector that has 1 in the j-th position and 0 everywhere else. . A random walk possesses an additional property. Example 3: Deﬁne p(x) = 1 2d . but.y = px+z. a random walk is a Markov chain with time. A random walk of this type is called simple random walk or nearest neighbour random walk. If d = 1. . . p(x) = 0.and space-homogeneity. where p1 + · · · + pd + q1 + · · · + qd = 1.PART II: RANDOM WALKS 25 Random walk The purpose of this part of the notes is to take a closer look at a special kind of Markov chain known as random walk. j = 1. . This is the drunkard’s walk we’ve seen before. a structure that enables us to say that px. So. otherwise. here.

d. the initial state S0 is independent of the increments. But this would be nonsense. Indeed. . We call it simple symmetric random walk. We shall resist the temptation to compute and look at other 77 . computing the n-th order convolution for all n is a notoriously hard problem. which. . p(x) = 0. then p(1) = p(−1) = 1/2. in addition. We easily see that P (ξ1 + · · · + ξn = x) = p∗n (x). has equal probabilities to move from one state x to one of its “neighbouring states” x ± e1 . and this is the symmetric drunkard’s walk. in general. x ∈ Zd . We refer to ξn as the increments of the random walk. . Let us denote by Sn the state of the walk at time n. Furthermore. So. (Recall that δz is a probability distribution that assigns probability 1 to the point z. are i. we can reduce several questions to questions about this random walk. (When we write a sum without indicating explicitly the range of the summation variable. The convolution of two probabilities p(x). ξ2 . If d = 1. . p(n) = Px (Sn = y) = P (x + ξ1 + · · · + ξn = y) = P (ξ1 + · · · + ξn = y − x). p′ (x). xy The random walk ξ1 + · · · + ξn is a random walk starting from 0. x ± ed . otherwise. The special structure of a random walk enables us to give more detailed formulae for its behaviour.This is a simple random walk. One could stop here and say that we have a formula for the n-step transition probability. y p(x)p′ (y − x). p ∗ δz = p. where ξ1 . we have P (ξ1 + ξ2 = x) = (p ∗ p)(x). p ∗ (p′ ∗ p′′ ) = (p ∗ p′ ) ∗ p′′ . .) In this notation. we will mean a sum over all possible values. . It is clear that Sn can be represented as Sn = S0 + ξ1 + · · · + ξn . x ∈ Zd is deﬁned as a new probability p ∗ p′ given by (p ∗ p′ )(x) = The reader can easily check that p ∗ p′ = p′ ∗ p.) Now the operation in the last term is known as convolution. notice that P (ξ1 + ξ2 = x) = y P (ξ1 = x. . Furthermore. Obviously. random variables in Zd with common distribution P (ξn = x) = p(x). We will let p∗n be the convolution of p with itself n times. because all we have done is that we dressed the original problem in more fancy clothes. x + ξ2 = y) = y p(x)p(y − x).i.

1) → (4. So if the initial position is S0 = 0. +1. a random vector with values in {−1. ξn ). all possible values of the vector are equally likely. 1}n . we are dealing with a uniform distribution on the sample space {−1. we have P (ξ1 = ε1 . 2) → (3. a path of length n. where will it go? 26 The simple symmetric random walk in dimension 1: path counting This is a random walk with values in Z and transition probabilities pi. Since there are 2n elements in {0. who cares about the exact value of p∗n (x) for all n and x? Perhaps it is the case (a) that diﬀerent questions we need to ask or (b) that approximate methods are more informative than exact ones. +1}n .2) (4.d.i. + (1. ξn = εn ) = 2−n . we shall learn some counting methods. random variables (random signs) with P (ξn = ±1) = 1/2. consider the following: 78 . +1}n . . After all. +1}n .1) + − (5. If yes. A ⊂ {−1.methods. 0) → (1. Clearly. When the walk starts from 0 we can write Sn = ξ1 + · · · + ξn . if we can count the number of elements of A we can compute its probability. Here are a couple of meaningful questions: A. . . and so #A P (A) = n . to each initial position and to each sequence of n signs. Look at the distribution of (ξ1 . −1} then we have a path in the plane that joins the points (0. and the sequence of signs is {+1. how long will that take? C. .i+1 = pi. Will the random walk ever return to where it started from? B.1) + (0. In the sequel.2) where the ξn are i. 2) → (5. . for ﬁxed n. Therefore. 1): (2.0) We can thus think of the elements of the sample space as paths.1) − (3. . If not. 2 where #A is the number of elements of A. we should assign. but its practice may be hard. First. As a warm-up. −1.i−1 = 1/2 for all i ∈ Z. Events A that depend only on these the ﬁrst n random signs are subsets of this sample space. 1) → (2. . +1. . The principle is trivial. So.

i) (0. y)}. y) is given by: #{(m. 0) and end up at the speciﬁc point (n.0) The lightly shaded region contains all paths of length n that start at (0. S6 < 0. We use the notation {(m. i). 0) and n ends at (n. y) is given by: #{(0. In if n + i is not an even number then there is no path that starts from (0. x) and end up at (n.047. Hence (n+i)/2 = 0. Proof. the number of paths that start from (m. and compute the probability of the event A = {S2 ≥ 0. S4 = −1. 0) and end up at (n. Consider a path that starts from (0. y)} =: {all paths that start at (m. x) (n. (There are 2n of them. 0) and ends at (n. 0) (n. S7 < 0}.Example: Assume that S0 = 0. 0)). y)} = n−m . 2 1 3 0 1 2 4 5 6 7 −1 −2 −3 − −4 −7 26.1 Paths that start and end at speciﬁc points Now consider a path that starts from a speciﬁc point and ends up at a speciﬁc point.) The darker region shows all paths that start at (0. n 79 . We can easily count that there are 6 paths in A. i). in this case. The paths comprising this event are shown below. i)} = n n+i 2 More generally. i). The number of paths that start from (0. x) (n. Let us ﬁrst show that Lemma 14. (n − m + y − x)/2 (23) Remark. x) and end up at (n. (n. So P (A) = 6/26 = 6/128 = 3/64 ≈ 0.

Let u be the number of +’s in this path (the number of upward segments) and d the number of −’s (the number of downward segments). Then u + d = n, Hence u = (n + i)/2, d = (n − i)/2. So if we know where the path starts from and where it ends at we know how many times it has gone up and how many down. The question then becomes: given a sequence of n signs out of which u are + and the rest are −, in how many ways can we arrange them? Well, in exactly n ways, and this proves the formula. u Notice that u = (n + i)/2 must be an integer. This is true if n + i is even or, equivalently if n and i are simultaneously even or simultaneously odd. If this is not the case, then there is no path that can go from (0, 0) to (n, i). For example (look at the ﬁgure), there is no path that goes from (0, 0) to (3, 2). We shall thus deﬁne n to be zero if u is not an integer. u More generally, the number of paths that start from (m, x) and end up at (n, y) equals the number of paths that start from (0, 0) and end up at (n − m, y − x). So, applying the previous formula, we obtain (23). u − d = i.

26.2

**Paths that start and end at speciﬁc points and do not touch zero at all
**

(n, y) that never become n−m . − m − y − x)

Lemma 15. Suppose that x, y > 0. The number of paths (m, x) zero (i.e. they do not touch the horizontal axis) is given by: #{(m, x) (n, y); remain > 0} =

1 2 (n

n−m − − m + y − x)

1 2 (n

(24)

Proof.

(m,x) (n,y)

The shaded region contains all possible paths that start at (m, x) and end up at the speciﬁc point (n, y) which, in addition, never touch the horizontal axis.

n−m

Consider, instead, a path that starts at (m, x) and ends up at point (n, y) which DOES touch the horizontal axis, like the path indicated by the solid line below. There is a trick, 80

called reflection, which is useful here. Note that to each such path there corresponds another path obtained by following it up to the ﬁrst time it hits zero and then reﬂecting it around the horizontal axis. In the ﬁgure below, follow the solid path starting at (m, x) up to the ﬁrst time it hits zero and then follow the dotted path: this is the reﬂected path.

(n, y)

(m, x)

(n, −y)

A path and its reﬂection around a horizontal line. Notice that the reﬂection starts only at the point where the path touches the horizontal line.

Note that if we know the reﬂected path we can recover the original path. Hence we may as well count the number of reﬂected paths. But each reﬂected path starts from (m, x) and ends at (n, −y) and so it necessarily crosses the horizontal axis. We thus apply (23) with −y in place of y: #{(m, x) (n, y); touch zero} = #{(m, x) = So, for x > 0, y > 0, #{(m, x) (n, y); remain > 0} = #{(m, x) =

1 2 (n

n−m . (n − m − y − x)/2

(n, −y)}

n−m − − m + y − x)

(n, y)} − #{(m, x)

1 2 (n

(n, y); touch zero}

n−m . − m − y − x)

**Corollary 14. For any y > 0, #{(1, 1) Let m = 1, x = 1 in (24): #{(1, 1) (n, y); remain > 0} = n−1 n−1 − 1 + y) − 1 2 (n − y) − 1 (n − 1)! (n − 1)! − n−y = n+y n−y n−y ! ! 2 −1 ! 2 2 −1 ! 2 1n+y 1n−y n! = − n+y n 2 n 2 ! n−y ! 2 2
**

1 2 (n

(n, y); remain > 0} =

y n

#{(0, 0)

(n, y)}.

(25)

=

y n

n

n+y 2

,

and the latter term equals the total number of paths from (0, 0) to (n, y).

81

Corollary 15. The number of paths from (m, x) to (n, y) that do not touch level z < min(x, y) is given by: #{(m, x) (n, y); remain > z} = n−m − − m + y − x) n−m . − m − y − x) − z (26)

1 2 (n

1 2 (n

Proof. Because it is only the relative position of x and y with respect to z that plays any role, we have #{(m, x) (n, y); remain > z} = #{(m, x + z) (n, y + z); remain > 0}

and so the formula is obtained by an application of (24). Corollary 16. The number of paths from (0, 0) to (n, y), where y ≥ 0, that remain ≥ 0 is given by: n n #{(0, 0) (n, y); remain ≥ 0} = n+y − n+y (27) 2 2 +1 Proof. #{(0, 0) (n, y); remain ≥ 0} = #{(0, 0) = = where we used (26) with z = −1. Corollary 17 (Catalan numbers). The number of paths from (0, 0) to (n, 0) that remain ≥ 0 is given by: #{(0, 0) Proof. Apply (27) with y = 0: #{(0, 0) (n, 0); remain ≥ 0} = n

n 2

= #{(0, 1) n

n+y 2

(n, y); remain > −1} n −1 n , n+y 2 +1

n−y 2

(n, y + 1); remain > 0}

− −

n

n+y 2

(n, 0); remain ≥ 0} =

1 n+1 . n + 1 n/2

−

n 2

n −1

=

1 n+1 , n + 1 n/2

where the latter calculation is as in the proof of (25). Corollary 18. The number of paths that start from (0, 0), have length n, and remain nonnegative is given by: #{(0, 0) (n, •); remain ≥ 0} = n . ⌈n/2⌉ (28)

where ⌈x⌉ denotes the smallest integer that is ≥ x. 82

Proof. To see, this, just use (27) and sum over all y ≥ 0. If n = 2m then we need to set y = 2z in (27) (otherwise (27) is just zero) and so we have #{(0, 0) = (2m, •); remain ≥ 0} = 2m m+z − 2m m+z+1

z≥0

2m 2m 2m 2m 2m 2m 2m ± ··· + − + ··· + − + − 2m 2m 2m − 1 m+2 m+1 m+1 m n 2m , = = n/2 m

because all the terms that are denoted by · · · are equal to 0, while the rest cancel with one another, and only the ﬁrst one remains. If n = 2m + 1 then we need to set y = 2z + 1 in (27) (otherwise (27) is just zero) and so we have #{(0, 0) = (2m + 1, •); remain ≥ 0} = 2m + 1 2m + 1 − m+z+1 m+z+2

z≥0

2m + 1 2m + 1 2m + 1 2m + 1 2m + 1 2m + 1 2m + 1 ±· · · + − +· · ·+ − + − 2m + 1 2m + 1 2m m+3 m+2 m+2 m+1 n 2m + 1 . = = ⌈n/2⌉ m+1

We have shown that the shaded regions contain exactly the same number of paths. Was this, a posteriori, obvious?

2m

2m

27

The simple symmetric random walk in dimension 1: simple probability calculations

Since all paths of certain length have the same probability, we can translate the path counting formulae into corresponding probability formulae.

27.1

**Distribution after n steps
**

n 2−n . (n + y)/2 83

Lemma 16. P0 (Sn = y) =

n This is called the ballot theorem and the reason will be given in Section 29. (n − m + y − x)/2 u(n) := P0 (Sn = 0) = Proof. 27. . Sn ≥ 0 | Sn = 0) = 84 1 . we shall just compute it. . n/2 (29) #{(0. The left side of (30) equals #{paths (1. if n is even. if n is even. the random walk never becomes zero up to time n is given by P0 (S1 > 0. P0 (Sn = y) 2 (n + y) + 1 . y)} × 2−n n−1 n−1 − 1 1 (n − 1 + y − 1) 2 (n − 1 − y − 1) = 2 n 1 (n + y) 2 = y . y ≥ 0 are both even or both odd then P0 (S1 ≥ 0 .P (Sn = y|Sm = x) = In particular. 27. given that S0 = 0 and Sn = y > 0. But. If n. y)} . n 2−n . (n/2) + 1 y+1 P0 (Sn = y + 2) = 1 . S2 > 0. Such a simple answer begs for a physical/intuitive explanation. . 0) (n. n−m 2−n−m . y) that do not touch zero} × 2−n #{paths (0. . Sn ≥ 0 | Sn = y) = 1 − In particular. Sn > 0 | Sn = y) = y . . .3 Some conditional probabilities Lemma 17. for now. We have P0 (Sn = y) = and P (Sn = y|Sm = x) = and we apply (23). 0) (n.2 The ballot theorem Theorem 25. . n 2 #{paths (m. . P0 (S1 ≥ 0 . y)} . . The probability that. . x) 2n−m (n. 1) (n. n (30) Proof.

Proof. For even n we have P0 (S1 ≥ 0. Sn ≥ 0). . . (e) follows from the fact that ξ1 is independent of ξ2 . . .4 Some remarkable identities Theorem 26. . Sn ≥ 0 | Sn = y) = n = n+y 2 P0 (Sn ≥ 0 . . . . .). . . Sn ≥ 0. . Sn > 0) = 2P0 (ξ1 = 1)P0 (S2 > 0. . . +1 27. . . .) by (ξ1 . . . •). . ξ2 + · · · + ξn ≥ 0) = P0 (S1 ≥ 0. . We next have P0 (S1 = 0. . . . ξ3 . . and Sn−1 > 0 implies that Sn ≥ 0. . . and ﬁnally (g) is trivial because Sn−1 ≥ 0 means Sn−1 > 0 because Sn−1 cannot take the value 0. Sn ≥ 0) = #{(0. 0) (n. 1 + ξ2 + · · · + ξn > 0 | ξ1 = 1) = P0 (S1 ≥ 0. We use (27): P0 (Sn ≥ 0 . . (b) follows from symmetry. 85 . ξ2 + ξ3 ≥ 0. (f ) follows from the replacement of (ξ2 . Sn > 0 | ξ1 = 1) = P0 (1 + ξ2 > 0. Sn < 0) (b) (a) = 2P0 (S1 > 0. . . P0 (S1 ≥ 0. . Sn = 0) = P0 (S1 > 0. . . . . . Proof. . P0 (Sn = y) − n n+y 2 n +1 n+y 2 This is further equal to 1− ( n+y 2 ( n! n−y + 1)! ( 2 − 1)! n+y 2 )! ( n−y )! 2 =1− n! n−y 2 n+y 2 + 1 = n+y 2 y+1 . (c) (d) (e) (f ) (g) where (a) is obvious. Sn = y) P0 (Sn = y) =1− P0 (Sn = y + 2) . (c) is just conditioning on {ξ1 = 1} and using the fact that S1 = ξ1 . . . . . . If n is even. n/2 where we used (28) and (29). . Sn ≥ 0) = P0 (S1 = 0. Sn > 0) + P0 (S1 < 0. and from the fact that if m is integer then 1 + m > 0 is equivalent to m ≥ 0. . ξ2 . . . . remain ≥ 0} = 2n n 2−n = P0 (Sn = 0). . . . . . (d) follows from P0 (ξ1 = 1) = 1/2 and from the substitution of the value of ξ1 on the left of the conditioning. . Sn−1 ≥ 0) = P0 (ξ2 ≥ 0. ξ3 . . . Sn = 0) = P0 (Sn = 0). . . . 1 + ξ2 + ξ3 > 0. . . . . .

So the last term is 1/2. So 1 f (n) = 2u(n) P0 (S1 > 0. Sn = 0) P0 (S1 > 0. . 86 . . . . . Sn = 0) = u(n) . and u(n) = P0 (Sn = 0). the two terms must be equal. Fix a state a. with equal probability. Sn = 0) = 2P0 (Sn−1 > 0. . By the Markov property at time n − 1. . . . n We know that P0 (Sn = 0) = u(n) = n/2 2−n . . . So f (n) = 2P0 (S1 > 0. If a particle is to the right of a then you see its image on the left. . It is clear that if a particle starts at 0 and performs a simple symmetric random walk then its mirror image starts at a and performs a simple symmetric random walk. Sn = 0) Now. We now want to ﬁnd the probability f (n) that the walk will return to 0 for the ﬁrst time in n steps. Sn−1 > 0. Sn = 0 means Sn−1 = 1. . Then f (n) := P0 (S1 = 0. . for even n. . Theorem 27. Put a two-sided mirror at a. . we either have Sn−1 = 1 or −1. Sn = 0). Sn = 0) = P0 (Sn = 0) P (Sn−1 > 0|Sn = 0). This is not so interesting.27. Sn = 0) + P0 (S1 < 0. we see that Sn−1 > 0. given that Sn = 0. . and if the particle is to the left of a then its mirror image is on the right. . By symmetry. . P0 (Sn−1 > 0. n−1 Proof. Sn−1 > 0. To take care of the last term in the last expression for f (n). . n−1 28 The reﬂection principle for a simple random walk in dimension 1 The reflection principle says the following. . . the probability u(n) that the walk (started at 0) attains value 0 in n steps. 2 and the last probability can be computed from the ballot theorem–see (30): it is equal to 1/(n − 1). Let n be even. Sn−2 > 0|Sn−1 > 0. . Also. Since the random walk cannot change sign before becoming zero. Sn = 0. f (n) = P0 (S1 > 0. we can omit the last event. Sn−1 = 0. Sn−2 > 0|Sn−1 = 1). . Sn−1 < 0.5 First return to zero In (29). So u(n) f (n) = . we computed. So if the particle is in position s then its mirror image is in position s = 2a − s because s − a = a − s.

What is more interesting is this: Run the particle till it hits a (it will. 2a − Sn . n ∈ Z+ ) has the same law as (Sn . 28. . Sn > a) = P0 (Ta ≤ n. deﬁne Sn = where is the ﬁrst hitting time of a. n < Ta . But P0 (Ta ≤ n) = P0 (Ta ≤ n. If (Sn ) is a simple symmetric random walk started at 0 then so is (Sn ). m ≥ 0) is a simple symmetric random walk. Sn ≤ a). In the last event. the process (STa +m − a. observing that Sn has the same distribution as −Sn we get that this is also equal to P0 (|Sn | ≥ a) − 1 P0 (|Sn | = a). a + (Sn − a). Sn ≤ a) + P0 (Sn > a).1 Distribution of hitting time and maximum We can take advantage of this principle to establish the distribution function of Ta : Theorem 29. S1 . The last equality follows from the fact that Sn > a implies Ta ≤ n. we have n ≥ Ta . Let (Sn . . Trivially. Finally. n ≥ 0) be a simple symmetric random walk and a > 0. . In other words. Thus. . called (Sn . independent of the past before Ta . So Sn − a can be replaced by a − Sn in the last display and the resulting process. Theorem 28. for sure–Theorem 35) and after that consider its mirror image Sn = 2a − Sn . Sn ≤ a) + P0 (Ta ≤ n. 2 Deﬁne now the running maximum Mn = max(S0 . n ≥ Ta . Combining the last two displays gives P0 (Ta ≤ n) = P0 (Sn ≥ a) + P0 (Sn > a) = P0 (Sn ≥ a) + P0 (Sn ≥ a) − P0 (Sn = a). Then Ta = inf{n ≥ 0 : Sn = a} Sn . n ≥ Ta By the strong Markov property. So: P0 (Sn ≥ a) = P0 (Ta ≤ n. Sn = Sn . n < Ta . n ∈ Z+ ). 2 Proof. we obtain 87 . Proof. as a corollary to the above result. But a symmetric random walk has the same law as its negative. Then. If Sn ≥ a then Ta ≤ n. Sn ). Then 1 P0 (Ta ≤ n) = 2P0 (Sn ≥ a) − P0 (Sn = a) = P0 (|Sn | ≥ a) − P0 (|Sn | = a). Sn ≥ a). so we may replace Sn by 2a − Sn (Theorem 28). P0 (Sn ≥ a) = P0 (Ta ≤ n. 2a − Sn ≥ a) = P0 (Ta ≤ n.

posed in 18879 asks the following: if we start picking the items from the urn one by one without replacement. (1887).Corollary 19. Observe that Mn ≥ x if and only if Sk ≥ x for some k ≤ n which is equivalent to Tx ≤ n and use Theorem 29. The answer is very simple: Theorem 31. Proof. Paris. Rend. Rend. we have P0 (Mn < x. Sn = y). . Compt. then the probability that. An urn is a vessel. combined with (31). Sn = y) = P0 (Sn = y) − P0 (Sn = 2x − y) . (1887). as long as a ≥ b. The set of values of η contains n elements and η is uniformly distributed a in this set. Sn = 2x − y). 8 88 . by applying the reﬂection principle (Theorem 28). We can do more: we can derive the joint distribution of Mn and Sn : Theorem 30. we can replace Sn by 2x − Sn in the last part of the last display and get P0 (Tx ≤ n. Sn = y) = P0 (Tx ≤ n. 369. where ηi indicates the colour of the i-th item picked. Sn = y) = P0 (Tx > n. Sn = y) = P0 (Sn = y) − P0 (Tx ≤ n. Sci. This. of which a are coloured azure and b black. ηn ). (31) x > y. the ashes of the dead after cremation. D. Since Mn < x is equivalent to Tx > n. n = a + b. 29 Urns and the ballot theorem Suppose that an urn8 contains n items. P0 (Mn < x. 105. If Tx ≤ n. It is the latter use we are interested in in Probability and Statistics. The original ballot problem. 1 P0 (Mn ≥ x) = P0 (Tx ≤ n) = 2P0 (Sn ≥ x) − P0 (Sn = x) = P0 (|Sn | ≥ x) − P0 (|Sn | = x). e 10 Andr´. Sn = y) = P0 (Sn = 2x − y). gives the result. during sampling without replacement. what is the probability that the number of azure items is constantly ahead of the number of black ones? The problem was solved10 in the same year. 9 Bertrand. But if x > y then 2x − y > x and so {Sn = 2x − y} ⊂ {Tx ≤ n}. 105. A sample from the urn is a sequence η = (η1 . Compt. as for holding liquids. . and anciently for holding ballots to be drawn. employed for diﬀerent purposes. J. then. ornamental objects. If an urn contains a azure items and b = n−a black items. the number of selected azure items is always ahead of the black ones equals (a − b)/n. Bertrand. Solution d’un probl`me. which results into P0 (Tx ≤ n. 2 Proof. Acad. Sci. usually a vase furnished with a foot or pedestal. . 436-437. e e e Paris. Solution directe du probl`me r´solu par M. . Acad.

the result follows. . ξn ) is uniformly distributed in a set with n elements. . ξn = εn ) P0 (Sn = y) 2−n 1 = = . The values of each ξi are ±1. Let ε1 . ξn ) is. Proof of Theorem 31. . 30 The asymmetric simple random walk in dimension 1 Consider now a simple random walk with values in Z. . . ηn ) an urn process with parameters n. 89 . using the deﬁnition of conditional probability and Lemma 16. But if Sn = y then. We need to show that.i+1 = p. ξn = εn | Sn = y) = P0 (ξ1 = ε1 . . p(n+k)/2 q (n−k)/2 . . . Proof. . . . . .We shall call (η1 . It remains to show that all values are equally a likely. . Consider a simple symmetric random walk (Sn . Then. . the sequence (ξ1 . . . Sn > 0 | Sn = y) = . Then. . . . so ‘azure’ items are +1 and ‘black’ items are −1. +1} such that ε1 + · · · + εn = y. n ≥ 0) with values in Z and let y be a positive integer. . (The word “asymmetric” will always mean “not necessarily symmetric”. . . conditional on the event {Sn = y}. conditional on {Sn = y}. . . b be deﬁned by a + b = n.i−1 = q = 1 − p. a we have that a out of the n increments will be +1 and b will be −1. −n ≤ k ≤ n. . the ‘black’ items correspond to − signs). . Since an urn with parameters n. n Since y = a − (n − a) = a − b. always assuming (without any loss of generality) that a ≥ b = n − a. Sn > 0} that the random walk remains positive. the sequence (ξ1 . the event that the azure are always ahead of the black ones during sampling is precisely the event {S1 > 0. we can translate the original question into a question about the random walk. . . where y = a − (n − a) = 2a − n. indeed. εn be elements of {−1. n n −n 2 (n + y)/2 a Thus. Since the ‘azure’ items correspond to + signs (respectively. a can be realised by a simple symmetric random walk starting from 0 and ending at Sn = y. . (ξ1 . n .) So pi. So the number of possible values of (ξ1 . a − b = y. . We have P0 (Sn = k) = n n+k 2 pi. i ∈ Z. . . . But in (30) we computed that y P0 (S1 > 0. we have: P0 (ξ1 = ε1 . ξn ) of the increments is an urn process with parameters n and a = (n + y)/2. letting a. conditional on {Sn = y}. . ξn ) has a uniform distribution. . . . but which is not necessarily symmetric. . An urn can be realised quite easily be a random walk: Theorem 32. a. .

for x > 0.} ∪ {∞}. In view of this. and all t ≥ 0. Consider a simple random walk starting from 0.d.i. Lemma 18. The process (Tx . If x > 0. 90 . We will approach this using probability generating functions. . If S1 = 1 then T1 = 1. Consider a simple random walk starting from 0. . . plus a time T1 required to go from −1 to 0 for the ﬁrst time. .1 First hitting time analysis We will use probabilistic + analytical methods to compute the distribution of the hitting times Tx = inf{n ≥ 0 : Sn = x}. Start with S0 = 0 and watch the process until the ﬁrst n = T1 such that Sn = 1. To prove this. See ﬁgure below. (We tacitly assume that S0 = 0. x ∈ Z. x − 1 in order to reach x. then the summands in Tx = T1 + (T2 − T1 ) + · · · + (Tx − Tx−1 ) are i. random variables. Lemma 19. x ≥ 0) is also a random walk. Proof.30. Theorem 33. 2qs Proof. Then the probability generating function of the ﬁrst time that level 1 is reached is given by ϕ(s) = E0 sT1 = 1− 1 − 4pqs2 . Let ϕ(s) := E0 sT1 . Corollary 20. Proof. This random variable takes values in {0. 1. For all x. we make essential use of the skip-free property. namely the obvious fact that a simple random walk starting from 0 must pass through levels 1. it is no loss of generality to consider the random walk starting from 0.) Then. y ∈ Z. 2. . E0 sTx = ϕ(s)x . Px (Ty = t) = P0 (Ty−x = t). Consider a simple random walk starting from 0. If S1 = −1 then T1 is the sum of three times: one unit of time spent ′ to go from 0 to −1. In view of Lemma 19 we need to know the distribution of T1 . only the relative position of y with respect to x is needed. The rest is a consequence of the strong Markov property. plus a ′′ further time T1 required to go from 0 to +1 for the ﬁrst time. 2. This follows from the spatial homogeneity of the transition probabilities: to reach y starting from x. . .

The ﬁrst time n such that Sn = −1 is the ﬁrst time n such that −Sn = 1. The formula is obtained by solving the quadratic. the RW must ﬁrst hit 0 (it takes time ′ ′′ T1 to do this) and then 1 (it takes a further time of T1 ). 2ps Proof. Corollary 21. Consider a simple random walk starting from 0. from the strong Markov property. Corollary 22. We thus have ′ ′′ ′ ′′ ′ ′′ 1 ′ ′′ 1 + T1 + T 1 if S1 = 1 if S1 = −1 ϕ(s) = ps + qsϕ(s)2 . in order to hit 1.Sn 1 0 −1 n T1’ T1’’ If S1 = −1 then. Consider a simple random walk starting from 0. Use Lemma 19 and the previous formulae of Theorem 33 and Corollary 21. Hence the previous formula applies with the roles of p and q interchanged. T1 is independent of T1 conditional on S1 = −1 and each with distribution that of T1 . But note that −Sn is a simple random walk with upward transition probability q and downward transition probability p. Hence T1 = This can be used as follows: ϕ(s) = E0 [s1(S1 = 1) + s1+T1 +T1 1(S1 = −1)] = sP (S1 = 1) + sE0 [sT1 sT1 |S1 = −1]P (S1 = −1) = sp + sE0 [sT1 sT1 |S1 = −1]q ′′ ′ But. Then the probability generating function of the ﬁrst time that level −1 is reached is given by E0 sT−1 = 1− 1 − 4pqs2 . 91 . 0 if x > 0 2qs Tx (32) E0 s = √ |x| E sT−1 |x| = 1− 1−4pqs2 0 . Then the probability generating function of the ﬁrst time that level x ∈ Z is reached is given by √ x E sT1 x = 1− 1−4pqs2 . if x < 0 2ps Proof.

(Pay attention: it is important to write n ≥ 1 inside the set!) Theorem 34. The latter is. ′ But if S1 = −1 then T0 equals 1 plus the time required for the walk to hit 0 starting from −1. 1 − 4pqs2 . For a symmetric ′ random walk.30. For the general case. k 92 . k by the binomial theorem. Then the probability generating function of the ﬁrst return time to 0 is given by E0 s T 0 = 1 − Proof.2: P0 (T0 = n) = f (n) = u(n)/(n − 1) is n is even. how long will it take till it ﬁrst returns to 0? We are talking about the stopping time ′ T0 := inf{n ≥ 1 : Sn = 0}. this was found in §30. in distribution. the same as the time required for the walk to hit 1 starting ′ from 0. then we can omit the k upper limit in the sum. E0 s T 0 = E0 s T 0 ′ ′ ′ 1 − 4pqs2 . 1(S1 = −1) ′ + E0 s T 0 ′ ′ 1(S1 = 1) = qE0 sT0 | S1 = −1 + pE0 sT0 | S1 = 1 . E0 sT0 = qE0 (s1+T1 ) + pE0 (s1+T−1 ) = qs 1− 1 − 4pqs2 1− + ps 2qs 1 − 4pqs2 . we have computed the probability generating function E0 sT0 –see Theorem 34 The tool we need here is Taylor’s theorem for the function f (x) = (1 + x)α . 2ps ′ =1− 30. If α = n is a positive integer then n (1 + x)n = k=0 n k x .3 The distribution of the ﬁrst return to the origin We wish to ﬁnd the exact distribution of the ﬁrst return time to the origin. We again do ﬁrst-step analysis. where α is a real number. Similarly.2 First return to the origin Now let us ask: If the particle starts at 0. Consider a simple random walk starting from 0. then T0 equals 1 plus a time whose distribution that of the time required for the walk to hit −1 starting from 0. and simply write (1 + x)n = k≥0 n k x . if S1 = 1. If we deﬁne n to be equal to 0 for k > n.

and it looks exactly the same: (1 + x)α = k≥0 n(n − 1) · · · (n − k + 1) n! = . Indeed. k as long as we deﬁne α k = α(α − 1) · · · (α − k + 1) . k We can write this in more familiar symbols if we observe that 2k −k 1 1/2 4 . (−1)k = 2k − 1 k k and this is shown by direct computation. k!(n − k)! k! α k x . but that we won’t worry about which ones. 2k − 1 k 93 . So √ 1−x= ′ k≥0 1 2k k x =1+ 2k − 1 k 1 − 4pqs2 : ′ k≥1 2k k 1 x 2k − 1 k We apply this to E0 sT0 = 1 − E0 s T 0 = k≥1 1 2k (pq)k s2k . k! k! (−1)2k xk = k≥0 xk . √ 1 − x. For example. let us compute √ (−1)k (−x)k = k≥0 k≥0 k≥0 −1 (−x)k k = (−1)k k! (−1)(−2) · · · (−1 − k + 1) = = (−1)k . (1 − x)−1 = But. by deﬁnition. we will show that (1 − x)−1 = 1 + x + x2 + x3 + · · · which is of course the all too-familiar geometric series. We have k≥0 1 − x = (1 − x)1/2 = 1/2 (−1)k xk . k! The adverb ‘formally’ is supposed to mean that this formula holds for some x around 0. −1 k so (1 − x)−1 = as needed. As another example.Recall that n k = It turns out that the formula is formally true for general α.

Comparing the two ‘inﬁnite polynomials’ we see that we must have n = 2k (we knew that– the random walk can return to 0 only in an even number of steps) and then ′ P0 (T0 = 2k) = 1 1 2k k k p q = P0 (S2k = 0). by the deﬁnition of E0 sT0 . so 1 − 4pq = So P0 (Tx < ∞) = 1 − 4p + 4p2 = (1 − 2p)2 = |1 − 2p| = |p − q|. 94 . By applying the law of large numbers we have that P0 Sn =p−q n→∞ n lim = 1. • If p > 1/2 then p − q > 0 and so Sn tends to +∞ with probability one.But. P0 (Tx < ∞) = p q q p ∧1 ∧1 . s↑1 which is true for any probability generating function. That it is actually null-recurrent follows from Markov chain theory. 2k − 1 k 2k − 1 31 Recurrence properties of simple random walks Consider again a simple random walk Sn with values in Z with upward probability p and downward probability q = 1 − p. The quantity inside the square root becomes 1 − 4pq. without appeal to the Markov chain theory. • If p < 1/2 then p − q < 0 and so Sn tends to −∞ with probability one. x if x > 0 . We will see how the results about the generating functions of the hitting and return times help us answer these questions directly. We apply this to (32). But remember that q = 1 − p. if p = q then P0 (Tx < ∞) = 1 for all x ∈ Z. E0 s T 0 = n≥0 ′ ′ ′ P0 (T0 = n)sn . Hence it is again transient. if x < 0 This formula can be written more neatly as follows: Theorem 35. x 1−|p−q| 2q 1−|p−q| 2p . if x < 0 |x| In particular. The key fact needed here is: P0 (Tx < ∞) = lim E0 sTx . • If p = 1/2 then p − q = 0 and the law of large numbers only tells us that the walk is not positive recurrent. |x| if x > 0 (33) . Hence it is transient.

. q p. Assume p < 1/2. P0 (M∞ ≥ x) = lim P0 (Mn ≥ x) = lim P0 (Tx ≤ n) = P0 (Tx < ∞) = (p/q)x . We know that it cannot be positive recurrent. 31. with some positive probability. 95 . x ≥ 0. S2 . 31. if x < 0 which is what we want. but here it is < ∞. What is this probability? Theorem 37.1 Asymptotic distribution of the maximum Suppose now that p < 1/2. it will never ever return to the point where it started from. . Unless p = 1/2.) If we let Mn = max(S0 . Corollary 23. . we obtain what we are looking for. Indeed. The last part of Theorem 35 tells us that it is recurrent. This means that there is a largest value that will be ever attained by the random walk. Then limn→∞ Sn = −∞ with probability 1. . this probability is always strictly less than 1. Otherwise. Then ′ P0 (T0 < ∞) = 2(p ∧ q). If p < q then |p − q| = q − p and. Consider a simple random walk with values in Z. This value is random and is denoted by M∞ = sup(S0 .2 Return probabilities Only in the symmetric case p = q = 1/2 will the random walk return to the point where it started from. Consider a random walk with values in Z. Then the overall maximum M∞ has a geometric distribution: P0 (M∞ ≥ x) = (p/q)x . S1 . n→∞ n→∞ x ≥ 0. Let ′ T0 = inf{n ≥ 1 : Sn = 0} be the ﬁrst return time to 0. Theorem 36. except that the limit may be ∞. again. Proof. every nondecreasing sequence has a limit. if x > 0 . with probability 1 because p < 1/2. If p > q then |p − q| = p − q and so (33) gives P0 (Tx < ∞) = 1. the sequence Mn is nondecreasing. . . . The simple symmetric random walk with values in Z is null-recurrent. Proof. and so it is null-recurrent. then we have M∞ = limn→∞ Mn .Proof. Sn ).

2. (34) a random variable ﬁrst considered in Section 10 where it was also shown to be geometric. as already known. k = 0. 2. k = 0. This proves the ﬁrst formula. 1. . And so P0 (J0 ≥ k) = P0 (J0 ≥ 1)k . . . 96 . meaning that Pi (Ji = ∞) = 1. If p = q this gives 1. 1. Consider a simple random walk with values in Z. Remark 1: By spatial homogeneity. if a Markov chain starts from a state i then Ji is geometric. If p > q this gives 1 − (p − q) = 1 − (1 − q − q) = 2q. But if it is transient. Then. . . In Lemma 4 we showed that. . Theorem 38. . the total number of visits to a state will be ﬁnite. But ′ P0 (T0 < ∞) = lim EsT0 = 1 − s↑1 ′ 1 − 4pq = 1 − |p − q|. For the second formula we have ∞ ∞ E0 J0 = k=1 P0 (J0 ≥ k) = k=1 (2(p ∧ q))k = 2(p ∧ q) . The probability generating function of T0 was obtained in Theorem 34. even for p = 1/2. We now compute its exact distribution. 1]. In the latter case. So ′ P0 (J0 ≥ 1) = P0 (T0 < ∞) = 2(p ∧ q). 1. Ei Ji = 1 − 2(p ∧ q) Remark 2: The formulae work for all values of p ∈ [0. the total number J0 of visits to state 0 is geometrically distributed with P0 (J0 ≥ k) = (2(p ∧ q))k . 2. And if p < q this gives 1 − (q − p) = 1 − (1 − p − p) = 2p. But J0 ≥ 1 if and only if the ﬁrst return time to zero is ﬁnite. The total number of visits to state i is given by ∞ Ji = n=1 1(Sn = i). 2(p ∧ q) E0 J0 = .′ Proof. . . We now look at the the number of visits to some state but if we start from a diﬀerent state. k = 0. . 2(p ∧ q) . we obviously have that the same formulae apply for any i: Pi (Ji ≥ k) = (2(p ∧ q))k . 31. 1 − 2(p ∧ q) Proof. Pi (Ji ≥ k) = 1 for all k. starting from zero. 1 − 2(p ∧ q) this being a geometric series.3 Total number of visits to a state If the random walk is recurrent then it visits any state inﬁnitely many times. where the last probability was computed in Theorem 37.

If p ≤ 1/2 then this gives P0 (Jx ≥ k) = (2p)k−1 . i. Remark 1: The expectation E0 Jx is ﬁnite if p = 1/2. there has already been one visit to state x and so we need at least k − 1 remaining visits in order to have at least k visits in total. Py (Jx ≥ k) = P0 (Jx−y ≥ k) and Ey Jx = E0 Jx−y . starting from 0. (2(p ∧ q))k−1 . k ≥ 1. Then P0 (Tx < ∞. If p ≤ 1/2 then this gives P0 (Jx ≥ k) = (p/q)x (2p)k−1 . E0 Jx = If p ≥ 1/2 then P0 (Jx ≥ k) = (q/p)(−x)∨0 (2q)k−1 . 97 . Jx ≥ k) = P0 (Tx < ∞)Px (Jx ≥ k − 1). E0 Jx = Proof. we have E0 Jx = 1/(1−2p) regardless of x. and the second in Theorem 38. then E0 Jx drops down to zero geometrically fast as x tends to inﬁnity. If p ≥ 1/2 then this gives P0 (Jx ≥ k) = (2q)k−1 . In other words. Apply now the strong Markov property at Tx . it will visit any point below 0 exactly the same number of times on the average. Consider a random walk with values in Z. Jx ≥ k). Remark 2: If the initial state is not 0 then use spatial homogeneity to obtain the analogous formulae. given that Tx < ∞. For x > 0. But for x < 0. The ﬁrst term was computed in Theorem 35. simply because if Jx ≥ k ≥ 1 then Tx < ∞. If p ≤ 1/2 then P0 (Jx ≥ k) = (p/q)x∨0 (2p)k−1 . (q/p)(−x)∨0 1 − 2q k≥1 . So we obtain P0 (Jx ≥ k) = p ∧1 q x (p/q)x∨0 1 − 2p k≥1 . We repeat the argument for x < 0 and ﬁnd P0 (Jx ≥ k) = q ∧1 p |x| (2(p ∧ q))k−1 . If p ≥ 1/2 then this gives P0 (Jx ≥ k) = (q/p)|x| (2q)k−1 . Suppose x > 0. Then P0 (Jx ≥ k) = P0 (Tx < ∞. if the random walk “drifts down” (i. Consider the case p < 1/2. p < 1/2) then.e. ∞ As for the expectation. The reason that we replaced k by k − 1 is because. we apply E0 Jx = k=1 P0 (Jx ≥ k) and perform the sum of a geometric series.e.Theorem 39.

each distributed as T1 . n x > 0. Pause for reﬂection: We have been dealing with the random times Tx for simple random walks (symmetric or asymmetric) from the beginning of the lectures. . = 1− √ 1 − s2 s x . i. 0 ≤ k ≤ n) has the same distribution as (Sk . . which is basically another probability for S. Sn > 0 | Sn = x) = But the left-hand side is P0 (Sn −Sn−k > 0. Tx is the sum of x i. Sn = x) P0 (Tx = n) = . . 0 ≤ k ≤ n) is deﬁned by Sk := Sn − Sn−k . the probabilities P0 (Tx = n) = x P0 (Sn = x). Here is an application of this: Theorem 41. because the two have the same distribution.i. Rewrite the ballot theorem terms of the dual: P0 (S1 > 0.i. ξ1 ). . for a simple symmetric random walk. in Theorem 41. 0 ≤ k ≤ n). In Lemma 19 we observed that. Then P0 (Tx = n) = x P0 (Sn = x) . ξn ) has the same distribution as (ξn . 1 ≤ k ≤ n | Sn = x) = P0 (Sk < x. (Sk . . . Let Tx = inf{n ≥ 1 : Sn = x} be the ﬁrst time that x will be visited. (35) In Theorem 29 we used the reﬂection principle and derived the distribution function of Tx for a simple symmetric random walk: 1 P0 (Tx ≤ n) = P0 (|Sn | ≥ x) − P0 (|Sn | = x). 0 ≤ k ≤ n) of (Sk . random variables.d. n (37) 98 . and thus derived the generating function of Tx : E0 sTx = Specialising to the symmetric case we have E0 s Tx 1− 1 − 4pqs2 2qs x . ξ2 . Proof. Use the obvious fact that (ξ1 . P0 (Sn = x) P0 (Sn = x) x . every time we have a probability for S we can translate it into a probability for S. .e. Theorem 40. . . i=1 Then the dual (Sk . a sum of i. . Let (Sk ) be a simple symmetric random walk and x a positive integer. Fix an integer n. random variables. 0 ≤ k ≤ n − 1.d. Thus. we derived using duality. . for a simple random walk and for x > 0. . (36) 2 Lastly.32 Duality Consider random walk Sk = k ξi . n Proof. ξn−1 .

We remark here something methodological: It was completely diﬀerent techniques that led to results about, essentially, the same thing: the distribution of Tx . This is an important lesson to learn. Of course, the three formulae should be compatible: the number (37) should be the coeﬃcient of sn in (35); summing up (37) should give (36), or taking diﬀerences in (36) should give (37). But on trying to verify these things algebraically, the reader will realise that it is not easy.

33

Amount of time that a SSRW is positive*

Consider a SSRW (Sn ) starting from 0. Deﬁne its trajectory as the polygonal line obtained by joining the points (n, Sn ) in the plane, as described earlier. Watch this trajectory up to time 2n. Let T+ (2n) be the number of sides of the polygonal line that are above the horizontal axis and T− (2n) = 2n − T+ (2n) the number of sides that are below. Clearly, T± (2n) are both even numbers because, if the RW is at 0, it takes an even number of steps to return to 0. We can think of T+ (2n) as the number of leads in 2n coin tosses, where we

0

T 0’

′ Figure 1: Part of the time the SSRW is positive and part negative. After it returns to 0 at time T0 ,

it behaves again like a SSRW and is independent of the past.

have a lead at some time if the number of heads till that time exceeds the number of tails. We are dealing with a fair coin which is tossed 2n times, so the event that the number of heads equals k has maximum probability when k = n (heads=tails). Most people take this statement and translate it into a statement about T+ (2n) arguing that P (T+ (2n) = m) is maximised when m = n, i.e. when T+ (2n) = T− (2n). This is wrong. In fact, as the theorem below shows, the most likely value of P (T+ (2n) = m) is when m = 0 or m = 2n. Theorem 42 ( distribution of the total time the random walk is positive ). P (T+ (2n) = 2k) = P (S2k = 0)P (S2n−2k = 0) , 0 ≤ k ≤ n.

′ Proof. The idea is to condition at the ﬁrst time T0 that the RW returns to 0: n ′ P (T+ (2n) = 2k, T0 = 2r). r=1

P (T+ (2n) = 2k) =

99

′ Between 0 and T0 the random walk is either above the horizontal axis or below; call the ﬁrst event A+ and the second A− and write n ′ ′ P (A+ , T0 = 2r)P (T+ (2n) = 2k | A+ , T0 = 2r) n ′ ′ P (A− , T0 = 2r)P (T+ (2n) = 2k | A− , T0 = 2r)

P (T+ (2n) = 2k) =

r=1

+

r=1

**In the ﬁrst case,
**

′ ′ P (T+ (2n) = 2k | A+ , T0 = 2r) = P (T+ (2n) = 2k−2r | A+ , T0 = 2r) = P (T+ (2n−2r) = 2k−2r), ′ because, if T0 = 2r and A+ occurs then the RW has already spent 2r units of time above the axis and so T+ (2n) is reduced by 2r. Furthermore, we may remove the conditioning ′ because the future evolution of the RW after T0 is independent of the past. In the second case, a similar argument shows ′ ′ P (T− (2n) = 2k | A− , T0 = 2r) = P (T− (2n) = 2k | A− , T0 = 2r) = P (T− (2n − 2r) = 2k).

**We also observe that, by symmetry, 1 ′ ′ ′ P (A+ , T0 = 2r) = P (A− , T0 = 2r) = P (T0 = 2r) = f2r . 2 Letting p2k,2n = P (T+ (2n) = 2k), we have shown that p2k,2n = 1 2
**

k

**f2r p2k−2r,2n−2r +
**

r=1

1 2

n−k

**f2r p2k,2n−2r
**

r=1

We want to show that p2k,2n = u2k u2n−2k , where u2k = P (S2k = 0). With this hypothesis, we can start playing with the recursion of the display above and using induction on n we will realise that p2k,2n 1 = u2n−2k 2

k r=1

1 f2r u2k−2r + u2k 2

n−k

f2r u2n−2k−2r .

r=1

Explicitly, we have P (T+ (2n) = 2k) = 2k k 2n − 2k −2n 2 , n−k 0 ≤ k ≤ n. (38)

With 2n = 100, we plot the function P (T100 = 2k) for 0 ≤ k ≤ 50 below. Notice that it is maximised at the ends of the interval.

100

0.08

0.07

0.06

0.05

0.04

0.03

0.02

0

10

20 k

30

40

50

**The arcsine law*
**

Use of Stirling’s approximation in formula (38) gives P (T+ (2n) = 2k) ∼ 1/π k(n − k) , as k → ∞ and n − k → ∞.

**Now consider the calculation of the following probability: P T+ (2n) ∈ [a, b] 2n =
**

na≤k≤nb

P

T+ (2n) k = 2n n 1/π k(n − k) =

a≤k/n≤b

∼

a≤k/n≤b

1 . k k (n − n ) n n

1/π

**This is easily seen to have a limit as n → ∞, the limit being
**

b

1/π t(1 − t)

dt,

a

because the last sum is simply a Riemann approximation to the last integral. This shows the celebrated arcsine law: x √ T+ (2n) 2 1/π lim P dt = arcsin x. ≤x = n→∞ 2n π t(1 − t) 0 We thus have that the limiting distribution of T+ (2n) (the fraction of time that the RW is √ 2 positive), as n → ∞, is the distribution function π arcsin x, 0 < x < 1. This is known as the arcsine law. This distribution function has density √ 1/π , 0 < x < 1, the plot of

x(1−x)

which is amazingly close to the plot of the ﬁgure above. Now consider again a SSRW and consider the hitting times (Tx , x ≥ 0). This is another random walk because the summands in Tx = x (Ty − Ty−1 ) are i.i.d. all distributed as y=1 T1 : 2n −2n 1 P (T1 ≥ 2n) = P (S2n = 0) = 2 ∼√ . n πn 101

0)) = 1/4. n ≥ 1. 2 Same argument applies to (Sn . So let 1 2 1 2 1 2 Rn = (Rn . y ∈ Z. When he reaches a corner he moves again in the same manner. ξ2 . 0)) = 1/4. Sn = i=1 n ξi . random vectors such that P (ξn = (0. . y) where x. . Consider a change of coordinates: rotate the axes by 45o (and scale them). . So Sn = (Sn . 2 1 If x ∈ Z2 . n ∈ Z+ ) is a random walk. 0). are independent. north or south. 1 Hence (Sn . n ∈ Z+ ) is a sum of i. We have 1. Since ξ1 . . the random 1 2 variables ξn . Sn − Sn ). It is not a simple random walk because there is a possibility that the increment takes the value 0. He starts at (0. for each n. is a random walk. Rn ) := (Sn + Sn . ξ2 . i=1 ξi i=1 ξi 2 1 Lemma 20. . the two stochastic processes are not independent. Sn ) = n n 2 .From this. ξ2 . map (x1 .i. We describe this motion as a random walk in Z2 = Z × Z: Let ξ1 . . Many other quantities can be approximated in a similar manner. west. we let x1 be its horizontal coordinate and x2 its vertical. x1 − x2 ). . 0)) = 1/4 1 P (ξn = −1) = P (ξn = (−1. 0)) = P (ξn = (−1. We then let S0 = (0. The corners are described by coordinates (x. . north or south. being totally drunk. They are also identically distributed with 1 P (ξn = 0) = P (ξn = (0. However. Indeed. n ∈ Z+ ) is also a random walk. with equal probability (1/4). we immediately get that the expectation of T1 is inﬁnity: ET1 = ∞. x2 ) into (x1 + x2 . random variables and hence a random walk.d. 1)) + P (ξn = (0. he moves east. −1)) = 1/2. 0) and. The two random walks are not independent. n ∈ Z+ ) showing that it. 1 P (ξn = 1) = P (ξn = (1. the latter being functions of the former.i.d. 34 Simple symmetric random walk in two dimensions Consider a drunkard in a city whose roads are arranged in a rectangular grid. west.e. We have: 102 .. i. too. going at random east. 1)) = P (ξn = (−1. so are ξ1 . 1 1 Proof. i. 0)) = P (ξn = (1. (Sn . (Sn . . ξn are not independent simply because if one takes value 0 the other is nonzero.

in the sense that the ratio of the two sides goes to 1. Recall. (Rn + 2 2 1 2 1 1 we check that ηn = ξn + ξn . . . n ∈ Z+ ). R2n = 0) = P (R2n = 0)(R2n = 0). that for a simple symmetric random walk (started from S0 = 0). Similar argument shows that each of (Rn . n ∈ Z+ ) is a random walk in two dimensions. n ∈ Z+ ) 1 is a random walk in two dimensions. n Corollary 24. The last two are walk in one dimension. Proof. n ∈ Z+ ) is a random 2 . By Lemma 5. n ∈ Z ) is a random walk in one dimension. (Rn + independent. We have of each of the ηn n 1 2 P (ηn = 1. 35 Skorokhod embedding* We take a closer look at the gambler’s ruin problem for a simple symmetric random walk in one dimension. 0)) = 1/4 1 2 P (ηn = 1. ηn = ξn − ξn are independent for each n. we need to show that ∞ P (Sn = 0) = ∞. 0)) = 1/4 1 2 P (ηn = −1. η 2 are ±1. 1 where the last equality follows from the independence of the two random walks. ηn = ε′ ) = P (ηn = ε)P (ηn = ε′ ) for all choices of ε. The two-dimensional simple symmetric random walk is recurrent. 1)) = 1/4 1 2 P (ηn = −1. and by Stirling’s approximation. 1 2 1 2 P (S2n = 0) = P (R2n = 0) = P (R2n = 0. is i. n ∈ Z ) is a random walk in one dimension. Hence P (ηn = 1) = P (ηn = −1) = 1/2. But by the previous n=0 c lemma. b−a P (Ta < Tb ) = b . (Rn . 2 . ηn = 1) = P (ξn = (0. b−a 103 . ξ 1 − ξ 2 ). η . P (S2n = 0) ∼ n . ξ2 . . . being obtained the vector ηn = (ξn 1 2 n n n by taking the same function on the elements of the sequence ξ1 . And so 2 1 2 1 P (ηn = ε. Lemma 22. We have Rn = n (ξi + ξi ). = 1)) = 1/4. 2n n 2 2−4n . But (Rn ) 1 2 is a simple symmetric random walk. ηn = 1) = P (ξn = (1. P (Tb < Ta ) = We can read this as follows: P (STa ∧Tb = b) = −a . To show that the last two are independent.. from Theorem 7. as n → ∞.1 Lemma 21. The sequence of vectors η . So Rn = n ηn where ηn is i=1 i=1 i=1 1 + ξ 2 . b−a −a . The possible values 1 . Let a < 0 < b. ηn = −1) = P (ξn = (0. where c is a positive constant. 2 1 2 2 1 1 Proof. +1}. Hence (Rn . . . 1 1 2 2 Hence P (ηn = 1) = P (ηn = −1) = 1/2. (Rn . So P (R2n = 0) = 2n 2−2n . b−a P (STa ∧Tb = a) = b . P (S2n = 0) = Proof. The same is true for (Rn ). ε′ ∈ {−1.i. Rn = n (ξi − ξi ).d. ηn = −1) = P (ξn = (−1.

if p = M/N . i ∈ Z) a given probability distribution on Z with zero mean: i∈Z ipi = 0. Deﬁne a pair of random variables (A. i ∈ Z. and the probability it exits from the right point is 1 − p. and assuming that (A. 0)} ∪ (N × N) with the following distribution: P (A = i. P (A = 0. 1 − p). Note. such that p is rational. i<0 j>0 This leads to: c−1 j>0 jpj i<0 pi + c−1 i<0 (−i)pi j>0 i>0 ipi pj = 1 − p0 Since i∈Z ipi = 0. b]. B = 0) = p0 . B) taking values in {(0. How about more complicated distributions? Let suppose we are given a probability distribution (pi . B) are independent of (Sn ). Indeed. k ∈ Z. M < N . b. we have this common value: i<0 (−i)pi = and. B = j) = c−1 (j − i)pi pj . where c−1 is chosen by normalisation: P (A = 0. given a 2-point probability distribution (p. M. Conversely. we get that c is precisely ipi . suppose ﬁrst k = 0. Then Z = 0 if and only if A = B = 0 and this has 104 . b = N . a = M − N . Let (Sn ) be a simple symmetric random walk and (pi . The probability it exits it from the left point is p.This last display tells us the distribution of STa ∧Tb . choose. Theorem 43. using this. Proof.g. i < 0 < j. i ∈ Z). To see this. Can we ﬁnd a stopping time T such that ST has the given distribution? The Skorokhod embedding theorem tells us how to do this. we consider the random variable Z = STA ∧TB . a < 0 < b such that pa + (1 − p)b = 0. i ∈ Z. This means that we can simulate the distribution by generating a simple symmetric random walk starting from 0 and watching it till the ﬁrst time it exits the interval [a. The claim is that P (Z = k) = pk . c= i<0 (−i)pi = i>0 Letting Ti = inf{n ≥ 0 : Sn = i}. that ESTa ∧Tb = 0. N ∈ N. e. in particular. we can always ﬁnd integers a. B = j) = 1. Then there exists a stopping time T such that P (ST = i) = pi . B = 0) + P (A = i.

probability p0 . P (Z = k) = pk for k < 0. B = k) i<0 = = i<0 −i −1 c (k − i)pi pk k−i (−i)pi pk = pk . Suppose next k > 0. i<0 = c−1 Similarly. B = j) P (STi ∧Tk = k)P (A = i. 105 . We have P (Z = k) = P (STA ∧TB = k) = i<0 j>0 P (STi ∧Tj = k)P (A = i. by deﬁnition.

Then c | a + (b − a). Because c | a and c | b.) Notice that: gcd(a. Now suppose that c | a and c | b − a. with at least one of them nonzero. 38) = gcd(44. d2 are two such numbers then d1 would divide d2 and vice-versa. For instance. 14) = gcd(6. and we’re done. 2) = gcd(4. b. Indeed. replace b − a by b − 2a. their negatives and zero. let d = gcd(a.PART III: APPENDICES This section contains some sine qua non elements from standard Mathematics. Then interchange the roles of the numbers and continue.e. 2. we have c | d. 0. 38) = gcd(82. with the second line. if a. But 3 is the maximum number of times that 38 “ﬁts” into 120. The set Z of integers consists of N. b). Therefore d | b − a. Now. gcd(120. If b − a > a. b − a). 2) = gcd(2. −3. Indeed 4 × 38 > 120. and d = gcd(a. So (ii) holds. We write this by b | a. i. b − a). Zero is denoted by 0. 6. 32) = gcd(6. . · · · }. Replace b by b − a. So we work faster. (That it is unique is obvious from the fact that if d1 . as taught in high schools or universities. The set N of natural numbers is the set of positive integers: N = {1. b) = gcd(a. they merely serve as reminders. leading to d1 = d2 . c | b. 26) = gcd(6. d | b. But d | a and d | b. b are arbitrary integers. 5. −2. 4. Similarly. 1. 2. we can deﬁne their greatest common divisor d = gcd(a. Given integers a. If a | b and b | a then a = b. we may summarise the ﬁrst line by saying that we replace the largest of the two numbers by the remainder of the division of the largest with the smallest. (ii) if c | a and c | b. Keep doing this until you obtain b − qa < a. They are not supposed to replace textbooks or basic courses. . if we also invoke Euclid’s algorithm (i.e. 3. If c | b and b | a then c | a. Z = {· · · . This gives us an algorithm for ﬁnding the gcd between two numbers: Suppose 0 < a < b are integers. In other words. the process of long division that one learns in elementary school). −1. But in the ﬁrst line we subtracted 38 exactly 3 times. b).}. . 106 . then d | c. 20) = gcd(6. with b = 0. Arithmetic Arithmetic deals with integers. So (i) holds. we say that b divides a if there is an integer k such that a = kb. b) as the unique positive integer such that (i) d | a. 3. 8) = gcd(6. 2) = 2. 38) = gcd(6. 38) = gcd(6. We will show that d = gcd(a.

with x = 19. In the process. 1. 120) = 38x + 120y. y such that d = xa + yb. The same logic works in general. we have the fact that if d = gcd(a. 38) = gcd(6. Its proof is obvious. in particular. . Then d = xa + yb. To see this. b ∈ Z. it divides d. we used the divisions 120 = 3 × 38 + 6 38 = 6 × 6 + 2. 107 . we have 2 = 38 − 6 × 6 = 38 − 6 × (120 − 3 × 38) = 19 × 38 − 6 × 120.The theorem of division says the following: Given two positive integers b. Suppose that c | a and c | b. b) = min{xa + yb : x. let us follow the previous example again: gcd(120. This shows (ii) of the deﬁnition of a greatest common divisor. 2) = 2. b) then there are integers x. Second. gcd(38. i. with a > b. 38) = gcd(6. a. we can ﬁnd an integer q and an integer r ∈ {0. y ∈ Z. y ∈ Z. for integers a. for some x. such that a = qb + r. not both zero. . . For example.e. we have gcd(a. to divide 3450 with 26 we do 132 —— 3450 26 — 85 78 — 70 52 — 18 26 | and obtain q = 26. . let d be the right hand side. a − 1}. Here are some other facts about the greatest common divisor: First. To see why this is true. Thus. Then c divides any linear combination of a. we can show that. We need to show (i). y = 6. b. The long division algorithm is a process for ﬁnding the quotient q and the remainder r. xa + yb > 0}. Working backwards. r = 18.

We encountered the equivalence relation i theory. x). Finally. A relation is called symmetric if it cannot contain a pair (x.that d | a and d | b. Since d > 0. A relation is called reflexive if it contains pairs (x.e. for some 1 ≤ r ≤ d − 1. The set of functions from X to Y is denoted by Y X . Any equivalence relation splits a set into equivalence classes: the equivalence class of x contains all y that are related (are equivalent) to x. A relation which possesses all the three properties is called equivalence relation. respectively. that d does not divide b. so d divides both a and b. the relation < between real numbers can be thought of as consisting of pairs of numbers (x. A function is called onto (or surjective) if its range is Y . The equality relation is symmetric. For example. x).e. e. so r = 0. But then b = q(xa + yb) + r. The range of f is the set of its values. A relation is transitive if when it contains (x. yielding r = (−qx)a + (1 − qy)b. In general. i. If we take A to be the set of students in a classroom. The greatest common divisor of 3 integers can now be deﬁned in a similar manner. z) it also contains (x. say. B be ﬁnite sets with n. Counting Let A. z). the greatest common divisor of any subset of the integers also makes sense. So d is not minimum as claimed. j (i communicates with j) in Markov chain Functions We a function f from a set X into a set Y is denoted by f : X → Y . it can be thought as reﬂexive (unless we do not want to accept the fact that one may not be afriend of oneself). y) and (y. The relation < is not reﬂexive.g.. the set {f (x) : x ∈ X}. A function is called one-to-one (or injective) if two diﬀerent x1 . f (x2 ). a relation can be thought of as a set of pairs of elements of the set A. i. then the relation “x is a friend of y” is symmetric. Both < and = are transitive. The equality relation = is reﬂexive. that d does not divide both numbers. The relation < is not symmetric. y) such that x is smaller than y. and this is a contradiction. but is not necessarily transitive. If B ⊂ Y then f −1 (B) consists of all x ∈ X with f (x) ∈ B. Relations A relation on a set A is a mathematical notion that establishes a way of pairing elements of A. m elements. Suppose this is not true. So r is expressible as a linear combination of a and b and is strictly smaller than d (being a remainder). 108 . we can divide b by d to ﬁnd b = qd + r. x2 ∈ X have diﬀerent values f (x1 ). y) without contain its symmetric (y.

we denote (n)n also by n!. and this follows from the binomial theorem. Indeed. the k-th in n − k + 1 ways. Each assignment of this type is a one-to-one function from A to B. we have = 1. Thus. If A has n elements. Since the name of the ﬁrst animal can be chosen in m ways. So we have (n)k = n(n − 1)(n − 2) · · · (n − k + 1) ways to pick k elements if we care about their order. Any set A contains several subsets.) So the ﬁrst animal can be assigned any of the m available names. In the special case m = n. 109 . A function is then an assignment of a name to each animal. in other words. (The same name may be used for diﬀerent animals. If we don’t then we divide by the number k! of permutations of the k elements concluding that the number of subsets with k elements of a set with n elements equals (n)k n =: . To be able to do this. Hence the total number of assignments is m × m × ··· × m n times Suppose now we are not allowed to give the same name to diﬀerent animals. The total number of subsets of A (regardless of their number of elements) is thus equal to n k=0 n k = (1 + 1)n = 2n . n k is merely a name for the number computed on the left side of this n k = n n−k n 0 = n! . a one-to-one function from the set A into A is called a permutation of A. n! is the total number of permutations of a set of n elements. Incidentally.e. Clearly. k! k where the symbol equation. and the third. and so on. so does the second. the total number of one-to-one functions from A to B is (m)n := m(m − 1)(m − 2) · · · (m − n + 1). 1}n of sequences of length n with elements 0 or 1. the cardinality of the set B A ) is mn .The number of functions from A to B (i. The ﬁrst element can be chosen in n ways. n! stands for the product of the ﬁrst n natural numbers: n! = 1 × 2 × 3 × · · · × n. and so on. We thus observe that there are as many subsets in A as number of elements in the set {0. and so on. we need more names than animals: n ≤ m. we may think of the elements of A as animals and those of B as names. the number of subsets with k elements each can be found by designating which of the n elements of A will be chosen. that of the second in m − 1 ways. the second in n − 1. k!(n − k)! Since there is only one subset with no elements (it is the empty set ∅).

we deﬁne x m = x(x − 1) · · · (x − m + 1) . and a− is called the negative part of a. by convention. 0). Thus a ∧ b = a if a ≤ b or = b if b ≤ a. 0). and we don’t care to put parantheses because the order of ﬁnding a maximum is irrelevant. b is denoted by a ∧ b = min(a. m! n y If y is not an integer then. b). in choosing k animals from a set of n + 1 ones. Both quantities are nonnegative. In particular. we add the two numbers together in order to ﬁnd the total number of ways to pick k animals from n + 1 animals. For example. we use the following notations: a+ = max(a. b. n n . say. consider a speciﬁc animal. 110 . c. If the pig is not included in the sample. k Also. The maximum is denoted by a ∨ b = max(a. we choose k animals from the remaining n ones and this can be done in n ways. if x is a real number. + k−1 k Indeed. Note that −(a ∨ b) = (−a) ∧ (−b) and (a ∧ b) + c = (a + c) ∧ (b + c). then there are k − 1 remaining k n animals to be chosen and this can be done in k−1 ways. a+ is called the positive part of a. b). the pig. a− = − min(a. by convention. The absolute value of a is deﬁned by |a| = a+ + a− . a∨b∨c refers to the maximum of the three numbers a. The notation extends to more than one numbers. Note that it is always true that a = a+ − a− . If the pig is included. Maximum and minimum The minimum of two numbers a. Since the sentences “the pig is included” and “the pig is not included” cannot be simultaneously true. we let The Pascal triangle relation states that n+1 k = = 0.We shall let n = 0 if n < k.

For instance. . P (X ≥ n). 111 . if A1 . Several quantities of interest are expressed in terms of indicators. . Therefore its expectation is E 1A = 1 × P (A) + 0 × P (Ac ) = P (A). . If A is an event then 1A is a random variable.Indicator function stands for the indicator function of a set A. if I go to a bank N times and Ai is the clause “I am robbed the i-th time” then N n=1 1Ai is the total number of times I am robbed. . Another example: Let X be a nonnegative random variable with values in N = {1. It is easy to see that 1A 1A∩B = 1A 1B .}. It is worth remembering that. Then X X= n=1 1 (the sum of X ones is X). 1 − 1A = 1 A c . Thus. A2 . 2. A linear function ϕ from Rn to Rm is such that ϕ(αx + βy) = αϕ(x) + βϕ(y). We also use the notation 1(clause) for a number that is 1 whenever the clause is true or 0 otherwise. So E[X] = n≥1 E 1(X ≥ n) = n≥1 Matrices Let Rd be the set of vectors with d components. An example of its application: Since 1A∪B = 1 − 1(A∪B)c = 1 − 1Ac ∩Bc = 1 − 1Ac 1Bc = 1 − (1 − 1A )(1 − 1B ) = 1A + 1B − 1A 1B . = 1A + 1B − 1A∩B . we have E[1A∪B ] = E[1A ] + E[1B ] − E[1A∩B ]. are sets or clauses then the number n≥1 1An is the number true clauses. . It takes value 1 with probability P (A) and 0 with probability P (Ac ). So X= n≥1 1(X ≥ n). meaning a function that takes value 1 on A and 0 on the complement Ac of A. which means P (A ∪ B) = P (A) + P (B) − P (A ∩ B). . This is very useful notation because conjunction of clauses is transformed into multiplication.

is a column representation of y := ϕ(x) with respect to the given basis . m. Then. . . . Suppose that ψ. . x . The relation between the column representation of y and the column representation of x is written as 1 1 x y a11 · · · a1n . and all α. xn n 1 y := ϕ(x) = j=1 xj ϕ(uj ). Then any x ∈ Rn can be uniquely written as n x= j=1 x j uj . un be a basis for Rn . The matrix A of ϕ with respect to the choice of the two basis is deﬁned by a11 · · · a1n A := · · · · · · · · · am1 · · · amn y1 Then ϕ ◦ ψ is a linear function: Rn − → Rℓ − 112 ϕ◦ψ . . . . Let u1 . xn the given basis. . . vm in Rn we can express each ϕ(uj ) as a linear combination of the basis vectors: m ϕ(uj ) = i=1 aij vi . j=1 i = 1. am1 · · · amn xn ym Rn − Rm − Rℓ → → ψ ϕ . The column . . . . = ··· ··· ··· . .for all x. where ∈ R. is a column representation of x with respect to . . The column . . ym v1 . . . we have ϕ(x) = m n vi i=1 j=1 aij xj . But the ϕ(uj ) are vectors in Rm . . . Let y i := n aij xj . x1 . . vm . . . So if we choose a basis v1 . because ϕ is linear. y ∈ Rn . . Combining. β ∈ R. ϕ are linear functions: It is a m × n matrix. .

Rm and Rℓ and let A. A property of real numbers says that if the set I is bounded from below then this maximum of all lower bounds exists. . is a sequence of numbers then inf{x1 . respectively. So lim sup xn = limn→∞ inf{xn . If I is empty then inf ∅ = +∞. The standard basis in Rd consists of the vectors 1 0 0 0 1 0 e 1 = . for any ε > 0. Supremum and inﬁmum When we have a set of numbers I (for instance. . .} ≤ · · · and. and it is this that we called inﬁmum. Similarly. Then the matrix representation of ϕ ◦ ψ is AB. . Usually. sup I is characterised by: sup I ≥ x for all x ∈ I and. Likewise. For instance I = {1. .} ≤ inf{x3 . . Limsup and liminf If x1 .Fix three sets of basis on Rn . . If I has no lower bound then inf I = −∞. . We note that lim inf(−xn ) = − lim sup xn . for any ε > 0. . . 113 . . i. If I has no upper bound then sup I = +∞. e2 = . 1/3. xn+1 . B be the matrix representations of ϕ. where m (AB)ij = k=1 Aik Bkj . with respect to these bases. . If we ﬁx a basis in Rn . . x3 . . . x4 . . we deﬁne max I. . the choice of the basis is the so-called standard basis. ed = . j So A is a ℓ × m matrix. . . 0 0 1 In other words. 1 ≤ j ≤ n. of numbers) then the minimum of I is denoted by min I and refers to a number c ∈ I such that c ≤ x for all the numbers x ∈ I. . we deﬁne the supremum sup I to be the minimum of all upper bounds. . a2 . Similarly.} ≤ inf{x2 . 1 ≤ i ≤ ℓ. . . we deﬁne lim sup xn .} has no minimum. ei = 1(i = j). there is x ∈ I with x ≥ ε + inf I. sup ∅ = −∞. x2 . standing for the “infimum” of I. . it is its matrix representation with respect to the ﬁxed basis. A matrix A is called square if it is a n × n matrix. there is x ∈ I with x ≤ ε + inf I. inf I is characterised by: inf I ≤ x for all x ∈ I and. In these cases we use the notation inf I. since any increasing sequence has a limit (possibly +∞) we take this limit and call it lim sup xn (limit superior) of the sequence. deﬁned as the maximum of all the numbers y with the property y ≤ x for all x ∈ I. . B is a m × n matrix. and AB is a ℓ × n matrix. . xn+2 . . x2 . . 1/2.e. . a sequence a1 . The sequence xn has a limit ℓ if and only if lim sup xn = lim inf xn = ℓ. . ψ. . · · · }. This minimum (or maximum) may not exist. a square matrix represents a linear transformation from Rn to Rn .

to compute P (A ∩ B) we read the deﬁnition as P (A ∩ B) = P (A)P (B|A). B in the deﬁnition. A2 . Often. in these notes. we work a lot with conditional probabilities. A2 . A2 . follows from this axiom. and 1 Ac = 1 − 1 A . to compute P (A). For example. are mutually disjoint events. . to show that EX ≤ EY . . P (∪n An ) ≤ Similarly. This is the inclusion-exclusion formula for two events. which holds since X is positive. and apply them. Similarly. and everywhere else. Linear Algebra that has many axioms is more ‘restrictive’. we get P (A ∩ B) = P (A)P (B|A) = P (B)P (A|B) and this is called Bayes’ theorem. . then P n≥1 An = P (An ). but not too much. . Linear Algebra that has many axioms is more ‘restrictive’. such that ∪n Bn = Ω. This can be generalised to any ﬁnite number of events.. This is generalisable for a number of events: P (A1 ∩ A2 ∩ A3 ) = P (A1 )P (A2 |A1 )P (A3 |A1 ∩ A2 ) If we interchange the roles of A. In Markov chain theory.Probability Theory I do not require11 knowledge of axiomatic Probability Theory (with Measure Theory). n P (An ). B are disjoint events then P (A ∪ B) = P (A) + P (B). That is why Probability Theory is very rich. This follows from the logical statement X 1(X > x) ≤ X. If the events A. B are not disjoint then P (A ∪ B) = P (A) + P (B) − P (A ∩ B). the function that takes value 1 on A and 0 on Ac . In contrast. n If the events A1 . Markov’s inequality for a positive random variable X states that P (X > x) ≤ EX/x. .. B2 . Quite often. derive formulae. In contrast. i. we often need to argue that X ≤ Y .) If A. If A is an event then 1A or 1(A) is the indicator random variable. . Therefore. Of course. Note that 1A 1B = 1A∩B . . namely: Everything else. if An are events with P (An ) = 1 then P (∩n An ) = 1. . by not requiring this. it is useful to ﬁnd disjoint events B1 . n≥1 Usually. if he events A1 . in which case we write P (A) = n P (A ∩ Bn ) = P (A|Bn )P (Bn ). Recall that P (B|A) is deﬁned by P (A ∩ B)/P (A) and is a very motivated deﬁnition. showing that P (A) ≤ P (B) is more of a logical problem than a numerical one: we need to show that A implies B (i. are decreasing with A = ∩n An then P (A) = limn→∞ P (An ). A is a subset of B). But I would like the reader to keep in mind one thing: Every mathematical system has a set of axioms from which we can prove (useful) theorems. . . If A1 . I do sometimes “cheat” from a strict mathematical point of view. Similarly.e. Indeed. are increasing with A = ∪n An then P (A) = limn→∞ P (An ). for (besides P (Ω) = 1) there is essentially one axiom .e. Probability Theory is exceptionally simple in terms of its axioms. E 1A = P (A). If An are events with P (An ) = 0 then P (∪n An ) = 0. 11 114 . (That is why Probability Theory is very rich. . .

ϕ(s) = EsX = n=0 sn P (X = n) 115 . n = 0. which are both nonnegative and then deﬁne EX = EX + − EX − . . 1]. so it is useful to keep this in mind. 2. Suppose now that X is a random variable with values in Z+ ∪ {+∞}. This is due to the fact that n=1 P (X ≥ n) = E 1(X ≥ n). 1. The expectation of a nonnegative random variable X may be deﬁned by ∞ EX = 0 P (X > x)dx. . This deﬁnition makes sense only if EX + < ∞ or if EX − < ∞. . 1. The formula holds for any positive random variable. x→∞ In Markov chain theory. . 0). we encounter situations where P (X = ∞) may be positive. . . As an example. obviously. If it exists for some s1 then it exists uniformly for all s with |s| < |s1 | (and this needs a proof. 1 − ρs (39) Note that |G(s)| ≤ n |an ||s|n and this is ≤ n |an | if s ∈ [−1. the generating function of the geometric sequence an = ρn . we have EX = −∞ xf (x)dx. a1 . .. 0). making it a useful object when the a0 .An application of this is: P (X < ∞) = lim P (X ≤ x). which is motivated from the algebraic identity X = X + − X − . In case the random variable is discrete. a1 .} because then n an = 1. For a random variable which takes positive and negative values. . taught in calculus courses). . for s = 0 for which G(0) = 0. 1]. is an inﬁnite polynomial having the ai as coeﬃcients: G(s) = a0 + a1 s + a2 s2 + · · · . . the formula reduces to the known one. . n ≥ 0 is ∞ ρn s n = n=0 1 . i. is called probability generating function of X: ∞ as long as |ρs| < 1. .e. For a random variable that takes values in N we can write EX = ∞ P (X ≥ n). The generating function of the sequence pn = P (X = n). We do not a priori know that this exists for any s other than. EX = x xP (X = x). X − = − min(X. |s| < 1/|ρ|. we deﬁne X + = max(X. In case the random variable is ∞ absoultely continuous with density f . Generating functions A generating function of a sequence a0 . viz. So if n an < ∞ then G(s) exists for all s ∈ [−1. is a probability distribution on Z+ = {0.

ds ds and by letting s = 1. n ≥ 0) is n≥0 xn+2 sn = s−2 (X(s) − x0 − sx1 ). since |s| < 1. Note that ϕ(0) = P (X = 0). If we let X(s) = n≥0 n ≥ 0. xn sn be the generating of (xn . we have s∞ = 0.This is deﬁned for all −1 < s < 1. take the value +∞. n=0 We do allow X to. n ≥ 0) is n≥0 xn+1 sn = s−1 (X(s) − x0 ) and the generating function of (xn+2 . n ≥ 0) then the generating function of (xn+1 . As an example. while ∞ ϕ(1) = n=0 P (X = n) = P (X < +∞). Since the term s∞ p∞ could formally appear in the sum deﬁning ϕ(s). Generating functions are useful for solving linear recursions. with x0 = x1 = 1. consider the Fibonacci sequence deﬁned by the recursion xn+2 = xn+1 + xn . possibly. and this is why ϕ is called probability generating function. Note that ∞ n=0 pn ≤ 1. Also. n! as follows from Taylor’s theorem. 116 . ∞ and p∞ := P (X = ∞) = 1 − pn . we can recover the expectation of X from EX = lim ϕ′ (s). We can recover the probabilities pn from the formula pn = ϕ(n) (0) . but. s→1 something that can be formally seen from d d EsX = E sX = EXsX−1 .

then the so-called distribution function. x ∈ R. n ≥ 0) equals the sum of the generating functions of (xn+1 .e. Despite the square roots in the formula. n ≥ 0) and (xn . namely. 117 . this number is always an integer (it has to be!) Distribution or law of a random object When we speak of the “distribution” or “law” of a random variable X with values in Z or R or Rn or even in a much larger space.e. Thus (1/ρ)−s is the 1 generating function of ρn+1 . b = −( 5 + 1)/2. Since x0 = x1 = 1. for example. the n-th term of the Fibonacci sequence. This tells us that s−b is the generating function of (1/b)n+1 1 and that s−a is the generating function of (1/a)n+1 . is such an example. b−a Noting that ab = −1. − − a−b s−b s−a b−a b−s a−s 1/ρ 1 1 But (39) tells us that ρn has generating function 1−ρs = (1/ρ)−s . we speak of some function which enables us to specify the probabilities of the form P (X ∈ B) where B is a set of values of X. if X is a random variable with values in R. and so X(s) = 1 1 1 1 1 1 = . s−2 (X(s) − x0 − sx1 ) = s−1 (X(s) − x0 ) + X(s). we can solve for X(s) and ﬁnd X(s) = s2 −1 . Hence s2 + s − 1 = (s − a)(s − b). So. the function P (X ≤ x). Hence X(s) is the generating function of 1 xn = ((1/b)n+1 − (1/a)n+1 ). But I could very well use the function P (X > x). +s−1 But the polynomial s2 + s − 1 has two roots: √ √ a = ( 5 − 1)/2. n ≥ 0). i. i. we can also write this as xn = (−a)n+1 − (−b)n+1 . b−a is the solution to the recursion.From the recursion xn+2 = xn+1 + xn we have (and this is were linearity is used) that the generating function of (xn+2 .

or the 2-parameter function P (a < X < b). Large products: Stirling’s formula If we want to compute n! = 1 × 2 × · · · × n for large n then. For example (try it!) the rough Stirling approximation give that the probability that. to compute a large sum we can. Since log x does not vary too much between two successive positive integers k and k + 1 it follows that k+1 k log x dx ≈ log k. dx All these objects can be referred to as “distribution” or “law” of X. compute an integral and obtain an approximation. Warning: the verb “specify” should not be interpreted as “compute” in the sense of existence of an analytical formula or algorithm. It turns out to be a good one. (b) The approximation is not good when we consider ratios of products. 1 Since d dx (x log x − x) = log x. we encountered the concept of an excursion of a Markov chain from a state i. If X is absolutely continuous. Even the generating function EsX of a Z+ -valued random variable X is an example. since the number is so big. Such an excursion is a random object which should be thought of as a random function. For instance. I could use its density f (x) = d P (X ≤ x). but using or merely talking about such an object presents no diﬃculty other than overcoming a psychological barrier. n ∈ Z+ . instead. But there are two problems: (a) We do not know in what sense the symbol ≈ should be interpreted. n! ≈ exp(n log n − n) = nn e−n . n Hence n k=1 log k ≈ log x dx. in 2n fair coin tosses 118 . it follows that n 1 log x dx = n log n − n + 1 ≈ n log n − n. Specifying the distribution of such an object may be hard. Hence we have We call this the rough Stirling approximation. We frequently encounter random objects which belong to larger spaces. for it does allow us to specify the probabilities P (X = n). it’s better to use logarithms: n n! = e log n = exp k=1 log k. Now. for example they could take values which are themselves functions.

0 ≤ n ≤ m. Indeed. And if y < x then 1(y < x) = 0 and the display reads g(y) ≥ 0. g(X) ≥ g(x)1(X ≥ x). by taking expectations of both sides (expectation is an increasing operator). and is based on the concept of monotonicity. y ∈ R g(y) ≥ g(x)1(y ≥ x). We can case both properties of g neatly by writing For all x. n ∈ Z+ . then X is such that knowledge that X has exceeded n does not make the distribution of the remaining time X − n any diﬀerent from X. That was the random variable J0 = ∞ n=1 1(Sn = 0). as n → ∞. a geometric function of k. for all x. expressing non-negativity. you have seen it before. Then. 119 . and this is terrible. It was shown in Theorem 38 that J0 satisﬁes the memoryless property. Markov’s inequality Arguably this is the most useful inequality in Probability. The geometric distribution We say that the random variable X with values in Z+ is geometric if it has the memoryless property: P (X − n ≥ m | X ≥ n) = P (X ≥ m). If we think of X as the time of occurrence of a random phenomenon. This completely trivial thought gives the very useful Markov’s inequality. In fact it was shown there that P (J0 ≥ k) = P (J0 ≥ 1)k . We have already seen a geometric random variable. Let X be a random variable. where λ = P (X ≥ 1). To remedy this. so this little section is a reminder. we shall prove Strirling’s approximation: √ n! ∼ nn e−n 2πn where the symbol an ∼ bn means that an /bn → 0. expressing monotonicity.we have exactly n heads equals 2n 2−2n ≈ 1. and so. We baptise this geometric(λ) distribution. because we know that the n probability goes to zero as n → ∞. deﬁned in (34). The same process shows that a geometric random variable has a distribution of the form P (X ≥ n) = λn . Eg(X) ≥ g(x)P (X ≥ x) . Surely. Consider an increasing and nonnegative function g : R → R+ . if y ≥ x then 1(y ≥ x) = 1 and the display tells us g(y) ≥ g(x).

Norris. Also available at: http://www.dartmouth.mac. Wiley. K L and Aitsahlia F (2003) Elementary Probability Theory (With Stochastic Processes and an Introduction to Mathematical Finance).Bibliography ´ 1. ´ 2. Grinstead. Cambridge University Press. Parry. Amer. 5. Springer.edu/~chance/teaching_aids/books_articles/ /probability_book/amsbook. 3. Math. Feller. Chung. Springer. W (1981) Topics in Ergodic Theory. J R (1998) Markov Chains. W (1968) An Introduction to Probability Theory and its Applications. 7.pdf 6. 4. Bremaud P (1997) An Introduction to Probabilistic Modeling. Springer. C M and Snell. 120 . Cambridge University Press. J L (1997) Introduction to Probability. Bremaud P (1999) Markov Chains. Soc.

6 Markov’s inequality. 119 minimum. 119 matrix. 29 reﬂection. 98 dynamical system. 65 Catalan numbers. 10 ballot theorem. 6 Markov property. 68 aperiodic. 16 communicating classes. 10 irreducible. 16. 110 mixing. 17 Kolmogorov’s loop criterion. 114 balance equations. 82 Cesaro averages. 61 detailed balance equations. 111 maximum. 76 neighbours. 87 121 . 53 Fibonacci sequence. 17 Aloha. 7 closed. 18 permutation. 47 coupling inequality. 51 Monte Carlo. 15. 17 inﬁmum. 116 ﬁrst-step analysis. 111 inessential state. 101 axiom of Probability Theory. 16 convolution. 18 dual. 17 Ethernet. 53 hitting time. 35 Helly-Bray Lemma. 119 Google. 16 equivalence relation. 115 random walk on a graph. 117 leads to. 59 law. 20 function of a Markov chain. 26 running maximum.Index absorbing. 117 division. 84 bank account. 61 null recurrent. 17 communicates with. 65 generating function. 61 recurrent. 51 essential state. 68 Fatou’s lemma. 77 indicator. 115 geometric distribution. 2 nearest neighbour random walk. 18 arcsine law. 77 coupling. 108 ruin probability. 109 positive recurrent. 108 relation. 70 greatest common divisor. 108 ergodic. 63 Galton-Watson process. 110 meeting time. 7 invariant distribution. 113 initial distribution. 34 PageRank. 106 ground state. 1 equivalence classes. 44 degree. 15 distribution. 13 probability generating function. 70 period. 48 memoryless property. 86 reﬂexive. 58 digraph (directed graph). 3 branching process. 48 cycle formula. 81 reﬂection principle. 45 Chapman-Kolmogorov equation. 20 increments. 34 probability ﬂow. 15 Loynes’ method. 73 Markov chain.

60 urn. 7. 3. 108 tree. 13 stopping time. 29 transition diagram:. 8 transitive. 27 subordination. 9 stationary distribution. 8 transition probability matrix. 16. 76 Skorokhod embedding. 7 simple random walk. 27 storage chain. 62 supremum. 6 stationary.semigroup property. 58 transient. 73 strategy of the gambler. 7 time-reversible. 104 states. 76 simple symmetric random walk. 10 steady-state. 24 Strirling’s approximation. 71 122 . 62 subsequence of a Markov chain. 119 strong Markov property. 2 transition probability. 62 Wright-Fisher model. 108 time-homogeneous. 16. 113 symmetric. 88 watching a chain in a set. 6 statespace. 77 skip-free property.

- The Incredible Power of Post Selection (Scott Aaronson)
- markov-chains
- Lec27
- Markov Chains
- Comm Bremermann
- MarkovChains
- Exponential Growth
- 20752_0412055511MarkovChain
- 9781904534488
- markov chain chapter 6
- 2-2
- Bayesian Econometric Methods
- Mata23 Final 2012s
- (Ebook) Hidden Markov Models (Theory & Methods) Markov Chains Particle Filter Monte Carlo Hmm - Cappemoulinesryden
- UCLA Math 33A Midterm 2 Solutions
- Markov Coding
- Matrix
- ProblemsMarkovChains
- SVD_ans_you.pdf
- Markov Chain
- Matrizes, Eliminação Gaussiana, Equações Lineares e Espaços Vetoriais
- A First Course in Linear Algebra
- ROC
- Solved Problems
- lect14
- Statistics
- distance transform 21
- Introduction to Markov Chain Monte Carlo Simulation and Their Statistical Analysis
- 2-D Graphics Transformation -ecomputernotes.com
- Exercises for Stochastic Calculus

Are you sure?

This action might not be possible to undo. Are you sure you want to continue?

We've moved you to where you read on your other device.

Get the full title to continue

Get the full title to continue listening from where you left off, or restart the preview.

scribd