University of California, Santa Barbara
Gerard Brunick
Last Updated: February 2, 2012
These notes are a minor modification of a set of notes which were generously shared with the present "author" by Gordan Žitković, who currently works in the Department of Mathematics at The University of Texas at Austin. Any mistakes in these notes were undoubtedly introduced by the present author when he modified the original presentation.
Contents

1 Random Walks I
1.1 Stochastic processes
1.2 The canonical probability space
1.3 Constructing the random walk
1.4 The reflection principle

2 Generating Functions
2.1 Generating functions
2.2 Associating a generating function with a random variable
2.3 Convolution
2.4 Moments
2.5 Random sums

3 Random Walks II
3.1 Stopping times
3.2 Wald's identity II
3.3 The distribution of the first hitting time T_1
3.4 Strong Markov property

4 Branching Process
4.1 A bit of history
4.2 A mathematical model
4.3 Construction and simulation of branching processes
4.4 A generating-function approach
4.5 Extinction probability
1 Random Walks I
1.1 Stochastic processes
Definition 1.1. Let T be a subset of [0, ∞). A family of random variables (X_t)_{t∈T}, indexed by T, is called a stochastic (or random) process. When T = N (or T = N_0), (X_t)_{t∈T} is said to be a discrete-time process, and when T = [0, ∞), it is called a continuous-time process.

When T is a singleton (say T = {1}), the process (X_t)_{t∈T} ≡ X_1 is really just a single random variable. When T is finite (e.g., T = {1, 2, . . . , n}), we get a random vector. Therefore, stochastic processes are generalizations of random vectors. The interpretation is, however, somewhat different. While the components of a random vector usually (not always) stand for different spatial coordinates, the index t ∈ T is more often than not interpreted as time. Stochastic processes usually model the evolution of a random system in time. When T = [0, ∞) (continuous-time processes), the value of the process can change every instant. When T = N (discrete-time processes), the changes occur discretely.
In contrast to the case of random vectors or random variables, it is not easy to define a notion of a density (or a probability mass function) for a stochastic process. Without going into details about why exactly this is a problem, let me just mention that the main culprit is infinity. One usually considers a family of (discrete, continuous, etc.) finite-dimensional distributions, i.e., the joint distributions of the random vectors

(X_{t_1}, X_{t_2}, . . . , X_{t_n}),

for all n ∈ N and all choices t_1, . . . , t_n ∈ T.

The notion of a stochastic process is very important both in mathematical theory and in its applications in science, engineering, economics, etc. It is used to model a large number of phenomena where the quantity of interest varies discretely or continuously through time in an unpredictable fashion.
Every stochastic process can be viewed as a function of two variables, t and ω. For each fixed t, the map ω ↦ X_t(ω) is a random variable, as postulated in the definition. However, if we change our point of view and keep ω fixed, we see that the stochastic process is a function mapping ω to the real-valued function t ↦ X_t(ω). These functions are called the trajectories of the stochastic process X. The following two figures show two possible trajectories of a simple random walk¹, i.e., each one corresponds to a (different) frozen ω ∈ Ω, but t varies from 0 to 30.
[Two figures: sample trajectories of a simple random walk.]
¹We will define the simple random walk later. For now, let us just say that it behaves as follows. It starts at x = 0 for t = 0. After that a (possibly biased) coin is tossed and we move up (to x = 1) if heads is observed and down (to x = −1) if we see tails. The procedure is repeated at t = 1, 2, . . . and the position at t + 1 is determined in the same way, independently of all the coin tosses before (note that the position at t = k can be any of the following: x = −k, x = −k + 2, . . . , x = k − 2, x = k).
Unlike the figures above, the next two pictures show two time-slices of the same random process; in each graph, the time t is fixed (t = 15 vs. t = 25), but the various values the random variables X_15 and X_25 can take are presented through their probability mass functions.
[Figure 1: Probability mass function for X_15]

[Figure 2: Probability mass function for X_25]
1.2 The canonical probability space
When one deals with infinite-index (|T| = +∞) stochastic processes, the construction of the probability space (Ω, F, P) to support a given model is usually quite a technical matter. This course does not suffer from that problem because all our models can be implemented on a special probability space. We start with the sample space Ω:

Ω = [0, 1] × [0, 1] × · · · = [0, 1]^{N_0},
and any generic element of Ω will be a sequence ω = (ω_0, ω_1, ω_2, . . . ) of real numbers in [0, 1]. For n ∈ N_0 we define the mapping U_n : Ω → [0, 1] which simply chooses the n-th coordinate:

U_n(ω) = ω_n.
The proof of the following theorem can be found in most advanced probability books (e.g. [1], Thm. 20.4):

Theorem 1.2. There exists a probability measure P on Ω such that

1. each U_n, n ∈ N_0, is a random variable with the uniform distribution on [0, 1], and

2. the sequence (U_n)_{n∈N_0} is independent.
Remark 1.3. One should think of the sample space Ω as a source of all the randomness in the system: the elementary event ω ∈ Ω is chosen by a process beyond our control, and the exact value of ω is assumed to be unknown. All the other parts of the system are possibly complicated, but deterministic, functions of ω (random variables). When a coin is tossed, only a single drop of randomness is needed: the outcome of a coin-toss. When several coins are tossed, more randomness is involved and the sample space must be bigger. When a system involves an infinite number of random variables (like a stochastic process with infinite T), a large sample space Ω is needed.
Once we can construct a sequence of independent random variables which are uniformly distributed on the unit interval, we can then construct any number of models. For example:
1.3 Constructing the random walk
Let us show how to construct the simple random walk on the canonical probability space (Ω, F, P) from Theorem 1.2. First of all, we need a definition of the simple random walk:
Definition 1.4. A sequence (X_n)_{n∈N_0} of random variables is called a simple random walk (with parameter p ∈ (0, 1)) if

a) X_0 = 0,

b) X_{n+1} − X_n is independent of (X_0, X_1, . . . , X_n) for all n ∈ N, and

c) the random variable X_{n+1} − X_n has the following distribution:

P[X_{n+1} − X_n = 1] = p and P[X_{n+1} − X_n = −1] = q,

where, as usual, q = 1 − p.

If p = 1/2, the random walk is called symmetric.
The adjective simple comes from the fact that the size of each step is fixed (equal to 1) and it is only the direction that is random. One can study more general random walks where each step comes from an arbitrary prescribed probability distribution. For the sequence (U_n)_{n∈N}, given by Theorem 1.2, define the following, new, sequence (ξ_n)_{n∈N} of random variables:

ξ_n = 1 if U_n ≤ p, and ξ_n = −1 otherwise.

We then set

X_0 = 0, X_n = Σ_{k=1}^{n} ξ_k, n ∈ N.

Intuitively, we use each ξ_n to emulate a biased coin toss and then define the value of the process X at time n as the cumulative sum of the first n coin-tosses.
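This construction translates directly into a short simulation. The sketch below (plain Python; the function name is ours, and `random.random` stands in for the i.i.d. uniform draws U_n) builds a path exactly by the recipe above:

```python
import random

def simple_random_walk(n_steps, p=0.5, uniform=random.random):
    """Simulate X_0, X_1, ..., X_{n_steps} of a simple random walk with
    parameter p, using the construction xi_n = 1 if U_n <= p, else -1."""
    x = 0
    path = [x]
    for _ in range(n_steps):
        xi = 1 if uniform() <= p else -1   # a biased coin toss built from U_n
        x += xi                            # X_n = xi_1 + ... + xi_n
        path.append(x)
    return path

path = simple_random_walk(30, p=0.5)
```

Whatever the coin tosses turn out to be, the path starts at 0, moves by ±1 at each step, and X_k has the same parity as k, matching the footnote above.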
Proposition 1.5. The sequence (X_n)_{n∈N_0} defined above is a simple random walk.

Proof. Property a) is trivially true. To check property b), we first note that (ξ_n)_{n∈N} is an independent sequence (as it has been constructed by an application of a deterministic function to each element of the independent sequence (U_n)_{n∈N}). Therefore, the increment X_{n+1} − X_n = ξ_{n+1} is independent of all the previous coin-tosses ξ_1, . . . , ξ_n. What we need to prove, though, is that it is independent of all the previous values of the process X. These previous values are nothing but linear combinations of the coin-tosses ξ_1, . . . , ξ_n, so they must also be independent of ξ_{n+1}. Finally, to get c), we compute

P[X_{n+1} − X_n = 1] = P[ξ_{n+1} = 1] = P[U_{n+1} ≤ p] = p.

A similar computation shows that P[X_{n+1} − X_n = −1] = q.
We have now defined and constructed a random walk (X_n)_{n∈N_0}. Our next task is to study some of its mathematical properties.
Proposition 1.6. Let (X_n)_{n∈N_0} be a simple random walk with parameter p. The distribution of the random variable X_n is discrete with support {−n, −n + 2, . . . , n − 2, n}, and probabilities

P[X_n = l] = \binom{n}{(l+n)/2} p^{(n+l)/2} q^{(n−l)/2}, l = −n, −n + 2, . . . , n − 2, n. (1.1)
Proof. X_n is composed of the n independent steps ξ_k = X_k − X_{k−1}, k = 1, . . . , n, each of which goes either up or down. In order to reach level l in those n steps, the number u of up-steps and the number d of down-steps must satisfy u − d = l (and u + d = n). Therefore, u = (n+l)/2 and d = (n−l)/2. The number of ways we can choose these u up-steps from the total of n is \binom{n}{(n+l)/2}, which, together with the fact that the probability of any trajectory with exactly u up-steps is p^u q^{n−u}, gives the probability (1.1) above. Equivalently, we could have noticed that the random variable (n + X_n)/2 has the binomial b(n, p) distribution.
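Formula (1.1) is easy to check by brute force for small n: enumerate all 2^n step sequences and add up the probabilities of those ending at level l. A sketch (the helper names are ours):

```python
from itertools import product
from math import comb

def pmf_Xn(n, l, p):
    """P[X_n = l] from (1.1); zero unless l and n have the same parity."""
    if (n + l) % 2 != 0 or abs(l) > n:
        return 0.0
    q = 1 - p
    return comb(n, (n + l) // 2) * p ** ((n + l) // 2) * q ** ((n - l) // 2)

def pmf_Xn_bruteforce(n, l, p):
    """Sum the probabilities of all 2^n step sequences that end at l."""
    q = 1 - p
    total = 0.0
    for steps in product((1, -1), repeat=n):
        if sum(steps) == l:
            u = steps.count(1)           # number of up-steps
            total += p ** u * q ** (n - u)
    return total
```

For n = 6 and any p, the two computations agree, and the probabilities over the support sum to 1.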
1.4 The reﬂection principle
Now we know how to compute the probabilities related to the position of the random walk (X_n)_{n∈N_0} at a fixed future time n. A mathematically more interesting question can be posed about the maximum of the random walk on {0, 1, . . . , n}. A nice expression for this probability is available in the case of symmetric simple random walks.
To compute this quantity, it is more helpful to view the random walk as a random trajectory in some space of paths, and to compute the required probability by simply counting the number of trajectories in the subset (event) you are interested in, adding them all together, weighted by their probabilities. To prepare the ground for future results, let C be the set of all possible trajectories:

C = {(x_0, x_1, . . . , x_n) : x_0 = 0, x_{k+1} − x_k = ±1, k ≤ n − 1}.

You can think of the first n steps of a random walk simply as a probability distribution on the state-space C.

[Figure: the superposition of all trajectories in C for n = 4, with the path (0, 1, 0, 1, 2) marked in red.]
Proposition 1.7. Let (X_n)_{n∈N_0} be a symmetric simple random walk, suppose n ≥ 2, and let M_n = max(X_0, . . . , X_n) be the maximal value of (X_n)_{n∈N_0} on the interval 0, 1, . . . , n. The support of M_n is {0, 1, . . . , n} and its probability mass function is given by

P[M_n = l] = \binom{n}{⌊(n+l+1)/2⌋} 2^{−n}, l = 0, . . . , n.
Proof. Let us first pick a level l ∈ {0, 1, . . . , n} and compute the auxiliary probability P[M_n ≥ l] by counting the number of trajectories whose maximal level reached is at least l. Indeed, the symmetry assumption ensures that all trajectories are equally likely. More precisely, let A_l ⊂ C be given by

A_l = {(x_0, x_1, . . . , x_n) ∈ C : max_{k=0,...,n} x_k ≥ l}
    = {(x_0, x_1, . . . , x_n) ∈ C : x_k ≥ l for at least one k ∈ {0, . . . , n}}.
Then P[M_n ≥ l] = 2^{−n} |A_l|, where |A| denotes the number of elements in the set A. When l = 0, we clearly have P[M_n ≥ 0] = 1, since X_0 = 0. To count the number of elements in A_l, we use the following clever observation (known as the reflection principle):

Claim: For l ∈ N, we have

|A_l| = 2 |{(x_0, x_1, . . . , x_n) : x_n > l}| + |{(x_0, x_1, . . . , x_n) : x_n = l}|. (1.2)
We start by defining a bijective transformation which maps trajectories into trajectories. For a trajectory (x_0, x_1, . . . , x_n) ∈ A_l, let k(l) = k(l, (x_0, x_1, . . . , x_n)) be the smallest value of the index k such that x_k ≥ l. In stochastic-process-theory parlance, k(l) is the first hitting time of the set {l, l + 1, . . . }. We know that k(l) is well-defined (since we are only applying it to trajectories in A_l) and that it takes values in the set {1, . . . , n}. With k(l) at our disposal, let (y_0, y_1, . . . , y_n) ∈ C be the trajectory obtained from (x_0, x_1, . . . , x_n) by the following procedure:
1. do nothing until you get to k(l):

• y_0 = x_0, y_1 = x_1, . . . , y_{k(l)} = x_{k(l)}.

2. use the flipped values for the coin-tosses from k(l) onwards:

• y_{k(l)+1} − y_{k(l)} = −(x_{k(l)+1} − x_{k(l)}),
• y_{k(l)+2} − y_{k(l)+1} = −(x_{k(l)+2} − x_{k(l)+1}),
• . . .
• y_n − y_{n−1} = −(x_n − x_{n−1}).
[Figure: two trajectories, a blue path and its reflection in red, with n = 15, l = 4 and k(l) = 8.]

Graphically, (y_0, . . . , y_n) looks like (x_0, . . . , x_n) until it hits the level l, and then follows its reflection around the level l, so that y_k − l = l − x_k for k ≥ k(l). If k(l) = n, then (x_0, x_1, . . . , x_n) = (y_0, y_1, . . . , y_n). It is clear that (y_0, y_1, . . . , y_n) is in C. Let us denote this transformation by

Φ : A_l → C, Φ(x_0, x_1, . . . , x_n) = (y_0, y_1, . . . , y_n),

and call it the reflection map. The first important property of the reflection map is that it is its own inverse: apply Φ to any (y_0, y_1, . . . , y_n) in A_l, and you will get the original (x_0, x_1, . . . , x_n). In other words Φ ◦ Φ = Id, i.e., Φ is an involution. It follows immediately that Φ is a bijection from A_l onto A_l.
To get to the second important property of Φ, let us split the set A_l into three parts according to the value of x_n:

1. A_l^> = {(x_0, x_1, . . . , x_n) ∈ A_l : x_n > l},
2. A_l^= = {(x_0, x_1, . . . , x_n) ∈ A_l : x_n = l}, and
3. A_l^< = {(x_0, x_1, . . . , x_n) ∈ A_l : x_n < l},

so that

Φ(A_l^>) = A_l^<, Φ(A_l^<) = A_l^>, and Φ(A_l^=) = A_l^=.
We should note that, in the definition of A_l^> and A_l^=, the a priori stipulation that (x_0, x_1, . . . , x_n) ∈ A_l is unnecessary. Indeed, if x_n ≥ l, you must already be in A_l. Therefore, by the bijectivity of Φ, we have

|A_l^<| = |A_l^>| = |{(x_0, x_1, . . . , x_n) : x_n > l}|,

and so

|A_l| = 2 |{(x_0, x_1, . . . , x_n) : x_n > l}| + |{(x_0, x_1, . . . , x_n) : x_n = l}|.

This shows the claim.
Now that we have (1.2), we can easily rewrite it as follows:

P[M_n ≥ l] = P[X_n = l] + 2 Σ_{j>l} P[X_n = j] = Σ_{j>l} P[X_n = j] + Σ_{j≥l} P[X_n = j].
Finally, we subtract P[M_n ≥ l + 1] from P[M_n ≥ l] to get the expression for P[M_n = l]:

P[M_n = l] = P[X_n = l + 1] + P[X_n = l].

It remains to note that only one of the probabilities P[X_n = l + 1] and P[X_n = l] is nonzero: the first one if n and l have different parity, and the second one otherwise. In either case the nonzero probability is given by \binom{n}{⌊(n+l+1)/2⌋} 2^{−n}.
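Since the proof proceeds by counting paths, the formula for P[M_n = l] can itself be verified by direct enumeration for small n. A sketch (helper names are ours; `(n + l + 1) // 2` is Python's floor of (n+l+1)/2):

```python
from itertools import product
from math import comb

def pmf_max(n, l):
    """P[M_n = l] for the symmetric simple random walk (Proposition 1.7)."""
    return comb(n, (n + l + 1) // 2) / 2 ** n

def pmf_max_bruteforce(n, l):
    """Fraction of the 2^n equally likely paths whose running maximum is l."""
    count = 0
    for steps in product((1, -1), repeat=n):
        x, running_max = 0, 0
        for s in steps:
            x += s
            running_max = max(running_max, x)
        if running_max == l:
            count += 1
    return count / 2 ** n
```

For n = 8 the two computations agree for every l in {0, . . . , 8}.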
Let us use the reflection principle to solve a classical problem in combinatorics.

Example 1.8 (The Ballot Problem). Suppose that two candidates, Daisy and Oscar, are running for office, and n ∈ N voters cast their ballots. Votes are counted by the same official, one by one, until all n of them have been processed (like in the old days). After each ballot is opened, the official records the number of votes each candidate has received so far. At the end, the official announces that Daisy has won by a margin of m > 0 votes, i.e., that Daisy got (n + m)/2 votes and Oscar the remaining (n − m)/2 votes. What is the probability that Daisy never trails Oscar during the counting of the votes?

We assume that the order in which the official counts the votes is completely independent of the actual votes, and that each voter chooses Daisy with probability p ∈ (0, 1) and Oscar with probability q = 1 − p. For k ≤ n, let X_k be the number of votes received by Daisy minus the number of votes received by Oscar in the first k ballots. When the (k + 1)-st vote is counted, X_k either increases by 1 (if the vote was for Daisy) or decreases by 1 otherwise. The votes are independent of each other and X_0 = 0, so X_k, 0 ≤ k ≤ n, is (the beginning of) a simple random walk. The probability of an up-step is p ∈ (0, 1), so this random walk is not necessarily symmetric. The ballot problem can now be restated as follows:

What is the probability that X_k ≥ 0 for all k ∈ {0, . . . , n}, given that X_n = m?

The first step towards understanding the solution is the realization that the exact value of p does not matter. Indeed, we are interested in the conditional probability P[F|G] = P[F ∩ G]/P[G], where

F = "all trajectories that stay nonnegative" = {X_i ≥ 0 for all 0 ≤ i ≤ n},
G = "all trajectories that reach m at time n" = {X_n = m}.
Each trajectory in G has (n + m)/2 up-steps and (n − m)/2 down-steps, so its probability weight is always equal to p^{(n+m)/2} q^{(n−m)/2}. Therefore,

P[F|G] = P[F ∩ G]/P[G] = (|F ∩ G| p^{(n+m)/2} q^{(n−m)/2}) / (|G| p^{(n+m)/2} q^{(n−m)/2}) = |F ∩ G| / |G|. (1.3)
We already know how to count the number of paths in G: it is equal to \binom{n}{(n+m)/2}. So "all" that remains to be done is to count the number of paths in G ∩ F. If we set

H = "all paths which finish at m and visit the level l = −1" = {X_n = m and min_{0≤i≤n} X_i ≤ −1},

then G = (G ∩ F) ∪ H. In other words, the collection of paths that go from 0 to m can be split into

1. G ∩ F: the paths that go from 0 to m and stay nonnegative, and
2. H: the paths that go from 0 to m and become negative at some point.

So |G ∩ F| = |G| − |H|.
Can we use the reflection principle to find |H|? Yes, we can. In fact, you can convince yourself that the reflection of any path in H around the level l = −1 after its first hitting time of that level produces a path that starts at 0 and ends at −m − 2. Conversely, the same procedure applied to such a path yields a path in H. If a path travels from 0 to −m − 2, then it must have (n + m + 2)/2 down-steps and (n − m − 2)/2 up-steps. This means there are \binom{n}{1 + (n+m)/2} of these paths. Putting everything together, we get

P[F|G] = (\binom{n}{k} − \binom{n}{k+1}) / \binom{n}{k} = (2k + 1 − n)/(k + 1), where k = (n + m)/2.
The last equality follows from the definition of the binomial coefficients: \binom{n}{k} = n! / (k!(n − k)!).
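As a sanity check, the closed form (2k + 1 − n)/(k + 1) can be compared with a direct count over all n-step paths ending at m (a sketch; the function names are ours):

```python
from itertools import product

def ballot_probability(n, m):
    """(2k + 1 - n) / (k + 1) with k = (n + m)/2, from the reflection argument."""
    k = (n + m) // 2
    return (2 * k + 1 - n) / (k + 1)

def ballot_bruteforce(n, m):
    """Among all paths ending at m, the fraction that never go below 0."""
    good = total = 0
    for steps in product((1, -1), repeat=n):
        x, nonnegative = 0, True
        for s in steps:
            x += s
            nonnegative = nonnegative and x >= 0
        if x == m:
            total += 1
            good += nonnegative
    return good / total
```

For example, for n = 4 and m = 0 both give 1/3 (two of the six paths from 0 to 0 stay nonnegative), and the two computations agree for n = 10, m = 2 as well.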
How would you modify this argument to compute the probability that Daisy leads Oscar during the entire counting of the votes?
The Ballot problem has a long history (going back to at least 1887) and has spurred a lot of research in combinatorics and probability. In fact, people still write research papers on some of its generalizations. When posed outside the context of probability, it is often phrased as "in how many ways can the counting be performed . . . " (the difference being only in the normalizing factor \binom{n}{k} appearing in (1.3) above). The special case m = 0 seems to be even more popular: the number of 2n-step paths from 0 to 0 never going below zero is called the Catalan number and equals

C_n = \frac{1}{n+1} \binom{2n}{n}.

Can you derive this expression from (1.3)? If you want to test your understanding a bit further, here is an identity (called Segner's recurrence formula) satisfied by the Catalan numbers:

C_n = Σ_{i=1}^{n} C_{i−1} C_{n−i}, n ∈ N.

Can you prove it using the Ballot-problem interpretation?
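Both the closed form and Segner's recurrence are easy to compute, and agreement between the two is a quick consistency check (a sketch in exact integer arithmetic, since comb(2n, n) is always divisible by n + 1):

```python
from math import comb

def catalan_closed(n):
    """C_n = comb(2n, n) / (n + 1), the closed form above."""
    return comb(2 * n, n) // (n + 1)

def catalan_segner(n):
    """C_n via Segner's recurrence C_k = sum_{i=1}^{k} C_{i-1} C_{k-i}, C_0 = 1."""
    c = [1]
    for k in range(1, n + 1):
        c.append(sum(c[i - 1] * c[k - i] for i in range(1, k + 1)))
    return c[n]
```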
2 Generating Functions
A generating function is a clothesline on which we hang up a sequence of numbers for display.
Herbert S. Wilf, generatingfunctionology
The path-counting method used in the previous lecture only works for computations related to the first n steps of the random walk, where n is given in advance. We will see later that most of the interesting questions do not fall into this category. For example, the distribution of the time it takes for the random walk to hit the level l ≠ 0 is like that. There is no way to give an a priori bound on the number of steps it will take to get to l (in fact, the expectation of this random variable can be +∞). To deal with a wider class of properties of random walks (and other processes), we need to develop some new mathematical tools.
2.1 Generating functions
A generating function associates a sequence of numbers with a function (a power series). It turns out that in many cases of interest we can use this function to learn about the sequence.

Definition 2.1. If (a_n)_{n∈N_0} is a sequence of numbers, then we say that the radius of convergence of the power series Σ_{k=0}^{∞} a_k s^k is the largest number R ∈ [0, ∞] such that Σ_{k=0}^{∞} |a_k| |s|^k converges when |s| < R. When R > 0, we say that the function

G(s) = Σ_{k∈N_0} a_k s^k, −R < s < R,

is the generating function associated with the sequence (a_n)_{n∈N_0}.
The generating function A associated with a sequence (a_n)_{n∈N_0} is infinitely differentiable, and its derivatives can be expressed as power series themselves.

Proposition 2.2. When R > 0, the function A(s) is infinitely differentiable on (−R, R) and

d^n/ds^n A(s) = Σ_{n≤k<∞} k(k − 1) · · · (k − n + 1) a_k s^{k−n}, n ∈ N. (2.1)

In particular, a_n = (1/n!) d^n/ds^n A(s)|_{s=0}, so we can recover the sequence (a_n)_{n∈N_0} from the function A.
Proof. If we formally differentiate each term in the expression

A(s) = a_0 + a_1 s + a_2 s^2 + a_3 s^3 + . . . ,

then we see that the result is again a power series with the stated coefficients. For example:

d/ds A(s) = Σ_{1≤k<∞} k a_k s^{k−1}, d^2/ds^2 A(s) = Σ_{2≤k<∞} k(k − 1) a_k s^{k−2}.

Checking this result rigorously is beyond the scope of this course.
The name generating function comes from the last part of this result. The knowledge of A implies the knowledge of the whole sequence (a_n)_{n∈N_0}. It also turns out that we can use generating functions to study convolution. We will see shortly that convolution arises naturally when we compute the probability mass function of the sum of two independent, N_0-valued, random variables.

Definition 2.3. Let (a_n)_{n∈N_0} and (b_n)_{n∈N_0} be sequences and define

c_n = Σ_{j=0}^{n} a_j b_{n−j} = Σ_{k=0}^{n} a_{n−k} b_k, n ∈ N_0.

Then we say that the sequence (c_n)_{n∈N_0} is the convolution of the sequences (a_n)_{n∈N_0} and (b_n)_{n∈N_0}, and we write c = a ∗ b.
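For finitely supported sequences the convolution is a short double loop; convolving two coefficient lists corresponds to multiplying the associated polynomials. A sketch (the helper names are ours):

```python
def convolve(a, b):
    """c_n = sum_{k=0}^{n} a_{n-k} b_k for finite coefficient lists a and b."""
    c = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

def evaluate(coeffs, s):
    """Evaluate the generating function sum_k coeffs[k] * s**k at s."""
    return sum(ck * s ** k for k, ck in enumerate(coeffs))
```

For instance, convolving (1, 1) with (1, 2, 1) gives (1, 3, 3, 1), the coefficients of (1 + s)(1 + 2s + s^2) = (1 + s)^3.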
It turns out that convolving two sequences is equivalent to multiplying their generating functions.

Proposition 2.4. Let (a_n)_{n∈N_0} and (b_n)_{n∈N_0} be sequences, let G_a and G_b denote the generating functions associated with these sequences, and assume that both power series have radius of convergence at least as large as R > 0. If we set c = a ∗ b, then the power series

G_c(s) = Σ_{k=0}^{∞} c_k s^k

also has radius of convergence at least as large as R, and G_c(s) = G_a(s) G_b(s) for |s| < R.
Proof. If we formally expand and then collect like powers of s in the expression

G_a(s) G_b(s) = (a_0 + a_1 s + a_2 s^2 + . . . )(b_0 + b_1 s + b_2 s^2 + . . . ),

then we see that the resulting coefficient of s^n is given by c_n. Checking the remaining claims rigorously is again beyond the scope of this course.
2.2 Associating a generating function with a random variable
In this section we will look at random variables which take values in the set

T = N_0 ∪ {+∞} = {0, 1, 2, 3, . . . } ∪ {+∞}.

We will often be interested in random variables which record the amount of (discrete) time that we have to wait for an event to occur. In some cases, the event may never occur, so we allow these random variables to take the value +∞ to indicate that we have to wait "forever."

The distribution of a T-valued random variable X is completely determined by the sequence (a_n)_{n∈N_0} of numbers in [0, 1] given by

a_n = P[X = n], n ∈ N_0. (2.2)

Notice that the value P(X = ∞) does not occur in the sequence (a_n)_{n∈N_0}, but we can still figure it out from the values in the sequence:

P(X = ∞) = 1 − P(X < ∞) = 1 − Σ_{n∈N_0} a_n.
In the future, when we say "let (a_n)_{n∈N_0} be the sequence associated with X," we mean that (a_n)_{n∈N_0} is given by (2.2). We then define the generating function associated with the sequence (a_n)_{n∈N_0} by

G_X(s) = Σ_{0≤k<∞} a_k s^k. (2.3)

It follows from the fact that |a_n| ≤ 1 that the radius of convergence of this power series is at least 1, so G_X is well defined for s ∈ (−1, 1). The function G_X that we have obtained is known as the generating function or probability generating function associated with X.

Before we proceed, let us find expressions for the generating functions of some of the popular N_0-valued random variables.
Example 2.5.

(1) Bernoulli(p): Here a_0 = q, a_1 = p, and a_n = 0 for n ≥ 2. Therefore,

G_X(s) = ps + q.

(2) Binomial(n, p): Since a_k = \binom{n}{k} p^k q^{n−k}, k = 0, . . . , n, we have

G_X(s) = Σ_{k=0}^{n} \binom{n}{k} p^k q^{n−k} s^k = (ps + q)^n,

by the binomial theorem.

(3) Geometric(p): For k ∈ N_0, a_k = q^k p, so that

G_X(s) = Σ_{k=0}^{∞} q^k s^k p = p Σ_{k=0}^{∞} (qs)^k = p / (1 − qs).

(4) Poisson(λ): Given that a_k = e^{−λ} λ^k / k!, k ∈ N_0, we have

G_X(s) = Σ_{k=0}^{∞} e^{−λ} (λ^k / k!) s^k = e^{−λ} Σ_{k=0}^{∞} (sλ)^k / k! = e^{−λ} e^{sλ} = e^{λ(s−1)}.
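These closed forms can be checked numerically by comparing them against truncated partial sums of the defining series (a sketch with arbitrary illustrative parameter values; truncation errors are negligible at the chosen cutoffs):

```python
from math import comb, exp, factorial

def partial_sum(pmf, s, terms):
    """Truncate the series G_X(s) = sum_k a_k s^k after `terms` terms."""
    return sum(pmf(k) * s ** k for k in range(terms))

p, lam, s, n = 0.3, 2.0, 0.5, 5
q = 1 - p

binomial_g = partial_sum(lambda k: comb(n, k) * p**k * q**(n - k), s, n + 1)
geometric_g = partial_sum(lambda k: q**k * p, s, 200)
poisson_g = partial_sum(lambda k: exp(-lam) * lam**k / factorial(k), s, 60)
```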
Remark 2.6. Note that the true radius of convergence varies from distribution to distribution, but it is never smaller than 1. In Example 2.5 it is infinite in (1), (2) and (4), and equal to 1/q > 1 in (3). For the distribution with pmf given by a_k = C/(k + 1)^2, where C = (Σ_{k=0}^{∞} 1/(k + 1)^2)^{−1}, the radius of convergence is exactly equal to 1. Can you see why?
The following proposition gives another way to compute the generating function associated with a random variable.

Proposition 2.7. Let X be a T-valued random variable with generating function G_X. Then

1. G_X(s) = E[s^X] = E[s^X 1_{{X<∞}}], s ∈ (−1, 1), and

2. P(X < ∞) = lim_{s↑1} G_X(s).
Proof. Statement (1) follows directly from the formula

E[g(X)] = Σ_{n∈T} g(n) P(X = n),

applied to g(x) = s^x, where we have used the fact/convention that s^∞ = 0 if |s| < 1.

The second claim follows from the fact that

lim_{s↑1} G_X(s) = lim_{s↑1} Σ_{n∈N_0} s^n P(X = n) = Σ_{n∈N_0} lim_{s↑1} s^n P(X = n) = Σ_{n∈N_0} P(X = n) = P(X < ∞).

Of course, one should really justify the exchange of the summation and the limit in the previous equation. In this case, one could employ the monotone convergence theorem from real analysis, or simply check it by hand, but this is beyond the scope of this course.
Remark 2.8. We used the formula a_n = P[X = n] to associate a sequence with the random variable X. One could also use the formula b_n = E[X^n / n!] to associate a sequence with X. If one then computes the generating function B corresponding to the sequence (b_n), the resulting function is given by B(s) = E[e^{sX}] and is known as the moment generating function associated with X. The moment generating function is quite similar to the probability generating function. In particular, one can check that they are related by the formula G_X(s) = B(log(s)) for s ∈ (0, 1). The probability generating function will turn out to be more convenient for this class.
2.3 Convolution
The true power of generating functions comes from the fact that they behave very well under the usual operations in probability.

Proposition 2.9. Let X, Y be independent T-valued random variables and set Z = X + Y. If (a_n)_{n∈N_0} and G_X are the sequence and generating function associated with X, (b_n) and G_Y are the sequence and generating function associated with Y, and (c_n) and G_Z are the sequence and generating function associated with Z, then c = a ∗ b and G_Z(s) = G_X(s) G_Y(s).

Proof. For each n ∈ N, we have

c_n = P(Z = n) = Σ_{i=0}^{n} P(X = i and Y = n − i) = Σ_{i=0}^{n} P(X = i) P(Y = n − i) = Σ_{i=0}^{n} a_i b_{n−i}.

Similarly, if |s| < 1, then

G_Z(s) = E[s^Z] = E[s^{X+Y}] = E[s^X s^Y] = E[s^X] E[s^Y] = G_X(s) G_Y(s).
Example 2.10.

1. The binomial(n, p) distribution is the sum of n independent Bernoulli random variables with parameter p. Therefore, if we apply Prop. 2.9 n times to the generating function (q + ps) of the Bernoulli b(p) distribution, we immediately get that the generating function of the binomial is (q + ps) · · · (q + ps) = (q + ps)^n.

2. More generally, we can show that the sum of m independent random variables with the b(n, p) distribution has a binomial(mn, p) distribution. If you try to sum binomials with different values of the parameter p, you will not get a binomial.

3. What is even more interesting, the following statement can be shown: suppose that the sum Z of two independent N_0-valued random variables X and Y is binomially distributed with parameters n and p. Then both X and Y are binomial with parameters n_X, p and n_Y, p, where n_X + n_Y = n. In other words, the only way to get a binomial as a sum of independent random variables is the trivial one.
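Point 2 can be illustrated numerically: convolving the b(3, p) pmf with itself twice should reproduce the b(9, p) pmf exactly (a sketch; the helper names are ours):

```python
from math import comb

def binomial_pmf(n, p):
    """The sequence a_k = C(n, k) p^k q^(n-k), k = 0, ..., n."""
    q = 1 - p
    return [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]

def convolve(a, b):
    """Convolution of two finite coefficient lists (Definition 2.3)."""
    c = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

p = 0.4
pmf_sum = binomial_pmf(3, p)
for _ in range(2):                     # add two more independent b(3, p)'s
    pmf_sum = convolve(pmf_sum, binomial_pmf(3, p))
```

After the loop, `pmf_sum` matches the binomial(9, p) pmf term by term, as Prop. 2.9 predicts.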
We will actually need something slightly more complicated than Proposition 2.9 when we get back to random walks. To understand what we want, let's consider the following example.
Example 2.11. You own a pizza shop with a single delivery driver. You send your driver out with an order, but then realize that you gave him the wrong pizza. Unfortunately, it's 1983, so you can't call your driver on a cell phone, and there is nothing that you can do but wait for your driver to return.

Let T_1 ∈ T denote the time that your driver gets back from the first trip, and let T_2 ≥ T_1 denote the time that your driver gets back from the second trip. Of course, these should probably be continuous random variables, but let's assume that the world is discrete. Moreover, if you don't like the idea that T_1 might take the value zero, you can just assign probability zero to this possibility.

Your driver is actually somewhat unreliable: on each trip there is some chance that he will decide he is sick of this job and never return. In other words, P(T_1 = ∞) > 0 and P(T_2 = ∞) > 0.

In the event that your driver does return to the shop after the first trip, the time that it takes for him to make the second round trip is independent of the time that the first trip took and has the same distribution. More formally, we suppose that

P(T_2 = m + n | T_1 = m) = P(T_1 = n), m, n ∈ N_0. (2.4)

Conditional on the event {T_1 = m}, T_2 can only take the values {m, m + 1, m + 2, . . . } ∪ {∞}, so it follows from (2.4) that

P(T_2 = ∞ | T_1 = m) = 1 − Σ_{n∈N_0} P(T_2 = m + n | T_1 = m) = 1 − Σ_{n∈N_0} P(T_1 = n) = P(T_1 = ∞).
Let G_{T_1} denote the generating function associated with the random variable T_1, and let G_{T_2} denote the generating function associated with T_2. We would like to check that G_{T_2}(s) = G_{T_1}(s)^2 for |s| < 1. Morally, we want to apply Prop. 2.9 to T_1 and T_2 − T_1, but there is one sticking point: if T_1 = ∞, then T_2 = ∞ and T_2 − T_1 is undefined. In other words, if the driver quits before he returns from the first round trip, then it doesn't make any sense to ask how long the second round trip took. We could just define ∞ − ∞ = ∞, but this still isn't going to make T_1 and T_2 − T_1 independent. Fortunately, this minor annoyance doesn't end up mattering. First notice that

E(s^{T_2} | T_1 = m) = Σ_{n∈T} s^{m+n} P(T_1 = n) = s^m G_{T_1}(s), m ∈ N_0, |s| < 1.
We also know that T_2 = ∞ when T_1 = ∞, so

E(s^{T_2} | T_1 = ∞) = s^∞ = 0, |s| < 1.

As a result, we may apply the tower law for conditional expectation to conclude that

G_{T_2}(s) = E[s^{T_2}] = Σ_{m∈T} E[s^{T_2} | T_1 = m] P(T_1 = m) = Σ_{m∈N_0} s^m G_{T_1}(s) P(T_1 = m) = G_{T_1}(s)^2,

when |s| < 1.

In fact, we know something slightly stronger. If two generating functions agree around zero, then it follows from Proposition 2.2 that they are generated by the same sequence, so they must agree on their entire common radius of convergence.
We have now shown the following proposition which we will need in the next section.
Proposition 2.12. Let T
2
≥ T
1
be random times taking values in T = N
0
∪¦+∞¦ with generating
functions G
T
1
and G
T
2
. If
P(T
2
= m+n [ T
1
= m) = P(T
1
= n), m, n ∈ N
0
,
then G
T
2
(s) = G
2
T
1
(s).
2.4 Moments
Another useful thing about generating functions is that they can make the computation of moments
easier. Recall that E[X^n] is said to be the nth moment of X. Also notice that if P(X = ∞) > 0, then X^n is never integrable, so we will now restrict attention to random variables that only take values in N_0.

Proposition 2.13. Let X be an N_0-valued random variable with generating function G_X. For n ∈ N the following two statements are equivalent:

1. E[X^n] < ∞,

2. (d^n/ds^n) G_X(s) |_{s=1} exists (in the sense that the left limit lim_{s↗1} (d^n/ds^n) G_X(s) exists).

In either case, we have

    E[X(X − 1)(X − 2) ⋯ (X − n + 1)] = (d^n/ds^n) G_X(s) |_{s=1}.

Proof. Formally, one can check this by setting s = 1 in (2.1); checking the resulting summation amounts to calculating the desired expectation.

The quantities

    E[X], E[X(X − 1)], E[X(X − 1)(X − 2)], ...

are called the factorial moments of the random variable X. You can get the classical moments from the factorial moments by solving a system of linear equations. It is very simple for the first few:

    E[X] = E[X],
    E[X^2] = E[X(X − 1)] + E[X],
    E[X^3] = E[X(X − 1)(X − 2)] + 3E[X(X − 1)] + E[X], ...

A useful identity which follows directly from the above results is

    Var[X] = G_X''(1) + G_X'(1) − (G_X'(1))^2,

and it is valid whenever the first two derivatives of G_X at 1 exist.

Example 2.14. Let X be a Poisson random variable with parameter λ. Its generating function is given by

    G_X(s) = e^{λ(s−1)}.

Therefore (d^n/ds^n) G_X(1) = λ^n, and so the sequence (E[X], E[X(X − 1)], E[X(X − 1)(X − 2)], ...) of factorial moments of X is just (λ, λ^2, λ^3, ...). It follows that

    E[X] = λ,
    E[X^2] = λ^2 + λ,   Var[X] = λ,
    E[X^3] = λ^3 + 3λ^2 + λ, ...
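These factorial-moment formulas are easy to sanity-check numerically. The sketch below is an illustration, not part of the formal development; the truncation point 60 and the value λ = 2 are arbitrary choices:

```python
import math

lam = 2.0
# Poisson(lam) pmf p_k = e^{-lam} lam^k / k!, built iteratively and truncated
pmf = [math.exp(-lam)]
for k in range(1, 60):
    pmf.append(pmf[-1] * lam / k)

def falling(k, n):
    """The falling factorial k (k-1) ... (k-n+1)."""
    out = 1
    for j in range(n):
        out *= (k - j)
    return out

for n in range(1, 5):
    fact_moment = sum(falling(k, n) * p for k, p in enumerate(pmf))
    assert abs(fact_moment - lam**n) < 1e-9   # E[X(X-1)...(X-n+1)] = lam^n

# recover ordinary moments: Var[X] = E[X(X-1)] + E[X] - E[X]^2 = lam
mean = sum(k * p for k, p in enumerate(pmf))
second = sum(k * k * p for k, p in enumerate(pmf))
assert abs((second - mean * mean) - lam) < 1e-9
```

The same pattern works for any N_0-valued distribution whose pmf can be tabulated.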
Example 2.15. We have an urn which contains three numbered balls. We then play a repeated game where on each turn we draw a ball, with the following outcomes:

a) If we draw the first ball, we win a dollar, replace the ball in the urn, and then play another round.

b) If we draw the second ball, we win two dollars, replace the ball in the urn, and then play another round.

c) If we draw the third ball, the game is over.

Let X denote the amount of money that we win in this game. The number of rounds that we play has a geometric distribution, so the game ends with probability one and P(X = ∞) = 0. We would like to determine the generating function G_X associated with X. To do this, we let Y denote the amount of winnings obtained after (but not including the winnings from) the first round, and we let Z denote the first ball drawn. Then the conditional distribution of Y given Z = 1 or Z = 2 is the same as the unconditional distribution of X. That is,

    P(Y = n | Z = 1) = P(Y = n | Z = 2) = P(X = n),   for all n ∈ N_0.

As a result,

    G_Y(s) = E(s^Y | Z = 1) = E(s^Y | Z = 2) = E[s^X] = G_X(s),

and

    G_X(s) = E[s^X] = E[s^X | Z = 1] P(Z = 1) + E[s^X | Z = 2] P(Z = 2) + E[s^X | Z = 3] P(Z = 3)
           = E[s^{1+Y} | Z = 1]/3 + E[s^{2+Y} | Z = 2]/3 + E[s^0 | Z = 3]/3
           = s G_X(s)/3 + s^2 G_X(s)/3 + 1/3.

Solving for G_X shows that G_X(s) = 1/(3 − s − s^2). In particular,

    G_X'(s) = (1 + 2s)/(3 − s − s^2)^2,   G_X''(s) = (8 + 6s + 6s^2)/(3 − s − s^2)^3.

As a result, we can determine a number of properties of X:

    P(X = 0) = G_X(0) = 1/3,   P(X = 1) = G_X'(0) = 1/9,   P(X = 2) = G_X''(0)/2 = 4/27,
    E[X] = G_X'(1) = 3,   E[X^2] = G_X''(1) + G_X'(1) = 23,   Var(X) = 14.
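The identity (3 − s − s^2) G_X(s) = 1 also gives a recursion for the pmf: 3a_0 = 1 and 3a_n = a_{n−1} + a_{n−2} for n ≥ 1 (with a_{−1} = 0). A short verification sketch in exact rational arithmetic recovers the probabilities and the mean computed above:

```python
from fractions import Fraction

# coefficients a_n of G_X(s) = 1/(3 - s - s^2):
# (3 - s - s^2) G_X(s) = 1  =>  3 a_0 = 1, 3 a_n = a_{n-1} + a_{n-2} for n >= 1
a = [Fraction(1, 3)]
for n in range(1, 200):
    prev2 = a[n - 2] if n >= 2 else Fraction(0)
    a.append((a[n - 1] + prev2) / 3)

assert a[0] == Fraction(1, 3)    # P(X = 0)
assert a[1] == Fraction(1, 9)    # P(X = 1)
assert a[2] == Fraction(4, 27)   # P(X = 2)

# E[X] = G_X'(1) = 3; the coefficients decay geometrically, so 200 terms suffice
mean = float(sum(n * an for n, an in enumerate(a)))
assert abs(mean - 3.0) < 1e-9
```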
2.5 Random sums
Our next application of generating functions to the theory of stochastic processes deals with so-called random sums. Let (ξ_n)_{n∈N} be a sequence of random variables, and let N be a random time (a random time is simply a T = N_0 ∪ {+∞}-valued random variable). We can define the random variable

    Y = Σ_{k=0}^{N} ξ_k   by   Y(ω) = { 0,                       N(ω) = 0,
                                        Σ_{k=1}^{N(ω)} ξ_k(ω),   N(ω) ≥ 1,

for ω ∈ Ω. More generally, for an arbitrary stochastic process (X_n)_{n∈N_0} and a random time N (with P[N = +∞] = 0), we define the random variable X_N by X_N(ω) = X_{N(ω)}(ω), for ω ∈ Ω. When N is a constant (N = n), then X_N is simply equal to X_n. In general, think of X_N as the value of the stochastic process X taken at a time which is itself random. If X_n = Σ_{k=1}^{n} ξ_k, then X_N = Σ_{k=1}^{N} ξ_k.
Example 2.16. Let (ξ_n)_{n∈N} be the increments of a symmetric simple random walk (coin tosses), and let N have the following distribution

    n          0     1     2
    P(N = n)   1/3   1/3   1/3

and be independent of (ξ_n)_{n∈N} (it is very important to specify the dependence structure between N and (ξ_n)_{n∈N} in this setting!). Let us compute the distribution of Y = Σ_{k=0}^{N} ξ_k in this case. This is where we, typically, use the formula of total probability:

    P(Y = m) = P(Y = m | N = 0) P(N = 0) + P(Y = m | N = 1) P(N = 1) + P(Y = m | N = 2) P(N = 2)
             = P(Σ_{k=0}^{N} ξ_k = m | N = 0) P(N = 0) + P(Σ_{k=0}^{N} ξ_k = m | N = 1) P(N = 1)
               + P(Σ_{k=0}^{N} ξ_k = m | N = 2) P(N = 2)
             = (1/3) [ 1_{m=0} + P(ξ_1 = m) + P(ξ_1 + ξ_2 = m) ].

When m = 1 (for example), we get

    P[Y = 1] = (0 + 1/2 + 0)/3 = 1/6.

Perform the computation for some other values of m for yourself.
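The suggested computation for the other values of m can also be done by brute-force enumeration; the following sketch (illustrative only) rebuilds the full distribution of Y exactly:

```python
from fractions import Fraction
from itertools import product

# enumerate N and the coin-toss steps exactly
half = Fraction(1, 2)
dist = {}
for n in (0, 1, 2):                       # P(N = n) = 1/3 each
    for steps in product((-1, 1), repeat=n):
        m = sum(steps)
        dist[m] = dist.get(m, Fraction(0)) + Fraction(1, 3) * half**n

assert dist[1] == Fraction(1, 6)          # matches the computation above
assert dist[0] == Fraction(1, 2)          # N = 0, or N = 2 with one +1 and one -1
assert sum(dist.values()) == 1
```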
What happens when N and (ξ_n)_{n∈N} are dependent? This will usually be the case in practice, as the value of the time N when we stop adding increments will typically depend on the behavior of the sum itself.

Example 2.17. Let (ξ_n)_{n∈N} be as above; we can think of a situation where a gambler repeatedly plays the same game in which a fair coin is tossed and the gambler wins a dollar if the outcome is heads and loses a dollar otherwise. A "smart" gambler enters the game and decides on the following tactic: let's see how the first game goes. If I lose, I'll play another 2 games and hopefully cover my losses, and if I win, I'll quit then and there. The described strategy amounts to the choice of the random time N as follows:

    N(ω) = { 1,   ξ_1 = 1,
             3,   ξ_1 = −1.

Then

    Y(ω) = { 1,                 ξ_1 = 1,
             −1 + ξ_2 + ξ_3,    ξ_1 = −1.

Therefore,

    P[Y = 1] = P[Y = 1 | ξ_1 = 1] P[ξ_1 = 1] + P[Y = 1 | ξ_1 = −1] P[ξ_1 = −1]
             = 1 · P[ξ_1 = 1] + P[ξ_2 + ξ_3 = 2] P[ξ_1 = −1]
             = (1/2)(1 + 1/4) = 5/8.

Similarly, we get P[Y = −1] = 1/4 and P[Y = −3] = 1/8. The expectation E[Y] is equal to 1 · (5/8) + (−1) · (1/4) + (−3) · (1/8) = 0. This is not an accident. One of the first powerful results of the beautiful martingale theory states that no matter how smart a strategy you employ, you cannot beat a fair gamble.
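The smart gambler's distribution can be double-checked by enumerating the outcomes directly, again in exact arithmetic (a verification sketch, not part of the notes):

```python
from fractions import Fraction

half = Fraction(1, 2)
dist = {}
# first toss wins: stop immediately with Y = 1
dist[1] = half
# first toss loses: Y = -1 + xi_2 + xi_3, each continuation has probability 1/8
for x2 in (-1, 1):
    for x3 in (-1, 1):
        y = -1 + x2 + x3
        dist[y] = dist.get(y, Fraction(0)) + half**3

assert dist[1] == Fraction(5, 8)
assert dist[-1] == Fraction(1, 4)
assert dist[-3] == Fraction(1, 8)
assert sum(y * p for y, p in dist.items()) == 0   # the fair game stays fair
```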
We will return to the general (non-independent) case in the next lecture. Let us use generating functions to give a full description of the distribution of Y = Σ_{k=0}^{N} ξ_k when the time is independent of the summands.

Proposition 2.18. Let (ξ_n)_{n∈N} be a sequence of independent N_0-valued random variables, all of which share the same distribution and generating function G_ξ(s). Let N be a random time which is independent of (ξ_n)_{n∈N} with P(N < ∞) = 1 and generating function G_N. Then the generating function G_Y of the random sum Y = Σ_{k=0}^{N} ξ_k is given by

    G_Y(s) = G_N(G_ξ(s)).

Proof. First let X_0 = 0 and X_n = Σ_{i=1}^{n} ξ_i for n ∈ N denote the sequence of partial sums. Repeated applications of Proposition 2.9 show that G_{X_n} = G_ξ^n (where G_ξ^0(s) = 1). As a result, we may apply the tower law for conditional expectation to see that

    E[s^Y] = Σ_{n ∈ N_0} E[s^Y | N = n] P(N = n)
           = Σ_{n ∈ N_0} E[s^{X_n} | N = n] P(N = n) = Σ_{n ∈ N_0} G_ξ^n(s) P(N = n) = G_N(G_ξ(s)).
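Proposition 2.18 can be tested on a concrete example. Below, ξ_k is Bernoulli(1/2) and N is uniform on {0, 1, 2} (both choices are arbitrary); the pmf of Y is computed by hand-convolution and compared with the composition G_N ∘ G_ξ at a few rational points:

```python
from fractions import Fraction

# xi_k Bernoulli(1/2), so G_xi(s) = (1+s)/2
# N uniform on {0,1,2}, so G_N(t)  = (1+t+t^2)/3
def G_xi(s): return (1 + s) / 2
def G_N(t): return (1 + t + t * t) / 3

# pmf of Y = xi_1 + ... + xi_N, worked out by conditioning on N
pmf_Y = {0: Fraction(7, 12),   # N=0, or N=1 tails, or N=2 both tails
         1: Fraction(1, 3),    # N=1 heads, or N=2 exactly one head
         2: Fraction(1, 12)}   # N=2 both heads

for s in (Fraction(3, 10), Fraction(9, 10)):
    direct = sum(p * s**y for y, p in pmf_Y.items())
    assert direct == G_N(G_xi(s))   # G_Y(s) = G_N(G_xi(s)), exactly
```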
Corollary 2.19 (Wald’s Identity I). Let (ξ
n
)
n∈N
and N be as in Proposition 2.18. Suppose,
also, that E[N] < ∞ and E[ξ
1
] < ∞. Then
E
_
N
k=0
ξ
k
_
= E[N] E[ξ
1
].
Proof. We just apply the composition rule for derivatives to the equality G
Y
= G
N
◦ G
ξ
to get
G
Y
(s) = G
N
(G
ξ
(s))G
ξ
(s).
After we let s ¸ 1, we get
E[Y ] = G
Y
(1) = G
N
(G
ξ
(1))G
ξ
(1) = G
N
(1)G
ξ
(1) = E[N] E[ξ
1
].
Example 2.20. Every time the Springfield Isotopes play in the league championship, their chance of winning is p ∈ (0, 1). The number of years between two championships they get to play in has the Poisson distribution with parameter λ > 0. What is the expected number of years Y between consecutive championship wins?

Let (ξ_n)_{n∈N} be the sequence of independent Poisson(λ) random variables modeling the number of years between consecutive championship appearances by the Isotopes. Moreover, let N be a Geometric(p) random variable with success probability p. Then

    Y = Σ_{k=0}^{N} ξ_k.

Indeed, every time the Isotopes lose the championship, another ξ_· years have to pass before they get another chance, and the whole thing stops when they finally win. To compute the expectation of Y we use Corollary 2.19:

    E[Y] = E[N] E[ξ_k] = ((1 − p)/p) λ.
3 Random Walks II
3.1 Stopping times
The last application of generating functions dealt with sums evaluated between 0 and some random time N. An especially interesting case occurs when the value of N depends directly on the evolution of the underlying stochastic process. Even more important is the case where time's arrow is taken into account. If you think of N as the time you stop adding new terms to the sum, it is usually the case that you are not allowed (or able) to see the values of the terms you would get if you continued adding. Think of an investor in the stock market. Her decision to stop and sell her stocks can depend only on the information available up to the moment of the decision. Otherwise, she would sell at the absolute maximum and buy at the absolute minimum, making tons of money in the process. Of course, this is not possible unless you are clairvoyant, so mere mortals have to restrict their choices to so-called stopping times.
Definition 3.1. Let (X_n)_{n∈N_0} be a stochastic process. A random variable T taking values in T = N_0 ∪ {+∞} is said to be a stopping time with respect to (X_n)_{n∈N_0} if for each n ∈ N_0 there exists a function G_n : R^{n+1} → {0, 1} such that

    1_{T=n} = G_n(X_0, X_1, ..., X_n),   for all n ∈ N_0.

The functions G_n are called the decision functions, and should be thought of as a black box which takes the values of the process (X_n)_{n∈N_0} observed up to the present point and outputs either 0 or 1. The value 0 means "keep going" and 1 means "stop". The whole point is that the decision has to be based only on the available observations and not on the future ones.
Example 3.2.

1. The simplest examples of stopping times are (non-random) deterministic times. Just set T = 5 (or T = 723 or T = n_0 for any n_0 ∈ N_0 ∪ {+∞}), no matter what the state of the world ω ∈ Ω is. The family of decision rules is easy to construct:

    G_n(x_0, x_1, ..., x_n) = { 1,   n = n_0,
                                0,   n ≠ n_0.

The decision functions G_n do not depend on the values of X_0, X_1, ..., X_n at all. A gambler who stops gambling after 20 games, no matter what the winnings or losses are, uses such a rule.
2. Probably the most well-known examples of stopping times are (first) hitting times. They can be defined for general stochastic processes, but we will stick to simple random walks for the purposes of this example. So, let X_n = Σ_{k=0}^{n} ξ_k be a simple random walk, and let T_l be the first time X hits the level l ∈ N. More precisely, we use the following slightly non-intuitive but mathematically correct definition:

    T_l = min{n ∈ N_0 : X_n = l}.

The set {n ∈ N_0 : X_n = l} is the collection of all time-points at which X visits the level l. The earliest one, the minimum of that set, is the first hitting time of l. In states of the world ω ∈ Ω in which the level l just never gets reached, i.e., when {n ∈ N_0 : X_n = l} is an empty set, we set T_l(ω) = +∞. In order to show that T_l is indeed a stopping time, we need to construct the decision functions G_n, n ∈ N_0. Let us start with n = 0. We would have T_l = 0 only in the (impossible) case X_0 = l, so we always have G_0(X_0) = 0. How about n ∈ N? For the value of T_l to be equal to exactly n, two things must happen:

(a) X_n = l (the level l must actually be hit at time n), and

(b) X_{n−1} ≠ l, X_{n−2} ≠ l, ..., X_1 ≠ l, X_0 ≠ l (the level l has not been hit before).

Therefore,

    G_n(x_0, x_1, ..., x_n) = { 1,   x_0 ≠ l, x_1 ≠ l, ..., x_{n−1} ≠ l, x_n = l,
                                0,   otherwise.
The hitting time T_2 of the level l = 2 for a particular trajectory of a symmetric simple random walk is depicted below:

[Figure: a trajectory of a symmetric simple random walk over the times 0, ..., 30, with the first hitting time T_2 of the level 2 and the time T_M of the last visit to the maximum marked.]
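For a concrete feel for Definition 3.1, the decision functions G_n of a hitting time can be written in a few lines of code. The helper name hitting_decision below is hypothetical, not notation from the notes:

```python
def hitting_decision(l):
    """Decision function family for T_l = min{n : X_n = l}."""
    def G(path):                        # path = (X_0, X_1, ..., X_n)
        *before, last = path
        return 1 if last == l and all(x != l for x in before) else 0
    return G

G = hitting_decision(2)
path = [0, 1, 0, 1, 2, 1, 2]            # hits level 2 for the first time at n = 4
stops = [G(path[:n + 1]) for n in range(len(path))]
assert stops == [0, 0, 0, 0, 1, 0, 0]   # 1_{T_2 = n} fires exactly once, at n = 4
```

Note that G only ever looks at the observed prefix of the path, which is exactly the defining property of a stopping time.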
3. How about something that is not a stopping time? Let n_0 be an arbitrary time-horizon and let T_M be the last time during 0, ..., n_0 that the random walk visits its maximum during 0, ..., n_0 (see the picture above). If you bought a stock at time t = 0, had to sell it some time before n_0, and had the ability to predict the future, this is one of the points you would choose to sell it at. Of course, it is impossible to decide whether T_M = n, for some n ∈ 0, ..., n_0 − 1, without knowledge of the values of the random walk after n. More precisely, let us sketch the proof of the fact that T_M is not a stopping time. Suppose, to the contrary, that it is, and let G_n be the family of decision functions. Consider the following two trajectories: (0, 1, 2, 3, ..., n − 1, n) and (0, 1, 2, 3, ..., n − 1, n − 2). They differ only in the direction of the last step. They also differ in the fact that T_M = n for the first one and T_M = n − 1 for the second one. On the other hand, by the definition of the decision functions, we have

    1_{T_M = n−1} = G_{n−1}(X_0, ..., X_{n−1}).

The right-hand side is equal for both trajectories, while the left-hand side equals 0 for the first one and 1 for the second one. A contradiction.
Remark 3.3. In the remainder of this section, we will sometimes write our decision functions as functions of the random variables ξ_1, ξ_2, ..., ξ_n rather than the random variables X_0, X_1, X_2, ..., X_n. As knowing the values X_0, X_1, X_2, ..., X_n is clearly equivalent to knowing the values ξ_1, ξ_2, ..., ξ_n, we are free to use whichever representation is more convenient.
3.2 Wald’s identity II
Having defined the notion of a stopping time, let us try to compute something with it. The random variables (ξ_n)_{n∈N} in the statement of the theorem below are only assumed to be independent of each other and identically distributed. To make things simpler, you can think of (ξ_n)_{n∈N} as the increments of a simple random walk. Before we state the main result, here is an extremely useful identity:

Proposition 3.4. Let N be an N_0-valued random variable. Then

    E[N] = Σ_{k ∈ N} P[N ≥ k].

Proof. Clearly, P[N ≥ k] = Σ_{j ≥ k} P[N = j], so (note what happens to the indices when we switch the sums)

    Σ_{k ∈ N} P[N ≥ k] = Σ_{k ∈ N} Σ_{k ≤ j < ∞} P[N = j]
                       = Σ_{j ∈ N} Σ_{1 ≤ k ≤ j} P[N = j] = Σ_{j=1}^{∞} j P[N = j] = E[N].
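The tail-sum identity is easy to check on a small example (the pmf below is an arbitrary choice):

```python
from fractions import Fraction

# N with pmf P(N=0) = 1/6, P(N=1) = 1/3, P(N=2) = 1/2
pmf = {0: Fraction(1, 6), 1: Fraction(1, 3), 2: Fraction(1, 2)}

E_N = sum(n * p for n, p in pmf.items())                  # 0/6 + 1/3 + 1 = 4/3
tails = sum(sum(p for n, p in pmf.items() if n >= k)      # P(N >= 1) + P(N >= 2)
            for k in range(1, 3))
assert E_N == tails == Fraction(4, 3)
```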
Theorem 3.5 (Wald’s Identity II). Let (ξ
n
)
n∈N
be a sequence of independent, identically dis
tributed random variables with E
_
[ξ
1
[
¸
< ∞. Set
X
n
=
n
k=1
ξ
k
, n ∈ N
0
.
If T is an (X
n
)
n∈N
0
stopping time such that E[T] < ∞, then
E[X
T
] = E[ξ
1
] E[T].
Proof. Here is another way of writing the sum
T
k=1
ξ
k
:
T
k=1
ξ
k
=
k∈N
0
ξ
k
1
{k≤T}
.
The idea behind it is simple: add all the values of ξ
k
for k ≤ T and keep adding zeros (since
ξ
k
1
{k≤T}
= 0 for k > T) after that. Taking expectation of both sides and switching E and
(this
can be justiﬁed, but the argument is technical and we omit it here) yields:
E
_
T
k=1
ξ
k
_
=
∞
k=1
E[1
{k≤T}
ξ
k
]. (3.1)
Now lets look at the random variable 1
{k≤T}
more closely. We have
1
{k≤T}
= 1 −1
{k>T}
= 1 −1
{k−1≥T}
= 1 −
k−1
j=0
1
{T=j}
= 1 −
k−1
j=0
G
j
(ξ
1
, . . . , ξ
j
),
21
where G
j
(ξ
1
, . . . , ξ
j
) is the decision function which corresponds to the the event ¦T = j¦. In
particular, we see that the random variable 1
{k≤T}
can be written as function of the variables
(ξ
1
, . . . , ξ
k−1
). As the random variables (ξ
1
, . . . , ξ
k
) are independent, the random variables 1
{k≤T}
and ξ
k
are also independent. This means that
E
_
T
k=1
ξ
k
_
=
∞
k=1
E[1
{k≤T}
] E[ξ
k
] = E[ξ
k
]
∞
k=1
P(k ≤ T) = E[ξ
k
]E[T].
Example 3.6 (Gambler’s ruin problem). A gambler start with x ∈ N dollars and repeatedly
plays a game in which he wins a dollar with probability
1
2
and loses a dollar with probability
1
2
. He
decides to stop when one of the following two things happens:
1. he goes bankrupt, i.e., his wealth hits 0, or
2. he makes enough money, i.e., his wealth reaches some level a > x.
The classical “Gambler’s ruin” problem asks the following question: what is the probability that the
gambler will make a dollars before he goes bankrupt?
Gambler’s wealth (W
n
)
n∈N
is modeled by a simple random walk starting from x, whose incre
ments ξ
k
= W
k
−W
k−1
are cointosses. Then W
n
= x +X
n
, where X
n
=
n
k=0
ξ
k
, n ∈ N
0
. Let T
be the time the gambler stops. We can represent T in two diﬀerent (but equivalent) ways. On the
one hand, we can think of T as the smaller of the two hitting times T
−x
and T
a−x
of the levels −x
and a − x for the random walk (X
n
)
n∈N
0
(remember that W
n
= x + X
n
, so these two correspond
to the hitting times for the process (W
n
)
n∈N
0
of the levels 0 and a). On the other hand, we can
think of T as the ﬁrst hitting time of the twoelement set ¦−x, a −x¦ for the process (X
n
)
n∈N
0
. In
either case, it is quite clear that T is a stopping time (can you write down the decision functions?).
We will see later that the probability that the gambler’s wealth will remain strictly between 0 and a
forever is zero, so P[T < ∞] = 1.
What can we say about the random variable X
T
 the gambler’s wealth (minus x) at the random
time T? Clearly, it is either equal to −x or to a −x, and the probabilities p
0
and p
a
with which it
takes these values are exactly what we are after in this problem. We know that, since there are no
other values X
T
can take, we must have p
0
+p
a
= 1. Wald’s identity gives us the second equation
for p
0
and p
a
:
E[X
T
] = E[ξ
1
]E[T] = 0 E[T] = 0,
so
0 = E[X
T
] = p
0
(−x) +p
a
(a −x).
These two linear equations with two unknowns yield
p
0
=
a −x
a
, p
a
=
x
a
.
It is remarkable that the two probabilities are proportional to the amounts of money the gambler
needs to make (lose) in the two outcomes. Again we see that the gambler cannot extract positive
expected value from a fair game. The situation is diﬀerent when p ,=
1
2
.
22
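The answer p_a = x/a is easy to test by simulation. The sketch below is a Monte Carlo check with an arbitrary seed and trial count, so the tolerance is statistical rather than exact:

```python
import random

def win_probability(x, a, trials=100_000, seed=7):
    """Estimate p_a = P(wealth reaches a before 0) for a fair game."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        w = x
        while 0 < w < a:
            w += 1 if rng.random() < 0.5 else -1
        wins += (w == a)
    return wins / trials

est = win_probability(x=3, a=10)
assert abs(est - 0.3) < 0.01   # Wald: p_a = x/a = 3/10
```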
3.3 The distribution of the first hitting time T_1

Let (X_n)_{n∈N_0} be a simple random walk with probability p of stepping up, and let

    T_ℓ = min{n ∈ N_0 : X_n = ℓ}

be the first hitting time of the level ℓ by the random walk. We will now study the random variables T_ℓ using generating-function methods. We will essentially follow the approach of Example 2.15: we will attempt to determine an equation that the generating function associated with the random variable T_ℓ satisfies.

The first step is contained in the following proposition. Recall that the minimum value of an empty set is taken to be +∞ by convention.
Proposition 3.7. Let (X_1, X_2, ...) be a random walk with probability p of moving up. If ℓ ∈ N, then G_{T_ℓ}(s) = G_{T_1}^ℓ(s), where G_{T_ℓ} denotes the generating function of T_ℓ.

Proof. We will only handle the case ℓ = 2, but the general case follows in much the same way. The strategy is to show that

    P(T_2 = m + n | T_1 = m) = P(T_1 = n),   m, n ∈ N_0.

To do this, let's consider the events

    A_{m,n} = "n is the first time after m that X_n = X_m + 1"
            = {X_i ≤ X_m for all m ≤ i < n, and X_n = X_m + 1},

for m ≤ n. These events have two properties:

1. If i < j ≤ k < ℓ, then the events A_{i,j} and A_{k,ℓ} are independent.

This is because A_{i,j} only depends on the steps ξ_{i+1}, ξ_{i+2}, ..., ξ_j of the random walk, while A_{k,ℓ} only depends on the steps ξ_{k+1}, ξ_{k+2}, ..., ξ_ℓ, and all the steps of the random walk are independent.

2. P(A_{i,j}) = P(A_{n+i,n+j}) for all n ∈ N_0.

The event A_{i,j} is determined by some relatively complicated formula applied to the random variables (ξ_{i+1}, ξ_{i+2}, ..., ξ_j), and the event A_{n+i,n+j} is determined by the exact same formula applied to the random variables (ξ_{n+i+1}, ξ_{n+i+2}, ..., ξ_{n+j}).

More precisely, we can choose a set B ⊆ {−1, 1}^{j−i} such that A_{i,j} = {(ξ_{i+1}, ξ_{i+2}, ..., ξ_j) ∈ B} and A_{n+i,n+j} = {(ξ_{n+i+1}, ξ_{n+i+2}, ..., ξ_{n+j}) ∈ B}. As the random vectors (ξ_{i+1}, ξ_{i+2}, ..., ξ_j) and (ξ_{n+i+1}, ξ_{n+i+2}, ..., ξ_{n+j}) have the same joint distribution, the events A_{i,j} and A_{n+i,n+j} must have the same probability.

So, if m, n ∈ N, then

    P(T_2 = m + n | T_1 = m) = P(A_{m,m+n} | A_{0,m}) = P(A_{m,m+n}) = P(A_{0,n}) = P(T_1 = n).

It then follows from Proposition 2.12 that G_{T_2}(s) = G_{T_1}^2(s).
We will now use the previous relationship between T_1 and T_2 to obtain the generating function for T_1 explicitly.
Proposition 3.8. Let (X_1, X_2, ...) be a random walk with probability p of moving up, and let

    T_1 = min{n ∈ N_0 : X_n = 1}

denote the first time that the random walk hits the level 1. Then the generating function of T_1 is given by

    G_{T_1}(s) = (1 − √(1 − 4pqs^2)) / (2qs).

Proof. Our strategy is to condition on the first move that the random walk makes, and then derive a recursive formula for the generating function. As a result, it will be useful to consider an auxiliary process given by

    Y_n = X_{n+1} − X_1,   n ∈ N_0.

So Y corresponds to the changes in the process X after the first step. It is not hard to check that Y is also a random walk with probability p of moving up at each step, and that Y is independent of X_1. It turns out that it will be useful to consider the random variable

    T_2 = "the first time that Y hits the level 2" = min{n ∈ N_0 : Y_n = 2}.

As Y is a random walk, the generating function of T_2 is given by G_{T_1}^2. As Y is independent of X_1, and T_2 is determined by Y, T_2 is also independent of X_1.

Crucial observation: if the first step that X makes is down, then T_1 = 1 + T_2. This is because X now has to climb two steps up to get from −1 up to 1, which is equivalent to Y climbing two steps up from 0 to 2. The other thing to realize is that Y is running on a clock that is one time step behind X: if Y first hits 2 at time T_2 relative to its clock, then X first hits 1 at time 1 + T_2 relative to the initial clock.

Now we make the recursive argument:

    G_{T_1}(s) = E[s^{T_1}] = E[s^{T_1} | X_1 = 1] P(X_1 = 1) + E[s^{T_1} | X_1 = −1] P(X_1 = −1)
               = s p + E[s^{1+T_2} | X_1 = −1] (1 − p)
               = s p + s E[s^{T_2}] (1 − p)
               = s p + s G_{T_1}^2(s) (1 − p).

We now know that G_{T_1} solves G_{T_1}(s) = sp + sq G_{T_1}^2(s) for |s| < 1. There are two possible solutions to this equation (for each s), given by

    G_{T_1}(s) = (1 ± √(1 − 4pqs^2)) / (2qs).

One of these solutions is always greater than 1 in absolute value, so it cannot correspond to a value of a generating function, and we must select the negative square root.
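One can check numerically that the chosen root really solves the quadratic G_{T_1}(s) = sp + sqG_{T_1}^2(s); the values of p and s below are arbitrary test points:

```python
import math

def G_T1(s, p):
    """Generating function of the first hitting time of level 1."""
    q = 1 - p
    return (1 - math.sqrt(1 - 4 * p * q * s * s)) / (2 * q * s)

for p in (0.3, 0.5, 0.7):
    q = 1 - p
    for s in (0.2, 0.5, 0.9):
        g = G_T1(s, p)
        assert abs(g - (s * p + s * q * g * g)) < 1e-12   # G = sp + sqG^2
        assert 0 < g < 1                                  # a bona fide GF value
```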
Once we have the generating function of T_1, we can begin to answer some questions about this random variable. The first question is: does the random walk eventually hit the level 1? To answer this, we use the second part of Proposition 2.7 and compute

    P(T_1 < ∞) = lim_{s→1} G_{T_1}(s) = (1 − √(1 − 4pq)) / (2q) = (1 − |p − q|) / (2q)
               = { 1,     p ≥ 1/2,
                   p/q,   p < 1/2,

where we have used the fact that

    |p − q|^2 = (2p − 1)^2 = 4p^2 − 4p + 1 = 1 − 4pq.

In particular, if the random walk is more likely to go down than it is to go up, then there is some strictly positive chance that it will never hit the level 1. It is remarkable that if p = 1/2, the random walk will always hit 1 sooner or later, but this does not need to happen if p < 1/2. What we have here is an example of a phenomenon known as criticality: many physical systems exhibit qualitatively different behavior depending on whether the value of a certain parameter p lies above or below a certain critical value p = p_c.

Another question that generating functions can help us answer is: how long, on average, do we need to wait before 1 is hit? When p < 1/2, P[T_1 = +∞] > 0, so we can immediately conclude that E[T_1] = +∞, by definition. The case p ≥ 1/2 is more interesting. Following the recipe from the lecture on generating functions, we compute the derivative of G_{T_1}(s) and get

    G_{T_1}'(s) = 2p / √(1 − 4pqs^2) − (1 − √(1 − 4pqs^2)) / (2qs^2).

When p = 1/2, we get

    lim_{s↗1} G_{T_1}'(s) = lim_{s↗1} [ 1/√(1 − s^2) − (1 − √(1 − s^2))/s^2 ] = +∞,

and conclude that E[T_1] = +∞. For p > 1/2, the situation is less severe:

    lim_{s↗1} G_{T_1}'(s) = 1/(p − q).

We can summarize the situation in the following table:

              P[T_1 < ∞]   E[T_1]
    p < 1/2      p/q        +∞
    p = 1/2      1          +∞
    p > 1/2      1          1/(p − q)

Finally, we can try to extract the probability mass function of the random variable T_1 from the generating function G_{T_1}. The obvious way to do this is to compute higher and higher derivatives of G_{T_1} and then set s = 0. In this case, it turns out there is an easier way.
The square root appearing in the formula for G_{T_1} is an expression of the form (1 + x)^{1/2}, so the (generalized) binomial formula can be used:

    (1 + x)^α = Σ_{k=0}^{∞} C(α, k) x^k,   where C(α, k) = α(α − 1) ⋯ (α − k + 1) / k!,   k ∈ N, α ∈ R.

Therefore,

    G_{T_1}(s) = 1/(2qs) − (1/(2qs)) Σ_{k=0}^{∞} C(1/2, k) (−4pqs^2)^k
               = Σ_{k=1}^{∞} s^{2k−1} (1/(2q)) (4pq)^k (−1)^{k−1} C(1/2, k),

and so

    a_{2k−1} = (1/(2q)) (4pq)^k (−1)^{k−1} C(1/2, k),   k ∈ N.

Of course, the random walk cannot move from 0 to 1 in an even number of steps, so a_n = 0 if n is even. This expression can be simplified a bit further: one can show (by induction on k, for instance) that

    C(1/2, k) = (2(−1)^{k+1} / (4^k (2k − 1))) C(2k − 1, k).

Thus,

    P(T_1 = 2k − 1) = (1/(2k − 1)) C(2k − 1, k) p^k q^{k−1},   k ∈ N,

and P(T_1 = k) = 0 when k is even.
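The closed-form pmf can be verified against a direct first-passage computation. The sketch below is illustrative (the choice p = 2/5 is arbitrary); it propagates the exact distribution of the walk, killed upon reaching level 1, and compares with the formula:

```python
from fractions import Fraction
from math import comb

def first_passage(n, p):
    """P(T_1 = n): the walk stays below 1 for n-1 steps, then steps up from 0."""
    q = 1 - p
    dist = {0: Fraction(1)}          # positions reachable without having hit 1
    for _ in range(n - 1):
        new = {}
        for x, pr in dist.items():
            for step, w in ((1, p), (-1, q)):
                y = x + step
                if y < 1:            # paths that reach 1 early are discarded
                    new[y] = new.get(y, Fraction(0)) + pr * w
        dist = new
    return dist.get(0, Fraction(0)) * p

p, q = Fraction(2, 5), Fraction(3, 5)
for k in range(1, 6):
    formula = Fraction(comb(2 * k - 1, k), 2 * k - 1) * p**k * q**(k - 1)
    assert first_passage(2 * k - 1, p) == formula
```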
3.4 Strong Markov property
In the previous section, we observed that Y_n = X_{1+n} − X_1 is again a random walk. Intuitively, if we stop, reset time and space, and then continue, the resulting process that we obtain is again a random walk. It turns out this is true not only for deterministic times, but even for stopping times. This is in fact a consequence of the "strong Markov property," which we will define later in the course.

Proposition 3.9. Let (X_n)_{n∈N_0} be a random walk with parameter p, and let T be a stopping time with respect to (X_n)_{n∈N_0} which never takes the value ∞. If we define the process

    Y_n = X_{T+n} − X_T,   n ∈ N_0,

then (Y_n)_{n∈N_0} is also a random walk with parameter p.
4 Branching Process
4.1 A bit of history
In the mid-19th century, several aristocratic families in Victorian England realized that their family names could become extinct. Was it just unfounded paranoia, or did something real prompt them to come to this conclusion? They decided to ask around, and Sir Francis Galton (a "polymath, anthropologist, eugenicist, tropical explorer, geographer, inventor, meteorologist, proto-geneticist, psychometrician and statistician" and half-cousin of Charles Darwin) posed the following question (1873, Educational Times):

    How many male children (on average) must each generation of a family have in order for the family name to continue in perpetuity?

The first complete answer came from Reverend Henry William Watson soon after, and the two wrote a joint paper entitled On the probability of extinction of families in 1874. By the end of this section, you will be able to give a precise answer to Galton's question.

[Photo: Sir Francis Galton]
4.2 A mathematical model
The model proposed by Watson was the following:

1. A population starts with one individual at time n = 0: Z_0 = 1.

2. After one unit of time (at time n = 1) the sole individual produces Z_1 identical clones of itself and dies. Z_1 is an N_0-valued random variable.

3. (a) If Z_1 happens to be equal to 0, the population is dead and nothing happens at any future time n ≥ 2.

(b) If Z_1 > 0, a unit of time later each of the Z_1 individuals gives birth to a random number of children and dies. The first one has Z_{1,1} children, the second one Z_{1,2} children, etc. The last, the Z_1-th one, gives birth to Z_{1,Z_1} children. We assume that the distribution of the number of children is the same for each individual in every generation and independent both of the number of individuals in the generation and of the number of children the others have. This distribution, shared by all the Z_{n,i} and Z_1, is called the offspring distribution. The total number of individuals in the second generation is now

    Z_2 = Σ_{k=1}^{Z_1} Z_{1,k}.

(c) The third, fourth, etc. generations are produced in the same way. If it ever happens that Z_n = 0 for some n, then Z_m = 0 for all m ≥ n (the population is extinct). Otherwise,

    Z_{n+1} = Σ_{k=1}^{Z_n} Z_{n,k}.
Definition 4.1. A stochastic process with the properties described in (1), (2) and (3) above is called a (simple) branching process.

The mechanism that produces the next generation from the present one can differ from application to application. It is the offspring distribution alone that determines the evolution of a branching process. With this new formalism, we can pose Galton's question more precisely:

    Under what conditions on the offspring distribution will the process (Z_n)_{n∈N_0} never go extinct, i.e., when does

        P[Z_n ≥ 1 for all n ∈ N_0] = 1   (4.1)

    hold?
4.3 Construction and simulation of branching processes
Before we answer Galton’s question, let us ﬁgure out how to simulate a branching process, for a
given oﬀspring distribution p(k) = P[Z
1
= k], k ∈ N
0
. When we studies simulation, we showed
that it is possible to construct a function g : [0, 1] → N
0
such that the random variable g(U) has
probability mass function p when U ∼ Uniform(0, 1).
Some time ago we asserted that a probability space which supports a sequence (U
n
)
n∈N
0
of
independent U[0, 1] random variables exists. We think of (U
n
)
n∈N
0
as a sequence of random numbers
produced by a computer. Let us ﬁrst apply the function g to each member of (U
n
)
n∈N
0
to obtain
an independent sequence (η
n
)
n∈N
0
of N
0
valued random variables with pmf p. In the case of a
simple random walk, we would be done at this point  an accumulation of the ﬁrst n elements of
(η
n
)
n∈N
0
would give you the value X
n
of the random walk at time n. Branching processes are a bit
more complicated; the increment Z
n+1
−Z
n
depends on Z
n
: the more individuals in a generation,
the more oﬀspring they will produce. In other words, we need a black box with two inputs 
“randomness” and Z
n
 which will produce Z
n+1
. What do we mean by “randomness”? Ideally, we
would need exactly Z
n
(unused) elements of (η
n
)
n∈N
0
to simulate the number of children for each
of Z
n
members of generation n. This is exactly how one would do it in practice: given the size
Z
n
of generation n, one would draw Z
n
simulations from the distribution (p
n
)
n∈N
0
, and sum up
the results to get Z
n+1
. Mathematically, it is easier to be more wasteful. The sequence (η
n
)
n∈N
0
can be rearranged into a double sequence
2
¦Z
n,i
¦
n∈N
0
,i∈N
. In words, instead of one sequence of
independent random variables with pmf p, we have a sequence of sequences. Such an abundance
allows us to feed the whole “row” ¦Z
n,i
¦
i∈N
into the black box which produces Z
n+1
from Z
n
. You
can think of Z
n,i
as the number of children the i
th
individual in the n
th
generation would have had
2
Can you ﬁnd a onetoone and onto mapping from N into N ×N?
28
she been born. The black box uses only the ﬁrst Z
n
elements of ¦Z
n,i
¦
i∈N
and discards the rest:
Z
0
= 1, Z
n+1
=
Zn
i=1
Z
n,i
,
where all ¦Z
n,i
¦
n∈N
0
,i∈N
are independent of each other and have the same distribution with pmf p.
Once we learn a bit more about the probabilistic structure of (Z
n
)
n∈N
0
, we will describe another
way to simulate it.
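The "black box" recursion above translates directly into code. The following simulation sketch uses a hypothetical offspring law taking only the values 0 and 2, chosen so that simple structural facts can be asserted along each path:

```python
import random

def branching_path(offspring, generations, rng):
    """One trajectory (Z_0, ..., Z_generations) of a simple branching process."""
    Z = [1]
    for _ in range(generations):
        # each of the Z_n current individuals draws an independent offspring count
        Z.append(sum(offspring(rng) for _ in range(Z[-1])))
    return Z

rng = random.Random(0)
offspring = lambda r: 2 * (r.random() < 0.5)   # 0 or 2 children, probability 1/2 each

for _ in range(1000):
    path = branching_path(offspring, 8, rng)
    assert path[0] == 1                        # Z_0 = 1
    assert all(z % 2 == 0 for z in path[1:])   # 0-or-2 offspring keeps Z_n even
    if 0 in path:                              # extinction is forever
        i = path.index(0)
        assert all(z == 0 for z in path[i:])
```

In practice one would plug in the inverse-transform sampler g(U) from above in place of the toy offspring law.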
4.4 A generatingfunction approach
Having defined and constructed a branching process (Zn)n∈N0 with offspring distribution given by the pmf p, let us analyze its probabilistic structure. The first question that needs to be answered is the following: what is the distribution of Zn, for n ∈ N0? It is clear that Zn must be N0-valued, so its distribution is completely described by its pmf, which is, in turn, completely determined by its generating function. While an explicit expression for the pmf of Zn may not be available, its generating function can always be computed:
Proposition 4.2. Let (Zn)n∈N0 be a branching process, and let the generating function of its offspring distribution be given by G(s). Then the generating function of Zn is the n-fold composition of G with itself, i.e.,

GZn(s) = G(G(. . . G(s) . . . ))   (n G's),  for n ≥ 1.
Proof. For n = 1, the distribution of Z1 has pmf p, so GZ1(s) = G(s). Suppose that the statement of the proposition holds for some n ∈ N. Then

Zn+1 = ∑_{i=1}^{Zn} Zn,i

can be viewed as a random sum of Zn independent random variables, where each random summand has generating function G and the number of summands Zn is independent of the terms in the sum. Proposition 2.18 asserts that the generating function GZn+1 of Zn+1 is the composition of the generating function G(s) of each of the summands and the generating function GZn of the random time Zn. Therefore,

GZn+1(s) = GZn(G(s)) = G(G(. . . G(G(s)) . . . ))   (n + 1 G's),

where the second equality follows by induction.
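When the offspring distribution has finite support, G is a polynomial, so the n-fold composition of Proposition 4.2 can be carried out explicitly on coefficient lists, and the pmf of Zn can be read off the resulting coefficients. A minimal sketch (the helper names and the example pmf are made up for illustration):

```python
import numpy as np

def compose_poly(a, b):
    """Coefficients of a(b(s)), where a and b are coefficient lists (a[k] ~ s^k)."""
    result = np.array([0.0])
    power = np.array([1.0])                 # b(s)^0
    for coeff in a:
        n = max(len(result), len(power))
        result = np.pad(result, (0, n - len(result))) + coeff * np.pad(power, (0, n - len(power)))
        power = np.convolve(power, b)       # next power b(s)^{k+1}
    return result

def pmf_of_Zn(p, n):
    """pmf of Z_n via the n-fold composition G(G(...G(s)...)) of Proposition 4.2."""
    G = list(p)
    coeffs = np.array([0.0, 1.0])           # the identity s, i.e. the gen. function of Z_0 = 1
    for _ in range(n):
        coeffs = compose_poly(G, coeffs)
    return coeffs

# offspring distribution p(0) = p(2) = 1/4, p(1) = 1/2
print(pmf_of_Zn([0.25, 0.5, 0.25], 3))
```

The coefficient of s^0 in the output is P[Zn = 0] = GZn(0), which reappears in Section 4.5.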
Let us use Proposition 4.2 in some simple examples.
Example 4.3. Let (Zn)n∈N0 be a branching process with offspring distribution (pn)n∈N0. In the first three examples no randomness occurs and the population growth can be described exactly. In the other examples, more interesting things happen.
1. p(0) = 1, p(n) = 0, n ∈ N: In this case Z0 = 1 and Zn = 0 for all n ∈ N. This infertile population dies out after the first generation.
2. p(0) = 0, p(1) = 1, p(n) = 0, n ≥ 2: Each individual produces exactly one child before he/she dies. The population size is always 1: Zn = 1, n ∈ N0.

3. p(0) = 0, p(1) = 0, . . . , p(k) = 1, p(n) = 0, n > k, for some k ≥ 2: Here, there are k kids per individual, so the population grows exponentially: G(s) = s^k, so

GZn(s) = ((. . . (s^k)^k . . . )^k)^k = s^(k^n).

Therefore, Zn = k^n, for n ∈ N.
4. p(0) = p, p(1) = q = 1 − p, p(n) = 0, n ≥ 2: Each individual tosses a (biased) coin and has one child if the outcome is heads, or dies childless if the outcome is tails. The generating function of the offspring distribution is G(s) = p + qs. Therefore,

GZn(s) = p + q(p + q(p + q(. . . (p + qs))))   (n pairs of parentheses).

The expression above can be simplified considerably. One needs to realize two things:

(a) After all the products above are expanded, the resulting expression must be of the form A + Bs, for some A, B. If you inspect the expression for GZn even more closely, you will see that the coefficient B next to s is just q^n.

(b) GZn is the generating function of a probability distribution, so A + B = 1.

Therefore,

GZn(s) = (1 − q^n) + q^n s.

Of course, the value of Zn will be equal to 1 if and only if all of the coin-tosses of its ancestors turned out to be heads. The probability of that event is q^n, so we didn't need Proposition 4.2 after all.

This example can be interpreted alternatively as follows. Each individual has exactly one child, but its gender is determined at random: male with probability q and female with probability p. Assuming that all females change their last name when they marry, and assuming that all of them marry, Zn is just the number of individuals carrying the family name after n generations.
5. p(0) = p², p(1) = 2pq, p(2) = q², p(n) = 0, n ≥ 3: In this case each individual has exactly two children, and their genders are female with probability p and male with probability q, independently of each other. The generating function G of the offspring distribution (pn)n∈N0 is given by G(s) = (p + qs)². Then

GZn(s) = (p + q(p + q(. . . (p + qs)² . . . )²)²   (n pairs of parentheses).

Unlike the example above, it is not so easy to simplify this expression.

Proposition 4.2 can be used to compute the mean and variance of the population size Zn, for n ∈ N.
Proposition 4.4. Let p denote the pmf of the offspring distribution for a branching process (Zn)n∈N0. If p is integrable, i.e., if

µ = ∑_{k=0}^{∞} k p(k) < ∞,

then

E[Zn] = µ^n.   (4.2)

If the variance of p is also finite, i.e., if

σ² = ∑_{k=0}^{∞} (k − µ)² p(k) < ∞,

then

Var[Zn+1] = σ² µ^n (1 + µ + µ² + · · · + µ^n) = σ² µ^n (1 − µ^(n+1))/(1 − µ) if µ ≠ 1, and σ² (n + 1) if µ = 1.   (4.3)
Proof. Since the distribution of Z1 has probability mass function p, it is clear that E[Z1] = µ and Var[Z1] = σ². We proceed by induction and assume that the formulas (4.2) and (4.3) hold for some n ∈ N. By Proposition 4.2, the generating function GZn+1 is given as the composition GZn+1(s) = GZn(G(s)). Therefore, if we use the identity E[Zn+1] = (GZn+1)′(1), we get

(GZn+1)′(1) = (GZn)′(G(1)) G′(1) = (GZn)′(1) G′(1) = E[Zn] E[Z1] = µ^n µ = µ^(n+1).

A similar (but more complicated and less illuminating) argument can be used to establish (4.3).
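Formula (4.2) is also easy to check by Monte Carlo simulation. The sketch below (the function name, pmf and parameter values are illustrative) estimates E[Zn] and compares it with µ^n:

```python
import random

def mean_Zn_mc(p, n, trials=20000, seed=1):
    """Monte Carlo estimate of E[Z_n] for offspring pmf `p` (p[k] = P[children = k])."""
    rng = random.Random(seed)
    ks = list(range(len(p)))
    total = 0
    for _ in range(trials):
        z = 1                                   # Z_0 = 1
        for _ in range(n):
            # Z_{m+1} is the sum of z independent offspring counts
            z = sum(rng.choices(ks, weights=p, k=z)) if z else 0
        total += z
    return total / trials

p = [0.2, 0.5, 0.3]                             # mu = 0*0.2 + 1*0.5 + 2*0.3 = 1.1
mu = sum(k * pk for k, pk in enumerate(p))
print(mean_Zn_mc(p, 5), mu ** 5)                # the two numbers should be close
```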
4.5 Extinction probability

We now turn to the central question (the one posed by Galton). We define extinction to be the following event:

E = {ω ∈ Ω : Zn(ω) = 0 for some n ∈ N}.

It follows from the properties of the branching process that Zm = 0 for all m ≥ n whenever Zn = 0. Therefore, we can write E as an increasing union of the sets En, where

En = {ω ∈ Ω : Zn(ω) = 0}.

Consequently, the sequence (P[En])n∈N is nondecreasing, and "continuity of probability" implies that

P[E] = lim_{n→∞} P[En].
The number P[E] is called the extinction probability. Using generating functions and, in particular, the fact that P[En] = P[Zn = 0] = GZn(0), we get

P[E] = lim_{n→∞} GZn(0) = lim_{n→∞} G(G(. . . G(0) . . . ))   (n G's).

It is amazing that this probability can be computed, even if the explicit form of the generating function GZn is not known.
Proposition 4.5. The extinction probability p = P[E] is the smallest nonnegative solution of the equation

x = G(x), called the extinction equation,

where G is the generating function of the offspring distribution.

Proof. Let us show first that p = P[E] is a solution of the equation x = G(x). Indeed, G is a continuous function, so G(lim_{n→∞} xn) = lim_{n→∞} G(xn) for every convergent sequence (xn)n∈N0 in [0, 1]. Let us take the particular sequence given by

xn = G(G(. . . G(0) . . . ))   (n G's).

Then

1. p = P[E] = lim_{n→∞} xn, and
2. G(xn) = xn+1.

Therefore,

p = lim_{n→∞} xn = lim_{n→∞} xn+1 = lim_{n→∞} G(xn) = G(lim_{n→∞} xn) = G(p),

and so p solves the equation G(x) = x.

The fact that p = P[E] is the smallest solution of x = G(x) on [0, 1] is a bit trickier to get. Let p′ be another solution of x = G(x) on [0, 1]. Since 0 ≤ p′ and G is a nondecreasing function on [0, 1], we have

G(0) ≤ G(p′) = p′.

We can apply the function G to both sides of the inequality above to get

G(G(0)) ≤ G(G(p′)) = G(p′) = p′.

Continuing in the same way, we get

P[En] = G(G(. . . G(0) . . . ))   (n G's)  ≤ p′,

so p = P[E] = lim_{n→∞} P[En] ≤ p′; in other words, p is not larger than any other solution p′ of x = G(x).
Example 4.6. Let us compute the extinction probabilities in the cases from Example 4.3.

1. p(0) = 1, p(n) = 0, n ∈ N: No need to use any theorems: P[E] = 1 in this case.

2. p(0) = 0, p(1) = 1, p(n) = 0, n ≥ 2: Like above, the situation is clear: P[E] = 0.

3. p(0) = 0, p(1) = 0, . . . , p(k) = 1, p(n) = 0, n > k, for some k ≥ 2: No extinction here: P[E] = 0.

4. p(0) = p, p(1) = q = 1 − p, p(n) = 0, n ≥ 2: Since G(s) = p + qs, the extinction equation is s = p + qs. If p = 0, the only solution is s = 0, so no extinction occurs. If p > 0, the only solution is s = 1: extinction is guaranteed. It is interesting to note the jump in the extinction probability as p changes from 0 to a positive number.

5. p(0) = p², p(1) = 2pq, p(2) = q², p(n) = 0, n ≥ 3: Here G(s) = (p + qs)², so the extinction equation reads

s = (p + qs)².

This is a quadratic equation in s and its solutions are s1 = 1 and s2 = p²/q², if we assume that q > 0. When p < q, the smaller of the two is s2. When p ≥ q, s = 1 is the smallest solution. Therefore,

P[E] = min(1, p²/q²).
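Proposition 4.5, combined with P[E] = lim GZn(0), suggests a simple numerical procedure even when the extinction equation cannot be solved by hand: iterate x ← G(x) starting from x = 0 until the value stabilizes. A sketch, using the offspring distribution of case 5 with p = 0.3 as an illustrative test case:

```python
def extinction_probability(G, tol=1e-12, max_iter=10**6):
    """Smallest nonnegative fixed point of x = G(x), obtained as the limit of G(G(...G(0)...))."""
    x = 0.0
    for _ in range(max_iter):
        x_next = G(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

p, q = 0.3, 0.7
G = lambda s: (p + q * s) ** 2            # offspring pmf p^2, 2pq, q^2 (case 5 above)
print(extinction_probability(G))          # should be close to p^2/q^2 = 9/49
```

Starting from 0 is essential: it is what guarantees convergence to the smallest solution rather than to the fixed point s = 1.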
References
[1] Patrick Billingsley. Probability and measure. Wiley Series in Probability and Mathematical
Statistics: Probability and Mathematical Statistics. John Wiley & Sons Inc., New York, second
edition, 1986.
1 Random Walks I

1.1 Stochastic processes
Definition 1.1. Let T be a subset of [0, ∞). A family of random variables (Xt)t∈T, indexed by T, is called a stochastic (or random) process. When T = N (or T = N0), (Xt)t∈T is said to be a discrete-time process, and when T = [0, ∞), it is called a continuous-time process.

When T is a singleton (say T = {1}), the process (Xt)t∈T ≡ X1 is really just a single random variable. When T is finite (e.g., T = {1, 2, . . . , n}), we get a random vector. Therefore, stochastic processes are generalizations of random vectors. The interpretation is, however, somewhat different. While the components of a random vector usually (but not always) stand for different spatial coordinates, the index t ∈ T is more often than not interpreted as time. Stochastic processes usually model the evolution of a random system in time. When T = [0, ∞) (continuous-time processes), the value of the process can change every instant. When T = N (discrete-time processes), the changes occur discretely.

In contrast to the case of random vectors or random variables, it is not easy to define a notion of a density (or a probability mass function) for a stochastic process. Without going into details about why exactly this is a problem, let me just mention that the main culprit is infinity. One usually considers a family of (discrete, continuous, etc.) finite-dimensional distributions, i.e., the joint distributions of the random vectors (Xt1, Xt2, . . . , Xtn), for all n ∈ N and all choices t1, . . . , tn ∈ T.

The notion of a stochastic process is very important both in mathematical theory and in its applications in science, engineering, economics, etc. It is used to model a large number of various phenomena where the quantity of interest varies discretely or continuously through time in a non-predictable fashion.

Every stochastic process can be viewed as a function of two variables, t and ω. For each fixed t, ω → Xt(ω) is a random variable, as postulated in the definition.
However, if we change our point of view and keep ω fixed, we see that the stochastic process is a function mapping ω to the real-valued function t → Xt(ω). These functions are called the trajectories of the stochastic process X. The following two figures show two possible trajectories of a simple random walk¹, i.e., each one corresponds to a (different) frozen ω ∈ Ω, but t varies from 0 to 30.
[Figures: two sample trajectories of a simple random walk.]
¹We will define the simple random walk later. For now, let us just say that it behaves as follows. It starts at x = 0 at t = 0. After that, a (possibly biased) coin is tossed and we move up (to x = 1) if heads is observed and down (to x = −1) if we see tails. The procedure is repeated at t = 1, 2, . . . and the position at t + 1 is determined in the same way, independently of all the coin tosses before (note that the position at t = k can be any of the following: x = −k, x = −k + 2, . . . , x = k − 2, x = k).
Unlike the figures above, the next two pictures show two time-slices of the same random process; in each graph the time t is fixed (t = 15 vs. t = 25), and the various values the random variables X15 and X25 can take are presented through their probability mass functions.
Figure 1: Probability mass function for X15
Figure 2: Probability mass function for X25
1.2 The canonical probability space
When one deals with infinite-index (|T| = +∞) stochastic processes, the construction of the probability space (Ω, F, P) to support a given model is usually quite a technical matter. This course does not suffer from that problem because all our models can be implemented on a special probability space. We start with the sample space Ω:

Ω = [0, 1] × [0, 1] × · · · = [0, 1]^N0,

and any generic element of Ω will be a sequence ω = (ω0, ω1, ω2, . . . ) of real numbers in [0, 1]. For n ∈ N0 we define the mapping Un : Ω → [0, 1] which simply chooses the n-th coordinate: Un(ω) = ωn. The proof of the following theorem can be found in most advanced probability books (e.g. [1], Thm. 20.4):

Theorem 1.2. There exists a probability measure P on Ω such that

1. each Un, n ∈ N0, is a random variable with the uniform distribution on [0, 1], and
2. the sequence (Un)n∈N0 is independent.

Remark 1.3. One should think of the sample space Ω as a source of all the randomness in the system: the elementary event ω ∈ Ω is chosen by a process beyond our control and the exact value of ω is assumed to be unknown. All the other parts of the system are possibly complicated, but deterministic, functions of ω (random variables). When a coin is tossed, only a single drop of randomness is needed: the outcome of a coin-toss. When several coins are tossed, more randomness is involved and the sample space must be bigger. When a system involves an infinite number of random variables (like a stochastic process with infinite T), a large sample space Ω is needed.

Once we can construct a sequence of independent random variables which are uniformly distributed on the unit interval, we can then construct any number of models. For example:
1.3 Constructing the random walk

Let us show how to construct the simple random walk on the canonical probability space (Ω, F, P) from Theorem 1.2. First of all, we need a definition of the simple random walk:

Definition 1.4. A sequence (Xn)n∈N0 of random variables is called a simple random walk² (with parameter p ∈ (0, 1)) if

a) X0 = 0,
b) Xn+1 − Xn is independent of (X0, X1, . . . , Xn) for all n ∈ N, and
c) the random variable Xn+1 − Xn has the following distribution:

P[Xn+1 − Xn = 1] = p and P[Xn+1 − Xn = −1] = q,

where, as usual, q = 1 − p.

If p = 1/2, the random walk is called symmetric. One can study more general random walks where each step comes from an arbitrary prescribed probability distribution.

²The adjective simple comes from the fact that the size of each step is fixed (equal to 1) and it is only the direction that is random.

For the sequence (Un)n∈N given by Theorem 1.2, define the following, new, sequence (ξn)n∈N of random variables:

ξn = 1 if Un ≤ p, and ξn = −1 otherwise.

We then set X0 = 0 and Xn = ∑_{k=1}^{n} ξk, n ∈ N. In other words, we use each ξn to emulate a biased coin toss and then define the value of the process X at time n as the cumulative sum of the first n coin-tosses.

Proposition 1.5. The sequence (Xn)n∈N0 defined above is a simple random walk.

Proof. Property a) is trivially true. To check property b), we first note that (ξn)n∈N is an independent sequence (as it has been constructed by an application of a deterministic function to each element of the independent sequence (Un)n∈N). Intuitively, the increment Xn+1 − Xn = ξn+1 is independent of all the previous coin-tosses ξ1, . . . , ξn. What we need to prove, though, is that it is independent of all the previous values of the process X. These, previous, values are nothing but linear combinations of the coin-tosses ξ1, . . . , ξn, so they must also be independent of ξn+1. Finally, to get c), we compute

P[Xn+1 − Xn = 1] = P[ξn+1 = 1] = P[Un+1 ≤ p] = p.

A similar computation shows that P[Xn+1 − Xn = −1] = q.

We have now defined and constructed a random walk (Xn)n∈N0. Our next task is to study some of its mathematical properties.
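The construction above translates directly into code: generate uniforms Un, set ξn = 1 if Un ≤ p and ξn = −1 otherwise, and accumulate. A minimal sketch (the function name and parameters are illustrative):

```python
import random

def random_walk_path(n, p, seed=42):
    """First n steps of a simple random walk built from uniforms,
    mirroring the construction xi_k = +1 if U_k <= p, else -1."""
    rng = random.Random(seed)
    X = [0]                                # X_0 = 0
    for _ in range(n):
        U = rng.random()                   # U_k ~ Uniform[0, 1]
        xi = 1 if U <= p else -1
        X.append(X[-1] + xi)               # X_k = X_{k-1} + xi_k
    return X

print(random_walk_path(10, 0.5))
```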
1.4 The reflection principle

Now we know how to compute the probabilities related to the position of the random walk (Xn)n∈N0 at a fixed future time n.

Proposition 1.6. Let (Xn)n∈N0 be a simple random walk with parameter p. The distribution of the random variable Xn is discrete with support {−n, −n + 2, . . . , n − 2, n} and probabilities

P[Xn = l] = C(n, (n + l)/2) p^((n+l)/2) q^((n−l)/2),  l = −n, −n + 2, . . . , n − 2, n,   (1.1)

where C(n, k) denotes the binomial coefficient n!/(k!(n − k)!).

Proof. Xn is composed of n independent steps ξk, each of which goes either up or down. In order to reach the level l in those n steps, the number u of up-steps and the number d of down-steps must satisfy u − d = l (and u + d = n). Equivalently, u = (n + l)/2 and d = (n − l)/2. The number of ways we can choose these u up-steps from the total of n is C(n, (n + l)/2), which, combined with the fact that the probability of any trajectory with exactly u up-steps is p^u q^(n−u), gives the probability (1.1) above. Alternatively, we could have noticed that the random variable (n + Xn)/2 has the binomial b(n, p) distribution.

To prepare the ground for the future results, it is more helpful to view the random walk as a random trajectory in some space of paths. More precisely, let C be the set of all possible trajectories:

C = {(x0, x1, . . . , xn) : x0 = 0, xk+1 − xk = ±1, k ≤ n − 1}.

You can think of the first n steps of a random walk simply as a probability distribution on the state space C. For the symmetric simple random walk, the symmetry assumption ensures that all trajectories are equally likely, so one can compute the required probability of an event by simply counting the number of trajectories in the subset (event) one is interested in, weighted by their probabilities, and adding them all together. The figure on the right shows the superposition of all trajectories in C for n = 4, with one path marked in red.

A mathematically more interesting question can be posed about the maximum of the random walk on {0, 1, . . . , n}. A nice expression for this probability is available for the case of symmetric simple random walks.

Proposition 1.7. Let (Xn)n∈N0 be a symmetric simple random walk, and let Mn = max(X0, . . . , Xn) be the maximal value of (Xn)n∈N0 on the interval 0, . . . , n. The support of Mn is {0, 1, . . . , n} and its probability mass function is given by

P[Mn = l] = C(n, ⌊(n + l + 1)/2⌋) 2^(−n),  l = 0, 1, . . . , n.

Proof. Let us first pick a level l ∈ {0, 1, . . . , n} and compute the auxiliary probability P[Mn ≥ l] by counting the number of trajectories whose maximal level reached is at least l. More precisely, let Al ⊂ C be given by

Al = {(x0, x1, . . . , xn) ∈ C : max_{k=0,...,n} xk ≥ l},

i.e., all trajectories with xk ≥ l for at least one k ∈ {0, 1, . . . , n}. Then P[Mn ≥ l] = 2^(−n) |Al|, where |A| denotes the number of elements in the set A. When l = 0 we clearly have P[Mn ≥ 0] = 1, since X0 = 0, so suppose l ≥ 1. To count the number of elements in Al, we use the following clever observation (known as the reflection principle):

Claim: For l ∈ N,

|{(x0, . . . , xn) ∈ Al : xn < l}| = |{(x0, . . . , xn) ∈ Al : xn > l}|.   (1.2)

We start by defining a bijective transformation which maps trajectories into trajectories. For a trajectory (x0, x1, . . . , xn) ∈ Al, let k(l) = k(l, (x0, . . . , xn)) be the smallest value of the index k such that xk ≥ l. We know that k(l) is well-defined (since we are only applying it to trajectories in Al) and that it takes values in the set {1, 2, . . . , n}. In the stochastic-process-theory parlance, k(l) is the first hitting time of the set {l, l + 1, . . . }. With k(l) at our disposal, let (y0, y1, . . . , yn) ∈ C be the trajectory obtained from (x0, x1, . . . , xn) by the following procedure:

1. do nothing until you get to k(l): y0 = x0, y1 = x1, . . . , yk(l) = xk(l);
2. use the flipped values for the coin-tosses from k(l) onwards: yk(l)+1 − yk(l) = −(xk(l)+1 − xk(l)), yk(l)+2 − yk(l)+1 = −(xk(l)+2 − xk(l)+1), . . . , yn − yn−1 = −(xn − xn−1).

It is clear that (y0, y1, . . . , yn) is in C. Graphically, (y0, . . . , yn) looks like (x0, . . . , xn) until it hits the level l, and then follows its reflection around the level l, so that yk − l = l − xk for k ≥ k(l). The picture on the right shows two trajectories, with n = 15, l = 4 and k(l) = 8: a blue one and its reflection in red. Let us denote this transformation by Φ : Al → C and call it the reflection map.

The first important property of the reflection map is that it is its own inverse: apply Φ to any (y0, y1, . . . , yn) in Al and you will get the original (x0, x1, . . . , xn). In other words, Φ ◦ Φ = Id, i.e., Φ is an involution. It follows immediately that Φ is a bijection from Al onto Al. To get to the second important property of Φ, let us split the set Al into three parts according to the value of xn (suppressing the dependence on l in the notation):

1. A< = {(x0, x1, . . . , xn) ∈ Al : xn < l},
2. A= = {(x0, x1, . . . , xn) ∈ Al : xn = l}, and
3. A> = {(x0, x1, . . . , xn) ∈ Al : xn > l}.

Then Φ(A<) = A>, Φ(A>) = A<, and Φ(A=) = A=, which, together with the bijectivity of Φ, shows the claim (1.2). We should note that, in the definitions of A> and A=, the a priori stipulation that (x0, x1, . . . , xn) ∈ Al is unnecessary: if xn ≥ l, you must already be in Al. Therefore,

|Al| = 2 |{(x0, . . . , xn) ∈ C : xn > l}| + |{(x0, . . . , xn) ∈ C : xn = l}|,

and, dividing by 2^n, we can rewrite this as

P[Mn ≥ l] = P[Xn = l] + 2 ∑_{j>l} P[Xn = j] = ∑_{j>l} P[Xn = j] + ∑_{j≥l} P[Xn = j].

Finally, we subtract P[Mn ≥ l + 1] from P[Mn ≥ l] to get the expression for P[Mn = l]:

P[Mn = l] = P[Xn = l + 1] + P[Xn = l].

It remains to note that only one of the probabilities P[Xn = l + 1] and P[Xn = l] is nonzero: the first one if n and l have different parity, and the second one otherwise. In either case the nonzero probability is given by C(n, ⌊(n + l + 1)/2⌋) 2^(−n).

Let us use the reflection principle to solve a classical problem in combinatorics.

Example 1.8 (The Ballot Problem). Suppose that two candidates, Daisy and Oscar, are running for office, and n ∈ N voters cast their ballots. Votes are counted by the same official, one by one, until all n of them have been processed (like in the old days). After each ballot is opened, the official records the number of votes each candidate has received so far. At the end, the official announces that Daisy has won by a margin of m > 0 votes, i.e., that Daisy got (n + m)/2 votes and Oscar the remaining (n − m)/2 votes. What is the probability that Daisy never trails Oscar during the counting of the votes?

We assume that the order in which the official counts the votes is completely independent of the actual votes, that each voter chooses Daisy with probability p ∈ (0, 1) and Oscar with probability q = 1 − p, and that the votes are independent of each other. For k ≤ n, let Xk be the number of votes received by Daisy minus the number of votes received by Oscar in the first k ballots. When the (k + 1)-st vote is counted, Xk either increases by 1 (if the vote was for Daisy), or decreases by 1 otherwise. The votes are independent of each other and X0 = 0, so Xk, 0 ≤ k ≤ n, is (the beginning of) a simple random walk. The probability of an up-step is p ∈ (0, 1), so this random walk is not necessarily symmetric.

The ballot problem can now be restated as follows: what is the probability that Xk ≥ 0 for all k ∈ {0, . . . , n}, given that Xn = m? The first step towards understanding the solution is the realization that the exact value of p does not matter. Indeed, we are interested in the conditional probability P[F | G] = P[F ∩ G]/P[G], where

F = "all trajectories that stay nonnegative" = {Xi ≥ 0 for all 0 ≤ i ≤ n}, and
G = "all trajectories that reach m at time n" = {Xn = m}.

Each trajectory in G has (n + m)/2 up-steps and (n − m)/2 down-steps, so its probability weight is always equal to p^((n+m)/2) q^((n−m)/2). Therefore,

P[F | G] = P[F ∩ G]/P[G] = |F ∩ G| p^((n+m)/2) q^((n−m)/2) / (|G| p^((n+m)/2) q^((n−m)/2)) = |F ∩ G| / |G|.   (1.3)

We already know how to count the number of paths in G: it is equal to C(n, (n + m)/2), so "all" that remains to be done is to count the number of paths in G ∩ F. In fact, the collection G of paths that go from 0 to m can be split into two parts:

1. G ∩ F: the paths that go from 0 to m and stay nonnegative, and
2. H: the paths that go from 0 to m and become negative at some point.

So |G ∩ F| = |G| − |H|. Can we use the reflection principle to find |H|? Yes, we can. If we set

H = "all paths which finish at m and visit the level l = −1" = {Xn = m and min_{0≤i≤n} Xi ≤ −1},

then G = (G ∩ F) ∪ H. You can convince yourself that the reflection of any path in H around the level l = −1, after its first hitting time of that level, produces a path that starts at 0 and ends at −m − 2. Conversely, the same procedure applied to such a path yields a path in H. If a path travels from 0 to −m − 2, then it must have (n + m + 2)/2 down-steps and (n − m − 2)/2 up-steps. This means there are C(n, (n + m + 2)/2) = C(n, k + 1) of these paths, where k = (n + m)/2. Putting everything together, we get

P[F | G] = (C(n, k) − C(n, k + 1)) / C(n, k) = (2k + 1 − n)/(k + 1), where k = (n + m)/2.

The last equality follows from the definition of the binomial coefficients C(n, k) = n!/(k!(n − k)!). How would you modify this argument to compute the probability that Daisy leads Oscar during the entire counting of the votes?

The Ballot problem has a long history (going back to at least 1887) and has spurred a lot of research in combinatorics and probability. In fact, people still write research papers on some of its generalizations. When posed outside the context of probability, it is often phrased as "in how many ways can the counting be performed . . . " (the difference being only in the normalizing factor C(n, k) appearing in (1.3)). A special case m = 0 seems to be even more popular: the number of 2n-step paths from 0 to 0 never going below zero is called the Catalan number and equals

Cn = C(2n, n)/(n + 1).

Can you derive this expression from (1.3)? If you want to test your understanding a bit further, here is an identity (called Segner's recurrence formula) satisfied by the Catalan numbers:

Cn = ∑_{i=1}^{n} C_{i−1} C_{n−i}, n ∈ N.

Can you prove it using the Ballot-problem interpretation?
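For small n, the ballot formula (2k + 1 − n)/(k + 1) can be verified by brute force: enumerate all 2^n trajectories, keep those ending at m, and count how many stay nonnegative. A sketch (function and variable names are illustrative):

```python
from itertools import product

def ballot_probability(n, m):
    """P[walk stays >= 0 | X_n = m], by enumerating all 2^n (+1/-1) paths."""
    good = total = 0
    for steps in product((1, -1), repeat=n):
        x, ok = 0, True
        for s in steps:
            x += s
            ok = ok and x >= 0            # does the path ever go negative?
        if x == m:                        # condition on the event {X_n = m}
            total += 1
            good += ok
    return good / total

n, m = 8, 2
k = (n + m) // 2
print(ballot_probability(n, m), (2 * k + 1 - n) / (k + 1))   # the two values should agree
```

As the notes point out, the value of p drops out of the conditional probability, which is why the enumeration can simply count paths instead of weighting them.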
2 Generating Functions

"A generating function is a clothesline on which we hang up a sequence of numbers for display."
Herbert S. Wilf, generatingfunctionology

The path-counting method used in the previous lecture only works for computations related to the first n steps of the random walk, where n is given in advance. We will see later that most of the interesting questions do not fall into this category. For example, the distribution of the time it takes for the random walk to hit the level l is like that: there is no way to give an a priori bound on the number of steps it will take to get to l (in fact, the expectation of this random variable can be +∞). To deal with a wider class of properties of random walks (and other processes), we need to develop some new mathematical tools.

2.1 Generating functions

A generating function associates a sequence of numbers with a function (a power series). It turns out that in many cases of interest we can use this function to learn about the sequence.

Definition 2.1. If (an)n∈N0 is a sequence of numbers, then we say that the radius of convergence of the power series ∑_{k=0}^{∞} ak s^k is the largest number R ∈ [0, ∞] such that ∑_{k=0}^{∞} ak s^k converges when |s| < R. When R > 0, we say that the function

A(s) = ∑_{k∈N0} ak s^k,  −R < s < R,

is the generating function associated with the sequence (an)n∈N0.

Proposition 2.2. The generating function A associated with a sequence (an)n∈N0 is infinitely differentiable on (−R, R), and

d/ds A(s) = ∑_{1≤k<∞} k ak s^(k−1),
d²/ds² A(s) = ∑_{2≤k<∞} k(k − 1) ak s^(k−2),

and, in general,

d^n/ds^n A(s) = ∑_{n≤k<∞} k(k − 1) · · · (k − n + 1) ak s^(k−n).   (2.1)

In particular, an = (1/n!) (d^n/ds^n) A(s) at s = 0, so we can recover the sequence (an)n∈N0 from the function A.

Proof. If we formally differentiate each term in the expression A(s) = a0 + a1 s + a2 s² + a3 s³ + · · ·, then we see that the result is again a power series with the stated coefficients. Checking this result rigorously is beyond the scope of this course.

The name generating function comes from the last part of this result: the knowledge of A implies the knowledge of the whole sequence (an)n∈N0.
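The claim of Proposition 2.2 that the sequence can be recovered from the derivatives of A at 0 is easy to check numerically for a finite sequence (the numbers below are illustrative):

```python
import numpy as np
from math import factorial

a = [0.5, 0.3, 0.15, 0.05]               # an illustrative (finite) sequence
A = np.polynomial.Polynomial(a)          # its generating function A(s)

# recover each a_n as the n-th derivative of A at 0, divided by n!
recovered = [A.deriv(n)(0.0) / factorial(n) for n in range(len(a))]
print(recovered)
```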
2 Associating a generating function with a random variable In this section we will look at random variables which take values in the set T N0 ∪ {+∞} = {0. N0 valued.” The distribution of an Tvalued random variable X is completely determined by the sequence (an )n∈N0 of numbers in [0. 2. ) then we see that the resulting coeﬃcient for sn is given by cn . (2. . } ∪ {+∞}. We will see shortly that convolution arises naturally when we compute the probability mass function of the sum of two independent. We will often be interested in random variables which record the amount of (discrete) time that we have to wait for an event to occur. let Ga and Gb denote the generating functions associated with these sequences. In some cases. Let (an )n∈N0 and (bn )n∈N0 be sequences. The knowledge of A implies the knowledge of the whole sequence (an )n∈N0 . If we set c = a ∗ b. . 1] given by an = P[X = n]. n ∈ N0 . 3. It turns out that convolving two sequences is equivalent to multiplying their generating functions. 1. It also turns out that we can use generating functions to study convolution. .3. 2. Checking the remaining claims rigorously is again beyond the scope of this course. Proposition 2. and Gc (s) = Ga (s)Gb (s) for s < R.The name generating function comes from the last part of this result. . . and assume that both power series have radius of convergence at least as large as R > 0. also has radius of convergence at least as large as R. then the ∞ Gc (s) = k=0 ck sk . 10 . but we can still ﬁgure it from the values in the sequence: P(X = ∞) = 1 − P(X < ∞) = 1 − n∈N0 an . Let (an )n∈N0 and (bn )n∈N0 be sequences and deﬁne n n cn = j=0 aj bj−k = k=0 an−k bk . Proof. . so we allow these random variables to take the value +∞ to indicate that we have to wait “forever. Deﬁnition 2.2) Notice that the value P(X = ∞) does not occur in the sequence (an )n∈N0 .4. ) (b0 + b1 s + b2 s2 + . . the event may never occurs. random variables. n ∈ N0 . 
Then we say that the sequence (cn )n∈N0 is the convolution of the sequences (an )n∈N0 and (bn )n∈N0 and we write c = a ∗ b. If we formally expand and then collect like powers of s in the following expression: Ga (s)Gb (s) = (a0 + a1 s + a2 s2 + .
. k! Remark 2. (2. Therefore. . (2) and (4). k = 0. Proposition 2. Let X be an Tvalued random variable with generating function GX . ak = q k p. we have k! ∞ GX (s) = k=0 e−λ λk k s = e−λ k! ∞ k=0 (sλ)k = e−λ esλ = eλ(s−1) . 1). when we say “let (an )n∈N0 be the sequence associated with X”. 11 . GX (s) = E[sX ] = E[sX 1{X<∞} ]. For the distribution with pmf given by ak = (k+1)2 . so that ∞ ∞ GX (s) = k=0 k q k sk p = p k=0 (qs)k = p . for n ≥ 2.5. k by the binomial theorem. k ∈ N0 . Example 2.7. a1 = p. GX (s) = ps + q. The function GX that we have obtained in known as the generating function or probability generating function associated with X. It is inﬁnite in (1). we have n GX (s) = k=0 n k n−k k p q s = (ps + q)n . We then deﬁne the generating function associated with the sequence (an )n∈N0 by GX (s) = 0≤k<∞ ak sk . in Example C 1 2.2).5. P (X < ∞) = lims 1 GX (s). s ∈ (−1. Note that the true radius of convergence varies from distribution to distribution but is never smaller than 1.In the future. . Before we proceed. Can you see why? The following proposition gives another way to the compute the generating function associated with a random variable. (1) Bernoulli(p): Here a0 = q. p): Since ak = n k pk q n−k . we mean that (an )n∈N0 is given by (2. the radius of k=0 convergence is exactly equal to 1.6. . 2. 1). and equal to 1/q > 1 in (3). and an = 0.3) It follows from the fact that an  ≤ 1 that the radius of convergence of the sequence (an )n∈N0 is at least 1 and GX is well deﬁned for s ∈ (−1. (3) Geometric(p) For k ∈ N0 . Then 1. 1 − qs (4) Poisson(λ): Given that ak = e−λ λ . where C = ( ∞ (k+1)2 )−1 . let us ﬁnd an expression for the generating functions of some of the popular N0 valued random variables. n. (2) Binomial(n.
1). then c = a ∗ b and GZ (s) = GX (s)GY (s). one should really justify the fact that we can exchange the summation and limit in the previous equation. we have n n n cn = P(Z = n) = i=0 P(X = i and Y = n − i) = i=0 P(X = i) P(Y = n − i) = i=0 ai bn−1 . or simply check it by hand. In particular. For each n ∈ N. one could employ the monotone convergence theorem from real analysis. Proposition 2. 12 .Proof. The moment generating function is quite similar to the probability generating function. Of course. Let X. We used the formula an = P[X = n] to associate a sequence with the random variable X. Y be independent Tvalued random variables and set Z = X + Y . If (an )n∈N0 and GX are the sequence and generating function associated with X.3 Convolution The true power of generating functions comes from the fact that they behave very well under the usual operations in probability. In this case. the resulting function is given by B(s) = E[esX ] and is known as the moment generating function associated with X. The second claim follows from the fact that: s lim GX (s) = lim 1 s 1 sn P(X = n) n∈N0 s = n∈N0 lim sn P(X = n) = 1 n∈N0 P(X = n) = P(X < ∞). 2. Remark 2. If one then computes the generating function B corresponding to the sequence (bn ). Similarly. if s < 1. Proof. One could also use the formula bn = E[X n /n!] to associate a sequence with X. Statement (1) follows directly from the formula E[g(X)] = n∈T g(n) P(X = n) applied to g(x) = sx where we have used the fact/convention that s∞ = 0 if s < 1. The probability generating function will turn out to be more convenient for this class. (bn ) and Gy are the sequence and generating function associated with Y . but this is beyond the scope of this course. one can check that that they are related by the formula GX (s) = B(log(s)).9. and (cn ) and GZ are the sequence and generating function associated with Z. for s ∈ (0. then GZ (s) = E[sZ ] = E[sX+Y ] = E[sX sY ] = E[sX ] E[sY ] = GX (s)GY (s).8.
Example 2.10. The Binomial(n, p) distribution is the distribution of a sum of n independent Bernoulli random variables with parameter p. So, if we apply Proposition 2.9 n times to the generating function (q + ps) of the Bernoulli(p) distribution, we immediately get that the generating function of the binomial is

  (q + ps) ⋯ (q + ps) = (q + ps)^n.

More generally, we can show that the sum of m independent random variables with the Binomial(n, p) distribution has a Binomial(mn, p) distribution. Of course, if you try to sum binomials with different values of the parameter p, you will not get a binomial. What is even more interesting, the following statement can be shown: suppose that the sum Z of two independent N_0-valued random variables X and Y is binomially distributed with parameters n and p; then both X and Y are binomial with parameters n_X, p and n_Y, p, where n_X + n_Y = n. In other words, the only way to get a binomial as a sum of independent random variables is the trivial one.

We will actually need something slightly more complicated than Proposition 2.9 when we get back to random walks. To understand what we want, let's consider the following example.

Example 2.11. You own a pizza shop with a single delivery driver. You send your driver out with an order, but then realize that you gave him the wrong pizza. Unfortunately, it's 1983, so you can't call your driver on a cell phone, and there is nothing that you can do but wait for your driver to return. Your driver is also somewhat unreliable: on each trip there is some chance that he will decide he is sick of this job and never return.

Let T_1 ∈ T denote the time that your driver gets back from the first trip, and let T_2 ≥ T_1 denote the time that your driver gets back from the second trip. (Of course, these should probably be continuous random variables, but let's assume that the world is discrete. Also, if you don't like the idea that T_1 might take the value zero, you can just assign probability zero to this possibility.) Since the driver may quit, P(T_1 = ∞) > 0 and P(T_2 = ∞) > 0; in particular, if the driver quits before he returns from the first round trip, then T_2 = ∞.

In the event that your driver does return to the shop after the first trip, the time that it takes for him to make the second round trip is independent of the time that the first trip took and has the same distribution. More formally, we suppose that

  P(T_2 = m + n | T_1 = m) = P(T_1 = n),  m, n ∈ N_0.   (2.4)

Note that, conditional on the event {T_1 = m}, T_2 can only take values in {m, m+1, m+2, ...} ∪ {∞}, so it follows from (2.4) that

  P(T_2 = ∞ | T_1 = m) = 1 − Σ_{n∈N_0} P(T_2 = m + n | T_1 = m) = 1 − Σ_{n∈N_0} P(T_1 = n) = P(T_1 = ∞).

Let G_{T_1} denote the generating function associated with the random variable T_1, and let G_{T_2} denote the generating function associated with T_2. We would like to check that G_{T_2}(s) = G_{T_1}^2(s) for |s| < 1. Morally, we want to apply Proposition 2.9 to T_1 and T_2 − T_1, but there is one sticking point: if T_1 = ∞, then T_2 = ∞ and T_2 − T_1 is undefined, so it doesn't make any sense to ask how long the second round trip took. We could just define ∞ − ∞ = ∞, but this still isn't going to make T_1 and T_2 − T_1 independent. Fortunately, this minor annoyance doesn't end up mattering.

First notice that, for |s| < 1 and m ∈ N_0,

  E[s^{T_2} | T_1 = m] = Σ_{n∈T} s^{m+n} P(T_1 = n) = s^m G_{T_1}(s).

We also know that T_2 = ∞ when T_1 = ∞, so E[s^{T_2} | T_1 = ∞] = s^∞ = 0 for |s| < 1. As a result, we may apply the tower law for conditional expectation to conclude that, for |s| < 1,

  G_{T_2}(s) = E[s^{T_2}] = Σ_{m∈T} E[s^{T_2} | T_1 = m] P(T_1 = m) = Σ_{m∈N_0} s^m G_{T_1}(s) P(T_1 = m) = G_{T_1}^2(s).

We have now shown the following proposition, which we will need in the next section.

Proposition 2.12. Let T_2 ≥ T_1 be random times taking values in T = N_0 ∪ {+∞} with generating functions G_{T_1} and G_{T_2}. If

  P(T_2 = m + n | T_1 = m) = P(T_1 = n),  m, n ∈ N_0,

then G_{T_2}(s) = G_{T_1}^2(s) for |s| < 1.

Note, in addition, that if two generating functions agree around zero, then they are generated by the same sequence, so they must agree on their entire common radius of convergence.

2.4 Moments

Another useful thing about generating functions is that they can make the computation of moments easier. Recall that E[X^n] is said to be the n-th moment of X. Notice that if P(X = ∞) > 0, then X^n is never integrable, so we will now restrict attention to random variables that only take values in N_0.

Proposition 2.13. Let X be an N_0-valued random variable with generating function G_X. For n ∈ N, the following two statements are equivalent:

1. E[X^n] < ∞;
2. (d^n/ds^n) G_X(s) |_{s=1} exists (in the sense that the left limit lim_{s↑1} (d^n/ds^n) G_X(s) exists).

In either case, we have

  E[X(X − 1)(X − 2) ⋯ (X − n + 1)] = (d^n/ds^n) G_X(s) |_{s=1}.

Proof (sketch). Differentiate (2.3) n times term by term and formally set s = 1; checking the resulting summation amounts to calculating the desired expectation.
The quantities E[X], E[X(X − 1)], E[X(X − 1)(X − 2)], ..., are called the factorial moments of the random variable X. You can get the classical moments from the factorial moments by solving a system of linear equations. It is very simple for the first few:

  E[X] = E[X],
  E[X^2] = E[X(X − 1)] + E[X],
  E[X^3] = E[X(X − 1)(X − 2)] + 3 E[X(X − 1)] + E[X].

A useful identity which follows directly from the above results is the following:

  Var[X] = P''(1) + P'(1) − (P'(1))^2,

where P denotes the generating function of X; it is valid if the first two derivatives of P at 1 exist.

Example 2.14. Let X be a Poisson random variable with parameter λ. Its generating function is given by A(s) = e^{λ(s−1)}. Therefore, (d^n/ds^n) A(1) = λ^n for all n ∈ N_0, and so the sequence (E[X], E[X(X − 1)], E[X(X − 1)(X − 2)], ...) of factorial moments of X is just (λ, λ^2, λ^3, ...). It follows that

  E[X] = λ,  E[X^2] = λ^2 + λ,  E[X^3] = λ^3 + 3λ^2 + λ,  Var[X] = λ.

Example 2.15. We have an urn which contains three numbered balls. We play a repeated game where on each turn we draw a ball, with the following outcomes:

a) If we draw the first ball, we win a dollar, replace the ball in the urn, and then play another round.

b) If we draw the second ball, we win two dollars, replace the ball in the urn, and then play another round.

c) If we draw the third ball, the game is over.

Let X denote the amount of money that we win in this game. The number of rounds that we play has a geometric distribution, so the game ends with probability one and P(X = ∞) = 0. We would like to determine the generating function G_X associated with X.

To do this, we let Z denote the first ball drawn, and we let Y denote the amount of winnings obtained after (but not including the winnings from) the first round. Then the conditional distribution of Y given Z = 1 or Z = 2 is the same as the unconditional distribution of X. That is,

  P(Y = n | Z = 1) = P(Y = n | Z = 2) = P(X = n),  n ∈ N_0.

As a result,

  E[s^Y | Z = 1] = E[s^Y | Z = 2] = E[s^X] = G_X(s),
and

  G_X(s) = E[s^X] = E[s^X | Z = 1] P(Z = 1) + E[s^X | Z = 2] P(Z = 2) + E[s^X | Z = 3] P(Z = 3)
   = E[s^{1+Y} | Z = 1]/3 + E[s^{2+Y} | Z = 2]/3 + E[s^0 | Z = 3]/3
   = s G_X(s)/3 + s^2 G_X(s)/3 + 1/3.

Solving for G_X shows that G_X(s) = 1/(3 − s − s^2). In particular,

  G_X'(s) = (1 + 2s)/(3 − s − s^2)^2  and  G_X''(s) = (8 + 6s + 6s^2)/(3 − s − s^2)^3.

As a result, we can determine a number of properties of X:

  P(X = 0) = G_X(0) = 1/3,  P(X = 1) = G_X'(0) = 1/9,  P(X = 2) = G_X''(0)/2 = 4/27,
  E[X] = G_X'(1) = 3,  E[X^2] = G_X''(1) + G_X'(1) = 23,  Var(X) = 14.

2.5 Random sums

Our next application of generating functions in the theory of stochastic processes deals with so-called random sums. Let (ξ_n)_{n∈N} be a sequence of random variables, and let N be a random time (a random time is simply a T = N_0 ∪ {+∞}-valued random variable). We can define the random variable Y = Σ_{k=0}^N ξ_k by

  Y(ω) = 0 if N(ω) = 0, and Y(ω) = Σ_{k=1}^{N(ω)} ξ_k(ω) if N(ω) ≥ 1, for ω ∈ Ω.

More generally, for an arbitrary stochastic process (X_n)_{n∈N_0} and a random time N (with P[N = +∞] = 0), we define the random variable X_N by X_N(ω) = X_{N(ω)}(ω), for ω ∈ Ω. In general, think of X_N as the value of the stochastic process X taken at the time N, which is itself random. When N is a constant (N = n), X_N is simply equal to X_n. If X_n = Σ_{k=1}^n ξ_k, then X_N = Σ_{k=1}^N ξ_k.

Example 2.16. Let (ξ_n)_{n∈N} be the increments of a symmetric simple random walk (coin tosses), and let N have the distribution

  P(N = 0) = P(N = 1) = P(N = 2) = 1/3,

with N independent of (ξ_n)_{n∈N} (it is very important to specify the dependence structure between N and (ξ_n)_{n∈N} in this setting!). Let us compute the distribution of Y = Σ_{k=0}^N ξ_k in this case. This
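The conclusions of Example 2.15 are easy to test by simulation. Here is a short Python sketch (ours, not part of the notes) that plays the urn game many times and compares the sample mean and P(X = 0) with the values G_X'(1) = 3 and G_X(0) = 1/3 predicted by G_X(s) = 1/(3 − s − s^2):

```python
import random

random.seed(1)

def play_once():
    """One play of the urn game: balls 1 and 2 pay 1 and 2 dollars and the
    game continues; ball 3 ends the game."""
    winnings = 0
    while True:
        ball = random.randint(1, 3)  # each ball is equally likely
        if ball == 1:
            winnings += 1
        elif ball == 2:
            winnings += 2
        else:
            return winnings

n = 100_000
sample = [play_once() for _ in range(n)]
mean = sum(sample) / n          # should be close to E[X] = 3
p0 = sample.count(0) / n        # should be close to P(X = 0) = 1/3
```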
is where we typically use the formula of total probability:

  P(Y = m) = P(Y = m | N = 0) P(N = 0) + P(Y = m | N = 1) P(N = 1) + P(Y = m | N = 2) P(N = 2)
   = (1/3) (1_{{m=0}} + P(ξ_1 = m) + P(ξ_1 + ξ_2 = m)).

When m = 1 (for example), we get P[Y = 1] = (0 + 1/2 + 0)/3 = 1/6. Perform the computation for some other values of m for yourself.

What happens when N and (ξ_n)_{n∈N} are dependent? This will usually be the case in practice, as the value of the time N when we stop adding increments will typically depend on the behavior of the sum itself.

Example 2.17. Let (ξ_n)_{n∈N} be as above. We can think of a situation where a gambler is repeatedly playing the same game, in which a fair coin is tossed and the gambler wins a dollar if the outcome is heads and loses a dollar otherwise. A "smart" gambler enters the game and decides on the following tactic: "Let's see how the first game goes. If I win, I'll quit then and there. If I lose, I'll play another 2 games and hopefully cover my losses." The described strategy amounts to the choice of the random time N as follows:

  N(ω) = 1 if ξ_1 = 1, and N(ω) = 3 if ξ_1 = −1.

Then

  Y(ω) = 1 if ξ_1 = 1, and Y(ω) = −1 + ξ_2 + ξ_3 if ξ_1 = −1.

Therefore,

  P[Y = 1] = P[Y = 1 | ξ_1 = 1] P[ξ_1 = 1] + P[Y = 1 | ξ_1 = −1] P[ξ_1 = −1]
   = 1 · P[ξ_1 = 1] + P[ξ_2 + ξ_3 = 2] P[ξ_1 = −1] = (1/2)(1 + 1/4) = 5/8.

Similarly, we get P[Y = −1] = 1/4 and P[Y = −3] = 1/8. The expectation E[Y] is therefore equal to

  1 · (5/8) + (−1) · (1/4) + (−3) · (1/8) = 0.

This is not an accident: one of the first powerful results of the beautiful theory of martingales states that, no matter how smart a strategy you employ, you cannot beat a fair gamble.
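The "smart" gambler of Example 2.17 is simple to simulate. The sketch below (ours; the function name is hypothetical) draws many independent plays of the strategy and checks that the average winnings are near the theoretical value E[Y] = 0:

```python
import random

random.seed(0)

def smart_gambler():
    """One round of the 'smart' strategy: quit after a win on game one,
    otherwise play two more fair games."""
    xi1 = random.choice([-1, 1])
    if xi1 == 1:
        return 1
    return -1 + random.choice([-1, 1]) + random.choice([-1, 1])

n = 200_000
mean = sum(smart_gambler() for _ in range(n)) / n  # martingale theory: exactly 0
```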
We will return to the general (non-independent) case in the next lecture. For now, let us use generating functions to give a full description of the distribution of Y = Σ_{k=0}^N ξ_k when the time N is independent of the summands.

Proposition 2.18. Let (ξ_n)_{n∈N} be a sequence of independent N_0-valued random variables, all of which share the same distribution and generating function G_ξ(s). Let N be a random time which is independent of (ξ_n)_{n∈N}, with P(N < ∞) = 1 and generating function G_N. Then the generating function G_Y of the random sum Y = Σ_{k=0}^N ξ_k is given by

  G_Y(s) = G_N(G_ξ(s)).

Proof. First, let X_0 = 0 and X_n = Σ_{i=1}^n ξ_i, n ∈ N, denote the sequence of partial sums. Repeated applications of Proposition 2.9 show that G_{X_n} = G_ξ^n (where G_ξ^0(s) = 1). Moreover, Y = Σ_{k=0}^N ξ_k = X_N, so we may apply the tower law for conditional expectation to see that

  E[s^Y] = Σ_{n∈N_0} E[s^Y | N = n] P(N = n) = Σ_{n∈N_0} E[s^{X_n} | N = n] P(N = n) = Σ_{n∈N_0} G_ξ^n(s) P(N = n) = G_N(G_ξ(s)).

Corollary 2.19 (Wald's Identity I). Let (ξ_n)_{n∈N} and N be as in Proposition 2.18. Suppose, also, that E[N] < ∞ and E[ξ_1] < ∞. Then

  E[Σ_{k=0}^N ξ_k] = E[N] E[ξ_1].

Proof. We apply the chain rule to the equality G_Y = G_N ∘ G_ξ to get G_Y'(s) = G_N'(G_ξ(s)) G_ξ'(s). After we let s ↑ 1, we get

  E[Y] = G_Y'(1) = G_N'(G_ξ(1)) G_ξ'(1) = G_N'(1) G_ξ'(1) = E[N] E[ξ_1].

Example 2.20. Every time the Springfield Isotopes play in the league championship, their chance of winning is p ∈ (0, 1). Every time the Isotopes lose the championship, another ξ· years have to pass before they get another chance, where the number of years between two championship appearances has the Poisson distribution with parameter λ > 0; the whole thing stops when they finally win. What is the expected number of years Y between consecutive championship wins?

Let (ξ_n)_{n∈N} be the sequence of independent Poisson(λ) random variables modeling the numbers of years between consecutive championship appearances by the Isotopes, and let N be a Geometric(p) random variable with success probability p, independent of (ξ_n)_{n∈N}, counting the number of losses before the first win. Then

  Y = Σ_{k=0}^N ξ_k.

To compute the expectation of Y, we use Corollary 2.19:

  E[Y] = E[N] E[ξ_1] = ((1 − p)/p) λ.
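Wald's Identity I can be checked by simulation for this example. The sketch below (ours; the inverse-transform Poisson sampler is a standard textbook device, not something from the notes) draws a Geometric(p) number of Poisson(λ) gaps and compares the sample mean with ((1 − p)/p) λ:

```python
import math
import random

random.seed(7)

def sample_poisson(lam):
    """Inverse-transform sampling of a Poisson(lam) variate."""
    u, k = random.random(), 0
    p = math.exp(-lam)
    cdf = p
    while u > cdf:
        k += 1
        p *= lam / k
        cdf += p
    return k

p_win, lam = 0.4, 3.0
n = 100_000
total = 0.0
for _ in range(n):
    # N = number of losses before the first win: Geometric(p_win) on {0, 1, ...}
    N = 0
    while random.random() > p_win:
        N += 1
    total += sum(sample_poisson(lam) for _ in range(N))
mean = total / n  # Wald: E[Y] = E[N] E[xi_1] = (1 - p)/p * lam = 4.5 here
```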
3 Random Walks II

3.1 Stopping times

The last application of generating functions dealt with sums evaluated between 0 and some random time N. Even more important is the case where time's arrow is taken into account: if you think of N as the time you stop adding new terms to the sum, it is usually the case that you are not allowed (or able) to see the values of the terms you would get if you continued adding. Think of an investor in the stock market. If she were clairvoyant, she would sell at the absolute maximum and buy at the absolute minimum, making tons of money in the process. Of course, this is not possible; her decision to stop and sell her stocks can depend only on the information available up to the moment of the decision. So, the mere mortals have to restrict their choices to so-called stopping times.

Definition 3.1. Let (X_n)_{n∈N_0} be a stochastic process. A random variable T taking values in T = N_0 ∪ {+∞} is said to be a stopping time with respect to (X_n)_{n∈N_0} if for each n ∈ N_0 there exists a function G_n : R^{n+1} → {0, 1} such that

  1_{{T=n}} = G_n(X_0, X_1, ..., X_n).

The functions G_n are called the decision functions, and should be thought of as a black box which takes the values of the process (X_n)_{n∈N_0} observed up to the present point and outputs either 0 or 1. The value 0 means "keep going" and 1 means "stop". The whole point is that the decision has to be based only on the available observations, and not on the future ones.

The simplest examples of stopping times are (non-random) deterministic times.

Example 3.2. Just set T = n_0 for some n_0 ∈ N_0 ∪ {+∞} (say T = 5, or T = 723). Deterministic decision functions do not depend on the values of X_0, ..., X_n at all, and the family of decision rules is easy to construct:

  G_n(x_0, x_1, ..., x_n) = 1 if n = n_0, and 0 otherwise.

A gambler who stops gambling after 20 games, no matter what the winnings or losses are, uses such a rule.

An especially interesting case occurs when the value of T depends directly on the evolution of the underlying stochastic process. Probably the most well-known examples of such stopping times are (first) hitting times. They can be defined for general stochastic processes, but we will stick to simple random walks for the purposes of this example. So, let X_n = Σ_{k=0}^n ξ_k be a simple random walk, and let T_l be the first time X hits the level l ∈ N. The set {n ∈ N_0 : X_n = l} is the collection of all time-points at which X visits the level l, and the earliest one (the minimum of that set) is the first hitting time of l. In states of the world ω ∈ Ω in which the level l just never gets reached, i.e., when {n ∈ N_0 : X_n = l} is an empty set, we set T_l(ω) = +∞. More precisely, we use the following slightly nonintuitive but mathematically correct definition:

  T_l = min{n ∈ N_0 : X_n = l}.
The hitting time T_2 of the level l = 2 for a particular trajectory of a symmetric simple random walk is depicted below.

(Figure: a trajectory of a symmetric simple random walk, with the first hitting time T_2 of the level l = 2 marked on the time axis.)

Remark 3.3. In the remainder of this section, we will sometimes write our decision functions as functions of the random variables ξ_1, ξ_2, ..., ξ_n rather than the random variables X_0, X_1, ..., X_n. As knowing the values X_0, ..., X_n is clearly equivalent to knowing the values ξ_1, ..., ξ_n, we are free to use whichever representation is more convenient.

In order to show that T_l is indeed a stopping time, we need to construct the decision functions G_n, n ∈ N_0. Let us start with n = 0. We would have T_l = 0 in the (impossible) case X_0 = l, so

  G_0(x_0) = 1 if x_0 = l, and 0 otherwise,

and we always have G_0(X_0) = 0. How about n ∈ N? For the value of T_l to be equal to exactly n, two things must happen:

(a) X_n = l (the level l must actually be hit at time n), and

(b) X_{n−1} ≠ l, X_{n−2} ≠ l, ..., X_0 ≠ l (the level l has not been hit before).

Therefore,

  G_n(x_0, x_1, ..., x_n) = 1 if x_0 ≠ l, x_1 ≠ l, ..., x_{n−1} ≠ l and x_n = l, and 0 otherwise.

How about something that is not a stopping time? Let n_0 be an arbitrary time-horizon, and let T^M be the last time during 0, 1, ..., n_0 that the random walk visits its maximum during 0, 1, ..., n_0 (see the picture above). If you bought a stock at time t = 0, this is one of the points you would choose to sell it at; anyone able to follow such a rule would have to sell some time before n_0 and would need the ability to predict the future. Let us sketch the proof of the fact that T^M is not a stopping time. Suppose, to the contrary, that it is, and let (G_n) be its family of decision functions. Then, by the definition of the decision functions, we have

  1_{{T^M = n−1}} = G_{n−1}(X_0, X_1, ..., X_{n−1}),

i.e., it should be possible to decide whether T^M = n − 1 without the knowledge of the values of the random walk after time n − 1. Consider the following two trajectories:

  (0, 1, 2, ..., n − 1, n) and (0, 1, 2, ..., n − 1, n − 2).

They differ only in the direction of the last step. They also differ in the fact that T^M = n for the first one and T^M = n − 1 for the second one. The right-hand side of the displayed equality is equal for both trajectories, while the left-hand side equals 0 for the first one and 1 for the second one. A contradiction.
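The decision functions for a hitting time are straightforward to code. The sketch below (ours; the function names are hypothetical) implements G_n for T_l and applies it to a short, hard-coded path:

```python
def decision_function_hit(level):
    """Decision function family for the hitting time T_level:
    G_n(x_0, ..., x_n) = 1 iff the path visits `level` for the first time at n."""
    def G(xs):
        *past, current = xs
        return int(current == level and all(x != level for x in past))
    return G

G = decision_function_hit(2)
path = [0, 1, 0, 1, 2, 1, 2]  # the walk first hits level 2 at time 4

# The rule only ever looks at the path observed so far:
stop_at = next(n for n in range(len(path)) if G(path[:n + 1]) == 1)
```

Note that the later visit to level 2 (at time 6) does not trigger the rule again in this scan, since the first index with G = 1 is returned.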
3.2 Wald's identity II

Having defined the notion of a stopping time, let us try to compute something with it. The random variables (ξ_n)_{n∈N} in the statement of the theorem below are only assumed to be independent of each other and identically distributed; to make things simpler, you can think of (ξ_n)_{n∈N} as the increments of a simple random walk.

Before we state the main result, here is an extremely useful identity.

Proposition 3.4. Let N be an N_0-valued random variable. Then

  E[N] = Σ_{k∈N} P[N ≥ k].

Proof. Clearly, P[N ≥ k] = Σ_{j≥k} P[N = j], so (note what happens to the indices when we switch the sums)

  Σ_{k∈N} P[N ≥ k] = Σ_{k∈N} Σ_{k≤j<∞} P[N = j] = Σ_{j∈N} Σ_{1≤k≤j} P[N = j] = Σ_{j=1}^∞ j P[N = j] = E[N].

Theorem 3.5 (Wald's Identity II). Let (ξ_n)_{n∈N} be a sequence of independent, identically distributed random variables with E|ξ_1| < ∞, and set

  X_n = Σ_{k=1}^n ξ_k,  n ∈ N_0.

If T is an (X_n)_{n∈N_0}-stopping time such that E[T] < ∞, then E[X_T] = E[ξ_1] E[T].

Proof. Here is another way of writing the sum Σ_{k=1}^T ξ_k:

  Σ_{k=1}^T ξ_k = Σ_{k∈N} ξ_k 1_{{k≤T}}.   (3.1)

The idea behind it is simple: add all the values of ξ_k for k ≤ T, and keep adding zeros (since ξ_k 1_{{k≤T}} = 0 for k > T) after that. Taking expectations of both sides and switching E and Σ (this can be justified, but the argument is technical and we omit it here) yields

  E[Σ_{k=1}^T ξ_k] = Σ_{k=1}^∞ E[1_{{k≤T}} ξ_k].

Now let's look at the random variable 1_{{k≤T}} more closely. We have

  1_{{k≤T}} = 1 − 1_{{k>T}} = 1 − 1_{{k−1≥T}} = 1 − Σ_{j=0}^{k−1} 1_{{T=j}} = 1 − Σ_{j=0}^{k−1} G_j(ξ_1, ..., ξ_j),
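The tail-sum identity of Proposition 3.4 can be verified directly for a concrete distribution. The snippet below (ours) does this for a Geometric(p) variable on {0, 1, 2, ...}, truncating the support at a depth where the tail is negligible:

```python
p = 0.25
q = 1 - p
K = 2000  # truncation depth; q**K is negligible

pmf = [q**j * p for j in range(K)]
mean_direct = sum(j * pj for j, pj in enumerate(pmf))           # E[N]
tail_sum = sum(sum(pmf[j] for j in range(k, K)) for k in range(1, K))  # sum_k P(N >= k)
```

Both quantities equal q/p = 3 for this distribution.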
where G_j(ξ_1, ..., ξ_j) is the decision function which corresponds to the event {T = j}. In this way, we see that the random variable 1_{{k≤T}} can be written as a function of the variables (ξ_1, ..., ξ_{k−1}). As the random variables (ξ_1, ..., ξ_k) are independent, the random variables 1_{{k≤T}} and ξ_k are also independent. This means that

  E[Σ_{k=1}^T ξ_k] = Σ_{k=1}^∞ E[1_{{k≤T}}] E[ξ_k] = E[ξ_1] Σ_{k=1}^∞ P(k ≤ T) = E[ξ_1] E[T],

where the last equality follows from Proposition 3.4.

Example 3.6 (Gambler's ruin problem). A gambler starts with x ∈ N dollars and repeatedly plays a game in which he wins a dollar with probability 1/2 and loses a dollar with probability 1/2. He decides to stop when one of the following two things happens:

1. he goes bankrupt, i.e., his wealth hits 0, or

2. he makes enough money, i.e., his wealth reaches some level a > x.

The classical "Gambler's ruin" problem asks the following question: what is the probability that the gambler will make a dollars before he goes bankrupt?

The gambler's wealth (W_n)_{n∈N_0} is modeled by a simple random walk starting from x, whose increments ξ_k = W_k − W_{k−1} are coin tosses. Then W_n = x + X_n, where X_n = Σ_{k=1}^n ξ_k, n ∈ N_0. Let T be the time the gambler stops. We can represent T in two different (but equivalent) ways. On the one hand, we can think of T as the first hitting time of the two-element set {−x, a − x} for the process (X_n)_{n∈N_0}. On the other hand, we can think of T as the smaller of the two hitting times T_{−x} and T_{a−x} of the levels −x and a − x for the random walk (X_n)_{n∈N_0} (remember that W_n = x + X_n, so these two correspond to the hitting times, for the process (W_n)_{n∈N_0}, of the levels 0 and a). In either case, it is quite clear that T is a stopping time (can you write down the decision functions?).

We will see later that the probability that the gambler's wealth will remain strictly between 0 and a forever is zero, so P[T < ∞] = 1. What can we say about the random variable X_T, the gambler's wealth (minus x) at the random time T? Clearly, it is either equal to −x or to a − x, and the probabilities p_0 and p_a with which it takes these values are exactly what we are after in this problem. Since there are no other values X_T can take, we must have

  p_0 + p_a = 1.

Wald's identity gives us a second equation for p_0 and p_a:

  E[X_T] = E[ξ_1] E[T] = 0 · E[T] = 0,  so  0 = E[X_T] = p_0 (−x) + p_a (a − x).

These two linear equations with two unknowns yield

  p_0 = (a − x)/a,  p_a = x/a.

It is remarkable that the two probabilities are proportional to the amounts of money the gambler needs to make (lose) in the two outcomes. Again we see that the gambler cannot extract positive expected value from a fair game. The situation is different when p ≠ 1/2.
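The answer p_a = x/a is easy to confirm by simulation. A minimal sketch (ours, not part of the notes):

```python
import random

random.seed(42)

def ruin_game(x, a):
    """Fair-coin gambler starting at x; True if wealth reaches a before 0."""
    w = x
    while 0 < w < a:
        w += random.choice([-1, 1])
    return w == a

x, a, n = 3, 10, 50_000
p_a_hat = sum(ruin_game(x, a) for _ in range(n)) / n  # theory: x/a = 0.3
```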
3.3 The distribution of the first hitting time T_1

Let (X_n)_{n∈N_0} be a simple random walk with probability p of stepping up, and let

  T_l = min{n ∈ N_0 : X_n = l}

be the first hitting time of the level l ∈ N. (Recall that the minimum of an empty set is taken to be +∞ by convention.) We will now study the random variables T_l using generating-function methods. We will only handle the levels l = 1 and l = 2 here, but the general case follows in much the same way. The first step is contained in the following proposition.

Proposition 3.7. Let (X_1, X_2, ...) be a random walk with probability p of moving up, and let T_1 and T_2 be the first hitting times of the levels 1 and 2. Then

  P(T_2 = m + n | T_1 = m) = P(T_1 = n),  m, n ∈ N_0,

and consequently G_{T_2}(s) = G_{T_1}^2(s), where G_T denotes the generating function of T.

Proof. The strategy is to show the displayed conditional-probability identity; the conclusion about the generating functions then follows from Proposition 2.12. To do this, for m < n, consider the events

  A_{m,n} = "n is the first time after m that X hits X_m + 1" = {X_i ≤ X_m for all m ≤ i < n, and X_n = X_m + 1}.

These events have the following two properties:

1. P(A_{i,j}) = P(A_{n+i,n+j}) for all n ∈ N_0.

2. If i < j ≤ k < l, then the events A_{i,j} and A_{k,l} are independent.

Both properties follow from the fact that the event A_{i,j} depends only on the steps ξ_{i+1}, ξ_{i+2}, ..., ξ_j of the random walk, and all the steps of the random walk are independent. More precisely, we can choose a set B ⊆ {−1, 1}^{j−i} such that

  A_{i,j} = {(ξ_{i+1}, ξ_{i+2}, ..., ξ_j) ∈ B}  and  A_{n+i,n+j} = {(ξ_{n+i+1}, ξ_{n+i+2}, ..., ξ_{n+j}) ∈ B},

i.e., A_{n+i,n+j} is determined by the exact same formula applied to the shifted steps. As the random vectors (ξ_{i+1}, ..., ξ_j) and (ξ_{n+i+1}, ..., ξ_{n+j}) have the same joint distribution, the events A_{i,j} and A_{n+i,n+j} must have the same probability. Similarly, A_{i,j} is determined by (ξ_{i+1}, ..., ξ_j), while A_{k,l} only depends on the steps ξ_{k+1}, ..., ξ_l, so the two events are independent.

Now observe that {T_1 = m} = A_{0,m} and that, on the event {T_1 = m}, we have {T_2 = m + n} = A_{m,m+n}. So, for m, n ∈ N,

  P(T_2 = m + n | T_1 = m) = P(A_{m,m+n} | A_{0,m}) = P(A_{m,m+n}) = P(A_{0,n}) = P(T_1 = n).

It then follows from Proposition 2.12 that G_{T_2}(s) = G_{T_1}^2(s).

We will now use the previous relationship between T_1 and T_2 to obtain the generating function of T_1 explicitly. We will essentially follow the approach of Example 2.15: we will attempt to determine an equation that the generating function associated with the random variable T_1 satisfies.
Proposition 3.8. Let (X_n)_{n∈N_0} be a random walk with probability p of moving up, and let T_1 = min{n ∈ N_0 : X_n = 1} denote the first time that the random walk hits the level 1. Then the generating function of T_1 is given by

  G_{T_1}(s) = (1 − √(1 − 4pqs^2)) / (2qs).

Proof. Our strategy is to condition on the first move that the random walk makes, and then derive a recursive formula for the generating function. It will be useful to consider the auxiliary process

  Y_n = X_{n+1} − X_1,  n ∈ N_0,

which corresponds to the changes in the process X after the first step. It is not hard to check that Y is also a random walk with probability p of moving up at each step, and that Y is independent of X_1. It also turns out to be useful to consider the random variable

  T_2 = "the first time that Y hits the level 2" = min{n ∈ N_0 : Y_n = 2}.

As Y is a random walk, the generating function of T_2 is G_{T_1}^2 by Proposition 3.7; and as T_2 is determined by Y, it is also independent of X_1.

Crucial observation: if the first step that X makes is down, then T_1 = 1 + T_2. This is because X now has to climb two steps up to get from −1 to 1, which is equivalent to Y climbing two steps up from 0 to 2. The other thing to realize is that Y runs on a clock that is one time step behind X: if Y first hits 2 at time T_2 relative to its own clock, then X first hits 1 at time 1 + T_2 relative to the initial clock.

Now we make the recursive argument:

  G_{T_1}(s) = E[s^{T_1}] = E[s^{T_1} | X_1 = 1] P(X_1 = 1) + E[s^{T_1} | X_1 = −1] P(X_1 = −1)
   = s p + E[s^{1+T_2} | X_1 = −1] (1 − p)
   = s p + s E[s^{T_2}] (1 − p)
   = s p + s G_{T_1}^2(s) (1 − p).

We now know that G_{T_1} solves

  G_{T_1}(s) = sp + sq G_{T_1}^2(s)  for |s| < 1.

For each s, this quadratic equation has two possible solutions,

  G_{T_1}(s) = (1 ± √(1 − 4pqs^2)) / (2qs).

The solution with the "+" sign is always greater than 1 in absolute value, so it cannot correspond to a value of a generating function, and we must select the negative square root.
Once we have the generating function of T_1, we can begin to answer some questions about this random variable. The first question is: does the random walk eventually hit the level 1? To answer it, we use the second part of Proposition 2.7 and compute

  P(T_1 < ∞) = lim_{s↑1} G_{T_1}(s) = (1 − √(1 − 4pq)) / (2q) = (1 − |p − q|) / (2q)
   = 1 if p ≥ 1/2, and p/q if p < 1/2,

where we have used the fact that

  (p − q)^2 = (2p − 1)^2 = 4p^2 − 4p + 1 = 1 − 4pq.

It is remarkable that if p = 1/2, the random walk will always hit the level 1 sooner or later; but if the random walk is more likely to go down than up, i.e., if p < 1/2, then there is some strictly positive chance that it will never hit the level 1: P[T_1 = +∞] > 0. What we have here is an example of a phenomenon known as criticality: many physical systems exhibit qualitatively different behavior depending on whether the value of a certain parameter p lies above or below a certain critical value p = p_c.

Another question that generating functions can help us answer is: how long, on average, do we need to wait before 1 is hit? When p < 1/2, we have P[T_1 = +∞] > 0, so we can immediately conclude that E[T_1] = +∞. The case p ≥ 1/2 is more interesting. Following the recipe from the lecture on generating functions, we compute the derivative of G_{T_1} and get

  G_{T_1}'(s) = 2p/√(1 − 4pqs^2) − (1 − √(1 − 4pqs^2))/(2qs^2).

When p = 1/2 (so that 4pq = 1), we get

  lim_{s↑1} G_{T_1}'(s) = lim_{s↑1} [ 1/√(1 − s^2) − (1 − √(1 − s^2))/s^2 ] = +∞,

and conclude that E[T_1] = +∞. For p > 1/2, setting s = 1 gives

  E[T_1] = 2p/(p − q) − (1 − (p − q))/(2q) = 2p/(p − q) − 1 = 1/(p − q).

We can summarize the situation in the following table:

            P[T_1 < ∞]   E[T_1]
  p < 1/2:  p/q          +∞
  p = 1/2:  1            +∞
  p > 1/2:  1            1/(p − q)

Finally, we can try to extract the probability mass function of the random variable T_1 from the generating function G_{T_1}. The obvious way to do this is to compute higher and higher derivatives of G_{T_1} and then set s = 0; it turns out, however, that there is an easier way.
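The first row of the table can be checked by simulation. The sketch below (ours; the truncation depth is a practical compromise, since the neglected probability is exponentially small) estimates P(T_1 < ∞) for p = 0.4 and compares it with the closed form:

```python
import random

random.seed(3)

def hits_one(p, max_steps=1000):
    """Run a walk with up-probability p until it hits 1 or we give up."""
    x = 0
    for _ in range(max_steps):
        x += 1 if random.random() < p else -1
        if x == 1:
            return True
    return False

p, q = 0.4, 0.6
n = 20_000
frac = sum(hits_one(p) for _ in range(n)) / n
closed_form = (1 - (1 - 4 * p * q) ** 0.5) / (2 * q)  # = p/q = 2/3 here
```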
9. Inuitively.The square root appearing in the formula for GX is an expression of the form (1 + x)1/2 and the (generalized) binomial formula can be used: ∞ (1 + x) = k=0 α α k x . then (Yn )n∈N0 is also a random walk with parameter p. 2q k Of course. If we deﬁne the process Yn = XT +n − XT . we stop. . the random walk cannot move from 0 to 1 in a even number of steps. It turns out this is true not only for deterministic times. . 2k − 1 k k ∈ N. k! Therefore. n ∈ N0 . k ∈ N. the resulting process that we obtain is again a random walk. (α − k + 1) . GT1 (s) = and a2k−1 = 1 1 − 2qs 2qs ∞ k=0 1/2 (−4pqs2 )k = k ∞ s2k−1 k=1 1 1/2 (4pq)k (−1)k−1 . where k α k = α(α − 1) . 2p k 1/2 1 (4pq)k (−1)k−1 . Let (Xn )n∈N0 be a random walk with parameter p. P(T1 = 2k − 1) = and P(T1 = k) = 0 when k is even. we observed that Yn = X1+n − Xn is a again a random walk. so an = 0 if n is even. 3. and then continue.4 Strong Markov property In the previous section. This in fact is a result of the “strong Markov” property which we will deﬁne later in the course. α ∈ R. k ∈ N. This expression can be simpliﬁed a bit further: one can show (by induction on k for instance) that 1/2 2(−1)k+1 2k − 1 = k . 1 2k − 1 k k−1 p q . k k 4 (2k − 1) Thus. Proposition 3. but even for stopping times. reset time and space. let T be a stopping time with respect to (Xn )n∈N0 which never takes the value ∞. 26 .
eugenicist. This distribution.1 children. (b) If Z1 > 0. A population starts with one individual at time n = 0: Z0 = 1. (a) If Z1 happens to be equal to 0 the population is dead and nothing happens at any future time n ≥ 2. protogeneticist. Z1 one. Was it just unfounded paranoia.i and Z1 . gives birth to Z1. th The last. anthropologist. By the end of this section. The total number of individuals in the second generation is now Z1 Z2 = k=1 Z1. or did something real prompt them to come to this conclusion? They decided to ask around. Sir Francis Galton 4. 27 . psychometrician and statistician” and halfcousin of Charles Darwin) posed the following question (1873. inventor. The ﬁrst one has Z1. 2. you will be able to give a precise answer to Galton’s question.4 4. shared by all Zn. geographer. 3. Educational Times): How many male children (on average) must each generation of a family have in order for the family name to continue in perpetuity? The ﬁrst complete answer came from Reverend Henry William Watson soon after.Z1 children. Z1 is an N0 valued random variable. After one unit of time (at time n = 1) the sole individual produces Z1 identical clones of itself and dies. each of Z1 individuals gives birth to a random number of children and dies. We assume that the distribution of the number of children is the same for each individual in every generation and independent of either the number of individuals in the generation and of the number of children the others have. the second one Z1. and Sir Francis Galton (a “polymath. tropical explorer.k .1 Branching Process A bit of history In the mid 19th century several aristocratic families in Victorian England realized that their family names could become extinct. and the two wrote a joint paper entitled One the probability of extinction of families in 1874. is called the oﬀspring distribution. a unit of time later. etc.2 children. meteorologist.2 A mathematical model The model proposed by Watson was the following: 1.
the population is extinct. the increment Zn+1 − Zn depends on Zn : the more individuals in a generation.k . Deﬁnition 4. we need a black box with two inputs “randomness” and Zn . In words. 1). We think of (Un )n∈N0 as a sequence of random numbers produced by a computer. 1] → N0 such that the random variable g(U ) has probability mass function p when U ∼ Uniform(0.an accumulation of the ﬁrst n elements of (ηn )n∈N0 would give you the value Xn of the random walk at time n. When we studies simulation. Branching processes are a bit more complicated. i.i }n∈N0 .i∈N .(c) The third. You can think of Zn. A stochastic process with the properties described in (1). Such an abundance allows us to feed the whole “row” {Zn. The mechanism that produces the next generation from the present one can diﬀer from application to application.i as the number of children the ith individual in the nth generation would have had 2 Can you ﬁnd a onetoone and onto mapping from N into N × N? 28 .e. it is easier to be more wasteful. then Zm = 0 for all m ≥ n . and sum up the results to get Zn+1 . instead of one sequence of independent random variables with pmf p. 1] random variables exists. we would need exactly Zn (unused) elements of (ηn )n∈N0 to simulate the number of children for each of Zn members of generation n. we would be done at this point . generations are produced in the same way. Mathematically. one would draw Zn simulations from the distribution (pn )n∈N0 . In other words. we showed that it is possible to construct a function g : [0. k ∈ N0 . (2) and (3) above is called a (simple) branching process. What do we mean by “randomness”? Ideally. Zn Zn+1 = k=1 Zn. With this new formalism. In the case of a simple random walk. This is exactly how one would do it in practice: given the size Zn of generation n. let us ﬁgure out how to simulate a branching process. fourth.which will produce Zn+1 . 
4.3 Construction and simulation of branching processes

Before we answer Galton's question, let us figure out how to simulate a branching process. The mechanism that produces the next generation from the present one can differ from application to application; mathematically, we need a black box with two inputs, "randomness" and Z_n, which will produce Z_{n+1}.

What do we mean by "randomness"? Ideally, we think of (U_n)_{n∈N_0} as a sequence of random numbers produced by a computer. Some time ago we asserted that a probability space which supports a sequence (U_n)_{n∈N_0} of independent U[0,1] random variables exists. When we studied simulation, we showed that it is possible to construct a function g : [0,1] → N_0 such that the random variable g(U) has probability mass function p when U ~ Uniform(0,1). Let us first apply the function g to each member of (U_n)_{n∈N_0} to obtain an independent sequence (η_n)_{n∈N_0} of N_0-valued random variables with pmf p.

In the case of a simple random walk, we would be done at this point: an accumulation of the first n elements of (η_n)_{n∈N_0} would give you the value X_n of the random walk at time n. Branching processes are a bit more complicated: we would need exactly Z_n (unused) elements of (η_n)_{n∈N_0} to simulate the number of children for each of the Z_n members of generation n, and we would sum up the results to get Z_{n+1}. This is exactly how one would do it in practice: given the size Z_n of generation n, one would draw Z_n simulations from the distribution with pmf p and sum them.

Mathematically, it is easier to be more wasteful: instead of one sequence of independent random variables with pmf p, we use a sequence of sequences. The sequence (η_n)_{n∈N_0} can be rearranged into a double sequence² {Z_{n,i}}_{n∈N_0, i∈N}. Such an abundance allows us to feed the whole "row" {Z_{n,i}}_{i∈N} into the black box which produces Z_{n+1} from Z_n:

  Z_0 = 1,  Z_{n+1} = Σ_{k=1}^{Z_n} Z_{n,k}.

The black box uses only the first Z_n elements of {Z_{n,i}}_{i∈N} and discards the rest. You can think of Z_{n,i} as the number of children the i-th individual in the n-th generation would have had, had she been born.

² Can you find a one-to-one and onto mapping from N to N × N?
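The "practical" simulation scheme described above is easy to sketch in code. In the following Python sketch, the function names and the concrete offspring pmf are my own illustrative choices, not part of the notes; `sample_offspring` plays the role of the function g (inverse-transform sampling), and `simulate_branching` implements the recursion Z_{n+1} = Σ_{k=1}^{Z_n} Z_{n,k}.

```python
import random

def sample_offspring(p, u):
    """Inverse-transform sampler: the function g from the notes.
    p is the offspring pmf as a list [p(0), p(1), ...]; u is a Uniform(0,1) draw."""
    cum = 0.0
    for k, pk in enumerate(p):
        cum += pk
        if u < cum:
            return k
    return len(p) - 1  # guard against floating-point round-off

def simulate_branching(p, n_steps, rng=random.random):
    """Simulate Z_0, ..., Z_{n_steps}: each of the Z_n individuals draws an
    independent number of children from the pmf p, and the counts are summed."""
    z = 1  # Z_0 = 1
    path = [z]
    for _ in range(n_steps):
        z = sum(sample_offspring(p, rng()) for _ in range(z))
        path.append(z)
    return path

# Hypothetical offspring pmf: p(0) = 1/4, p(1) = 1/4, p(2) = 1/2.
print(simulate_branching([0.25, 0.25, 0.5], 10))
```

Note that once Z_n hits 0, the sum is empty and the population stays at 0 forever, exactly as in property (c) of the model.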
4.4 A generating-function approach

Having defined and constructed a branching process (Z_n)_{n∈N_0} with offspring distribution given by the pmf p, let us analyze its probabilistic structure. (Once we learn a bit more about it, we will also describe another way to simulate the process.) The first question that needs to be answered is the following: what is the distribution of Z_n, for n ∈ N_0? It is clear that Z_n must be N_0-valued, so its distribution is completely described by its pmf, which is, in turn, completely determined by its generating function. While an explicit expression for the pmf of Z_n may not be available, its generating function can always be computed:

Proposition 4.2. Let (Z_n)_{n∈N_0} be a branching process with offspring distribution p, and let the generating function of its offspring distribution be G(s). Then the generating function of Z_n is the n-fold composition of G with itself:

  G_{Z_n}(s) = G(G(... G(s) ...))   (n G's).

Proof. For n = 1, the distribution of Z_1 has pmf p, so G_{Z_1}(s) = G(s). Suppose that the statement of the proposition holds for some n ∈ N. Then

  Z_{n+1} = Σ_{i=1}^{Z_n} Z_{n,i},

where all the {Z_{n,i}}_{i∈N} are independent of each other and have the same distribution with pmf p. In other words, Z_{n+1} can be viewed as a random sum of Z_n independent random variables, where each summand has generating function G and the number of summands Z_n is independent of the terms in the sum. Proposition 2.18 asserts that the generating function G_{Z_{n+1}} of Z_{n+1} is the composition of the generating function G_{Z_n} of the random time Z_n with the generating function G(s) of each of the summands. Therefore,

  G_{Z_{n+1}}(s) = G_{Z_n}(G(s)) = G(G(... G(s) ...))   (n + 1 G's),

where the second equality follows by the induction hypothesis.
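Proposition 4.2 translates directly into code: to evaluate G_{Z_n}(s), compose G with itself n times. A minimal sketch, where the helper names and the concrete offspring pmf are hypothetical choices of mine:

```python
def offspring_gf(p):
    """Generating function G(s) = sum_k p(k) * s**k of the offspring pmf p."""
    return lambda s: sum(pk * s**k for k, pk in enumerate(p))

def generation_gf(p, n):
    """G_{Z_n}(s): the n-fold composition of G with itself (Proposition 4.2)."""
    G = offspring_gf(p)
    def GZn(s):
        for _ in range(n):
            s = G(s)
        return s
    return GZn

# Hypothetical offspring pmf: p(0) = 1/2, p(2) = 1/2.
GZ3 = generation_gf([0.5, 0.0, 0.5], 3)
print(GZ3(0.0))  # this value equals P[Z_3 = 0]
```

Evaluating the composed function at s = 0 yields P[Z_n = 0], a fact that becomes central in Section 4.5.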
Let us use Proposition 4.2 in some simple examples.

Example 4.3. In the first three examples no randomness occurs and the population growth can be described exactly. In the other examples, more interesting things happen.

1. p(0) = 1, p(n) = 0 for n ∈ N: In this case Z_0 = 1 and Z_n = 0 for all n ∈ N. This infertile population dies out after the first generation.

2. p(1) = 1, p(n) = 0 for n ≠ 1: Each individual has exactly one child. The population size is always 1: Z_n = 1 for all n ∈ N_0, and G(s) = s, so G_{Z_n}(s) = s as well.

3. p(k) = 1 for some k ≥ 2, p(n) = 0 for n ≠ k: Here there are k children per individual, so the population grows exponentially: Z_n = k^n. The generating function of the offspring distribution is G(s) = s^k, so

  G_{Z_n}(s) = ((... (s^k)^k ...)^k) = s^(k^n)   (n compositions).
Therefore.2. The probability of that event is q n . 3. p(n) = 0. n ∈ N0 . GZn (s) = (p + q(p + q(p + q(. so A + B = 1. If you inspect the expression for GZn even more closely. Of course. The generating function G of the oﬀspring distribution (pn )n∈N is given by G(s) = (p + qs)2 . n ≥ 2: Each individual tosses a (a biased) coin and has one child of the outcome is heads or dies childless if the outcome is tails. n pairs of parentheses The expression above can be simpliﬁed considerably. (p + qs))))) . (sk )k . for some k ≥ 2: Here. . The population size is always 1: Zn = 1. p(1) = 2pq. p(k) = 1. so the population grows exponentially: G(s) = sk . GZn (s) = (1 − q n ) + q n s. . Then GZn = (p + q(p + q(. . Therefore. it is not so easy to simplify the above expression. n ≥ 3: In this case each individual has exactly two children and their gender is female with probability p and male with probability q. . so n GZn (s) = ((. n ≥ k. 5. . 30 . . . Zn is just the number of individuals carrying the family name after n generations. . n pairs of parentheses Unlike the example above. but its gender is determined at random . Each individual has exactly one child. p(n) = 0. p(0) = 0. p(0) = p. and assuming that all of them marry. p(0) = 0. for some A. This example can be interpreted alternatively as follows. .2 can be used to compute the mean and variance of the population size Zn . One needs to realize two things: (a) After all the products above are expanded. .male with probability q and female with probability p. for n ∈ N. The generating function of the oﬀspring distribution is G(s) = p + qs. p + qs)2 . independently of each other. . p(n) = 0. . for n ∈ N. there are k kids per individual. B. the value of Zn will be equal to 1 if and only if all of the cointosses of its ancestors turned out to be heads. p(n) = 0. Zn = k n . )k )k = sk . (b) GZn is a generating function of a probability distribution. )2 )2 . 4. p(1) = 0. p(2) = q 2 . . 
5. p(0) = p², p(1) = 2pq, p(2) = q², p(n) = 0 for n ≥ 3: In this case each individual has exactly two children, but each child's gender is determined at random, independently of the others: male with probability q and female with probability p. Assuming that all females change their last name when they marry, and that all of them marry, Z_n is just the number of individuals carrying the family name after n generations. The generating function G of the offspring distribution is given by G(s) = (p + qs)², so

  G_{Z_n}(s) = (p + q(p + q(... (p + qs)²)² ...)²)²   (n pairs of parentheses).

Unlike in the example above, it is not so easy to simplify this expression.
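The closed-form answers in examples 3 and 4 give convenient sanity checks of Proposition 4.2: composing s ↦ s^k with itself n times should give s^(k^n), and composing s ↦ p + qs should give (1 − q^n) + q^n s. A quick numerical check, where the function name and the parameter values are my own illustrative choices:

```python
def compose_n(G, n, s):
    """Apply the offspring generating function G to s, n times (Proposition 4.2)."""
    for _ in range(n):
        s = G(s)
    return s

# Example 3: G(s) = s**k composed n times is s**(k**n).
k, n, s = 2, 3, 0.5
print(compose_n(lambda t: t**k, n, s), s**(k**n))

# Example 4: G(s) = p + q*s composed n times is (1 - q**n) + q**n * s.
p, q, n, s = 0.3, 0.7, 5, 0.4
print(compose_n(lambda t: p + q * t, n, s), (1 - q**n) + q**n * s)
```

Each pair of printed numbers agrees (up to floating-point round-off), as the closed forms predict.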
Even if the explicit form of the generating function G_{Z_n} is not known, Proposition 4.2 can be used to compute the mean and variance of the population size Z_n:

Proposition 4.4. Let p denote the pmf of the offspring distribution of a branching process (Z_n)_{n∈N_0}. If p is integrable, i.e., if

  µ = Σ_{k=0}^∞ k p(k) < ∞,

then

  E[Z_n] = µ^n, for n ∈ N_0.        (4.2)

If the variance of p is also finite, i.e., if

  σ² = Σ_{k=0}^∞ (k − µ)² p(k) < ∞,

then

  Var[Z_{n+1}] = σ² µ^n (1 + µ + µ² + · · · + µ^n) = σ² µ^n (1 − µ^{n+1}) / (1 − µ) if µ ≠ 1, and = σ² (n + 1) if µ = 1.        (4.3)

Proof. Since the distribution of Z_1 has probability mass function p, it is clear that E[Z_1] = µ and Var[Z_1] = σ². We proceed by induction and assume that the formulas (4.2) and (4.3) hold for n ∈ N. By Proposition 4.2, the generating function G_{Z_{n+1}} is given as the composition G_{Z_{n+1}}(s) = G_{Z_n}(G(s)). Therefore, if we use the identity E[Z_{n+1}] = G'_{Z_{n+1}}(1), we get

  G'_{Z_{n+1}}(1) = G'_{Z_n}(G(1)) G'(1) = G'_{Z_n}(1) G'(1) = E[Z_n] E[Z_1] = µ^n µ = µ^{n+1}.

A similar (but more complicated and less illuminating) argument can be used to establish (4.3).

4.5 Extinction probability

We now turn to the central question, the one posed by Galton. We define extinction to be the following event:

  E = {ω ∈ Ω : Z_n(ω) = 0 for some n ∈ N},

and the number P[E] is called the extinction probability. It follows from the properties of the branching process that Z_m = 0 for all m ≥ n whenever Z_n = 0. Therefore, we can write E as an increasing union of the sets E_n, where E_n = {ω ∈ Ω : Z_n(ω) = 0}. In particular, the sequence (P[E_n])_{n∈N} is nondecreasing, and "continuity of probability" implies that

  P[E] = lim_{n→∞} P[E_n].

Using generating functions, and, in particular, the fact that P[E_n] = P[Z_n = 0] = G_{Z_n}(0), we get

  P[E] = lim_{n→∞} G_{Z_n}(0) = lim_{n→∞} G(G(... G(0) ...))   (n G's).

It is amazing that this probability can be computed.
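The limit P[E] = lim_n G(G(... G(0) ...)) suggests a simple numerical scheme: start at x = 0 and apply G repeatedly until the value stabilizes. A sketch, where the function name and the offspring pmf p(0) = 1/4, p(2) = 3/4 are hypothetical choices of mine (for this pmf the fixed points of G are 1/3 and 1):

```python
def extinction_probability(G, tol=1e-12, max_iter=10_000):
    """Approximate P[E] = lim_n G(G(...G(0)...)) by iterating G starting from 0."""
    x = 0.0
    for _ in range(max_iter):
        x_next = G(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

# Hypothetical offspring pmf p(0) = 1/4, p(2) = 3/4, so G(s) = 1/4 + (3/4) s^2.
G = lambda s: 0.25 + 0.75 * s * s
print(extinction_probability(G))  # converges to 1/3, the smaller fixed point
```

Starting the iteration at 0 is what singles out the smallest fixed point; this is exactly the phenomenon that the next proposition makes precise.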
Proposition 4.5. The extinction probability p̄ = P[E] is the smallest nonnegative solution of the equation

  x = G(x),

called the extinction equation, where G is the generating function of the offspring distribution.

Proof. Let us show first that p̄ = P[E] is a solution of the equation x = G(x). G is a continuous function on [0,1], so G(lim_{n→∞} x_n) = lim_{n→∞} G(x_n) for every convergent sequence (x_n)_{n∈N_0} in [0,1]. Let us take the particular sequence given by

  x_n = G(G(... G(0) ...))   (n G's).

Then p̄ = P[E] = lim_{n→∞} x_n and G(x_n) = x_{n+1}, so

  p̄ = lim_{n→∞} x_n = lim_{n→∞} x_{n+1} = lim_{n→∞} G(x_n) = G(lim_{n→∞} x_n) = G(p̄),

and so p̄ solves the equation G(x) = x.

The fact that p̄ = P[E] is the smallest solution of x = G(x) on [0,1] is a bit trickier to get. Let p′ be another solution of x = G(x) on [0,1]. Since 0 ≤ p′ and G is a nondecreasing function on [0,1], we have G(0) ≤ G(p′) = p′. We can apply the function G to both sides of this inequality to get G(G(0)) ≤ G(G(p′)) = G(p′) = p′. Continuing in the same way, we get

  P[E_n] = G(G(... G(0) ...)) ≤ p′   (n G's),

so p̄ = P[E] = lim_{n→∞} P[E_n] ≤ p′. Therefore, p̄ is not larger than any other solution p′ of x = G(x).

Example 4.6. Let us compute the extinction probabilities in the cases from Example 4.3.

1. p(0) = 1, p(n) = 0 for n ∈ N: No need to use any theorems; the situation is clear: P[E] = 1.

2. p(1) = 1, p(n) = 0 for n ≠ 1: Like above, no theorems are needed. The population size is always 1, so P[E] = 0.

3. p(k) = 1 for some k ≥ 2, p(n) = 0 for n ≠ k: No extinction here either: P[E] = 0.
4. p(0) = p, p(1) = q = 1 − p, p(n) = 0 for n ≥ 2: Since G(s) = p + qs, the extinction equation is s = p + qs. If p > 0, the only solution is s = 1, so the extinction is guaranteed. If p = 0, then q = 1 and every s solves the equation; the smallest nonnegative solution is s = 0, so no extinction occurs. It is interesting to note the jump in the extinction probability as p changes from 0 to a positive number.

5. p(0) = p², p(1) = 2pq, p(2) = q², p(n) = 0 for n ≥ 3: Here G(s) = (p + qs)², so the extinction equation reads s = (p + qs)². This is a quadratic equation in s and, if we assume that q > 0, its solutions are s₁ = 1 and s₂ = p²/q². When p ≥ q, s = 1 is the smallest solution in [0,1]; when p < q, the smaller of the two is s₂. Therefore,

  P[E] = min(1, p²/q²).
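The formula P[E] = min(1, p²/q²) from example 5 can be cross-checked against the iteration P[E] = lim_n G(G(... G(0) ...)) from Section 4.5. The function names below are my own:

```python
def extinction_by_iteration(G, n_iter=200):
    """Approximate P[E] as the n_iter-fold composition G(G(...G(0)...))."""
    x = 0.0
    for _ in range(n_iter):
        x = G(x)
    return x

def extinction_two_children(p):
    """Extinction probability for the pmf p(0)=p^2, p(1)=2pq, p(2)=q^2, q = 1-p,
    whose generating function is G(s) = (p + q*s)**2."""
    q = 1.0 - p
    return extinction_by_iteration(lambda s: (p + q * s) ** 2)

for p in (0.3, 0.6):  # name survives with positive probability vs. certain extinction
    q = 1.0 - p
    print(p, extinction_two_children(p), min(1.0, (p / q) ** 2))
```

For p = 0.3 (more boys than girls) the iteration settles at p²/q² = 9/49 < 1, so the family name survives with probability 40/49; for p = 0.6 it settles at 1, certain extinction, matching min(1, p²/q²).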