University of California, Santa Barbara
Gerard Brunick
Last Updated: February 2, 2012
These notes are a minor modification of a set of notes which were generously shared with the present “author” by Gordan Žitković, who currently works in the Department of Mathematics at The University of Texas at Austin. Any mistakes in these notes were undoubtedly introduced by the present author when he modified the original presentation.
Contents
1 Random Walks I 2
1.1 Stochastic processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 The canonical probability space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Constructing the random walk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 The reflection principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2 Generating Functions 9
2.1 Generating functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 Associating a generating function with a random variable . . . . . . . . . . . . . . . 10
2.3 Convolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.4 Moments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.5 Random sums . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3 Random Walks II 19
3.1 Stopping times . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.2 Wald’s identity II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.3 The distribution of the first hitting time T_1 . . . . . . . . . . . . . . . . . . . . . . . 23
3.4 Strong Markov property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4 Branching Process 27
4.1 A bit of history . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.2 A mathematical model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.3 Construction and simulation of branching processes . . . . . . . . . . . . . . . . . . . 28
4.4 A generating-function approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.5 Extinction probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
1 Random Walks I
1.1 Stochastic processes
Definition 1.1. Let T be a subset of [0, ∞). A family of random variables (X_t)_{t∈T}, indexed by T, is called a stochastic (or random) process. When T = N (or T = N_0), (X_t)_{t∈T} is said to be a discrete-time process, and when T = [0, ∞), it is called a continuous-time process.
When T is a singleton (say T = {1}), the process (X_t)_{t∈T} ≡ X_1 is really just a single random variable. When T is finite (e.g., T = {1, 2, . . . , n}), we get a random vector. Therefore, stochastic processes are generalizations of random vectors. The interpretation is, however, somewhat different. While the components of a random vector usually (not always) stand for different spatial coordinates, the index t ∈ T is more often than not interpreted as time. Stochastic processes usually model the evolution of a random system in time. When T = [0, ∞) (continuous-time processes), the value of the process can change every instant. When T = N (discrete-time processes), the changes occur discretely.
In contrast to the case of random vectors or random variables, it is not easy to define a notion of a density (or a probability mass function) for a stochastic process. Without going into details about why exactly this is a problem, let me just mention that the main culprit is the infinity. One usually considers a family of (discrete, continuous, etc.) finite-dimensional distributions, i.e., the joint distributions of random vectors

(X_{t_1}, X_{t_2}, . . . , X_{t_n}),

for all n ∈ N and all choices t_1, . . . , t_n ∈ T.

The notion of a stochastic process is very important both in mathematical theory and its applications in science, engineering, economics, etc. It is used to model a large number of various phenomena where the quantity of interest varies discretely or continuously through time in a non-predictable fashion.
Every stochastic process can be viewed as a function of two variables, t and ω. For each fixed t, ω → X_t(ω) is a random variable, as postulated in the definition. However, if we change our point of view and keep ω fixed, we see that the stochastic process is a function mapping ω to the real-valued function t → X_t(ω). These functions are called the trajectories of the stochastic process X. The following two figures show two possible trajectories of a simple random walk¹, i.e., each one corresponds to a (different) frozen ω ∈ Ω, but t varies from 0 to 30.
[Two figures: sample trajectories of a simple random walk.]
¹We will define the simple random walk later. For now, let us just say that it behaves as follows. It starts at x = 0 for t = 0. After that, a (possibly biased) coin is tossed and we move up (to x = 1) if heads is observed and down (to x = −1) if we see tails. The procedure is repeated at t = 1, 2, . . . and the position at t + 1 is determined in the same way, independently of all the coin tosses before (note that the position at t = k can be any of the following: x = −k, x = −k + 2, . . . , x = k − 2, x = k).
Unlike the figures above, the next two pictures show two time-slices of the same random process; in each graph, the time t is fixed (t = 15 vs. t = 25), but the various values the random variables X_15 and X_25 can take are presented through their probability mass functions.
[Figure 1: Probability mass function for X_15]
[Figure 2: Probability mass function for X_25]
1.2 The canonical probability space
When one deals with infinite-index (|T| = +∞) stochastic processes, the construction of the probability space (Ω, F, P) to support a given model is usually quite a technical matter. This course does not suffer from that problem because all our models can be implemented on a special probability space. We start with the sample space Ω:

Ω = [0, 1] × [0, 1] × · · · = [0, 1]^{N_0},
and any generic element of Ω will be a sequence ω = (ω_0, ω_1, ω_2, . . . ) of real numbers in [0, 1]. For n ∈ N_0 we define the mapping U_n : Ω → [0, 1] which simply chooses the nth coordinate:

U_n(ω) = ω_n.
The proof of the following theorem can be found in most advanced probability books (e.g. [1] Thm.
20.4):
Theorem 1.2. There exists a probability measure P on Ω such that

1. each U_n, n ∈ N_0, is a random variable with the uniform distribution on [0, 1], and

2. the sequence (U_n)_{n∈N_0} is independent.
Remark 1.3. One should think of the sample space Ω as a source of all the randomness in the system: the elementary event ω ∈ Ω is chosen by a process beyond our control and the exact value of ω is assumed to be unknown. All the other parts of the system are possibly complicated, but deterministic, functions of ω (random variables). When a coin is tossed, only a single drop of randomness is needed: the outcome of a coin toss. When several coins are tossed, more randomness is involved and the sample space must be bigger. When a system involves an infinite number of random variables (like a stochastic process with infinite T), a large sample space Ω is needed.
Once we can construct a sequence of independent random variables which are uniformly distributed on the unit interval, we can then construct any number of models. For example:
1.3 Constructing the random walk
Let us show how to construct the simple random walk on the canonical probability space (Ω, F, P) from Theorem 1.2. First of all, we need a definition of the simple random walk:
Definition 1.4. A sequence (X_n)_{n∈N_0} of random variables is called a simple random walk (with parameter p ∈ (0, 1)) if

a) X_0 = 0,

b) X_{n+1} − X_n is independent of (X_0, X_1, . . . , X_n) for all n ∈ N, and

c) the random variable X_{n+1} − X_n has the following distribution:

   x                        1     −1
   P[X_{n+1} − X_n = x]     p      q

where, as usual, q = 1 − p.

If p = 1/2, the random walk is called symmetric.
The adjective simple comes from the fact that the size of each step is fixed (equal to 1) and it is only the direction that is random. One can study more general random walks where each step comes from an arbitrary prescribed probability distribution. For the sequence (U_n)_{n∈N} given by Theorem 1.2, define the following new sequence (ξ_n)_{n∈N} of random variables:

ξ_n = { 1,   if U_n ≤ p,
      { −1,  otherwise.
We then set

X_0 = 0,    X_n = Σ_{k=1}^{n} ξ_k,   n ∈ N.
Intuitively, we use each ξ_n to emulate a biased coin toss and then define the value of the process X at time n as the cumulative sum of the first n coin tosses.
Proposition 1.5. The sequence (X_n)_{n∈N_0} defined above is a simple random walk.
Proof. Property a) is trivially true. To check property b), we first note that (ξ_n)_{n∈N} is an independent sequence (as it has been constructed by an application of a deterministic function to each element of the independent sequence (U_n)_{n∈N}). Therefore, the increment X_{n+1} − X_n = ξ_{n+1} is independent of all the previous coin tosses ξ_1, . . . , ξ_n. What we need to prove, though, is that it is independent of all the previous values of the process X. These previous values are nothing but linear combinations of the coin tosses ξ_1, . . . , ξ_n, so they must also be independent of ξ_{n+1}. Finally, to get property c), we compute

P[X_{n+1} − X_n = 1] = P[ξ_{n+1} = 1] = P[U_{n+1} ≤ p] = p.

A similar computation shows that P[X_{n+1} − X_n = −1] = q.
We have now defined and constructed a random walk (X_n)_{n∈N_0}. Our next task is to study some of its mathematical properties.
Proposition 1.6. Let (X_n)_{n∈N_0} be a simple random walk with parameter p. The distribution of the random variable X_n is discrete with support {−n, −n + 2, . . . , n − 2, n}, and probabilities

P[X_n = l] = (n choose (n+l)/2) p^{(n+l)/2} q^{(n−l)/2},   l = −n, −n + 2, . . . , n − 2, n.   (1.1)
Proof. X_n is composed of n independent steps ξ_k = X_k − X_{k−1}, k = 1, . . . , n, each of which goes either up or down. In order to reach level l in those n steps, the number u of up-steps and the number d of down-steps must satisfy u − d = l (and u + d = n). Therefore, u = (n+l)/2 and d = (n−l)/2. The number of ways we can choose these u up-steps from the total of n is (n choose (n+l)/2), which, together with the fact that the probability of any trajectory with exactly u up-steps is p^u q^{n−u}, gives the probability (1.1) above. Equivalently, we could have noticed that the random variable (n + X_n)/2 has the binomial b(n, p) distribution.
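For small n, formula (1.1) can be checked against a brute-force enumeration of all 2^n step sequences; a sketch (helper names are mine):

```python
from itertools import product
from math import comb

def pmf_Xn(n, p, l):
    """P[X_n = l] from (1.1); zero off the support {-n, -n+2, ..., n}."""
    if (n + l) % 2 != 0 or abs(l) > n:
        return 0.0
    u = (n + l) // 2                       # number of up-steps
    return comb(n, u) * p**u * (1 - p)**(n - u)

n, p = 6, 0.3
brute = {}
for steps in product([1, -1], repeat=n):   # all 2^n trajectories
    prob = p**steps.count(1) * (1 - p)**steps.count(-1)
    brute[sum(steps)] = brute.get(sum(steps), 0.0) + prob
assert all(abs(brute.get(l, 0.0) - pmf_Xn(n, p, l)) < 1e-12
           for l in range(-n, n + 1))
print("formula (1.1) matches enumeration for n =", n)
```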
1.4 The reflection principle
Now we know how to compute the probabilities related to the position of the random walk (X_n)_{n∈N_0} at a fixed future time n. A mathematically more interesting question can be posed about the maximum of the random walk on {0, 1, . . . , n}. A nice expression for this probability is available for the case of symmetric simple random walks.
To compute this quantity, it is more helpful to view the random walk as a random trajectory in some space of paths, and compute the required probability by simply counting the number of trajectories in the subset (event) you are interested in and adding them all together, weighted by their probabilities. To prepare the ground for future results, let C be the set of all possible trajectories:

C = {(x_0, x_1, . . . , x_n) : x_0 = 0, x_{k+1} − x_k = ±1, k ≤ n − 1}.

You can think of the first n steps of a random walk simply as a probability distribution on the state space C.

The figure on the right shows the superposition of all trajectories in C for n = 4, with the path (0, 1, 0, 1, 2) marked in red.
Proposition 1.7. Let (X_n)_{n∈N_0} be a symmetric simple random walk, suppose n ≥ 2, and let M_n = max(X_0, . . . , X_n) be the maximal value of (X_n)_{n∈N_0} on the interval {0, 1, . . . , n}. The support of M_n is {0, 1, . . . , n} and its probability mass function is given by

P[M_n = l] = (n choose ⌊(n+l+1)/2⌋) 2^{−n},   l = 0, . . . , n.
Proof. Let us first pick a level l ∈ {0, 1, . . . , n} and compute the auxiliary probability P[M_n ≥ l] by counting the number of trajectories whose maximal level reached is at least l. Indeed, the symmetry assumption ensures that all trajectories are equally likely. More precisely, let A_l ⊂ C be given by

A_l = {(x_0, x_1, . . . , x_n) ∈ C : max_{k=0,...,n} x_k ≥ l}
    = {(x_0, x_1, . . . , x_n) ∈ C : x_k ≥ l for at least one k ∈ {0, . . . , n}}.
Then P[M_n ≥ l] = |A_l| / 2^n, where |A| denotes the number of elements in the set A. When l = 0, we clearly have P[M_n ≥ 0] = 1, since X_0 = 0. To count the number of elements in A_l, we use the following clever observation (known as the reflection principle):

Claim: For l ∈ N, we have

|A_l| = 2 |{(x_0, x_1, . . . , x_n) : x_n > l}| + |{(x_0, x_1, . . . , x_n) : x_n = l}|.   (1.2)
We start by defining a bijective transformation which maps trajectories into trajectories. For a trajectory (x_0, x_1, . . . , x_n) ∈ A_l, let k(l) = k(l, (x_0, x_1, . . . , x_n)) be the smallest value of the index k such that x_k ≥ l. In the stochastic-process-theory parlance, k(l) is the first hitting time of the set {l, l + 1, . . . }. We know that k(l) is well defined (since we are only applying it to trajectories in A_l) and that it takes values in the set {1, . . . , n}. With k(l) at our disposal, let (y_0, y_1, . . . , y_n) ∈ C be a trajectory obtained from (x_0, x_1, . . . , x_n) by the following procedure:

1. do nothing until you get to k(l):

   • y_0 = x_0, y_1 = x_1, . . . , y_{k(l)} = x_{k(l)}.

2. use the flipped values for the coin tosses from k(l) onwards:

   • y_{k(l)+1} − y_{k(l)} = −(x_{k(l)+1} − x_{k(l)}),
   • y_{k(l)+2} − y_{k(l)+1} = −(x_{k(l)+2} − x_{k(l)+1}),
     . . .
   • y_n − y_{n−1} = −(x_n − x_{n−1}).
The picture on the right shows two trajectories: a blue one and its reflection in red, with n = 15, l = 4 and k(l) = 8. Graphically, (y_0, . . . , y_n) looks like (x_0, . . . , x_n) until it hits the level l, and then follows its reflection around the level l, so that y_k − l = l − x_k for k ≥ k(l). If k(l) = n, then (x_0, x_1, . . . , x_n) = (y_0, y_1, . . . , y_n). It is clear that (y_0, y_1, . . . , y_n) is in C. Let us denote this transformation by

Φ : A_l → C,   Φ(x_0, x_1, . . . , x_n) = (y_0, y_1, . . . , y_n)
and call it the reflection map. The first important property of the reflection map is that it is its own inverse: apply Φ to any (y_0, y_1, . . . , y_n) in A_l, and you will get the original (x_0, x_1, . . . , x_n). In other words, Φ ◦ Φ = Id, i.e., Φ is an involution. It follows immediately that Φ is a bijection from A_l onto A_l.
To get to the second important property of Φ, let us split the set A_l into three parts according to the value of x_n:

1. A_l^> = {(x_0, x_1, . . . , x_n) ∈ A_l : x_n > l},
2. A_l^= = {(x_0, x_1, . . . , x_n) ∈ A_l : x_n = l}, and
3. A_l^< = {(x_0, x_1, . . . , x_n) ∈ A_l : x_n < l},

so that

Φ(A_l^>) = A_l^<,   Φ(A_l^<) = A_l^>,   and   Φ(A_l^=) = A_l^=.
We should note that, in the definition of A_l^> and A_l^=, the a priori stipulation that (x_0, x_1, . . . , x_n) ∈ A_l is unnecessary. Indeed, if x_n ≥ l, you must already be in A_l. Therefore, by the bijectivity of Φ, we have

|A_l^<| = |A_l^>| = |{(x_0, x_1, . . . , x_n) : x_n > l}|,

and so

|A_l| = 2 |{(x_0, x_1, . . . , x_n) : x_n > l}| + |{(x_0, x_1, . . . , x_n) : x_n = l}|.

This shows the claim.
Now that we have (1.2), we can easily rewrite it as follows:

P[M_n ≥ l] = P[X_n = l] + 2 Σ_{j>l} P[X_n = j] = Σ_{j>l} P[X_n = j] + Σ_{j≥l} P[X_n = j].
Finally, we subtract P[M_n ≥ l + 1] from P[M_n ≥ l] to get the expression for P[M_n = l]:

P[M_n = l] = P[X_n = l + 1] + P[X_n = l].

It remains to note that only one of the probabilities P[X_n = l + 1] and P[X_n = l] is nonzero: the first one if n and l have different parity, and the second one otherwise. In either case the nonzero probability is given by (n choose ⌊(n+l+1)/2⌋) 2^{−n}.
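Since all 2^n trajectories of the symmetric walk are equally likely, the pmf of M_n can be verified by direct enumeration for a small n; a sketch (helper names are mine):

```python
from itertools import product
from math import comb

def pmf_max(n, l):
    """P[M_n = l] for the symmetric walk: (n choose floor((n+l+1)/2)) / 2^n."""
    return comb(n, (n + l + 1) // 2) / 2**n

n = 8
counts = {l: 0 for l in range(n + 1)}
for steps in product([1, -1], repeat=n):   # all 2^n equally likely paths
    path, x = [0], 0
    for s in steps:
        x += s
        path.append(x)
    counts[max(path)] += 1                 # tally the observed maximum
assert all(abs(counts[l] / 2**n - pmf_max(n, l)) < 1e-12 for l in range(n + 1))
print("maximum pmf verified for n =", n)
```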
Let us use the reflection principle to solve a classical problem in combinatorics.
Example 1.8 (The Ballot Problem). Suppose that two candidates, Daisy and Oscar, are running for office, and n ∈ N voters cast their ballots. Votes are counted by the same official, one by one, until all n of them have been processed (like in the old days). After each ballot is opened, the official records the number of votes each candidate has received so far. At the end, the official announces that Daisy has won by a margin of m > 0 votes, i.e., that Daisy got (n + m)/2 votes and Oscar the remaining (n − m)/2 votes. What is the probability that Daisy never trails Oscar during the counting of the votes?

We assume that the order in which the official counts the votes is completely independent of the actual votes, and that each voter chooses Daisy with probability p ∈ (0, 1) and Oscar with probability q = 1 − p. For k ≤ n, let X_k be the number of votes received by Daisy minus the number of votes received by Oscar in the first k ballots. When the (k + 1)st vote is counted, X_k either increases by 1 (if the vote was for Daisy) or decreases by 1 otherwise. The votes are independent of each other and X_0 = 0, so X_k, 0 ≤ k ≤ n, is (the beginning of) a simple random walk. The probability of an up-step is p ∈ (0, 1), so this random walk is not necessarily symmetric. The ballot problem can now be restated as follows:

What is the probability that X_k ≥ 0 for all k ∈ {0, . . . , n}, given that X_n = m?

The first step towards understanding the solution is the realization that the exact value of p does not matter. Indeed, we are interested in the conditional probability P[F|G] = P[F ∩ G]/P[G], where

F = “all trajectories that stay nonnegative” = {X_i ≥ 0 for all 0 ≤ i ≤ n},
G = “all trajectories that reach m at time n” = {X_n = m}.
Each trajectory in G has (n + m)/2 up-steps and (n − m)/2 down-steps, so its probability weight is always equal to p^{(n+m)/2} q^{(n−m)/2}. Therefore,

P[F|G] = P[F ∩ G] / P[G] = (|F ∩ G| p^{(n+m)/2} q^{(n−m)/2}) / (|G| p^{(n+m)/2} q^{(n−m)/2}) = |F ∩ G| / |G|.   (1.3)
We already know how to count the number of paths in G: it is equal to (n choose (n+m)/2). So “all” that remains to be done is to count the number of paths in G ∩ F. If we set

H = “all paths which finish at m and visit the level l = −1”
  = {X_n = m and min_{0≤i≤n} X_i ≤ −1},

then G = (G ∩ F) ∪ H. In other words, the collection of paths that go from 0 to m can be split into

1. G ∩ F: the paths that go from 0 to m and stay nonnegative, and
2. H: the paths that go from 0 to m and become negative at some point.

So |G ∩ F| = |G| − |H|.
Can we use the reflection principle to find |H|? Yes, we can. In fact, you can convince yourself that the reflection of any path in H around the level l = −1 after its first hitting time of that level produces a path that starts at 0 and ends at −m − 2. Conversely, the same procedure applied to such a path yields a path in H. If a path travels from 0 to −m − 2, then it must have (n + m + 2)/2 down-steps and (n − m − 2)/2 up-steps. This means there are (n choose 1 + (n+m)/2) of these paths. Putting everything together, we get

P[F|G] = [(n choose k) − (n choose k+1)] / (n choose k) = (2k + 1 − n) / (k + 1),   where k = (n + m)/2.

The last equality follows from the definition of binomial coefficients, (n choose k) = n! / (k!(n−k)!).
How would you modify this argument to compute the probability that Daisy leads Oscar during the entire counting of the votes?
The Ballot problem has a long history (going back to at least 1887) and has spurred a lot of research in combinatorics and probability. In fact, people still write research papers on some of its generalizations. When posed outside the context of probability, it is often phrased as “in how many ways can the counting be performed . . . ” (the difference being only in the normalizing factor (n choose k) appearing in (1.3) above). The special case m = 0 seems to be even more popular: the number of 2n-step paths from 0 to 0 never going below zero is called the Catalan number and equals

C_n = (1/(n + 1)) (2n choose n).
Can you derive this expression from (1.3)? If you want to test your understanding a bit further, here is an identity (called Segner’s recurrence formula) satisfied by the Catalan numbers:

C_n = Σ_{i=1}^{n} C_{i−1} C_{n−i},   n ∈ N.

Can you prove it using the Ballot-problem interpretation?
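Both the closed form for C_n and Segner’s recurrence are easy to confirm numerically; a sketch (function names are mine):

```python
from math import comb

def catalan_closed(n):
    """C_n = (1/(n+1)) * (2n choose n); the division is always exact."""
    return comb(2 * n, n) // (n + 1)

def catalan_recursive(n):
    """Build C_0, ..., C_n via Segner's recurrence C_n = sum C_{i-1} C_{n-i}."""
    c = [1]                                        # C_0 = 1
    for k in range(1, n + 1):
        c.append(sum(c[i - 1] * c[k - i] for i in range(1, k + 1)))
    return c[n]

assert all(catalan_closed(n) == catalan_recursive(n) for n in range(12))
print([catalan_closed(n) for n in range(6)])   # [1, 1, 2, 5, 14, 42]
```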
2 Generating Functions
A generating function is a clothesline on which we hang up a sequence of numbers for display.
Herbert S. Wilf, generatingfunctionology
The path-counting method used in the previous lecture only works for computations related to the first n steps of the random walk, where n is given in advance. We will see later that most of the interesting questions do not fall into this category. For example, the distribution of the time it takes for the random walk to hit the level l ≠ 0 is like that. There is no way to give an a priori bound on the number of steps it will take to get to l (in fact, the expectation of this random variable can be +∞). To deal with a wider class of properties of random walks (and other processes), we need to develop some new mathematical tools.
2.1 Generating functions
A generating function associates a sequence of numbers with a function (a power series). It turns
out that in many cases of interest we can use this function to learn about the sequence.
Definition 2.1. If (a_n)_{n∈N_0} is a sequence of numbers, then we say that the radius of convergence of the power series Σ_{k=0}^{∞} a_k s^k is the largest number R ∈ [0, ∞] such that Σ_{k=0}^{∞} |a_k| |s|^k converges when |s| < R. When R > 0, we say that the function

G(s) = Σ_{k∈N_0} a_k s^k,   −R < s < R,

is the generating function associated with the sequence (a_n)_{n∈N_0}.
The generating function A associated with a sequence (a_n)_{n∈N_0} is infinitely differentiable, and its derivative can be expressed as another power series.
Proposition 2.2. When R > 0, the function A(s) is infinitely differentiable on (−R, R) and

d^n/ds^n A(s) = Σ_{n≤k<∞} k(k − 1) · · · (k − n + 1) a_k s^{k−n},   n ∈ N.   (2.1)

In particular, a_n = (1/n!) d^n/ds^n A(s)|_{s=0}, so we can recover the sequence (a_n)_{n∈N_0} from the function A.
Proof. If we formally differentiate each term in the expression

A(s) = a_0 + a_1 s + a_2 s^2 + a_3 s^3 + · · · ,

then we see that the result is again a power series with the stated coefficients. For example:

d/ds A(s) = Σ_{1≤k<∞} k a_k s^{k−1},   d^2/ds^2 A(s) = Σ_{2≤k<∞} k(k − 1) a_k s^{k−2}.

Checking this result rigorously is beyond the scope of this course.
The name generating function comes from the last part of this result: the knowledge of A implies the knowledge of the whole sequence (a_n)_{n∈N_0}. It also turns out that we can use generating functions to study convolution. We will see shortly that convolution arises naturally when we compute the probability mass function of the sum of two independent, N_0-valued, random variables.
Definition 2.3. Let (a_n)_{n∈N_0} and (b_n)_{n∈N_0} be sequences, and define

c_n = Σ_{j=0}^{n} a_j b_{n−j} = Σ_{k=0}^{n} a_{n−k} b_k,   n ∈ N_0.

Then we say that the sequence (c_n)_{n∈N_0} is the convolution of the sequences (a_n)_{n∈N_0} and (b_n)_{n∈N_0}, and we write c = a ∗ b.
It turns out that convolving two sequences is equivalent to multiplying their generating functions.
Proposition 2.4. Let (a_n)_{n∈N_0} and (b_n)_{n∈N_0} be sequences, let G_a and G_b denote the generating functions associated with these sequences, and assume that both power series have radius of convergence at least as large as R > 0. If we set c = a ∗ b, then the power series

G_c(s) = Σ_{k=0}^{∞} c_k s^k

also has radius of convergence at least as large as R, and G_c(s) = G_a(s) G_b(s) for |s| < R.
Proof. If we formally expand and then collect like powers of s in the expression

G_a(s) G_b(s) = (a_0 + a_1 s + a_2 s^2 + · · · )(b_0 + b_1 s + b_2 s^2 + · · · ),

then we see that the resulting coefficient of s^n is given by c_n. Checking the remaining claims rigorously is again beyond the scope of this course.
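The coefficient bookkeeping in this proof is exactly polynomial multiplication, which makes the statement easy to check on truncated series; a sketch (helper names are mine):

```python
def convolve(a, b):
    """Convolution (a * b)_n = sum_j a_j b_{n-j} of two finite sequences."""
    c = [0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for k, bk in enumerate(b):
            c[j + k] += aj * bk
    return c

def eval_series(coeffs, s):
    """Evaluate the (truncated) generating function sum_k coeffs[k] s^k."""
    return sum(ck * s**k for k, ck in enumerate(coeffs))

a = [1, 2, 3]            # G_a(s) = 1 + 2s + 3s^2
b = [4, 0, 1]            # G_b(s) = 4 + s^2
c = convolve(a, b)
s = 0.5
assert abs(eval_series(c, s) - eval_series(a, s) * eval_series(b, s)) < 1e-12
print(c)   # [4, 8, 13, 2, 3]
```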
2.2 Associating a generating function with a random variable
In this section we will look at random variables which take values in the set

T = N_0 ∪ {+∞} = {0, 1, 2, 3, . . . } ∪ {+∞}.

We will often be interested in random variables which record the amount of (discrete) time that we have to wait for an event to occur. In some cases, the event may never occur, so we allow these random variables to take the value +∞ to indicate that we have to wait “forever.”
The distribution of a T-valued random variable X is completely determined by the sequence (a_n)_{n∈N_0} of numbers in [0, 1] given by

a_n = P[X = n],   n ∈ N_0.   (2.2)

Notice that the value P(X = ∞) does not occur in the sequence (a_n)_{n∈N_0}, but we can still figure it out from the values in the sequence:

P(X = ∞) = 1 − P(X < ∞) = 1 − Σ_{n∈N_0} a_n.
In the future, when we say “let (a_n)_{n∈N_0} be the sequence associated with X”, we mean that (a_n)_{n∈N_0} is given by (2.2). We then define the generating function associated with the sequence (a_n)_{n∈N_0} by

G_X(s) = Σ_{0≤k<∞} a_k s^k.   (2.3)

It follows from the fact that |a_n| ≤ 1 that the radius of convergence of this power series is at least 1, so G_X is well defined for s ∈ (−1, 1). The function G_X that we have obtained is known as the generating function or probability generating function associated with X.

Before we proceed, let us find an expression for the generating functions of some of the popular N_0-valued random variables.
Example 2.5.

(1) Bernoulli(p): Here a_0 = q, a_1 = p, and a_n = 0 for n ≥ 2. Therefore,

G_X(s) = ps + q.

(2) Binomial(n, p): Since a_k = (n choose k) p^k q^{n−k}, k = 0, . . . , n, we have

G_X(s) = Σ_{k=0}^{n} (n choose k) p^k q^{n−k} s^k = (ps + q)^n,

by the binomial theorem.

(3) Geometric(p): For k ∈ N_0, a_k = q^k p, so that

G_X(s) = Σ_{k=0}^{∞} q^k s^k p = p Σ_{k=0}^{∞} (qs)^k = p / (1 − qs).

(4) Poisson(λ): Given that a_k = e^{−λ} λ^k / k!, k ∈ N_0, we have

G_X(s) = Σ_{k=0}^{∞} e^{−λ} (λ^k / k!) s^k = e^{−λ} Σ_{k=0}^{∞} (sλ)^k / k! = e^{−λ} e^{sλ} = e^{λ(s−1)}.
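These closed forms can be tested by summing the defining series Σ a_k s^k directly; a sketch for the geometric and Poisson cases (the truncation length is an arbitrary choice of mine):

```python
from math import exp, factorial

def series(pmf, s, terms=100):
    """Truncated generating function sum_{k < terms} P[X = k] s^k."""
    return sum(pmf(k) * s**k for k in range(terms))

p, lam, s = 0.4, 2.5, 0.7
q = 1 - p

geom = lambda k: q**k * p                           # Geometric(p) on N_0
pois = lambda k: exp(-lam) * lam**k / factorial(k)  # Poisson(lam)

assert abs(series(geom, s) - p / (1 - q * s)) < 1e-10
assert abs(series(pois, s) - exp(lam * (s - 1))) < 1e-10
print("closed forms match their series")
```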
Remark 2.6. Note that the true radius of convergence varies from distribution to distribution, but it is never smaller than 1. In Example 2.5, it is infinite in (1), (2) and (4), and equal to 1/q > 1 in (3). For the distribution with pmf given by a_k = C/(k + 1)^2, where C = (Σ_{k=0}^{∞} 1/(k + 1)^2)^{−1}, the radius of convergence is exactly equal to 1. Can you see why?
The following proposition gives another way to compute the generating function associated with a random variable.
Proposition 2.7. Let X be a T-valued random variable with generating function G_X. Then

1. G_X(s) = E[s^X] = E[s^X 1_{X<∞}], s ∈ (−1, 1), and

2. P(X < ∞) = lim_{s↑1} G_X(s).
Proof. Statement (1) follows directly from the formula

E[g(X)] = Σ_{n∈T} g(n) P(X = n),

applied to g(x) = s^x, where we have used the fact/convention that s^∞ = 0 when |s| < 1.

The second claim follows from the fact that

lim_{s↑1} G_X(s) = lim_{s↑1} Σ_{n∈N_0} s^n P(X = n) = Σ_{n∈N_0} lim_{s↑1} s^n P(X = n) = Σ_{n∈N_0} P(X = n) = P(X < ∞).

Of course, one should really justify the exchange of the summation and the limit in the previous equation. In this case, one could employ the monotone convergence theorem from real analysis, or simply check it by hand, but this is beyond the scope of this course.
Remark 2.8. We used the formula a_n = P[X = n] to associate a sequence with the random variable X. One could also use the formula b_n = E[X^n]/n! to associate a sequence with X. If one then computes the generating function B corresponding to the sequence (b_n), the resulting function is given by B(s) = E[e^{sX}] and is known as the moment generating function associated with X. The moment generating function is quite similar to the probability generating function. In particular, one can check that they are related by the formula G_X(s) = B(log(s)), for s ∈ (0, 1). The probability generating function will turn out to be more convenient for this class.
2.3 Convolution
The true power of generating functions comes from the fact that they behave very well under the
usual operations in probability.
Proposition 2.9. Let X, Y be independent T-valued random variables and set Z = X + Y. If (a_n)_{n∈N_0} and G_X are the sequence and generating function associated with X, (b_n) and G_Y are the sequence and generating function associated with Y, and (c_n) and G_Z are the sequence and generating function associated with Z, then c = a ∗ b and G_Z(s) = G_X(s) G_Y(s).
Proof. For each n ∈ N_0, we have

c_n = P(Z = n) = Σ_{i=0}^{n} P(X = i and Y = n − i) = Σ_{i=0}^{n} P(X = i) P(Y = n − i) = Σ_{i=0}^{n} a_i b_{n−i}.
Similarly, if |s| < 1, then

G_Z(s) = E[s^Z] = E[s^{X+Y}] = E[s^X s^Y] = E[s^X] E[s^Y] = G_X(s) G_Y(s).
Example 2.10.

1. The binomial b(n, p) distribution is the distribution of a sum of n independent Bernoulli random variables with parameter p. Therefore, if we apply Prop. 2.9 n times to the generating function (q + ps) of the Bernoulli b(p) distribution, we immediately get that the generating function of the binomial is (q + ps) · · · (q + ps) = (q + ps)^n.

2. More generally, we can show that the sum of m independent random variables with the b(n, p) distribution has a binomial b(mn, p) distribution. If you try to sum binomials with different values of the parameter p, you will not get a binomial.

3. What is even more interesting, the following statement can be shown: suppose that the sum Z of two independent N_0-valued random variables X and Y is binomially distributed with parameters n and p. Then both X and Y are binomial with parameters n_X, p and n_Y, p, where n_X + n_Y = n. In other words, the only way to get a binomial as a sum of independent random variables is the trivial one.
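Item 1 can be illustrated directly: convolving the Bernoulli sequence (q, p) with itself n times reproduces the binomial pmf; a sketch (helper names are mine):

```python
from math import comb

def convolve(a, b):
    """(a * b)_n = sum_j a_j b_{n-j} for finite sequences."""
    c = [0.0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for k, bk in enumerate(b):
            c[j + k] += aj * bk
    return c

p, n = 0.3, 5
q = 1 - p
pmf = [1.0]                       # pmf of the constant 0
for _ in range(n):
    pmf = convolve(pmf, [q, p])   # add one independent Bernoulli(p)

binom = [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]
assert all(abs(x - y) < 1e-12 for x, y in zip(pmf, binom))
print("the sum of", n, "Bernoulli(p) variables has the b(n, p) pmf")
```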
We will actually need something slightly more complicated than Proposition 2.9 when we get back to random walks. To understand what we want, let’s consider the following example.
Example 2.11. You own a pizza shop with a single delivery driver. You send your driver out with
an order, but then realize that you gave him the wrong pizza. Unfortunately, it’s 1983 so you can’t
call your driver on a cell phone and there is nothing that you can do but wait for your driver to
return.
Let T_1 ∈ T denote the time that your driver gets back from the first trip, and let T_2 ≥ T_1 denote the time that your driver gets back from the second trip. Of course, these should probably be continuous random variables, but let’s assume that the world is discrete. Moreover, if you don’t like the idea that T_1 might take the value zero, you can just assign probability zero to this possibility.

Your driver is actually somewhat unreliable: on each trip there is some chance that he will decide he is sick of this job and never return. In other words, P(T_1 = ∞) > 0 and P(T_2 = ∞) > 0.

In the event that your driver does return to the shop after the first trip, the time that it takes for him to make the second round trip is independent of the time that the first trip took and has the same distribution. More formally, we suppose that:
P(T_2 = m + n | T_1 = m) = P(T_1 = n),   m, n ∈ N_0.   (2.4)
Conditional on the event {T_1 = m}, T_2 can only take values in {m, m + 1, m + 2, . . . } ∪ {∞}, so it follows from (2.4) that

P(T_2 = ∞ | T_1 = m) = 1 − Σ_{n∈N_0} P(T_2 = m + n | T_1 = m) = 1 − Σ_{n∈N_0} P(T_1 = n) = P(T_1 = ∞).
Let G_{T_1} denote the generating function associated with the random variable T_1 and let G_{T_2} denote the generating function associated with the random variable T_2. We would like to check that G_{T_2}(s) = G_{T_1}(s)^2 for |s| < 1. Morally, we want to apply Prop. 2.9 to T_1 and T_2 − T_1, but there is one sticking point: if T_1 = ∞, then T_2 = ∞ and T_2 − T_1 is undefined. In other words, if the driver quits before he returns from the first round trip, then it doesn’t make any sense to ask how long the second round trip took. We could just define ∞ − ∞ = ∞, but this still isn’t going to make T_1 and T_2 − T_1 independent. Fortunately, this minor annoyance doesn’t end up mattering. First notice that
E(s^{T_2} | T_1 = m) = Σ_{n∈T} s^{m+n} P(T_1 = n) = s^m G_{T_1}(s),   m ∈ N_0, |s| < 1.
We also know that T_2 = ∞ when T_1 = ∞, so

E(s^{T_2} | T_1 = ∞) = s^∞ = 0,   |s| < 1.
As a result, we may apply the tower law for conditional expectation to conclude that

G_{T_2}(s) = E[s^{T_2}] = Σ_{m∈T} E[s^{T_2} | T_1 = m] P(T_1 = m) = Σ_{m∈N_0} s^m G_{T_1}(s) P(T_1 = m) = G_{T_1}(s)^2,

when |s| < 1.
In fact, we know something slightly stronger. If two generating functions agree in a neighborhood of zero,
then it follows from Proposition 2.2 that they are generated by the same sequence, so they must agree
on their entire common domain of convergence.
We have now shown the following proposition which we will need in the next section.
Proposition 2.12. Let $T_2 \ge T_1$ be random times taking values in $\mathcal{T} = \mathbb{N}_0 \cup \{+\infty\}$ with generating
functions $G_{T_1}$ and $G_{T_2}$. If
\[
P(T_2 = m+n \,|\, T_1 = m) = P(T_1 = n), \quad m, n \in \mathbb{N}_0,
\]
then $G_{T_2}(s) = G_{T_1}^2(s)$.
2.4 Moments
Another useful thing about generating functions is that they can make the computation of moments
easier. Recall that $E[X^n]$ is called the $n$th moment of $X$. Also notice that if $P(X = \infty) > 0$,
then $X^n$ is never integrable, so we will now restrict attention to random variables that only take
values in $\mathbb{N}_0$.
Proposition 2.13. Let $X$ be an $\mathbb{N}_0$-valued random variable with generating function $G_X$. For
$n \in \mathbb{N}$ the following two statements are equivalent:
1. $E[X^n] < \infty$,
2. $\dfrac{d^n G_X(s)}{ds^n}\Big|_{s=1}$ exists (in the sense that the left limit $\lim_{s \nearrow 1} \dfrac{d^n G_X(s)}{ds^n}$ exists).
In either case, we have
\[
E[X(X-1)(X-2) \cdots (X-n+1)] = \frac{d^n}{ds^n} G_X(s) \Big|_{s=1}.
\]
Proof. Formally, one can check this by setting $s = 1$ in (2.1); evaluating the resulting summation
amounts to calculating the desired expectation.
The quantities
\[
E[X],\ E[X(X-1)],\ E[X(X-1)(X-2)],\ \dots
\]
are called the factorial moments of the random variable $X$. You can get the classical moments from
the factorial moments by solving a system of linear equations. It is very simple for the first few:
\[
E[X] = E[X], \quad
E[X^2] = E[X(X-1)] + E[X], \quad
E[X^3] = E[X(X-1)(X-2)] + 3E[X(X-1)] + E[X], \ \dots
\]
A useful identity which follows directly from the above results is
\[
\mathrm{Var}[X] = G_X''(1) + G_X'(1) - \big(G_X'(1)\big)^2,
\]
and it is valid whenever the first two derivatives of $G_X$ at $1$ exist.
Example 2.14. Let $X$ be a Poisson random variable with parameter $\lambda$. Its generating function is
given by
\[
G_X(s) = e^{\lambda(s-1)}.
\]
Therefore, $\frac{d^n}{ds^n} G_X(s)\big|_{s=1} = \lambda^n$, and so the sequence $(E[X], E[X(X-1)], E[X(X-1)(X-2)], \dots)$ of
factorial moments of $X$ is just $(\lambda, \lambda^2, \lambda^3, \dots)$. It follows that
\[
E[X] = \lambda, \quad E[X^2] = \lambda^2 + \lambda, \quad \mathrm{Var}[X] = \lambda, \quad
E[X^3] = \lambda^3 + 3\lambda^2 + \lambda, \ \dots
\]
Example 2.15. We have an urn which contains three numbered balls. We then play a repeated game
where on each turn we draw a ball, with the following outcomes:
a) If we draw the ﬁrst ball, we win a dollar, replace the ball in the urn, and then play another
round.
b) If we draw the second ball, we win two dollars, replace the ball in the urn, and then play
another round.
c) If we draw the third ball, the game is over.
Let $X$ denote the amount of money that we win in this game. The number of rounds that we play
has a geometric distribution, so the game ends with probability one and $P(X = \infty) = 0$. We would
like to determine the generating function $G_X$ associated with $X$. To do this, we let $Y$ denote the
amount of winnings obtained after (but not including the winnings from) the first round, and we
let $Z$ denote the first ball drawn. Then the conditional distribution of $Y$ given $Z = 1$ or $Z = 2$ is
the same as the unconditional distribution of $X$. That is:
\[
P(Y = n \,|\, Z = 1) = P(Y = n \,|\, Z = 2) = P(X = n), \quad \text{for all } n \in \mathbb{N}_0.
\]
As a result:
\[
G_Y(s) = E(s^Y \,|\, Z = 1) = E(s^Y \,|\, Z = 2) = E[s^X] = G_X(s),
\]
and
\[
G_X(s) = E[s^X] = E[s^X \,|\, Z = 1]\,P(Z = 1) + E[s^X \,|\, Z = 2]\,P(Z = 2) + E[s^X \,|\, Z = 3]\,P(Z = 3)
\]
\[
= E[s^{1+Y} \,|\, Z = 1]/3 + E[s^{2+Y} \,|\, Z = 2]/3 + E[s^{0} \,|\, Z = 3]/3
= s\,G_X(s)/3 + s^2 G_X(s)/3 + 1/3.
\]
Solving for $G_X$ shows that $G_X(s) = 1/(3 - s - s^2)$. In particular,
\[
G_X'(s) = \frac{1 + 2s}{(3 - s - s^2)^2}, \qquad G_X''(s) = \frac{8 + 6s + 6s^2}{(3 - s - s^2)^3}.
\]
As a result, we can determine a number of properties of $X$:
\[
P(X = 0) = G_X(0) = 1/3, \quad P(X = 1) = G_X'(0) = 1/9, \quad P(X = 2) = G_X''(0)/2 = 4/27,
\]
\[
E[X] = G_X'(1) = 3, \quad E[X^2] = G_X''(1) + G_X'(1) = 23, \quad \mathrm{Var}(X) = 14.
\]
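These numbers can be double-checked by extracting the coefficients of $G_X(s) = 1/(3-s-s^2)$ mechanically: matching powers of $s$ in $(3-s-s^2)\,G_X(s) = 1$ gives $3a_0 = 1$, $3a_1 = a_0$, and $3a_n = a_{n-1} + a_{n-2}$ for $n \ge 2$. A short sketch with exact rational arithmetic:

```python
from fractions import Fraction

# Recover P(X = n) from the power-series recursion for 1/(3 - s - s^2).
N = 200  # truncation point; the coefficients decay geometrically
a = [Fraction(0)] * N
a[0] = Fraction(1, 3)
a[1] = a[0] / 3
for n in range(2, N):
    a[n] = (a[n - 1] + a[n - 2]) / 3

assert a[0] == Fraction(1, 3)
assert a[1] == Fraction(1, 9)
assert a[2] == Fraction(4, 27)

# the truncated mean agrees with E[X] = G_X'(1) = 3 up to a tiny tail
mean = float(sum(n * an for n, an in enumerate(a)))
assert abs(mean - 3) < 1e-9
```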
2.5 Random sums
Our next application of generating functions in the theory of stochastic processes deals with
so-called random sums. Let $(\xi_n)_{n \in \mathbb{N}}$ be a sequence of random variables, and let $N$ be a random
time (a random time is simply a $\mathcal{T} = \mathbb{N}_0 \cup \{+\infty\}$-valued random variable). We can define the
random variable
\[
Y = \sum_{k=1}^{N} \xi_k \quad \text{by} \quad
Y(\omega) =
\begin{cases}
0, & N(\omega) = 0, \\
\sum_{k=1}^{N(\omega)} \xi_k(\omega), & N(\omega) \ge 1,
\end{cases}
\quad \text{for } \omega \in \Omega.
\]
More generally, for an arbitrary stochastic process $(X_n)_{n \in \mathbb{N}_0}$ and a random time $N$ (with
$P[N = +\infty] = 0$), we define the random variable $X_N$ by $X_N(\omega) = X_{N(\omega)}(\omega)$, for $\omega \in \Omega$. When $N$
is a constant ($N = n$), then $X_N$ is simply equal to $X_n$. In general, think of $X_N$ as a value
of the stochastic process $X$ taken at a time which is itself random. If $X_n = \sum_{k=1}^{n} \xi_k$, then
$X_N = \sum_{k=1}^{N} \xi_k$.
Example 2.16. Let $(\xi_n)_{n \in \mathbb{N}}$ be the increments of a symmetric simple random walk (coin-tosses),
and let $N$ have the following distribution:

    n           0     1     2
    P(N = n)   1/3   1/3   1/3

which is independent of $(\xi_n)_{n \in \mathbb{N}}$ (it is very important to specify the dependence structure between
$N$ and $(\xi_n)_{n \in \mathbb{N}}$ in this setting!). Let us compute the distribution of $Y = \sum_{k=1}^{N} \xi_k$ in this case. This
is where we, typically, use the formula of total probability:
\[
P(Y = m) = P(Y = m \,|\, N = 0)\,P(N = 0) + P(Y = m \,|\, N = 1)\,P(N = 1) + P(Y = m \,|\, N = 2)\,P(N = 2)
\]
\[
= P\Big(\sum_{k=1}^{N} \xi_k = m \,\Big|\, N = 0\Big)\,P(N = 0)
+ P\Big(\sum_{k=1}^{N} \xi_k = m \,\Big|\, N = 1\Big)\,P(N = 1)
+ P\Big(\sum_{k=1}^{N} \xi_k = m \,\Big|\, N = 2\Big)\,P(N = 2)
\]
\[
= \tfrac{1}{3}\Big(\mathbf{1}_{\{m=0\}} + P(\xi_1 = m) + P(\xi_1 + \xi_2 = m)\Big).
\]
When $m = 1$ (for example), we get
\[
P[Y = 1] = \frac{0 + \tfrac{1}{2} + 0}{3} = \tfrac{1}{6}.
\]
Perform the computation for some other values of m for yourself.
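Since $N \le 2$, the whole distribution of $Y$ can also be computed by brute force. A minimal sketch with exact arithmetic, enumerating every combination of $N$ and the coin tosses:

```python
from fractions import Fraction
from itertools import product

# Exhaustively compute the pmf of Y = xi_1 + ... + xi_N when N is uniform
# on {0, 1, 2} and independent of the coin-toss increments xi_k = +/-1.
pmf = {}
for n in range(3):                               # P(N = n) = 1/3
    for steps in product((-1, 1), repeat=n):     # each sequence has prob (1/2)^n
        y = sum(steps)
        weight = Fraction(1, 3) * Fraction(1, 2) ** n
        pmf[y] = pmf.get(y, Fraction(0)) + weight

assert pmf[1] == Fraction(1, 6)                  # matches the hand computation
assert pmf[0] == Fraction(1, 2)                  # N = 0, or N = 2 with one up, one down
assert sum(pmf.values()) == 1
```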
What happens when $N$ and $(\xi_n)_{n \in \mathbb{N}}$ are dependent? This will usually be the case in practice, as
the value of the time $N$ when we stop adding increments will typically depend on the behavior of
the sum itself.
Example 2.17. Let $(\xi_n)_{n \in \mathbb{N}}$ be as above. We can think of a situation where a gambler repeatedly
plays the same game, in which a fair coin is tossed and the gambler wins a dollar if the outcome
is heads and loses a dollar otherwise. A "smart" gambler enters the game and decides on the
following tactic: let's see how the first game goes. If I lose, I'll play another 2 games and hopefully
cover my losses, and if I win, I'll quit then and there. The described strategy amounts to the choice
of the random time $N$ as follows:
\[
N(\omega) =
\begin{cases}
1, & \xi_1 = 1, \\
3, & \xi_1 = -1.
\end{cases}
\]
Then
\[
Y(\omega) =
\begin{cases}
1, & \xi_1 = 1, \\
-1 + \xi_2 + \xi_3, & \xi_1 = -1.
\end{cases}
\]
Therefore,
\[
P[Y = 1] = P[Y = 1 \,|\, \xi_1 = 1]\,P[\xi_1 = 1] + P[Y = 1 \,|\, \xi_1 = -1]\,P[\xi_1 = -1]
= 1 \cdot P[\xi_1 = 1] + P[\xi_2 + \xi_3 = 2]\,P[\xi_1 = -1]
= \tfrac{1}{2}\big(1 + \tfrac{1}{4}\big) = \tfrac{5}{8}.
\]
Similarly, we get $P[Y = -1] = \tfrac{1}{4}$ and $P[Y = -3] = \tfrac{1}{8}$. The expectation $E[Y]$ is equal to
$1 \cdot \tfrac{5}{8} + (-1) \cdot \tfrac{1}{4} + (-3) \cdot \tfrac{1}{8} = 0$. This is not an accident. One of the first powerful results of the beautiful
theory of martingales states that no matter how smart a strategy you employ, you cannot beat a fair
game.
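The "smart" strategy involves at most three coin tosses, so the claim $E[Y] = 0$ can be verified by enumerating all eight outcomes:

```python
from fractions import Fraction
from itertools import product

# Enumerate all coin sequences: quit after 1 game on a win, else play 3 games.
pmf = {}
for xi in product((-1, 1), repeat=3):
    y = xi[0] if xi[0] == 1 else xi[0] + xi[1] + xi[2]
    pmf[y] = pmf.get(y, Fraction(0)) + Fraction(1, 8)

assert pmf[1] == Fraction(5, 8)
assert pmf[-1] == Fraction(1, 4)
assert pmf[-3] == Fraction(1, 8)
assert sum(y * p for y, p in pmf.items()) == 0   # the game is still fair
```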
We will return to the general (non-independent) case in the next lecture. Let us use generating
functions to give a full description of the distribution of $Y = \sum_{k=1}^{N} \xi_k$ when the time is independent
of the summands.
Proposition 2.18. Let $(\xi_n)_{n \in \mathbb{N}}$ be a sequence of independent $\mathbb{N}_0$-valued random variables, all of
which share the same distribution and generating function $G_\xi(s)$. Let $N$ be a random time which
is independent of $(\xi_n)_{n \in \mathbb{N}}$, with $P(N < \infty) = 1$ and generating function $G_N$. Then the generating
function $G_Y$ of the random sum $Y = \sum_{k=1}^{N} \xi_k$ is given by
\[
G_Y(s) = G_N\big(G_\xi(s)\big).
\]
Proof. First let $X_0 = 0$ and $X_n = \sum_{i=1}^{n} \xi_i$ for $n \in \mathbb{N}$ denote the sequence of partial sums. Repeated
applications of Proposition 2.9 show that $G_{X_n} = G_\xi^n$ (where $G_\xi^0(s) = 1$). As a result, we may apply
the tower law for conditional expectation to see that
\[
E[s^Y] = \sum_{n \in \mathbb{N}_0} E[s^Y \,|\, N = n]\,P(N = n)
= \sum_{n \in \mathbb{N}_0} E[s^{X_n} \,|\, N = n]\,P(N = n)
= \sum_{n \in \mathbb{N}_0} G_\xi^n(s)\,P(N = n) = G_N\big(G_\xi(s)\big).
\]
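Proposition 2.18 is easy to sanity-check numerically on a small example. The two distributions below are arbitrary illustrative choices, not from the text; we build the pmf of $Y$ directly by convolution and compare $G_Y(s)$ with $G_N(G_\xi(s))$ at a few points:

```python
# Made-up illustrative distributions; any N_0-valued choices would do.
p_xi = {0: 0.2, 1: 0.5, 2: 0.3}
p_N = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}

def convolve(a, b):
    # pmf of the sum of two independent N_0-valued random variables
    out = {}
    for i, pa in a.items():
        for j, pb in b.items():
            out[i + j] = out.get(i + j, 0.0) + pa * pb
    return out

# mix the pmfs of the partial sums X_n over the distribution of N
p_Y, p_Xn = {}, {0: 1.0}              # X_0 = 0
for n in range(max(p_N) + 1):
    for k, q in p_Xn.items():
        p_Y[k] = p_Y.get(k, 0.0) + p_N[n] * q
    p_Xn = convolve(p_Xn, p_xi)

def gf(pmf, s):
    return sum(p * s**k for k, p in pmf.items())

for s in (0.0, 0.3, 0.7, 1.0):
    assert abs(gf(p_Y, s) - gf(p_N, gf(p_xi, s))) < 1e-12
```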
Corollary 2.19 (Wald's Identity I). Let $(\xi_n)_{n \in \mathbb{N}}$ and $N$ be as in Proposition 2.18. Suppose,
also, that $E[N] < \infty$ and $E[\xi_1] < \infty$. Then
\[
E\Big[\sum_{k=1}^{N} \xi_k\Big] = E[N]\,E[\xi_1].
\]
Proof. We just apply the chain rule to the equality $G_Y = G_N \circ G_\xi$ to get
\[
G_Y'(s) = G_N'(G_\xi(s))\,G_\xi'(s).
\]
After we let $s \nearrow 1$, we get
\[
E[Y] = G_Y'(1) = G_N'(G_\xi(1))\,G_\xi'(1) = G_N'(1)\,G_\xi'(1) = E[N]\,E[\xi_1].
\]
Example 2.20. Every time the Springfield Isotopes play in the league championship, their chance
of winning is $p \in (0, 1)$. The number of years between two championships they get to play in has the
Poisson distribution with parameter $\lambda > 0$. What is the expected number of years $Y$ between consecutive
championship wins?
Let $(\xi_n)_{n \in \mathbb{N}}$ be the sequence of independent Poisson($\lambda$) random variables modeling the number
of years between consecutive championship appearances by the Isotopes. Moreover, let $N$ be a
Geometric($p$) random variable with success probability $p$, so that $E[N] = \frac{1-p}{p}$. Then
\[
Y = \sum_{k=1}^{N} \xi_k.
\]
Indeed, every time the Isotopes lose the championship, another $\xi_\cdot$ years have to pass before they get
another chance, and the whole thing stops when they finally win. To compute the expectation of $Y$,
we use Corollary 2.19:
\[
E[Y] = E[N]\,E[\xi_k] = \frac{1-p}{p}\,\lambda.
\]
3 Random Walks II
3.1 Stopping times
The last application of generating functions dealt with sums evaluated between 0 and some random
time N. An especially interesting case occurs when the value of N depends directly on the evolution
of the underlying stochastic process. Even more important is the case where time’s arrow is taken
into account. If you think of N as the time you stop adding new terms to the sum, it is usually the
case that you are not allowed (able) to see the values of the terms you would get if you continued
adding. Think of an investor in the stock market. Her decision to stop and sell her stocks can
depend only on the information available up to the moment of the decision. Otherwise, she would
sell at the absolute maximum and buy at the absolute minimum, making tons of money in the
process. Of course, this is not possible unless you are clairvoyant, so mere mortals have to
restrict their choices to so-called stopping times.
Definition 3.1. Let $(X_n)_{n \in \mathbb{N}_0}$ be a stochastic process. A random variable $T$ taking values in
$\mathcal{T} = \mathbb{N}_0 \cup \{+\infty\}$ is said to be a stopping time with respect to $(X_n)_{n \in \mathbb{N}_0}$ if for each $n \in \mathbb{N}_0$ there exists
a function $G_n : \mathbb{R}^{n+1} \to \{0, 1\}$ such that
\[
\mathbf{1}_{\{T = n\}} = G_n(X_0, X_1, \dots, X_n), \quad \text{for all } n \in \mathbb{N}_0.
\]
The functions $G_n$ are called the decision functions, and should be thought of as a black box
which takes the values of the process $(X_n)_{n \in \mathbb{N}_0}$ observed up to the present point and outputs either
0 or 1. The value 0 means "keep going" and 1 means "stop". The whole point is that the decision has
to be based only on the available observations and not on the future ones.
Example 3.2.
1. The simplest examples of stopping times are deterministic (non-random) times. Just set
   $T = 5$ (or $T = 723$, or $T = n_0$ for any $n_0 \in \mathbb{N}_0 \cup \{+\infty\}$), no matter what the state of the
   world $\omega \in \Omega$ is. The family of decision rules is easy to construct:
   \[
   G_n(x_0, x_1, \dots, x_n) =
   \begin{cases}
   1, & n = n_0, \\
   0, & n \ne n_0.
   \end{cases}
   \]
   The decision functions $G_n$ do not depend on the values of $X_0, X_1, \dots, X_n$ at all. A gambler who
   stops gambling after 20 games, no matter what the winnings or losses are, uses such a rule.
2. Probably the most well-known examples of stopping times are (first) hitting times. They can
   be defined for general stochastic processes, but we will stick to simple random walks for the
   purposes of this example. So, let $X_n = \sum_{k=1}^{n} \xi_k$ be a simple random walk, and let $T_l$ be the
   first time $X$ hits the level $l \in \mathbb{N}$. More precisely, we use the following slightly non-intuitive
   but mathematically correct definition:
   \[
   T_l = \min\{n \in \mathbb{N}_0 : X_n = l\}.
   \]
   The set $\{n \in \mathbb{N}_0 : X_n = l\}$ is the collection of all time-points at which $X$ visits the level $l$.
   The earliest one - the minimum of that set - is the first hitting time of $l$. In states of the
   world $\omega \in \Omega$ in which the level $l$ just never gets reached, i.e., when $\{n \in \mathbb{N}_0 : X_n = l\}$ is an
   empty set, we set $T_l(\omega) = +\infty$. In order to show that $T_l$ is indeed a stopping time, we need to
   construct the decision functions $G_n$, $n \in \mathbb{N}_0$. Let us start with $n = 0$. We would have $T_l = 0$
   only in the (impossible) case $X_0 = l$, so we always have $G_0(X_0) = 0$. How about $n \in \mathbb{N}$? For the
   value of $T_l$ to be equal to exactly $n$, two things must happen:
   (a) $X_n = l$ (the level $l$ must actually be hit at time $n$), and
   (b) $X_{n-1} \ne l, X_{n-2} \ne l, \dots, X_1 \ne l, X_0 \ne l$ (the level $l$ has not been hit before).
   Therefore,
   \[
   G_n(x_0, x_1, \dots, x_n) =
   \begin{cases}
   1, & x_0 \ne l, x_1 \ne l, \dots, x_{n-1} \ne l, x_n = l, \\
   0, & \text{otherwise.}
   \end{cases}
   \]
   The hitting time $T_2$ of the level $l = 2$ for a particular trajectory of a symmetric simple random
   walk is depicted below:
[Figure: a trajectory of a symmetric simple random walk on the time interval $0, \dots, 30$, with the
first hitting time $T_2$ of level 2 and the last-maximum time $T_M$ marked.]
3. How about something that is not a stopping time? Let $n_0$ be an arbitrary time-horizon, and let
   $T_M$ be the last time during $0, \dots, n_0$ that the random walk visits its maximum during $0, \dots, n_0$
   (see the picture above). If you bought a stock at time $t = 0$, had to sell it some time before $n_0$, and
   had the ability to predict the future, this is one of the points you would choose to sell it at.
   Of course, it is impossible to decide whether $T_M = n$, for some $n \in \{0, \dots, n_0 - 1\}$, without
   knowledge of the values of the random walk after $n$. More precisely, let us sketch the proof of
   the fact that $T_M$ is not a stopping time. Suppose, to the contrary, that it is, and let $G_n$ be the
   family of decision functions. Consider the following two trajectories: $(0, 1, 2, 3, \dots, n-1, n)$
   and $(0, 1, 2, 3, \dots, n-1, n-2)$. They differ only in the direction of the last step. They also
   differ in the fact that $T_M = n$ for the first one and $T_M = n-1$ for the second one. On the
   other hand, by the definition of the decision functions, we have
   \[
   \mathbf{1}_{\{T_M = n-1\}} = G_{n-1}(X_0, \dots, X_{n-1}).
   \]
   The right-hand side is equal for both trajectories, while the left-hand side equals 0 for the
   first one and 1 for the second one. A contradiction.
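The hitting-time decision functions from part 2 of the example translate directly into code. A minimal sketch, with `None` standing in for $+\infty$:

```python
# Decision function G_n for the first hitting time of level l: output 1
# exactly when the observed path (x_0, ..., x_n) first reaches l at time n.
def decision(xs, l):
    return 1 if xs[-1] == l and all(x != l for x in xs[:-1]) else 0

def first_hitting_time(path, l):
    # scan the path with the decision functions, stopping at the first 1
    for n in range(len(path)):
        if decision(path[: n + 1], l):
            return n
    return None      # the level was never reached on this (finite) path

path = [0, 1, 0, 1, 2, 1, 2]
assert first_hitting_time(path, 2) == 4
assert first_hitting_time(path, 3) is None
```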
Remark 3.3. In the remainder of this section, we will sometimes write our decision functions as
functions of the random variables $\xi_1, \xi_2, \dots, \xi_n$ rather than the random variables $X_0, X_1, X_2, \dots, X_n$.
As knowing the values $X_0, X_1, X_2, \dots, X_n$ is clearly equivalent to knowing the values $\xi_1, \xi_2, \dots, \xi_n$,
we are free to use whichever representation is more convenient.
3.2 Wald’s identity II
Having defined the notion of a stopping time, let us try to compute something about it. The
random variables $(\xi_n)_{n \in \mathbb{N}}$ in the statement of the theorem below are only assumed to be independent
of each other and identically distributed. To make things simpler, you can think of $(\xi_n)_{n \in \mathbb{N}}$ as the
increments of a simple random walk. Before we state the main result, here is an extremely useful
identity:
Proposition 3.4. Let $N$ be an $\mathbb{N}_0$-valued random variable. Then
\[
E[N] = \sum_{k \in \mathbb{N}} P[N \ge k].
\]
Proof. Clearly, $P[N \ge k] = \sum_{j \ge k} P[N = j]$, so (note what happens to the indices when we switch
the order of summation)
\[
\sum_{k \in \mathbb{N}} P[N \ge k] = \sum_{k \in \mathbb{N}} \sum_{k \le j < \infty} P[N = j]
= \sum_{j \in \mathbb{N}} \sum_{1 \le k \le j} P[N = j] = \sum_{j=1}^{\infty} j\,P[N = j] = E[N].
\]
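Proposition 3.4 can be checked directly on any small distribution. Here is a sketch with an arbitrary pmf on $\{0, 1, 2, 3\}$, using exact arithmetic:

```python
from fractions import Fraction

# E[N] versus the tail-sum formula sum_{k >= 1} P[N >= k].
pmf = {0: Fraction(1, 8), 1: Fraction(1, 4), 2: Fraction(1, 2), 3: Fraction(1, 8)}

mean = sum(j * q for j, q in pmf.items())
tails = sum(sum(q for j, q in pmf.items() if j >= k) for k in range(1, 4))
assert mean == tails == Fraction(13, 8)
```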
Theorem 3.5 (Wald's Identity II). Let $(\xi_n)_{n \in \mathbb{N}}$ be a sequence of independent, identically
distributed random variables with $E[|\xi_1|] < \infty$. Set
\[
X_n = \sum_{k=1}^{n} \xi_k, \quad n \in \mathbb{N}_0.
\]
If $T$ is an $(X_n)_{n \in \mathbb{N}_0}$-stopping time such that $E[T] < \infty$, then
\[
E[X_T] = E[\xi_1]\,E[T].
\]
Proof. Here is another way of writing the sum $\sum_{k=1}^{T} \xi_k$:
\[
\sum_{k=1}^{T} \xi_k = \sum_{k \in \mathbb{N}} \xi_k \mathbf{1}_{\{k \le T\}}.
\]
The idea behind it is simple: add all the values of $\xi_k$ for $k \le T$ and keep adding zeros (since
$\xi_k \mathbf{1}_{\{k \le T\}} = 0$ for $k > T$) after that. Taking the expectation of both sides and switching $E$ and $\sum$
(this can be justified, but the argument is technical and we omit it here) yields:
\[
E\Big[\sum_{k=1}^{T} \xi_k\Big] = \sum_{k=1}^{\infty} E[\mathbf{1}_{\{k \le T\}} \xi_k]. \tag{3.1}
\]
Now let's look at the random variable $\mathbf{1}_{\{k \le T\}}$ more closely. We have
\[
\mathbf{1}_{\{k \le T\}} = 1 - \mathbf{1}_{\{k > T\}} = 1 - \mathbf{1}_{\{T \le k-1\}}
= 1 - \sum_{j=0}^{k-1} \mathbf{1}_{\{T = j\}} = 1 - \sum_{j=0}^{k-1} G_j(\xi_1, \dots, \xi_j),
\]
where $G_j(\xi_1, \dots, \xi_j)$ is the decision function which corresponds to the event $\{T = j\}$. In
particular, we see that the random variable $\mathbf{1}_{\{k \le T\}}$ can be written as a function of the variables
$(\xi_1, \dots, \xi_{k-1})$. As the random variables $(\xi_1, \dots, \xi_k)$ are independent, the random variables $\mathbf{1}_{\{k \le T\}}$
and $\xi_k$ are also independent. This means that
\[
E\Big[\sum_{k=1}^{T} \xi_k\Big] = \sum_{k=1}^{\infty} E[\mathbf{1}_{\{k \le T\}}]\,E[\xi_k]
= E[\xi_1] \sum_{k=1}^{\infty} P(k \le T) = E[\xi_1]\,E[T].
\]
Example 3.6 (Gambler's ruin problem). A gambler starts with $x \in \mathbb{N}$ dollars and repeatedly
plays a game in which he wins a dollar with probability $\tfrac{1}{2}$ and loses a dollar with probability $\tfrac{1}{2}$. He
decides to stop when one of the following two things happens:
1. he goes bankrupt, i.e., his wealth hits 0, or
2. he makes enough money, i.e., his wealth reaches some level $a > x$.
The classical "Gambler's ruin" problem asks the following question: what is the probability that the
gambler's wealth reaches $a$ dollars before he goes bankrupt?
The gambler's wealth $(W_n)_{n \in \mathbb{N}_0}$ is modeled by a simple random walk starting from $x$, whose
increments $\xi_k = W_k - W_{k-1}$ are coin-tosses. Then $W_n = x + X_n$, where $X_n = \sum_{k=1}^{n} \xi_k$, $n \in \mathbb{N}_0$. Let $T$
be the time the gambler stops. We can represent $T$ in two different (but equivalent) ways. On the
one hand, we can think of $T$ as the smaller of the two hitting times $T_{-x}$ and $T_{a-x}$ of the levels $-x$
and $a-x$ for the random walk $(X_n)_{n \in \mathbb{N}_0}$ (remember that $W_n = x + X_n$, so these two correspond
to the hitting times for the process $(W_n)_{n \in \mathbb{N}_0}$ of the levels $0$ and $a$). On the other hand, we can
think of $T$ as the first hitting time of the two-element set $\{-x, a-x\}$ for the process $(X_n)_{n \in \mathbb{N}_0}$. In
either case, it is quite clear that $T$ is a stopping time (can you write down the decision functions?).
We will see later that the probability that the gambler's wealth remains strictly between $0$ and $a$
forever is zero, so $P[T < \infty] = 1$.
What can we say about the random variable $X_T$ - the gambler's wealth (minus $x$) at the random
time $T$? Clearly, it is either equal to $-x$ or to $a-x$, and the probabilities $p_0$ and $p_a$ with which it
takes these values are exactly what we are after in this problem. Since there are no
other values $X_T$ can take, we must have $p_0 + p_a = 1$. Wald's identity gives us the second equation
for $p_0$ and $p_a$:
\[
E[X_T] = E[\xi_1]\,E[T] = 0 \cdot E[T] = 0,
\]
so
\[
0 = E[X_T] = p_0(-x) + p_a(a-x).
\]
These two linear equations in two unknowns yield
\[
p_0 = \frac{a-x}{a}, \qquad p_a = \frac{x}{a}.
\]
It is remarkable that the two probabilities are proportional to the amounts of money the gambler
needs to make (lose) in the two outcomes. Again we see that the gambler cannot extract positive
expected value from a fair game. The situation is different when $p \ne \tfrac{1}{2}$.
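A seeded Monte Carlo run agrees with $p_a = x/a$. The concrete values $x = 3$, $a = 10$, the seed, and the number of trials below are arbitrary choices for illustration:

```python
import random

# Monte Carlo check of the Gambler's ruin probability p_a = x/a for a fair
# game (seeded so that repeated runs give identical results).
random.seed(0)
x, a, trials = 3, 10, 20000

wins = 0
for _ in range(trials):
    w = x
    while 0 < w < a:
        w += random.choice((-1, 1))   # fair coin-toss increment
    wins += (w == a)

# the empirical frequency should be close to x/a = 0.3
assert abs(wins / trials - x / a) < 0.05
```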
3.3 The distribution of the first hitting time $T_1$
Let $(X_n)_{n \in \mathbb{N}_0}$ be a simple random walk with probability $p$ of stepping up, and let
\[
T_l = \min\{n \in \mathbb{N}_0 : X_n = l\}
\]
be the first hitting time of the level $l$ by the random walk. We will now study the random variables $T_l$
using generating-function methods. We will essentially follow the approach of Example 2.15:
we will attempt to determine an equation satisfied by the generating function associated with the random
variable $T_l$.
The first step is contained in the following proposition. Recall that the minimum of the empty
set is taken to be $+\infty$ by convention.
Proposition 3.7. Let $(X_1, X_2, \dots)$ be a random walk with probability $p$ of moving up. If $l \in \mathbb{N}$,
then $G_{T_l}(s) = G_{T_1}(s)^l$, where $G_{T_l}$ denotes the generating function of $T_l$.
Proof. We will only handle the case $l = 2$, but the general case follows in much the same way.
The strategy is to show that
\[
P(T_2 = m+n \,|\, T_1 = m) = P(T_1 = n), \quad m, n \in \mathbb{N}_0.
\]
To do this, let's consider the events
\[
A_{m,n} = \text{"$n$ is the first time after $m$ that } X_n = X_m + 1\text{"}
= \{X_i \le X_m \text{ for all } m \le i < n, \text{ and } X_n = X_m + 1\},
\]
for $m \le n$. These events have two properties:
1. If $i < j \le k < l$, then the events $A_{i,j}$ and $A_{k,l}$ are independent.
   This is because $A_{i,j}$ only depends on the steps $\xi_{i+1}, \xi_{i+2}, \dots, \xi_j$ of the random walk, while $A_{k,l}$
   only depends on the steps $\xi_{k+1}, \xi_{k+2}, \dots, \xi_l$, and all the steps of the random walk are independent.
2. $P(A_{i,j}) = P(A_{n+i,n+j})$ for all $n \in \mathbb{N}_0$.
   The event $A_{i,j}$ is determined by some relatively complicated formula applied to the random
   variables $(\xi_{i+1}, \xi_{i+2}, \dots, \xi_j)$, and the event $A_{n+i,n+j}$ is determined by the exact same formula
   applied to the random variables $(\xi_{n+i+1}, \xi_{n+i+2}, \dots, \xi_{n+j})$.
   More precisely, we can choose a set $B \subseteq \{-1, 1\}^{j-i}$ such that $A_{i,j} = \{(\xi_{i+1}, \xi_{i+2}, \dots, \xi_j) \in B\}$
   and $A_{n+i,n+j} = \{(\xi_{n+i+1}, \xi_{n+i+2}, \dots, \xi_{n+j}) \in B\}$. As the random vectors $(\xi_{i+1}, \xi_{i+2}, \dots, \xi_j)$
   and $(\xi_{n+i+1}, \xi_{n+i+2}, \dots, \xi_{n+j})$ have the same joint distribution, the events $A_{i,j}$ and $A_{n+i,n+j}$
   must have the same probability.
So, if $m, n \in \mathbb{N}$, then
\[
P(T_2 = m+n \,|\, T_1 = m) = P(A_{m,m+n} \,|\, A_{0,m}) = P(A_{m,m+n}) = P(A_{0,n}) = P(T_1 = n).
\]
It then follows from Proposition 2.12 that $G_{T_2}(s) = G_{T_1}^2(s)$.
We will now use the previous relationship between $T_1$ and $T_2$ to obtain the generating function
for $T_1$ explicitly.
Proposition 3.8. Let $(X_1, X_2, \dots)$ be a random walk with probability $p$ of moving up, and let
\[
T_1 = \min\{n \in \mathbb{N}_0 : X_n = 1\}
\]
denote the first time that the random walk hits the level 1. Then the generating function for $T_1$ is
given by
\[
G_{T_1}(s) = \frac{1 - \sqrt{1 - 4pqs^2}}{2qs}.
\]
Proof. Our strategy is to condition on the first move that the random walk makes, and then derive a
recursive formula for the generating function. As a result, it will be useful to consider an auxiliary
process given by:
\[
Y_n = X_{n+1} - X_1, \quad n \in \mathbb{N}_0.
\]
So $Y$ corresponds to the changes in the process $X$ after the first step. It is not hard to check that
$Y$ is also a random walk with probability $p$ of moving up at each step, and that $Y$ is independent of $X_1$.
It turns out that it will be useful to consider the random variable:
\[
T_2 = \text{"the first time that $Y$ hits the level 2"} = \min\{n \in \mathbb{N}_0 : Y_n = 2\}.
\]
As $Y$ is a random walk, the generating function for $T_2$ is given by $G_{T_1}^2$. As $Y$ is independent of $X_1$,
and $T_2$ is determined by $Y$, $T_2$ is also independent of $X_1$.
Crucial observation: if the first step that $X$ makes is down, then $T_1 = 1 + T_2$. This is because
$X$ now has to climb two steps up to get from $-1$ up to $1$, which is equivalent to $Y$ climbing two
steps up from $0$ to $2$. The other thing to realize is that $Y$ is running on a clock that is one time
step behind $X$: if $Y$ first hits 2 at time $T_2$ relative to its clock, then $X$ first hits 1 at time $1 + T_2$
relative to the initial clock.
Now we make the recursive argument:
\[
G_{T_1}(s) = E[s^{T_1}] = E[s^{T_1} \,|\, X_1 = 1]\,P(X_1 = 1) + E[s^{T_1} \,|\, X_1 = -1]\,P(X_1 = -1)
\]
\[
= s\,p + E[s^{1+T_2} \,|\, X_1 = -1]\,(1-p)
= s\,p + s\,E[s^{T_2}]\,(1-p)
= s\,p + s\,G_{T_1}^2(s)\,(1-p).
\]
We now know that $G_{T_1}$ solves $G_{T_1}(s) = sp + sq\,G_{T_1}^2(s)$ for $|s| < 1$. There are two possible
solutions to this quadratic equation (for each $s$), given by
\[
G_{T_1}(s) = \frac{1 \pm \sqrt{1 - 4pqs^2}}{2qs}.
\]
One of these solutions is always greater than 1 in absolute value, so it cannot correspond to a value
of a generating function, and we must select the negative square root.
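One can verify numerically that the chosen root indeed satisfies the recursion $G_{T_1}(s) = sp + sq\,G_{T_1}^2(s)$ and takes values in $[0, 1]$. A sketch; the value $p = 0.6$ is an arbitrary choice:

```python
import math

p = 0.6
q = 1 - p

def G_T1(s):
    # the negative-root solution selected in the proof
    return (1 - math.sqrt(1 - 4 * p * q * s * s)) / (2 * q * s)

for s in (0.1, 0.4, 0.9, 0.99):
    # G_T1 satisfies the first-step recursion and is a valid GF value
    assert abs(G_T1(s) - (s * p + s * q * G_T1(s) ** 2)) < 1e-9
    assert 0 <= G_T1(s) <= 1
```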
Once we have the generating function for $T_1$, we can begin to answer some questions about this
random variable. The first question is: does the random walk eventually hit the level 1? To answer
this, we use the second part of Proposition 2.7 and compute
\[
P(T_1 < \infty) = \lim_{s \nearrow 1} G_{T_1}(s) = \frac{1 - \sqrt{1 - 4pq}}{2q} = \frac{1 - |p-q|}{2q}
= \begin{cases}
1, & p \ge \tfrac{1}{2}, \\
\tfrac{p}{q}, & p < \tfrac{1}{2},
\end{cases}
\]
where we have used the fact that
\[
|p-q|^2 = (2p-1)^2 = 4p^2 - 4p + 1 = 1 - 4pq.
\]
In particular, if the random walk is more likely to go down than to go up, then there is
some strictly positive chance that it will never hit the level 1. It is remarkable that if $p = \tfrac{1}{2}$, the
random walk will always hit 1 sooner or later, but this does not need to happen if $p < \tfrac{1}{2}$. What
we have here is an example of a phenomenon known as criticality: many physical systems exhibit
qualitatively different behavior depending on whether the value of a certain parameter $p$ lies above
or below a certain critical value $p = p_c$.
Another question that generating functions can help us answer is: how long, on average, do
we need to wait before 1 is hit? When $p < \tfrac{1}{2}$, $P[T_1 = +\infty] > 0$, so we can immediately conclude
that $E[T_1] = +\infty$, by definition. The case $p \ge \tfrac{1}{2}$ is more interesting. Following the recipe from the
lecture on generating functions, we compute the derivative of $G_{T_1}(s)$ and get
\[
G_{T_1}'(s) = \frac{2p}{\sqrt{1 - 4pqs^2}} - \frac{1 - \sqrt{1 - 4pqs^2}}{2qs^2}.
\]
When $p = \tfrac{1}{2}$, we get
\[
\lim_{s \nearrow 1} G_{T_1}'(s) = \lim_{s \nearrow 1} \left( \frac{1}{\sqrt{1-s^2}} - \frac{1 - \sqrt{1-s^2}}{s^2} \right) = +\infty,
\]
and conclude that $E[T_1] = +\infty$.
For $p > \tfrac{1}{2}$, the situation is less severe:
\[
\lim_{s \nearrow 1} G_{T_1}'(s) = \frac{1}{p-q}.
\]
We can summarize the situation in the following table:

               P[T_1 < infinity]     E[T_1]
    p < 1/2          p/q            +infinity
    p = 1/2           1             +infinity
    p > 1/2           1             1/(p-q)
Finally, we can try to extract the probability mass function of the random variable $T_1$ from the
generating function $G_{T_1}$. The obvious way to do this is to compute higher and higher derivatives
of $G_{T_1}$ and then set $s = 0$. In this case, it turns out there is an easier way.
The square root appearing in the formula for $G_{T_1}$ is an expression of the form $(1+x)^{1/2}$, and
the (generalized) binomial formula can be used:
\[
(1+x)^\alpha = \sum_{k=0}^{\infty} \binom{\alpha}{k} x^k, \quad \text{where} \quad
\binom{\alpha}{k} = \frac{\alpha(\alpha-1)\cdots(\alpha-k+1)}{k!}, \quad k \in \mathbb{N}, \ \alpha \in \mathbb{R}.
\]
Therefore,
\[
G_{T_1}(s) = \frac{1}{2qs} - \frac{1}{2qs} \sum_{k=0}^{\infty} \binom{1/2}{k} (-4pqs^2)^k
= \sum_{k=1}^{\infty} s^{2k-1}\,\frac{1}{2q}\,(4pq)^k (-1)^{k-1} \binom{1/2}{k},
\]
and
\[
a_{2k-1} = \frac{1}{2q}\,(4pq)^k (-1)^{k-1} \binom{1/2}{k}, \quad k \in \mathbb{N}.
\]
Of course, the random walk cannot move from 0 to 1 in an even number of steps, so $a_n = 0$ if $n$ is
even. This expression can be simplified a bit further: one can show (by induction on $k$, for instance)
that
\[
\binom{1/2}{k} = \frac{2(-1)^{k+1}}{4^k(2k-1)} \binom{2k-1}{k}.
\]
Thus,
\[
P(T_1 = 2k-1) = \frac{1}{2k-1} \binom{2k-1}{k} p^k q^{k-1}, \quad k \in \mathbb{N},
\]
and $P(T_1 = k) = 0$ when $k$ is even.
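This pmf can be cross-checked against a brute-force enumeration of all step sequences of length $2k-1$, in exact rational arithmetic; the value $p = 3/5$ is an arbitrary choice:

```python
from fractions import Fraction
from itertools import product
from math import comb

p, q = Fraction(3, 5), Fraction(2, 5)

def pmf_formula(k):
    # P(T_1 = 2k-1) = C(2k-1, k) p^k q^(k-1) / (2k-1)
    return Fraction(comb(2 * k - 1, k), 2 * k - 1) * p**k * q ** (k - 1)

def pmf_bruteforce(k):
    # sum the probabilities of all paths that first reach 1 at time 2k-1
    n = 2 * k - 1
    total = Fraction(0)
    for steps in product((1, -1), repeat=n):
        partial, hit_before = 0, False
        for step in steps[:-1]:
            partial += step
            hit_before |= (partial == 1)
        if not hit_before and partial + steps[-1] == 1:
            ups = steps.count(1)
            total += p**ups * q ** (n - ups)
    return total

for k in (1, 2, 3, 4):
    assert pmf_formula(k) == pmf_bruteforce(k)
```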
3.4 Strong Markov property
In the previous section, we observed that $Y_n = X_{n+1} - X_1$ is again a random walk. Intuitively,
if we stop, reset time and space, and then continue, the resulting process that we obtain is again a
random walk. It turns out this is true not only for deterministic times, but even for stopping times.
This is in fact a consequence of the "strong Markov" property, which we will define later in the course.
Proposition 3.9. Let $(X_n)_{n \in \mathbb{N}_0}$ be a random walk with parameter $p$, and let $T$ be a stopping time with
respect to $(X_n)_{n \in \mathbb{N}_0}$ which never takes the value $\infty$. If we define the process
\[
Y_n = X_{T+n} - X_T, \quad n \in \mathbb{N}_0,
\]
then $(Y_n)_{n \in \mathbb{N}_0}$ is also a random walk with parameter $p$.
4 Branching Process
4.1 A bit of history
In the mid-19th century, several aristocratic families in Victorian
England realized that their family names could become extinct.
Was it just unfounded paranoia, or did something real
prompt them to come to this conclusion? They decided to ask
around, and Sir Francis Galton (a "polymath, anthropologist,
eugenicist, tropical explorer, geographer, inventor, meteorologist,
proto-geneticist, psychometrician and statistician" and half-cousin
of Charles Darwin) posed the following question (1873,
Educational Times):
    How many male children (on average) must each generation
    of a family have in order for the family name
    to continue in perpetuity?
The first complete answer came from Reverend Henry William
Watson soon after, and the two wrote a joint paper entitled On
the probability of extinction of families in 1874. By the end of
this section, you will be able to give a precise answer to Galton's
question.
[Portrait: Sir Francis Galton]
4.2 A mathematical model
The model proposed by Watson was the following:
1. A population starts with one individual at time $n = 0$: $Z_0 = 1$.
2. After one unit of time (at time $n = 1$) the sole individual produces $Z_1$ identical clones of itself
   and dies. $Z_1$ is an $\mathbb{N}_0$-valued random variable.
3. (a) If $Z_1$ happens to be equal to 0, the population is dead and nothing happens at any future
       time $n \ge 2$.
   (b) If $Z_1 > 0$, a unit of time later each of the $Z_1$ individuals gives birth to a random number
       of children and dies. The first one has $Z_{1,1}$ children, the second one $Z_{1,2}$ children, etc.
       The last, $Z_1$-th one, gives birth to $Z_{1,Z_1}$ children. We assume that the distribution of the
       number of children is the same for each individual in every generation and independent
       of both the number of individuals in the generation and the number of children
       the others have. This distribution, shared by all $Z_{n,i}$ and $Z_1$, is called the offspring
       distribution. The total number of individuals in the second generation is now
       \[
       Z_2 = \sum_{k=1}^{Z_1} Z_{1,k}.
       \]
   (c) The third, fourth, etc. generations are produced in the same way. If it ever happens that
       $Z_n = 0$ for some $n$, then $Z_m = 0$ for all $m \ge n$ - the population is extinct. Otherwise,
       \[
       Z_{n+1} = \sum_{k=1}^{Z_n} Z_{n,k}.
       \]
Definition 4.1. A stochastic process with the properties described in (1), (2) and (3) above is
called a (simple) branching process.
The mechanism that produces the next generation from the present one can differ from application
to application. It is the offspring distribution alone that determines the evolution of a branching
process. With this new formalism, we can pose Galton's question more precisely:
    Under what conditions on the offspring distribution will the process $(Z_n)_{n \in \mathbb{N}_0}$ never go
    extinct, i.e., when does
    \[
    P[Z_n \ge 1 \text{ for all } n \in \mathbb{N}_0] = 1 \tag{4.1}
    \]
    hold?
4.3 Construction and simulation of branching processes
Before we answer Galton's question, let us figure out how to simulate a branching process for a
given offspring distribution $p(k) = P[Z_1 = k]$, $k \in \mathbb{N}_0$. When we studied simulation, we showed
that it is possible to construct a function $g : [0,1] \to \mathbb{N}_0$ such that the random variable $g(U)$ has
probability mass function $p$ when $U \sim \mathrm{Uniform}(0,1)$.
Some time ago we asserted that a probability space which supports a sequence $(U_n)_{n \in \mathbb{N}_0}$ of
independent $U[0,1]$ random variables exists. We think of $(U_n)_{n \in \mathbb{N}_0}$ as a sequence of random numbers
produced by a computer. Let us first apply the function $g$ to each member of $(U_n)_{n \in \mathbb{N}_0}$ to obtain
an independent sequence $(\eta_n)_{n \in \mathbb{N}_0}$ of $\mathbb{N}_0$-valued random variables with pmf $p$. In the case of a
simple random walk, we would be done at this point - an accumulation of the first $n$ elements of
$(\eta_n)_{n \in \mathbb{N}_0}$ would give you the value $X_n$ of the random walk at time $n$. Branching processes are a bit
more complicated: the increment $Z_{n+1} - Z_n$ depends on $Z_n$ - the more individuals in a generation,
the more offspring they will produce. In other words, we need a black box with two inputs -
"randomness" and $Z_n$ - which will produce $Z_{n+1}$. What do we mean by "randomness"? Ideally, we
would need exactly $Z_n$ (unused) elements of $(\eta_n)_{n \in \mathbb{N}_0}$ to simulate the number of children for each
of the $Z_n$ members of generation $n$. This is exactly how one would do it in practice: given the size
$Z_n$ of generation $n$, one would draw $Z_n$ simulations from the distribution with pmf $p$, and sum up
the results to get $Z_{n+1}$. Mathematically, it is easier to be more wasteful. The sequence $(\eta_n)_{n \in \mathbb{N}_0}$
can be rearranged into a double sequence$^2$ $\{Z_{n,i}\}_{n \in \mathbb{N}_0, i \in \mathbb{N}}$. In words, instead of one sequence of
independent random variables with pmf $p$, we have a sequence of sequences. Such an abundance
allows us to feed the whole "row" $\{Z_{n,i}\}_{i \in \mathbb{N}}$ into the black box which produces $Z_{n+1}$ from $Z_n$. You
can think of $Z_{n,i}$ as the number of children the $i$-th individual in the $n$-th generation would have had,
had she been born. The black box uses only the first $Z_n$ elements of $\{Z_{n,i}\}_{i \in \mathbb{N}}$ and discards the rest:
\[
Z_0 = 1, \quad Z_{n+1} = \sum_{i=1}^{Z_n} Z_{n,i},
\]
where all $\{Z_{n,i}\}_{n \in \mathbb{N}_0, i \in \mathbb{N}}$ are independent of each other and have the same distribution with pmf $p$.
Once we learn a bit more about the probabilistic structure of $(Z_n)_{n \in \mathbb{N}_0}$, we will describe another
way to simulate it.
$^2$ Can you find a one-to-one and onto mapping from $\mathbb{N}$ into $\mathbb{N} \times \mathbb{N}$?
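In code, the "practical" simulation scheme described above, drawing only the $Z_n$ offspring counts actually needed, might look like the following sketch; the offspring distribution passed in at the end is an arbitrary example:

```python
import random

random.seed(1)  # seeded so that repeated runs agree

def simulate_branching(offspring_pmf, generations):
    """Simulate Z_0, ..., Z_generations; offspring_pmf is [p(0), p(1), ...]
    and each individual draws its offspring count independently."""
    support = list(range(len(offspring_pmf)))
    z = [1]                                   # Z_0 = 1
    for _ in range(generations):
        if z[-1] == 0:                        # extinction is absorbing
            z.append(0)
            continue
        z.append(sum(random.choices(support, weights=offspring_pmf, k=z[-1])))
    return z

# offspring distribution: 0, 1 or 2 children with equal probability
path = simulate_branching([1 / 3, 1 / 3, 1 / 3], generations=20)
assert path[0] == 1
if 0 in path:
    assert all(zn == 0 for zn in path[path.index(0):])
```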
4.4 A generatingfunction approach
Having defined and constructed a branching process $(Z_n)_{n \in \mathbb{N}_0}$ with offspring distribution given by
the pmf $p$, let us analyze its probabilistic structure. The first question that needs to be answered is
the following: what is the distribution of $Z_n$, for $n \in \mathbb{N}_0$? It is clear that $Z_n$ must be $\mathbb{N}_0$-valued,
so its distribution is completely described by its pmf, which is, in turn, completely determined by
its generating function. While an explicit expression for the pmf of $Z_n$ may not be available, its
generating function can always be computed:
Proposition 4.2. Let $(Z_n)_{n \in \mathbb{N}_0}$ be a branching process, and let the generating function of its
offspring distribution be given by $G(s)$. Then the generating function of $Z_n$ is the $n$-fold composition
of $G$ with itself, i.e.,
\[
G_{Z_n}(s) = \underbrace{G(G(\dots G(s) \dots))}_{n \text{ $G$'s}}, \quad \text{for } n \ge 1.
\]
Proof. For n = 1, the distribution of Z_1 has pmf p, so G_{Z_1}(s) = G(s). Suppose that the statement of the proposition holds for some n ∈ N. Then

Z_{n+1} = Σ_{i=1}^{Z_n} Z_{n,i}

can be viewed as a random sum of Z_n independent random variables, where each random summand has generating function G and the number of summands Z_n is independent of the terms in the sum. Proposition 2.18 asserts that the generating function G_{Z_{n+1}} of Z_{n+1} is the composition of the generating function G_{Z_n} of the random time Z_n with the generating function G(s) of each of the summands. Therefore,

G_{Z_{n+1}}(s) = G_{Z_n}(G(s)) = G(G(. . . G(G(s)) . . . ))   (n + 1 G's),

where the second equality follows from the induction hypothesis.
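When the offspring distribution has finite support, G is a polynomial, and the composition in Proposition 4.2 can be carried out explicitly on coefficient lists. A minimal sketch (the offspring pmf used at the bottom is an arbitrary illustration):

```python
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def poly_compose(outer, inner):
    """Coefficients of outer(inner(s)), via Horner's scheme."""
    result = [0.0]
    for c in reversed(outer):
        result = poly_mul(result, inner)
        result[0] += c
    return result

def generation_pmf(p_coeffs, n):
    """pmf of Z_n as the coefficient list of the n-fold composition
    G(G(...G(s)...)), starting from G_{Z_0}(s) = s (since Z_0 = 1)."""
    g = [0.0, 1.0]
    for _ in range(n):
        g = poly_compose(g, p_coeffs)
    return g

# assumed offspring pmf p(0) = 0.25, p(1) = 0.5, p(2) = 0.25
pmf_Z2 = generation_pmf([0.25, 0.5, 0.25], 2)
print(pmf_Z2[0])  # P[Z_2 = 0] = G(G(0)) = 0.390625
```

The coefficient at index k of the result is P[Z_n = k], so the whole pmf of Z_n is recovered, not just its generating function.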
Let us use Proposition 4.2 in some simple examples.

Example 4.3. Let (Z_n)_{n∈N_0} be a branching process with offspring distribution (p_n)_{n∈N_0}. In the first three examples no randomness occurs, and the population growth can be described exactly. In the other examples, more interesting things happen.

1. p(0) = 1, p(n) = 0, n ∈ N:
In this case Z_0 = 1 and Z_n = 0 for all n ∈ N. This infertile population dies out after the first generation.

2. p(0) = 0, p(1) = 1, p(n) = 0, n ≥ 2:
Each individual produces exactly one child before he/she dies. The population size is always 1: Z_n = 1, n ∈ N_0.

3. p(0) = · · · = p(k − 1) = 0, p(k) = 1, p(n) = 0, n > k, for some k ≥ 2:
Here there are k kids per individual, so the population grows exponentially: G(s) = s^k, so

G_{Z_n}(s) = ((. . . (s^k)^k . . . )^k)^k = s^{k^n}.

Therefore, Z_n = k^n, for n ∈ N.
4. p(0) = p, p(1) = q = 1 − p, p(n) = 0, n ≥ 2:
Each individual tosses a (possibly biased) coin, and has one child if the outcome is heads, or dies childless if the outcome is tails. The generating function of the offspring distribution is G(s) = p + qs. Therefore,

G_{Z_n}(s) = p + q(p + q(p + q(. . . (p + qs))))   (n pairs of parentheses).

The expression above can be simplified considerably. One needs to realize two things:

(a) After all the products above are expanded, the resulting expression must be of the form A + Bs, for some A, B. If you inspect the expression for G_{Z_n} even more closely, you will see that the coefficient B next to s is just q^n.

(b) G_{Z_n} is the generating function of a probability distribution, so A + B = 1.

Therefore,

G_{Z_n}(s) = (1 − q^n) + q^n s.

Of course, the value of Z_n will be equal to 1 if and only if all of the coin tosses of its ancestors turned out to be heads. The probability of that event is q^n, so we didn't need Proposition 4.2 after all.

This example can be interpreted alternatively as follows. Each individual has exactly one child, but its gender is determined at random: male with probability q and female with probability p. Assuming that all females change their last name when they marry, and assuming that all of them marry, Z_n is just the number of individuals carrying the family name after n generations.
5. p(0) = p², p(1) = 2pq, p(2) = q², p(n) = 0, n ≥ 3:
In this case each individual has exactly two children, and their gender is female with probability p and male with probability q, independently of each other. The generating function G of the offspring distribution (p_n)_{n∈N_0} is given by G(s) = (p + qs)². Then

G_{Z_n}(s) = (p + q(p + q(. . . (p + qs)² . . . )²)²   (n pairs of parentheses).

Unlike the example above, it is not so easy to simplify this expression.
Proposition 4.2 can be used to compute the mean and the variance of the population size Z_n, for n ∈ N.

Proposition 4.4. Let p denote the pmf of the offspring distribution of a branching process (Z_n)_{n∈N_0}. If p is integrable, i.e., if

μ = Σ_{k=0}^{∞} k p(k) < ∞,

then

E[Z_n] = μ^n.   (4.2)

If the variance of p is also finite, i.e., if

σ² = Σ_{k=0}^{∞} (k − μ)² p(k) < ∞,

then

Var[Z_n] = σ² μ^{n−1} (1 + μ + μ² + · · · + μ^{n−1}) =
   σ² μ^{n−1} (1 − μ^n)/(1 − μ),   μ ≠ 1,
   σ² n,                           μ = 1.   (4.3)

(For n = 1, both expressions reduce to Var[Z_1] = σ², as they should.)
Proof. Since the distribution of Z_1 has probability mass function p, it is clear that E[Z_1] = μ and Var[Z_1] = σ². We proceed by induction, and assume that the formulas (4.2) and (4.3) hold for some n ∈ N. By Proposition 4.2, the generating function G_{Z_{n+1}} is given as the composition G_{Z_{n+1}}(s) = G_{Z_n}(G(s)). Therefore, if we use the identity E[Z_{n+1}] = G′_{Z_{n+1}}(1), we get

G′_{Z_{n+1}}(1) = G′_{Z_n}(G(1)) G′(1) = G′_{Z_n}(1) G′(1) = E[Z_n] E[Z_1] = μ^n · μ = μ^{n+1}.

A similar (but more complicated and less illuminating) argument can be used to establish (4.3).
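Formula (4.2) lends itself to a quick sanity check by simulation. In this sketch the offspring pmf, the seed, and the sample size are all arbitrary choices made for illustration:

```python
import random

rng = random.Random(42)
pmf = {0: 0.2, 1: 0.5, 2: 0.3}   # assumed offspring pmf; mu = 1.1
mu = sum(k * w for k, w in pmf.items())

def sample_generation(n):
    """Draw one realization of Z_n by simulating n generations from Z_0 = 1."""
    z = 1
    for _ in range(n):
        if z == 0:
            break
        z = sum(rng.choices(list(pmf), weights=list(pmf.values()), k=z))
    return z

n, reps = 3, 40000
mc_mean = sum(sample_generation(n) for _ in range(reps)) / reps
print(mu ** n, round(mc_mean, 3))  # the two numbers should be close
```

The Monte Carlo average over 40,000 runs should land within a few hundredths of μ³ = 1.331 for this pmf.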
4.5 Extinction probability

We now turn to the central question, the one posed by Galton. We define extinction to be the following event:

E = {ω ∈ Ω : Z_n(ω) = 0 for some n ∈ N}.

It follows from the properties of the branching process that Z_m = 0 for all m ≥ n whenever Z_n = 0. Therefore, we can write E as an increasing union of the sets E_n, where

E_n = {ω ∈ Ω : Z_n(ω) = 0}.

In particular, the sequence (P[E_n])_{n∈N} is non-decreasing, and "continuity of probability" implies that

P[E] = lim_{n→∞} P[E_n].
The number P[E] is called the extinction probability. Using generating functions and, in particular, the fact that P[E_n] = P[Z_n = 0] = G_{Z_n}(0), we get

P[E] = lim_{n→∞} G_{Z_n}(0) = lim_{n→∞} G(G(. . . G(0) . . . ))   (n G's).

It is amazing that this probability can be computed, even if the explicit form of the generating function G_{Z_n} is not known.
Proposition 4.5. The extinction probability p = P[E] is the smallest non-negative solution of the equation

x = G(x),   called the extinction equation,

where G is the generating function of the offspring distribution.

Proof. Let us show first that p = P[E] is a solution of the equation x = G(x). Indeed, G is a continuous function, so G(lim_{n→∞} x_n) = lim_{n→∞} G(x_n) for every convergent sequence (x_n)_{n∈N_0} in [0, 1]. Let us take the particular sequence given by

x_n = G(G(. . . G(0) . . . ))   (n G's).
Then

1. p = P[E] = lim_{n→∞} x_n, and
2. G(x_n) = x_{n+1}.

Therefore,

p = lim_{n→∞} x_n = lim_{n→∞} x_{n+1} = lim_{n→∞} G(x_n) = G(lim_{n→∞} x_n) = G(p),

and so p solves the equation G(x) = x.

The fact that p = P[E] is the smallest solution of x = G(x) on [0, 1] is a bit trickier to establish. Let p′ be another solution of x = G(x) on [0, 1]. Since 0 ≤ p′ and G is a non-decreasing function on [0, 1], we have

G(0) ≤ G(p′) = p′.

We can apply the function G to both sides of the inequality above to get

G(G(0)) ≤ G(G(p′)) = G(p′) = p′.

Continuing in the same way, we get

P[E_n] = G(G(. . . G(0) . . . ))   (n G's)   ≤ p′,

so p = P[E] = lim_{n→∞} P[E_n] ≤ p′, i.e., p is not larger than any other solution p′ of x = G(x).
Example 4.6. Let us compute the extinction probabilities in the cases from Example 4.3.

1. p(0) = 1, p(n) = 0, n ∈ N:
No need to use any theorems: P[E] = 1 in this case.

2. p(0) = 0, p(1) = 1, p(n) = 0, n ≥ 2:
As above, the situation is clear: P[E] = 0.

3. p(0) = · · · = p(k − 1) = 0, p(k) = 1, p(n) = 0, n > k, for some k ≥ 2:
No extinction here either: P[E] = 0.

4. p(0) = p, p(1) = q = 1 − p, p(n) = 0, n ≥ 2:
Since G(s) = p + qs, the extinction equation is s = p + qs. If p = 0, the equation reads s = s, so its smallest non-negative solution is s = 0 and no extinction occurs. If p > 0, the only solution is s = 1, so extinction is guaranteed. It is interesting to note the jump in the extinction probability as p changes from 0 to a positive number.

5. p(0) = p², p(1) = 2pq, p(2) = q², p(n) = 0, n ≥ 3:
Here G(s) = (p + qs)², so the extinction equation reads

s = (p + qs)².

This is a quadratic equation in s, and its solutions are s_1 = 1 and s_2 = p²/q², if we assume that q > 0. When p < q, the smaller of the two is s_2; when p ≥ q, s = 1 is the smallest solution. Therefore,

P[E] = min(1, p²/q²).
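The limit lim_n G(G(. . . G(0) . . . )) underlying Proposition 4.5 is exactly fixed-point iteration started at 0, so the extinction probability is easy to compute numerically. A sketch for the last example; the particular values of p and q are an arbitrary choice:

```python
def extinction_probability(G, tol=1e-12, max_iter=10000):
    """Iterate x_{n+1} = G(x_n) from x_0 = 0; by Proposition 4.5 the limit
    is the smallest non-negative fixed point of G."""
    x = 0.0
    for _ in range(max_iter):
        nxt = G(x)
        if abs(nxt - x) < tol:
            return nxt
        x = nxt
    return x

p, q = 0.3, 0.7
G = lambda s: (p + q * s) ** 2
print(extinction_probability(G))  # ≈ p^2/q^2 ≈ 0.1837, the smaller root
```

Replacing (p, q) by (0.6, 0.4) makes p ≥ q, and the iteration converges to 1 instead, in agreement with min(1, p²/q²).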
References
[1] Patrick Billingsley. Probability and measure. Wiley Series in Probability and Mathematical
Statistics: Probability and Mathematical Statistics. John Wiley & Sons Inc., New York, second
edition, 1986.
1 Random Walks I

1.1 Stochastic processes
Definition 1.1. Let T be a subset of [0, ∞). A family of random variables (X_t)_{t∈T}, indexed by T, is called a stochastic (or random) process. When T = N (or T = N_0), (X_t)_{t∈T} is said to be a discrete-time process, and when T = [0, ∞), it is called a continuous-time process.

When T is a singleton (say T = {1}), the process (X_t)_{t∈T} ≡ X_1 is really just a single random variable. When T is finite (e.g., T = {1, 2, . . . , n}), we get a random vector. Therefore, stochastic processes are generalizations of random vectors. The interpretation is, however, somewhat different. While the components of a random vector usually (not always) stand for different spatial coordinates, the index t ∈ T is more often than not interpreted as time. Stochastic processes usually model the evolution of a random system in time. When T = [0, ∞) (continuous-time processes), the value of the process can change at every instant. When T = N (discrete-time processes), the changes occur discretely.

In contrast to the case of random vectors or random variables, it is not easy to define a notion of a density (or a probability mass function) for a stochastic process. Without going into the details of why exactly this is a problem, let me just mention that the main culprit is infinity. One usually considers, instead, a family of (discrete, continuous, etc.) finite-dimensional distributions, i.e., the joint distributions of the random vectors (X_{t_1}, X_{t_2}, . . . , X_{t_n}), for all n ∈ N and all choices t_1, . . . , t_n ∈ T.

The notion of a stochastic process is very important both in mathematical theory and in its applications in science, engineering, economics, etc. It is used to model a large number of various phenomena where the quantity of interest varies discretely or continuously through time in a non-predictable fashion.

Every stochastic process can be viewed as a function of two variables, t and ω. For each fixed t, ω → X_t(ω) is a random variable, as postulated in the definition.
However, if we change our point of view and keep ω fixed, we see that the stochastic process is a function mapping ω to the real-valued function t → X_t(ω). These functions are called the trajectories of the stochastic process X. The following two figures show two possible trajectories of a simple random walk¹, i.e., each one corresponds to a (different) frozen ω ∈ Ω, but t varies from 0 to 30.
¹ We will define the simple random walk later. For now, let us just say that it behaves as follows. It starts at x = 0 at t = 0. After that, a (possibly biased) coin is tossed, and we move up (to x = 1) if heads is observed, and down (to x = −1) if we see tails. The procedure is repeated at t = 1, 2, . . . , and the position at t + 1 is determined in the same way, independently of all the coin tosses before (note that the position at t = k can be any of the following: x = −k, x = −k + 2, . . . , x = k − 2, x = k).
Unlike the figures above, the next two pictures show two time-slices of the same random process; in each graph, the time t is fixed (t = 15 vs. t = 25), and the various values the random variables X_15 and X_25 can take are presented through their probability mass functions.
Figure 1: Probability mass function for X15
Figure 2: Probability mass function for X25
1.2 The canonical probability space
When one deals with infinite-index (|T| = +∞) stochastic processes, the construction of the probability space (Ω, F, P) to support a given model is usually quite a technical matter. This course does not suffer from that problem, because all our models can be implemented on a special probability space. We start with the sample space Ω:

Ω = [0, 1] × [0, 1] × · · · = [0, 1]^{N_0},

and any generic element of Ω will be a sequence ω = (ω_0, ω_1, ω_2, . . . ) of real numbers in [0, 1]. For n ∈ N_0 we define the mapping U_n : Ω → [0, 1] which simply chooses the n-th coordinate: U_n(ω) = ω_n. The proof of the following theorem can be found in most advanced probability books (e.g. [1], Thm. 20.4):

Theorem 1.2. There exists a probability measure P on Ω such that

1. each U_n, n ∈ N_0, is a random variable with the uniform distribution on [0, 1], and
2. the sequence (U_n)_{n∈N_0} is independent.

Remark 1.3. One should think of the sample space Ω as the source of all the randomness in the system: the elementary event ω ∈ Ω is chosen by a process beyond our control, and the exact value of ω is assumed to be unknown. All the other parts of the system are possibly complicated, but deterministic, functions of ω (random variables). When a coin is tossed, only a single drop of randomness is needed: the outcome of a coin toss. When several coins are tossed, more randomness is involved, and the sample space must be bigger. When a system involves an infinite number of random variables (like a stochastic process with infinite T), a large sample space Ω is needed. Once we can construct a sequence of independent random variables which are uniformly distributed on the unit interval, we can then construct any number of models. For example:
1.3 Constructing the random walk

Let us show how to construct the simple random walk on the canonical probability space (Ω, F, P) from Theorem 1.2. First of all, we need a definition of the simple random walk:

Definition 1.4. A sequence (X_n)_{n∈N_0} of random variables is called a simple random walk² (with parameter p ∈ (0, 1)) if

a) X_0 = 0,
b) X_{n+1} − X_n is independent of (X_0, X_1, . . . , X_n) for all n ∈ N, and
c) the random variable X_{n+1} − X_n takes the value 1 with probability p and the value −1 with probability q, where, as usual, q = 1 − p.

If p = 1/2, the random walk is called symmetric.

² The adjective simple comes from the fact that the size of each step is fixed (equal to 1); it is only the direction that is random. One can study more general random walks where each step comes from an arbitrary prescribed probability distribution.

For the sequence (U_n)_{n∈N} given by Theorem 1.2, define the following, new, sequence (ξ_n)_{n∈N} of random variables:

ξ_n = 1 if U_n ≤ p, and ξ_n = −1 otherwise.

We then set

X_0 = 0,   X_n = Σ_{k=1}^{n} ξ_k,   n ∈ N.

Intuitively, we use each ξ_n to emulate a biased coin toss, and then define the value of the process X at time n as the cumulative sum of the first n coin tosses.

Proposition 1.5. The sequence (X_n)_{n∈N_0} defined above is a simple random walk.

Proof. Property a) is trivially true. To check property b), we first note that (ξ_n)_{n∈N} is an independent sequence, as it has been constructed by an application of a deterministic function to each element of the independent sequence (U_n)_{n∈N}. Therefore, the increment X_{n+1} − X_n = ξ_{n+1} is independent of all the previous coin tosses ξ_1, . . . , ξ_n. What we need to prove, though, is that it is independent of all the previous values of the process X. These values X_0, X_1, . . . , X_n are nothing but linear combinations of the coin tosses ξ_1, . . . , ξ_n, so they must also be independent of ξ_{n+1}. Finally, to get c), we compute

P[X_{n+1} − X_n = 1] = P[ξ_{n+1} = 1] = P[U_{n+1} ≤ p] = p.

A similar computation shows that P[X_{n+1} − X_n = −1] = q.

We have now defined and constructed a random walk (X_n)_{n∈N_0}. Our next task is to study some of its mathematical properties.
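The construction above translates directly into code. A small sketch, with Python's pseudo-random generator standing in for the uniforms U_k of Theorem 1.2:

```python
import random

def simple_random_walk(n, p, seed=0):
    """Build the path (X_0, ..., X_n): xi_k = 1 if U_k <= p, else -1,
    and X_n is the cumulative sum of the first n coin tosses."""
    rng = random.Random(seed)
    x, path = 0, [0]
    for _ in range(n):
        u = rng.random()          # U_k, uniform on [0, 1]
        x += 1 if u <= p else -1  # xi_k
        path.append(x)
    return path

path = simple_random_walk(30, 0.5)
```

Each call with a different seed corresponds to a different frozen ω ∈ Ω, i.e., to a different trajectory.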
. . 1. . . x1 . n. n. . let C be the set of all possible trajectories: C = {(x0 . we could have noticed that the random variable n+Xn has the binomial 2 b(n. . 1. the symmetry assumption ensures that all trajectories are equally likely. . . n. . and probabilities P[Xn = l] = n l+n 2 p(n+l)/2 q (n−l)/2 . n − 2. 2) marked in red. k = 1. .6. 2 2 n The number of ways we can choose these u upsteps from the total of n is n+l . 0. . u = n+l and d = n−l . which. (1. Therefore. x1 . l = −n. . −n + 2. Let (Xn )n∈N0 be a symmetric simple random walk. Xn is composed of n independent steps ξk = Xk+1 − Xk . weighted by their probabilities. .4 The reﬂection principle Now we know how to compute the probabilities related to the position of the random walk (Xn )n∈N0 at a ﬁxed future time n. l = 0. 2 1. n}. . Proof. .. . . . .1) Proof. . . 1. . 1. . . xk+1 − xk = ±1. A mathematically more interesting question can be posed about the maximum of the random walk on {0. n}. . xn ) : x0 = 0.. it is more helpful to view the random walk as a random trajectory in some space of paths. . Let us ﬁrst pick a level l ∈ {0. . Equivalently. and let Mn = max(X0 . Xn ) be the maximal value of (Xn )n∈N0 on the interval 0. . n. .. each of which goes either up or down. x1 . . xn ) ∈ C : max xk ≥ l k=0. . The support of Mn is {0.. . n − 2. 1. n} and its probability mass function is given by P[Mn = l] = n n+l+1 2 2−n . . . k ≤ n − 1}. n} . More precisely. .Proposition 1. .1) above. . . To prepare the ground for the future results. . gives the probability (1. To compute this quantity. let Al ⊂ C be given by Al = (x0 . 5 . suppose n ≥ 2. . A nice expression for this probability is available for the case of symmetric simple random walks. . The distribution of the random variable Xn is discrete with support {−n.7. xn ) ∈ C : xk ≥ l. 2 4 2 1 2 3 4 You can think of the ﬁrst n steps of a random walk simply as a probability distribution on the statespace C. . . . . for at least one k ∈ {0. 
with the fact the probability of any trajectory with exactly u upsteps is pu q n−u . The ﬁgure on the right shows the superposition of all trajectories in C for n = 4 with path (0. . p)distribution. Indeed. and adding them all together.n = (x0 . 4 Proposition 1. . . . and. . . Let (Xn )n∈N0 be a simple random walk with parameter p. n} and compute the auxiliary probability P[Mn ≥ l] by counting the number of trajectories whose maximal level reached is at least l. compute the required probability by simply counting the number of trajectories in the subset (event) you are interested in. the number u of upsteps and the number d of downsteps must satisfy u − d = l (and u + d = n). 1. −n + 2. . In order to reach level l in those n steps.
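For small n, formula (1.1) can be verified by enumerating all 2^n step sequences; a sketch:

```python
from itertools import product
from math import comb

def pmf_formula(n, l, p):
    """P[X_n = l] = C(n, (n+l)/2) p^{(n+l)/2} q^{(n-l)/2} (Proposition 1.6)."""
    if (n + l) % 2 != 0 or abs(l) > n:
        return 0.0
    u = (n + l) // 2
    return comb(n, u) * p ** u * (1 - p) ** (n - u)

def pmf_enumerate(n, l, p):
    """Brute force: sum the weights of all step sequences ending at l."""
    total = 0.0
    for steps in product((1, -1), repeat=n):
        if sum(steps) == l:
            u = steps.count(1)
            total += p ** u * (1 - p) ** (n - u)
    return total
```

The two functions agree (up to floating-point error) for every l, which is exactly the counting argument of the proof carried out mechanically.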
. yn ) in Al . n we clearly have P[Mn ≥ 0] = 1. let k(l) = k(l. xn ) ∈ Al : xn = l}. . x1 . xn ). For a trajectory (x0 . . . y1 . . xn ) : xn > l} + {(x0 . then (x0 . where A denotes the number of elements in the set A. The picture on the right shows two trajectories: a blue one and its reﬂection in red. . . use the ﬂipped values for the cointosses from k(l) onwards: • yk(l)+1 − yk(l) = −(xk(l)+1 − xk(l) ). . x1 . . x1 . If k(l) = n. . Let us denote this transformation by Φ : Al → C. . . . . . x1 . . . . . Φ is an involution. x1 . }. yn ).Then P[Mn ≥ l] = 21 Al .. . With k(l) at our disposal. . x1 . . l + 1. . n}. and Φ(A= ) = A= . . .e. . . . since X0 = 0. let us split the set Al into three parts according to the value of xn : 1. . . l l l l 6 . A< = {(x0 . with n = 15. The ﬁrst important property of the reﬂection map is that it is its own inverse: apply Φ to any (y0 . . . xn ) = (y0 . . (y0 . . . . yn ) and call it the reﬂection map. do nothing until you get to k(l): • y0 = x0 . . . yn ) ∈ C be a trajectory obtained from (x0 . . It follows immediately that Φ is a bijection from Al onto Al . xn ) by the following procedure: 1. x1 . xn ) ∈ Al : xn < l}. i. we use the following clever observation (known as the reﬂection principle): Claim: For l ∈ N. . . yn ) looks like (x0 . . (x0 . let (y0 . . • yk(l) = xk(l) . . . l 2. In other words Φ ◦ Φ = Id. and then follows its reﬂection around the level l so that yk − l = l − xk . . . . 2. 2 6 4 2 2 4 6 8 10 12 14 • yn − yn−1 = −(xn − xn−1 ). . . To count the number of elements in Al . It is clear that (y0 . xn ) ∈ Al : xn > l}. . y1 . x1 . . (1. . • y1 = x1 . In the stochasticprocesstheory parlance. x1 . l = 4 and k(l) = 8. . . Graphically. . . for k ≥ k(l). . xn ) = (y0 . l l Φ(A< ) = A> . x1 . . xn )) be the smallest value of the index k such that xk ≥ l. and l 3. . . . . xn ) ∈ Al . .2) We start by deﬁning a bijective transformation which maps trajectories into trajectories. . . xn ) until it hits the level l. 
l So that Φ(A> ) = A< . and you will get the original (x0 . • yk(l)+2 − yk(l)+1 = −(xk(l)+2 − xk(l)+1 ). . We know that k(l) is welldeﬁned (since we are only applying it to trajectories in Al ) and that it takes values in the set {1. . xn ) : xn = l} . A= = {(x0 . yn ) is in C. . . . . . . k(l) is the ﬁrst hitting time of the set {l. . we have Al  = 2 {(x0 . . y1 . y1 . When l = 0. . y1 . . Φ(x0 . To get to the second important property of Φ. A> = {(x0 . . . x1 . . . ..
To get to the second important property of Φ, let us split the set A_l into three parts, according to the value of x_n:

1. A_l^< = {(x_0, x_1, . . . , x_n) ∈ A_l : x_n < l},
2. A_l^= = {(x_0, x_1, . . . , x_n) ∈ A_l : x_n = l}, and
3. A_l^> = {(x_0, x_1, . . . , x_n) ∈ A_l : x_n > l}.

The reflection map exchanges the first and the third part: Φ(A_l^<) = A_l^>, Φ(A_l^>) = A_l^<, and Φ(A_l^=) = A_l^=. Since Φ is a bijection, we have |A_l^<| = |A_l^>|, and so

|A_l| = 2 |{(x_0, . . . , x_n) : x_n > l}| + |{(x_0, . . . , x_n) : x_n = l}|.

We should note that, in the definition of A_l^> and A_l^=, the a priori stipulation that (x_0, . . . , x_n) ∈ A_l is unnecessary: if x_n ≥ l, the trajectory must already be in A_l. This shows the claim.

Now that we have (1.2), we can easily rewrite the probability P[M_n ≥ l] as follows:

P[M_n ≥ l] = P[X_n = l] + 2 Σ_{j>l} P[X_n = j] = Σ_{j>l} P[X_n = j] + Σ_{j≥l} P[X_n = j].

Finally, we subtract P[M_n ≥ l + 1] from P[M_n ≥ l] to get the expression for P[M_n = l]:

P[M_n = l] = P[X_n = l + 1] + P[X_n = l].

It remains to note that only one of the probabilities P[X_n = l + 1] and P[X_n = l] is nonzero: the first one if n and l have different parity, and the second one otherwise. In either case, the nonzero probability is given by C(n, ⌊(n + l + 1)/2⌋) 2^{−n}.

Let us use the reflection principle to solve a classical problem in combinatorics.
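Before moving on, the identity P[M_n = l] = P[X_n = l] + P[X_n = l + 1] from Proposition 1.7 can be verified by brute force for small n:

```python
from itertools import product
from math import comb

def pmf_X(n, l):
    """P[X_n = l] for the symmetric simple random walk."""
    if (n + l) % 2 != 0 or abs(l) > n:
        return 0.0
    return comb(n, (n + l) // 2) * 0.5 ** n

def pmf_M_formula(n, l):
    """P[M_n = l] = P[X_n = l] + P[X_n = l + 1] (reflection principle)."""
    return pmf_X(n, l) + pmf_X(n, l + 1)

def pmf_M_enumerate(n, l):
    """Brute force: count paths whose running maximum equals l."""
    count = 0
    for steps in product((1, -1), repeat=n):
        path, x = [0], 0
        for s in steps:
            x += s
            path.append(x)
        if max(path) == l:
            count += 1
    return count * 0.5 ** n
```

Comparing the two functions over all l ∈ {0, . . . , n} reproduces the pmf of M_n exactly.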
Example 1.8 (The Ballot Problem). Suppose that two candidates, Daisy and Oscar, are running for office, and n ∈ N voters cast their ballots. Votes are counted by the same official, one by one, until all n of them have been processed (like in the old days). After each ballot is opened, the official records the number of votes each candidate has received so far. At the end, the official announces that Daisy has won by a margin of m > 0 votes, i.e., that Daisy got (n + m)/2 votes and Oscar the remaining (n − m)/2 votes. What is the probability that Daisy never trails Oscar during the counting of the votes?

We assume that the order in which the official counts the votes is completely independent of the actual votes, that each voter chooses Daisy with probability p ∈ (0, 1) and Oscar with probability q = 1 − p, and that the votes are independent of each other. For k ≤ n, let X_k be the number of votes received by Daisy minus the number of votes received by Oscar in the first k ballots. When the (k + 1)-st vote is counted, X_k either increases by 1 (if the vote was for Daisy), or decreases by 1 (otherwise). Since the votes are independent and X_0 = 0, the sequence X_k, 0 ≤ k ≤ n, is (the beginning of) a simple random walk. The probability of an up-step is p ∈ (0, 1), so this random walk is not necessarily symmetric. The ballot problem can now be restated as follows: what is the probability that X_k ≥ 0 for all k ∈ {0, 1, . . . , n}, given that X_n = m?

In other words, we are interested in the conditional probability P[F|G] = P[F ∩ G]/P[G], where

F = "all trajectories that stay non-negative" = {X_i ≥ 0 for all 0 ≤ i ≤ n}, and
G = "all trajectories that reach m at time n" = {X_n = m}.

The first step towards understanding the solution is the realization that the exact value of p does not matter. Indeed, each trajectory in G has (n + m)/2 up-steps and (n − m)/2 down-steps, so its probability weight is always equal to p^{(n+m)/2} q^{(n−m)/2}. Therefore,

P[F|G] = P[F ∩ G]/P[G] = |F ∩ G| p^{(n+m)/2} q^{(n−m)/2} / (|G| p^{(n+m)/2} q^{(n−m)/2}) = |F ∩ G| / |G|.   (1.3)

We already know how to count the number of paths in G: it is equal to C(n, (n + m)/2), so "all" that remains to be done is to count the number of paths in G ∩ F. To do that, note that the collection of paths that go from 0 to m can be split into two groups:

1. G ∩ F: the paths that go from 0 to m and stay non-negative, and
2. H: the paths that go from 0 to m and become negative at some point.

If we set

H = "all paths which finish at m and visit the level l = −1" = {X_n = m and min_{0≤i≤n} X_i ≤ −1},

then G = (G ∩ F) ∪ H, so |G ∩ F| = |G| − |H|. Can we use the reflection principle to find |H|? Yes, we can. You can convince yourself that the reflection of any path in H around the level l = −1, after its first hitting time of that level, produces a path that starts at 0 and ends at −m − 2; conversely, the same procedure applied to such a path yields a path in H. If a path travels from 0 to −m − 2, then it must have (n + m + 2)/2 down-steps and (n − m − 2)/2 up-steps. This means that there are C(n, 1 + (n + m)/2) of these paths. So |G ∩ F| = |G| − |H| = C(n, k) − C(n, k + 1), where k = (n + m)/2.
Putting everything together, we get

P[F|G] = (C(n, k) − C(n, k + 1)) / C(n, k) = (2k + 1 − n)/(k + 1),   where k = (n + m)/2.

The last equality follows from the definition of the binomial coefficients, C(n, k) = n!/(k! (n − k)!). How would you modify this argument to compute the probability that Daisy leads Oscar during the entire counting of the votes?

The Ballot problem has a long history (going back to at least 1887) and has spurred a lot of research in combinatorics and probability. In fact, people still write research papers on some of its generalizations. When posed outside the context of probability, it is often phrased as "in how many ways can the counting be performed . . . ", the difference being only in the normalizing factor C(n, k) appearing in (1.3). A special case, m = 0, seems to be even more popular: the number of 2n-step paths from 0 to 0 which never go below zero is called the Catalan number, and it equals

C_n = (1/(n + 1)) C(2n, n).

Can you derive this expression from (1.3)? If you want to test your understanding a bit further, here is an identity (called Segner's recurrence formula) satisfied by the Catalan numbers:

C_n = Σ_{i=1}^{n} C_{i−1} C_{n−i}.

Can you prove it using the Ballot-problem interpretation?
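Since the answer does not depend on p, the ballot probability reduces to path counting and can be checked by enumeration; the values n = 10, m = 2 below are an arbitrary test case:

```python
from itertools import product

def ballot_probability(n, m):
    """P[X_k >= 0 for all k | X_n = m], by brute force over step sequences.
    Every path with the right endpoint has the same weight, so counting suffices."""
    good = total = 0
    for steps in product((1, -1), repeat=n):
        if sum(steps) != m:
            continue
        total += 1
        x, ok = 0, True
        for s in steps:
            x += s
            if x < 0:
                ok = False
                break
        good += ok
    return good / total

n, m = 10, 2
k = (n + m) // 2
print(ballot_probability(n, m), (2 * k + 1 - n) / (k + 1))  # both equal 3/7
```

Here |G| = C(10, 6) = 210 and |H| = C(10, 7) = 120, so the enumeration finds 90 good paths out of 210, matching the closed form.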
2 Generating Functions

"A generating function is a clothesline on which we hang up a sequence of numbers for display."
—Herbert S. Wilf, generatingfunctionology

The path-counting method used in the previous lecture only works for computations related to the first n steps of the random walk, where n is given in advance. We will see later that most of the interesting questions do not fall into this category. For example, the distribution of the time it takes for the random walk to hit a given level l is like that: there is no way to give an a priori bound on the number of steps it will take to get to l (in fact, the expectation of this random variable can be +∞). To deal with a wider class of properties of random walks (and other processes), we need to develop some new mathematical tools.

2.1 Generating functions

A generating function associates a sequence of numbers with a function (a power series). It turns out that in many cases of interest we can use this function to learn about the sequence.

Definition 2.1. If (a_n)_{n∈N_0} is a sequence of numbers, then the radius of convergence of the power series Σ_{k=0}^{∞} a_k s^k is the largest number R ∈ [0, ∞] such that Σ_{k=0}^{∞} a_k s^k converges whenever |s| < R. When R > 0, we say that the function

G(s) = Σ_{k∈N_0} a_k s^k,   −R < s < R,

is the generating function associated with the sequence (a_n)_{n∈N_0}.

Proposition 2.2. The generating function A associated with a sequence (a_n)_{n∈N_0} with radius of convergence R > 0 is infinitely differentiable on (−R, R), and

(d^n/ds^n) A(s) = Σ_{n≤k<∞} k(k − 1) · · · (k − n + 1) a_k s^{k−n},   n ∈ N.   (2.1)

In particular,

a_n = (1/n!) (d^n/ds^n) A(s) |_{s=0},

so we can recover the sequence (a_n)_{n∈N_0} from the function A. The name generating function comes from this part of the result: the knowledge of A implies the knowledge of the whole sequence (a_n)_{n∈N_0}.

Proof. If we formally differentiate each term in the expression A(s) = a_0 + a_1 s + a_2 s² + a_3 s³ + · · ·, we see that the result is again a power series, with the stated coefficients. For example,

(d/ds) A(s) = Σ_{1≤k<∞} k a_k s^{k−1},   and
(d²/ds²) A(s) = Σ_{2≤k<∞} k(k − 1) a_k s^{k−2}.

Checking this result rigorously is beyond the scope of this course.
. .4. then the ∞ Gc (s) = k=0 ck sk .2 Associating a generating function with a random variable In this section we will look at random variables which take values in the set T N0 ∪ {+∞} = {0. 1] given by an = P[X = n]. 1. the event may never occurs. . If we set c = a ∗ b.2) Notice that the value P(X = ∞) does not occur in the sequence (an )n∈N0 . Let (an )n∈N0 and (bn )n∈N0 be sequences. Checking the remaining claims rigorously is again beyond the scope of this course. 3.3. In some cases. n ∈ N0 . also has radius of convergence at least as large as R. It also turns out that we can use generating functions to study convolution. } ∪ {+∞}. It turns out that convolving two sequences is equivalent to multiplying their generating functions. Deﬁnition 2. Let (an )n∈N0 and (bn )n∈N0 be sequences and deﬁne n n cn = j=0 aj bj−k = k=0 an−k bk . ) then we see that the resulting coeﬃcient for sn is given by cn . . 2. Then we say that the sequence (cn )n∈N0 is the convolution of the sequences (an )n∈N0 and (bn )n∈N0 and we write c = a ∗ b. Proof. We will see shortly that convolution arises naturally when we compute the probability mass function of the sum of two independent.” The distribution of an Tvalued random variable X is completely determined by the sequence (an )n∈N0 of numbers in [0. We will often be interested in random variables which record the amount of (discrete) time that we have to wait for an event to occur. 2. If we formally expand and then collect like powers of s in the following expression: Ga (s)Gb (s) = (a0 + a1 s + a2 s2 + . so we allow these random variables to take the value +∞ to indicate that we have to wait “forever. ) (b0 + b1 s + b2 s2 + . let Ga and Gb denote the generating functions associated with these sequences. Proposition 2. and assume that both power series have radius of convergence at least as large as R > 0. 10 . n ∈ N0 .The name generating function comes from the last part of this result. . and Gc (s) = Ga (s)Gb (s) for s < R. (2. 
random variables. . N0 valued. but we can still ﬁgure it from the values in the sequence: P(X = ∞) = 1 − P(X < ∞) = 1 − n∈N0 an . . The knowledge of A implies the knowledge of the whole sequence (an )n∈N0 .
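Proposition 2.4 is easy to test numerically for finite sequences; a sketch, with arbitrary example sequences:

```python
def convolve(a, b):
    """c_n = sum_{k=0}^{n} a_k b_{n-k} for finite sequences (Definition 2.3)."""
    c = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            c[i + j] += x * y
    return c

def G(coeffs, s):
    """Evaluate the generating function of a finite sequence at s."""
    return sum(ck * s ** k for k, ck in enumerate(coeffs))

a, b = [1.0, 2.0, 3.0], [0.5, 0.25]
c = convolve(a, b)
s = 0.3
print(abs(G(c, s) - G(a, s) * G(b, s)) < 1e-12)  # G_c = G_a * G_b
```

For finite sequences the generating functions are polynomials, so the identity holds at every s, not just for |s| < R.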
In the future, when we say "let (a_n)_{n∈N_0} be the sequence associated with X", we mean that (a_n)_{n∈N_0} is given by (2.2). We then define the generating function associated with the sequence (a_n)_{n∈N_0} by

G_X(s) = Σ_{0≤k<∞} a_k s^k.   (2.3)

It follows from the fact that |a_n| ≤ 1 that the radius of convergence of the sequence (a_n)_{n∈N_0} is at least 1, so G_X is well defined for s ∈ (−1, 1). The function G_X is known as the generating function (or probability generating function) associated with X. Before we proceed, let us find expressions for the generating functions of some of the popular N_0-valued random variables.

Example 2.5.

(1) Bernoulli(p): Here a_0 = q, a_1 = p, and a_n = 0 for n ≥ 2, so

G_X(s) = ps + q.

(2) Binomial(n, p): Since a_k = C(n, k) p^k q^{n−k}, k = 0, 1, . . . , n, we have

G_X(s) = Σ_{k=0}^{n} C(n, k) p^k q^{n−k} s^k = (ps + q)^n,

by the binomial theorem.

(3) Geometric(p): For k ∈ N_0, a_k = q^k p, so that

G_X(s) = Σ_{k=0}^{∞} q^k p s^k = p Σ_{k=0}^{∞} (qs)^k = p/(1 − qs).

(4) Poisson(λ): Given that a_k = e^{−λ} λ^k/k!, we have

G_X(s) = Σ_{k=0}^{∞} e^{−λ} (λ^k/k!) s^k = e^{−λ} Σ_{k=0}^{∞} (sλ)^k/k! = e^{−λ} e^{sλ} = e^{λ(s−1)}.

Remark 2.6. Note that the true radius of convergence varies from distribution to distribution, but is never smaller than 1. It is infinite in (1), (2) and (4), and equal to 1/q > 1 in (3). For the distribution with pmf given by a_k = C/(k + 1)², where C = (Σ_{k=0}^{∞} 1/(k + 1)²)^{−1}, the radius of convergence is exactly equal to 1. Can you see why?

The following proposition gives another way to compute the generating function associated with a random variable.

Proposition 2.7. Let X be a T-valued random variable with generating function G_X. Then

1. G_X(s) = E[s^X] = E[s^X 1_{{X<∞}}], for s ∈ (0, 1), and
2. P[X < ∞] = lim_{s↑1} G_X(s).
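Statement (1) of Proposition 2.7 suggests a direct numerical check of the closed forms in Example 2.5. A sketch for the geometric case, approximating E[s^X] by a long truncation of the series (the parameter values are arbitrary):

```python
p, q, s = 0.4, 0.6, 0.5

# Truncated E[s^X] for X ~ Geometric(p): sum over k of q^k p s^k
series = sum(q ** k * p * s ** k for k in range(200))
closed = p / (1 - q * s)
print(series, closed)  # both close to 0.4/0.7 ≈ 0.5714
```

The truncation error is of the order (qs)^200, which is far below floating-point precision here.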
We used the formula an = P[X = n] to associate a sequence with the random variable X. one can check that that they are related by the formula GX (s) = B(log(s)). If (an )n∈N0 and GX are the sequence and generating function associated with X. In particular.8.9. Proposition 2. 2. Y be independent Tvalued random variables and set Z = X + Y .3 Convolution The true power of generating functions comes from the fact that they behave very well under the usual operations in probability. One could also use the formula bn = E[X n /n!] to associate a sequence with X. the resulting function is given by B(s) = E[esX ] and is known as the moment generating function associated with X. 12 . 1). one should really justify the fact that we can exchange the summation and limit in the previous equation. Similarly. For each n ∈ N. Of course. if s < 1. we have n n n cn = P(Z = n) = i=0 P(X = i and Y = n − i) = i=0 P(X = i) P(Y = n − i) = i=0 ai bn−1 . Proof. In this case. for s ∈ (0. then GZ (s) = E[sZ ] = E[sX+Y ] = E[sX sY ] = E[sX ] E[sY ] = GX (s)GY (s). (bn ) and Gy are the sequence and generating function associated with Y . The probability generating function will turn out to be more convenient for this class. The moment generating function is quite similar to the probability generating function. then c = a ∗ b and GZ (s) = GX (s)GY (s). Remark 2. one could employ the monotone convergence theorem from real analysis. If one then computes the generating function B corresponding to the sequence (bn ). Let X. Statement (1) follows directly from the formula E[g(X)] = n∈T g(n) P(X = n) applied to g(x) = sx where we have used the fact/convention that s∞ = 0 if s < 1. and (cn ) and GZ are the sequence and generating function associated with Z.Proof. but this is beyond the scope of this course. The second claim follows from the fact that: s lim GX (s) = lim 1 s 1 sn P(X = n) n∈N0 s = n∈N0 lim sn P(X = n) = 1 n∈N0 P(X = n) = P(X < ∞). or simply check it by hand.
Example 2.10. The binomial $b(n,p)$ distribution is the distribution of a sum of $n$ independent Bernoulli random variables, each with parameter $p$. Therefore, if we apply Proposition 2.9 $n$ times to the generating function $(q+ps)$ of the Bernoulli $b(p)$ distribution, we immediately get that the generating function of the binomial is
$$(q+ps)\cdots(q+ps)=(q+ps)^n.$$
More generally, we can show in the same way that the sum of $m$ independent random variables with the $b(n,p)$ distribution has a $b(mn,p)$ distribution. What is even more interesting, the only way to get a binomial as a sum of independent random variables is the trivial one. More formally, the following statement can be shown: suppose that the sum $Z$ of two independent $\mathbb{N}_0$-valued random variables $X$ and $Y$ is binomially distributed with parameters $n$ and $p$; then both $X$ and $Y$ are binomial with parameters $n_X$, $p$ and $n_Y$, $p$, where $n_X+n_Y=n$. Unfortunately, if you try to sum binomials with different values of the parameter $p$, you will not get a binomial.

We will actually need something slightly more complicated than Proposition 2.9 when we get back to random walks. To understand what we want, let us consider the following example.
Example 2.11. You own a pizza shop with a single delivery driver. You send your driver out with an order, but then realize that you gave him the wrong pizza. Of course, it's 1983, so you can't call your driver on a cell phone; there is nothing that you can do but wait for him to return. In the event that your driver does return to the shop after the first trip, you send him out again with the correct pizza. Your driver is actually somewhat unreliable: on each trip there is some chance that he will decide he is sick of this job and never return.

Let $T_1\in\mathbb{T}$ denote the time at which your driver gets back from the first trip, and let $T_2\ge T_1$ denote the time at which he gets back from the second trip. (Morally, these should probably be continuous random variables, but let us assume that the world is discrete. Also, if you don't like the idea that $T_1$ might take the value zero, you can just assign probability zero to this possibility.) Conditional on the event $\{T_1=m\}$, the time that it takes for him to make the second round trip is independent of the time that the first trip took and has the same distribution. More formally, we suppose that
$$P(T_2=m+n \mid T_1=m)=P(T_1=n), \qquad m,n\in\mathbb{N}_0. \qquad (2.4)$$
Note that, conditionally on $\{T_1=m\}$, $T_2$ can only take values in $\{m, m+1, m+2, \dots\}\cup\{\infty\}$.

Let $G_{T_1}$ and $G_{T_2}$ denote the generating functions associated with the random variables $T_1$ and $T_2$. We would like to check that $G_{T_2}(s)=G_{T_1}^2(s)$ for $s<1$; in other words, we would like to apply Proposition 2.9 to $T_1$ and $T_2-T_1$. There is one sticking point, however: since the driver may quit before he returns from the first round trip, we have $P(T_1=\infty)>0$ and $P(T_2=\infty)>0$. Moreover, it follows from (2.4) that
$$P(T_2=\infty \mid T_1=m)=1-\sum_{n\in\mathbb{N}_0}P(T_2=m+n\mid T_1=m)=1-\sum_{n\in\mathbb{N}_0}P(T_1=n)=P(T_1=\infty).$$
In particular, if $T_1=\infty$, then $T_2=\infty$ and $T_2-T_1$ is undefined; it simply doesn't make any sense to ask how long the second round trip took.
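The identity $G_{T_2}(s)=G_{T_1}^2(s)$ can be illustrated numerically under an assumed toy model for the trips (the example above does not specify any particular distributions, so the quitting probability $r$ and the geometric trip duration below are hypothetical choices):

```python
# Toy model (assumed for illustration): the driver quits for good with
# probability r; otherwise a round trip takes k steps with probability (1/2)^k,
# so P(T1 = k) = (1 - r) * (1/2)**k for k >= 1 and P(T1 = infinity) = r.
r, K = 0.2, 400
p1 = {k: (1 - r) * 0.5**k for k in range(1, K + 1)}  # finite part of T1's pmf

# By (2.4), on {T2 < infinity} the pmf of T2 is the convolution of the
# finite part of T1's pmf with itself.
p2 = {}
for m, pm in p1.items():
    for n, pn in p1.items():
        p2[m + n] = p2.get(m + n, 0.0) + pm * pn

def G(pmf, s):
    # Generating function; the mass at infinity contributes s^infinity = 0.
    return sum(p * s**k for k, p in pmf.items())

s = 0.5
g1, g2 = G(p1, s), G(p2, s)
```

Under this model $G_{T_1}(0.5)=(1-r)\cdot\frac{1}{3}$, $G_{T_2}=G_{T_1}^2$, and the missing mass of $T_2$ is $1-(1-r)^2$, matching $P(T_2=\infty)=1-(1-P(T_1=\infty))^2$.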
We could just define $\infty-\infty=\infty$, but this still isn't going to make $T_1$ and $T_2-T_1$ independent. Fortunately, this minor annoyance doesn't end up mattering, because we actually know something slightly stronger than independence: the conditional relation (2.4). Note also that it is enough to work with $s<1$: if two generating functions agree around zero, then it follows from Proposition 2.2 that they are generated by the same sequence, so they must agree on their entire common radius of convergence.

We have now shown the following proposition, which we will need in the next section.

Proposition 2.12. Let $T_2\ge T_1$ be random times taking values in $\mathbb{T}=\mathbb{N}_0\cup\{+\infty\}$ with generating functions $G_{T_1}$ and $G_{T_2}$. If
$$P(T_2=m+n\mid T_1=m)=P(T_1=n), \qquad m,n\in\mathbb{N}_0,$$
then $G_{T_2}(s)=G_{T_1}^2(s)$ for $s<1$.

Proof. First notice that
$$E[s^{T_2}\mid T_1=m]=\sum_{n\in\mathbb{T}} s^{m+n}\,P(T_1=n)=s^m\,G_{T_1}(s), \qquad m\in\mathbb{N}_0,$$
when $s<1$. (Formally, one can check this by applying the conditional version of the formula $E[g(X)]=\sum_n g(n)P(X=n)$ with $g(x)=s^x$; writing out the resulting summation amounts to calculating the desired expectation.) We also know that $T_2=\infty$ when $T_1=\infty$, so $E[s^{T_2}\mid T_1=\infty]=s^\infty=0$. As a result, we may apply the tower law for conditional expectation to conclude that
$$G_{T_2}(s)=E[s^{T_2}]=\sum_{m\in\mathbb{T}}E[s^{T_2}\mid T_1=m]\,P(T_1=m)=\sum_{m\in\mathbb{N}_0}s^m\,G_{T_1}(s)\,P(T_1=m)=G_{T_1}^2(s).$$

2.4 Moments

Another useful thing about generating functions is that they can make the computation of moments easier. Recall that $E[X^n]$ is said to be the $n$-th moment of $X$. Notice that if $P(X=\infty)>0$, then $X^n$ is never integrable, so we will now restrict our attention to random variables that take values in $\mathbb{N}_0$ only.

Proposition 2.13. Let $X$ be an $\mathbb{N}_0$-valued random variable with generating function $G_X$. For $n\in\mathbb{N}$, the following two statements are equivalent:
1. $E[X^n]<\infty$;
2. $\frac{d^n}{ds^n}G_X(s)\big|_{s=1}$ exists (in the sense that the left limit $\lim_{s\nearrow 1}\frac{d^n}{ds^n}G_X(s)$ exists).
In either case, we have
$$E[X(X-1)(X-2)\cdots(X-n+1)]=\frac{d^n}{ds^n}G_X(s)\Big|_{s=1}.$$
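A quick illustration of the proposition above: for a Poisson($\lambda$) random variable we computed $G_X(s)=e^{\lambda(s-1)}$, whose $n$-th derivative at $s=1$ is $\lambda^n$. The left-hand side can be computed directly from the pmf:

```python
import math

lam = 2.0
pmf = lambda k: math.exp(-lam) * lam**k / math.factorial(k)

def factorial_moment(n, terms=80):
    """E[X (X-1) ... (X-n+1)], computed directly from the pmf (truncated)."""
    total = 0.0
    for k in range(terms):
        prod = 1.0
        for j in range(n):
            prod *= (k - j)
        total += prod * pmf(k)
    return total

m1 = factorial_moment(1)  # should equal lam
m2 = factorial_moment(2)  # should equal lam**2
m3 = factorial_moment(3)  # should equal lam**3
var = (m2 + m1) - m1**2   # E[X^2] - E[X]^2, using E[X^2] = E[X(X-1)] + E[X]
```

For $\lambda=2$ this recovers $2$, $4$, $8$ and the Poisson variance $\lambda=2$.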
The quantities $E[X]$, $E[X(X-1)]$, $E[X(X-1)(X-2)]$, ..., are called the factorial moments of the random variable $X$. You can get the classical moments from the factorial moments by solving a system of linear equations. It is very simple for the first few:
$$E[X]=E[X], \quad E[X^2]=E[X(X-1)]+E[X], \quad E[X^3]=E[X(X-1)(X-2)]+3E[X(X-1)]+E[X].$$
A useful identity which follows directly from the above results is the following:
$$\mathrm{Var}[X]=G_X''(1)+G_X'(1)-\big(G_X'(1)\big)^2,$$
and it is valid whenever the first two derivatives of $G_X$ at $1$ exist.

Example 2.14. Let $X$ be a Poisson random variable with parameter $\lambda$. Its generating function is given by $A(s)=e^{\lambda(s-1)}$, and so
$$\frac{d^n}{ds^n}A(1)=\lambda^n, \quad\text{for all } n\in\mathbb{N}_0.$$
The sequence $(E[X], E[X(X-1)], E[X(X-1)(X-2)], \dots)$ of factorial moments of $X$ is therefore just $(\lambda, \lambda^2, \lambda^3, \dots)$. It follows that
$$E[X]=\lambda, \quad E[X^2]=\lambda^2+\lambda, \quad \mathrm{Var}[X]=\lambda, \quad E[X^3]=\lambda^3+3\lambda^2+\lambda.$$

Example 2.15. We have an urn which contains three numbered balls. We play a repeated game in which, on each turn, we draw a ball (uniformly at random), with the following outcomes:
a) If we draw the first ball, we win a dollar, replace the ball in the urn, and then play another round.
b) If we draw the second ball, we win two dollars, replace the ball in the urn, and then play another round.
c) If we draw the third ball, the game is over.
Let $X$ denote the amount of money that we win in this game. The number of rounds that we play has a geometric distribution, so the game ends with probability one and $P(X=\infty)=0$. We would like to determine the generating function $G_X$ associated with $X$. To do this, we let $Y$ denote the amount of winnings obtained after (but not including the winnings from) the first round, and we let $Z$ denote the first ball drawn. Then the conditional distribution of $Y$ given $Z=1$ or $Z=2$ is the same as the unconditional distribution of $X$. That is,
$$P(Y=n\mid Z=1)=P(Y=n\mid Z=2)=P(X=n), \qquad n\in\mathbb{N}_0.$$
As a result,
$$E[s^Y\mid Z=1]=E[s^Y\mid Z=2]=E[s^X]=G_X(s),$$
and, conditioning on the first ball drawn,
$$G_X(s)=E[s^X]=E[s^X\mid Z=1]P(Z=1)+E[s^X\mid Z=2]P(Z=2)+E[s^X\mid Z=3]P(Z=3)$$
$$=E[s^{1+Y}\mid Z=1]/3+E[s^{2+Y}\mid Z=2]/3+E[s^0\mid Z=3]/3=sG_X(s)/3+s^2G_X(s)/3+1/3.$$
Solving for $G_X$ shows that
$$G_X(s)=\frac{1}{3-s-s^2}.$$
Differentiating, we get
$$G_X'(s)=\frac{1+2s}{(3-s-s^2)^2}, \qquad G_X''(s)=\frac{8+6s+6s^2}{(3-s-s^2)^3}.$$
As a result, we can determine a number of properties of $X$:
$$P(X=0)=G_X(0)=1/3, \quad P(X=1)=G_X'(0)=1/9, \quad P(X=2)=G_X''(0)/2=4/27,$$
$$E[X]=G_X'(1)=3, \quad E[X^2]=G_X''(1)+G_X'(1)=23, \quad \mathrm{Var}(X)=23-3^2=14.$$

2.5 Random sums

Our next application of generating functions in the theory of stochastic processes deals with so-called random sums. Let $(\xi_n)_{n\in\mathbb{N}}$ be a sequence of random variables, and let $N$ be a random time (a random time is simply a $\mathbb{T}=\mathbb{N}_0\cup\{+\infty\}$-valued random variable). We can define the random variable $Y=\sum_{k=0}^{N}\xi_k$ by
$$Y(\omega)=\begin{cases}0, & N(\omega)=0,\\ \sum_{k=1}^{N(\omega)}\xi_k(\omega), & N(\omega)\ge 1,\end{cases} \qquad \text{for } \omega\in\Omega.$$
More generally, for an arbitrary stochastic process $(X_n)_{n\in\mathbb{N}_0}$ and a random time $N$ (with $P[N=+\infty]=0$), we define the random variable $X_N$ by $X_N(\omega)=X_{N(\omega)}(\omega)$. In general, think of $X_N$ as the value of the stochastic process $X$ taken at a time which is itself random. When $N$ is a constant ($N=n$), then $X_N$ is simply equal to $X_n$, and if $X_n=\sum_{k=1}^{n}\xi_k$, then $X_N=\sum_{k=1}^{N}\xi_k$.
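Before moving on, the urn example above (Example 2.15) is easy to double-check numerically. Since $(3-s-s^2)G_X(s)=1$, matching coefficients shows that $3a_0=1$, $3a_1=a_0$, and $3a_n=a_{n-1}+a_{n-2}$ for $n\ge 2$, so the whole pmf can be generated by a recurrence:

```python
# Coefficients of G_X(s) = 1 / (3 - s - s^2) via 3 a_n = a_{n-1} + a_{n-2}.
N = 400
a = [1.0 / 3.0, 1.0 / 9.0]          # a_0 = 1/3, a_1 = a_0 / 3
for n in range(2, N):
    a.append((a[n - 1] + a[n - 2]) / 3.0)

total = sum(a)                                             # G_X(1)  = 1
mean = sum(k * ak for k, ak in enumerate(a))               # G_X'(1) = 3
second = sum(k * (k - 1) * ak for k, ak in enumerate(a))   # G_X''(1) = 20
var = second + mean - mean**2                              # 20 + 3 - 9 = 14
```

The coefficients decay geometrically (the relevant root of $3-s-s^2$ is about $1.30$), so 400 terms is far more than enough for the tail to be negligible.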
Example 2.16. Let $(\xi_n)_{n\in\mathbb{N}}$ be the increments of a symmetric simple random walk (coin-tosses), and let $N$ be uniform on $\{0,1,2\}$, i.e., $P(N=0)=P(N=1)=P(N=2)=1/3$, independent of $(\xi_n)_{n\in\mathbb{N}}$. (It is very important to specify the dependence structure between $N$ and $(\xi_n)_{n\in\mathbb{N}}$ in this setting!) Let us compute the distribution of $Y=\sum_{k=0}^{N}\xi_k$ in this case. This is where we, typically, use the formula of total probability:
$$P(Y=m)=P(Y=m\mid N=0)P(N=0)+P(Y=m\mid N=1)P(N=1)+P(Y=m\mid N=2)P(N=2)$$
$$=\tfrac{1}{3}\Big(1_{\{m=0\}}+P(\xi_1=m)+P(\xi_1+\xi_2=m)\Big).$$
When $m=1$ (for example), we get $P[Y=1]=\tfrac{1}{3}\big(0+\tfrac{1}{2}+0\big)=1/6$. Perform the computation for some other values of $m$ yourself.

What happens when $N$ and $(\xi_n)_{n\in\mathbb{N}}$ are dependent? This will usually be the case in practice, as the value of the time $N$ at which we stop adding increments will typically depend on the behavior of the sum itself.

Example 2.17. We can think of a situation where a gambler repeatedly plays the same game in which a fair coin is tossed, and the gambler wins a dollar if the outcome is heads and loses a dollar otherwise. A "smart" gambler enters the game and decides on the following tactic: "Let's see how the first game goes. If I win, I'll quit then and there. If I lose, I'll play another 2 games and hopefully cover my losses." The described strategy amounts to the choice of the random time $N$ as follows:
$$N(\omega)=\begin{cases}1, & \xi_1(\omega)=1,\\ 3, & \xi_1(\omega)=-1.\end{cases}$$
Then, with $Y=\sum_{k=0}^{N}\xi_k$,
$$Y(\omega)=\begin{cases}1, & \xi_1(\omega)=1,\\ -1+\xi_2(\omega)+\xi_3(\omega), & \xi_1(\omega)=-1.\end{cases}$$
Therefore,
$$P[Y=1]=P[Y=1\mid \xi_1=1]P[\xi_1=1]+P[Y=1\mid \xi_1=-1]P[\xi_1=-1]=1\cdot\tfrac{1}{2}+P[\xi_2+\xi_3=2]\cdot\tfrac{1}{2}=\tfrac{1}{2}\big(1+\tfrac{1}{4}\big)=\tfrac{5}{8}.$$
Similarly, we get $P[Y=-1]=\tfrac{1}{4}$ and $P[Y=-3]=\tfrac{1}{8}$. The expectation $E[Y]$ is equal to
$$1\cdot\tfrac{5}{8}+(-1)\cdot\tfrac{1}{4}+(-3)\cdot\tfrac{1}{8}=0.$$
This is not an accident: one of the first powerful results of the beautiful martingale theory states that, no matter how smart a strategy you employ, you cannot beat a fair gamble.
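Both examples can be verified by exhaustive enumeration of the (equally likely) coin-toss outcomes; exact arithmetic with fractions avoids any rounding:

```python
from fractions import Fraction
from itertools import product

half = Fraction(1, 2)

# Independent case: P(N = n) = 1/3 for n = 0, 1, 2, independent of the tosses.
indep = {}
for n in range(3):
    for xs in product((-1, 1), repeat=2):
        y = sum(xs[:n])
        w = Fraction(1, 3) * half**2
        indep[y] = indep.get(y, Fraction(0)) + w

# Dependent case: the "smart" gambler stops after 1 game if it is won,
# and otherwise plays 2 more games.
dep = {}
for xs in product((-1, 1), repeat=3):
    y = xs[0] if xs[0] == 1 else xs[0] + xs[1] + xs[2]
    dep[y] = dep.get(y, Fraction(0)) + half**3

e_dep = sum(y * p for y, p in dep.items())  # expected winnings of the strategy
```

The enumeration reproduces $P[Y=1]=1/6$ in the independent case, and $P[Y=1]=5/8$, $P[Y=-1]=1/4$, $P[Y=-3]=1/8$ and $E[Y]=0$ for the gambler.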
We will return to the general (non-independent) case in the next lecture. For now, let us use generating functions to give a full description of the distribution of $Y=\sum_{k=0}^{N}\xi_k$ when the time $N$ is independent of the summands.

Proposition 2.18. Let $(\xi_n)_{n\in\mathbb{N}}$ be a sequence of independent $\mathbb{N}_0$-valued random variables, all of which share the same distribution and generating function $G_\xi(s)$, and let $N$ be a random time which is independent of $(\xi_n)_{n\in\mathbb{N}}$, with $P(N<\infty)=1$ and generating function $G_N$. Then the generating function $G_Y$ of the random sum $Y=\sum_{k=0}^{N}\xi_k$ is given by
$$G_Y(s)=G_N\big(G_\xi(s)\big).$$

Proof. First, let $X_0=0$ and $X_n=\sum_{i=1}^{n}\xi_i$, $n\in\mathbb{N}$, denote the sequence of partial sums. Repeated applications of Proposition 2.9 show that $G_{X_n}=G_\xi^n$ (where $G_\xi^0(s)=1$). As $N$ is independent of the summands, we may apply the tower law for conditional expectation to see that
$$E[s^Y]=\sum_{n\in\mathbb{N}_0}E[s^Y\mid N=n]\,P(N=n)=\sum_{n\in\mathbb{N}_0}E[s^{X_n}\mid N=n]\,P(N=n)=\sum_{n\in\mathbb{N}_0}G_\xi^n(s)\,P(N=n)=G_N\big(G_\xi(s)\big).$$

Corollary 2.19 (Wald's Identity I). Let $(\xi_n)_{n\in\mathbb{N}}$ and $N$ be as in Proposition 2.18. Suppose, also, that $E[N]<\infty$ and $E[\xi_1]<\infty$. Then
$$E\Big[\sum_{k=0}^{N}\xi_k\Big]=E[N]\,E[\xi_1].$$

Proof. We just apply the composition rule for derivatives to the equality $G_Y=G_N\circ G_\xi$ to get $G_Y'(s)=G_N'(G_\xi(s))\,G_\xi'(s)$. After we let $s\nearrow 1$, we get
$$E[Y]=G_Y'(1)=G_N'(G_\xi(1))\,G_\xi'(1)=G_N'(1)\,G_\xi'(1)=E[N]\,E[\xi_1].$$

Example 2.20. Every time the Springfield Isotopes play in the league championship, their chance of winning is $p\in(0,1)$. The number of years between two championships they get to play in has the Poisson distribution $p(\lambda)$, $\lambda>0$. What is the expected number of years $Y$ between consecutive championship wins?
Let $(\xi_n)_{n\in\mathbb{N}}$ be a sequence of independent Poisson($\lambda$) random variables modeling the numbers of years between consecutive championship appearances by the Isotopes, and let $N$ be a Geometric($p$) random variable with success probability $p$, independent of $(\xi_n)_{n\in\mathbb{N}}$, counting the championships lost before the first win. Indeed, every time the Isotopes lose the championship, another $\xi_\cdot$ years have to pass before they get another chance, and the whole thing stops when they finally win; thus $Y=\sum_{k=0}^{N}\xi_k$. To compute the expectation of $Y$, we use Corollary 2.19:
$$E[Y]=E[N]\,E[\xi_1]=\frac{1-p}{p}\,\lambda.$$
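The proposition and Wald's identity above can be checked exactly on small finite distributions. The particular pmfs for $N$ and $\xi$ below are arbitrary illustrative choices:

```python
# Hypothetical small distributions for N and the xi's (illustrative only).
pN = {0: 0.2, 1: 0.5, 2: 0.3}
pXi = {0: 0.3, 1: 0.4, 2: 0.3}

# Exact pmf of Y = xi_1 + ... + xi_N, by conditioning on N and convolving.
pY = {0: pN[0]}
conv = {0: 1.0}                     # pmf of xi_1 + ... + xi_n, starting at n = 0
for n in range(1, max(pN) + 1):
    new = {}
    for k, pk in conv.items():
        for j, pj in pXi.items():
            new[k + j] = new.get(k + j, 0.0) + pk * pj
    conv = new
    for k, pk in conv.items():
        pY[k] = pY.get(k, 0.0) + pN[n] * pk

def G(pmf, s):
    return sum(p * s**k for k, p in pmf.items())

s = 0.5
lhs = G(pY, s)
rhs = G(pN, G(pXi, s))              # G_Y = G_N composed with G_xi

eN = sum(n * p for n, p in pN.items())    # E[N]  = 1.1
eXi = sum(k * p for k, p in pXi.items())  # E[xi] = 1.0
eY = sum(k * p for k, p in pY.items())    # Wald I: should equal E[N] E[xi]
```

Both identities hold up to floating-point error, with no simulation needed.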
3 Random Walks II

3.1 Stopping times

The last application of generating functions dealt with sums evaluated between $0$ and some random time $N$. An especially interesting case occurs when the value of $N$ depends directly on the evolution of the underlying stochastic process. Even more important is the case where time's arrow is taken into account: if you think of $N$ as the time you stop adding new terms to the sum, it is usually the case that you are not allowed (or able) to see the values of the terms you would get if you continued adding. The whole point is that the decision to stop has to be based only on the available observations and not on the future ones, so mere mortals have to restrict their choices to so-called stopping times.

Definition 3.1. Let $(X_n)_{n\in\mathbb{N}_0}$ be a stochastic process. A random variable $T$ taking values in $\mathbb{T}=\mathbb{N}_0\cup\{+\infty\}$ is said to be a stopping time with respect to $(X_n)_{n\in\mathbb{N}_0}$ if for each $n\in\mathbb{N}_0$ there exists a function $G_n:\mathbb{R}^{n+1}\to\{0,1\}$ such that
$$1_{\{T=n\}}=G_n(X_0,X_1,\dots,X_n), \qquad \text{for all } n\in\mathbb{N}_0.$$

The functions $G_n$ are called the decision functions, and should be thought of as black boxes which take the values of the process $(X_n)_{n\in\mathbb{N}_0}$ observed up to the present point and output either $0$ or $1$, no matter what the state of the world $\omega\in\Omega$ is. The value $0$ means "keep going" and the value $1$ means "stop".

Example 3.2. The simplest examples of stopping times are the (non-random) deterministic times. Just set $T=5$ (or $T=723$, or $T=n_0$ for any $n_0\in\mathbb{N}_0\cup\{+\infty\}$). The decision functions $G_n$ do not depend on the values of $X_0, X_1, \dots, X_n$ at all:
$$G_n(x_0,x_1,\dots,x_n)=\begin{cases}1, & n=n_0,\\ 0, & n\ne n_0.\end{cases}$$
A gambler who stops gambling after 20 games, no matter what the winnings or losses are, uses such a rule.

Probably the most well-known examples of stopping times are the (first) hitting times. They can be defined for general stochastic processes, but we will stick to simple random walks for the purposes of this example. Let $X_n=\sum_{k=1}^{n}\xi_k$ be a simple random walk, and let $T_l$ be the first time $X$ hits the level $l\in\mathbb{N}$. The set $\{n\in\mathbb{N}_0 : X_n=l\}$ is the collection of all time-points at which $X$ visits the level $l$; the earliest one, the minimum of that set, is the first hitting time of $l$. In states of the world $\omega\in\Omega$ in which the level $l$ just never gets reached, i.e., when $\{n\in\mathbb{N}_0 : X_n=l\}$ is an empty set, we set $T_l(\omega)=+\infty$. So, we use the following slightly non-intuitive but mathematically correct definition:
$$T_l=\min\{n\in\mathbb{N}_0 : X_n=l\}.$$
Think of an investor in the stock market. If she could base her decisions on future information, she would sell at the absolute maximum and buy at the absolute minimum, making tons of money in the process. Of course, this is not possible unless she is clairvoyant: her decision to stop and sell her stocks can depend only on the information available up to the moment of the decision.
The hitting time $T_2$ of the level $l=2$ for a particular trajectory of a symmetric simple random walk is depicted below.

[Figure: a sample trajectory over about 30 steps, with the time $T_2$ of the first visit to level 2 marked.]

In order to show that $T_l$ is indeed a stopping time, we need to construct the decision functions $G_n$. Let us start with $n=0$. We would have $T_l=0$ only in the (impossible) case $X_0=l$, so we always have $G_0(X_0)=0$. How about $n\in\mathbb{N}$? For the value of $T_l$ to be equal to exactly $n$, two things must happen:
(a) $X_n=l$ (the level $l$ must actually be hit at time $n$), and
(b) $X_{n-1}\ne l$, $X_{n-2}\ne l$, ..., $X_0\ne l$ (the level $l$ has not been hit before).
Therefore, the family of decision rules is easy to construct:
$$G_n(x_0,x_1,\dots,x_n)=\begin{cases}1, & x_0\ne l,\ x_1\ne l,\ \dots,\ x_{n-1}\ne l,\ x_n=l,\\ 0, & \text{otherwise.}\end{cases}$$

Remark 3.3. In the remainder of this section, we will sometimes write our decision functions as functions of the random variables $\xi_1,\xi_2,\dots,\xi_n$ rather than the random variables $X_0,X_1,\dots,X_n$. As knowing the values $X_0,X_1,\dots,X_n$ is clearly equivalent to knowing the values $\xi_1,\xi_2,\dots,\xi_n$, we are free to use whichever representation is more convenient.

How about something that is not a stopping time? Let $n_0$ be an arbitrary time-horizon, and let $T_M$ be the last time during $0,1,\dots,n_0$ at which the random walk visits its maximum over $0,1,\dots,n_0$ (see the picture above). If you bought a stock at time $t=0$, had to sell it some time before $n_0$, and had the ability to predict the future, this is one of the points you would choose to sell it at. Intuitively, it is impossible to decide whether $T_M=n$, for $n\in\{0,1,\dots,n_0-1\}$, without the knowledge of the values of the random walk after $n$. More precisely, let us sketch the proof of the fact that $T_M$ is not a stopping time. Suppose, to the contrary, that it is, and let $G_n$ be the corresponding family of decision functions, so that, in particular,
$$1_{\{T_M=n-1\}}=G_{n-1}(X_0,X_1,\dots,X_{n-1}).$$
Consider the following two trajectories: $(0,1,2,\dots,n-1,n)$ and $(0,1,2,\dots,n-1,n-2)$. They differ only in the direction of the last step, and they also differ in the fact that $T_M=n$ for the first one and $T_M=n-1$ for the second one. The right-hand side of the equality above takes the same value for both trajectories,
while the left-hand side equals $0$ for the first one and $1$ for the second one. A contradiction: $T_M$ is not a stopping time.
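The definition of a stopping time translates directly into code: a decision rule may look only at the path observed so far. A sketch for the first hitting time $T_l$ (the path below is a fixed, made-up trajectory used purely for illustration):

```python
def hitting_decision(l):
    """Decision functions for T_l: given the observed values (x_0, ..., x_n),
    return 1 ("stop") iff the walk sits at level l for the first time now."""
    def G(xs):
        return 1 if xs[-1] == l and all(x != l for x in xs[:-1]) else 0
    return G

def stop_with(G, path):
    """Run the decision rule along a path; return the resulting stopping time."""
    for n in range(len(path)):
        if G(path[: n + 1]):
            return n
    return float("inf")       # the level is never reached on this path

# A fixed sample path of a simple random walk started at X_0 = 0.
steps = [1, -1, 1, 1, -1, 1, 1]
path = [0]
for s in steps:
    path.append(path[-1] + s)

T2 = stop_with(hitting_decision(2), path)
direct = min((n for n, x in enumerate(path) if x == 2), default=float("inf"))
```

On this path the walk visits $0, 1, 0, 1, 2, \dots$, so both computations give the first visit to level 2 at time $n = 4$; note that the decision rule never inspects `path` beyond the current time.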
3.2 Wald's identity II

Having defined the notion of a stopping time, let us try to compute something with it. The random variables $(\xi_n)_{n\in\mathbb{N}}$ in the statement of the theorem below are only assumed to be independent of each other and identically distributed; to make things simpler, you can think of $(\xi_n)_{n\in\mathbb{N}}$ as the increments of a simple random walk. Before we state the main result, here is an extremely useful identity.

Proposition 3.4. Let $N$ be an $\mathbb{N}_0$-valued random variable. Then
$$E[N]=\sum_{k\in\mathbb{N}}P[N\ge k].$$

Proof. Clearly, $P[N\ge k]=\sum_{j\ge k}P[N=j]$, so (note what happens to the indices when we switch the sums)
$$\sum_{k\in\mathbb{N}}P[N\ge k]=\sum_{k\in\mathbb{N}}\sum_{k\le j<\infty}P[N=j]=\sum_{j\in\mathbb{N}}\sum_{1\le k\le j}P[N=j]=\sum_{j=1}^{\infty}j\,P[N=j]=E[N].$$

Theorem 3.5 (Wald's Identity II). Let $(\xi_n)_{n\in\mathbb{N}}$ be a sequence of independent, identically distributed random variables with $E|\xi_1|<\infty$, and set
$$X_n=\sum_{k=1}^{n}\xi_k, \qquad n\in\mathbb{N}_0.$$
If $T$ is an $(X_n)_{n\in\mathbb{N}_0}$-stopping time such that $E[T]<\infty$, then
$$E[X_T]=E[\xi_1]\,E[T].$$

Proof. Here is another way of writing the sum $\sum_{k=1}^{T}\xi_k$:
$$\sum_{k=1}^{T}\xi_k=\sum_{k\in\mathbb{N}}\xi_k 1_{\{k\le T\}}. \qquad (3.1)$$
The idea behind it is simple: add all the values of $\xi_k$ for $k\le T$, and keep adding zeros (since $\xi_k 1_{\{k\le T\}}=0$ for $k>T$) after that. Now let us look at the random variable $1_{\{k\le T\}}$ more closely. We have
$$1_{\{k\le T\}}=1-1_{\{k>T\}}=1-1_{\{T\le k-1\}}=1-\sum_{j=0}^{k-1}1_{\{T=j\}}=1-\sum_{j=0}^{k-1}G_j(\xi_1,\dots,\xi_j).$$
Taking expectations of both sides of (3.1) and switching $E$ and $\sum$ (this can be justified, but the argument is technical and we omit it here) yields
$$E\Big[\sum_{k=1}^{T}\xi_k\Big]=\sum_{k=1}^{\infty}E[1_{\{k\le T\}}\,\xi_k],$$
where $G_j(\xi_1,\dots,\xi_j)$ is the decision function which corresponds to the event $\{T=j\}$, written as a function of the increments. In particular, we see that the random variable $1_{\{k\le T\}}$ can be written as a function of the variables $\xi_1,\dots,\xi_{k-1}$ only. As the random variables $\xi_1,\dots,\xi_k$ are independent, the random variables $1_{\{k\le T\}}$ and $\xi_k$ are also independent. This means that
$$E\Big[\sum_{k=1}^{T}\xi_k\Big]=\sum_{k=1}^{\infty}E[1_{\{k\le T\}}]\,E[\xi_k]=E[\xi_1]\sum_{k=1}^{\infty}P(k\le T)=E[\xi_1]\,E[T],$$
where the last equality follows from Proposition 3.4.

Example 3.6 (Gambler's ruin problem). A gambler starts with $x\in\mathbb{N}$ dollars and repeatedly plays a game in which he wins a dollar with probability $\tfrac{1}{2}$ and loses a dollar with probability $\tfrac{1}{2}$. He decides to stop when one of the following two things happens:
1. he goes bankrupt, i.e., his wealth hits 0, or
2. he makes enough money, i.e., his wealth reaches some level $a>x$.
The classical "gambler's ruin" problem asks the following question: what is the probability that the gambler will make $a$ dollars before he goes bankrupt?

The gambler's wealth $(W_n)_{n\in\mathbb{N}_0}$ is modeled by a simple random walk starting from $x$ whose increments $\xi_k=W_k-W_{k-1}$ are coin-tosses. Then $W_n=x+X_n$, where $X_n=\sum_{k=1}^{n}\xi_k$, $n\in\mathbb{N}_0$. Let $T$ be the time the gambler stops; it is quite clear that $T$ is a stopping time (can you write down the decision functions?). We will see later that the probability that the gambler's wealth remains strictly between 0 and $a$ forever is zero, so $P[T<\infty]=1$ (one can show, moreover, that $E[T]<\infty$, which is what Theorem 3.5 requires).

What can we say about the random variable $X_T$, the gambler's wealth (minus $x$) at the random time $T$? Clearly, it is either equal to $-x$ or to $a-x$, and the probabilities $p_0$ and $p_a$ with which it takes these values are exactly what we are after in this problem. Since there are no other values $X_T$ can take, we must have $p_0+p_a=1$. Wald's identity gives us the second equation for $p_0$ and $p_a$:
$$E[X_T]=E[\xi_1]\,E[T]=0\cdot E[T]=0, \quad\text{so}\quad 0=E[X_T]=p_0(-x)+p_a(a-x).$$
These two linear equations with two unknowns yield
$$p_0=\frac{a-x}{a}, \qquad p_a=\frac{x}{a}.$$
Again we see that the gambler cannot extract positive expected value from a fair game. The situation is different when $p\ne\tfrac{1}{2}$.
We can represent $T$ in two different (but equivalent) ways. On the one hand, we can think of $T$ as the smaller of the two hitting times $T_{-x}$ and $T_{a-x}$ of the levels $-x$ and $a-x$ for the random walk $(X_n)_{n\in\mathbb{N}_0}$ (remember that $W_n=x+X_n$, so these correspond to the hitting times of the levels $0$ and $a$ for the process $(W_n)_{n\in\mathbb{N}_0}$). On the other hand, we can think of $T$ as the first hitting time of the two-element set $\{-x, a-x\}$ for the process $(X_n)_{n\in\mathbb{N}_0}$. It is remarkable that the two probabilities $p_0$ and $p_a$ are proportional to the amounts of money the gambler needs to make (respectively, lose) in the two outcomes.
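A quick Monte Carlo check of the ruin probability $p_0=(a-x)/a$ for the fair game (the estimate is random, so the comparison below uses a deliberately loose tolerance):

```python
import random

def ruin_probability(x, a, trials=20000, seed=1):
    """Estimate P(the gambler goes bankrupt before reaching a), starting
    from wealth x, for the fair game p = 1/2."""
    rng = random.Random(seed)
    ruined = 0
    for _ in range(trials):
        w = x
        while 0 < w < a:
            w += 1 if rng.random() < 0.5 else -1
        ruined += (w == 0)
    return ruined / trials

x, a = 3, 10
est = ruin_probability(x, a)
exact = (a - x) / a   # = 0.7
```

With 20000 trials, the standard error of the estimate is roughly 0.003, so the estimate should fall well within 0.02 of the exact value.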
3.3 The distribution of the first hitting time T1

Let $(X_n)_{n\in\mathbb{N}_0}$ be a simple random walk with probability $p$ of stepping up, and, for $\ell\in\mathbb{N}$, let
$$T_\ell=\min\{n\in\mathbb{N}_0 : X_n=\ell\}$$
be the first hitting time of the level $\ell$ by the random walk. (Recall that the minimum value of the empty set is taken to be $+\infty$ by convention.) We will now study the random variables $T_\ell$ using generating-function methods.

Proposition 3.7. Let $(X_n)_{n\in\mathbb{N}_0}$ be a random walk with probability $p$ of moving up. Then, for $m,n\in\mathbb{N}_0$,
$$P(T_2=m+n\mid T_1=m)=P(T_1=n),$$
and, consequently, $G_{T_2}(s)=G_{T_1}^2(s)$, where $G_T$ denotes the generating function of the random time $T$.

Proof. We will only handle the case $\ell=2$; the general case $G_{T_\ell}=G_{T_1}^\ell$ follows in much the same way. The strategy is to establish the displayed conditional relation; it then follows from Proposition 2.12 that $G_{T_2}(s)=G_{T_1}^2(s)$. To do this, consider, for $m\le n$, the events
$$A_{m,n}=\text{"}n\text{ is the first time after }m\text{ that }X\text{ reaches }X_m+1\text{"}=\{X_i\le X_m \text{ for all } m\le i<n, \text{ and } X_n=X_m+1\}.$$
These events have the following two properties:
1. $P(A_{m,m+n})=P(A_{0,n})$ for all $m,n\in\mathbb{N}_0$;
2. if $i<j\le k<l$, then the events $A_{i,j}$ and $A_{k,l}$ are independent.
Property (2) holds because $A_{i,j}$ depends only on the steps $\xi_{i+1},\xi_{i+2},\dots,\xi_j$ of the random walk, while $A_{k,l}$ depends only on the steps $\xi_{k+1},\xi_{k+2},\dots,\xi_l$, and all the steps of the random walk are independent. For property (1), note that we can choose a set $B\subseteq\{-1,1\}^{j-i}$ such that
$$A_{i,j}=\{(\xi_{i+1},\dots,\xi_j)\in B\} \quad\text{and}\quad A_{n+i,n+j}=\{(\xi_{n+i+1},\dots,\xi_{n+j})\in B\};$$
the event $A_{i,j}$ is determined by some relatively complicated formula applied to the random variables $(\xi_{i+1},\dots,\xi_j)$, and the event $A_{n+i,n+j}$ is determined by the exact same formula applied to $(\xi_{n+i+1},\dots,\xi_{n+j})$. As the random vectors $(\xi_{i+1},\dots,\xi_j)$ and $(\xi_{n+i+1},\dots,\xi_{n+j})$ have the same joint distribution, the events $A_{i,j}$ and $A_{n+i,n+j}$ must have the same probability.

Now, $\{T_1=m\}=A_{0,m}$, and, on this event, $\{T_2=m+n\}=A_{m,m+n}$, so, for $m,n\in\mathbb{N}_0$,
$$P(T_2=m+n\mid T_1=m)=P(A_{m,m+n}\mid A_{0,m})=P(A_{m,m+n})=P(A_{0,n})=P(T_1=n),$$
where we used the independence of $A_{0,m}$ and $A_{m,m+n}$.

We will now use the previous relationship between $T_1$ and $T_2$ to obtain the generating function of $T_1$ explicitly. We will essentially follow the approach of Example 2.15: we will attempt to determine an equation that the generating function associated with $T_1$ satisfies.
Proposition 3.8. Let $(X_n)_{n\in\mathbb{N}_0}$ be a random walk with probability $p$ of moving up, and let $T_1=\min\{n\in\mathbb{N}_0 : X_n=1\}$ denote the first time that the random walk hits the level 1. Then the generating function of $T_1$ is given by
$$G_{T_1}(s)=\frac{1-\sqrt{1-4pqs^2}}{2qs}.$$

Proof. Our strategy is to condition on the first move that the random walk makes, and then to derive a recursive equation for the generating function. It will be useful to consider the auxiliary process given by
$$Y_n=X_{n+1}-X_1, \qquad n\in\mathbb{N}_0.$$
It is not hard to check that $Y$ is also a random walk with probability $p$ of moving up at each step, and that $Y$ is independent of $X_1$. So $Y$ corresponds to the changes in the process $X$ after the first step; the other thing to realize is that $Y$ is running on a clock that is one time step behind $X$. It turns out that it will be useful to consider the random variable
$$\tilde{T}_2=\text{"the first time that }Y\text{ hits the level 2"}=\min\{n\in\mathbb{N}_0 : Y_n=2\}.$$
As $Y$ is a random walk, the generating function of $\tilde{T}_2$ is given by $G_{T_1}^2$ (by Proposition 3.7). Moreover, $\tilde{T}_2$ is determined by $Y$, and, as $Y$ is independent of $X_1$, $\tilde{T}_2$ is also independent of $X_1$.

Crucial observation: if the first step that $X$ makes is down, then $X$ first hits 1 at time $1+\tilde{T}_2$ relative to the initial clock, i.e., $T_1=1+\tilde{T}_2$. This is because $X$ now has to climb two steps up to get from $-1$ up to $1$, which is equivalent to $Y$ climbing two steps up from $0$ to $2$.

Now we make the recursive argument:
$$G_{T_1}(s)=E[s^{T_1}]=E[s^{T_1}\mid X_1=1]\,P(X_1=1)+E[s^{T_1}\mid X_1=-1]\,P(X_1=-1)$$
$$=sp+E[s^{1+\tilde{T}_2}\mid X_1=-1]\,(1-p)=sp+s\,E[s^{\tilde{T}_2}]\,(1-p)=sp+s\,G_{T_1}^2(s)\,q.$$
We now know that $G_{T_1}$ solves
$$G_{T_1}(s)=sp+sq\,G_{T_1}^2(s), \qquad s<1.$$
There are two possible solutions to this quadratic equation (for each $s$), given by
$$G_{T_1}(s)=\frac{1\pm\sqrt{1-4pqs^2}}{2qs}.$$
One of these solutions is always greater than 1 in absolute value, so it cannot correspond to a value of a generating function, and we must select the negative square root.
Once we have the generating function of $T_1$, we can begin to answer some questions about this random variable. The first question is: does the random walk eventually hit the level 1? To answer this, we use the second part of Proposition 2.7 and compute
$$P(T_1<\infty)=\lim_{s\nearrow 1}G_{T_1}(s)=\frac{1-\sqrt{1-4pq}}{2q}=\frac{1-|p-q|}{2q}=\begin{cases}1, & p\ge\tfrac{1}{2},\\ \tfrac{p}{q}, & p<\tfrac{1}{2},\end{cases}$$
where we have used the fact that $(p-q)^2=(2p-1)^2=4p^2-4p+1=1-4pq$. In particular, if the random walk is more likely to go down than it is to go up, then there is some strictly positive chance that it will never hit the level 1: $P[T_1=+\infty]>0$. It is remarkable that if $p=\tfrac{1}{2}$ the random walk will always hit 1 sooner or later, but that this does not need to happen if $p<\tfrac{1}{2}$. What we have here is an example of a phenomenon known as criticality: many physical systems exhibit qualitatively different behavior depending on whether the value of a certain parameter $p$ lies above or below a certain critical value $p=p_c$.

Another question that generating functions can help us answer is: how long, on average, do we need to wait before 1 is hit? When $p<\tfrac{1}{2}$, we have $P[T_1=+\infty]>0$, so we can immediately conclude that $E[T_1]=+\infty$. The case $p\ge\tfrac{1}{2}$ is more interesting. Following the recipe from the lecture on generating functions, we compute the derivative of $G_{T_1}$ and get
$$G_{T_1}'(s)=\frac{2p}{\sqrt{1-4pqs^2}}-\frac{1-\sqrt{1-4pqs^2}}{2qs^2}.$$
For $p>\tfrac{1}{2}$,
$$\lim_{s\nearrow 1}G_{T_1}'(s)=\frac{1}{p-q}.$$
When $p=\tfrac{1}{2}$, the situation is less severe, but we still get
$$\lim_{s\nearrow 1}G_{T_1}'(s)=\lim_{s\nearrow 1}\Big(\frac{1}{\sqrt{1-s^2}}-\frac{1-\sqrt{1-s^2}}{s^2}\Big)=+\infty,$$
and conclude that $E[T_1]=+\infty$. We can summarize the situation in the following table:

              P[T1 < oo]    E[T1]
p < 1/2       p/q           +oo
p = 1/2       1             +oo
p > 1/2       1             1/(p - q)

Finally, we can try to extract the probability mass function of the random variable $T_1$ from the generating function $G_{T_1}$. The obvious way to do this is to compute higher and higher derivatives of $G_{T_1}$ and then set $s=0$. Fortunately, it turns out there is an easier way.
The square root appearing in the formula for $G_{T_1}$ is an expression of the form $(1+x)^{1/2}$, so the (generalized) binomial formula can be used:
$$(1+x)^\alpha=\sum_{k=0}^{\infty}\binom{\alpha}{k}x^k, \quad \alpha\in\mathbb{R}, \qquad\text{where } \binom{\alpha}{k}=\frac{\alpha(\alpha-1)\cdots(\alpha-k+1)}{k!}.$$
Therefore,
$$G_{T_1}(s)=\frac{1}{2qs}-\frac{1}{2qs}\sum_{k=0}^{\infty}\binom{1/2}{k}(-4pqs^2)^k=\sum_{k=1}^{\infty}s^{2k-1}\,\frac{1}{2q}\binom{1/2}{k}(4pq)^k(-1)^{k-1},$$
so that
$$a_{2k-1}=\frac{1}{2q}\binom{1/2}{k}(4pq)^k(-1)^{k-1}, \qquad k\in\mathbb{N}.$$
Of course, the random walk cannot move from 0 to 1 in an even number of steps, so $a_n=0$ if $n$ is even. This expression can be simplified a bit further: one can show (by induction on $k$, for instance) that
$$\binom{1/2}{k}=\frac{2(-1)^{k+1}}{4^k(2k-1)}\binom{2k-1}{k}.$$
Thus,
$$P(T_1=2k-1)=\frac{1}{2k-1}\binom{2k-1}{k}p^k q^{k-1}, \qquad k\in\mathbb{N},$$
and $P(T_1=k)=0$ when $k$ is even.

3.4 Strong Markov property

In the previous section, we observed that $Y_n=X_{1+n}-X_1$ is again a random walk. It turns out that this is true not only for deterministic shifts, but even for shifts by stopping times. Intuitively, we stop, reset time and space, and then continue: the resulting process that we obtain is again a random walk. This is in fact a consequence of the "strong Markov" property, which we will define later in the course.

Proposition 3.9. Let $(X_n)_{n\in\mathbb{N}_0}$ be a random walk with parameter $p$, and let $T$ be a stopping time with respect to $(X_n)_{n\in\mathbb{N}_0}$ which never takes the value $\infty$. If we define the process
$$Y_n=X_{T+n}-X_T, \qquad n\in\mathbb{N}_0,$$
then $(Y_n)_{n\in\mathbb{N}_0}$ is also a random walk with parameter $p$.
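The formula for $P(T_1=2k-1)$ can be verified by brute force for small $k$, by enumerating all $\pm 1$ step sequences of a given length:

```python
from itertools import product
from math import comb

def first_passage_exact(n, p):
    """P(T_1 = n), by enumerating all 2^n step sequences of length n."""
    q = 1 - p
    total = 0.0
    for steps in product((1, -1), repeat=n):
        pos, hit = 0, None
        for i, step in enumerate(steps, start=1):
            pos += step
            if pos == 1:
                hit = i
                break
        if hit == n:                  # first passage happens exactly at time n
            ups = steps.count(1)
            total += p**ups * q**(n - ups)
    return total

def first_passage_formula(k, p):
    """P(T_1 = 2k - 1) = C(2k-1, k) p^k q^(k-1) / (2k - 1)."""
    return comb(2 * k - 1, k) * p**k * (1 - p)**(k - 1) / (2 * k - 1)

p = 0.3
brute = [first_passage_exact(2 * k - 1, p) for k in (1, 2, 3, 4)]
closed = [first_passage_formula(k, p) for k in (1, 2, 3, 4)]
even = first_passage_exact(4, p)     # should be exactly 0 (parity)
```

For $k=1,\dots,4$ the enumeration matches the closed form (for example, $P(T_1=3)=p^2q$, the probability of the single path down-up-up), and even lengths get zero probability.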
4 Branching Processes

4.1 A bit of history

In the mid-19th century, several aristocratic families in Victorian England realized that their family names could become extinct. Was it just unfounded paranoia, or did something real prompt them to come to this conclusion? They decided to ask around, and Sir Francis Galton (a "polymath, anthropologist, eugenicist, tropical explorer, geographer, inventor, meteorologist, proto-geneticist, psychometrician and statistician", and a half-cousin of Charles Darwin) posed the following question in the Educational Times (1873):

How many male children (on average) must each generation of a family have in order for the family name to continue in perpetuity?

The first complete answer came from Reverend Henry William Watson soon after, and the two wrote a joint paper entitled "On the probability of extinction of families" in 1874. By the end of this section, you will be able to give a precise answer to Galton's question.

[Figure: a portrait of Sir Francis Galton.]

4.2 A mathematical model

The model proposed by Watson was the following:
1. A population starts with one individual at time $n=0$: $Z_0=1$.
2. After one unit of time (at time $n=1$), the sole individual produces $Z_1$ identical clones of itself and dies; $Z_1$ is an $\mathbb{N}_0$-valued random variable.
(a) If $Z_1$ happens to be equal to 0, the population is dead and nothing happens at any future time $n\ge 2$.
(b) If $Z_1>0$, then, a unit of time later, each of the $Z_1$ individuals gives birth to a random number of children and dies. The first one has $Z_{1,1}$ children, the second one $Z_{1,2}$ children, etc.; the last, $Z_1$-th one, gives birth to $Z_{1,Z_1}$ children. We assume that the distribution of the number of children is the same for each individual in every generation, and independent both of the number of individuals in the generation and of the number of children the others have. This distribution, shared by all the $Z_{n,i}$ and $Z_1$, is called the offspring distribution. The total number of individuals in the second generation is now
$$Z_2=\sum_{k=1}^{Z_1}Z_{1,k}.$$
(c) The third, fourth, etc., generations are produced in the same way. If it ever happens that $Z_n=0$ for some $n$, then $Z_m=0$ for all $m\ge n$: the population is extinct. Otherwise,
$$Z_{n+1}=\sum_{k=1}^{Z_n}Z_{n,k}.$$
The mechanism that produces the next generation from the present one can differ from application to application, but it is the offspring distribution alone that determines the evolution of a branching process.

Definition 4.1. A stochastic process $(Z_n)_{n\in\mathbb{N}_0}$ with the properties described in (1), (2) and (3) above is called a (simple) branching process.

With this new formalism, we can pose Galton's question more precisely: under what conditions on the offspring distribution will the process $(Z_n)_{n\in\mathbb{N}_0}$ never go extinct, i.e., when does
$$P[Z_n\ge 1 \text{ for all } n\in\mathbb{N}_0]=1 \qquad (4.1)$$
hold?

4.3 Construction and simulation of branching processes

Before we answer Galton's question, let us figure out how to simulate a branching process. This is exactly how one would do it in practice: given the size $Z_n$ of generation $n$, one would draw $Z_n$ independent samples from the offspring distribution and sum up the results to get $Z_{n+1}$. Mathematically, we need a black box with two inputs, "randomness" and $Z_n$, which produces $Z_{n+1}$.

What do we mean by "randomness"? Some time ago we asserted that a probability space which supports a sequence $(U_n)_{n\in\mathbb{N}_0}$ of independent $U[0,1]$ random variables exists; we think of $(U_n)_{n\in\mathbb{N}_0}$ as a sequence of random numbers produced by a computer. When we studied simulation, we showed that, for a given offspring distribution $p(k)=P[Z_1=k]$, $k\in\mathbb{N}_0$, it is possible to construct a function $g:[0,1]\to\mathbb{N}_0$ such that the random variable $g(U)$ has probability mass function $p$ when $U\sim\mathrm{Uniform}(0,1)$.
Let us ﬁrst apply the function g to each member of (Un )n∈N0 to obtain an independent sequence (ηn )n∈N0 of N0 valued random variables with pmf p. we need a black box with two inputs “randomness” and Zn . the more oﬀspring they will produce. etc. Otherwise. (2) and (3) above is called a (simple) branching process.an accumulation of the ﬁrst n elements of (ηn )n∈N0 would give you the value Xn of the random walk at time n. Mathematically. one would draw Zn simulations from the distribution (pn )n∈N0 . then Zm = 0 for all m ≥ n . Branching processes are a bit more complicated. In other words. It is the oﬀspring distribution alone that determines the evolution of a branching process. generations are produced in the same way. fourth.i }i∈N into the black box which produces Zn+1 from Zn . instead of one sequence of independent random variables with pmf p. When we studies simulation.
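The construction just described is easy to turn into a computer program. The notes themselves contain no code, so the following is only a minimal Python sketch with invented names: `make_offspring_sampler` plays the role of the function g applied to uniform draws, and `simulate_branching` implements the black box, drawing exactly Z_n offspring counts per generation instead of materializing the whole double sequence {Z_{n,i}}.

```python
import random

def make_offspring_sampler(p, rng):
    """Return a function that draws one sample from the pmf p, where
    p[k] = P[Z1 = k].  This is the function g from the notes: invert
    the cdf of p at a uniform draw U ~ Uniform(0, 1)."""
    def sample():
        u = rng.random()
        acc = 0.0
        for k, pk in enumerate(p):
            acc += pk
            if u < acc:
                return k
        return len(p) - 1  # guard against floating-point round-off
    return sample

def simulate_branching(p, n_generations, seed=0):
    """Return the trajectory [Z0, Z1, ..., Zn] of a simple branching
    process: each Z_{n+1} is a sum of Z_n independent draws from p."""
    rng = random.Random(seed)
    sample = make_offspring_sampler(p, rng)
    trajectory = [1]                                       # Z0 = 1
    for _ in range(n_generations):
        z = trajectory[-1]
        trajectory.append(sum(sample() for _ in range(z)))  # Z_{n+1}
    return trajectory
```

Note that once some Z_n = 0, the sum above is empty, so all later generations are 0 as well, exactly as in part (c) of the model.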
Once we learn a bit more about the probabilistic structure of (Z_n), we will describe another way to simulate it.

4.4 A generating-function approach

Having defined and constructed a branching process (Z_n), n ∈ N_0, with offspring distribution given by the pmf p, let us analyze its probabilistic structure. The first question that needs to be answered is the following: what is the distribution of Z_n, for n ∈ N_0? It is clear that Z_n must be N_0-valued, so its distribution is completely described by its pmf, which is, in turn, completely determined by its generating function. While an explicit expression for the pmf of Z_n may not be available, its generating function can always be computed:

Proposition 4.2. Let (Z_n), n ∈ N_0, be a branching process, and let the generating function of its offspring distribution be given by G(s). Then the generating function of Z_n is the n-fold composition of G with itself:

G_{Z_n}(s) = G(G(... G(s) ...))   (n G's).

Proof. For n = 1, the distribution of Z_1 has pmf p, so G_{Z_1}(s) = G(s). Suppose that the statement of the proposition holds for some n ∈ N. Then

Z_{n+1} = sum_{i=1}^{Z_n} Z_{n,i},

where all the {Z_{n,i}}, i ∈ N, are independent of each other and have the same distribution with pmf p. Therefore, Z_{n+1} can be viewed as a random sum of Z_n independent random variables, where each summand has generating function G and the number of summands Z_n is independent of the terms in the sum. Proposition 2.18 asserts that the generating function G_{Z_{n+1}} of Z_{n+1} is the composition of the generating function G_{Z_n} of the random number of terms Z_n with the generating function G(s) of each of the summands:

G_{Z_{n+1}}(s) = G_{Z_n}(G(s)) = G(G(... G(s) ...))   (n + 1 G's),

where the second equality follows from the induction hypothesis.

Example 4.3. Let us use Proposition 4.2 in some simple examples. In the first three examples no randomness occurs and the population growth can be described exactly; in the others, more interesting things happen.

1. p(0) = 1, p(n) = 0 for n ∈ N: This infertile population dies after the first generation: Z_0 = 1 and Z_n = 0 for all n ≥ 1. Here G(s) = 1, and G_{Z_n}(s) = 1 for all n ≥ 1.
2. p(1) = 1, p(n) = 0 for n ≠ 1: Each individual produces exactly one child before he or she dies, so the population size is always 1: Z_n = 1 for all n ∈ N_0. Here G(s) = s, and, indeed, G_{Z_n}(s) = s.

3. p(k) = 1 for some k ≥ 2, p(n) = 0 for n ≠ k: Here there are exactly k children per individual, so the population grows exponentially: Z_n = k^n for n ∈ N_0. The generating function of the offspring distribution is G(s) = s^k, so

G_{Z_n}(s) = ((... (s^k)^k ...)^k = s^{k^n}   (n pairs of parentheses).

4. p(0) = p, p(1) = q = 1 − p, p(n) = 0 for n ≥ 2: Each individual tosses a (possibly biased) coin and, independently of the others, has one child if the outcome is heads or dies childless if the outcome is tails. The generating function of the offspring distribution is G(s) = p + qs, so

G_{Z_n}(s) = p + q(p + q(... (p + qs) ...))   (n pairs of parentheses).

The expression above can be simplified considerably. One needs to realize two things: (a) after all the products above are expanded, the resulting expression must be of the form A + Bs, for some A and B; and (b) G_{Z_n} is a generating function of a probability distribution, so A + B = G_{Z_n}(1) = 1. If you inspect the expression for G_{Z_n} even more closely, you will see that the coefficient B next to s is just q^n. Therefore,

G_{Z_n}(s) = (1 − q^n) + q^n s, for n ∈ N.

So we didn't need Proposition 4.2 after all: the value of Z_n is equal to 1 if and only if all of the coin tosses of its ancestors turned out to be heads, and the probability of that event is q^n.

5. p(0) = p^2, p(1) = 2pq, p(2) = q^2, p(n) = 0 for n ≥ 3: In this case each individual has exactly two children, but each child's gender is determined at random: male with probability q and female with probability p, independently of the others. This example can be interpreted as follows: assuming that all females change their last name when they marry, and that all of them marry, Z_n is just the number of individuals carrying the family name after n generations. The generating function of the offspring distribution is G(s) = (p + qs)^2, so

G_{Z_n}(s) = (p + q(... (p + q(p + qs)^2)^2 ...))^2   (n pairs of parentheses).

Unlike in the example above, it is not so easy to simplify this expression.

Proposition 4.2 can also be used to compute the mean and variance of the population size Z_n.
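Even when no simplification like the one in case 4 is available, Proposition 4.2 makes G_{Z_n} easy to evaluate numerically. The following Python sketch (not part of the notes; the names are invented) composes G with itself n times and checks the closed form (1 − q^n) + q^n s derived in case 4 above.

```python
def compose_n(G, n, s):
    """Evaluate G(G(... G(s) ...)) with n G's; by Proposition 4.2 this
    is the generating function of Z_n evaluated at s."""
    for _ in range(n):
        s = G(s)
    return s

# Example 4.3, case 4: G(s) = p + q*s with q = 1 - p.
p, q, n = 0.3, 0.7, 5
G = lambda s: p + q * s

s = 0.4
lhs = compose_n(G, n, s)            # n-fold composition of G
rhs = (1 - q ** n) + q ** n * s     # the closed form derived above
assert abs(lhs - rhs) < 1e-12
```

The same `compose_n` works verbatim for case 5, where no closed form for G_{Z_n} is available.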
Proposition 4.4. Let p denote the pmf of the offspring distribution of a branching process (Z_n), n ∈ N_0. If p is integrable, i.e., if

µ = sum_{k=0}^∞ k p(k) < ∞,   (4.2)

then E[Z_n] = µ^n. If the variance of p is also finite, i.e., if

σ^2 = sum_{k=0}^∞ (k − µ)^2 p(k) < ∞,

then

Var[Z_{n+1}] = σ^2 µ^n (1 + µ + µ^2 + ... + µ^n) = σ^2 µ^n (1 − µ^{n+1}) / (1 − µ) if µ ≠ 1, and σ^2 (n + 1) if µ = 1.   (4.3)

Proof. Since the distribution of Z_1 has probability mass function p, it is clear that E[Z_1] = µ and Var[Z_1] = σ^2. We proceed by induction: assume that E[Z_n] = µ^n holds for some n ∈ N. By Proposition 4.2, the generating function G_{Z_{n+1}} is given as the composition G_{Z_{n+1}}(s) = G_{Z_n}(G(s)). Therefore, if we use the identity E[Z_{n+1}] = G'_{Z_{n+1}}(1), together with G(1) = 1, we get

E[Z_{n+1}] = G'_{Z_{n+1}}(1) = G'_{Z_n}(G(1)) G'(1) = G'_{Z_n}(1) G'(1) = E[Z_n] E[Z_1] = µ^n µ = µ^{n+1}.

A similar (but more complicated and less illuminating) argument can be used to establish (4.3).

4.5 Extinction probability

We now turn to the central question, the one posed by Galton. We define extinction to be the following event:

E = {ω ∈ Ω : Z_n(ω) = 0 for some n ∈ N}.

It follows from the properties of the branching process that Z_m = 0 for all m ≥ n whenever Z_n = 0. Therefore, we can write E as an increasing union of the sets E_n, where E_n = {ω ∈ Ω : Z_n(ω) = 0}. In particular, the sequence (P[E_n]), n ∈ N, is nondecreasing, and "continuity of probability" implies that

P[E] = lim_{n→∞} P[E_n].

The number P[E] is called the extinction probability. It is amazing that this probability can be computed, even if the explicit form of the generating function G_{Z_n} is not known. Using the fact that P[E_n] = P[Z_n = 0] = G_{Z_n}(0), we get

P[E] = lim_{n→∞} G_{Z_n}(0) = lim_{n→∞} G(G(... G(0) ...))   (n G's).
Proposition 4.5. The extinction probability p̄ = P[E] is the smallest nonnegative solution of the equation

x = G(x),

called the extinction equation, where G is the generating function of the offspring distribution.

Proof. Let us show first that p̄ = P[E] is a solution of the equation x = G(x). G is a continuous function, so G(lim_{n→∞} x_n) = lim_{n→∞} G(x_n) for every convergent sequence (x_n) in [0, 1]. Let us take the particular sequence given by

x_n = G(G(... G(0) ...))   (n G's),

so that G(x_n) = x_{n+1} and p̄ = P[E] = lim_{n→∞} x_n. Therefore,

p̄ = lim_{n→∞} x_n = lim_{n→∞} x_{n+1} = lim_{n→∞} G(x_n) = G(lim_{n→∞} x_n) = G(p̄),

and so p̄ solves the equation G(x) = x.

The fact that p̄ = P[E] is the smallest solution of x = G(x) on [0, 1] is a bit trickier to get. Let p′ be another solution of x = G(x) on [0, 1]. Since 0 ≤ p′ and G is a nondecreasing function on [0, 1], we have G(0) ≤ G(p′) = p′. We can apply the function G to both sides of this inequality to get G(G(0)) ≤ G(G(p′)) = G(p′) = p′. Continuing in the same way, we get

G(G(... G(0) ...)) ≤ p′   (n G's),

so p̄ = P[E] = lim_{n→∞} P[E_n] ≤ p′. Therefore, p̄ is not larger than any other solution p′ of x = G(x).

Example 4.6. Let us compute the extinction probabilities in the cases from Example 4.3.

1. p(0) = 1, p(n) = 0 for n ∈ N: No need to use any theorems; the situation is clear: P[E] = 1 in this case.

2. p(1) = 1, p(n) = 0 for n ≠ 1: Like above, no theorems are needed: P[E] = 0.

3. p(k) = 1 for some k ≥ 2, p(n) = 0 for n ≠ k: No extinction here either: P[E] = 0.
4. p(0) = p, p(1) = q = 1 − p, p(n) = 0 for n ≥ 2: Since G(s) = p + qs, the extinction equation reads s = p + qs. If p > 0, the only solution is s = 1, so extinction is guaranteed: P[E] = 1. If p = 0, every s in [0, 1] solves the equation, and the smallest nonnegative solution is s = 0, so no extinction occurs. It is interesting to note the jump in the extinction probability as p changes from 0 to a positive number.

5. p(0) = p^2, p(1) = 2pq, p(2) = q^2, p(n) = 0 for n ≥ 3: Here G(s) = (p + qs)^2, so the extinction equation reads s = (p + qs)^2. This is a quadratic equation in s, and its solutions are s_1 = 1 and s_2 = p^2/q^2 (if we assume that q > 0). When p ≥ q, s = 1 is the smallest solution in [0, 1]; when p < q, the smaller of the two is s_2. Therefore,

P[E] = min(1, p^2/q^2).
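The proof of Proposition 4.5 suggests a practical recipe for computing P[E] when the extinction equation has no convenient closed form: iterate x → G(x) starting from 0, which converges to the smallest solution of x = G(x) on [0, 1]. A short Python sketch (the code and its names are mine, not from the notes), checked against case 5 of Example 4.6:

```python
def extinction_probability(G, n_iter=2000):
    """Approximate P[E] = lim_n G(G(... G(0) ...)) by iterating G
    from 0; this converges to the smallest solution of x = G(x)
    on [0, 1]."""
    x = 0.0
    for _ in range(n_iter):
        x = G(x)
    return x

# Example 4.6, case 5: G(s) = (p + q*s)^2, so P[E] = min(1, p^2/q^2).
# We avoid p = q = 1/2: there the iteration still converges, but only
# at rate O(1/n), which is too slow for a fixed iteration count.
for p in (0.2, 0.3, 0.7):
    q = 1 - p
    G = lambda s, p=p, q=q: (p + q * s) ** 2
    assert abs(extinction_probability(G) - min(1.0, p ** 2 / q ** 2)) < 1e-9
```

Away from the critical case, the iteration converges geometrically, so a few dozen iterations already give machine-precision answers.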