
A Course in Stochastic Processes

José Manuel Corcuera


Contents

1 Basic Concepts and Definitions
1.1 Definition of a Stochastic Process
1.2 Examples
1.3 Equivalence of Stochastic Processes
1.4 Continuity Concepts
1.5 Main Classes of Random Processes
1.6 Some Applications
1.6.1 Model of a Germination Process
1.6.2 Modeling of Soil Erosion Effect
1.6.3 Brownian motion
1.7 Problems

2 Discrete time martingales
2.1 Conditional expectation
2.2 Discrete Time Martingales
2.3 Stopping times
2.4 Optional stopping theorem
2.5 The Snell envelope and optimal stopping
2.6 Decomposition of supermartingales
2.7 Martingale convergence theorems

3 Markov Processes
3.1 Discrete-time Markov chains
3.1.1 Class structure
3.1.2 Hitting times and absorption probabilities
3.1.3 Recurrence and transience
3.1.4 Recurrence and transience in a random walk
3.1.5 Invariant distributions
3.1.6 Limiting behaviour
3.1.7 Ergodic theorem
3.2 Poisson process
3.2.1 Exponential distribution
3.2.2 Poisson process
3.3 Continuous-time Markov chains
3.3.1 Definition and examples
3.3.2 Computing the transition probability
3.3.3 Jump chain and holding times
3.3.4 Birth processes
3.3.5 Class structure
3.3.6 Hitting times
3.3.7 Recurrence and transience
3.3.8 Invariant distributions
3.3.9 Convergence to equilibrium
Chapter 1

Basic Concepts and Definitions

1.1 Definition of a Stochastic Process


A stochastic process with state space S is a collection of random variables
{Xt ; t ∈ T } defined on the same probability space (Ω, F, P ). The set T is called
its parameter set. If T = N = {0, 1, 2, ...}, the process is said to be a discrete
parameter process. If T is not countable, the process is said to have a continuous
parameter. In the latter case the usual examples are T = R+ = [0, ∞) and
T = [a; b] ⊂ R. The index t represents time, and then one thinks of Xt as the
state or the position of the process at time t. The state space is R in most usual
examples, and then the process is said real-valued. There will be also examples
where S is N, the set of all integers, or a finite set.
For every fixed ω ∈ Ω, the mapping

t −→ Xt (ω)
defined on the parameter set T , is called a realization, trajectory, sample path
or sample function of the process.
Let {Xt ; t ∈ T } be a real-valued stochastic process and {t1 < · · · < tn } ⊂ T ,
then the probability distribution Pt1 ,...,tn = P ◦ (Xt1 , ..., Xtn )−1 of the random
vector

(Xt1 , ..., Xtn ) : Ω −→ Rn


is called a finite-dimensional marginal distribution of the process {Xt ; t ∈ T }.
A theorem due to Kolmogorov establishes the existence of a stochastic process
associated with a given family of finite-dimensional distributions satisfying a
consistency condition. Let Ω = R^T, the collection of all functions ω := (ω(t))_{t∈T}
from T to R. Define X_t(ω) = ω(t). A set
C = {ω : X_{t_1}(ω) ∈ B_1, ..., X_{t_n}(ω) ∈ B_n}


for t_1 < · · · < t_n ∈ T and B_1, ..., B_n ∈ B(R) is called a cylinder set. Consider
the σ-field F generated by the cylinder sets.
Theorem 1.1.1 Consider a family of probability measures
{P_{t_1,...,t_n}, t_1 < · · · < t_n, n ≥ 1, t_i ∈ T}
such that:
1. P_{t_1,...,t_n} is a probability measure on R^n.
2. (Consistency condition) If {t_{k_1} < · · · < t_{k_m}} ⊂ {t_1 < · · · < t_n},
then P_{t_{k_1},...,t_{k_m}} is the marginal of P_{t_1,...,t_n} corresponding to the indexes k_1, ..., k_m.
Then there exists a unique probability P on F which has the family {P_{t_1,...,t_n}}
as finite-dimensional marginal distributions.

1.2 Examples
A real-valued process {X_t, t ≥ 0} is called a second order process provided
E(X_t²) < ∞ for all t ≥ 0. The mean and the covariance function of a second
order process {X_t, t ≥ 0} are defined by

m_X(t) = E(X_t),
Γ_X(s, t) = Cov(X_s, X_t) = E((X_s − m_X(s))(X_t − m_X(t))).

The variance of the process {X_t, t ≥ 0} is defined by

σ_X²(t) = Γ_X(t, t) = Var(X_t).
Example 1.2.1 Let X and Y be independent random variables. Consider the
stochastic process with parameter t ∈ [0, ∞)
Xt = tX + Y.
The sample paths of this process are lines with random coefficients. The finite-
dimensional marginal distributions are given by

P(X_{t_1} ≤ x_1, ..., X_{t_n} ≤ x_n) = ∫_R F_X( min_{1≤i≤n} (x_i − y)/t_i ) P_Y(dy).
Example 1.2.2 Consider the stochastic process

X_t = A cos(ϕ + λt),

where A and ϕ are independent random variables such that E(A) = 0, E(A²) <
∞ and ϕ is uniformly distributed on [0, 2π]. This is a second order process with

m_X(t) = 0,
Γ_X(s, t) = (1/2) E(A²) cos λ(t − s).
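A quick Monte Carlo check of these two formulas — a sketch in Python, assuming NumPy; the parameter values and the choice of a standard normal amplitude A are ours:

```python
import numpy as np

# Empirical check that m_X(t) = 0 and Gamma_X(s,t) = (1/2) E(A^2) cos(lambda (t-s))
# for X_t = A cos(phi + lambda t), with A centered and phi ~ Uniform(0, 2*pi).
rng = np.random.default_rng(0)
n, lam, s, t = 200_000, 2.0, 0.3, 1.1
A = rng.normal(0.0, 1.0, n)              # any centered A works; here E(A^2) = 1
phi = rng.uniform(0.0, 2 * np.pi, n)
Xs = A * np.cos(phi + lam * s)
Xt = A * np.cos(phi + lam * t)
print(Xs.mean(), Xt.mean())              # both close to 0
print(np.mean(Xs * Xt))                  # empirical Gamma_X(s, t)
print(0.5 * np.cos(lam * (t - s)))       # theoretical value with E(A^2) = 1
```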

Example 1.2.3 Arrival process: Consider the process of arrivals of customers


at a store, and suppose the experiment is set up to measure the interarrival times.
Suppose that the interarrival times are positive random variables X_1, X_2, ....
Then, for each t ∈ [0, ∞), we put N_t = k if and only if the integer k is such that

X_1 + ... + X_k ≤ t < X_1 + ... + X_{k+1},

and we put N_t = 0 if t < X_1. Then N_t is the number of arrivals in the time
interval [0, t]. Notice that for each t ≥ 0, N_t is a random variable taking values
in the set S = N. Thus, {N_t, t ≥ 0} is a continuous time process with values
in the state space N. The sample paths of this process are non-decreasing, right
continuous and they increase by jumps of size 1 at the times X_1 + ... + X_k. On
the other hand, N_t < ∞ for all t ≥ 0 if and only if

∑_{k=1}^∞ X_k = ∞.
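The construction of N_t from the interarrival times translates directly into code. A sketch, assuming NumPy, with exponential interarrival times as one concrete choice (which makes N_t a Poisson process, see Section 3.2):

```python
import numpy as np

# Build N_t from i.i.d. interarrival times: N_t = k iff
# X_1 + ... + X_k <= t < X_1 + ... + X_{k+1}.
rng = np.random.default_rng(1)
interarrivals = rng.exponential(scale=1.0, size=1000)  # X_1, X_2, ...
arrival_times = np.cumsum(interarrivals)               # X_1 + ... + X_k

def N(t):
    # number of arrival instants in [0, t]
    return np.searchsorted(arrival_times, t, side="right")

print(N(5.0), N(10.0))   # sample path values; non-decreasing, jumps of size 1
```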

Example 1.2.4 Consider a discrete time stochastic process {Xn , n = 0, 1, 2, ...}


with a finite number of states S = {1, 2, 3}. The dynamics of the process is as
follows. You move from state 1 to state 2 with probability 1. From state 3 you
move either to 1 or to 2 with equal probability 1/2, and from 2 you jump to 3
with probability 1/3, otherwise stay at 2. This is an example of a Markov chain.
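A minimal simulation sketch of this chain in Python, assuming NumPy (states are relabelled 0, 1, 2 for indexing):

```python
import numpy as np

# Transition matrix of Example 1.2.4, rows indexed by the current state.
P = np.array([[0.0, 1.0, 0.0],      # from state 1: move to 2 with probability 1
              [0.0, 2/3, 1/3],      # from state 2: jump to 3 w.p. 1/3, else stay
              [0.5, 0.5, 0.0]])     # from state 3: go to 1 or 2 w.p. 1/2 each

rng = np.random.default_rng(2)
x, path = 0, [0]
for _ in range(20):
    x = rng.choice(3, p=P[x])       # next state drawn from row P[x]
    path.append(x)
print([s + 1 for s in path])        # back to the labels 1, 2, 3
```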
A real-valued stochastic process {Xt , t ∈ T } is said to be Gaussian or normal
if its finite-dimensional marginal distributions are multi-dimensional Gaussian
laws. The mean mX (t) and the covariance function ΓX (s, t) of a Gaussian pro-
cess determine its finite-dimensional marginal distributions. Conversely, sup-
pose that we are given an arbitrary function m : T −→ R, and a symmetric
function Γ : T × T −→ R, which is nonnegative definite, that is,

∑_{i,j=1}^n Γ(t_i, t_j) a_i a_j ≥ 0

for all ti ∈ T, ai ∈ R, and n ≥ 1. Then there exists a Gaussian process with


mean m and covariance function Γ.
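In practice, a finite-dimensional vector of such a Gaussian process can be sampled by factorizing the covariance matrix: if Γ = LLᵀ, then m + LZ with Z standard Gaussian has the required law. A sketch in Python with NumPy, taking Γ(s, t) = min(s, t) (the Brownian covariance computed in Section 1.6.3) as a concrete choice:

```python
import numpy as np

# Sample (X_{t_1}, ..., X_{t_n}) for a centered Gaussian process with
# covariance Gamma(s, t) = min(s, t), via the Cholesky factor of Gamma.
rng = np.random.default_rng(3)
t = np.linspace(0.01, 1.0, 100)
Gamma = np.minimum.outer(t, t)                          # Gamma(s, t) = min(s, t)
L = np.linalg.cholesky(Gamma + 1e-12 * np.eye(len(t)))  # small jitter for stability
X = L @ rng.normal(size=len(t))                         # one sample of the vector
print(X[:5])
```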
Example 1.2.5 Let X and Y be random variables with joint Gaussian distri-
bution. Then the process Xt = tX + Y , t ≥ 0, is Gaussian with mean and
covariance functions
m_X(t) = tE(X) + E(Y),
Γ_X(s, t) = st Var(X) + (t + s) Cov(X, Y) + Var(Y).
Example 1.2.6 Gaussian white noise: Consider a stochastic process {Xt , t ∈
T } such that the random variables Xt are independent and with the same law
N (0, σ 2 ). Then, this process is Gaussian with mean and covariance functions,
mX (t) = 0
ΓX (s, t) = σ 2 δst .

1.3 Equivalence of Stochastic Processes


Definition 1.3.1 A stochastic process {Xt , t ∈ T } is equivalent to another
stochastic process {Yt , t ∈ T } if for each t ∈ T

P {Xt = Yt } = 1.

We also say that {Xt , t ∈ T } is a version of {Yt , t ∈ T }. Two equivalent


processes may have quite different sample paths.

Example 1.3.1 Let ξ be a nonnegative random variable with continuous dis-


tribution function. Set T = [0, ∞). The processes

X_t = 0,
Y_t = 0 if ξ ≠ t,   Y_t = 1 if ξ = t,

are equivalent but their sample paths are different.

Definition 1.3.2 Two stochastic processes {X_t, t ∈ T} and {Y_t, t ∈ T} are said
to be indistinguishable if X_·(ω) = Y_·(ω) for all ω ∉ N, with P(N) = 0.

If two stochastic processes are equivalent and have right-continuous sample
paths, then they are indistinguishable. Similarly, two discrete time stochastic
processes which are equivalent are also indistinguishable.

1.4 Continuity Concepts


Definition 1.4.1 A real-valued stochastic process {Xt , t ∈ T }, where T is an
interval of R, is said to be continuous in probability if, for any ε > 0 and every
t∈T
lim P (|Xt − Xs | > ε) = 0.
s−→t

Definition 1.4.2 Fix p ≥ 1. Let {Xt , t ∈ T } be a real-valued stochastic pro-


cess, where T is an interval of R, such that E(|Xt |p ) < ∞, for all t ∈ T . The
process {Xt , t ∈ T } is said to be continuous in mean of order p if

lim E(|Xt − Xs |p ) = 0.
s−→t

Continuity in mean of order p implies continuity in probability. However,
continuity in probability (or in mean of order p) does not necessarily imply
that the sample paths of the process are continuous.
In order to show that a given stochastic process has continuous sample paths
it is enough to have suitable estimates on the moments of the increments of the
process. The following continuity criterion by Kolmogorov provides a sufficient
condition of this type:

Proposition 1.4.1 (Kolmogorov continuity criterion) Let {Xt , t ∈ T } be


a real-valued stochastic process, where T is a finite interval. Suppose that there
exist constants α > 1, p > 0 and c_T > 0 such that

E(|Xt − Xs |p ) ≤ cT |t − s|α (1.1)

for all s, t ∈ T . Then, there exists a version of the process {Xt , t ∈ T } with
continuous sample paths.

Condition (1.1) also provides some information about the modulus of conti-
nuity of the sample paths of the process:

m_T(ω, δ) := sup_{s,t∈T, |t−s|<δ} |X_t(ω) − X_s(ω)|.

In particular, if T = [a, b], for each ε > 0 there exists a random variable G_ε
such that, with probability one,

m_T(ω, δ) ≤ G_ε δ^{(α−1)/p − ε}.

Moreover, E(G_ε^p) < ∞.

1.5 Main Classes of Random Processes


1. Processes with independent increments. Let X := {Xt , t ∈ T } be a real-
valued stochastic process on a probability space (Ω, F, P ), where T ⊂ R is an
interval.

Definition 1.5.1 The process X is said to have independent increments if for


any finite subset {t0 < · · · < tn } ⊂ T , the increments

Xt0 , Xt1 − Xt0 , ..., Xtn − Xtn−1

are independent random variables.

Remark 1.5.1 If T = [0, ∞), it follows from the definition that all the finite-
dimensional marginal distributions are determined by the laws of X_0 and of the
increments X_t − X_s, s < t ∈ T. The most important processes with independent
increments are the Poisson process and the Wiener process (or Brownian motion).

2. Markov processes.

Definition 1.5.2 The process X is said to be a Markov process with a state


space {R, B(R)} if for any {t1 < · · · < tn } ⊂ T and B ∈ B(R),

P (Xtn ∈ B|Xt1 , ..., Xtn−1 ) = P (Xtn ∈ B|Xtn−1 ) (a.s.) (1.2)

Remark 1.5.2 Property (1.2) is called the Markov property.

3. Strictly stationary processes.



Definition 1.5.3 The process X is said to be strictly stationary if for any
{t_1 < · · · < t_n} ⊂ T with {t_1 + h < · · · < t_n + h} ⊂ T and any B_1, ..., B_n ∈ B(R),

P(X_{t_1+h} ∈ B_1, ..., X_{t_n+h} ∈ B_n) = P(X_{t_1} ∈ B_1, ..., X_{t_n} ∈ B_n).
4. Wide sense stationary processes.
Definition 1.5.4 A second order process X is said to be wide sense stationary
if for all t ∈ T and t + h ∈ T
E(Xt+h ) = E(Xt )
and
Cov(Xt+h , Xs+h ) = Cov(Xt , Xs ).
5. Martingales.
Definition 1.5.5 A process X such that E(|X_t|) < ∞ for all t ∈ T is said to be
a martingale if, for every {t_1 < · · · < t_n} ⊂ T,

E(X_{t_n} | X_{t_1}, ..., X_{t_{n−1}}) = X_{t_{n−1}} (a.s.).
If
E(Xtn |Xt1 , ..., Xtn−1 ) ≤ Xtn−1 (a.s.),
the process X is a supermartingale. Finally if
E(Xtn |Xt1 , ..., Xtn−1 ) ≥ Xtn−1 (a.s.),
it is said to be a submartingale.

1.6 Some Applications


1.6.1 Model of a Germination Process
Let p be the probability that a seed fails to germinate. If there is germination,
assume that the germination time, T, follows an exponential distribution with
parameter λ. Then
P (T ≤ t) = P (T ≤ t|germination)P (germination)
= (1 − e−λt )(1 − p).
If we have N plants and X_t is the number of them that have germinated by
time t, then

X_t = ∑_{i=1}^N 1_{{T_i ≤ t}},

where T_i is the germination time of the i-th plant. If we assume that {T_i}_{i=1}^N
is an i.i.d. sequence, then

X_t ∼ Bi(N, (1 − e^{−λt})(1 − p)).
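A simulation sketch of this model in Python with NumPy (the parameter values below are arbitrary choices):

```python
import numpy as np

# N seeds, each fails to germinate with probability p; conditional on
# germination the time is Exp(lambda). The count X_t of germinated plants
# should then be Bi(N, (1 - e^{-lambda*t})(1 - p)).
rng = np.random.default_rng(4)
N, p, lam, t = 10_000, 0.2, 0.5, 3.0
germinates = rng.random(N) > p                    # which seeds germinate
T = np.where(germinates, rng.exponential(1 / lam, N), np.inf)
X_t = np.sum(T <= t)
print(X_t / N, (1 - np.exp(-lam * t)) * (1 - p))  # empirical vs. theoretical rate
```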


1.6.2 Modeling of Soil Erosion Effect


Let Y_1, Y_2, ..., Y_n, ... be the sequence of annual yields of a given crop in an area
without erosion and Z_1, Z_2, ..., Z_n, ... a sequence of factors with common support
in (0, 1]. Then we can model the crop yield in year n as

X_n = Y_n ∏_{i=1}^n Z_i,

and assume that (Y_n) and (Z_n) are, respectively, sequences of i.i.d. random
variables and that they are independent of each other.

1.6.3 Brownian motion


In 1827 Robert Brown noticed the irregular but ceaseless motion of pollen par-
ticles suspended in a liquid. Assume that ξ_t is one coordinate, say x, of the
position of a pollen particle. Then (ξ_t) is a continuous-time process. If we as-
sume that ξ_t − ξ_s has a distribution depending only on t − s, independent of what
happened before s, and that (ξ_t) is a continuous process, then it can be proved
that

ξ_t ∼ N(µt, σ²t),

where µ ∈ R and σ ∈ R_+. If µ = 0 and σ = 1 we have the so-called standard
Brownian motion. Since

(ξ_t − ξ_s)² = ξ_t² + ξ_s² − 2ξ_tξ_s,

we have, in the standard case, that

|t − s| = E((ξ_t − ξ_s)²) = E(ξ_t²) + E(ξ_s²) − 2Cov(ξ_t, ξ_s) = t + s − 2Cov(ξ_t, ξ_s),

and

Cov(ξ_t, ξ_s) = (1/2)(t + s − |t − s|) = min(s, t).
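A numerical check of this covariance identity, simulating standard Brownian motion on a grid from independent Gaussian increments — a sketch assuming NumPy:

```python
import numpy as np

# Empirical check, over many paths, that Cov(xi_t, xi_s) = min(s, t) for
# standard Brownian motion built from independent N(0, dt) increments.
rng = np.random.default_rng(5)
n_paths, n_steps, T = 50_000, 100, 1.0
dt = T / n_steps
increments = rng.normal(0.0, np.sqrt(dt), (n_paths, n_steps))
paths = np.cumsum(increments, axis=1)            # xi at times dt, 2*dt, ..., T
i, j = 29, 79                                    # times s = 0.3, t = 0.8
print(np.cov(paths[:, i], paths[:, j])[0, 1])    # close to min(s, t) = 0.3
print(min((i + 1) * dt, (j + 1) * dt))
```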

1.7 Problems
1. Let Ω = [0, 1], F = B([0, 1]) and let P be the uniform probability measure on
Ω. Consider the process X_t(ω) = tω, t ∈ [0, 1]. Find the trajectories and
the one- and two-dimensional distributions.
2. Let X_t = A cos αt + B sin αt and Y_t = R cos(αt + Θ), where α > 0, A and B
are independent random variables with distribution N(0, σ²), and R and Θ are
also independent random variables, with Θ ∼ Unif(0, 2π) and R having density

f_R(x) = (x/σ²) exp(−x²/(2σ²)) 1_{(0,+∞)}(x).

Show that the processes {X_t}_{t≥0} and {Y_t}_{t≥0} have the same law.
3. Let X_1 and X_2 be independent random variables defined on the same proba-
bility space (Ω, F, P) with the same standard normal distribution.
Let {Y_t}_{t≥0} be the stochastic process defined by

Y_t = (X_1 + X_2) t.

Determine the finite-dimensional distributions of the process. If A is the
set of non-negative trajectories of the process, compute P(A).
4. Let {Y_t}_{t≥0} be the stochastic process defined by

Y_t = X + αt, α > 1,

where X is a random variable with law N(0, 1). Let D ⊂ [0, +∞) be a
finite or countably infinite set. Determine:

(a) P(Y_t = 0 for at least one t ∈ D),

(b) P(Y_t = 0 for at least one t ∈ [1, 2]).

5. Let X and Y be random variables defined on (Ω, F, P), where Y ∼ N(0, 1).
Let {Z_t}_{t≥0} be the stochastic process

Z_t = X + t(Y + t).

Let A be the set of non-decreasing trajectories of this process. Deter-
mine P(A).
6. Let {X_i}_{i=1,...,n} and {Y_i}_{i=1,...,n} be random variables such that E[X_i] = E[Y_i] =
0, Var[X_i] = Var[Y_i] = σ_i < +∞ and E[X_iX_j] = E[Y_iY_j] = E[X_iY_j] = 0 for
all i ≠ j. Let {Z_t}_{t≥0} be the stochastic process defined by

Z_t = ∑_{i=1}^n {X_i cos(λ_i t) + Y_i sin(λ_i t)}.

Determine its covariance function. Is {Z_t}_{t≥0} wide sense stationary?
7. We say that a process {N_t}_{t≥0} is a homogeneous Poisson process with
parameter λ > 0 if it has independent increments, N_0 = 0, and for all 0 <
t_1 < t_2,

P(N_{t_2} − N_{t_1} = k) = ((λ(t_2 − t_1))^k / k!) e^{−λ(t_2−t_1)}, k ∈ Z_+.

Let {X_k}_{k≥0} be a sequence of i.i.d. random variables with E[X_k] = 0 and
Var[X_k] = σ² < +∞, and let {N_t}_{t≥0} be a homogeneous Poisson process of
parameter λ > 0, independent of {X_k}_{k≥0}. Is the process {Y_t}_{t≥0}, defined
by Y_t = X_{N_t}, wide sense stationary? Is it strictly stationary?
8. Let {Y_t}_{t≥0} be the stochastic process defined by

Y_t = α sin(βt + X),

where α, β > 0 and X ∼ N(0, 1). Compute E[Y_t] and E[Y_sY_t]. Is the process
{Y_t}_{t≥0} wide sense stationary?
Chapter 2

Discrete time martingales

We will first introduce the notion of conditional expectation of a random variable


X with respect to a σ-field B ⊂ F in a probability space (Ω, F, P ).

2.1 Conditional expectation


Consider an integrable random variable X defined on a probability space (Ω, F, P),
and a σ-field B ⊂ F. We define the conditional expectation of X given B (denoted
by E(X|B)) to be any integrable random variable Z that satisfies the following
two properties:

(i) Z is measurable with respect to B.

(ii) For all A ∈ B, E(Z1_A) = E(X1_A).

It can be proved that there exists a unique (up to modifications on sets of


probability zero) random variable satisfying these properties. That is, if Z̃ and
Z satisfy the above properties, then Z = Z̃, P -almost surely.
Property (ii) implies that for any bounded and B-measurable random vari-
able Y we have
E(E(X|B)Y ) = E(XY ) (2.1)

Example 2.1.1 Consider the particular case where the σ-field B is generated by
a finite partition {B1 , ..., Bm }. In this case, the conditional expectation E(X|B)
is a discrete random variable that takes the constant value E(X|Bj ) on each set
Bj :
E(X|B) = ∑_{j=1}^m (E(X 1_{B_j}) / P(B_j)) 1_{B_j}.
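A small numerical illustration of this formula in Python with NumPy; Ω is discretized here as a finite set of equally likely points, an assumption made only for the example:

```python
import numpy as np

# E(X | B) for B generated by a finite partition: it takes the constant value
# E(X 1_{B_j}) / P(B_j) on each block B_j.
rng = np.random.default_rng(6)
X = rng.normal(size=9)                           # X(omega) on 9 equally likely points
labels = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])   # partition into B_0, B_1, B_2
cond_exp = np.empty_like(X)
for j in range(3):
    block = labels == j
    cond_exp[block] = X[block].mean()            # block average = E(X 1_{B_j})/P(B_j)
print(cond_exp)                                  # constant on each block
print(cond_exp.mean(), X.mean())                 # E(E(X|B)) = E(X)
```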

Here are some properties of the conditional expectations in the general case:

1. The conditional expectation is linear: E(aX+bY |B) = aE(X|B)+bE(Y |B).


2. A random variable and its conditional expectation have the same expecta-
tion: E(E(X|B)) = E(X). This follows from property (ii) taking A = Ω.
3. If X and B are independent, then E(X|B) = E(X). In fact, the constant
E(X) is clearly B-measurable, and for all A ∈ B we have E(X1A ) =
E(X)E(1A ) = E(E(X)1A ).
4. If X is B-measurable, then E(X|B) = X.
5. If Y is a bounded and B-measurable random variable, then E(Y X|B) =
Y E(X|B). In fact, the random variable Y E(X|B) is integrable and B-
measurable, and for all A ∈ B we have

E(E(X|B)Y 1A ) = E(XY 1A ),

where the equality follows from (2.1). This property means that B-measurable
random variables behave as constants and can be factorized out of the
conditional expectation with respect to B. This property holds if X, Y ∈
L2 (Ω).
6. Given two σ-fields C ⊂ B, then E(E(X|B)|C) = E(X|C).
7. Consider two random variables X and Z, such that Z is B-measurable
and X is independent of B. Consider a measurable function h(x, z) such
that the composition h(X, Z) is an integrable random variable. Then,
we have E(h(X, Z)|B) = E(h(X, z))|_{z=Z}. That is, we first compute the
expectation E(h(X, z)) for any fixed value z of the random variable Z
and, afterwards, we replace z by Z.

Conditional expectation has properties similar to those of ordinary expecta-


tion. For instance, the following monotone property holds:

X ≤ Y ⇒ E(X|B) ≤ E(Y |B).

This implies |E(X|B)| ≤ E(|X||B).


Jensen’s inequality also holds. That is, if ϕ is a convex function defined in
an open interval, such that E(|ϕ(X)|) < ∞, then

ϕ(E(X|B)) ≤ E(ϕ(X)|B). (2.2)

In particular, if we take ϕ(x) = |x|p with p ≥ 1, and E(|X|p ) < ∞, we obtain

|E(X|B)|p ≤ E(|X|p |B),

hence, taking expectations,

E(|E(X|B)|p ) ≤ E(|X|p ). (2.3)

We can define the conditional probability of an event C ∈ F given a σ-field B


as
P (C|B) = E(1C |B).

Suppose that the σ-field B is generated by a finite collection of random vari-


ables Y1 , ..., Ym . In this case, we will denote the conditional expectation of X
given B by E(X|Y1 , ..., Ym ) and this conditional expectation is the mean of the
conditional distribution of X given Y1 , ..., Ym . The conditional distribution of
X given Y1 , ..., Ym is a family of distributions p(dx|y1 , ..., ym ) parametrized by
the possible values y_1, ..., y_m of the random variables Y_1, ..., Y_m, such that for
all a < b

P(a ≤ X ≤ b | Y_1, ..., Y_m) = ∫_a^b p(dx | Y_1, ..., Y_m).
Then, this implies that

E(X | Y_1, ..., Y_m) = ∫_R x p(dx | Y_1, ..., Y_m).

Notice that the conditional expectation E(X|Y1 , ..., Ym ) is a function g(Y1 , ..., Ym )
of the variables Y1 , ..., Ym , where
g(y_1, ..., y_m) = ∫_R x p(dx | y_1, ..., y_m).

In particular, if the random variables X, Y1 , ..., Ym have a joint density f (x, y1 , ..., ym ),
then the conditional distribution has the density:

f (x, y1 , ..., ym )
f (x|y1 , ..., ym ) = R ∞ ,
−∞
f (x, y1 , ..., ym )dx

and Z ∞
E(X|Y1 , ..., Ym ) = xf (x|Y1 , ..., Ym )dx.
−∞

The set of all square integrable random variables, denoted by L2 , is a Hilbert


space with the scalar product

hZ, Y i = E(ZY ).

Then, the set of square integrable and B-measurable random variables, denoted
by L2 (Ω, B, P ) is a closed subspace of L2 (Ω, F, P ). Then, given a random
variable X such that E(X 2 ) < ∞, the conditional expectation E(X|B) is the
projection of X on the subspace L2 (Ω, B, P ). In fact, we have:

(i) E(X|B) belongs to L2 (Ω, B, P ) because it is a B-measurable random vari-


able and it is square integrable due to (2.3).

(ii) X − E(X|B) is orthogonal to the subspace L²(Ω, B, P). In fact, for all
Z ∈ L²(Ω, B, P) we have, using property 5,

E[(X − E(X|B))Z] = E(XZ) − E(E(X|B)Z) = E(XZ) − E(E(XZ|B)) = 0.

As a consequence, E(X|B) is the random variable in L2 (Ω, B, P ) that minimizes


the mean square error:

E[(X − E(X|B))²] = min_{Y ∈ L²(Ω,B,P)} E[(X − Y)²].

This follows from the relation

E[(X − Y)²] = E[(X − E(X|B))²] + E[(E(X|B) − Y)²],

and it means that the conditional expectation is the optimal estimator of X


given the σ-field B.

2.2 Discrete Time Martingales


In this section we consider a probability space (Ω, F, P ) and a nondecreasing
sequence of σ- fields
F0 ⊆ F1 ⊆ · · · ⊆ Fn ⊆ · · ·

contained in F. F := {Fn }n≥0 is called a filtration. A sequence of real random


variables M = {Mn }n≥0 is called a martingale with respect to the filtration F
if:
(i) For each n ≥ 0, Mn is Fn -measurable (that is, M is adapted to the
filtration F).
(ii) For each n ≥ 0, E(|Mn |) < ∞.
(iii) For each n ≥ 0,

E(Mn+1 |Fn ) = Mn .

The sequence M = {Mn } n≥0 is called a supermartingale ( or submartin-


gale) if property (iii) is replaced by E(Mn+1 |Fn ) ≤ Mn (or E(Mn+1 |Fn ) ≥ Mn ).
Notice that the martingale property implies that E(Mn ) = E(M0 ) for all
n ≥ 0.
On the other hand, condition (iii) can also be written as

E(∆Mn |Fn−1 ) = 0,

for all n ≥ 1, where ∆Mn = Mn − Mn−1 .

Example 2.2.1 Suppose that {ξ_n, n ≥ 1} are i.i.d. centered random variables.
Set M_0 = 0 and M_n = ξ_1 + ... + ξ_n, for n ≥ 1. Then (M_n) is a martingale w.r.t.
(F_n), with F_n = σ(ξ_1, ..., ξ_n) for n ≥ 1, and F_0 = {∅, Ω}. In fact,

E(M_{n+1} | F_n) = M_n + E(ξ_{n+1} | F_n) = M_n + E(ξ_{n+1}) = M_n,

since ξ_{n+1} is independent of F_n.

Example 2.2.2 Suppose that {ξ_n, n ≥ 1} are i.i.d. random variables such that
P(ξ_n = 1) = 1 − P(ξ_n = −1) = p, where 0 < p < 1. Then M_n = ((1−p)/p)^{ξ_1+...+ξ_n},
M_0 = 1, is a martingale w.r.t. (F_n), with F_n = σ(ξ_1, ..., ξ_n) for n ≥ 1, and
F_0 = {∅, Ω}. In fact,

E(M_{n+1} | F_n) = E( ((1−p)/p)^{ξ_1+...+ξ_{n+1}} | F_n )
= ((1−p)/p)^{ξ_1+...+ξ_n} E( ((1−p)/p)^{ξ_{n+1}} | F_n )
= M_n E( ((1−p)/p)^{ξ_{n+1}} ) = M_n,

since E(((1−p)/p)^{ξ_{n+1}}) = p · (1−p)/p + (1−p) · p/(1−p) = 1.
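A quick Monte Carlo check of this martingale property — a sketch in Python with NumPy; the value p = 0.7 is an arbitrary choice:

```python
import numpy as np

# M_n = ((1-p)/p)^{xi_1 + ... + xi_n} should satisfy E(M_n) = E(M_0) = 1
# for every n, even though the walk itself drifts upward for p > 1/2.
rng = np.random.default_rng(7)
p, n_paths, n = 0.7, 200_000, 10
xi = np.where(rng.random((n_paths, n)) < p, 1, -1)   # P(xi = 1) = p
S = np.cumsum(xi, axis=1)                            # xi_1 + ... + xi_k
M = ((1 - p) / p) ** S
print(M[:, [0, 4, 9]].mean(axis=0))                  # all close to 1 = E(M_0)
```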

In the two previous examples, Fn = σ(M0 , ..., Mn ), for all n ≥ 0. That is,
(Fn ) is the filtration generated by the process (Mn ). Usually, when the filtration
is not mentioned, we will take Fn = σ(M0 , ..., Mn ), for all n ≥ 0. This is always
possible due to the following result:

Proposition 2.2.1 Suppose (Mn )n≥0 is a martingale with respect to a filtration


(Gn ). Let Fn = σ(M0 , ..., Mn ) ⊆ Gn . Then (Mn )n≥0 is a martingale with respect
to the filtration (Fn ).

Proof. We have, by the properties 6 and 4 of the conditional expectation,

E(Mn+1 |Fn ) = E(E(Mn+1 |Gn )|Fn ) = E(Mn |Fn ) = Mn .

Some elementary properties of martingales:


1. If (Mn ) is a martingale, then for all m ≥ n we have

E(Mm |Fn ) = Mn .
In fact, for m > n,

E(Mm |Fn ) = E(E(Mm |Fm−1 )|Fn )


= E(Mm−1 |Fn ) = ... = E(Mn+1 |Fn ) = Mn .

2. (Mn ) is a submartingale if and only if (−Mn ) is a supermartingale.


3. If (M_n) is a martingale and ϕ is a convex function, defined on an open
interval, such that E(|ϕ(M_n)|) < ∞ for all n ≥ 0, then (ϕ(M_n)) is a sub-
martingale. In fact, by Jensen's inequality for the conditional expectation we
have
E(ϕ(M_{n+1}) | F_n) ≥ ϕ(E(M_{n+1} | F_n)) = ϕ(M_n).
In particular, if (M_n) is a martingale such that E(|M_n|^p) < ∞ for all n ≥ 0 and
for some p ≥ 1, then (|M_n|^p) is a submartingale.

4. If (M_n) is a submartingale and ϕ is a convex and increasing function such
that E(|ϕ(M_n)|) < ∞ for all n ≥ 0, then (ϕ(M_n)) is a submartingale. In fact,
by Jensen's inequality for the conditional expectation we have

E(ϕ(M_{n+1}) | F_n) ≥ ϕ(E(M_{n+1} | F_n)) ≥ ϕ(M_n).

In particular, if (M_n) is a submartingale, then (M_n^+) and (M_n ∨ a) are sub-
martingales.
Suppose that (Fn )n≥0 is a given filtration. We say that (Hn )n≥1 is a pre-
dictable sequence of random variables if for each n ≥ 1, Hn is Fn−1 -measurable.
We define the martingale transform of (M_n) by (H_n) as the sequence

(H · M)_n = M_0 + ∑_{j=1}^n H_j ∆M_j, n ≥ 1.

Proposition 2.2.2 If (Mn )n≥0 is a (sub)martingale and (Hn )n≥1 is a bounded


(nonnegative) predictable sequence, then the martingale transform ((H · M )n )
is a (sub)martingale.

Proof. Clearly, for each n ≥ 0 the random variable (H · M )n is Fn -


measurable and integrable. On the other hand, if n ≥ 0 we have

E(∆(H · M)_{n+1} | F_n) = E(H_{n+1} ∆M_{n+1} | F_n) = H_{n+1} E(∆M_{n+1} | F_n) = 0

in the martingale case (and ≥ 0 in the submartingale case with H nonnegative).

Remark 2.2.1 We may think of Hn as the amount of money a gambler will


bet at time n. Suppose that ∆Mn = Mn − Mn−1 is the amount of money a
gambler can win or lose at every step of the game if the bet is 1 Euro, and M0
is the initial capital of the gambler. Then, Mn will be the fortune of the gambler
at time n if in each step the bet is 1 Euro, and (H · M)_n will be the fortune
of the gambler if he uses the gambling system (H_n). The fact that (M_n) is a
martingale means that the game is fair. So, the previous proposition tells us
that if a game is fair, it remains fair regardless of the gambling system (H_n).

Example 2.2.3 Suppose that M0 = 0 and Mn = ξ1 + ... + ξn , n ≥ 1, where


{ξ_n, n ≥ 1} are i.i.d. random variables such that P(ξ_n = 1) = P(ξ_n = −1) = 1/2.
(M_n) is a martingale. A famous betting system is "the doubling strategy", defined
as H_1 = 1 and, for n > 1,

H_n = 2^{n−1} if ξ_1 = ... = ξ_{n−1} = −1, and H_n = 0 otherwise.

Then, if we stop at τ = inf{k, ξ_k = 1} we will have

(H · M)_τ = ∑_{n=1}^τ H_n ∆M_n = −∑_{n=1}^{τ−1} 2^{n−1} + 2^{τ−1} = 1 (a.s.),

and our initial capital was zero! Note that P(τ = k) = (1/2)^k, so P(τ < ∞) = 1
and we will stop at a finite time. However, this result does not contradict the
previous proposition, since if we stop the process at τ and we look at what happens
at a fixed time n in relation to previous times, we obtain a martingale, that is,

E((H · M)_{n∧τ} | F_{n−1}) = (H · M)_{(n−1)∧τ} for all n ≥ 1.

It is remarkable that the term martingale originates from the French name
for the gambling strategy described above.
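A simulation sketch of the doubling strategy in Python — the gain is always 1 euro, while the stakes before stopping are unbounded:

```python
import numpy as np

# Bet 2^{n-1} after n-1 losses, stop at the first win (tau).
rng = np.random.default_rng(8)
for _ in range(5):
    gain, bet = 0, 1
    while True:
        if rng.random() < 0.5:       # win: xi = +1
            gain += bet
            break
        gain -= bet                  # loss: xi = -1
        bet *= 2                     # double the stake
    print(gain)                      # always 1
```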

2.3 Stopping times


Consider a filtration F = {F_n}_{n≥0} in a probability space (Ω, F, P). A (general-
ized) random variable T is called a stopping time if it indicates the time when an
observable event happens. In this way, at time n we know the value of T ∧ (n+1);
in other words, T ∧ (n + 1) is F_n-measurable, or equivalently {T ≤ n} ∈ F_n for all
n ≥ 0.

Definition 2.3.1 A (generalized) random variable T is called a stopping time


if
T : Ω −→ Z+ ∪ {∞},
satisfies {T ≤ n} ∈ F_n for all n ≥ 0.

Proposition 2.3.1 T is an F-stopping time iff

{T = n} ∈ Fn , for all n ≥ 0.

Proof. {T ≤ n} = ∪_{j=0}^n {T = j} and
{T = n} = {T ≤ n} ∩ {T ≤ n − 1}^c.

Example 2.3.1 Consider a discrete time stochastic process (Xn ) adapted to F.


Let A be a subset of the state space. Then the first hitting time of A, T_A, is a
stopping time because

{T_A = n} = {X_0 ∉ A, X_1 ∉ A, ..., X_{n−1} ∉ A, X_n ∈ A}.

Remark 2.3.1 We could consider random times with a certain delay, for in-
stance {T ≤ n − 1} ∈ F_n, or we can even consider predictable times:
{T ≤ n} ∈ F_{n−1}.

Remark 2.3.2 The extension of the notion of stopping time to continuous time
is evident: the (generalized) random variable

T : Ω −→ [0, ∞]

is a stopping time if {T ≤ t} ∈ Ft . Random times with infinitesimal delay are


called optional times: {T < t} ∈ Ft (equivalently {T ≤ t} ∈ Ft+ := ∩s>t Fs ),
random times with infinitesimal anticipation are called predictable times: {T ≤
t} ∈ Ft− := σ(∪s<t Fs ).

Consider a real-valued stochastic process {Xt , t ≥ 0} with continuous tra-


jectories, and adapted to F = {Ft , t ≥ 0}. Assume X0 = 0 and a > 0. The first
passage time for a level a ∈ R defined by

Ta := inf{t > 0 : Xt = a}
is a stopping time because

{Ta ≤ t} = { sup Xs ≥ a} = { sup Xs ≥ a} ∈ Ft .


0≤s≤t 0≤s≤t,s∈Q

Properties of stopping times:


1. If S and T are stopping times, so are S ∨ T and S ∧ T. In fact, this is a
consequence of the relations

{S ∨ T ≤ t} = {S ≤ t} ∩ {T ≤ t} ,
{S ∧ T ≤ t} = {S ≤ t} ∪ {T ≤ t} .

2. Given a stopping time T , we can define the σ-field

FT = {A : A ∩ {T ≤ t} ∈ Ft , for all t ≥ 0}.


F_T is a σ-field because it contains the empty set, it is stable under complements
due to

A^c ∩ {T ≤ t} = (A ∪ {T > t})^c = ((A ∩ {T ≤ t}) ∪ {T > t})^c ∈ F_t,

and it is stable under countable intersections.


3. If S and T are stopping times such that S ≤ T , then FS ⊂ FT . In fact,
if A ∈ FS , then

A ∩ {T ≤ t} = A ∩ {S ≤ t} ∩ {T ≤ t} ∈ Ft

for all t ≥ 0.

Remark 2.3.3 The σ-field F_T represents the information available up to the
stopping time T: we observe what happens until time T, time T included.

4. Let {Xt } be an adapted stochastic process (with discrete or continuous


parameter) and let T be a stopping time. If the parameter is continuous, we
assume that the trajectories of the process {Xt } are right-continuous. Then the
random variable
X_T(ω) := X_{T(ω)}(ω)
is FT -measurable. In discrete time this property follows from the relation

{XT ∈ B} ∩ {T = n} = {Xn ∈ B} ∩ {T = n} ∈ Fn

for any subset B of the state space (Borel set if the state space is R).

2.4 Optional stopping theorem


Consider a discrete time filtration and suppose that T is a stopping time. Then
the process

H_n = 1_{{T ≥ n}}

is predictable. In fact, {T ≥ n} = {T ≤ n − 1}^c ∈ F_{n−1}. The martingale
transform of (M_n) by this sequence is

(H · M)_n = M_0 + ∑_{j=1}^n 1_{{T ≥ j}} ∆M_j = M_0 + ∑_{j=1}^{T∧n} ∆M_j = M_{T∧n}.

As a consequence, if {M_n} is a (sub)martingale, the stopped process M_n^T :=
M_{T∧n} will be a (sub)martingale.

Theorem 2.4.1 (Optional Stopping Theorem) Suppose that {M_n} is a sub-
martingale and S ≤ T < m are two stopping times bounded by a fixed time m.
Then

E(M_T | F_S) ≥ M_S,

with equality in the martingale case.

Proof. We give the proof only in the martingale case. Notice first that
M_T is integrable because

|M_T| ≤ ∑_{n=0}^m |M_n|.

Consider the predictable process H_n = 1_{{S<n≤T}∩A}, where A ∈ F_S. Notice
that {H_n} is predictable because

{S < n ≤ T} ∩ A = {T < n}^c ∩ ({S ≤ n − 1} ∩ A) ∈ F_{n−1}.

Moreover, the random variables Hn are nonnegative and bounded by one. There-
fore by Proposition 2.2.2, (H · M )n is a martingale. We have

(H · M )0 = M0
(H · M )m = M0 + 1A (MT − MS ).

The martingale property of (H · M )n implies that E(1A (MT − MS )) = 0 for all


A ∈ FS and this implies that E(MT |FS ) = MS , because MS is FS -measurable.

Theorem 2.4.2 (Doob's Maximal Inequality) Suppose that {M_n} is a sub-
martingale and λ > 0. Then

P( sup_{0≤n≤N} M_n ≥ λ ) ≤ (1/λ) E(M_N 1_{{sup_{0≤n≤N} M_n ≥ λ}}) ≤ (1/λ) E(M_N^+).

Proof. Consider the stopping time

T = inf{n ≥ 0, Mn ≥ λ} ∧ N.

Then, by the Optional Stopping Theorem,

E(M_N) ≥ E(M_T) = E(M_T 1_{{sup_{0≤n≤N} M_n ≥ λ}}) + E(M_T 1_{{sup_{0≤n≤N} M_n < λ}})
≥ E(λ 1_{{sup_{0≤n≤N} M_n ≥ λ}}) + E(M_N 1_{{sup_{0≤n≤N} M_n < λ}}),

since M_T ≥ λ on {sup_{0≤n≤N} M_n ≥ λ} and T = N on the complementary set.

Corollary 2.4.1 If {M_n} is a martingale with E(|M_n|^p) < ∞ for some p ≥ 1,
then

P( sup_{0≤n≤N} |M_n| ≥ λ ) ≤ (1/λ^p) E(|M_N|^p).

Remark 2.4.1 Note that the last inequality is a generalization of the Chebyshev
inequality.

2.5 The Snell envelope and optimal stopping


Let (Y_n)_{0≤n≤N} be a process adapted to (F_n) with finite expectation. Define

X_N = Y_N,
X_n = max(Y_n, E(X_{n+1} | F_n)), 0 ≤ n ≤ N − 1.

We say that (X_n) is the Snell envelope of (Y_n).
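The definition is a backward recursion, so the Snell envelope is directly computable in models where the conditional expectation is explicit. A sketch in Python for Y_n = f(S_n) with (S_n) a symmetric ±1 random walk; the payoff f below is an arbitrary choice of ours:

```python
# Backward induction for the Snell envelope of Y_n = f(S_n). Since the reward
# depends on the walk only through its current value, E(X_{n+1} | F_n) is the
# average of the two possible moves.
N = 10
f = lambda s: max(4 - abs(s), 0)     # an arbitrary payoff

# X[n] maps each reachable walk value s (s = -n, -n+2, ..., n) to X_n(s)
X = {N: {s: f(s) for s in range(-N, N + 1, 2)}}
for n in range(N - 1, -1, -1):
    X[n] = {s: max(f(s), 0.5 * (X[n + 1][s + 1] + X[n + 1][s - 1]))
            for s in range(-n, n + 1, 2)}
print(X[0][0])   # value of the optimal stopping problem, sup E(Y_tau)
```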

Proposition 2.5.1 The sequence (X_n) is the smallest supermartingale that
dominates the sequence (Y_n).
2.5. THE SNELL ENVELOPE AND OPTIMAL STOPPING 23

Proof. (Xn ) is adapted and by construction

E(Xn+1 |Fn ) ≤ Xn .

Let (T_n) be another supermartingale that dominates (Y_n); then T_N ≥ Y_N = X_N.
Assume, by backward induction, that T_{n+1} ≥ X_{n+1}. Then, by the monotonicity
of the conditional expectation and since (T_n) is a supermartingale,

T_n ≥ E(T_{n+1} | F_n) ≥ E(X_{n+1} | F_n);

moreover (T_n) dominates (Y_n), so

T_n ≥ max(Y_n, E(X_{n+1} | F_n)) = X_n.

Remark 2.5.1 For fixed ω, if X_n(ω) is strictly greater than Y_n(ω), then
X_n(ω) = E(X_{n+1} | F_n)(ω), so up to this n the process X behaves as a martin-
gale. This indicates that if we "stop" X_n properly we can obtain a martingale.

Proposition 2.5.2 The random variable

ν = inf{n ≥ 0, Xn = Yn }

is a stopping time and (Xnν ) is a martingale.

Proof.

{ν = n} = {X0 > Y0 } ∩ ... ∩ {Xn−1 > Yn−1 } ∩ {Xn = Yn } ∈ Fn .

And

X_n^ν = X_0 + ∑_{j=1}^n 1_{{j≤ν}} (X_j − X_{j−1}),

therefore

X_{n+1}^ν − X_n^ν = 1_{{n+1≤ν}} (X_{n+1} − X_n)

and

E(X_{n+1}^ν − X_n^ν | F_n) = 1_{{n+1≤ν}} E(X_{n+1} − X_n | F_n)
= 0 if ν ≤ n, since the indicator vanishes,
= 0 if ν > n, since in such a case X_n = E(X_{n+1} | F_n).

We denote by τ_{n,N} the set of stopping times with values in {n, n + 1, ..., N}.

Corollary 2.5.1

X_0 = E(Y_ν | F_0) = sup_{τ∈τ_{0,N}} E(Y_τ | F_0).

Proof. (X_n^ν) is a martingale and consequently

X_0 = E(X_N^ν | F_0) = E(X_{N∧ν} | F_0) = E(X_ν | F_0) = E(Y_ν | F_0).

On the other hand (X_n) is a supermartingale, and then so is (X_n^τ) for all
τ ∈ τ_{0,N}, so

X_0 ≥ E(X_N^τ | F_0) = E(X_τ | F_0) ≥ E(Y_τ | F_0),

therefore

E(Y_ν | F_0) ≥ E(Y_τ | F_0), ∀τ ∈ τ_{0,N}.

Remark 2.5.2 Analogously we could prove that

X_n = E(Y_{ν_n} | F_n) = sup_{τ∈τ_{n,N}} E(Y_τ | F_n),

where

ν_n = inf{j ≥ n, X_j = Y_j}.

Definition 2.5.1 A stopping time ν is said to be optimal for the sequence (Yn )
if
E(Yν |F0 ) = sup E(Yτ |F0 ).
τ ∈τ0,N

Remark 2.5.3 The stopping time ν = inf{n, X_n = Y_n} (where X is the Snell
envelope of Y) is then an optimal stopping time for Y. We shall see that it is
the smallest optimal stopping time.

The following theorem characterizes the optimal stopping times.

Theorem 2.5.1 τ is an optimal stopping time if and only if

(i) X_τ = Y_τ;
(ii) (X_n^τ) is a martingale.

Proof. If (X_n^τ) is a martingale and X_τ = Y_τ, then

X_0 = E(X_N^τ | F_0) = E(X_{N∧τ} | F_0) = E(X_τ | F_0) = E(Y_τ | F_0).

On the other hand, for any stopping time π, (X_n^π) is a supermartingale, so

X_0 ≥ E(X_N^π | F_0) = E(X_π | F_0) ≥ E(Y_π | F_0).

Conversely, we know, by the previous corollary, that X_0 = sup_{τ∈τ_{0,N}} E(Y_τ | F_0).
Then, if τ is optimal

X0 = E(Yτ |F0 ) ≤ E(Xτ |F0 ) ≤ X0 ,



where the last inequality is due to the fact that (Xnτ ) is a supermartingale. So,
we have
E(Xτ − Yτ |F0 ) = 0
and since Xτ − Yτ ≥ 0, we conclude that Xτ = Yτ .
Now we can also see that (X_n^τ) is a martingale. We know that it is a super-
martingale, so

X_0 ≥ E(X_n^τ | F_0) ≥ E(X_N^τ | F_0) = E(X_τ | F_0) = X_0,

as we saw before. Then, for all n,

E(X_n^τ − E(X_τ | F_n) | F_0) = 0,

and since (X_n^τ) is a supermartingale,

X_n^τ ≥ E(X_N^τ | F_n) = E(X_τ | F_n),

therefore X_n^τ = E(X_τ | F_n).

2.6 Decomposition of supermartingales


Proposition 2.6.1 Any supermartingale (Xn ) has a unique decomposition:

Xn = Mn − An

where (Mn ) is a martingale and (An ) is non-decreasing predictable with A0 = 0.

Proof. It is enough to write

X_n = ∑_{j=1}^n (X_j − E(X_j | F_{j−1})) − ∑_{j=1}^n (X_{j−1} − E(X_j | F_{j−1})) + X_0

and to identify

M_n = ∑_{j=1}^n (X_j − E(X_j | F_{j−1})) + X_0,
A_n = ∑_{j=1}^n (X_{j−1} − E(X_j | F_{j−1})),

where we define M0 = X0 and A0 = 0. So (Mn ) is a martingale:

Mn − Mn−1 = Xn − E(Xn |Fn−1 ), 1≤n≤N

in such a way that

E(Mn − Mn−1 |Fn−1 ) = 0, 1 ≤ n ≤ N.



Finally since (Xn ) is supermartingale

An − An−1 = Xn−1 − E(Xn |Fn−1 ) ≥ 0, 1 ≤ n ≤ N.

Now we can see the uniqueness. If

M_n − A_n = M'_n − A'_n, 0 ≤ n ≤ N,

we have

M_n − M'_n = A_n − A'_n, 0 ≤ n ≤ N,

but then, since (M_n) and (M'_n) are martingales and (A_n) and (A'_n) are
predictable, it turns out that

A_{n−1} − A'_{n−1} = M_{n−1} − M'_{n−1} = E(M_n − M'_n | F_{n−1})
= E(A_n − A'_n | F_{n−1}) = A_n − A'_n, 1 ≤ n ≤ N,

that is,

A_N − A'_N = A_{N−1} − A'_{N−1} = ... = A_0 − A'_0 = 0,

since by hypothesis A_0 = A'_0 = 0.
This decomposition is known as the Doob decomposition.
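For a concrete case where the decomposition is explicit, take X_n = ξ_1 + ... + ξ_n with E(ξ) = µ < 0, so that X_{n−1} − E(X_n | F_{n−1}) = −µ is constant. A sketch in Python with NumPy:

```python
import numpy as np

# Doob decomposition of the supermartingale X_n = xi_1 + ... + xi_n with
# E(xi) = mu < 0: here A_n = -n*mu is deterministic (hence predictable and
# non-decreasing) and M_n = X_n + A_n is the martingale part.
rng = np.random.default_rng(9)
mu, n = -0.3, 20
xi = rng.normal(mu, 1.0, n)
X = np.concatenate([[0.0], np.cumsum(xi)])   # X_0 = 0
A = -mu * np.arange(n + 1)                   # A_0 = 0, increments -mu > 0
M = X + A                                    # martingale part
print(np.allclose(X, M - A))                 # X_n = M_n - A_n
```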

Proposition 2.6.2 The largest optimal stopping time for (Y_n) is given by

ν_max = N if A_N = 0, and ν_max = inf{n, A_{n+1} > 0} if A_N > 0,

where (X_n), the Snell envelope of (Y_n), has the Doob decomposition X_n = M_n − A_n.

Proof. {ν_max = n} = {A_1 = 0, A_2 = 0, ..., A_n = 0, A_{n+1} > 0} ∈ F_n for
0 ≤ n ≤ N − 1, and {ν_max = N} = {A_N = 0} ∈ F_{N−1}, since (A_n) is predictable.
So it is a stopping time. Moreover,

X_n^{ν_max} = X_{n∧ν_max} = M_{n∧ν_max} − A_{n∧ν_max} = M_{n∧ν_max},

since A_{n∧ν_max} = 0. Therefore (X_n^{ν_max}) is a martingale. So, to see that this
stopping time is optimal we have to prove that X_{ν_max} = Y_{ν_max}. Now

X_{ν_max} = ∑_{j=0}^{N−1} 1_{{ν_max=j}} X_j + 1_{{ν_max=N}} X_N
= ∑_{j=0}^{N−1} 1_{{ν_max=j}} max(Y_j, E(X_{j+1} | F_j)) + 1_{{ν_max=N}} Y_N,

but on {ν_max = j}, A_j = 0 and A_{j+1} > 0, so

E(X_{j+1} | F_j) = E(M_{j+1} | F_j) − A_{j+1} < E(M_{j+1} | F_j) = M_j = X_j,


therefore X_j = Y_j on {ν_max = j} and consequently X_{ν_max} = Y_{ν_max}. Finally we
see that it is the largest optimal stopping time. Let τ ≥ ν_max with
P{τ > ν_max} > 0. Then

E(X_τ) = E(M_τ) − E(A_τ) = E(M_0) − E(A_τ) = X_0 − E(A_τ) < X_0,

so (X_{τ∧n}) cannot be a martingale.

2.7 Martingale convergence theorems


Let {M_n}_{n≥0} be a supermartingale and a < b ∈ R, and let U_N[a, b](ω) be the
number of upcrossings of [a, b] made by (M_n(ω)) by time N. Define a predictable
strategy by H_1 = 1_{{M_0<a}} and, for n ≥ 2,
H_n = 1_{{M_{n−1}<a, H_{n−1}=0}} + 1_{{M_{n−1}≤b, H_{n−1}=1}}. Then

(H · M)_n := ∑_{j=1}^n H_j (M_j − M_{j−1})

is a supermartingale; it represents the gain by time n in a game where ∆M_n is
the profit at time n when the bet is 1 euro. It is easy to see (by a simple picture)
that

(H · M)_n ≥ (b − a)U_n[a, b] − (M_n − a)^−, n ≥ 1,

and then we have:

Lemma 2.7.1 Let {M_n}_{n≥0} be a supermartingale and let U_N[a, b](ω) be the
number of upcrossings of [a, b] by time N. Then

(b − a) E(U_N[a, b]) ≤ E((M_N − a)^−).

Proof. The process (H · M)_n above is a supermartingale, so E((H · M)_N) ≤
E((H · M)_0) = 0, and the result follows by taking expectations in the previous
inequality.

Corollary 2.7.1 Let {M_n}_{n≥0} be a supermartingale such that sup_n E(|M_n|) <
∞. Let a < b ∈ R and U_∞[a, b] := lim_{n→∞} U_n[a, b]; then

(b − a) E(U_∞[a, b]) ≤ |a| + sup_n E(|M_n|) < ∞,

so that

P(U_∞[a, b] = ∞) = 0.

Proof. By the previous lemma

(b − a)E(UN [a, b]) ≤ |a| + E(|MN |)

and taking the limit when N → ∞ and using the monotone convergence theorem
we obtain the result.

Theorem 2.7.1 Let {Mn }n≥0 be a supermartingale such that supn E(|Mn |) <
∞. Then almost surely M∞ := limn→∞ Mn exists and is finite.

Proof. Let Λ = {ω : M_n(ω) does not converge to a limit in [−∞, +∞]};
then

Λ = ∪_{a<b∈Q} {ω : lim inf M_n(ω) < a < b < lim sup M_n(ω)} = ∪_{a<b∈Q} Λ_{ab} (say),

but

Λ_{ab} ⊂ {U_∞[a, b] = ∞},

and by the previous corollary P(Λ_{ab}) = 0 and then P(Λ) = 0. So M_∞ exists in
[−∞, +∞] a.s., and by Fatou's lemma

E(|M_∞|) = E(lim inf |M_n|) ≤ lim inf E(|M_n|) ≤ sup_n E(|M_n|) < ∞,

so M_∞ is finite a.s.

Corollary 2.7.2 If {M_n}_{n≥0} is a nonnegative supermartingale then lim_{n→∞} M_n
exists and is finite almost surely.

Proof. E(|M_n|) = E(M_n) ≤ E(M_0).

Remark 2.7.1 It is important to note that it need not be true that M_n → M_∞
in L¹.

Example 2.7.1 Suppose that {ξ_n, n ≥ 1} are i.i.d. centered random variables
with distribution N(0, 2σ²). Set M_0 = 1 and

M_n = exp( ∑_{j=1}^n ξ_j − nσ² ).

Then {M_n} is a nonnegative martingale such that M_n → 0 a.s., by the strong
law of large numbers, but E(M_n) = 1 for all n.

Theorem 2.7.2 Let {Mn }n≥0 be a martingale, such that supn E(Mn2 ) < ∞,
then Mn → M∞ a.s. and in L2 .

Proof. Obviously sup_n E(|M_n|) < ∞, so we have almost sure convergence to
M_∞. On the other hand, by the orthogonality of the increments,

E(M_n²) = E(M_0²) + ∑_{k=1}^n E((M_k − M_{k−1})²),

so

∑_{k=1}^∞ E((M_k − M_{k−1})²) < ∞,

and applying Fatou's lemma in

E((M_{n+r} − M_n)²) = ∑_{k=n+1}^{n+r} E((M_k − M_{k−1})²),

when r → ∞, we have

E((M_∞ − M_n)²) ≤ ∑_{k=n+1}^∞ E((M_k − M_{k−1})²),

so

lim_{n→∞} E((M_∞ − M_n)²) = 0.
Chapter 3

Markov Processes

3.1 Discrete-time Markov chains

Definition 3.1.1 We say that (X_n)_{n≥0} is a Markov chain with initial distri-
bution λ := (λ_i, i ∈ I) and transition matrix P := (p_{ij}, i, j ∈ I) if

(i) P{X_0 = i} = λ_i, i ∈ I;
(ii) for all n ≥ 0, P(X_{n+1} = i_{n+1} | X_0 = i_0, X_1 = i_1, ..., X_n = i_n) = p_{i_n i_{n+1}},

where I is a countable set. We say that (X_n)_{n≥0} is Markov(λ, P) for short.

We regard λ as a row vector whose components are indexed by I and P as a
matrix whose entries are indexed by I × I. We write p_{ij}^{(n)} = (P^n)_{ij} for the (i, j)
entry of P^n. We agree that P^0 is the identity matrix, that is, (P^0)_{ij} = δ_{ij}.

Proposition 3.1.1 Let (X_n)_{n≥0} be Markov(λ, P). Then for all n, m ≥ 0

P(X_{n+m} = j | X_0 = i_0, ..., X_n = i) = P(X_{n+m} = j | X_n = i)
= P(X_m = j | X_0 = i).


Proof.

P(X_{n+m} = j | X_n = i) = P(X_{n+m} = j, X_n = i) / P(X_n = i)
= [ ∑_{i_0,...,i_{n−1}∈I} ∑_{i_{n+1},...,i_{n+m−1}∈I} P(X_0 = i_0, ..., X_n = i) p_{i i_{n+1}} · · · p_{i_{n+m−1} j} ]
  / [ ∑_{i_0,...,i_{n−1}∈I} P(X_0 = i_0, ..., X_n = i) ]
= ∑_{i_{n+1},...,i_{n+m−1}∈I} p_{i i_{n+1}} · · · p_{i_{n+m−1} j}
= [ ∑_{i_{n+1},...,i_{n+m−1}∈I} P(X_0 = i, X_1 = i_{n+1}, ..., X_{m−1} = i_{n+m−1}, X_m = j) ] / P(X_0 = i)
= P(X_0 = i, X_m = j) / P(X_0 = i) = P(X_m = j | X_0 = i).

Remark 3.1.1 Note also that

P(X_{n+1} = j_1, X_{n+2} = j_2, ..., X_{n+m} = j_m | X_0 = i_0, ..., X_n = i)
= P(X_1 = j_1, X_2 = j_2, ..., X_m = j_m | X_0 = i).

Proposition 3.1.2 Let (X_n)_{n≥0} be Markov(λ, P). Then, for all n, m ≥ 0,

(i) P{X_n = j} = (λP^n)_j, j ∈ I;
(ii) P_i(X_n = j) := P(X_n = j | X_0 = i) = P(X_{n+m} = j | X_m = i) = p_{ij}^{(n)}.

Proof. (i):

P{X_n = j} = ∑_{i_0∈I} ∑_{i_1∈I} ... ∑_{i_{n−1}∈I} P(X_0 = i_0, ..., X_{n−1} = i_{n−1}, X_n = j)
= ∑_{i_0∈I} ... ∑_{i_{n−1}∈I} P(X_n = j | X_{n−1} = i_{n−1}, ..., X_0 = i_0) P(X_{n−1} = i_{n−1}, ..., X_0 = i_0)
= ∑_{i_0∈I} ... ∑_{i_{n−1}∈I} p_{i_{n−1} j} P(X_{n−1} = i_{n−1}, ..., X_0 = i_0)
= ∑_{i_0∈I} ... ∑_{i_{n−1}∈I} p_{i_{n−2} i_{n−1}} p_{i_{n−1} j} P(X_{n−2} = i_{n−2}, ..., X_0 = i_0)
= ... = ∑_{i_0∈I} ... ∑_{i_{n−1}∈I} λ_{i_0} p_{i_0 i_1} · · · p_{i_{n−2} i_{n−1}} p_{i_{n−1} j} = (λP^n)_j.

(ii): P(X_n = j | X_0 = i) = ∑_{i_1∈I} ... ∑_{i_{n−1}∈I} p_{i i_1} · · · p_{i_{n−1} j} = (P^n)_{ij} = p_{ij}^{(n)}.

Example 3.1.1 The most general two-state chain has transition matrix

P = ( 1−α   α  )
    (  β   1−β ).

If we want to calculate p_{11}^{(n)} we can use the relation P^{n+1} = P^n P; then

p_{11}^{(n+1)} = p_{11}^{(n)}(1 − α) + p_{12}^{(n)} β.

We know that

p_{11}^{(n)} + p_{12}^{(n)} = P_1(X_n = 1 or 2) = 1,

so

p_{11}^{(n+1)} = p_{11}^{(n)}(1 − α − β) + β, n ≥ 0,

with p_{11}^{(0)} = 1. This has a unique solution:

p_{11}^{(n)} = β/(α+β) + (α/(α+β))(1 − α − β)^n for α + β > 0,
p_{11}^{(n)} = 1 for α + β = 0.
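A quick numerical check of this closed form against the matrix power — a Python sketch assuming NumPy; the values of α, β and n are arbitrary:

```python
import numpy as np

# Compare (P^n)_{11} with beta/(alpha+beta) + alpha/(alpha+beta)*(1-alpha-beta)^n.
alpha, beta, n = 0.3, 0.5, 7
P = np.array([[1 - alpha, alpha], [beta, 1 - beta]])
print(np.linalg.matrix_power(P, n)[0, 0])
print(beta / (alpha + beta) + alpha / (alpha + beta) * (1 - alpha - beta) ** n)
```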

3.1.1 Class structure


It is sometimes possible to split a Markov chain into smaller pieces, each of
which is relatively easy to analyze and which together give an understanding of
the whole. This is done by identifying the communicating classes.
We say that i leads to j and write i → j if

Pi (Xn = j for some n ≥ 0) > 0.

We say that i communicates with j and write i ←→ j if both i → j and j → i.

Proposition 3.1.3 The following are equivalent:

(i) i → j;
(ii) p_{i_0 i_1} · · · p_{i_{n−1} i_n} > 0 for some states i_0, i_1, ..., i_n, with i_0 = i and i_n = j;
(iii) p_{ij}^{(n)} > 0 for some n ≥ 0.

Proof. (i)⇔(iii):

p_{ij}^{(n)} ≤ P_i(X_n = j for some n ≥ 0) ≤ ∑_{n=0}^∞ p_{ij}^{(n)}.

(ii)⇔(iii):

p_{ij}^{(n)} = ∑_{i_1,i_2,...,i_{n−1}} p_{i i_1} · · · p_{i_{n−1} j}.

It is clear that i ←→ j is an equivalence relation. So, it partitions I (the


state space) into communicating classes. We say that a class C is closed if

i ∈ C, i → j imply j ∈ C.

We cannot escape from a closed class. A state i is absorbing if {i} is a closed


class. A chain or matrix P is called irreducible if I is a single class.

Example 3.1.2 Consider the stochastic matrix

P = ( 1/2  1/2   0    0    0    0 )
    (  0    0    1    0    0    0 )
    ( 1/3   0    0   1/3  1/3   0 )
    (  0    0    0   1/2  1/2   0 )
    (  0    0    0    0    0    1 )
    (  0    0    0    0    1    0 );

the communicating classes associated with it are {1, 2, 3}, {4} and {5, 6}, with
only {5, 6} being closed.

3.1.2 Hitting times and absorption probabilities

Let τ^A be the hitting time of a subset A ⊆ I:

τ^A = inf{n ≥ 0, X_n ∈ A}.

One purpose of this subsection is to learn how to calculate the probability,
starting from i, that (X_n)_{n≥0} ever hits A, that is,

h_i^A := P_i(τ^A < ∞).

When A is a closed class this probability is named the absorption probability.
The other purpose is to calculate the mean time taken for (X_n)_{n≥0} to reach A:

k_i^A := E_i(τ^A) = ∑_{1≤n<∞} n P_i(τ^A = n) + ∞ · P_i(τ^A = ∞).
1≤n<∞

Proposition 3.1.4

E_i(τ^A) = ∑_{1≤n≤∞} P_i(τ^A ≥ n).

Proof. Since

P_i(τ^A ≥ n) = ∑_{n≤m≤∞} P_i(τ^A = m),

we get

∑_{1≤n≤∞} P_i(τ^A ≥ n) = ∑_{1≤n≤∞} ∑_{n≤m≤∞} P_i(τ^A = m)
= ∑_{1≤m≤∞} ∑_{1≤n≤m} P_i(τ^A = m)
= ∑_{1≤m≤∞} m P_i(τ^A = m).

Example 3.1.3 Consider the following transition matrix:

P = (  1    0    0    0 )
    ( 1/2   0   1/2   0 )
    (  0   1/2   0   1/2 )
    (  0    0    0    1 ).

Introduce h_i := h_i^{{4}} and k_i := k_i^{{1,4}}. Then it is clear that h_1 = 0, h_4 = 1
and k_1 = k_4 = 0. Also we have

h_2 = (1/2)h_1 + (1/2)h_3,   k_2 = 1 + (1/2)k_1 + (1/2)k_3,

and

h_3 = (1/2)h_2 + (1/2)h_4,   k_3 = 1 + (1/2)k_2 + (1/2)k_4.

Therefore

h_2 = (1/2)h_3,   k_2 = 1 + (1/2)k_3,
h_3 = (1/2)h_2 + 1/2,   k_3 = 1 + (1/2)k_2,

and then

h_2 = 1/3, h_3 = 2/3,
k_2 = k_3 = 2.
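These linear systems can also be solved numerically. A sketch in Python with NumPy for this example, fixing the boundary values h_1 = 0 (state 1 is absorbing, so 4 is never reached from it) and h_4 = 1, and solving for the interior states:

```python
import numpy as np

# h restricted to the interior states {2, 3} solves (I - Q) h = b, where Q is
# the interior block of P and b_i is the one-step probability of hitting 4;
# the mean hitting times of {1, 4} solve (I - Q) k = 1 the same way.
P = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 0.0, 1.0]])
interior = [1, 2]                         # states 2 and 3 (0-based indices)
Q = P[np.ix_(interior, interior)]         # moves among interior states
I = np.eye(len(interior))
b_h = P[np.ix_(interior, [3])].ravel()    # one-step probability of hitting 4
h = np.linalg.solve(I - Q, b_h)
k = np.linalg.solve(I - Q, np.ones(2))
print(h)   # [1/3, 2/3]
print(k)   # [2, 2]
```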

Proposition 3.1.5 The vector of hitting probabilities h^A = (h_i^A : i ∈ I) is the
minimal non-negative solution to the system of linear equations

h_i^A = 1 for i ∈ A,
h_i^A = ∑_{j∈I} p_{ij} h_j^A for i ∉ A,    (3.1)

where minimality means that if x = (x_i : i ∈ I) is another non-negative solution,
then x_i ≥ h_i^A for all i.

Proof. First we prove that h^A satisfies (3.1). If X_0 = i ∈ A then τ^A = 0,
so h_i^A = 1. If X_0 = i ∉ A, then τ^A ≥ 1 and by the Markov property

P_i(τ^A < ∞ | X_1 = j) = P_j(τ^A < ∞) = h_j^A.

Then

h_i^A = ∑_{j∈I} P_i(τ^A < ∞, X_1 = j)
= ∑_{j∈I} P_i(τ^A < ∞ | X_1 = j) P_i(X_1 = j)
= ∑_{j∈I} P_j(τ^A < ∞) P_i(X_1 = j) = ∑_{j∈I} p_{ij} h_j^A.

Suppose now that x = (x_i : i ∈ I) is another non-negative solution to (3.1).
Then h_i^A = x_i = 1 for i ∈ A; consequently, if i ∉ A,

x_i = ∑_{j∈I} p_{ij} x_j = ∑_{j∈A} p_{ij} + ∑_{j∉A} p_{ij} x_j.

Now if we substitute for x_j we obtain

x_i = ∑_{j∈A} p_{ij} + ∑_{j∉A} p_{ij} ( ∑_{k∈A} p_{jk} + ∑_{k∉A} p_{jk} x_k )
= P_i(X_1 ∈ A) + P_i(X_1 ∉ A, X_2 ∈ A) + ∑_{j∉A} ∑_{k∉A} p_{ij} p_{jk} x_k.

By repeating the substitution, after n steps we have

x_i = P_i(X_1 ∈ A) + ... + P_i(X_1 ∉ A, ..., X_{n−1} ∉ A, X_n ∈ A)
+ ∑_{j_1∉A} ... ∑_{j_n∉A} p_{i j_1} p_{j_1 j_2} · · · p_{j_{n−1} j_n} x_{j_n}.

So, since the last term is non-negative, for all n ≥ 1,

x_i ≥ P_i(X_1 ∈ A) + ... + P_i(X_1 ∉ A, ..., X_{n−1} ∉ A, X_n ∈ A) = P_i(τ^A ≤ n).

Finally,

x_i ≥ lim_{n→∞} P_i(τ^A ≤ n) = P_i(τ^A < ∞) = h_i^A.

Proposition 3.1.6 The vector of mean hitting times k^A = (k_i^A : i ∈ I) is the
minimal non-negative solution to the system of linear equations

k_i^A = 0 for i ∈ A,
k_i^A = 1 + ∑_{j∉A} p_{ij} k_j^A for i ∉ A.    (3.2)

Proof. First we prove that k^A satisfies the system (3.2). If i ∈ A then
τ^A = 0 and k_i^A = 0. If X_0 = i ∉ A, then τ^A ≥ 1 and by the Markov property

E_i(τ^A | X_1 = j) = 1 + E_j(τ^A).

Then

E_i(τ^A) = ∑_{j∈I} E_i(τ^A 1_{{X_1=j}}) = ∑_{j∈I} E_i(τ^A | X_1 = j) P_i(X_1 = j)
= 1 + ∑_{j∈I} E_j(τ^A) p_{ij} = 1 + ∑_{j∉A} p_{ij} k_j^A.

Now, assume that y = (y_i, i ∈ I) is a non-negative solution of the system (3.2).
Then

y_i = 1 + ∑_{j∉A} p_{ij} y_j = 1 + ∑_{j∉A} p_{ij} (1 + ∑_{k∉A} p_{jk} y_k)
= 1 + ∑_{j∉A} p_{ij} + ∑_{j∉A} ∑_{k∉A} p_{ij} p_{jk} y_k
= P_i(τ^A ≥ 1) + P_i(τ^A ≥ 2) + ∑_{j∉A} ∑_{k∉A} p_{ij} p_{jk} y_k
= ... = P_i(τ^A ≥ 1) + ... + P_i(τ^A ≥ n) + ∑_{j_1∉A} ... ∑_{j_n∉A} p_{i j_1} · · · p_{j_{n−1} j_n} y_{j_n},

and therefore, by Proposition 3.1.4,

y_i ≥ ∑_{1≤n≤∞} P_i(τ^A ≥ n) = k_i^A.

(Gambler’s ruin) Consider the transition probabilities

p00 = 1
pi,i−1 = q > 0, pi,i+1 = p = 1 − q > 0, for i = 1, 2, ....

Imagine you enter in the casino with 1 euro and you can win 1 euro with
probability por to lose it with probability q. The resources of the casino are
regarded as infinite, so there is no upper bound to your fortune. But which is
the probability you finish broke?
Set hi = Pi (τ {0} < ∞), then hi is the minimal solution of the system

h0 = 1,
hi = qhi−1 + phi+1 , i = 1, 2, ...

If p ≠ q this recurrence equation has the general solution

h_i = A + B (q/p)^i.

If p < q (the usual situation in a casino) the restriction 0 ≤ h_i ≤ 1 forces B to
be zero, so h_i = 1 for all i. If p > q, since h_0 = 1, we have a family of solutions

h_i = (q/p)^i + A(1 − (q/p)^i),

and since h_i ≥ 0 we must have A ≥ 0, so the minimal solution is h_i = (q/p)^i.
Finally, if p = q the recurrence relation has the general solution

h_i = A + Bi,

and again the restriction 0 ≤ hi ≤ 1 forces B to be zero, so hi = 1 for all i.


Thus, even if you find a fair casino and you have a lot of money, you are certain
to end up broke.
Proposition 3.1.7 (Strong Markov property) Let (X_n)_{n≥0} be Markov(λ, P)
and let τ be a finite stopping time associated with (X_n)_{n≥0}. Then for all A ∈ F_τ
and all m ≥ 0,

P(X_{τ+m} = j | A, X_τ = i) = P(X_m = j | X_0 = i).

That is, (X_{τ+n})_{n≥0} is Markov(µ, P) where µ_i = P(X_τ = i).
Proof.

P(X_{τ+m} = j | A, X_τ = i) = P(X_{τ+m} = j, A, X_τ = i) / P(A, X_τ = i)
= ∑_{n<∞} P(X_{τ+m} = j, A, X_τ = i, τ = n) / ∑_{n<∞} P(A, X_τ = i, τ = n)
= ∑_{n<∞} P(X_{n+m} = j, A, X_n = i, τ = n) / ∑_{n<∞} P(A, X_n = i, τ = n)
= ∑_{n<∞} P(X_{n+m} = j | X_n = i) P(A, X_n = i, τ = n) / ∑_{n<∞} P(A, X_n = i, τ = n)
= ∑_{n<∞} P(X_m = j | X_0 = i) P(A, X_n = i, τ = n) / ∑_{n<∞} P(A, X_n = i, τ = n)
= P(X_m = j | X_0 = i),

where we used that {A, X_n = i, τ = n} ∈ F_n, since A ∈ F_τ.

Remark 3.1.2 Note also that if τ is a finite stopping time associated with
(X_n)_{n≥0} and A ∈ F_τ, then for all m ≥ 0,

P(X_{τ+1} = j_1, X_{τ+2} = j_2, ..., X_{τ+m} = j_m | A, X_τ = i)
= P(X_1 = j_1, X_2 = j_2, ..., X_m = j_m | X_0 = i).

In particular, assume that

τ_0 = inf{n ≥ 0, X_n ∈ J}, J ⊂ I,

and for m = 0, 1, 2, ...

τ_{m+1} = inf{n > τ_m, X_n ∈ J};

then

P(X_{τ_{m+1}} = j | X_{τ_0} = i_0, X_{τ_1} = i_1, ..., X_{τ_m} = i) = P(X_{τ_1} = j | X_{τ_0} = i),

where for i, j ∈ J

P(X_{τ_1} = j | X_{τ_0} = i) = P_i(X_n = j for some n ≥ 1).

3.1.3 Recurrence and transience

Let (X_n)_{n≥0} be a Markov chain with transition matrix P. We say that a state
i is recurrent if

P_i(X_n = i for infinitely many n) = 1.

We say that i is transient if

P_i(X_n = i for infinitely many n) = 0.

Recall that the first passage time to state i is the random variable τ^{(i)} defined
as

τ^{(i)} = inf{n ≥ 1, X_n = i};

then we can define inductively the r-th passage time τ_r^{(i)} to state i, for
r = 1, 2, ..., by

τ_1^{(i)} = τ^{(i)}, τ_{r+1}^{(i)} = inf{n ≥ τ_r^{(i)} + 1, X_n = i}.

The length of the r-th excursion, for r ≥ 2, is given by

δ_r^{(i)} := τ_r^{(i)} − τ_{r−1}^{(i)} if τ_{r−1}^{(i)} < ∞, and δ_r^{(i)} := 0 if τ_{r−1}^{(i)} = ∞.
Lemma 3.1.1 For r ≥ 2, conditional on τ_{r−1}^{(i)} < ∞, and for all A ∈ F_{τ_{r−1}^{(i)}}
we have

P(δ_r^{(i)} = n | τ_{r−1}^{(i)} < ∞, A) = P_i(τ^{(i)} = n).

Proof. By the strong Markov property, conditional on {τ_{r−1}^{(i)} < ∞}, the
process (X_{τ_{r−1}^{(i)}+n})_{n≥0} is a Markov chain with transition matrix P and
initial distribution λ_j = δ_{ij}, j ∈ I. Then δ_r^{(i)} is the first passage time of
(X_{τ_{r−1}^{(i)}+n})_{n≥0} to state i, so

P(δ_r^{(i)} = n | τ_{r−1}^{(i)} < ∞, A) = P(τ^{(i)} = n | X_0 = i).

Let us introduce the number of visits V_i to i:

V_i = ∑_{n=0}^∞ 1_{{X_n=i}};

note that

E_i(V_i) = ∑_{n=0}^∞ E_i(1_{{X_n=i}}) = ∑_{n=0}^∞ P_i(X_n = i) = ∑_{n=0}^∞ p_{ii}^{(n)}.

Also we can compute the distribution of V_i under P_i in terms of the return
probability

f_i = P_i(τ^{(i)} < ∞).

Lemma 3.1.2 For r ≥ 0 we have P_i(V_i > r) = f_i^r (we set 0^0 = 1).

Proof. P_i(V_i > 0) = 1 since X_0 = i, so the result is true for r = 0. Since
{X_0 = i, V_i > r} = {X_0 = i, τ_r^{(i)} < ∞}, assuming the result is true for r we get

P_i(V_i > r + 1) = P_i(τ_{r+1}^{(i)} < ∞)
= P_i(τ_r^{(i)} < ∞, δ_{r+1}^{(i)} < ∞)
= P_i(δ_{r+1}^{(i)} < ∞ | τ_r^{(i)} < ∞) P_i(τ_r^{(i)} < ∞)
= P_i(τ^{(i)} < ∞) f_i^r = f_i^{r+1}.

Theorem 3.1.1 The following dichotomy holds:

(i) if P_i(τ^{(i)} < ∞) = 1 then i is recurrent and ∑_{n=0}^∞ p_{ii}^{(n)} = ∞;
(ii) if P_i(τ^{(i)} < ∞) < 1 then i is transient and ∑_{n=0}^∞ p_{ii}^{(n)} < ∞.

Proof. We have

P_i(V_i = ∞) = lim_{n→∞} P_i(V_i > n) = lim_{n→∞} f_i^n.

Then, if f_i := P_i(τ^{(i)} < ∞) = 1, we get P_i(V_i = ∞) = 1, so i is recurrent and

∑_{n=0}^∞ p_{ii}^{(n)} = E_i(V_i) = ∞.

On the other hand, if f_i < 1,

∑_{n=0}^∞ p_{ii}^{(n)} = E_i(V_i) = ∑_{n=0}^∞ P_i(V_i > n) = ∑_{n=0}^∞ f_i^n = 1/(1 − f_i) < ∞,

so P_i(V_i = ∞) = 0 and i is transient.


Recurrence and transience are class properties.

Theorem 3.1.2 Let C be a communicating class. Then either all states in C
are transient or all are recurrent.

Proof. Take a pair of states i, j ∈ C and assume that i is transient. There
exist n, m ≥ 0 such that p_{ij}^{(n)} > 0 and p_{ji}^{(m)} > 0, and for all r ≥ 0,

p_{ii}^{(n+m+r)} ≥ p_{ij}^{(n)} p_{jj}^{(r)} p_{ji}^{(m)};

then

∑_{r=0}^∞ p_{jj}^{(r)} ≤ (1 / (p_{ij}^{(n)} p_{ji}^{(m)})) ∑_{r=0}^∞ p_{ii}^{(n+m+r)} < ∞,

so by the previous theorem j is also transient.

Theorem 3.1.3 Every recurrent class is closed.

Proof. Let C be a class that is not closed. Then there exist i ∈ C, j ∉ C and
m ≥ 1 such that

P_i(X_m = j) > 0.

Since we have

P_i(X_m = j, X_n = i for infinitely many n) = 0,

this implies that

P_i(X_n = i for infinitely many n) < 1,

so i is not recurrent, and so neither is C.

Theorem 3.1.4 Every finite closed class is recurrent.

Proof. Suppose that C is closed and finite and that (X_n)_{n≥0} starts in C.
Then for some i ∈ C

P(X_n = i for infinitely many n) > 0,

since the class is finite. Let τ = inf{n ≥ 0, X_n = i}; then

P(X_n = i for infinitely many n)
= P(τ < ∞) P(X_{τ+m} = i for infinitely many m | τ < ∞)
= P(τ < ∞) P_i(X_m = i for infinitely many m) > 0.

This shows that i is not transient, hence, by the dichotomy, i is recurrent,
and so is C.

3.1.4 Recurrence and transience in a random walk

Consider the transition probabilities on Z

p_{i,i−1} = q > 0, p_{i,i+1} = p = 1 − q > 0, for i ∈ Z,

that is, a simple random walk on Z. It is clear that

p_{00}^{(2n+1)} = 0,

and it is easy to see that

p_{00}^{(2n)} = (2n choose n) p^n q^n.

Bu Stirling’s formula

n! ∼ 2πnnn e−n , as n → ∞

so n
(2n) (2n)! n 1 (4pq)
p00 = 2 (pq) ∼
√ √ , as n → ∞.
(n!) π n
(2n)
If p = q = 21 , p00 ∼ √1 √1 ,
π n
and since


X 1
√ = ∞,
n=1
n

we have that

(n)
X
p00 = ∞,
n=0

so the random walk is recurrent. If, on the contrary, p 6= q, then 4pq = r < 1
(2n) rn
and p00 ∼ √1π √ n
, now since


X rn
√ < ∞,
n=1
n

we have that

(n)
X
p00 < ∞,
n=0

and the random walk is trasient.
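A numerical illustration of this dichotomy via the partial sums of p_00^{(2n)} — a Python sketch; successive terms are computed through their ratio to avoid overflowing the binomial coefficient:

```python
# Partial sums of p_00^{(2n)} = C(2n, n) (pq)^n: they grow without bound when
# p = q = 1/2 (recurrence) and converge when p != q (transience).
def partial_sum(p, N):
    q = 1.0 - p
    term, total = 1.0, 1.0          # the n = 0 term is 1
    for n in range(N):
        # C(2(n+1), n+1) / C(2n, n) = (2n+1)(2n+2) / (n+1)^2
        term *= (2 * n + 1) * (2 * n + 2) / (n + 1) ** 2 * p * q
        total += term
    return total

for N in (10**2, 10**4, 10**6):
    print(N, partial_sum(0.5, N), partial_sum(0.6, N))
```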

3.1.5 Invariant distributions


Many of the long-time properties of Markov chains are connected with the notion
of an invariant distribution or measure. We say that λ is invariant if

λP = λ.

Theorem 3.1.5 Let (X_n)_{n≥0} be Markov(λ, P) and suppose that λ is invariant.
Then (X_{m+n})_{n≥0} is also Markov(λ, P).

Proof. By the Markov property, (X_{m+n})_{n≥0} has the same transition matrix
P, and its initial distribution is P(X_m = i) = (λP^m)_i = λ_i.

Theorem 3.1.6 Let I be finite. Suppose for some i ∈ I that

p_{ij}^{(n)} → π_j as n → ∞ for all j ∈ I.

Then π = {π_j, j ∈ I} is an invariant distribution.

Proof. We have (using that I is finite to exchange limits and sums)

∑_{j∈I} π_j = ∑_{j∈I} lim_{n→∞} p_{ij}^{(n)} = lim_{n→∞} ∑_{j∈I} p_{ij}^{(n)} = 1,

and

π_j = lim_{n→∞} p_{ij}^{(n)} = lim_{n→∞} ∑_{k∈I} p_{ik}^{(n−1)} p_{kj}
= ∑_{k∈I} lim_{n→∞} p_{ik}^{(n−1)} p_{kj} = ∑_{k∈I} π_k p_{kj}.

Example 3.1.4 Consider the two-state Markov chain with transition matrix

P = ( 1−α   α  )
    (  β   1−β ).

Ignore the trivial cases α = β = 0 and α = β = 1. Then, as we saw before,

P^n → ( β/(α+β)  α/(α+β) )
      ( β/(α+β)  α/(α+β) )   as n → ∞,

so, by the previous theorem, the distribution (β/(α+β), α/(α+β)) must be
invariant. Of course, we can also solve

(λ_1, λ_2) P = (λ_1, λ_2),

and we obtain

(λ_1, λ_2) = (β/(α+β), α/(α+β)).

Note also that for the case α = β = 1 we do not have convergence of P^n.

Example 3.1.5 Consider the three-state chain with matrix
$$P = \begin{pmatrix} 0 & 1 & 0 \\ 0 & \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & 0 & \frac{1}{2} \end{pmatrix}.$$
Then, since the eigenvalues are $1, i/2, -i/2$, we deduce that
$$p_{11}^{(n)} = a + b\left(\frac{i}{2}\right)^n + c\left(-\frac{i}{2}\right)^n,$$
for some constants $a$, $b$ and $c$. Since $p_{11}^{(n)}$ is real and
$$\left(\pm\frac{i}{2}\right)^n = \left(\frac{1}{2}\right)^n e^{\pm\frac{in\pi}{2}} = \left(\frac{1}{2}\right)^n \left(\cos\frac{n\pi}{2} \pm i\sin\frac{n\pi}{2}\right),$$
we can write
$$p_{11}^{(n)} = \alpha + \left(\frac{1}{2}\right)^n \left(\beta\cos\frac{n\pi}{2} + \gamma\sin\frac{n\pi}{2}\right),$$
for constants $\alpha$, $\beta$ and $\gamma$ that can be determined from the boundary conditions
$$1 = p_{11}^{(0)} = \alpha + \beta, \qquad 0 = p_{11}^{(1)} = \alpha + \frac{1}{2}\gamma, \qquad 0 = p_{11}^{(2)} = \alpha - \frac{1}{4}\beta.$$
So $\alpha = 1/5$, $\beta = 4/5$, $\gamma = -2/5$ and
$$p_{11}^{(n)} = \frac{1}{5} + \left(\frac{1}{2}\right)^n \left(\frac{4}{5}\cos\frac{n\pi}{2} - \frac{2}{5}\sin\frac{n\pi}{2}\right).$$
Then, to find the invariant distribution, we can write down the components of the vector equation $\pi P = \pi$:
$$\pi_1 = \frac{1}{2}\pi_3, \qquad \pi_2 = \pi_1 + \frac{1}{2}\pi_2, \qquad \pi_3 = \frac{1}{2}\pi_2 + \frac{1}{2}\pi_3.$$
One equation is redundant, but we have the additional condition
$$\pi_1 + \pi_2 + \pi_3 = 1,$$
and the solution is $\pi = (1/5, 2/5, 2/5)$. Note that
$$p_{11}^{(n)} \to \frac{1}{5} \text{ as } n \to \infty,$$
which confirms the previous theorem; we could also have used this to identify $\alpha = 1/5$ instead of working out $p_{11}^{(2)}$.
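The closed form for $p_{11}^{(n)}$ is easy to check against direct matrix powers; a short verification sketch:

```python
import numpy as np

P = np.array([[0, 1, 0],
              [0, 0.5, 0.5],
              [0.5, 0, 0.5]])

for n in range(8):
    exact = np.linalg.matrix_power(P, n)[0, 0]
    formula = 1/5 + (1/2)**n * (4/5 * np.cos(n * np.pi / 2)
                                - 2/5 * np.sin(n * np.pi / 2))
    print(n, exact, formula)  # the two columns agree for every n
```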
For a fixed state $k$, consider for each $i$ the expected time spent in $i$ between visits to $k$:
$$\gamma_i^k := E_k\left(\sum_{n=0}^{\tau^{(k)}-1} 1_{\{X_n = i\}}\right).$$

Theorem 3.1.7 Let $P$ be irreducible and recurrent. Then
(i) $\gamma_k^k = 1$;
(ii) $\gamma^k := (\gamma_i^k,\ i \in I)$ satisfies $\gamma^k P = \gamma^k$;
(iii) $0 < \gamma_i^k < \infty$ for all $i \in I$.

Proof. (i) is obvious. (ii) Since $P$ is recurrent, under $P_k$ we have $\tau^{(k)} < \infty$ and $X_0 = X_{\tau^{(k)}} = k$, so
$$\gamma_j^k = E_k\left(\sum_{n=1}^{\tau^{(k)}} 1_{\{X_n = j\}}\right) = E_k\left(\sum_{n=1}^{\infty} 1_{\{X_n = j\}} 1_{\{n \leq \tau^{(k)}\}}\right)
= \sum_{n=1}^{\infty} P_k(X_n = j,\ n \leq \tau^{(k)})
= \sum_{i \in I} \sum_{n=1}^{\infty} P_k(X_{n-1} = i,\ X_n = j,\ n \leq \tau^{(k)})$$
$$= \sum_{i \in I} \sum_{n=1}^{\infty} P_k(X_n = j \mid X_{n-1} = i,\ n \leq \tau^{(k)})\, P_k(X_{n-1} = i,\ n \leq \tau^{(k)})
= \sum_{i \in I} p_{ij} \sum_{n=1}^{\infty} P_k(X_{n-1} = i,\ n \leq \tau^{(k)})$$
$$= \sum_{i \in I} p_{ij}\, E_k\left(\sum_{m=0}^{\tau^{(k)}-1} 1_{\{X_m = i\}}\right) = \sum_{i \in I} p_{ij}\, \gamma_i^k.$$
(iii) Fix $i \in I$. Since $P$ is irreducible, there exist $m, n \geq 0$ such that $p_{ki}^{(n)} > 0$ and $p_{ik}^{(m)} > 0$; then, by (i) and (ii), $\gamma_i^k \geq \gamma_k^k p_{ki}^{(n)} > 0$ and $\gamma_k^k \geq \gamma_i^k p_{ik}^{(m)}$, so $\gamma_i^k \leq 1/p_{ik}^{(m)} < \infty$.

Theorem 3.1.8 Let $P$ be irreducible and let $\lambda$ be an invariant measure for $P$ with $\lambda_k = 1$. Then $\lambda \geq \gamma^k$. If in addition $P$ is recurrent, then $\lambda = \gamma^k$.

Proof.
$$\lambda_j = \sum_{i_0 \in I} \lambda_{i_0} p_{i_0 j} = \sum_{i_0 \neq k} \lambda_{i_0} p_{i_0 j} + p_{kj}
= \sum_{i_0 \neq k} \sum_{i_1 \neq k} \lambda_{i_1} p_{i_1 i_0} p_{i_0 j} + p_{kj} + \sum_{i_0 \neq k} p_{k i_0} p_{i_0 j}$$
$$= \cdots = \sum_{i_0 \neq k} \cdots \sum_{i_n \neq k} \lambda_{i_n} p_{i_n i_{n-1}} \cdots p_{i_0 j}
+ p_{kj} + \sum_{i_0 \neq k} p_{k i_0} p_{i_0 j} + \cdots + \sum_{i_0 \neq k} \cdots \sum_{i_{n-1} \neq k} p_{k i_{n-1}} \cdots p_{i_0 j}$$
$$\geq p_{kj} + \sum_{i_0 \neq k} p_{k i_0} p_{i_0 j} + \cdots + \sum_{i_0 \neq k} \cdots \sum_{i_{n-1} \neq k} p_{k i_{n-1}} \cdots p_{i_0 j}
\to \gamma_j^k \text{ as } n \text{ goes to infinity.}$$
If $P$ is recurrent, $\gamma^k$ is invariant by the previous theorem, so $\mu := \lambda - \gamma^k$ is also invariant, and non-negative by the first part. Since $P$ is irreducible, for any fixed $i$ there is $n \geq 0$ such that $p_{ik}^{(n)} > 0$, and
$$0 = \mu_k = \sum_j \mu_j p_{jk}^{(n)} \geq \mu_i p_{ik}^{(n)},$$
so $\mu_i = 0$.

A recurrent state $i$ is called positive recurrent if the expected return time
$$m_i := E_i(\tau^{(i)})$$
is finite; otherwise it is called null recurrent.

Remark 3.1.3 Note that
$$m_i = E_i\left(\sum_{n=0}^{\tau^{(i)}-1} \sum_{j \neq i} 1_{\{X_n = j\}}\right) + 1 = \sum_{j \in I} \gamma_j^i.$$

Theorem 3.1.9 Let $P$ be irreducible. Then the following are equivalent:
(i) every state is positive recurrent;
(ii) some state $i$ is positive recurrent;
(iii) $P$ has an invariant distribution, $\pi$ say.
Moreover, when (iii) holds we have $m_i = 1/\pi_i$ for all $i$.

Proof. (i)$\Rightarrow$(ii) is evident. (ii)$\Rightarrow$(iii): If $i$ is positive recurrent then in particular it is recurrent, so $P$ is recurrent and by Theorem 3.1.7 $\gamma^i$ is an invariant measure. But
$$\sum_{j \in I} \gamma_j^i = m_i,$$
and $m_i < \infty$, so $\pi_j := \gamma_j^i / m_i$ defines an invariant distribution. (iii)$\Rightarrow$(i): Take any state $k$. Since $P$ is irreducible, there exists $n \geq 1$ such that $\pi_k = \sum_{i \in I} \pi_i p_{ik}^{(n)} > 0$. Set $\lambda_i = \pi_i / \pi_k$; then, by the previous theorem, $\lambda \geq \gamma^k$ and
$$m_k = \sum_{i \in I} \gamma_i^k \leq \sum_{i \in I} \frac{\pi_i}{\pi_k} = \frac{1}{\pi_k} < \infty,$$
so $k$ is positive recurrent. Note finally that, since $P$ is recurrent, by the previous theorem the above inequality is in fact an equality, so $m_k = 1/\pi_k$.

Theorem 3.1.10 If the state space is finite there is at least one stationary distribution.

Proof. We can assume the chain is irreducible (otherwise restrict it to a finite closed class, which always exists when $I$ is finite). Then it is recurrent by Theorem 3.1.4 and, in fact, every state is positive recurrent: since
$$m_k = \sum_{i \in I} \gamma_i^k$$
and $I$ is finite, it is sufficient to see that $\gamma_i^k < \infty$, and this follows from Theorem 3.1.7. The existence of a stationary distribution is then given by Theorem 3.1.9.

3.1.6 Limiting behaviour


We call a state $i$ aperiodic if $p_{ii}^{(n)} > 0$ for all sufficiently large $n$. It can be shown that $i$ is aperiodic if and only if the set $\{n \geq 0,\ p_{ii}^{(n)} > 0\}$ has no common divisor other than 1, and also that aperiodicity is a class property.

Theorem 3.1.11 Let $P$ be irreducible and aperiodic, and suppose that $P$ has an invariant distribution $\pi$. Then
$$p_{ij}^{(n)} \to \pi_j \text{ as } n \to \infty, \text{ for all } i, j.$$

3.1.7 Ergodic theorem


Ergodic theory concerns the limiting behaviour of averages over time. We shall
prove a theorem which identifies for Markov chains the long-run proportion of
time spent in each state.
Denote by $V_i(n)$ the number of visits to $i$ before $n$:
$$V_i(n) := \sum_{k=0}^{n-1} 1_{\{X_k = i\}}.$$

Theorem 3.1.12 Let $P$ be irreducible. Then
$$\frac{V_i(n)}{n} \to \frac{1}{m_i} \text{ a.s. as } n \to \infty.$$
Moreover, in the positive recurrent case, for any bounded function $f: I \to \mathbb{R}$ we have
$$\frac{1}{n} \sum_{k=0}^{n-1} f(X_k) \to \sum_{i \in I} \pi_i f(i) \text{ a.s. as } n \to \infty,$$
where $\pi$ is the unique invariant distribution.

Proof. Without loss of generality we assume that we start at state $i$. If $P$ is transient, then $V_i(n)$ is bounded by the total number of visits $V_i$, so
$$\frac{V_i(n)}{n} \leq \frac{V_i}{n} \to 0 = \frac{1}{m_i}.$$
Suppose that $P$ is recurrent. Let $\delta_r^{(i)}$ be the length of the $r$-th excursion to $i$. Then by the strong Markov property the $\delta_r^{(i)}$ are i.i.d. with $E(\delta_r^{(i)}) = m_i$, and
$$\frac{\delta_1^{(i)} + \cdots + \delta_{V_i(n)-1}^{(i)}}{V_i(n)} \leq \frac{n}{V_i(n)} \leq \frac{\delta_1^{(i)} + \cdots + \delta_{V_i(n)}^{(i)}}{V_i(n)},$$
and we can apply the strong law of large numbers.



For the second part, we can assume w.l.o.g. that $|f| \leq 1$. Then for any $J \subseteq I$,
$$\left|\frac{1}{n}\sum_{k=0}^{n-1} f(X_k) - \sum_{i \in I} \pi_i f(i)\right| = \left|\sum_{i \in I}\left(\frac{V_i(n)}{n} - \pi_i\right) f(i)\right|
\leq \sum_{i \in J}\left|\frac{V_i(n)}{n} - \pi_i\right| + \sum_{i \notin J}\left|\frac{V_i(n)}{n} - \pi_i\right|$$
$$\leq \sum_{i \in J}\left|\frac{V_i(n)}{n} - \pi_i\right| + \sum_{i \notin J}\left(\frac{V_i(n)}{n} + \pi_i\right)
\leq 2\sum_{i \in J}\left|\frac{V_i(n)}{n} - \pi_i\right| + 2\sum_{i \notin J} \pi_i,$$
but we saw that
$$\frac{V_i(n)}{n} \to \pi_i \text{ as } n \to \infty \text{ for all } i.$$
Then, given $\varepsilon > 0$, we can take $J$ finite such that
$$\sum_{i \notin J} \pi_i < \frac{\varepsilon}{4},$$
and $N(\omega)$ such that for all $n \geq N(\omega)$
$$\sum_{i \in J}\left|\frac{V_i(n)}{n} - \pi_i\right| < \frac{\varepsilon}{4}.$$
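The theorem can be illustrated with the chain of Example 3.1.5: the long-run fractions of time spent in each state should approach $\pi = (1/5, 2/5, 2/5)$. A minimal simulation sketch (the run length is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0, 1, 0],
              [0, 0.5, 0.5],
              [0.5, 0, 0.5]])

n_steps = 100_000
visits = np.zeros(3)
x = 0
for _ in range(n_steps):
    visits[x] += 1                  # V_i(n): time spent in each state
    x = rng.choice(3, p=P[x])       # one step of the chain
print(visits / n_steps)             # close to (0.2, 0.4, 0.4)
```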

3.2 Poisson process


3.2.1 Exponential distribution
Definition 3.2.1 We say that $T: \Omega \to [0, \infty]$ follows an exponential distribution with parameter $\lambda$ (we shall write it $E(\lambda)$ for short) if
$$f_T(t) = \lambda e^{-\lambda t} 1_{[0,\infty)}(t).$$
It is easy to see the following properties:

• $E(T) = 1/\lambda$, $\operatorname{var}(T) = 1/\lambda^2$.

• $T/\mu \sim E(\mu\lambda)$.

• $P(T > t + s \mid T > t) = P(T > s)$ (lack of memory).

• If $S \sim E(\lambda)$ and $T \sim E(\mu)$, then $\min(S, T) \sim E(\lambda + \mu)$ and $P(S < T) = \frac{\lambda}{\lambda + \mu}$.

• If $t_1, t_2, ..., t_n$ are independent $E(\lambda)$, then $T_n := t_1 + t_2 + ... + t_n \sim \operatorname{gamma}(n, \lambda)$.
Theorem 3.2.1 Let $T_1, T_2, ...$ be a sequence of independent random variables with $T_n \sim E(\lambda_n)$ and $0 < \lambda_n < \infty$ for all $n$.
(i) If $\sum_{n=1}^{\infty} \frac{1}{\lambda_n} < \infty$, then $P\left(\sum_{n=1}^{\infty} T_n < \infty\right) = 1$.
(ii) If $\sum_{n=1}^{\infty} \frac{1}{\lambda_n} = \infty$, then $P\left(\sum_{n=1}^{\infty} T_n = \infty\right) = 1$.

Proof. (i) Suppose $\sum_{n=1}^{\infty} \frac{1}{\lambda_n} < \infty$. Then by the monotone convergence theorem
$$E\left(\sum_{n=1}^{\infty} T_n\right) = \sum_{n=1}^{\infty} E(T_n) = \sum_{n=1}^{\infty} \frac{1}{\lambda_n} < \infty,$$
so $P\left(\sum_{n=1}^{\infty} T_n < \infty\right) = 1$.
(ii) Suppose instead that $\sum_{n=1}^{\infty} \frac{1}{\lambda_n} = \infty$. Then $\prod_{n=1}^{\infty} \left(1 + \frac{1}{\lambda_n}\right) = \infty$, and by the monotone convergence theorem and the independence
$$E\left(\exp\left\{-\sum_{n=1}^{\infty} T_n\right\}\right) = \prod_{n=1}^{\infty} E(\exp\{-T_n\}) = \prod_{n=1}^{\infty} \left(1 + \frac{1}{\lambda_n}\right)^{-1} = 0,$$
so $P\left(\sum_{n=1}^{\infty} T_n = \infty\right) = 1$.
Theorem 3.2.2 Let $I$ be a countable set and let $T_k$, $k \in I$, be independent random variables with $T_k \sim E(q_k)$ and $q := \sum_{k \in I} q_k < \infty$. Set $T = \inf_k T_k$. Then this infimum is attained at a unique random value $K$ of $k$, with probability 1. Moreover, $T$ and $K$ are independent, with $T \sim E(q)$ and $P(K = k) = \frac{q_k}{q}$.

Proof.
$$P(K = k,\ T \geq t) = P(T_k \geq t,\ T_j > T_k \text{ for all } j \neq k)
= \int_t^{\infty} q_k e^{-q_k s}\, P(T_j > s \text{ for all } j \neq k)\, ds
= \int_t^{\infty} q_k e^{-q_k s} \prod_{j \neq k} e^{-q_j s}\, ds = \frac{q_k}{q} e^{-qt}.$$
Hence $P(K = k \text{ for some } k) = 1$, and $K$ and $T$ have the claimed joint distribution.
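Theorem 3.2.2 is easy to test by simulation; a sketch with three illustrative rates:

```python
import numpy as np

rng = np.random.default_rng(1)
q = np.array([1.0, 2.0, 3.0])            # illustrative rates q_k
n = 100_000
T = rng.exponential(1 / q, size=(n, 3))  # columns: T_k ~ E(q_k)

K = T.argmin(axis=1)                     # index attaining the minimum
print(np.bincount(K) / n)                # close to q / q.sum() = (1/6, 2/6, 3/6)
print(T.min(axis=1).mean())              # close to 1 / q.sum() = 1/6
```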

3.2.2 Poisson process


Definition 3.2.2 Let $t_1, t_2, ...$ be independent $E(\lambda)$ random variables, let $T_n = t_1 + t_2 + ... + t_n$ for $n \geq 1$, $T_0 = 0$, and define $N_s = \max\{n,\ T_n \leq s\}$. Then the process $(N_s)_{s \geq 0}$ is called a Poisson process with parameter (or rate) $\lambda$.

Remark 3.2.1 We think of the $t_n$ as times between arrivals of customers at a bank. Then $N_s$ is the number of arrivals by time $s$.
Lemma 3.2.1 $N_s \sim \operatorname{Poisson}(\lambda s)$.

Proof. $P(N_s = n) = P(T_n \leq s < T_{n+1})$. Since $T_{n+1} = t_{n+1} + T_n$ and $t_{n+1}$ is independent of $T_n$,
$$P(T_n \leq s < T_{n+1}) = \int_0^s \int_s^{\infty} f_{T_n}(t)\, f_{t_{n+1}}(u - t)\, du\, dt
= \int_0^s \int_s^{\infty} \frac{1}{(n-1)!} \lambda^n t^{n-1} e^{-\lambda t}\, \lambda e^{-\lambda(u-t)}\, du\, dt
= \frac{\lambda^n}{(n-1)!} e^{-\lambda s} \int_0^s t^{n-1}\, dt = e^{-\lambda s} \frac{(\lambda s)^n}{n!}.$$

Lemma 3.2.2 $(N_{t+s} - N_s)_{t \geq 0}$ is a Poisson process with rate $\lambda$, independent of $(N_r)_{0 \leq r \leq s}$.

Proof. Assume that by time $s$ there were four arrivals and that $T_4 = u_4$. Then
$$P(t_5 > s - u_4 + t \mid t_5 > s - u_4) = P(t_5 > t) = e^{-\lambda t}.$$
So the first arrival after $s$ is $E(\lambda)$ and independent of $T_1, T_2, T_3, T_4$. It is clear that the subsequent interarrival times $t_6, t_7, ...$ are also independent of $T_1, T_2, T_3, T_4$.
Lemma 3.2.3 $(N_t)_{t \geq 0}$ has independent increments.

Theorem 3.2.3 If $(N_t)_{t \geq 0}$ is a Poisson process, then
(i) $N_0 = 0$;
(ii) $N_{t+s} - N_s \sim \operatorname{Poisson}(\lambda t)$;
(iii) $(N_t)_{t \geq 0}$ has independent increments.
Conversely, if (i), (ii) and (iii) hold and the process is right-continuous, then $(N_t)_{t \geq 0}$ is a Poisson process.

Proof. (The converse.)
$$P(T_1 > t) = P(N_t = 0) = e^{-\lambda t}.$$
$$P(t_2 > t \mid t_1 \leq s) = P(N_{t+s} - N_s = 0 \mid N_r \leq 1,\ 0 \leq r \leq s) = P(N_{t+s} - N_s = 0) = e^{-\lambda t},$$
so $t_2 \sim E(\lambda)$, independent of $t_1$.

Theorem 3.2.4 $(N_t)_{t \geq 0}$ is a Poisson process iff
(i) $N_0 = 0$;
(ii) $N_t$ is an increasing, right-continuous, integer-valued process with independent increments;
(iii) $P(N_{t+h} - N_t = 0) = 1 - \lambda h + o(h)$ and $P(N_{t+h} - N_t = 1) = \lambda h + o(h)$, as $h \downarrow 0$, uniformly in $t$.

Proof. Suppose that $(N_t)_{t \geq 0}$ is a Poisson process. Then
$$P(N_{t+h} - N_t \geq 1) = P(N_h \geq 1) = 1 - e^{-\lambda h} = \lambda h + o(h),$$
and
$$P(N_{t+h} - N_t \geq 2) = P(N_h \geq 2) = P(t_1 + t_2 \leq h) \leq P(t_1 \leq h,\ t_2 \leq h) = P(t_1 \leq h) P(t_2 \leq h) = (1 - e^{-\lambda h})^2 = o(h).$$
Conversely, if (i), (ii) and (iii) hold, then for $i = 2, 3, ...$ we have $P(N_{t+h} - N_t = i) = o(h)$ as $h \downarrow 0$, uniformly in $t$. Set $p_j(t) = P(N_t = j)$. Then, for $j = 1, 2, ...$,
$$p_j(t+h) = P(N_{t+h} = j) = \sum_{i=0}^{j} P(N_{t+h} - N_t = i) P(N_t = j - i)
= (1 - \lambda h + o(h))\, p_j(t) + (\lambda h + o(h))\, p_{j-1}(t) + o(h).$$
So
$$\frac{p_j(t+h) - p_j(t)}{h} = -\lambda p_j(t) + \lambda p_{j-1}(t) + o(1);$$
since this estimate is uniform in $t$ we can put $t = s - h$ to obtain
$$\frac{p_j(s) - p_j(s-h)}{h} = -\lambda p_j(s-h) + \lambda p_{j-1}(s-h) + o(1).$$
Now letting $h \downarrow 0$ we see that the $p_j(t)$ are differentiable and satisfy the differential equation
$$p_j'(t) = -\lambda p_j(t) + \lambda p_{j-1}(t).$$
Analogously we can see that
$$p_0'(t) = -\lambda p_0(t),$$
and since
$$p_0(0) = 1, \quad p_j(0) = 0 \text{ for } j = 1, 2, ...,$$
we have that
$$p_j(t) = e^{-\lambda t} \frac{(\lambda t)^j}{j!} \quad \text{for } j = 0, 1, 2, ....$$
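The construction of Definition 3.2.2 translates directly into a simulation, which can be used to check that $N_s \sim \operatorname{Poisson}(\lambda s)$; a minimal sketch (the rate, the time $s$ and the number of paths are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
lam, s, n_paths = 2.0, 5.0, 10_000

counts = np.empty(n_paths)
for k in range(n_paths):
    t, n = 0.0, 0
    while True:
        t += rng.exponential(1 / lam)  # interarrival time t_i ~ E(lam)
        if t > s:
            break
        n += 1                         # one more arrival by time s
    counts[k] = n

# A Poisson(lam * s) variable has mean and variance both lam * s = 10.
print(counts.mean(), counts.var())
```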

3.3 Continuous-time Markov chains


3.3.1 Definition and examples
Definition 3.3.1 $(X_t)_{t \geq 0}$ is a Markov chain if for any $0 \leq s_0 < s_1 < \cdots < s_n < s$ and possible states $i_0, ..., i_n, i, j$ we have
$$P(X_{t+s} = j \mid X_s = i, X_{s_n} = i_n, ..., X_{s_0} = i_0) = P(X_{t+s} = j \mid X_s = i) = P(X_t = j \mid X_0 = i).$$

Example 3.3.1 Let $(N_t)_{t \geq 0}$ be a Poisson process with rate $\lambda$ and let $(Y_n)_{n \geq 0}$ be a discrete-time Markov chain with transition probabilities $\pi_{ij}$, independent of $(N_t)_{t \geq 0}$. Then $X_t = Y_{N_t}$ is a continuous-time Markov chain. Intuitively, this follows from the memoryless property of the exponential distribution: if $X_s = i$ then, independently of what happened in the past, the time to the next jump will be exponentially distributed with rate $\lambda$, and the chain will go to state $j$ with probability $\pi_{ij}$. If we write
$$p_{ij}(t) = P(X_t = j \mid X_0 = i),$$
we have
$$p_{ij}(t) = \frac{P(X_t = j, X_0 = i)}{P(X_0 = i)} = \sum_{k=0}^{\infty} \frac{P(Y_k = j, Y_0 = i)}{P(Y_0 = i)} P(N_t = k)
= \sum_{k=0}^{\infty} \pi_{ij}^{(k)} \frac{e^{-\lambda t} (\lambda t)^k}{k!}
= e^{-\lambda t} \left(e^{t\lambda\Pi}\right)_{ij} = \left(e^{t(\lambda\Pi - \lambda I)}\right)_{ij} = \left(e^{tQ}\right)_{ij},$$
where $Q = \lambda(\Pi - I)$.

Chapman-Kolmogorov equation

Proposition 3.3.1
$$\sum_{k \in I} p_{ik}(s)\, p_{kj}(t) = p_{ij}(t+s). \quad (3.3)$$

Proof.
$$p_{ij}(t+s) = P(X_{t+s} = j \mid X_0 = i) = \sum_{k \in I} P(X_{t+s} = j, X_s = k \mid X_0 = i)
= \sum_{k \in I} P(X_{t+s} = j \mid X_s = k)\, P(X_s = k \mid X_0 = i).$$

Equation (3.3) shows that if we know the transition probability for $t < t_0$, for any $t_0 > 0$, then we know it for all $t$. This observation suggests that the transition probabilities $p_{ij}(t)$ can be determined from their derivatives at 0:
$$q_{ij} = \lim_{h \to 0} \frac{p_{ij}(h)}{h}, \quad j \neq i.$$
If this limit exists we will call $q_{ij}$ the jump rate from $i$ to $j$. For Example 3.3.1, $q_{ij} = \lambda\pi_{ij}$.
Example 3.3.1 is atypical: there we started with the Markov chain and then derived its rates. In most cases it is much simpler to describe the system by writing down its transition rates $q_{ij}$ for $j \neq i$, which describe the rates at which jumps are made from $i$ to $j$.
Example 3.3.2 Poisson process: $q_{n,n+1} = \lambda$ for all $n \geq 0$.
Example 3.3.3 M/M/s queue. Imagine a bank with $s$ tellers that serve customers who queue in a single line if all of the servers are busy. We imagine that customers arrive at times of a Poisson process with rate $\lambda$, and that each service time is an independent exponential with rate $\mu$. As in the previous example, $q_{n,n+1} = \lambda$. To model the departures we let
$$q_{n,n-1} = \begin{cases} n\mu & 0 \leq n \leq s \\ s\mu & n > s. \end{cases}$$

3.3.2 Computing the transition probability


Theorem 3.3.1 Let $Q$ be a matrix on a finite set $I$. Set $P(t) = e^{tQ}$. Then $(P(t),\ t \geq 0)$ has the following properties:
(i) $P(s+t) = P(s)P(t)$ for all $s, t$ (semigroup property);
(ii) $(P(t),\ t \geq 0)$ is the unique solution to the forward equation $\frac{d}{dt}P(t) = P(t)Q$, $P(0) = I$;
(iii) $(P(t),\ t \geq 0)$ is the unique solution to the backward equation $\frac{d}{dt}P(t) = QP(t)$, $P(0) = I$;
(iv) for $k = 0, 1, 2, ...$, we have $\frac{d^k}{dt^k}\big|_{t=0} P(t) = Q^k$.
Proof. For any $s, t \in \mathbb{R}$, $sQ$ and $tQ$ commute, so
$$e^{sQ} e^{tQ} = e^{(s+t)Q},$$
proving the semigroup property. In fact, if $Q_1$ and $Q_2$ commute,
$$e^{Q_1 + Q_2} = \sum_{n=0}^{\infty} \frac{(Q_1 + Q_2)^n}{n!} = \sum_{n=0}^{\infty} \frac{1}{n!} \sum_{k=0}^{n} \frac{n!}{k!(n-k)!} Q_1^k Q_2^{n-k} = \sum_{k=0}^{\infty} \frac{Q_1^k}{k!} \sum_{n=k}^{\infty} \frac{Q_2^{n-k}}{(n-k)!} = e^{Q_1} e^{Q_2}.$$

The matrix-valued power series
$$P(t) = \sum_{k=0}^{\infty} \frac{(tQ)^k}{k!}$$
has infinite radius of convergence. So each component is differentiable with derivative given by term-by-term differentiation:
$$P'(t) = \sum_{k=1}^{\infty} \frac{t^{k-1} Q^k}{(k-1)!} = P(t)Q = QP(t).$$
Hence $P(t)$ satisfies the forward and backward equations. By repeated term-by-term differentiation we obtain (iv). To show uniqueness, assume that $M(t)$ is another solution of the forward equation; then
$$\frac{d}{dt}\left(M(t)\, e^{-tQ}\right) = \left(\frac{d}{dt} M(t)\right) e^{-tQ} + M(t)\, \frac{d}{dt}\left(e^{-tQ}\right)
= M(t) Q e^{-tQ} + M(t)(-Q) e^{-tQ} = 0,$$
so $M(t) e^{-tQ}$ is constant, and so $M(t) = e^{tQ}$. Similarly for the backward equation.

Example 3.3.4 Let
$$Q = \begin{pmatrix} -2 & 1 & 1 \\ 1 & -1 & 0 \\ 2 & 1 & -3 \end{pmatrix}, \qquad \det(Q - \lambda I) = -\lambda(\lambda + 2)(\lambda + 4).$$
Then $Q$ has eigenvalues $0, -2, -4$, so there is an invertible matrix $U$ such that
$$Q = U \begin{pmatrix} 0 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & -4 \end{pmatrix} U^{-1},$$
so
$$e^{tQ} = \sum_{k=0}^{\infty} \frac{(tQ)^k}{k!} = U \left(\sum_{k=0}^{\infty} \frac{1}{k!} \begin{pmatrix} 0^k & 0 & 0 \\ 0 & (-2t)^k & 0 \\ 0 & 0 & (-4t)^k \end{pmatrix}\right) U^{-1}
= U \begin{pmatrix} 1 & 0 & 0 \\ 0 & e^{-2t} & 0 \\ 0 & 0 & e^{-4t} \end{pmatrix} U^{-1},$$
and each entry has the form $p_{ij}(t) = a + b e^{-2t} + c e^{-4t}$. To determine the constants we can use
$$P(0) = I, \quad P'(0) = Q, \quad P''(0) = Q^2.$$
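For this example the diagonalization can also be bypassed by computing $P(t) = e^{tQ}$ numerically; a sketch (it assumes scipy is available for the matrix exponential):

```python
import numpy as np
from scipy.linalg import expm

Q = np.array([[-2.0, 1.0, 1.0],
              [1.0, -1.0, 0.0],
              [2.0, 1.0, -3.0]])

print(np.linalg.eigvals(Q))   # close to 0, -2, -4
P1 = expm(1.0 * Q)            # P(1) = e^{Q}
print(P1)
print(P1.sum(axis=1))         # each row sums to 1: P(t) is stochastic
```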

Theorem 3.3.2 Consider a continuous-time Markov chain with a finite set of states. Assume that as $h \downarrow 0$ we have
$$p_{ij}(h) = \delta_{ij} + q_{ij} h + o(h), \quad \text{for all } i, j \in I.$$
Then $\{p_{ij}(t),\ i, j \in I,\ t \geq 0\}$ is the solution of the forward equation
$$p'(t) = p(t)Q, \quad p(0) = I,$$
where $(p(t))_{ij} = p_{ij}(t)$ and $Q_{ij} = q_{ij}$.

Proof. Similar to Theorem 3.2.4:
$$p_{ij}(t+h) = \sum_{k \in I} p_{ik}(t)\, (\delta_{kj} + q_{kj} h + o(h)).$$
Since $I$ is finite,
$$\frac{p_{ij}(t+h) - p_{ij}(t)}{h} = \sum_{k \in I} p_{ik}(t)\, q_{kj} + o(1).$$

Remark 3.3.1 Note that in the previous theorem $q_{ii} = -\sum_{j \neq i} q_{ij}$. By Theorem 3.3.1, $\{p_{ij}(t),\ i, j \in I,\ t \geq 0\}$ is also the solution of the backward equation
$$p'(t) = Qp(t), \quad p(0) = I,$$
where $(p(t))_{ij} = p_{ij}(t)$ and $Q_{ij} = q_{ij}$.

Remark 3.3.2 We have similar results for the case of infinite state-space. But
in this case we have to look for a minimal non-negative solution of the backward
and forward equations.

Example 3.3.5 Consider a two-state chain where, for concreteness, $I = \{1, 2\}$. In this case we have to specify only two rates, $q_{12} = \lambda$ and $q_{21} = \mu$, so
$$Q = \begin{pmatrix} -\lambda & \lambda \\ \mu & -\mu \end{pmatrix}.$$
Writing the backward equation in matrix form,
$$\begin{pmatrix} p_{11}'(t) & p_{12}'(t) \\ p_{21}'(t) & p_{22}'(t) \end{pmatrix} = \begin{pmatrix} -\lambda & \lambda \\ \mu & -\mu \end{pmatrix} \begin{pmatrix} p_{11}(t) & p_{12}(t) \\ p_{21}(t) & p_{22}(t) \end{pmatrix},$$
to solve this we guess that
$$p_{11}(t) = \frac{\mu}{\mu+\lambda} + \frac{\lambda}{\mu+\lambda} e^{-(\mu+\lambda)t}, \qquad
p_{21}(t) = \frac{\mu}{\mu+\lambda} - \frac{\mu}{\mu+\lambda} e^{-(\mu+\lambda)t}.$$

Example 3.3.6 (The Yule process). In this model each particle splits into two at rate $\beta$, so $q_{i,i+1} = \beta i$. It can be shown that
$$p_{1j}(t) = e^{-\beta t}(1 - e^{-\beta t})^{j-1} \quad \text{for } j \geq 1. \quad (3.4)$$
From this, since the chain starting at $i$ is the sum of $i$ independent copies of the chain starting at 1, we have (a negative binomial distribution)
$$p_{ij}(t) = \binom{j-1}{i-1} e^{-i\beta t} (1 - e^{-\beta t})^{j-i}.$$

3.3.3 Jump chain and holding times


We say that a matrix $Q = (q_{ij},\ i, j \in I)$ is a rate matrix if
(i) $0 \leq -q_{ii} < \infty$ for all $i$;
(ii) $q_{ij} \geq 0$ for all $i \neq j$;
(iii) $\sum_{j \in I} q_{ij} = 0$ for all $i$.
We shall write $q_i$ or $q(i)$ as an alternative notation for $-q_{ii}$. From a rate matrix $Q$ we can obtain a stochastic matrix $\Pi$, the jump matrix:
$$\pi_{ij} = \begin{cases} q_{ij}/q_i & \text{if } j \neq i \text{ and } q_i \neq 0 \\ 0 & \text{if } j \neq i \text{ and } q_i = 0, \end{cases} \qquad
\pi_{ii} = \begin{cases} 0 & \text{if } q_i \neq 0 \\ 1 & \text{if } q_i = 0. \end{cases}$$
Example 3.3.7
$$Q = \begin{pmatrix} -2 & 1 & 1 \\ 1 & -1 & 0 \\ 2 & 1 & -3 \end{pmatrix}, \quad \text{then} \quad
\Pi = \begin{pmatrix} 0 & 1/2 & 1/2 \\ 1 & 0 & 0 \\ 2/3 & 1/3 & 0 \end{pmatrix}.$$
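The jump matrix can be produced mechanically from the two-case definition above; a small sketch, checked against this example:

```python
import numpy as np

def jump_matrix(Q):
    """Jump matrix Pi of a rate matrix Q: pi_ij = q_ij / q_i off the
    diagonal when q_i = -q_ii > 0; states with q_i = 0 get pi_ii = 1."""
    Q = np.asarray(Q, dtype=float)
    Pi = np.zeros_like(Q)
    for i in range(len(Q)):
        qi = -Q[i, i]
        if qi > 0:
            Pi[i] = Q[i] / qi
            Pi[i, i] = 0.0
        else:
            Pi[i, i] = 1.0
    return Pi

Q = [[-2, 1, 1], [1, -1, 0], [2, 1, -3]]
print(jump_matrix(Q))  # rows (0, 1/2, 1/2), (1, 0, 0), (2/3, 1/3, 0)
```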
Definition 3.3.2 A minimal right-continuous process $(X_t)_{t \geq 0}$ on $I$ is a Markov chain with initial distribution $\lambda$ and rate matrix $Q$ if its jump chain $(Y_n)_{n \geq 0}$ is a discrete-time Markov chain $(\lambda, \Pi)$ and if, for each $n \geq 1$, conditional on $Y_0, Y_1, ..., Y_{n-1}$, its holding times $S_1, S_2, ..., S_n$ are independent exponential random variables of parameters $q(Y_0), q(Y_1), ..., q(Y_{n-1})$. In such a way that, if we write the jump times as $J_n = S_1 + S_2 + ... + S_n$, then
$$X_{J_n} = Y_n,$$
and
$$X_t = \begin{cases} Y_n & \text{if } J_n \leq t < J_{n+1} \text{ for some } n \\ \infty & \text{otherwise (it is minimal)}. \end{cases}$$
3.3. CONTINUOUS-TIME MARKOV CHAINS 57

Let $X_0 = Y_0$ have distribution $\lambda$, and let $T_n^j$, $n \geq 0$, $j \in I$, be an array of independent exponential random variables of parameter 1. Then, inductively, for $n = 0, 1, 2, ...$, if $Y_n = i$ we set
$$S_{n+1}^j = T_n^j / q_{ij}, \quad \text{for } j \neq i,$$
$$S_{n+1} = \inf_{j \neq i} S_{n+1}^j,$$
$$Y_{n+1} = \begin{cases} j & \text{if } S_{n+1}^j = S_{n+1} < \infty \\ i & \text{if } S_{n+1} = \infty. \end{cases}$$
Then, conditional on $Y_n = i$, $S_{n+1}$ is $E(q_i)$ with $q_i = \sum_{j \neq i} q_{ij}$, $Y_{n+1}$ has distribution $(\pi_{ij},\ j \in I)$, and $S_{n+1}$ and $Y_{n+1}$ are independent, and independent of $Y_0, Y_1, ..., Y_{n-1}$ and $S_1, ..., S_n$.
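This construction is also a recipe for simulating the chain: draw the holding time from $E(q_i)$, then the next state from $(\pi_{ij},\ j \in I)$. A minimal sketch (it reuses the jump_matrix function from the previous snippet):

```python
import numpy as np

def simulate_ctmc(Q, x0, t_max, rng):
    """Simulate a path (jump times and states) of the minimal chain
    with rate matrix Q, started at x0, up to time t_max."""
    Q = np.asarray(Q, dtype=float)
    Pi = jump_matrix(Q)                  # defined in the previous sketch
    t, x = 0.0, x0
    path = [(t, x)]
    while True:
        qi = -Q[x, x]
        if qi == 0:                      # absorbing state: stay forever
            break
        t += rng.exponential(1 / qi)     # holding time S ~ E(q_i)
        if t >= t_max:
            break
        x = rng.choice(len(Q), p=Pi[x])  # next state from the jump chain
        path.append((t, x))
    return path

rng = np.random.default_rng(3)
Q = [[-2, 1, 1], [1, -1, 0], [2, 1, -3]]
print(simulate_ctmc(Q, 0, 5.0, rng)[:5])
```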

3.3.4 Birth processes


A birth process, say $(X_t)_{t \geq 0}$, is a generalization of a Poisson process in which the parameter $\lambda$ is allowed to depend on the current state of the process. The data for a birth process consist of birth rates $0 \leq q_{i,i+1} < \infty$, where $i = 0, 1, 2, ....$ That is, $(X_t)_{t \geq 0}$ is a right-continuous process with values in $\mathbb{Z}_+ \cup \{\infty\}$ such that, conditional on $X_0 = i$, its holding times are independent exponentials of parameters $q_{i,i+1}, q_{i+1,i+2}, ...$ and its jump chain is given by $Y_n = i + n$. So
$$Q = \begin{pmatrix} -q_{01} & q_{01} & & \\ & -q_{12} & q_{12} & \\ & & -q_{23} & q_{23} \\ & & & \ddots \end{pmatrix}.$$
Example 3.3.8 (Simple birth process or Yule process) $q_{i,i+1} = \beta i$. Write $X_t$ for the number of individuals at time $t$. Suppose that $X_0 = 1$ and let $J_1$ denote, as above, the time of the first birth; then $J_1 \sim E(\beta)$ and
$$E(X_t) = E(X_t 1_{\{J_1 \leq t\}}) + E(X_t 1_{\{J_1 > t\}})
= \int_0^t \beta e^{-\beta s}\, E(X_t \mid J_1 = s)\, ds + E(1_{\{J_1 > t\}})
= \int_0^t \beta e^{-\beta s}\, E(X_t \mid J_1 = s)\, ds + e^{-\beta t}.$$
If we put $\mu(t) = E(X_t)$, then $E(X_t \mid J_1 = s) = 2\mu(t-s)$: if we have a birth at $s$, thereafter we have two independent birth processes of the same type as $X$. Then
$$\mu(t) = 2\int_0^t \beta e^{-\beta s}\, \mu(t-s)\, ds + e^{-\beta t},$$

and from here we obtain
$$\mu'(t) = \beta\mu(t),$$
so the mean population size grows exponentially:
$$E(X_t) = e^{\beta t}.$$
Note also that we can obtain $\mu(t)$ from (3.4):
$$\mu(t) = \sum_{j=1}^{\infty} j\, p_{1j}(t) = \sum_{j=1}^{\infty} j e^{-\beta t}(1 - e^{-\beta t})^{j-1}
= \frac{1}{\beta} \frac{d}{dt} \sum_{j=1}^{\infty} (1 - e^{-\beta t})^j = \frac{1}{\beta} \frac{d}{dt}\, \frac{1 - e^{-\beta t}}{e^{-\beta t}} = e^{\beta t}.$$
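The exponential mean growth can also be checked by simulating the Yule process; a sketch ($\beta$, the horizon and the number of paths are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(4)
beta, t_max, n_paths = 1.0, 2.0, 5_000

sizes = np.empty(n_paths)
for k in range(n_paths):
    t, x = 0.0, 1
    while True:
        t += rng.exponential(1 / (beta * x))  # holding time ~ E(beta * i)
        if t > t_max:
            break
        x += 1                                # one birth
    sizes[k] = x

print(sizes.mean(), np.exp(beta * t_max))  # both close to e^{beta t} ~ 7.39
```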
Definition 3.3.3 If $(J_n)_{n \geq 1}$ are the jump times and $(S_n)_{n \geq 1}$ the holding times of our Markov chain, then
$$\zeta = \sup_n J_n = \sum_{n=1}^{\infty} S_n$$
is called the explosion time.


Remark 3.3.3 Note that our Markov chains, say $(X_t)_{t \geq 0}$, are minimal processes in the sense that
$$X_t = \infty \text{ if } t \geq \zeta.$$

Remark 3.3.4 It is easy to see from Theorem 3.2.1 that the birth process starting from zero explodes if and only if
$$\sum_{i=0}^{\infty} \frac{1}{q_{i,i+1}} < \infty.$$

Example 3.3.9 (Non-minimal chain) Consider a birth process $(X_t)_{t \geq 0}$ starting from 0 with rates $q_{i,i+1} = 2^i$ for $i \geq 0$. Then, by the previous remark, the process explodes. We have insisted that $X_t = \infty$ if $t \geq \zeta$, where $\zeta$ is the explosion time, but another obvious possibility is to start the process off again from zero at time $\zeta$, and to do the same for all subsequent explosions. Using the memoryless property of the exponential distribution it can be shown that the new process is a continuous-time Markov chain in the sense of Definition 3.3.1.
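The explosion itself is easy to see numerically: with $q_{i,i+1} = 2^i$ the holding times satisfy $E(\zeta) = \sum_i 2^{-i} = 2$, so the simulated explosion times concentrate near a finite value. A sketch (truncating the sum at an illustrative number of terms, beyond which the expected tail is negligible):

```python
import numpy as np

rng = np.random.default_rng(5)
n_terms = 60  # the neglected tail has expectation below 2**-59

zetas = [sum(rng.exponential(1 / 2.0**i) for i in range(n_terms))
         for _ in range(10_000)]
print(np.mean(zetas))  # close to 2: zeta = sum of holding times < infinity
```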

3.3.5 Class structure


Hereafter we deal only with minimal chains, those that die after explosion. Then the class structure is simply the discrete-time class structure of the jump chain $(Y_n)_{n \geq 0}$. We say that $i$ leads to $j$, and write $i \to j$, if
$$P_i(X_t = j \text{ for some } t \geq 0) > 0.$$
We say $i$ communicates with $j$, and write $i \longleftrightarrow j$, if both $i \to j$ and $j \to i$. The notions of communicating class, closed class, absorbing state and irreducibility are inherited from the jump chain.

Theorem 3.3.3 For distinct states $i$ and $j$ the following are equivalent:
(i) $i \to j$;
(ii) $i \to j$ for the jump chain;
(iii) $q_{i_0 i_1} q_{i_1 i_2} \cdots q_{i_{n-1} i_n} > 0$ for some states $i_0, i_1, ..., i_n$ with $i_0 = i$ and $i_n = j$;
(iv) $p_{ij}(t) > 0$ for all $t > 0$;
(v) $p_{ij}(t) > 0$ for some $t > 0$.

3.3.6 Hitting times


Let $(X_t)_{t \geq 0}$ be a Markov chain with rate matrix $Q$. The hitting time of a subset $A$ of $I$ is the random variable $D^A$ defined by
$$D^A(\omega) = \inf\{t \geq 0,\ X_t(\omega) \in A\},$$
with the usual convention that $\inf \emptyset = \infty$. Since $(X_t)_{t \geq 0}$ is minimal, if $H^A$ is the hitting time of $A$ for the jump chain, then
$$\{H^A < \infty\} = \{D^A < \infty\},$$
and on this set
$$D^A = J_{H^A}.$$
The probability, starting from $i$, that $(X_t)_{t \geq 0}$ ever hits $A$ is then
$$h_i^A = P_i(D^A < \infty) = P_i(H^A < \infty).$$

Theorem 3.3.4 The vector of hitting probabilities $h^A = (h_i^A,\ i \in I)$ is the minimal non-negative solution to the system of linear equations
$$\begin{cases} h_i^A = 1 & \text{for } i \in A \\ \sum_{j \in I} q_{ij} h_j^A = 0 & \text{for } i \notin A. \end{cases}$$

The average time taken, starting from $i$, for $(X_t)_{t \geq 0}$ to reach $A$ is given by
$$k_i^A = E_i(D^A).$$

Theorem 3.3.5 Assume that $q_i > 0$ for all $i \notin A$. The vector of expected hitting times $k^A = (k_i^A,\ i \in I)$ is the minimal non-negative solution to the system of linear equations
$$\begin{cases} k_i^A = 0 & \text{for } i \in A \\ -\sum_{j \in I} q_{ij} k_j^A = 1 & \text{for } i \notin A. \end{cases} \quad (3.5)$$

Proof. First note that $k^A$ satisfies (3.5): if $X_0 = i \in A$ then $D^A = 0$, so $k_i^A = 0$; if $X_0 = i \notin A$, by the Markov property
$$E_i(D^A - J_1 \mid Y_1 = j) = E_j(D^A),$$
so
$$k_i^A = E_i(D^A) = E_i(J_1) + E_i(D^A - J_1)
= E_i(J_1) + \sum_{j \neq i} E_i(D^A - J_1 \mid Y_1 = j)\, P_i(Y_1 = j)$$
$$= \frac{1}{q_i} + \sum_{j \neq i} E_j(D^A)\, \pi_{ij} = \frac{1}{q_i} + \sum_{j \neq i} \pi_{ij} k_j^A
= \frac{1}{q_i} + \sum_{j \neq i} \frac{q_{ij}}{q_i} k_j^A,$$
and then
$$-q_{ii} k_i^A - \sum_{j \neq i} q_{ij} k_j^A = 1.$$
Suppose now that $y = (y_i,\ i \in I)$ is another non-negative solution to (3.5). Then $k_i^A = y_i = 0$ for $i \in A$. If $i \notin A$, then
$$y_i = \frac{1}{q_i} + \sum_{j \notin A} \pi_{ij} y_j
= \frac{1}{q_i} + \sum_{j \notin A} \pi_{ij}\left(\frac{1}{q_j} + \sum_{k \notin A} \pi_{jk} y_k\right)
= E_i(S_1) + E_i(S_2 1_{\{H^A \geq 2\}}) + \sum_{j \notin A} \sum_{k \notin A} \pi_{ij} \pi_{jk} y_k.$$
By repeated substitution for $y$ we obtain
$$y_i = E_i(S_1) + \cdots + E_i(S_n 1_{\{H^A \geq n\}}) + \sum_{j_1 \notin A} \cdots \sum_{j_n \notin A} \pi_{i j_1} \cdots \pi_{j_{n-1} j_n} y_{j_n}.$$
So, since $y \geq 0$,
$$y_i \geq \sum_{m=1}^{n} E_i(S_m 1_{\{H^A \geq m\}}) = E_i\left(\sum_{m=1}^{n \wedge H^A} S_m\right) \to E_i(D^A) = k_i^A \text{ as } n \to \infty.$$
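For a finite chain, (3.5) reduces to a linear solve over the states outside $A$; a sketch using the rate matrix of Example 3.3.4 with the illustrative target set $A = \{2\}$ (the third state):

```python
import numpy as np

Q = np.array([[-2.0, 1.0, 1.0],
              [1.0, -1.0, 0.0],
              [2.0, 1.0, -3.0]])
A = [2]                                  # target set (illustrative)
B = [i for i in range(len(Q)) if i not in A]

# Solve -sum_j q_ij k_j = 1 for i outside A, with k_j = 0 on A, so the
# system restricts to the rows and columns indexed by B.
k_B = np.linalg.solve(-Q[np.ix_(B, B)], np.ones(len(B)))
k = np.zeros(len(Q))
k[B] = k_B
print(k)  # expected hitting times E_i(D^A) for each starting state
```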

3.3.7 Recurrence and transience

Let $(X_t)_{t \geq 0}$ be a Markov chain with rate matrix $Q$. Recall that $(X_t)_{t \geq 0}$ is minimal. We say a state $i$ is recurrent if
$$P_i(\{t \geq 0,\ X_t = i\} \text{ is unbounded}) = 1.$$
We say it is transient if
$$P_i(\{t \geq 0,\ X_t = i\} \text{ is unbounded}) = 0.$$

Remark 3.3.5 Note that if $(X_t)_{t \geq 0}$ can explode starting from $i$, then $i$ is certainly not recurrent.

Theorem 3.3.6 We have:
(i) if $i$ is recurrent for the jump chain $(Y_n)_{n \geq 0}$, then $i$ is recurrent for $(X_t)_{t \geq 0}$;
(ii) if $i$ is transient for the jump chain $(Y_n)_{n \geq 0}$, then $i$ is transient for $(X_t)_{t \geq 0}$;
(iii) every state is either recurrent or transient;
(iv) recurrence and transience are class properties.

Proof. (i) Suppose $i$ is recurrent for the jump chain $(Y_n)_{n \geq 0}$. If $X_0 = i$ then $(X_t)_{t \geq 0}$ does not explode and $J_n \to \infty$. Also $X(J_n) = Y_n = i$ infinitely often, so $\{t \geq 0,\ X_t = i\}$ is unbounded, with probability 1.
(ii) Suppose $i$ is transient for $(Y_n)_{n \geq 0}$. If $X_0 = i$ then
$$N = \sup\{n \geq 0,\ Y_n = i\} < \infty,$$
so $\{t \geq 0,\ X_t = i\}$ is bounded by $J_{N+1}$, which is finite with probability 1, because $\{Y_n,\ n \leq N\}$ cannot include an absorbing state.

We denote by $T_i$ the first passage time of $(X_t)_{t \geq 0}$ to state $i$, defined by
$$T_i(\omega) = \inf\{t \geq J_1(\omega),\ X_t(\omega) = i\}.$$

Theorem 3.3.7 The following dichotomy holds:
(i) if $q_i = 0$ or $P_i(T_i < \infty) = 1$, then $i$ is recurrent and $\int_0^{\infty} p_{ii}(t)\, dt = \infty$;
(ii) if $q_i > 0$ and $P_i(T_i < \infty) < 1$, then $i$ is transient and $\int_0^{\infty} p_{ii}(t)\, dt < \infty$.

Proof. If $q_i = 0$ then $(X_t)_{t \geq 0}$ cannot leave $i$, so $i$ is recurrent and $p_{ii}(t) = 1$ for all $t$. Suppose $q_i > 0$. Let $N_i$ denote the first passage time of the jump chain $(Y_n)_{n \geq 0}$ to the state $i$. Then
$$P_i(N_i < \infty) = P_i(T_i < \infty),$$
so $i$ is recurrent if and only if $P_i(T_i < \infty) = 1$, by the previous theorem and the corresponding result for the jump chain $(Y_n)_{n \geq 0}$.
Write $\pi_{ii}^{(n)}$ for $(\Pi^n)_{ii}$. We shall prove that
$$\int_0^{\infty} p_{ii}(t)\, dt = \frac{1}{q_i} \sum_{n=0}^{\infty} \pi_{ii}^{(n)},$$
so that the integral dichotomy also follows from the corresponding discrete-time result. Indeed,
$$\int_0^{\infty} p_{ii}(t)\, dt = \int_0^{\infty} E_i(1_{\{X_t = i\}})\, dt = E_i\left(\int_0^{\infty} 1_{\{X_t = i\}}\, dt\right)
= E_i\left(\sum_{n=0}^{\infty} S_{n+1} 1_{\{Y_n = i\}}\right)
= \sum_{n=0}^{\infty} E_i(S_{n+1} \mid Y_n = i)\, P_i(Y_n = i) = \frac{1}{q_i} \sum_{n=0}^{\infty} \pi_{ii}^{(n)}.$$

3.3.8 Invariant distributions


We say that $\lambda$ is invariant if
$$\lambda Q = 0.$$

Theorem 3.3.8 Let $Q$ be a rate matrix with jump matrix $\Pi$ and let $\lambda$ be a measure. The following are equivalent:
(i) $\lambda$ is invariant;
(ii) $\mu\Pi = \mu$, where $\mu_i = \lambda_i q_i$.

Proof. We have $q_i(\pi_{ij} - \delta_{ij}) = q_{ij}$ for all $i, j$, so
$$(\mu(\Pi - I))_j = \sum_{i \in I} \mu_i (\pi_{ij} - \delta_{ij}) = \sum_{i \in I} \lambda_i q_{ij}.$$

Theorem 3.3.9 Suppose that $Q$ is irreducible and recurrent. Then $Q$ has an invariant measure $\lambda$, unique up to scalar multiples.

Recall that a state $i$ is recurrent if $q_i = 0$ or $P_i(T_i < \infty) = 1$. If $q_i = 0$ or the expected return time $m_i = E_i(T_i)$ is finite, we say $i$ is positive recurrent; otherwise it is called null recurrent.

Theorem 3.3.10 Let $Q$ be an irreducible rate matrix. Then the following are equivalent:
(i) every state is positive recurrent;
(ii) some state $i$ is positive recurrent;
(iii) $Q$ has an invariant distribution, $\lambda$ say.
Moreover, when (iii) holds we have $m_i = 1/(\lambda_i q_i)$ for all $i$.

Theorem 3.3.11 Let $Q$ be an irreducible rate matrix, and let $\lambda$ be a measure. Let $s > 0$ be given. The following are equivalent:
(i) $\lambda Q = 0$;
(ii) $\lambda P(s) = \lambda$.

3.3.9 Convergence to equilibrium


Theorem 3.3.12 Let $Q$ be an irreducible non-explosive rate matrix with semigroup $P(t)$, having an invariant distribution $\lambda$. Then for all states $i, j$ we have
$$p_{ij}(t) \to \lambda_j \text{ as } t \to \infty.$$

Theorem 3.3.13 Let $Q$ be an irreducible rate matrix and let $\nu$ be any distribution. Suppose that $(X_t)_{t \geq 0}$ is Markov$(\nu, Q)$. Then
$$P(X_t = j) \to \frac{1}{q_j m_j} \text{ as } t \to \infty, \text{ for all } j \in I,$$
where $m_j$ is the expected return time to state $j$.
