
A Course in Stochastic Processes

José Manuel Corcuera


Contents

1 Basic Concepts and Definitions
1.1 Definition of a Stochastic Process
1.2 Examples
1.3 Equivalence of Stochastic Processes
1.4 Continuity Concepts
1.5 Main Classes of Random Processes
1.6 Some Applications
1.6.1 Model of a Germination Process
1.6.2 Modeling of Soil Erosion Effect
1.6.3 Brownian motion
1.7 Problems

2 Discrete time martingales
2.1 Conditional expectation
2.2 Discrete Time Martingales
2.3 Stopping times
2.4 Optional stopping theorem
2.5 The Snell envelope and optimal stopping
2.6 Decomposition of supermartingales
2.7 Martingale convergence theorems

3 Markov Processes
3.1 Discrete-time Markov chains
3.1.1 Class structure
3.1.2 Hitting times and absorption probabilities
3.1.3 Recurrence and transience
3.1.4 Recurrence and transience in a random walk
3.1.5 Invariant distributions
3.1.6 Limiting behaviour
3.1.7 Ergodic theorem
3.2 Poisson process
3.2.1 Exponential distribution
3.2.2 Poisson process
3.3 Continuous-time Markov chains
3.3.1 Definition and examples
3.3.2 Computing the transition probability
3.3.3 Jump chain and holding times
3.3.4 Birth processes
3.3.5 Class structure
3.3.6 Hitting times
3.3.7 Recurrence and transience
3.3.8 Invariant distributions
3.3.9 Convergence to equilibrium
Chapter 1

Basic Concepts and Definitions

1.1 Definition of a Stochastic Process


A stochastic process with state space S is a collection of random variables
{Xt ; t ∈ T } defined on the same probability space (Ω, F, P ). The set T is called
its parameter set. If T = N = {0, 1, 2, ...}, the process is said to be a discrete
parameter process. If T is not countable, the process is said to have a continuous
parameter. In the latter case the usual examples are T = R+ = [0, ∞) and
T = [a; b] ⊂ R. The index t represents time, and then one thinks of Xt as the
state or the position of the process at time t. The state space is R in most usual
examples, and then the process is said real-valued. There will be also examples
where S is N, the set of all integers, or a finite set.
For every fixed ω ∈ Ω, the mapping

t −→ Xt (ω)
defined on the parameter set T , is called a realization, trajectory, sample path
or sample function of the process.
Let {Xt ; t ∈ T } be a real-valued stochastic process and {t1 < · · · < tn } ⊂ T ,
then the probability distribution Pt1 ,...,tn = P ◦ (Xt1 , ..., Xtn )−1 of the random
vector

(Xt1 , ..., Xtn ) : Ω −→ Rn


is called a finite-dimensional marginal distribution of the process {Xt ; t ∈ T }.
A theorem due to Kolmogorov establishes the existence of a stochastic process
associated with a given family of finite-dimensional distributions satisfying a
consistency condition. Let Ω = R^T, the collection of all functions ω := (ω(t))_{t∈T}
from T to R. Define X_t(ω) = ω(t). A set
C = {ω : X_{t_1}(ω) ∈ B_1, ..., X_{t_n}(ω) ∈ B_n}


for t_1 < · · · < t_n ∈ T and B_1, ..., B_n ∈ B(R) is called a cylinder set. Consider
the σ-field F generated by the cylinder sets.
Theorem 1.1.1 Consider a family of probability measures
{P_{t_1,...,t_n}, t_1 < · · · < t_n, n ≥ 1, t_i ∈ T}
such that:
1. P_{t_1,...,t_n} is a probability measure on R^n.
2. (Consistency condition) If {t_{k_1} < · · · < t_{k_m}} ⊂ {t_1 < · · · < t_n},
then P_{t_{k_1},...,t_{k_m}} is the marginal of P_{t_1,...,t_n} corresponding to the indexes k_1, ..., k_m.
Then there exists a unique probability P on F which has the family {P_{t_1,...,t_n}}
as finite-dimensional marginal distributions.

1.2 Examples
A real-valued process {X_t, t ≥ 0} is called a second order process provided
E(X_t²) < ∞ for all t ≥ 0. The mean and the covariance function of a second
order process {X_t, t ≥ 0} are defined by

m_X(t) = E(X_t),
Γ_X(s, t) = Cov(X_s, X_t) = E((X_s − m_X(s))(X_t − m_X(t))).

The variance of the process {X_t, t ≥ 0} is defined by

σ_X²(t) = Γ_X(t, t) = Var(X_t).
Example 1.2.1 Let X and Y be independent random variables. Consider the
stochastic process with parameter t ∈ [0, ∞)
Xt = tX + Y.
The sample paths of this process are lines with random coefficients. The finite-
dimensional marginal distributions are given by

P(X_{t_1} ≤ x_1, ..., X_{t_n} ≤ x_n) = ∫_R F_X( min_{1≤i≤n} (x_i − y)/t_i ) P_Y(dy).
Example 1.2.2 Consider the stochastic process

X_t = A cos(ϕ + λt),

where A and ϕ are independent random variables such that E(A) = 0, E(A²) <
∞ and ϕ is uniformly distributed on [0, 2π]. This is a second order process with

m_X(t) = 0,
Γ_X(s, t) = (1/2) E(A²) cos λ(t − s).
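A quick Monte Carlo check of these two formulas — a sketch in Python, assuming NumPy; the parameter values and the choice of a standard normal amplitude A are ours:

```python
import numpy as np

# Empirical check that m_X(t) = 0 and Gamma_X(s,t) = (1/2) E(A^2) cos(lambda (t-s))
# for X_t = A cos(phi + lambda t), with A centered and phi ~ Uniform(0, 2*pi).
rng = np.random.default_rng(0)
n, lam, s, t = 200_000, 2.0, 0.3, 1.1
A = rng.normal(0.0, 1.0, n)              # any centered A works; here E(A^2) = 1
phi = rng.uniform(0.0, 2 * np.pi, n)
Xs = A * np.cos(phi + lam * s)
Xt = A * np.cos(phi + lam * t)
print(Xs.mean(), Xt.mean())              # both close to 0
print(np.mean(Xs * Xt))                  # empirical Gamma_X(s, t)
print(0.5 * np.cos(lam * (t - s)))       # theoretical value with E(A^2) = 1
```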

Example 1.2.3 Arrival process: Consider the process of arrivals of customers


at a store, and suppose the experiment is set up to measure the interarrival times.
Suppose that the interarrival times are positive random variables X_1, X_2, ....
Then, for each t ∈ [0, ∞), we put N_t = k if and only if the integer k is such that

X_1 + ... + X_k ≤ t < X_1 + ... + X_{k+1},

and we put N_t = 0 if t < X_1. Then N_t is the number of arrivals in the time
interval [0, t]. Notice that for each t ≥ 0, N_t is a random variable taking values
in the set S = N. Thus, {N_t, t ≥ 0} is a continuous time process with values
in the state space N. The sample paths of this process are non-decreasing, right
continuous and they increase by jumps of size 1 at the times X_1 + ... + X_k. On
the other hand, N_t < ∞ for all t ≥ 0 if and only if

∑_{k=1}^∞ X_k = ∞.
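The construction of N_t from the interarrival times translates directly into code. A sketch, assuming NumPy, with exponential interarrival times as one concrete choice (which makes N_t a Poisson process, see Section 3.2):

```python
import numpy as np

# Build N_t from i.i.d. interarrival times: N_t = k iff
# X_1 + ... + X_k <= t < X_1 + ... + X_{k+1}.
rng = np.random.default_rng(1)
interarrivals = rng.exponential(scale=1.0, size=1000)  # X_1, X_2, ...
arrival_times = np.cumsum(interarrivals)               # X_1 + ... + X_k

def N(t):
    # number of arrival instants in [0, t]
    return np.searchsorted(arrival_times, t, side="right")

print(N(5.0), N(10.0))   # sample path values; non-decreasing, jumps of size 1
```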

Example 1.2.4 Consider a discrete time stochastic process {Xn , n = 0, 1, 2, ...}


with a finite number of states S = {1, 2, 3}. The dynamics of the process is as
follows. You move from state 1 to state 2 with probability 1. From state 3 you
move either to 1 or to 2 with equal probability 1/2, and from 2 you jump to 3
with probability 1/3, otherwise stay at 2. This is an example of a Markov chain.
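A minimal simulation sketch of this chain in Python, assuming NumPy (states are relabelled 0, 1, 2 for indexing):

```python
import numpy as np

# Transition matrix of Example 1.2.4, rows indexed by the current state.
P = np.array([[0.0, 1.0, 0.0],      # from state 1: move to 2 with probability 1
              [0.0, 2/3, 1/3],      # from state 2: jump to 3 w.p. 1/3, else stay
              [0.5, 0.5, 0.0]])     # from state 3: go to 1 or 2 w.p. 1/2 each

rng = np.random.default_rng(2)
x, path = 0, [0]
for _ in range(20):
    x = rng.choice(3, p=P[x])       # next state drawn from row P[x]
    path.append(x)
print([s + 1 for s in path])        # back to the labels 1, 2, 3
```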
A real-valued stochastic process {Xt , t ∈ T } is said to be Gaussian or normal
if its finite-dimensional marginal distributions are multi-dimensional Gaussian
laws. The mean mX (t) and the covariance function ΓX (s, t) of a Gaussian pro-
cess determine its finite-dimensional marginal distributions. Conversely, sup-
pose that we are given an arbitrary function m : T −→ R, and a symmetric
function Γ : T × T −→ R, which is nonnegative definite, that is,

∑_{i,j=1}^n Γ(t_i, t_j) a_i a_j ≥ 0

for all ti ∈ T, ai ∈ R, and n ≥ 1. Then there exists a Gaussian process with


mean m and covariance function Γ.
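In practice, a finite-dimensional vector of such a Gaussian process can be sampled by factorizing the covariance matrix: if Γ = LLᵀ, then m + LZ with Z standard Gaussian has the required law. A sketch in Python with NumPy, taking Γ(s, t) = min(s, t) (the Brownian covariance computed in Section 1.6.3) as a concrete choice:

```python
import numpy as np

# Sample (X_{t_1}, ..., X_{t_n}) for a centered Gaussian process with
# covariance Gamma(s, t) = min(s, t), via the Cholesky factor of Gamma.
rng = np.random.default_rng(3)
t = np.linspace(0.01, 1.0, 100)
Gamma = np.minimum.outer(t, t)                          # Gamma(s, t) = min(s, t)
L = np.linalg.cholesky(Gamma + 1e-12 * np.eye(len(t)))  # small jitter for stability
X = L @ rng.normal(size=len(t))                         # one sample of the vector
print(X[:5])
```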
Example 1.2.5 Let X and Y be random variables with joint Gaussian distri-
bution. Then the process Xt = tX + Y , t ≥ 0, is Gaussian with mean and
covariance functions
m_X(t) = tE(X) + E(Y),
Γ_X(s, t) = st Var(X) + (t + s) Cov(X, Y) + Var(Y).
Example 1.2.6 Gaussian white noise: Consider a stochastic process {Xt , t ∈
T } such that the random variables Xt are independent and with the same law
N (0, σ 2 ). Then, this process is Gaussian with mean and covariance functions,
mX (t) = 0
ΓX (s, t) = σ 2 δst .

1.3 Equivalence of Stochastic Processes


Definition 1.3.1 A stochastic process {Xt , t ∈ T } is equivalent to another
stochastic process {Yt , t ∈ T } if for each t ∈ T

P {Xt = Yt } = 1.

We also say that {Xt , t ∈ T } is a version of {Yt , t ∈ T }. Two equivalent


processes may have quite different sample paths.

Example 1.3.1 Let ξ be a nonnegative random variable with continuous dis-


tribution function. Set T = [0, ∞). The processes

X_t = 0,
Y_t = 0 if ξ ≠ t,   Y_t = 1 if ξ = t,

are equivalent but their sample paths are different.

Definition 1.3.2 Two stochastic processes {X_t, t ∈ T} and {Y_t, t ∈ T} are said
to be indistinguishable if X_·(ω) = Y_·(ω) for all ω ∉ N, with P(N) = 0.

If two stochastic processes are equivalent and have right-continuous sample
paths, then they are indistinguishable. Similarly, two discrete time stochastic
processes which are equivalent are also indistinguishable.

1.4 Continuity Concepts


Definition 1.4.1 A real-valued stochastic process {Xt , t ∈ T }, where T is an
interval of R, is said to be continuous in probability if, for any ε > 0 and every
t∈T
lim P (|Xt − Xs | > ε) = 0.
s−→t

Definition 1.4.2 Fix p ≥ 1. Let {Xt , t ∈ T } be a real-valued stochastic pro-


cess, where T is an interval of R, such that E(|Xt |p ) < ∞, for all t ∈ T . The
process {Xt , t ∈ T } is said to be continuous in mean of order p if

lim E(|Xt − Xs |p ) = 0.
s−→t

Continuity in mean of order p implies continuity in probability. However,
continuity in probability (or in mean of order p) does not necessarily imply
that the sample paths of the process are continuous.
In order to show that a given stochastic process has continuous sample paths
it is enough to have suitable estimates on the moments of the increments of the
process. The following continuity criterion by Kolmogorov provides a sufficient
condition of this type:

Proposition 1.4.1 (Kolmogorov continuity criterion) Let {Xt , t ∈ T } be


a real-valued stochastic process, where T is a finite interval. Suppose that there
exist constants α > 1, p > 0 and c_T > 0 such that

E(|Xt − Xs |p ) ≤ cT |t − s|α (1.1)

for all s, t ∈ T . Then, there exists a version of the process {Xt , t ∈ T } with
continuous sample paths.

Condition (1.1) also provides some information about the modulus of conti-
nuity of the sample paths of the process:

m_T(ω, δ) := sup_{s,t∈T, |t−s|<δ} |X_t(ω) − X_s(ω)|.

In particular, if T = [a, b], for each ε > 0 there exists a random variable G_ε
such that, with probability one,

m_T(ω, δ) ≤ G_ε δ^{(α−1)/p − ε}.

Moreover, E(G_ε^p) < ∞.

1.5 Main Classes of Random Processes


1. Processes with independent increments. Let X := {Xt , t ∈ T } be a real-
valued stochastic process on a probability space (Ω, F, P ), where T ⊂ R is an
interval.

Definition 1.5.1 The process X is said to have independent increments if for


any finite subset {t0 < · · · < tn } ⊂ T , the increments

Xt0 , Xt1 − Xt0 , ..., Xtn − Xtn−1

are independent random variables.

Remark 1.5.1 If T = [0, ∞), it follows from the definition that all the finite-
dimensional marginal distributions are determined by the laws of X_0 and of the
increments X_t − X_s, s < t ∈ T. The most important processes with independent
increments are the Poisson process and the Wiener process (or Brownian motion).

2. Markov processes.

Definition 1.5.2 The process X is said to be a Markov process with a state


space {R, B(R)} if for any {t1 < · · · < tn } ⊂ T and B ∈ B(R),

P (Xtn ∈ B|Xt1 , ..., Xtn−1 ) = P (Xtn ∈ B|Xtn−1 ) (a.s.) (1.2)

Remark 1.5.2 Property (1.2) is called the Markov property.

3. Strictly stationary processes.



Definition 1.5.3 The process X is said to be strictly stationary if for any
{t_1 < · · · < t_n} ⊂ T with {t_1 + h < · · · < t_n + h} ⊂ T and any B_1, ..., B_n ∈ B(R),

P(X_{t_1+h} ∈ B_1, ..., X_{t_n+h} ∈ B_n) = P(X_{t_1} ∈ B_1, ..., X_{t_n} ∈ B_n).
4. Wide sense stationary processes.
Definition 1.5.4 A second order process X is said to be wide sense stationary
if for all t ∈ T and t + h ∈ T
E(Xt+h ) = E(Xt )
and
Cov(Xt+h , Xs+h ) = Cov(Xt , Xs ).
5. Martingales.
Definition 1.5.5 A process X such that E(|X_t|) < ∞ for all t ∈ T is said to be
a martingale if, for every {t_1 < · · · < t_n} ⊂ T,

E(X_{t_n} | X_{t_1}, ..., X_{t_{n−1}}) = X_{t_{n−1}} (a.s.).
If
E(Xtn |Xt1 , ..., Xtn−1 ) ≤ Xtn−1 (a.s.),
the process X is a supermartingale. Finally if
E(Xtn |Xt1 , ..., Xtn−1 ) ≥ Xtn−1 (a.s.),
it is said to be a submartingale.

1.6 Some Applications


1.6.1 Model of a Germination Process
Let p be the probability that a seed fails to germinate. If there is germination,
assume that the germination time, T, follows an exponential distribution with
parameter λ. Then
P (T ≤ t) = P (T ≤ t|germination)P (germination)
= (1 − e−λt )(1 − p).
If we have N plants and X_t is the number of them that have germinated by
time t, then

X_t = ∑_{i=1}^N 1_{{T_i ≤ t}},

where T_i is the germination time of the i-th plant. If we assume that {T_i}_{i=1}^N
is an i.i.d. sequence, then

X_t ∼ Bi(N, (1 − e^{−λt})(1 − p)).
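A simulation sketch of this model in Python with NumPy (the parameter values below are arbitrary choices):

```python
import numpy as np

# N seeds, each fails to germinate with probability p; conditional on
# germination the time is Exp(lambda). The count X_t of germinated plants
# should then be Bi(N, (1 - e^{-lambda*t})(1 - p)).
rng = np.random.default_rng(4)
N, p, lam, t = 10_000, 0.2, 0.5, 3.0
germinates = rng.random(N) > p                    # which seeds germinate
T = np.where(germinates, rng.exponential(1 / lam, N), np.inf)
X_t = np.sum(T <= t)
print(X_t / N, (1 - np.exp(-lam * t)) * (1 - p))  # empirical vs. theoretical rate
```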


1.6.2 Modeling of Soil Erosion Effect


Let Y_1, Y_2, ..., Y_n, ... be the sequence of annual yields of a given crop in an area
without erosion and Z_1, Z_2, ..., Z_n, ... a sequence of factors with common support
in (0, 1]. Then we can model the crop yield in year n as

X_n = Y_n ∏_{i=1}^n Z_i,

and assume that (Y_n) and (Z_n) are, respectively, sequences of i.i.d. random
variables and that they are independent of each other.

1.6.3 Brownian motion


In 1827 Robert Brown noticed the irregular but ceaseless motion of pollen par-
ticles suspended in a liquid. Assume that ξ_t is one coordinate, say x, of the
position of a pollen particle. Then (ξ_t) is a continuous-time process. If we as-
sume that ξ_t − ξ_s has a distribution depending only on t − s, independent of what
happened before s, and that (ξ_t) is a continuous process, then it can be proved
that

ξ_t ∼ N(µt, σ²t),

where µ ∈ R and σ ∈ R_+. If µ = 0 and σ = 1 we have the so-called standard
Brownian motion. Since

(ξ_t − ξ_s)² = ξ_t² + ξ_s² − 2ξ_tξ_s,

we have, in the standard case, that

|t − s| = E((ξ_t − ξ_s)²) = E(ξ_t²) + E(ξ_s²) − 2Cov(ξ_t, ξ_s) = t + s − 2Cov(ξ_t, ξ_s),

and

Cov(ξ_t, ξ_s) = (1/2)(t + s − |t − s|) = min(s, t).
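A numerical check of this covariance identity, simulating standard Brownian motion on a grid from independent Gaussian increments — a sketch assuming NumPy:

```python
import numpy as np

# Empirical check, over many paths, that Cov(xi_t, xi_s) = min(s, t) for
# standard Brownian motion built from independent N(0, dt) increments.
rng = np.random.default_rng(5)
n_paths, n_steps, T = 50_000, 100, 1.0
dt = T / n_steps
increments = rng.normal(0.0, np.sqrt(dt), (n_paths, n_steps))
paths = np.cumsum(increments, axis=1)            # xi at times dt, 2*dt, ..., T
i, j = 29, 79                                    # times s = 0.3, t = 0.8
print(np.cov(paths[:, i], paths[:, j])[0, 1])    # close to min(s, t) = 0.3
print(min((i + 1) * dt, (j + 1) * dt))
```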

1.7 Problems
1. Let Ω = [0, 1], F = B([0, 1]) and let P be the uniform probability measure on
Ω. Consider the process X_t(ω) = tω, t ∈ [0, 1]. Find the trajectories and
the one- and two-dimensional distributions.
2. Let X_t = A cos αt + B sin αt and Y_t = R cos(αt + Θ), where α > 0, A and B
are independent random variables with distribution N(0, σ²), and R and Θ are
also independent random variables, with Θ ∼ Unif(0, 2π) and R having density

f_R(x) = (x/σ²) exp(−x²/(2σ²)) 1_{(0,+∞)}(x).

Show that the processes {X_t}_{t≥0} and {Y_t}_{t≥0} have the same law.
3. Let X_1 and X_2 be independent random variables defined on the same proba-
bility space (Ω, F, P) with the same standard normal distribution.
Let {Y_t}_{t≥0} be the stochastic process defined by

Y_t = (X_1 + X_2) t.

Determine the finite-dimensional distributions of the process. If A is the
set of non-negative trajectories of the process, compute P(A).
4. Let {Y_t}_{t≥0} be the stochastic process defined by

Y_t = X + αt, α > 1,

where X is a random variable with law N(0, 1). Let D ⊂ [0, +∞) be a
finite or countably infinite set. Determine:

(a) P(Y_t = 0 for at least one t ∈ D),

(b) P(Y_t = 0 for at least one t ∈ [1, 2]).

5. Let X and Y be random variables defined on (Ω, F, P), where Y ∼ N(0, 1).
Let {Z_t}_{t≥0} be the stochastic process

Z_t = X + t(Y + t).

Let A be the set of non-decreasing trajectories of this process. Deter-
mine P(A).
6. Let {X_i}_{i=1,...,n} and {Y_i}_{i=1,...,n} be random variables such that E[X_i] = E[Y_i] =
0, Var[X_i] = Var[Y_i] = σ_i < +∞ and E[X_iX_j] = E[Y_iY_j] = E[X_iY_j] = 0 for
all i ≠ j. Let {Z_t}_{t≥0} be the stochastic process defined by

Z_t = ∑_{i=1}^n {X_i cos(λ_i t) + Y_i sin(λ_i t)}.

Determine its covariance function. Is {Z_t}_{t≥0} wide sense stationary?
7. We say that a process {N_t}_{t≥0} is a homogeneous Poisson process with
parameter λ > 0 if it has independent increments, N_0 = 0, and for all 0 <
t_1 < t_2,

P(N_{t_2} − N_{t_1} = k) = ((λ(t_2 − t_1))^k / k!) e^{−λ(t_2−t_1)}, k ∈ Z_+.

Let {X_k}_{k≥0} be a sequence of i.i.d. random variables with E[X_k] = 0 and
Var[X_k] = σ² < +∞, and let {N_t}_{t≥0} be a homogeneous Poisson process of
parameter λ > 0, independent of {X_k}_{k≥0}. Is the process {Y_t}_{t≥0}, defined
by Y_t = X_{N_t}, wide sense stationary? Is it strictly stationary?
8. Let {Y_t}_{t≥0} be the stochastic process defined by

Y_t = α sin(βt + X),

where α, β > 0 and X ∼ N(0, 1). Compute E[Y_t] and E[Y_sY_t]. Is the process
{Y_t}_{t≥0} wide sense stationary?
Chapter 2

Discrete time martingales

We will first introduce the notion of conditional expectation of a random variable


X with respect to a σ-field B ⊂ F in a probability space (Ω, F, P ).

2.1 Conditional expectation


Consider an integrable random variable X defined on a probability space (Ω, F, P),
and a σ-field B ⊂ F. We define the conditional expectation of X given B (denoted
by E(X|B)) to be any integrable random variable Z that satisfies the following
two properties:

(i) Z is measurable with respect to B.

(ii) For all A ∈ B, E(Z1_A) = E(X1_A).

It can be proved that there exists a unique (up to modifications on sets of


probability zero) random variable satisfying these properties. That is, if Z̃ and
Z satisfy the above properties, then Z = Z̃, P -almost surely.
Property (ii) implies that for any bounded and B-measurable random vari-
able Y we have
E(E(X|B)Y ) = E(XY ) (2.1)

Example 2.1.1 Consider the particular case where the σ-field B is generated by
a finite partition {B1 , ..., Bm }. In this case, the conditional expectation E(X|B)
is a discrete random variable that takes the constant value E(X|Bj ) on each set
Bj :
E(X|B) = ∑_{j=1}^m (E(X 1_{B_j}) / P(B_j)) 1_{B_j}.
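A small numerical illustration of this formula in Python with NumPy; Ω is discretized here as a finite set of equally likely points, an assumption made only for the example:

```python
import numpy as np

# E(X | B) for B generated by a finite partition: it takes the constant value
# E(X 1_{B_j}) / P(B_j) on each block B_j.
rng = np.random.default_rng(6)
X = rng.normal(size=9)                           # X(omega) on 9 equally likely points
labels = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])   # partition into B_0, B_1, B_2
cond_exp = np.empty_like(X)
for j in range(3):
    block = labels == j
    cond_exp[block] = X[block].mean()            # block average = E(X 1_{B_j})/P(B_j)
print(cond_exp)                                  # constant on each block
print(cond_exp.mean(), X.mean())                 # E(E(X|B)) = E(X)
```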

Here are some properties of the conditional expectations in the general case:

1. The conditional expectation is linear: E(aX+bY |B) = aE(X|B)+bE(Y |B).


2. A random variable and its conditional expectation have the same expecta-
tion: E(E(X|B)) = E(X). This follows from property (ii) taking A = Ω.
3. If X and B are independent, then E(X|B) = E(X). In fact, the constant
E(X) is clearly B-measurable, and for all A ∈ B we have E(X1A ) =
E(X)E(1A ) = E(E(X)1A ).
4. If X is B-measurable, then E(X|B) = X.
5. If Y is a bounded and B-measurable random variable, then E(Y X|B) =
Y E(X|B). In fact, the random variable Y E(X|B) is integrable and B-
measurable, and for all A ∈ B we have

E(E(X|B)Y 1A ) = E(XY 1A ),

where the equality follows from (2.1). This property means that B-measurable
random variables behave as constants and can be factorized out of the
conditional expectation with respect to B. This property holds if X, Y ∈
L2 (Ω).
6. Given two σ-fields C ⊂ B, then E(E(X|B)|C) = E(X|C).
7. Consider two random variables X and Z, such that Z is B-measurable
and X is independent of B. Consider a measurable function h(x, z) such
that the composition h(X, Z) is an integrable random variable. Then,
we have E(h(X, Z)|B) = E(h(X, z))|_{z=Z}. That is, we first compute the
expectation E(h(X, z)) for any fixed value z of the random variable Z
and, afterwards, we replace z by Z.

Conditional expectation has properties similar to those of ordinary expecta-


tion. For instance, the following monotone property holds:

X ≤ Y ⇒ E(X|B) ≤ E(Y |B).

This implies |E(X|B)| ≤ E(|X||B).


Jensen’s inequality also holds. That is, if ϕ is a convex function defined in
an open interval, such that E(|ϕ(X)|) < ∞, then

ϕ(E(X|B)) ≤ E(ϕ(X)|B). (2.2)

In particular, if we take ϕ(x) = |x|p with p ≥ 1, and E(|X|p ) < ∞, we obtain

|E(X|B)|p ≤ E(|X|p |B),

hence, taking expectations,

E(|E(X|B)|p ) ≤ E(|X|p ). (2.3)

We can define the conditional probability of an event C ∈ F given a σ-field B


as
P (C|B) = E(1C |B).

Suppose that the σ-field B is generated by a finite collection of random vari-


ables Y1 , ..., Ym . In this case, we will denote the conditional expectation of X
given B by E(X|Y1 , ..., Ym ) and this conditional expectation is the mean of the
conditional distribution of X given Y1 , ..., Ym . The conditional distribution of
X given Y1 , ..., Ym is a family of distributions p(dx|y1 , ..., ym ) parametrized by
the possible values y_1, ..., y_m of the random variables Y_1, ..., Y_m, such that for
all a < b

P(a ≤ X ≤ b | Y_1, ..., Y_m) = ∫_a^b p(dx | Y_1, ..., Y_m).
Then, this implies that

E(X | Y_1, ..., Y_m) = ∫_R x p(dx | Y_1, ..., Y_m).

Notice that the conditional expectation E(X|Y1 , ..., Ym ) is a function g(Y1 , ..., Ym )
of the variables Y1 , ..., Ym , where
g(y_1, ..., y_m) = ∫_R x p(dx | y_1, ..., y_m).

In particular, if the random variables X, Y1 , ..., Ym have a joint density f (x, y1 , ..., ym ),
then the conditional distribution has the density:

f (x, y1 , ..., ym )
f (x|y1 , ..., ym ) = R ∞ ,
−∞
f (x, y1 , ..., ym )dx

and Z ∞
E(X|Y1 , ..., Ym ) = xf (x|Y1 , ..., Ym )dx.
−∞

The set of all square integrable random variables, denoted by L2 , is a Hilbert


space with the scalar product

hZ, Y i = E(ZY ).

Then, the set of square integrable and B-measurable random variables, denoted
by L2 (Ω, B, P ) is a closed subspace of L2 (Ω, F, P ). Then, given a random
variable X such that E(X 2 ) < ∞, the conditional expectation E(X|B) is the
projection of X on the subspace L2 (Ω, B, P ). In fact, we have:

(i) E(X|B) belongs to L2 (Ω, B, P ) because it is a B-measurable random vari-


able and it is square integrable due to (2.3).

(ii) X − E(X|B) is orthogonal to the subspace L²(Ω, B, P). In fact, for all
Z ∈ L²(Ω, B, P) we have, using property 5,

E[(X − E(X|B))Z] = E(XZ) − E(E(X|B)Z) = E(XZ) − E(E(XZ|B)) = 0.

As a consequence, E(X|B) is the random variable in L2 (Ω, B, P ) that minimizes


the mean square error:

E[(X − E(X|B))²] = min_{Y ∈ L²(Ω,B,P)} E[(X − Y)²].

This follows from the relation

E[(X − Y)²] = E[(X − E(X|B))²] + E[(E(X|B) − Y)²],

and it means that the conditional expectation is the optimal estimator of X


given the σ-field B.

2.2 Discrete Time Martingales


In this section we consider a probability space (Ω, F, P ) and a nondecreasing
sequence of σ- fields
F0 ⊆ F1 ⊆ · · · ⊆ Fn ⊆ · · ·

contained in F. F := {Fn }n≥0 is called a filtration. A sequence of real random


variables M = {Mn }n≥0 is called a martingale with respect to the filtration F
if:
(i) For each n ≥ 0, Mn is Fn -measurable (that is, M is adapted to the
filtration F).
(ii) For each n ≥ 0, E(|Mn |) < ∞.
(iii) For each n ≥ 0,

E(Mn+1 |Fn ) = Mn .

The sequence M = {Mn } n≥0 is called a supermartingale ( or submartin-


gale) if property (iii) is replaced by E(Mn+1 |Fn ) ≤ Mn (or E(Mn+1 |Fn ) ≥ Mn ).
Notice that the martingale property implies that E(Mn ) = E(M0 ) for all
n ≥ 0.
On the other hand, condition (iii) can also be written as

E(∆Mn |Fn−1 ) = 0,

for all n ≥ 1, where ∆Mn = Mn − Mn−1 .

Example 2.2.1 Suppose that {ξ_n, n ≥ 1} are i.i.d. centered random variables.
Set M_0 = 0 and M_n = ξ_1 + ... + ξ_n, for n ≥ 1. Then (M_n) is a martingale w.r.t.
(F_n), with F_n = σ(ξ_1, ..., ξ_n) for n ≥ 1, and F_0 = {∅, Ω}. In fact,

E(M_{n+1} | F_n) = M_n + E(ξ_{n+1} | F_n) = M_n + E(ξ_{n+1}) = M_n,

since ξ_{n+1} is independent of F_n.

Example 2.2.2 Suppose that {ξ_n, n ≥ 1} are i.i.d. random variables such that
P(ξ_n = 1) = 1 − P(ξ_n = −1) = p, where 0 < p < 1. Then M_n = ((1−p)/p)^{ξ_1+...+ξ_n},
M_0 = 1, is a martingale w.r.t. (F_n), with F_n = σ(ξ_1, ..., ξ_n) for n ≥ 1, and
F_0 = {∅, Ω}. In fact,

E(M_{n+1} | F_n) = E( ((1−p)/p)^{ξ_1+...+ξ_{n+1}} | F_n )
= ((1−p)/p)^{ξ_1+...+ξ_n} E( ((1−p)/p)^{ξ_{n+1}} | F_n )
= M_n E( ((1−p)/p)^{ξ_{n+1}} ) = M_n,

since E(((1−p)/p)^{ξ_{n+1}}) = p · (1−p)/p + (1−p) · p/(1−p) = 1.
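A quick Monte Carlo check of this martingale property — a sketch in Python with NumPy; the value p = 0.7 is an arbitrary choice:

```python
import numpy as np

# M_n = ((1-p)/p)^{xi_1 + ... + xi_n} should satisfy E(M_n) = E(M_0) = 1
# for every n, even though the walk itself drifts upward for p > 1/2.
rng = np.random.default_rng(7)
p, n_paths, n = 0.7, 200_000, 10
xi = np.where(rng.random((n_paths, n)) < p, 1, -1)   # P(xi = 1) = p
S = np.cumsum(xi, axis=1)                            # xi_1 + ... + xi_k
M = ((1 - p) / p) ** S
print(M[:, [0, 4, 9]].mean(axis=0))                  # all close to 1 = E(M_0)
```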

In the two previous examples, Fn = σ(M0 , ..., Mn ), for all n ≥ 0. That is,
(Fn ) is the filtration generated by the process (Mn ). Usually, when the filtration
is not mentioned, we will take Fn = σ(M0 , ..., Mn ), for all n ≥ 0. This is always
possible due to the following result:

Proposition 2.2.1 Suppose (Mn )n≥0 is a martingale with respect to a filtration


(Gn ). Let Fn = σ(M0 , ..., Mn ) ⊆ Gn . Then (Mn )n≥0 is a martingale with respect
to the filtration (Fn ).

Proof. We have, by the properties 6 and 4 of the conditional expectation,

E(Mn+1 |Fn ) = E(E(Mn+1 |Gn )|Fn ) = E(Mn |Fn ) = Mn .

Some elementary properties of martingales:


1. If (Mn ) is a martingale, then for all m ≥ n we have

E(Mm |Fn ) = Mn .
In fact, for m > n,

E(Mm |Fn ) = E(E(Mm |Fm−1 )|Fn )


= E(Mm−1 |Fn ) = ... = E(Mn+1 |Fn ) = Mn .

2. (Mn ) is a submartingale if and only if (−Mn ) is a supermartingale.


3. If (M_n) is a martingale and ϕ is a convex function, defined on an open
interval, such that E(|ϕ(M_n)|) < ∞ for all n ≥ 0, then (ϕ(M_n)) is a sub-
martingale. In fact, by Jensen's inequality for the conditional expectation we
have
E(ϕ(M_{n+1}) | F_n) ≥ ϕ(E(M_{n+1} | F_n)) = ϕ(M_n).
In particular, if (M_n) is a martingale such that E(|M_n|^p) < ∞ for all n ≥ 0 and
for some p ≥ 1, then (|M_n|^p) is a submartingale.

4. If (M_n) is a submartingale and ϕ is a convex and increasing function such
that E(|ϕ(M_n)|) < ∞ for all n ≥ 0, then (ϕ(M_n)) is a submartingale. In fact,
by Jensen's inequality for the conditional expectation we have

E(ϕ(M_{n+1}) | F_n) ≥ ϕ(E(M_{n+1} | F_n)) ≥ ϕ(M_n).

In particular, if (M_n) is a submartingale, then (M_n^+) and (M_n ∨ a) are sub-
martingales.
Suppose that (Fn )n≥0 is a given filtration. We say that (Hn )n≥1 is a pre-
dictable sequence of random variables if for each n ≥ 1, Hn is Fn−1 -measurable.
We define the martingale transform of (M_n) by (H_n) as the sequence

(H · M)_n = M_0 + ∑_{j=1}^n H_j ∆M_j, n ≥ 1.

Proposition 2.2.2 If (Mn )n≥0 is a (sub)martingale and (Hn )n≥1 is a bounded


(nonnegative) predictable sequence, then the martingale transform ((H · M )n )
is a (sub)martingale.

Proof. Clearly, for each n ≥ 0 the random variable (H · M )n is Fn -


measurable and integrable. On the other hand, if n ≥ 0 we have

E(∆(H · M)_{n+1} | F_n) = E(H_{n+1} ∆M_{n+1} | F_n) = H_{n+1} E(∆M_{n+1} | F_n) = 0

in the martingale case (and ≥ 0 in the submartingale case with H nonnegative).

Remark 2.2.1 We may think of Hn as the amount of money a gambler will


bet at time n. Suppose that ∆Mn = Mn − Mn−1 is the amount of money a
gambler can win or lose at every step of the game if the bet is 1 Euro, and M0
is the initial capital of the gambler. Then, Mn will be the fortune of the gambler
at time n if in each step the bet is 1 Euro, and (H · M)_n will be the fortune
of the gambler if he uses the gambling system (H_n). The fact that (M_n) is a
martingale means that the game is fair. So, the previous proposition tells us
that if a game is fair, it remains fair regardless of the gambling system (H_n).

Example 2.2.3 Suppose that M0 = 0 and Mn = ξ1 + ... + ξn , n ≥ 1, where


{ξ_n, n ≥ 1} are i.i.d. random variables such that P(ξ_n = 1) = P(ξ_n = −1) = 1/2.
(M_n) is a martingale. A famous betting system is "the doubling strategy", defined
as H_1 = 1 and, for n > 1,

H_n = 2^{n−1} if ξ_1 = ... = ξ_{n−1} = −1, and H_n = 0 otherwise.

Then, if we stop at τ = inf{k, ξ_k = 1} we will have

(H · M)_τ = ∑_{n=1}^τ H_n ∆M_n = −∑_{n=1}^{τ−1} 2^{n−1} + 2^{τ−1} = 1 (a.s.),

and our initial capital was zero! Note that P(τ = k) = (1/2)^k, so P(τ < ∞) = 1
and we will stop at a finite time. However, this result does not contradict the
previous proposition, since if we stop the process at τ and we look at what happens
at a fixed time n in relation to previous times, we obtain a martingale, that is,

E((H · M)_{n∧τ} | F_{n−1}) = (H · M)_{(n−1)∧τ} for all n ≥ 1.

It is remarkable that the term martingale originates from the French name
for the gambling strategy described above.
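A simulation sketch of the doubling strategy in Python — the gain is always 1 euro, while the stakes before stopping are unbounded:

```python
import numpy as np

# Bet 2^{n-1} after n-1 losses, stop at the first win (tau).
rng = np.random.default_rng(8)
for _ in range(5):
    gain, bet = 0, 1
    while True:
        if rng.random() < 0.5:       # win: xi = +1
            gain += bet
            break
        gain -= bet                  # loss: xi = -1
        bet *= 2                     # double the stake
    print(gain)                      # always 1
```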

2.3 Stopping times


Consider a filtration F = {F_n}_{n≥0} in a probability space (Ω, F, P). A (general-
ized) random variable T is called a stopping time if it indicates the time when an
observable event happens. In this way, at time n we know the value of T ∧ (n+1);
in other words, T ∧ (n + 1) is F_n-measurable, or equivalently {T ≤ n} ∈ F_n for all
n ≥ 0.

Definition 2.3.1 A (generalized) random variable T is called a stopping time


if
T : Ω −→ Z+ ∪ {∞},
satisfies {T ≤ n} ∈ F_n for all n ≥ 0.

Proposition 2.3.1 T is an F-stopping time iff

{T = n} ∈ Fn , for all n ≥ 0.

Proof. {T ≤ n} = ∪_{j=0}^n {T = j} and
{T = n} = {T ≤ n} ∩ {T ≤ n − 1}^c.

Example 2.3.1 Consider a discrete time stochastic process (Xn ) adapted to F.


Let A be a subset of the state space. Then the first hitting time of A, T_A, is a
stopping time because

{T_A = n} = {X_0 ∉ A, X_1 ∉ A, ..., X_{n−1} ∉ A, X_n ∈ A}.

Remark 2.3.1 We could consider random times with a certain delay, for in-
stance {T ≤ n − 1} ∈ F_n, or we can even consider predictable times:
{T ≤ n} ∈ F_{n−1}.

Remark 2.3.2 The extension of the notion of stopping time to continuous time
is evident: the (generalized) random variable

T : Ω −→ [0, ∞]

is a stopping time if {T ≤ t} ∈ Ft . Random times with infinitesimal delay are


called optional times: {T < t} ∈ Ft (equivalently {T ≤ t} ∈ Ft+ := ∩s>t Fs ),
random times with infinitesimal anticipation are called predictable times: {T ≤
t} ∈ Ft− := σ(∪s<t Fs ).

Consider a real-valued stochastic process {Xt , t ≥ 0} with continuous tra-


jectories, and adapted to F = {Ft , t ≥ 0}. Assume X0 = 0 and a > 0. The first
passage time for a level a ∈ R defined by

Ta := inf{t > 0 : Xt = a}
is a stopping time because

{Ta ≤ t} = { sup Xs ≥ a} = { sup Xs ≥ a} ∈ Ft .


0≤s≤t 0≤s≤t,s∈Q

Properties of stopping times:


1. If S and T are stopping times, so are S ∨ T and S ∧ T. In fact, this is a
consequence of the relations

{S ∨ T ≤ t} = {S ≤ t} ∩ {T ≤ t} ,
{S ∧ T ≤ t} = {S ≤ t} ∪ {T ≤ t} .

2. Given a stopping time T , we can define the σ-field

FT = {A : A ∩ {T ≤ t} ∈ Ft , for all t ≥ 0}.


F_T is a σ-field because it contains the empty set, it is stable under complements
due to

A^c ∩ {T ≤ t} = (A ∪ {T > t})^c = ((A ∩ {T ≤ t}) ∪ {T > t})^c ∈ F_t,

and it is stable under countable intersections.


3. If S and T are stopping times such that S ≤ T , then FS ⊂ FT . In fact,
if A ∈ FS , then

A ∩ {T ≤ t} = A ∩ {S ≤ t} ∩ {T ≤ t} ∈ Ft

for all t ≥ 0.

Remark 2.3.3 The σ-field F_T represents the information available up to the
stopping time T: we observe what happens until time T, time T included.

4. Let {Xt } be an adapted stochastic process (with discrete or continuous


parameter) and let T be a stopping time. If the parameter is continuous, we
assume that the trajectories of the process {Xt } are right-continuous. Then the
random variable
X_T(ω) := X_{T(ω)}(ω)
is FT -measurable. In discrete time this property follows from the relation

{XT ∈ B} ∩ {T = n} = {Xn ∈ B} ∩ {T = n} ∈ Fn

for any subset B of the state space (Borel set if the state space is R).

2.4 Optional stopping theorem


Consider a discrete time filtration and suppose that T is a stopping time. Then
the process

H_n = 1_{{T ≥ n}}

is predictable. In fact, {T ≥ n} = {T ≤ n − 1}^c ∈ F_{n−1}. The martingale
transform of (M_n) by this sequence is

(H · M)_n = M_0 + ∑_{j=1}^n 1_{{T ≥ j}} ∆M_j = M_0 + ∑_{j=1}^{T∧n} ∆M_j = M_{T∧n}.

As a consequence, if {M_n} is a (sub)martingale, the stopped process M_n^T :=
M_{T∧n} will be a (sub)martingale.

Theorem 2.4.1 (Optional Stopping Theorem) Suppose that {M_n} is a sub-
martingale and S ≤ T < m are two stopping times bounded by a fixed time m.
Then

E(M_T | F_S) ≥ M_S,

with equality in the martingale case.

Proof. We give the proof only in the martingale case. Notice first that
M_T is integrable because

|M_T| ≤ ∑_{n=0}^m |M_n|.

Consider the predictable process H_n = 1_{{S<n≤T}∩A}, where A ∈ F_S. Notice
that {H_n} is predictable because

{S < n ≤ T} ∩ A = {T < n}^c ∩ ({S ≤ n − 1} ∩ A) ∈ F_{n−1}.

Moreover, the random variables Hn are nonnegative and bounded by one. There-
fore by Proposition 2.2.2, (H · M )n is a martingale. We have

(H · M )0 = M0
(H · M )m = M0 + 1A (MT − MS ).

The martingale property of (H · M )n implies that E(1A (MT − MS )) = 0 for all


A ∈ FS and this implies that E(MT |FS ) = MS , because MS is FS -measurable.

Theorem 2.4.2 (Doob's Maximal Inequality) Suppose that {M_n} is a sub-
martingale and λ > 0. Then

P( sup_{0≤n≤N} M_n ≥ λ ) ≤ (1/λ) E(M_N 1_{{sup_{0≤n≤N} M_n ≥ λ}}) ≤ (1/λ) E(M_N^+).

Proof. Consider the stopping time

T = inf{n ≥ 0, Mn ≥ λ} ∧ N.

Then, by the Optional Stopping Theorem,

E(M_N) ≥ E(M_T) = E(M_T 1_{{sup_{0≤n≤N} M_n ≥ λ}}) + E(M_T 1_{{sup_{0≤n≤N} M_n < λ}})
≥ E(λ 1_{{sup_{0≤n≤N} M_n ≥ λ}}) + E(M_N 1_{{sup_{0≤n≤N} M_n < λ}}),

since M_T ≥ λ on {sup_{0≤n≤N} M_n ≥ λ} and T = N on the complementary set.

Corollary 2.4.1 If {M_n} is a martingale with E(|M_n|^p) < ∞ for some p ≥ 1,
then

P( sup_{0≤n≤N} |M_n| ≥ λ ) ≤ (1/λ^p) E(|M_N|^p).

Remark 2.4.1 Note that the last inequality is a generalization of the Chebyshev
inequality.

2.5 The Snell envelope and optimal stopping


Let (Y_n)_{0≤n≤N} be a process adapted to (F_n) with finite expectation. Define

X_N = Y_N,
X_n = max(Y_n, E(X_{n+1} | F_n)), 0 ≤ n ≤ N − 1.

We say that (X_n) is the Snell envelope of (Y_n).
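The definition is a backward recursion, so the Snell envelope is directly computable in models where the conditional expectation is explicit. A sketch in Python for Y_n = f(S_n) with (S_n) a symmetric ±1 random walk; the payoff f below is an arbitrary choice of ours:

```python
# Backward induction for the Snell envelope of Y_n = f(S_n). Since the reward
# depends on the walk only through its current value, E(X_{n+1} | F_n) is the
# average of the two possible moves.
N = 10
f = lambda s: max(4 - abs(s), 0)     # an arbitrary payoff

# X[n] maps each reachable walk value s (s = -n, -n+2, ..., n) to X_n(s)
X = {N: {s: f(s) for s in range(-N, N + 1, 2)}}
for n in range(N - 1, -1, -1):
    X[n] = {s: max(f(s), 0.5 * (X[n + 1][s + 1] + X[n + 1][s - 1]))
            for s in range(-n, n + 1, 2)}
print(X[0][0])   # value of the optimal stopping problem, sup E(Y_tau)
```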

Proposition 2.5.1 The sequence (X_n) is the smallest supermartingale that
dominates the sequence (Y_n).
2.5. THE SNELL ENVELOPE AND OPTIMAL STOPPING 23

Proof. (Xn ) is adapted and by construction

E(Xn+1 |Fn ) ≤ Xn .

Let (T_n) be another supermartingale that dominates (Y_n); then T_N ≥ Y_N = X_N.
Assume, by backward induction, that T_{n+1} ≥ X_{n+1}. Then, by the monotonicity
of the conditional expectation and since (T_n) is a supermartingale,

T_n ≥ E(T_{n+1} | F_n) ≥ E(X_{n+1} | F_n);

moreover (T_n) dominates (Y_n), so

T_n ≥ max(Y_n, E(X_{n+1} | F_n)) = X_n.

Remark 2.5.1 For fixed ω, if X_n(ω) is strictly greater than Y_n(ω), then
X_n(ω) = E(X_{n+1} | F_n)(ω), so up to this n the process X behaves as a martin-
gale. This indicates that if we "stop" X_n properly we can obtain a martingale.

Proposition 2.5.2 The random variable

ν = inf{n ≥ 0, Xn = Yn }

is a stopping time and (Xnν ) is a martingale.

Proof.

{ν = n} = {X0 > Y0 } ∩ ... ∩ {Xn−1 > Yn−1 } ∩ {Xn = Yn } ∈ Fn .

And

X_n^ν = X_0 + ∑_{j=1}^n 1_{{j≤ν}} (X_j − X_{j−1}),

therefore

X_{n+1}^ν − X_n^ν = 1_{{n+1≤ν}} (X_{n+1} − X_n)

and

E(X_{n+1}^ν − X_n^ν | F_n) = 1_{{n+1≤ν}} E(X_{n+1} − X_n | F_n)
= 0 if ν ≤ n, since the indicator vanishes,
= 0 if ν > n, since in such a case X_n = E(X_{n+1} | F_n).

We denote by τ_{n,N} the set of stopping times with values in {n, n + 1, ..., N}.

Corollary 2.5.1

X_0 = E(Y_ν | F_0) = sup_{τ∈τ_{0,N}} E(Y_τ | F_0).

Proof. (X_n^ν) is a martingale and consequently

X_0 = E(X_N^ν | F_0) = E(X_{N∧ν} | F_0) = E(X_ν | F_0) = E(Y_ν | F_0).

On the other hand (X_n) is a supermartingale, and then so is (X_n^τ) for all
τ ∈ τ_{0,N}, so

X_0 ≥ E(X_N^τ | F_0) = E(X_τ | F_0) ≥ E(Y_τ | F_0),

therefore

E(Y_ν | F_0) ≥ E(Y_τ | F_0), ∀τ ∈ τ_{0,N}.

Remark 2.5.2 Analogously we could prove that

X_n = E(Y_{ν_n} | F_n) = sup_{τ∈τ_{n,N}} E(Y_τ | F_n),

where

ν_n = inf{j ≥ n, X_j = Y_j}.

Definition 2.5.1 A stopping time ν is said to be optimal for the sequence (Yn )
if
E(Yν |F0 ) = sup E(Yτ |F0 ).
τ ∈τ0,N

Remark 2.5.3 The stopping time ν = inf{n, X_n = Y_n} (where X is the Snell
envelope of Y) is then an optimal stopping time for Y. We shall see that it is
the smallest optimal stopping time.

The following theorem characterizes the optimal stopping times.

Theorem 2.5.1 τ is an optimal stopping time if and only if

(i) X_τ = Y_τ;
(ii) (X_n^τ) is a martingale.

Proof. If (X_n^τ) is a martingale and X_τ = Y_τ, then

X_0 = E(X_N^τ | F_0) = E(X_{N∧τ} | F_0) = E(X_τ | F_0) = E(Y_τ | F_0).

On the other hand, for any stopping time π, (X_n^π) is a supermartingale, so

X_0 ≥ E(X_N^π | F_0) = E(X_π | F_0) ≥ E(Y_π | F_0).

Conversely, we know, by the previous corollary, that X_0 = sup_{τ∈τ_{0,N}} E(Y_τ | F_0).
Then, if τ is optimal

X0 = E(Yτ |F0 ) ≤ E(Xτ |F0 ) ≤ X0 ,



where the last inequality is due to the fact that (Xnτ ) is a supermartingale. So,
we have
E(Xτ − Yτ |F0 ) = 0
and since Xτ − Yτ ≥ 0, we conclude that Xτ = Yτ .
Now we can also see that (X_n^τ) is a martingale. We know that it is a super-
martingale, so

X_0 ≥ E(X_n^τ | F_0) ≥ E(X_N^τ | F_0) = E(X_τ | F_0) = X_0,

as we saw before. Then, for all n,

E(X_n^τ − E(X_τ | F_n) | F_0) = 0,

and since (X_n^τ) is a supermartingale,

X_n^τ ≥ E(X_N^τ | F_n) = E(X_τ | F_n),

therefore X_n^τ = E(X_τ | F_n).

2.6 Decomposition of supermartingales


Proposition 2.6.1 Any supermartingale (Xn ) has a unique decomposition:

Xn = Mn − An

where (Mn ) is a martingale and (An ) is non-decreasing predictable with A0 = 0.

Proof. It is enough to write

X_n = ∑_{j=1}^n (X_j − E(X_j | F_{j−1})) − ∑_{j=1}^n (X_{j−1} − E(X_j | F_{j−1})) + X_0

and to identify

M_n = ∑_{j=1}^n (X_j − E(X_j | F_{j−1})) + X_0,
A_n = ∑_{j=1}^n (X_{j−1} − E(X_j | F_{j−1})),

where we define M0 = X0 and A0 = 0. So (Mn ) is a martingale:

Mn − Mn−1 = Xn − E(Xn |Fn−1 ), 1≤n≤N

in such a way that

E(Mn − Mn−1 |Fn−1 ) = 0, 1 ≤ n ≤ N.



Finally since (Xn ) is supermartingale

An − An−1 = Xn−1 − E(Xn |Fn−1 ) ≥ 0, 1 ≤ n ≤ N.

Now we can see the uniqueness. If

M_n − A_n = M'_n − A'_n, 0 ≤ n ≤ N,

we have

M_n − M'_n = A_n − A'_n, 0 ≤ n ≤ N,

but then, since (M_n) and (M'_n) are martingales and (A_n) and (A'_n) are
predictable, it turns out that

A_{n−1} − A'_{n−1} = M_{n−1} − M'_{n−1} = E(M_n − M'_n | F_{n−1})
= E(A_n − A'_n | F_{n−1}) = A_n − A'_n, 1 ≤ n ≤ N,

that is,

A_N − A'_N = A_{N−1} − A'_{N−1} = ... = A_0 − A'_0 = 0,

since by hypothesis A_0 = A'_0 = 0.
This decomposition is known as the Doob decomposition.
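For a concrete case where the decomposition is explicit, take X_n = ξ_1 + ... + ξ_n with E(ξ) = µ < 0, so that X_{n−1} − E(X_n | F_{n−1}) = −µ is constant. A sketch in Python with NumPy:

```python
import numpy as np

# Doob decomposition of the supermartingale X_n = xi_1 + ... + xi_n with
# E(xi) = mu < 0: here A_n = -n*mu is deterministic (hence predictable and
# non-decreasing) and M_n = X_n + A_n is the martingale part.
rng = np.random.default_rng(9)
mu, n = -0.3, 20
xi = rng.normal(mu, 1.0, n)
X = np.concatenate([[0.0], np.cumsum(xi)])   # X_0 = 0
A = -mu * np.arange(n + 1)                   # A_0 = 0, increments -mu > 0
M = X + A                                    # martingale part
print(np.allclose(X, M - A))                 # X_n = M_n - A_n
```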

Proposition 2.6.2 The largest optimal stopping time for (Y_n) is given by

ν_max = N if A_N = 0, and ν_max = inf{n, A_{n+1} > 0} if A_N > 0,

where (X_n), the Snell envelope of (Y_n), has the Doob decomposition X_n = M_n − A_n.

Proof. {ν_max = n} = {A_1 = 0, A_2 = 0, ..., A_n = 0, A_{n+1} > 0} ∈ F_n for
0 ≤ n ≤ N − 1, and {ν_max = N} = {A_N = 0} ∈ F_{N−1}, since (A_n) is predictable.
So it is a stopping time. Moreover,

X_n^{ν_max} = X_{n∧ν_max} = M_{n∧ν_max} − A_{n∧ν_max} = M_{n∧ν_max},

since A_{n∧ν_max} = 0. Therefore (X_n^{ν_max}) is a martingale. So, to see that this
stopping time is optimal we have to prove that X_{ν_max} = Y_{ν_max}. Now

X_{ν_max} = ∑_{j=0}^{N−1} 1_{{ν_max=j}} X_j + 1_{{ν_max=N}} X_N
= ∑_{j=0}^{N−1} 1_{{ν_max=j}} max(Y_j, E(X_{j+1} | F_j)) + 1_{{ν_max=N}} Y_N,

but on {ν_max = j}, A_j = 0 and A_{j+1} > 0, so

E(X_{j+1} | F_j) = E(M_{j+1} | F_j) − A_{j+1} < E(M_{j+1} | F_j) = M_j = X_j,


therefore X_j = Y_j on {ν_max = j} and consequently X_{ν_max} = Y_{ν_max}. Finally we
see that it is the largest optimal stopping time. Let τ ≥ ν_max with
P{τ > ν_max} > 0. Then

E(X_τ) = E(M_τ) − E(A_τ) = E(M_0) − E(A_τ) = X_0 − E(A_τ) < X_0,

so (X_{τ∧n}) cannot be a martingale.

2.7 Martingale convergence theorems


Let {M_n}_{n≥0} be a supermartingale and a < b ∈ R, and let U_N[a, b](ω) be the
number of upcrossings of [a, b] made by (M_n(ω)) by time N. Define a predictable
strategy by H_1 = 1_{{M_0<a}} and, for n ≥ 2,
H_n = 1_{{M_{n−1}<a, H_{n−1}=0}} + 1_{{M_{n−1}≤b, H_{n−1}=1}}. Then

(H · M)_n := ∑_{j=1}^n H_j (M_j − M_{j−1})

is a supermartingale; it represents the gain by time n in a game where ∆M_n is
the profit at time n when the bet is 1 euro. It is easy to see (by a simple picture)
that

(H · M)_n ≥ (b − a)U_n[a, b] − (M_n − a)^−, n ≥ 1,

and then we have:

Lemma 2.7.1 Let {M_n}_{n≥0} be a supermartingale and let U_N[a, b](ω) be the
number of upcrossings of [a, b] by time N. Then

(b − a) E(U_N[a, b]) ≤ E((M_N − a)^−).

Proof. The process (H · M)_n above is a supermartingale, so E((H · M)_N) ≤
E((H · M)_0) = 0, and the result follows by taking expectations in the previous
inequality.

Corollary 2.7.1 Let {M_n}_{n≥0} be a supermartingale such that sup_n E(|M_n|) <
∞. Let a < b ∈ R and U_∞[a, b] := lim_{n→∞} U_n[a, b]; then

(b − a) E(U_∞[a, b]) ≤ |a| + sup_n E(|M_n|) < ∞,

so that

P(U_∞[a, b] = ∞) = 0.

Proof. By the previous lemma

(b − a)E(UN [a, b]) ≤ |a| + E(|MN |)

and taking the limit when N → ∞ and using the monotone convergence theorem
we obtain the result.

Theorem 2.7.1 Let {Mn }n≥0 be a supermartingale such that supn E(|Mn |) <
∞. Then almost surely M∞ := limn→∞ Mn exists and is finite.

Proof. Let Λ = {ω : M_n(ω) does not converge to a limit in [−∞, +∞]};
then

Λ = ∪_{a<b∈Q} {ω : lim inf M_n(ω) < a < b < lim sup M_n(ω)} = ∪_{a<b∈Q} Λ_{ab} (say),

but

Λ_{ab} ⊂ {U_∞[a, b] = ∞},

and by the previous corollary P(Λ_{ab}) = 0 and then P(Λ) = 0. So M_∞ exists in
[−∞, +∞] a.s., and by Fatou's lemma

E(|M_∞|) = E(lim inf |M_n|) ≤ lim inf E(|M_n|) ≤ sup_n E(|M_n|) < ∞,

so M_∞ is finite a.s.

Corollary 2.7.2 If {M_n}_{n≥0} is a nonnegative supermartingale then lim_{n→∞} M_n
exists and is finite almost surely.

Proof. E(|M_n|) = E(M_n) ≤ E(M_0).

Remark 2.7.1 It is important to note that it need not be true that M_n → M_∞
in L¹.

Example 2.7.1 Suppose that {ξ_n, n ≥ 1} are i.i.d. centered random variables
with distribution N(0, 2σ²). Set M_0 = 1 and

M_n = exp( ∑_{j=1}^n ξ_j − nσ² ).

Then {M_n} is a nonnegative martingale such that M_n → 0 a.s., by the strong
law of large numbers, but E(M_n) = 1 for all n.

Theorem 2.7.2 Let {Mn }n≥0 be a martingale, such that supn E(Mn2 ) < ∞,
then Mn → M∞ a.s. and in L2 .

Proof. Obviously sup_n E(|M_n|) < ∞, so we have almost sure convergence to
M_∞. On the other hand, by the orthogonality of the increments,

E(M_n²) = E(M_0²) + ∑_{k=1}^n E((M_k − M_{k−1})²),

so

∑_{k=1}^∞ E((M_k − M_{k−1})²) < ∞,

and applying Fatou's lemma in

E((M_{n+r} − M_n)²) = ∑_{k=n+1}^{n+r} E((M_k − M_{k−1})²),

when r → ∞, we have

E((M_∞ − M_n)²) ≤ ∑_{k=n+1}^∞ E((M_k − M_{k−1})²),

so

lim_{n→∞} E((M_∞ − M_n)²) = 0.
Chapter 3

Markov Processes

3.1 Discrete-time Markov chains

Definition 3.1.1 We say that (X_n)_{n≥0} is a Markov chain with initial distri-
bution λ := (λ_i, i ∈ I) and transition matrix P := (p_{ij}, i, j ∈ I) if

(i) P{X_0 = i} = λ_i, i ∈ I;
(ii) for all n ≥ 0, P(X_{n+1} = i_{n+1} | X_0 = i_0, X_1 = i_1, ..., X_n = i_n) = p_{i_n i_{n+1}},

where I is a countable set. We say that (X_n)_{n≥0} is Markov(λ, P) for short.

We regard λ as a row vector whose components are indexed by I and P as a
matrix whose entries are indexed by I × I. We write p_{ij}^{(n)} = (P^n)_{ij} for the (i, j)
entry of P^n. We agree that P^0 is the identity matrix, that is, (P^0)_{ij} = δ_{ij}.

Proposition 3.1.1 Let (X_n)_{n≥0} be Markov(λ, P). Then for all n, m ≥ 0

P(X_{n+m} = j | X_0 = i_0, ..., X_n = i) = P(X_{n+m} = j | X_n = i)
= P(X_m = j | X_0 = i).


Proof.

P(X_{n+m} = j | X_n = i) = P(X_{n+m} = j, X_n = i) / P(X_n = i)
= [ ∑_{i_0,...,i_{n−1}∈I} ∑_{i_{n+1},...,i_{n+m−1}∈I} P(X_0 = i_0, ..., X_n = i) p_{i i_{n+1}} · · · p_{i_{n+m−1} j} ]
  / [ ∑_{i_0,...,i_{n−1}∈I} P(X_0 = i_0, ..., X_n = i) ]
= ∑_{i_{n+1},...,i_{n+m−1}∈I} p_{i i_{n+1}} · · · p_{i_{n+m−1} j}
= [ ∑_{i_{n+1},...,i_{n+m−1}∈I} P(X_0 = i, X_1 = i_{n+1}, ..., X_{m−1} = i_{n+m−1}, X_m = j) ] / P(X_0 = i)
= P(X_0 = i, X_m = j) / P(X_0 = i) = P(X_m = j | X_0 = i).

Remark 3.1.1 Note also that

P(X_{n+1} = j_1, X_{n+2} = j_2, ..., X_{n+m} = j_m | X_0 = i_0, ..., X_n = i)
= P(X_1 = j_1, X_2 = j_2, ..., X_m = j_m | X_0 = i).

Proposition 3.1.2 Let (X_n)_{n≥0} be Markov(λ, P). Then, for all n, m ≥ 0,

(i) P{X_n = j} = (λP^n)_j, j ∈ I;
(ii) P_i(X_n = j) := P(X_n = j | X_0 = i) = P(X_{n+m} = j | X_m = i) = p_{ij}^{(n)}.

Proof. (i):

P{X_n = j} = ∑_{i_0∈I} ∑_{i_1∈I} ... ∑_{i_{n−1}∈I} P(X_0 = i_0, ..., X_{n−1} = i_{n−1}, X_n = j)
= ∑_{i_0∈I} ... ∑_{i_{n−1}∈I} P(X_n = j | X_{n−1} = i_{n−1}, ..., X_0 = i_0) P(X_{n−1} = i_{n−1}, ..., X_0 = i_0)
= ∑_{i_0∈I} ... ∑_{i_{n−1}∈I} p_{i_{n−1} j} P(X_{n−1} = i_{n−1}, ..., X_0 = i_0)
= ∑_{i_0∈I} ... ∑_{i_{n−1}∈I} p_{i_{n−2} i_{n−1}} p_{i_{n−1} j} P(X_{n−2} = i_{n−2}, ..., X_0 = i_0)
= ... = ∑_{i_0∈I} ... ∑_{i_{n−1}∈I} λ_{i_0} p_{i_0 i_1} · · · p_{i_{n−2} i_{n−1}} p_{i_{n−1} j} = (λP^n)_j.

(ii): P(X_n = j | X_0 = i) = ∑_{i_1∈I} ... ∑_{i_{n−1}∈I} p_{i i_1} · · · p_{i_{n−1} j} = (P^n)_{ij} = p_{ij}^{(n)}.

Example 3.1.1 The most general two-state chain has transition matrix

P = ( 1−α   α  )
    (  β   1−β ).

If we want to calculate p_{11}^{(n)} we can use the relation P^{n+1} = P^n P; then

p_{11}^{(n+1)} = p_{11}^{(n)}(1 − α) + p_{12}^{(n)} β.

We know that

p_{11}^{(n)} + p_{12}^{(n)} = P_1(X_n = 1 or 2) = 1,

so

p_{11}^{(n+1)} = p_{11}^{(n)}(1 − α − β) + β, n ≥ 0,

with p_{11}^{(0)} = 1. This has a unique solution:

p_{11}^{(n)} = β/(α+β) + (α/(α+β))(1 − α − β)^n for α + β > 0,
p_{11}^{(n)} = 1 for α + β = 0.
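A quick numerical check of this closed form against the matrix power — a Python sketch assuming NumPy; the values of α, β and n are arbitrary:

```python
import numpy as np

# Compare (P^n)_{11} with beta/(alpha+beta) + alpha/(alpha+beta)*(1-alpha-beta)^n.
alpha, beta, n = 0.3, 0.5, 7
P = np.array([[1 - alpha, alpha], [beta, 1 - beta]])
print(np.linalg.matrix_power(P, n)[0, 0])
print(beta / (alpha + beta) + alpha / (alpha + beta) * (1 - alpha - beta) ** n)
```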

3.1.1 Class structure


It is sometimes possible to split a Markov chain into smaller pieces, each of
which is relatively easy to analyze and which together give an understanding of
the whole. This is done by identifying the communicating classes.
We say that i leads to j and write i → j if

Pi (Xn = j for some n ≥ 0) > 0.

We say that i communicates with j and write i ←→ j if both i → j and j → i.

Proposition 3.1.3 The following are equivalent:

(i) i → j;
(ii) p_{i_0 i_1} · · · p_{i_{n−1} i_n} > 0 for some states i_0, i_1, ..., i_n, with i_0 = i and i_n = j;
(iii) p_{ij}^{(n)} > 0 for some n ≥ 0.

Proof. (i)⇔(iii):

p_{ij}^{(n)} ≤ P_i(X_n = j for some n ≥ 0) ≤ ∑_{n=0}^∞ p_{ij}^{(n)}.

(ii)⇔(iii):

p_{ij}^{(n)} = ∑_{i_1,i_2,...,i_{n−1}} p_{i i_1} · · · p_{i_{n−1} j}.

It is clear that i ←→ j is an equivalence relation. So, it partitions I (the


state space) into communicating classes. We say that a class C is closed if

i ∈ C, i → j imply j ∈ C.

We cannot escape from a closed class. A state i is absorbing if {i} is a closed


class. A chain or matrix P is called irreducible if I is a single class.

Example 3.1.2 Consider the stochastic matrix

P = ( 1/2  1/2   0    0    0    0 )
    (  0    0    1    0    0    0 )
    ( 1/3   0    0   1/3  1/3   0 )
    (  0    0    0   1/2  1/2   0 )
    (  0    0    0    0    0    1 )
    (  0    0    0    0    1    0 );

the communicating classes associated with it are {1, 2, 3}, {4} and {5, 6}, with
only {5, 6} being closed.

3.1.2 Hitting times and absorption probabilities

Let τ^A be the hitting time of a subset A ⊆ I:

τ^A = inf{n ≥ 0, X_n ∈ A}.

One purpose of this subsection is to learn how to calculate the probability,
starting from i, that (X_n)_{n≥0} ever hits A, that is,

h_i^A := P_i(τ^A < ∞).

When A is a closed class this probability is named the absorption probability.
The other purpose is to calculate the mean time taken for (X_n)_{n≥0} to reach A:

k_i^A := E_i(τ^A) = ∑_{1≤n<∞} n P_i(τ^A = n) + ∞ · P_i(τ^A = ∞).
1≤n<∞

Proposition 3.1.4

E_i(τ^A) = ∑_{1≤n≤∞} P_i(τ^A ≥ n).

Proof. Since

P_i(τ^A ≥ n) = ∑_{n≤m≤∞} P_i(τ^A = m),

we get

∑_{1≤n≤∞} P_i(τ^A ≥ n) = ∑_{1≤n≤∞} ∑_{n≤m≤∞} P_i(τ^A = m)
= ∑_{1≤m≤∞} ∑_{1≤n≤m} P_i(τ^A = m)
= ∑_{1≤m≤∞} m P_i(τ^A = m).

Example 3.1.3 Consider the following transition matrix:

P = (  1    0    0    0 )
    ( 1/2   0   1/2   0 )
    (  0   1/2   0   1/2 )
    (  0    0    0    1 ).

Introduce h_i := h_i^{{4}} and k_i := k_i^{{1,4}}. Then it is clear that h_1 = 0, h_4 = 1
and k_1 = k_4 = 0. Also we have

h_2 = (1/2)h_1 + (1/2)h_3,   k_2 = 1 + (1/2)k_1 + (1/2)k_3,

and

h_3 = (1/2)h_2 + (1/2)h_4,   k_3 = 1 + (1/2)k_2 + (1/2)k_4.

Therefore

h_2 = (1/2)h_3,   k_2 = 1 + (1/2)k_3,
h_3 = (1/2)h_2 + 1/2,   k_3 = 1 + (1/2)k_2,

and then

h_2 = 1/3, h_3 = 2/3,
k_2 = k_3 = 2.
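These linear systems can also be solved numerically. A sketch in Python with NumPy for this example, fixing the boundary values h_1 = 0 (state 1 is absorbing, so 4 is never reached from it) and h_4 = 1, and solving for the interior states:

```python
import numpy as np

# h restricted to the interior states {2, 3} solves (I - Q) h = b, where Q is
# the interior block of P and b_i is the one-step probability of hitting 4;
# the mean hitting times of {1, 4} solve (I - Q) k = 1 the same way.
P = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 0.0, 1.0]])
interior = [1, 2]                         # states 2 and 3 (0-based indices)
Q = P[np.ix_(interior, interior)]         # moves among interior states
I = np.eye(len(interior))
b_h = P[np.ix_(interior, [3])].ravel()    # one-step probability of hitting 4
h = np.linalg.solve(I - Q, b_h)
k = np.linalg.solve(I - Q, np.ones(2))
print(h)   # [1/3, 2/3]
print(k)   # [2, 2]
```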

Proposition 3.1.5 The vector of hitting probabilities h^A = (h_i^A : i ∈ I) is the
minimal non-negative solution to the system of linear equations

h_i^A = 1 for i ∈ A,
h_i^A = ∑_{j∈I} p_{ij} h_j^A for i ∉ A,    (3.1)

where minimality means that if x = (x_i : i ∈ I) is another non-negative solution,
then x_i ≥ h_i^A for all i.

Proof. First we prove that h^A satisfies (3.1). If X_0 = i ∈ A then τ^A = 0,
so h_i^A = 1. If X_0 = i ∉ A, then τ^A ≥ 1 and by the Markov property

P_i(τ^A < ∞ | X_1 = j) = P_j(τ^A < ∞) = h_j^A.

Then

h_i^A = ∑_{j∈I} P_i(τ^A < ∞, X_1 = j)
= ∑_{j∈I} P_i(τ^A < ∞ | X_1 = j) P_i(X_1 = j)
= ∑_{j∈I} P_j(τ^A < ∞) P_i(X_1 = j) = ∑_{j∈I} p_{ij} h_j^A.

Suppose now that x = (x_i : i ∈ I) is another non-negative solution to (3.1).
Then h_i^A = x_i = 1 for i ∈ A; consequently, if i ∉ A,

x_i = ∑_{j∈I} p_{ij} x_j = ∑_{j∈A} p_{ij} + ∑_{j∉A} p_{ij} x_j.

Now if we substitute for x_j we obtain

x_i = ∑_{j∈A} p_{ij} + ∑_{j∉A} p_{ij} ( ∑_{k∈A} p_{jk} + ∑_{k∉A} p_{jk} x_k )
= P_i(X_1 ∈ A) + P_i(X_1 ∉ A, X_2 ∈ A) + ∑_{j∉A} ∑_{k∉A} p_{ij} p_{jk} x_k.

By repeating the substitution, after n steps we have

x_i = P_i(X_1 ∈ A) + ... + P_i(X_1 ∉ A, ..., X_{n−1} ∉ A, X_n ∈ A)
+ ∑_{j_1∉A} ... ∑_{j_n∉A} p_{i j_1} p_{j_1 j_2} · · · p_{j_{n−1} j_n} x_{j_n}.

So, since the last term is non-negative, for all n ≥ 1,

x_i ≥ P_i(X_1 ∈ A) + ... + P_i(X_1 ∉ A, ..., X_{n−1} ∉ A, X_n ∈ A) = P_i(τ^A ≤ n).

Finally,

x_i ≥ lim_{n→∞} P_i(τ^A ≤ n) = P_i(τ^A < ∞) = h_i^A.

Proposition 3.1.6 The vector of mean hitting times k^A = (k_i^A : i ∈ I) is the
minimal non-negative solution to the system of linear equations

k_i^A = 0 for i ∈ A,
k_i^A = 1 + ∑_{j∉A} p_{ij} k_j^A for i ∉ A.    (3.2)

Proof. First we prove that k^A satisfies the system (3.2). If i ∈ A then
τ^A = 0 and k_i^A = 0. If X_0 = i ∉ A, then τ^A ≥ 1 and by the Markov property

E_i(τ^A | X_1 = j) = 1 + E_j(τ^A).

Then

E_i(τ^A) = ∑_{j∈I} E_i(τ^A 1_{{X_1=j}}) = ∑_{j∈I} E_i(τ^A | X_1 = j) P_i(X_1 = j)
= 1 + ∑_{j∈I} E_j(τ^A) p_{ij} = 1 + ∑_{j∉A} p_{ij} k_j^A.

Now, assume that y = (y_i, i ∈ I) is a non-negative solution of the system (3.2).
Then

y_i = 1 + ∑_{j∉A} p_{ij} y_j = 1 + ∑_{j∉A} p_{ij} (1 + ∑_{k∉A} p_{jk} y_k)
= 1 + ∑_{j∉A} p_{ij} + ∑_{j∉A} ∑_{k∉A} p_{ij} p_{jk} y_k
= P_i(τ^A ≥ 1) + P_i(τ^A ≥ 2) + ∑_{j∉A} ∑_{k∉A} p_{ij} p_{jk} y_k
= ... = P_i(τ^A ≥ 1) + ... + P_i(τ^A ≥ n) + ∑_{j_1∉A} ... ∑_{j_n∉A} p_{i j_1} · · · p_{j_{n−1} j_n} y_{j_n},

and therefore, by Proposition 3.1.4,

y_i ≥ ∑_{1≤n≤∞} P_i(τ^A ≥ n) = k_i^A.

(Gambler’s ruin) Consider the transition probabilities

p00 = 1
pi,i−1 = q > 0, pi,i+1 = p = 1 − q > 0, for i = 1, 2, ....

Imagine you enter in the casino with 1 euro and you can win 1 euro with
probability por to lose it with probability q. The resources of the casino are
regarded as infinite, so there is no upper bound to your fortune. But which is
the probability you finish broke?
Set hi = Pi (τ {0} < ∞), then hi is the minimal solution of the system

h0 = 1,
hi = qhi−1 + phi+1 , i = 1, 2, ...

If p ≠ q this recurrence equation has the general solution

h_i = A + B (q/p)^i.

If p < q (the usual situation in a casino) the restriction 0 ≤ h_i ≤ 1 forces B to
be zero, so h_i = 1 for all i. If p > q, since h_0 = 1, we have a family of solutions

h_i = (q/p)^i + A(1 − (q/p)^i),

and since h_i ≥ 0 we must have A ≥ 0, so the minimal solution is h_i = (q/p)^i.
Finally, if p = q the recurrence relation has the general solution

h_i = A + Bi,

and again the restriction 0 ≤ hi ≤ 1 forces B to be zero, so hi = 1 for all i.


Thus, even if you find a fair casino and you have a lot of money, you are certain
to end up broke.
Proposition 3.1.7 (Strong Markov property) Let (X_n)_{n≥0} be Markov(λ, P)
and let τ be a finite stopping time associated with (X_n)_{n≥0}. Then for all A ∈ F_τ
and all m ≥ 0,

P(X_{τ+m} = j | A, X_τ = i) = P(X_m = j | X_0 = i).

That is, (X_{τ+n})_{n≥0} is Markov(µ, P) where µ_i = P(X_τ = i).
Proof.

P(X_{τ+m} = j | A, X_τ = i) = P(X_{τ+m} = j, A, X_τ = i) / P(A, X_τ = i)
= ∑_{n<∞} P(X_{τ+m} = j, A, X_τ = i, τ = n) / ∑_{n<∞} P(A, X_τ = i, τ = n)
= ∑_{n<∞} P(X_{n+m} = j, A, X_n = i, τ = n) / ∑_{n<∞} P(A, X_n = i, τ = n)
= ∑_{n<∞} P(X_{n+m} = j | X_n = i) P(A, X_n = i, τ = n) / ∑_{n<∞} P(A, X_n = i, τ = n)
= ∑_{n<∞} P(X_m = j | X_0 = i) P(A, X_n = i, τ = n) / ∑_{n<∞} P(A, X_n = i, τ = n)
= P(X_m = j | X_0 = i),

where we used that {A, X_n = i, τ = n} ∈ F_n, since A ∈ F_τ.

Remark 3.1.2 Note also that if τ is a finite stopping time associated with
(X_n)_{n≥0} and A ∈ F_τ, then for all m ≥ 0,

P(X_{τ+1} = j_1, X_{τ+2} = j_2, ..., X_{τ+m} = j_m | A, X_τ = i)
= P(X_1 = j_1, X_2 = j_2, ..., X_m = j_m | X_0 = i).

In particular, assume that

τ_0 = inf{n ≥ 0, X_n ∈ J}, J ⊂ I,

and for m = 0, 1, 2, ...

τ_{m+1} = inf{n > τ_m, X_n ∈ J};

then

P(X_{τ_{m+1}} = j | X_{τ_0} = i_0, X_{τ_1} = i_1, ..., X_{τ_m} = i) = P(X_{τ_1} = j | X_{τ_0} = i),

where for i, j ∈ J

P(X_{τ_1} = j | X_{τ_0} = i) = P_i(X_n = j for some n ≥ 1).

3.1.3 Recurrence and transience

Let (X_n)_{n≥0} be a Markov chain with transition matrix P. We say that a state
i is recurrent if

P_i(X_n = i for infinitely many n) = 1.

We say that i is transient if

P_i(X_n = i for infinitely many n) = 0.

Recall that the first passage time to state i is the random variable τ^{(i)} defined
as

τ^{(i)} = inf{n ≥ 1, X_n = i};

then we can define inductively the r-th passage time τ_r^{(i)} to state i, for
r = 1, 2, ..., by

τ_1^{(i)} = τ^{(i)}, τ_{r+1}^{(i)} = inf{n ≥ τ_r^{(i)} + 1, X_n = i}.

The length of the r-th excursion, for r ≥ 2, is given by

δ_r^{(i)} := τ_r^{(i)} − τ_{r−1}^{(i)} if τ_{r−1}^{(i)} < ∞, and δ_r^{(i)} := 0 if τ_{r−1}^{(i)} = ∞.
Lemma 3.1.1 For r ≥ 2, conditional on τ_{r−1}^{(i)} < ∞, and for all A ∈ F_{τ_{r−1}^{(i)}}
we have

P(δ_r^{(i)} = n | τ_{r−1}^{(i)} < ∞, A) = P_i(τ^{(i)} = n).

Proof. By the strong Markov property, conditional on {τ_{r−1}^{(i)} < ∞}, the
process (X_{τ_{r−1}^{(i)}+n})_{n≥0} is a Markov chain with transition matrix P and
initial distribution λ_j = δ_{ij}, j ∈ I. Then δ_r^{(i)} is the first passage time of
(X_{τ_{r−1}^{(i)}+n})_{n≥0} to state i, so

P(δ_r^{(i)} = n | τ_{r−1}^{(i)} < ∞, A) = P(τ^{(i)} = n | X_0 = i).

Let us introduce the number of visits V_i to i:

V_i = ∑_{n=0}^∞ 1_{{X_n=i}};

note that

E_i(V_i) = ∑_{n=0}^∞ E_i(1_{{X_n=i}}) = ∑_{n=0}^∞ P_i(X_n = i) = ∑_{n=0}^∞ p_{ii}^{(n)}.

Also we can compute the distribution of V_i under P_i in terms of the return
probability

f_i = P_i(τ^{(i)} < ∞).

Lemma 3.1.2 For r ≥ 0 we have P_i(V_i > r) = f_i^r (we set 0^0 = 1).

Proof. P_i(V_i > 0) = 1 since X_0 = i, so the result is true for r = 0. Since
{X_0 = i, V_i > r} = {X_0 = i, τ_r^{(i)} < ∞}, assuming the result is true for r we get

P_i(V_i > r + 1) = P_i(τ_{r+1}^{(i)} < ∞)
= P_i(τ_r^{(i)} < ∞, δ_{r+1}^{(i)} < ∞)
= P_i(δ_{r+1}^{(i)} < ∞ | τ_r^{(i)} < ∞) P_i(τ_r^{(i)} < ∞)
= P_i(τ^{(i)} < ∞) f_i^r = f_i^{r+1}.

Theorem 3.1.1 The following dichotomy holds:

(i) if P_i(τ^{(i)} < ∞) = 1 then i is recurrent and ∑_{n=0}^∞ p_{ii}^{(n)} = ∞;
(ii) if P_i(τ^{(i)} < ∞) < 1 then i is transient and ∑_{n=0}^∞ p_{ii}^{(n)} < ∞.

Proof. We have

P_i(V_i = ∞) = lim_{n→∞} P_i(V_i > n) = lim_{n→∞} f_i^n.

Then, if f_i := P_i(τ^{(i)} < ∞) = 1, we get P_i(V_i = ∞) = 1, so i is recurrent and

∑_{n=0}^∞ p_{ii}^{(n)} = E_i(V_i) = ∞.

On the other hand, if f_i < 1,

∑_{n=0}^∞ p_{ii}^{(n)} = E_i(V_i) = ∑_{n=0}^∞ P_i(V_i > n) = ∑_{n=0}^∞ f_i^n = 1/(1 − f_i) < ∞,

so P_i(V_i = ∞) = 0 and i is transient.


Recurrence and transience are class properties.

Theorem 3.1.2 Let C be a communicating class. Then either all states in C
are transient or all are recurrent.

Proof. Take a pair of states i, j ∈ C and assume that i is transient. There
exist n, m ≥ 0 such that p_{ij}^{(n)} > 0 and p_{ji}^{(m)} > 0, and for all r ≥ 0,

p_{ii}^{(n+m+r)} ≥ p_{ij}^{(n)} p_{jj}^{(r)} p_{ji}^{(m)};

then

∑_{r=0}^∞ p_{jj}^{(r)} ≤ (1 / (p_{ij}^{(n)} p_{ji}^{(m)})) ∑_{r=0}^∞ p_{ii}^{(n+m+r)} < ∞,

so by the previous theorem j is also transient.

Theorem 3.1.3 Every recurrent class is closed.

Proof. Let C be a class that is not closed. Then there exist i ∈ C, j ∉ C and
m ≥ 1 such that

P_i(X_m = j) > 0.

Since we have

P_i(X_m = j, X_n = i for infinitely many n) = 0,

this implies that

P_i(X_n = i for infinitely many n) < 1,

so i is not recurrent, and so neither is C.

Theorem 3.1.4 Every finite closed class is recurrent.

Proof. Suppose that C is closed and finite and that (X_n)_{n≥0} starts in C.
Then for some i ∈ C

P(X_n = i for infinitely many n) > 0,

since the class is finite. Let τ = inf{n ≥ 0, X_n = i}; then

P(X_n = i for infinitely many n)
= P(τ < ∞) P(X_{τ+m} = i for infinitely many m | τ < ∞)
= P(τ < ∞) P_i(X_m = i for infinitely many m) > 0.

This shows that i is not transient, hence, by the dichotomy, i is recurrent,
and so is C.

3.1.4 Recurrence and transience in a random walk

Consider the transition probabilities on Z

p_{i,i−1} = q > 0, p_{i,i+1} = p = 1 − q > 0, for i ∈ Z,

that is, a simple random walk on Z. It is clear that

p_{00}^{(2n+1)} = 0,

and it is easy to see that

p_{00}^{(2n)} = (2n choose n) p^n q^n.

Bu Stirling’s formula

n! ∼ 2πnnn e−n , as n → ∞

so n
(2n) (2n)! n 1 (4pq)
p00 = 2 (pq) ∼
√ √ , as n → ∞.
(n!) π n
(2n)
If p = q = 21 , p00 ∼ √1 √1 ,
π n
and since


X 1
√ = ∞,
n=1
n

we have that

(n)
X
p00 = ∞,
n=0

so the random walk is recurrent. If, on the contrary, p 6= q, then 4pq = r < 1
(2n) rn
and p00 ∼ √1π √ n
, now since


X rn
√ < ∞,
n=1
n

we have that

(n)
X
p00 < ∞,
n=0

and the random walk is trasient.
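A numerical illustration of this dichotomy via the partial sums of p_00^{(2n)} — a Python sketch; successive terms are computed through their ratio to avoid overflowing the binomial coefficient:

```python
# Partial sums of p_00^{(2n)} = C(2n, n) (pq)^n: they grow without bound when
# p = q = 1/2 (recurrence) and converge when p != q (transience).
def partial_sum(p, N):
    q = 1.0 - p
    term, total = 1.0, 1.0          # the n = 0 term is 1
    for n in range(N):
        # C(2(n+1), n+1) / C(2n, n) = (2n+1)(2n+2) / (n+1)^2
        term *= (2 * n + 1) * (2 * n + 2) / (n + 1) ** 2 * p * q
        total += term
    return total

for N in (10**2, 10**4, 10**6):
    print(N, partial_sum(0.5, N), partial_sum(0.6, N))
```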

3.1.5 Invariant distributions


Many of the long-time properties of Markov chains are connected with the notion
of an invariant distribution or measure. We say that λ is invariant if

λP = λ.

Theorem 3.1.5 Let (X_n)_{n≥0} be Markov(λ, P) and suppose that λ is invariant.
Then (X_{m+n})_{n≥0} is also Markov(λ, P).

Proof. By the Markov property, (X_{m+n})_{n≥0} has the same transition matrix
P, and its initial distribution is P(X_m = i) = (λP^m)_i = λ_i.

Theorem 3.1.6 Let I be finite. Suppose for some i ∈ I that

p_{ij}^{(n)} → π_j as n → ∞ for all j ∈ I.

Then π = {π_j, j ∈ I} is an invariant distribution.

Proof. We have (using that I is finite to exchange limits and sums)

∑_{j∈I} π_j = ∑_{j∈I} lim_{n→∞} p_{ij}^{(n)} = lim_{n→∞} ∑_{j∈I} p_{ij}^{(n)} = 1,

and

π_j = lim_{n→∞} p_{ij}^{(n)} = lim_{n→∞} ∑_{k∈I} p_{ik}^{(n−1)} p_{kj}
= ∑_{k∈I} lim_{n→∞} p_{ik}^{(n−1)} p_{kj} = ∑_{k∈I} π_k p_{kj}.

Example 3.1.4 Consider the two-state Markov chain with transition matrix

P = ( 1−α   α  )
    (  β   1−β ).

Ignore the trivial cases α = β = 0 and α = β = 1. Then, as we saw before,

P^n → ( β/(α+β)  α/(α+β) )
      ( β/(α+β)  α/(α+β) )   as n → ∞,

so, by the previous theorem, the distribution (β/(α+β), α/(α+β)) must be
invariant. Of course, we can also solve

(λ_1, λ_2) P = (λ_1, λ_2),

and we obtain

(λ_1, λ_2) = (β/(α+β), α/(α+β)).

Note also that for the case α = β = 1 we do not have convergence of P^n.

Example 3.1.5 Consider the three-state chain with matrix
$$P = \begin{pmatrix} 0 & 1 & 0 \\ 0 & \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & 0 & \frac{1}{2} \end{pmatrix}.$$
Then, since the eigenvalues are $1, i/2, -i/2$, we deduce that
$$p_{11}^{(n)} = a + b\left(\frac{i}{2}\right)^n + c\left(-\frac{i}{2}\right)^n,$$
for some constants $a$, $b$ and $c$. Since $p_{11}^{(n)}$ is real and
$$\left(\pm\frac{i}{2}\right)^n = \left(\frac{1}{2}\right)^n e^{\pm\frac{in\pi}{2}} = \left(\frac{1}{2}\right)^n \left(\cos\frac{n\pi}{2} \pm i\sin\frac{n\pi}{2}\right),$$
we can write
$$p_{11}^{(n)} = \alpha + \left(\frac{1}{2}\right)^n \left(\beta\cos\frac{n\pi}{2} + \gamma\sin\frac{n\pi}{2}\right),$$
for constants $\alpha$, $\beta$ and $\gamma$ that can be determined from the boundary conditions
$$1 = p_{11}^{(0)} = \alpha + \beta, \qquad 0 = p_{11}^{(1)} = \alpha + \frac{1}{2}\gamma, \qquad 0 = p_{11}^{(2)} = \alpha - \frac{1}{4}\beta.$$
So $\alpha = 1/5$, $\beta = 4/5$, $\gamma = -2/5$ and
$$p_{11}^{(n)} = \frac{1}{5} + \left(\frac{1}{2}\right)^n \left(\frac{4}{5}\cos\frac{n\pi}{2} - \frac{2}{5}\sin\frac{n\pi}{2}\right).$$
Then, to find the invariant distribution, we can write down the components of the vector equation $\pi P = \pi$:
$$\pi_1 = \frac{1}{2}\pi_3, \qquad \pi_2 = \pi_1 + \frac{1}{2}\pi_2, \qquad \pi_3 = \frac{1}{2}\pi_2 + \frac{1}{2}\pi_3.$$
One equation is redundant, but we have the additional condition
$$\pi_1 + \pi_2 + \pi_3 = 1,$$
and the solution is $\pi = (1/5, 2/5, 2/5)$. Note that
$$p_{11}^{(n)} \to \frac{1}{5} \text{ as } n \to \infty,$$
which confirms the previous theorem; we could also have used this to identify $\alpha = 1/5$ instead of working out $p_{11}^{(2)}$.
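The closed form for $p_{11}^{(n)}$ is easy to check against direct matrix powers; a short verification sketch:

```python
import numpy as np

P = np.array([[0, 1, 0],
              [0, 0.5, 0.5],
              [0.5, 0, 0.5]])

for n in range(8):
    exact = np.linalg.matrix_power(P, n)[0, 0]
    formula = 1/5 + (1/2)**n * (4/5 * np.cos(n * np.pi / 2)
                                - 2/5 * np.sin(n * np.pi / 2))
    print(n, exact, formula)  # the two columns agree for every n
```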
For a fixed state $k$, consider for each $i$ the expected time spent in $i$ between visits to $k$:
$$\gamma_i^k := E_k\left(\sum_{n=0}^{\tau^{(k)}-1} 1_{\{X_n = i\}}\right).$$

Theorem 3.1.7 Let $P$ be irreducible and recurrent. Then
(i) $\gamma_k^k = 1$;
(ii) $\gamma^k := (\gamma_i^k,\ i \in I)$ satisfies $\gamma^k P = \gamma^k$;
(iii) $0 < \gamma_i^k < \infty$ for all $i \in I$.

Proof. (i) is obvious. (ii) Since $P$ is recurrent, under $P_k$ we have $\tau^{(k)} < \infty$ and $X_0 = X_{\tau^{(k)}} = k$, so
$$\gamma_j^k = E_k\left(\sum_{n=1}^{\tau^{(k)}} 1_{\{X_n = j\}}\right) = E_k\left(\sum_{n=1}^{\infty} 1_{\{X_n = j\}} 1_{\{n \leq \tau^{(k)}\}}\right)
= \sum_{n=1}^{\infty} P_k(X_n = j,\ n \leq \tau^{(k)})
= \sum_{i \in I} \sum_{n=1}^{\infty} P_k(X_{n-1} = i,\ X_n = j,\ n \leq \tau^{(k)})$$
$$= \sum_{i \in I} \sum_{n=1}^{\infty} P_k(X_n = j \mid X_{n-1} = i,\ n \leq \tau^{(k)})\, P_k(X_{n-1} = i,\ n \leq \tau^{(k)})
= \sum_{i \in I} p_{ij} \sum_{n=1}^{\infty} P_k(X_{n-1} = i,\ n \leq \tau^{(k)})$$
$$= \sum_{i \in I} p_{ij}\, E_k\left(\sum_{m=0}^{\tau^{(k)}-1} 1_{\{X_m = i\}}\right) = \sum_{i \in I} p_{ij}\, \gamma_i^k.$$
(iii) Fix $i \in I$. Since $P$ is irreducible, there exist $m, n \geq 0$ such that $p_{ki}^{(n)} > 0$ and $p_{ik}^{(m)} > 0$; then, by (i) and (ii), $\gamma_i^k \geq \gamma_k^k p_{ki}^{(n)} > 0$ and $\gamma_k^k \geq \gamma_i^k p_{ik}^{(m)}$, so $\gamma_i^k \leq 1/p_{ik}^{(m)} < \infty$.

Theorem 3.1.8 Let $P$ be irreducible and let $\lambda$ be an invariant measure for $P$ with $\lambda_k = 1$. Then $\lambda \geq \gamma^k$. If in addition $P$ is recurrent, then $\lambda = \gamma^k$.

Proof.
$$\lambda_j = \sum_{i_0 \in I} \lambda_{i_0} p_{i_0 j} = \sum_{i_0 \neq k} \lambda_{i_0} p_{i_0 j} + p_{kj}
= \sum_{i_0 \neq k} \sum_{i_1 \neq k} \lambda_{i_1} p_{i_1 i_0} p_{i_0 j} + p_{kj} + \sum_{i_0 \neq k} p_{k i_0} p_{i_0 j}$$
$$= \cdots = \sum_{i_0 \neq k} \cdots \sum_{i_n \neq k} \lambda_{i_n} p_{i_n i_{n-1}} \cdots p_{i_0 j}
+ p_{kj} + \sum_{i_0 \neq k} p_{k i_0} p_{i_0 j} + \cdots + \sum_{i_0 \neq k} \cdots \sum_{i_{n-1} \neq k} p_{k i_{n-1}} \cdots p_{i_0 j}$$
$$\geq p_{kj} + \sum_{i_0 \neq k} p_{k i_0} p_{i_0 j} + \cdots + \sum_{i_0 \neq k} \cdots \sum_{i_{n-1} \neq k} p_{k i_{n-1}} \cdots p_{i_0 j}
\to \gamma_j^k \text{ as } n \text{ goes to infinity.}$$
If $P$ is recurrent, $\gamma^k$ is invariant by the previous theorem, so $\mu := \lambda - \gamma^k$ is also invariant, and non-negative by the first part. Since $P$ is irreducible, for any fixed $i$ there is $n \geq 0$ such that $p_{ik}^{(n)} > 0$, and
$$0 = \mu_k = \sum_j \mu_j p_{jk}^{(n)} \geq \mu_i p_{ik}^{(n)},$$
so $\mu_i = 0$.

A recurrent state $i$ is called positive recurrent if the expected return time
$$m_i := E_i(\tau^{(i)})$$
is finite; otherwise it is called null recurrent.

Remark 3.1.3 Note that
$$m_i = E_i\left(\sum_{n=0}^{\tau^{(i)}-1} \sum_{j \neq i} 1_{\{X_n = j\}}\right) + 1 = \sum_{j \in I} \gamma_j^i.$$

Theorem 3.1.9 Let $P$ be irreducible. Then the following are equivalent:
(i) every state is positive recurrent;
(ii) some state $i$ is positive recurrent;
(iii) $P$ has an invariant distribution, $\pi$ say.
Moreover, when (iii) holds we have $m_i = 1/\pi_i$ for all $i$.

Proof. (i)$\Rightarrow$(ii) is evident. (ii)$\Rightarrow$(iii): If $i$ is positive recurrent then in particular it is recurrent, so $P$ is recurrent and by Theorem 3.1.7 $\gamma^i$ is an invariant measure. But
$$\sum_{j \in I} \gamma_j^i = m_i,$$
and $m_i < \infty$, so $\pi_j := \gamma_j^i / m_i$ defines an invariant distribution. (iii)$\Rightarrow$(i): Take any state $k$. Since $P$ is irreducible, there exists $n \geq 1$ such that $\pi_k = \sum_{i \in I} \pi_i p_{ik}^{(n)} > 0$. Set $\lambda_i = \pi_i / \pi_k$; then, by the previous theorem, $\lambda \geq \gamma^k$ and
$$m_k = \sum_{i \in I} \gamma_i^k \leq \sum_{i \in I} \frac{\pi_i}{\pi_k} = \frac{1}{\pi_k} < \infty,$$
so $k$ is positive recurrent. Note finally that, since $P$ is recurrent, by the previous theorem the above inequality is in fact an equality, so $m_k = 1/\pi_k$.

Theorem 3.1.10 If the state space is finite there is at least one stationary distribution.

Proof. We can assume the chain is irreducible (otherwise restrict it to a finite closed class, which always exists when $I$ is finite). Then it is recurrent by Theorem 3.1.4 and, in fact, every state is positive recurrent: since
$$m_k = \sum_{i \in I} \gamma_i^k$$
and $I$ is finite, it is sufficient to see that $\gamma_i^k < \infty$, and this follows from Theorem 3.1.7. The existence of a stationary distribution is then given by Theorem 3.1.9.

3.1.6 Limiting behaviour


We call a state $i$ aperiodic if $p_{ii}^{(n)} > 0$ for all sufficiently large $n$. It can be shown that $i$ is aperiodic if and only if the set $\{n \geq 0,\ p_{ii}^{(n)} > 0\}$ has no common divisor other than 1, and also that aperiodicity is a class property.

Theorem 3.1.11 Let $P$ be irreducible and aperiodic, and suppose that $P$ has an invariant distribution $\pi$. Then
$$p_{ij}^{(n)} \to \pi_j \text{ as } n \to \infty, \text{ for all } i, j.$$

3.1.7 Ergodic theorem


Ergodic theory concerns the limiting behaviour of averages over time. We shall
prove a theorem which identifies for Markov chains the long-run proportion of
time spent in each state.
Denote by $V_i(n)$ the number of visits to $i$ before $n$:
$$V_i(n) := \sum_{k=0}^{n-1} 1_{\{X_k = i\}}.$$

Theorem 3.1.12 Let $P$ be irreducible. Then
$$\frac{V_i(n)}{n} \to \frac{1}{m_i} \text{ a.s. as } n \to \infty.$$
Moreover, in the positive recurrent case, for any bounded function $f: I \to \mathbb{R}$ we have
$$\frac{1}{n} \sum_{k=0}^{n-1} f(X_k) \to \sum_{i \in I} \pi_i f(i) \text{ a.s. as } n \to \infty,$$
where $\pi$ is the unique invariant distribution.

Proof. Without loss of generality we assume that we start at state $i$. If $P$ is transient, then $V_i(n)$ is bounded by the total number of visits $V_i$, so
$$\frac{V_i(n)}{n} \leq \frac{V_i}{n} \to 0 = \frac{1}{m_i}.$$
Suppose that $P$ is recurrent. Let $\delta_r^{(i)}$ be the length of the $r$-th excursion to $i$. Then by the strong Markov property the $\delta_r^{(i)}$ are i.i.d. with $E(\delta_r^{(i)}) = m_i$, and
$$\frac{\delta_1^{(i)} + \cdots + \delta_{V_i(n)-1}^{(i)}}{V_i(n)} \leq \frac{n}{V_i(n)} \leq \frac{\delta_1^{(i)} + \cdots + \delta_{V_i(n)}^{(i)}}{V_i(n)},$$
and we can apply the strong law of large numbers.



For the second part, we can assume w.l.o.g. that $|f| \leq 1$. Then for any $J \subseteq I$,
$$\left|\frac{1}{n}\sum_{k=0}^{n-1} f(X_k) - \sum_{i \in I} \pi_i f(i)\right| = \left|\sum_{i \in I}\left(\frac{V_i(n)}{n} - \pi_i\right) f(i)\right|
\leq \sum_{i \in J}\left|\frac{V_i(n)}{n} - \pi_i\right| + \sum_{i \notin J}\left|\frac{V_i(n)}{n} - \pi_i\right|$$
$$\leq \sum_{i \in J}\left|\frac{V_i(n)}{n} - \pi_i\right| + \sum_{i \notin J}\left(\frac{V_i(n)}{n} + \pi_i\right)
\leq 2\sum_{i \in J}\left|\frac{V_i(n)}{n} - \pi_i\right| + 2\sum_{i \notin J} \pi_i,$$
but we saw that
$$\frac{V_i(n)}{n} \to \pi_i \text{ as } n \to \infty \text{ for all } i.$$
Then, given $\varepsilon > 0$, we can take $J$ finite such that
$$\sum_{i \notin J} \pi_i < \frac{\varepsilon}{4},$$
and $N(\omega)$ such that for all $n \geq N(\omega)$
$$\sum_{i \in J}\left|\frac{V_i(n)}{n} - \pi_i\right| < \frac{\varepsilon}{4}.$$
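The theorem can be illustrated with the chain of Example 3.1.5: the long-run fractions of time spent in each state should approach $\pi = (1/5, 2/5, 2/5)$. A minimal simulation sketch (the run length is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0, 1, 0],
              [0, 0.5, 0.5],
              [0.5, 0, 0.5]])

n_steps = 100_000
visits = np.zeros(3)
x = 0
for _ in range(n_steps):
    visits[x] += 1                  # V_i(n): time spent in each state
    x = rng.choice(3, p=P[x])       # one step of the chain
print(visits / n_steps)             # close to (0.2, 0.4, 0.4)
```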

3.2 Poisson process


3.2.1 Exponential distribution
Definition 3.2.1 We say that $T: \Omega \to [0, \infty]$ follows an exponential distribution with parameter $\lambda$ (we shall write it $E(\lambda)$ for short) if
$$f_T(t) = \lambda e^{-\lambda t} 1_{[0,\infty)}(t).$$
It is easy to see the following properties:

• $E(T) = 1/\lambda$, $\operatorname{var}(T) = 1/\lambda^2$.

• $T/\mu \sim E(\mu\lambda)$.

• $P(T > t + s \mid T > t) = P(T > s)$ (lack of memory).

• If $S \sim E(\lambda)$ and $T \sim E(\mu)$, then $\min(S, T) \sim E(\lambda + \mu)$ and $P(S < T) = \frac{\lambda}{\lambda + \mu}$.

• If $t_1, t_2, ..., t_n$ are independent $E(\lambda)$, then $T_n := t_1 + t_2 + ... + t_n \sim \operatorname{gamma}(n, \lambda)$.
Theorem 3.2.1 Let $T_1, T_2, ...$ be a sequence of independent random variables with $T_n \sim E(\lambda_n)$ and $0 < \lambda_n < \infty$ for all $n$.
(i) If $\sum_{n=1}^{\infty} \frac{1}{\lambda_n} < \infty$, then $P\left(\sum_{n=1}^{\infty} T_n < \infty\right) = 1$.
(ii) If $\sum_{n=1}^{\infty} \frac{1}{\lambda_n} = \infty$, then $P\left(\sum_{n=1}^{\infty} T_n = \infty\right) = 1$.

Proof. (i) Suppose $\sum_{n=1}^{\infty} \frac{1}{\lambda_n} < \infty$. Then by the monotone convergence theorem
$$E\left(\sum_{n=1}^{\infty} T_n\right) = \sum_{n=1}^{\infty} E(T_n) = \sum_{n=1}^{\infty} \frac{1}{\lambda_n} < \infty,$$
so $P\left(\sum_{n=1}^{\infty} T_n < \infty\right) = 1$.
(ii) Suppose instead that $\sum_{n=1}^{\infty} \frac{1}{\lambda_n} = \infty$. Then $\prod_{n=1}^{\infty} \left(1 + \frac{1}{\lambda_n}\right) = \infty$, and by the monotone convergence theorem and the independence
$$E\left(\exp\left\{-\sum_{n=1}^{\infty} T_n\right\}\right) = \prod_{n=1}^{\infty} E(\exp\{-T_n\}) = \prod_{n=1}^{\infty} \left(1 + \frac{1}{\lambda_n}\right)^{-1} = 0,$$
so $P\left(\sum_{n=1}^{\infty} T_n = \infty\right) = 1$.
Theorem 3.2.2 Let $I$ be a countable set and let $T_k$, $k \in I$, be independent random variables with $T_k \sim E(q_k)$ and $q := \sum_{k \in I} q_k < \infty$. Set $T = \inf_k T_k$. Then this infimum is attained at a unique random value $K$ of $k$, with probability 1. Moreover, $T$ and $K$ are independent, with $T \sim E(q)$ and $P(K = k) = \frac{q_k}{q}$.

Proof.
$$P(K = k,\ T \geq t) = P(T_k \geq t,\ T_j > T_k \text{ for all } j \neq k)
= \int_t^{\infty} q_k e^{-q_k s}\, P(T_j > s \text{ for all } j \neq k)\, ds
= \int_t^{\infty} q_k e^{-q_k s} \prod_{j \neq k} e^{-q_j s}\, ds = \frac{q_k}{q} e^{-qt}.$$
Hence $P(K = k \text{ for some } k) = 1$, and $K$ and $T$ have the claimed joint distribution.
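Theorem 3.2.2 is easy to test by simulation; a sketch with three illustrative rates:

```python
import numpy as np

rng = np.random.default_rng(1)
q = np.array([1.0, 2.0, 3.0])            # illustrative rates q_k
n = 100_000
T = rng.exponential(1 / q, size=(n, 3))  # columns: T_k ~ E(q_k)

K = T.argmin(axis=1)                     # index attaining the minimum
print(np.bincount(K) / n)                # close to q / q.sum() = (1/6, 2/6, 3/6)
print(T.min(axis=1).mean())              # close to 1 / q.sum() = 1/6
```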

3.2.2 Poisson process


Definition 3.2.2 Let $t_1, t_2, ...$ be independent $E(\lambda)$ random variables, let $T_n = t_1 + t_2 + ... + t_n$ for $n \geq 1$, $T_0 = 0$, and define $N_s = \max\{n,\ T_n \leq s\}$. Then the process $(N_s)_{s \geq 0}$ is called a Poisson process with parameter (or rate) $\lambda$.

Remark 3.2.1 We think of the $t_n$ as times between arrivals of customers at a bank. Then $N_s$ is the number of arrivals by time $s$.
Lemma 3.2.1 $N_s \sim \operatorname{Poisson}(\lambda s)$.

Proof. $P(N_s = n) = P(T_n \leq s < T_{n+1})$. Since $T_{n+1} = t_{n+1} + T_n$ and $t_{n+1}$ is independent of $T_n$,
$$P(T_n \leq s < T_{n+1}) = \int_0^s \int_s^{\infty} f_{T_n}(t)\, f_{t_{n+1}}(u - t)\, du\, dt
= \int_0^s \int_s^{\infty} \frac{1}{(n-1)!} \lambda^n t^{n-1} e^{-\lambda t}\, \lambda e^{-\lambda(u-t)}\, du\, dt
= \frac{\lambda^n}{(n-1)!} e^{-\lambda s} \int_0^s t^{n-1}\, dt = e^{-\lambda s} \frac{(\lambda s)^n}{n!}.$$

Lemma 3.2.2 $(N_{t+s} - N_s)_{t \geq 0}$ is a Poisson process with rate $\lambda$, independent of $(N_r)_{0 \leq r \leq s}$.

Proof. Assume that by time $s$ there were four arrivals and that $T_4 = u_4$. Then
$$P(t_5 > s - u_4 + t \mid t_5 > s - u_4) = P(t_5 > t) = e^{-\lambda t}.$$
So the first arrival after $s$ is $E(\lambda)$ and independent of $T_1, T_2, T_3, T_4$. It is clear that the subsequent interarrival times $t_6, t_7, ...$ are also independent of $T_1, T_2, T_3, T_4$.
Lemma 3.2.3 $(N_t)_{t \geq 0}$ has independent increments.

Theorem 3.2.3 If $(N_t)_{t \geq 0}$ is a Poisson process, then
(i) $N_0 = 0$;
(ii) $N_{t+s} - N_s \sim \operatorname{Poisson}(\lambda t)$;
(iii) $(N_t)_{t \geq 0}$ has independent increments.
Conversely, if (i), (ii) and (iii) hold and the process is right-continuous, then $(N_t)_{t \geq 0}$ is a Poisson process.

Proof. (The converse.)
$$P(T_1 > t) = P(N_t = 0) = e^{-\lambda t}.$$
$$P(t_2 > t \mid t_1 \leq s) = P(N_{t+s} - N_s = 0 \mid N_r \leq 1,\ 0 \leq r \leq s) = P(N_{t+s} - N_s = 0) = e^{-\lambda t},$$
so $t_2 \sim E(\lambda)$, independent of $t_1$.

Theorem 3.2.4 $(N_t)_{t \geq 0}$ is a Poisson process iff
(i) $N_0 = 0$;
(ii) $N_t$ is an increasing, right-continuous, integer-valued process with independent increments;
(iii) $P(N_{t+h} - N_t = 0) = 1 - \lambda h + o(h)$ and $P(N_{t+h} - N_t = 1) = \lambda h + o(h)$, as $h \downarrow 0$, uniformly in $t$.

Proof. Suppose that $(N_t)_{t \geq 0}$ is a Poisson process. Then
$$P(N_{t+h} - N_t \geq 1) = P(N_h \geq 1) = 1 - e^{-\lambda h} = \lambda h + o(h),$$
and
$$P(N_{t+h} - N_t \geq 2) = P(N_h \geq 2) = P(t_1 + t_2 \leq h) \leq P(t_1 \leq h,\ t_2 \leq h) = P(t_1 \leq h) P(t_2 \leq h) = (1 - e^{-\lambda h})^2 = o(h).$$
Conversely, if (i), (ii) and (iii) hold, then for $i = 2, 3, ...$ we have $P(N_{t+h} - N_t = i) = o(h)$ as $h \downarrow 0$, uniformly in $t$. Set $p_j(t) = P(N_t = j)$. Then, for $j = 1, 2, ...$,
$$p_j(t+h) = P(N_{t+h} = j) = \sum_{i=0}^{j} P(N_{t+h} - N_t = i) P(N_t = j - i)
= (1 - \lambda h + o(h))\, p_j(t) + (\lambda h + o(h))\, p_{j-1}(t) + o(h).$$
So
$$\frac{p_j(t+h) - p_j(t)}{h} = -\lambda p_j(t) + \lambda p_{j-1}(t) + o(1);$$
since this estimate is uniform in $t$ we can put $t = s - h$ to obtain
$$\frac{p_j(s) - p_j(s-h)}{h} = -\lambda p_j(s-h) + \lambda p_{j-1}(s-h) + o(1).$$
Now letting $h \downarrow 0$ we see that the $p_j(t)$ are differentiable and satisfy the differential equation
$$p_j'(t) = -\lambda p_j(t) + \lambda p_{j-1}(t).$$
Analogously we can see that
$$p_0'(t) = -\lambda p_0(t),$$
and since
$$p_0(0) = 1, \quad p_j(0) = 0 \text{ for } j = 1, 2, ...,$$
we have that
$$p_j(t) = e^{-\lambda t} \frac{(\lambda t)^j}{j!} \quad \text{for } j = 0, 1, 2, ....$$
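The construction of Definition 3.2.2 translates directly into a simulation, which can be used to check that $N_s \sim \operatorname{Poisson}(\lambda s)$; a minimal sketch (the rate, the time $s$ and the number of paths are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
lam, s, n_paths = 2.0, 5.0, 10_000

counts = np.empty(n_paths)
for k in range(n_paths):
    t, n = 0.0, 0
    while True:
        t += rng.exponential(1 / lam)  # interarrival time t_i ~ E(lam)
        if t > s:
            break
        n += 1                         # one more arrival by time s
    counts[k] = n

# A Poisson(lam * s) variable has mean and variance both lam * s = 10.
print(counts.mean(), counts.var())
```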

3.3 Continuous-time Markov chains


3.3.1 Definition and examples
Definition 3.3.1 $(X_t)_{t \geq 0}$ is a Markov chain if for any $0 \leq s_0 < s_1 < \cdots < s_n < s$ and possible states $i_0, ..., i_n, i, j$ we have
$$P(X_{t+s} = j \mid X_s = i, X_{s_n} = i_n, ..., X_{s_0} = i_0) = P(X_{t+s} = j \mid X_s = i) = P(X_t = j \mid X_0 = i).$$

Example 3.3.1 Let $(N_t)_{t \geq 0}$ be a Poisson process with rate $\lambda$ and let $(Y_n)_{n \geq 0}$ be a discrete-time Markov chain with transition probabilities $\pi_{ij}$, independent of $(N_t)_{t \geq 0}$. Then $X_t = Y_{N_t}$ is a continuous-time Markov chain. Intuitively, this follows from the memoryless property of the exponential distribution: if $X_s = i$ then, independently of what happened in the past, the time to the next jump will be exponentially distributed with rate $\lambda$, and the chain will go to state $j$ with probability $\pi_{ij}$. If we write
$$p_{ij}(t) = P(X_t = j \mid X_0 = i),$$
we have
$$p_{ij}(t) = \frac{P(X_t = j, X_0 = i)}{P(X_0 = i)} = \sum_{k=0}^{\infty} \frac{P(Y_k = j, Y_0 = i)}{P(Y_0 = i)} P(N_t = k)
= \sum_{k=0}^{\infty} \pi_{ij}^{(k)} \frac{e^{-\lambda t} (\lambda t)^k}{k!}
= e^{-\lambda t} \left(e^{t\lambda\Pi}\right)_{ij} = \left(e^{t(\lambda\Pi - \lambda I)}\right)_{ij} = \left(e^{tQ}\right)_{ij},$$
where $Q = \lambda(\Pi - I)$.

Chapman-Kolmogorov equation

Proposition 3.3.1
$$\sum_{k \in I} p_{ik}(s)\, p_{kj}(t) = p_{ij}(t+s). \quad (3.3)$$

Proof.
$$p_{ij}(t+s) = P(X_{t+s} = j \mid X_0 = i) = \sum_{k \in I} P(X_{t+s} = j, X_s = k \mid X_0 = i)
= \sum_{k \in I} P(X_{t+s} = j \mid X_s = k)\, P(X_s = k \mid X_0 = i).$$

Equation (3.3) shows that if we know the transition probability for $t < t_0$, for any $t_0 > 0$, then we know it for all $t$. This observation suggests that the transition probabilities $p_{ij}(t)$ can be determined from their derivatives at 0:
$$q_{ij} = \lim_{h \to 0} \frac{p_{ij}(h)}{h}, \quad j \neq i.$$
If this limit exists we will call $q_{ij}$ the jump rate from $i$ to $j$. For Example 3.3.1, $q_{ij} = \lambda\pi_{ij}$.
Example 3.3.1 is atypical: there we started with the Markov chain and then derived its rates. In most cases it is much simpler to describe the system by writing down its transition rates $q_{ij}$ for $j \neq i$, which describe the rates at which jumps are made from $i$ to $j$.
Example 3.3.2 Poisson process: $q_{n,n+1} = \lambda$ for all $n \geq 0$.
Example 3.3.3 M/M/s queue. Imagine a bank with $s$ tellers that serve customers who queue in a single line if all of the servers are busy. We imagine that customers arrive at times of a Poisson process with rate $\lambda$, and that each service time is an independent exponential with rate $\mu$. As in the previous example, $q_{n,n+1} = \lambda$. To model the departures we let
$$q_{n,n-1} = \begin{cases} n\mu & 0 \leq n \leq s \\ s\mu & n > s. \end{cases}$$

3.3.2 Computing the transition probability


Theorem 3.3.1 Let $Q$ be a matrix on a finite set $I$. Set $P(t) = e^{tQ}$. Then $(P(t),\ t \geq 0)$ has the following properties:
(i) $P(s+t) = P(s)P(t)$ for all $s, t$ (semigroup property);
(ii) $(P(t),\ t \geq 0)$ is the unique solution to the forward equation $\frac{d}{dt}P(t) = P(t)Q$, $P(0) = I$;
(iii) $(P(t),\ t \geq 0)$ is the unique solution to the backward equation $\frac{d}{dt}P(t) = QP(t)$, $P(0) = I$;
(iv) for $k = 0, 1, 2, ...$, we have $\frac{d^k}{dt^k}\big|_{t=0} P(t) = Q^k$.
Proof. For any $s, t \in \mathbb{R}$, $sQ$ and $tQ$ commute, so
$$e^{sQ} e^{tQ} = e^{(s+t)Q},$$
proving the semigroup property. In fact, if $Q_1$ and $Q_2$ commute,
$$e^{Q_1 + Q_2} = \sum_{n=0}^{\infty} \frac{(Q_1 + Q_2)^n}{n!} = \sum_{n=0}^{\infty} \frac{1}{n!} \sum_{k=0}^{n} \frac{n!}{k!(n-k)!} Q_1^k Q_2^{n-k} = \sum_{k=0}^{\infty} \frac{Q_1^k}{k!} \sum_{n=k}^{\infty} \frac{Q_2^{n-k}}{(n-k)!} = e^{Q_1} e^{Q_2}.$$

The matrix-valued power series
$$P(t) = \sum_{k=0}^{\infty} \frac{(tQ)^k}{k!}$$
has infinite radius of convergence. So each component is differentiable with derivative given by term-by-term differentiation:
$$P'(t) = \sum_{k=1}^{\infty} \frac{t^{k-1} Q^k}{(k-1)!} = P(t)Q = QP(t).$$
Hence $P(t)$ satisfies the forward and backward equations. By repeated term-by-term differentiation we obtain (iv). To show uniqueness, assume that $M(t)$ is another solution of the forward equation; then
$$\frac{d}{dt}\left(M(t)\, e^{-tQ}\right) = \left(\frac{d}{dt} M(t)\right) e^{-tQ} + M(t)\, \frac{d}{dt}\left(e^{-tQ}\right)
= M(t) Q e^{-tQ} + M(t)(-Q) e^{-tQ} = 0,$$
so $M(t) e^{-tQ}$ is constant, and so $M(t) = e^{tQ}$. Similarly for the backward equation.

Example 3.3.4 Let
$$Q = \begin{pmatrix} -2 & 1 & 1 \\ 1 & -1 & 0 \\ 2 & 1 & -3 \end{pmatrix}, \qquad \det(Q - \lambda I) = -\lambda(\lambda + 2)(\lambda + 4).$$
Then $Q$ has eigenvalues $0, -2, -4$, so there is an invertible matrix $U$ such that
$$Q = U \begin{pmatrix} 0 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & -4 \end{pmatrix} U^{-1},$$
so
$$e^{tQ} = \sum_{k=0}^{\infty} \frac{(tQ)^k}{k!} = U \left(\sum_{k=0}^{\infty} \frac{1}{k!} \begin{pmatrix} 0^k & 0 & 0 \\ 0 & (-2t)^k & 0 \\ 0 & 0 & (-4t)^k \end{pmatrix}\right) U^{-1}
= U \begin{pmatrix} 1 & 0 & 0 \\ 0 & e^{-2t} & 0 \\ 0 & 0 & e^{-4t} \end{pmatrix} U^{-1},$$
and each entry has the form $p_{ij}(t) = a + b e^{-2t} + c e^{-4t}$. To determine the constants we can use
$$P(0) = I, \quad P'(0) = Q, \quad P''(0) = Q^2.$$
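For this example the diagonalization can also be bypassed by computing $P(t) = e^{tQ}$ numerically; a sketch (it assumes scipy is available for the matrix exponential):

```python
import numpy as np
from scipy.linalg import expm

Q = np.array([[-2.0, 1.0, 1.0],
              [1.0, -1.0, 0.0],
              [2.0, 1.0, -3.0]])

print(np.linalg.eigvals(Q))   # close to 0, -2, -4
P1 = expm(1.0 * Q)            # P(1) = e^{Q}
print(P1)
print(P1.sum(axis=1))         # each row sums to 1: P(t) is stochastic
```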

Theorem 3.3.2 Consider a continuous-time Markov chain with a finite set of states. Assume that as $h \downarrow 0$ we have
$$p_{ij}(h) = \delta_{ij} + q_{ij} h + o(h), \quad \text{for all } i, j \in I.$$
Then $\{p_{ij}(t),\ i, j \in I,\ t \geq 0\}$ is the solution of the forward equation
$$p'(t) = p(t)Q, \quad p(0) = I,$$
where $(p(t))_{ij} = p_{ij}(t)$ and $Q_{ij} = q_{ij}$.

Proof. Similar to Theorem 3.2.4:
$$p_{ij}(t+h) = \sum_{k \in I} p_{ik}(t)\, (\delta_{kj} + q_{kj} h + o(h)).$$
Since $I$ is finite,
$$\frac{p_{ij}(t+h) - p_{ij}(t)}{h} = \sum_{k \in I} p_{ik}(t)\, q_{kj} + o(1).$$

Remark 3.3.1 Note that in the previous theorem $q_{ii} = -\sum_{j \neq i} q_{ij}$. By Theorem 3.3.1, $\{p_{ij}(t),\ i, j \in I,\ t \geq 0\}$ is also the solution of the backward equation
$$p'(t) = Qp(t), \quad p(0) = I,$$
where $(p(t))_{ij} = p_{ij}(t)$ and $Q_{ij} = q_{ij}$.

Remark 3.3.2 We have similar results for the case of infinite state-space. But
in this case we have to look for a minimal non-negative solution of the backward
and forward equations.

Example 3.3.5 Consider a two-state chain where, for concreteness, $I = \{1, 2\}$. In this case we have to specify only two rates, $q_{12} = \lambda$ and $q_{21} = \mu$, so
$$Q = \begin{pmatrix} -\lambda & \lambda \\ \mu & -\mu \end{pmatrix}.$$
Writing the backward equation in matrix form,
$$\begin{pmatrix} p_{11}'(t) & p_{12}'(t) \\ p_{21}'(t) & p_{22}'(t) \end{pmatrix} = \begin{pmatrix} -\lambda & \lambda \\ \mu & -\mu \end{pmatrix} \begin{pmatrix} p_{11}(t) & p_{12}(t) \\ p_{21}(t) & p_{22}(t) \end{pmatrix},$$
to solve this we guess that
$$p_{11}(t) = \frac{\mu}{\mu+\lambda} + \frac{\lambda}{\mu+\lambda} e^{-(\mu+\lambda)t}, \qquad
p_{21}(t) = \frac{\mu}{\mu+\lambda} - \frac{\mu}{\mu+\lambda} e^{-(\mu+\lambda)t}.$$

Example 3.3.6 (The Yule process). In this model each particle splits into two at rate $\beta$, so $q_{i,i+1} = \beta i$. It can be shown that
$$p_{1j}(t) = e^{-\beta t}(1 - e^{-\beta t})^{j-1} \quad \text{for } j \geq 1. \quad (3.4)$$
From this, since the chain starting at $i$ is the sum of $i$ independent copies of the chain starting at 1, we have (a negative binomial distribution)
$$p_{ij}(t) = \binom{j-1}{i-1} e^{-i\beta t} (1 - e^{-\beta t})^{j-i}.$$

3.3.3 Jump chain and holding times


We say that a matrix $Q = (q_{ij},\ i, j \in I)$ is a rate matrix if
(i) $0 \leq -q_{ii} < \infty$ for all $i$;
(ii) $q_{ij} \geq 0$ for all $i \neq j$;
(iii) $\sum_{j \in I} q_{ij} = 0$ for all $i$.
We shall write $q_i$ or $q(i)$ as an alternative notation for $-q_{ii}$. From a rate matrix $Q$ we can obtain a stochastic matrix $\Pi$, the jump matrix:
$$\pi_{ij} = \begin{cases} q_{ij}/q_i & \text{if } j \neq i \text{ and } q_i \neq 0 \\ 0 & \text{if } j \neq i \text{ and } q_i = 0, \end{cases} \qquad
\pi_{ii} = \begin{cases} 0 & \text{if } q_i \neq 0 \\ 1 & \text{if } q_i = 0. \end{cases}$$
Example 3.3.7
$$Q = \begin{pmatrix} -2 & 1 & 1 \\ 1 & -1 & 0 \\ 2 & 1 & -3 \end{pmatrix}, \quad \text{then} \quad
\Pi = \begin{pmatrix} 0 & 1/2 & 1/2 \\ 1 & 0 & 0 \\ 2/3 & 1/3 & 0 \end{pmatrix}.$$
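The jump matrix can be produced mechanically from the two-case definition above; a small sketch, checked against this example:

```python
import numpy as np

def jump_matrix(Q):
    """Jump matrix Pi of a rate matrix Q: pi_ij = q_ij / q_i off the
    diagonal when q_i = -q_ii > 0; states with q_i = 0 get pi_ii = 1."""
    Q = np.asarray(Q, dtype=float)
    Pi = np.zeros_like(Q)
    for i in range(len(Q)):
        qi = -Q[i, i]
        if qi > 0:
            Pi[i] = Q[i] / qi
            Pi[i, i] = 0.0
        else:
            Pi[i, i] = 1.0
    return Pi

Q = [[-2, 1, 1], [1, -1, 0], [2, 1, -3]]
print(jump_matrix(Q))  # rows (0, 1/2, 1/2), (1, 0, 0), (2/3, 1/3, 0)
```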
Definition 3.3.2 A minimal right-continuous process $(X_t)_{t \geq 0}$ on $I$ is a Markov chain with initial distribution $\lambda$ and rate matrix $Q$ if its jump chain $(Y_n)_{n \geq 0}$ is a discrete-time Markov chain $(\lambda, \Pi)$ and if, for each $n \geq 1$, conditional on $Y_0, Y_1, ..., Y_{n-1}$, its holding times $S_1, S_2, ..., S_n$ are independent exponential random variables of parameters $q(Y_0), q(Y_1), ..., q(Y_{n-1})$. In such a way that, if we write the jump times as $J_n = S_1 + S_2 + ... + S_n$, then
$$X_{J_n} = Y_n,$$
and
$$X_t = \begin{cases} Y_n & \text{if } J_n \leq t < J_{n+1} \text{ for some } n \\ \infty & \text{otherwise (it is minimal)}. \end{cases}$$
3.3. CONTINUOUS-TIME MARKOV CHAINS 57

Let $X_0 = Y_0$ have distribution $\lambda$, and let $T_n^j$, $n \geq 0$, $j \in I$, be an array of independent exponential random variables of parameter 1. Then, inductively, for $n = 0, 1, 2, ...$, if $Y_n = i$ we set
$$S_{n+1}^j = T_n^j / q_{ij}, \quad \text{for } j \neq i,$$
$$S_{n+1} = \inf_{j \neq i} S_{n+1}^j,$$
$$Y_{n+1} = \begin{cases} j & \text{if } S_{n+1}^j = S_{n+1} < \infty \\ i & \text{if } S_{n+1} = \infty. \end{cases}$$
Then, conditional on $Y_n = i$, $S_{n+1}$ is $E(q_i)$ with $q_i = \sum_{j \neq i} q_{ij}$, $Y_{n+1}$ has distribution $(\pi_{ij},\ j \in I)$, and $S_{n+1}$ and $Y_{n+1}$ are independent, and independent of $Y_0, Y_1, ..., Y_{n-1}$ and $S_1, ..., S_n$.
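This construction is also a recipe for simulating the chain: draw the holding time from $E(q_i)$, then the next state from $(\pi_{ij},\ j \in I)$. A minimal sketch (it reuses the jump_matrix function from the previous snippet):

```python
import numpy as np

def simulate_ctmc(Q, x0, t_max, rng):
    """Simulate a path (jump times and states) of the minimal chain
    with rate matrix Q, started at x0, up to time t_max."""
    Q = np.asarray(Q, dtype=float)
    Pi = jump_matrix(Q)                  # defined in the previous sketch
    t, x = 0.0, x0
    path = [(t, x)]
    while True:
        qi = -Q[x, x]
        if qi == 0:                      # absorbing state: stay forever
            break
        t += rng.exponential(1 / qi)     # holding time S ~ E(q_i)
        if t >= t_max:
            break
        x = rng.choice(len(Q), p=Pi[x])  # next state from the jump chain
        path.append((t, x))
    return path

rng = np.random.default_rng(3)
Q = [[-2, 1, 1], [1, -1, 0], [2, 1, -3]]
print(simulate_ctmc(Q, 0, 5.0, rng)[:5])
```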

3.3.4 Birth processes


A birth process, say $(X_t)_{t \geq 0}$, is a generalization of a Poisson process in which the parameter $\lambda$ is allowed to depend on the current state of the process. The data for a birth process consist of birth rates $0 \leq q_{i,i+1} < \infty$, where $i = 0, 1, 2, ....$ That is, $(X_t)_{t \geq 0}$ is a right-continuous process with values in $\mathbb{Z}_+ \cup \{\infty\}$ such that, conditional on $X_0 = i$, its holding times are independent exponentials of parameters $q_{i,i+1}, q_{i+1,i+2}, ...$ and its jump chain is given by $Y_n = i + n$. So
$$Q = \begin{pmatrix} -q_{01} & q_{01} & & \\ & -q_{12} & q_{12} & \\ & & -q_{23} & q_{23} \\ & & & \ddots \end{pmatrix}.$$
Example 3.3.8 (Simple birth process or Yule process) $q_{i,i+1} = \beta i$. Write $X_t$ for the number of individuals at time $t$. Suppose that $X_0 = 1$ and let $J_1$ denote, as above, the time of the first birth; then $J_1 \sim E(\beta)$ and
$$E(X_t) = E(X_t 1_{\{J_1 \leq t\}}) + E(X_t 1_{\{J_1 > t\}})
= \int_0^t \beta e^{-\beta s}\, E(X_t \mid J_1 = s)\, ds + E(1_{\{J_1 > t\}})
= \int_0^t \beta e^{-\beta s}\, E(X_t \mid J_1 = s)\, ds + e^{-\beta t}.$$
If we put $\mu(t) = E(X_t)$, then $E(X_t \mid J_1 = s) = 2\mu(t-s)$: if we have a birth at $s$, thereafter we have two independent birth processes of the same type as $X$. Then
$$\mu(t) = 2\int_0^t \beta e^{-\beta s}\, \mu(t-s)\, ds + e^{-\beta t},$$

and from here we obtain
$$\mu'(t) = \beta\mu(t),$$
so the mean population size grows exponentially:
$$E(X_t) = e^{\beta t}.$$
Note also that we can obtain $\mu(t)$ from (3.4):
$$\mu(t) = \sum_{j=1}^{\infty} j\, p_{1j}(t) = \sum_{j=1}^{\infty} j e^{-\beta t}(1 - e^{-\beta t})^{j-1}
= \frac{1}{\beta} \frac{d}{dt} \sum_{j=1}^{\infty} (1 - e^{-\beta t})^j = \frac{1}{\beta} \frac{d}{dt}\, \frac{1 - e^{-\beta t}}{e^{-\beta t}} = e^{\beta t}.$$
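The exponential mean growth can also be checked by simulating the Yule process; a sketch ($\beta$, the horizon and the number of paths are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(4)
beta, t_max, n_paths = 1.0, 2.0, 5_000

sizes = np.empty(n_paths)
for k in range(n_paths):
    t, x = 0.0, 1
    while True:
        t += rng.exponential(1 / (beta * x))  # holding time ~ E(beta * i)
        if t > t_max:
            break
        x += 1                                # one birth
    sizes[k] = x

print(sizes.mean(), np.exp(beta * t_max))  # both close to e^{beta t} ~ 7.39
```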
Definition 3.3.3 If $(J_n)_{n \geq 1}$ are the jump times and $(S_n)_{n \geq 1}$ the holding times of our Markov chain, then
$$\zeta = \sup_n J_n = \sum_{n=1}^{\infty} S_n$$
is called the explosion time.


Remark 3.3.3 Note that our Markov chains, say $(X_t)_{t \geq 0}$, are minimal processes in the sense that
$$X_t = \infty \text{ if } t \geq \zeta.$$

Remark 3.3.4 It is easy to see from Theorem 3.2.1 that the birth process starting from zero explodes if and only if
$$\sum_{i=0}^{\infty} \frac{1}{q_{i,i+1}} < \infty.$$

Example 3.3.9 (Non-minimal chain) Consider a birth process $(X_t)_{t \geq 0}$ starting from 0 with rates $q_{i,i+1} = 2^i$ for $i \geq 0$. Then, by the previous remark, the process explodes. We have insisted that $X_t = \infty$ if $t \geq \zeta$, where $\zeta$ is the explosion time, but another obvious possibility is to start the process off again from zero at time $\zeta$, and to do the same for all subsequent explosions. Using the memoryless property of the exponential distribution it can be shown that the new process is a continuous-time Markov chain in the sense of Definition 3.3.1.
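The explosion itself is easy to see numerically: with $q_{i,i+1} = 2^i$ the holding times satisfy $E(\zeta) = \sum_i 2^{-i} = 2$, so the simulated explosion times concentrate near a finite value. A sketch (truncating the sum at an illustrative number of terms, beyond which the expected tail is negligible):

```python
import numpy as np

rng = np.random.default_rng(5)
n_terms = 60  # the neglected tail has expectation below 2**-59

zetas = [sum(rng.exponential(1 / 2.0**i) for i in range(n_terms))
         for _ in range(10_000)]
print(np.mean(zetas))  # close to 2: zeta = sum of holding times < infinity
```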

3.3.5 Class structure


Hereafter we deal only with minimal chains, those that die after explosion. Then the class structure is simply the discrete-time class structure of the jump chain $(Y_n)_{n \geq 0}$. We say that $i$ leads to $j$, and write $i \to j$, if
$$P_i(X_t = j \text{ for some } t \geq 0) > 0.$$
We say $i$ communicates with $j$, and write $i \longleftrightarrow j$, if both $i \to j$ and $j \to i$. The notions of communicating class, closed class, absorbing state and irreducibility are inherited from the jump chain.

Theorem 3.3.3 For distinct states $i$ and $j$ the following are equivalent:
(i) $i \to j$;
(ii) $i \to j$ for the jump chain;
(iii) $q_{i_0 i_1} q_{i_1 i_2} \cdots q_{i_{n-1} i_n} > 0$ for some states $i_0, i_1, ..., i_n$ with $i_0 = i$ and $i_n = j$;
(iv) $p_{ij}(t) > 0$ for all $t > 0$;
(v) $p_{ij}(t) > 0$ for some $t > 0$.

3.3.6 Hitting times


Let $(X_t)_{t \geq 0}$ be a Markov chain with rate matrix $Q$. The hitting time of a subset $A$ of $I$ is the random variable $D^A$ defined by
$$D^A(\omega) = \inf\{t \geq 0,\ X_t(\omega) \in A\},$$
with the usual convention that $\inf \emptyset = \infty$. Since $(X_t)_{t \geq 0}$ is minimal, if $H^A$ is the hitting time of $A$ for the jump chain, then
$$\{H^A < \infty\} = \{D^A < \infty\},$$
and on this set
$$D^A = J_{H^A}.$$
The probability, starting from $i$, that $(X_t)_{t \geq 0}$ ever hits $A$ is then
$$h_i^A = P_i(D^A < \infty) = P_i(H^A < \infty).$$

Theorem 3.3.4 The vector of hitting probabilities $h^A = (h_i^A,\ i \in I)$ is the minimal non-negative solution to the system of linear equations
$$\begin{cases} h_i^A = 1 & \text{for } i \in A \\ \sum_{j \in I} q_{ij} h_j^A = 0 & \text{for } i \notin A. \end{cases}$$

The average time taken, starting from $i$, for $(X_t)_{t \geq 0}$ to reach $A$ is given by
$$k_i^A = E_i(D^A).$$

Theorem 3.3.5 Assume that $q_i > 0$ for all $i \notin A$. The vector of expected hitting times $k^A = (k_i^A,\ i \in I)$ is the minimal non-negative solution to the system of linear equations
$$\begin{cases} k_i^A = 0 & \text{for } i \in A \\ -\sum_{j \in I} q_{ij} k_j^A = 1 & \text{for } i \notin A. \end{cases} \quad (3.5)$$

Proof. First note that $k^A$ satisfies (3.5): if $X_0 = i \in A$ then $D^A = 0$, so $k_i^A = 0$; if $X_0 = i \notin A$, by the Markov property
$$E_i(D^A - J_1 \mid Y_1 = j) = E_j(D^A),$$
so
$$k_i^A = E_i(D^A) = E_i(J_1) + E_i(D^A - J_1)
= E_i(J_1) + \sum_{j \neq i} E_i(D^A - J_1 \mid Y_1 = j)\, P_i(Y_1 = j)$$
$$= \frac{1}{q_i} + \sum_{j \neq i} E_j(D^A)\, \pi_{ij} = \frac{1}{q_i} + \sum_{j \neq i} \pi_{ij} k_j^A
= \frac{1}{q_i} + \sum_{j \neq i} \frac{q_{ij}}{q_i} k_j^A,$$
and then
$$-q_{ii} k_i^A - \sum_{j \neq i} q_{ij} k_j^A = 1.$$
Suppose now that $y = (y_i,\ i \in I)$ is another non-negative solution to (3.5). Then $k_i^A = y_i = 0$ for $i \in A$. If $i \notin A$, then
$$y_i = \frac{1}{q_i} + \sum_{j \notin A} \pi_{ij} y_j
= \frac{1}{q_i} + \sum_{j \notin A} \pi_{ij}\left(\frac{1}{q_j} + \sum_{k \notin A} \pi_{jk} y_k\right)
= E_i(S_1) + E_i(S_2 1_{\{H^A \geq 2\}}) + \sum_{j \notin A} \sum_{k \notin A} \pi_{ij} \pi_{jk} y_k.$$
By repeated substitution for $y$ we obtain
$$y_i = E_i(S_1) + \cdots + E_i(S_n 1_{\{H^A \geq n\}}) + \sum_{j_1 \notin A} \cdots \sum_{j_n \notin A} \pi_{i j_1} \cdots \pi_{j_{n-1} j_n} y_{j_n}.$$
So, since $y \geq 0$,
$$y_i \geq \sum_{m=1}^{n} E_i(S_m 1_{\{H^A \geq m\}}) = E_i\left(\sum_{m=1}^{n \wedge H^A} S_m\right) \to E_i(D^A) = k_i^A \text{ as } n \to \infty.$$
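For a finite chain, (3.5) reduces to a linear solve over the states outside $A$; a sketch using the rate matrix of Example 3.3.4 with the illustrative target set $A = \{2\}$ (the third state):

```python
import numpy as np

Q = np.array([[-2.0, 1.0, 1.0],
              [1.0, -1.0, 0.0],
              [2.0, 1.0, -3.0]])
A = [2]                                  # target set (illustrative)
B = [i for i in range(len(Q)) if i not in A]

# Solve -sum_j q_ij k_j = 1 for i outside A, with k_j = 0 on A, so the
# system restricts to the rows and columns indexed by B.
k_B = np.linalg.solve(-Q[np.ix_(B, B)], np.ones(len(B)))
k = np.zeros(len(Q))
k[B] = k_B
print(k)  # expected hitting times E_i(D^A) for each starting state
```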

3.3.7 Recurrence and transience

Let $(X_t)_{t \geq 0}$ be a Markov chain with rate matrix $Q$. Recall that $(X_t)_{t \geq 0}$ is minimal. We say a state $i$ is recurrent if
$$P_i(\{t \geq 0,\ X_t = i\} \text{ is unbounded}) = 1.$$
We say it is transient if
$$P_i(\{t \geq 0,\ X_t = i\} \text{ is unbounded}) = 0.$$

Remark 3.3.5 Note that if $(X_t)_{t \geq 0}$ can explode starting from $i$, then $i$ is certainly not recurrent.

Theorem 3.3.6 We have:
(i) if $i$ is recurrent for the jump chain $(Y_n)_{n \geq 0}$, then $i$ is recurrent for $(X_t)_{t \geq 0}$;
(ii) if $i$ is transient for the jump chain $(Y_n)_{n \geq 0}$, then $i$ is transient for $(X_t)_{t \geq 0}$;
(iii) every state is either recurrent or transient;
(iv) recurrence and transience are class properties.

Proof. (i) Suppose $i$ is recurrent for the jump chain $(Y_n)_{n \geq 0}$. If $X_0 = i$ then $(X_t)_{t \geq 0}$ does not explode and $J_n \to \infty$. Also $X(J_n) = Y_n = i$ infinitely often, so $\{t \geq 0,\ X_t = i\}$ is unbounded, with probability 1.
(ii) Suppose $i$ is transient for $(Y_n)_{n \geq 0}$. If $X_0 = i$ then
$$N = \sup\{n \geq 0,\ Y_n = i\} < \infty,$$
so $\{t \geq 0,\ X_t = i\}$ is bounded by $J_{N+1}$, which is finite with probability 1, because $\{Y_n,\ n \leq N\}$ cannot include an absorbing state.

We denote by $T_i$ the first passage time of $(X_t)_{t \geq 0}$ to state $i$, defined by
$$T_i(\omega) = \inf\{t \geq J_1(\omega),\ X_t(\omega) = i\}.$$

Theorem 3.3.7 The following dichotomy holds:
(i) if $q_i = 0$ or $P_i(T_i < \infty) = 1$, then $i$ is recurrent and $\int_0^{\infty} p_{ii}(t)\, dt = \infty$;
(ii) if $q_i > 0$ and $P_i(T_i < \infty) < 1$, then $i$ is transient and $\int_0^{\infty} p_{ii}(t)\, dt < \infty$.

Proof. If $q_i = 0$ then $(X_t)_{t \geq 0}$ cannot leave $i$, so $i$ is recurrent and $p_{ii}(t) = 1$ for all $t$. Suppose $q_i > 0$. Let $N_i$ denote the first passage time of the jump chain $(Y_n)_{n \geq 0}$ to the state $i$. Then
$$P_i(N_i < \infty) = P_i(T_i < \infty),$$
so $i$ is recurrent if and only if $P_i(T_i < \infty) = 1$, by the previous theorem and the corresponding result for the jump chain $(Y_n)_{n \geq 0}$.
Write $\pi_{ii}^{(n)}$ for $(\Pi^n)_{ii}$. We shall prove that
$$\int_0^{\infty} p_{ii}(t)\, dt = \frac{1}{q_i} \sum_{n=0}^{\infty} \pi_{ii}^{(n)},$$
so that the integral dichotomy also follows from the corresponding discrete-time result. Indeed,
$$\int_0^{\infty} p_{ii}(t)\, dt = \int_0^{\infty} E_i(1_{\{X_t = i\}})\, dt = E_i\left(\int_0^{\infty} 1_{\{X_t = i\}}\, dt\right)
= E_i\left(\sum_{n=0}^{\infty} S_{n+1} 1_{\{Y_n = i\}}\right)
= \sum_{n=0}^{\infty} E_i(S_{n+1} \mid Y_n = i)\, P_i(Y_n = i) = \frac{1}{q_i} \sum_{n=0}^{\infty} \pi_{ii}^{(n)}.$$

3.3.8 Invariant distributions


We say that $\lambda$ is invariant if
$$\lambda Q = 0.$$

Theorem 3.3.8 Let $Q$ be a rate matrix with jump matrix $\Pi$ and let $\lambda$ be a measure. The following are equivalent:
(i) $\lambda$ is invariant;
(ii) $\mu\Pi = \mu$, where $\mu_i = \lambda_i q_i$.

Proof. We have $q_i(\pi_{ij} - \delta_{ij}) = q_{ij}$ for all $i, j$, so
$$(\mu(\Pi - I))_j = \sum_{i \in I} \mu_i (\pi_{ij} - \delta_{ij}) = \sum_{i \in I} \lambda_i q_{ij}.$$

Theorem 3.3.9 Suppose that $Q$ is irreducible and recurrent. Then $Q$ has an invariant measure $\lambda$, unique up to scalar multiples.

Recall that a state $i$ is recurrent if $q_i = 0$ or $P_i(T_i < \infty) = 1$. If $q_i = 0$ or the expected return time $m_i = E_i(T_i)$ is finite, we say $i$ is positive recurrent; otherwise it is called null recurrent.

Theorem 3.3.10 Let $Q$ be an irreducible rate matrix. Then the following are equivalent:
(i) every state is positive recurrent;
(ii) some state $i$ is positive recurrent;
(iii) $Q$ has an invariant distribution, $\lambda$ say.
Moreover, when (iii) holds we have $m_i = 1/(\lambda_i q_i)$ for all $i$.

Theorem 3.3.11 Let $Q$ be an irreducible rate matrix, and let $\lambda$ be a measure. Let $s > 0$ be given. The following are equivalent:
(i) $\lambda Q = 0$;
(ii) $\lambda P(s) = \lambda$.

3.3.9 Convergence to equilibrium


Theorem 3.3.12 Let $Q$ be an irreducible non-explosive rate matrix with semigroup $P(t)$, having an invariant distribution $\lambda$. Then for all states $i, j$ we have
$$p_{ij}(t) \to \lambda_j \text{ as } t \to \infty.$$

Theorem 3.3.13 Let $Q$ be an irreducible rate matrix and let $\nu$ be any distribution. Suppose that $(X_t)_{t \geq 0}$ is Markov$(\nu, Q)$. Then
$$P(X_t = j) \to \frac{1}{q_j m_j} \text{ as } t \to \infty, \text{ for all } j \in I,$$
where $m_j$ is the expected return time to state $j$.
