
# 1 Preliminaries

Collection of useful facts.
Lemma 1 (Jensen's Inequality, Proposition 1.4.2 in [3]). Suppose that X is an integrable R-valued random variable on (Ω, ℱ, µ) and φ is a convex function. Then φ(E[X]) ≤ E[φ(X)].
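As a quick sanity check, Jensen's inequality can be verified numerically for a discrete random variable; the distribution and the convex function φ(x) = x² below are arbitrary illustrative choices, not taken from [3].

```python
# Numerical illustration of Jensen's inequality: phi(E[X]) <= E[phi(X)]
# for a convex phi, here phi(x) = x**2, and a discrete random variable X.
values = [-1.0, 0.0, 2.0]          # possible values of X
probs = [0.25, 0.25, 0.5]          # their probabilities (they sum to 1)

phi = lambda x: x ** 2             # a convex function

E_X = sum(p * x for p, x in zip(probs, values))          # E[X] = 0.75
E_phiX = sum(p * phi(x) for p, x in zip(probs, values))  # E[phi(X)] = 2.25

assert phi(E_X) <= E_phiX          # 0.5625 <= 2.25
```

The inequality is strict here because X is not almost surely constant and φ is strictly convex.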
Proposition 1 (Operations that preserve measurability, see here). Suppose that (f_n), f and g map from some measurable space to R and that α ∈ R. Then

f + g, αf, fg, f ∨ g, f ∧ g, limsup_{n→∞} f_n, liminf_{n→∞} f_n

are all measurable. So if (f_n) converges pointwise to f we have that f is measurable, since f = limsup_{n→∞} f_n = liminf_{n→∞} f_n.
Theorem 1 (Functional Monotone Class Theorem, Theorem 1.5, p. 277 of [1]). Let 𝒜 be a π-system (that is, closed under finite intersections) that contains Ω and let ℋ be a collection of real-valued functions on Ω that satisfies:
• If A ∈ 𝒜, then 1_A ∈ ℋ.
• If f, g ∈ ℋ, then f + g, cf ∈ ℋ for any real number c.
• If (f_n) ⊂ ℋ is a sequence of non-negative, increasing functions that tend pointwise to a bounded function f, then f ∈ ℋ.
Then ℋ contains all bounded functions that are measurable with respect to σ(𝒜).
1.1 Dynkin's π-λ Theorem
Definition 1 (π- and λ-systems). Let Ω be a set. A π-system on Ω is a collection of subsets of Ω that is closed under finite intersections. A λ-system 𝒟 is a collection of subsets of Ω such that Ω ∈ 𝒟, such that A, B ∈ 𝒟 with A ⊂ B implies that B − A ∈ 𝒟, and such that for any sequence of increasing sets A_1 ⊂ A_2 ⊂ … contained in 𝒟, the union of the sets is also contained in 𝒟.
Lemma 2 (Dynkin's π-λ Theorem, Lemma 1.7.2 in [3]). Let 𝒞 be a π-system on some set Ω and let 𝒟 := λ{𝒞} be the smallest λ-system on Ω such that 𝒞 ⊂ 𝒟. Then 𝒟 = σ{𝒞}.
Corollary 1 (Lemma 1.7.3 in [3]). Let (Ω, ℱ) be a measurable space, and let 𝒞 be a π-system such that ℱ = σ{𝒞}. If two probability measures P and Q agree on 𝒞, that is, if we have that P(A) = Q(A) for all A ∈ 𝒞, then P and Q are equal (that is, P(A) = Q(A) for all A ∈ ℱ).
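On a finite Ω both closures in Lemma 2 can be computed by brute force, giving a concrete check that λ{𝒞} = σ{𝒞} when 𝒞 is a π-system; the four-point Ω and the generating π-system below are arbitrary illustrative choices.

```python
# Finite sanity check of Dynkin's pi-lambda theorem on Omega = {1, 2, 3, 4}.
Omega = frozenset({1, 2, 3, 4})
# P is a pi-system: it is closed under pairwise intersections.
P = {frozenset({1, 2}), frozenset({2, 3}), frozenset({2})}

def close(gen, produce):
    """Repeatedly apply `produce` to pairs of sets until nothing new appears."""
    sys = set(gen) | {Omega}
    changed = True
    while changed:
        changed = False
        for A in list(sys):
            for B in list(sys):
                for C in produce(A, B):
                    if C not in sys:
                        sys.add(C)
                        changed = True
    return sys

# lambda-system closure: Omega is in, and B - A whenever A is a subset of B.
# (On a finite Omega, closure under increasing unions holds automatically.)
lam = close(P, lambda A, B: [B - A] if A <= B else [])
# sigma-algebra closure: complements and pairwise unions suffice on a finite Omega.
sig = close(P, lambda A, B: [Omega - A, A | B])

assert lam == sig          # Lemma 2: lambda{P} = sigma{P} for a pi-system P
print(len(sig))            # here sigma{P} is the full power set, 16 sets
```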
1.2 Convergence Theorems
Definition 2. A collection of random variables {X_α}_α on a probability space (Ω, ℱ, P) is said to be uniformly integrable if

sup_α ∫_{|X_α|≥c} |X_α| dP → 0 as c → ∞.
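For intuition, the classical family X_n = n·1_{A_n} with P(A_n) = 1/n shows what failure of uniform integrability looks like: each member is integrable with E[|X_n|] = 1, yet the tail mass never decays. The computation below is exact, not a simulation; the family is an illustrative choice.

```python
# The family X_n = n * 1_{A_n} with P(A_n) = 1/n is NOT uniformly integrable:
# the tail integral over {|X_n| >= c} equals 1 for every n >= c, so the
# supremum over the family does not tend to 0 as c -> infinity.
def tail_mass(n, c):
    """E[|X_n| 1_{|X_n| >= c}], exactly: X_n = n with prob 1/n, else 0."""
    return n * (1.0 / n) if n >= c else 0.0

for c in [10, 100, 1000]:
    sup = max(tail_mass(n, c) for n in range(1, 10 * c))
    assert sup == 1.0          # the sup over n stays at 1 for every cutoff c
print("family is not uniformly integrable")
```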
Theorem 2 (Vitali's Convergence Theorem, Theorem 4 on page 188 of [2]). Let (X_n) be a sequence of R-valued integrable random variables. Suppose that (X_n) is uniformly integrable and that X_n → X almost surely. Then X is integrable and E[|X_n − X|] → 0 as n → ∞ (which, by Jensen's inequality, implies that E[X_n] → E[X] as n → ∞).
1.3 Conditional expectation
Definition 3. Let (Ω, ℱ, µ) be a probability space, let X be an Rⁿ-valued integrable (that is, E[|X_i|] < ∞ for all i) random variable and let 𝒢 ⊂ ℱ be a sigma-algebra. Then E[X|𝒢] is the unique¹ 𝒢-measurable random variable that satisfies the relation E[1_A X] = E[1_A E[X|𝒢]] for all events A ∈ 𝒢.
Conditional expectations have the following handy properties:
Proposition 2 (Theorem 2.3.2 in [3]). Let X, Y be real-valued integrable functions on (Ω, ℱ, µ), let 𝒢, ℋ ⊂ ℱ be sigma-algebras, and let α, β ∈ R. Then
1. Linearity: E[αX + βY |𝒢] = αE[X|𝒢] + βE[Y |𝒢].
2. If X ≥ 0 almost surely, then E[X|𝒢] ≥ 0 almost surely.
3. Tower property: if ℋ ⊂ 𝒢, then E[E[X|𝒢]|ℋ] = E[X|ℋ] almost surely. Note that this implies that E[X] = E[X|{∅, Ω}] = E[E[X|𝒢]|{∅, Ω}] = E[E[X|𝒢]].
4. If X is 𝒢-measurable and XY is integrable, then E[XY |𝒢] = XE[Y |𝒢] almost surely. Thus E[X|𝒢] = XE[1|𝒢] = X almost surely.
5. If ℋ and σ{X, 𝒢} are independent, then E[X|σ{𝒢, ℋ}] = E[X|𝒢] almost surely. Thus, if ℋ and σ{X} are independent, E[X|ℋ] = E[X|σ{{Ω, ∅}, ℋ}] = E[X|{Ω, ∅}] = E[X] almost surely.
6. Monotone and dominated convergence theorems, Fatou's lemma, and Jensen's inequality all hold if we replace E[X] in them with E[X|𝒢].
¹Uniqueness is a consequence of the Radon-Nikodym Theorem, see [3].
# 2 Stochastic processes, filtrations and stopping times

2.1 Stochastic processes, indistinguishability and modifications
I start these notes on stochastic calculus with the definition of a continuous-time stochastic process. Very simply, a stochastic process is a collection of random variables {X_t}_{t≥0} defined on a probability space (Ω, ℱ, P). That is, for each time t ≥ 0, ω ↦ X_t(ω) is a measurable function from Ω to the real numbers.
Remark 1 (by George Lowther). Stochastic processes may also take values in any measurable space (E, ℰ) but, in these notes, I concentrate on real-valued processes. I am also restricting to the case where the time index t runs through the non-negative real numbers R_+, although everything can easily be generalized to other subsets of the reals.
A stochastic process X := {X_t}_{t≥0} can be viewed in any of the following three ways. As a collection of random variables, one for each time t ≥ 0. As a collection of paths

R_+ → R, t ↦ X_t(ω),

one for each ω ∈ Ω. These are referred to as the sample paths of the process. As a function from the product space

R_+ × Ω → R, (t, ω) ↦ X_t(ω).
As is often the case in probability theory, we are not interested in events which occur with zero probability. The theory of stochastic processes is no different, and two processes X and Y are said to be indistinguishable if there is an event A ⊆ Ω of probability one such that X_t(ω) = Y_t(ω) for all ω ∈ A and all t ≥ 0. This is the same as saying that they almost surely (that is, with probability one) have the same sample paths. Alternative language which is often used is that X and Y are equivalent up to evanescence. In general, when discussing any properties of a stochastic process, it is common to only care about what holds up to evanescence. For example, if a process has continuous sample paths with probability one, then it is referred to as a continuous (resp., right-continuous and left-continuous) process and we don't care if it actually has discontinuous (resp. non-right-continuous and non-left-continuous) paths on some event of zero probability.
It is important to realize that even if we have two processes X and Y such that, for each fixed t, the event X_t = Y_t occurs with probability one, it is not necessarily the case that they are indistinguishable. As an example, consider a random variable T uniformly distributed over the interval [0, 1], and define the process X_t = 1_{t=T}. For each time t, P(X_t ≠ 0) = P(T = t) = 0. However, X is not indistinguishable from the zero process, as the sample path t ↦ X_t(ω) always has one point at which X takes the value 1. The problem here is the uncountability of the non-negative real numbers used for the time index. By countable additivity of measures, if X_t = Y_t almost surely for each t, then we can infer that the sample paths of X and Y are almost surely identical on any given countable set of times S ⊂ R_+, but cannot extend this to uncountable sets.
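The example above can be played out in a few lines of Python; one sample of T is drawn, and a fixed grid of 1001 time points stands in for "any given countable set of times".

```python
import random

# The process X_t = 1_{t = T}, with T uniform on [0, 1], is a modification of
# the zero process: for each fixed t, P(X_t != 0) = P(T = t) = 0.  Yet every
# sample path takes the value 1 somewhere, so X is not indistinguishable
# from the zero process.
rng = random.Random(0)
T = rng.random()                         # one sample of T ~ Uniform[0, 1)
X = lambda t: 1.0 if t == T else 0.0     # the sample path t -> X_t(omega)

# On a fixed countable grid the path agrees with the zero process
# (T hits an exact grid point with probability zero) ...
grid = [k / 1000 for k in range(1001)]
assert all(X(t) == 0.0 for t in grid)
# ... but the path itself always differs from zero at the point t = T:
assert X(T) == 1.0
```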
This necessitates a further definition. A process Y is a version or modification of X if, for each time t ≥ 0, P(X_t = Y_t) = 1. Alternative language sometimes used is that X and Y are stochastically equivalent.
Whenever a stochastic process is defined in terms of its values at each individual time t, or in terms of its joint distributions at finite sets of times, then replacing it by any other version will still satisfy the definition. It is therefore important to choose a good version. Right-continuous (or left-continuous) versions are often used when possible, as this then defines the process up to evanescence.
Lemma 3. Let X and Y be right-continuous processes (resp. left-continuous processes) such that X_t = Y_t almost surely, at each time t ≥ 0. Then they are indistinguishable.
Proof. We prove the lemma only for right-continuous processes; the proof for left-continuous processes is almost identical. For S ⊂ R_+, let A_S := {ω : X_t(ω) ≠ Y_t(ω) for some t ∈ S}. We want to prove that P(A_{R≥0}) = 0. By countable subadditivity of measures we have that

P(A_{Q≥0}) ≤ Σ_{t∈Q≥0} P({ω : X_t(ω) ≠ Y_t(ω)}) = 0.

Let N be the null event on which the paths may fail to be right-continuous. Fix any ω ∉ N and suppose there exists a t ≥ 0 such that X_t(ω) ≠ Y_t(ω). Then |X_t(ω) − Y_t(ω)| = ε for some ε > 0. By right-continuity of both processes we can find an s ∈ Q≥0 with s ≥ t such that |X_t(ω) − X_s(ω)| ≤ ε/3 and |Y_t(ω) − Y_s(ω)| ≤ ε/3. Then

ε = |X_t(ω) − Y_t(ω)| ≤ |X_s(ω) − Y_s(ω)| + |X_t(ω) − X_s(ω)| + |Y_t(ω) − Y_s(ω)| ≤ |X_s(ω) − Y_s(ω)| + 2ε/3.

Thus |X_s(ω) − Y_s(ω)| ≥ ε/3 > 0, which implies that ω ∈ A_{Q≥0}. In other words,

A_{R≥0} ⊂ A_{Q≥0} ∪ N,

where N is a null event. So we have that P(A_{R≥0}) = 0.
Viewing a stochastic process in the third sense mentioned above, as a function on the product space R_+ × Ω, it is often necessary to impose a measurability condition. The process X is said to be jointly measurable if it is measurable with respect to the product sigma-algebra ℬ([0, ∞)) ⊗ ℱ.
2.2 Filtrations
In the previous subsection I introduced the concept of a stochastic process and its modifications. It is necessary to introduce a further concept, to represent the information available at each time. A filtration {ℱ_t}_{t≥0} on a probability space (Ω, ℱ, P) is a collection of sub-sigma-algebras of ℱ satisfying ℱ_s ⊆ ℱ_t whenever s ≤ t. The idea is that ℱ_t represents the set of events observable by time t. The probability space taken together with the filtration, (Ω, ℱ, {ℱ_t}_{t≥0}, P), is called a filtered probability space.
Given a filtration, its right and left limits at any time, and the limit at infinity, are as follows:

ℱ_{t+} := ∩_{s>t} ℱ_s,  ℱ_{t−} := σ( ∪_{s<t} ℱ_s ),  ℱ_∞ := σ( ∪_{t∈R≥0} ℱ_t ).
The left limit as defined here only really makes sense at positive times. Throughout these notes, I define the left limit at time zero as ℱ_{0−} := ℱ_0. The filtration is said to be right-continuous if ℱ_t = ℱ_{t+} for all t ≥ 0.
A probability space (Ω, ℱ, P) is complete if ℱ contains all subsets of zero-probability elements of ℱ. Any probability space can be extended to a complete probability space (its completion) in a unique way by enlarging the sigma-algebra to consist of all sets A ⊂ Ω such that B ⊆ A ⊆ C for some B, C ∈ ℱ satisfying P(C \ B) = 0. Similarly, a filtered probability space is said to be complete if the underlying probability space is complete and ℱ_0 contains all zero-probability sets.

Often, in stochastic process theory, filtered probability spaces are assumed to satisfy the usual conditions, meaning that the space is complete and the filtration is right-continuous. Note that any filtered probability space can be completed simply by completing the underlying probability space and then adding all zero-probability sets to each ℱ_t. Furthermore, replacing ℱ_t by ℱ_{t+}, any filtration can be enlarged to a right-continuous one. By these constructions, any filtered probability space can be enlarged in a minimal way to one satisfying the usual conditions. Throughout these notes I assume a complete filtered probability space. However, for the sake of a bit more generality, I don't assume that filtrations are right-continuous.
Remark 2 (by George Lowther). Many of the results can be extended to the non-complete case without much difficulty.
One reason for using filtrations is to define adapted processes. A stochastic process {X_t}_{t≥0} is said to be adapted if X_t is an ℱ_t-measurable random variable for each time t ≥ 0. This is just saying that the value X_t is observable by time t. Conversely, the filtration generated by any process X is the smallest filtration with respect to which it is adapted. This is given by ℱ^X_t = σ(X_s : s ≤ t), and referred to as the natural filtration of X.
As mentioned in the previous subsection, it is often necessary to impose measurability constraints on a process X considered as a map R_+ × Ω → R. Right-continuous and left-continuous processes are automatically jointly measurable. When considering more general processes, it is useful to combine the measurability concept with adaptedness.
Definition 4. A process X is progressively measurable, or just progressive, if for each t ≥ 0, the map

[0, t] × Ω → R, (s, ω) ↦ X_s(ω)

is ℬ([0, t]) ⊗ ℱ_t-measurable.
In verifying that certain processes are progressive, we will find the following result of use.
Proposition 3. If a process {X_t}_{t≥0} on (Ω, ℱ, ℱ_t, P) is right-continuous and adapted, then it is progressively measurable.
Proof. Fix t ≥ 0 and consider the sequence of processes defined by

X_n(s, ω) := 1_{{0}}(s) X(0, ω) + Σ_{k=1}^{n} 1_{((k−1)t/n, kt/n]}(s) X(kt/n, ω).

Since X is adapted, each X_n is a measurable function from ([0, t] × Ω, ℬ([0, t]) ⊗ ℱ_t) to (E, ℬ_E). Note that if s ∈ {0, t}, then X_n(s, ω) = X(s, ω), and if s ∈ (0, t) is not a grid point, then

d(X_n(s, ω), X(s, ω)) = d( X( t(⌊sn/t⌋ + 1)/n, ω ), X(s, ω) ),

where t(⌊sn/t⌋ + 1)/n ∈ (s, s + t/n]. In every case X_n(s, ω) = X(u_n, ω) for some grid point u_n ∈ [s, s + t/n], so since X is right-continuous, for any (s, ω) in [0, t] × Ω we have X_n(s, ω) → X(s, ω) as n → ∞. That is, X restricted to [0, t], viewed as a mapping from ([0, t] × Ω, ℬ([0, t]) ⊗ ℱ_t) to (E, ℬ_E), is the pointwise limit of measurable functions, and so, by Proposition 1, it is measurable. Since t was arbitrary, X is progressively measurable.
2.3 Stopping times and their associated sigma-algebras
Definition 5. A stopping time for the filtration {ℱ_t} is a random variable τ : Ω → [0, +∞] that satisfies, for every t ≥ 0,

{ω : τ(ω) ≤ t} ∈ ℱ_t.

This definition is equivalent to stating that the process X_t(ω) := 1_{[0,τ(ω))}(t) is adapted. Equivalently, at any time t, the event {τ ≤ t} that the stopping time has already occurred is observable.
One common way in which stopping times appear is as the first time at which an adapted stochastic process hits some value. The Debut Theorem states that this does indeed give a stopping time.
Theorem 3 (Debut Theorem, a proof can be found here). Let X be an adapted right-continuous stochastic process defined on a complete filtered probability space. If K is any real number then τ : Ω → R_+ ∪ {∞} defined by

τ(ω) = inf {t ∈ R_+ : X_t(ω) ≥ K} if {t ∈ R_+ : X_t(ω) ≥ K} ≠ ∅, and τ(ω) = +∞ otherwise,

is a stopping time.
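On a finite time grid the debut of the set [K, ∞) reduces to a first-passage scan. The sketch below, with a made-up sample path, only illustrates the definition of τ, not the measure-theoretic content of the theorem.

```python
# First-entry time of a discretized path into [K, infinity): a minimal sketch
# of the debut tau = inf{t : X_t >= K} on a finite time grid.
def debut(path, times, K):
    """Return the first grid time t with path value >= K, or float('inf')."""
    for t, x in zip(times, path):
        if x >= K:
            return t
    return float("inf")

times = [0.0, 0.5, 1.0, 1.5, 2.0]
path = [0.0, 0.3, 1.2, 0.8, 2.1]          # a hypothetical sample path
assert debut(path, times, K=1.0) == 1.0   # first time the path reaches 1.0
assert debut(path, times, K=5.0) == float("inf")  # never reached: tau = +infinity
```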
The class of stopping times is closed under basic operations such as taking the maximum or minimum of two times or, for right-continuous filtrations, taking the limit of a sequence of times. We prove the first statement; the proof of the second can be found here.
Proposition 4. If σ, τ are stopping times, then so are σ ∨ τ and σ ∧ τ.

Proof. Proposition 1 implies that all of the functions formed by the operations in the premise are measurable. Next,

{ω : (τ ∨ σ)(ω) ≤ t} = {ω : τ(ω) ≤ t} ∩ {ω : σ(ω) ≤ t},
{ω : (τ ∧ σ)(ω) ≤ t} = {ω : τ(ω) ≤ t} ∪ {ω : σ(ω) ≤ t}.

Since both sets on the RHS belong to ℱ_t and ℱ_t is closed under finite intersections and unions, the LHS belongs to ℱ_t, and so both τ ∨ σ and τ ∧ σ are stopping times.
We say that a stopping time τ is bounded if there exists a constant C such that τ(ω) ≤ C for almost all ω. We say that a stopping time takes finitely many values if, for all ω, τ(ω) belongs to some subset of [0, +∞] of finite cardinality. Since stopping times that take finitely many values are particularly easy to analyse, it is handy to have a result that allows us to approximate more general stopping times by finitely-many-valued ones.
Proposition 5. Suppose that f : [0, +∞] → [0, +∞] is a function such that f(t) ≥ t, f(R) ⊆ R, f^{−1}(R) ⊆ R, and its restriction to R is a measurable mapping from (R, ℬ_R) to itself. Also, suppose that τ is a stopping time. Then f ∘ τ is a stopping time. If τ is bounded, then there exists a sequence {τ_k} of stopping times that tends pointwise to τ from above (that is, if τ(ω) is finite, τ_k(ω) is as well and τ_k(ω) ↓ τ(ω); otherwise τ_k(ω) = +∞) as k tends to infinity. Furthermore, for each fixed k, τ_k ≥ τ, τ_k takes only finitely many values, and τ_k ≤ C + 1 whenever τ ≤ C.
Proof. For the first part we show that the τ-preimage of any Borel subset of [0, t] is contained in ℱ_t. Then we show that the f-preimage of [0, t] is a Borel subset of [0, t]. Putting both results together we have that the (f ∘ τ)-preimage of [0, t] is contained in ℱ_t, that is, that f ∘ τ is a stopping time.

Let 𝒞_t be the sub-collection of Borel subsets B of [0, t] such that τ^{−1}(B) is in ℱ_t. The stopping property of τ and the fact that ℱ_s ⊂ ℱ_t for all s ≤ t imply that [0, s] is in 𝒞_t for every s ≤ t. Since for any sets A and A_1, A_2, …, τ^{−1}(A^c) = (τ^{−1}(A))^c and τ^{−1}(∪_i A_i) = ∪_i τ^{−1}(A_i), and since both ℬ([0, t]) and ℱ_t are sigma-algebras, 𝒞_t is a sigma-algebra as well. Since the Borel sets are generated by intervals we have that

ℬ([0, t]) = σ{[0, s] : 0 ≤ s ≤ t} ⊂ σ{𝒞_t} = 𝒞_t.

Since by definition 𝒞_t ⊂ ℬ([0, t]), the above gives us that 𝒞_t = ℬ([0, t]). Now, by our assumptions on f, f^{−1}([0, t]) is a Borel subset of R. Furthermore, since f(t) ≥ t, f^{−1}([0, t]) is contained in [0, t]. Thus f^{−1}([0, t]) is contained in ℬ([0, t]) or, equivalently, in 𝒞_t. Hence we have the desired (f ∘ τ)^{−1}([0, t]) ∈ ℱ_t, that is, that f ∘ τ is a stopping time.
Now for the last part, introduce τ_k := f_{2^k}(τ), where f_m(t) := (⌊mt⌋ + 1)/m (if τ(ω) = ∞, set τ_k(ω) = +∞). Clearly f_m(t) ≥ t (thus τ_k ≥ τ) and, for all ω, τ_k(ω) ∈ {1/2^k, 2/2^k, …, f_{2^k}(C), +∞} (that is, τ_k takes finitely many values, and τ_k ≤ C + 2^{−k} ≤ C + 1 whenever τ ≤ C). Pick any fixed ω. If τ(ω) = +∞, then τ_k(ω) = τ(ω) for all k. Otherwise, the inequality ⌊2mt⌋ ≤ 2⌊mt⌋ + 1 shows that the sequence (τ_k(ω)) is non-increasing; it is bounded from below by τ(ω) and satisfies τ_k(ω) − τ(ω) ≤ 2^{−k}, so τ_k(ω) ↓ τ(ω). In other words, (τ_k) converges pointwise to τ from above.

Clearly, for any m, f_m(R) ⊆ R and f_m^{−1}(R) ⊆ R. So to conclude that each τ_k is a stopping time, all that remains to be shown is that f_m|_R : (R, ℬ_R) → (R, ℬ_R) is measurable. We have, for every m and every x ≥ 0,

⌊mx⌋ = lim_{n→∞} Σ_{i=1}^{n} i 1_{[i/m, (i+1)/m)}(x).

So by Proposition 1 we have that x ↦ ⌊mx⌋ is the pointwise limit of measurable functions (indicator functions are measurable if and only if the set they indicate is measurable). Hence, applying the same proposition again, we have that f_m is measurable.
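The construction is easy to experiment with numerically. The sketch below uses the grid map f_k(t) = (⌊kt⌋ + 1)/k from the proof, applied along dyadic values of k to a hypothetical stopping-time value τ(ω) = 0.7306.

```python
import math

# Grid approximation of a stopping time as in Proposition 5:
# f_k(t) = (floor(k*t) + 1)/k rounds t strictly up to the grid {1/k, 2/k, ...},
# and along dyadic k = 1, 2, 4, ... the approximations decrease to t.
def f_k(t, k):
    if t == math.inf:
        return math.inf          # infinite times stay infinite
    return (math.floor(k * t) + 1) / k

tau = 0.7306                                    # a hypothetical stopping-time value
seq = [f_k(tau, 2 ** m) for m in range(11)]     # k = 1, 2, 4, ..., 1024

assert all(x >= tau for x in seq)                   # tau_k >= tau for every k
assert all(a >= b for a, b in zip(seq, seq[1:]))    # non-increasing along dyadic k
assert seq[-1] - tau <= 1 / 1024                    # tau_k converges to tau from above
assert f_k(math.inf, 8) == math.inf
```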
An ℱ_t-adapted stochastic process X can be sampled at stopping times. However, X_τ(·) := X(τ(·), ·) is merely a random variable and not a stochastic process. It is natural to extend the notion of adapted processes to random times and ask the following. What is the sigma-algebra of observable events at the random time τ, and is X_τ measurable with respect to it? The idea is that if a set A is observable at time τ then, for any time t, its restriction to the set {τ ≤ t} should be in ℱ_t. As always, we work with respect to a filtered probability space (Ω, ℱ, {ℱ_t}_{t≥0}, P). The sigma-algebra at the stopping time τ is then

ℱ_τ = {A ∈ ℱ_∞ : A ∩ {τ ≤ t} ∈ ℱ_t for all t ≥ 0}.

The restriction to sets in ℱ_∞ is to take account of the possibility that the stopping time can be infinite, and it ensures that A = A ∩ {τ ≤ ∞} ∈ ℱ_∞. From this definition, a random variable U is ℱ_τ-measurable if and only if 1_{τ≤t} U is ℱ_t-measurable for all times t ∈ R_+ ∪ {∞}.
With these definitions, the question of whether or not a process X is ℱ_τ-measurable at a stopping time τ can be answered. There is one minor issue here though: stopping times can be infinite, whereas stochastic processes in these notes are defined on the time index set R_+. We could just restrict to the set {τ < ∞}, but it is handy to allow the processes to take values at infinity. So, for the moment, we consider a process X_t where the time index t runs over R̄_+ := R_+ ∪ {∞}, and say that X is a predictable, optional or progressive process if it satisfies the respective property restricted to times in R_+ and X_∞ is ℱ_∞-measurable.
Lemma 4. If X is a progressively measurable stochastic process and τ is a stopping time, then X_τ is ℱ_τ-measurable.
Proof. First we show that 1_{τ<∞} X_τ is ℱ_τ-measurable. From the definition we require that 1_{τ<∞} 1_{τ≤t} X_τ be ℱ_t-measurable for all t ∈ [0, +∞]. The case t = +∞ follows from the finite cases, since 1_{τ<∞} X_τ = lim_{n→∞} 1_{τ≤n} X_τ is then a pointwise limit of ℱ_∞-measurable functions. Now suppose that t < +∞ and pick any B ∈ ℬ_R. We need to show that

D := {ω : X(τ(ω), ω) ∈ B} ∩ {ω : τ(ω) ≤ t} ∈ ℱ_t.

Since X is progressively measurable,

A := {(s, ω) ∈ [0, t] × Ω : X(s, ω) ∈ B} ∈ ℬ([0, t]) ⊗ ℱ_t.

Suppose first that A is of the form A = A_1 × A_2 where A_1 ∈ ℬ([0, t]), A_2 ∈ ℱ_t. Then

C_A := {ω : (τ(ω), ω) ∈ A} = {ω : τ(ω) ∈ A_1} ∩ A_2.

Since τ is a stopping time, {ω : τ(ω) ≤ s} ∈ ℱ_t for all s ∈ [0, t]. Since the intervals [0, s], 0 ≤ s ≤ t, generate ℬ([0, t]), it follows that {ω : τ(ω) ∈ A_1} ∈ ℱ_t, so C_A is the intersection of two sets in ℱ_t and thus C_A is in ℱ_t as well. Furthermore, since the sets of the form A = A_1 × A_2 with A_1 ∈ ℬ([0, t]), A_2 ∈ ℱ_t generate ℬ([0, t]) ⊗ ℱ_t, it is straightforward to extend the above argument to show that C_A ∈ ℱ_t for any A ∈ ℬ([0, t]) ⊗ ℱ_t (one needs to use the fact that the "generated sigma-algebra" operation commutes with taking preimages, see here or here). Comparing the definitions of D, A and C_A, it follows that D = C_A for this choice of A (note that (τ(ω), ω) ∈ A forces τ(ω) ≤ t). Since t was arbitrary, we have that 1_{τ<+∞} X_τ is ℱ_τ-measurable.
It is easy to see that 1_{τ=+∞} X_τ is ℱ_τ-measurable: for t < ∞, 1_{τ=+∞} 1_{τ≤t} X_τ is the zero function, hence ℱ_t-measurable. If t = ∞, then 1_{τ=+∞} 1_{τ≤t} X_τ = 1_{τ=+∞} X_∞, which is ℱ_∞-measurable since, by definition, X_∞ is ℱ_∞-measurable and {τ = +∞} ∈ ℱ_∞. Adding the two pieces, X_τ = 1_{τ<∞} X_τ + 1_{τ=+∞} X_τ is ℱ_τ-measurable.
2.4 Stopped processes
As well as simply observing the value at this time, as the name suggests, stopping times are often used to stop the process. A process X stopped at the random time τ is denoted by X^τ:

X^τ(t, ω) := X(t ∧ τ(ω), ω).
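In discrete time the stopped process is just a path frozen from the stopping index onwards, which the following sketch (with a made-up integer path and stopping index) illustrates.

```python
# Discrete-time stopped process: (X^tau)_i = X_{min(i, tau)}.
def stopped(path, tau_index):
    """Freeze the sequence `path` from index tau_index onwards."""
    return [path[min(i, tau_index)] for i in range(len(path))]

X = [1, 3, 2, 5, 4]                        # a hypothetical sample path
assert stopped(X, 2) == [1, 3, 2, 2, 2]    # frozen at the value X_2
assert stopped(X, 10) == X                 # stopping beyond the horizon changes nothing
```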
It is important that stopping an adapted process at a stopping time preserves the basic measurability properties. To show this, we first need the following technical lemma.
Lemma 5. If X is jointly measurable and τ : Ω → R_+ is a measurable map (such a random variable is called a random time), then X_τ is measurable.

Proof. Let ℋ denote the set of all jointly measurable processes X : [0, ∞) × Ω → R such that X_τ : Ω → R is measurable. It follows from Proposition 1 that ℋ is closed under the addition and scalar multiplication operations and under pointwise limits. Pick any B ∈ ℬ([0, ∞)) and F ∈ ℱ. Since τ is measurable,

1_{(τ(ω),ω) ∈ B×F} = 1_{ω ∈ τ^{−1}(B) ∩ F}

is measurable as well. Thus the process (t, ω) ↦ 1_{(t,ω) ∈ B×F} belongs to ℋ. Applying the Functional Monotone Class Theorem (Theorem 1) to the π-system of measurable rectangles B × F, we have that ℋ contains all bounded jointly measurable processes. For any jointly measurable process X, it follows from Proposition 1 that Y_n(t, ω) := (−n) ∨ (n ∧ X(t, ω)) defines a sequence of bounded jointly measurable processes that tends pointwise to X as n → ∞. Since ℋ is closed under pointwise limits, X ∈ ℋ, completing the claim.
Lemma 6. Let τ be a stopping time. If the stochastic process X satisfies any of the following properties, then so does the stopped process X^τ:
• Predictable.
• Optional.
• Progressively measurable.

Proof. Lemma 5 states that if X is jointly measurable and τ is any random time, then X_τ is measurable (see here). It follows from the decomposition

X^τ_t = 1_{t≤τ} X_t + 1_{t>τ} X_τ

that X^τ is also jointly measurable. Now suppose that X is progressive and T ≥ 0 is any fixed time. By definition, the stopped process X^T is ℬ(R_+) ⊗ ℱ_T-measurable and, if τ is a stopping time, then τ ∧ T is ℱ_T-measurable. Then, by what we have just shown above, the stopped process

(X^τ)^T = X^{τ∧T} = (X^T)^{τ∧T}

is ℬ(R_+) ⊗ ℱ_T-measurable. This shows that X^τ is progressive.
# 3 Continuous time martingales

All processes discussed in this section, with the exception of martingales, take values in (E, ℬ_E), where E is a complete separable metric space and ℬ_E is the collection of Borel subsets of E. We reserve the symbol d : E × E → [0, ∞) for the metric on E. The martingales discussed here all take values in (R, ℬ_R) (since we need a total order on the space for the definition of a martingale to make sense).
3.1 Martingales and stopping times
Definition 6. Let (Ω, ℱ) be a measurable space. A family {ℱ_t ⊂ ℱ : t ≥ 0} of sub-sigma-algebras of ℱ is called a filtration if ℱ_s ⊂ ℱ_t for any s ≤ t.
With a stochastic process {X_t}_{t≥0} we associate its "natural filtration" ℱ^X_t := σ{X_s : s ≤ t}. Note we sometimes write X(t, ω) instead of X_t(ω).
Definition 7. Given a probability space (Ω, ℱ, P) and a filtration ℱ_t ⊂ ℱ indexed by t ≥ 0, a stochastic process {M_t}_{t≥0} is called a martingale relative to (Ω, ℱ_t, P) if the following hold:
• For almost all ω, M_t(ω) has left and right limits at every t and is continuous from the right, that is, M(t + 0, ω) := lim_{h↓0} M(t + h, ω) = M(t, ω).
• For every fixed t ≥ 0, M(t) is ℱ_t-measurable and integrable.
• For 0 ≤ s ≤ t,

E[M_t|ℱ_s] = M(s) a.e.

If instead we have that E[M_t|ℱ_s] ≥ M(s) (or E[M_t|ℱ_s] ≤ M(s)) we say that M_t is a submartingale (respectively, supermartingale)².
Remark 3. People often do not require the first bullet point in the definition and call martingales that satisfy it càdlàg martingales (that is, "continue à droite, limite à gauche"). Our assumption that a martingale is càdlàg from the get-go is justified by the fact that, under some mild conditions, every martingale has a càdlàg modification; see Theorem 3 here.
Proposition 6. Suppose that φ : R → R is a convex function, M_t is a martingale and φ(M_t) is integrable. Then φ(M_t) is a submartingale. In particular, |M_t| is a submartingale.
Proof. Because φ is convex it is also continuous, thus φ(M_t) is also càdlàg. For any 0 ≤ s ≤ t, by Jensen's inequality we have that E[φ(M_t)|ℱ_s] ≥ φ(E[M_t|ℱ_s]) = φ(M_s).
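For a simple symmetric random walk the one-step submartingale property of |M| can be checked exactly: E[|S_{n+1}| given S_n = x] = (|x+1| + |x−1|)/2 ≥ |x|, which is Jensen's inequality for the convex map x ↦ |x|. The range of integer values scanned below is an arbitrary choice.

```python
# Exact one-step check that |S| is a submartingale when S is a simple
# symmetric random walk: conditionally on S_n = x, the next value is
# x + 1 or x - 1 with probability 1/2 each.
for x in range(-50, 51):
    cond_exp = (abs(x + 1) + abs(x - 1)) / 2   # E[|S_{n+1}| | S_n = x]
    assert cond_exp >= abs(x)                  # Jensen for phi(x) = |x|
print("one-step submartingale inequality verified")
```

Equality holds except at x = 0, where the conditional expectation jumps to 1 while |x| = 0.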
3.2 Doob's Optional Stopping Theorem
Theorem 4 (Optional Stopping Theorem). If τ_1, τ_2 are two bounded stopping times such that τ_1 ≤ τ_2 and M_t is a martingale, then

E[M_{τ_2}|ℱ_{τ_1}] = M(τ_1)

holds almost surely.
To prove the above, we need the following lemma.
Lemma 7. Let X be an integrable random variable on a probability space (Ω, ℱ, P) and let X_Σ := E[X|Σ], where Σ ⊂ ℱ is a sigma-algebra. The collection {X_Σ}, as Σ varies over all sub-sigma-algebras of ℱ, is uniformly integrable.

²Easy way to remember which one is which: just remember they are named the opposite way any reasonable human being would have named them.
Proof. We have

P[|X_Σ| ≥ l] = E[1_{|X_Σ|≥l}] = E[l 1_{|X_Σ|≥l}]/l ≤ E[|X_Σ| 1_{|X_Σ|≥l}]/l ≤ E[|X_Σ|]/l ≤ E[E[|X| | Σ]]/l = E[|X|]/l,   (1)

where the last inequality is Jensen's inequality for conditional expectations (|X_Σ| = |E[X|Σ]| ≤ E[|X| | Σ]) and the last equality is the tower property of conditional expectation. By the definition of X_Σ, {ω : |X_Σ(ω)| ≥ l} ∈ Σ. So

∫_{|X_Σ|≥l} |X_Σ| dP ≤ ∫_{|X_Σ|≥l} E[|X| | Σ] dP = ∫_{|X_Σ|≥l} |X| dP,

where we have used Jensen's inequality to obtain the inequality and the definition of E[|X| | Σ] to obtain the equality. Next, by the Dominated Convergence Theorem, E[|X| 1_{|X|≥m}] → 0 as m → ∞, so given ε > 0 we can pick m with E[|X| 1_{|X|≥m}] ≤ ε/2; then for any event A,

∫_A |X| dP ≤ E[|X| 1_{|X|≥m}] + m P(A) ≤ ε/2 + m P(A).

Combining this with the bound P[|X_Σ| ≥ n] ≤ E[|X|]/n from (1), which holds uniformly over Σ, we can find an N ∈ [0, ∞) such that

∫_{|X_Σ|≥n} |X_Σ| dP ≤ ∫_{|X_Σ|≥n} |X| dP ≤ ε

for all n ≥ N and all sub-sigma-algebras Σ of ℱ. This is precisely uniform integrability.
Proof of Theorem 4. We do this in three steps. First we prove that if τ ≤ C is a bounded stopping time that takes finitely many values, then M_τ is ℱ_τ-measurable (and thus ℱ-measurable), integrable, and

E[M_C|ℱ_τ] = M_τ   (2)

holds. Next we repeat the above for the case where τ ≤ C is just a bounded stopping time. Lastly, we show that if τ_1 and τ_2 are as in the premise, the equation displayed in the conclusion holds.
Step 1: Suppose that τ is a stopping time bounded by C and that τ(ω) is in some finite set {t_1 < t_2 < ⋯ < t_l} for all ω. For all i = 1, 2, …, l let E_i := {ω : τ(ω) = t_i}; these sets are disjoint, partition Ω, and E_i = {τ ≤ t_i} − {τ ≤ t_{i−1}} ∈ ℱ_{t_i}. First note that

M_τ = 1_{E_1} M_{t_1} + 1_{E_2} M_{t_2} + ⋯ + 1_{E_l} M_{t_l}.

So by Proposition 1, M_τ is ℱ_{t_l}-measurable (hence ℱ-measurable) and, by the triangle inequality in L¹(P), it is integrable. Next, pick any Borel set B. We already know that {ω : M(τ(ω), ω) ∈ B} lies in ℱ, but we need to verify that it also lies in ℱ_τ. That is, for all i,

{ω : M(τ(ω), ω) ∈ B} ∩ {ω : τ(ω) ≤ t_i} ∈ ℱ_{t_i}.

But

{ω : M(τ(ω), ω) ∈ B} ∩ {ω : τ(ω) ≤ t_i} = ∪_{j=1}^{l} (E_j ∩ M_{t_j}^{−1}(B)) ∩ {τ ≤ t_i} = ∪_{j=1}^{i} E_j ∩ M_{t_j}^{−1}(B).

Since each M_{t_j} is ℱ_{t_j}-measurable, each E_j belongs to ℱ_{t_j} ⊂ ℱ_{t_i} for j ≤ i, and ℱ_{t_i} is closed under finite intersections and unions, the above lies in ℱ_{t_i}. In other words, M_τ is ℱ_τ-measurable. Next pick any A ∈ ℱ_τ. Since τ is a stopping time bounded by C, we have that A ∩ E_i = (A ∩ {τ ≤ t_i}) − (A ∩ {τ ≤ t_{i−1}}) ∈ ℱ_{t_i} and

∫_{A∩E_i} M_τ dP = ∫_{A∩E_i} M_{t_i} dP = ∫_{A∩E_i} E[M_C|ℱ_{t_i}] dP = ∫ 1_{A∩E_i} E[M_C|ℱ_{t_i}] dP = ∫ E[1_{A∩E_i} M_C|ℱ_{t_i}] dP = ∫ 1_{A∩E_i} M_C dP = ∫_{A∩E_i} M_C dP.

Summing over i then gives us that

∫_A M_τ dP = ∫_A M_C dP.

Since A ∈ ℱ_τ was arbitrary, the above and uniqueness of conditional expectation imply that E[M_C|ℱ_τ] = M_τ (almost surely, of course), establishing (2).
Step 2: Suppose that τ ≤ C is a bounded stopping time.
Suppose that we prove that, for any stopping time τ bounded by C, M_τ is ℱ-measurable, integrable, and that

E[M_C|ℱ_τ] = M_τ.   (3)

Then we would have that

E[M_{τ_2}|ℱ_{τ_1}] = E[E[M_C|ℱ_{τ_2}]|ℱ_{τ_1}] = E[M_C|ℱ_{τ_1}] = M_{τ_1},

where the second equality follows from the tower property together with the fact that τ_1 ≤ τ_2 implies ℱ_{τ_1} ⊂ ℱ_{τ_2}. To prove that M_τ is ℱ-measurable, integrable, and that (3) holds, it is sufficient to establish these properties along some sequence of stopping times (τ_k), each taking finitely many values, such that τ_k ≤ C almost surely and τ_k ↓ τ (that is, (τ_k) converges pointwise from above to τ). Indeed, since M is càdlàg, for almost all ω, τ_k(ω) ↓ τ(ω) implies that M(τ_k(ω), ω) → M(τ(ω), ω); that is, M_{τ_k} tends pointwise to M_τ, so M_τ is measurable. Moreover, since τ ≤ τ_k, if A ∈ ℱ_τ it is also true that A ∈ ℱ_{τ_k}, and thus, assuming (3) holds for each τ_k,

∫_A M_C dP = ∫_A E[M_C|ℱ_{τ_k}] dP = ∫_A M_{τ_k} dP.

Furthermore, by Jensen's inequality we have that, for any k, E[|M_{τ_k}|] = E[|E[M_C|ℱ_{τ_k}]|] ≤ E[E[|M_C| | ℱ_{τ_k}]] = E[|M_C|] < ∞ (the last quantity is finite by the definition of a martingale), and, by Lemma 7, the collection (M_{τ_k})_k = (E[M_C|ℱ_{τ_k}])_k is uniformly integrable. Hence, applying Vitali's Convergence Theorem (Theorem 2), we have that M_τ is integrable and that

∫_A M_τ dP = lim_{k→∞} ∫_A M_{τ_k} dP = ∫_A M_C dP.

Since the above holds for any A ∈ ℱ_τ, uniqueness of conditional expectation establishes (3).
Proposition 5 gives us an approximating sequence as required above: it provides stopping times taking finitely many values that decrease pointwise to τ and, replacing each of them by its minimum with C (still a stopping time by Proposition 4, since the constant C is trivially a stopping time), we obtain a sequence (τ_k) with τ ≤ τ_k ≤ C, each τ_k taking finitely many values and τ_k ↓ τ. For each such τ_k, Step 1 applies and yields E[M_C|ℱ_{τ_k}] = M_{τ_k} (almost surely, of course), which is exactly what was used above. This completes the proof.
Corollary 2. If τ is any stopping time and M_t is a martingale with respect to (Ω, ℱ_t, P), then so is M_{τ∧t}.

Proof. Proposition 4 establishes that, for every fixed t, τ ∧ t is a stopping time (a constant time is trivially a stopping time). Next, Lemma 3 in here establishes that M_{τ∧t} is càdlàg and that, for every fixed t, it is ℱ_t-measurable. The martingale equality E[M_{τ∧t}|ℱ_s] = M_{τ∧s} for s ≤ t then follows from the Optional Stopping Theorem (Theorem 4) applied to the bounded stopping times τ ∧ s ≤ τ ∧ t.
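Optional stopping can be checked exactly for a simple symmetric random walk by enumerating all equally likely paths up to a horizon; the stopping rule below (first exit from (−2, 2), capped at N = 8 so that the stopping time is bounded) is an illustrative choice.

```python
from itertools import product

# Enumeration check of optional stopping for a simple symmetric random walk
# M_n: with tau = min(first time |M| hits 2, horizon N), E[M_tau] = E[M_0] = 0.
N = 8
total = 0
for steps in product([-1, 1], repeat=N):     # all 2^N equally likely paths
    m, stopped_value = 0, None
    for s in steps:
        m += s
        if abs(m) == 2:                      # stopping rule: first hit of +-2
            stopped_value = m
            break
    total += stopped_value if stopped_value is not None else m

assert total == 0                            # sum over paths of M_tau is exactly 0
print("E[M_tau] =", total / 2 ** N)
```

The cancellation is exact by the path-flipping symmetry steps ↦ −steps, which is precisely why the theorem requires no approximation here.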
References
[1] R. Durrett. Probability: Theory and Examples. Cambridge University Press, 3rd edition, 2010.
[2] A. N. Shiryaev. Probability. Graduate Texts in Mathematics. Springer, 2nd edition, 1996.
[3] Ramon van Handel. Stochastic Calculus, Filtering, and Stochastic Control. 2007.