
1 Preliminaries

This section collects a few useful facts.
Lemma 1 (Jensen's Inequality, Proposition 1.4.2 in [3]). Suppose that X is an integrable R-valued random variable on (Ω, F, µ) and φ is a convex function. Then φ(E[X]) ≤ E[φ(X)].
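As a quick numeric sanity check (a toy finite distribution of my own choosing, with the convex function φ(x) = x²), the inequality can be verified directly:

```python
values = [-1.0, 0.5, 2.0, 3.0]               # a toy finite distribution
probs = [0.1, 0.4, 0.3, 0.2]

def expect(f):
    """E[f(X)] under the distribution above."""
    return sum(p * f(v) for v, p in zip(values, probs))

phi = lambda x: x ** 2                        # a convex function
lhs = phi(expect(lambda x: x))                # phi(E[X])
rhs = expect(phi)                             # E[phi(X)]
print(lhs <= rhs)  # -> True
```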
Proposition 1 (Operations that preserve measurability, see here). Suppose that (f_n), f and g map from some measurable space to R and that α ∈ R. Then
f + g, αf, fg, f ∨ g, f ∧ g, limsup_{n→∞} f_n, liminf_{n→∞} f_n
are all measurable. So if (f_n) converges pointwise to f, we have that f is measurable, since
f = limsup_{n→∞} f_n = liminf_{n→∞} f_n.
Theorem 1 (Functional Monotone Class Theorem, Theorem 1.5, p. 277 of [1]). Let A be a π-system (that is, a collection of subsets closed under finite intersections) that contains Ω, and let H be a collection of real-valued functions on Ω that satisfies:
• If A ∈ A, then 1_A ∈ H.
• If f, g ∈ H, then f + g ∈ H and cf ∈ H for any real number c.
• If (f_n) ⊂ H is a sequence of non-negative, increasing functions that tend pointwise to a bounded function f, then f ∈ H.
Then H contains all bounded functions that are measurable with respect to σ(A).
1.1 Dynkin's π-λ Theorem
Definition 1 (π- and λ-systems). Let Ω be a set. A π-system on Ω is a collection of subsets of Ω that is closed under finite intersections. A λ-system D is a collection of subsets of Ω such that Ω ∈ D, such that A, B ∈ D with A ⊂ B implies B − A ∈ D, and such that for any increasing sequence of sets A_1 ⊂ A_2 ⊂ … contained in D, the union of the sets is also contained in D.
Lemma 2 (Dynkin's π-λ Theorem, Lemma 1.7.2 in [3]). Let I be a π-system on some set Ω and let D := λ(I) be the smallest λ-system on Ω such that I ⊂ D. Then D = σ(I).
Corollary 1 (Lemma 1.7.3 in [3]). Let (Ω, F) be a measurable space, and let I be a π-system such that F = σ(I). If two probability measures P and Q agree on I, that is, if P(A) = Q(A) for all A ∈ I, then P and Q are equal (that is, P(A) = Q(A) for all A ∈ F).
1.2 Convergence Theorems
Definition 2. A collection of random variables {X_α}_α on a probability space (Ω, F, P) is said to be uniformly integrable if
sup_α ∫_{{|X_α|≥c}} |X_α| dP → 0 as c → ∞.
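A standard counterexample (not from the text) illustrates the definition: the family X_n = n·1_{[0,1/n]} on [0, 1] with Lebesgue measure has E[|X_n|] = 1 for all n, yet is not uniformly integrable, since the tail supremum never decays:

```python
def tail_integral(n, c):
    """∫_{|X_n| >= c} |X_n| dP for X_n = n·1_[0,1/n] on [0,1] with Lebesgue
    measure: if n >= c the whole mass n·(1/n) = 1 lies in {|X_n| >= c};
    otherwise that set is null."""
    return 1.0 if n >= c else 0.0

for c in (10, 100, 1000):
    sup_tail = max(tail_integral(n, c) for n in range(1, 10 * c))
    print(c, sup_tail)   # the supremum over n stays at 1.0 for every cutoff c
```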
Theorem 2 (Vitali's Convergence Theorem, Theorem 4 on page 188 of [2]). Let (X_n) be a sequence of R-valued integrable random variables. Suppose that (X_n) is uniformly integrable and that X_n → X almost surely. Then X is integrable and E[|X_n − X|] → 0 as n → ∞ (which, by Jensen's inequality, implies that E[X_n] → E[X] as n → ∞).
1.3 Conditional expectation
Definition 3. Let (Ω, F, µ) be a probability space, let X be an R^n-valued integrable (that is, E[|X_i|] < ∞ for all i) random variable, and let G ⊂ F be any sigma-algebra. Then E[X|G] is the unique¹ G-measurable random variable that satisfies the relation E[1_A X] = E[1_A E[X|G]] for all events A ∈ G.
Conditional expectations have the following handy properties:
Proposition 2 (Theorem 2.3.2 in [3]). Let X, Y be real-valued integrable functions on (Ω, F, µ), let G, H ⊂ F be sigma-algebras with H ⊂ G, and let α, β ∈ R. Then
1. Linearity: E[αX + βY | G] = αE[X|G] + βE[Y|G].
2. If X ≥ 0 almost surely, then E[X|G] ≥ 0 almost surely.
3. Tower property: E[E[X|G] | H] = E[X|H] almost surely. Note that this implies that E[X] = E[X | {∅, Ω}] = E[E[X|G] | {∅, Ω}] = E[E[X|G]].
4. If X is G-measurable and XY is integrable, then E[XY|G] = X E[Y|G] almost surely. Thus E[X|G] = X E[1|G] = X almost surely.
5. If H and σ(X, G) are independent, then E[X | σ(G, H)] = E[X|G] almost surely. Thus, if H is independent of σ(X), then E[X|H] = E[X | σ({Ω, ∅}, H)] = E[X | {Ω, ∅}] = E[X] almost surely.
6. The monotone and dominated convergence theorems, Fatou's lemma, and Jensen's inequality all hold if we replace E[X] in them with E[X|G].
¹ Uniqueness is a consequence of the Radon–Nikodym Theorem; see [3].
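On a finite probability space, conditional expectations are just cell averages, which makes the definition and the tower property easy to check by hand. A small sketch (the space, partitions and X below are my own toy choices):

```python
from fractions import Fraction

omega = list(range(8))
P = {w: Fraction(1, 8) for w in omega}           # uniform measure on 8 points
X = {w: Fraction(w * w) for w in omega}          # an integrable random variable

part_G = [[0, 1], [2, 3], [4, 5], [6, 7]]        # partition generating G
part_H = [[0, 1, 2, 3], [4, 5, 6, 7]]            # coarser partition: H ⊂ G

def cond_exp(Y, partition):
    """E[Y | sigma(partition)]: average Y over each cell of the partition."""
    out = {}
    for cell in partition:
        mass = sum(P[w] for w in cell)
        avg = sum(P[w] * Y[w] for w in cell) / mass
        for w in cell:
            out[w] = avg
    return out

XG = cond_exp(X, part_G)                          # E[X|G]
tower = cond_exp(XG, part_H)                      # E[E[X|G] | H]
direct = cond_exp(X, part_H)                      # E[X|H]
print(tower == direct)  # -> True: the tower property
```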
2 Stochastic processes, filtrations and stopping times
2.1 Stochastic processes, indistinguishability and modifications
I start these notes on stochastic calculus with the definition of a continuous-time stochastic process. Very simply, a stochastic process is a collection of random variables {X_t}_{t≥0} defined on a probability space (Ω, F, P). That is, for each time t ≥ 0, ω ↦ X_t(ω) is a measurable function from Ω to the real numbers.
Remark 1 (by George Lowther). Stochastic processes may also take values in any measurable space (E, ℰ) but, in these notes, I concentrate on real-valued processes. I also restrict to the case where the time index t runs through the non-negative real numbers R_+, although everything can easily be generalized to other subsets of the reals.
A stochastic process X := {X_t}_{t≥0} can be viewed in any of the following three ways:
• As a collection of random variables, one for each time t ≥ 0.
• As a collection of paths R_+ → R, t ↦ X_t(ω), one for each ω ∈ Ω. These are referred to as the sample paths of the process.
• As a function from the product space R_+ × Ω → R, (t, ω) ↦ X_t(ω).
As is often the case in probability theory, we are not interested in events which occur with zero probability. The theory of stochastic processes is no different, and two processes X and Y are said to be indistinguishable if there is an event A ⊆ Ω of probability one such that X_t(ω) = Y_t(ω) for all ω ∈ A and all t ≥ 0. This is the same as saying that they almost surely (that is, with probability one) have the same sample paths. Alternative language which is often used is that X and Y are equivalent up to evanescence. In general, when discussing any property of a stochastic process, it is common to only care about what holds up to evanescence. For example, if a process has continuous sample paths with probability one, then it is referred to as a continuous (resp. right-continuous or left-continuous) process, and we don't care if it actually has discontinuous (resp. non-right-continuous or non-left-continuous) paths on some event of zero probability.
It is important to realize that even if we have two processes X and Y such that, for each fixed t, the event X_t = Y_t occurs with probability one, it is not necessarily the case that they are indistinguishable. As an example, consider a random variable T uniformly distributed over the interval [0, 1], and define the process X_t = 1_{{t=T}}. For each time t, P(X_t ≠ 0) = P(T = t) = 0. However, X is not indistinguishable from the zero process, as the sample path t ↦ X_t(ω) always has one point at which X takes the value 1. The problem here is the uncountability of the non-negative real numbers used for the time index. By countable additivity of measures, if X_t = Y_t almost surely for each t, then we can infer that the sample paths of X and Y are almost surely identical on any given countable set of times S ⊂ R_+, but cannot extend this to uncountable sets.
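This example is easy to simulate (the grid and sample count below are arbitrary choices of mine): on any fixed countable set of times, the sampled paths of X_t = 1_{{t=T}} are indistinguishable from zero, yet every path attains the value 1 at its own exceptional time:

```python
import random

random.seed(0)
draws = [random.random() for _ in range(1000)]   # 1000 samples of T ~ Uniform[0,1]
grid = [i / 100 for i in range(101)]             # a fixed countable set of times
grid_set = set(grid)

# On the fixed grid, every sampled path of X_t = 1_{t=T} agrees with the zero process:
zero_on_grid = all(T not in grid_set for T in draws)
# ...yet each path attains the value 1 once we also evaluate at its own time T:
path_sup = [max(1 if s == T else 0 for s in grid + [T]) for T in draws]
print(zero_on_grid, all(v == 1 for v in path_sup))  # -> True True (almost surely)
```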
This necessitates a further definition. A process Y is a version or modification of X if, for each time t ≥ 0, P(X_t = Y_t) = 1. Alternative language sometimes used is that X and Y are stochastically equivalent.
Whenever a stochastic process is defined in terms of its values at each individual time t, or in terms of its joint distributions at finitely many times, then replacing it by any other version will still satisfy the definition. It is therefore important to choose a good version. Right-continuous (or left-continuous) versions are often used when possible, as this then defines the process up to evanescence.
Lemma 3. Let X and Y be right-continuous processes (resp. left-continuous processes) such that X_t = Y_t almost surely at each time t ≥ 0. Then they are indistinguishable.
Proof. We prove the lemma only for right-continuous processes; the proof for left-continuous processes is almost identical. For S ⊂ R_+, let A_S := {ω : X_t(ω) ≠ Y_t(ω) for some t ∈ S}. We want to prove that P(A_{R_+}) = 0. Writing Q_+ for the non-negative rationals, countable subadditivity of measures gives
P(A_{Q_+}) ≤ ∑_{t∈Q_+} P({ω : X_t(ω) ≠ Y_t(ω)}) = 0.
Fix any ω and suppose there exists a t ≥ 0 such that X_t(ω) ≠ Y_t(ω). Then |X_t(ω) − Y_t(ω)| = ε for some ε > 0. By right-continuity of both processes we can find, for almost all such ω, an s ∈ Q_+ with s ≥ t such that |X_t(ω) − X_s(ω)| ≤ ε/3 and |Y_t(ω) − Y_s(ω)| ≤ ε/3. Then
ε = |X_t(ω) − Y_t(ω)| ≤ |X_s(ω) − Y_s(ω)| + |X_t(ω) − X_s(ω)| + |Y_t(ω) − Y_s(ω)| ≤ |X_s(ω) − Y_s(ω)| + 2ε/3.
Thus |X_s(ω) − Y_s(ω)| ≥ ε/3, which implies that ω ∈ A_{Q_+}. In other words,
A_{R_+} ⊂ A_{Q_+} ∪ N,
where N is some null event. So we have that P(A_{R_+}) = 0.
Viewing a stochastic process in the third sense mentioned above, as a function on the product space R_+ × Ω, it is often necessary to impose a measurability condition. The process X is said to be jointly measurable if it is measurable with respect to the product sigma-algebra B(R_+) ⊗ F.
2.2 Filtrations and adapted processes
In the previous subsection I introduced the concept of a stochastic process and its modifications. It is necessary to introduce a further concept to represent the information available at each time. A filtration {F_t}_{t≥0} on a probability space (Ω, F, P) is a collection of sub-sigma-algebras of F satisfying F_s ⊆ F_t whenever s ≤ t. The idea is that F_t represents the set of events observable by time t. The probability space taken together with the filtration, (Ω, F, {F_t}_{t≥0}, P), is called a filtered probability space.
Given a filtration, its right and left limits at any time, and its limit at infinity, are
F_{t+} := ∩_{s>t} F_s,  F_{t−} := σ( ∪_{s<t} F_s ),  F_∞ := σ( ∪_{t≥0} F_t ).
The left limit as defined here only really makes sense at positive times. Throughout these notes, I define the left limit at time zero as F_{0−} := F_0. The filtration is said to be right-continuous if F_t = F_{t+} for all t.
A probability space (Ω, F, P) is complete if F contains all subsets of zero-probability elements of F. Any probability space can be extended to a complete probability space (its completion) in a unique way by enlarging the sigma-algebra to consist of all sets A ⊂ Ω such that B ⊆ A ⊆ C for some B, C ∈ F satisfying P(C \ B) = 0. Similarly, a filtered probability space is said to be complete if the underlying probability space is complete and F_0 contains all zero-probability sets.
Often, in stochastic process theory, filtered probability spaces are assumed to satisfy the usual conditions, meaning that the space is complete and the filtration is right-continuous. Note that any filtered probability space can be completed simply by completing the underlying probability space and then adding all zero-probability sets to each F_t. Furthermore, by replacing F_t with F_{t+}, any filtration can be enlarged to a right-continuous one. By these constructions, any filtered probability space can be enlarged in a minimal way to one satisfying the usual conditions. Throughout these notes I assume a complete filtered probability space. However, for the sake of a bit more generality, I don't assume that filtrations are right-continuous.
Remark 2 (by George Lowther). Many of the results can be extended to the non-complete case
without much difficulty.
One reason for using filtrations is to define adapted processes. A stochastic process {X_t}_{t≥0} is said to be adapted if X_t is an F_t-measurable random variable for each time t ≥ 0. This is just saying that the value X_t is observable by time t. Conversely, the filtration generated by any process X is the smallest filtration with respect to which it is adapted. This is given by F^X_t = σ(X_s : s ≤ t), and is referred to as the natural filtration of X.
As mentioned in the previous subsection, it is often necessary to impose measurability constraints on a process X considered as a map R_+ × Ω → R. Right-continuous and left-continuous processes are automatically jointly measurable. When considering more general processes, it is useful to combine the measurability concept with adaptedness.
Definition 4. A process X is progressively measurable, or just progressive, if for each t ≥ 0 the map
[0, t] × Ω → R, (s, ω) ↦ X_s(ω)
is B([0, t]) ⊗ F_t-measurable.
In verifying that certain processes are progressive, we will find the following result useful.
Proposition 3. If a process {X_t}_{t≥0} on (Ω, F, {F_t}, P) is right-continuous and adapted, then it is progressively measurable.
Proof. Fix t ≥ 0 and consider the sequence of processes defined by
X_n(s, ω) := 1_{{0}}(s) X(0, ω) + ∑_{k=1}^n 1_{((k−1)t/n, kt/n]}(s) X(kt/n, ω).
Since X is adapted, each X_n is a measurable function from ([0, t] × Ω, B([0, t]) ⊗ F_t) to (E, B_E). Note that if s ∈ {0, t}, then X_n(s, ω) = X(s, ω), and if s ∈ (0, t), then
d(X_n(s, ω), X(s, ω)) = d( X( (t⌊sn/t⌋ + t)/n, ω ), X(s, ω) )
whenever sn/t is not an integer, with (t⌊sn/t⌋ + t)/n ∈ (s, s + t/n]. Since X is right-continuous, this implies that for any (s, ω) ∈ [0, t] × Ω, X_n(s, ω) → X(s, ω) as n → ∞. That is, the restriction of X to [0, t], viewed as a mapping from ([0, t] × Ω, B([0, t]) ⊗ F_t) to (E, B_E), is the pointwise limit of measurable functions and so, by Proposition 1, it is measurable. Since t was arbitrary, X is progressively measurable.
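The discretisation used in the proof can be sketched numerically (the step path X and the evaluation points are my own choices): X_n(s) takes the value of X at the right endpoint of the grid cell containing s, and converges to X(s) by right-continuity:

```python
import math

def X(s):
    """A right-continuous (cadlag step) sample path."""
    return math.floor(2 * s)

def X_n(s, n, t=1.0):
    """The proof's approximation: the value of X at the right endpoint of
    the grid cell ((k-1)t/n, kt/n] containing s."""
    if s == 0:
        return X(0.0)
    k = math.ceil(s * n / t)
    return X(k * t / n)

errors = [abs(X_n(0.3, n) - X(0.3)) for n in (1, 10, 100, 1000)]
print(errors)  # -> [2, 0, 0, 0]: the error vanishes as the grid refines
```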
2.3 Stopping times and their associated sigma-algebras
Definition 5. A stopping time for the filtration {F_t} is a random variable τ : Ω → [0, +∞] that satisfies, for every t ≥ 0,
{ω : τ(ω) ≤ t} ∈ F_t.
This definition is equivalent to stating that the process X_t(ω) := 1_{[0,τ(ω)]}(t) is adapted. Equivalently, at any time t, the event {τ ≤ t} that the stopping time has already occurred is observable.
One common way in which stopping times appear is as the first time at which an adapted stochastic process hits some value. The Debut Theorem states that this does indeed give a stopping time.
Theorem 3 (Debut Theorem, a proof can be found here). Let X be an adapted right-continuous stochastic process defined on a complete filtered probability space. If K is any real number, then τ : Ω → R_+ ∪ {∞} defined by
τ(ω) := inf{t ∈ R_+ : X_t(ω) ≥ K} if {t ∈ R_+ : X_t(ω) ≥ K} ≠ ∅, and τ(ω) := +∞ otherwise,
is a stopping time.
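In discrete time the debut of the set {X ≥ K} is a simple first-entry scan of the path; note that deciding whether {τ ≤ t} requires only the values of the path up to time t, which is the stopping time property. A minimal sketch (the path is an arbitrary example):

```python
def hitting_time(path, K):
    """inf{t : path[t] >= K}; None stands in for +infinity when K is never reached."""
    for t, x in enumerate(path):
        if x >= K:
            return t          # deciding {tau <= t} only needed path[0..t]
    return None

path = [0, 1, -1, 2, 5, 3]
print(hitting_time(path, 2), hitting_time(path, 10))  # -> 3 None
```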
The class of stopping times is closed under basic operations such as taking the maximum or minimum of two stopping times or, for right-continuous filtrations, taking the limit of a sequence of stopping times. We prove the first statement; a proof of the second can be found here.
Proposition 4. If σ, τ are stopping times, then so are σ ∨ τ and σ ∧ τ.
Proof. Proposition 1 implies that σ ∨ τ and σ ∧ τ are measurable. Next,
{ω : (τ ∨ σ)(ω) ≤ t} = {ω : τ(ω) ≤ t} ∩ {ω : σ(ω) ≤ t},
{ω : (τ ∧ σ)(ω) ≤ t} = {ω : τ(ω) ≤ t} ∪ {ω : σ(ω) ≤ t}.
Since both sets on the right-hand sides belong to F_t, and F_t is closed under finite intersections and unions, the left-hand sides belong to F_t, and so both τ ∨ σ and τ ∧ σ are stopping times.
We say that a stopping time τ is bounded if there exists a constant C such that τ(ω) ≤ C for almost all ω. We say that a stopping time takes finitely many values if, for all ω, τ(ω) belongs to some subset of [0, +∞] of finite cardinality. Since stopping times that take finitely many values are particularly easy to analyse, it is handy to have a result that allows us to approximate more general stopping times by ones taking finitely many values.
Proposition 5. Suppose that f : [0, +∞] → [0, +∞] is a function such that f(t) ≥ t, f(R) = f^{−1}(R) = R, and whose restriction to R is a measurable mapping from (R, B_R) to itself. Also, suppose that τ is a stopping time. Then f ∘ τ is a stopping time. If τ is bounded, then there exists a sequence (τ_k) of stopping times that tends pointwise from above to τ as k tends to infinity (that is, if τ(ω) is finite then τ_k(ω) is as well and τ_k(ω) ↓ τ(ω); otherwise τ_k(ω) = +∞). Furthermore, for each fixed k, τ_k ≥ τ, τ_k takes only finitely many values and, if τ is bounded above by a constant C, then τ_k is bounded above by C + 1.
Proof. For the first part, we show that the τ-preimage of any Borel subset of [0, t] is contained in F_t. Then we show that the f-preimage of [0, t] is a Borel subset of [0, t]. Putting both results together, we have that the (f ∘ τ)-preimage of [0, t] is contained in F_t, that is, that f ∘ τ is a stopping time.
Let C_t be the sub-collection of Borel subsets B of [0, t] such that τ^{−1}(B) ∈ F_t. The stopping time property of τ and the fact that F_s ⊂ F_t for all s ≤ t imply that [0, s] ∈ C_t for every s ≤ t.
Since, for any sets A and A_1, A_2, …, we have τ^{−1}(A^c) = (τ^{−1}(A))^c and τ^{−1}(∪_i A_i) = ∪_i τ^{−1}(A_i), and since both B([0, t]) and F_t are sigma-algebras, C_t is a sigma-algebra as well. Since the Borel sets are generated by intervals, we have that
B([0, t]) = σ({[0, s] : 0 ≤ s ≤ t}) ⊂ σ(C_t) = C_t.
Since by definition C_t ⊂ B([0, t]), the above gives us that C_t = B([0, t]). Now, by our assumptions on f, f^{−1}([0, t]) is a Borel subset of R. Furthermore, since f(t) ≥ t, f^{−1}([0, t]) is contained in [0, t]. Thus f^{−1}([0, t]) is contained in B([0, t]) or, equivalently, in C_t. Hence we have the desired (f ∘ τ)^{−1}([0, t]) ∈ F_t, that is, f ∘ τ is a stopping time.
Now, for the last part, introduce τ_k := f_k(τ) := (⌊2^k τ⌋ + 1)/2^k (if τ(ω) = ∞, set τ_k(ω) := +∞); the dyadic denominator 2^k is what makes the sequence monotone. Clearly f_k(t) > t (thus τ_k ≥ τ) and, for all ω, τ_k(ω) ∈ {1/2^k, 2/2^k, …, f_k(C), +∞} (that is, τ_k takes finitely many values, bounded by f_k(C) ≤ C + 1 whenever τ ≤ C). Pick any fixed ω. If τ(ω) = +∞, then τ_k(ω) = τ(ω) for all k. Otherwise, since ⌊2^{k+1}τ⌋ ≥ 2⌊2^k τ⌋, the sequence (τ_k(ω)) is decreasing and, since τ(ω) < τ_k(ω) ≤ τ(ω) + 2^{−k}, it converges to τ(ω). In other words, (τ_k) converges pointwise from above to τ.
Clearly, for any k, f_k(R) = f_k^{−1}(R) = R. So, to conclude that τ_k is a stopping time for every k, all that remains to be shown is that the restriction of f_k to R is a measurable mapping from (R, B_R) to itself. For any integer m ≥ 1 and x ≥ 0 we have
⌊mx⌋ = lim_{n→∞} ∑_{i=1}^n i·1_{[i/m, (i+1)/m)}(x).
So, by Proposition 1, x ↦ ⌊mx⌋ is the pointwise limit of measurable functions (indicator functions are measurable if and only if the set they indicate is measurable) and hence measurable. Applying the same proposition again, with m = 2^k, we have that f_k is measurable.
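The dyadic discretisation τ_k = (⌊2^k τ⌋ + 1)/2^k is easy to tabulate: it takes finitely many values on any bounded range, exceeds τ, and decreases to it as k grows. A small sketch:

```python
import math

def tau_k(tau, k):
    """k-th dyadic approximation of tau (with tau_k = +inf when tau = +inf)."""
    if math.isinf(tau):
        return math.inf
    return (math.floor(2**k * tau) + 1) / 2**k

approx = [tau_k(0.3, k) for k in range(1, 11)]
print(approx[:4])  # -> [0.5, 0.5, 0.375, 0.3125]
```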
An adapted stochastic process X can be sampled at stopping times. However, X_τ(·) := X(τ(·), ·) is merely a random variable and not a stochastic process. It is natural to extend the notion of adapted processes to random times and ask the following: what is the sigma-algebra of observable events at the random time τ, and is X_τ measurable with respect to it? The idea is that if a set A is observable at time τ then, for any time t, its restriction to the set {τ ≤ t} should be in F_t. As always, we work with respect to a filtered probability space (Ω, F, {F_t}_{t≥0}, P). The sigma-algebra at the stopping time τ is then
F_τ := {A ∈ F_∞ : A ∩ {τ ≤ t} ∈ F_t for all t ≥ 0}.
The restriction to sets in F_∞ takes account of the possibility that the stopping time may be infinite, and it ensures that A = A ∩ {τ ≤ ∞} ∈ F_∞. From this definition, a random variable U is F_τ-measurable if and only if 1_{{τ≤t}} U is F_t-measurable for all times t ∈ R_+ ∪ {∞}.
With these definitions, the question of whether or not a process X is F_τ-measurable at a stopping time τ can be answered. There is one minor issue here, though: stopping times can be infinite, whereas stochastic processes in these notes are defined on the time index set R_+. We could just restrict to the set {τ < ∞}, but it is handy to allow the processes to take values at infinity. So, for the moment, we consider processes X_t where the time index t runs over R̄_+ := R_+ ∪ {∞}, and say that X is a predictable, optional or progressive process if it satisfies the respective property restricted to times in R_+ and X_∞ is F_∞-measurable.
Lemma 4. If X is a progressively measurable stochastic process and τ is a stopping time, then X_τ is F_τ-measurable.
Proof. First we show that 1_{{τ<∞}} X_τ is F_τ-measurable. From the definition, we require that 1_{{τ<∞}} 1_{{τ≤t}} X_τ be F_t-measurable for all t ∈ [0, +∞]. The case t = +∞ follows from the finite cases, since 1_{{τ<∞}} X_τ = lim_{n→∞} 1_{{τ≤n}} X_τ is then a pointwise limit of F_∞-measurable functions. Now suppose that t < +∞ and pick any B ∈ B_R. We need to show that
D := {ω : X(τ(ω), ω) ∈ B} ∩ {ω : τ(ω) ≤ t} ∈ F_t.
Since X is progressively measurable,
A := {(s, ω) ∈ [0, t] × Ω : X(s, ω) ∈ B} ∈ B([0, t]) ⊗ F_t.
Suppose that A is of the form A = A_1 × A_2, where A_1 ∈ B([0, t]) and A_2 ∈ F_t. Then
C_A := {ω : (τ(ω), ω) ∈ A} = {ω : τ(ω) ∈ A_1} ∩ A_2.
Since τ is a stopping time, {ω : τ(ω) ≤ s} ∈ F_t for all s ∈ [0, t]. Since the intervals [0, s], 0 ≤ s ≤ t, generate B([0, t]), the argument from the proof of Proposition 5 gives τ^{−1}(A_1) ∈ F_t for every A_1 ∈ B([0, t]); hence C_A is the intersection of two sets in F_t and thus C_A ∈ F_t. Furthermore, since the sets of the form A = A_1 × A_2 with A_1 ∈ B([0, t]), A_2 ∈ F_t generate B([0, t]) ⊗ F_t, it is straightforward to extend the above argument to show that C_A ∈ F_t for any A ∈ B([0, t]) ⊗ F_t (one needs the fact that taking preimages commutes with the sigma-algebra operations, see here). Comparing the definitions of D, A and C_A, it follows that D = C_A for the set A defined above. Since t was arbitrary, we have that 1_{{τ<∞}} X_τ is F_τ-measurable.
It remains to see that 1_{{τ=∞}} X_τ is F_τ-measurable. For t < ∞, 1_{{τ=+∞}} 1_{{τ≤t}} X_τ is the zero function, hence F_t-measurable. If t = ∞, then 1_{{τ=+∞}} 1_{{τ≤t}} X_τ = 1_{{τ=∞}} X_∞, which is F_∞-measurable since, by definition, X_∞ is F_∞-measurable and {τ = ∞} ∈ F_∞. Summing the two parts, X_τ = 1_{{τ<∞}} X_τ + 1_{{τ=∞}} X_τ is F_τ-measurable.
2.4 Stopped processes
As well as simply observing the value of a process at a stopping time, as the name suggests, stopping times are often used to stop the process. The process X stopped at the random time τ is denoted by X^τ, where
X^τ(t, ω) := X(t ∧ τ(ω), ω).
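In discrete time the stopped path is obtained by freezing the path at τ; a minimal sketch (path and stopping index are arbitrary examples):

```python
def stop(path, tau):
    """The stopped path: (X^tau)_t = X_{min(t, tau)}."""
    return [path[min(t, tau)] for t in range(len(path))]

path = [0, 2, -1, 4, 1]
print(stop(path, 2))  # -> [0, 2, -1, -1, -1]: frozen at the value at time tau
```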
It is important that stopping an adapted process at a stopping time preserves the basic
measurability properties. To show this, we first need the following technical lemma.
Lemma 5. If X is jointly measurable and τ : Ω → R_+ is a measurable map (such a random variable is called a random time), then X_τ is measurable.
Proof. Let H denote the set of all jointly measurable processes X : [0, ∞) × Ω → R such that X_τ : Ω → R is measurable. It follows from Proposition 1 that H is closed under sums, scalar multiples and pointwise limits. Pick any B ∈ B([0, ∞)) and F ∈ F. Since τ is measurable,
1_{{(τ(ω), ω) ∈ B × F}} = 1_{{ω ∈ τ^{−1}(B) ∩ F}}
is measurable as well; thus the process (t, ω) ↦ 1_{{(t,ω) ∈ B×F}} belongs to H. The sets B × F form a π-system containing [0, ∞) × Ω and generating the product sigma-algebra, so applying the Functional Monotone Class Theorem (Theorem 1) we have that H contains all bounded jointly measurable processes. For any jointly measurable process X, it follows from Proposition 1 that Y_n(t, ω) := (−n) ∨ (n ∧ X(t, ω)) defines a sequence of bounded jointly measurable processes that tends pointwise to X as n → ∞. Since H is closed under pointwise limits, X ∈ H, completing the proof.
Lemma 6. Let τ be a stopping time. If the stochastic process X satisfies any of the following properties, then so does the stopped process X^τ:
• left-continuous and adapted;
• right-continuous and adapted;
• predictable;
• optional;
• progressively measurable.
Proof. Lemma 5 states that if X is jointly measurable and τ is any random time, then X_τ is measurable. It follows from the decomposition
X^τ_t = 1_{{t≤τ}} X_t + 1_{{t>τ}} X_τ
that X^τ is also jointly measurable. Now suppose that X is progressive and T ≥ 0 is any fixed time. By definition, the restriction of X to [0, T] is B([0, T]) ⊗ F_T-measurable and, if τ is a stopping time, then τ ∧ T is F_T-measurable. Then, by what we have just shown above (applied on the space ([0, T] × Ω, B([0, T]) ⊗ F_T)), the stopped process
(X^τ)^T = X^{τ∧T} = (X^T)^{τ∧T}
is B([0, T]) ⊗ F_T-measurable. This shows that X^τ is progressive.
3 Continuous time martingales
All processes discussed in this section, with the exception of martingales, take values in (E, B_E), where E is a complete separable metric space and B_E is the collection of Borel subsets of E. We reserve the symbol d : E × E → [0, ∞) for the metric on E. The martingales discussed here all take values in (R, B_R) (since we need a total order on the space for the definition of a martingale to make sense).
3.1 Martingales and stopping times
Definition 6. Let (Ω, F) be a measurable space. A family {F_t ⊂ F : t ≥ 0} of sub-sigma-algebras of F is called a filtration if F_s ⊂ F_t for any s ≤ t.
With a stochastic process {X_t}_{t≥0} we associate its natural filtration F^X_t := σ(X_s : s ≤ t). Note that we sometimes write X(t, ω) instead of X_t(ω).
Definition 7. Given a probability space (Ω, F, P) and a filtration F_t ⊂ F indexed by t ≥ 0, a stochastic process {M_t}_{t≥0} is called a martingale relative to (Ω, F_t, P) if the following hold:
• For almost all ω, M_t(ω) has left and right limits at every t and is continuous from the right, that is, M(t + 0, ω) := lim_{h↓0} M(t + h, ω) = M(t, ω).
• For every fixed t ≥ 0, M(t) is F_t-measurable and integrable.
• For 0 ≤ s ≤ t,
E[M_t | F_s] = M(s) a.e.
If instead we have that E[M_t | F_s] ≥ M(s) (or E[M_t | F_s] ≤ M(s)), we say that M_t is a submartingale (respectively a supermartingale)².
Remark 3. People often do not require the first bullet point in the definition and call martingales that satisfy it càdlàg martingales (that is, "continue à droite, limite à gauche"). Our assumption that a martingale is càdlàg from the start is justified by the fact that, under some mild conditions, every martingale has a càdlàg modification; see Theorem 3 here.
Proposition 6. Suppose that φ : R → R is a convex function, M_t is a martingale and φ(M_t) is integrable. Then φ(M_t) is a submartingale. In particular, |M_t| is a submartingale.
Proof. Because φ is convex it is also continuous; thus φ(M_t) is also càdlàg. For any 0 ≤ s ≤ t, by the conditional Jensen inequality we have that E[φ(M_t) | F_s] ≥ φ(E[M_t | F_s]) = φ(M_s).
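One numeric consequence that is easy to verify exactly: taking expectations in E[|M_t| | F_s] ≥ |M_s| shows that t ↦ E[|M_t|] is nondecreasing. For the simple symmetric random walk (a toy martingale of my own choosing, enumerated exactly over all ±1 paths):

```python
from itertools import product
from fractions import Fraction

N = 8
means = []                                   # means[t] = E[|M_t|], computed exactly
for t in range(N + 1):
    total = sum(abs(sum(steps)) for steps in product((-1, 1), repeat=t))
    means.append(Fraction(total, 2**t))      # each length-t path has probability 2^-t
print(all(a <= b for a, b in zip(means, means[1:])))  # -> True
```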
3.2 Doob's Optional Stopping Theorem
Theorem 4 (Optional Stopping Theorem). If τ_1, τ_2 are two bounded stopping times such that τ_1 ≤ τ_2 and M_t is a martingale, then
E[M_{τ_2} | F_{τ_1}] = M(τ_1)
holds almost surely.
To prove the above, we need the following lemma.
Lemma 7. Let X be an integrable random variable on a probability space (Ω, F, P) and define X_Σ := E[X | Σ], where Σ ⊂ F is a sigma-algebra. The collection {X_Σ}, as Σ varies over all sub-sigma-algebras of F, is uniformly integrable.
² An easy way to remember which is which: they are named the opposite way any reasonable human being would have named them.
Proof. Applying the conditional Jensen inequality for the second inequality and the tower property of conditional expectation for the last equality, we have that
P[|X_Σ| ≥ l] = E[1_{{|X_Σ|≥l}}] = E[l·1_{{|X_Σ|≥l}}]/l ≤ E[|X_Σ|·1_{{|X_Σ|≥l}}]/l ≤ E[|X_Σ|]/l ≤ E[E[|X| | Σ]]/l = E[|X|]/l.  (1)
By the definition of X_Σ, {ω : |X_Σ(ω)| ≥ l} ∈ Σ. So
∫_{{|X_Σ|≥l}} |X_Σ| dP ≤ ∫_{{|X_Σ|≥l}} E[|X| | Σ] dP = ∫_{{|X_Σ|≥l}} |X| dP,
where we have used Jensen's inequality (|E[X|Σ]| ≤ E[|X| | Σ]) to obtain the inequality and the definition of E[|X| | Σ] to obtain the equality.
Now, since X is integrable, the Dominated Convergence Theorem gives ∫_{{|X|≥m}} |X| dP → 0 as m → ∞. So, given ε > 0, choose m with ∫_{{|X|≥m}} |X| dP ≤ ε/2; then, for any event A,
∫_A |X| dP ≤ m P(A) + ∫_{{|X|≥m}} |X| dP ≤ m P(A) + ε/2.
By (1), P(|X_Σ| ≥ n) ≤ E[|X|]/n uniformly in Σ, so for all n ≥ N := 2m E[|X|]/ε we have m P(|X_Σ| ≥ n) ≤ ε/2 and hence
∫_{{|X_Σ|≥n}} |X_Σ| dP ≤ ∫_{{|X_Σ|≥n}} |X| dP ≤ ε
for all n ≥ N and all sub-sigma-algebras Σ of F. This is precisely uniform integrability.
Proof of Theorem 4. We proceed in three steps. First, we prove that if τ ≤ C is a bounded stopping time that takes finitely many values, then M_τ is F_τ-measurable (and thus F-measurable) and integrable, and
E[M_C | F_τ] = M_τ  (2)
holds. Next, we extend this to the case where τ ≤ C is an arbitrary bounded stopping time. Lastly, we show that if τ_1 and τ_2 are as in the premise, the equation displayed in the conclusion holds.
Step 1: Suppose that τ is a stopping time bounded by C and that τ(ω) lies in some finite set {t_1 < t_2 < … < t_l} for all ω; since τ ≤ C, we may take t_l ≤ C. For i = 1, 2, …, l let E_i := {ω : τ(ω) ≤ t_i}, with E_0 := ∅, and note that E_1 ⊂ E_2 ⊂ … ⊂ E_l = Ω, so that the sets D_i := E_i − E_{i−1} = {τ = t_i} partition Ω. First note that
M_τ = 1_{D_1} M_{t_1} + 1_{D_2} M_{t_2} + … + 1_{D_l} M_{t_l}.
So, by Proposition 1, M_τ is F_{t_l}-measurable (hence F-measurable) and, by the triangle inequality in L¹(P), it is integrable. Next, pick any Borel set B. We already know that {ω : M(τ(ω), ω) ∈ B} lies in F, but we need to verify that it also lies in F_τ, that is, that for all i,
{ω : M(τ(ω), ω) ∈ B} ∩ E_i ∈ F_{t_i}.
But
{ω : M(τ(ω), ω) ∈ B} ∩ {ω : τ(ω) ≤ t_i} = ∪_{j=1}^l ( D_j ∩ M_{t_j}^{−1}(B) ) ∩ E_i = ∪_{j=1}^i D_j ∩ M_{t_j}^{−1}(B).
Since M_t is adapted, τ is an F_t-stopping time, and each F_{t_i} is closed under finite intersections and unions, the above lies in F_{t_i}. In other words, M_τ is F_τ-measurable. Next, pick any A ∈ F_τ. Since τ is a stopping time bounded by C, we have A ∩ D_i = (A ∩ E_i) − (A ∩ E_{i−1}) ∈ F_{t_i} and, using t_i ≤ C,
∫_{A∩D_i} M_τ dP = ∫_{A∩D_i} M_{t_i} dP = ∫_{A∩D_i} E[M_C | F_{t_i}] dP = ∫ 1_{A∩D_i} E[M_C | F_{t_i}] dP = ∫ E[1_{A∩D_i} M_C | F_{t_i}] dP = ∫ 1_{A∩D_i} M_C dP = ∫_{A∩D_i} M_C dP.
Summing over i then gives
∫_A M_τ dP = ∫_A M_C dP.
Since A ∈ F_τ was arbitrary, the above and uniqueness of conditional expectation imply that E[M_C | F_τ] = M_τ (almost surely, of course), which is (2).
Step 2: Suppose now that τ ≤ C is an arbitrary bounded stopping time. Proposition 5 provides a sequence (τ_k) of stopping times, each taking only finitely many values, such that τ_k ↓ τ pointwise and τ_k ≤ C + 1 for all k; replacing C by C + 1 (which still bounds τ), we may assume τ_k ≤ C for all k. By Step 1, each M_{τ_k} is integrable and E[M_C | F_{τ_k}] = M_{τ_k}.
Since M is càdlàg, for almost all ω, τ_k(ω) ↓ τ(ω) implies that M(τ_k(ω), ω) → M(τ(ω), ω); that is, M_{τ_k} tends pointwise to M_τ, so M_τ is F-measurable. Moreover, being càdlàg and adapted, M is progressive (Proposition 3), so M_τ is F_τ-measurable by Lemma 4. Since τ ≤ τ_k, we have F_τ ⊂ F_{τ_k}: indeed, if A ∩ {τ ≤ t} ∈ F_t for all t, then A ∩ {τ_k ≤ t} = (A ∩ {τ ≤ t}) ∩ {τ_k ≤ t} ∈ F_t. Thus, for any A ∈ F_τ,
∫_A M_C dP = ∫_A E[M_C | F_{τ_k}] dP = ∫_A M_{τ_k} dP.
By Lemma 7, the family M_{τ_k} = E[M_C | F_{τ_k}] is uniformly integrable and, by Jensen's inequality, E[|M_{τ_k}|] = E[|E[M_C | F_{τ_k}]|] ≤ E[E[|M_C| | F_{τ_k}]] = E[|M_C|] < ∞ for every k (the last inequality follows from the definition of a martingale). Hence, applying Vitali's Convergence Theorem (Theorem 2), M_τ is integrable and
∫_A M_τ dP = lim_{k→∞} ∫_A M_{τ_k} dP = ∫_A M_C dP.
Since the above holds for every A ∈ F_τ, uniqueness of conditional expectation establishes
E[M_C | F_τ] = M_τ.  (3)
Step 3: Now let τ_1 ≤ τ_2 be as in the premise, and let C be a constant bounding τ_2 (and hence τ_1). By Step 2, E[M_C | F_{τ_i}] = M_{τ_i} for i = 1, 2, and each M_{τ_i} is F_{τ_i}-measurable and integrable. Then
E[M_{τ_2} | F_{τ_1}] = E[ E[M_C | F_{τ_2}] | F_{τ_1} ] = E[M_C | F_{τ_1}] = M_{τ_1},
where the second equality follows from the tower property together with the inclusion F_{τ_1} ⊂ F_{τ_2}, which holds because τ_1 ≤ τ_2 (as noted in Step 2). This completes the proof.
Here, Proposition 5 supplied the approximating sequence (τ_k) used in Step 2: each τ_k is a stopping time taking only finitely many values, so the identity E[M_C | F_{τ_k}] = M_{τ_k} is exactly equation (2) of Step 1 applied to τ_k, and the computation need not be repeated.
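Taking expectations in Theorem 4 with τ_1 = 0 gives E[M_{τ_2}] = E[M_0], which can be checked exactly for a toy martingale (the simple symmetric random walk, with τ the first hit of ±2 capped at a horizon of 6; the example is my own choice):

```python
from itertools import product

def M_tau(steps, level=2, horizon=6):
    """Walk the path and stop at the first hit of ±level, or at the horizon."""
    m = 0
    for s in steps[:horizon]:
        m += s
        if abs(m) == level:
            return m
    return m

total = sum(M_tau(steps) for steps in product((-1, 1), repeat=6))
print(total)  # -> 0: E[M_tau] = 0 = E[M_0] over the 64 equally likely paths
```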
Corollary 2. If τ is any stopping time and M_t is a martingale with respect to (Ω, F_t, P), then so is the stopped process M_{τ∧t}.
Proof. Proposition 4 establishes that, for every fixed t, τ ∧ t is a stopping time. Lemma 6 establishes that the stopped process M^τ_t = M_{τ∧t} is càdlàg and adapted, and it is integrable since, by Theorem 4, M_{τ∧t} = E[M_t | F_{τ∧t}]. Finally, for 0 ≤ s ≤ t, Theorem 4 applied to the bounded stopping times τ ∧ s ≤ τ ∧ t gives E[M_{τ∧t} | F_{τ∧s}] = M_{τ∧s}. For A ∈ F_s we have A ∩ {τ > s} ∈ F_{τ∧s}, while M_{τ∧t} = M_{τ∧s} pointwise on {τ ≤ s}, so ∫_A M_{τ∧t} dP = ∫_A M_{τ∧s} dP. Since M_{τ∧s} is F_s-measurable (as τ ∧ s ≤ s implies F_{τ∧s} ⊂ F_s), this shows E[M_{τ∧t} | F_s] = M_{τ∧s}, as required.
References
[1] R. Durrett. Probability: Theory and Examples. Cambridge University Press, 3rd edition, 2010.
[2] A. N. Shiryaev. Probability. Graduate Texts in Mathematics. Springer, 2nd edition, 1996.
[3] Ramon van Handel. Stochastic Calculus, Filtering, and Stochastic Control. Lecture notes, 2007.