
Basics of Stochastic Analysis

© Timo Seppäläinen


Department of Mathematics, University of Wisconsin-Madison,
Madison, Wisconsin 53706
E-mail address: seppalai@math.wisc.edu
This version December 19, 2010.
The author has been supported by NSF grants and by the Wisconsin
Alumni Research Foundation.
Abstract. This material is for a course on stochastic analysis at UW-Madison. The text covers the development of the stochastic integral of predictable processes with respect to cadlag semimartingale integrators, Itô's formula in an open domain in ℝⁿ, an existence and uniqueness theorem for an equation of the type dX = dH + F(t, X) dY where Y is a cadlag semimartingale, and local time and Girsanov's theorem for Brownian motion.
The text is self-contained except for certain basics of integration theory and probability theory which are explained but not proved. In addition, the reader needs to accept without proof two basic martingale theorems: (i) the existence of quadratic variation for a cadlag local martingale; and (ii) the so-called fundamental theorem of local martingales, which states the following: given a cadlag local martingale M and a positive constant c, M can be decomposed as N + A where N and A are cadlag local martingales, jumps of N are bounded by c, and A has paths of bounded variation.
This text intends to provide a stepping stone to deeper books such as Karatzas-Shreve and Protter. The hope is that this material is accessible to students who do not have an ideal background in analysis and probability theory, and useful for instructors who (like the author) are not experts on stochastic analysis.
Contents
Chapter 1. Measures, Integrals, and Foundations of Probability Theory 1
1.1. Measure theory and integration 1
1.2. Basic concepts of probability theory 19
Exercises 33
Chapter 2. Stochastic Processes 39
2.1. Filtrations and stopping times 39
2.2. Quadratic variation 49
2.3. Path spaces, martingale property and Markov property 56
2.4. Brownian motion 63
2.5. Poisson processes 76
Exercises 80
Chapter 3. Martingales 85
3.1. Optional stopping 87
3.2. Inequalities and limits 91
3.3. Local martingales and semimartingales 95
3.4. Quadratic variation for semimartingales 97
3.5. Doob-Meyer decomposition 102
3.6. Spaces of martingales 105
Exercises 109
Chapter 4. Stochastic Integral with respect to Brownian Motion 111
Exercises 125
Chapter 5. Stochastic Integration of Predictable Processes 127
5.1. Square-integrable martingale integrator 128
5.2. Local square-integrable martingale integrator 155
5.3. Semimartingale integrator 165
5.4. Further properties of stochastic integrals 171
5.5. Integrator with absolutely continuous Doléans measure 182
Exercises 187
Chapter 6. Itô's Formula 191
6.1. Quadratic variation 191
6.2. Itô's formula 200
6.3. Applications of Itô's formula 208
Exercises 216
Chapter 7. Stochastic Differential Equations 219
7.1. Examples of stochastic equations and solutions 220
7.2. Existence and uniqueness for Itô equations 227
7.3. A semimartingale equation 238
7.4. Existence and uniqueness for a semimartingale equation 243
Exercises 268
Chapter 8. Applications of Stochastic Calculus 269
8.1. Local time 269
8.2. Change of measure 281
Exercises 286
Appendix A. Analysis 287
A.1. Continuous, cadlag and BV functions 289
A.2. Differentiation and integration 299
Exercises 302
Appendix B. Probability 303
B.1. General matters 303
B.2. Construction of Brownian motion 313
Bibliography 321
Notation and Conventions 323
Index 325
Chapter 1
Measures, Integrals, and Foundations of Probability Theory
In this chapter we sort out the integrals one typically encounters in courses on calculus, analysis, measure theory, probability theory, and various applied subjects such as statistics and engineering. These are the Riemann integral, the Riemann-Stieltjes integral, the Lebesgue integral and the Lebesgue-Stieltjes integral. The starting point is the general Lebesgue integral on an abstract measure space. The other integrals are special cases, even though they have definitions that look different.
This chapter is not a complete treatment of the basics of measure theory. It provides a brief unified explanation for readers who have prior familiarity with various notions of integration. To avoid unduly burdening this chapter, many technical matters that we need later in the book have been relegated to the appendix. For details that we have omitted and for proofs the reader should turn to any of the standard textbook sources, such as Folland [6].
In the second part of the chapter we go over the measure-theoretic foundations of probability theory. Readers who know basic measure theory and measure-theoretic probability can safely skip this chapter.
1.1. Measure theory and integration
A space X is in general an arbitrary set. For integration, the space must have two additional structural elements, namely a σ-algebra and a measure.
1.1.1. σ-algebras. Suppose 𝒜 is a collection of subsets of X. (Terms such as class, collection and family are used as synonyms for set to avoid speaking of sets of sets, or even sets of sets of sets.) Then 𝒜 is a σ-algebra (also called a σ-field) if it has these properties:
(i) X ∈ 𝒜 and ∅ ∈ 𝒜.
(ii) If A ∈ 𝒜 then also Aᶜ ∈ 𝒜.
(iii) If {A_i} is a sequence of sets in 𝒜, then their union ⋃_i A_i is also an element of 𝒜.
The restriction to countable unions in part (iii) is crucial. Unions over arbitrarily large collections of sets are not permitted. On the other hand, if part (iii) only permits finite unions, then 𝒜 is called an algebra of sets, but this is not rich enough.
A pair (X, 𝒜) where X is a space and 𝒜 is a σ-algebra on X is called a measurable space. The elements of 𝒜 are called measurable sets. Suppose (X, 𝒜) and (Y, ℬ) are two measurable spaces and f : X → Y is a map (another term for a function) from X into Y. Then f is measurable if for every B ∈ ℬ, the inverse image
f⁻¹(B) = {x ∈ X : f(x) ∈ B} = {f ∈ B}
lies in 𝒜. Measurable functions are the fundamental object in measure theory. Measurability is preserved by composition f ∘ g of functions.
In finite or countable spaces the useful σ-algebra is usually the power set 2^X, which is the collection of all subsets of X. Not so in uncountable spaces. Furthermore, the important σ-algebras are usually very complicated, so that it is impossible to give a concise criterion for testing whether a given set is a member of the σ-algebra. The preferred way to define a σ-algebra is to generate it by a smaller collection of sets that can be explicitly described. This procedure is analogous to spanning a subspace of a vector space with a given set of vectors, except that the generated σ-algebra usually lacks the kind of internal description that vector subspaces have as the set of finite linear combinations of basis vectors. Generation of σ-algebras is based on the following lemma, whose proof the reader should fill in as an exercise if this material is new.
Lemma 1.1. Let {𝒜_λ : λ ∈ Λ} be a family of σ-algebras on a space X. Then the intersection
𝒞 = ⋂_{λ∈Λ} 𝒜_λ
is also a σ-algebra.
Let ℰ be an arbitrary collection of subsets of X. The σ-algebra generated by ℰ, denoted by σ(ℰ), is by definition the intersection of all σ-algebras on X that contain ℰ. This intersection is well-defined because there is always at least one σ-algebra on X that contains ℰ, namely the power set 2^X. An equivalent characterization of σ(ℰ) is that it satisfies these three properties: (i) σ(ℰ) ⊃ ℰ, (ii) σ(ℰ) is a σ-algebra on X, and (iii) if ℬ is any σ-algebra on X that contains ℰ, then σ(ℰ) ⊂ ℬ. This last point justifies calling σ(ℰ) the smallest σ-algebra on X that contains ℰ.
A related notion is the σ-algebra generated by a collection of functions. Suppose (Y, ℋ) is a measurable space, and Φ is a collection of functions from X into Y. Then the σ-algebra generated by Φ is defined by
(1.1) σ(Φ) = σ{ {f ∈ B} : f ∈ Φ, B ∈ ℋ }.
σ(Φ) is the smallest σ-algebra that makes all the functions in Φ measurable.
Example 1.2 (Borel σ-algebras). If X is a metric space, then the Borel σ-field ℬ_X is the smallest σ-algebra on X that contains all open sets. The members of ℬ_X are called Borel sets. We also write ℬ(X) when subscripts become clumsy.
It is often technically convenient to have different generating sets for a particular σ-algebra. For example, the Borel σ-algebra ℬ_ℝ of the real line is generated by either one of these classes of intervals:
{(a, b] : −∞ < a < b < ∞}  and  {(−∞, b) : −∞ < b < ∞}.
We shall also need the Borel σ-algebra ℬ_{[−∞,∞]} of the extended real line [−∞, ∞]. We define this as the smallest σ-algebra that contains all Borel sets of the real line and the singletons {−∞} and {∞}. This σ-algebra is also generated by the intervals {[−∞, b] : b ∈ ℝ}. It is possible to define a metric on [−∞, ∞] such that this σ-algebra is the Borel σ-algebra determined by the metric.
When we speak of real-valued or extended real-valued measurable functions on an arbitrary measurable space (X, 𝒜), we always have in mind the Borel σ-algebra on ℝ or [−∞, ∞]. One can then check that measurability is preserved by algebraic operations (f ± g, fg, f/g, whenever these are well-defined) and by pointwise limits and suprema of sequences: if {f_n : n ∈ ℕ} is a sequence of real-valued measurable functions, then for example the functions
g(x) = sup_{n∈ℕ} f_n(x)  and  h(x) = limsup_{n→∞} f_n(x)
are measurable. The set ℕ above is the set of natural numbers ℕ = {1, 2, 3, . . . }. Thus measurability is a more robust property than other familiar types of regularity, such as continuity or differentiability, that are not in general preserved by pointwise limits.
If (X, ℬ_X) is a metric space with its Borel σ-algebra, then every continuous function f : X → ℝ is measurable. The definition of continuity implies that for any open G ⊂ ℝ, f⁻¹(G) is an open set in X, hence a member of ℬ_X. Since the open sets generate ℬ_ℝ, this suffices for concluding that f⁻¹(B) ∈ ℬ_X for all Borel sets B ⊂ ℝ.
Example 1.3 (Product σ-algebras). Of great importance for probability theory are product σ-algebras. Let ℐ be an arbitrary index set, and for each i ∈ ℐ let (X_i, 𝒜_i) be a measurable space. The Cartesian product space X = ∏_{i∈ℐ} X_i is the space of all functions x : ℐ → ⋃_{i∈ℐ} X_i such that x(i) ∈ X_i for each i. Alternate notation for x(i) is x_i. Coordinate projection maps on X are defined by f_i(x) = x_i; in other words, f_i maps X onto X_i by extracting the i-coordinate of the ℐ-tuple x. The product σ-algebra ⊗_{i∈ℐ} 𝒜_i is by definition the σ-algebra generated by the coordinate projections {f_i : i ∈ ℐ}.
1.1.2. Measures. Let us move on to discuss the second fundamental ingredient of integration. Let (X, 𝒜) be a measurable space. A measure is a function μ : 𝒜 → [0, ∞] that satisfies these properties:
(i) μ(∅) = 0.
(ii) If {A_i} is a sequence of sets in 𝒜 such that A_i ∩ A_j = ∅ for all i ≠ j (pairwise disjoint is the term), then
μ( ⋃_i A_i ) = ∑_i μ(A_i).
Property (ii) is called countable additivity. It goes together with the fact that σ-algebras are closed under countable unions, so there is no issue about whether the union ⋃_i A_i is a member of 𝒜. The triple (X, 𝒜, μ) is called a measure space.
If μ(X) < ∞ then μ is a finite measure. If μ(X) = 1 then μ is a probability measure. Infinite measures arise naturally. The ones we encounter satisfy a condition called σ-finiteness: μ is σ-finite if there exists a sequence of measurable sets {V_i} such that X = ⋃ V_i and μ(V_i) < ∞ for all i. A measure defined on a Borel σ-algebra is called a Borel measure.
Example 1.4. Suppose X = {x_i : i ∈ ℕ} is a countable space, and let {a_i : i ∈ ℕ} be a sequence of nonnegative real numbers. Then
μ(A) = ∑_{i : x_i ∈ A} a_i
defines a σ-finite (or finite, if ∑ a_i < ∞) measure on the σ-algebra 2^X of all subsets of X.
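Example 1.4 is concrete enough to compute with. The sketch below is not from the text: it realizes the measure μ(A) = ∑_{i : x_i ∈ A} a_i on the first ten points of a countable space, with illustrative weights a_i = 2⁻ⁱ, and checks additivity on disjoint sets using exact rational arithmetic.

```python
from fractions import Fraction

# Illustrative weights a_i = 2^-i on the countable space X = {1, 2, 3, ...},
# truncated to ten points; exact rationals avoid floating-point noise.
weights = {i: Fraction(1, 2 ** i) for i in range(1, 11)}

def mu(A):
    """mu(A) = sum of a_i over the points x_i in A, as in Example 1.4."""
    return sum((weights[x] for x in A if x in weights), Fraction(0))

A, B = {1, 3, 5}, {2, 4}                 # disjoint sets
assert mu(A | B) == mu(A) + mu(B)        # additivity on disjoint sets
assert mu(set()) == 0                    # the empty set has measure 0
```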
The example shows that to define a measure on a countable space, one only needs to specify the measures of the singletons, and the rest follows by countable additivity. Again, things are more complicated in uncountable spaces. For example, we would like to have a measure m on the Borel sets of the real line with the property that the measure m(I) of an interval I is the length of the interval. (This measure is known as Lebesgue measure.) But then the measure of any singleton must be zero. So there is no way to construct the measure by starting with singletons.
Since it is impossible to describe every element of a σ-algebra, it is even less possible to give a simple explicit formula for the measure of every measurable set. Consequently we need a theorem that generates a measure from some modest ingredients that can be explicitly written down. Here is a useful one.
First, a class 𝒮 of subsets of X is a semialgebra if it has these properties:
(i) ∅ ∈ 𝒮.
(ii) If A, B ∈ 𝒮 then also A ∩ B ∈ 𝒮.
(iii) If A ∈ 𝒮, then Aᶜ is a finite disjoint union of elements of 𝒮.
A good example on the real line to keep in mind is
(1.2) 𝒮 = {(a, b] : −∞ ≤ a ≤ b < ∞} ∪ {(a, ∞) : −∞ ≤ a < ∞}.
This semialgebra generates ℬ_ℝ.
Theorem 1.5. Let 𝒮 be a semialgebra, and μ₀ : 𝒮 → [0, ∞] a function with these properties:
(i) μ₀(∅) = 0.
(ii) If A ∈ 𝒮 is a finite disjoint union of sets B_1, . . . , B_n in 𝒮, then μ₀(A) = ∑ μ₀(B_i).
(iii) If A ∈ 𝒮 is a countable disjoint union of sets B_1, B_2, . . . , B_n, . . . in 𝒮, then μ₀(A) ≤ ∑ μ₀(B_i).
Assume furthermore that there exists a sequence of sets {A_i} in 𝒮 such that X = ⋃ A_i and μ₀(A_i) < ∞ for all i. Then there exists a unique measure μ on the σ-algebra σ(𝒮) such that μ = μ₀ on 𝒮.
This theorem is proved by first extending μ₀ to the algebra generated by 𝒮, and then using the so-called Carathéodory Extension Theorem to go from the algebra to the σ-algebra. With Theorem 1.5 we can describe a large class of measures on ℝ.
Example 1.6 (Lebesgue-Stieltjes measures). Let F be a nondecreasing, right-continuous real-valued function on ℝ. For intervals (a, b], define
μ₀(a, b] = F(b) − F(a).
This will work also for a = −∞ and b = ∞ if we define
F(∞) = lim_{x↗∞} F(x)  and  F(−∞) = lim_{y↘−∞} F(y).
It is possible that F(−∞) = −∞ and F(∞) = ∞, but this will not hurt the definition. One can show that μ₀ satisfies the hypotheses of Theorem 1.5, and consequently there exists a measure μ on (ℝ, ℬ_ℝ) that gives mass F(b) − F(a) to each interval (a, b]. This measure is called the Lebesgue-Stieltjes measure of the function F, and we shall denote it by Λ_F to indicate the connection with F.
The most important special case is Lebesgue measure, which we shall denote by m, obtained by taking F(x) = x.
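As a numerical aside (not part of the text), the recipe μ₀(a, b] = F(b) − F(a) can be coded directly. With F(x) = x it returns interval lengths, i.e. Lebesgue measure; with a right-continuous unit step at 0 it puts a point mass at 0.

```python
def ls_measure(F, a, b):
    """Lebesgue-Stieltjes mass mu_0(a, b] = F(b) - F(a) for a nondecreasing,
    right-continuous F, as in Example 1.6."""
    return F(b) - F(a)

# F(x) = x recovers interval length, i.e. Lebesgue measure.
assert ls_measure(lambda x: x, 2.0, 5.0) == 3.0

# A right-continuous unit step at 0 concentrates all mass at the point 0:
step = lambda x: 1.0 if x >= 0 else 0.0
assert ls_measure(step, -1.0, 0.0) == 1.0    # (-1, 0] contains the jump point 0
assert ls_measure(step, 0.0, 1.0) == 0.0     # (0, 1] misses the jump
```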
On the other hand, if μ is a Borel measure on ℝ such that μ(B) < ∞ for all bounded Borel sets B, we can define a right-continuous nondecreasing function by
G(0) = 0,  G(x) = μ(0, x] for x > 0,  and  G(x) = −μ(x, 0] for x < 0,
and then μ = Λ_G. Thus Lebesgue-Stieltjes measures give us all the Borel measures that are finite on bounded sets.
1.1.3. The integral. Let (X, 𝒜, μ) be a fixed measure space. To say that a function f : X → ℝ or f : X → [−∞, ∞] is measurable is always interpreted with the Borel σ-algebra on ℝ or [−∞, ∞]. In either case, it suffices to check that {f ≤ t} ∈ 𝒜 for each real t. The Lebesgue integral is defined in several stages, starting with cases for which the integral can be written out explicitly. This same pattern of proceeding from simple cases to general cases will also be used to define stochastic integrals.
Step 1. Nonnegative measurable simple functions. A nonnegative simple function is a function with finitely many distinct values α_1, . . . , α_n ∈ [0, ∞). If we set A_i = {f = α_i}, then we can write
f(x) = ∑_{i=1}^n α_i 1_{A_i}(x)
where
1_A(x) = 1 for x ∈ A, and 0 for x ∉ A,
is the indicator function (also called characteristic function) of the set A. The integral ∫ f dμ is defined by
∫ f dμ = ∑_{i=1}^n α_i μ(A_i).
This sum is well-defined because f is measurable if and only if each A_i is measurable, and it is possible to add and multiply numbers in [0, ∞]. Note the convention 0 · ∞ = 0.
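On a finite space, Step 1 is literally a finite sum. A hypothetical illustration (the space, masses and function values below are made up):

```python
space_mass = {"a": 0.5, "b": 1.5, "c": 2.0}    # mu({x}) for each point x (made up)

def integral_simple(f, mass):
    """Step 1 on a finite space: group the points into the level sets
    A_i = {f = alpha_i} and return the sum of alpha_i * mu(A_i)."""
    total = 0.0
    for alpha in set(f.values()):
        total += alpha * sum(m for x, m in mass.items() if f[x] == alpha)
    return total

f = {"a": 3.0, "b": 3.0, "c": 1.0}             # simple: takes only the values 3 and 1
# integral = 3 * mu({a, b}) + 1 * mu({c}) = 3 * 2.0 + 1 * 2.0 = 8.0
assert integral_simple(f, space_mass) == 8.0
```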
Step 2. [0, ∞]-valued measurable functions. Let f : X → [0, ∞] be measurable. Then we define
∫ f dμ = sup{ ∫ g dμ : g is a simple function such that 0 ≤ g ≤ f }.
This integral is a well-defined number in [0, ∞].
Step 3. General measurable functions. Let f : X → [−∞, ∞] be measurable. The positive and negative parts of f are f⁺ = f ∨ 0 and f⁻ = −(f ∧ 0). f⁺ and f⁻ are nonnegative functions, and satisfy f = f⁺ − f⁻ and |f| = f⁺ + f⁻. The integral of f is defined by
∫ f dμ = ∫ f⁺ dμ − ∫ f⁻ dμ
provided at least one of the integrals on the right is finite.
These steps complete the construction of the integral. Along the way one proves that the integral has all the necessary properties, such as linearity:
∫ (f + g) dμ = ∫ f dμ + ∫ g dμ,
monotonicity:
f ≤ g implies ∫ f dμ ≤ ∫ g dμ,
and the important inequality
| ∫ f dμ | ≤ ∫ |f| dμ.
Various notations are used for the integral ∫ f dμ. Sometimes it is desirable to indicate the space over which one integrates by ∫_X f dμ. Then one can indicate integration over a subset A by defining
∫_A f dμ = ∫_X 1_A f dμ.
To make the integration variable explicit, one can write
∫_X f(x) μ(dx)  or  ∫_X f(x) dμ(x).
Since the integral is linear in both the function and the measure, the linear functional notation ⟨f, μ⟩ is used. Sometimes the notation is simplified to μ(f), or even to μf.
A basic aspect of measure theory is that whatever happens on sets of measure zero is not visible. We say that a property holds μ-almost everywhere (or simply almost everywhere if the measure is clear from the context) if there exists a set N ∈ 𝒜 such that μ(N) = 0 (a μ-null set) and the property in question holds on the set Nᶜ.
For example, we can define a measure ν on ℝ which agrees with Lebesgue measure on [0, 1] and vanishes elsewhere by ν(B) = m(B ∩ [0, 1]) for B ∈ ℬ_ℝ. Then if g(x) = x on ℝ while
f(x) = sin x for x < 0,  f(x) = x for 0 ≤ x ≤ 1,  and  f(x) = cos x for x > 1,
we can say that f = g ν-almost everywhere.
The principal power of the Lebesgue integral derives from three fundamental convergence theorems, which we state next. The value of an integral is not affected by changing the function on a null set. Therefore the hypotheses of the convergence theorems require only almost everywhere convergence.
Theorem 1.7 (Fatou's lemma). Let f_n ≥ 0 be measurable functions. Then
∫ ( liminf_{n→∞} f_n ) dμ ≤ liminf_{n→∞} ∫ f_n dμ.
Theorem 1.8 (Monotone convergence theorem). Let f_n be nonnegative measurable functions, and assume f_n ≤ f_{n+1} almost everywhere, for each n. Let f = lim_{n→∞} f_n. This limit exists at least almost everywhere. Then
∫ f dμ = lim_{n→∞} ∫ f_n dμ.
Theorem 1.9 (Dominated convergence theorem). Let f_n be measurable functions, and assume the limit f = lim_{n→∞} f_n exists almost everywhere. Assume there exists a function g ≥ 0 such that |f_n| ≤ g almost everywhere for each n, and ∫ g dμ < ∞. Then
∫ f dμ = lim_{n→∞} ∫ f_n dμ.
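A classic example in this spirit: on [0, 1] with Lebesgue measure, f_n = n·1_{(0,1/n)} converges to 0 pointwise, yet ∫ f_n dm = 1 for every n, so no integrable dominating g exists and the conclusion of Theorem 1.9 fails. The sketch below illustrates this numerically, approximating the integrals by midpoint sums (an approximation device of this illustration, not the text's definition).

```python
def integral_01(f, steps=100_000):
    """Crude midpoint approximation of the Lebesgue integral of f over [0, 1]."""
    h = 1.0 / steps
    return sum(f((k + 0.5) * h) for k in range(steps)) * h

def f_n(n):
    """f_n = n on (0, 1/n) and 0 elsewhere: f_n -> 0 pointwise, but each integral is 1."""
    return lambda x: float(n) if 0 < x < 1.0 / n else 0.0

masses = [integral_01(f_n(n)) for n in (1, 10, 100)]
# The mass never decays: lim of the integrals is 1 > 0 = integral of the limit.
# Any dominating g must satisfy g(x) >= n for x in (1/(n+1), 1/n), so g is not integrable.
assert all(abs(m - 1.0) < 0.01 for m in masses)
```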
Finding examples where the hypotheses and the conclusions of these theorems fail is an excellent exercise. By the monotone convergence theorem we can give the following explicit limit expression for the integral ∫ f dμ of a [0, ∞]-valued function f. Define simple functions
(1.3) f_n(x) = ∑_{k=0}^{n2ⁿ−1} k2⁻ⁿ 1_{{k2⁻ⁿ ≤ f < (k+1)2⁻ⁿ}}(x) + n 1_{{f ≥ n}}(x).
Then 0 ≤ f_n(x) ↗ f(x), and so by Theorem 1.8,
(1.4) ∫ f dμ = lim_{n→∞} ∫ f_n dμ = lim_{n→∞} [ ∑_{k=0}^{n2ⁿ−1} k2⁻ⁿ μ{ k2⁻ⁿ ≤ f < (k+1)2⁻ⁿ } + n μ{f ≥ n} ].
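The dyadic approximation (1.3) can be watched in action. The sketch below is illustrative only: Lebesgue measure on [0, 1] is approximated by midpoint sums on a fine grid, f_n is built for f(x) = x², and the integrals ∫ f_n dm increase toward ∫ f dm = 1/3 as in (1.4).

```python
def dyadic_approx(f, n):
    """The simple function f_n of (1.3): values of f rounded down to the
    grid of multiples of 2^-n, capped at n."""
    def f_n(x):
        v = f(x)
        if v >= n:
            return float(n)
        return int(v * 2 ** n) / 2 ** n
    return f_n

def integral_01(g, steps=100_000):
    """Midpoint approximation of the integral of g over [0, 1] with Lebesgue measure."""
    h = 1.0 / steps
    return sum(g((k + 0.5) * h) for k in range(steps)) * h

f = lambda x: x * x                       # exact integral over [0, 1] is 1/3
vals = [integral_01(dyadic_approx(f, n)) for n in (2, 4, 8)]
assert vals[0] <= vals[1] <= vals[2] <= 1 / 3 + 1e-9   # 0 <= f_n increasing, f_n <= f
assert abs(vals[2] - 1 / 3) < 1e-2                     # monotone convergence from below
```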
There is an abstract change of variables principle which is particularly important in probability. Suppose we have a measurable map φ : (X, 𝒜) → (Y, ℋ) between two measurable spaces, and a measurable function f : (Y, ℋ) → (ℝ, ℬ_ℝ). If μ is a measure on (X, 𝒜), we can define a measure ν on (Y, ℋ) by
ν(U) = μ(φ⁻¹(U)) for U ∈ ℋ.
In short, this connection is expressed by ν = μ ∘ φ⁻¹. If the integral of f over the measure space (Y, ℋ, ν) exists, then the value of this integral is not changed if instead we integrate f ∘ φ over (X, 𝒜, μ):
(1.5) ∫_Y f dν = ∫_X (f ∘ φ) dμ.
Note that the definition of ν already gives the equality for f = 1_U. The linearity of the integral then gives it for simple f. The case of general f ≥ 0 follows by monotone convergence, and finally general f = f⁺ − f⁻ by linearity again. This sequence of steps recurs often when an identity for integrals is to be proved.
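On a finite space, identity (1.5) is a finite computation. A sketch with made-up point masses and a made-up map φ:

```python
mu = {1: 0.25, 2: 0.25, 3: 0.5}          # point masses on X = {1, 2, 3} (made up)
phi = {1: "a", 2: "a", 3: "b"}           # a map phi : X -> Y = {"a", "b"}

# The pushforward nu = mu composed with the preimage of phi:
# nu(U) collects the mass of phi^{-1}(U), here one point at a time.
nu = {}
for x, m in mu.items():
    nu[phi[x]] = nu.get(phi[x], 0.0) + m

f = {"a": 10.0, "b": -4.0}               # a function on Y

int_Y = sum(f[y] * m for y, m in nu.items())         # integral of f over (Y, nu)
int_X = sum(f[phi[x]] * m for x, m in mu.items())    # integral of f(phi) over (X, mu)
assert int_Y == int_X                    # identity (1.5); exact for these masses
```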
1.1.4. The Riemann and Lebesgue integrals. In calculus we learn the Riemann integral. Suppose f is a bounded function on a compact interval [a, b]. Given a (finite) partition π = {a = s_0 < s_1 < · · · < s_n = b} of [a, b] and some choice of points x_i ∈ [s_i, s_{i+1}], we form the Riemann sum
S(π) = ∑_{i=0}^{n−1} f(x_i)(s_{i+1} − s_i).
We say f is Riemann integrable on [a, b] if there is a number c such that the following is true: given ε > 0, there exists δ > 0 such that |c − S(π)| ≤ ε for every partition π with mesh(π) = max{s_{i+1} − s_i} ≤ δ and for any choice of the points x_i in the Riemann sum. In other words, the Riemann sums converge to c as the mesh of the partition converges to zero. The limiting value is by definition the Riemann integral of f:
(1.6) ∫_a^b f(x) dx = c = lim_{mesh(π)→0} S(π).
One can then prove that every continuous function is Riemann integrable.
The definition of the Riemann integral is fundamentally different from the definition of the Lebesgue integral. For the Riemann integral there is one recipe for all functions, instead of a step-by-step definition that proceeds from simple to complex cases. For the Riemann integral we partition the domain [a, b], whereas the Lebesgue integral proceeds by partitioning the range of f, as formula (1.4) makes explicit. This latter difference is sometimes illustrated by counting the money in your pocket: the Riemann way picks one coin at a time from the pocket, adds its value to the total, and repeats this until all coins are counted. The Lebesgue way first partitions the coins into pennies, nickels, dimes, etc., and then counts the piles. As the coin-counting picture suggests, the Lebesgue way is more efficient (it leads to a more general integral with superior properties), but when both apply, the answers are the same. The precise relationship is the following theorem, which also gives the exact domain of applicability of the Riemann integral.
Theorem 1.10. Suppose f is a bounded function on [a, b].
(a) If f is a Riemann integrable function on [a, b], then f is Lebesgue measurable, and the Riemann integral of f coincides with the Lebesgue integral of f with respect to Lebesgue measure m on [a, b].
(b) f is Riemann integrable if and only if the set of discontinuities of f has Lebesgue measure zero.
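The indicator of the rationals in [0, 1] illustrates Theorem 1.10(b): it is discontinuous at every point, hence not Riemann integrable, while its Lebesgue integral is 0 because the rationals form a null set. In the sketch below (a bookkeeping device, not a rigorous construction) rationals are represented as exact Fraction objects and irrationals as floats, so Riemann sums with rational evaluation points give 1 while sums with irrational points give 0, and no common limit c can exist.

```python
from fractions import Fraction
import math

def dirichlet(x):
    """Indicator of the rationals, with rationality encoded by type:
    Fraction inputs count as rational, float inputs as irrational."""
    return 1.0 if isinstance(x, Fraction) else 0.0

n = 1000                                  # uniform partition of [0, 1] with mesh 1/n
h = Fraction(1, n)
rational_pts = [k * h + h / 2 for k in range(n)]                     # rational midpoints
irrational_pts = [k / n + math.sqrt(2) / (4 * n) for k in range(n)]  # irrational choices

S_rational = sum(dirichlet(x) * (1 / n) for x in rational_pts)
S_irrational = sum(dirichlet(x) * (1 / n) for x in irrational_pts)

# The two choices give Riemann sums 1 and 0 at every mesh, so no common limit exists.
# The Lebesgue integral, by contrast, is 0: the rationals are a null set.
assert abs(S_rational - 1.0) < 1e-9 and S_irrational == 0.0
```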
Because of this theorem, the Riemann integral notation is routinely used for Lebesgue integrals on the real line. In other words, we write
∫_a^b f(x) dx  instead of  ∫_{[a,b]} f dm
for a Borel or Lebesgue measurable function f on [a, b], even if the function f is not Riemann integrable.
1.1.5. Function spaces. Various function spaces play an important role in analysis and in all the applied subjects that use analysis. One way to define such spaces is through integral norms. For 1 ≤ p < ∞, the space L^p(μ) is the set of all measurable f : X → ℝ such that ∫ |f|^p dμ < ∞. The L^p norm on this space is defined by
(1.7) ‖f‖_p = ‖f‖_{L^p(μ)} = ( ∫ |f|^p dμ )^{1/p}.
A function f is called integrable if ∫ |f| dμ < ∞. This is synonymous with f ∈ L¹(μ).
There is also a norm corresponding to p = ∞, defined by
(1.8) ‖f‖_∞ = ‖f‖_{L^∞(μ)} = inf{ c ≥ 0 : μ{|f| > c} = 0 }.
This quantity is called the essential supremum of |f|. The inequality |f(x)| ≤ ‖f‖_∞ holds almost everywhere, but can fail on a null set of points x.
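On a finite measure space, the norms (1.7) and (1.8) reduce to finite sums, and one can watch ‖f‖_p approach ‖f‖_∞ as p grows. A sketch with made-up point masses, including a huge value carried by a null set (and therefore invisible to every norm):

```python
mu = {"a": 0.5, "b": 0.3, "c": 0.2, "d": 0.0}      # point masses; "d" is a null set
f  = {"a": 1.0, "b": -2.0, "c": 3.0, "d": 100.0}   # huge value only on the null set

def lp_norm(f, mu, p):
    """(1.7) on a discrete space: the integral becomes a weighted sum."""
    return sum(abs(f[x]) ** p * m for x, m in mu.items()) ** (1 / p)

def ess_sup(f, mu):
    """(1.8): the largest |f(x)| carried by a point of positive mass."""
    return max(abs(f[x]) for x, m in mu.items() if m > 0)

assert ess_sup(f, mu) == 3.0                       # the value 100 is invisible
norms = [lp_norm(f, mu, p) for p in (1, 2, 4, 16, 64)]
assert all(a <= b + 1e-9 for a, b in zip(norms, norms[1:]))  # nondecreasing in p: mu is a probability measure
assert abs(norms[-1] - ess_sup(f, mu)) < 0.1       # the p-norms approach the essential sup
```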
The L^p(μ) spaces, 1 ≤ p ≤ ∞, are Banach spaces (see Appendix). The type of convergence in these spaces is called L^p convergence, so we say f_n → f in L^p(μ) if
‖f_n − f‖_{L^p(μ)} → 0 as n → ∞.
However, a problem is created by the innocuous property of a norm that requires ‖f‖_p = 0 if and only if f = 0. For example, let the underlying measure space be the interval [0, 1] with Lebesgue measure on the Borel sets. From the definition of the L^p norm it then follows that ‖f‖_p = 0 if and only if f = 0 Lebesgue-almost everywhere. In other words, f can be nonzero even on infinitely many points, as long as these points form a Lebesgue-null set, and still ‖f‖_p = 0. An example of this would be the indicator function of the rationals in [0, 1]. So the disturbing situation is that many functions have zero norm, not just the identically zero function f(x) ≡ 0.
To resolve this, we apply the idea that whatever happens on null sets is not visible. We simply adopt the point of view that two functions are equal if they differ only on a null set. The mathematically sophisticated way of phrasing this is that we regard elements of L^p(μ) as equivalence classes of functions. Particular functions that are almost everywhere equal are representatives of the same equivalence class. Fortunately, we do not have to change our language. We can go on regarding elements of L^p(μ) as functions, as long as we remember the convention concerning equality. This issue will appear again when we discuss spaces of stochastic processes.
1.1.6. Completion of measures. There are certain technical benefits to having the following property in a measure space (X, 𝒜, μ), called completeness: if N ∈ 𝒜 satisfies μ(N) = 0, then every subset of N is measurable (and then of course has measure zero). It turns out that this can always be arranged by a simple enlargement of the σ-algebra. Let
𝒜̄ = { A ⊂ X : there exist B, N ∈ 𝒜 and F ⊂ N such that μ(N) = 0 and A = B ∪ F },
and define μ̄ on 𝒜̄ by μ̄(A) = μ(B) when B has the above relationship to A. Then one can check that 𝒜 ⊂ 𝒜̄, that (X, 𝒜̄, μ̄) is a complete measure space, and that μ̄ agrees with μ on 𝒜.
An important example of this procedure is the extension of Lebesgue measure m from ℬ_ℝ to the σ-algebra ℒ_ℝ of the so-called Lebesgue measurable sets. ℒ_ℝ is the completion of ℬ_ℝ under Lebesgue measure, and it is strictly larger than ℬ_ℝ. Proving this latter fact is typically an exercise in real analysis (for example, Exercise 2.9 in [6]). For our purposes the Borel sets suffice as a domain for Lebesgue measure. In the analysis literature the term Lebesgue measure usually refers to the completed measure.
1.1.7. Product measures. Let (X, 𝒜, μ) and (Y, ℬ, ν) be two σ-finite measure spaces. The product measure space
(X × Y, 𝒜 ⊗ ℬ, μ ⊗ ν)
is defined as follows. X × Y is the Cartesian product space. 𝒜 ⊗ ℬ is the product σ-algebra. The product measure μ ⊗ ν is the unique measure on 𝒜 ⊗ ℬ that satisfies
(μ ⊗ ν)(A × B) = μ(A)ν(B)
for measurable rectangles A × B where A ∈ 𝒜 and B ∈ ℬ. Measurable rectangles generate 𝒜 ⊗ ℬ and they form a semialgebra. The hypotheses of the Extension Theorem 1.5 can be checked, so the measure μ ⊗ ν is uniquely and well defined. This measure is also σ-finite.
The x-section f_x of an 𝒜 ⊗ ℬ-measurable function f is f_x(y) = f(x, y). It is a ℬ-measurable function on Y. Furthermore, the integral of f_x over (Y, ℬ, ν) gives an 𝒜-measurable function of x. Symmetric statements hold for the y-section f^y(x) = f(x, y), a measurable function on X. This is part of the important Tonelli-Fubini theorem.
Theorem 1.11. Suppose (X, 𝒜, μ) and (Y, ℬ, ν) are σ-finite measure spaces.
(a) (Tonelli's theorem) Let f be a [0, ∞]-valued 𝒜 ⊗ ℬ-measurable function on X × Y. Then the functions g(x) = ∫_Y f_x dν and h(y) = ∫_X f^y dμ are [0, ∞]-valued measurable functions on their respective spaces. Furthermore, f can be integrated by iterated integration:
∫_{X×Y} f d(μ ⊗ ν) = ∫_X ( ∫_Y f(x, y) ν(dy) ) μ(dx) = ∫_Y ( ∫_X f(x, y) μ(dx) ) ν(dy).
(b) (Fubini's theorem) Let f ∈ L¹(μ ⊗ ν). Then f_x ∈ L¹(ν) for μ-almost every x, f^y ∈ L¹(μ) for ν-almost every y, g ∈ L¹(μ), and h ∈ L¹(ν). Iterated integration is valid as above.
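On finite spaces the Tonelli-Fubini theorem reduces to interchanging the order of a double sum. A sketch with hypothetical point masses:

```python
mu = {0: 0.5, 1: 1.5}                    # finite (hence sigma-finite) measure on X
nu = {"u": 2.0, "v": 0.25}               # measure on Y
f = lambda x, y: x + (1.0 if y == "u" else -1.0)   # a bounded function on X x Y

# Integral against the product measure: sum over the rectangles {(x, y)}.
double = sum(f(x, y) * mx * ny for x, mx in mu.items() for y, ny in nu.items())
# Iterated integration, in both orders.
x_then_y = sum(sum(f(x, y) * ny for y, ny in nu.items()) * mx for x, mx in mu.items())
y_then_x = sum(sum(f(x, y) * mx for x, mx in mu.items()) * ny for y, ny in nu.items())
assert abs(double - x_then_y) < 1e-12 and abs(double - y_then_x) < 1e-12
```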
The product measure construction and the theorem generalize naturally to products
( ∏_{i=1}^n X_i, ⊗_{i=1}^n 𝒜_i, ⊗_{i=1}^n μ_i )
of finitely many σ-finite measure spaces. Infinite products shall be discussed in conjunction with the construction problem of stochastic processes.
The part of the Tonelli-Fubini theorem most often needed is that integrating away some variables from a product-measurable function always leaves a function that is measurable in the remaining variables.
In multivariable calculus we learn the multivariate Riemann integral over n-dimensional rectangles,
∫_{a_1}^{b_1} ∫_{a_2}^{b_2} · · · ∫_{a_n}^{b_n} f(x_1, x_2, . . . , x_n) dx_n · · · dx_2 dx_1.
This is an integral with respect to n-dimensional Lebesgue measure on ℝⁿ, which can be defined as the completion of the n-fold product of one-dimensional Lebesgue measures.
As a final technical point, consider metric spaces X_1, X_2, . . . , X_n, with product space X = ∏ X_i. X has the product σ-algebra ⊗ ℬ_{X_i} of the Borel σ-algebras from the factor spaces. On the other hand, X is a metric space in its own right, and so has its own Borel σ-algebra ℬ_X. What is the relation between the two? The projection maps (x_1, . . . , x_n) ↦ x_i are continuous, hence ℬ_X-measurable. Since these maps generate ⊗ ℬ_{X_i}, it follows that ⊗ ℬ_{X_i} ⊂ ℬ_X. It turns out that if the X_i's are separable then equality holds: ⊗ ℬ_{X_i} = ℬ_X. A separable metric space is one that has a countable dense set, such as the rational numbers in ℝ.
1.1.8. Signed measures. A finite signed measure ν on a measurable space (X, 𝒜) is a function ν : 𝒜 → ℝ such that ν(∅) = 0, and
(1.9) ν(A) = ∑_{i=1}^∞ ν(A_i)
whenever A = ⋃ A_i is a disjoint union. The series in (1.9) has to converge absolutely, meaning that
∑_i |ν(A_i)| < ∞.
Without absolute convergence the limit of the series ∑ ν(A_i) would depend on the order of the terms. But this must not happen, because rearranging the sets A_1, A_2, A_3, . . . does not change their union.
More generally, a signed measure is allowed to take one of the values −∞ and ∞ but not both. We shall use the term measure only when the signed measure takes only values in [0, ∞]. If this point needs emphasizing, we use the term positive measure as a synonym for measure.
For any signed measure ν, there exist unique positive measures ν⁺ and ν⁻ such that ν = ν⁺ − ν⁻ and ν⁺ ⊥ ν⁻. (The statement ν⁺ ⊥ ν⁻ reads ν⁺ and ν⁻ are mutually singular, and means that there exists a measurable set A such that ν⁺(A) = ν⁻(Aᶜ) = 0.) The measure ν⁺ is the positive variation of ν, ν⁻ is the negative variation of ν, and ν = ν⁺ − ν⁻ is the Jordan decomposition of ν. There exist measurable sets P and N such that P ∪ N = X, P ∩ N = ∅, and
ν⁺(A) = ν(A ∩ P)  and  ν⁻(A) = −ν(A ∩ N).
(P, N) is called the Hahn decomposition of ν. The total variation of ν is the positive measure |ν| = ν⁺ + ν⁻. We say that the signed measure ν is σ-finite if |ν| is σ-finite.
Integration with respect to a signed measure ν is defined by
(1.10) ∫ f dν = ∫ f dν⁺ − ∫ f dν⁻
whenever both integrals on the right are finite. A function f is integrable with respect to ν if it is integrable with respect to |ν|. In other words, L¹(ν) is by definition L¹(|ν|). A useful inequality is
(1.11) | ∫ f dν | ≤ ∫ |f| d|ν|,
valid for all f for which the integral on the right is finite.
Note for future reference that integrals with respect to |ν| can be expressed in terms of ν by
(1.12) ∫ f d|ν| = ∫_P f dν⁺ + ∫_N f dν⁻ = ∫ (1_P − 1_N) f dν.
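For a signed measure with finitely many point masses, the Hahn and Jordan decompositions are immediate: P collects the points with nonnegative mass and N the rest. The sketch below (made-up masses) verifies |ν| = ν⁺ + ν⁻ and identity (1.12).

```python
nu = {1: 2.0, 2: -0.5, 3: 1.5, 4: -1.0}      # signed point masses (made up)

# Hahn decomposition: P carries the positive part, N the negative part.
P = {x for x, m in nu.items() if m >= 0}
N = {x for x, m in nu.items() if m < 0}
nu_plus  = {x: m for x, m in nu.items() if x in P}       # nu+(A) = nu(A within P)
nu_minus = {x: -m for x, m in nu.items() if x in N}      # nu-(A) = -nu(A within N)
abs_nu   = {x: abs(m) for x, m in nu.items()}            # |nu| = nu+ + nu-

f = {1: 1.0, 2: 3.0, 3: -2.0, 4: 0.5}

int_abs = sum(f[x] * abs_nu[x] for x in nu)              # integral of f against |nu|
int_signed = sum(f[x] * (1 if x in P else -1) * nu[x] for x in nu)  # (1_P - 1_N) f against nu
assert abs(int_abs - int_signed) < 1e-12                 # identity (1.12)
```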
1.1.9. BV functions and Lebesgue-Stieltjes integrals. Let F be a function on [a, b]. The total variation function of F is the function V_F(x) defined on [a, b] by
(1.13) V_F(x) = sup{ ∑_{i=1}^n |F(s_i) − F(s_{i−1})| : a = s_0 < s_1 < · · · < s_n = x }.
The supremum above is taken over partitions of the interval [a, x]. F has bounded variation on [a, b] if V_F(b) < ∞. BV[a, b] denotes the space of functions with bounded variation on [a, b] (BV functions).
V_F is a nondecreasing function with V_F(a) = 0. F is a BV function if and only if it is the difference of two bounded nondecreasing functions, and in case F is BV, one way to write this decomposition is
F = ½(V_F + F) − ½(V_F − F)
(the Jordan decomposition of F). If F is BV and right-continuous, then V_F is also right-continuous.
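Definition (1.13) can be probed numerically. For the made-up example F(x) = (x − 1)² on [0, 2], F falls by 1 and then rises by 1, so V_F(2) = 2; sums over nested partitions that avoid the turning point increase toward this supremum.

```python
def variation_sum(F, partition):
    """The sum inside definition (1.13) for one partition s_0 < s_1 < ... < s_n."""
    return sum(abs(F(s1) - F(s0)) for s0, s1 in zip(partition, partition[1:]))

F = lambda x: (x - 1) ** 2                 # decreases on [0, 1], increases on [1, 2]

sums = []
for n in (3, 9, 999):                      # nested partitions of [0, 2] avoiding x = 1
    pts = [2 * k / n for k in range(n + 1)]
    sums.append(variation_sum(F, pts))

# Refining the partition increases the sum toward the supremum V_F(2) = 1 + 1 = 2.
assert sums[0] <= sums[1] <= sums[2] <= 2 + 1e-9
assert abs(sums[2] - 2.0) < 1e-2
```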
Henceforth suppose F is BV and right-continuous on [a, b]. Then there
is a unique signed Borel measure
F
on (a, b] determined by

F
(u, v] = F(v) F(u), a u < v b.
We can obtain this measure from our earlier definition of Lebesgue-Stieltjes measures of nondecreasing functions. Let $F = F_1 - F_2$ be the Jordan decomposition of $F$. Extend these functions outside $[a,b]$ by setting $F_i(x) = F_i(a)$ for $x < a$, and $F_i(x) = F_i(b)$ for $x > b$. Then $\Lambda_F = \Lambda_{F_1} - \Lambda_{F_2}$ is the Jordan decomposition of the measure $\Lambda_F$, where $\Lambda_{F_1}$ and $\Lambda_{F_2}$ are as constructed in Example 1.6. Furthermore, the total variation measure of $\Lambda_F$ is
$$|\Lambda_F| = \Lambda_{F_1} + \Lambda_{F_2} = \Lambda_{V_F},$$
the Lebesgue-Stieltjes measure of the total variation function $V_F$. The integral of a bounded Borel function $g$ on $(a,b]$ with respect to the measure $\Lambda_F$ is of course denoted by
$$\int_{(a,b]} g\,d\Lambda_F \qquad\text{but also by}\qquad \int_{(a,b]} g(x)\,dF(x),$$
and the integral is called a Lebesgue-Stieltjes integral. We shall use both of these notations in the sequel. Especially when $\int g\,dF$ might be confused with a stochastic integral, we prefer $\int g\,d\Lambda_F$. For Lebesgue-Stieltjes integrals inequality (1.11) can be written in the form
(1.14) $\displaystyle \Bigl| \int_{(a,b]} g(x)\,dF(x) \Bigr| \le \int_{(a,b]} |g(x)|\,dV_F(x).$
We considered $\Lambda_F$ a measure on $(a,b]$ rather than $[a,b]$ because, according to the connection between a right-continuous function and its Lebesgue-Stieltjes measure, the measure of the singleton $\{a\}$ is
(1.15) $\displaystyle \Lambda_F\{a\} = F(a) - F(a-).$
This value is determined by how we choose to extend $F$ to $x < a$, and so is not determined by the values of $F$ on $[a,b]$.
Advanced calculus courses sometimes cover a related integral called the Riemann-Stieltjes, or Stieltjes, integral. This is a generalization of the Riemann integral. For bounded functions $g$ and $F$ on $[a,b]$, the Riemann-Stieltjes integral $\int_a^b g\,dF$ is defined by
$$\int_a^b g\,dF = \lim_{\operatorname{mesh}(\pi) \to 0} \sum_i g(x_i)\bigl( F(s_{i+1}) - F(s_i) \bigr)$$
if this limit exists. The notation and the interpretation of the limit are as in (1.6). One can prove that this limit exists for example if $g$ is continuous
16 1. Measures, Integrals, and Foundations of Probability Theory
and $F$ is BV [14, page 282]. The next lemma gives a version of this limit that will be used frequently in the sequel. The left limit function is defined by $f(t-) = \lim_{s \to t,\, s < t} f(s)$, by approaching $t$ strictly from the left, provided these limits exist.
Lemma 1.12. Let $\nu$ be a finite signed measure on $(0,T]$. Let $f$ be a bounded Borel function on $[0,T]$ for which the left limit $f(t-)$ exists at all $0 < t \le T$. Let $\pi^n = \{0 = s^n_1 < \cdots < s^n_{m(n)} = T\}$ be partitions of $[0,T]$ such that $\operatorname{mesh}(\pi^n) \to 0$. Then
$$\lim_{n\to\infty}\; \sup_{0 \le t \le T} \Bigl|\, \sum_i f(s^n_i)\,\nu\bigl( s^n_i \wedge t,\, s^n_{i+1} \wedge t \bigr] - \int_{(0,t]} f(s-)\,\nu(ds) \,\Bigr| = 0.$$
In particular, for a right-continuous function $G \in BV[0,T]$,
$$\lim_{n\to\infty}\; \sup_{0 \le t \le T} \Bigl|\, \sum_i f(s^n_i)\bigl( G(s^n_{i+1} \wedge t) - G(s^n_i \wedge t) \bigr) - \int_{(0,t]} f(s-)\,dG(s) \,\Bigr| = 0.$$
Remark 1.13. It is important here that $f$ is evaluated at the left endpoint of the partition intervals $[s^n_i, s^n_{i+1}]$.
Proof. For each $0 \le t \le T$,
$$\Bigl|\, \sum_i f(s^n_i)\,\nu\bigl( s^n_i \wedge t,\, s^n_{i+1} \wedge t \bigr] - \int_{(0,t]} f(s-)\,\nu(ds) \,\Bigr| \le \int_{(0,t]} \Bigl|\, \sum_i f(s^n_i)\,\mathbf{1}_{(s^n_i, s^n_{i+1}]}(s) - f(s-) \,\Bigr|\, |\nu|(ds)$$
$$\le \int_{(0,T]} \Bigl|\, \sum_i f(s^n_i)\,\mathbf{1}_{(s^n_i, s^n_{i+1}]}(s) - f(s-) \,\Bigr|\, |\nu|(ds),$$
where the last inequality is simply a consequence of increasing the interval of integration to $(0,T]$. The last integral gives a bound that is uniform in $t$, and it vanishes as $n \to \infty$ by the dominated convergence theorem. $\square$
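The role of the left endpoint in Remark 1.13 can be seen numerically. The following sketch (our own, not from the text) takes $f = G = \mathbf{1}_{[1,\infty)}$ on $[0,2]$, so $dG$ is a unit point mass at $1$ and $\int_{(0,2]} f(s-)\,dG(s) = f(1-) = 0$. Left-endpoint sums reproduce this value; right-endpoint sums converge to the wrong number $f(1) = 1$.

```python
# Left- vs right-endpoint Stieltjes sums for f = G = 1_{[1, oo)} on [0, 2].
# G has a single jump of size 1 at t = 1, so the integral of f(s-) dG(s)
# over (0, 2] equals f(1-) = 0.

def G(t):
    return 1.0 if t >= 1 else 0.0

def stieltjes_sum(f, g, grid, endpoint="left"):
    """Partition sum approximating a Stieltjes integral of f dg."""
    total = 0.0
    for s_lo, s_hi in zip(grid[:-1], grid[1:]):
        point = s_lo if endpoint == "left" else s_hi
        total += f(point) * (g(s_hi) - g(s_lo))
    return total

n = 1000
grid = [2 * k / n for k in range(n + 1)]  # partition of [0, 2]

left = stieltjes_sum(G, G, grid, "left")    # matches f(1-) = 0
right = stieltjes_sum(G, G, grid, "right")  # picks up the jump value 1
assert left == 0.0
assert right == 1.0
```

Only the partition interval containing the jump contributes, so the left-endpoint sum sees the value of $f$ just before the jump, which is exactly what the integrand $f(s-)$ records.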
Example 1.14. A basic example is a step function. Let $\{x_i\}$ be a sequence of points in $\mathbf{R}$ (the ordering of the $x_i$'s is immaterial), and $\{\alpha_i\}$ an absolutely summable sequence, which means $\sum_i |\alpha_i| < \infty$. Check that
$$G(t) = \sum_{i:\, x_i \le t} \alpha_i$$
is a right-continuous BV function on any subinterval of $\mathbf{R}$. Both properties follow from
$$|G(t) - G(s)| \le \sum_{i:\, s < x_i \le t} |\alpha_i|.$$
The right-hand side above can be made arbitrarily small by taking $t$ close enough to $s$. The Lebesgue-Stieltjes integral of a bounded Borel function $f$ is
(1.16) $\displaystyle \int_{(0,T]} f\,dG = \sum_{i:\, 0 < x_i \le T} \alpha_i f(x_i).$
To rigorously justify this, first take $f = \mathbf{1}_{(a,b]}$ with $0 \le a < b \le T$ and check that both sides equal $G(b) - G(a)$ (the left-hand side by definition of the Lebesgue-Stieltjes measure). These intervals form a $\pi$-system that generates the Borel $\sigma$-algebra on $(0,T]$. Theorem B.2 can be applied to verify the identity for all bounded Borel functions $f$.
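The identity (1.16) and the justification step can be checked mechanically for a finite family of jumps. A small sketch (jump locations and sizes are our own):

```python
# Step function G(t) = sum of alpha_i over x_i <= t, and the
# Lebesgue-Stieltjes integral (1.16) as a weighted sum of jump values.

xs = [0.25, 0.5, 1.3, 1.3, 1.9]       # jump locations in (0, T]
alphas = [0.5, -1.0, 2.0, 0.25, 1.0]  # absolutely summable jump sizes

def G(t):
    return sum(a for x, a in zip(xs, alphas) if x <= t)

def ls_integral(f, T):
    """Right-hand side of (1.16): sum of alpha_i * f(x_i) over 0 < x_i <= T."""
    return sum(a * f(x) for x, a in zip(xs, alphas) if 0 < x <= T)

# Justification step: for f = 1_{(a,b]} both sides equal G(b) - G(a).
a, b, T = 0.3, 1.5, 2.0
indicator = lambda s: 1.0 if a < s <= b else 0.0
lhs = ls_integral(indicator, T)
assert abs(lhs - (G(b) - G(a))) < 1e-12
```

Here the only jumps in $(0.3, 1.5]$ are $-1.0$, $2.0$ and $0.25$, so both sides equal $1.25$.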
The above proof is a good example of a recurring theme. Suppose the goal is to prove an identity, such as (1.16) above, that is valid for a large class of objects, in this case all bounded Borel functions $f$. Typically we can do an explicit verification for some special cases. If this class of special cases is rich enough, then we can hope to complete the proof by appealing to some general principle that extends the identity from the special class to the entire class.
1.1.10. Radon-Nikodym theorem. This theorem is among the most important in measure theory. We state it here because it gives us the existence of conditional expectations in the next section. First a definition. Suppose $\mu$ is a measure and $\nu$ a signed measure on a measurable space $(X, \mathcal{A})$. We say $\nu$ is absolutely continuous with respect to $\mu$, abbreviated $\nu \ll \mu$, if $\mu(A) = 0$ implies $\nu(A) = 0$ for all $A \in \mathcal{A}$.

Theorem 1.15. Let $\mu$ be a $\sigma$-finite measure and $\nu$ a $\sigma$-finite signed measure on a measurable space $(X, \mathcal{A})$. Assume $\nu$ is absolutely continuous with respect to $\mu$. Then there exists a $\mu$-almost everywhere unique $\mathcal{A}$-measurable function $f$ such that at least one of $\int f^{+}\,d\mu$ and $\int f^{-}\,d\mu$ is finite, and for each $A \in \mathcal{A}$,
(1.17) $\displaystyle \nu(A) = \int_A f\,d\mu.$
Some remarks are in order. Since either $\int_A f^{+}\,d\mu$ or $\int_A f^{-}\,d\mu$ is finite, the integral $\int_A f\,d\mu$ has a well-defined value in $[-\infty, \infty]$. The equality of integrals (1.17) extends to measurable functions, so that
(1.18) $\displaystyle \int g\,d\nu = \int g f\,d\mu$
for all $\mathcal{A}$-measurable functions $g$ for which the integrals make sense. The precise sense in which $f$ is unique is this: if $\tilde{f}$ also satisfies (1.17) for all $A \in \mathcal{A}$, then $\mu\{f \ne \tilde{f}\} = 0$.
The function $f$ is the Radon-Nikodym derivative of $\nu$ with respect to $\mu$, and is denoted by $f = d\nu/d\mu$. The derivative notation is very suggestive. It leads to $d\nu = f\,d\mu$, which tells us how to do the substitution in the integral. Also, it suggests that
(1.19) $\displaystyle \frac{d\nu}{d\lambda} \cdot \frac{d\lambda}{d\mu} = \frac{d\nu}{d\mu},$
which is a true theorem under the right assumptions: suppose $\nu$ is a signed measure, and $\lambda$ and $\mu$ positive measures, all $\sigma$-finite, and $\nu \ll \lambda \ll \mu$. Then
$$\int g\,d\nu = \int g\,\frac{d\nu}{d\lambda}\,d\lambda = \int g\,\frac{d\nu}{d\lambda} \cdot \frac{d\lambda}{d\mu}\,d\mu$$
by two applications of (1.18). Since the Radon-Nikodym derivative is unique, the equality above proves (1.19).
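For finitely supported measures the Radon-Nikodym derivative is just a pointwise ratio of masses, so the chain rule (1.19) and the substitution rule (1.18) can be checked directly. A small sketch (all measures and names are our own):

```python
# Three measures on the finite space {0, 1, 2}, with nu << lam << mu.
mu  = {0: 1.0, 1: 2.0, 2: 4.0}
lam = {0: 0.5, 1: 1.0, 2: 1.0}
nu  = {0: 1.5, 1: -1.0, 2: 2.0}   # nu is allowed to be signed

def rn_derivative(num, den):
    """On a finite space, dnum/dden is the pointwise ratio of masses."""
    return {x: num[x] / den[x] for x in den if den[x] != 0}

dnu_dlam = rn_derivative(nu, lam)
dlam_dmu = rn_derivative(lam, mu)
dnu_dmu  = rn_derivative(nu, mu)

# Chain rule (1.19): (dnu/dlam)(dlam/dmu) = dnu/dmu pointwise.
for x in mu:
    assert abs(dnu_dlam[x] * dlam_dmu[x] - dnu_dmu[x]) < 1e-12

# Substitution (1.18): integral of g dnu = integral of g * (dnu/dmu) dmu.
g = {0: 2.0, 1: 3.0, 2: -1.0}
int_g_dnu = sum(g[x] * nu[x] for x in nu)
int_g_f_dmu = sum(g[x] * dnu_dmu[x] * mu[x] for x in mu)
assert abs(int_g_dnu - int_g_f_dmu) < 1e-12
```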
Here is a result that combines the Radon-Nikodym theorem with Lebesgue-Stieltjes integrals.

Lemma 1.16. Suppose $\nu$ is a finite signed Borel measure on $[0,T]$ and $g \in L^1(\nu)$. Let
$$F(t) = \int_{[0,t]} g(s)\,\nu(ds), \qquad 0 \le t \le T.$$
Then $F$ is a right-continuous BV function on $[0,T]$. The Lebesgue-Stieltjes integral of a bounded Borel function $\phi$ on $[0,T]$ satisfies
(1.20) $\displaystyle \int_{(0,T]} \phi(s)\,dF(s) = \int_{(0,T]} \phi(s)\, g(s)\,\nu(ds).$
In abbreviated form, $dF = g\,d\nu$ and $g = d\Lambda_F/d\nu$ on $(0,T]$.
Proof. For right continuity of $F$, let $t_n \downarrow t$. Then
$$|F(t_n) - F(t)| = \Bigl| \int_{[0,T]} \mathbf{1}_{(t,t_n]}\, g\,d\nu \Bigr| \le \int_{[0,T]} \mathbf{1}_{(t,t_n]}\, |g|\,d|\nu|.$$
The last integral vanishes as $t_n \downarrow t$ because $\mathbf{1}_{(t,t_n]}(s) \to 0$ for each point $s$, and the integral converges by dominated convergence. Thus $F(t+) = F(t)$.

For any partition $0 = s_0 < s_1 < \cdots < s_n = T$,
$$\sum_i \bigl| F(s_{i+1}) - F(s_i) \bigr| = \sum_i \Bigl| \int_{(s_i, s_{i+1}]} g\,d\nu \Bigr| \le \sum_i \int_{(s_i, s_{i+1}]} |g|\,d|\nu| = \int_{(0,T]} |g|\,d|\nu|.$$
By the assumption $g \in L^1(\nu)$ the last quantity above is a finite upper bound on the sums of $F$-increments over all partitions. Hence $F \in BV[0,T]$.
The last issue is the equality of the two measures $\Lambda_F$ and $g\,d\nu$ on $(0,T]$. By Lemma B.3 it suffices to check the equality of the two measures on intervals $(a,b]$, because these types of intervals generate the Borel $\sigma$-algebra. And indeed
$$\Lambda_F(a,b] = F(b) - F(a) = \int_{[0,b]} g(s)\,\nu(ds) - \int_{[0,a]} g(s)\,\nu(ds) = \int_{(a,b]} g(s)\,\nu(ds).$$
This suffices. $\square$
The conclusion (1.20) can be extended to $[0,T]$ if we define $F(0-) = 0$. For then
$$\Lambda_F\{0\} = F(0) - F(0-) = F(0) = g(0)\,\nu\{0\} = \int_{\{0\}} g\,d\nu.$$
On the other hand, the conclusion of the lemma on $(0,T]$ would not change if we defined $F(0) = 0$ and
$$F(t) = \int_{(0,t]} g(s)\,\nu(ds), \qquad 0 < t \le T.$$
This changes $F$ by a constant and hence does not affect its total variation or Lebesgue-Stieltjes measure.
1.2. Basic concepts of probability theory
This section summarizes the measure-theoretic foundations of probability theory. Matters related to stochastic processes will be treated in the next chapter.

1.2.1. Probability spaces, random variables and expectations. The foundations of probability are taken directly from measure theory, with notation and terminology adapted to probabilistic conventions. A probability space $(\Omega, \mathcal{F}, P)$ is a measure space with total mass $P(\Omega) = 1$. The probability space is supposed to model the random experiment or collection of experiments that we wish to analyze. The underlying space $\Omega$ is called the sample space, and its sample points $\omega \in \Omega$ are the elementary outcomes of the experiment. The measurable sets in $\mathcal{F}$ are called events. $P$ is a probability measure. A random variable is a measurable function $X : \Omega \to S$ with values in some measurable space $S$. Most often $S = \mathbf{R}$. If $S = \mathbf{R}^d$ one can call $X$ a random vector, and if $S$ is a function space then $X$ is a random function.
Here are some examples to illustrate the terminology.

Example 1.17. Consider the experiment of choosing randomly a person in a room of $N$ people and registering his or her age in years. Then naturally $\Omega$ is the set of people in the room, $\mathcal{F}$ is the collection of all subsets of $\Omega$, and $P\{\omega\} = 1/N$ for each person $\omega \in \Omega$. Let $X(\omega)$ be the age of person $\omega$. Then $X$ is a $\mathbf{Z}_+$-valued measurable function (random variable) on $\Omega$.
Example 1.18. Consider the (thought) experiment of tossing a coin infinitely many times. Let us record the outcomes (heads and tails) as zeroes and ones. The sample space is the space of sequences $\omega = (x_1, x_2, x_3, \dots)$ of zeroes and ones, or $\Omega = \{0,1\}^{\mathbf{N}}$, where $\mathbf{N} = \{1, 2, 3, \dots\}$ is the set of natural numbers. The $\sigma$-algebra $\mathcal{F}$ on $\Omega$ is the product $\sigma$-algebra $\mathcal{B}^{\otimes\mathbf{N}}$, where each factor is the natural $\sigma$-algebra
$$\mathcal{B} = \bigl\{ \emptyset,\, \{0\},\, \{1\},\, \{0,1\} \bigr\}$$
on $\{0,1\}$. To choose the appropriate probability measure on $\Omega$, we need to make assumptions about the coin. Simplest would be to assume that successive coin tosses are independent (a term we discuss below) and fair (heads and tails equally likely). Let $\mathcal{S}$ be the class of events of the form
$$A = \{\omega : (x_1, \dots, x_n) = (a_1, \dots, a_n)\}$$
as $n$ varies over $\mathbf{N}$ and $(a_1, \dots, a_n)$ varies over $n$-tuples of zeroes and ones. Include $\emptyset$ and $\Omega$ to make $\mathcal{S}$ a semialgebra. Our assumptions dictate that the probability of the event $A$ should be $P_0(A) = 2^{-n}$. One needs to check that $P_0$ satisfies the hypotheses of Theorem 1.5, and then the mathematical machinery takes over. There exists a probability measure $P$ on $(\Omega, \mathcal{F})$ that agrees with $P_0$ on $\mathcal{S}$. This is a mathematical model of a sequence of independent fair coin tosses. Natural random variables to define on $\Omega$ are first the coordinate variables $X_i(\omega) = x_i$, and then variables derived from these, such as $S_n = X_1 + \cdots + X_n$, the number of ones among the first $n$ tosses. The random variables $\{X_i\}$ are an example of an i.i.d. sequence, which is short for independent and identically distributed.
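A quick simulation sketch (our own, with Python's pseudorandom generator standing in for the fair coin) is consistent with the cylinder probability $P_0(A) = 2^{-n}$ and with $S_n/n$ settling near $1/2$:

```python
import random

random.seed(0)

# Estimate P{(x1, x2, x3) = (1, 0, 1)}; the model says 2^{-3} = 0.125.
trials = 200_000
hits = 0
for _ in range(trials):
    prefix = (random.randint(0, 1), random.randint(0, 1), random.randint(0, 1))
    hits += (prefix == (1, 0, 1))
estimate = hits / trials
assert abs(estimate - 0.125) < 0.01

# S_n / n, the fraction of ones among n tosses, should be near 1/2.
n = 100_000
S_n = sum(random.randint(0, 1) for _ in range(n))
assert abs(S_n / n - 0.5) < 0.01
```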
The expectation of a real-valued random variable $X$ is simply its Lebesgue integral over the probability space:
$$EX = \int_\Omega X\,dP.$$
The rules governing the existence of the expectation are exactly those inherited from measure theory. The spaces $L^p(P)$ are also defined as for general measure spaces.

The probability distribution (or simply distribution; also the term law is used) $\mu$ of a random variable $X$ is the probability measure obtained when the probability measure $P$ is transported to the real line via
$$\mu(B) = P\{X \in B\}, \qquad B \in \mathcal{B}_{\mathbf{R}}.$$
The expression $\{X \in B\}$ is an abbreviation for the longer set expression $\{\omega \in \Omega : X(\omega) \in B\}$.
If $h$ is a bounded Borel function on $\mathbf{R}$, then $h(X)$ is also a random variable (this means the composition $h \circ X$), and
(1.21) $\displaystyle E\,h(X) = \int_\Omega h(X)\,dP = \int_{\mathbf{R}} h(x)\,\mu(dx).$
This equality is an instance of the change of variables identity (1.5). Notice that we need not even specify the probability space to make this calculation. This is the way things usually work. There must always be a probability space underlying our reasoning, but when situations are simple we can ignore it and perform our calculations in familiar spaces such as the real line or Euclidean spaces.
The (cumulative) distribution function $F$ of a random variable $X$ is defined by $F(x) = P\{X \le x\}$. The distribution $\mu$ is then the Lebesgue-Stieltjes measure of $F$. Using the notation of Lebesgue-Stieltjes integrals, (1.21) can be expressed as
(1.22) $\displaystyle E\,h(X) = \int_{\mathbf{R}} h(x)\,dF(x).$
This is the way expectations are expressed in probability and statistics books that avoid measure theory, relying on the advanced calculus level understanding of the Stieltjes integral.

The density function $f$ of a random variable $X$ is the Radon-Nikodym derivative of its distribution $\mu$ with respect to Lebesgue measure, so $f = d\mu/dx$. It exists iff $\mu$ is absolutely continuous with respect to Lebesgue measure on $\mathbf{R}$. When $f$ exists, the distribution function $F$ is differentiable Lebesgue-almost everywhere, and $F' = f$ Lebesgue-almost everywhere. The expectation can then be expressed as an integral with respect to Lebesgue measure:
(1.23) $\displaystyle E\,h(X) = \int_{\mathbf{R}} h(x) f(x)\,dx.$
This is the way most expectations are evaluated in practice. For example, if $X$ is a rate $\lambda$ exponential random variable ($X \sim \operatorname{Exp}(\lambda)$ in symbols), then
$$E\,h(X) = \int_0^\infty h(x)\,\lambda e^{-\lambda x}\,dx.$$
The concepts discussed above have natural extensions to $\mathbf{R}^d$-valued random vectors.
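Formula (1.23) for the exponential distribution can be sanity-checked by elementary quadrature. A sketch (our own; the midpoint rule and truncation point are arbitrary choices): with $h(x) = x$ one should recover $EX = 1/\lambda$, and with $h(x) = x^2$, $EX^2 = 2/\lambda^2$.

```python
import math

def exp_expectation(h, lam, upper=50.0, steps=200_000):
    """Midpoint-rule value of the integral of h(x) * lam * exp(-lam x) dx."""
    dx = upper / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * dx  # midpoint of the k-th cell
        total += h(x) * lam * math.exp(-lam * x) * dx
    return total

lam = 2.0
mean = exp_expectation(lambda x: x, lam)        # should be 1/lam = 0.5
second = exp_expectation(lambda x: x * x, lam)  # should be 2/lam^2 = 0.5
assert abs(mean - 1 / lam) < 1e-3
assert abs(second - 2 / lam**2) < 1e-3
```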
Example 1.19. Here are the most important probability densities.

(i) On any bounded interval $[a,b]$ of $\mathbf{R}$ there is the uniform distribution $\operatorname{Unif}[a,b]$ with density $f(x) = (b-a)^{-1}\,\mathbf{1}_{[a,b]}(x)$. Whether the endpoints are included is immaterial because it does not affect the outcome of any calculation.

(ii) The exponential distribution mentioned above is a special case of the $\operatorname{Gamma}(\alpha, \lambda)$ distribution on $\mathbf{R}_+$ with density $f(x) = \Gamma(\alpha)^{-1} \lambda (\lambda x)^{\alpha - 1} e^{-\lambda x}$. The two parameters satisfy $\alpha, \lambda > 0$. The gamma function is $\Gamma(\alpha) = \int_0^\infty x^{\alpha - 1} e^{-x}\,dx$.

(iii) For $\alpha, \beta > 0$ the $\operatorname{Beta}(\alpha, \beta)$ distribution on $(0,1)$ has density $f(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha-1} (1-x)^{\beta-1}$.

(iv) For a vector $v \in \mathbf{R}^d$ ($d \ge 1$) and a symmetric, nonnegative definite, nonsingular $d \times d$ matrix $\Gamma$, the normal (or Gaussian) distribution $\mathcal{N}(v, \Gamma)$ on $\mathbf{R}^d$ has density
(1.24) $\displaystyle f(x) = \frac{1}{(2\pi)^{d/2} \sqrt{\det \Gamma}}\, \exp\Bigl( -\tfrac{1}{2}\, (x-v)^T \Gamma^{-1} (x-v) \Bigr).$
Above, and in general, we regard a $d$-vector $v$ as a $d \times 1$ matrix with $1 \times d$ transpose $v^T$.
If $\Gamma$ is singular then the $\mathcal{N}(v, \Gamma)$ distribution can be characterized by its characteristic function (the probabilistic term for Fourier transform)
(1.25) $\displaystyle E\bigl( e^{i s^T X} \bigr) = \exp\bigl( i s^T v - \tfrac{1}{2}\, s^T \Gamma s \bigr), \qquad s \in \mathbf{R}^d,$
where $X$ represents the $\mathbf{R}^d$-valued $\mathcal{N}(v, \Gamma)$-distributed random vector and $i$ is the imaginary unit $\sqrt{-1}$. This probability distribution is supported on the image of $\mathbf{R}^d$ under $\Gamma$.

The $\mathcal{N}(0,1)$ distribution on $\mathbf{R}$ is called the standard normal distribution. When $d > 1$, $\mathcal{N}(v, \Gamma)$ is called a multivariate normal.
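One standard way to realize $\mathcal{N}(v, \Gamma)$, not discussed in the text, is to set $X = v + LZ$ where $LL^T = \Gamma$ is a Cholesky factorization and $Z$ has i.i.d. standard normal coordinates. A two-dimensional sketch with made-up parameters:

```python
import math
import random

random.seed(1)

# Target N(v, Gamma) in d = 2, with Gamma = L L^T (Cholesky done by hand).
v = (1.0, -2.0)
Gamma = ((4.0, 2.0),
         (2.0, 3.0))
l11 = math.sqrt(Gamma[0][0])             # 2.0
l21 = Gamma[1][0] / l11                  # 1.0
l22 = math.sqrt(Gamma[1][1] - l21**2)    # sqrt(2)

def sample():
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    return (v[0] + l11 * z1, v[1] + l21 * z1 + l22 * z2)

n = 100_000
xs = [sample() for _ in range(n)]
mean1 = sum(x[0] for x in xs) / n
mean2 = sum(x[1] for x in xs) / n
cov12 = sum((x[0] - mean1) * (x[1] - mean2) for x in xs) / n

# Sample mean should be near v and the off-diagonal covariance near 2.
assert abs(mean1 - v[0]) < 0.05 and abs(mean2 - v[1]) < 0.05
assert abs(cov12 - Gamma[0][1]) < 0.1
```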
One more terminological change as we switch from analysis to probability: almost everywhere (a.e.) becomes almost surely (a.s.). But of course there is no harm in using both.

Equality of random variables $X$ and $Y$ has the same meaning as equality for any functions: if $X$ and $Y$ are defined on the same sample space $\Omega$, then they are equal as functions if $X(\omega) = Y(\omega)$ for each $\omega \in \Omega$. Often we cannot really control what happens on null sets (sets of probability zero), so the more relevant notion is almost sure equality: $X = Y$ a.s. if $P(X = Y) = 1$. We also talk about equality in distribution of $X$ and $Y$, which means that $P(X \in B) = P(Y \in B)$ for all measurable sets $B$ in the (common) range space of $X$ and $Y$. This is abbreviated $X \overset{d}{=} Y$ and makes sense even if $X$ and $Y$ are defined on different probability spaces.
1.2.2. Convergence of random variables. Here is a list of ways in which random variables can converge. Except for convergence in distribution, they are direct adaptations of the corresponding modes of convergence from analysis.

Definition 1.20. Let $\{X_n\}$ be a sequence of random variables and $X$ a random variable, all real-valued.

(a) $X_n \to X$ almost surely if
$$P\bigl\{ \omega : \lim_{n\to\infty} X_n(\omega) = X(\omega) \bigr\} = 1.$$

(b) $X_n \to X$ in probability if for every $\varepsilon > 0$,
$$\lim_{n\to\infty} P\bigl\{ \omega : |X_n(\omega) - X(\omega)| \ge \varepsilon \bigr\} = 0.$$

(c) $X_n \to X$ in $L^p$ for $1 \le p < \infty$ if
$$\lim_{n\to\infty} E\bigl[ |X_n(\omega) - X(\omega)|^p \bigr] = 0.$$

(d) $X_n \to X$ in distribution (also called weakly) if
$$\lim_{n\to\infty} P\{X_n \le x\} = P\{X \le x\}$$
for each $x$ at which $F(x) = P\{X \le x\}$ is continuous.

Convergence types (a)-(c) require that all the random variables are defined on the same probability space, but (d) does not.
The definition of weak convergence above is a specialization to the real line of the general definition, which is this: let $\mu_n$ and $\mu$ be Borel probability measures on a metric space $S$. Then $\mu_n \to \mu$ weakly if
$$\int_S g\,d\mu_n \longrightarrow \int_S g\,d\mu$$
for all bounded continuous functions $g$ on $S$. Random variables converge weakly if their distributions do in the sense above. Commonly used notation for weak convergence is $X_n \Rightarrow X$.

Here is a summary of the relationships between the different types of convergence. We need one new definition: a sequence $\{X_n\}$ of random variables is uniformly integrable if
(1.26) $\displaystyle \lim_{M\to\infty}\; \sup_{n \in \mathbf{N}}\; E\bigl[ |X_n| \cdot \mathbf{1}_{\{|X_n| \ge M\}} \bigr] = 0.$
Theorem 1.21. Let $\{X_n\}$ and $X$ be real-valued random variables on a common probability space.

(i) If $X_n \to X$ almost surely or in $L^p$ for some $1 \le p < \infty$, then $X_n \to X$ in probability.

(ii) If $X_n \to X$ in probability, then $X_n \to X$ weakly.

(iii) If $X_n \to X$ in probability, then there exists a subsequence $X_{n_k}$ such that $X_{n_k} \to X$ almost surely.

(iv) Suppose $X_n \to X$ in probability. Then $X_n \to X$ in $L^1$ iff $\{X_n\}$ is uniformly integrable.
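A sketch of part (iv) failing without uniform integrability (a standard example, not from the text): on $\Omega = [0,1]$ with Lebesgue measure, let $X_n = n \cdot \mathbf{1}_{[0,1/n]}$. Then $X_n \to 0$ in probability, but $E|X_n| = 1$ for every $n$, and condition (1.26) fails. Since the $X_n$ are simple functions, everything can be computed exactly:

```python
# X_n = n on [0, 1/n] and 0 elsewhere, on [0, 1] with Lebesgue measure.

def prob_exceeds(n, eps):
    """P{|X_n| >= eps}: the jump value n exceeds eps iff n >= eps."""
    return 1.0 / n if n >= eps else 0.0

def l1_norm(n):
    """E|X_n| = n * P([0, 1/n]) = 1 for every n."""
    return n * (1.0 / n)

def ui_tail(n, M):
    """E[|X_n| 1{|X_n| >= M}] = 1 if n >= M, else 0."""
    return 1.0 if n >= M else 0.0

eps = 0.5
assert prob_exceeds(10**6, eps) < 1e-5                        # P -> 0
assert all(abs(l1_norm(n) - 1.0) < 1e-12 for n in range(1, 100))  # no L^1 limit
# For every cutoff M the sup over n of the tail expectation is 1: UI fails.
assert max(ui_tail(n, M=1000) for n in range(1, 5000)) == 1.0
```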
1.2.3. Independence and conditioning. Fix a probability space $(\Omega, \mathcal{F}, P)$. In probability theory, $\sigma$-algebras represent information. $\mathcal{F}$ represents all the information about the experiment, and sub-$\sigma$-algebras $\mathcal{A}$ of $\mathcal{F}$ represent partial information. Knowing a $\sigma$-algebra $\mathcal{A}$ means knowing for each event $A \in \mathcal{A}$ whether $A$ happened or not. A common way to create sub-$\sigma$-algebras is to generate them with random variables. If $X$ is a random variable on $\Omega$, then the $\sigma$-algebra generated by $X$ is denoted by $\sigma(X)$, and it is given by the collection of inverse images of Borel sets:
$$\sigma(X) = \bigl\{ \{X \in B\} : B \in \mathcal{B}_{\mathbf{R}} \bigr\}.$$
Measurability of $X$ is exactly the same as $\sigma(X) \subseteq \mathcal{F}$.

Knowing the actual value of $X$ is the same as knowing whether $\{X \in B\}$ happened for each $B \in \mathcal{B}_{\mathbf{R}}$. But of course there may be many sample points $\omega$ that have the same values for $X$, so knowing $X$ does not necessarily allow us to determine which outcome $\omega$ actually happened. In this sense $X$ represents partial information.
Elementary probability courses define two events $A$ and $B$ to be independent if $P(A \cap B) = P(A)P(B)$. The conditional probability of $A$, given $B$, is defined as
(1.27) $\displaystyle P(A|B) = \frac{P(A \cap B)}{P(B)},$
provided $P(B) > 0$. Thus the independence of $A$ and $B$ can be equivalently expressed as $P(A|B) = P(A)$. This reveals the meaning of independence: knowing that $B$ happened (in other words, conditioning on $B$) does not change our probability for $A$.

One technical reason we need to go beyond these elementary definitions is that we need to routinely condition on events of probability zero. For example, suppose $X$ and $Y$ are independent random variables, both uniformly distributed on $[0,1]$, and we set $Z = X + Y$. Then we would all agree that $P(Z \ge \tfrac{1}{2} \,|\, Y = \tfrac{1}{3}) = \tfrac{5}{6}$, yet since $P(Y = \tfrac{1}{3}) = 0$ this conditional probability cannot be defined in the above manner.
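The value $5/6$ can be recovered by conditioning on the narrow band $\{|Y - 1/3| < \delta\}$ instead of the null event $\{Y = 1/3\}$ and letting $\delta$ shrink. A quick simulation sketch (our own):

```python
import random

random.seed(2)

# Estimate P(Z >= 1/2 | Y in (1/3 - delta, 1/3 + delta)) for Z = X + Y,
# X, Y independent Unif[0,1].  As delta -> 0 this approaches
# P(X >= 1/6) = 5/6.
delta = 0.01
hits = band = 0
for _ in range(1_000_000):
    x, y = random.random(), random.random()
    if abs(y - 1 / 3) < delta:
        band += 1
        hits += (x + y >= 0.5)
cond_prob = hits / band
assert abs(cond_prob - 5 / 6) < 0.02
```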
The general definition of independence, from which various other definitions follow as special cases, is the independence of $\sigma$-algebras.

Definition 1.22. Let $\mathcal{A}_1, \mathcal{A}_2, \dots, \mathcal{A}_n$ be sub-$\sigma$-algebras of $\mathcal{F}$. Then $\mathcal{A}_1, \mathcal{A}_2, \dots, \mathcal{A}_n$ are mutually independent (or simply independent) if, for every choice of events $A_1 \in \mathcal{A}_1, A_2 \in \mathcal{A}_2, \dots, A_n \in \mathcal{A}_n$,
(1.28) $\displaystyle P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)\,P(A_2) \cdots P(A_n).$
An arbitrary collection $\{\mathcal{A}_i : i \in \mathcal{I}\}$ of sub-$\sigma$-algebras of $\mathcal{F}$ is independent if each finite subcollection is independent.
The more concrete notions of independence of random variables and independence of events derive from the above definition.

Definition 1.23. A collection of random variables $\{X_i : i \in \mathcal{I}\}$ on a probability space $(\Omega, \mathcal{F}, P)$ is independent if the $\sigma$-algebras $\{\sigma(X_i) : i \in \mathcal{I}\}$ generated by the individual random variables are independent. Equivalently, for any finite set of distinct indices $i_1, i_2, \dots, i_n$ and any measurable sets $B_1, B_2, \dots, B_n$ from the range spaces of the random variables, we have
(1.29) $\displaystyle P\{X_{i_1} \in B_1,\, X_{i_2} \in B_2,\, \dots,\, X_{i_n} \in B_n\} = \prod_{k=1}^n P\{X_{i_k} \in B_k\}.$
Finally, events $\{A_i : i \in \mathcal{I}\}$ are independent if the corresponding indicator random variables $\{\mathbf{1}_{A_i} : i \in \mathcal{I}\}$ are independent.
Some remarks about the definitions. The product property extends to all expectations that are well-defined. If $\mathcal{A}_1, \dots, \mathcal{A}_n$ are independent $\sigma$-algebras, and $Z_1, \dots, Z_n$ are integrable random variables such that $Z_i$ is $\mathcal{A}_i$-measurable ($1 \le i \le n$) and the product $Z_1 Z_2 \cdots Z_n$ is integrable, then
(1.30) $\displaystyle E\bigl[ Z_1 Z_2 \cdots Z_n \bigr] = EZ_1 \cdot EZ_2 \cdots EZ_n.$
If the variables $Z_i$ are $[0,\infty]$-valued then this identity holds regardless of integrability, because the expectations are limits of expectations of truncated random variables.
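Identity (1.30) can be verified exactly by enumeration in a finite example. A sketch (our own) with two independent fair dice, using exact rational arithmetic:

```python
from fractions import Fraction

# Two independent fair dice Z1, Z2: the joint law of (Z1, Z2) is uniform
# on the 36 pairs.  Check (1.30): E[Z1 Z2] = EZ1 * EZ2.
faces = range(1, 7)
p = Fraction(1, 36)

E_Z1 = sum(Fraction(z1) * Fraction(1, 6) for z1 in faces)
E_Z2 = E_Z1
E_product = sum(Fraction(z1 * z2) * p for z1 in faces for z2 in faces)

assert E_Z1 == Fraction(7, 2)
assert E_product == E_Z1 * E_Z2 == Fraction(49, 4)
```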
Independence is closely tied to the notion of product measure. Let $\mu$ be the distribution of the random vector $X = (X_1, X_2, \dots, X_n)$ on $\mathbf{R}^n$, and let $\mu_i$ be the distribution of the component $X_i$ on $\mathbf{R}$. Then the variables $X_1, X_2, \dots, X_n$ are independent iff $\mu = \mu_1 \otimes \mu_2 \otimes \cdots \otimes \mu_n$.
Further specialization yields properties familiar from elementary probability. For example, if the random vector $(X, Y)$ has a density $f(x,y)$ on $\mathbf{R}^2$, then $X$ and $Y$ are independent iff $f(x,y) = f_X(x) f_Y(y)$, where $f_X$ and $f_Y$ are the marginal densities of $X$ and $Y$. Also, it is enough to check properties (1.28) and (1.29) for classes of sets that are closed under intersections and generate the $\sigma$-algebras in question. (This is a consequence of the so-called $\pi$-$\lambda$ theorem; see Lemma B.3 in the Appendix.) Hence we get the familiar criterion for independence in terms of cumulative distribution functions:
(1.31) $\displaystyle P\{X_{i_1} \le t_1,\, X_{i_2} \le t_2,\, \dots,\, X_{i_n} \le t_n\} = \prod_{k=1}^n P\{X_{i_k} \le t_k\}.$
Independence is a special property, and always useful when it is present. The key tool for handling dependence (that is, lack of independence) is the notion of conditional expectation. It is a nontrivial concept, but fundamental to just about everything that follows in this book.

Definition 1.24. Let $X \in L^1(P)$ and let $\mathcal{A}$ be a sub-$\sigma$-field of $\mathcal{F}$. The conditional expectation of $X$, given $\mathcal{A}$, is the integrable, $\mathcal{A}$-measurable random variable $Y$ that satisfies
(1.32) $\displaystyle \int_A X\,dP = \int_A Y\,dP \quad\text{for all } A \in \mathcal{A}.$
The notation for the conditional expectation is $Y(\omega) = E(X|\mathcal{A})(\omega)$. It is almost surely unique; in other words, if $\tilde{Y}$ is $\mathcal{A}$-measurable and satisfies (1.32), then $P\{Y = \tilde{Y}\} = 1$.
Justification of the definition. The existence of the conditional expectation follows from the Radon-Nikodym theorem. Define a finite signed measure $\nu$ on $(\Omega, \mathcal{A})$ by
$$\nu(A) = \int_A X\,dP, \qquad A \in \mathcal{A}.$$
$P(A) = 0$ implies $\nu(A) = 0$, and so $\nu \ll P$. By the Radon-Nikodym theorem there exists a Radon-Nikodym derivative $Y = d\nu/dP$ which is $\mathcal{A}$-measurable and satisfies
$$\int_A Y\,dP = \nu(A) = \int_A X\,dP \quad\text{for all } A \in \mathcal{A}.$$
$Y$ is integrable because
$$\int_\Omega Y^{+}\,dP = \int_{\{Y \ge 0\}} Y\,dP = \nu\{Y \ge 0\} = \int_{\{Y \ge 0\}} X\,dP \le \int |X|\,dP < \infty$$
and a similar bound can be given for $\int_\Omega Y^{-}\,dP$.
To prove uniqueness, suppose $\tilde{Y}$ satisfies the same properties as $Y$. Let $A = \{Y > \tilde{Y}\}$. This is an $\mathcal{A}$-measurable event. On $A$, $Y - \tilde{Y} = (Y - \tilde{Y})^{+}$, while on $A^c$, $(Y - \tilde{Y})^{+} = 0$. Consequently
$$\int (Y - \tilde{Y})^{+}\,dP = \int_A (Y - \tilde{Y})\,dP = \int_A Y\,dP - \int_A \tilde{Y}\,dP = \int_A X\,dP - \int_A X\,dP = 0.$$
The integral of a nonnegative function vanishes iff the function vanishes almost everywhere. Thus $(Y - \tilde{Y})^{+} = 0$ almost surely. A similar argument shows $(Y - \tilde{Y})^{-} = 0$, and so $|Y - \tilde{Y}| = 0$ almost surely. $\square$
The defining property (1.32) of the conditional expectation extends to
(1.33) $\displaystyle \int_\Omega Z X\,dP = \int_\Omega Z\, E(X|\mathcal{A})\,dP$
for any bounded $\mathcal{A}$-measurable random variable $Z$. Boundedness of $Z$ guarantees that $ZX$ and $Z\,E(X|\mathcal{A})$ are integrable for an integrable random variable $X$.
Some notational conventions. When $X = \mathbf{1}_B$ is the indicator random variable of an event $B$, we can write $P(B|\mathcal{A})$ for $E(\mathbf{1}_B|\mathcal{A})$. When the conditioning $\sigma$-algebra is generated by a random variable $Y$, so $\mathcal{A} = \sigma(Y)$, we can write $E(X|Y)$ instead of $E(X|\sigma(Y))$.

Sometimes one also sees the conditional expectation $E(X|Y = y)$, regarded as a function of $y \in \mathbf{R}$ (assuming now that $Y$ is real-valued). This is defined by an additional step. Since $E(X|Y)$ is $\sigma(Y)$-measurable, there exists a Borel function $h$ such that $E(X|Y) = h(Y)$. This is an instance of a general exercise according to which every $\sigma(Y)$-measurable random variable is a Borel function of $Y$. Then one uses $h$ to define $E(X|Y = y) = h(y)$. This conditional expectation works with integrals on the real line with respect to the distribution $\mu_Y$ of $Y$: for any $B \in \mathcal{B}_{\mathbf{R}}$,
(1.34) $\displaystyle E\bigl[ \mathbf{1}_B(Y)\, X \bigr] = \int_B E(X|Y=y)\,\mu_Y(dy).$
The definition of the conditional expectation is abstract, and it takes practice to get used to the idea of conditional probabilities and expectations as random variables rather than as numbers. The task is to familiarize oneself with this concept by working with it. Eventually one will understand how it actually does everything we need. The typical way to find conditional expectations is to make an educated guess, based on an intuitive understanding of the situation, and then verify the definition. The $\mathcal{A}$-measurability is usually built into the guess, so what needs to be checked is (1.32). Whatever its manifestation, conditional expectation always involves averaging over some portion of the sample space. This is especially clear in this simplest of examples.
Example 1.25. Let $A$ be an event such that $0 < P(A) < 1$, and let $\mathcal{A} = \{\emptyset, \Omega, A, A^c\}$. Then
(1.35) $\displaystyle E(X|\mathcal{A})(\omega) = \frac{E(\mathbf{1}_A X)}{P(A)}\,\mathbf{1}_A(\omega) + \frac{E(\mathbf{1}_{A^c} X)}{P(A^c)}\,\mathbf{1}_{A^c}(\omega).$
Let us check (1.32) for $A$. Let $Y$ denote the right-hand side of (1.35). Then
$$\int_A Y\,dP = \int_A \frac{E(\mathbf{1}_A X)}{P(A)}\,\mathbf{1}_A(\omega)\,P(d\omega) = \frac{E(\mathbf{1}_A X)}{P(A)} \int_A \mathbf{1}_A(\omega)\,P(d\omega) = E(\mathbf{1}_A X) = \int_A X\,dP.$$
A similar calculation checks $\int_{A^c} Y\,dP = \int_{A^c} X\,dP$, and adding these together gives the integral over $\Omega$. $\emptyset$ is of course trivial, since any integral over $\emptyset$ equals zero. See Exercise 1.13 for a generalization of this example.
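Formula (1.35) is easy to check by brute force on a finite sample space. A sketch (our own small example):

```python
# Omega = {0,...,5} with uniform P; A = {0,1,2}; X(w) = w.
# E(X|A)(w) averages X over whichever cell of {A, A^c} contains w.
omega = range(6)
P = {w: 1 / 6 for w in omega}
A = {0, 1, 2}
X = {w: float(w) for w in omega}

def cond_exp(w):
    cell = A if w in A else set(omega) - A
    p_cell = sum(P[u] for u in cell)
    return sum(X[u] * P[u] for u in cell) / p_cell   # E(1_cell X)/P(cell)

# On A the value is the average of {0,1,2} = 1.0; on A^c it is 4.0.
assert abs(cond_exp(0) - 1.0) < 1e-9 and abs(cond_exp(2) - 1.0) < 1e-9
assert abs(cond_exp(5) - 4.0) < 1e-9

# Defining property (1.32): integrals of X and E(X|A) over A agree.
lhs = sum(X[w] * P[w] for w in A)
rhs = sum(cond_exp(w) * P[w] for w in A)
assert abs(lhs - rhs) < 1e-12
```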
Here is a concrete case. Let $X \sim \operatorname{Exp}(\lambda)$, and suppose we are allowed to know whether $X \le c$ or $X > c$. What is our updated expectation for $X$? To model this, let us take $(\Omega, \mathcal{F}, P) = (\mathbf{R}_+, \mathcal{B}_{\mathbf{R}_+}, \mu)$ with $\mu(dx) = \lambda e^{-\lambda x}\,dx$, the identity random variable $X(\omega) = \omega$, and the sub-$\sigma$-algebra $\mathcal{A} = \{\emptyset, \Omega, [0,c], (c,\infty)\}$. For example, for any $\omega \in (c,\infty)$, we would compute
$$E(X|\mathcal{A})(\omega) = \frac{1}{\mu(c,\infty)} \int_c^\infty x\,\lambda e^{-\lambda x}\,dx = e^{\lambda c}\Bigl( c\, e^{-\lambda c} + \tfrac{1}{\lambda}\, e^{-\lambda c} \Bigr) = c + \tfrac{1}{\lambda}.$$
You probably knew the answer from the memoryless property of the exponential distribution? If not, see Exercise 1.11. Exercise 1.14 asks you to complete this example.
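The value $c + 1/\lambda$ on $(c, \infty)$ can be double-checked numerically. A sketch (our own; midpoint quadrature with an arbitrary truncation point):

```python
import math

# Check E(X|A) = c + 1/lam on (c, inf) for X ~ Exp(lam):
# (1/mu(c, inf)) * integral_c^inf x * lam * exp(-lam x) dx.
lam, c = 1.5, 2.0
upper, steps = c + 40.0, 400_000
dx = (upper - c) / steps
num = 0.0
for k in range(steps):
    x = c + (k + 0.5) * dx
    num += x * lam * math.exp(-lam * x) * dx
cond_mean = num / math.exp(-lam * c)   # mu(c, inf) = exp(-lam c)
assert abs(cond_mean - (c + 1 / lam)) < 1e-3
```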
The next theorem lists the main properties of the conditional expectation. Equalities and inequalities concerning conditional expectations are almost sure statements, even though we do not indicate this below, because the conditional expectation is defined only up to null sets.

Theorem 1.26. Let $(\Omega, \mathcal{F}, P)$ be a probability space, $X$ and $Y$ integrable random variables on $\Omega$, and $\mathcal{A}$ and $\mathcal{B}$ sub-$\sigma$-fields of $\mathcal{F}$.

(i) $E[E(X|\mathcal{A})] = EX$.

(ii) $E[\alpha X + \beta Y \,|\, \mathcal{A}] = \alpha E[X|\mathcal{A}] + \beta E[Y|\mathcal{A}]$ for $\alpha, \beta \in \mathbf{R}$.

(iii) If $X \le Y$ then $E[X|\mathcal{A}] \le E[Y|\mathcal{A}]$.

(iv) If $X$ is $\mathcal{A}$-measurable, then $E[X|\mathcal{A}] = X$.

(v) If $X$ is $\mathcal{A}$-measurable and $XY$ is integrable, then
(1.36) $\displaystyle E[XY|\mathcal{A}] = X\, E[Y|\mathcal{A}].$

(vi) If $X$ is independent of $\mathcal{A}$ (which means that the $\sigma$-algebras $\sigma(X)$ and $\mathcal{A}$ are independent), then $E[X|\mathcal{A}] = EX$.

(vii) If $\mathcal{A} \subseteq \mathcal{B}$, then
$$E\{E(X|\mathcal{A})\,|\,\mathcal{B}\} = E\{E(X|\mathcal{B})\,|\,\mathcal{A}\} = E[X|\mathcal{A}].$$

(viii) If $\mathcal{A} \subseteq \mathcal{B}$ and $E(X|\mathcal{B})$ is $\mathcal{A}$-measurable, then $E(X|\mathcal{B}) = E(X|\mathcal{A})$.

(ix) (Jensen's inequality) Suppose $f$ is a convex function on $(a,b)$, $-\infty \le a < b \le \infty$. This means that
$$f(\theta x + (1-\theta) y) \le \theta f(x) + (1-\theta) f(y) \quad\text{for } x, y \in (a,b) \text{ and } 0 < \theta < 1.$$
Assume $P\{a < X < b\} = 1$. Then
(1.37) $\displaystyle f\bigl( E[X|\mathcal{A}] \bigr) \le E\bigl[ f(X)\,|\,\mathcal{A} \bigr],$
provided the conditional expectations are well defined.

(x) Suppose $X$ is a random variable with values in a measurable space $(S_1, \mathcal{H}_1)$, $Y$ is a random variable with values in a measurable space $(S_2, \mathcal{H}_2)$, and $\phi : S_1 \times S_2 \to \mathbf{R}$ is a measurable function such that $\phi(X,Y)$ is integrable. Assume that $X$ is $\mathcal{A}$-measurable while $Y$ is independent of $\mathcal{A}$. Then
(1.38) $\displaystyle E[\phi(X,Y)\,|\,\mathcal{A}](\omega) = \int_\Omega \phi\bigl( X(\omega), Y(\tilde\omega) \bigr)\,P(d\tilde\omega).$
Proof. The proofs must appeal to the definition of the conditional expectation. We leave them mostly as exercises or to be looked up in any graduate probability textbook. Let us prove (v) and (vii) as examples.

Proof of part (v). We need to check that $X\,E[Y|\mathcal{A}]$ satisfies the definition of $E[XY|\mathcal{A}]$. The $\mathcal{A}$-measurability of $X\,E[Y|\mathcal{A}]$ is true because $X$ is $\mathcal{A}$-measurable by assumption, $E[Y|\mathcal{A}]$ is $\mathcal{A}$-measurable by definition, and multiplication preserves $\mathcal{A}$-measurability. Then we need to check that
(1.39) $\displaystyle E\bigl[ \mathbf{1}_A XY \bigr] = E\bigl[ \mathbf{1}_A X\, E[Y|\mathcal{A}] \bigr]$
for an arbitrary $A \in \mathcal{A}$. If $X$ were bounded, this would be a special case of (1.33) with $Z$ replaced by $\mathbf{1}_A X$. For the general case we need to check the integrability of $X\,E[Y|\mathcal{A}]$ before we can really write down the right-hand side of (1.39).
Let us assume first that both $X$ and $Y$ are nonnegative. Then also $E(Y|\mathcal{A}) \ge 0$ by (iii), because $E(0|\mathcal{A}) = 0$ by (iv). Let $X^{(k)} = X \wedge k$ be a truncation of $X$. We can apply (1.33) to get
(1.40) $\displaystyle E\bigl[ \mathbf{1}_A X^{(k)} Y \bigr] = E\bigl[ \mathbf{1}_A X^{(k)} E[Y|\mathcal{A}] \bigr].$
Inside both expectations we have nonnegative random variables that increase with $k$. By the monotone convergence theorem we can let $k \to \infty$ and recover (1.39) in the limit, for nonnegative $X$ and $Y$. In particular, this tells us that, at least if $X, Y \ge 0$, the integrability of $X$, $Y$ and $XY$ implies that $X\,E(Y|\mathcal{A})$ is integrable.
Now decompose $X = X^{+} - X^{-}$ and $Y = Y^{+} - Y^{-}$. By property (ii),
$$E(Y|\mathcal{A}) = E(Y^{+}|\mathcal{A}) - E(Y^{-}|\mathcal{A}).$$
The left-hand side of (1.39) becomes
$$E\bigl[ \mathbf{1}_A X^{+} Y^{+} \bigr] - E\bigl[ \mathbf{1}_A X^{-} Y^{+} \bigr] - E\bigl[ \mathbf{1}_A X^{+} Y^{-} \bigr] + E\bigl[ \mathbf{1}_A X^{-} Y^{-} \bigr].$$
The integrability assumption is true for all the pairs $X^{\pm} Y^{\pm}$, so to each term above we can apply the case of (1.39) already proved for nonnegative random variables. The expression becomes
$$E\bigl[ \mathbf{1}_A X^{+} E(Y^{+}|\mathcal{A}) \bigr] - E\bigl[ \mathbf{1}_A X^{-} E(Y^{+}|\mathcal{A}) \bigr] - E\bigl[ \mathbf{1}_A X^{+} E(Y^{-}|\mathcal{A}) \bigr] + E\bigl[ \mathbf{1}_A X^{-} E(Y^{-}|\mathcal{A}) \bigr].$$
For integrable random variables, a sum of expectations can be combined into an expectation of a sum. Consequently the sum above becomes the right-hand side of (1.39). This completes the proof of part (v).
Proof of part (vii). That $E\{E(X|\mathcal{A})\,|\,\mathcal{B}\} = E[X|\mathcal{A}]$ follows from part (iv). To prove $E\{E(X|\mathcal{B})\,|\,\mathcal{A}\} = E[X|\mathcal{A}]$, we show that $E[X|\mathcal{A}]$ satisfies the definition of $E\{E(X|\mathcal{B})\,|\,\mathcal{A}\}$. Again the measurability is not a problem. Then we need to check that for any $A \in \mathcal{A}$,
$$\int_A E(X|\mathcal{B})\,dP = \int_A E(X|\mathcal{A})\,dP.$$
This is true because $A$ lies in both $\mathcal{A}$ and $\mathcal{B}$, so both sides equal $\int_A X\,dP$. $\square$
There is a geometric way of looking at $E(X|\mathcal{A})$ as the solution to an optimization or estimation problem. Assume that $X \in L^2(P)$. Then what is the best $\mathcal{A}$-measurable estimate of $X$ in the mean-square sense? In other words, find the $\mathcal{A}$-measurable random variable $Z \in L^2(P)$ that minimizes $E[(X - Z)^2]$. This is a geometric view of $E(X|\mathcal{A})$ because it involves projecting $X$ orthogonally onto the subspace $L^2(\Omega, \mathcal{A}, P)$ of $\mathcal{A}$-measurable $L^2$-random variables.
Proposition 1.27. Let $X \in L^2(P)$. Then $E(X|\mathcal{A}) \in L^2(\Omega, \mathcal{A}, P)$. For all $Z \in L^2(\Omega, \mathcal{A}, P)$,
$$E\bigl[ (X - E[X|\mathcal{A}])^2 \bigr] \le E\bigl[ (X - Z)^2 \bigr]$$
with equality iff $Z = E(X|\mathcal{A})$.
Proof. By Jensen's inequality,
$$E\bigl[ E[X|\mathcal{A}]^2 \bigr] \le E\bigl[ E[X^2|\mathcal{A}] \bigr] = E\bigl[ X^2 \bigr].$$
Consequently $E[X|\mathcal{A}]$ is in $L^2(P)$. Next,
$$\begin{aligned} E\bigl[ (X - Z)^2 \bigr] &= E\bigl[ (X - E[X|\mathcal{A}] + E[X|\mathcal{A}] - Z)^2 \bigr] \\ &= E\bigl[ (X - E[X|\mathcal{A}])^2 \bigr] + 2 E\bigl[ (X - E[X|\mathcal{A}])(E[X|\mathcal{A}] - Z) \bigr] + E\bigl[ (E[X|\mathcal{A}] - Z)^2 \bigr] \\ &= E\bigl[ (X - E[X|\mathcal{A}])^2 \bigr] + E\bigl[ (E[X|\mathcal{A}] - Z)^2 \bigr]. \end{aligned}$$
The cross term of the square vanishes because $E[X|\mathcal{A}] - Z$ is $\mathcal{A}$-measurable, and this justifies the last equality. The last line is minimized by the unique choice $Z = E[X|\mathcal{A}]$. $\square$
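The projection property can be verified by brute force on a finite sample space: among all $\mathcal{A}$-measurable $Z$ (constant on the cells of a partition), the cell averages minimize $E[(X-Z)^2]$. A sketch (the space and values are our own):

```python
import itertools

# Uniform P on Omega = {0,...,5}; A generated by the cells {0,1,2}, {3,4,5}.
# An A-measurable Z is constant on each cell, so it is a pair (z1, z2).
omega = list(range(6))
cells = [[0, 1, 2], [3, 4, 5]]
X = [0.0, 1.0, 2.0, 3.0, 4.0, 10.0]

def mse(z1, z2):
    z = {w: (z1 if w in cells[0] else z2) for w in omega}
    return sum((X[w] - z[w]) ** 2 for w in omega) / len(omega)

# E(X|A) takes the cell-average value on each cell: (1.0, 17/3).
best = tuple(sum(X[w] for w in cell) / len(cell) for cell in cells)

# Grid search over candidates (z1, z2): nothing beats the cell averages.
grid = [k / 4 for k in range(0, 45)]   # 0.0, 0.25, ..., 11.0
for z1, z2 in itertools.product(grid, repeat=2):
    assert mse(z1, z2) >= mse(*best) - 1e-12
```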
1.2.4. Construction of probability spaces. In addition to the usual construction issues of measures that we discussed before, in probability theory we need to construct stochastic processes, which are infinite collections $\{X_t : t\in\mathcal{I}\}$ of random variables. Often the naturally available ingredients for the construction are the finite-dimensional distributions of the process. These are the joint distributions of finite vectors $(X_{t_1}, X_{t_2},\dots,X_{t_n})$ of random variables. Kolmogorov's Extension Theorem, whose proof is based on the general machinery for extending measures, states that the finite-dimensional distributions are all we need. The natural home for the construction is a product space.
To formulate the hypotheses below, we need to consider permutations acting on $n$-tuples of indices from an index set $\mathcal{I}$ and on $n$-vectors from a product space. A permutation $\pi$ is a bijective map of $\{1,2,\dots,n\}$ onto itself, for some finite $n$. If $s = (s_1,s_2,\dots,s_n)$ and $t = (t_1,t_2,\dots,t_n)$ are $n$-tuples, then $t = s\pi$ means that $(t_1,t_2,\dots,t_n) = (s_{\pi(1)}, s_{\pi(2)},\dots,s_{\pi(n)})$. The action of $\pi$ on any $n$-vector is defined similarly, by permuting the coordinates.

Here is the setting for the theorem. $\mathcal{I}$ is an arbitrary index set, and for each $t\in\mathcal{I}$, $(X_t,\mathcal{B}_t)$ is a complete, separable metric space together with its Borel $\sigma$-algebra. Let $X = \prod X_t$ be the product space and $\mathcal{B} = \bigotimes\mathcal{B}_t$ the product $\sigma$-algebra. A generic element of $X$ is written $x = (x_t)_{t\in\mathcal{I}}$.
Theorem 1.28 (Kolmogorov's extension theorem). Suppose that for each ordered $n$-tuple $t = (t_1,t_2,\dots,t_n)$ of distinct indices we are given a probability measure $Q_t$ on the finite product space $(X_t,\mathcal{B}_t) = \bigl(\prod_{k=1}^n X_{t_k},\ \bigotimes_{k=1}^n \mathcal{B}_{t_k}\bigr)$, for all $n\ge 1$. We assume two properties that make $\{Q_t\}$ a consistent family of finite-dimensional distributions:

(i) If $t = s\pi$, then $Q_t = Q_s\circ\pi^{-1}$.

(ii) If $t = (t_1,t_2,\dots,t_{n-1},t_n)$ and $s = (t_1,t_2,\dots,t_{n-1})$, then for $A\in\mathcal{B}_s$, $Q_s(A) = Q_t(A\times X_{t_n})$.

Then there exists a probability measure $P$ on $(X,\mathcal{B})$ whose finite-dimensional marginal distributions are given by $\{Q_t\}$. In other words, for any $t = (t_1,t_2,\dots,t_n)$ and any $B\in\mathcal{B}_t$,
$$(1.41)\qquad P\{x\in X : (x_{t_1}, x_{t_2},\dots,x_{t_n})\in B\} = Q_t(B).$$
We refer the reader to [3, Chapter 12] for a proof of Kolmogorov's theorem in this generality. [4] gives a proof for the case where $\mathcal{I}$ is countable and $X_t = \mathbb{R}$ for each $t$. The main idea of the proof is no different for the more abstract result.
To construct a stochastic process $(X_t)_{t\in\mathcal{I}}$ with prescribed finite-dimensional marginals
$$Q_t(A) = P\{(X_{t_1}, X_{t_2},\dots,X_{t_n})\in A\},$$
apply Kolmogorov's theorem to these inputs, take $(\Omega,\mathcal{F},P) = (X,\mathcal{B},P)$, and for $\omega = x = (x_t)_{t\in\mathcal{I}}$ define the coordinate random variables $X_t(\omega) = x_t$. When $\mathcal{I}$ is countable, such as $\mathbb{Z}_+$, this strategy is perfectly adequate, and gives for example all infinite sequences of i.i.d. random variables. When $\mathcal{I}$ is uncountable, such as $\mathbb{R}_+$, we typically want something more from the construction than merely the correct distributions, namely some regularity for the random function $t\mapsto X_t(\omega)$. This is called path regularity. This issue is addressed in the next chapter.
We will not discuss the proof of Kolmogorov's theorem. Let us observe that hypotheses (i) and (ii) are necessary for the existence of $P$, so nothing unnecessary is assumed in the theorem. Property (ii) is immediate from (1.41) because $(x_{t_1}, x_{t_2},\dots,x_{t_{n-1}})\in A$ iff $(x_{t_1}, x_{t_2},\dots,x_{t_n})\in A\times X_{t_n}$. Property (i) is also clear on intuitive grounds because all it says is that if the coordinates are permuted, their distribution gets permuted too. Here is a rigorous justification. Take a bounded measurable function $f$ on $X_t$. Note that $f\circ\pi$ is then a function on $X_s$, because
$$x = (x_1,\dots,x_n)\in X_s \iff x_i\in X_{s_i}\ (1\le i\le n) \iff x_{\pi(i)}\in X_{t_i}\ (1\le i\le n) \iff x\pi = (x_{\pi(1)},\dots,x_{\pi(n)})\in X_t.$$
Compute as follows, assuming $P$ exists:
$$\begin{aligned}
\int_{X_t} f\,dQ_t &= \int_X f(x_{t_1}, x_{t_2},\dots,x_{t_n})\,P(dx) = \int_X f(x_{s_{\pi(1)}}, x_{s_{\pi(2)}},\dots,x_{s_{\pi(n)}})\,P(dx) \\
&= \int_X (f\circ\pi)(x_{s_1}, x_{s_2},\dots,x_{s_n})\,P(dx) = \int_{X_s} (f\circ\pi)\,dQ_s = \int_{X_t} f\,d(Q_s\circ\pi^{-1}).
\end{aligned}$$
Since $f$ is arbitrary, this says that $Q_t = Q_s\circ\pi^{-1}$.
Exercises

Exercise 1.1. In case you wish to check your understanding of the Lebesgue-Stieltjes integral, compute $\int_{(0,3a]} x\,dF(x)$ with $a > 0$ and
$$F(x) = \begin{cases} \lambda, & 0\le x < a \\ 4 + a - x, & a\le x < 2a \\ (x-2a)^2, & x\ge 2a. \end{cases}$$
You should get $8a^3/3 + a^2/2 - (4+\lambda)a$.
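As an optional numerical sanity check (not part of the exercise; it assumes the constant value of $F$ on $[0,a)$ is $\lambda$, which the stated answer suggests), one can split the integral into the jump terms at $a$ and $2a$ plus the absolutely continuous pieces on $(a,2a)$ and $(2a,3a]$:

```python
# Check Exercise 1.1 numerically for a = 2, lam = 0.5.
a, lam = 2.0, 0.5

def ac_part(g, density, lo, hi, n=100000):
    # midpoint rule for an absolutely continuous piece: integral of g(x) F'(x) dx
    h = (hi - lo) / n
    return sum(g(lo + (k + 0.5) * h) * density(lo + (k + 0.5) * h)
               for k in range(n)) * h

total = (
    a * (4 - lam)                                    # jump of F at a: size 4 - lam
    + ac_part(lambda x: x, lambda x: -1.0, a, 2 * a) # F'(x) = -1 on (a, 2a)
    - 2 * a * (4 - a)                                # jump at 2a: size -(4 - a)
    + ac_part(lambda x: x, lambda x: 2 * (x - 2 * a), 2 * a, 3 * a)
)
expected = 8 * a ** 3 / 3 + a ** 2 / 2 - (4 + lam) * a
```

With $a=2$ and $\lambda=0.5$ both quantities come out to $64/3 - 7$.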
Exercise 1.2. Let $V$ be a continuous, nondecreasing function on $\mathbb{R}$ and $\Lambda$ its Lebesgue-Stieltjes measure. Say $t$ is a point of strict increase for $V$ if $V(s) < V(t) < V(u)$ for all $s < t$ and all $u > t$. Let $I$ be the set of such points. Show that $I$ is a Borel set and $\Lambda(I^c) = 0$.

Hint. For rationals $q$ let $a(q) = \inf\{s\le q : V(s) = V(q)\}$ and $b(q) = \sup\{s\ge q : V(s) = V(q)\}$. Considering $q$ such that $a(q) < b(q)$, show that $I^c$ is a countable union of closed intervals with zero $\Lambda$-measure.
Exercise 1.3. Show that in the definition (1.13) of total variation one cannot in general replace the supremum over partitions by the limit as the mesh of the partition tends to zero. (How about the indicator function of a single point?) But if $F$ has some regularity, for example right-continuity, then the supremum can be replaced by the limit as $\operatorname{mesh}(\pi)\to 0$.
Exercise 1.4. Let $G$ be a continuous BV function on $[0,T]$ with Lebesgue-Stieltjes measure $\Lambda_G$ on $(0,T]$, and let $h$ be a Borel measurable function on $[0,T]$ that is integrable with respect to $\Lambda_G$. Define a function $F$ on $[0,T]$ by $F(0) = 0$ and $F(t) = \int_{(0,t]} h(s)\,dG(s)$ for $t > 0$. Building on Lemma 1.16 and its proof, and remembering also (1.15), show that $F$ is a continuous BV function on $[0,T]$. Note in particular the special case $F(t) = \int_0^t h(s)\,ds$.
Exercise 1.5. For a simple example of the failure of uncountable additivity for probabilities, let $X$ be a $[0,1]$-valued uniformly distributed random variable on $(\Omega,\mathcal{F},P)$. Then $P(X = s) = 0$ for each individual $s\in[0,1]$, but the union of these events over all $s$ is $\Omega$.
Exercise 1.6. Here is a useful formula for computing expectations. Suppose $X$ is a nonnegative random variable, and $h$ is a nondecreasing function on $\mathbb{R}_+$ such that $h(0) = 0$ and $h$ is absolutely continuous on each bounded interval. (This last hypothesis ensures that $h(a) = \int_0^a h'(s)\,ds$ for all $0\le a < \infty$.) Then
$$(1.42)\qquad Eh(X) = \int_0^\infty h'(s)\,P[X > s]\,ds.$$
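For instance (a numerical illustration only; the function names are invented), one can check (1.42) for $X\sim\operatorname{Exp}(1)$ and $h(x) = x^2$, where both sides equal $E[X^2] = 2$:

```python
import math

def tail_formula(h_prime, tail, upper=50.0, n=200000):
    # midpoint rule for the right side of (1.42): integral of h'(s) P[X > s] ds,
    # truncated at `upper` (the exponential tail beyond is negligible)
    h = upper / n
    return sum(h_prime((k + 0.5) * h) * tail((k + 0.5) * h) for k in range(n)) * h

# X ~ Exp(1): P[X > s] = exp(-s); with h(x) = x^2, E[h(X)] = 2
val = tail_formula(lambda s: 2 * s, lambda s: math.exp(-s))
```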
Exercise 1.7. Suppose we need to prove something about a $\sigma$-algebra $\mathcal{B} = \sigma(\mathcal{E})$ on a space $X$, generated by a class $\mathcal{E}$ of subsets of $X$. A common strategy for proving such a result is to identify a suitable class $\mathcal{C}$ containing $\mathcal{E}$ whose members have the desired property. If $\mathcal{C}$ can be shown to be a $\sigma$-field, then $\mathcal{B}\subset\mathcal{C}$ follows and thereby all members of $\mathcal{B}$ have the desired property. Here are useful examples.

(a) Fix two points $x$ and $y$ of the underlying space. Suppose for each $A\in\mathcal{E}$, either $\{x,y\}\subset A$ or $\{x,y\}\subset A^c$. Show that the same property is true for all $A\in\mathcal{B}$. In other words, if the generating sets do not separate $x$ and $y$, neither does the $\sigma$-field.

(b) Suppose $\Phi$ is a collection of functions from a space $X$ into a measurable space $(Y,\mathcal{H})$. Let $\mathcal{B} = \sigma\{f : f\in\Phi\}$ be the smallest $\sigma$-algebra that makes all functions $f\in\Phi$ measurable, as defined in (1.1). Suppose $g$ is a function from another space $\Omega$ into $X$. Let $\Omega$ have $\sigma$-algebra $\mathcal{F}$. Show that $g$ is a measurable function from $(\Omega,\mathcal{F})$ into $(X,\mathcal{B})$ iff for each $f\in\Phi$, $f\circ g$ is a measurable function from $(\Omega,\mathcal{F})$ into $(Y,\mathcal{H})$.

(c) In the setting of part (b), suppose two points $x$ and $y$ of $X$ satisfy $f(x) = f(y)$ for all $f\in\Phi$. Show that for each $B\in\mathcal{B}$, either $\{x,y\}\subset B$ or $\{x,y\}\subset B^c$.
(d) Let $S\subset X$ be such that $S\in\mathcal{B}$. Let
$$\mathcal{B}_1 = \{B\in\mathcal{B} : B\subset S\} = \{A\cap S : A\in\mathcal{B}\}$$
be the restricted $\sigma$-field on the subspace $S$. Show that $\mathcal{B}_1$ is the $\sigma$-field generated on the space $S$ by the collection
$$\mathcal{E}_1 = \{E\cap S : E\in\mathcal{E}\}.$$
Show by example that $\mathcal{B}_1$ is not necessarily generated by
$$\mathcal{E}_2 = \{E\in\mathcal{E} : E\subset S\}.$$
Hint: Consider $\mathcal{C} = \{B\subset X : B\cap S\in\mathcal{B}_1\}$. For the example, note that $\mathcal{B}_\mathbb{R}$ is generated by the class $\{(-\infty,a] : a\in\mathbb{R}\}$, but none of these infinite intervals lie in a bounded interval such as $[0,1]$.
(e) Let $(X,\mathcal{A},\mu)$ be a measure space. Let $\mathcal{U}$ be a sub-$\sigma$-field of $\mathcal{A}$, and let $\mathcal{N} = \{A\in\mathcal{A} : \mu(A) = 0\}$ be the collection of sets of $\mu$-measure zero ($\mu$-null sets). Let $\mathcal{U}^* = \sigma(\mathcal{U}\cup\mathcal{N})$ be the $\sigma$-field generated by $\mathcal{U}$ and $\mathcal{N}$. Show that
$$\mathcal{U}^* = \{A\in\mathcal{A} : \text{there exists } U\in\mathcal{U} \text{ such that } U\,\triangle\,A\in\mathcal{N}\}.$$
$\mathcal{U}^*$ is called the augmentation of $\mathcal{U}$.
Exercise 1.8. Suppose $\{A_i\}$ and $\{B_i\}$ are sequences of measurable sets in a measure space $(X,\mathcal{A},\mu)$ such that $\mu(A_i\,\triangle\,B_i) = 0$. Then
$$\mu\Bigl(\bigl(\textstyle\bigcup A_i\bigr)\,\triangle\,\bigl(\textstyle\bigcup B_i\bigr)\Bigr) = \mu\Bigl(\bigl(\textstyle\bigcap A_i\bigr)\,\triangle\,\bigl(\textstyle\bigcap B_i\bigr)\Bigr) = 0.$$
Exercise 1.9. (Product $\sigma$-algebras.) Recall the setting of Example 1.3. For a subset $L\subset\mathcal{I}$ of indices, let $\mathcal{B}_L = \sigma\{f_i : i\in L\}$ denote the $\sigma$-algebra generated by the projections $f_i$ for $i\in L$. So in particular, $\mathcal{B}_\mathcal{I} = \bigotimes_{i\in\mathcal{I}}\mathcal{A}_i$ is the full product $\sigma$-algebra.

(a) Show that for each $B\in\mathcal{B}_\mathcal{I}$ there exists a countable set $L\subset\mathcal{I}$ such that $B\in\mathcal{B}_L$. Hint. Do not try to reason starting from a particular set $B\in\mathcal{B}_\mathcal{I}$. Instead, try to say something useful about the class of sets for which a countable $L$ exists.

(b) Let $\mathbb{R}^{[0,\infty)}$ be the space of all functions $x : [0,\infty)\to\mathbb{R}$, with the product $\sigma$-algebra generated by the projections $x\mapsto x(t)$, $t\in[0,\infty)$. Show that the set of continuous functions is not measurable.
Exercise 1.10. (a) Let $\mathcal{E}_1,\dots,\mathcal{E}_n$ be collections of measurable sets on $(\Omega,\mathcal{F},P)$, each closed under intersections (if $A,B\in\mathcal{E}_i$ then $A\cap B\in\mathcal{E}_i$). Suppose
$$P(A_1\cap A_2\cap\dots\cap A_n) = P(A_1)\,P(A_2)\cdots P(A_n)$$
for all $A_1\in\mathcal{E}_1,\dots,A_n\in\mathcal{E}_n$. Show that the $\sigma$-algebras $\sigma(\mathcal{E}_1),\dots,\sigma(\mathcal{E}_n)$ are independent. Hint. A straightforward application of Theorem B.1.

(b) Let $\{\mathcal{A}_i : i\in\mathcal{I}\}$ be a collection of independent $\sigma$-algebras. Let $I_1,\dots,I_n$ be pairwise disjoint subsets of $\mathcal{I}$, and let $\mathcal{B}_k = \sigma\{\mathcal{A}_i : i\in I_k\}$ for $1\le k\le n$. Show that $\mathcal{B}_1,\dots,\mathcal{B}_n$ are independent.

(c) Let $\mathcal{A}$, $\mathcal{B}$, and $\mathcal{C}$ be sub-$\sigma$-algebras of $\mathcal{F}$. Assume $\sigma\{\mathcal{B},\mathcal{C}\}$ is independent of $\mathcal{A}$, and $\mathcal{C}$ is independent of $\mathcal{B}$. Show that $\mathcal{A}$, $\mathcal{B}$ and $\mathcal{C}$ are independent, and so in particular $\mathcal{C}$ is independent of $\sigma\{\mathcal{A},\mathcal{B}\}$.

(d) Show by example that the independence of $\mathcal{C}$ and $\sigma\{\mathcal{A},\mathcal{B}\}$ does not necessarily follow from having $\mathcal{B}$ independent of $\mathcal{A}$, $\mathcal{C}$ independent of $\mathcal{A}$, and $\mathcal{C}$ independent of $\mathcal{B}$. This last assumption is called pairwise independence of $\mathcal{A}$, $\mathcal{B}$ and $\mathcal{C}$. Hint. An example can be built from two independent fair coin tosses.
Exercise 1.11. (Memoryless property.) Let $X\sim\operatorname{Exp}(\lambda)$. Show that, conditional on $X > c$, $X - c\sim\operatorname{Exp}(\lambda)$. In plain English: given that I have waited for $c$ time units, the remaining waiting time is still $\operatorname{Exp}(\lambda)$-distributed. This is the memoryless property of the exponential distribution.
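The claim reduces to the identity $P[X - c > t\mid X > c] = e^{-\lambda(c+t)}/e^{-\lambda c} = e^{-\lambda t} = P[X > t]$, which a one-line numerical check confirms (the specific parameter values are arbitrary):

```python
import math

# memoryless check: P[X - c > t | X > c] = P[X > t] for X ~ Exp(lam)
lam, c, t = 1.3, 0.7, 2.0
tail = lambda s: math.exp(-lam * s)       # P[X > s]
conditional = tail(c + t) / tail(c)       # P[X > c + t] / P[X > c]
```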
Exercise 1.12. Independence allows us to average separately. Here is a special case that will be used in a later proof. Let $(\Omega,\mathcal{F},P)$ be a probability space, $U$ and $V$ measurable spaces, $X:\Omega\to U$ and $Y:\Omega\to V$ measurable mappings, and $f : U\times V\to\mathbb{R}$ a bounded measurable function (with respect to the product $\sigma$-algebra on $U\times V$). Assume that $X$ and $Y$ are independent. Let $\nu$ be the distribution of $Y$ on the $V$-space, defined by $\nu(B) = P\{Y\in B\}$ for measurable sets $B\subset V$. Show that
$$E[f(X,Y)] = \int_V E[f(X,y)]\,\nu(dy).$$
Hints. Start with functions of the type $f(x,y) = g(x)h(y)$. Use Theorem B.2 from the appendix.
Exercise 1.13. Let $\{D_i : i\in\mathbb{N}\}$ be a countable partition of $\Omega$, by which we mean that the $D_i$ are pairwise disjoint and $\Omega = \bigcup D_i$. Let $\mathcal{F}$ be the $\sigma$-algebra generated by $\{D_i\}$.

(a) Show that $G\in\mathcal{F}$ iff $G = \bigcup_{i\in I} D_i$ for some $I\subset\mathbb{N}$.

(b) Let $X\in L^1(P)$. Let $U = \{i\in\mathbb{N} : P(D_i) > 0\}$. Show that
$$E(X\mid\mathcal{F})(\omega) = \sum_{i\in U}\frac{E(\mathbf{1}_{D_i}X)}{P(D_i)}\,\mathbf{1}_{D_i}(\omega).$$
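The formula in (b) can be checked directly on a small finite example (invented for the illustration: $\Omega = \{0,\dots,5\}$ with uniform $P$, a two-block partition, and $X(\omega) = \omega$), including the averaging identity $E[E(X\mid\mathcal{F})] = EX$:

```python
from fractions import Fraction

# Omega = {0,...,5} with uniform P, partition D_1 = {0,1}, D_2 = {2,3,4,5},
# X(w) = w.  The claimed formula averages X over the block containing w.
omega = range(6)
P = {w: Fraction(1, 6) for w in omega}
partition = [{0, 1}, {2, 3, 4, 5}]

def cond_exp(w):
    D = next(B for B in partition if w in B)
    return sum(P[v] * v for v in D) / sum(P[v] for v in D)   # E[1_D X] / P(D)

values = {w: cond_exp(w) for w in omega}
avg = sum(P[w] * values[w] for w in omega)   # tower property: equals E[X] = 5/2
```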
Exercise 1.14. Complete the exponential case in Example 1.25 by finding $E(X\mid\mathcal{A})(\omega)$ for $\omega\in[0,c]$. Then verify the identity $E\bigl[E(X\mid\mathcal{A})\bigr] = EX$. Do you understand why $E(X\mid\mathcal{A})$ must be constant on $[0,c]$ and on $(c,\infty)$?

Exercise 1.15. Suppose $P(A) = 0$ or $1$ for all $A\in\mathcal{A}$. Show that $E(X\mid\mathcal{A}) = EX$ for all $X\in L^1(P)$.
Exercise 1.16. Let $(X,Y)$ be an $\mathbb{R}^2$-valued random vector with joint density $f(x,y)$. This means that for any bounded Borel function $\phi$ on $\mathbb{R}^2$,
$$E[\phi(X,Y)] = \iint_{\mathbb{R}^2}\phi(x,y)f(x,y)\,dx\,dy.$$
The marginal density of $Y$ is defined by
$$f_Y(y) = \int_\mathbb{R} f(x,y)\,dx.$$
Let
$$f(x\mid y) = \begin{cases} \dfrac{f(x,y)}{f_Y(y)}, & \text{if } f_Y(y) > 0 \\ 0, & \text{if } f_Y(y) = 0. \end{cases}$$

(a) Show that $f(x\mid y)f_Y(y) = f(x,y)$ for almost every $(x,y)$, with respect to Lebesgue measure on $\mathbb{R}^2$. Hint: Let
$$H = \{(x,y) : f(x\mid y)f_Y(y)\ne f(x,y)\}.$$
Show that $m(H^y) = 0$ for each $y$-section of $H$, and use Tonelli's theorem.

(b) Show that $f(x\mid y)$ functions as a conditional density of $X$, given that $Y = y$, in this sense: for a bounded Borel function $h$ on $\mathbb{R}$,
$$E[h(X)\mid Y](\omega) = \int_\mathbb{R} h(x)f(x\mid Y(\omega))\,dx.$$
Exercise 1.17. It is not too hard to write down impossible requirements for a stochastic process. Suppose $\{X_t : 0\le t\le 1\}$ is a real-valued stochastic process that satisfies

(i) $X_s$ and $X_t$ are independent whenever $s\ne t$;

(ii) each $X_t$ has the same distribution, and variance 1;

(iii) the path $t\mapsto X_t(\omega)$ is continuous for almost every $\omega$.

Show that a process satisfying these conditions cannot exist.
Chapter 2

Stochastic Processes

This chapter first covers general matters in the theory of stochastic processes, and then discusses the two most important processes, Brownian motion and Poisson processes.
2.1. Filtrations and stopping times

The set of nonnegative reals is denoted by $\mathbb{R}_+ = [0,\infty)$. Similarly $\mathbb{Q}_+$ for nonnegative rationals and $\mathbb{Z}_+ = \{0,1,2,\dots\}$ for nonnegative integers. The set of natural numbers is $\mathbb{N} = \{1,2,3,\dots\}$.

The discussion always takes place on a fixed probability space $(\Omega,\mathcal{F},P)$. We will routinely assume that this space is complete as a measure space. This means that if $D\in\mathcal{F}$ and $P(D) = 0$, then all subsets of $D$ lie in $\mathcal{F}$ and have probability zero. This is not a restriction because every measure space can be completed. See Section 1.1.6.

A filtration on a probability space $(\Omega,\mathcal{F},P)$ is a collection of $\sigma$-fields $\{\mathcal{F}_t : t\in\mathbb{R}_+\}$ that satisfy
$$\mathcal{F}_s\subset\mathcal{F}_t\subset\mathcal{F}\quad\text{for all } 0\le s < t < \infty.$$
Whenever the index $t$ ranges over nonnegative reals, we write simply $\{\mathcal{F}_t\}$ for $\{\mathcal{F}_t : t\in\mathbb{R}_+\}$. Given a filtration $\{\mathcal{F}_t\}$ we can add a last member to it by defining
$$(2.1)\qquad \mathcal{F}_\infty = \sigma\Bigl(\bigcup_{0\le t<\infty}\mathcal{F}_t\Bigr).$$
$\mathcal{F}_\infty$ is contained in $\mathcal{F}$ but can be strictly smaller than $\mathcal{F}$.
Example 2.1. The natural interpretation for the index $t\in\mathbb{R}_+$ is time. Let $X_t$ denote the price of some stock at time $t$ and assume it makes sense to imagine that it is defined for all $t\in\mathbb{R}_+$. At time $t$ we know the entire evolution $\{X_s : s\in[0,t]\}$ up to the present. In other words, we are in possession of the information in the $\sigma$-algebra $\mathcal{F}^X_t = \sigma\{X_s : s\in[0,t]\}$. $\{\mathcal{F}^X_t\}$ is a basic example of a filtration.

We will find it convenient to assume that each $\mathcal{F}_t$ contains all subsets of $\mathcal{F}$-measurable $P$-null events. This is more than just assuming that each $\mathcal{F}_t$ is complete, but it entails no loss of generality. To achieve this, first complete $(\Omega,\mathcal{F},P)$, and then replace $\mathcal{F}_t$ with
$$(2.2)\qquad \bar{\mathcal{F}}_t = \{B\in\mathcal{F} : \text{there exists } A\in\mathcal{F}_t \text{ such that } P(A\,\triangle\,B) = 0\}.$$
Exercise 1.7(e) verified that $\bar{\mathcal{F}}_t$ is a $\sigma$-algebra. The filtration $\{\bar{\mathcal{F}}_t\}$ is sometimes called complete, or the augmented filtration.
At the most general level, a stochastic process is a collection of random variables $\{X_i : i\in\mathcal{I}\}$ indexed by some arbitrary index set $\mathcal{I}$, and all defined on the same probability space. For us the index set is most often $\mathbb{R}_+$ or some subset of it. Let $X = \{X_t : t\in\mathbb{R}_+\}$ be a process on $(\Omega,\mathcal{F},P)$. It is convenient to regard $X$ as a function on $\mathbb{R}_+\times\Omega$ through the formula $X(t,\omega) = X_t(\omega)$. Indeed, we shall use the notations $X(t,\omega)$ and $X_t(\omega)$ interchangeably. When a process $X$ is discussed without explicit mention of an index set, then $\mathbb{R}_+$ is assumed.

If the random variables $X_t$ take their values in a space $S$, we say $X = \{X_t : t\in\mathbb{R}_+\}$ is an $S$-valued process. To even talk about $S$-valued random variables, $S$ needs to have a $\sigma$-algebra so that a notion of measurability exists. Often in general accounts of the theory $S$ is assumed a metric space, and then the natural $\sigma$-algebra on $S$ is the Borel $\sigma$-field $\mathcal{B}_S$. We have no cause to consider anything more general than $S = \mathbb{R}^d$, the $d$-dimensional Euclidean space. Unless otherwise specified, in this section a process is always $\mathbb{R}^d$-valued. Of course, most important is the real-valued case with space $\mathbb{R}^1 = \mathbb{R}$.
A process $X = \{X_t : t\in\mathbb{R}_+\}$ is adapted to the filtration $\{\mathcal{F}_t\}$ if $X_t$ is $\mathcal{F}_t$-measurable for each $0\le t < \infty$. The smallest filtration to which $X$ is adapted is the filtration that it generates, defined by
$$\mathcal{F}^X_t = \sigma\{X_s : 0\le s\le t\}.$$
A process $X$ is measurable if $X$ is $\mathcal{B}_{\mathbb{R}_+}\otimes\mathcal{F}$-measurable as a function from $\mathbb{R}_+\times\Omega$ into $\mathbb{R}^d$. Furthermore, $X$ is progressively measurable if the restriction of the function $X$ to $[0,T]\times\Omega$ is $\mathcal{B}_{[0,T]}\otimes\mathcal{F}_T$-measurable for each $T$. More explicitly, the requirement is that for each $B\in\mathcal{B}_{\mathbb{R}^d}$, the event
$$\{(t,\omega)\in[0,T]\times\Omega : X(t,\omega)\in B\}$$
lies in the $\sigma$-algebra $\mathcal{B}_{[0,T]}\otimes\mathcal{F}_T$. If $X$ is progressively measurable then it is also adapted, but the reverse implication does not hold (Exercise 2.6).
Properties of random objects are often interpreted in such a way that they are not affected by events of probability zero. For example, let $X = \{X_t : t\in\mathbb{R}_+\}$ and $Y = \{Y_t : t\in\mathbb{R}_+\}$ be two stochastic processes defined on the same probability space $(\Omega,\mathcal{F},P)$. As functions on $\mathbb{R}_+\times\Omega$, $X$ and $Y$ are equal if $X_t(\omega) = Y_t(\omega)$ for each $\omega\in\Omega$ and $t\in\mathbb{R}_+$. A useful relaxation of this strict notion of equality is called indistinguishability. We say $X$ and $Y$ are indistinguishable if there exists an event $\Omega_0$ such that $P(\Omega_0) = 1$ and for each $\omega\in\Omega_0$, $X_t(\omega) = Y_t(\omega)$ for all $t\in\mathbb{R}_+$. Since most statements about processes ignore events of probability zero, for all practical purposes indistinguishable processes can be regarded as equal.

Another, even weaker notion is modification: $Y$ is a modification (also called a version) of $X$ if for each $t$, $P\{X_t = Y_t\} = 1$.

Equality in distribution is also of importance for processes $X$ and $Y$: $X\overset{d}{=}Y$ means that $P\{X\in A\} = P\{Y\in A\}$ for all measurable sets for which this type of statement makes sense. In all reasonable situations equality in distribution between processes follows from the weaker equality of finite-dimensional distributions:
$$(2.3)\qquad P\{X_{t_1}\in B_1, X_{t_2}\in B_2,\dots,X_{t_m}\in B_m\} = P\{Y_{t_1}\in B_1, Y_{t_2}\in B_2,\dots,Y_{t_m}\in B_m\}$$
for all finite subsets $\{t_1,t_2,\dots,t_m\}$ of indices and all choices of the measurable sets $B_1,B_2,\dots,B_m$ in the range space. This can be proved for example with Lemma B.3 from the appendix.
Assuming that the probability space $(\Omega,\mathcal{F},P)$ and the filtration $\{\mathcal{F}_t\}$ are complete conveniently avoids certain measurability complications. For example, if $X$ is adapted and $P\{X_t = Y_t\} = 1$ for each $t\in\mathbb{R}_+$, then $Y$ is also adapted. To see the reason, let $B\in\mathcal{B}_\mathbb{R}$, and note that
$$\{Y_t\in B\} = \bigl(\{X_t\in B\}\cup\{Y_t\in B,\,X_t\notin B\}\bigr)\setminus\{Y_t\notin B,\,X_t\in B\}.$$
Since all subsets of zero probability events lie in $\mathcal{F}_t$, we conclude that there are events $D_1, D_2\in\mathcal{F}_t$ such that $\{Y_t\in B\} = (\{X_t\in B\}\cup D_1)\setminus D_2$, which shows that $Y$ is adapted.

In particular, since the point of view is that indistinguishable processes should really be viewed as one and the same process, it is sensible that such processes cannot differ in adaptedness or measurability.
A stopping time is a random variable $\tau:\Omega\to[0,\infty]$ such that $\{\omega : \tau(\omega)\le t\}\in\mathcal{F}_t$ for each $0\le t < \infty$. Many operations applied to stopping times produce new stopping times. Often used ones include the minimum and the maximum. If $\sigma$ and $\tau$ are stopping times (for the same filtration) then
$$\{\sigma\wedge\tau\le t\} = \{\sigma\le t\}\cup\{\tau\le t\}\in\mathcal{F}_t,$$
so $\sigma\wedge\tau$ is a stopping time. Similarly $\sigma\vee\tau$ can be shown to be a stopping time.
Example 2.2. Here is an illustration of the notion of stopping time.

(a) If you instruct your stockbroker to sell all your shares in company ABC on May 1, you are specifying a deterministic time. The time of sale does not depend on the evolution of the stock price.

(b) If you instruct your stockbroker to sell all your shares in company ABC as soon as the price exceeds 20, you are specifying a stopping time. Whether the sale happened by May 5 can be determined by inspecting the stock price of ABC Co. until May 5.

(c) Suppose you instruct your stockbroker to sell all your shares in company ABC on May 1 if the price will be lower on June 1. Again the sale time depends on the evolution as in case (b). But now the sale time is not a stopping time, because to determine whether the sale happened on May 1 we need to look into the future.

So the notion of a stopping time is eminently sensible because it makes precise the idea that today's decisions must be based on the information available today.
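Case (b) can be rendered as a simulation sketch (the price model and numbers are invented for the illustration): the first time the price exceeds a level is a stopping time precisely because deciding whether it has occurred by step $n$ requires only the path up to step $n$.

```python
import random

random.seed(1)
# a toy price path: multiplicative Gaussian steps
price = [10.0]
for _ in range(500):
    price.append(price[-1] * (1 + random.gauss(0.0, 0.02)))

def first_passage(path, level):
    # scan only the past: return the first index n with path[n] > level;
    # the event {tau <= n} is determined by path[:n+1] alone
    for n, p in enumerate(path):
        if p > level:
            return n
    return None   # the level was never reached on this path

tau = first_passage(price, 10.5)
```

By contrast, the rule in case (c) cannot be written as such a scan: evaluating it at May 1 would need the June 1 value.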
If $\tau$ is a stopping time, the $\sigma$-field of events known at time $\tau$ is defined by
$$\mathcal{F}_\tau = \{A\in\mathcal{F} : A\cap\{\tau\le t\}\in\mathcal{F}_t \text{ for all } 0\le t < \infty\}.$$
A deterministic time is a special case of a stopping time. If $\tau(\omega) = u$ for all $\omega$, then $\mathcal{F}_\tau = \mathcal{F}_u$.

If $\{X_t\}$ is a process and $\tau$ is a stopping time, $X_\tau$ denotes the value of the process at the random time $\tau$, in other words $X_\tau(\omega) = X_{\tau(\omega)}(\omega)$. The random variable $X_\tau$ is defined on the event $\{\tau < \infty\}$, so not necessarily on the whole space $\Omega$ unless $\tau$ is finite, or at least almost surely finite so that $X_\tau$ is defined with probability one.

Here are some basic properties of these concepts. Infinities arise naturally, and we use the conventions that $\infty\le\infty$ and $\infty = \infty$ are true, but $\infty < \infty$ is not.
Lemma 2.3. Let $\sigma$ and $\tau$ be stopping times, and $X$ a process.

(i) For $A\in\mathcal{F}_\sigma$, the events $A\cap\{\sigma\le\tau\}$ and $A\cap\{\sigma<\tau\}$ lie in $\mathcal{F}_\tau$. In particular, $\sigma\le\tau$ implies $\mathcal{F}_\sigma\subset\mathcal{F}_\tau$.

(ii) Both $\tau$ and $\sigma\wedge\tau$ are $\mathcal{F}_\tau$-measurable. The events $\{\sigma\le\tau\}$, $\{\sigma<\tau\}$, and $\{\sigma=\tau\}$ lie in both $\mathcal{F}_\sigma$ and $\mathcal{F}_\tau$.

(iii) If the process $X$ is progressively measurable, then $X(\tau)$ is $\mathcal{F}_\tau$-measurable on the event $\{\tau<\infty\}$.
Proof. Part (i). Let $A\in\mathcal{F}_\sigma$. For the first statement, we need to show that $(A\cap\{\sigma\le\tau\})\cap\{\tau\le t\}\in\mathcal{F}_t$. Write
$$(A\cap\{\sigma\le\tau\})\cap\{\tau\le t\} = \bigl(A\cap\{\sigma\le t\}\bigr)\cap\{\sigma\wedge t\le\tau\wedge t\}\cap\{\tau\le t\}.$$
All terms above lie in $\mathcal{F}_t$. (a) The first by the definition of $A\in\mathcal{F}_\sigma$. (b) The second because both $\sigma\wedge t$ and $\tau\wedge t$ are $\mathcal{F}_t$-measurable random variables: for any $u\in\mathbb{R}$, $\{\sigma\wedge t\le u\}$ equals $\Omega$ if $u\ge t$ and $\{\sigma\le u\}$ if $u < t$, a member of $\mathcal{F}_t$ in both cases. (c) $\{\tau\le t\}\in\mathcal{F}_t$ since $\tau$ is a stopping time.

In particular, if $\sigma\le\tau$, then $\mathcal{F}_\sigma\subset\mathcal{F}_\tau$.

To show $A\cap\{\sigma<\tau\}\in\mathcal{F}_\tau$, write
$$A\cap\{\sigma<\tau\} = \bigcup_{n\ge 1}\bigl(A\cap\{\sigma+\tfrac1n\le\tau\}\bigr).$$
All members of the union on the right lie in $\mathcal{F}_\tau$ by the first part of the proof, because $A\in\mathcal{F}_\sigma\subset\mathcal{F}_{\sigma+1/n}$. (And you should observe that for a constant $u\ge 0$, $\sigma+u$ is also a stopping time.)

Part (ii). Since $\{\tau\le s\}\cap\{\tau\le t\} = \{\tau\le s\wedge t\}\in\mathcal{F}_t$ by the definition of a stopping time, $\tau$ is $\mathcal{F}_\tau$-measurable. By the same token, the stopping time $\sigma\wedge\tau$ is $\mathcal{F}_{\sigma\wedge\tau}$-measurable, hence also $\mathcal{F}_\tau$-measurable.

Taking $A = \Omega$ in part (i) gives $\{\sigma\le\tau\}$ and $\{\sigma<\tau\}\in\mathcal{F}_\tau$. Taking the difference gives $\{\sigma=\tau\}\in\mathcal{F}_\tau$, and taking complements gives $\{\sigma>\tau\}$ and $\{\sigma\ge\tau\}\in\mathcal{F}_\tau$. Now we can interchange $\sigma$ and $\tau$ in the conclusions.

Part (iii). We claim first that $\omega\mapsto X(\tau(\omega)\wedge t,\omega)$ is $\mathcal{F}_t$-measurable. To see this, write it as the composition
$$\omega\mapsto(\tau(\omega)\wedge t,\,\omega)\mapsto X(\tau(\omega)\wedge t,\,\omega).$$
The first step $\omega\mapsto(\tau(\omega)\wedge t,\omega)$ is measurable as a map from $(\Omega,\mathcal{F}_t)$ into the product space $\bigl([0,t]\times\Omega,\ \mathcal{B}_{[0,t]}\otimes\mathcal{F}_t\bigr)$ if the components have the correct measurability. $\omega\mapsto\tau(\omega)\wedge t$ is measurable from $\mathcal{F}_t$ into $\mathcal{B}_{[0,t]}$ by a previous part of the lemma. The other component is the identity map $\omega\mapsto\omega$.

The second step of the composition is the map $(s,\omega)\mapsto X(s,\omega)$. By the progressive measurability assumption for $X$, this step is measurable from $\bigl([0,t]\times\Omega,\ \mathcal{B}_{[0,t]}\otimes\mathcal{F}_t\bigr)$ into $(\mathbb{R},\mathcal{B}_\mathbb{R})$.

We have shown that $\{X_{\tau\wedge t}\in B\}\in\mathcal{F}_t$ for $B\in\mathcal{B}_\mathbb{R}$, and so
$$\{X_\tau\in B\}\cap\{\tau\le t\} = \{X_{\tau\wedge t}\in B\}\cap\{\tau\le t\}\in\mathcal{F}_t.$$
This shows that $\{X_\tau\in B,\,\tau<\infty\}\in\mathcal{F}_\tau$, which was the claim. $\square$
All the stochastic processes we study will have some regularity properties as functions of $t$, when $\omega$ is fixed. These are regularity properties of paths. A stochastic process $X = \{X_t : t\in\mathbb{R}_+\}$ is continuous if for each $\omega\in\Omega$, the path $t\mapsto X_t(\omega)$ is continuous as a function of $t$. The properties left-continuous and right-continuous have the obvious analogous meaning. $X$ is right continuous with left limits (or cadlag, as the French acronym for this property goes) if the following is true for all $\omega\in\Omega$:
$$X_t(\omega) = \lim_{s\searrow t}X_s(\omega)\ \text{ for all } t\in\mathbb{R}_+,\quad\text{and the left limit }\ X_{t-}(\omega) = \lim_{s\nearrow t}X_s(\omega)\ \text{ exists for all } t > 0.$$
Above, $s\searrow t$ means that $s$ approaches $t$ from above (from the right), and $s\nearrow t$ approach from below (from the left). Finally, we also need to consider the reverse situation, namely a process that is left continuous with right limits, and for that we use the term caglad.

$X$ is a finite variation process (FV process) if the path $t\mapsto X_t(\omega)$ has bounded variation on each compact interval $[0,T]$.

We shall use all these terms also of a process that has a particular path property for almost every $\omega$. For example, if $t\mapsto X_t(\omega)$ is continuous for all $\omega$ in a set $\Omega_0$ of probability 1, then we can define $\widetilde{X}_t(\omega) = X_t(\omega)$ for $\omega\in\Omega_0$ and $\widetilde{X}_t(\omega) = 0$ for $\omega\notin\Omega_0$. Then $\widetilde{X}$ has all paths continuous, and $X$ and $\widetilde{X}$ are indistinguishable. Since we regard indistinguishable processes as equal, it makes sense to regard $X$ itself as a continuous process. When we prove results under hypotheses of path regularity, we assume that the path condition holds for each $\omega$. Typically the result will be the same for processes that are indistinguishable.

Note, however, that processes that are modifications of each other can have quite different path properties (Exercise 2.5).
The next two lemmas record some technical benefits of path regularity.

Lemma 2.4. Let $X$ be adapted to the filtration $\{\mathcal{F}_t\}$, and suppose $X$ is either left- or right-continuous. Then $X$ is progressively measurable.

Proof. Suppose $X$ is right-continuous. Fix $T < \infty$. Define on $[0,T]\times\Omega$ the function
$$X_n(t,\omega) = X(0,\omega)\,\mathbf{1}_{\{0\}}(t) + \sum_{k=0}^{2^n-1}X\bigl(\tfrac{(k+1)T}{2^n},\omega\bigr)\,\mathbf{1}_{(kT2^{-n},\,(k+1)T2^{-n}]}(t).$$
$X_n$ is a sum of products of $\mathcal{B}_{[0,T]}\otimes\mathcal{F}_T$-measurable functions, hence itself $\mathcal{B}_{[0,T]}\otimes\mathcal{F}_T$-measurable. By right-continuity $X_n(t,\omega)\to X(t,\omega)$ as $n\to\infty$, hence $X$ is also $\mathcal{B}_{[0,T]}\otimes\mathcal{F}_T$-measurable when restricted to $[0,T]\times\Omega$.

We leave the case of left-continuity as an exercise. $\square$
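The approximation in the proof can be sketched in code (a minimal illustration; the function names are mine): on the dyadic grid of $[0,T]$ with mesh $T/2^n$, $X_n$ copies the value of the path at the right endpoint of the grid interval containing $t$, so right-continuity forces $X_n(t)\to X(t)$.

```python
import math

def dyadic_approx(path, T, n, t):
    # X_n(t): value of `path` at the right endpoint of the dyadic cell holding t
    if t == 0:
        return path(0.0)
    k = math.ceil(t * 2 ** n / T)       # t lies in ((k-1) T / 2^n, k T / 2^n]
    return path(k * T / 2 ** n)

f = lambda s: s * s                     # a continuous, hence right-continuous, path
err_coarse = abs(dyadic_approx(f, 1.0, 4, 0.3) - f(0.3))
err_fine = abs(dyadic_approx(f, 1.0, 12, 0.3) - f(0.3))
```

Refining the grid shrinks the error, exactly the pointwise convergence the proof uses.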
Checking indistinguishability between two processes with some path regularity reduces to an a.s. equality check at a fixed time.

Lemma 2.5. Suppose $X$ and $Y$ are right-continuous processes defined on the same probability space. Suppose $P\{X_t = Y_t\} = 1$ for all $t$ in some dense countable subset $S$ of $\mathbb{R}_+$. Then $X$ and $Y$ are indistinguishable. The same conclusion holds under the assumption of left-continuity if $0\in S$.

Proof. Let $\Omega_0 = \bigcap_{s\in S}\{\omega : X_s(\omega) = Y_s(\omega)\}$. By assumption, $P(\Omega_0) = 1$. Fix $\omega\in\Omega_0$. Given $t\in\mathbb{R}_+$, there exists a sequence $s_n$ in $S$ such that $s_n\searrow t$. By right-continuity,
$$X_t(\omega) = \lim_n X_{s_n}(\omega) = \lim_n Y_{s_n}(\omega) = Y_t(\omega).$$
Hence $X_t(\omega) = Y_t(\omega)$ for all $t\in\mathbb{R}_+$ and $\omega\in\Omega_0$, and this says $X$ and $Y$ are indistinguishable.

For the left-continuous case the origin $t = 0$ needs a separate assumption because it cannot be approached from the left. $\square$
Filtrations also have certain kinds of limits and continuity properties. Given a filtration $\{\mathcal{F}_t\}$, define the $\sigma$-fields
$$(2.4)\qquad \mathcal{F}_{t+} = \bigcap_{s:\,s>t}\mathcal{F}_s.$$
$\{\mathcal{F}_{t+}\}$ is a new filtration, and $\mathcal{F}_t\subset\mathcal{F}_{t+}$. If $\mathcal{F}_t = \mathcal{F}_{t+}$ for all $t$, we say $\{\mathcal{F}_t\}$ is right-continuous. Performing the same operation again does not lead to anything new: if $\mathcal{G}_t = \mathcal{F}_{t+}$, then $\mathcal{G}_{t+} = \mathcal{G}_t$, as you should check. So in particular $\{\mathcal{F}_{t+}\}$ is a right-continuous filtration.

Right-continuity is the important property, but we could also define
$$(2.5)\qquad \mathcal{F}_{0-} = \mathcal{F}_0 \quad\text{and}\quad \mathcal{F}_{t-} = \sigma\Bigl(\bigcup_{s:\,s<t}\mathcal{F}_s\Bigr)\ \text{ for } t > 0.$$
Since a union of $\sigma$-fields is not necessarily a $\sigma$-field, $\mathcal{F}_{t-}$ needs to be defined as the $\sigma$-field generated by the union of $\mathcal{F}_s$ over $s < t$. The generation step was unnecessary in the definition of $\mathcal{F}_{t+}$ because any intersection of $\sigma$-fields is again a $\sigma$-field. If $\mathcal{F}_t = \mathcal{F}_{t-}$ for all $t$, we say $\{\mathcal{F}_t\}$ is left-continuous.

It is convenient to note that, since $\mathcal{F}_s$ depends on $s$ in a monotone fashion, the definitions above can be equivalently formulated through sequences. For example, if $s_j > t$ is a sequence such that $s_j\searrow t$, then $\mathcal{F}_{t+} = \bigcap_j\mathcal{F}_{s_j}$.

The assumption that $\{\mathcal{F}_t\}$ is both complete and right-continuous is sometimes expressed by saying that $\{\mathcal{F}_t\}$ satisfies the usual conditions. In many books these are standing assumptions. When we develop the stochastic integral we assume completeness. We shall not assume right-continuity as a routine matter, and we alert the reader whenever that assumption is used.
Since $\mathcal{F}_t\subset\mathcal{F}_{t+}$, $\{\mathcal{F}_{t+}\}$ admits more stopping times than $\{\mathcal{F}_t\}$. So there are benefits to having a right-continuous filtration. Let us explore this a little.

Lemma 2.6. A $[0,\infty]$-valued random variable $\tau$ is a stopping time with respect to $\{\mathcal{F}_{t+}\}$ iff $\{\tau<t\}\in\mathcal{F}_t$ for all $t\in\mathbb{R}_+$.

Proof. Suppose $\tau$ is an $\{\mathcal{F}_{t+}\}$-stopping time. Then for each $n\in\mathbb{N}$,
$$\{\tau\le t-n^{-1}\}\in\mathcal{F}_{(t-n^{-1})+}\subset\mathcal{F}_t,$$
and so $\{\tau<t\} = \bigcup_n\{\tau\le t-n^{-1}\}\in\mathcal{F}_t$.

Conversely, if $\{\tau<t+n^{-1}\}\in\mathcal{F}_{t+n^{-1}}$ for all $n\in\mathbb{N}$, then for all $m\in\mathbb{N}$,
$$\{\tau\le t\} = \bigcap_{n:\,n\ge m}\{\tau<t+n^{-1}\}\in\mathcal{F}_{t+m^{-1}}.$$
And so $\{\tau\le t\}\in\bigcap_m\mathcal{F}_{t+m^{-1}} = \mathcal{F}_{t+}$. $\square$
Given a set $H$, define
$$(2.6)\qquad \tau_H(\omega) = \inf\{t\ge 0 : X_t(\omega)\in H\}.$$
This is called the hitting time of the set $H$. Sometimes the term hitting time is used for the infimum over $t > 0$, and then the above time is called the first entry time into the set $H$. These are the most important random times we wish to deal with, so it is crucial to know whether they are stopping times.

Lemma 2.7. Let $X$ be a process adapted to a filtration $\{\mathcal{F}_t\}$ and assume $X$ is left- or right-continuous. If $G$ is an open set, then $\tau_G$ is a stopping time with respect to $\{\mathcal{F}_{t+}\}$. In particular, if $\{\mathcal{F}_t\}$ is right-continuous, $\tau_G$ is a stopping time with respect to $\{\mathcal{F}_t\}$.
Proof. If the path $s\mapsto X_s(\omega)$ is left- or right-continuous, then $\tau_G(\omega) < t$ iff $X_s(\omega)\in G$ for some $s\in[0,t)$ iff $X_q(\omega)\in G$ for some rational $q\in[0,t)$. (If $X$ is right-continuous, every value $X_s$ for $s\in[0,t)$ can be approached from the right along values $X_q$ for rational $q$. If $X$ is left-continuous this is true for all values except $s = 0$, but $0$ is among the rationals so it gets taken care of.) Thus we have
$$\{\tau_G<t\} = \bigcup_{q\in\mathbb{Q}_+\cap[0,t)}\{X_q\in G\}\in\sigma\{X_s : 0\le s<t\}\subset\mathcal{F}_t. \qquad\square$$
Example 2.8. Assuming $X$ continuous would not improve the conclusion to $\{\tau_G\le t\}\in\mathcal{F}_t$. To see this, let $G = (b,\infty)$ for some $b > 0$ and consider the two paths
$$X_s(\omega_0) = X_s(\omega_1) = bs \quad\text{for } 0\le s\le 1,$$
while
$$X_s(\omega_0) = bs, \qquad X_s(\omega_1) = b(2-s) \qquad\text{for } s\ge 1.$$
Now $\tau_G(\omega_0) = 1$ while $\tau_G(\omega_1) = \infty$. Since $X_s(\omega_0)$ and $X_s(\omega_1)$ agree for $s\in[0,1]$, the points $\omega_0$ and $\omega_1$ must be together either inside or outside any event in $\mathcal{F}^X_1$ [Exercise 1.7(c)]. But clearly $\omega_0\in\{\tau_G\le 1\}$ while $\omega_1\notin\{\tau_G\le 1\}$. This shows that $\{\tau_G\le 1\}\notin\mathcal{F}^X_1$.
There is an alternative way to register arrival into a set, if we settle for getting infinitesimally close. For a process $X$, let $X[s,t] = \{X(u) : s\le u\le t\}$, with (topological) closure $\overline{X[s,t]}$. For a set $H$ define
$$(2.7)\qquad \sigma_H = \inf\{t\ge 0 : \overline{X[0,t]}\cap H\ne\emptyset\}.$$
Note that for a cadlag path,
$$(2.8)\qquad \overline{X[0,t]} = \{X(u) : 0\le u\le t\}\cup\{X(u-) : 0<u\le t\}.$$

Lemma 2.9. Suppose $X$ is a cadlag process adapted to $\{\mathcal{F}_t\}$ and $H$ is a closed set. Then $\sigma_H$ is a stopping time.
Proof. Fix $t > 0$. First we claim
$$\{\sigma_H\le t\} = \{X(0)\in H\}\cup\{X(s)\in H \text{ or } X(s-)\in H \text{ for some } s\in(0,t]\}.$$
It is clear the event on the right is contained in the event on the left. To prove the opposite containment, suppose $\sigma_H\le t$. Then for every $k$ there exists $u_k\le t+1/k$ such that either $X(u_k)\in H$ or $X(u_k-)\in H$. By compactness, we may pass to a convergent subsequence $u_k\to s\in[0,t]$ (no reason to introduce new notation for the subsequence). By passing to a further subsequence, we may assume the convergence is monotone. We have three cases to consider.

(i) If $u_k = s$ for some $k$, then $X(s)\in H$ or $X(s-)\in H$.

(ii) If $u_k < s$ for all $k$, then both $X(u_k)$ and $X(u_k-)$ converge to $X(s-)$, which thus lies in $H$ by the closedness of $H$.

(iii) If $u_k > s$ for all $k$, then both $X(u_k)$ and $X(u_k-)$ converge to $X(s)$, which thus lies in $H$.

The equality above is checked.

Let
$$H_n = \{y : \text{there exists } x\in H \text{ such that } |x-y|<n^{-1}\}$$
be the $n^{-1}$-neighborhood of $H$. Let $U$ contain all the rationals in $[0,t]$ and the point $t$ itself. Next we claim
$$\{X(0)\in H\}\cup\{X(s)\in H \text{ or } X(s-)\in H \text{ for some } s\in(0,t]\} = \bigcap_{n=1}^\infty\bigcup_{q\in U}\{X(q)\in H_n\}.$$
To justify this, note first that if $X(s) = y\in H$ for some $s\in[0,t]$ or $X(s-) = y\in H$ for some $s\in(0,t]$, then we can find a sequence $q_j\in U$ such that $X(q_j)\to y$, and then $X(q_j)\in H_n$ for all large enough $j$. Conversely, suppose we have $q_n\in U$ such that $X(q_n)\in H_n$ for all $n$. Extract a convergent subsequence $q_n\to s$. By the cadlag property a further subsequence of $X(q_n)$ converges to either $X(s)$ or $X(s-)$. By the closedness of $H$, one of these lies in $H$.

Combining the set equalities proved shows that $\{\sigma_H\le t\}\in\mathcal{F}_t$. $\square$
Lemma 2.9 fails for caglad processes, unless the filtration is assumed right-continuous (Exercise 2.12). For a continuous process $X$ and a closed set $H$ the random times defined by (2.6) and (2.7) coincide. So we get this corollary.

Corollary 2.10. Assume $X$ is continuous and $H$ is closed. Then $\tau_H$ is a stopping time.
Remark 2.11. (A look ahead.) The stopping times discussed above will play a role in the development of the stochastic integral in the following way. To integrate an unbounded real-valued process $X$ we need stopping times $\tau_k$ such that $X_t(\omega)$ stays bounded for $0<t\le\tau_k(\omega)$. Caglad processes will be an important class of integrands. For a caglad $X$, Lemma 2.7 shows that
$$\tau_k = \inf\{t\ge 0 : |X_t|>k\}$$
are stopping times, provided $\{\mathcal{F}_t\}$ is right-continuous. Left-continuity of $X$ then guarantees that $|X_t|\le k$ for $0<t\le\tau_k$.

Of particular interest will be a caglad process $X$ that satisfies $X_t = Y_{t-}$ for $t > 0$ for some adapted cadlag process $Y$. Then by Lemma 2.9 we get the required stopping times by
$$\tau_k = \inf\{t>0 : |Y_t|>k \text{ or } |Y_{t-}|>k\}$$
without having to assume that $\{\mathcal{F}_t\}$ is right-continuous.
Remark 2.12. (You can ignore all the above hitting time complications.) The following is a known fact.
Theorem 2.13. Assume the filtration {F_t} satisfies the usual conditions, and X is a progressively measurable process with values in some metric space. Then σ_H defined by (2.6), or the same with the infimum restricted to t > 0, are stopping times for every Borel set H.
This is a deep theorem. A fairly accessible recent proof appears in [1]. The reader may prefer to use this theorem in the sequel, and always assume that filtrations satisfy the usual conditions. We shall not do so in the text to avoid proliferating the mysteries we have to accept without justification.
2.2. Quadratic variation
In stochastic analysis many processes turn out to have infinite total variation, and it becomes necessary to use quadratic variation as a measure of path oscillation. For example, we shall see in the next chapter that if a continuous martingale M has finite variation, then M_t = M_0.
Let Y be a stochastic process. For a partition π = {0 = t_0 < t_1 < ⋯ < t_{m(π)} = t} of [0, t] we can form the sum of squared increments
    ∑_{i=0}^{m(π)−1} (Y_{t_{i+1}} − Y_{t_i})².
We say that these sums converge to the random variable [Y]_t in probability as mesh(π) = max_i (t_{i+1} − t_i) → 0 if for each ε > 0 there exists δ > 0 such that
(2.9)    P{ | ∑_{i=0}^{m(π)−1} (Y_{t_{i+1}} − Y_{t_i})² − [Y]_t | ≥ ε } ≤ ε
for all partitions π with mesh(π) ≤ δ. We express this limit as
(2.10)    lim_{mesh(π)→0} ∑_i (Y_{t_{i+1}} − Y_{t_i})² = [Y]_t    in probability.
Definition 2.14. The quadratic variation process [Y] = {[Y]_t : t ∈ R_+} of a stochastic process Y is defined by the limits in (2.10), provided this limit exists for all t ≥ 0 and provided there is a version of the process [Y] such that the paths t ↦ [Y]_t(ω) are nondecreasing for all ω. [Y]_0 = 0 is part of the definition.
We will see in the case of Brownian motion that the defining limits cannot be required to hold almost surely, unless we pick the partitions carefully. Hence limits in probability are used.
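The defining limit (2.10) can be seen numerically for Brownian motion, whose quadratic variation turns out to be [B]_t = t (proved later in this chapter). The following short simulation is an added illustration, not part of the text; the horizon and partition size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
t, n = 1.0, 200_000                     # time horizon, number of partition intervals
dt = t / n
# simulate one Brownian path on a fine uniform partition of [0, t]
increments = rng.normal(0.0, np.sqrt(dt), size=n)
B = np.concatenate([[0.0], np.cumsum(increments)])
# sum of squared increments over the partition
qv = np.sum(np.diff(B) ** 2)
print(qv)   # close to t = 1
```

Refining the partition (larger n) makes the sum concentrate around t, in line with the convergence in probability.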
Between two processes we define a quadratic covariation in terms of quadratic variations.
Definition 2.15. Let X and Y be two stochastic processes on the same probability space. The (quadratic) covariation process [X, Y] = {[X, Y]_t : t ∈ R_+} is defined by
(2.11)    [X, Y] = [ ½(X + Y) ] − [ ½(X − Y) ]
provided the quadratic variation processes on the right exist in the sense of Definition 2.14.
From the identity ab = ¼(a + b)² − ¼(a − b)² applied to a = X_{t_{i+1}} − X_{t_i} and b = Y_{t_{i+1}} − Y_{t_i} it follows that, for each t ∈ R_+,
(2.12)    lim_{mesh(π)→0} ∑_i (X_{t_{i+1}} − X_{t_i})(Y_{t_{i+1}} − Y_{t_i}) = [X, Y]_t    in probability.
Utilizing these limits together with the identities
    ab = ½[ (a + b)² − a² − b² ] = ½[ a² + b² − (a − b)² ]
gives these almost sure equalities at fixed times:
    [X, Y]_t = ½( [X + Y]_t − [X]_t − [Y]_t )    a.s.
and
    [X, Y]_t = ½( [X]_t + [Y]_t − [X − Y]_t )    a.s.
provided all the processes in question exist.
We defined [X, Y] by (2.11) instead of by the limits (2.12) so that the property [Y, Y] = [Y] is immediately clear. Limits (2.12) characterize [X, Y]_t only almost surely at each fixed t and imply nothing about path properties. This is why monotonicity was made part of Definition 2.14. Monotonicity ensures that the quadratic variation of the identically zero process is indistinguishable from the zero process. (Exercise 2.5 gives a non-monotone process that satisfies limits (2.10) for Y = 0.)
We give a preliminary discussion of quadratic variation and covariation in this section. That quadratic variation exists for Brownian motion and the Poisson process will be proved later in this chapter. The case of local martingales is discussed in Section 3.4 of Chapter 3. Existence for FV processes follows purely analytically from Lemma A.10 in Appendix A.
In the next proposition we show that if X is cadlag, then we can take [X]_t also cadlag. Technically, we show that for each t, [X]_{t+} = [X]_t almost surely. Then the process [X]_{t+} has cadlag paths and satisfies the definition of quadratic variation. From definition (2.11) it then follows that [X, Y] can also be taken cadlag when X and Y are cadlag.
Furthermore, we show that the jumps of the covariation match the jumps of the processes. For any cadlag process Z, the jump at t is denoted by
    ΔZ(t) = Z(t) − Z(t−).
The statement below about jumps can be considered preliminary. After developing more tools we can strengthen the statement for semimartingales Y so that the equality Δ[Y]_t = (ΔY_t)² is true for all t ∈ R_+ outside a single exceptional event of probability zero. (See Lemma 5.40.)
Proposition 2.16. Suppose X and Y are cadlag processes, and [X, Y] exists in the sense of Definition 2.15. Then there exists a cadlag modification of [X, Y]. For any t, Δ[X, Y]_t = (ΔX_t)(ΔY_t) almost surely.
Proof. It suffices to treat the case X = Y. Pick ε, δ > 0. Fix t < u. Pick η > 0 so that
(2.13)    P{ | [X]_u − [X]_t − ∑_{i=0}^{m(π)−1} (X_{t_{i+1}} − X_{t_i})² | < ε } > 1 − δ
whenever π = {t = t_0 < t_1 < ⋯ < t_{m(π)} = u} is a partition of [t, u] with mesh(π) < η. (The little observation needed here is in Exercise 2.13(b).) Pick such a partition π. Keeping t_1 fixed, refine π further in [t_1, u] so that
    P{ | [X]_u − [X]_{t_1} − ∑_{i=1}^{m(π)−1} (X_{t_{i+1}} − X_{t_i})² | < ε } > 1 − δ.
Taking the intersection of these events, we have that with probability at least 1 − 2δ,
    [X]_u − [X]_t ≤ ∑_{i=0}^{m(π)−1} (X_{t_{i+1}} − X_{t_i})² + ε
        = (X_{t_1} − X_t)² + ∑_{i=1}^{m(π)−1} (X_{t_{i+1}} − X_{t_i})² + ε
        ≤ (X_{t_1} − X_t)² + [X]_u − [X]_{t_1} + 2ε,
which rearranges to
    [X]_{t_1} ≤ [X]_t + (X_{t_1} − X_t)² + 2ε.
Looking back, we see that this argument works for any t_1 ∈ (t, t + η). By monotonicity [X]_{t+} ≤ [X]_{t_1}, so for all these t_1,
    P{ [X]_{t+} ≤ [X]_t + (X_{t_1} − X_t)² + 2ε } > 1 − 2δ.
Shrink η > 0 further so that for t_1 ∈ (t, t + η), by right continuity,
    P{ (X_{t_1} − X_t)² < ε } > 1 − δ.
The final estimate is
    P{ [X]_t ≤ [X]_{t+} ≤ [X]_t + 3ε } > 1 − 3δ.
Since ε, δ > 0 were arbitrary, it follows that [X]_{t+} = [X]_t almost surely. As explained before the statement of the proposition, this implies that we can choose a version of [X] with cadlag paths.
To bound the jump at u, return to the partition π chosen for (2.13). Let s = t_{m(π)−1}. Keeping s fixed, refine π sufficiently in [t, s] so that, with probability at least 1 − 2δ,
    [X]_u − [X]_t ≤ ∑_{i=0}^{m(π)−1} (X_{t_{i+1}} − X_{t_i})² + ε
        = (X_u − X_s)² + ∑_{i=0}^{m(π)−2} (X_{t_{i+1}} − X_{t_i})² + ε
        ≤ (X_u − X_s)² + [X]_s − [X]_t + 2ε,
which rearranges, through [X]_u − [X]_{u−} ≤ [X]_u − [X]_s, to give
    P{ Δ[X]_u ≤ (X_u − X_s)² + 2ε } > 1 − 2δ.
Here s ∈ (u − η, u) was arbitrary. Again we can pick η small enough so that for all such s, with probability at least 1 − δ,
(2.14)    | (X_u − X_s)² − (ΔX_u)² | < ε.
Since ε, δ are arbitrary, we have Δ[X]_u ≤ (ΔX_u)² almost surely.
For the other direction, return above to the partition π with s = t_{m(π)−1}, but now derive the opposite inequalities:
    [X]_u − [X]_t ≥ (X_u − X_s)² − ε ≥ (ΔX_u)² − 2ε.
For any fixed t < u this is valid with probability at least 1 − 2δ. We can take t close enough to u so that P{ Δ[X]_u ≥ [X]_u − [X]_t − ε } > 1 − δ. Then we have Δ[X]_u ≥ (ΔX_u)² − 3ε with probability at least 1 − 3δ. Letting ε, δ to zero once more completes the proof. □
In particular, we can say that in the cadlag case [Y] is an increasing process, according to this definition.
Definition 2.17. An increasing process A = {A_t : 0 ≤ t < ∞} is an adapted process such that, for almost every ω, A_0(ω) = 0 and s ↦ A_s(ω) is nondecreasing and right-continuous. Monotonicity implies the existence of left limits A_{t−}, so it follows that an increasing process is cadlag.
Next, two useful inequalities.
Lemma 2.18. Suppose the processes below exist. Then at a fixed t,
(2.15)    | [X, Y]_t | ≤ [X]_t^{1/2} [Y]_t^{1/2}    a.s.
and
(2.16)    | [X]_t − [Y]_t | ≤ [X − Y]_t + 2 [X − Y]_t^{1/2} [Y]_t^{1/2}    a.s.
In the cadlag case the inequalities are valid simultaneously at all t ∈ R_+, with probability 1.
Proof. The last statement follows from the earlier ones because the inequalities are a.s. valid simultaneously at all rational t, and then limits capture all t ∈ R_+.
Inequality (2.15) follows from the Cauchy–Schwarz inequality
    | ∑ x_i y_i | ≤ ( ∑ x_i² )^{1/2} ( ∑ y_i² )^{1/2}.
From
    [X − Y] = [X] − 2[X, Y] + [Y]
we get
    [X] − [Y] = [X − Y] + 2( [X, Y] − [Y] ) = [X − Y] + 2[X − Y, Y],
where we used the equality
    ∑ x_i y_i − ∑ y_i² = ∑ (x_i − y_i) y_i.
Utilizing (2.15),
    | [X] − [Y] | ≤ [X − Y] + | 2[X − Y, Y] | ≤ [X − Y] + 2[X − Y]^{1/2} [Y]^{1/2}. □
In all our applications [X, Y] will be a cadlag process. As the difference of two increasing processes in definition (2.11), [X, Y]_t is BV on any compact time interval. Lebesgue–Stieltjes integrals over time intervals with respect to [X, Y] have an important role in the development. These are integrals with respect to the Lebesgue–Stieltjes measure Λ_{[X,Y]} defined by
    Λ_{[X,Y]}(a, b] = [X, Y]_b − [X, Y]_a,    0 ≤ a < b < ∞,
as explained in Section 1.1.9. Note that there is a hidden ω in all these quantities. This integration over time is done separately for each fixed ω. This kind of operation is called "path by path" because ω represents the path of the underlying process [X, Y].
When the origin is included in the time interval, we assume [X, Y]_0 = 0, so the Lebesgue–Stieltjes measure Λ_{[X,Y]} gives zero measure to the singleton {0}. The Lebesgue–Stieltjes integrals obey the following useful inequality.
Proposition 2.19 (Kunita–Watanabe inequality). Fix ω such that [X], [Y] and [X, Y] exist and are right-continuous on the interval [0, T]. Then for any B_{[0,T]} ⊗ F-measurable bounded functions G and H on [0, T] × Ω,
(2.17)    | ∫_{[0,T]} G(t, ω)H(t, ω) d[X, Y]_t(ω) |
              ≤ ( ∫_{[0,T]} G(t, ω)² d[X]_t(ω) )^{1/2} ( ∫_{[0,T]} H(t, ω)² d[Y]_t(ω) )^{1/2}.
The integrals above are Lebesgue–Stieltjes integrals with respect to the t-variable, evaluated for the fixed ω.
Proof. Once ω is fixed, the result is an analytic lemma, and the dependence of G and H on ω is irrelevant. We included this dependence so that the statement better fits its later applications. It is a property of product-measurability that for a fixed ω, G(t, ω) and H(t, ω) are measurable functions of t.
Consider first step functions
    g(t) = α_0 1_{{0}}(t) + ∑_{i=1}^{m−1} α_i 1_{(s_i, s_{i+1}]}(t)
and
    h(t) = β_0 1_{{0}}(t) + ∑_{i=1}^{m−1} β_i 1_{(s_i, s_{i+1}]}(t)
where 0 = s_1 < ⋯ < s_m = T is a partition of [0, T]. (Note that g and h can be two arbitrary step functions. If they come with distinct partitions, {s_i} is the common refinement of these partitions.) Then
    | ∫_{[0,T]} g(t)h(t) d[X, Y]_t | = | ∑_i α_i β_i ( [X, Y]_{s_{i+1}} − [X, Y]_{s_i} ) |
        ≤ ∑_i |α_i β_i| ( [X]_{s_{i+1}} − [X]_{s_i} )^{1/2} ( [Y]_{s_{i+1}} − [Y]_{s_i} )^{1/2}
        ≤ ( ∑_i |α_i|² ( [X]_{s_{i+1}} − [X]_{s_i} ) )^{1/2} ( ∑_i |β_i|² ( [Y]_{s_{i+1}} − [Y]_{s_i} ) )^{1/2}
        = ( ∫_{[0,T]} g(t)² d[X]_t )^{1/2} ( ∫_{[0,T]} h(t)² d[Y]_t )^{1/2},
where we applied (2.15) separately on each partition interval (s_i, s_{i+1}], and then the Schwarz inequality.
Let g and h be two arbitrary bounded Borel functions on [0, T], and pick 0 < C < ∞ so that |g| ≤ C and |h| ≤ C. Let ε > 0. Define the bounded Borel measure
    μ = Λ_{[X]} + Λ_{[Y]} + |Λ_{[X,Y]}|
on [0, T]. Above, Λ_{[X]} is the positive Lebesgue–Stieltjes measure of the function t ↦ [X]_t (for the fixed ω under consideration), same for Λ_{[Y]}, and |Λ_{[X,Y]}| is the positive total variation measure of the signed Lebesgue–Stieltjes measure Λ_{[X,Y]}. By Lemma A.17 we can choose step functions g̃ and h̃ so that |g̃| ≤ C, |h̃| ≤ C, and
    ∫ ( |g − g̃| + |h − h̃| ) dμ < ε/(2C).
On the one hand,
    | ∫_{[0,T]} gh d[X, Y]_t − ∫_{[0,T]} g̃h̃ d[X, Y]_t | ≤ ∫_{[0,T]} | gh − g̃h̃ | d|Λ_{[X,Y]}|
        ≤ C ∫_{[0,T]} |g − g̃| d|Λ_{[X,Y]}| + C ∫_{[0,T]} |h − h̃| d|Λ_{[X,Y]}|.
On the other hand,
    | ∫_{[0,T]} g² d[X]_t − ∫_{[0,T]} g̃² d[X]_t | ≤ ∫_{[0,T]} | g² − g̃² | d[X]_t ≤ 2C ∫_{[0,T]} |g − g̃| d[X]_t,
with a similar bound for h. Putting these together with the inequality already proved for step functions gives
    | ∫_{[0,T]} gh d[X, Y]_t | ≤ ε + ( ε + ∫_{[0,T]} g² d[X]_t )^{1/2} ( ε + ∫_{[0,T]} h² d[Y]_t )^{1/2}.
Since ε > 0 was arbitrary, we can let ε → 0. The inequality as stated in the proposition is obtained by choosing g(t) = G(t, ω) and h(t) = H(t, ω). □
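In discrete form the Kunita–Watanabe bound is exactly the Cauchy–Schwarz step used in the proof: with Δ[X,Y]_i = ΔX_i ΔY_i and Δ[X]_i = (ΔX_i)², the inequality reads |∑ g_i h_i ΔX_i ΔY_i| ≤ (∑ g_i² ΔX_i²)^{1/2} (∑ h_i² ΔY_i²)^{1/2}. A quick numerical check follows; this is an added illustration, not from the text, and all data are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
dX = 0.01 * rng.normal(size=n)          # increments of two arbitrary discrete paths
dY = 0.01 * rng.normal(size=n)
g = rng.uniform(-2, 2, size=n)          # arbitrary bounded integrands
h = rng.uniform(-2, 2, size=n)

lhs = abs(np.sum(g * h * dX * dY))      # discrete analogue of | ∫ gh d[X,Y] |
rhs = np.sqrt(np.sum(g**2 * dX**2)) * np.sqrt(np.sum(h**2 * dY**2))
print(lhs <= rhs)   # the bound holds for any choice of data
```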
Remark 2.20. Inequality (2.17) has the following corollary. As in the proof, let |Λ_{[X,Y](ω)}| be the total variation measure of the signed Lebesgue–Stieltjes measure Λ_{[X,Y](ω)} on [0, T]. For a fixed ω, (1.12) implies that |Λ_{[X,Y](ω)}| ≪ Λ_{[X,Y](ω)} and the Radon–Nikodym derivative
    φ(t) = d|Λ_{[X,Y](ω)}| / dΛ_{[X,Y](ω)} (t)
on [0, T] satisfies |φ(t)| ≤ 1. For an arbitrary bounded Borel function g on [0, T],
    ∫_{[0,T]} g(t) |Λ_{[X,Y](ω)}|(dt) = ∫_{[0,T]} g(t)φ(t) d[X, Y]_t(ω).
Combining this with (2.17) gives
(2.18)    ∫_{[0,T]} | G(t, ω)H(t, ω) | |Λ_{[X,Y](ω)}|(dt)
              ≤ ( ∫_{[0,T]} G(t, ω)² d[X]_t(ω) )^{1/2} ( ∫_{[0,T]} H(t, ω)² d[Y]_t(ω) )^{1/2}.
2.3. Path spaces, martingale property and Markov property
So far we have thought of a stochastic process as a collection of random variables on a probability space. An extremely fruitful, more abstract view regards a process as a probability distribution on a path space. This is a natural generalization of the notion of probability distribution of a random variable or a random vector. If Y = (Y_1, …, Y_n) is an R^n-valued random vector on a probability space (Ω, F, P), its distribution is the Borel probability measure μ on R^n defined by
    μ(B) = P{ω : Y(ω) ∈ B},    B ∈ B_{R^n}.
One can even forget about the abstract probability space (Ω, F, P), and redefine Y on the concrete space (R^n, B_{R^n}, μ) as the identity random variable Y(s) = s for s ∈ R^n.
To generalize this notion to a process X = {X_t : 0 ≤ t < ∞}, we have to choose a suitable measurable space U so that X can be thought of as a measurable map X : Ω → U. For a fixed ω the value X(ω) is a function t ↦ X_t(ω), so the space U has to be a space of functions, or a path space. The path regularity of X determines which space U will do. Here are the three most important choices.
(i) Without any further assumptions, X is a measurable map into the product space (R^d)^{[0,∞)} with product σ-field B(R^d)^{⊗[0,∞)}.
(ii) If X is an R^d-valued cadlag process, then a suitable path space is D = D_{R^d}[0, ∞), the space of R^d-valued cadlag functions on [0, ∞), with the σ-algebra generated by the coordinate projections η ↦ η(t) from D into R^d. It is possible to define a metric on D that makes it a complete, separable metric space, and under which the Borel σ-algebra is the one generated by the coordinate mappings. Thus we can justifiably denote this σ-algebra by B_D.
(iii) If X is an R^d-valued continuous process, then X maps into C = C_{R^d}[0, ∞), the space of R^d-valued continuous functions on [0, ∞). This space is naturally metrized by
(2.19)    r(η, ζ) = ∑_{k=1}^∞ 2^{−k} ( 1 ∧ sup_{0≤t≤k} |η(t) − ζ(t)| ),    η, ζ ∈ C.
This is the metric of uniform convergence on compact sets. (C, r) is a complete, separable metric space, and its Borel σ-algebra B_C is generated by the coordinate mappings. C is a subspace of D, and indeed the notions of convergence and measurability in C coincide with the notions it inherits as a subspace of D.
Generating the σ-algebra of the path space with the coordinate functions guarantees that X is a measurable mapping from Ω into the path space (Exercise 1.7(b)). Then we can define the distribution μ of the process on the path space. For example, if X is cadlag, then define μ(B) = P{X ∈ B} for B ∈ B_D. As in the case of the random vector, we can switch probability spaces. Take (D, B_D, μ) as the new probability space, and define the process Y_t on D via the coordinate mappings: Y_t(η) = η(t) for η ∈ D. Then the old process X and the new process Y have the same distribution, because by definition
    P{X ∈ B} = μ(B) = μ{η ∈ D : η ∈ B} = μ{η ∈ D : Y(η) ∈ B} = μ{Y ∈ B}.
One benefit from this construction is that it leads naturally to a theory of weak convergence of processes, which is used in many applications. This comes from specializing the well-developed theory of weak convergence of probability measures on metric spaces to the case of a path space.
The two most important general classes of stochastic processes are martingales and Markov processes. Both classes are defined by the relationship of the process X = {X_t} to a filtration {F_t}. It is always first assumed that X_t is adapted to {F_t}.
Let X be a real-valued process. Then X is a martingale with respect to {F_t} if X_t is integrable for each t, and
    E[X_t | F_s] = X_s    for all s < t.
For the definition of a Markov process X can take its values in an abstract space, but R^d is sufficiently general for us. An R^d-valued process X is a Markov process with respect to {F_t} if
(2.20)    P[X_t ∈ B | F_s] = P[X_t ∈ B | X_s]    for all s < t and B ∈ B_{R^d}.
A martingale represents a fair gamble in the sense that, given all the information up to the present time s, the expectation of the future fortune X_t is the same as the current fortune X_s. Stochastic analysis relies heavily on martingale theory, parts of which are covered in the next chapter. The Markov property is a notion of causality. It says that, given the present state X_s, future events are independent of the past.
These notions are of course equally sensible in discrete time. Let us give the most basic example in discrete time, since that is simpler than continuous time. Later in this chapter we will have sophisticated continuous-time examples when we discuss Brownian motion and Poisson processes.
Example 2.21. (Random Walk) Let X_1, X_2, X_3, … be a sequence of i.i.d. random variables. Define the partial sums by S_0 = 0, and S_n = X_1 + ⋯ + X_n for n ≥ 1. Then S_n is a Markov chain (the term for a Markov process in discrete time). If E X_i = 0 then S_n is a martingale. The natural filtration to use here is the one generated by the process: F_n = σ{X_1, …, X_n}.
Martingales are treated thoroughly in Chapter 3. In the remainder of this section we discuss the Markov property and then the strong Markov property. These topics are not necessary for all that follows, but we do make use of the Markov property of Brownian motion for some calculations and exercises.
With Markov processes it is natural to consider the whole family of processes obtained by varying the initial state. In the previous example, to have the random walk start at x, we simply say S_0 = x and S_n = x + X_1 + ⋯ + X_n. The definition of such a family of processes is conveniently expressed in terms of probability distributions on a path space. Below we give the definition of a time-homogeneous cadlag Markov process. Time-homogeneity means that the transition mechanism does not change with time: given that the state of the process is x at time s, the chances of residing in set A at time s + t depend on (x, t, A) but not on s.
On the path space D, the coordinate variables are defined by X_t(η) = η(t) for η ∈ D, and the natural filtration is F_t = σ{X_s : s ≤ t}. We write also X(t) when subscripts are not convenient. The shift maps θ_s : D → D are defined by (θ_s η)(t) = η(s + t). In other words, the path θ_s η has its time origin translated to s, and the path before s deleted. For an event A ∈ B_D, the inverse image
    θ_s^{−1} A = {η ∈ D : θ_s η ∈ A}
represents the event that A happens starting at time s.
Definition 2.22. An R^d-valued Markov process is a collection {P^x : x ∈ R^d} of probability measures on D = D_{R^d}[0, ∞) with these properties:
(a) P^x{η ∈ D : η(0) = x} = 1.
(b) For each A ∈ B_D, the function x ↦ P^x(A) is measurable on R^d.
(c) P^x[θ_t^{−1} A | F_t](η) = P^{η(t)}(A) for P^x-almost every η ∈ D, for every x ∈ R^d and A ∈ B_D.
Requirement (a) in the definition says that x is the initial state under the measure P^x. You should think of P^x as the probability distribution of the entire process given that the initial state is x. Requirement (b) is for technical purposes. Requirement (c) is the Markov property. It appears qualitatively different from (2.20) because the event A can depend on the entire process, but in fact (c) would be just as powerful if it were stated with an event of the type A = {X_s ∈ B}. The general case can then be derived by first going inductively to finite-dimensional events, and then by a π-λ argument to all A ∈ B_D. In the context of Markov processes E^x stands for expectation under the measure P^x.
Parts (b) and (c) together imply (2.20), with the help of property (viii) of Theorem 1.26.
What (c) says is that, if we know that the process is in state y at time t, then regardless of the past, the future of the process behaves exactly as a new process started from y. Technically, conditional on X_t = y and the entire information in F_t, the probabilities of the future process {X_{t+u} : u ≥ 0} obey the measure P^y. Informally we say that at each time point the Markov process restarts itself from its current state, forgetting its past. (Exercise 2.14 practices this in a simple calculation.)
There is also a related semigroup property for a family of operators. On bounded measurable functions g on the state space, define operators S(t) by
(2.21)    S(t)g(x) = E^x[g(X_t)].
(The way to think about this is that S(t) maps g into a new function S(t)g, and the formula above tells us how to compute the values S(t)g(x).) S(0) is the identity operator: S(0)g = g. The semigroup property is S(s + t) = S(s)S(t), where the multiplication of S(s) and S(t) means composition. This can be checked with the Markov property:
    S(s + t)g(x) = E^x[g(X_{s+t})] = E^x[ E^x( g(X_{s+t}) | F_s ) ]
        = E^x[ E^{X(s)} g(X_t) ] = E^x[ (S(t)g)(X_s) ] = S(s)(S(t)g)(x).
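For a finite-state discrete-time chain, the semigroup property becomes a matrix identity: the operator S(t) is the t-step transition matrix P^t, and S(s + t) = S(s)S(t) is just P^{s+t} = P^s P^t. A sketch (an added illustration only; the chain P and function g are arbitrary choices):

```python
import numpy as np

# a 3-state chain; rows of P are the one-step transition probabilities
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])
g = np.array([1.0, -2.0, 5.0])          # a bounded function on the state space

def S(t):
    """Discrete-time analogue of (2.21): S(t)g(x) = E^x[g(X_t)] = (P^t g)(x)."""
    return np.linalg.matrix_power(P, t)

s, t = 3, 4
print(np.allclose(S(s + t), S(s) @ S(t)))            # semigroup: S(s+t) = S(s)S(t)
print(np.allclose(S(s + t) @ g, S(s) @ (S(t) @ g)))  # same identity applied to g
```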
Another question suggests itself. Statement (c) in Definition 2.22 restarts the process at a particular time t from the state y that the process is at. But since the Markov process is supposed to forget its past and transitions are time-homogeneous, should not the same restarting take place for example at the first visit to state y? This scenario is not covered by statement (c) because this first visit happens at a random time. So we ask whether we can replace the time t in part (c) with a stopping time τ. With some additional regularity the answer is affirmative. The property we are led to formulate is called the strong Markov property.
The additional regularity needed is that the probability distribution P^x(X_t ∈ ·) of the state at time t is a continuous function of the initial state x. Recall the notion of weak convergence of probability measures for spaces more general than R introduced below Definition 1.20. A Markov process {P^x} is called a Feller process if for all t ≥ 0, all x_j, x ∈ R^d, and all g ∈ C_b(R^d) (the space of bounded continuous functions on R^d)
(2.22)    x_j → x implies E^{x_j}[g(X_t)] → E^x[g(X_t)].
We formulate the strong Markov property more generally than the Markov property, for a function Y of both a time point and a path. This can be advantageous for applications. Again, the state space could really be any metric space, but for concreteness we think of R^d.
Theorem 2.23. Let {P^x} be a Feller process with state space R^d. Let Y(s, η) be a bounded, jointly measurable function of (s, η) ∈ R_+ × D and τ a stopping time on D. Let the initial state x be arbitrary. Then on the event {τ < ∞} the equality
(2.23)    E^x[ Y(τ, X∘θ_τ) | F_τ ](ω) = E^{ω(τ(ω))}[ Y(τ(ω), X) ]
holds for P^x-almost every path ω.
Before turning to the proof, let us sort out the meaning of statement (2.23). First, we introduced a random shift θ_τ defined by (θ_τ ω)(s) = ω(τ(ω) + s) for those paths ω for which τ(ω) < ∞. On both sides of the identity, inside the expectations, X denotes the identity random variable on D: X(ω) = ω. The purpose of writing Y(τ, X∘θ_τ) is to indicate that the path argument in Y has been translated by θ_τ, so that the expectation on the left genuinely concerns the future process after time τ. On the right the expectation should be read like this:
    E^{ω(τ(ω))}[ Y(τ(ω), X) ] = ∫_D Y(τ(ω), η) P^{ω(τ(ω))}(dη).
In other words, the first argument of Y on the right inherits the value τ(ω) which (note!) has been fixed by the conditioning on F_τ on the left side of (2.23). The path argument η obeys the probability measure P^{ω(τ(ω))}, in other words it is the process restarted from the current state ω(τ(ω)).
Example 2.24. Let us perform a simple calculation to illustrate the mechanics. Let τ = inf{t ≥ 0 : X_t = z or X_{t−} = z} be the first hitting time of point z, a stopping time by Lemma 2.9. Suppose the process is continuous, which we take to mean that P^x(C) = 1 for all x. Suppose further that P^x(τ < ∞) = 1. From this and path continuity P^x(X_τ = z) = 1. Let us check that P^x(X_{τ+t} ∈ A) = P^z(X_t ∈ A) by using (2.23). Take Y(s, ω) = Y(ω) = 1{ω(t) ∈ A} and then use X(τ) = z:
    P^x(X_{τ+t} ∈ A) = E^x[ Y∘θ_τ ] = E^x[ E^x( Y∘θ_τ | F_τ ) ]
        = E^x[ E^{X(τ)}(Y) ] = E^x[ E^z(Y) ] = E^z(Y) = P^z(X_t ∈ A).
The E^x-expectation goes away because the integrand E^z(Y) is no longer random, it is merely a constant.
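The restart property of Example 2.24 can be illustrated with a simple random walk (an added sketch, not from the text): after the first hitting time τ of level z, the walk viewed t steps later is distributed like a fresh walk run for t steps from z; in particular E[S_{τ+t}] = z.

```python
import numpy as np

rng = np.random.default_rng(5)
paths, n, z, t = 50_000, 400, 2, 5
steps = rng.choice([-1, 1], size=(paths, n))
S = np.concatenate([np.zeros((paths, 1), dtype=int),
                    np.cumsum(steps, axis=1)], axis=1)
hit = (S >= z).argmax(axis=1)                  # first time the walk reaches level z
ok = (S >= z).any(axis=1) & (hit + t < n)      # keep paths that hit z early enough
after = S[np.arange(paths)[ok], hit[ok] + t]   # S_{tau + t} on those paths
# strong Markov: S_{tau+t} is distributed like a t-step walk started from z
fresh = z + rng.choice([-1, 1], size=(ok.sum(), t)).sum(axis=1)
print(after.mean(), fresh.mean())              # both ≈ z = 2
```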
Remark 2.25. Up to now in our discussion we have used the filtration {F_t} on D or C generated by the coordinates. Our important examples, such as Brownian motion and the Poisson process, actually satisfy the Markov property under the larger filtration {F_{t+}}. The strong Markov property holds also for the larger filtration because the proof below goes through as long as the Markov property is true.
Proof of Theorem 2.23. Let A ∈ F_τ. What needs to be shown is that
(2.24)    E^x[ 1_A 1_{τ<∞} Y(τ, X∘θ_τ) ] = ∫_{A∩{τ<∞}} E^{ω(τ(ω))}[ Y(τ(ω), X) ] P^x(dω).
As so often with these proofs, we do the actual proof for a special case and then appeal to a general principle to complete the proof.
First we assume that all the possible finite values of τ can be arranged in an increasing sequence t_1 < t_2 < t_3 < ⋯. Then
    E^x[ 1_A 1_{τ<∞} Y(τ, X∘θ_τ) ] = ∑_n E^x[ 1_A 1_{τ=t_n} Y(τ, X∘θ_τ) ]
        = ∑_n E^x[ 1_A 1_{τ=t_n} Y(t_n, X∘θ_{t_n}) ]
        = ∑_n E^x[ 1_A 1_{τ=t_n} E^x( Y(t_n, X∘θ_{t_n}) | F_{t_n} ) ]
        = ∑_n E^x[ 1_A 1_{τ=t_n} E^{ω(t_n)} Y(t_n, X) ]
        = E^x[ 1_A 1_{τ<∞} E^{ω(τ)} Y(τ(ω), X) ],
where we used the basic Markov property and wrote ω for the path variable in the last two E^x-expectations in order not to mix this up with the variable X inside the E^{ω(τ)}-expectation.
Now let τ be a general stopping time. The next stage is to define τ_n = 2^{−n}([2^n τ] + 1). Check that these are stopping times that satisfy {τ_n < ∞} = {τ < ∞} and τ_n ↓ τ as n → ∞. Since τ_n > τ, A ∈ F_τ ⊆ F_{τ_n}. The possible finite values of τ_n are {2^{−n}k : k ∈ N}, and so we already know the result for τ_n:
(2.25)    E^x[ 1_A 1_{τ<∞} Y(τ_n, X∘θ_{τ_n}) ] = ∫_{A∩{τ<∞}} E^{ω(τ_n(ω))}[ Y(τ_n(ω), X) ] P^x(dω).
The idea is to let n → ∞ above and argue that in the limit we recover (2.24). For this we need some continuity. We take Y of the following type:
(2.26)    Y(s, η) = f_0(s) ∏_{i=1}^m f_i(η(s_i))
where 0 ≤ s_1 < ⋯ < s_m are time points and f_0, f_1, …, f_m are bounded continuous functions. Then on the left of (2.25) we have inside the expectation
    Y(τ_n, θ_{τ_n}ω) = f_0(τ_n) ∏_{i=1}^m f_i(ω(τ_n + s_i)) → f_0(τ) ∏_{i=1}^m f_i(ω(τ + s_i)) = Y(τ, θ_τ ω)    as n → ∞.
The limit above used the right continuity of the path ω. Dominated convergence will now take the left side of (2.25) to the left side of (2.24).
On the right of (2.25) we have inside the integral
    E^{ω(τ_n)}[ Y(τ_n(ω), X) ] = ∫ Y(τ_n(ω), η) P^{ω(τ_n)}(dη) = f_0(τ_n(ω)) ∫ ∏_{i=1}^m f_i(η(s_i)) P^{ω(τ_n)}(dη).
In order to assert that the above converges to the integrand on the right side of (2.24), we need to prove the continuity of
    f_0(s) ∫ ∏_{i=1}^m f_i(η(s_i)) P^x(dη)
as a function of (s, x) ∈ R_+ × R^d. Since products preserve continuity, what really is needed is the extension of the definition (2.22) of the Feller property to expectations of more complicated continuous functions of the path. We leave this to the reader (Exercise 2.15). Now dominated convergence again takes the right side of (2.25) to the right side of (2.24).
We have verified (2.24) for Y of type (2.26), and it remains to argue that this extends to all bounded measurable Y. This will follow from Theorem B.2. The class in that theorem consists of sets of the type B = {(s, η) : s ∈ A_0, η(s_1) ∈ A_1, …, η(s_m) ∈ A_m} where A_0 ⊆ R_+ and A_i ⊆ R^d are closed sets. (2.24) holds for 1_B because we can find continuous 0 ≤ f_{i,n} ≤ 1 such that f_{i,n} ↓ 1_{A_i} as n → ∞, and then the corresponding Y_n ↓ 1_B and (2.24) extends by dominated convergence. Sets of type B generate B_{R_+} ⊗ B_D because this σ-algebra is generated by coordinate functions. This completes the proof of the strong Markov property for Feller processes. □
Next we discuss the two most important processes, Brownian motion
and the Poisson process.
2.4. Brownian motion
Definition 2.26. On some probability space (Ω, F, P), let {F_t} be a filtration and B = {B_t : 0 ≤ t < ∞} an adapted real-valued stochastic process. Then B is a one-dimensional Brownian motion with respect to {F_t} if it has these two properties.
(i) For almost every ω, the path t ↦ B_t(ω) is continuous.
(ii) For 0 ≤ s < t, B_t − B_s is independent of F_s and has normal distribution with mean zero and variance t − s.
If furthermore
(iii) B_0 = 0 almost surely,
then B is a standard Brownian motion. Since the definition involves both the process and the filtration, sometimes one calls this B an {F_t}-Brownian motion, or the pair {B_t, F_t : 0 ≤ t < ∞} is called the Brownian motion.
To be explicit, point (ii) of the definition can be expressed by the requirement
    E[ Z h(B_t − B_s) ] = E(Z) · (1/√(2π(t − s))) ∫_R h(x) exp( −x²/(2(t − s)) ) dx
for all bounded F_s-measurable random variables Z and all bounded Borel functions h on R. By an inductive argument, it follows that for any 0 ≤ s_0 < s_1 < ⋯ < s_n, the increments
    B_{s_1} − B_{s_0}, B_{s_2} − B_{s_1}, …, B_{s_n} − B_{s_{n−1}}
are independent random variables and independent of F_{s_0} (Exercise 2.16). Furthermore, the joint distribution of the increments is not changed by a shift in time: namely, the joint distribution of the increments above is the same as the joint distribution of the increments
    B_{t+s_1} − B_{t+s_0}, B_{t+s_2} − B_{t+s_1}, …, B_{t+s_n} − B_{t+s_{n−1}}
for any t ≥ 0. These two points are summarized by saying that Brownian motion has stationary, independent increments.
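These two properties can be checked on simulated paths (an added illustration; the discretization parameters are arbitrary): an increment over (s, t] should have variance t − s, and increments over disjoint intervals should be uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(3)
paths, n, T = 50_000, 100, 1.0
dt = T / n
# many Brownian paths sampled on a uniform grid of [0, T]
B = np.concatenate([np.zeros((paths, 1)),
                    np.cumsum(rng.normal(0, np.sqrt(dt), (paths, n)), axis=1)],
                   axis=1)

inc1 = B[:, 30] - B[:, 10]             # increment over (0.1, 0.3]
inc2 = B[:, 80] - B[:, 50]             # increment over (0.5, 0.8], disjoint
print(inc1.var())                      # ≈ 0.2 = t - s
print(np.corrcoef(inc1, inc2)[0, 1])   # ≈ 0: independent increments
```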
A d-dimensional standard Brownian motion is an R^d-valued process B_t = (B_t^1, …, B_t^d) with the property that each component B_t^i is a one-dimensional standard Brownian motion (relative to the underlying filtration {F_t}), and the coordinates B^1, B^2, …, B^d are independent. This is equivalent to requiring that
(i) B_0 = 0 almost surely;
(ii) for almost every ω, the path t ↦ B_t(ω) is continuous;
(iii) for 0 ≤ s < t, B_t − B_s is independent of F_s, and has multivariate normal distribution with mean zero and covariance matrix (t − s) I_{d×d}.
Above, I_{d×d} is the d × d identity matrix.
To create a Brownian motion B_t with a more general initial distribution μ (the probability distribution of B_0), take a standard Brownian motion (B̃_t, F̃_t) and a μ-distributed random variable X independent of F̃_∞, and define B_t = X + B̃_t. The filtration is now F_t = σ{X, F̃_t}. Since B_t − B_s = B̃_t − B̃_s, F̃_∞ is independent of X, and B̃_t − B̃_s is independent of F̃_s, Exercise 1.10(c) implies that B_t − B_s is independent of F_s. Conversely, if a process B_t satisfies parts (i) and (ii) of Definition 2.26, then B̃_t = B_t − B_0 is a standard Brownian motion, independent of B_0.
The construction (proof of existence) of Brownian motion is rather technical, and hence relegated to Section B.2 in the Appendix. For the underlying probability space the construction uses the canonical path space C = C_R[0, ∞). Let B_t(ω) = ω(t) be the coordinate projections on C, and F_t^B = σ{B_s : 0 ≤ s ≤ t} the filtration generated by the coordinate process.
Theorem 2.27. There exists a Borel probability measure P^0 on the path space C = C_R[0, ∞) such that the process B = {B_t : 0 ≤ t < ∞} on the probability space (C, B_C, P^0) is a standard one-dimensional Brownian motion with respect to the filtration {F_t^B}.
The proof of this existence theorem relies on the Kolmogorov Extension Theorem 1.28. The probability measure P^0 on C constructed in the theorem is called Wiener measure. Brownian motion itself is sometimes also called the Wiener process. Once we know that Brownian motion starting at the origin exists, we can construct Brownian motion with an arbitrary initial point (random or deterministic) following the description after Definition 2.26.
The construction gives us the following regularity property of paths. Fix $0 < \gamma < \frac12$. For $P^0$-almost every $\omega \in C$,
\[
\sup_{0 \le s < t \le T} \frac{|B_t(\omega) - B_s(\omega)|}{|t - s|^{\gamma}} < \infty \quad \text{for all } T < \infty. \tag{2.27}
\]
We shall show later in this section that this property is not true for $\gamma > \frac12$. For $\gamma = 1$ this property would be called local Lipschitz continuity, while for $0 < \gamma < 1$ it is called local Hölder continuity with exponent $\gamma$. What happens if $\gamma > 1$?
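As an illustrative aside (not part of the text), the dichotomy between exponents $\gamma < \frac12$ and $\gamma > \frac12$ can be seen numerically. The pure-Python sketch below simulates standard Brownian motion on the grid $\{i/n\}$ and records the largest Hölder quotient over adjacent grid increments; heuristically this quotient grows like $n^{\gamma - 1/2}$ for $\gamma > \frac12$ and stays of constant order for $\gamma < \frac12$. All parameter choices (grid sizes, seeds, exponents) are arbitrary choices of this sketch.

```python
import math
import random

def max_grid_quotient(n, gamma, seed):
    """Simulate standard BM at the grid points i/n and return the largest
    quotient |B_{(i+1)/n} - B_{i/n}| / (1/n)**gamma over adjacent increments."""
    rng = random.Random(seed)
    dt = 1.0 / n
    biggest = 0.0
    for _ in range(n):
        inc = abs(rng.gauss(0.0, math.sqrt(dt)))  # increment ~ N(0, dt)
        biggest = max(biggest, inc / dt**gamma)
    return biggest

def avg_quotient(n, gamma, nseeds=5):
    # average over a few seeds to tame the randomness of the maximum
    return sum(max_grid_quotient(n, gamma, s) for s in range(nseeds)) / nseeds

# gamma > 1/2: the quotient blows up as the grid refines;
# gamma < 1/2: it stays of the same order.
q_lo_coarse, q_lo_fine = avg_quotient(256, 0.25), avg_quotient(16384, 0.25)
q_hi_coarse, q_hi_fine = avg_quotient(256, 0.75), avg_quotient(16384, 0.75)
```

With these grid sizes the $\gamma = 0.75$ quotient should grow by roughly a factor $64^{0.25} \approx 2.8$ between the two resolutions, while the $\gamma = 0.25$ quotient shrinks.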
2.4.1. Properties of Brownian motion. We discuss here mainly properties of the one-dimensional case. The multidimensional versions of the matters we discuss follow naturally from the one-dimensional case. We shall use the term Brownian motion to denote a process that satisfies (i) and (ii) of Definition 2.26, and call it standard if also $B_0 = 0$.

A fundamental property of Brownian motion is that it is both a martingale and a Markov process.

Proposition 2.28. Suppose $B = \{B_t\}$ is a Brownian motion with respect to a filtration $\{\mathcal F_t\}$ on $(\Omega, \mathcal F, P)$. Then $B_t$ and $B_t^2 - t$ are martingales with respect to $\{\mathcal F_t\}$.
Proof. Follows from the properties of Brownian increments and basic properties of conditional expectations. Let $s < t$.
\[
E[B_t \mid \mathcal F_s] = E[B_t - B_s \mid \mathcal F_s] + E[B_s \mid \mathcal F_s] = B_s,
\]
and
\[
\begin{aligned}
E[B_t^2 \mid \mathcal F_s] &= E[(B_t - B_s + B_s)^2 \mid \mathcal F_s] \\
&= E[(B_t - B_s)^2 \mid \mathcal F_s] + 2 B_s\, E[B_t - B_s \mid \mathcal F_s] + B_s^2 \\
&= (t - s) + B_s^2. \qquad \square
\end{aligned}
\]
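As a quick numerical aside (not from the text), the two identities used in this proof can be checked by Monte Carlo: the increment $B_t - B_s$ is uncorrelated with $B_s$, and $B_t^2 - t$ has mean zero. The sketch below is a minimal seeded simulation; the tolerances are loose multiples of the standard errors.

```python
import math
import random

rng = random.Random(1)
N, s, t = 40000, 0.5, 1.0
sum_cross = 0.0   # accumulates B_s * (B_t - B_s); expectation is 0 by independence
sum_mart = 0.0    # accumulates B_t^2 - t; expectation is 0 for the martingale
for _ in range(N):
    bs = rng.gauss(0.0, math.sqrt(s))            # B_s ~ N(0, s)
    bt = bs + rng.gauss(0.0, math.sqrt(t - s))   # independent increment of variance t - s
    sum_cross += bs * (bt - bs)
    sum_mart += bt * bt - t
m_cross = sum_cross / N
m_mart = sum_mart / N
```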
Next we show that Brownian motion restarts itself independently of the past. This is the heart of the Markov property. Also, it is useful to know that the filtration of a Brownian motion can always be both augmented with the null events and made right-continuous.

Proposition 2.29. Suppose $B = \{B_t\}$ is a Brownian motion with respect to a filtration $\{\mathcal F_t\}$ on $(\Omega, \mathcal F, P)$.

(a) We can assume that $\mathcal F_t$ contains every set $A$ for which there exists an event $N \in \mathcal F$ such that $A \subseteq N$ and $P(N) = 0$. (This is the notion of a complete or augmented filtration introduced earlier.) Furthermore, $B = \{B_t\}$ is also a Brownian motion with respect to the right-continuous filtration $\{\bar{\mathcal F}_{t+}\}$.
(b) Fix $s \in \mathbb R_+$ and define $Y_t = B_{s+t} - B_s$. Then the process $Y = \{Y_t : 0 \le t < \infty\}$ is independent of $\mathcal F_{s+}$ and it is a standard Brownian motion with respect to the filtration $\{\mathcal G_t\}$ defined by $\mathcal G_t = \mathcal F_{(s+t)+}$.
Proof. Part (a). Definition (2.2) shows how to complete the filtration. Of course, the adaptedness of $B$ to the filtration is not harmed by enlarging the filtration; the issue is the independence of $\bar{\mathcal F}_s$ and $B_t - B_s$. If $G \in \mathcal F$ has $A \in \mathcal F_s$ such that $P(A \triangle G) = 0$, then $P(G \cap H) = P(A \cap H)$ for any event $H$. In particular, the independence of $\bar{\mathcal F}_s$ from $B_t - B_s$ follows.
The rest follows from a single calculation. Fix $s \ge 0$ and $0 = t_0 < t_1 < t_2 < \cdots < t_n$, and for $h \ge 0$ abbreviate
\[
\xi(h) = \bigl( B_{s+h+t_1} - B_{s+h},\; B_{s+h+t_2} - B_{s+h+t_1},\; \dots,\; B_{s+h+t_n} - B_{s+h+t_{n-1}} \bigr)
\]
for a vector of Brownian increments. Let $Z$ be a bounded $\mathcal F_{s+}$-measurable random variable. For each $h > 0$, $Z$ is $\mathcal F_{s+h}$-measurable and by the independence property of Brownian motion (Exercise 2.16), $\xi(h)$ is independent of $Z$. Let $f$ be a bounded continuous function on $\mathbb R^n$. By path continuity, independence, and finally the stationarity of Brownian increments,
\[
\begin{aligned}
E\bigl[ Z\, f(\xi(0)) \bigr] &= \lim_{h \searrow 0} E\bigl[ Z\, f(\xi(h)) \bigr] = \lim_{h \searrow 0} EZ \cdot E\bigl[ f(\xi(h)) \bigr] \\
&= EZ \cdot E\bigl[ f(\xi(0)) \bigr] = EZ \cdot E\bigl[ f(B_{t_1} - B_0,\, B_{t_2} - B_{t_1},\, \dots,\, B_{t_n} - B_{t_{n-1}}) \bigr].
\end{aligned} \tag{2.28}
\]
The equality of the first and last members of the above calculation extends by Lemma B.4 from continuous $f$ to bounded Borel $f$.
Now we can harvest the conclusions. The independence of $\mathcal F_{s+}$ and $B_t - B_s$ is contained in (2.28), so the fact that $B$ is a Brownian motion with respect to $\{\mathcal F_{t+}\}$ has been proved. Secondly, since $\xi(0) = (Y_{t_1}, Y_{t_2} - Y_{t_1}, \dots, Y_{t_n} - Y_{t_{n-1}})$ and the vector $\eta = (Y_{t_1}, Y_{t_2}, \dots, Y_{t_n})$ is a function of $\xi(0)$, we conclude that $\eta$ is independent of $\mathcal F_{s+}$. This being true for all choices of time points $0 < t_1 < t_2 < \cdots < t_n$ implies that the entire process $Y$ is independent of $\mathcal F_{s+}$, and the last member of (2.28) shows that given $\mathcal F_{s+}$, $Y$ has the distribution of standard Brownian motion.

Finally, the independence of $Y_{t_2} - Y_{t_1}$ and $\mathcal G_{t_1}$ is the same as the independence of $B_{s+t_2} - B_{s+t_1}$ and $\mathcal F_{(s+t_1)+}$, which was already argued. $\square$
Parts (a) and (b) of the proposition together assert that $Y$ is a standard Brownian motion, independent of $\bar{\mathcal F}_{t+}$, the filtration obtained by replacing $\{\mathcal F_t\}$ with the augmented right-continuous version. (The order of the two operations on the filtration is immaterial; in other words the $\sigma$-algebra $\bigcap_{s : s > t} \bar{\mathcal F}_s$ agrees with the augmentation of $\bigcap_{s : s > t} \mathcal F_s$, see Exercise 2.4.)
The key calculation (2.28) of the previous proof only used right-continuity. Thus the same argument gives us this lemma that we can apply to other processes.

Lemma 2.30. Suppose $X = (X_t : t \in \mathbb R_+)$ is a right-continuous process adapted to a filtration $\{\mathcal F_t\}$, and for all $s < t$ the increment $X_t - X_s$ is independent of $\mathcal F_s$. Then $X_t - X_s$ is also independent of $\bar{\mathcal F}_{s+}$.
Next we develop some properties of Brownian motion by concentrating on the canonical setting. The underlying probability space is the path space $C = C_{\mathbb R}[0, \infty)$ with the coordinate process $B_t(\omega) = \omega(t)$ and the filtration $\mathcal F^B_t = \sigma\{B_s : 0 \le s \le t\}$ generated by the coordinates. For each $x \in \mathbb R$ there is a probability measure $P^x$ on $C$ under which $B = \{B_t\}$ is Brownian motion started at $x$. Expectation under $P^x$ is denoted by $E^x$ and satisfies
\[
E^x[H] = E^0[H(x + B)]
\]
for any bounded $\mathcal B_C$-measurable function $H$. On the right, $x + B$ is a sum of a point and a process, interpreted as the process whose value at time $t$ is $x + B_t$. (In Theorem 2.27 we constructed $P^0$, and the equality above can be taken as the definition of $P^x$.)

On $C$ we have the shift maps $\{\theta_s : 0 \le s < \infty\}$ defined by $(\theta_s \omega)(t) = \omega(s + t)$ that move the time origin to $s$. The shift acts on the process $B$ by $\theta_s B = \{B_{s+t} : t \ge 0\}$.
A consequence of Proposition 2.29(a) is that the coordinate process $B$ is a Brownian motion also relative to the larger filtration $\mathcal F^B_{t+} = \bigcap_{s : s > t} \mathcal F^B_s$. We shall show that members of $\mathcal F^B_t$ and $\mathcal F^B_{t+}$ differ only by null sets. (These $\sigma$-algebras are different, see Exercise 2.10.) This will have interesting consequences when we take $t = 0$. We begin with the Markov property.

Proposition 2.31. Let $H$ be a bounded $\mathcal B_C$-measurable function on $C$.

(a) $E^x[H]$ is a Borel measurable function of $x$.

(b) For each $x \in \mathbb R$,
\[
E^x[H \circ \theta_s \mid \mathcal F^B_{s+}](\omega) = E^{B_s(\omega)}[H] \quad \text{for $P^x$-almost every } \omega. \tag{2.29}
\]
Proof. Part (a). Suppose we knew that $x \mapsto P^x(F)$ is measurable for each closed set $F \subseteq C$. Then the $\pi$-$\lambda$ Theorem B.1 implies that $x \mapsto P^x(A)$ is measurable for each $A \in \mathcal B_C$ (fill in the details for this claim as an exercise). Since linear combinations and limits preserve measurability, it follows that $x \mapsto E^x[H]$ is measurable for any bounded $\mathcal B_C$-measurable function $H$.

To show that $x \mapsto P^x(F)$ is measurable for each closed set $F$, consider first a bounded continuous function $H$ on $C$. (Recall that $C$ is metrized by the metric (2.19) of uniform convergence on compact intervals.) If $x_j \to x$ in $\mathbb R$, then by continuity and dominated convergence,
\[
E^{x_j}[H] = E^0[H(x_j + B)] \longrightarrow E^0[H(x + B)] = E^x[H],
\]
so $E^x[H]$ is continuous in $x$, which makes it Borel measurable. The indicator function $\mathbf 1_F$ of a closed set can be written as a bounded pointwise limit of continuous functions $H_n$ (see (B.2) in the appendix). So it follows that
\[
P^x(F) = \lim_{n \to \infty} E^x[H_n]
\]
is also Borel measurable in $x$.
Part (b). We can write the shifted process as $\theta_s B = B_s + Y$ where $Y_t = B_{s+t} - B_s$. Let $Z$ be a bounded $\mathcal F^B_{s+}$-measurable random variable. By Proposition 2.29(b), $Y$ is a standard Brownian motion, independent of $(Z, B_s)$ because the latter pair is $\mathcal F^B_{s+}$-measurable. Consequently
\[
E^x\bigl[ Z\, H(\theta_s B) \bigr] = E^x\bigl[ Z\, H(B_s + Y) \bigr]
= \int_C E^x\bigl[ Z\, H(B_s + \omega) \bigr]\, P^0(d\omega).
\]
By independence the expectation over $Y$ can be separated from the expectation over $(Z, B_s)$ (this is justified in Exercise 1.12). $P^0$ is the distribution of $Y$ because $Y$ is a standard Brownian motion. Next move the $P^0(d\omega)$ integral back inside, and observe that
\[
\int_C H(y + \omega)\, P^0(d\omega) = E^y[H]
\]
for any point $y$, including $y = B_s(\omega)$. This gives
\[
E^x\bigl[ Z\, H(\theta_s B) \bigr] = E^x\bigl[ Z\, E^{B_s}(H) \bigr].
\]
The proof is complete. $\square$
Proposition 2.32. Let $H$ be a bounded $\mathcal B_C$-measurable function on $C$. Then for any $x \in \mathbb R$ and $0 \le s < \infty$,
\[
E^x[H \mid \mathcal F^B_{s+}] = E^x[H \mid \mathcal F^B_s] \quad P^x\text{-almost surely.} \tag{2.30}
\]

Proof. Suppose first $H$ is of the type
\[
H(\omega) = \prod_{i=1}^{n} \mathbf 1_{A_i}(\omega(t_i))
\]
for some $0 \le t_1 < t_2 < \cdots < t_n$ and $A_i \in \mathcal B_{\mathbb R}$. By separating those factors where $t_i \le s$, we can write $H = H_1 \cdot (H_2 \circ \theta_s)$ where $H_1$ is $\mathcal F^B_s$-measurable.
Then
\[
E^x[H \mid \mathcal F^B_{s+}] = H_1\, E^x[H_2 \circ \theta_s \mid \mathcal F^B_{s+}] = H_1\, E^{B_s}[H_2]
\]
which is $\mathcal F^B_s$-measurable. Since $\mathcal F^B_{s+}$ contains $\mathcal F^B_s$, (2.30) follows from property (viii) of conditional expectations given in Theorem 1.26.

Let $\mathcal H$ be the collection of bounded functions $H$ for which (2.30) holds. By the linearity and the monotone convergence theorem for conditional expectations (Theorem B.12), $\mathcal H$ satisfies the hypotheses of Theorem B.2. For the $\pi$-system $\mathcal S$ needed for Theorem B.2 take the class of events of the form
\[
\{\omega : \omega(t_1) \in A_1, \dots, \omega(t_n) \in A_n\}
\]
for $0 \le t_1 < t_2 < \cdots < t_n$ and $A_i \in \mathcal B_{\mathbb R}$. We checked above that indicator functions of these sets lie in $\mathcal H$. Furthermore, these sets generate $\mathcal B_C$ because $\mathcal B_C$ is generated by coordinate projections. By Theorem B.2, $\mathcal H$ contains all bounded $\mathcal B_C$-measurable functions. $\square$
Corollary 2.33. If $A \in \mathcal F^B_{t+}$ then there exists $B \in \mathcal F^B_t$ such that $P^x(A \triangle B) = 0$.

Proof. Let $Y = E^x(\mathbf 1_A \mid \mathcal F^B_t)$, and $B = \{Y = 1\} \in \mathcal F^B_t$. The event $A \triangle B$ is contained in $\{\mathbf 1_A \ne Y\}$, because $\omega \in A \setminus B$ implies $\mathbf 1_A(\omega) = 1 \ne Y(\omega)$, while $\omega \in B \setminus A$ implies $Y(\omega) = 1 \ne 0 = \mathbf 1_A(\omega)$. By (2.30), $\mathbf 1_A = Y$ $P^x$-almost surely. Hence $\{\mathbf 1_A \ne Y\}$, and thereby $A \triangle B$, is a $P^x$-null event. $\square$
Corollary 2.34. (Blumenthal's 0–1 Law) Let $x \in \mathbb R$. Then for $A \in \mathcal F^B_{0+}$, $P^x(A)$ is 0 or 1.

Proof. The $\sigma$-algebra $\mathcal F^B_0$ satisfies the 0–1 law under $P^x$, because $P^x\{B_0 \in G\} = \mathbf 1_G(x)$. Then every $P^x$-conditional expectation with respect to $\mathcal F^B_0$ equals the expectation (Exercise 1.15). The following equalities are valid $P^x$-almost surely for $A \in \mathcal F^B_{0+}$:
\[
\mathbf 1_A = E^x(\mathbf 1_A \mid \mathcal F^B_{0+}) = E^x(\mathbf 1_A \mid \mathcal F^B_0) = P^x(A).
\]
Thus there must exist points $\omega \in C$ such that $\mathbf 1_A(\omega) = P^x(A)$, and so the only possible values for $P^x(A)$ are 0 and 1. $\square$
From the 0–1 law we get a fact that suggests something about the fast oscillation of Brownian motion: if it starts at the origin, then in any nontrivial time interval $(0, \varepsilon)$ the process is both positive and negative, and hence by continuity also zero. To make this precise, define
\[
\sigma = \inf\{t > 0 : B_t > 0\}, \quad \tau = \inf\{t > 0 : B_t < 0\},
\quad \text{and} \quad T_0 = \inf\{t > 0 : B_t = 0\}. \tag{2.31}
\]

Corollary 2.35. $P^0$-almost surely, $\sigma = \tau = T_0 = 0$.
Proof. To see that the event $\{\sigma = 0\}$ lies in $\mathcal F^B_{0+}$, write
\[
\{\sigma = 0\} = \bigcap_{m = n}^{\infty} \bigl\{ B_q > 0 \text{ for some rational } q \in (0, \tfrac1m) \bigr\} \in \mathcal F^B_{1/n}.
\]
Since this is true for every $n \in \mathbb N$, $\{\sigma = 0\} \in \bigcap_{n \in \mathbb N} \mathcal F^B_{1/n} = \mathcal F^B_{0+}$. The same argument shows $\{\tau = 0\} \in \mathcal F^B_{0+}$.

Since each variable $B_t$ is a centered Gaussian,
\[
P^0\{\sigma \le \tfrac1m\} \ge P^0\{B_{1/m} > 0\} = \tfrac12,
\]
and so
\[
P^0\{\sigma = 0\} = \lim_{m \to \infty} P^0\{\sigma \le \tfrac1m\} \ge \tfrac12.
\]
The convergence of the probability happens because the events $\{\sigma \le \tfrac1m\}$ shrink down to $\{\sigma = 0\}$ as $m \to \infty$. By Blumenthal's 0–1 law, $P^0\{\sigma = 0\} = 0$ or 1, so this quantity has to be 1. Again, the same argument works for $\tau$.

Finally, fix $\omega$ so that $\sigma(\omega) = \tau(\omega) = 0$, and let $\varepsilon > 0$. Then there exist $s, t \in (0, \varepsilon)$ such that $B_s(\omega) < 0 < B_t(\omega)$. By continuity, $B_u(\omega) = 0$ for some $u$ between $s$ and $t$. Hence $T_0(\omega) < \varepsilon$. Since $\varepsilon > 0$ can be taken arbitrarily small, $T_0(\omega) = 0$. $\square$
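As a numerical aside (not from the text), the fast oscillation asserted by this corollary is easy to see in simulation: discretize a tiny time window $(0, \varepsilon]$ and count how often a path takes both signs before time $\varepsilon$. The window size, step count, and tolerance below are arbitrary choices of this sketch.

```python
import math
import random

rng = random.Random(9)
eps, n, R = 0.01, 500, 2000   # time window (0, eps], steps per path, number of paths
dt = eps / n
both = 0
for _ in range(R):
    B = 0.0
    seen_pos = seen_neg = False
    for _ in range(n):
        B += rng.gauss(0.0, math.sqrt(dt))
        seen_pos = seen_pos or B > 0
        seen_neg = seen_neg or B < 0
    if seen_pos and seen_neg:
        both += 1
frac = both / R   # fraction of paths that are both positive and negative before eps
```

Even with only 500 time steps the overwhelming majority of paths change sign inside $(0, 0.01)$, consistent with $\sigma = \tau = 0$ almost surely.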
Identity (2.29) already verified that Brownian motion is a Markov process, also under the larger filtration $\mathcal G_t = \mathcal F^B_{t+}$. Let us strengthen that next.

Proposition 2.36. Brownian motion is a Feller process, and consequently the strong Markov property is valid under the filtration $\{\mathcal G_t\}$.

Proof. It only remains to observe the Feller property: for $g \in C_b(\mathbb R)$,
\[
E^x[g(B_t)] = \frac{1}{\sqrt{2\pi t}} \int_{\mathbb R} g(x + y)\, \exp\Bigl( -\frac{y^2}{2t} \Bigr)\, dy,
\]
and the continuity as a function of $x$ is clear by dominated convergence. $\square$
A natural way to understand the strong Markov property of Brownian motion is that, on the event $\{\tau < \infty\}$, the process $\widetilde B_t = B_{\tau + t} - B_\tau$ is a standard Brownian motion, independent of $\mathcal G_\tau$. Formally we can extract this point from the statement of the strong Markov property as follows. Given a bounded measurable function $g$ on the path space $C$, define $h$ by $h(\omega) = g(\omega - \omega(0))$. Then
\[
E^x[g(\widetilde B) \mid \mathcal G_\tau](\omega) = E^x[h \circ \theta_\tau \mid \mathcal G_\tau](\omega) = E^{\omega(\tau)}[h] = E^0[g].
\]
The last equality comes from the definition of the measures $P^x$:
\[
E^x[h] = E^0[h(x + B)] = E^0[g(x + B - x)] = E^0[g].
\]
The Markov and strong Markov properties are valid also for $d$-dimensional Brownian motion. The definitions and proofs are straightforward extensions.

Continuing still in 1 dimension, a favorite application of the strong Markov property of Brownian motion is the reflection principle, which is useful for calculations. Define the running maximum of Brownian motion by
\[
M_t = \sup_{0 \le s \le t} B_s. \tag{2.32}
\]

Proposition 2.37. (Reflection principle) Let $a \le b$ and $b > 0$ be real numbers. Then
\[
P^0(B_t \le a,\; M_t \ge b) = P^0(B_t \ge 2b - a). \tag{2.33}
\]

Each inequality in the statement above can be either strict or weak because $B_t$ has a continuous distribution (Exercise 2.18).
Proof. Let
\[
\tau_b = \inf\{t \ge 0 : B_t = b\}
\]
be the hitting time of point $b$. By path continuity $\{M_t \ge b\}$ is equivalent to $\{\tau_b \le t\}$. In the next calculation we put the $\omega$ in explicitly to indicate which quantities are random for the outer expectation but constant for the inner probability.
\[
\begin{aligned}
P^0(B_t(\omega) \le a,\; M_t(\omega) \ge b) &= P^0(\tau_b(\omega) \le t,\; B_t(\omega) \le a) \\
&= E^0\bigl[ \mathbf 1\{\tau_b(\omega) \le t\}\, P^0(B_t \le a \mid \mathcal F_{\tau_b})(\omega) \bigr] \\
&= E^0\bigl[ \mathbf 1\{\tau_b(\omega) \le t\}\, P^b(B_{t - \tau_b(\omega)} \le a) \bigr] \\
&= E^0\bigl[ \mathbf 1\{\tau_b(\omega) \le t\}\, P^b(B_{t - \tau_b(\omega)} \ge 2b - a) \bigr] \\
&= P^0(B_t(\omega) \ge 2b - a,\; M_t(\omega) \ge b) = P^0(B_t(\omega) \ge 2b - a).
\end{aligned}
\]
On the third and the fourth line $\tau_b(\omega)$ appears in two places, and it is a constant in the inner probability. Then comes the reflection: by symmetry, Brownian motion started at $b$ is equally likely to reside below $a$ as above $b + (b - a) = 2b - a$. The last equality drops the condition on $M_t$ that is now superfluous because $2b - a \ge b$.
The reader may feel a little uncomfortable about the cavalier way of handling the strong Markov property. Let us firm it up by introducing explicitly a $Y(s, \omega)$ that allows us to mechanically apply (2.23):
\[
Y(s, \omega) = \mathbf 1\{s \le t,\; \omega(t - s) \ge 2b - a\} - \mathbf 1\{s \le t,\; \omega(t - s) \le a\}.
\]
By symmetry of Brownian motion, for any $s \le t$,
\[
\begin{aligned}
E^b\, Y(s, B) &= P^b\{B_{t-s} \ge 2b - a\} - P^b\{B_{t-s} \le a\} \\
&= P^0\{B_{t-s} \ge b - a\} - P^0\{B_{t-s} \le a - b\} = 0.
\end{aligned}
\]
Consequently, since
\[
Y(\tau_b, \theta_{\tau_b}\omega) = \mathbf 1\{\tau_b \le t,\; \omega(t) \ge 2b - a\} - \mathbf 1\{\tau_b \le t,\; \omega(t) \le a\},
\]
we have, writing $X$ for an identity mapping on $C$ in the innermost expectation,
\[
\begin{aligned}
0 &= E^0\bigl[ \mathbf 1\{\tau_b \le t\}\, E^b\, Y(\tau_b, X) \bigr]
= E^0\bigl[ \mathbf 1\{\tau_b \le t\}\, E^0\bigl( Y(\tau_b, \theta_{\tau_b} B) \mid \mathcal F_{\tau_b} \bigr) \bigr] \\
&= E^0\bigl[ \mathbf 1\{\tau_b \le t\}\, Y(\tau_b, \theta_{\tau_b} B) \bigr]
= P^0\{\tau_b \le t,\; B_t \ge 2b - a\} - P^0\{\tau_b \le t,\; B_t \le a\} \\
&= P^0\{B_t \ge 2b - a\} - P^0\{\tau_b \le t,\; B_t \le a\}.
\end{aligned}
\]
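As an illustrative aside (not part of the text), identity (2.33) can be checked by a crude Monte Carlo simulation. The right side is computed exactly from the Gaussian tail via `math.erfc`; the left side is estimated on a discrete grid, which systematically undercounts barrier crossings a little, so the tolerance below is deliberately loose. All parameters are arbitrary choices of this sketch.

```python
import math
import random

rng = random.Random(7)
a, b, t = 0.0, 1.0, 1.0
n, N = 256, 8000             # time steps per path, number of paths
dt = t / n
hits = 0
for _ in range(N):
    B, M = 0.0, 0.0
    for _ in range(n):
        B += rng.gauss(0.0, math.sqrt(dt))
        M = max(M, B)        # discrete running maximum
    if B <= a and M >= b:
        hits += 1
lhs = hits / N                                       # P0(B_t <= a, M_t >= b), estimated
rhs = 0.5 * math.erfc((2 * b - a) / math.sqrt(2 * t))  # P0(B_t >= 2b - a), exact Gaussian tail
```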
An immediate corollary of (2.33) is that $M_t$ has the same distribution as $|B_t|$: taking $a = b > 0$,
\[
\begin{aligned}
P^0(M_t > b) &= P^0(B_t \le b,\; M_t > b) + P^0(B_t > b,\; M_t > b) \\
&= P^0(B_t > b) + P^0(B_t > b) = 2 P^0(B_t > b) = P^0(|B_t| > b).
\end{aligned} \tag{2.34}
\]
With a little more effort one can write down the joint density of $(B_t, M_t)$ (Exercise 2.19).
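The distributional identity $M_t \overset{d}{=} |B_t|$ can also be probed numerically (again, an aside, not from the text): compare the empirical tail of a simulated running maximum against the exact tail $P^0(|B_t| > b) = \operatorname{erfc}(b/\sqrt{2t})$ at a few levels. Discretization makes the simulated maximum slightly too small, so the tolerances are loose; all parameters are arbitrary choices of this sketch.

```python
import math
import random

rng = random.Random(2)
t, n, N = 1.0, 256, 4000
dt = t / n
levels = [0.25, 0.5, 1.0]
exceed = {b: 0 for b in levels}          # counts of the event {M_t > b}
for _ in range(N):
    B, M = 0.0, 0.0
    for _ in range(n):
        B += rng.gauss(0.0, math.sqrt(dt))
        M = max(M, B)
    for b in levels:
        if M > b:
            exceed[b] += 1
ratios = {}
for b in levels:
    p_max = exceed[b] / N                      # P0(M_t > b), estimated
    p_abs = math.erfc(b / math.sqrt(2 * t))    # P0(|B_t| > b), exact
    ratios[b] = p_max / p_abs
```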
2.4.2. Path regularity of Brownian motion. As a byproduct of the construction of Brownian motion in Section B.2 we obtained Hölder continuity of paths with any exponent strictly less than $\frac12$.

Theorem 2.38. Fix $0 < \gamma < \frac12$. The following is true almost surely for Brownian motion: for every $T < \infty$ there exists a finite constant $C(\omega)$ such that
\[
|B_t(\omega) - B_s(\omega)| \le C(\omega)\, |t - s|^{\gamma} \quad \text{for all } 0 \le s, t \le T. \tag{2.35}
\]

Next we prove a result from the opposite direction. Namely, for an exponent strictly larger than $\frac12$ there is not even local Hölder continuity. (Local here means that the property holds in a small enough interval around a given point.)
Theorem 2.39. Let $B$ be a Brownian motion. For finite positive reals $\gamma$, $C$, and $\varepsilon$ define the event
\[
G(\gamma, C, \varepsilon) = \bigl\{ \text{there exists } s \in \mathbb R_+ \text{ such that } |B_t - B_s| \le C|t - s|^{\gamma} \text{ for all } t \in [s - \varepsilon, s + \varepsilon] \bigr\}.
\]
Then if $\gamma > \frac12$, $P\bigl( G(\gamma, C, \varepsilon) \bigr) = 0$ for all positive $C$ and $\varepsilon$.
Proof. Fix $\gamma > \frac12$. Since only increments of Brownian motion are involved, we can assume that the process in question is a standard Brownian motion. ($B_t$ and $\widetilde B_t = B_t - B_0$ have the same increments.) In the proof we want to deal only with a bounded time interval. So define
\[
H_k(C, \varepsilon) = \bigl\{ \text{there exists } s \in [k, k+1] \text{ such that } |B_t - B_s| \le C|t - s|^{\gamma} \text{ for all } t \in [s - \varepsilon, s + \varepsilon] \cap [k, k+1] \bigr\}.
\]
$G(\gamma, C, \varepsilon)$ is contained in $\bigcup_k H_k(C, \varepsilon)$, so it suffices to show $P\bigl( H_k(C, \varepsilon) \bigr) = 0$ for all $k$. Since $Y_t = B_{k+t} - B_k$ is a standard Brownian motion, $P\bigl( H_k(C, \varepsilon) \bigr) = P\bigl( H_0(C, \varepsilon) \bigr)$ for each $k$. Finally, what we show is $P\bigl( H_0(C, \varepsilon) \bigr) = 0$.

Fix $m \in \mathbb N$ such that $m(\gamma - \frac12) > 1$. Let $\omega \in H_0(C, \varepsilon)$, and pick $s \in [0, 1]$ so that the condition of the event is satisfied. Consider $n$ large enough so that $m/n < \varepsilon$. Imagine partitioning $[0, 1]$ into intervals of length $\frac1n$. Let
\[
X_{n,k} = \max\bigl\{ |B_{(j+1)/n} - B_{j/n}| : k \le j \le k + m - 1 \bigr\} \quad \text{for } 0 \le k \le n - m.
\]
The point $s$ has to lie in one of the intervals $[\frac{k}{n}, \frac{k+m}{n}]$, for some $0 \le k \le n - m$. For this particular $k$,
\[
|B_{(j+1)/n} - B_{j/n}| \le |B_{(j+1)/n} - B_s| + |B_s - B_{j/n}|
\le C\Bigl( \bigl|\tfrac{j+1}{n} - s\bigr|^{\gamma} + \bigl|s - \tfrac{j}{n}\bigr|^{\gamma} \Bigr) \le 2C\bigl( \tfrac{m}{n} \bigr)^{\gamma}
\]
for all the $j$-values in the range $k \le j \le k + m - 1$. (Simply because the points $\frac{j}{n}$ and $\frac{j+1}{n}$ are within $\varepsilon$ of $s$. Draw a picture.) In other words, $X_{n,k} \le 2C(\frac{m}{n})^{\gamma}$ for this $k$-value.


Now consider all the possible k-values, recall that Brownian increments
are stationary and independent, and note that by basic Gaussian properties,
B
t
has the same distribution as t
1/2
B
1
.
P
_
H
0
(C, )
_

nm

k=0
PX
n,k
2C(
m
n
)

nPX
n,0
2C(
m
n
)

= n
m1

j=0
P
_
[B
(j+1)/n
B
j/n
[ 2C(
m
n
)

_
= nP
_
[B
1/n
[ 2C(
m
n
)

_
m
= nP
_
[B
1
[ 2Cn
1/2
m

_
m
= n
_
1

2
_
2Cn
1/2
m

2Cn
1/2
m

e
x
2
/2
dx
_
m
n
_
1

2
4Cn
1/2
m

_
m
K(m)n
1m(1/2)
.
In the last stages above we bounded e
x
2
/2
above by 1, and then collected
some of the constants and m-dependent quantities into the function K(m).
The bound is valid for all large n, while m is xed. Thus we may let n ,
and obtain P
_
H
0
(C, )
_
= 0. This proves the theorem.
Corollary 2.40. The following is true almost surely for Brownian motion: the path $t \mapsto B_t(\omega)$ is not differentiable at any time point.

Proof. Suppose $t \mapsto B_t(\omega)$ is differentiable at some point $s$. This means that there is a real-valued limit
\[
\beta = \lim_{t \to s} \frac{B_t(\omega) - B_s(\omega)}{t - s}.
\]
Thus if the integer $M$ satisfies $M > |\beta| + 1$, we can find another integer $k$ such that for all $t \in [s - k^{-1}, s + k^{-1}]$,
\[
-M \le \frac{B_t(\omega) - B_s(\omega)}{t - s} \le M
\quad \Longrightarrow \quad |B_t(\omega) - B_s(\omega)| \le M|t - s|.
\]
Consequently $\omega \in G(1, M, k^{-1})$.

This reasoning shows that if $t \mapsto B_t(\omega)$ is differentiable at even a single time point, then $\omega$ lies in the union $\bigcup_M \bigcup_k G(1, M, k^{-1})$. This union has probability zero by the previous theorem. $\square$
Functions of bounded variation are differences of nondecreasing functions, and monotone functions can be differentiated at least Lebesgue-almost everywhere. Hence the above theorem implies that Brownian motion paths are of unbounded variation on every interval.

To recapitulate, the previous results show that a Brownian path is Hölder continuous with any exponent $\gamma < \frac12$ but not for any $\gamma > \frac12$. For the sake of completeness, here is the precise result. Proofs can be found for example in [7, 13].
Theorem 2.41. (Lévy's modulus of continuity.) Almost surely,
\[
\lim_{\delta \searrow 0}\ \sup_{\substack{0 \le s < t \le 1 \\ t - s \le \delta}} \frac{B_t - B_s}{\sqrt{2\delta \log \frac1\delta}} = 1.
\]
Next we show that Brownian motion has finite quadratic variation. Quadratic variation was introduced in Section 2.2. This notion occupies an important role in stochastic analysis and will be discussed more in the martingale chapter. As a corollary we get another proof of the unbounded variation of Brownian paths, a proof that does not require knowledge about the differentiation properties of BV functions.

Recall again the notion of the mesh $\operatorname{mesh}(\pi) = \max_i (t_{i+1} - t_i)$ of a partition $\pi = \{0 = t_0 < t_1 < \cdots < t_{m(\pi)} = t\}$ of $[0, t]$.
Proposition 2.42. Let $B$ be a Brownian motion. For any partition $\pi$ of $[0, t]$,
\[
E\biggl[ \Bigl( \sum_{i=0}^{m(\pi)-1} (B_{t_{i+1}} - B_{t_i})^2 - t \Bigr)^2 \biggr] \le 2t \operatorname{mesh}(\pi). \tag{2.36}
\]
In particular,
\[
\lim_{\operatorname{mesh}(\pi) \to 0}\ \sum_{i=0}^{m(\pi)-1} (B_{t_{i+1}} - B_{t_i})^2 = t \quad \text{in } L^2(P). \tag{2.37}
\]
If we have a sequence of partitions $\pi^n$ such that $\sum_n \operatorname{mesh}(\pi^n) < \infty$, then the convergence above holds almost surely along this sequence.
Proof. Straightforward computation, utilizing the facts that Brownian increments are independent, $B_s - B_r$ has mean zero normal distribution with variance $s - r$, and so its fourth moment is $E[(B_s - B_r)^4] = 3(s - r)^2$. Let $\Delta t_i = t_{i+1} - t_i$.
\[
\begin{aligned}
E\biggl[ \Bigl( \sum_{i=0}^{m(\pi)-1} & (B_{t_{i+1}} - B_{t_i})^2 - t \Bigr)^2 \biggr] \\
&= \sum_i E\bigl[ (B_{t_{i+1}} - B_{t_i})^4 \bigr]
+ \sum_{i \ne j} E\bigl[ (B_{t_{i+1}} - B_{t_i})^2 (B_{t_{j+1}} - B_{t_j})^2 \bigr] \\
&\qquad - 2t \sum_i E\bigl[ (B_{t_{i+1}} - B_{t_i})^2 \bigr] + t^2 \\
&= 3 \sum_i (\Delta t_i)^2 + \sum_{i \ne j} \Delta t_i\, \Delta t_j - 2t^2 + t^2
= 2 \sum_i (\Delta t_i)^2 + \sum_{i, j} \Delta t_i\, \Delta t_j - t^2 \\
&= 2 \sum_i (\Delta t_i)^2 \le 2 \operatorname{mesh}(\pi) \sum_i \Delta t_i = 2 \operatorname{mesh}(\pi)\, t.
\end{aligned}
\]
By Chebyshev's inequality,
\[
P\biggl\{ \Bigl| \sum_{i=0}^{m(\pi^n)-1} (B_{t^n_{i+1}} - B_{t^n_i})^2 - t \Bigr| \ge \varepsilon \biggr\}
\le \varepsilon^{-2}\, E\biggl[ \Bigl( \sum_{i=0}^{m(\pi^n)-1} (B_{t^n_{i+1}} - B_{t^n_i})^2 - t \Bigr)^2 \biggr]
\le 2t\, \varepsilon^{-2} \operatorname{mesh}(\pi^n).
\]
If $\sum_n \operatorname{mesh}(\pi^n) < \infty$, these numbers have a finite sum over $n$ (in short, they are summable). Hence the asserted convergence follows from the Borel–Cantelli lemma. $\square$
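As a numerical aside (not from the text), both the convergence $\sum_i (\Delta B)^2 \to t$ and the exact second-moment identity $E[(\sum_i (\Delta B)^2 - t)^2] = 2\sum_i (\Delta t_i)^2$, which for a uniform partition equals $2t \operatorname{mesh}(\pi)$, can be checked by simulation. Parameters and tolerances below are arbitrary choices of this sketch, sized to absorb Monte Carlo error.

```python
import math
import random

rng = random.Random(3)
t, m, R = 1.0, 64, 2000       # horizon, intervals in the uniform partition, sample paths
dt = t / m
sq_err_sum, qv_sum = 0.0, 0.0
for _ in range(R):
    qv = 0.0
    for _ in range(m):
        inc = rng.gauss(0.0, math.sqrt(dt))
        qv += inc * inc        # accumulate squared increments
    qv_sum += qv
    sq_err_sum += (qv - t) ** 2
mean_qv = qv_sum / R           # should be close to t
mean_sq_err = sq_err_sum / R   # the proof gives exactly 2 * t * mesh for a uniform partition
```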
Corollary 2.43. The following is true almost surely for a Brownian motion $B$: the path $t \mapsto B_t(\omega)$ is not a member of $BV[0, T]$ for any $0 < T < \infty$.
Proof. Pick an $\omega$ such that
\[
\lim_{n \to \infty} \sum_{i=0}^{2^n - 1} \bigl( B_{(i+1)T/2^n}(\omega) - B_{iT/2^n}(\omega) \bigr)^2 = T
\]
for each $T = k^{-1}$ for $k \in \mathbb N$. Such $\omega$'s form a set of probability 1 by the previous proposition, because the partitions $\{iT2^{-n} : 0 \le i \le 2^n\}$ have meshes $T2^{-n}$ that form a summable sequence. Furthermore, by almost sure continuity, we can assume that
\[
\lim_{n \to \infty}\ \max_{0 \le i \le 2^n - 1} \bigl| B_{(i+1)T/2^n}(\omega) - B_{iT/2^n}(\omega) \bigr| = 0
\]
for each $T = k^{-1}$. (Recall that a continuous function is uniformly continuous on a closed, bounded interval.) And now for each such $T$,
\[
\begin{aligned}
T &= \lim_{n \to \infty} \sum_{i=0}^{2^n - 1} \bigl( B_{(i+1)T/2^n}(\omega) - B_{iT/2^n}(\omega) \bigr)^2 \\
&\le \lim_{n \to \infty} \Bigl\{ \max_{0 \le i \le 2^n - 1} \bigl| B_{(i+1)T/2^n}(\omega) - B_{iT/2^n}(\omega) \bigr| \Bigr\}
\sum_{i=0}^{2^n - 1} \bigl| B_{(i+1)T/2^n}(\omega) - B_{iT/2^n}(\omega) \bigr|.
\end{aligned}
\]
Since the maximum in braces vanishes as $n \to \infty$, the last sum must converge to $\infty$. Consequently the path $t \mapsto B_t(\omega)$ is not BV in any interval $[0, k^{-1}]$. Any other nontrivial interval $[0, T]$ contains an interval $[0, k^{-1}]$ for some $k$, and so this path cannot have bounded variation on any interval $[0, T]$. $\square$
2.5. Poisson processes

Poisson processes describe configurations of random points. The definition and construction for a general state space is no more complex than for the real line, so we define and construct the Poisson point process on an abstract measure space first.

Let $0 < \alpha < \infty$. A nonnegative integer-valued random variable $X$ has Poisson distribution with parameter $\alpha$ (Poisson($\alpha$) distribution) if
\[
P\{X = k\} = e^{-\alpha} \frac{\alpha^k}{k!} \quad \text{for } k \in \mathbb Z_+.
\]
To describe point processes we also need the extreme cases: a Poisson variable with parameter $\alpha = 0$ is identically zero, or $P\{X = 0\} = 1$, while a Poisson variable with parameter $\alpha = \infty$ is identically infinite: $P\{X = \infty\} = 1$. A Poisson($\alpha$) variable has mean $\alpha$ and variance $\alpha$. A sum of independent Poisson variables (including a sum of countably infinitely many terms) is again Poisson distributed. These properties make the next definition possible.
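The additivity property just cited can be verified exactly (a deterministic aside, not from the text): convolving the probability mass functions of Poisson(2) and Poisson(3) reproduces the Poisson(5) mass function term by term, up to floating-point error.

```python
import math

def poisson_pmf(lam, kmax):
    """Poisson(lam) probabilities P{X = k} for k = 0, ..., kmax."""
    return [math.exp(-lam) * lam**k / math.factorial(k) for k in range(kmax + 1)]

kmax = 30
p2, p3, p5 = poisson_pmf(2.0, kmax), poisson_pmf(3.0, kmax), poisson_pmf(5.0, kmax)
# convolution of independent variables: P{X + Y = k} = sum_j P{X = j} P{Y = k - j}
conv = [sum(p2[j] * p3[k - j] for j in range(k + 1)) for k in range(kmax + 1)]
max_gap = max(abs(c - q) for c, q in zip(conv, p5))
```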
Definition 2.44. Let $(S, \mathcal A, \mu)$ be a $\sigma$-finite measure space. A process $\{N(A) : A \in \mathcal A\}$ indexed by the measurable sets is a Poisson point process with mean measure $\mu$ if

(i) Almost surely, $N(\cdot)$ is a $\mathbb Z_+ \cup \{\infty\}$-valued measure on $(S, \mathcal A)$.
(ii) $N(A)$ is Poisson distributed with parameter $\mu(A)$.
(iii) For any pairwise disjoint $A_1, A_2, \dots, A_n \in \mathcal A$, the random variables $N(A_1), N(A_2), \dots, N(A_n)$ are independent.

The interpretation is that $N(A)$ is the number of points in the set $A$. $N$ is also called a Poisson random measure.

Observe that items (i)–(iii) give a complete description of all the finite-dimensional distributions of $\{N(A)\}$. For arbitrary $B_1, B_2, \dots, B_m \in \mathcal A$, we can find disjoint $A_1, A_2, \dots, A_n \in \mathcal A$ so that each $B_j$ is a union of some of the $A_i$'s. Then each $N(B_j)$ is a certain sum of $N(A_i)$'s, and we see that the joint distribution of $(N(B_1), N(B_2), \dots, N(B_m))$ is determined by the joint distribution of $(N(A_1), N(A_2), \dots, N(A_n))$.
Proposition 2.45. Let $(S, \mathcal A, \mu)$ be a $\sigma$-finite measure space. Then a Poisson point process $\{N(A) : A \in \mathcal A\}$ with mean measure $\mu$ exists.

Proof. Let $S_1, S_2, S_3, \dots$ be disjoint measurable sets such that $S = \bigcup S_i$ and $\mu(S_i) < \infty$. We shall first define a Poisson point process $N^i$ supported on the subset $S_i$ (this means that $N^i$ has no points outside $S_i$). If $\mu(S_i) = 0$, define $N^i(A) = 0$ for every measurable set $A \in \mathcal A$.
With this trivial case out of the way, we may assume $0 < \mu(S_i) < \infty$. Let $\{X^i_j : j \in \mathbb N\}$ be i.i.d. $S_i$-valued random variables with common probability distribution
\[
P\{X^i_j \in B\} = \frac{\mu(B \cap S_i)}{\mu(S_i)} \quad \text{for measurable sets } B \in \mathcal A.
\]
Independently of the $\{X^i_j : j \in \mathbb N\}$, let $K_i$ be a Poisson($\mu(S_i)$) random variable. Define
\[
N^i(A) = \sum_{j=1}^{K_i} \mathbf 1_A(X^i_j) \quad \text{for measurable sets } A \in \mathcal A.
\]
As the formula reveals, $K_i$ decides how many points to place in $S_i$, and the $X^i_j$ give the locations of the points in $S_i$. We leave it as an exercise to check that $N^i$ is a Poisson point process whose mean measure is $\mu$ restricted to $S_i$, defined by $\mu_i(B) = \mu(B \cap S_i)$.

We can repeat this construction for each $S_i$, and take the resulting random processes $N^i$ mutually independent by a suitable product space construction. Finally, define
\[
N(A) = \sum_i N^i(A).
\]
Again, we leave checking the properties as an exercise. $\square$
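The two-stage recipe of this proof, first draw $K \sim$ Poisson($\mu(S)$), then scatter $K$ i.i.d. points with the normalized mean measure, translates directly into code. The sketch below (an aside, not from the text) takes $S = [0,1]$ with mean measure a constant multiple of Lebesgue measure, and spot-checks that the counts in $[0, 0.3]$ have the right mean and are roughly uncorrelated with the counts in $(0.3, 1]$; parameters and tolerances are arbitrary choices of this sketch.

```python
import math
import random

def sample_poisson(lam, rng):
    """Knuth's product-of-uniforms Poisson sampler (fine for small lam)."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def poisson_point_process(total_mass, rng):
    """Points of a Poisson point process on S = [0,1] with mean measure
    total_mass * Lebesgue: draw K ~ Poisson(mu(S)), then K i.i.d. locations."""
    K = sample_poisson(total_mass, rng)
    return [rng.random() for _ in range(K)]

rng = random.Random(11)
R, lam = 5000, 4.0
n_left, n_right = [], []      # counts N([0, 0.3]) and N((0.3, 1])
for _ in range(R):
    pts = poisson_point_process(lam, rng)
    left = sum(1 for x in pts if x <= 0.3)
    n_left.append(left)
    n_right.append(len(pts) - left)
ml = sum(n_left) / R          # should be near lam * 0.3 = 1.2
mr = sum(n_right) / R
cov = sum((a - ml) * (b - mr) for a, b in zip(n_left, n_right)) / R
```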
The most important Poisson processes are those on Euclidean spaces whose mean measure is a constant multiple of Lebesgue measure. These are called homogeneous Poisson point processes. When the points lie on the positive real line, they naturally acquire a temporal interpretation. For this case we make the next definition.

Definition 2.46. Let $(\Omega, \mathcal F, P)$ be a probability space, $\{\mathcal F_t\}$ a filtration on it, and $\alpha > 0$. A (homogeneous) Poisson process with rate $\alpha$ is an adapted stochastic process $N = \{N_t : 0 \le t < \infty\}$ with these properties.

(i) $N_0 = 0$ almost surely.
(ii) For almost every $\omega$, the path $t \mapsto N_t(\omega)$ is cadlag.
(iii) For $0 \le s < t$, $N_t - N_s$ is independent of $\mathcal F_s$, and has Poisson distribution with parameter $\alpha(t - s)$.
Proposition 2.47. Homogeneous Poisson processes on $[0, \infty)$ exist.

Proof. Let $\{N(A) : A \in \mathcal B_{(0,\infty)}\}$ be a Poisson point process on $(0, \infty)$ with mean measure $\alpha m$, where $m$ denotes Lebesgue measure. Define $N_0 = 0$, $N_t = N(0, t]$ for $t > 0$, and $\mathcal F^N_t = \sigma\{N_s : 0 \le s \le t\}$. Let $0 = s_0 < s_1 < \cdots < s_n \le s < t$. Then
\[
N(s_0, s_1],\; N(s_1, s_2],\; \dots,\; N(s_{n-1}, s_n],\; N(s, t]
\]
are independent random variables, from which follows that the vector $(N_{s_1}, \dots, N_{s_n})$ is independent of $N(s, t]$. Considering all such $n$-tuples (for various $n$) while keeping $s < t$ fixed shows that $\mathcal F^N_s$ is independent of $N(s, t] = N_t - N_s$.

The cadlag path property of $N_t$ follows from properties of the Poisson process. Almost every $\omega$ has the property that $N(0, T] < \infty$ for all $T < \infty$. Given such an $\omega$ and $t$, there exist $t_0 < t < t_1$ such that $N(t_0, t) = N(t, t_1) = 0$. (There may be a point at $t$, but there cannot be sequences of points converging to $t$ from either left or right.) Consequently $N_s$ is constant for $t_0 < s < t$ and so the left limit $N_{t-}$ exists. Also $N_s = N_t$ for $t \le s < t_1$, which gives the right continuity at $t$. $\square$
One can show that the jumps of $N_t$ are all of size 1 (Exercise 2.21). This is because the Lebesgue mean measure does not let two Poisson points sit on top of each other. The next proposition is proved just like its counterpart for Brownian motion, so we omit its proof.
Proposition 2.48. Suppose $N = \{N_t\}$ is a homogeneous Poisson process with respect to a filtration $\{\mathcal F_t\}$ on $(\Omega, \mathcal F, P)$.

(a) $N$ is a Poisson process also with respect to the augmented right-continuous filtration $\{\bar{\mathcal F}_{t+}\}$.

(b) Define $Y_t = N_{s+t} - N_s$ and $\mathcal G_t = \mathcal F_{(s+t)+}$. Then $Y = \{Y_t : 0 \le t < \infty\}$ is a homogeneous Poisson process with respect to the filtration $\{\mathcal G_t\}$, and independent of $\bar{\mathcal F}_{s+}$.
The Markov property of the Poisson process follows next just like for Brownian motion. However, we should not take $\mathbb R$ as the state space, but instead $\mathbb Z_+$. Because the state space is discrete the Feller property is automatically satisfied. (A discrete space is one where singleton sets $\{x\}$ are open. Every function on a discrete space is continuous.) Consequently homogeneous Poisson processes also satisfy the strong Markov property. The semigroup for a rate $\alpha$ Poisson process would be
\[
E^x[g(X_t)] = E^0[g(x + X_t)] = \sum_{k=0}^{\infty} g(x + k)\, e^{-\alpha t}\, \frac{(\alpha t)^k}{k!} \quad \text{for } x \in \mathbb Z_+.
\]
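The semigroup series above is easy to evaluate directly (a deterministic aside, not from the text). For the test function $g(k) = k^2$, the series equals $E[(x + P)^2]$ with $P \sim$ Poisson($\alpha t$), which has the closed form $x^2 + 2x\lambda + \lambda + \lambda^2$ for $\lambda = \alpha t$; the choices of $\alpha$, $t$, $x$, and $g$ below are arbitrary to this sketch.

```python
import math

alpha, t, x = 2.0, 1.5, 3
g = lambda k: k * k                # test function g(k) = k^2
lam = alpha * t                    # Poisson parameter of the increment N_t
# semigroup applied to g at state x: truncated series over the increment
val = sum(g(x + k) * math.exp(-lam) * lam**k / math.factorial(k) for k in range(80))
# closed form: E[(x + P)^2] = x^2 + 2*x*lam + lam + lam^2 for P ~ Poisson(lam)
expected = x**2 + 2 * x * lam + lam + lam**2
```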
Since the Poisson process is monotone nondecreasing it cannot be a martingale. We need to compensate by subtracting off the mean, and so we define the compensated Poisson process as
\[
M_t = N_t - \alpha t.
\]

Proposition 2.49. $M$ is a martingale.

Proof. Follows from the independence of increments. For $s < t$,
\[
E[N_t \mid \mathcal F_s] = E[N_t - N_s \mid \mathcal F_s] + E[N_s \mid \mathcal F_s] = \alpha(t - s) + N_s. \qquad \square
\]
The reader should also be aware of another natural construction of $N$ in terms of waiting times. We refer to [12, Section 4.8] for a proof.

Proposition 2.50. Let $\{T_k : 1 \le k < \infty\}$ be i.i.d. rate $\alpha$ exponential random variables. Let $S_n = T_1 + \cdots + T_n$ for $n \ge 1$. Then $N(A) = \sum_n \mathbf 1_A(S_n)$ defines a homogeneous rate $\alpha$ Poisson point process on $\mathbb R_+$, and $N_t = \max\{n : S_n \le t\}$ defines a rate $\alpha$ Poisson process with respect to its own filtration $\{\mathcal F^N_t\}$.
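The waiting-time construction is also the simplest way to simulate a Poisson process. The sketch below (an aside, not from the text) builds $N_t = \max\{n : S_n \le t\}$ from i.i.d. exponential waiting times and checks that $N_1$ has approximately the Poisson mean and variance $\alpha$; parameters and tolerances are arbitrary choices of this sketch.

```python
import random

rng = random.Random(5)
alpha, t, R = 3.0, 1.0, 5000
counts = []
for _ in range(R):
    s, n = 0.0, 0
    while True:
        s += rng.expovariate(alpha)   # i.i.d. rate-alpha exponential waiting time
        if s > t:
            break
        n += 1
    counts.append(n)                  # N_t = max{n : S_n <= t}
mean = sum(counts) / R
var = sum((c - mean) ** 2 for c in counts) / R
```

For a Poisson($\alpha t$) count both statistics should be near $\alpha t = 3$.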
Exercises
Exercise 2.1. To see that even increasing unions of $\sigma$-algebras can fail to be $\sigma$-algebras (so that the generation is necessary in (2.1)), look at this example. Let $\Omega = (0, 1]$, and for $n \in \mathbb N$ let $\mathcal F_n$ be the $\sigma$-algebra generated by the intervals $\{(k2^{-n}, (k+1)2^{-n}] : k \in \{0, 1, 2, \dots, 2^n - 1\}\}$. How about the intersection of the sets $(1 - 2^{-n}, 1]$?
Exercise 2.2. To ward off yet another possible pitfall: uncountable unions must be avoided. Even if $A = \bigcup_{t \in \mathbb R_+} A_t$ and each $A_t \in \mathcal F_t$, $A$ may fail to be a member of $\mathcal F_\infty$. Here is an example. Let $\Omega = \mathbb R_+$ and let $\mathcal F_t = \mathcal B_{\mathbb R_+}$ for each $t \in \mathbb R_+$. Then also $\mathcal F_\infty = \mathcal B_{\mathbb R_+}$. Pick a subset $A$ of $\mathbb R_+$ that is not a Borel subset. (The proof that such sets exist needs to be looked up from an analysis text.) Then take $A_t = \{t\}$ if $t \in A$ and $A_t = \emptyset$ otherwise.
Exercise 2.3. Let $\{\mathcal F_t\}$ be a filtration, and let $\mathcal G_t = \mathcal F_{t+}$. Show that $\mathcal G_{t-} = \mathcal F_{t-}$ for $t > 0$.
Exercise 2.4. Assume the probability space $(\Omega, \mathcal F, P)$ is complete. Let $\{\mathcal F_t\}$ be a filtration, $\mathcal G_t = \mathcal F_{t+}$ its right-continuous version, and $\mathcal H_t = \bar{\mathcal F}_t$ its augmentation. Augment $\mathcal G_t$ to get the filtration $\bar{\mathcal G}_t$, and define also $\mathcal H_{t+} = \bigcap_{s : s > t} \mathcal H_s$. Show that $\bar{\mathcal G}_t = \mathcal H_{t+}$. In other words, it is immaterial whether we augment before or after making the filtration right-continuous.

Hints. $\bar{\mathcal G}_t \subseteq \mathcal H_{t+}$ should be easy. For the other direction, if $C \in \mathcal H_{t+}$, then for each $s > t$ there exists $C_s \in \mathcal F_s$ such that $P(C \triangle C_s) = 0$. For any sequence $s_i \searrow t$, the set
\[
\widetilde C = \bigcap_{m \ge 1} \bigcup_{i \ge m} C_{s_i}
\]
lies in $\mathcal F_{t+}$. Use Exercise 1.8.
Exercise 2.5. Let the underlying probability space be $\Omega = [0, 1]$ with $P$ given by Lebesgue measure. Define two processes by
\[
X_t(\omega) = 0 \quad \text{and} \quad Y_t(\omega) = \mathbf 1_{\{t = \omega\}}.
\]
$X$ is continuous, $Y$ does not have a single continuous path, but they are modifications of each other.
Exercise 2.6. (Example of an adapted but not progressively measurable process.) Let $\Omega = [0, 1]$, and for each $t$ let $\mathcal F_t$ be the $\sigma$-field generated by singletons on $\Omega$. (Equivalently, $\mathcal F_t$ consists of all countable sets and their complements.) Let $X_t(\omega) = \mathbf 1\{\omega = t\}$. Then $\{X_t : 0 \le t \le 1\}$ is adapted. But $X$ on $[0, 1] \times \Omega$ is not $\mathcal B_{[0,1]} \otimes \mathcal F_1$-measurable.

Hint. Show that elements of $\mathcal B_{[0,1]} \otimes \mathcal F_1$ are of the type
\[
\Bigl( \bigcup_{s \in I} B_s \times \{s\} \Bigr) \cup \bigl( H \times I^c \bigr)
\]
where $I$ is a countable subset of $\Omega$, each $B_s \in \mathcal B_{[0,1]}$, and $H$ is either empty or $[0, 1]$. Consequently the diagonal $\{(t, \omega) : X_t(\omega) = 1\}$ is not an element of $\mathcal B_{[0,1]} \otimes \mathcal F_1$.
Exercise 2.7. Suppose $\tau$ is a stopping time. Let $A$ be a subset of $\{\tau = t\}$ and an event in $\mathcal F_t$ for some fixed $t$. Show that then $A \in \mathcal F_\tau$.

You might see this type of property expressed as $\mathcal F_t \cap \{\tau = t\} \subseteq \mathcal F_\tau$, even though strictly speaking intersecting $\mathcal F_t$ with $\{\tau = t\}$ is not legitimate. The intersection is used to express the idea that the $\sigma$-algebra $\mathcal F_t$ is restricted to the set $\{\tau = t\}$. Questionable but convenient usage is called "abuse of notation" among mathematicians. It is the kind of license that seasoned professionals can take but beginners should exercise caution!
Exercise 2.8. Show that if $A \in \mathcal F_\tau$, then $A \cap \{\tau < \infty\} \in \mathcal F_\infty$.

The following rather contrived example illustrates that $\mathcal F_\tau$ does not have to lie inside $\mathcal F_\infty$. Take $\Omega = \{0, 1, 2\}$, $\mathcal F = 2^{\Omega}$, $\mathcal F_t = \{\emptyset, \{0\}, \{1, 2\}, \Omega\}$, and $\tau(0) = 0$, $\tau(1) = \tau(2) = \infty$. Show that $\tau$ is a stopping time and find $\mathcal F_\tau$.
Exercise 2.9. Let $\tau$ be a stopping time and $Z$ an $\mathcal F_\tau$-measurable random variable. Show that for any $A \in \mathcal B_{[0,t]}$, $\mathbf 1_A(\tau) Z$ is $\mathcal F_t$-measurable. Hint: Start with $A = [0, s]$. Use the $\pi$-$\lambda$ theorem.
Exercise 2.10. Let $\mathcal F_t = \sigma\{\omega(s) : 0 \le s \le t\}$ be the filtration generated by coordinates on the space $C = C_{\mathbb R}[0, \infty)$ of continuous functions. Let
\[
H = \{\omega \in C : t \text{ is a local maximum for } \omega\}.
\]
Show that $H \in \mathcal F_{t+} \setminus \mathcal F_t$.

Hints. To show $H \in \mathcal F_{t+}$, note that $\omega \in H$ iff for all large enough $n \in \mathbb N$, $\omega(t) \ge \omega(q)$ for all rational $q \in (t - n^{-1}, t + n^{-1})$. To show $H \notin \mathcal F_t$, use Exercise 1.7(b). For any $\omega \in H$ one can construct $\widetilde\omega \notin H$ such that $\widetilde\omega(s) = \omega(s)$ for $s \le t$.
Exercise 2.11. Let $\tau$ be a stopping time. Define
\[
\mathcal F_{\tau-} = \sigma\Bigl( \mathcal F_0 \cup \bigcup_{t > 0} \bigl\{ A \cap \{\tau > t\} : A \in \mathcal F_t \bigr\} \Bigr). \tag{2.38}
\]
See the remark in Exercise 2.7 that explains the notation.

(a) Show that for any stopping time $\tau$, $\mathcal F_{\tau-} \subseteq \mathcal F_\tau$.

(b) Let $\{\tau_n\}$ be a nondecreasing sequence of stopping times such that $\tau_n \nearrow \tau$ and $\tau_n < \tau$ for each $n$. Show that $\mathcal F_{\tau_n} \nearrow \mathcal F_{\tau-}$. This last convergence statement means that $\mathcal F_{\tau-} = \sigma\bigl( \bigcup_n \mathcal F_{\tau_n} \bigr)$.
Exercise 2.12. (a) Suppose $X$ is a caglad process adapted to $\{\mathcal{F}_t\}$. Define $Z(t) = X(t+)$ for $0 \le t < \infty$. Show that $Z$ is a cadlag process adapted to $\{\mathcal{F}_{t+}\}$.

(b) Show that Lemma 2.9 is valid for a caglad process under the additional assumption that $\{\mathcal{F}_t\}$ is right-continuous.

(c) Let $\Omega$ be the space of real-valued caglad paths, $X$ the coordinate process $X_t(\omega) = \omega(t)$ on $\Omega$, and $\{\mathcal{F}_t\}$ the filtration generated by coordinates. Show that Lemma 2.9 fails for this setting.
Exercise 2.13. (a) Suppose $Y$ is a cadlag process. Check that you understand that while in the definition (1.13) of total variation $V_Y(t)$ the supremum can be replaced by the limit as the mesh tends to zero (Exercise 1.3), in the definition of the quadratic variation $[Y]_t$ the limit cannot in general be replaced by the supremum.

(b) Check that if $[Y]_u$ and $[Y]_t$ exist for two times $u > t$ in the sense of the limit in (2.10), then $[Y]_u - [Y]_t$ is a similar limit over partitions of $[t, u]$ as the mesh tends to 0.
Exercise 2.14. Check that you are able to use the Markov property with this simple exercise. Let $\{P^x\}$ satisfy Definition 2.22. Let $r < s < t$ be time points and $A, B \in \mathcal{B}_{\mathbf{R}^d}$. Assuming that $P^x(X_r \in B, X_s = y) > 0$, show that
$$P^x(X_t \in A \mid X_r \in B, X_s = y) = P^y(X_{t-s} \in A)$$
with the understanding that the conditional probability on the left is defined in the elementary fashion by (1.27).
Exercise 2.15. Let $\{P^x\}$ be a Feller process. Use the Markov property and the Feller property inductively to show that
$$\int_{D} \prod_{i=1}^{m} f_i(\omega(s_i)) \, P^x(d\omega)$$
is a continuous function of $x \in \mathbf{R}^d$ for $f_1, \dots, f_m \in C_b(\mathbf{R}^d)$.
Exercise 2.16. Using property (ii) in Definition 2.26, show these two properties of Brownian motion, for any $0 \le s_0 < s_1 < \cdots < s_n$.

(a) The $\sigma$-algebras $\mathcal{F}_{s_0}, \sigma(B_{s_1} - B_{s_0}), \sigma(B_{s_2} - B_{s_1}), \dots, \sigma(B_{s_n} - B_{s_{n-1}})$ are independent.

(b) The distribution of the vector
$$(B_{t+s_1} - B_{t+s_0},\ B_{t+s_2} - B_{t+s_1},\ \dots,\ B_{t+s_n} - B_{t+s_{n-1}})$$
is the same for all $t \ge 0$.
Exercise 2.17. (Brownian motion as a Gaussian process.) A process $\{X_t\}$ is Gaussian if for all finite sets of indices $\{t_1, t_2, \dots, t_n\}$ the vector $(X_{t_1}, X_{t_2}, \dots, X_{t_n})$ has a multivariate normal distribution as in Example 1.19(iv). It is a consequence of a $\pi$-$\lambda$ argument or Kolmogorov's extension theorem that the distribution of a Gaussian process is entirely determined by two functions: the mean $m(t) = E X_t$ and the covariance $c(s, t) = \operatorname{Cov}(X_s, X_t) = E(X_s X_t) - m(s) m(t)$.

(a) Show that standard Brownian motion is a Gaussian process with $m(t) = 0$ and $c(s, t) = s \wedge t$.

(b) To see that there is something to argue in (a), consider this example: $X$ is a standard normal, $\varepsilon$ is independent of $X$ with distribution $P(\varepsilon = -1) = P(\varepsilon = +1) = 1/2$, and $Y = \varepsilon X$. Is $(X, Y)$ multivariate normal?
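A quick numerical check of part (b), not part of the text (the simulation and its parameters are ad hoc): $Y = \varepsilon X$ is again standard normal, yet $X + Y$ equals $0$ exactly when $\varepsilon = -1$. If $(X, Y)$ were jointly normal, $X + Y$ would be normal, hence either atomless or a.s. constant; an atom of mass about $1/2$ at $0$ rules this out.

```python
import random

random.seed(1)
n = 100_000
atom = 0                               # samples with X + Y exactly 0
for _ in range(n):
    x = random.gauss(0.0, 1.0)
    eps = random.choice((-1.0, 1.0))   # random sign, independent of X
    y = eps * x                        # Y = eps * X is again standard normal
    if x + y == 0.0:                   # happens exactly when eps == -1
        atom += 1

p = atom / n
print(p)   # close to 1/2, so X + Y is not normally distributed
```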
Exercise 2.18. The calculation in the proof of Proposition 2.37 gives
$$P^0(B_t \le a,\ M_t \ge b) = P^0(B_t \ge 2b - a).$$
Deduce from this $P^0(B_t < a,\ M_t > b) = P^0(B_t > 2b - a)$ by taking limits and using the continuous distribution of Brownian motion.
Exercise 2.19. Let $B_t$ be standard Brownian motion and $M_t$ its running maximum. Show that the joint density $f(x, y)$ of $(B_t, M_t)$ is
$$(2.39)\qquad f(x, y) = \frac{2(2y - x)}{\sqrt{2\pi}\, t^{3/2}} \exp\Bigl( -\frac{1}{2t}(2y - x)^2 \Bigr)$$
on the domain $0 \le y < \infty$, $-\infty < x \le y$. Hint: Use (2.33) and convert $P^0(B_t > 2b - a)$ into a double integral of the form $\int_b^\infty dy \int_{-\infty}^a dx\, f(x, y)$. Can you give an argument for why it is enough to consider the events $\{B_t \le a,\ M_t > b\}$ in the calculation?
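The reflection idea behind (2.39) can be checked exactly in discrete time, where it is a statement about path counting: for a simple symmetric random walk $S$ with running maximum $M_n$, reflecting the path after its first visit to level $b$ gives $P(S_n = a, M_n \ge b) = P(S_n = 2b - a)$ for integers $a \le b$, $b \ge 1$. A small dynamic-programming sketch (illustrative; not from the text):

```python
from fractions import Fraction

def joint_dist(n):
    """Exact joint law of (S_n, M_n) for a simple symmetric walk from 0."""
    dist = {(0, 0): Fraction(1)}       # (position, running max) -> probability
    for _ in range(n):
        new = {}
        for (s, m), p in dist.items():
            for step in (-1, 1):
                s2 = s + step
                key = (s2, max(m, s2))
                new[key] = new.get(key, Fraction(0)) + p / 2
        dist = new
    return dist

n, a, b = 12, -2, 3
dist = joint_dist(n)
lhs = sum(p for (s, m), p in dist.items() if s == a and m >= b)
rhs = sum(p for (s, m), p in dist.items() if s == 2 * b - a)
print(lhs == rhs)   # prints True: the reflection identity is exact here
```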
Exercise 2.20. Let $B$ and $X$ be two independent one-dimensional Brownian motions. Following the proof of Proposition 2.42, show that
$$(2.40)\qquad \lim_{\operatorname{mesh}(\pi)\to 0} \sum_{i=0}^{m(\pi)-1} (B_{t_{i+1}} - B_{t_i})(X_{t_{i+1}} - X_{t_i}) = 0 \quad\text{in } L^2(P).$$
As a consequence of your calculation, find the covariation $[B, X]$. Can you also find $[B, X]$ from the definition in equation (2.11)?
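A simulation sketch of (2.40), not from the text: on a uniform mesh with $n$ intervals the cross-variation sum has mean zero and variance $\sum_i (\Delta t_i)^2 = t^2/n$, so it vanishes at rate $n^{-1/2}$.

```python
import math
import random

random.seed(7)
t, n = 1.0, 10_000
sd = math.sqrt(t / n)   # standard deviation of one Brownian increment

# Cross-variation sum for two independent Brownian motions on [0, t].
s = sum(random.gauss(0.0, sd) * random.gauss(0.0, sd) for _ in range(n))
print(abs(s))   # variance of s is t^2/n = 1e-4, so |s| is on the order 0.01
```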
Exercise 2.21. Let $N$ be a homogeneous Poisson process. Show that for almost every $\omega$, $N_t(\omega) - N_{t-}(\omega) = 0$ or $1$ for all $t$.

Hint. If $N_t - N_{t-} \ge 2$ for some $t \in [0, T]$, then for any partition $0 = t_0 < t_1 < \cdots < t_n = T$, $N(t_i, t_{i+1}] \ge 2$ for some $i$. A crude bound shows that this probability can be made arbitrarily small by shrinking the mesh of the partition.
Exercise 2.22. Let $N$ be a homogeneous rate $\alpha$ Poisson process on $\mathbf{R}_+$ with respect to a filtration $\{\mathcal{F}_t\}$, and $M_t = N_t - \alpha t$. Show that $M_t^2 - \alpha t$ and $M_t^2 - N_t$ are martingales.

Exercise 2.23. As in the previous exercise, let $N$ be a homogeneous rate $\alpha$ Poisson process and $M_t = N_t - \alpha t$. Use Corollary A.11 from the appendix to find the quadratic variation processes $[M]$ and $[N]$.
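A numerical sanity check, not from the text, of the answer $[M] = [N] = N$ suggested by Corollary A.11: over a fine partition, each jump of the compensated process $M_t = N_t - \alpha t$ contributes approximately $1$ to the sum of squared increments, while the drift contributes only $O(\Delta t)$. The parameters below are arbitrary.

```python
import random

random.seed(3)
alpha, T, n = 2.0, 5.0, 200_000
dt = T / n

# One Poisson path on [0, T] via exponential interarrival times.
jumps = []
s = 0.0
while True:
    s += random.expovariate(alpha)
    if s > T:
        break
    jumps.append(s)
N_T = len(jumps)

# Sum of squared increments of M_t = N_t - alpha*t over a uniform grid.
qv = 0.0
j = 0
for i in range(n):
    right = T if i == n - 1 else (i + 1) * dt
    dN = 0
    while j < len(jumps) and jumps[j] <= right:
        dN += 1
        j += 1
    qv += (dN - alpha * dt) ** 2

print(N_T, qv)   # qv lies within O(dt) of N_T, illustrating [M]_T = N_T
```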
Chapter 3
Martingales
Let $(\Omega, \mathcal{F}, P)$ be a probability space with a filtration $\{\mathcal{F}_t\}$. We assume that $\{\mathcal{F}_t\}$ is complete but not necessarily right-continuous, unless so specified.

As defined in the previous chapter, a martingale with respect to $\{\mathcal{F}_t\}$ is a real-valued stochastic process $M = \{M_t : t \in \mathbf{R}_+\}$ adapted to $\{\mathcal{F}_t\}$ such that $M_t$ is integrable for each $t$, and
$$E[M_t \mid \mathcal{F}_s] = M_s \quad\text{for all } s < t.$$
If the equality above is relaxed to
$$E[M_t \mid \mathcal{F}_s] \ge M_s \quad\text{for all } s < t,$$
then $M$ is a submartingale. $M$ is a supermartingale if $-M$ is a submartingale. $M$ is square-integrable if $E[M_t^2] < \infty$ for all $t$.
These properties are preserved by certain classes of functions.

Proposition 3.1. (a) If $M$ is a martingale and $\varphi$ a convex function such that $\varphi(M_t)$ is integrable for all $t \ge 0$, then $\varphi(M_t)$ is a submartingale.

(b) If $M$ is a submartingale and $\varphi$ a nondecreasing convex function such that $\varphi(M_t)$ is integrable for all $t \ge 0$, then $\varphi(M_t)$ is a submartingale.

Proof. Part (a) follows from Jensen's inequality. For $s < t$,
$$E[\varphi(M_t) \mid \mathcal{F}_s] \ge \varphi\bigl( E[M_t \mid \mathcal{F}_s] \bigr) = \varphi(M_s).$$
Part (b) follows from the same calculation, but now the last equality becomes an inequality due to the submartingale property $E[M_t \mid \mathcal{F}_s] \ge M_s$ and the monotonicity of $\varphi$. $\square$
The martingales we work with always have right-continuous paths. Then it is sometimes convenient to enlarge the filtration to $\{\mathcal{F}_{t+}\}$ if the filtration is not right-continuous to begin with. The next proposition permits this move. An example of its use appears in the proof of Doob's inequality, Theorem 3.11.
Proposition 3.2. Suppose $M$ is a right-continuous submartingale with respect to a filtration $\{\mathcal{F}_t\}$. Then $M$ is a submartingale also with respect to $\{\mathcal{F}_{t+}\}$.

Proof. Let $s < t$ and consider $n > (t - s)^{-1}$. $M_t \wedge c$ is a submartingale, so
$$E[M_t \wedge c \mid \mathcal{F}_{s + n^{-1}}] \ge M_{s + n^{-1}} \wedge c.$$
Since $\mathcal{F}_{s+} \subseteq \mathcal{F}_{s + n^{-1}}$,
$$E[M_t \wedge c \mid \mathcal{F}_{s+}] \ge E[M_{s + n^{-1}} \wedge c \mid \mathcal{F}_{s+}].$$
By the bounds
$$-c \le M_{s + n^{-1}} \wedge c \le E[M_t \wedge c \mid \mathcal{F}_{s + n^{-1}}]$$
and Lemma B.14 from the Appendix, for a fixed $c$ the random variables $M_{s + n^{-1}} \wedge c$ are uniformly integrable. Let $n \to \infty$. Right-continuity of paths implies $M_{s + n^{-1}} \wedge c \to M_s \wedge c$. Uniform integrability then gives convergence in $L^1$. By Lemma B.15 there exists a subsequence $\{n_j\}$ such that the conditional expectations converge almost surely:
$$E[M_{s + n_j^{-1}} \wedge c \mid \mathcal{F}_{s+}] \to E[M_s \wedge c \mid \mathcal{F}_{s+}].$$
Consequently
$$E[M_t \wedge c \mid \mathcal{F}_{s+}] \ge E[M_s \wedge c \mid \mathcal{F}_{s+}] = M_s \wedge c \nearrow M_s.$$
As $c \nearrow \infty$, the dominated convergence theorem for conditional expectations (Theorem B.12) makes the conditional expectation on the left converge, and in the limit $E[M_t \mid \mathcal{F}_{s+}] \ge M_s$. $\square$
The connection between the right-continuity of the (sub)martingale and the filtration goes the other way too. The statement below is Theorem 1.3.13 in [9].

Proposition 3.3. Suppose the filtration $\{\mathcal{F}_t\}$ satisfies the usual conditions; in other words, $(\Omega, \mathcal{F}, P)$ is complete, $\mathcal{F}_0$ contains all null events, and $\mathcal{F}_t = \mathcal{F}_{t+}$. Let $M$ be a submartingale such that $t \mapsto E M_t$ is right-continuous. Then there exists a cadlag modification of $M$ that is an $\{\mathcal{F}_t\}$-submartingale.
3.1. Optional stopping
Optional stopping has to do with extending the submartingale property $E[M_t \mid \mathcal{F}_s] \ge M_s$ from deterministic times $s < t$ to stopping times. We begin with discrete stopping times.

Lemma 3.4. Let $M$ be a submartingale. Let $\sigma$ and $\tau$ be two stopping times whose values lie in an ordered countable set $\{s_1 < s_2 < s_3 < \cdots\} \subseteq [0, \infty]$ where $s_j \nearrow \infty$. Then for any $T < \infty$,
$$(3.1)\qquad E[M_{\tau \wedge T} \mid \mathcal{F}_\sigma] \ge M_{\sigma \wedge \tau \wedge T}.$$
Proof. Fix $n$ so that $s_n \le T < s_{n+1}$. First observe that $M_{\tau \wedge T}$ is integrable, because
$$|M_{\tau \wedge T}| = \sum_{i=1}^{n} \mathbf{1}\{\tau = s_i\}\, |M_{s_i}| + \mathbf{1}\{\tau > s_n\}\, |M_T| \le \sum_{i=1}^{n} |M_{s_i}| + |M_T|.$$
Next check that $M_{\sigma \wedge \tau \wedge T}$ is $\mathcal{F}_\sigma$-measurable. For discrete stopping times this is simple. We need to show that $\{M_{\sigma \wedge \tau \wedge T} \in B\} \cap \{\sigma \le t\} \in \mathcal{F}_t$ for all $B \in \mathcal{B}_{\mathbf{R}}$ and $t$. Let $s_j$ be the highest value not exceeding $t$. (If there is no such $s_j$, then $t < s_1$, the event above is empty and lies in $\mathcal{F}_t$.) Then
$$\{M_{\sigma \wedge \tau \wedge T} \in B\} \cap \{\sigma \le t\} = \bigcup_{i=1}^{j} \bigl( \{\sigma = s_i\} \cap \{M_{s_i \wedge \tau \wedge T} \in B\} \cap \{\sigma \le t\} \bigr).$$
This is a union of events in $\mathcal{F}_t$ because $s_i \le t$ and $\tau$ is a stopping time.

Since both $E[M_{\tau \wedge T} \mid \mathcal{F}_\sigma]$ and $M_{\sigma \wedge \tau \wedge T}$ are $\mathcal{F}_\sigma$-measurable, (3.1) follows from checking that
$$E\bigl[ \mathbf{1}_A\, E[M_{\tau \wedge T} \mid \mathcal{F}_\sigma] \bigr] \ge E\bigl[ \mathbf{1}_A\, M_{\sigma \wedge \tau \wedge T} \bigr] \quad\text{for all } A \in \mathcal{F}_\sigma.$$
By the definition of conditional expectation, this reduces to showing
$$E[\mathbf{1}_A M_{\tau \wedge T}] \ge E[\mathbf{1}_A M_{\sigma \wedge \tau \wedge T}].$$
Decompose $A$ according to whether $\sigma \le T$ or $\sigma > T$. If $\sigma > T$, then $\sigma \wedge \tau \wedge T = \tau \wedge T$ and then
$$E[\mathbf{1}_{A \cap \{\sigma > T\}} M_{\tau \wedge T}] = E[\mathbf{1}_{A \cap \{\sigma > T\}} M_{\sigma \wedge \tau \wedge T}].$$
To handle the case $\sigma \le T$ we decompose it into subcases
$$E[\mathbf{1}_{A \cap \{\sigma = s_i\}} M_{\tau \wedge T}] \ge E[\mathbf{1}_{A \cap \{\sigma = s_i\}} M_{\sigma \wedge \tau \wedge T}] = E[\mathbf{1}_{A \cap \{\sigma = s_i\}} M_{s_i \wedge \tau \wedge T}] \quad\text{for } 1 \le i \le n.$$
Since $A \cap \{\sigma = s_i\} \in \mathcal{F}_{s_i}$, by conditioning again on the left, this last conclusion will follow from
$$(3.2)\qquad E[M_{\tau \wedge T} \mid \mathcal{F}_{s_i}] \ge M_{s_i \wedge \tau \wedge T} \quad\text{for } 1 \le i \le n.$$
We check (3.2) by an iterative argument. First an auxiliary inequality: for any $j$,
$$\begin{aligned}
E[M_{s_{j+1} \wedge \tau \wedge T} \mid \mathcal{F}_{s_j}]
&= E\bigl[ M_{s_{j+1} \wedge \tau \wedge T}\, \mathbf{1}\{\tau > s_j\} + M_{s_j \wedge \tau \wedge T}\, \mathbf{1}\{\tau \le s_j\} \bigm| \mathcal{F}_{s_j} \bigr] \\
&= E[M_{s_{j+1} \wedge \tau \wedge T} \mid \mathcal{F}_{s_j}]\, \mathbf{1}\{\tau > s_j\} + M_{s_j \wedge \tau \wedge T}\, \mathbf{1}\{\tau \le s_j\} \\
&\ge M_{s_j \wedge \tau \wedge T}\, \mathbf{1}\{\tau > s_j\} + M_{s_j \wedge \tau \wedge T}\, \mathbf{1}\{\tau \le s_j\} = M_{s_j \wedge \tau \wedge T}.
\end{aligned}$$
Above we used the fact that $M_{s_j \wedge \tau \wedge T}$ is $\mathcal{F}_{s_j}$-measurable (which was checked above) and then the submartingale property. Since $\tau \wedge T = s_{n+1} \wedge \tau \wedge T$ (recall that $s_n \le T < s_{n+1}$), applying the above inequality to $j = n$ gives
$$E[M_{\tau \wedge T} \mid \mathcal{F}_{s_n}] \ge M_{s_n \wedge \tau \wedge T},$$
which is case $i = n$ of (3.2). Now do induction: assuming (3.2) has been checked for $i$, applying the auxiliary inequality again gives
$$E[M_{\tau \wedge T} \mid \mathcal{F}_{s_{i-1}}] = E\bigl[ E[M_{\tau \wedge T} \mid \mathcal{F}_{s_i}] \bigm| \mathcal{F}_{s_{i-1}} \bigr] \ge E\bigl[ M_{s_i \wedge \tau \wedge T} \bigm| \mathcal{F}_{s_{i-1}} \bigr] \ge M_{s_{i-1} \wedge \tau \wedge T},$$
which is (3.2) for $i - 1$. Repeat this until (3.2) has been proved down to $i = 1$. $\square$
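The martingale case of this optional stopping lemma (equality in (3.1)) can be illustrated by a simulation, outside the text: a simple symmetric random walk is a martingale, so stopping it at the first exit from an interval, capped at a deterministic time $T$, preserves the mean. The walk and all parameters below are ad hoc.

```python
import random

random.seed(11)
a, b, T, n_paths = 3, 5, 50, 40_000

total = 0
for _ in range(n_paths):
    s = 0
    for _ in range(T):
        if s <= -a or s >= b:          # tau: first exit from (-a, b)
            break
        s += random.choice((-1, 1))
    total += s                         # s is the stopped value S_{tau ^ T}

mean = total / n_paths
print(mean)   # close to E[S_0] = 0, as optional stopping predicts
```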
To extend this result to general stopping times, we assume some regularity on the paths of $M$. First we derive a moment bound.

Lemma 3.5. Let $M$ be a submartingale with right-continuous paths and $T < \infty$. Then for any stopping time $\tau$ that satisfies $P\{\tau \le T\} = 1$,
$$E|M_\tau| \le 2\,E[M_T^+] - E[M_0].$$
Proof. Define a discrete approximation of $\tau$ by $\tau_n = T$ if $\tau = T$, and $\tau_n = 2^{-n} T (\lfloor 2^n \tau / T \rfloor + 1)$ if $\tau < T$. Then $\tau_n$ is a stopping time with finitely many values in $[0, T]$, and $\tau_n \searrow \tau$ as $n \to \infty$.

Averaging over (3.1) gives $E[M_{\sigma \wedge \tau \wedge T}] \le E[M_{\tau \wedge T}]$. Apply this to $\tau = \tau_n$ and $\sigma = 0$ to get
$$E M_{\tau_n} \ge E M_0.$$
Next, apply (3.1) to the submartingale $M_t^+ = M_t \vee 0$, with $\tau = T$ and $\sigma = \tau_n$ to get
$$E M_T^+ \ge E M_{\tau_n}^+.$$
From the above
$$E M_{\tau_n}^- = E M_{\tau_n}^+ - E M_{\tau_n} \le E M_T^+ - E M_0.$$
Combining these,
$$E|M_{\tau_n}| = E M_{\tau_n}^+ + E M_{\tau_n}^- \le 2\,E M_T^+ - E M_0.$$
Let $n \to \infty$, use right-continuity and apply Fatou's lemma to get
$$E|M_\tau| \le \liminf_{n \to \infty} E|M_{\tau_n}| \le 2\,E M_T^+ - E M_0. \qquad\square$$
The conclusion from the previous lemma needed next is that for any stopping time $\tau$ and $T \in \mathbf{R}_+$, the stopped variable $M_{\tau \wedge T}$ is integrable. Here is the extension of Lemma 3.4 from discrete to general stopping times.

Theorem 3.6. Let $M$ be a submartingale with right-continuous paths, and let $\sigma$ and $\tau$ be two stopping times. Then for $T < \infty$,
$$(3.3)\qquad E[M_{\tau \wedge T} \mid \mathcal{F}_\sigma] \ge M_{\sigma \wedge \tau \wedge T}.$$
Proof. As pointed out before the theorem, $M_{\tau \wedge T}$ and $M_{\sigma \wedge \tau \wedge T}$ are integrable random variables. In particular, the conditional expectation is well defined. Define approximating discrete stopping times by $\tau_n = 2^{-n}(\lfloor 2^n \tau \rfloor + 1)$ and $\sigma_n = 2^{-n}(\lfloor 2^n \sigma \rfloor + 1)$. The interpretation for infinite values is that $\tau_n = \infty$ if $\tau = \infty$, and similarly for $\sigma_n$ and $\sigma$.

Let $c \in \mathbf{R}$. The function $x \mapsto x \vee c$ is convex and nondecreasing, hence $M_t \vee c$ is also a submartingale. Applying Lemma 3.4 to this submartingale and the stopping times $\sigma_n$ and $\tau_n$ gives
$$E[M_{\tau_n \wedge T} \vee c \mid \mathcal{F}_{\sigma_n}] \ge M_{\sigma_n \wedge \tau_n \wedge T} \vee c.$$
Since $\sigma \le \sigma_n$, $\mathcal{F}_\sigma \subseteq \mathcal{F}_{\sigma_n}$, and if we condition both sides of the above inequality on $\mathcal{F}_\sigma$, we get
$$(3.4)\qquad E[M_{\tau_n \wedge T} \vee c \mid \mathcal{F}_\sigma] \ge E[M_{\sigma_n \wedge \tau_n \wedge T} \vee c \mid \mathcal{F}_\sigma].$$
The purpose is now to let $n \to \infty$ in (3.4) and obtain the conclusion (3.3) for the truncated process $M_t \vee c$, and then let $c \to -\infty$ to get the conclusion. The time arguments converge from the right: $\tau_n \wedge T \searrow \tau \wedge T$ and $\sigma_n \wedge \tau_n \wedge T \searrow \sigma \wedge \tau \wedge T$. Then by the right-continuity of $M$,
$$M_{\tau_n \wedge T} \to M_{\tau \wedge T} \quad\text{and}\quad M_{\sigma_n \wedge \tau_n \wedge T} \to M_{\sigma \wedge \tau \wedge T}.$$
Next we justify convergence of the conditional expectations along a subsequence. By Lemma 3.4
$$-|c| \vee c = c \le M_{\tau_n \wedge T} \vee c \le E[M_T \vee c \mid \mathcal{F}_{\tau_n}]$$
and
$$c \le M_{\sigma_n \wedge \tau_n \wedge T} \vee c \le E[M_T \vee c \mid \mathcal{F}_{\sigma_n \wedge \tau_n}].$$
Together with Lemma B.14 from the Appendix, these bounds imply that the sequences $\{M_{\tau_n \wedge T} \vee c : n \in \mathbf{N}\}$ and $\{M_{\sigma_n \wedge \tau_n \wedge T} \vee c : n \in \mathbf{N}\}$ are uniformly integrable. Since these sequences converge almost surely (as argued above), uniform integrability implies that they converge in $L^1$. By Lemma B.15 there exists a subsequence $\{n_j\}$ along which the conditional expectations converge almost surely:
$$E[M_{\tau_{n_j} \wedge T} \vee c \mid \mathcal{F}_\sigma] \to E[M_{\tau \wedge T} \vee c \mid \mathcal{F}_\sigma]$$
and
$$E[M_{\sigma_{n_j} \wedge \tau_{n_j} \wedge T} \vee c \mid \mathcal{F}_\sigma] \to E[M_{\sigma \wedge \tau \wedge T} \vee c \mid \mathcal{F}_\sigma].$$
(To get a subsequence that works for both limits, extract a subsequence for the first limit by Lemma B.15, and then apply Lemma B.15 again to extract a further subsubsequence for the second limit.) Taking these limits in (3.4) gives
$$E[M_{\tau \wedge T} \vee c \mid \mathcal{F}_\sigma] \ge E[M_{\sigma \wedge \tau \wedge T} \vee c \mid \mathcal{F}_\sigma].$$
$M$ is right-continuous by assumption, hence progressively measurable, and so $M_{\sigma \wedge \tau \wedge T}$ is $\mathcal{F}_{\sigma \wedge \tau \wedge T}$-measurable. This is a sub-$\sigma$-field of $\mathcal{F}_\sigma$, and so
$$E[M_{\tau \wedge T} \vee c \mid \mathcal{F}_\sigma] \ge E[M_{\sigma \wedge \tau \wedge T} \vee c \mid \mathcal{F}_\sigma] = M_{\sigma \wedge \tau \wedge T} \vee c \ge M_{\sigma \wedge \tau \wedge T}.$$
As $c \searrow -\infty$, $M_{\tau \wedge T} \vee c \searrow M_{\tau \wedge T}$ pointwise, and for all $c \le 0$ we have the integrable bound $|M_{\tau \wedge T} \vee c| \le |M_{\tau \wedge T}|$. Thus by the dominated convergence theorem for conditional expectations, almost surely
$$\lim_{c \to -\infty} E[M_{\tau \wedge T} \vee c \mid \mathcal{F}_\sigma] = E[M_{\tau \wedge T} \mid \mathcal{F}_\sigma].$$
This completes the proof. $\square$
Corollary 3.7. Suppose $M$ is a right-continuous submartingale and $\tau$ is a stopping time. Then the stopped process $M^\tau = \{M_{\tau \wedge t} : t \in \mathbf{R}_+\}$ is a submartingale with respect to the original filtration $\{\mathcal{F}_t\}$.

If $M$ is also a martingale, then $M^\tau$ is a martingale. And finally, if $M$ is an $L^2$-martingale, then so is $M^\tau$.

Proof. In (3.3), take $T = t$ and $\sigma = s < t$. Then it becomes the submartingale property for $M^\tau$:
$$(3.5)\qquad E[M_{\tau \wedge t} \mid \mathcal{F}_s] \ge M_{\tau \wedge s}.$$
If $M$ is a martingale, we can apply this to both $M$ and $-M$. And if $M$ is an $L^2$-martingale, Lemma 3.5 implies that so is $M^\tau$. $\square$
Corollary 3.8. Suppose $M$ is a right-continuous submartingale. Let $\{\sigma(u) : u \ge 0\}$ be a nondecreasing, $[0, \infty)$-valued process such that $\sigma(u)$ is a bounded stopping time for each $u$. Then $\{M_{\sigma(u)} : u \ge 0\}$ is a submartingale with respect to the filtration $\{\mathcal{F}_{\sigma(u)} : u \ge 0\}$.

If $M$ is a martingale or an $L^2$-martingale to begin with, then so is $M_{\sigma(u)}$.
Proof. For $u < v$ and $T \ge \sigma(v)$, (3.3) gives $E[M_{\sigma(v)} \mid \mathcal{F}_{\sigma(u)}] \ge M_{\sigma(u)}$. If $M$ is a martingale, we can apply this to both $M$ and $-M$. And if $M$ is an $L^2$-martingale, Lemma 3.5 applied to the submartingale $M^2$ implies that
$$E[M_{\sigma(u)}^2] \le 2\,E[M_T^2] + E[M_0^2]. \qquad\square$$
The last corollary has the following implications:

(i) $M^\tau$ is a submartingale not only with respect to $\{\mathcal{F}_{\tau \wedge t}\}$ but also with respect to $\{\mathcal{F}_t\}$.

(ii) Let $M$ be an $L^2$-martingale and $\sigma$ a bounded stopping time. Then $\bar{M}_t = M_{\sigma + t} - M_\sigma$ is an $L^2$-martingale with respect to $\bar{\mathcal{F}}_t = \mathcal{F}_{\sigma + t}$.
3.2. Inequalities and limits
Lemma 3.9. Let $M$ be a submartingale, $0 < T < \infty$, and $H$ a finite subset of $[0, T]$. Then for $r > 0$,
$$(3.6)\qquad P\Bigl\{ \max_{t \in H} M_t \ge r \Bigr\} \le r^{-1} E[M_T^+]$$
and
$$(3.7)\qquad P\Bigl\{ \min_{t \in H} M_t \le -r \Bigr\} \le r^{-1} \bigl( E[M_T^+] - E[M_0] \bigr).$$
Proof. Let $\sigma = \min\{t \in H : M_t \ge r\}$, with the interpretation that $\sigma = \infty$ if $M_t < r$ for all $t \in H$. (3.3) with $\tau = T$ gives
$$E[M_T] \ge E[M_{\sigma \wedge T}] = E[M_\sigma \mathbf{1}_{\{\sigma < \infty\}}] + E[M_T \mathbf{1}_{\{\sigma = \infty\}}],$$
from which
$$r\,P\Bigl\{ \max_{t \in H} M_t \ge r \Bigr\} = r\,P\{\sigma < \infty\} \le E[M_\sigma \mathbf{1}_{\{\sigma < \infty\}}] \le E[M_T \mathbf{1}_{\{\sigma < \infty\}}] \le E[M_T^+ \mathbf{1}_{\{\sigma < \infty\}}] \le E[M_T^+].$$
This proves (3.6).

To prove (3.7), let $\tau = \min\{t \in H : M_t \le -r\}$. (3.3) with $\sigma = 0$ gives
$$E[M_0] \le E[M_{\tau \wedge T}] = E[M_\tau \mathbf{1}_{\{\tau < \infty\}}] + E[M_T \mathbf{1}_{\{\tau = \infty\}}],$$
from which
$$r\,P\Bigl\{ \min_{t \in H} M_t \le -r \Bigr\} = r\,P\{\tau < \infty\} \le -E[M_\tau \mathbf{1}_{\{\tau < \infty\}}] \le -E[M_0] + E[M_T \mathbf{1}_{\{\tau = \infty\}}] \le -E[M_0] + E[M_T^+]. \qquad\square$$
Next we generalize this to uncountable suprema and infima.

Theorem 3.10. Let $M$ be a right-continuous submartingale and $0 < T < \infty$. Then for $r > 0$,
$$(3.8)\qquad P\Bigl\{ \sup_{0 \le t \le T} M_t \ge r \Bigr\} \le r^{-1} E[M_T^+]$$
and
$$(3.9)\qquad P\Bigl\{ \inf_{0 \le t \le T} M_t \le -r \Bigr\} \le r^{-1} \bigl( E[M_T^+] - E[M_0] \bigr).$$
Proof. Let $H$ be a countable dense subset of $[0, T]$ that contains $0$ and $T$, and let $H_1 \subseteq H_2 \subseteq H_3 \subseteq \cdots$ be finite sets such that $H = \bigcup_n H_n$. Lemma 3.9 applies to the sets $H_n$. Let $b < r$. By right-continuity,
$$P\Bigl\{ \sup_{0 \le t \le T} M_t > b \Bigr\} = P\Bigl\{ \sup_{t \in H} M_t > b \Bigr\} = \lim_{n \to \infty} P\Bigl\{ \sup_{t \in H_n} M_t > b \Bigr\} \le b^{-1} E[M_T^+].$$
Let $b \nearrow r$. This proves (3.8). (3.9) is proved by a similar argument. $\square$
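An illustration of (3.8), not part of the text: take the nonnegative submartingale $M_t = |B_t|$ for a Brownian motion $B$ on $[0, 1]$, discretized on a grid (which can only lower the simulated supremum). The bound $r^{-1} E|B_T| = r^{-1}\sqrt{2/\pi}$ is far from sharp here, but it clearly holds.

```python
import math
import random

random.seed(5)
T, n_steps, n_paths, r = 1.0, 200, 20_000, 2.0
sd = math.sqrt(T / n_steps)

exceed = 0
for _ in range(n_paths):
    bm = 0.0
    for _ in range(n_steps):
        bm += random.gauss(0.0, sd)
        if abs(bm) >= r:               # grid version of sup |B_t| >= r
            exceed += 1
            break

p_hat = exceed / n_paths
bound = math.sqrt(2.0 / math.pi) / r   # r^{-1} E|B_T| for T = 1
print(p_hat, bound)                    # estimated probability vs. (3.8) bound
```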
When $X$ has either left- or right-continuous paths with probability 1, we define
$$(3.10)\qquad X_T^*(\omega) = \sup_{0 \le t \le T} |X_t(\omega)|.$$
The measurability of $X_T^*$ is checked as follows. First define $U = \sup_{s \in R} |X_s|$, where $R$ contains $T$ and all rationals in $[0, T]$. $U$ is $\mathcal{F}_T$-measurable as a supremum of countably many $\mathcal{F}_T$-measurable random variables. On every left- or right-continuous path $U$ coincides with $X_T^*$. Thus $U = X_T^*$ at least almost surely. By the completeness assumption on the filtration, all events of probability zero and their subsets lie in $\mathcal{F}_T$, and so $X_T^*$ is also $\mathcal{F}_T$-measurable.
Theorem 3.11 (Doob's inequality). Let $M$ be a nonnegative right-continuous submartingale and $0 < T < \infty$. Then for $1 < p < \infty$,
$$(3.11)\qquad E\Bigl[ \sup_{0 \le t \le T} M_t^p \Bigr] \le \Bigl( \frac{p}{p-1} \Bigr)^{p} E\bigl[ M_T^p \bigr].$$
Proof. Since $M$ is nonnegative, $M_T^* = \sup_{0 \le t \le T} M_t$. The first part of the proof is to justify the inequality
$$(3.12)\qquad P\{ M_T^* > r \} \le r^{-1} E\bigl[ M_T\, \mathbf{1}\{M_T^* \ge r\} \bigr]$$
for $r > 0$. Let
$$\tau = \inf\{t > 0 : M_t > r\}.$$
This is an $\{\mathcal{F}_{t+}\}$-stopping time by Lemma 2.7. By right-continuity $M_\tau \ge r$ when $\tau < \infty$. Quite obviously $M_T^* > r$ implies $\tau \le T$, and so
$$r\,P\{ M_T^* > r \} \le E\bigl[ M_\tau\, \mathbf{1}\{M_T^* > r\} \bigr] \le E\bigl[ M_\tau\, \mathbf{1}\{\tau \le T\} \bigr].$$
Since $M$ is a submartingale with respect to $\{\mathcal{F}_{t+}\}$ by Proposition 3.2, Theorem 3.6 gives
$$\begin{aligned}
E[M_\tau \mathbf{1}_{\{\tau \le T\}}] &= E[M_{\tau \wedge T}] - E[M_T \mathbf{1}_{\{\tau > T\}}] \le E[M_T] - E[M_T \mathbf{1}_{\{\tau > T\}}] \\
&= E[M_T \mathbf{1}_{\{\tau \le T\}}] \le E\bigl[ M_T\, \mathbf{1}\{M_T^* \ge r\} \bigr].
\end{aligned}$$
(3.12) has been verified.

Let $0 < b < \infty$. By (1.42) and Hölder's inequality,
$$\begin{aligned}
E\bigl[ (M_T^* \wedge b)^p \bigr] &= \int_0^b p\,r^{p-1} P[M_T^* > r]\, dr \le \int_0^b p\,r^{p-2} E\bigl[ M_T\, \mathbf{1}\{M_T^* \ge r\} \bigr]\, dr \\
&= E\Bigl[ M_T \int_0^{b \wedge M_T^*} p\,r^{p-2}\, dr \Bigr] = \frac{p}{p-1}\, E\bigl[ M_T (b \wedge M_T^*)^{p-1} \bigr] \\
&\le \frac{p}{p-1}\, E\bigl[ M_T^p \bigr]^{1/p}\, E\bigl[ (b \wedge M_T^*)^p \bigr]^{\frac{p-1}{p}}.
\end{aligned}$$
The truncation at $b$ guarantees that the last expectation is finite, so we can divide through the inequality by it to get
$$E\bigl[ (M_T^* \wedge b)^p \bigr]^{1/p} \le \frac{p}{p-1}\, E\bigl[ M_T^p \bigr]^{1/p}.$$
Raise both sides of this last inequality to the power $p$ and then let $b \nearrow \infty$. The monotone convergence theorem gives the conclusion. $\square$
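A Monte Carlo sanity check of (3.11), not from the text, for $p = 2$ and the nonnegative submartingale $M_t = |B_t|$: Doob's inequality asserts $E[\sup_{t \le T} B_t^2] \le 4\, E[B_T^2]$. The grid discretization below underestimates the supremum, so the observed ratio sits comfortably below $4$.

```python
import math
import random

random.seed(13)
T, n_steps, n_paths = 1.0, 200, 20_000
sd = math.sqrt(T / n_steps)

num = 0.0   # Monte Carlo sum for E[ (sup_{t<=T} |B_t|)^2 ]
den = 0.0   # Monte Carlo sum for E[ B_T^2 ]
for _ in range(n_paths):
    b, peak = 0.0, 0.0
    for _ in range(n_steps):
        b += random.gauss(0.0, sd)
        peak = max(peak, abs(b))
    num += peak * peak
    den += b * b

ratio = num / den
print(ratio)   # Doob with p = 2 bounds this ratio by (2/(2-1))^2 = 4
```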
The obvious application would be to $M = |X|$ for a martingale $X$. By applying the previous inequalities to the stopped process $M_{\tau \wedge t}$, we can replace $T$ with a bounded stopping time $\tau$. We illustrate the idea with Doob's inequality.
Corollary 3.12. Let $M$ be a nonnegative right-continuous submartingale and $\tau$ a bounded stopping time. Then for $1 < p < \infty$,
$$(3.13)\qquad E\Bigl[ \Bigl( \sup_{0 \le t \le \tau} M_t \Bigr)^{p} \Bigr] \le \Bigl( \frac{p}{p-1} \Bigr)^{p} E\bigl[ M_\tau^p \bigr].$$

Proof. Pick $T$ so that $\tau \le T$. Since $\sup_{0 \le t \le T} M_{\tau \wedge t} = \sup_{0 \le t \le \tau} M_t$ and $M_{\tau \wedge T} = M_\tau$, the result follows immediately by applying (3.11) to $M_{\tau \wedge t}$. $\square$
Our treatment of martingales would not be complete without mentioning martingale limit theorems, although strictly speaking we do not need them for developing stochastic integration. Here is the basic martingale convergence theorem that gives almost sure convergence.

Theorem 3.13. Let $M$ be a right-continuous submartingale such that
$$\sup_{t \in \mathbf{R}_+} E(M_t^+) < \infty.$$
Then there exists a random variable $M_\infty$ such that $E|M_\infty| < \infty$ and $M_t(\omega) \to M_\infty(\omega)$ as $t \to \infty$ for almost every $\omega$.
Now suppose $M = \{M_t : t \in \mathbf{R}_+\}$ is a martingale. When can we take the limit $M_\infty$ and adjoin it to the process in the sense that $\{M_t : t \in [0, \infty]\}$ is a martingale? The integrability of $M_\infty$ is already part of the conclusion of Theorem 3.13. However, to also have $E(M_\infty \mid \mathcal{F}_t) = M_t$ we need the additional hypothesis of uniform integrability (Definition B.13 in the appendix).

Theorem 3.14. Let $M = \{M_t : t \in \mathbf{R}_+\}$ be a right-continuous martingale. Then the following four conditions are equivalent.

(i) The collection $\{M_t : t \in \mathbf{R}_+\}$ is uniformly integrable.

(ii) There exists an integrable random variable $M_\infty$ such that $\lim_{t \to \infty} E|M_t - M_\infty| = 0$ ($L^1$ convergence).

(iii) There exists an integrable random variable $M_\infty$ such that $M_t(\omega) \to M_\infty(\omega)$ almost surely and $E(M_\infty \mid \mathcal{F}_t) = M_t$ for all $t \in \mathbf{R}_+$.

(iv) There exists an integrable random variable $Z$ such that $M_t = E(Z \mid \mathcal{F}_t)$ for all $t \in \mathbf{R}_+$.
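A concrete illustration, not from the text, of why uniform integrability cannot be dropped: the exponential martingale $M_t = \exp(B_t - t/2)$ converges to $0$ almost surely as $t \to \infty$, yet $E M_t = 1$ for every $t$, so no integrable $M_\infty$ can close the martingale and conditions (i)-(iv) all fail. At a single moderately large time the two effects are already visible (parameters are arbitrary):

```python
import math
import random

random.seed(17)
t, n_paths = 4.0, 20_000

total = 0.0
small = 0
for _ in range(n_paths):
    b = random.gauss(0.0, math.sqrt(t))   # B_t ~ N(0, t)
    m = math.exp(b - t / 2.0)             # exponential martingale at time t
    total += m
    small += m < 0.2

mean = total / n_paths
frac_small = small / n_paths
print(mean, frac_small)   # mean stays near 1 while most mass is near 0
```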
As quick corollaries we get for example the following statements.

Corollary 3.15. (a) For $Z \in L^1(P)$, $E(Z \mid \mathcal{F}_t) \to E(Z \mid \mathcal{F}_\infty)$ as $t \to \infty$, both almost surely and in $L^1$.

(b) (Lévy's 0-1 law) For $A \in \mathcal{F}_\infty$, $E(\mathbf{1}_A \mid \mathcal{F}_t) \to \mathbf{1}_A$ as $t \to \infty$, both almost surely and in $L^1$.

Proof. Part (b) follows from (a). To see (a), start by defining $M_t = E(Z \mid \mathcal{F}_t)$ so that statement (iv) of Theorem 3.14 is valid. By (ii) and (iii) we have an a.s. and $L^1$ limit $M_\infty$ (explain why there cannot be two different limits). By construction $M_\infty$ is $\mathcal{F}_\infty$-measurable. For $A \in \mathcal{F}_s$ we have, for $t > s$ and by $L^1$ convergence,
$$E[\mathbf{1}_A Z] = E[\mathbf{1}_A M_t] \to E[\mathbf{1}_A M_\infty].$$
A $\pi$-$\lambda$ argument extends $E[\mathbf{1}_A Z] = E[\mathbf{1}_A M_\infty]$ to all $A \in \mathcal{F}_\infty$. $\square$
3.3. Local martingales and semimartingales
For a stopping time $\tau$ and a process $X = \{X_t : t \in \mathbf{R}_+\}$, the stopped process $X^\tau$ is defined by $X_t^\tau = X_{\tau \wedge t}$.

Definition 3.16. Let $M = \{M_t : t \in \mathbf{R}_+\}$ be a process adapted to a filtration $\{\mathcal{F}_t\}$. $M$ is a local martingale if there exists a sequence of stopping times $\tau_1 \le \tau_2 \le \tau_3 \le \cdots$ such that $P\{\tau_k \nearrow \infty\} = 1$ and for each $k$, $M^{\tau_k}$ is a martingale with respect to $\{\mathcal{F}_t\}$. $M$ is a local square-integrable martingale if $M^{\tau_k}$ is a square-integrable martingale for each $k$. In both cases we say $\{\tau_k\}$ is a localizing sequence for $M$.


Remark 3.17. (a) Since M

k
0
= M
0
the denition above requires that M
0
is
integrable. This extra restriction can be avoided by phrasing the denition
so that M
tn
1
n>0
is a martingale [10, 13]. Another way is to require
that M
tn
M
0
is a martingale [2, 8]. We use the simple denition since
we have no need for the extra generality of nonintegrable M
0
.
(b) In some texts the denition of local martingale also requires that
M

k
is uniformly integrable. This can be easily arranged (Exercise 3.6).
(c) Further localization gains nothing. That is, if M is an adapted
process and
n
(a.s.) are stopping times such that M
n
is a local
martingale for each n, then M itself is a local martingale (Exercise 3.4).
(d) Sometimes we consider a local martingale M
t
: t [0, T] restricted
to a bounded time interval. Then it seems pointless to require that
n
.
Indeed it is equivalent to require a nondecreasing sequence of stopping times

n
such that M
tn
: t [0, T] is a martingale for each n and, almost
surely,
n
T for large enough n. Given such a sequence
n
one can check
that the original denition can be recovered by taking
n
=
n
1
n
<
T + 1
n
T.
We shall also use the shorter term local L
2
-martingale for a local square-
integrable martingale.
Lemma 3.18. Suppose $M$ is a local martingale and $\tau$ is an arbitrary stopping time. Then $M^\tau$ is also a local martingale. Similarly, if $M$ is a local $L^2$-martingale, then so is $M^\tau$. In both cases, if $\{\tau_k\}$ is a localizing sequence for $M$, then it is also a localizing sequence for $M^\tau$.

Proof. Let $\{\tau_k\}$ be a sequence of stopping times such that $\tau_k \nearrow \infty$ and $M^{\tau_k}$ is a martingale. By Corollary 3.7 the process $M_{\tau_k \wedge \tau \wedge t} = (M^{\tau_k})_t^\tau$ is a martingale. Thus the stopping times $\tau_k$ work also for $M^\tau$.

If $M^{\tau_k}$ is an $L^2$-martingale, then so is $M_{\tau_k \wedge \tau \wedge t} = (M^{\tau_k})_t^\tau$, because by applying Lemma 3.5 to the submartingale $(M^{\tau_k})^2$,
$$E[M_{\tau_k \wedge \tau \wedge t}^2] \le 2\,E[M_{\tau_k \wedge t}^2] + E[M_0^2]. \qquad\square$$
Only large jumps can prevent a cadlag local martingale from being a local $L^2$-martingale. See Exercise 3.8.

Lemma 3.19. Suppose $M$ is a cadlag local martingale, and there is a constant $c$ such that $|M_t - M_{t-}| \le c$ for all $t$. Then $M$ is a local $L^2$-martingale.
Proof. Let $\{\sigma_k\}$ be stopping times such that $\sigma_k \nearrow \infty$ and $M^{\sigma_k}$ is a martingale. Let
$$\rho_k = \inf\{t \ge 0 : |M_t| \ge k \text{ or } |M_{t-}| \ge k\}$$
be the stopping times defined by (2.7) and (2.8). By the cadlag assumption, each path $t \mapsto M_t(\omega)$ is locally bounded (means: bounded in any bounded time interval), and consequently $\rho_k(\omega) \nearrow \infty$ as $k \to \infty$. Let $\tau_k = \sigma_k \wedge \rho_k$. Then $\tau_k \nearrow \infty$, and $M^{\tau_k}$ is a martingale for each $k$. Furthermore,
$$|M_t^{\tau_k}| = |M_{\tau_k \wedge t}| \le \sup_{0 \le s < \rho_k} |M_s| + |M_{\rho_k} - M_{\rho_k-}| \le k + c.$$
So $M^{\tau_k}$ is a bounded process, and in particular $M^{\tau_k}$ is an $L^2$-process. $\square$
Recall that the usual conditions on the filtration $\{\mathcal{F}_t\}$ meant that the filtration is complete (each $\mathcal{F}_t$ contains every subset of a $P$-null event in $\mathcal{F}$) and right-continuous ($\mathcal{F}_t = \mathcal{F}_{t+}$).

Theorem 3.20 (Fundamental theorem of local martingales). Assume $\{\mathcal{F}_t\}$ is complete and right-continuous. Suppose $M$ is a cadlag local martingale and $c > 0$. Then there exist cadlag local martingales $\widetilde{M}$ and $A$ such that the jumps of $\widetilde{M}$ are bounded by $c$, $A$ is an FV process, and $M = \widetilde{M} + A$.
A proof of the fundamental theorem of local martingales can be found in [11]. Combining this theorem with the previous lemma gives the following corollary, which we will find useful because $L^2$-martingales are the starting point for developing stochastic integration.

Corollary 3.21. Assume $\{\mathcal{F}_t\}$ is complete and right-continuous. Then a cadlag local martingale $M$ can be written as a sum $M = \widetilde{M} + A$ of a cadlag local $L^2$-martingale $\widetilde{M}$ and a local martingale $A$ that is an FV process.

Definition 3.22. A cadlag process $Y$ is a semimartingale if it can be written as $Y_t = Y_0 + M_t + V_t$, where $M$ is a cadlag local martingale, $V$ is a cadlag FV process, and $M_0 = V_0 = 0$.
By the previous corollary, we can always select the local martingale part of a semimartingale to be a local $L^2$-martingale. The normalization $M_0 = V_0 = 0$ does not exclude anything, since we can always replace $M$ with $M - M_0$ without losing either the local martingale or local $L^2$-martingale property, and also $V - V_0$ has the same total variation as $V$ does.
Remark 3.23. (A look ahead.) The definition of a semimartingale appears ad hoc. But stochastic analysis will show us that the class of semimartingales has several attractive properties. (i) For a suitably bounded predictable process $X$ and a semimartingale $Y$, the stochastic integral $\int X\, dY$ is again a semimartingale. (ii) $f(Y)$ is a semimartingale for a $C^2$ function $f$. The class of local $L^2$-martingales has property (i) but not (ii). In order to develop a sensible stochastic calculus that involves functions of processes, it is necessary to extend the class of processes considered from local martingales to semimartingales.
3.4. Quadratic variation for semimartingales
Quadratic variation and covariation were discussed in general in Section 2.2, and here we look at these notions for martingales, local martingales and semimartingales. We begin with the examples we already know.

Example 3.24 (Brownian motion). From Proposition 2.42 and Exercise 2.20, for two independent Brownian motions $B$ and $Y$, $[B]_t = t$ and $[B, Y]_t = 0$.
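A numerical illustration of $[B]_t = t$, not part of the text: the sum of squared increments of a simulated Brownian path over a fine uniform partition of $[0, t]$ concentrates near $t$ (its variance is $2t^2/n$).

```python
import math
import random

random.seed(2)
t, n = 1.0, 20_000
sd = math.sqrt(t / n)   # standard deviation of one increment

# Sum of squared Brownian increments over a uniform partition of [0, t].
qv = sum(random.gauss(0.0, sd) ** 2 for _ in range(n))
print(qv)   # close to [B]_t = t = 1
```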
Example 3.25 (Poisson process). Let $N$ be a homogeneous rate $\alpha$ Poisson process, and $M_t = N_t - \alpha t$ the compensated Poisson process. Then $[M] = [N] = N$ by Corollary A.11. If $\widetilde{N}$ is an independent rate $\widetilde{\alpha}$ Poisson process with $\widetilde{M}_t = \widetilde{N}_t - \widetilde{\alpha} t$, then $[M, \widetilde{M}] = 0$ by Lemma A.10, because with probability one $M$ and $\widetilde{M}$ have no jumps in common.
Next a general existence theorem. A proof can be found in Section 2.3 of [5].

Theorem 3.26. Let $M$ be a right-continuous local martingale with respect to a filtration $\{\mathcal{F}_t\}$. Then the quadratic variation process $[M]$ exists in the sense of Definition 2.14. There is a version of $[M]$ with the following properties. $[M]$ is a real-valued, right-continuous, nondecreasing adapted process such that $[M]_0 = 0$. If $M$ is an $L^2$-martingale, the convergence in (2.10) for $Y = M$ holds also in $L^1$, and
$$(3.14)\qquad E\bigl( [M]_t \bigr) = E\bigl( M_t^2 - M_0^2 \bigr).$$
If $M$ is continuous, then so is $[M]$.
We check that stopping the martingale and stopping the quadratic variation give the same process. Our proof works for local $L^2$-martingales, which is sufficient for our needs. The result is true for all local martingales: the statement for all local martingales can be derived from the estimates in the proof of Theorem 3.26 in [5], specifically from limit (3.26) on page 70 in [5].

Lemma 3.27. Let $M$ be a right-continuous $L^2$-martingale or local $L^2$-martingale. Let $\tau$ be a stopping time. Then $[M^\tau] = [M]^\tau$, in the sense that these processes are indistinguishable.
Proof. The processes $[M^\tau]$ and $[M]^\tau$ are right-continuous. Hence indistinguishability follows from proving almost sure equality at all fixed times.

Step 1. We start with a discrete stopping time $\tau$ whose values form an unbounded, increasing sequence $u_1 < u_2 < \cdots < u_j \nearrow \infty$. Fix $t$ and consider a sequence of partitions $\pi^n = \{0 = t_0^n < t_1^n < \cdots < t_{m(n)}^n = t\}$ of $[0, t]$ with $\operatorname{mesh}(\pi^n) \to 0$ as $n \to \infty$. For any $u_j$,
$$\sum_i \bigl( M_{u_j \wedge t_{i+1}^n} - M_{u_j \wedge t_i^n} \bigr)^2 \to [M]_{u_j \wedge t} \quad\text{in probability, as } n \to \infty.$$
We can replace the original sequence $\{\pi^n\}$ with a subsequence along which this convergence is almost sure for all $j$. We denote this new sequence again by $\{\pi^n\}$. (The random variables above are the same for all $j$ large enough so that $u_j > t$, so there are really only finitely many distinct $j$ for which the limit needs to happen.)

Fix an $\omega$ at which the convergence happens. Let $u_j = \tau(\omega)$. Then the above limit gives
$$\begin{aligned}
[M^\tau]_t(\omega) &= \lim_{n \to \infty} \sum_i \bigl( M^\tau_{t_{i+1}^n}(\omega) - M^\tau_{t_i^n}(\omega) \bigr)^2 = \lim_{n \to \infty} \sum_i \bigl( M_{\tau \wedge t_{i+1}^n}(\omega) - M_{\tau \wedge t_i^n}(\omega) \bigr)^2 \\
&= \lim_{n \to \infty} \sum_i \bigl( M_{u_j \wedge t_{i+1}^n}(\omega) - M_{u_j \wedge t_i^n}(\omega) \bigr)^2 = [M]_{u_j \wedge t}(\omega) = [M]_{\tau \wedge t}(\omega) = [M]_t^\tau(\omega).
\end{aligned}$$
The meaning of the first equality sign above is that we know $[M^\tau]_t$ is given by this limit, since according to the existence theorem, $[M^\tau]_t$ is given as a limit in probability along any sequence of partitions with vanishing mesh. We have shown that $[M^\tau]_t = [M]_t^\tau$ almost surely for a discrete stopping time $\tau$.
Step 2. Let $\tau$ be an arbitrary stopping time, but assume $M$ is an $L^2$-martingale. Let $\tau_n = 2^{-n}(\lfloor 2^n \tau \rfloor + 1)$ be the discrete approximation that converges to $\tau$ from the right. Apply (2.16) to $X = M^{\tau_n}$ and $Y = M^\tau$, take expectations, use (3.14), and apply the Schwarz inequality to get
$$\begin{aligned}
E\bigl| [M^{\tau_n}]_t - [M^\tau]_t \bigr| &\le E\bigl( [M^{\tau_n} - M^\tau]_t \bigr) + 2\,E\bigl( [M^{\tau_n} - M^\tau]_t^{1/2}\, [M^\tau]_t^{1/2} \bigr) \\
&\le E\bigl( (M_t^{\tau_n} - M_t^\tau)^2 \bigr) + 2\,E\bigl( [M^{\tau_n} - M^\tau]_t \bigr)^{1/2} E\bigl( [M^\tau]_t \bigr)^{1/2} \\
&\le E\bigl( (M_{\tau_n \wedge t} - M_{\tau \wedge t})^2 \bigr) + 2\,E\bigl( (M_{\tau_n \wedge t} - M_{\tau \wedge t})^2 \bigr)^{1/2} E\bigl( M_t^2 \bigr)^{1/2} \\
&\le \bigl( E M_{\tau_n \wedge t}^2 - E M_{\tau \wedge t}^2 \bigr) + 2\,\bigl( E M_{\tau_n \wedge t}^2 - E M_{\tau \wedge t}^2 \bigr)^{1/2} E\bigl( M_t^2 \bigr)^{1/2}.
\end{aligned}$$
In the last step we used (3.3) in two ways. For a martingale it gives equality, and so
$$E\bigl( (M_{\tau_n \wedge t} - M_{\tau \wedge t})^2 \bigr) = E M_{\tau_n \wedge t}^2 - 2\,E\bigl\{ E(M_{\tau_n \wedge t} \mid \mathcal{F}_{\tau \wedge t})\, M_{\tau \wedge t} \bigr\} + E M_{\tau \wedge t}^2 = E M_{\tau_n \wedge t}^2 - E M_{\tau \wedge t}^2.$$
Second, we applied (3.3) to the submartingale $M^2$ to get
$$E\bigl( M_{\tau \wedge t}^2 \bigr)^{1/2} \le E\bigl( M_t^2 \bigr)^{1/2}.$$
The string of inequalities allows us to conclude that $[M^{\tau_n}]_t$ converges to $[M^\tau]_t$ in $L^1$ as $n \to \infty$, if we can show that
$$(3.15)\qquad E M_{\tau_n \wedge t}^2 \to E M_{\tau \wedge t}^2.$$
To argue this last limit, first note that by right-continuity, $M_{\tau_n \wedge t}^2 \to M_{\tau \wedge t}^2$ almost surely. By optional stopping (3.3),
$$0 \le M_{\tau_n \wedge t}^2 \le E\bigl( M_t^2 \bigm| \mathcal{F}_{\tau_n \wedge t} \bigr).$$
This inequality and Lemma B.14 from the Appendix imply that the sequence $\{M_{\tau_n \wedge t}^2 : n \in \mathbf{N}\}$ is uniformly integrable. Under uniform integrability, the almost sure convergence implies convergence of the expectations (3.15).

To summarize, we have shown that $[M^{\tau_n}]_t \to [M^\tau]_t$ in $L^1$ as $n \to \infty$. By Step 1, $[M^{\tau_n}]_t = [M]_{\tau_n \wedge t}$, which converges to $[M]_{\tau \wedge t}$ by right-continuity of the process $[M]$. Putting these together, we get the almost sure equality $[M^\tau]_t = [M]_{\tau \wedge t}$ for $L^2$-martingales.
Step 3. Lastly a localization step. Let $\{\sigma_k\}$ be stopping times such that $\sigma_k \nearrow \infty$ and $M^{\sigma_k}$ is an $L^2$-martingale for each $k$. By Step 2,
$$[M^{\sigma_k \wedge \tau}]_t = [M^{\sigma_k}]_{\tau \wedge t}.$$
On the event $\{\sigma_k > t\}$, throughout the time interval $[0, t]$, $M^{\sigma_k \wedge \tau}$ agrees with $M^\tau$ and $M^{\sigma_k}$ agrees with $M$. Hence the corresponding sums of squared increments agree also. In the limits of vanishing mesh we have, almost surely, $[M^{\sigma_k \wedge \tau}]_t = [M^\tau]_t$, and also $[M^{\sigma_k}]_s = [M]_s$ for all $s \in [0, t]$ by right-continuity. We can take $s = \tau \wedge t$, and this way we get the required equality $[M^\tau]_t = [M]_t^\tau$. $\square$
100 3. Martingales
Theorem 3.28. (a) If M is a right-continuous L
2
-martingale, then M
2
t

[M]
t
is a martingale.
(b) If M is a right-continuous local L
2
-martingale, then M
2
t
[M]
t
is a
local martingale.
Proof. Part (a). Let $s < t$ and $A \in \mathcal{F}_s$. Let $0 = t_0 < \cdots < t_m = t$ be a partition of $[0,t]$, and assume that $s$ is a partition point, say $s = t_\ell$.
$$E\Bigl[\mathbf{1}_A\Bigl(M_t^2 - M_s^2 - [M]_t + [M]_s\Bigr)\Bigr] = E\Bigl[\mathbf{1}_A\Bigl(\sum_{i=\ell}^{m-1}\bigl(M_{t_{i+1}}^2 - M_{t_i}^2\bigr) - [M]_t + [M]_s\Bigr)\Bigr] \tag{3.16a}$$
$$= E\Bigl[\mathbf{1}_A\Bigl(\sum_{i=\ell}^{m-1}\bigl(M_{t_{i+1}} - M_{t_i}\bigr)^2 - [M]_t + [M]_s\Bigr)\Bigr]$$
$$= E\Bigl[\mathbf{1}_A\Bigl(\sum_{i=0}^{m-1}\bigl(M_{t_{i+1}} - M_{t_i}\bigr)^2 - [M]_t\Bigr)\Bigr] \tag{3.16b}$$
$$\quad + E\Bigl[\mathbf{1}_A\Bigl([M]_s - \sum_{i=0}^{\ell-1}\bigl(M_{t_{i+1}} - M_{t_i}\bigr)^2\Bigr)\Bigr]. \tag{3.16c}$$
The second equality above follows from
$$E\bigl[M_{t_{i+1}}^2 - M_{t_i}^2 \,\big|\, \mathcal{F}_{t_i}\bigr] = E\bigl[(M_{t_{i+1}} - M_{t_i})^2 \,\big|\, \mathcal{F}_{t_i}\bigr].$$
To apply this, the expectation on line (3.16a) has to be taken apart, the conditioning applied to individual terms, and then the expectation put back together. Letting the mesh of the partition tend to zero makes the expectations on lines (3.16b)–(3.16c) vanish by the $L^1$ convergence in (2.10) for $L^2$-martingales.
In the limit we have
$$E\bigl[\mathbf{1}_A\bigl(M_t^2 - [M]_t\bigr)\bigr] = E\bigl[\mathbf{1}_A\bigl(M_s^2 - [M]_s\bigr)\bigr]$$
for an arbitrary $A \in \mathcal{F}_s$, which implies the martingale property.
(b) Let $X = M^2 - [M]$ for the local $L^2$-martingale $M$. Let $\{\sigma_k\}$ be a localizing sequence for $M$. By part (a), $(M^{\sigma_k})_t^2 - [M^{\sigma_k}]_t$ is a martingale. Since $[M^{\sigma_k}]_t = [M]_{\sigma_k \wedge t}$ by Lemma 3.27, this martingale is the same as $M_{\sigma_k \wedge t}^2 - [M]_{\sigma_k \wedge t} = X_t^{\sigma_k}$. Thus $\{\sigma_k\}$ is a localizing sequence for $X$. $\square$
From Theorem 3.26 it follows also that the covariation $[M,N]$ of two right-continuous local martingales $M$ and $N$ exists. As a difference of increasing processes, $[M,N]$ is a finite variation process.
Lemma 3.29. Let $M$ and $N$ be cadlag $L^2$-martingales or local $L^2$-martingales. Let $\tau$ be a stopping time. Then $[M^{\tau}, N] = [M^{\tau}, N^{\tau}] = [M,N]^{\tau}$.
3.4. Quadratic variation for semimartingales 101
Proof. $[M^{\tau}, N^{\tau}] = [M,N]^{\tau}$ follows from the definition (2.11) and Lemma 3.27. For the first equality claimed, consider a partition $\{t_i\}$ of $[0,t]$. If $0 < \tau < t$, let $\ell$ be the index such that $t_\ell \le \tau < t_{\ell+1}$. Then
$$\sum_i \bigl(M^{\tau}_{t_{i+1}} - M^{\tau}_{t_i}\bigr)\bigl(N_{t_{i+1}} - N_{t_i}\bigr) = \bigl(M_{\tau} - M_{t_\ell}\bigr)\bigl(N_{t_{\ell+1}} - N_{\tau}\bigr)\mathbf{1}_{\{0 < \tau < t\}} + \sum_i \bigl(M^{\tau}_{t_{i+1}} - M^{\tau}_{t_i}\bigr)\bigl(N^{\tau}_{t_{i+1}} - N^{\tau}_{t_i}\bigr).$$
(If $\tau = 0$ the equality above is still true, for both sides vanish.) Let the mesh of the partition tend to zero. With cadlag paths, the term after the equality sign converges almost surely to $(M_{\tau} - M_{\tau-})(N_{\tau+} - N_{\tau})\mathbf{1}_{\{0 < \tau < t\}} = 0$. The convergence of the sums gives $[M^{\tau}, N] = [M^{\tau}, N^{\tau}]$. $\square$
Theorem 3.30. (a) If $M$ and $N$ are right-continuous $L^2$-martingales, then $MN - [M,N]$ is a martingale.

(b) If $M$ and $N$ are right-continuous local $L^2$-martingales, then $MN - [M,N]$ is a local martingale.

Proof. Both parts follow from Theorem 3.28 after writing
$$MN - [M,N] = \tfrac{1}{2}\bigl\{(M+N)^2 - [M+N]\bigr\} - \tfrac{1}{2}\bigl\{M^2 - [M]\bigr\} - \tfrac{1}{2}\bigl\{N^2 - [N]\bigr\}. \qquad \square$$
As the last issue we extend the existence results to semimartingales.

Corollary 3.31. Let $M$ be a cadlag local martingale, $V$ a cadlag FV process, $M_0 = V_0 = 0$, and $Y = Y_0 + M + V$ the cadlag semimartingale. Then $[Y]$ exists and is given by
$$[Y]_t = [M]_t + 2[M,V]_t + [V]_t = [M]_t + 2\sum_{s \in (0,t]} \Delta M_s\,\Delta V_s + \sum_{s \in (0,t]} (\Delta V_s)^2. \tag{3.17}$$
Furthermore, $[Y^{\tau}] = [Y]^{\tau}$ for any stopping time $\tau$, and the covariation $[X,Y]$ exists for any pair of cadlag semimartingales.

Proof. We already know the existence and properties of $[M]$. According to Lemma A.10 the two sums on line (3.17) converge absolutely. Thus the process given by line (3.17) is a cadlag process. Theorem 3.26 and Lemma A.10 combined imply that line (3.17) is the limit (in probability) of sums $\sum_i (Y_{t_i} - Y_{t_{i-1}})^2$ as $\operatorname{mesh}(\pi) \to 0$. It follows from the limits and the cadlag property of paths that there is a version with nondecreasing paths. This proves the existence of $[Y]$.
$[Y^{\tau}] = [Y]^{\tau}$ follows by looking at line (3.17) term by term and by using Lemma 3.27. Since quadratic variation exists for semimartingales, so does $[X,Y] = [(X+Y)/2] - [(X-Y)/2]$. $\square$
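The polarization formula used here is an exact algebraic identity at the level of discrete sums of increments, so it can be checked numerically on arbitrary paths. A minimal sketch (array sizes and names are ours, not the text's):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
dX = rng.normal(size=n)  # increments of a discrete path X
dY = rng.normal(size=n)  # increments of a discrete path Y

# Discrete covariation: the sum of products of increments.
cov = np.sum(dX * dY)

# Polarization: [(X+Y)/2] - [(X-Y)/2] at the level of sums of squared increments.
polar = np.sum(((dX + dY) / 2) ** 2) - np.sum(((dX - dY) / 2) ** 2)

assert np.isclose(cov, polar)
```

The assertion holds for every realization, since $((a+b)/2)^2 - ((a-b)/2)^2 = ab$ term by term.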
3.5. Doob-Meyer decomposition

In addition to the quadratic variation $[M]$ there is another increasing process with similar notation, the so-called predictable quadratic variation $\langle M \rangle$, associated to a square-integrable martingale $M$. We will not use $\langle M \rangle$ in the text. But some books discuss only $\langle M \rangle$, so for the sake of completeness we must address the relationship between $[M]$ and $\langle M \rangle$. It turns out that for continuous square-integrable martingales $[M]$ and $\langle M \rangle$ coincide, so in that context one can ignore the difference without harm.

Throughout this section we work with a fixed probability space $(\Omega, \mathcal{F}, P)$ with a filtration $\{\mathcal{F}_t\}$ assumed to satisfy the usual conditions (complete, right-continuous).
In order to state a precise definition we need to introduce the predictable $\sigma$-algebra $\mathcal{P}$ on the space $\mathbb{R}_+ \times \Omega$: $\mathcal{P}$ is the sub-$\sigma$-algebra of $\mathcal{B}_{\mathbb{R}_+} \otimes \mathcal{F}$ generated by left-continuous adapted processes. More precisely, $\mathcal{P}$ is generated by events of the form $\{(t,\omega) : X_t(\omega) \in B\}$ where $X$ is an adapted, left-continuous process and $B \in \mathcal{B}_{\mathbb{R}}$. Such a process is measurable, even progressively measurable (Lemma 2.4), so these events lie in $\mathcal{B}_{\mathbb{R}_+} \otimes \mathcal{F}$. There are other ways of generating $\mathcal{P}$. For example, continuous processes would also do. Left-continuity has the virtue of focusing on the predictability: if we know $X_s$ for all $s < t$, then we can predict the value $X_t$. A thorough discussion of $\mathcal{P}$ has to wait till Section 5.1 where stochastic integration with respect to cadlag martingales is introduced. Any $\mathcal{P}$-measurable function $X : \mathbb{R}_+ \times \Omega \to \mathbb{R}$ is called a predictable process.
Here is the existence and uniqueness statement. It is a special case of the Doob-Meyer decomposition.

Theorem 3.32. Assume the filtration $\{\mathcal{F}_t\}$ satisfies the usual conditions. Let $M$ be a right-continuous square-integrable martingale. Then there is a unique predictable process $\langle M \rangle$ such that $M^2 - \langle M \rangle$ is a martingale.

If $M$ is a right-continuous local $L^2$-martingale, then there is a unique predictable process $\langle M \rangle$ such that $M^2 - \langle M \rangle$ is a local martingale.

Uniqueness above means uniqueness up to indistinguishability. $\langle M \rangle$ is called the predictable quadratic variation. For two such processes we can define the predictable covariation by $\langle M, N \rangle = \tfrac{1}{4}\langle M+N \rangle - \tfrac{1}{4}\langle M-N \rangle$. From Theorem 3.28 and the uniqueness of $\langle M \rangle$ it follows that if $[M]$ is predictable then $[M] = \langle M \rangle$.
Proposition 3.33. Assume the filtration satisfies the usual conditions.

(a) Suppose $M$ is a continuous square-integrable martingale. Then $\langle M \rangle = [M]$.

(b) Suppose $M$ is a right-continuous square-integrable martingale with stationary, independent increments: for all $s, t \ge 0$, $M_{s+t} - M_s$ is independent of $\mathcal{F}_s$ and has the same distribution as $M_t - M_0$. Then $\langle M \rangle_t = t\,E[M_1^2 - M_0^2]$.
Proof. Part (a). By Theorems 3.26 and 3.28, $[M]$ is a continuous, increasing process such that $M^2 - [M]$ is a martingale. Continuity implies that $[M]$ is predictable. Uniqueness of $\langle M \rangle$ implies $\langle M \rangle = [M]$.

Part (b). The deterministic, continuous function $t \mapsto t\,E[M_1^2 - M_0^2]$ is predictable.
For any $t > 0$ and integer $k$,
$$E[M_{kt}^2 - M_0^2] = \sum_{j=0}^{k-1} E\bigl[M_{(j+1)t}^2 - M_{jt}^2\bigr] = \sum_{j=0}^{k-1} E\bigl[(M_{(j+1)t} - M_{jt})^2\bigr] = k\,E\bigl[(M_t - M_0)^2\bigr] = k\,E[M_t^2 - M_0^2].$$
Using this twice, for any rational $k/n$,
$$E[M_{k/n}^2 - M_0^2] = k\,E[M_{1/n}^2 - M_0^2] = (k/n)\,E[M_1^2 - M_0^2].$$
Given an irrational $t > 0$, pick rationals $q_m \searrow t$. Fix $T \ge q_1$. By right-continuity of paths, $M_{q_m} \to M_t$ almost surely. Uniform integrability of $\{M_{q_m}^2\}$ follows from the submartingale property
$$0 \le M_{q_m}^2 \le E\bigl[M_T^2 \,\big|\, \mathcal{F}_{q_m}\bigr]$$
and Lemma B.14. Uniform integrability gives convergence of expectations $E[M_{q_m}^2] \to E[M_t^2]$. Applying this above gives
$$E[M_t^2 - M_0^2] = t\,E[M_1^2 - M_0^2].$$
Now we can check the martingale property. For $s < t$,
$$E[M_t^2 \mid \mathcal{F}_s] = M_s^2 + E[M_t^2 - M_s^2 \mid \mathcal{F}_s] = M_s^2 + E[(M_t - M_s)^2 \mid \mathcal{F}_s]$$
$$= M_s^2 + E[(M_{t-s} - M_0)^2] = M_s^2 + E[M_{t-s}^2 - M_0^2] = M_s^2 + (t-s)\,E[M_1^2 - M_0^2]. \qquad \square$$
This proposition was tailored to handle our two main examples.

Example 3.34. For a standard Brownian motion, $\langle B \rangle_t = [B]_t = t$. For a compensated Poisson process $M_t = N_t - \alpha t$,
$$\langle M \rangle_t = t\,E[M_1^2] = t\,E[(N_1 - \alpha)^2] = \alpha t.$$
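The two processes in Example 3.34 can be compared by simulation. The sketch below (parameter values are arbitrary choices of ours) draws many copies of $N_t$ for a rate $\alpha$ Poisson process and checks that $E[M_t^2]$ agrees with $\langle M\rangle_t = \alpha t$, which is also $E\bigl[[M]_t\bigr] = E[N_t]$:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, t, reps = 2.0, 5.0, 200_000

# N_t ~ Poisson(alpha * t); the compensated process is M_t = N_t - alpha * t.
N_t = rng.poisson(alpha * t, size=reps)
M_t = N_t - alpha * t

# [M]_t = N_t pathwise (each jump of M has size 1, so squared jumps sum to N_t),
# while <M>_t = alpha * t is deterministic.  Both have the same expectation:
print(N_t.mean(), (M_t ** 2).mean())  # both approximately alpha * t = 10
```

The quadratic variation $[M]_t = N_t$ is random, while $\langle M\rangle_t$ is deterministic; only their expectations agree.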
We continue this discussion for a while, although what comes next makes no appearance later on. It is possible to introduce $\langle M \rangle$ without reference to $\mathcal{P}$. We do so next, and then state the Doob-Meyer decomposition. Recall Definition 2.17 of an increasing process.

Definition 3.35. An increasing process $A$ is natural if for every bounded cadlag martingale $M = \{M(t) : 0 \le t < \infty\}$,
$$E\int_{(0,t]} M(s)\,dA(s) = E\int_{(0,t]} M(s-)\,dA(s) \qquad \text{for } 0 < t < \infty. \tag{3.18}$$

The expectation–integral on the left in condition (3.18) is interpreted in the following way. First, for a fixed $\omega$, the function $s \mapsto M(s,\omega)$ is integrated against the (positive) Lebesgue–Stieltjes measure of the function $s \mapsto A(s,\omega)$. The resulting quantity is a measurable function of $\omega$ (Exercise 3.9). Then this function is averaged over the probability space. A similar interpretation applies on the right in (3.18), of course, with the left limit $M(s-)$ in place of $M(s)$. The expectations in (3.18) can be infinite. For a fixed $\omega$,
$$\Bigl|\int_{(0,t]} M(s)\,dA(s)\Bigr| \le \sup_{s \le t} |M(s)|\; A(t) < \infty$$
so the random integral is finite.
Lemma 3.36. Let $A$ be an increasing process and $M$ a bounded cadlag martingale. If $A$ is continuous then (3.18) holds.

Proof. A cadlag path $s \mapsto M(s,\omega)$ has at most countably many discontinuities. If $A$ is continuous, the Lebesgue–Stieltjes measure $\Lambda_A$ gives no mass to singletons: $\Lambda_A\{s\} = A(s) - A(s-) = 0$, and hence no mass to a countable set either. Consequently
$$\int_{(0,t]} \bigl(M(s) - M(s-)\bigr)\,dA(s) = 0. \qquad \square$$
Much more is true. In fact, an increasing process is natural if and only if it is predictable. A proof can be found in Chapter 25 of [8]. Consequently the characterizing property of $\langle M \rangle$ can be taken to be naturalness rather than predictability.

Definition 3.37. For $0 < u < \infty$, let $\mathcal{T}_u$ be the collection of stopping times $\tau$ that satisfy $\tau \le u$. A process $X$ is of class DL if the random variables $\{X_\tau : \tau \in \mathcal{T}_u\}$ are uniformly integrable for each $0 < u < \infty$.

The main example is the following.

Lemma 3.38. A right-continuous nonnegative submartingale is of class DL.
Proof. Let $X$ be a right-continuous nonnegative submartingale, and $0 < u < \infty$. By (3.3),
$$0 \le X_\tau \le E[X_u \mid \mathcal{F}_\tau].$$
By Lemma B.14 the collection of all conditional expectations on the right is uniformly integrable. Consequently these inequalities imply the uniform integrability of the collection $\{X_\tau : \tau \in \mathcal{T}_u\}$. $\square$

Here is the main theorem. For a proof see Theorem 1.4.10 in [9].

Theorem 3.39. (Doob-Meyer Decomposition) Assume the underlying filtration is complete and right-continuous. Let $X$ be a right-continuous submartingale of class DL. Then there is an increasing natural process $A$, unique up to indistinguishability, such that $X - A$ is a martingale.

Let us return to a right-continuous square-integrable martingale $M$. We can now equivalently define $\langle M \rangle$ as the unique increasing, natural process such that $M_t^2 - \langle M \rangle_t$ is a martingale, given by the Doob-Meyer decomposition.
3.6. Spaces of martingales

Stochastic integrals will be constructed as limits. To get the desirable path properties for the integrals, it is convenient to take these limits in a space of stochastic processes rather than simply in terms of individual random variables. This is the purpose for introducing two spaces of martingales.

Given a probability space $(\Omega, \mathcal{F}, P)$ with a filtration $\{\mathcal{F}_t\}$, let $\mathcal{M}_2$ denote the space of square-integrable cadlag martingales on this space with respect to $\{\mathcal{F}_t\}$. The subspace of $\mathcal{M}_2$ of martingales with continuous paths is $\mathcal{M}_2^c$. $\mathcal{M}_2$ and $\mathcal{M}_2^c$ are both linear spaces.
We measure the size of a martingale $M \in \mathcal{M}_2$ with the quantity
$$\|M\|_{\mathcal{M}_2} = \sum_{k=1}^{\infty} 2^{-k}\bigl(1 \wedge \|M_k\|_{L^2(P)}\bigr). \tag{3.19}$$
Here $\|M_k\|_{L^2(P)} = E[\,|M_k|^2\,]^{1/2}$ is the $L^2$ norm of $M_k$. $\|\cdot\|_{\mathcal{M}_2}$ is not a norm because $\|\alpha M\|_{\mathcal{M}_2}$ is not necessarily equal to $|\alpha|\,\|M\|_{\mathcal{M}_2}$ for a real number $\alpha$. But the triangle inequality
$$\|M + N\|_{\mathcal{M}_2} \le \|M\|_{\mathcal{M}_2} + \|N\|_{\mathcal{M}_2}$$
is valid, and follows from the triangle inequality of the $L^2$ norm and because $1 \wedge (a+b) \le 1 \wedge a + 1 \wedge b$ for $a, b \ge 0$.
Hence we can define a metric, or distance function, between two martingales $M, N \in \mathcal{M}_2$ by
$$d_{\mathcal{M}_2}(M, N) = \|M - N\|_{\mathcal{M}_2}. \tag{3.20}$$
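The series (3.19) is straightforward to evaluate numerically. A small sketch (the function and parameter names are ours, not the text's), using the fact that $\|B_k\|_{L^2(P)} = \sqrt{k}$ for a standard Brownian motion, illustrates both the bound $\|M\|_{\mathcal{M}_2} \le 1$ and the failure of homogeneity:

```python
import math

def m2_norm(l2_norms, kmax=50):
    """Truncated version of (3.19): sum_{k>=1} 2^{-k} * min(1, ||M_k||_{L^2}).

    l2_norms(k) returns the L^2(P) norm of M_k; the truncation at kmax is
    harmless since each summand is at most 2^{-k}.
    """
    return sum(2.0 ** -k * min(1.0, l2_norms(k)) for k in range(1, kmax + 1))

# Standard Brownian motion: ||B_k||_{L^2} = sqrt(k) >= 1 for k >= 1,
# so every summand is capped at 2^{-k} and the total is just below 1.
bm = m2_norm(lambda k: math.sqrt(k))

# Not homogeneous: halving the martingale does not halve the quantity.
half = m2_norm(lambda k: 0.5 * math.sqrt(k))
assert half > 0.5 * bm
```

The capping by $1$ is what keeps the series finite for every $M \in \mathcal{M}_2$, at the price of homogeneity.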
A technical issue arises here. A basic property of a metric is that the distance between two elements is zero iff these two elements coincide. But with the above definition we have $d_{\mathcal{M}_2}(M,N) = 0$ if $M$ and $N$ are indistinguishable, even if they are not exactly equal as functions. So if we were to precisely follow the axiomatics of metric spaces, indistinguishable martingales should actually be regarded as equal. The mathematically sophisticated way of doing this is to regard $\mathcal{M}_2$ not as a space of processes but as a space of equivalence classes
$$\bar M = \bigl\{N : N \text{ is a square-integrable cadlag martingale on } (\Omega, \mathcal{F}, P), \text{ and } M \text{ and } N \text{ are indistinguishable}\bigr\}.$$
Fortunately this technical point does not cause any difficulties. We shall continue to regard the elements of $\mathcal{M}_2$ as processes in our discussion, and remember that two indistinguishable processes are really two representatives of the same underlying process.
Lemma 3.40. Assume the underlying probability space $(\Omega, \mathcal{F}, P)$ and the filtration $\{\mathcal{F}_t\}$ complete. Let indistinguishable processes be interpreted as equal. Then $\mathcal{M}_2$ is a complete metric space under the metric $d_{\mathcal{M}_2}$. The subspace $\mathcal{M}_2^c$ is closed, and hence a complete metric space in its own right.
Proof. Suppose $M \in \mathcal{M}_2$ and $\|M\|_{\mathcal{M}_2} = 0$. Then $E[M_k^2] = 0$ for each $k \in \mathbb{N}$. Since $M_t^2$ is a submartingale, $E[M_t^2] \le E[M_k^2]$ for $t \le k$, and consequently $E[M_t^2] = 0$ for all $t \ge 0$. In particular, for each fixed $t$, $P\{M_t = 0\} = 1$. A countable union of null sets is a null set, and so there exists an event $\Omega_0$ such that $P(\Omega_0) = 1$ and $M_q(\omega) = 0$ for all $\omega \in \Omega_0$ and $q \in \mathbb{Q}_+$. By right-continuity, then $M_t(\omega) = 0$ for all $\omega \in \Omega_0$ and $t \in \mathbb{R}_+$. This shows that $M$ is indistinguishable from the identically zero process.
The above paragraph shows that $d_{\mathcal{M}_2}(M,N) = 0$ implies $M = N$, in the sense that $M$ and $N$ are indistinguishable. We already observed above that $d_{\mathcal{M}_2}$ satisfies the triangle inequality. The remaining property of a metric is the symmetry $d_{\mathcal{M}_2}(M,N) = d_{\mathcal{M}_2}(N,M)$, which is true by the definition.
To check completeness, let $\{M^{(n)} : n \in \mathbb{N}\}$ be a Cauchy sequence in the metric $d_{\mathcal{M}_2}$ in the space $\mathcal{M}_2$. We need to show that there exists $M \in \mathcal{M}_2$ such that $\|M^{(n)} - M\|_{\mathcal{M}_2} \to 0$ as $n \to \infty$.
For any $t \le k \in \mathbb{N}$, first because $(M_t^{(m)} - M_t^{(n)})^2$ is a submartingale, and then by the definition (3.19),
$$1 \wedge E\bigl[(M_t^{(m)} - M_t^{(n)})^2\bigr]^{1/2} \le 1 \wedge E\bigl[(M_k^{(m)} - M_k^{(n)})^2\bigr]^{1/2} \le 2^k\,\|M^{(m)} - M^{(n)}\|_{\mathcal{M}_2}.$$
It follows that for each fixed $t$, $\{M_t^{(n)}\}$ is a Cauchy sequence in $L^2(P)$. By the completeness of the space $L^2(P)$, for each $t \ge 0$ there exists a random variable $Y_t \in L^2(P)$ defined by the mean-square limit
$$\lim_{n \to \infty} E\bigl[(M_t^{(n)} - Y_t)^2\bigr] = 0. \tag{3.21}$$
Take $s < t$ and $A \in \mathcal{F}_s$. Let $n \to \infty$ in the equality $E[\mathbf{1}_A M_t^{(n)}] = E[\mathbf{1}_A M_s^{(n)}]$. Mean-square convergence guarantees the convergence of the expectations, and in the limit
$$E[\mathbf{1}_A Y_t] = E[\mathbf{1}_A Y_s]. \tag{3.22}$$
We could already conclude here that $\{Y_t\}$ is a martingale, but $\{Y_t\}$ is not our ultimate limit because we need the cadlag path property.
To get a cadlag limit we use a Borel–Cantelli argument. By inequality (3.8),
$$P\Bigl\{\sup_{0 \le t \le k} |M_t^{(m)} - M_t^{(n)}| \ge \varepsilon\Bigr\} \le \varepsilon^{-2}\,E\bigl[(M_k^{(m)} - M_k^{(n)})^2\bigr]. \tag{3.23}$$
This enables us to choose a subsequence $\{n_k\}$ such that
$$P\Bigl\{\sup_{0 \le t \le k} |M_t^{(n_{k+1})} - M_t^{(n_k)}| \ge 2^{-k}\Bigr\} \le 2^{-k}. \tag{3.24}$$
To achieve this, start with $n_0 = 1$, and assuming $n_{k-1}$ has been chosen, pick $n_k > n_{k-1}$ so that
$$\|M^{(m)} - M^{(n)}\|_{\mathcal{M}_2} \le 2^{-3k} \qquad \text{for } m, n \ge n_k.$$
Then for $m \ge n_k$,
$$1 \wedge E\bigl[(M_k^{(m)} - M_k^{(n_k)})^2\bigr]^{1/2} \le 2^k\,\|M^{(m)} - M^{(n_k)}\|_{\mathcal{M}_2} \le 2^{-2k},$$
and the minimum with 1 is superfluous since $2^{-2k} < 1$. Substituting this back into (3.23) with $\varepsilon = 2^{-k}$ gives (3.24), with $2^{-2k}$ on the right-hand side.
By the Borel–Cantelli lemma, there exists an event $\Omega_1$ with $P(\Omega_1) = 1$ such that for $\omega \in \Omega_1$,
$$\sup_{0 \le t \le k} \bigl|M_t^{(n_{k+1})}(\omega) - M_t^{(n_k)}(\omega)\bigr| < 2^{-k}$$
for all but finitely many $k$'s. It follows that the sequence of cadlag functions $t \mapsto M_t^{(n_k)}(\omega)$ is Cauchy in the uniform metric over any bounded time interval $[0,T]$. By Lemma A.5 in the Appendix, for each $T < \infty$ there exists a cadlag process $\{N_t^{(T)}(\omega) : 0 \le t \le T\}$ such that $M_t^{(n_k)}(\omega)$ converges to $N_t^{(T)}(\omega)$ uniformly on the time interval $[0,T]$, as $k \to \infty$, for any $\omega \in \Omega_1$. $N_t^{(S)}(\omega)$ and $N_t^{(T)}(\omega)$ must agree for $t \in [0, S \wedge T]$, since both are limits of the same sequence. Thus we can define one cadlag function $t \mapsto M_t(\omega)$ on $\mathbb{R}_+$ for $\omega \in \Omega_1$, such that $M_t^{(n_k)}(\omega)$ converges to $M_t(\omega)$ uniformly on each bounded time interval $[0,T]$. To have $M$ defined on all of $\Omega$, set $M_t(\omega) = 0$ for $\omega \notin \Omega_1$.
The event $\Omega_1$ lies in $\mathcal{F}_t$ by the assumption of completeness of the filtration. Since $M_t^{(n_k)} \to M_t$ on $\Omega_1$ while $M_t = 0$ on $\Omega_1^c$, it follows that $M_t$ is $\mathcal{F}_t$-measurable. The almost sure limit $M_t$ and the $L^2$ limit $Y_t$ of the sequence $\{M_t^{(n_k)}\}$ must coincide almost surely. Consequently (3.22) becomes
$$E[\mathbf{1}_A M_t] = E[\mathbf{1}_A M_s] \tag{3.25}$$
for all $A \in \mathcal{F}_s$, and gives the martingale property for $M$.
To summarize, $M$ is now a square-integrable cadlag martingale, in other words an element of $\mathcal{M}_2$. The final piece, namely $\|M^{(n)} - M\|_{\mathcal{M}_2} \to 0$, follows because we can replace $Y_t$ by $M_t$ in (3.21) due to the almost sure equality $M_t = Y_t$.

If all the $M^{(n)}$ are continuous martingales, the uniform convergence above produces a continuous limit $M$. This shows that $\mathcal{M}_2^c$ is a closed subspace of $\mathcal{M}_2$ under the metric $d_{\mathcal{M}_2}$. $\square$
By adapting the argument above from equation (3.23) onwards, we get this useful consequence of convergence in $\mathcal{M}_2$.

Lemma 3.41. Suppose $\|M^{(n)} - M\|_{\mathcal{M}_2} \to 0$ as $n \to \infty$. Then for each $T < \infty$ and $\varepsilon > 0$,
$$\lim_{n \to \infty} P\Bigl\{\sup_{0 \le t \le T} |M_t^{(n)} - M_t| \ge \varepsilon\Bigr\} = 0. \tag{3.26}$$
Furthermore, there exists a subsequence $\{M^{(n_k)}\}$ and an event $\Omega_0$ such that $P(\Omega_0) = 1$ and for each $\omega \in \Omega_0$ and $T < \infty$,
$$\lim_{k \to \infty} \sup_{0 \le t \le T} \bigl|M_t^{(n_k)}(\omega) - M_t(\omega)\bigr| = 0. \tag{3.27}$$

The convergence defined by (3.26) for all $T < \infty$ and $\varepsilon > 0$ is called uniform convergence in probability on compact intervals.

We shall write $\mathcal{M}_{2,\mathrm{loc}}$ for the space of cadlag local $L^2$-martingales with respect to a given filtration $\{\mathcal{F}_t\}$ on a given probability space $(\Omega, \mathcal{F}, P)$. We do not introduce a distance function on this space.
Exercises

Exercise 3.1. Create a simple example that shows that $E(M_\tau \mid \mathcal{F}_\sigma) = M_\sigma$ cannot hold for general random times $\sigma \le \tau$ that are not stopping times.

Exercise 3.2. Let $B$ be a standard Brownian motion and $\tau = \inf\{t \ge 0 : B_t = 1\}$. Show that $P(\tau < \infty) = 1$ and $E(B_{\tau \wedge n}) = E(B_0)$ for all $n \in \mathbb{N}$, but $E(B_\tau) = E(B_0)$ fails. In other words, optional stopping does not work for all stopping times.
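The phenomenon in Exercise 3.2 can be previewed numerically with a simple symmetric random walk in place of Brownian motion (a substitution of ours that keeps the simulation exact): the mean at the bounded time $\tau \wedge n$ stays at $0$, while almost every path eventually sits at $1$.

```python
import numpy as np

rng = np.random.default_rng(2)
reps, n = 20_000, 400

# Simple symmetric random walk: a discrete stand-in for Brownian motion.
steps = 2 * rng.integers(0, 2, size=(reps, n)) - 1
S = np.cumsum(steps, axis=1)

# tau = first time the walk hits 1; paths that miss are evaluated at the horizon.
hit = (S == 1)
tau = np.where(hit.any(axis=1), hit.argmax(axis=1), n - 1)
S_stopped = S[np.arange(reps), tau]

print(S_stopped.mean())         # near 0: optional stopping at the bounded time tau ∧ n
print((S_stopped == 1).mean())  # already close to 1, and -> 1 as n grows
```

The hitting probability tends to $1$ while the stopped mean stays at $0$ for every finite horizon, so the limit $E(S_\tau) = 1 \ne 0$ cannot be reached by optional stopping.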
Exercise 3.3. Let $\sigma$ be a stopping time and $M$ a right-continuous martingale. Corollaries 3.7 and 3.8 imply that $M^{\sigma}$ is a martingale for both filtrations $\{\mathcal{F}_t\}$ and $\{\mathcal{F}_{\sigma \wedge t}\}$. This exercise shows that any martingale with respect to $\{\mathcal{F}_{\sigma \wedge t}\}$ is also a martingale with respect to $\{\mathcal{F}_t\}$.

So suppose $\{X_t\}$ is a martingale with respect to $\{\mathcal{F}_{\sigma \wedge t}\}$. That is, $X_t$ is $\mathcal{F}_{\sigma \wedge t}$-measurable and integrable, and $E(X_t \mid \mathcal{F}_{\sigma \wedge s}) = X_s$ for $s < t$.

(a) Show that for $s < t$, $\mathbf{1}_{\{\sigma \le s\}} X_t = \mathbf{1}_{\{\sigma \le s\}} X_s$. Hint: $\mathbf{1}_{\{\sigma \le s\}} X_t$ is $\mathcal{F}_{\sigma \wedge s}$-measurable. Multiply the martingale property by $\mathbf{1}_{\{\sigma \le s\}}$.

(b) Show that $\{X_t\}$ is a martingale with respect to $\{\mathcal{F}_t\}$. Hint: With $s < t$ and $A \in \mathcal{F}_s$, start with $E(\mathbf{1}_A X_t) = E(\mathbf{1}_A \mathbf{1}_{\{\sigma \le s\}} X_t) + E(\mathbf{1}_A \mathbf{1}_{\{\sigma > s\}} X_t)$.
Exercise 3.4. (a) Suppose $X$ is an adapted process and $\tau_1, \ldots, \tau_k$ stopping times such that $X^{\tau_1}, \ldots, X^{\tau_k}$ are all martingales. Show that then $X^{\tau_1 \vee \cdots \vee \tau_k}$ is a martingale. Hint: In the case $k = 2$ write $X^{\tau_1 \vee \tau_2}$ in terms of $X^{\tau_1}$, $X^{\tau_2}$ and $X^{\tau_1 \wedge \tau_2}$.

(b) Let $M$ be an adapted process, and suppose there exists a sequence of stopping times $\sigma_n \nearrow \infty$ (a.s.) such that $M^{\sigma_n}$ is a local martingale for each $n$. Show that then $M$ is a local martingale.
Exercise 3.5. Let $M$ be a right-continuous local martingale such that $M_t^* = \sup_{0 \le s \le t}|M_s| \in L^1(P)$ for all $t \in \mathbb{R}_+$. Show that then $M$ is a martingale. Hint: Write down the martingale property of the localized process and use the dominated convergence theorem of conditional expectations.
Exercise 3.6. Let $M$ be a local martingale with localizing sequence $\{\sigma_k\}$. Show that $M^{\sigma_n \wedge n}$ is uniformly integrable (in other words, that the family of random variables $\{M_t^{\sigma_n \wedge n} : t \in \mathbb{R}_+\}$ is uniformly integrable). Hint: Use Lemma B.14.
Exercise 3.7. Let $M$ be some process and $X_t = M_t - M_0$. Show that if $M$ is a local martingale, then so is $X$, but the converse is not true. For the counterexample, consider simply $M_t = \xi$ for a fixed random variable $\xi$.
Exercise 3.8. Let $(\Omega, \mathcal{F}, P)$ be a complete probability space, $\mathcal{N} = \{N \in \mathcal{F} : P(N) = 0\}$ the class of null sets, and take a random variable $X \in L^1(P)$ but not in $L^2(P)$. For $t \in \mathbb{R}_+$ define
$$\mathcal{F}_t = \begin{cases} \sigma(\mathcal{N}), & 0 \le t < 1,\\ \mathcal{F}, & t \ge 1, \end{cases} \qquad \text{and} \qquad M_t = \begin{cases} EX, & 0 \le t < 1,\\ X, & t \ge 1. \end{cases}$$
Then $\{\mathcal{F}_t\}$ satisfies the usual conditions and $M$ is a martingale but not a local $L^2$-martingale. Hint: Show that $\{\tau < 1\} \in \sigma(\mathcal{N})$ for any stopping time $\tau$.
Exercise 3.9. Let $A$ be an increasing process, and $\phi : \mathbb{R}_+ \times \Omega \to \mathbb{R}$ a bounded $\mathcal{B}_{\mathbb{R}_+} \otimes \mathcal{F}$-measurable function. Let $T < \infty$. Show that
$$g_\phi(\omega) = \int_{(0,T]} \phi(t,\omega)\,dA_t(\omega)$$
is an $\mathcal{F}$-measurable function. Show also that, for any $\mathcal{B}_{\mathbb{R}_+} \otimes \mathcal{F}$-measurable nonnegative function $\psi : \mathbb{R}_+ \times \Omega \to \mathbb{R}_+$,
$$g_\psi(\omega) = \int_{(0,\infty)} \psi(t,\omega)\,dA_t(\omega)$$
is an $\mathcal{F}$-measurable function. The integrals are Lebesgue–Stieltjes integrals, evaluated separately for each $\omega$. The only point in separating the two cases is that if $\phi$ takes both positive and negative values, the integral over the entire interval $[0,\infty)$ might not be defined.

Hint: One can start with $\phi(t,\omega) = \mathbf{1}_{(a,b]}(t)\mathbf{1}_\Gamma(\omega)$ for $0 \le a < b < \infty$ and $\Gamma \in \mathcal{F}$. Then apply Theorem B.2 from the Appendix.
Exercise 3.10. Let $N = \{N(t) : 0 \le t < \infty\}$ be a homogeneous rate $\alpha$ Poisson process with respect to $\{\mathcal{F}_t\}$, and $M_t = N_t - \alpha t$ the compensated Poisson process. We have seen that the quadratic variation is $[M]_t = N_t$ while $\langle M \rangle_t = \alpha t$. It follows that $N$ cannot be a natural increasing process. In this exercise you show that the naturalness condition fails for $N$.

(a) Let $\theta > 0$. Show that
$$X(t) = \exp\bigl\{-\theta N(t) + \alpha t(1 - e^{-\theta})\bigr\}$$
is a martingale.

(b) Show that $N$ is not a natural increasing process, by showing that for $X$ defined above, the condition
$$E\int_{(0,t]} X(s)\,dN(s) = E\int_{(0,t]} X(s-)\,dN(s)$$
fails. (In case you protest that $X$ is not a bounded martingale, fix $T > t$ and consider $X(s \wedge T)$.)
Chapter 4

Stochastic Integral with respect to Brownian Motion

As an introduction to stochastic integration we develop the stochastic integral with respect to Brownian motion. This can be done with fewer technicalities than are needed for integrals with respect to general cadlag martingales, so the basic ideas of stochastic integration are in clearer view. The same steps will be repeated in the next chapter in the development of the more general integral. For this reason we leave the routine verifications in this chapter as exercises. We develop only enough properties of the integral to enable us to get to the point where the integral of local integrands is defined. This chapter can be skipped without loss. Only Lemma 4.2 is referred to later in Section 5.5. This lemma is of a technical nature and can be read independently of the rest of the chapter.

Throughout this chapter, $(\Omega, \mathcal{F}, P)$ is a fixed probability space with a filtration $\{\mathcal{F}_t\}$, and $B = \{B_t\}$ is a standard one-dimensional Brownian motion with respect to the filtration $\{\mathcal{F}_t\}$ (Definition 2.26). We assume that $\mathcal{F}$ and each $\mathcal{F}_t$ contain all subsets of events of probability zero, an assumption that entails no loss of generality as explained in Section 2.1.

To begin, let us imagine that we are trying to define the integral
$$\int_0^t B_s\,dB_s$$
through an approximation by Riemann sums. The next calculation reveals that, contrary to the familiar Riemann and Stieltjes integrals with reasonably regular functions, the choice of point of evaluation in a partition interval matters.
Lemma 4.1. Fix a number $u \in [0,1]$. Given a partition $\pi = \{0 = t_0 < t_1 < \cdots < t_{m(\pi)} = t\}$, let $s_i = (1-u)t_i + u\,t_{i+1}$, and define
$$S(\pi) = \sum_{i=0}^{m(\pi)-1} B_{s_i}\bigl(B_{t_{i+1}} - B_{t_i}\bigr).$$
Then
$$\lim_{\operatorname{mesh}(\pi) \to 0} S(\pi) = \tfrac{1}{2}B_t^2 - \tfrac{1}{2}t + ut \quad \text{in } L^2(P).$$
Proof. First check the algebra identity
$$b(a-c) = \frac{a^2}{2} - \frac{c^2}{2} - \frac{(a-c)^2}{2} + (b-c)^2 + (a-b)(b-c).$$
Applying this,
$$S(\pi) = \tfrac{1}{2}B_t^2 - \tfrac{1}{2}\sum_i \bigl(B_{t_{i+1}} - B_{t_i}\bigr)^2 + \sum_i \bigl(B_{s_i} - B_{t_i}\bigr)^2 + \sum_i \bigl(B_{t_{i+1}} - B_{s_i}\bigr)\bigl(B_{s_i} - B_{t_i}\bigr)$$
$$\equiv \tfrac{1}{2}B_t^2 - S_1(\pi) + S_2(\pi) + S_3(\pi)$$
where the last equality defines the sums $S_1(\pi)$, $S_2(\pi)$, and $S_3(\pi)$. By Proposition 2.42,
$$\lim_{\operatorname{mesh}(\pi)\to 0} S_1(\pi) = \tfrac{1}{2}t \quad \text{in } L^2(P).$$
For the second sum,
$$E[S_2(\pi)] = \sum_i (s_i - t_i) = u\sum_i (t_{i+1} - t_i) = ut,$$
and
$$\operatorname{Var}[S_2(\pi)] = \sum_i \operatorname{Var}\bigl[(B_{s_i} - B_{t_i})^2\bigr] = 2\sum_i (s_i - t_i)^2 \le 2\sum_i (t_{i+1} - t_i)^2 \le 2t\operatorname{mesh}(\pi)$$
which vanishes as $\operatorname{mesh}(\pi) \to 0$. The factor 2 above comes from Gaussian properties: if $X$ is a mean zero normal with variance $\sigma^2$, then
$$\operatorname{Var}[X^2] = E[X^4] - \bigl(E[X^2]\bigr)^2 = 3\sigma^4 - \sigma^4 = 2\sigma^4.$$
The vanishing of the variance of $S_2(\pi)$ as $\operatorname{mesh}(\pi) \to 0$, together with the value of the mean, is equivalent to
$$\lim_{\operatorname{mesh}(\pi)\to 0} S_2(\pi) = ut \quad \text{in } L^2(P).$$
Lastly, we show that $S_3(\pi)$ vanishes in $L^2(P)$ as $\operatorname{mesh}(\pi) \to 0$:
$$E\bigl[S_3(\pi)^2\bigr] = E\Bigl[\Bigl(\sum_i \bigl(B_{t_{i+1}} - B_{s_i}\bigr)\bigl(B_{s_i} - B_{t_i}\bigr)\Bigr)^2\Bigr]$$
$$= \sum_i E\bigl[(B_{t_{i+1}} - B_{s_i})^2 (B_{s_i} - B_{t_i})^2\bigr] + \sum_{i \ne j} E\bigl[(B_{t_{i+1}} - B_{s_i})(B_{s_i} - B_{t_i})(B_{t_{j+1}} - B_{s_j})(B_{s_j} - B_{t_j})\bigr]$$
$$= \sum_i (t_{i+1} - s_i)(s_i - t_i) \le \sum_i (t_{i+1} - t_i)^2 \le t\operatorname{mesh}(\pi),$$
where the cross terms ($i \ne j$) vanish and the diagonal terms factor by the independence of Brownian increments. This again vanishes as $\operatorname{mesh}(\pi) \to 0$. $\square$

Proposition 2.28 says that only one choice, namely $u = 0$, or $s_i = t_i$, the initial point of the partition interval, makes the limit of $S(\pi)$ into a martingale. This is the choice for the Itô integral with respect to Brownian motion that we develop.
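The dependence on the evaluation point $u$ can be seen numerically. The sketch below (parameter values are arbitrary choices of ours, not from the text) simulates $S(\pi)$ on a fine partition for $u = 0$ (the Itô choice) and $u = 1/2$ (the midpoint, or Stratonovich, choice) and measures the mean-square distance to the limit $\tfrac{1}{2}B_t^2 - \tfrac{1}{2}t + ut$ of Lemma 4.1:

```python
import numpy as np

rng = np.random.default_rng(3)
t, m, reps = 1.0, 2000, 500
dt = t / (2 * m)  # simulate on a grid of half-steps so midpoints are grid points

mse = {}
for u in (0.0, 0.5):
    err2 = 0.0
    for _ in range(reps):
        B = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), 2 * m))))
        ev = B[int(2 * u):2 * m:2]     # B at s_i = (1-u) t_i + u t_{i+1}
        incr = B[2::2] - B[0:2 * m:2]  # B_{t_{i+1}} - B_{t_i}
        S = np.sum(ev * incr)
        target = 0.5 * B[-1] ** 2 - 0.5 * t + u * t
        err2 += (S - target) ** 2
    mse[u] = err2 / reps

print(mse)  # both mean-square errors are tiny and shrink further as m grows
```

The two choices of $u$ converge to limits differing by the deterministic amount $ut$, which is why the evaluation point matters here, unlike for Riemann–Stieltjes integrals.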
For a measurable process $X$, the $L^2$-norm over the set $[0,T] \times \Omega$ is
$$\|X\|_{L^2([0,T]\times\Omega)} = \Bigl(E\int_{[0,T]} |X(t,\omega)|^2\,dt\Bigr)^{1/2}. \tag{4.1}$$
Let $\mathcal{L}_2(B)$ denote the collection of all measurable, adapted processes $X$ such that
$$\|X\|_{L^2([0,T]\times\Omega)} < \infty$$
for all $T < \infty$. A metric on $\mathcal{L}_2(B)$ is defined by $d_{\mathcal{L}_2}(X,Y) = \|X - Y\|_{\mathcal{L}_2(B)}$, where
$$\|X\|_{\mathcal{L}_2(B)} = \sum_{k=1}^{\infty} 2^{-k}\bigl(1 \wedge \|X\|_{L^2([0,k]\times\Omega)}\bigr). \tag{4.2}$$
The triangle inequality
$$\|X + Y\|_{\mathcal{L}_2(B)} \le \|X\|_{\mathcal{L}_2(B)} + \|Y\|_{\mathcal{L}_2(B)}$$
is valid, and this gives the triangle inequality
$$d_{\mathcal{L}_2}(X,Y) \le d_{\mathcal{L}_2}(X,Z) + d_{\mathcal{L}_2}(Z,Y)$$
required for $d_{\mathcal{L}_2}(X,Y)$ to be a genuine metric.

To have a metric, one also needs the property that $d_{\mathcal{L}_2}(X,Y) = 0$ iff $X = Y$. We have to adopt the point of view that two processes $X$ and $Y$ are considered equal if the set of points $(t,\omega)$ where $X(t,\omega) \ne Y(t,\omega)$ has $m \otimes P$-measure zero. Equivalently,
$$\int_0^\infty P\{X(t) \ne Y(t)\}\,dt = 0. \tag{4.3}$$
In particular, processes that are indistinguishable, or modifications of each other, have to be considered equal under this interpretation.

The symmetry $d_{\mathcal{L}_2}(X,Y) = d_{\mathcal{L}_2}(Y,X)$ is immediate from the definition. So, with the appropriate meaning assigned to equality, $\mathcal{L}_2(B)$ is a metric space. Convergence $X_n \to X$ in $\mathcal{L}_2(B)$ is equivalent to $X_n \to X$ in $L^2([0,T] \times \Omega)$ for each $T < \infty$.

The symbol $B$ reminds us that $\mathcal{L}_2(B)$ is a space of integrands for stochastic integration with respect to Brownian motion.

The finite mean square requirement for membership in $\mathcal{L}_2(B)$ is restrictive. For example, it excludes some processes of the form $f(B_t)$ where $f$ is a smooth but rapidly growing function. Consequently from $\mathcal{L}_2(B)$ we move to a wider class of processes denoted by $\mathcal{L}(B)$, where the mean square requirement is imposed only locally and only on integrals over the time variable. Precisely, $\mathcal{L}(B)$ is the class of measurable, adapted processes $X$ such that
$$P\Bigl\{\omega : \int_0^T X(t,\omega)^2\,dt < \infty \text{ for all } T < \infty\Bigr\} = 1. \tag{4.4}$$
This will be the class of processes $X$ for which the stochastic integral process with respect to Brownian motion, denoted by
$$(X \cdot B)_t = \int_0^t X_s\,dB_s,$$
will ultimately be defined.
places. Here is our choice.
A simple predictable process is a process of the form
(4.5) X(t, ) =
0
()1
0
(t) +
n1

i=1

i
()1
(t
i
,t
i+1
]
(t)
where n is nite integer, 0 = t
0
= t
1
< t
2
< < t
n
are time points, and for
0 i n1,
i
is a bounded T
t
i
-measurable random variable on (, T, P).
Predictability refers to the fact that the value X
t
can be predicted from
X
s
: s < t. Here this point is rather simple because X is left-continuous so
X
t
= limX
s
as s t. In the next chapter we need to deal seriously with the
notion of predictability but in this chapter it is not really needed. We use
the term only to be consistent with what comes later. The value
0
at t = 0
is irrelevant both for the stochastic integral of X and for approximating
general processes. We include it so that the value X(0, ) is not articially
restricted.
A key point is that processes in $\mathcal{L}_2(B)$ can be approximated by simple predictable processes in the $\mathcal{L}_2(B)$-distance. We split this approximation into two steps.

Lemma 4.2. Suppose $X$ is a bounded, measurable, adapted process. Then there exists a sequence $\{X_n\}$ of simple predictable processes such that, for any $0 < T < \infty$,
$$\lim_{n \to \infty} E\int_0^T \bigl|X_n(t) - X(t)\bigr|^2\,dt = 0.$$
Proof. We begin by showing that, given $T < \infty$, we can find simple predictable processes $Y_k^{(T)}$ that vanish outside $[0,T]$ and satisfy
$$\lim_{k \to \infty} E\int_0^T \bigl|Y_k^{(T)}(t) - X(t)\bigr|^2\,dt = 0. \tag{4.6}$$
Extend $X$ to $\mathbb{R} \times \Omega$ by defining $X(t,\omega) = 0$ for $t < 0$. For each $n \in \mathbb{N}$ and $s \in [0,1]$, define
$$Z_{n,s}(t,\omega) = \sum_{j \in \mathbb{Z}} X\bigl(s + 2^{-n}j,\, \omega\bigr)\,\mathbf{1}_{(s+2^{-n}j,\; s+2^{-n}(j+1)]}(t)\,\mathbf{1}_{[0,T]}(t).$$
$Z_{n,s}$ is a simple predictable process. It is jointly measurable as a function of the triple $(s,t,\omega)$, so it can be integrated over all three variables. Fubini's theorem allows us to perform these integrations in any order we please.
We claim that
$$\lim_{n \to \infty} E\int_0^T dt \int_0^1 ds\, \bigl|Z_{n,s}(t) - X(t)\bigr|^2 = 0. \tag{4.7}$$
This limit relies on the property called $L^p$-continuity (Proposition A.18).
To prove (4.7), start by considering a fixed $\omega$ and rewrite the double integral as follows:
$$\int_0^T dt \int_0^1 ds\, \bigl|Z_{n,s}(t,\omega) - X(t,\omega)\bigr|^2$$
$$= \int_0^T dt \sum_{j\in\mathbb{Z}} \int_0^1 ds\, \bigl|X(s + 2^{-n}j,\omega) - X(t,\omega)\bigr|^2 \mathbf{1}_{(s+2^{-n}j,\; s+2^{-n}(j+1)]}(t)$$
$$= \int_0^T dt \sum_{j\in\mathbb{Z}} \int_0^1 ds\, \bigl|X(s + 2^{-n}j,\omega) - X(t,\omega)\bigr|^2 \mathbf{1}_{[t-2^{-n}(j+1),\; t-2^{-n}j)}(s).$$
For a fixed $t$, the $s$-integral vanishes unless
$$0 < t - 2^{-n}j \quad \text{and} \quad t - 2^{-n}(j+1) < 1,$$
which is equivalent to $2^n(t-1) - 1 < j < 2^n t$. For each fixed $t$ and $j$, change variables in the $s$-integral: let $h = t - s - 2^{-n}j$. Then $s \in [t - 2^{-n}(j+1),\; t - 2^{-n}j)$ iff $h \in (0, 2^{-n}]$. These steps turn the integral into
$$\int_0^T dt \sum_{j\in\mathbb{Z}} \mathbf{1}_{\{2^n(t-1)-1 < j < 2^n t\}} \int_0^{2^{-n}} dh\, \bigl|X(t-h,\omega) - X(t,\omega)\bigr|^2$$
$$\le (2^n + 1) \int_0^{2^{-n}} dh \int_0^T dt\, \bigl|X(t-h,\omega) - X(t,\omega)\bigr|^2.$$
The last upper bound follows because there are at most $2^n + 1$ values of $j$ that satisfy the restriction $2^n(t-1) - 1 < j < 2^n t$. Now take expectations through the inequalities. We get
$$E\int_0^T dt \int_0^1 ds\, \bigl|Z_{n,s}(t) - X(t)\bigr|^2 \le (2^n + 1)\int_0^{2^{-n}} dh\, \Bigl(E\int_0^T dt\, \bigl|X(t-h,\omega) - X(t,\omega)\bigr|^2\Bigr).$$
The last line vanishes as $n \to \infty$ for these reasons. First,
$$\lim_{h \to 0} \int_0^T dt\, \bigl|X(t-h,\omega) - X(t,\omega)\bigr|^2 = 0$$
for each fixed $\omega$ by virtue of $L^p$-continuity (Proposition A.18). Since $X$ is bounded, the expectations converge by dominated convergence:
$$\lim_{h \to 0} E\int_0^T dt\, \bigl|X(t-h,\omega) - X(t,\omega)\bigr|^2 = 0.$$
Last,
$$\lim_{n \to \infty} (2^n + 1)\int_0^{2^{-n}} dh\, \Bigl(E\int_0^T dt\, \bigl|X(t-h,\omega) - X(t,\omega)\bigr|^2\Bigr) = 0$$
follows from this general fact, which we leave as an exercise: if $f(x) \to 0$ as $x \to 0$, then
$$\frac{1}{\varepsilon}\int_0^\varepsilon f(x)\,dx \to 0 \quad \text{as } \varepsilon \searrow 0.$$
We have justified (4.7).
We can restate (4.7) by saying that the function
$$\phi_n(s) = E\int_0^T dt\, \bigl|Z_{n,s}(t) - X(t)\bigr|^2$$
satisfies $\phi_n \to 0$ in $L^1[0,1]$. Consequently a subsequence $\phi_{n_k}(s) \to 0$ for Lebesgue-almost every $s \in [0,1]$. Fix any such $s$. Define $Y_k^{(T)} = Z_{n_k,s}$, and we have (4.6).

To complete the proof, create the simple predictable processes $\{Y_k^{(m)}\}$ for all $T = m \in \mathbb{N}$. For each $m$, pick $k_m$ such that
$$E\int_0^m dt\, \bigl|Y_{k_m}^{(m)}(t) - X(t)\bigr|^2 < \frac{1}{m}.$$
Then $X_m = Y_{k_m}^{(m)}$ satisfies the requirement of the lemma. $\square$
Proposition 4.3. Suppose $X \in \mathcal{L}_2(B)$. Then there exists a sequence of simple predictable processes $\{X_n\}$ such that $\|X - X_n\|_{\mathcal{L}_2(B)} \to 0$.

Proof. Let $X^{(k)} = (X \wedge k) \vee (-k)$. Since $|X^{(k)} - X| \le |X|$ and $|X^{(k)} - X| \to 0$ pointwise on $\mathbb{R}_+ \times \Omega$,
$$\lim_{k \to \infty} E\int_0^m \bigl|X^{(k)}(t) - X(t)\bigr|^2\,dt = 0$$
for each $m \in \mathbb{N}$. This is equivalent to $\|X - X^{(k)}\|_{\mathcal{L}_2(B)} \to 0$. Given $\varepsilon > 0$, pick $k$ such that $\|X - X^{(k)}\|_{\mathcal{L}_2(B)} \le \varepsilon/2$. Since $X^{(k)}$ is a bounded process, the previous lemma gives a simple predictable process $Y$ such that $\|X^{(k)} - Y\|_{\mathcal{L}_2(B)} \le \varepsilon/2$. By the triangle inequality, $\|X - Y\|_{\mathcal{L}_2(B)} \le \varepsilon$. Repeat this argument for each $\varepsilon = 1/n$, and let $X_n$ be the $Y$ thus selected. This gives the approximating sequence $\{X_n\}$. $\square$
We are ready to proceed to the construction of the stochastic integral. There are three main steps.

(i) First an explicit formula is given for the integral $X \cdot B$ of a simple predictable process $X$. This integral is a continuous $L^2$-martingale.

(ii) A general process $X$ in $\mathcal{L}_2(B)$ is approximated by simple processes $X_n$. One shows that the integrals $X_n \cdot B$ of the approximating simple processes converge to a uniquely defined continuous $L^2$-martingale, which is then defined to be the stochastic integral $X \cdot B$.

(iii) A localization step is used to get from integrands in $\mathcal{L}_2(B)$ to integrands in $\mathcal{L}(B)$. The integral $X \cdot B$ is then a continuous local $L^2$-martingale.

The lemmas needed along the way for this development make valuable exercises. So we give only hints for the proofs, and urge the first-time reader to give them a try. These same properties are proved again with full detail in the next chapter when we develop the more general integral. The proofs for the Brownian case are very similar to those for the general case.
We begin with the integral of simple processes. For a simple predictable process of the type (4.5), the stochastic integral is the process $X \cdot B$ defined by
\[ (X \cdot B)_t(\omega) = \sum_{i=1}^{n-1} \xi_i(\omega) \bigl[ B_{t_{i+1} \wedge t}(\omega) - B_{t_i \wedge t}(\omega) \bigr]. \tag{4.8} \]
Note that our convention is such that the value of $X$ at $t = 0$ does not influence the integral. We also write $I(X) = X \cdot B$ when we need a symbol for the mapping $I : X \mapsto X \cdot B$.
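As a concrete illustration (not from the text), formula (4.8) can be evaluated on a path sampled on a grid. The simulation below is a minimal sketch; the function `simple_integral` and its arguments are our own names, and the partition points are assumed to lie on the sampling grid.

```python
import numpy as np

rng = np.random.default_rng(0)

# sample a Brownian path on a uniform grid (an assumed discretization, dt = 1/512)
dt, T = 1.0 / 512, 4.0
B = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), int(T / dt)))])

def simple_integral(xi, t_pts, B, dt, t):
    """Evaluate (X.B)_t of (4.8) for X = sum_i xi[i] 1_{(t_pts[i], t_pts[i+1]]}."""
    B_at = lambda s: B[min(int(round(s / dt)), len(B) - 1)]   # path value at grid time s
    return sum(xi[i] * (B_at(min(t_pts[i + 1], t)) - B_at(min(t_pts[i], t)))
               for i in range(len(xi)))

# X = 2 on (0,1] and -1 on (1,3]; one sample of the integral path at t = 2.5
print(simple_integral([2.0, -1.0], [0.0, 1.0, 3.0], B, dt, 2.5))
```

Because the sum telescopes only over the partition points, refining the sampling grid changes nothing except where the path is read off.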
Let $\mathcal{S}_2$ denote the space of simple predictable processes. It is a subspace of $\mathcal{L}_2(B)$. An element $X$ of $\mathcal{S}_2$ can be represented in the form (4.5) in many different ways. We need to check that the integral $X \cdot B$ depends only on the process $X$ and not on the particular representation. Also, we need to know that $\mathcal{S}_2$ is a linear space, and that the integral $I(X)$ is a linear map on $\mathcal{S}_2$.
Lemma 4.4. (a) Suppose the process $X$ in (4.5) also satisfies
\[ X_t(\omega) = \eta_0(\omega) 1_{\{0\}}(t) + \sum_{j=1}^{m-1} \eta_j(\omega) 1_{(s_j, s_{j+1}]}(t) \]
for all $(t, \omega)$, where $0 = s_0 = s_1 < s_2 < \cdots < s_m < \infty$ and $\eta_j$ is $\mathcal{F}_{s_j}$-measurable for $0 \le j \le m-1$. Then for each $(t, \omega)$,
\[ \sum_{i=1}^{n-1} \xi_i(\omega) \bigl[ B_{t_{i+1} \wedge t}(\omega) - B_{t_i \wedge t}(\omega) \bigr] = \sum_{j=1}^{m-1} \eta_j(\omega) \bigl[ B_{s_{j+1} \wedge t}(\omega) - B_{s_j \wedge t}(\omega) \bigr]. \]
In other words, $X \cdot B$ is independent of the representation.

(b) $\mathcal{S}_2$ is a linear space; in other words, for $X, Y \in \mathcal{S}_2$ and reals $\alpha$ and $\beta$, $\alpha X + \beta Y \in \mathcal{S}_2$. The integral satisfies
\[ (\alpha X + \beta Y) \cdot B = \alpha (X \cdot B) + \beta (Y \cdot B). \]

Hints for proof. Let $\{u_k\} = \{s_j\} \cup \{t_i\}$ be the common refinement of the partitions $\{s_j\}$ and $\{t_i\}$. Rewrite both representations of $X$ in terms of $\{u_k\}$. The same idea can be used for part (b) to write two arbitrary simple processes in terms of a common partition, which makes adding them easy.
Next we need some continuity properties for the integral. Recall the distance measure $\|\cdot\|_{\mathcal{M}_2}$ defined for continuous $L^2$-martingales by (3.19).

Lemma 4.5. Let $X \in \mathcal{S}_2$. Then $X \cdot B$ is a continuous square-integrable martingale. We have these isometries:
\[ E\bigl[ (X \cdot B)_t^2 \bigr] = E \int_0^t X_s^2 \, ds \quad \text{for all } t \ge 0, \tag{4.9} \]
and
\[ \|X \cdot B\|_{\mathcal{M}_2} = \|X\|_{\mathcal{L}_2(B)}. \tag{4.10} \]

Hints for proof. To show that $X \cdot B$ is a martingale, start by proving this statement: if $u < v$ and $\xi$ is a bounded $\mathcal{F}_u$-measurable random variable, then $Z_t = \xi (B_{t \wedge v} - B_{t \wedge u})$ is a martingale.

To prove (4.9), first square:
\[ (X \cdot B)_t^2 = \sum_{i=1}^{n-1} \xi_i^2 \bigl( B_{t \wedge t_{i+1}} - B_{t \wedge t_i} \bigr)^2 + 2 \sum_{i<j} \xi_i \xi_j (B_{t \wedge t_{i+1}} - B_{t \wedge t_i})(B_{t \wedge t_{j+1}} - B_{t \wedge t_j}). \]
Then compute the expectations of all terms.
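A Monte Carlo sanity check of the isometry (4.9) is easy to run for a one-term simple process. The sketch below is our own illustration: we choose $X = \xi\,1_{(1,2]}$ with $\xi = \operatorname{sign}(B_1)$, which is bounded and $\mathcal{F}_1$-measurable, so that both sides of (4.9) can be estimated directly.

```python
import numpy as np

rng = np.random.default_rng(1)
n_paths, t1, t2 = 200_000, 1.0, 2.0

B1 = rng.normal(0.0, np.sqrt(t1), n_paths)          # samples of B_{t1}
inc = rng.normal(0.0, np.sqrt(t2 - t1), n_paths)    # B_{t2} - B_{t1}, independent of F_{t1}
xi = np.sign(B1)                                    # bounded, F_{t1}-measurable

# for t >= t2:  (X.B)_t = xi (B_{t2} - B_{t1})  and  E int_0^t X_s^2 ds = t2 - t1
lhs = np.mean((xi * inc) ** 2)
rhs = t2 - t1
print(lhs, rhs)   # the two agree up to Monte Carlo error
```

The agreement rests on exactly the independence and martingale facts used in the hints above.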
From the isometry property we can deduce that simple process approximation gives approximation of stochastic integrals.

Lemma 4.6. Let $X \in \mathcal{L}_2(B)$. Then there is a unique continuous $L^2$-martingale $Y$ such that, for any sequence of simple predictable processes $X_n$ such that
\[ \|X - X_n\|_{\mathcal{L}_2(B)} \to 0, \]
we have
\[ \|Y - X_n \cdot B\|_{\mathcal{M}_2} \to 0. \]

Hints for proof. It all follows from these facts: an approximating sequence of simple predictable processes exists for each process in $\mathcal{L}_2(B)$, a convergent sequence in a metric space is a Cauchy sequence, a Cauchy sequence in a complete metric space converges, the space $\mathcal{M}_2^c$ of continuous $L^2$-martingales is complete, the isometry (4.10), and the triangle inequality.

Note that uniqueness of the process $Y$ defined in the lemma means uniqueness up to indistinguishability: any process $\widetilde{Y}$ indistinguishable from $Y$ also satisfies $\|\widetilde{Y} - X_n \cdot B\|_{\mathcal{M}_2} \to 0$.
Now we can state the definition of the integral of $\mathcal{L}_2(B)$-integrands with respect to Brownian motion.

Definition 4.7. Let $B$ be a Brownian motion on a probability space $(\Omega, \mathcal{F}, P)$ with respect to a filtration $\{\mathcal{F}_t\}$. For any measurable adapted process $X \in \mathcal{L}_2(B)$, the stochastic integral $I(X) = X \cdot B$ is the square-integrable continuous martingale that satisfies
\[ \lim_{n\to\infty} \|X \cdot B - X_n \cdot B\|_{\mathcal{M}_2} = 0 \]
for any sequence $X_n \in \mathcal{S}_2$ of simple predictable processes such that $\|X - X_n\|_{\mathcal{L}_2(B)} \to 0$. The process $I(X)$ is unique up to indistinguishability. Alternative notation for the stochastic integral is the familiar
\[ \int_0^t X_s \, dB_s = (X \cdot B)_t. \]
The reader familiar with more abstract principles of analysis should note that the extension of the stochastic integral $X \cdot B$ from $X \in \mathcal{S}_2$ to $X \in \mathcal{L}_2(B)$ is an instance of a general, classic argument. A uniformly continuous map from a metric space into a complete metric space can always be extended to the closure of its domain (Lemma A.3). If the spaces are linear, the linear operations are continuous, and the map is linear, then the extension is a linear map too (Lemma A.4). In this case the map is $X \mapsto X \cdot B$, first defined for $X \in \mathcal{S}_2$. Uniform continuity follows from linearity and (4.10). Proposition 4.3 implies that the closure of $\mathcal{S}_2$ in $\mathcal{L}_2(B)$ is all of $\mathcal{L}_2(B)$.

Some books first define the integral $(X \cdot B)_t$ at a fixed time $t$ as a map from $L^2([0,t] \times \Omega, m \otimes P)$ into $L^2(P)$, utilizing the completeness of $L^2$-spaces. Then one needs a separate argument to show that the integrals defined for different times $t$ can be combined into a continuous martingale $t \mapsto (X \cdot B)_t$. We defined the integral directly as a map into the space of martingales $\mathcal{M}_2^c$ to avoid the extra argument. Of course, we did not really save work. We just did part of the work earlier when we proved that $\mathcal{M}_2^c$ is a complete space (Lemma 3.40).
Example 4.8. In the definition (4.5) of the simple predictable process we required the $\xi_i$ bounded because this will be convenient later. For this section it would have been more convenient to allow square-integrable $\xi_i$. So let us derive the integral for that case. Let
\[ X(t) = \sum_{i=1}^{m-1} \xi_i 1_{(s_i, s_{i+1}]}(t) \]
where $0 \le s_1 < \cdots < s_m$ and each $\xi_i \in L^2(P)$ is $\mathcal{F}_{s_i}$-measurable. Check that a sequence of approximating simple processes is given by
\[ X_k(t) = \sum_{i=1}^{m-1} \xi_i^{(k)} 1_{(s_i, s_{i+1}]}(t) \]
with truncated variables $\xi_i^{(k)} = (\xi_i \wedge k) \vee (-k)$. And then that
\[ \int_0^t X(s) \, dB_s = \sum_{i=1}^{m-1} \xi_i (B_{t \wedge s_{i+1}} - B_{t \wedge s_i}). \]
There is something to check here because it is not immediately obvious that the terms on the right above are square-integrable. See Exercise 4.2.
Example 4.9. One can check that Brownian motion itself is an element of $\mathcal{L}_2(B)$. Let $t_i^n = i 2^{-n}$ and
\[ X_n(t) = \sum_{i=0}^{n 2^n - 1} B_{t_i^n} 1_{(t_i^n, t_{i+1}^n]}(t). \]
For any $T < n$,
\[ E \int_0^T |X_n(s) - B_s|^2 \, ds \le \sum_{i=0}^{n 2^n - 1} \int_{t_i^n}^{t_{i+1}^n} E\bigl[ (B_s - B_{t_i^n})^2 \bigr] \, ds = \sum_{i=0}^{n 2^n - 1} \tfrac{1}{2} (t_{i+1}^n - t_i^n)^2 = \tfrac{1}{2}\, n 2^{-n}. \]
Thus $X_n$ converges to $B$ in $\mathcal{L}_2(B)$ as $n \to \infty$. By Example 4.8,
\[ \int_0^t X_n(s) \, dB_s = \sum_{i=0}^{n 2^n - 1} B_{t_i^n} \bigl( B_{t \wedge t_{i+1}^n} - B_{t \wedge t_i^n} \bigr). \]
By the isometry (4.12) in the next Proposition, this integral converges to $\int_0^t B_s \, dB_s$ in $L^2$ as $n \to \infty$, so by Lemma 4.1,
\[ \int_0^t B_s \, dB_s = \tfrac{1}{2} B_t^2 - \tfrac{1}{2} t. \]
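This identification can be seen in a simulation through an exact summation-by-parts identity: for any path whatsoever, the left-endpoint Riemann sums satisfy $\sum_i B_{t_i}(B_{t_{i+1}} - B_{t_i}) = \tfrac12 B_T^2 - \tfrac12 \sum_i (B_{t_{i+1}} - B_{t_i})^2$, and for a fine partition the last sum is close to $T = [B]_T$. A sketch (variable names ours):

```python
import numpy as np

rng = np.random.default_rng(2)
n, T = 2 ** 14, 1.0
dB = rng.normal(0.0, np.sqrt(T / n), n)             # Brownian increments
B = np.concatenate([[0.0], np.cumsum(dB)])

riemann = np.sum(B[:-1] * dB)                        # left-endpoint sums of Example 4.9
identity = 0.5 * B[-1] ** 2 - 0.5 * np.sum(dB ** 2)  # exact for every path
assert abs(riemann - identity) < 1e-10

# since sum (dB_i)^2 is close to T, the Riemann sums are near B_T^2/2 - T/2
print(riemann, 0.5 * B[-1] ** 2 - 0.5 * T)
```

The deterministic identity isolates where the Itô correction $-\tfrac12 t$ comes from: it is exactly the accumulated squared increments.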
Before developing the integral further, we record some properties.

Proposition 4.10. Let $X, Y \in \mathcal{L}_2(B)$.

(a) Linearity carries over:
\[ (\alpha X + \beta Y) \cdot B = \alpha (X \cdot B) + \beta (Y \cdot B). \]

(b) We have again the isometries
\[ E\bigl[ (X \cdot B)_t^2 \bigr] = E \int_0^t X_s^2 \, ds \quad \text{for all } t \ge 0, \tag{4.11} \]
and
\[ \|X \cdot B\|_{\mathcal{M}_2} = \|X\|_{\mathcal{L}_2(B)}. \tag{4.12} \]
In particular, if $X, Y \in \mathcal{L}_2(B)$ are $m \otimes P$-equivalent in the sense (4.3), then $X \cdot B$ and $Y \cdot B$ are indistinguishable.

(c) Suppose $\tau$ is a stopping time such that $X(t, \omega) = Y(t, \omega)$ for $t \le \tau(\omega)$. Then for almost every $\omega$, $(X \cdot B)_t(\omega) = (Y \cdot B)_t(\omega)$ for $t \le \tau(\omega)$.

Hints for proof. Parts (a)–(b): These properties are inherited from the integrals of the approximating simple processes $X_n$. One needs to justify taking limits in Lemma 4.4(b) and Lemma 4.5.
The proof of part (c) is different from the one that is used in the next chapter. So we give here more details than in previous proofs.

By considering $Z = X - Y$, it suffices to prove that if $Z \in \mathcal{L}_2(B)$ satisfies $Z(t, \omega) = 0$ for $t \le \tau(\omega)$, then $(Z \cdot B)_t(\omega) = 0$ for $t \le \tau(\omega)$.

Assume first that $Z$ is bounded, so $|Z(t, \omega)| \le C$. Pick a sequence $\{Z_n\}$ of simple predictable processes that converge to $Z$ in $\mathcal{L}_2(B)$. Let $Z_n$ be of the generic type (recall (4.5))
\[ Z_n(t, \omega) = \sum_{i=1}^{m(n)-1} \xi_i^n(\omega) 1_{(t_i^n, t_{i+1}^n]}(t). \]
(To approximate a process in $\mathcal{L}_2(B)$ the $t = 0$ term in (4.5) is not needed because values at $t = 0$ do not affect $dt$-integrals.) We may assume $|\xi_i^n| \le C$ always, for if this is not the case, replacing $\xi_i^n$ by $(\xi_i^n \wedge C) \vee (-C)$ will only improve the approximation. Define another sequence of simple predictable processes by
\[ \widetilde{Z}_n(t) = \sum_{i=1}^{m(n)-1} \xi_i^n \, 1\{\tau \le t_i^n\} \, 1_{(t_i^n, t_{i+1}^n]}(t). \]
We claim that
\[ \widetilde{Z}_n \to Z \quad \text{in } \mathcal{L}_2(B). \tag{4.13} \]
To prove (4.13), note first that $Z_n 1\{\tau < t\} \to Z 1\{\tau < t\} = Z$ in $\mathcal{L}_2(B)$. So it suffices to show that
\[ Z_n 1\{\tau < t\} - \widetilde{Z}_n \to 0 \quad \text{in } \mathcal{L}_2(B). \tag{4.14} \]
Estimate
\[ \bigl| Z_n(t) 1\{\tau < t\} - \widetilde{Z}_n(t) \bigr| \le \sum_i |\xi_i^n| \, \bigl| 1\{\tau < t\} - 1\{\tau \le t_i^n\} \bigr| \, 1_{(t_i^n, t_{i+1}^n]}(t) \le C \sum_i 1\{t_i^n < \tau < t_{i+1}^n\} \, 1_{(t_i^n, t_{i+1}^n]}(t). \]
Integrate over $[0, T] \times \Omega$ to get
\[ E \int_0^T \bigl| Z_n(t) 1\{\tau < t\} - \widetilde{Z}_n(t) \bigr|^2 \, dt \le C^2 \sum_i P\{t_i^n < \tau < t_{i+1}^n\} \int_0^T 1_{(t_i^n, t_{i+1}^n]}(t) \, dt \le C^2 \max\bigl\{ T \wedge t_{i+1}^n - T \wedge t_i^n : 1 \le i \le m(n) - 1 \bigr\}. \]
We can artificially add partition points $t_i^n$ to each $Z_n$ so that this last quantity converges to 0 as $n \to \infty$, for each fixed $T$. This verifies (4.14), and thereby (4.13).

The integral of $\widetilde{Z}_n$ is given explicitly by
\[ (\widetilde{Z}_n \cdot B)_t = \sum_{i=1}^{m(n)-1} \xi_i^n \, 1\{\tau \le t_i^n\} \bigl( B_{t \wedge t_{i+1}^n} - B_{t \wedge t_i^n} \bigr). \]
By inspecting each term, we see that $(\widetilde{Z}_n \cdot B)_t = 0$ if $t \le \tau$. By the definition of the integral and (4.13), $\widetilde{Z}_n \cdot B \to Z \cdot B$ in $\mathcal{M}_2^c$. Then by Lemma 3.41 there exists a subsequence $\widetilde{Z}_{n_k} \cdot B$ and an event $\Omega_0$ of full probability such that, for each $\omega \in \Omega_0$ and $T < \infty$,
\[ (\widetilde{Z}_{n_k} \cdot B)_t(\omega) \to (Z \cdot B)_t(\omega) \quad \text{uniformly for } 0 \le t \le T. \]
For any $\omega \in \Omega_0$, in the limit along this subsequence $(Z \cdot B)_t(\omega) = 0$ for $t \le \tau(\omega)$. Part (c) has been proved for a bounded process.

To complete the proof, given $Z \in \mathcal{L}_2(B)$, let $Z^{(k)}(t, \omega) = (Z(t, \omega) \wedge k) \vee (-k)$, a bounded process in $\mathcal{L}_2(B)$ with the same property $Z^{(k)}(t, \omega) = 0$ if $t \le \tau(\omega)$. Apply the previous step to $Z^{(k)}$ and justify what happens in the limit. $\square$
Next we extend the integral to integrands in $\mathcal{L}(B)$. Given a process $X \in \mathcal{L}(B)$, define the stopping times
\[ \tau_n(\omega) = \inf\Bigl\{ t \ge 0 : \int_0^t X(s, \omega)^2 \, ds \ge n \Bigr\}. \tag{4.15} \]
These are stopping times by Corollary 2.10 because the function
\[ t \mapsto \int_0^t X(s, \omega)^2 \, ds \]
is continuous for each $\omega$ in the event in the definition (4.4). By this same continuity, if $\tau_n(\omega) < \infty$,
\[ \int_0^\infty X(s, \omega)^2 \, 1\{s \le \tau_n(\omega)\} \, ds = \int_0^{\tau_n(\omega)} X(s, \omega)^2 \, ds = n. \]
Let $X_n(t, \omega) = X(t, \omega) 1\{t \le \tau_n(\omega)\}$. Adaptedness of $X_n$ follows from $\{t \le \tau_n\} = \{\tau_n < t\}^c \in \mathcal{F}_t$. The function $(t, \omega) \mapsto 1\{t \le \tau_n(\omega)\}$ is $\mathcal{B}_{\mathbb{R}_+} \otimes \mathcal{F}$-measurable by Exercise 4.1, hence $X_n$ is a measurable process. Together these properties say that $X_n \in \mathcal{L}_2(B)$, and the stochastic integrals $X_n \cdot B$ are well-defined.
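On a discrete grid the stopping times (4.15) can be computed with a cumulative sum. The sketch below uses our own names and a left-endpoint approximation of the integral; it is an illustration, not part of the text's development.

```python
import numpy as np

def tau_n(X_sq, dt, n):
    """Approximate tau_n = inf{t >= 0 : int_0^t X(s)^2 ds >= n} on a grid.

    X_sq[k] is X(k*dt)^2; returns np.inf when the level n is never reached."""
    acc = np.cumsum(X_sq) * dt                  # left-endpoint approximation of the integral
    hit = np.nonzero(acc >= n)[0]
    return np.inf if len(hit) == 0 else (hit[0] + 1) * dt

# deterministic check with X(s) = s:  int_0^t s^2 ds = t^3/3,  so tau_9 = 27^{1/3} = 3
dt = 1e-4
s = np.arange(0.0, 10.0, dt)
print(tau_n(s ** 2, dt, 9.0))   # ~ 3.0
```

The `inf` return value mirrors the convention $\inf \emptyset = \infty$ used for stopping times.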
The goal is to show that there is a uniquely defined limit of the processes $X_n \cdot B$ as $n \to \infty$, and this will then serve as the definition of $X \cdot B$.

Lemma 4.11. For almost every $\omega$, $(X_m \cdot B)_t(\omega) = (X_n \cdot B)_t(\omega)$ for all $t \le \tau_m(\omega) \wedge \tau_n(\omega)$.

Proof. Immediate from Proposition 4.10(c). $\square$

The lemma says that, for a given $(t, \omega)$, once $n$ is large enough so that $\tau_n(\omega) \ge t$, the value $(X_n \cdot B)_t(\omega)$ does not change with $n$. The definition (4.4) guarantees that $\tau_n(\omega) \nearrow \infty$ for almost every $\omega$. These ingredients almost justify the next extension of the stochastic integral to $\mathcal{L}(B)$.
Definition 4.12. Let $B$ be a Brownian motion on a probability space $(\Omega, \mathcal{F}, P)$ with respect to a filtration $\{\mathcal{F}_t\}$, and $X \in \mathcal{L}(B)$. Let $\Omega_0$ be the event of full probability on which $\tau_n \nearrow \infty$ and the conclusion of Lemma 4.11 holds for all pairs $m, n$. The stochastic integral $X \cdot B$ is defined for $\omega \in \Omega_0$ by
\[ (X \cdot B)_t(\omega) = (X_n \cdot B)_t(\omega) \quad \text{for any } n \text{ such that } \tau_n(\omega) \ge t. \tag{4.16} \]
For $\omega \notin \Omega_0$ define $(X \cdot B)_t(\omega) \equiv 0$. The process $X \cdot B$ is a continuous local $L^2$-martingale.

To justify the claim that $X \cdot B$ is a local $L^2$-martingale, just note that $\{\tau_n\}$ serves as a localizing sequence:
\[ (X \cdot B)^{\tau_n}_t = (X \cdot B)_{t \wedge \tau_n} = (X_n \cdot B)_{t \wedge \tau_n} = (X_n \cdot B)^{\tau_n}_t, \]
so $(X \cdot B)^{\tau_n} = (X_n \cdot B)^{\tau_n}$, which is an $L^2$-martingale by Corollary 3.7. The above equality also implies that $(X \cdot B)_t(\omega)$ is continuous for $t \in [0, \tau_n(\omega)]$, which contains any given interval $[0, T]$ if $n$ is taken large enough.
It seems somewhat arbitrary to base the definition of the stochastic integral on the particular stopping times $\tau_n$. The property that enabled us to define $X \cdot B$ by (4.16) was that $X(t) 1\{t \le \tau_n\}$ is a process in the space $\mathcal{L}_2(B)$ for all $n$. Let us make this into a new definition.

Definition 4.13. Let $X$ be an adapted, measurable process. A nondecreasing sequence of stopping times $\{\sigma_n\}$ is a localizing sequence for $X$ if $X(t) 1\{t \le \sigma_n\}$ is in $\mathcal{L}_2(B)$ for all $n$, and $\sigma_n \nearrow \infty$ with probability one.
One can check that $X \in \mathcal{L}(B)$ if and only if $X$ has a localizing sequence $\{\sigma_n\}$. Lemma 4.11 and Definition 4.12 work equally well with $\{\tau_n\}$ replaced by an arbitrary localizing sequence $\{\sigma_n\}$. Fix such a sequence $\{\sigma_n\}$ and define $\widetilde{X}_n(t) = 1\{t \le \sigma_n\} X(t)$. Let $\Omega_1$ be the event of full probability on which $\sigma_n \nearrow \infty$ and for all pairs $m, n$, $(\widetilde{X}_m \cdot B)_t = (\widetilde{X}_n \cdot B)_t$ for $t \le \sigma_m \wedge \sigma_n$. (In other words, the conclusion of Lemma 4.11 holds for $\{\sigma_n\}$.) Let $Y$ be the process defined by
\[ Y_t(\omega) = (\widetilde{X}_n \cdot B)_t(\omega) \quad \text{for any } n \text{ such that } \sigma_n(\omega) \ge t, \tag{4.17} \]
for $\omega \in \Omega_1$, and identically zero outside $\Omega_1$.
Lemma 4.14. $Y = X \cdot B$ in the sense of indistinguishability.

Proof. Let $\Omega_2$ be the intersection of the full-probability events $\Omega_0$ and $\Omega_1$ defined previously above (4.16) and (4.17). By applying Proposition 4.10(c) to the stopping time $\sigma_n \wedge \tau_n$ and the processes $X_n$ and $\widetilde{X}_n$, we conclude that for almost every $\omega \in \Omega_2$, if $t \le \sigma_n(\omega) \wedge \tau_n(\omega)$,
\[ Y_t(\omega) = (\widetilde{X}_n \cdot B)_t(\omega) = (X_n \cdot B)_t(\omega) = (X \cdot B)_t(\omega). \]
Since $\sigma_n(\omega) \wedge \tau_n(\omega) \nearrow \infty$, the above equality holds almost surely for all $0 \le t < \infty$. $\square$

This lemma tells us that for $X \in \mathcal{L}(B)$ the stochastic integral $X \cdot B$ can be defined in terms of any localizing sequence of stopping times.
Exercises

Exercise 4.1. Show that for any $[0, \infty]$-valued measurable function $Y$ on $(\Omega, \mathcal{F})$, the set $\{(s, \omega) \in \mathbb{R}_+ \times \Omega : Y(\omega) > s\}$ is $\mathcal{B}_{\mathbb{R}_+} \otimes \mathcal{F}$-measurable.

Hint. Start with a simple $Y$. Show that if $Y_n \nearrow Y$ pointwise, then
\[ \{(s, \omega) : Y(\omega) > s\} = \bigcup_n \{(s, \omega) : Y_n(\omega) > s\}. \]
Exercise 4.2. Suppose $\xi \in L^2(P)$ is $\mathcal{F}_s$-measurable and $t > s$. Show that
\[ E\bigl[ \xi^2 (B_t - B_s)^2 \bigr] = E\bigl[ \xi^2 \bigr] \, E\bigl[ (B_t - B_s)^2 \bigr] \]
by truncating $\xi$ and using monotone convergence. In particular, this implies that $\xi (B_t - B_s) \in L^2(P)$.

Complete the details in Example 4.8. You need to show first that $X_k \to X$ in $\mathcal{L}_2(B)$, and then that
\[ \int_0^t X_k(s) \, dB_s \to \sum_{i=1}^{m-1} \xi_i (B_{t \wedge s_{i+1}} - B_{t \wedge s_i}) \quad \text{in } \mathcal{M}_2^c. \]
Exercise 4.3. Show that $B_t^2$ is a process in $\mathcal{L}_2(B)$ and evaluate
\[ \int_0^t B_s^2 \, dB_s. \]
Hint. Follow the example of $\int_0^t B_s \, dB_s$. Answer: $\tfrac{1}{3} B_t^3 - \int_0^t B_s \, ds$.
Exercise 4.4. Let $X$ be an adapted, measurable process. Show that $X \in \mathcal{L}(B)$ if and only if $X$ has a localizing sequence $\{\sigma_n\}$ in the sense of Definition 4.13.
Exercise 4.5. (Integral of a step function in $\mathcal{L}(B)$.) Fix $0 = t_0 < t_1 < \cdots < t_M < \infty$, and random variables $\eta_0, \ldots, \eta_{M-1}$. Assume that $\eta_i$ is almost surely finite and $\mathcal{F}_{t_i}$-measurable, but make no integrability assumption. Define
\[ g(s, \omega) = \sum_{i=0}^{M-1} \eta_i(\omega) 1_{(t_i, t_{i+1}]}(s). \]
The task is to show that $g \in \mathcal{L}(B)$ (virtually immediate) and that
\[ \int_0^t g(s) \, dB_s = \sum_{i=0}^{M-1} \eta_i (B_{t_{i+1} \wedge t} - B_{t_i \wedge t}), \]
as one would expect.

Hints. Show that $\sigma_n(\omega) = \inf\{t : |g(t, \omega)| \ge n\}$ defines a localizing sequence of stopping times. (Recall the convention $\inf \emptyset = \infty$.) Show that $g_n(t, \omega) = g(t, \omega) 1\{t \le \sigma_n(\omega)\}$ is also a simple predictable process with the same partition. Then we know what the approximating integrals
\[ Y_n(t, \omega) = \int_0^t g_n(s, \omega) \, dB_s(\omega) \]
look like.
Chapter 5

Stochastic Integration of Predictable Processes

The main goal of this chapter is the definition of the stochastic integral $\int_{(0,t]} X(s) \, dY(s)$ where the integrator $Y$ is a cadlag semimartingale and $X$ is a locally bounded predictable process. The most important special case is the one where the integrand is of the form $X(t-)$ for some cadlag process $X$. In this case the stochastic integral $\int_{(0,t]} X(s-) \, dY(s)$ can be realized as the limit of Riemann sums
\[ S_n(t) = \sum_i X(s_i) \bigl[ Y(s_{i+1} \wedge t) - Y(s_i \wedge t) \bigr] \]
when the mesh of the partition $\{s_i\}$ tends to zero. The convergence is then uniform on compact time intervals, and happens in probability. Random partitions of stopping times can also be used.

These results will be reached in Section 5.3. Before the semimartingale integral we explain predictable processes and construct the integral with respect to $L^2$-martingales and local $L^2$-martingales. Right-continuity of the filtration $\{\mathcal{F}_t\}$ is not needed until we define the integral with respect to a semimartingale. And even there it is needed only for guaranteeing that the semimartingale has a decomposition whose local martingale part is a local $L^2$-martingale. Right-continuity of $\{\mathcal{F}_t\}$ is not needed for the arguments that establish the integral.
5.1. Square-integrable martingale integrator

Throughout this section, we consider a fixed probability space $(\Omega, \mathcal{F}, P)$ with a filtration $\{\mathcal{F}_t\}$, and $M$ is a square-integrable cadlag martingale relative to the filtration $\{\mathcal{F}_t\}$. We assume that the probability space and the filtration are complete. In other words, if $N \in \mathcal{F}$ has $P(N) = 0$, then every subset $F \subseteq N$ is a member of $\mathcal{F}$ and each $\mathcal{F}_t$. Right-continuity of the filtration $\{\mathcal{F}_t\}$ is not assumed unless specifically stated.
5.1.1. Predictable processes. Predictable rectangles are subsets of $\mathbb{R}_+ \times \Omega$ of the type $(s, t] \times F$ where $0 \le s < t < \infty$ and $F \in \mathcal{F}_s$, or of the type $\{0\} \times F_0$ where $F_0 \in \mathcal{F}_0$. $\mathcal{R}$ stands for the collection of all predictable rectangles. We regard the empty set also as a predictable rectangle, since it can be represented as $(s, t] \times \emptyset$. The $\sigma$-field generated by $\mathcal{R}$ in the space $\mathbb{R}_+ \times \Omega$ is denoted by $\mathcal{P}$ and called the predictable $\sigma$-field. $\mathcal{P}$ is a sub-$\sigma$-field of $\mathcal{B}_{\mathbb{R}_+} \otimes \mathcal{F}$ because $\mathcal{R} \subseteq \mathcal{B}_{\mathbb{R}_+} \otimes \mathcal{F}$. Any $\mathcal{P}$-measurable function $X$ from $\mathbb{R}_+ \times \Omega$ into $\mathbb{R}$ is called a predictable process.

A predictable process is not only adapted to the original filtration $\{\mathcal{F}_t\}$ but also to the potentially smaller filtration $\{\mathcal{F}_{t-}\}$ defined in (2.5) [Exercise 5.1]. This gives some mathematical sense to the term predictable, because it means that $X_t$ is knowable from the information immediately prior to $t$, represented by $\mathcal{F}_{t-}$.

Predictable processes will be the integrands for the stochastic integral. Before proceeding, let us develop additional characterizations of the $\sigma$-field $\mathcal{P}$.
Lemma 5.1. The following $\sigma$-fields on $\mathbb{R}_+ \times \Omega$ are all equal to $\mathcal{P}$.

(a) The $\sigma$-field generated by all continuous adapted processes.

(b) The $\sigma$-field generated by all left-continuous adapted processes.

(c) The $\sigma$-field generated by all adapted caglad processes (that is, left-continuous processes with right limits).

Proof. Continuous processes and caglad processes are left-continuous. Thus to show that $\sigma$-fields (a)–(c) are contained in $\mathcal{P}$, it suffices to show that all left-continuous adapted processes are $\mathcal{P}$-measurable.

Let $X$ be a left-continuous, adapted process. Let
\[ X_n(t, \omega) = X_0(\omega) 1_{\{0\}}(t) + \sum_{i=0}^{\infty} X_{i 2^{-n}}(\omega) 1_{(i 2^{-n}, (i+1) 2^{-n}]}(t). \]
Then for $B \in \mathcal{B}_{\mathbb{R}}$,
\[ \{(t, \omega) : X_n(t, \omega) \in B\} = \bigl( \{0\} \times \{X_0 \in B\} \bigr) \cup \bigcup_{i=0}^{\infty} \Bigl( (i 2^{-n}, (i+1) 2^{-n}] \times \{X_{i 2^{-n}} \in B\} \Bigr), \]
which is an event in $\mathcal{P}$, being a countable union of predictable rectangles. Thus $X_n$ is $\mathcal{P}$-measurable. By left-continuity of $X$, $X_n(t, \omega) \to X(t, \omega)$ as $n \to \infty$ for each fixed $(t, \omega)$. Since pointwise limits preserve measurability, $X$ is also $\mathcal{P}$-measurable.

We have shown that $\mathcal{P}$ contains $\sigma$-fields (a)–(c).

The indicator of a predictable rectangle is itself an adapted caglad process, and by definition this subclass of caglad processes generates $\mathcal{P}$. Thus $\sigma$-field (c) contains $\mathcal{P}$. By the same reasoning, also $\sigma$-field (b) contains $\mathcal{P}$.

It remains to show that $\sigma$-field (a) contains $\mathcal{P}$. We show that all predictable rectangles lie in $\sigma$-field (a) by showing that their indicator functions are pointwise limits of continuous adapted processes.

If $X = 1_{\{0\} \times F_0}$ for $F_0 \in \mathcal{F}_0$, let
\[ g_n(t) = \begin{cases} 1 - nt, & 0 \le t < 1/n \\ 0, & t \ge 1/n, \end{cases} \]
and then define $X_n(t, \omega) = 1_{F_0}(\omega) g_n(t)$. $X_n$ is clearly continuous. For a fixed $t$, writing
\[ X_n(t) = \begin{cases} g_n(t) 1_{F_0}, & 0 \le t < 1/n \\ 0, & t \ge 1/n, \end{cases} \]
and noting that $F_0 \in \mathcal{F}_t$ for all $t \ge 0$, shows that $X_n$ is adapted. Since $X_n(t, \omega) \to X(t, \omega)$ as $n \to \infty$ for each fixed $(t, \omega)$, $\{0\} \times F_0$ lies in $\sigma$-field (a).

If $X = 1_{(u,v] \times F}$ for $F \in \mathcal{F}_u$, let
\[ h_n(t) = \begin{cases} n(t - u), & u \le t < u + 1/n \\ 1, & u + 1/n \le t < v \\ 1 - n(t - v), & v \le t \le v + 1/n \\ 0, & t < u \text{ or } t > v + 1/n. \end{cases} \]
Consider only $n$ large enough so that $1/n < v - u$. Define $X_n(t, \omega) = 1_F(\omega) h_n(t)$, and adapt the previous argument. We leave the missing details as Exercise 5.3. $\square$
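The dyadic left-endpoint discretization used in the proof is easy to experiment with. The sketch below (function names ours) evaluates the approximation for a deterministic left-continuous $X$ and shows the pointwise convergence:

```python
def X_n(X, t, n):
    """Left-endpoint dyadic approximation from the proof of Lemma 5.1:
    X_n(t) = X(i 2^-n) for t in (i 2^-n, (i+1) 2^-n], and X_n(0) = X(0)."""
    if t == 0.0:
        return X(0.0)
    h = 2.0 ** (-n)
    i = int((t - 1e-15) // h)      # the index with i*h < t <= (i+1)*h (up to rounding)
    return X(i * h)

# for left-continuous X the values converge pointwise; here X(t) = t
for n in (4, 8, 12):
    print(n, abs(X_n(lambda s: s, 0.7, n) - 0.7))   # error at most 2^-n
```

The half-open intervals matter: the approximation reads the path strictly from the left of $t$, which is exactly why left continuity (and not right continuity) yields convergence.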
The previous lemma tells us that all continuous adapted processes, all left-continuous adapted processes, and any process that is a pointwise limit [at each $(t, \omega)$] of a sequence of such processes, is predictable. It is important to note that left and right continuity are not treated equally in this theory. The difference arises from the adaptedness requirement. Not all right-continuous processes are predictable. However, an arbitrary deterministic process, one that does not depend on $\omega$, is predictable. (See Exercises 5.2 and 5.4.)
Given a square-integrable cadlag martingale $M$, we define its Doléans measure $\mu_M$ on the predictable $\sigma$-field $\mathcal{P}$ by
\[ \mu_M(A) = E \int_{[0,\infty)} 1_A(t, \omega) \, d[M]_t(\omega), \quad A \in \mathcal{P}. \tag{5.1} \]
The meaning of formula (5.1) is that first, for each fixed $\omega$, the function $t \mapsto 1_A(t, \omega)$ is integrated by the Lebesgue-Stieltjes measure $\Lambda_{[M](\omega)}$ of the nondecreasing right-continuous function $t \mapsto [M]_t(\omega)$. The resulting integral is a measurable function of $\omega$, which is then averaged over the probability space $(\Omega, \mathcal{F}, P)$ (Exercise 3.9). Recall that our convention for the measure $\Lambda_{[M](\omega)}$ of the origin is $\Lambda_{[M](\omega)}\{0\} = [M]_0(\omega) - [M]_{0-}(\omega) = 0 - 0 = 0$. Consequently integrals over $(0, \infty)$ and $[0, \infty)$ coincide in (5.1).

Formula (5.1) would make sense for any $A \in \mathcal{B}_{\mathbb{R}_+} \otimes \mathcal{F}$. But we shall see that when we want to extend $\mu_M$ beyond $\mathcal{P}$, formula (5.1) does not always provide the right extension. Since
\[ \mu_M([0, T] \times \Omega) = E\bigl( [M]_T \bigr) = E\bigl( M_T^2 - M_0^2 \bigr) < \infty \tag{5.2} \]
for all $T < \infty$, the measure $\mu_M$ is $\sigma$-finite.
Example 5.2 (Brownian motion). If $M = B$, standard Brownian motion, we saw in Proposition 2.42 that $[B]_t = t$. Then
\[ \mu_B(A) = E \int_{[0,\infty)} 1_A(t, \omega) \, dt = m \otimes P(A), \]
where $m$ denotes Lebesgue measure on $\mathbb{R}_+$. So the Doléans measure of Brownian motion is $m \otimes P$, the product of Lebesgue measure on $\mathbb{R}_+$ and the probability measure $P$ on $\Omega$.
Example 5.3 (Compensated Poisson process). Let $N$ be a homogeneous rate $\alpha$ Poisson process on $\mathbb{R}_+$ with respect to the filtration $\{\mathcal{F}_t\}$. Let $M_t = N_t - \alpha t$. We claim that the Doléans measure of $M$ is $\alpha \, m \otimes P$, where as above $m$ is Lebesgue measure on $\mathbb{R}_+$. We have seen that $[M] = N$ (Example 3.25). For a predictable rectangle $A = (s, t] \times F$ with $F \in \mathcal{F}_s$,
\[ \mu_M(A) = E \int_{[0,\infty)} 1_A(u, \omega) \, d[M]_u(\omega) = E \int_{[0,\infty)} 1_F(\omega) 1_{(s,t]}(u) \, dN_u(\omega) = E\bigl[ 1_F (N_t - N_s) \bigr] = E\bigl[ 1_F \bigr] \, E\bigl[ N_t - N_s \bigr] = P(F) \, \alpha (t - s) = \alpha \, m \otimes P(A). \]
A crucial step above used the independence of $N_t - N_s$ and $\mathcal{F}_s$, which is part of the definition of a Poisson process. Both measures $\mu_M$ and $\alpha \, m \otimes P$ give zero measure to sets of the type $\{0\} \times F_0$. We have shown that $\mu_M$ and $\alpha \, m \otimes P$ agree on the class $\mathcal{R}$ of predictable rectangles. By Lemma B.3 they then agree on $\mathcal{P}$. For the application of Lemma B.3, note that the space $\mathbb{R}_+ \times \Omega$ can be written as a countable disjoint union of predictable rectangles:
\[ \mathbb{R}_+ \times \Omega = \bigl( \{0\} \times \Omega \bigr) \cup \bigcup_{n \ge 0} \bigl( (n, n+1] \times \Omega \bigr). \]
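The rectangle computation above can be checked by simulation. The sketch below (names and the choice of event $F$ are ours) estimates both sides of $\mu_M((s,t]\times F) = \alpha\,P(F)(t-s)$ for a rate-$\alpha$ Poisson process, relying on the independence of $N_t - N_s$ from $\mathcal{F}_s$:

```python
import numpy as np

rng = np.random.default_rng(3)
n_paths, alpha, s, t = 200_000, 1.0, 1.0, 2.0

Ns = rng.poisson(alpha * s, n_paths)          # samples of N_s
dN = rng.poisson(alpha * (t - s), n_paths)    # N_t - N_s, independent of F_s

F = Ns >= 1                                   # an event in F_s
lhs = np.mean(F * dN)                         # estimate of E[1_F (N_t - N_s)]
rhs = np.mean(F) * alpha * (t - s)            # estimate of alpha P(F)(t - s)
print(lhs, rhs)   # agree up to Monte Carlo error
```

Replacing `F` by an event that peeks into the future, say `dN >= 1`, would break the agreement, which is the content of the "crucial step" above.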
For predictable processes $X$, we define the $L^2$ norm over the set $[0, T] \times \Omega$ under the measure $\mu_M$ by
\[ \|X\|_{\mu_M, T} = \biggl( \int_{[0,T] \times \Omega} |X|^2 \, d\mu_M \biggr)^{1/2} = \biggl( E \int_{[0,T]} |X(t, \omega)|^2 \, d[M]_t(\omega) \biggr)^{1/2}. \tag{5.3} \]
Let $\mathcal{L}_2 = \mathcal{L}_2(M, \mathcal{P})$ denote the collection of all predictable processes $X$ such that $\|X\|_{\mu_M, T} < \infty$ for all $T < \infty$. A metric on $\mathcal{L}_2$ is defined by $d_{\mathcal{L}_2}(X, Y) = \|X - Y\|_{\mathcal{L}_2}$ where
\[ \|X\|_{\mathcal{L}_2} = \sum_{k=1}^{\infty} 2^{-k} \bigl( 1 \wedge \|X\|_{\mu_M, k} \bigr). \tag{5.4} \]
$\mathcal{L}_2$ is not an $L^2$ space, but instead a local $L^2$ space of sorts. The discussion following definition (3.19) of the metric on martingales can be repeated here with obvious changes. In particular, to satisfy the requirement that $d_{\mathcal{L}_2}(X, Y) = 0$ iff $X = Y$, we have to regard two processes $X$ and $Y$ in $\mathcal{L}_2$ as equal if
\[ \mu_M\{(t, \omega) : X(t, \omega) \ne Y(t, \omega)\} = 0. \tag{5.5} \]
Let us say processes $X$ and $Y$ are $\mu_M$-equivalent if (5.5) holds.
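For $M = B$ and a deterministic integrand, $\|X\|_{\mu_B,k}^2 = \int_0^k X(t)^2\,dt$ (see Example 5.2), so the metric (5.4) can be computed numerically. A sketch with our own truncation of the series and a midpoint rule for the time integral:

```python
import numpy as np

def metric_norm(X, K=60, pts=20_000):
    """||X||_{L_2} = sum_{k>=1} 2^-k (1 ^ ||X||_{mu_M,k}) of (5.4), truncated at K,
    sketched for M = Brownian motion and deterministic X."""
    total = 0.0
    for k in range(1, K + 1):
        h = k / pts
        t = (np.arange(pts) + 0.5) * h            # midpoint grid on [0, k]
        norm_k = np.sqrt(np.sum(X(t) ** 2) * h)   # ||X||_{mu_B,k}
        total += 2.0 ** (-k) * min(1.0, norm_k)
    return total

# X(t) = e^{-t}: every truncated norm is below 1, and the metric is finite and positive
print(metric_norm(lambda t: np.exp(-t)))   # ~ 0.68
```

The truncation at $1$ in each term is what makes the series converge for every $X \in \mathcal{L}_2$, even when the norms $\|X\|_{\mu_M,k}$ grow with $k$.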
Example 5.4. For both Brownian motion and the compensated Poisson process, the form of $\mu_M$ tells us that a predictable process $X$ lies in $\mathcal{L}_2$ if and only if
\[ E \int_0^T X(s, \omega)^2 \, ds < \infty \quad \text{for all } T < \infty. \]
For Brownian motion this is the same integrability requirement as imposed for the space $\mathcal{L}_2(B)$ in Chapter 4, except that now we are restricted to predictable integrands while in Chapter 4 we were able to integrate more general measurable, adapted processes.

Example 5.5. Suppose $X$ is predictable and bounded on bounded time intervals; in other words, there exist constants $C_T < \infty$ such that, for almost every $\omega$ and all $T < \infty$, $|X_t(\omega)| \le C_T$ for $0 \le t \le T$. Then $X \in \mathcal{L}_2(M, \mathcal{P})$ because
\[ E \int_{[0,T]} X(s)^2 \, d[M]_s \le C_T^2 \, E\bigl( [M]_T \bigr) = C_T^2 \, E\bigl( M_T^2 - M_0^2 \bigr) < \infty. \]
5.1.2. Construction of the stochastic integral. In this section we define the stochastic integral process $(X \cdot M)_t = \int_{(0,t]} X \, dM$ for integrands $X \in \mathcal{L}_2$. There are two steps: first an explicit definition of integrals for a class of processes with a particularly simple structure, and then an approximation step that defines the integral for a general $X \in \mathcal{L}_2$.

A simple predictable process is a process of the form
\[ X_t(\omega) = \xi_0(\omega) 1_{\{0\}}(t) + \sum_{i=1}^{n-1} \xi_i(\omega) 1_{(t_i, t_{i+1}]}(t) \tag{5.6} \]
where $n$ is a finite integer, $0 = t_0 = t_1 < t_2 < \cdots < t_n$ are time points, and $\xi_i$ is a bounded $\mathcal{F}_{t_i}$-measurable random variable on $(\Omega, \mathcal{F}, P)$ for $0 \le i \le n-1$. We set $t_1 = t_0 = 0$ for convenience, so the formula for $X$ covers the interval $[0, t_n]$ without leaving a gap at the origin.
Lemma 5.6. A process of type (5.6) is predictable.

Proof. Immediate from Lemma 5.1 and the left-continuity of $X$.

Alternatively, here is an elementary argument that shows that $X$ is $\mathcal{P}$-measurable. For each $\xi_i$ we can find $\mathcal{F}_{t_i}$-measurable simple functions
\[ \xi_i^N = \sum_{j=1}^{m(i,N)} \alpha_j^{i,N} 1_{F_j^{i,N}} \]
such that $\xi_i^N(\omega) \to \xi_i(\omega)$ as $N \to \infty$. Here $\alpha_j^{i,N}$ are constants and $F_j^{i,N} \in \mathcal{F}_{t_i}$. Adding these up, we have that
\[ X_t(\omega) = \lim_{N\to\infty} \Bigl[ \xi_0^N(\omega) 1_{\{0\}}(t) + \sum_{i=1}^{n-1} \xi_i^N(\omega) 1_{(t_i, t_{i+1}]}(t) \Bigr] = \lim_{N\to\infty} \Bigl[ \sum_{j=1}^{m(0,N)} \alpha_j^{0,N} 1_{\{0\} \times F_j^{0,N}}(t, \omega) + \sum_{i=1}^{n-1} \sum_{j=1}^{m(i,N)} \alpha_j^{i,N} 1_{(t_i, t_{i+1}] \times F_j^{i,N}}(t, \omega) \Bigr]. \]
The last function is clearly $\mathcal{P}$-measurable, being a linear combination of indicator functions of predictable rectangles. Consequently $X$ is $\mathcal{P}$-measurable as a pointwise limit of $\mathcal{P}$-measurable functions. $\square$
Definition 5.7. For a simple predictable process of the type (5.6), the stochastic integral is the process $X \cdot M$ defined by
\[ (X \cdot M)_t(\omega) = \sum_{i=1}^{n-1} \xi_i(\omega) \bigl[ M_{t_{i+1} \wedge t}(\omega) - M_{t_i \wedge t}(\omega) \bigr]. \tag{5.7} \]
Note that our convention is such that the value of $X$ at $t = 0$ does not influence the integral. We also write $I(X) = X \cdot M$ when we need a symbol for the mapping $I : X \mapsto X \cdot M$.
Let $\mathcal{S}_2$ denote the subspace of $\mathcal{L}_2$ consisting of simple predictable processes. Any particular element $X$ of $\mathcal{S}_2$ can be represented in the form (5.6) with many different choices of random variables and time intervals. The first thing to check is that the integral $X \cdot M$ depends only on the process $X$ and not on the particular representation (5.6) used. Also, let us check that the space $\mathcal{S}_2$ is a linear space and the integral behaves linearly, since these properties are not immediately clear from the definitions.
Lemma 5.8. (a) Suppose the process $X$ in (5.6) also satisfies
\[ X_t(\omega) = \eta_0(\omega) 1_{\{0\}}(t) + \sum_{j=1}^{m-1} \eta_j(\omega) 1_{(s_j, s_{j+1}]}(t) \]
for all $(t, \omega)$, where $0 = s_0 = s_1 < s_2 < \cdots < s_m < \infty$ and $\eta_j$ is $\mathcal{F}_{s_j}$-measurable for $0 \le j \le m-1$. Then for each $(t, \omega)$,
\[ \sum_{i=1}^{n-1} \xi_i(\omega) \bigl[ M_{t_{i+1} \wedge t}(\omega) - M_{t_i \wedge t}(\omega) \bigr] = \sum_{j=1}^{m-1} \eta_j(\omega) \bigl[ M_{s_{j+1} \wedge t}(\omega) - M_{s_j \wedge t}(\omega) \bigr]. \]
In other words, $X \cdot M$ is independent of the representation.

(b) $\mathcal{S}_2$ is a linear space; in other words, for $X, Y \in \mathcal{S}_2$ and reals $\alpha$ and $\beta$, $\alpha X + \beta Y \in \mathcal{S}_2$. The integral satisfies
\[ (\alpha X + \beta Y) \cdot M = \alpha (X \cdot M) + \beta (Y \cdot M). \]
Proof. Part (a). We may assume $s_m = t_n$. For if, say, $t_n < s_m$, replace $n$ by $n+1$, define $t_{n+1} = s_m$ and $\xi_n(\omega) = 0$, and add the term $\xi_n 1_{(t_n, t_{n+1}]}$ to the $\{\xi_i, t_i\}$-representation (5.6) of $X$. $X$ did not change because the new term is zero. The stochastic integral $X \cdot M$ then acquires the term $\xi_n (M_{t_{n+1} \wedge t} - M_{t_n \wedge t})$ which is identically zero. Thus the new term in the representation does not change the value of either $X_t(\omega)$ or $(X \cdot M)_t(\omega)$.

Let $T = s_m = t_n$, and let $0 = u_1 < u_2 < \cdots < u_p = T$ be an ordered relabeling of the set $\{s_j : 1 \le j \le m\} \cup \{t_i : 1 \le i \le n\}$. Then for each $1 \le k \le p-1$ there are unique indices $i$ and $j$ such that
\[ (u_k, u_{k+1}] \subseteq (t_i, t_{i+1}] \cap (s_j, s_{j+1}]. \]
For $t \in (u_k, u_{k+1}]$, $X_t(\omega) = \xi_i(\omega)$ and $X_t(\omega) = \eta_j(\omega)$. So for these particular $i$ and $j$, $\xi_i = \eta_j$.

The proof now follows from a reordering of the sums for the stochastic integrals:
\[ \begin{aligned}
\sum_{i=1}^{n-1} \xi_i (M_{t_{i+1} \wedge t} - M_{t_i \wedge t})
&= \sum_{i=1}^{n-1} \xi_i \sum_{k=1}^{p-1} (M_{u_{k+1} \wedge t} - M_{u_k \wedge t}) \, 1\bigl\{ (u_k, u_{k+1}] \subseteq (t_i, t_{i+1}] \bigr\} \\
&= \sum_{k=1}^{p-1} (M_{u_{k+1} \wedge t} - M_{u_k \wedge t}) \sum_{i=1}^{n-1} \xi_i \, 1\bigl\{ (u_k, u_{k+1}] \subseteq (t_i, t_{i+1}] \bigr\} \\
&= \sum_{k=1}^{p-1} (M_{u_{k+1} \wedge t} - M_{u_k \wedge t}) \sum_{j=1}^{m-1} \eta_j \, 1\bigl\{ (u_k, u_{k+1}] \subseteq (s_j, s_{j+1}] \bigr\} \\
&= \sum_{j=1}^{m-1} \eta_j \sum_{k=1}^{p-1} (M_{u_{k+1} \wedge t} - M_{u_k \wedge t}) \, 1\bigl\{ (u_k, u_{k+1}] \subseteq (s_j, s_{j+1}] \bigr\} \\
&= \sum_{j=1}^{m-1} \eta_j (M_{s_{j+1} \wedge t} - M_{s_j \wedge t}).
\end{aligned} \]
Part (b). Suppose we are given two simple predictable processes
\[ X_t = \xi_0 1_{\{0\}}(t) + \sum_{i=1}^{n-1} \xi_i 1_{(t_i, t_{i+1}]}(t) \]
and
\[ Y_t = \eta_0 1_{\{0\}}(t) + \sum_{j=1}^{m-1} \eta_j 1_{(s_j, s_{j+1}]}(t). \]
As above, we can assume that $T = t_n = s_m$ by adding a zero term to one of these processes. As above, let $\{u_k : 1 \le k \le p\}$ be the common refinement of $\{s_j : 1 \le j \le m\}$ and $\{t_i : 1 \le i \le n\}$ as partitions of $[0, T]$. Given $1 \le k \le p-1$, let $i = i(k)$ and $j = j(k)$ be the indices defined by
\[ (u_k, u_{k+1}] \subseteq (t_i, t_{i+1}] \cap (s_j, s_{j+1}]. \]
Define $\widetilde{\xi}_k(\omega) = \xi_{i(k)}(\omega)$ and $\widetilde{\eta}_k(\omega) = \eta_{j(k)}(\omega)$. Then
\[ X_t = \xi_0 1_{\{0\}}(t) + \sum_{k=1}^{p-1} \widetilde{\xi}_k 1_{(u_k, u_{k+1}]}(t) \]
and
\[ Y_t = \eta_0 1_{\{0\}}(t) + \sum_{k=1}^{p-1} \widetilde{\eta}_k 1_{(u_k, u_{k+1}]}(t). \]
The representation
\[ \alpha X_t + \beta Y_t = (\alpha \xi_0 + \beta \eta_0) 1_{\{0\}}(t) + \sum_{k=1}^{p-1} (\alpha \widetilde{\xi}_k + \beta \widetilde{\eta}_k) 1_{(u_k, u_{k+1}]}(t) \]
shows that $\alpha X + \beta Y$ is a member of $\mathcal{S}_2$. According to part (a) proved above, we can write the stochastic integrals based on these representations, and then linearity of the integral is clear. $\square$
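The representation-independence in part (a) can be demonstrated numerically: refine the partition of a simple process, evaluate (5.7) for both representations against the same path of $M$, and compare. A sketch (names and the test path are ours):

```python
import numpy as np

def simple_integral(xi, t_pts, M_path, dt, t):
    """(X . M)_t of (5.7) for X = sum_i xi[i] 1_{(t_pts[i], t_pts[i+1]]}(s)."""
    M = lambda s: M_path[min(int(round(min(s, t) / dt)), len(M_path) - 1)]
    return sum(xi[i] * (M(t_pts[i + 1]) - M(t_pts[i])) for i in range(len(xi)))

# a cadlag path sampled on the grid 0, 0.5, 1.0, 1.5, 2.0, 2.5
dt = 0.5
M_path = np.array([0.0, 1.0, -1.0, 2.0, 0.5, 3.0])

# one process, two representations: the refinement must not change X . M
coarse = simple_integral([2.0, -1.0], [0.0, 1.0, 2.5], M_path, dt, 2.5)
fine = simple_integral([2.0, 2.0, -1.0, -1.0], [0.0, 0.5, 1.0, 1.5, 2.5], M_path, dt, 2.5)
assert abs(coarse - fine) < 1e-12
print(coarse)
```

The equality is forced by the telescoping of the increments over the common refinement, which is exactly the mechanism of the proof.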
To build a more general integral on definition (5.7), we need some continuity properties.

Lemma 5.9. Let $X \in \mathcal{S}_2$. Then $X \cdot M$ is a square-integrable cadlag martingale. If $M$ is continuous, then so is $X \cdot M$. These isometries hold: for all $t \ge 0$,
\[ E\bigl[ (X \cdot M)_t^2 \bigr] = \int_{[0,t] \times \Omega} X^2 \, d\mu_M \tag{5.8} \]
and
\[ \|X \cdot M\|_{\mathcal{M}_2} = \|X\|_{\mathcal{L}_2}. \tag{5.9} \]
Proof. The cadlag property for each fixed $\omega$ is clear from the definition (5.7) of $X \cdot M$, as is the continuity if $M$ is continuous to begin with.

Linear combinations of martingales are martingales. So to prove that $X \cdot M$ is a martingale it suffices to check this statement: if $M$ is a martingale, $u < v$, and $\xi$ is a bounded $\mathcal{F}_u$-measurable random variable, then $Z_t = \xi (M_{t \wedge v} - M_{t \wedge u})$ is a martingale. The boundedness of $\xi$ and integrability of $M$ guarantee integrability of $Z_t$. Take $s < t$.

First, if $s < u$, then
\[ E[Z_t \,|\, \mathcal{F}_s] = E\bigl[ \xi (M_{t \wedge v} - M_{t \wedge u}) \,\big|\, \mathcal{F}_s \bigr] = E\Bigl[ E\bigl[ \xi (M_{t \wedge v} - M_{t \wedge u}) \,\big|\, \mathcal{F}_u \bigr] \,\Big|\, \mathcal{F}_s \Bigr] = 0 = Z_s \]
because $M_{t \wedge v} - M_{t \wedge u} = 0$ for $t \le u$, and for $t > u$ the martingale property of $M$ gives
\[ E\bigl[ M_{t \wedge v} - M_{t \wedge u} \,\big|\, \mathcal{F}_u \bigr] = E\bigl[ M_{t \wedge v} \,\big|\, \mathcal{F}_u \bigr] - M_u = 0. \]
Second, if $s \ge u$, then also $t > s \ge u$, and it follows that $\xi$ is $\mathcal{F}_s$-measurable and $M_{t \wedge u} = M_u = M_{s \wedge u}$ is also $\mathcal{F}_s$-measurable. Then
\[ E[Z_t \,|\, \mathcal{F}_s] = E\bigl[ \xi (M_{t \wedge v} - M_{t \wedge u}) \,\big|\, \mathcal{F}_s \bigr] = \xi \, E\bigl[ M_{t \wedge v} - M_{s \wedge u} \,\big|\, \mathcal{F}_s \bigr] = \xi \bigl( E[M_{t \wedge v} \,|\, \mathcal{F}_s] - M_{s \wedge u} \bigr) = \xi (M_{s \wedge v} - M_{s \wedge u}) = Z_s. \]
In the last equality above, either $s \ge v$, in which case $M_{t \wedge v} = M_v = M_{s \wedge v}$ is $\mathcal{F}_s$-measurable, or $s < v$, in which case we use the martingale property of $M$.

We have proved that $X \cdot M$ is a martingale.
Next we prove (5.8). After squaring,
\[ (X \cdot M)_t^2 = \sum_{i=1}^{n-1} \xi_i^2 \bigl( M_{t \wedge t_{i+1}} - M_{t \wedge t_i} \bigr)^2 + 2 \sum_{i<j} \xi_i \xi_j (M_{t \wedge t_{i+1}} - M_{t \wedge t_i})(M_{t \wedge t_{j+1}} - M_{t \wedge t_j}). \]
We claim that each term of the last sum has zero expectation. Since $i < j$, $t_{i+1} \le t_j$ and both $\xi_i$ and $\xi_j$ are $\mathcal{F}_{t_j}$-measurable. Thus
\[ E\bigl[ \xi_i \xi_j (M_{t \wedge t_{i+1}} - M_{t \wedge t_i})(M_{t \wedge t_{j+1}} - M_{t \wedge t_j}) \bigr] = E\Bigl[ \xi_i \xi_j (M_{t \wedge t_{i+1}} - M_{t \wedge t_i}) \, E\bigl[ M_{t \wedge t_{j+1}} - M_{t \wedge t_j} \,\big|\, \mathcal{F}_{t_j} \bigr] \Bigr] = 0 \]
because the conditional expectation vanishes by the martingale property of $M$.
because the conditional expectation vanishes by the martingale property of
M.
Now we can compute the mean of the square. Let t > 0. The key point
of the next calculation is the fact that M
2
[M] is a martingale.
E
_
(X M)
2
t

=
n1

i=1
E
_

2
i
_
M
tt
i+1
M
tt
i
_
2

=
n1

i=1
E
_

2
i
E
_
(M
tt
i+1
M
tt
i
)
2

T
t
i
_
=
n1

i=1
E
_

2
i
E
_
M
2
tt
i+1
M
2
tt
i

T
t
i
_
5.1. Square-integrable martingale integrator 137
=
n1

i=1
E
_

2
i
E
_
[M]
tt
i+1
[M]
tt
i

T
t
i
_
=
n1

i=1
E
_

2
i
_
[M]
tt
i+1
[M]
tt
i
_
=
n1

i=1
E
_

2
i
_
[0,t]
1
(t
i
,t
i+1
]
(s) d[M]
s
_
= E
__
[0,t]
_

2
0
1
0
(s) +
n1

i=1

2
i
1
(t
i
,t
i+1
]
(s)
_
d[M]
s
_
= E
__
[0,t]
_

0
1
0
(s) +
n1

i=1

i
1
(t
i
,t
i+1
]
(s)
_
2
d[M]
s
_
=
_
[0,t]
X
2
d
M
.
In the third last equality we added the term
2
0
1
0
(s) inside the d[M]
s
-
integral because this term integrates to zero (recall that
[M]
0 = 0). In
the second last equality we used the equality

2
0
1
0
(s) +
n1

i=1

2
i
1
(t
i
,t
i+1
]
(s) =
_

0
1
0
(s) +
n1

i=1

i
1
(t
i
,t
i+1
]
(s)
_
2
which is true due to the pairwise disjointness of the time intervals.
The above calculation checks that
|(X M)
t
|
L
2
(P)
= |X|

M
,t
for any t > 0. Comparison of formulas (3.19) and (5.4) then proves (5.9).
Let us summarize the message of Lemmas 5.8 and 5.9 in words. The stochastic integral $I: X \mapsto X\cdot M$ is a linear map from the space $\mathcal{S}_2$ of predictable simple processes into $\mathcal{M}_2$. Equality (5.9) says that this map is a linear isometry that maps from the subspace $(\mathcal{S}_2, d_{\mathcal{L}_2})$ of the metric space $(\mathcal{L}_2, d_{\mathcal{L}_2})$, and into the metric space $(\mathcal{M}_2, d_{\mathcal{M}_2})$. In case $M$ is continuous, the map goes into the space $(\mathcal{M}_2^c, d_{\mathcal{M}_2})$.
A consequence of (5.9) is that if $X$ and $Y$ satisfy (5.5) then $X\cdot M$ and $Y\cdot M$ are indistinguishable. For example, we may have $Y_t = X_t + \xi\,\mathbf{1}_{\{t=0\}}$ for a bounded $\mathcal{F}_0$-measurable random variable $\xi$. Then the integrals $X\cdot M$ and $Y\cdot M$ are indistinguishable, in other words the same process. This is no different from the analytic fact that changing the value of a function $f$ on $[a,b]$ at a single point (or even at countably many points) does not affect the value of the integral $\int_a^b f(x)\,dx$.
138 5. Stochastic Integral
We come to the approximation step.
Lemma 5.10. For any $X \in \mathcal{L}_2$ there exists a sequence $X_n \in \mathcal{S}_2$ such that $\|X - X_n\|_{\mathcal{L}_2} \to 0$.
Proof. Let $\widetilde{\mathcal{L}}_2$ denote the class of $X \in \mathcal{L}_2$ for which this approximation is possible. Of course $\mathcal{S}_2$ itself is a subset of $\widetilde{\mathcal{L}}_2$.
Indicator functions of time-bounded predictable rectangles are of the form
$$\mathbf{1}_{\{0\}\times F_0}(t,\omega) = \mathbf{1}_{F_0}(\omega)\,\mathbf{1}_{\{0\}}(t), \quad\text{or}\quad \mathbf{1}_{(u,v]\times F}(t,\omega) = \mathbf{1}_{F}(\omega)\,\mathbf{1}_{(u,v]}(t),$$
for $F_0 \in \mathcal{F}_0$, $0 \le u < v < \infty$, and $F \in \mathcal{F}_u$. They are elements of $\mathcal{S}_2$ due to (5.2). Furthermore, since $\mathcal{S}_2$ is a linear space, it contains all simple functions of the form
$$(5.10)\qquad X(t,\omega) = \sum_{i=0}^{n} c_i\,\mathbf{1}_{R_i}(t,\omega)$$
where $c_i$ are finite constants and $R_i$ are time-bounded predictable rectangles.
The approximation of predictable processes proceeds from constant multiples of indicator functions of predictable sets through $\mathcal{P}$-measurable simple functions to the general case.
Step 1. Let $G \in \mathcal{P}$ be an arbitrary set and $c \in \mathbb{R}$. We shall show that $X = c\mathbf{1}_G$ lies in $\widetilde{\mathcal{L}}_2$. We can assume $c \ne 0$, otherwise $X = 0 \in \mathcal{S}_2$. Again by (5.2), $c\mathbf{1}_G \in \mathcal{L}_2$ because
$$\|c\mathbf{1}_G\|_{\mu_M,T} = |c|\,\Big(\mu_M\big(G\cap([0,T]\times\Omega)\big)\Big)^{1/2} < \infty \quad\text{for all } T < \infty.$$

Given $\varepsilon > 0$, fix $n$ large enough so that $2^{-n} < \varepsilon/2$. Let $G_n = G\cap([0,n]\times\Omega)$. Consider the restricted $\sigma$-algebra
$$\mathcal{P}_n = \{A\in\mathcal{P} : A\subseteq[0,n]\times\Omega\} = \{B\cap([0,n]\times\Omega) : B\in\mathcal{P}\}.$$
$\mathcal{P}_n$ is generated by the collection $\mathcal{R}_n$ of predictable rectangles that lie in $[0,n]\times\Omega$ (Exercise 1.7 part (d)). $\mathcal{R}_n$ is a semialgebra in the space $[0,n]\times\Omega$. (For this reason it is convenient to regard $\emptyset$ as a member of $\mathcal{R}_n$.) The algebra $\mathcal{A}_n$ generated by $\mathcal{R}_n$ is the collection of all finite disjoint unions of members of $\mathcal{R}_n$ (Lemma B.5). Restricted to $[0,n]\times\Omega$, $\mu_M$ is a finite measure. Thus by Lemma B.6 there exists $R\in\mathcal{A}_n$ such that $\mu_M(G_n\,\triangle\, R) < |c|^{-2}\varepsilon^2/4$. We can write $R = R_1\cup\cdots\cup R_p$ as a finite disjoint union of time-bounded predictable rectangles.
Let $Z = c\mathbf{1}_R$. By the disjointness,
$$Z = c\mathbf{1}_R = c\,\mathbf{1}\Big(\bigcup_{i=1}^{p} R_i\Big) = \sum_{i=1}^{p} c\,\mathbf{1}_{R_i}$$
so in fact $Z$ is of type (5.10) and a member of $\mathcal{S}_2$. The $\mathcal{L}_2$-distance between $Z = c\mathbf{1}_R$ and $X = c\mathbf{1}_G$ is now estimated as follows.
$$\begin{aligned}
\|Z - X\|_{\mathcal{L}_2} &\le \sum_{k=1}^{n} 2^{-k}\,\|c\mathbf{1}_R - c\mathbf{1}_G\|_{\mu_M,k} + 2^{-n} \\
&\le \sum_{k=1}^{n} 2^{-k}\,|c|\Big(\int_{[0,k]\times\Omega} |\mathbf{1}_R - \mathbf{1}_G|^2\, d\mu_M\Big)^{1/2} + \varepsilon/2 \\
&\le |c|\Big(\int_{[0,n]\times\Omega} |\mathbf{1}_R - \mathbf{1}_{G_n}|^2\, d\mu_M\Big)^{1/2} + \varepsilon/2 \\
&= |c|\,\mu_M(G_n\,\triangle\, R)^{1/2} + \varepsilon/2 \;\le\; \varepsilon.
\end{aligned}$$
Above we bounded integrals over $[0,k]\times\Omega$ with the integral over $[0,n]\times\Omega$ for $1 \le k \le n$, then noted that $\mathbf{1}_G(t,\omega) = \mathbf{1}_{G_n}(t,\omega)$ for $(t,\omega) \in [0,n]\times\Omega$, and finally used the general fact that
$$|\mathbf{1}_A - \mathbf{1}_B| = \mathbf{1}_{A\triangle B}$$
for any two sets $A$ and $B$.
To summarize, we have shown that given $G \in \mathcal{P}$, $c \in \mathbb{R}$, and $\varepsilon > 0$, there exists a process $Z \in \mathcal{S}_2$ such that $\|Z - c\mathbf{1}_G\|_{\mathcal{L}_2} \le \varepsilon$. Consequently $c\mathbf{1}_G \in \widetilde{\mathcal{L}}_2$.
Let us observe that $\widetilde{\mathcal{L}}_2$ is closed under addition. Let $X, Y \in \widetilde{\mathcal{L}}_2$ and $X_n, Y_n \in \mathcal{S}_2$ be such that $\|X_n - X\|_{\mathcal{L}_2}$ and $\|Y_n - Y\|_{\mathcal{L}_2}$ vanish as $n \to \infty$. Then $X_n + Y_n \in \mathcal{S}_2$ and by the triangle inequality
$$\|(X+Y) - (X_n+Y_n)\|_{\mathcal{L}_2} \le \|X_n - X\|_{\mathcal{L}_2} + \|Y_n - Y\|_{\mathcal{L}_2} \to 0.$$
From this and the proof for $c\mathbf{1}_G$ we conclude that all simple functions of the type
$$(5.11)\qquad X = \sum_{i=1}^{n} c_i\,\mathbf{1}_{G_i}, \quad\text{with } c_i \in \mathbb{R} \text{ and } G_i \in \mathcal{P},$$
lie in $\widetilde{\mathcal{L}}_2$.
Step 2. Let $X$ be an arbitrary process in $\mathcal{L}_2$. Given $\varepsilon > 0$, pick $n$ so that $2^{-n} < \varepsilon/3$. Find simple functions $X_m$ of the type (5.11) such that $|X - X_m| \le |X|$ and $X_m(t,\omega) \to X(t,\omega)$ for all $(t,\omega)$. This is just an instance of the general approximation of measurable functions with simple functions, as for example in (1.3). Since $X \in L^2([0,n]\times\Omega, \mathcal{P}_n, \mu_M)$, Lebesgue's dominated convergence theorem implies for $1 \le k \le n$ that
$$\limsup_{m\to\infty} \|X - X_m\|_{\mu_M,k} \le \lim_{m\to\infty}\Big(\int_{[0,n]\times\Omega} |X - X_m|^2\, d\mu_M\Big)^{1/2} = 0.$$
Consequently
$$\limsup_{m\to\infty} \|X - X_m\|_{\mathcal{L}_2} \le \sum_{k=1}^{n} 2^{-k}\,\limsup_{m\to\infty}\|X - X_m\|_{\mu_M,k} + \varepsilon/3 = \varepsilon/3.$$
Fix $m$ large enough so that $\|X - X_m\|_{\mathcal{L}_2} \le \varepsilon/2$. Using Step 1, find a process $Z \in \mathcal{S}_2$ such that $\|X_m - Z\|_{\mathcal{L}_2} < \varepsilon/2$. Then by the triangle inequality $\|X - Z\|_{\mathcal{L}_2} \le \varepsilon$. We have shown that an arbitrary process $X \in \mathcal{L}_2$ can be approximated by simple predictable processes in the $\mathcal{L}_2$-distance.
Now we can state formally the definition of the stochastic integral.

Definition 5.11. Let $M$ be a square-integrable cadlag martingale on a probability space $(\Omega, \mathcal{F}, P)$ with filtration $\{\mathcal{F}_t\}$. For any predictable process $X \in \mathcal{L}_2(M,\mathcal{P})$, the stochastic integral $I(X) = X\cdot M$ is the square-integrable cadlag martingale that satisfies
$$\lim_{n\to\infty} \|X\cdot M - X_n\cdot M\|_{\mathcal{M}_2} = 0$$
for every sequence $X_n \in \mathcal{S}_2$ of simple predictable processes such that $\|X - X_n\|_{\mathcal{L}_2} \to 0$. The process $I(X)$ is unique up to indistinguishability. If $M$ is continuous, then so is $X\cdot M$.
Justification of the definition. Here is the argument that justifies the claims implicit in the definition. It is really the classic argument about extending a uniformly continuous map into a complete metric space to the closure of its domain.

Existence. Let $X \in \mathcal{L}_2$. By Lemma 5.10 there exists a sequence $X_n \in \mathcal{S}_2$ such that $\|X - X_n\|_{\mathcal{L}_2} \to 0$. From the triangle inequality it then follows that $\{X_n\}$ is a Cauchy sequence in $\mathcal{L}_2$: given $\varepsilon > 0$, choose $n_0$ so that $\|X - X_n\|_{\mathcal{L}_2} \le \varepsilon/2$ for $n \ge n_0$. Then if $m, n \ge n_0$,
$$\|X_m - X_n\|_{\mathcal{L}_2} \le \|X_m - X\|_{\mathcal{L}_2} + \|X - X_n\|_{\mathcal{L}_2} \le \varepsilon.$$
For $X_n \in \mathcal{S}_2$ the stochastic integral $X_n\cdot M$ was defined in (5.7). By the isometry (5.9) and the additivity of the integral,
$$\|X_m\cdot M - X_n\cdot M\|_{\mathcal{M}_2} = \|(X_m - X_n)\cdot M\|_{\mathcal{M}_2} = \|X_m - X_n\|_{\mathcal{L}_2}.$$
Consequently $\{X_n\cdot M\}$ is a Cauchy sequence in the space $\mathcal{M}_2$ of martingales. If $M$ is continuous this Cauchy sequence lies in the space $\mathcal{M}_2^c$. By Lemma 3.40 these spaces are complete metric spaces, and consequently there exists a limit process $Y = \lim_{n\to\infty} X_n\cdot M$. This process $Y$ we call $I(X) = X\cdot M$.
Uniqueness. Let $Z_n$ be another sequence in $\mathcal{S}_2$ that converges to $X$ in $\mathcal{L}_2$. We need to show that $Z_n\cdot M$ converges to the same $Y = X\cdot M$ in $\mathcal{M}_2$. This follows again from the triangle inequality and the isometry:
$$\begin{aligned}
\|Y - Z_n\cdot M\|_{\mathcal{M}_2} &\le \|Y - X_n\cdot M\|_{\mathcal{M}_2} + \|X_n\cdot M - Z_n\cdot M\|_{\mathcal{M}_2} \\
&= \|Y - X_n\cdot M\|_{\mathcal{M}_2} + \|X_n - Z_n\|_{\mathcal{L}_2} \\
&\le \|Y - X_n\cdot M\|_{\mathcal{M}_2} + \|X_n - X\|_{\mathcal{L}_2} + \|X - Z_n\|_{\mathcal{L}_2}.
\end{aligned}$$
All terms on the last line vanish as $n \to \infty$. This shows that $Z_n\cdot M \to Y$, and so there is only one process $Y = X\cdot M$ that satisfies the description of the definition.
Note that the uniqueness of the stochastic integral cannot hold in a sense stronger than indistinguishability. If $W$ is a process that is indistinguishable from $X\cdot M$, which means that
$$P\{\omega : W_t(\omega) = (X\cdot M)_t(\omega) \text{ for all } t \in \mathbb{R}_+\} = 1,$$
then $W$ also has to be regarded as the stochastic integral. This is built into the definition of $I(X)$ as the limit: if $\|X\cdot M - X_n\cdot M\|_{\mathcal{M}_2} \to 0$ and $W$ is indistinguishable from $X\cdot M$, then also $\|W - X_n\cdot M\|_{\mathcal{M}_2} \to 0$.
The definition of the stochastic integral $X\cdot M$ feels somewhat abstract because the approximation happens in a space of processes, and it may not seem obvious how to produce the approximating predictable simple processes $X_n$. When $X$ is caglad, one can use Riemann sum type approximations with $X$-values evaluated at left endpoints of partition intervals. To get $\mathcal{L}_2$-approximation, one must truncate the process, and then let the mesh of the partition shrink fast enough and the number of terms in the simple process grow fast enough. See Proposition 5.32 and Exercise 5.12.
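To make the left-endpoint idea concrete, here is a small numerical sketch (an illustration only, not the construction of Proposition 5.32): for the continuous adapted integrand $X_t = \cos(B_t)$ of Brownian motion, the simple process that freezes $X$ at the left endpoint of each partition interval converges to $X$ in $L^2(m\otimes P)$ as the mesh shrinks. The grid sizes and sample counts are arbitrary choices.

```python
import numpy as np

# Left-endpoint Riemann approximation of a continuous adapted integrand:
# X_t = cos(B_t) is frozen at partition points t_i, giving the simple
# process X^(k)(t) = X_{t_i} on (t_i, t_{i+1}].  The L^2(m x P) error
# shrinks with the mesh.  Grid and sample sizes are arbitrary.
rng = np.random.default_rng(2)
n_fine, n_paths, T = 1024, 2000, 1.0
dt = T / n_fine
dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_fine))
B = np.hstack([np.zeros((n_paths, 1)), np.cumsum(dB, axis=1)])
X = np.cos(B[:, :-1])   # value used on each fine interval's left endpoint

def l2_error(k):
    # Simple process built on the coarser partition of mesh k*dt.
    Xk = np.repeat(X[:, ::k], k, axis=1)
    return float(np.mean(np.sum((X - Xk) ** 2 * dt, axis=1)))

errs = [l2_error(k) for k in (64, 16, 4)]
print("L2 errors for meshes 64dt, 16dt, 4dt:", errs)
```

The squared error decreases roughly in proportion to the mesh, the rate suggested by the modulus of continuity of Brownian motion.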
We took the approximation step in the space of martingales to avoid separate arguments for the path properties of the integral. The completeness of the space of cadlag martingales and the space of continuous martingales immediately gives a stochastic integral with the appropriate path regularity.
As for the style of convergence in the definition of the integral, let us recall that convergence in the spaces of processes actually reduces back to familiar mean-square convergence. $\|X_n - X\|_{\mathcal{L}_2} \to 0$ is equivalent to having
$$\int_{[0,T]\times\Omega} |X_n - X|^2\, d\mu_M \to 0 \quad\text{for all } T < \infty.$$
Convergence in $\mathcal{M}_2$ is equivalent to $L^2(P)$ convergence at each fixed time $t$: for martingales $N^{(j)}, N \in \mathcal{M}_2$,
$$\|N^{(j)} - N\|_{\mathcal{M}_2} \to 0 \quad\text{if and only if}\quad E\big[(N^{(j)}_t - N_t)^2\big] \to 0 \text{ for each } t \ge 0.$$
In particular, at each time $t \ge 0$ the integral $(X\cdot M)_t$ is the mean-square limit of the integrals $(X_n\cdot M)_t$ of approximating simple processes. These observations are used in the extension of the isometric property of the integral.
Proposition 5.12. Let $M \in \mathcal{M}_2$ and $X \in \mathcal{L}_2(M,\mathcal{P})$. Then we have the isometries
$$(5.12)\qquad E\big[(X\cdot M)_t^2\big] = \int_{[0,t]\times\Omega} X^2\, d\mu_M \quad\text{for all } t \ge 0,$$
and
$$(5.13)\qquad \|X\cdot M\|_{\mathcal{M}_2} = \|X\|_{\mathcal{L}_2}.$$
In particular, if $X, Y \in \mathcal{L}_2(M,\mathcal{P})$ are $\mu_M$-equivalent in the sense (5.5), then $X\cdot M$ and $Y\cdot M$ are indistinguishable.
Proof. As already observed, the triangle inequality is valid for the distance measures $\|\cdot\|_{\mathcal{M}_2}$ and $\|\cdot\|_{\mathcal{L}_2}$. From this we get a continuity property. Let $Z, W \in \mathcal{M}_2$. Then
$$\|Z\|_{\mathcal{M}_2} \le \|Z - W\|_{\mathcal{M}_2} + \|W\|_{\mathcal{M}_2}, \quad\text{so that}\quad \|Z\|_{\mathcal{M}_2} - \|W\|_{\mathcal{M}_2} \le \|Z - W\|_{\mathcal{M}_2}.$$
This and the same inequality with $Z$ and $W$ switched give
$$(5.14)\qquad \big|\,\|Z\|_{\mathcal{M}_2} - \|W\|_{\mathcal{M}_2}\,\big| \le \|Z - W\|_{\mathcal{M}_2}.$$
This same calculation applies to $\|\cdot\|_{\mathcal{L}_2}$ also, and of course equally well to the $L^2$ norms on $\Omega$ and on $[0,T]\times\Omega$.
Let $X_n \in \mathcal{S}_2$ be a sequence such that $\|X_n - X\|_{\mathcal{L}_2} \to 0$. As we proved in Lemma 5.9, the isometries hold for $X_n \in \mathcal{S}_2$. Consequently to prove the proposition we need only let $n \to \infty$ in the equalities
$$E\big[(X_n\cdot M)_t^2\big] = \int_{[0,t]\times\Omega} X_n^2\, d\mu_M \quad\text{and}\quad \|X_n\cdot M\|_{\mathcal{M}_2} = \|X_n\|_{\mathcal{L}_2}$$
that come from Lemma 5.9. Each term converges to the corresponding term with $X_n$ replaced by $X$.
The last statement of the proposition follows because $\|X - Y\|_{\mathcal{L}_2} = 0$ iff $X$ and $Y$ are $\mu_M$-equivalent, and $\|X\cdot M - Y\cdot M\|_{\mathcal{M}_2} = 0$ iff $X\cdot M$ and $Y\cdot M$ are indistinguishable.
Remark 5.13 (Enlarging the filtration). Throughout we assume that $M$ is a cadlag martingale. By Proposition 3.2, if our original filtration $\{\mathcal{F}_t\}$ is not already right-continuous, we can replace it with the larger filtration $\{\mathcal{F}_{t+}\}$. Under the filtration $\{\mathcal{F}_{t+}\}$ we have more predictable rectangles than before, and hence $\mathcal{P}_+$ (the predictable $\sigma$-field defined in terms of $\{\mathcal{F}_{t+}\}$) is potentially larger than our original predictable $\sigma$-field $\mathcal{P}$. The relevant question is whether switching to $\{\mathcal{F}_{t+}\}$ and $\mathcal{P}_+$ gives us more processes $X$ to integrate. The answer is essentially no. Only the value at $t = 0$ of a $\mathcal{P}_+$-measurable process differentiates it from a $\mathcal{P}$-measurable process (Exercise 5.5). And as already seen, the value $X_0$ is irrelevant for the stochastic integral.
5.1.3. Properties of the stochastic integral. We prove here basic properties of the $L^2$ integral $X\cdot M$ constructed in Definition 5.11. Many of these properties really amount to saying that the notation works the way we would expect it to work. Those properties that take the form of an equality between two stochastic integrals are interpreted in the sense that the two processes are indistinguishable. Since the stochastic integrals are cadlag processes, indistinguishability follows from showing almost sure equality at any fixed time (Lemma 2.5).

The stochastic integral $X\cdot M$ was defined as a limit $X_n\cdot M \to X\cdot M$ in $\mathcal{M}_2$-space, where $X_n\cdot M$ are stochastic integrals of approximating simple predictable processes $X_n$. Recall that this implies that for a fixed time $t$, $(X\cdot M)_t$ is the $L^2(P)$-limit of the random variables $(X_n\cdot M)_t$. And furthermore, there is uniform convergence in probability on compact intervals:
there is uniform convergence in probability on compact intervals:
(5.15) lim
n
P
_
sup
0tT

(X
n
M)
t
(X M)
t


_
= 0
for each > 0 and T < . By the Borel-Cantelli lemma, along some
subsequence n
j
there is almost sure convergence uniformly on compact
time intervals: for P-almost every
lim
n
sup
0tT

(X
n
M)
t
() (X M)
t
()

= 0 for each T < .


These last two statements are general properties of /
2
-convergence, see
Lemma 3.41.
A product of functions of $t$, $\omega$, and $(t,\omega)$ is regarded as a process in the obvious sense: for example, if $X$ is a process, $Z$ is a random variable and $f$ is a function on $\mathbb{R}_+$, then $fZX$ is the process whose value at $(t,\omega)$ is $f(t)Z(\omega)X(t,\omega)$. This just amounts to taking some liberties with the notation: we do not distinguish notationally between the function $t \mapsto f(t)$ on $\mathbb{R}_+$ and the function $(t,\omega) \mapsto f(t)$ on $\mathbb{R}_+\times\Omega$.
Throughout these proofs, when $X_n \in \mathcal{S}_2$ approximates $X \in \mathcal{L}_2$, we write $X_n$ generically in the form
$$(5.16)\qquad X_n(t,\omega) = \xi_0(\omega)\,\mathbf{1}_{\{0\}}(t) + \sum_{i=1}^{k-1}\xi_i(\omega)\,\mathbf{1}_{(t_i,t_{i+1}]}(t).$$
We introduce the familiar integral notation through the definition
$$(5.17)\qquad \int_{(s,t]} X\, dM = (X\cdot M)_t - (X\cdot M)_s \quad\text{for } 0 \le s \le t.$$
To explicitly display either the time variable (the integration variable), or the sample point $\omega$, this notation has variants such as
$$\int_{(s,t]} X\, dM = \int_{(s,t]} X_u\, dM_u = \int_{(s,t]} X_u(\omega)\, dM_u(\omega).$$
When the martingale $M$ is continuous, we can also write $\int_s^t X\, dM$ because then including or excluding endpoints of the interval makes no difference (Exercise 5.6). We shall alternate freely between the different notations for the stochastic integral, using whichever seems most clear, compact or convenient.
Since $(X\cdot M)_0 = 0$ for any stochastic integral,
$$\int_{(0,t]} X\, dM = (X\cdot M)_t.$$
It is more accurate to use the interval $(0,t]$ above rather than $[0,t]$ because the integral does not take into consideration any jump of the martingale at the origin. Precisely, if $\xi$ is an $\mathcal{F}_0$-measurable random variable and $\widetilde{M}_t = \xi + M_t$, then $[\widetilde{M}] = [M]$, the spaces $\mathcal{L}_2(\widetilde{M},\mathcal{P})$ and $\mathcal{L}_2(M,\mathcal{P})$ coincide, and $X\cdot\widetilde{M} = X\cdot M$ for each admissible integrand.

An integral of the type
$$\int_{(u,v]} G(s,\omega)\, d[M]_s(\omega)$$
is interpreted as a path-by-path Lebesgue-Stieltjes integral (in other words, evaluated as an ordinary Lebesgue-Stieltjes integral over $(u,v]$ separately for each fixed $\omega$).
Proposition 5.14. (a) Linearity: $(\alpha X + \beta Y)\cdot M = \alpha(X\cdot M) + \beta(Y\cdot M)$.

(b) For any $0 \le u \le v$,
$$(5.18)\qquad \int_{(0,t]} \mathbf{1}_{[0,v]}X\, dM = \int_{(0,\,v\wedge t]} X\, dM$$
and
$$(5.19)\qquad \int_{(0,t]} \mathbf{1}_{(u,v]}X\, dM = (X\cdot M)_{v\wedge t} - (X\cdot M)_{u\wedge t} = \int_{(u\wedge t,\, v\wedge t]} X\, dM.$$
The inclusion or exclusion of the origin in the interval $[0,v]$ is immaterial because a process of the type $\mathbf{1}_{\{0\}}(t)X(t,\omega)$ for $X \in \mathcal{L}_2(M,\mathcal{P})$ is $\mu_M$-equivalent to the identically zero process, and hence has zero stochastic integral.

(c) For $s < t$, we have a conditional form of the isometry:
$$(5.20)\qquad E\Big[\big((X\cdot M)_t - (X\cdot M)_s\big)^2 \,\Big|\, \mathcal{F}_s\Big] = E\Big[\int_{(s,t]} X_u^2\, d[M]_u \,\Big|\, \mathcal{F}_s\Big].$$
This implies that
$$(X\cdot M)_t^2 - \int_{(0,t]} X_u^2\, d[M]_u \quad\text{is a martingale}.$$
Proof. Part (a). Take limits in Lemma 5.8(b).

Part (b). If $X_n \in \mathcal{S}_2$ approximate $X$, then
$$\mathbf{1}_{[0,v]}(t)X_n(t) = \xi_0\,\mathbf{1}_{\{0\}}(t) + \sum_{i=1}^{k-1}\xi_i\,\mathbf{1}_{(t_i\wedge v,\; t_{i+1}\wedge v]}(t)$$
are simple predictable processes that approximate $\mathbf{1}_{[0,v]}X$.
$$\big((\mathbf{1}_{[0,v]}X_n)\cdot M\big)_t = \sum_{i=1}^{k-1}\xi_i\big(M_{t_{i+1}\wedge v\wedge t} - M_{t_i\wedge v\wedge t}\big) = (X_n\cdot M)_{v\wedge t}.$$
Letting $n \to \infty$ along a suitable subsequence gives in the limit the almost sure equality
$$\big((\mathbf{1}_{[0,v]}X)\cdot M\big)_t = (X\cdot M)_{v\wedge t}$$
which is (5.18). The second part (5.19) comes from $\mathbf{1}_{(u,v]}X = \mathbf{1}_{[0,v]}X - \mathbf{1}_{[0,u]}X$, the additivity of the integral, and definition (5.17).
Part (c). First we check this for the simple process $X_n$ in (5.16). This is essentially a redoing of the calculations in the proof of Lemma 5.9. Let $s < t$. If $s \ge t_k$ then both sides of (5.20) are zero. Otherwise, fix an index $1 \le m \le k-1$ such that $t_m \le s < t_{m+1}$. Then
$$(X_n\cdot M)_t - (X_n\cdot M)_s = \xi_m\big(M_{t_{m+1}\wedge t} - M_s\big) + \sum_{i=m+1}^{k-1}\xi_i\big(M_{t_{i+1}\wedge t} - M_{t_i\wedge t}\big) = \sum_{i=m}^{k-1}\xi_i\big(M_{u_{i+1}\wedge t} - M_{u_i\wedge t}\big)$$
where we have temporarily defined $u_m = s$ and $u_i = t_i$ for $i > m$. After squaring,
$$\big((X_n\cdot M)_t - (X_n\cdot M)_s\big)^2 = \sum_{i=m}^{k-1}\xi_i^2\big(M_{u_{i+1}\wedge t} - M_{u_i\wedge t}\big)^2 + 2\sum_{m\le i<j<k}\xi_i\xi_j\big(M_{u_{i+1}\wedge t} - M_{u_i\wedge t}\big)\big(M_{u_{j+1}\wedge t} - M_{u_j\wedge t}\big).$$
We claim that the cross terms vanish under the conditional expectation. Since $i < j$, $u_{i+1} \le u_j$ and both $\xi_i$ and $\xi_j$ are $\mathcal{F}_{u_j}$-measurable.
$$E\big[\xi_i\xi_j(M_{u_{i+1}\wedge t} - M_{u_i\wedge t})(M_{u_{j+1}\wedge t} - M_{u_j\wedge t}) \,\big|\, \mathcal{F}_s\big] = E\big[\xi_i\xi_j(M_{u_{i+1}\wedge t} - M_{u_i\wedge t})\,E[M_{u_{j+1}\wedge t} - M_{u_j\wedge t}\mid\mathcal{F}_{u_j}] \,\big|\, \mathcal{F}_s\big] = 0$$
because the inner conditional expectation vanishes by the martingale property of $M$.
Now we can compute the conditional expectation of the square.
$$\begin{aligned}
E\Big[\big((X_n\cdot M)_t - (X_n\cdot M)_s\big)^2 \,\Big|\, \mathcal{F}_s\Big] &= \sum_{i=m}^{k-1} E\Big[\xi_i^2\big(M_{u_{i+1}\wedge t} - M_{u_i\wedge t}\big)^2 \,\Big|\, \mathcal{F}_s\Big] \\
&= \sum_{i=m}^{k-1} E\Big[\xi_i^2\, E\big[(M_{u_{i+1}\wedge t} - M_{u_i\wedge t})^2 \,\big|\, \mathcal{F}_{u_i}\big] \,\Big|\, \mathcal{F}_s\Big] \\
&= \sum_{i=m}^{k-1} E\Big[\xi_i^2\, E\big[M^2_{u_{i+1}\wedge t} - M^2_{u_i\wedge t} \,\big|\, \mathcal{F}_{u_i}\big] \,\Big|\, \mathcal{F}_s\Big] \\
&= \sum_{i=m}^{k-1} E\Big[\xi_i^2\, E\big[[M]_{u_{i+1}\wedge t} - [M]_{u_i\wedge t} \,\big|\, \mathcal{F}_{u_i}\big] \,\Big|\, \mathcal{F}_s\Big] \\
&= \sum_{i=m}^{k-1} E\Big[\xi_i^2\big([M]_{u_{i+1}\wedge t} - [M]_{u_i\wedge t}\big) \,\Big|\, \mathcal{F}_s\Big] \\
&= \sum_{i=m}^{k-1} E\Big[\xi_i^2\int_{(s,t]}\mathbf{1}_{(u_i,u_{i+1}]}(u)\, d[M]_u \,\Big|\, \mathcal{F}_s\Big] \\
&= E\Big[\int_{(s,t]}\Big(\xi_0^2\,\mathbf{1}_{\{0\}}(u) + \sum_{i=1}^{k-1}\xi_i^2\,\mathbf{1}_{(t_i,t_{i+1}]}(u)\Big)\, d[M]_u \,\Big|\, \mathcal{F}_s\Big] \\
&= E\Big[\int_{(s,t]}\Big(\xi_0\,\mathbf{1}_{\{0\}}(u) + \sum_{i=1}^{k-1}\xi_i\,\mathbf{1}_{(t_i,t_{i+1}]}(u)\Big)^2\, d[M]_u \,\Big|\, \mathcal{F}_s\Big] \\
&= E\Big[\int_{(s,t]} X_n(u,\omega)^2\, d[M]_u(\omega) \,\Big|\, \mathcal{F}_s\Big].
\end{aligned}$$
Inside the $d[M]_u$ integral above we replaced the $u_i$'s with $t_i$'s because for $u \in (s,t]$, $\mathbf{1}_{(u_i,u_{i+1}]}(u) = \mathbf{1}_{(t_i,t_{i+1}]}(u)$. Also, we brought in the terms for $i < m$ because these do not influence the integral, as they are supported on $[0,t_m]$ which is disjoint from $(s,t]$.
Next let $X \in \mathcal{L}_2$ be general, and $X_n \to X$ in $\mathcal{L}_2$. The limit $n \to \infty$ is best taken with expectations, so we rewrite the conclusion of the previous calculation as
$$E\big[\big((X_n\cdot M)_t - (X_n\cdot M)_s\big)^2\,\mathbf{1}_A\big] = E\Big[\mathbf{1}_A\int_{(s,t]} X_n^2(u)\, d[M]_u\Big]$$
for an arbitrary $A \in \mathcal{F}_s$. Rewrite this once again as
$$E\big[(X_n\cdot M)_t^2\,\mathbf{1}_A\big] - E\big[(X_n\cdot M)_s^2\,\mathbf{1}_A\big] = \int_{(s,t]\times A} X_n^2\, d\mu_M.$$
All terms in this equality converge to the corresponding integrals with $X_n$ replaced by $X$, because $(X_n\cdot M)_t \to (X\cdot M)_t$ in $L^2(P)$ and $X_n \to X$ in $L^2((0,t]\times\Omega, \mu_M)$ (see Lemma A.16 in the Appendix for the general idea). As $A \in \mathcal{F}_s$ is arbitrary, (5.20) is proved.
Given stopping times $\sigma$ and $\tau$ we can define various stochastic intervals. These are subsets of $\mathbb{R}_+\times\Omega$. Here are two examples which are elements of $\mathcal{P}$:
$$[0,\tau] = \{(t,\omega)\in\mathbb{R}_+\times\Omega : 0 \le t \le \tau(\omega)\},$$
and
$$(\sigma,\tau] = \{(t,\omega)\in\mathbb{R}_+\times\Omega : \sigma(\omega) < t \le \tau(\omega)\}.$$
If $\tau(\omega) = \infty$, the $\omega$-section of $[0,\tau]$ is $[0,\infty)$, because $(\infty,\omega)$ is not a point in the space $\mathbb{R}_+\times\Omega$. If $\sigma(\omega) = \infty$ then the $\omega$-section of $(\sigma,\tau]$ is empty. The path $t \mapsto \mathbf{1}_{[0,\tau]}(t,\omega) = \mathbf{1}_{[0,\tau(\omega)]}(t)$ is adapted and left-continuous with right limits. Hence by Lemma 5.1 the indicator function $\mathbf{1}_{[0,\tau]}$ is $\mathcal{P}$-measurable. The same goes for $\mathbf{1}_{(\sigma,\tau]}$. If $X$ is a predictable process, then so is the product $\mathbf{1}_{[0,\tau]}X$.
Recall also the notion of a stopped process $M^\tau$ defined by $M^\tau_t = M_{\tau\wedge t}$. If $M \in \mathcal{M}_2$ then also $M^\tau \in \mathcal{M}_2$, because Lemma 3.5 implies
$$E\big[M^2_{\tau\wedge t}\big] \le 2E\big[M^2_t\big] + E\big[M^2_0\big].$$
We insert a lemma on the effect of stopping on the Doléans measure.

Lemma 5.15. Let $M \in \mathcal{M}_2$ and $\tau$ a stopping time. Then for any $\mathcal{P}$-measurable nonnegative function $Y$,
$$(5.21)\qquad \int_{\mathbb{R}_+\times\Omega} Y\, d\mu_{M^\tau} = \int_{\mathbb{R}_+\times\Omega} \mathbf{1}_{[0,\tau]}Y\, d\mu_M.$$
Proof. Consider first a nondecreasing cadlag function $G$ on $[0,\infty)$. For $u > 0$, define the stopped function $G^u$ by $G^u(t) = G(u\wedge t)$. Then the Lebesgue-Stieltjes measures satisfy
$$\int_{(0,\infty)} h\, d\Lambda_{G^u} = \int_{(0,\infty)} \mathbf{1}_{(0,u]}\,h\, d\Lambda_G$$
for every nonnegative Borel function $h$. This can be justified by the $\pi$-$\lambda$ Theorem. For any interval $(s,t]$,
$$\Lambda_{G^u}(s,t] = G^u(t) - G^u(s) = G(u\wedge t) - G(u\wedge s) = \Lambda_G\big((s,t]\cap(0,u]\big).$$
Then by Lemma B.3 the measures $\Lambda_{G^u}$ and $\Lambda_G(\,\cdot\cap(0,u])$ coincide on all Borel sets of $(0,\infty)$. The equality extends to $[0,\infty)$ if we set $G(0-) = G(0)$ so that the measure of $\{0\}$ is zero under both measures.

Now fix $\omega$ and apply the preceding. By Lemma 3.27, $[M^\tau] = [M]^\tau$, and so
$$\int_{[0,\infty)} Y(s,\omega)\, d[M^\tau]_s(\omega) = \int_{[0,\infty)} Y(s,\omega)\, d[M]^\tau_s(\omega) = \int_{[0,\infty)} \mathbf{1}_{[0,\tau(\omega)]}(s)\,Y(s,\omega)\, d[M]_s(\omega).$$
Taking expectation over this equality gives the conclusion.
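The stopped-measure identity $\Lambda_{G^u}(s,t] = \Lambda_G\big((s,t]\cap(0,u]\big)$ used in the proof can be checked on a concrete example. The function $G$ below (a drift plus one jump) is an assumed illustration, not from the text:

```python
# Check of Lambda_{G^u}(s,t] = Lambda_G((s,t] intersect (0,u]) for a
# concrete nondecreasing cadlag function (assumed example: drift + jump).
def G(t):
    return t + (2.0 if t >= 1.0 else 0.0)   # jump of size 2 at t = 1

u = 1.5
def Gu(t):                                   # the stopped function G^u
    return G(min(u, t))

for s, t in [(0.0, 0.5), (0.5, 1.2), (1.0, 3.0), (2.0, 4.0)]:
    lhs = Gu(t) - Gu(s)                      # Lambda_{G^u}(s, t]
    a, b = min(s, u), min(t, u)              # (s,t] intersect (0,u] = (a,b]
    rhs = G(b) - G(a) if b > a else 0.0      # Lambda_G of the intersection
    assert abs(lhs - rhs) < 1e-12
print("stopped measure identity verified on all test intervals")
```

Note that the interval $(2.0, 4.0]$, which lies entirely beyond $u$, correctly receives zero mass under both expressions.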
The lemma implies that the measure $\mu_{M^\tau}$ is absolutely continuous with respect to $\mu_M$, and furthermore that $\mathcal{L}_2(M,\mathcal{P}) \subseteq \mathcal{L}_2(M^\tau,\mathcal{P})$.
Proposition 5.16. Let $M \in \mathcal{M}_2$, $X \in \mathcal{L}_2(M,\mathcal{P})$, and let $\tau$ be a stopping time.

(a) Let $Z$ be a bounded $\mathcal{F}_\tau$-measurable random variable. Then $Z\mathbf{1}_{(\tau,\infty)}X$ and $\mathbf{1}_{(\tau,\infty)}X$ are both members of $\mathcal{L}_2(M,\mathcal{P})$, and
$$(5.22)\qquad \int_{(0,t]} Z\mathbf{1}_{(\tau,\infty)}X\, dM = Z\int_{(0,t]} \mathbf{1}_{(\tau,\infty)}X\, dM.$$
(b) The integral behaves as follows under stopping:
$$(5.23)\qquad \big((\mathbf{1}_{[0,\tau]}X)\cdot M\big)_t = (X\cdot M)_{\tau\wedge t} = (X\cdot M^\tau)_t.$$

(c) Let also $N \in \mathcal{M}_2$ and $Y \in \mathcal{L}_2(N,\mathcal{P})$. Suppose there is a stopping time $\sigma$ such that $X_t(\omega) = Y_t(\omega)$ and $M_t(\omega) = N_t(\omega)$ for $0 \le t \le \sigma(\omega)$. Then $(X\cdot M)_{\sigma\wedge t} = (Y\cdot N)_{\sigma\wedge t}$.

Remark 5.17. Equation (5.23) implies that $\tau$ can appear in any subset of the three locations. For example,
$$(5.24)\qquad (X\cdot M)_{\tau\wedge t} = (X\cdot M^\tau)_{\tau\wedge t} = (X\cdot M^\tau)_t = \big((\mathbf{1}_{[0,\tau]}X)\cdot M^\tau\big)_t.$$
Proof. (a) $Z\mathbf{1}_{(\tau,\infty)}$ is $\mathcal{P}$-measurable because it is an adapted caglad process. (This process equals $Z$ if $t > \tau$, otherwise it vanishes. If $t > \tau$ then $\mathcal{F}_\tau \subseteq \mathcal{F}_t$, which implies that $Z$ is $\mathcal{F}_t$-measurable.) This takes care of the measurability issue. Multiplying $X \in \mathcal{L}_2(M,\mathcal{P})$ by something bounded and $\mathcal{P}$-measurable creates a process in $\mathcal{L}_2(M,\mathcal{P})$.
Assume first $\tau = u$, a deterministic time. Let $X_n$ as in (5.16) approximate $X$ in $\mathcal{L}_2$. Then
$$\mathbf{1}_{(u,\infty)}X_n = \sum_{i=1}^{k-1}\xi_i\,\mathbf{1}_{(u\vee t_i,\; u\vee t_{i+1}]}$$
approximates $\mathbf{1}_{(u,\infty)}X$ in $\mathcal{L}_2$. And
$$Z\mathbf{1}_{(u,\infty)}X_n = \sum_{i=1}^{k-1}Z\xi_i\,\mathbf{1}_{(u\vee t_i,\; u\vee t_{i+1}]}$$
are elements of $\mathcal{S}_2$ that approximate $Z\mathbf{1}_{(u,\infty)}X$ in $\mathcal{L}_2$. Their integrals are
$$\big((Z\mathbf{1}_{(u,\infty)}X_n)\cdot M\big)_t = \sum_{i=1}^{k-1}Z\xi_i\big(M_{(u\vee t_{i+1})\wedge t} - M_{(u\vee t_i)\wedge t}\big) = Z\big((\mathbf{1}_{(u,\infty)}X_n)\cdot M\big)_t.$$
Letting $n \to \infty$ along a suitable subsequence gives almost sure convergence of both sides of this equality to the corresponding terms in (5.22) at time $t$, in the case $\tau = u$.
Now let $\tau$ be a general stopping time. Define $\tau_m$ by
$$\tau_m = \begin{cases} i2^{-m}, & \text{if } (i-1)2^{-m} \le \tau < i2^{-m} \text{ for some } 1 \le i \le 2^m m, \\ \infty, & \text{if } \tau \ge m. \end{cases}$$
Pointwise $\tau_m \searrow \tau$ as $m \to \infty$, and $\mathbf{1}_{(\tau_m,\infty)} \nearrow \mathbf{1}_{(\tau,\infty)}$. Both
$$\mathbf{1}_{\{(i-1)2^{-m}\le\tau<i2^{-m}\}} \quad\text{and}\quad \mathbf{1}_{\{(i-1)2^{-m}\le\tau<i2^{-m}\}}\,Z$$
are $\mathcal{F}_{i2^{-m}}$-measurable for each $i$. (The former by definition of a stopping time, the latter by Exercise 2.9.) The first part proved above applies to each such random variable with $u = i2^{-m}$.
$$\begin{aligned}
(Z\mathbf{1}_{(\tau_m,\infty)}X)\cdot M &= \Big(\sum_{i=1}^{2^m m}\mathbf{1}_{\{(i-1)2^{-m}\le\tau<i2^{-m}\}}\,Z\,\mathbf{1}_{(i2^{-m},\infty)}X\Big)\cdot M \\
&= Z\sum_{i=1}^{2^m m}\mathbf{1}_{\{(i-1)2^{-m}\le\tau<i2^{-m}\}}\,\big(\mathbf{1}_{(i2^{-m},\infty)}X\big)\cdot M \\
&= Z\Big(\Big(\sum_{i=1}^{2^m m}\mathbf{1}_{\{(i-1)2^{-m}\le\tau<i2^{-m}\}}\,\mathbf{1}_{(i2^{-m},\infty)}X\Big)\cdot M\Big) = Z\big((\mathbf{1}_{(\tau_m,\infty)}X)\cdot M\big).
\end{aligned}$$
Let $m \to \infty$. Because $Z\mathbf{1}_{(\tau_m,\infty)}X \to Z\mathbf{1}_{(\tau,\infty)}X$ and $\mathbf{1}_{(\tau_m,\infty)}X \to \mathbf{1}_{(\tau,\infty)}X$ in $\mathcal{L}_2$, both extreme members of the equalities above converge in $\mathcal{M}_2$ to the corresponding martingales with $\tau_m$ replaced by $\tau$. This completes the proof of part (a).
Part (b). We prove the first equality in (5.23). Let $\tau_n = 2^{-n}(\lfloor 2^n\tau\rfloor + 1)$ be the usual discrete approximation that converges down to $\tau$ as $n \to \infty$. Let $\ell(n) = \lfloor 2^n t\rfloor + 1$. Since $\tau \ge k2^{-n}$ if and only if $\tau_n \ge (k+1)2^{-n}$,
$$\begin{aligned}
(X\cdot M)_{\tau_n\wedge t} &= \sum_{k=0}^{\ell(n)}\mathbf{1}_{\{\tau\ge k2^{-n}\}}\big((X\cdot M)_{(k+1)2^{-n}\wedge t} - (X\cdot M)_{k2^{-n}\wedge t}\big) \\
&= \sum_{k=0}^{\ell(n)}\mathbf{1}_{\{\tau\ge k2^{-n}\}}\int_{(0,t]}\mathbf{1}_{(k2^{-n},(k+1)2^{-n}]}X\, dM \\
&= \sum_{k=0}^{\ell(n)}\int_{(0,t]}\mathbf{1}_{\{\tau\ge k2^{-n}\}}\mathbf{1}_{(k2^{-n},(k+1)2^{-n}]}X\, dM \\
&= \int_{(0,t]}\Big(\mathbf{1}_{\{0\}}X + \sum_{k=0}^{\ell(n)}\mathbf{1}_{\{\tau\ge k2^{-n}\}}\mathbf{1}_{(k2^{-n},(k+1)2^{-n}]}X\Big)\, dM \\
&= \int_{(0,t]}\mathbf{1}_{[0,\tau_n]}X\, dM.
\end{aligned}$$
In the calculation above, the second equality comes from (5.19), the third from (5.22) where $Z$ is the $\mathcal{F}_{k2^{-n}}$-measurable $\mathbf{1}_{\{\tau\ge k2^{-n}\}}$. The next to last equality uses additivity and adds in the term $\mathbf{1}_{\{0\}}X$ that integrates to zero. The last equality comes from the observation that for $s \in [0,t]$
$$\mathbf{1}_{[0,\tau_n]}(s,\omega) = \mathbf{1}_{\{0\}}(s) + \sum_{k=0}^{\ell(n)}\mathbf{1}_{\{\tau(\omega)\ge k2^{-n}\}}\,\mathbf{1}_{(k2^{-n},(k+1)2^{-n}]}(s).$$
Now let $n \to \infty$. By right-continuity, $(X\cdot M)_{\tau_n\wedge t} \to (X\cdot M)_{\tau\wedge t}$. To show that the last term of the string of equalities converges to $\big((\mathbf{1}_{[0,\tau]}X)\cdot M\big)_t$, it suffices to show, by the isometry (5.12), that
$$\lim_{n\to\infty}\int_{[0,t]\times\Omega}\big|\mathbf{1}_{[0,\tau_n]}X - \mathbf{1}_{[0,\tau]}X\big|^2\, d\mu_M = 0.$$
This follows from dominated convergence. The integrand vanishes as $n \to \infty$ because
$$\mathbf{1}_{[0,\tau_n]}(t,\omega) - \mathbf{1}_{[0,\tau]}(t,\omega) = \begin{cases} 0, & \tau(\omega) = \infty, \\ \mathbf{1}_{\{\tau(\omega) < t \le \tau_n(\omega)\}}, & \tau(\omega) < \infty, \end{cases}$$
and $\tau_n(\omega) \searrow \tau(\omega)$. The integrand is bounded by $|X|^2$ for all $n$, and
$$\int_{[0,t]\times\Omega}|X|^2\, d\mu_M < \infty$$
by the assumption $X \in \mathcal{L}_2$. This completes the proof of the first equality in (5.23).
We turn to proving the second equality of (5.23). Let $X_n \in \mathcal{S}_2$ as in (5.16) approximate $X$ in $\mathcal{L}_2(M,\mathcal{P})$. By (5.21), $X \in \mathcal{L}_2(M^\tau,\mathcal{P})$ and the processes $X_n$ approximate $X$ also in $\mathcal{L}_2(M^\tau,\mathcal{P})$. Comparing their integrals, we get
$$(X_n\cdot M^\tau)_t = \sum_i \xi_i\big(M^\tau_{t_{i+1}\wedge t} - M^\tau_{t_i\wedge t}\big) = \sum_i \xi_i\big(M_{t_{i+1}\wedge\tau\wedge t} - M_{t_i\wedge\tau\wedge t}\big) = (X_n\cdot M)_{\tau\wedge t}.$$
By the definition of the stochastic integral $X\cdot M^\tau$, the random variables $(X_n\cdot M^\tau)_t$ converge to $(X\cdot M^\tau)_t$ in $L^2$ as $n \to \infty$.

We cannot appeal to the definition of the integral to assert the convergence of $(X_n\cdot M)_{\tau\wedge t}$ to $(X\cdot M)_{\tau\wedge t}$ because the time point is random. However, martingales afford strong control of their paths. $Y_n(t) = (X_n\cdot M)_t - (X\cdot M)_t$ is an $L^2$ martingale with $Y_n(0) = 0$. Lemma 3.5 applied to the submartingale $Y_n^2(t)$ implies
$$E\big[Y_n^2(t\wedge\tau)\big] \le 2E\big[Y_n^2(t)\big] = 2E\Big[\big((X_n\cdot M)_t - (X\cdot M)_t\big)^2\Big]$$
and this last expectation vanishes as $n \to \infty$ by the definition of $X\cdot M$. Consequently
$$(X_n\cdot M)_{\tau\wedge t} \to (X\cdot M)_{\tau\wedge t} \quad\text{in } L^2.$$
This completes the proof of the second equality in (5.23).
Part (c). Since $\mathbf{1}_{[0,\sigma]}X = \mathbf{1}_{[0,\sigma]}Y$ and $M^\sigma = N^\sigma$,
$$(X\cdot M)_{\sigma\wedge t} = \big((\mathbf{1}_{[0,\sigma]}X)\cdot M^\sigma\big)_t = \big((\mathbf{1}_{[0,\sigma]}Y)\cdot N^\sigma\big)_t = (Y\cdot N)_{\sigma\wedge t}.$$
Example 5.18. Let us record some simple integrals as consequences of the properties.

(a) Let $\sigma \le \tau$ be two stopping times, and $\xi$ a bounded $\mathcal{F}_\sigma$-measurable random variable. Define $X = \xi\,\mathbf{1}_{(\sigma,\tau]}$, or more explicitly,
$$X_t(\omega) = \xi(\omega)\,\mathbf{1}_{(\sigma(\omega),\tau(\omega)]}(t).$$
As an adapted caglad process, $X$ is predictable. Let $M$ be an $L^2$-martingale. Pick a constant $C \ge \sup_\omega|\xi(\omega)|$. Then for any $T < \infty$,
$$\int_{[0,T]\times\Omega} X^2\, d\mu_M = E\Big[\xi^2\big([M]_{T\wedge\tau} - [M]_{T\wedge\sigma}\big)\Big] \le C^2 E\big[[M]_{T\wedge\tau}\big] \le C^2 E\big[M^2_{T\wedge\tau}\big] \le C^2 E\big[M^2_T\big] < \infty.$$
Thus $X \in \mathcal{L}_2(M,\mathcal{P})$. By (5.22) and (5.23),
$$\begin{aligned}
X\cdot M &= (\xi\,\mathbf{1}_{(\sigma,\infty)}\mathbf{1}_{[0,\tau]})\cdot M = \xi\big((\mathbf{1}_{(\sigma,\infty)}\mathbf{1}_{[0,\tau]})\cdot M\big) \\
&= \xi\big((\mathbf{1}_{[0,\tau]} - \mathbf{1}_{[0,\sigma]})\cdot M\big) = \xi\big((\mathbf{1}\cdot M)^\tau - (\mathbf{1}\cdot M)^\sigma\big) = \xi\,(M^\tau - M^\sigma).
\end{aligned}$$
Above we used $\mathbf{1}$ to denote the function or process that is identically one.
(b) Continuing the example, consider a sequence $0 \le \sigma_1 \le \sigma_2 \le \cdots \le \sigma_i \le \cdots$ of stopping times, and random variables $\{\xi_i : i \ge 1\}$ such that $\xi_i$ is $\mathcal{F}_{\sigma_i}$-measurable and $C = \sup_{i,\omega}|\xi_i(\omega)| < \infty$. Let
$$X(t) = \sum_{i=1}^{\infty}\xi_i\,\mathbf{1}_{(\sigma_i,\sigma_{i+1}]}(t).$$
As a bounded caglad process, $X \in \mathcal{L}_2(M,\mathcal{P})$ for any $L^2$-martingale $M$. Let
$$X_n(t) = \sum_{i=1}^{n}\xi_i\,\mathbf{1}_{(\sigma_i,\sigma_{i+1}]}(t).$$
By part (a) of the example and the additivity of the integral,
$$X_n\cdot M = \sum_{i=1}^{n}\xi_i\big(M^{\sigma_{i+1}} - M^{\sigma_i}\big).$$
$X_n \to X$ pointwise. And since $|X - X_n| \le 2C$,
$$\int_{[0,T]\times\Omega}|X - X_n|^2\, d\mu_M \to 0$$
for any $T < \infty$ by dominated convergence. Consequently $X_n \to X$ in $\mathcal{L}_2(M,\mathcal{P})$, and then by the isometry, $X_n\cdot M \to X\cdot M$ in $\mathcal{M}_2$. From the formula for $X_n\cdot M$ it is clear where it converges pointwise, and this limit must agree with the $\mathcal{M}_2$ limit. The conclusion is
$$X\cdot M = \sum_{i=1}^{\infty}\xi_i\big(M^{\sigma_{i+1}} - M^{\sigma_i}\big).$$
As the last issue of this section, we consider integrating a given process $X$ with respect to more than one martingale.

Proposition 5.19. Let $M, N \in \mathcal{M}_2$, $\alpha, \beta \in \mathbb{R}$, and $X \in \mathcal{L}_2(M,\mathcal{P}) \cap \mathcal{L}_2(N,\mathcal{P})$. Then $X \in \mathcal{L}_2(\alpha M + \beta N,\mathcal{P})$, and
$$(5.25)\qquad X\cdot(\alpha M + \beta N) = \alpha(X\cdot M) + \beta(X\cdot N).$$
Lemma 5.20. For a predictable process $Y$,
$$\Big(\int_{[0,T]\times\Omega}|Y|^2\, d\mu_{\alpha M+\beta N}\Big)^{1/2} \le |\alpha|\Big(\int_{[0,T]\times\Omega}|Y|^2\, d\mu_M\Big)^{1/2} + |\beta|\Big(\int_{[0,T]\times\Omega}|Y|^2\, d\mu_N\Big)^{1/2}.$$
Proof. The linearity
$$[\alpha M + \beta N] = \alpha^2[M] + 2\alpha\beta[M,N] + \beta^2[N]$$
is inherited by the Lebesgue-Stieltjes measures. By the Kunita-Watanabe inequality (2.17),
$$\begin{aligned}
\int_{[0,T]}|Y_s|^2\, d[\alpha M+\beta N]_s &= \alpha^2\int_{[0,T]}|Y_s|^2\, d[M]_s + 2\alpha\beta\int_{[0,T]}|Y_s|^2\, d[M,N]_s + \beta^2\int_{[0,T]}|Y_s|^2\, d[N]_s \\
&\le \alpha^2\int_{[0,T]}|Y_s|^2\, d[M]_s + 2|\alpha||\beta|\Big(\int_{[0,T]}|Y_s|^2\, d[M]_s\Big)^{1/2}\Big(\int_{[0,T]}|Y_s|^2\, d[N]_s\Big)^{1/2} \\
&\qquad + \beta^2\int_{[0,T]}|Y_s|^2\, d[N]_s.
\end{aligned}$$
The above integrals are Lebesgue-Stieltjes integrals over $[0,T]$, evaluated at a fixed $\omega$. Take expectations and apply the Schwarz inequality to the middle term.
Proof of Proposition 5.19. The Lemma shows that $X \in \mathcal{L}_2(\alpha M+\beta N,\mathcal{P})$. Replace the measure $\mu_M$ in the proof of Lemma 5.10 with the measure $\mu = \mu_M + \mu_N$. The proof works exactly as before, and gives a sequence of simple predictable processes $X_n$ such that
$$\int_{[0,T]\times\Omega}|X - X_n|^2\, d(\mu_M + \mu_N) \to 0$$
for each $T < \infty$. This combined with the previous lemma says that $X_n \to X$ simultaneously in the spaces $\mathcal{L}_2(M,\mathcal{P})$, $\mathcal{L}_2(N,\mathcal{P})$, and $\mathcal{L}_2(\alpha M+\beta N,\mathcal{P})$. (5.25) holds for $X_n$ by the explicit formula for the integral of a simple predictable process, and the general conclusion follows by taking the limit.
5.2. Local square-integrable martingale integrator

Recall that a cadlag process $M$ is a local $L^2$-martingale if there exists a nondecreasing sequence of stopping times $\{\tau_k\}$ such that $\tau_k \nearrow \infty$ almost surely, and for each $k$ the stopped process $M^{\tau_k} = \{M_{\tau_k\wedge t} : t \in \mathbb{R}_+\}$ is an $L^2$-martingale. The sequence $\{\tau_k\}$ is a localizing sequence for $M$. $\mathcal{M}_{2,\mathrm{loc}}$ is the space of cadlag, local $L^2$-martingales.

We wish to define a stochastic integral $X\cdot M$ where $M$ can be a local $L^2$-martingale. The earlier approach via an $L^2$ isometry will not do because the whole point is to get rid of integrability assumptions. We start by defining the class of integrands. Even for an $L^2$-martingale this gives us integrands beyond the $\mathcal{L}_2$-space of the previous section.
Definition 5.21. Given a local square-integrable martingale $M$, let $\mathcal{L}(M,\mathcal{P})$ denote the class of predictable processes $X$ which have the following property: there exists a sequence of stopping times $0 \le \sigma_1 \le \sigma_2 \le \sigma_3 \le \cdots \le \sigma_k \le \cdots$ such that

(i) $P\{\sigma_k \nearrow \infty\} = 1$,

(ii) $M^{\sigma_k}$ is a square-integrable martingale for each $k$, and

(iii) the process $\mathbf{1}_{[0,\sigma_k]}X$ lies in $\mathcal{L}_2(M^{\sigma_k},\mathcal{P})$ for each $k$.

Let us call such a sequence of stopping times a localizing sequence for the pair $(X,M)$.
By our earlier development, for each $k$ the stochastic integral
$$Y^k = (\mathbf{1}_{[0,\sigma_k]}X)\cdot M^{\sigma_k}$$
exists as an element of $\mathcal{M}_2$. The idea will be now to exploit the consistency in the sequence of stochastic integrals $\{Y^k\}$, which enables us to define $(X\cdot M)_t(\omega)$ for a fixed $(t,\omega)$ by the recipe "take $Y^k_t(\omega)$ for a large enough $k$." First a lemma that justifies the approach.
Lemma 5.22. Let $M \in \mathcal{M}_{2,\mathrm{loc}}$ and let $X$ be a predictable process. Suppose $\sigma$ and $\tau$ are two stopping times such that $M^\sigma$ and $M^\tau$ are cadlag $L^2$-martingales, $\mathbf{1}_{[0,\sigma]}X \in \mathcal{L}_2(M^\sigma,\mathcal{P})$ and $\mathbf{1}_{[0,\tau]}X \in \mathcal{L}_2(M^\tau,\mathcal{P})$. Let
$$Z_t = \int_{(0,t]}\mathbf{1}_{[0,\sigma]}X\, dM^\sigma \quad\text{and}\quad W_t = \int_{(0,t]}\mathbf{1}_{[0,\tau]}X\, dM^\tau$$
denote the stochastic integrals, which are cadlag $L^2$-martingales. Then
$$Z_{\sigma\wedge\tau\wedge t} = W_{\sigma\wedge\tau\wedge t},$$
where we mean that the two processes are indistinguishable.
Proof. A short derivation based on (5.23)-(5.24) and some simple observations: $(M^\sigma)^\tau = (M^\tau)^\sigma = M^{\sigma\wedge\tau}$, and $\mathbf{1}_{[0,\sigma]}X$, $\mathbf{1}_{[0,\tau]}X$ both lie in $\mathcal{L}_2(M^{\sigma\wedge\tau},\mathcal{P})$.
$$\begin{aligned}
Z_{\sigma\wedge\tau\wedge t} &= \big((\mathbf{1}_{[0,\sigma]}X)\cdot M^\sigma\big)_{\tau\wedge t} = \big((\mathbf{1}_{[0,\tau]}\mathbf{1}_{[0,\sigma]}X)\cdot (M^\sigma)^\tau\big)_t \\
&= \big((\mathbf{1}_{[0,\sigma]}\mathbf{1}_{[0,\tau]}X)\cdot (M^\tau)^\sigma\big)_t = \big((\mathbf{1}_{[0,\tau]}X)\cdot M^\tau\big)_{\sigma\wedge t} = W_{\sigma\wedge\tau\wedge t}.
\end{aligned}$$
Let $\Omega_0$ be the following event:
$$(5.26)\qquad \Omega_0 = \Big\{\omega : \sigma_k(\omega) \nearrow \infty \text{ as } k\to\infty, \text{ and for all pairs } (k,m),\; Y^k_{t\wedge\sigma_k\wedge\sigma_m}(\omega) = Y^m_{t\wedge\sigma_k\wedge\sigma_m}(\omega) \text{ for all } t\in\mathbb{R}_+\Big\}.$$
$P(\Omega_0) = 1$ by the assumption $P\{\sigma_k \nearrow \infty\} = 1$, by the previous lemma, and because there are countably many pairs $(k,m)$. To rephrase this, on the event $\Omega_0$, if $k$ and $m$ are indices such that $t \le \sigma_k\wedge\sigma_m$, then $Y^k_t = Y^m_t$. This makes the definition below sensible.
Definition 5.23. Let $M \in \mathcal{M}_{2,\mathrm{loc}}$, $X \in \mathcal{L}(M,\mathcal{P})$, and let $\{\sigma_k\}$ be a localizing sequence for $(X,M)$. Define the event $\Omega_0$ as in the previous paragraph. The stochastic integral $X\cdot M$ is the cadlag local $L^2$-martingale defined as follows: on the event $\Omega_0$ set
$$(5.27)\qquad (X\cdot M)_t(\omega) = \big((\mathbf{1}_{[0,\sigma_k]}X)\cdot M^{\sigma_k}\big)_t(\omega) \quad\text{for any } k \text{ such that } \sigma_k(\omega) \ge t.$$
Outside the event $\Omega_0$ set $(X\cdot M)_t = 0$ for all $t$.

This definition is independent of the localizing sequence $\{\sigma_k\}$ in the sense that using any other localizing sequence of stopping times gives a process indistinguishable from $X\cdot M$ defined above.
Justification of the definition. The process $X\cdot M$ is cadlag on any bounded interval $[0,T]$ for the following reasons. If $\omega \notin \Omega_0$ the process is constant in time. If $\omega \in \Omega_0$, pick $k$ large enough so that $\sigma_k(\omega) > T$, and note that the path $t \mapsto (X\cdot M)_t(\omega)$ coincides with the cadlag path $t \mapsto Y^k_t(\omega)$ on the interval $[0,T]$. Being cadlag on all bounded intervals is the same as being cadlag on $\mathbb{R}_+$, so it follows that $X\cdot M$ is a cadlag process.

The stopped process satisfies
$$(X\cdot M)^{\sigma_k}_t = (X\cdot M)_{\sigma_k\wedge t} = Y^k_{\sigma_k\wedge t} = (Y^k)^{\sigma_k}_t$$
because by definition $X\cdot M = Y^k$ (almost surely) on $[0,\sigma_k]$. $Y^k$ is an $L^2$-martingale, hence so is $(Y^k)^{\sigma_k}$. Consequently $(X\cdot M)^{\sigma_k}$ is a cadlag $L^2$-martingale. This shows that $X\cdot M$ is a cadlag local $L^2$-martingale.
To take up the last issue, let $\{\tau_j\}$ be another localizing sequence of stopping times for $(X,M)$. Let
$$W^j_t = \int_{(0,t]}\mathbf{1}_{[0,\tau_j]}X\, dM^{\tau_j}.$$
Corresponding to the event $\Omega_0$ and definition (5.27) from above, based on $\{\tau_j\}$ we define an event $\Omega_1$ with $P(\Omega_1) = 1$, and on $\Omega_1$ an alternative stochastic integral by
$$(5.28)\qquad W_t(\omega) = W^j_t(\omega) \quad\text{for } j \text{ such that } \tau_j(\omega) \ge t.$$
Lemma 5.22 implies that the processes $W^j_{t\wedge\tau_j\wedge\sigma_k}$ and $Y^k_{t\wedge\tau_j\wedge\sigma_k}$ are indistinguishable. Let $\Omega_2$ be the set of $\omega \in \Omega_0\cap\Omega_1$ for which
$$W^j_{t\wedge\tau_j\wedge\sigma_k}(\omega) = Y^k_{t\wedge\tau_j\wedge\sigma_k}(\omega) \quad\text{for all } t\in\mathbb{R}_+ \text{ and all pairs } (j,k).$$
$P(\Omega_2) = 1$ because it is an intersection of countably many events of probability one. We claim that for $\omega \in \Omega_2$, $(X\cdot M)_t(\omega) = W_t(\omega)$ for all $t \in \mathbb{R}_+$. Given $t$, pick $j$ and $k$ so that $\tau_j(\omega)\wedge\sigma_k(\omega) \ge t$. Then, using (5.27), (5.28) and $\omega \in \Omega_2$,
$$(X\cdot M)_t(\omega) = Y^k_t(\omega) = W^j_t(\omega) = W_t(\omega).$$
We have shown that $X\cdot M$ and $W$ are indistinguishable, so the definition of $X\cdot M$ does not depend on the particular localizing sequence used.

We have justified all the claims made in the definition.
Remark 5.24 (Irrelevance of the time origin). The value $X_0$ does not affect anything above because $\mu_Z(\{0\}\times\Omega) = 0$ for any $L^2$-martingale $Z$. If a predictable $X$ is given and $\widetilde X_t = 1_{(0,\infty)}(t)X_t$, then $\mu_Z\{X \ne \widetilde X\} = 0$. In particular, $\{\sigma_k\}$ is a localizing sequence for $(X,M)$ iff it is a localizing sequence for $(\widetilde X, M)$, and $X\cdot M = \widetilde X\cdot M$ if a localizing sequence exists. Also, in part (iii) of Definition 5.21 we can equivalently require that $1_{(0,\sigma_k]}X$ lies in $\mathcal{L}_2(M^{\sigma_k}, \mathcal{P})$.
Remark 5.25 (Path continuity). If the local $L^2$-martingale $M$ has continuous paths to begin with, then so do the $M^{\sigma_k}$, hence also the integrals $(1_{[0,\sigma_k]}X)\cdot M^{\sigma_k}$ have continuous paths, and the integral $X\cdot M$ has continuous paths.

Example 5.26. Compared with Example 5.4, with Brownian motion and the compensated Poisson process we can now integrate predictable processes $X$ that satisfy
\[
\int_0^T X(s,\omega)^2\, ds < \infty \quad\text{for all } T<\infty, \text{ for $P$-almost every } \omega.
\]
(Exercise 5.7 asks you to verify this.) Again, we know from Chapter 4 that predictability is not really needed for integrands when the integrator is Brownian motion.
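As a numerical aside (an illustration I am adding, with hypothetical variable names): the integrand $X_s = e^{B_s^2}$ satisfies the pathwise condition above with probability one, while $E[X_t^2] = E[e^{2B_t^2}] = \infty$ once $t \ge 1/4$, so it lies outside the earlier $L^2$ framework but inside the localized one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Brownian paths on [0, 1].
n_paths, n_steps = 200, 500
dt = 1.0 / n_steps
dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
B = np.hstack([np.zeros((n_paths, 1)), np.cumsum(dB, axis=1)])

# Integrand X_s = exp(B_s^2): every simulated path gives a finite value of
# int_0^1 X(s)^2 ds, even though E[X_t^2] = E[exp(2 B_t^2)] = infinity for
# t >= 1/4, so X is admissible only for the localized theory.
pathwise_integrals = np.sum(np.exp(2.0 * B[:, :-1] ** 2) * dt, axis=1)
```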
158 5. Stochastic Integral
Property (iii) of Definition 5.21 made the localization argument of the definition of $X\cdot M$ work. In important special cases property (iii) follows from this stronger property:

(5.29) there exist stopping times $\nu_k$ such that $\nu_k\nearrow\infty$ almost surely and $1_{(0,\nu_k]}X$ is a bounded process for each $k$.

Let $M$ be an arbitrary local $L^2$-martingale with localizing sequence $\{\tau_k\}$, and assume $X$ is a predictable process that satisfies (5.29). A bounded process is in $\mathcal{L}_2(Z,\mathcal{P})$ for any $L^2$-martingale $Z$, and consequently $1_{(0,\nu_k]}X \in \mathcal{L}_2(M^{\tau_k},\mathcal{P})$. By Remark 5.24 the conclusion extends to $1_{[0,\nu_k]}X$. Thus the stopping times $\sigma_k = \nu_k\wedge\tau_k$ localize the pair $(X,M)$, and the integral $X\cdot M$ is well-defined.

The time origin is left out of $1_{(0,\nu_k]}X$ because $1_{[0,\nu_k]}X$ cannot be bounded unless $X_0$ is bounded. This would be unnecessarily restrictive.
The next proposition lists the most obvious types of predictable processes that satisfy (5.29). In certain cases demonstrating the existence of the stopping times may require a right-continuous filtration. Then one replaces $\mathcal{F}_t$ with $\mathcal{F}_{t+}$. As observed in the beginning of Chapter 3, this can be done without losing any cadlag martingales (or local martingales). Recall also the definition
\[
(5.30)\qquad X^*_T(\omega) = \sup_{0\le t\le T} |X_t(\omega)|
\]
which is $\mathcal{F}_T$-measurable for any left- or right-continuous process $X$, provided we make the filtration complete. (See discussion after (3.10) in Chapter 3.)

Proposition 5.27. The following cases are examples of processes with stopping times $\{\nu_k\}$ that satisfy condition (5.29).

(i) $X$ is predictable, and for each $T<\infty$ there exists a constant $C_T<\infty$ such that, with probability one, $|X_t|\le C_T$ for all $0 < t\le T$. Take $\nu_k = k$.

(ii) $X$ is adapted and has almost surely continuous paths. Take
\[
\nu_k = \inf\{t\ge 0 : |X_t|\ge k\}.
\]

(iii) $X$ is adapted, and there exists an adapted, cadlag process $Y$ such that $X(t) = Y(t-)$ for $t > 0$. Take
\[
\nu_k = \inf\{t > 0 : |Y(t)|\ge k \text{ or } |Y(t-)|\ge k\}.
\]

(iv) $X$ is adapted, has almost surely left-continuous paths, and $X^*_T < \infty$ almost surely for each $T<\infty$. Assume the underlying filtration $\{\mathcal{F}_t\}$ right-continuous. Take
\[
\nu_k = \inf\{t\ge 0 : |X_t| > k\}.
\]
Remark 5.28. Category (ii) is a special case of (iii), and category (iii) is a special case of (iv). Category (iii) seems artificial but will be useful. Notice that every caglad $X$ satisfies $X(t) = Y(t-)$ for the cadlag process $Y$ defined by $Y(t) = X(t+)$, but this $Y$ may fail to be adapted. $Y$ is adapted if $\{\mathcal{F}_t\}$ is right-continuous. But then we find ourselves in Category (iv).

Let us state the most important special case of continuous processes as a corollary in its own right. It follows from Lemma 3.19 and from case (ii) above.

Corollary 5.29. For any continuous local martingale $M$ and continuous, adapted process $X$, the stochastic integral $X\cdot M$ is well-defined.
Proof of Proposition 5.27. Case (i): nothing to prove.

Case (ii). By Lemma 2.10 this $\nu_k$ is a stopping time. A continuous path $t\mapsto X_t(\omega)$ is bounded on compact time intervals. Hence for almost every $\omega$, $\nu_k(\omega)\nearrow\infty$. Again by continuity, $|X_s|\le k$ for $0 < s\le\nu_k$. Note that if $|X_0| > k$ then $\nu_k = 0$, so we cannot claim $1_{[0,\nu_k]}|X_0|\le k$. This is why boundedness cannot be required at time zero.

Case (iii). By Lemma 2.9 this $\nu_k$ is a stopping time. A cadlag path is locally bounded just like a continuous path (Exercise A.1), and so $\nu_k\nearrow\infty$. If $\nu_k > 0$, then $|X(t)| < k$ for $t < \nu_k$, and by left-continuity $|X(\nu_k)|\le k$. Note that $|Y(\nu_k)|\le k$ may fail, so we cannot adapt this argument to $Y$.

Case (iv). By Lemma 2.7 $\nu_k$ is a stopping time since we assume $\{\mathcal{F}_t\}$ right-continuous. As in case (iii), by left-continuity $|X_s|\le k$ for $0 < s\le\nu_k$. Given $\omega$ such that $X^*_T(\omega) < \infty$ for all $T<\infty$, we can choose $k_T > \sup_{0\le t\le T}|X_t(\omega)|$ and then $\nu_k(\omega)\ge T$ for $k\ge k_T$. Thus $\nu_k\nearrow\infty$ almost surely. $\square$
Example 5.30. Let us repeat Example 5.18 without boundedness assumptions. $0\le\sigma_1\le\sigma_2\le\cdots\le\sigma_i\le\cdots$ are stopping times, $\xi_i$ is a finite $\mathcal{F}_{\sigma_i}$-measurable random variable for $i\ge 1$, and
\[
X_t = \sum_{i=1}^{\infty} \xi_i 1_{(\sigma_i,\sigma_{i+1}]}(t).
\]
$X$ is a caglad process, and satisfies the hypotheses of case (iii) of Proposition 5.27. We shall define a concrete localizing sequence. Fix $M\in\mathcal{M}_{2,\mathrm{loc}}$ and let $\{\tau_k\}$ be a localizing sequence for $M$. Define
\[
\nu_k =
\begin{cases}
\sigma_j, & \text{if } \max_{1\le i\le j-1}|\xi_i| \le k < |\xi_j| \text{ for some } j,\\[2pt]
\infty, & \text{if } |\xi_i| \le k \text{ for all } i.
\end{cases}
\]
That $\nu_k$ is a stopping time follows directly from
\[
\{\nu_k\le t\} = \bigcup_{j=1}^{\infty}\Big( \Big\{ \max_{1\le i\le j-1}|\xi_i| \le k < |\xi_j| \Big\} \cap \{\sigma_j\le t\} \Big).
\]
Also $\nu_k\nearrow\infty$ since $\sigma_i\nearrow\infty$. The stopping times $\rho_k = \nu_k\wedge\tau_k$ localize the pair $(X,M)$.

Truncate $\xi^{(k)}_i = (\xi_i\wedge k)\vee(-k)$. If $t\in(\sigma_i\wedge\rho_k,\,\sigma_{i+1}\wedge\rho_k]$ then necessarily $\sigma_i < \nu_k$. This implies $\nu_k\ge\sigma_{i+1}$, which happens iff $\xi_\ell = \xi^{(k)}_\ell$ for $1\le\ell\le i$. Hence
\[
1_{[0,\rho_k]}(t)X_t = \sum_{i=1}^{\infty} \xi^{(k)}_i 1_{(\sigma_i\wedge\rho_k,\,\sigma_{i+1}\wedge\rho_k]}(t).
\]
This process is bounded, so by Example 5.18,
\[
\big((1_{[0,\rho_k]}X)\cdot M^{\rho_k}\big)_t = \sum_{i=1}^{\infty} \xi^{(k)}_i \big(M_{\sigma_{i+1}\wedge\rho_k\wedge t} - M_{\sigma_i\wedge\rho_k\wedge t}\big).
\]
Taking $k$ so that $\rho_k\ge t$, we get
\[
(X\cdot M)_t = \sum_{i=1}^{\infty} \xi_i \big(M_{\sigma_{i+1}\wedge t} - M_{\sigma_i\wedge t}\big).
\]
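The closed form just derived is pure path algebra and easy to implement directly. A minimal sketch (function and argument names are my own, and a deterministic path stands in for a martingale sample path):

```python
def simple_process_integral(t, xi, sigma, M):
    """(X . M)_t = sum_i xi_i (M(sigma_{i+1} ^ t) - M(sigma_i ^ t)) for the
    caglad simple process X_s = sum_i xi_i 1_{(sigma_i, sigma_{i+1}]}(s)."""
    total = 0.0
    for i in range(len(xi)):
        a = min(sigma[i], t)
        b = min(sigma[i + 1], t)
        total += xi[i] * (M(b) - M(a))
    return total

# Sanity check on the deterministic path M(t) = t:
# X = 2 on (1, 3] and X = -1 on (3, 5], so at t = 4 the sum is
# 2*(3 - 1) + (-1)*(4 - 3) = 3.
val = simple_process_integral(4.0, xi=[2.0, -1.0], sigma=[1.0, 3.0, 5.0],
                              M=lambda s: s)
```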
We use the integral notation
\[
\int_{(s,t]} X\, dM = (X\cdot M)_t - (X\cdot M)_s
\]
and other notational conventions exactly as for the $L^2$ integral. The stochastic integral with respect to a local martingale inherits the path properties of the $L^2$ integral, as we observe in the next proposition. Expectations and conditional expectations of $(X\cdot M)_t$ do not necessarily exist any more, so we cannot even contemplate their properties.

Proposition 5.31. Let $M, N\in\mathcal{M}_{2,\mathrm{loc}}$, $X\in\mathcal{L}(M,\mathcal{P})$, and let $\tau$ be a stopping time.

(a) Linearity continues to hold: if also $Y\in\mathcal{L}(M,\mathcal{P})$, then
\[
(\alpha X+\beta Y)\cdot M = \alpha(X\cdot M) + \beta(Y\cdot M).
\]

(b) Let $Z$ be a bounded $\mathcal{F}_\tau$-measurable random variable. Then $Z1_{(\tau,\infty)}X$ and $1_{(\tau,\infty)}X$ are both members of $\mathcal{L}(M,\mathcal{P})$, and
\[
(5.31)\qquad \int_{(0,t]} Z1_{(\tau,\infty)}X\, dM = Z\int_{(0,t]} 1_{(\tau,\infty)}X\, dM.
\]
Furthermore,
\[
(5.32)\qquad \big((1_{[0,\tau]}X)\cdot M\big)_t = (X\cdot M)_{\tau\wedge t} = (X\cdot M^{\tau})_t.
\]
(c) Let $Y\in\mathcal{L}(N,\mathcal{P})$. Suppose $X_t(\omega) = Y_t(\omega)$ and $M_t(\omega) = N_t(\omega)$ for $0\le t\le\tau(\omega)$. Then
\[
(X\cdot M)_{\tau\wedge t} = (Y\cdot N)_{\tau\wedge t}.
\]

(d) Suppose $X\in\mathcal{L}(M,\mathcal{P})\cap\mathcal{L}(N,\mathcal{P})$. Then for $\alpha,\beta\in\mathbf{R}$, $X\in\mathcal{L}(\alpha M+\beta N,\mathcal{P})$ and
\[
X\cdot(\alpha M+\beta N) = \alpha(X\cdot M) + \beta(X\cdot N).
\]
Proof. The proofs are short exercises in localization. We show the way by doing (5.31) and the first equality in (5.32).

Let $\{\sigma_k\}$ be a localizing sequence for the pair $(X,M)$. Then $\{\sigma_k\}$ is a localizing sequence also for the pairs $(1_{(\tau,\infty)}X, M)$ and $(Z1_{(\tau,\infty)}X, M)$. Given $\omega$ and $t$, pick $k$ large enough so that $\sigma_k(\omega)\ge t$. Then by the definition of the stochastic integrals for localized processes,
\[
Z\big((1_{(\tau,\infty)}X)\cdot M\big)_t(\omega) = Z\big((1_{[0,\sigma_k]}1_{(\tau,\infty)}X)\cdot M^{\sigma_k}\big)_t(\omega)
\]
and
\[
\big((Z1_{(\tau,\infty)}X)\cdot M\big)_t(\omega) = \big((1_{[0,\sigma_k]}Z1_{(\tau,\infty)}X)\cdot M^{\sigma_k}\big)_t(\omega).
\]
The right-hand sides of the two equalities above coincide, by an application of (5.22) to the $L^2$-martingale $M^{\sigma_k}$ and the process $1_{[0,\sigma_k]}X$ in place of $X$. This verifies (5.31).

The sequence $\{\sigma_k\}$ works also for $(1_{[0,\tau]}X, M)$. If $t\le\sigma_k(\omega)$, then
\[
\big((1_{[0,\tau]}X)\cdot M\big)_t = \big((1_{[0,\sigma_k]}1_{[0,\tau]}X)\cdot M^{\sigma_k}\big)_t = \big((1_{[0,\sigma_k]}X)\cdot M^{\sigma_k}\big)_{\tau\wedge t} = (X\cdot M)_{\tau\wedge t}.
\]
The first and the last equality are the definition of the local integral, the middle equality an application of (5.23). This checks the first equality in (5.32). $\square$
We come to a very helpful result for later development. The most important processes are usually either caglad or cadlag. The next proposition shows that for left-continuous processes the integral can be realized as a limit of Riemann sum-type approximations. For future benefit we include random partitions in the result.

However, a cadlag process $X$ is not necessarily predictable and therefore not an admissible integrand. The Poisson process is a perfect example, see Exercise 5.4. It is intuitively natural that the Poisson process cannot be predictable, for how can we predict when the process jumps? But it turns out that the Riemann sums still converge for a cadlag integrand. They just cannot converge to $X\cdot M$ because this integral might not exist. Instead, these sums converge to the integral $X_-\cdot M$ of the caglad process $X_-$ defined by
\[
X_-(0) = X(0) \quad\text{and}\quad X_-(t) = X(t-) \ \text{for } t > 0.
\]
We leave it as an exercise to verify that $X_-$ has caglad paths.

The limit in the next proposition is not a mere curiosity. It will be important when we derive Itô's formula. Note the similarity with Lemma 1.12 for Lebesgue-Stieltjes integrals.
Proposition 5.32. Let $X$ be an adapted process and $M\in\mathcal{M}_{2,\mathrm{loc}}$. Suppose $0 = \tau^n_0 \le \tau^n_1 \le \tau^n_2 \le \tau^n_3 \le \cdots$ are stopping times such that for each $n$, $\tau^n_i\nearrow\infty$ almost surely as $i\to\infty$, and $\delta_n = \sup_i(\tau^n_{i+1} - \tau^n_i)$ tends to zero almost surely as $n\to\infty$. Define the process
\[
(5.33)\qquad R_n(t) = \sum_{i=0}^{\infty} X(\tau^n_i)\big(M(\tau^n_{i+1}\wedge t) - M(\tau^n_i\wedge t)\big).
\]

(a) Assume $X$ is left-continuous and satisfies (5.29). Then for each fixed $T<\infty$ and $\varepsilon>0$,
\[
\lim_{n\to\infty} P\Big\{ \sup_{0\le t\le T} |R_n(t) - (X\cdot M)_t| \ge \varepsilon \Big\} = 0.
\]
In other words, $R_n$ converges to $X\cdot M$ in probability, uniformly on compact time intervals.

(b) If $X$ is a cadlag process, then $R_n$ converges to $X_-\cdot M$ in probability, uniformly on compact time intervals.
Proof. Since $X_- = X$ for a left-continuous process, we can prove parts (a) and (b) simultaneously.

Assume first $X_0 = 0$. This is convenient for the proof. At the end we lift this assumption. By left- or right-continuity $X$ is progressively measurable (Lemma 2.4) and therefore $X(\tau^n_i)$ is $\mathcal{F}_{\tau^n_i}$-measurable on the event $\{\tau^n_i < \infty\}$ (Lemma 2.3). Define
\[
Y_n(t) = \sum_{i=0}^{\infty} X(\tau^n_i)1_{(\tau^n_i,\tau^n_{i+1}]}(t) - X_-(t).
\]
By the hypotheses and by Example 5.30, $Y_n$ is an element of $\mathcal{L}(M,\mathcal{P})$ and its integral is
\[
Y_n\cdot M = R_n - X_-\cdot M.
\]
Consequently we need to show that $Y_n\cdot M\to 0$ in probability, uniformly on compacts.

Let $\{\sigma_k\}$ be a localizing sequence for $(X_-, M)$ such that $1_{(0,\sigma_k]}X_-$ is bounded. In part (a) existence of $\{\sigma_k\}$ is a hypothesis. For part (b) apply
part (iii) of Proposition 5.27. As explained there, it may happen that $X(\sigma_k)$ is not bounded, but $X(t-)$ will be bounded for $0 < t\le\sigma_k$.

Pick constants $b_k$ such that $|X_-(t)|\le b_k$ for $0\le t\le\sigma_k$ (here we rely on the assumption $X_0 = 0$). Define $X^{(k)} = (X\wedge b_k)\vee(-b_k)$ and
\[
Y^{(k)}_n(t) = \sum_{i=0}^{\infty} X^{(k)}(\tau^n_i)1_{(\tau^n_i,\tau^n_{i+1}]}(t) - X^{(k)}_-(t).
\]
In forming $X^{(k)}_-(t)$, it is immaterial whether truncation follows the left limit or vice versa.

We have the equality
\[
1_{[0,\sigma_k]}Y_n(t) = 1_{[0,\sigma_k]}Y^{(k)}_n(t).
\]
For the sum in $Y_n$ this can be seen term by term:
\[
1_{[0,\sigma_k]}(t)\,1_{(\tau^n_i,\tau^n_{i+1}]}(t)\,X(\tau^n_i) = 1_{[0,\sigma_k]}(t)\,1_{(\tau^n_i,\tau^n_{i+1}]}(t)\,X^{(k)}(\tau^n_i)
\]
because both sides vanish unless $\tau^n_i < t\le\sigma_k$.

Thus $\{\sigma_k\}$ is a localizing sequence for $(Y_n, M)$. On the event $\{\sigma_k > T\}$, for $0\le t\le T$, by definition (5.27) and Proposition 5.16(b)(c),
\[
(Y_n\cdot M)_t = \big((1_{[0,\sigma_k]}Y_n)\cdot M^{\sigma_k}\big)_t = \big(Y^{(k)}_n\cdot M^{\sigma_k}\big)_t.
\]
Fix $\varepsilon > 0$. In the next bound we apply martingale inequality (3.8) and the isometry (5.12).
\[
\begin{aligned}
P\Big\{ \sup_{0\le t\le T} |(Y_n\cdot M)_t| \ge \varepsilon \Big\}
&\le P\{\sigma_k\le T\} + P\Big\{ \sup_{0\le t\le T} \big|\big(Y^{(k)}_n\cdot M^{\sigma_k}\big)_t\big| \ge \varepsilon \Big\} \\
&\le P\{\sigma_k\le T\} + \varepsilon^{-2} E\Big[\big(Y^{(k)}_n\cdot M^{\sigma_k}\big)_T^2\Big] \\
&\le P\{\sigma_k\le T\} + \varepsilon^{-2} \int_{[0,T]\times\Omega} \big|Y^{(k)}_n(t,\omega)\big|^2\, \mu_{M^{\sigma_k}}(dt,d\omega).
\end{aligned}
\]
Let $\varepsilon_1 > 0$. Fix $k$ large enough so that $P\{\sigma_k\le T\} < \varepsilon_1$. As $n\to\infty$, $Y^{(k)}_n(t,\omega)\to 0$ for all $t$, if $\omega$ is such that the path $s\mapsto X_-(s,\omega)$ is left-continuous and the assumption $\delta_n(\omega)\to 0$ holds. This excludes at most a zero probability set of $\omega$'s, and so this convergence happens $\mu_{M^{\sigma_k}}$-almost everywhere. By the bound $|Y^{(k)}_n|\le 2b_k$ and dominated convergence,
\[
\int_{[0,T]\times\Omega} \big|Y^{(k)}_n\big|^2\, d\mu_{M^{\sigma_k}} \to 0 \quad\text{as } n\to\infty.
\]
Letting $n\to\infty$ in the last string of inequalities gives
\[
\lim_{n\to\infty} P\Big\{ \sup_{0\le t\le T} |(Y_n\cdot M)_t| \ge \varepsilon \Big\} \le \varepsilon_1.
\]
Since $\varepsilon_1 > 0$ can be taken arbitrarily small, the limit above must actually equal zero.

At this point we have proved
\[
(5.34)\qquad \lim_{n\to\infty} P\Big\{ \sup_{0\le t\le T} |R_n(t) - (X_-\cdot M)_t| \ge \varepsilon \Big\} = 0
\]
under the extra assumption $X_0 = 0$. Suppose $\widetilde X$ satisfies the hypotheses of the proposition, but $\widetilde X_0$ is not identically zero. Then (5.34) is valid for $X_t = 1_{(0,\infty)}(t)\widetilde X_t$. Changing the value at $t = 0$ does not affect stochastic integration, so $\widetilde X\cdot M = X\cdot M$. Let
\[
\widetilde R_n(t) = \sum_{i=0}^{\infty} \widetilde X(\tau^n_i)\big(M(\tau^n_{i+1}\wedge t) - M(\tau^n_i\wedge t)\big).
\]
The conclusion follows for $\widetilde X$ if we can show that
\[
\sup_{0\le t<\infty} |\widetilde R_n(t) - R_n(t)| \to 0 \quad\text{as } n\to\infty.
\]
Since $\widetilde R_n(t) - R_n(t) = \widetilde X(0)\big(M(\tau^n_1\wedge t) - M(0)\big)$, we have the bound
\[
\sup_{0\le t<\infty} |\widetilde R_n(t) - R_n(t)| \le |\widetilde X(0)| \sup_{0\le t\le\delta_n} |M(t) - M(0)|.
\]
The last quantity vanishes almost surely as $n\to\infty$, by the assumption $\delta_n\to 0$ and the cadlag paths of $M$. In particular it converges to zero in probability.

To summarize, (5.34) now holds for all processes that satisfy the hypotheses. $\square$
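Proposition 5.32 can be watched in action for $X = M = $ Brownian motion, where the limit has a closed form: the left-endpoint sums satisfy the exact summation-by-parts identity $R_n(T) = \tfrac12\big(B_T^2 - \sum_i(\Delta B_i)^2\big)$, and since the squared increments sum to approximately $T$, $R_n(T)$ approximates $\tfrac12(B_T^2 - T)$, the Itô integral $\int_0^T B\,dB$. A seeded numerical sketch of my own (variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)

T, n = 1.0, 100_000
dt = T / n
dB = rng.normal(0.0, np.sqrt(dt), size=n)
B = np.concatenate([[0.0], np.cumsum(dB)])

# Left-endpoint Riemann sum R_n(T) of (5.33) with X = B, M = B:
R = np.sum(B[:-1] * dB)

# Exact algebraic identity, and the Ito limit it approximates as n grows:
identity_value = 0.5 * (B[-1] ** 2 - np.sum(dB ** 2))
ito_value = 0.5 * (B[-1] ** 2 - T)
```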
Remark 5.33 (Doléans measure). We discuss here briefly the Doléans measure of a local $L^2$-martingale. It provides an alternative way to define the space $\mathcal{L}(M,\mathcal{P})$ of admissible integrands. The lemma below will be used to extend the stochastic integral beyond predictable integrands, but that point is not central to the main development, so the remainder of this section can be skipped.

Fix a local $L^2$-martingale $M$ and stopping times $\sigma_k\nearrow\infty$ such that $M^{\sigma_k}\in\mathcal{M}_2$ for each $k$. By Theorem 3.26 the quadratic variation $[M]$ exists as a nondecreasing cadlag process. Consequently Lebesgue-Stieltjes integrals with respect to $[M]$ are well-defined. The Doléans measure $\mu_M$ can be defined for $A\in\mathcal{P}$ by
\[
(5.35)\qquad \mu_M(A) = E\int_{[0,\infty)} 1_A(t,\omega)\, d[M]_t(\omega),
\]
exactly as for $L^2$-martingales earlier. The measure $\mu_M$ is $\sigma$-finite: the union of the stochastic intervals $\{[0,\sigma_k\wedge k] : k\in\mathbf{N}\}$ exhausts $\mathbf{R}_+\times\Omega$, and
\[
\mu_M\big([0,\sigma_k\wedge k]\big) = E\int_{[0,\infty)} 1_{[0,\sigma_k(\omega)\wedge k]}(t)\, d[M]_t(\omega) = E[M]_{\sigma_k\wedge k} = E[M^{\sigma_k}]_k = E\big[(M^{\sigma_k}_k)^2 - (M^{\sigma_k}_0)^2\big] < \infty.
\]
Along the way we used Lemma 3.27 and then the square-integrability of $M^{\sigma_k}$.

The following alternative characterization of membership in $\mathcal{L}(M,\mathcal{P})$ will be useful for extending the stochastic integral to non-predictable integrands in Section 5.5.

Lemma 5.34. Let $M$ be a local $L^2$-martingale and $X$ a predictable process. Then $X\in\mathcal{L}(M,\mathcal{P})$ iff there exist stopping times $\rho_k\nearrow\infty$ (a.s.) such that for each $k$,
\[
\int_{[0,T]\times\Omega} 1_{[0,\rho_k]}|X|^2\, d\mu_M < \infty \quad\text{for all } T<\infty.
\]
We leave the proof of this lemma as an exercise. The key point is that for both $L^2$-martingales and local $L^2$-martingales, and a stopping time $\tau$, $\mu_{M^{\tau}}(A) = \mu_M(A\cap[0,\tau])$ for $A\in\mathcal{P}$. (Just check that the proof of Lemma 5.15 applies without change to local $L^2$-martingales.)

Furthermore, we leave as an exercise the proof of the result that if $X, Y\in\mathcal{L}(M,\mathcal{P})$ are $\mu_M$-equivalent, which means again that
\[
\mu_M\{(t,\omega) : X(t,\omega)\ne Y(t,\omega)\} = 0,
\]
then $X\cdot M = Y\cdot M$ in the sense of indistinguishability.
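For a concrete case of (5.35) (a numerical sketch of my own, not part of the text): the compensated Poisson martingale $M_t = N_t - \lambda t$ has $[M]_t = N_t$, so the Doléans measure of the full rectangle is $\mu_M([0,T]\times\Omega) = E[N_T] = \lambda T$, visibly finite, in line with the $\sigma$-finiteness computation above.

```python
import numpy as np

rng = np.random.default_rng(3)

lam, T, n_paths = 2.0, 1.0, 50_000
# Since [M]_T = N_T for M_t = N_t - lam*t, a Monte Carlo estimate of
# mu_M([0,T] x Omega) = E [M]_T is simply the sample mean of N_T,
# which should be close to lam*T = 2.0.
N_T = rng.poisson(lam * T, size=n_paths)
mu_estimate = N_T.mean()
```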
5.3. Semimartingale integrator

First a reminder of some terminology and results. A cadlag semimartingale is a process $Y$ that can be written as $Y_t = Y_0 + M_t + V_t$ where $M$ is a cadlag local martingale, $V$ is a cadlag FV process, and $M_0 = V_0 = 0$. To define the stochastic integral, we need $M$ to be a local $L^2$-martingale. If we assume the filtration $\{\mathcal{F}_t\}$ complete and right-continuous (the "usual conditions"), then by Corollary 3.21 we can always select the decomposition so that $M$ is a local $L^2$-martingale. Thus usual conditions for $\{\mathcal{F}_t\}$ need to be assumed in this section, unless one works with a semimartingale $Y$ for which it is known that $M$ can be chosen a local $L^2$-martingale. If $g$ is a function of bounded variation on $[0,T]$, then the Lebesgue-Stieltjes measure $\Lambda_g$ of $g$ exists as a signed Borel measure on $[0,T]$ (Section 1.1.9).

In this section the integrands will be predictable processes $X$ that satisfy this condition:

(5.36) there exist stopping times $\nu_n$ such that $\nu_n\nearrow\infty$ almost surely and $1_{(0,\nu_n]}X$ is a bounded process for each $n$.

In particular, the categories listed in Proposition 5.27 are covered. We deliberately ask for $1_{(0,\nu_n]}X$ to be bounded instead of $1_{[0,\nu_n]}X$ because $X_0$ might not be bounded.
Definition 5.35. Let $Y$ be a cadlag semimartingale. Let $X$ be a predictable process that satisfies (5.36). Then we define the integral of $X$ with respect to $Y$ as the process
\[
(5.37)\qquad \int_{(0,t]} X_s\, dY_s = \int_{(0,t]} X_s\, dM_s + \int_{(0,t]} X_s\, \Lambda_V(ds).
\]
Here $Y = Y_0 + M + V$ is some decomposition of $Y$ into a local $L^2$-martingale $M$ and an FV process $V$,
\[
\int_{(0,t]} X_s\, dM_s = (X\cdot M)_t
\]
is the stochastic integral of Definition 5.23, and
\[
\int_{(0,t]} X_s\, \Lambda_V(ds) = \int_{(0,t]} X_s\, dV_s
\]
is the path-by-path Lebesgue-Stieltjes integral of $X$ with respect to the function $s\mapsto V_s$. The process $\int X\, dY$ thus defined is unique up to indistinguishability and it is a semimartingale.

As before, we shall use the notations $X\cdot Y$ and $\int X\, dY$ interchangeably.
Justification of the definition. The first item to check is that the integral does not depend on the decomposition of $Y$ chosen. Suppose $Y = Y_0 + \widetilde M + \widetilde V$ is another decomposition of $Y$ into a local $L^2$-martingale $\widetilde M$ and an FV process $\widetilde V$. We need to show that
\[
\int_{(0,t]} X_s\, dM_s + \int_{(0,t]} X_s\, \Lambda_V(ds) = \int_{(0,t]} X_s\, d\widetilde M_s + \int_{(0,t]} X_s\, \Lambda_{\widetilde V}(ds)
\]
in the sense that the processes on either side of the equality sign are indistinguishable. By Proposition 5.31(d) and the additivity of Lebesgue-Stieltjes measures, this is equivalent to
\[
\int_{(0,t]} X_s\, d(M-\widetilde M)_s = \int_{(0,t]} X_s\, \Lambda_{\widetilde V - V}(ds).
\]
From $Y = M + V = \widetilde M + \widetilde V$ we get
\[
M - \widetilde M = \widetilde V - V
\]
and this process is both a local $L^2$-martingale and an FV process. The equality we need is a consequence of the next proposition.
Proposition 5.36. Suppose $Z$ is a cadlag local $L^2$-martingale and an FV process. Let $X$ be a predictable process that satisfies (5.36). Then for almost every $\omega$
\[
(5.38)\qquad \int_{(0,t]} X(s,\omega)\, dZ_s(\omega) = \int_{(0,t]} X(s,\omega)\, \Lambda_{Z(\omega)}(ds) \quad\text{for all } 0\le t<\infty.
\]
On the left is the stochastic integral, on the right the Lebesgue-Stieltjes integral evaluated separately for each fixed $\omega$.
Proof. Both sides of (5.38) are right-continuous in $t$, so it suffices to check that for each $t$ they agree with probability 1.

Step 1. Start by assuming that $Z$ is an $L^2$-martingale. Fix $0 < t < \infty$. Let
\[
\mathcal{H} = \{X : X \text{ is a bounded predictable process and (5.38) holds for } t\}.
\]
By the linearity of both integrals, $\mathcal{H}$ is a linear space.

Indicators of predictable rectangles lie in $\mathcal{H}$ because we know explicitly what integrals on both sides of (5.38) look like. If $X = 1_{(u,v]\times F}$ for $F\in\mathcal{F}_u$, then the left side of (5.38) equals $1_F(Z_{v\wedge t} - Z_{u\wedge t})$ by the first definition (5.7) of the stochastic integral. The right-hand side equals the same thing by the definition of the Lebesgue-Stieltjes integral. If $X = 1_{\{0\}\times F_0}$ for $F_0\in\mathcal{F}_0$, both sides of (5.38) vanish, on the left by the definition (5.7) of the stochastic integral and on the right because the integral is over $(0,t]$ and hence excludes the origin.

Let $X$ be a bounded predictable process, $X_n\in\mathcal{H}$ and $X_n\nearrow X$. We wish to argue that, at least along some subsequence $\{n_j\}$, both sides of
\[
(5.39)\qquad \int_{(0,t]} X_n(s,\omega)\, dZ_s(\omega) = \int_{(0,t]} X_n(s,\omega)\, \Lambda_{Z(\omega)}(ds)
\]
converge for almost every $\omega$ to the corresponding integrals with $X$. This would imply that $X\in\mathcal{H}$. Then we would have checked that the space $\mathcal{H}$ satisfies the hypotheses of Theorem B.2. (The $\pi$-system for the theorem is the class of predictable rectangles.)
On the right-hand side of (5.39) the desired convergence follows from dominated convergence. For a fixed $\omega$, the BV function $s\mapsto Z_s(\omega)$ on $[0,t]$ can be expressed as the difference $Z_s(\omega) = f(s) - g(s)$ of two nondecreasing functions. Hence the signed measure $\Lambda_{Z(\omega)}$ is the difference of two finite positive measures: $\Lambda_{Z(\omega)} = \Lambda_f - \Lambda_g$. Then
\[
\begin{aligned}
\lim_{n\to\infty} \int_{(0,t]} X_n(s,\omega)\, \Lambda_{Z(\omega)}(ds)
&= \lim_{n\to\infty} \int_{(0,t]} X_n(s,\omega)\, \Lambda_f(ds) - \lim_{n\to\infty} \int_{(0,t]} X_n(s,\omega)\, \Lambda_g(ds) \\
&= \int_{(0,t]} X(s,\omega)\, \Lambda_f(ds) - \int_{(0,t]} X(s,\omega)\, \Lambda_g(ds) \\
&= \int_{(0,t]} X(s,\omega)\, \Lambda_{Z(\omega)}(ds).
\end{aligned}
\]
The limits are applications of the usual dominated convergence theorem, because $-C \le X_1 \le X_n \le X \le C$ for some constant $C$.
Now the left side of (5.39). For a fixed $T<\infty$, by dominated convergence,
\[
\lim_{n\to\infty} \int_{[0,T]\times\Omega} |X - X_n|^2\, d\mu_Z = 0.
\]
Hence $\|X_n - X\|_{\mathcal{L}_2(Z,\mathcal{P})}\to 0$, and by the isometry $X_n\cdot Z\to X\cdot Z$ in $\mathcal{M}_2$, as $n\to\infty$. Then for a fixed $t$, $(X_{n_j}\cdot Z)_t\to(X\cdot Z)_t$ almost surely along some subsequence $\{n_j\}$. Thus taking the limit along $\{n_j\}$ on both sides of (5.39) gives
\[
\int_{(0,t]} X(s,\omega)\, dZ_s(\omega) = \int_{(0,t]} X(s,\omega)\, \Lambda_{Z(\omega)}(ds)
\]
almost surely. By Theorem B.2, $\mathcal{H}$ contains all bounded $\mathcal{P}$-measurable processes.

This completes Step 1: (5.38) has been verified for the case where $Z\in\mathcal{M}_2$ and $X$ is bounded.
Step 2. Now consider the case of a local $L^2$-martingale $Z$. By the assumption on $X$ we may pick a localizing sequence $\{\sigma_k\}$ such that $Z^{\sigma_k}$ is an $L^2$-martingale and $1_{(0,\sigma_k]}X$ is bounded. Then by Step 1,
\[
(5.40)\qquad \int_{(0,t]} 1_{(0,\sigma_k]}(s)X(s)\, dZ^{\sigma_k}_s = \int_{(0,t]} 1_{(0,\sigma_k]}(s)X(s)\, \Lambda_{Z^{\sigma_k}}(ds).
\]
We claim that on the event $\{\sigma_k\ge t\}$ the left and right sides of (5.40) coincide almost surely with the corresponding sides of (5.38).

The left-hand side of (5.40) coincides almost surely with $\big((1_{[0,\sigma_k]}X)\cdot Z^{\sigma_k}\big)_t$ due to the irrelevance of the time origin. By (5.27) this is the definition of $(X\cdot Z)_t$ on the event $\{\sigma_k\ge t\}$.

On the right-hand side of (5.40) we only need to observe that if $\sigma_k\ge t$, then on the interval $(0,t]$ the process $1_{(0,\sigma_k]}(s)X(s)$ coincides with $X(s)$ and $Z^{\sigma_k}_s$ coincides with $Z_s$. So it is clear that the integrals on the right-hand sides of (5.40) and (5.38) coincide.

The union over $k$ of the events $\{\sigma_k\ge t\}$ equals almost surely the whole space $\Omega$. Thus we have verified (5.38) almost surely, for this fixed $t$. $\square$
Returning to the justification of the definition, we now know that the process $\int X\, dY$ does not depend on the choice of the decomposition $Y = Y_0 + M + V$.

$X\cdot M$ is a local $L^2$-martingale, and for a fixed $\omega$ the function
\[
t\mapsto \int_{(0,t]} X_s(\omega)\, \Lambda_{V(\omega)}(ds)
\]
has bounded variation on every compact interval (Lemma 1.16). Thus the definition (5.37) provides the semimartingale decomposition of $\int X\, dY$. $\square$
As in the previous step of the development, we want to check the Riemann sum approximations. Recall that for a cadlag $X$, the caglad process $X_-$ is defined by $X_-(0) = X(0)$ and $X_-(t) = X(t-)$ for $t > 0$. The hypotheses for the integrand in the next proposition are exactly the same as earlier in Proposition 5.32.

Parts (a) and (b) below could be subsumed under a single statement since $X = X_-$ for a left-continuous process. We prefer to keep them separate to avoid confusing the issue that for a cadlag process $X$ the limit is not necessarily the stochastic integral of $X$. The integral $X\cdot Y$ may fail to exist, and even if it exists, it does not necessarily coincide with $X_-\cdot Y$. This is not a consequence of the stochastic aspect, but can happen also for Lebesgue-Stieltjes integrals. (Find examples!)
Proposition 5.37. Let $X$ be an adapted process and $Y$ a cadlag semimartingale. Suppose $0 = \tau^n_0 \le \tau^n_1 \le \tau^n_2 \le \tau^n_3 \le \cdots$ are stopping times such that for each $n$, $\tau^n_i\nearrow\infty$ almost surely as $i\to\infty$, and $\delta_n = \sup_{0\le i<\infty}(\tau^n_{i+1} - \tau^n_i)$ tends to zero almost surely as $n\to\infty$. Define
\[
(5.41)\qquad S_n(t) = \sum_{i=0}^{\infty} X(\tau^n_i)\big(Y(\tau^n_{i+1}\wedge t) - Y(\tau^n_i\wedge t)\big).
\]

(a) Assume $X$ is left-continuous and satisfies (5.36). Then for each fixed $T<\infty$ and $\varepsilon>0$,
\[
\lim_{n\to\infty} P\Big\{ \sup_{0\le t\le T} |S_n(t) - (X\cdot Y)_t| \ge \varepsilon \Big\} = 0.
\]
In other words, $S_n$ converges to $X\cdot Y$ in probability, uniformly on compact time intervals.

(b) If $X$ is an adapted cadlag process, then $S_n$ converges to $X_-\cdot Y$ in probability, uniformly on compacts.
Proof. Pick a decomposition $Y = Y_0 + M + V$. We get the corresponding decomposition $S_n = R_n + U_n$ by defining
\[
R_n(t) = \sum_{i=0}^{\infty} X(\tau^n_i)\big(M_{\tau^n_{i+1}\wedge t} - M_{\tau^n_i\wedge t}\big)
\]
as in Proposition 5.32, and
\[
U_n(t) = \sum_{i=0}^{\infty} X(\tau^n_i)\big(V_{\tau^n_{i+1}\wedge t} - V_{\tau^n_i\wedge t}\big).
\]
Proposition 5.32 gives the convergence $R_n\to X_-\cdot M$. Lemma 1.12 applied to the Lebesgue-Stieltjes measure of $V_t(\omega)$ tells us that, for almost every fixed $\omega$,
\[
\lim_{n\to\infty}\ \sup_{0\le t\le T} \Big| U_n(t,\omega) - \int_{(0,t]} X(s-,\omega)\, \Lambda_{V(\omega)}(ds) \Big| = 0.
\]
The reservation "almost every $\omega$" is needed in case there is an exceptional zero probability event on which $X$ or $V$ fails to have the good path properties. Almost sure convergence implies convergence in probability. $\square$
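The distinction in part (b) is visible already for a deterministic jump path standing in for a Poisson sample (an illustration I am adding, not from the text): with $X = Y = N$ a counting path, the left-endpoint sums pick up the left limit $N(s-)$ at each jump, so they converge to $(N_-\cdot N)_T = N_T(N_T-1)/2$ rather than the value $N_T(N_T+1)/2$ that the point values $N(s)$ would produce.

```python
import numpy as np

# Deterministic cadlag counting path with four jumps (stand-in for a Poisson path).
jumps = np.array([0.3117, 0.5711, 0.6203, 0.8801])

def N(u):
    """N(u) = number of jump times <= u (cadlag counting path)."""
    return int(np.searchsorted(jumps, u, side="right"))

# Left-endpoint sums S_n(T) of (5.41) with X = Y = N on a fine partition:
T, n = 1.0, 4000
grid = np.linspace(0.0, T, n + 1)
S = sum(N(grid[i]) * (N(grid[i + 1]) - N(grid[i])) for i in range(n))

# Once the mesh separates the jumps, each jump at s contributes N(s-) * 1,
# so S = 0 + 1 + 2 + 3 = N_T (N_T - 1) / 2 with N_T = 4.
```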
Remark 5.38 (Matrix-valued integrands and vector-valued integrators). In order to consider equations for vector-valued processes, we need to establish the (obvious componentwise) conventions regarding the integrals of matrix-valued processes with vector-valued integrators. Let a probability space $(\Omega,\mathcal{F},P)$ with filtration $\{\mathcal{F}_t\}$ be given. If $Q_{i,j}(t)$, $1\le i\le d$ and $1\le j\le m$, are predictable processes on this space, then we regard $Q(t) = (Q_{i,j}(t))$ as a $d\times m$-matrix valued predictable process. And if $Y_1,\dots,Y_m$ are semimartingales on this space, then $Y = (Y_1,\dots,Y_m)^T$ is an $\mathbf{R}^m$-valued semimartingale. The stochastic integral $Q\cdot Y = \int Q\, dY$ is the $\mathbf{R}^d$-valued process whose $i$th component is
\[
(Q\cdot Y)_i(t) = \sum_{j=1}^{m} \int_{(0,t]} Q_{i,j}(s)\, dY_j(s),
\]
assuming of course that all the componentwise integrals are well-defined.

One note of caution: the definition of a $d$-dimensional Brownian motion $B(t) = [B_1(t),\dots,B_d(t)]^T$ includes the requirement that the coordinates be independent of each other. For a vector-valued semimartingale, it is only required that each coordinate be marginally a semimartingale.
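The componentwise convention can be written down directly. A sketch of my own (function and array names are invented, with left-endpoint sums standing in for the stochastic integrals):

```python
import numpy as np

def matrix_integral(Q, Y):
    """(Q . Y)(T) with components (Q . Y)_i = sum_j int Q_{ij} dY_j,
    computed as left-point sums.  Q: (n_steps, d, m) array of left-endpoint
    values; Y: (n_steps + 1, m) array of paths."""
    dY = np.diff(Y, axis=0)                       # increments of each Y_j
    return np.einsum("lij,lj->i", Q, dY)          # sum over time l and column j

# Deterministic check: a constant Q must give exactly Q @ (Y(T) - Y(0)).
rng = np.random.default_rng(5)
Y = np.cumsum(rng.normal(size=(101, 3)), axis=0)                          # R^3 path
Q = np.tile(np.array([[1.0, 2.0, 0.0], [0.0, -1.0, 3.0]]), (100, 1, 1))   # 2 x 3
val = matrix_integral(Q, Y)
expected = Q[0] @ (Y[-1] - Y[0])
```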
5.4. Further properties of stochastic integrals

Now that we have constructed the stochastic integral, subsequent chapters on Itô's formula and stochastic differential equations develop techniques that can be applied to build models and study the behavior of those models. In this section we develop further properties of the integral as groundwork for later chapters. The content of Sections 5.4.2--5.4.4 is fairly technical and is used only in the proof of existence and uniqueness for a semimartingale equation in Section 7.4.

The usual conditions on $\{\mathcal{F}_t\}$ need to be assumed only insofar as this is needed for a definition of the integral with respect to a semimartingale. For the proofs of this section right-continuity of $\{\mathcal{F}_t\}$ is not needed. (Completeness we assume always.)

The properties listed in Proposition 5.31 extend readily to the semimartingale integral. Each property is linear in the integrator, holds for the martingale part by the proposition, and can be checked path by path for the FV part. We state the important ones for further reference and leave the proofs as exercises.
Proposition 5.39. Let $Y$ and $Z$ be semimartingales, $G$ and $H$ predictable processes that satisfy the local boundedness condition (5.36), and let $\tau$ be a stopping time.

(a) Let $U$ be a bounded $\mathcal{F}_\tau$-measurable random variable. Then
\[
(5.42)\qquad \int_{(0,t]} U1_{(\tau,\infty)}G\, dY = U\int_{(0,t]} 1_{(\tau,\infty)}G\, dY.
\]
Furthermore,
\[
(5.43)\qquad \big((1_{[0,\tau]}G)\cdot Y\big)_t = (G\cdot Y)_{\tau\wedge t} = (G\cdot Y^{\tau})_t.
\]

(b) Suppose $G_t(\omega) = H_t(\omega)$ and $Y_t(\omega) = Z_t(\omega)$ for $0\le t\le\tau(\omega)$. Then
\[
(G\cdot Y)_{\tau\wedge t} = (H\cdot Z)_{\tau\wedge t}.
\]
5.4.1. Jumps of a stochastic integral. For any cadlag process $Z$, $\Delta Z(t) = Z(t) - Z(t-)$ denotes the jump at $t$. First a strengthening of the part of Proposition 2.16 that identifies the jumps of the quadratic variation. The strengthening is that the "almost surely" qualifier is not applied separately to each $t$.

Lemma 5.40. Let $Y$ be a semimartingale. Then the quadratic variation $[Y]$ exists. For almost every $\omega$, $\Delta[Y]_t = (\Delta Y_t)^2$ for all $0 < t < \infty$.
Proof. Fix $0 < T < \infty$. By Proposition 5.37, we can pick a sequence of partitions $\pi_n = \{t^n_i\}$ of $[0,T]$ such that the process
\[
S_n(t) = 2\sum_i Y_{t^n_i}\big(Y_{t^n_{i+1}\wedge t} - Y_{t^n_i\wedge t}\big) = Y^2_t - Y^2_0 - \sum_i \big(Y_{t^n_{i+1}\wedge t} - Y_{t^n_i\wedge t}\big)^2
\]
converges to the process
\[
S(t) = 2\int_{(0,t]} Y(s-)\, dY(s),
\]
uniformly for $t\in[0,T]$, for almost every $\omega$. This implies the convergence of the sum of squares, so the quadratic variation $[Y]$ exists and satisfies
\[
[Y]_t = Y^2_t - Y^2_0 - S(t).
\]
It is true in general that a uniform bound on the difference of two functions gives a bound on the difference of the jumps: if $f$ and $g$ are cadlag and $|f(x) - g(x)|\le\varepsilon$ for all $x$, then
\[
|\Delta f(x) - \Delta g(x)| = \lim_{y\nearrow x,\, y<x} |f(x) - f(y) - g(x) + g(y)| \le 2\varepsilon.
\]
Fix an $\omega$ for which the uniform convergence $S_n\to S$ holds. Then for each $t\in(0,T]$, the jump $\Delta S_n(t)$ converges to $\Delta S(t) = \Delta(Y^2)_t - \Delta[Y]_t$.

Directly from the definition of $S_n$ one sees that $\Delta S_n(t) = 2Y_{t^n_k}\Delta Y_t$ for the index $k$ such that $t\in(t^n_k, t^n_{k+1}]$. Here is the calculation in detail: if $s < t$ is close enough to $t$, also $s\in(t^n_k, t^n_{k+1}]$, and then
\[
\begin{aligned}
\Delta S_n(t) &= \lim_{s\nearrow t,\, s<t}\big(S_n(t) - S_n(s)\big) \\
&= \lim_{s\nearrow t,\, s<t}\Big( 2\sum_i Y_{t^n_i}\big(Y_{t^n_{i+1}\wedge t} - Y_{t^n_i\wedge t}\big) - 2\sum_i Y_{t^n_i}\big(Y_{t^n_{i+1}\wedge s} - Y_{t^n_i\wedge s}\big) \Big) \\
&= \lim_{s\nearrow t,\, s<t}\Big( 2Y_{t^n_k}\big(Y_t - Y_{t^n_k}\big) - 2Y_{t^n_k}\big(Y_s - Y_{t^n_k}\big) \Big) \\
&= \lim_{s\nearrow t,\, s<t} 2Y_{t^n_k}\big(Y_t - Y_s\big) = 2Y_{t^n_k}\Delta Y_t.
\end{aligned}
\]
By the cadlag path of $Y$, $\Delta S_n(t)\to 2Y_{t-}\Delta Y_t$ as $n\to\infty$. Equality of the two limits of $\Delta S_n(t)$ gives
\[
2Y_{t-}\Delta Y_t = \Delta(Y^2)_t - \Delta[Y]_t
\]
which rearranges to $\Delta[Y]_t = (\Delta Y_t)^2$. $\square$
Theorem 5.41. Let $Y$ be a cadlag semimartingale, $X$ a predictable process that satisfies the local boundedness condition (5.36), and $X\cdot Y$ the stochastic integral. Then for all $\omega$ in a set of probability one,
\[
\Delta(X\cdot Y)(t) = X(t)\Delta Y(t) \quad\text{for all } 0 < t < \infty.
\]

We prove this theorem in stages. The reader may be surprised by the appearance of the point value $X(t)$ in a statement about integrals. After all, one of the lessons of integration theory is that point values of functions are often irrelevant. But implicit in the proof below is the notion that a point value of the integrand $X$ influences the integral exactly when the integrator $Y$ has a jump.
Lemma 5.42. Let $M$ be a cadlag local $L^2$-martingale and $X\in\mathcal{L}(M,\mathcal{P})$. Then for all $\omega$ in a set of probability one, $\Delta(X\cdot M)(t) = X(t)\Delta M(t)$ for all $0 < t < \infty$.

Proof. Suppose the conclusion holds if $M\in\mathcal{M}_2$ and $X\in\mathcal{L}_2(M,\mathcal{P})$. Pick a sequence $\{\sigma_k\}$ that localizes $(X,M)$ and let $X_k = 1_{[0,\sigma_k]}X$. Fix $\omega$ such that definition (5.27) of the integral $X\cdot M$ works and the conclusion above holds for each integral $X_k\cdot M^{\sigma_k}$. Then if $\sigma_k > t$,
\[
\Delta(X\cdot M)_t = \Delta(X_k\cdot M^{\sigma_k})_t = X_k(t)\Delta M^{\sigma_k}_t = X(t)\Delta M_t.
\]
For the remainder of the proof we may assume that $M\in\mathcal{M}_2$ and $X\in\mathcal{L}_2(M,\mathcal{P})$. Pick simple predictable processes
\[
X_n(t) = \sum_{i=1}^{m(n)-1} \xi^n_i 1_{(t^n_i,\,t^n_{i+1}]}(t)
\]
such that $X_n\to X$ in $\mathcal{L}_2(M,\mathcal{P})$. By the definition of Lebesgue-Stieltjes integrals, and because by Lemma 5.40 the processes $\Delta[M]_t$ and $(\Delta M_t)^2$ are indistinguishable,
\[
\begin{aligned}
E\Big[ \sum_{s\in(0,T]} \big| X_n(s)\Delta M_s - X(s)\Delta M_s \big|^2 \Big]
&= E\Big[ \sum_{s\in(0,T]} \big| X_n(s) - X(s) \big|^2\, \Delta[M]_s \Big] \\
&\le E\int_{(0,T]} \big| X_n(s) - X(s) \big|^2\, d[M]_s
\end{aligned}
\]
and by hypothesis the last expectation vanishes as $n\to\infty$. Thus there exists a subsequence $\{n_j\}$ along which almost surely
\[
\lim_{j\to\infty} \sum_{s\in(0,T]} \big| X_{n_j}(s)\Delta M_s - X(s)\Delta M_s \big|^2 = 0.
\]
In particular, on this event of full probability, for any $t\in(0,T]$
\[
(5.44)\qquad \lim_{j\to\infty} \big| X_{n_j}(t)\Delta M_t - X(t)\Delta M_t \big|^2 = 0.
\]
On the other hand, $X_{n_j}\cdot M\to X\cdot M$ in $\mathcal{M}_2$ implies that along a further subsequence (which we denote by the same $\{n_j\}$), almost surely,
\[
(5.45)\qquad \lim_{j\to\infty} \sup_{t\in[0,T]} \big| (X_{n_j}\cdot M)_t - (X\cdot M)_t \big| = 0.
\]
Fix an $\omega$ on which both almost sure limits (5.44) and (5.45) hold. For any $t\in(0,T]$, the uniform convergence in (5.45) implies $\Delta(X_{n_j}\cdot M)_t\to\Delta(X\cdot M)_t$. Also, since each path of $X_{n_j}$ is a step function, $\Delta(X_{n_j}\cdot M)_t = X_{n_j}(t)\Delta M_t$. (The last two points were justified explicitly in the proof of Lemma 5.40 above.) Combining these with the limit (5.44) shows that, for this fixed $\omega$ and all $t\in(0,T]$,
\[
\Delta(X\cdot M)_t = \lim_{j\to\infty}\Delta(X_{n_j}\cdot M)_t = \lim_{j\to\infty} X_{n_j}(t)\Delta M_t = X(t)\Delta M_t.
\]
In other words, the conclusion holds almost surely on $[0,T]$. To finish, consider countably many $T$ that increase up to $\infty$. $\square$
Lemma 5.43. Let $f$ be a bounded Borel function and $U$ a BV function on $[0,T]$. Denote the Lebesgue-Stieltjes integral by
\[
(f\cdot U)_t = \int_{(0,t]} f(s)\, dU(s).
\]
Then $\Delta(f\cdot U)_t = f(t)\Delta U(t)$ for all $0 < t\le T$.

Proof. By the rules concerning Lebesgue-Stieltjes integration,
\[
(f\cdot U)_t - (f\cdot U)_s = \int_{(s,t]} f(r)\, dU(r) = \int_{(s,t)} f(r)\, dU(r) + f(t)\Delta U(t)
\]
and
\[
\Big| \int_{(s,t)} f(r)\, dU(r) \Big| \le \|f\|_{\infty}\, \Lambda_{V_U}\big((s,t)\big).
\]
$V_U$ is the total variation function of $U$. As $s\nearrow t$, the set $(s,t)$ decreases down to the empty set. Since $\Lambda_{V_U}$ is a finite positive measure on $[0,T]$, $\Lambda_{V_U}\big((s,t)\big)\to 0$ as $s\nearrow t$. $\square$

Proof of Theorem 5.41 follows from combining Lemmas 5.42 and 5.43. $\square$

We introduce the following notation for the left limit of a stochastic integral:
\[
(5.46)\qquad \int_{(0,t)} H\, dY = \lim_{s\nearrow t,\, s<t} \int_{(0,s]} H\, dY
\]
and then we have this identity:
\[
(5.47)\qquad \int_{(0,t]} H\, dY = \int_{(0,t)} H\, dY + H(t)\Delta Y(t).
\]
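The jump identity of Lemma 5.43 can be checked mechanically for a pure-jump BV path (a small sketch of my own; the function names are invented):

```python
def ls_integral(f, jump_times, jump_sizes, t):
    """(f . U)_t = int_(0,t] f dU for a pure-jump BV path U with the listed
    jumps: the integral reduces to the sum of f(s) * DeltaU(s) over jumps s <= t."""
    return sum(f(s) * du for s, du in zip(jump_times, jump_sizes) if s <= t)

f = lambda s: s ** 2
times, sizes = [0.2, 0.5, 0.9], [1.0, -2.0, 0.5]

t = 0.5
jump = ls_integral(f, times, sizes, t) - ls_integral(f, times, sizes, t - 1e-12)
# Lemma 5.43 predicts the jump f(t) * DeltaU(t) = 0.25 * (-2.0) = -0.5.
```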
5.4.2. Convergence theorem for stochastic integrals. The next theorem
is a dominated convergence theorem of sorts for stochastic integrals.
We will use it in the existence proof for solutions of stochastic differential
equations.

Theorem 5.44. Let $H_n$ be a sequence of predictable processes and $G_n$ a
sequence of nonnegative adapted cadlag processes. Assume $|H_n(t)| \le G_n(t-)$
for all $0 < t < \infty$, and that the running maximum
\[
G_n^*(T) = \sup_{0\le t\le T} G_n(t)
\]
converges to zero in probability, for each fixed $0 < T < \infty$. Then for any
cadlag semimartingale $Y$, $H_n \cdot Y \to 0$ in probability, uniformly on compact
time intervals.
Proof. Let $Y = Y_0 + M + U$ be a decomposition of $Y$ into a local $L^2$-martingale
$M$ and an FV process $U$. We show that both terms in $H_n \cdot Y = H_n \cdot M + H_n \cdot U$
converge to zero in probability, uniformly on compact time intervals. The FV part is immediate:
\[
\sup_{0\le t\le T} \Bigl| \int_{(0,t]} H_n(s)\,dU(s) \Bigr|
\le \sup_{0\le t\le T} \int_{(0,t]} |H_n(s)|\,dV_U(s)
\le G_n^*(T)\,V_U(T)
\]
where we applied inequality (1.14). Since $V_U(T)$ is a finite random variable,
the last bound above converges to zero in probability.
For the local martingale part $H_n \cdot M$, pick a sequence of stopping times
$\{\tau_k\}$ that localizes $M$. Let
\[
\sigma_n = \inf\{t \ge 0 : G_n(t) \ge 1\} \wedge \inf\{t > 0 : G_n(t-) \ge 1\}.
\]
These are stopping times by Lemma 2.9. By left-continuity, $G_n(t-) \le 1$ for
$0 < t \le \sigma_n$. For any $T < \infty$,
\[
P\{\sigma_n \le T\} \le P\{G_n^*(T) > 1/2\} \to 0 \quad\text{as } n \to \infty.
\]
Let $H_n^{(1)} = (H_n \wedge 1) \vee (-1)$ denote the bounded process obtained by truncation.
If $t \le \tau_k \wedge \sigma_n$ then $(H_n \cdot M)_t = (H_n^{(1)} \cdot M^{\tau_k})_t$ by part (c) of Proposition
5.31. As a bounded process $H_n^{(1)} \in \mathcal{L}_2(M^{\tau_k}, \mathcal{P})$, so by martingale inequality
(3.8) and the stochastic integral isometry (5.12),
\begin{align*}
P\Bigl\{ \sup_{0\le t\le T} |(H_n \cdot M)_t| \ge \varepsilon \Bigr\}
&\le P\{\tau_k \le T\} + P\{\sigma_n \le T\}
+ P\Bigl\{ \sup_{0\le t\le T} \bigl| \bigl(H_n^{(1)} \cdot M^{\tau_k}\bigr)_t \bigr| \ge \varepsilon \Bigr\} \\
&\le P\{\tau_k \le T\} + P\{\sigma_n \le T\}
+ \varepsilon^{-2}\, E\Bigl[ \bigl(H_n^{(1)} \cdot M^{\tau_k}\bigr)_T^2 \Bigr] \\
&= P\{\tau_k \le T\} + P\{\sigma_n \le T\}
+ \varepsilon^{-2}\, E\int_{[0,T]} |H_n^{(1)}(t)|^2 \, d[M^{\tau_k}]_t \\
&\le P\{\tau_k \le T\} + P\{\sigma_n \le T\}
+ \varepsilon^{-2}\, E\int_{[0,T]} \bigl(G_n(t-) \wedge 1\bigr)^2 \, d[M^{\tau_k}]_t \\
&\le P\{\tau_k \le T\} + P\{\sigma_n \le T\}
+ \varepsilon^{-2}\, E\Bigl[ [M^{\tau_k}]_T \bigl(G_n^*(T) \wedge 1\bigr)^2 \Bigr].
\end{align*}
As $k$ and $T$ stay fixed and $n \to \infty$, the last expectation above tends to zero.
This follows from the dominated convergence theorem under convergence in
probability (Theorem B.11). The integrand is bounded by the integrable
random variable $[M^{\tau_k}]_T$. Given $\delta > 0$, pick $K < \infty$ so that
\[
P\bigl\{ [M^{\tau_k}]_T \ge K \bigr\} < \delta/2.
\]
Then
\[
P\Bigl\{ [M^{\tau_k}]_T \bigl(G_n^*(T) \wedge 1\bigr)^2 \ge \delta \Bigr\}
\le \delta/2 + P\Bigl\{ G_n^*(T) \ge \sqrt{\delta/K} \Bigr\}
\]
where the last probability vanishes as $n \to \infty$, by the assumption that
$G_n^*(T) \to 0$ in probability.
Returning to the string of inequalities and letting $n \to \infty$ gives
\[
\lim_{n\to\infty} P\Bigl\{ \sup_{0\le t\le T} |(H_n \cdot M)_t| \ge \varepsilon \Bigr\} \le P\{\tau_k \le T\}.
\]
This last bound tends to zero as $k \to \infty$, and so we have proved that
$H_n \cdot M \to 0$ in probability, uniformly on $[0,T]$. $\square$
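The FV estimate at the start of the proof is simple enough to watch numerically. Below is a small sketch (our own illustration, not from the text): $U$ is a pure-jump FV path, each $H_n$ is dominated by the constant $G_n \equiv 1/n$, and the running supremum of the integral never exceeds $G_n^*(T)\,V_U(T)$.

```python
# Sketch: the bound  sup_t |int_(0,t] H_n dU| <= G_n^*(T) V_U(T)
# for a pure-jump U, checked for dominated integrands H_n(s) = sin(n s)/n.
import math

jumps = {0.3: 1.0, 0.7: -0.5, 1.4: 2.0}      # point masses of dU on (0, T]
T = 2.0
V_U = sum(abs(du) for du in jumps.values())  # total variation of U on [0, T]

for n in (1, 2, 5, 10, 100):
    H = lambda s, n=n: math.sin(n * s) / n   # |H(s)| <= G_n = 1/n everywhere
    G_star = 1.0 / n                         # running maximum of the dominator
    sup_int = max(abs(sum(H(s) * du for s, du in jumps.items() if s <= t))
                  for t in (0.3, 0.7, 1.4, 2.0))
    assert sup_int <= G_star * V_U + 1e-12
```

As $n$ grows, $G_n^*(T) \to 0$ and the integrals are squeezed to zero, which is exactly the mechanism of the FV part of the proof.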
5.4.3. Restarting at a stopping time. Let $Y$ be a cadlag semimartingale
and $G$ a predictable process that satisfies the local boundedness condition
(5.36). Let $\sigma$ be a bounded stopping time with respect to the underlying
filtration $\{\mathcal{F}_t\}$. We take a bounded stopping time in order to get moment
bounds from Lemma 3.5. Define a new filtration $\bar{\mathcal{F}}_t = \mathcal{F}_{\sigma+t}$. Let $\bar{\mathcal{P}}$ be the
predictable $\sigma$-field under the filtration $\{\bar{\mathcal{F}}_t\}$. In other words, $\bar{\mathcal{P}}$ is the $\sigma$-field
generated by sets of the type $(u,v] \times \Gamma$ for $\Gamma \in \bar{\mathcal{F}}_u$ and $\{0\} \times \Gamma_0$ for
$\Gamma_0 \in \bar{\mathcal{F}}_0$.

Define new processes
\[
\bar{Y}(t) = Y(\sigma + t) - Y(\sigma) \qquad\text{and}\qquad \bar{G}(s) = G(\sigma + s).
\]
For $\bar{Y}$ we could define just as well $\bar{Y}(t) = Y(\sigma + t)$ without changing the
statements below. The reason is that $\bar{Y}$ appears only as an integrator, so
only its increments matter. Sometimes it might be convenient to have initial
value zero; that is why we subtract $Y(\sigma)$ from $\bar{Y}(t)$.
Theorem 5.45. Let $\sigma$ be a bounded stopping time with respect to $\{\mathcal{F}_t\}$.
Under the filtration $\{\bar{\mathcal{F}}_t\}$, the process $\bar{G}$ is predictable and $\bar{Y}$ is an adapted
semimartingale. We have this equality of stochastic integrals:
\begin{align*}
(5.48)\qquad \int_{(0,t]} \bar{G}(s)\,d\bar{Y}(s) &= \int_{(\sigma,\sigma+t]} G(s)\,dY(s) \\
&= \int_{(0,\sigma+t]} G(s)\,dY(s) - \int_{(0,\sigma]} G(s)\,dY(s).
\end{align*}
The second equality in (5.48) is the definition of the integral over $(\sigma, \sigma+t]$.
The proof of Theorem 5.45 follows after two lemmas.
Lemma 5.46. For any $\mathcal{P}$-measurable function $G$, $\bar{G}(t,\omega) = G(\sigma(\omega) + t, \omega)$
is $\bar{\mathcal{P}}$-measurable.
Proof. Let $\mathcal{U}$ be the space of $\mathcal{P}$-measurable functions $G$ such that $\bar{G}$ is
$\bar{\mathcal{P}}$-measurable. $\mathcal{U}$ is a linear space and closed under pointwise limits, since
these operations preserve measurability. Since any $\mathcal{P}$-measurable function
is a pointwise limit of bounded $\mathcal{P}$-measurable functions, it suffices to show
that $\mathcal{U}$ contains all bounded $\mathcal{P}$-measurable functions. According to Theorem
B.2 we need to check that $\mathcal{U}$ contains indicator functions of predictable
rectangles. We leave the case $\{0\} \times \Gamma_0$ for $\Gamma_0 \in \mathcal{F}_0$ to the reader.

Let $\Gamma \in \mathcal{F}_u$ and $G = 1_{(u,v] \times \Gamma}$. Then
\[
\bar{G}(t) = 1_{(u,v]}(\sigma + t)\,1_\Gamma(\omega) =
\begin{cases}
1_\Gamma(\omega), & u < \sigma + t \le v, \\
0, & \text{otherwise.}
\end{cases}
\]
For a fixed $\omega$, $\bar{G}(t)$ is a caglad process. By Lemma 5.1, $\bar{\mathcal{P}}$-measurability of
$\bar{G}$ follows if it is adapted to $\{\bar{\mathcal{F}}_t\}$. Since $\{\bar{G}(t) = 1\} = \Gamma \cap \{u < \sigma + t \le v\}$,
$\bar{G}$ is adapted to $\{\bar{\mathcal{F}}_t\}$ if $\Gamma \cap \{u < \sigma + t \le v\} \in \bar{\mathcal{F}}_t$. This is true by Lemma 2.3
because $u$, $v$ and $\sigma + t$ can be regarded as stopping times and $\bar{\mathcal{F}}_t = \mathcal{F}_{\sigma+t}$. $\square$
Lemma 5.47. Let $M$ be a local $L^2$-martingale with respect to $\{\mathcal{F}_t\}$. Then
$\bar{M}_t = M_{\sigma+t} - M_\sigma$ is a local $L^2$-martingale with respect to $\{\bar{\mathcal{F}}_t\}$. If $G \in \mathcal{L}(M, \mathcal{P})$,
then $\bar{G} \in \mathcal{L}(\bar{M}, \bar{\mathcal{P}})$, and
\[
(\bar{G} \cdot \bar{M})_t = (G \cdot M)_{\sigma+t} - (G \cdot M)_\sigma.
\]
Proof. Let $\{\tau_k\}$ localize $(G, M)$. Let $\rho_k = (\tau_k - \sigma)^+$. For any $0 \le t < \infty$,
\[
\{\rho_k \le t\} = \{\tau_k \le \sigma + t\} \in \mathcal{F}_{\sigma+t} = \bar{\mathcal{F}}_t \quad\text{by Lemma 2.3(ii),}
\]
so $\rho_k$ is a stopping time for the filtration $\{\bar{\mathcal{F}}_t\}$. If $\sigma \le \tau_k$,
\[
\bar{M}^{\rho_k}_t = \bar{M}_{\rho_k \wedge t} = M_{\sigma + (\rho_k \wedge t)} - M_\sigma
= M_{(\sigma+t) \wedge \tau_k} - M_{\sigma \wedge \tau_k}
= M^{\tau_k}_{\sigma+t} - M^{\tau_k}_\sigma.
\]
If $\sigma > \tau_k$, then $\rho_k = 0$ and $\bar{M}^{\rho_k}_t = \bar{M}_0 = M_\sigma - M_\sigma = 0$. The earlier formula
also gives zero in case $\sigma > \tau_k$, so in both cases we can write
\[
(5.49)\qquad \bar{M}^{\rho_k}_t = M^{\tau_k}_{\sigma+t} - M^{\tau_k}_\sigma.
\]
Since $M^{\tau_k}$ is an $L^2$-martingale, the right-hand side above is an $L^2$-martingale
with respect to $\{\bar{\mathcal{F}}_t\}$. See Corollary 3.8 and point (ii) after the corollary.
Consequently $\bar{M}^{\rho_k}$ is an $L^2$-martingale with respect to $\{\bar{\mathcal{F}}_t\}$, and hence $\bar{M}$
is a local $L^2$-martingale.
Next we show that $\{\rho_k\}$ localizes $(\bar{G}, \bar{M})$. We fix $k$ temporarily and
write $Z = M^{\tau_k}$ to lessen the notational burden. Then (5.49) shows that
\[
\bar{Z}_t \equiv \bar{M}^{\rho_k}_t = Z_{\sigma+t} - Z_\sigma,
\]
in other words $\bar{Z}$ is obtained from $Z$ by the operation of restarting at $\sigma$. We
turn to look at the Doleans measure $\mu_{\bar{Z}} = \mu_{\bar{M}^{\rho_k}}$. Let $\Gamma \in \mathcal{F}_u$. Then $1_{(u,v] \times \Gamma}$ is
$\mathcal{P}$-measurable, while $1_\Gamma(\omega)\,1_{(u,v]}(\sigma + t)$ is $\bar{\mathcal{P}}$-measurable.
\begin{align*}
\int 1_\Gamma(\omega)\,1_{(u,v]}(\sigma + t)\, \mu_{\bar{Z}}(dt, d\omega)
&= E\Bigl[ 1_\Gamma(\omega) \Bigl( [\bar{Z}]_{(v-\sigma)^+} - [\bar{Z}]_{(u-\sigma)^+} \Bigr) \Bigr] \\
&= E\Bigl[ 1_\Gamma(\omega) \Bigl( \bar{Z}^2_{(v-\sigma)^+} - \bar{Z}^2_{(u-\sigma)^+} \Bigr) \Bigr] \\
&= E\Bigl[ 1_\Gamma(\omega) \Bigl( Z^2_{\sigma + (v-\sigma)^+} - Z^2_{\sigma + (u-\sigma)^+} \Bigr) \Bigr] \\
&= E\Bigl[ 1_\Gamma(\omega) \Bigl( [Z]_{\sigma \vee v} - [Z]_{\sigma \vee u} \Bigr) \Bigr] \\
&= E\Bigl[ 1_\Gamma(\omega) \int 1_{(\sigma(\omega), \infty)}(t)\,1_{(u,v]}(t)\, d[Z]_t \Bigr] \\
&= \int 1_{(\sigma, \infty)}(t, \omega)\,1_\Gamma(\omega)\,1_{(u,v]}(t)\, d\mu_Z.
\end{align*}
Several observations go into the calculation. First, $(u - \sigma)^+$ is a stopping
time for $\{\bar{\mathcal{F}}_t\}$, and $\Gamma \in \bar{\mathcal{F}}_{(u-\sigma)^+}$. We leave checking these as exercises. This
justifies using optional stopping for the martingale $\bar{Z}^2 - [\bar{Z}]$. Next note that
$\sigma + (u - \sigma)^+ = \sigma \vee u$, and again the martingale property is justified by
$\Gamma \in \mathcal{F}_u \subseteq \mathcal{F}_{\sigma \vee u}$. Finally $(\sigma \vee u, \sigma \vee v] = (\sigma, \infty) \cap (u, v]$.
From predictable rectangles $(u,v] \times \Gamma \in \mathcal{P}$ this integral identity extends
to nonnegative predictable processes $X$ to give, with $\bar{X}(t) = X(\sigma + t)$,
\[
(5.50)\qquad \int \bar{X}\, d\mu_{\bar{Z}} = \int 1_{(\sigma,\infty)} X\, d\mu_Z.
\]
Let $T < \infty$ and apply this to $X = 1_{(\sigma, \sigma+T]}(t)\,1_{(0,\tau_k]}(t)\,|G(t)|^2$. Then
\[
\bar{X}(t) = 1_{[0,T]}(t)\,1_{(0,\rho_k]}(t)\,|\bar{G}(t)|^2
\]
and
\[
\int_{[0,T]} 1_{(0,\rho_k]}(t)\,|\bar{G}(t)|^2\, d\mu_{\bar{Z}}
= \int 1_{(\sigma,\sigma+T]}(t)\,1_{(0,\tau_k]}(t)\,|G(t)|^2\, d\mu_Z < \infty,
\]
where the last inequality is a consequence of the assumptions that $\{\tau_k\}$
localizes $(G, M)$ and $\sigma$ is bounded.
To summarize thus far: we have shown that $\{\rho_k\}$ localizes $(\bar{G}, \bar{M})$. This
checks $\bar{G} \in \mathcal{L}(\bar{M}, \bar{\mathcal{P}})$.
Fix again $k$ and continue denoting the $L^2$-martingales by $Z = M^{\tau_k}$ and
$\bar{Z} = \bar{M}^{\rho_k}$. Consider a simple predictable process
\[
H_n(t) = \sum_{i=0}^{m-1} \xi_i\, 1_{(u_i, u_{i+1}]}(t).
\]
Let $\ell$ denote the index that satisfies $u_\ell < \sigma \le u_{\ell+1}$. (If there is no such $\ell$
then $\bar{H}_n = 0$.) Then
\[
\bar{H}_n(t) = \sum_{i \ge \ell} \xi_i\, 1_{((u_i - \sigma)^+,\, (u_{i+1} - \sigma)^+]}(t).
\]
The stochastic integral is
\begin{align*}
(\bar{H}_n \cdot \bar{Z})_t &= \sum_{i \ge \ell} \xi_i \Bigl( \bar{Z}_{(u_{i+1}-\sigma)^+ \wedge t} - \bar{Z}_{(u_i-\sigma)^+ \wedge t} \Bigr) \\
&= \sum_{i > \ell} \xi_i \Bigl( Z_{\sigma + ((u_{i+1}-\sigma)^+ \wedge t)} - Z_{\sigma + ((u_i-\sigma)^+ \wedge t)} \Bigr)
+ \xi_\ell \Bigl( Z_{\sigma + ((u_{\ell+1}-\sigma)^+ \wedge t)} - Z_\sigma \Bigr).
\end{align*}
The $i = \ell$ term above develops differently from the others because
$\bar{Z}_{(u_\ell - \sigma)^+ \wedge t} = \bar{Z}_0 = 0$. Observe that
\[
\sigma + \bigl( (u_i - \sigma)^+ \wedge t \bigr) = (\sigma + t) \wedge u_i \quad\text{for } i > \ell,
\qquad
(\sigma + t) \wedge u_i = \sigma \wedge u_i = u_i \quad\text{for } i \le \ell.
\]
Now continue from above:
\begin{align*}
(\bar{H}_n \cdot \bar{Z})_t &= \sum_{i > \ell} \xi_i \Bigl( Z_{(\sigma+t) \wedge u_{i+1}} - Z_{(\sigma+t) \wedge u_i} \Bigr)
+ \xi_\ell \Bigl( Z_{(\sigma+t) \wedge u_{\ell+1}} - Z_\sigma \Bigr) \\
&= \sum_i \xi_i \Bigl( Z_{(\sigma+t) \wedge u_{i+1}} - Z_{(\sigma+t) \wedge u_i} \Bigr)
- \sum_i \xi_i \Bigl( Z_{\sigma \wedge u_{i+1}} - Z_{\sigma \wedge u_i} \Bigr) \\
&= (H_n \cdot Z)_{\sigma+t} - (H_n \cdot Z)_\sigma.
\end{align*}
Next we take an arbitrary process $H \in \mathcal{L}_2(Z, \mathcal{P})$, and wish to show
\[
(5.51)\qquad (\bar{H} \cdot \bar{Z})_t = (H \cdot Z)_{\sigma+t} - (H \cdot Z)_\sigma.
\]
Take a sequence $H_n$ of simple predictable processes such that $H_n \to H$
in $\mathcal{L}_2(Z, \mathcal{P})$. Equality (5.50), along with the boundedness of $\sigma$, then implies
that $\bar{H}_n \to \bar{H}$ in $\mathcal{L}_2(\bar{Z}, \bar{\mathcal{P}})$. Consequently we get the convergences $\bar{H}_n \cdot \bar{Z} \to \bar{H} \cdot \bar{Z}$
and $H_n \cdot Z \to H \cdot Z$ in probability, uniformly on compact time intervals.
Identity (5.51) was verified above for simple predictable processes, hence the
convergence passes it on to general processes. Boundedness of $\sigma$ is used here
because then the time arguments $\sigma + t$ and $\sigma$ on the right side of (5.51) stay
bounded. Since $(H \cdot Z)_\sigma = \bigl( (1_{[0,\sigma]} H) \cdot Z \bigr)_{\sigma+t}$ we can rewrite (5.51) as
\[
(5.52)\qquad (\bar{H} \cdot \bar{Z})_t = \bigl( (1_{(\sigma,\infty)} H) \cdot Z \bigr)_{\sigma+t}.
\]
At this point we have proved the lemma in the $L^2$ case and a localization
argument remains. Given $t$, pick $k$ so that $\rho_k > t$; then $\tau_k > \sigma + t$. Use the fact that
$\{\tau_k\}$ and $\{\rho_k\}$ are localizing sequences for their respective integrals:
\begin{align*}
(\bar{G} \cdot \bar{M})_t &= \bigl( (1_{(0,\rho_k]} \bar{G}) \cdot \bar{M}^{\rho_k} \bigr)_t
= \bigl( (1_{(\sigma,\infty)} 1_{[0,\tau_k]} G) \cdot M^{\tau_k} \bigr)_{\sigma+t} \\
&= \bigl( (1_{(\sigma,\infty)} G) \cdot M \bigr)_{\sigma+t}
= (G \cdot M)_{\sigma+t} - (G \cdot M)_\sigma.
\end{align*}
This completes the proof of the lemma. $\square$
Proof of Theorem 5.45. $\bar{Y}$ is a semimartingale because by Lemma 5.47
the restarting operation preserves the local $L^2$-martingale part, while the
FV part is preserved by a direct argument. If $Y$ were an FV process, the
integral identity (5.48) would be evident as a path-by-path identity. And
again Lemma 5.47 takes care of the local $L^2$-martingale part of the integral,
so (5.48) is proved for a semimartingale $Y$. $\square$
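When $Y$ is an FV process, (5.48) really is a path-by-path statement, so it can be tested directly on a discrete example. In the sketch below (our notation, not the book's) restarting at $\sigma$ shifts both the integrand and the jump times of the integrator, and the two sides of (5.48) agree exactly.

```python
# Sketch: identity (5.48) checked path-by-path for a pure-jump FV
# integrator Y whose Lebesgue-Stieltjes measure dY is a finite sum
# of point masses.

def ls_integral(G, jumps, a, b):
    """int over (a, b] of G dY when dY is a finite set of point masses."""
    return sum(G(s) * dy for s, dy in jumps.items() if a < s <= b)

G = lambda s: 1.0 + s                     # a sample integrand path
jumps = {0.4: 1.0, 1.1: -0.5, 1.9: 2.0}   # jump sizes dY(s) of the FV path Y
sigma, t = 0.8, 1.5                       # restart time and horizon

G_bar = lambda s: G(sigma + s)                                       # G(sigma+s)
jumps_bar = {s - sigma: dy for s, dy in jumps.items() if s > sigma}  # shifted Y

lhs = ls_integral(G_bar, jumps_bar, 0.0, t)      # int_(0,t] G_bar dY_bar
rhs = ls_integral(G, jumps, sigma, sigma + t)    # int_(sigma, sigma+t] G dY
assert abs(lhs - rhs) < 1e-12
```

The martingale part of a general semimartingale requires the $L^2$ machinery of Lemma 5.47, but the bookkeeping with shifted jump times is the same.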
5.4.4. Stopping just before a stopping time. Let $\tau$ be a stopping time
and $Y$ a cadlag process. The process $Y^{\tau-}$ is defined by
\[
(5.53)\qquad Y^{\tau-}(t) =
\begin{cases}
Y(0), & t = 0 \text{ or } \tau = 0, \\
Y(t), & 0 < t < \tau, \\
Y(\tau-), & 0 < \tau \le t.
\end{cases}
\]
In other words, the process $Y$ has been stopped just prior to the stopping
time. This type of stopping is useful for processes with jumps. For example,
if
\[
\tau = \inf\{t \ge 0 : |Y(t)| \ge r \text{ or } |Y(t-)| \ge r\}
\]
then $|Y^\tau| \le r$ may fail if $Y$ jumped exactly at time $\tau$, but $|Y^{\tau-}| \le r$ is true.

For continuous processes $Y^{\tau-}$ and $Y^\tau$ coincide. More precisely, the
relation between the two stoppings is that
\[
Y^\tau(t) = Y^{\tau-}(t) + \Delta Y(\tau)\, 1\{t \ge \tau\}.
\]
In other words, only a jump of $Y$ at $\tau$ can produce a difference, and that is
not felt until $t$ reaches $\tau$.
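On a discrete time grid the two stoppings are easy to compare side by side. The sketch below is our own illustration (not from the text); `tau` is a grid index, assumed to be at least 1, and the final loop checks the displayed relation between $Y^\tau$ and $Y^{\tau-}$.

```python
def stopped(path, tau_idx):
    """Y^tau on a grid: freeze the path at index tau_idx."""
    return [path[min(i, tau_idx)] for i in range(len(path))]

def stopped_before(path, tau_idx):
    """Y^{tau-}: freeze at the left limit, one step before tau_idx (>= 1)."""
    return [path[tau_idx - 1] if i >= tau_idx else path[i]
            for i in range(len(path))]

path = [0.0, 1.0, 1.0, 4.0, 4.0, 2.0]   # a cadlag path sampled on a grid
tau = 3                                  # stopping index; the path jumps there
jump = path[tau] - path[tau - 1]         # Delta Y(tau) = 3.0

ytau, ytau_minus = stopped(path, tau), stopped_before(path, tau)
for i in range(len(path)):
    # Y^tau(t) = Y^{tau-}(t) + Delta Y(tau) 1{t >= tau}
    assert ytau[i] == ytau_minus[i] + (jump if i >= tau else 0.0)
```

If the path has no jump at `tau`, the two stopped paths coincide, matching the remark about continuous processes.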
The next example shows that stopping just before can fail to preserve
the martingale property. But it does preserve a semimartingale, because a
single jump can be moved to the FV part, as evidenced in the proof of the
lemma after the example.
Example 5.48. Let $N$ be a rate $\alpha$ Poisson process and $M_t = N_t - \alpha t$.
Let $\tau$ be the time of the first jump of $N$. Then $N^{\tau-} = 0$, from which
\[
M^{\tau-}_t = -\alpha t\, 1\{\tau > t\} - \alpha\tau\, 1\{\tau \le t\}.
\]
This process cannot be a martingale because
$M^{\tau-}_0 = 0$ while $M^{\tau-}_t < 0$ for all $t > 0$. One can also check that the expectation
of $M^{\tau-}_t$ is not constant in $t$, which violates martingale behavior.
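A quick Monte Carlo check of the example (our own sketch, taking $\alpha = 1$): since $N^{\tau-} = 0$ we have $M^{\tau-}_t = -\alpha(t \wedge \tau)$ with $\tau \sim \mathrm{Exp}(\alpha)$, so $E[M^{\tau-}_t] = -(1 - e^{-\alpha t})$, visibly non-constant in $t$.

```python
# Sketch: E[M^{tau-}_t] = -(1 - exp(-alpha t)) for the compensated
# Poisson process stopped just before its first jump time tau.
import math
import random

random.seed(0)
alpha, n_samples = 1.0, 200_000
taus = [random.expovariate(alpha) for _ in range(n_samples)]  # first jump times

for t in (0.5, 1.0, 2.0):
    # M^{tau-}_t = -alpha * min(t, tau), since N^{tau-} = 0
    mc = -alpha * sum(min(t, tau) for tau in taus) / n_samples
    exact = -(1.0 - math.exp(-alpha * t))   # = -alpha * E[min(t, tau)]
    assert abs(mc - exact) < 0.01
```

The strictly decreasing expectation is exactly the failure of the martingale property claimed in the example.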
Lemma 5.49. Let $Y$ be a semimartingale and $\tau$ a stopping time. Then
$Y^{\tau-}$ is a semimartingale.

Proof. Let $Y = Y_0 + M + U$ be a decomposition of $Y$ into a local $L^2$-martingale
$M$ and an FV process $U$. Then
\[
(5.54)\qquad Y^{\tau-}_t = Y_0 + M^\tau_t + U^{\tau-}_t - \Delta M_\tau\, 1\{t \ge \tau\}.
\]
$M^\tau$ is a local $L^2$-martingale, and the remaining part
\[
G_t = U^{\tau-}_t - \Delta M_\tau\, 1\{t \ge \tau\}
\]
is an FV process. $\square$
Next we state some properties of integrals stopped just before $\tau$.

Proposition 5.50. Let $Y$ and $Z$ be semimartingales, $G$ and $J$ predictable
processes locally bounded in the sense (5.36), and $\tau$ a stopping time.

(a) $(G \cdot Y)^{\tau-} = (1_{[0,\tau]} G) \cdot (Y^{\tau-}) = G \cdot (Y^{\tau-})$.

(b) If $G = J$ on $[0,\tau]$ and $Y = Z$ on $[0,\tau)$, then $(G \cdot Y)^{\tau-} = (J \cdot Z)^{\tau-}$.
Proof. It suffices to check (a), as (b) is an immediate consequence. Using
the semimartingale decomposition (5.54) for $Y^{\tau-}$ we have
\begin{align*}
(1_{[0,\tau]} G) \cdot Y^{\tau-} &= (1_{[0,\tau]} G) \cdot M^\tau + (1_{[0,\tau]} G) \cdot U^{\tau-} - G(\tau)\,\Delta M_\tau\, 1_{[\tau,\infty)} \\
&= (G \cdot M)^\tau + (G \cdot U)^{\tau-} - G(\tau)\,\Delta M_\tau\, 1_{[\tau,\infty)},
\end{align*}
and the same conclusion also without the factor $1_{[0,\tau]}$:
\begin{align*}
G \cdot Y^{\tau-} &= G \cdot M^\tau + G \cdot U^{\tau-} - G(\tau)\,\Delta M_\tau\, 1_{[\tau,\infty)} \\
&= (G \cdot M)^\tau + (G \cdot U)^{\tau-} - G(\tau)\,\Delta M_\tau\, 1_{[\tau,\infty)}.
\end{align*}
In the steps above, the equalities
\[
(1_{[0,\tau]} G) \cdot M^\tau = G \cdot M^\tau = (G \cdot M)^\tau
\]
were obtained from (5.32). The equality
\[
(1_{[0,\tau]} G) \cdot U^{\tau-} = G \cdot U^{\tau-} = (G \cdot U)^{\tau-}
\]
comes from path-by-path Lebesgue-Stieltjes integration: since $U^{\tau-}$ froze
just prior to $\tau$, its Lebesgue-Stieltjes measure satisfies $\Lambda_{U^{\tau-}}(A) = 0$ for any
Borel set $A \subseteq [\tau, \infty)$, and so for any bounded Borel function $g$ and any
$t \ge \tau$,
\[
\int_{[0,t]} g\, d\Lambda_{U^{\tau-}} = \int_{[0,\tau)} g\, d\Lambda_{U^{\tau-}}.
\]
Lastly, note that by Theorem 5.41
\begin{align*}
(G \cdot M)^\tau_t - G(\tau)\,\Delta M_\tau\, 1_{[\tau,\infty)}(t)
&= (G \cdot M)^\tau_t - \Delta(G \cdot M)_\tau\, 1_{[\tau,\infty)}(t) \\
&= (G \cdot M)^{\tau-}_t.
\end{align*}
Applying this change above gives
\[
(1_{[0,\tau]} G) \cdot Y^{\tau-} = (G \cdot M)^{\tau-} + (G \cdot U)^{\tau-} = (G \cdot Y)^{\tau-}
\]
and completes the proof. $\square$
5.5. Integrator with absolutely continuous Doleans measure

This section is an appendix to the chapter, somewhat apart from the main
development, and can be skipped.

In Chapter 4 we saw that any measurable, adapted process can be integrated
with respect to Brownian motion, provided the process is locally
in $\mathcal{L}_2$. On the other hand, the integral of Sections 5.1 and 5.2 is restricted
to predictable processes. How does the more general Brownian integral fit
in the general theory? Do we need to develop separately a more general
integral for other individual processes besides Brownian motion?

In this section we partially settle the issue by showing that when the
Doleans measure $\mu_M$ of a local $L^2$-martingale $M$ is absolutely continuous
with respect to $m \otimes P$, then all measurable adapted processes can be integrated
(subject to the localization requirement). In particular, this applies
to Brownian motion to give all the integrals defined in Section 4, and also
applies to the compensated Poisson process $M_t = N_t - \alpha t$.

The extension is somewhat illusory though. We do not obtain genuinely
new integrals. Instead we show that every measurable adapted process $X$
is equivalent, in a sense to be made precise below, to a predictable process
$\tilde{X}$. Then we define $X \cdot M = \tilde{X} \cdot M$. This will be consistent with our earlier
definitions because it was already the case that $\mu_M$-equivalent processes have
equal stochastic integrals.
5.5. Non-predictable integrands 183
For the duration of this section, fix a local $L^2$-martingale $M$ with Doleans
measure $\mu_M$. The Doleans measure of a local $L^2$-martingale is defined exactly
as for $L^2$-martingales, see Remark 5.33. We assume that $\mu_M$ is absolutely
continuous with respect to $m \otimes P$ on the predictable $\sigma$-algebra $\mathcal{P}$.
Recall that absolute continuity, abbreviated $\mu_M \ll m \otimes P$, means that if
$m \otimes P(D) = 0$ for some $D \in \mathcal{P}$, then $\mu_M(D) = 0$.

Let $\tilde{\mathcal{P}}$ be the $\sigma$-algebra generated by the predictable $\sigma$-algebra $\mathcal{P}$ and
all sets $D \in \mathcal{B}_{\mathbb{R}_+} \otimes \mathcal{F}$ with $m \otimes P(D) = 0$. Equivalently, as checked in
Exercise 1.7(e),
\[
(5.55)\qquad \tilde{\mathcal{P}} = \{ G \in \mathcal{B}_{\mathbb{R}_+} \otimes \mathcal{F} : \text{there exists } A \in \mathcal{P} \text{ such that } m \otimes P(G \triangle A) = 0 \}.
\]
Definition 5.51. Suppose $\mu_M$ is absolutely continuous with respect to $m \otimes P$
on the predictable $\sigma$-algebra $\mathcal{P}$. By the Radon-Nikodym Theorem, there
exists a $\mathcal{P}$-measurable function $f_M \ge 0$ on $\mathbb{R}_+ \times \Omega$ such that
\[
(5.56)\qquad \mu_M(A) = \int_A f_M\, d(m \otimes P) \quad\text{for } A \in \mathcal{P}.
\]
Define a measure $\tilde{\mu}_M$ on the $\sigma$-algebra $\tilde{\mathcal{P}}$ by
\[
(5.57)\qquad \tilde{\mu}_M(G) = \int_G f_M\, d(m \otimes P) \quad\text{for } G \in \tilde{\mathcal{P}}.
\]
The measure $\tilde{\mu}_M$ is an extension of $\mu_M$ from $\mathcal{P}$ to the larger $\sigma$-algebra $\tilde{\mathcal{P}}$.
Firstly, definition (5.57) makes sense because $f_M$ and $G$ are $\mathcal{B}_{\mathbb{R}_+} \otimes \mathcal{F}$-measurable,
so they can be integrated against the product measure $m \otimes P$.
Second, if $G \in \mathcal{P}$, comparison of (5.56) and (5.57) shows $\tilde{\mu}_M(G) = \mu_M(G)$,
so $\tilde{\mu}_M$ is an extension of $\mu_M$.
Note also the following. Suppose $G$ and $A$ are in the relationship (5.55)
that characterizes $\tilde{\mathcal{P}}$. Then $\tilde{\mu}_M(G) = \mu_M(A)$. To see this, write
\begin{align*}
\tilde{\mu}_M(G) &= \tilde{\mu}_M(A) + \tilde{\mu}_M(G \setminus A) - \tilde{\mu}_M(A \setminus G) \\
&= \mu_M(A) + \int_{G \setminus A} f_M\, d(m \otimes P) - \int_{A \setminus G} f_M\, d(m \otimes P) = \mu_M(A)
\end{align*}
where the last equality follows from
\[
m \otimes P(G \setminus A) + m \otimes P(A \setminus G) = m \otimes P(G \triangle A) = 0.
\]
The key facts that underlie the extension of the stochastic integral are
assembled in the next lemma.
Lemma 5.52. Let $X$ be an adapted, measurable process. Then there exists
a $\mathcal{P}$-measurable process $\tilde{X}$ such that
\[
(5.58)\qquad m \otimes P\{ (t,\omega) \in \mathbb{R}_+ \times \Omega : X(t,\omega) \ne \tilde{X}(t,\omega) \} = 0.
\]
In particular, all measurable adapted processes are $\tilde{\mathcal{P}}$-measurable.

Under the assumption $\mu_M \ll m \otimes P$, we also have
\[
(5.59)\qquad \tilde{\mu}_M\{ (t,\omega) \in \mathbb{R}_+ \times \Omega : X(t,\omega) \ne \tilde{X}(t,\omega) \} = 0.
\]
Following our earlier conventions, we say that $X$ and $\tilde{X}$ are $\tilde{\mu}_M$-equivalent.
Proof. Let $X$ be a bounded, adapted, measurable process. By Lemma 4.2
there exists a sequence of simple predictable processes $X_n$ such that
\[
(5.60)\qquad E \int_{[0,T]} |X - X_n|^2\, ds \to 0 \quad\text{for all } T < \infty.
\]
We claim that there exists a subsequence $X_{n_k}$ such that $X_{n_k}(t,\omega) \to X(t,\omega)$
$m \otimes P$-almost everywhere on $\mathbb{R}_+ \times \Omega$. We perform this construction
with the usual diagonal argument. In general, $L^2$ convergence implies almost
everywhere convergence along some subsequence. Thus from (5.60) for $T = 1$
we can extract a subsequence $\{X_{n^1_j} : j \in \mathbb{N}\}$ such that $X_{n^1_j} \to X$ $m \otimes P$-almost
everywhere on the set $[0,1] \times \Omega$. Inductively, suppose we have a
subsequence $\{X_{n^\ell_j} : j \in \mathbb{N}\}$ such that $\lim_j X_{n^\ell_j} = X$ $m \otimes P$-almost
everywhere on the set $[0,\ell] \times \Omega$. Then apply (5.60) for $T = \ell + 1$ to extract
a subsequence $\{n^{\ell+1}_j : j \in \mathbb{N}\}$ of $\{n^\ell_j : j \in \mathbb{N}\}$ such that $\lim_j X_{n^{\ell+1}_j} = X$
$m \otimes P$-almost everywhere on the set $[0,\ell+1] \times \Omega$.
From the array $\{n^\ell_j : \ell, j \in \mathbb{N}\}$ thus constructed, take the diagonal
$n_k = n^k_k$ for $k \in \mathbb{N}$. For any $\ell$, $\{n_k : k \ge \ell\}$ is a subsequence of $\{n^\ell_j : j \in \mathbb{N}\}$,
and consequently $X_{n_k} \to X$ $m \otimes P$-almost everywhere on the set $[0,\ell] \times \Omega$.

Let
\[
A = \{ (t,\omega) \in \mathbb{R}_+ \times \Omega : X_{n_k}(t,\omega) \to X(t,\omega) \text{ as } k \to \infty \}.
\]
The last observation on $\{n_k\}$ implies that
\[
A^c = \bigcup_{\ell=1}^\infty \{ (t,\omega) \in [0,\ell] \times \Omega : X_{n_k}(t,\omega) \text{ does not converge to } X(t,\omega) \}
\]
is a countable union of sets of $m \otimes P$-measure zero. Thus $X_{n_k} \to X$ $m \otimes P$-almost
everywhere on $\mathbb{R}_+ \times \Omega$. Set
\[
\tilde{X}(t,\omega) = \limsup_{k \to \infty} X_{n_k}(t,\omega).
\]
The processes $X_{n_k}$ are $\mathcal{P}$-measurable by construction. Consequently $\tilde{X}$ is
$\mathcal{P}$-measurable. On the set $A$, $\tilde{X} = X$. Since $m \otimes P(A^c) = 0$, (5.58) follows
for this $X$.

For a Borel set $B \in \mathcal{B}_{\mathbb{R}}$,
\begin{align*}
\{ (t,\omega) \in \mathbb{R}_+ \times \Omega : X(t,\omega) \in B \}
&= \{ (t,\omega) \in A : X(t,\omega) \in B \} \cup \{ (t,\omega) \in A^c : X(t,\omega) \in B \} \\
&= \{ (t,\omega) \in A : \tilde{X}(t,\omega) \in B \} \cup \{ (t,\omega) \in A^c : X(t,\omega) \in B \}.
\end{align*}
This expresses $\{X \in B\}$ as a set that differs from $\{\tilde{X} \in B\} \in \mathcal{P}$ only within
the $m \otimes P$-null set $A^c$. So $\{X \in B\} \in \tilde{\mathcal{P}}$.

The lemma has now been proved for a bounded adapted measurable
process $X$.
Given an arbitrary adapted measurable process $X$, let $X^{(k)} = (X \wedge k) \vee (-k)$.
By the previous part, $X^{(k)}$ is $\tilde{\mathcal{P}}$-measurable, and there exist
$\mathcal{P}$-measurable processes $\tilde{X}_k$ such that $X^{(k)} = \tilde{X}_k$ $m \otimes P$-almost everywhere.
Define (again) $\tilde{X} = \limsup_k \tilde{X}_k$. Since $X^{(k)} \to X$ pointwise, $X$ is also
$\tilde{\mathcal{P}}$-measurable, and $X = \tilde{X}$ $m \otimes P$-almost everywhere. $\square$

A consequence of the lemma is that it makes sense to talk about the
$\tilde{\mu}_M$-measure of any event involving measurable, adapted processes.
Definition 5.53. Assume $M$ is a local $L^2$-martingale whose Doleans measure
$\mu_M$ satisfies $\mu_M \ll m \otimes P$ on $\mathcal{P}$. Define the extension $\tilde{\mu}_M$ of $\mu_M$ to $\tilde{\mathcal{P}}$
as in Definition 5.51.

Let $\mathcal{L}(M, \tilde{\mathcal{P}})$ be the class of measurable adapted processes $X$ for which
there exists a nondecreasing sequence of stopping times $\{\sigma_k\}$ such that $\sigma_k \nearrow \infty$
almost surely, and for each $k$,
\[
\int_{[0,T]} 1_{[0,\sigma_k]} |X|^2\, d\tilde{\mu}_M < \infty \quad\text{for all } T < \infty.
\]
For $X \in \mathcal{L}(M, \tilde{\mathcal{P}})$, the stochastic integral $X \cdot M$ is the local $L^2$-martingale
given by $\tilde{X} \cdot M$, where $\tilde{X}$ is the $\mathcal{P}$-measurable process that is $\tilde{\mu}_M$-equivalent
to $X$ in the sense (5.59). This $\tilde{X}$ will lie in $\mathcal{L}(M, \mathcal{P})$ and so the process $\tilde{X} \cdot M$
exists. The process $X \cdot M$ thus defined is unique up to indistinguishability.
Justification of the definition. Since $\tilde{X} = X$ $\tilde{\mu}_M$-almost everywhere,
their $\tilde{\mu}_M$-integrals agree, and in particular
\[
\int_{[0,T]} 1_{[0,\sigma_k]} |\tilde{X}|^2\, d\mu_M = \int_{[0,T]} 1_{[0,\sigma_k]} |X|^2\, d\tilde{\mu}_M < \infty \quad\text{for all } T < \infty.
\]
By Lemma 5.34, this is just another way of expressing $\tilde{X} \in \mathcal{L}(M, \mathcal{P})$. It follows
that the integral $\tilde{X} \cdot M$ exists and is a member of $\mathcal{M}_{2,\text{loc}}$. In particular,
we can define $X \cdot M = \tilde{X} \cdot M$ as an element of $\mathcal{M}_{2,\text{loc}}$.

If we choose another $\mathcal{P}$-measurable process $\tilde{Y}$ that is $\tilde{\mu}_M$-equivalent to
$X$, then $\tilde{X}$ and $\tilde{Y}$ are $\mu_M$-equivalent, and the integrals $\tilde{X} \cdot M$ and $\tilde{Y} \cdot M$ are
indistinguishable by Exercise 5.8. $\square$
If $M$ is an $L^2$-martingale to begin with, such as Brownian motion or the
compensated Poisson process, we naturally define $\mathcal{L}_2(M, \tilde{\mathcal{P}})$ as the space
of measurable adapted processes $X$ such that
\[
\int_{[0,T]} |X|^2\, d\tilde{\mu}_M < \infty \quad\text{for all } T < \infty.
\]
The integral extended to $\mathcal{L}(M, \tilde{\mathcal{P}})$ or $\mathcal{L}_2(M, \tilde{\mathcal{P}})$ enjoys all the properties
derived before, because the class of processes that appear as stochastic integrals
has not been expanded.

A technical point worth noting is that once the integrand is not predictable,
the stochastic integral does not necessarily coincide with the Lebesgue-Stieltjes
integral even if the integrator has paths of bounded variation.
This is perfectly illustrated by the Poisson process in Exercise 5.11. While
this may seem unnatural, we must remember that the theory of Ito stochastic
integrals is so successful because integrals are martingales.
Exercises
Exercise 5.1. Show that if $X : \mathbb{R}_+ \times \Omega \to \mathbb{R}$ is $\mathcal{P}$-measurable, then
$X_t(\omega) = X(t,\omega)$ is a process adapted to the filtration $\{\mathcal{F}_t\}$.

Hint: Given $B \in \mathcal{B}_{\mathbb{R}}$, let $A = \{(t,\omega) : X(t,\omega) \in B\} \in \mathcal{P}$. The event
$\{X_t \in B\}$ equals the $t$-section $A_t = \{\omega : (t,\omega) \in A\}$, so it suffices to show
that an arbitrary $A \in \mathcal{P}$ satisfies $A_t \in \mathcal{F}_t$ for all $t \in \mathbb{R}_+$. This follows
from checking that predictable rectangles have this property, and that the
collection of sets in $\mathcal{B}_{\mathbb{R}_+} \otimes \mathcal{F}$ with this property forms a sub-$\sigma$-field.
Exercise 5.2. (a) Show that for any Borel function $h : \mathbb{R}_+ \to \mathbb{R}$, the
deterministic process $X(t,\omega) = h(t)$ is predictable. Hint: Intervals of the
type $(a,b]$ generate the Borel $\sigma$-field of $\mathbb{R}_+$.

(b) Suppose $X$ is an $\mathbb{R}^m$-valued predictable process and $g : \mathbb{R}^m \to \mathbb{R}^d$
a Borel function. Show that the process $Z_t = g(X_t)$ is predictable.

(c) Suppose $X$ is an $\mathbb{R}^m$-valued predictable process and $f : \mathbb{R}_+ \times \mathbb{R}^m \to \mathbb{R}^d$
a Borel function. Show that the process $W_t = f(t, X_t)$ is predictable.
Hint: Parts (a) and (b) imply that $h(t)g(X_t)$ is predictable. Now appeal
to results around the $\pi$-$\lambda$ Theorem.
Exercise 5.3. Fill in the missing details of the proof that $\mathcal{P}$ is generated
by continuous processes (see Lemma 5.1).
Exercise 5.4. Let $N_t$ be a rate $\alpha$ Poisson process on $\mathbb{R}_+$ and $M_t = N_t - \alpha t$.
In Example 5.3 we checked that $\mu_M = \alpha\, m \otimes P$ on $\mathcal{P}$. Here we use this result
to show that the process $N$ is not $\mathcal{P}$-measurable.

(a) Using $\mu_M = \alpha\, m \otimes P$ evaluate the integral
\[
\int_{[0,T] \times \Omega} N(s,\omega)\, \mu_M(ds, d\omega).
\]

(b) Evaluate
\[
E \int_{[0,T]} N_s\, d[M]_s
\]
where the inner integral is the pathwise Lebesgue-Stieltjes integral, in accordance
with the interpretation of definition (5.1) of $\mu_M$. Conclude that $N$
cannot be $\mathcal{P}$-measurable.

(c) For comparison, evaluate explicitly
\[
E \int_{(0,T]} N_{s-}\, dN_s.
\]
Here $N_{s-}(\omega) = \lim_{u \nearrow s} N_u(\omega)$ is the left limit. Explain why we know without
calculation that the answer must agree with part (a).
Exercise 5.5. Let $\{\mathcal{F}_t\}$ be a filtration, $\mathcal{P}$ the corresponding predictable
$\sigma$-field, $\mathcal{F}_{t+}$ defined as in (2.4), and $\mathcal{P}_+$ the predictable $\sigma$-field that
corresponds to $\{\mathcal{F}_{t+}\}$. In other words, $\mathcal{P}_+$ is the $\sigma$-field generated by the sets
$\{0\} \times F_0$ for $F_0 \in \mathcal{F}_{0+}$ and $(s,t] \times F$ for $F \in \mathcal{F}_{s+}$.

(a) Show that each $(s,t] \times F$ for $F \in \mathcal{F}_{s+}$ lies in $\mathcal{P}$.

(b) Show that a set of the type $\{0\} \times F_0$ for $F_0 \in \mathcal{F}_{0+} \setminus \mathcal{F}_0$ cannot lie in
$\mathcal{P}$.

(c) Show that the $\sigma$-fields $\mathcal{P}$ and $\mathcal{P}_+$ coincide on the subspace $(0,\infty) \times \Omega$
of $\mathbb{R}_+ \times \Omega$. Hint: Apply part (d) of Exercise 1.7.

(d) Suppose $X$ is a $\mathcal{P}_+$-measurable process. Show that the process
$Y(t,\omega) = X(t,\omega)\,1_{(0,\infty)}(t)$ is $\mathcal{P}$-measurable.
Exercise 5.6. Suppose $M$ is a continuous $L^2$-martingale and $X \in \mathcal{L}_2(M, \mathcal{P})$.
Show that $(1_{(a,b)} X) \cdot M = (1_{[a,b]} X) \cdot M$. Hint: $[M]$ is continuous.
Exercise 5.7. Let $X$ be a predictable process. Show that
\[
\int_0^T X(s,\omega)^2\, ds < \infty \quad\text{for all } T < \infty, \text{ for } P\text{-almost every } \omega,
\]
if and only if there exist stopping times $\sigma_k \nearrow \infty$ such that
\[
E \int_0^{\sigma_k(\omega) \wedge T} X(s,\omega)^2\, ds < \infty \quad\text{for all } T < \infty.
\]
From this argue the characterization of $\mathcal{L}(M, \mathcal{P})$ for Brownian motion and
the compensated Poisson process claimed in Example 5.26.
Exercise 5.8. Let $M$ be a local $L^2$-martingale. Show that if $X, Y \in \mathcal{L}(M, \mathcal{P})$
are $\mu_M$-equivalent, namely
\[
\mu_M\{ (t,\omega) : X(t,\omega) \ne Y(t,\omega) \} = 0,
\]
then $X \cdot M$ and $Y \cdot M$ are indistinguishable.
Exercise 5.9. Finish the proof of Proposition 5.31.
Exercise 5.10. (Computations with the Poisson process.) Let $N$ be a
homogeneous rate $\alpha$ Poisson process, and $M_t = N_t - \alpha t$ the compensated
Poisson process, which is an $L^2$-martingale. Let $0 < \tau_1 < \tau_2 < \cdots < \tau_{N(t)}$
denote the jump times of $N$ in $(0,t]$.

(a) Show that
\[
E\Bigl[\, \sum_{i=1}^{N(t)} \tau_i \Bigr] = \frac{\alpha t^2}{2}.
\]
Hint: For a homogeneous Poisson process, given that there are $n$ jumps in an
interval $I$, the locations of the jumps are $n$ i.i.d. uniform random variables
in $I$.
(b) Compute the integral
\[
\int_{(0,t]} N(s-)\, dM(s).
\]
"Compute" means to find a reasonably simple formula in terms of $N_t$ and
the $\tau_i$'s. One way is to justify the evaluation of this integral as a Lebesgue-Stieltjes
integral.

(c) Use the formula you obtained in part (b) to check that the process
$\int N(s-)\, dM(s)$ is a martingale. (Of course, this conclusion is part of the
theory, but the point here is to obtain it through hands-on computation.
Part (a) and Exercise 2.22 take care of parts of the work.)

(d) Suppose $N$ were predictable. Then the stochastic integral $\int N\, dM$
would exist and be a martingale. Show that this is not true and conclude
that $N$ cannot be predictable.

Hints: It might be easiest to find
\[
\int_{(0,t]} N(s)\, dM(s) - \int_{(0,t]} N(s-)\, dM(s) = \int_{(0,t]} \bigl( N(s) - N(s-) \bigr)\, dM(s)
\]
and use the fact that the integral of $N(s-)$ is a martingale.
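Part (a) of Exercise 5.10 lends itself to a Monte Carlo sanity check. The sketch below (ours, not part of the exercise) simulates Poisson paths by summing exponential interarrival times and compares the empirical mean of $\sum_{i \le N(t)} \tau_i$ with $\alpha t^2/2$.

```python
# Sketch: Monte Carlo check of E[ sum_{i<=N(t)} tau_i ] = alpha t^2 / 2.
import random

random.seed(1)
alpha, t, n_paths = 2.0, 1.5, 100_000

def sum_of_jump_times(alpha, t):
    """Simulate one Poisson path on (0, t] and return the sum of its jump times."""
    s, total = 0.0, 0.0
    while True:
        s += random.expovariate(alpha)   # next interarrival time
        if s > t:
            return total
        total += s

mc = sum(sum_of_jump_times(alpha, t) for _ in range(n_paths)) / n_paths
assert abs(mc - alpha * t * t / 2) < 0.05   # exact value: 2 * 1.5^2 / 2 = 2.25
```

The hint's conditional-uniformity property is what makes the closed form $\alpha t^2/2$ drop out: given $N(t) = n$, each jump time has conditional mean $t/2$.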
Exercise 5.11. (Extended stochastic integral of the Poisson process.) Let
$N_t$ be a rate $\alpha$ Poisson process, $M_t = N_t - \alpha t$ and $N^-(t) = N(t-)$. Show
that $N^-$ is a modification of $N$, and
\[
\tilde{\mu}_M\{ (t,\omega) : N_t(\omega) \ne N_{t-}(\omega) \} = 0.
\]
Thus the stochastic integral $N \cdot M$ can be defined according to the extension
in Section 5.5, and this $N \cdot M$ must agree with $N^- \cdot M$.
Exercise 5.12. (Riemann sum approximation in $\mathcal{M}_2$.) Let $M$ be an $L^2$-martingale,
$X \in \mathcal{L}_2(M, \mathcal{P})$, and assume $X$ also satisfies the hypotheses of
Proposition 5.32. Let $\pi_m = \{0 = t^m_1 < t^m_2 < t^m_3 < \cdots\}$ be partitions such
that $\operatorname{mesh} \pi_m \to 0$. Let $\xi^{k,m}_i = (X_{t^m_i} \wedge k) \vee (-k)$, and define the simple
predictable processes
\[
W^{k,m,n}_t = \sum_{i=1}^n \xi^{k,m}_i\, 1_{(t^m_i,\, t^m_{i+1}]}(t).
\]
Then there exist subsequences $\{m(k)\}$ and $\{n(k)\}$ such that
\[
\lim_{k \to \infty} \bigl\| X \cdot M - W^{k,m(k),n(k)} \cdot M \bigr\|_{\mathcal{M}_2} = 0.
\]
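For $M$ = Brownian motion and $X = B$, the Riemann approximation of this exercise can be watched numerically: left-endpoint sums for $\int_0^t B\,dB$ approach the Itô value $(B_t^2 - t)/2$, with mean-square error of order $1/n$. A seeded sketch (our own, not from the text):

```python
# Sketch: left-endpoint Riemann sums for int_0^t B dB converge in L^2
# to (B_t^2 - t)/2 as the mesh shrinks.
import math
import random

random.seed(2)

def left_riemann_vs_exact(n_steps, t=1.0):
    """One path: the left-endpoint sum and the Ito value (B_t^2 - t)/2."""
    dt = t / n_steps
    b, integral = 0.0, 0.0
    for _ in range(n_steps):
        db = random.gauss(0.0, math.sqrt(dt))
        integral += b * db               # left endpoint: predictable integrand
        b += db
    return integral, (b * b - t) / 2.0

def mse(n_steps, n_paths=500):
    tot = 0.0
    for _ in range(n_paths):
        s, exact = left_riemann_vs_exact(n_steps)
        tot += (s - exact) ** 2
    return tot / n_paths

assert mse(2000) < mse(10)   # finer mesh gives a smaller L^2 error
```

The left endpoint matters: evaluating the integrand at the right endpoint instead would converge to a different (Stratonovich-shifted) limit, which is why the exercise insists on the predictable evaluation points $t^m_i$.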
Exercise 5.13. Let $0 < a < b < \infty$ be constants and $M \in \mathcal{M}_{2,\text{loc}}$. Find
the stochastic integral
\[
\int_{(0,t]} 1_{[a,b)}(s)\, dM_s.
\]
Hint: Check that if $M \in \mathcal{M}_2$ then $1_{(a-1/n,\, b-1/n]}$ converges to $1_{[a,b)}$ in
$\mathcal{L}_2(M, \mathcal{P})$.
Chapter 6

Itô's Formula

Itô's formula works as a fundamental theorem of calculus in stochastic analysis.
As preparation for its statement and proof, we extend our earlier
treatment of quadratic variation and covariation. The reader who wishes to
get quickly to the proof of Itô's formula can glance at Proposition 6.10 and
move on to Section 6.2.

Right-continuity of the filtration $\{\mathcal{F}_t\}$ is not needed for the proofs of this
chapter. This property might be needed for defining the stochastic integral
with respect to a semimartingale one wants to work with. As explained in
the beginning of Section 5.3, under this assumption we can apply Theorem
3.20 (fundamental theorem of local martingales) to guarantee that every
semimartingale has a decomposition whose local martingale part is a local
$L^2$-martingale. This $L^2$ property was used for the construction of the stochastic
integral. The problem can arise only when the local martingale part
can have unbounded jumps. When the local martingale is continuous or its
jumps have a uniform bound, it can be localized into an $L^2$-martingale and
the fundamental theorem is not needed.
6.1. Quadratic variation

Recall from Section 2.2 that the quadratic variation $[X]$ of a process $X$,
when it exists, is by definition a nondecreasing process with $[X]_0 = 0$ whose
value at time $t$ is determined, up to a null event, by the limit in probability
\[
(6.1)\qquad [X]_t = \lim_{\operatorname{mesh}(\pi) \to 0} \sum_i (X_{t_{i+1}} - X_{t_i})^2.
\]
Here $\pi = \{0 = t_0 < t_1 < \cdots < t_{m(\pi)} = t\}$ is a partition of $[0,t]$. Quadratic
covariation $[X,Y]$ of two processes $X$ and $Y$ was defined as the FV process
\[
(6.2)\qquad [X,Y] = [(X+Y)/2] - [(X-Y)/2],
\]
assuming the processes on the right exist. Again there is a limit in probability:
\[
(6.3)\qquad \lim_{\operatorname{mesh}(\pi) \to 0} \sum_i (X_{t_{i+1}} - X_{t_i})(Y_{t_{i+1}} - Y_{t_i}) = [X,Y]_t.
\]
The identity $[X] = [X,X]$ holds. For cadlag semimartingales $X$ and $Y$, $[X]$ and
$[X,Y]$ exist and have cadlag versions (Corollary 3.31).

Limit (6.3) shows that, for processes $X$, $Y$ and $Z$ and reals $\alpha$ and $\beta$,
\[
(6.4)\qquad [\alpha X + \beta Y, Z] = \alpha [X,Z] + \beta [Y,Z]
\]
provided these processes exist. The equality can be taken in the sense of
indistinguishability for cadlag versions, provided such exist. Consequently
$[\,\cdot\,,\,\cdot\,]$ operates somewhat in the manner of an inner product.
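These partition sums are easy to experiment with. In the sketch below (our own illustration), the polarization identity (6.2) holds exactly for every partition, while for simulated Brownian increments the sums concentrate near $[B]_t = t$ and, for an independent pair, near $[B,W]_t = 0$.

```python
# Sketch: partition sums behind (6.1)-(6.3) for simulated Brownian motions.
import math
import random

random.seed(3)
t, n = 1.0, 100_000
dt = t / n
db = [random.gauss(0.0, math.sqrt(dt)) for _ in range(n)]   # increments of B
dw = [random.gauss(0.0, math.sqrt(dt)) for _ in range(n)]   # an independent BM

qv_b = sum(x * x for x in db)                    # partition sum for [B]_t
cov = sum(x * y for x, y in zip(db, dw))         # partition sum for [B, W]_t
polar = (sum(((x + y) / 2) ** 2 for x, y in zip(db, dw))
         - sum(((x - y) / 2) ** 2 for x, y in zip(db, dw)))

assert abs(qv_b - t) < 0.05       # [B]_t = t for Brownian motion
assert abs(cov) < 0.05            # independent Brownian motions: [B, W] = 0
assert abs(polar - cov) < 1e-9    # (6.2) holds exactly at the level of sums
```

The last assertion is pure algebra (it would pass for any two sequences of increments); only the first two rely on the probabilistic limits (6.1) and (6.3).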
Lemma 6.1. Suppose $M_n$, $M$, $N_n$ and $N$ are $L^2$-martingales. Fix $0 \le T < \infty$.
Assume $M_n(T) \to M(T)$ and $N_n(T) \to N(T)$ in $L^2$ as $n \to \infty$. Then
\[
E\Bigl[ \sup_{0 \le t \le T} \bigl| [M_n, N_n]_t - [M,N]_t \bigr| \Bigr] \to 0 \quad\text{as } n \to \infty.
\]
Proof. It suffices to consider the case where $M_n = N_n$ and $M = N$, because
the general case then follows from (6.2). Apply inequality (2.16) and note
that $[X]_t$ is nondecreasing in $t$:
\begin{align*}
\bigl| [M_n]_t - [M]_t \bigr| &\le [M_n - M]_t + 2\, [M_n - M]_t^{1/2}\, [M]_t^{1/2} \\
&\le [M_n - M]_T + 2\, [M_n - M]_T^{1/2}\, [M]_T^{1/2}.
\end{align*}
Take expectations, apply Schwarz and recall (3.14):
\[
E\Bigl[ \sup_{0 \le t \le T} \bigl| [M_n]_t - [M]_t \bigr| \Bigr]
\le \| M_n(T) - M(T) \|^2_{L^2(P)} + 2\, \| M_n(T) - M(T) \|_{L^2(P)}\, \| M(T) \|_{L^2(P)}.
\]
Letting $n \to \infty$ completes the proof. $\square$
Proposition 6.2. Let $M, N \in \mathcal{M}_{2,\text{loc}}$, $G \in \mathcal{L}(M, \mathcal{P})$, and $H \in \mathcal{L}(N, \mathcal{P})$.
Then
\[
[G \cdot M, H \cdot N]_t = \int_{(0,t]} G_s H_s\, d[M,N]_s.
\]
Proof. It suffices to show
\[
(6.5)\qquad [G \cdot M, L]_t = \int_{(0,t]} G_s\, d[M, L]_s
\]
for an arbitrary $L \in \mathcal{M}_{2,\text{loc}}$. This is because then (6.5) applied to $L = H \cdot N$
gives
\[
[M, H \cdot N]_t = \int_{(0,t]} H_s\, d[M,N]_s,
\]
so the Lebesgue-Stieltjes measures satisfy $d[M, H \cdot N]_t = H_t\, d[M,N]_t$. Substitute
this back into (6.5) to get the desired equality
\[
[G \cdot M, H \cdot N]_t = \int_{(0,t]} G_s\, d[M, H \cdot N]_s = \int_{(0,t]} G_s H_s\, d[M,N]_s.
\]
Step 1. Assume $L, M \in \mathcal{M}_2$. First consider $G = \xi 1_{(u,v]}$ for a bounded
$\mathcal{F}_u$-measurable random variable $\xi$. Then $G \cdot M = \xi (M^v - M^u)$. By the
bilinearity of quadratic covariation,
\begin{align*}
[G \cdot M, L]_t &= \xi \bigl( [M^v, L]_t - [M^u, L]_t \bigr)
= \xi \bigl( [M,L]_{v \wedge t} - [M,L]_{u \wedge t} \bigr) \\
&= \xi \int_{(0,t]} 1_{(u,v]}(s)\, d[M,L]_s
= \int_{(0,t]} G_s\, d[M,L]_s.
\end{align*}
The second equality above used Lemma 3.29. $\xi$ moves freely in and out of
the integrals because they are path-by-path Lebesgue-Stieltjes integrals. By
additivity of the covariation, conclusion (6.5) follows for $G$ that are simple
predictable processes of the type (5.6).
Now take a general $G \in \mathcal{L}_2(M, \mathcal{P})$. Pick simple predictable processes
$G_n$ such that $G_n \to G$ in $\mathcal{L}_2(M, \mathcal{P})$. Then $(G_n \cdot M)_t \to (G \cdot M)_t$ in $L^2(P)$.
By Lemma 6.1, $[G_n \cdot M, L]_t \to [G \cdot M, L]_t$ in $L^1(P)$. On the other hand, the
previous lines showed
\[
[G_n \cdot M, L]_t = \int_{(0,t]} G_n(s)\, d[M,L]_s.
\]
The desired equality
\[
[G \cdot M, L]_t = \int_{(0,t]} G(s)\, d[M,L]_s
\]
follows if we can show the $L^1(P)$ convergence
\[
\int_{(0,t]} G_n(s)\, d[M,L]_s \to \int_{(0,t]} G(s)\, d[M,L]_s.
\]
The first step below is a combination of the Kunita-Watanabe inequality
(2.17) and the Schwarz inequality. Next, the assumption $G_n \to G$ in
$\mathcal{L}_2(M, \mathcal{P})$ implies $G_n \to G$ in $L^2([0,t] \times \Omega,\, \mu_M)$:
\begin{align*}
E\Bigl| \int_{(0,t]} \bigl( G_n(s) - G(s) \bigr)\, d[M,L]_s \Bigr|
&\le \Bigl( E \int_{(0,t]} | G_n(s) - G(s) |^2\, d[M]_s \Bigr)^{1/2} \bigl( E [L]_t \bigr)^{1/2} \\
&= \Bigl( \int_{(0,t]} | G_n - G |^2\, d\mu_M \Bigr)^{1/2} \bigl( E [ L_t^2 - L_0^2 ] \bigr)^{1/2}
\to 0 \quad\text{as } n \to \infty.
\end{align*}
We have shown (6.5) for the case $L, M \in \mathcal{M}_2$ and $G \in \mathcal{L}_2(M, \mathcal{P})$.
Step 2. Now the case $L, M \in \mathcal{M}_{2,\text{loc}}$ and $G \in \mathcal{L}(M, \mathcal{P})$. Pick stopping
times $\{\tau_k\}$ that localize both $L$ and $(G, M)$. Abbreviate $G^k = 1_{(0,\tau_k]} G$.
Then if $\tau_k(\omega) \ge t$,
\begin{align*}
[G \cdot M, L]_t &= [G \cdot M, L]_{\tau_k \wedge t}
= \bigl[ (G \cdot M)^{\tau_k}, L^{\tau_k} \bigr]_t
= \bigl[ G^k \cdot M^{\tau_k}, L^{\tau_k} \bigr]_t \\
&= \int_{(0,t]} G^k_s\, d[M^{\tau_k}, L^{\tau_k}]_s
= \int_{(0,t]} G_s\, d[M,L]_s.
\end{align*}
We used Lemma 3.29 to move the $\tau_k$ superscript in and out of covariations,
and the definition (5.27) of the integral $G \cdot M$. Since $\tau_k(\omega) \ge t$, we have
$G^k = G$ on $[0,t]$, and consequently $G^k$ can be replaced by $G$ on the last line.
This completes the proof. $\square$
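For $M = N =$ Brownian motion and a deterministic $G = H$, the proposition reduces to $[G \cdot B]_t = \int_0^t G(s)^2\,ds$, since $[B]_s = s$. A seeded sketch (our own illustration) with $G(s) = s$, where the right side is $t^3/3$:

```python
# Sketch: [G.B]_t ~ int_0^t G(s)^2 ds via the partition sum of squared
# increments of the integral process, for G(s) = s and B Brownian motion.
import math
import random

random.seed(4)
t, n = 1.0, 200_000
dt = t / n

qv = 0.0
for i in range(n):
    s = i * dt                                    # left endpoint of the step
    d_int = s * random.gauss(0.0, math.sqrt(dt))  # increment of G . B
    qv += d_int ** 2                              # partition sum for [G.B]_t

assert abs(qv - t ** 3 / 3) < 0.02   # int_0^t s^2 ds = t^3 / 3
```

This is the discrete shadow of the proof's key identity $d[G \cdot M]_s = G_s^2\, d[M]_s$.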
We can complement part (c) of Proposition 5.14 with this result.
Corollary 6.3. Suppose $M, N \in \mathcal{M}_2$, $G \in \mathcal{L}_2(M, \mathcal{P})$, and $H \in \mathcal{L}_2(N, \mathcal{P})$.
Then
\[
(G \cdot M)_t\, (H \cdot N)_t - \int_{(0,t]} G_s H_s\, d[M,N]_s
\]
is a martingale. If we weaken the hypotheses to $M, N \in \mathcal{M}_{2,\text{loc}}$, $G \in \mathcal{L}(M, \mathcal{P})$,
and $H \in \mathcal{L}(N, \mathcal{P})$, then the process above is a local martingale.

Proof. Follows from Proposition 6.2 above, and a general property of $[M,N]$
for (local) $L^2$-martingales $M$ and $N$, stated as Theorem 3.30 in Section
3.4. $\square$
Since quadratic variation functions like an inner product for local $L^2$-martingales,
we can use the previous result as a characterization of the
stochastic integral. According to this characterization, to verify that a given
process is the stochastic integral, it suffices to check that it behaves the right
way in the quadratic covariation.
Lemma 6.4. Let $M \in \mathcal{M}_{2,\text{loc}}$ and assume $M_0 = 0$.

(a) For any $0 \le t < \infty$, $[M]_t = 0$ almost surely if and only if
$\sup_{0 \le s \le t} |M_s| = 0$ almost surely.

(b) Let $G \in \mathcal{L}(M, \mathcal{P})$. The stochastic integral $G \cdot M$ is the unique process
$Y \in \mathcal{M}_{2,\text{loc}}$ that satisfies $Y_0 = 0$ and
\[
[Y, L]_t = \int_{(0,t]} G_s\, d[M,L]_s \quad\text{almost surely}
\]
for each $0 \le t < \infty$ and each $L \in \mathcal{M}_{2,\text{loc}}$.
Proof. Part (a). Fix t. Let
k
be a localizing sequence for M. Then for
s t, [M

k
]
s
[M

k
]
t
= [M]

k
t
[M]
t
= 0 almost surely by Lemma 3.27
and the t-monotonicity of [M]. Consequently E(M

k
s
)
2
= E[M

k
]
s
= 0,
from which M

k
s
= 0 almost surely. Taking k large enough so that
k
() s,
M
s
= M

k
s
= 0 almost surely. Since M has cadlag paths, there is a single
event
0
such that P(
0
) = 1 and M
s
() = 0 for all
0
and s [0, t].
Conversely, if M vanishes on [0, t] then so does [M] by its denition.
Part (b). We checked that G M satises (6.5) which is the property
required here. Conversely, suppose Y /
2,loc
satises the property. Then
by the additivity,
[Y G M, L]
t
= [Y, L] [G M, L] = 0
for any L /
2,loc
. Taking L = Y G M gives [Y G M, Y G M] = 0,
and then by part (a) Y = G M.
The next change-of-integrator or substitution property could have been
proved in Chapter 5 with a case-by-case argument, from simple predictable
integrands to localized integrals. At this point we can give a quick proof,
since we already did the tedious work in the previous proofs.
Proposition 6.5. Let $M \in \mathcal{M}_{2,\mathrm{loc}}$, $G \in \mathcal{L}(M,\mathcal{P})$, and let $N = G\cdot M$, also a member of $\mathcal{M}_{2,\mathrm{loc}}$. Suppose $H \in \mathcal{L}(N,\mathcal{P})$. Then $HG \in \mathcal{L}(M,\mathcal{P})$ and $H\cdot N = (HG)\cdot M$.
Proof. Let $\sigma_k$ be a localizing sequence for $(G,M)$ and $(H,N)$. By part (b) of Proposition 5.31, $N^{\sigma_k} = (G\cdot M)^{\sigma_k} = G\cdot M^{\sigma_k}$, and so Proposition 6.2 gives the equality of Lebesgue-Stieltjes measures $d[N^{\sigma_k}]_s = G_s^2\, d[M^{\sigma_k}]_s$. Then for any $T < \infty$,
$$E\int_{[0,T]} \mathbf{1}_{[0,\sigma_k]}(t)\, H_t^2 G_t^2\, d[M^{\sigma_k}]_t = E\int_{[0,T]} \mathbf{1}_{[0,\sigma_k]}(t)\, H_t^2\, d[N^{\sigma_k}]_t < \infty$$
because $\sigma_k$ is assumed to localize $(H,N)$. This checks that $\sigma_k$ localizes $(HG, M)$, and so $HG \in \mathcal{L}(M,\mathcal{P})$.

Let $L \in \mathcal{M}_{2,\mathrm{loc}}$. Equation (6.5) gives $G_s\, d[M,L]_s = d[N,L]_s$, and so
$$[(HG)\cdot M, L]_t = \int_{(0,t]} H_s G_s\, d[M,L]_s = \int_{(0,t]} H_s\, d[N,L]_s = [H\cdot N, L]_t.$$
By Lemma 6.4(b), $(HG)\cdot M$ must coincide with $H\cdot N$. □
We proceed to extend some of these results to semimartingales, beginning with the change of integrator. Recall some notation and terminology. The integral $\int G\, dY$ with respect to a cadlag semimartingale $Y$ was defined in Section 5.3 for predictable integrands $G$ that satisfy this condition:

(6.6) there exist stopping times $\sigma_n \nearrow \infty$ almost surely such that $\mathbf{1}_{(0,\sigma_n]} G$ is a bounded process for each $n$.

If $X$ is a cadlag process, we defined the caglad process $X_-$ by $X_-(0) = X(0)$ and $X_-(t) = X(t-)$ for $t > 0$. The notion of uniform convergence on compact time intervals in probability first appeared in our discussion of martingales (Lemma 3.41) and then with integrals (Propositions 5.32 and 5.37).
Corollary 6.6. Let $Y$ be a cadlag semimartingale, $G$ and $H$ predictable processes that satisfy (6.6), and $X = \int H\, dY$, also a cadlag semimartingale. Then
$$\int G\, dX = \int GH\, dY.$$
Proof. Let $Y = Y_0 + M + U$ be a decomposition of $Y$ into a local $L^2$-martingale $M$ and an FV process $U$. Let $V_t = \int_{(0,t]} H_s\, dU_s$, another FV process. This definition entails that, for a fixed $\omega$, $H_s$ is the Radon-Nikodym derivative $d\Lambda_V/d\Lambda_U$ of the Lebesgue-Stieltjes measures on the time line (Lemma 1.16). $X = H\cdot M + V$ is a semimartingale decomposition of $X$. By definition of the integral $\int G\, dX$ and Proposition 6.5,
$$\int_{(0,t]} G_s\, dX_s = \big( G\cdot(H\cdot M)\big)_t + \int_{(0,t]} G_s\, dV_s = \big( (GH)\cdot M\big)_t + \int_{(0,t]} G_s H_s\, dU_s = \int_{(0,t]} G_s H_s\, dY_s.$$
The last equality is the definition of the semimartingale integral $\int GH\, dY$. □

In terms of stochastic differentials we can express the conclusion as $dX = H\, dY$. These stochastic differentials do not exist as mathematical objects, but the rule $dX = H\, dY$ tells us that $dX$ can be replaced by $H\, dY$ in the stochastic integral.
Proposition 6.7. Let $Y$ and $Z$ be cadlag semimartingales. Then $[Y,Z]$ exists as a cadlag, adapted FV process and satisfies

(6.7)
$$[Y,Z]_t = Y_t Z_t - Y_0 Z_0 - \int_{(0,t]} Y_{s-}\, dZ_s - \int_{(0,t]} Z_{s-}\, dY_s.$$

The product $YZ$ is a semimartingale, and for a predictable process $H$ that satisfies (6.6),

(6.8)
$$\int_{(0,t]} H_s\, d(YZ)_s = \int_{(0,t]} H_s Y_{s-}\, dZ_s + \int_{(0,t]} H_s Z_{s-}\, dY_s + \int_{(0,t]} H_s\, d[Y,Z]_s.$$
Proof. We can give here a proof of the existence of $[Y,Z]$ independent of Corollary 3.31. Take a countably infinite partition $\pi = \{0 = t_1 < t_2 < t_3 < \cdots < t_i < \cdots\}$ of $\mathbf{R}_+$ such that $t_i \nearrow \infty$, and in fact take a sequence of such partitions with mesh$(\pi) \to 0$. (We omit the index for the sequence of partitions.) For $t \in \mathbf{R}_+$ consider

(6.9)
$$S_\pi(t) = \sum_i (Y_{t_{i+1}\wedge t} - Y_{t_i\wedge t})(Z_{t_{i+1}\wedge t} - Z_{t_i\wedge t})$$
$$= \sum_i \big( Y_{t_{i+1}\wedge t} Z_{t_{i+1}\wedge t} - Y_{t_i\wedge t} Z_{t_i\wedge t} \big) - \sum_i Y_{t_i}\big( Z_{t_{i+1}\wedge t} - Z_{t_i\wedge t} \big) - \sum_i Z_{t_i}\big( Y_{t_{i+1}\wedge t} - Y_{t_i\wedge t} \big),$$
which, as mesh$(\pi) \to 0$, converges to
$$Y_t Z_t - Y_0 Z_0 - \int_{(0,t]} Y_{s-}\, dZ_s - \int_{(0,t]} Z_{s-}\, dY_s.$$

Proposition 5.37 implies that the convergence takes place in probability, uniformly on any compact time interval $[0,T]$, and also gives the limit integrals. By the usual Borel-Cantelli argument we then get the limit almost surely, uniformly on $[0,T]$, along some subsequence of the original sequence of partitions. The limit process is cadlag.

From this limit we get all the conclusions. Take first $Y = Z$. Suppose $s < t$. Once the mesh is smaller than $t - s$ we have indices $k < \ell$ such that $t_k < s \le t_{k+1}$ and $t_\ell < t \le t_{\ell+1}$. Then
$$S_\pi(t) - S_\pi(s) = (Y_t - Y_{t_\ell})^2 + \sum_{i=k}^{\ell-1} (Y_{t_{i+1}} - Y_{t_i})^2 - (Y_s - Y_{t_k})^2 \ge (Y_{t_{k+1}} - Y_{t_k})^2 - (Y_s - Y_{t_k})^2 \longrightarrow (\Delta Y_s)^2 - (\Delta Y_s)^2 = 0.$$
Since this limit holds almost surely simultaneously for all $s < t$ in $[0,T]$, we can conclude that the limit process is nondecreasing. We have satisfied Definition 2.14 and now know that every cadlag semimartingale $Y$ has a nondecreasing, cadlag quadratic variation $[Y]$.

Next Definition 2.15 gives the existence of a cadlag, FV process $[Y,Z]$. By the limits (2.12), $[Y,Z]$ coincides with the cadlag limit process in (6.9) almost surely at each fixed time, and consequently these processes are indistinguishable. We have verified (6.7).

We can turn (6.7) around to say
$$YZ = Y_0 Z_0 + \int Y_-\, dZ + \int Z_-\, dY + [Y,Z]$$
which represents $YZ$ as a sum of semimartingales, and thereby $YZ$ itself is a semimartingale.

Equality (6.8) follows from the additivity of integrals, and the change-of-integrator formula applied to the semimartingales $\int Y_-\, dZ$ and $\int Z_-\, dY$. □
Remark 6.8. Identity (6.7) gives an integration by parts rule for stochastic
integrals. It generalizes the integration by parts rule of Lebesgue-Stieltjes
integrals that is part of standard real analysis [6, Section 3.5].
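As a numerical illustration, not part of the text, one can watch (6.7) hold along a discretization: take $Y = B$ and $Z = B + W$ for independent simulated Brownian motions, so that $[Y,Z]_t = [B]_t = t$, and compare $Y_t Z_t$ minus the two left-point Riemann sums against $t$. The step count and tolerance below are arbitrary choices:

```python
import random
import math

# Discretized integration by parts (6.7): Y_t Z_t - sum Y_{t_i} dZ - sum Z_{t_i} dY
# equals sum dY_i dZ_i exactly, and that sum approximates [Y, Z]_t = t here.
random.seed(5)
n, t = 100_000, 1.0
dt = t / n
B, W = [0.0], [0.0]
for _ in range(n):
    B.append(B[-1] + random.gauss(0.0, math.sqrt(dt)))
    W.append(W[-1] + random.gauss(0.0, math.sqrt(dt)))
Y = B
Z = [B[i] + W[i] for i in range(n + 1)]

int_Y_dZ = sum(Y[i] * (Z[i + 1] - Z[i]) for i in range(n))  # left-point sums
int_Z_dY = sum(Z[i] * (Y[i + 1] - Y[i]) for i in range(n))
bracket = Y[n] * Z[n] - int_Y_dZ - int_Z_dY                 # discrete [Y, Z]_t
assert abs(bracket - t) < 0.05
```

The quantity `bracket` is algebraically equal to the sum of products of increments, so the check is really the convergence of $S_\pi(t)$ in (6.9).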
Next we extend Proposition 6.2 to semimartingales.

Theorem 6.9. Let $Y$ and $Z$ be cadlag semimartingales and $G$ and $H$ predictable processes that satisfy (6.6). Then
$$[G\cdot Y, H\cdot Z]_t = \int_{(0,t]} G_s H_s\, d[Y,Z]_s.$$

Proof. For the same reason as in the proof of Proposition 6.2, it suffices to show
$$[G\cdot Y, Z]_t = \int_{(0,t]} G_s\, d[Y,Z]_s.$$
Let $Y = Y_0 + M + A$ and $Z = Z_0 + N + B$ be decompositions into local $L^2$-martingales $M$ and $N$ and FV processes $A$ and $B$. By the linearity of quadratic covariation,
$$[G\cdot Y, Z] = [G\cdot M, N] + [G\cdot M, B] + [G\cdot A, Z].$$
By Proposition 6.2, $[G\cdot M, N] = \int G\, d[M,N]$. To handle the other two terms, fix an $\omega$ such that the paths of the integrals and the semimartingales are cadlag, and the characterization of jumps given in Theorem 5.41 is valid for each integral that appears here. Then since $B$ and $G\cdot A$ are FV processes, Lemma A.10 applies. Combining Lemma A.10, Theorem 5.41, and the definition of a Lebesgue-Stieltjes integral with respect to a step function gives
$$[G\cdot M, B]_t + [G\cdot A, Z]_t = \sum_{s\in(0,t]} \Delta(G\cdot M)_s\, \Delta B_s + \sum_{s\in(0,t]} \Delta(G\cdot A)_s\, \Delta Z_s$$
$$= \sum_{s\in(0,t]} G_s\, \Delta M_s\, \Delta B_s + \sum_{s\in(0,t]} G_s\, \Delta A_s\, \Delta Z_s = \int_{(0,t]} G_s\, d[M,B]_s + \int_{(0,t]} G_s\, d[A,Z]_s.$$
Combining terms gives the desired equation. □
As the last result we generalize the limit taken in (6.9) by adding a coefficient to the sum. This is the technical lemma from this section that we need for the proof of Itô's formula.

Proposition 6.10. Let $Y$ and $Z$ be cadlag semimartingales, and $G$ an adapted cadlag process. Given a partition $\pi = \{0 = t_1 < t_2 < t_3 < \cdots < t_i \nearrow \infty\}$ of $[0,\infty)$, define

(6.10)
$$R_t(\pi) = \sum_{i=1}^{\infty} G_{t_i}\big( Y_{t_{i+1}\wedge t} - Y_{t_i\wedge t} \big)\big( Z_{t_{i+1}\wedge t} - Z_{t_i\wedge t} \big).$$

Then as mesh$(\pi) \to 0$, $R(\pi)$ converges to $\int G_-\, d[Y,Z]$ in probability, uniformly on compact time intervals.
Proof. Algebra gives
$$R_t(\pi) = \sum_i G_{t_i}\big( Y_{t_{i+1}\wedge t} Z_{t_{i+1}\wedge t} - Y_{t_i\wedge t} Z_{t_i\wedge t} \big) - \sum_i G_{t_i} Y_{t_i}\big( Z_{t_{i+1}\wedge t} - Z_{t_i\wedge t} \big) - \sum_i G_{t_i} Z_{t_i}\big( Y_{t_{i+1}\wedge t} - Y_{t_i\wedge t} \big).$$
We know from Proposition 6.7 that $YZ$ is a semimartingale. Applying Proposition 5.37 to each sum gives the claimed type of convergence to the limit
$$\int_{(0,t]} G_{s-}\, d(YZ)_s - \int_{(0,t]} G_{s-} Y_{s-}\, dZ_s - \int_{(0,t]} G_{s-} Z_{s-}\, dY_s$$
which by (6.8) equals $\int_{(0,t]} G_{s-}\, d[Y,Z]_s$. □
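A simulation sketch of Proposition 6.10, illustrative rather than from the text: with $G = Y = Z = B$ a simulated Brownian path, the weighted sum $R_t(\pi) = \sum_i B_{t_i}(\Delta B_i)^2$ should approach $\int_0^t B_{s-}\, d[B]_s = \int_0^t B_s\, ds$, since $d[B]_s = ds$:

```python
import random
import math

# Left-point weighted sum of squared increments versus the time integral of B:
# for Brownian motion d[B]_s = ds, so R_t(pi) should be close to ∫_0^t B_s ds.
random.seed(3)
n, t = 100_000, 1.0
dt = t / n
B = [0.0]
for _ in range(n):
    B.append(B[-1] + random.gauss(0.0, math.sqrt(dt)))

R = sum(B[i] * (B[i + 1] - B[i]) ** 2 for i in range(n))  # R_t(pi) with G = B
time_integral = sum(B[i] * dt for i in range(n))          # ∫_0^t B_s ds
assert abs(R - time_integral) < 0.05
```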
6.2. Itô's formula

We prove Itô's formula in two main stages, first for real-valued semimartingales and then for vector-valued semimartingales. Additionally we state several simplifications that result if the cadlag semimartingale specializes to an FV process, a continuous semimartingale, or Brownian motion.

For an open set $D \subseteq \mathbf{R}$, $C^2(D)$ is the space of functions $f : D \to \mathbf{R}$ such that the derivatives $f'$ and $f''$ exist everywhere on $D$ and are continuous functions. For a real or vector-valued cadlag process $X$, the jump at time $s$ is denoted by
$$\Delta X_s = X_s - X_{s-}$$
as before. Recall also the notion of the closure of the path over a time interval, for a cadlag process given by
$$\overline{X}[0,t] = \{ X(s) : 0 \le s \le t \} \cup \{ X(s-) : 0 < s \le t \}.$$

Itô's formula contains a term which is a sum over the jumps of the process. This sum has at most countably many terms because a cadlag path has at most countably many discontinuities (Lemma A.7). It is also possible to define rigorously what is meant by a convergent sum of uncountably many terms, and arrive at the same value. See the discussion around (A.5) in the appendix.
Theorem 6.11. Fix $0 < T < \infty$. Let $D$ be an open subset of $\mathbf{R}$ and $f \in C^2(D)$. Let $Y$ be a cadlag semimartingale with quadratic variation process $[Y]$. Assume that for all $\omega$ outside some event of probability zero, $\overline{Y}[0,T] \subseteq D$. Then

(6.11)
$$f(Y_t) = f(Y_0) + \int_{(0,t]} f'(Y_{s-})\, dY_s + \frac{1}{2}\int_{(0,t]} f''(Y_{s-})\, d[Y]_s + \sum_{s\in(0,t]} \Big\{ f(Y_s) - f(Y_{s-}) - f'(Y_{s-})\Delta Y_s - \frac{1}{2} f''(Y_{s-})(\Delta Y_s)^2 \Big\}.$$

Part of the conclusion is that the last sum over $s \in (0,t]$ converges absolutely for almost every $\omega$. Both sides of the equality above are cadlag processes, and the meaning of the equality is that these processes are indistinguishable on $[0,T]$. In other words, there exists an event $\Omega_0$ of full probability such that for $\omega \in \Omega_0$, (6.11) holds for all $0 \le t \le T$.
Proof. The proof starts by Taylor expanding $f$. We formulate this in the following way. Define the function $\gamma$ on $D\times D$ by $\gamma(x,x) = 0$ for all $x \in D$, and for $x \ne y$

(6.12)
$$\gamma(x,y) = \frac{1}{(y-x)^2}\Big\{ f(y) - f(x) - f'(x)(y-x) - \frac{1}{2} f''(x)(y-x)^2 \Big\}.$$

On the set $\{(x,y) \in D\times D : x \ne y\}$, $\gamma$ is continuous as a quotient of two continuous functions. That $\gamma$ is continuous also at diagonal points $(z,z)$ follows from Taylor's theorem (Theorem A.14 in the appendix). Given $z \in D$, pick $r > 0$ small enough so that $G = (z-r, z+r) \subseteq D$. Then for $x, y \in G$, $x \ne y$, there exists a point $\xi_{x,y}$ between $x$ and $y$ such that
$$f(y) = f(x) + f'(x)(y-x) + \frac{1}{2} f''(\xi_{x,y})(y-x)^2.$$
So for these $(x,y)$
$$\gamma(x,y) = \frac{1}{2} f''(\xi_{x,y}) - \frac{1}{2} f''(x).$$
As $(x,y)$ converges to $(z,z)$, $\xi_{x,y}$ converges to $z$, and so by the assumed continuity of $f''$,
$$\gamma(x,y) \longrightarrow \frac{1}{2} f''(z) - \frac{1}{2} f''(z) = 0 = \gamma(z,z).$$
We have verified that $\gamma$ is continuous on $D\times D$.

Write
$$f(y) - f(x) = f'(x)(y-x) + \frac{1}{2} f''(x)(y-x)^2 + \gamma(x,y)(y-x)^2.$$
Given a partition $\pi = \{t_i\}$ of $[0,\infty)$, apply the above identity to each partition interval to write
$$f(Y_t) = f(Y_0) + \sum_i \big\{ f(Y_{t\wedge t_{i+1}}) - f(Y_{t\wedge t_i}) \big\}$$
$$= f(Y_0) + \sum_i f'(Y_{t\wedge t_i})\big( Y_{t\wedge t_{i+1}} - Y_{t\wedge t_i} \big) \qquad (6.13)$$
$$+ \frac{1}{2}\sum_i f''(Y_{t\wedge t_i})\big( Y_{t\wedge t_{i+1}} - Y_{t\wedge t_i} \big)^2 \qquad (6.14)$$
$$+ \sum_i \gamma\big( Y_{t\wedge t_i}, Y_{t\wedge t_{i+1}} \big)\big( Y_{t\wedge t_{i+1}} - Y_{t\wedge t_i} \big)^2. \qquad (6.15)$$

By Propositions 5.37 and 6.10 we can fix a sequence of partitions $\pi^\ell$ such that mesh$(\pi^\ell) \to 0$, and so that the following limits happen almost surely, uniformly for $t \in [0,T]$, as $\ell \to \infty$.

(i) The sum on line (6.13) converges to $\int_{(0,t]} f'(Y_{s-})\, dY_s$.

(ii) The term on line (6.14) converges to $\frac{1}{2}\int_{(0,t]} f''(Y_{s-})\, d[Y]_s$.

(iii) The limit $\sum_i \big( Y_{t\wedge t_{i+1}} - Y_{t\wedge t_i} \big)^2 \to [Y]_t$ happens.

Fix $\omega$ so that the limits in items (i)–(iii) above happen. We apply the scalar case of Lemma A.13, but simplified so that $\gamma$ and $\phi$ in (A.8) have no time variables, to the cadlag function $s \mapsto Y_s(\omega)$ on $[0,t]$ and the sequence of partitions $\pi^\ell$ chosen above. For the closed set $K$ in Lemma A.13 take $K = \overline{Y}[0,T]$. For the continuous function $\phi$ in Lemma A.13 take $\phi(x,y) = \gamma(x,y)(y-x)^2$. By hypothesis, $K$ is a subset of $D$. Consequently, as verified above, the function
$$\gamma(x,y) = \begin{cases} (x-y)^{-2}\phi(x,y), & x \ne y \\ 0, & x = y \end{cases}$$
is continuous on $K\times K$. Assumption (A.9) of Lemma A.13 holds by item (iii) above. The hypotheses of Lemma A.13 have been verified. The conclusion is that for this fixed $\omega$ and each $t \in [0,T]$, the sum on line (6.15) converges to
$$\sum_{s\in(0,t]} \phi(Y_{s-}, Y_s) = \sum_{s\in(0,t]} \gamma(Y_{s-}, Y_s)\big( Y_s - Y_{s-} \big)^2 = \sum_{s\in(0,t]} \Big\{ f(Y_s) - f(Y_{s-}) - f'(Y_{s-})\Delta Y_s - \frac{1}{2} f''(Y_{s-})(\Delta Y_s)^2 \Big\}.$$
Lemma A.13 also contains the conclusion that this last sum is absolutely convergent.

To summarize, given $0 < T < \infty$, we have shown that for almost every $\omega$, (6.11) holds for all $0 \le t \le T$. □
Remark 6.12. (Semimartingale property) A corollary of Theorem 6.11 is that under the conditions of the theorem $f(Y)$ is a semimartingale. Equation (6.11) expresses $f(Y)$ as a sum of semimartingales. The integral $\int f''(Y_-)\, d[Y]$ is an FV process. To see that the sum over jump times $s \in (0,t]$ produces also an FV process, fix $\omega$ such that $Y_s(\omega)$ is a cadlag function and (6.11) holds. Let $\{s_i\}$ denote the (at most countably many) jumps of $s \mapsto Y_s(\omega)$ in $[0,T]$. The theorem gives the absolute convergence
$$\sum_i \Big| f(Y_{s_i}(\omega)) - f(Y_{s_i-}(\omega)) - f'(Y_{s_i-}(\omega))\Delta Y_{s_i}(\omega) - \frac{1}{2} f''(Y_{s_i-}(\omega))\big( \Delta Y_{s_i}(\omega) \big)^2 \Big| < \infty.$$
Consequently for this fixed $\omega$ the sum in (6.11) defines a function in $BV[0,T]$, as in Example 1.14.
Let us state some simplifications of Itô's formula.

Corollary 6.13. Under the hypotheses of Theorem 6.11 we have the following special cases.

(a) If $Y$ is continuous on $[0,T]$, then

(6.16)
$$f(Y_t) = f(Y_0) + \int_0^t f'(Y_s)\, dY_s + \frac{1}{2}\int_0^t f''(Y_s)\, d[Y]_s.$$

(b) If $Y$ has bounded variation on $[0,T]$, then

(6.17)
$$f(Y_t) = f(Y_0) + \int_{(0,t]} f'(Y_{s-})\, dY_s + \sum_{s\in(0,t]} \big\{ f(Y_s) - f(Y_{s-}) - f'(Y_{s-})\Delta Y_s \big\}.$$

(c) If $Y_t = Y_0 + B_t$, where $B$ is a standard Brownian motion independent of $Y_0$, then

(6.18)
$$f(Y_0 + B_t) = f(Y_0) + \int_0^t f'(Y_0 + B_s)\, dB_s + \frac{1}{2}\int_0^t f''(Y_0 + B_s)\, ds.$$

Proof. Part (a). Continuity eliminates the sum over jumps, and renders endpoints of intervals irrelevant for integration.

Part (b). By Corollary A.11 the quadratic variation of a cadlag BV path consists exactly of the squares of the jumps. Consequently
$$\frac{1}{2}\int_{(0,t]} f''(Y_{s-})\, d[Y]_s = \sum_{s\in(0,t]} \frac{1}{2} f''(Y_{s-})(\Delta Y_s)^2$$
and we get cancellation in the formula (6.11).

Part (c). Specialize part (a) to $[B]_t = t$. □
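Part (b) can be checked by hand on a concrete FV path. A deterministic sketch, not from the text (the path, the jump time, and $f$ are all illustrative choices): take $Y_s = s$ before a unit jump at $s = 1/2$ and $Y_s = 1+s$ afterwards, with $f(x) = x^3$ and $t = 1$; formula (6.17) then reproduces $f(Y_1) = f(2) = 8$.

```python
# Check of (6.17) for Y_s = s + 1{s >= 1/2} (drift plus one unit jump at 1/2),
# f(x) = x^3, evaluated at t = 1. Off the jump dY_s = ds, so the integral in
# (6.17) splits into two Riemann integrals plus a single jump term.
f = lambda x: x ** 3
fp = lambda x: 3 * x ** 2

n = 20_000
h = 0.5 / n
# ∫ f'(Y_{s-}) ds over (0, 1/2), where Y_s = s (composite midpoint rule)
cont = sum(fp((i + 0.5) * h) * h for i in range(n))
# ∫ f'(Y_{s-}) ds over (1/2, 1], where Y_s = 1 + s
cont += sum(fp(1.5 + (i + 0.5) * h) * h for i in range(n))

y_minus, y_plus = 0.5, 1.5                            # Y_{1/2-} and Y_{1/2}
jump_term = fp(y_minus) * (y_plus - y_minus)          # jump part of ∫ f' dY
correction = f(y_plus) - f(y_minus) - fp(y_minus) * (y_plus - y_minus)

rhs = f(0.0) + cont + jump_term + correction
assert abs(rhs - f(2.0)) < 1e-6                       # f(Y_1) = 8
```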
The open set $D$ in the hypotheses of Itô's formula does not have to be an interval, so it can be disconnected.

The important hypothesis $\overline{Y}[0,T] \subseteq D$ prevents the process from reaching the boundary. Precisely speaking, the hypothesis implies that for some $\delta > 0$, $\operatorname{dist}(Y(s), D^c) \ge \delta$ for all $s \in [0,T]$. To prove this, assume the contrary, namely the existence of $s_i \in [0,T]$ such that $\operatorname{dist}(Y(s_i), D^c) \to 0$. Since $[0,T]$ is compact, we may pass to a convergent subsequence $s_i \to s$. And then by the cadlag property, along a further subsequence $Y(s_i)$ converges to some point $y$ (either $Y(s)$ or $Y(s-)$). Since $\operatorname{dist}(y, D^c) = 0$ and $D^c$ is a closed set, $y \in D^c$. But $y \in \overline{Y}[0,T]$, and we have contradicted $\overline{Y}[0,T] \subseteq D$.

But note that the $\delta$ (the distance to the boundary of $D$) can depend on $\omega$. So the hypothesis $\overline{Y}[0,T] \subseteq D$ does not require that there exist a fixed closed subset $H$ of $D$ such that $P\{ Y(t) \in H$ for all $t \in [0,T] \} = 1$.

Hypothesis $\overline{Y}[0,T] \subseteq D$ is needed because otherwise a blow-up at the boundary can cause problems. The next example illustrates why we need to assume the containment in $D$ of the closure $\overline{Y}[0,T]$, and not merely $Y[0,T] \subseteq D$.
Example 6.14. Let $D = (-\infty, 1) \cup (\tfrac{3}{2}, \infty)$, and define
$$f(x) = \begin{cases} \sqrt{1-x}, & x < 1 \\ 0, & x > \tfrac{3}{2}, \end{cases}$$
a $C^2$-function on $D$. Define the deterministic process
$$Y_t = \begin{cases} t, & 0 \le t < 1 \\ 1+t, & t \ge 1. \end{cases}$$
$Y_t \in D$ for all $t \ge 0$. However, if $t > 1$,
$$\int_{(0,t]} f'(Y_{s-})\, dY_s = \int_{(0,1)} f'(s)\, ds + f'(Y_{1-})\Delta Y_1 + \int_{(1,t]} f'(1+s)\, ds = -1 + (-\infty) + 0.$$
As the calculation shows, the integral is not finite. The problem is that the closure $\overline{Y}[0,t]$ contains the point 1 which lies at the boundary of $D$, and the derivative $f'$ blows up there.
We extend Itô's formula to vector-valued semimartingales. For purposes of matrix multiplication we think of points $x \in \mathbf{R}^d$ as column vectors, so with $T$ denoting transposition,
$$x = [x_1, x_2, \ldots, x_d]^T.$$
Let $Y_1(t), Y_2(t), \ldots, Y_d(t)$ be cadlag semimartingales with respect to a common filtration $\{\mathcal{F}_t\}$. We write $Y(t) = [Y_1(t), \ldots, Y_d(t)]^T$ for the column vector with coordinates $Y_1(t), \ldots, Y_d(t)$, and call $Y$ an $\mathbf{R}^d$-valued semimartingale. Its jump is the vector of jumps in the coordinates:
$$\Delta Y(t) = [\Delta Y_1(t), \Delta Y_2(t), \ldots, \Delta Y_d(t)]^T.$$
For $0 < T < \infty$ and an open subset $D$ of $\mathbf{R}^d$, $C^{1,2}([0,T]\times D)$ is the space of continuous functions $f : [0,T]\times D \to \mathbf{R}$ whose partial derivatives $f_t$, $f_{x_i}$, and $f_{x_i,x_j}$ exist and are continuous in the interior $(0,T)\times D$, and extend as continuous functions to $[0,T]\times D$. So the superscript in $C^{1,2}$ stands for one continuous time derivative and two continuous space derivatives. For $f \in C^{1,2}([0,T]\times D)$ and $(t,x) \in [0,T]\times D$, the spatial gradient
$$Df(t,x) = \big[ f_{x_1}(t,x), f_{x_2}(t,x), \ldots, f_{x_d}(t,x) \big]^T$$
is the column vector of first-order partial derivatives in the space variables, and the Hessian matrix $D^2 f(t,x)$ is the $d\times d$ matrix of second-order spatial partial derivatives:
$$D^2 f(t,x) = \begin{bmatrix} f_{x_1,x_1}(t,x) & f_{x_1,x_2}(t,x) & \cdots & f_{x_1,x_d}(t,x) \\ f_{x_2,x_1}(t,x) & f_{x_2,x_2}(t,x) & \cdots & f_{x_2,x_d}(t,x) \\ \vdots & \vdots & \ddots & \vdots \\ f_{x_d,x_1}(t,x) & f_{x_d,x_2}(t,x) & \cdots & f_{x_d,x_d}(t,x) \end{bmatrix}.$$
Theorem 6.15. Fix $d \ge 2$ and $0 < T < \infty$. Let $D$ be an open subset of $\mathbf{R}^d$ and $f \in C^{1,2}([0,T]\times D)$. Let $Y$ be an $\mathbf{R}^d$-valued cadlag semimartingale such that outside some event of probability zero, $\overline{Y}[0,T] \subseteq D$. Then

(6.19)
$$f(t, Y(t)) = f(0, Y(0)) + \int_0^t f_t(s, Y(s))\, ds + \sum_{j=1}^{d} \int_{(0,t]} f_{x_j}(s, Y(s-))\, dY_j(s)$$
$$+ \frac{1}{2}\sum_{1\le j,k\le d} \int_{(0,t]} f_{x_j,x_k}(s, Y(s-))\, d[Y_j, Y_k](s)$$
$$+ \sum_{s\in(0,t]} \Big\{ f(s, Y(s)) - f(s, Y(s-)) - Df(s, Y(s-))^T \Delta Y(s) - \frac{1}{2}\Delta Y(s)^T D^2 f(s, Y(s-))\, \Delta Y(s) \Big\}.$$
Proof. Let us write $Y^k_t = Y_k(t)$ in the proof. The pattern is the same as in the scalar case. Define a function $\gamma$ on $[0,T]^2\times D^2$ by the equality

(6.20)
$$f(t,y) - f(s,x) = f_t(s,x)(t-s) + Df(s,x)^T(y-x) + \frac{1}{2}(y-x)^T D^2 f(s,x)(y-x) + \gamma(s,t,x,y).$$

Apply this to partition intervals to write
$$f(t, Y_t) = f(0, Y_0) + \sum_i \big\{ f(t\wedge t_{i+1}, Y_{t\wedge t_{i+1}}) - f(t\wedge t_i, Y_{t\wedge t_i}) \big\}$$
$$= f(0, Y_0) + \sum_i f_t(t\wedge t_i, Y_{t\wedge t_i})\big\{ (t\wedge t_{i+1}) - (t\wedge t_i) \big\} \qquad (6.21)$$
$$+ \sum_{k=1}^{d} \sum_i f_{x_k}(t\wedge t_i, Y_{t\wedge t_i})\big\{ Y^k_{t\wedge t_{i+1}} - Y^k_{t\wedge t_i} \big\} \qquad (6.22)$$
$$+ \frac{1}{2}\sum_{1\le j,k\le d} \sum_i f_{x_j,x_k}(t\wedge t_i, Y_{t\wedge t_i})\big\{ Y^j_{t\wedge t_{i+1}} - Y^j_{t\wedge t_i} \big\}\big\{ Y^k_{t\wedge t_{i+1}} - Y^k_{t\wedge t_i} \big\} \qquad (6.23)$$
$$+ \sum_i \gamma\big( t\wedge t_i,\, t\wedge t_{i+1},\, Y_{t\wedge t_i},\, Y_{t\wedge t_{i+1}} \big). \qquad (6.24)$$
By Propositions 5.37 and 6.10 we can fix a sequence of partitions $\pi^\ell$ such that mesh$(\pi^\ell) \to 0$, and so that the following limits happen almost surely, uniformly for $t \in [0,T]$, as $\ell \to \infty$.

(i) Line (6.21) converges to $\int_{(0,t]} f_t(s, Y_s)\, ds$.

(ii) Line (6.22) converges to $\sum_{k=1}^{d} \int_{(0,t]} f_{x_k}(s, Y_{s-})\, dY^k_s$.

(iii) Line (6.23) converges to $\frac{1}{2}\sum_{1\le j,k\le d} \int_{(0,t]} f_{x_j,x_k}(s, Y_{s-})\, d[Y^j, Y^k]_s$.

(iv) The limit $\sum_i \big( Y^k_{t\wedge t_{i+1}} - Y^k_{t\wedge t_i} \big)^2 \to [Y^k]_t$ happens for $1 \le k \le d$.

Fix $\omega$ such that $\overline{Y}[0,T] \subseteq D$ and the limits in items (i)–(iv) hold. By the above paragraph and by hypothesis these conditions hold for almost every $\omega$.
To treat the sum on line (6.24), we apply Lemma A.13 to the $\mathbf{R}^d$-valued cadlag function $s \mapsto Y_s(\omega)$ on $[0,T]$, with the function $\gamma$ defined by (6.20), the closed set $K = \overline{Y}[0,T]$, and the sequence of partitions $\pi^\ell$ chosen above.

We need to check that $\gamma$ and the set $K$ satisfy the hypotheses of Lemma A.13. Continuity of $\gamma$ follows from the definition (6.20). Next we argue that if $(s_n, t_n, x_n, y_n) \to (u, u, z, z)$ in $[0,T]^2\times K^2$ while for each $n$, either $s_n \ne t_n$ or $x_n \ne y_n$, then

(6.25)
$$\frac{\gamma(s_n, t_n, x_n, y_n)}{|t_n - s_n| + |y_n - x_n|^2} \;\longrightarrow\; 0.$$

Given $\varepsilon > 0$, let $I$ be an interval around $u$ in $[0,T]$ and let $B$ be an open ball centered at $z$ and contained in $D$ such that
$$|f_t(v,w) - f_t(u,z)| + |D^2 f(v,w) - D^2 f(u,z)| \le \varepsilon$$
for all $v \in I$, $w \in B$. Such an interval $I$ and ball $B$ exist by the openness of $D$ and by the assumption of continuity of the derivatives of $f$ on $[0,T]\times D$. For large enough $n$, we have $s_n, t_n \in I$ and $x_n, y_n \in B$. Since a ball is convex, by Taylor's formula (A.16) we can write
$$\gamma(s_n, t_n, x_n, y_n) = \big\{ f_t(\tau_n, y_n) - f_t(s_n, x_n) \big\}(t_n - s_n) + \frac{1}{2}(y_n - x_n)^T \big\{ D^2 f(s_n, \zeta_n) - D^2 f(s_n, x_n) \big\}(y_n - x_n),$$
where $\tau_n$ lies between $s_n$ and $t_n$, and $\zeta_n$ is a point on the line segment connecting $x_n$ and $y_n$. Hence $\tau_n \in I$ and $\zeta_n \in B$, and by the Schwarz inequality in the form (A.7),
$$|\gamma(s_n, t_n, x_n, y_n)| \le \big| f_t(\tau_n, y_n) - f_t(s_n, x_n) \big|\, |t_n - s_n| + \big| D^2 f(s_n, \zeta_n) - D^2 f(s_n, x_n) \big|\, |y_n - x_n|^2 \le 2\varepsilon\big( |t_n - s_n| + |y_n - x_n|^2 \big).$$
Thus
$$\frac{\gamma(s_n, t_n, x_n, y_n)}{|t_n - s_n| + |y_n - x_n|^2} \le 2\varepsilon$$
for large enough $n$, and we have verified (6.25).

The function
$$(s,t,x,y) \mapsto \frac{\gamma(s,t,x,y)}{|t-s| + |y-x|^2}$$
is continuous at points where either $s \ne t$ or $x \ne y$, as a quotient of two continuous functions. Consequently the function $\phi$ defined by (A.8) is continuous on $[0,T]^2\times K^2$.

Hypothesis (A.9) of Lemma A.13 is a consequence of the limit in item (iv) above.

The hypotheses of Lemma A.13 have been verified. By this lemma, for this fixed $\omega$ and each $t \in [0,T]$, the sum on line (6.24) converges to
$$\sum_{s\in(0,t]} \gamma(s, s, Y_{s-}, Y_s) = \sum_{s\in(0,t]} \Big\{ f(s, Y_s) - f(s, Y_{s-}) - Df(s, Y_{s-})^T \Delta Y_s - \frac{1}{2}\Delta Y_s^T D^2 f(s, Y_{s-})\, \Delta Y_s \Big\}.$$
This completes the proof of Theorem 6.15. □
Remark 6.16. (Notation) Often Itô's formula is expressed in terms of differential notation, which is more economical than the integral notation. As an example, if $Y$ is a continuous $\mathbf{R}^d$-valued semimartingale, equation (6.19) can be written as

(6.26)
$$df(t, Y(t)) = f_t(t, Y(t))\, dt + \sum_{j=1}^{d} f_{x_j}(t, Y(t))\, dY_j(t) + \frac{1}{2}\sum_{1\le j,k\le d} f_{x_j,x_k}(t, Y(t))\, d[Y_j, Y_k](t).$$

As mentioned already, these stochastic differentials have no rigorous meaning. The formula above is to be regarded only as an abbreviation of the integral formula (6.19).
We state the Brownian motion case as a corollary. For $f \in C^2(D)$ with $D \subseteq \mathbf{R}^d$, the Laplace operator is defined by
$$\Delta f = f_{x_1,x_1} + \cdots + f_{x_d,x_d}.$$
A function $f$ is harmonic in $D$ if $\Delta f = 0$ on $D$.

Corollary 6.17. Let $B(t) = (B_1(t), \ldots, B_d(t))$ be Brownian motion in $\mathbf{R}^d$, with random initial point $B(0)$, and $f \in C^2(\mathbf{R}^d)$. Then

(6.27)
$$f(B(t)) = f(B(0)) + \int_0^t Df(B(s))^T\, dB(s) + \frac{1}{2}\int_0^t \Delta f(B(s))\, ds.$$

Suppose $f$ is harmonic in an open set $D \subseteq \mathbf{R}^d$. Let $D_1$ be an open subset of $D$ such that $\operatorname{dist}(D_1, D^c) > 0$. Assume initially $B(0) = z$ for some point $z \in D_1$, and let

(6.28)
$$\tau = \inf\{ t \ge 0 : B(t) \in D_1^c \}$$

be the exit time for Brownian motion from $D_1$. Then $f(B^\tau(t))$ is a local $L^2$-martingale.

Proof. Formula (6.27) comes directly from Itô's formula, because $[B_i, B_j]_t = \delta_{i,j}\, t$.

The process $B^\tau$ is a (vector) $L^2$-martingale that satisfies $\overline{B^\tau}[0,T] \subseteq D$ for all $T < \infty$. Thus Itô's formula applies. Note that $[B^\tau_i, B^\tau_j]_t = [B_i, B_j]_{t\wedge\tau} = \delta_{i,j}(t\wedge\tau)$. Hence $\Delta f = 0$ in $D$ eliminates the second-order term, and the formula simplifies to
$$f(B^\tau(t)) = f(z) + \int_0^t Df(B^\tau(s))^T\, dB^\tau(s)$$
which shows that $f(B^\tau(t))$ is a local $L^2$-martingale. □
6.3. Applications of Itô's formula

One can use Itô's formula to find pathwise expressions for stochastic integrals.

Example 6.18. To evaluate $\int_0^t B^k_s\, dB_s$ for standard Brownian motion and $k \ge 1$, take $f(x) = (k+1)^{-1} x^{k+1}$, so that $f'(x) = x^k$ and $f''(x) = k x^{k-1}$. Itô's formula gives
$$\int_0^t B^k_s\, dB_s = (k+1)^{-1} B^{k+1}_t - \frac{k}{2}\int_0^t B^{k-1}_s\, ds.$$
The integral on the right is a familiar Riemann integral of the continuous function $s \mapsto B^{k-1}_s$.
One can use Itô's formula to find martingales, which in turn are useful for calculations, as the next lemma and example show.

Lemma 6.19. Suppose $f \in C^{1,2}(\mathbf{R}_+\times\mathbf{R})$ and $f_t + \frac{1}{2} f_{xx} = 0$. Let $B_t$ be one-dimensional standard Brownian motion. Then $f(t, B_t)$ is a local $L^2$-martingale. If

(6.29)
$$\int_0^T E\big[ f_x(t, B_t)^2 \big]\, dt < \infty,$$

then $f(t, B_t)$ is an $L^2$-martingale on $[0,T]$.

Proof. Since $[B]_t = t$, (6.19) specializes to
$$f(t, B_t) = f(0,0) + \int_0^t f_x(s, B_s)\, dB_s + \int_0^t \Big\{ f_t(s, B_s) + \frac{1}{2} f_{xx}(s, B_s) \Big\}\, ds = f(0,0) + \int_0^t f_x(s, B_s)\, dB_s$$
where the last line is a local $L^2$-martingale. (The integrand $f_x(s, B_s)$ is a continuous process, hence predictable, and satisfies the local boundedness condition (5.29).)

The integrability condition (6.29) guarantees that $f_x(s, B_s)$ lies in the space $\mathcal{L}_2(B,\mathcal{P})$ of integrands on the interval $[0,T]$. In our earlier development of stochastic integration we always considered processes defined for all time. To get an integrand process on the entire time line $[0,\infty)$, one can extend $f_x(s, B_s)$ by declaring it identically zero on $(T,\infty)$. This does not change the integral on $[0,T]$.
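A classical solution of $f_t + \frac{1}{2} f_{xx} = 0$ is $f(t,x) = \exp(\theta x - \theta^2 t/2)$, for which (6.29) holds on any $[0,T]$. By the lemma, $f(t, B_t)$ is then a martingale with constant mean $f(0,0) = 1$. A Monte Carlo sketch of this mean (parameters illustrative, not from the text):

```python
import random
import math

# f(t, x) = exp(theta*x - theta^2 t / 2) solves f_t + f_xx/2 = 0, so f(t, B_t)
# is a martingale and E[f(t, B_t)] = 1 for every t. Sample-mean check at t = 1.
random.seed(2)
theta, t, n_samples = 0.5, 1.0, 200_000
total = 0.0
for _ in range(n_samples):
    b = random.gauss(0.0, math.sqrt(t))            # B_t ~ N(0, t)
    total += math.exp(theta * b - theta ** 2 * t / 2)
assert abs(total / n_samples - 1.0) < 0.01
```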
Example 6.20. Let $\mu \in \mathbf{R}$ and $\sigma \ne 0$ be constants. Let $a < 0 < b$. Let $B_t$ be one-dimensional standard Brownian motion, and $X_t = \mu t + \sigma B_t$ a Brownian motion with drift. Question: What is the probability that $X_t$ exits the interval $(a,b)$ through the point $b$?

Define the stopping time
$$\tau = \inf\{ t > 0 : X_t = a \text{ or } X_t = b \}.$$
First we need to check that $\tau < \infty$ almost surely. For example, the events
$$\big\{ \sigma(B_{n+1} - B_n) > b - a + |\mu| + 1 \big\}, \qquad n = 0, 1, 2, \ldots,$$
are independent and have a common positive probability. Hence one of them happens almost surely. Consequently $X_n$ cannot remain in $(a,b)$ for all $n$.

We seek a function $h$ such that $h(X_t)$ is a martingale. Then, if we could justify $Eh(X_\tau) = h(0)$, we could compute the desired probability $P(X_\tau = b)$ from
$$h(0) = Eh(X_\tau) = h(a)\,P(X_\tau = a) + h(b)\,P(X_\tau = b).$$
To utilize Lemma 6.19, let $f(t,x) = h(\mu t + \sigma x)$. The condition $f_t + \frac{1}{2} f_{xx} = 0$ becomes
$$\mu h' + \frac{1}{2}\sigma^2 h'' = 0.$$
At this point we need to decide whether $\mu = 0$ or not. Let us work the case $\mu \ne 0$. Solving for $h$ gives
$$h(x) = C_1 \exp\Big( -\frac{2\mu}{\sigma^2}\, x \Big) + C_2$$
for two constants of integration $C_1, C_2$. To check (6.29), from $f(t,x) = h(\mu t + \sigma x)$ derive
$$f_x(t,x) = -2 C_1 \mu \sigma^{-1} \exp\Big( -\frac{2\mu}{\sigma^2}(\mu t + \sigma x) \Big).$$
Since $B_t$ is a mean zero normal with variance $t$, one can verify that (6.29) holds for all $T < \infty$.

Now $M_t = h(X_t)$ is a martingale. By optional stopping, $M_{\tau\wedge t}$ is also a martingale, and so $EM_{\tau\wedge t} = EM_0 = h(0)$. By path continuity and $\tau < \infty$, $M_{\tau\wedge t} \to M_\tau$ almost surely as $t \to \infty$. Furthermore, the process $M_{\tau\wedge t}$ is bounded, because up to time $\tau$ the process $X_t$ remains in $[a,b]$, and so $|M_{\tau\wedge t}| \le C \equiv \sup_{a\le x\le b} |h(x)|$. Dominated convergence gives $EM_{\tau\wedge t} \to EM_\tau$ as $t \to \infty$. We have verified that $Eh(X_\tau) = h(0)$.

Finally, we can choose the constants $C_1$ and $C_2$ so that $h(b) = 1$ and $h(a) = 0$. After some details,

(6.30)
$$P(X_\tau = b) = h(0) = \frac{e^{-2\mu a/\sigma^2} - 1}{e^{-2\mu a/\sigma^2} - e^{-2\mu b/\sigma^2}}.$$

Can you explain what you see as you let either $a \downarrow -\infty$ or $b \uparrow \infty$? (Decide first whether $\mu$ is positive or negative.)

We leave the case $\mu = 0$ as an exercise. You should get $P(X_\tau = b) = (-a)/(b-a)$.
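Formula (6.30) can be compared against a crude simulation. A Monte Carlo sketch, not from the text (parameters, step size, and tolerance are illustrative; the Euler scheme overshoots the boundary slightly, so only rough agreement is expected):

```python
import random
import math

# Monte Carlo check of (6.30): Euler-discretized X_t = mu*t + sigma*B_t run
# until it leaves (a, b); compare the fraction of exits at b with the formula.
random.seed(4)
mu, sigma, a, b = 1.0, 1.0, -1.0, 1.0
dt = 1e-3
sqdt = math.sqrt(dt)
n_paths, hits_b = 400, 0
for _ in range(n_paths):
    x = 0.0
    while a < x < b:
        x += mu * dt + sigma * random.gauss(0.0, sqdt)
    if x >= b:
        hits_b += 1

k = math.exp(-2 * mu * a / sigma ** 2)                     # e^{-2 mu a / sigma^2}
exact = (k - 1) / (k - math.exp(-2 * mu * b / sigma ** 2))  # formula (6.30)
assert abs(hits_b / n_paths - exact) < 0.1
```

With these parameters the closed form is about $0.88$: the positive drift makes exit at $b$ much more likely.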
Next we use Itô's formula to investigate recurrence and transience of Brownian motion, and whether Brownian motion ever hits a point. Let us first settle these questions in one dimension.

Proposition 6.21. Let $B_t$ be Brownian motion in $\mathbf{R}$. Then $\limsup_{t\to\infty} B_t = \infty$ and $\liminf_{t\to\infty} B_t = -\infty$, almost surely. Consequently almost every Brownian path visits every point infinitely often.

Proof. Let $\tau(0) = 0$ and $\tau(k+1) = \inf\{ t > \tau(k) : |B_t - B_{\tau(k)}| = 4^{k+1} \}$. By the strong Markov property of Brownian motion, for each $k$ the restarted process $\{ B_{\tau(k)+s} - B_{\tau(k)} : s \ge 0 \}$ is a standard Brownian motion, independent of $\mathcal{F}_{\tau(k)}$. Consequently, by the case $\mu = 0$ of Example 6.20,
$$P\big[ B_{\tau(k+1)} - B_{\tau(k)} = 4^{k+1} \big] = P\big[ B_{\tau(k+1)} - B_{\tau(k)} = -4^{k+1} \big] = \frac{1}{2}.$$
By the strong Markov property these random variables indexed by $k$ are independent. Thus, for almost every $\omega$, there are arbitrarily large $j$ and $k$ such that $B_{\tau(j+1)} - B_{\tau(j)} = 4^{j+1}$ and $B_{\tau(k+1)} - B_{\tau(k)} = -4^{k+1}$. But then since
$$|B_{\tau(j)}| \le \sum_{i=1}^{j} 4^i = \frac{4^{j+1} - 4}{4 - 1} \le \frac{4^{j+1}}{2},$$
we get $B_{\tau(j+1)} \ge 4^j$, and by the same argument $B_{\tau(k+1)} \le -4^k$. Thus $\limsup_{t\to\infty} B_t = \infty$ and $\liminf_{t\to\infty} B_t = -\infty$ almost surely.

Almost every Brownian path visits every point infinitely often due to a special property of one dimension: it is impossible to go around a point. □
Proposition 6.22. Let $B_t$ be Brownian motion in $\mathbf{R}^d$, and let $P^z$ denote the probability measure when the process $B_t$ is started at point $z \in \mathbf{R}^d$. Let
$$\sigma_r = \inf\{ t \ge 0 : |B_t| \le r \}$$
be the first time Brownian motion hits the ball of radius $r$ around the origin.

(a) If $d = 2$, then $P^z(\sigma_r < \infty) = 1$ for all $r > 0$ and $z \in \mathbf{R}^d$.

(b) If $d \ge 3$, then for $z$ outside the ball of radius $r$,
$$P^z(\sigma_r < \infty) = \Big( \frac{r}{|z|} \Big)^{d-2}.$$
There will be an almost surely finite time $T$ such that $|B_t| > r$ for all $t \ge T$.

(c) For $d \ge 2$ and any $z, y \in \mathbf{R}^d$,
$$P^z\big[ B_t \ne y \text{ for all } 0 < t < \infty \big] = 1.$$
Note that $z = y$ is allowed. That is why $t = 0$ is not included in the event.
Proof. Observe that for (c) it suffices to consider $y = 0$, because
$$P^z\big[ B_t \ne y \text{ for all } 0 < t < \infty \big] = P^{z-y}\big[ B_t \ne 0 \text{ for all } 0 < t < \infty \big].$$
Then, it suffices to consider $z \ne 0$, because if we have the result for all $z \ne 0$, then we can use the Markov property to restart the Brownian motion after a small time $s$:
$$P^0\big[ B_t \ne 0 \text{ for all } 0 < t < \infty \big] = \lim_{s\searrow 0} P^0\big[ B_t \ne 0 \text{ for all } s < t < \infty \big] = \lim_{s\searrow 0} E^0\Big[ P^{B(s)}\big\{ B_t \ne 0 \text{ for all } 0 < t < \infty \big\} \Big] = 1.$$
The last equality is true because $B(s) \ne 0$ with probability one.
Now we turn to the actual proofs, and we can assume $z \ne 0$ and the small radius $r < |z|$.

We start with dimension $d = 2$. The function $g(x) = \log|x|$ is harmonic in $D = \mathbf{R}^2 \setminus \{0\}$. (Only in $d = 2$; check.) Let
$$\tau_R = \inf\{ t \ge 0 : |B_t| \ge R \}$$
be the first time to exit the open ball of radius $R$. Pick $r < |z| < R$, and define the annulus $A = \{ x : r < |x| < R \}$. The time to exit the annulus is $\tau = \sigma_r \wedge \tau_R$. Since any coordinate of $B_t$ has limsup $\infty$ almost surely, or by the independent increments argument used in Example 6.20, the exit time $\tau_R$ and hence also $\tau$ is finite almost surely.

Apply Corollary 6.17 to the harmonic function
$$f(x) = \frac{\log R - \log|x|}{\log R - \log r}$$
and the annulus $A$. We get that $f(B_{\tau\wedge t})$ is a local $L^2$-martingale, and since $f$ is bounded on the closure of $A$, it follows that $f(B_{\tau\wedge t})$ is an $L^2$-martingale. The optional stopping argument used in Example 6.20, in conjunction with letting $t \to \infty$, gives again $f(z) = E^z f(B_0) = E^z f(B_\tau)$. Since $f = 1$ on the boundary of radius $r$ but vanishes at radius $R$,
$$P^z(\sigma_r < \tau_R) = P^z\big( |B_\tau| = r \big) = E^z f(B_\tau) = f(z),$$
and so

(6.31)
$$P^z(\sigma_r < \tau_R) = \frac{\log R - \log|z|}{\log R - \log r}.$$
From this we can extract both part (a) and part (c) for $d = 2$. First, $\tau_R \nearrow \infty$ as $R \nearrow \infty$ because a fixed path of Brownian motion is bounded on bounded time intervals. Consequently
$$P^z(\sigma_r < \infty) = \lim_{R\to\infty} P^z(\sigma_r < \tau_R) = \lim_{R\to\infty} \frac{\log R - \log|z|}{\log R - \log r} = 1.$$
Secondly, consider $r = r(k) = (1/k)^k$ and $R = R(k) = k$. Then we get
$$P^z\big( \sigma_{r(k)} < \tau_{R(k)} \big) = \frac{\log k - \log|z|}{(k+1)\log k}$$
which vanishes as $k \to \infty$. Let $\sigma_0 = \inf\{ t \ge 0 : B_t = 0 \}$ be the first hitting time of 0. For $r < |z|$, $\sigma_r \le \sigma_0$ because $B_t$ cannot hit zero without first entering the ball of radius $r$. Again since $\tau_{R(k)} \nearrow \infty$ as $k \to \infty$,
$$P^z(\sigma_0 < \infty) = \lim_{k\to\infty} P^z\big( \sigma_0 < \tau_{R(k)} \big) \le \lim_{k\to\infty} P^z\big( \sigma_{r(k)} < \tau_{R(k)} \big) = 0.$$
For dimension $d \ge 3$ we use the harmonic function $g(x) = |x|^{2-d}$, and apply Itô's formula to the function
$$f(x) = \frac{R^{2-d} - |x|^{2-d}}{R^{2-d} - r^{2-d}}.$$
The annulus $A$ and stopping times $\tau_R$ and $\tau$ are defined as above. The same reasoning now leads to

(6.32)
$$P^z(\sigma_r < \tau_R) = \frac{R^{2-d} - |z|^{2-d}}{R^{2-d} - r^{2-d}}.$$

Letting $R \nearrow \infty$ gives
$$P^z(\sigma_r < \infty) = \frac{|z|^{2-d}}{r^{2-d}} = \Big( \frac{r}{|z|} \Big)^{d-2}$$
as claimed. Part (c) follows now because the quantity above tends to zero as $r \searrow 0$.
It remains to show that after some finite time, the ball of radius $r$ is no longer visited. Let $r < R$. Define $\sigma^1_R = \sigma_R$, and for $n \ge 2$,
\[
\sigma^n_r = \inf\{t > \sigma^{n-1}_R : |B_t| \le r\}
\quad\text{and}\quad
\sigma^n_R = \inf\{t > \sigma^n_r : |B_t| \ge R\}.
\]
In other words, $\sigma^1_R < \sigma^2_r < \sigma^2_R < \sigma^3_r < \cdots$ are the successive visits to radius $R$ and back to radius $r$. Let $\gamma = (r/R)^{d-2} < 1$. We claim that for $n \ge 2$,
\[
P^z(\sigma^n_r < \infty) = \gamma^{n-1}.
\]
For $n = 2$, since $\sigma^1_R < \sigma^2_r$, use the strong Markov property to restart the Brownian motion at time $\sigma^1_R$:
\[
P^z(\sigma^2_r < \infty) = P^z(\sigma^1_R < \sigma^2_r < \infty)
= E^z\big[\, P^{B(\sigma^1_R)}\{\sigma_r < \infty\} \,\big] = \gamma.
\]
Then by induction,
\[
P^z(\sigma^n_r < \infty) = P^z(\sigma^{n-1}_r < \sigma^{n-1}_R < \sigma^n_r < \infty)
= E^z\big[\, \mathbf{1}\{\sigma^{n-1}_r < \sigma^{n-1}_R < \infty\}\, P^{B(\sigma^{n-1}_R)}\{\sigma_r < \infty\} \,\big]
= \gamma\, P^z(\sigma^{n-1}_r < \infty) = \gamma^{n-1}.
\]
Above we used the fact that if $\sigma^{n-1}_r < \infty$, then necessarily $\sigma^{n-1}_R < \infty$, because each coordinate of $B_t$ has $\limsup = \infty$.

The claim implies
\[
\sum_n P^z(\sigma^n_r < \infty) < \infty.
\]
By Borel--Cantelli, $\sigma^n_r < \infty$ can happen only finitely many times, almost surely. $\square$
Remark 6.23. We see here an example of an uncountable family of events
with probability one whose intersection must vanish. Namely, the inter-
section of the events that Brownian motion does not hit a particular point
would be the event that Brownian motion never hits any point. Yet Brow-
nian motion must reside somewhere.
Theorem 6.24. (Lévy's Characterization of Brownian Motion.) Let $M = [M^1, \ldots, M^d]^T$ be a continuous $\mathbf{R}^d$-valued local martingale with $M(0) = 0$. Then $M$ is a standard Brownian motion if and only if $[M^i, M^j]_t = \delta_{i,j}\, t$.

Proof. We already know that a $d$-dimensional standard Brownian motion satisfies $[B^i, B^j]_t = \delta_{i,j}\, t$.
What we need to show is that $M$ with the assumed covariance is Brownian motion. As a continuous local martingale, $M$ is also a local $L^2$-martingale. Fix a vector $\theta = (\theta_1, \ldots, \theta_d)^T \in \mathbf{R}^d$ and
\[
f(t, x) = \exp\big( i\theta^T x + \tfrac12 |\theta|^2 t \big).
\]
Let $Z_t = f(t, M(t))$. Itô's formula applies equally well to complex-valued functions, and so
\[
Z_t = 1 + \frac{|\theta|^2}{2} \int_0^t Z_s\, ds
+ \sum_{j=1}^d i\theta_j \int_0^t Z_s\, dM^j(s)
- \frac12 \sum_{j=1}^d \theta_j^2 \int_0^t Z_s\, ds
= 1 + \sum_{j=1}^d i\theta_j \int_0^t Z_s\, dM^j(s).
\]
This shows that $Z$ is a local $L^2$-martingale. On any bounded time interval $Z$ is bounded because the random factor $\exp(i\theta^T M(t))$ has absolute value one. Consequently $Z$ is an $L^2$-martingale. For $s < t$, $E[Z_t \mid \mathcal{F}_s] = Z_s$ can be rewritten as
\[
E\big[ \exp\big( i\theta^T (M(t) - M(s)) \big) \,\big|\, \mathcal{F}_s \big]
= \exp\big( -\tfrac12 |\theta|^2 (t - s) \big).
\]
By Lemma B.16 in the appendix, conditioned on $\mathcal{F}_s$, the increment $M(t) - M(s)$ has normal distribution with mean zero and covariance matrix $(t-s)$ times the identity. In particular, $M(t) - M(s)$ is independent of $\mathcal{F}_s$. Thus $M$ has all the properties of Brownian motion. $\square$
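A standard illustration of the theorem (an aside, not part of the proof above): $X_t = \int_0^t \mathrm{sign}(B_s)\, dB_s$ is a continuous local martingale with $[X]_t = \int_0^t \mathrm{sign}(B_s)^2\, ds = t$, so by Theorem 6.24 it is itself a standard Brownian motion, even though the integrand looks nothing like one. A Monte Carlo sketch checking that $X_1$ behaves like $\mathcal{N}(0,1)$:

```python
import numpy as np

rng = np.random.default_rng(1)
n_paths, n_steps = 10000, 500
dt = 1.0 / n_steps
dB = np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
B = np.cumsum(dB, axis=1)

# predictable (left-endpoint) evaluation: sign(B_{t_k}) times the next increment
sgn = np.sign(np.concatenate([np.zeros((n_paths, 1)), B[:, :-1]], axis=1))
sgn[sgn == 0] = 1.0      # convention at zero is immaterial, as in the text

X1 = (sgn * dB).sum(axis=1)          # discretized int_0^1 sign(B) dB
var_X1 = X1.var()                    # should be close to [X]_1 = 1
kurt_X1 = (X1 ** 4).mean() / var_X1 ** 2   # should be close to the Gaussian value 3
```

Because the signs are previsible and take values $\pm 1$, the discretized sum is exactly a sum of independent $\mathcal{N}(0, dt)$ variables, so the check is exact in law up to sampling error.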
Remark 6.25. The proof actually gives more than claimed: if $M$ is a continuous process such that $X_t = M_t - M_0$ is a local martingale with $[X^i, X^j]_t = \delta_{i,j}\, t$, then $X$ is a standard Brownian motion independent of $M_0$. Only a small observation on top of the proof above is needed.
Next we prove a useful moment inequality. It is one case of the Burkholder--Davis--Gundy inequalities. Recall the notation $M^*_t = \sup_{0 \le s \le t} |M_s|$.

Proposition 6.26. Let $p \in [2, \infty)$. Then there exists a constant $C_p$ such that for all continuous local martingales $M$ with $M_0 = 0$ and all $0 < t < \infty$,
\[
(6.33)\qquad E[(M^*_t)^p] \le C_p\, E\big[ [M]_t^{p/2} \big].
\]
Proof. Check that $f(x) = |x|^p$ is a $C^2$ function on all of $\mathbf{R}$, with derivatives $f'(x) = \mathrm{sign}(x)\, p|x|^{p-1}$ and $f''(x) = p(p-1)|x|^{p-2}$. (The origin needs a separate check. The sign function is $\mathrm{sign}(x) = x/|x|$ for nonzero $x$, and here the convention at $x = 0$ is immaterial.) Version (6.16) of Itô's formula gives
\[
|M_t|^p = \int_0^t \mathrm{sign}(M_s)\, p|M_s|^{p-1}\, dM_s
+ \tfrac12 p(p-1) \int_0^t |M_s|^{p-2}\, d[M]_s.
\]
Assume $M$ is bounded. Then $M$ is an $L^2$ martingale and $M \in \mathcal{L}_2(M, \mathcal{P})$. Consequently the term $\int_0^t \mathrm{sign}(M_s)\, p|M_s|^{p-1}\, dM_s$ is a mean zero $L^2$ martingale. Take expectations and apply Hölder's inequality:
\[
E\big[ |M_t|^p \big]
= \tfrac12 p(p-1)\, E\!\int_0^t |M_s|^{p-2}\, d[M]_s
\le \tfrac12 p(p-1)\, E\big[ (M^*_t)^{p-2} [M]_t \big]
\le \tfrac12 p(p-1) \Big( E\big[ (M^*_t)^p \big] \Big)^{1-\frac2p} \Big( E\big[ [M]_t^{p/2} \big] \Big)^{\frac2p}.
\]
Combining the above with Doob's inequality (3.11) gives
\[
E[(M^*_t)^p] \le C_p\, E\big[ |M_t|^p \big]
\le C_p \Big( E\big[ (M^*_t)^p \big] \Big)^{1-\frac2p} \Big( E\big[ [M]_t^{p/2} \big] \Big)^{\frac2p}.
\]
Rearranging the above inequality gives the conclusion for a bounded martingale. The constant $C_p$ changes along the way but depends only on $p$.
The general case comes by localization. Let $\tau_k = \inf\{t \ge 0 : |M_t| \ge k\}$, a stopping time by Corollary 2.10. By continuity $M^{\tau_k}$ is bounded, so we can apply the already proved result to claim
\[
E\big[ (M^*_{\tau_k \wedge t})^p \big] \le C_p\, E\big[ [M]_{\tau_k \wedge t}^{p/2} \big].
\]
On both sides monotone convergence leads to (6.33) as $k \to \infty$. $\square$
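A crude numerical sanity check of (6.33) in the simplest case $M = B$ and $p = 4$, so that $[M]_t = t$ (an aside, not part of the proof; all numerical choices are arbitrary). Doob's $L^4$ inequality together with $E[B_1^4] = 3$ gives the explicit admissible bound $E[(B^*_1)^4] \le (4/3)^4 \cdot 3 \approx 9.48$, and a Monte Carlo estimate of $E[(B^*_1)^4]$ should land between $E[B_1^4] = 3$ and that bound:

```python
import numpy as np

rng = np.random.default_rng(2)
n_paths, n_steps = 10000, 1000
dt = 1.0 / n_steps
B = np.cumsum(np.sqrt(dt) * rng.standard_normal((n_paths, n_steps)), axis=1)

B_star = np.abs(B).max(axis=1)    # grid approximation of B*_1 = sup_{s<=1} |B_s|
m4 = (B_star ** 4).mean()         # Monte Carlo estimate of E[(B*_1)^4]
doob_bound = (4 / 3) ** 4 * 3     # Doob's L^4 bound on E[(B*_1)^4]
```

The grid maximum slightly undershoots the continuous supremum, which only pushes the estimate further inside the bound.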
Exercises
Exercise 6.1. Does part (a) of Lemma 6.4 extend to semimartingales?
Exercise 6.2. (a) Let $Y$ be a cadlag semimartingale with $Y_0 = 0$. Check that Itô's formula gives $2\int Y_{s-}\, dY_s = Y^2 - [Y]$, as already follows from the integration by parts formula (6.7).

(b) Let $N$ be a rate $\lambda$ Poisson process and $M_t = N_t - \lambda t$. Apply part (a) to show that $M_t^2 - N_t$ is a martingale.
Exercise 6.3. Let $B_t$ be standard Brownian motion and $M_t = B_t^2 - t$.

(a) Justify the identity $[M] = [B^2]$. (Use the linearity (6.4).)

(b) Apply Itô's formula to $B_t^2$ and from that find a representation for $[B^2]$ in terms of a single $ds$-integral.

(c) Using Itô's formula and your answer to (b), check that $M^2 - [B^2]$ is a martingale, as it should be. (Of course without appealing to the fact that $M^2 - [B^2] = M^2 - [M]$ is a martingale.)

(d) Use integration by parts (6.7) to find another representation of $[B^2]$. Then use Itô's formula to show that this representation agrees with the answer in (b).
Exercise 6.4. (a) Check that for a rate $\lambda$ Poisson process $N$ Itô's formula reduces to an obvious identity, namely a telescoping sum over jumps that can be written as
\[
(6.34)\qquad f(N_t) = f(0) + \int_{(0,t]} \big( f(N_s) - f(N_{s-}) \big)\, dN_s.
\]
(b) Suppose $E\int_0^T |f(N_s)|^2\, ds < \infty$. Show that for $t \in [0, T]$
\[
(6.35)\qquad E[f(N_t)] = f(0) + \lambda\, E\int_0^t \big( f(N_s + 1) - f(N_s) \big)\, ds.
\]
Warning. Do not attempt to apply stochastic calculus to non-predictable integrands.
Exercise 6.5. Suppose $X$ is a nondecreasing cadlag process such that $X(0) = 0$, all jumps are of size 1 (that is, $X_s - X_{s-} = 0$ or $1$), between jumps $X$ is constant, and $M_t = X_t - \lambda t$ is a martingale. Show that $X$ is a rate $\lambda$ Poisson process. Hint. Imitate the proof of Theorem 6.24.
Exercise 6.6. Let $B$ be standard one-dimensional Brownian motion. Show that for a continuously differentiable nonrandom function $\phi$,
\[
\int_0^t \phi(s)\, dB_s = \phi(t) B_t - \int_0^t B_s\, \phi'(s)\, ds.
\]
Exercise 6.7. Define
\[
V_1(t) = \int_0^t B_s\, ds, \qquad V_2(t) = \int_0^t V_1(s)\, ds,
\]
and generally
\[
V_n(t) = \int_0^t V_{n-1}(s)\, ds
= \int_0^t ds_1 \int_0^{s_1} ds_2 \cdots \int_0^{s_{n-1}} ds_n\, B_{s_n}.
\]
$V_n$ is known as $n$ times integrated Brownian motion, and appears in applications in statistics. Show that
\[
V_n(t) = \frac{1}{n!} \int_0^t (t-s)^n\, dB_s.
\]
Then show that the process
\[
M_n(t) = V_n(t) - \sum_{j=1}^n \frac{t^j}{j!\,(n-j)!} \int_0^t (-s)^{n-j}\, dB_s
\]
is a martingale.
Exercise 6.8. Let $X$ and $Y$ be independent rate $\lambda$ Poisson processes.

(a) Show that
\[
P\{\text{$X$ and $Y$ have no jumps in common}\} = 1.
\]
Find the covariation $[X, Y]$.

(b) Find a process $U$ such that $X_t Y_t - U_t$ is a martingale.
Exercise 6.9. Let $D = \mathbf{R}^3 \setminus \{0\}$, the complement of the origin in $\mathbf{R}^3$. Define $f \in C^2(D)$ by
\[
f(x) = \frac{1}{|x|} = \frac{1}{(x_1^2 + x_2^2 + x_3^2)^{1/2}}.
\]
Let $B(t) = (B_1(t), B_2(t), B_3(t))$ be Brownian motion in $\mathbf{R}^3$ started at some point $B(0) = z \ne 0$. Show that $f(B_t)$ is a uniformly integrable local $L^2$-martingale, but not a martingale.

The point of the exercise is that a bounded local martingale is a martingale, so it is natural to ask whether a uniformly integrable local martingale is a martingale.

Hints. From Proposition 6.22, $P^z\{B_t \ne 0 \text{ for all } t\} = 1$. Uniform integrability follows for example from having $E[f(B_t)^2]$ bounded in $t$. A martingale has a constant expectation.
Exercise 6.10. Let $B_t$ be Brownian motion in $\mathbf{R}^k$ started at point $z \in \mathbf{R}^k$. As before, let
\[
\sigma_R = \inf\{t \ge 0 : |B_t| \ge R\}
\]
be the first time the Brownian motion leaves the ball of radius $R$. Compute the expectation $E^z[\sigma_R]$ as a function of $z$, $k$ and $R$.

Hint. Start by applying Itô's formula to $f(x) = x_1^2 + \cdots + x_k^2$.
Exercise 6.11. Provide the proof for Remark 6.25.
Exercise 6.12. (Space-time Poisson random measure) In some applications the random occurrences modeled by a Poisson process also come with a spatial structure. Let $U$ be a subset of $\mathbf{R}^d$ and $\nu$ a finite Borel measure on $U$. Let $\Phi$ be a Poisson random measure on $U \times \mathbf{R}_+$ with mean measure $\nu \otimes m$. Define the filtration $\mathcal{F}_t = \sigma\{\Phi(B) : B \in \mathcal{B}_{U \times [0,t]}\}$. Fix a bounded Borel function $h$ on $U \times \mathbf{R}_+$. Define
\[
X_t = \int_{U \times [0,t]} h(x, s)\, \Phi(dx, ds), \qquad
H_t = \int_0^t \int_U h(x, s)\, \nu(dx)\, ds
\]
and $M_t = X_t - H_t$.

(a) Show that these are cadlag processes, $M$ is a martingale, $H$ is an FV process and $X$ is a semimartingale.

(b) Show that for an adapted cadlag process $Z$,
\[
\int_{(0,t]} Z(s-)\, dX_s = \int_{U \times [0,t]} Z(s-)\, h(x, s)\, \Phi(dx, ds).
\]
Hint. The Riemann sums in Proposition 5.37 would work.

(c) Show that for an adapted cadlag process $Z$,
\[
E\int_{(0,t]} Z(s-)\, dX_s = \int_0^t \int_U E[Z(s)]\, h(x, s)\, \nu(dx)\, ds.
\]
What moment assumptions do you need for $Z$?
Chapter 7

Stochastic Differential Equations
In this chapter we study equations of the type
\[
(7.1)\qquad X(t, \omega) = H(t, \omega) + \int_{(0,t]} F(s, \omega, X(\cdot))\, dY(s, \omega)
\]
for an unknown $\mathbf{R}^d$-valued process $X$. In the equation, $Y$ is a given $\mathbf{R}^m$-valued cadlag semimartingale and $H$ is a given $\mathbf{R}^d$-valued adapted cadlag process. The coefficient $F(t, \omega, \eta)$ is a $d \times m$-matrix valued function of the time variable $t$, the sample point $\omega$, and a cadlag path $\eta$. The integral $\int F\, dY$ with a matrix-valued integrand and vector-valued integrator has a natural componentwise interpretation, as explained in Remark 5.38.

Underlying the equation is a probability space $(\Omega, \mathcal{F}, P)$ with a filtration $\{\mathcal{F}_t\}$, on which the ingredients $H$, $Y$ and $F$ are defined. A solution $X$ is an $\mathbf{R}^d$-valued cadlag process that is defined on this given probability space $(\Omega, \mathcal{F}, P)$ and adapted to $\{\mathcal{F}_t\}$, and that satisfies the equation in the sense that the two sides of (7.1) are indistinguishable. This notion is the so-called strong solution, which means that the solution process can be constructed on the given probability space, adapted to the given filtration.

In order for the integral $\int F\, dY$ to be sensible, $F$ has to have the property that, whenever an adapted cadlag process $X$ is substituted for $\eta$, the resulting process $(t, \omega) \mapsto F(t, \omega, X(\omega))$ is predictable and locally bounded in the sense (5.36). We list precise assumptions on $F$ when we state and prove an existence and uniqueness theorem.

Since the integral term vanishes at time zero, equation (7.1) contains the initial value $X(0) = H(0)$. The integral equation (7.1) can be written
in the differential form
\[
(7.2)\qquad dX(t) = dH(t) + F(t, X)\, dY(t), \qquad X(0) = H(0)
\]
where the initial value must then be displayed explicitly. The notation can be further simplified by dropping the superfluous time variables:
\[
(7.3)\qquad dX = dH + F(t, X)\, dY, \qquad X(0) = H(0).
\]
Equations (7.2) and (7.3) have no other interpretation except as abbreviations for (7.1). Equations such as (7.3) are known as SDEs (stochastic differential equations) even though precisely speaking they are integral equations.
7.1. Examples of stochastic equations and solutions
7.1.1. Itô equations. Let $B_t$ be a standard Brownian motion in $\mathbf{R}^m$ with respect to a filtration $\{\mathcal{F}_t\}$. Let $\xi$ be an $\mathbf{R}^d$-valued $\mathcal{F}_0$-measurable random variable. Often $\xi$ is a nonrandom point $x_0$. The assumption of $\mathcal{F}_0$-measurability implies that $\xi$ is independent of the Brownian motion. An Itô equation has the form
\[
(7.4)\qquad dX_t = b(t, X_t)\, dt + \sigma(t, X_t)\, dB_t, \qquad X_0 = \xi
\]
or in integral form
\[
(7.5)\qquad X_t = \xi + \int_0^t b(s, X_s)\, ds + \int_0^t \sigma(s, X_s)\, dB_s.
\]
The coefficients $b(t, x)$ and $\sigma(t, x)$ are Borel measurable functions of $(t, x) \in [0, \infty) \times \mathbf{R}^d$. The drift vector $b(t, x)$ is $\mathbf{R}^d$-valued, and the dispersion matrix $\sigma(t, x)$ is $d \times m$-matrix valued. The $d \times d$ matrix $a(t, x) = \sigma(t, x)\sigma(t, x)^T$ is called the diffusion matrix. $X$ is the unknown $\mathbf{R}^d$-valued process.

Linear Itô equations can be explicitly solved by a technique from basic theory of ODEs (ordinary differential equations) known as the integrating factor. We begin with the nonrandom case from elementary calculus.
Example 7.1. (Integrating factor for linear ODEs.) Let $a(t)$ and $g(t)$ be given functions. Suppose we wish to solve
\[
(7.6)\qquad x' + a(t)x = g(t)
\]
for an unknown function $x = x(t)$. The trick is to multiply through the equation by the integrating factor
\[
(7.7)\qquad z(t) = e^{\int_0^t a(s)\, ds}
\]
and then identify the left-hand side $zx' + azx$ as the derivative $(zx)'$. The equation becomes
\[
\frac{d}{dt}\Big( x(t)\, e^{\int_0^t a(s)\, ds} \Big) = g(t)\, e^{\int_0^t a(s)\, ds}.
\]
Integrating from $0$ to $t$ gives
\[
x(t)\, e^{\int_0^t a(s)\, ds} - x(0) = \int_0^t g(s)\, e^{\int_0^s a(u)\, du}\, ds
\]
which rearranges into
\[
x(t) = x(0)\, e^{-\int_0^t a(s)\, ds} + e^{-\int_0^t a(s)\, ds} \int_0^t g(s)\, e^{\int_0^s a(u)\, du}\, ds.
\]
Now one can check by differentiation that this formula gives a solution.
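The closed-form solution above is easy to evaluate numerically. A small sketch (not from the text; grid size is an arbitrary choice) approximates the two integrals by the trapezoid rule and checks the formula against the constant-coefficient case $a \equiv 1$, $g \equiv 1$, $x(0) = 0$, whose solution is $1 - e^{-t}$:

```python
import numpy as np

def integrating_factor_solution(a, g, x0, t, n=20000):
    """Evaluate x(t) = x0 e^{-A(t)} + e^{-A(t)} * int_0^t g(s) e^{A(s)} ds,
    where A(t) = int_0^t a(u) du, using trapezoidal quadrature."""
    s = np.linspace(0.0, t, n + 1)
    h = s[1] - s[0]
    # cumulative trapezoid rule for A(s) along the grid
    A = np.concatenate([[0.0], np.cumsum((a(s[1:]) + a(s[:-1])) / 2 * h)])
    integrand = g(s) * np.exp(A)
    I = ((integrand[1:] + integrand[:-1]) / 2 * h).sum()
    return x0 * np.exp(-A[-1]) + np.exp(-A[-1]) * I

# a = g = 1, x(0) = 0: the formula collapses to 1 - e^{-t}
x_num = integrating_factor_solution(lambda s: np.ones_like(s),
                                    lambda s: np.ones_like(s), 0.0, 2.0)
x_exact = 1.0 - np.exp(-2.0)
```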
We apply this idea to the most basic Itô equation.

Example 7.2. (Ornstein--Uhlenbeck process) Let $\alpha > 0$ and $\sigma > 0$ be constants, and consider the SDE
\[
(7.8)\qquad dX_t = -\alpha X_t\, dt + \sigma\, dB_t
\]
with a given initial value $X_0$ independent of the one-dimensional Brownian motion $B_t$. Formal similarity with the linear ODE (7.6) suggests we try the integrating factor $Z_t = e^{\alpha t}$. For the ODE the key was to take advantage of the formula $d(zx) = x\, dz + z\, dx$ for the derivative of a product. Here we attempt the same, but differentiation rules from calculus have to be replaced by Itô's formula. The integration by parts rule, which is a special case of Itô's formula, gives
\[
d(ZX) = Z\, dX + X\, dZ + d[Z, X].
\]
The definition of $Z$ gives $dZ = \alpha Z\, dt$. Since $Z$ is a continuous FV process, $[Z, X] = 0$. Assuming $X$ satisfies (7.8), we get
\[
d(ZX) = -\alpha ZX\, dt + \sigma Z\, dB + \alpha ZX\, dt = \sigma Z\, dB
\]
which in integrated form becomes (note $Z_0 = 1$)
\[
Z_t X_t = X_0 + \sigma \int_0^t e^{\alpha s}\, dB_s
\]
and we can solve for $X$ to get
\[
(7.9)\qquad X_t = X_0\, e^{-\alpha t} + \sigma e^{-\alpha t} \int_0^t e^{\alpha s}\, dB_s.
\]
Since we arrived at this formula by assuming the solution of the equation, now we turn around and verify that (7.9) defines a solution of (7.8). This is a straightforward Itô calculation. It is conveniently carried out in terms of formal stochastic differentials (but could just as well be recorded in terms of integrals):
\[
dX = -\alpha X_0 e^{-\alpha t}\, dt
- \alpha e^{-\alpha t} \Big( \sigma \int_0^t e^{\alpha s}\, dB \Big)\, dt
+ \sigma e^{-\alpha t} e^{\alpha t}\, dB
+ \sigma\, d\Big[ e^{-\alpha t},\ \int e^{\alpha s}\, dB \Big]
\]
\[
= -\alpha \Big( X_0 e^{-\alpha t} + \sigma e^{-\alpha t} \int_0^t e^{\alpha s}\, dB \Big)\, dt + \sigma\, dB
= -\alpha X\, dt + \sigma\, dB.
\]
The quadratic covariation vanishes by Lemma A.10 because $e^{-\alpha t}$ is a continuous BV function. Notice that we engaged in slight abuse of notation by putting $e^{-\alpha t}$ inside the brackets, though of course we do not mean the point value but rather the process, or function of $t$. By the same token, $\int e^{\alpha s}\, dB$ stands for the process $\int_0^t e^{\alpha s}\, dB$.

The process defined by the SDE (7.8) or by the formula (7.9) is known as the Ornstein--Uhlenbeck process.

If the equation had no noise ($\sigma = 0$) the solution would be $X_t = X_0 e^{-\alpha t}$, which simply decays to $0$ as $t \to \infty$. Let us observe that in contrast to this, when the noise is turned on ($\sigma > 0$), the process $X_t$ is not driven to zero but instead to a nontrivial statistical steady state.
Consider first the random variable $Y_t = \sigma e^{-\alpha t} \int_0^t e^{\alpha s}\, dB_s$ that captures the dynamical noise part in the solution (7.9). $Y_t$ is $\sigma e^{-\alpha t}$ times a mean zero martingale. From the construction of stochastic integrals we know that $Y_t$ is a limit (in probability, or almost sure) of sums:
\[
Y_t = \lim_{\mathrm{mesh}(\pi) \to 0} \sigma e^{-\alpha t} \sum_{k=0}^{m(\pi)-1} e^{\alpha s_k} (B_{s_{k+1}} - B_{s_k}).
\]
From this we can find the distribution of $Y_t$ by computing its characteristic function. By the independence of Brownian increments
\[
E(e^{i\theta Y_t}) = \lim_{\mathrm{mesh}(\pi) \to 0}
E\Big[ \exp\Big( i\theta \sigma e^{-\alpha t} \sum_{k=0}^{m(\pi)-1} e^{\alpha s_k} (B_{s_{k+1}} - B_{s_k}) \Big) \Big]
\]
\[
= \lim_{\mathrm{mesh}(\pi) \to 0} \exp\Big( -\tfrac12 \theta^2 \sigma^2 e^{-2\alpha t} \sum_{k=0}^{m(\pi)-1} e^{2\alpha s_k} (s_{k+1} - s_k) \Big)
= \exp\Big( -\tfrac12 \theta^2 \sigma^2 e^{-2\alpha t} \int_0^t e^{2\alpha s}\, ds \Big)
= \exp\Big( -\frac{\theta^2 \sigma^2}{4\alpha} \big( 1 - e^{-2\alpha t} \big) \Big).
\]
This says that $Y_t$ is a mean zero Gaussian with variance $\frac{\sigma^2}{2\alpha}(1 - e^{-2\alpha t})$. In the $t \to \infty$ limit $Y_t$ converges weakly to the $\mathcal{N}\big(0, \frac{\sigma^2}{2\alpha}\big)$ distribution. Then from (7.9) we see that this distribution is the weak limit of $X_t$ as $t \to \infty$.

Suppose we take the initial point $X_0 \sim \mathcal{N}\big(0, \frac{\sigma^2}{2\alpha}\big)$, independent of $B$. Then (7.9) represents $X_t$ as the sum of two independent mean zero Gaussians, and we can add the variances to see that for each $t \in \mathbf{R}_+$, $X_t \sim \mathcal{N}\big(0, \frac{\sigma^2}{2\alpha}\big)$.

$X_t$ is in fact a Markov process. This can be seen from
\[
X_{t+s} = e^{-\alpha s} X_t + \sigma e^{-\alpha(t+s)} \int_t^{t+s} e^{\alpha u}\, dB_u.
\]
Our computations above show that as $t \to \infty$, for every choice of the initial state $X_0$, $X_t$ converges weakly to its invariant distribution $\mathcal{N}\big(0, \frac{\sigma^2}{2\alpha}\big)$.
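The Markov decomposition above gives an update rule that is exact in distribution: given $X_t$, the next value is $X_{t+h} = e^{-\alpha h} X_t + \mathcal{N}\big(0, \tfrac{\sigma^2}{2\alpha}(1 - e^{-2\alpha h})\big)$. A simulation sketch (parameters are arbitrary choices, not from the text) runs this update from a far-off initial point and checks convergence to the invariant law $\mathcal{N}(0, \sigma^2/2\alpha)$:

```python
import numpy as np

def simulate_ou(alpha, sigma, x0, T, n_steps, n_paths, seed=3):
    """Exact-in-distribution Ornstein-Uhlenbeck update implied by (7.9):
    X_{t+h} = e^{-alpha h} X_t + N(0, sigma^2 (1 - e^{-2 alpha h}) / (2 alpha))."""
    rng = np.random.default_rng(seed)
    h = T / n_steps
    decay = np.exp(-alpha * h)
    noise_sd = sigma * np.sqrt((1 - np.exp(-2 * alpha * h)) / (2 * alpha))
    X = np.full(n_paths, x0, dtype=float)
    for _ in range(n_steps):
        X = decay * X + noise_sd * rng.standard_normal(n_paths)
    return X

alpha, sigma = 2.0, 1.0
XT = simulate_ou(alpha, sigma, x0=5.0, T=5.0, n_steps=500, n_paths=40000)
stationary_var = sigma**2 / (2 * alpha)    # invariant variance sigma^2 / (2 alpha)
```

At $T = 5$ the factor $e^{-\alpha T}$ has wiped out the initial point $x_0 = 5$, so the sample mean should be near $0$ and the sample variance near $\sigma^2/(2\alpha) = 0.25$.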
We continue with another example of a linear equation where the integrating factor needs to take into consideration the stochastic part.

Example 7.3. (Geometric Brownian motion) Let $\mu$, $\sigma$ be constants, $B$ one-dimensional Brownian motion, and consider the SDE
\[
(7.10)\qquad dX = \mu X\, dt + \sigma X\, dB
\]
with a given initial value $X_0$ independent of the Brownian motion. Since the equation rewrites as
\[
dX - X(\mu\, dt + \sigma\, dB) = 0
\]
a direct adaptation of (7.7) would suggest multiplication by $\exp(-\mu t - \sigma B_t)$. The reader is invited to try this. Some trial and error with Itô's formula reveals that the quadratic variation must be taken into account, and the useful integrating factor turns out to be
\[
Z_t = \exp\Big( -\mu t - \sigma B_t + \tfrac12 \sigma^2 t \Big).
\]
Itô's formula gives first
\[
(7.11)\qquad dZ = (-\mu + \sigma^2) Z\, dt - \sigma Z\, dB.
\]
(Note that a second $\sigma^2/2$ factor comes from the quadratic variation term.) Then, assuming $X$ satisfies (7.10), one can check that $d(ZX) = 0$. For this calculation, note that the only nonzero contribution to the covariation of $X$ and $Z$ comes from the $dB$-terms of their semimartingale representations (7.11) and (7.10):
\[
[Z, X]_t = \Big[ -\sigma \int Z\, dB,\ \sigma \int X\, dB \Big]_t
= -\sigma^2 \int_0^t X_s Z_s\, d[B, B]_s = -\sigma^2 \int_0^t X_s Z_s\, ds.
\]
Above we used Proposition 6.2. In differential form the above is simply $d[Z, X] = -\sigma^2 ZX\, dt$.

Continuing with the solution of equation (7.10), $d(ZX) = 0$ and $Z_0 = 1$ give $Z_t X_t = Z_0 X_0 = X_0$, from which $X_t = X_0 Z_t^{-1}$. More explicitly, our tentative solution is
\[
X_t = X_0 \exp\Big( \big(\mu - \tfrac12 \sigma^2\big)t + \sigma B_t \Big)
\]
whose correctness can again be verified with Itô's formula. This process is called geometric Brownian motion, although some texts reserve the term for the case $\mu = 0$.
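The closed form can be sanity-checked numerically (an aside; the scheme and parameters are arbitrary choices): drive a crude Euler discretization of (7.10) with the same Brownian increments that enter the exact formula, and the two agree up to discretization error on each path:

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, x0, T = 0.1, 0.3, 1.0, 1.0
n = 20000
h = T / n
dB = np.sqrt(h) * rng.standard_normal(n)
B = dB.cumsum()                      # Brownian path at grid times

# Euler scheme for dX = mu X dt + sigma X dB along this path
X = x0
for k in range(n):
    X = X + mu * X * h + sigma * X * dB[k]

# exact solution on the same path
X_exact = x0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * B[-1])
rel_err = abs(X - X_exact) / X_exact
```

The Euler scheme has strong order $1/2$, so the relative error should shrink roughly like $\sqrt{h}$ as the grid is refined.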
Solutions of the previous two equations are defined for all time. In the next example this is not the case.

Example 7.4. (Brownian bridge) Fix $0 < T < 1$. The SDE is now
\[
(7.12)\qquad dX = -\frac{X}{1-t}\, dt + dB, \qquad \text{for } 0 \le t \le T, \text{ with } X_0 = 0.
\]
The integrating factor
\[
Z_t = \exp\Big( \int_0^t \frac{ds}{1-s} \Big) = \frac{1}{1-t}
\]
works, and we arrive at the solution
\[
(7.13)\qquad X_t = (1-t) \int_0^t \frac{1}{1-s}\, dB_s.
\]
To check that this solves (7.12), apply the product formula $d(UV) = U\, dV + V\, dU + d[U, V]$ with $U = 1-t$ and $V = \int_0^t (1-s)^{-1}\, dB_s = (1-t)^{-1} X$. With $dU = -dt$, $dV = (1-t)^{-1}\, dB$ and $[U, V] = 0$ this gives
\[
dX = d(UV) = (1-t)(1-t)^{-1}\, dB - (1-t)^{-1} X\, dt
= dB - (1-t)^{-1} X\, dt
\]
which is exactly (7.12).

The solution (7.13) is defined for $0 \le t < 1$. One can show that it converges to $0$ as $t \to 1$, along almost every path of $B_s$ (Exercise 7.1).
7.1.2. Stochastic exponential. The exponential function $g(t) = e^{ct}$ can be characterized as the unique function $g$ that satisfies the equation
\[
g(t) = 1 + c \int_0^t g(s)\, ds.
\]
The stochastic exponential generalizes the $dt$-integral to a semimartingale integral.
Theorem 7.5. Let $Y$ be a real-valued semimartingale such that $Y_0 = 0$. Define
\[
(7.14)\qquad Z_t = \exp\Big( Y_t - \tfrac12 [Y]_t \Big)
\prod_{s \in (0,t]} (1 + \Delta Y_s) \exp\Big( -\Delta Y_s + \tfrac12 (\Delta Y_s)^2 \Big).
\]
Then the process $Z$ is a cadlag semimartingale, and it is the unique cadlag semimartingale $Z$ that satisfies the equation
\[
(7.15)\qquad Z_t = 1 + \int_{(0,t]} Z_{s-}\, dY_s.
\]
Proof. The uniqueness of the solution of (7.15) will follow from the general uniqueness theorem for solutions of semimartingale equations.

We start by showing that $Z$ defined by (7.14) is a semimartingale. The definition can be reorganized as
\[
Z_t = \exp\Big( Y_t - \tfrac12 [Y]_t + \tfrac12 \sum_{s \in (0,t]} (\Delta Y_s)^2 \Big)
\prod_{s \in (0,t]} (1 + \Delta Y_s) \exp\big( -\Delta Y_s \big).
\]
The continuous part
\[
[Y]^c_t = [Y]_t - \sum_{s \in (0,t]} (\Delta Y_s)^2
\]
of the quadratic variation is an increasing process (cadlag, nondecreasing), hence an FV process. Since $e^x$ is a $C^2$ function, the factor
\[
(7.16)\qquad \exp(W_t) \equiv \exp\Big( Y_t - \tfrac12 [Y]_t + \tfrac12 \sum_{s \in (0,t]} (\Delta Y_s)^2 \Big)
\]
is a semimartingale by Itô's formula. It remains to show that for a fixed $\omega$ the product
\[
(7.17)\qquad U_t \equiv \prod_{s \in (0,t]} (1 + \Delta Y_s) \exp\big( -\Delta Y_s \big)
\]
converges and has paths of bounded variation. The part
\[
\prod_{s \in (0,t]:\ |\Delta Y(s)| \ge 1/2} (1 + \Delta Y_s) \exp\big( -\Delta Y_s \big)
\]
has only finitely many factors because a cadlag path has only finitely many jumps exceeding a given size in a bounded time interval. Hence this part is piecewise constant, cadlag and in particular FV. Let $\eta_s = \Delta Y_s\, \mathbf{1}\{|\Delta Y(s)| < 1/2\}$ denote a jump of magnitude below $\tfrac12$. It remains to show that
\[
H_t = \prod_{s \in (0,t]} (1 + \eta_s) \exp\big( -\eta_s \big)
= \exp\Big( \sum_{s \in (0,t]} \big( \log(1 + \eta_s) - \eta_s \big) \Big)
\]
is an FV process. For $|x| < 1/2$,
\[
\big| \log(1+x) - x \big| \le x^2.
\]
Hence the sum above is absolutely convergent:
\[
\sum_{s \in (0,t]} \big| \log(1 + \eta_s) - \eta_s \big|
\le \sum_{s \in (0,t]} \eta_s^2 \le [Y]_t < \infty.
\]
It follows that $\log H_t$ is a cadlag FV process (see Example 1.14 in this context). Since the exponential function is locally Lipschitz, $H_t = \exp(\log H_t)$ is also a cadlag FV process (Lemma A.6).

To summarize, we have shown that $e^{W_t}$ in (7.16) is a semimartingale and $U_t$ in (7.17) is a well-defined real-valued FV process. Consequently $Z_t = e^{W_t} U_t$ is a semimartingale.
The second part of the proof is to show that $Z$ satisfies equation (7.15). Let $f(w, u) = e^w u$, and find
\[
f_w = f, \quad f_u = e^w, \quad f_{uu} = 0, \quad f_{uw} = e^w \quad \text{and} \quad f_{ww} = f.
\]
Note that $\Delta W_s = \Delta Y_s$ because the jump in $[Y]$ at $s$ equals exactly $(\Delta Y_s)^2$. A straightforward application of Itô gives
\[
Z_t = 1 + \int_{(0,t]} e^{W(s-)}\, dU_s + \int_{(0,t]} Z_{s-}\, dW_s
+ \tfrac12 \int_{(0,t]} Z_{s-}\, d[W]_s + \int_{(0,t]} e^{W(s-)}\, d[W, U]_s
\]
\[
+ \sum_{s \in (0,t]} \Big( \Delta Z_s - Z_{s-}\, \Delta Y_s - e^{W(s-)}\, \Delta U_s
- \tfrac12 Z_{s-} (\Delta Y_s)^2 - e^{W(s-)}\, \Delta Y_s\, \Delta U_s \Big).
\]
Since
\[
W_t = Y_t - \tfrac12 \Big( [Y]_t - \sum_{s \in (0,t]} (\Delta Y_s)^2 \Big)
\]
where the part in parentheses is FV and continuous, linearity of covariation and Lemma A.10 imply $[W] = [Y]$ and $[W, U] = [Y, U] = \sum \Delta Y\, \Delta U$.
Now match up and cancel terms. First,
\[
\int_{(0,t]} e^{W(s-)}\, dU_s - \sum_{s \in (0,t]} e^{W(s-)}\, \Delta U_s = 0
\]
because $U$ is an FV process whose paths are step functions, so the integral reduces to a sum over jumps. Second,
\[
\int_{(0,t]} Z_{s-}\, dW_s + \tfrac12 \int_{(0,t]} Z_{s-}\, d[W]_s
- \tfrac12 \sum_{s \in (0,t]} Z_{s-} (\Delta Y_s)^2
= \int_{(0,t]} Z_{s-}\, dY_s
\]
by the definition of $W$ and the observation $[W] = [Y]$ from above. Then
\[
\int_{(0,t]} e^{W(s-)}\, d[W, U]_s - \sum_{s \in (0,t]} e^{W(s-)}\, \Delta Y_s\, \Delta U_s = 0
\]
by the observation $[W, U] = \sum \Delta Y\, \Delta U$ from above. Lastly, $\Delta Z_s - Z_{s-}\, \Delta Y_s = 0$ directly from the definition of $Z$.

After the cancelling we are left with
\[
Z_t = 1 + \int_{(0,t]} Z_{s-}\, dY_s,
\]
the desired equation. $\square$
The semimartingale $Z$ defined in the theorem is called the stochastic exponential of $Y$, and denoted by $\mathcal{E}(Y)$.

Example 7.6. Let $Y_t = \sigma B_t$ where $B$ is Brownian motion. The stochastic exponential $Z = \mathcal{E}(\sigma B)$, given by $Z_t = \exp\big(\sigma B_t - \tfrac12 \sigma^2 t\big)$, is another instance of geometric Brownian motion. The equation
\[
Z_t = 1 + \sigma \int_0^t Z_s\, dB_s
\]
and moment bounds show that geometric Brownian motion is a continuous $L^2$-martingale.
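At the other extreme from Example 7.6, for a pure-jump FV path formula (7.14) collapses: then $[Y]_t = \sum (\Delta Y_s)^2$, the exponential factors cancel jump by jump, and $Z_t = \prod_{s \le t} (1 + \Delta Y_s)$, while equation (7.15) becomes the recursion $Z_k = Z_{k-1}(1 + \Delta Y_k)$. A tiny consistency check with hypothetical jump sizes (an aside, not from the text):

```python
import numpy as np

rng = np.random.default_rng(6)
dY = rng.uniform(-0.4, 0.4, size=50)   # hypothetical jump sizes Delta Y_s

# (7.14) on this path: Y_t = sum of jumps, [Y]_t = sum of squared jumps
Y_T = dY.sum()
QV_T = (dY ** 2).sum()
Z_formula = np.exp(Y_T - 0.5 * QV_T) * np.prod((1 + dY) * np.exp(-dY + 0.5 * dY ** 2))

# (7.15) on this path: the integral is a sum over jumps, Z_k = Z_{k-1} + Z_{k-1} dY_k
Z = 1.0
for jump in dY:
    Z = Z + Z * jump
Z_recursion = Z

# both should equal the pure-jump stochastic exponential prod(1 + dY)
Z_product = np.prod(1.0 + dY)
```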
7.2. Existence and uniqueness for Itô equations
We present the existence and uniqueness results for SDEs in two stages. This section covers the standard strong existence and uniqueness result for Itô equations contained in every stochastic analysis book. Our presentation is modeled after [9]. In the subsequent two sections we undertake a much more technical existence and uniqueness proof for a cadlag semimartingale equation.

Let $B_t$ be a standard $\mathbf{R}^m$-valued Brownian motion with respect to the complete filtration $\{\mathcal{F}_t\}$ on the complete probability space $(\Omega, \mathcal{F}, P)$. The given data are coefficient functions $b : \mathbf{R}_+ \times \mathbf{R}^d \to \mathbf{R}^d$ and $\sigma : \mathbf{R}_+ \times \mathbf{R}^d \to \mathbf{R}^{d \times m}$ and an $\mathbf{R}^d$-valued $\mathcal{F}_0$-measurable random variable $\xi$ that gives the
initial position. On $\mathbf{R}^d$ we use the Euclidean norm $|x|$, and similarly for a matrix $A = (a_{i,j})$ the norm is $|A| = \big( \sum |a_{i,j}|^2 \big)^{1/2}$.

The goal is to prove that on $(\Omega, \mathcal{F}, P)$ there exists an $\mathbf{R}^d$-valued process $X$ that is adapted to $\{\mathcal{F}_t\}$ and satisfies the integral equation
\[
(7.18)\qquad X_t = \xi + \int_0^t b(s, X_s)\, ds + \int_0^t \sigma(s, X_s)\, dB_s
\]
in the sense that the processes on the right and left of the equality sign are indistinguishable. Part of the requirement is that the integrals are well-defined, for which we require that
\[
(7.19)\qquad P\Big\{ \forall\, T < \infty : \int_0^T |b(s, X_s)|\, ds + \int_0^T |\sigma(s, X_s)|^2\, ds < \infty \Big\} = 1.
\]
Such a process $X$ on $(\Omega, \mathcal{F}, P)$ is called a strong solution because it exists on the given probability space and is adapted to the given filtration. Strong uniqueness then means that if $Y$ is another process adapted to $\{\mathcal{F}_t\}$ that satisfies (7.18)--(7.19), then $X$ and $Y$ are indistinguishable.

Next we state the standard Lipschitz assumption on the coefficient functions and then the strong existence and uniqueness theorem.
Assumption 7.7. Assume the Borel functions $b : \mathbf{R}_+ \times \mathbf{R}^d \to \mathbf{R}^d$ and $\sigma : \mathbf{R}_+ \times \mathbf{R}^d \to \mathbf{R}^{d \times m}$ satisfy the Lipschitz condition
\[
(7.20)\qquad |b(t,x) - b(t,y)| + |\sigma(t,x) - \sigma(t,y)| \le L|x - y|
\]
and the bound
\[
(7.21)\qquad |b(t,x)| + |\sigma(t,x)| \le L(1 + |x|)
\]
for a constant $L < \infty$ and all $t \in \mathbf{R}_+$ and $x, y \in \mathbf{R}^d$.

Theorem 7.8. Let Assumption 7.7 be in force and let $\xi$ be an $\mathbf{R}^d$-valued $\mathcal{F}_0$-measurable random variable. Then there exists a continuous process $X$ on $(\Omega, \mathcal{F}, P)$ adapted to $\{\mathcal{F}_t\}$ that satisfies integral equation (7.18) and property (7.19). The process $X$ with these properties is unique up to indistinguishability.

Before turning to the proofs we present simple examples from ODE theory that illustrate the loss of existence and uniqueness when the Lipschitz assumption on the coefficient is weakened.
Example 7.9. (a) Consider the equation
\[
x(t) = \int_0^t 2\sqrt{x(s)}\, ds.
\]
The function $b(x) = \sqrt{x}$ is not Lipschitz on $[0, 1]$ because $b'(x)$ blows up at the origin. The equation has infinitely many solutions. Two of them are $x(t) = 0$ and $x(t) = t^2$.

(b) The equation
\[
x(t) = 1 + \int_0^t x^2(s)\, ds
\]
does not have a solution for all time. The unique solution starting at $t = 0$ is $x(t) = (1-t)^{-1}$, which exists only for $0 \le t < 1$. The function $b(x) = x^2$ is locally Lipschitz. This is enough for uniqueness, as determined in Exercise 7.3.
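Both phenomena are easy to see numerically (a sketch, not from the text): $x(t) = t^2$ satisfies the integral equation in (a) just as $x \equiv 0$ does, and forward Euler for $x' = x^2$, $x(0) = 1$ tracks $(1-t)^{-1}$, which at $t = 0.9$ already equals $10$ on its way to the blow-up at $t = 1$:

```python
import numpy as np

# (a): check that x(t) = t^2 satisfies x(t) = int_0^t 2 sqrt(x(s)) ds
t_end, n = 1.0, 100000
s = np.linspace(0.0, t_end, n + 1)
h = s[1] - s[0]
integrand = 2.0 * np.sqrt(s ** 2)        # 2 sqrt(x(s)) with x(s) = s^2
rhs = ((integrand[1:] + integrand[:-1]) / 2 * h).sum()   # trapezoid rule
lhs = t_end ** 2

# (b): forward Euler for x' = x^2, x(0) = 1, run up to t = 0.9
x, dt = 1.0, 1e-5
for _ in range(int(round(0.9 / dt))):
    x = x + dt * x * x
# exact solution at t = 0.9 is (1 - 0.9)^{-1} = 10
```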
We prove Theorem 7.8 in stages, and begin with existence under an $L^2$ assumption on the initial state.

Theorem 7.10. Let Assumption 7.7 be in force and the $\mathcal{F}_0$-measurable initial point $\xi$ satisfy $E(|\xi|^2) < \infty$. Then there exists a continuous process $X$ adapted to $\{\mathcal{F}_t\}$ that satisfies integral equation (7.18) and this moment bound: for each $T < \infty$ there exists a constant $C = C(T, L) < \infty$ such that
\[
(7.22)\qquad E(|X_t|^2) \le C\big( 1 + E(|\xi|^2) \big) \qquad \text{for } t \in [0, T].
\]
In particular, the integrand $\sigma(s, X_s)$ is a member of $\mathcal{L}_2(B)$ and
\[
E\int_0^T |b(s, X_s)|^2\, ds < \infty \qquad \text{for each } T < \infty.
\]
Proof of Theorem 7.10. The existence proof uses classic Picard iteration. We define a sequence of processes $X_n$ indexed by $n \in \mathbf{Z}_+$ by $X_0(t) = \xi$ and then for $n \ge 0$
\[
(7.23)\qquad X_{n+1}(t) = \xi + \int_0^t b(s, X_n(s))\, ds + \int_0^t \sigma(s, X_n(s))\, dB_s.
\]
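The scheme can be watched converging on a single discretized path (a sketch with arbitrary choices, not from the text): take the Ornstein--Uhlenbeck-type coefficients $b(t, x) = -x$ and $\sigma \equiv 1$, fix one Brownian path on a grid, and iterate (7.23) with left-endpoint Riemann sums. The sup-differences between successive iterates collapse factorially fast, mirroring the $C^n t^n / n!$ bounds derived below:

```python
import numpy as np

rng = np.random.default_rng(7)
T, n = 1.0, 2000
h = T / n
dB = np.sqrt(h) * rng.standard_normal(n)
B = np.concatenate([[0.0], np.cumsum(dB)])   # Brownian path at grid times
xi = 1.0

def picard_step(X):
    """One sweep of (7.23) for b(t, x) = -x and sigma = 1, with the
    time integral replaced by a left-endpoint Riemann sum on the grid."""
    drift = np.concatenate([[0.0], np.cumsum(-X[:-1] * h)])
    return xi + drift + B

X = np.full(n + 1, xi)        # X_0(t) = xi, the constant initial iterate
sup_diffs = []
for _ in range(12):
    X_new = picard_step(X)
    sup_diffs.append(np.abs(X_new - X).max())
    X = X_new
```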
Step 1. We show that for each $n$ the process $X_n$ is well-defined, continuous, and satisfies
\[
(7.24)\qquad \sup_{s \in [0,t]} E(|X_n(s)|^2) < \infty \qquad \text{for all } t \in \mathbf{R}_+.
\]
Bound (7.24) is clear in the case $n = 0$ by the definition $X_0(t) = \xi$ and the assumption $\xi \in L^2(P)$.

Assume (7.24) for $n$. To show that process $X_{n+1}$ is well-defined by the integrals in (7.23) we check that, for each $t \in \mathbf{R}_+$,
\[
(7.25)\qquad E\Big[ \int_0^t |b(s, X_n(s))|^2\, ds + \int_0^t |\sigma(s, X_n(s))|^2\, ds \Big] < \infty.
\]
Assumption (7.21) bounds the left side of (7.25) by
\[
4L^2 t \Big( 1 + \sup_{s \in [0,t]} E|X_n(s)|^2 \Big)
\]
which is finite by (7.24).

Property (7.25) implies that the first (Lebesgue) integral on the right of (7.23) is a continuous $L^2$ process and that the second (stochastic) integral is a continuous $L^2$ martingale because the integrand lies in $\mathcal{L}_2(B)$.

Now we know $X_{n+1}(t)$ is an $L^2$ random variable. We bound the second moment by first using $(a+b+c)^2 \le 9(a^2 + b^2 + c^2)$ and by then applying the Schwarz inequality to the first integral and the isometry of stochastic integration to the second:
\[
(7.26)\qquad
\begin{aligned}
E|X_{n+1}(t)|^2
&\le 9E(|\xi|^2) + 9E\Big| \int_0^t b(s, X_n(s))\, ds \Big|^2
+ 9E\Big| \int_0^t \sigma(s, X_n(s))\, dB_s \Big|^2 \\
&\le 9E(|\xi|^2) + 9t\, E\int_0^t |b(s, X_n(s))|^2\, ds
+ 9E\int_0^t |\sigma(s, X_n(s))|^2\, ds \\
&\le 9E(|\xi|^2) + 18L^2(1+t)t + 18L^2(1+t) \int_0^t E|X_n(s)|^2\, ds.
\end{aligned}
\]
In the last step we applied assumption (7.21). Now it is clear that (7.24) is passed from $n$ to $n+1$.
Step 2. For each $T < \infty$ there exists a constant $A = A(T, L) < \infty$ such that
\[
(7.27)\qquad E(|X_n(t)|^2) \le A\big( 1 + E(|\xi|^2) \big) \qquad \text{for all } n \text{ and } t \in [0, T].
\]
To prove Step 2, restrict $t$ to $[0, T]$ and introduce a constant $C = C(T, L)$ to rewrite the outcome of (7.26) as
\[
E|X_{n+1}(t)|^2 \le C\big( E(|\xi|^2) + 1 \big) + C \int_0^t E|X_n(s)|^2\, ds.
\]
With $y_n(t) = E|X_n(t)|^2$ we have this situation: there are constants $B, C$ such that
\[
(7.28)\qquad y_0(t) \le B \quad \text{and} \quad y_{n+1}(t) \le B + C \int_0^t y_n(s)\, ds.
\]
This assumption gives inductively
\[
(7.29)\qquad y_n(t) \le B \sum_{k=0}^n \frac{C^k t^k}{k!} \le B e^{Ct}.
\]
This translates into (7.27).
Step 3. There exists a continuous, adapted process $X$ such that $X_n \to X$ uniformly on compact time intervals, both almost surely and in this $L^2$ sense: for each $T < \infty$
\[
(7.30)\qquad \lim_{n \to \infty} E\Big[ \sup_{0 \le s \le T} |X(s) - X_n(s)|^2 \Big] = 0.
\]
Furthermore, $X$ satisfies moment bound (7.22).

For the process $X_{n+1} - X_n$ (7.23) gives the equation
\[
(7.31)\qquad
X_{n+1}(t) - X_n(t)
= \int_0^t \big( b(s, X_n(s)) - b(s, X_{n-1}(s)) \big)\, ds
+ \int_0^t \big( \sigma(s, X_n(s)) - \sigma(s, X_{n-1}(s)) \big)\, dB_s
\equiv G_n(t) + M_n(t)
\]
where $G_n$ is an FV process and $M_n$ an $L^2$ martingale. The Schwarz inequality and the Lipschitz assumption (7.20) give
\[
(7.32)\qquad
E\Big[ \sup_{0 \le s \le t} |G_n(s)|^2 \Big]
\le t \int_0^t E\big| b(s, X_n(s)) - b(s, X_{n-1}(s)) \big|^2\, ds
\le L^2 t \int_0^t E|X_n(s) - X_{n-1}(s)|^2\, ds.
\]
The isometry of stochastic integration and Doob's inequality (Theorem 3.11) give
\[
(7.33)\qquad
E\Big[ \sup_{0 \le s \le t} |M_n(s)|^2 \Big]
\le 4E\big[ |M_n(t)|^2 \big]
= 4 \int_0^t E\big| \sigma(s, X_n(s)) - \sigma(s, X_{n-1}(s)) \big|^2\, ds
\le 4L^2 \int_0^t E|X_n(s) - X_{n-1}(s)|^2\, ds.
\]
Restrict $t$ to $[0, T]$ and combine the above bounds into
\[
(7.34)\qquad
E\Big[ \sup_{0 \le s \le t} |X_{n+1}(s) - X_n(s)|^2 \Big]
\le C \int_0^t E\Big[ \sup_{0 \le u \le s} |X_n(u) - X_{n-1}(u)|^2 \Big]\, ds.
\]
For $y_n(t) = E\big[ \sup_{0 \le s \le t} |X_{n+1}(s) - X_n(s)|^2 \big]$ we have constants $B, C$ such that these inequalities hold for $t \in [0, T]$:
\[
(7.35)\qquad y_0(t) \le B \quad \text{and} \quad y_{n+1}(t) \le C \int_0^t y_n(s)\, ds.
\]
The inequality for $y_0(t)$ is checked separately:
\[
y_0(t) = E\Big[ \sup_{0 \le s \le t} |X_1(s) - X_0(s)|^2 \Big]
= E\Big[ \sup_{0 \le s \le t} \Big| \int_0^s b(u, \xi)\, du + \int_0^s \sigma(u, \xi)\, dB_u \Big|^2 \Big]
\]
and reason as above once again.
Bounds (7.35) develop inductively into
\[
(7.36)\qquad y_n(t) \le B\, \frac{C^n t^n}{n!}.
\]
An application of Chebyshev's inequality gives us a summable bound that we can feed into the Borel--Cantelli lemma:
\[
P\Big\{ \sup_{0 \le s \le t} |X_{n+1}(s) - X_n(s)| \ge 2^{-n} \Big\}
\le 4^n\, E\Big[ \sup_{0 \le s \le t} |X_{n+1}(s) - X_n(s)|^2 \Big]
\le B\, \frac{4^n C^n t^n}{n!}.
\]
Consider now $T \in \mathbf{N}$. From
\[
\sum_n P\Big\{ \sup_{0 \le s \le T} |X_{n+1}(s) - X_n(s)| \ge 2^{-n} \Big\} < \infty
\]
and the Borel--Cantelli lemma we conclude that for almost every $\omega$ and each $T \in \mathbf{N}$ there exists a finite $N_T(\omega) < \infty$ such that $n \ge N_T(\omega)$ implies
\[
\sup_{0 \le s \le T} |X_{n+1}(s) - X_n(s)| < 2^{-n}.
\]
Thus for almost every $\omega$ the paths $X_n$ form a Cauchy sequence in the space $C[0, T]$ of continuous functions with supremum distance, for each $T \in \mathbf{N}$. This space is complete (Lemma A.5). Consequently there exists a continuous path $\{X_s(\omega) : s \in \mathbf{R}_+\}$, defined for almost every $\omega$, such that
\[
\sup_{0 \le s \le T} |X(s) - X_n(s)| \to 0 \qquad \text{as } n \to \infty
\]
for all $T \in \mathbf{N}$ and almost every $\omega$. This defines $X$ and proves the almost sure convergence part of Step 3. Adaptedness of $X$ comes from $X_n(t) \to X(t)$ a.s. and the completeness of the $\sigma$-algebra $\mathcal{F}_t$.
From the uniform convergence we get
\[
\sup_{0 \le s \le T} |X(s) - X_n(s)| = \lim_{m \to \infty} \sup_{0 \le s \le T} |X_m(s) - X_n(s)|.
\]
Abbreviate $\Delta_n = \sup_{0 \le s \le T} |X_{n+1}(s) - X_n(s)|$ and let $\|\Delta_n\|_2 = E(\Delta_n^2)^{1/2}$ denote the $L^2$ norm. By Fatou's lemma and the triangle inequality for the $L^2$ norm,
\[
\Big\| \sup_{0 \le s \le T} |X(s) - X_n(s)| \Big\|_2
\le \liminf_{m \to \infty} \Big\| \sup_{0 \le s \le T} |X_m(s) - X_n(s)| \Big\|_2
\le \liminf_{m \to \infty} \sum_{k=n}^{m-1} \|\Delta_k\|_2
= \sum_{k=n}^{\infty} \|\Delta_k\|_2.
\]
By estimate (7.36) the last expression vanishes as $n \to \infty$. This gives (7.30). Finally, $L^2$ convergence (or Fatou's lemma) converts (7.27) into (7.22). This completes the proof of Step 3.
Step 4. Process $X$ satisfies (7.18).

The integrals in (7.18) are well-defined continuous processes because we have established the $L^2$ bound on $X$ that gives the $L^2$ bounds on the integrands as in the last step of (7.26). To prove Step 4 we show that the terms in (7.23) converge to those of (7.18) in $L^2$. We have derived the required estimates already. For example, by the Lipschitz assumption and (7.30), for every $T < \infty$,
\[
E\int_0^T \big| \sigma(s, X_n(s)) - \sigma(s, X(s)) \big|^2\, ds
\le L^2\, E\int_0^T |X_n(s) - X(s)|^2\, ds \to 0.
\]
This says that the integrand $\sigma(s, X_n(s))$ converges to $\sigma(s, X(s))$ in the space $\mathcal{L}_2(B)$. Consequently we have the $L^2$ convergence of stochastic integrals:
\[
\int_0^t \sigma(s, X_n(s))\, dB_s \to \int_0^t \sigma(s, X(s))\, dB_s
\]
which also holds uniformly on compact time intervals.

We leave the argument for the other integral as an exercise. The proof of Theorem 7.10 is complete. $\square$
At this point we prove strong uniqueness. We prove a slightly more general result which we can use also to extend the existence result beyond $L^2(P)$. Let $\zeta$ be another initial condition and $Y$ a process that satisfies
\[
(7.37)\qquad Y_t = \zeta + \int_0^t b(s, Y_s)\, ds + \int_0^t \sigma(s, Y_s)\, dB_s.
\]
Theorem 7.11. Let $\xi$ and $\zeta$ be $\mathbf{R}^d$-valued $\mathcal{F}_0$-measurable random variables without any integrability assumptions. Assume that the coefficients $b$ and $\sigma$ are Borel functions that satisfy Lipschitz condition (7.20). Let $X$ and $Y$ be two continuous processes on $(\Omega, \mathcal{F}, P)$ adapted to $\{\mathcal{F}_t\}$. Assume $X$ satisfies integral equation (7.18) and condition (7.19). Assume $Y$ satisfies integral equation (7.37) and condition (7.19) with $X$ replaced by $Y$. Then on the event $\{\xi = \zeta\}$ the processes $X$ and $Y$ are indistinguishable. That is, for almost every $\omega$ such that $\xi(\omega) = \zeta(\omega)$, $X_t(\omega) = Y_t(\omega)$ for all $t \in \mathbf{R}_+$.
234 7. Stochastic Dierential Equations
In the case $\xi = \eta$ we get the strong uniqueness. The uniqueness statement is actually true under a weaker local Lipschitz condition (Exercise 7.3).

Corollary 7.12. Let $\xi$ be an $\mathbb{R}^d$-valued $\mathcal{F}_0$-measurable random variable. Assume that $b$ and $\sigma$ are Borel functions that satisfy the Lipschitz condition (7.20). Then up to indistinguishability there is at most one continuous process $X$ on $(\Omega, \mathcal{F}, P)$ adapted to $\{\mathcal{F}_t\}$ that satisfies (7.18)--(7.19).
For the proof of Theorem 7.11 we introduce a basic tool from analysis.

Lemma 7.13 (Gronwall's inequality). Let $g$ be an integrable Borel function on $[a, b]$, $f$ a nondecreasing function on $[a, b]$, and assume that there exists a constant $B$ such that
\[
g(t) \le f(t) + B \int_a^t g(s)\,ds, \qquad a \le t \le b.
\]
Then
\[
g(t) \le f(t)\,e^{B(t-a)}, \qquad a \le t \le b.
\]
Proof. The integral $\int_a^t g(s)\,ds$ of an integrable function $g$ is an absolutely continuous (AC) function of the upper limit $t$. At Lebesgue-almost every $t$ it is differentiable and the derivative equals $g(t)$. Consequently the equation
\[
\frac{d}{dt}\Bigl( e^{-Bt} \int_a^t g(s)\,ds \Bigr) = -B e^{-Bt} \int_a^t g(s)\,ds + e^{-Bt} g(t)
\]
is valid for Lebesgue-almost every $t$. Hence by the hypothesis
\[
\frac{d}{dt}\Bigl( e^{-Bt} \int_a^t g(s)\,ds \Bigr) = e^{-Bt}\Bigl( g(t) - B \int_a^t g(s)\,ds \Bigr) \le f(t)\,e^{-Bt}
\quad \text{for almost every } t.
\]
An absolutely continuous function is the integral of its almost everywhere existing derivative. Integrating above and using the monotonicity of $f$ gives
\[
e^{-Bt} \int_a^t g(s)\,ds \le \int_a^t f(s)\,e^{-Bs}\,ds \le \frac{f(t)}{B}\bigl( e^{-Ba} - e^{-Bt} \bigr),
\]
from which
\[
\int_a^t g(s)\,ds \le \frac{f(t)}{B}\bigl( e^{B(t-a)} - 1 \bigr).
\]
Using the assumption once more,
\[
g(t) \le f(t) + B\,\frac{f(t)}{B}\bigl( e^{B(t-a)} - 1 \bigr) = f(t)\,e^{B(t-a)}.
\]
Proof of Theorem 7.11. In order to use $L^2$ bounds we stop the processes before they get too large. Fix $n \in \mathbb{N}$ for the time being and define
\[
\tau = \inf\{ t \ge 0 : |X_t| \ge n \text{ or } |Y_t| \ge n \}.
\]
This is a stopping time by Corollary 2.10. The processes $X^\tau$ and $Y^\tau$ are bounded.

The bounded $\mathcal{F}_0$-measurable random variable $\mathbf{1}\{\xi = \eta\}$ moves freely in and out of stochastic integrals. This can be seen for example from the constructions themselves. By subtracting the equations for $X_t$ and $Y_t$ and then stopping we get this equation:
\[
(7.38)\qquad
\begin{aligned}
(X^\tau_t - Y^\tau_t)\,\mathbf{1}\{\xi = \eta\}
&= \int_0^{t \wedge \tau} \bigl[ b(s, X_s) - b(s, Y_s) \bigr]\,\mathbf{1}\{\xi = \eta\}\,ds \\
&\qquad + \int_0^{t \wedge \tau} \bigl[ \sigma(s, X_s) - \sigma(s, Y_s) \bigr]\,\mathbf{1}\{\xi = \eta\}\,dB_s \\
&= \int_0^{t \wedge \tau} \bigl[ b(s, X^\tau_s) - b(s, Y^\tau_s) \bigr]\,\mathbf{1}\{\xi = \eta\}\,ds \\
&\qquad + \int_0^{t \wedge \tau} \bigl[ \sigma(s, X^\tau_s) - \sigma(s, Y^\tau_s) \bigr]\,\mathbf{1}\{\xi = \eta\}\,dB_s \\
&\equiv G(t) + M(t).
\end{aligned}
\]
The last equality defines the processes $G$ and $M$. The next-to-last equality comes from the fact that if integrands agree up to a stopping time, then so do the stochastic integrals (Proposition 4.10(c) for Brownian motion). We have the constant bound
\[
\bigl| \sigma(s, X^\tau_s) - \sigma(s, Y^\tau_s) \bigr|\,\mathbf{1}\{\xi = \eta\}
\le L\,|X^\tau_s - Y^\tau_s|\,\mathbf{1}\{\xi = \eta\}
\le L\bigl( |X^\tau_s| + |Y^\tau_s| \bigr)\,\mathbf{1}\{\xi = \eta\}
\le 2Ln,
\]
and similarly for the other integrand. Consequently $G$ is an $L^2$ FV process and $M$ an $L^2$ martingale.
The Schwarz inequality, bounding $t \wedge \tau$ above by $t$, and the Lipschitz assumption (7.20) give
\[
E\bigl[ |G(t)|^2 \bigr]
\le t \int_0^t E\Bigl[ \bigl| b(s, X^\tau_s) - b(s, Y^\tau_s) \bigr|^2\,\mathbf{1}\{\xi = \eta\} \Bigr]\,ds
\le L^2 t \int_0^t E\Bigl[ |X^\tau_s - Y^\tau_s|^2\,\mathbf{1}\{\xi = \eta\} \Bigr]\,ds.
\]
The isometry of stochastic integration gives
\[
\begin{aligned}
E\bigl[ |M(t)|^2 \bigr]
&= E\Bigl[ \Bigl( \int_0^{t \wedge \tau} \bigl[ \sigma(s, X^\tau_s) - \sigma(s, Y^\tau_s) \bigr]\,\mathbf{1}\{\xi = \eta\}\,dB_s \Bigr)^2 \Bigr] \\
&= E\Bigl[ \Bigl( \int_0^{t} \mathbf{1}_{[0,\tau]}(s)\,\bigl[ \sigma(s, X^\tau_s) - \sigma(s, Y^\tau_s) \bigr]\,\mathbf{1}\{\xi = \eta\}\,dB_s \Bigr)^2 \Bigr] \\
&= \int_0^t E\Bigl[ \mathbf{1}_{[0,\tau]}(s)\,\bigl| \sigma(s, X^\tau_s) - \sigma(s, Y^\tau_s) \bigr|^2\,\mathbf{1}\{\xi = \eta\} \Bigr]\,ds \\
&\le L^2 \int_0^t E\Bigl[ |X^\tau_s - Y^\tau_s|^2\,\mathbf{1}\{\xi = \eta\} \Bigr]\,ds.
\end{aligned}
\]
In the second equality above we moved the cut-off at $\tau$ from the upper limit into the integrand, in accordance with (5.23) from Proposition 5.16. (Strictly speaking we have proved this only for predictable integrands, but naturally the same property can be proved for the more general Brownian motion integral.)
Combine the above bounds into
\[
E\Bigl[ |X^\tau_t - Y^\tau_t|^2\,\mathbf{1}\{\xi = \eta\} \Bigr]
\le 2L^2 (t + 1) \int_0^t E\Bigl[ |X^\tau_s - Y^\tau_s|^2\,\mathbf{1}\{\xi = \eta\} \Bigr]\,ds.
\]
If we restrict $t$ to $[0, T]$ we can replace $2L^2(t+1)$ with the constant $B = 2L^2(T+1)$ above. Then Gronwall's inequality implies
\[
E\Bigl[ |X^\tau_t - Y^\tau_t|^2\,\mathbf{1}\{\xi = \eta\} \Bigr] = 0 \quad \text{for } t \in [0, T].
\]
We can repeat this argument for each $T \in \mathbb{N}$ and so extend the conclusion to all $t \in \mathbb{R}_+$. Thus on the event $\{\xi = \eta\}$ the continuous processes $X^\tau$ and $Y^\tau$ agree almost surely at each fixed time. It follows that they are indistinguishable. Now we have this statement for each $n$ that appears in the definition of $\tau$. With continuous paths we have $\tau \nearrow \infty$ as we take $n$ to infinity. In the end we can conclude that $X$ and $Y$ are indistinguishable on the event $\{\xi = \eta\}$.
Completion of the proof of Theorem 7.8. It remains to remove the assumption $\xi \in L^2(P)$ from the existence proof. Let $X_m(t)$ be the solution given by Theorem 7.10 for the initial point $\xi\,\mathbf{1}\{|\xi| \le m\}$. By Theorem 7.11, for $m < n$ the processes $X_m$ and $X_n$ are indistinguishable on the event $\{|\xi| \le m\}$. Thus we can consistently define
\[
X(t) = X_m(t) \quad \text{on the event } \{|\xi| \le m\}.
\]
$X$ has continuous paths because each $X_m$ does, and adaptedness of $X$ can be checked because $\{|\xi| \le m\} \in \mathcal{F}_0$. $X$ satisfies (7.19) because each $X_m$ does. Thus the integrals in (7.18) are well-defined for $X$.

To verify equation (7.18) we can again pass the $\mathcal{F}_0$-measurable indicator $\mathbf{1}\{|\xi| \le m\}$ in and out of stochastic integrals, and then use the equality
of processes $b(s, X(s))\,\mathbf{1}\{|\xi| \le m\} = b(s, X_m(s))\,\mathbf{1}\{|\xi| \le m\}$, together with the same property for $\sigma$:
\[
\begin{aligned}
X(t)\,\mathbf{1}\{|\xi| \le m\}
&= X_m(t)\,\mathbf{1}\{|\xi| \le m\} \\
&= \xi\,\mathbf{1}\{|\xi| \le m\} + \int_0^t b(s, X_m(s))\,\mathbf{1}\{|\xi| \le m\}\,ds
  + \int_0^t \sigma(s, X_m(s))\,\mathbf{1}\{|\xi| \le m\}\,dB_s \\
&= \xi\,\mathbf{1}\{|\xi| \le m\} + \int_0^t b(s, X(s))\,\mathbf{1}\{|\xi| \le m\}\,ds
  + \int_0^t \sigma(s, X(s))\,\mathbf{1}\{|\xi| \le m\}\,dB_s \\
&= \mathbf{1}\{|\xi| \le m\}\Bigl( \xi + \int_0^t b(s, X(s))\,ds + \int_0^t \sigma(s, X(s))\,dB_s \Bigr).
\end{aligned}
\]
Since the union of the events $\{|\xi| \le m\}$ is (almost surely) the entire space, we have the equation for $X$.
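The patching step can be mimicked numerically: with the driving noise fixed, a deterministic discretization started from the truncated initial point $\xi\,\mathbf{1}\{|\xi| \le m\}$ produces exactly the same path as the one started from $\xi$ whenever $|\xi| \le m$. A minimal Euler--Maruyama sketch (hypothetical coefficients and values, not from the text):

```python
import numpy as np

def euler_maruyama(xi, b, sigma, dB, dt):
    """One Euler-Maruyama path for dX = b(t,X) dt + sigma(t,X) dB, X_0 = xi."""
    X = np.empty(len(dB) + 1)
    X[0] = xi
    t = 0.0
    for i, dw in enumerate(dB):
        X[i + 1] = X[i] + b(t, X[i]) * dt + sigma(t, X[i]) * dw
        t += dt
    return X

rng = np.random.default_rng(1)
N, T = 1000, 1.0
dt = T / N
dB = rng.normal(0.0, np.sqrt(dt), N)      # one fixed Brownian discretization

b = lambda t, x: -x                       # hypothetical Lipschitz coefficients
sigma = lambda t, x: 0.5

xi = 3.7                                  # one sample of the initial variable
m = 5.0                                   # truncation level; here |xi| <= m
X_full = euler_maruyama(xi, b, sigma, dB, dt)
X_trunc = euler_maruyama(xi if abs(xi) <= m else 0.0, b, sigma, dB, dt)
print(np.max(np.abs(X_full - X_trunc)))   # the two paths coincide on {|xi| <= m}
```

This is the discrete analogue of defining $X = X_m$ on $\{|\xi| \le m\}$: the paths are literally identical there.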
7.3. A semimartingale equation

Fix a probability space $(\Omega, \mathcal{F}, P)$ with a filtration $\{\mathcal{F}_t\}$. We consider the equation
\[
(7.39)\qquad X(t, \omega) = H(t, \omega) + \int_{(0,t]} F(s, \omega, X(\omega))\,dY(s, \omega),
\]
where $Y$ is a given $\mathbb{R}^m$-valued cadlag semimartingale, $H$ is a given $\mathbb{R}^d$-valued adapted cadlag process, and $X$ is the unknown $\mathbb{R}^d$-valued process. The coefficient $F$ is a $d \times m$-matrix valued function of its arguments. (The componentwise interpretation of the stochastic integral $\int F\,dY$ appeared in Remark 5.38.) For the coefficient $F$ we make these assumptions.
Assumption 7.14. The coefficient function $F(s, \omega, \eta)$ in equation (7.39) is a measurable function from the space $\mathbb{R}_+ \times \Omega \times D_{\mathbb{R}^d}[0, \infty)$ into the space $\mathbb{R}^{d \times m}$ of $d \times m$ matrices, and satisfies the following requirements.

(i) $F$ satisfies a spatial Lipschitz condition uniformly in the other variables: there exists a finite constant $L$ such that this holds for all $(t, \omega) \in \mathbb{R}_+ \times \Omega$ and all $\eta, \zeta \in D_{\mathbb{R}^d}[0, \infty)$:
\[
(7.40)\qquad |F(t, \omega, \eta) - F(t, \omega, \zeta)| \le L \sup_{s \in [0,t)} |\eta(s) - \zeta(s)|.
\]

(ii) Given any adapted $\mathbb{R}^d$-valued cadlag process $X$ on $\Omega$, the function $(t, \omega) \mapsto F(t, \omega, X(\omega))$ is a predictable process.

(iii) Given any adapted $\mathbb{R}^d$-valued cadlag process $X$ on $\Omega$, there exist stopping times $\nu_k \nearrow \infty$ such that $\mathbf{1}_{(0, \nu_k]}(t)\,F(t, X)$ is bounded for each $k$.
These conditions on $F$ are rather technical. They are written to help prove the next theorem. The proof is lengthy and postponed to the next section. There are technical steps where we need right-continuity of the filtration, hence we include this requirement in the hypotheses.

Theorem 7.15. Assume the filtration $\{\mathcal{F}_t\}$ is complete and right-continuous. Let $H$ be an adapted $\mathbb{R}^d$-valued cadlag process and $Y$ an $\mathbb{R}^m$-valued cadlag semimartingale. Assume $F$ satisfies Assumption 7.14. Then there exists a cadlag process $\{X(t) : 0 \le t < \infty\}$ adapted to $\{\mathcal{F}_t\}$ that satisfies equation (7.39), and $X$ is unique up to indistinguishability.

In the remainder of this section we discuss the assumptions on $F$ and state some consequences of the existence and uniqueness theorem.

Notice the exclusion of the endpoint $t$ on the right-hand side of the Lipschitz condition (7.40). This implies that $F(t, \omega, \eta)$ is a function of the
stopped path
\[
\eta^{t-}(s) =
\begin{cases}
\eta(0), & t = 0,\ 0 \le s < \infty, \\
\eta(s), & 0 \le s < t, \\
\eta(t-), & s \ge t > 0.
\end{cases}
\]
In other words, the function $F(t, \omega, \eta)$ only depends on the path $\eta$ on the time interval $[0, t)$.
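For a cadlag step path stored as jump times and values, the stopped path $\eta^{t-}$ can be computed directly. The representation and names below are illustrative, not from the text.

```python
def stopped_path(jump_times, values, t):
    """Return (jump_times, values) of the stopped path eta^{t-}:
    eta^{t-}(s) = eta(s) for s < t and eta(t-) for s >= t (eta(0) if t == 0).
    The cadlag step path is eta(s) = values[k] on [jump_times[k], jump_times[k+1]),
    with jump_times[0] == 0."""
    if t == 0.0:
        return [0.0], [values[0]]
    # Keep exactly the jumps before time t; the last kept value is eta(t-),
    # and it persists for all s >= t.
    kept = [i for i, u in enumerate(jump_times) if u < t]
    return [jump_times[i] for i in kept], [values[i] for i in kept]

# A path jumping at times 0, 1, 2 to values 5, 7, 9, stopped at t = 2:
times, vals = stopped_path([0.0, 1.0, 2.0], [5, 7, 9], 2.0)
print(times, vals)   # the jump at time 2 is cut off, since eta(2-) = 7
```

Note how the jump at time $t$ itself is discarded: this is exactly the "exclusion of the endpoint $t$" built into (7.40).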
Parts (ii)--(iii) guarantee that the stochastic integral $\int F(s, X)\,dY(s)$ exists for an arbitrary adapted cadlag process $X$ and semimartingale $Y$. The existence of the stopping times $\nu_k$ in part (iii) can be verified via this local boundedness condition.
Lemma 7.16. Assume $F$ satisfies parts (i) and (ii) of Assumption 7.14. Suppose there exists a path $\tilde{\eta} \in D_{\mathbb{R}^d}[0, \infty)$ such that for all $T < \infty$,
\[
(7.41)\qquad c(T) = \sup_{t \in [0,T],\,\omega \in \Omega} |F(t, \omega, \tilde{\eta})| < \infty.
\]
Then for any adapted $\mathbb{R}^d$-valued cadlag process $X$ there exist stopping times $\nu_k \nearrow \infty$ such that $\mathbf{1}_{(0, \nu_k]}(t)\,F(t, X)$ is bounded for each $k$.
Proof. Define
\[
\nu_k = \inf\{ t \ge 0 : |X(t)| \ge k \} \wedge \inf\{ t > 0 : |X(t-)| \ge k \} \wedge k.
\]
These are bounded stopping times by Lemma 2.9. $|X(s)| \le k$ for $0 \le s < \nu_k$, but if $\nu_k = 0$ we cannot claim that $|X(0)| \le k$. The stopped process
\[
X^{\nu_k-}(t) =
\begin{cases}
X(0), & \nu_k = 0, \\
X(t), & 0 \le t < \nu_k, \\
X(\nu_k-), & t \ge \nu_k > 0
\end{cases}
\]
is cadlag and adapted.

We show that $\mathbf{1}_{(0, \nu_k]}(s)\,F(s, X)$ is bounded. If $s = 0$ or $\nu_k = 0$ then this random variable vanishes. Suppose $0 < s \le \nu_k$. Since $[0, s) \subseteq [0, \nu_k)$, $X$ agrees with $X^{\nu_k-}$ on $[0, s)$, and then $F(s, X) = F(s, X^{\nu_k-})$ by (7.40). Thus
\[
\begin{aligned}
|F(s, X)| = |F(s, X^{\nu_k-})|
&\le |F(s, \tilde{\eta})| + |F(s, X^{\nu_k-}) - F(s, \tilde{\eta})| \\
&\le c(k) + L \sup_{0 \le t < \nu_k} |X(t) - \tilde{\eta}(t)| \\
&\le c(k) + L\Bigl( k + \sup_{0 \le t \le k} |\tilde{\eta}(t)| \Bigr).
\end{aligned}
\]
The last line above is a finite quantity because $\tilde{\eta}$ is locally bounded, being a cadlag path.
Here is how to apply the theorem to an equation that is not defined for all time.
Corollary 7.17. Let $0 < T < \infty$. Assume $\{\mathcal{F}_t\}$ is right-continuous, $Y$ is a cadlag semimartingale and $H$ is an adapted cadlag process, all defined for $0 \le t \le T$. Let $F$ satisfy Assumption 7.14 for $(t, \omega) \in [0, T] \times \Omega$. In particular, part (ii) takes this form: if $X$ is a predictable process defined on $[0, T] \times \Omega$, then so is $F(t, X)$, and there is a nondecreasing sequence of stopping times $\nu_k$ such that $\mathbf{1}_{(0, \nu_k]}(t)\,F(t, X)$ is bounded for each $k$, and for almost every $\omega$, $\nu_k = T$ for all large enough $k$.

Then there exists a unique solution $X$ to equation (7.39) on $[0, T]$.
Proof. Extend the filtration, $H$, $Y$ and $F$ to all time in this manner: for $t \in (T, \infty)$ define $\mathcal{F}_t = \mathcal{F}_T$, $H_t(\omega) = H_T(\omega)$, $Y_t(\omega) = Y_T(\omega)$, and $F(t, \omega, \eta) = 0$. Then the extended processes $H$ and $Y$ and the coefficient $F$ satisfy all the original assumptions on all of $[0, \infty)$. Note in particular that if $G(t) = F(t, X)$ is a predictable process for $0 \le t \le T$, then extending it as a constant to $(T, \infty)$ produces a predictable process for $0 \le t < \infty$. And given a predictable process $X$ on $[0, \infty)$, let $\nu_k$ be the stopping times given by the assumption, and then define
\[
\tilde{\nu}_k =
\begin{cases}
\nu_k, & \nu_k < T, \\
\infty, & \nu_k = T.
\end{cases}
\]
These stopping times satisfy part (iii) of Assumption 7.14 for $[0, \infty)$. Now Theorem 7.15 gives a solution $X$ for all time $0 \le t < \infty$ for the extended equation, and on $[0, T]$ $X$ solves the equation with the original $H$, $Y$ and $F$.

For the uniqueness part, given a solution $X$ of the equation on $[0, T]$, extend it to all time by defining $X_t = X_T$ for $t \in (T, \infty)$. Then we have a solution of the extended equation on $[0, \infty)$, and the uniqueness theorem applies to that.
Another easy generalization is to the situation where the Lipschitz constant is an unbounded function of time.

Corollary 7.18. Let the assumptions be as in Theorem 7.15, except that the Lipschitz assumption is weakened to this: for each $0 < T < \infty$ there exists a finite constant $L(T)$ such that this holds for all $(t, \omega) \in [0, T] \times \Omega$ and all $\eta, \zeta \in D_{\mathbb{R}^d}[0, \infty)$:
\[
(7.42)\qquad |F(t, \omega, \eta) - F(t, \omega, \zeta)| \le L(T) \sup_{s \in [0,t)} |\eta(s) - \zeta(s)|.
\]
Then equation (7.39) has a unique solution $X$ adapted to $\{\mathcal{F}_t\}$.
Proof. For $k \in \mathbb{N}$, the function $\mathbf{1}\{0 \le t \le k\}\,F(t, \omega, \eta)$ satisfies the original hypotheses. By Theorem 7.15 there exists a process $X^k$ that satisfies the
equation
\[
(7.43)\qquad X^k(t) = H^k(t) + \int_{(0,t]} \mathbf{1}_{[0,k]}(s)\,F(s, X^k)\,dY^k(s).
\]
The notation above is $H^k(t) = H(k \wedge t)$ for a stopped process, as before. Let $k < m$. Stopping the equation
\[
X^m(t) = H^m(t) + \int_{(0,t]} \mathbf{1}_{[0,m]}(s)\,F(s, X^m)\,dY^m(s)
\]
at time $k$ gives the equation
\[
X^{m,k}(t) = H^k(t) + \int_{(0, t \wedge k]} \mathbf{1}_{[0,m]}(s)\,F(s, X^m)\,dY^m(s),
\]
valid for all $t$, where $X^{m,k}$ denotes $X^m$ stopped at $k$. By Proposition 5.39, stopping a stochastic integral can be achieved by stopping the integrator or by cutting off the integrand with an indicator function. If we do both, we get the equation
\[
X^{m,k}(t) = H^k(t) + \int_{(0,t]} \mathbf{1}_{[0,k]}(s)\,F(s, X^{m,k})\,dY^k(s).
\]
Thus $X^k$ and $X^{m,k}$ satisfy the same equation, so by the uniqueness theorem, $X^k = X^{m,k}$ for $k < m$. Thus we can unambiguously define a process $X$ by setting $X = X^k$ on $[0, k]$. Then for $0 \le t \le k$ we can substitute $X$ for $X^k$ in equation (7.43) and get the equation
\[
X(t) = H^k(t) + \int_{(0, k \wedge t]} F(s, X)\,dY(s), \qquad 0 \le t \le k.
\]
Since this holds for all $k$, $X$ is a solution to the original SDE (7.39).

Uniqueness works similarly. If $X$ and $\tilde{X}$ solve (7.39), then $X(k \wedge t)$ and $\tilde{X}(k \wedge t)$ solve (7.43). By the uniqueness theorem $X(k \wedge t) = \tilde{X}(k \wedge t)$ for all $t$, and since $k$ can be taken arbitrary, $X = \tilde{X}$.
Example 7.19. Here are ways by which a coefficient $F$ satisfying the assumptions of Corollary 7.18 can arise.

(i) Let $f(t, \omega, x)$ be a $\mathcal{P} \otimes \mathcal{B}_{\mathbb{R}^d}$-measurable function from $(\mathbb{R}_+ \times \Omega) \times \mathbb{R}^d$ into $d \times m$-matrices, where $\mathcal{P}$ is the predictable $\sigma$-field. Assume $f$ satisfies the Lipschitz condition
\[
|f(t, \omega, x) - f(t, \omega, y)| \le L(T)\,|x - y|
\]
for $(t, \omega) \in [0, T] \times \Omega$ and $x, y \in \mathbb{R}^d$, and the local boundedness condition
\[
\sup\bigl\{ |f(t, \omega, 0)| : 0 \le t \le T,\ \omega \in \Omega \bigr\} < \infty
\]
for all $0 < T < \infty$. Then put $F(t, \omega, \eta) = f(t, \omega, \eta(t-))$.

Satisfaction of the conditions of Assumption 7.14 is straightforward, except perhaps the predictability. Fix a cadlag process $X$ and let $\mathcal{U}$ be the space of $\mathcal{P} \otimes \mathcal{B}_{\mathbb{R}^d}$-measurable $f$ such that $(t, \omega) \mapsto f(t, \omega, X_{t-}(\omega))$ is
$\mathcal{P}$-measurable. This space is linear and closed under pointwise convergence. By Theorem B.2 it remains to check that $\mathcal{U}$ contains the indicator functions $f = \mathbf{1}_{\Gamma \times B}$ of products of $\Gamma \in \mathcal{P}$ and $B \in \mathcal{B}_{\mathbb{R}^d}$. For such $f$, $f(t, \omega, X_{t-}(\omega)) = \mathbf{1}_{\Gamma}(t, \omega)\,\mathbf{1}_B(X_{t-}(\omega))$. The first factor $\mathbf{1}_{\Gamma}(t, \omega)$ is predictable by construction. The second factor $\mathbf{1}_B(X_{t-}(\omega))$ is predictable because $X_{t-}$ is a predictable process, and because of the general fact that $g(Z_t)$ is predictable for any predictable process $Z$ and any measurable function $g$ on the state space of $Z$.
(ii) A special case of the preceding is a Borel measurable function $f(t, x)$ on $[0, \infty) \times \mathbb{R}^d$ with the required Lipschitz and boundedness conditions. In other words, no dependence on $\omega$.

(iii) Continuing with non-random functions, suppose a function $G : D_{\mathbb{R}^d}[0, \infty) \to D_{\mathbb{R}^d}[0, \infty)$ satisfies a Lipschitz condition in this form:
\[
(7.44)\qquad |G(\eta)_t - G(\zeta)_t| \le L \sup_{s \in [0,t)} |\eta(s) - \zeta(s)|.
\]
Then $F(t, \omega, \eta) = G(\eta)_t$ satisfies Assumption 7.14.
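Case (i) can be sketched in code for a non-random $f$: the wrapper evaluates $f$ at the left limit $\eta(t-)$, so $F(t, \eta)$ depends on the path only on $[0, t)$, as condition (7.40) requires. The step-path representation and all names below are hypothetical, not from the text.

```python
def left_limit(jump_times, values, t):
    """eta(t-) for a cadlag step path with eta(s) = values[k] on
    [jump_times[k], jump_times[k+1]); returns eta(0) when t == 0."""
    if t == 0.0:
        return values[0]
    k = max(i for i, u in enumerate(jump_times) if u < t)
    return values[k]

def make_F(f):
    """Build F(t, eta) = f(t, eta(t-)) as in Example 7.19(i), with the
    omega-dependence suppressed. The left limit makes F(t, .) depend on
    the path only on [0, t)."""
    return lambda t, jump_times, values: f(t, left_limit(jump_times, values, t))

f = lambda t, x: 2.0 * x + t            # a hypothetical Lipschitz f
F = make_F(f)
print(F(1.0, [0.0, 1.0], [3.0, 8.0]))   # uses eta(1-) = 3.0, not eta(1) = 8.0
```

Changing the path at time $t$ itself (the value 8.0 above) does not change $F(t, \eta)$, which is the point of the left limit.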
Next, let us specialize to Itô equations to recover Theorem 7.8 proved in Section 7.2.

Corollary 7.20. Let $B_t$ be a standard Brownian motion in $\mathbb{R}^m$ with respect to a right-continuous filtration $\{\mathcal{F}_t\}$, and $\xi$ an $\mathbb{R}^d$-valued $\mathcal{F}_0$-measurable random variable. Fix $0 < T < \infty$. Assume the functions $b : [0, T] \times \mathbb{R}^d \to \mathbb{R}^d$ and $\sigma : [0, T] \times \mathbb{R}^d \to \mathbb{R}^{d \times m}$ satisfy the Lipschitz condition
\[
|b(t, x) - b(t, y)| + |\sigma(t, x) - \sigma(t, y)| \le L\,|x - y|
\]
and the bound
\[
|b(t, x)| + |\sigma(t, x)| \le L(1 + |x|)
\]
for a constant $L$ and all $0 \le t \le T$, $x, y \in \mathbb{R}^d$.

Then there exists a unique continuous process $X$ on $[0, T]$ that is adapted to $\{\mathcal{F}_t\}$ and satisfies
\[
(7.45)\qquad X_t = \xi + \int_0^t b(s, X_s)\,ds + \int_0^t \sigma(s, X_s)\,dB_s \quad \text{for } 0 \le t \le T.
\]
Proof. To fit this into Theorem 7.15, let $Y(t) = [t, B_t]^T$, $H(t) = \xi$, and
\[
F(t, \omega, \eta) = \bigl[ b(t, \eta(t-)),\ \sigma(t, \eta(t-)) \bigr].
\]
We write $b(t, \eta(t-))$ to get a predictable coefficient. For a continuous path $\eta$ this left limit is immaterial, as $\eta(t-) = \eta(t)$. The hypotheses on $b$ and $\sigma$ establish the conditions required by the existence and uniqueness theorem. Once there is a cadlag solution $X$ from the theorem, continuity of $X$ follows by observing that the right-hand side of (7.45) is a continuous process.
7.4. Existence and uniqueness for a semimartingale equation

This section proves Theorem 7.15. The key part of both the existence and uniqueness proofs is a Gronwall-type estimate, which we derive after some technical preliminaries. Note that we can assume the Lipschitz constant $L > 0$: if $L = 0$ then $F(s, \omega, X)$ does not depend on $X$, equation (7.39) directly defines the process $X$, and there is nothing to prove.
7.4.1. Technical preliminaries.

Lemma 7.21. Let $X$ be a cadlag process and $Y$ a cadlag semimartingale. Fix $t > 0$. Let $\tau$ be a nondecreasing continuous process on $[0, t]$ such that $\tau(0) = 0$ and $\tau(u)$ is a stopping time for each $u$. Then
\[
(7.46)\qquad \int_{(0, \tau(t)]} X(s-)\,dY(s) = \int_{(0, t]} (X \circ \tau)(s-)\,d(Y \circ \tau)(s).
\]

Remark 7.22. On the right the integrand is
\[
(X \circ \tau)(s-) = \lim_{u \nearrow s,\ u < s} X(\tau(u)),
\]
which is not the same as $X(\tau(s))$ if the latter is interpreted as $X$ evaluated at $\tau(s)$. To see this just think of $\tau(u) = u$ and imagine $X$ has a jump at $s$. Note that for the stochastic integral on the right in (7.46) the filtration is changed to $\{\mathcal{F}_{\tau(u)}\}$.
Proof. As $\{t^n_i\}$ goes through partitions of $[0, t]$ with mesh tending to zero, $\{\tau(t^n_i)\}$ goes through partitions of $[0, \tau(t)]$ with mesh tending to zero. By Proposition 5.37, both sides of (7.46) equal the limit of the sums
\[
\sum_i X(\tau(t^n_i)) \bigl[ Y(\tau(t^n_{i+1})) - Y(\tau(t^n_i)) \bigr].
\]
Lemma 7.23. Suppose $A$ is a nondecreasing cadlag function such that $A(0) = 0$, and $Z$ is a nondecreasing real-valued cadlag function. Then
\[
\tau(u) = \inf\{ t \ge 0 : A(t) > u \}
\]
defines a nondecreasing cadlag function with $\tau(0) = 0$, and
\[
(7.47)\qquad
\int_{(0,u]} (Z \circ \tau)(s-)\,d(A \circ \tau)(s)
\le \bigl( A(\tau(u)) - u \bigr)\,(Z \circ \tau)(u-) + \int_{(0,u]} (Z \circ \tau)(s-)\,ds.
\]
Proof. That $\tau$ is nondecreasing is immediate. To see right-continuity, let $s > \tau(u)$. Then $A(s) > u$. Pick $\varepsilon > 0$ so that $A(s) > u + \varepsilon$. Then for $v \in [u, u + \varepsilon]$, $\tau(v) \le s$.

Also, since $A(t) > u$ for $t > \tau(u)$, the cadlag property of $A$ gives
\[
(7.48)\qquad A(\tau(u)) \ge u.
\]

Since $Z$ is cadlag, the integrals in (7.47) are limits of Riemann sums with the integrand evaluated at the left endpoints of the partition intervals (Lemma 1.12). Let $0 = s_0 < s_1 < \cdots < s_m = u$ be a partition of $[0, u]$. Next some algebraic manipulation:
\[
\begin{aligned}
\sum_{i=0}^{m-1} Z(\tau(s_i)) \bigl[ A(\tau(s_{i+1})) - A(\tau(s_i)) \bigr]
&= \sum_{i=0}^{m-1} Z(\tau(s_i))\,(s_{i+1} - s_i) \\
&\quad + \sum_{i=0}^{m-1} Z(\tau(s_i)) \bigl[ A(\tau(s_{i+1})) - s_{i+1} - A(\tau(s_i)) + s_i \bigr] \\
&= \sum_{i=0}^{m-1} Z(\tau(s_i))\,(s_{i+1} - s_i) + Z(\tau(s_{m-1})) \bigl[ A(\tau(u)) - u \bigr] \\
&\quad - \sum_{i=1}^{m-1} \bigl[ Z(\tau(s_i)) - Z(\tau(s_{i-1})) \bigr] \bigl[ A(\tau(s_i)) - s_i \bigr].
\end{aligned}
\]
The sum on the last line above is nonnegative by the nondecreasing monotonicity of $Z$ and by (7.48). Thus we have
\[
\sum_{i=0}^{m-1} Z(\tau(s_i)) \bigl[ A(\tau(s_{i+1})) - A(\tau(s_i)) \bigr]
\le \sum_{i=0}^{m-1} Z(\tau(s_i))\,(s_{i+1} - s_i) + Z(\tau(s_{m-1})) \bigl[ A(\tau(u)) - u \bigr].
\]
Letting the mesh of the partition tend to zero turns the inequality above into (7.47).
Let $F$ be a cadlag BV function on $[0, T]$ and $\Lambda_F$ its Lebesgue--Stieltjes measure. The total variation function of $F$ was denoted by $V_F(t)$. The total variation measure $|\Lambda_F|$ satisfies $|\Lambda_F| = \Lambda_{V_F}$. For Lebesgue--Stieltjes integrals the general inequality (1.11) for signed measures can be written in this form:
\[
(7.49)\qquad \Bigl| \int_{(0,T]} g(s)\,dF(s) \Bigr| \le \int_{(0,T]} |g(s)|\,dV_F(s).
\]
Lemma 7.24. Let $X$ be an adapted cadlag process and $\varepsilon > 0$. Let $\sigma_1 < \sigma_2 < \sigma_3 < \cdots$ be the times of successive jumps in $X$ of magnitude above $\varepsilon$: with $\sigma_0 = 0$,
\[
\sigma_k = \inf\{ t > \sigma_{k-1} : |X(t) - X(t-)| > \varepsilon \}.
\]
Then the $\sigma_k$ are stopping times.
Proof. Fix $k$. We show $\{\sigma_k \le t\} \in \mathcal{F}_t$. Consider the event
\[
A = \bigcup_{\ell \ge 1} \bigcup_{m \ge 1} \bigcup_{N \ge 1} \bigcap_{n \ge N}
\Bigl\{ \text{there exist integers } 0 < u_1 < u_2 < \cdots < u_k \le n \text{ such that } u_i - u_{i-1} \ge n/\ell
\text{ and } \Bigl| X\Bigl( \frac{u_i t}{n} \Bigr) - X\Bigl( \frac{u_i t}{n} - \frac{t}{n} \Bigr) \Bigr| > \varepsilon + \frac{1}{m} \Bigr\}.
\]
We claim that $A = \{\sigma_k \le t\}$.
Suppose $\omega \in A$. Let $s_{n,i} = u_i t/n$, $1 \le i \le k$, be the points whose existence follows for all $n \ge N$. Pass to a subsequence $n_j$ such that for each $1 \le i \le k$ we have convergence $s_{n_j, i} \to t_i \in [0, t]$ as $j \to \infty$. From the description of $A$ there is an $\ell < \infty$ such that $t_i - t_{i-1} \ge t/\ell$. The convergence also forces $s_{n_j, i} - t/n_j \to t_i$. Since the increment in the path $X$ cannot shrink below $\varepsilon + 1/m$ across the two points $s_{n_j, i} - t/n_j$ and $s_{n_j, i}$, it must be the case that $|X(t_i) - X(t_i-)| \ge \varepsilon + 1/m$.

Consequently the points $t_1, t_2, \ldots, t_k$ are times of jumps of magnitude above $\varepsilon$, and thereby $\sigma_k \le t$.
Conversely, suppose $\sigma_k \le t$, and let $t_1 < t_2 < \cdots < t_k$ be times of jumps of magnitude above $\varepsilon$ in $[0, t]$. Pick $\beta > 0$ so that $|X(t_i) - X(t_i-)| \ge \varepsilon + 2\beta$ and $t_i - t_{i-1} > 4\beta$ for $2 \le i \le k$. Pick $\delta \in (0, \beta)$ such that $r \in [t_i - \delta, t_i)$ and $s \in [t_i, t_i + \delta]$ imply
\[
|X(s) - X(r)| > \varepsilon + \beta.
\]
Now let $\ell > t/\beta$, $m > 1/\beta$ and $N > t/\delta$. Then for $n \ge N$ find integers $u_i$ such that
\[
(u_i - 1)\frac{t}{n} < t_i \le u_i \frac{t}{n}.
\]
Then also
\[
t_i - \delta < (u_i - 1)\frac{t}{n} < t_i \le u_i \frac{t}{n} < t_i + \delta,
\]
from which
\[
\Bigl| X\Bigl( u_i \frac{t}{n} \Bigr) - X\Bigl( u_i \frac{t}{n} - \frac{t}{n} \Bigr) \Bigr| > \varepsilon + \frac{1}{m}.
\]
This shows $\omega \in A$.
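For a step path whose jump times and sizes are known explicitly, the times $\sigma_k$ of Lemma 7.24 reduce to a simple scan; the grid construction in the proof is only needed to verify measurability. An illustrative sketch (names and data hypothetical, not from the text):

```python
def jump_time_above(jump_times, jump_sizes, eps, k):
    """Time sigma_k of the k-th jump of magnitude above eps in a cadlag
    step path with the given jump times and jump sizes; inf (here: infinity)
    if there are fewer than k such jumps."""
    big = [t for t, z in zip(jump_times, jump_sizes) if abs(z) > eps]
    return big[k - 1] if k <= len(big) else float("inf")

# Jumps of sizes 0.1, -2.0, 0.5, 3.0 at times 1, 2, 3, 4; threshold eps = 1:
print(jump_time_above([1, 2, 3, 4], [0.1, -2.0, 0.5, 3.0], 1.0, 2))
```

The second jump of magnitude above 1 occurs at time 4, and asking for a third returns infinity, matching the convention $\inf \emptyset = \infty$.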
We need to consider equations for stopped processes, and also equations
that are restarted at a stopping time.
Lemma 7.25. Suppose $F$ satisfies Assumption 7.14 and let
\[
\Phi(t) = \int_{(0,t]} F(s, X)\,dY(s).
\]
Let $\tau$ be a finite stopping time. Then
\[
(7.50)\qquad \Phi^\tau(t) = \int_{(0,t]} F(s, X^\tau)\,dY^\tau(s).
\]
In particular, suppose $X$ satisfies equation (7.39). Then the equation continues to hold when all processes are stopped at $\tau$. In other words,
\[
(7.51)\qquad X^\tau(t) = H^\tau(t) + \int_{(0,t]} F(s, X^\tau)\,dY^\tau(s).
\]
Proof. It suffices to check (7.50); the second conclusion is then immediate. Apply part (b) of Proposition 5.50 with $G(s) = F(s, X^\tau)$ and $J(s) = F(s, X)$. By the precise form of the Lipschitz property (7.40) of $F$, which is true for each fixed $\omega$, for $0 \le s \le \tau$ we have
\[
|F(s, X^\tau) - F(s, X)| \le L \sup_{u \in [0, s)} |X^\tau_u - X_u| \le L \sup_{u \in [0, \tau)} |X^\tau_u - X_u| = 0.
\]
Then first by part (b), then by part (a) of Proposition 5.50,
\[
\Phi^\tau = (J \cdot Y)^\tau = (G \cdot Y)^\tau = G \cdot Y^\tau.
\]
Lemma 7.26. Assume the filtration $\{\mathcal{F}_t\}$ is right-continuous. Let $\sigma$ be a finite stopping time for $\{\mathcal{F}_t\}$ and define $\bar{\mathcal{F}}_t = \mathcal{F}_{\sigma + t}$.

(a) Let $\tau$ be a stopping time for $\{\bar{\mathcal{F}}_t\}$. Then $\sigma + \tau$ is a stopping time for $\{\mathcal{F}_t\}$ and $\bar{\mathcal{F}}_\tau \subseteq \mathcal{F}_{\sigma + \tau}$.

(b) Suppose $\rho$ is a stopping time for $\{\mathcal{F}_t\}$. Then $\tau = (\rho - \sigma)^+$ is an $\mathcal{F}_\rho$-measurable random variable, and an $\{\bar{\mathcal{F}}_t\}$-stopping time.

(c) Let $Z$ be a cadlag process adapted to $\{\mathcal{F}_t\}$ and $\bar{Z}$ a cadlag process adapted to $\{\bar{\mathcal{F}}_t\}$. Define
\[
X(t) =
\begin{cases}
Z(t), & t < \sigma, \\
Z(\sigma) + \bar{Z}(t - \sigma), & t \ge \sigma.
\end{cases}
\]
Then $X$ is a cadlag process adapted to $\{\mathcal{F}_t\}$.
Proof. Part (a). Let $A \in \bar{\mathcal{F}}_\tau$. Observe that
\[
A \cap \{\sigma + \tau < t\} = \bigcup_{r \in (0,t) \cap \mathbb{Q}} \bigl( A \cap \{\tau < r\} \cap \{\sigma + r \le t\} \bigr).
\]
(If $\varepsilon > 0$ satisfies $\sigma + \tau < t - \varepsilon$, then any rational $r \in (\tau, \tau + \varepsilon)$ will do.) By the definition of $\bar{\mathcal{F}}_\tau$,
\[
A \cap \{\tau < r\} = \bigcup_{m \in \mathbb{N}} \bigl( A \cap \{\tau \le r - \tfrac{1}{m}\} \bigr) \in \bar{\mathcal{F}}_r = \mathcal{F}_{\sigma + r},
\]
and consequently, by the definition of $\mathcal{F}_{\sigma + r}$,
\[
A \cap \{\tau < r\} \cap \{\sigma + r \le t\} \in \mathcal{F}_t.
\]
If $A = \Omega$, we have shown that $\{\sigma + \tau < t\} \in \mathcal{F}_t$. By Lemma 2.6 and the right-continuity of $\{\mathcal{F}_t\}$, this implies that $\sigma + \tau$ is an $\{\mathcal{F}_t\}$-stopping time.

For the general $A \in \bar{\mathcal{F}}_\tau$ we have shown that
\[
A \cap \{\sigma + \tau \le t\} = \bigcap_{m \ge n} \bigl( A \cap \{\sigma + \tau < t + \tfrac{1}{m}\} \bigr) \in \mathcal{F}_{t + (1/n)}.
\]
Since this is true for any $n$,
\[
A \cap \{\sigma + \tau \le t\} \in \bigcap_n \mathcal{F}_{t + (1/n)} = \mathcal{F}_t,
\]
where we used the right-continuity of $\{\mathcal{F}_t\}$ again.
Part (b). For $0 \le t < \infty$, $\{(\rho - \sigma)^+ \le t\} = \{\rho \le \sigma + t\}$. By part (ii) of Lemma 2.3 this event lies in both $\mathcal{F}_\rho$ and $\mathcal{F}_{\sigma + t} = \bar{\mathcal{F}}_t$. This second part implies that $\tau = (\rho - \sigma)^+$ is an $\{\bar{\mathcal{F}}_t\}$-stopping time.
Part (c). Fix $0 \le t < \infty$ and a Borel set $B$ in the state space of these processes. Then
\[
\{X(t) \in B\} = \{\sigma > t,\ Z(t) \in B\} \cup \{\sigma \le t,\ Z(\sigma) + \bar{Z}(t - \sigma) \in B\}.
\]
The first part $\{\sigma > t,\ Z(t) \in B\}$ lies in $\mathcal{F}_t$ because it is the intersection of two sets in $\mathcal{F}_t$. Let $\tau = (t - \sigma)^+$. The second part can be written as
\[
\{\sigma \le t,\ Z(\sigma) + \bar{Z}(\tau) \in B\}.
\]
Cadlag processes are progressively measurable, hence $Z(\sigma)$ is $\mathcal{F}_\sigma$-measurable and $\bar{Z}(\tau)$ is $\bar{\mathcal{F}}_\tau$-measurable (part (iii) of Lemma 2.3). Since $\sigma \le \sigma + \tau$, $\mathcal{F}_\sigma \subseteq \mathcal{F}_{\sigma + \tau}$. By part (a), $\bar{\mathcal{F}}_\tau \subseteq \mathcal{F}_{\sigma + \tau}$. Consequently $Z(\sigma) + \bar{Z}(\tau)$ is $\mathcal{F}_{\sigma + \tau}$-measurable. Since $\{\sigma \le t\}$ is equivalent to $\{\sigma + \tau \le t\}$, we can rewrite the second part once more as
\[
\{\sigma + \tau \le t\} \cap \{Z(\sigma) + \bar{Z}(\tau) \in B\}.
\]
By part (i) of Lemma 2.3 applied to the stopping times $\sigma + \tau$ and $t$, the set above lies in $\mathcal{F}_t$. This concludes the proof that $\{X(t) \in B\} \in \mathcal{F}_t$.
7.4.2. A Gronwall estimate for semimartingale equations. In this section we prove a Gronwall-type estimate for SDEs under fairly stringent assumptions on the driving semimartingale. When the result is applied, the assumptions can be relaxed through localization arguments. All the processes are defined for $0 \le t < \infty$. $F$ is assumed to satisfy the conditions of Assumption 7.14, and in particular $L$ is the Lipschitz constant of $F$. In this section we work with a class of semimartingales that satisfies the following definition.
Definition 7.27. Let $0 < \delta, K < \infty$ be constants. Let us say an $\mathbb{R}^m$-valued cadlag semimartingale $Y = (Y_1, \ldots, Y_m)^T$ is of type $(\delta, K)$ if $Y$ has a decomposition $Y = Y(0) + M + S$ where $M = (M_1, \ldots, M_m)^T$ is an $m$-vector of $L^2$-martingales, $S = (S_1, \ldots, S_m)^T$ is an $m$-vector of FV processes, and
\[
(7.52)\qquad |\Delta M_j(t)| \le \delta, \quad |\Delta S_j(t)| \le \delta, \quad \text{and} \quad V_t(S_j) \le K
\]
for all $0 \le t < \infty$ and $1 \le j \le m$, almost surely.
This notion of $(\delta, K)$ type is of no general significance. We use it as a convenient abbreviation in the existence and uniqueness proofs that follow. As functions of the various constants that have been introduced, define the increasing process
\[
(7.53)\qquad A(t) = 16 L^2 dm \sum_{j=1}^m [M_j]_t + 4 K L^2 dm \sum_{j=1}^m V_{S_j}(t) + t,
\]
the stopping times
\[
(7.54)\qquad \tau(u) = \inf\{ t \ge 0 : A(t) > u \},
\]
and the constant
\[
(7.55)\qquad c = c(\delta, K, L) = 16\,\delta^2 L^2 d m^2 + 4\,\delta K L^2 d m^2.
\]
Let us make some observations about $A(t)$ and $\tau(u)$. The term $t$ is added to $A(t)$ to give $A(t) \ge t$ and to make $A(t)$ strictly increasing. Then $A(u + \varepsilon) \ge u + \varepsilon > u$ for all $\varepsilon > 0$ gives $\tau(u) \le u$. To see that $\tau(u)$ is a stopping time, observe that $\{\tau(u) \le t\} = \{A(t) \ge u\}$. First, assuming $\tau(u) \le t$, by (7.48) and monotonicity $A(t) \ge A(\tau(u)) \ge u$. Second, if $A(t) \ge u$, then by strict monotonicity $A(s) > u$ for all $s > t$, which implies $\tau(u) \le t$.

Strict monotonicity of $A$ gives continuity of $\tau$. Right continuity of $\tau$ was already argued in Lemma 7.23. For left continuity, let $s < \tau(u)$. Then $A(s) < u$, because $A(s) \ge u$ together with strict increasingness would imply the existence of a point $t \in (s, \tau(u))$ such that $A(t) > u$, contradicting $t < \tau(u)$. Then $\tau(v) \ge s$ for $v \in (A(s), u)$, which shows left continuity.
In summary: $u \mapsto \tau(u)$ is a continuous nondecreasing function of bounded stopping times such that $\tau(u) \le u$. For any given $\omega$ and $T$, once $u > A(T)$ we have $\tau(u) \ge T$, and so $\tau(u) \nearrow \infty$ as $u \to \infty$.
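The time change $\tau(u) = \inf\{t \ge 0 : A(t) > u\}$ of (7.54) is a right-continuous generalized inverse, easy to compute for a nondecreasing function sampled on a grid. A sketch with a toy choice of $A$ (assumed for illustration, not from the text):

```python
import numpy as np

def tau(A_vals, t_grid, u):
    """tau(u) = inf{t >= 0 : A(t) > u} for a nondecreasing function A
    sampled on a grid: a discrete approximation of (7.54)."""
    idx = np.searchsorted(A_vals, u, side="right")   # first index with A > u
    return t_grid[idx] if idx < len(t_grid) else np.inf

t_grid = np.linspace(0.0, 5.0, 5001)
A_vals = t_grid + np.floor(t_grid)    # strictly increasing, jumps at integers

print(tau(A_vals, t_grid, 1.5))   # approx 1.0: A jumps over 1.5 at t = 1
print(tau(A_vals, t_grid, 0.5))   # approx 0.5: A(t) = t before the first jump
```

Note $A(\tau(1.5)) = 2 > 1.5$, illustrating property (7.48): when $A$ jumps over the level $u$, the overshoot $A(\tau(u)) - u$ is positive but bounded by the jump size.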
If $Y$ is of type $(\delta_0, K_0)$ for any $\delta_0 \le \delta$ and $K_0 \le K$, then all jumps of $A$ satisfy $\Delta A(t) \le c$. This follows because the jumps of quadratic variation and total variation obey $\Delta [M_j](t) = (\Delta M_j(t))^2$ and $\Delta V_{S_j}(t) = |\Delta S_j(t)|$.
For $\alpha = 1, 2$, let $H_\alpha$, $X_\alpha$, and $Z_\alpha$ be adapted $\mathbb{R}^d$-valued cadlag processes. Assume they satisfy the equations
\[
(7.56)\qquad Z_\alpha(t) = H_\alpha(t) + \int_{(0,t]} F(s, X_\alpha)\,dY(s), \qquad \alpha = 1, 2,
\]
for all $0 \le t < \infty$. Let
\[
(7.57)\qquad D_X(t) = \sup_{0 \le s \le t} |X_1(s) - X_2(s)|^2
\]
and
\[
(7.58)\qquad \phi_X(u) = E\bigl[ D_X(\tau(u)) \bigr] = E\Bigl[ \sup_{0 \le s \le \tau(u)} |X_1(s) - X_2(s)|^2 \Bigr].
\]
Make the same definitions with $X$ replaced by $Z$ and $H$. Then $D_X$, $D_Z$, and $D_H$ are nonnegative, nondecreasing cadlag processes, and $\phi_X$, $\phi_Z$, and $\phi_H$ are nonnegative nondecreasing functions, cadlag at least on any interval on which they are finite. We assume that
\[
(7.59)\qquad \phi_H(u) = E\Bigl[ \sup_{0 \le s \le \tau(u)} |H_1(s) - H_2(s)|^2 \Bigr] < \infty
\]
for all $0 \le u < \infty$.
The proposition below is the key tool for both the uniqueness and existence proofs. Part (b) is a Gronwall-type estimate for SDEs.

Proposition 7.28. Suppose $F$ satisfies Assumption 7.14 and $Y$ is a semimartingale of type $(\delta, K)$ as in Definition 7.27. Furthermore, assume (7.59), and let the pairs $(X_\alpha, Z_\alpha)$ satisfy (7.56).

(a) For $0 \le u < \infty$,
\[
(7.60)\qquad \phi_Z(u) \le 2\phi_H(u) + c\,\phi_X(u) + \int_0^u \phi_X(s)\,ds.
\]

(b) Suppose $Z_\alpha = X_\alpha$; in other words, $X_1$ and $X_2$ satisfy the equations
\[
(7.61)\qquad X_\alpha(t) = H_\alpha(t) + \int_{(0,t]} F(s, X_\alpha)\,dY(s), \qquad \alpha = 1, 2.
\]
Then $\phi_X(u) < \infty$. If $\delta$ is small enough relative to $K$ and $L$ so that $c < 1$, then for all $u > 0$,
\[
(7.62)\qquad \phi_X(u) \le \frac{2\phi_H(u)}{1 - c}\,\exp\Bigl( \frac{u}{1 - c} \Bigr).
\]
Before starting the proof of the proposition, we establish an auxiliary lemma.

Lemma 7.29. Let $0 < \delta, K < \infty$ and suppose $Y$ is a semimartingale of type $(\delta, K)$ as specified in Definition 7.27. Let $G = \{G_{i,j}\}$ be a bounded predictable $d \times m$-matrix valued process. Then for $0 \le u < \infty$ we have the bounds
\[
(7.63)\qquad
E\Bigl[ \sup_{0 \le t \le \tau(u)} \Bigl| \int_{(0,t]} G(s)\,dY(s) \Bigr|^2 \Bigr]
\le \tfrac{1}{2} L^{-2}\, E \int_{(0,\tau(u)]} |G(s)|^2\,dA(s)
\le \tfrac{1}{2} L^{-2} (u + c)\,\|G\|_\infty^2.
\]
Proof of Lemma 7.29. By the definition of the Euclidean norm and by the inequality $(x + y)^2 \le 2x^2 + 2y^2$,
\[
(7.64)\qquad
\Bigl| \int_{(0,t]} G(s)\,dY(s) \Bigr|^2
\le 2 \sum_{i=1}^d \Bigl( \sum_{j=1}^m \int_{(0,t]} G_{i,j}(s)\,dM_j(s) \Bigr)^2
+ 2 \sum_{i=1}^d \Bigl( \sum_{j=1}^m \int_{(0,t]} G_{i,j}(s)\,dS_j(s) \Bigr)^2.
\]
The first sum inside parentheses after the inequality in (7.64) is an $L^2$-martingale by the boundedness of $G$ and because each $M_j$ is an $L^2$-martingale. Take the supremum over $0 \le t \le \tau(u)$, on the right pass the supremum inside the $i$-sums, and take expectations. By Doob's inequality (3.13), the Schwarz inequality in the form $(\sum_{j=1}^m a_j)^2 \le m \sum_{j=1}^m a_j^2$, and the isometric property of stochastic integrals,
\[
(7.65)\qquad
\begin{aligned}
E\Bigl[ \sup_{0 \le t \le \tau(u)} \Bigl( \sum_{j=1}^m \int_{(0,t]} G_{i,j}(s)\,dM_j(s) \Bigr)^2 \Bigr]
&\le 4\,E\Bigl[ \Bigl( \sum_{j=1}^m \int_{(0,\tau(u)]} G_{i,j}(s)\,dM_j(s) \Bigr)^2 \Bigr] \\
&\le 4m \sum_{j=1}^m E\Bigl[ \Bigl( \int_{(0,\tau(u)]} G_{i,j}(s)\,dM_j(s) \Bigr)^2 \Bigr] \\
&\le 4m \sum_{j=1}^m E \int_{(0,\tau(u)]} G_{i,j}(s)^2\,d[M_j](s).
\end{aligned}
\]
Similarly handle the expectation of the last sum in (7.64). Instead of quadratic variation, use (7.49) and Schwarz another time:
\[
(7.66)\qquad
\begin{aligned}
E\Bigl[ \sup_{0 \le t \le \tau(u)} \Bigl( \sum_{j=1}^m \int_{(0,t]} G_{i,j}(s)\,dS_j(s) \Bigr)^2 \Bigr]
&\le m \sum_{j=1}^m E\Bigl[ \sup_{0 \le t \le \tau(u)} \Bigl( \int_{(0,t]} G_{i,j}(s)\,dS_j(s) \Bigr)^2 \Bigr] \\
&\le m \sum_{j=1}^m E\Bigl[ \Bigl( \int_{(0,\tau(u)]} |G_{i,j}(s)|\,dV_{S_j}(s) \Bigr)^2 \Bigr] \\
&\le m \sum_{j=1}^m E\Bigl[ V_{S_j}(\tau(u)) \int_{(0,\tau(u)]} G_{i,j}(s)^2\,dV_{S_j}(s) \Bigr].
\end{aligned}
\]
Now we prove (7.63). Equations (7.64), (7.65) and (7.66), together with the hypothesis $V_{S_j}(t) \le K$, imply that
\[
\begin{aligned}
E\Bigl[ \sup_{0 \le t \le \tau(u)} \Bigl| \int_{(0,t]} G(s)\,dY(s) \Bigr|^2 \Bigr]
&\le 8dm \sum_{j=1}^m E \int_{(0,\tau(u)]} |G(s)|^2\,d[M_j](s) \\
&\quad + 2Kdm \sum_{j=1}^m E \int_{(0,\tau(u)]} |G(s)|^2\,dV_{S_j}(s) \\
&\le \tfrac{1}{2} L^{-2}\, E \int_{(0,\tau(u)]} |G(s)|^2\,dA(s) \\
&\le \tfrac{1}{2} L^{-2}\, E\Bigl[ \sup_{0 \le t \le \tau(u)} |G(t)|^2 \cdot A(\tau(u)) \Bigr]
\le \tfrac{1}{2} L^{-2} (u + c)\,\|G\|_\infty^2.
\end{aligned}
\]
From $A(\tau(u)-) \le u$ and the bound $c$ on the jumps of $A$ came the bound $A(\tau(u)) \le u + c$ used above. This completes the proof of Lemma 7.29.
Proof of Proposition 7.28. Step 1. We prove the proposition first under the additional assumption that there exists a constant $C_0$ such that $|F| \le C_0$, and under the relaxed assumption that $Y$ is of type $(\delta, K + \delta)$. This small relaxation accommodates the localization argument that in the end removes the boundedness assumptions on $F$.
Use the inequality $(x + y)^2 \le 2x^2 + 2y^2$ to write
\[
(7.67)\qquad
|Z_1(t) - Z_2(t)|^2 \le 2|H_1(t) - H_2(t)|^2
+ 2 \Bigl| \int_{(0,t]} \bigl[ F(s, X_1) - F(s, X_2) \bigr]\,dY(s) \Bigr|^2.
\]
We first check that
\[
(7.68)\qquad \phi_Z(u) = E\Bigl[ \sup_{0 \le t \le \tau(u)} |Z_1(t) - Z_2(t)|^2 \Bigr] < \infty \quad \text{for all } 0 \le u < \infty.
\]
Combine (7.63) with the hypothesis $|F| \le C_0$ to get the bound
\[
(7.69)\qquad
E\Bigl[ \sup_{0 \le t \le \tau(u)} \Bigl| \int_{(0,t]} \bigl[ F(s, X_1) - F(s, X_2) \bigr]\,dY(s) \Bigr|^2 \Bigr]
\le 2 L^{-2} (u + c)\,C_0^2.
\]
Now (7.68) follows from a combination of inequality (7.67), assumption (7.59), and bound (7.69). Note that (7.68) does not require a bound on $X$, due to the boundedness assumption on $F$ and (7.59).
The Lipschitz assumption on $F$ gives
\[
\bigl| F(s, X_1) - F(s, X_2) \bigr|^2 \le L^2\,D_X(s-).
\]
Apply (7.63) together with the Lipschitz bound to get
\[
(7.70)\qquad
\begin{aligned}
E\Bigl[ \sup_{0 \le t \le \tau(u)} \Bigl| \int_{(0,t]} \bigl[ F(s, X_1) - F(s, X_2) \bigr]\,dY(s) \Bigr|^2 \Bigr]
&\le \tfrac{1}{2} L^{-2}\, E \int_{(0,\tau(u)]} \bigl| F(s, X_1) - F(s, X_2) \bigr|^2\,dA(s) \\
&\le \tfrac{1}{2}\, E \int_{(0,\tau(u)]} D_X(s-)\,dA(s).
\end{aligned}
\]
Now we prove part (a) under the assumption of bounded $F$. Take the supremum over $0 \le t \le \tau(u)$ in (7.67), take expectations, and apply (7.70) to write
\[
(7.71)\qquad
\phi_Z(u) = E\bigl[ D_Z(\tau(u)) \bigr] = E\Bigl[ \sup_{0 \le t \le \tau(u)} |Z_1(t) - Z_2(t)|^2 \Bigr]
\le 2\phi_H(u) + E \int_{(0,\tau(u)]} D_X(s-)\,dA(s).
\]
To the $dA$-integral above apply first the change of variable from Lemma 7.21 and then inequality (7.47). This gives
\[
(7.72)\qquad
\phi_Z(u) \le 2\phi_H(u)
+ E\Bigl[ \bigl( A(\tau(u)) - u \bigr)\,(D_X \circ \tau)(u-) \Bigr]
+ E \int_{(0,u]} (D_X \circ \tau)(s)\,ds.
\]
For a fixed $\omega$, cadlag paths are bounded on bounded time intervals, so applying Lemmas 7.21 and 7.23 to the path-by-path integral $\int_{(0,\tau(u)]} D_X(s-)\,dA(s)$ is not problematic. And then, since the resulting terms are nonnegative, their expectations exist.

By the definition of $\tau(u)$, $A(s) \le u$ for $s < \tau(u)$, and so $A(\tau(u)-) \le u$. Thus by the bound $c$ on the jumps of $A$,
\[
A(\tau(u)) - u \le A(\tau(u)) - A(\tau(u)-) \le c.
\]
Applying this to (7.72) gives
\[
(7.73)\qquad
\phi_Z(u) \le 2\phi_H(u) + c\,E\bigl[ D_X(\tau(u)) \bigr] + \int_{(0,u]} E\bigl[ D_X(\tau(s)) \bigr]\,ds
= 2\phi_H(u) + c\,\phi_X(u) + \int_0^u \phi_X(s)\,ds.
\]
This is the desired conclusion (7.60), and the proof of part (a) for the case of bounded $F$ is complete.
We prove part (b) for bounded $F$. By assumption, $Z_\alpha = X_\alpha$ and $c < 1$. Now $\phi_X(u) = \phi_Z(u)$, and by (7.68) this function is finite. Since it is nondecreasing, it is bounded on bounded intervals. Inequality (7.73) becomes
\[
(1 - c)\,\phi_X(u) \le 2\phi_H(u) + \int_0^u \phi_X(s)\,ds.
\]
An application of Gronwall's inequality (Lemma 7.13) gives the desired conclusion (7.62). This completes the proof of the proposition for the case of bounded $F$.
Step 2. Return to the original hypotheses: Assumption 7.14 for $F$ without additional boundedness, and Definition 7.27 for $Y$ with type $(\delta, K)$. By part (iii) of Assumption 7.14, we can pick stopping times $\sigma_k \nearrow \infty$ and constants $B_k$ such that
\[
\bigl| \mathbf{1}_{(0, \sigma_k]}(s)\,F(s, X_\alpha) \bigr| \le B_k \quad \text{for } \alpha = 1, 2.
\]
Define truncated functions by
\[
(7.74)\qquad F^{B_k}(s, \omega, \eta) = \bigl( F(s, \omega, \eta) \wedge B_k \bigr) \vee (-B_k).
\]
By Lemma 7.25, assumption (7.56) gives the equations
\[
Z_\alpha^{\sigma_k-}(t) = H_\alpha^{\sigma_k-}(t) + \int_{(0,t]} F(s, X_\alpha^{\sigma_k-})\,dY^{\sigma_k-}(s), \qquad \alpha = 1, 2.
\]
Since $X_\alpha^{\sigma_k-} = X_\alpha$ on $[0, \sigma_k)$, $F(s, X_\alpha^{\sigma_k-}) = F(s, X_\alpha)$ on $[0, \sigma_k]$. The truncation has no effect on $F(s, X_\alpha^{\sigma_k-})$ if $0 \le s \le \sigma_k$, and so also
\[
F(s, X_\alpha^{\sigma_k-}) = F^{B_k}(s, X_\alpha^{\sigma_k-}) \quad \text{on } [0, \sigma_k].
\]
254 7. Stochastic Dierential Equations
By part (b) of Proposition 5.50 we can perform this substitution in the integral, and get the equations
$$Z_\ell^{\tau_k-}(t) = H_\ell^{\tau_k-}(t) + \int_{(0,t]} F^{B_k}(s, X_\ell^{\tau_k-})\,dY^{\tau_k-}(s), \quad \ell = 1, 2.$$
Now we have equations with bounded coefficients in the integral, as required for Step 1.

We need to check what happens to the type in Definition 7.27 as we replace Y with $Y^{\tau_k-}$. Originally Y was of type $(\delta, K-\delta)$. Decompose $Y^{\tau_k-}$ as
$$Y^{\tau_k-}(t) = Y_0 + M^{\tau_k}(t) + S^{\tau_k-}(t) - \Delta M(\tau_k)\,\mathbf{1}_{\{t\ge\tau_k\}}.$$
The new martingale part $M^{\tau_k}$ is still an $L^2$-martingale. Its jumps are a subset of the jumps of M, hence bounded by $\delta$. The jth component of the new FV part is $G_j(t) = S_j^{\tau_k-}(t) - \Delta M_j(\tau_k)\mathbf{1}_{\{t\ge\tau_k\}}$. It has the jumps of $S_j$ on $[0,\tau_k)$ and the jump $-\Delta M_j(\tau_k)$ at $\tau_k$, hence all bounded by $\delta$. The total variation $V_{G_j}$ is at most $V_{S_j} + |\Delta M_j(\tau_k)| \le K-\delta+\delta = K$. We conclude that $Y^{\tau_k-}$ is of type $(\delta, K)$.
We have verified all the hypotheses of Step 1 for the stopped processes and the function $F^{B_k}$. Consequently Step 1 applies and gives parts (a) and (b) for the stopped processes $Z_\ell^{\tau_k-}$, $H_\ell^{\tau_k-}$, and $X_\ell^{\tau_k-}$. Note that
$$E\Big[\sup_{0\le s\le\sigma(u)}\big|X_1^{\tau_k-}(s) - X_2^{\tau_k-}(s)\big|^2\Big] = E\Big[\sup_{0\le s\le\sigma(u),\,s<\tau_k}\big|X_1(s) - X_2(s)\big|^2\Big] \le E\Big[\sup_{0\le s\le\sigma(u)}\big|X_1(s) - X_2(s)\big|^2\Big] = \Delta_X(u)$$
while
$$\lim_{k\to\infty} E\Big[\sup_{0\le s\le\sigma(u)}\big|X_1^{\tau_k-}(s) - X_2^{\tau_k-}(s)\big|^2\Big] = \Delta_X(u)$$
by monotone convergence, because $\tau_k\nearrow\infty$. The same facts hold of course for Z and H too.
Using the previous inequality, the outcome from Step 1 can be written for part (a) as
$$E\Big[\sup_{0\le s\le\sigma(u)}\big|Z_1^{\tau_k-}(s) - Z_2^{\tau_k-}(s)\big|^2\Big] \le 2\Delta_H(u) + c\,\Delta_X(u) + \int_0^u \Delta_X(s)\,ds,$$
and for part (b) as
$$E\Big[\sup_{0\le s\le\sigma(u)}\big|X_1^{\tau_k-}(s) - X_2^{\tau_k-}(s)\big|^2\Big] \le \frac{2\Delta_H(u)}{1-c}\exp\Big(\frac{u}{1-c}\Big).$$
As $k\to\infty$ the left-hand sides of the inequalities above converge to the desired expectations. Parts (a) and (b) both follow, and the proof is complete.
7.4.3. The uniqueness theorem.

Theorem 7.30. Let H be an $\mathbb{R}^d$-valued adapted cadlag process and Y an $\mathbb{R}^m$-valued cadlag semimartingale. Assume F satisfies Assumption 7.14. Suppose $X_1$ and $X_2$ are two adapted $\mathbb{R}^d$-valued cadlag processes, and both are solutions to the equation
(7.75)
$$X(t) = H(t) + \int_{(0,t]} F(s, X)\,dY(s), \quad 0\le t<\infty.$$
Then for almost every $\omega$, $X_1(t,\omega) = X_2(t,\omega)$ for all $0\le t<\infty$.

Proof. At time zero $X_1(0) = H(0) = X_2(0)$.
Step 1. We first show that there exists a stopping time $\sigma$, defined in terms of Y, such that $P\{\sigma>0\} = 1$ and, for all choices of H and F and for all solutions $X_1$ and $X_2$, $X_1(t) = X_2(t)$ for $0\le t\le\sigma$.

To prove this, we stop Y so that the hypotheses of Proposition 7.28 are satisfied. Choose $0<\delta<K/3<\infty$ so that $c<1$ for c defined by (7.55). By Theorem 3.20 (fundamental theorem of local martingales) we can choose the semimartingale decomposition $Y = Y(0)+M+G$ so that the local $L^2$-martingale M has jumps bounded by $\delta/2$. Fix a constant $0<C<\infty$. Define the following stopping times:
$$\sigma_1 = \inf\big\{t>0: |Y(t)-Y(0)|\ge C \ \text{or}\ |Y(t-)-Y(0)|\ge C\big\},$$
$$\sigma_2 = \inf\big\{t>0: |\Delta Y(t)| > \delta/2\big\},$$
and
$$\sigma_3 = \inf\big\{t>0: V_{G_j}(t) \ge K-2\delta \ \text{for some}\ 1\le j\le m\big\}.$$
Lemmas 2.9 and 7.24 guarantee that $\sigma_1$ and $\sigma_2$ are stopping times. For $\sigma_3$, observe that since $V_{G_j}(t)$ is nondecreasing and cadlag,
$$\{\sigma_3\le t\} = \bigcup_{j=1}^m \big\{V_{G_j}(t)\ge K-2\delta\big\}.$$
Each of $\sigma_1$, $\sigma_2$ and $\sigma_3$ is strictly positive. A cadlag path satisfies $|Y(s)-Y(0)|<C$ for $s\le\varepsilon$ for a positive $\varepsilon$ that depends on the path, but then $\sigma_1\ge\varepsilon$. The interval [0, 1] contains only finitely many jumps of Y of magnitude above $\delta/2$, so there must be a first one, and this occurs at some positive time because $Y(0+) = Y(0)$. Finally, total variation $V_{G_j}$ is cadlag and so $V_{G_j}(0+) = V_{G_j}(0) = 0$, hence $\sigma_3>0$.
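On a discretized path such stopping times are easy to compute. The Python sketch below is purely illustrative and not from the text: the path, the split into diffusive and jump contributions, the constants $C$, $\delta$, $K$, and the total-variation proxy (cumulative sum of absolute jump sizes) are all made-up stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
n, dt = 5000, 1e-3
# Made-up cadlag path: small diffusive increments plus occasional larger jumps.
diffusive = rng.normal(0.0, np.sqrt(dt), n)
jumps = (rng.random(n) < 0.002) * rng.normal(0.0, 0.5, n)
Y = np.concatenate(([0.0], np.cumsum(diffusive + jumps)))

C, delta, K = 1.0, 0.3, 2.0                              # arbitrary constants
V_G = np.concatenate(([0.0], np.cumsum(np.abs(jumps))))  # crude proxy for V_{G_j}

def first_time(cond):
    hits = np.flatnonzero(cond)
    return (hits[0] + 1) * dt if hits.size else np.inf

sigma1 = first_time(np.abs(Y[1:] - Y[0]) >= C)   # large excursion of Y
sigma2 = first_time(np.abs(jumps) > delta / 2)   # first sizable jump
sigma3 = first_time(V_G[1:] >= K - 2 * delta)    # variation budget exhausted
sigma = min(sigma1, sigma2, sigma3, n * dt)
print(sigma1, sigma2, sigma3, sigma)
```

Because each condition takes a positive amount of time to trigger on a cadlag path, the minimum is strictly positive, mirroring the argument above.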
Let T be an arbitrary finite positive number and
(7.76)
$$\sigma = \sigma_1\wedge\sigma_2\wedge\sigma_3\wedge T.$$
$P(\sigma>0) = 1$ by what was said above.

We claim that the semimartingale $Y^{\sigma-}$ satisfies the hypotheses of Proposition 7.28. To see this, decompose $Y^{\sigma-}$ as
(7.77)
$$Y^{\sigma-}(t) = Y(0) + M^{\sigma}(t) + G^{\sigma-}(t) - \Delta M(\sigma)\,\mathbf{1}_{\{t\ge\sigma\}}.$$
Stopping does not introduce new jumps, so the jumps of $M^{\sigma}$ are still bounded by $\delta/2$. The jumps of $Y^{\sigma-}$ are bounded by $\delta/2$ since $\sigma\le\sigma_2$. On $[0,\sigma)$ the FV part $S(t) = G^{\sigma-}(t) - \Delta M(\sigma)\mathbf{1}_{\{t\ge\sigma\}}$ has the jumps of $G^{\sigma-}$. These are bounded by $\delta$ because $\Delta G^{\sigma-}(t) = \Delta Y^{\sigma-}(t) - \Delta M^{\sigma-}(t)$. At time $\sigma$, S has the jump $-\Delta M(\sigma)$, bounded by $\delta/2$. The total variation of a component $S_j$ of S is
$$V_{S_j}(t) \le V_{G_j^{\sigma-}}(t) + |\Delta M_j(\sigma)| \le V_{G_j}(\sigma_3-) + \delta/2 \le K-2\delta+\delta/2 \le K-\delta.$$
Since $|Y^{\sigma-} - Y(0)|\le C$ and $|S_j|\le V_{S_j}\le K$, it follows that $M^{\sigma}$ is bounded, and consequently an $L^2$-martingale. We have verified that $Y^{\sigma-}$ is of type $(\delta, K-\delta)$ according to Definition 7.27.
By assumption equation (7.75) is satisfied by both $X_1$ and $X_2$. By Lemma 7.25, both $X_1^{\sigma-}$ and $X_2^{\sigma-}$ satisfy the equation
(7.78)
$$X(t) = H^{\sigma-}(t) + \int_{(0,t]} F(s, X)\,dY^{\sigma-}(s), \quad 0\le t<\infty.$$
To this equation we apply Proposition 7.28. In the hypotheses of Proposition 7.28 take $H_1 = H_2 = H^{\sigma-}$, so $\Delta_H(u) = 0$ and assumption (7.59) is satisfied. All the hypotheses of part (b) of Proposition 7.28 are satisfied, and we get
$$E\Big[\sup_{0\le t\le\sigma(u)}\big|X_1^{\sigma-}(t) - X_2^{\sigma-}(t)\big|^2\Big] = 0$$
for any $u>0$, where $\sigma(u)$ is the stopping time defined by (7.54). As we let $u\nearrow\infty$ we get $X_1^{\sigma-}(t) = X_2^{\sigma-}(t)$ for all $0\le t<\infty$. This implies that $X_1(t) = X_2(t)$ for $0\le t<\sigma$.
At time $\sigma$,
(7.79)
$$X_1(\sigma) = H(\sigma) + \int_{(0,\sigma]} F(s, X_1)\,dY(s) = H(\sigma) + \int_{(0,\sigma]} F(s, X_2)\,dY(s) = X_2(\sigma)$$
because the integrand $F(s, X_1)$ depends only on $\{X_1(s): 0\le s<\sigma\}$, which agrees with the corresponding segment of $X_2$, as established in the previous paragraph. Now we know that $X_1(t) = X_2(t)$ for $0\le t\le\sigma$. This concludes the proof of Step 1.
Step 2. Now we show that $X_1$ and $X_2$ agree up to an arbitrary finite time T. At this stage we begin to make use of the assumption of right-continuity of $\{\mathcal{F}_t\}$. Define
(7.80)
$$\tau = \inf\{t\ge 0: X_1(t)\ne X_2(t)\}.$$
The time when the cadlag process $X_1 - X_2$ first enters the open set $\{0\}^c$ is a stopping time under a right-continuous filtration by Lemma 2.7. Hence $\tau$ is a stopping time. Since $X_1 = X_2$ on $[0,\tau)$, if $\tau<\infty$, a calculation like the one in (7.79) shows that $X_1 = X_2$ on $[0,\tau]$. From Step 1 we also know that $\tau>0$.
The idea is to apply Step 1 again, starting from time $\tau$. For this we need a lemma that enables us to restart the equation.
Lemma 7.31. Assume F satisfies Assumption 7.14 and X satisfies equation (7.39) on $[0,\infty)$. Let $\tau$ be a bounded stopping time. Define a new filtration, new processes, and a new coefficient by
$$\bar{\mathcal{F}}_t = \mathcal{F}_{\tau+t}, \quad \bar X(t) = X(\tau+t) - X(\tau), \quad \bar Y(t) = Y(\tau+t) - Y(\tau),$$
$$\bar H(t) = H(\tau+t) - H(\tau), \quad\text{and}\quad \bar F(t,\omega,\eta) = F(\tau+t, \omega, \zeta^{\omega,\eta})$$
where the cadlag path $\zeta^{\omega,\eta} \in D_{\mathbb{R}^d}[0,\infty)$ is defined by
$$\zeta^{\omega,\eta}(s) = \begin{cases} X(s), & 0\le s<\tau\\ X(\tau) + \eta(s-\tau), & s\ge\tau. \end{cases}$$
Then under $\{\bar{\mathcal{F}}_t\}$, $\bar X$ and $\bar H$ are adapted cadlag processes, $\bar Y$ is a semimartingale, and $\bar F$ satisfies Assumption 7.14. $\bar X$ is a solution of the equation
$$\bar X(t) = \bar H(t) + \int_{(0,t]} \bar F(s, \bar X)\,d\bar Y(s).$$
Proof of Lemma 7.31. We check that the new $\bar F$ satisfies all the hypotheses. The Lipschitz property is immediate. Let $\bar Z$ be a cadlag process adapted to $\{\bar{\mathcal{F}}_t\}$. Define the process Z by
$$Z(t) = \begin{cases} X(t), & t<\tau\\ X(\tau) + \bar Z(t-\tau), & t\ge\tau. \end{cases}$$
Then Z is a cadlag process adapted to $\{\mathcal{F}_t\}$ by Lemma 7.26. $\bar F(t, \bar Z) = F(\tau+t, Z)$ is predictable under $\{\bar{\mathcal{F}}_t\}$ by Lemma 5.46. Find stopping times $\nu_k\nearrow\infty$ such that $\mathbf{1}_{(0,\nu_k]}(s)F(s, Z)$ is bounded for each k. Define $\bar\nu_k = (\nu_k-\tau)^+$. Then $\bar\nu_k\nearrow\infty$, by Lemma 7.26 $\bar\nu_k$ is a stopping time for $\{\bar{\mathcal{F}}_t\}$, and $\mathbf{1}_{(0,\bar\nu_k]}(s)\bar F(s, \bar Z) = \mathbf{1}_{(0,\nu_k]}(\tau+s)F(\tau+s, Z)$, which is bounded.

$\bar Y$ is a semimartingale by Theorem 5.45. $\bar X$ and $\bar H$ are adapted to $\{\bar{\mathcal{F}}_t\}$ by part (iii) of Lemma 2.3 (recall that cadlag paths imply progressive measurability).

The equation for $\bar X$ checks as follows:
$$\bar X(t) = X(\tau+t) - X(\tau) = H(\tau+t) - H(\tau) + \int_{(\tau,\tau+t]} F(s, X)\,dY(s)$$
$$= \bar H(t) + \int_{(0,t]} F(\tau+s, X)\,d\bar Y(s) = \bar H(t) + \int_{(0,t]} \bar F(s, \bar X)\,d\bar Y(s).$$
The next to last equality is from (5.48), and the last equality from the definition of $\bar F$ and $\zeta^{\omega,\bar X} = X$.
We return to the proof of the uniqueness theorem. Let $0<T<\infty$. By Lemma 7.31, we can restart the equations for $X_1$ and $X_2$ at time $\tau\wedge T$. Since $X_1 = X_2$ on $[0,\tau\wedge T]$, both $X_1$ and $X_2$ lead to the same new coefficient $\bar F(t,\omega,\eta)$ for the restarted equation. Consequently we have
$$\bar X_\ell(t) = \bar H(t) + \int_{(0,t]} \bar F(s, \bar X_\ell)\,d\bar Y(s), \quad \ell = 1, 2.$$
Applying Step 1 to this restarted equation, we find a stopping time $\bar\sigma>0$ in the filtration $\{\mathcal{F}_{(\tau\wedge T)+t}\}$ such that $\bar X_1 = \bar X_2$ on $[0,\bar\sigma]$. This implies that $X_1 = X_2$ on $[0, \tau\wedge T+\bar\sigma]$. Hence by definition (7.80), $\tau\ge\tau\wedge T+\bar\sigma$, which implies that $\tau\ge T$. Since T was arbitrary, $\tau = \infty$. This says that $X_1$ and $X_2$ agree for all time.
Remark 7.32. Let us note the crucial use of the fundamental theorem of local martingales in Step 1 of the proof above. Suppose we could not choose the decomposition $Y = Y(0)+M+G$ so that M has jumps bounded by $\delta/2$. Then to satisfy the hypotheses of Proposition 7.28, we could try to stop before either M or S has jumps larger than $\delta/2$. However, this attempt runs into trouble. The stopped local martingale part $M^{\sigma}$ in (7.77) might have a large jump exactly at time $\sigma$. Replacing $M^{\sigma}$ with $M^{\sigma-}$ would eliminate this problem, but there is no guarantee that the process $M^{\sigma-}$ is a local martingale.
7.4.4. Existence theorem. We begin with an existence theorem under
stringent assumptions on Y . These hypotheses are subsequently relaxed
with a localization argument.
Proposition 7.33. A solution to (7.39) on $[0,\infty)$ exists under the following assumptions:
(i) F satisfies Assumption 7.14.
(ii) There are constants $0<\delta<K<\infty$ such that c defined by (7.55) satisfies $c<1$, and Y is of type $(\delta, K)$ as specified by Definition 7.27.

We prove this proposition in several stages. The hypotheses are chosen so that the estimate in Proposition 7.28 can be applied.
We define a Picard iteration as follows. Let $X_0(t) = H(t)$, and for $n\ge 0$,
(7.81)
$$X_{n+1}(t) = H(t) + \int_{(0,t]} F(s, X_n)\,dY(s).$$
We need to stop the processes in order to get bounds that force the iteration to converge. By part (ii) of Assumption 7.14 fix stopping times $\tau_k\nearrow\infty$ and constants $B_k$ such that $\mathbf{1}_{(0,\tau_k]}(s)|F(s, H)|\le B_k$ for all k. By Lemma 7.25, equation (7.81) continues to hold for stopped processes:
(7.82)
$$X_{n+1}^{\tau_k-}(t) = H^{\tau_k-}(t) + \int_{(0,t]} F(s, X_n^{\tau_k-})\,dY^{\tau_k-}(s).$$
Fix k for a while. For $n\ge 0$ let
$$D_n(t) = \sup_{0\le s\le t}\big|X_{n+1}^{\tau_k-}(s) - X_n^{\tau_k-}(s)\big|^2.$$
Recall definition (7.54), and for $0\le u<\infty$ let $\Delta_n(u) = E\big[D_n(\sigma(u))\big]$.
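The Picard scheme (7.81) can be watched in action on a toy deterministic example. In the Python sketch below (my own illustration, not from the text) the driving semimartingale is $Y(s)=s$, the coefficient is $F(s,X)=X(s)$, and $H\equiv 1$, so the equation reads $X(t) = 1 + \int_0^t X(s)\,ds$ with fixed point $X(t)=e^t$; the successive sup-distances decay factorially, as in the bound (7.84) below.

```python
import numpy as np

# Toy Picard iteration for X(t) = 1 + int_0^t X(s) ds, i.e. H == 1, Y(s) = s,
# F(s, X) = X(s).  The fixed point is X(t) = e^t.
ts = np.linspace(0.0, 1.0, 1001)
dt = ts[1] - ts[0]
H = np.ones_like(ts)

X = H.copy()                                    # X_0 = H
sup_dists = []
for n in range(12):
    integral = np.concatenate(([0.0], np.cumsum(X[:-1]) * dt))
    X_new = H + integral                        # X_{n+1} = H + int F(s, X_n) dY
    sup_dists.append(float(np.max(np.abs(X_new - X))))
    X = X_new

print(sup_dists[:4])     # roughly 1, 1/2, 1/6, 1/24: factorial decay
print(float(X[-1]))      # close to e
```

Each iterate is a partial Taylor sum of $e^t$ (up to discretization error), which is exactly the geometric-to-factorial contraction the stopping and the estimate (7.83) are designed to produce in the stochastic setting.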
Lemma 7.34. The function $\Delta_0(u)$ is bounded on bounded intervals.

Proof. Because $\Delta_0$ is nondecreasing it suffices to show that $\Delta_0(u)$ is finite for any u. First,
$$\big|X_1^{\tau_k-}(t) - X_0^{\tau_k-}(t)\big|^2 = \Big|\int_{(0,t]} F(s, H^{\tau_k-})\,dY^{\tau_k-}(s)\Big|^2 = \Big|\int_{(0,t]} F^{B_k}(s, H^{\tau_k-})\,dY^{\tau_k-}(s)\Big|^2$$
where $F^{B_k}$ denotes the truncated function
$$F^{B_k}(s,\omega,\eta) = \big(F(s,\omega,\eta)\wedge B_k\big)\vee(-B_k).$$
The truncation can be introduced in the stochastic integral because on $[0,\tau_k]$
$$F(s, H^{\tau_k-}) = F(s, H) = F^{B_k}(s, H) = F^{B_k}(s, H^{\tau_k-}).$$
We get the bound
$$\Delta_0(u) = E\Big[\sup_{0\le t\le\sigma(u)}\big|X_1^{\tau_k-}(t) - X_0^{\tau_k-}(t)\big|^2\Big] \le E\Big[\sup_{0\le t\le\sigma(u)}\Big|\int_{(0,t]}F^{B_k}(s, H)\,dY(s)\Big|^2\Big] \le \frac{1}{2L^2}\,(u+c)\,B_k^2.$$
The last inequality above is from Lemma 7.29.
Part (a) of Proposition 7.28 applied to $(Z_1, Z_2) = (X_n^{\tau_k-}, X_{n+1}^{\tau_k-})$ and $(H_1, H_2) = (H^{\tau_k-}, H^{\tau_k-})$ gives
(7.83)
$$\Delta_{n+1}(u) \le c\,\Delta_n(u) + \int_0^u \Delta_n(s)\,ds.$$
Note that we need no assumption on H because only the difference $\Delta_H = 0$ appears in hypothesis (7.59) for Proposition 7.28. By the previous lemma $\Delta_0$ is bounded on bounded intervals. Then (7.83) shows inductively that all $\Delta_n$ are bounded functions on bounded intervals. The next goal is to prove that $\sum_n \Delta_n(u) < \infty$.
Lemma 7.35. Fix $0<T<\infty$. Let $\Delta_n$ be nonnegative measurable functions on [0, T] such that $\Delta_0\le B$ for some constant B, and inequality (7.83) is satisfied for all $n\ge 0$ and $0\le u\le T$. Then for all n and $0\le u\le T$,
(7.84)
$$\Delta_n(u) \le B\sum_{k=0}^n \frac{1}{k!}\binom{n}{k}u^k c^{n-k}.$$

Proof. Use induction. (7.84) is true for $n = 0$ by hypothesis. Assume it is true for n. By (7.83),
$$\Delta_{n+1}(u) \le c\,\Delta_n(u) + \int_0^u \Delta_n(s)\,ds \le B\sum_{k=0}^n \frac{c^{n+1-k}}{k!}\binom{n}{k}u^k + B\sum_{k=0}^n \frac{c^{n-k}}{k!}\binom{n}{k}\int_0^u s^k\,ds$$
$$= B\sum_{k=0}^n \frac{c^{n+1-k}}{k!}\binom{n}{k}u^k + B\sum_{k=1}^{n+1}\frac{c^{n+1-k}}{k!}\binom{n}{k-1}u^k = B\sum_{k=0}^{n+1}\frac{c^{n+1-k}}{k!}\binom{n+1}{k}u^k.$$
For the last step above, combine terms and use $\binom{n}{k}+\binom{n}{k-1} = \binom{n+1}{k}$.
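The bound (7.84) can be confirmed numerically. The sketch below (my own, with arbitrary values of $B$ and $c$) iterates the equality case of (7.83) and checks it against the binomial expression; the left-endpoint quadrature slightly underestimates the integral, so the inequality holds with room to spare.

```python
import numpy as np
from math import comb, factorial

B_const, c = 1.0, 0.4          # arbitrary illustrative constants, c < 1
us = np.linspace(0.0, 2.0, 2001)
du = us[1] - us[0]

def bound(n):
    # Right side of (7.84) evaluated on the whole u-grid.
    return B_const * sum(comb(n, k) * us**k * c**(n - k) / factorial(k)
                         for k in range(n + 1))

delta = np.full_like(us, B_const)              # Delta_0 == B
for n in range(1, 8):
    integral = np.concatenate(([0.0], np.cumsum(delta[:-1]) * du))
    delta = c * delta + integral               # equality case of (7.83)
    assert np.all(delta <= bound(n) + 1e-9)
print("bound (7.84) holds for n = 1..7")
```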
To (7.84) we apply the next limit.
Lemma 7.36. For any $0<\alpha<1$ and $0<u<\infty$,
$$\sum_{n=0}^\infty \sum_{k=0}^n \frac{1}{k!}\binom{n}{k}u^k\alpha^{n-k} = \frac{1}{1-\alpha}\exp\Big(\frac{u}{1-\alpha}\Big).$$
Proof. First check this auxiliary equality for $0<x<1$ and $k\ge 0$:
(7.85)
$$\sum_{m=0}^\infty (m+1)(m+2)\cdots(m+k)\,x^m = \frac{k!}{(1-x)^{k+1}}.$$
One way to see this is to write the left-hand sum as
$$\sum_{m=0}^\infty \frac{d^k}{dx^k}x^{m+k} = \frac{d^k}{dx^k}\Big(\sum_{m=0}^\infty x^{m+k}\Big) = \frac{d^k}{dx^k}\Big(\frac{x^k}{1-x}\Big) = \sum_{j=0}^k \binom{k}{j}\frac{d^j}{dx^j}\Big(\frac{1}{1-x}\Big)\cdot\frac{d^{k-j}}{dx^{k-j}}\big(x^k\big)$$
$$= \sum_{j=0}^k \binom{k}{j}\frac{j!}{(1-x)^{j+1}}\,k(k-1)\cdots(j+1)\,x^j = \frac{k!}{1-x}\sum_{j=0}^k \binom{k}{j}\Big(\frac{x}{1-x}\Big)^j$$
$$= \frac{k!}{1-x}\Big(1+\frac{x}{1-x}\Big)^k = \frac{k!}{(1-x)^{k+1}}.$$
For an alternative proof of (7.85) see Exercise 7.5.

After changing the order of summation, the sum in the statement of the lemma becomes
$$\sum_{k=0}^\infty \frac{u^k}{(k!)^2}\sum_{n=k}^\infty n(n-1)\cdots(n-k+1)\,\alpha^{n-k} = \sum_{k=0}^\infty \frac{u^k}{(k!)^2}\sum_{m=0}^\infty (m+1)(m+2)\cdots(m+k)\,\alpha^m$$
$$= \sum_{k=0}^\infty \frac{u^k}{(k!)^2}\cdot\frac{k!}{(1-\alpha)^{k+1}} = \frac{1}{1-\alpha}\sum_{k=0}^\infty \frac{1}{k!}\Big(\frac{u}{1-\alpha}\Big)^k = \frac{1}{1-\alpha}\exp\Big(\frac{u}{1-\alpha}\Big).$$
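The identity of Lemma 7.36 is easy to confirm by truncating the double series (a throwaway sketch; the values of $u$ and $\alpha$ are arbitrary):

```python
from math import comb, factorial, exp

def double_sum(u, alpha, N=80):
    # Truncation of the double series in Lemma 7.36; the tail is O(alpha^N).
    return sum(comb(n, k) * u**k * alpha**(n - k) / factorial(k)
               for n in range(N) for k in range(n + 1))

u, alpha = 1.5, 0.3
closed_form = exp(u / (1 - alpha)) / (1 - alpha)
print(double_sum(u, alpha), closed_form)   # the two values agree closely
```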
As a consequence of the last two lemmas we get
(7.86)
$$\sum_{n=0}^\infty \Delta_n(u) < \infty.$$
It follows by the next Borel–Cantelli argument that for almost every $\omega$, the cadlag functions $\{X_n^{\tau_k-}(t): n\in\mathbb{Z}_+\}$ on the interval $t\in[0,\sigma(u)]$ form a Cauchy sequence in the uniform norm. (Recall that we are still holding k fixed.) Pick $\theta\in(c, 1)$. By Chebyshev's inequality and (7.84),
$$\sum_{n=0}^\infty P\Big(\sup_{0\le t\le\sigma(u)}\big|X_{n+1}^{\tau_k-}(t) - X_n^{\tau_k-}(t)\big| \ge \theta^{n/2}\Big) \le \sum_{n=0}^\infty \theta^{-n}\Delta_n(u) \le B\sum_{n=0}^\infty\sum_{k=0}^n \frac{1}{k!}\binom{n}{k}\Big(\frac{u}{\theta}\Big)^k\Big(\frac{c}{\theta}\Big)^{n-k}.$$
This sum converges by Lemma 7.36. By the Borel–Cantelli lemma there exists an almost surely finite $N(\omega)$ such that for $n\ge N(\omega)$,
$$\sup_{0\le t\le\sigma(u)}\big|X_{n+1}^{\tau_k-}(t) - X_n^{\tau_k-}(t)\big| < \theta^{n/2}.$$
Consequently for $p>m\ge N(\omega)$,
$$\sup_{0\le t\le\sigma(u)}\big|X_m^{\tau_k-}(t) - X_p^{\tau_k-}(t)\big| \le \sum_{n=m}^{p-1}\sup_{0\le t\le\sigma(u)}\big|X_{n+1}^{\tau_k-}(t) - X_n^{\tau_k-}(t)\big| \le \sum_{n=m}^\infty \theta^{n/2} = \frac{\theta^{m/2}}{1-\theta^{1/2}}.$$
This last bound can be made arbitrarily small by taking m large enough. This gives the Cauchy property. By the completeness of cadlag functions under the uniform distance (Lemma A.5), for almost every $\omega$ there exists a cadlag function $t\mapsto\tilde X_k(t)$ on the interval $[0,\sigma(u)]$ such that
(7.87)
$$\lim_{n\to\infty}\sup_{0\le t\le\sigma(u)}\big|\tilde X_k(t) - X_n^{\tau_k-}(t)\big| = 0.$$
By considering a sequence of u-values increasing to infinity, we get a single event of probability one on which (7.87) holds for all $0\le u<\infty$. For any fixed $\omega$ and $T<\infty$, $\sigma(u)\ge T$ for all large enough u. We conclude that, with probability one, there exists a cadlag function $\tilde X_k$ on the interval $[0,\infty)$ such that
(7.88)
$$\lim_{n\to\infty}\sup_{0\le t\le T}\big|\tilde X_k(t) - X_n^{\tau_k-}(t)\big| = 0 \quad\text{for all } T<\infty.$$
Next we show, still with k fixed, that $\tilde X_k$ is a solution of
(7.89)
$$\tilde X_k(t) = H^{\tau_k-}(t) + \int_{(0,t]} F(s, \tilde X_k)\,dY^{\tau_k-}(s).$$
For this we let $n\to\infty$ in (7.82) to obtain (7.89) in the limit. The left side of (7.82) converges to the left side of (7.89) almost surely, uniformly on compact time intervals, by (7.88). For the right side of (7.82) we apply Theorem 5.44. From the Lipschitz property of F,
$$\big|F(t, \tilde X_k) - F(t, X_n^{\tau_k-})\big| \le L\sup_{0\le s<t}\big|\tilde X_k(s) - X_n^{\tau_k-}(s)\big|.$$
In Theorem 5.44 take $H_n(t) = F(t, \tilde X_k) - F(t, X_n^{\tau_k-})$ and the cadlag bound
$$G_n(t) = L\sup_{0\le s\le t}\big|\tilde X_k(s) - X_n^{\tau_k-}(s)\big|.$$
(Note the range of the supremum over s.) The convergence in (7.88) gives the hypothesis in Theorem 5.44, and we conclude that the right side of (7.82) converges to the right side of (7.89), in probability, uniformly over t in compact time intervals. We can get almost sure convergence along a subsequence. Equation (7.89) has been verified.
This can be repeated for all values of k. The limit (7.88) implies that if $k<m$, then $\tilde X_m = \tilde X_k$ on $[0,\tau_k)$. Since $\tau_k\nearrow\infty$, we conclude that there is a single well-defined cadlag process X on $[0,\infty)$ such that $X = \tilde X_k$ on $[0,\tau_k)$. On $[0,\tau_k)$ equation (7.89) agrees term by term with the equation
(7.90)
$$X(t) = H(t) + \int_{(0,t]} F(s, X)\,dY(s).$$
For the integral term this follows from part (b) of Proposition 5.50 and the manner of dependence of F on the path: since $X = \tilde X_k$ on $[0,\tau_k)$, $F(s, X) = F(s, \tilde X_k)$ on $[0,\tau_k]$.

We have found a solution to equation (7.90). The proof of Proposition 7.33 is thereby complete.
The main result of this section is the existence of a solution without extra assumptions on Y. This theorem will also complete the proof of Theorem 7.15.

Theorem 7.37. A solution to (7.39) on $[0,\infty)$ exists under Assumption 7.14 on F, for an arbitrary semimartingale Y and cadlag process H.

Given an arbitrary semimartingale Y and a cadlag process H, fix constants $0<\delta<K/3<\infty$ so that c defined by (7.55) satisfies $c<1$. Pick a decomposition $Y = Y_0+M+G$ such that the local $L^2$-martingale M has jumps bounded by $\delta/2$. Fix a constant $0<C<\infty$. Define the following stopping times $\rho_k$, $\sigma_k$ and $\sigma^i_k$ for $1\le i\le 3$ and $k\in\mathbb{Z}_+$. First $\rho_0 = \sigma_0 = \sigma^i_0 = 0$ for $1\le i\le 3$. For $k\ge 1$,
$$\sigma^1_k = \inf\big\{t>0: |Y(\rho_{k-1}+t)-Y(\rho_{k-1})|\ge C \ \text{or}\ |Y((\rho_{k-1}+t)-)-Y(\rho_{k-1})|\ge C\big\},$$
$$\sigma^2_k = \inf\big\{t>0: |\Delta Y(\rho_{k-1}+t)|>\delta/2\big\},$$
$$\sigma^3_k = \inf\big\{t>0: V_{G_j}(\rho_{k-1}+t)-V_{G_j}(\rho_{k-1})\ge K-2\delta \ \text{for some}\ 1\le j\le m\big\},$$
$$\sigma_k = \sigma^1_k\wedge\sigma^2_k\wedge\sigma^3_k\wedge 1, \quad\text{and}\quad \rho_k = \sigma_1+\cdots+\sigma_k.$$
Each $\sigma_k>0$ for the same reasons that $\sigma$ defined by (7.76) was positive. Consequently $0 = \rho_0<\rho_1<\rho_2<\cdots$.
We claim that $\rho_k\nearrow\infty$. To see why, suppose to the contrary that $\rho_k(\omega)\nearrow u<\infty$ for some sample point $\omega$. By the existence of the limit $Y(u-)$ there exists $\varepsilon>0$ such that $|Y(s)-Y(t)|\le\delta/4$ for $s, t\in[u-\varepsilon, u)$. But then, once $\rho_k\in[u-\varepsilon/2, u)$, it must be that $\rho_k+\sigma^i_{k+1}\ge u$ for both $i = 1, 2$. (Tacitly assuming here that $C>\delta/4$.) Since $V_{G_j}$ is nondecreasing, $\rho_{k-1}+\sigma^3_k\nearrow u$ would force $V_{G_j}(u) = \infty$.
By iterating part (a) of Lemma 7.26 one can conclude that for each k, $\rho_k$ is a stopping time for $\{\mathcal{F}_t\}$, and then $\sigma_{k+1}$ is a stopping time for $\{\mathcal{F}_{\rho_k+t}: t\ge 0\}$. (Recall that we are assuming $\{\mathcal{F}_t\}$ right-continuous now.)
The heart of the existence proof is an iteration which we formulate with
the next lemma.
Lemma 7.38. For each k, there exists an adapted cadlag process X
k
(t) such
that the equation
(7.91) X
k
(t) = H

k
(t) +
_
(0,t]
F(s, X
k
) dY

k
(s)
is satised.
Proof. The proof is by induction. After each $\rho_k$, we restart the equation but stop before the hypotheses of Proposition 7.33 are violated. This way we can apply Proposition 7.33 to construct a segment of the solution, one interval $(\rho_k, \rho_{k+1})$ at a time. An explicit definition takes the solution up to time $\rho_{k+1}$, and then we are ready for the next iteration.

For $k = 1$, $\rho_1 = \sigma_1$. The semimartingale $Y^{\sigma_1-}$ satisfies the hypotheses of Proposition 7.33. This argument is the same as in the uniqueness proof of the previous section, at (7.77). Consequently by Proposition 7.33 there exists a solution $\tilde X$ of the equation
(7.92)
$$\tilde X(t) = H^{\sigma_1-}(t) + \int_{(0,t]} F(s, \tilde X)\,dY^{\sigma_1-}(s).$$
Define
$$X_1(t) = \begin{cases} \tilde X(t), & 0\le t<\rho_1\\ X_1(\rho_1-) + \Delta H(\rho_1) + F(\rho_1, \tilde X)\,\Delta Y(\rho_1), & t\ge\rho_1. \end{cases}$$
$F(t, \tilde X) = F(t, X_1)$ for $0\le t\le\rho_1$ because $X_1 = \tilde X$ on $[0,\rho_1)$. Then $X_1$ solves (7.91) for $k = 1$ and $0\le t<\rho_1$. For $t\ge\rho_1$, recalling (5.46)–(5.47) for left limits and jumps of stochastic integrals:
$$X_1(t) = X_1(\rho_1-) + \Delta H(\rho_1) + F(\rho_1, \tilde X)\,\Delta Y(\rho_1) = H(\rho_1) + \int_{(0,\rho_1]} F(s, X_1)\,dY(s) = H^{\rho_1}(t) + \int_{(0,t]} F(s, X_1)\,dY^{\rho_1}(s).$$
The equality of the stochastic integrals in the last two expressions above is an instance of the general identity $G\cdot Y^{\rho} = (G\cdot Y)^{\rho}$ (Proposition 5.39). The case $k = 1$ of the lemma has been proved.
Assume a process $X_k(t)$ solves (7.91). Define
$$\bar{\mathcal{F}}_t = \mathcal{F}_{\rho_k+t}, \quad \bar H(t) = H(\rho_k+t) - H(\rho_k), \quad \bar Y(t) = Y(\rho_k+t) - Y(\rho_k),$$
and $\bar F(t,\omega,\eta) = F(\rho_k+t, \omega, \zeta^{\omega,\eta})$, where the cadlag path $\zeta^{\omega,\eta}$ is defined by
$$\zeta^{\omega,\eta}(s) = \begin{cases} X_k(s), & 0\le s<\rho_k\\ X_k(\rho_k) + \eta(s-\rho_k), & s\ge\rho_k. \end{cases}$$
Now we find a solution $\tilde X$ to the equation
(7.93)
$$\tilde X(t) = \bar H^{\sigma_{k+1}-}(t) + \int_{(0,t]} \bar F(s, \tilde X)\,d\bar Y^{\sigma_{k+1}-}(s)$$
under the filtration $\{\bar{\mathcal{F}}_t\}$. We need to check that this equation is of the type to which Proposition 7.33 applies. The semimartingale $\bar Y^{\sigma_{k+1}-}$ satisfies the assumption of Proposition 7.33, again by the argument already used in the uniqueness proof. $\bar F$ satisfies Assumption 7.14 exactly as was proved earlier for Lemma 7.31.
The hypotheses of Proposition 7.33 have been checked, and so there exists a process $\tilde X$ that solves (7.93). Note that $\tilde X(0) = \bar H(0) = 0$. Define
$$X_{k+1}(t) = \begin{cases} X_k(t), & t<\rho_k\\ X_k(\rho_k) + \tilde X(t-\rho_k), & \rho_k\le t<\rho_{k+1}\\ X_{k+1}(\rho_{k+1}-) + \Delta H(\rho_{k+1}) + F(\rho_{k+1}, X_{k+1})\,\Delta Y(\rho_{k+1}), & t\ge\rho_{k+1}. \end{cases}$$
The last case of the definition above makes sense because it depends on the segment $\{X_{k+1}(s): 0\le s<\rho_{k+1}\}$ defined by the two preceding cases.

By induction $X_{k+1}$ satisfies equation (7.91) for $k+1$ on $[0,\rho_k]$. From the definition of $\bar F$, $\bar F(s, \tilde X) = F(\rho_k+s, X_{k+1})$ for $0\le s\le\sigma_{k+1}$. Then for $\rho_k\le t<\rho_{k+1}$,
$$X_{k+1}(t) = X_k(\rho_k) + \tilde X(t-\rho_k) = X_k(\rho_k) + \bar H(t-\rho_k) + \int_{(0,t-\rho_k]}\bar F(s, \tilde X)\,d\bar Y(s)$$
$$= X_k(\rho_k) + H(t) - H(\rho_k) + \int_{(\rho_k,t]} F(s, X_{k+1})\,dY(s) = H(t) + \int_{(0,t]} F(s, X_{k+1})\,dY(s)$$
$$= H^{\rho_{k+1}}(t) + \int_{(0,t]} F(s, X_{k+1})\,dY^{\rho_{k+1}}(s).$$
The last line of the definition of $X_{k+1}$ extends the validity of the equation to $t\ge\rho_{k+1}$:
$$X_{k+1}(t) = X_{k+1}(\rho_{k+1}-) + \Delta H(\rho_{k+1}) + F(\rho_{k+1}, X_{k+1})\,\Delta Y(\rho_{k+1})$$
$$= H(\rho_{k+1}-) + \Delta H(\rho_{k+1}) + \int_{(0,\rho_{k+1})} F(s, X_{k+1})\,dY(s) + F(\rho_{k+1}, X_{k+1})\,\Delta Y(\rho_{k+1})$$
$$= H(\rho_{k+1}) + \int_{(0,\rho_{k+1}]} F(s, X_{k+1})\,dY(s) = H^{\rho_{k+1}}(t) + \int_{(0,t]} F(s, X_{k+1})\,dY^{\rho_{k+1}}(s).$$
By induction, the lemma has been proved.
We are ready to finish off the proof of Theorem 7.37. If $k<m$, stopping the processes of the equation
$$X_m(t) = H^{\rho_m}(t) + \int_{(0,t]} F(s, X_m)\,dY^{\rho_m}(s)$$
at $\rho_k$ gives the equation
$$X_m^{\rho_k}(t) = H^{\rho_k}(t) + \int_{(0,t]} F(s, X_m^{\rho_k})\,dY^{\rho_k}(s).$$
By the uniqueness theorem, $X_m^{\rho_k} = X_k$ for $k<m$. Consequently there exists a process X that satisfies $X = X_k$ on $[0,\rho_k]$ for each k. Then for $0\le t\le\rho_k$, equation (7.91) agrees term by term with the desired equation
$$X(t) = H(t) + \int_{(0,t]} F(s, X)\,dY(s).$$
Hence this equation is valid on every $[0,\rho_k]$, and thereby on $[0,\infty)$. The existence and uniqueness theorem has been proved.
Exercises
Exercise 7.1. (a) Show that for any $g\in C[0,1]$,
$$\lim_{t\nearrow 1}\,(1-t)\int_0^t \frac{g(s)}{(1-s)^2}\,ds = g(1).$$
(b) Let the process $X_t$ be defined by (7.13) for $0\le t<1$. Show that $X_t\to 0$ as $t\to 1$.

Hint. Apply Exercise 6.6 and then part (a).
Exercise 7.2. Let B be standard one-dimensional Brownian motion. Find a solution $Y_t$ to the SDE
$$dY = r\,dt + Y\,dB$$
with a given initial value $Y_0$ independent of the Brownian motion.

Hint. You might try a suitable integrating factor to guess at a solution. Then check your solution by using Itô's formula to compute $dY_t$.
Exercise 7.3. Show that the proof of Theorem 7.11 continues to work if the Lipschitz assumption on $\sigma$ and b is weakened to this local Lipschitz condition: for each $n\in\mathbb{N}$ there exists a constant $L_n<\infty$ such that
(7.94)
$$|b(t,x) - b(t,y)| + |\sigma(t,x) - \sigma(t,y)| \le L_n|x-y|$$
for all $t\in\mathbb{R}_+$ and $x, y\in\mathbb{R}^d$ with $|x|\vee|y|\le n$.
Exercise 7.4. (a) Let Y be a cadlag semimartingale. Find [[Y]] (the quadratic variation of the quadratic variation) and the covariation [Y, [Y]]. For use in part (b) you should find the simplest possible formulas in terms of the jumps of Y.

(b) Let Y be a continuous semimartingale. Show that
$$X_t = X_0\exp\big(\mu t + \sigma Y_t - \tfrac12\sigma^2[Y]_t\big)$$
solves the SDE
$$dX = \mu X\,dt + \sigma X\,dY.$$
Exercise 7.5. Here is an alternative inductive proof of the identity (7.85) used in the existence proof for solutions of SDEs. Fix $-1<x<1$ and let
$$a_k = \sum_{m=0}^\infty (m+1)(m+2)\cdots(m+k)\,x^m \quad\text{and}\quad b_k = \sum_{m=0}^\infty m(m+1)(m+2)\cdots(m+k)\,x^m.$$
Compute $a_1$ explicitly, then derive the identities $b_k = x\,a_{k+1}$ and $a_{k+1} = (k+1)a_k + b_k$.
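Before attempting the derivation, the two identities of Exercise 7.5 can be checked numerically (a throwaway sketch with $x = 0.5$ and truncated sums; the truncation point is arbitrary, since the tails are negligible):

```python
from math import prod

x = 0.5
N = 400   # truncation point for the series

def a(k):
    return sum(prod(range(m + 1, m + k + 1)) * x**m for m in range(N))

def b(k):
    return sum(m * prod(range(m + 1, m + k + 1)) * x**m for m in range(N))

assert abs(a(1) - 1 / (1 - x) ** 2) < 1e-9        # a_1 = 1/(1-x)^2
for k in range(1, 5):
    assert abs(b(k) - x * a(k + 1)) < 1e-6
    assert abs(a(k + 1) - (k + 1) * a(k) - b(k)) < 1e-6
print("identities verified")
```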
Chapter 8

Applications of Stochastic Calculus

8.1. Local time
In Section 8.1.1 we construct local time for Brownian motion and then in Section 8.1.2 we derive Tanaka's formula and the distribution of local time.
8.1.1. Existence of local time for Brownian motion. $(\Omega, \mathcal{F}, P)$ is a fixed complete probability space with a complete filtration $\{\mathcal{F}_t\}$, and $B = \{B_t\}$ a one-dimensional Brownian motion with respect to the filtration $\{\mathcal{F}_t\}$ (Definition 2.26). By redefining B on an event of probability zero if necessary, we can assume that $B_t(\omega)$ is continuous in t for each $\omega$. The initial point $B_0$ is an arbitrary, possibly random point in $\mathbb{R}$.
We construct the local time process L(t, x) of Brownian motion. This process measures the time spent at point x up to time t. Note that it does not make sense to look literally at the amount of time spent at x up to time t: this would be the random variable $\int_0^t \mathbf{1}\{B_s = x\}\,ds$, which vanishes almost surely (Exercise 8.1). The proper thing to look for is a Radon–Nikodym derivative with respect to Lebesgue measure. This idea is captured by the requirement that
(8.1)
$$\int_0^t g(B_s(\omega))\,ds = \int_{\mathbb{R}} g(x)\,L(t, x, \omega)\,dx$$
for all bounded Borel functions g. Here is the existence theorem that summarizes the main properties.
270 8. Applications of Stochastic Calculus
Theorem 8.1. There exists a process $\{L(t, x, \omega): t\in\mathbb{R}_+, x\in\mathbb{R}\}$ on $(\Omega, \mathcal{F}, P)$ and an event $\Omega_0$ such that $P(\Omega_0) = 1$, with the following properties. L(t, x) is $\mathcal{F}_t$-measurable for each (t, x). For each $\omega\in\Omega_0$ the following statements hold.

(a) $L(t, x, \omega)$ is jointly continuous in (t, x) and nondecreasing in t.

(b) Identity (8.1) holds for all $t\in\mathbb{R}_+$ and bounded Borel functions g on $\mathbb{R}$.

(c) For each fixed $x\in\mathbb{R}$,
(8.2)
$$\int_0^\infty \mathbf{1}\{B_t(\omega)\ne x\}\,L(dt, x, \omega) = 0.$$

The integral in (8.2) is the Lebesgue–Stieltjes integral over time whose integrator is the nondecreasing function $t\mapsto L(t, x, \omega)$. The continuity of L(t, x) and (8.1) imply a pointwise characterization of local time as
(8.3)
$$L(t, x, \omega) = \lim_{\varepsilon\searrow 0}\frac{1}{2\varepsilon}\int_0^t \mathbf{1}_{(x-\varepsilon,\,x+\varepsilon)}(B_s(\omega))\,ds \quad\text{for } \omega\in\Omega_0.$$
We do not base the proof of Theorem 8.1 on a direct construction such as (8.3), intuitively attractive as it is. But note that (8.3) does imply the claim about the monotonicity of $L(t, x, \omega)$ as a function of t.
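The occupation-density identity (8.1) and the window average (8.3) are easy to illustrate by simulation. The Python sketch below is my own illustration, not part of the construction: the seed, window width, test function and grid are arbitrary. It estimates $L(T, x)$ from a discretized Brownian path via (8.3) and then compares the two sides of (8.1).

```python
import numpy as np

rng = np.random.default_rng(1)
T, n = 1.0, 200_000
dt = T / n
B = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n))))

# Window-average estimate of L(T, x), cf. (8.3).
eps, dx = 0.02, 0.01
xs = np.arange(-3.0, 3.0, dx)
L = np.array([np.sum(np.abs(B - x) < eps) * dt / (2 * eps) for x in xs])

# Both sides of the occupation identity (8.1) with g(x) = exp(-x^2).
lhs = np.sum(np.exp(-B ** 2)) * dt          # time integral of g(B_s)
rhs = np.sum(np.exp(-xs ** 2) * L) * dx     # space integral of g(x) L(T, x)
print(lhs, rhs)   # the two occupation integrals nearly agree
```

The small discrepancy comes from the finite window and grid; it shrinks as the discretization is refined, consistent with the limit in (8.3).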
We shall first give a precise definition of L(t, x), give some intuitive justification for it, and then prove Theorem 8.1. We use stochastic integrals of the type $\int_0^t \mathbf{1}_{[x,\infty)}(B_s)\,dB_s$. To justify their well-definedness, note that the integrand $\mathbf{1}_{[x,\infty)}(B_s(\omega))$ is predictable, by Exercise 5.2(b). Alternately, it is enough to observe that the integrand is a measurable process and appeal to the fact that measurability is enough for integration with Brownian motion (Chapter 4 or Section 5.5).

To define L(t, x) we need a nice version of these integrals. We postpone the proof of this lemma to the end of the section.

Lemma 8.2. There exists a process $\{I(t, x): t\in\mathbb{R}_+, x\in\mathbb{R}\}$ such that $(t, x)\mapsto I(t, x, \omega)$ is continuous for all $\omega$ and, for each (t, x),
(8.4)
$$I(t, x) = \int_0^t \mathbf{1}_{[x,\infty)}(B_s)\,dB_s \quad\text{with probability 1.}$$
Define the process L(t, x) by the equation
(8.5)
$$\tfrac12 L(t, x, \omega) = (B_t(\omega)-x)^+ - (B_0(\omega)-x)^+ - I(t, x, \omega).$$
$L(t, x, \omega)$ is continuous in (t, x) for each $\omega$ because the terms on the right are continuous.
8.1. Local time 271
Heuristic justification. Let us explain informally the idea behind this definition. The limit in (8.3) captures the idea of local time as a derivative with respect to Lebesgue measure. But Brownian paths are rough, so we prefer to take the $\varepsilon\searrow 0$ limit in a situation with more regularity. Itô's formula gives us a way out. Among the terms that come out of applying Itô's formula to a function $\varphi$ is the integral $\frac12\int_0^t \varphi''(B_s)\,ds$. So fix a small $\varepsilon>0$ and take $\varphi''(z) = (2\varepsilon)^{-1}\mathbf{1}_{(x-\varepsilon,x+\varepsilon)}(z)$, so that Itô's formula captures the integral on the right of (8.3). Two integrations show that $\varphi$ should satisfy
$$\varphi'(z) = \begin{cases} 0, & z<x-\varepsilon\\ (2\varepsilon)^{-1}(z-x+\varepsilon), & x-\varepsilon\le z\le x+\varepsilon\\ 1, & z>x+\varepsilon \end{cases}$$
and
$$\varphi(z) = \begin{cases} 0, & z<x-\varepsilon\\ (4\varepsilon)^{-1}(z-x+\varepsilon)^2, & x-\varepsilon\le z\le x+\varepsilon\\ z-x, & z>x+\varepsilon. \end{cases}$$
However, $\varphi''$ is not continuous so we cannot apply Itô's formula to $\varphi$. But if we could, the outcome would be
(8.6)
$$\frac{1}{4\varepsilon}\int_0^t \mathbf{1}_{(x-\varepsilon,x+\varepsilon)}(B_s(\omega))\,ds = \varphi(B_t) - \varphi(B_0) - \int_0^t \varphi'(B_s)\,dB_s.$$
As $\varepsilon\searrow 0$, $\varphi(z)\to(z-x)^+$ and $\varphi'(z)\to\mathbf{1}_{[x,\infty)}(z)$, except at $z = x$, but this single point would not make a difference to the stochastic integrals. The upshot is that definition (8.5) represents the unjustified $\varepsilon\to 0$ limit of the unjustified equation (8.6).
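The mollified functions $\varphi$, $\varphi'$, $\varphi''$ above can be checked by direct numerical integration (a quick sketch of mine; the choices of $x$, $\varepsilon$ and the grid are arbitrary):

```python
import numpy as np

x, eps = 0.0, 0.1
zs = np.linspace(-1.0, 1.0, 20001)
dz = zs[1] - zs[0]

phi2 = (np.abs(zs - x) < eps) / (2 * eps)   # phi'' from the heuristic
phi1 = np.cumsum(phi2) * dz                 # integrate once -> phi'
phi = np.cumsum(phi1) * dz                  # integrate twice -> phi

# Closed forms from the text:
phi1_exact = np.clip((zs - x + eps) / (2 * eps), 0.0, 1.0)
phi_exact = np.where(zs < x - eps, 0.0,
            np.where(zs <= x + eps, (zs - x + eps) ** 2 / (4 * eps), zs - x))

print(np.max(np.abs(phi1 - phi1_exact)), np.max(np.abs(phi - phi_exact)))
```

Both maximum errors are of the order of the grid spacing, confirming that the displayed formulas are indeed the two antiderivatives of $\varphi''$.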
Returning to the rigorous proof, next we show that L(t, x) functions as an occupation density.

Proposition 8.3. There exists an event $\Omega_0$ such that $P(\Omega_0) = 1$ and for all $\omega\in\Omega_0$ the process $L(t, x, \omega)$ defined by (8.5) satisfies (8.1) for all $t\in\mathbb{R}_+$ and all bounded Borel functions g on $\mathbb{R}$.

Proof. We show that (8.1) holds for a special class of functions. The class we choose comes from the proof given in [9]. The rest is the usual clean-up.

Step 1. We verify that (8.1) holds almost surely for a fixed t and a fixed piecewise linear continuous function $g = g_{q,r,\varepsilon}$ of the following type:
(8.7)
$$g(x) = \begin{cases} 0, & x\le q-\varepsilon \text{ or } x\ge r+\varepsilon\\ (x-q+\varepsilon)/\varepsilon, & q-\varepsilon\le x\le q\\ 1, & q\le x\le r\\ (r+\varepsilon-x)/\varepsilon, & r\le x\le r+\varepsilon \end{cases}$$
where $q<r$ and $\varepsilon>0$ are rationals. As $\varepsilon$ becomes small, g approximates the indicator function of the closed interval [q, r]. Abbreviate $a = q-\varepsilon$ and $b = r+\varepsilon$, so that g vanishes outside [a, b].

We need a Fubini theorem of sorts for stochastic integrals. We sketch the proof of this lemma after the main proof.

Lemma 8.4. The equality
(8.8)
$$\int_0^t\Big(\int_a^b g(x)\,\mathbf{1}_{[x,\infty)}(B_s)\,dx\Big)\,dB_s = \int_a^b g(x)\,I(t, x)\,dx$$
holds almost surely.
The trick is now to write down a version of Itô's formula where the integral $\int_0^t g(B_s)\,ds$ appears as the quadratic variation part. To this end, define
$$f(z) = \int_{-\infty}^z dy\int_{-\infty}^y dx\,g(x) = \int_a^b g(x)(z-x)^+\,dx$$
and note that
$$f'(z) = \int_{-\infty}^z g(x)\,dx = \int_a^b g(x)\,\mathbf{1}_{[x,\infty)}(z)\,dx \quad\text{and}\quad f''(z) = g(z).$$
By Itô's formula
$$\tfrac12\int_0^t g(B_s)\,ds = f(B_t) - f(B_0) - \int_0^t f'(B_s)\,dB_s$$
$$= \int_a^b g(x)\big[(B_t-x)^+ - (B_0-x)^+\big]\,dx - \int_0^t\Big(\int_a^b g(x)\,\mathbf{1}_{[x,\infty)}(B_s)\,dx\Big)\,dB_s$$
$$= \int_a^b g(x)\big[(B_t-x)^+ - (B_0-x)^+ - I(t, x)\big]\,dx \qquad\text{[by (8.8)]}$$
$$= \tfrac12\int_a^b g(x)\,L(t, x)\,dx. \qquad\text{[by (8.5)]}$$
Comparing the first and last expressions of the calculation verifies (8.1) almost surely for a particular g of type (8.7) and a fixed t.
Step 2. Let $\Omega_0$ be the event of full probability on which (8.1) holds for all rational $t\ge 0$ and the countably many $g_{q,r,\varepsilon}$ for rational $(q, r, \varepsilon)$. For a fixed $g_{q,r,\varepsilon}$ both sides of (8.1) are continuous in t. (The right side because $L(t, x, \omega)$ is uniformly continuous for (t, x) in a fixed rectangle which can be taken large enough to contain the support of g.) Consequently on $\Omega_0$, (8.1) extends from rational t to all $t\in\mathbb{R}_+$.

At this point we can see that $L(t, x, \omega)\ge 0$. If $L(t, x, \omega)<0$ then by continuity there exist $\delta>0$ and an interval (a, b) around x such that $L(t, y, \omega)\le-\delta$ for $y\in(a, b)$. But for g of type (8.7) supported inside (a, b) we have
$$\int_{\mathbb{R}} g(x)\,L(t, x, \omega)\,dx = \int_0^t g(B_s(\omega))\,ds \ge 0,$$
a contradiction.

To extend (8.1) to all functions, with $\omega\in\Omega_0$ fixed, think of both sides of (8.1) as defining a measure on $\mathbb{R}$. By letting $q\to-\infty$ and $r\to\infty$ in $g_{q,r,\varepsilon}$ we see that both sides define finite measures with total mass t. By letting $\varepsilon\to 0$ we see that the two measures agree on closed intervals [q, r] with rational endpoints. These intervals form a $\pi$-system that generates $\mathcal{B}_{\mathbb{R}}$. By Lemma B.3 the two measures agree on all Borel sets, and consequently integrals of bounded Borel functions agree also.
It remains to prove claim (8.2). By Exercise 1.2 it suffices to show that if t is a point of strict increase of $L(\cdot, x, \omega)$, with $x\in\mathbb{R}$ and $\omega\in\Omega_0$ fixed, then $B_t(\omega) = x$. So suppose t is such a point, so that in particular $L(t, x, \omega) < L(t_1, x, \omega)$ for all $t_1>t$. It follows from (8.3) that for all small enough $\varepsilon>0$,
$$\int_0^t \mathbf{1}_{(x-\varepsilon,x+\varepsilon)}(B_s(\omega))\,ds < \int_0^{t_1}\mathbf{1}_{(x-\varepsilon,x+\varepsilon)}(B_s(\omega))\,ds,$$
which would not be possible unless there exists $s\in(t, t_1)$ such that $B_s(\omega)\in(x-\varepsilon, x+\varepsilon)$. Take a sequence $\varepsilon_j\searrow 0$ and for each $\varepsilon_j$ pick $s_j\in(t, t_1)$ such that $B_{s_j}(\omega)\in(x-\varepsilon_j, x+\varepsilon_j)$. By compactness there is a convergent subsequence $s_{j_k}\to s\in[t, t_1]$. By path-continuity $B_{s_{j_k}}(\omega)\to B_s(\omega)$ and by the choice of the $s_j$'s, $B_{s_{j_k}}(\omega)\to x$. We conclude that for each $t_1>t$ there exists $s\in[t, t_1]$ such that $B_s(\omega) = x$. Taking $t_1\searrow t$ gives $B_t(\omega) = x$.
We have now proved all the claims of Theorem 8.1. It remains to prove the two lemmas used along the way. $\square$

Proof of Lemma 8.4. Recall that we are proving the almost sure identity
\[
\int_0^t \Bigl( \int_a^b g(x)\,\mathbf{1}_{[x,\infty)}(B_s)\,dx \Bigr)\,dB_s = \int_a^b g(x)\, I(t,x)\,dx \tag{8.9}
\]
where $g$ is given by (8.7), $a = q - \varepsilon$ and $b = r + \varepsilon$, and $g$ vanishes outside $[a,b]$.

The left-hand stochastic integral is well-defined because the integrand is a bounded continuous process. The continuity can be seen from
\[
\int_a^b g(x)\,\mathbf{1}_{[x,\infty)}(B_s)\,dx = \int_{-\infty}^{B_s} g(x)\,dx. \tag{8.10}
\]
Since $g(x)I(t,x)$ is continuous, the integral on the right of (8.9) can be written as a limit of Riemann sums. Introduce partitions $x_i = a + (i/n)(b-a)$, $0 \le i \le n$, with mesh $\eta = (b-a)/n$. Stochastic integrals are linear, so on the right of (8.9) we have almost surely
\[
\lim_{n\to\infty} \sum_{i=1}^n \eta\, g(x_i)\, I(t, x_i)
= \lim_{n\to\infty} \sum_{i=1}^n \eta\, g(x_i) \int_0^t \mathbf{1}_{[x_i,\infty)}(B_s)\,dB_s
= \lim_{n\to\infty} \int_0^t \sum_{i=1}^n \eta\, g(x_i)\, \mathbf{1}_{[x_i,\infty)}(B_s)\,dB_s.
\]
By the $L^2$ isometry of stochastic integration we can assert that this limit equals the stochastic integral on the left of (8.9) if
\[
\lim_{n\to\infty} \int_0^t E\biggl[\Bigl( \int_a^b g(x)\,\mathbf{1}_{[x,\infty)}(B_s)\,dx - \sum_{i=1}^n \eta\, g(x_i)\, \mathbf{1}_{[x_i,\infty)}(B_s) \Bigr)^{\!2}\biggr]\,ds = 0.
\]
We leave this as an exercise. $\square$
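The exercise can be sanity-checked numerically. A minimal sketch (the hat-shaped $g$ below is an ad hoc stand-in for a function of type (8.7), and all grid sizes are arbitrary choices): for each fixed value $\beta$ of $B_s$, the Riemann sum approximates $F(\beta) = \int_a^b g(x)\mathbf{1}\{x \le \beta\}\,dx$ with an error that is uniform in $\beta$ and vanishes with the mesh, which is what the displayed $L^2$ limit requires.

```python
import numpy as np

# Hypothetical stand-in for a function of type (8.7): a continuous "hat"
# supported on [a, b].  beta plays the role of a fixed value of B_s.
a, b = 0.0, 1.0
def g(x):
    return np.maximum(0.0, 1.0 - np.abs(2.0 * x - 1.0))

def riemann(beta, n):
    # Right-endpoint Riemann sum with mesh eta = (b - a)/n over the
    # partition x_i = a + (i/n)(b - a), approximating
    # F(beta) = \int_a^b g(x) 1{x <= beta} dx.
    eta = (b - a) / n
    xi = a + np.arange(1, n + 1) * eta
    return eta * np.sum(g(xi) * (xi <= beta))

# Treat a very fine sum as the exact value F(beta).
betas = np.linspace(-0.5, 1.5, 41)
exact = np.array([riemann(bt, 400_000) for bt in betas])

def max_err(n):
    return max(abs(riemann(bt, n) - ex) for bt, ex in zip(betas, exact))

# The error decreases with the mesh, uniformly over beta.
assert max_err(2_000) < max_err(100) < max_err(10)
assert max_err(2_000) < 1e-3
```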
Proof of Lemma 8.2. Abbreviate $M(t,x) = \int_0^t \mathbf{1}_{[x,\infty)}(B_s)\,dB_s$. Recall that the task is to establish the existence of a continuous version of $M(t,x)$ as a function of $(t,x) \in \mathbb{R}_+ \times \mathbb{R}$. We will show that for any $T < \infty$ and $m \in \mathbb{N}$ there exists a finite constant $C = C(T,m)$ so that
\[
E\bigl[\,|M(s,x) - M(t,y)|^{2m}\,\bigr] \;\le\; C\,|(s,x)-(t,y)|^{m} \tag{8.11}
\]
for $(s,x), (t,y) \in [0,T] \times \mathbb{R}$. After (8.11) has been shown, the lemma follows from the Kolmogorov-Centsov criterion (Theorem B.17). Precisely speaking, here is what needs to be said to derive the lemma from Theorem B.17.

The index set has dimension two. So if we take $m > 2$, Theorem B.17 implies that for each positive integer $k$ the process $M(t,x)$ has a continuous version $\{Y^{(k)}(t,x) : (t,x) \in [0,k] \times [-k,k]\}$ on the compact index set $[0,k] \times [-k,k]$. The processes $Y^{(k)}$ are then combined to yield a single continuous process indexed by $\mathbb{R}_+ \times \mathbb{R}$ via the following reasoning. If $k < \ell$, then for rational $(t,x) \in [0,k] \times [-k,k]$ the equality $Y^{(k)}(t,x) = M(t,x) = Y^{(\ell)}(t,x)$ holds with probability 1. Since the rationals are countable, there is a single event $\Omega_{k,\ell}$ such that $P(\Omega_{k,\ell}) = 1$ and for $\omega \in \Omega_{k,\ell}$, $Y^{(k)}(t,x,\omega) = Y^{(\ell)}(t,x,\omega)$ for all rational $(t,x) \in [0,k] \times [-k,k]$. All values of a continuous function are completely determined by its values on a dense set, hence for $\omega \in \Omega_{k,\ell}$, $Y^{(k)}(t,x,\omega) = Y^{(\ell)}(t,x,\omega)$ for all $(t,x) \in [0,k] \times [-k,k]$. Let $\Omega_0 = \bigcap_{k<\ell} \Omega_{k,\ell}$. Then $P(\Omega_0) = 1$, and for $\omega \in \Omega_0$ we can consistently define $I(t,x,\omega) = Y^{(k)}(t,x,\omega)$ for any $k$ such that $(t,x) \in [0,k] \times [-k,k]$. Outside $\Omega_0$ we can define for example $I(t,x,\omega) \equiv 0$ to get a process that is continuous in $(t,x)$ at all $\omega$.
We turn to prove (8.11). We can assume $s \le t$. First using additivity of stochastic integrals and abbreviating $a = x \wedge y$, $b = x \vee y$, almost surely
\[
M(s,x) - M(t,y) = \mathrm{sign}(y-x) \int_0^s \mathbf{1}_{[a,b)}(B)\,dB \;-\; \int_s^t \mathbf{1}_{[y,\infty)}(B)\,dB.
\]
Using $|u+v|^k \le 2^k(|u|^k + |v|^k)$ and then the inequality in Proposition 6.26,
\[
\begin{aligned}
E\,|M(s,x) - M(t,y)|^{2m}
&\le C\,E\Bigl[\,\Bigl| \int_0^s \mathbf{1}_{[a,b)}(B)\,dB \Bigr|^{2m}\Bigr]
 + C\,E\Bigl[\,\Bigl| \int_s^t \mathbf{1}_{[y,\infty)}(B)\,dB \Bigr|^{2m}\Bigr] \\
&\le C\,E\Bigl[\Bigl( \int_0^s \mathbf{1}_{[a,b)}(B_u)\,du \Bigr)^{\!m}\Bigr]
 + C\,E\Bigl[\Bigl( \int_s^t \mathbf{1}_{[y,\infty)}(B_u)\,du \Bigr)^{\!m}\Bigr].
\end{aligned} \tag{8.12}
\]
The integrals on the last line above appear because the quadratic variation of $\int_0^s X\,dB$ is $\int_0^s X^2\,du$ (Proposition 6.2). For the second integral we need to combine this with the step given in Theorem 5.45:
\[
\int_s^t \mathbf{1}_{[y,\infty)}(B_u)\,dB_u = \int_0^{t-s} \mathbf{1}_{[y,\infty)}(B_{s+u})\,d\widetilde{B}_u,
\]
where $\widetilde{B}_u = B_{s+u} - B_s$ is the restarted Brownian motion (which of course is again a standard Brownian motion).

The second integral on line (8.12) is immediately bounded by $|t-s|^m$.
estimate the rst integral with the help of independent Brownian increments.
Let us abbreviate du = du
m
du
1
for a multivariate integral and u
j
=
u
j
u
j1
for increments of the time variables, and introduce u
0
= 0 so that
u
1
= u
1
.
Suppose f is a symmetric function of m variables. This means that
permuting the variables does not change the value of f:
f(u
1
, u
2
, . . . , u
m
) = f(u
i
1
, u
i
2
, . . . , u
im
)
for any rearrangement (u
i
1
, u
i
2
, . . . , u
im
) of (u
1
, u
2
, . . . , u
m
). Then
_
s
0

_
s
0
f(u
1
, u
2
, . . . , u
m
) du
= m!
_

_
0<u
1
<<um<s
f(u
1
, u
2
, . . . , u
m
) du.
Check this as an exercise.
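A quick discrete check of this symmetrization identity, for $m = 2$ and the ad hoc symmetric test function $f(u_1, u_2) = u_1 u_2$ on $[0,s]^2$: the integral over the full square should equal $2!$ times the integral over the ordered region $\{u_1 < u_2\}$, up to the diagonal contribution of the grid, which vanishes in the limit.

```python
import numpy as np

# Discrete version of the symmetrization identity for m = 2 with the
# symmetric test function f(u1, u2) = u1 * u2 on [0, s]^2 (ad hoc choice).
s, n = 1.5, 400
u = (np.arange(n) + 0.5) * (s / n)           # midpoint grid on (0, s)
U1, U2 = np.meshgrid(u, u, indexing="ij")
F = U1 * U2                                   # f(u1, u2) = f(u2, u1)
cell = (s / n) ** 2

full = np.sum(F) * cell                       # integral over the square
ordered = np.sum(F[U1 < U2]) * cell           # integral over {u1 < u2}

# Exact value of the full integral is s^4 / 4, and the ordered region
# carries half of it (the grid diagonal accounts for the small gap).
assert abs(full - s**4 / 4) < 1e-3
assert abs(full - 2 * ordered) < 1e-2
```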
Now for the first integral in (8.12), switch around the expectation and the integration and use the above trick, to make it equal
\[
E \int_0^s \!\cdots\! \int_0^s \prod_{i=1}^m \mathbf{1}_{[a,b)}(B_{u_i})\,d\mathbf{u}
= m! \int\!\cdots\!\int_{0 < u_1 < \cdots < u_m < s}
E\bigl[ \mathbf{1}_{[a,b)}(B_{u_1})\,\mathbf{1}_{[a,b)}(B_{u_2}) \cdots \mathbf{1}_{[a,b)}(B_{u_m}) \bigr]\,d\mathbf{u}. \tag{8.13}
\]
We estimate the expectation one indicator at a time, beginning with a conditioning step:
\[
\begin{aligned}
&E\bigl[ \mathbf{1}_{[a,b)}(B_{u_1}) \cdots \mathbf{1}_{[a,b)}(B_{u_{m-1}})\, \mathbf{1}_{[a,b)}(B_{u_m}) \bigr] \\
&\qquad = E\Bigl[ \mathbf{1}_{[a,b)}(B_{u_1}) \cdots \mathbf{1}_{[a,b)}(B_{u_{m-1}})\,
E\bigl( \mathbf{1}_{[a,b)}(B_{u_m}) \,\big|\, \mathcal{F}_{u_{m-1}} \bigr) \Bigr].
\end{aligned}
\]
To handle the conditional expectation, recall that the increment $B_{u_m} - B_{u_{m-1}}$ is independent of $\mathcal{F}_{u_{m-1}}$ and has the $\mathcal{N}(0, \Delta u_m)$ distribution. Use property (x) of conditional expectations given in Theorem 1.26:
\[
\begin{aligned}
E\bigl( \mathbf{1}_{[a,b)}(B_{u_m}) \,\big|\, \mathcal{F}_{u_{m-1}} \bigr)
&= E\bigl( \mathbf{1}_{[a,b)}(B_{u_{m-1}} + B_{u_m} - B_{u_{m-1}}) \,\big|\, \mathcal{F}_{u_{m-1}} \bigr) \\
&= \int_{-\infty}^{\infty} \mathbf{1}_{[a,b)}(B_{u_{m-1}} + x)\,
\frac{e^{-x^2/(2\Delta u_m)}}{\sqrt{2\pi \Delta u_m}}\,dx
\;\le\; \frac{b-a}{\sqrt{2\pi \Delta u_m}}.
\end{aligned}
\]
In the last upper bound the exponential was simply dropped (it is $\le 1$) and then the integral taken.

This step can be applied repeatedly to the expectation in (8.13) to replace each indicator with the upper bound $(b-a)/\sqrt{2\pi \Delta u_i}$. At the end of this step,
\[
\text{line (8.13)} \;\le\; (2\pi)^{-m/2}\, m!\,(b-a)^m
\int\!\cdots\!\int_{0 < u_1 < \cdots < u_m < s} \frac{d\mathbf{u}}{\sqrt{\Delta u_1 \cdots \Delta u_m}}.
\]
The integral can be increased by letting $s$ increase to $T$, then integrated to a constant that depends on $T$ and $m$, and we get
\[
\text{line (8.13)} \;\le\; C(m,T)\,(b-a)^m.
\]
Tracing the development back up we see that this is an upper bound for the first integral on line (8.12).

We now have for line (8.12) the upper bound $|t-s|^m + C(m,T)(b-a)^m$, which is bounded by $C(m,T)\,|(s,x)-(t,y)|^{m}$. We have proved inequality (8.11) and thereby the lemma. $\square$
8.1.2. Tanaka's formula and reflection. We begin the next development by extending the defining equation (8.5) of local time to an Ito formula for the convex function defined by the absolute value. The sign function is defined by $\mathrm{sign}(u) = \mathbf{1}\{u > 0\} - \mathbf{1}\{u < 0\}$.

Theorem 8.5 (Tanaka's formula). For each $x \in \mathbb{R}$ we have the identity
\[
|B_t - x| - |B_0 - x| = \int_0^t \mathrm{sign}(B_s - x)\,dB_s + L(t,x) \tag{8.14}
\]
in the sense that the processes indexed by $t \in \mathbb{R}_+$ on either side of the equality sign are indistinguishable.
Proof. All the processes in (8.14) are continuous in $t$ (the stochastic integral by its construction as a limit in $\mathcal{M}_2^c$), hence it suffices to show almost sure equality for a fixed $t$.

Let $\widetilde{L}$ and $\widetilde{I}$ denote the local time and the stochastic integral in (8.4) associated to the Brownian motion $-B$ defined on the same probability space $(\Omega, \mathcal{F}, P)$ as $B$. Let $\widetilde{\Omega}_0$ be the full-probability event on which the properties of Theorem 8.1 hold for $-B$ and $\widetilde{L}$. On $\Omega_0 \cap \widetilde{\Omega}_0$ we can apply (8.3) to both $B$ and $-B$ to get $\widetilde{L}(t,-x) = L(t,x)$, as common sense would dictate. Then
\[
\begin{aligned}
L(t,x) &= \tfrac{1}{2} L(t,x) + \tfrac{1}{2} \widetilde{L}(t,-x) \\
&= (B_t(\omega) - x)^+ - (B_0(\omega) - x)^+ - I(t,x,\omega) \\
&\qquad + (-B_t(\omega) + x)^+ - (-B_0(\omega) + x)^+ - \widetilde{I}(t,-x,\omega) \\
&= |B_t(\omega) - x| - |B_0(\omega) - x|
 - \int_0^t \mathbf{1}_{[x,\infty)}(B_s)\,dB_s - \int_0^t \mathbf{1}_{[-x,\infty)}(-B_s)\,d(-B)_s.
\end{aligned}
\]
The construction of the stochastic integral shows that if the signs of all the $B$-increments are switched, then $\int_0^t X\,d(-B) = -\int_0^t X\,dB$ a.s. Consequently, almost surely,
\[
\begin{aligned}
\int_0^t \mathbf{1}_{[x,\infty)}(B_s)\,dB_s + \int_0^t \mathbf{1}_{[-x,\infty)}(-B_s)\,d(-B)_s
&= \int_0^t \bigl( \mathbf{1}_{[x,\infty)}(B_s) - \mathbf{1}_{(-\infty,x]}(B_s) \bigr)\,dB_s \\
&= \int_0^t \mathrm{sign}(B_s - x)\,dB_s,
\end{aligned}
\]
where in the last equality we used the fact that $\int_0^\infty \mathbf{1}\{B_s = x\}\,ds = 0$ almost surely for any fixed $x$. (And in fact the convention for the value of the sign function at $0$ is immaterial.) Substituting the last integral back up finishes the proof. $\square$
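A Monte Carlo sanity check of Tanaka's formula at $x = 0$, $t = 1$, $B_0 = 0$ (all path counts, step counts, $\varepsilon$ and tolerances below are ad hoc choices, and the Euler sum is only a sketch of the stochastic integral): both the Tanaka right-hand side and an occupation-time approximation of local time should estimate $E[L(1,0)] = E|B_1| = \sqrt{2/\pi}$.

```python
import numpy as np

# Sketch of Tanaka's formula at x = 0, t = 1, B_0 = 0:
#   L(1, 0) = |B_1| - \int_0^1 sign(B_s) dB_s,
# compared with the occupation-time description of local time,
#   L(1, 0) ~ (1/(2*eps)) \int_0^1 1{|B_s| < eps} ds.
rng = np.random.default_rng(0)
n_paths, n_steps, t, eps = 4000, 1000, 1.0, 0.05
dt = t / n_steps
dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
B = np.cumsum(dB, axis=1)
B_prev = np.hstack([np.zeros((n_paths, 1)), B[:, :-1]])  # left endpoints

# Tanaka right-hand side, with an Euler sum for the stochastic integral
tanaka_L = np.abs(B[:, -1]) - np.sum(np.sign(B_prev) * dB, axis=1)
# Occupation-time estimate of the same local time
occup_L = np.sum(np.abs(B) < eps, axis=1) * dt / (2 * eps)

# Both should estimate E[L(1,0)] = E|B_1| = sqrt(2/pi) ~ 0.798
target = np.sqrt(2 / np.pi)
assert abs(tanaka_L.mean() - target) < 0.1
assert abs(occup_L.mean() - target) < 0.1
```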
While local time is a subtle object, its distribution has a surprisingly simple description. Considering a particular fixed Brownian motion $B$, let
\[
M_t = \sup_{0 \le s \le t} B_s \tag{8.15}
\]
be its running maximum, and write $L = \{L_t = L(t,0) : t \in \mathbb{R}_+\}$ for the local time process at the origin.

Theorem 8.6. Let $B$ be a standard Brownian motion. Then we have this equality in distribution of processes:
\[
(M - B,\, M) \overset{d}{=} (|B|,\, L).
\]

The absolute value $|B_t|$ is thought of as Brownian motion reflected at the origin. The first equality $M - B \overset{d}{=} |B|$ indicates that Brownian motion reflects off its running maximum statistically in exactly the same way as off the fixed point $0$. Recall from (2.34) that the reflection principle of Brownian motion gives the fixed-time distributional equality $M_t \overset{d}{=} |B_t|$, but this does not extend to a process-level equality.

To prove Theorem 8.6 we introduce Skorohod's solution to the reflection problem. $C(\mathbb{R}_+)$ is the space of continuous functions on $\mathbb{R}_+$.
Lemma 8.7. Let $b \in C(\mathbb{R}_+)$ be such that $b(0) \ge 0$. Then there is a unique pair $(a, \ell)$ of functions in $C(\mathbb{R}_+)$ determined by these properties:

(i) $a = b + \ell$ and $a(t) \ge 0$ for all $t \in \mathbb{R}_+$.

(ii) $\ell(0) = 0$, $\ell$ is nondecreasing, and
\[
\int_0^\infty a(t)\,d\ell(t) = 0. \tag{8.16}
\]

Remark 8.8. Condition (8.16) is equivalent to $\lambda_\ell\{t \in \mathbb{R}_+ : a(t) > 0\} = 0$, where $\lambda_\ell$ is the Lebesgue-Stieltjes measure of $\ell$. This follows from the basic integration fact that for a nonnegative function $a$, $\int a\,d\lambda_\ell = 0$ if and only if $a = 0$ $\lambda_\ell$-a.e. In particular, if $a > 0$ on $(s,t)$, then by the continuity of $\ell$,
\[
0 = \lambda_\ell(s,t) = \ell(t-) - \ell(s+) = \ell(t) - \ell(s).
\]
The point of (i)-(ii) is that $\ell$ provides the minimal amount of upward push needed to keep $a$ nonnegative: (8.16) says that no push happens while $a > 0$, which would be wasted effort. (See Exercise 8.2 on this point of the optimality of the solution.)
Proof of Lemma 8.7. Existence. Define
\[
\ell(t) = \sup_{s \in [0,t]} b^-(s) \quad \text{and} \quad a(t) = b(t) + \ell(t). \tag{8.17}
\]
We check that the functions $\ell$ and $a$ defined above satisfy the required properties. We leave it as an exercise to check that $\ell$ is continuous.

For part (i) it remains only to observe that
\[
a(t) = b(t) + \ell(t) \ge b(t) + b^-(t) = b^+(t) \ge 0.
\]
For part (ii), $b(0) \ge 0$ implies $\ell(0) = b^-(0) = 0$, and the definition of $\ell$ makes it nondecreasing. By Exercise 1.2, to show (8.16) it suffices to show that if $t$ is a point of strict increase for $\ell$ then $a(t) = 0$. So fix $t$ and suppose that $\ell(t_1) > \ell(t)$ for all $t_1 > t$. From the definition it then follows that for each $k \in \mathbb{N}$ there exists $s_k \in [t, t + k^{-1}]$ such that
\[
0 \le \ell(t) < b^-(s_k) \le \ell(t + k^{-1}). \tag{8.18}
\]
Let $k \to \infty$, which takes $s_k \to t$. From $b^-(s_k) > 0$ it follows that $b(s_k) < 0$, and then by continuity $b(t) \le 0$. Again by continuity, in the limit (8.18) becomes $\ell(t) = b^-(t)$. Consequently $a(t) = b(t) + \ell(t) = -b^-(t) + b^-(t) = 0$.

Uniqueness. Suppose $(a, \ell)$ and $(\bar{a}, \bar{\ell})$ are two pairs of $C(\mathbb{R}_+)$-functions that satisfy (i)-(ii), and $\bar{a}(t) > a(t)$ for some $t > 0$. Let $u = \sup\{s \in [0,t] : \bar{a}(s) = a(s)\}$. By continuity $\bar{a}(u) = a(u)$, and so $u < t$. Then $\bar{a}(s) > a(s)$ for all $s \in (u, t]$, because (again by continuity) $\bar{a} - a$ cannot change sign without passing through $0$.

Since $a(s) \ge 0$ we see that $\bar{a}(s) > 0$ for all $s \in (u, t]$, and then by property (8.16) (or by its equivalent form in Remark 8.8), $\bar{\ell}(t) = \bar{\ell}(u)$. Since $a - b = \ell$ and $\bar{a} - b = \bar{\ell}$ imply $\bar{a} - a = \bar{\ell} - \ell$, and $\ell$ is nondecreasing, we get
\[
\bar{a}(t) - a(t) = \bar{\ell}(t) - \ell(t) \le \bar{\ell}(u) - \ell(u) = \bar{a}(u) - a(u) = 0,
\]
contradicting $\bar{a}(t) > a(t)$.

Thus $\bar{a} \le a$. The roles of $a$ and $\bar{a}$ can be switched in the argument, and we conclude that $\bar{a} = a$. The above implication then gives $\bar{\ell} = \ell$. $\square$
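The explicit solution (8.17) is easy to implement on a grid. A minimal sketch (the sample path below is an arbitrary choice): the running supremum of $b^-$ gives $\ell$, the reflected path $a = b + \ell$ stays nonnegative, and $\ell$ increases only at times where $a = 0$, which is condition (8.16) in discrete form.

```python
import numpy as np

# Discrete sketch of the Skorohod reflection map (8.17): given a sampled
# path b with b(0) >= 0, set ell(t) = sup_{s <= t} b^-(s) and a = b + ell.
def reflect(b):
    ell = np.maximum.accumulate(np.maximum(-b, 0.0))  # running sup of b^-
    return b + ell, ell

# Example path (arbitrary choice): a downward-drifting sine on a grid
tgrid = np.linspace(0.0, 10.0, 2001)
b = np.sin(tgrid) - 0.3 * tgrid
a, ell = reflect(b)

assert np.all(a >= -1e-12)                            # (i): a >= 0
assert ell[0] == 0.0 and np.all(np.diff(ell) >= 0)    # (ii): nondecreasing
# (8.16) in discrete form: ell increases only where a sits at 0
grows = np.diff(ell) > 1e-9
assert grows.any()
assert np.max(a[1:][grows]) < 1e-2
```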
Tanaka's formula (8.14) and the properties of local time established in Theorem 8.1 show that the pair $(|B - x|, L(\,\cdot\,, x))$ solves the reflection problem for the process
\[
\widetilde{B}_t = |B_0 - x| + \int_0^t \mathrm{sign}(B_s - x)\,dB_s. \tag{8.19}
\]
So by (8.17) and the uniqueness part of Lemma 8.7 we get the new identity
\[
L(t,x) = \sup_{0 \le s \le t} \widetilde{B}_s^{\,-}. \tag{8.20}
\]
Levy's criterion (Remark 6.25 based on Theorem 6.24) tells us that $\widetilde{B}$ is a Brownian motion. For this we only need to check the quadratic variation of the stochastic integral:
\[
\Bigl[ \int_0^{\boldsymbol{\cdot}} \mathrm{sign}(B_s - x)\,dB_s \Bigr]_t
= \int_0^t \bigl( \mathrm{sign}(B_s - x) \bigr)^2\,ds = t.
\]
Proof of Theorem 8.6. Take $B_0 = x = 0$ in (8.14) and (8.19), and let $\widehat{B} = -\widetilde{B}$, which is now another standard Brownian motion. (8.14) takes the form $|B| = \widetilde{B} + L$, which implies that $(|B|, L)$ solves the reflection problem for $\widetilde{B}$. Uniqueness and definition (8.17) give
\[
L_t = \sup_{0 \le s \le t} \widetilde{B}_s^{\,-} = \sup_{0 \le s \le t} \widehat{B}_s \equiv \widehat{M}_t,
\]
where the last identity defines the running maximum process $\widehat{M}_t$.

To restate, we have two solutions of the reflection problem for $\widetilde{B}$, namely $(|B|, L)$ and the solution $(\widehat{M} - \widehat{B}, \widehat{M})$ from (8.17). By uniqueness $(|B|, L) = (\widehat{M} - \widehat{B}, \widehat{M})$, and since $(\widehat{M} - \widehat{B}, \widehat{M}) \overset{d}{=} (M - B, M)$, we have the claimed distributional equality $(|B|, L) \overset{d}{=} (M - B, M)$. $\square$
8.2. Change of measure
In this section we take up a new idea: changing a random variable to a different one by changing the probability measure on the underlying sample space. The following elementary example illustrates. Let $X$ be a $\mathcal{N}(0,1)$ random variable defined on a probability space $(\Omega, \mathcal{F}, P)$. Let $\mu \in \mathbb{R}$. On $\Omega$ define the random variable $Z = \exp(\mu X - \mu^2/2)$. Then $Z \ge 0$ and $E(Z) = 1$. Consequently we can use $Z$ as a Radon-Nikodym derivative to define a new probability measure $Q$ on $(\Omega, \mathcal{F})$ by $Q(A) = E(Z \mathbf{1}_A)$ for measurable sets $A \in \mathcal{F}$. Now that we have two measures $P$ and $Q$ on the same measurable space, expectations under these two measures need to be distinguished notationally. A common way to do this is to write $E^P(f) = \int f\,dP$ for expectation under $P$, and similarly $E^Q$ for expectation under $Q$.

To see what happens to the distribution of $X$ when we replace $P$ with $Q$, let us derive the density of $X$ under $Q$. Let $f$ be a bounded Borel function on $\mathbb{R}$. Using the density of $X$ under $P$ we see that
\[
E^Q[f(X)] = E^P[Z f(X)] = E^P\bigl[e^{\mu X - \mu^2/2} f(X)\bigr]
= \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} f(x)\, e^{\mu x - \mu^2/2 - x^2/2}\,dx
= \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} f(x)\, e^{-(x-\mu)^2/2}\,dx.
\]
Consequently, under $Q$, $X$ is a normal variable with mean $\mu$ and variance $1$. The switch from $P$ to $Q$ added the mean $\mu$ to $X$. The transformation works also in the opposite direction: if we start with $Q$ and then define $P$ by $dP = Z^{-1}\,dQ$, the switch from $Q$ to $P$ removes the mean from $X$.
Our goal is to achieve the same with Brownian motion. By a change of
the underlying measure with an explicitly given Radon-Nikodym derivative
we can add a drift to a Brownian motion or remove a drift.
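The opening example is easy to verify by simulation. A sketch ($\mu$, the sample size and the tolerances are arbitrary choices): sampling $X$ under $P$ and reweighting by $Z = \exp(\mu X - \mu^2/2)$ reproduces the moments of a $\mathcal{N}(\mu, 1)$ variable.

```python
import numpy as np

# Numeric sketch of the opening example: X ~ N(0,1) under P, and
# Z = exp(mu*X - mu^2/2) used as a Radon-Nikodym weight.  Reweighted
# expectations E^Q[f(X)] = E^P[Z f(X)] match a N(mu,1) variable.
rng = np.random.default_rng(1)
mu, n = 0.7, 400_000
X = rng.normal(size=n)               # samples under P
Z = np.exp(mu * X - mu**2 / 2)       # Radon-Nikodym derivative dQ/dP

assert abs(Z.mean() - 1.0) < 0.01                    # E^P[Z] = 1
assert abs((Z * X).mean() - mu) < 0.02               # E^Q[X] = mu
assert abs((Z * X**2).mean() - (1 + mu**2)) < 0.05   # E^Q[X^2] = 1 + mu^2
```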
Let now $(\Omega, \mathcal{F}, P)$ be a complete probability space with a complete filtration $\{\mathcal{F}_t\}$, and let $B(t) = [B_1(t), \dots, B_d(t)]^T$ be a standard $d$-dimensional Brownian motion with respect to the filtration $\{\mathcal{F}_t\}$. Let $H(t) = (H_1(t), \dots, H_d(t))$ be an $\mathbb{R}^d$-valued adapted, measurable process such that
\[
\int_0^T |H(t,\omega)|^2\,dt < \infty \quad \text{for all } T < \infty, \text{ for } P\text{-almost every } \omega. \tag{8.21}
\]
Under these conditions the real-valued stochastic integral
\[
\int_0^t H(s) \cdot dB(s) = \sum_{i=1}^d \int_0^t H_i(s)\,dB_i(s)
\]
is well-defined, as explained in either Chapter 4 or Section 5.5. Define the stochastic exponential
\[
Z_t = \exp\Bigl( \int_0^t H(s) \cdot dB(s) - \frac{1}{2} \int_0^t |H(s)|^2\,ds \Bigr). \tag{8.22}
\]
Here it is important that $|x| = (x_1^2 + \cdots + x_d^2)^{1/2}$ is the Euclidean norm. By using a continuous version of the stochastic integral we see that $Z_t$ is a continuous process. Ito's formula shows that
\[
Z_t = 1 + \int_0^t Z_s\, H(s) \cdot dB(s) \tag{8.23}
\]
and so $Z_t$ is a continuous local martingale.
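For the ad hoc choice of a constant scalar integrand $H(s) \equiv h$ with $d = 1$, formula (8.22) reduces to $Z_t = \exp(h B_t - h^2 t/2)$, and the martingale property $E[Z_t] = 1$ can be checked by simulation, both for this closed form and for an Euler scheme applied to the SDE $dZ = Z h\,dB$ from (8.23). All parameters and tolerances below are arbitrary.

```python
import numpy as np

# Sketch with H(s) = h constant, d = 1, so Z_t = exp(h*B_t - h^2 t/2).
# We simulate the closed form (8.22) and an Euler scheme for (8.23);
# both should have mean E[Z_t] = 1.
rng = np.random.default_rng(2)
h, t, n_steps, n_paths = 1.0, 1.0, 400, 100_000
dt = t / n_steps

B = np.zeros(n_paths)
Z_euler = np.ones(n_paths)
for _ in range(n_steps):
    dB = rng.normal(0.0, np.sqrt(dt), size=n_paths)
    B += dB
    Z_euler += Z_euler * h * dB          # Euler step for dZ = Z h dB

Z_exact = np.exp(h * B - h * h * t / 2)  # stochastic exponential (8.22)

assert abs(Z_exact.mean() - 1.0) < 0.03
assert abs(Z_euler.mean() - 1.0) < 0.03
```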
It will be of fundamental importance for the sequel to make sure that $Z_t$ is a martingale. This can be guaranteed by strong enough assumptions on $H$. Let us return to this point later and continue now under the assumption that $Z_t$ is a martingale. Then
\[
E Z_t = E Z_0 = 1 \quad \text{for all } t \ge 0.
\]
Thus, as in the opening paragraph, each $Z_t$ qualifies as a Radon-Nikodym derivative that gives a new probability measure $Q_t$. If we restrict $Q_t$ to $\mathcal{F}_t$, the entire family $\{Q_t\}_{t \in \mathbb{R}_+}$ acquires a convenient consistency property. So for each $t \in \mathbb{R}_+$ let us define a probability measure $Q_t$ on $(\Omega, \mathcal{F}_t)$ by
\[
Q_t(A) = E^P[\mathbf{1}_A Z_t] \quad \text{for } A \in \mathcal{F}_t. \tag{8.24}
\]
Then by the martingale property, for $s < t$ and $A \in \mathcal{F}_s$,
\[
Q_t(A) = E^P[\mathbf{1}_A Z_t] = E^P\bigl[\mathbf{1}_A\, E^P(Z_t \,|\, \mathcal{F}_s)\bigr] = E^P[\mathbf{1}_A Z_s] = Q_s(A). \tag{8.25}
\]
This is the consistency property.

Additional conditions are needed if we wish to work with a single measure $Q$ instead of the family $\{Q_t\}_{t \in \mathbb{R}_+}$. But as long as our computations are restricted to a fixed finite time horizon $[0,T]$ we have all that is needed: the consistency property (8.25) allows us to work with just the one measure $Q_T$.
Continuing towards the main result, define the continuous, adapted $\mathbb{R}^d$-valued process $W$ by
\[
W(t) = B(t) - \int_0^t H(s)\,ds.
\]
The theorem, due to Cameron, Martin and Girsanov, states that under the transformed measure $W$ is a Brownian motion.

Theorem 8.9. Fix $0 < T < \infty$. Assume that $\{Z_t : t \in [0,T]\}$ is a martingale and $H$ satisfies (8.21), and define the probability measure $Q_T$ on $\mathcal{F}_T$ by (8.24). Then on the probability space $(\Omega, \mathcal{F}_T, Q_T)$ the process $\{W(t) : t \in [0,T]\}$ is a $d$-dimensional standard Brownian motion relative to the filtration $\{\mathcal{F}_t\}_{t \in [0,T]}$.
Let us illustrate how this can be used in practice.
Example 8.10. Let $a < 0$, $\mu \in \mathbb{R}$, and let $B_t$ be a standard Brownian motion. Let $\tau$ be the first time when $B_t$ hits the space-time line $x = a - \mu t$. We find the probability distribution of $\tau$.

To apply Girsanov's theorem we formulate the question in terms of $X_t = B_t + \mu t$, Brownian motion with drift $\mu$. We can express $\tau$ as
\[
\tau = \inf\{t \ge 0 : X_t = a\}.
\]
Define $Z_t = e^{-\mu B_t - \mu^2 t/2}$ and $Q_t(A) = E^P(Z_t \mathbf{1}_A)$. Check that $Z_t$ is a martingale. Thus under $Q_t$, $\{X_s : 0 \le s \le t\}$ is a standard Brownian motion. Compute as follows:
\[
\begin{aligned}
P(\tau > t) &= E^Q\bigl[ Z_t^{-1}\, \mathbf{1}\{\tau > t\} \bigr]
= e^{-\mu^2 t/2}\, E^Q\Bigl[ e^{\mu X_t}\, \mathbf{1}\Bigl\{ \inf_{0 \le s \le t} X_s > a \Bigr\} \Bigr] \\
&= e^{-\mu^2 t/2}\, E^P\bigl[ e^{-\mu B_t}\, \mathbf{1}\{ M_t < -a \} \bigr].
\end{aligned}
\]
On the last line we introduced a new standard Brownian motion $B = -X$ with running maximum $M_t = \sup_{0 \le s \le t} B_s$, and then switched back to using the distribution $P$ of standard Brownian motion. From (2.39) in Exercise 2.19 we can read off the density of $(B_t, M_t)$. After some calculus,
\[
P(\tau > t) = \int_{-\infty}^{t^{-1/2}(-a+\mu t)} \frac{e^{-x^2/2}}{\sqrt{2\pi}}\,dx
\;-\; e^{2a\mu} \int_{-\infty}^{t^{-1/2}(a+\mu t)} \frac{e^{-x^2/2}}{\sqrt{2\pi}}\,dx.
\]

Let us ask for $P(\tau = \infty)$, that is, the probability that $X$ stays forever in $(a, \infty)$. By letting $t \to \infty$ above we get
\[
P(\tau = \infty) = \begin{cases} 0, & \mu \le 0, \\ 1 - e^{2a\mu} > 0, & \mu > 0. \end{cases}
\]
We can deduce the same answer by letting $b \to \infty$ in (6.30).
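The closed form in Example 8.10 can be evaluated directly. A sketch (writing $\Phi$ for the standard normal CDF, so that the survival probability reads $P(\tau > t) = \Phi\bigl((\mu t - a)/\sqrt{t}\bigr) - e^{2a\mu}\,\Phi\bigl((\mu t + a)/\sqrt{t}\bigr)$; the values of $a$ and $\mu$ below are arbitrary): as $t$ grows, the survival probability decreases to $1 - e^{2a\mu}$ for $\mu > 0$ and to $0$ for $\mu \le 0$.

```python
from math import erf, exp, sqrt

def Phi(x):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def surv(t, a, mu):
    # P(tau > t) for X_t = B_t + mu*t hitting the level a < 0
    return Phi((mu * t - a) / sqrt(t)) - exp(2 * a * mu) * Phi((mu * t + a) / sqrt(t))

a, mu = -1.0, 0.5
limit = 1.0 - exp(2 * a * mu)       # P(tau = infinity) for mu > 0

# Decreasing in t, with the stated limit
assert surv(1.0, a, mu) > surv(10.0, a, mu) > limit
assert abs(surv(1e6, a, mu) - limit) < 1e-6
# For mu <= 0 the level a < 0 is eventually hit: P(tau = infinity) = 0
assert surv(1e6, a, -0.5) < 1e-3 and surv(1e6, a, 0.0) < 1e-2
```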
Proof of Theorem 8.9. Theorem 8.9 is proved with Levy's criterion, so we need to check that $\{W(t) : t \in [0,T]\}$ is a local martingale under $Q_T$ with quadratic covariations $[W_i, W_j]_t = \delta_{i,j}\,t$.

The covariation follows by collecting facts we know already. Since $Q_T \ll P$ on $\mathcal{F}_T$, any $P$-almost sure property on $\mathcal{F}_T$ is also $Q_T$-almost sure. Condition (8.21) implies that $\int_0^t H(s)\,ds$ is a continuous BV process on $[0,T]$ (Exercise 1.4). Thus by Lemma A.10, $[H_i, H_j] = [B_i, H_j] = 0$. We also get $[B_i, B_j]_t = \delta_{i,j}\,t$ under $Q_T$ in a few lines. Given a partition $\pi = \{t_k\}$ of $[0,t]$ with $t \le T$ and $\varepsilon > 0$, define the event
\[
A_\pi = \biggl\{ \Bigl| \sum_{k=0}^{m(\pi)-1} \bigl( B_i(t_{k+1}) - B_i(t_k) \bigr)\bigl( B_j(t_{k+1}) - B_j(t_k) \bigr) - \delta_{i,j}\,t \Bigr| \ge \varepsilon \biggr\}.
\]
The property $[B_i, B_j]_t = \delta_{i,j}\,t$ under $P$ means precisely that, for any $\varepsilon > 0$, $P(A_\pi) \to 0$ as $\mathrm{mesh}(\pi) \to 0$. Dominated convergence and the assumed integrability of $Z_T$ give the same under $Q_T$: $Q_T(A_\pi) = E^P[Z_T \mathbf{1}_{A_\pi}] \to 0$. The application of the dominated convergence theorem is not quite the usual one. We can assert that the integrand $Z_T \mathbf{1}_{A_\pi} \to 0$ in probability, because
\[
P\{Z_T \mathbf{1}_{A_\pi} \ge \delta\} \le P(A_\pi) \to 0 \quad \text{for any } \delta > 0.
\]
Convergence in probability is a sufficient hypothesis for the dominated convergence theorem (Theorem B.11).

It remains to check that $\{W(t) : t \in [0,T]\}$ is a local martingale under $Q_T$. This follows by taking $M = B_i$ in Lemma 8.11 below, for each $i$ in turn. With this we can consider Theorem 8.9 proved. $\square$
The above proof is completed by addressing the question of how to transform a local martingale under $P$ so that it becomes a local martingale under $Q$. Continuing with all the assumptions we have made, let $\{M_t : t \in [0,T]\}$ be a continuous local martingale under $P$ such that $M_0 = 0$. Define
\[
N_t = M_t - \sum_{j=1}^d \int_0^t H_j(s)\,d[M, B_j]_s. \tag{8.26}
\]
By the Kunita-Watanabe inequality (Proposition 2.19),
\[
\Bigl| \int_0^t H_j(s)\,d[M, B_j]_s \Bigr| \le [M]_t^{1/2} \Bigl( \int_0^t H_j(s)^2\,ds \Bigr)^{\!1/2}, \tag{8.27}
\]
so the integrals in the definition of $N_t$ are finite. Furthermore, $[M, B_j]_t$ is continuous (Proposition 2.16 or Theorem 3.26), and consequently the integral $\int_0^t H_j(s)\,d[M, B_j]_s$ is a continuous BV process (Exercise 1.4). When we switch measures from $P$ to $Q_T$ it turns out that $M$ continues to be a semimartingale. Equation (8.26) gives us the semimartingale decomposition of $M$ under $Q_T$, because $N$ is a local martingale, as stated in the next lemma.

Lemma 8.11. $\{N_t : t \in [0,T]\}$ is a continuous local martingale under $Q_T$.
Proof. By its definition $N$ is a semimartingale under $P$. Integration by parts (Proposition 6.7), equation (8.23) for $Z$, and the substitution rules (Corollary 6.6 and Theorem 6.9) give
\[
\begin{aligned}
N_t Z_t &= \int_0^t N\,dZ + \int_0^t Z\,dN + [N, Z]_t \\
&= \int_0^t N Z H \cdot dB + \int_0^t Z\,dM - \sum_j \int_0^t Z H_j\,d[M, B_j] + \sum_j \int_0^t Z H_j\,d[M, B_j] \\
&= \int_0^t N Z H \cdot dB + \int_0^t Z\,dM.
\end{aligned}
\]
The last line shows that $NZ$ is a continuous local martingale under $P$.
Now we split the remaining proof into two cases. Assume first that $N$ is uniformly bounded: $|N_t(\omega)| \le C$ for all $t \in [0,T]$ and $\omega \in \Omega$. We shall show that in this case $NZ$ is actually a martingale.

Let $\{\sigma_n\}$ be stopping times that localize $NZ$ under $P$. Let $0 \le s < t \le T$ and $A \in \mathcal{F}_s$. From the martingale property of $(NZ)^{\sigma_n}$ we have
\[
E^P\bigl[ \mathbf{1}_A\, N_{t \wedge \sigma_n} Z_{t \wedge \sigma_n} \bigr] = E^P\bigl[ \mathbf{1}_A\, N_{s \wedge \sigma_n} Z_{s \wedge \sigma_n} \bigr]. \tag{8.28}
\]
We claim that
\[
\lim_{n \to \infty} E^P\bigl[ \mathbf{1}_A\, N_{t \wedge \sigma_n} Z_{t \wedge \sigma_n} \bigr] = E^P\bigl[ \mathbf{1}_A\, N_t Z_t \bigr]. \tag{8.29}
\]
This follows from the generalized dominated convergence theorem (Theorem A.15). We have the almost sure convergences $N_{t \wedge \sigma_n} \to N_t$ and $Z_{t \wedge \sigma_n} \to Z_t$, and the domination $|N_{t \wedge \sigma_n} Z_{t \wedge \sigma_n}| \le C Z_{t \wedge \sigma_n}$. We also have the limit $E[\mathbf{1}_A Z_{t \wedge \sigma_n}] \to E[\mathbf{1}_A Z_t]$ by the following reasoning. We have assumed that $Z_t$ is a martingale, hence by optional stopping (Theorem 3.6) $Z_{t \wedge \sigma_n} = E^P(Z_t \,|\, \mathcal{F}_{t \wedge \sigma_n})$. Consequently by Lemma B.14 the sequence $\{Z_{t \wedge \sigma_n}\}$ is uniformly integrable. The limit $E[\mathbf{1}_A Z_{t \wedge \sigma_n}] \to E[\mathbf{1}_A Z_t]$ finally comes from the almost sure convergence, uniform integrability and Theorem 1.21(iv).

Taking the limit (8.29) on both sides of (8.28) verifies the martingale property of $NZ$. Then we can check that $N$ is a martingale under $Q_T$:
\[
E^{Q_T}[\mathbf{1}_A N_t] = E^P\bigl[ \mathbf{1}_A\, N_t Z_t \bigr] = E^P\bigl[ \mathbf{1}_A\, N_s Z_s \bigr] = E^{Q_T}[\mathbf{1}_A N_s].
\]
Now the general case. Define
\[
\sigma_n = \inf\Bigl\{ t \ge 0 : |M_t| + [M]_t + \int_0^t |H(s)|^2\,ds \ge n \Bigr\}.
\]
By the continuity of the processes in question, these are stopping times (Corollary 2.10) such that, $Q_T$-almost surely, $\sigma_n \ge T$ for large enough $n$. We apply the part already proved to $M^{\sigma_n}$ in place of $M$, and let $N^{(n)}$ denote the process defined by (8.26) with $M^{\sigma_n}$ in place of $M$.

Observe that
\[
\int_0^t H_j(s)\,d[M^{\sigma_n}, B_j]_s = \int_0^t H_j(s)\,d[M, B_j]^{\sigma_n}_s
= \int_0^{t \wedge \sigma_n} H_j(s)\,d[M, B_j]_s.
\]
This tells us two things. First, combined with inequality (8.27) and $|M^{\sigma_n}| \le n$, we see from the definition (8.26) that $N^{(n)}$ is bounded. Thus the part of the proof already done implies that $N^{(n)}$ is a martingale under $Q_T$. Second, looking at the definition again we see that $N^{(n)} = N^{\sigma_n}$.

To summarize, we have found a sequence of stopping times $\sigma_n$ such that $\sigma_n \ge T$ for large enough $n$, $Q_T$-almost surely, and $\{N^{\sigma_n}_t : t \in [0,T]\}$ is a martingale. In other words, we have shown that $N$ is a local martingale under $Q_T$. $\square$
Exercises

Exercise 8.1. Show that for a given $x$, $\int_0^t \mathbf{1}\{B_s = x\}\,ds = 0$ for all $t \in \mathbb{R}_+$, almost surely. Hint. Simply take the expectation. Note that this example again shows the interplay between countable and uncountable in probability: in some sense the result says that Brownian motion spends zero time at $x$. Yet of course Brownian motion is somewhere!

Exercise 8.2. Let $(a, \ell)$ be the solution of the reflection problem in Lemma 8.7. If $(\bar{a}, \bar{\ell})$ is a pair of $C(\mathbb{R}_+)$-functions such that $\bar{\ell}$ is nondecreasing with $\bar{\ell}(0) \ge 0$ and $\bar{a} = b + \bar{\ell} \ge 0$, show that $\bar{a} \ge a$ and $\bar{\ell} \ge \ell$.

Exercise 8.3. Let $B^{(\mu)}_t = B_t + \mu t$ be standard Brownian motion with drift $\mu$, with running maximum $M^{(\mu)}_t = \sup_{0 \le s \le t} B^{(\mu)}_s$. Find the joint density of $(B^{(\mu)}_t, M^{(\mu)}_t)$. Naturally you will utilize (2.39).
Appendix A

Analysis

Definition A.1. Let $X$ be a space. A function $d : X \times X \to [0, \infty)$ is a metric if for all $x, y, z \in X$,

(i) $d(x,y) = 0$ if and only if $x = y$,

(ii) $d(x,y) = d(y,x)$, and

(iii) $d(x,y) \le d(x,z) + d(z,y)$.

The pair $(X,d)$, or $X$ alone, is called a metric space.

Convergence of a sequence $x_n$ to a point $x$ in $X$ means that the distance vanishes in the limit: $x_n \to x$ if $d(x_n, x) \to 0$. $\{x_n\}$ is a Cauchy sequence if $\sup_{m > n} d(x_m, x_n) \to 0$ as $n \to \infty$. Completeness of a metric space means that every Cauchy sequence in the space has a limit in the space. A countable set $\{y_k\}$ is dense in $X$ if for every $x \in X$ and every $\varepsilon > 0$ there exists a $k$ such that $d(x, y_k) < \varepsilon$. $X$ is a separable metric space if it has a countable dense set. A complete, separable metric space is called a Polish space.
The Cartesian product $X^n = X \times X \times \cdots \times X$ is a metric space with metric $\bar{d}(x,y) = \sum_i d(x_i, y_i)$, defined for vectors $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$ in $X^n$.
A vector space (also linear space) is a space $X$ on which addition $x + y$ ($x, y \in X$) and scalar multiplication $\alpha x$ ($\alpha \in \mathbb{R}$, $x \in X$) are defined and satisfy the usual algebraic properties that familiar vector spaces such as $\mathbb{R}^n$ have. For example, there must exist a zero vector $0 \in X$, and each $x \in X$ has an additive inverse $-x$ such that $x + (-x) = 0$.

Definition A.2. Let $X$ be a vector space. A function $\|\cdot\| : X \to [0, \infty)$ is a norm if for all $x, y \in X$ and $\alpha \in \mathbb{R}$,

(i) $\|x\| = 0$ if and only if $x = 0$,

(ii) $\|\alpha x\| = |\alpha|\,\|x\|$, and

(iii) $\|x + y\| \le \|x\| + \|y\|$.

$X$ is a normed space if it has a norm. A metric on a normed space is defined by $d(x,y) = \|x - y\|$. A normed space that is complete as a metric space is called a Banach space.
A basic extension result says that a uniformly continuous map (a synonym for function) into a complete space can be extended to the closure of its domain.

Lemma A.3. Let $(X,d)$ and $(Y,r)$ be metric spaces, and assume $(Y,r)$ is complete. Let $S$ be a subset of $X$, and $f : S \to Y$ a map that is uniformly continuous. Then there exists a unique continuous map $g : \bar{S} \to Y$ such that $g(x) = f(x)$ for $x \in S$. Furthermore, $g$ is also uniformly continuous.
Proof. Fix $x \in \bar{S}$. There exists a sequence $x_n \in S$ that converges to $x$. We claim that $\{f(x_n)\}$ is a Cauchy sequence in $Y$. Let $\varepsilon > 0$. By uniform continuity, there exists $\delta > 0$ such that for all $z, w \in S$, if $d(z,w) \le \delta$ then $r\bigl(f(z), f(w)\bigr) \le \varepsilon/2$. Since $x_n \to x$, there exists $N < \infty$ such that $d(x_n, x) < \delta/2$ whenever $n \ge N$. Now if $m, n \ge N$,
\[
d(x_m, x_n) \le d(x_m, x) + d(x, x_n) < \delta/2 + \delta/2 = \delta,
\]
so by the choice of $\delta$,
\[
r\bigl( f(x_m), f(x_n) \bigr) \le \varepsilon/2 < \varepsilon \quad \text{for } m, n \ge N. \tag{A.1}
\]
By completeness of $Y$, the Cauchy sequence $\{f(x_n)\}$ converges to some point $y \in Y$. Let us check that for any other sequence $S \ni z_k \to x$, the limit of $f(z_k)$ is again $y$. Let again $\varepsilon > 0$, choose $\delta$ and $N$ as above, and choose $K$ so that $d(z_k, x) < \delta/2$ whenever $k \ge K$. Let $n \ge N$ and $k \ge K$. By the triangle inequality,
\[
r\bigl( y, f(z_k) \bigr) \le r\bigl( y, f(x_n) \bigr) + r\bigl( f(x_n), f(z_k) \bigr).
\]
We can let $m \to \infty$ in (A.1), and since $f(x_m) \to y$ and the metric is itself a continuous function of each of its arguments, in the limit $r\bigl(y, f(x_n)\bigr) \le \varepsilon/2$. Also,
\[
d(z_k, x_n) \le d(z_k, x) + d(x, x_n) < \delta/2 + \delta/2 = \delta,
\]
so $r\bigl(f(x_n), f(z_k)\bigr) \le \varepsilon/2$. We have shown that $r\bigl(y, f(z_k)\bigr) \le \varepsilon$ for $k \ge K$. In other words, $f(z_k) \to y$.

The above development shows that, given $x \in \bar{S}$, we can unambiguously define $g(x) = \lim f(x_n)$ by using any sequence $x_n$ in $S$ that converges to $x$. By the continuity of $f$, if $x$ happens to lie in $S$, then $g(x) = \lim f(x_n) = f(x)$, so $g$ is an extension of $f$ to $\bar{S}$.
To show the continuity of $g$, let $\varepsilon > 0$, pick $\delta$ as above, and suppose $d(x,z) \le \delta/2$ for $x, z \in \bar{S}$. Pick sequences $x_n \to x$ and $z_n \to z$ from $S$. Pick $n$ large enough so that $r\bigl(g(x), f(x_n)\bigr) < \varepsilon/4$, $r\bigl(g(z), f(z_n)\bigr) < \varepsilon/4$, $d(x_n, x) < \delta/4$ and $d(z_n, z) < \delta/4$. Then
\[
d(x_n, z_n) \le d(x_n, x) + d(x, z) + d(z_n, z) < \delta,
\]
which implies $r\bigl(f(x_n), f(z_n)\bigr) \le \varepsilon/2$. Then
\[
r\bigl( g(x), g(z) \bigr) \le r\bigl( g(x), f(x_n) \bigr) + r\bigl( f(x_n), f(z_n) \bigr) + r\bigl( f(z_n), g(z) \bigr) \le \varepsilon.
\]
This shows the uniform continuity of $g$.

Uniqueness of $g$ follows from the above because any continuous extension $h$ of $f$ must satisfy $h(x) = \lim f(x_n) = g(x)$ whenever a sequence $x_n$ from $S$ converges to $x$. $\square$

Without uniform continuity the extension might not be possible, as evidenced by a simple example such as $S = [0,1) \cup (1,2]$, $f \equiv 1$ on $[0,1)$, and $f \equiv 2$ on $(1,2]$.
One key application of the extension is the following situation.
Lemma A.4. Let $X$, $Y$ and $S$ be as in the previous lemma. Assume in addition that they are all linear spaces, and that the metrics satisfy $d(x_0, x_1) = d(x_0 + z, x_1 + z)$ for all $x_0, x_1, z \in X$, and $r(y_0, y_1) = r(y_0 + w, y_1 + w)$ for all $y_0, y_1, w \in Y$. Let $I : S \to Y$ be a continuous linear map. Then there exists a linear map $T : \bar{S} \to Y$ which agrees with $I$ on $S$.
A.1. Continuous, cadlag and BV functions
Fix an interval $[a,b]$. The uniform norm and the uniform metric on functions on $[a,b]$ are defined by
\[
\|f\|_\infty = \sup_{t \in [a,b]} |f(t)| \quad \text{and} \quad d_\infty(f,g) = \sup_{t \in [a,b]} |f(t) - g(t)|. \tag{A.2}
\]
One needs to distinguish these from the $L^\infty$ norms defined in (1.8).

We can impose this norm and metric on different spaces of functions on $[a,b]$. Continuous functions are the most familiar. Studying stochastic processes leads us also to consider cadlag functions, which are right-continuous and have left limits at each point of $[a,b]$. Both cadlag functions and continuous functions form a complete metric space under the uniform metric. This is proved in the next lemma.
Lemma A.5. (a) Suppose $\{f_n\}$ is a Cauchy sequence of functions in the metric $d_\infty$ on $[a,b]$. Then there exists a function $f$ on $[a,b]$ such that $d_\infty(f_n, f) \to 0$.

(b) If all the functions $f_n$ are cadlag on $[a,b]$, then so is the limit $f$.

(c) If all the functions $f_n$ are continuous on $[a,b]$, then so is the limit $f$.
Proof. (a) As an instance of the general definition, a sequence of functions $\{f_n\}$ on $[a,b]$ is a Cauchy sequence in the metric $d_\infty$ if for each $\varepsilon > 0$ there exists a finite $N$ such that $d_\infty(f_n, f_m) \le \varepsilon$ for all $m, n \ge N$. For a fixed $t$, the sequence of numbers $\{f_n(t)\}$ is a Cauchy sequence, and by the completeness of the real number system $f_n(t)$ converges as $n \to \infty$. These pointwise limits define the function
\[
f(t) = \lim_{n \to \infty} f_n(t).
\]
It remains to show the uniform convergence $f_n \to f$. Fix $\varepsilon > 0$, and use the uniform Cauchy property to pick $N$ so that
\[
|f_n(t) - f_m(t)| \le \varepsilon \quad \text{for all } m, n \ge N \text{ and } t \in [a,b].
\]
With $n$ and $t$ fixed, let $m \to \infty$. In the limit we get
\[
|f_n(t) - f(t)| \le \varepsilon \quad \text{for all } n \ge N \text{ and } t \in [a,b].
\]
This shows the uniform convergence.

(b) Fix $t \in [a,b]$. We first show $f(s) \to f(t)$ as $s \searrow t$. (If $t = b$ no approach from the right is possible and there is nothing to prove.) Let $\varepsilon > 0$. Pick $n$ so that
\[
\sup_{x \in [a,b]} |f_n(x) - f(x)| \le \varepsilon/4.
\]
By assumption $f_n$ is cadlag, so we may pick $\delta > 0$ so that $t \le s \le t + \delta$ implies $|f_n(t) - f_n(s)| \le \varepsilon/4$. The triangle inequality then shows that
\[
|f(t) - f(s)| \le \varepsilon
\]
for $s \in [t, t+\delta]$. We have shown that $f$ is right-continuous.

To show that left limits exist for $f$, use the left limit of $f_n$ to find $\delta > 0$ so that $|f_n(r) - f_n(s)| \le \varepsilon/4$ for $r, s \in [t-\delta, t)$. Then again by the triangle inequality, $|f(r) - f(s)| \le \varepsilon$ for $r, s \in [t-\delta, t)$. This implies that
\[
0 \le \limsup_{s \nearrow t} f(s) - \liminf_{s \nearrow t} f(s) \le \varepsilon
\]
and that both limsup and liminf are finite. Since $\varepsilon > 0$ was arbitrary, the existence of the limit $f(t-) = \lim_{s \nearrow t} f(s)$ follows.

(c) Right continuity of $f$ follows from part (b), and left continuity is proved by the same argument. $\square$
A function $f$ defined on $\mathbb{R}^d$ (or some subset of it) is locally Lipschitz if for every compact set $K$ there exists a constant $L_K$ such that
\[
|f(x) - f(y)| \le L_K\, |x - y|
\]
for all $x, y \in K$ in the domain of $f$. In particular, a locally Lipschitz function on $\mathbb{R}$ is Lipschitz continuous on any compact interval $[a,b]$.

Lemma A.6. Suppose $g \in BV[0,T]$ and $f$ is locally Lipschitz on $\mathbb{R}$, or on some subset of it that contains the range of $g$. Then $f \circ g$ is BV.

Proof. A BV function on $[0,T]$ is bounded because
\[
|g(x)| \le |g(0)| + V_g(T).
\]
Hence $f$ is Lipschitz continuous on the range of $g$. With $L$ denoting the Lipschitz constant of $f$ on the range of $g$, for any partition $\{t_i\}$ of $[0,T]$,
\[
\sum_i |f(g(t_{i+1})) - f(g(t_i))| \le L \sum_i |g(t_{i+1}) - g(t_i)| \le L\, V_g(T). \qquad \square
\]
Lemma A.7. Suppose $f$ has left and right limits at all points of $[0,T]$. Let $\delta > 0$. Define the set of jumps of magnitude at least $\delta$ by
\[
U_\delta = \{ t \in [0,T] : |f(t+) - f(t-)| \ge \delta \} \tag{A.3}
\]
with the interpretations $f(0-) = f(0)$ and $f(T+) = f(T)$. Then $U_\delta$ is finite. Consequently $f$ can have at most countably many jumps in $[0,T]$.

Proof. If $U_\delta$ were infinite, it would have a limit point $s \in [0,T]$. This means that every interval $(s-\varepsilon, s+\varepsilon)$ contains a point of $U_\delta$ other than $s$ itself. But since the limits $f(s\pm)$ both exist, we can pick $\varepsilon$ small enough so that $|f(r) - f(t)| < \delta/2$ for all pairs $r, t \in (s-\varepsilon, s)$ and all pairs $r, t \in (s, s+\varepsilon)$. Then also $|f(t+) - f(t-)| \le \delta/2$ for all $t \in (s-\varepsilon, s) \cup (s, s+\varepsilon)$, and so these intervals cannot contain any point of $U_\delta$. $\square$

The lemma applies to monotone functions, BV functions, cadlag and caglad functions.
Lemma A.8. Let $f$ be a cadlag function on $[0,T]$ and define $U_\delta$ as in (A.3). Then
\[
\lim_{\varepsilon \searrow 0}\, \sup\bigl\{ |f(v) - f(u)| : 0 \le u < v \le T,\; v - u \le \varepsilon,\; (u,v] \cap U_\delta = \emptyset \bigr\} \le \delta.
\]

Proof. This is proved by contradiction. Assume there exist a sequence $\varepsilon_n \searrow 0$ and points $u_n, v_n \in [0,T]$ such that $0 < v_n - u_n \le \varepsilon_n$, $(u_n, v_n] \cap U_\delta = \emptyset$, and
\[
|f(v_n) - f(u_n)| \ge \delta + \beta \tag{A.4}
\]
for some $\beta > 0$. By compactness of $[0,T]$, we may pass to a subsequence (denoted by $u_n, v_n$ again) such that $u_n \to s$ and $v_n \to s$ for some $s \in [0,T]$. One of the three cases below has to happen for infinitely many $n$.

Case 1. $u_n < v_n < s$. Passing to the limit along a subsequence for which this happens gives
\[
f(v_n) - f(u_n) \to f(s-) - f(s-) = 0
\]
by the existence of left limits for $f$. This contradicts (A.4).

Case 2. $u_n < s \le v_n$. By the cadlag property,
\[
|f(v_n) - f(u_n)| \to |f(s) - f(s-)|.
\]
(A.4) is again contradicted because $s \in (u_n, v_n]$ implies $s$ cannot lie in $U_\delta$, so the jump at $s$ must have magnitude strictly less than $\delta$.

Case 3. $s \le u_n < v_n$. This is like Case 1. The cadlag property gives
\[
f(v_n) - f(u_n) \to f(s) - f(s) = 0. \qquad \square
\]
In Section 1.1.9 we defined the total variation $V_f(t)$ of a function defined
on $[0,t]$. Let us also define the quadratic cross variation of two functions $f$
and $g$ by
$$[f,g](t) = \lim_{\operatorname{mesh}(\pi) \to 0} \sum_{i=0}^{m(\pi)-1} \bigl( f(s_{i+1}) - f(s_i) \bigr)\bigl( g(s_{i+1}) - g(s_i) \bigr)$$
if the limit exists as the mesh of the partition $\pi = \{0 = s_0 < \cdots < s_{m(\pi)} = t\}$
tends to zero. The quadratic variation of $f$ is $[f] = [f,f]$.
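As a quick numerical illustration (my own sketch, not from the text; the functions and the uniform grid are hypothetical choices), the partition sums above can be computed directly. When $f$ and $g$ share a jump, the cross-variation sum picks up the product of the jump sizes, while for a continuous function the sums vanish with the mesh.

```python
import numpy as np

def cross_variation_sum(f, g, T, n):
    # Riemann-type sum for [f, g](T) on the uniform partition
    # 0 = s_0 < s_1 < ... < s_n = T with mesh T/n
    s = np.arange(n + 1) / n * T
    return float(np.sum(np.diff(f(s)) * np.diff(g(s))))

# cadlag step functions with a common jump at t = 1/2
f = lambda t: (t >= 0.5).astype(float)        # jump of size 1
g = lambda t: 2.0 * (t >= 0.5).astype(float)  # jump of size 2

print(cross_variation_sum(f, g, 1.0, 1000))   # -> 2.0 (product of the jumps)

# for the continuous function h(t) = t the sums vanish as the mesh shrinks
h = lambda t: t
print(round(cross_variation_sum(h, h, 1.0, 1000), 6))   # -> 0.001
```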
In the next development we write down sums of the type $\sum_{\alpha \in \mathcal{A}} x(\alpha)$
where $\mathcal{A}$ is an arbitrary set and $x : \mathcal{A} \to \mathbb{R}$ a function. Such a sum can be
defined as follows: the sum has a finite value $c$ if for every $\varepsilon > 0$ there exists
a finite set $B_\varepsilon \subseteq \mathcal{A}$ such that if $E$ is a finite set with $B_\varepsilon \subseteq E \subseteq \mathcal{A}$ then
(A.5) $\Bigl| \sum_{\alpha \in E} x(\alpha) - c \Bigr| \le \varepsilon.$
If $\sum_{\alpha \in \mathcal{A}} x(\alpha)$ has a finite value, $x(\alpha) \ne 0$ for at most countably many $\alpha$-values.
In the above condition, the set $B_\varepsilon$ must contain all $\alpha$ such that $|x(\alpha)| > 2\varepsilon$,
for otherwise adding on one such term violates the inequality. In other
words, the set $\{\alpha : |x(\alpha)| \ge \varepsilon\}$ is finite for any $\varepsilon > 0$.

If $x(\alpha) \ge 0$ always, then
$$\sum_{\alpha \in \mathcal{A}} x(\alpha) = \sup\Bigl\{ \sum_{\alpha \in B} x(\alpha) : B \text{ is a finite subset of } \mathcal{A} \Bigr\}$$
gives a value in $[0, \infty]$ which agrees with the definition above if it is finite.
As for familiar series, absolute convergence implies convergence. In other
words, if $\sum_{\alpha \in \mathcal{A}} |x(\alpha)| < \infty$, then the sum $\sum_{\alpha \in \mathcal{A}} x(\alpha)$ has a finite value.
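A small numeric sketch (the index set and values are hypothetical, chosen for illustration): for a summable family the unordered sum does not depend on how the index set is enumerated, which is what the finite-subset definition above guarantees.

```python
import random

# x(alpha) = 2^{-alpha} on the finite index set {1,...,40}; the unordered
# sum should not depend on the order in which the set is enumerated
x = {a: 2.0 ** (-a) for a in range(1, 41)}

def sum_in_order(order):
    total = 0.0
    for a in order:
        total += x[a]
    return total

order1 = list(x)                 # natural enumeration
order2 = list(x)
random.seed(1)
random.shuffle(order2)           # an arbitrary re-enumeration

print(sum_in_order(order1) == sum_in_order(order2))   # -> True
print(sum_in_order(order1) == 1.0 - 2.0 ** (-40))     # -> True (geometric sum)
```

(The sums of distinct binary powers here are exact in floating point, so the equalities hold exactly.)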
Lemma A.9. Let $f$ be a function with left and right limits on $[0,T]$. Then
$$\sum_{s \in [0,T]} |f(s+) - f(s-)| \le V_f(T),$$
where we interpret $f(0-) = f(0)$ and $f(T+) = f(T)$. The sum is actually
over a countable set because $f$ has at most countably many jumps.
Proof. If $f$ has unbounded variation there is nothing to prove because the
right-hand side of the inequality is infinite. Suppose $V_f(T) < \infty$. If the
conclusion of the lemma fails, there exists a finite set $\{s_1 < s_2 < \cdots < s_m\}$
of jumps such that
$$\sum_{i=1}^m |f(s_i+) - f(s_i-)| > V_f(T).$$
Pick disjoint intervals $(a_i, b_i) \ni s_i$ for each $i$. (If $s_1 = 0$ take $a_1 = 0$, and if
$s_m = T$ take $b_m = T$.) Then
$$V_f(T) \ge \sum_{i=1}^m |f(b_i) - f(a_i)|.$$
Let $a_i \nearrow s_i$ and $b_i \searrow s_i$ for each $i$, except $a_1$ in case $s_1 = 0$ and $b_m$ in case
$s_m = T$. Then the right-hand side above converges to
$$\sum_{i=1}^m |f(s_i+) - f(s_i-)|,$$
contradicting the earlier inequality. $\square$
Lemma A.10. Let $f$ and $g$ be real-valued cadlag functions on $[0,T]$, and
assume $f \in BV[0,T]$. Then
(A.6) $[f,g](T) = \sum_{s \in (0,T]} \bigl( f(s) - f(s-) \bigr)\bigl( g(s) - g(s-) \bigr)$
and the sum above converges absolutely.
Proof. As a cadlag function $g$ is bounded. (If it were not, we could pick $s_n$
so that $|g(s_n)| \to \infty$. By compactness, a subsequence $s_{n_k} \to t \in [0,T]$. But
$g(t\pm)$ exist and are finite.) Then by the previous lemma
$$\sum_{s \in (0,T]} |f(s) - f(s-)|\, |g(s) - g(s-)| \le 2\|g\|_\infty \sum_{s \in (0,T]} |f(s) - f(s-)| \le 2\|g\|_\infty V_f(T) < \infty.$$
This checks that the sum in (A.6) converges absolutely. Hence it has a finite
value, and we can approximate it with finite sums. Let $\varepsilon > 0$. Let $U_\delta$ be
the set defined in (A.3) for $g$. For small enough $\delta > 0$,
$$\Bigl| \sum_{s \in U_\delta} \bigl( f(s) - f(s-) \bigr)\bigl( g(s) - g(s-) \bigr) - \sum_{s \in (0,T]} \bigl( f(s) - f(s-) \bigr)\bigl( g(s) - g(s-) \bigr) \Bigr| \le \varepsilon.$$
Shrink $\delta$ further so that $2\delta V_f(T) < \varepsilon$.

For such $\delta > 0$, let $U_\delta = \{u_1 < u_2 < \cdots < u_n\}$. Let $\eta = \frac{1}{2} \min\{u_{k+1} - u_k\}$
be half the minimum distance between two of these jumps. Consider
partitions $\pi = \{0 = s_0 < \cdots < s_{m(\pi)} = T\}$ with $\operatorname{mesh}(\pi) \le \eta$. Let $i(k)$
be the index such that $u_k \in (s_{i(k)}, s_{i(k)+1}]$, $1 \le k \le n$. By the choice of $\eta$,
a partition interval $(s_i, s_{i+1}]$ can contain at most one $u_k$. Each $u_k$ lies in
some $(s_i, s_{i+1}]$ because these intervals cover $(0,T]$ and for a cadlag function
$0$ is not a discontinuity. Let $I = \{0, \ldots, m(\pi)-1\} \setminus \{i(1), \ldots, i(n)\}$ be the
complementary set of indices.

By Lemma A.8 we can further shrink $\eta$ so that if $\operatorname{mesh}(\pi) \le \eta$ then
$|g(s_{i+1}) - g(s_i)| \le 2\delta$ for $i \in I$. Then
$$\sum_{i=0}^{m(\pi)-1} \bigl( f(s_{i+1}) - f(s_i) \bigr)\bigl( g(s_{i+1}) - g(s_i) \bigr)
= \sum_{k=1}^{n} \bigl( f(s_{i(k)+1}) - f(s_{i(k)}) \bigr)\bigl( g(s_{i(k)+1}) - g(s_{i(k)}) \bigr)
+ \sum_{i \in I} \bigl( f(s_{i+1}) - f(s_i) \bigr)\bigl( g(s_{i+1}) - g(s_i) \bigr)
\le \sum_{k=1}^{n} \bigl( f(s_{i(k)+1}) - f(s_{i(k)}) \bigr)\bigl( g(s_{i(k)+1}) - g(s_{i(k)}) \bigr) + 2\delta V_f(T).$$
As $\operatorname{mesh}(\pi) \to 0$, $s_{i(k)} < u_k$ and $s_{i(k)+1} \ge u_k$, while both converge to $u_k$.
By the cadlag property the sum on the last line above converges to
$$\sum_{k=1}^{n} \bigl( f(u_k) - f(u_k-) \bigr)\bigl( g(u_k) - g(u_k-) \bigr).$$
Combining this with the choice of $\delta$ made above gives
$$\limsup_{\operatorname{mesh}(\pi) \to 0} \sum_{i=0}^{m(\pi)-1} \bigl( f(s_{i+1}) - f(s_i) \bigr)\bigl( g(s_{i+1}) - g(s_i) \bigr)
\le \sum_{s \in (0,T]} \bigl( f(s) - f(s-) \bigr)\bigl( g(s) - g(s-) \bigr) + 2\varepsilon.$$
Reversing inequalities in the argument gives
$$\liminf_{\operatorname{mesh}(\pi) \to 0} \sum_{i=0}^{m(\pi)-1} \bigl( f(s_{i+1}) - f(s_i) \bigr)\bigl( g(s_{i+1}) - g(s_i) \bigr)
\ge \sum_{s \in (0,T]} \bigl( f(s) - f(s-) \bigr)\bigl( g(s) - g(s-) \bigr) - 2\varepsilon.$$
Since $\varepsilon > 0$ was arbitrary the proof is complete. $\square$
Corollary A.11. If $f$ is a cadlag BV function on $[0,T]$, then $[f](T)$ is finite
and given by
$$[f](T) = \sum_{s \in (0,T]} \bigl( f(s) - f(s-) \bigr)^2.$$
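The corollary can be illustrated numerically (my own sketch; the pure-jump function below is an arbitrary example): on a uniform partition that hits the jump times, the partition sums for $[f](T)$ reproduce the sum of squared jumps exactly.

```python
import numpy as np

def quad_variation_sum(f, T, n):
    # partition sum for [f](T) on the uniform n-interval partition of [0, T]
    s = np.arange(n + 1) / n * T
    d = np.diff(f(s))
    return float(np.sum(d * d))

# pure-jump cadlag BV function: jumps 0.5 at t = 1/4 and -0.25 at t = 3/4
f = lambda t: 0.5 * (t >= 0.25) - 0.25 * (t >= 0.75)

# Corollary A.11 predicts [f](1) = 0.5**2 + 0.25**2 = 0.3125
print(quad_variation_sum(f, 1.0, 1000))   # -> 0.3125
```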
Lemma A.12. Let $f_n$, $f$, $g_n$ and $g$ be cadlag functions on $[0,T]$ such that
$f_n \to f$ uniformly and $g_n \to g$ uniformly on $[0,T]$. Suppose $f_n \in BV[0,T]$
for each $n$, and $C_0 \equiv \sup_n V_{f_n}(T) < \infty$. Then $[f_n, g_n](T) \to [f,g](T)$ as $n \to \infty$.
Proof. The function $f$ is also BV, because for any partition,
$$\sum_i |f(s_{i+1}) - f(s_i)| = \lim_{n \to \infty} \sum_i |f_n(s_{i+1}) - f_n(s_i)| \le C_0.$$
Consequently by Lemma A.10 we have
$$[f,g](T) = \sum_{s \in (0,T]} \bigl( f(s) - f(s-) \bigr)\bigl( g(s) - g(s-) \bigr)$$
and
$$[f_n, g_n](T) = \sum_{s \in (0,T]} \bigl( f_n(s) - f_n(s-) \bigr)\bigl( g_n(s) - g_n(s-) \bigr)$$
and all these sums converge absolutely. Pick $\varepsilon > 0$. Let
$$U_n(\delta) = \{ s \in (0,T] : |g_n(s) - g_n(s-)| \ge \delta \}$$
be the set of jumps of $g_n$ of magnitude at least $\delta$, and $U(\delta)$ the same for $g$.
By the absolute convergence of the sum for $[f,g]$ we can pick $\delta > 0$ so that
$$\Bigl| [f,g](T) - \sum_{s \in U(\delta)} \bigl( f(s) - f(s-) \bigr)\bigl( g(s) - g(s-) \bigr) \Bigr| \le \varepsilon.$$
Shrink $\delta$ further so that $\delta < \varepsilon / C_0$. Then for any finite $H \supseteq U_n(\delta)$,
$$\Bigl| [f_n, g_n](T) - \sum_{s \in H} \bigl( f_n(s) - f_n(s-) \bigr)\bigl( g_n(s) - g_n(s-) \bigr) \Bigr|
\le \sum_{s \notin H} |f_n(s) - f_n(s-)|\, |g_n(s) - g_n(s-)|
\le \delta \sum_{s \in (0,T]} |f_n(s) - f_n(s-)| \le \delta C_0 \le \varepsilon.$$
We claim that for large enough $n$, $U_n(\delta) \subseteq U(\delta)$. Since a cadlag
function has only finitely many jumps with magnitude above any given
positive quantity, there exists a small $\rho > 0$ such that $g$ has no jumps
$s$ such that $|g(s) - g(s-)| \in (\delta - \rho, \delta)$. If $n$ is large enough so that
$\sup_s |g_n(s) - g(s)| < \rho/4$, then for each $s$
$$\bigl| (g_n(s) - g_n(s-)) - (g(s) - g(s-)) \bigr| \le \rho/2.$$
Now if $s \in U_n(\delta)$, $|g_n(s) - g_n(s-)| \ge \delta$ and the above inequality imply
$|g(s) - g(s-)| \ge \delta - \rho/2$. This jump cannot fall in the forbidden range
$(\delta - \rho, \delta)$, so in fact it must satisfy $|g(s) - g(s-)| \ge \delta$ and then $s \in U(\delta)$.

Now we can complete the argument. Take $n$ large enough so that $U_n(\delta) \subseteq U(\delta)$,
and take $H = U(\delta)$ in the estimate above. Putting the estimates
together gives
$$\bigl| [f,g](T) - [f_n, g_n](T) \bigr| \le 2\varepsilon + \Bigl| \sum_{s \in U(\delta)} \bigl( f(s) - f(s-) \bigr)\bigl( g(s) - g(s-) \bigr) - \sum_{s \in U(\delta)} \bigl( f_n(s) - f_n(s-) \bigr)\bigl( g_n(s) - g_n(s-) \bigr) \Bigr|.$$
As $U(\delta)$ is a fixed finite set, the difference of the two sums over $U(\delta)$ tends to zero
as $n \to \infty$. Since $\varepsilon > 0$ was arbitrary, the proof is complete. $\square$
We regard vectors as column vectors. $T$ denotes transposition. So if
$x = [x_1, \ldots, x_d]^T$ and $y = [y_1, \ldots, y_d]^T$ are elements of $\mathbb{R}^d$ and $A = (a_{i,j})$ is
a $d \times d$ matrix, then
$$x^T A y = \sum_{1 \le i,j \le d} x_i\, a_{i,j}\, y_j.$$
The Euclidean norm is $|x| = (x_1^2 + \cdots + x_d^2)^{1/2}$, and we apply this also to
matrices in the form
$$|A| = \Bigl( \sum_{1 \le i,j \le d} a_{i,j}^2 \Bigr)^{1/2}.$$
The Schwarz inequality says $|x^T y| \le |x|\,|y|$. This extends to
(A.7) $|x^T A y| \le |x|\,|A|\,|y|.$
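A quick sanity check of (A.7) with randomly drawn data (the vectors, matrix, and dimension are arbitrary test values; `numpy`'s default matrix norm is the Frobenius norm, which matches the $|A|$ above):

```python
import numpy as np

rng = np.random.default_rng(7)
ok = True
for _ in range(100):
    x = rng.standard_normal(4)
    y = rng.standard_normal(4)
    A = rng.standard_normal((4, 4))
    lhs = abs(x @ A @ y)
    # np.linalg.norm with no 'ord' gives the Euclidean norm for vectors
    # and the Frobenius norm for matrices, matching |x|, |y| and |A| in (A.7)
    rhs = np.linalg.norm(x) * np.linalg.norm(A) * np.linalg.norm(y)
    ok = ok and bool(lhs <= rhs)
print(ok)   # -> True
```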
Lemma A.13. Let $g_1, \ldots, g_d$ be cadlag functions on $[0,T]$, and form the $\mathbb{R}^d$-valued
cadlag function $g = (g_1, \ldots, g_d)^T$ with coordinates $g_1, \ldots, g_d$. Cadlag
functions are bounded, so there exists a closed, bounded set $K \subseteq \mathbb{R}^d$ such
that $g(s) \in K$ for all $s \in [0,T]$.

Let $\phi$ be a continuous function on $[0,T]^2 \times K^2$ such that the function
(A.8) $\gamma(s,t,x,y) = \begin{cases} \dfrac{\phi(s,t,x,y)}{|t-s| + |y-x|^2}, & s \ne t \text{ or } x \ne y \\[1ex] 0, & s = t \text{ and } x = y \end{cases}$
is also continuous on $[0,T]^2 \times K^2$. Let
$$\pi^\ell = \{ 0 = t_0^\ell < t_1^\ell < \cdots < t_{m(\ell)}^\ell = T \}$$
be a sequence of partitions of $[0,T]$ such that $\operatorname{mesh}(\pi^\ell) \to 0$ as $\ell \to \infty$, and
(A.9) $C_0 = \sup_\ell \sum_{i=0}^{m(\ell)-1} \bigl| g(t_{i+1}^\ell) - g(t_i^\ell) \bigr|^2 < \infty.$
Then
(A.10) $\lim_{\ell \to \infty} \sum_{i=0}^{m(\ell)-1} \phi\bigl( t_i^\ell, t_{i+1}^\ell, g(t_i^\ell), g(t_{i+1}^\ell) \bigr) = \sum_{s \in (0,T]} \phi\bigl( s, s, g(s-), g(s) \bigr).$
The limit on the right is a finite, absolutely convergent sum.
Proof. To begin, note that $[0,T]^2 \times K^2$ is a compact set, so any continuous
function on $[0,T]^2 \times K^2$ is bounded and uniformly continuous.

We claim that
(A.11) $\sum_{s \in (0,T]} \bigl| g(s) - g(s-) \bigr|^2 \le C_0$
where $C_0$ is the constant defined in (A.9). This follows the reasoning of
Lemma A.9. Consider any finite set of points $s_1 < \cdots < s_n$ in $(0,T]$. For
each $\ell$, pick indices $i(k)$ such that $s_k \in (t_{i(k)}^\ell, t_{i(k)+1}^\ell]$, $1 \le k \le n$. For large
enough $\ell$ all the $i(k)$'s are distinct, and then by (A.9)
$$\sum_{k=1}^n \bigl| g(t_{i(k)+1}^\ell) - g(t_{i(k)}^\ell) \bigr|^2 \le C_0.$$
As $\ell \to \infty$ the inequality above becomes
$$\sum_{k=1}^n \bigl| g(s_k) - g(s_k-) \bigr|^2 \le C_0$$
by the cadlag property, because for each $\ell$, $t_{i(k)}^\ell < s_k \le t_{i(k)+1}^\ell$, while both
extremes converge to $s_k$. The sum on the left-hand side of (A.11) is by
definition the supremum of sums over finite sets, hence the inequality in
(A.11) follows.

By continuity of $\gamma$ there exists a constant $C_1$ such that
(A.12) $|\phi(s,t,x,y)| \le C_1 \bigl( |t-s| + |y-x|^2 \bigr)$
for all $s,t \in [0,T]$ and $x,y \in K$. From (A.12) and (A.11) we get the bound
$$\sum_{s \in (0,T]} \bigl| \phi\bigl( s, s, g(s-), g(s) \bigr) \bigr| \le C_0 C_1 < \infty.$$
This absolute convergence implies that the sum on the right-hand side of
(A.10) can be approximated by finite sums. Given $\varepsilon > 0$, pick $\delta > 0$ small
enough so that
$$\Bigl| \sum_{s \in U_\delta} \phi\bigl( s, s, g(s-), g(s) \bigr) - \sum_{s \in (0,T]} \phi\bigl( s, s, g(s-), g(s) \bigr) \Bigr| \le \varepsilon$$
where
$$U_\delta = \{ s \in (0,T] : |g(s) - g(s-)| \ge \delta \}$$
is the set of jumps of magnitude at least $\delta$.

Since $\gamma$ is uniformly continuous on $[0,T]^2 \times K^2$ and vanishes on the set
$\{ (u,u,z,z) : u \in [0,T],\ z \in K \}$, we can shrink $\delta$ further so that
(A.13) $|\gamma(s,t,x,y)| \le \varepsilon / (T + C_0)$
whenever $|t-s| \le 2\delta$ and $|y-x| \le 2\delta$. Given this $\delta$, let $I_\ell$ be the set of
indices $0 \le i \le m(\ell)-1$ such that $(t_i^\ell, t_{i+1}^\ell] \cap U_\delta = \emptyset$. By Lemma A.8 and the
assumption $\operatorname{mesh}(\pi^\ell) \to 0$ we can fix $\ell_0$ so that for $\ell \ge \ell_0$, $\operatorname{mesh}(\pi^\ell) < 2\delta$
and
(A.14) $|g(t_{i+1}^\ell) - g(t_i^\ell)| \le 2\delta$ for all $i \in I_\ell$.
Note that the proof of Lemma A.8 applies to vector-valued functions.

Let $J_\ell = \{0, \ldots, m(\ell)-1\} \setminus I_\ell$ be the complementary set of indices $i$
such that $(t_i^\ell, t_{i+1}^\ell]$ contains a point of $U_\delta$. Now we can bound the difference
in (A.10). Consider $\ell \ge \ell_0$.
$$\Bigl| \sum_{i=0}^{m(\ell)-1} \phi\bigl( t_i^\ell, t_{i+1}^\ell, g(t_i^\ell), g(t_{i+1}^\ell) \bigr) - \sum_{s \in (0,T]} \phi\bigl( s, s, g(s-), g(s) \bigr) \Bigr|
\le \sum_{i \in I_\ell} \bigl| \phi\bigl( t_i^\ell, t_{i+1}^\ell, g(t_i^\ell), g(t_{i+1}^\ell) \bigr) \bigr|
+ \Bigl| \sum_{i \in J_\ell} \phi\bigl( t_i^\ell, t_{i+1}^\ell, g(t_i^\ell), g(t_{i+1}^\ell) \bigr) - \sum_{s \in U_\delta} \phi\bigl( s, s, g(s-), g(s) \bigr) \Bigr| + \varepsilon.$$
The first sum after the inequality above is bounded above by $\varepsilon$, by (A.8),
(A.9), (A.13) and (A.14). The difference of two sums in absolute values
vanishes as $\ell \to \infty$, because for large enough $\ell$ each interval $(t_i^\ell, t_{i+1}^\ell]$ for
$i \in J_\ell$ contains a unique $s \in U_\delta$, and as $\ell \to \infty$,
$$t_i^\ell \to s, \quad t_{i+1}^\ell \to s, \quad g(t_i^\ell) \to g(s-) \quad \text{and} \quad g(t_{i+1}^\ell) \to g(s)$$
by the cadlag property. (Note that $U_\delta$ is finite by Lemma A.7 and for large
enough $\ell$ the index set $J_\ell$ has exactly one term for each $s \in U_\delta$.) We conclude
$$\limsup_{\ell \to \infty} \Bigl| \sum_{i=0}^{m(\ell)-1} \phi\bigl( t_i^\ell, t_{i+1}^\ell, g(t_i^\ell), g(t_{i+1}^\ell) \bigr) - \sum_{s \in (0,T]} \phi\bigl( s, s, g(s-), g(s) \bigr) \Bigr| \le 2\varepsilon.$$
Since $\varepsilon > 0$ was arbitrary, the proof is complete. $\square$
A.2. Differentiation and integration
For an open set $G \subseteq \mathbb{R}^d$, $C^2(G)$ is the space of functions $f : G \to \mathbb{R}$ whose
partial derivatives up to second order exist and are continuous. We use
subscript notation for partial derivatives, as in
$$f_{x_1} = \frac{\partial f}{\partial x_1} \quad \text{and} \quad f_{x_1,x_2} = \frac{\partial^2 f}{\partial x_1 \partial x_2}.$$
These will always be applied to functions with continuous partial derivatives
so the order of differentiation does not matter. The gradient $Df$ is the
column vector of first-order partial derivatives:
$$Df(x) = \bigl[ f_{x_1}(x), f_{x_2}(x), \ldots, f_{x_d}(x) \bigr]^T.$$
The Hessian matrix $D^2 f$ is the $d \times d$ matrix of second-order partial derivatives:
$$D^2 f(x) = \begin{bmatrix}
f_{x_1,x_1}(x) & f_{x_1,x_2}(x) & \cdots & f_{x_1,x_d}(x) \\
f_{x_2,x_1}(x) & f_{x_2,x_2}(x) & \cdots & f_{x_2,x_d}(x) \\
\vdots & \vdots & \ddots & \vdots \\
f_{x_d,x_1}(x) & f_{x_d,x_2}(x) & \cdots & f_{x_d,x_d}(x)
\end{bmatrix}.$$
We prove a version of Taylor's theorem. Let us write $f \in C^{1,2}([0,T] \times G)$
if the partial derivatives $f_t$, $f_{x_i}$ and $f_{x_i,x_j}$ exist and are continuous on $(0,T) \times G$,
and they extend as continuous functions to $[0,T] \times G$. The continuity of
$f_t$ is actually not needed for the next theorem. But this hypothesis will be
present in the application to the proof of Itô's formula, so we assume it here
already.
Theorem A.14. (a) Let $(a,b)$ be an open interval in $\mathbb{R}$, $f \in C^{1,2}([0,T] \times (a,b))$,
$s, t \in [0,T]$ and $x, y \in (a,b)$. Then there exists a point $\tau$ between $s$
and $t$ and a point $\xi$ between $x$ and $y$ such that
(A.15) $f(t,y) = f(s,x) + f_x(s,x)(y-x) + f_t(\tau,y)(t-s) + \tfrac{1}{2} f_{xx}(s,\xi)(y-x)^2.$
(b) Let $G$ be an open convex set in $\mathbb{R}^d$, $f \in C^{1,2}([0,T] \times G)$, $s, t \in [0,T]$
and $x, y \in G$. Then there exists a point $\tau$ between $s$ and $t$ and $\theta \in [0,1]$
such that, with $\xi = \theta x + (1-\theta) y$,
(A.16) $f(t,y) = f(s,x) + Df(s,x)^T (y-x) + f_t(\tau,y)(t-s) + \tfrac{1}{2} (y-x)^T D^2 f(s,\xi)(y-x).$
Proof. Part (a). Check by integration by parts that
$$\varphi(s,x,y) = \int_x^y (y-z) f_{xx}(s,z)\, dz$$
satisfies
$$\varphi(s,x,y) = f(s,y) - f(s,x) - f_x(s,x)(y-x).$$
By the mean value theorem there exists a point $\tau$ between $s$ and $t$ such that
$$f(t,y) - f(s,y) = f_t(\tau,y)(t-s).$$
By the intermediate value theorem there exists a point $\xi$ between $x$ and $y$
such that
$$\varphi(s,x,y) = f_{xx}(s,\xi) \int_x^y (y-z)\, dz = \tfrac{1}{2} f_{xx}(s,\xi)(y-x)^2.$$
The application of the intermediate value theorem goes like this. Let $f_{xx}(s,a)$
and $f_{xx}(s,b)$ be the minimum and maximum of $f_{xx}(s,\cdot)$ in $[x,y]$ (or $[y,x]$ if
$y < x$). Then
$$f_{xx}(s,a) \le \frac{\varphi(s,x,y)}{\tfrac{1}{2}(y-x)^2} \le f_{xx}(s,b).$$
The intermediate value theorem gives a point $\xi$ between $a$ and $b$ such that
$$f_{xx}(s,\xi) = \frac{\varphi(s,x,y)}{\tfrac{1}{2}(y-x)^2}.$$
The continuity of $f_{xx}$ is needed here. Now
$$f(t,y) = f(s,x) + f_x(s,x)(y-x) + f_t(\tau,y)(t-s) + \varphi(s,x,y)
= f(s,x) + f_x(s,x)(y-x) + f_t(\tau,y)(t-s) + \tfrac{1}{2} f_{xx}(s,\xi)(y-x)^2.$$
Part (b). Apply part (a) to the function $g(t,r) = f\bigl( t, x + r(y-x) \bigr)$ for
$(t,r) \in [0,T] \times (-\varepsilon, 1+\varepsilon)$ for a small enough $\varepsilon > 0$ so that $x + r(y-x) \in G$
for $-\varepsilon \le r \le 1+\varepsilon$. $\square$
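The integration-by-parts identity behind part (a) is easy to check numerically for a concrete smooth function (the choice $f(x) = \sin x$ and the endpoints below are ad hoc test values):

```python
import numpy as np

# phi(x, y) = integral_x^y (y - z) f''(z) dz should equal
# f(y) - f(x) - f'(x)(y - x), here for f = sin, f' = cos, f'' = -sin
x, y = 0.3, 1.2
z = np.linspace(x, y, 200001)
integrand = (y - z) * (-np.sin(z))
# trapezoid rule written out by hand
phi = float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(z)))
remainder = np.sin(y) - np.sin(x) - np.cos(x) * (y - x)
print(abs(phi - remainder) < 1e-9)   # -> True
```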
The following generalization of the dominated convergence theorem is
sometimes useful. It is proved by applying Fatou's lemma to $g_n \pm f_n$.

Theorem A.15. Let $f_n$, $f$, $g_n$, and $g$ be measurable functions such that
$f_n \to f$, $|f_n| \le g_n \in L^1(\mu)$ and $\int g_n \, d\mu \to \int g \, d\mu < \infty$. Then
$$\int f \, d\mu = \lim_{n \to \infty} \int f_n \, d\mu.$$
Lemma A.16. Let $(X, \mathcal{B}, \mu)$ be a general measure space. Assume $g_n \to g$
in $L^p(\mu)$ for some $1 \le p < \infty$. Then for any measurable set $B \in \mathcal{B}$,
$$\int_B |g_n|^p \, d\mu \to \int_B |g|^p \, d\mu.$$

Proof. Let $\tilde{\mu}$ denote the measure $\mu$ restricted to the subspace $B$, and $\tilde{g}_n$ and
$\tilde{g}$ denote the functions restricted to this space. Since
$$\int_B |\tilde{g}_n - \tilde{g}|^p \, d\tilde{\mu} = \int_B |g_n - g|^p \, d\mu \le \int_X |g_n - g|^p \, d\mu$$
we have $L^p(\tilde{\mu})$ convergence $\tilde{g}_n \to \tilde{g}$. $L^p$ norms (as all norms) are subject to
the triangle inequality, and so
$$\bigl|\, \|\tilde{g}_n\|_{L^p(\tilde{\mu})} - \|\tilde{g}\|_{L^p(\tilde{\mu})} \,\bigr| \le \|\tilde{g}_n - \tilde{g}\|_{L^p(\tilde{\mu})} \to 0.$$
Consequently
$$\int_B |g_n|^p \, d\mu = \|\tilde{g}_n\|^p_{L^p(\tilde{\mu})} \to \|\tilde{g}\|^p_{L^p(\tilde{\mu})} = \int_B |g|^p \, d\mu. \qquad \square$$
A common technical tool is approximation of general functions by functions
of some convenient type. Here is one such result that we shall use later.

Lemma A.17. Let $\mu$ be a $\sigma$-finite Borel measure on $\mathbb{R}$, $1 \le p < \infty$, and
$f \in L^p(\mu)$. Let $\varepsilon > 0$. Then there is a continuous function $g$ supported on
a bounded interval, and a step function $h$ of the form
(A.17) $h(t) = \sum_{i=1}^{m-1} \alpha_i \mathbf{1}_{(s_i, s_{i+1}]}(t)$
for some points $-\infty < s_1 < s_2 < \cdots < s_m < \infty$ and reals $\alpha_i$, such that
$$\int_{\mathbb{R}} |f - g|^p \, d\mu < \varepsilon \quad \text{and} \quad \int_{\mathbb{R}} |f - h|^p \, d\mu < \varepsilon.$$
If there exists a constant $C$ such that $|f| \le C$, then $g$ and $h$ can also be
selected so that $|g| \le C$ and $|h| \le C$.
When the underlying measure is Lebesgue measure, one often writes
$L^p(\mathbb{R})$ for the function spaces.

Proposition A.18 ($L^p$ continuity). Let $f \in L^p(\mathbb{R})$. Then
$$\lim_{h \to 0} \int_{\mathbb{R}} |f(t) - f(t+h)|^p \, dt = 0.$$

Proof. Check that the property is true for a step function of the type (A.17).
Then approximate an arbitrary $f \in L^p(\mathbb{R})$ with a step function. $\square$
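A discretized illustration for $p = 1$ (the grid and the test function are my own hypothetical choices): for $f = \mathbf{1}_{[0,1)}$ the exact value of $\int |f(t) - f(t+h)|\,dt$ is $2|h|$, and the Riemann-sum approximation shrinks accordingly as $h \to 0$.

```python
import numpy as np

step = 1e-4
t = np.arange(-20000, 20000) * step        # grid on [-2, 2) with spacing 1e-4

def l1_shift_gap(h):
    # Riemann sum for integral |f(t) - f(t + h)| dt with f = 1_{[0,1)}
    f = ((t >= 0) & (t < 1)).astype(float)
    fh = ((t + h >= 0) & (t + h < 1)).astype(float)
    return float(np.sum(np.abs(f - fh)) * step)

gaps = [l1_shift_gap(h) for h in (0.1, 0.01, 0.001)]
print([round(v, 3) for v in gaps])         # roughly [0.2, 0.02, 0.002]
print(gaps[0] > gaps[1] > gaps[2] > 0)     # -> True
```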
Proposition A.19. Let $T$ be an invertible linear transformation on $\mathbb{R}^n$ and
$f$ a Borel or Lebesgue measurable function on $\mathbb{R}^n$. Then if $f$ is either in
$L^1(\mathbb{R}^n)$ or nonnegative,
(A.18) $\int_{\mathbb{R}^n} f(x)\, dx = |\det T| \int_{\mathbb{R}^n} f(T(x))\, dx.$
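A crude grid check of (A.18) in $\mathbb{R}^2$ (the matrix and step size are arbitrary test values): with $f$ the indicator of the unit square, $\int f\,dx = 1$, and $|\det T|$ times the grid measure of $\{x : Tx \in [0,1)^2\}$ should come out near $1$ as well, up to boundary error of order the grid spacing.

```python
import numpy as np

T = np.array([[2.0, 1.0],
              [0.0, 3.0]])                  # det T = 6
h = 2e-3
g = np.arange(-1.0, 1.0, h)
X, Y = np.meshgrid(g, g)
pts = np.stack([X.ravel(), Y.ravel()])      # grid points as columns

f = lambda p: (p[0] >= 0) & (p[0] < 1) & (p[1] >= 0) & (p[1] < 1)

lhs = float(np.count_nonzero(f(pts))) * h * h                      # ~ 1
rhs = abs(np.linalg.det(T)) * float(np.count_nonzero(f(T @ pts))) * h * h
print(abs(lhs - 1.0) < 0.02, abs(rhs - 1.0) < 0.1)   # -> True True
```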
Exercises
Exercise A.1. Let $f$ be a cadlag function on $[a,b]$. Show that $f$ is bounded,
that is, $\sup_{t \in [a,b]} |f(t)| < \infty$. Hint: Suppose $|f(t_j)| \to \infty$ for some sequence
$t_j \in [a,b]$. By compactness some subsequence converges: $t_{j_k} \to t \in [a,b]$.
But cadlag says that $f(t-)$ exists as a limit among real numbers and $f(t+) = f(t)$.
Exercise A.2. Let $\mathcal{A}$ be a set and $x : \mathcal{A} \to \mathbb{R}$ a function. Suppose
$$c_1 \equiv \sup\Bigl\{ \sum_{\alpha \in B} |x(\alpha)| : B \text{ is a finite subset of } \mathcal{A} \Bigr\} < \infty.$$
Show that then the sum $\sum_{\alpha \in \mathcal{A}} x(\alpha)$ has a finite value in the sense of the
definition stated around equation (A.5).
Hint: Pick finite sets $B_k$ such that $\sum_{\alpha \in B_k} |x(\alpha)| > c_1 - 1/k$. Show that
the sequence $a_k = \sum_{\alpha \in B_k} x(\alpha)$ is a Cauchy sequence. Show that $c = \lim a_k$ is
the value of the sum.
Appendix B
Probability
B.1. General matters
Let $\Omega$ be an arbitrary space, and $\mathcal{D}$ and $\mathcal{S}$ collections of subsets of $\Omega$. $\mathcal{S}$
is a $\pi$-system if it is closed under intersections, in other words if $A, B \in \mathcal{S}$,
then $A \cap B \in \mathcal{S}$. $\mathcal{D}$ is a $\lambda$-system if it has the following three properties:
(1) $\Omega \in \mathcal{D}$.
(2) If $A, B \in \mathcal{D}$ and $A \subseteq B$ then $B \setminus A \in \mathcal{D}$.
(3) If $\{A_n : 1 \le n < \infty\} \subseteq \mathcal{D}$ and $A_n \nearrow A$ then $A \in \mathcal{D}$.

Theorem B.1 (Dynkin's $\pi$-$\lambda$ theorem). If $\mathcal{S}$ is a $\pi$-system and $\mathcal{D}$ is a $\lambda$-system
that contains $\mathcal{S}$, then $\mathcal{D}$ contains the $\sigma$-algebra $\sigma(\mathcal{S})$ generated by $\mathcal{S}$.
For a proof, see the Appendix in [4]. The $\pi$-$\lambda$ theorem has the following
version for functions.

Theorem B.2. Let $\mathcal{S}$ be a $\pi$-system on a space $X$ such that $X = \bigcup_i B_i$ for
some pairwise disjoint sequence $\{B_i\} \subseteq \mathcal{S}$. Let $H$ be a linear space of bounded
functions on $X$. Assume that $\mathbf{1}_B \in H$ for all $B \in \mathcal{S}$, and assume that $H$
is closed under bounded, increasing pointwise limits. The second statement
means that if $f_1 \le f_2 \le f_3 \le \cdots$ are elements of $H$ and $\sup_{n,x} f_n(x) \le c$
for some constant $c$, then $f(x) = \lim_n f_n(x)$ is a function in $H$. It
follows from these assumptions that $H$ contains all bounded $\sigma(\mathcal{S})$-measurable
functions.
Proof. $\mathcal{D} = \{A : \mathbf{1}_A \in H\}$ is a $\lambda$-system containing $\mathcal{S}$. Note that $X \in \mathcal{D}$ follows
because by assumption $\mathbf{1}_X = \lim_{n \to \infty} \sum_{i=1}^n \mathbf{1}_{B_i}$ is an increasing limit
of functions in $H$. By the $\pi$-$\lambda$ theorem indicator functions of all $\sigma(\mathcal{S})$-measurable
sets lie in $H$. By linearity all $\sigma(\mathcal{S})$-measurable simple functions
lie in $H$. A bounded, nonnegative $\sigma(\mathcal{S})$-measurable function is an increasing
limit of nonnegative simple functions, and hence in $H$. And again by
linearity, all bounded $\sigma(\mathcal{S})$-measurable functions lie in $H$. $\square$
Lemma B.3. Let $\mathcal{S}$ be a $\pi$-system on a space $X$. Let $\mu$ and $\nu$ be two
(possibly infinite) measures on $\sigma(\mathcal{S})$, the $\sigma$-algebra generated by $\mathcal{S}$. Assume
that $\mu$ and $\nu$ agree on $\mathcal{S}$. Assume further that there is a countable collection
of pairwise disjoint sets $R_i \in \mathcal{S}$ such that $X = \bigcup_i R_i$ and $\mu(R_i) = \nu(R_i) < \infty$.
Then $\mu = \nu$ on $\sigma(\mathcal{S})$.

Proof. It suffices to check that $\mu(A) = \nu(A)$ for all $A \in \sigma(\mathcal{S})$ that lie inside
some $R_i$. Then for a general $B \in \sigma(\mathcal{S})$,
$$\mu(B) = \sum_i \mu(B \cap R_i) = \sum_i \nu(B \cap R_i) = \nu(B).$$
Inside a fixed $R_j$, let
$$\mathcal{D} = \{ A \in \sigma(\mathcal{S}) : A \subseteq R_j,\ \mu(A) = \nu(A) \}.$$
$\mathcal{D}$ is a $\lambda$-system. Checking property (2) uses the fact that $R_j$ has finite
measure under $\mu$ and $\nu$ so we can subtract: if $A \subseteq B$ and both lie in $\mathcal{D}$, then
$$\mu(B \setminus A) = \mu(B) - \mu(A) = \nu(B) - \nu(A) = \nu(B \setminus A),$$
so $B \setminus A \in \mathcal{D}$. By hypothesis $\mathcal{D}$ contains the $\pi$-system $\{A \in \mathcal{S} : A \subseteq R_j\}$. By
the $\pi$-$\lambda$ theorem $\mathcal{D}$ contains all the $\sigma(\mathcal{S})$-sets that are contained in $R_j$. $\square$
Lemma B.4. Let $\mu$ and $\nu$ be two finite Borel measures on a metric space
$(S,d)$. Assume that
(B.1) $\int_S f \, d\mu = \int_S f \, d\nu$
for all bounded continuous functions $f$ on $S$. Then $\mu = \nu$.

Proof. Given a closed set $F \subseteq S$,
(B.2) $f_n(x) = \dfrac{1}{1 + n \operatorname{dist}(x, F)}$
defines a bounded continuous function which converges to $\mathbf{1}_F(x)$ as $n \to \infty$.
The quantity in the denominator is the distance from the point $x$ to the set
$F$, defined by
$$\operatorname{dist}(x, F) = \inf\{ d(x,y) : y \in F \}.$$
For a closed set $F$, $\operatorname{dist}(x, F) = 0$ iff $x \in F$. Letting $n \to \infty$ in (B.1) with
$f = f_n$ gives $\mu(F) = \nu(F)$. Apply Lemma B.3 to the class of closed sets. $\square$
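The functions $f_n$ in (B.2) are easy to visualize numerically; the snippet below (my own sketch with a hypothetical two-point closed set $F$) shows that $f_n$ is $1$ on $F$ and decays toward $0$ off $F$ as $n$ grows.

```python
# f_n(x) = 1 / (1 + n * dist(x, F)) from (B.2), for the closed set F = {0, 1}
F = [0.0, 1.0]

def dist(x):
    # distance from the point x to the finite set F
    return min(abs(x - y) for y in F)

def f(n, x):
    return 1.0 / (1.0 + n * dist(x))

print(f(10**6, 0.0))            # -> 1.0   (x in F, for every n)
print(f(10**6, 0.5) < 1e-5)     # -> True  (x outside F: f_n(x) -> 0)
```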
Two definitions of classes of sets that are more primitive than $\sigma$-algebras,
hence easier to deal with. A collection $\mathcal{A}$ of subsets of $\Omega$ is an algebra if
(i) $\Omega \in \mathcal{A}$.
(ii) $A^c \in \mathcal{A}$ whenever $A \in \mathcal{A}$.
(iii) $A \cup B \in \mathcal{A}$ whenever $A \in \mathcal{A}$ and $B \in \mathcal{A}$.
A collection $\mathcal{S}$ of subsets of $\Omega$ is a semialgebra if it has these properties:
(i) $\Omega \in \mathcal{S}$.
(ii) If $A, B \in \mathcal{S}$ then also $A \cap B \in \mathcal{S}$.
(iii) If $A \in \mathcal{S}$, then $A^c$ is a finite disjoint union of elements of $\mathcal{S}$.
In applications, one sometimes needs the algebra generated by a semialgebra,
which is particularly simple.
Lemma B.5. Let $\mathcal{S}$ be a semialgebra, and $\mathcal{A}$ the algebra generated by $\mathcal{S}$, in
other words the intersection of all algebras that contain $\mathcal{S}$. Then $\mathcal{A}$ is the
collection of all finite disjoint unions of members of $\mathcal{S}$.

Proof. Let $\mathcal{B}$ be the collection of all finite disjoint unions of members of
$\mathcal{S}$. Since any algebra containing $\mathcal{S}$ must contain $\mathcal{B}$, it suffices to verify
that $\mathcal{B}$ is an algebra. By hypothesis $\Omega \in \mathcal{S}$ and $\emptyset = \Omega^c$ is a finite disjoint
union of members of $\mathcal{S}$, hence a member of $\mathcal{B}$. Since $A \cup B = (A^c \cap B^c)^c$
(de Morgan's law), it suffices to show that $\mathcal{B}$ is closed under intersections
and complements.

Let $A = \bigcup_{1 \le i \le m} S_i$ and $B = \bigcup_{1 \le j \le n} T_j$ be finite disjoint unions of
members of $\mathcal{S}$. Then $A \cap B = \bigcup_{i,j} S_i \cap T_j$ is again a finite disjoint union
of members of $\mathcal{S}$. By the properties of a semialgebra, we can write $S_i^c = \bigcup_{1 \le k \le p(i)} R_{i,k}$
as a finite disjoint union of members of $\mathcal{S}$. Then
$$A^c = \bigcap_{1 \le i \le m} S_i^c = \bigcup_{(k(1), \ldots, k(m))} \ \bigcap_{1 \le i \le m} R_{i, k(i)}.$$
The last union above is over $m$-tuples $(k(1), \ldots, k(m))$ such that $1 \le k(i) \le p(i)$.
Each $\bigcap_{1 \le i \le m} R_{i, k(i)}$ is an element of $\mathcal{S}$, and for distinct $m$-tuples these
are disjoint because $k \ne \ell$ implies $R_{i,k} \cap R_{i,\ell} = \emptyset$. Thus $A^c \in \mathcal{B}$ too. $\square$
Algebras are sufficiently rich to provide approximation with error of
arbitrarily small measure. The operation $\triangle$ is the symmetric difference
defined by $A \triangle B = (A \setminus B) \cup (B \setminus A)$. The next lemma is proved by showing
that the collection of sets that can be approximated as claimed form a $\sigma$-algebra.

Lemma B.6. Suppose $\mu$ is a finite measure on the $\sigma$-algebra $\sigma(\mathcal{A})$ generated
by an algebra $\mathcal{A}$. Then for every $B \in \sigma(\mathcal{A})$ and $\varepsilon > 0$ there exists $A \in \mathcal{A}$
such that $\mu(A \triangle B) < \varepsilon$.
There are a few inequalities that are used constantly in probability. We
already stated the conditional version of Jensen's inequality in Theorem
1.26. Of course the reader should realize that as a special case we get
the unconditioned version: $f(EX) \le E f(X)$ for convex $f$ for which the
expectations are well-defined.

Proposition B.7. (i) (Markov's inequality) For a random variable $X \ge 0$
and a real number $b > 0$,
(B.3) $P\{X \ge b\} \le b^{-1} EX.$
(ii) (Chebyshev's inequality) For a random variable $X$ with finite mean and
variance and a real $b > 0$,
(B.4) $P\{|X - EX| \ge b\} \le b^{-2} \operatorname{Var}(X).$

Proof. Chebyshev's is a special case of Markov's, and Markov's is quickly
proved:
$$P\{X \ge b\} = E\bigl[ \mathbf{1}\{X \ge b\} \bigr] \le b^{-1} E\bigl[ X \mathbf{1}\{X \ge b\} \bigr] \le b^{-1} EX. \qquad \square$$
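Since Markov's and Chebyshev's inequalities hold for any probability measure, they hold in particular for the empirical measure of a simulated sample, so the check below (seeded; the exponential distribution and threshold are arbitrary choices of mine) must come out true:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.exponential(scale=2.0, size=100_000)   # a nonnegative sample

b = 3.0
tail = float(np.mean(X >= b))                  # empirical P{X >= b}
markov_bound = float(X.mean()) / b             # empirical b^{-1} E X
print(tail <= markov_bound)                    # -> True

# Chebyshev, same idea with the empirical mean and variance
dev = float(np.mean(np.abs(X - X.mean()) >= b))
cheby_bound = float(X.var()) / b**2
print(dev <= cheby_bound)                      # -> True
```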
Let $\{A_n\}$ be a sequence of events in a probability space $(\Omega, \mathcal{F}, P)$. We
say $A_n$ happens infinitely often at $\omega$ if $\omega \in A_n$ for infinitely many $n$. Equivalently,
$$\{\omega : A_n \text{ happens infinitely often}\} = \bigcap_{m=1}^{\infty} \bigcup_{n=m}^{\infty} A_n.$$
The complementary event is
$$\{\omega : A_n \text{ happens only finitely many times}\} = \bigcup_{m=1}^{\infty} \bigcap_{n=m}^{\infty} A_n^c.$$
Often it is convenient to use the random variable
$$N_0(\omega) = \inf\Bigl\{ m : \omega \in \bigcap_{n=m}^{\infty} A_n^c \Bigr\}$$
with $N_0(\omega) = \infty$ if $A_n$ happens infinitely often at $\omega$.
Lemma B.8 (Borel–Cantelli lemma). Let $\{A_n\}$ be a sequence of events in
a probability space $(\Omega, \mathcal{F}, P)$ such that $\sum_n P(A_n) < \infty$. Then
$$P\{A_n \text{ happens infinitely often}\} = 0,$$
or equivalently, $P\{N_0 < \infty\} = 1$.

Proof. Since the tail of a convergent series can be made arbitrarily small,
we have
$$P\Bigl( \bigcap_{m=1}^{\infty} \bigcup_{n=m}^{\infty} A_n \Bigr) \le \sum_{n=m}^{\infty} P(A_n) \to 0 \quad \text{as } m \to \infty. \qquad \square$$
The typical application of the Borel–Cantelli lemma is some variant of
this idea.

Lemma B.9. Let $\{X_n\}$ be a sequence of random variables on $(\Omega, \mathcal{F}, P)$.
Suppose
$$\sum_n P\{|X_n| \ge \varepsilon\} < \infty \quad \text{for every } \varepsilon > 0.$$
Then $X_n \to 0$ almost surely.

Proof. Translating the familiar $\varepsilon$-definition of convergence into an event
gives
$$\{\omega : X_n(\omega) \to 0\} = \bigcap_{k \ge 1} \bigcup_{m \ge 1} \bigcap_{n \ge m} \{\omega : |X_n(\omega)| \le 1/k\}.$$
For each $k \ge 1$, the hypothesis and the Borel–Cantelli lemma give
$$P\Bigl( \bigcup_{m \ge 1} \bigcap_{n \ge m} \{|X_n| \le 1/k\} \Bigr) = 1 - P\Bigl( \bigcap_{m \ge 1} \bigcup_{n \ge m} \{|X_n| > 1/k\} \Bigr) = 1.$$
A countable intersection of events of probability one also has probability
one. $\square$
Here is an example of the so-called diagonal trick for constructing a
single subsequence that satisfies countably many requirements.

Lemma B.10. For each $j \in \mathbb{N}$ let $Y^j$ and $X_n^j$ be random variables such
that $X_n^j \to Y^j$ in probability as $n \to \infty$. Then there exists a single subsequence
$\{n_k\}$ such that, simultaneously for all $j \in \mathbb{N}$, $X_{n_k}^j \to Y^j$ almost
surely as $k \to \infty$.

Proof. Applying part (iii) of Theorem 1.21 as it stands gives for each $j$
separately a subsequence $\{n_{k,j}\}$ such that $X_{n_{k,j}}^j \to Y^j$ almost surely as $k \to \infty$.
To get one subsequence that works for all $j$ we need a further argument.

Begin by choosing for $j = 1$ a subsequence $\{n(k,1)\}_{k \in \mathbb{N}}$ such that
$X_{n(k,1)}^1 \to Y^1$ almost surely as $k \to \infty$. Along the subsequence $\{n(k,1)\}$ the
variables $X_{n(k,1)}^2$ still converge to $Y^2$ in probability. Consequently we can
choose a further subsequence $\{n(k,2)\} \subseteq \{n(k,1)\}$ such that $X_{n(k,2)}^2 \to Y^2$
almost surely as $k \to \infty$.

Continue inductively, producing nested subsequences
$$\{n(k,1)\} \supseteq \{n(k,2)\} \supseteq \cdots \supseteq \{n(k,j)\} \supseteq \{n(k,j+1)\} \supseteq \cdots$$
such that for each $j$, $X_{n(k,j)}^j \to Y^j$ almost surely as $k \to \infty$. Now set
$n_k = n(k,k)$. Then for each $j$, from $k = j$ onwards $\{n_k\}$ is a subsequence of
$\{n(k,j)\}$. Consequently $X_{n_k}^j \to Y^j$ almost surely as $k \to \infty$. $\square$
The basic convergence theorems of integration theory often work if almost
everywhere convergence is replaced by the weaker types of convergence
that are common in probability. As an example, here is the dominated convergence
theorem under convergence in probability.

Theorem B.11. Let $\{X_n\}$ be random variables on $(\Omega, \mathcal{F}, P)$, and assume
$X_n \to X$ in probability. Assume there exists a random variable $Y \ge 0$ such
that $|X_n| \le Y$ almost surely for each $n$, and $EY < \infty$. Then $EX_n \to EX$.

Proof. It suffices to show that every subsequence $\{n_k\}$ has a further subsubsequence
$\{n_{k_j}\}$ such that $EX_{n_{k_j}} \to EX$ as $j \to \infty$. So let $\{n_k\}$ be
given. Convergence in probability $X_{n_k} \to X$ implies almost sure convergence
$X_{n_{k_j}} \to X$ along some subsubsequence $\{n_{k_j}\}$. The standard dominated
convergence theorem now gives $EX_{n_{k_j}} \to EX$. $\square$
Conditional expectations satisfy some of the same convergence theorems
as ordinary expectations.

Theorem B.12. Let $\mathcal{A}$ be a sub-$\sigma$-field of $\mathcal{F}$ on the probability space $(\Omega, \mathcal{F}, P)$.
Let $\{X_n\}$ and $X$ be integrable random variables. The hypotheses and conclusions
below are all almost sure.
(i) (Monotone convergence theorem) Suppose $X_{n+1} \ge X_n \ge 0$ and
$X_n \nearrow X$. Then $E(X_n|\mathcal{A}) \nearrow E(X|\mathcal{A})$.
(ii) (Fatou's lemma) If $X_n \ge 0$ for all $n$, then
$$E\bigl( \liminf_n X_n \,\big|\, \mathcal{A} \bigr) \le \liminf_n E(X_n|\mathcal{A}).$$
(iii) (Dominated convergence theorem) Suppose $X_n \to X$, $|X_n| \le Y$
for all $n$, and $Y \in L^1(P)$. Then $E(X_n|\mathcal{A}) \to E(X|\mathcal{A})$.
Proof. Part (i). By the monotonicity of the sequence $E(X_n|\mathcal{A})$ and the
ordinary monotone convergence theorem, for $A \in \mathcal{A}$,
$$E\bigl[ \mathbf{1}_A \lim_n E(X_n|\mathcal{A}) \bigr] = \lim_n E\bigl[ \mathbf{1}_A E(X_n|\mathcal{A}) \bigr] = \lim_n E[\mathbf{1}_A X_n] = E[\mathbf{1}_A X] = E\bigl[ \mathbf{1}_A E(X|\mathcal{A}) \bigr].$$
Since $A \in \mathcal{A}$ is arbitrary, this implies the almost sure equality of the $\mathcal{A}$-measurable
random variables $\lim_n E(X_n|\mathcal{A})$ and $E(X|\mathcal{A})$.

Part (ii). The sequence $Y_k = \inf_{m \ge k} X_m$ increases up to $\liminf X_n$. Thus
by part (i),
$$E\bigl( \liminf X_n \,\big|\, \mathcal{A} \bigr) = \lim_n E\bigl( \inf_{k \ge n} X_k \,\big|\, \mathcal{A} \bigr) \le \liminf_n E(X_n|\mathcal{A}).$$
Part (iii). Using part (ii),
$$E(X|\mathcal{A}) + E(Y|\mathcal{A}) = E\bigl( \liminf (X_n + Y) \,\big|\, \mathcal{A} \bigr) \le \liminf E\bigl( X_n + Y \,\big|\, \mathcal{A} \bigr) = \liminf E(X_n|\mathcal{A}) + E(Y|\mathcal{A}).$$
This gives $\liminf E(X_n|\mathcal{A}) \ge E(X|\mathcal{A})$. Apply this to $-X_n$ to get $\limsup E(X_n|\mathcal{A}) \le E(X|\mathcal{A})$. $\square$
We defined uniform integrability for a sequence in Section 1.2.2. The
idea is the same for a more general collection of random variables.

Definition B.13. Let $\{X_\alpha : \alpha \in \mathcal{A}\}$ be a collection of random variables
defined on a probability space $(\Omega, \mathcal{F}, P)$. They are uniformly integrable if
$$\lim_{M \to \infty} \sup_{\alpha \in \mathcal{A}} E\bigl[ |X_\alpha| \mathbf{1}\{|X_\alpha| \ge M\} \bigr] = 0.$$
Equivalently, the following two conditions are satisfied. (i) $\sup_\alpha E|X_\alpha| < \infty$.
(ii) Given $\varepsilon > 0$, there exists $\delta > 0$ such that for every event $B$ such that
$P(B) \le \delta$,
$$\sup_{\alpha \in \mathcal{A}} \int_B |X_\alpha| \, dP \le \varepsilon.$$
Proof of the equivalence of the two formulations of uniform integrability
can be found for example in [3]. The next lemma is a great exercise, or a
proof can be looked up in Section 4.5 of [4].
Lemma B.14. Let $X$ be an integrable random variable on a probability
space $(\Omega, \mathcal{F}, P)$. Then the collection of random variables
$$\{ E(X|\mathcal{A}) : \mathcal{A} \text{ is a sub-}\sigma\text{-field of } \mathcal{F} \}$$
is uniformly integrable.
Lemma B.15. Suppose $X_n \to X$ in $L^1$ on a probability space $(\Omega, \mathcal{F}, P)$.
Let $\mathcal{A}$ be a sub-$\sigma$-field of $\mathcal{F}$. Then there exists a subsequence $\{n_j\}$ such that
$E[X_{n_j}|\mathcal{A}] \to E[X|\mathcal{A}]$ a.s.

Proof. We have
$$\lim_{n \to \infty} E\bigl[ E[\, |X_n - X| \,\big|\, \mathcal{A}] \bigr] = \lim_{n \to \infty} E\bigl[ |X_n - X| \bigr] = 0,$$
and since
$$\bigl| E[X_n|\mathcal{A}] - E[X|\mathcal{A}] \bigr| \le E[\, |X_n - X| \,\big|\, \mathcal{A}],$$
we conclude that $E[X_n|\mathcal{A}] \to E[X|\mathcal{A}]$ in $L^1$. $L^1$ convergence implies a.s.
convergence along some subsequence. $\square$
Lemma B.16. Let $X$ be a random $d$-vector and $\mathcal{A}$ a sub-$\sigma$-field on the
probability space $(\Omega, \mathcal{F}, P)$. Let
$$\varphi(\theta) = \int_{\mathbb{R}^d} e^{i \theta^T x} \, \mu(dx) \qquad (\theta \in \mathbb{R}^d)$$
be the characteristic function (Fourier transform) of a probability distribution
$\mu$ on $\mathbb{R}^d$. Assume
$$E\bigl[ \exp(i \theta^T X) \mathbf{1}_A \bigr] = \varphi(\theta) P(A)$$
for all $\theta \in \mathbb{R}^d$ and $A \in \mathcal{A}$. Then $X$ has distribution $\mu$ and is independent
of $\mathcal{A}$.

Proof. Taking $A = \Omega$ above shows that $X$ has distribution $\mu$. Fix $A$ such
that $P(A) > 0$, and define the probability measure $\mu_A$ on $\mathbb{R}^d$ by
$$\mu_A(B) = \frac{1}{P(A)} E\bigl[ \mathbf{1}_B(X) \mathbf{1}_A \bigr], \qquad B \in \mathcal{B}_{\mathbb{R}^d}.$$
By hypothesis, the characteristic function of $\mu_A$ is $\varphi$. Hence $\mu_A = \mu$, which
is the same as saying that
$$P\bigl( \{X \in B\} \cap A \bigr) = \mu_A(B) P(A) = \mu(B) P(A).$$
Since $B \in \mathcal{B}_{\mathbb{R}^d}$ and $A \in \mathcal{A}$ are arbitrary, the independence of $X$ and $\mathcal{A}$
follows. $\square$
The extremely useful Kolmogorov–Čentsov criterion establishes path-level
continuity of a stochastic process from moment estimates on increments.
We prove the result for a process $X$ indexed by the $d$-dimensional
unit cube $[0,1]^d$ and with values in a general complete separable metric space
$(S, \rho)$. The unit cube can be replaced by any bounded rectangle in $\mathbb{R}^d$ by a
simple affine transformation $s \mapsto As + h$ of the index.

Define
$$D_n = \bigl\{ (k_1, \ldots, k_d) 2^{-n} : k_1, \ldots, k_d \in \{0, 1, \ldots, 2^n\} \bigr\}$$
and then the set $D = \bigcup_{n \ge 0} D_n$ of dyadic rational points in $[0,1]^d$. The trick
for getting a continuous version of a process is to use Lemma A.3 to rebuild
the process from its values on $D$.
Theorem B.17. Let $(S, \rho)$ be a complete separable metric space.
(a) Suppose $\{X_s : s \in D\}$ is an $S$-valued stochastic process defined
on some probability space $(\Omega, \mathcal{F}, P)$ with the following property: there exist
constants $K < \infty$ and $\alpha, \beta > 0$ such that
(B.5) $E\bigl[ \rho(X_s, X_r)^\alpha \bigr] \le K |s - r|^{d + \beta}$ for all $r, s \in D$.
Then there exists an $S$-valued process $\{Y_s : s \in [0,1]^d\}$ on $(\Omega, \mathcal{F}, P)$ such
that the path $s \mapsto Y_s(\omega)$ is continuous for each $\omega$ and $P\{Y_s = X_s\} = 1$
for each $s \in D$. Furthermore, $Y$ is Hölder continuous with index $\gamma$ for each
$\gamma \in (0, \beta/\alpha)$: for each $\omega$ there exists a constant $C(\omega, \gamma) < \infty$ such that
(B.6) $\rho\bigl( Y_s(\omega), Y_r(\omega) \bigr) \le C(\omega, \gamma) |s - r|^\gamma$ for all $r, s \in [0,1]^d$.
(b) Suppose $\{X_s : s \in [0,1]^d\}$ is an $S$-valued stochastic process that satisfies
the moment bound (B.5) for all $r, s \in [0,1]^d$. Then $X$ has a continuous
version that is Hölder continuous with exponent $\gamma$ for each $\gamma \in (0, \beta/\alpha)$.
Proof. For $n \ge 0$ let $\xi_n = \max\{ \rho(X_r, X_s) : r, s \in D_n,\ |s - r| = 2^{-n} \}$ denote
the maximal nearest-neighbor increment among points in $D_n$. The idea of
the proof is to use nearest-neighbor increments to control all increments
by building paths through dyadic points. Pick $\gamma \in (0, \beta/\alpha)$. First, noting
that $D_n$ has $(2^n + 1)^d$ points and each point in $D_n$ has at most $2d$ nearest
neighbors,
$$E(\xi_n^\alpha) \le \sum_{r, s \in D_n : |s - r| = 2^{-n}} E\bigl[ \rho(X_s, X_r)^\alpha \bigr] \le 2d (2^n + 1)^d K 2^{-n(d+\beta)} \le C(d) 2^{-n\beta}.$$
Consequently
$$E\Bigl[ \sum_{n \ge 0} (2^{n\gamma} \xi_n)^\alpha \Bigr] = \sum_{n \ge 0} 2^{n\gamma\alpha} E(\xi_n^\alpha) \le C(d) \sum_{n \ge 0} 2^{-n(\beta - \gamma\alpha)} < \infty.$$
A nonnegative random variable with finite expectation must be finite almost
surely. So there exists an event $\Omega_0$ such that $P(\Omega_0) = 1$ and
$$\sum_n (2^{n\gamma} \xi_n)^\alpha < \infty$$
for $\omega \in \Omega_0$. In particular, for each $\omega \in \Omega_0$ there exists a constant
$C(\omega) < \infty$ such that
(B.7) $\xi_n(\omega) \le C(\omega) 2^{-n\gamma}$ for all $n \ge 0$.
This constant depends on $\gamma$; we ignore that in the notation.
This constant depends on we ignore that in the notation.
Now we construct the paths. Suppose rst d = 1 so our dyadic rationals
are points in the interval [0, 1]. The simple but key observation is that given
r < s in D such that s r 2
m
, the interval (r, s] can be partitioned as
r = s
0
< s
1
< < s
M
= s so that each pair (s
i1
, s
i
) is a nearest-neighbor
pair in a particular D
n
for some n m, and no D
n
contributes more than
two such pairs .
To see this, start by finding the smallest $m_0 \ge m$ such that $(r, s]$ contains
an interval of the type $(k2^{-m_0}, (k+1)2^{-m_0}]$. At most two such intervals fit
inside $(r, s]$, because otherwise we could have taken $m_0 - 1$. Remove these
level-$m_0$ intervals from $(r, s]$. What remains are two intervals $(r, k_0 2^{-m_0}]$
and $(\ell_0 2^{-m_0}, s]$, both of length strictly less than $2^{-m_0}$. Now the process
312 B. Probability
continues separately on the left and right piece. On the left, next look for
the smallest $m_1 > m_0$ such that for some $k_1$

$(k_1 - 1)2^{-m_1} < r \le k_1 2^{-m_1} < k_0 2^{-m_0}.$

This choice ensures that $(k_1 2^{-m_1}, k_0 2^{-m_0})$ is a nearest-neighbor pair in $D_{m_1}$.
Remove the interval $(k_1 2^{-m_1}, k_0 2^{-m_0}]$, be left with $(r, k_1 2^{-m_1}]$, and continue
in this manner. Note that if $r < k_1 2^{-m_1}$ then $r$ cannot lie in $D_{m_1}$. Since $r$ lies
in some $D_{n_0}$, eventually this process stops. The same process is performed
on the right.
Extend this to $d$ dimensions by connecting each coordinate in turn. In
conclusion, if $r, s \in D$ satisfy $|s - r| \le 2^{-m}$, we can write a finite sum
$s - r = \sum_i (s_i - s_{i-1})$ such that each $(s_{i-1}, s_i)$ is a nearest-neighbor pair from
a particular $D_n$ for some $n \ge m$, and no $D_n$ contributes more than $2d$ such
pairs. By the triangle inequality and by (B.7), for $\omega \in \Omega_0$

$\rho\big(X_r(\omega), X_s(\omega)\big) \;\le\; \sum_{n \ge m} 2d\, C(\omega)\, 2^{-n\alpha} \;\le\; C(\omega)\, 2^{-m\alpha}.$
We utilized above the common convention of letting a constant change from
one step to the next when its precise value is of no consequence.

Now given any $r, s \in D$ pick $m \ge 0$ so that $2^{-m-1} \le |r - s| \le 2^{-m}$, and
use the previous display to write, still for $\omega \in \Omega_0$,

(B.8)  $\rho\big(X_r(\omega), X_s(\omega)\big) \le C(\omega)\, 2^{-m\alpha} \le C(\omega)\, 2^{\alpha}\,\big(2^{-m-1}\big)^{\alpha} \le C(\omega)\, |s-r|^{\alpha}.$
This shows that for $\omega \in \Omega_0$ the path $X_\bullet(\omega)$ is Hölder continuous with exponent $\alpha$ on $D$,
and so in particular uniformly continuous. To define $Y$, fix some point $x_0 \in S$;
given $s \in [0,1]^d$ find $D \ni s_j \to s$ and define

$Y_s(\omega) = \begin{cases} \lim_{j\to\infty} X_{s_j}(\omega), & \omega \in \Omega_0, \\ x_0, & \omega \notin \Omega_0. \end{cases}$

The proof of Lemma A.3 shows that the definition of $Y_s(\omega)$ does not depend
on the sequence $s_j$ chosen, and that $s \mapsto Y_s(\omega)$ is continuous for $\omega \in \Omega_0$. For
$\omega \notin \Omega_0$ this path is continuous by virtue of being a constant. Also, $Y_s(\omega) = X_s(\omega)$
for $\omega \in \Omega_0$ and $s \in D$. The Hölder continuity of $Y$ follows from (B.8)
by letting $s$ and $r$ converge to points in $[0,1]^d$. This completes the proof of
part (a) of the theorem.
For part (b), construct $Y$ as in part (a) from the restriction $\{X_s : s \in D\}$. To see that $P\{Y_s = X_s\} = 1$ also for $s \notin D$, pick again a sequence
$D \ni s_j \to s$, fix $\varepsilon > 0$, and write

$P\{\rho(X_s, Y_s) \ge \varepsilon\} \le P\{\rho(X_s, X_{s_j}) \ge \varepsilon/3\} + P\{\rho(X_{s_j}, Y_{s_j}) \ge \varepsilon/3\} + P\{\rho(Y_{s_j}, Y_s) \ge \varepsilon/3\}.$
As $j \to \infty$, the first probability on the right vanishes by the moment assumption and Chebyshev's inequality:

$P\{\rho(X_s, X_{s_j}) \ge \varepsilon/3\} \le 3^\beta \varepsilon^{-\beta} K\, |s - s_j|^{d+\gamma} \to 0.$

The last probability vanishes as $j \to \infty$ because $\rho(Y_{s_j}(\omega), Y_s(\omega)) \to 0$ for
all $\omega$ by the continuity of $Y$. The middle probability equals zero because
$P\{Y_{s_j} = X_{s_j}\} = 1$.

In conclusion, $P\{\rho(X_s, Y_s) \ge \varepsilon\} = 0$ for every $\varepsilon > 0$. This implies that
with probability 1, $\rho(X_s, Y_s) = 0$, or equivalently $X_s = Y_s$. $\square$
B.2. Construction of Brownian motion
We construct here a one-dimensional standard Brownian motion by constructing its probability distribution on the canonical path space $C = C_{\mathbb{R}}[0,\infty)$. Let $B_t(\omega) = \omega(t)$ be the coordinate projections on $C$, and
$\mathcal{F}^B_t = \sigma\{B_s : 0 \le s \le t\}$ the filtration generated by the coordinate process.

Theorem B.18. There exists a Borel probability measure $P^0$ on $C = C_{\mathbb{R}}[0,\infty)$ such that the process $B = \{B_t : 0 \le t < \infty\}$ on the probability space $(C, \mathcal{B}_C, P^0)$ is a standard one-dimensional Brownian motion with
respect to the filtration $\{\mathcal{F}^B_t\}$.
The construction gives us the following regularity property of paths. Fix
$0 < \gamma < \tfrac{1}{2}$. For $P^0$-almost every $\omega \in C$,

(B.9)  $\sup_{0 \le s < t \le T} \dfrac{|B_t(\omega) - B_s(\omega)|}{|t-s|^\gamma} < \infty$  for all $T < \infty$.

The proof relies on the Kolmogorov Extension Theorem 1.28. We do not
directly construct the measure on $C$. Instead, we first construct the process
on the positive dyadic rational time points

$Q_2 = \big\{ k 2^{-n} : k, n \in \mathbb{N} \big\}.$

Then we apply an important theorem, the Kolmogorov-Čentsov criterion, to
show that the process has a unique continuous extension from $Q_2$ to $[0,\infty)$
that satisfies the Hölder property. The distribution of this extension will be
the Wiener measure on $C$.
Let

(B.10)  $g_t(x) = \dfrac{1}{\sqrt{2\pi t}} \exp\Big( -\dfrac{x^2}{2t} \Big)$

be the density of the normal distribution with mean zero and variance $t$ ($g$ for
Gaussian). For an increasing $n$-tuple of positive times $0 < t_1 < t_2 < \cdots < t_n$,
let $\mathbf{t} = (t_1, t_2, \ldots, t_n)$. We shall write $\mathbf{x} = (x_1, \ldots, x_n)$ for vectors in $\mathbb{R}^n$,
and abbreviate $d\mathbf{x} = dx_1\, dx_2 \cdots dx_n$ to denote integration with respect to
Lebesgue measure on $\mathbb{R}^n$. Define a probability measure $\mu_{\mathbf{t}}$ on $\mathbb{R}^n$ by

(B.11)  $\mu_{\mathbf{t}}(A) = \displaystyle\int_{\mathbb{R}^n} \mathbf{1}_A(\mathbf{x})\, g_{t_1}(x_1) \prod_{i=2}^{n} g_{t_i - t_{i-1}}(x_i - x_{i-1})\, d\mathbf{x}$

for $A \in \mathcal{B}_{\mathbb{R}^n}$. Before proceeding further, we check that this definition is
the right one, namely that $\mu_{\mathbf{t}}$ is the distribution we want for the vector
$(B_{t_1}, B_{t_2}, \ldots, B_{t_n})$.
Lemma B.19. If a one-dimensional standard Brownian motion $B$ exists,
then for $A \in \mathcal{B}_{\mathbb{R}^n}$

$P\big[ (B_{t_1}, B_{t_2}, \ldots, B_{t_n}) \in A \big] = \mu_{\mathbf{t}}(A).$
Proof. Define the nonsingular linear map

$T(y_1, y_2, y_3, \ldots, y_n) = (y_1,\ y_1 + y_2,\ y_1 + y_2 + y_3,\ \ldots,\ y_1 + y_2 + \cdots + y_n)$

with inverse

$T^{-1}(x_1, x_2, x_3, \ldots, x_n) = (x_1,\ x_2 - x_1,\ x_3 - x_2,\ \ldots,\ x_n - x_{n-1}).$

In the next calculation, use the fact that the Brownian increments are independent and $B_{t_i} - B_{t_{i-1}}$ has density $g_{t_i - t_{i-1}}$:

$P\big[ (B_{t_1}, B_{t_2}, B_{t_3}, \ldots, B_{t_n}) \in A \big]$
$= P\big[ T(B_{t_1},\ B_{t_2} - B_{t_1},\ B_{t_3} - B_{t_2},\ \ldots,\ B_{t_n} - B_{t_{n-1}}) \in A \big]$
$= P\big[ (B_{t_1},\ B_{t_2} - B_{t_1},\ B_{t_3} - B_{t_2},\ \ldots,\ B_{t_n} - B_{t_{n-1}}) \in T^{-1}A \big]$
$= \displaystyle\int_{\mathbb{R}^n} \mathbf{1}_A(T\mathbf{y})\, g_{t_1}(y_1)\, g_{t_2 - t_1}(y_2) \cdots g_{t_n - t_{n-1}}(y_n)\, d\mathbf{y}.$

Now change variables through $\mathbf{x} = T\mathbf{y} \iff \mathbf{y} = T^{-1}\mathbf{x}$. $T$ has determinant
one, so by (A.18) the integral above becomes

$\displaystyle\int_{\mathbb{R}^n} \mathbf{1}_A(\mathbf{x})\, g_{t_1}(x_1)\, g_{t_2 - t_1}(x_2 - x_1) \cdots g_{t_n - t_{n-1}}(x_n - x_{n-1})\, d\mathbf{x} = \mu_{\mathbf{t}}(A). \qquad\square$
To apply Kolmogorov's Extension Theorem we need to define consistent
finite-dimensional distributions. For an $n$-tuple $\mathbf{s} = (s_1, s_2, \ldots, s_n)$ of distinct elements from $Q_2$, let $\pi$ be the permutation that orders it. In other
words, $\pi$ is the bijection on $\{1, 2, \ldots, n\}$ determined by

$s_{\pi(1)} < s_{\pi(2)} < \cdots < s_{\pi(n)}.$

(In this section all $n$-tuples of time points have distinct entries.) Define the
probability measure $Q_{\mathbf{s}}$ on $\mathbb{R}^n$ by $Q_{\mathbf{s}} = \mu_{\pi\mathbf{s}} \circ \pi$, or in terms of integrals of
bounded Borel functions,

$\displaystyle\int_{\mathbb{R}^n} f\, dQ_{\mathbf{s}} = \int_{\mathbb{R}^n} f\big(\pi^{-1}(\mathbf{x})\big)\, d\mu_{\pi\mathbf{s}}.$
Let us convince ourselves again that this definition is the one we want.
$Q_{\mathbf{s}}$ should represent the distribution of the vector $(B_{s_1}, B_{s_2}, \ldots, B_{s_n})$, and
indeed this follows from Lemma B.19:

$\displaystyle\int_{\mathbb{R}^n} f\big(\pi^{-1}(\mathbf{x})\big)\, d\mu_{\pi\mathbf{s}} = E\big[ (f \circ \pi^{-1})(B_{s_{\pi(1)}}, B_{s_{\pi(2)}}, \ldots, B_{s_{\pi(n)}}) \big] = E\big[ f(B_{s_1}, B_{s_2}, \ldots, B_{s_n}) \big].$

For the second equality above, apply to $y_i = B_{s_i}$ the identity

$\pi^{-1}(y_{\pi(1)}, y_{\pi(2)}, \ldots, y_{\pi(n)}) = (y_1, y_2, \ldots, y_n),$

which is a consequence of the way we defined the action of a permutation
on a vector in Section 1.2.4.
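The permutation bookkeeping is easy to get wrong, so here is a small executable check (an added illustration, not from the text; helper names hypothetical) of the convention $(\pi\mathbf{y})_i = y_{\pi(i)}$ and the identity $\pi^{-1}(y_{\pi(1)}, \ldots, y_{\pi(n)}) = (y_1, \ldots, y_n)$, with $\pi$ the permutation that orders an $n$-tuple (0-based indices in code):

```python
def apply_perm(pi, y):
    # action of a permutation on a vector: (pi y)_i = y_{pi(i)}
    return [y[pi[i]] for i in range(len(y))]

def ordering_perm(s):
    # pi with s_{pi(0)} < s_{pi(1)} < ... for a tuple of distinct entries
    return sorted(range(len(s)), key=lambda i: s[i])

def invert(pi):
    inv = [0] * len(pi)
    for i, j in enumerate(pi):
        inv[j] = i
    return inv

s = [0.75, 0.125, 0.5, 0.25]           # distinct dyadic times, unordered
pi = ordering_perm(s)
assert apply_perm(pi, s) == sorted(s)  # pi s is the ordered tuple
y = ["a", "b", "c", "d"]
# pi^{-1} applied to (y_{pi(0)}, ..., y_{pi(n-1)}) recovers y
assert apply_perm(invert(pi), apply_perm(pi, y)) == y
```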
Let us check the consistency properties (i) and (ii) required for the Extension Theorem 1.28. Suppose $\mathbf{t} = \sigma\mathbf{s}$ for two distinct $n$-tuples $\mathbf{s}$ and $\mathbf{t}$
from $Q_2$ and a permutation $\sigma$. If $\pi$ orders $\mathbf{t}$ then $\sigma\pi$ orders $\mathbf{s}$, because
$t_{\pi(1)} < t_{\pi(2)}$ implies $s_{\sigma(\pi(1))} < s_{\sigma(\pi(2))}$. One must avoid confusion over how
the action of permutations is composed:

$\pi(\sigma\mathbf{s}) = (s_{\sigma(\pi(1))}, s_{\sigma(\pi(2))}, \ldots, s_{\sigma(\pi(n))})$

because $\big(\pi(\sigma\mathbf{s})\big)_i = (\sigma\mathbf{s})_{\pi(i)} = s_{\sigma(\pi(i))}$. Then

$Q_{\mathbf{s}} \circ \sigma^{-1} = \big( \mu_{(\sigma\pi)\mathbf{s}} \circ (\sigma\pi) \big) \circ \sigma^{-1} = \mu_{\pi\mathbf{t}} \circ \pi = Q_{\mathbf{t}}.$

This checks (i). Property (ii) will follow from this lemma.
Lemma B.20. Let $\mathbf{t} = (t_1, \ldots, t_n)$ be an ordered $n$-tuple, and let $\hat{\mathbf{t}} = (t_1, \ldots, t_{j-1}, t_{j+1}, \ldots, t_n)$ be the $(n-1)$-tuple obtained by removing $t_j$ from
$\mathbf{t}$. Then for $A \in \mathcal{B}_{\mathbb{R}^{j-1}}$ and $B \in \mathcal{B}_{\mathbb{R}^{n-j}}$,

$\mu_{\mathbf{t}}(A \times \mathbb{R} \times B) = \mu_{\hat{\mathbf{t}}}(A \times B).$
Proof. A basic analytic property of Gaussian densities is the convolution
identity $g_s * g_t(x) = g_{s+t}(x)$, where the convolution of two densities $f$ and $g$
is in general defined by

$(f * g)(x) = \displaystyle\int_{\mathbb{R}} f(y)\, g(x - y)\, dy.$

We leave checking this to the reader. The corresponding probabilistic property is that the sum of independent normal variables is again normal, and
the means and variances add.
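The convolution identity $g_s * g_t = g_{s+t}$ can also be confirmed numerically by a Riemann sum (an illustrative check added here, not from the text):

```python
import math

def g(t, x):
    # Gaussian density with mean zero and variance t, as in (B.10)
    return math.exp(-x * x / (2 * t)) / math.sqrt(2 * math.pi * t)

def convolve_at(s, t, x, lo=-10.0, hi=10.0, steps=20_000):
    # (g_s * g_t)(x) = \int g_s(y) g_t(x - y) dy by a midpoint rule
    h = (hi - lo) / steps
    return sum(g(s, lo + (k + 0.5) * h) * g(t, x - lo - (k + 0.5) * h)
               for k in range(steps)) * h

x = 0.5
lhs = convolve_at(0.3, 0.7, x)   # (g_{0.3} * g_{0.7})(x)
rhs = g(1.0, x)                  # g_{s+t}(x)
```

The truncation to $[-10, 10]$ is harmless here since the Gaussian tails are negligible at that range for these variances.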
The conclusion of the lemma follows from a calculation. Abbreviate $\mathbf{x}' = (x_1, \ldots, x_{j-1})$, $\mathbf{x}'' = (x_{j+1}, \ldots, x_n)$ and $\hat{\mathbf{x}} = (\mathbf{x}', \mathbf{x}'')$. It is also convenient to
use $t_0 = 0$ and $x_0 = 0$.

$\mu_{\mathbf{t}}(A \times \mathbb{R} \times B) = \displaystyle\int_A d\mathbf{x}' \int_{\mathbb{R}} dx_j \int_B d\mathbf{x}''\ \prod_{i=1}^{n} g_{t_i - t_{i-1}}(x_i - x_{i-1})$
$= \displaystyle\int_{A \times B} d\hat{\mathbf{x}} \prod_{i \ne j,\, j+1} g_{t_i - t_{i-1}}(x_i - x_{i-1})\ \times\ \int_{\mathbb{R}} dx_j\ g_{t_j - t_{j-1}}(x_j - x_{j-1})\, g_{t_{j+1} - t_j}(x_{j+1} - x_j).$

After a change of variables $y = x_j - x_{j-1}$, the interior integral becomes

$\displaystyle\int_{\mathbb{R}} dy\ g_{t_j - t_{j-1}}(y)\, g_{t_{j+1} - t_j}(x_{j+1} - x_{j-1} - y) = g_{t_j - t_{j-1}} * g_{t_{j+1} - t_j}(x_{j+1} - x_{j-1}) = g_{t_{j+1} - t_{j-1}}(x_{j+1} - x_{j-1}).$

Substituting this back up gives

$\displaystyle\int_{A \times B} d\hat{\mathbf{x}} \prod_{i \ne j,\, j+1} g_{t_i - t_{i-1}}(x_i - x_{i-1})\ g_{t_{j+1} - t_{j-1}}(x_{j+1} - x_{j-1}) = \mu_{\hat{\mathbf{t}}}(A \times B). \qquad\square$
As $A \in \mathcal{B}_{\mathbb{R}^{j-1}}$ and $B \in \mathcal{B}_{\mathbb{R}^{n-j}}$ range over these $\sigma$-algebras, the class of
product sets $A \times B$ generates $\mathcal{B}_{\mathbb{R}^{n-1}}$ and is also closed under intersections.
Consequently by Lemma B.3, the conclusion of the above lemma generalizes
to

(B.12)  $\mu_{\mathbf{t}}\{ \mathbf{x} \in \mathbb{R}^n : \hat{\mathbf{x}} \in G \} = \mu_{\hat{\mathbf{t}}}(G)$  for all $G \in \mathcal{B}_{\mathbb{R}^{n-1}}$.
We are ready to check requirement (ii) of Theorem 1.28. Let $\mathbf{t} = (t_1, t_2, \ldots, t_{n-1}, t_n)$ be an $n$-tuple, $\mathbf{s} = (t_1, t_2, \ldots, t_{n-1})$ and $A \in \mathcal{B}_{\mathbb{R}^{n-1}}$.
We need to show $Q_{\mathbf{s}}(A) = Q_{\mathbf{t}}(A \times \mathbb{R})$. Suppose $\pi$ orders $\mathbf{t}$. Let $j$ be the
index such that $\pi(j) = n$. Then $\mathbf{s}$ is ordered by the permutation

$\rho(i) = \begin{cases} \pi(i), & i = 1, \ldots, j-1, \\ \pi(i+1), & i = j, \ldots, n-1. \end{cases}$

A few observations before we compute: $\rho$ is a bijection on $\{1, 2, \ldots, n-1\}$,
as it should be. Also, since $\pi^{-1}(\rho(i)) = i$ for $i < j$ and $\pi^{-1}(\rho(i)) = i + 1$ for $i \ge j$,

$(x_{\pi^{-1}(1)}, x_{\pi^{-1}(2)}, \ldots, x_{\pi^{-1}(n-1)}) = (\hat{x}_{\rho^{-1}(1)}, \hat{x}_{\rho^{-1}(2)}, \ldots, \hat{x}_{\rho^{-1}(n-1)}) = \rho^{-1}\hat{\mathbf{x}},$

where the notation $\hat{\mathbf{x}} = (x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_n)$ is used as in the proof of the previous lemma. And
lastly, using $\hat{\mathbf{t}}$ to denote the omission of the $j$th coordinate,

$\widehat{\pi\mathbf{t}} = (t_{\pi(1)}, \ldots, t_{\pi(j-1)}, t_{\pi(j+1)}, \ldots, t_{\pi(n)}) = (t_{\rho(1)}, \ldots, t_{\rho(j-1)}, t_{\rho(j)}, \ldots, t_{\rho(n-1)}) = \rho\mathbf{s}.$

Equation (B.12) says that the distribution of $\hat{\mathbf{x}}$ under $\mu_{\pi\mathbf{t}}$ is $\mu_{\widehat{\pi\mathbf{t}}}$. From these
ingredients we get

$Q_{\mathbf{t}}(A \times \mathbb{R}) = \mu_{\pi\mathbf{t}}\{ \mathbf{x} \in \mathbb{R}^n : \pi^{-1}(\mathbf{x}) \in A \times \mathbb{R} \} = \mu_{\pi\mathbf{t}}\{ \mathbf{x} \in \mathbb{R}^n : (x_{\pi^{-1}(1)}, \ldots, x_{\pi^{-1}(n-1)}) \in A \}$
$= \mu_{\pi\mathbf{t}}\{ \mathbf{x} \in \mathbb{R}^n : \rho^{-1}\hat{\mathbf{x}} \in A \} = \mu_{\widehat{\pi\mathbf{t}}}\{ \mathbf{y} \in \mathbb{R}^{n-1} : \rho^{-1}\mathbf{y} \in A \} = \mu_{\rho\mathbf{s}}\{ \mathbf{y} \in \mathbb{R}^{n-1} : \rho^{-1}\mathbf{y} \in A \} = Q_{\mathbf{s}}(A).$
We have checked the hypotheses of Kolmogorov's Extension Theorem for
the family $\{Q_{\mathbf{t}}\}$.

Let $\Omega_2 = \mathbb{R}^{Q_2}$ with its product $\sigma$-algebra $\mathcal{G}_2 = \mathcal{B}(\mathbb{R})^{\otimes Q_2}$. Write $\omega = (\omega_t :
t \in Q_2)$ for a generic element of $\Omega_2$. By Theorem 1.28 there is a probability
measure $Q$ on $(\Omega_2, \mathcal{G}_2)$ with marginals

$Q\big\{ \omega \in \Omega_2 : (\omega_{t_1}, \omega_{t_2}, \ldots, \omega_{t_n}) \in B \big\} = Q_{\mathbf{t}}(B)$

for $n$-tuples $\mathbf{t} = (t_1, t_2, \ldots, t_n)$ of distinct elements from $Q_2$ and $B \in \mathcal{B}_{\mathbb{R}^n}$.
We write $E^Q$ for expectation under the measure $Q$.
So far we have not included the time origin in the index set $Q_2$. This
was for reasons of convenience. Because $B_0$ has no density, to include $t_0 = 0$
would have required two formulas for $\mu_{\mathbf{t}}$ in (B.11), one for $n$-tuples with
zero, the other for $n$-tuples without zero. Now define $Q_2^0 = Q_2 \cup \{0\}$. On
$\Omega_2$ define random variables $\{X_q : q \in Q_2^0\}$ by

(B.13)  $X_0(\omega) = 0$, and $X_q(\omega) = \omega_q$ for $q \in Q_2$.
The second step of our construction of Brownian motion is the proof
that the process $\{X_q\}$ is uniformly continuous on bounded index sets. This
comes from an application of the Kolmogorov-Čentsov criterion, Theorem
B.17.

Lemma B.21. Let $\{X_q\}$ be the process defined by (B.13) on the probability
space $(\Omega_2, \mathcal{G}_2, Q)$, where $Q$ is the probability measure whose existence came
from Kolmogorov's Extension Theorem. Let $0 < \gamma < \tfrac{1}{2}$. Then there is an
event $\Omega_0$ such that $Q(\Omega_0) = 1$ with this property: for all $\omega \in \Omega_0$ and $T < \infty$,
there exists a finite constant $C_T(\omega)$ such that

(B.14)  $|X_s(\omega) - X_r(\omega)| \le C_T(\omega)\, |s-r|^\gamma$  for all $r, s \in Q_2^0 \cap [0, T]$.

In particular, for $\omega \in \Omega_0$ the function $q \mapsto X_q(\omega)$ is uniformly continuous on
$Q_2^0 \cap [0, T]$ for every $T < \infty$.
Proof. We apply Theorem B.17 to an interval $[0, T]$. Though Theorem
B.17 was written for the unit cube $[0,1]^d$, it applies obviously to any closed
bounded interval or rectangle, via a suitable transformation of the index $s$.

We need to check the hypothesis (B.5). Due to the definition of the
finite-dimensional distributions of $Q$, this reduces to computing a moment
of the Gaussian distribution. Fix an integer $m \ge 2$ large enough so that
$\tfrac{1}{2} - \tfrac{1}{2m} > \gamma$. Let $0 \le q < r$ be indices in $Q_2^0$. In the next calculation, note
that after changing variables in the $dy_2$-integral it no longer depends on $y_1$,
and the $y_1$-variable can be integrated away.

$E^Q\big[ (X_r - X_q)^{2m} \big] = \displaystyle\iint_{\mathbb{R}^2} (y_2 - y_1)^{2m}\, g_q(y_1)\, g_{r-q}(y_2 - y_1)\, dy_1\, dy_2$
$= \displaystyle\int_{\mathbb{R}} dy_1\, g_q(y_1) \int_{\mathbb{R}} dy_2\, (y_2 - y_1)^{2m}\, g_{r-q}(y_2 - y_1)$
$= \displaystyle\int_{\mathbb{R}} dy_1\, g_q(y_1) \int_{\mathbb{R}} dx\, x^{2m}\, g_{r-q}(x) = \int_{\mathbb{R}} dx\, x^{2m}\, g_{r-q}(x)$
$= \dfrac{1}{\sqrt{2\pi(r-q)}} \displaystyle\int_{\mathbb{R}} x^{2m} \exp\Big( -\dfrac{x^2}{2(r-q)} \Big)\, dx$
$= (r-q)^m\, \dfrac{1}{\sqrt{2\pi}} \displaystyle\int_{\mathbb{R}} z^{2m} \exp\Big( -\dfrac{z^2}{2} \Big)\, dz = C_m\, |r-q|^m,$

where $C_m = 1 \cdot 3 \cdot 5 \cdots (2m-1)$, the product of the odd integers less than
$2m$. We have thus verified the hypothesis (B.5) with $d = 1$, $\beta = 2m$, and
$m - 1$ in the role of the exponent $\gamma$ of (B.5); by the choice of $m$,

$0 < \gamma < \dfrac{m-1}{2m} = \dfrac{1}{2} - \dfrac{1}{2m}.$

Theorem B.17 now implies the following. For each $T < \infty$ there exists an
event $\Omega_T \subset \Omega_2$ such that $Q(\Omega_T) = 1$, and for every $\omega \in \Omega_T$ there exists a
finite constant $C(\omega)$ such that

(B.15)  $|X_r(\omega) - X_q(\omega)| \le C(\omega)\, |r-q|^\gamma$

for any $q, r \in [0, T] \cap Q_2^0$. Take

$\Omega_0 = \bigcap_{T=1}^{\infty} \Omega_T.$

Then $Q(\Omega_0) = 1$ and each $\omega \in \Omega_0$ has the required property. $\square$
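The even-moment formula used in the proof, $\int x^{2m} g_t(x)\, dx = (2m-1)!!\, t^m$ with $t = r - q$, can be checked numerically (an illustration added here, not from the text):

```python
import math

def gauss_moment(t, p, lo=-12.0, hi=12.0, steps=40_000):
    # \int x^p g_t(x) dx by a midpoint Riemann sum, g_t as in (B.10)
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        total += x**p * math.exp(-x * x / (2 * t)) / math.sqrt(2 * math.pi * t)
    return total * h

m, t = 2, 0.5
lhs = gauss_moment(t, 2 * m)    # E[Z^{2m}] for Z ~ N(0, t)
rhs = 3 * t**m                  # (2m-1)!! t^m, with (2m-1)!! = 3 for m = 2
```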
With the local Hölder property (B.14) in hand, we can now extend the
definition of the process $X$ to the entire time line $[0, \infty)$ while preserving
the local Hölder property, as was done for part (b) of Theorem B.17. The
value $X_t(\omega)$ for $\omega \in \Omega_0$ and any $t \notin Q_2^0$ can be defined by

$X_t(\omega) = \lim_{i\to\infty} X_{q_i}(\omega)$

for any sequence $q_i$ from $Q_2^0$ such that $q_i \to t$. This tells us that the
random variables $\{X_t : 0 \le t < \infty\}$ are measurable on $\Omega_0$. To have a
continuous process $X_t$ defined on all of $\Omega_2$, set

$X_t(\omega) = 0$ for $\omega \notin \Omega_0$ and all $t \ge 0$.
The process $\{X_t\}$ is a one-dimensional standard Brownian motion. The
initial value $X_0 = 0$ has been built into the definition, and we have the
path continuity. It remains to check finite-dimensional distributions. Fix
$0 < t_1 < t_2 < \cdots < t_n$. Pick points $0 < q^k_1 < q^k_2 < \cdots < q^k_n$ from $Q_2^0$ so
that $q^k_i \to t_i$ as $k \to \infty$, for $1 \le i \le n$. Pick them so that for some $\delta > 0$,
$\delta \le q^k_i - q^k_{i-1} \le \delta^{-1}$ for all $i$ and $k$. Let $\phi$ be a bounded continuous function
on $\mathbb{R}^n$. Then by the path continuity (use again $t_0 = 0$ and $x_0 = 0$)

$\displaystyle\int_{\Omega_2} \phi(X_{t_1}, X_{t_2}, \ldots, X_{t_n})\, dQ = \lim_{k\to\infty} \int_{\Omega_2} \phi(X_{q^k_1}, X_{q^k_2}, \ldots, X_{q^k_n})\, dQ$
$= \lim_{k\to\infty} \displaystyle\int_{\mathbb{R}^n} \phi(\mathbf{x}) \prod_{i=1}^{n} g_{q^k_i - q^k_{i-1}}(x_i - x_{i-1})\, d\mathbf{x}$
$= \displaystyle\int_{\mathbb{R}^n} \phi(\mathbf{x}) \prod_{i=1}^{n} g_{t_i - t_{i-1}}(x_i - x_{i-1})\, d\mathbf{x} = \int_{\mathbb{R}^n} \phi(\mathbf{x})\, \mu_{\mathbf{t}}(d\mathbf{x}).$

The second-last equality above is a consequence of dominated convergence.
An $L^1(\mathbb{R}^n)$ integrable bound can be gotten from

$g_{q^k_i - q^k_{i-1}}(x_i - x_{i-1}) = \dfrac{\exp\Big( -\dfrac{(x_i - x_{i-1})^2}{2(q^k_i - q^k_{i-1})} \Big)}{\sqrt{2\pi(q^k_i - q^k_{i-1})}} \;\le\; \dfrac{\exp\Big( -\dfrac{(x_i - x_{i-1})^2}{2\delta^{-1}} \Big)}{\sqrt{2\pi\delta}}.$
Comparison with Lemma B.19 shows that $(X_{t_1}, X_{t_2}, \ldots, X_{t_n})$ has the distribution that Brownian motion should have. An application of Lemma
B.4 is also needed here, to guarantee that it is enough to check continuous
functions $\phi$. It follows that $\{X_t\}$ has independent increments, because this
property is built into the definition of the distribution $\mu_{\mathbf{t}}$.
To complete the construction of Brownian motion and finish the proof
of Theorem 2.27, we define the measure $P^0$ on $C$ as the distribution of the
process $X = \{X_t\}$:

$P^0(A) = Q\{ \omega \in \Omega_2 : X(\omega) \in A \}$  for $A \in \mathcal{B}_C$.

For this, $X$ has to be a measurable map from $\Omega_2$ into $C$. This follows
from Exercise 1.7(b), because $\mathcal{B}_C$ is generated by the projections $\pi_t : \omega \mapsto
\omega(t)$. The compositions of the projections with $X$ are precisely the coordinates
$\pi_t \circ X = X_t$, which are measurable functions on $\Omega_2$. The coordinate process
$B = \{B_t\}$ on $C$ has the same distribution as $X$ by the definition of $P^0$:

$P^0\{B \in H\} = P^0(H) = Q\{X \in H\}$  for $H \in \mathcal{B}_C$.

We also need to check that $B$ has the correct relationship with the
filtration $\{\mathcal{F}^B_t\}$, as required by Definition 2.26. $B$ is adapted to $\{\mathcal{F}^B_t\}$ by
construction. The independence of $B_t - B_s$ and $\mathcal{F}^B_s$ follows from two facts:
$B$ inherits independent increments from $X$ (independence of increments is a
property of finite-dimensional distributions), and the increments $\{B_v - B_u :
0 \le u < v \le s\}$ generate $\mathcal{F}^B_s$. This completes the proof of Theorem 2.27.

Hölder continuity (2.27) follows from the equality in distribution of the
processes $B$ and $X$, and because the Hölder property of $X$ was built into
the construction through the Kolmogorov-Čentsov theorem.
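To see the finished object numerically, one can sample the process on a fine dyadic grid exactly as in the construction, via independent Gaussian increments, and check a signature property of Brownian motion from Section 2.2: the quadratic variation over $[0,1]$ concentrates near 1 as the mesh $2^{-n}$ shrinks. (Illustrative simulation added here, not part of the text.)

```python
import random

def sample_dyadic_bm(n, rng):
    """Sample X on the level-n dyadic grid {k 2^{-n} : k = 0, ..., 2^n},
    using independent N(0, 2^{-n}) increments; X_0 = 0 as in (B.13)."""
    dt = 2.0 ** (-n)
    x = [0.0]
    for _ in range(2 ** n):
        x.append(x[-1] + rng.gauss(0.0, dt ** 0.5))
    return x

rng = random.Random(42)
path = sample_dyadic_bm(14, rng)
qv = sum((b - a) ** 2 for a, b in zip(path, path[1:]))  # approximates [B]_1 = 1
```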
Notation and
Conventions
Mathematicians tend to use parentheses ( ), braces { } and square brackets
[ ] interchangeably in many places. But there are situations where the delimiters have specific technical meanings and they cannot be used carelessly.
One example is $(a, b)$ for an ordered pair or an open interval, and $[a, b]$ for a
closed interval. Then there are situations where one type is conventionally
preferred and deviations are discouraged: for example $f(x)$ for a function
value and $\{x : f(x) \in B\}$ for sets. An example where all three forms are
acceptable is the probability of the event $X \in B$: $P(X \in B)$, $P[X \in B]$ or
$P\{X \in B\}$.

With expectations there is the question of the meaning of $EX^2$. Does
the square come before or after the expectation? The notation in this book
follows these conventions:

$EX^2 = E(X^2)$  while  $E[X]^2 = E(X)^2 = \big(E[X]\big)^2.$
Vectors are sometimes boldface: $\mathbf{u} = (u_1, u_2, \ldots, u_n)$ and also $d\mathbf{u} = du_1\, du_2 \cdots du_n$ in multivariate integrals.

$C^2(D)$ is a function space introduced on page 200.

$C^{1,2}([0, T] \times D)$ is a function space introduced on page 204.

$\delta_{i,j} = \mathbf{1}\{i = j\} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j \end{cases}$ is the Kronecker delta, sometimes called the Dirac delta.

$\delta_x(A) = \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{if } x \notin A \end{cases}$ is the point mass at $x$, or the probability
measure that puts all its mass on the point $x$.

$\triangle$ denotes the symmetric difference of two sets:
$A \triangle B = (A \setminus B) \cup (B \setminus A) = (A \cap B^c) \cup (B \cap A^c)$.

$\Delta f(x) = f_{x_1,x_1}(x) + \cdots + f_{x_d,x_d}(x)$ is the Laplace operator.

$\Delta Z(t) = Z(t) - Z(t-)$ is the jump at $t$ of the cadlag process $Z$.
iff is short for if and only if.

$\mathcal{L}_2 = \mathcal{L}_2(M, \mathcal{P})$ is the space of predictable processes locally $L^2$ under $\mu_M$.

$\mathcal{L}(M, \mathcal{P})$ is the space of predictable processes locally in $\mathcal{L}_2(M, \mathcal{P})$, Definition 5.21.

$m$ is Lebesgue measure, Section 1.1.2.

$\mathcal{M}_2$ is the space of cadlag $L^2$-martingales.

$\mathcal{M}_{2,\mathrm{loc}}$ is the space of cadlag, local $L^2$-martingales.

$\mu_M(A) = E \displaystyle\int_{[0,\infty)} \mathbf{1}_A(t, \omega)\, d[M]_t(\omega)$ is the Doléans measure
of a square-integrable cadlag martingale $M$.

$\mathbb{N} = \{1, 2, 3, \ldots\}$ is the set of positive integers.

$\mathcal{P}$ is the predictable $\sigma$-field.

$\mathbb{R}_+ = [0, \infty)$ is the set of nonnegative real numbers.

$\mathcal{S}_2$ is the space of simple predictable processes.

$V_F$ is the total variation of a function $F$, Section 1.1.9.

$X^*_T(\omega) = \sup_{0 \le t \le T} |X_t(\omega)|$.

$X^\tau_t = X_{\tau \wedge t}$ defines the stopped process $X^\tau$.

$\mathbb{Z}_+ = \{0, 1, 2, \ldots\}$ is the set of nonnegative integers.
Index
$\mathcal{L}(M, \mathcal{P})$, 155
$\mu_M$-equivalence, 131
σ-algebra, 2
augmented, 34
Borel, 3
generated by a class of functions, 3
generated by a class of sets, 2
product, 4, 35
σ-field, 2
absolute continuity of measure, 17
abuse of notation, 81, 222
algebra of sets, 2, 305
beta distribution, 22
Blumenthal's 0-1 law, 69
Borel σ-algebra, 3
bounded variation, 14
Brownian bridge, 224
Brownian motion
Blumenthal's 0-1 law, 69
construction, 313
definition, 63
Doleans measure, 130
Gaussian process, 82
geometric, 223
geometric Brownian motion, 227
Hölder continuity, 65, 72
martingale property, 65
modulus of continuity, 74
multidimensional, 64
non-differentiability, 74
quadratic cross-variation, 83
quadratic variation, 74
reflected, 277
reflection principle, 71, 278
running maximum, 71, 277
standard, 63
unbounded variation, 74, 75
Burkholder-Davis-Gundy inequalities, 215
BV function, 14
cadlag, 44
caglad, 44
change of variables, 9
characteristic function (of a set), 6
Chebyshev's inequality, 306
complete measure, 11
conditional expectation
convergence theorems, 308
definition, 26
Jensen's inequality, 29
convex function, 29
cumulative distribution function, 21
diagonal trick, 307
discrete space, 79
distribution function, 21
Doléans measure, 130
dominated convergence theorem, 8
for conditional expectations, 308
generalized, 301
stochastic integral, 175
under convergence in probability, 308
Doob's inequality, 92
equality
almost sure, 22
in distribution, 22
equality in distribution
stochastic processes, 41
exponential distribution, 21
memoryless property, 28, 35
Fatous lemma, 8
for conditional expectations, 308
Feller process, 60
filtration, 39
augmented, 40
complete, 40
left-continuous, 45
right-continuous, 45
usual conditions, 45, 49, 96
Fubini's theorem, 12
fundamental theorem of local martingales,
96
FV process, 44
gamma distribution, 21
gamma function, 22
Gaussian (normal) distribution, 22
Gaussian process, 82
geometric Brownian motion, 223, 227
Gronwall's inequality, 234
for SDEs, 249
Hölder continuity, 65
Hahn decomposition, 14
hitting time, 46
is a stopping time, 49
iff, 324
increasing process, 52
natural, 104
independence, 24
pairwise, 35
indicator function, 6
integral, 6
Lebesgue-Stieltjes, 15
Riemann, 9
Riemann-Stieltjes, 15
integrating factor, 220
integration by parts, 198
Ito equation, 220
Jensen's inequality, 29, 306
Jordan decomposition
of a function, 15
of a signed measure, 14
Kolmogorov's extension theorem, 31
Kolmogorov-Čentsov criterion, 310
Kunita-Watanabe inequality, 53
Lévy's 0-1 law, 94
Laplace operator, 208
law (probability distribution) of a random
variable, 20
Lebesgue integral, 6
Lebesgue measure, 6
Lebesgue-Stieltjes measure, 5, 15
point of increase, 33
Lipschitz continuity, 65, 291
local $L^2$ martingale, 95
local martingale, 95
fundamental theorem, 96
quadratic variation, 98
Markov process, 57, 58
Markov's inequality, 306
martingale
convergence theorem, 94
Doobs inequality, 92
local $L^2$ martingale, 95
local martingale, 95
quadratic variation, 98
semimartingale, 97
spaces of martingales, 105
measurable
function, 2
set, 2
space, 2
measurable rectangle, 12
measure, 4
absolute continuity, 17
complete, 11
Lebesgue, 6
Lebesgue-Stieltjes, 5
product, 12
signed, 13
memoryless property, 35
mesh of a partition, 9
metric
uniform convergence on compacts, 56
monotone convergence theorem, 8
for conditional expectations, 308
normal (Gaussian) distribution, 22
null set, 8, 11, 22
ordinary differential equation (ODE), 220,
228
Ornstein-Uhlenbeck process, 221
pairwise independence, 35
partition, 9, 14
path space, 56
C-space, 56
D-space, 56
point of increase, 33
Poisson process
compensated, 79, 83
Doléans measure, 131
homogeneous, 78
Markov property, 79
martingale characterization, 217
martingales, 83
not predictable, 187
on an abstract space, 77
semigroup, 79
strong Markov property, 79
Poisson random measure, 77
space-time, 218
power set, 2
predictable
σ-field, 128
process, 128, 187
rectangle, 128
predictable covariation, 102
predictable quadratic variation, 102
probability density, 21
probability distribution, 20
beta, 22
exponential, 21
gamma, 21
normal (Gaussian), 22
standard normal, 22
uniform, 21
product σ-algebra, 4
product measure, 12
progressive measurability, 40, 43, 44, 49
quadratic covariation, 50
quadratic variation, 49
local martingales, 98
predictable, 102
semimartingales, 101
Radon-Nikodym derivative
conditional expectation, 26
existence, 17
local time, 269
probability density, 21
Radon-Nikodym theorem, 17
reflection problem, 278
Riemann integral, 9
Riemann sum, 9
semialgebra of sets, 305
semigroup property, 59
semimartingale, 97
separable metric space, 13
simple function, 6
simple predictable process, 114, 132
spaces of martingales, 105
standard normal distribution, 22
stochastic differential, 208
stochastic differential equation (SDE), 219
Brownian bridge, 224
Itô equation, 220
Ornstein-Uhlenbeck process, 221
stochastic exponential, 224
strong solution, 228
stochastic exponential, 224
stochastic integral
cadlag integrand, 162
characterization in terms of quadratic
covariation, 194
continuous local martingale integrator,
159
dominated convergence theorem, 175
FV martingale integrator, 167
integration by parts, 198
irrelevance of time origin, 157
jumps, 172
limit of Riemann sums, 162, 169
local $L^2$ martingale integrator, 156
quadratic (co)variation of, 192
semimartingale integrator, 166
substitution, 195, 196
stochastic interval, 147
stochastic process, 40
cadlag, 44
caglad, 44
continuous, 44
equality in distribution, 41
finite variation (FV), 44
Gaussian, 82
increasing, 52
indistinguishability, 41
modication, 41
progressively measurable, 40
version, 41
stopping time, 41
hitting time, 46, 49
maximum, 42
minimum, 42
Tanaka's formula, 276
Taylor's theorem, 299
total variation
of a function, 14
of a measure, 14
uniform convergence in probability, 108
uniform distribution, 21
uniform integrability, 23, 309
usual conditions, 45, 49, 96
Wiener measure, 64
