Stochastic
Calculus
Alan Bain
1. Introduction
The following notes aim to provide a very informal introduction to Stochastic Calculus, and especially to the Itô integral and some of its applications. They owe a great deal to Dan Crisan's Stochastic Calculus and Applications lectures of 1998, and also much to various books, especially those of L. C. G. Rogers and D. Williams, and Dellacherie and Meyer's multi-volume series 'Probabilités et Potentiel'. They have also benefited from insights gained by attending lectures given by T. Kurtz.
The present notes grew out of a set of typed notes which I produced when revising for the Cambridge Part III course, combining the printed notes and my own handwritten notes into a consistent text. I have subsequently expanded them, inserting some extra proofs from a great variety of sources. The notes principally concentrate on the parts of the course which I found hard; thus there is often little or no comment on more standard matters. As a secondary goal they aim to present the results in a form which can be readily extended. Due to their evolution, they have taken a very informal style; in some ways I hope this may make them easier to read.
The addition of coverage of discontinuous processes was motivated by my interest in
the subject, and much insight gained from reading the excellent book of J. Jacod and
A. N. Shiryaev.
The goal of the notes in their current form is to present a fairly clear approach to the Itô integral with respect to continuous semimartingales, but without any attempt at maximal detail. The various alternative approaches to this subject which can be found in books tend to divide into those presenting the integral directed entirely at Brownian Motion, and those which prove results in complete generality for semimartingales. Here clarity has hopefully been the main goal, rather than completeness; although secretly the approach aims to be readily extended to the discontinuous theory.
I make no apology for proofs which spell out every minute detail, since on a first look at the subject the purpose of some of the steps in a proof often seems elusive. I'd especially like to convince the reader that the Itô integral isn't that much harder in concept than the Lebesgue integral with which we are all familiar. The motivating principle is to try and explain every detail, no matter how trivial it may seem once the subject has been understood!
Passages enclosed in boxes are intended to be viewed as digressions from the main
text; usually describing an alternative approach, or giving an informal description of what
is going on – feel free to skip these sections if you ﬁnd them unhelpful.
In revising these notes I have resisted the temptation to alter the original structure of the development of the Itô integral (although I have corrected unintentional mistakes), since I suspect the more concise proofs which I would favour today would not be helpful on a first approach to the subject.
These notes contain errors with probability one. I always welcome people telling me
about the errors because then I can ﬁx them! I can be readily contacted by email as
alanb@chiark.greenend.org.uk. Also suggestions for improvements or other additions
are welcome.
Alan Bain
2. Contents
1. Introduction
2. Contents
3. Stochastic Processes
3.1. Probability Space
3.2. Stochastic Process
4. Martingales
4.1. Stopping Times
5. Basics
5.1. Local Martingales
5.2. Local Martingales which are not Martingales
6. Total Variation and the Stieltjes Integral
6.1. Why we need a Stochastic Integral
6.2. Previsibility
6.3. Lebesgue-Stieltjes Integral
7. The Integral
7.1. Elementary Processes
7.2. Strictly Simple and Simple Processes
8. The Stochastic Integral
8.1. Integral for H ∈ ℒ and M ∈ ℳ²
8.2. Quadratic Variation
8.3. Covariation
8.4. Extension of the Integral to L²(M)
8.5. Localisation
8.6. Some Important Results
9. Semimartingales
10. Relations to Sums
10.1. The UCP topology
10.2. Approximation via Riemann Sums
11. Itô's Formula
11.1. Applications of Itô's Formula
11.2. Exponential Martingales
12. Lévy Characterisation of Brownian Motion
13. Time Change of Brownian Motion
13.1. Gaussian Martingales
14. Girsanov's Theorem
14.1. Change of measure
15. Brownian Martingale Representation Theorem
16. Stochastic Differential Equations
17. Relations to Second Order PDEs
17.1. Infinitesimal Generator
17.2. The Dirichlet Problem
17.3. The Cauchy Problem
17.4. Feynman-Kac Representation
18. Stochastic Filtering
18.1. Signal Process
18.2. Observation Process
18.3. The Filtering Problem
18.4. Change of Measure
18.5. The Unnormalised Conditional Distribution
18.6. The Zakai Equation
18.7. Kushner-Stratonovich Equation
19. Gronwall's Inequality
20. Kalman Filter
20.1. Conditional Mean
20.2. Conditional Covariance
21. Discontinuous Stochastic Calculus
21.1. Compensators
21.2. RCLL processes revisited
22. References
3. Stochastic Processes
The following notes are a summary of important definitions and results from the theory of stochastic processes; proofs may be found in the usual books, for example [Durrett, 1996].
3.1. Probability Space
Let (Ω, ℱ, P) be a probability space. The set of P-null subsets of Ω is defined by

𝒩 := {N ⊆ Ω : N ⊆ A for some A ∈ ℱ with P(A) = 0}.

The space (Ω, ℱ, P) is said to be complete if for A ⊆ B ⊆ Ω with B ∈ ℱ and P(B) = 0 it follows that A ∈ ℱ.
In addition to the probability space (Ω, ℱ, P), let (E, ℰ) be a measurable space, called the state space, which in many of the cases considered here will be (ℝ, ℬ) or (ℝⁿ, ℬ(ℝⁿ)). A random variable is an ℱ/ℰ measurable function X : Ω → E.
3.2. Stochastic Process
Given a probability space (Ω, ℱ, P) and a measurable state space (E, ℰ), a stochastic process is a family (X_t)_{t≥0} such that X_t is an E-valued random variable for each time t ≥ 0. More formally, it is a map X : (ℝ⁺ × Ω, ℬ⁺ ⊗ ℱ) → (ℝ, ℬ), where ℬ⁺ are the Borel sets of the time space ℝ⁺.

Definition 1. Measurable Process
The process (X_t)_{t≥0} is said to be measurable if the mapping (ℝ⁺ × Ω, ℬ⁺ ⊗ ℱ) → (ℝ, ℬ) : (t, ω) ↦ X_t(ω) is measurable on ℝ⁺ × Ω with respect to the product σ-field ℬ(ℝ⁺) ⊗ ℱ.

Associated with a process is a filtration, an increasing chain of σ-algebras, i.e.

ℱ_s ⊆ ℱ_t if 0 ≤ s ≤ t < ∞.
Define ℱ_∞ by

ℱ_∞ = ⋁_{t≥0} ℱ_t := σ( ⋃_{t≥0} ℱ_t ).

If (X_t)_{t≥0} is a stochastic process, then the natural filtration of (X_t)_{t≥0} is given by

ℱ_t^X := σ(X_s : s ≤ t).

The process (X_t)_{t≥0} is said to be (ℱ_t)_{t≥0} adapted if X_t is ℱ_t measurable for each t ≥ 0. The process (X_t)_{t≥0} is obviously adapted with respect to its natural filtration.
Definition 2. Progressively Measurable Process
A process is progressively measurable if for each t its restriction to the time interval [0, t] is measurable with respect to ℬ_{[0,t]} ⊗ ℱ_t, where ℬ_{[0,t]} is the Borel σ-algebra of subsets of [0, t].
Why on earth is this useful? Consider a non-continuous stochastic process X_t. From the definition of a stochastic process, for each t we have that X_t is ℱ_t measurable. Now define Y_t = sup_{s∈[0,t]} X_s. Is Y a stochastic process? The answer is: not necessarily. σ-fields are only guaranteed closed under countable unions, and an event such as

{Y_t > 1} = ⋃_{0≤s≤t} {X_s > 1}

is an uncountable union. If X were progressively measurable then this would be sufficient to imply that Y_t is ℱ_t measurable. If X has suitable continuity properties, we can restrict the unions which cause problems to be over some dense subset (say the rationals) and this solves the problem. Hence the next theorem.
Theorem 3.3.
Every right (or left) continuous, adapted process is progressively measurable.
Proof
We consider the process X restricted to the time interval [0, s]. On this interval for each n ∈ ℕ we define

X^n_1 := Σ_{k=0}^{2^n−1} 1_{(ks/2^n, (k+1)s/2^n]}(t) X_{ks/2^n}(ω),
X^n_2 := 1_{[0, s/2^n)}(t) X_0(ω) + Σ_{k=1}^{2^n} 1_{[ks/2^n, (k+1)s/2^n)}(t) X_{(k+1)s/2^n}(ω).

Note that X^n_1 is a left continuous process, so if X is left continuous, working pointwise (that is, fixing ω), the sequence X^n_1 converges to X.
But the individual summands in the definition of X^n_1 are, by the adaptedness of X, clearly ℬ_{[0,s]} ⊗ ℱ_s measurable, hence X^n_1 is also. The convergence then implies that X restricted to [0, s] is ℬ_{[0,s]} ⊗ ℱ_s measurable; hence X is progressively measurable.
Consideration of the sequence X^n_2 yields the same result for right continuous, adapted processes.
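As a quick numerical illustration (mine, not from the notes), the left continuous dyadic approximation X^n_1 can be evaluated for a concrete left continuous path; the function names and the sample path are my own choices. At a fixed time t the approximation uses the path's value at the left endpoint of the dyadic interval containing t, and left continuity makes the error shrink as n grows, even at a jump time:

```python
import math

def x(t):
    # A left continuous path on [0, 2] with a jump at t = 1:
    # x(1) = 1 but x(1+) = 2, so x is left continuous there.
    return t if t <= 1.0 else t + 1.0

def dyadic_approx(path, s, n, t):
    # Value at time t of X^n_1 = sum_k 1_{(ks/2^n,(k+1)s/2^n]}(t) path(ks/2^n):
    # pick the left endpoint of the dyadic interval (kh, (k+1)h] containing t.
    if t <= 0.0:
        return path(0.0)
    h = s / 2**n
    k = math.ceil(t / h) - 1
    return path(k * h)

s, t = 2.0, 1.0   # evaluate at the jump time itself
errors = [abs(dyadic_approx(x, s, n, t) - x(t)) for n in (2, 5, 10, 15)]
print(errors)     # decreasing: the left continuous approximation converges at t
```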
The following extra information about filtrations should probably be skipped on a first reading, since it is likely to appear as excess baggage.
Define

∀t ∈ (0, ∞): ℱ_{t−} = σ( ⋃_{0≤s<t} ℱ_s ),  ∀t ∈ [0, ∞): ℱ_{t+} = ⋂_{t<s<∞} ℱ_s,

whence it is clear that for each t, ℱ_{t−} ⊆ ℱ_t ⊆ ℱ_{t+}.
Definition 3.2.
The family {ℱ_t} is called right continuous if

∀t ∈ [0, ∞): ℱ_t = ℱ_{t+}.
Definition 3.3.
A process (X_t)_{t≥0} is said to be bounded if there exists a universal constant K such that for all ω and t ≥ 0, |X_t(ω)| < K.
Definition 3.4.
Let X = (X_t)_{t≥0} be a stochastic process defined on (Ω, ℱ, P), and let X′ = (X′_t)_{t≥0} be a stochastic process defined on (Ω′, ℱ′, P′). Then X and X′ have the same finite dimensional distributions if for all n, 0 ≤ t_1 < t_2 < ⋯ < t_n < ∞, and A_1, A_2, …, A_n ∈ ℰ,

P(X_{t_1} ∈ A_1, X_{t_2} ∈ A_2, …, X_{t_n} ∈ A_n) = P′(X′_{t_1} ∈ A_1, X′_{t_2} ∈ A_2, …, X′_{t_n} ∈ A_n).
Definition 3.5.
Let X and X′ be defined on (Ω, ℱ, P). Then X and X′ are modifications of each other if and only if

P({ω ∈ Ω : X_t(ω) = X′_t(ω)}) = 1 ∀t ≥ 0.
Definition 3.6.
Let X and X′ be defined on (Ω, ℱ, P). Then X and X′ are indistinguishable if and only if

P({ω ∈ Ω : X_t(ω) = X′_t(ω) ∀t ≥ 0}) = 1.
There is a chain of implications:

indistinguishable ⇒ modifications ⇒ same f.d.d.

The following definition provides us with a special name for a process which is indistinguishable from the zero process. It will turn out to be important because many definitions can only be made up to evanescence.
Definition 3.7.
A process X is evanescent if P(X_t = 0 ∀t) = 1.
4. Martingales
Definition 4.1.
Let X = {X_t, ℱ_t, t ≥ 0} be an integrable process. Then X is a
(i) Martingale if and only if E(X_t | ℱ_s) = X_s a.s. for 0 ≤ s ≤ t < ∞;
(ii) Supermartingale if and only if E(X_t | ℱ_s) ≤ X_s a.s. for 0 ≤ s ≤ t < ∞;
(iii) Submartingale if and only if E(X_t | ℱ_s) ≥ X_s a.s. for 0 ≤ s ≤ t < ∞.
Theorem (Kolmogorov) 4.2.
Let X = {X_t, ℱ_t, t ≥ 0} be an integrable process. Define ℱ_{t+} := ⋂_{ε>0} ℱ_{t+ε} and the partial augmentation of ℱ by ℱ̃_t = σ(ℱ_{t+}, 𝒩). Then if t ↦ E(X_t) is continuous, there exists an ℱ̃_t adapted stochastic process X̃ = {X̃_t, ℱ̃_t, t ≥ 0} with sample paths which are right continuous with left limits (CADLAG) such that X and X̃ are modifications of each other.
Definition 4.3.
A martingale X = {X_t, ℱ_t, t ≥ 0} is said to be an L² martingale or a square integrable martingale if E(X_t²) < ∞ for every t ≥ 0.
Definition 4.4.
A process X = {X_t, ℱ_t, t ≥ 0} is said to be L^p bounded if and only if sup_{t≥0} E(|X_t|^p) < ∞. The space of L² bounded martingales is denoted ℳ², and the subspace of continuous L² bounded martingales is denoted ℳ²_c.
Definition 4.5.
A process X = {X_t, ℱ_t, t ≥ 0} is said to be uniformly integrable if and only if

sup_{t≥0} E( |X_t| 1_{|X_t|≥N} ) → 0 as N → ∞.
Orthogonality of Martingale Increments
A frequently used property of a martingale M is the orthogonality of increments property, which states that for a square integrable martingale M, and Y an ℱ_s measurable random variable with E(Y²) < ∞,

E[Y(M_t − M_s)] = 0 for t ≥ s.

Proof
Via the Cauchy–Schwarz inequality, E|Y(M_t − M_s)| < ∞, and so

E(Y(M_t − M_s)) = E( E(Y(M_t − M_s) | ℱ_s) ) = E( Y E(M_t − M_s | ℱ_s) ) = 0.
A typical example is Y = M_s, whence E(M_s(M_t − M_s)) = 0 is obtained. A common application is to the difference of two squares; let t ≥ s, then

E((M_t − M_s)² | ℱ_s) = E(M_t² | ℱ_s) − 2 M_s E(M_t | ℱ_s) + M_s²
                      = E(M_t² − M_s² | ℱ_s) = E(M_t² | ℱ_s) − M_s².
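The orthogonality of increments is easy to check by Monte Carlo for the simplest square integrable martingale, a symmetric random walk; this sketch is my own addition, not part of the notes. Taking Y = M_s, the product Y(M_t − M_s) averages to zero, while the difference-of-squares identity appears as E[(M_t − M_s)²] = E[M_t²] − E[M_s²], here 40 − 20 = 20:

```python
import numpy as np

rng = np.random.default_rng(0)
# Symmetric random walk: a discrete-time square integrable martingale.
steps = rng.choice([-1.0, 1.0], size=(200000, 40))
M = np.cumsum(steps, axis=1)

s_idx, t_idx = 19, 39                       # discrete "times" s < t
Y = M[:, s_idx]                             # F_s-measurable, square integrable
cross = np.mean(Y * (M[:, t_idx] - M[:, s_idx]))    # estimates E[Y(M_t - M_s)]
incr_sq = np.mean((M[:, t_idx] - M[:, s_idx])**2)   # estimates E[(M_t - M_s)^2]
print(cross, incr_sq)    # cross ≈ 0, incr_sq ≈ 20
```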
4.1. Stopping Times
A random variable T : Ω → [0, ∞) is a stopping (optional) time if and only if {ω : T(ω) ≤ t} ∈ ℱ_t for each t ≥ 0.
The following theorem is included as a demonstration of checking for stopping times,
and may be skipped if desired.
Theorem 4.6.
T is a stopping time with respect to ℱ_{t+} if and only if for all t ∈ [0, ∞), the event {T < t} is ℱ_t measurable.
Proof
If T is an ℱ_{t+} stopping time then for all t ∈ (0, ∞) the event {T ≤ t} is ℱ_{t+} measurable. Thus for 1/n < t we have

{T ≤ t − 1/n} ∈ ℱ_{(t−1/n)+} ⊆ ℱ_t,

so

{T < t} = ⋃_{n=1}^∞ {T ≤ t − 1/n} ∈ ℱ_t.

To prove the converse, note that if for each t ∈ [0, ∞) we have {T < t} ∈ ℱ_t, then for each such t

{T < t + 1/n} ∈ ℱ_{t+1/n},

as a consequence of which

{T ≤ t} = ⋂_{n=1}^∞ {T < t + 1/n} ∈ ⋂_{n=1}^∞ ℱ_{t+1/n} = ℱ_{t+}.
Given a stochastic process X = (X_t)_{t≥0}, a stopped process X^T may be defined by

X^T_t(ω) := X_{T(ω)∧t}(ω),

together with the σ-field

ℱ_T := {A ∈ ℱ : A ∩ {T ≤ t} ∈ ℱ_t for all t ≥ 0}.
Theorem (Optional Stopping).
Let X be a right continuous, integrable, ℱ_t adapted process. Then the following are equivalent:
(i) X is a martingale.
(ii) X^T is a martingale for all stopping times T.
(iii) E(X_T) = E(X_0) for all bounded stopping times T.
(iv) E(X_T | ℱ_S) = X_S for all bounded stopping times S and T such that S ≤ T.
If in addition X is uniformly integrable, then (iv) holds for all stopping times (not necessarily bounded).
The condition which is most often forgotten is that in (iii) the stopping time T be bounded. To see why it is necessary, consider B_t a Brownian Motion starting from zero. Let T = inf{t ≥ 0 : B_t = 1}, clearly a stopping time. B_t is a martingale with respect to the filtration generated by B itself, but it is also clear that E(B_T) = 1 ≠ E(B_0) = 0. There is no contradiction with (iii): although T < ∞ a.s., the stopping time T is not bounded.
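A simulation makes the role of boundedness vivid; this sketch is my own addition. Replacing T by the bounded stopping time T ∧ t_max keeps the expectation at E(B_0) = 0 even though a large fraction of paths have already been stopped at level 1; only the unbounded T itself gives E(B_T) = 1.

```python
import numpy as np

rng = np.random.default_rng(1)
dt, n_steps, n_paths = 0.01, 2000, 10000        # Brownian paths on [0, 20]
B = np.cumsum(rng.normal(0.0, np.sqrt(dt), (n_paths, n_steps)), axis=1)

hit = B >= 1.0
hits = hit.any(axis=1)
first_hit = hit.argmax(axis=1)                  # index of first passage (0 if none)
stop_idx = np.where(hits, first_hit, n_steps - 1)
B_stopped = B[np.arange(n_paths), stop_idx]     # B_{T ∧ 20}

print(hits.mean())        # most paths have hit level 1 by time 20
print(B_stopped.mean())   # ≈ 0 = E(B_0): T ∧ 20 is a bounded stopping time
```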
Theorem (Doob's Martingale Inequalities).
Let M = {M_t, ℱ_t, t ≥ 0} be a uniformly integrable martingale, and let M* := sup_{t≥0} |M_t|. Then
(i) Maximal inequality. For λ > 0,

λ P(M* ≥ λ) ≤ E[ |M_∞| 1_{M* ≥ λ} ].

(ii) L^p maximal inequality. For 1 < p < ∞,

‖M*‖_p ≤ (p/(p−1)) ‖M_∞‖_p.

Note that the norm used in stating the Doob L^p inequality is defined by ‖M‖_p = [E(|M|^p)]^{1/p}.
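A quick numerical sanity check of the L^p maximal inequality on a random walk over a finite horizon (a sketch of mine, with p = 2 so that p/(p−1) = 2): the empirical ‖M*‖₂ stays comfortably below 2‖M_n‖₂.

```python
import numpy as np

rng = np.random.default_rng(2)
steps = rng.choice([-1.0, 1.0], size=(100000, 100))
M = np.cumsum(steps, axis=1)

M_star = np.abs(M).max(axis=1)                # sup of |M| over the horizon
lhs = np.sqrt(np.mean(M_star**2))             # empirical ||M*||_2
rhs = 2.0 * np.sqrt(np.mean(M[:, -1]**2))     # (p/(p-1)) ||M_100||_2 with p = 2
print(lhs, rhs)                               # lhs < rhs, with room to spare
```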
Theorem (Martingale Convergence).
Let M = {M_t, ℱ_t, t ≥ 0} be a martingale.
(i) If M is L^p bounded (p ≥ 1), then M_∞(ω) := lim_{t→∞} M_t(ω) exists P-a.s.
(ii) If p = 1 and M is uniformly integrable, then lim_{t→∞} M_t(ω) = M_∞(ω) in L¹. Moreover, for every A ∈ L¹(ℱ_∞) there exists a martingale A_t such that lim_{t→∞} A_t = A and A_t = E(A | ℱ_t); here ℱ_∞ := lim_{t→∞} ℱ_t.
(iii) If p > 1, i.e. M is L^p bounded, then lim_{t→∞} M_t = M_∞ in L^p.
Definition 4.7.
Let ℳ² denote the set of L² bounded CADLAG martingales, i.e. martingales M such that sup_{t≥0} E(M_t²) < ∞. Let ℳ²_c denote the set of L² bounded CADLAG martingales which are continuous. A norm may be defined on the space ℳ² by ‖M‖ := ‖M_∞‖_2, so that ‖M‖² = E(M_∞²).
From the conditional Jensen's inequality, since f(x) = x² is convex,

E(M_∞² | ℱ_t) ≥ (E(M_∞ | ℱ_t))² = M_t².

Hence, taking expectations,

E(M_t²) ≤ E(M_∞²),

and since by martingale convergence in L² we get E(M_t²) → E(M_∞²), it is clear that

E(M_∞²) = sup_{t≥0} E(M_t²).
Theorem 4.8.
The space (ℳ², ‖·‖) (up to equivalence classes defined by modifications) is a Hilbert space, with ℳ²_c a closed subspace.
Proof
We prove this by exhibiting a one-to-one correspondence between ℳ² (the space of L² bounded martingales) and L²(ℱ_∞). The bijection is obtained via

f : ℳ² → L²(ℱ_∞), (M_t)_{t≥0} ↦ M_∞ ≡ lim_{t→∞} M_t,
g : L²(ℱ_∞) → ℳ², M_∞ ↦ M_t ≡ E(M_∞ | ℱ_t).

Notice that

sup_t E(M_t²) = ‖M_∞‖²_2 = E(M_∞²) < ∞,

as M_t is an L² bounded martingale. As L²(ℱ_∞) is a Hilbert space, ℳ² inherits this structure.
To see that ℳ²_c is a closed subspace of ℳ², consider a Cauchy sequence {M^(n)} in ℳ²; equivalently, {M^(n)_∞} is Cauchy in L²(ℱ_∞). Hence M^(n)_∞ converges to a limit, M_∞ say, in L²(ℱ_∞). Let M_t := E(M_∞ | ℱ_t); then (by Doob's L² inequality)

‖ sup_{t≥0} |M^(n)_t − M_t| ‖_2 → 0,

that is, M^(n) → M uniformly in L². Hence there exists a subsequence n(k) such that M^{n(k)} → M uniformly a.s.; as a uniform limit of continuous functions is continuous, M ∈ ℳ²_c. Thus ℳ²_c is a closed subspace of ℳ².
5. Basics
5.1. Local Martingales
A martingale has already been defined, but a weaker definition will prove useful for stochastic calculus. Note that I'll often drop references to the filtration ℱ_t, but this nevertheless forms an essential part of the (local) martingale.
Just before we dive in and define a local martingale, maybe we should pause and consider the reason for studying them. The important property of local martingales will only be seen later in the notes; as we frequently see in this subject, it is one of stability – that is, they are a class of objects which are closed under an operation, in this case under the stochastic integral: an integral of a previsible process with a local martingale integrator is a local martingale.
Definition 5.1.
M = {M_t, ℱ_t, 0 ≤ t ≤ ∞} is a local martingale if and only if there exists a sequence of stopping times T_n tending to infinity such that M^{T_n} is a martingale for each n. The space of local martingales is denoted ℳ_loc, and the subspace of continuous local martingales is denoted ℳ^c_loc.
Recall that a martingale (X_t)_{t≥0} is said to be bounded if there exists a universal constant K such that for all ω and t ≥ 0, |X_t(ω)| < K.
Theorem 5.2.
Every bounded local martingale is a martingale.
Proof
Let T_n be a sequence of stopping times as in the definition of a local martingale. This sequence tends to infinity, so pointwise X^{T_n}_t(ω) → X_t(ω). Using the conditional form of the dominated convergence theorem (with the constant bound as the dominating function), for t ≥ s ≥ 0,

lim_{n→∞} E(X^{T_n}_t | ℱ_s) = E(X_t | ℱ_s).

But as X^{T_n} is a (genuine) martingale, E(X^{T_n}_t | ℱ_s) = X^{T_n}_s = X_{T_n ∧ s}; so

E(X_t | ℱ_s) = lim_{n→∞} E(X^{T_n}_t | ℱ_s) = lim_{n→∞} X_{T_n ∧ s} = X_s.

Hence X_t is a genuine martingale.
Proposition 5.3.
The following are equivalent:
(i) M = {M_t, ℱ_t, 0 ≤ t ≤ ∞} is a continuous martingale.
(ii) M = {M_t, ℱ_t, 0 ≤ t ≤ ∞} is a continuous local martingale and for all t ≥ 0, the set {M_T : T a stopping time, T ≤ t} is uniformly integrable.
Proof
(i) ⇒ (ii): By the optional stopping theorem, if T ≤ t then M_T = E(M_t | ℱ_T); hence the set is uniformly integrable.
(ii) ⇒ (i): It is required to prove that E(M_0) = E(M_T) for any bounded stopping time T. By the local martingale property, for any n,

E(M_0) = E(M_{T∧T_n}),

and uniform integrability then implies that

lim_{n→∞} E(M_{T∧T_n}) = E(M_T).
5.2. Local Martingales which are not Martingales
There do exist local martingales which are not themselves martingales. The following is an example. Let B_t be a d-dimensional Brownian Motion starting from x, with d ≥ 3. It can be shown using Itô's formula that a harmonic function of a Brownian motion is a local martingale (this is on the example sheet). From standard PDE theory it is known that for d ≥ 3 the function

f(x) = 1/|x|^{d−2}

is harmonic (away from the origin), hence X_t = 1/|B_t|^{d−2} is a local martingale. Now consider the L^p norm of this local martingale:

E_x|X_t|^p = ∫ (2πt)^{−d/2} exp( −|y − x|²/2t ) |y|^{−(d−2)p} dy.

Consider when this integral converges. There are no divergence problems for |y| large; the potential problem lies in the vicinity of the origin. Here the term

(2πt)^{−d/2} exp( −|y − x|²/2t )

is bounded, so we only need to consider the remainder of the integrand integrated over a ball of unit radius about the origin, which is bounded by

C ∫_{B(0,1)} |y|^{−(d−2)p} dy

for some constant C, which on transformation into polar coordinates yields a bound of the form

C′ ∫_0^1 r^{−(d−2)p} r^{d−1} dr,

with C′ another constant. This is finite if and only if −(d−2)p + (d−1) > −1 (standard integrals of the form 1/r^k), which in turn requires p < d/(d−2). Since 1 < d/(d−2) for every d ≥ 3, E_x|X_t| is finite for all d ≥ 3.
Now although E_x|X_t| < ∞ and X_t is a local martingale, we shall show that it is not a martingale. Note that (B_t − x) has the same distribution as √t(B_1 − x) under P_x (the probability measure induced by the BM starting from x). So as t → ∞, |B_t| → ∞ in probability and X_t → 0 in probability. As X_t ≥ 0, we see that E_x(X_t) = E_x|X_t| < ∞. Now note that for any R < ∞ we can construct a bound

E_x X_t ≤ (2πt)^{−d/2} ∫_{|y|≤R} |y|^{−(d−2)} dy + R^{−(d−2)},

whose first term tends to zero as t → ∞, and hence

limsup_{t→∞} E_x X_t ≤ R^{−(d−2)}.

As R was chosen arbitrarily, we see that E_x X_t → 0. But E_x X_0 = |x|^{−(d−2)} > 0, which implies that E_x X_t is not constant, and hence X_t is not a martingale.
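The decay of E_x X_t can be seen directly by simulation in d = 3, where X_t = 1/|B_t|; the code below is an illustrative sketch of mine, not part of the notes. Starting from x with |x| = 1, so that E_x X_0 = 1, the mean drops steadily toward zero even though every X_t is integrable:

```python
import numpy as np

rng = np.random.default_rng(3)
d, n_paths = 3, 200000
x0 = np.array([1.0, 0.0, 0.0])                  # |x| = 1, so E_x X_0 = 1

means = []
for t in (1.0, 4.0, 16.0):
    B_t = x0 + np.sqrt(t) * rng.normal(size=(n_paths, d))   # B_t under P_x
    X_t = 1.0 / np.linalg.norm(B_t, axis=1)     # X_t = |B_t|^{-(d-2)} with d = 3
    means.append(float(X_t.mean()))
print(means)   # decreasing toward 0: the mean is not constant, so no martingale
```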
6. Total Variation and the Stieltjes Integral
Let A : [0, ∞) → ℝ be a CADLAG (continuous to the right, with left limits) process. Let a partition Π = {t_0, t_1, …, t_m} have 0 = t_0 ≤ t_1 ≤ ⋯ ≤ t_m = t; the mesh of the partition is defined by

δ(Π) = max_{1≤k≤m} |t_k − t_{k−1}|.

The variation of A is then defined as the increasing process V given by

V_t := sup_Π { Σ_{k=1}^{n(Π)} |A_{t_k ∧ t} − A_{t_{k−1} ∧ t}| : 0 = t_0 ≤ t_1 ≤ ⋯ ≤ t_n = t }.

An alternative definition is given via the dyadic partitions:

V⁰_t := lim_{n→∞} Σ_k |A_{k2^{−n} ∧ t} − A_{(k−1)2^{−n} ∧ t}|.

These can be shown to be equivalent for CADLAG processes: trivially (using the dyadic partitions in the supremum) V⁰_t ≤ V_t, and it is also possible to show that V⁰_t ≥ V_t for the total variation of a CADLAG process.
Definition 6.1.
A process A is said to have finite variation if the associated variation process V is finite (i.e. if for every t and every ω, |V_t(ω)| < ∞).
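The dyadic definition V⁰ is easy to evaluate numerically, and doing so (a sketch of mine, anticipating Section 6.1) already shows why martingales will cause trouble: for a smooth path the dyadic sums settle down to the total variation, while for a sampled Brownian path they keep growing as the partition refines, roughly like 2^{n/2}.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 2**16                                   # fine grid on [0, 1]
grid = np.linspace(0.0, 1.0, N + 1)
W = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(1.0 / N), N))])
smooth = np.sin(2 * np.pi * grid)           # total variation 4 on [0, 1]

def dyadic_variation(path, n):
    # sum_k |A_{k 2^-n} - A_{(k-1) 2^-n}| over the level-n dyadic grid
    step = N // 2**n
    pts = path[::step]
    return float(np.abs(np.diff(pts)).sum())

v_smooth = [dyadic_variation(smooth, n) for n in (4, 8, 12)]
v_bm = [dyadic_variation(W, n) for n in (4, 8, 12)]
print(v_smooth)   # ≈ [4, 4, 4]: converges to the total variation
print(v_bm)       # keeps growing: a Brownian path has infinite variation
```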
6.1. Why we need a Stochastic Integral
Before delving into the depths of the integral, it's worth stepping back for a moment to see why the 'ordinary' integral cannot be used on a path at a time basis (i.e. separately for each ω ∈ Ω). Suppose we were to do this, i.e. set

I_t(X) = ∫_0^t X_s(ω) dM_s(ω),

for M ∈ ℳ²_c; but for an interesting martingale (i.e. one which isn't zero a.s.) the total variation is not finite, even on a bounded interval like [0, T]. Thus the Lebesgue-Stieltjes integral definition isn't valid in this case. To generalise we shall see that the quadratic variation is actually the 'right' variation to use: higher variations turn out to be zero and lower ones infinite. This is easy to prove by expressing the pth variation as the limit of a sum and bounding that sum by the supremum of the increments, raised to the power p − 2, multiplied by the sum of squared increments; for a continuous process the first factor tends to zero as the mesh shrinks. But to start, we shall consider integrating a previsible process H_t with an integrator which is an increasing finite variation process. First we shall prove that a continuous local martingale of finite variation is zero.
Proposition 6.2.
If M is a continuous local martingale of finite variation, starting from zero, then M is identically zero.
Proof
Let V be the variation process of M. This V is a continuous, adapted process. Now define a sequence of stopping times S_n as the first time V exceeds n, i.e. S_n := inf{t ≥ 0 : V_t ≥ n}. Then the martingale M^{S_n} is of bounded variation. It therefore suffices to prove the result for a bounded, continuous martingale M of bounded variation.
Fix t ≥ 0 and let {0 = t_0, t_1, …, t_N = t} be a partition of [0, t]. Since M_0 = 0 it is clear that M_t² = Σ_{k=1}^N (M²_{t_k} − M²_{t_{k−1}}). Then via orthogonality of martingale increments,

E(M_t²) = E( Σ_{k=1}^N (M_{t_k} − M_{t_{k−1}})² ) ≤ E( V_t sup_k |M_{t_k} − M_{t_{k−1}}| ).

The term inside the expectation is bounded (from the definition of the stopping time S_n, V_t ≤ n and each increment of M is at most 2n), and by continuity sup_k |M_{t_k} − M_{t_{k−1}}| → 0 as the mesh of the partition tends to zero; hence the expectation converges to zero by the bounded convergence theorem. Hence M ≡ 0.
6.2. Previsibility
The term previsible has crept into the discussion earlier. Now is the time for a proper definition.
Definition 6.3.
The previsible (or predictable) σ-field 𝒫 is the σ-field on ℝ⁺ × Ω generated by the processes (X_t)_{t≥0}, adapted to ℱ_t, with left continuous paths on (0, ∞).
Remark
The same σ-field is generated by left continuous, right limits processes (i.e. càglàd processes) which are adapted to ℱ_{t−}, or indeed by continuous processes (X_t)_{t≥0} which are adapted to ℱ_{t−}. It is also generated by sets of the form A × (s, t] where A ∈ ℱ_s. It should be noted that càdlàg processes generate the optional σ-field, which is usually different.
Theorem 6.4.
The previsible σ-field is also generated by the collection of random sets A × {0} where A ∈ ℱ_0, and A × (s, t] where A ∈ ℱ_s.
Proof
Let the σ-field generated by the above collection of sets be denoted 𝒫′. We shall show 𝒫 = 𝒫′. Let X be a left continuous process; define for n ∈ ℕ

X^n = X_0 1_{{0}}(t) + Σ_k X_{k/2^n} 1_{(k/2^n, (k+1)/2^n]}(t).

It is clear that X^n is 𝒫′ measurable. As X is left continuous, the above sequence of left continuous processes converges pointwise to X, so X is 𝒫′ measurable; thus 𝒫 ⊆ 𝒫′. Conversely, consider the indicator function of A × (s, t]; this can be written as 1_{[0, t_A] \ [0, s_A]}, where s_A(ω) := s for ω ∈ A and +∞ otherwise (and t_A similarly). These indicator functions are adapted and left continuous, hence 𝒫′ ⊆ 𝒫.
Definition 6.5.
A process (X_t)_{t≥0} is said to be previsible if the mapping (t, ω) ↦ X_t(ω) is measurable with respect to the previsible σ-field 𝒫.
6.3. Lebesgue-Stieltjes Integral
[In the lecture notes for this course, the Lebesgue-Stieltjes integral is considered first for functions A and H; here I consider processes on a pathwise basis.]
Let A be an increasing CADLAG process. This induces a Borel measure dA on (0, ∞) such that

dA((s, t])(ω) = A_t(ω) − A_s(ω).

Let H be a previsible process (as defined above). The Lebesgue-Stieltjes integral of H is defined with respect to an increasing process A by

(H · A)_t(ω) = ∫_0^t H_s(ω) dA_s(ω),

whenever H ≥ 0 or (|H| · A)_t < ∞.
As a notational aside, we shall write

(H · X)_t ≡ ∫_0^t H dX,

and later on we shall use

d(H · X) ≡ H dX.

This definition may be extended to integrators of finite variation which are not increasing, by decomposing the process A of finite variation into a difference of two increasing processes, so A = A⁺ − A⁻, where A^± = (V ± A)/2 (here V is the total variation process for A). The integral of H with respect to the finite variation process A is then defined by

(H · A)_t(ω) := (H · A⁺)_t(ω) − (H · A⁻)_t(ω),

whenever (|H| · V)_t < ∞.
There are no really new concepts of the integral in the foregoing; it is basically the Lebesgue-Stieltjes integral extended from functions H(t) to processes in a pathwise fashion (that's why ω has been included in those definitions as a reminder).
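For finite variation integrators the pathwise definition is just the familiar Stieltjes sum; here is a minimal numerical sketch of mine on a single path, approximating (H · A)_t by left endpoint sums on a grid (left endpoints matching the previsibility of the integrand). For H_s = s and A_s = s², the exact value is ∫_0^1 s d(s²) = ∫_0^1 2s² ds = 2/3.

```python
import numpy as np

def stieltjes_sum(H, A):
    # sum_k H_{t_k} (A_{t_{k+1}} - A_{t_k}): H sampled at the left endpoint,
    # as befits a previsible integrand.
    return float(np.sum(H[:-1] * np.diff(A)))

t = np.linspace(0.0, 1.0, 100001)
A = t**2                        # increasing, continuous integrator
H = t                           # integrand H_s = s
approx = stieltjes_sum(H, A)
print(approx)                   # ≈ 2/3
```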
Theorem 6.6.
If X is a non-negative continuous local martingale and E(X_0) < ∞, then X_t is a supermartingale. If additionally X has constant mean, i.e. E(X_t) = E(X_0) for all t, then X_t is a martingale.
Proof
As X_t is a continuous local martingale, there is a sequence of stopping times T_n ↑ ∞ such that X^{T_n} is a genuine martingale. From this martingale property

E(X^{T_n}_t | ℱ_s) = X^{T_n}_s.

As X_t ≥ 0 we can apply the conditional form of Fatou's lemma, so

E(X_t | ℱ_s) = E( liminf_{n→∞} X^{T_n}_t | ℱ_s ) ≤ liminf_{n→∞} E(X^{T_n}_t | ℱ_s) = liminf_{n→∞} X^{T_n}_s = X_s.

Hence E(X_t | ℱ_s) ≤ X_s, so X_t is a supermartingale.
Given the constant mean property, E(X_t) = E(X_s). Let

A_n := {ω : X_s − E(X_t | ℱ_s) > 1/n},

so

A := ⋃_{n=1}^∞ A_n = {ω : X_s − E(X_t | ℱ_s) > 0}.

Consider P(A) = P(⋃_{n=1}^∞ A_n) ≤ Σ_{n=1}^∞ P(A_n). Suppose for some n, P(A_n) = ε > 0; then note that

on A_n: X_s − E(X_t | ℱ_s) > 1/n,
on Ω \ A_n: X_s − E(X_t | ℱ_s) ≥ 0 (by the supermartingale property).

Hence

X_s − E(X_t | ℱ_s) ≥ (1/n) 1_{A_n};

taking expectations yields

E(X_s) − E(X_t) ≥ ε/n > 0,

but by the constant mean property the left hand side is zero; hence a contradiction. Thus all the P(A_n) are zero, so

X_s = E(X_t | ℱ_s) a.s.
7. The Integral
We would like eventually to extend the definition of the integral to integrands which are previsible processes and integrators which are semimartingales (to be defined later in these notes). In fact in these notes we'll only get as far as continuous semimartingales. It is possible to go the whole way and define the integral of a previsible process with respect to a general semimartingale, but some extra problems are thrown up on the way, in particular as regards the construction of the quadratic variation process of a discontinuous process.
Various special classes of process will be needed in the sequel, and these are all defined here for convenience. Naturally, with terms like 'elementary' and 'simple' occurring, many books have different names for the same concepts – so beware!
7.1. Elementary Processes
An elementary process H_t(ω) is one of the form

H_t(ω) = Z(ω) 1_{(S(ω), T(ω)]}(t),

where S, T are stopping times, S ≤ T ≤ ∞, and Z is a bounded ℱ_S measurable random variable.
Such a process is the simplest non-trivial example of a previsible process. Let's prove that it is previsible:
H is clearly a left continuous process, so we need only show that it is adapted. It can be considered as the pointwise limit of the sequence of right continuous processes

H^n = Z 1_{[S_n, T_n)}, where S_n = S + 1/n, T_n = T + 1/n.

So it is sufficient to show that Z 1_{[U,V)} is adapted when U and V are stopping times which satisfy U ≤ V, and Z is a bounded ℱ_U measurable function. Let B be a Borel set of ℝ not containing zero (the case 0 ∈ B follows by taking complements); then the event

{Z 1_{[U,V)}(t) ∈ B} = [{Z ∈ B} ∩ {U ≤ t}] ∩ {V > t}.

By the definition of U as a stopping time, and hence the definition of ℱ_U, the event enclosed by square brackets is in ℱ_t, and since V is a stopping time, {V > t} = Ω \ {V ≤ t} is also in ℱ_t; hence Z 1_{[U,V)} is adapted.
7.2. Strictly Simple and Simple Processes
A process H is strictly simple (H ∈ ℒ*) if there exist 0 ≤ t_0 ≤ ⋯ ≤ t_n < ∞ and uniformly bounded ℱ_{t_k} measurable random variables Z_k such that

H = H_0(ω) 1_{{0}}(t) + Σ_{k=0}^{n−1} Z_k(ω) 1_{(t_k, t_{k+1}]}(t).
This can be extended: $H$ is a simple process ($H \in \mathcal{L}$) if there exists a sequence
of stopping times $0 \le T_0 \le \cdots \le T_k \to \infty$, and $Z_k$ uniformly bounded $\mathcal{F}_{T_k}$-measurable
random variables, such that
$$H_t(\omega) = H_0(\omega)1_{\{0\}}(t) + \sum_{k=0}^{\infty} Z_k(\omega)1_{(T_k,T_{k+1}]}(t).$$
Similarly a simple process is also a previsible process. The fundamental result will
follow from the fact that the σ-algebra generated by the simple processes is exactly the
previsible σ-algebra. We shall see the application of this after the next section.
8. The Stochastic Integral
As has been hinted at earlier the stochastic integral must be built up in stages, and to
start with we shall consider integrators which are $L^2$-bounded martingales, and integrands
which are simple processes.
8.1. Integral for $H \in \mathcal{L}$ and $M \in \mathcal{M}^2$
For a simple process $H \in \mathcal{L}$, and $M$ an $L^2$-bounded martingale, the integral may be
defined by the 'martingale transform' (c.f. discrete martingale theory)
$$\int_0^t H_s\,dM_s = (H \cdot M)_t := \sum_{k=0}^{\infty} Z_k\left(M_{T_{k+1}\wedge t} - M_{T_k\wedge t}\right).$$
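In discrete time this is literally the martingale transform. A minimal numerical sketch (all names hypothetical; numpy assumed), with a simple random walk as $M$ and a previsible integrand that uses only information available before each step:

```python
import numpy as np

rng = np.random.default_rng(0)

# A simple random walk M_t (a martingale), t = 0, 1, ..., T.
T = 1000
steps = rng.choice([-1.0, 1.0], size=T)
M = np.concatenate([[0.0], np.cumsum(steps)])

# A previsible integrand: Z_k may depend only on information up to time k,
# here Z_k = sign(M_k), known before the step from k to k+1 is revealed.
Z = np.sign(M[:-1])

# The martingale transform (Z . M)_t = sum_{k<t} Z_k (M_{k+1} - M_k).
HdotM = np.concatenate([[0.0], np.cumsum(Z * np.diff(M))])

# The transform is again a process started at zero with bounded increments.
assert HdotM[0] == 0.0
assert np.all(np.abs(np.diff(HdotM)) <= 1.0)
```

The continuous-time definition above replaces the integer grid by the stopping times $T_k$, but the algebra is the same.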
Proposition 8.1.
Let $H$ be a simple process, $M$ an $L^2$-bounded martingale, and $T$ a stopping time. Then:
(i) $(H \cdot M)^T = (H1_{(0,T]} \cdot M) = (H \cdot M^T)$.
(ii) $(H \cdot M) \in \mathcal{M}^2$.
(iii) $E\left[(H \cdot M)_\infty^2\right] = \sum_{k=0}^{\infty} E\left[Z_k^2\left(M_{T_{k+1}}^2 - M_{T_k}^2\right)\right] \le \|H\|_\infty^2\,E\left(M_\infty^2\right)$.
Proof
Part (i)
As $H \in \mathcal{L}$ we can write
$$H = \sum_{k=0}^{\infty} Z_k 1_{(T_k,T_{k+1}]},$$
for $T_k$ stopping times, and $Z_k$ an $\mathcal{F}_{T_k}$-measurable bounded random variable. By our
definition for $M \in \mathcal{M}^2$, we have
$$(H \cdot M)_t = \sum_{k=0}^{\infty} Z_k\left(M_{T_{k+1}\wedge t} - M_{T_k\wedge t}\right),$$
and so, for $T$ a general stopping time, consider $(H \cdot M)^T_t = (H \cdot M)_{T\wedge t}$:
$$(H \cdot M)^T_t = \sum_{k=0}^{\infty} Z_k\left(M_{T_{k+1}\wedge T\wedge t} - M_{T_k\wedge T\wedge t}\right).$$
Similar computations can be performed for $(H \cdot M^T)$, noting that $M^T_t = M_{T\wedge t}$, and for
$(H1_{(0,T]} \cdot M)$, yielding the same result in both cases. Hence
$$(H \cdot M)^T = (H1_{(0,T]} \cdot M) = (H \cdot M^T).$$
Part (ii)
To prove this result, first we shall establish it for an elementary process $H \in \mathcal{E}$, and then
extend to $\mathcal{L}$ by linearity. Suppose
$$H = Z1_{(R,S]},$$
where $R$ and $S$ are stopping times and $Z$ is a bounded $\mathcal{F}_R$-measurable random variable.
Let $T$ be an arbitrary stopping time. We shall prove that
$$E\left((H \cdot M)_T\right) = E\left((H \cdot M)_0\right),$$
and hence via optional stopping conclude that $(H \cdot M)_t$ is a martingale.
Note that
$$(H \cdot M)_\infty = Z\left(M_S - M_R\right),$$
and hence as $M$ is a martingale, and $Z$ is $\mathcal{F}_R$-measurable, we obtain
$$E(H \cdot M)_\infty = E\left(E\left(Z(M_S - M_R) \mid \mathcal{F}_R\right)\right) = E\left(Z\,E\left(M_S - M_R \mid \mathcal{F}_R\right)\right) = 0.$$
Via part (i) note that $E(H \cdot M)_T = E(H \cdot M^T)_\infty$, so
$$E(H \cdot M)_T = E(H \cdot M^T)_\infty = 0.$$
Thus $(H \cdot M)_t$ is a martingale by the optional stopping theorem. By linearity, this result
extends to $H$ a simple process (i.e. $H \in \mathcal{L}$).
Part (iii)
We wish to prove that $(H \cdot M)$ is an $L^2$-bounded martingale. We again start by considering
$H \in \mathcal{E}$, an elementary process, i.e.
$$H = Z1_{(R,S]},$$
where as before $R$ and $S$ are stopping times, and $Z$ is a bounded $\mathcal{F}_R$-measurable random
variable. Then
$$E\left[(H \cdot M)_\infty^2\right] = E\left[Z^2\left(M_S - M_R\right)^2\right] = E\left[Z^2\,E\left(\left(M_S - M_R\right)^2 \mid \mathcal{F}_R\right)\right],$$
where $Z^2$ is removed from the conditional expectation since it is an $\mathcal{F}_R$-measurable
random variable. Using the same argument as used in the orthogonality of martingale
increments proof,
$$E\left[(H \cdot M)_\infty^2\right] = E\left[Z^2\,E\left(M_S^2 - M_R^2 \mid \mathcal{F}_R\right)\right] = E\left[Z^2\left(M_S^2 - M_R^2\right)\right].$$
As $M$ is an $L^2$-bounded martingale and $Z$ is a bounded process,
$$E\left[(H \cdot M)_\infty^2\right] \le 2\sup_{\omega\in\Omega}|Z(\omega)|^2\,E\left(M_\infty^2\right),$$
so $(H \cdot M)$ is an $L^2$-bounded martingale; together with part (ii), $(H \cdot M) \in \mathcal{M}^2$.
To extend this to simple processes is similar, but requires a little care. In general the
orthogonality of increments argument extends to the case where only finitely many of the
$Z_k$ in the definition of the simple process $H$ are non-zero. Let $K$ be the largest $k$ such that
$Z_k \not\equiv 0$. Then
$$E\left[(H \cdot M)_\infty^2\right] = \sum_{k=0}^{K} E\left[Z_k^2\left(M_{T_{k+1}}^2 - M_{T_k}^2\right)\right],$$
which can be bounded as
$$E\left[(H \cdot M)_\infty^2\right] \le \|H\|_\infty^2\,E\left[\sum_{k=0}^{K}\left(M_{T_{k+1}}^2 - M_{T_k}^2\right)\right] \le \|H\|_\infty^2\,E\left(M_{T_{K+1}}^2 - M_{T_0}^2\right) \le \|H\|_\infty^2\,E\left(M_\infty^2\right),$$
since we require $T_0 = 0$, and $M \in \mathcal{M}^2$, so the final bound is obtained via the $L^2$ martingale
convergence theorem.
Now extend this to the case of an infinite sum; let $n \le m$; we have that
$$(H \cdot M)^{T_m} - (H \cdot M)^{T_n} = \left(H1_{(T_n,T_m]} \cdot M\right);$$
applying the result just proven for finite sums to the right hand side yields
$$\left\|(H \cdot M)^{T_m}_\infty - (H \cdot M)^{T_n}_\infty\right\|_2^2 = \sum_{k=n}^{m-1} E\left[Z_k^2\left(M_{T_{k+1}}^2 - M_{T_k}^2\right)\right] \le \|H\|_\infty^2\,E\left(M_\infty^2 - M_{T_n}^2\right).$$
But by the $L^2$ martingale convergence theorem the right hand side of this bound tends to
zero as $n \to \infty$; hence $(H \cdot M)^{T_n}$ converges in $\mathcal{M}^2$ and the limit must be the pointwise
limit $(H \cdot M)$. Let $n = 0$ and $m \to \infty$ and the result of part (iii) is obtained.
8.2. Quadratic Variation
We mentioned earlier that the total variation is the variation which is used by the usual
LebesgueStieltjes integral, and that this cannot be used for deﬁning a stochastic integral,
since any continuous local martingale of ﬁnite variation is indistinguishable from zero. We
are now going to look at a variation which will prove fundamental for the construction of
the integral. All the deﬁnitions as given here aren’t based on the partition construction.
This is because I shall follow Dellacherie and Meyer and show that the other deﬁnitions
are equivalent by using the stochastic integral.
Theorem 8.2.
The quadratic variation process $\langle M\rangle_t$ of a continuous $L^2$-integrable martingale $M$ is
the unique process $A_t$ starting from zero such that $M_t^2 - A_t$ is a uniformly integrable
martingale.
Proof
For each $n$ define stopping times
$$S^n_0 := 0, \qquad S^n_{k+1} := \inf\left\{t > S^n_k : \left|M_t - M_{S^n_k}\right| > 2^{-n}\right\} \quad \text{for } k \ge 0.$$
Define
$$T^n_k := S^n_k \wedge t.$$
Then
$$M_t^2 = \sum_{k\ge1}\left(M_{t\wedge S^n_k}^2 - M_{t\wedge S^n_{k-1}}^2\right) = \sum_{k\ge1}\left(M_{T^n_k}^2 - M_{T^n_{k-1}}^2\right) = 2\sum_{k\ge1} M_{T^n_{k-1}}\left(M_{T^n_k} - M_{T^n_{k-1}}\right) + \sum_{k\ge1}\left(M_{T^n_k} - M_{T^n_{k-1}}\right)^2. \qquad (*)$$
Now define $H^n$ to be the simple process given by
$$H^n := \sum_{k\ge1} M_{S^n_{k-1}} 1_{(S^n_{k-1},S^n_k]}.$$
We can then think of the first term in the decomposition $(*)$ as $(H^n \cdot M)$. Now define
$$A^n_t := \sum_{k\ge1}\left(M_{T^n_k} - M_{T^n_{k-1}}\right)^2,$$
so the expression $(*)$ becomes
$$M_t^2 = 2(H^n \cdot M)_t + A^n_t. \qquad (**)$$
From the construction of the stopping times $S^n_k$ we have the following properties:
$$\left\|H^n - H^{n+1}\right\|_\infty = \sup_t\left|H^n_t - H^{n+1}_t\right| \le 2^{-(n+1)},$$
$$\left\|H^n - H^{n+m}\right\|_\infty = \sup_t\left|H^n_t - H^{n+m}_t\right| \le 2^{-(n+1)} \quad \text{for all } m \ge 1,$$
$$\left\|H^n - M\right\|_\infty = \sup_t\left|H^n_t - M_t\right| \le 2^{-n}.$$
Let $J_n(\omega)$ be the set of all stopping times $S^n_k(\omega)$, i.e.
$$J_n(\omega) := \left\{S^n_k(\omega) : k \ge 0\right\}.$$
Clearly $J_n(\omega) \subset J_{n+1}(\omega)$. Now for any $m \ge 1$, using Proposition 8.1(iii) the following
result holds:
$$E\left[\left((H^n \cdot M) - (H^{n+m} \cdot M)\right)_\infty^2\right] = E\left[\left(\left(H^n - H^{n+m}\right) \cdot M\right)_\infty^2\right] \le \left\|H^n - H^{n+m}\right\|_\infty^2\,E\left(M_\infty^2\right) \le \left(2^{-(n+1)}\right)^2 E\left(M_\infty^2\right).$$
Thus $(H^n \cdot M)_\infty$ is a Cauchy sequence in the complete Hilbert space $L^2(\mathcal{F}_\infty)$; hence by
completeness of the Hilbert space it converges to a limit in the same space. As $(H^n \cdot M)$
is a continuous martingale for each $n$, so is the limit, $N$ say. By Doob's $L^2$ inequality
applied to the continuous martingale $(H^n \cdot M) - N$,
$$E\left[\sup_{t\ge0}\left|(H^n \cdot M)_t - N_t\right|^2\right] \le 4E\left[\left((H^n \cdot M) - N\right)_\infty^2\right] \to 0 \quad \text{as } n \to \infty.$$
Hence $(H^n \cdot M)$ converges to $N$ uniformly a.s. From the relation $(**)$ we see that as a
consequence of this, the process $A^n$ converges uniformly a.s. to a process $A$, where
$$M_t^2 = 2N_t + A_t.$$
Now we must check that this limit process $A$ is increasing. Clearly $A^n(S^n_k) \le A^n(S^n_{k+1})$,
and since $J_n(\omega) \subset J_{n+1}(\omega)$, it is also true that $A(S^n_k) \le A(S^n_{k+1})$ for all $n$ and $k$, and
so $A$ is certainly increasing on the closure of $J(\omega) := \cup_n J_n(\omega)$. However if $I$ is an open
interval in the complement of $J$, then no stopping time $S^n_k$ lies in this interval, so $M$ must
be constant throughout $I$, and the same is true for the process $A$. Hence the process $A$
is continuous, increasing, and null at zero, such that $M_t^2 - A_t = 2N_t$, where $N_t$ is a UI
martingale (since it is $L^2$ bounded). Thus we have established the existence result. It only
remains to consider uniqueness.
Uniqueness follows from the result that a continuous local martingale of finite variation
is everywhere zero. Suppose the process $A$ in the above definition were not unique; that is,
suppose that also for some $B_t$, continuous and increasing from zero, $M_t^2 - B_t$ is a UI martingale.
Then as $M_t^2 - A_t$ is also a UI martingale, subtracting these two equations shows that
$A_t - B_t$ is a UI martingale, null at zero. It clearly must have finite variation, and hence be
zero.
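The exit-time construction in the proof can be illustrated numerically. The sketch below (hypothetical names, numpy assumed) builds the times $S^n_k$ along a simulated Brownian path, for which $\langle M\rangle_t = t$, and checks that $A^n_1$ is roughly $1$; the discrete grid makes the approximation crude, so only a loose check is made:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate a Brownian path on [0, 1]; for Brownian motion <M>_t = t.
n_steps, dt = 100_000, 1e-5
M = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n_steps))])

# Exit times S^n_k: successive grid times at which M has moved more than
# 2^-n from its value at the previous exit time (as in the proof above).
n = 5
eps = 2.0 ** -n
A = 0.0            # A^n_1 = sum of squared increments between exit times
last = M[0]
for x in M[1:]:
    if abs(x - last) > eps:
        A += (x - last) ** 2
        last = x

# A^n_1 roughly reproduces <M>_1 = 1 (crudely, on a discrete grid).
assert 0.5 < A < 1.8
```

As $n$ grows (and the simulation grid is refined with it), the sum of squared increments between exit times tightens around $t$, which is the uniform a.s. convergence $A^n \to A$ used above.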
The following corollary will be needed to prove the integration by parts formula, and
can be skipped on a ﬁrst reading; however it is clearer to place it here, since this avoids
having to redeﬁne the notation.
Corollary 8.3.
Let $M$ be a bounded continuous martingale, starting from zero. Then
$$M_t^2 = 2\int_0^t M_s\,dM_s + \langle M\rangle_t.$$
Proof
In the construction of the quadratic variation process the quadratic variation was
constructed as the uniform limit in $L^2$ of processes $A^n_t$ such that
$$A^n_t = M_t^2 - 2(H^n \cdot M)_t,$$
where each $H^n$ was a bounded previsible process such that
$$\sup_t\left|H^n_t - M_t\right| \le 2^{-n},$$
and hence $H^n \to M$ in $L^2(M)$, so the martingales $(H^n \cdot M)$ converge to $(M \cdot M)$ uniformly
in $L^2$; hence it follows immediately that
$$M_t^2 = 2\int_0^t M_s\,dM_s + \langle M\rangle_t.$$
Theorem 8.4.
The quadratic variation process $\langle M\rangle_t$ of a continuous local martingale $M$ is the unique
increasing process $A$, starting from zero, such that $M^2 - A$ is a local martingale.
Proof
We shall use a localisation technique to extend the definition of quadratic variation from
$L^2$-bounded martingales to general local martingales.
The mysterious seeming technique of localisation isn't really that complex to understand.
The idea is that it enables us to extend a definition which applies for 'X widgets' to
one valid for 'local X widgets'. It achieves this by using a sequence of stopping times which
reduce the 'local X widgets' to 'X widgets'; the original definition can then be applied to
the stopped version of the 'X widget'. We only need to check that we can sew up the pieces
without any holes, i.e. that our definition is independent of the choice of stopping times!
Let $T_n = \inf\{t : |M_t| > n\}$ define a sequence of stopping times. Now define
$$\langle M\rangle_t := \langle M^{T_n}\rangle_t \quad \text{for } 0 \le t \le T_n.$$
To check the consistency of this definition note that
$$\langle M^{T_n}\rangle^{T_{n-1}} = \langle M^{T_{n-1}}\rangle,$$
and since the sequence of stopping times $T_n \to \infty$, we see that $\langle M\rangle$ is defined for all $t$.
Uniqueness follows from the result that any finite variation continuous local martingale
starting from zero is identically zero.
The quadratic variation turns out to be the 'right' sort of variation to consider for
a martingale, since we have already shown that all but the zero martingale have infinite
total variation, and it can be shown that the higher order variations of a martingale are
zero a.s. Note that the definition given is for a continuous local martingale; we shall
see later how to extend this to a continuous semimartingale.
8.3. Covariation
From the deﬁnition of the quadratic variation of a local martingale we can deﬁne the covari
ation of two local martingales N and M which are locally L
2
bounded via the polarisation
identity
'M, N` :=
'M +N` −'M −N`
4
.
We need to generalise this slightly, since the above deﬁnition required the quadratic
variation terms to be ﬁnite. We can prove the following theorem in a straightforward
manner using the deﬁnition of quadratic variation above, and this will motivate the general
deﬁnition of the covariation process.
Theorem 8.5.
For $M$ and $N$ two local martingales which are locally $L^2$ bounded, there exists a unique
finite variation process $A$ starting from zero such that $MN - A$ is a local martingale. This
process $A$ is the covariation of $M$ and $N$.
This theorem is turned round to give the usual definition of the covariation process of
two continuous local martingales as:
Definition 8.6.
For two continuous local martingales $N$ and $M$, there exists a unique finite variation
process $A$ such that $MN - A$ is a local martingale. The covariation process of $N$ and $M$
is defined as this process $A$.
It can readily be verified that the covariation process can be regarded as a symmetric
bilinear form on the space of local martingales, i.e. for $L$, $M$ and $N$ continuous local
martingales
$$\langle M+N, L\rangle = \langle M, L\rangle + \langle N, L\rangle,$$
$$\langle M, N\rangle = \langle N, M\rangle,$$
$$\langle \lambda M, N\rangle = \lambda\langle M, N\rangle, \qquad \lambda \in \mathbb{R}.$$
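The polarisation identity can be checked numerically for two correlated Brownian motions, where $\langle M, N\rangle_t = \rho t$. A sketch (hypothetical names, numpy assumed), with the quadratic variations approximated by sums of squared increments:

```python
import numpy as np

rng = np.random.default_rng(2)

# Two correlated Brownian motions: N = rho*M + sqrt(1-rho^2)*W,
# for which <M, N>_t = rho * t.
dt, rho, n_steps = 1e-4, 0.6, 10_000
dB1 = rng.normal(0.0, np.sqrt(dt), n_steps)
dB2 = rng.normal(0.0, np.sqrt(dt), n_steps)
dM = dB1
dN = rho * dB1 + np.sqrt(1 - rho**2) * dB2

# Polarisation: <M, N> = (<M + N> - <M - N>) / 4, with each quadratic
# variation approximated by the sum of squared increments.
qv = lambda d: np.sum(d**2)
cov = (qv(dM + dN) - qv(dM - dN)) / 4

t = n_steps * dt   # = 1.0
assert abs(cov - rho * t) < 0.1
```

Note that algebraically $((a+b)^2 - (a-b)^2)/4 = ab$, so the polarised sum is exactly the sum of products of increments, which is the discrete analogue of the covariation.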
8.4. Extension of the Integral to $L^2(M)$
We have previously defined the integral for $H$ a simple process (in $\mathcal{L}$), and $M \in \mathcal{M}^2_c$, and
we have noted that $(H \cdot M)$ is itself in $\mathcal{M}^2$. Hence
$$E\left[(H \cdot M)_\infty^2\right] = E\left[\sum_i Z_{i-1}^2\left(M_{T_i} - M_{T_{i-1}}\right)^2\right].$$
Recall that for $M \in \mathcal{M}^2$, $M^2 - \langle M\rangle$ is a uniformly integrable martingale. Hence for
$S$ and $T$ stopping times such that $S \le T$,
$$E\left[\left(M_T - M_S\right)^2 \mid \mathcal{F}_S\right] = E\left(M_T^2 - M_S^2 \mid \mathcal{F}_S\right) = E\left(\langle M\rangle_T - \langle M\rangle_S \mid \mathcal{F}_S\right).$$
So summing we obtain
$$E\left[(H \cdot M)_\infty^2\right] = E\left[\sum_i Z_{i-1}^2\left(\langle M\rangle_{T_i} - \langle M\rangle_{T_{i-1}}\right)\right] = E\left[\left(H^2 \cdot \langle M\rangle\right)_\infty\right].$$
In the light of this, we define a seminorm $\|H\|_M$ via
$$\|H\|_M = \left[E\left(\left(H^2 \cdot \langle M\rangle\right)_\infty\right)\right]^{1/2} = \left[E\int_0^\infty H_s^2\,d\langle M\rangle_s\right]^{1/2}.$$
The space $L^2(M)$ is then defined as the subspace of the previsible processes where this
seminorm is finite, i.e.
$$L^2(M) := \left\{\text{previsible processes } H \text{ such that } \|H\|_M < \infty\right\}.$$
However we would actually like to be able to treat this as a Hilbert space, and there
remains a problem, namely that if $X \in L^2(M)$ and $\|X\|_M = 0$, this doesn't imply that $X$
is the zero process. Thus we follow the usual route of defining an equivalence relation via
$X \sim Y$ if and only if $\|X - Y\|_M = 0$. We now define
$$L^2(M) := \left\{\text{equivalence classes of previsible processes } H \text{ such that } \|H\|_M < \infty\right\},$$
and this is a Hilbert space with norm $\|\cdot\|_M$ (it can be seen that it is a Hilbert space by
considering it as a suitable $L^2$ space).
This establishes an isometry (called the Itô isometry) between the spaces $L^2(M) \cap \mathcal{L}$
and $L^2(\mathcal{F}_\infty)$ given by
$$I : L^2(M) \cap \mathcal{L} \to L^2(\mathcal{F}_\infty), \qquad I : H \mapsto (H \cdot M)_\infty.$$
Remember that there is a basic bijection between the space $\mathcal{M}^2$ and the Hilbert space
$L^2(\mathcal{F}_\infty)$ in which each square integrable martingale $M$ is represented by its limiting value
$M_\infty$, so the image under the isometry $(H \cdot M)_\infty$ in $L^2(\mathcal{F}_\infty)$ may be thought of as
describing an $\mathcal{M}^2$ martingale. Hence this endows $\mathcal{M}^2$ with a Hilbert space structure,
with an inner product given by
$$(M, N) = E\left(N_\infty M_\infty\right).$$
We shall now use this Itô isometry to extend the definition of the stochastic integral
from $\mathcal{L}$ (the class of simple processes) to the whole of $L^2(M)$. Roughly speaking we shall
approximate an element of $L^2(M)$ via a sequence of simple processes converging to it, just
as in the construction of the Lebesgue integral. In doing this, we shall use the Monotone
Class Theorem.
Recall that in the conventional construction of Lebesgue integration, and proof of
the elementary results, the following standard machine is repeatedly invoked. To prove a
'linear' result for all $h \in L^1(S, \Sigma, \mu)$, proceed in the following way:
(i) Show the result is true for $h$ an indicator function.
(ii) Show that by linearity the result extends to all positive step functions.
(iii) Use the Monotone Convergence Theorem to see that if $h_n \uparrow h$, where the $h_n$ are
step functions, then the result must also be true for $h$ a non-negative, $\Sigma$-measurable
function.
(iv) Write $h = h^+ - h^-$, where both $h^+$ and $h^-$ are non-negative functions, and use linearity
to obtain the result for $h \in L^1$.
The monotone class lemma is a replacement for this procedure, which hides away all
the 'machinery' used in the constructions.
Monotone Class Theorem.
Let $\mathcal{A}$ be a π-system generating the σ-algebra $\mathcal{F}$ (i.e. $\sigma(\mathcal{A}) = \mathcal{F}$). If $\mathcal{H}$ is a linear set of
bounded functions from $\Omega$ to $\mathbb{R}$ satisfying
(i) $1_A \in \mathcal{H}$, for all $A \in \mathcal{A}$,
(ii) $0 \le f_n \uparrow f$, where $f_n \in \mathcal{H}$ and $f$ is a bounded function $f : \Omega \to \mathbb{R}$, implies that
$f \in \mathcal{H}$,
then $\mathcal{H}$ contains every bounded, $\mathcal{F}$-measurable function $f : \Omega \to \mathbb{R}$.
In order to apply this in our case, we need to prove that the σ-algebra of previsible
processes is that generated by the simple processes.
The Previsible σ-field and the Simple Processes
It is fairly simple to show that the space of simple processes $\mathcal{L}$ forms a vector space
(exercise: check linearity, constant multiples and zero).
Lemma 8.7.
The σ-algebra generated by the simple processes is the previsible σ-algebra, i.e. the
previsible σ-algebra is the smallest σ-algebra with respect to which every simple process is
measurable.
Proof
It suffices to show that every left continuous right limit process which is bounded and
adapted to $\mathcal{F}_t$ is measurable with respect to the σ-algebra generated by the simple
processes. Let $H_t$ be a bounded left continuous right limits process; then
$$H = \lim_{k\to\infty}\lim_{n\to\infty}\sum_{i=2}^{nk} H_{(i-1)/n}\,1_{\left(\frac{i-1}{n},\frac{i}{n}\right]},$$
and if $H_t$ is adapted to $\mathcal{F}_t$ then $H_{(i-1)/n}$ is a bounded element of $\mathcal{F}_{(i-1)/n}$.
We can now apply the Monotone Class Theorem to the vector space $\mathcal{H}$ of processes
with a time parameter in $(0,\infty)$, regarded as maps from $(0,\infty) \times \Omega \to \mathbb{R}$. Then if this
vector space contains all the simple processes, i.e. $\mathcal{L} \subset \mathcal{H}$, then $\mathcal{H}$ contains every bounded
previsible process on $(0,\infty)$.
Assembling the Pieces
Since $I$ is an isometry it has a unique extension to the closure $\bar{\mathcal{U}}$ of
$$\mathcal{U} = L^2(M) \cap \mathcal{L}$$
in $L^2(M)$. By the application of the monotone class lemma to $\mathcal{H} = \bar{\mathcal{U}}$ and the π-system
of simple processes, we see that $\bar{\mathcal{U}}$ must contain every bounded previsible process; hence
$\bar{\mathcal{U}} = L^2(M)$. Thus the Itô isometry extends to a map from $L^2(M)$ to $L^2(\mathcal{F}_\infty)$.
Let us look at this result more informally. For a previsible $H \in L^2(M)$, because of the
density of $\mathcal{L}$ in $L^2(M)$, we can find a sequence of simple processes $H^n$ which converges to
$H$ as $n \to \infty$. We then consider $I(H)$ as the limit of the $I(H^n)$. To verify that this limit
is unique, suppose that $\tilde{H}^n \to H$ as $n \to \infty$ also, where $\tilde{H}^n \in \mathcal{L}$. Note that $H^n - \tilde{H}^n \in \mathcal{L}$.
Also $H^n - \tilde{H}^n \to 0$ and so $((H^n - \tilde{H}^n) \cdot M) \to 0$, and hence by the Itô isometry the limits
$\lim_{n\to\infty}(H^n \cdot M)$ and $\lim_{n\to\infty}(\tilde{H}^n \cdot M)$ coincide.
The following result is essential in order to extend the integral to continuous local
martingales.
Proposition 8.8.
For $M \in \mathcal{M}^2$, for any $H \in L^2(M)$ and for any stopping time $T$,
$$(H \cdot M)^T = (H1_{(0,T]} \cdot M) = (H \cdot M^T).$$
Proof
Consider the following linear maps in turn:
$$f_1 : L^2(\mathcal{F}_\infty) \to L^2(\mathcal{F}_\infty), \qquad f_1 : Y \mapsto E(Y \mid \mathcal{F}_T).$$
This map is a contraction on $L^2(\mathcal{F}_\infty)$ since by the conditional Jensen inequality
$$\left[E(Y_\infty \mid \mathcal{F}_T)\right]^2 \le E\left(Y_\infty^2 \mid \mathcal{F}_T\right),$$
and taking expectations yields
$$\left\|E(Y \mid \mathcal{F}_T)\right\|_2^2 = E\left[\left(E(Y_\infty \mid \mathcal{F}_T)\right)^2\right] \le E\left[E\left(Y_\infty^2 \mid \mathcal{F}_T\right)\right] = E\left(Y_\infty^2\right) = \|Y\|_2^2.$$
Hence $f_1$ is a contraction on $L^2(\mathcal{F}_\infty)$. Now
$$f_2 : L^2(M) \to L^2(M), \qquad f_2 : H \mapsto H1_{(0,T]}.$$
Clearly from the definition of $\|\cdot\|_M$, and from the fact that the quadratic variation process
is increasing,
$$\left\|H1_{(0,T]}\right\|_M^2 = E\int_0^\infty H_s^2\,1_{(0,T]}(s)\,d\langle M\rangle_s = E\int_0^T H_s^2\,d\langle M\rangle_s \le E\int_0^\infty H_s^2\,d\langle M\rangle_s = \|H\|_M^2.$$
Hence $f_2$ is a contraction on $L^2(M)$. Hence if $I$ denotes the Itô isometry then $f_1 \circ I$ and
$I \circ f_2$ are also contractions from $L^2(M)$ to $L^2(\mathcal{F}_\infty)$ (using the fact that $I$ is an isometry
between $L^2(M)$ and $L^2(\mathcal{F}_\infty)$).
Now introduce $I^{(T)}$, the stochastic integral map associated with $M^T$, i.e.
$$I^{(T)}(H) \equiv (H \cdot M^T)_\infty.$$
Note that
$$\left\|I^{(T)}(H)\right\|_2 = \|H\|_{M^T} \le \|H\|_M.$$
We have previously shown that the maps $f_1 \circ I$ and $I \circ f_2$ and $H \mapsto I^{(T)}(H)$ agree on
the space of simple processes by direct calculation. We note that $\mathcal{L}$ is dense in $L^2(M)$
(from application of the Monotone Class Lemma to the simple processes). Hence from the
three bounds above the three maps agree on $L^2(M)$.
8.5. Localisation
We've already met the idea of localisation in extending the definition of quadratic variation
from $L^2$-bounded continuous martingales to continuous local martingales. In this context
a previsible process $\{H_t\}_{t\ge0}$ is locally previsible if there exists a sequence of stopping
times $T_n \to \infty$ such that for all $n$, $H1_{(0,T_n]}$ is a previsible process. Fairly obviously every
previsible process has this property. However if in addition we want the process $H$ to be
locally bounded we need the condition that there exists a sequence $T_n$ of stopping times,
tending to infinity, such that $H1_{(0,T_n]}$ is uniformly bounded for each $n$.
For the integrator (a martingale of integrable variation say), the localisation is to a
local martingale, that is one which has a sequence of stopping times $T_n \to \infty$ such that
for all $n$, $X^{T_n}$ is a genuine martingale.
If we can prove a result like
$$(H \cdot X)^T = \left(H1_{(0,T]} \cdot X^T\right)$$
for $H$ and $X$ in their original (i.e. non-localised) classes then it is possible to extend the
definition of $(H \cdot X)$ to the local classes.
Note firstly that for $H$ and $X$ local, and $T_n$ a reducing sequence¹ of stopping times
for both $H$ and $X$, we see that $(H1_{(0,T_n]} \cdot X^{T_n})$ is defined in the existing fashion. Also
note that if $T = T_{n-1}$ we can check consistency:
$$\left(H1_{(0,T_n]} \cdot X^{T_n}\right)^{T_{n-1}} = (H \cdot X)^{T_{n-1}} = \left(H1_{(0,T_{n-1}]} \cdot X^{T_{n-1}}\right).$$
Thus it is consistent to define $(H \cdot X)_t$ on $t \in [0,\infty)$ via
$$(H \cdot X)^{T_n} = \left(H1_{(0,T_n]} \cdot X^{T_n}\right), \quad \forall n.$$
We must check that this is well defined, viz. if we choose another regularising sequence $S_n$,
we get the same definition of $(H \cdot X)$. To see this note:
$$\left(H1_{(0,T_n]} \cdot X^{T_n}\right)^{S_n} = \left(H1_{(0,T_n\wedge S_n]} \cdot X^{T_n\wedge S_n}\right) = \left(H1_{(0,S_n]} \cdot X^{S_n}\right)^{T_n},$$
hence the definition of $(H \cdot X)_t$ is the same if constructed from the regularising sequence
$S_n$ as if constructed via $T_n$.
8.6. Some Important Results
We can now extend most of our results to stochastic integrals of a previsible process H
with respect to a continuous local martingale M. In fact in these notes we will never
drop the continuity requirement. It can be done; but it requires considerably more work,
especially with regard to the deﬁnition of the quadratic variation process.
¹ The reducing sequence is the sequence of stopping times tending to infinity which makes the local
version of the object into the non-local version. We can find one such sequence, because if say $\{T_n\}$
reduces $H$ and $\{S_n\}$ reduces $X$ then $T_n \wedge S_n$ reduces both $H$ and $X$.
Theorem 8.9.
Let $H$ be a locally bounded previsible process, and $M$ a continuous local martingale. Let
$T$ be an arbitrary stopping time. Then:
(i) $(H \cdot M)^T = (H1_{(0,T]} \cdot M) = (H \cdot M^T)$
(ii) $(H \cdot M)$ is a continuous local martingale
(iii) $\langle H \cdot M\rangle = H^2 \cdot \langle M\rangle$
(iv) $H \cdot (K \cdot M) = (HK) \cdot M$
Proof
The proof of parts (i) and (ii) follows from the result used in the localisation that
$$(H \cdot M)^T = (H1_{(0,T]} \cdot M) = (H \cdot M^T)$$
for $H$ a bounded previsible process in $L^2(M)$ and $M$ an $L^2$-bounded martingale. Using this
result it suffices to prove (iii) and (iv) where $M$, $H$ and $K$ are uniformly bounded (via
localisation).
Part (iii)
$$E\left[(H \cdot M)_T^2\right] = E\left[\left(\left(H1_{(0,T]}\right) \cdot M\right)_\infty^2\right] = E\left[\left(\left(H1_{(0,T]}\right)^2 \cdot \langle M\rangle\right)_\infty\right] = E\left[\left(H^2 \cdot \langle M\rangle\right)_T\right].$$
Hence we see that $(H \cdot M)^2 - (H^2 \cdot \langle M\rangle)$ is a martingale (via the optional stopping theorem),
and so by uniqueness of the quadratic variation process, we have established
$$\langle H \cdot M\rangle = H^2 \cdot \langle M\rangle.$$
Part (iv)
The truth of this statement is readily established for $H$ and $K$ simple processes (in $\mathcal{L}$). To
extend to $H$ and $K$ bounded previsible processes note that
$$E\left[\left(H \cdot (K \cdot M)\right)_\infty^2\right] = E\left[\left(H^2 \cdot \langle K \cdot M\rangle\right)_\infty\right] = E\left[\left(H^2 \cdot \left(K^2 \cdot \langle M\rangle\right)\right)_\infty\right] = E\left[\left((HK)^2 \cdot \langle M\rangle\right)_\infty\right] = E\left[\left((HK) \cdot M\right)_\infty^2\right].$$
Also note the following bound:
$$E\left[\left((HK)^2 \cdot \langle M\rangle\right)_\infty\right] \le \min\left\{\|H\|_\infty^2\|K\|_M^2,\ \|H\|_M^2\|K\|_\infty^2\right\}.$$
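Part (iv) is an exact algebraic identity already at the level of discrete martingale transforms, which the following sketch (hypothetical names, numpy assumed) verifies path by path:

```python
import numpy as np

rng = np.random.default_rng(4)

# Discrete-time sketch of 8.9(iv): transforming by K and then by H is the
# same as transforming once by the product HK.
T = 500
M = np.concatenate([[0.0], np.cumsum(rng.choice([-1.0, 1.0], size=T))])
H = rng.uniform(-1, 1, T)     # previsible: H_k weights the step (k, k+1]
K = rng.uniform(-1, 1, T)

def transform(Z, X):
    """(Z . X)_t = sum_{k<t} Z_k (X_{k+1} - X_k), returned as a path."""
    return np.concatenate([[0.0], np.cumsum(Z * np.diff(X))])

lhs = transform(H, transform(K, M))   # H . (K . M)
rhs = transform(H * K, M)             # (HK) . M
assert np.allclose(lhs, rhs)
```

The increments of $H \cdot (K \cdot M)$ are $H_k (K_k \Delta M_k)$, which are exactly the increments of $(HK) \cdot M$; the continuous-time statement is the limit of this observation.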
9. Semimartingales
I mentioned at the start of these notes that the most general form of the stochastic integral
would have a previsible process as the integrand and a semimartingale as an integrator.
Now it's time to extend the definition of the Itô integral to the case of semimartingale
integrators.
Definition 9.1.
A process $X$ is a semimartingale if $X$ is an adapted CADLAG process which has a
decomposition
$$X = X_0 + M + A,$$
where $M$ is a local martingale, null at zero, and $A$ is a process null at zero, with paths of
finite variation.
Note that the decomposition is not necessarily unique as there exist martingales which
have finite variation. To remove many of these difficulties we shall impose a continuity
condition, since under this most of our problems will vanish.
Definition 9.2.
A continuous semimartingale is a process $(X_t)_{t\ge0}$ which has a Doob-Meyer decomposition
$$X = X_0 + M + A,$$
where $X_0$ is $\mathcal{F}_0$-measurable, $M_0 = A_0 = 0$, $M_t$ is a continuous local martingale and $A_t$ is
a continuous adapted process of finite variation.
Theorem 9.3.
The Doob-Meyer decomposition in the definition of a continuous semimartingale is unique.
Proof
Let another such decomposition be
$$X = X_0 + M' + A',$$
where $M'$ is a continuous local martingale and $A'$ a continuous adapted process of finite
variation. Then consider the process $N$, where
$$N = M' - M = A - A';$$
by the first equality, $N$ is the difference of two continuous local martingales, and hence is
itself a continuous local martingale; and by the second equality it has finite variation.
Hence by an earlier proposition (5.2) it must be zero. Hence $M' = M$ and $A' = A$.
We define† the quadratic variation of the continuous semimartingale as that of the
continuous local martingale part, i.e. for $X = X_0 + M + A$,
$$\langle X\rangle := \langle M\rangle.$$
† These definitions can be made to look natural by considering the quadratic variation defined in terms
of a sum of squared increments; but following this approach, these are results which are proved later
using the Itô integral, since this provides a better approach to the discontinuous theory.
Similarly if $Y = Y_0 + N + B$ is another semimartingale, where $B$ is finite variation and $N$
is a continuous local martingale, we define
$$\langle X, Y\rangle := \langle M, N\rangle.$$
We can extend the definition of the stochastic integral to continuous semimartingale
integrators by defining
$$(H \cdot X) := (H \cdot M) + (H \cdot A),$$
where the first integral is a stochastic integral as defined earlier and the second is a
Lebesgue-Stieltjes integral (as the integrator is a process of finite variation).
10. Relations to Sums
This section is optional; and is included to bring together the two approaches to the
constructions involved in the stochastic integral.
For example the quadratic variation of a process can either be deﬁned in terms of
martingale properties, or alternatively in terms of sums of squares of increments.
10.1. The UCP topology
We shall meet the notion of convergence uniformly on compacts in probability when
considering stochastic integrals as limits of sums, so it makes sense to review this topology
here.
Definition 10.1.
A sequence $\{H^n\}_{n\ge1}$ converges to a process $H$ uniformly on compacts in probability
(abbreviated u.c.p.) if for each $t > 0$,
$$\sup_{0\le s\le t}\left|H^n_s - H_s\right| \to 0 \quad \text{in probability}.$$
At first sight this may seem to be quite an esoteric definition; in fact it is a natural
extension of convergence in probability to processes. It would also appear to be quite
difficult to handle; however Doob's martingale inequalities provide the key to handling it.
Let
$$H^*_t = \sup_{0\le s\le t}|H_s|;$$
then for $Y^n$ a CADLAG process, $Y^n$ converges to $Y$ u.c.p. iff $(Y^n - Y)^*_t$ converges to
zero in probability for each $t \ge 0$. Thus to prove that a sequence converges u.c.p. it often
suffices to apply Doob's inequality to prove that the supremum converges to zero in $L^2$,
whence it must converge to zero in probability, whence u.c.p. convergence follows.
The space of CADLAG processes with the u.c.p. topology is in fact metrizable; a
compatible metric is given by
$$d(X, Y) = \sum_{n=1}^{\infty}\frac{1}{2^n}E\left(\min\left(1, (X - Y)^*_n\right)\right),$$
for $X$ and $Y$ CADLAG processes. The metric space can also be shown to be complete. For
details see Protter.
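The truncated metric is easy to estimate by Monte Carlo over sampled paths on an integer time grid; a sketch (hypothetical names, numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(5)

# Truncation of the metric d(X, Y) compatible with the u.c.p. topology,
# estimated over sample paths; row i of X is path i on times 0, 1, ..., 10.
def ucp_dist(X, Y, n_terms=10):
    d = 0.0
    for n in range(1, n_terms + 1):
        sup = np.max(np.abs(X[:, :n + 1] - Y[:, :n + 1]), axis=1)  # (X-Y)*_n
        d += 2.0 ** -n * np.mean(np.minimum(1.0, sup))             # E min(1, .)
    return d

X = np.cumsum(rng.normal(0, 1, (1000, 11)), axis=1)
assert ucp_dist(X, X) == 0.0
assert 0.0 < ucp_dist(X, X + 0.01) < 0.01
```

The `min(1, .)` truncation is what makes the metric finite even for unbounded processes, and the weights $2^{-n}$ make far-out time horizons count less, exactly as in the series above.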
Since we have just met a new kind of convergence, it is helpful to recall the other usual
types of convergence on a probability space. For convenience here are the usual deﬁnitions:
Pointwise
A sequence of random variables $X_n$ converges to $X$ pointwise if for all $\omega$ not in some null
set,
$$X_n(\omega) \to X(\omega).$$
Probability
A sequence of r.v.s $X_n$ converges to $X$ in probability if for any $\epsilon > 0$,
$$P\left(|X_n - X| > \epsilon\right) \to 0, \quad \text{as } n \to \infty.$$
$L^p$ convergence
A sequence of random variables $X_n$ converges to $X$ in $L^p$ if
$$E|X_n - X|^p \to 0, \quad \text{as } n \to \infty.$$
It is trivial to see that pointwise convergence implies convergence in probability. It is
also true that $L^p$ convergence implies convergence in probability, as the following theorem
shows.
Theorem 10.2.
If $X_n$ converges to $X$ in $L^p$ for $p > 0$, then $X_n$ converges to $X$ in probability.
Proof
Apply Chebyshev's inequality to $f(x) = x^p$, which yields for any $\epsilon > 0$,
$$P\left(|X_n - X| \ge \epsilon\right) \le \epsilon^{-p}\,E\left(|X_n - X|^p\right) \to 0, \quad \text{as } n \to \infty.$$
Theorem 10.3.
If $X_n \to X$ in probability, then there exists a subsequence $n_k$ such that $X_{n_k} \to X$ a.s.
Theorem 10.4.
If $X_n \to X$ a.s., then $X_n \to X$ in probability.
10.2. Approximation via Riemann Sums
Following Dellacherie and Meyer we shall establish the equivalence of the two constructions
for the quadratic variation by the following theorem which approximates the stochastic
integral via Riemann sums.
Theorem 10.2.
Let $X$ be a semimartingale, and $H$ a locally bounded previsible CADLAG process starting
from zero. Then
$$\int_0^t H_s\,dX_s = \lim_{n\to\infty}\sum_{k=0}^{\infty} H_{t\wedge k2^{-n}}\left(X_{t\wedge(k+1)2^{-n}} - X_{t\wedge k2^{-n}}\right) \quad \text{u.c.p.}$$
Proof
Let $K_s = H_s1_{\{s\le t\}}$, and define the following sequence of simple process approximations:
$$K^n_s := \sum_{k=0}^{\infty} H_{t\wedge k2^{-n}}\,1_{\left(t\wedge k2^{-n},\,t\wedge(k+1)2^{-n}\right]}(s).$$
Clearly this sequence $K^n_s$ converges pointwise to $K_s$. We can decompose the semimartingale
$X$ as $X = X_0 + A_t + M_t$, where $A_t$ is of finite variation and $M_t$ is a continuous local
martingale, both starting from zero. The result that
$$\int_0^t K^n_s\,dA_s \to \int_0^t K_s\,dA_s, \quad \text{u.c.p.}$$
is standard from the Lebesgue-Stieltjes theory. Let $T_k$ be a reducing sequence for the
continuous local martingale $M$ such that $M^{T_k}$ is a bounded martingale. Also since $K$ is
locally bounded we can find a sequence of stopping times $S_k$ such that $K^{S_k}$ is a bounded
previsible process. It therefore suffices to prove for a sequence of stopping times $R_k$ such
that $R_k \uparrow \infty$ that
$$(K^n \cdot M)^{R_k}_s \to (K \cdot M)^{R_k}_s, \quad \text{u.c.p.}$$
By Doob's $L^2$ inequality and the Itô isometry we have
$$E\left[\left((K^n \cdot M) - (K \cdot M)\right)^{*\,2}\right] \le 4E\left[\left((K^n \cdot M) - (K \cdot M)\right)_\infty^2\right] \quad \text{(Doob } L^2\text{)}$$
$$\le 4\left\|K^n - K\right\|_M^2 \quad \text{(Itô isometry)}$$
$$= 4E\int\left(K^n_s - K_s\right)^2 d\langle M\rangle_s.$$
As $|K^n - K| \to 0$ pointwise, and $K$ is bounded, clearly $|K^n - K|$ is also bounded uniformly
in $n$. Hence by the Dominated Convergence Theorem for the Lebesgue-Stieltjes integral
$$\int\left(K^n_s - K_s\right)^2 d\langle M\rangle_s \to 0 \quad \text{a.s.}$$
Hence, we may conclude
$$E\left[\left((K^n \cdot M) - (K \cdot M)\right)^{*\,2}\right] \to 0, \quad \text{as } n \to \infty.$$
So
$$\left((K^n \cdot M) - (K \cdot M)\right)^* \to 0 \quad \text{in } L^2$$
as $n \to \infty$; but this implies that
$$\left((K^n \cdot M) - (K \cdot M)\right)^* \to 0 \quad \text{in probability}.$$
Hence
$$(K^n \cdot M) - (K \cdot M) \to 0 \quad \text{u.c.p.}$$
as required, and putting the two parts together yields
$$\int_0^t K^n_s\,dX_s \to \int_0^t K_s\,dX_s, \quad \text{u.c.p.},$$
which is the required result.
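For Brownian motion the theorem can be seen numerically: the left-point Riemann sums should approach $\int_0^1 W_s\,dW_s = (W_1^2 - 1)/2$, the value obtained from Corollary 8.3. A sketch (hypothetical names, numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(6)

# Riemann sums with left-hand evaluation points converge to the Ito
# integral; for Brownian motion, int_0^t W dW = (W_t^2 - t)/2.
t, n = 1.0, 12                # dyadic mesh 2^-n on [0, 1]
m = 2 ** n
dt = t / m
W = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), m))])

# sum_k W_{k 2^-n} (W_{(k+1)2^-n} - W_{k 2^-n})
riemann = np.sum(W[:-1] * np.diff(W))
exact = (W[-1] ** 2 - t) / 2  # value from Ito calculus on this path
assert abs(riemann - exact) < 0.1
```

Note the evaluation at the left endpoint of each interval: this is what makes the integrand previsible, and using midpoints or right endpoints instead would converge to a different (Stratonovich or backward) integral.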
This result can now be applied to the construction of the quadratic variation process,
as illustrated by the next theorem.
Theorem 10.3.
The quadratic variation process $\langle X\rangle_t$ is equal to the following limit in probability:
$$\langle X\rangle_t = \lim_{n\to\infty}\sum_{k=0}^{\infty}\left(X_{t\wedge(k+1)2^{-n}} - X_{t\wedge k2^{-n}}\right)^2.$$
Proof
In the theorem (8.2) establishing the existence of the quadratic variation process, we noted
in $(**)$ that
$$A^n_t = M_t^2 - 2(H^n \cdot M)_t.$$
Now from application of the previous theorem
$$2\int_0^t X_s\,dX_s = \lim_{n\to\infty}\sum_{k=0}^{\infty} X_{t\wedge k2^{-n}}\left(X_{t\wedge(k+1)2^{-n}} - X_{t\wedge k2^{-n}}\right).$$
In addition,
$$X_t^2 - X_0^2 = \sum_{k=0}^{\infty}\left(X_{t\wedge(k+1)2^{-n}}^2 - X_{t\wedge k2^{-n}}^2\right).$$
The difference of these two equations yields
$$A_t = X_0^2 + \lim_{n\to\infty}\sum_{k=0}^{\infty}\left(X_{t\wedge(k+1)2^{-n}} - X_{t\wedge k2^{-n}}\right)^2,$$
where the limit is taken in probability. Hence the process $A$ is increasing and positive on
the rational numbers, and hence on the whole of $\mathbb{R}$ by right continuity.
Remark
The theorem can be strengthened still further by a result of Doléans-Dade to the effect
that for $X$ a continuous semimartingale
$$\langle X\rangle_t = \lim_{n\to\infty}\sum_{k=0}^{\infty}\left(X_{t\wedge(k+1)2^{-n}} - X_{t\wedge k2^{-n}}\right)^2,$$
where the limit is in the strong sense in $L^1$. This result is harder to prove (essentially the
uniform integrability of the sums must be proven) and this is not done here.
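A quick numerical sanity check of the dyadic sums for Brownian motion, where $\langle X\rangle_t = t$ (hypothetical names, numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(7)

# Brownian path sampled on a fine grid of 2^16 steps over [0, 1].
fine = 2 ** 16
W = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(1.0 / fine), fine))])

def dyadic_qv(path, n):
    """sum_k (X_{(k+1)2^-n} - X_{k 2^-n})^2 along one path, for t = 1."""
    step = fine // 2 ** n
    return float(np.sum(np.diff(path[::step]) ** 2))

# The dyadic sums settle near <W>_1 = 1 as n grows.
for n in (10, 12, 14):
    assert abs(dyadic_qv(W, n) - 1.0) < 0.2
```

Contrast this with the total variation: refining the partition makes the sum of absolute increments blow up like $2^{n/2}$, which is the numerical face of the earlier result that a non-trivial continuous martingale has infinite total variation.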
11. Itô's Formula
Itô's Formula is the analogue of the change of variables formula in the stochastic calculus. It
is also the first place where we see a major difference creep into the theory, and realise that
our formalism has found a new subtlety in the subject.
More importantly, it is the fundamental weapon used to evaluate Itô integrals; we
shall see some examples of this shortly.
The Itô isometry provides a clean-cut definition of the stochastic integral; however it
was originally defined via the following theorem of Kunita and Watanabe.
Theorem (Kunita-Watanabe Identity) 11.1.
Let $M \in \mathcal{M}^2$ and let $H$ and $K$ be locally bounded previsible processes. Then $(H \cdot M)_\infty$ is
the unique element of $L^2(\mathcal{F}_\infty)$ such that for every $N \in \mathcal{M}^2$ we have
$$E\left[(H \cdot M)_\infty N_\infty\right] = E\left[\left(H \cdot \langle M, N\rangle\right)_\infty\right].$$
Moreover we have
$$\langle (H \cdot M), N\rangle = H \cdot \langle M, N\rangle.$$
Proof
Consider an elementary function H, so $H = Z1_{(S,T]}$, where Z is an $\mathcal{F}_S$-measurable bounded random variable, and S and T are stopping times such that $S\le T$. It is clear that
$$E[(H\cdot M)_\infty N_\infty] = E[Z(M_T - M_S)N_\infty] = E[Z(M_T N_T - M_S N_S)] = E[M_\infty(H\cdot N)_\infty].$$
Now by linearity this can be extended to establish the result for all simple functions (in L). We finally extend to general locally bounded previsible H, by considering a sequence (provided it exists) of simple functions $H^n$ such that $H^n\to H$ in $L^2(M)$. Then there exists a subsequence $n_k$ such that $H^{n_k}$ converges to H in $L^2(N)$ also. Then
$$\left|E\left[(H^{n_k}\cdot M)_\infty N_\infty\right] - E\left[(H\cdot M)_\infty N_\infty\right]\right| = \left|E\left[\left((H^{n_k} - H)\cdot M\right)_\infty N_\infty\right]\right| \le \sqrt{E\left[\left((H^{n_k} - H)\cdot M\right)_\infty^2\right]}\sqrt{E\left(N_\infty^2\right)}.$$
By construction $H^{n_k}\to H$ in $L^2(M)$, which means that
$$\left\|H^{n_k} - H\right\|_M \to 0,\quad\text{as } k\to\infty.$$
By the Itô isometry
$$E\left[\left((H^{n_k} - H)\cdot M\right)_\infty^2\right] = \left\|H^{n_k} - H\right\|^2_M \to 0,\quad\text{as } k\to\infty,$$
that is, $(H^{n_k}\cdot M)_\infty\to(H\cdot M)_\infty$ in $L^2$. Hence as N is an $L^2$-bounded martingale, the right hand side of the above expression tends to zero as $k\to\infty$. Similarly, as $(H^{n_k}\cdot N)_\infty\to(H\cdot N)_\infty$ in $L^2$, we see also that
$$E\left[(H^{n_k}\cdot N)_\infty M_\infty\right]\to E\left[(H\cdot N)_\infty M_\infty\right],\quad\text{as } k\to\infty.$$
Hence we can pass to the limit to obtain the result for H.
To prove the second part of the theorem, we shall first show that
$$\langle(H\cdot N),(K\cdot M)\rangle + \langle(K\cdot N),(H\cdot M)\rangle = 2HK\cdot\langle M,N\rangle.$$
By polarisation
$$\langle M,N\rangle = \frac{\langle M+N\rangle - \langle M-N\rangle}{4},$$
and also
$$HK = \frac{(H+K)^2 - (H-K)^2}{4}.$$
Hence
$$2(HK\cdot\langle M,N\rangle) = \frac{1}{8}\left[(H+K)^2 - (H-K)^2\right]\cdot\left[\langle M+N\rangle - \langle M-N\rangle\right].$$
Now we use the result that $\langle(H\cdot M)\rangle = (H^2\cdot\langle M\rangle)$, which was proved previously in theorem (7.9(iii)), to see that
$$2(HK\cdot\langle M,N\rangle) = \frac{1}{8}\Bigl[\langle(H+K)\cdot(M+N)\rangle - \langle(H+K)\cdot(M-N)\rangle - \langle(H-K)\cdot(M+N)\rangle + \langle(H-K)\cdot(M-N)\rangle\Bigr].$$
Considering the first two terms,
\begin{align*}
\langle(H+K)\cdot(M+N)\rangle - \langle(H+K)\cdot(M-N)\rangle &= \langle(H+K)\cdot M + (H+K)\cdot N\rangle - \langle(H+K)\cdot M - (H+K)\cdot N\rangle\\
&= 4\langle(H+K)\cdot M,(H+K)\cdot N\rangle\quad\text{by polarisation}\\
&= 4\left(\langle H\cdot M,H\cdot N\rangle + \langle H\cdot M,K\cdot N\rangle + \langle K\cdot M,H\cdot N\rangle + \langle K\cdot M,K\cdot N\rangle\right).
\end{align*}
Similarly for the second two terms,
\begin{align*}
\langle(H-K)\cdot(M+N)\rangle - \langle(H-K)\cdot(M-N)\rangle &= 4\langle(H-K)\cdot M,(H-K)\cdot N\rangle\quad\text{by polarisation}\\
&= 4\left(\langle H\cdot M,H\cdot N\rangle - \langle H\cdot M,K\cdot N\rangle - \langle K\cdot M,H\cdot N\rangle + \langle K\cdot M,K\cdot N\rangle\right).
\end{align*}
Combining these two yields
$$2(HK\cdot\langle M,N\rangle) = \langle(H\cdot N),(K\cdot M)\rangle + \langle(K\cdot N),(H\cdot M)\rangle.$$
Putting $K\equiv 1$ yields
$$2H\cdot\langle M,N\rangle = \langle H\cdot M,N\rangle + \langle M,H\cdot N\rangle.$$
So it suffices to prove that $\langle(H\cdot M),N\rangle = \langle M,(H\cdot N)\rangle$, which is equivalent to showing that
$$(H\cdot M)N - (H\cdot N)M$$
is a local martingale (from the definition of the covariation process). By localisation it suffices to consider M and N bounded martingales, whence we must check that for all stopping times T,
$$E\left[(H\cdot M)_T N_T\right] = E\left[(H\cdot N)_T M_T\right],$$
but by the first part of the theorem
$$E\left[(H\cdot M)_\infty N_\infty\right] = E\left[(H\cdot N)_\infty M_\infty\right],$$
which is sufficient to establish the result, since
$$(H\cdot M)_T N_T = (H\cdot M)^T_\infty N^T_\infty,\qquad (H\cdot N)_T M_T = (H\cdot N)^T_\infty M^T_\infty.$$
Corollary 11.2.
Let N and M be continuous local martingales and H and K locally bounded previsible processes; then
$$\langle(H\cdot N),(K\cdot M)\rangle = (HK\cdot\langle N,M\rangle).$$
Proof
Note that the covariation is symmetric, hence
$$\langle(H\cdot N),(K\cdot M)\rangle = (H\cdot\langle N,(K\cdot M)\rangle) = (H\cdot\langle(K\cdot M),N\rangle) = (HK\cdot\langle M,N\rangle).$$
We can prove a stochastic calculus analogue of the usual integration by parts formula.
However note that there is an extra term on the right hand side, the covariation of the
processes X and Y . This is the ﬁrst major diﬀerence we have seen between the Stochastic
Integral and the usual Lebesgue Integral.
Before we can prove the general theorem, we need a lemma.
Lemma (Parts for a Finite Variation Process and a Martingale) 11.3.
Let M be a bounded continuous martingale starting from zero, and V a bounded variation process starting from zero. Then
$$M_t V_t = \int_0^t M_s\,dV_s + \int_0^t V_s\,dM_s.$$
Proof
For n fixed, we can write
\begin{align*}
M_t V_t &= \sum_{k\ge1} M_{k2^{-n}\wedge t}\left(V_{k2^{-n}\wedge t} - V_{(k-1)2^{-n}\wedge t}\right) + \sum_{k\ge1} V_{(k-1)2^{-n}\wedge t}\left(M_{k2^{-n}\wedge t} - M_{(k-1)2^{-n}\wedge t}\right)\\
&= \sum_{k\ge1} M_{k2^{-n}\wedge t}\left(V_{k2^{-n}\wedge t} - V_{(k-1)2^{-n}\wedge t}\right) + \int_0^t H^n_s\,dM_s,
\end{align*}
where $H^n$ is the previsible simple process
$$H^n_s = \sum_{k\ge1} V_{(k-1)2^{-n}\wedge t}\,1_{\left((k-1)2^{-n}\wedge t,\;k2^{-n}\wedge t\right]}(s).$$
These $H^n$ are bounded and converge to V by the continuity of V, so as $n\to\infty$ the second term tends to
$$\int_0^t V_s\,dM_s,$$
and by the Dominated Convergence Theorem for Lebesgue-Stieltjes integrals, the first term converges to
$$\int_0^t M_s\,dV_s,$$
as $n\to\infty$.
Theorem (Integration by Parts) 11.4.
For X and Y continuous semimartingales, the following holds:
$$X_t Y_t - X_0 Y_0 = \int_0^t X_s\,dY_s + \int_0^t Y_s\,dX_s + \langle X,Y\rangle_t.$$
Proof
It is trivial to see that it suffices to prove the result for processes starting from zero. Hence let $X_t = M_t + A_t$ and $Y_t = N_t + B_t$ be the Doob-Meyer decompositions, so $M_t$ and $N_t$ are continuous local martingales and $A_t$ and $B_t$ are finite variation processes, all starting from zero. By localisation we can consider the local martingales M and N to be bounded martingales, and the FV processes A and B to have bounded variation. Hence by the usual (finite variation) theory
$$A_t B_t = \int_0^t A_s\,dB_s + \int_0^t B_s\,dA_s.$$
It only remains to prove for bounded martingales M and N starting from zero that
$$M_t N_t = \int_0^t M_s\,dN_s + \int_0^t N_s\,dM_s + \langle M,N\rangle_t.$$
This follows by application of polarisation to corollary (7.3) of the quadratic variation existence theorem. Hence
\begin{align*}
(M_t + A_t)(N_t + B_t) &= M_t N_t + M_t B_t + N_t A_t + A_t B_t\\
&= \int_0^t M_s\,dN_s + \int_0^t N_s\,dM_s + \langle M,N\rangle_t + \int_0^t M_s\,dB_s + \int_0^t B_s\,dM_s\\
&\quad + \int_0^t N_s\,dA_s + \int_0^t A_s\,dN_s + \int_0^t A_s\,dB_s + \int_0^t B_s\,dA_s\\
&= \int_0^t (M_s + A_s)\,d(N_s + B_s) + \int_0^t (N_s + B_s)\,d(M_s + A_s) + \langle M,N\rangle_t.
\end{align*}
Reflect for a moment that this theorem implies another useful closure property of continuous semimartingales: the product $X_t Y_t$ of two continuous semimartingales can be written as a sum of stochastic integrals with respect to continuous semimartingales, and so is itself a continuous semimartingale.
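The integration by parts formula can be seen concretely in a discrete simulation (an added sketch; the correlated Brownian motions X and Y, built from two independent ones with correlation ρ = 0.5, are an assumed example). The left-point sums plus the cross-variation term reproduce the product exactly, because the discrete identity telescopes, while the cross-variation approximates $\langle X,Y\rangle_t = \rho t$:

```python
import numpy as np

# Discrete analogue of  X_t Y_t = int X dY + int Y dX + <X, Y>_t .
# The left-point (Ito) sums plus the sum of increment products equal
# X_t Y_t exactly, by telescoping; sum dX dY ~ <X, Y>_t = rho * t.
rng = np.random.default_rng(1)

t, steps, rho = 1.0, 100_000, 0.5
dW = rng.normal(0.0, np.sqrt(t / steps), size=(2, steps))
dX = dW[0]
dY = rho * dW[0] + np.sqrt(1.0 - rho**2) * dW[1]
X = np.concatenate(([0.0], np.cumsum(dX)))
Y = np.concatenate(([0.0], np.cumsum(dY)))

int_X_dY = float(np.sum(X[:-1] * dY))   # left-point sums
int_Y_dX = float(np.sum(Y[:-1] * dX))
cross_var = float(np.sum(dX * dY))      # approximates <X, Y>_t = 0.5

print(X[-1] * Y[-1], int_X_dY + int_Y_dX + cross_var)
print(cross_var)
```

The extra cross-variation term is exactly what distinguishes this from the classical Lebesgue-Stieltjes parts formula.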
Theorem (Itô's Formula) 11.5.
Let $f:\mathbb{R}^n\to\mathbb{R}$ be a twice continuously differentiable function, and also let $X = (X^1,X^2,\dots,X^n)$ be a continuous semimartingale in $\mathbb{R}^n$. Then
$$f(X_t) - f(X_0) = \sum_{i=1}^n\int_0^t\frac{\partial f}{\partial x_i}(X_s)\,dX^i_s + \frac{1}{2}\sum_{i,j=1}^n\int_0^t\frac{\partial^2 f}{\partial x_i\,\partial x_j}(X_s)\,d\langle X^i,X^j\rangle_s.$$
Proof
To prove Itˆ o’s formula; ﬁrst consider the n = 1 case to simplify the notation. Then let
/ be the collection of C
2
(twice diﬀerentiable) functions f : R → R for which it holds.
Clearly / is a vector space; in fact we shall show that it is also an algebra. To do this we
must check that if f and g are in /, then their product fg is also in /. Let F
t
= f(X
t
)
and G
t
= g(X
t
) be the associated semimartingales. From the integration by parts formula
F
t
G
t
−F
0
G
0
=
t
0
F
s
dG
s
+
t
0
G
s
dF
s
+'F
s
, G
s
`.
However since by assumption f and g are in $\mathcal{A}$, Itô's formula may be applied to them individually, so
$$\int_0^t F_s\,dG_s = \int_0^t f(X_s)g'(X_s)\,dX_s + \frac{1}{2}\int_0^t f(X_s)g''(X_s)\,d\langle X\rangle_s,$$
and similarly for $\int_0^t G_s\,dF_s$.
Also by the Kunita-Watanabe formula, extended to continuous local martingales, we have
$$\langle F,G\rangle_t = \int_0^t f'(X_s)g'(X_s)\,d\langle X,X\rangle_s.$$
Thus from the integration by parts,
\begin{align*}
F_t G_t - F_0 G_0 &= \int_0^t F_s\,dG_s + \int_0^t G_s\,dF_s + \int_0^t f'(X_s)g'(X_s)\,d\langle X,X\rangle_s\\
&= \int_0^t\left(F_s g'(X_s) + f'(X_s)G_s\right)dX_s + \frac{1}{2}\int_0^t\left(F_s g''(X_s) + 2f'(X_s)g'(X_s) + f''(X_s)G_s\right)d\langle X\rangle_s.
\end{align*}
So this is just what Itô's formula states for fg, and so Itô's formula also applies to fg; hence $fg\in\mathcal{A}$.
Since trivially $f(x) = x$ is in $\mathcal{A}$, and $\mathcal{A}$ is an algebra and a vector space, $\mathcal{A}$ contains all polynomials. So to complete the proof, we must approximate f by polynomials (which we can do by standard functional analysis), and check that in the limit we obtain Itô's formula.
Introduce the sequence $U_n := \inf\{t : |X_t| + \langle X\rangle_t > n\}$; then $\{U_n\}$ is a sequence of stopping times tending to infinity. Now we shall prove Itô's formula for twice continuously differentiable f restricted to the interval $[0,U_n]$, so we can consider X as a bounded martingale. Consider a sequence of polynomials $f_k$ approximating f, in the sense that for $r = 0,1,2$, $f^{(r)}_k\to f^{(r)}$ uniformly on compact intervals. We have proved that Itô's formula holds for all polynomials, so it holds for $f_k$, and hence
$$f_k(X_{t\wedge U_n}) - f_k(X_0) = \int_0^{t\wedge U_n} f'_k(X_s)\,dX_s + \frac{1}{2}\int_0^{t\wedge U_n} f''_k(X_s)\,d\langle X\rangle_s.$$
Let the continuous semimartingale X have Doob-Meyer decomposition
$$X_t = X_0 + M_t + A_t,$$
where M is a continuous local martingale and A is a finite variation process. We can rewrite the above as
$$f_k(X_{t\wedge U_n}) - f_k(X_0) = \int_0^{t\wedge U_n} f'_k(X_s)\,dM_s + \int_0^{t\wedge U_n} f'_k(X_s)\,dA_s + \frac{1}{2}\int_0^{t\wedge U_n} f''_k(X_s)\,d\langle M\rangle_s,$$
since $\langle X\rangle = \langle M\rangle$. On $(0,U_n]$ the process $|X|$ is uniformly bounded by n, so for $r = 0,1,2$, from the convergence (which is uniform on the compact interval $[0,U_n]$) we obtain
$$\sup_{|x|\le n}\left|f^{(r)}_k(x) - f^{(r)}(x)\right|\to 0\quad\text{as } k\to\infty,$$
and from the Itô isometry we get the required convergence.
11.1. Applications of Itô's Formula
Let $B_t$ be a standard Brownian motion; the aim of this example is to establish that
$$\int_0^t B_s\,dB_s = \frac{1}{2}B^2_t - \frac{1}{2}t.$$
This example gives a nice simple demonstration that all our hard work has achieved something. The result isn't the same as that which would be given by the 'logical' extension of the usual integration rules.
To prove this we apply Itô's formula to the function $f(x) = x^2$. We obtain
$$f(B_t) - f(B_0) = \int_0^t\frac{\partial f}{\partial x}(B_s)\,dB_s + \frac{1}{2}\int_0^t\frac{\partial^2 f}{\partial x^2}(B_s)\,d\langle B,B\rangle_s;$$
noting that $B_0 = 0$ for a standard Brownian motion, and that $\langle B,B\rangle_s = s$, we see that
$$B^2_t = 2\int_0^t B_s\,dB_s + \frac{1}{2}\int_0^t 2\,ds,$$
whence we derive
$$\int_0^t B_s\,dB_s = \frac{B^2_t}{2} - \frac{t}{2}.$$
For those who have read the foregoing material carefully, there are grounds to complain that there are simpler ways to establish this result, notably by consideration of the definition of the quadratic variation process. However the point of this example was to show how Itô's formula can help in the actual evaluation of stochastic integrals, not to establish a totally new result.
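The correction term $-t/2$ can also be watched appearing numerically; the following sketch (an addition, with a Brownian path sampled on a uniform grid) forms the left-point Itô sums directly:

```python
import numpy as np

# The Ito sum  sum_k B_{t_k} (B_{t_{k+1}} - B_{t_k})  approaches
# B_t^2/2 - t/2, not the classical B_t^2/2.
rng = np.random.default_rng(2)

t, steps = 1.0, 200_000
dB = rng.normal(0.0, np.sqrt(t / steps), size=steps)
B = np.concatenate(([0.0], np.cumsum(dB)))

ito_sum = float(np.sum(B[:-1] * dB))    # left-point evaluation
print(ito_sum, B[-1] ** 2 / 2 - t / 2)  # the two agree closely
```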
11.2. Exponential Martingales
Exponential martingales play an important part in the theory. Suppose X is a continuous semimartingale starting from zero. Define
$$Z_t = \exp\left(X_t - \frac{1}{2}\langle X\rangle_t\right).$$
This $Z_t$ is called the exponential semimartingale associated with $X_t$, and it is the solution of the stochastic differential equation
$$dZ_t = Z_t\,dX_t,$$
that is,
$$Z_t = 1 + \int_0^t Z_s\,dX_s;$$
so clearly if X is a continuous local martingale, i.e. $X\in\mathcal{M}^c_{\text{loc}}$, then this implies, by the stability property of stochastic integration, that $Z\in\mathcal{M}^c_{\text{loc}}$ also.
Proof
For existence, apply Itô's formula to $f(x) = \exp(x)$ to obtain
$$d(\exp(Y_t)) = \exp(Y_t)\,dY_t + \frac{1}{2}\exp(Y_t)\,d\langle Y,Y\rangle_t.$$
Hence, taking $Y_t = X_t - \frac{1}{2}\langle X\rangle_t$,
\begin{align*}
d\left(\exp\left(X_t - \tfrac{1}{2}\langle X\rangle_t\right)\right) &= \exp\left(X_t - \tfrac{1}{2}\langle X\rangle_t\right)d\left(X_t - \tfrac{1}{2}\langle X\rangle_t\right) + \frac{1}{2}\exp\left(X_t - \tfrac{1}{2}\langle X\rangle_t\right)d\left\langle X - \tfrac{1}{2}\langle X\rangle,\,X - \tfrac{1}{2}\langle X\rangle\right\rangle_t\\
&= \exp\left(X_t - \tfrac{1}{2}\langle X\rangle_t\right)dX_t - \frac{1}{2}\exp\left(X_t - \tfrac{1}{2}\langle X\rangle_t\right)d\langle X\rangle_t + \frac{1}{2}\exp\left(X_t - \tfrac{1}{2}\langle X\rangle_t\right)d\langle X\rangle_t\\
&= Z_t\,dX_t.
\end{align*}
Hence $Z_t$ certainly solves the equation. Now to check uniqueness, define
$$Y_t = \exp\left(-X_t + \frac{1}{2}\langle X\rangle_t\right);$$
we wish to show that for every solution $Z_t$ of the stochastic differential equation, $Z_t Y_t$ is a constant. By a similar application of Itô's formula,
$$dY_t = -Y_t\,dX_t + Y_t\,d\langle X\rangle_t,$$
whence by integration by parts (alternatively consider Itô's formula applied to $f(x,y) = xy$),
\begin{align*}
d(Z_t Y_t) &= Z_t\,dY_t + Y_t\,dZ_t + d\langle Z,Y\rangle_t\\
&= Z_t\left(-Y_t\,dX_t + Y_t\,d\langle X\rangle_t\right) + Y_t Z_t\,dX_t - Y_t Z_t\,d\langle X\rangle_t\\
&= 0.
\end{align*}
So $Z_t Y_t$ is a constant; hence the unique solution of the stochastic differential equation $dZ_t = Z_t\,dX_t$ with $Z_0 = 1$ is
$$Z_t = \exp\left(X_t - \frac{1}{2}\langle X\rangle_t\right).$$
Example
Let $X_t = \lambda B_t$, for an arbitrary scalar λ. Clearly $X_t$ is a continuous local martingale, so the associated exponential martingale is
$$M_t = \exp\left(\lambda B_t - \frac{1}{2}\lambda^2 t\right).$$
It is clear that the exponential semimartingale of a real-valued martingale must be non-negative, and thus by application of Fatou's lemma we can show that it is a supermartingale; thus $E(M_t)\le 1$ for all t.
Theorem 11.6.
Let M be a non-negative local martingale such that $EM_0 < \infty$; then M is a supermartingale.
Proof
Let $T_n$ be a reducing sequence for $M - M_0$; then for $t > s\ge 0$,
$$E(M_{t\wedge T_n}\mid\mathcal{F}_s) = E(M_0\mid\mathcal{F}_s) + E(M_{t\wedge T_n} - M_0\mid\mathcal{F}_s) = M_0 + M_{s\wedge T_n} - M_0 = M_{s\wedge T_n}.$$
Now by application of the conditional form of Fatou's lemma,
$$E(M_t\mid\mathcal{F}_s) = E\left(\liminf_{n\to\infty} M_{t\wedge T_n}\,\middle|\,\mathcal{F}_s\right)\le\liminf_{n\to\infty} E\left(M_{t\wedge T_n}\mid\mathcal{F}_s\right) = M_s.$$
Thus M is a supermartingale as required.
Checking for a Martingale
The general exponential semimartingale is useful, but in many applications, not least Girsanov's theorem, an actual martingale will be needed. How do we go about checking whether a local martingale is a martingale? It will turn out that there are various methods, some of which crop up in the section on filtering. First I shall present a simple example and then prove a more general theorem. A common error is to think that it is sufficient to show that a local martingale is locally bounded in $L^2$ for it to be a martingale – this is not sufficient, as should be made clear by this example!
The exponential of $\lambda B_t$
We continue our example from above. Let $M_t = \exp(\lambda B_t - \frac{1}{2}\lambda^2 t)$ be the exponential semimartingale associated with a standard Brownian motion $B_t$ starting from zero. By the previous argument we know that
$$M_t = 1 + \int_0^t\lambda M_s\,dB_s,$$
hence M is a local martingale. Fix T a constant time, which is of course a stopping time; then $B^T$ is an $L^2$-bounded martingale ($E(B^T_t)^2 = t\wedge T\le T$). We then show that $M^T$ is in $L^2(B^T)$ as follows:
\begin{align*}
\left\|M^T\right\|^2_B &= E\left[\int_0^T M^2_s\,d\langle B\rangle_s\right] = E\left[\int_0^T M^2_s\,ds\right]\\
&\le E\left[\int_0^T\exp(2\lambda B_s)\,ds\right] = \int_0^T E\left(\exp(2\lambda B_s)\right)ds = \int_0^T\exp(2\lambda^2 s)\,ds < \infty.
\end{align*}
In the final equality we use the fact that $B_s$ is distributed as N(0,s), together with the moment generating function of the normal distribution. Thus by the integration isometry theorem we have that $(M^T\cdot B)_t$ is an $L^2$-bounded martingale. Thus for every such T, $M^T$ is an $L^2$-bounded martingale, which implies that M is a martingale.
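A Monte Carlo check of the martingale property at a fixed time (an added sketch; the values λ = 1.5 and t = 2 are arbitrary choices): since $B_t\sim N(0,t)$, we can sample $M_t$ directly, and its mean should be close to one.

```python
import numpy as np

# E[M_t] = 1 for the exponential martingale M_t = exp(lambda B_t - lambda^2 t / 2).
rng = np.random.default_rng(3)

lam, t, n_paths = 1.5, 2.0, 1_000_000
B_t = rng.normal(0.0, np.sqrt(t), size=n_paths)   # B_t ~ N(0, t)
M_t = np.exp(lam * B_t - 0.5 * lam**2 * t)

print(M_t.mean())                                 # close to 1
```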
The Exponential Martingale Inequality
We have seen a specific proof that a certain (important) exponential martingale is a true martingale; we now give a more general argument.
Theorem 11.7.
Let M be a continuous local martingale starting from zero. Suppose that for each t there exists a constant $K_t$ such that $\langle M\rangle_t\le K_t$ a.s.; then for every t and every y > 0,
$$P\left[\sup_{0\le s\le t} M_s > y\right]\le\exp\left(-y^2/2K_t\right).$$
Furthermore, the associated exponential semimartingale $Z_t = \exp\left(\theta M_t - \frac{1}{2}\theta^2\langle M\rangle_t\right)$ is a true martingale.
Proof
We have already noted that the exponential semimartingale $Z_t$ is a supermartingale, so $E(Z_t)\le 1$ for all $t\ge 0$, and hence for $\theta > 0$ and $y > 0$,
$$P\left[\sup_{s\le t} M_s > y\right]\le P\left[\sup_{s\le t} Z_s > \exp\left(\theta y - \tfrac{1}{2}\theta^2 K_t\right)\right]\le\exp\left(-\theta y + \tfrac{1}{2}\theta^2 K_t\right).$$
Optimizing over θ now gives the desired result. For the second part, since $Z_s\le\exp(\theta M_s)$, we establish the following bound:
$$E\left[\sup_{0\le s\le t} Z_s\right]\le E\left[\exp\left(\theta\sup_{0\le s\le t} M_s\right)\right] = \int_0^\infty P\left[\theta\sup_{0\le s\le t} M_s\ge\log\lambda\right]d\lambda\le 1 + \int_1^\infty\exp\left(-(\log\lambda)^2/2\theta^2 K_t\right)d\lambda < \infty.\quad(*)$$
We have previously noted that Z is a local martingale; let $T_n$ be a reducing sequence for Z, so $Z^{T_n}$ is a martingale, hence
$$E\left[Z_{t\wedge T_n}\mid\mathcal{F}_s\right] = Z_{s\wedge T_n}.\quad(**)$$
We note that $Z_t$ is dominated by $\exp\left(\theta\sup_{0\le s\le t} M_s\right)$, and thus by our bound we can apply the dominated convergence theorem to (∗∗) as $n\to\infty$ to establish that Z is a true martingale.
Corollary 11.8.
For all $\epsilon,\delta > 0$,
$$P\left[\sup_{t\ge0} M_t\ge\epsilon\ \text{and}\ \langle M\rangle_\infty\le\delta\right]\le\exp\left(-\epsilon^2/2\delta\right).$$
Proof
Set $T = \inf\{t\ge0 : M_t\ge\epsilon\}$; the conditions of the previous theorem now apply to $M^T$, with $K_t = \delta$.
From this corollary, it is clear that if H is any bounded previsible process, then
$$\exp\left(\int_0^t H_s\,dB_s - \frac{1}{2}\int_0^t|H_s|^2\,ds\right)$$
is a true martingale, since this is the exponential semimartingale associated with the process $\int H\,dB$.
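The inequality itself is easy to probe by simulation. This sketch (an addition; it takes M = B, a standard Brownian motion, so $\langle M\rangle_t = t$ and we may use $K_t = t$) compares the empirical tail of the running maximum with $\exp(-y^2/2K_t)$:

```python
import numpy as np

# Check P( sup_{s<=t} M_s > y ) <= exp(-y^2 / 2 K_t) for M = B, K_t = t.
rng = np.random.default_rng(4)

t, steps, n_paths, y = 1.0, 500, 10_000, 2.0
dB = rng.normal(0.0, np.sqrt(t / steps), size=(n_paths, steps))
sup_M = np.cumsum(dB, axis=1).max(axis=1)   # maximum of each path on [0, t]

empirical = float(np.mean(sup_M > y))
bound = float(np.exp(-y**2 / (2 * t)))      # exp(-2) ~ 0.135
print(empirical, bound)                     # empirical stays below the bound
```

(By the reflection principle the true probability is $2P(B_t > y)\approx 0.046$, comfortably below the bound.)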
Corollary 11.9.
If the bounds $K_t$ on $\langle M\rangle$ are uniform, that is if $K_t\le C$ for all t, then the exponential martingale is uniformly integrable. We shall use the useful result
$$\int_0^\infty P(X\ge\log\lambda)\,d\lambda = \int_0^\infty E\left[1_{e^X\ge\lambda}\right]d\lambda = E\left[\int_0^\infty 1_{e^X\ge\lambda}\,d\lambda\right] = E\left(e^X\right).$$
Proof
Note that the bound (∗) extends to a uniform bound
$$E\left[\sup_{t\ge0} Z_t\right]\le 1 + \int_1^\infty\exp\left(-(\log\lambda)^2/2\theta^2 C\right)d\lambda < \infty.$$
Hence the family $\{Z_t\}$ is dominated by an integrable random variable, and Z is thus a uniformly integrable martingale.
12. Lévy Characterisation of Brownian Motion
A very useful result characterising Brownian motion can be proved using the Itô calculus.
Theorem 12.1.
Let $\{B^i_t\}_{t\ge0}$ be continuous local martingales starting from zero, for $i = 1,\dots,n$. Then $B_t = (B^1_t,\dots,B^n_t)$ is a Brownian motion with respect to $(\Omega,\mathcal{F},P)$, adapted to the filtration $\mathcal{F}_t$, if and only if
$$\langle B^i,B^j\rangle_t = \delta_{ij}t\quad\forall i,j\in\{1,\dots,n\}.$$
Proof
In these circumstances the statement that $B_t$ is a Brownian motion is by definition equivalent to stating that $B_t - B_s$ is independent of $\mathcal{F}_s$ and is distributed normally with mean zero and covariance matrix $(t-s)I$.
Clearly if $B_t$ is a Brownian motion then the covariation result follows trivially from the definitions. To establish the converse, we assume $\langle B^i,B^j\rangle_t = \delta_{ij}t$ for $i,j\in\{1,\dots,n\}$, and shall prove that $B_t$ is a Brownian motion.
Observe that for fixed $\theta\in\mathbb{R}^n$ we can define $M^\theta_t$ by
$$M^\theta_t := f(B_t,t) = \exp\left(i(\theta,B_t) + \frac{1}{2}|\theta|^2 t\right).$$
By application of Itô's formula to f we obtain (in differential form, using the Einstein summation convention)
\begin{align*}
d(f(B_t,t)) &= \frac{\partial f}{\partial x_j}(B_t,t)\,dB^j_t + \frac{\partial f}{\partial t}(B_t,t)\,dt + \frac{1}{2}\frac{\partial^2 f}{\partial x_j\partial x_k}(B_t,t)\,d\langle B^j,B^k\rangle_t\\
&= i\theta_j f(B_t,t)\,dB^j_t + \frac{1}{2}|\theta|^2 f(B_t,t)\,dt - \frac{1}{2}\theta_j\theta_k\delta_{jk}f(B_t,t)\,dt\\
&= i\theta_j f(B_t,t)\,dB^j_t.
\end{align*}
Hence
$$M^\theta_t = 1 + \int_0^t d(f(B_s,s)),$$
which is a sum of stochastic integrals with respect to continuous local martingales, and is hence itself a continuous local martingale. But note that for each t,
$$\left|M^\theta_t\right| = e^{\frac{1}{2}|\theta|^2 t} < \infty.$$
Hence for any fixed time $t_0$, the stopped process $(M^\theta)^{t_0}$ satisfies
$$\left|(M^\theta)^{t_0}_t\right|\le e^{\frac{1}{2}|\theta|^2 t_0} < \infty,$$
and so is a bounded local martingale; hence $(M^\theta)^{t_0}$ is a genuine martingale. Thus for $0\le s < t$ we have
$$E\left(\exp\left(i(\theta, B_t - B_s)\right)\mid\mathcal{F}_s\right) = \exp\left(-\frac{1}{2}(t-s)|\theta|^2\right)\quad\text{a.s.}$$
However this is just the characteristic function of a normal random variable distributed as N(0,(t−s)I); so by the Lévy characterisation of characteristic functions, $B_t - B_s$ is an N(0,(t−s)I) random variable, independent of $\mathcal{F}_s$.
13. Time Change of Brownian Motion
This result is one of frequent application; essentially it tells us that any continuous local martingale starting from zero can be written as a time change of Brownian motion. So, modulo a time change, Brownian motion is the most general kind of continuous local martingale.
Proposition 13.1.
Let M be a continuous local martingale starting from zero, such that $\langle M\rangle_t\to\infty$ as $t\to\infty$. Define
$$\tau_s := \inf\{t > 0 : \langle M\rangle_t > s\},$$
and then define
$$\tilde{A}_s := M_{\tau_s}.$$
(i) This $\tau_s$ is an $\mathcal{F}$ stopping time.
(ii) $\langle M\rangle_{\tau_s} = s$.
(iii) The local martingale M can be written as a time change of Brownian motion as $M_t = B_{\langle M\rangle_t}$. Moreover the process $\tilde{A}_s$ is an $\tilde{\mathcal{F}}_s$-adapted Brownian motion, where $\tilde{\mathcal{F}}_s$ is the time-changed σ-algebra, i.e. $\tilde{\mathcal{F}}_s = \mathcal{F}_{\tau_s}$.
Proof
We may assume that the map $t\mapsto\langle M\rangle_t$ is strictly increasing. Note that the map $s\mapsto\tau_s$ is the inverse of $t\mapsto\langle M\rangle_t$; hence the results (i), (ii) and (iii).
Define
$$T_n := \inf\{t : |M_t| > n\},\qquad U_n := \langle M\rangle_{T_n}.$$
Note that from these definitions
$$\tau_{t\wedge U_n} = \inf\{s > 0 : \langle M\rangle_s > t\wedge U_n\} = \inf\{s > 0 : \langle M\rangle_s > t\wedge\langle M\rangle_{T_n}\} = T_n\wedge\tau_t.$$
So
$$\tilde{A}^{U_n}_s = \tilde{A}_{s\wedge U_n} = M^{T_n}_{\tau_s}.$$
Now note that $U_n$ is an $\tilde{\mathcal{F}}_t$ stopping time, since considering
$$\Lambda\equiv\{U_n\le t\}\equiv\{\langle M\rangle_{T_n}\le t\}\equiv\{T_n\le\tau_t\},$$
the latter event is clearly $\mathcal{F}_{\tau_t}$ measurable, i.e. it is $\tilde{\mathcal{F}}_t$ measurable, so $U_n$ is an $\tilde{\mathcal{F}}_t$ stopping time. We may now apply the optional stopping theorem to the UI martingale $M^{T_n}$, which yields
$$E\left[\tilde{A}^{U_n}_t\,\middle|\,\tilde{\mathcal{F}}_s\right] = E\left[\tilde{A}_{t\wedge U_n}\,\middle|\,\tilde{\mathcal{F}}_s\right] = E\left[M^{T_n}_{\tau_t}\,\middle|\,\mathcal{F}_{\tau_s}\right] = M^{T_n}_{\tau_s} = \tilde{A}^{U_n}_s.$$
So $\tilde{A}_t$ is an $\tilde{\mathcal{F}}_t$ local martingale. But we also know that $(M^2 - \langle M\rangle)^{T_n}$ is a UI martingale, since $M^{T_n}$ is a UI martingale. By the optional stopping theorem, for $0 < r < s$ we have
\begin{align*}
E\left[\tilde{A}^2_{s\wedge U_n} - (s\wedge U_n)\,\middle|\,\tilde{\mathcal{F}}_r\right] &= E\left[\left(M^{T_n}_{\tau_s}\right)^2 - \langle M\rangle_{\tau_s\wedge T_n}\,\middle|\,\mathcal{F}_{\tau_r}\right]\\
&= \left(M^{T_n}_{\tau_r}\right)^2 - \langle M\rangle_{\tau_r\wedge T_n} = \tilde{A}^2_{r\wedge U_n} - (r\wedge U_n).
\end{align*}
Hence $\tilde{A}^2_t - t$ is an $\tilde{\mathcal{F}}_t$ local martingale. Before we can apply Lévy's characterisation theorem we must check that $\tilde{A}$ is continuous; that is, we must check that for almost every ω, M is constant on each interval of constancy of $\langle M\rangle$. By localisation it suffices to consider M a square integrable martingale. Now let q be a positive rational, and define
$$S_q := \inf\{t > q : \langle M\rangle_t > \langle M\rangle_q\};$$
then it is enough to show that M is constant on $[q,S_q)$. But $M^2 - \langle M\rangle$ is a martingale, hence
$$E\left[M^2_{S_q} - \langle M\rangle_{S_q}\,\middle|\,\mathcal{F}_q\right] = M^2_q - \langle M\rangle_q = M^2_q - \langle M\rangle_{S_q},\quad\text{as}\ \langle M\rangle_q = \langle M\rangle_{S_q}.$$
Hence
$$E\left[\left(M_{S_q} - M_q\right)^2\,\middle|\,\mathcal{F}_q\right] = 0,$$
which establishes that $\tilde{A}$ is continuous.
Thus $\tilde{A}$ is a continuous $\tilde{\mathcal{F}}_t$-adapted martingale with $\langle\tilde{A}\rangle_s = s$, and so by the Lévy characterisation theorem $\tilde{A}_s$ is a Brownian motion.
13.1. Gaussian Martingales
The time change of Brownian Motion can be used to prove the following useful theorem.
Theorem 13.2.
If M is a continuous local martingale starting from zero, and $\langle M\rangle_t$ is deterministic, that is, if we can find a deterministic function f taking values in the non-negative real numbers such that $\langle M\rangle_t = f(t)$ a.s., then M is a Gaussian martingale (i.e. $M_t$ has a Gaussian distribution for each t).
Proof
By the time change of Brownian motion theorem, we can write $M_t$ as a time change of Brownian motion through
$$M_t = B_{\langle M\rangle_t},$$
where B is a standard Brownian motion. By hypothesis $\langle M\rangle_t = f(t)$, a deterministic function, hence
$$M_t = B_{f(t)};$$
but the right hand side is a Gaussian random variable distributed as N(0,f(t)). Hence M is a Gaussian martingale, and at time t it has distribution $N(0,\langle M\rangle_t)$.
As a corollary, consider the stochastic integral of a purely deterministic function with respect to a Brownian motion.
Corollary 13.3.
Let f(t) be a deterministic function of t; then M defined by
$$M_t := \int_0^t f(s)\,dB_s$$
satisfies
$$M_t\sim N\left(0,\int_0^t|f(s)|^2\,ds\right).$$
Proof
From the definition of M via a stochastic integral with respect to a continuous martingale, it is clear that M is a continuous local martingale, and by the Kunita-Watanabe result the quadratic variation of M is given by
$$\langle M\rangle_t = \int_0^t|f(s)|^2\,ds,$$
hence the result follows.
This result can also be established directly, in a fashion which is very similar to the proof of the Lévy characterisation theorem. Consider Z defined via
$$Z_t = \exp\left(i\theta M_t + \frac{1}{2}\theta^2\langle M\rangle_t\right);$$
as in the Lévy characterisation proof, we see that this is a continuous local martingale, and by boundedness it is furthermore a martingale, and hence
$$E(Z_0) = E(Z_t),$$
whence
$$E\left(\exp(i\theta M_t)\right) = \exp\left(-\frac{1}{2}\theta^2\int_0^t f(s)^2\,ds\right),$$
which is the characteristic function of the appropriate normal distribution.
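Corollary 13.3 lends itself to a direct simulation check (an added sketch; the choice f(s) = s on [0,1] is an arbitrary example, for which the corollary predicts $M_1\sim N(0,\int_0^1 s^2\,ds) = N(0,1/3)$):

```python
import numpy as np

# Simulate M_1 = int_0^1 f(s) dB_s with f(s) = s via left-point sums;
# the sample mean and variance should match N(0, 1/3).
rng = np.random.default_rng(5)

t, steps, n_paths = 1.0, 400, 20_000
s = np.linspace(0.0, t, steps + 1)[:-1]            # left endpoints s_k
dB = rng.normal(0.0, np.sqrt(t / steps), size=(n_paths, steps))
M_1 = (s * dB).sum(axis=1)                         # sum f(s_k) dB_k

print(M_1.mean(), M_1.var())                       # near 0 and 1/3
```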
14. Girsanov’s Theorem
Girsanov’s theorem is an element of stochastic calculus which does not have an analogue
in standard calculus.
14.1. Change of measure
When we wish to compare two measures P and Q, we don't want either of them simply to throw information away; when both assign positive mass to the same events they can be related by the Radon-Nikodym derivative. This motivates the following definition of equivalence of two measures.
Deﬁnition 14.1.
Two measures P and Q are said to be equivalent if they operate on the same sample space,
and if A is any event in the sample space then
P(A) > 0 ⇔Q(A) > 0.
In other words P is absolutely continuous with respect to Q and Q is absolutely continuous
with respect to P.
Theorem 14.2.
If P and Q are equivalent measures, and $X_t$ is an $\mathcal{F}_t$-adapted process, then the following results hold:
$$E_Q(X_t) = E_P\left[\frac{dQ}{dP}X_t\right],\qquad E_Q(X_t\mid\mathcal{F}_s) = L^{-1}_s E_P(L_t X_t\mid\mathcal{F}_s),$$
where
$$L_s = E_P\left[\frac{dQ}{dP}\,\middle|\,\mathcal{F}_s\right].$$
Here $L_t$ is the Radon-Nikodym derivative of Q with respect to P on $\mathcal{F}_t$. The first result basically shows that this is a martingale, and the second is a continuous time version of Bayes' theorem.
Proof
The first part is basically the statement that the Radon-Nikodym derivative is a martingale. This follows because the measures P and Q are equivalent, but it will not be proved in detail here. Let Y be an $\mathcal{F}_t$-measurable random variable such that $E_Q(|Y|) < \infty$. We shall prove that
$$E_Q(Y\mid\mathcal{F}_s) = \frac{1}{L_s}E_P\left[Y L_t\mid\mathcal{F}_s\right]\quad\text{a.s. (P and Q)}.$$
For any $A\in\mathcal{F}_s$, using the definition of conditional expectation we have that
$$E_Q\left[1_A\frac{1}{L_s}E_P\left[Y L_t\mid\mathcal{F}_s\right]\right] = E_P\left(1_A E_P\left[Y L_t\mid\mathcal{F}_s\right]\right) = E_P\left[1_A Y L_t\right] = E_Q\left[1_A Y\right].$$
Substituting $Y = X_t$ gives the desired result.
Theorem (Girsanov).
Let M be a continuous local martingale, and let Z be the associated exponential martingale
$$Z_t = \exp\left(M_t - \frac{1}{2}\langle M\rangle_t\right).$$
If Z is uniformly integrable, then a new measure Q, equivalent to P, may be defined by
$$\frac{dQ}{dP} = Z_\infty.$$
Then if X is a continuous P local martingale, $X - \langle X,M\rangle$ is a Q local martingale.
Proof
Since $Z_\infty$ exists a.s. it defines a uniformly integrable martingale (the exponential martingale), a version of which is given by $Z_t = E(Z_\infty\mid\mathcal{F}_t)$. Hence Q constructed thus is a probability measure which is equivalent to P. Now consider X, a P local martingale. Define a sequence of stopping times which tend to infinity via
$$T_n := \inf\left\{t\ge0 : |X_t|\ge n,\ \text{or}\ |\langle X,M\rangle_t|\ge n\right\}.$$
Now consider the process Y defined via
$$Y := X^{T_n} - \langle X^{T_n},M\rangle.$$
By Itô's formula, for $0\le t\le T_n$, remembering that $dZ_t = Z_t\,dM_t$ as Z is the exponential martingale associated with M,
\begin{align*}
d(Z_t Y_t) &= Z_t\,dY_t + Y_t\,dZ_t + d\langle Z,Y\rangle_t\\
&= Z_t\left(dX_t - d\langle X,M\rangle_t\right) + Y_t Z_t\,dM_t + d\langle Z,Y\rangle_t\\
&= Z_t\left(dX_t - d\langle X,M\rangle_t\right) + \left(X_t - \langle X,M\rangle_t\right)Z_t\,dM_t + Z_t\,d\langle X,M\rangle_t\\
&= \left(X_t - \langle X,M\rangle_t\right)Z_t\,dM_t + Z_t\,dX_t,
\end{align*}
where the result $d\langle Z,Y\rangle_t = Z_t\,d\langle X,M\rangle_t$ follows from the Kunita-Watanabe theorem.
Hence ZY is a P-local martingale. But since Z is uniformly integrable and Y is bounded (by construction of the stopping time $T_n$), ZY is a genuine P-martingale. Hence for $s < t$ and $A\in\mathcal{F}_s$, we have
$$E_Q\left[(Y_t - Y_s)1_A\right] = E\left[Z_\infty(Y_t - Y_s)1_A\right] = E\left[(Z_t Y_t - Z_s Y_s)1_A\right] = 0,$$
hence Y is a Q martingale. Thus $X - \langle X,M\rangle$ is a Q local martingale, since $T_n$ is a reducing sequence such that $(X - \langle X,M\rangle)^{T_n}$ is a Q-martingale, and $T_n\uparrow\infty$ as $n\to\infty$.
Corollary 14.3.
Let $W_t$ be a P Brownian motion; then
$$\tilde{W}_t := W_t - \langle W,M\rangle_t$$
is a Q Brownian motion.
Proof
Use Lévy's characterisation of Brownian motion: since $\tilde{W}_t$ is continuous and $\langle\tilde{W},\tilde{W}\rangle_t = \langle W,W\rangle_t = t$ (as $W_t$ is a P Brownian motion), $\tilde{W}$ is a Q Brownian motion.
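A numerical sanity check of the change of drift (an added sketch; it assumes the particular choice $M_t = \theta W_t$ with θ = 0.7, for which $\langle W,M\rangle_t = \theta t$, so under Q the process $W_t - \theta t$ is a Brownian motion and $E_P[Z_T W_T] = E_Q[W_T] = \theta T$):

```python
import numpy as np

# Under Q with dQ/dP = Z_T = exp(theta W_T - theta^2 T / 2), the process
# W_t - theta t is a Brownian motion, so E_P[Z_T W_T] = theta * T.
rng = np.random.default_rng(6)

theta, T, n_paths = 0.7, 1.0, 1_000_000
W_T = rng.normal(0.0, np.sqrt(T), size=n_paths)
Z_T = np.exp(theta * W_T - 0.5 * theta**2 * T)

reweighted_mean = float(np.mean(Z_T * W_T))
print(reweighted_mean)                      # close to theta * T = 0.7
```

This reweighting is exactly the importance-sampling use of Girsanov's theorem.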
15. Brownian Martingale Representation Theorem
The following theorem has many applications, for example in the rigorous study of mathematical finance, even though the result is purely an existence theorem. The Malliavin calculus offers methods by which the process H in the following theorem can be stated explicitly, but these methods are beyond the scope of these notes!
Theorem 15.1.
Let $B_t$ be a Brownian motion on $\mathbb{R}^n$ and let $\mathcal{G}_t$ be the usual augmentation of the filtration generated by $B_t$. If Y is $L^2$-integrable and is measurable with respect to $\mathcal{G}_\infty$, then there exists a previsible $\mathcal{G}_t$-measurable process $H_s$, uniquely defined up to evanescence, such that
$$E(Y\mid\mathcal{G}_t) = E(Y) + \int_0^t H_s\,dB_s.\quad(1)$$
The proof of this result can seem hard if you are not familiar with functional analysis style arguments. The outline of the proof is to describe all Y which are $\mathcal{G}_T$-measurable and cannot be represented in the form (1) as belonging to the orthogonal complement of a space. Then we show that for Z in this orthogonal complement, E(ZX) = 0 for all X in a large space of $\mathcal{G}_T$-measurable functions. Finally we show that this space is sufficiently big that we have actually proved this for all $\mathcal{G}_T$-measurable functions, which includes Z itself, so $E(Z^2) = 0$ and hence Z = 0 a.s., and we are done!
Proof
Without loss of generality prove the result in the case EY = 0, where Y is $L^2$-integrable and measurable with respect to $\mathcal{G}_T$ for some constant T > 0.
Define the space
$$L^2_T(B) = \left\{H : H\ \text{is}\ \mathcal{G}_t\text{-previsible and}\ E\left[\int_0^T\|H_s\|^2\,ds\right] < \infty\right\}.$$
Consider the stochastic integral map
$$I : L^2_T(B)\to L^2(\mathcal{G}_T)$$
defined by
$$I(H) = \int_0^T H_s\,dB_s.$$
As a consequence of the Itô isometry theorem, this map is an isometry. Hence the image V under I of the Hilbert space $L^2_T(B)$ is complete, and hence a closed subspace of $L^2_0(\mathcal{G}_T) = \{H\in L^2(\mathcal{G}_T) : EH = 0\}$. The theorem will be proved if we can establish that the image is the whole space.
We follow the usual approach in such proofs: consider the orthogonal complement of V in $L^2_0(\mathcal{G}_T)$; we aim to show that every element of this orthogonal complement is zero. Suppose that Z is in this orthogonal complement, thus
$$E(ZX) = 0\quad\text{for all}\ X\in V.\quad(2)$$
We can define $Z_t = E(Z\mid\mathcal{G}_t)$, which is an $L^2$-bounded martingale. We know that the sigma field $\mathcal{G}_0$ is trivial by the 0-1 law, therefore
$$Z_0 = E(Z\mid\mathcal{G}_0) = EZ = 0.$$
Let $H\in L^2_T(B)$ and let $N_T = I(H)$; we may define $N_t = E(N_T\mid\mathcal{G}_t)$ for $0\le t\le T$. Clearly $N_T\in V$ as it is the image under I of some H.
Let S be a stopping time such that $S\le T$; then
$$N_S = E(N_T\mid\mathcal{G}_S) = E\left[\int_0^S H_s\,dB_s + \int_S^T H_s\,dB_s\,\middle|\,\mathcal{G}_S\right] = I\left(H1_{(0,S]}\right),$$
so consequently $N_S\in V$. The orthogonality relation (2) then implies that $E(ZN_S) = 0$.
Thus using the martingale property of Z,
$$E(ZN_S) = E\left(E(ZN_S\mid\mathcal{G}_S)\right) = E\left(N_S E(Z\mid\mathcal{G}_S)\right) = E(Z_S N_S) = 0.$$
Since $Z_T$ and $N_T$ are square integrable, it follows that $Z_t N_t$ is a uniformly integrable martingale.
Since the stochastic exponential of such a process may be written as
$$M_t = \mathcal{E}(i\boldsymbol{\theta}\cdot B_t) = \exp\left(i\boldsymbol{\theta}\cdot B_t + \frac{1}{2}|\boldsymbol{\theta}|^2 t\right) = 1 + \int_0^t iM_s\boldsymbol{\theta}\cdot dB_s,$$
such a process can be taken as $H_s = iM_s\boldsymbol{\theta}$ in the definition of $N_T$, and by the foregoing argument we see that $Z_t M_t$ is a martingale. Thus
$$Z_s M_s = E(Z_t M_t\mid\mathcal{G}_s) = E\left[Z_t\exp\left(i\boldsymbol{\theta}\cdot B_t + \frac{1}{2}|\boldsymbol{\theta}|^2 t\right)\,\middle|\,\mathcal{G}_s\right],$$
and hence
$$Z_s\exp\left(-\frac{1}{2}|\boldsymbol{\theta}|^2(t-s)\right) = E\left[Z_t\exp\left(i\boldsymbol{\theta}\cdot(B_t - B_s)\right)\,\middle|\,\mathcal{G}_s\right].$$
Consider a partition $0 < t_1 < t_2 < \dots < t_m\le T$; by repeating the above argument, conditioning on each $\mathcal{G}_{t_j}$, we establish that
$$E\left[Z_T\exp\left(i\sum_j\boldsymbol{\theta}_j\cdot\left(B_{t_j} - B_{t_{j-1}}\right)\right)\right] = E\left[Z_0\exp\left(-\frac{1}{2}\sum_j\left(t_j - t_{j-1}\right)|\boldsymbol{\theta}_j|^2\right)\right] = 0,\quad(3)$$
where the last equality follows since $Z_0 = 0$.
This is true for any choices of $\boldsymbol{\theta}_j\in\mathbb{R}^n$ for $j = 1,\dots,m$. The complex valued functions defined on $(\mathbb{R}^n)^m$ by
$$P^{(r)}(x_1,\dots,x_m) = \sum_{k=1}^{K^{(r)}} c^{(r)}_k\exp\left(i\sum_{j=1}^m a^{(r)}_{j,k}\cdot x_j\right)$$
clearly separate points (i.e. for given distinct points we can choose coefficients such that the functions have distinct values at these points), form a linear space, and are closed under complex conjugation. Therefore by the Stone-Weierstrass theorem (see [Bollobas, 1990]), their uniform closure is the space of continuous complex valued functions (recall that the complex variable form of this theorem only requires local compactness of the domain).
Therefore we can approximate any continuous bounded complex valued function $f : (\mathbb{R}^n)^m\to\mathbb{C}$ by a sequence of such P's. But we have already shown in (3) that
$$E\left[Z_T P^{(r)}\left(B_{t_1},\dots,B_{t_m}\right)\right] = 0.$$
Hence by uniform approximation we can extend this to any continuous bounded f:
$$E\left(Z_T f\left(B_{t_1},\dots,B_{t_m}\right)\right) = 0.$$
Now we use the monotone class framework: consider the class $\mathcal{H}$ such that for $H\in\mathcal{H}$,
$$E(Z_T H) = 0.$$
This $\mathcal{H}$ is a vector space, and contains the constants since $E(Z_T) = 0$. The foregoing argument shows that it contains all H measurable with respect to the sigma field $\sigma(B_{t_1},\dots,B_{t_m})$ with $0 < t_1 < t_2 < \dots < t_m\le T$. Thus the monotone class theorem implies that it contains all bounded functions which are measurable with respect to $\mathcal{G}_T$.
Now $Z_T$ is $\mathcal{G}_T$-measurable, and we have shown that $E(Z_T X) = 0$ for all $\mathcal{G}_T$-measurable X. Thus we can take $X = Z_T$, whence $E(Z^2_T) = 0$, which implies that $Z_T = 0$ a.s. This establishes the desired result.
The reader should examine the latter part of the proof carefully; it is in fact related to the proof that the set
$$\left\{\exp\left(i\int_0^t\boldsymbol{\theta}_s\cdot dB_s\right) : \boldsymbol{\theta}\in L^\infty([0,t];\mathbb{R}^m)\right\}$$
is total in $L^1$. A set S is said to be total if E(af) = 0 for all $a\in S$ implies f = 0 a.s. The full proof of this result will reappear in a more abstract form in the stochastic filtering section of these notes.
16. Stochastic Differential Equations
Stochastic differential equations arise naturally in various engineering problems, where the effects of random ‘noise’ perturbations to a system are being considered. For example, in the problem of tracking a satellite, we know that its motion will obey Newton’s law to a very high degree of accuracy, so in theory we can integrate the trajectories from the initial point. However in practice there are other random effects which perturb the motion.
The variety of SDE to be considered here describes a diffusion process and has the form
$$dX_t = b(t, X_t)\,dt + \sigma(t, X_t)\,dB_t, \qquad (*)$$
where $b_i(t, x)$ and $\sigma_{ij}(t, x)$ for $1 \le i \le d$ and $1 \le j \le r$ are Borel measurable functions. In practice such SDEs generally occur written in the Stratonovich form, but as we have seen the Itô form has numerous calculational advantages (especially the fact that local martingales are a closed class under the Itô integral), so it is conventional to transform the SDE to the Itô form before proceeding.
Strong Solutions
A strong solution of the SDE (∗) on the given probability space $(\Omega, \mathcal{F}, P)$ with initial condition $\zeta$ is a process $(X_t)_{t \ge 0}$ which has continuous sample paths, such that
(i) $X_t$ is adapted to the augmented filtration generated by the Brownian motion $B$ and the initial condition $\zeta$, which is denoted $\mathcal{F}_t$.
(ii) $P(X_0 = \zeta) = 1$.
(iii) For every $0 \le t < \infty$ and for each $1 \le i \le d$ and $1 \le j \le r$, the following holds almost surely:
$$\int_0^t \left( |b_i(s, X_s)| + \sigma^2_{ij}(s, X_s) \right) ds < \infty.$$
(iv) Almost surely the following holds:
$$X_t = X_0 + \int_0^t b(s, X_s)\,ds + \int_0^t \sigma(s, X_s)\,dB_s.$$
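A strong solution can be visualised numerically. The following sketch is not part of the original notes: it approximates one path of (∗) by the Euler–Maruyama scheme, with the illustrative coefficients $b(t,x) = -x$ and $\sigma(t,x) = 1$ (an Ornstein–Uhlenbeck type SDE, chosen because it satisfies the Lipschitz conditions introduced below).

```python
import numpy as np

def euler_maruyama(b, sigma, x0, T, n, rng):
    """Approximate a solution of dX_t = b(t,X_t)dt + sigma(t,X_t)dB_t on [0,T]
    with n Euler steps, driven by the increments of one Brownian path."""
    dt = T / n
    x = np.empty(n + 1)
    x[0] = x0
    for k in range(n):
        dB = rng.normal(0.0, np.sqrt(dt))   # Brownian increment over [t, t+dt]
        t = k * dt
        x[k + 1] = x[k] + b(t, x[k]) * dt + sigma(t, x[k]) * dB
    return x

rng = np.random.default_rng(0)
path = euler_maruyama(lambda t, x: -x, lambda t, x: 1.0, x0=1.0, T=1.0, n=1000, rng=rng)
```

Refining the grid (increasing `n`) while keeping the same underlying Brownian path is exactly the sense in which the discrete scheme approaches the strong solution.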
Lipschitz Conditions
Let $\|\cdot\|$ denote the usual Euclidean norm on $\mathbb{R}^d$. Recall that a function $f$ is said to be Lipschitz if there exists a constant $K$ such that
$$\|f(x) - f(y)\| \le K \|x - y\|.$$
We shall extend the norm notation to a $(d \times r)$ matrix $\sigma$ by defining
$$\|\sigma\|^2 = \sum_{i=1}^d \sum_{j=1}^r \sigma_{ij}^2.$$
The concept of Lipschitz continuity can be weakened to local Lipschitz continuity, by requiring that for each $n$ there exists $K_n$ such that, for all $x$ and $y$ with $\|x\| \le n$ and $\|y\| \le n$,
$$\|f(x) - f(y)\| \le K_n \|x - y\|.$$
Strong Uniqueness of Solutions
Theorem (Uniqueness) 16.1.
Suppose that $b(t, x)$ and $\sigma(t, x)$ are locally Lipschitz continuous in the spatial variable $x$; that is, for every $n \ge 1$ there exists a constant $K_n > 0$ such that for every $t \ge 0$, $\|x\| \le n$ and $\|y\| \le n$ the following holds:
$$\|b(t, x) - b(t, y)\| + \|\sigma(t, x) - \sigma(t, y)\| \le K_n \|x - y\|.$$
Then strong uniqueness holds for the pair $(b, \sigma)$; that is, if $X$ and $\tilde{X}$ are two strong solutions of (∗) relative to $B$ with initial condition $\zeta$, then $X$ and $\tilde{X}$ are indistinguishable:
$$P\left( X_t = \tilde{X}_t \;\; \forall t : 0 \le t < \infty \right) = 1.$$
The proof of this result is important inasmuch as it illustrates the first example of a bounding technique which recurs again and again throughout the theory of stochastic differential equations. Therefore I make no apology for spelling the proof out in excessive detail, as it is most important to understand exactly where each step comes from!
Proof
Suppose that $X$ and $\tilde{X}$ are strong solutions of (∗), relative to the same Brownian motion $B$ and initial condition $\zeta$, on the same probability space $(\Omega, \mathcal{F}, P)$. Define a sequence of stopping times
$$\tau_n = \inf\{t \ge 0 : \|X_t\| \ge n\}, \quad \text{and} \quad \tilde{\tau}_n = \inf\{t \ge 0 : \|\tilde{X}_t\| \ge n\}.$$
Now set $S_n = \tau_n \wedge \tilde{\tau}_n$. Clearly $S_n$ is also a stopping time, and $S_n \to \infty$ a.s. ($P$) as $n \to \infty$. These stopping times are only needed because $b$ and $\sigma$ are being assumed merely to be locally Lipschitz. If they are assumed to be globally Lipschitz, as will be needed in the existence part of the proof, then this complexity may be ignored.
Hence
$$X_{t \wedge S_n} - \tilde{X}_{t \wedge S_n} = \int_0^{t \wedge S_n} \left( b(u, X_u) - b(u, \tilde{X}_u) \right) du + \int_0^{t \wedge S_n} \left( \sigma(u, X_u) - \sigma(u, \tilde{X}_u) \right) dB_u.$$
Now we consider evaluating $E\|X_{t \wedge S_n} - \tilde{X}_{t \wedge S_n}\|^2$; the first stage follows using the elementary inequality $(a + b)^2 \le 4(a^2 + b^2)$:
$$E\|X_{t \wedge S_n} - \tilde{X}_{t \wedge S_n}\|^2 \le 4E\left[ \left\| \int_0^{t \wedge S_n} \left( b(u, X_u) - b(u, \tilde{X}_u) \right) du \right\|^2 \right] + 4E\left[ \left\| \int_0^{t \wedge S_n} \left( \sigma(u, X_u) - \sigma(u, \tilde{X}_u) \right) dB_u \right\|^2 \right].$$
Considering the second term, we use the Itô isometry, which we recall states that $E[(H \cdot M)_t^2] = E\left[ \int_0^t H_s^2\,d\langle M \rangle_s \right]$, so
$$E\left[ \left\| \int_0^{t \wedge S_n} \left( \sigma(u, X_u) - \sigma(u, \tilde{X}_u) \right) dB_u \right\|^2 \right] = E\left[ \int_0^{t \wedge S_n} \left\| \sigma(u, X_u) - \sigma(u, \tilde{X}_u) \right\|^2 du \right].$$
Recall the classical Hölder inequality (here in the form of the Cauchy–Schwarz inequality) for Lebesgue integrals, which states that for $p, q \in (1, \infty)$ with $p^{-1} + q^{-1} = 1$,
$$\int |f(x) g(x)|\,d\mu(x) \le \left( \int |f(x)|^p\,d\mu(x) \right)^{1/p} \left( \int |g(x)|^q\,d\mu(x) \right)^{1/q}.$$
This result may be applied to the other term, taking $p = q = 2$, which yields
$$E\left[ \left\| \int_0^{t \wedge S_n} \left( b(u, X_u) - b(u, \tilde{X}_u) \right) du \right\|^2 \right] \le E\left[ \left( \int_0^{t \wedge S_n} \left\| b(u, X_u) - b(u, \tilde{X}_u) \right\| du \right)^2 \right]$$
$$\le E\left[ \int_0^{t \wedge S_n} 1\,ds \int_0^{t \wedge S_n} \left\| b(u, X_u) - b(u, \tilde{X}_u) \right\|^2 du \right] \le E\left[ t \int_0^{t \wedge S_n} \left\| b(u, X_u) - b(u, \tilde{X}_u) \right\|^2 du \right].$$
Thus combining these two useful inequalities and using the $n$th local Lipschitz relation we have, for $t \le T$, that
$$E\|X_{t \wedge S_n} - \tilde{X}_{t \wedge S_n}\|^2 \le 4tE\left[ \int_0^{t \wedge S_n} \left\| b(u, X_u) - b(u, \tilde{X}_u) \right\|^2 du \right] + 4E\left[ \int_0^{t \wedge S_n} \left\| \sigma(u, X_u) - \sigma(u, \tilde{X}_u) \right\|^2 du \right]$$
$$\le 4(T + 1) K_n^2\, E \int_0^t \left\| X_{u \wedge S_n} - \tilde{X}_{u \wedge S_n} \right\|^2 du.$$
Now by Gronwall’s lemma, which in this case has a zero term outside of the integral, we see that $E\|X_{t \wedge S_n} - \tilde{X}_{t \wedge S_n}\|^2 = 0$, and hence that $P(X_{t \wedge S_n} = \tilde{X}_{t \wedge S_n}) = 1$ for all $t < \infty$. That is, the two stopped processes are modifications of one another, and being continuous they are therefore indistinguishable. Letting $n \to \infty$ we see that the same is true for $\{X_t\}_{t \ge 0}$ and $\{\tilde{X}_t\}_{t \ge 0}$.
Now we impose global Lipschitz conditions on the functions $b$ and $\sigma$ to produce an existence result. The following form omits some measure theoretic details which are very important; for a clear treatment see Chung & Williams chapter 10.
Theorem (Existence) 16.2.
Suppose the coefficients $b$ and $\sigma$ satisfy, for all $t$, the global Lipschitz conditions
$$\|b(t, x) - b(t, y)\| \le K\|x - y\|, \qquad \|\sigma(t, x) - \sigma(t, y)\| \le K\|x - y\|,$$
and additionally the bounded growth condition
$$\|b(t, x)\|^2 + \|\sigma(t, x)\|^2 \le K^2\left( 1 + \|x\|^2 \right).$$
Fix a probability space $(\Omega, \mathcal{F}, P)$. Let $\xi$ be a random vector, independent of the Brownian motion $B_t$, with finite second moment, and let $\mathcal{F}_t$ be the augmented filtration associated with the Brownian motion $B_t$ and $\xi$. Then there exists a continuous, adapted process $X$ which is a strong solution of the SDE with initial condition $\xi$. Additionally this process is square integrable: for each $T > 0$ there exists a constant $C = C(K, T)$ such that
$$E\|X_t\|^2 \le C\left( 1 + E\|\xi\|^2 \right) e^{Ct},$$
for $0 \le t \le T$.
Proof
This proof proceeds by Picard iteration through a map $F$, analogously to the deterministic proof of the existence of solutions to first order ordinary differential equations. This is a departure from the more conventional proof of this result. Let $F$ be a map on the space $\mathcal{C}_T$ of continuous adapted processes $X$ from $\Omega \times [0, T]$ to $\mathbb{R}$ such that $E\left[ \sup_{t \le T} \|X_t\|^2 \right] < \infty$. Define $X^{(k)}_t$ recursively, with $X^{(0)}_t = \xi$, and
$$X^{(k+1)}_t = F(X^{(k)})_t = \xi + \int_0^t b(s, X^{(k)}_s)\,ds + \int_0^t \sigma(s, X^{(k)}_s)\,dB_s.$$
[Note: we have left out checking that the image of $X$ under $F$ is indeed adapted!] Now note that, using $(a + b)^2 \le 2a^2 + 2b^2$ and the same bounds as in the uniqueness result, we have
$$E\left[ \sup_{0 \le t \le T} \left\| F(X)_t - F(Y)_t \right\|^2 \right] \le 2E\left[ \sup_{t \le T} \left\| \int_0^t \left( \sigma(s, X_s) - \sigma(s, Y_s) \right) dB_s \right\|^2 \right] + 2E\left[ \sup_{t \le T} \left\| \int_0^t \left( b(s, X_s) - b(s, Y_s) \right) ds \right\|^2 \right]$$
$$\le 2K^2(4 + T) \int_0^T E\left[ \sup_{s \le t} \|X_s - Y_s\|^2 \right] dt.$$
By induction we see that for each $T$ we can prove
$$E\left[ \sup_{t \le T} \left\| F^n(X)_t - F^n(Y)_t \right\|^2 \right] \le \frac{C^n T^n}{n!}\, E\left[ \sup_{t \le T} \|X_t - Y_t\|^2 \right].$$
So by taking $n$ sufficiently large we have that $F^n$ is a contraction mapping, and so by the contraction mapping theorem $F^n$, mapping $\mathcal{C}_T$ to itself, has a fixed point, which must be unique; call it $X^{(T)}$. Clearly from the uniqueness part $X^{(T)}_t = X^{(T')}_t$ for $t \le T \wedge T'$ a.s., and so we may consistently define $X \in \mathcal{C}$ by
$$X_t = X^{(N)}_t \quad \text{for } t \le N, \; N \in \mathbb{N},$$
which solves the SDE, and has already been shown to be unique.
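The Picard iteration at the heart of this proof can be mimicked on a computer: fix one discretised Brownian path, apply the map $F$ repeatedly, and watch successive iterates converge in the sup norm. The sketch below is only an illustration (time-homogeneous scalar coefficients, Euler discretisation of both integrals), not part of the proof.

```python
import numpy as np

def picard_map(b, sigma, xi, dB, dt, X):
    """One application of F: (F X)_t = xi + int_0^t b(X_s)ds + int_0^t sigma(X_s)dB_s,
    with both integrals discretised on the grid carrying the path X."""
    n = len(dB)
    Y = np.empty(n + 1)
    Y[0] = xi
    for k in range(n):
        Y[k + 1] = Y[k] + b(X[k]) * dt + sigma(X[k]) * dB[k]
    return Y

rng = np.random.default_rng(1)
n, dt, xi = 500, 1.0 / 500, 0.5
dB = rng.normal(0.0, np.sqrt(dt), size=n)        # one fixed Brownian path
b, sigma = (lambda x: -x), (lambda x: 0.3)       # illustrative Lipschitz coefficients

X = np.full(n + 1, xi)                           # X^(0)_t = xi
for _ in range(8):                               # X^(k+1) = F(X^(k))
    X_prev, X = X, picard_map(b, sigma, xi, dB, dt, X)
gap = np.max(np.abs(X - X_prev))                 # sup-norm gap between iterates
```

The factor $C^n T^n / n!$ in the contraction estimate is visible numerically: the gap between successive iterates shrinks factorially fast.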
17. Relations to Second Order PDEs
The aim of this section is to show a rather surprising connection between stochastic differential equations and the solution of second order partial differential equations. Surprising though the results may seem, they often provide a viable route to calculating the solutions of explicit PDEs (an example of this is solving the Black–Scholes equation in option pricing, which is much easier to solve via stochastic methods than as a second order PDE). At first this may well seem surprising, since one problem is entirely deterministic and the other is inherently stochastic!
17.1. Infinitesimal Generator
Consider the following $d$-dimensional SDE,
$$dX_t = b(X_t)\,dt + \sigma(X_t)\,dB_t, \qquad X_0 = x_0,$$
where $\sigma$ is a $d \times d$ matrix with elements $\sigma = \{\sigma_{ij}\}$. This SDE has infinitesimal generator $A$, where
$$A = \sum_{j=1}^d b_j(X_t) \frac{\partial}{\partial x_j} + \frac{1}{2} \sum_{i=1}^d \sum_{j=1}^d \sum_{k=1}^d \sigma_{ik}(X_t)\, \sigma_{jk}(X_t) \frac{\partial^2}{\partial x_i \partial x_j}.$$
It is conventional to set
$$a_{ij} = \sum_{k=1}^d \sigma_{ik} \sigma_{jk},$$
whence $A$ takes the simpler form
$$A = \sum_{j=1}^d b_j(X_t) \frac{\partial}{\partial x_j} + \frac{1}{2} \sum_{i=1}^d \sum_{j=1}^d a_{ij}(X_t) \frac{\partial^2}{\partial x_i \partial x_j}.$$
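The generator can be sanity-checked in one dimension. For the Ornstein–Uhlenbeck SDE $dX_t = -X_t\,dt + dB_t$ the formula gives $Af = -x f' + \tfrac{1}{2} f''$, and $Af(x)$ should appear as the limit of $(E^x[f(X_h)] - f(x))/h$ as $h \to 0$. Since $E^x[X_h^2]$ is known in closed form for this process, the check below is deterministic (an illustrative aside, not from the notes).

```python
import numpy as np

# For dX_t = -X_t dt + dB_t (b(x) = -x, sigma = 1) and f(x) = x^2,
# the generator gives A f(x) = -x * 2x + (1/2) * 2 = -2 x^2 + 1.
# The OU transition law gives E^x[X_h^2] = x^2 e^{-2h} + (1 - e^{-2h})/2 exactly,
# so the difference quotient can be evaluated without any simulation.

def Af(x):
    return -2.0 * x**2 + 1.0

def difference_quotient(x, h):
    Ef = x**2 * np.exp(-2 * h) + (1 - np.exp(-2 * h)) / 2   # E^x[f(X_h)]
    return (Ef - x**2) / h

approx = difference_quotient(0.7, h=1e-5)
```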
Why is the definition useful? Consider the application of Itô’s formula to $f(X_t)$, which yields
$$f(X_t) - f(X_0) = \int_0^t \sum_{j=1}^d \frac{\partial f}{\partial x_j}(X_s)\,dX^j_s + \frac{1}{2} \int_0^t \sum_{i=1}^d \sum_{j=1}^d \frac{\partial^2 f}{\partial x_i \partial x_j}(X_s)\,d\langle X^i, X^j \rangle_s.$$
Substituting for $dX_t$ from the SDE we obtain
$$f(X_t) - f(X_0) = \int_0^t \left[ \sum_{j=1}^d b_j(X_s) \frac{\partial f}{\partial x_j}(X_s) + \frac{1}{2} \sum_{i=1}^d \sum_{j=1}^d \sum_{k=1}^d \sigma_{ik}(X_s) \sigma_{jk}(X_s) \frac{\partial^2 f}{\partial x_i \partial x_j}(X_s) \right] ds + \int_0^t \sum_{j=1}^d \sum_{k=1}^d \sigma_{jk}(X_s) \frac{\partial f}{\partial x_j}(X_s)\,dB^k_s$$
$$= \int_0^t A f(X_s)\,ds + \int_0^t \sum_{j=1}^d \sum_{k=1}^d \sigma_{jk}(X_s) \frac{\partial f}{\partial x_j}(X_s)\,dB^k_s.$$
Definition 17.1.
We say that $X_t$ satisfies the martingale problem for $A$ if $X_t$ is $\mathcal{F}_t$ adapted and
$$M_t = f(X_t) - f(X_0) - \int_0^t A f(X_s)\,ds$$
is a martingale for each $f \in C^2_c(\mathbb{R}^d)$.
It is simple to verify from the foregoing that any solution of the associated SDE solves the martingale problem for $A$. This can be generalised if we consider test functions $\phi \in C^2(\mathbb{R}_+ \times \mathbb{R}^d, \mathbb{R})$, and define
$$M^\phi_t := \phi(t, X_t) - \phi(0, X_0) - \int_0^t \left( \frac{\partial}{\partial s} + A \right) \phi(s, X_s)\,ds;$$
then $M^\phi_t$ is a local martingale, for $X_t$ a solution of the SDE associated with the infinitesimal generator $A$. The proof follows by an application of Itô’s formula to $M^\phi_t$, similar to that of the above discussion.
17.2. The Dirichlet Problem
Let $\Omega$ be an open subset of $\mathbb{R}^d$ with a smooth boundary $\partial\Omega$. The Dirichlet problem for $f$ is defined as the solution of the system
$$Au + \phi = 0 \text{ on } \Omega, \qquad u = f \text{ on } \partial\Omega,$$
where $A$ is a second order partial differential operator of the form
$$A = \sum_{j=1}^d b_j(X_t) \frac{\partial}{\partial x_j} + \frac{1}{2} \sum_{i=1}^d \sum_{j=1}^d a_{ij}(X_t) \frac{\partial^2}{\partial x_i \partial x_j},$$
which is associated as before to an SDE. This SDE will play an important role in what is to follow.
A simple example of a Dirichlet problem is the solution of the Laplace equation in the disc, with Dirichlet boundary conditions on the boundary, i.e.
$$\nabla^2 u = 0 \text{ on } D, \qquad u = f \text{ on } \partial D.$$
Theorem 17.2.
For each $f \in C^2_b(\partial\Omega)$ there exists a unique $u \in C^2_b(\Omega)$ solving the Dirichlet problem for $f$. Moreover there exists a continuous function $m : \Omega \times \partial\Omega \to (0, \infty)$ such that for all $f \in C^2_b(\partial\Omega)$ this solution is given by
$$u(x) = \int_{\partial\Omega} m(x, y) f(y)\,\sigma(dy).$$
Now recall the SDE which is associated with the infinitesimal generator $A$:
$$dX_t = b(X_t)\,dt + \sigma(X_t)\,dB_t, \qquad X_0 = x_0.$$
Often in what follows we shall want to consider the conditional expectation and probability measures conditional on $x_0 = x$; these will be denoted $E^x$ and $P^x$ respectively.
Theorem (Dirichlet Solution).
Define a stopping time via
$$T := \inf\{t \ge 0 : X_t \notin \Omega\}.$$
Then $u(x)$ given by
$$u(x) := E^x\left[ \int_0^T \phi(X_s)\,ds + f(X_T) \right]$$
solves the Dirichlet problem for $f$.
Proof
Define
$$M_t := u(X_{T \wedge t}) + \int_0^{t \wedge T} \phi(X_s)\,ds.$$
We shall now show that this $M_t$ is a martingale. For $t \ge T$, it is clear that $dM_t = 0$. For $t < T$, by Itô’s formula,
$$dM_t = du(X_t) + \phi(X_t)\,dt.$$
Also, by Itô’s formula,
$$du(X_t) = \sum_{j=1}^d \frac{\partial u}{\partial x_j}(X_t)\,dX^j_t + \frac{1}{2} \sum_{i=1}^d \sum_{j=1}^d \frac{\partial^2 u}{\partial x_i \partial x_j}(X_t)\,d\langle X^i, X^j \rangle_t$$
$$= \sum_{j=1}^d \frac{\partial u}{\partial x_j}(X_t) \left[ b_j(X_t)\,dt + \sum_{k=1}^d \sigma_{jk}(X_t)\,dB^k_t \right] + \frac{1}{2} \sum_{i=1}^d \sum_{j=1}^d \sum_{k=1}^d \sigma_{ik} \sigma_{jk} \frac{\partial^2 u}{\partial x_i \partial x_j}(X_t)\,dt$$
$$= Au(X_t)\,dt + \sum_{j=1}^d \sum_{k=1}^d \sigma_{jk}(X_t) \frac{\partial u}{\partial x_j}(X_t)\,dB^k_t.$$
Putting these two applications of Itô’s formula together yields
$$dM_t = \left( Au(X_t) + \phi(X_t) \right) dt + \sum_{j=1}^d \sum_{k=1}^d \sigma_{jk}(X_t) \frac{\partial u}{\partial x_j}(X_t)\,dB^k_t.$$
But since $u$ solves the Dirichlet problem,
$$(Au + \phi)(X_t) = 0,$$
hence
$$dM_t = \left( Au(X_t) + \phi(X_t) \right) dt + \sum_{j=1}^d \sum_{k=1}^d \sigma_{jk}(X_t) \frac{\partial u}{\partial x_j}(X_t)\,dB^k_t = \sum_{j=1}^d \sum_{k=1}^d \sigma_{jk}(X_t) \frac{\partial u}{\partial x_j}(X_t)\,dB^k_t,$$
from which we conclude, by the stability property of the stochastic integral, that $M_t$ is a local martingale. However $M_t$ is uniformly bounded on $[0, t]$, and hence $M_t$ is a martingale.
In particular, let $\phi(x) \equiv 1$ and $f \equiv 0$; by the optional stopping theorem, since $T \wedge t$ is a bounded stopping time, this gives
$$u(x) = E^x(M_0) = E^x(M_{T \wedge t}) = E^x\left[ u(X_{T \wedge t}) + (T \wedge t) \right].$$
Letting $t \to \infty$, we have via monotone convergence that $E^x(T) < \infty$, since we know that the solution $u$ is bounded from the PDE existence theorem; hence $T < \infty$ a.s. We cannot simply apply the optional stopping theorem directly, since $T$ is not necessarily a bounded stopping time. But for arbitrary $\phi$ and $f$, we have that
$$|M_t| \le \|u\|_\infty + T\|\phi\|_\infty = \sup_{x \in \Omega} |u(x)| + T \sup_{x \in \Omega} |\phi(x)|,$$
whence, as $E^x(T) < \infty$, the martingale $M$ is uniformly integrable, and by the martingale convergence theorem has a limit $M_\infty$. This limiting random variable is given by
$$M_\infty = f(X_T) + \int_0^T \phi(X_s)\,ds.$$
Hence from the identity $E^x M_0 = E^x M_\infty$ we have that
$$u(x) = E^x(M_0) = E^x(M_\infty) = E^x\left[ f(X_T) + \int_0^T \phi(X_s)\,ds \right].$$
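With $\phi \equiv 0$ the theorem says that $u(x) = E^x[f(B_T)]$ solves the Laplace equation in the disc, which invites a Monte Carlo check: if $f$ is already harmonic, the mean value property forces $u = f$ inside the disc. The sketch below (an illustration only, not from the notes; the boundary overshoot of the discretised path is ignored) tests this with the harmonic function $f(x, y) = x^2 - y^2$.

```python
import numpy as np

def dirichlet_mc(x0, f, n_paths=2000, dt=1e-3, seed=2):
    """Estimate u(x0) = E^x0[f(B_T)], T the exit time of the unit disc,
    by running Brownian paths until each leaves the disc."""
    rng = np.random.default_rng(seed)
    x = np.tile(np.asarray(x0, float), (n_paths, 1))
    alive = np.ones(n_paths, dtype=bool)
    while alive.any():
        # advance only the paths that have not yet exited
        x[alive] += rng.normal(0.0, np.sqrt(dt), size=(int(alive.sum()), 2))
        alive &= (x * x).sum(axis=1) < 1.0   # a path dies when it leaves the disc
    return f(x[:, 0], x[:, 1]).mean()        # average of the boundary values

# f(x,y) = x^2 - y^2 is harmonic, so the exact solution at (0.3, 0.2) is 0.05.
u_hat = dirichlet_mc((0.3, 0.2), lambda x, y: x**2 - y**2)
```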
17.3. The Cauchy Problem
The Cauchy problem for $f$, a $C^2_b$ function, is the solution of the system
$$\frac{\partial u}{\partial t} = Au \text{ on } \Omega, \qquad u(0, x) = f(x) \text{ for } x \in \Omega, \qquad u(t, x) = f(x) \;\; \forall t \ge 0, \text{ for } x \in \partial\Omega.$$
A typical problem of this sort is to solve the heat equation,
$$\frac{\partial u}{\partial t} = \frac{1}{2} \nabla^2 u,$$
where the function $u$ represents the temperature in a region $\Omega$, and the initial condition specifies the temperature field over the region at time zero, i.e. a condition of the form
$$u(0, x) = f(x) \text{ for } x \in \Omega.$$
In addition the boundary has its temperature fixed at zero,
$$u(t, x) = 0 \text{ for } x \in \partial\Omega, \; t \ge 0.$$
If $\Omega$ is just the real line, then the solution has the beautifully simple form
$$u(t, x) = E^x\left( f(B_t) \right),$$
where $B_t$ is a standard Brownian motion.
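This representation is easy to test numerically: under $P^x$, $B_t \sim N(x, t)$, and for $f(x) = x^2$ the heat equation on the line has the exact solution $u(t, x) = x^2 + t$. A small Monte Carlo sketch (illustrative, not from the notes):

```python
import numpy as np

# u(t,x) = E^x[f(B_t)] with B_t ~ N(x, t); for f(x) = x^2 the exact
# solution of du/dt = (1/2) u'' is u(t,x) = x^2 + t.
rng = np.random.default_rng(3)
t, x = 0.5, 1.2
samples = x + np.sqrt(t) * rng.standard_normal(200_000)   # draws of B_t under P^x
u_hat = np.mean(samples ** 2)
u_exact = x**2 + t
```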
Theorem (Cauchy Existence) 17.3.
For each $f \in C^2_b(\mathbb{R}^d)$ there exists a unique $u \in C^{1,2}_b(\mathbb{R} \times \mathbb{R}^d)$ such that $u$ solves the Cauchy problem for $f$. Moreover there exists a continuous function (the heat kernel)
$$p : (0, \infty) \times \mathbb{R}^d \times \mathbb{R}^d \to (0, \infty),$$
such that for all $f \in C^2_b(\mathbb{R}^d)$ the solution to the Cauchy problem for $f$ is given by
$$u(t, x) = \int_{\mathbb{R}^d} p(t, x, y) f(y)\,dy.$$
Theorem 17.4.
Let $u \in C^{1,2}_b(\mathbb{R} \times \mathbb{R}^d)$ be the solution of the Cauchy problem for $f$, and define a stopping time
$$T := \inf\{t \ge 0 : X_t \notin \Omega\}.$$
Then
$$u(t, x) = E^x\left[ f(X_{T \wedge t}) \right].$$
Proof
Fix $s \in (0, \infty)$ and consider the time reversed process
$$M_t := u\left( (s - t) \wedge T, X_{t \wedge T} \right).$$
There are three cases now to consider. For $0 \le T \le t \le s$, $M_t = u((s - t) \wedge T, X_T)$, where $X_T \in \partial\Omega$, so from the boundary condition $M_t = f(X_T)$, and hence it is clear that $dM_t = 0$. For $0 \le s \le T \le t$ and for $0 \le t \le s \le T$ the argument is similar; in the latter case, by Itô’s formula, we obtain
$$dM_t = -\frac{\partial u}{\partial t}(s - t, X_t)\,dt + \sum_{j=1}^d \frac{\partial u}{\partial x_j}(s - t, X_t)\,dX^j_t + \frac{1}{2} \sum_{i=1}^d \sum_{j=1}^d \frac{\partial^2 u}{\partial x_i \partial x_j}(s - t, X_t)\,d\langle X^i, X^j \rangle_t$$
$$= \left( -\frac{\partial u}{\partial t} + Au \right)(s - t, X_t)\,dt + \sum_{j=1}^d \frac{\partial u}{\partial x_j}(s - t, X_t) \sum_{k=1}^d \sigma_{jk}(X_t)\,dB^k_t.$$
We obtain a similar result in the case $0 \le t \le T \le s$, but with $s$ replaced by $T$. Thus for $u$ solving the Cauchy problem for $f$, we have that
$$-\frac{\partial u}{\partial t} + Au = 0,$$
and we see that $M_t$ is a local martingale. Boundedness implies that $M_t$ is a martingale, and hence by optional stopping
$$u(s, x) = E^x(M_0) = E^x(M_s) = E^x\left( f(X_{s \wedge T}) \right).$$
17.4. Feynman–Kac Representation
Feynman observed ‘intuitively’ the following representation of the solution of a PDE via the expectation of a suitable function of a Brownian motion, and the theory was later made rigorous by Kac.
In what context was Feynman interested in this problem? Consider the Schrödinger equation,
$$-\frac{\hbar^2}{2m} \nabla^2 \Phi(x, t) + V(x) \Phi(x, t) = i\hbar \frac{\partial}{\partial t} \Phi(x, t),$$
which is a second order PDE. Feynman introduced the concept of a path integral to express solutions to such an equation. In a manner which is analogous to the ‘Hamiltonian’ principle in classical mechanics, there is an action integral which is minimised over all ‘permissible paths’ that the system can take.
We have already considered solving the Cauchy problem
$$\frac{\partial u}{\partial t} = Au \text{ on } \Omega, \qquad u(0, x) = f(x) \text{ for } x \in \Omega, \qquad u(t, x) = f(x) \;\; \forall t \ge 0, \text{ for } x \in \partial\Omega,$$
where $A$ is the generator of an SDE and hence of the form
$$A = \sum_{j=1}^d b_j(X_t) \frac{\partial}{\partial x_j} + \frac{1}{2} \sum_{i=1}^d \sum_{j=1}^d a_{ij}(X_t) \frac{\partial^2}{\partial x_i \partial x_j}.$$
Now consider the more general form of the same Cauchy problem, with generator $L$ of the form
$$L \equiv A + v = \sum_{j=1}^d b_j(X_t) \frac{\partial}{\partial x_j} + \frac{1}{2} \sum_{i=1}^d \sum_{j=1}^d a_{ij}(X_t) \frac{\partial^2}{\partial x_i \partial x_j} + v(X_t).$$
For example let $X_t = B_t$, corresponding to
$$A = \frac{1}{2} \nabla^2, \qquad L = \frac{1}{2} \nabla^2 + v,$$
so in this example we are solving the problem
$$\frac{\partial u}{\partial t} = \frac{1}{2} \nabla^2 u(t, x) + v(x) u(t, x) \text{ on } \mathbb{R}^d, \qquad u(0, x) = f(x) \text{ for } x \in \mathbb{R}^d.$$
The Feynman–Kac representation theorem expresses the solution of a general second order PDE in terms of an expectation of a function of a Brownian motion. To simplify the statement of the result, we shall work on $\Omega = \mathbb{R}^d$, since this removes the problem of considering the Brownian motion hitting the boundary.
Theorem (Feynman–Kac Representation).
Let $u \in C^{1,2}_b(\mathbb{R} \times \mathbb{R}^d)$ be a solution of the Cauchy problem with a generator of the above form and with initial condition $f$, and let $X_t$ be a solution of the SDE with infinitesimal generator $A$ in $\mathbb{R}^d$ starting at $x$. Then
$$u(t, x) = E^x\left[ f(X_t) \exp\left( \int_0^t v(X_s)\,ds \right) \right].$$
Proof
Note that the SDE for $X_t$ has the form
$$dX_t = \sigma(X_t)\,dB_t + b(X_t)\,dt.$$
Fix $s \in (0, \infty)$ and apply Itô’s formula to
$$M_t = u(s - t, X_t) \exp\left( \int_0^t v(X_r)\,dr \right).$$
For notational convenience, let
$$E_t = \exp\left( \int_0^t v(X_r)\,dr \right),$$
whence by Itô’s formula $dE_t = E_t v(X_t)\,dt$. For $0 \le t \le s$, we have
$$dM_t = \sum_{j=1}^d \frac{\partial u}{\partial x_j}(s - t, X_t)\, E_t\,dB^j_t + \left( -\frac{\partial u}{\partial t} + Au + vu \right)(s - t, X_t)\, E_t\,dt = \sum_{j=1}^d \frac{\partial u}{\partial x_j}(s - t, X_t)\, E_t\,dB^j_t.$$
Hence $M_t$ is a local martingale; since it is bounded, $M_t$ is a martingale, and hence by optional stopping
$$u(s, x) = E^x(M_0) = E^x(M_s) = E^x\left( f(X_s) E_s \right).$$
18. Stochastic Filtering
Engineers have for many years studied the problems arising in the design of control systems, which are placed between the user inputs and the system parameters in order to achieve a desired response. For example, the connection between the flight controls on a modern aircraft such as the Harrier Jump Jet and the aircraft control surfaces goes via a complex control system, rather than the simple wire cables found in early aircraft.
Even a basic control system such as a central heating thermostat is based upon an element of feedback – observing the state of the system (room temperature) and adjusting the system inputs (turning the boiler off or on) to keep the system state close to the desired value (the temperature set on the dial).
In the example of a high performance aircraft, sudden movement of an aircraft control communicated directly to the control surfaces might cause uncontrolled growing oscillations – which is not the response required – and the control system would have to modify the control surfaces to avoid this happening.
In the real world the behaviour of these systems is affected by extra random factors which cannot be known; for example wind effects on the airframe, or engine vibration. Also all measurements have inherent random errors which cannot be eliminated completely. Attempts to solve control problems in this context require that we have some idea how to use the observations made to give a best possible estimate of the system state at a given time.
In this basic introduction to stochastic nonlinear filtering we shall consider a system described by a stochastic process, called the signal process, whose behaviour is governed by a dynamical system containing a Brownian motion noise input. This is observed, together with unavoidable observation noise. The goal of the stochastic filtering problem is to determine the law of the signal given the observations. This must be formalised in the language of stochastic calculus. The notation $\|\cdot\|$ will be used in this section to denote the Euclidean norm in $d$-dimensional space.
An important modern example of stochastic filtering is the use of GPS measurements to correct an inertial navigation system (INS). An INS has transducers which measure acceleration. Starting from a known point in space, the acceleration can be integrated twice to compute a position. But since there are errors in the acceleration measurements, this position will gradually diverge from the true position. This is a phenomenon familiar to anyone who has navigated with a map by dead reckoning (from the station I walk NE for a mile, E for 2 miles – where am I?).
The simple navigator checks his position as frequently as possible by other means (e.g. observing the bearing to several landmarks, taking a star sight, or by looking at a handheld GPS receiver).
In a similar fashion we might try and correct our INS position estimates by using periodic GPS observations; these GPS position measurements cannot be made continuously and are subject to random errors; additionally the observer continues to move while taking the readings. What is the optimal way to use these GPS measurements to correct our INS estimate? The answer lies in the solution of a stochastic filtering problem. The signal process is the true position and the observation process is the GPS position process.
18.1. Signal Process
Let $\{X_t, \mathcal{F}_t, t \ge 0\}$ be the signal process. It is defined to be the solution of the stochastic differential equation
$$dX_t = f(t, X_t)\,dt + \sigma(t, X_t)\,dV_t, \qquad (4)$$
where $V$ is a $d$ dimensional standard Brownian motion. The coefficients satisfy the conditions
$$\|f(t, x) - f(t, y)\| + \|\sigma(t, x) - \sigma(t, y)\| \le k\|x - y\|,$$
$$\|f(t, x)\|^2 + \|\sigma(t, x)\|^2 \le k^2\left( 1 + \|x\|^2 \right),$$
which ensure that the equation for $X$ has a unique solution.
18.2. Observation Process
The observation process satisfies the stochastic differential equation
$$dY_t = h(t, X_t)\,dt + dW_t, \qquad Y_0 = 0, \qquad (5)$$
where $W_t$ is an $m$ dimensional standard Brownian motion independent of $V_t$. The function $h$, taking values in $\mathbb{R}^m$, satisfies a linear growth condition
$$\|h(t, x)\|^2 \le k\left( 1 + \|x\|^2 \right).$$
A consequence of the growth conditions is that for any $T > 0$,
$$E\|X_t\|^2 \le C\left( 1 + E\|X_0\|^2 \right) e^{ct}, \qquad (6)$$
for $t \in [0, T]$, with suitable constants $C$ and $c$ which may be functions of $T$. As a result of this bound,
$$E\left[ \int_0^T \|h(s, X_s)\|^2\,ds \right] < \infty.$$
Given this process, a family of observation σ-algebras may be defined,
$$\mathcal{Y}^o_t := \sigma\left( Y_s : 0 \le s \le t \right).$$
These must be augmented in the usual fashion to give
$$\mathcal{Y}_t = \sigma\left( \mathcal{Y}^o_t \cup \mathcal{N} \right),$$
where $\mathcal{N}$ is the set of $P$ null subsets of $\Omega$, and
$$\mathcal{Y} := \bigvee_{t \ge 0} \mathcal{Y}_t.$$
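The model (4)–(5) is easy to simulate, which is useful for experimenting with the filtering results that follow. The sketch below uses illustrative scalar coefficients not prescribed by the notes: $f(t, x) = -x$, $\sigma = 1$, $h(t, x) = x$.

```python
import numpy as np

def simulate_signal_observation(T=1.0, n=1000, x0=0.0, seed=5):
    """Simulate one path of the filtering model with scalar coefficients
        dX_t = -X_t dt + dV_t,      dY_t = X_t dt + dW_t,  Y_0 = 0,
    where V and W are independent Brownian motions (Euler discretisation)."""
    dt = T / n
    rng = np.random.default_rng(seed)
    dV = rng.normal(0.0, np.sqrt(dt), size=n)
    dW = rng.normal(0.0, np.sqrt(dt), size=n)
    X = np.empty(n + 1)
    Y = np.empty(n + 1)
    X[0], Y[0] = x0, 0.0
    for k in range(n):
        X[k + 1] = X[k] - X[k] * dt + dV[k]    # signal step
        Y[k + 1] = Y[k] + X[k] * dt + dW[k]    # observation step
    return X, Y

X, Y = simulate_signal_observation()
```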
18.3. The Filtering Problem
The above has set the scene: we have a real physical system whose state at time $t$ is represented by the vector $X_t$. The system state is governed by an SDE with a noise term representing random perturbations to the system. This is observed to give the observation process $Y_t$, which includes new independent noise (represented by $W_t$).
The filtering problem is to find the conditional law of the signal process given the observations to date, i.e. to find
$$\pi_t(\phi) := E\left( \phi(X_t) \mid \mathcal{Y}_t \right) \quad \forall t \in [0, T]. \qquad (7)$$
The following proofs are in many cases made complex by the fact that we do not assume that the functions $f$, $h$ and $\sigma$ are bounded.
18.4. Change of Measure
To solve the filtering problem, Girsanov’s theorem will be used to make a change of measure to a new measure under which the observation process $Y$ is a Brownian motion. Let
$$Z_t := \exp\left( -\int_0^t h^T(s, X_s)\,dW_s - \frac{1}{2} \int_0^t \|h(s, X_s)\|^2\,ds \right) \qquad (8)$$
be a real valued process.
Proposition 18.1.
The process $Z_t$ is a martingale with respect to the filtration $\{\mathcal{F}_t, t \ge 0\}$.
Proof
The process $Z_t$ satisfies
$$Z_t = 1 - \int_0^t Z_s h^T(s, X_s)\,dW_s,$$
that is, $Z_t$ solves the SDE
$$dZ_t = -Z_t h^T(t, X_t)\,dW_t, \qquad Z_0 = 1.$$
Given this expression, $Z_t$ is a positive continuous local martingale and hence is a supermartingale. To prove that it is a true martingale we must prove in addition that it has constant mean.
Let $T_n$ be a reducing sequence for the local martingale $Z_t$, i.e. an increasing sequence of stopping times tending to infinity as $n \to \infty$ such that for each $n$, $Z^{T_n}$ is a genuine martingale. By Fatou’s lemma, and the martingale property of the stopped processes,
$$EZ_t = E \lim_{n \to \infty} Z^{T_n}_t \le \liminf_{n \to \infty} EZ^{T_n}_t = EZ^{T_n}_0 = 1,$$
so
$$EZ_t \le 1 \quad \forall t. \qquad (9)$$
This will be used as an upper bound in an application of the dominated convergence theorem. For $\epsilon > 0$, apply Itô’s formula to the function $f$ defined by
$$f(x) = \frac{x}{1 + \epsilon x};$$
we obtain that
$$f(Z_t) = f(Z_0) + \int_0^t f'(Z_s)\,dZ_s + \frac{1}{2} \int_0^t f''(Z_s)\,d\langle Z \rangle_s,$$
hence
$$\frac{Z_t}{1 + \epsilon Z_t} = \frac{1}{1 + \epsilon} - \int_0^t \frac{Z_s h^T(s, X_s)}{(1 + \epsilon Z_s)^2}\,dW_s - \int_0^t \frac{\epsilon Z_s^2 \|h(s, X_s)\|^2}{(1 + \epsilon Z_s)^3}\,ds. \qquad (10)$$
Consider the term
$$\int_0^t \frac{Z_s h^T(s, X_s)}{(1 + \epsilon Z_s)^2}\,dW_s;$$
clearly this is a local martingale, since it is a stochastic integral. The next step in the proof is explained in detail, as it is one which occurs frequently. We wish to show that the above stochastic integral is in fact a genuine martingale. From the earlier theory (for $L^2$ integrable martingales) it suffices to show that the integrand is in $L^2(W)$. We therefore compute
$$\left\| \frac{Z_s h^T(s, X_s)}{(1 + \epsilon Z_s)^2} \right\|^2_W = E\left[ \int_0^t \left\| \frac{Z_s h^T(s, X_s)}{(1 + \epsilon Z_s)^2} \right\|^2 ds \right],$$
with a view to showing that it is finite, whence the integrand must be in $L^2(W)$ and hence the integral a genuine martingale. We note that since $Z_s \ge 0$,
$$\frac{1}{1 + \epsilon Z_s} \le 1, \quad \text{and} \quad \frac{Z_s^2}{(1 + \epsilon Z_s)^2} \le \frac{1}{\epsilon^2},$$
so
$$\left\| \frac{Z_s h^T(s, X_s)}{(1 + \epsilon Z_s)^2} \right\|^2 = \frac{Z_s^2}{(1 + \epsilon Z_s)^4} \|h(s, X_s)\|^2 = \left[ \frac{Z_s^2}{(1 + \epsilon Z_s)^2} \right] \frac{1}{(1 + \epsilon Z_s)^2} \|h(s, X_s)\|^2 \le \frac{1}{\epsilon^2} \|h(s, X_s)\|^2.$$
Using this inequality we obtain
$$E\left[ \int_0^t \left\| \frac{Z_s h^T(s, X_s)}{(1 + \epsilon Z_s)^2} \right\|^2 ds \right] \le \frac{1}{\epsilon^2} E\left[ \int_0^T \|h(s, X_s)\|^2\,ds \right];$$
however, as a consequence of our linear growth condition on $h$, the last expectation is finite.
Therefore
$$\int_0^t \frac{Z_s h^T(s, X_s)}{(1 + \epsilon Z_s)^2}\,dW_s$$
is a genuine martingale, and hence taking the expectation of (10) we obtain
$$E\left[ \frac{Z_t}{1 + \epsilon Z_t} \right] = \frac{1}{1 + \epsilon} - E\left[ \int_0^t \frac{\epsilon Z_s^2 \|h(s, X_s)\|^2}{(1 + \epsilon Z_s)^3}\,ds \right].$$
Consider the integrand on the right hand side; clearly it tends to zero as $\epsilon \to 0$. But also
$$\frac{\epsilon Z_s^2 \|h(s, X_s)\|^2}{(1 + \epsilon Z_s)^3} = \frac{\epsilon Z_s}{(1 + \epsilon Z_s)^3}\, Z_s \|h(s, X_s)\|^2 \le \frac{1 + \epsilon Z_s}{(1 + \epsilon Z_s)^3}\, Z_s \|h(s, X_s)\|^2 \le k Z_s\left( 1 + \|X_s\|^2 \right),$$
where we have used the fact that $\|h(s, X_s)\|^2 \le k(1 + \|X_s\|^2)$. So from the fact that $EZ_s \le 1$, and using the next lemma, we shall see that $E\left[ Z_s \|X_s\|^2 \right] \le C$. Hence we conclude that $kZ_s(1 + \|X_s\|^2)$ is an integrable dominating function for the integrand of interest.
Hence by the dominated convergence theorem, as $\epsilon \to 0$,
$$E\left[ \int_0^t \frac{\epsilon Z_s^2 \|h(s, X_s)\|^2}{(1 + \epsilon Z_s)^3}\,ds \right] \to 0.$$
In addition, since $E(Z_t) \le 1$, the dominated convergence theorem also yields
$$E\left[ \frac{Z_t}{1 + \epsilon Z_t} \right] \to E(Z_t), \quad \text{as } \epsilon \to 0.$$
Hence we conclude that
$$E(Z_t) = 1 \quad \forall t,$$
and so $Z_t$ is a genuine martingale.
Lemma 18.2.
$$E\left[ \int_0^t Z_s \|X_s\|^2\,ds \right] < C(T) \quad \forall t \in [0, T].$$
Proof
To establish this result, Itô’s formula is used to derive two important identities:
$$d\left( \|X_t\|^2 \right) = 2X^T_t \left( f\,dt + \sigma\,dV_t \right) + \operatorname{tr}(\sigma^T \sigma)\,dt,$$
$$d\left( Z_t \|X_t\|^2 \right) = -Z_t \|X_t\|^2 h^T\,dW_t + Z_t \left[ 2X^T_t \left( f\,dt + \sigma\,dV_t \right) + \operatorname{tr}(\sigma \sigma^T)\,dt \right],$$
where the cross variation term vanishes because $V$ and $W$ are independent.
By Itô’s formula, applied to $x \mapsto x/(1 + \epsilon x)$,
$$d\left( \frac{Z_t \|X_t\|^2}{1 + \epsilon Z_t \|X_t\|^2} \right) = \frac{1}{\left( 1 + \epsilon Z_t \|X_t\|^2 \right)^2}\,d\left( Z_t \|X_t\|^2 \right) - \frac{\epsilon}{\left( 1 + \epsilon Z_t \|X_t\|^2 \right)^3}\,d\left\langle Z \|X\|^2 \right\rangle_t.$$
Substituting for $d(Z_t \|X_t\|^2)$ into this expression yields
$$d\left( \frac{Z_t \|X_t\|^2}{1 + \epsilon Z_t \|X_t\|^2} \right) = \frac{1}{\left( 1 + \epsilon Z_t \|X_t\|^2 \right)^2} \left( -Z_t \|X_t\|^2 h^T\,dW_t + 2Z_t X^T_t \sigma\,dV_t \right)$$
$$+ \left[ \frac{Z_t \left( 2X^T_t f + \operatorname{tr}(\sigma \sigma^T) \right)}{\left( 1 + \epsilon Z_t \|X_t\|^2 \right)^2} - \epsilon\, \frac{Z_t^2 \|X_t\|^4 h^T h + 4Z_t^2 X^T_t \sigma \sigma^T X_t}{\left( 1 + \epsilon Z_t \|X_t\|^2 \right)^3} \right] dt.$$
After integrating this expression from 0 to $t$, the terms which are stochastic integrals are local martingales. In fact they can be shown to be genuine martingales. For example consider the term
$$\int_0^t \frac{2Z_s X^T_s \sigma}{\left( 1 + \epsilon Z_s \|X_s\|^2 \right)^2}\,dV_s;$$
to show that this is a martingale we must therefore establish that
$$E\left[ \int_0^t \left\| \frac{2Z_s X^T_s \sigma}{\left( 1 + \epsilon Z_s \|X_s\|^2 \right)^2} \right\|^2 ds \right] = 4E\left[ \int_0^t \frac{Z_s^2 X^T_s \sigma \sigma^T X_s}{\left( 1 + \epsilon Z_s \|X_s\|^2 \right)^4}\,ds \right] < \infty.$$
In order to establish this inequality, consider the term $X^T_t \sigma \sigma^T X_t$, which is a sum over terms of the form
$$\left| X^i_t\, \sigma_{ij}(t, X_t)\, \sigma_{kj}(t, X_t)\, X^k_t \right| \le \|X_t\|^2 \|\sigma\|^2,$$
of which there are $d^3$ terms (in $\mathbb{R}^d$). But by the linear growth condition on $\sigma$ we have, for some constant $\kappa$,
$$\|\sigma\|^2 \le \kappa\left( 1 + \|X\|^2 \right),$$
and hence
$$\left| X^i_t\, \sigma_{ij}(t, X_t)\, \sigma_{kj}(t, X_t)\, X^k_t \right| \le \kappa \|X_t\|^2 \left( 1 + \|X_t\|^2 \right),$$
so the integral may be bounded by
$$\int_0^t \frac{Z_s^2 X^T_s \sigma \sigma^T X_s}{\left( 1 + \epsilon Z_s \|X_s\|^2 \right)^4}\,ds \le \kappa d^3 \int_0^t \frac{Z_s^2 \|X_s\|^2 \left( 1 + \|X_s\|^2 \right)}{\left( 1 + \epsilon Z_s \|X_s\|^2 \right)^4}\,ds$$
$$= \kappa d^3 \left[ \int_0^t \frac{Z_s^2 \|X_s\|^2}{\left( 1 + \epsilon Z_s \|X_s\|^2 \right)^4}\,ds + \int_0^t \frac{Z_s^2 \|X_s\|^4}{\left( 1 + \epsilon Z_s \|X_s\|^2 \right)^4}\,ds \right].$$
Considering each term separately, the first satisfies
$$\int_0^t \frac{Z_s^2 \|X_s\|^2}{\left( 1 + \epsilon Z_s \|X_s\|^2 \right)^4}\,ds \le \int_0^t Z_s\, \frac{Z_s \|X_s\|^2}{1 + \epsilon Z_s \|X_s\|^2} \cdot \frac{1}{\left( 1 + \epsilon Z_s \|X_s\|^2 \right)^3}\,ds \le \frac{1}{\epsilon} \int_0^t Z_s\,ds.$$
The last integral has a bounded expectation since $E(Z_s) \le 1$. Similarly for the second term,
$$\int_0^t \frac{Z_s^2 \|X_s\|^4}{\left( 1 + \epsilon Z_s \|X_s\|^2 \right)^4}\,ds \le \int_0^t \frac{Z_s^2 \|X_s\|^4}{\left( 1 + \epsilon Z_s \|X_s\|^2 \right)^2} \cdot \frac{1}{\left( 1 + \epsilon Z_s \|X_s\|^2 \right)^2}\,ds \le \frac{1}{\epsilon^2}\, t < \infty.$$
A similar argument holds for the other stochastic integrals, so they must both be genuine martingales, and hence if we take expectations the stochastic integral terms vanish. Integrating the whole expression from 0 to $t$, taking the expectation, and finally differentiating with respect to $t$ yields
$$\frac{d}{dt} E\left[ \frac{Z_t \|X_t\|^2}{1 + \epsilon Z_t \|X_t\|^2} \right] \le E\left[ \frac{Z_t \left( 2X^T_t f + \operatorname{tr}(\sigma \sigma^T) \right)}{\left( 1 + \epsilon Z_t \|X_t\|^2 \right)^2} \right] \le k\left( 1 + E\left[ \frac{Z_t \|X_t\|^2}{1 + \epsilon Z_t \|X_t\|^2} \right] \right).$$
An application of Gronwall’s inequality (see the next section for a proof of this useful result) establishes
$$E\left[ \frac{Z_t \|X_t\|^2}{1 + \epsilon Z_t \|X_t\|^2} \right] \le C(T) \quad \forall t \in [0, T].$$
Applying Fatou’s lemma as $\epsilon \to 0$, the desired result is obtained.
Given that $Z_t$ has been shown to be a martingale, a new probability measure $\tilde{P}$ may be defined by the Radon–Nikodym derivative
$$\left. \frac{d\tilde{P}}{dP} \right|_{\mathcal{F}_T} = Z_T.$$
This change of measure allows us to establish one of the most important results in stochastic filtering theory.
Proposition 18.3.
Under the probability measure $\tilde{P}$, the observation process $Y_t$ is a Brownian motion, which is independent of $X_t$.
Proof
Define a process $M_t$ taking values in $\mathbb{R}$ by
$$M_t = -\int_0^t h^T(s, X_s)\,dW_s. \qquad (11)$$
Since $W_t$ is a $P$ Brownian motion, and hence a martingale under $P$, it follows that $M_t$ is a $P$ local martingale; the previous lemma then establishes that $M_t$ is in fact a $P$ martingale. Thus we may apply Girsanov’s theorem to construct a new measure via
$$\left. \frac{d\tilde{P}}{dP} \right|_t = \mathcal{E}(M)_t = \exp\left( -\int_0^t h^T(s, X_s)\,dW_s - \frac{1}{2} \int_0^t \|h(s, X_s)\|^2\,ds \right) = Z_t.$$
As a corollary of Girsanov’s theorem, $W_t - \langle W, M \rangle_t$ is a continuous $\tilde{P}$ local martingale. Lévy’s characterisation theorem then implies that under $\tilde{P}$ the process
$$W_t - \langle W, M \rangle_t = W_t + \int_0^t h(s, X_s)\,ds$$
is an $m$ dimensional Brownian motion. But this process is just the observation process $Y_t$ by definition. Thus we have proven that under $\tilde{P}$ the observation process $Y_t$ is a Brownian motion. To establish the second part of the proposition we must prove the independence of $Y$ and $X$. The law of the pair $(X, Y)$ on $[0, T]$ is absolutely continuous with respect to that of the pair $(X, W)$ on $[0, T]$, since the former is equal to the latter plus a drift term.
Denote their Radon–Nikodym derivative (which exists since the laws are absolutely continuous with respect to each other) by $\Psi_T$; this means that for $f$ any bounded measurable function
$$E^P\left[ f(X_t, Y_t) \Psi_\infty \right] = E^P\left[ E^P\left[ f(X_t, Y_t) \Psi_\infty \mid \mathcal{F}_t \right] \right] = E^P\left[ f(X_t, Y_t)\, E^P\left[ \Psi_\infty \mid \mathcal{F}_t \right] \right] = E^P\left[ f(X_t, Y_t) \Psi_t \right] = E^P\left[ f(X_t, W_t) \right].$$
Hence in terms of the probability measure $\tilde{P}$,
$$E^{\tilde{P}}\left[ f(X_t, Y_t) \right] = E^P\left[ f(X_t, Y_t) Z_\infty \right] = E^P\left[ f(X_t, W_t) \right],$$
and hence, since $X$ and $W$ are independent under $P$, the variables $X$ and $Y$ are independent under $\tilde{P}$.
Proposition 18.4.
Let $U$ be an integrable $\mathcal{F}_t$ measurable random variable. Then
$$E^{\tilde{P}}\left[ U \mid \mathcal{Y}_t \right] = E^{\tilde{P}}\left[ U \mid \mathcal{Y} \right].$$
Proof
Define
$$\mathcal{Y}^t := \sigma\left( Y_{t+u} - Y_t : u \ge 0 \right);$$
then $\mathcal{Y} = \mathcal{Y}_t \vee \mathcal{Y}^t$. Under the measure $\tilde{P}$, the σ-algebra $\mathcal{Y}^t$ is independent of $\mathcal{F}_t$, since $Y$ is an $\mathcal{F}_t$ adapted Brownian motion, which implies that it satisfies the independent increment property. Hence
$$E^{\tilde{P}}\left[ U \mid \mathcal{Y}_t \right] = E^{\tilde{P}}\left[ U \mid \mathcal{Y}_t \vee \mathcal{Y}^t \right] = E^{\tilde{P}}\left[ U \mid \mathcal{Y} \right].$$
18.5. The Unnormalised Conditional Distribution
To simplify the notation in what follows, define $\tilde{Z}_t = 1/Z_t$. Then under the measure $\tilde{P}$ we have
$$d\tilde{Z}_t = \tilde{Z}_t h^T(t, X_t)\,dY_t, \qquad (12)$$
so $\tilde{Z}_t$ is a $\tilde{P}$ local martingale. Hence
$$\tilde{Z}_t = \exp\left( \int_0^t h^T(s, X_s)\,dY_s - \frac{1}{2} \int_0^t \|h(s, X_s)\|^2\,ds \right).$$
Stochastic Filtering 77
Also for any stopping time T, E
˜
P
˜
Z
T
= E
P
Z
T
˜
Z
T
= 1, the constant mean condition
implies that
˜
Z
T
is a UI
˜
P martingale and thus by the martingale convergence theorem
Z
t
→ Z
∞
in L
1
. So we may write
dP
d
˜
P
F
∞
=
˜
Z
∞
,
whence
dP
d
˜
P
F
t
=
˜
Z
t
, for all t ≥ 0.
Now define, for every bounded measurable function $\phi$, the unnormalised conditional distribution of $X$ via
$$\rho_t(\phi) := E_{\tilde P}\big[ \phi(X_t)\tilde Z_t \,\big|\, \mathcal Y_t \big] = E_{\tilde P}\big[ \phi(X_t)\tilde Z_t \,\big|\, \mathcal Y \big], \qquad (13)$$
using the result of Proposition 18.4.
The following result, within the context of stochastic filtering, is sometimes called the Kallianpur-Striebel formula.
Proposition 18.5.
For every bounded measurable function $\phi$, we have
$$\pi_t(\phi) := E_P[\phi(X_t) \mid \mathcal Y_t] = \frac{\rho_t(\phi)}{\rho_t(1)}, \quad P\text{ and }\tilde P\text{ a.s.}$$
Proof
Let $b$ be a bounded $\mathcal Y_t$ measurable function. From the definition of $\pi_t$, viz.
$$\pi_t(\phi) := E_P[\phi(X_t) \mid \mathcal Y_t],$$
we deduce from the definition of conditional expectation that
$$E_P\big( \pi_t(\phi)\,b \big) = E_P\big( \phi(X_t)\,b \big),$$
hence by the tower property of conditional expectation
$$E_{\tilde P}\big( \tilde Z_t\,\pi_t(\phi)\,b \big) = E_{\tilde P}\big( \tilde Z_t\,\phi(X_t)\,b \big).$$
Whence
$$E_{\tilde P}\big( \pi_t(\phi)\,E_{\tilde P}[\tilde Z_t \mid \mathcal Y_t]\,b \big) = E_{\tilde P}\big( E_{\tilde P}[\phi(X_t)\tilde Z_t \mid \mathcal Y_t]\,b \big),$$
which is equivalent to
$$E_{\tilde P}\big( \pi_t(\phi)\rho_t(1)\,b \big) = E_{\tilde P}\big( \rho_t(\phi)\,b \big).$$
This implies that
$$E_{\tilde P}\big( \pi_t(\phi)\rho_t(1) \,\big|\, \mathcal Y_t \big) = E_{\tilde P}\big( \rho_t(\phi) \,\big|\, \mathcal Y_t \big).$$
As the random variables $\pi_t(\phi)$, $\rho_t(\phi)$ and $\rho_t(1)$ are, from their definitions, all $\mathcal Y_t$ measurable, this implies that
$$\pi_t(\phi)\rho_t(1) = \rho_t(\phi).$$
Since $\tilde Z_t > 0$ a.s., which implies that $\rho_t(1) > 0$ a.s., this is sufficient to prove the required result.
Remark
The above proof is simply a specific case of the abstract form of Bayes' theorem, the general form of which is a standard result; see e.g. A.0.3 in [Musiela and Rutkowski, 2005].
18.6. The Zakai Equation
We wish to derive a stochastic differential equation which is satisfied by the unnormalised density $\rho_t$.
Definition 18.6.
A set $S$ is said to be total in $L^1(\Omega, \mathcal F_t, P)$ if, for every $a \in L^1(\Omega, \mathcal F_t, P)$,
$$E(a\epsilon) = 0 \;\;\forall \epsilon \in S \quad\Rightarrow\quad a = 0.$$
Lemma 18.7.
Let
$$S_t := \left\{ \epsilon_t = \exp\left( i\int_0^t r_s^T\,dY_s + \frac{1}{2}\int_0^t \|r_s\|^2\,ds \right) : r \in L^\infty\big([0, t]; \mathbb R^m\big) \right\};$$
then the set $S_t$ is a total set in $L^1(\Omega, \mathcal Y_t, \tilde P)$. The following proof of this standard result is taken from [Bensoussan, 1982].
Proof
It suffices to show that if $a \in L^1(\Omega, \mathcal Y_t, \tilde P)$ and $\tilde E(a\epsilon_t) = 0$ for all $\epsilon_t \in S_t$, then $a = 0$. Take $t_1, t_2, \ldots, t_p \in (0, t)$ such that $t_1 < t_2 < \cdots < t_p$; then given $k_1, \ldots, k_p$,
$$\sum_{j=1}^p k_j^T Y_{t_j} = \sum_{j=1}^p \mu_j^T\big( Y_{t_j} - Y_{t_{j-1}} \big) = \int_0^t b_s^T\,dY_s,$$
where
$$\mu_p = k_p, \quad \mu_{p-1} = k_p + k_{p-1}, \quad\ldots,\quad \mu_1 = k_p + \cdots + k_1,$$
and
$$b_s = \begin{cases} \mu_j & \text{for } s \in (t_{j-1}, t_j), \\ 0 & \text{for } s \in (t_p, t). \end{cases}$$
Hence
$$\tilde E\left[ a \exp\left( i\int_0^t b_s^T\,dY_s \right) \right] = \tilde E\left[ a \exp\left( i\sum_{j=1}^p k_j^T Y_{t_j} \right) \right] = 0,$$
and the same holds for linear combinations by the linearity of $\tilde E$, viz.
$$\tilde E\left[ a \sum_{k=1}^K c_k \exp\left( i\sum_{j=1}^p k_{j,k}^T Y_{t_j} \right) \right] = 0,$$
where $c_1, \ldots, c_K$ are complex coefficients. If $F(x_1, \ldots, x_p)$ is a continuous bounded complex valued function defined on $(\mathbb R^m)^p$ then, since the set
$$\left\{ f(x_1, \ldots, x_p) = \sum_{k=1}^K c_k \exp\left( i\sum_{j=1}^p k_{j,k}^T x_j \right) : c_k \in \mathbb C \;\forall k,\; k_{j,k} \in \mathbb R^m \;\forall j, k \right\}$$
is closed under complex conjugation and separates points in $(\mathbb R^m)^p$, the Stone-Weierstrass approximation theorem for complex valued functions implies that there exists a uniformly bounded sequence of functions of the form
$$P^{(n)}(x_1, \ldots, x_p) = \sum_{k=1}^{K^{(n)}} c_k^{(n)} \exp\left( i\sum_{j=1}^p \big( k_{j,k}^{(n)} \big)^T x_j \right)$$
such that
$$\lim_{n\to\infty} P^{(n)}(x_1, \ldots, x_p) = F(x_1, \ldots, x_p);$$
hence
$$\tilde E\big[ a\,F(Y_{t_1}, \ldots, Y_{t_p}) \big] = 0$$
for every continuous bounded $F$, and by the usual approximation argument this extends to every $F$ bounded Borel measurable with respect to $\sigma(Y_{t_1}, \ldots, Y_{t_p})$. As $0 < t_1 < \cdots < t_p < t$ were arbitrary, it follows that $\tilde E(af) = 0$ for every bounded $\mathcal Y_t$ measurable $f$; taking $f = a \wedge n$ for arbitrary $n$ then implies $\tilde E(a^2 \wedge n) = 0$, whence $a = 0$, $\tilde P$ a.s.
Remark
As a result, if $\phi \in L^1(\Omega, \mathcal Y_t, \tilde P)$ and we wish to compute $\tilde E(\phi \mid \mathcal Y_t)$, it suffices to compute $\tilde E(\phi\,\epsilon_t)$ for all $\epsilon_t \in S_t$.
Lemma 18.8.
Let $\{U_t\}_{t\ge0}$ be a continuous $\mathcal F_t$ adapted process such that
$$\tilde E\left[ \int_0^T U_t^2\,dt \right] < \infty \quad \forall T \ge 0;$$
then for any $t \ge 0$, and for $j = 1, \ldots, m$, the following holds:
$$\tilde E\left[ \int_0^t U_s\,dY_s^j \,\bigg|\, \mathcal Y_t \right] = \int_0^t \tilde E\big( U_s \mid \mathcal Y_s \big)\,dY_s^j.$$
Proof
As $S_t$ is a total set in $L^1(\Omega, \mathcal Y_t, \tilde P)$, it is sufficient to prove that for any $\epsilon_t \in S_t$,
$$\tilde E\left[ \epsilon_t \int_0^t U_s\,dY_s^j \right] = \tilde E\left[ \epsilon_t \int_0^t \tilde E(U_s \mid \mathcal Y_s)\,dY_s^j \right].$$
To this end note that every $\epsilon_t \in S_t$ satisfies
$$\epsilon_t = 1 + \int_0^t i\epsilon_s r_s^T\,dY_s,$$
and hence
$$\tilde E\left[ \epsilon_t \int_0^t U_s\,dY_s^j \right] = \tilde E\left[ \left( 1 + \int_0^t i\epsilon_s r_s^T\,dY_s \right) \int_0^t U_s\,dY_s^j \right] = \tilde E\left[ \int_0^t U_s\,dY_s^j \right] + \tilde E\left[ \int_0^t i\epsilon_s r_s^j U_s\,ds \right].$$
Here the last term is computed by recalling the definition of the covariation process, in conjunction with the Kunita-Watanabe identity. The first term on the right hand side above is a martingale (from the boundedness condition on $U_s$) and hence its expectation vanishes. Thus, using the fact that the Lebesgue integral and the conditional expectation $\tilde E(\cdot \mid \mathcal Y)$ commute,
$$\begin{aligned}
\tilde E\left[ \epsilon_t \int_0^t U_s\,dY_s^j \right]
&= \tilde E\left[ \int_0^t i\epsilon_s r_s^j U_s\,ds \right]
= \tilde E\left[ \tilde E\left[ \int_0^t i\epsilon_s r_s^j U_s\,ds \,\bigg|\, \mathcal Y \right] \right] \\
&= \tilde E\left[ \int_0^t i\,\tilde E\big[ \epsilon_s r_s^j U_s \,\big|\, \mathcal Y \big]\,ds \right]
= \tilde E\left[ \int_0^t i\epsilon_s r_s^j\,\tilde E\big( U_s \,\big|\, \mathcal Y \big)\,ds \right] \\
&= \tilde E\left[ \int_0^t i\epsilon_s r_s^T\,dY_s \int_0^t \tilde E(U_s \mid \mathcal Y)\,dY_s^j \right]
= \tilde E\left[ \epsilon_t \int_0^t \tilde E(U_s \mid \mathcal Y)\,dY_s^j \right].
\end{aligned}$$
But from Proposition 18.4, since $U_s$ is $\mathcal F_s$ measurable, it follows that
$$\tilde E(U_s \mid \mathcal Y) = \tilde E(U_s \mid \mathcal Y_s),$$
whence
$$\tilde E\left[ \epsilon_t \int_0^t U_s\,dY_s^j \right] = \tilde E\left[ \epsilon_t \int_0^t \tilde E(U_s \mid \mathcal Y_s)\,dY_s^j \right].$$
As this holds for all $\epsilon_t$ in $S_t$, and the latter is a total set, by the earlier remarks this establishes the result.
Lemma 18.9.
Let $\{U_t\}_{t\ge0}$ be a real valued $\mathcal F_t$ adapted continuous process such that
$$\tilde E\left[ \int_0^T U_s^2\,ds \right] < \infty \quad \forall T \ge 0, \qquad (14)$$
and let $R_t$ be another continuous $\mathcal F_t$ adapted process such that $\langle R, Y\rangle_t = 0$ for all $t$. Then
$$\tilde E\left[ \int_0^t U_s\,dR_s \,\bigg|\, \mathcal Y_t \right] = 0.$$
Proof
As before, use the fact that $S_t$ is a total set, so it suffices to show that for each $\epsilon_t \in S_t$,
$$\tilde E\left[ \epsilon_t \int_0^t U_s\,dR_s \right] = 0.$$
As in the previous proof, we note that each $\epsilon_t$ in $S_t$ satisfies
$$\epsilon_t = 1 + \int_0^t i\epsilon_s r_s^T\,dY_s.$$
Hence substituting this into the above expression yields
$$\begin{aligned}
\tilde E\left[ \epsilon_t \int_0^t U_s\,dR_s \right]
&= \tilde E\left[ \int_0^t U_s\,dR_s \right] + \tilde E\left[ \int_0^t i\epsilon_s r_s^T\,dY_s \int_0^t U_s\,dR_s \right] \\
&= \tilde E\left[ \int_0^t U_s\,dR_s \right] + \tilde E\left[ \int_0^t i\epsilon_s U_s r_s^T\,d\langle Y, R\rangle_s \right]
= \tilde E\left[ \int_0^t U_s\,dR_s \right] = 0.
\end{aligned}$$
The remaining term vanishes because, by the condition (14), the stochastic integral is a genuine martingale; the covariation term vanishes because from the hypotheses $\langle Y, R\rangle_t = 0$.
Remark
In the context of stochastic filtering a natural application of this lemma will be made by setting $R_t = V_t$, the noise process driving the signal.
We are now in a position to state and prove the Zakai equation. The Zakai equation is important because it is a parabolic stochastic partial differential equation which is satisfied by $\rho_t$, and indeed its solution provides a practical means of computing $\rho_t(\phi) = \tilde E\big[ \phi(X_t)\tilde Z_t \mid \mathcal Y_t \big]$. Indeed the Zakai equation provides a method by which numerical solution of the nonlinear filtering problem may be approached, by using recursive algorithms to solve the stochastic differential equation.
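As a hedged illustration of this last remark, and not something taken from these notes: when the signal is a finite-state Markov chain, the Zakai equation reduces to a finite-dimensional linear SDE for the vector $\rho_t$, which a simple recursive scheme can solve. The generator `Q`, the observation function `h` and the step size below are illustrative assumptions.

```python
import numpy as np

# Sketch: recursive solution of a finite-state Zakai equation
#   d rho_t = Q^T rho_t dt + diag(h) rho_t dY_t
# for a two-state signal chain; Q, h and dt are illustrative assumptions.
rng = np.random.default_rng(2)
Q = np.array([[-1.0, 1.0], [1.0, -1.0]])   # generator of the signal chain
h = np.array([-1.0, 1.0])                  # observation function h(state)
dt, n = 1e-3, 5000

state = 0
rho = np.array([0.5, 0.5])                 # unnormalised conditional density
for _ in range(n):
    # simulate a signal jump and the noisy observation increment dY = h dt + dW
    if rng.random() < -Q[state, state] * dt:
        state = 1 - state
    dY = h[state] * dt + np.sqrt(dt) * rng.standard_normal()
    # recursive Zakai step: prediction by the generator, then a
    # multiplicative (positivity-preserving) observation update
    rho = rho + dt * (Q.T @ rho)
    rho = rho * np.exp(h * dY - 0.5 * h**2 * dt)

pi = rho / rho.sum()                        # Kallianpur-Striebel normalisation
assert np.all(rho > 0) and abs(pi.sum() - 1.0) < 1e-12
```

Normalising the recursion's output, as in the last line, recovers an approximation to $\pi_t$ via the Kallianpur-Striebel formula of Proposition 18.5.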
Theorem 18.10.
Let $A$ be the infinitesimal generator of the signal process $X_t$, and let the domain of this infinitesimal generator be denoted $\mathcal D(A)$. Then, subject to the usual conditions on $f$, $\sigma$ and $h$, the unnormalised conditional distribution of $X$ satisfies the Zakai equation, which is
$$\rho_t(\phi) = \pi_0(\phi) + \int_0^t \rho_s(A_s\phi)\,ds + \int_0^t \rho_s\big( h^T\phi \big)\,dY_s, \quad \forall t \ge 0,\;\forall \phi \in \mathcal D(A). \qquad (15)$$
Proof
We approximate $\tilde Z_t$ by
$$\tilde Z_t^\epsilon = \frac{\tilde Z_t}{1 + \epsilon\tilde Z_t},$$
noting that since $\tilde Z_t \ge 0$ this approximation is bounded (indeed $\tilde Z_t^\epsilon \le 1/\epsilon$), and the drift coefficient appearing below satisfies $\tilde Z_t^2(1 + \epsilon\tilde Z_t)^{-3} \le 4/(27\epsilon^2)$. Thus we have approximated the process $\tilde Z_t$ by a bounded process. Application of Itô's formula to $f(x) = x/(1 + \epsilon x)$ yields
$$d\tilde Z_t^\epsilon = df(\tilde Z_t) = \big( 1 + \epsilon\tilde Z_t \big)^{-2}\tilde Z_t\,h^T(t, X_t)\,dY_t - \epsilon\big( 1 + \epsilon\tilde Z_t \big)^{-3}\tilde Z_t^2\,\|h(t, X_t)\|^2\,dt.$$
The integration by parts formula yields
$$d\big( \tilde Z_t^\epsilon\,\phi(X_t) \big) = \phi(X_t)\,d\tilde Z_t^\epsilon + \tilde Z_t^\epsilon\,d\big( \phi(X_t) \big) + d\big\langle \tilde Z^\epsilon, \phi(X) \big\rangle_t.$$
However, from the definition of an infinitesimal generator, we know that
$$\phi(X_t) - \phi(X_0) = \int_0^t A_s\phi(X_s)\,ds + M_t^\phi,$$
where $M_t^\phi$ is a martingale given by
$$M_t^\phi = \sum_{j=1}^d \int_0^t \sum_{i=1}^d \frac{\partial\phi}{\partial x_i}(X_s)\,\sigma_{ij}(s, X_s)\,dV_s^j.$$
Using the expression for $d\tilde Z_t^\epsilon$ found earlier, and noting that the covariation term $\langle \tilde Z^\epsilon, \phi(X)\rangle$ vanishes (the observation noise $W$ and the signal noise $V$ being independent), we may write
$$\begin{aligned}
d\big( \tilde Z_t^\epsilon\,\phi(X_t) \big) = {}& \Big( \tilde Z_t^\epsilon A_t\phi(X_t) - \epsilon\,\phi(X_t)\big( 1 + \epsilon\tilde Z_t \big)^{-3}\tilde Z_t^2\,\|h(t, X_t)\|^2 \Big)\,dt \\
&+ \tilde Z_t^\epsilon\,dM_t^\phi + \phi(X_t)\big( 1 + \epsilon\tilde Z_t \big)^{-2}\tilde Z_t\,h^T(t, X_t)\,dY_t.
\end{aligned}$$
In terms of the functions already defined we can write the infinitesimal generator of $X_t$ as
$$A_s\phi(x) = \sum_{i=1}^d f_i(s, x)\,\frac{\partial\phi}{\partial x_i}(x) + \frac{1}{2}\sum_{i=1}^d \sum_{j=1}^d \sum_{k=1}^d \sigma_{ik}(s, x)\,\sigma_{jk}(s, x)\,\frac{\partial^2\phi}{\partial x_i\,\partial x_j}(x).$$
Now we wish to compute
$$\rho_t(\phi) = \tilde E\big[ \tilde Z_t\,\phi(X_t) \,\big|\, \mathcal Y_t \big];$$
we shall do this using the previous expression, which on integration gives us
$$\begin{aligned}
\tilde Z_t^\epsilon\,\phi(X_t) = \tilde Z_0^\epsilon\,\phi(X_0)
&+ \int_0^t \Big( \tilde Z_s^\epsilon A_s\phi(X_s) - \epsilon\,\phi(X_s)\big( 1 + \epsilon\tilde Z_s \big)^{-3}\tilde Z_s^2\,\|h(s, X_s)\|^2 \Big)\,ds \\
&+ \int_0^t \tilde Z_s^\epsilon\,dM_s^\phi + \int_0^t \phi(X_s)\big( 1 + \epsilon\tilde Z_s \big)^{-2}\tilde Z_s\,h^T(s, X_s)\,dY_s. \qquad (16)
\end{aligned}$$
Note that
$$\tilde E\big[ \tilde Z_0^\epsilon\,\phi(X_0) \,\big|\, \mathcal Y_t \big] = \tilde E\left[ \frac{1}{1 + \epsilon}\,\phi(X_0) \,\bigg|\, \mathcal Y_0 \right] = \frac{\pi_0(\phi)}{1 + \epsilon}.$$
Hence, taking conditional expectations of both sides of (16),
$$\begin{aligned}
\tilde E\big[ \tilde Z_t^\epsilon\,\phi(X_t) \,\big|\, \mathcal Y_t \big] = \frac{\pi_0(\phi)}{1 + \epsilon}
&+ \tilde E\left[ \int_0^t \phi(X_s)\big( 1 + \epsilon\tilde Z_s \big)^{-2}\tilde Z_s\,h^T(s, X_s)\,dY_s \,\bigg|\, \mathcal Y_t \right] \\
&+ \tilde E\left[ \int_0^t \Big( \tilde Z_s^\epsilon A_s\phi(X_s) - \epsilon\,\phi(X_s)\big( 1 + \epsilon\tilde Z_s \big)^{-3}\tilde Z_s^2\,\|h(s, X_s)\|^2 \Big)\,ds \,\bigg|\, \mathcal Y_t \right] \\
&+ \tilde E\left[ \int_0^t \tilde Z_s^\epsilon\,dM_s^\phi \,\bigg|\, \mathcal Y_t \right].
\end{aligned}$$
The final term vanishes by Lemma 18.9 (applied with $R = V$, since $\langle V, Y\rangle = 0$). Applying Lemma 18.8, which is possible since $\tilde Z_t^\epsilon$ is bounded, allows us to interchange the (stochastic) integrals and the conditional expectations to give
$$\begin{aligned}
\tilde E\big[ \tilde Z_t^\epsilon\,\phi(X_t) \,\big|\, \mathcal Y_t \big] = \frac{\pi_0(\phi)}{1 + \epsilon}
&+ \int_0^t \tilde E\big[ \phi(X_s)\big( 1 + \epsilon\tilde Z_s \big)^{-2}\tilde Z_s\,h^T(s, X_s) \,\big|\, \mathcal Y_s \big]\,dY_s \\
&+ \int_0^t \tilde E\big[ \tilde Z_s^\epsilon A_s\phi(X_s) - \epsilon\,\phi(X_s)\big( 1 + \epsilon\tilde Z_s \big)^{-3}\tilde Z_s^2\,\|h(s, X_s)\|^2 \,\big|\, \mathcal Y_s \big]\,ds.
\end{aligned}$$
Now we take the $\epsilon \downarrow 0$ limit. Clearly $\tilde Z_t^\epsilon \to \tilde Z_t$ pointwise, and thus by the monotone convergence theorem, since $\tilde Z_t^\epsilon \uparrow \tilde Z_t$ as $\epsilon \to 0$,
$$\tilde E\big[ \tilde Z_t^\epsilon\,\phi(X_t) \,\big|\, \mathcal Y_t \big] \to \tilde E\big[ \tilde Z_t\,\phi(X_t) \,\big|\, \mathcal Y_t \big] = \rho_t(\phi),$$
and also by the monotone convergence theorem
$$\tilde E\big[ \tilde Z_s^\epsilon A_s\phi(X_s) \,\big|\, \mathcal Y_s \big] \to \rho_s(A_s\phi).$$
We can bound this: for all $\epsilon > 0$,
$$\tilde E\big( \tilde Z_s^\epsilon A_s\phi(X_s) \,\big|\, \mathcal Y_s \big) \le \|A_s\phi\|_\infty\,\tilde E\big( \tilde Z_s \,\big|\, \mathcal Y_s \big),$$
and the right hand side is an $L^1$ bounded quantity. Hence by the dominated convergence theorem we have
$$\int_0^t \tilde E\big[ \tilde Z_s^\epsilon A_s\phi(X_s) \,\big|\, \mathcal Y_s \big]\,ds \to \int_0^t \rho_s(A_s\phi)\,ds \quad\text{a.s.}$$
Also note that
$$\lim_{\epsilon\to0}\; \epsilon\,\phi(X_s)\big( \tilde Z_s^\epsilon \big)^2\big( 1 + \epsilon\tilde Z_s \big)^{-1}\|h(s, X_s)\|^2 = 0,$$
and the function may be dominated via
$$\epsilon\,\phi(X_s)\big( \tilde Z_s^\epsilon \big)^2\big( 1 + \epsilon\tilde Z_s \big)^{-1}\|h(s, X_s)\|^2
\le \|\phi\|_\infty\,\frac{\tilde Z_s}{1 + \epsilon\tilde Z_s}\cdot\frac{\epsilon\tilde Z_s}{1 + \epsilon\tilde Z_s}\cdot\frac{1}{1 + \epsilon\tilde Z_s}\,\|h(s, X_s)\|^2
\le \|\phi\|_\infty\,\tilde Z_s\,\|h(s, X_s)\|^2
\le \tilde Z_s\,k\big( 1 + \|X_s\|^2 \big),$$
where the constant $k$ absorbs $\|\phi\|_\infty$ and the linear growth bound on $\|h\|^2$.
But
$$\tilde E\Big( \tilde Z_s\,k\big( 1 + \|X_s\|^2 \big) \Big) = k\,E\Big( Z_s\tilde Z_s\big( 1 + \|X_s\|^2 \big) \Big) = k\Big( 1 + E\big( \|X_s\|^2 \big) \Big) \le k\Big( 1 + C\big( 1 + E(\|X_0\|^2) \big)e^{ct} \Big), \qquad (17)$$
where the final inequality follows by (6). As a consequence of these two results, as the right hand side of the above inequality is integrable, by the conditional form of the dominated convergence theorem it follows that
$$\tilde E\Big[ \epsilon\,\phi(X_s)\big( \tilde Z_s^\epsilon \big)^2\big( 1 + \epsilon\tilde Z_s \big)^{-1}\|h(s, X_s)\|^2 \,\Big|\, \mathcal Y \Big] \to 0$$
as $\epsilon \to 0$. By Proposition 18.4 the sigma field $\mathcal Y$ may be replaced by $\mathcal Y_s$, which yields that as $\epsilon \to 0$,
$$\tilde E\Big[ \epsilon\,\phi(X_s)\big( \tilde Z_s^\epsilon \big)^2\big( 1 + \epsilon\tilde Z_s \big)^{-1}\|h(s, X_s)\|^2 \,\Big|\, \mathcal Y_s \Big] \to 0.$$
Now note again that the dominating function (17) is also Lebesgue integrable and bounded; thus a second application of the dominated convergence theorem yields
$$\int_0^t \tilde E\Big[ \epsilon\,\phi(X_s)\big( \tilde Z_s^\epsilon \big)^2\big( 1 + \epsilon\tilde Z_s \big)^{-1}\|h(s, X_s)\|^2 \,\Big|\, \mathcal Y_s \Big]\,ds \to 0, \quad\text{as } \epsilon \to 0.$$
It now remains to check that the stochastic integral converges:
$$\int_0^t \tilde E\Big[ \phi(X_s)\,\tilde Z_s^\epsilon\big( 1 + \epsilon\tilde Z_s \big)^{-1}h^T(s, X_s) \,\Big|\, \mathcal Y_s \Big]\,dY_s \to \int_0^t \rho_s\big( h^T\phi \big)\,dY_s.$$
To this end define
$$I_\epsilon(t) := \int_0^t \tilde E\Big[ \phi(X_s)\,\tilde Z_s^\epsilon\big( 1 + \epsilon\tilde Z_s \big)^{-1}h^T(s, X_s) \,\Big|\, \mathcal Y_s \Big]\,dY_s,$$
and the desired limit
$$I(t) := \int_0^t \rho_s\big( h^T\phi \big)\,dY_s = \int_0^t \tilde E\Big[ \tilde Z_s\,h^T(s, X_s)\,\phi(X_s) \,\Big|\, \mathcal Y_s \Big]\,dY_s.$$
Both $I(t)$ and $I_\epsilon(t)$ are continuous bounded square integrable martingales. Consider the difference
$$I_\epsilon(t) - I(t) = \int_0^t \tilde E\left[ \left( \frac{\tilde Z_s}{\big( 1 + \epsilon\tilde Z_s \big)^2} - \tilde Z_s \right)\phi(X_s)\,h^T(s, X_s) \,\bigg|\, \mathcal Y_s \right] dY_s
= -\int_0^t \tilde E\left[ \frac{\epsilon\tilde Z_s^2\big( 2 + \epsilon\tilde Z_s \big)}{\big( 1 + \epsilon\tilde Z_s \big)^2}\,\phi(X_s)\,h^T(s, X_s) \,\bigg|\, \mathcal Y_s \right] dY_s.$$
Consider the $L^2$ norm of this difference, and use the fact that $Y$ is a $\tilde P$ Brownian motion (so the Itô isometry applies):
$$\tilde E\big( I_\epsilon(t) - I(t) \big)^2 = \tilde E\left[ \int_0^t \left( \tilde E\left[ \frac{\epsilon\tilde Z_s^2\big( 2 + \epsilon\tilde Z_s \big)}{\big( 1 + \epsilon\tilde Z_s \big)^2}\,\phi(X_s)\,h^T(s, X_s) \,\bigg|\, \mathcal Y_s \right] \right)^2 ds \right].$$
By the conditional form of Jensen's inequality, $\big( \tilde E(X \mid \mathcal Y) \big)^2 \le \tilde E\big( X^2 \,\big|\, \mathcal Y \big)$. Thus
$$\left( \tilde E\left[ \frac{\epsilon\tilde Z_s^2\big( 2 + \epsilon\tilde Z_s \big)}{\big( 1 + \epsilon\tilde Z_s \big)^2}\,\phi(X_s)\,h^T(s, X_s) \,\bigg|\, \mathcal Y_s \right] \right)^2 \le \tilde E\left[ \left( \frac{\epsilon\tilde Z_s^2\big( 2 + \epsilon\tilde Z_s \big)}{\big( 1 + \epsilon\tilde Z_s \big)^2}\,\phi(X_s) \right)^2 \|h(s, X_s)\|^2 \,\bigg|\, \mathcal Y_s \right].$$
A dominating function can readily be found for the quantity inside the conditional expectation, viz.
$$\left( \frac{\epsilon\tilde Z_s^2\big( 2 + \epsilon\tilde Z_s \big)}{\big( 1 + \epsilon\tilde Z_s \big)^2}\,\phi(X_s) \right)^2 \|h(s, X_s)\|^2
\le \|\phi\|_\infty^2 \left( \frac{\epsilon\tilde Z_s}{1 + \epsilon\tilde Z_s} \right)^2 \left( \frac{2 + \epsilon\tilde Z_s}{1 + \epsilon\tilde Z_s} \right)^2 \tilde Z_s^2\,\|h(s, X_s)\|^2
\le 4\|\phi\|_\infty^2\,\tilde Z_s^2\,\|h(s, X_s)\|^2
\le 4k\|\phi\|_\infty^2\,\tilde Z_s^2\big( 1 + \|X_s\|^2 \big).$$
Since
$$\tilde E\Big[ \tilde Z_s^2\big( 1 + \|X_s\|^2 \big) \Big] = E\big( \tilde Z_s \big) + E\big( \tilde Z_s\,\|X_s\|^2 \big),$$
which is finite, we may use the dominated convergence theorem: as $\epsilon \to 0$,
$$\int_0^t \left( \tilde E\left[ \frac{\epsilon\tilde Z_s^2\big( 2 + \epsilon\tilde Z_s \big)}{\big( 1 + \epsilon\tilde Z_s \big)^2}\,\phi(X_s)\,h^T(s, X_s) \,\bigg|\, \mathcal Y_s \right] \right)^2 ds \to 0.$$
Now we use the dominated convergence theorem once more (the bound $\|h(s, X_s)\|^2 \le K\big( 1 + \|X_s\|^2 \big)$ supplying the dominating function) to write
$$\tilde E\left[ \int_0^t \left( \tilde E\left[ \phi(X_s)\,\frac{\epsilon\tilde Z_s^2\big( 2 + \epsilon\tilde Z_s \big)}{\big( 1 + \epsilon\tilde Z_s \big)^2}\,h^T(s, X_s) \,\bigg|\, \mathcal Y_s \right] \right)^2 ds \right] \to 0.$$
Thus we have shown that
$$\tilde E\Big[ \big( I_\epsilon(t) - I(t) \big)^2 \Big] \to 0.$$
By a standard theorem of convergence, this means that there is a suitable subsequence $\epsilon_n$ such that
$$I_{\epsilon_n}(t) \to I(t) \quad\text{a.s.},$$
which is sufficient to establish the claim.
18.7. Kushner-Stratonovich Equation
The Zakai equation which has been derived in the previous section is an SDE which is satisfied by the unnormalised conditional distribution of $X_t$, i.e. $\rho_t(\phi)$. It seems natural therefore to derive a similar equation satisfied by the normalised conditional distribution $\pi_t(\phi)$. Recall that the unnormalised conditional distribution is given by
$$\rho_t(\phi) = \tilde E\big[ \phi(X_t)\tilde Z_t \,\big|\, \mathcal Y_t \big],$$
and the normalised conditional law is given by
$$\pi_t(\phi) := E\big[ \phi(X_t) \,\big|\, \mathcal Y_t \big].$$
As a consequence of Proposition 18.5,
$$\pi_t(\phi) = \frac{\rho_t(\phi)}{\rho_t(1)}.$$
Now using this result together with the Zakai equation, we can derive the Kushner-Stratonovich equation.
Theorem (Kushner-Stratonovich) 18.11.
The normalised conditional law of the process $X_t$, subject to the usual conditions on the processes $X_t$ and $Y_t$, satisfies the Kushner-Stratonovich equation
$$\pi_t(\phi) = \pi_0(\phi) + \int_0^t \pi_s(A_s\phi)\,ds + \int_0^t \Big( \pi_s\big( h^T\phi \big) - \pi_s\big( h^T \big)\pi_s(\phi) \Big)\big( dY_s - \pi_s(h)\,ds \big). \qquad (18)$$
The process $Y_t - \int_0^t \pi_s\big( h(s, X_s) \big)\,ds$ is called the innovations process.
19. Gronwall's Inequality
An important and frequently used result in the theory of stochastic differential equations is Gronwall's lemma. There are various versions of Gronwall's theorem (see [Ethier and Kurtz, 1986], [Karatzas and Shreve, 1987]), but the following general form and proof follow that given in [Mandelbaum et al., 1998].
Theorem (Gronwall).
Let $x$, $y$ and $z$ be measurable non-negative functions on the real numbers. If $y$ is bounded and $z$ is integrable on $[0, T]$, and if for all $0 \le t \le T$
$$x(t) \le z(t) + \int_0^t x(s)y(s)\,ds, \qquad (19)$$
then
$$x(t) \le z(t) + \int_0^t z(s)y(s)\exp\left( \int_s^t y(r)\,dr \right) ds.$$
Proof
Multiplying both sides of the inequality (19) by $y(t)\exp\big( -\int_0^t y(s)\,ds \big)$ yields
$$x(t)y(t)\exp\left( -\int_0^t y(s)\,ds \right) - \left( \int_0^t x(s)y(s)\,ds \right) y(t)\exp\left( -\int_0^t y(s)\,ds \right) \le z(t)y(t)\exp\left( -\int_0^t y(s)\,ds \right).$$
The left hand side can be written as the derivative of a product,
$$\frac{d}{dt}\left[ \left( \int_0^t x(s)y(s)\,ds \right) \exp\left( -\int_0^t y(s)\,ds \right) \right] \le z(t)y(t)\exp\left( -\int_0^t y(s)\,ds \right).$$
This can now be integrated to give
$$\left( \int_0^t x(s)y(s)\,ds \right) \exp\left( -\int_0^t y(s)\,ds \right) \le \int_0^t z(s)y(s)\exp\left( -\int_0^s y(r)\,dr \right) ds,$$
or equivalently
$$\int_0^t x(s)y(s)\,ds \le \int_0^t z(s)y(s)\exp\left( \int_s^t y(r)\,dr \right) ds.$$
Combining this with the original inequality (19) gives the desired result.
Corollary 19.1.
If $x$, $y$ and $z$ satisfy the conditions for Gronwall's theorem, then
$$\sup_{0\le t\le T} x(t) \le \left( \sup_{0\le t\le T} z(t) \right) \exp\left( \int_0^T y(s)\,ds \right).$$
Proof
From the conclusion of Gronwall's theorem we see that for all $0 \le t \le T$,
$$x(t) \le \sup_{0\le t\le T} z(t) + \int_0^T z(s)y(s)\exp\left( \int_s^T y(r)\,dr \right) ds,$$
which yields
$$\begin{aligned}
\sup_{0\le t\le T} x(t) &\le \sup_{0\le t\le T} z(t) + \int_0^T z(s)y(s)\exp\left( \int_s^T y(r)\,dr \right) ds \\
&\le \sup_{0\le t\le T} z(t) + \left( \sup_{0\le t\le T} z(t) \right) \int_0^T y(s)\exp\left( \int_s^T y(r)\,dr \right) ds \\
&\le \sup_{0\le t\le T} z(t) + \left( \sup_{0\le t\le T} z(t) \right) \left[ \exp\left( \int_0^T y(s)\,ds \right) - 1 \right] \\
&\le \left( \sup_{0\le t\le T} z(t) \right) \exp\left( \int_0^T y(s)\,ds \right).
\end{aligned}$$
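A quick numerical sanity check of Gronwall's bound (a sketch, not part of the notes; the choices $y = z = 1$ and $x(t) = e^t$, which satisfies hypothesis (19) with equality, are illustrative):

```python
import numpy as np

# Numerical check of Gronwall's bound with y = z = 1 on [0, T] and x(t) = e^t.
# In this case the conclusion reads x(t) <= 1 + int_0^t e^{t-s} ds = e^t.
T, n = 2.0, 2001
t = np.linspace(0.0, T, n)
x = np.exp(t)

# hypothesis (19): x(t) <= 1 + int_0^t x(s) ds   (cumulative trapezoid rule)
dt = T / (n - 1)
integral = np.concatenate(([0.0], np.cumsum((x[1:] + x[:-1]) * dt / 2.0)))
assert np.all(x <= 1.0 + integral + 1e-9)

# conclusion of the theorem: x(t) <= e^t
bound = np.exp(t)
assert np.all(x <= bound + 1e-9)
```

The equality case $x(t) = e^t$ shows the bound of the theorem cannot be improved in general.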
20. Kalman Filter
It is clear that in the general case the SDEs given by the Kushner-Stratonovich equation are infinite dimensional. The most famous example of a special case in which they are finite dimensional is undoubtedly the Kalman-Bucy filter. The relevant equations can be derived without using the general machinery of stochastic filtering theory; indeed the first practical use of the Kalman filter was in the flight guidance computer for the Apollo missions, long before the general nonlinear stochastic filtering theory had been developed. However, by examining this special linear case, it is hoped that the operation of the Kushner-Stratonovich equation may be made clearer.
We consider the system $(X_t, Y_t) \in \mathbb R^N \times \mathbb R^M$, where $X_t$ is the signal process and $Y_t$ is the observation process, described by the following SDEs:
$$dX_t = \big( B_t X_t + b_t \big)\,dt + F_t\,dV_t,$$
$$dY_t = H_t X_t\,dt + dW_t,$$
where $B_t$ and $F_t$ are $N \times N$ matrix valued processes, $V_t$ is an $N$ dimensional Brownian motion, $H_t$ is an $M \times N$ matrix valued process, and $W_t$ is an $M$ dimensional Brownian motion which is independent of $V_t$.
Thus, in terms of the notation used earlier,
$$f(t, X_t) = b_t + B_t X_t, \qquad \sigma(t, X_t) = F_t, \qquad h(t, X_t) = H_t X_t.$$
The infinitesimal generator of the signal process $X_t$ is given by
$$A_t u(x) = \sum_{i=1}^N \big( b_t + B_t x \big)^{(i)} \frac{\partial u}{\partial x_i}(x) + \frac{1}{2}\sum_{i=1}^N \sum_{j=1}^N \big( F_t F_t^T \big)_{ij} \frac{\partial^2 u}{\partial x_i\,\partial x_j}(x),$$
which may readily be verified by applying Itô's formula to $f(X_t)$.
Since the system of equations is linear, it is simple to show that the conditional distribution of $X_t$ conditional upon $\mathcal Y_t$ is a multivariate normal distribution. Thus it may be characterised by its first and second moments alone. It is therefore sufficient to consider $\phi(x) = x_i$ and $\phi(x) = x_i x_j$ as test functions in the Kushner-Stratonovich equation (18). The notation in what follows will become somewhat messy; we write $X_t^{(i)}$ for the $i$th component of the signal process.
Since we shall only be concerned with these two moments, it will simplify the notation to define the conditional mean process
$$\hat X_t^{(i)} := \pi_t(x_i) = E\big[ X_t^{(i)} \,\big|\, \mathcal Y_t \big],$$
and the conditional covariance process
$$R_t^{ij} := E\Big[ \big( X_t^{(i)} - \hat X_t^{(i)} \big)\big( X_t^{(j)} - \hat X_t^{(j)} \big) \,\Big|\, \mathcal Y_t \Big] = \pi_t(x_i x_j) - \pi_t(x_i)\pi_t(x_j).$$
Thus at time $t$ our best estimate of the signal will be a normal distribution $N(\hat X_t, R_t)$.
20.1. Conditional Mean
To derive an equation for the evolution of the conditional mean process, we apply the Kushner-Stratonovich equation to $\phi(x) = x_i$; from Theorem 18.11 this satisfies
$$\hat X_t^{(i)} = \hat X_0^{(i)} + \int_0^t E\Big[ (B_s X_s)^{(i)} + b_s^{(i)} \,\Big|\, \mathcal Y_s \Big]\,ds + \int_0^t \Big( E\big[ X_s^{(i)} (H_s X_s)^T \,\big|\, \mathcal Y_s \big] - \hat X_s^{(i)} \big( H_s \hat X_s \big)^T \Big)\,dN_s,$$
where $N_t$, the innovations process, is
$$N_t = Y_t - \int_0^t H_s \hat X_s\,ds.$$
Whence, writing the equation for the evolution of the conditional mean in vector form, we obtain (since $R^T = R$)
$$d\hat X_t = \big( B_t \hat X_t + b_t \big)\,dt + R_t H_t^T\big( dY_t - H_t \hat X_t\,dt \big). \qquad (20)$$
20.2. Conditional Covariance
We now derive an equation for the evolution of the covariance matrix $R$, but first we shall need a result about the third moment of a multivariate normal distribution. If $X_t \sim N(\hat X_t, R_t)$ then, writing $X_t = \hat X_t + Z$ with $Z \sim N(0, R_t)$,
$$E\Big[ X_t^{(i)} X_t^{(j)} X_t^{(k)} \Big] = E\Big[ \big( \hat X_t^{(i)} + Z^{(i)} \big)\big( \hat X_t^{(j)} + Z^{(j)} \big)\big( \hat X_t^{(k)} + Z^{(k)} \big) \Big] = \hat X_t^{(i)}\hat X_t^{(j)}\hat X_t^{(k)} + \hat X_t^{(i)} R_t^{jk} + \hat X_t^{(j)} R_t^{ik} + \hat X_t^{(k)} R_t^{ij},$$
since the odd moments of $Z$ vanish.
Applying the Kushner-Stratonovich equation to $\phi(x) = x_i x_j$ yields
$$\begin{aligned}
E\Big[ X_t^{(i)} X_t^{(j)} \,\Big|\, \mathcal Y_t \Big] = E\Big[ X_0^{(i)} X_0^{(j)} \,\Big|\, \mathcal Y_0 \Big]
&+ \int_0^t E\Big[ \big( F_s F_s^T \big)_{ij} + (B_s X_s + b_s)^{(i)} X_s^{(j)} + (B_s X_s + b_s)^{(j)} X_s^{(i)} \,\Big|\, \mathcal Y_s \Big]\,ds \\
&+ \int_0^t \Big( E\big[ X_s^{(i)} X_s^{(j)} (H_s X_s)^T \,\big|\, \mathcal Y_s \big] - E\big[ X_s^{(i)} X_s^{(j)} \,\big|\, \mathcal Y_s \big]\,E\big[ (H_s X_s)^T \,\big|\, \mathcal Y_s \big] \Big)\,dN_s. \qquad (21)
\end{aligned}$$
It is clear that
$$dR_t^{ij} = d\Big( E\big[ X_t^{(i)} X_t^{(j)} \,\big|\, \mathcal Y_t \big] \Big) - d\Big( \hat X_t^{(i)} \hat X_t^{(j)} \Big); \qquad (22)$$
the first term is given by (21), and using Itô's form of the product rule to expand out the second term,
$$d\Big( \hat X_t^{(i)} \hat X_t^{(j)} \Big) = \hat X_t^{(i)}\,d\hat X_t^{(j)} + \hat X_t^{(j)}\,d\hat X_t^{(i)} + d\big[ \hat X^{(i)}, \hat X^{(j)} \big]_t.$$
Therefore using (20) we can evaluate this as
$$\begin{aligned}
d\Big( \hat X_t^{(i)} \hat X_t^{(j)} \Big) = {}& \hat X_t^{(i)}\big( B_t \hat X_t + b_t \big)^{(j)}\,dt + \hat X_t^{(i)}\big( R_t H_t^T\,dN_t \big)^{(j)} + \hat X_t^{(j)}\big( B_t \hat X_t + b_t \big)^{(i)}\,dt \\
&+ \hat X_t^{(j)}\big( R_t H_t^T\,dN_t \big)^{(i)} + \big[ \big( R_t H_t^T\,dN_t \big)^{(i)}, \big( R_t H_t^T\,dN_t \big)^{(j)} \big]. \qquad (23)
\end{aligned}$$
For evaluating the quadratic covariation term in this expression it is simplest to work componentwise, using the Einstein summation convention and the fact that the innovation process $N_t$ is readily shown by Lévy's characterisation to be a $P$ Brownian motion:
$$\big[ \big( R_t H_t^T\,dN_t \big)^{(i)}, \big( R_t H_t^T\,dN_t \big)^{(j)} \big] = \big[ R_t^{il} H_t^{kl}\,dN_t^k,\; R_t^{jm} H_t^{nm}\,dN_t^n \big] = R^{il} H^{kl} R^{jm} H^{nm} \delta^{kn}\,dt = R^{il} H^{kl} H^{km} R^{jm}\,dt = \big( R H^T H R^T \big)_{ij}\,dt = \big( R H^T H R \big)_{ij}\,dt, \qquad (24)$$
where the last equality follows since $R^T = R$. Substituting (24) into (23), and then using this and (21) in (22), yields the following equation for the evolution of the $ij$th element of the covariance matrix:
$$\begin{aligned}
dR_t^{ij} = {}& \Big[ \big( F_t F_t^T \big)_{ij} + \big( B_t R_t \big)_{ij} + \big( R_t B_t^T \big)_{ij} - \big( R H^T H R \big)_{ij} \Big]\,dt \\
&+ \Big( \hat X_t^{(i)} R_t^{jm} + \hat X_t^{(j)} R_t^{im} \Big) H_t^{lm}\,dN_t^l - \Big( \hat X_t^{(i)} R_t^{jm} + \hat X_t^{(j)} R_t^{im} \Big) H_t^{lm}\,dN_t^l.
\end{aligned}$$
Thus we obtain the final differential equation for the evolution of the conditional covariance matrix; notice that all of the stochastic terms have cancelled:
$$\frac{dR_t}{dt} = F_t F_t^T + B_t R_t + R_t B_t^T - R_t H_t^T H_t R_t. \qquad (25)$$
The two equations (20) and (25) therefore provide a complete description of the best prediction of the system state $X_t$, conditional on the observations up to time $t$. This is the Kalman-Bucy filter.
It is most frequently encountered in a discrete time version, since in this form it is practical for use in some of the problems described in the introduction to the stochastic filtering section of these notes.
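To illustrate, here is a hedged sketch (not from the notes) of that discrete time version for a scalar system; the parameters `a`, `H`, `q` and `r` are illustrative assumptions, and the gain `K` in the update step plays the role of the term $R_t H_t^T$ in (20):

```python
import numpy as np

# Discrete-time analogue of the Kalman-Bucy filter (20), (25) for a scalar
# linear system  X_{k+1} = a X_k + noise(q),  Y_k = H X_k + noise(r).
rng = np.random.default_rng(0)
a, H, q, r = 0.95, 1.0, 0.1, 0.5   # illustrative parameters
n = 500

x = np.zeros(n)                    # signal trajectory
y = np.zeros(n)                    # observations
for k in range(1, n):
    x[k] = a * x[k - 1] + np.sqrt(q) * rng.standard_normal()
    y[k] = H * x[k] + np.sqrt(r) * rng.standard_normal()

xhat, R = 0.0, 1.0                 # prior conditional mean and variance
est = np.zeros(n)
for k in range(1, n):
    # predict: push mean and variance through the signal dynamics
    xhat, R = a * xhat, a * R * a + q
    # update: gain K is the discrete analogue of R_t H_t^T in (20)
    K = R * H / (H * R * H + r)
    xhat = xhat + K * (y[k] - H * xhat)
    R = (1.0 - K * H) * R
    est[k] = xhat

# the filtered estimate should beat the raw observations y/H
assert np.mean((est - x) ** 2) < np.mean((y / H - x) ** 2)
```

The deterministic recursion for `R` is the discrete counterpart of the Riccati equation (25); note that, as there, it does not depend on the observed data.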
21. Discontinuous Stochastic Calculus
So far we have discussed integrals with respect to continuous semimartingales. It might seem that this includes a great deal of the cases which are of real interest, but a very important class of processes has been neglected.
The standard Poisson process on the line should be familiar. This has trajectories $p(t)$ which are $\mathcal F_t$ adapted and piecewise constant. These have the properties that for any $t \ge 0$, $p(t)$ is a.s. finite, and for any $t, s \ge 0$, $p(t+s) - p(t)$ is independent of $\mathcal F_t$. The process is also stationary: the distribution of $p(t+s) - p(t)$ does not depend upon $t$. If the probability of a jump in an infinitesimal time interval is proportional to the length of the time interval, the constant of proportionality is called the intensity $\lambda$, and it can be shown that the inter-jump times have exponential distribution with parameter $\lambda$ (i.e. mean $1/\lambda$).
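A small simulation (a sketch, not part of the notes) checking these facts numerically: the inter-jump times average $1/\lambda$, and the number of jumps in $[0, t]$ averages $\lambda t$:

```python
import numpy as np

# Simulate Poisson processes of intensity lam by drawing Exponential(lam)
# inter-jump times; lam and t are illustrative choices.
rng = np.random.default_rng(1)
lam, t = 2.0, 5.0

gaps = rng.exponential(1.0 / lam, size=100_000)   # inter-jump times
# mean inter-jump time should be close to 1/lam
assert abs(gaps.mean() - 1.0 / lam) < 0.01

# p(t) = number of jump times <= t, over many independent trajectories
jump_times = np.cumsum(rng.exponential(1.0 / lam, size=(2000, 50)), axis=1)
counts = (jump_times <= t).sum(axis=1)
# E p(t) = lam * t
assert abs(counts.mean() - lam * t) < 0.3
```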
Such processes can also be described by their jump measures. The jump measure $N(\sigma, B)$ is the number of jumps within the time interval $\sigma$ in the (Borel) set $B$. From the above definition, for any $B \in \mathcal B$,
$$E\big( N([0, t+s], B) - N([0, t], B) \,\big|\, \mathcal F_t \big) = \lambda s\,\Pi(B),$$
where $\Pi$ is a probability measure. More formally, we make the following definition.
Definition 21.1. Poisson Process
A Poisson process with intensity measure $h(dt, dy) = \lambda\,dt\,dy$ is a measurable map $N$ from $(\Omega, \mathcal F, P)$ into the space of positive integer valued counting measures on $\big( \mathbb R^+ \times \Gamma, \mathcal B(\mathbb R^+ \times \Gamma) \big)$ with the properties:
(i) For every $t \ge 0$ and every Borel subset $A$ of $[0, t] \times \Gamma$, $N(A)$ is $\mathcal F_{t-}$ measurable.
(ii) For every $t \ge 0$ and every collection of Borel subsets $A_i$ of $[t, \infty) \times \Gamma$, $\{N(A_i)\}$ is independent of $\mathcal F_{t-}$.
(iii) $E(N(A)) = h(A)$ for every Borel subset $A$ of $\mathbb R^+ \times \Gamma$.
(iv) For each $s > 0$ and Borel $B \subset \Gamma$, the distribution of $N([t, t+s] \times B)$ does not depend upon $t$.
This measure can be related back to the path $p(\cdot)$ by the following integral: for any $t \ge 0$,
$$p(t) = \int_0^t \int_\Gamma \gamma\,N(ds, d\gamma).$$
The above integral can be defined without difficulty!
21.1. Compensators
Let $p(t)$ be an $\mathcal F_t$ adapted jump process with bounded jumps, such that $p(0) = 0$ and $E(p(t)) < \infty$ for all $t \ge 0$. Let $\lambda(t)$ be a bounded non-negative $\mathcal F_t$ adapted process with RCLL sample paths. If
$$p(t) - \int_0^t \lambda(s)\,ds$$
is an $\mathcal F_t$ martingale, then $\lambda(\cdot)$ is the intensity of the jump process $p(\cdot)$, and the process $\int_0^t \lambda(s)\,ds$ is the compensator of the process $p(\cdot)$.
Why are compensators interesting? Recall from the theory of stochastic integration with respect to continuous semimartingales that one of the most useful properties was that a stochastic integral with respect to a continuous local martingale is a local martingale. We would like the same sort of property to hold when we integrate with respect to a suitable class of discontinuous processes.
Definition 21.2.
For $A$ a non-decreasing locally integrable process, there is a non-decreasing previsible process, called the compensator of $A$, denoted $A^p$, which is a.s. unique, characterised by any one of the following:
(i) $A - A^p$ is a local martingale.
(ii) $E(A_T^p) = E(A_T)$ for all stopping times $T$.
(iii) $E[(H \cdot A^p)_\infty] = E[(H \cdot A)_\infty]$ for all non-negative previsible processes $H$.
Corollary 21.3.
For $A$ a process of locally finite variation there exists a previsible locally finite variation compensator process $A^p$, unique a.s., such that $A - A^p$ is a local martingale.
Example
Consider a Poisson process $N_t$ with mean measure $m_t$. This process has a compensator of $m_t$. To see this, note that $E(N_t - N_s \mid \mathcal F_s) = m_t - m_s$ for any $s \le t$. So $N_t - m_t$ is a martingale, and $m_t$ is a previsible (since it is deterministic) process of locally finite variation.
Definition 21.4.
Two local martingales $N$ and $M$ are orthogonal if their product $NM$ is a local martingale.
Definition 21.5.
A local martingale $X$ is purely discontinuous if $X_0 = 0$ and it is orthogonal to all continuous local martingales.
The above definition, although important, is somewhat counterintuitive, since for example if we consider a Poisson process $N_t$, which consists solely of jumps, then we can show that $N_t - m_t$ is purely discontinuous, despite the fact that this process does not consist solely of jumps!
Theorem 21.6.
Any local martingale $M$ has a unique (up to indistinguishability) decomposition
$$M = M_0 + M^c + M^d,$$
where $M_0^c = M_0^d = 0$, $M^c$ is a continuous local martingale, and $M^d$ is a purely discontinuous local martingale.
Theorem 21.7.
If $f$ is a real valued $C^2$ function with domain $\mathbb R^n$, and $X$ is an $n$-dimensional semimartingale with decomposition $X = M^c + M^d + Y$, where $M^c + M^d$ is a local martingale and $M^c$ is a continuous local martingale, then the following generalization of Itô's formula holds:
$$\begin{aligned}
f(X_t) = f(X_0) &+ \sum_{i=1}^n \int_0^t \frac{\partial f}{\partial x_i}(X_{s-})\,dX_s^i + \frac{1}{2}\sum_{i,j=1}^n \int_0^t \frac{\partial^2 f}{\partial x_i\,\partial x_j}(X_{s-})\,d\big[ (M^c)^i, (M^c)^j \big]_s \\
&+ \sum_{0 < s \le t} \left( f(X_s) - f(X_{s-}) - \sum_{i=1}^n \frac{\partial f}{\partial x_i}(X_{s-})\,\Delta X_s^i \right).
\end{aligned}$$
The rather ugly notation $(M^c)^i$ refers to the $i$th component of the continuous local martingale $M^c$, and $\Delta X_s = X_s - X_{s-}$.
A rough heuristic explanation for this generalized form of Itô's formula, and an aid to memory, is obtained by noting that the first three terms on the right hand side are identical to those in the multidimensional form of Itô's formula for continuous processes. Also, terms 1, 2 and 4 are exactly what you would expect by writing down an analogue of Itô's formula for discontinuous processes of finite variation. In other words, if the process $X$ has a jump at time $s$, the value of $f(X)$ changes by $f(X_s) - f(X_{s-})$. The jump in $X_s$ will give rise to a discontinuity in
$$\sum_{i=1}^n \int_0^t \frac{\partial f}{\partial x_i}(X_{s-})\,dX_s^i,$$
with the jump having size
$$\sum_{i=1}^n \frac{\partial f}{\partial x_i}(X_{s-})\big( X_s^i - X_{s-}^i \big).$$
To get the correct jump of $f(X_s) - f(X_{s-})$, the corrective term is
$$\sum_{0 < s \le t} \left( f(X_s) - f(X_{s-}) - \sum_{i=1}^n \frac{\partial f}{\partial x_i}(X_{s-})\,\Delta X_s^i \right).$$
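As a short worked check of the formula (not in the original notes), take $n = 1$, $f(x) = x^2$ and $X = N$ a standard Poisson process, so that $M^c = 0$ and the second order term is absent:

```latex
\begin{aligned}
N_t^2 &= 2\int_0^t N_{s-}\,dN_s
        + \sum_{0<s\le t}\Big( N_s^2 - N_{s-}^2 - 2N_{s-}\,\Delta N_s \Big) \\
      &= 2\int_0^t N_{s-}\,dN_s + \sum_{0<s\le t} (\Delta N_s)^2
       = 2\int_0^t N_{s-}\,dN_s + N_t,
\end{aligned}
```

since every jump of $N$ has size one, so $(\Delta N_s)^2 = \Delta N_s$ and the jump sum collapses to $N_t$.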
21.2. RCLL processes revisited
We shall be working extensively in the space of right continuous with left limits (CADLAG) processes, and we shall need a metric on the function space. First let us consider the space $D_T$ of CADLAG functions from $[0, T]$ to $\mathbb R$. The usual approach of using the sup norm on $[0, T]$ isn't going to be very useful. Consider two functions defined on $[0, 10]$ by
$$f(x) = \begin{cases} 0 & \text{for } x < 1, \\ 1 & \text{for } x \ge 1, \end{cases} \qquad\text{and}\qquad g(x) = \begin{cases} 0 & \text{for } x < 1 + \epsilon, \\ 1 & \text{for } x \ge 1 + \epsilon. \end{cases}$$
For arbitrarily small $\epsilon$ these functions are always one unit apart in the sup metric. We attempt to surmount this problem by considering the class $\Lambda_T$ of increasing maps from $[0, T]$ to $[0, T]$. Now define a metric
$$d_T(f, g) := \inf\left\{ \epsilon : \sup_{0\le s\le T} |s - \lambda(s)| \le \epsilon,\; \sup_{0\le s\le T} |f(s) - g(\lambda(s))| \le \epsilon \text{ for some } \lambda \in \Lambda_T \right\}.$$
This metric would appear to have solved the problem described above nicely. Unfortunately it is not complete! This is a major problem, because Prohorov's theorem (among others) requires a complete separable metric space. However the problem can be fixed via a simple trick: for a time transform $\lambda \in \Lambda_T$ define a norm
$$\|\lambda\| := \sup_{0\le s < t\le T} \left| \log\frac{\lambda(t) - \lambda(s)}{t - s} \right|.$$
Thus we replace the condition $|\lambda(s) - s| \le \epsilon$ by $\|\lambda\| \le \epsilon$, and we obtain a new metric on $D_T$ which this time is complete and separable:
$$d_T'(f, g) := \inf\left\{ \epsilon : \|\lambda\| \le \epsilon,\; \sup_{0\le s\le T} |f(s) - g(\lambda(s))| \le \epsilon, \text{ for some } \lambda \in \Lambda_T \right\}.$$
This is defined on the space of CADLAG functions on the compact time interval $[0, T]$, and can be extended via another standard trick to $D_\infty$:
$$d_\infty(f, g) := \int_0^\infty e^{-t}\min\big[ 1, d_t'(f, g) \big]\,dt.$$
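A small numerical illustration of the first metric (a sketch, not part of the notes): for the two step functions above, the sup metric is stuck at 1, while the time change $\lambda$ sending 1 to $1 + \epsilon$ witnesses $d_T(f, g) \le \epsilon$. Here $\epsilon = 0.25$ is used for floating point convenience; in exact arithmetic $\epsilon$ can be arbitrarily small.

```python
import numpy as np

# Step functions jumping at 1 and at 1 + eps, on [0, T] with T = 10.
eps, T = 0.25, 10.0
f = lambda u: np.where(u >= 1.0, 1.0, 0.0)
g = lambda u: np.where(u >= 1.0 + eps, 1.0, 0.0)
s = np.linspace(0.0, T, 100_001)

sup_dist = np.max(np.abs(f(s) - g(s)))        # sup metric: equals 1
# increasing, piecewise linear lam: [0,1] -> [0,1+eps], [1,T] -> [1+eps,T]
lam = np.where(s <= 1.0, s * (1.0 + eps),
               1.0 + eps + (s - 1.0) * (T - 1.0 - eps) / (T - 1.0))
warp_dist = np.max(np.abs(f(s) - g(lam)))     # 0, since f(s) = g(lam(s))
time_dist = np.max(np.abs(lam - s))           # <= eps
assert sup_dist == 1.0
assert warp_dist == 0.0
assert time_dist <= eps + 1e-12
```

So $d_T(f, g) \le \max(\text{warp\_dist}, \text{time\_dist}) \le \epsilon$, exactly as the definition intends.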
22. References
Bensoussan, A. (1982). Stochastic Control of Partially Observable Systems. Cambridge University Press.
Bollobas, B. (1990). Linear Analysis. Cambridge University Press.
Dellacherie, C. and Meyer, P.-A. (1975). Probabilités et Potentiel A. Hermann, Paris.
Dellacherie, C. and Meyer, P.-A. (1980). Probabilités et Potentiel B. Hermann, Paris.
Durrett, R. (1996). Probability: Theory and Examples. Duxbury.
Ethier, S. and Kurtz, T. (1986). Markov Processes: Characterization and Convergence. Wiley.
Karatzas, I. and Shreve, S. E. (1987). Brownian Motion and Stochastic Calculus. Springer.
Mandelbaum, A., Massey, W. A., and Reiman, M. I. (1998). Strong approximations for Markovian service networks. Queueing Systems: Theory and Applications, 30:149-201.
Musiela, M. and Rutkowski, M. (2005). Martingale Methods in Financial Modelling. Springer, 2nd edition.
Pardoux, E. (1989). Filtrage non linéaire et équations aux dérivées partielles stochastiques associées, pages 67-163. Number 1464 in Lecture Notes in Mathematics. Springer-Verlag.
Protter, P. (1990). Stochastic Integration and Differential Equations. Springer.
Rogers, L. C. G. and Williams, D. (1994). Diffusions, Markov Processes and Martingales: Volume One: Foundations. Wiley.
Rogers, L. C. G. and Williams, D. (2000). Diffusions, Markov Processes and Martingales: Volume Two: Itô Calculus. Cambridge University Press, 2nd edition.
1. Introduction
The following notes aim to provide a very informal introduction to Stochastic Calculus, and especially to the Itˆ integral and some of its applications. They owe a great deal to Dan o Crisan’s Stochastic Calculus and Applications lectures of 1998; and also much to various books especially those of L. C. G. Rogers and D. Williams, and Dellacherie and Meyer’s multi volume series ‘Probabilities et Potentiel’. They have also beneﬁted from insights gained by attending lectures given by T. Kurtz. The present notes grew out of a set of typed notes which I produced when revising for the Cambridge, Part III course; combining the printed notes and my own handwritten notes into a consistent text. I’ve subsequently expanded them inserting some extra proofs from a great variety of sources. The notes principally concentrate on the parts of the course which I found hard; thus there is often little or no comment on more standard matters; as a secondary goal they aim to present the results in a form which can be readily extended Due to their evolution, they have taken a very informal style; in some ways I hope this may make them easier to read. The addition of coverage of discontinuous processes was motivated by my interest in the subject, and much insight gained from reading the excellent book of J. Jacod and A. N. Shiryaev. The goal of the notes in their current form is to present a fairly clear approach to the Itˆ integral with respect to continuous semimartingales but without any attempt at o maximal detail. The various alternative approaches to this subject which can be found in books tend to divide into those presenting the integral directed entirely at Brownian Motion, and those who wish to prove results in complete generality for a semimartingale. Here at all points clarity has hopefully been the main goal here, rather than completeness; although secretly the approach aims to be readily extended to the discontinuous theory. 
I make no apology for proofs which spell out every minute detail, since on a ﬁrst look at the subject the purpose of some of the steps in a proof often seems elusive. I’d especially like to convince the reader that the Itˆ integral isn’t that much harder in concept than o the Lebesgue Integral with which we are all familiar. The motivating principle is to try and explain every detail, no matter how trivial it may seem once the subject has been understood! Passages enclosed in boxes are intended to be viewed as digressions from the main text; usually describing an alternative approach, or giving an informal description of what is going on – feel free to skip these sections if you ﬁnd them unhelpful. In revising these notes I have resisted the temptation to alter the original structure of the development of the Itˆ integral (although I have corrected unintentional mistakes), o since I suspect the more concise proofs which I would favour today would not be helpful on a ﬁrst approach to the subject. These notes contain errors with probability one. I always welcome people telling me about the errors because then I can ﬁx them! I can be readily contacted by email as alanb@chiark.greenend.org.uk. Also suggestions for improvements or other additions are welcome. Alan Bain [i]
2. Contents
1. Introduction
2. Contents
3. Stochastic Processes
3.1. Probability Space
3.2. Stochastic Process
4. Martingales
4.1. Stopping Times
5. Basics
5.1. Local Martingales
5.2. Local Martingales which are not Martingales
6. Total Variation and the Stieltjes Integral
6.1. Why we need a Stochastic Integral
6.2. Previsibility
6.3. Lebesgue-Stieltjes Integral
7. The Integral
7.1. Elementary Processes
7.2. Strictly Simple and Simple Processes
8. The Stochastic Integral
8.1. Integral for H in L and M in M2
8.2. Quadratic Variation
8.3. Covariation
8.4. Extension of the Integral to L2(M)
8.5. Localisation
8.6. Some Important Results
9. Semimartingales
10. Relations to Sums
10.1. The UCP topology
10.2. Approximation via Riemann Sums
11. Itô's Formula
11.1. Applications of Itô's Formula
11.2. Exponential Martingales
12. Lévy Characterisation of Brownian Motion
13. Time Change of Brownian Motion
13.1. Gaussian Martingales
14. Girsanov's Theorem
14.1. Change of measure
15. Brownian Martingale Representation Theorem
16. Stochastic Differential Equations
17. Relations to Second Order PDEs
17.1. Infinitesimal Generator
17.2. The Dirichlet Problem
17.3. The Cauchy Problem
17.4. Feynman-Kac Representation
18. Stochastic Filtering
18.1. Signal Process
18.2. Observation Process
18.3. The Filtering Problem
18.4. Change of Measure
18.5. The Unnormalised Conditional Distribution
18.6. The Zakai Equation
18.7. Kushner-Stratonowich Equation
19. Gronwall's Inequality
20. Kalman Filter
20.1. Conditional Mean
20.2. Conditional Covariance
21. Discontinuous Stochastic Calculus
21.1. Compensators
21.2. RCLL processes revisited
22. References
3. Stochastic Processes

The following notes are a summary of important definitions and results from the theory of stochastic processes; proofs may be found in the usual books, for example [Durrett, 1996].

3.1. Probability Space

Let (Ω, F, P) be a probability space. The space (Ω, F, P) is said to be complete if for A ⊂ B ⊂ Ω with B ∈ F and P(B) = 0 it follows that A ∈ F. The set of P-null subsets of Ω is defined by
N := {N ⊂ Ω : N ⊂ A for some A ∈ F with P(A) = 0}.
In addition to the probability space (Ω, F, P), let (E, E) be a measurable space, called the state space, which in many of the cases considered here will be (R, B) or (R^n, B^n). A random variable is an F/E measurable function X : Ω → E.

3.2. Stochastic Process

Given a probability space (Ω, F, P) and a measurable state space (E, E), a stochastic process is a family (X_t)_{t≥0} such that X_t is an E valued random variable for each time t ≥ 0. More formally, it is a map X : (R+ × Ω, B+ ⊗ F) → (R, B), where B+ are the Borel sets of the time space R+.

Definition 1. Measurable Process
The process (X_t)_{t≥0} is said to be measurable if the mapping (R+ × Ω, B+ ⊗ F) → (R, B) : (t, ω) → X_t(ω) is measurable on R+ × Ω with respect to the product σ-field B+ ⊗ F.

Associated with a process is a filtration, an increasing chain of σ-algebras, i.e.
F_s ⊂ F_t if 0 ≤ s ≤ t < ∞.
Define F_∞ by
F_∞ := σ( ∪_{t≥0} F_t ).
If (X_t)_{t≥0} is a stochastic process, then the natural filtration of (X_t)_{t≥0} is given by
F_t^X := σ(X_s : s ≤ t).
The process (X_t)_{t≥0} is said to be (F_t)_{t≥0} adapted if X_t is F_t measurable for each t ≥ 0. The process (X_t)_{t≥0} is obviously adapted with respect to its natural filtration.
Definition 2. Progressively Measurable Process
A process is progressively measurable if for each t its restriction to the time interval [0, t] is measurable with respect to B_{[0,t]} ⊗ F_t, where B_{[0,t]} is the Borel σ-algebra of subsets of [0, t].

Why on earth is this useful? Consider a non-continuous stochastic process X_t. From the definition of a stochastic process, for each t we have that X_t ∈ F_t. Now define Y_t = sup_{s∈[0,t]} X_s. Is Y a stochastic process? The answer is not necessarily -- σ-fields are only guaranteed closed under countable unions, and an event such as
{Y_s > 1} = ∪_{0≤u≤s} {X_u > 1}
is an uncountable union. If X were progressively measurable then this would be sufficient to imply that Y_s is F_s measurable. If X has suitable continuity properties, we can restrict the unions which cause problems to be over some dense subset (say the rationals) and this solves the problem. Hence the next theorem.

Theorem 3. Every adapted right (or left) continuous process is progressively measurable.

Proof
We consider the process X restricted to the time interval [0, s]. On this interval for each n ∈ N we define
X1^n := Σ_{k=0}^{2^n − 1} 1_{(ks/2^n, (k+1)s/2^n]}(t) X_{ks/2^n}(ω),
X2^n := 1_{[0, s/2^n)}(t) X_0(ω) + Σ_{k=1}^{2^n − 1} 1_{[ks/2^n, (k+1)s/2^n)}(t) X_{(k+1)s/2^n}(ω).
Note that X1^n is a left continuous process, so if X is left continuous, working pointwise (that is, fix ω), the sequence X1^n converges to X. But the individual summands in the definition of X1^n are, by the adaptedness of X, clearly B_{[0,s]} ⊗ F_s measurable, hence X1^n is also. The convergence then implies that X is B_{[0,s]} ⊗ F_s measurable as well, hence X is progressively measurable. Consideration of the sequence X2^n yields the same result for right continuous, adapted processes.

The following extra information about filtrations should probably be skipped on a first reading, since it is likely to appear as excess baggage.
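The dyadic sample-and-hold approximation used in the proof is easy to visualise numerically. The following sketch is illustrative only (not from the notes): an arbitrary smooth function stands in for a left continuous path of X, and a finite grid stands in for the pointwise supremum; it checks that the approximation X1^n converges uniformly on [0, s] as n grows.

```python
import math

def X(t):
    # an arbitrary continuous (hence left continuous) path standing in for X_t(omega)
    return math.sin(3.0 * t) + t

def X1n(t, s, n):
    # the approximation from the proof: on (k s/2^n, (k+1) s/2^n] use the
    # left endpoint value X(k s/2^n); at t = 0 keep X(0)
    if t == 0.0:
        return X(0.0)
    k = math.ceil(t * 2 ** n / s) - 1   # index with k s/2^n < t <= (k+1) s/2^n
    return X(k * s / 2 ** n)

def sup_error(s, n, grid=1000):
    # crude stand-in for sup_t |X1^n(t) - X(t)| on (0, s]
    ts = [s * i / grid for i in range(1, grid + 1)]
    return max(abs(X1n(t, s, n) - X(t)) for t in ts)

errors = [sup_error(1.0, n) for n in (2, 4, 6, 8)]  # shrinks as n grows
```

For a path with bounded slope the error is at most (slope bound) × s/2^n, which is exactly the geometric decay the proof exploits.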
Define
F_{t−} := σ( ∪_{0≤s<t} F_s ) for t ∈ (0, ∞),
F_{t+} := ∩_{t<s<∞} F_s for t ∈ [0, ∞),
whence it is clear that for each t, F_{t−} ⊂ F_t ⊂ F_{t+}.

Definition 3.2. The family {F_t} is called right continuous if for all t ∈ [0, ∞), F_t = F_{t+}.

Definition 3.3. A process (X_t)_{t≥0} is said to be bounded if there exists a universal constant K such that for all ω and t ≥ 0, |X_t(ω)| < K.

Definition 3.4. Let X = (X_t)_{t≥0} be a stochastic process defined on (Ω, F, P), and let X' = (X'_t)_{t≥0} be a stochastic process defined on (Ω', F', P'). Then X and X' have the same finite dimensional distributions if for all n, 0 ≤ t_1 < t_2 < · · · < t_n < ∞, and A_1, A_2, ..., A_n ∈ E,
P(X_{t_1} ∈ A_1, X_{t_2} ∈ A_2, ..., X_{t_n} ∈ A_n) = P'(X'_{t_1} ∈ A_1, X'_{t_2} ∈ A_2, ..., X'_{t_n} ∈ A_n).

Definition 3.5. Let X and X' be defined on (Ω, F, P). Then X and X' are modifications of each other if and only if
P({ω ∈ Ω : X_t(ω) = X'_t(ω)}) = 1 for every t ≥ 0.

Definition 3.6. Let X and X' be defined on (Ω, F, P). Then X and X' are indistinguishable if and only if
P({ω ∈ Ω : X_t(ω) = X'_t(ω) for all t ≥ 0}) = 1.

There is a chain of implications: indistinguishable ⇒ modifications ⇒ same finite dimensional distributions.

The following definition provides us with a special name for a process which is indistinguishable from the zero process. It will turn out to be important because many definitions can only be made up to evanescence.

Definition 3.7. A process X is evanescent if P(X_t = 0 for all t) = 1.
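The gap between 'modification' and 'indistinguishable' is usually illustrated with the standard example (not from these notes): X ≡ 0 and Y_t = 1_{t=U} with U uniform on [0, 1]. For each fixed t, P(X_t = Y_t) = 1, so Y is a modification of X; but every path of Y differs from the zero path somewhere, so they are not indistinguishable. A simulation sketch of that example:

```python
import random

random.seed(0)

def sample_path():
    # Y_t = 1 if t == U else 0, with U ~ Uniform(0, 1); X is identically 0
    U = random.random()
    def Y(t):
        return 1.0 if t == U else 0.0
    return U, Y

fixed_times = [0.1, 0.25, 0.5, 0.9]
mismatch_at_fixed_t = 0
paths_differing_somewhere = 0
trials = 1000
for _ in range(trials):
    U, Y = sample_path()
    if any(Y(t) != 0.0 for t in fixed_times):
        mismatch_at_fixed_t += 1        # essentially never: P(U = t) = 0
    if Y(U) != 0.0:
        paths_differing_somewhere += 1  # always: the path differs at t = U
```

The counters make the point: mismatches at any fixed time never occur, yet every single path disagrees with the zero path at its own random time.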
4. Martingales

Definition 4.1. Let X = {X_t, F_t, t ≥ 0} be an integrable process; then X is a
(i) Martingale if and only if E(X_t | F_s) = X_s a.s. for 0 ≤ s ≤ t < ∞;
(ii) Supermartingale if and only if E(X_t | F_s) ≤ X_s a.s. for 0 ≤ s ≤ t < ∞;
(iii) Submartingale if and only if E(X_t | F_s) ≥ X_s a.s. for 0 ≤ s ≤ t < ∞.

Theorem (Kolmogorov) 4.2. Let X = {X_t, F_t, t ≥ 0} be an integrable process. Define F_{t+} := ∩_{ε>0} F_{t+ε} and also the partial augmentation of F by F~_t = σ(F_{t+}, N). Then if t → E(X_t) is continuous there exists an F~_t adapted stochastic process X~ = {X~_t, F~_t, t ≥ 0} with sample paths which are right continuous with left limits (CADLAG) such that X and X~ are modifications of each other.

Definition 4.3. A process X = {X_t, F_t, t ≥ 0} is said to be L^p bounded if and only if sup_{t≥0} E(|X_t|^p) < ∞. The space of L^2 bounded martingales is denoted by M2, and the subspace of continuous L^2 bounded martingales is denoted M2_c.

Definition 4.4. A process X = {X_t, F_t, t ≥ 0} is said to be uniformly integrable if and only if
sup_{t≥0} E( |X_t| 1_{|X_t|≥N} ) → 0 as N → ∞.

Definition 4.5. A martingale X = {X_t, F_t, t ≥ 0} is said to be an L^2 martingale or a square integrable martingale if E(X_t^2) < ∞ for every t ≥ 0.

Orthogonality of Martingale Increments
A frequently used property of a square integrable martingale M is the orthogonality of increments property, which states that for a square integrable martingale M and Y ∈ F_s with E(Y^2) < ∞,
E[Y(M_t − M_s)] = 0 for t ≥ s.
Proof
Via the Cauchy-Schwarz inequality, E|Y(M_t − M_s)| < ∞, and so
E(Y(M_t − M_s)) = E( E(Y(M_t − M_s) | F_s) ) = E( Y E(M_t − M_s | F_s) ) = 0.
A typical example is Y = M_s, whence E(M_s(M_t − M_s)) = 0 is obtained. A common application is to the difference of two squares; let t ≥ s, then
E((M_t − M_s)^2 | F_s) = E(M_t^2 | F_s) − 2 M_s E(M_t | F_s) + M_s^2 = E(M_t^2 − M_s^2 | F_s) = E(M_t^2 | F_s) − M_s^2.
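The orthogonality of increments property is easy to check by simulation for the simplest square integrable martingale, a ±1 random walk. The sketch below (illustrative only, not from the notes) estimates E[M_s(M_t − M_s)], which should vanish, and E[(M_t − M_s)^2] = E[M_t^2] − E[M_s^2], which here equals t − s.

```python
import random

random.seed(42)

# simple random walk M_k = xi_1 + ... + xi_k with xi_i = +/-1:
# a square integrable martingale in discrete time
s, t, trials = 10, 25, 50_000
acc_cross = 0.0
acc_sq = 0.0
for _ in range(trials):
    Ms = sum(random.choice((-1, 1)) for _ in range(s))        # M_s
    inc = sum(random.choice((-1, 1)) for _ in range(t - s))   # M_t - M_s
    acc_cross += Ms * inc
    acc_sq += inc * inc
mean_cross = acc_cross / trials   # estimate of E[M_s (M_t - M_s)], near 0
mean_sq = acc_sq / trials         # estimate of E[(M_t - M_s)^2], near t - s
```

Note the independence of the increment from M_s is what makes the cross term vanish, exactly as in the conditional expectation computation above.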
4.1. Stopping Times

A random variable T : Ω → [0, ∞) is a stopping (optional) time if and only if {ω : T(ω) ≤ t} ∈ F_t for all t.

The following theorem is included as a demonstration of checking for stopping times, and may be skipped if desired.

Theorem 4.6. T is a stopping time with respect to F_{t+} if and only if for all t ∈ [0, ∞) the event {T < t} is F_t measurable.

Proof
If T is an F_{t+} stopping time then for all t ∈ (0, ∞) the event {T ≤ t} is F_{t+} measurable. Thus for 1/n < t we have
{T ≤ t − 1/n} ∈ F_{(t−1/n)+} ⊂ F_t,
so
{T < t} = ∪_{n=1}^∞ {T ≤ t − 1/n} ∈ F_t.
To prove the converse, note that if for each t ∈ [0, ∞) the event {T < t} is F_t measurable, then for each such t
{T < t + 1/n} ∈ F_{t+1/n},
as a consequence of which
{T ≤ t} = ∩_{n=1}^∞ {T < t + 1/n} ∈ ∩_{n=1}^∞ F_{t+1/n} = F_{t+}.

Given a stochastic process X = (X_t)_{t≥0}, a stopped process X^T may be defined by
X^T_t(ω) := X_{T(ω)∧t}(ω),
and the stopping time σ-field by
F_T := {A ∈ F : A ∩ {T ≤ t} ∈ F_t for all t}.

Theorem (Optional Stopping). Let X be a right continuous, integrable, F_t adapted process. Then the following are equivalent:
(i) X is a martingale;
(ii) X^T is a martingale for all stopping times T;
(iii) E(X_T) = E(X_0) for all bounded stopping times T;
(iv) E(X_T | F_S) = X_S for all bounded stopping times S and T such that S ≤ T.
If in addition X is uniformly integrable then (iv) holds for all stopping times (not necessarily bounded).
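Part (iii) of the optional stopping theorem can be sanity checked numerically. The sketch below is illustrative only: it uses a simple random walk and the (bounded, and hypothetical as an example) stopping time T = min(first hit of +3, 20), for which E(M_T) should equal E(M_0) = 0.

```python
import random

random.seed(1)

trials, hit_level, horizon = 100_000, 3, 20
total = 0
for _ in range(trials):
    m = 0
    for _ in range(horizon):
        m += random.choice((-1, 1))
        if m == hit_level:
            break          # stop the walk at the bounded stopping time
    total += m             # this is M_T with T = min(hit time, horizon)
mean_MT = total / trials   # should be close to E[M_0] = 0
```

Dropping the `horizon` cap (i.e. running until the walk hits +3) is exactly the unbounded-stopping-time trap discussed next: the walk is stopped at value +3 whenever it stops, so the mean at the hitting time is not 0.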
The condition which is most often forgotten is the condition in (iii) that the stopping time T be bounded. To see why it is necessary, consider B_t a Brownian Motion starting from zero, and let T = inf{t ≥ 0 : B_t = 1}, clearly a stopping time; B_t is a martingale with respect to the filtration generated by B itself. But it is also clear that E(B_T) = 1 ≠ E(B_0) = 0: obviously in this case the boundedness condition on T fails.

Definition 4.7. Let M2 denote the set of L^2 bounded CADLAG martingales, i.e. martingales M such that sup_{t≥0} E(M_t^2) < ∞. Let M2_c denote the set of L^2 bounded CADLAG martingales which are continuous. A norm may be defined on the space M2 by ‖M‖^2 = ‖M_∞‖_2^2 = E(M_∞^2). From the conditional Jensen's inequality, since f(x) = x^2 is convex,
E(M_∞^2 | F_t) ≥ (E(M_∞ | F_t))^2,
so
E(M_∞^2 | F_t) ≥ M_t^2.
Hence, taking expectations, E(M_t^2) ≤ E(M_∞^2); and since by martingale convergence in L^2 we have E(M_t^2) → E(M_∞^2), it is clear that
E(M_∞^2) = sup_{t≥0} E(M_t^2).

Theorem (Doob's Martingale Inequalities). Let M = {M_t, F_t, t ≥ 0} be a martingale, and let M* := sup_{t≥0} |M_t|. Then
(i) Maximal inequality: for λ > 0,
λ P(M* ≥ λ) ≤ E[ |M_∞| 1_{M* < ∞} ].
(ii) L^p maximal inequality: for 1 < p < ∞,
‖M*‖_p ≤ (p/(p−1)) ‖M_∞‖_p.
Note that the norm used in stating the Doob L^p inequality is defined by ‖M‖_p = [E(|M|^p)]^{1/p}.

Theorem (Martingale Convergence). Let M = {M_t, F_t, t ≥ 0} be a martingale.
(i) If M is L^p bounded then M_∞(ω) := lim_{t→∞} M_t(ω) exists P-a.s.
(ii) If p = 1 and M is uniformly integrable then lim_{t→∞} M_t(ω) = M_∞(ω) in L^1. Conversely, for every A ∈ L^1(F_∞) there exists a uniformly integrable martingale A_t = E(A | F_t) such that lim_{t→∞} A_t = A; here F_∞ := lim_{t→∞} F_t.
(iii) If p > 1, i.e. M is L^p bounded, then lim_{t→∞} M_t = M_∞ in L^p.
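Doob's L^p maximal inequality with p = 2 can likewise be checked empirically. For a ±1 random walk over n steps it reads ‖M*‖_2 ≤ 2‖M_n‖_2, with ‖M_n‖_2 = √n; the sketch below (illustrative only) estimates both sides.

```python
import math
import random

random.seed(7)

n, trials = 100, 20_000
acc_max2 = 0.0
acc_end2 = 0.0
for _ in range(trials):
    m = 0
    mx = 0
    for _ in range(n):
        m += random.choice((-1, 1))
        mx = max(mx, abs(m))   # running maximum of |M_k|
    acc_max2 += mx * mx
    acc_end2 += m * m
lhs = math.sqrt(acc_max2 / trials)        # estimate of ||M*||_2
rhs = 2.0 * math.sqrt(acc_end2 / trials)  # (p/(p-1)) ||M_n||_2 with p = 2
```

The factor p/(p−1) = 2 turns out to be rather generous here; the simulated left hand side sits well below the bound.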
Theorem 4.8. The space (M2, ‖·‖), up to equivalence classes defined by modifications, is a Hilbert space, and M2_c is a closed subspace.

Proof
We prove this by showing a one to one correspondence between M2 (the space of square integrable martingales) and L^2(F_∞). The bijection is obtained via
f : M2 → L^2(F_∞), f : (M_t)_{t≥0} → M_∞ ≡ lim_{t→∞} M_t,
g : L^2(F_∞) → M2, g : M_∞ → M_t ≡ E(M_∞ | F_t).
Notice that
sup_t E(M_t^2) = ‖M_∞‖_2^2 = E(M_∞^2) < ∞,
as M_t is a square integrable martingale. As L^2(F_∞) is a Hilbert space, M2 inherits this structure.

To see that M2_c is a closed subspace of M2, consider a Cauchy sequence {M^(n)} in M2; equivalently {M^(n)_∞} is Cauchy in L^2(F_∞). Hence M^(n)_∞ converges to a limit, M_∞ say, in L^2(F_∞). Let M_t := E(M_∞ | F_t); then
sup_{t≥0} |M^(n)_t − M_t| → 0 in L^2,
that is, M^(n) → M uniformly in L^2. Hence there exists a subsequence n(k) such that M^{n(k)} → M uniformly and, as a uniform limit of continuous functions is continuous, M ∈ M2_c. Thus M2_c is a closed subspace of M2.
5. Basics
5.1. Local Martingales
A martingale has already been defined, but a weaker definition will prove useful for stochastic calculus. Note that I'll often drop references to the filtration F_t, but this nevertheless forms an essential part of the (local) martingale. Just before we dive in and define a local martingale, maybe we should pause and consider the reason for considering them. The important property of local martingales will only be seen later in the notes; as we frequently see in this subject, it is one of stability -- that is, they are a class of objects which are closed under an operation, in this case under the stochastic integral: an integral of a previsible process with a local martingale integrator is a local martingale.

Definition 5.1. M = {M_t, F_t, 0 ≤ t ≤ ∞} is a local martingale if and only if there exists a sequence of stopping times T_n tending to infinity such that M^{T_n} is a martingale for each n. The space of local martingales is denoted M_loc, and the subspace of continuous local martingales is denoted M_{c,loc}.

Recall that a martingale (X_t)_{t≥0} is said to be bounded if there exists a universal constant K such that for all ω and t ≥ 0, |X_t(ω)| < K.

Theorem 5.2. Every bounded local martingale is a martingale.

Proof
Let T_n be a sequence of stopping times as in the definition of a local martingale. This sequence tends to infinity, so pointwise X^{T_n}_t(ω) → X_t(ω). Using the conditional form of the dominated convergence theorem (with the constant bound as the dominating function), for t ≥ s ≥ 0,
lim_{n→∞} E(X^{T_n}_t | F_s) = E(X_t | F_s).
But as X^{T_n} is a (genuine) martingale, E(X^{T_n}_t | F_s) = X^{T_n}_s = X_{T_n ∧ s}; so
E(X_t | F_s) = lim_{n→∞} E(X^{T_n}_t | F_s) = lim_{n→∞} X^{T_n}_s = X_s.
Hence X_t is a genuine martingale.

Proposition 5.3. The following are equivalent:
(i) M = {M_t, F_t, 0 ≤ t ≤ ∞} is a continuous martingale;
(ii) M = {M_t, F_t, 0 ≤ t ≤ ∞} is a continuous local martingale and, for all t ≥ 0, the set {M_T : T a stopping time, T ≤ t} is uniformly integrable.

Proof
(i) ⇒ (ii): By the optional stopping theorem, if T ≤ t then M_T = E(M_t | F_T), hence the set is uniformly integrable.
(ii) ⇒ (i): It is required to prove that E(M_0) = E(M_T) for any bounded stopping time T. By the local martingale property, for any n, E(M_0) = E(M_{T ∧ T_n}); uniform integrability then implies that
lim_{n→∞} E(M_{T ∧ T_n}) = E(M_T).
There do exist local martingales which are not themselves martingales. The following is an example. Let B_t be a d dimensional Brownian Motion starting from x ≠ 0. It can be shown using Itô's formula that a harmonic function of a Brownian motion is a local martingale (this is on the example sheet). From standard PDE theory it is known that for d ≥ 3 the function
f(x) = 1/|x|^{d−2}
is harmonic, hence X_t = 1/|B_t|^{d−2} is a local martingale. Now consider the L^p norm of this local martingale:
E_x |X_t|^p = ∫ (2πt)^{−d/2} exp(−|y − x|^2/2t) |y|^{−(d−2)p} dy.
Consider when this integral converges. There are no divergence problems for |y| large; the potential problem lies in the vicinity of the origin. Here the term
(2πt)^{−d/2} exp(−|y − x|^2/2t)
is bounded, so we only need to consider the remainder of the integrand integrated over a ball of unit radius about the origin, which is bounded by
C ∫_{B(0,1)} |y|^{−(d−2)p} dy
for some constant C, and which on transformation into polar coordinates yields a bound of the form
C' ∫_0^1 r^{−(d−2)p} r^{d−1} dr,
with C' another constant. This is finite if and only if −(d−2)p + (d−1) > −1 (standard integrals of the form 1/r^k), which in turn requires that p < d/(d−2). So clearly E_x |X_t| will be finite for all d ≥ 3.

Now although E_x |X_t| < ∞ and X_t is a local martingale, we shall show that it is not a martingale. Note that (B_t − x) has the same distribution as √t (B_1 − x) under P_x (the probability measure induced by the Brownian motion starting from x). So as t → ∞, |B_t| → ∞ in probability and X_t → 0 in probability. As X_t ≥ 0, we see that E_x(X_t) = E_x |X_t| < ∞. Now note that for any R < ∞ we can construct a bound
E_x X_t ≤ (2πt)^{−d/2} ∫_{|y|≤R} |y|^{−(d−2)} dy + R^{−(d−2)},
where the integral converges, and hence
lim sup_{t→∞} E_x X_t ≤ R^{−(d−2)}.
As R was chosen arbitrarily, we see that E_x X_t → 0. But E_x X_0 = |x|^{−(d−2)} > 0, which implies that E_x X_t is not constant, and hence X_t is not a martingale.
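The decay of E_x X_t can be observed by simulation. The sketch below (illustrative only) estimates the mean of the inverse Bessel process X_t = 1/|B_t| for a 3-dimensional Brownian motion started from a point with |x| = 1. As a check, the classical closed form for this mean should be 2Φ(1/√t) − 1, i.e. approximately 0.683 at t = 1 and 0.383 at t = 4 -- well below the constant value 1 a true martingale would maintain.

```python
import random

random.seed(3)

def mean_inverse_norm(t, trials=40_000):
    # Monte Carlo estimate of E_x[1/|B_t|] for 3-d BM started at x = (1,0,0)
    s = t ** 0.5
    acc = 0.0
    for _ in range(trials):
        bx = 1.0 + s * random.gauss(0.0, 1.0)
        by = s * random.gauss(0.0, 1.0)
        bz = s * random.gauss(0.0, 1.0)
        acc += 1.0 / (bx * bx + by * by + bz * bz) ** 0.5
    return acc / trials

m1 = mean_inverse_norm(1.0)   # already below E[X_0] = 1
m4 = mean_inverse_norm(4.0)   # smaller still: the mean keeps decaying
```

The mean visibly decays towards 0 as t grows, confirming that X_t cannot be a martingale despite being a non-negative local martingale with finite mean.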
6. Total Variation and the Stieltjes Integral
Let A : [0, ∞) → R be a CADLAG (continuous to the right, with left limits) process. Let a partition Π = {t_0, t_1, ..., t_m} have 0 = t_0 ≤ t_1 ≤ · · · ≤ t_m = t; the mesh of the partition is defined by
δ(Π) = max_{1≤k≤m} |t_k − t_{k−1}|.
The variation of A is then defined as the increasing process V given by
V_t := sup_Π { Σ_{k=1}^{n(Π)} |A_{t_k ∧ t} − A_{t_{k−1} ∧ t}| : 0 = t_0 ≤ t_1 ≤ · · · ≤ t_n = t }.
An alternative definition is given by
V^0_t := lim_{n→∞} Σ_k |A_{k2^{−n} ∧ t} − A_{(k−1)2^{−n} ∧ t}|.
These can be shown to be equivalent for CADLAG processes: trivially (use the dyadic partition) V^0_t ≤ V_t, and it is also possible to show the reverse inequality V^0_t ≥ V_t.

Definition 6.1. A process A is said to have finite variation if the associated variation process V is finite (i.e. if for every t and every ω, V_t(ω) < ∞).

6.1. Why we need a Stochastic Integral

Before delving into the depths of the integral, it's worth stepping back for a moment to see why the 'ordinary' integral cannot be used on a path at a time basis (i.e. separately for each ω ∈ Ω). Suppose we were to do this, i.e. set
I_t(X) = ∫_0^t X_s(ω) dM_s(ω),
for M ∈ M2_c. We shall prove shortly that a continuous local martingale of finite variation is zero; thus for an interesting martingale (i.e. one which isn't zero a.s.) the total variation is not finite, even on a bounded interval like [0, T], so the Lebesgue-Stieltjes integral definition isn't valid in this case.

To generalise, we shall see that the quadratic variation is actually the 'right' variation to use: higher variations turn out to be zero and lower ones infinite, which is easy to prove by considering the variation expressed as the limit of a sum and bounding it by a maximum multiplied by the quadratic variation, the first term of which tends to zero by continuity. But to start, we shall consider integrating a previsible process H_t with an integrator which is an increasing finite variation process.
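The contrast between first and quadratic variation is easy to see on a simulated Brownian path: along refining dyadic partitions the sum of absolute increments grows like 2^{n/2}, while the sum of squared increments settles near t. A sketch (illustrative only):

```python
import random

random.seed(5)

N = 2 ** 12                      # finest resolution: 4096 steps on [0, 1]
dt = 1.0 / N
path = [0.0]
for _ in range(N):
    path.append(path[-1] + random.gauss(0.0, dt ** 0.5))

def variations(level):
    # dyadic partition of [0, 1] with 2^level intervals
    step = N // 2 ** level
    diffs = [path[k + step] - path[k] for k in range(0, N, step)]
    tv = sum(abs(d) for d in diffs)   # first (total) variation estimate
    qv = sum(d * d for d in diffs)    # quadratic variation estimate
    return tv, qv

tv4, qv4 = variations(4)
tv12, qv12 = variations(12)   # tv blows up with refinement; qv stays near 1
```

Refining the partition multiplies the first-variation estimate many times over (it diverges in the limit), while the quadratic variation estimate stabilises near t = 1 -- exactly the dichotomy the text describes.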
Proposition 6.2. If M is a continuous local martingale of finite variation starting from zero, then M is identically zero.

Proof
Let V be the variation process of M. This V is a continuous, adapted process. Now define a sequence of stopping times S_n as the first time V exceeds n, i.e. S_n := inf{t ≥ 0 : V_t ≥ n}. Then the martingale M^{S_n} is of bounded variation. It therefore suffices to prove the result for a bounded, continuous martingale M of bounded variation. Fix t ≥ 0 and let {0 = t_0, t_1, ..., t_N = t} be a partition of [0, t]. Then, since M_0 = 0, it is clear that
M_t^2 = Σ_{k=1}^N ( M_{t_k}^2 − M_{t_{k−1}}^2 ).
Then via the orthogonality of martingale increments
E(M_t^2) = E Σ_{k=1}^N ( M_{t_k} − M_{t_{k−1}} )^2 ≤ E( V_t sup_k |M_{t_k} − M_{t_{k−1}}| ).
The integrand is bounded by n^2 (from the definition of the stopping time S_n), hence the expectation converges to zero as the modulus of the partition tends to zero, by the bounded convergence theorem. Hence M ≡ 0.

6.2. Previsibility

The term previsible has crept into the discussion earlier. Now is the time for a proper definition.

Definition 6.3. The previsible (or predictable) σ-field P is the σ-field on R+ × Ω generated by the processes (X_t)_{t≥0}, adapted to F_t, with left continuous paths on (0, ∞).

Remark. The same σ-field is generated by left continuous, right limits processes (i.e. càglàd processes) which are adapted to F_{t−}, or indeed by continuous processes (X_t)_{t≥0} which are adapted to F_{t−}. It is generated by sets of the form A × (s, t] where A ∈ F_s. It should be noted that càdlàg processes generate the optional σ-field, which is usually different.

Theorem 6.4. The previsible σ-field is also generated by the collection of random sets A × {0} where A ∈ F_0, and A × (s, t] where A ∈ F_s.

Proof
Let the σ-field generated by the above collection of sets be denoted P'. We shall show P' = P. Let X be a left continuous process adapted to F_t, and define for n ∈ N
X^n = X_0 1_{{0}}(t) + Σ_k X_{k/2^n} 1_{(k/2^n, (k+1)/2^n]}(t).
It is clear that X^n ∈ P'. As X is left continuous, the above sequence of left continuous processes converges pointwise to X, so X is P' measurable; thus P ⊂ P'. Conversely, consider the indicator function of A × (s, t]; this can be written as 1_{[0,t_A] \ [0,s_A]}, where s_A(ω) = s for ω ∈ A and +∞ otherwise (and similarly t_A). These indicator functions are adapted and left continuous, hence P' ⊂ P.
6.3. Lebesgue-Stieltjes Integral

[In the lecture notes for this course, the Lebesgue-Stieltjes integral is considered first for functions A and H; here I consider processes on a pathwise basis.] There are no really new concepts of the integral in the following; it is basically the Lebesgue-Stieltjes integral extended from functions H(t) to processes in a pathwise fashion (that's why ω has been included in those definitions as a reminder).

Definition 6.5. A process (X_t)_{t≥0} is said to be previsible if the mapping (t, ω) → X_t(ω) is measurable with respect to the previsible σ-field P.

Let A be an increasing cadlag process. This induces a Borel measure dA on (0, ∞) such that
dA((s, t])(ω) = A_t(ω) − A_s(ω).
Let H be a previsible process (as defined above). The Lebesgue-Stieltjes integral of H is defined with respect to an increasing process A by
(H · A)_t(ω) = ∫_0^t H_s(ω) dA_s(ω),
whenever H ≥ 0 or (H · A)_t < ∞. This definition may be extended to integrators of finite variation which are not increasing, by decomposing the process A of finite variation into a difference of two increasing processes, A = A^+ − A^−, where A^± = (V ± A)/2 (here V is the total variation process for A). The integral of H with respect to the finite variation process A is then defined by
(H · A)_t(ω) := (H · A^+)_t(ω) − (H · A^−)_t(ω),
whenever (H · V)_t < ∞. As a notational aside, we shall write
(H · A)_t ≡ ∫_0^t H dA,
and later on we shall use d(H · X) ≡ H dX.

Theorem 6.6. If X is a non-negative continuous local martingale and E(X_0) < ∞, then X_t is a supermartingale. If additionally X has constant mean, i.e. E(X_t) = E(X_0) for all t, then X_t is a martingale.
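A quick pathwise sketch of the Lebesgue-Stieltjes integral (illustrative only): the integrator here is a deterministic increasing cadlag function A_t = ⌊3t⌋ + t, which has jumps of size 1 at t = 1/3, 2/3 and 1, and the integrand is H_t = t. The integral picks up the continuous part plus H evaluated at the jump times: (H · A)_1 = ∫_0^1 t dt + 1/3 + 2/3 + 1 = 5/2.

```python
import math

def A(t):
    # increasing cadlag integrator: jumps of size 1 at t = 1/3, 2/3, 1
    return math.floor(3.0 * t) + t

def H(t):
    return t

def stieltjes(n):
    # left endpoint approximation of (H . A)_1 over the grid {k/n}:
    # sum of H(t_{k-1}) * (A(t_k) - A(t_{k-1}))
    return sum(H((k - 1) / n) * (A(k / n) - A((k - 1) / n))
               for k in range(1, n + 1))

approx = stieltjes(30_000)
exact = 0.5 + (1 / 3 + 2 / 3 + 1)   # dt part plus the three jump contributions
```

Using the left endpoint of each interval mirrors the previsible convention: the jump of dA over (t_{k−1}, t_k] is weighted by the value of H just before it.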
Proof
As X_t is a continuous local martingale there is a sequence of stopping times T_n ↑ ∞ such that X^{T_n} is a genuine martingale. From this martingale property
E(X^{T_n}_t | F_s) = X^{T_n}_s.
As X_t ≥ 0 we can apply the conditional form of Fatou's lemma, so
E(X_t | F_s) = E(lim inf X^{T_n}_t | F_s) ≤ lim inf E(X^{T_n}_t | F_s) = lim inf X^{T_n}_s = X_s.
Hence E(X_t | F_s) ≤ X_s, so X_t is a supermartingale.

Given the constant mean property E(X_t) = E(X_s), let
A_n := {ω : X_s − E(X_t | F_s) > 1/n}, so A := ∪_{n=1}^∞ A_n = {ω : X_s − E(X_t | F_s) > 0}.
Suppose for some n, P(A_n) > 0; then note that
X_s − E(X_t | F_s) > 1/n for ω ∈ A_n,
X_s − E(X_t | F_s) ≥ 0 for ω ∈ Ω \ A_n.
Hence
X_s − E(X_t | F_s) ≥ (1/n) 1_{A_n},
and taking expectations yields
E(X_s) − E(X_t) ≥ (1/n) P(A_n) > 0,
but by the constant mean property the left hand side is zero, hence a contradiction. Thus all the P(A_n) are zero, and considering P(A) = P(∪_{n=1}^∞ A_n) ≤ Σ_{n=1}^∞ P(A_n) we conclude that P(A) = 0, so X_s = E(X_t | F_s) a.s.
7. The Integral

We would like eventually to extend the definition of the integral to integrands which are previsible processes and integrators which are semimartingales (to be defined later in these notes). In fact in these notes we'll only get as far as continuous semimartingales; but it is possible to go the whole way and define the integral of a previsible process with respect to a general semimartingale, although some extra problems are thrown up on the way, in particular as regards the construction of the quadratic variation process of a discontinuous process.

Various special classes of process will be needed in the sequel and these are all defined here for convenience. Naturally, with terms like 'elementary' and 'simple' occurring, many books have different names for the same concepts -- so beware!

7.1. Elementary Processes

An elementary process H_t(ω) is one of the form
H_t(ω) = Z(ω) 1_{(S(ω), T(ω)]}(t),
where S, T are stopping times with S ≤ T ≤ ∞, and Z is a bounded F_S measurable random variable. Such a process is the simplest non-trivial example of a previsible process. Let's prove that it is previsible: H is clearly a left continuous process, so we need only show that it is adapted. It can be considered as the pointwise limit of a sequence of right continuous processes
H^n(t) = Z 1_{[S_n, T_n)}(t), where S_n = S + 1/n and T_n = T + 1/n.
So it is sufficient to show that Z 1_{[U,V)} is adapted when U and V are stopping times which satisfy U ≤ V, and Z is a bounded F_U measurable function. Let B be a Borel set of R; then the event
{Z 1_{[U,V)}(t) ∈ B} = [ {Z ∈ B} ∩ {U ≤ t} ] ∩ {V > t}.
By the definition of U as a stopping time and hence the definition of F_U, the event enclosed by square brackets is in F_t, and since V is a stopping time, {V > t} = Ω \ {V ≤ t} is also in F_t. Hence Z 1_{[U,V)} is adapted.

7.2. Strictly Simple and Simple Processes

A process H is strictly simple (H ∈ L*) if there exist 0 ≤ t_0 ≤ · · · ≤ t_n < ∞ and uniformly bounded F_{t_k} measurable random variables Z_k such that
H = H_0(ω) 1_{{0}}(t) + Σ_{k=0}^{n−1} Z_k(ω) 1_{(t_k, t_{k+1}]}(t).
This can be extended to H a simple process (H ∈ L) if there exists a sequence of stopping times 0 ≤ T_0 ≤ · · · ≤ T_k → ∞, and Z_k uniformly bounded F_{T_k} measurable random variables, such that
H = H_0(ω) 1_{{0}}(t) + Σ_{k=0}^∞ Z_k 1_{(T_k, T_{k+1}]}(t).
Similarly a simple process is also a previsible process. The fundamental result will follow from the fact that the σ-algebra generated by the simple processes is exactly the previsible σ-algebra. We shall see the application of this after the next section.
8. The Stochastic Integral

As has been hinted at earlier, the stochastic integral must be built up in stages; to start with we shall consider integrators which are L^2 bounded martingales and integrands which are simple processes.

8.1. Integral for H in L and M in M2

For a simple process H ∈ L and M an L^2 bounded martingale, the integral may be defined by the 'martingale transform' (c.f. discrete martingale theory)
∫_0^t H_s dM_s = (H · M)_t := Σ_{k=0}^∞ Z_k ( M_{T_{k+1} ∧ t} − M_{T_k ∧ t} ).

Proposition 8.1. Let H be a simple process, M an L^2 bounded martingale, and T a stopping time. Then
(i) (H · M)^T = (H 1_{(0,T]} · M) = (H · M^T);
(ii) (H · M) ∈ M2;
(iii) E[(H · M)^2_∞] = Σ_{k=0}^∞ E[ Z_k^2 ( M_{T_{k+1}}^2 − M_{T_k}^2 ) ] ≤ ‖H‖_∞^2 E(M_∞^2).

Proof
Part (i): As H ∈ L we can write
H = Σ_{k=0}^∞ Z_k 1_{(T_k, T_{k+1}]},
for T_k stopping times and Z_k an F_{T_k} measurable bounded random variable. By our definition, for M ∈ M2 we have
(H · M)_t = Σ_{k=0}^∞ Z_k ( M_{T_{k+1} ∧ t} − M_{T_k ∧ t} ),
and so, for T a general stopping time, consider (H · M)^T_t = (H · M)_{T ∧ t}, whence
(H · M)^T_t = Σ_{k=0}^∞ Z_k ( M_{T_{k+1} ∧ T ∧ t} − M_{T_k ∧ T ∧ t} ).
Similar computations can be performed for (H · M^T), noting that M^T_t = M_{T ∧ t}, and for (H 1_{(0,T]} · M), yielding the same result in both cases. Hence
(H · M)^T = (H 1_{(0,T]} · M) = (H · M^T).
Part (ii): To prove this result, first we shall establish it for an elementary process H ∈ E, i.e. H = Z 1_{(R,S]}, where R and S are stopping times and Z is a bounded F_R measurable random variable. Let T be an arbitrary stopping time. We shall prove that
E((H · M)_T) = E((H · M)_0).
Via part (i), note that E(H · M)_T = E(H · M^T)_∞. Note that
(H · M)_∞ = Z (M_S − M_R),
and hence, as M is a martingale and Z is F_R measurable, we obtain
E(H · M)_∞ = E( E( Z (M_S − M_R) | F_R ) ) = E( Z E( M_S − M_R | F_R ) ) = 0,
so E(H · M)_T = E(H · M^T)_∞ = 0, and hence via optional stopping we conclude that (H · M)_t is a martingale. By linearity, this result extends to H a simple process (i.e. H ∈ L).

Part (iii): We wish to prove that (H · M) is an L^2 bounded martingale. We again start by considering H ∈ E, an elementary process, i.e. H = Z 1_{(R,S]}, where as before R and S are stopping times and Z is a bounded F_R measurable random variable. Then
E( (H · M)^2_∞ ) = E( Z^2 (M_S − M_R)^2 ).
Using the same argument as used in the orthogonality of martingale increments proof,
E( (H · M)^2_∞ ) = E( Z^2 E( (M_S − M_R)^2 | F_R ) ) = E( Z^2 ( M_S^2 − M_R^2 ) ),
where Z^2 is removed from the conditional expectation since it is an F_R measurable random variable. As M is an L^2 bounded martingale and Z is a bounded process,
E( (H · M)^2_∞ ) ≤ sup_{ω∈Ω} |Z(ω)|^2 E( M_∞^2 ).
All the deﬁnitions as given here aren’t based on the partition construction. Now extend this to the case of an inﬁnite sum. The quadratic variation process M t of a continuous L2 integrable martingale M is the unique process At starting from zero such that Mt2 − At is a uniformly integrable martingale. hence (H · M )Tn converges in M2 and the limit must be the pointwise limit (H · M ). applying the result just proven for ﬁnite sums to the right hand side yields (H · M )Tm − (H · M )Tn ∞ ∞
2 2 m−1
=
k=n
2 2 2 E Zk MTk+1 − MTk 2 2 2 2 E M ∞ − M Tn .
K
E (H ·
M )2 ∞
≤ H∞ E
k=0
2
2 2 MTk+1 − MTk
2 2 2 ≤ H∞ 2 E MTK+1 − MT0 ≤ H∞ 2 EM∞ . Let n = 0 and m → ∞ and the result of part (iii) is obtained.The Stochastic Integral so (H · M ) is an L2 bounded martingale. but requires a little care In general the orthogonality of increments arguments extends to the case where only ﬁnitely many of the Zk in the deﬁnition of the simple process H are non zero.2. so the ﬁnal bound is obtained via the L2 martingale convergence theorem.Tm ] · M ). since any continuous local martingale of ﬁnite variation is indistinguishable from zero. Theorem 8.
8. let n ≤ m. so together with part (ii). we have that (H · M )Tm − (H · M )Tn = (H1(Tn . and that this cannot be used for deﬁning a stochastic integral. To extend this to simple processes is similar. Let K be the largest k such that Zk ≡ 0.
K
19
E (H · which can be bounded as
2 M )∞
=
k=0
2 2 2 E Zk MTk+1 − MTk
. (H · M ) ∈ M2 .
≤ H∞
But by the L2 martingale convergence theorem the right hand side of this bound tends to zero as n → ∞. This is because I shall follow Dellacherie and Meyer and show that the other deﬁnitions are equivalent by using the stochastic integral. and M ∈ M2 .2. We are now going to look at a variation which will prove fundamental for the construction of the integral. Quadratic Variation
We mentioned earlier that the total variation is the variation which is used by the usual LebesgueStieltjes integral.
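The contrast between the two variations can be seen numerically. The following sketch is illustrative only and not part of the formal development: a discretised Brownian path stands in for a continuous martingale, and we compare the total variation and quadratic variation sums over dyadic partitions of [0, 1]. The former grows without bound as the partition is refined; the latter settles near t = 1.

```python
import numpy as np

# Illustrative sketch: total variation of a Brownian path blows up under
# refinement, while the quadratic variation sum stabilises near t = 1.
rng = np.random.default_rng(0)

n_max = 14
N = 2 ** n_max                      # finest dyadic grid on [0, 1]
dB = rng.normal(0.0, np.sqrt(1.0 / N), size=N)
B = np.concatenate([[0.0], np.cumsum(dB)])

def variation_sums(B, level, n_max):
    """Total and quadratic variation sums over the partition of mesh 2**-level."""
    step = 2 ** (n_max - level)
    incr = np.diff(B[::step])
    return np.sum(np.abs(incr)), np.sum(incr ** 2)

tv_coarse, qv_coarse = variation_sums(B, 6, n_max)
tv_fine, qv_fine = variation_sums(B, 14, n_max)

print(tv_coarse, tv_fine)   # total variation sum keeps growing
print(qv_coarse, qv_fine)   # quadratic variation sum stays near 1
```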
Theorem 8.2. The quadratic variation process ⟨M⟩_t of a continuous L² bounded martingale M is the unique increasing process A_t, starting from zero, such that M_t² − A_t is a uniformly integrable martingale.

Proof
For each n define stopping times
  S^n_0 := 0,  S^n_{k+1} := inf{ t > S^n_k : |M_t − M_{S^n_k}| > 2^{−n} }  for k ≥ 0,
and define T^n_k := S^n_k ∧ t. Then
  M_t² = Σ_{k≥1} ( M²_{T^n_k} − M²_{T^n_{k−1}} ) = 2 Σ_{k≥1} M_{T^n_{k−1}} ( M_{T^n_k} − M_{T^n_{k−1}} ) + Σ_{k≥1} ( M_{T^n_k} − M_{T^n_{k−1}} )².   (∗)
Now define H^n to be the simple process given by
  H^n := Σ_{k≥1} M_{S^n_{k−1}} 1_(S^n_{k−1}, S^n_k],
so that we can think of the first term in the decomposition (∗) as 2(H^n·M)_t. Also define
  A^n_t := Σ_{k≥1} ( M_{T^n_k} − M_{T^n_{k−1}} )²,
so the expression (∗) becomes M_t² = 2(H^n·M)_t + A^n_t.

From the construction of the stopping times S^n_k we have the following property:
  ‖H^n − M‖_∞ = sup_t |H^n_t − M_t| ≤ 2^{−n},   (∗∗)
and hence, for all m ≥ 1,
  ‖H^n − H^{n+m}‖_∞ = sup_t |H^{n+m}_t − H^n_t| ≤ 2^{−n} + 2^{−(n+m)} ≤ 2^{−(n−1)}.
Now, using Proposition 8.1(iii), the following result holds for any m ≥ 1:
  E( ( (H^n·M)_∞ − (H^{n+m}·M)_∞ )² ) = E( ((H^n − H^{n+m})·M)²_∞ ) ≤ ‖H^n − H^{n+m}‖²_∞ E(M_∞²) ≤ 2^{−2(n−1)} E(M_∞²).
Thus (H^n·M)_∞ is a Cauchy sequence in the complete Hilbert space L²(F_∞); hence by completeness it converges to a limit N_∞ in the same space, and N_t := E(N_∞ | F_t) is a UI martingale (since it is L² bounded). As each (H^n·M) is a continuous martingale, by Doob's L² inequality applied to the martingale (H^n·M) − N,
  E( sup_{t≥0} |(H^n·M)_t − N_t|² ) ≤ 4 E( ( (H^n·M)_∞ − N_∞ )² ) → 0  as n → ∞.
Hence (H^n·M) converges to N uniformly in L² (and uniformly a.s. along a subsequence); in particular N may be taken continuous. From the relation (∗) we see that, as a consequence of this, the processes A^n converge uniformly to a process A such that M_t² = 2N_t + A_t, with A continuous and null at zero.

Now we must check that this limit process A is increasing. Let J_n(ω) := { S^n_k(ω) : k ≥ 0 }. Clearly A^n(S^n_k) ≤ A^n(S^n_{k+1}); and since J_n(ω) ⊂ J_{n+1}(ω), it is also true in the limit that A(S^n_k) ≤ A(S^n_{k+1}) for all n and k, so A is certainly increasing on the closure of J(ω) := ∪_n J_n(ω). However, if I is an open interval in the complement of J(ω), then no stopping time S^n_k lies in this interval for any n, so the oscillation of M on I is at most 2^{−n} for every n; hence M must be constant throughout I, and so A is constant there as well. Hence the process A is increasing. Thus we have established the existence result.

It only remains to consider uniqueness. Suppose the process A in the above definition were not unique; that is, suppose that also for some B_t, continuous, increasing from zero, M_t² − B_t is a UI martingale. Then, as M_t² − A_t is also a UI martingale, by subtracting these two equations we get that A_t − B_t is a UI martingale, null at zero. It clearly has finite variation, being the difference of two increasing processes; and since a continuous local martingale of finite variation is indistinguishable from zero, A = B.

The following corollary will be needed to prove the integration by parts formula; however, it is clearer to place it here, since this avoids having to redefine the notation.

Corollary 8.3. Let M be a bounded continuous martingale. Then
  M_t² = 2 ∫_0^t M_s dM_s + ⟨M⟩_t.

Proof
In the construction of the quadratic variation process, ⟨M⟩ was constructed as the uniform limit in L² of processes A^n such that
  A^n_t = M_t² − 2(H^n·M)_t,
where each H^n was a bounded previsible process such that sup_t |H^n_t − M_t| ≤ 2^{−n}, and hence H^n → M in L²(M). So the martingales (H^n·M) converge to (M·M) uniformly in L², and it follows immediately that
  M_t² = 2 ∫_0^t M_s dM_s + ⟨M⟩_t.
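The stopping-time construction in the proof can be imitated numerically. This sketch is illustrative only (a fine random-walk approximation replaces the continuous martingale, and all names are ours): the path is resampled each time it moves 2^{−n} away from its previous stopping point, and the resulting sum of squared increments A^n_t is seen to approximate ⟨B⟩_t = t for Brownian motion.

```python
import numpy as np

# Illustrative sketch of the proof's construction: stop whenever the path
# has moved more than eps = 2**-n from the last stopping point; the sum of
# squared stopped increments approximates the quadratic variation t.
rng = np.random.default_rng(1)

steps = 100000
dt = 1.0 / steps
B = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), steps))])

def A_n(B, eps):
    """Sum of squared increments between exits from +/- eps bands."""
    total, last = 0.0, B[0]
    for x in B:
        if abs(x - last) > eps:
            total += (x - last) ** 2
            last = x
    total += (B[-1] - last) ** 2   # final partial increment up to time t = 1
    return total

approx = A_n(B, 2.0 ** -5)
print(approx)   # close to <B>_1 = 1
```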
We need to generalise this slightly, since the above definition required the quadratic variation terms to be finite (M was L² bounded). We do so via localisation. The mysterious seeming technique of localisation isn't really that complex to understand: the idea is that it enables us to extend a definition which applies for 'X widgets' to one valid for 'local X widgets'. It achieves this by using a sequence of stopping times which reduce the 'local X widgets' to 'X widgets'; the original definition can then be applied to the stopped version of the 'X widget'.

Theorem 8.4. The quadratic variation process ⟨M⟩_t of a continuous local martingale M is the unique increasing process A, starting from zero, such that M² − A is a local martingale.

Proof
We shall use a localisation technique to extend the definition of quadratic variation from L² bounded martingales to general continuous local martingales. Define a sequence of stopping times
  T_n := inf{ t : |M_t| > n };
then each M^{T_n} is a bounded martingale, and we define
  ⟨M⟩_t := ⟨M^{T_n}⟩_t  for 0 ≤ t ≤ T_n.
We only need to check that we can sew up the pieces without any holes, i.e. that our definition is independent of the choice of stopping times! To check the consistency of this definition note that
  ( ⟨M^{T_n}⟩ )^{T_{n−1}} = ⟨M^{T_{n−1}}⟩,
and since the sequence of stopping times T_n → ∞, we see that ⟨M⟩ is defined for all t. Uniqueness follows from the result that any finite variation continuous local martingale starting from zero is identically zero.

The quadratic variation turns out to be the 'right' sort of variation to consider for a martingale. Note that the definition just given is for a continuous local martingale; we shall see later how to extend this to a continuous semimartingale.

8.3. Covariation

From the definition of the quadratic variation of a local martingale we can define the covariation of two local martingales N and M which are locally L² bounded, via the polarisation identity
  ⟨M, N⟩ := ( ⟨M + N⟩ − ⟨M − N⟩ ) / 4.
We can prove the following theorem in a straightforward manner using the definition of quadratic variation above, and this will motivate the general definition of the covariation process.

Theorem 8.5. For M and N two local martingales which are locally L² bounded, there exists a unique finite variation process A, starting from zero, such that MN − A is a local martingale.

This theorem is turned round to give the usual definition of the covariation process of two continuous local martingales as:

Definition 8.6. For two continuous local martingales N and M, the covariation process ⟨M, N⟩ of N and M is defined as the unique finite variation process A, starting from zero, such that MN − A is a local martingale.

It can readily be verified that the covariation process can be regarded as a symmetric bilinear form on the space of local martingales; i.e. for L, M and N continuous local martingales, and λ ∈ R,
  ⟨M + N, L⟩ = ⟨M, L⟩ + ⟨N, L⟩,
  ⟨λM, N⟩ = λ ⟨M, N⟩,
  ⟨M, N⟩ = ⟨N, M⟩.

8.4. Extension of the Integral to L²(M)

We have previously defined the integral for H a simple process (in L) and M ∈ M², and we have noted that (H·M) is itself in M². Recall that for M ∈ M², M² − ⟨M⟩ is a uniformly integrable martingale. Hence, for S and T stopping times such that S ≤ T,
  E( (M_T − M_S)² | F_S ) = E( M_T² − M_S² | F_S ) = E( ⟨M⟩_T − ⟨M⟩_S | F_S ).
Hence for a simple process H = Σ_i Z_{i−1} 1_(T_{i−1},T_i] we obtain, term by term,
  E( Z²_{i−1} (M_{T_i} − M_{T_{i−1}})² ) = E( Z²_{i−1} ( ⟨M⟩_{T_i} − ⟨M⟩_{T_{i−1}} ) ),
so summing we obtain
  E( (H·M)²_∞ ) = E( Σ_i Z²_{i−1} ( ⟨M⟩_{T_i} − ⟨M⟩_{T_{i−1}} ) ) = E( (H²·⟨M⟩)_∞ ).
In the light of this, we define a seminorm ‖H‖_M via
  ‖H‖_M = ( E( (H·M)²_∞ ) )^{1/2} = ( E ∫_0^∞ H_s² d⟨M⟩_s )^{1/2}.
The space L²(M) is then defined as the subspace of the previsible processes where this seminorm is finite,
  L²(M) := { previsible processes H such that ‖H‖_M < ∞ }.
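The isometry behind this seminorm can be sanity-checked by Monte Carlo. The sketch below is illustrative only (a simulated Brownian motion stands in for M, and the integrand is our choice): with M = B on [0, 1] and the previsible integrand H_s = B_s evaluated at left endpoints, d⟨B⟩_s = ds, so E[(H·B)²_1] should match E[∫_0^1 B_s² ds] = 1/2.

```python
import numpy as np

# Illustrative Monte Carlo check of E[(H.B)_1^2] = E[int_0^1 H_s^2 d<B>_s]
# for H_s = B_s taken at the LEFT endpoint of each step (previsible).
rng = np.random.default_rng(2)

paths, steps = 10000, 300
dt = 1.0 / steps
dB = rng.normal(0.0, np.sqrt(dt), size=(paths, steps))
B = np.cumsum(dB, axis=1)
B_left = np.hstack([np.zeros((paths, 1)), B[:, :-1]])   # B at left endpoints

integral = np.sum(B_left * dB, axis=1)                  # (H . B)_1 per path
lhs = np.mean(integral ** 2)                            # E[(H.B)_1^2]
rhs = np.mean(np.sum(B_left ** 2, axis=1) * dt)         # E[int_0^1 H_s^2 ds]
print(lhs, rhs)   # both near 1/2
```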
However, we would actually like to be able to treat L²(M) as a Hilbert space, and there remains a problem: if X ∈ L²(M) and ‖X‖_M = 0, this doesn't imply that X is the zero process. Thus we follow the usual route of defining an equivalence relation via X ∼ Y if and only if ‖X − Y‖_M = 0. We now define
  L²(M) := { equivalence classes of previsible processes H such that ‖H‖_M < ∞ },
and this is a Hilbert space with norm ‖·‖_M (it can be seen that it is a Hilbert space by considering it as a suitable L² space). This establishes an isometry (called the Itô isometry) between the spaces L²(M) ∩ L and L²(F_∞), given by
  I : L²(M) ∩ L → L²(F_∞),  I : H ↦ (H·M)_∞.
Remember that there is a basic bijection between the space M² and the Hilbert space L²(F_∞), in which each square integrable martingale M is represented by its limiting value M_∞; so the image (H·M)_∞ under the isometry may be thought of as describing an M² martingale. Hence this endows M² with a Hilbert space structure, with an inner product given by
  (M, N) = E( N_∞ M_∞ ).

We shall now use this Itô isometry to extend the definition of the stochastic integral from L (the class of simple processes) to the whole of L²(M). Roughly speaking, we shall approximate an element of L²(M) via a sequence of simple processes converging to it, just as in the construction of the Lebesgue integral. In doing this, we shall use the Monotone Class Theorem. Recall that in the conventional construction of Lebesgue integration, and in the proofs of the elementary results, the following standard machine is repeatedly invoked. To prove a 'linear' result for all h ∈ L¹(S, Σ, μ), proceed in the following way:
(i) Show the result is true for h an indicator function.
(ii) Show that by linearity the result extends to all positive step functions.
(iii) Use the Monotone Convergence Theorem to see that if h_n ↑ h, where the h_n are step functions, then the result must also be true for h a nonnegative, Σ measurable function.
(iv) Write h = h⁺ − h⁻, where both h⁺ and h⁻ are nonnegative functions, and use linearity to obtain the result for h ∈ L¹.
The monotone class lemma is a replacement for this procedure, which hides away all the 'machinery' used in the constructions.

Monotone Class Theorem. Let A be a π-system generating the σ-algebra F (i.e. σ(A) = F). If H is a linear space of bounded functions from Ω to R satisfying
(i) 1_A ∈ H for all A ∈ A,
(ii) if 0 ≤ f_n ↑ f, where f_n ∈ H and f is a bounded function f : Ω → R, then f ∈ H,
then H contains every bounded F-measurable function f : Ω → R.

In order to apply this in our case, we need to prove that the σ-algebra of previsible processes is that generated by the simple processes.

The Previsible σ-field and the Simple Processes
It is fairly simple to show that the space of simple processes L forms a vector space (exercise: check linearity, constant multiples and zero).

Lemma 8.7. The σ-algebra generated by the simple processes is the previsible σ-algebra; i.e. the previsible σ-algebra is the smallest σ-algebra with respect to which every simple process is measurable.

Proof
It suffices to show that every bounded, adapted, left continuous process with right limits is measurable with respect to the σ-algebra generated by the simple processes. Let H_t be such a process; then
  H = lim_{k→∞} lim_{n→∞} Σ_{i=2}^{nk} H_{(i−1)/n} 1_( (i−1)/n, i/n ],
and since H_t is adapted to F_t, each H_{(i−1)/n} is a bounded element of F_{(i−1)/n}, so each approximant is a simple process.

We can now apply the Monotone Class Theorem to the vector space H of processes with a time parameter in (0, ∞), regarded as maps from (0, ∞) × Ω → R: if this vector space contains all the simple processes, i.e. L ⊂ H, then H contains every bounded previsible process on (0, ∞).

Assembling the Pieces
Since I is an isometry it has a unique extension to the closure of U = L²(M) ∩ L in L²(M). By the application of the monotone class lemma to H = U and the π-system of simple processes, we see that U must contain every bounded previsible process; hence the closure of U is L²(M). Thus the Itô isometry extends to a map from L²(M) to L²(F_∞).

Let us look at this result more informally. For a previsible H ∈ L²(M), because of the density of L in L²(M), we can find a sequence of simple processes H_n which converges to H as n → ∞; we then consider I(H) as the limit of the I(H_n). To verify that this limit is unique, suppose that H'_n → H as n → ∞ also, where H'_n ∈ L. Note that H_n − H'_n ∈ L and H_n − H'_n → 0, so ((H_n − H'_n)·M) → 0, and hence by the Itô isometry the limits lim_{n→∞}(H_n·M) and lim_{n→∞}(H'_n·M) coincide.

The following result is essential in order to extend the integral to continuous local martingales.
Proposition 8.8. For M ∈ M², for any H ∈ L²(M) and for any stopping time T,
  (H·M)_T = (H1_(0,T] · M)_∞ = (H·M^T)_∞.

Proof
Consider the following linear maps in turn. First,
  f₁ : L²(F_∞) → L²(F_∞),  f₁ : Y_∞ ↦ E(Y_∞ | F_T).
This map is a contraction on L²(F_∞), since by the conditional Jensen inequality
  ( E(Y_∞ | F_T) )² ≤ E( Y_∞² | F_T ),
and taking expectations yields
  ‖E(Y_∞ | F_T)‖²_2 = E( ( E(Y_∞ | F_T) )² ) ≤ E( E( Y_∞² | F_T ) ) = E(Y_∞²) = ‖Y_∞‖²_2.
Hence f₁ is a contraction on L²(F_∞). Now consider
  f₂ : L²(M) → L²(M),  f₂ : H ↦ H1_(0,T].
From the definition of ‖·‖_M, and from the fact that the quadratic variation process is increasing,
  ‖H1_(0,T]‖²_M = E ∫_0^∞ H_s² 1_(0,T] d⟨M⟩_s = E ∫_0^T H_s² d⟨M⟩_s ≤ E ∫_0^∞ H_s² d⟨M⟩_s = ‖H‖²_M.
Hence f₂ is a contraction on L²(M). Hence if I denotes the Itô isometry, then f₁ ∘ I and I ∘ f₂ are also contractions from L²(M) to L²(F_∞) (using the fact that I is an isometry between L²(M) and L²(F_∞)). Now introduce I^(T), the stochastic integral map associated with M^T, i.e. I^(T)(H) ≡ (H·M^T)_∞, and note that
  ‖I^(T)(H)‖_2 = ‖H‖_{M^T} ≤ ‖H‖_M.
We have previously shown that the maps f₁ ∘ I, I ∘ f₂ and H ↦ I^(T)(H) agree on the space of simple processes, by direct calculation. We note that L is dense in L²(M) (from the application of the Monotone Class Lemma to the simple processes). Hence, from the three bounds above, the three maps agree on L²(M).
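At the discrete level the three quantities in the proposition are literally the same sum, which the following pathwise sketch makes concrete (illustrative only; the discretised Brownian path, the integrand and the barrier are our choices): stopping the integral at T, cutting the integrand at T, and stopping the integrator at T all produce the same value.

```python
import numpy as np

# Illustrative pathwise check of (H.M)_T = (H 1_(0,T] . M)_inf = (H . M^T)_inf
# on a discretised Brownian path, with T the first exit from (-0.5, 0.5).
rng = np.random.default_rng(3)

steps = 200000
dt = 2.0 / steps                       # horizon t = 2
dB = rng.normal(0.0, np.sqrt(dt), steps)
B = np.concatenate([[0.0], np.cumsum(dB)])
t_grid = np.arange(steps) * dt
H = np.exp(-t_grid)                    # a bounded previsible integrand

hits = np.nonzero(np.abs(B) > 0.5)[0]  # T = first time |B| exceeds 0.5
iT = hits[0] if hits.size else steps

I = np.cumsum(H * dB)                                   # the path (H . B)_t
stopped_integral = I[iT - 1]                            # (H . B)_T
cut_integrand = np.sum(H[:iT] * dB[:iT])                # (H 1_(0,T] . B)_inf
stopped_integrator = np.sum(H * np.where(np.arange(steps) < iT, dB, 0.0))

print(stopped_integral, cut_integrand, stopped_integrator)  # all equal
```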
8.5. Localisation

We've already met the idea of localisation in extending the definition of quadratic variation from L² bounded continuous martingales to continuous local martingales. In this context a previsible process {H_t}_{t≥0} is locally previsible if there exists a sequence of stopping times T_n → ∞ such that for all n, H1_(0,T_n] is a previsible process. Fairly obviously every previsible process has this property. However, if in addition we want the process H to be locally bounded, we need the condition that there exists a sequence of stopping times T_n, tending to infinity, such that H1_(0,T_n] is uniformly bounded for each n. For the integrator (a martingale of integrable variation, say), the localisation is to a local martingale; that is, to a process which has a sequence of stopping times T_n → ∞ such that for all n, X^{T_n} is a genuine martingale.

If we can prove a result like (H·X)^T = (H1_(0,T] · X^T) for H and X in their original (i.e. non-localised) classes, then it is possible to extend the definition of (H·X) to the local classes. Thus it is consistent to define (H·X)_t on t ∈ [0, ∞) via
  (H·X)^{T_n} = (H1_(0,T_n] · X^{T_n}),
where T_n is a reducing sequence¹ of stopping times for both H and X. We can find one such sequence, because if say {T_n} reduces H and {S_n} reduces X, then T_n ∧ S_n reduces both H and X. We must check that this is well defined, viz if we choose another reducing sequence S_n, we get the same definition of (H·X). To see this note:
  (H1_(0,T_n∧S_n] · X^{T_n∧S_n}) = (H1_(0,T_n] · X^{T_n})^{S_n} = (H1_(0,S_n] · X^{S_n})^{T_n},
hence the definition of (H·X)_t is the same if constructed from the reducing sequence S_n as if constructed via T_n. Also note that if T = T_{n−1} we can check consistency in n:
  (H1_(0,T_n] · X^{T_n})^{T_{n−1}} = (H·X)^{T_{n−1}} = (H1_(0,T_{n−1}] · X^{T_{n−1}}),
where the right hand side is defined in the existing fashion.

¹ The reducing sequence is the sequence of stopping times tending to infinity which makes the local version of the object into the non-local version.

8.6. Some Important Results

We can now extend most of our results to stochastic integrals of a previsible process H with respect to a continuous local martingale M. The theory can be taken further, dropping the continuity requirement; it can be done, but it requires considerably more work, especially with regard to the definition of the quadratic variation process. In fact, in these notes we will never drop the continuity requirement.
Theorem 8.9. Let H and K be locally bounded previsible processes, let M be a continuous local martingale, and let T be an arbitrary stopping time. Then:
(i) (H·M)^T = (H1_(0,T] · M) = (H·M^T),
(ii) (H·M) is a continuous local martingale,
(iii) ⟨H·M⟩ = H²·⟨M⟩,
(iv) H·(K·M) = (HK)·M.

Proof
The proof of parts (i) and (ii) follows from the result used in the localisation, namely that (H·M)^T = (H1_(0,T] · M) = (H·M^T) for H a bounded previsible process in L²(M) and M an L² bounded martingale. Using this result, it suffices to prove (iii) and (iv) in the case where M, H and K are uniformly bounded (via localisation).

Part (iii). For any stopping time T,
  E( (H·M)²_T ) = E( (H1_(0,T] · M)²_∞ ) = E( ( (H1_(0,T])² · ⟨M⟩ )_∞ ) = E( (H²·⟨M⟩)_T ).
Hence (H·M)² − (H²·⟨M⟩) is a martingale (via the optional stopping theorem), and so, by the uniqueness of the quadratic variation process, we have established ⟨H·M⟩ = H²·⟨M⟩.

Part (iv). The truth of this statement is readily established for H and K simple processes (in L). To extend it to H and K bounded previsible processes, note that by part (iii)
  E( ⟨H·(K·M)⟩_∞ ) = E( ( H²·⟨K·M⟩ )_∞ ) = E( ( H²·(K²·⟨M⟩) )_∞ ) = E( ( (HK)²·⟨M⟩ )_∞ ) = E( ⟨(HK)·M⟩_∞ ).
Also note the following bound:
  E( ( (HK)²·⟨M⟩ )_∞ ) ≤ min( ‖H‖²_∞ ‖K‖²_M , ‖K‖²_∞ ‖H‖²_M ).
Approximating H and K by simple processes and using this bound, together with the Itô isometry, to control the error terms, the identity H·(K·M) = (HK)·M passes to the limit for bounded previsible H and K, and hence, via localisation, to locally bounded previsible H and K.
9. Semimartingales

I mentioned at the start of these notes that the most general form of the stochastic integral would have a previsible process as the integrand and a semimartingale as an integrator. Now it's time to extend the definition of the Itô integral to the case of semimartingale integrators.

Definition 9.1. A process X is a semimartingale if X is an adapted CADLAG process which has a decomposition X = X₀ + M + A, where M is a local martingale, null at zero, and A is a process of finite variation, null at zero.

Note that the decomposition is not necessarily unique, as there exist martingales which have finite variation. To remove many of these difficulties we shall impose a continuity condition, since under this most of our problems will vanish. A continuous semimartingale is a process (X_t)_{t≥0} which has a Doob–Meyer decomposition X = X₀ + M + A, where X₀ is F₀ measurable, M_t is a continuous local martingale and A_t is a continuous adapted process of finite variation, with M₀ = A₀ = 0.

Theorem 9.2. The Doob–Meyer decomposition in the definition of a continuous semimartingale is unique.

Proof
Let X = X₀ + M' + A' be another such decomposition, and consider the process N := M − M' = A' − A. By the first equality, N is the difference of two continuous local martingales, and hence is itself a continuous local martingale; by the second equality, it has finite variation. Hence by an earlier proposition (5.2) it must be zero, and so M = M' and A = A'.

Definition 9.3. We define† the quadratic variation of the continuous semimartingale X = X₀ + M + A as that of its continuous local martingale part,
  ⟨X⟩ := ⟨M⟩.
Similarly, if Y = Y₀ + N + B is another continuous semimartingale, where N is a continuous local martingale and B is of finite variation, we define
  ⟨X, Y⟩ := ⟨M, N⟩.

We can extend the definition of the stochastic integral to continuous semimartingale integrators by defining
  (H·X) := (H·M) + (H·A),
where the first integral is a stochastic integral as defined earlier, and the second is a Lebesgue–Stieltjes integral (as the integrator is a process of finite variation).

† These definitions can be made to look natural by considering the quadratic variation defined in terms of sums of squared increments; these are results which are proved later using the Itô integral, since this provides a better approach to the discontinuous theory.
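The definition (H·X) := (H·M) + (H·A) can be seen in action numerically. The sketch below is illustrative only (the semimartingale, integrand and grid are our choices): for X_t = B_t + t, so that M = B and A_t = t, the integral against X is just the stochastic integral against B plus an ordinary Lebesgue–Stieltjes integral in dt.

```python
import numpy as np

# Illustrative sketch of (H.X) = (H.M) + (H.A) for X_t = B_t + t and
# the bounded integrand H_s = cos(s) on [0, 1].
rng = np.random.default_rng(8)

steps = 100000
dt = 1.0 / steps
s = np.arange(steps) * dt
dB = rng.normal(0.0, np.sqrt(dt), steps)
dA = np.full(steps, dt)
H = np.cos(s)

H_dX = np.sum(H * (dB + dA))                 # (H . X)_1
mart_part = np.sum(H * dB)                   # (H . M)_1, stochastic integral
ls_part = np.sum(H * dA)                     # (H . A)_1 = int_0^1 cos(s) ds
print(H_dX, mart_part + ls_part)             # identical up to rounding
print(ls_part)                               # close to sin(1) ~ 0.8415
```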
10. Relations to Sums

This section is optional, and can be skipped on a first reading; it is included to bring together the two approaches to the constructions involved in the stochastic integral. For example, the quadratic variation of a process can either be defined in terms of martingale properties, or alternatively in terms of sums of squares of increments.

10.1. The UCP topology

We shall meet the notion of convergence uniformly on compacts in probability when considering stochastic integrals as limits of sums, so it makes sense to review this topology here.

Definition 10.1. A sequence of processes {H^n}_{n≥1} converges to a process H uniformly on compacts in probability (abbreviated u.c.p.) if, for each t > 0,
  sup_{0≤s≤t} |H^n_s − H_s| → 0 in probability.

At first sight this may seem to be quite an esoteric definition, and it would also appear to be quite difficult to handle. In fact it is a natural extension of convergence in probability to processes, and Doob's martingale inequalities provide the key to handling it. Let
  H*_t := sup_{0≤s≤t} |H_s|;
then, for CADLAG processes Y^n and Y, Y^n converges to Y u.c.p. iff (Y^n − Y)*_t converges to zero in probability for each t ≥ 0. Thus, to prove that a sequence converges u.c.p., it often suffices to apply Doob's inequality to prove that the supremum converges to zero in L², whence it converges to zero in probability, whence u.c.p. convergence follows.

The space of CADLAG processes with the u.c.p. topology is in fact metrizable; a compatible metric is given by
  d(X, Y) = Σ_{n=1}^∞ (1/2ⁿ) E( min(1, (X − Y)*_n) )
for X and Y CADLAG processes. The metric space can also be shown to be complete. For details see Protter.

Since we have just met a new kind of convergence, it is helpful to recall the other usual types of convergence on a probability space. For convenience, here are the usual definitions:

Pointwise. A sequence of random variables X_n converges to X pointwise if, for all ω not in some null set, X_n(ω) → X(ω) as n → ∞.

Probability. A sequence of r.v.s X_n converges to X in probability if, for any ε > 0, P(|X_n − X| > ε) → 0 as n → ∞.

Lp convergence. A sequence of random variables X_n converges to X in Lp if E(|X_n − X|^p) → 0 as n → ∞.

It is trivial to see that pointwise convergence implies convergence in probability. It is also true that Lp convergence implies convergence in probability, as the following theorem shows.

Theorem 10.2. If X_n converges to X in Lp for some p > 0, then X_n converges to X in probability.

Proof
Apply Chebyshev's inequality to f(x) = |x|^p, which yields, for any ε > 0,
  P( |X_n − X| ≥ ε ) ≤ ε^{−p} E( |X_n − X|^p ) → 0  as n → ∞.

Theorem 10.3. If X_n → X a.s., then X_n → X in probability.

Theorem 10.4. If X_n → X in probability, then there exists a subsequence n_k such that X_{n_k} → X a.s.

10.2. Approximation via Riemann Sums

Following Dellacherie and Meyer, we shall establish the equivalence of the two constructions of the quadratic variation via the following theorem, which approximates the stochastic integral by Riemann sums.
Theorem 10.5. Let X be a continuous semimartingale, and let H be a locally bounded previsible CADLAG process, both starting from zero. Then
  ∫_0^t H_s dX_s = lim_{n→∞} Σ_{k=0}^∞ H_{t∧k2^{−n}} ( X_{t∧(k+1)2^{−n}} − X_{t∧k2^{−n}} ),  u.c.p.

Proof
Let K_s := H_s 1_{s≤t}, and define the following sequence of simple process approximations:
  K^n_s := Σ_{k=0}^∞ H_{t∧k2^{−n}} 1_( t∧k2^{−n}, t∧(k+1)2^{−n} ] (s).
Clearly this sequence K^n converges pointwise to K, and (K^n·X)_t is precisely the Riemann sum in the statement. We can decompose the semimartingale X as X = X₀ + M_t + A_t, where A_t is of finite variation and M_t is a continuous local martingale. The result that
  ∫_0^t K^n_s dA_s → ∫_0^t K_s dA_s
is standard from Lebesgue–Stieltjes theory.

For the local martingale part, let T_k be a reducing sequence for the continuous local martingale M such that each M^{T_k} is a bounded martingale. Also, since K is locally bounded, we can find a sequence of stopping times S_k such that each K1_(0,S_k] is a bounded previsible process. It therefore suffices to prove, for a sequence of stopping times R_k such that R_k ↑ ∞, that (K^n·M)^{R_k} → (K·M)^{R_k} u.c.p.; in other words, we may assume that M is a bounded martingale and that K is bounded. By Doob's L² inequality and the Itô isometry we have
  E( ( ((K^n·M) − (K·M))* )² ) ≤ 4 E( ( (K^n·M)_∞ − (K·M)_∞ )² )   (Doob L²)
   = 4 E ∫ (K^n_s − K_s)² d⟨M⟩_s   (Itô isometry).
As K^n − K → 0 pointwise, and K^n − K is bounded uniformly in n (since K is bounded), the Dominated Convergence Theorem for the Lebesgue–Stieltjes integral gives
  ∫ (K^n_s − K_s)² d⟨M⟩_s → 0 a.s.;
and, these integrals being uniformly bounded by an integrable random variable, we may conclude
  E( ( ((K^n·M) − (K·M))* )² ) → 0.
So ((K^n·M) − (K·M))* → 0 in L², whence it converges to zero in probability; that is, (K^n·M) → (K·M) u.c.p. as n → ∞. Putting the two parts together yields
  ∫_0^t K^n_s dX_s → ∫_0^t K_s dX_s  u.c.p.,
which is the required result.

This result can now be applied to the construction of the quadratic variation process, as illustrated by the next theorem.

Theorem 10.6. For X a continuous semimartingale, the quadratic variation process ⟨X⟩_t is equal to the following limit in probability:
  ⟨X⟩_t = lim_{n→∞} Σ_{k=0}^∞ ( X_{t∧(k+1)2^{−n}} − X_{t∧k2^{−n}} )².

Proof
From an application of the previous theorem,
  2 ∫_0^t X_s dX_s = lim_{n→∞} Σ_{k=0}^∞ 2 X_{t∧k2^{−n}} ( X_{t∧(k+1)2^{−n}} − X_{t∧k2^{−n}} ),  u.c.p.
In addition, for every n,
  X_t² − X_0² = Σ_{k=0}^∞ ( X²_{t∧(k+1)2^{−n}} − X²_{t∧k2^{−n}} ).
Since a² − b² = 2b(a − b) + (a − b)², the difference of these two equations yields
  X_t² − X_0² − 2 ∫_0^t X_s dX_s = lim_{n→∞} Σ_{k=0}^∞ ( X_{t∧(k+1)2^{−n}} − X_{t∧k2^{−n}} )²,
where the limit is taken in probability. Call the left hand side A_t; as a limit of sums of squares, the process A is increasing and positive on the rational numbers, and hence on the whole of R by continuity. Comparing with the construction in the proof of theorem (8.2) — where we noted that A^n_t = M_t² − 2(H^n·M)_t converges to ⟨M⟩_t — and recalling that ⟨X⟩ := ⟨M⟩ and that the finite variation part contributes nothing to this limit, we conclude that A_t = ⟨X⟩_t, which is the required result.

Remark
The theorem can be strengthened still further, by a result of Doléans-Dade, to the effect that for X a continuous semimartingale
  ⟨X⟩_t = lim_{n→∞} Σ_{k=0}^∞ ( X_{t∧(k+1)2^{−n}} − X_{t∧k2^{−n}} )²,
where the limit is in the strong sense, in L¹. This result is harder to prove (essentially the uniform integrability of the sums must be proven), and this is not done here.
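Both theorems can be checked on a simulated path. The sketch below is illustrative only (a discretised Brownian motion stands in for X): the left-endpoint Riemann sums for ∫_0^1 B dB approach (B_1² − 1)/2, the squared-increment sums approach ⟨B⟩_1 = 1, and a midpoint-style sum differs from the left-endpoint sum by exactly half the quadratic variation sum — which is why the previsible (left-endpoint) evaluation in Theorem 10.5 matters.

```python
import numpy as np

# Illustrative check of the Riemann-sum approximations on one Brownian path.
rng = np.random.default_rng(9)

steps = 2 ** 16
dB = rng.normal(0.0, np.sqrt(1.0 / steps), steps)
B = np.concatenate([[0.0], np.cumsum(dB)])

left = np.sum(B[:-1] * dB)                    # previsible (Ito) sum
mid = np.sum(0.5 * (B[:-1] + B[1:]) * dB)     # midpoint-style sum
qv = np.sum(dB ** 2)                          # squared-increment sum

print(left, (B[-1] ** 2 - 1.0) / 2.0)  # close: int_0^1 B dB = (B_1^2 - 1)/2
print(mid - left, qv / 2.0)            # the gap is exactly half the QV sum
print(qv)                              # close to <B>_1 = 1
```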
N = H · M. More importantly. where Z is an FS measurable bounded random variable. as k → ∞. We ﬁnally extend to general locally bounded previsible H. it is the fundamental weapon used to evaluate Itˆ integrals. Theorem (KunitaWatanabe Identity) 11.
= H nk − H [35]
2 M
→ 0. we o shall see some examples of this shortly. The Itˆ isometry provides a cleancut deﬁnition of the stochastic integral. and realise that our formalism has found a new subtlety in the subject. N )∞ ] Moreover we have (H · M ).
.T ] . Then (H · M )∞ is the unique element of L2 (F∞ ) such that for every N ∈ M2 we have: E [(H · M )∞ N∞ ] = E [(H · M. Then there exists a subsequence nk such that H nk converges to H is L2 (N ). Let M ∈ M2 and H and K are locally bounded previsible processes. It is clear that E [(H · M )∞ N∞ ] =E [Z (MT − MS ) N∞ ] =E [Z(MT NT − MS NS )] =E [M∞ (H · N )∞ ] Now by linearity this can be extended to establish the result for all simple functions (in L). Then E (H nk · M )∞ N∞ − E (H · M )∞ N∞ =E ≤ ≤ (H nk − H) · M N∞ E [(H nk − H) · M )]
2 2 E(N∞ ) 2 2 E(N∞ )
E [(H nk · M ) − (H · M )]
By construction H nk → H in L2 (M ) which means that H nk − H By the Itˆ isometry o E ((H nk − H) · M )
2 M
→ 0. Proof Consider an elementary function H. Itˆ’s Formula o
Itˆ’s Formula is the analog of integration by parts in the stochastic calculus. so H = Z1(S. as k → ∞.1. however it o was originally deﬁned via the following theorem of Kunita and Watanabe. by considering a sequence (provided it exists) of simple functions H n such that H n → H in L2 (M ). It is also o the ﬁrst place where we see a major diﬀerence creep into the theory. N .11. and S and T are stopping times such that S ≤ T .
Hence as N is an L2 bounded martingale. By polarisation M. H · N + K · M. Hence we can pass to the limit to obtain the result for H. N ) = 1 8 (H + K) · (M + N ) − (H + K) · (M − N ) − (H − K) · (M + N ) + (H − K) · (M − N ) Considering the ﬁrst two terms (H + K)·(M + N ) − (H + K) · (M − N ) = = (H + K) · M + (H + K) · N − (H + K) · M − (H + K) · N =4 (H + K) · M. we shall ﬁrst show that (H · N ). H · N + H · M. (H − K) · N by polarisation =4 ( H · M. N .9(iii)). 4
36
(H + K)2 − (H − K)2 . N = also HK = Hence 2(HK · M. 4
Now we use the result that (H · M ) = (H 2 · M ) which has been proved previously in theorem (7. (H + K) · N by polarisation =4 ( H · M. K · N + K · M. K · N − K · M. Similarly as (H nk · N )∞ → (H · N )∞ in L2 . (H · M ) = 2HK M. Similarly for the second two terms (H − K)·(M + N ) − (H − K) · (M − N ) = = (H − K) · M + (H − K) · N − (H − K) · M − (H − K) · N =4 (H − K) · M.Itˆ’s Formula o that is (H nk · M )∞ → (H · M )∞ in L2 . to see that 2(HK · M. the right hand side of the above expression tends to zero as k → ∞. K · N ) . H · N − H · M. To prove the second part of the theorem. we see also that E ((H nk · N )∞ M∞ ) → 0. . (K · M ) + (K · N ). H · N + K · M. K · N ) . M +N − M −N . as k → ∞.
. N ) = 1 8 (H + K)2 − (H − K)2 · { M+N − MN } .
(K · M ) =(H · X. N = M. which is equivalent to showing that (H · M )N − (H · N )M is a local martingale (from the deﬁnition of covariation process). N + M.Itˆ’s Formula o Adding these two together yields 2(HK · M. This is the ﬁrst major diﬀerence we have seen between the Stochastic Integral and the usual Lebesgue Integral. N ). (H · N ) . Before we can prove the general theorem. H · N . N ) = (H · N ). M ). Proof Note that the covariation is symmetric.2. (H · M ) Putting K ≡ 1 yields 2H · M. hence (H · N ). then (H · N ). (K · M ) + (K · N ). So it suﬃces to prove that (H · M ). the covariation of the processes X and Y . X) ) =(HK · M. whence we must check that for all stopping times T . since
T (H · M )T NT =(H · M )T N∞ ∞ T (H · N )T MT =(H · N )T M∞ ∞
37
Corollary 11. (K · M ) = (HK · N. but by the ﬁrst part of the theorem E ((H · M )∞ N∞ ) = E ((H · N )∞ M∞ ) . N = H · M. However note that there is an extra term on the right hand side. By localisation it suﬃces to consider M and N bounded martingales.
We can prove a stochastic calculus analogue of the usual integration by parts formula. Before we can prove the general theorem, we need a lemma.

Lemma (Parts for a Finite Variation Process and a Martingale) 11.3.
Let M be a bounded continuous martingale starting from zero, and V a bounded variation process starting from zero. Then

M_t V_t = ∫₀ᵗ M_s dV_s + ∫₀ᵗ V_s dM_s.

Proof
For n fixed, we can write

M_t V_t = Σ_{k≥1} M_{k2⁻ⁿ∧t} (V_{k2⁻ⁿ∧t} − V_{(k−1)2⁻ⁿ∧t}) + Σ_{k≥1} V_{(k−1)2⁻ⁿ∧t} (M_{k2⁻ⁿ∧t} − M_{(k−1)2⁻ⁿ∧t})
      = Σ_{k≥1} M_{k2⁻ⁿ∧t} (V_{k2⁻ⁿ∧t} − V_{(k−1)2⁻ⁿ∧t}) + ∫₀ᵗ Hⁿ_s dM_s,

where Hⁿ is the previsible simple process

Hⁿ_s = Σ_{k≥1} V_{(k−1)2⁻ⁿ∧t} 1_{((k−1)2⁻ⁿ∧t, k2⁻ⁿ∧t]}(s).

These Hⁿ are bounded and converge to V by the continuity of V, so as n → ∞ the second term tends to

∫₀ᵗ V_s dM_s,

and by the Dominated Convergence Theorem for Lebesgue-Stieltjes integrals the first term converges to

∫₀ᵗ M_s dV_s

as n → ∞.
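The lemma is easy to sanity-check numerically: for a simulated Brownian path M = B and the finite variation process V_t = t, left-endpoint Riemann sums for the two integrals should reproduce M_tV_t with no extra covariation term. The following Python sketch is an illustration only (the step count and seed are arbitrary choices, not from the notes).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
t = np.linspace(0.0, 1.0, n + 1)
dB = rng.normal(0.0, np.sqrt(1.0 / n), n)
B = np.concatenate(([0.0], np.cumsum(dB)))  # martingale M = B, M_0 = 0
V = t                                        # finite variation process V_t = t

# Left-endpoint Riemann sums for the two integrals in the lemma
int_M_dV = np.sum(B[:-1] * np.diff(V))   # approximates  int_0^1 M_s dV_s
int_V_dM = np.sum(V[:-1] * dB)           # approximates  int_0^1 V_s dM_s

lhs = B[-1] * V[-1]
rhs = int_M_dV + int_V_dM
print(abs(lhs - rhs))  # small: V has finite variation, so no covariation term
```

The discrete identity differs from M_tV_t only by Σ ΔMΔV, which vanishes here as the mesh is refined.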
Theorem (Integration by Parts) 11.4.
For X and Y continuous semimartingales, the following holds:

X_t Y_t − X_0 Y_0 = ∫₀ᵗ X_s dY_s + ∫₀ᵗ Y_s dX_s + ⟨X, Y⟩_t.

This is the first major difference we have seen between the stochastic integral and the usual Lebesgue integral: note that there is an extra term on the right hand side, ⟨X, Y⟩_t, the covariation of the processes X and Y.

Proof
It is trivial to see that it suffices to prove the result for processes starting from zero. Let X_t = M_t + A_t and Y_t = N_t + B_t be Doob-Meyer decompositions, so M_t and N_t are continuous local martingales and A_t and B_t are finite variation processes, all starting from zero. By localisation we can consider the local martingales M and N to be bounded martingales and the FV processes A and B to have bounded variation. Hence by the usual (finite variation) theory

A_t B_t = ∫₀ᵗ A_s dB_s + ∫₀ᵗ B_s dA_s,

and by the lemma, applied to the pairs (M, B) and (N, A),

M_t B_t = ∫₀ᵗ M_s dB_s + ∫₀ᵗ B_s dM_s,   N_t A_t = ∫₀ᵗ N_s dA_s + ∫₀ᵗ A_s dN_s.

It only remains to prove for bounded martingales N and M starting from zero that

M_t N_t = ∫₀ᵗ M_s dN_s + ∫₀ᵗ N_s dM_s + ⟨M, N⟩_t.

This follows by applying polarisation to corollary (7.3) of the quadratic variation existence theorem. Hence

(M_t + A_t)(N_t + B_t) = M_tN_t + M_tB_t + N_tA_t + A_tB_t
 = ∫₀ᵗ M_s dN_s + ∫₀ᵗ N_s dM_s + ⟨M, N⟩_t + ∫₀ᵗ M_s dB_s + ∫₀ᵗ B_s dM_s + ∫₀ᵗ N_s dA_s + ∫₀ᵗ A_s dN_s + ∫₀ᵗ A_s dB_s + ∫₀ᵗ B_s dA_s
 = ∫₀ᵗ (M_s + A_s) d(N_s + B_s) + ∫₀ᵗ (N_s + B_s) d(M_s + A_s) + ⟨M, N⟩_t,

and ⟨X, Y⟩_t = ⟨M, N⟩_t since the finite variation parts contribute nothing to the covariation.

Reflect for a moment that this theorem is implying another useful closure property of continuous semimartingales: it implies that the product X_tY_t of two continuous semimartingales is a continuous semimartingale, since it can be written as a sum of stochastic integrals with respect to continuous semimartingales and is so itself a continuous semimartingale.
Theorem (Itô's Formula) 11.5.
Let f : Rⁿ → R be a twice continuously differentiable function, and also let X = (X¹, ..., Xⁿ) be a continuous semimartingale in Rⁿ. Then

f(X_t) − f(X_0) = Σ_{i=1}ⁿ ∫₀ᵗ (∂f/∂xⁱ)(X_s) dXⁱ_s + ½ Σ_{i,j=1}ⁿ ∫₀ᵗ (∂²f/∂xⁱ∂xʲ)(X_s) d⟨Xⁱ, Xʲ⟩_s.

Proof
To prove Itô's formula, first consider the n = 1 case to simplify the notation. Let A be the collection of C² (twice continuously differentiable) functions f : R → R for which it holds. Clearly A is a vector space; in fact we shall show that it is also an algebra. To do this we must check that if f and g are in A, then their product fg is also in A.

Let F_t = f(X_t) and G_t = g(X_t) be the associated semimartingales. From the integration by parts formula

F_tG_t − F_0G_0 = ∫₀ᵗ F_s dG_s + ∫₀ᵗ G_s dF_s + ⟨F, G⟩_t.

However since by assumption f and g are in A, Itô's formula may be applied to them individually, so

∫₀ᵗ F_s dG_s = ∫₀ᵗ f(X_s) g′(X_s) dX_s + ½ ∫₀ᵗ f(X_s) g″(X_s) d⟨X, X⟩_s,

and similarly for ∫₀ᵗ G_s dF_s. Also, by the Kunita-Watanabe formula extended to continuous local martingales, we have

⟨F, G⟩_t = ∫₀ᵗ f′(X_s) g′(X_s) d⟨X, X⟩_s.

Hence

F_tG_t − F_0G_0 = ∫₀ᵗ (F_s g′(X_s) + f′(X_s) G_s) dX_s + ½ ∫₀ᵗ (F_s g″(X_s) + 2f′(X_s)g′(X_s) + f″(X_s) G_s) d⟨X, X⟩_s.

Since (fg)′ = fg′ + f′g and (fg)″ = fg″ + 2f′g′ + f″g, this is just what Itô's formula states for fg, and so Itô's formula also applies to fg; hence fg ∈ A.

Since trivially f(x) = x is in A, and A is an algebra and a vector space, this implies that A contains all polynomials. So to complete the proof we must approximate f by polynomials (which we can do by standard functional analysis), and check that in the limit we obtain Itô's formula.

Let the continuous semimartingale X have Doob-Meyer decomposition X_t = X_0 + M_t + A_t, where M is a continuous local martingale and A is a finite variation process. Introduce a sequence of stopping times

U_n := inf{t : |X_t| + ⟨X⟩_t > n}.

Hence {U_n} is a sequence of stopping times tending to infinity, and on (0, U_n] the process X is uniformly bounded by n, so we can consider X as a bounded process there. Now we shall prove Itô's formula for twice continuously differentiable f restricted to the interval [0, U_n]. Consider a polynomial sequence f_k approximating f, in the sense that for r = 0, 1, 2, f_k^(r) → f^(r) uniformly on a compact interval; in particular

sup_{|x|≤n} |f_k^(r)(x) − f^(r)(x)| → 0 as k → ∞, for r = 0, 1, 2.

We have proved that Itô's formula holds for all polynomials, so it holds for f_k and hence

f_k(X_{t∧Uₙ}) − f_k(X_0) = ∫₀^{t∧Uₙ} f_k′(X_s) dX_s + ½ ∫₀^{t∧Uₙ} f_k″(X_s) d⟨X⟩_s.

We can rewrite the above as

f_k(X_{t∧Uₙ}) − f_k(X_0) = ∫₀^{t∧Uₙ} f_k′(X_s) dM_s + ∫₀^{t∧Uₙ} f_k′(X_s) dA_s + ½ ∫₀^{t∧Uₙ} f_k″(X_s) d⟨M⟩_s,

since ⟨X⟩ = ⟨M⟩. From the Itô isometry we get the required convergence of the dM_s term (the convergence of f_k′ being uniform on the compact interval), and by the Dominated Convergence Theorem for Lebesgue-Stieltjes integrals the remaining two terms also converge; letting k → ∞, and then n → ∞, we obtain Itô's formula for f. The n-dimensional case follows in exactly the same way, using polynomials in n variables.
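Itô's formula can also be watched "converging" numerically: along a simulated Brownian path, f(B_t) should match the discretised sums on the right hand side. The Python sketch below (an illustration, not part of the notes; mesh and seed are arbitrary) does this for f(x) = x³, where f′(x) = 3x² and f″(x) = 6x.

```python
import numpy as np

# Pathwise check of Itô's formula for f(x) = x^3 on a Brownian path:
# f(B_t) - f(B_0) ~ sum f'(B) dB + (1/2) sum f''(B) dt,   since d<B>_s = ds.
rng = np.random.default_rng(0)
n, t = 200_000, 1.0
dt = t / n
dB = rng.normal(0.0, np.sqrt(dt), n)
B = np.concatenate(([0.0], np.cumsum(dB)))

ito_sum = np.sum(3 * B[:-1] ** 2 * dB) + 0.5 * np.sum(6 * B[:-1] * dt)
err = abs(B[-1] ** 3 - ito_sum)
print(err)  # small discretisation error, shrinking as the mesh is refined
```

Note the left endpoints B[:-1] in both sums: the stochastic integral is defined against previsible integrands, and using right endpoints would converge to a different limit.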
11.1. Applications of Itô's Formula

Let B_t be a standard Brownian motion. The aim of this example is to establish that

∫₀ᵗ B_s dB_s = ½B_t² − ½t.

To prove this we apply Itô's formula to the function f(x) = x². We obtain

f(B_t) − f(B_0) = ∫₀ᵗ (∂f/∂x)(B_s) dB_s + ½ ∫₀ᵗ (∂²f/∂x²)(B_s) d⟨B, B⟩_s.

Noting that B_0 = 0 for a standard Brownian motion, and that ⟨B, B⟩_t = t, we see that

B_t² = 2∫₀ᵗ B_s dB_s + t,

whence we derive that

∫₀ᵗ B_s dB_s = B_t²/2 − t/2.

This example gives a nice simple demonstration that all our hard work has achieved something: the result isn't the same as that which would be given by the 'logical' extension of the usual integration rules. For those who have read the foregoing material carefully, there are grounds to complain that there are simpler ways to establish this result, notably by consideration of the definition of the quadratic variation process; however the point of this example was to show how Itô's formula can help in the actual evaluation of stochastic integrals, not to establish a totally new result.

11.2. Exponential Martingales

Exponential martingales play an important part in the theory. Suppose X is a continuous semimartingale starting from zero. Define

Z_t = exp(X_t − ½⟨X⟩_t).

This Z_t is called the exponential semimartingale associated with X_t, and it is the solution of the stochastic differential equation dZ_t = Z_t dX_t, with Z_0 = 1, that is

Z_t = 1 + ∫₀ᵗ Z_s dX_s,

so clearly if X ∈ M_c^loc (X is a continuous local martingale), then this implies, by the stability property of stochastic integration, that Z ∈ M_c^loc also.
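The failure of the 'logical' rule is visible directly in simulation: left-endpoint (Itô) Riemann sums for ∫B dB converge to B_t²/2 − t/2, while trapezoidal (Stratonovich-style) sums converge to the naively expected B_t²/2. A Python sketch, purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
n, t = 200_000, 1.0
dt = t / n
dB = rng.normal(0.0, np.sqrt(dt), n)
B = np.concatenate(([0.0], np.cumsum(dB)))

ito = np.sum(B[:-1] * dB)                    # left endpoints (Itô)
strat = np.sum(0.5 * (B[:-1] + B[1:]) * dB)  # trapezoidal (Stratonovich)

print(abs(ito - (B[-1] ** 2 / 2 - t / 2)))   # small
print(abs(strat - B[-1] ** 2 / 2))           # essentially zero (telescopes)
```

The gap between the two sums is exactly half the accumulated squared increments, which is the quadratic variation term t/2.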
Proof
For existence, apply Itô's formula to f(x) = exp(x) to obtain

d(exp(Y_t)) = exp(Y_t) dY_t + ½ exp(Y_t) d⟨Y⟩_t.

Hence, taking Y_t = X_t − ½⟨X⟩_t (so that ⟨Y⟩ = ⟨X⟩, the finite variation part contributing nothing),

d(exp(X_t − ½⟨X⟩_t)) = exp(X_t − ½⟨X⟩_t) d(X_t − ½⟨X⟩_t) + ½ exp(X_t − ½⟨X⟩_t) d⟨X⟩_t
 = exp(X_t − ½⟨X⟩_t) dX_t − ½ exp(X_t − ½⟨X⟩_t) d⟨X⟩_t + ½ exp(X_t − ½⟨X⟩_t) d⟨X⟩_t
 = Z_t dX_t.

Hence Z_t certainly solves the equation. Now to check uniqueness, define Y_t = exp(−X_t + ½⟨X⟩_t); we wish to show that for every solution Z of the stochastic differential equation, Z_tY_t is a constant. By a similar application of Itô's formula,

dY_t = −Y_t dX_t + Y_t d⟨X⟩_t,

whence by integration by parts (alternatively consider Itô applied to f(x, y) = xy), remembering that dZ_t = Z_t dX_t,

d(Z_tY_t) = Z_t dY_t + Y_t dZ_t + d⟨Z, Y⟩_t
 = Z_t(−Y_t dX_t + Y_t d⟨X⟩_t) + Y_tZ_t dX_t + (−Y_tZ_t) d⟨X⟩_t
 = 0.

So Z_tY_t is a constant, and since Z_0Y_0 = 1, every solution satisfies Z_t = exp(X_t − ½⟨X⟩_t); hence this is the unique solution of the stochastic differential equation dZ_t = Z_t dX_t with Z_0 = 1.
Example
Let X_t = λB_t, for an arbitrary scalar λ. Clearly X_t is a continuous local martingale, so the associated exponential martingale is

M_t = exp(λB_t − ½λ²t).

It is clear that the exponential semimartingale of a real valued (local) martingale must be non-negative, hence M is a non-negative local martingale, and thus by application of Fatou's lemma, via the following theorem, it is a supermartingale; thus E(M_t) ≤ 1 for all t.

Theorem 11.6.
Let M be a non-negative local martingale such that E|M_0| < ∞. Then M is a supermartingale.

Proof
Let T_n be a reducing sequence for M − M_0. Then for t > s ≥ 0,

E(M_{t∧Tₙ}|F_s) = E(M_0|F_s) + E(M_{t∧Tₙ} − M_0|F_s) = M_0 + M_{s∧Tₙ} − M_0 = M_{s∧Tₙ}.

Now by application of the conditional form of Fatou's lemma,

E(M_t|F_s) = E(lim inf_{n→∞} M_{t∧Tₙ}|F_s) ≤ lim inf_{n→∞} E(M_{t∧Tₙ}|F_s) = M_s.

Thus M is a supermartingale as required.

Checking for a Martingale
The general exponential semimartingale is useful, but in many applications, not the least of which will be Girsanov's formula, an actual martingale will be needed. How do we go about checking if a local martingale is a martingale anyway? It will turn out that there are various methods, some of which crop up in the section on filtering. First I shall present a simple example and then prove a more general theorem.

The exponential of λB_t
We continue our example from above. Let M_t = exp(λB_t − ½λ²t) be the exponential semimartingale associated with a standard Brownian motion B_t. By the previous argument we know that

M_t = 1 + ∫₀ᵗ λM_s dB_s.

Fix T a constant time, which is of course a stopping time; then B^T is an L² bounded martingale (E(B^T_t)² = t ∧ T ≤ T). We then show that M^T is in L²(B^T) as follows:

‖M^T‖²_{B^T} = E(∫₀^T M_s² d⟨B⟩_s) = E(∫₀^T M_s² ds) ≤ E(∫₀^T exp(2λB_s) ds) = ∫₀^T E(exp(2λB_s)) ds = ∫₀^T exp(2λ²s) ds < ∞.

In the final equality we use the fact that B_s is distributed as N(0, s), together with the moment generating function of the normal distribution. Thus by the integration isometry theorem we have that (M^T·B)_t is an L² bounded martingale; since M^T_t = 1 + λ(M^T·B)_t, for every such T the process M^T is a martingale, which implies that M is a martingale.

A common error is to think that it is sufficient to show that a local martingale is locally bounded in L² to show that it is a martingale: this is not sufficient, as should be made clear by this example!
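Since M is a true martingale started from 1, in particular E(M_t) = 1 for every t. This is cheap to confirm by simulation, using the fact that B_t is exactly an N(0, t) variable (a sketch; λ, t and the sample size are arbitrary illustrative choices):

```python
import numpy as np

# E[exp(lambda*B_t - lambda^2 * t / 2)] = 1 for all t (true martingale).
rng = np.random.default_rng(7)
lam, t, n = 0.8, 1.0, 400_000
Bt = rng.normal(0.0, np.sqrt(t), n)
Z = np.exp(lam * Bt - 0.5 * lam**2 * t)
print(Z.mean())  # ~ 1
```

For large λ the estimator becomes heavy-tailed (Var(Z) = e^(λ²t) − 1 grows quickly), which is a numerical echo of the fact that exponential local martingales can fail to be true martingales without some control on the exponent.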
The Exponential Martingale Inequality
We have seen a specific proof that a certain (important) exponential martingale is a true martingale; we now show a more general argument.

Theorem 11.7.
Let M be a continuous local martingale, starting from zero. Suppose that for each t there exists a constant K_t < ∞ such that ⟨M⟩_t ≤ K_t a.s. Then for each θ > 0 the associated exponential semimartingale

Z_t = exp(θM_t − ½θ²⟨M⟩_t)

is a true martingale. Furthermore, for every t, and every y > 0,

P(sup_{0≤s≤t} M_s > y) ≤ exp(−y²/2K_t).

Proof
We have already noted that the exponential semimartingale Z_t is a supermartingale, so E(Z_t) ≤ 1 for all t ≥ 0, and hence for θ > 0 and y > 0, by the maximal inequality for non-negative supermartingales,

P(sup_{s≤t} M_s > y) ≤ P(sup_{s≤t} Z_s > exp(θy − ½θ²K_t)) ≤ exp(−θy + ½θ²K_t).

Optimizing over θ (take θ = y/K_t) now gives the desired result.

For the first part, we establish the following bound. We shall use the useful result

∫₀^∞ P(X ≥ log λ) dλ = ∫₀^∞ E(1_{e^X ≥ λ}) dλ = E(∫₀^∞ 1_{e^X ≥ λ} dλ) = E(e^X).

Since Z_s ≤ exp(θM_s), we have P(sup_{0≤s≤t} Z_s ≥ λ) ≤ P(sup_{0≤s≤t} M_s ≥ (log λ)/θ), and hence

E(sup_{0≤s≤t} Z_s) ≤ 1 + ∫₁^∞ P(sup_{0≤s≤t} Z_s ≥ λ) dλ ≤ 1 + ∫₁^∞ exp(−(log λ)²/2θ²K_t) dλ < ∞.   (∗)

We have previously noted that Z is a local martingale; let T_n be a reducing sequence for Z, so Z^{Tₙ} is a martingale, hence

E(Z_{t∧Tₙ}|F_s) = Z_{s∧Tₙ}.   (∗∗)

We note that Z_{t∧Tₙ} is dominated by sup_{0≤s≤t} Z_s, which is integrable by (∗), and thus we can apply the (conditional) dominated convergence theorem to (∗∗) as n → ∞ to establish that Z is a true martingale.

From this it is clear that if H is any bounded previsible process, then

exp(∫₀ᵗ H_s dB_s − ½ ∫₀ᵗ H_s² ds)

is a true martingale, since this is the exponential semimartingale associated with the process ∫H dB, whose quadratic variation ∫₀ᵗ H_s² ds is bounded for each t.

Corollary 11.8.
If the bounds K_t on ⟨M⟩ are uniform, that is if K_t ≤ C for all t, then the exponential martingale is uniformly integrable.

Proof
Note that the bound (∗) extends to a uniform bound

E(sup_{t≥0} Z_t) ≤ 1 + ∫₁^∞ exp(−(log λ)²/2C) dλ < ∞.

Hence Z is dominated by an integrable random variable and is thus a uniformly integrable martingale.

Corollary 11.9.
For all ε, δ > 0,

P(sup_{t≥0} M_t ≥ ε and ⟨M⟩_∞ ≤ δ) ≤ exp(−ε²/2δ).

Proof
Set T = inf{t ≥ 0 : ⟨M⟩_t ≥ δ}. Then ⟨M^T⟩_t ≤ δ for all t, so the conditions of the previous theorem apply to M^T with K_t = δ; on the event {⟨M⟩_∞ ≤ δ} the processes M and M^T agree, hence

P(sup_{t≥0} M_t ≥ ε and ⟨M⟩_∞ ≤ δ) ≤ P(sup_{t≥0} M^T_t ≥ ε) ≤ exp(−ε²/2δ).
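For M = B a standard Brownian motion we have ⟨B⟩_t = t, so K_t = t and the inequality reads P(sup_{s≤t} B_s > y) ≤ exp(−y²/2t). A quick Monte Carlo comparison (illustrative only; the discretised running maximum slightly underestimates the true supremum, which only makes the bound easier to satisfy):

```python
import numpy as np

rng = np.random.default_rng(11)
n_steps, n_paths, t, y = 1000, 20_000, 1.0, 2.0
dt = t / n_steps

B = np.zeros(n_paths)
running_max = np.zeros(n_paths)
for _ in range(n_steps):
    B += rng.normal(0.0, np.sqrt(dt), n_paths)
    np.maximum(running_max, B, out=running_max)

emp = np.mean(running_max > y)     # reflection principle gives about 0.0455
bound = np.exp(-y**2 / (2 * t))    # exp(-2), about 0.135
print(emp, bound)
```

The bound is loose here by a factor of roughly three; its virtue is that it needs only a bound on ⟨M⟩_t, not the full distribution of M.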
12. Lévy Characterisation of Brownian Motion

A very useful result about the characterisation of Brownian motion can be proved using the Itô calculus.

Theorem 12.1.
Let {Bⁱ_t}_{t≥0} be continuous local martingales starting from zero, for i = 1, ..., n. Then B_t = (B¹_t, ..., Bⁿ_t) is a Brownian motion with respect to (Ω, F, P), adapted to the filtration F_t, if and only if

⟨Bⁱ, Bʲ⟩_t = δᵢⱼ t for all i, j ∈ {1, ..., n}.

Proof
In these circumstances, the statement that B_t is a Brownian motion is by definition equivalent to stating that B_t − B_s is independent of F_s and is distributed normally with mean zero and covariance matrix (t − s)I.

Clearly if B_t is a Brownian motion then the covariation result follows trivially from the definitions. Now to establish the converse, we assume ⟨Bⁱ, Bʲ⟩_t = δᵢⱼt for all i, j ∈ {1, ..., n}, and shall prove that B_t is a Brownian motion. Observe that for fixed θ ∈ Rⁿ we can define M^θ_t by

M^θ_t := f(B_t, t) = exp(i(θ, B_t) + ½‖θ‖²t).

By application of Itô's formula to f we obtain (in differential form, using the Einstein summation convention)

d(f(B_t, t)) = (∂f/∂xʲ)(B_t, t) dBʲ_t + (∂f/∂t)(B_t, t) dt + ½ (∂²f/∂xʲ∂xᵏ)(B_t, t) d⟨Bʲ, Bᵏ⟩_t
 = iθⱼ f(B_t, t) dBʲ_t + ½‖θ‖² f(B_t, t) dt − ½ θⱼθₖ δⱼₖ f(B_t, t) dt
 = iθⱼ f(B_t, t) dBʲ_t.

Hence

M^θ_t = 1 + ∫₀ᵗ iθⱼ f(B_s, s) dBʲ_s,

which is a sum of stochastic integrals with respect to continuous local martingales and is hence itself a continuous local martingale. But note that for each t,

|M^θ_t| = e^{½‖θ‖²t},

hence for any fixed time t₀ the stopped process (M^θ)^{t₀} satisfies |(M^θ)^{t₀}_t| ≤ e^{½‖θ‖²t₀} < ∞, and so is a bounded local martingale; hence (M^θ)^{t₀} is a genuine martingale. Thus for 0 ≤ s < t we have

E(exp(i(θ, B_t − B_s))|F_s) = exp(−½(t − s)‖θ‖²) a.s.

However this is just the characteristic function of a normal random variable following N(0, (t − s)I); so by the Lévy characteristic function theorem B_t − B_s is an N(0, (t − s)I) random variable and, since the conditional characteristic function is deterministic, B_t − B_s is also independent of F_s.
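A concrete illustration of the theorem: M_t = ∫₀ᵗ sgn(B_s) dB_s is a continuous local martingale with ⟨M⟩_t = ∫₀ᵗ sgn(B_s)² ds = t, so by Theorem 12.1 it is itself a Brownian motion, despite being built from a highly path-dependent integrand. A simulation sketch (illustrative; sgn(0) is set to 1, an arbitrary convention):

```python
import numpy as np

# M_t = int_0^t sgn(B_s) dB_s has <M>_t = t, so Lévy's characterisation
# says M is again a Brownian motion; check the moments of M_1 by simulation.
rng = np.random.default_rng(5)
n_steps, n_paths = 1000, 20_000
dt = 1.0 / n_steps

B = np.zeros(n_paths)
M = np.zeros(n_paths)
for _ in range(n_steps):
    sgn = np.where(B >= 0.0, 1.0, -1.0)       # previsible: B before the step
    dB = rng.normal(0.0, np.sqrt(dt), n_paths)
    M += sgn * dB
    B += dB

print(M.mean(), M.var())  # ~ 0 and ~ 1, as for B_1 itself
```

Using the pre-increment value of B for the sign is essential: the integrand must be previsible for the integral to be a local martingale.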
13. Time Change of Brownian Motion

This result is one of frequent application. Essentially it tells us that any continuous local martingale starting from zero can be written as a time change of Brownian motion; so, modulo a time change, a Brownian motion is the most general kind of continuous local martingale.

Proposition 13.1.
Let M be a continuous local martingale starting from zero, such that ⟨M⟩_t → ∞ as t → ∞. Then define

τ_s := inf{t > 0 : ⟨M⟩_t > s}.

(i) This τ_s is an F stopping time.
(ii) ⟨M⟩_{τ_s} = s.
(iii) The local martingale M can be written as a time change of Brownian motion as M_t = B̃_{⟨M⟩_t}. Moreover the process Ã_s := M_{τ_s} is an F̃_s adapted Brownian motion, where F̃_s is the time-changed σ-algebra, i.e. F̃_s = F_{τ_s}.

Proof
We may assume that the map t → ⟨M⟩_t is strictly increasing; note that the map s → τ_s is then the inverse to t → ⟨M⟩_t, from which (i) and (ii) follow. Then define

Ã_s := M_{τ_s},  T_n := inf{t : ⟨M⟩_t > n},  U_n := ⟨M⟩_{Tₙ}.

Note that from these definitions

τ_{t∧Uₙ} = inf{s > 0 : ⟨M⟩_s > t ∧ U_n} = inf{s > 0 : ⟨M⟩_s > t ∧ ⟨M⟩_{Tₙ}} = T_n ∧ τ_t,

so

Ã^{Uₙ}_s = Ã_{s∧Uₙ} = M^{Tₙ}_{τ_s}.

Now note that U_n is an F̃_t stopping time, since {U_n ≤ t} ≡ {⟨M⟩_{Tₙ} ≤ t} ≡ {T_n ≤ τ_t}, and the latter event is clearly F_{τ_t} measurable, i.e. it is F̃_t measurable. We may now apply the optional stopping theorem to the UI martingale M^{Tₙ}, which yields

E(Ã^{Uₙ}_t | F̃_s) = E(Ã_{t∧Uₙ} | F̃_s) = E(M^{Tₙ}_{τ_t} | F_{τ_s}) = M^{Tₙ}_{τ_s} = Ã^{Uₙ}_s.

So Ã_t is an F̃_t local martingale. Similarly, since (M² − ⟨M⟩)^{Tₙ} is a UI martingale, for 0 < r < s we have

E(Ã²_{s∧Uₙ} − (s ∧ U_n) | F̃_r) = E((M^{Tₙ}_{τ_s})² − ⟨M⟩^{Tₙ}_{τ_s} | F_{τ_r}) = (M^{Tₙ}_{τ_r})² − ⟨M⟩^{Tₙ}_{τ_r} = Ã²_{r∧Uₙ} − (r ∧ U_n).

Hence Ã²_t − t is an F̃_t local martingale.

Before we can apply Lévy's characterisation theorem we must check that Ã is continuous; that is, we must check that for almost every ω, M is constant on each interval of constancy of ⟨M⟩. By localisation it suffices to consider M a square integrable martingale. Let q be a positive rational, and define

S_q := inf{t > q : ⟨M⟩_t > ⟨M⟩_q};

then it is enough to show that M is constant on [q, S_q]. But M² − ⟨M⟩ is a martingale, hence by the optional stopping theorem

E(M²_{S_q} − M²_q | F_q) = E(⟨M⟩_{S_q} − ⟨M⟩_q | F_q) = 0, as ⟨M⟩_q = ⟨M⟩_{S_q},

and hence

E((M_{S_q} − M_q)² | F_q) = 0.

So M is a.s. constant on [q, S_q]; taking the union over all positive rationals q establishes that Ã is continuous.

Thus Ã is a continuous F̃_t adapted local martingale with ⟨Ã⟩_s = s, and so by the Lévy characterisation theorem Ã_s is a Brownian motion. Writing B̃ := Ã, and using that τ is the inverse of ⟨M⟩, we obtain M_t = Ã_{⟨M⟩_t} = B̃_{⟨M⟩_t}, which establishes (iii).
13.1. Gaussian Martingales

The time change of Brownian motion can be used to prove the following useful theorem.

Theorem 13.2.
Let M be a continuous local martingale starting from zero, and suppose ⟨M⟩_t is deterministic; that is, suppose we can find a deterministic function f, taking values in the non-negative real numbers, such that ⟨M⟩_t = f(t) a.s. Then M is a Gaussian martingale (i.e. M_t has a Gaussian distribution for all t), and at time t it has distribution N(0, f(t)).

Proof
Note that by the time change of Brownian motion theorem we can write M_t as a time change of Brownian motion through M_t = B̃_{⟨M⟩_t}, where B̃ is a standard Brownian motion. By hypothesis ⟨M⟩_t = f(t), a deterministic function, so

M_t = B̃_{f(t)},

but the right hand side is a Gaussian random variable following N(0, f(t)). Hence M is a Gaussian martingale, and at time t it has distribution given by N(0, f(t)).
As a corollary, consider the stochastic integral of a purely deterministic function with respect to a Brownian motion.

Corollary 13.3.
Let f(t) be a deterministic function of t; then M defined by

M_t := ∫₀ᵗ f(s) dB_s

satisfies

M_t ∼ N(0, ∫₀ᵗ f(s)² ds).

Proof
From the definition of M via a stochastic integral with respect to a continuous martingale, it is clear that M is a continuous local martingale, and by the Kunita-Watanabe result the quadratic variation of M is given by

⟨M⟩_t = ∫₀ᵗ f(s)² ds,

which is deterministic; hence the result follows from the previous theorem.

This result can also be established directly, in a fashion which is very similar to the proof of the Lévy characterisation theorem. Consider Z defined via

Z_t = exp(iθM_t + ½θ²⟨M⟩_t).

As in the Lévy characterisation proof, we see that this is a continuous local martingale, and by boundedness it is furthermore a martingale, hence E(Z_0) = E(Z_t), whence

E(exp(iθM_t)) = exp(−½θ² ∫₀ᵗ f(s)² ds),

since ⟨M⟩_t is deterministic, which is the characteristic function of the appropriate normal distribution.
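For instance, with f(s) = s the corollary gives ∫₀¹ s dB_s ∼ N(0, 1/3). A simulation sketch (illustrative choices of mesh and sample size):

```python
import numpy as np

# M_1 = int_0^1 s dB_s should be N(0, 1/3) by Corollary 13.3.
rng = np.random.default_rng(13)
n_steps, n_paths = 1000, 40_000
dt = 1.0 / n_steps

M = np.zeros(n_paths)
for k in range(n_steps):
    M += (k * dt) * rng.normal(0.0, np.sqrt(dt), n_paths)  # left endpoints

print(M.mean(), M.var())  # ~ 0 and ~ 1/3
```

The variance is the deterministic quadratic variation ∫₀¹ s² ds = 1/3, no path-dependence involved.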
14. Girsanov's Theorem

Girsanov's theorem is an element of stochastic calculus which does not have an analogue in standard calculus.

14.1. Change of measure

When we wish to compare two measures P and Q, we don't want either of them simply to throw information away; this motivates the following definition of equivalence of two measures.

Definition 14.1.
Two measures P and Q are said to be equivalent if they operate on the same sample space, and if A is any event in the sample space then P(A) > 0 ⇔ Q(A) > 0. In other words, P is absolutely continuous with respect to Q and Q is absolutely continuous with respect to P.

If P and Q are equivalent measures, then since they are positive on the same events they can be related by the Radon-Nikodym derivative.

Theorem 14.2.
If P and Q are equivalent measures and X_t is an F_t adapted process, then the following results hold:

E_Q(X_t) = E_P((dQ/dP) X_t),
E_Q(X_t | F_s) = L_s⁻¹ E_P(L_t X_t | F_s), where L_s = E_P(dQ/dP | F_s).

Here L_t is the Radon-Nikodym derivative of Q with respect to P, conditioned on F_t. The first result basically shows that this is a martingale, and the second is a continuous time version of Bayes' theorem.

Proof
The first part is basically the statement that the Radon-Nikodym derivative exists, which follows because the measures P and Q are equivalent. For the second part, let Y be an F_t measurable random variable such that E_Q(|Y|) < ∞. We shall prove that

E_Q(Y | F_s) = (1/L_s) E_P(Y L_t | F_s) a.s. (P and Q).

For any A ∈ F_s, using the definition of conditional expectation, we have that

E_Q(1_A (1/L_s) E_P(Y L_t | F_s)) = E_P(1_A E_P(Y L_t | F_s)) = E_P(1_A Y L_t) = E_Q(1_A Y).

Substituting Y = X_t gives the desired result.

Theorem (Girsanov).
Let M be a continuous local martingale, and let Z be the associated exponential martingale

Z_t = exp(M_t − ½⟨M⟩_t).

If Z is uniformly integrable, then a new measure Q, equivalent to P, may be defined by dQ/dP = Z_∞. Then if X is a continuous P local martingale, X − ⟨X, M⟩ is a Q local martingale.

Proof
Since Z_∞ exists a.s. and defines a uniformly integrable martingale, a version of which is given by Z_t = E(Z_∞ | F_t), the measure Q constructed thus is a probability measure which is equivalent to P. Define a sequence of stopping times, which tend to infinity, via

T_n := inf{t ≥ 0 : |X_t| ≥ n or |⟨X, M⟩_t| ≥ n}.

Now consider the process Y defined via Y := X^{Tₙ} − ⟨X, M⟩^{Tₙ}. By integration by parts, remembering that dZ_t = Z_t dM_t as Z is the exponential martingale associated with M, for 0 ≤ t ≤ T_n,

d(Z_tY_t) = Z_t dY_t + Y_t dZ_t + d⟨Z, Y⟩_t
 = Z_t (dX_t − d⟨X, M⟩_t) + Y_tZ_t dM_t + Z_t d⟨X, M⟩_t
 = Z_t dX_t + Y_tZ_t dM_t,

where the result d⟨Z, Y⟩_t = Z_t d⟨X, M⟩_t follows from the Kunita-Watanabe theorem. Hence ZY is a P local martingale. But since Z is uniformly integrable, and Y is bounded (by construction of the stopping time T_n), ZY is a genuine P martingale. Hence for s < t and A ∈ F_s,

E_Q[(Y_t − Y_s)1_A] = E[Z_∞(Y_t − Y_s)1_A] = E[(Z_tY_t − Z_sY_s)1_A] = 0,

hence Y is a Q martingale. Thus (X − ⟨X, M⟩)^{Tₙ} is a Q martingale; since T_n ↑ ∞ as n → ∞, T_n is a reducing sequence, and X − ⟨X, M⟩ is a Q local martingale.

Corollary 14.3.
Let W̃_t be a P Brownian motion; then W_t := W̃_t − ⟨W̃, M⟩_t is a Q Brownian motion.

Proof
Use Lévy's characterisation of Brownian motion: W is a continuous Q local martingale by Girsanov's theorem, and since W̃_t is a P Brownian motion, ⟨W, W⟩_t = ⟨W̃, W̃⟩_t = t (the covariation being unaffected by the finite variation term, or by passing to an equivalent measure), so W is a Q Brownian motion.
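A minimal numerical illustration of the change of measure (assumptions, not from the notes: take M_t = −μB_t on [0, T], so Z_T = exp(−μB_T − ½μ²T), and by Corollary 14.3 the process W_t = B_t + μt is a Q Brownian motion; equivalently, reweighting P-samples by Z_T should remove the drift of W):

```python
import numpy as np

# Under dQ/dP = exp(-mu*B_T - mu^2*T/2), W_T = B_T + mu*T should be a
# centred N(0, T) variable under Q; check its first two Q-moments by
# reweighting P-samples with Z_T.
rng = np.random.default_rng(17)
mu, T, n = 0.5, 1.0, 400_000
BT = rng.normal(0.0, np.sqrt(T), n)          # exact P-sample of B_T
Z = np.exp(-mu * BT - 0.5 * mu**2 * T)       # Radon-Nikodym weight
W = BT + mu * T

print(np.mean(Z))         # ~ 1  (Z integrates to one: Q is a probability)
print(np.mean(Z * W))     # ~ 0  (Q-mean of W)
print(np.mean(Z * W**2))  # ~ T  (Q-variance of W)
```

Only the terminal value B_T is needed here because the test functions depend only on time T; path functionals would require simulating whole paths and the same weight Z_T.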
15. Brownian Martingale Representation Theorem

The following theorem has many applications, for example in the rigorous study of mathematical finance, even though the result is purely an existence theorem. The Malliavin calculus offers methods by which the process H in the following theorem can be stated explicitly, but these methods are beyond the scope of these notes!

Theorem 15.1.
Let B_t be a Brownian motion on Rⁿ and let G_t be the usual augmentation of the filtration generated by B_t. If Y is L² integrable and is measurable with respect to G_∞, then there exists a previsible G_t measurable process H_s, uniquely defined up to evanescence, such that

E(Y | G_t) = E(Y) + ∫₀ᵗ H_s · dB_s.   (1)

The proof of this result can seem hard if you are not familiar with functional analysis style arguments. We follow the usual approach in such proofs. The outline is to describe all Y's which are G_T measurable and cannot be represented in the form (1) as belonging to the orthogonal complement of a space; then we show that for Z in this orthogonal complement, E(ZX) = 0 for all X in a large space of G_T measurable functions; finally we show that this space is sufficiently big that actually we have proved this for all G_T measurable functions, and we are done!

Proof
Without loss of generality prove the result in the case E(Y) = 0, where Y is L² integrable and measurable with respect to G_T for some constant T > 0. Define the space

L²_T(B) = { H : H is G_t previsible and E(∫₀^T ‖H_s‖² ds) < ∞ }.

Consider the stochastic integral map I : L²_T(B) → L²₀(G_T) defined by

I(H) = ∫₀^T H_s · dB_s.

As a consequence of the Itô isometry theorem, this map is an isometry. Hence the image V under I of the Hilbert space L²_T(B) is complete, and hence a closed subspace of L²₀(G_T) = {X ∈ L²(G_T) : E(X) = 0}. The theorem will be proved if we can establish that the image is the whole space, so consider the orthogonal complement of V in L²₀(G_T); we aim to show that every element of this orthogonal complement is zero. Suppose then that Z is in the orthogonal complement of V, thus

E(ZX) = 0 for all X ∈ V.   (2)

We can define Z_t = E(Z | G_t), which is an L² bounded martingale. We know that the sigma field G_0 is trivial by the 0-1 law, therefore Z_0 = E(Z | G_0) = E(Z) = 0.

Let H ∈ L²_T(B) and let N_T = I(H). Clearly N_T ∈ V as it is the image under I of some H, and we may define N_t = E(N_T | G_t) for 0 ≤ t ≤ T. Let S be a stopping time such that S ≤ T; then

N_S = E(N_T | G_S) = E(∫₀^S H_s · dB_s + ∫_S^T H_s · dB_s | G_S) = I(H 1_{(0,S]}),

so consequently N_S ∈ V. The orthogonality relation (2) then implies that E(ZN_S) = 0. Thus, using the martingale property of Z,

E(ZN_S) = E(E(ZN_S | G_S)) = E(N_S E(Z | G_S)) = E(Z_S N_S) = 0.

Since Z_T and N_T are square integrable, it follows that Z_tN_t is a uniformly integrable martingale.

Since the stochastic exponential of iθ·B may be written as

M^θ_t = E(iθ·B)_t = exp(iθ·B_t + ½‖θ‖²t) = 1 + ∫₀ᵗ iM^θ_s θ · dB_s,

such a process can be taken as H_s = iθM^θ_s in the definition of N_T, and by the foregoing argument we see that Z_tM^θ_t is a martingale. Thus

Z_sM^θ_s = E(Z_tM^θ_t | G_s) = E(Z_t exp(iθ·B_t + ½‖θ‖²t) | G_s),

whence

Z_s exp(−½‖θ‖²(t − s)) = E(Z_t exp(iθ·(B_t − B_s)) | G_s).

Consider a partition 0 < t₁ < t₂ < ... < t_m ≤ T. Conditioning on each G_{t_j} in turn and repeating the above argument, we establish that

E(Z_T exp(i Σ_j θ_j · (B_{t_j} − B_{t_{j−1}}))) = E(Z_0 exp(−½ Σ_j (t_j − t_{j−1})‖θ_j‖²)) = 0,   (3)

where the last equality follows since Z_0 = 0. This is true for any choices of θ_j ∈ Rⁿ for j = 1, ..., m.

The complex valued functions defined on (Rⁿ)^m by

P^(r)(x₁, ..., x_m) = Σ_{k=1}^K c_k^(r) exp(i Σ_{j=1}^m a_{j,k}^(r) · x_j)

form a linear space, are closed under complex conjugation, contain the constants, and clearly separate points (i.e. for given distinct points we can choose coefficients such that the functions take distinct values at these points). Therefore by the Stone-Weierstrass theorem (see [Bollobás, 1990]) their uniform closure is the space of continuous bounded complex valued functions (recall that the complex variable form of this theorem only requires local compactness of the domain). Therefore we can approximate any continuous bounded complex valued function f : (Rⁿ)^m → C by a sequence of such P's. But we have already shown in (3) that

E(Z_T P^(r)(B_{t₁}, ..., B_{t_m})) = 0,

hence by uniform approximation we can extend this to any continuous bounded f:

E(Z_T f(B_{t₁}, ..., B_{t_m})) = 0.

Now we use the monotone class framework: consider the class H of bounded functions X such that E(Z_T X) = 0. This H is a vector space, contains the constant one since E(Z) = 0, and the foregoing argument shows that it contains all bounded functions measurable with respect to the sigma field σ(B_{t₁}, ..., B_{t_m}). Thus the monotone class theorem implies that it contains all bounded functions which are measurable with respect to G_T, and by approximation E(Z_T X) = 0 extends to all X ∈ L²(G_T). The function Z_T is itself in L²(G_T), so we can take X = Z_T, whence E(Z_T²) = 0, which implies that Z_T = 0 a.s.; since Z = E(Z|G_T) = Z_T, we get Z = 0 a.s. This establishes the desired result.

The reader should examine the latter part of the proof carefully; it is in fact related to the proof that the set

{ exp(i ∫₀ᵗ θ_s · dB_s) : θ ∈ L^∞([0, t]; Rⁿ) }

is total in L¹. A set S is said to be total if E(aY) = 0 for all a ∈ S implies Y = 0 a.s. The full proof of this result will reappear in a more abstract form in the stochastic filtering section of these notes, but it will not be proved in detail here.
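For concrete random variables the integrand H can often be computed by hand. For example, with n = 1 and Y = B_T³ (so E(Y) = 0), Itô's formula gives E(Y | G_t) = B_t³ + 3(T − t)B_t, and hence H_s = 3B_s² + 3(T − s). A pathwise sanity check of the representation (1) at t = T (an illustrative sketch, not part of the notes):

```python
import numpy as np

# Y = B_T^3 has E(Y) = 0 and the representation
#   Y = int_0^T (3 B_s^2 + 3 (T - s)) dB_s.
rng = np.random.default_rng(23)
n, T = 200_000, 1.0
dt = T / n
dB = rng.normal(0.0, np.sqrt(dt), n)
B = np.concatenate(([0.0], np.cumsum(dB)))
s = np.arange(n) * dt

H = 3 * B[:-1] ** 2 + 3 * (T - s)   # previsible integrand, left endpoints
Y_repr = np.sum(H * dB)
err = abs(B[-1] ** 3 - Y_repr)
print(err)  # small discretisation error
```

The point of the theorem is that such an H exists for every square integrable Y; computing it explicitly, as done here by hand, is exactly what the Malliavin calculus mechanises.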
16. Stochastic Differential Equations

Stochastic differential equations arise naturally in various engineering problems, where the effects of random 'noise' perturbations to a system are being considered. For example, in the problem of tracking a satellite, we know that its motion will obey Newton's law to a very high degree of accuracy, so in theory we can integrate the trajectories from the initial point; however in practice there are other random effects which perturb the motion.

The variety of SDE to be considered here describes a diffusion process and has the form

dX_t = b(t, X_t) dt + σ(t, X_t) dB_t,   (∗)

where b_i(t, x) and σ_ij(t, x), for 1 ≤ i ≤ d and 1 ≤ j ≤ r, are Borel measurable functions. In practice such SDEs generally occur written in the Stratonovich form, but as we have seen the Itô form has numerous calculational advantages (especially the fact that local martingales are a closed class under the Itô integral), so it is conventional to transform the SDE to the Itô form before proceeding.

Lipschitz Conditions
Let ‖·‖ denote the usual Euclidean norm on R^d; we shall generalise the norm concept to a (d × r) matrix σ by defining

‖σ‖² = Σ_{i=1}^d Σ_{j=1}^r σ_ij².

Recall that a function f is said to be Lipschitz if there exists a constant K such that

‖f(x) − f(y)‖ ≤ K‖x − y‖.

The concept of Lipschitz continuity can be extended to that of local Lipschitz continuity, by requiring that for each n there exists K_n such that, for all x and y with ‖x‖ ≤ n and ‖y‖ ≤ n,

‖f(x) − f(y)‖ ≤ K_n‖x − y‖.

Strong Solutions
A strong solution of the SDE (∗) on the given probability space (Ω, F, P) with initial condition ζ is a process (X_t)_{t≥0} which has continuous sample paths, such that:
(i) X_t is adapted to the augmented filtration generated by the Brownian motion B and the initial condition ζ, which is denoted F_t;
(ii) P(X_0 = ζ) = 1;
(iii) for every 0 ≤ t < ∞ and for each 1 ≤ i ≤ d and 1 ≤ j ≤ r, the following holds almost surely:

∫₀ᵗ ( |b_i(s, X_s)| + σ_ij²(s, X_s) ) ds < ∞;

(iv) almost surely the following holds:

X_t = X_0 + ∫₀ᵗ b(s, X_s) ds + ∫₀ᵗ σ(s, X_s) dB_s.
x) are locally Lipshitz continuous in the spatial variable (x). Deﬁne a sequence of stopping times τn = inf{t ≥ 0 : Xt ≥ n}.Stochastic Diﬀerential Equations Strong Uniqueness of Solutions Theorem (Uniqueness) 16. x) and σ(t.1.
2
+
0
˜ Now we consider evaluating E Xt∧Sn − Xt∧Sn (a + b)2 ≤ 4(a2 + b2 ). ˜ E Xt∧Sn − Xt∧Sn
t∧Sn 2
. x) − σ(t. These stopping times are only needed because b and σ are being assumed merely to be locally Lipshitz. ˜ Now set Sn = min(τn . as it is most important to understand exactly where each step comes from! Proof ˜ Suppose that X and X are strong solutions of (∗). Therefore I make no apology for spelling the proof out in excessive detail. Hence ˜ Xt∧Sn − Xt∧Sn =
0 t∧Sn t∧Sn
57
Theorem (Strong Uniqueness).
Suppose that the coefficients $b$ and $\sigma$ are locally Lipschitz in the space variable, uniformly in time; that is, for every $n \ge 1$ there exists a constant $K_n > 0$ such that for every $t \ge 0$, $|x| \le n$ and $|y| \le n$,
\[ |b(t,x) - b(t,y)| + |\sigma(t,x) - \sigma(t,y)| \le K_n |x - y|. \]
Then strong uniqueness holds for the pair $(b, \sigma)$; which is to say that if $X$ and $\tilde X$ are two strong solutions of ($*$) relative to the same Brownian motion $B$ and initial condition $\zeta$, on the same probability space $(\Omega, \mathcal F, \mathbb P)$, then $X$ and $\tilde X$ are indistinguishable, that is
\[ \mathbb P\left( X_t = \tilde X_t \ \ \forall t : 0 \le t < \infty \right) = 1. \]

The proof of this result is important inasmuch as it illustrates the first example of a technique of bounding which recurs again and again throughout the theory of stochastic differential equations, and which will be needed again in the existence part of the proof. The following form omits some measure-theoretic details which are very important; for a clear treatment see Chung & Williams, chapter 10.

Proof
Define the stopping times $\tau_n := \inf\{t \ge 0 : |X_t| \ge n\}$ and $\tilde\tau_n := \inf\{t \ge 0 : |\tilde X_t| \ge n\}$, and set $S_n := \tau_n \wedge \tilde\tau_n$. Clearly $S_n$ is also a stopping time, and $S_n \to \infty$ a.s. as $n \to \infty$. Subtracting the integral forms of the two solutions and stopping at $S_n$ gives
\[ X_{t \wedge S_n} - \tilde X_{t \wedge S_n} = \int_0^{t \wedge S_n} \big( b(u, X_u) - b(u, \tilde X_u) \big)\, du + \int_0^{t \wedge S_n} \big( \sigma(u, X_u) - \sigma(u, \tilde X_u) \big)\, dB_u, \]
so, using the elementary inequality $(a+b)^2 \le 4(a^2 + b^2)$,
\[ \mathbb E \big| X_{t \wedge S_n} - \tilde X_{t \wedge S_n} \big|^2 \le 4\, \mathbb E \left| \int_0^{t \wedge S_n} \big( b(u,X_u) - b(u,\tilde X_u) \big)\, du \right|^2 + 4\, \mathbb E \left| \int_0^{t \wedge S_n} \big( \sigma(u,X_u) - \sigma(u,\tilde X_u) \big)\, dB_u \right|^2. \]
For the first term, recall the classical Hölder inequality (here in the form of the Cauchy-Schwarz inequality) for Lebesgue integrals, which states that for $p, q \in (1, \infty)$ with $p^{-1} + q^{-1} = 1$ the following inequality is satisfied:
\[ \int |f(x) g(x)|\, d\mu(x) \le \left( \int |f(x)|^p\, d\mu(x) \right)^{1/p} \left( \int |g(x)|^q\, d\mu(x) \right)^{1/q}. \]
Taking $p = q = 2$ yields
\[ \mathbb E \left| \int_0^{t \wedge S_n} \big( b(u,X_u) - b(u,\tilde X_u) \big)\, du \right|^2 \le \mathbb E \left[ \left( \int_0^{t \wedge S_n} 1\, ds \right) \int_0^{t \wedge S_n} \big| b(u,X_u) - b(u,\tilde X_u) \big|^2\, du \right] \le t\, \mathbb E \int_0^{t \wedge S_n} \big| b(u,X_u) - b(u,\tilde X_u) \big|^2\, du. \]
Considering the second term, we use the Itô isometry, which we remember states that $\|(H \cdot M)\|^2 = \|H\|_M$; here it gives
\[ \mathbb E \left| \int_0^{t \wedge S_n} \big( \sigma(u,X_u) - \sigma(u,\tilde X_u) \big)\, dB_u \right|^2 = \mathbb E \int_0^{t \wedge S_n} \big| \sigma(u,X_u) - \sigma(u,\tilde X_u) \big|^2\, du. \]
Thus, combining these two useful inequalities and using the $n$th local Lipschitz relation (valid since $|X_u| \le n$ and $|\tilde X_u| \le n$ for $u \le S_n$), we have for $0 \le t \le T$ that
\[ \mathbb E \big| X_{t \wedge S_n} - \tilde X_{t \wedge S_n} \big|^2 \le 4t\, \mathbb E \int_0^{t \wedge S_n} \big| b(u,X_u) - b(u,\tilde X_u) \big|^2\, du + 4\, \mathbb E \int_0^{t \wedge S_n} \big| \sigma(u,X_u) - \sigma(u,\tilde X_u) \big|^2\, du \le 4(T+1) K_n^2\, \mathbb E \int_0^t \big| X_{u \wedge S_n} - \tilde X_{u \wedge S_n} \big|^2\, du. \]
Now by Gronwall's lemma, which in this case has a zero term outside of the integral, we see that $\mathbb E |X_{t \wedge S_n} - \tilde X_{t \wedge S_n}|^2 = 0$, and hence that $\mathbb P(X_{t \wedge S_n} = \tilde X_{t \wedge S_n}) = 1$ for all $t < \infty$. Letting $n \to \infty$ we see that the same is true for $\{X_t\}_{t \ge 0}$ and $\{\tilde X_t\}_{t \ge 0}$; that is, these two processes are modifications of one another. Since both are continuous, they are thus indistinguishable. $\square$

Now we impose global Lipschitz conditions on the functions $b$ and $\sigma$ to produce an existence result.
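The Gronwall bound above has a simple numerical shadow. The sketch below (an illustration, not from the notes) runs two Euler-Maruyama solutions of the same SDE, driven by the *same* Brownian path but started a small distance apart, with illustrative Lipschitz coefficients $b(x) = -\sin x$, $\sigma(x) = \cos x$ of my own choosing; the mean-square gap stays within an exponential envelope of the initial gap, which is exactly what the Gronwall argument quantifies.

```python
import numpy as np

# Two solutions of dX = -sin(X) dt + cos(X) dB with shared noise and a small
# initial separation; E|X - X~|^2 should stay below |x0 - x~0|^2 e^{Ct}.
rng = np.random.default_rng(0)
T, n_steps, n_paths = 1.0, 1000, 5000
dt = T / n_steps

def b(x):      # drift, globally Lipschitz with constant 1
    return -np.sin(x)

def sigma(x):  # diffusion, globally Lipschitz with constant 1
    return np.cos(x)

x = np.zeros(n_paths)
x_tilde = np.full(n_paths, 1e-3)          # initial gap 1e-3
gap0_sq = 1e-6
for _ in range(n_steps):
    dB = rng.normal(0.0, np.sqrt(dt), n_paths)   # shared Brownian increments
    x += b(x) * dt + sigma(x) * dB
    x_tilde += b(x_tilde) * dt + sigma(x_tilde) * dB

msd = np.mean((x - x_tilde) ** 2)         # E|X_T - X~_T|^2, Monte Carlo
# C = 10 is a deliberately generous Gronwall constant for Lipschitz constant 1.
bound = gap0_sq * np.exp(10 * T)
print(msd, bound)
```

With identical driving noise the paths in fact contract towards each other here, so the bound holds with room to spare; the point is only that the growth rate of the gap is controlled by the Lipschitz constants, not by the noise.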
Theorem (Existence) 16.2.
Suppose that the coefficients $b$ and $\sigma$ satisfy the global Lipschitz condition that for all $t \ge 0$, $x$ and $y$,
\[ |b(t,x) - b(t,y)| + |\sigma(t,x) - \sigma(t,y)| \le K |x - y|, \]
and additionally the bounded growth condition
\[ |b(t,x)|^2 + |\sigma(t,x)|^2 \le K^2 (1 + |x|^2). \]
Fix a probability space $(\Omega, \mathcal F, \mathbb P)$. Let $\xi$ be a random vector with finite second moment, independent of the Brownian motion $B_t$, and let $\mathcal F_t$ be the augmented filtration associated with the Brownian motion $B_t$ and $\xi$. Then there exists a continuous, adapted process $X$ which is a strong solution of the SDE with initial condition $\xi$. Additionally this process is square integrable: for each $T > 0$ there exists a constant $C(K,T)$ such that
\[ \mathbb E |X_t|^2 \le C \left( 1 + \mathbb E |\xi|^2 \right) e^{Ct}, \qquad 0 \le t \le T. \]

Proof
This proof proceeds by Picard iteration through a map $F$, analogously to the deterministic case, where the same device is used to prove the existence of solutions to first order ordinary differential equations; this is a departure from the more conventional proof of this result. Let $\mathcal C_T$ be the space of continuous adapted processes $X$ from $\Omega \times [0,T]$ to $\mathbb R$ such that $\mathbb E \sup_{t \le T} |X_t|^2 < \infty$, and let $F$ be the map from $\mathcal C_T$ to itself defined by
\[ F(X)_t = \xi + \int_0^t b(s, X_s)\, ds + \int_0^t \sigma(s, X_s)\, dB_s. \]
Define $X_t^{(k)}$ recursively, with $X_t^{(0)} = \xi$, and
\[ X_t^{(k+1)} = F(X^{(k)})_t = \xi + \int_0^t b(s, X_s^{(k)})\, ds + \int_0^t \sigma(s, X_s^{(k)})\, dB_s. \]
[Note: we have left out checking that the image of $X$ under $F$ is indeed adapted!] Now note that, using $(a+b)^2 \le 2a^2 + 2b^2$, and the same bounds as in the uniqueness result (Doob's $L^2$ inequality supplying the factor $4$ for the stochastic integral term), we have
\[ \mathbb E \sup_{0 \le t \le T} \big| F(X)_t - F(Y)_t \big|^2 \le 2\, \mathbb E \sup_{t \le T} \left| \int_0^t (\sigma(X_s) - \sigma(Y_s))\, dB_s \right|^2 + 2\, \mathbb E \sup_{t \le T} \left| \int_0^t (b(X_s) - b(Y_s))\, ds \right|^2 \le 2 K^2 (4 + T) \int_0^T \mathbb E \sup_{s \le t} |X_s - Y_s|^2\, dt. \]
By induction we see that for each $T$ we can prove
\[ \mathbb E \sup_{t \le T} \big| F^n(X)_t - F^n(Y)_t \big|^2 \le \frac{C^n T^n}{n!}\, \mathbb E \sup_{t \le T} |X_t - Y_t|^2. \]
So by taking $n$ sufficiently large we have that $F^n$ is a contraction mapping, and so by the contraction mapping theorem $F^n$, mapping $\mathcal C_T$ to itself, has a fixed point, call it $X^{(T)}$, which must be unique; since $F(X^{(T)})$ is then also a fixed point of $F^n$, uniqueness forces $F(X^{(T)}) = X^{(T)}$, so $X^{(T)}$ solves the SDE on $[0,T]$ and has already been shown to be unique. Clearly from the uniqueness part $X_t^{(T)} = X_t^{(T')}$ for $t \le T \wedge T'$ a.s., and so we may consistently define $X \in \mathcal C$ by $X_t = X_t^{(N)}$ for $t \le N$, $N \in \mathbb N$, which solves the SDE. $\square$
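The Picard scheme in the proof can be run numerically on a single fixed Brownian path. The following sketch (my construction, not from the notes) uses the illustrative coefficients $b(x) = -x$ and constant $\sigma = 0.5$; with $\sigma$ constant the stochastic integral is the same at every iteration, so the factorial-rate contraction of successive iterates is visible pathwise.

```python
import numpy as np

# Picard iteration X^(k+1) = F(X^(k)) on one Brownian path, Euler sums.
rng = np.random.default_rng(1)
T, n = 1.0, 2000
dt = T / n
dB = rng.normal(0.0, np.sqrt(dt), n)
xi = 1.0

def F(X):
    """One Picard step: F(X)_t = xi + int_0^t b(X_s) ds + int_0^t sigma dB_s."""
    out = np.empty(n + 1)
    out[0] = xi
    incr = -X[:-1] * dt + 0.5 * dB        # b(x) = -x, sigma = 0.5 (constant)
    out[1:] = xi + np.cumsum(incr)
    return out

X = np.full(n + 1, xi)                    # X^(0) = xi
gaps = []
for _ in range(8):
    X_next = F(X)
    gaps.append(np.max(np.abs(X_next - X)))   # sup_t |X^(k+1)_t - X^(k)_t|
    X = X_next
print(gaps)
```

The successive sup-norm gaps shrink roughly like $T^k / k!$, mirroring the $C^n T^n / n!$ estimate in the proof.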
17. Relations to Second Order PDEs
The aim of this section is to show a rather surprising connection between stochastic differential equations and the solutions of second order partial differential equations. At first this may well seem to be surprising, since one problem is entirely deterministic and the other inherently stochastic! Surprising though the results may seem, they often provide a viable route to calculating the solutions of explicit PDEs (an example of this is solving the Black-Scholes equation in option pricing, which is much easier to solve via stochastic methods than as a second order PDE).

17.1. Infinitesimal Generator
Consider the following $d$-dimensional SDE,
\[ dX_t = b(X_t)\, dt + \sigma(X_t)\, dB_t, \qquad X_0 = x_0, \]
where $\sigma$ is a $d \times d$ matrix with elements $\sigma = \{\sigma_{ij}\}$. This SDE has infinitesimal generator $A$, where
\[ A = \sum_{j=1}^d b^j(X_t) \frac{\partial}{\partial x^j} + \frac12 \sum_{i=1}^d \sum_{j=1}^d \left( \sum_{k=1}^d \sigma_{ik}(X_t)\, \sigma_{kj}(X_t) \right) \frac{\partial^2}{\partial x^i \partial x^j}. \]
It is conventional to set
\[ a_{ij} = \sum_{k=1}^d \sigma_{ik} \sigma_{kj}, \]
whence $A$ takes the simpler form
\[ A = \sum_{j=1}^d b^j(X_t) \frac{\partial}{\partial x^j} + \frac12 \sum_{i=1}^d \sum_{j=1}^d a_{ij}(X_t) \frac{\partial^2}{\partial x^i \partial x^j}. \]
Why is the definition useful? Consider the application of Itô's formula to $f(X_t)$, which yields
\[ f(X_t) - f(X_0) = \sum_{j=1}^d \int_0^t \frac{\partial f}{\partial x^j}(X_s)\, dX_s^j + \frac12 \sum_{i=1}^d \sum_{j=1}^d \int_0^t \frac{\partial^2 f}{\partial x^i \partial x^j}(X_s)\, d\langle X^i, X^j \rangle_s. \]
Substituting for $dX_t$ from the SDE we obtain
\[ f(X_t) - f(X_0) = \int_0^t \left( \sum_{j=1}^d b_j(X_s) \frac{\partial f}{\partial x^j}(X_s) + \frac12 \sum_{i=1}^d \sum_{j=1}^d \sum_{k=1}^d \sigma_{ik} \sigma_{kj} \frac{\partial^2 f}{\partial x^i \partial x^j}(X_s) \right) ds + \sum_{i=1}^d \sum_{j=1}^d \int_0^t \sigma_{ij}(X_s) \frac{\partial f}{\partial x^j}(X_s)\, dB_s^i = \int_0^t A f(X_s)\, ds + \sum_{i=1}^d \sum_{j=1}^d \int_0^t \sigma_{ij}(X_s) \frac{\partial f}{\partial x^j}(X_s)\, dB_s^i. \]
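Dividing the identity above by $t$ and letting $t \to 0$ gives the defining limit $(\mathbb E_x[f(X_t)] - f(x))/t \to Af(x)$, and this is easy to check by simulation. The sketch below (an illustration of my own, not from the notes) uses the one-dimensional SDE $dX = -X\, dt + dB$, so $b(x) = -x$, $\sigma = 1$, and $f(x) = x^2$, for which $Af(x) = b(x) f'(x) + \tfrac12 f''(x) = -2x^2 + 1$.

```python
import numpy as np

# Monte Carlo check of (E_x[f(X_h)] - f(x)) / h  ~  Af(x)  for small h.
rng = np.random.default_rng(2)
x, h, n = 1.0, 0.01, 2_000_000
Xh = x + (-x) * h + np.sqrt(h) * rng.normal(size=n)   # one Euler step of the SDE
estimate = (np.mean(Xh ** 2) - x ** 2) / h            # f(x) = x^2
exact = -2 * x ** 2 + 1                               # Af(1) = -1
print(estimate, exact)
```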
Definition 17.1.
We say that $X_t$ satisfies the martingale problem for $A$ if $X_t$ is $\mathcal F_t$-adapted and
\[ M_t = f(X_t) - f(X_0) - \int_0^t A f(X_s)\, ds \]
is a martingale for each $f \in C_c^2(\mathbb R^d)$. It is simple to verify from the foregoing that any solution of the associated SDE solves the martingale problem for $A$. This can be generalised if we consider test functions $\phi \in C^2(\mathbb R_+ \times \mathbb R^d; \mathbb R)$, and define
\[ M_t^\phi := \phi(t, X_t) - \phi(0, X_0) - \int_0^t \left( \frac{\partial}{\partial s} + A \right) \phi(s, X_s)\, ds; \]
then $M_t^\phi$ is a local martingale. The proof follows by an application of Itô's formula to $M_t^\phi$, similar to that of the above discussion.

17.2. The Dirichlet Problem
Let $\Omega$ be a subspace of $\mathbb R^d$ with a smooth boundary $\partial\Omega$, and let $A$ be a second order partial differential operator of the form
\[ A = \sum_{j=1}^d b^j(X_t) \frac{\partial}{\partial x^j} + \frac12 \sum_{i=1}^d \sum_{j=1}^d a_{ij}(X_t) \frac{\partial^2}{\partial x^i \partial x^j}, \]
which is associated as before to the SDE
\[ dX_t = b(X_t)\, dt + \sigma(X_t)\, dB_t, \qquad X_0 = x_0. \]
This SDE will play an important role in what is to follow. Often in what follows we shall want to consider expectations and probability measures conditional on $X_0 = x$; these will be denoted $\mathbb E_x$ and $\mathbb P_x$ respectively.

The Dirichlet problem for $f$ is defined as the solution of the system
\[ A u + \phi = 0 \ \text{ on } \Omega, \qquad u = f \ \text{ on } \partial\Omega. \]
A simple example of a Dirichlet problem is the solution of the Laplace equation in the disc $D$, with Dirichlet boundary conditions on the boundary:
\[ \tfrac12 \nabla^2 u = 0 \ \text{ on } D, \qquad u = f \ \text{ on } \partial D. \]

Theorem 17.2.
For each $f \in C_b(\partial\Omega)$ there exists a unique $u \in C_b^2(\Omega)$ solving the Dirichlet problem for $f$. Moreover there exists a continuous function $m : \Omega \times \partial\Omega \to (0, \infty)$ such that for all $f \in C_b(\partial\Omega)$ this solution is given by
\[ u(x) = \int_{\partial\Omega} m(x,y) f(y)\, \sigma(dy). \]

Theorem (Dirichlet Solution).
Let $X_t$ be a solution of the SDE associated with the infinitesimal generator $A$, conditional on $X_0 = x$, and define a stopping time via
\[ T := \inf\{ t \ge 0 : X_t \notin \Omega \}. \]
Then $u(x)$ given by
\[ u(x) := \mathbb E_x \left[ \int_0^T \phi(X_s)\, ds + f(X_T) \right] \]
solves the Dirichlet problem for $f$.

Proof
Define
\[ M_t := u(X_{T \wedge t}) + \int_0^{t \wedge T} \phi(X_s)\, ds. \]
We shall now show that this $M_t$ is a martingale. For $t \ge T$, it is clear that $dM_t = 0$. For $t < T$, by Itô's formula,
\[ dM_t = du(X_t) + \phi(X_t)\, dt. \]
Also, by Itô's formula,
\[ du(X_t) = \sum_{j=1}^d \frac{\partial u}{\partial x^j}(X_t)\, dX_t^j + \frac12 \sum_{i=1}^d \sum_{j=1}^d \frac{\partial^2 u}{\partial x^i \partial x^j}(X_t)\, d\langle X^i, X^j \rangle_t = \sum_{j=1}^d \frac{\partial u}{\partial x^j}(X_t) \left[ b_j(X_t)\, dt + (\sigma(X_t)\, dB_t)_j \right] + \frac12 \sum_{i=1}^d \sum_{j=1}^d \sum_{k=1}^d \sigma_{ik}\sigma_{kj} \frac{\partial^2 u}{\partial x^i \partial x^j}(X_t)\, dt = A u(X_t)\, dt + \sum_{j=1}^d \sigma(X_t) \frac{\partial u}{\partial x^j}(X_t)\, dB_t^j. \]
Putting these two applications of Itô's formula together yields
\[ dM_t = \big( A u(X_t) + \phi(X_t) \big)\, dt + \sum_{j=1}^d \sigma(X_t) \frac{\partial u}{\partial x^j}(X_t)\, dB_t^j. \]
But since $u$ solves the Dirichlet problem, $(Au + \phi)(X_t) = 0$ for $t < T$, hence
\[ dM_t = \sum_{j=1}^d \sigma(X_t) \frac{\partial u}{\partial x^j}(X_t)\, dB_t^j, \]
from which we conclude, by the stability property of the stochastic integral, that $M_t$ is a local martingale. However $M_t$ is uniformly bounded on $[0,t]$, since we know that the solution $u$ is bounded from the PDE solution existence theorem, and hence $M_t$ is a martingale.

We cannot simply apply the optional stopping theorem directly, since $T$ is not necessarily a bounded stopping time. In particular, since $T \wedge t$ is a bounded stopping time, by the optional stopping theorem
\[ u(x) = \mathbb E_x(M_0) = \mathbb E_x(M_{T \wedge t}). \]
But this holds for arbitrary $\phi$ and $f$; let $\phi(x) \equiv 1$ and $f \equiv 0$, so this gives
\[ u(x) = \mathbb E_x(M_0) = \mathbb E_x(M_{T \wedge t}) = \mathbb E_x \left[ u(X_{T \wedge t}) + (T \wedge t) \right]. \]
Letting $t \to \infty$, we have via monotone convergence that $\mathbb E_x(T) < \infty$, and hence $T < \infty$ a.s.

Returning to general $\phi$ and $f$, we have that
\[ |M_t| \le \|u\|_\infty + T \|\phi\|_\infty = \sup_{x \in \Omega} |u(x)| + T \sup_{x \in \Omega} |\phi(x)|, \]
whence, as $\mathbb E_x(T) < \infty$, the martingale $M$ is uniformly integrable, and by the martingale convergence theorem has a limit $M_\infty$. This limiting random variable is given by
\[ M_\infty = f(X_T) + \int_0^T \phi(X_s)\, ds. \]
Hence from the identity $\mathbb E_x M_0 = \mathbb E_x M_\infty$ we have that
\[ u(x) = \mathbb E_x(M_0) = \mathbb E_x(M_\infty) = \mathbb E_x \left[ f(X_T) + \int_0^T \phi(X_s)\, ds \right]. \]
$\square$
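The probabilistic representation can be tried directly for the Laplace example above. The sketch below (an illustration of my own, not from the notes) takes the unit disc with boundary data $f(x,y) = x$, whose harmonic extension is simply $u(x,y) = x$; Brownian paths are run from a start point until they exit the disc, and $f$ is averaged at the (approximate) exit point.

```python
import numpy as np

# Monte Carlo solution of (1/2) lap u = 0 in the unit disc, u = f on the circle.
rng = np.random.default_rng(3)
n_paths, dt = 4000, 1e-3
start = np.array([0.3, 0.0])
pos = np.tile(start, (n_paths, 1))
alive = np.ones(n_paths, dtype=bool)
exit_x = np.empty(n_paths)

while alive.any():
    m = alive.sum()
    pos[alive] += rng.normal(0.0, np.sqrt(dt), (m, 2))   # Brownian steps
    r = np.linalg.norm(pos, axis=1)
    hit = alive & (r >= 1.0)
    if hit.any():
        exit_x[hit] = pos[hit, 0] / r[hit]               # project onto circle
        alive &= ~hit

u_mc = exit_x.mean()
print(u_mc)   # should be close to u(0.3, 0) = 0.3
```

The small bias from the time discretisation (paths overshoot the boundary slightly) vanishes as `dt` is reduced.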
17.3. The Cauchy Problem
The Cauchy problem for $f$, a $C_b^2$ function, is the solution of the system
\[ \frac{\partial u}{\partial t} = A u \ \text{ on } \Omega, \qquad u(0,x) = f(x) \ \text{ for } x \in \Omega, \qquad u(t,x) = f(x) \ \forall t \ge 0, \ \text{ for } x \in \partial\Omega. \]
A typical problem of this sort is to solve the heat equation
\[ \frac{\partial u}{\partial t} = \frac12 \nabla^2 u, \]
where the function $u$ represents the temperature in a region $\Omega$; here the initial condition is to specify the temperature field over the region at time zero, and in addition the boundary has its temperature fixed (at zero, say).

Theorem (Cauchy Existence) 17.3.
For each $f \in C_b^2(\mathbb R^d)$ there exists a unique $u \in C_b^{1,2}(\mathbb R_+ \times \mathbb R^d)$ such that $u$ solves the Cauchy problem for $f$. Moreover there exists a continuous function (the heat kernel) $p : (0,\infty) \times \mathbb R^d \times \mathbb R^d \to (0,\infty)$, such that for all $f \in C_b^2(\mathbb R^d)$ the solution to the Cauchy problem for $f$ is given by
\[ u(t,x) = \int_{\mathbb R^d} p(t,x,y) f(y)\, dy. \]

Theorem 17.4.
Let $u \in C_b^{1,2}(\mathbb R_+ \times \mathbb R^d)$ be the solution of the Cauchy problem for $f$, and define the stopping time $T := \inf\{t \ge 0 : X_t \notin \Omega\}$. Then
\[ u(t,x) = \mathbb E_x \left[ f(X_{T \wedge t}) \right]. \]
If $\Omega$ is just the real line, so that the boundary plays no role, and $X_t = B_t$ is a standard Brownian motion (with $A = \frac12 \nabla^2$), then the solution has the beautifully simple form
\[ u(t,x) = \mathbb E_x \left( f(B_t) \right). \]

Proof
Fix $s \in (0,\infty)$ and consider the time reversed process $M_t := u(s-t, X_{t \wedge T})$ for $0 \le t \le s$. There are three cases now to consider, according to the relative positions of $t$, $s$ and $T$: $0 \le t \le s \le T$, $0 \le t \le T \le s$, and $0 \le T \le t \le s$. In the first case, by Itô's formula we obtain
\[ dM_t = \left( -\frac{\partial u}{\partial t} + A u \right)(s-t, X_t)\, dt + \sum_{j=1}^d \sum_{k=1}^d \frac{\partial u}{\partial x^j}(s-t, X_t)\, \sigma_{jk}(X_t)\, dB_t^k. \]
Thus, for $u$ solving the Cauchy problem, we have $-\partial u / \partial t + Au = 0$, and we see that $M_t$ is a local martingale on this range. We obtain a similar result in the $0 \le t \le T \le s$ case, but with $s$ replaced by $T$ as the time at which the process is stopped. Finally, for $0 \le T \le t \le s$, we have $M_t = u(s-t, X_T)$ where $X_T \in \partial\Omega$, so from the boundary condition $M_t = f(X_T)$, and hence it is clear that $dM_t = 0$.

Thus $M_t$ is a local martingale on $[0,s]$; boundedness implies that $M_t$ is a martingale, and hence by optional stopping
\[ u(s,x) = \mathbb E_x(M_0) = \mathbb E_x(M_s) = \mathbb E_x \left( f(X_{s \wedge T}) \right). \]
$\square$
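The boundaryless form $u(t,x) = \mathbb E_x[f(B_t)]$ can be checked by sampling, since for $f = \cos$ the expectation is available in closed form: $\mathbb E[\cos(x + \sqrt t\, Z)] = \cos(x)\, e^{-t/2}$ for $Z \sim N(0,1)$. The sketch below (an illustration, not from the notes) compares the Monte Carlo average with this exact solution of the heat equation.

```python
import numpy as np

# u(t, x) = E_x[f(B_t)] for du/dt = (1/2) u'' on R, with f = cos.
rng = np.random.default_rng(4)
t, x, n = 0.8, 0.5, 400_000
samples = np.cos(x + np.sqrt(t) * rng.normal(size=n))
u_mc = samples.mean()
u_exact = np.cos(x) * np.exp(-t / 2)      # closed-form heat-equation solution
print(u_mc, u_exact)
```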
17.4. Feynman-Kac Representation
Feynman observed the following representation of the solution of a PDE, via the expectation of a suitable functional of a Brownian motion, 'intuitively', and the theory was later made rigorous by Kac. In what context was Feynman interested in this problem? Consider the Schrödinger equation,
\[ i \hbar \frac{\partial \Phi}{\partial t}(x,t) = -\frac{\hbar^2}{2m} \nabla^2 \Phi(x,t) + V(x) \Phi(x,t), \]
which is a second order PDE. Feynman introduced the concept of a path integral to express solutions to such an equation: in a manner which is analogous to the Hamiltonian principle in classical mechanics, there is an action integral which is minimised over all 'permissible paths' that the system can take.

We have already considered solving the Cauchy problem
\[ \frac{\partial u}{\partial t} = A u \ \text{ on } \Omega, \qquad u(0,x) = f(x) \ \text{ on } x \in \Omega, \qquad u(t,x) = f(x) \ \forall t \ge 0 \ \text{ on } x \in \partial\Omega, \]
where $A$ is the generator of an SDE and hence of the form
\[ A = \sum_{j=1}^d b^j(X_t) \frac{\partial}{\partial x^j} + \frac12 \sum_{i=1}^d \sum_{j=1}^d a_{ij}(X_t) \frac{\partial^2}{\partial x^i \partial x^j}. \]
The Feynman-Kac representation theorem expresses the solution of a more general second order PDE in terms of an expectation of a functional of a Brownian motion. To simplify the statement of the result, we shall work on $\Omega = \mathbb R^d$, since this removes the problem of considering the Brownian motion hitting the boundary. Now consider the Cauchy problem with generator $L$ of the form
\[ L \equiv A + v = \sum_{j=1}^d b^j(X_t) \frac{\partial}{\partial x^j} + \frac12 \sum_{i=1}^d \sum_{j=1}^d a_{ij}(X_t) \frac{\partial^2}{\partial x^i \partial x^j} + v(X_t). \]
For example, let $X_t = B_t$, corresponding to
\[ A = \frac12 \nabla^2, \qquad L = \frac12 \nabla^2 + v(X_t), \]
so in this example we are solving the problem
\[ \frac{\partial u}{\partial t} = \frac12 \nabla^2 u(t,x) + v(x) u(t,x) \ \text{ on } \mathbb R^d, \qquad u(0,x) = f(x). \]

Theorem (Feynman-Kac Representation).
Let $u \in C_b^{1,2}(\mathbb R_+ \times \mathbb R^d)$ be a solution of the Cauchy problem with a generator $L$ of the above form with initial condition $f$, and let $X_t$ be a solution of the SDE with infinitesimal generator $A$ in $\mathbb R^d$ starting at $x$. Then
\[ u(t,x) = \mathbb E_x \left[ f(X_t) \exp\left( \int_0^t v(X_s)\, ds \right) \right]. \]

Proof
Note that the SDE for $X_t$ has the form $dX_t = b(X_t)\, dt + \sigma(X_t)\, dB_t$. For notational convenience, let
\[ E_t = \exp\left( \int_0^t v(X_r)\, dr \right), \]
whence by Itô's formula $dE_t = E_t v(X_t)\, dt$. Fix $s \in (0,\infty)$ and apply Itô's formula to
\[ M_t = u(s-t, X_t)\, E_t, \qquad 0 \le t \le s; \]
we obtain
\[ dM_t = \sum_{j=1}^d \frac{\partial u}{\partial x^j}(s-t, X_t)\, E_t\, (\sigma(X_t)\, dB_t)_j + \left( -\frac{\partial u}{\partial t} + A u + v u \right)(s-t, X_t)\, E_t\, dt = \sum_{j=1}^d \frac{\partial u}{\partial x^j}(s-t, X_t)\, E_t\, (\sigma(X_t)\, dB_t)_j, \]
since $u$ solves the Cauchy problem for $L = A + v$. Hence $M_t$ is a local martingale; since it is bounded ($u$ is bounded, and for bounded $v$ so is $E_t$ on $[0,s]$), $M_t$ is a martingale, and hence by optional stopping
\[ u(s,x) = \mathbb E_x(M_0) = \mathbb E_x(M_s) = \mathbb E_x \left( f(X_s) E_s \right). \]
$\square$
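The representation can be exercised numerically in the simplest possible setting, a *constant* potential $v(x) = c$, where the answer is known in closed form: $u(t,x) = e^{ct}\, \mathbb E[f(B_t)]$, which for $f = \cos$ equals $\cos(x)\, e^{(c - 1/2) t}$. The sketch below (my construction, not from the notes) still runs the full machinery, accumulating $\int_0^t v(B_s)\, ds$ along each path.

```python
import numpy as np

# Feynman-Kac Monte Carlo: u(t,x) = E_x[ f(B_t) exp( int_0^t v(B_s) ds ) ].
rng = np.random.default_rng(5)
t, x, c, n_paths, n_steps = 0.5, 0.2, 1.0, 200_000, 50
dt = t / n_steps

B = np.full(n_paths, x)
log_weight = np.zeros(n_paths)
for _ in range(n_steps):
    log_weight += c * dt                      # int v(B_s) ds with v(x) = c
    B += np.sqrt(dt) * rng.normal(size=n_paths)

u_mc = np.mean(np.cos(B) * np.exp(log_weight))
u_exact = np.cos(x) * np.exp((c - 0.5) * t)   # closed form for constant v
print(u_mc, u_exact)
```

For a genuinely state-dependent $v$ the same loop applies with `c * dt` replaced by `v(B) * dt`, at the cost of a time-discretisation bias in the weight.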
18. Stochastic Filtering
Engineers have for many years studied system problems arising in the design of control systems, placed between user inputs and system parameters in order to achieve a desired response. Even a basic control system such as a central heating thermostat is based upon an element of feedback: observing the state of the system (the room temperature) and adjusting the system inputs (turning the boiler off or on) to keep the system state close to the desired value (the temperature set on the dial). For example, the connection between the flight controls on a modern aircraft such as the Harrier jump jet and the aircraft control surfaces goes via a complex control system rather than the simple wire cables found in early aircraft: a sudden movement of the controls communicated directly to the control surfaces might cause uncontrolled growing oscillations, which is not the response required, and the control system has to modify the control surface demands to avoid this happening.

In the real world the behaviour of these systems is affected by extra random unknown factors which cannot be known, for example wind effects on the airframe and engine vibration. Also, all measurements have inherent random errors which cannot be eliminated completely. Attempts to solve control problems in this context require that we have some idea of how to use the observations made to give a best possible estimate of the system state at a given time.

In this basic introduction to stochastic nonlinear filtering we shall consider a system described by a stochastic process, called the signal process, whose behaviour is governed by a dynamical system containing a Brownian motion noise input. This is observed, together with unavoidable observation noise. The goal of the stochastic filtering problem is to determine the law of the signal given the observations. This must be formalised in the language of stochastic calculus.

An important modern example of stochastic filtering is the use of GPS measurements to correct an inertial navigation system (INS). An INS has transducers which measure acceleration; starting from a known point in space, the acceleration can be integrated twice to compute a position. But since there are errors in the acceleration measurements, this computed position gradually diverges from the true position. This is a phenomenon familiar to anyone who has navigated with a map by dead reckoning (from the station I walk NE for a mile, then E for two miles: where am I?). The simple navigator checks his position as frequently as possible by other means, for example by observing the bearing to several landmarks, taking a star sight, or looking at a hand-held GPS receiver; additionally the observer continues to move while taking the readings. In a similar fashion we might try to correct our INS position estimates by using periodic GPS observations; these GPS position measurements cannot be made continuously, and are themselves subject to random errors. What is the optimal way to use these GPS measurements to correct our INS estimate? The answer lies in the solution of a stochastic filtering problem, in which the signal process is the true position and the observation process is the GPS position process.

The notation $\|\cdot\|$ will be used in this section to denote the Euclidean norm in $d$-dimensional space.
18.1. Signal Process
Let $\{X_t,\ t \ge 0\}$ be the signal process. It is defined to be the solution of the stochastic differential equation
\[ dX_t = f(t, X_t)\, dt + \sigma(t, X_t)\, dV_t, \tag{4} \]
where $V$ is a $d$-dimensional standard Brownian motion. The coefficients satisfy the conditions
\[ \| f(t,x) - f(t,y) \| + \| \sigma(t,x) - \sigma(t,y) \| \le k \| x - y \|, \qquad \| f(t,x) \|^2 + \| \sigma(t,x) \|^2 \le k (1 + \|x\|^2), \]
which ensure that the equation for $X$ has a unique solution. A consequence of this condition is that for any $T > 0$,
\[ \mathbb E \| X_t \|^2 \le C \left( 1 + \mathbb E \| X_0 \|^2 \right) e^{ct}, \]
for $t \in [0,T]$, with suitable constants $C$ and $c$ which may be functions of $T$.

18.2. Observation Process
The observation process satisfies the stochastic differential equation
\[ dY_t = h(t, X_t)\, dt + dW_t, \qquad Y_0 = 0, \tag{5} \]
where $W_t$ is an $m$-dimensional standard Brownian motion independent of $V_t$. The function $h$, taking values in $\mathbb R^m$, satisfies a linear growth condition
\[ \| h(t,x) \|^2 \le k (1 + \|x\|^2). \]
As a result of this bound,
\[ \mathbb E \int_0^T \| h(s, X_s) \|^2\, ds < \infty. \tag{6} \]
Given this process, a sequence of observation $\sigma$-algebras may be defined,
\[ \mathcal Y_t^o := \sigma( Y_s : 0 \le s \le t ). \]
These must be augmented in the usual fashion to give
\[ \mathcal Y_t = \mathcal Y_t^o \vee \mathcal N, \qquad \text{and} \qquad \mathcal Y := \bigvee_{t \ge 0} \mathcal Y_t, \]
where $\mathcal N$ is the set of $\mathbb P$-null subsets of $\Omega$.
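A minimal simulation of the pair (4)-(5) makes the setup concrete. The sketch below (not from the notes) works in one dimension with illustrative coefficients $f(t,x) = -x$, $\sigma(t,x) = 0.5$ and $h(t,x) = x$, all of which satisfy the Lipschitz and linear growth conditions above, using the Euler-Maruyama scheme with independent noises $V$ and $W$.

```python
import numpy as np

def simulate(T=1.0, n=1000, x0=0.0, seed=0):
    """Euler-Maruyama simulation of signal X and observation Y."""
    rng = np.random.default_rng(seed)
    dt = T / n
    X = np.empty(n + 1); Y = np.empty(n + 1)
    X[0], Y[0] = x0, 0.0                       # Y_0 = 0 as in (5)
    for i in range(n):
        dV = rng.normal(0.0, np.sqrt(dt))      # signal noise V
        dW = rng.normal(0.0, np.sqrt(dt))      # independent observation noise W
        X[i + 1] = X[i] - X[i] * dt + 0.5 * dV     # f(t,x) = -x, sigma = 0.5
        Y[i + 1] = Y[i] + X[i] * dt + dW           # h(t,x) = x
    return X, Y

X, Y = simulate()
print(X.shape, Y.shape, Y[0])
```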
The Filtering Problem
The above has set the scene: we have a real physical system whose state at time $t$ is represented by the vector $X_t$. The system state is governed by an SDE with a noise term representing random perturbations to the system. This is observed to give the observation process $Y_t$, which includes new independent noise (represented by $W_t$). The filtering problem is to find the conditional law of the signal process given the observations to date, i.e. to find
\[ \pi_t(\varphi) := \mathbb E \left( \varphi(X_t) \mid \mathcal Y_t \right) \qquad \forall t \in [0,T]. \]

Change of Measure
To solve the filtering problem, Girsanov's theorem will be used to make a change of measure to a new measure under which the observation process $Y$ is a Brownian motion. Let
\[ Z_t := \exp\left( -\int_0^t h^T(s, X_s)\, dW_s - \frac12 \int_0^t \| h(s, X_s) \|^2\, ds \right); \tag{7} \]
that is, $Z_t$ solves the SDE
\[ dZ_t = -Z_t\, h^T(t, X_t)\, dW_t, \qquad Z_0 = 1. \tag{8} \]

Proposition 18.1.
The process $Z_t$ is a martingale with respect to the filtration $\{\mathcal F_t,\ t \ge 0\}$.

Proof
The process $Z_t$ satisfies
\[ Z_t = 1 - \int_0^t Z_s\, h^T(s, X_s)\, dW_s, \]
so, since it is a stochastic integral, $Z_t$ is a positive continuous local martingale and hence is a supermartingale. Let $T_n$ be a reducing sequence for the local martingale $Z_t$, i.e. an increasing sequence of stopping times tending to infinity as $n \to \infty$ such that for each $n$, $Z^{T_n}$ is a genuine martingale. By Fatou's lemma and the local martingale property,
\[ \mathbb E Z_t = \mathbb E \lim_{n \to \infty} Z_t^{T_n} \le \liminf_{n \to \infty} \mathbb E Z_t^{T_n} = \mathbb E Z_0^{T_n} = 1, \]
so $\mathbb E Z_t \le 1$ for all $t$. To prove that $Z$ is a true martingale we must prove in addition that it has constant mean, i.e. $\mathbb E Z_t = 1$ for all $t$. The following proofs are in many cases made complex by the fact that we do not assume that the functions $f$, $h$ and $\sigma$ are bounded; if they are assumed to be bounded then this complexity may be ignored.

By application of Itô's formula to the function $f_\varepsilon$ defined by
\[ f_\varepsilon(x) = \frac{x}{1 + \varepsilon x}, \]
we obtain
\[ f_\varepsilon(Z_t) = f_\varepsilon(Z_0) + \int_0^t f_\varepsilon'(Z_s)\, dZ_s + \frac12 \int_0^t f_\varepsilon''(Z_s)\, d\langle Z \rangle_s, \]
which in this case reads
\[ \frac{Z_t}{1 + \varepsilon Z_t} = \frac{1}{1 + \varepsilon} - \int_0^t \frac{Z_s\, h^T(s, X_s)}{(1 + \varepsilon Z_s)^2}\, dW_s - \varepsilon \int_0^t \frac{Z_s^2 \, \| h(s, X_s) \|^2}{(1 + \varepsilon Z_s)^3}\, ds. \tag{10} \]
Consider the stochastic integral term: clearly this is a local martingale, since it is a stochastic integral. The next step is explained in detail as it is one which occurs frequently: we wish to show that this stochastic integral is in fact a genuine martingale. From the earlier theory (for $L^2$-integrable martingales) it suffices to show that the integrand is in $L^2(W)$. We note that, since $Z_s \ge 0$,
\[ \frac{Z_s}{1 + \varepsilon Z_s} \le \frac{1}{\varepsilon}, \qquad \frac{1}{(1 + \varepsilon Z_s)^2} \le 1, \]
so
\[ \left\| \frac{Z_s\, h^T(s, X_s)}{(1 + \varepsilon Z_s)^2} \right\|^2 = \left( \frac{Z_s}{1 + \varepsilon Z_s} \right)^2 \frac{1}{(1 + \varepsilon Z_s)^2}\, \| h(s, X_s) \|^2 \le \frac{1}{\varepsilon^2}\, \| h(s, X_s) \|^2. \]
Using this inequality we obtain
\[ \mathbb E \int_0^t \left\| \frac{Z_s\, h^T(s, X_s)}{(1 + \varepsilon Z_s)^2} \right\|^2 ds \le \frac{1}{\varepsilon^2}\, \mathbb E \int_0^T \| h(s, X_s) \|^2\, ds, \]
and as a consequence of the linear growth condition on $h$ (see (6)) the last expectation is finite, whence the integrand must be in $L^2(W)$ and hence the integral is a genuine martingale. Therefore, taking the expectation of (10), the stochastic integral term vanishes and we obtain
\[ \mathbb E \left[ \frac{Z_t}{1 + \varepsilon Z_t} \right] = \frac{1}{1 + \varepsilon} - \varepsilon\, \mathbb E \int_0^t \frac{Z_s^2\, \| h(s, X_s) \|^2}{(1 + \varepsilon Z_s)^3}\, ds. \]
Consider the integrand of the final term:
\[ \varepsilon\, \frac{Z_s^2\, \| h(s, X_s) \|^2}{(1 + \varepsilon Z_s)^3} = \frac{\varepsilon Z_s}{1 + \varepsilon Z_s} \cdot Z_s \cdot \frac{\| h(s, X_s) \|^2}{(1 + \varepsilon Z_s)^2} \le Z_s\, \| h(s, X_s) \|^2 \le k\, Z_s\, (1 + \| X_s \|^2), \tag{9} \]
where we have used the fact that $\| h(s,x) \|^2 \le k (1 + \|x\|^2)$. From the fact that $\mathbb E Z_s \le 1$, and using the next lemma, which shows that $\mathbb E[ Z_s \| X_s \|^2 ] \le C$, we conclude that $k Z_s (1 + \| X_s \|^2)$ is an integrable dominating function for the integrand of interest. Since the integrand clearly tends to zero pointwise as $\varepsilon \to 0$, the dominated convergence theorem yields
\[ \varepsilon\, \mathbb E \int_0^t \frac{Z_s^2\, \| h(s, X_s) \|^2}{(1 + \varepsilon Z_s)^3}\, ds \to 0, \qquad \text{as } \varepsilon \to 0. \]
In addition, since $Z_t / (1 + \varepsilon Z_t) \le Z_t$ and $\mathbb E Z_t \le 1$, dominated convergence also yields
\[ \mathbb E \left[ \frac{Z_t}{1 + \varepsilon Z_t} \right] \to \mathbb E (Z_t), \qquad \text{as } \varepsilon \to 0. \]
Hence we conclude that $\mathbb E(Z_t) = 1$ for all $t$, and so $Z_t$ is a genuine martingale. $\square$

Lemma 18.2.
There exists a constant $C(T)$ such that
\[ \mathbb E \left[ Z_t\, \| X_t \|^2 \right] \le C(T), \qquad \forall t \in [0, T]. \]

Proof
To establish this result, Itô's formula is used to derive two important identities:
\[ d\left( \| X_t \|^2 \right) = 2 X_t^T \left( f\, dt + \sigma\, dV_t \right) + \mathrm{tr}(\sigma^T \sigma)\, dt, \]
\[ d\left( Z_t \| X_t \|^2 \right) = -Z_t \| X_t \|^2\, h^T\, dW_t + Z_t \left[ 2 X_t^T ( f\, dt + \sigma\, dV_t ) + \mathrm{tr}(\sigma \sigma^T)\, dt \right], \]
there being no cross-variation term since $V$ and $W$ are independent. By Itô's formula applied to $x \mapsto x / (1 + \varepsilon x)$,
\[ d\left( \frac{Z_t \| X_t \|^2}{1 + \varepsilon Z_t \| X_t \|^2} \right) = \frac{1}{(1 + \varepsilon Z_t \| X_t \|^2)^2}\, d\left( Z_t \| X_t \|^2 \right) - \frac{\varepsilon}{(1 + \varepsilon Z_t \| X_t \|^2)^3}\, d\left\langle Z \| X \|^2 \right\rangle_t. \]
Substituting for $d( Z_t \| X_t \|^2 )$ into this expression yields
\[ d\left( \frac{Z_t \| X_t \|^2}{1 + \varepsilon Z_t \| X_t \|^2} \right) = \frac{ -Z_t \| X_t \|^2\, h^T\, dW_t + 2 Z_t X_t^T \sigma\, dV_t }{ (1 + \varepsilon Z_t \| X_t \|^2)^2 } + \frac{ Z_t \left( 2 X_t^T f + \mathrm{tr}(\sigma \sigma^T) \right) }{ (1 + \varepsilon Z_t \| X_t \|^2)^2 }\, dt - \varepsilon\, \frac{ Z_t^2 \| X_t \|^4\, h^T h + 4 Z_t^2\, X_t^T \sigma \sigma^T X_t }{ (1 + \varepsilon Z_t \| X_t \|^2)^3 }\, dt. \]
The terms which are stochastic integrals are local martingales; in fact they can be shown to be genuine martingales, so they have zero expectation. For example, consider the term
\[ \int_0^t \frac{ 2 Z_s X_s^T \sigma }{ (1 + \varepsilon Z_s \| X_s \|^2)^2 }\, dV_s; \]
to show that this is a martingale we must therefore establish that
\[ \mathbb E \int_0^t \left\| \frac{ 2 Z_s X_s^T \sigma }{ (1 + \varepsilon Z_s \| X_s \|^2)^2 } \right\|^2 ds = 4\, \mathbb E \int_0^t \frac{ Z_s^2\, X_s^T \sigma \sigma^T X_s }{ (1 + \varepsilon Z_s \| X_s \|^2)^4 }\, ds < \infty. \]
In order to establish this inequality, consider the term $X_t^T \sigma \sigma^T X_t$, which is a sum over terms of the form $X_t^i\, \sigma_{ij}(t, X_t)\, \sigma_{kj}(t, X_t)\, X_t^k$, of which there are $d^3$ (in $\mathbb R^d$). By the linear growth condition on $\sigma$ we have, for some constant $\kappa$, $\| \sigma \|^2 \le \kappa (1 + \| X \|^2)$, and hence
\[ \left| X_t^i\, \sigma_{ij}(t, X_t)\, \sigma_{kj}(t, X_t)\, X_t^k \right| \le \| X_t \|^2 \| \sigma \|^2 \le \kappa\, \| X_t \|^2 \left( 1 + \| X_t \|^2 \right), \]
so the integral may be bounded by
\[ \int_0^t \frac{ Z_s^2\, X_s^T \sigma \sigma^T X_s }{ (1 + \varepsilon Z_s \| X_s \|^2)^4 }\, ds \le \kappa d^3 \left[ \int_0^t \frac{ Z_s^2 \| X_s \|^2 }{ (1 + \varepsilon Z_s \| X_s \|^2)^4 }\, ds + \int_0^t \frac{ Z_s^2 \| X_s \|^4 }{ (1 + \varepsilon Z_s \| X_s \|^2)^4 }\, ds \right]. \]
Considering each term separately, the first satisfies
\[ \int_0^t \frac{ Z_s^2 \| X_s \|^2 }{ (1 + \varepsilon Z_s \| X_s \|^2)^4 }\, ds = \int_0^t Z_s \cdot \frac{ Z_s \| X_s \|^2 }{ 1 + \varepsilon Z_s \| X_s \|^2 } \cdot \frac{1}{ (1 + \varepsilon Z_s \| X_s \|^2)^3 }\, ds \le \frac{1}{\varepsilon} \int_0^t Z_s\, ds, \]
whose expectation is bounded since $\mathbb E(Z_s) \le 1$. Similarly, for the second term,
\[ \int_0^t \frac{ Z_s^2 \| X_s \|^4 }{ (1 + \varepsilon Z_s \| X_s \|^2)^4 }\, ds = \int_0^t \frac{ Z_s^2 \| X_s \|^4 }{ (1 + \varepsilon Z_s \| X_s \|^2)^2 } \cdot \frac{1}{ (1 + \varepsilon Z_s \| X_s \|^2)^2 }\, ds \le \frac{1}{\varepsilon^2}\, t < \infty. \]
A similar argument holds for the other stochastic integrals.

After integrating the expression for the differential from $0$ to $t$, taking expectations (the martingale terms vanishing, and the negative final term being discarded), and finally differentiating with respect to $t$, this yields
\[ \frac{d}{dt}\, \mathbb E \left[ \frac{ Z_t \| X_t \|^2 }{ 1 + \varepsilon Z_t \| X_t \|^2 } \right] \le \mathbb E \left[ \frac{ Z_t \left( 2 X_t^T f + \mathrm{tr}(\sigma \sigma^T) \right) }{ (1 + \varepsilon Z_t \| X_t \|^2)^2 } \right] \le k' \left( 1 + \mathbb E \left[ \frac{ Z_t \| X_t \|^2 }{ 1 + \varepsilon Z_t \| X_t \|^2 } \right] \right), \]
using the linear growth conditions on $f$ and $\sigma$ together with $\mathbb E Z_t \le 1$. An application of Gronwall's inequality (see the next section for a proof of this useful result) establishes
\[ \mathbb E \left[ \frac{ Z_t \| X_t \|^2 }{ 1 + \varepsilon Z_t \| X_t \|^2 } \right] \le C(T), \qquad \forall t \in [0, T], \]
with $C(T)$ independent of $\varepsilon$; applying Fatou's lemma as $\varepsilon \to 0$, the desired result is obtained. $\square$
Given that $Z_t$ has been shown to be a martingale, a new probability measure $\tilde{\mathbb P}$ may be defined by the Radon-Nikodym derivative
\[ \left. \frac{d \tilde{\mathbb P}}{d \mathbb P} \right|_{\mathcal F_T} = Z_T. \]
This change of measure allows us to establish one of the most important results in stochastic filtering theory.

Proposition 18.3.
Under the probability measure $\tilde{\mathbb P}$, the observation process $Y_t$ is a Brownian motion, which is independent of $X_t$.

Proof
Define a process $M_t$ taking values in $\mathbb R$ by
\[ M_t = -\int_0^t h^T(s, X_s)\, dW_s. \tag{11} \]
Since $W_t$ is a $\mathbb P$-Brownian motion, it follows that $M_t$ is a continuous $\mathbb P$-local martingale, and the previous results establish that
\[ \mathcal E(M)_t = \exp\left( -\int_0^t h^T(s, X_s)\, dW_s - \frac12 \int_0^t \| h(s, X_s) \|^2\, ds \right) = Z_t \]
is a martingale under $\mathbb P$. Thus we may apply Girsanov's theorem. As a corollary of Girsanov's theorem,
\[ W_t - \langle W, M \rangle_t = W_t + \int_0^t h(s, X_s)\, ds \]
is a $\tilde{\mathbb P}$-local martingale; but this process is just the observation process $Y_t$ by definition. Lévy's characterisation theorem then implies that under $\tilde{\mathbb P}$ the process $Y_t$ is an $m$-dimensional Brownian motion.

To establish the second part of the proposition we must prove the independence of $\mathcal Y$ and $X$. Define $\tilde{\mathcal Y}_t := \sigma( Y_{t+u} - Y_t : u \ge 0 )$; then $\mathcal Y = \mathcal Y_t \vee \tilde{\mathcal Y}_t$. Since under $\tilde{\mathbb P}$ the process $Y$ is an $\mathcal F_t$-adapted Brownian motion, it satisfies the independent increments property, so the $\sigma$-algebra $\tilde{\mathcal Y}_t$ is independent of $\mathcal F_t$. Hence, for $U$ an integrable $\mathcal F_t$-measurable random variable,
\[ \mathbb E_{\tilde{\mathbb P}}\left[ U \mid \mathcal Y_t \right] = \mathbb E_{\tilde{\mathbb P}}\left[ U \mid \mathcal Y_t \vee \tilde{\mathcal Y}_t \right] = \mathbb E_{\tilde{\mathbb P}}\left[ U \mid \mathcal Y \right]. \]
For the independence itself, the law of the pair $(X, Y)$ on $[0,T]$ is absolutely continuous with respect to that of the pair $(X, V)$ (indeed they are mutually absolutely continuous); denoting their Radon-Nikodym derivative by $\Psi_T$, one obtains for any bounded measurable function $f$ that
\[ \mathbb E_{\tilde{\mathbb P}}\left[ f(X_t, Y_t) \right] = \mathbb E_{\mathbb P}\left[ \mathbb E_{\mathbb P}\left[ f(X_t, Y_t)\, \Psi_t \mid \mathcal F_t \right] \right] = \mathbb E\left[ f(X_t, V_t) \right], \]
so that under $\tilde{\mathbb P}$ the pair $(X_t, Y_t)$ has the law of $X_t$ paired with an independent Brownian motion. Thus we have proven that, under $\tilde{\mathbb P}$, the variables $X$ and $Y$ are independent. $\square$

The Unnormalised Conditional Distribution
To simplify the notation in what follows, define $\tilde Z_t := 1 / Z_t$. Then under the measure $\tilde{\mathbb P}$ we have
\[ d\tilde Z_t = \tilde Z_t\, h^T(t, X_t)\, dY_t, \tag{12} \]
so $\tilde Z_t$ is a $\tilde{\mathbb P}$-local martingale, and hence
\[ \tilde Z_t = \exp\left( \int_0^t h^T(s, X_s)\, dY_s - \frac12 \int_0^t \| h(s, X_s) \|^2\, ds \right). \]
Also, for any stopping time $T$, $\mathbb E_{\tilde{\mathbb P}} \tilde Z_T = \mathbb E_{\mathbb P} Z_T \tilde Z_T = 1$; the constant mean condition implies that $\tilde Z$ is a UI $\tilde{\mathbb P}$-martingale, and thus by the martingale convergence theorem $\tilde Z_t \to \tilde Z_\infty$ in $L^1$. So we may write
\[ \left. \frac{d \mathbb P}{d \tilde{\mathbb P}} \right|_{\mathcal F_\infty} = \tilde Z_\infty, \qquad \text{whence} \qquad \left. \frac{d \mathbb P}{d \tilde{\mathbb P}} \right|_{\mathcal F_t} = \tilde Z_t. \]
Now define, for every bounded measurable function $\varphi$, the unnormalised conditional distribution of $X$ via
\[ \rho_t(\varphi) := \mathbb E_{\tilde{\mathbb P}}\left[ \varphi(X_t)\, \tilde Z_t \mid \mathcal Y_t \right] = \mathbb E_{\tilde{\mathbb P}}\left[ \varphi(X_t)\, \tilde Z_t \mid \mathcal Y \right], \tag{13} \]
the two forms agreeing by the result of Proposition 18.3. The following result, within the context of stochastic filtering, is sometimes called the Kallianpur-Striebel formula.

Proposition 18.5.
For every bounded measurable function $\varphi$, and for all $t \ge 0$,
\[ \pi_t(\varphi) := \mathbb E_{\mathbb P}\left[ \varphi(X_t) \mid \mathcal Y_t \right] = \frac{\rho_t(\varphi)}{\rho_t(1)}, \qquad \mathbb P \text{ and } \tilde{\mathbb P} \text{ a.s.} \]

Proof
Let $b$ be a bounded $\mathcal Y_t$-measurable function. From the definition of $\pi_t$, viz $\pi_t(\varphi) := \mathbb E_{\mathbb P}[\varphi(X_t) \mid \mathcal Y_t]$, we deduce from the definition of conditional expectation that
\[ \mathbb E_{\mathbb P}\left( \pi_t(\varphi)\, b \right) = \mathbb E_{\mathbb P}\left( \varphi(X_t)\, b \right), \]
hence, changing measure to $\tilde{\mathbb P}$ via $d\mathbb P / d\tilde{\mathbb P}|_{\mathcal F_t} = \tilde Z_t$,
\[ \mathbb E_{\tilde{\mathbb P}}\left( \tilde Z_t\, \pi_t(\varphi)\, b \right) = \mathbb E_{\tilde{\mathbb P}}\left( \tilde Z_t\, \varphi(X_t)\, b \right), \]
whence by the tower property of conditional expectation
\[ \mathbb E_{\tilde{\mathbb P}}\left( \pi_t(\varphi)\, \mathbb E_{\tilde{\mathbb P}}\left[ \tilde Z_t \mid \mathcal Y_t \right] b \right) = \mathbb E_{\tilde{\mathbb P}}\left( \mathbb E_{\tilde{\mathbb P}}\left[ \varphi(X_t)\, \tilde Z_t \mid \mathcal Y_t \right] b \right), \]
which is equivalent to
\[ \mathbb E_{\tilde{\mathbb P}}\left( \pi_t(\varphi)\, \rho_t(1)\, b \right) = \mathbb E_{\tilde{\mathbb P}}\left( \rho_t(\varphi)\, b \right). \]
As this holds for every bounded $\mathcal Y_t$-measurable $b$, and the random variables $\pi_t(\varphi)$, $\rho_t(\varphi)$ and $\rho_t(1)$ are, from their definitions, all $\mathcal Y_t$-measurable, this implies that
\[ \pi_t(\varphi)\, \rho_t(1) = \rho_t(\varphi) \quad \text{a.s.} \]
Since $\tilde Z_t > 0$ a.s., it follows that $\rho_t(1) > 0$ a.s., and this is sufficient to prove the required result. $\square$

Remark
The above proof is simply a specific instance of the abstract form of Bayes' theorem, the general form of which is a standard result; see e.g. [Musiela and Rutkowski, 2005].
The Zakai Equation
We wish to derive a stochastic differential equation which is satisfied by the unnormalised density $\rho_t$. The derivation requires some technical lemmas about total sets and conditional expectations of stochastic integrals, which we establish first.

Definition 18.6.
A set $S$ is said to be total in $L^1(\Omega, \mathcal Y_t, \tilde{\mathbb P})$ if, for every $a \in L^1(\Omega, \mathcal Y_t, \tilde{\mathbb P})$,
\[ \mathbb E( a\, \epsilon_t ) = 0 \quad \forall \epsilon_t \in S \quad \Rightarrow \quad a = 0. \]

Lemma 18.7.
Let
\[ S_t := \left\{ \epsilon_t = \exp\left( i \int_0^t r_s^T\, dY_s + \frac12 \int_0^t \| r_s \|^2\, ds \right) : r \in L^\infty\left( [0,t];\, \mathbb R^m \right) \right\}; \]
then the set $S_t$ is a total set in $L^1(\Omega, \mathcal Y_t, \tilde{\mathbb P})$. The following proof of this standard result is taken from [Bensoussan, 1982].

Proof
It suffices to show that if $a \in L^1(\Omega, \mathcal Y_t, \tilde{\mathbb P})$ and $\tilde{\mathbb E}(a\, \epsilon_t) = 0$ for all $\epsilon_t \in S_t$, then $a = 0$. Take $t_1, t_2, \ldots, t_p \in (0, t)$ such that $t_1 < t_2 < \cdots < t_p$; then, given $k_1, \ldots, k_p \in \mathbb R^m$, set
\[ b_s = \mu_j \ \text{ for } s \in (t_{j-1}, t_j), \qquad b_s = 0 \ \text{ for } s \in (t_p, t), \]
where $\mu_p = k_p$, $\mu_{p-1} = k_p + k_{p-1}$, ..., $\mu_1 = k_p + \cdots + k_1$. Hence
\[ \sum_{j=1}^p k_j^T Y_{t_j} = \sum_{j=1}^p \mu_j^T \left( Y_{t_j} - Y_{t_{j-1}} \right) = \int_0^t b_s^T\, dY_s. \]
Since for deterministic $b$ the factor $\exp( \frac12 \int_0^t \| b_s \|^2 ds )$ is a non-zero constant, the hypothesis $\tilde{\mathbb E}(a\, \epsilon_t) = 0$ applied with $r = b$ gives
\[ \tilde{\mathbb E}\left[ a\, \exp\left( i \sum_{j=1}^p k_j^T Y_{t_j} \right) \right] = 0, \]
and the same holds for linear combinations by the linearity of $\tilde{\mathbb E}$, viz
\[ \tilde{\mathbb E}\left[ a \sum_{k=1}^K c_k\, \exp\left( i \sum_{j=1}^p k_{j,k}^T Y_{t_j} \right) \right] = 0, \qquad c_k \in \mathbb C,\ k_{j,k} \in \mathbb R^m. \]
If $F(x_1, \ldots, x_p)$ is a continuous bounded complex valued function defined on $(\mathbb R^m)^p$, then, since the set
\[ \left\{ \sum_{k=1}^K c_k\, \exp\left( i \sum_{j=1}^p k_{j,k}^T x_j \right) : c_k \in \mathbb C,\ k_{j,k} \in \mathbb R^m,\ K \in \mathbb N \right\} \]
is closed under complex conjugation and separates points in $(\mathbb R^m)^p$, the Stone-Weierstrass approximation theorem for complex valued functions implies that there exists a uniformly bounded sequence of functions of the form
\[ P^{(n)}(x_1, \ldots, x_p) = \sum_{k=1}^{K} c_k^{(n)}\, \exp\left( i \sum_{j=1}^p \big( k_{j,k}^{(n)} \big)^T x_j \right) \]
such that $\lim_{n \to \infty} P^{(n)}(x_1, \ldots, x_p) = F(x_1, \ldots, x_p)$. Hence
\[ \tilde{\mathbb E}\left[ a\, F(Y_{t_1}, \ldots, Y_{t_p}) \right] = 0 \]
for every continuous bounded $F$, and by the usual approximation argument this extends to every $F$ bounded and Borel measurable with respect to $\sigma(Y_{t_1}, \ldots, Y_{t_p})$. As $0 < t_1 < \cdots < t_p < t$ were arbitrary, it follows that $\tilde{\mathbb E}(a f) = 0$ for every bounded $\mathcal Y_t$-measurable $f$; taking $f = \bar a \wedge n$ for arbitrary $n$ then implies $\tilde{\mathbb E}( |a|^2 \wedge n ) = 0$, whence $a = 0$, $\tilde{\mathbb P}$-a.s. $\square$

Remark
As a result, if $\varphi \in L^1(\Omega, \mathcal Y_t, \tilde{\mathbb P})$ and we wish to compute $\tilde{\mathbb E}(\varphi \mid \mathcal Y_t)$, it suffices to compute $\tilde{\mathbb E}(\varphi\, \epsilon_t)$ for all $\epsilon_t \in S_t$.

Lemma 18.8.
Let $\{U_t\}_{t \ge 0}$ be a continuous $\mathcal F_t$-adapted process such that
\[ \tilde{\mathbb E}\left[ \int_0^T U_t^2\, dt \right] < \infty, \qquad \forall T \ge 0; \]
then for any $t \ge 0$, and for $j = 1, \ldots, m$, the following holds:
\[ \tilde{\mathbb E}\left[ \int_0^t U_s\, dY_s^j \,\Big|\, \mathcal Y_t \right] = \int_0^t \tilde{\mathbb E}\left( U_s \mid \mathcal Y_s \right) dY_s^j. \tag{14} \]

Proof
As $S_t$ is a total set in $L^1(\Omega, \mathcal Y_t, \tilde{\mathbb P})$, and by the earlier remarks, it is sufficient to prove that for any $\epsilon_t \in S_t$,
\[ \tilde{\mathbb E}\left[ \epsilon_t \int_0^t U_s\, dY_s^j \right] = \tilde{\mathbb E}\left[ \epsilon_t \int_0^t \tilde{\mathbb E}\left( U_s \mid \mathcal Y_s \right) dY_s^j \right]. \]
To this end, note that every $\epsilon_t \in S_t$ satisfies
\[ \epsilon_t = 1 + \int_0^t i\, \epsilon_s\, r_s^T\, dY_s, \]
and hence
\[ \tilde{\mathbb E}\left[ \epsilon_t \int_0^t U_s\, dY_s^j \right] = \tilde{\mathbb E}\left[ \int_0^t U_s\, dY_s^j \right] + \tilde{\mathbb E}\left[ \int_0^t i\, \epsilon_s\, r_s^j\, U_s\, ds \right], \]
where the last term is computed by recalling the definition of the covariation process, in conjunction with the Kunita-Watanabe identity. The first term on the right hand side above is the expectation of a martingale null at zero (from the integrability condition on $U_s$), and hence vanishes. Thus, using the fact that the Lebesgue integral and the expectation $\tilde{\mathbb E}(\cdot \mid \mathcal Y)$ commute, and that $\epsilon_s$ is $\mathcal Y$-measurable,
\[ \tilde{\mathbb E}\left[ \epsilon_t \int_0^t U_s\, dY_s^j \right] = \tilde{\mathbb E}\left[ \int_0^t i\, \epsilon_s\, r_s^j\, U_s\, ds \right] = \tilde{\mathbb E}\left[ \int_0^t i\, \epsilon_s\, r_s^j\, \tilde{\mathbb E}\left( U_s \mid \mathcal Y \right) ds \right] = \tilde{\mathbb E}\left[ \epsilon_t \int_0^t \tilde{\mathbb E}\left( U_s \mid \mathcal Y \right) dY_s^j \right], \]
where the final equality reverses the Kunita-Watanabe computation. But from Proposition 18.3, since $U_s$ is $\mathcal F_s$-measurable and the future observation increments are independent of $\mathcal F_s$, it follows that $\tilde{\mathbb E}(U_s \mid \mathcal Y) = \tilde{\mathbb E}(U_s \mid \mathcal Y_s)$, whence
\[ \tilde{\mathbb E}\left[ \epsilon_t \int_0^t U_s\, dY_s^j \right] = \tilde{\mathbb E}\left[ \epsilon_t \int_0^t \tilde{\mathbb E}\left( U_s \mid \mathcal Y_s \right) dY_s^j \right]. \]
As this holds for all $\epsilon_t \in S_t$, and the latter is a total set, this establishes the result. $\square$

Lemma 18.9.
Let $\{U_t\}_{t \ge 0}$ be a real valued $\mathcal F_t$-adapted continuous process such that
\[ \tilde{\mathbb E}\left[ \int_0^T U_s^2\, ds \right] < \infty, \qquad \forall T \ge 0, \]
and let $R_t$ be another continuous $\mathcal F_t$-adapted martingale such that $\langle R, Y \rangle_t = 0$ for all $t$, and such that the stochastic integral $\int_0^\cdot U_s\, dR_s$ is a genuine martingale. Then
\[ \tilde{\mathbb E}\left[ \int_0^t U_s\, dR_s \,\Big|\, \mathcal Y_t \right] = 0, \qquad \forall t \ge 0. \tag{15} \]

Remark
In the context of stochastic filtering, a natural application of this lemma will be made by setting $R_t = V_t$, the stochastic noise process driving the signal process.
Proof As before use the fact that St is a total set. The Zakai equation is important because it is a parabolic stochastic partial diﬀerential equation which is satisﬁed by ρt .
As in the previous proof we note that each
t
in St satisﬁes i s rT dYs s
=1+
0
Hence substituting this into the above expression yields
t t t t
˜ E
t 0
Us dRs
˜ =E
0 t
Us dRs Us dRs
0 t
˜ +E
0 t
i s rT dYs s
0
Us dRs
s
˜ =E ˜ =E
0
˜ +E
0
i s Us rT d Y.
(15)
. because by the condition 14.Stochastic Filtering and let Rt be another continuous Ft adapted process such that R. the unnormalised conditional distribution on X satisﬁes the Zakai equation which is
t t
ρt (φ) = π0 (φ) +
0
ρs (As φ)ds +
0
ρs (hT φ)dYs . The other term vanishes because from the hypotheses Y. and indeed it’s solution provides a practical means of computing ρt (φ) = ˜ ˜ E φ(Xt )Zt Yt . Then subject to the usual conditions on f . Then
˜ E
0
Us dRs Yt = 0. R s
Us dRs
=0.
∀t ≥ 0. σ and h. Theorem 18. We are now in a position to state and prove the Zakai equation. Y
t t
81 = 0 for all t. Indeed the Zakai equation provides a method by which numerical solution of the nonlinear ﬁltering problem may be approached by using recursive algorithms to solve the stochastic diﬀerential equation.
Proof
From the definition of the infinitesimal generator we know that

$$\phi(X_t) - \phi(X_0) = \int_0^t A_s\phi(X_s)\,ds + M_t^\phi,$$

where $M_t^\phi$ is a martingale given by

$$M_t^\phi = \sum_{j=1}^d \int_0^t \sum_{i=1}^d \frac{\partial\phi}{\partial x_i}(X_s)\,\sigma_{ij}(s,X_s)\,dV_s^j.$$

In terms of the functions already defined we can write the infinitesimal generator for $X_t$ as

$$A_s\phi(x) = \sum_{i=1}^d f_i(s,x)\,\frac{\partial\phi}{\partial x_i}(x) + \frac12\sum_{i=1}^d\sum_{j=1}^d\Big(\sum_{k=1}^d \sigma_{ik}(s,x)\,\sigma_{jk}(s,x)\Big)\frac{\partial^2\phi}{\partial x_i\,\partial x_j}(x).$$

We approximate $\tilde Z_t$ by

$$\tilde Z_t^\epsilon := \frac{\tilde Z_t}{1+\epsilon\tilde Z_t};$$

noting that since $\tilde Z_t\ge 0$, this approximation is bounded, $\tilde Z_t^\epsilon \le 1/\epsilon$. Application of Itô's formula to $f(x) = x/(1+\epsilon x)$, using the expression $d\tilde Z_t = \tilde Z_t\,h^T(t,X_t)\,dY_t$ found earlier, yields

$$d\tilde Z_t^\epsilon = df(\tilde Z_t) = (1+\epsilon\tilde Z_t)^{-2}\,\tilde Z_t\,h^T(t,X_t)\,dY_t - \epsilon(1+\epsilon\tilde Z_t)^{-3}\,\tilde Z_t^2\,\|h(t,X_t)\|^2\,dt,$$

where the coefficient of the drift satisfies $\epsilon\tilde Z_t^2(1+\epsilon\tilde Z_t)^{-3} \le 4/(27\epsilon)$. Thus we have approximated the process $\tilde Z_t$ by a bounded process. The integration by parts formula yields

$$d\big(\tilde Z_t^\epsilon\,\phi(X_t)\big) = \phi(X_t)\,d\tilde Z_t^\epsilon + \tilde Z_t^\epsilon\,d\big(\phi(X_t)\big) + d\big[\tilde Z^\epsilon, \phi(X_\cdot)\big]_t,$$

and the covariation term vanishes since $\tilde Z^\epsilon$ is driven by $Y$ while $M^\phi$ is driven by the independent Brownian motion $V$. Hence

$$d\big(\tilde Z_t^\epsilon\,\phi(X_t)\big) = \Big[\tilde Z_t^\epsilon\,A_t\phi(X_t) - \epsilon\,\phi(X_t)(1+\epsilon\tilde Z_t)^{-3}\tilde Z_t^2\,\|h(t,X_t)\|^2\Big]\,dt + \tilde Z_t^\epsilon\,dM_t^\phi + \phi(X_t)(1+\epsilon\tilde Z_t)^{-2}\tilde Z_t\,h^T(t,X_t)\,dY_t,$$

which on integration gives us

$$\tilde Z_t^\epsilon\,\phi(X_t) = \tilde Z_0^\epsilon\,\phi(X_0) + \int_0^t\Big[\tilde Z_s^\epsilon A_s\phi(X_s) - \epsilon\,\phi(X_s)(1+\epsilon\tilde Z_s)^{-3}\tilde Z_s^2\,\|h(s,X_s)\|^2\Big]\,ds + \int_0^t \tilde Z_s^\epsilon\,dM_s^\phi + \int_0^t \phi(X_s)(1+\epsilon\tilde Z_s)^{-2}\tilde Z_s\,h^T(s,X_s)\,dY_s. \qquad (16)$$
Now we wish to compute $\rho_t(\phi) = \tilde{\mathbb E}\big[\tilde Z_t\phi(X_t)\,\big|\,\mathcal Y_t\big]$; we shall do this using the previous expression. Note that, since $\tilde Z_0 = 1$,

$$\tilde{\mathbb E}\big[\tilde Z_0^\epsilon\,\phi(X_0)\,\big|\,\mathcal Y_t\big] = \tilde{\mathbb E}\Big[\frac{1}{1+\epsilon}\,\phi(X_0)\,\Big|\,\mathcal Y_0\Big] = \frac{\pi_0(\phi)}{1+\epsilon}.$$

Hence, taking conditional expectations of both sides of (16), and applying Lemma 18.8 (which, since $\tilde Z^\epsilon$ is bounded, allows us to interchange the (stochastic) integrals and the conditional expectations) together with Lemma 18.9 with $R = V$ (which shows that $\tilde{\mathbb E}[\int_0^t \tilde Z_s^\epsilon\,dM_s^\phi \mid \mathcal Y_t] = 0$), we obtain

$$\tilde{\mathbb E}\big[\tilde Z_t^\epsilon\,\phi(X_t)\,\big|\,\mathcal Y_t\big] = \frac{\pi_0(\phi)}{1+\epsilon} + \int_0^t \tilde{\mathbb E}\Big[\tilde Z_s^\epsilon A_s\phi(X_s) - \epsilon\,\phi(X_s)(1+\epsilon\tilde Z_s)^{-3}\tilde Z_s^2\,\|h(s,X_s)\|^2\,\Big|\,\mathcal Y_s\Big]\,ds + \int_0^t \tilde{\mathbb E}\Big[\phi(X_s)(1+\epsilon\tilde Z_s)^{-2}\tilde Z_s\,h^T(s,X_s)\,\Big|\,\mathcal Y_s\Big]\,dY_s.$$

Now we take the $\epsilon\downarrow 0$ limit. Clearly $\tilde Z_t^\epsilon \to \tilde Z_t$ pointwise, and thus, by the monotone convergence theorem, since $\tilde Z_t^\epsilon \uparrow \tilde Z_t$ as $\epsilon\to 0$,

$$\tilde{\mathbb E}\big[\tilde Z_t^\epsilon\,\phi(X_t)\,\big|\,\mathcal Y_t\big] \to \tilde{\mathbb E}\big[\tilde Z_t\,\phi(X_t)\,\big|\,\mathcal Y_t\big] = \rho_t(\phi).$$

Also note that, pointwise,

$$\lim_{\epsilon\to 0}\ \epsilon\,\phi(X_s)\,(1+\epsilon\tilde Z_s)^{-3}\tilde Z_s^2\,\|h(s,X_s)\|^2 = 0.$$
The integrand of the $\epsilon$-correction term may be dominated:

$$\epsilon\,\phi(X_s)(1+\epsilon\tilde Z_s)^{-3}\tilde Z_s^2\,\|h(s,X_s)\|^2 \le \|\phi\|_\infty\,\frac{\tilde Z_s}{1+\epsilon\tilde Z_s}\cdot\frac{1}{1+\epsilon\tilde Z_s}\cdot\frac{\epsilon\tilde Z_s}{1+\epsilon\tilde Z_s}\,\|h(s,X_s)\|^2 \le \|\phi\|_\infty\,\tilde Z_s\,\|h(s,X_s)\|^2 \le \|\phi\|_\infty\,\tilde Z_s\,k\big(1+\|X_s\|^2\big),$$

using the growth condition $\|h(s,x)\|^2 \le k(1+\|x\|^2)$. But

$$\tilde{\mathbb E}\big[\tilde Z_s\,k(1+\|X_s\|^2)\big] = k\big(1+\mathbb E\|X_s\|^2\big) \le k\big(1 + C(1+\mathbb E\|X_0\|^2)\,e^{ct}\big), \qquad (17)$$

where the final inequality follows by (6); so the right hand side is an $L^1$-bounded quantity. Hence, as the right hand side of the above inequality is integrable, the conditional form of the dominated convergence theorem gives, for almost all $s$,

$$\tilde{\mathbb E}\Big[\epsilon\,\phi(X_s)(1+\epsilon\tilde Z_s)^{-3}\tilde Z_s^2\,\|h(s,X_s)\|^2\,\Big|\,\mathcal Y\Big] \to 0 \quad\text{as } \epsilon\to 0,$$

and by Proposition 18.4 the $\sigma$-field $\mathcal Y$ may be replaced by $\mathcal Y_s$. Now note that the dominating function (17) is also Lebesgue integrable and bounded on $[0,t]$, so a second application of the dominated convergence theorem yields

$$\int_0^t \tilde{\mathbb E}\Big[\epsilon\,\phi(X_s)(1+\epsilon\tilde Z_s)^{-3}\tilde Z_s^2\,\|h(s,X_s)\|^2\,\Big|\,\mathcal Y_s\Big]\,ds \to 0.$$

For the generator term we can bound

$$\tilde{\mathbb E}\big(\tilde Z_s^\epsilon\,A_s\phi(X_s)\,\big|\,\mathcal Y_s\big) \le \|A_s\phi\|_\infty\,\tilde{\mathbb E}\big(\tilde Z_s\,\big|\,\mathcal Y_s\big),$$

and by the monotone convergence theorem $\tilde{\mathbb E}(\tilde Z_s^\epsilon A_s\phi(X_s)\,|\,\mathcal Y_s) \to \rho_s(A_s\phi)$; hence, since the dominating bound is integrable, the dominated convergence theorem gives

$$\int_0^t \tilde{\mathbb E}\big(\tilde Z_s^\epsilon\,A_s\phi(X_s)\,\big|\,\mathcal Y_s\big)\,ds \to \int_0^t \rho_s(A_s\phi)\,ds \quad\text{a.s.}$$

As a consequence of these two results, the time integral converges to $\int_0^t\rho_s(A_s\phi)\,ds$. It remains to prove that

$$\int_0^t \tilde{\mathbb E}\Big[\phi(X_s)(1+\epsilon\tilde Z_s)^{-2}\tilde Z_s\,h^T(s,X_s)\,\Big|\,\mathcal Y_s\Big]\,dY_s \to \int_0^t \rho_s\big(h^T\phi(X_s)\big)\,dY_s.$$
It now remains to check the convergence of the stochastic integral. To this end define

$$I^\epsilon(t) := \int_0^t \tilde{\mathbb E}\Big[\phi(X_s)\,\tilde Z_s(1+\epsilon\tilde Z_s)^{-2}\,h^T(s,X_s)\,\Big|\,\mathcal Y_s\Big]\,dY_s,$$

and the desired limit

$$I(t) := \int_0^t \rho_s(h^T\phi(X_s))\,dY_s = \int_0^t \tilde{\mathbb E}\Big[\tilde Z_s\,h^T(s,X_s)\,\phi(X_s)\,\Big|\,\mathcal Y_s\Big]\,dY_s.$$

Both $I(t)$ and $I^\epsilon(t)$ are continuous square integrable martingales. Consider the difference

$$I^\epsilon(t) - I(t) = \int_0^t \tilde{\mathbb E}\Big[\Big(\frac{\tilde Z_s}{(1+\epsilon\tilde Z_s)^2} - \tilde Z_s\Big)\phi(X_s)\,h^T(s,X_s)\,\Big|\,\mathcal Y_s\Big]\,dY_s = -\int_0^t \tilde{\mathbb E}\Big[\frac{\epsilon\tilde Z_s^2(2+\epsilon\tilde Z_s)}{(1+\epsilon\tilde Z_s)^2}\,\phi(X_s)\,h^T(s,X_s)\,\Big|\,\mathcal Y_s\Big]\,dY_s.$$

Consider the $L^2$ norm of this difference, and use the fact that $Y$ is a $\tilde P$ Brownian motion (the Itô isometry):

$$\tilde{\mathbb E}\big[(I^\epsilon(t)-I(t))^2\big] = \tilde{\mathbb E}\int_0^t \Big(\tilde{\mathbb E}\Big[\frac{\epsilon\tilde Z_s^2(2+\epsilon\tilde Z_s)}{(1+\epsilon\tilde Z_s)^2}\,\phi(X_s)\,h^T(s,X_s)\,\Big|\,\mathcal Y_s\Big]\Big)^2\,ds.$$

By the conditional form of Jensen's inequality $\tilde{\mathbb E}(X\,|\,\mathcal Y)^2 \le \tilde{\mathbb E}(X^2\,|\,\mathcal Y)$; thus

$$\tilde{\mathbb E}\big[(I^\epsilon(t)-I(t))^2\big] \le \tilde{\mathbb E}\int_0^t \tilde{\mathbb E}\Big[\Big(\frac{\epsilon\tilde Z_s^2(2+\epsilon\tilde Z_s)}{(1+\epsilon\tilde Z_s)^2}\Big)^2\phi(X_s)^2\,\|h(s,X_s)\|^2\,\Big|\,\mathcal Y_s\Big]\,ds.$$

A dominating function can readily be found for the quantity inside the conditional expectation, viz.

$$\Big(\frac{\epsilon\tilde Z_s^2(2+\epsilon\tilde Z_s)}{(1+\epsilon\tilde Z_s)^2}\Big)^2\phi(X_s)^2\,\|h(s,X_s)\|^2 = \Big(\frac{\epsilon\tilde Z_s}{1+\epsilon\tilde Z_s}\Big)^2\Big(\frac{2+\epsilon\tilde Z_s}{1+\epsilon\tilde Z_s}\Big)^2\tilde Z_s^2\,\phi(X_s)^2\,\|h(s,X_s)\|^2 \le 4\,\|\phi\|_\infty^2\,\tilde Z_s^2\,\|h(s,X_s)\|^2 \le 4k\,\|\phi\|_\infty^2\,\tilde Z_s^2\big(1+\|X_s\|^2\big),$$

using $\|h(s,X_s)\|^2 \le K(1+\|X_s\|^2)$; and $\tilde{\mathbb E}\big[\tilde Z_s^2(1+\|X_s\|^2)\big] = \mathbb E\big[\tilde Z_s(1+\|X_s\|^2)\big]$ is finite by an argument similar to (17). Since, for each $s$, the quantity inside the conditional expectation tends to zero pointwise as $\epsilon\to 0$, the dominated convergence theorem yields

$$\tilde{\mathbb E}\big[(I^\epsilon(t)-I(t))^2\big] \to 0.$$

Thus, by a standard theorem on $L^2$ convergence, there is a suitable subsequence $\epsilon_n \downarrow 0$ such that $I^{\epsilon_n}(t) \to I(t)$ a.s., which is sufficient to establish the claim; combining the three limits in the expression for $\tilde{\mathbb E}[\tilde Z_t^\epsilon\phi(X_t)\,|\,\mathcal Y_t]$ completes the proof. $\square$

The Kushner-Stratonovich Equation

The Zakai equation which has been derived in the previous section is an SDE satisfied by the unnormalised conditional distribution of $X_t$, $\rho_t(\phi)$. It seems natural therefore to derive a similar equation satisfied by the normalised conditional distribution, $\pi_t(\phi)$. Recall that the unnormalised conditional distribution is given by $\rho_t(\phi) = \tilde{\mathbb E}\big[\phi(X_t)\tilde Z_t\,\big|\,\mathcal Y_t\big]$, and the normalised conditional law by $\pi_t(\phi) := \mathbb E\big[\phi(X_t)\,\big|\,\mathcal Y_t\big]$. As a consequence of Proposition 17.5,

$$\pi_t(\phi) = \frac{\rho_t(\phi)}{\rho_t(1)}.$$

Now, using this result together with the Zakai equation, we can derive the Kushner-Stratonovich equation.

Theorem 18.11 (Kushner-Stratonovich). Subject to the usual conditions on the processes $X_t$ and $Y_t$, the normalised conditional law of the process $X_t$ satisfies the Kushner-Stratonovich equation

$$\pi_t(\phi) = \pi_0(\phi) + \int_0^t \pi_s(A_s\phi)\,ds + \int_0^t\big[\pi_s(h^T\phi) - \pi_s(h^T)\,\pi_s(\phi)\big]\,\big(dY_s - \pi_s(h)\,ds\big). \qquad (18)$$

The process $Y_t - \int_0^t \pi_s(h(s,X_s))\,ds$ is called the innovations process.
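Although $\pi_t$ is infinite dimensional in general, the structure of (18) can be seen numerically when the signal takes only finitely many values, so that $\pi_t$ is a probability vector. This finite-state case (sometimes called the Wonham filter) is not treated in these notes; the sketch below is purely illustrative, and the two-state model, transition rates and Euler-Maruyama discretisation are my own choices. Each step adds the generator drift and the gain $\pi_i(h_i - \bar h)$ against the innovation $dY - \bar h\,dt$, where $\bar h = \sum_j \pi_j h_j$:

```python
import math
import random

def wonham_filter(h, Q, y_increments, dt, pi0):
    """Euler-Maruyama pass through the Kushner-Stratonovich equation for a
    two-state signal with generator Q and observation function h:
        d pi_i = (Q^T pi)_i dt + pi_i (h_i - hbar) (dY - hbar dt)."""
    pi = list(pi0)
    for dY in y_increments:
        hbar = pi[0] * h[0] + pi[1] * h[1]
        innovation = dY - hbar * dt
        new = [pi[i]
               + (Q[0][i] * pi[0] + Q[1][i] * pi[1]) * dt
               + pi[i] * (h[i] - hbar) * innovation
               for i in (0, 1)]
        new = [max(p, 1e-12) for p in new]   # guard against discretisation error
        total = new[0] + new[1]
        pi = [p / total for p in new]        # renormalise: pi must stay a probability vector
    return pi

random.seed(0)
dt, n = 0.01, 500
Q = [[-1.0, 1.0], [1.0, -1.0]]               # generator of the two-state chain
h = [0.0, 1.0]                               # observation function h(state)
# synthetic observation increments dY = h(X) dt + dW, with the signal held in state 1
y_inc = [h[1] * dt + math.sqrt(dt) * random.gauss(0.0, 1.0) for _ in range(n)]
pi = wonham_filter(h, Q, y_inc, dt, [0.5, 0.5])
assert abs(pi[0] + pi[1] - 1.0) < 1e-9
```

The renormalisation step compensates for the fact that the Euler scheme does not exactly preserve the total mass that (18) preserves in continuous time.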
19. Gronwall's Inequality

An important and frequently used result in the theory of stochastic differential equations is Gronwall's lemma. There are various versions of Gronwall's theorem (see [Ethier and Kurtz, 1986] or [Karatzas and Shreve, 1987]), but the following general form and proof follow that given in [Mandelbaum et al., 1998].

Theorem (Gronwall). Let $x$, $y$ and $z$ be measurable non-negative functions on the real numbers. If $y$ is bounded and $z$ is integrable on $[0,T]$, and for all $0\le t\le T$

$$x(t) \le z(t) + \int_0^t x(s)\,y(s)\,ds, \qquad (19)$$

then

$$x(t) \le z(t) + \int_0^t z(s)\,y(s)\exp\Big(\int_s^t y(r)\,dr\Big)\,ds.$$

Proof
Multiplying both sides of the inequality (19) by $y(t)\exp\big(-\int_0^t y(s)\,ds\big)$ yields

$$x(t)\,y(t)\exp\Big(-\int_0^t y(s)\,ds\Big) - \Big(\int_0^t x(s)\,y(s)\,ds\Big)\,y(t)\exp\Big(-\int_0^t y(s)\,ds\Big) \le z(t)\,y(t)\exp\Big(-\int_0^t y(s)\,ds\Big).$$

The left hand side can be written as the derivative of a product:

$$\frac{d}{dt}\Big[\Big(\int_0^t x(s)\,y(s)\,ds\Big)\exp\Big(-\int_0^t y(s)\,ds\Big)\Big] \le z(t)\,y(t)\exp\Big(-\int_0^t y(s)\,ds\Big).$$

This can now be integrated to give

$$\Big(\int_0^t x(s)\,y(s)\,ds\Big)\exp\Big(-\int_0^t y(s)\,ds\Big) \le \int_0^t z(s)\,y(s)\exp\Big(-\int_0^s y(r)\,dr\Big)\,ds,$$

or equivalently

$$\int_0^t x(s)\,y(s)\,ds \le \int_0^t z(s)\,y(s)\exp\Big(\int_s^t y(r)\,dr\Big)\,ds.$$

Combining this with the original inequality (19) gives the desired result. $\square$

Corollary 19.1. If $x$, $y$ and $z$ satisfy the conditions for Gronwall's theorem, then

$$\sup_{0\le t\le T} x(t) \le \Big(\sup_{0\le t\le T} z(t)\Big)\exp\Big(\int_0^T y(s)\,ds\Big).$$

Proof
From the conclusion of Gronwall's theorem we see that for all $0\le t\le T$

$$x(t) \le \sup_{0\le t\le T} z(t) + \int_0^T z(s)\,y(s)\exp\Big(\int_s^T y(r)\,dr\Big)\,ds,$$

which yields

$$\sup_{0\le t\le T} x(t) \le \sup_{0\le t\le T} z(t) + \Big(\sup_{0\le t\le T} z(t)\Big)\int_0^T y(s)\exp\Big(\int_s^T y(r)\,dr\Big)\,ds \le \sup_{0\le t\le T} z(t) + \Big(\sup_{0\le t\le T} z(t)\Big)\Big[\exp\Big(\int_0^T y(s)\,ds\Big) - 1\Big] = \Big(\sup_{0\le t\le T} z(t)\Big)\exp\Big(\int_0^T y(s)\,ds\Big). \qquad\square$$
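As a quick numerical illustration (not part of the original argument): when $y$ and $z$ are constants and $x$ satisfies (19) with equality, both $x(t)$ and Gronwall's bound equal $z_0 e^{yt}$, so the bound is tight. The sketch below evaluates the right hand side of the theorem's conclusion by the trapezoidal rule and compares it with $z_0 e^{yt}$ (the constants are arbitrary choices):

```python
import math

def gronwall_bound(z0, y, t, n=100000):
    # Right-hand side of Gronwall's conclusion for constant z(s) = z0, y(s) = y:
    #   z0 + integral_0^t z0 * y * exp(y * (t - s)) ds,
    # evaluated numerically with the trapezoidal rule.
    h = t / n
    vals = [z0 * y * math.exp(y * (t - i * h)) for i in range(n + 1)]
    integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    return z0 + integral

z0, y, t = 2.0, 0.7, 3.0
bound = gronwall_bound(z0, y, t)
exact = z0 * math.exp(y * t)   # the extremal x(t), i.e. equality in (19)
assert abs(bound - exact) < 1e-3 * exact
```

The closed form follows since $\int_0^t z_0 y\,e^{y(t-s)}\,ds = z_0(e^{yt}-1)$.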
20. Kalman Filter

It is clear that in the general case the SDEs given by the Kushner-Stratonovich equation are infinite dimensional. The most famous example of stochastic filtering, a special case in which they are of finite dimension, is undoubtedly the Kalman-Bucy filter. The relevant equations can be derived without using the general machinery of stochastic filtering theory, and indeed were, long before the general nonlinear stochastic filtering theory had been developed; the first practical use of the Kalman filter was in the flight guidance computer for the Apollo missions. However, by examining this special linear case, it is hoped that the operation of the Kushner-Stratonovich equation may be made clearer.

We consider the system $(X_t, Y_t)\in\mathbb R^N\times\mathbb R^M$, where $X_t$ is the signal process and $Y_t$ is the observation process, described by the following SDEs:

$$dX_t = (B_tX_t + b_t)\,dt + F_t\,dV_t,$$
$$dY_t = H_tX_t\,dt + dW_t,$$

where $B_t$ and $F_t$ are $N\times N$ matrix valued processes, $V_t$ is an $N$-dimensional Brownian motion, $H_t$ is an $M\times N$ matrix valued process, and $W_t$ is an $M$-dimensional Brownian motion which is independent of $V_t$. Thus, in terms of the notation used earlier,

$$f(t,X_t) = b_t + B_tX_t, \qquad \sigma(t,X_t) = F_t, \qquad h(t,X_t) = H_tX_t.$$

The infinitesimal generator of the signal process $X_t$ is given by

$$A_t u = \sum_{i=1}^N (B_tx + b_t)^{(i)}\,\frac{\partial u}{\partial x_i} + \frac12\sum_{i=1}^N\sum_{j=1}^N \big(F_tF_t^T\big)_{ij}\,\frac{\partial^2 u}{\partial x_i\,\partial x_j},$$

which may readily be verified by applying Itô's formula to $f(X_t)$.

Since the system of equations is linear, it is simple to show that the conditional distribution of $X_t$ conditional upon $\mathcal Y_t$ is a multivariate normal distribution; thus it may be characterized by its first and second moments alone. The notation in what follows will become somewhat messy; we write $X_t^{(i)}$ for the $i$th component of the signal process. Since we shall only be concerned with these two moments, it will simplify the notation to define the conditional mean process

$$\hat X_t^{(i)} := \pi_t(x_i) = E\big[X_t^{(i)}\,\big|\,\mathcal Y_t\big]$$

and the conditional covariance process

$$R_t^{ij} := E\Big[\big(X_t^{(i)}-\hat X_t^{(i)}\big)\big(X_t^{(j)}-\hat X_t^{(j)}\big)\,\Big|\,\mathcal Y_t\Big] = \pi_t(x_ix_j) - \pi_t(x_i)\,\pi_t(x_j).$$

Thus at time $t$ our best estimate of the signal will be a normal distribution $N(\hat X_t, R_t)$. It is therefore sufficient to consider $\phi(X) = x_i$ and $\phi(X) = x_ix_j$ as test functions in the Kushner-Stratonovich equation (18).

20.1. Conditional Mean

To derive an equation for the evolution of the conditional mean process, we apply the Kushner-Stratonovich equation to $\phi(X) = X^{(i)}$; from Theorem 18.11 this satisfies

$$\hat X_t^{(i)} = \hat X_0^{(i)} + \int_0^t E\big[(B_sX_s)^{(i)} + b_s^{(i)}\,\big|\,\mathcal Y_s\big]\,ds + \int_0^t\Big(E\big[X_s^{(i)}(H_sX_s)^T\,\big|\,\mathcal Y_s\big] - \hat X_s^{(i)}\,(H_s\hat X_s)^T\Big)\,dN_s,$$

where $N_t$, the innovations process, is

$$N_t = Y_t - \int_0^t H_s\hat X_s\,ds.$$

Since $E[X_sX_s^T\,|\,\mathcal Y_s] = \hat X_s\hat X_s^T + R_s$, the integrand of the stochastic integral is the $i$th row of $R_sH_s^T$; whence, writing the equation for the evolution of the conditional mean in vector form, we obtain (since $R^T = R$)

$$d\hat X_t = \big(B_t\hat X_t + b_t\big)\,dt + R_tH_t^T\big(dY_t - H_t\hat X_t\,dt\big). \qquad (20)$$

20.2. Conditional Covariance

We now derive an equation for the evolution of the covariance matrix $R$, but first we shall need a result about the third moment of a multivariate normal distribution. If $X\sim N(\hat X, R)$, write $X = \hat X + Z$ with $Z\sim N(0,R)$; then, since the odd moments of $Z$ vanish,

$$E\big[X_t^{(i)}X_t^{(j)}X_t^{(k)}\big] = E\big[(\hat X_t^{(i)}+Z_t^{(i)})(\hat X_t^{(j)}+Z_t^{(j)})(\hat X_t^{(k)}+Z_t^{(k)})\big] = \hat X_t^{(i)}\hat X_t^{(j)}\hat X_t^{(k)} + \hat X_t^{(k)}R_t^{ij} + \hat X_t^{(j)}R_t^{ik} + \hat X_t^{(i)}R_t^{jk}.$$

Applying the Kushner-Stratonovich equation to $\phi(X) = x_ix_j$ yields

$$E\big[X_t^{(i)}X_t^{(j)}\,\big|\,\mathcal Y_t\big] = E\big[X_0^{(i)}X_0^{(j)}\,\big|\,\mathcal Y_0\big] + \int_0^t E\big[(F_sF_s^T)_{ij} + (B_sX_s+b_s)^{(i)}X_s^{(j)} + (B_sX_s+b_s)^{(j)}X_s^{(i)}\,\big|\,\mathcal Y_s\big]\,ds$$
$$\qquad\qquad + \int_0^t\Big(E\big[X_s^{(i)}X_s^{(j)}(H_sX_s)^T\,\big|\,\mathcal Y_s\big] - E\big[X_s^{(i)}X_s^{(j)}\,\big|\,\mathcal Y_s\big]\,E\big[(H_sX_s)^T\,\big|\,\mathcal Y_s\big]\Big)\cdot dN_s. \qquad (21)$$

It is clear that

$$dR_t^{ij} = d\big(E[X_t^{(i)}X_t^{(j)}\,|\,\mathcal Y_t]\big) - d\big(\hat X_t^{(i)}\hat X_t^{(j)}\big); \qquad (22)$$

the first term is given by (21), and using Itô's form of the product rule to expand out the second term,

$$d\big(\hat X_t^{(i)}\hat X_t^{(j)}\big) = \hat X_t^{(i)}\,d\hat X_t^{(j)} + \hat X_t^{(j)}\,d\hat X_t^{(i)} + d\big[\hat X^{(i)}, \hat X^{(j)}\big]_t.$$

Therefore, using (20), we can evaluate this as

$$d\big(\hat X_t^{(i)}\hat X_t^{(j)}\big) = \hat X_t^{(i)}(B_t\hat X_t+b_t)^{(j)}\,dt + \hat X_t^{(i)}\big(R_tH_t^T\cdot dN_t\big)^{(j)} + \hat X_t^{(j)}(B_t\hat X_t+b_t)^{(i)}\,dt + \hat X_t^{(j)}\big(R_tH_t^T\cdot dN_t\big)^{(i)} + \big\langle(R_tH_t^T\cdot dN_t)^{(i)},\,(R_tH_t^T\cdot dN_t)^{(j)}\big\rangle. \qquad (23)$$

For evaluating the quadratic covariation term in this expression it is simplest to work componentwise, using the Einstein summation convention and the fact that the innovations process $N_t$ is readily shown, by Lévy's characterization, to be a $P$-Brownian motion:

$$\big\langle(R_tH_t^T\cdot dN_t)^{(i)},\,(R_tH_t^T\cdot dN_t)^{(j)}\big\rangle = \big\langle R^{il}H^{kl}\,dN^k,\ R^{jm}H^{nm}\,dN^n\big\rangle = R^{il}H^{kl}R^{jm}H^{nm}\,\delta_{kn}\,dt = R^{il}H^{kl}H^{km}R^{jm}\,dt = (RH^THR^T)^{ij}\,dt = (RH^THR)^{ij}\,dt, \qquad (24)$$

where the last equality follows since $R^T = R$. Substituting (24) into (23), and then using this, (21) and the third moment result in (22), notice that all of the stochastic terms cancel: the $dN$ integrand of (21), evaluated via the third moment result, is exactly matched by the $dN$ terms of (23). We thus obtain the final differential equation for the evolution of the conditional covariance matrix:

$$\frac{dR_t}{dt} = F_tF_t^T + B_tR_t + R_tB_t^T - R_tH_t^TH_tR_t. \qquad (25)$$

The two equations (20) and (25) therefore provide a complete description of the best prediction of the system state $X_t$, conditional on the observations up to time $t$. This is the Kalman-Bucy filter. It is most frequently encountered in a discrete time version, since in this form it is practical for use in some of the problems described in the introduction to the stochastic filtering section of these notes.
21. Discontinuous Stochastic Calculus

So far we have discussed integrals with respect to continuous semimartingales. It might seem that this includes a great deal of the cases which are of real interest, but a very important class of processes has been neglected: those with jumps.

The standard Poisson process on the line should be familiar. If the probability of a jump in an infinitesimal time interval is proportional to the length of the time interval, the constant of proportionality is called the intensity $\lambda$, and it can be shown that the inter-jump times have exponential distribution with parameter $\lambda$ (i.e. mean $1/\lambda$). The resulting process has the properties that for any $t, s \ge 0$, $p(t+s) - p(t)$ is independent of $\mathcal F_t$; the process is also stationary: the distribution of $p(t+s) - p(t)$ does not depend upon $t$; and $p(t)$ is a.s. finite for any $t$.

Such processes can also be described by their jump measures. The jump measure $N(\sigma, B)$ is the number of jumps within the time interval $\sigma$ in the (Borel) set $B \subset \Gamma$ of jump sizes. This measure can be related back to the path $p(\cdot)$ by the following integral: for any $t \ge 0$,

$$p(t) = \int_0^t\int_\Gamma \gamma\,N(ds, d\gamma).$$

Being just a finite sum of the jumps, the above integral can be defined without difficulty!

More formally, we make the following definition.

Definition 21.1. A Poisson process with intensity measure $h(dt,dy) = \lambda\,dt\times\Pi(dy)$, where $\Pi$ is a probability measure, is a measurable map $N$ from $(\Omega, \mathcal F, P)$ into the space of positive integer valued counting measures on $(\mathbb R^+\times\Gamma, \mathcal B(\mathbb R^+\times\Gamma))$ with the properties:
(i) for every $t\ge 0$ and every Borel subset $A$ of $[0,t]\times\Gamma$, $N(A)$ is $\mathcal F_t$-measurable;
(ii) for every $t\ge 0$ and every collection of Borel subsets $A_i$ of $[t,\infty)\times\Gamma$, $\{N(A_i)\}$ is independent of $\mathcal F_t$;
(iii) $E(N(A)) = h(A)$ for every Borel subset $A$ of $\mathbb R^+\times\Gamma$;
(iv) for each $s > 0$ and Borel $B\subset\Gamma$, the distribution of $N([t,t+s], B)$ does not depend upon $t$.

From the above definition, for any $B\in\mathcal B(\Gamma)$,

$$E\big(N([t,t+s], B)\,\big|\,\mathcal F_t\big) = E\big(N([0,t+s], B) - N([0,t], B)\,\big|\,\mathcal F_t\big) = s\lambda\,\Pi(B).$$

21.1. Compensators

Let $p(t)$ be an $\mathcal F_t$-adapted jump process with bounded jumps, whose trajectories are $\mathcal F_t$-adapted and piecewise continuous, such that $p(0) = 0$ and $E(p(t)) < \infty$ for all $t\ge 0$.

Definition 21.2. Let $\lambda(t)$ be a bounded non-negative $\mathcal F_t$-adapted process with RCLL sample paths. If

$$p(t) - \int_0^t \lambda(s)\,ds$$

is an $\mathcal F_t$-martingale, then $\lambda(\cdot)$ is the intensity of the jump process $p(\cdot)$, and the process $\int_0^t \lambda(s)\,ds$ is the compensator of the process $p(\cdot)$.
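As an illustrative aside (not from the notes): for a standard Poisson process with constant intensity $\lambda$, the compensator is simply $\lambda t$, and the defining martingale property gives $E(N_T - \lambda T) = 0$. The sampler below generates jump counts from exponential inter-jump times with mean $1/\lambda$, as described above, and checks the mean of the compensated process by Monte Carlo:

```python
import random

def poisson_count(lam, T, rng):
    """Number of jumps of a rate-lam Poisson process on [0, T], generated
    from exponential inter-jump times with mean 1/lam."""
    t, n = 0.0, 0
    while True:
        t += rng.expovariate(lam)
        if t > T:
            return n
        n += 1

rng = random.Random(42)
lam, T, paths = 2.5, 2.0, 20000
compensated = [poisson_count(lam, T, rng) - lam * T for _ in range(paths)]
mean = sum(compensated) / paths
# E[N_T - lam*T] = 0; the Monte Carlo error is of order sqrt(lam*T/paths)
assert abs(mean) < 0.1
```

With $\lambda T = 5$ and $2\times 10^4$ paths, the standard error of the sample mean is about $0.016$, so the tolerance above is comfortably wide.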
Why are compensators interesting? Recall, from the theory of stochastic integration with respect to continuous semimartingales, that one of the most useful properties was that a stochastic integral with respect to a continuous local martingale is a local martingale. We would like the same sort of property to hold when we integrate with respect to a suitable class of discontinuous processes.

Theorem 21.3. For $A$ a non-decreasing locally integrable process, there is a non-decreasing previsible process, called the compensator of $A$ and denoted $A^p$, unique a.s., such that $A - A^p$ is a local martingale.

Corollary 21.4. For $A$ a process of locally finite variation, there exists a previsible locally finite variation compensator process $A^p$, unique a.s., characterised by any one of the following:
(i) $A - A^p$ is a local martingale;
(ii) $E(A_T^p) = E(A_T)$ for all stopping times $T$;
(iii) $E[(H\cdot A^p)_\infty] = E[(H\cdot A)_\infty]$ for all non-negative previsible processes $H$.

Definition 21.5. Two local martingales $N$ and $M$ are orthogonal if their product $NM$ is a local martingale.

Definition 21.6. A local martingale $X$ is purely discontinuous if $X_0 = 0$ and it is orthogonal to all continuous local martingales.

The above definition, although important, is somewhat counterintuitive.

Example
Consider a Poisson process $N_t$ with mean measure $m_t$. To see that this process has compensator $m_t$, note that $E(N_t - N_s\,|\,\mathcal F_s) = m_t - m_s$ for any $s\le t$, so $N_t - m_t$ is a martingale, and $m_t$ is a previsible (since it is deterministic) process of locally finite variation. Then we can show that $N_t - m_t$ is purely discontinuous, despite the fact that this process does not consist solely of jumps!

Theorem 21.7. Any local martingale $M$ has a unique (up to indistinguishability) decomposition

$$M = M_0 + M^c + M^d,$$

where $M_0^c = M_0^d = 0$, $M^c$ is a continuous local martingale and $M^d$ is a purely discontinuous local martingale.

Theorem 21.8. If $f$ is a real valued $C^2$ function with domain $\mathbb R^n$, and $X$ is an $n$-dimensional semimartingale with decomposition $X = M^c + M^d + Y$, where $M^c + M^d$ is a local martingale and $M^c$ is a continuous local martingale, then the following generalization of Itô's formula holds:

$$f(X_t) = f(X_0) + \sum_{i=1}^n\int_0^t\frac{\partial f}{\partial x_i}(X_{s-})\,dX_s^i + \frac12\sum_{i,j=1}^n\int_0^t\frac{\partial^2 f}{\partial x_i\,\partial x_j}(X_{s-})\,d\big[(M^c)^i, (M^c)^j\big]_s + \sum_{0<s\le t}\Big(f(X_s) - f(X_{s-}) - \sum_{i=1}^n\frac{\partial f}{\partial x_i}(X_{s-})\,\Delta X_s^i\Big).$$

The rather ugly notation $(M^c)^i$ refers to the $i$th component of the continuous local martingale $M^c$, and $\Delta X_s = X_s - X_{s-}$.

A rough heuristic explanation for this generalized form of Itô, and an aid to memory of the formula, is obtained by noting that the first three terms on the right hand side are identical to those in the multidimensional form of Itô for continuous processes. Also, terms 1, 2 and 4 are exactly what you would have expected by writing down an analogue of Itô for discontinuous processes of finite variation. In other words, if the process $X$ has a jump at time $s$, the value of $f(X)$ changes by $f(X_s) - f(X_{s-})$. The jump in $X_s$ will give rise to a discontinuity in

$$\sum_{i=1}^n\int_0^t\frac{\partial f}{\partial x_i}(X_{s-})\,dX_s^i,$$

with the jump having size

$$\sum_{i=1}^n\frac{\partial f}{\partial x_i}(X_{s-})\big(X_s^i - X_{s-}^i\big).$$

To get the correct jump of $f(X_s) - f(X_{s-})$, the corrective term is

$$\sum_{0<s\le t}\Big(f(X_s) - f(X_{s-}) - \sum_{i=1}^n\frac{\partial f}{\partial x_i}(X_{s-})\,\Delta X_s^i\Big).$$
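The corrective sum can be checked pathwise in the simplest case. For a Poisson process $N_t$ (jumps of size one, with no continuous martingale part) and $f(x) = x^2$, the formula reduces to $N_t^2 = 2\int_0^t N_{s-}\,dN_s + \sum_{0<s\le t}\big[N_s^2 - N_{s-}^2 - 2N_{s-}\Delta N_s\big]$, and each term of the corrective sum equals one, so the sum equals $N_t$. A small illustrative check (the simulation parameters are arbitrary choices, not from the notes):

```python
import random

rng = random.Random(7)
lam, T = 3.0, 5.0

# jump times of a rate-lam Poisson process on [0, T]
times, t = [], 0.0
while True:
    t += rng.expovariate(lam)
    if t > T:
        break
    times.append(t)

N_T = len(times)
# stochastic integral term: 2 * sum of N_{s-} over jump times (N_{s-} = k just
# before the (k+1)-th jump)
integral = sum(2 * k for k in range(N_T))
# corrective sum: each jump contributes (k+1)^2 - k^2 - 2k = 1
correction = sum((k + 1) ** 2 - k ** 2 - 2 * k for k in range(N_T))
assert correction == N_T
assert N_T ** 2 == integral + correction
```

The identity holds exactly on every path: $\sum_{k=0}^{N_T-1} 2k = N_T(N_T-1)$, and adding the corrective sum $N_T$ recovers $N_T^2$.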
21.2. RCLL processes revisited

We shall be working extensively in the space of right continuous with left limits (CADLAG) processes, and we shall need a metric on the corresponding function space. First let us consider the space $D_T$ of CADLAG functions from $[0,T]$ to $\mathbb R$. The usual approach of using the sup norm on $[0,T]$ isn't going to be very useful: consider the two functions defined on $[0,10]$ by

$$f(x) = \begin{cases} 0 & \text{for } x < 1,\\ 1 & \text{for } x \ge 1,\end{cases} \qquad g(x) = \begin{cases} 0 & \text{for } x < 1+\epsilon,\\ 1 & \text{for } x \ge 1+\epsilon.\end{cases}$$

For arbitrarily small $\epsilon$ these functions are always one unit apart in the sup metric. We attempt to surmount this problem by considering the class $\Lambda_T$ of increasing maps from $[0,T]$ to $[0,T]$. Now define a metric

$$d_T(f,g) := \inf\Big\{\epsilon : \sup_{0\le s\le T}|s-\lambda(s)|\le\epsilon,\ \sup_{0\le s\le T}|f(s)-g(\lambda(s))|\le\epsilon,\ \text{for some }\lambda\in\Lambda_T\Big\}.$$

This metric would appear to have nicely solved the problem described above. Unfortunately the resulting space is not complete! This is a major problem, because Prohorov's theorem (among others) requires a complete separable metric space! However the problem can be fixed via a simple trick: for a time transform $\lambda\in\Lambda_T$ define a norm

$$\|\lambda\| := \sup_{0\le s<t\le T}\Big|\log\frac{\lambda(t)-\lambda(s)}{t-s}\Big|.$$

Thus we replace the condition $\sup_{0\le s\le T}|\lambda(s)-s|\le\epsilon$ by $\|\lambda\|\le\epsilon$,

$$d_T^\circ(f,g) := \inf\Big\{\epsilon : \|\lambda\|\le\epsilon,\ \sup_{0\le s\le T}|f(s)-g(\lambda(s))|\le\epsilon,\ \text{for some }\lambda\in\Lambda_T\Big\},$$

and we obtain a new metric on $D_T$ which this time is complete and separable. This is defined on the space of CADLAG functions on the compact time interval $[0,T]$, and can be extended via another standard trick to $D_\infty$:

$$d_\infty(f,g) := \int_0^\infty e^{-t}\min\big[1, d_t(f,g)\big]\,dt.$$
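To make the step-function example concrete: with $\epsilon = 0.25$, a piecewise linear time change $\lambda$ satisfying $\lambda(1) = 1+\epsilon$ makes $g\circ\lambda$ coincide with $f$, so $d_T(f,g)\le\epsilon$ even though the sup distance is $1$. An illustrative computation (the grid and the particular $\lambda$ are my own choices):

```python
eps, T = 0.25, 10.0

def f(x):
    return 0.0 if x < 1.0 else 1.0

def g(x):
    return 0.0 if x < 1.0 + eps else 1.0

def lam(s):
    """Piecewise linear increasing bijection of [0, T] with lam(1) = 1 + eps."""
    if s <= 1.0:
        return s * (1.0 + eps)
    return 1.0 + eps + (s - 1.0) * (T - 1.0 - eps) / (T - 1.0)

grid = [i * T / 10000 for i in range(10001)]
sup_dist = max(abs(f(s) - g(s)) for s in grid)      # sup-norm distance: always 1
match = max(abs(f(s) - g(lam(s))) for s in grid)    # f and g∘lam agree everywhere
time_shift = max(abs(s - lam(s)) for s in grid)     # cost of the time change: eps
assert sup_dist == 1.0
assert match == 0.0
assert time_shift <= eps + 1e-9
```

So one admissible $\lambda$ already certifies $d_T(f,g)\le\epsilon$, matching the intuition that $f$ and $g$ become close as $\epsilon\to 0$.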
Rogers. M. Dellacherie. Diﬀusions. Number 1464 in Lecture notes in Mathematics. and Williams. (1989). Springer. Bollobas. Markov Processes – Characterization and Convergence. (1996). (2005). Hermann. C. Duxbury. and Williams. G. E.. Springer. Mandelbaum.
Pardoux.A. (1990). 2nd edition.22. (2000). S. C. Ethier. Protter. Linear Analysis. Brownian Motion and Stochastic Calculus. Massey. P. M. Hermann. and Meyer. M. 30:149– 201. pages 67–163. References
Bensoussan. C. (1986). and Shreve. D. Probability: Theory and Examples. Paris. 2nd edition. S. Stochastic Integration and Diﬀerential Equations. R. G. and Kurtz. P. (1980). T. Filtrage non lin´aire et ´quations aux d´riv´es partielles stochastiques e e e e associ´es. Queueing Systems – Theory and Applications. A. A. W. C.. Martingale Methods in Financial Modelling. I. Diﬀusions. and Reiman. e Dellacherie. Probabilit´s et Potentiel B. and Rutkowski. A. (1987). o
[96]
. Probabilit´s et Potentiel A.A. Markov Processes and Martingales: Volume Two: Itˆ Calculus. L. Springer. Cambridge University Press. Markov Processes and Martingales: Volume One: Foundations. and Meyer. Karatzas. Musiela. Paris. P. I. B. (1990). (1994). (1998). CUP. D. Strong approximations for markovian service networks. Wiley. Stochastic control of partially observable systems. E. (1982). Springer e Verlag. Rogers. L. e Durrett. Cambridge. (1975). Wiley.